#1
How to detect strange characters?
The file names look normal; I cannot see any strange characters.
In reality, however, the name contains some strange character that I do not know how to detect. The name is in Polish, so the 1250 (Central European) encoding is correct; UTF-8 is not required for Polish. But how do I detect a "strange character"? |
#2
Can you give examples? You have to explain what you mean by *strange character*.
__________________
JD-Dev & Server-Admin |
#3
Example:
Co sprawia, ¿e zapominasz o wszystkim? |
#4
Jiaz - #post 16.03.2019 08:24: Well, for me it shows the correct Polish letter, but here on the forum it shows a wrong character, ¿ (not Polish).
|
#5
Whether a Unicode character is displayed correctly depends on the encoding.
On the forum, the encoding is determined by the language selected in the forum settings.
__________________
JD-Dev & Server-Admin |
#6
Basically, you have to either specify a pattern that matches all wanted characters or find a way to detect the *strange* characters within the encoding used.
__________________
JD-Dev & Server-Admin |
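The second option from post #6 (detecting characters that fall outside the encoding in use) could be sketched in Python roughly as follows. This is only an illustration, not something posted in the thread: the helper name `find_unencodable` is hypothetical, and the sample string with a zero-width space placed before "że" is an assumption made for the demo.

```python
def find_unencodable(text, encoding="cp1250"):
    """Return (index, char, code point) for every char the encoding cannot represent."""
    hits = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            hits.append((index, ch, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical sample: the name from post #3 with an invisible U+200B inserted.
sample = "Co sprawia, \u200bże zapominasz o wszystkim?"
print(find_unencodable(sample))
```

Ordinary Polish letters pass through silently, so anything reported here is a candidate "strange character" for a Windows-1250 workflow.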
#7
I used this regex, but without the expected result. Zero help.
I do not know what this regular expression means and I do not want to know, because it does not work. I have spent enough time on trial and error, and you only live once. |
#8
You forgot to show your regex?!
__________________
JD-Dev & Server-Admin |
#9
???
__________________
JD-Dev & Server-Admin |
#10
But it does not find those specific characters:
[^\x{0000}\x{0001}\x{0002}\x{0003}\x{0004}\x{0005}\x{0006}\x{0007}\x{0008}\x{0009}\x{000a}\x{000b}\x{000c}\x{000d}\x{000e}\x{000f}\x{0010}\x{0011}\x{0012}\x{0013}\x{0014}\x{0015}\x{0016}\x{0017}\x{0018}\x{0019}\x{001a}\x{001b}\x{001c}\x{001d}\x{001e}\x{001f}\x{0020}\x{0021}\x{0022}\x{0023}\x{0024}\x{0025}\x{0026}\x{0027}\x{0028}\x{0029}\x{002A}\x{002B}\x{002C}\x{002D}\x{002E}\x{002F}\x{030}\x{0031}\x{0032}\x{0033}\x{0034}\x{0035}\x{0036}\x{0037}\x{0038}\x{0039}\x{003A}\x{003B}\x{003C}\x{003D}\x{003E}\x{003F}\x{040}\x{0041}\x{0042}\x{0043}\x{0044}\x{0045}\x{0046}\x{0047}\x{0048}\x{0049}\x{004A}\x{004B}\x{004C}\x{004D}\x{004E}\x{004F}\x{050}\x{0051}\x{0052}\x{0053}\x{0054}\x{0055}\x{0056}\x{0057}\x{0058}\x{0059}\x{005A}\x{005B}\x{005C}\x{005D}\x{005E}\x{005F}\x{060}\x{0061}\x{0062}\x{0063}\x{0064}\x{0065}\x{0066}\x{0067}\x{0068}\x{0069}\x{006A}\x{006B}\x{006C}\x{006D}\x{006E}\x{006F}\x{070}\x{0071}\x{0072}\x{0073}\x{0074}\x{0075}\x{0076}\x{0077}\x{0078}\x{0079}\x{007A}\x{007B}\x{007C}\x{007D}\x{007E}\x{007F}\x{0AC}\x{201A}\x{201E}\x{2026}\x{2020}\x{2021}\x{2030}\x{0160}\x{2039}\x{015A}\x{0164}\x{017D}\x{0179}\x{018}\x{2019}\x{201C}\x{201D}\x{2022}\x{2013}\x{2014}\x{2122}\x{0161}\x{203A}\x{015B}\x{0165}\x{017E}\x{017A}\x{0A0}\x{02C7}\x{02D8}\x{0141}\x{00A4}\x{0104}\x{00A6}\x{00A7}\x{00A8}\x{00A9}\x{015E}\x{00AB}\x{00AC}\x{00AD}\x{00AE}\x{017B}\x{0B0}\x{00B1}\x{02DB}\x{0142}\x{00B4}\x{00B5}\x{00B6}\x{00B7}\x{00B8}\x{0105}\x{015F}\x{00BB}\x{013D}\x{02DD}\x{013E}\x{017C}\x{154}\x{00C1}\x{00C2}\x{0102}\x{00C4}\x{0139}\x{0106}\x{00C7}\x{010C}\x{00C9}\x{0118}\x{00CB}\x{011A}\x{00CD}\x{00CE}\x{010E}\x{110}\x{0143}\x{0147}\x{00D3}\x{00D4}\x{0150}\x{00D6}\x{00D7}\x{0158}\x{016E}\x{00DA}\x{0170}\x{00DC}\x{00DD}\x{0162}\x{00DF}\x{155}\x{00E1}\x{00E2}\x{0103}\x{00E4}\x{013A}\x{0107}\x{00E7}\x{010D}\x{00E9}\x{0119}\x{00EB}\x{011B}\x{00ED}\x{00EE}\x{010F}\x{111}\x{0144}\x{0148}\x{00F3}\x{00F4}\x{0151}\x{00F6}\x{00F7}\x{0159}\x{016F}\x{00FA}\x{0171}\x{00FC}\x{00FD}\x{0163}\x{02D9}] |
#11
The pattern looks good in regex101.
__________________
JD-Dev & Server-Admin |
#12
I have no idea how to find a f***king character that requires encoding in another system. It is the same on any other forum.
|
#13
I really don't see how this is our problem. Thread closed.
__________________
raztoki @ jDownloader reporter/developer http://svn.jdownloader.org/users/170 Don't fight the system, use it to your advantage. :] |
#14
Unicode chars
I know it is easy to close the thread, and I know it is not a simple topic. But if you know the solution, why did you not post it on the forum?
https://board.jdownloader.org/showthread.php?t=80063 |
#15
Quote:
Quote:
does use for characters that are unsupported/defect. You should always pay attention when mixing different encodings!
__________________
JD-Dev & Server-Admin |
#16
I still do not know how to find this character.
Can you tell me how to do it in EmEditor, Notepad++ or another text editor? I have many texts that look like normal Polish text and yet contain some Unicode character that I do not know about. I have already tried a few regexes, but I still cannot detect it. P.S. I already gave the name on the forum; what is strange in this name? |
#17
Please send me an example text file to support@jdownloader.org and tell me what character you're looking for.
__________________
JD-Dev & Server-Admin |
#18
There is nothing strange here, at least to me.
Do you mean ¿?
__________________
JD-Dev & Server-Admin |
#19
Further help needed (I cannot copy everything correctly).
Quote:
If I paste the text, the forum shows only the characters it supports; the others do not appear. I tried to send the text by e-mail, but you will not be able to detect the character that way, because not everything can be copied correctly. |
#20
You can send the text file instead of pasting the text; just attach the file to the mail.
__________________
JD-Dev & Server-Admin |
#21
The file is on the forum:
https://board.jdownloader.org/showpo...44&postcount=1 |
#22
And what characters do you mean? I can't see/find any strange/broken char.
__________________
JD-Dev & Server-Admin |
#23
Quote:
Or a hidden character. Save the text as UTF-8 and again as 1250, then compare the two text files. They are different. |
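The comparison suggested in post #23 can be approximated without an editor. The sketch below is an assumption-laden illustration: the helper name `lossy_diff` is hypothetical, and it mimics a lossy "save as 1250" by replacing every unsupported character with `?`, which is what many editors do but is not confirmed for any tool in this thread.

```python
def lossy_diff(text, encoding="cp1250"):
    """Simulate saving in a legacy encoding and list the characters that changed."""
    # errors="replace" substitutes one "?" per character the encoding cannot hold.
    saved = text.encode(encoding, errors="replace").decode(encoding)
    return [
        (index, f"U+{ord(before):04X}", after)
        for index, (before, after) in enumerate(zip(text, saved))
        if before != after
    ]


print(lossy_diff("a\u200bb"))
```

Every reported position is a spot where the UTF-8 and 1250 copies of the file would differ.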
#24
It is very interesting that this feature cannot recognize the encoding. Really unique Unicode.
EmEditor: View -> Character Code Value... (Ctrl+I). Screenshot: https://i.postimg.cc/tTBvSBCx/Screen...t-05-16-PM.jpg Quote:
|
#25
@djmakinera: that is the same as opening the text in a hex editor.
I will check and respond to the mail as soon as I have some free time.
__________________
JD-Dev & Server-Admin |
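For readers without EmEditor or a hex editor at hand, a similar per-character view can be produced with Python's standard `unicodedata` module. This is a minimal sketch; the function name `dump_code_points` is hypothetical and the input string is made up for the demo.

```python
import unicodedata


def dump_code_points(text):
    """List (code point, general category, Unicode name) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),  # control chars have no name
        )
        for ch in text
    ]


for row in dump_code_points("ż\u200b"):
    print(*row)
```

An invisible character shows up as its own row, which is the same information a character-code inspector or hex view would reveal.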
#26
https://board.jdownloader.org/showpo...44&postcount=1
I analyzed the character code values in the text (HTML, SCRIPT): \x{200B} ;; Cf ZERO WIDTH SPACE. It works in the JD2 search engine, but does not work in editors. I do not know how compatible it is with Perl regex and PCRE. |
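Once the culprit is known to be a format character (general category Cf) such as U+200B, finding and removing it can be sketched in Python as below. The helper names are hypothetical, and the list of zero-width code points in the pattern is an assumption about which invisible characters are worth stripping, not something taken from the thread. Note that Python's `re` module writes the code point as `\u200b` rather than `\x{200B}`.

```python
import re
import unicodedata

# Assumed set of common zero-width characters; extend as needed.
ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")


def find_format_chars(text):
    """Return (index, code point) for every character in category Cf."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_zero_width(text):
    """Remove the zero-width characters listed in ZERO_WIDTH."""
    return ZERO_WIDTH.sub("", text)


name = "a\u200bb"
print(find_format_chars(name))
print(strip_zero_width(name))
```

Filtering by category catches format characters beyond the ones listed in the pattern, while the explicit pattern keeps the removal step narrow and predictable.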