JDownloader Community - Appwork GmbH
 

Closed Thread
#1
15.03.2019, 14:39
djmakinera
Banned
Join Date: May 2010
Location: Poland
Posts: 8,387
How to detect strange characters?

The file names look normal; visually I cannot see any strange characters.
In reality, however, the name contains some strange character, and I do not know how to detect it.

The name is in Polish, so the 1250 (Central European) encoding is correct.
UTF-8 is not required for the Polish language.
But how do I detect a "strange character"?
#2
15.03.2019, 14:53
Jiaz
JD Manager
Join Date: Mar 2009
Location: Germany
Posts: 79,290

Can you give examples? You have to explain what you mean by *strange character*.
__________________
JD-Dev & Server-Admin
#3
16.03.2019, 09:24
djmakinera

Example (Polish for "What makes you forget about everything?"):

Co sprawia, ¿e ​​zapominasz o wszystkim?
#4
17.03.2019, 11:49
djmakinera

Jiaz, regarding my post of 16.03.2019 08:24: on my side it shows the correct Polish letter, but here on the forum it shows a WRONG character, ¿ (not Polish).
#5
18.03.2019, 15:06
Jiaz

Whether a Unicode character is displayed correctly or not depends on the encoding.
On the forum, the encoding used is determined by the language selected in the forum.
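A minimal Python sketch of how ż can turn into ¿, assuming (the thread does not say so) that the text was stored as Windows-1250 and rendered as Windows-1252: ż is byte 0xBF in Windows-1250, and byte 0xBF is ¿ in Windows-1252. The helper name `show_as` is hypothetical.

```python
# Sketch: the same byte means different characters in different code pages.
# Assumption: the text was written as Windows-1250 and displayed as Windows-1252.

def show_as(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the bytes with another."""
    raw = text.encode(written_as)
    return raw.decode(read_as, errors="replace")

word = "że"  # the word that appears as "¿e" in the example sentence
print(word.encode("cp1250"))              # b'\xbfe'
print(show_as(word, "cp1250", "cp1252"))  # ¿e
print(show_as(word, "cp1250", "cp1250"))  # że
```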
#6
18.03.2019, 15:07
Jiaz

Basically, you have to either specify a pattern that matches all wanted characters, or find a way to detect the *strange* characters within the encoding used.
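A Python sketch of the first option (a pattern of wanted characters). The whitelist used here (tab, CR, LF, printable ASCII and the 18 Polish diacritic letters) is an assumption to adjust to your own texts, and `find_strange` is a hypothetical helper.

```python
import re

# Hypothetical whitelist: tab, CR, LF, printable ASCII (U+0020..U+007E)
# and the 18 Polish diacritic letters. Everything else counts as "strange".
STRANGE = re.compile(r"[^\t\r\n\x20-\x7EąćęłńóśźżĄĆĘŁŃÓŚŹŻ]")

def find_strange(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character outside the whitelist."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in STRANGE.finditer(text)]

sample = "Co sprawia, że \u200b\u200bzapominasz o wszystkim?"
print(find_strange(sample))  # [(15, 'U+200B'), (16, 'U+200B')]
```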
#7
18.03.2019, 15:15
djmakinera

I used this regex, but it did not give the expected result. Zero help.

I do not know what this regular expression does and I do not want to know, because it does not work, and at the moment I was trying to make a mistake, and life is one.
#8
18.03.2019, 15:49
Jiaz

You forgot to show your regex?!
#9
18.03.2019, 15:49
Jiaz

Quote:
Originally Posted by djmakinera View Post
and at the moment I was trying to make a mistake, and life is one.
???
#10
19.03.2019, 04:39
djmakinera

But it does not find those specific characters:

[^\x{0000}\x{0001}\x{0002}\x{0003}\x{0004}\x{0005}\x{0006}\x{0007}\x{0008}\x{0009}\x{000a}\x{000b}\x{000c}\x{000d}\x{000e}\x{000f}\x{0010}\x{0011}\x{0012}\x{0013}\x{0014}\x{0015}\x{0016}\x{0017}\x{0018}\x{0019}\x{001a}\x{001b}\x{001c}\x{001d}\x{001e}\x{001f}\x{0020}\x{0021}\x{0022}\x{0023}\x{0024}\x{0025}\x{0026}\x{0027}\x{0028}\x{0029}\x{002A}\x{002B}\x{002C}\x{002D}\x{002E}\x{002F}\x{030}\x{0031}\x{0032}\x{0033}\x{0034}\x{0035}\x{0036}\x{0037}\x{0038}\x{0039}\x{003A}\x{003B}\x{003C}\x{003D}\x{003E}\x{003F}\x{040}\x{0041}\x{0042}\x{0043}\x{0044}\x{0045}\x{0046}\x{0047}\x{0048}\x{0049}\x{004A}\x{004B}\x{004C}\x{004D}\x{004E}\x{004F}\x{050}\x{0051}\x{0052}\x{0053}\x{0054}\x{0055}\x{0056}\x{0057}\x{0058}\x{0059}\x{005A}\x{005B}\x{005C}\x{005D}\x{005E}\x{005F}\x{060}\x{0061}\x{0062}\x{0063}\x{0064}\x{0065}\x{0066}\x{0067}\x{0068}\x{0069}\x{006A}\x{006B}\x{006C}\x{006D}\x{006E}\x{006F}\x{070}\x{0071}\x{0072}\x{0073}\x{0074}\x{0075}\x{0076}\x{0077}\x{0078}\x{0079}\x{007A}\x{007B}\x{007C}\x{007D}\x{007E}\x{007F}\x{0AC}\x{201A}\x{201E}\x{2026}\x{2020}\x{2021}\x{2030}\x{0160}\x{2039}\x{015A}\x{0164}\x{017D}\x{0179}\x{018}\x{2019}\x{201C}\x{201D}\x{2022}\x{2013}\x{2014}\x{2122}\x{0161}\x{203A}\x{015B}\x{0165}\x{017E}\x{017A}\x{0A0}\x{02C7}\x{02D8}\x{0141}\x{00A4}\x{0104}\x{00A6}\x{00A7}\x{00A8}\x{00A9}\x{015E}\x{00AB}\x{00AC}\x{00AD}\x{00AE}\x{017B}\x{0B0}\x{00B1}\x{02DB}\x{0142}\x{00B4}\x{00B5}\x{00B6}\x{00B7}\x{00B8}\x{0105}\x{015F}\x{00BB}\x{013D}\x{02DD}\x{013E}\x{017C}\x{154}\x{00C1}\x{00C2}\x{0102}\x{00C4}\x{0139}\x{0106}\x{00C7}\x{010C}\x{00C9}\x{0118}\x{00CB}\x{011A}\x{00CD}\x{00CE}\x{010E}\x{110}\x{0143}\x{0147}\x{00D3}\x{00D4}\x{0150}\x{00D6}\x{00D7}\x{0158}\x{016E}\x{00DA}\x{0170}\x{00DC}\x{00DD}\x{0162}\x{00DF}\x{155}\x{00E1}\x{00E2}\x{0103}\x{00E4}\x{013A}\x{0107}\x{00E7}\x{010D}\x{00E9}\x{0119}\x{00EB}\x{011B}\x{00ED}\x{00EE}\x{010F}\x{111}\x{0144}\x{0148}\x{00F3}\x{00F4}\x{0151}\x{00F6}\x{00F7}\x{0159}\x{016F}\x{00FA}\x{0171}\x{00FC}\x{00FD}\x{0163}\x{02D9}]
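Instead of typing the Windows-1250 repertoire by hand, a Python sketch can ask the codec itself whether each character is representable. This is a swapped-in technique, not the regex above. Incidentally, in that hand-typed class `\x{0AC}` and `\x{018}` sit where Windows-1250 bytes 0x80 (€, U+20AC) and 0x91 (‘, U+2018) belong, so they appear to be typos; generating the list avoids that kind of slip. `not_in_cp1250` and `cp1250_class` are hypothetical helpers, and Python's own `re` does not accept the `\x{....}` notation, so the generated class is only meant for tools that do.

```python
def not_in_cp1250(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each character Windows-1250 cannot encode."""
    found = []
    for i, ch in enumerate(text):
        try:
            ch.encode("cp1250")
        except UnicodeEncodeError:
            found.append((i, f"U+{ord(ch):04X}"))
    return found

def cp1250_class() -> str:
    """Build a negated class in \\x{....} notation from every byte cp1250 defines."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode("cp1250"))
        except UnicodeDecodeError:
            pass  # a few bytes (e.g. 0x81) are undefined in Python's cp1250 table
    return "[^" + "".join(f"\\x{{{ord(c):04X}}}" for c in chars) + "]"

print(not_in_cp1250("że \u200bzapominasz"))  # [(3, 'U+200B')]
print(cp1250_class())
```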
#11
19.03.2019, 11:20
Jiaz

The pattern looks good in regex101.
#12
19.03.2019, 11:46
djmakinera

I have no idea how to find a f***king character that requires encoding in another system. Same as in any other forum.
#13
19.03.2019, 13:29
raztoki
English Supporter
Join Date: Apr 2010
Location: Australia
Posts: 17,611

Really don't see how this is our problem. Thread closed.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
#14
19.03.2019, 18:38
djmakinera
Unicode chars

I know it's easy to lock the topic, and I know it's not a simple topic. If you know the solution, why didn't you post it on the forum?

https://board.jdownloader.org/showthread.php?t=80063
#15
19.03.2019, 20:56
Jiaz

Quote:
I have no idea how to find a f***king character that requires encoding in another system.
You cannot simply find that character. Wrongly decoded characters have different *error codes* in different encodings.


Quote:
The name is in Polish, so the 1250 (Central European) encoding is correct.
UTF-8 is not required for the Polish language.
But how do I detect a "strange character"?
When you open the text file with the 1250 encoding, you have to check which replacement character your text editor/encoding uses for characters that are unsupported or defective.

You should always pay attention when mixing different encodings!
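To illustrate the *error codes* point with Python's codecs (a given text editor may behave differently): when a character cannot be encoded in Windows-1250, Python's `"replace"` error handler writes `?`, while bytes that are invalid as UTF-8 decode to U+FFFD. Note that a `?` produced this way is indistinguishable from a real question mark afterwards.

```python
zwsp_text = "że\u200bzapominasz"

# Saving as Windows-1250: U+200B has no slot there, so "replace" writes a "?" byte.
saved_1250 = zwsp_text.encode("cp1250", errors="replace")
print(saved_1250)  # b'\xbfe?zapominasz'

# Reading those Windows-1250 bytes as UTF-8: a lone 0xBF is not valid UTF-8,
# so "replace" turns it into U+FFFD (the Unicode replacement character).
read_as_utf8 = saved_1250.decode("utf-8", errors="replace")
print(ascii(read_as_utf8))  # '\ufffde?zapominasz'
```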
#16
19.03.2019, 21:09
djmakinera

I still do not know how to find this char.
Can you tell me how to do it in EmEditor, Notepad++ or another text editor?
In total, I have many texts that look like normal Polish text and yet contain some Unicode character I do not know about.
I've already tried a few regexes, but I still cannot detect it.
PS: I already posted the name on the forum; what is strange about this name?
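A scripted alternative rather than EmEditor or Notepad++ instructions: this Python sketch lists every non-ASCII or invisible character together with its code point, Unicode general category and name, using the standard `unicodedata` module. `describe_unusual` is a hypothetical helper name.

```python
import unicodedata

def describe_unusual(text: str) -> list[str]:
    """Describe non-ASCII and invisible characters by code point, category and name."""
    report = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        # Report non-ASCII chars, plus control (Cc) / format (Cf) chars
        # other than ordinary tab, CR and LF.
        if ord(ch) > 0x7E or (cat in ("Cc", "Cf") and ch not in "\t\r\n"):
            name = unicodedata.name(ch, "<unnamed>")
            report.append(f"{i}: U+{ord(ch):04X} {cat} {name}")
    return report

for line in describe_unusual("że \u200bzapominasz"):
    print(line)
# 0: U+017C Ll LATIN SMALL LETTER Z WITH DOT ABOVE
# 3: U+200B Cf ZERO WIDTH SPACE
```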
#17
20.03.2019, 10:26
Jiaz

Please send me an example text file to support@jdownloader.org and tell me what character you're looking for.
#18
20.03.2019, 10:26
Jiaz

Quote:
Originally Posted by djmakinera View Post
ex:

Co sprawia, ¿e ​​zapominasz o wszystkim?
There is nothing strange here, at least to me?!
Do you mean ¿?
#19
21.03.2019, 14:40
djmakinera
Further help needed (cannot copy everything correctly)

Quote:
Please send me an example text file to support@jdownloader.org and tell me what character you're looking for.
I cannot copy and paste the text correctly.
If I paste the text, only the characters supported by the post or the forum are shown; the others are not.
I tried to send the text via e-mail, but you will not be able to detect the character.
I cannot copy everything correctly.
#20
21.03.2019, 15:10
Jiaz

You can send the text file, for example: instead of pasting it as text, just attach the file to the mail.
#21
21.03.2019, 15:24
djmakinera

The file is on the forum:
https://board.jdownloader.org/showpo...44&postcount=1
#22
21.03.2019, 16:12
Jiaz

And what characters do you mean? I can't see/find any strange/broken char.
#23
21.03.2019, 16:54
djmakinera

Quote:
Originally Posted by Jiaz View Post
And what characters do you mean? I can't see/find any strange/broken char.
A blank char / invisible char

OR

a hidden char.

Save it as UTF-8 and save it as 1250, then compare the two text files.
They are different.
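The comparison described above can also be scripted. This Python sketch assumes the two saved copies are available as bytes and that the lossy Windows-1250 save wrote `?` for characters it could not store (what a real editor writes may differ); `difflib.SequenceMatcher` also copes with an editor that silently drops such characters instead. `compare_saves` is a hypothetical helper.

```python
import difflib

def compare_saves(utf8_bytes: bytes, cp1250_bytes: bytes) -> list[tuple[str, str, str]]:
    """Decode both saved copies and list the spans where they differ."""
    a = utf8_bytes.decode("utf-8")
    b = cp1250_bytes.decode("cp1250")
    diffs = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Show the UTF-8 side as code points, since the lost chars are invisible.
            original = " ".join(f"U+{ord(c):04X}" for c in a[i1:i2])
            diffs.append((tag, original, b[j1:j2]))
    return diffs

text = "że \u200b\u200bzapominasz"
utf8_copy = text.encode("utf-8")
# Assumed behaviour of a lossy "save as 1250": unencodable chars become "?".
cp1250_copy = text.encode("cp1250", errors="replace")
print(compare_saves(utf8_copy, cp1250_copy))  # [('replace', 'U+200B U+200B', '??')]
```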
#24
21.03.2019, 18:23
djmakinera

It is very interesting that this feature cannot recognize the encoding. A really unique Unicode character.
EmEditor

View -> Character Code Value... (Ctrl+I)
Screenshot:
https://i.postimg.cc/tTBvSBCx/Screen...t-05-16-PM.jpg

Quote:
EmEditor:
This feature allows you to work in almost any language on earth. You can open a file with any encoding supported in the Windows system, and easily convert from one encoding to another within EmEditor.

EmEditor supports Unicode natively since the whole program is built as a Unicode application. Therefore, you can easily open Unicode file names and search for Unicode characters.
#25
22.03.2019, 10:08
Jiaz

@djmakinera: that's the same as opening the text in a hex editor.
I will check and respond to your mail as soon as I have some free time.
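A rough Python stand-in for the hex-editor view: each character next to the bytes it occupies. In a UTF-8 file, U+200B takes the three bytes E2 80 8B, so it is plainly visible in hex even though it renders as nothing. `hex_view` is a hypothetical helper; `bytes.hex` with a separator argument needs Python 3.8 or newer.

```python
def hex_view(text: str, encoding: str = "utf-8") -> str:
    """Show each character next to its encoded bytes, like a hex editor would."""
    rows = []
    for ch in text:
        data = ch.encode(encoding)
        # Invisible characters are labelled by code point instead of being printed.
        label = ch if ch.isprintable() else f"U+{ord(ch):04X}"
        rows.append(f"{label:>6}  {data.hex(' ').upper()}")
    return "\n".join(rows)

print(hex_view("że\u200bz"))
#      ż  C5 BC
#      e  65
# U+200B  E2 80 8B
#      z  7A
```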
#26
22.03.2019, 12:05
djmakinera

https://board.jdownloader.org/showpo...44&postcount=1
I analyzed the character code values in the text (HTML, script):
\x{200B}
;​​;​​
Cf ZERO WIDTH SPACE


It works in the JD2 search engine, but not in the editors. I do not know how compatible it is with Perl regex and PCRE.
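For scripting, Python's `re` writes this code point as `\u200B` (its `\x` escape takes exactly two hex digits, so `\x{200B}` is not valid there). A minimal sketch that removes it; adding U+200C, U+200D, U+2060 and U+FEFF to the class is an assumption that other zero-width characters are unwanted too, and `strip_zero_width` is a hypothetical helper.

```python
import re

# \u200B is Python's spelling of the code point written \x{200B} above.
# The other four code points are an assumption: common zero-width characters.
ZERO_WIDTH = re.compile(r"[\u200B\u200C\u200D\u2060\uFEFF]")

def strip_zero_width(text: str) -> tuple[str, int]:
    """Remove zero-width characters and report how many were removed."""
    return ZERO_WIDTH.subn("", text)

cleaned, count = strip_zero_width("Co sprawia, że \u200b\u200bzapominasz o wszystkim?")
print(cleaned, count)  # Co sprawia, że zapominasz o wszystkim? 2
```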
All times are GMT +2.
Provided By AppWork GmbH