JDownloader Community - Appwork GmbH
 

  #1  
03.04.2019, 22:52
Brian
Guest
 
Posts: n/a
Selecting the encoding used for unarchiving

Greetings.

I have been using JD2 for a while, but this week I had to reinstall it because of a machine migration, and I noticed the Archive Extractor options. They all work correctly, but some archive formats, especially ZIP, do not record the encoding of the packed file names, so the extractor simply uses the OS default, which can vary a lot. To avoid files being unpacked with garbled names, we need some way to set at least a default encoding for such formats (ZIP and RAR are the problem; 7z is immune because it uses Unicode). Otherwise, the whole auto-unarchiving feature becomes useless.

Is there any way to set this via a command option? I could not find anything like it in the preferences panel. For now I have gone back to extracting manually with Keka, my default unarchiver, which handles this well. Fixing bad names takes more time than unarchiving manually...

Thanks in advance.
Brian

Last edited by Brian; 04.04.2019 at 04:19.
  #2  
04.04.2019, 11:02
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

The extraction library we currently use doesn't support Unicode-encoded filenames in ZIP archives. With the updated/newer library, ZIP filenames are extracted correctly again. Could you provide example archive links for testing?


__________________
JD-Dev & Server-Admin
  #3  
04.04.2019, 14:24
Brian
Guest
 
Posts: n/a

My problems are with CD titles, such as the ones on
**External links are only visible to Support Staff** or
**External links are only visible to Support Staff**
and several others on blogs like
**External links are only visible to Support Staff**
It is hard to tell in advance whether a ZIP file will contain accented (or Cyrillic, or ideographic) characters. This one CD with songs by Martín y Soler certainly has an acute-accented i in the title and file names, and it is available at three URLs, not all reachable with JD extensions:
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
The last two change the í when I auto-unpack them in a UTF-8 environment like macOS or Linux.
  #4  
04.04.2019, 14:29
Brian
Guest
 
Posts: n/a

Anyway, since it is not possible to guess with 100% accuracy which encoding was used when a ZIP was packed, there should be some way to set a fallback encoding for old formats such as ZIP (the worst and most common) that, unlike 7z (the best and rarely used), do not support Unicode natively.

For example, when Keka finds a ZIP archive with non-ASCII characters, it tries to guess the encoding and shows an encoding selection dialog for confirmation, unless you set it to always use a predefined encoding such as ISO Latin 1. In JD, interaction is unwanted, so a default setting would suffice.
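As a rough illustration of such a fallback setting (not how JDownloader or Keka actually work), the Python sketch below reads ZIP entry names with the standard `zipfile` module and, for entries whose UTF-8 flag (general purpose bit 11) is not set, re-decodes the raw name bytes with a configurable codepage. The helper names `decode_entry_name` and `list_names` and the `fallback` parameter are hypothetical. The trick relies on `zipfile` decoding unflagged names as CP437 by default, so encoding back to CP437 recovers the original bytes.

```python
import zipfile

# General purpose bit 11 ("language encoding flag"): set by the creator
# when the file name is stored as UTF-8.
UTF8_FLAG = 0x800


def decode_entry_name(info: zipfile.ZipInfo, fallback: str = "cp437") -> str:
    """Return an entry name, decoding with `fallback` when no UTF-8 flag is set.

    Hypothetical helper: `fallback` stands in for a user-configured default
    encoding such as "cp850" or "iso-8859-1".
    """
    if info.flag_bits & UTF8_FLAG:
        return info.orig_filename  # zipfile already decoded it as UTF-8
    # zipfile decodes unflagged names as CP437 (no metadata_encoding given);
    # CP437 maps all 256 byte values, so this round-trips to the raw bytes.
    raw = info.orig_filename.encode("cp437")
    return raw.decode(fallback, errors="replace")


def list_names(path_or_file, fallback: str = "cp437") -> list:
    """List all entry names of a ZIP using the given fallback codepage."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [decode_entry_name(info, fallback) for info in zf.infolist()]
```

If an unflagged entry stores the byte 0xA1 for "í" (CP437), `list_names(path, "cp437")` yields `Martín.mp3`, while `fallback="iso-8859-1"` yields `Mart¡n.mp3`. A fixed default therefore only helps when it matches the codepage the archive's creator used.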

Last edited by Brian; 04.04.2019 at 14:33. Reason: providing example
  #5  
04.04.2019, 14:31
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

RAR does support Unicode.
ZIP supports it only via a special UTF-8 header extension, but not many tools use it.
A ZIP stores the *raw* filename, which relies on both systems (ZIP creator and ZIP extractor) using the same encoding.
That's why an optional header was introduced to additionally store the filename in UTF-8, but none of the archives contained such a header. This header is supported by the newer version of the extraction library.
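Jiaz doesn't say which header the newer library reads. The ZIP specification (PKWARE APPNOTE) defines two UTF-8 mechanisms: general purpose bit 11, and the Info-ZIP Unicode Path Extra Field (header ID 0x7075), an optional extra record that stores a UTF-8 copy of the name next to the raw one. Assuming the latter is what is meant, here is a minimal Python sketch of parsing it. It walks the extra-field records and trusts the UTF-8 name only if its stored CRC-32 matches the raw filename bytes, so a stale field left behind after a rename is ignored. `unicode_path_from_extra` is a hypothetical helper, not part of any library.

```python
import struct
import zlib
from typing import Optional

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra: bytes, raw_name: bytes) -> Optional[str]:
    """Return the UTF-8 name from a valid 0x7075 extra field, else None."""
    pos = 0
    # Each record: 2-byte header ID, 2-byte data size (both little-endian).
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        # Data: 1-byte version (1), 4-byte CRC-32 of the raw name, UTF-8 name.
        version, name_crc = struct.unpack_from("<BI", data, 0)
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8", errors="replace")
    return None
```

With `zipfile`, the inputs would be an entry's `ZipInfo.extra` and its raw name bytes. If no valid field is present, the extractor is back to guessing from the raw bytes.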
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 04.04.2019 at 16:17.
  #6  
04.04.2019, 15:47
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

The first example works fine with the newer extraction library, see https://board.jdownloader.org/showthread.php?t=71069
The second example fails for me because of a different system/encoding.

Neither ZIP contains a UTF-8 filename header.

Output from the 7z command-line tool:
1.) Montréal - Théâtre
2.) Mart¡n
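To illustrate how a single stored byte can turn into `Mart¡n`: byte 0xA1 is `í` in the DOS codepages CP437 and CP850 but `¡` in ISO-8859-1 and Windows-1252. This is only one possible explanation. The thread doesn't show which byte the archive actually stores or which codepage 7z applied, so the byte below is an assumption and `decode_all` is a hypothetical helper.

```python
CODECS = ("cp437", "cp850", "iso-8859-1", "cp1252")


def decode_all(raw: bytes) -> dict:
    """Decode the same raw bytes with each codepage in CODECS."""
    return {codec: raw.decode(codec) for codec in CODECS}


if __name__ == "__main__":
    # Assumed stored name: 0xA1 would be "í" if written with a DOS codepage.
    for codec, text in decode_all(b"Mart\xa1n").items():
        print(f"{codec:>10}: {text}")
```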
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 04.04.2019 at 16:18.
  #7  
04.04.2019, 16:20
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

Quote:
Originally Posted by Brian View Post
For example, when Keka finds a ZIP archive with non-ASCII characters, it tries to guess the encoding and shows an encoding selection dialog for confirmation, unless you set it to always use a predefined encoding such as ISO Latin 1. In JD, interaction is unwanted, so a default setting would suffice.
That's not so easy, because we don't get our hands on the 'raw' bytes/info, only on information already processed by the extraction module (for example, the ZIP module).
__________________
JD-Dev & Server-Admin
  #8  
04.04.2019, 16:26
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

At the moment, only *wild* guessing, as other tools do, would be possible, and it is *hard* to detect whether such a situation exists in the first place.
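A common guessing heuristic (an assumption here, not something JDownloader does) is to accept the raw name as UTF-8 if it decodes strictly and otherwise fall back to a single-byte codepage. The Python sketch below shows why this is *wild*: single-byte codepages such as CP437 accept every byte sequence, so the fallback never fails and never signals that it picked the wrong codepage. `guess_name` and its `candidates` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def guess_name(raw: bytes,
               candidates: Sequence[str] = ("utf-8", "cp437")) -> Tuple[str, str]:
    """Return (name, codec) for the first candidate that decodes strictly."""
    for codec in candidates:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue  # not valid in this codec, try the next one
    # Reached only if every candidate failed; Latin-1 decodes any byte.
    return raw.decode("latin-1"), "latin-1"
```

The UTF-8 check can also be fooled. The Latin-1 name `Ã©` is stored as bytes 0xC3 0xA9, which are valid UTF-8 for `é`, so even the first branch may guess wrong.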
__________________
JD-Dev & Server-Admin
  #9  
04.04.2019, 17:56
Brian
Guest
 
Posts: n/a

OK, thanks for clarifying! Since we cannot control how others create their archives, I believe this is as good as it gets.
Thanks again.
  #10  
04.04.2019, 18:25
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,044

Sorry that I couldn't provide a solution to this :(
__________________
JD-Dev & Server-Admin

All times are GMT +2.