#1
4chan file names
When media files are posted in 4chan.org threads, the file names get changed to a random series of numbers. But the original title is what's shown as the URL text linking to the file.
Would it be at all possible to have LinkGrabber automatically use the URL text as the file name...? The URL text also includes the file extension, so the switch would be seamless. I have a Chrome extension called '4chan X' that does this via a little download icon next to the URL text. Unfortunately, though, there's no option to scrape the whole thread, so you have to go one by one and that gets pretty tiresome. Would be awesome to just use JDownloader instead!
#2
Hi,
please post example URLs. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
#3
SFW examples
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
NSFW example
**External links are only visible to Support Staff**
#4
Hm, it seems I don't understand the issue you are having.
If I take this as an example: bla.4chan.org/wsg/1601234567416.webm --> the filename would be 1601234567416.webm. Where is the problem, and where are other filenames supposed to come from? -psp-
EDIT: Do you mean the "file:blabla.mp4" names in their HTML code? We could try to crawl and set those filenames, but at the moment the server-side filenames are used, which cannot really be considered a bug.
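To illustrate the two naming schemes being discussed: 4chan's read-only JSON API describes each attachment with a server-side timestamp (`tim`), the uploader's original name (`filename`) and the extension (`ext`, including the dot). A minimal Python sketch, assuming those field names and a hand-written sample post; the helper names and sample values are hypothetical, and this is not the plugin's code:

```python
def server_filename(post: dict) -> str:
    """Name as stored on the file server, e.g. '1601234567416.webm'."""
    return f"{post['tim']}{post['ext']}"


def original_filename(post: dict) -> str:
    """Name the uploader used, as shown in the link text of the thread."""
    return f"{post['filename']}{post['ext']}"


# Hand-written sample resembling one post object of a thread's JSON.
sample_post = {"tim": 1601234567416, "filename": "blabla", "ext": ".webm"}

print(server_filename(sample_post))    # 1601234567416.webm
print(original_filename(sample_post))  # blabla.webm
```

Both names share the extension, which is why switching between them does not change the file type.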
__________________
JD Supporter, Plugin Dev. & Community Manager
#5
@pspzockerscene: yes, that is what he means
__________________
JD-Dev & Server-Admin
#6
Done.
Are you waiting for an announced bug fix or a new feature? Updates are not always released immediately! Please read our Update FAQ! -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#7
Please make this feature optional.
#8
I will create a Packagizer rule for you tomorrow to get the old names back.
If more users want the same, I may add a plugin setting for that. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#9
I've updated the ticket; we will add a plugin setting for this.
__________________
JD-Dev & Server-Admin
#10
Quote:
Great!
#11
@Derp: that rule won't work because orgfilename is the one set by the plugin. pspzocker was talking about a rule that uses the old names, but I (and you) prefer to have a plugin setting for this. He will work on this.
__________________
JD-Dev & Server-Admin
#12
@Derp
It won't work this way - you'd have to get the names out of the direct URLs crawled by the crawler. Anyway, I've added a setting for the next update. See Settings -> Plugins -> boards.4chan.org -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#13
I've just prepared a huge update for our 4chan plugin.
This includes:
- Faster crawling
- Can now crawl up to 20 pages of a category (see plugin settings - default = limited to 1)
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#14
Build Thu Nov 19 18:44:33 CET 2020
The plugin is completely broken now. Regardless of the server-filename setting, JD2 adds every file as a .webm and checks it as such, which of course means that none of the files that aren't webm files are found.
Also, the default package name just uses the 4chan board acronym and not the board name, i.e. "4chan org - sp - 103946012" instead of "4chan org - Sports - 103946012".
**External links are only visible to Support Staff** **External links are only visible to Support Staff** <-- this thread happens to have at least one file of every type (jpg, png, gif, webm) and two webms for the auto-packaging.
------------------------------
Quote:
Quote:
In the "Dynamic Variables" there's no entry for that, but "packagename" is listed twice.
Maybe I also found a bug: when sourceurl is checked and the user checks "rename", the checkbox is cleared and disabled. After unchecking "rename", the sourceurl checkbox is checked again, but not re-enabled.
Last edited by Derp; 19.11.2020 at 19:55. Reason: Added build-info
#15
Quote:
Quote:
I've fixed this issue. I made a simple but silly mistake with the wrong file extension, and I didn't know that the long/full board title was so important. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#16
Quote:
Re: "Packagename" being available twice: It's because of line 284 or 301 in PackagizerFilterRuleDialog.java
Quote:
Thank you very much.
EDIT2: Just had an interesting case: two different files in a thread used the same upload filename, and JD2 treats them as mirrors of each other. Because of that, only one of them is downloaded. With server filenames, both files are downloaded.
**External links are only visible to Support Staff** The filename is "file.png".
EDIT3: Considering how slow the image server already is, a fast linkcheck option would make sense, too.
Last edited by Derp; 20.11.2020 at 00:47.
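One way to keep two different files with the same upload name from colliding is to make the names unique before they are compared. A sketch in Python, assuming a hypothetical helper `disambiguate` and a suffix scheme chosen purely for illustration; JDownloader's actual mirror handling is not shown in this thread:

```python
import os
from collections import Counter


def disambiguate(names: list[str]) -> list[str]:
    """Append ' (2)', ' (3)', ... to repeated names, keeping the extension."""
    seen = Counter()
    result = []
    for name in names:
        seen[name] += 1
        if seen[name] == 1:
            result.append(name)
        else:
            stem, ext = os.path.splitext(name)
            result.append(f"{stem} ({seen[name]}){ext}")
    return result


print(disambiguate(["file.png", "cat.jpg", "file.png"]))
```

The sketch does not guard against a thread that already contains a name such as "file (2).png"; a real implementation would also need to check the generated names.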
#17
@Derp: Thanks for the info. Those two "Packagename" entries are different ones; I will change the description.
__________________
JD-Dev & Server-Admin
#18
Quote:
You can still see the filesizes right away because this information is available right away for 4chan content. Where/how do you think a linkcheck process is active and slow? I don't see how we could speed anything up here. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#19
@pspzockerscene: maybe he's talking about the request limit
__________________
JD-Dev & Server-Admin
#20
Yeah but we should respect that.
Afaik they do not enforce that rate limit. We could add a setting to prefer using the website, without a rate limit, but that makes no sense as we'd then have to maintain the website code too. The rate limit shouldn't be too restrictive the way it is atm. -psp-
EDIT: Okay, by request-limiting only that API subdomain it will be "faster".
__________________
JD Supporter, Plugin Dev. & Community Manager
Last edited by pspzockerscene; 20.11.2020 at 15:36.
#21
Wait for the next update; the request limit will then be *limited* to the API only, and mirror handling will also work correctly (you have to re-add the links for this to work).
__________________
JD-Dev & Server-Admin
#22
Limiting the request interval to the API only will be available with the next core update next week. I'm not finished yet.
__________________
JD-Dev & Server-Admin
#23
Quote:
Quote:
Considering the JSON already contains the filesizes and JD2 doesn't need to hit the image server to get that info, it shouldn't be a problem at all.
-----
Another question: Is JD2 somehow limiting the concurrent downloads from Cloudflare-protected links or something? When I start to download a package with ~30 images (~11 MiB total) from 4chan with 5 concurrent downloads, it takes about 3 minutes to finish, with most of them being stuck at "Starting..." for 10-15 seconds. Or are you seeing the same behavior and it's being limited server-side? I think it changed to this about a week ago.
EDIT: Another try yesterday: 140 files, 75 MiB, 3 concurrent downloads, took 8 minutes.
Last edited by Derp; 23.11.2020 at 08:29.
#24
@Derp: It has nothing to do with Cloudflare. At the moment the request limit also affects the downloads, because the current limiter doesn't support different subdomains. Once the next core update is live, the request limit will only limit API requests.
I will release the core update later today, see https://board.jdownloader.org/showpo...1&postcount=22
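The change described here, throttling only API requests while leaving file downloads alone, amounts to keying the limiter by host name. A minimal Python sketch; the class name, the one-second interval and the host names `a.4cdn.org` (API) and `i.4cdn.org` (files) are assumptions for illustration, not JDownloader's limiter:

```python
from urllib.parse import urlsplit


class PerHostLimiter:
    """Enforce a minimum interval between requests, for listed hosts only."""

    def __init__(self, intervals: dict[str, float]):
        self.intervals = intervals   # host -> minimum seconds between requests
        self.next_allowed = {}       # host -> earliest time of the next request

    def wait_time(self, url: str, now: float) -> float:
        """Return how long to wait before requesting url, and book the slot."""
        host = urlsplit(url).hostname or ""
        interval = self.intervals.get(host)
        if interval is None:
            return 0.0               # host is not rate-limited
        start = max(now, self.next_allowed.get(host, now))
        self.next_allowed[host] = start + interval
        return start - now


limiter = PerHostLimiter({"a.4cdn.org": 1.0})
print(limiter.wait_time("https://a.4cdn.org/wsg/thread/1.json", now=0.0))
print(limiter.wait_time("https://a.4cdn.org/wsg/thread/2.json", now=0.2))
print(limiter.wait_time("https://i.4cdn.org/wsg/1601234567416.webm", now=0.2))
```

Passing the current time in as `now` keeps the sketch deterministic; a real limiter would read a monotonic clock and sleep for the returned duration.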
__________________
JD-Dev & Server-Admin
#25
* this update has not been released yet.
Please be patient. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
#26
Quote:
#27
@Derp: Yes, change is live and working fine here
__________________
JD-Dev & Server-Admin
#28
Too bad, download is just as slow as it was before.
But now the downloads aren't stuck at "Starting..." for a while, but at "Download". Many files have to be retried due to timeouts, like this: **External links are only visible to Support Staff**
Did another test with a set of 50 URLs and 1 simultaneous download per host: JD2 took 160 seconds, the wget batch file 47. Raising the number of simultaneous downloads helps a bit: 30s when set to 5 downloads. I guess that's as good as it can get?
------
Found another bug with regard to filenames: **External links are only visible to Support Staff**
Post #104266878 has a file with the uploaded filename ".jpg", just the extension, no name part. JD2 doesn't add this file to the Linkgrabber list when copying the 4chan thread URL, regardless of the state of the filename setting.
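The ".jpg" case shows that an uploaded name can have an empty name part. A defensive sketch in Python, assuming the `tim`, `filename` and `ext` fields of 4chan's JSON API and a hypothetical helper `choose_filename` that falls back to the server-side name; whether the plugin handles it this way is not stated in the thread:

```python
def choose_filename(post: dict, prefer_original: bool = True) -> str:
    """Use the original upload name, or the server name if that name is blank."""
    server_name = f"{post['tim']}{post['ext']}"
    stem = str(post.get("filename", "")).strip()
    if prefer_original and stem:
        return f"{stem}{post['ext']}"
    return server_name


# Sample values are made up for illustration.
print(choose_filename({"tim": 1606800000000, "filename": "", "ext": ".jpg"}))
print(choose_filename({"tim": 1606800000000, "filename": "cat", "ext": ".jpg"}))
```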
#29
Your log shows:
1.) very slow server response times
Request-Time: 1758ms
2.) read timeouts
Caused by: java.net.SocketTimeoutException: Read timed out
I can reproduce both, and it looks like server-side throttling. The same happens for me in the browser too: a fast start, and then it takes ages for the last percent or ends in a read timeout as well.
__________________
JD-Dev & Server-Admin
#30
Quote:
__________________
JD-Dev & Server-Admin
#31
Quote:
-----
Another minor issue: Since the change, JD2 is adding the MD5 hash to every image file, except WEBM files. You can use the URL from post #28 for an example.
Last edited by Derp; 01.12.2020 at 14:43.
#32
@Derp:
The MD5 hash is available for every link for me. The plugin doesn't process/add items without an MD5. The example link from #28 worked fine for me. Is it missing in the linkcrawler or after download?
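For reference, a hash like this can be checked against a downloaded file. 4chan's JSON API publishes the MD5 as a base64 string rather than hex, so a comparison has to encode the digest the same way. A Python sketch with a hypothetical helper `matches_api_md5` and made-up sample bytes:

```python
import base64
import hashlib


def matches_api_md5(data: bytes, api_md5: str) -> bool:
    """Compare file content against a base64-encoded MD5 digest."""
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest).decode("ascii") == api_md5


payload = b"example file content"
expected = base64.b64encode(hashlib.md5(payload).digest()).decode("ascii")

print(matches_api_md5(payload, expected))         # True
print(matches_api_md5(payload + b"x", expected))  # False
```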
__________________
JD-Dev & Server-Admin
Last edited by Jiaz; 01.12.2020 at 15:10.
#33
Quote:
In a new JD2 install, it works for me, too. But in my existing install, the WEBMs don't get the hash added, see pic attached. Both are updated to Build Thu Dec 03 19:21:05 CET 2020.
Any ideas?
#34
Any Eventscripter Scripts active? I cannot reproduce the issue.
Can you try to provide a log? Is this screenshot from download or linkcrawler?
__________________
JD-Dev & Server-Admin
#35
No.
Quote:
Found the culprit: a "learned file extension" Linkcrawler rule from over five years ago.
Code:
{
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1434728570583,
  "name" : "Learned file extension:webm",
  "pattern" : "(?i).*\\.webm($|\\?.*$)",
  "rule" : "DIRECTHTTP",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : null,
  "rewriteReplaceWith" : null
}
No idea why the rule is even in there, because webm was added to the default supported extensions nine years ago. This shouldn't be happening even with that rule in place, right?
Last edited by Derp; 04.12.2020 at 11:18.
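To see which links the rule's `pattern` would claim, the expression can be tried against a few URLs. A Python sketch; the URLs are made up, and `re.fullmatch` is used on the assumption that the rule has to match the whole URL:

```python
import re

# Pattern from the rule above, with the JSON string escaping removed.
pattern = re.compile(r"(?i).*\.webm($|\?.*$)")

urls = [
    "https://example.org/wsg/1601234567416.webm",
    "https://example.org/wsg/CLIP.WEBM?download=1",
    "https://example.org/wsg/1601234567416.png",
]
for url in urls:
    print(url, bool(pattern.fullmatch(url)))
```

Only the two webm URLs match, which fits the symptom that just the WEBM links were affected while the other image types kept their hash.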
#36
Thanks for taking a deeper look.
Normally this is expected behaviour because the rule has a higher *order*, but for this type of rule I've updated JD to forward the meta information correctly. Wait for the next core update.
__________________
JD-Dev & Server-Admin
Last edited by Jiaz; 04.12.2020 at 12:24.