JDownloader Community - Appwork GmbH
 

  #1  
Old 03.08.2024, 21:43
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default Archive.org: replace part of download URLs

Hi,

I'm looking for a way to dynamically replace part of download URLs, or a way to "when URL contains {this}, try these faster hosts first".

I found this thread asking the same question (for a somewhat different reason), but although it's closed as Solved, I couldn't figure out what the solution was. I'm guessing the fix was for rapidu.net only and the thread was closed for lack of feedback from the OP.



My specific case: I am downloading a bunch of files from archive.org and would like to force a different mirror than the one I'm automatically assigned, without needing to subscribe to a VPN service with FDM.


When jdownloader scrapes URLs, links are added as:
**External links are only visible to Support Staff**
When download starts, archive.org assigns a mirror, usually based on geolocation (you can see the host/ip by mouseovering the chunk icon during the download).

I'm in Canada so files are usually downloaded from a Canadian mirror like:
**External links are only visible to Support Staff**
These Canadian servers are usually very slow (around 20-75 KiB/s average).

Most of the time I can download the same files much faster (around 5-25 MiB/s average) by tricking the URLs to point to a US mirror, like:
**External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff**
The US direct download links are not blocked for my country so no need to use a VPN with FDM.
It sometimes takes a bit of trial and error, but it usually works on the first try.
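
The mirror swap described above amounts to replacing only the host part of each direct link while keeping the path. A minimal Python sketch of that search & replace, assuming hypothetical host names (`slow-mirror.example.org`, `fast-mirror.example.org`) and paths, since the real links are hidden in this thread:

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, old_host: str, new_host: str) -> str:
    """Return url with its host replaced when it equals old_host.

    The scheme, path, query and fragment are left untouched.
    """
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url  # not served by the host we want to replace
    # Keep an explicit port, if any, when rebuilding the network location.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def swap_hosts(urls, old_host, new_host):
    """Apply swap_host to every URL in an iterable and return a list."""
    return [swap_host(u, old_host, new_host) for u in urls]


# Hypothetical hosts and paths, for illustration only.
links = [
    "https://slow-mirror.example.org/1/items/some-item/file1.zip",
    "https://other.example.org/1/items/some-item/file2.zip",
]
for link in swap_hosts(links, "slow-mirror.example.org", "fast-mirror.example.org"):
    print(link)
```

Only links on the named slow host are rewritten; everything else passes through unchanged, so the same list can be run through several swaps while testing which mirror works.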


A quick solution for http URLs would be to change the CA mirror's IP using the hosts file, but for https the hostname won't match the CN on the certificate, which causes all sorts of "insecure/man-in-the-middle" issues.


It doesn't seem to be possible to edit the download links in JDownloader (links are displayed but not editable in the File Properties panel), let alone batch edit hundreds of links at once.

I thought I could work something out using Packagizer and "condition: Downloadurl contains..." but it would need an "...then replace this part of URL with this part" or something.

It doesn't seem to be possible to export the list of download URLs from JD either, so I've managed to build a couple of working lists of hundreds of URLs manually, using an online URL extractor and multiple manual cut/replace operations, but it's tedious and should be easier.

Even with the lists I've built manually, I wish I could have added these new links to complete huge multi-file downloads with much faster links, but the new URLs aren't detected as mirrors because JDownloader only detects mirrors when links are from different hosters.

Maybe something could be done with EventScripter but learning another script language is a huge time sink for a simple search & replace operation.

If I could just close JD, do a search & replace of URLs in the LinkGrabber/Downloads lists and reopen it, that would be totally fine too.

Is there a way to do it that I'm missing?
  #2  
Old 03.08.2024, 23:19
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 81,049
Default

I think the best approach would be to add some kind of plugin setting to customize the server geolocation.
Are you aware of a list of possible/available mirror domains, or how did you find out which ones work? Or can you just replace the host with any mirror location?

If no such list is available, then we can update the plugin to either auto-replace the mirror based on a custom rule or provide a right-click context menu so you can manually update the URL yourself.

Packagizer/Eventscripter will not help in this situation. Packagizer is only applied when adding links. Eventscripter cannot hook into actual downloads.
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 03.08.2024 at 23:22.
  #3  
Old 04.08.2024, 02:58
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default

I haven't searched much, but I doubt they publish a list of available mirrors since we're probably expected to use **External links are only visible to Support Staff****External links are only visible to Support Staff**.

I happened to notice that from time to time a link downloads at a decent speed, so I rushed to inspect the faster download before it completed and figured out it was coming from a US host.

Then I tried interchanging the deltas in links between slow (CA) and fast (US) hosts and it happened to work.

As I said, sometimes it takes a bit of trial & error (sometimes I had to try other numbers before /items/), but when you find a link that works, it seems to work for all files in the package.

I had one with ~2000 files: ETA to download from US mirror was ~6 hours vs >30 days with default CA mirrors so it's definitely worth a couple tries

Some basic plugin in which we either provide a host (or list of hosts) to try would work, although a more generic plugin with a simple "replace X with Y in selected download links" would do the job and could be useful for more use cases than archive.org alone.
  #4  
Old 04.08.2024, 17:58
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default

Oh wow!

Found this video about archive.org infrastructure:
**External links are only visible to Support Staff****External links are only visible to Support Staff**
interesting video to watch


Then I searched a bit more and figured out that most archive.org file links are browsable (I don't want to expose too much publicly for security reasons).

I'm guessing something like:
**External links are only visible to Support Staff**
see:
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
(all I checked had 35).

I expect it's not desired behavior to be able to browse these last links directly
(see **External links are only visible to Support Staff****External links are only visible to Support Staff**), but it's most likely expected to be able to browse the package/files links directly.

Note: you may find a list of cluster/nodes **External links are only visible to Support Staff**here or **External links are only visible to Support Staff**here if you have an account (not sure if it's available to all uploaders or restricted to support staff).

Then pick a random cluster/node and check if package exists, ex:
**External links are only visible to Support Staff****External links are only visible to Support Staff**

in this last example, I'm redirected to a specific node:
**External links are only visible to Support Staff****External links are only visible to Support Staff**

I can point LinkGrabber to this URL directly so no more need for search & replace to edit links.

If you want to push further, you can try multiple hosts/clusters and prioritize them based on response time.



Now I would like to spread downloads using a couple different clusters/nodes simultaneously as mirrors in a single hoster (ideally limit simultaneous download per cluster/node). Is there some way to do that?

I think the easiest way would be to be able to define custom hosters so we can manually create separate "archive.org-host1", "archive.org-host2", "archive.org-host3", etc. and the mirror handling / simultaneous DL limits per host will work without changing anything else.

(should we move to a new thread at this point?)
  #5  
Old 04.08.2024, 23:45
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 81,049
Default

Quote:
Originally Posted by bugnotme View Post
I think the easiest way would be to be able to define custom hosters so we can manually create separate "archive.org-host1", "archive.org-host2", "archive.org-host3", etc. and the mirror handling / simultaneous DL limits per host will work without changing anything else.
Could you please explain this in a little more detail? Why do you want to add the same file multiple times? What's the purpose of this?
I think it would make much more sense to add some kind of automatic mirror handling to the archive plugin, so you can specify one, two or a list of mirror domains and JDownloader automatically chooses the first working one.

I really wonder why those Canadian servers are so slow to respond

Last edited by Jiaz; 04.08.2024 at 23:51.
  #6  
Old 05.08.2024, 02:26
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default

Quote:
Originally Posted by Jiaz View Post
Could you please explain this in a little more detail?
The idea is to handle each host as a mirror inside the same hoster. So you can download multiple files at once while limiting to 1 active DL per host, and handle properly files available from multiple links like regular mirrors.

Yes it would make more sense that it's handled directly in the main archive plugin but I have no idea how much work it would mean. It seems like a pretty big structural change.


My thinking for minimal effort was to duplicate Archive plugin into multiple hosters (not that pretty overall but it seemed much less work for you while allowing more customization):

Archive (current plugin) for **External links are only visible to Support Staff****External links are only visible to Support Staff** links
Archive-host1 for **External links are only visible to Support Staff****External links are only visible to Support Staff** links
Archive-host2 for **External links are only visible to Support Staff****External links are only visible to Support Staff** links
Archive-host3 for **External links are only visible to Support Staff****External links are only visible to Support Staff** links
etc.

then throw a bunch of URLs in the LinkGrabber:
**External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff** **External links are only visible to Support Staff**
and the handling of these are already coded:
-download multiple files from archive.org (host1+host2+host3, etc.) at once
-limit to 1 active DL per host
-Handling of files already DLing/DLed from a different host (mirrors)
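
To make the three behaviours above concrete, here is a minimal Python sketch of the selection logic only. It is not JDownloader code: `pick_next` and its parameters are hypothetical names, the hosts are placeholders, and treating two links as mirrors when they share a file name is a simplifying assumption.

```python
from urllib.parse import urlsplit


def pick_next(queue, active_hosts, done_files, per_host_limit=1):
    """Choose which queued links to start now, in queue order.

    queue        -- candidate URLs
    active_hosts -- dict mapping host name to its number of running downloads
    done_files   -- set of file names already downloading or downloaded
    """
    counts = dict(active_hosts)  # copy so the caller's dict is not modified
    claimed = set(done_files)
    started = []
    for url in queue:
        parts = urlsplit(url)
        host = parts.hostname
        name = parts.path.rsplit("/", 1)[-1]  # last path segment as file name
        if name in claimed:
            continue  # another mirror already covers this file
        if counts.get(host, 0) >= per_host_limit:
            continue  # this host has reached its limit
        counts[host] = counts.get(host, 0) + 1
        claimed.add(name)
        started.append(url)
    return started


# Hypothetical hosts: two mirrors offering the same two files.
queue = [
    "https://host1.example.org/items/x/a.zip",
    "https://host1.example.org/items/x/b.zip",
    "https://host2.example.org/items/x/a.zip",
    "https://host2.example.org/items/x/b.zip",
]
print(pick_next(queue, {}, set()))
```

With an empty state this starts `a.zip` from host1 and `b.zip` from host2, so both hosts are used at once and no file is fetched twice.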


Quote:
Originally Posted by Jiaz View Post
I really wonder why those Canadian servers are so slow to respond
Same.

It feels like they run on an old USR8000 like the ones we used in the pre-broadband era to share a dial-up connection with a whole LAN
Old 05.08.2024, 02:36
bugnotme
Message deleted by pspzockerscene. Reason: Double-post
  #7  
Old 05.08.2024, 02:36
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default

Just so you know:

Throwing a "files host" link like this one in the linkGrabber:
**External links are only visible to Support Staff****External links are only visible to Support Staff**
works, but file sizes aren't detected properly.

The file sizes are on the page but not the same format as standard Archive.org links like this one:
**External links are only visible to Support Staff****External links are only visible to Support Staff**

If you happen to want to deal with multiple mirrors inside the Archive plugin, you'll want to adjust file size detection (and package name detection) at the same time.
  #8  
Old 05.08.2024, 13:55
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 73,118
Default

Sorry, your posts need to be approved manually as you are using our public bugmenot account in order to prevent spam.

Quote:
Originally Posted by bugnotme View Post
works, but file sizes aren't detected properly.
There was a small bug in our generic "http directory parser" so this will be fixed with the next set of CORE-updates.
Alternatively, as a workaround for now to get the file sizes: Rightclick on those added items -> Check online status

Quote:
Originally Posted by bugnotme View Post
The file sizes are on the page but not the same format as standard Archive.org links like this one:
Those links are handled by our specific archive.org crawler.
Internally, it is using the following API call to obtain most of the information:
Code:
archive.org/metadata/ThesharewareDosCollection/
-> Maybe the information in the field "workable_servers" is what we are looking for here?
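
For anyone who wants to look at that field outside of JDownloader, here is a minimal Python sketch. It assumes the endpoint returns a JSON object and that "workable_servers" holds a list of host names; only the endpoint and the field name come from this thread, and `fetch_metadata` and `workable_servers` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_metadata(identifier: str) -> dict:
    """Download and decode the metadata JSON for one archive.org item."""
    url = f"https://archive.org/metadata/{identifier}"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def workable_servers(metadata: dict) -> list:
    """Return the "workable_servers" field, or an empty list if it is absent.

    Assumption: the field holds a list of host names.
    """
    servers = metadata.get("workable_servers") or []
    return [s for s in servers if isinstance(s, str)]


# Usage (performs a network request, so it is left commented out):
# meta = fetch_metadata("ThesharewareDosCollection")
# print(workable_servers(meta))
```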

Quote:
Originally Posted by bugnotme View Post
The idea is to handle each host as a mirror inside the same hoster. So you can download multiple files at once while limiting to 1 active DL per host, and handle properly files available from multiple links like regular mirrors.
I do not see us doing this. If you need that, please do it yourself.

About your initial idea:
All in all, this is such a specific/deep feature that I'm really unsure whether or not we should implement anything in this direction.
It sounds like a potentially time-consuming feature to implement which not a lot of users are going to use.
Luckily JDownloader is open source so if we'd decide against it, nothing would stop you from implementing it yourself.

Also I may ask: Did you already contact the archive.org support about the slow speed of the canadian servers?
If not, why not? If so, what was their response?

On the other side:
Our current archive.org plugin already has a quite specific set of features and options so one more wouldn't hurt.
If there is a reliable way to provide a list of "probably working" servers, I'm all for it.

When adding archive.org links like "/details/<identifier>", JDownloader will internally use those "/download/" links so at this stage, it is not decided which server will be used as archive.org does this serverside when doing the final redirect to the file.
This means that this information is not known unless the user starts downloading or triggers another online-check of the added links.

Quote:
Originally Posted by bugnotme View Post
It doesn't seem to be possible to export list of download URLs from JD either so I've managed to build a couple working lists of hundreds of URLs manually using online URLs extractor and multiple manual cut/replace operations, but it's tedious and should be easier.
Generally it is possible to copy added download URLs, but those will be the ones that "you added", and in this case, as said, that will be the "/download/..." links - "you get what you add".
If there is a way to get a "probably working server" before the download is started though (maybe via the mentioned field "workable_servers"), I can make it possible for you to get those direct links beforehand so you'd at least have an easy way to change the server via search & replace, if that would help.
__________________
JD Supporter, Plugin Dev. & Community Manager

Erste Schritte & Tutorials || JDownloader 2 Setup Download
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
  #9  
Old 06.08.2024, 02:38
bugnotme bugnotme is offline
BugMeNot Account
 
Join Date: Apr 2013
Posts: 386
Default

Quote:
Originally Posted by pspzockerscene View Post
Sorry, your posts need to be approved manually as you are using our public bugmenot account in order to prevent spam.
Totally understand, no worries.

Sorry for the double-post sent in error yesterday. Session times out during writing of those long posts so I opened a second tab to re-login before submit and messed up on the submit.

You can delete post #7 for readability. I don't think I can delete it myself.


Quote:
Originally Posted by pspzockerscene View Post
There was a small bug in our generic "http directory parser" so this will be fixed with the next set of CORE-updates.
Alternatively, as a workaround for now to get the file sizes: Rightclick on those added items -> Check online status
Works, thanks!

I also found out by accident yesterday about the Rightclick on a file -> "change url" option. Intuitively I was (and I believe most people are) trying to edit the url directly in the "Download from" field of File Properties panel, but it's not editable there.


Quote:
Originally Posted by pspzockerscene View Post
Code:
archive.org/metadata/ThesharewareDosCollection/
-> Maybe the information in the field "workable_servers" is what we are looking for here?
I think you are right. I tried manual URLs using other hosts but they redirect me to these two hosts.


Quote:
Originally Posted by pspzockerscene View Post
About your initial idea:
All in all, this is such a specific/deep feature that I'm really unsure whether or not we should implement anything in this direction.
It sounds like a potentially time-consuming feature to implement which not a lot of users are going to use.
Totally understandable.

I'm debating with myself whether a plugin change would benefit users at large (probably not worth it if it's not a simple change).

For me it turned out to be much simpler to just duplicate my JD folder and run it on 6-7 different computers/VMs. I don't know why I didn't think of doing this before. I've read in another thread that I could use multiple proxies, but I'm not sure whether JD itself can be used as a proxy or whether we need to set up external proxies like Squid.

I've been using JD for years with basic scrape & download function, but for the last couple days the more I dig into different options the more I'm impressed by all the stuff you guys implemented that I didn't know existed


Quote:
Originally Posted by pspzockerscene View Post
Also I may ask: Did you already contact the archive.org support about the slow speed of the canadian servers?
If not, why not? If so, what was their response?
I didn't. Maybe I could try.. I kinda felt "it's free and probably non-profit so take it as-is" but I guess it couldn't hurt to try to let them know about the speed issue.

Quote:
Originally Posted by pspzockerscene View Post
On the other side:
Our current archive.org plugin already has a quite specific set of features and options so one more wouldn't hurt.
If there is a reliable way to provide a list of "probably working" servers, I'm all for it.
I believe the "workable_servers" in the metadata page would work and I think it could benefit most people achieving higher speed by using multiple hosts at once. Raising the number of DL alone for archive.org/download often adds more DL from the same host so you seem to get a slight gain (up to 3-4 DL in my testing) but not as high as using multiple different hosts.

Learning curve would be way too high to attempt to code such a change myself though and I have an easy workaround now so unless you guys suddenly see some benefit to feel it's worth the effort, I guess it will just stay as-is. No big deal.

Thanks a lot for your help and kindness. 72K and 80K posts, that's serious dedication. Huge respect
  #10  
Old 06.08.2024, 12:30
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 73,118
Default

Quote:
Originally Posted by bugnotme View Post
You can delete post #7 for readability. I don't think I can delete it myself.
Done.

Quote:
Originally Posted by bugnotme View Post
I also found out by accident yesterday about the Rightclick on a file -> "change url" option. Intuitively I was (and I believe most people are) trying to edit the url directly in the "Download from" field of File Properties panel, but it's not editable there.
Just keep in mind, that that option is only available for single direct-downloadable links and not for links that are handled by a specific plugin.
Example where it works:
Code:
ping.online.net/20Mo.dat
Example where it doesn't work:
Code:
workupload.com/file/Bdd3u5K8n7k
Background:
If you add 'manually crawled' single direct downloadable archive.org links, they will not be handled by a specific plugin (so not by the archive.org plugin but by the generic "DirectHTTP" plugin) thus for those the option will be present while it will not be present for "archive.org/download/..." links.


Quote:
Originally Posted by bugnotme View Post
I've been using JD for years with basic scrape & download function, but for the last couple days the more I dig into different options the more I'm impressed by all the stuff you guys implemented that I didn't know existed
Thanks for your feedback

Quote:
Originally Posted by bugnotme View Post
I didn't. Maybe I could try
You should.

Quote:
Originally Posted by bugnotme View Post
I kinda felt "it's free and probably non-profit so take it as-is"
JD is also free and here we are
We are always happy about legit bugreports and of course also about good feature requests.
In the past, reports of problems which were not related to JDownloader did actually lead to a lot of reports to developers of 3rd party software. Some of those reports were forwarded directly by us.

Quote:
Originally Posted by bugnotme View Post
I believe the "workable_servers" in the metadata page would work and I think it could benefit most people achieving higher speed by using multiple hosts at once.
Just to be absolutely clear - using "multiple hosts at once for one file" wouldn't be possible as our download-system is not designed to be able to do this.
What would be possible is:
  • Make JD randomly choose a different server for each file on each download-start.
  • and/or: Let the user provide a text-string of "preferred servers", then prefer those if they are in the list of serverside given "workable_servers". This would ensure that no matter what the user does, the user wouldn't be able to break anything
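
The two options above can be combined in a few lines. This is only a Python sketch of the selection rule, not plugin code; `choose_server` and `parse_preferred` are hypothetical names, and the comma/semicolon separated format for the user's text string is an assumption.

```python
import random


def parse_preferred(text: str) -> list:
    """Split a user-supplied string such as "a.example.org, b.example.org"."""
    return [part.strip() for part in text.replace(";", ",").split(",") if part.strip()]


def choose_server(workable, preferred=(), rng=random):
    """Pick a download server from the server-side list `workable`.

    Preferred servers are tried in the user's order, but only if the
    server side also lists them, so a bad preference cannot break anything.
    Falls back to a random workable server; returns None if none exist.
    """
    workable = list(workable)
    if not workable:
        return None
    for host in preferred:
        if host in workable:
            return host
    return rng.choice(workable)
```

Because a preferred host is only used when it also appears in the server-side list, a typo or an outdated entry in the user's string simply falls through to the random choice.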

One more idea which could help you in conjunction with the above ideas: Create an EventScripter script which detects slow downloads (e.g. below 100KB/s for 5 minutes) and then restarts them up to X times.
EventScripter forum thread:
https://board.jdownloader.org/showthread.php?t=70525
EventScripter help article:
https://support.jdownloader.org/Know...event-scripter
While this is somewhat of a 'crowbar method', I can imagine it being helpful to you.
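
The slow-download check suggested above boils down to a small state machine. The following Python sketch only illustrates the decision logic and does not use the EventScripter API; `SlowDownloadWatch` and its parameters are hypothetical, with defaults taken from the example figures (below 100 KB/s for 5 minutes) and an arbitrary restart budget of 3.

```python
class SlowDownloadWatch:
    """Track one download and decide when it should be restarted.

    A restart is suggested once the speed has stayed below `min_speed`
    (bytes per second) for `window` seconds, at most `max_restarts` times.
    """

    def __init__(self, min_speed=100 * 1024, window=300, max_restarts=3):
        self.min_speed = min_speed
        self.window = window
        self.max_restarts = max_restarts
        self.restarts = 0
        self.slow_since = None  # timestamp of the first slow sample in a row

    def update(self, now: float, speed: float) -> bool:
        """Feed one speed sample; return True if a restart should happen now."""
        if speed >= self.min_speed:
            self.slow_since = None  # fast enough again, reset the timer
            return False
        if self.slow_since is None:
            self.slow_since = now
        if now - self.slow_since < self.window:
            return False  # slow, but not for long enough yet
        if self.restarts >= self.max_restarts:
            return False  # give up, the restart budget is used up
        self.restarts += 1
        self.slow_since = None  # start a fresh window after the restart
        return True
```

A script would call `update` at a regular interval with the current time and speed of each running download and restart the link whenever it returns True.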

Quote:
Originally Posted by bugnotme View Post
Raising the number of DL alone for archive.org/download often adds more DL from the same host so you seem to get a slight gain (up to 3-4 DL in my testing) but not as high as using multiple different hosts.
Did you already test with more chunks?

Quote:
Originally Posted by bugnotme View Post
Thanks a lot for your help and kindness. 72K and 80K posts, that's serious dedication. Huge respect
Thanks a lot

Last edited by pspzockerscene; 06.08.2024 at 12:39. Reason: Added some missing hyperlinks
All times are GMT +2. The time now is 06:48.