#1
Archive.org: replace part of download URLs
Hi,
I'm looking for a way to dynamically replace part of download URLs, or a way to say "when the URL contains {this}, try these faster hosts first". I found this thread asking the same question (for a somewhat different reason), but although it's closed as Solved, I couldn't figure out what the solution was. I'm guessing the fix was for rapidu.net only and the thread was closed for lack of feedback from the OP.

My specific case: I am downloading a bunch of files from archive.org and would like to force a different mirror than the one I'm automatically assigned, without needing to subscribe to a VPN service with FDM.

When JDownloader scrapes URLs, links are added as: [external link hidden]

When the download starts, archive.org assigns a mirror, usually based on geolocation (you can see the host/IP by hovering the mouse over the chunk icon during the download). I'm in Canada, so files are usually downloaded from a Canadian mirror like: [external link hidden]

These Canadian servers are usually very slow (around 20-75 KiB/s on average). Most of the time I can download the same files much faster (around 5-25 MiB/s on average) by tweaking the URLs to point to a US mirror, like:
[external link hidden]
[external link hidden]
[external link hidden]

The US direct download links are not blocked for my country, so there's no need to use a VPN with FDM. It sometimes takes a bit of trial and error, but it usually works on the first try.

A quick solution for http URLs would be to change the CA mirror IP using the hosts file, but for https the hostname won't match the CN on the certificate and will cause all sorts of "insecure/man-in-the-middle" issues.

It doesn't seem to be possible to edit the download links in JDownloader (links are displayed but not editable in the File Properties panel), let alone batch edit hundreds of links at once.

I thought I could work something out using Packagizer and "condition: Downloadurl contains...", but it would need an "...then replace this part of the URL with this part" action or something similar. It doesn't seem to be possible to export the list of download URLs from JD either, so I've managed to build a couple of working lists of hundreds of URLs manually using an online URL extractor and multiple manual cut/replace operations, but it's tedious and should be easier.

Even with the lists I've built manually, I wish I could've added these new links to complete huge multi-file downloads with much faster links, but the new URLs aren't detected as mirrors because JDownloader only detects mirrors when links are from different hosters.

Maybe something could be done with EventScripter, but learning another scripting language is a huge time sink for a simple search & replace operation. If I could just close JD, do a search & replace of the URLs in the LinkGrabber/Downloads lists and reopen it, that would be totally fine too. Is there a way to do it that I'm missing?
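For the batch search & replace described above, here is a minimal Python sketch of a "replace the host in these download links" pass, assuming the links have already been extracted into a plain list (as the OP did with an online URL extractor). The function name and the host names in the usage note are hypothetical placeholders, since the real archive.org links in this thread are hidden. The sketch only swaps the hostname; it does not touch the number before /items/ that sometimes also had to change.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(urls, old_host, new_host):
    """Return a copy of `urls` in which every URL whose hostname equals
    `old_host` points at `new_host` instead; scheme, path and query are kept."""
    rewritten = []
    for url in urls:
        parts = urlsplit(url)
        if parts.hostname == old_host:
            # Replace only the network location and keep any explicit port
            # (credentials embedded in the URL, if any, are dropped).
            netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
            parts = parts._replace(netloc=netloc)
        rewritten.append(urlunsplit(parts))
    return rewritten
```

For example, `rewrite_host(links, "ca-mirror.example", "us-mirror.example")` returns the same list with only the matching links repointed, which could then be pasted into the LinkGrabber as new links.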
#2
I think best would be to add sort of plugin settings to customize server geo location.
Are you aware of a list of possible/available mirror domains, or how did you find working ones? Or can you just replace it with any mirror location? If no such list is available, then we can update the plugin to either automatically replace the mirror with some custom rule or provide a right-click context menu entry so you can manually update the URL yourself. Packagizer/EventScripter will not help in this situation: Packagizer is only applied when adding links, and EventScripter cannot hook into actual downloads.
__________________
JD-Dev & Server-Admin
Last edited by Jiaz; 03.08.2024 at 23:22.
#3
I haven't searched much, but I doubt they publish a list of available mirrors, since we're probably expected to use [external link hidden].

I happened to notice that from time to time a link downloads at a decent speed, so I rushed to inspect the faster download before it completed and figured out it was coming from a US host. Then I tried swapping the differing parts of the links between slow (CA) and fast (US) hosts, and it happened to work. As I said, sometimes it takes a bit of trial and error (sometimes I had to try other numbers before /items/), but once you find a link that works, it seems to work for all files in the package. I had one package with ~2000 files: the ETA to download from the US mirror was ~6 hours vs. more than 30 days with the default CA mirrors, so it's definitely worth a couple of tries.

Some basic plugin in which we provide a host (or list of hosts) to try would work, although a more generic feature with a simple "replace X with Y in selected download links" would do the job and could be useful for more use cases than archive.org alone.
#4
Oh wow!
Found this video about the archive.org infrastructure: [external link hidden]. Interesting video to watch.

Then I searched a bit more and figured out that most archive.org file links are browsable (I don't want to expose too much publicly for security reasons). I'm guessing something like: [external link hidden]
See:
[external link hidden]
[external link hidden]
[external link hidden]
[external link hidden] (all I checked had 35).

I expect it's not intended behavior to be able to browse these last links directly (see [external link hidden]), but it's most likely expected that the package/file links can be browsed directly.

Note: you may find a list of clusters/nodes here [external link hidden] or here [external link hidden] if you have an account (not sure whether it's available to all uploaders or restricted to support staff).

Then pick a random cluster/node and check whether the package exists, e.g.: [external link hidden]. In this last example, I'm redirected to a specific node: [external link hidden]. I can point the LinkGrabber to this URL directly, so there's no more need for search & replace to edit links. If you want to push further, you can try multiple hosts/clusters and prioritize them based on response time.
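Prioritizing clusters/nodes by response time could be sketched like this in Python. It assumes a caller-supplied `probe(host)` function (hypothetical, e.g. a request for the item's directory on that node) that raises an exception when the node does not serve the item; failing nodes are skipped and the rest are ordered fastest first. A single probe per host is a crude measure, so treat the ordering as a hint rather than a guarantee of download speed.

```python
import time
from typing import Callable, Iterable


def rank_hosts(hosts: Iterable[str], probe: Callable[[str], None],
               clock: Callable[[], float] = time.perf_counter) -> list[str]:
    """Probe each host once and return the reachable ones, fastest first."""
    timings = []
    for host in hosts:
        start = clock()
        try:
            probe(host)  # hypothetical check, e.g. fetch the item directory
        except Exception:
            continue  # unreachable node or item missing there: skip it
        timings.append((clock() - start, host))
    # Sort by measured latency only; ties keep the original probe order.
    timings.sort(key=lambda pair: pair[0])
    return [host for _, host in timings]
```

The `clock` parameter exists so the timing source can be swapped (for instance in tests); in normal use the default `time.perf_counter` is enough.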
Now I would like to spread downloads across a couple of different clusters/nodes simultaneously, as mirrors within a single hoster (ideally limiting simultaneous downloads per cluster/node). Is there some way to do that? I think the easiest way would be to be able to define custom hosters, so we can manually create separate "archive.org-host1", "archive.org-host2", "archive.org-host3", etc., and the mirror handling / simultaneous download limits per host would work without changing anything else. (Should we move to a new thread at this point?)
#5
I think it would make much more sense to add some sort of automatic mirror handling to the archive.org plugin, so you can specify one, two or a list of mirror domains and JDownloader chooses the first working one automatically. I really wonder why those Canadian servers are so slow to respond.
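The "list of mirror domains, first working one wins" behavior boils down to an ordered fallback loop. A minimal Python sketch, not actual JDownloader plugin code: `is_working` is a hypothetical check supplied by the caller, and `default` stands in for the server archive.org assigned on its own.

```python
from typing import Callable, Optional, Sequence


def first_working_mirror(mirrors: Sequence[str],
                         is_working: Callable[[str], bool],
                         default: Optional[str] = None) -> Optional[str]:
    """Return the first mirror, in user-defined order, that passes `is_working`,
    or `default` (e.g. the server archive.org assigned) if none does."""
    for mirror in mirrors:
        try:
            if is_working(mirror):
                return mirror
        except Exception:
            # Treat errors (timeouts, DNS failures, ...) as "not working".
            continue
    return default
```

Keeping the user's order (instead of measuring all mirrors) means only as many checks as needed are made before a download can start.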
Last edited by Jiaz; 04.08.2024 at 23:51.
#6
The idea is to handle each host as a mirror inside the same hoster, so you can download multiple files at once while limiting to 1 active download per host, and properly handle files available from multiple links like regular mirrors.
Yes, it would make more sense to handle it directly in the main Archive plugin, but I have no idea how much work that would mean. It seems like a pretty big structural change. My thinking for minimal effort was to duplicate the Archive plugin into multiple hosters (not that pretty overall, but it seemed like much less work for you while allowing more customization):
- Archive (current plugin) for [external link hidden] links
- Archive-host1 for [external link hidden] links
- Archive-host2 for [external link hidden] links
- Archive-host3 for [external link hidden] links
- etc.

Then throw a bunch of URLs into the LinkGrabber:
[9 external links hidden]

and the handling of these is already coded:
- download multiple files from archive.org (host1 + host2 + host3, etc.) at once
- limit to 1 active download per host
- handle files already downloading/downloaded from a different host (mirrors)
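The three behaviors listed above (several hosts at once, at most one active download per host, each file fetched only once) can be modeled with a shared work queue and one worker per host. This Python sketch assumes every file is available from every host; `download(host, name)` is a hypothetical callable, not a JDownloader API, and error handling (such as retrying a failed file on another host) is left out.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List


def download_spread(files: Iterable[str], hosts: List[str],
                    download: Callable[[str, str], None]) -> Dict[str, List[str]]:
    """Download every file once, using one worker thread per host so that
    each host never has more than 1 active download. Returns which files
    each host handled."""
    pending: "queue.Queue[str]" = queue.Queue()
    for name in files:
        pending.put(name)
    done: Dict[str, List[str]] = {host: [] for host in hosts}

    def worker(host: str) -> None:
        while True:
            try:
                name = pending.get_nowait()  # each file is taken exactly once
            except queue.Empty:
                return  # nothing left: this host's worker stops
            download(host, name)  # hypothetical: fetch `name` from `host`
            done[host].append(name)  # only this worker touches done[host]

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Faster hosts simply pull more files from the queue, which is roughly how a per-host download limit spreads load across mirrors.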
It feels like they run on an old USR8000 like the ones we used in the pre-broadband era to share a dial-up connection with a whole LAN.
#7
05.08.2024, 02:36, bugnotme
Message deleted by pspzockerscene. Reason: Double-post
#8
Just so you know:
Throwing a "files host" link like this one into the LinkGrabber: [external link hidden] works, but file sizes aren't detected properly. The file sizes are on the page, but not in the same format as for standard archive.org links like this one: [external link hidden]. If you decide to deal with multiple mirrors inside the Archive plugin, you'll want to adjust file size detection (and package name detection) at the same time.
#9
Sorry, your posts need to be approved manually because you are using our public bugmenot account; this is to prevent spam.
There was a small bug in our generic "http directory parser"; it will be fixed with the next set of CORE updates. As a workaround for now, to get the file sizes: right-click on those added items -> Check online status
Internally, it uses the following API call to obtain most of the information:
Code:
archive.org/metadata/ThesharewareDosCollection/
About your initial idea: all in all, this is such a specific/deep feature that I'm really unsure whether we should implement anything in this direction. It sounds like a potentially time-consuming feature to implement which not a lot of users are going to use. Luckily JDownloader is open source, so if we decide against it, nothing would stop you from implementing it yourself.

May I also ask: did you already contact archive.org support about the slow speed of the Canadian servers? If not, why not? If so, what was their response?

On the other hand, our current archive.org plugin already has quite a specific set of features and options, so one more wouldn't hurt. If there is a reliable way to provide a list of "probably working" servers, I'm all for it.

When adding archive.org links like "/details/<identifier>", JDownloader will internally use those "/download/" links, so at this stage it is not decided which server will be used, as archive.org does this server-side when doing the final redirect to the file. This means that this information is not known unless the user starts downloading or triggers another online check of the added links.

If there is a way to get a "probably working server" before the download is started, though (maybe via the mentioned field "workable_servers"), I can make it possible for you to get those direct links beforehand, so you'd at least have an easy way to change the server via search & replace, if that would help.
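If the metadata endpoint quoted above does expose usable servers, precomputing direct links per server could look like the Python sketch below. The field names are assumptions to verify against a live response: `workable_servers` is the field mentioned in this thread, while `dir` (the "/<n>/items/<identifier>" path) and the `files` list with `name` entries are assumed from the link layout described earlier. Whether every listed server actually serves the file quickly is not guaranteed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_metadata(identifier: str) -> dict:
    """Download the item's metadata JSON from the endpoint quoted above."""
    with urlopen(f"https://archive.org/metadata/{quote(identifier)}") as resp:
        return json.load(resp)


def direct_links(meta: dict) -> dict:
    """Map each file name to one candidate URL per server in `workable_servers`.

    Assumed shape (verify against a real response): `dir` is a path such as
    "/<n>/items/<identifier>", `workable_servers` is a list of host names and
    `files` is a list of {"name": ...} entries."""
    servers = meta.get("workable_servers", [])
    item_dir = meta["dir"]
    links = {}
    for entry in meta.get("files", []):
        # quote() escapes spaces etc. but keeps "/" for files in subfolders.
        path = f"{item_dir}/{quote(entry['name'])}"
        links[entry["name"]] = [f"https://{server}{path}" for server in servers]
    return links
```

Usage would be along the lines of `direct_links(fetch_metadata("ThesharewareDosCollection"))`, giving one candidate URL per server for each file, ready for a manual search & replace.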
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
#10
Sorry for the double post sent in error yesterday. My session times out while I'm writing these long posts, so I opened a second tab to log in again before submitting and messed up the submit. You can delete post #7 for readability; I don't think I can delete it myself.

I also found out by accident yesterday about the right-click on a file -> "change url" option. Intuitively I was (and I believe most people are) trying to edit the URL directly in the "Download from" field of the File Properties panel, but it's not editable there.

I'm debating with myself whether a plugin change would bring much benefit to most users (probably not worth it if it's not a simple change). For me, it turned out to be much simpler to just duplicate my JD folder and run it on 6-7 different computers/VMs. I don't know why I didn't think of doing this before. I've read in another thread that I could use multiple proxies, but I'm confused whether JD itself can be used as a proxy or whether we need to set up external proxies like Squid. I've been using JD for years with the basic scrape & download functions, but over the last couple of days, the more I dig into the different options, the more I'm impressed by all the stuff you guys implemented that I didn't know existed.

The learning curve would be way too steep for me to attempt to code such a change myself, though, and I have an easy workaround now, so unless you guys suddenly see enough benefit to feel it's worth the effort, I guess it will just stay as-is. No big deal. Thanks a lot for your help and kindness. 72K and 80K posts, that's serious dedication. Huge respect.
#11
Example where it works:
Code:
ping.online.net/20Mo.dat
Code:
workupload.com/file/Bdd3u5K8n7k
If you add 'manually crawled' single direct-downloadable archive.org links, they will not be handled by a specific plugin (so not by the archive.org plugin but by the generic "DirectHTTP" plugin), so for those the option will be present, while it will not be present for "archive.org/download/..." links.
You should.
We are always happy about legit bug reports and of course also about good feature requests. In the past, reports of problems which were not related to JDownloader did actually lead to a lot of reports to developers of 3rd-party software. Some of those reports were forwarded directly by us.
What would be possible is:
One more idea which could help you in conjunction with the above ideas: create an EventScripter script which detects slow downloads (e.g. below 100KB/s for 5 minutes) and then restarts them up to X times.
EventScripter forum thread: https://board.jdownloader.org/showthread.php?t=70525
EventScripter help article: https://support.jdownloader.org/Know...event-scripter
While this is something of a 'crowbar method', I can imagine it being helpful to you.
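The slow-download check behind that 'crowbar method' can be written independently of the scripting environment. This is a Python model of the decision logic only, not EventScripter code (EventScripter scripts are JavaScript against JDownloader's own API, which is not reproduced here). The defaults follow the example in the post (below 100KB/s for 5 minutes, read here as 100 KiB/s); `max_restarts=3` is an assumed value for the "X times".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlowDownloadWatch:
    """Decide when to restart a download that stays below a speed threshold."""
    min_speed: float = 100 * 1024   # bytes/s ("100KB/s" taken as 100 KiB/s)
    slow_for: float = 5 * 60        # seconds the speed must stay below min_speed
    max_restarts: int = 3           # the "X times" from the post (assumed value)
    restarts: int = 0
    slow_since: Optional[float] = None

    def sample(self, now: float, speed: float) -> bool:
        """Feed one speed sample (bytes/s) taken at time `now` (seconds).
        Returns True when the caller should restart the download."""
        if speed >= self.min_speed:
            self.slow_since = None  # fast again: reset the slow window
            return False
        if self.slow_since is None:
            self.slow_since = now   # first slow sample opens the window
        if now - self.slow_since >= self.slow_for and self.restarts < self.max_restarts:
            self.restarts += 1
            self.slow_since = None  # give the restarted download a fresh window
            return True
        return False
```

A script would keep one watcher per download, call `sample()` periodically with the current time and speed, and restart the download whenever it returns True; after `max_restarts` it stops asking for restarts.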
Thanks a lot
Last edited by pspzockerscene; 06.08.2024 at 12:39. Reason: Added some missing hyperlinks