JDownloader Community - Appwork GmbH
 

#1
12.02.2018, 01:34
netgearjd
JD Fan
Join Date: Aug 2014
Posts: 76
Add Smart/Super-Fast/Short-Circuit Check for File Existence

Here's the situation - there are folders on different file hosts (MediaFire, ZippyShare etc.) where files are added off and on. So every month or so I add the folder links to JD, let the LinkGrabber do the crawling and it gives me a list of all the files in those folders, ready for downloading. So far, so good.

However a problem arises because every month I obviously want to catch up with only the latest added content, i.e. I want to download only the files I haven't already downloaded earlier.

Now to do this I add all the links to the Downloads list, then set JD to skip files if they already exist. Again, this does work and the existing files are all skipped, but the major issue is that the checking and skipping is so damn slow!

As far as I can tell, JD first starts every link, connects to the server, goes through the captcha/waiting process if required, then retrieves the file details and possibly even initiates the download before it realizes that oh wait, a file with that same name already exists in the download folder, so the file must be skipped. How crazy!

Why can't JD simply check for the existence of a file with the same name as the current one in the list before it goes through the laborious and slow process of connecting to the server? That way it can simply skip an existing file in a fraction of the time it currently takes.

Moreover it's not just a simple time penalty as described above, because there's another terrible side-effect of the current process too. JD connecting to the server for every file, whether it's to be skipped or not, inevitably results in the file host erroneously imposing limits because according to it so many downloads have been initiated (irrespective of the fact that no data was actually downloaded for so many of the files). This is ridiculous but there's no way around it at present.

The end result is that even if a mere 5 files have been added since my last download session, due to the server-imposed limits, checking the rest of the files takes hours or even days right now (not kidding!), whereas with the short-circuit file check it would probably take not more than a minute at most, which would be a massive improvement.

Final note: I understand that some plugins do not retrieve the actual file name until the download actually starts. The easy solution is for JD to always perform the short-circuit file check first; if that fails (no match found), it can continue with the existing slow process. So the best-case scenario would be a file match before the server needs to be contacted, which would save a huge amount of time, while the worst-case scenario would be that the server does need to be contacted for a particular file, resulting in the same time penalty as exists now. Seems like a win-win situation to me!
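
To make the request concrete, here is a minimal Python sketch of the proposed order of checks. It is only an illustration of the idea, not JDownloader code: `resolve_link`, `listed_name` and `contact_host` are hypothetical names, and `contact_host` stands in for the slow step (connecting, captcha, waiting) that reveals the final file name.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def resolve_link(listed_name: Optional[str],
                 download_dir: Path,
                 contact_host: Callable[[], str]) -> str:
    """Decide what to do with one link, trying the cheap check first.

    listed_name  -- name currently shown in the list (may be a placeholder or None)
    contact_host -- hypothetical slow step that returns the final file name
    """
    # Fast path: the listed name already exists on disk, so the host is never contacted.
    if listed_name and (download_dir / listed_name).is_file():
        return "skipped (fast check)"

    # Slow path: same cost as today, used only when the fast check found no match.
    final_name = contact_host()
    if (download_dir / final_name).is_file():
        return "skipped (after contacting host)"
    return "download"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "old.zip").write_bytes(b"already downloaded")
        for name in ["old.zip", "new.zip"]:
            print(name, "->", resolve_link(name, folder, lambda n=name: n))
```

In the best case the host is never contacted; in the worst case the cost is the same as now, plus one local file lookup.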
#2
12.02.2018, 01:44
raztoki
English Supporter
Join Date: Apr 2010
Location: Australia
Posts: 15,934

clearly you did limited research on this, there are settings in which you can change mirror behaviour. settings > advanced settings > filter 'mirror'
main one is GeneralSettings.mirrordetectiondecision

raztoki
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
#3
12.02.2018, 07:32
dabrown
Tornado
Join Date: Jun 2015
Location: North America
Posts: 248

Mirror handling doesn't work when links are in different packages, though. It sounds like the OP is placing them in different packages. And even if you were to put it into an existing package, if the file was already downloaded, the new copies won't be detected as a mirror because they weren't there when the originals downloaded. So they will try again.

SOME host plugins check the destination folder before initiating the download, some don't. I've been told it's because some hosts don't set the "final" filename until the download is actually running. That often breaks mirror detection! If I have 5 files all with the same name and they show as mirrors when added in the LinkGrabber, some end up renaming themselves when the download initiates. Or worse, they rename themselves if the file skips! If it renames some but not all when NONE of the files in the package downloaded, then the next time I unskip, the mirror detection is all screwed up. Jiaz suggested the only way I could prevent this was to manually rename EVERY link before starting the downloads. Ugh. But I end up doing that anyway to force mirror detection. I'll linkgrab up to a dozen different versions of the same file, only want ONE to download, so I rename them all to exactly the same name (even if the extension was different) to force it to only attempt whichever one becomes available first. Anyhow, enough on why the current mirror handling isn't "optimal". I've already said what I'd prefer (a force mirror flag, as in "treat all files in this package as a mirror regardless of size, name, color, date, race, creed, etc.").


I do see the OP's point, though. JD doesn't check the destination folder until it STARTS the download for some hosts. Which counts against the daily limit for those hosts. I can't tell you how many times I've lost one of my 3 allowed files per day from cRapidgator because it tried to download a file that already existed, that I'd downloaded from a different host days/weeks earlier. Depositfiles, on the other hand, will skip:file exists before the captcha. It's inconsistent. It would be nice if it ALWAYS checked first. Even if the filename is a placeholder. In my case, I rename almost every link anyway, so it's not a placeholder it's the final file name.

I think I've confused it enough now.
#4
12.02.2018, 10:25
raztoki
English Supporter
Join Date: Apr 2010
Location: Australia
Posts: 15,934

@dabrown
Correct, mirror handling is within the same package, though file-on-disk checks still happen outside of that, depending on the mirror handling setting. It depends more on saving all the data in the same path as the previous data. Some mirror handling can be done on filename and checksums/filesize. Not all providers give real filesize figures before download, hence the difference in behaviour between some platforms.
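
As a rough illustration of comparing on filename and checksums/filesize, here is a small Python sketch. It is not JDownloader's actual logic: `LinkInfo` and `is_mirror` are hypothetical, the mode names FILENAME and AUTO are borrowed from the setting discussed in this thread, and the behaviour given to AUTO here is purely an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkInfo:
    name: str
    size: Optional[int] = None      # bytes; None if the host reports no real size before download
    checksum: Optional[str] = None  # e.g. a hash string; None if unknown


def is_mirror(a: LinkInfo, b: LinkInfo, mode: str = "AUTO") -> bool:
    """Return True if two links are treated as mirrors of the same file."""
    if a.name != b.name:
        return False
    if mode == "FILENAME":
        # The name alone decides.
        return True
    # Assumed AUTO behaviour: use the strongest attribute known on both sides.
    if a.checksum is not None and b.checksum is not None:
        return a.checksum == b.checksum
    if a.size is not None and b.size is not None:
        return a.size == b.size
    # Nothing more to compare, so fall back to the matching name.
    return True


if __name__ == "__main__":
    x = LinkInfo("video.mkv", size=700)
    y = LinkInfo("video.mkv", size=701)
    print(is_mirror(x, y, "FILENAME"), is_mirror(x, y, "AUTO"))
```

The sketch shows why results can differ between hosts: when one side reports no size or checksum, only the name is left to compare.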
#5
12.02.2018, 11:15
dabrown
Tornado
Join Date: Jun 2015
Location: North America
Posts: 248

I didn't realize the file-on-disk check was controlled by mirror handling; I thought it was totally separate. Doesn't matter much, I have mirror handling set so that basically ALL it checks is the filename to determine whether a file is a mirror. I do, however, want it to check that the file doesn't already exist, as there are a lot of files with generic common names. It does that, just at the wrong time (after the download starts). Though maybe I'm mistaken about manually renamed links, because I saw some files "Skipped-File Exists" when I wasn't watching. I still had to manually disable them, but that's how I have it set up. It's not smart enough to know a "duplicate" from a "common generic name".
#6
12.02.2018, 21:30
netgearjd
JD Fan
Join Date: Aug 2014
Posts: 76

Quote:
Originally Posted by raztoki
clearly you did limited research on this, there are settings in which you can change mirror behaviour. settings > advanced settings > filter 'mirror'
main one is GeneralSettings.mirrordetectiondecision
Limited research, really? Not to sound snarky, but can you point me to a proper online description of these Advanced mirror-related settings? How am I even supposed to know or suspect that mirror handling settings are also relevant for on-disk previously downloaded files? So far I was under the impression that mirror detection is only relevant to figure out multiple links to the exact same file in the same package. However as I mentioned above, every month or so I add the folders afresh to the LinkGrabber, and the previous session's downloaded links are no longer present in the Downloads list. (Every time the package name and thus download folder name obviously remains the same however, otherwise how would the file exists check work?) If mirror detection will help me speed up the process by running a short-circuit comparison check between new filenames and existing files on disk, that's great. I'll try it out and let you know.

Just to be clear though, you're saying that if I set GeneralSettings: Mirror Detection Decision to FILENAME instead of AUTO as it is now, it will do exactly what I'm looking for and perform a short-circuit check with every file host?

Quote:
Originally Posted by dabrown
I do see the OP's point, though. JD doesn't check the destination folder until it STARTS the download for some hosts. Which counts against the daily limit for those hosts. I can't tell you how many times I've lost one of my 3 allowed files per day from cRapidgator because it tried to download a file that already existed, that I'd downloaded from a different host days/weeks earlier. Depositfiles, on the other hand, will skip:file exists before the captcha. It's inconsistent. It would be nice if it ALWAYS checked first. Even if the filename is a placeholder. In my case, I rename almost every link anyway, so it's not a placeholder it's the final file name.
Thanks, you got my point exactly. As I mentioned at the end of my initial post, I know that certain hosts/plugins don't provide the actual file names until the download is started. Hence my suggestion (and the point you acknowledged as well) - "It would be nice if it ALWAYS checked first." Frankly I don't see the downside to JD always running a short-circuit check first with whatever is the currently listed file name. If it matches, great; if not, connect to the server and continue as before.

Makes me scratch my head really - is there any reason why this should not be the default way to do things?

Edit: Setting GeneralSettings: Mirror Detection Decision to FILENAME seemed to make no difference at all for MediaFire links at least. JD still seems to be connecting to the server every time before running the file comparison check. Not sure how this thread deserves to be marked as [Solved].

Last edited by netgearjd; 12.02.2018 at 21:55.
#7
13.02.2018, 07:15
mgpai
Script Master
Join Date: Sep 2013
Posts: 327

@netgearjd: When a download starts, it is possible to use a script to disable/remove other links in the package (or even the entire download/linkgrabber list) which have matching names. I will PM the link to you if you are interested in testing/trying it.
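
For readers following along, the idea can be sketched roughly as below. This is not mgpai's script and does not use JDownloader's scripting API; it is a plain Python model with hypothetical names (`Link`, `on_download_started`) showing only the matching-name logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    name: str
    host: str
    enabled: bool = True


def on_download_started(started: Link, links: List[Link]) -> List[Link]:
    """Disable every other enabled link whose name matches the started one.

    Returns the links that were disabled by this call.
    """
    disabled = []
    for link in links:
        if link is not started and link.enabled and link.name == started.name:
            link.enabled = False
            disabled.append(link)
    return disabled


if __name__ == "__main__":
    package = [
        Link("album.zip", "host-a"),
        Link("album.zip", "host-b"),
        Link("bonus.zip", "host-a"),
    ]
    for link in on_download_started(package[0], package):
        print("disabled:", link.name, "on", link.host)
```

Passing the whole download list instead of one package would give the list-wide variant mentioned above.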
#8
16.02.2018, 19:48
netgearjd
JD Fan
Join Date: Aug 2014
Posts: 76

To reiterate, this feature request is still not Solved.

Quote:
Originally Posted by mgpai
@netgearjd: When a download starts, it is possible to use a script to disable/remove other links in the package (or even the entire download/linkgrabber list) which have matching names. I will PM the link to you if you are interested in testing/trying it.
Thanks for chiming in, mgpai. I posted my response in your scripts thread.

Last edited by netgearjd; 16.02.2018 at 22:03.
All times are GMT +2.