#1
Questions about LinkGrabbing process
Could somebody please complete/correct my description of the LinkGrabbing process in JD below?
Once everything has been reviewed and completed, I'd be happy to prepare an 'Article' for the Knowledgebase at https://support.jdownloader.org/Knowledgebase/
Here is what I see and believe to be the case. After I have added a number of links to the 'Analyse and Add Links' window of LinkGrabber and clicked 'Continue', the following steps are performed:
Step 1: A dupe check is performed. As I can see from the number of links found, any duplicates that were added are only counted once.
Question 1: Is this dupe check performed offline, before anything gets written to the linkcollector*.zip files?
Step 2: If there is only a single link that does not require a deep scan, a normal link analysis is started. If there are only links that require a deep scan, a popup window appears asking whether or not I want to perform a deep scan. Please also read this thread about an important LinkGrabber warning message that is missing: https://board.jdownloader.org/showth...985#post503985
Step 3: A popup window 'Crawling for download links' appears. It shows an increasing number of links found, which is the number of (unique) links pasted into the LinkGrabber window. The popup window disappears as soon as all pasted links have been read.
Step 4: If you have the 'Bubble Notifier' enabled for this, after a while you also see the progress in that 'Bubble'. Now an online check is performed for all links found.
Question 2: How many links are checked simultaneously?
Question 3: They are not checked in the order of the lines (link in line 1, link in line 2, link in line 3, ...), right?
During this online check, files are added to the LinkGrabber table (as online, as offline, as ...). They may be subject to filtering and/or hiding.
Step 5: When all files have been checked, the 'Bubble Notifier' shows 'Done' and disappears after a few seconds.
This is what I see/conclude, but I'm sure my list of steps is not complete.
So I'm asking for completion and for correction of any statements that are not correct.
#2
Before that, some links eventually get crawled.
Yes. Jiaz will be able to answer that.
Quote:
Your "Step 1" reads as if it should include the whole add & crawl process...
Quote:
Yes and no. It always depends on how the website is built (and whether or not there is an API) and on how the plugin is written: is it optimized to show the links as fast as possible, or does that depend on plugin settings? Some plugins, for example, have a "Fast linkcheck" setting. If the website is, e.g., a cloud folder structure such as Google Drive, the crawler does the "linkchecking part" itself for whole folders, because we can be sure beforehand that all files found in a folder structure are online, and we get all the needed information (status, filename, filesize, MD5 hash) right away. This greatly speeds up or even skips the linkchecking that usually happens if you, e.g., add hundreds of filehoster links. This varies from website to website. Some provide an API to linkcheck batches of X links (mostly up to 100) at the same time, others don't.
Quote:
Quote:
Quote:
I doubt that "guides in this style" will be helpful for our users. Examples of our current articles about the linkgrabber (yes, we only have 2 at the moment):
https://support.jdownloader.org/Know...25/linkgrabber -->
https://support.jdownloader.org/Know...download-paths
https://support.jdownloader.org/Know...w-to-add-links
I don't like to praise myself, but I'd say my "Add links" article contains more useful information than this thread of yours (and no, I don't mean to be rude).
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 08.06.2022 at 15:21. Reason: Fixed wrong info about stuff that happens before dupe check
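The batch linkcheck mentioned in the post above (some websites accept up to about 100 links per API request) can be pictured with a small Python sketch. This is not JDownloader's code: the function names and the `check_batch` callback are hypothetical, and the batch size of 100 is only the "mostly up to 100" figure from the post.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def batches(links: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` links."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(links), size):
        yield links[start:start + size]


def check_in_batches(
    links: List[str],
    check_batch: Callable[[List[str]], Dict[str, str]],
    size: int = 100,
) -> Tuple[Dict[str, str], int]:
    """Check all links, sending one (hypothetical) API request per batch.

    `check_batch` stands in for a website API call and must return a
    mapping of link -> status. Returns the merged results and the number
    of requests that were needed.
    """
    results: Dict[str, str] = {}
    requests_sent = 0
    for batch in batches(links, size):
        results.update(check_batch(batch))
        requests_sent += 1
    return results, requests_sent
```

With 250 links and a batch size of 100, this sketch sends three requests instead of 250, which is the saving the post describes for websites that offer such an API.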
#3
Quote:
And yes, I read all those articles in the Knowledgebase. I even mirrored the whole Knowledgebase with HTTrack so that I can edit and comment on it for my personal documentation. My intention was not to write a 'how-to' article. My intention was to document what happens when I use LinkGrabber. But if you think that nobody (apart from me) is interested in knowing what happens... you probably know that better than I do, and that's fine with me. I will write that document anyway, even if it is just for me. PS: I will send you a copy of my email to Jiaz. I think you will then better understand why I do some things the way I do...
#4
Quote:
So before the links I added to LinkGrabber are checked for dupes, they are checked for availability? This means duplicates, triplicates, ... would be checked online twice or more, unnecessarily, and unnecessary requests would be sent? If so, I would file a request to change that, as the number of requests sent should be kept as low as possible in order not to flood the website with requests, which might lead to a temporary ban.
Last edited by StefanM; 08.06.2022 at 15:15.
#5
Quote:
Quote:
Quote:
No. I've edited my post accordingly. Jiaz can/will add more information here once he finds the time.
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
#6
Quote:
Dupecheck has nothing to do with this. Linkcollector is just the current list stored to disk. These files are only ever written; the last working list is read back only when JDownloader starts. Those files (downloadListXXX.zip and linkcollectorXXX.zip) are not used for anything other than storage.
__________________
JD-Dev & Server-Admin
#7
Quote:
1.) none of the added links are supported/being processed, or
2.) the way you added the link is NOT part of Settings->Advanced Settings->LinkCrawler.autolearnextensionorigins
__________________
JD-Dev & Server-Admin
#8
Quote:
It is NOT the number of links added to the Linkgrabber window, because the dupe check takes place AFTER the crawling/processing of the links. Linkcrawler is decoupled from Linkgrabber for speed: otherwise a long list or a lot of list activity could block or slow down the Linkcrawler.
__________________
JD-Dev & Server-Admin
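The decoupling described in the post above can be illustrated with a small producer/consumer sketch in Python. It is only a model under stated assumptions, not JDownloader's implementation: `run_decoupled` and its internals are hypothetical names, the "crawler" simply passes every pasted link on, and the "grabber" thread does the dupe check afterwards.

```python
import queue
import threading


def run_decoupled(pasted_links):
    """Model a crawler that hands links to a separate grabber thread.

    The crawler counts every link it finds, duplicates included, and never
    waits for the grabber's list handling. The dupe check happens only in
    the grabber, after crawling.
    """
    found = queue.Queue()   # hand-over point between crawler and grabber
    grabber_list = []       # what ends up in the (modelled) Linkgrabber table
    seen = set()
    crawled_count = 0

    def grabber():
        while True:
            link = found.get()
            if link is None:        # sentinel: the crawler has finished
                break
            if link not in seen:    # dupe check AFTER crawling
                seen.add(link)
                grabber_list.append(link)

    consumer = threading.Thread(target=grabber)
    consumer.start()

    # The "crawler": every found link is counted and passed on immediately.
    for link in pasted_links:
        found.put(link)
        crawled_count += 1
    found.put(None)
    consumer.join()
    return crawled_count, grabber_list
```

In this model, pasting the same link twice gives a crawl count of two but only one entry in the grabber list, which matches the statement that the counter in the crawling popup is not the number of links added to the Linkgrabber window.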
#9
Quote:
Linkcrawler, Linkchecker and the Linkgrabber window all run simultaneously. LinkFilter rules are processed multiple times: during linkcrawling and again after the linkcheck (because some conditions require the link status to be checked first). For each host/plugin only ONE linkchecker instance runs at a time, but up to Settings->Advanced Settings->LinkChecker.maxthreads hosts/plugins may be checked simultaneously. Plugins may also make use of a special API that allows checking multiple links at once, to save requests and speed up the whole process.
Quote:
Optimization within Linkcrawler (e.g. known faster plugins are processed before known slower plugins).
__________________
JD-Dev & Server-Admin
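The concurrency rule from the post above (one linkchecker instance per host/plugin, several hosts in parallel up to LinkChecker.maxthreads) could look roughly like the following Python sketch. The function names, the `check_one` callback and the default of 2 threads are assumptions made for illustration; the actual default of the setting is not stated in this thread.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def group_by_host(links):
    """Group links by hostname so each host gets exactly one checker."""
    groups = defaultdict(list)
    for link in links:
        groups[urlparse(link).hostname].append(link)
    return dict(groups)


def check_links(links, check_one, max_threads=2):
    """Check links with one sequential worker per host.

    At most `max_threads` hosts are checked at the same time
    (a hypothetical stand-in for LinkChecker.maxthreads).
    `check_one` is a placeholder for the per-link status check.
    """
    def check_host(host_links):
        # All links of one host are handled one after another by one worker.
        return [(link, check_one(link)) for link in host_links]

    results = {}
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for checked in pool.map(check_host, group_by_host(links).values()):
            results.update(checked)
    return results
```

Because hosts are processed in parallel, results do not have to arrive in the order in which the links were pasted, which is consistent with the answer to Question 3.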
#10
Just a simple flowchart:
Linkcrawler:
1.) one/multiple links -> linkcrawlerjob
2.) linkcrawler is started and takes over one linkcrawlerjob
3.) linkcrawler searches for supported links (linkcrawler rules, plugins...) and processes them. Before each further processing step of a link, the linkfilter rules are checked to see whether processing of the link can be aborted. After each processing step, the packagizer rules are checked.
4.) links that are not filtered and have status unknown are forwarded to linkchecker
5.) once all links are processed, the linkcrawler finishes -> done
Linkchecker:
1.) one/multiple links from the linkcrawler(job) are forwarded to linkchecker
2.) an existing linkchecker instance for the host/plugin enqueues the link(s), or a new linkchecker instance is enqueued/started
3.) checked links are forwarded to the Linkgrabber window
Linkgrabber:
1.) link(s) are added, and the Linkfilter and Packagizer rules are processed/checked once more
2.) links that are not filtered by Linkfilter rules are dupe checked against all links in Linkgrabber and are only added if no dupe exists
__________________
JD-Dev & Server-Admin
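The flowchart above can be turned into a very reduced Python model to make the order of operations concrete. Everything here is an assumption for illustration: the `Link` class, the three functions and the callbacks are hypothetical, Packagizer rules are left out, and the real components run concurrently rather than one after another.

```python
from dataclasses import dataclass


@dataclass
class Link:
    url: str
    status: str = "unknown"


def crawl(job, is_filtered):
    """Linkcrawler: turn a job into links, dropping filtered ones early."""
    links = [Link(url) for url in job]
    return [link for link in links if not is_filtered(link)]


def linkcheck(links, check):
    """Linkchecker: resolve the status of links that are still unknown."""
    for link in links:
        if link.status == "unknown":
            link.status = check(link.url)
    return links


def add_to_linkgrabber(table, links, is_filtered):
    """Linkgrabber: apply the filter once more, then dupe check."""
    for link in links:
        if is_filtered(link):
            continue    # the filter can match now that the status is known
        if any(existing.url == link.url for existing in table):
            continue    # dupe: an identical link is already in the table
        table.append(link)
    return table
```

A filter that depends on the link status, such as `lambda link: link.status == "offline"`, cannot match during crawling because the status is still unknown at that point. It only takes effect in the last step, which is why the filter rules are evaluated more than once.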
#11
#12
You're welcome! In case of further questions, just ask.
__________________
JD-Dev & Server-Admin