JDownloader Community - Appwork GmbH
 

  #1  
12.06.2022, 13:00
StefanM (JD VIP)
Join Date: Oct 2020
Posts: 311
Issue: Dupes from LinkGrabber list will be crawled, dupes from Download list won't

The issue is that links which have duplicates in the LinkGrabber list are still verified and online-checked, even though they won't be added by default.
Links which have duplicates in the Downloads list won't be verified or online-checked unless I 'restore' them. This is correct!

For the full report with screenshots, please read the attachment.
And please verify whether what I'm writing here is correct. If not, I will edit it accordingly, so others can use this as (correct) information.


Preface
JD provides the user with an option to 'Restore x Filtered Links', which is enabled by default.
Duplicate handling with default settings works as follows:
No duplicates at all will be added to the LinkGrabber list when a link to a duplicate already exists in the Download list or in the LinkGrabber list.
However, you can restore filtered duplicates of links which exist in the Download list (but only those!). To be able to see restored links, you have to enable 'Already in Downloadlist' in the 'Views' pane.

Issue I found
In my view, duplicates in the LinkGrabber list should not be verified and online-checked either.
Duplicates are handled differently during crawling, depending on which list they were found in.
Please see my test cases for details, in particular the red/bold phrases:

Case 1:
When I copy a link into the LinkGrabber window (Analyze and Add Links) whose URL is already in the LinkGrabber list, the link is not added. It won't be shown, even with this box checked:

But I see the following in the BubbleNotifier: The link is obviously being verified and online-checked.

But Found Packages stays at zero.

Case 2:
If the link is not in the LinkGrabber list but in the Downloads list, it looks like this:

Here too (but only since a more recent update) the link is no longer added to the LinkGrabber list. But in case 2 it is not verified or online-checked, recognizable by the zeros in the BubbleNotifier.
The second difference from case 1 is that I have the option to restore the link that was filtered as a duplicate.
In fact, it's probably not a restoration, because only when I click the 'Restore 1 filtered links' button is this link verified and online-checked:

Differences from case 1 are:
  1. The file is now added to the LinkGrabber list and displayed, provided I have ticked the 'Already in Downloadlist' box. It is highlighted in red as a duplicate.
  2. Found Packages is now displayed as 1.

Case 3:
If I now try to add the link again, it is again not added, and it is not verified or online-checked either:

The behavior is identical to case 2, except that no restore is offered here and there is no indication of why the link was not added to the LinkGrabber list.
Attached Files
File Type: zip Duplicate Handling.zip (134.7 KB, 1 views)

Last edited by StefanM; 12.06.2022 at 15:31. Reason: Corrections after Developer Feedback and more precise description
  #2
12.06.2022, 13:48
Jiaz (JD Manager)
Join Date: Mar 2009
Location: Germany
Posts: 76,841

Quote:
Originally Posted by StefanM
With one of the more recent updates an option 'Restore x Filtered Links' was introduced - which is enabled by default.
I'm sorry, but this is not true at all. This button has existed since 31.10.2013! Its functionality has not changed since then.
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 12.06.2022 at 14:03.
  #3
12.06.2022, 13:50
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
No duplicates at all will be added to the LinkGrabber list when a link to a duplicate already exists in the Download list or in the LinkGrabber list.
Again, that is not true. I've already explained it here: https://board.jdownloader.org/showpo...79&postcount=2. A link that is already in the Download list can be added to the Linkgrabber list again without any problem. As explained, you cannot add the same link multiple times to the Linkgrabber list.

When you use "Already in Download list" as a filter condition, then of course the link will be filtered/not added, unless you have a view that matches it, as explained here: https://board.jdownloader.org/showpo...23&postcount=7

Last edited by Jiaz; 12.06.2022 at 14:04.
  #4
12.06.2022, 13:53
Jiaz (JD Manager)

Case 1: You cannot add the same link multiple times to the Linkgrabber list. You can disable this behaviour, see https://board.jdownloader.org/showpo...79&postcount=2
  #5
12.06.2022, 13:58
Jiaz (JD Manager)

@Stefan: I've already explained it here, https://board.jdownloader.org/showpo...23&postcount=7
  #6
12.06.2022, 14:00
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
In my eyes, also crawling of duplicates in LinkGrabber list should not be performed.
Quote:
Originally Posted by StefanM
The issue is, that links, which have duplicates in LinkGrabber list are still being crawled, even though they won't be added by default.
JDownloader doesn't know that a link is a duplicate until the duplicate check happens. So it must analyze/crawl/process the link in order to *know* it (name, internal details, ...), and only then can it check for an existing duplicate in LinkGrabber. The duplicate check happens AFTER analyze/crawl/process because only then are the internal details available, and the duplicate check works on those internal details. For example, some hosts have multiple domains or multiple URL formats that lead to the same file, and this information is not known in advance, so the link must be processed first. Then the duplicate check can detect that a.com/test.txt and b.com/text.txt are the same file and not add it twice.
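
The ordering described above can be sketched in a few lines of Python. This is only an illustration, not JDownloader's actual code: `RESOLVED_IDS`, `process`, `add_links` and the internal IDs are made-up names, and the lookup table stands in for whatever a host plugin learns while processing a link.

```python
from dataclasses import dataclass

# Hypothetical table standing in for what a host plugin learns only by
# processing a URL. In this sketch the first two URLs point at the same file.
RESOLVED_IDS = {
    "a.com/test.txt": "host://file/12345",
    "b.com/text.txt": "host://file/12345",
    "a.com/other.txt": "host://file/67890",
}


@dataclass(frozen=True)
class ProcessedLink:
    url: str
    internal_id: str


def process(url: str) -> ProcessedLink:
    """Stand-in for analyze/crawl/process: the step that yields the internal ID."""
    return ProcessedLink(url=url, internal_id=RESOLVED_IDS[url])


def add_links(urls, linkgrabber):
    """Process every URL first, then skip those whose internal ID is already listed."""
    known = {link.internal_id for link in linkgrabber}
    for url in urls:
        link = process(url)            # must happen before the duplicate check
        if link.internal_id in known:  # duplicate check works on the internal ID
            continue
        known.add(link.internal_id)
        linkgrabber.append(link)
    return linkgrabber


grabbed = add_links(["a.com/test.txt", "b.com/text.txt", "a.com/other.txt"], [])
print([link.url for link in grabbed])
```

In this sketch the second URL is dropped even though its text differs from the first, which is only possible because both were processed before being compared.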

Quote:
Originally Posted by StefanM
Links, which have duplicates in Downloads list won't be crawled - unless I 'restore' them. This is correct!
Because you've enabled a filter that has the condition "Already in Downloads list".

Last edited by Jiaz; 12.06.2022 at 14:12.
  #7
12.06.2022, 14:19
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
But Found Packages stays at zero.
Found Packages stays at zero because no new package was added to the list.
Found links is 1 because the link is NOT filtered; it is just not added to the list because it is already part of the Linkgrabber list.
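
One way to read the two counters is sketched below in Python. The names `CrawlStats` and `add_link`, and the dictionary used for the Linkgrabber list, are assumptions made for illustration and do not come from JDownloader.

```python
from dataclasses import dataclass


@dataclass
class CrawlStats:
    found_links: int = 0
    found_packages: int = 0


def add_link(link_id, package, linkgrabber, filtered_ids, stats):
    """linkgrabber maps a package name to a set of link IDs (hypothetical structure)."""
    if link_id in filtered_ids:
        return  # filtered links are not counted as found in this sketch
    stats.found_links += 1
    if any(link_id in ids for ids in linkgrabber.values()):
        return  # duplicate: counted as found, but nothing is added
    if package not in linkgrabber:
        linkgrabber[package] = set()
        stats.found_packages += 1  # only a newly created package is counted
    linkgrabber[package].add(link_id)


# Case 1 from the first post: the link is already in the Linkgrabber list.
linkgrabber = {"Package A": {"link-1"}}
stats = CrawlStats()
add_link("link-1", "Package A", linkgrabber, filtered_ids=set(), stats=stats)
print(stats)
```

Under these assumptions the duplicate raises the link counter but not the package counter, which matches the numbers reported for case 1.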
  #8
12.06.2022, 14:24
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
For the full report with screenshots, please read the attachment.
Please either attach the images OR upload them to an image hoster and link them.
Please do NOT attach a PDF of your post!

Last edited by Jiaz; 12.06.2022 at 14:29.
  #9
12.06.2022, 14:24
StefanM (JD VIP)

Quote:
Originally Posted by Jiaz
I'm sorry, but this is not true at all. This button has existed since 31.10.2013! Its functionality has not changed since then.
Of course I believe what you say.
But in all of my numerous installations it just showed up for the first time a few weeks ago.

I still remember this, because that was when I started looking into it, as it did not work as expected (which was due to another custom filter).

Is it possible that a setting in 'advanced settings' was somehow (without my doing) set to disabled or kept as disabled?

I have thousands of dupes in my LinkGrabber list, which proves that this option was not there or not enabled in my installations for a very loooong time.
  #10
12.06.2022, 14:27
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
Of course I believe what you say.
But in all of my numerous installations it just showed up for the first time a few weeks ago.
One explanation would be that you customized your menu, and either removed the button or customized the menu before the button was added.
In both cases you'll have a custom configuration which does not contain the button.
  #11
12.06.2022, 14:29
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
Is it possible that a setting in 'advanced settings' was somehow (without my doing) set to disabled or kept as disabled?
Already answered here, https://board.jdownloader.org/showpo...07&postcount=7
  #12
12.06.2022, 14:30
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
I have thousands of dupes in my LinkGrabber list, which proves that this option was not there or not enabled in my installations for a very loooong time.
The restore feature has NOTHING to do with dupes. You can only add dupes when the dupe check is disabled, as previously answered here: https://board.jdownloader.org/showpo...28&postcount=6.
The feature to disable dupe checks in Linkgrabber was added on 31.03.2017 (default: enabled). The dupe check itself has existed since 2011.

Last edited by Jiaz; 12.06.2022 at 14:36.
  #13
12.06.2022, 14:38
Jiaz (JD Manager)

@StefanM: Just to make it clear. Dupe check is NOT done on filename/filesize but ONLY on link/internal link!
Dupes and Mirrors are different things.
  #14
12.06.2022, 14:40
StefanM (JD VIP)

Quote:
Originally Posted by Jiaz
Found Packages stays at zero because no new package was added to the list.
Found links is 1 because the link is NOT filtered; it is just not added to the list because it is already part of the Linkgrabber list.
I guess my wording (using the general terms crawling/grabbing) was misleading.

Well, I tested this at least 5 or 6 times back and forth:
I can see the grabbing process or, better, the verification/online check process.

And I think it would be better if, when a duplicate is found, no verification/online check were performed at all. This would also save a lot of time.

Real Life Scenario
Please note: I am talking about adding a list of links, not the contents of a web page, which would have to be crawled for links first.

When I add a few hundred links (pasting a list of links) to LinkGrabber and all of them are dupes that already exist in the LinkGrabber list, it would take JD only seconds to figure that out.

But instead, all links are verified and online-checked first, which can take a lot of time. And after this process those dupes are not added to the LinkGrabber table.

This is my observation!

And when I do the same with a list of links that have duplicates in the Downloads list, they won't be verified or online-checked, which is the behavior I am asking to have implemented for dupes in the LinkGrabber list as well.
  #15
12.06.2022, 14:42
Jiaz (JD Manager)

@StefanM: Finally we come to the *real* topic or *issue* you want optimized!
So you would like an option/optimization where the crawling process checks for an existing dupe in the Linkgrabber list, to avoid unnecessary processing of a link that is later *trashed* because it's already part of the Linkgrabber list.

So your wish is: when the same link is added again, abort it as soon as possible, in the best case before it has been processed/online-checked, right?
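
A minimal Python sketch of such an early abort follows. It assumes the raw URL text is the only information available before processing, and every name in it (`add_links_with_early_check`, `fake_process`, the sample URLs) is hypothetical. It is not how JDownloader implements or plans to implement this.

```python
def add_links_with_early_check(urls, seen_urls, seen_ids, process):
    """Skip URLs seen before without processing them; process the rest, then dupe-check."""
    added = []
    processed = 0
    for url in urls:
        if url in seen_urls:
            continue  # early abort: identical URL text, no processing needed
        internal_id = process(url)  # the expensive analyze/online-check step
        processed += 1
        seen_urls.add(url)
        if internal_id in seen_ids:
            continue  # regular dupe check on the internal ID, after processing
        seen_ids.add(internal_id)
        added.append(url)
    return added, processed


calls = []


def fake_process(url):
    """Hypothetical stand-in: treats the path after the host as the internal ID."""
    calls.append(url)
    return url.split("/", 1)[1]


added, processed = add_links_with_early_check(
    ["a.com/f1", "b.com/f1", "a.com/f2"],
    seen_urls={"a.com/f1"},
    seen_ids={"f1"},
    process=fake_process,
)
print(added, processed)
```

The early check only catches links pasted in exactly the same form. A mirror URL such as `b.com/f1` still has to be processed before the check on internal details can recognise it as a dupe.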

Last edited by Jiaz; 12.06.2022 at 14:46.
  #16
12.06.2022, 14:44
StefanM (JD VIP)

Quote:
Originally Posted by Jiaz
@StefanM: Just to make it clear. Dupe check is NOT done on filename/filesize but ONLY on link/internal link!
Dupes and Mirrors are different things.
Yes, I'm aware of that and remember that you use the hash value for that, as any other duplicate finder software would do. (I created the German GUIs for SpaceMan99 and DuplicateCleaner.)
  #17
12.06.2022, 14:55
StefanM (JD VIP)

Quote:
Originally Posted by Jiaz
The restore feature has NOTHING to do with dupes. You can only add dupes when the dupe check is disabled, as previously answered here: **External links are only visible to Support Staff**....
The feature to disable dupe checks in Linkgrabber was added on 31.03.2017 (default: enabled). The dupe check itself has existed since 2011.
Again a misunderstanding!

If you look at the PDF, you will see the screenshot with that many dupes. Those are dupes of links in the Downloads list.

And I would not have gotten them if the filter/restore option had been enabled. That's what I tried to say.
  #18
12.06.2022, 14:57
StefanM (JD VIP)

:P
Quote:
Originally Posted by Jiaz
@StefanM: Finally we come to the *real* topic or *issue* you want optimized!
So you would like an option/optimization where the crawling process checks for an existing dupe in the Linkgrabber list, to avoid unnecessary processing of a link that is later *trashed* because it's already part of the Linkgrabber list.

So your wish is: when the same link is added again, abort it as soon as possible, in the best case before it has been processed/online-checked, right?
Right!

Exactly as you already do with links in the Downloads list.
And for better understanding I created the PDF, where you can see this in the screenshots I was referring to.
  #19
12.06.2022, 15:02
Jiaz (JD Manager)

@StefanM: I'm sorry, but I don't understand! What is the problem here?
You add Link X to Linkgrabber and move it to Downloads. You will then be able to add Link X to Linkgrabber again, unless you've added a filter with the condition *already in download list*. That filter will prevent the link from being added again. BUT if you also have a matching view rule, the link will be added to Linkgrabber, because a matching view rule overrides a matching filter rule.
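
The precedence described here can be written down as a small Python sketch. The function `should_add`, the rule `already_in_downloads` and the dictionary layout are illustrative assumptions, not JDownloader's real rule engine.

```python
def should_add(link, filter_rules, view_rules):
    """Return True if the link ends up in the Linkgrabber list."""
    if any(rule(link) for rule in view_rules):
        return True   # a matching view rule overrides any matching filter rule
    if any(rule(link) for rule in filter_rules):
        return False  # filtered out (could later be restored)
    return True       # nothing matched: the link is added normally


def already_in_downloads(link):
    """Hypothetical rule condition: the link is already in the download list."""
    return link["in_download_list"]


link_x = {"url": "example.com/file", "in_download_list": True}

print(should_add(link_x, filter_rules=[], view_rules=[]))
print(should_add(link_x, filter_rules=[already_in_downloads], view_rules=[]))
print(should_add(link_x, filter_rules=[already_in_downloads],
                 view_rules=[already_in_downloads]))
```

The three calls correspond to the three situations in the post: no filter, a filter only, and a filter together with a matching view.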
  #20
12.06.2022, 15:05
Jiaz (JD Manager)

Quote:
Originally Posted by StefanM
:P

Right!

Exactly as you already do with links in the Downloads list.
And for better understanding I created the PDF, where you can see this in the screenshots I was referring to.
Perfect! And I guess you're talking about your vk links, so I can use them for testing, right? As explained, this dupe check works on link/internal link, so I must check in the plugin what information is available before the link is processed.
All times are GMT +2.