Quote:
Originally Posted by pspzockerscene
@madmax2
If you ask me, improving mirror detection by "tolerating" a lot of seemingly random "mistakes" that filehosts make when renaming "duplicated" files is a very very bad idea.
We do already try to correct smaller mistakes e.g. if a host always replaces specified characters with other specified characters, we try to fix this inside our plugins but your case is different.
If you ask me, you want either one of those two things to happen
- Wrongly named files should get detected as mirrors too
or
- Remove/ignore such files
I would choose the first option.
If that " (2)" issue happens only for megaup.net and letsupload.org, you can make a **External links are only visible to Support Staff**... in JDownloader that removes that part of the filename --> Problem solved.
Additionally, you might want to contact the uploader of these files - chances are, the mistake was made by him (although I don't think so).
Also, I would not consider your issue as a JD issue.
The more you download, the more edge cases you will find and although JD does a lot of work for you already, you cannot expect it to handle such edge cases.
Also keep in mind that, in other cases, such names could have been set intentional which would lead to other failures if JD was to detect these ones as mirrors
-psp-
I don't think my issue should be considered an edge case,
because all these filehosters naming files incorrectly is something that falls under JDownloader's filehoster plugins...
This problem will occur for anyone who has mirrors in a package (where some of those filehosters name the file differently)
and then clicks the start download triangle button..
If you look at the links I posted, you can see
that the filename matches on the first 3 words, then changes a bit on the different filehosters,
and then matches again on the S06E01 part..
What would the problem be if JDownloader's matching algorithm saw that the names match on those 3 words before they stop matching, then match again on S06E01, identified the links as mirrors, and skipped/ignored the 3 filehosters that didn't quite match in that package?
The worst that can happen is that JDownloader leaves the file in the package, not downloaded, and a human can look at it later to see why it was not downloaded..
This is much better than JD wasting people's bandwidth downloading 3 different files which are exactly the same file...
I just had a look at my current HDD and noticed it has downloaded the same duplicates:
2 duplicates for one show, and 3 duplicates for another show.
This is not good because it has
wasted bandwidth + time downloading unnecessary duplicates.. as well as
wearing out the HDD a bit more by downloading those dupes... which I now need to delete as well..
This has been an ongoing issue that I have been experiencing here and there,
but I just hadn't reported it until now..
https://i.imgur.com/Xk0h6zL.png
If a human were looking at those links, they could clearly see that all of them are mirrors of the same file, based on their own human matching logic..
Yet JDownloader is seeing 3 different files within that package...
This is not a good matching algorithm, and it could / should be improved...
Quote:
"Additionally, you might want to contact the uploader of these files - chances are, the mistake was made by him (although I don't think so)."
Personally I don't think the uploader had anything to do with how those filehosters have renamed the file slightly differently...
so there is no point in contacting them, and I doubt they would even care to fix it (if it were due to them renaming it, which, like you said, is unlikely)
I would like to see whether Jiaz can offer some input on this issue
and whether the matching algorithm can be tweaked somewhat like I said..
Maybe a 100% filename match should not be necessary to call something a mirror; slightly less than 100% could be enough,
provided the beginning of the filename matches, then there is a part that doesn't match, and then it matches again on the
season number and episode number (e.g. S06E01)..
If it has matched on those two parts of the filename, then JD can be fairly confident that it is a mirror..
since a different file is unlikely to match on both of those things..
CASE 1
e.g.
JD scans the links and sees the beginning of the filename...
Matching ("Fear.the.Walking.Dead") + non-matching + matching (S06E01), so it can ignore the rest of the filename
and confidently say this is a mirror.
<match><not match><match S06E01><ignore>
Conclusion = this is a mirror
A different file is unlikely to match the beginning part of the filename,
even if it has the same season+episode number
e.g.
Fear.the.Walking.Dead....S06E01.......rar
The.Walking.Dead...S06E01.....rar
<not match><match S06E01>
Conclusion = this is not a mirror
As you can see, S06E01 may match for both filenames, but the beginning of the filename does not match..
So they are not mirrors.
CASE 2
Fear.the.Walking.Dead....S06E01.......rar
Fear.the.Walking.Dead....S06E01......(2).rar
<match><match><not match (2)>
Conclusion = this is a mirror
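To make the two cases above concrete, here is a rough sketch of the matching idea in Python. This is only an illustration of the proposal, not how JDownloader actually detects mirrors. The function names, the `min_prefix_tokens` threshold, and the sample filenames (including the `HOSTTAG` part) are all made up for the example.

```python
import re

# Season/episode tag such as S06E01 (case-insensitive).
EPISODE_TAG = re.compile(r"S(\d{1,2})E(\d{1,3})", re.IGNORECASE)


def title_tokens_and_episode(filename):
    """Return (title tokens before the tag, (season, episode)).

    Returns None when the name carries no SxxEyy tag.
    """
    tag = EPISODE_TAG.search(filename)
    if tag is None:
        return None
    # Everything before the tag, lower-cased and split on non-alphanumerics.
    tokens = re.findall(r"[a-z0-9]+", filename[:tag.start()].lower())
    return tokens, (int(tag.group(1)), int(tag.group(2)))


def is_probable_mirror(name_a, name_b, min_prefix_tokens=3):
    """Hypothetical heuristic: same leading title words + same SxxEyy tag.

    min_prefix_tokens is an assumed threshold, not a JDownloader setting.
    Everything after the episode tag, such as " (2)", is ignored.
    """
    if name_a == name_b:
        return True  # exact match is always a mirror
    a = title_tokens_and_episode(name_a)
    b = title_tokens_and_episode(name_b)
    if a is None or b is None:
        return False  # no episode tag, so the heuristic does not apply
    tokens_a, episode_a = a
    tokens_b, episode_b = b
    if episode_a != episode_b:
        return False
    # Count how many leading title words agree before the first mismatch.
    shared = 0
    for x, y in zip(tokens_a, tokens_b):
        if x != y:
            break
        shared += 1
    return shared >= min_prefix_tokens


# CASE 1: <match><not match><match S06E01><ignore>
print(is_probable_mirror("Fear.the.Walking.Dead.HOSTTAG.S06E01.rar",
                         "Fear.the.Walking.Dead.S06E01.rar"))
# CASE 1, different show: <not match><match S06E01>
print(is_probable_mirror("Fear.the.Walking.Dead.S06E01.rar",
                         "The.Walking.Dead.S06E01.rar"))
# CASE 2: <match><match><not match (2)>
print(is_probable_mirror("Fear.the.Walking.Dead.S06E01.rar",
                         "Fear.the.Walking.Dead.S06E01 (2).rar"))
```

One limitation of this sketch: titles with fewer than `min_prefix_tokens` words never match through the heuristic, so they would still need an exact filename match.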
----------------
I don't see how improving the algorithm the way I described should be considered "tolerating"..
since it has matched the beginning of the filename + the S06E01 part...
JDownloader should be pretty confident that those links clearly are mirrors..
if the algorithm has matched on those two parts of the filename..
This should be considered more of a smart matching improvement rather than tolerating,
since JD is unlikely to be incorrect in this instance...
A human would be 100% confident that these are mirror links if they did the same thing..
This is like Google improving their search algorithm to get better results
And again what is the worst that can happen?
JD would just ignore/skip those files in that package so that a human can look at them later and decide whether to download or delete them..
This is better than what is happening right now, which is
JD wasting bandwidth downloading 3 files that are exactly the same file.. just named a bit differently
Note: This smarter algorithm could be added as an advanced setting for people who want/need it.
I can be a tester if needed, to see whether this smarter algorithm works or not