JDownloader Community - Appwork GmbH
 

  #1  
Old 13.10.2019, 03:10
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default Anti duplicates history Script

In answer to this :
Quote:
Originally Posted by Jiaz View Post
@mgpai/Demongornot: I'll suggest to create a new thread for the discussion about the development/ideas/questions for the dupe/history support. I can then move the posts to the new thread.
Here is the thread. I'll post here the prototype code for the first part of the script, which saves finished downloads to files.

CAUTION: I've finished writing it and fixed the few typos so that it compiles, but I haven't tested it yet. Actually I tested none of it (I usually test bits of the code while writing it), except the function whose code comes from mgpai, because I wanted to make sure my modified version worked as a function.
I'll test it tomorrow, so only test run this code if you know what you are doing: I doubt I wrote it without any flaws the first time, and compiling in JDownloader only catches a small part of the possible bugs and errors.

Code:
const MAX_URLS_PER_FILE = 20; //0 = no limits, I advise setting it to 0 or between 1000 (100 if you don't download a lot) and 100 000.
const FILE_SAVE_MODE = 0; //0 = Short URL with host per file(s); 1 = Long URL with host per file(s); 2 = Long URL on a single file.
const BASEPATH = JD_HOME + '\\History';

if (link.isFinished()) {
	var myDownloadLink = link;
	var path = getPath(BASEPATH);
	var nl = getEnvironment().getNewLine();
	var separator = ' ';
	if (path.exists()) {
		var linkHost = myDownloadLink.getDownloadHost();
		var linkURL = getShortURL(myDownloadLink);
		var fileText = linkURL;
		var filePath;
		var arrayText = [];
		var pass = true;
		if (FILE_SAVE_MODE > 0 || (FILE_SAVE_MODE == 0 && MAX_URLS_PER_FILE == 0)) {
			if (FILE_SAVE_MODE == 2) filePath = getPath(path + '\\History.txt');
			if (FILE_SAVE_MODE < 2) filePath = getPath(path + '\\' + linkHost + separator + '0.txt');
			if (filePath.exists()) {
				arrayText = readFile(filePath).split(nl);
				if (arrayText != null) {
					if (arrayText.indexOf(linkURL) >= 0) pass = false;
					fileText = nl + linkURL;
				}
			}
		} else {
			var fileCounter = 0;
			var previousLength = 0;
			var loopContinue = true;
			while (loopContinue) {
				filePath = getPath(path + '\\' + linkHost + separator + fileCounter + '.txt');
				if (filePath.exists()) {
					arrayText = readFile(filePath).split(nl);
					previousLength = arrayText.length;
					if (arrayText.indexOf(linkURL) >= 0) {
						loopContinue = false;
						pass = false;
					}
					fileCounter++;
				} else {
					if (fileCounter > 0 && previousLength < MAX_URLS_PER_FILE) {
						fileCounter--;
						fileText = nl + linkURL;
					}
					loopContinue = false;
				}
			}
			if (pass) {
				if (filePath.exists()) linkURL = nl + linkURL;
				writeFile(filePath, fileText, true);
			}
		}
	} else {
		var AllDownloadLinks = getAllDownloadLinks();
		if (AllDownloadLinks.length > 0) {
			var fileText;
			var hostsList = [];
			var urlsList = [];
			var linkURL;
			var linkHost;
			var hostIndex;
			var counter;
			for (counter = 0; counter < AllDownloadLinks.length; counter++) {
				linkURL = getShortURL(AllDownloadLinks[counter]);
				if (!(linkURL.isFinished())) continue;
				if (FILE_SAVE_MODE > 1) {
					urlsList.push(linkURL);
				} else {
					linkHost = AllDownloadLinks[counter].getDownloadHost();
					hostIndex = hostsList.indexOf(linkHost);
					if (hostIndex < 0) {
						hostsList.push(linkHost);
						urlsList.push([]);
					}
					urlsList[hostIndex].push(linkURL);
				}
			}
			if (urlsList.length > 0) {
				var filePath;
				if (FILE_SAVE_MODE > 1) {
					fileText = urlsList.join(nl);
					filePath = getPath(path + '\\History.txt');
					writeFile(filePath, fileText, true);
				} else {
					var counter2;
					var counter3;
					var fileCounter = 0;
					var interUrl = '';
					for (counter = 0; counter < urlsList.length; counter++) {
						if (FILE_SAVE_MODE == 0 && MAX_URLS_PER_FILE > 0) {
							counter3 = 0;
							for (counter2 = 0; counter2 < urlsList[counter].length; counter2++) {
								if (counter3 >= MAX_URLS_PER_FILE) {
									counter3 = 0;
									fileCounter++;
									filePath = getPath(path + '\\' + hostsList[counter] + separator + fileCounter + '.txt');
									writeFile(filePath, fileText, true);
									fileText = '';
									interUrl = '';
								}
								fileText += interUrl + urlsList[counter][counter2];
								interUrl = nl;
							}
						} else {
							fileText = urlsList[counter].join(nl);
							filePath = getPath(path + '\\' + hostsList[counter] + separator + fileCounter + '.txt');
							writeFile(filePath, fileText, true);
						}
					}
				}
			}
		}
	}
}

function getShortURL(DownloadLink) {
	if (FILE_SAVE_MODE > 0) return DownloadLink.getContentURL() || DownloadLink.getPluginURL();
	var url = DownloadLink.getProperty("LINKDUPEID") || link.getPluginURL();
	return url.replace(/(^(https?|ftp):\/\/[^\/]+\/)/, '').replace(/.+:\/\//, '');
}
@mgpai what do you think about it?
(I'll test and comment it tomorrow, it's past 2am now, my bed is calling)
  #2  
Old 13.10.2019, 14:39
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

@Demongornot: Code appears to be fine (Haven't tried it in JD).

Have you considered generating timestamp-based files instead of a numeric counter + URL count based system? It might consume fewer resources, as it would eliminate the need to query the counter/URL count of existing files each time you need to add a URL to the list. The user can choose a shorter (hour/day) frequency, or a longer (month/year) frequency, depending on his/her download traffic.
  #3  
Old 13.10.2019, 20:22
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

The issue there would be that I'd need to write code to access files with an indeterminable name; it is easier to go through "host.com 0.txt" to "host.com 19.txt" than "host.com xx:xx.txt".
Alternatively, I could simply add an option to not check whether the URL is already written. It would save performance at the expense of some dupe URLs in the anti-dupe URL history files (oh the irony). Though since links with the same URL shouldn't get to the download list unless the user wants them to, there shouldn't (in theory) be any dupe URLs inside the files anyway, and 2 identical URLs out of, let's say, 10000 different ones is a minor size issue.
  #4  
Old 13.10.2019, 20:52
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

Quote:
Originally Posted by Demongornot View Post
The issue there would be that I'd need to write code to access files with an indeterminable name; it is easier to go through "host.com 0.txt" to "host.com 19.txt" than "host.com xx:xx.txt".
I wouldn't have suggested it if it was indeterminable. The string should be generated from the finished date.
  #5  
Old 13.10.2019, 21:21
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

Quote:
Originally Posted by Demongornot View Post
it is easier to go through "host.com 0.txt" to "host.com 19.txt" than "host.com xx:xx.txt"
It should not be "host.com xx:xx.txt". That would be too vague. If the finished date is "2019-10-13T17:46:44Z" (assuming xx:xx is hour:minutes in your example), the name should be "host.com 2019-10-13 17.46.txt".

Just an idea. In any case, you are in a better position to decide the best way to implement your ideas.
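
A minimal sketch of that naming scheme, in the ES5 style the Event Scripter accepts. The helper names (pad2, historyFileName) and the "granularity" option are assumptions made up for this illustration; how the finished date is read from the link is left out.

```javascript
// Hypothetical helper: zero-pads a number to two digits.
function pad2(n) {
    return (n < 10 ? '0' : '') + n;
}

// Hypothetical helper: builds a history file name from a host and a finished date.
// "granularity" is an assumed option: 'minute', 'hour', 'day', 'month' or 'year'.
function historyFileName(host, finishedDate, granularity) {
    var name = String(finishedDate.getUTCFullYear());
    if (granularity !== 'year') name += '-' + pad2(finishedDate.getUTCMonth() + 1);
    if (granularity === 'day' || granularity === 'hour' || granularity === 'minute') name += '-' + pad2(finishedDate.getUTCDate());
    if (granularity === 'hour' || granularity === 'minute') name += ' ' + pad2(finishedDate.getUTCHours());
    if (granularity === 'minute') name += '.' + pad2(finishedDate.getUTCMinutes());
    return host + ' ' + name + '.txt';
}

// The finished date from the example above: 2019-10-13T17:46:44Z.
var finished = new Date(Date.UTC(2019, 9, 13, 17, 46, 44));
var fileName = historyFileName('host.com', finished, 'minute');
// fileName is "host.com 2019-10-13 17.46.txt"
```

Because every field is zero-padded and ordered from year down to minute, the names of one host sort chronologically as plain strings.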
  #6  
Old 13.10.2019, 22:19
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

When I said indeterminable, I meant that, in order to read said file to check for dupes, it would be necessary to test all possible dates to find the file, from "host.com (script creation date).txt" to "host.com (current date).txt", incrementing one step at a time (one minute at worst, one month at best); the number of possible combinations is too high for that...
While incrementing a number takes almost no time or computing power.

If there was a command like "myString[] = getFilesNameInFolder(path)" I could just go through the array and check those which match "host + space (or any other separator) + string matching date format + '.txt'"... And from those, compute the date to see which one is next (I don't trust an array to be in order when it is created by a command that gathers a list).
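
A rough sketch of that filtering and ordering step, assuming the folder listing is already available as an array of plain file names (no folder part) and that the date part looks like "YYYY-MM-DD HH.mm". The function name datedHistoryFiles and the sample names are made up for the illustration.

```javascript
// Hypothetical helper: keeps the names that look like "<host> YYYY-MM-DD HH.mm.txt"
// and returns them sorted oldest first, without relying on the order of the listing.
function datedHistoryFiles(names, host) {
    var prefix = host + ' ';
    var datePattern = /^\d{4}-\d{2}-\d{2} \d{2}\.\d{2}\.txt$/;
    return names.filter(function(name) {
        // The name must start with "host " and the rest must match the date format.
        return name.indexOf(prefix) === 0 && datePattern.test(name.substring(prefix.length));
    }).sort(); // zero-padded dates sort chronologically as plain strings
}

var names = [
    'host.com 2019-10-13 17.46.txt',
    'other.org 2019-10-12 08.00.txt',
    'host.com 2019-09-30 23.59.txt',
    'host.com notes.txt'
];
var sorted = datedHistoryFiles(names, 'host.com');
// sorted is ['host.com 2019-09-30 23.59.txt', 'host.com 2019-10-13 17.46.txt']
```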

And even then, if the user decides for any reason to change his OS's date (cheating in a game? Been there, done that), everything will be messed up.
Also, if the user downloads only once from a particular host in one month and downloads continuously from another host in another month, it will create files with only 1 URL, so the whole path and file opening step wastes time on a single URL that has a low chance of being the one we are looking for anyway, and other files which could potentially contain a ridiculously high number of URLs, mainly if the user downloads many small files.
Occasional downloaders will get many files containing few URLs, and even if I add an option to choose the interval (hours, days, weeks?, months) there are still issues.

If the goal is to avoid checking for an already existing path, well, considering that the script also checks whether the link is already written inside the file (which I could make optional to save performance), it won't be compatible. But your idea, without checking for already existing URLs inside the files before writing, will obviously consume fewer resources than using path.exists() where path = host + " " + counter + ".txt".
But once we need to check added links for dupes, the whole point of the system gets lost, as now there are a lot of possible dates to check, with no way to know whether the last one we checked was the last one written until we reach the current date...
Of course a solution would be to use a page/index file which basically contains all the file names, but it defeats the whole performance-saving purpose anyway, as incrementing a counter will be faster and lighter than reading strings from a file.

Also, the date system doesn't really provide a good solution for controlling file size and keeping files under a certain limit, as anyone can download far less from the same host in one month than in another.
And considering that I want to check backward, from the file with the highest counter number for the host to the first, as older downloads' URLs are more likely to have expired, the counter system is still faster: incrementing the counter and checking whether the path exists without reading the file, then going backward by decreasing the counter from the max value to 0 (which anyway only does one existence check and one read per file, just not sequentially), could be faster than doing the same by reading and sorting dates from an array, an array which depends on a file that can be altered, deleted, etc.
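
The backward check described above could be sketched like this. The names isInHistory, exists and readLines are assumptions: the last two are stand-ins for the real path check and file read, passed in as functions so the logic can be tried without any files.

```javascript
// Hypothetical helper. "exists(counter)" and "readLines(counter)" are stand-ins
// for the real file check and file read of "<host> <counter>.txt".
function isInHistory(url, exists, readLines) {
    var highest = -1;
    // Pass 1: find the highest counter by probing for existence only, no file is read.
    while (exists(highest + 1)) highest++;
    // Pass 2: read from the newest file down to file 0 and stop at the first match.
    for (var counter = highest; counter >= 0; counter--) {
        if (readLines(counter).indexOf(url) >= 0) return true;
    }
    return false;
}

// In-memory stand-in for "host.com 0.txt" to "host.com 2.txt".
var fakeFiles = [['a/1', 'a/2'], ['b/1'], ['c/1', 'c/2']];
var reads = [];
function fakeExists(counter) { return counter < fakeFiles.length; }
function fakeReadLines(counter) { reads.push(counter); return fakeFiles[counter]; }

var found = isInHistory('b/1', fakeExists, fakeReadLines);
// found is true, and reads is [2, 1]: file 0 was never opened
```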

If the goal was only to save URLs for the user to read them later (and not for another script), your idea would be better in terms of performance, considering that in JavaScript, unlike lower-level languages, we have the same command for both opening a file and appending text and for creating a file and writing text.
  #7  
Old 13.10.2019, 22:53
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

Quote:
Originally Posted by Demongornot View Post
When I said indeterminable, I meant that, in order to read said file to check for dupes, it would be necessary to test all possible dates to find the file, from "host.com (script creation date).txt" to "host.com (current date).txt", incrementing one step at a time (one minute at worst, one month at best); the number of possible combinations is too high for that...
While incrementing a number takes almost no time or computing power.
You just need to filter results using the download host name.

Code:
var folder = JD_HOME + "/history/";
var host = link.getDownloadHost();

var files = getPath(folder).getChildren().filter(function(file) {
    return file.toString().indexOf(host) > -1;
})

var filesDescending = files.reverse();

This is just FYI. I am not in any way suggesting one method is better than the other.
  #8  
Old 14.10.2019, 02:17
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

That's awesome code!
I discovered the "getChildren" method only after I posted the comment: in my code I hadn't created the "History" folder and it didn't work at first, so while looking for a folder-creating method I also discovered getChildren. I don't know why, but I always miss the "FilePath" methods part in the help...
Most of the time I search for methods using the ctrl+F search function, but the names aren't necessarily what I expect them to be, so...
Thank you for this code anyway, I know this is just to show me and I highly appreciate it!

I fixed my code for the case where a download finishes and the "History" folder doesn't exist, in mode 0 with a max number of URLs per file higher than 0. I fixed 10 things.

Code:
Line 66, used ".isFinished()" on a URL rather than on a link, changed from "if (!(linkURL.isFinished())) continue;" to "if (!(AllDownloadLinks[counter].isFinished())) continue;"
Line 76, hostIndex stayed at -1 when the host was new and caused an error with the .push function, code added at line 75 "hostIndex = hostsList.length - 1;"
Line 101 - 104, forgot to increment counter3, added at line 102 "counter3++"
Line 84, misunderstood the writeFile command, thought it created the directory too, added at line 80 "path.mkdirs();"
Line 57, changed "var fileText;" to "var fileText = ''" as it wrote "undefined" to the text file otherwise
Line 93, forgot to reset variable fileCounter to 0, which created files with increasing numbers regardless of the host, added at line 93 "fileCounter = 0;"
Line 96, moved "fileCounter++" to after its use as its value should start at 0 and not 1, moved from line 96 to line 98/99 "fileCounter++;"
Line 119, forgot to change "link" to "DownloadLink", changed at line 119 from "link.getPluginURL();" to "DownloadLink.getPluginURL();"
Line 91, the if condition prevented the last file from being saved, changed the condition at line 91 from "if (counter3 >= MAX_URLS_PER_FILE)" to "if (counter3 >= MAX_URLS_PER_FILE || counter2 == urlsList[counter].length - 1)"
Line 102 103, moved both the code that adds the URL to the file text and the new line code to before the file-creating "if" where they should be, to lines 94 and 95 from lines 102 and 103 "fileText += interUrl + urlsList[counter][counter2];" and "interUrl = nl;"
And the new code is:
Code:
const MAX_URLS_PER_FILE = 50; //0 = no limits, I advise setting it to 0 or between 1000 (100 if you don't download a lot) and 100 000.
const FILE_SAVE_MODE = 0; //0 = Short URL with host per file(s); 1 = Long URL with host per file(s); 2 = Long URL on a single file.
const BASEPATH = JD_HOME + '\\History';

if (link.isFinished()) {
    var myDownloadLink = link;
    var path = getPath(BASEPATH);
    var nl = getEnvironment().getNewLine();
    var separator = ' ';
    if (path.exists()) {
        var linkHost = myDownloadLink.getDownloadHost();
        var linkURL = getShortURL(myDownloadLink);
        var fileText = linkURL;
        var filePath;
        var arrayText = [];
        var pass = true;
        if (FILE_SAVE_MODE > 0 || (FILE_SAVE_MODE == 0 && MAX_URLS_PER_FILE == 0)) {
            if (FILE_SAVE_MODE == 2) filePath = getPath(path + '\\History.txt');
            if (FILE_SAVE_MODE < 2) filePath = getPath(path + '\\' + linkHost + separator + '0.txt');
            if (filePath.exists()) {
                arrayText = readFile(filePath).split(nl);
                if (arrayText != null) {
                    if (arrayText.indexOf(linkURL) >= 0) pass = false;
                    fileText = nl + linkURL;
                }
            }
        } else {
            var fileCounter = 0;
            var previousLength = 0;
            var loopContinue = true;
            while (loopContinue) {
                filePath = getPath(path + '\\' + linkHost + separator + fileCounter + '.txt');
                if (filePath.exists()) {
                    arrayText = readFile(filePath).split(nl);
                    previousLength = arrayText.length;
                    if (arrayText.indexOf(linkURL) >= 0) {
                        loopContinue = false;
                        pass = false;
                    }
                    fileCounter++;
                } else {
                    if (fileCounter > 0 && previousLength < MAX_URLS_PER_FILE) {
                        fileCounter--;
                        fileText = nl + linkURL;
                    }
                    loopContinue = false;
                }
            }
            if (pass) {
                if (filePath.exists()) linkURL = nl + linkURL;
                writeFile(filePath, fileText, true);
            }
        }
    } else {
        var AllDownloadLinks = getAllDownloadLinks();
        if (AllDownloadLinks.length > 0) {
            var fileText = "";
            var hostsList = [];
            var urlsList = [];
            var linkURL;
            var linkHost;
            var hostIndex;
            var counter;
            for (counter = 0; counter < AllDownloadLinks.length; counter++) {
                linkURL = getShortURL(AllDownloadLinks[counter]);
                if (!(AllDownloadLinks[counter].isFinished())) continue;
                if (FILE_SAVE_MODE > 1) {
                    urlsList.push(linkURL);
                } else {
                    linkHost = AllDownloadLinks[counter].getDownloadHost();
                    hostIndex = hostsList.indexOf(linkHost);
                    if (hostIndex < 0) {
                        hostsList.push(linkHost);
                        urlsList.push([]);
                        hostIndex = hostsList.length - 1;
                    }
                    urlsList[hostIndex].push(linkURL);
                }
            }
            if (urlsList.length > 0) {
                path.mkdirs();
                var filePath;
                if (FILE_SAVE_MODE > 1) {
                    fileText = urlsList.join(nl);
                    filePath = getPath(path + '\\History.txt');
                    writeFile(filePath, fileText, true);
                } else {
                    var counter2;
                    var counter3;
                    var fileCounter = 0;
                    var interUrl = '';
                    for (counter = 0; counter < urlsList.length; counter++) {
                        if (FILE_SAVE_MODE == 0 && MAX_URLS_PER_FILE > 0) {
                            counter3 = 0;
                            fileCounter = 0;
                            for (counter2 = 0; counter2 < urlsList[counter].length; counter2++) {
                                fileText += interUrl + urlsList[counter][counter2];
                                interUrl = nl;
                                counter3++; // count the URL just added before testing the limit, otherwise the first file of a host gets MAX_URLS_PER_FILE + 1 URLs
                                if (counter3 >= MAX_URLS_PER_FILE || counter2 == urlsList[counter].length - 1) {
                                    counter3 = 0;
                                    filePath = getPath(path + '\\' + hostsList[counter] + separator + fileCounter + '.txt');
                                    writeFile(filePath, fileText, true);
                                    fileCounter++;
                                    fileText = '';
                                    interUrl = '';
                                }
                            }
                        } else {
                            fileText = urlsList[counter].join(nl);
                            filePath = getPath(path + '\\' + hostsList[counter] + separator + fileCounter + '.txt');
                            writeFile(filePath, fileText, true);
                        }
                    }
                }
            }
        }
    }
}

function getShortURL(DownloadLink) {
    if (FILE_SAVE_MODE > 0) return DownloadLink.getContentURL() || DownloadLink.getPluginURL();
    var url = DownloadLink.getProperty("LINKDUPEID") || DownloadLink.getPluginURL();
    return url.replace(/(^(https?|ftp):\/\/[^\/]+\/)/, '').replace(/.+:\/\//, '');
}
I haven't tested the other modes yet, and although it does save a download when it finishes, it doesn't save it to the proper file yet (it increments the counter, or doesn't decrement it, or something like that; I haven't looked into it yet).
AFAIK the part that saves all the finished downloads into multiple host files with a URL limit per file is working fine.
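
For the "proper file" question, here is a minimal sketch of the decision on its own, kept separate from any file access. The name chooseAppendTarget and its inputs are assumptions for the illustration: "lineCounts" would come from reading the existing files of one host, and the file path would then be rebuilt from the returned counter before writing. One possible cause of the wrong file, not confirmed here, is that the path is not rebuilt after the counter is decremented.

```javascript
// Hypothetical helper: "lineCounts[i]" is the number of URLs already stored in file i
// of one host (an empty array means no file exists yet); "max" is the per-file limit.
function chooseAppendTarget(lineCounts, max) {
    var last = lineCounts.length - 1;
    // No file yet for this host: start with counter 0, nothing to separate from.
    if (last < 0) return { counter: 0, needsNewLine: false };
    // Room left in the newest file: append to it on a new line.
    if (lineCounts[last] < max) return { counter: last, needsNewLine: true };
    // Newest file is full: start the next one.
    return { counter: last + 1, needsNewLine: false };
}

var target = chooseAppendTarget([50, 12], 50);
// target.counter is 1 and target.needsNewLine is true
```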

To test it in "real conditions" I have created a script which, when I press a toolbar button, resets the last download (a 10kb image, so it doesn't take long) and starts downloads, so it properly triggers "download stopped" scripts.
  #9  
Old 14.10.2019, 07:34
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

It is better to use it like this, just in case the file paths also contain a folder which matches the host name. You can choose to keep/remove '.reverse()' depending on your preferred sort order.

Code:
var files = getPath(folder).getChildren().filter(function(filePath) {
    return filePath.isFile() && filePath.toString().indexOf(host) > -1;
}).reverse();
  #10  
Old 14.10.2019, 18:14
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

I am retrofitting my script prototype with your code!
So if I understand correctly, we can use a condition with "return" to return the data passed as argument to the function when the condition is true?
I never thought we could use the filter method like that!
I am more familiar with:
array.filter(word => word.toString().indexOf(host) > -1).reverse();
But it doesn't work in JD, so I guess there is no other way to do it as a method.
  #11  
Old 14.10.2019, 19:34
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 638
Default

Quote:
Originally Posted by Demongornot View Post
So if I understand correctly, we can use a condition with "return" to return the data passed as argument to the function when the condition is true?
Will create an array which contains only the elements which pass the test.

Quote:
Originally Posted by Demongornot View Post
I would be more familiar with :
array.filter(word => word.toString().indexOf(host) > -1).reverse();
Looks like the arrow function (=>) is ECMAScript 6 syntax; filter itself is fine. You can only use ECMAScript 5 (if I am correct) features in eventscripter.
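
A small standalone illustration of both points, using made-up sample paths: the callback given to filter only answers true or false for each element, and a function expression takes the place of the arrow function.

```javascript
// ES5-style: a function expression instead of the ES6 arrow function.
var host = 'host.com';
var paths = ['/history/host.com 0.txt', '/history/other.org 0.txt', '/history/host.com 1.txt'];

var matching = paths.filter(function(path) {
    // Returning true keeps the element, returning false drops it.
    return path.indexOf(host) > -1;
}).reverse();
// matching is ['/history/host.com 1.txt', '/history/host.com 0.txt']
// paths still holds its 3 elements, because filter builds a new array
```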
  #12  
Old 14.10.2019, 19:55
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 66,134
Default

Big thumbs up to @mgpai and @Demongornot
You do great work here and it's nice to see what great magic you can do with a little JavaScript and an idea in mind
__________________
JD-Dev & Server-Admin
  #13  
Old 20.10.2019, 19:40
Amiganer Amiganer is offline
DSL Light User
 
Join Date: Mar 2019
Posts: 30
Default mirror links?

Hello.

How will mirror links be managed? If I'm correct, "link.isFinished" is only triggered for the link that was really downloaded...

Last edited by Amiganer; 20.10.2019 at 19:44. Reason: new idea
  #14  
Old 22.10.2019, 19:33
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 66,134
Default

@Amiganer: link.isFinished sounds like remote api events?
__________________
JD-Dev & Server-Admin
  #15  
Old 23.10.2019, 11:28
Amiganer Amiganer is offline
DSL Light User
 
Join Date: Mar 2019
Posts: 30
Default

Quote:
Originally Posted by Jiaz View Post
@Amiganer: link.isFinished sounds like remote api events?
The link in the Code is:

Quote:
if (link.isFinished()) {
It is the first line that does something in the script....
As we found out (as I mentioned), for the mirror links it is necessary to run through all finished links manually. I hope "DOWNLOADED FROM MIRROR" is marked in the links.

Bye, Christian
  #16  
Old 23.10.2019, 14:57
Amiganer Amiganer is offline
DSL Light User
 
Join Date: Mar 2019
Posts: 30
Default Thoughts about it

Hello.

Something that comes to my mind....

A mirror link is produced when the original link is finished; that means a mirror link becomes obsolete when another link is finished....

So handling mirror links can be done when another link is finished.

Bye, Christian
  #17  
Old 23.10.2019, 17:47
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 66,134
Default

link.isFinished returns true on
Quote:
case FINISHED:
case FINISHED_MIRROR:
case FINISHED_CRC32:
case FINISHED_MD5:
case FINISHED_SHA1:
case FINISHED_SHA256:
so yes, it also returns true on mirror link
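
As an illustration only (this is not JDownloader's source), the same list of states can be written as a lookup. The state names are taken from the quote above; FINISHED_STATES and countsAsFinished are made-up names.

```javascript
// Illustration only: the states listed in the quote above, written as a lookup.
var FINISHED_STATES = ['FINISHED', 'FINISHED_MIRROR', 'FINISHED_CRC32', 'FINISHED_MD5', 'FINISHED_SHA1', 'FINISHED_SHA256'];

// Hypothetical helper: true when the given state name is one of the listed states.
function countsAsFinished(stateName) {
    return FINISHED_STATES.indexOf(stateName) >= 0;
}

var mirrorIsFinished = countsAsFinished('FINISHED_MIRROR');
// mirrorIsFinished is true
```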
__________________
JD-Dev & Server-Admin
  #18  
Old 15.11.2019, 13:09
Amiganer Amiganer is offline
DSL Light User
 
Join Date: Mar 2019
Posts: 30
Default Status of EventScript?

Hello.

May I ask about the status of the script? Is the latest version here running?

Is it possible to integrate the "old Already-Downloaded" list into it?
If not, I think I can make a script to do that...

Bye,
Christian
All times are GMT +2. The time now is 22:49.