  #6  
Old 13.10.2019, 21:19
Demongornot
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50

By undeterminable, I meant that reading such a file to check for duplicates would require testing every possible date to find it, from "host.com (script creation date).txt" to "host.com (current date).txt", incrementing by one unit at a time (one minute at worst, one month at best). The number of possible combinations is too high for that...
Incrementing a number, by contrast, takes almost no time or computing power.
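
To give a rough idea of the numbers, here is a minimal sketch that counts how many file names would have to be probed between two dates. The function name countCandidateNames and the granularity strings are made up for the illustration, and the dates are just an example:

```javascript
// Counts how many "host.com (date).txt" names would have to be tested
// between two dates, one name per time unit, both ends included.
// countCandidateNames and its granularity values are hypothetical helpers.
function countCandidateNames(start, end, granularity) {
  if (granularity === "month") {
    // Calendar months are not a fixed number of milliseconds, so count them directly.
    return (end.getUTCFullYear() - start.getUTCFullYear()) * 12 +
           (end.getUTCMonth() - start.getUTCMonth()) + 1;
  }
  var msPerUnit = { minute: 60 * 1000, hour: 3600 * 1000, day: 86400 * 1000 }[granularity];
  if (!msPerUnit) {
    throw new Error("Unknown granularity: " + granularity);
  }
  return Math.floor((end.getTime() - start.getTime()) / msPerUnit) + 1;
}

// Example: script created on 1 Sep 2019, checked on 13 Oct 2019 (UTC).
var created = new Date(Date.UTC(2019, 8, 1));
var checked = new Date(Date.UTC(2019, 9, 13));
var perMonth = countCandidateNames(created, checked, "month");
var perMinute = countCandidateNames(created, checked, "minute");
```

With a monthly interval only a couple of names need testing, but with a per-minute interval the count reaches tens of thousands after just six weeks.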

If there were a command like "myString[] = getFilesNameInFolder(path)", I could just go through the array and pick the entries that match "host + space (or any other separator) + string matching the date format + '.txt'", and from those, compute the dates to see which one comes next (I don't trust an array to be in order when it is built by a command that gathers a list).
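
If such a listing command existed, the filtering could look roughly like this. It is only a sketch: the array of names stands in for the result of the hypothetical getFilesNameInFolder(path), the YYYY-MM-DD date format is an assumption, and latestFileForHost is a name invented for the example:

```javascript
// Picks the most recent "host YYYY-MM-DD.txt" entry from a list of file names.
// fileNames stands in for the output of a hypothetical folder listing command.
function latestFileForHost(fileNames, host) {
  // Escape regex metacharacters so the dot in "host.com" is matched literally.
  var escapedHost = host.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var pattern = new RegExp("^" + escapedHost + " (\\d{4})-(\\d{2})-(\\d{2})\\.txt$");
  var best = null;
  var bestTime = -Infinity;
  // The list order is not trusted, so every entry is compared by its date.
  for (var i = 0; i < fileNames.length; i++) {
    var match = pattern.exec(fileNames[i]);
    if (!match) {
      continue;
    }
    var time = Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
    if (time > bestTime) {
      bestTime = time;
      best = fileNames[i];
    }
  }
  return best; // null when no file matches this host
}
```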

And even then, if the user decides for any reason to change the OS date (cheating in a game? Been there, done that), everything will be messed up.
Also, if the user downloads only once from a particular host in one month and downloads continuously from another host in another month, this will create some files with only one URL, so the whole path building and file opening is wasted on a single URL that has a low chance of being the one we are looking for anyway, and other files that could contain a ridiculously high number of URLs, especially if the user downloads many small files.
Occasional downloaders will get many files containing few URLs, and even if I add an option to choose the interval (hours, days, weeks?, months), there are still issues.

If the goal is to avoid checking for an already existing path, well, considering that the script also checks whether the link is already written inside the file (which I could make optional to save performance), it won't be compatible. Your idea, without checking for already existing URLs inside the files before writing, will obviously consume fewer resources than using path.exists() where path = host + " " + counter + ".txt".
But once we need to check added links for duplicates, the whole point of the system is lost, as there are now a lot of possible dates to check, with no way to know whether the last one we checked was the last one written until we reach the current date...
Of course, a solution would be to use a page/index file that contains all the file names, but that defeats the whole performance-saving purpose anyway, as incrementing a counter will be faster and lighter than reading strings from a file.
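
For comparison, the counter approach only needs existence checks to find the last file. A minimal sketch, where exists is a placeholder callback standing in for whatever path check the script really uses, and findHighestCounter is a name made up here (counters are assumed to start at 0):

```javascript
// Finds the highest counter N for which "host N.txt" exists, without reading any file.
// exists(fileName) is a placeholder for the real path check.
function findHighestCounter(host, exists) {
  var counter = 0;
  while (exists(host + " " + counter + ".txt")) {
    counter++;
  }
  return counter - 1; // -1 means no file exists yet for this host
}

// Usage with an in-memory stand-in for the file system:
var knownFiles = { "host.com 0.txt": true, "host.com 1.txt": true, "host.com 2.txt": true };
var highest = findHighestCounter("host.com", function (name) {
  return knownFiles.hasOwnProperty(name);
});
```

The loop stops at the first missing name, so the end of the series is always known, unlike with dates.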

Also, the date system doesn't really provide a good way to keep files small and under a certain size limit, as anyone can download far less from the same host in one month than in another.
And considering that I want to check backward, from the file with the highest counter number for the host down to the first, since URLs from older downloads are more likely to have expired, the counter system is still faster. Incrementing the counter and checking whether the path exists without reading the file, then going backward by decreasing the counter from its max value to 0 (which in any case does only one existence check and one read per file, just not sequentially), could be faster than doing the same by reading and sorting dates from an array, an array that depends on a file which can be altered, deleted, etc.
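
The backward check described above could be sketched like this. Both exists and readLines are placeholder callbacks for the real file access, findUrlNewestFirst is an invented name, and one URL per line is an assumption:

```javascript
// Looks for a URL in the counter files of a host, newest file first.
// exists(fileName) and readLines(fileName) are placeholders for real file access.
function findUrlNewestFirst(host, url, exists, readLines) {
  // Pass 1: existence checks only, to find the highest counter.
  var highest = -1;
  while (exists(host + " " + (highest + 1) + ".txt")) {
    highest++;
  }
  // Pass 2: read each file once, from the highest counter down to 0.
  for (var counter = highest; counter >= 0; counter--) {
    var fileName = host + " " + counter + ".txt";
    if (readLines(fileName).indexOf(url) !== -1) {
      return fileName; // duplicate found, stop reading older files
    }
  }
  return null; // the URL is not in any file for this host
}
```

Each file gets one existence check and at most one read, and the scan stops as soon as the URL is found in a recent file.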

If the goal were only to save URLs for the user to read (and not for another script), your idea would be better in terms of performance, considering that in JavaScript, unlike lower-level languages, we have the same command for both opening and appending text and for creating and writing text.