JDownloader Community - Appwork GmbH
 

  #961  
Old 05.10.2019, 04:16
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

@mgpai thanks a lot, but I have already started working on this using two scripts: one for finished downloads and another for new links. I use the "A Download Stopped" trigger plus a check of "myDownloadLink.isFinished();", and the "A new link has been added" trigger. It is far more optimised, performance-friendly and faster than trying to use a single script with many API calls. I mean, I already feel bad because I have to go through the same array twice xD.
I am busy with a lot of things and my life rhythm leaves me little free time, so the script(s) progress slowly, but they are still one of my priorities.

By the way, how many URLs per file should I use by default to optimise both memory and performance? If the URL is found in the first file(s), having many small files pays off, but it is penalised if the match is in the last one (also keeping in mind that I'll first compare against the downloads already in JD).

Also, my current code removes the http(s) and the www from the link when present, because I thought there could be cases where one file has been downloaded through "http" and another through "https". Knowing that some websites work both with and without "www", it might also reduce the number of different URLs pointing to the same address. Is that a good idea, or should I remove it or make it optional?
  #962  
Old 05.10.2019, 05:13
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,297
Default

Removing the protocol and subdomain prefix might improve things, but won't solve it entirely, as many sites have multiple domains and/or continually add new ones. Some plugins set a unique identifier (most sites have a uid) to combat that. We also typically correct all URLs in JD to one format: protocol://domain/(path/)?uid. Adding checksum verification, either against the checksum advertised on the hoster's end or confirmed on your end, could also help. Note that many alter small components of files to create unique checksums.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #963  
Old 05.10.2019, 07:56
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
... I use "A Download Stopped" + checking "myDownloadLink.isFinished();"...
This will not return all 'finished' links, but only the ones that stopped. Mirror links are marked as finished by JD without being downloaded. It is also possible for the user to mark a download as "finished" (using the context menu command) without ever starting or completing it. Neither of them will trigger this event.

While it is possible to iterate the package links with this event [link.getPackage().getDownloadLinks()] and find the related links which were marked by JD as "mirror finished", you may not be able to get the finished link from a package using the same method if the user has manually marked all the links in the package as 'finished'.

Quote:
Originally Posted by Demongornot View Post
... By the way, how much URLs per files should I put by default to optimise both memory and performances ...
Not sure. Most modern systems have the resources to handle large files. I have created a few scripts which need to read/load several 100,000 links at once, without having to limit the number (based on feedback from the people who are using them). Also, a plain text file which contains only the URLs will take far fewer resources (I presume) than keeping the links in the list in native format (as the user currently does), which also contains the various link properties and download-progress data.

While a single file will be easier to manage, a multiple-file design might be required/useful in some cases. Guess Jiaz can provide insight on this matter.

It may also be easier to create files on per hoster basis (as suggested by Jiaz), instead of limiting the number of urls per file. It will prevent having to iterate all the stored urls, by only having to query those that belong to a particular hoster.
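
A minimal sketch of the per-hoster idea, using an in-memory object instead of files. The names `createHistory`, `addToHistory` and `isInHistory` are made up for illustration and are not part of the Event Scripter API; the hosts and URLs are placeholders.

```javascript
// Hypothetical in-memory history keyed by hoster name; not a JDownloader API.
function createHistory() {
    return {};
}

// Store a URL under its hoster so later lookups only touch that hoster's entries.
function addToHistory(history, host, url) {
    if (!Object.prototype.hasOwnProperty.call(history, host)) {
        history[host] = {};
    }
    history[host][url] = true;
}

// Return true when the URL was already recorded for this hoster.
function isInHistory(history, host, url) {
    return Object.prototype.hasOwnProperty.call(history, host) &&
        Object.prototype.hasOwnProperty.call(history[host], url);
}

var seen = createHistory();
addToHistory(seen, "example.com", "example.com/file/abc123");
addToHistory(seen, "example.org", "example.org/d/xyz789");
```

A lookup for a link hosted on example.org never has to look at the example.com entries, which is the benefit described above. With files, each hoster key would map to one file instead of one object.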

Quote:
Originally Posted by Demongornot View Post
Also my current code remove the http(s) and the www when it is in the link because I thought that there could be cases where one files have been downloaded through a "http" and another through "https", and knowing that some websites accept both with and without "www" it might also reduce possible different URLs pointing to the same address, but is it a good idea or should I remove it or make it optional ?
Quote:
Originally Posted by raztoki View Post
.. We also typically also correct all urls into JD to one format protocol://domain/(path/)?uid
While rare, it is quite possible the same file will be added to JD with a different url. If a "LINKDUPEID" is available for a link you can use it instead of or in conjunction with the download url. Most users may like to have it as a default feature rather than optional.

Code:
var url = link.getProperty("LINKDUPEID") || link.getPluginURL();
  #964  
Old 07.10.2019, 02:22
Amiganer Amiganer is offline
DSL Light User
 
Join Date: Mar 2019
Posts: 30
Default Preventing double Downloads

Hello.

I use the script from post 950 in this thread. It copies all finished downloads to a new container (=Already Downloaded). The listed flag is disabled.
Over the last few days, I found that not all links are moved. There is no pattern to it, except that most of them are small files like pictures and some are larger archives.
Does the script have to be run in synchronous mode? I do not have that enabled.

Bye,
Christian
  #965  
Old 07.10.2019, 11:30
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Amiganer View Post
... In this few days, I found out that not all links are moved. There is no system in that, only most of them are small files like pictures, some are longer archives ...
I was not able to reproduce the behavior. Are those files mirrors by any chance? Mirrors are marked by JD as finished, but will not trigger that ("download stopped") event, and hence will not be moved. If so, the script can be modified to identify/move such files, or a script similar to the first script in #954 can be used.
  #966  
Old 08.10.2019, 00:20
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

@raztoki I'll try to use the plugin uid rather than host when this is possible then.

@mgpai
Could you provide me with URLs that would create mirror links recognised by JDownloader?
I don't know how it does that, or whether the files need to be in the same folder. I tried downloading the same file from two different hosts into different folders, and it just downloaded normally. Then I tried again with the same folder, and I was only prompted that the file already existed, with no option for any kind of mirror.
So I have no idea how it works in JDownloader.
Also, thinking about it, I was about to implement logic that checks whether the file has been downloaded by checking its size, but all those cases match a scenario where we should add the URLs to the list of already downloaded files anyway.

I was thinking about using one file per host, but creating another file when it reaches a certain number of links, for example host, then host1, host2, etc.
Actually, to avoid hosts with a number in their name messing things up, I would write something like host!1 or host_1, as valid hostnames only accept letters, numbers, dots and the "-" sign anyway.

Also, in the absence of LINKDUPEID and a plugin id, is .getContentURL() the right one for individual files? There are so many 'URLs' that I don't know which one to use...
  #967  
Old 08.10.2019, 08:49
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
... Could you provide me urls that would create mirror links recognized by JDownloader. I don't know how it does that, if it need to be on the same folder or not
JD will use name, size, hash etc. (depending on default/user settings), to determine mirror links.

MIRROR LINK: Is detected by JD at the time of starting the download, by comparing it with other links in the SAME package based on the "Mirror Detection" settings (Advanced Settings).

When a download is completed, the final status of that link will be set to "FINISHED" and that of its "mirrors" will be set to "FINISHED_MIRROR". You will just need to query that status to determine if a download is finished.

DUPLICATE FILE : If a file with same name exists in the destination folder (irrespective of the package where the download link originated from), JD will consider it as a duplicate file.
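
The status check described above could look roughly like this. It is only a sketch: it assumes the script has already obtained the link's final status as a string, and how that string is read from the link is deliberately left out. Only the two status names come from this post; `countsAsDownloaded` is a made-up helper name.

```javascript
// Assumption: finalStatus is the link's final status, already converted to a string.
// "FINISHED" and "FINISHED_MIRROR" are the two values named in the post above.
function countsAsDownloaded(finalStatus) {
    var status = String(finalStatus);
    return status === "FINISHED" || status === "FINISHED_MIRROR";
}
```

Treating both values as "downloaded" means a mirror that JD never actually fetched still ends up in the history, which matches the goal of not downloading the same file again.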

Quote:
Originally Posted by Demongornot View Post
Also, in absence of LINKDUPEID and plugin id, is .getContentURL() the right one for individual files ? There are so many 'URLs' that I don't know which one use...
link.getPluginURL() will always return the final url (AFAIK). On the other hand, link.getContentURL() will be null if the container is encrypted.
  #968  
Old 09.10.2019, 17:40
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

I couldn't trigger a mirror link status even with two identical files (same size, different hosts, different names, but I turned off name matching in the advanced settings).

Anyway, after experimenting I concluded that myDownloadLink.getDownloadHost() is reliable for getting a proper host name for the files that will contain the URLs.

I checked what myDownloadLink.getPluginURL() and myDownloadLink.getProperty("LINKDUPEID") return.
For PluginURL, sometimes I get protocol://domain/(path/)?uid as raztoki stated, and sometimes I get domain://(path/)?uid.
For LINKDUPEID, I get either domain://(path/)?uid or (path/)?uid, and sometimes a format like websitecom_(path/)?uid. For example, (protocol://)website.com/path/video/?quality=480 turned into websitecom_path_480p.

I never saw a PluginURL in the domain-first format without LINKDUPEID being identical, but just in case, I made the code filter out the domain and protocol anyway.

So I wrote this code, which always returns the (path/)?uid or the LINKDUPEID version of it:

Code:
var myShortURL = Discombobulator(myDownloadLink.getPluginURL(), myDownloadLink.getProperty("LINKDUPEID"));
function Discombobulator(pluginURL, LINKDUPEID) {
    var shortURL = ''; //Check if there is a LINKDUPEID and take LINKDUPEID or PluginURL depending
    if (LINKDUPEID == null) {
        shortURL = pluginURL;
    } else {
        shortURL = LINKDUPEID.toString();
    }
    var authority = shortURL.indexOf('://');
    if (authority < 0) return shortURL; //Check if URL contain '://' if not return it as it is already the shortest
    /*Check if there is a protocol before the '://' meaning it contain protocol and host.
If it contain protocol, remove protocol and host and return, otherwise remove host and return*/
    var shorterURL = shortURL.substring(authority + ('://').length);
    if (turboEncabulator(shortURL.substring(0, authority))) return shorterURL.substring(shorterURL.indexOf('/') + 1);
    return shorterURL;
}

function turboEncabulator(bit) {
    var protocols = ['http', 'https', 'ftp'];
    for (var i = 0; i < protocols.length; i++)
        if (bit == protocols[i]) return true;
    return false;
}
Do you think this code is solid enough to deal with all possible cases or did I miss something ?

Last edited by Demongornot; 09.10.2019 at 17:43.
  #969  
Old 09.10.2019, 18:56
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
Do you think this code is solid enough to deal with all possible cases or did I miss something ?
Not all links will have a unique ID. It is better to include the domain name in "shortURL". For example, if the url is "https://board.jdownloader.org/images/logo.png", the script will currently generate only "images/logo.png" as "shortURL".

Also, "LINKDUPEID" is not useful outside of JD. It is better to store the final url in its original format, and strip the protocol only when comparing them during dupe check. This will allow the list to be used outside of JD (review/edit/open link in browser/Add back to JD etc.).

The plugin url is not always useful (e.g. youtube plugin url) outside of JD. It is better to use content url wherever possible and have plugin url as fallback (From what I have seen, this will return a usable url when content url is null).
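
A small sketch of that fallback. `getContentURL()` and `getPluginURL()` are the link methods mentioned in this thread; `pickStorableUrl` and `fakeLink` are hypothetical names, and `fakeLink` only builds stand-in objects so the snippet can run outside JD.

```javascript
// Prefer the content URL and fall back to the plugin URL when it is null.
function pickStorableUrl(link) {
    return link.getContentURL() || link.getPluginURL();
}

// Stand-in object for illustration only; real links come from JDownloader.
function fakeLink(contentUrl, pluginUrl) {
    return {
        getContentURL: function () { return contentUrl; },
        getPluginURL: function () { return pluginUrl; }
    };
}

var fromEncryptedContainer = fakeLink(null, "https://example.com/file/1");
var regularLink = fakeLink("https://example.com/a", "https://example.com/b");
```

For `fromEncryptedContainer` the function returns the plugin URL, and for `regularLink` it returns the content URL.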

You can strip the protocol at the time of dupe check. For example:
Code:
var duplicate = linkInList.replace(/https?:\/\/(www\.)?/, "") == linkInJD.replace(/https?:\/\/(www\.)?/, "");
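
The same comparison wrapped in reusable functions, as a sketch. Two details differ from the one-liner above and are my own assumptions: the pattern is anchored to the start of the string and is case-insensitive. `stripProtocolAndWww` and `isDuplicate` are made-up names.

```javascript
// Strip a leading http(s):// and an optional www. so equivalent URLs compare equal.
function stripProtocolAndWww(url) {
    return String(url).replace(/^https?:\/\/(www\.)?/i, "");
}

// Two stored/queued URLs are treated as duplicates when their stripped forms match.
function isDuplicate(linkInList, linkInJD) {
    return stripProtocolAndWww(linkInList) === stripProtocolAndWww(linkInJD);
}
```

With this, "http://www.example.com/file/1" and "https://example.com/file/1" compare as duplicates, while the stored list can still keep the URLs in their original format.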

Not in any way suggesting this is the way to do it. Just sharing my thoughts on the subject.
  #970  
Old 09.10.2019, 21:47
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

Is it really necessary to keep the domain name, given that the file in which the short URL is saved will already be named after the domain?
Or are you suggesting that this isn't enough, as subdomain.domain.com can turn into domain.com when using myDownloadLink.getDownloadHost()?
In that case, I already have code which returns the whole domain and subdomains without the protocol and path, and I could use that as the file name. But this means that links from the same domain with different subdomains won't be checked against each other, so I think getDownloadHost is better in that regard.

Alternatively, as you said :
Quote:
Originally Posted by mgpai View Post
While rare, it is quite possible the same file will be added to JD with a different url. If a "LINKDUPEID" is available for a link you can use it instead of or in conjunction with the download url. Most users may like to have it as a default feature rather than optional.
I could write each line as: completeURL ShortURL:LINKDUPEID and use "(space)ShortURL:" as the keyword for indexOf. If it returns -1, I check the whole line, which is the complete URL minus protocol and "www". When it returns a positive value, ShortURL would be the LINKDUPEID when available; otherwise it would be the shortURL produced by the code I posted earlier.
Would that work better?
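
One possible reading of that line format, as a sketch. It assumes stored URLs never contain a literal space, so " ShortURL:" can safely act as a separator; `formatHistoryLine` and `parseHistoryLine` are made-up names.

```javascript
// Separator between the complete URL and the optional short form.
var MARKER = " ShortURL:";

// Build one history line; the short part is omitted when there is none.
function formatHistoryLine(completeURL, shortURL) {
    return shortURL ? completeURL + MARKER + shortURL : completeURL;
}

// Split a history line back into its two parts using indexOf on the marker.
function parseHistoryLine(line) {
    var at = line.indexOf(MARKER);
    if (at < 0) {
        return { url: line, shortURL: null };
    }
    return {
        url: line.substring(0, at),
        shortURL: line.substring(at + MARKER.length)
    };
}
```

A line without the marker falls back to comparing the whole line, which corresponds to the -1 case described above.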

Well, if we want users to be able to interact with the URLs manually, the plugin URL indeed isn't the way to go, and I like your dupe check code.
  #971  
Old 09.10.2019, 22:35
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
Is it really necessary to keep the domain name as the file in which short URL would be saved will be already named as the the domain name ?
If you are using a multiple-file system (per host basis), it should be enough to just store the "shortURL" generated by your snippet. Associating the "shortURL" with the host (e.g. link.getDownloadHost()+shortURL) would be required only if a single-file system is adopted. I should have made the distinction clear in my previous reply. Sorry for the confusion.
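
The distinction can be sketched as follows. `buildHistoryKey` is a hypothetical helper, and the "/" placed between host and shortURL is my own choice for readability; the post above only concatenates the two.

```javascript
// Build the key that gets stored in the history.
// perHostFiles = true: the file name already identifies the host, so shortURL is enough.
// perHostFiles = false: prefix the host so identical paths on different hosts stay distinct.
function buildHistoryKey(host, shortURL, perHostFiles) {
    return perHostFiles ? shortURL : host + "/" + shortURL;
}
```

For example, "images/logo.png" on its own is ambiguous in a single shared file, while "board.jdownloader.org/images/logo.png" is not.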

Storing the urls in original format may not be necessary if the script will be primarily used for dupe check.
  #972  
Old 09.10.2019, 22:46
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

No problem.
So I'll name the files getDownloadHost_number.txt, each containing one shortURL per line.
Considering the default path will be JD_HOME + '\\History', this isn't really meant for the user but rather for the dupe check.
Using the short URL has the advantage of reducing file size and the work required when checking for a match.
But the way it works could allow users to set their own path, so I guess I could add an option to store the whole URL, only without protocol and www. This would be a one-time decision, as changing formats later would obviously complicate things. That's why I hadn't really considered making it an option, but if someone wants it, why not.
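
A sketch of how the numbered file name could be derived. It assumes files are numbered from 1 and that each file holds at most `maxPerFile` lines; both are illustrative choices, and `historyFileName` is a made-up name.

```javascript
// Pick the file that the next URL for this host should be written to.
// storedCount is the number of lines already stored for the host across all its files.
function historyFileName(host, storedCount, maxPerFile) {
    var fileNumber = Math.floor(storedCount / maxPerFile) + 1;
    return host + "_" + fileNumber + ".txt";
}
```

With a limit of 1000 lines, the first 1000 URLs of example.com go to example.com_1.txt and the next one starts example.com_2.txt. The underscore cannot appear in a valid hostname, so a host whose name ends in a digit cannot be confused with a numbered file.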
  #973  
Old 09.10.2019, 22:54
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
So I'll make files title being getDownloadHost_number.txt containing lines being shortURL.
Should be fine.
  #974  
Old 09.10.2019, 23:53
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

Quote:
Originally Posted by mgpai View Post
Should be fine.
Great
By the way, am I fine using only http, https and ftp as protocols?
I read that JDownloader also supports Metalinks and Podcasts, and I don't know how those work. From a quick read I understood that a Metalink is a collection of regular URLs, but I don't know how JDownloader handles those.
  #975  
Old 10.10.2019, 20:31
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
.... am I good using only http, https and ftp as protocols ?
I read that JDownloader also support Metalinks and Podcasts, and I don't know how those protocols work, as what I understood from a quick read is that Metalink is a collection of regular URL but I don't know how JDownloader handle those anyway.
With regards to protocol, I know of one more - "usenet". There might be others. Jiaz should be able to confirm.

As far as the containers are concerned, the final url will always be available as 'content url' (regular container) or 'plugin url' (encrypted/protected containers).

You can also try this code to generate the 'shortURL':
Code:
var link = myDownloadLink;
var host = link.getDownloadHost();
var url = link.getProperty("LINKDUPEID") || link.getPluginURL();
var shortURL = url.replace(new RegExp(".+:\/\/.*" + host + "/"), "").replace(/.+:\/\//, "");
  #976  
Old 10.10.2019, 20:44
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

Good, I was afraid it could be a list of URLs separated by commas or something like that.

Awesome code. I haven't yet learned how to handle strings and characters with regular expressions, replace and all that.
I tested it, and it is impressive how much this can filter out in a single line, including cases with subdomains that getDownloadHost() doesn't return, and the domainpluginname:// case too!

Last edited by Demongornot; 10.10.2019 at 20:56.
  #977  
Old 10.10.2019, 21:19
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 644
Default

Quote:
Originally Posted by Demongornot View Post
... how to control those string and character yet for regular expression, replace and all that ...
I had used the 'download host' in the expression to make it restrictive. But if the subdomains can be either before or after the domain, based on the examples you provided in this post (pre-edit), you can use a broader match pattern without including the download host in it.

Code:
var link = myDownloadLink;
var url = link.getProperty("LINKDUPEID") || link.getPluginURL();
var shortURL = url.replace(/(^(https?|ftp):\/\/[^\/]+\/)/, "").replace(/.+:\/\//, "");
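
To see what the two replace steps do, here is the same expression wrapped in a function and run on a few made-up inputs. `toShortURL` and all sample URLs are placeholders, not real links.

```javascript
// Step 1 removes "protocol://host/" for http, https and ftp.
// Step 2 removes anything up to and including a remaining "://" (e.g. "pluginname://").
function toShortURL(url) {
    return url
        .replace(/(^(https?|ftp):\/\/[^\/]+\/)/, "")
        .replace(/.+:\/\//, "");
}

var samples = [
    "https://board.jdownloader.org/images/logo.png",
    "ftp://files.example.com/pub/archive.zip",
    "examplehost://video/12345",
    "video/12345"
];
var results = [];
for (var i = 0; i < samples.length; i++) {
    results.push(toShortURL(samples[i]));
}
// results: "images/logo.png", "pub/archive.zip", "video/12345", "video/12345"
```

One thing to keep in mind: the second replace is greedy, so a path that itself embeds another "://" (for example a redirect parameter) is cut at the last "://" and loses the part before it.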

Last edited by mgpai; 10.10.2019 at 22:32.
  #978  
Old 11.10.2019, 00:06
Demongornot Demongornot is offline
JD Beta
 
Join Date: Sep 2019
Location: Universe, Local group, Milky Way, Solar System, Earth, France
Posts: 50
Default

I tested your two snippets and mine, and concluded that mine and your second one do the same thing, but your first one has trouble when the "DownloadHost" differs from what is in the URL.

Using this code :
Trigger : Downloadlist Contextmenu Button Pressed
Code:
myDownloadlistSelection = dlSelection;
if (myDownloadlistSelection.isLinkContext() == true) {
    var myDownloadLink = myDownloadlistSelection.getContextLink();

    var rAr = Discombobulator(myDownloadLink.getPluginURL(), myDownloadLink.getProperty("LINKDUPEID"));
    var host = myDownloadLink.getDownloadHost();
    var url = myDownloadLink.getProperty("LINKDUPEID") || myDownloadLink.getPluginURL();
    var rAr1 = url.replace(new RegExp(".+:\/\/.*" + host + "/"), "").replace(/.+:\/\//, "");
    var rAr2 = url.replace(/(^(https?|ftp):\/\/[^\/]+\/)/, "").replace(/.+:\/\//, "");
    var nl = getEnvironment().getNewLine();
    var sep = nl + "_______________________________" + nl;
    var t = ["Demongornot's :" + nl, "mgpai's 1 :" + nl, "mgpai's 2 :" + nl, "Host :" + nl, "LINKDUPEID or Plugin URL :" + nl];
    alert(t[0] + rAr + sep + t[1] + rAr1 + sep + t[2] + rAr2 + sep + t[3] + host + sep + t[4] + url);
}

function Discombobulator(pluginURL, LINKDUPEID) {
    var shortURL;
    if (LINKDUPEID == null) {
        shortURL = pluginURL;
    } else {
        shortURL = LINKDUPEID.toString();
    }
    var authority = shortURL.indexOf('://');
    if (authority < 0) return shortURL;
    var shorterURL = shortURL.substring(authority + ('://').length);
    if (turboEncabulator(shortURL.substring(0, authority))) return shorterURL.substring(shorterURL.indexOf('/') + 1);
    return shorterURL;
}

function turboEncabulator(bit) {
    var protocols = ['http', 'https', 'ftp'];
    if (protocols.indexOf(bit.toLowerCase()) >= 0) return true;
    return false;
}
I got this :
Code:
Demongornot's :
embed/xxxx
_______________________________
mgpai's 1 :
streamango.com/embed/xxxx
_______________________________
mgpai's 2 :
embed/xxxx
_______________________________
Host :
fruithosts.net
_______________________________
LINKDUPEID or Plugin URL :
(protocol)streamango.com/embed/xxxx
This happens because the host name and the URL don't match. Other than that, your first script works in all cases where there are subdomains before or after, AFAIK.
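
The mismatch can be reproduced outside JD with plain strings. In this sketch `shortByHost` is a made-up wrapper around the host-based expression, and "https" is assumed for the elided "(protocol)" in the output above.

```javascript
// Host-based variant: strips everything up to and including "host/".
// When the host does not occur in the URL, only the protocol is removed by the second step.
function shortByHost(url, host) {
    return url
        .replace(new RegExp(".+://.*" + host + "/"), "")
        .replace(/.+:\/\//, "");
}

var url = "https://streamango.com/embed/xxxx";
var mismatched = shortByHost(url, "fruithosts.net"); // host differs from the URL's domain
var matched = shortByHost(url, "streamango.com");    // host equals the URL's domain
```

With the mismatched host, the first replace finds nothing to remove, so the domain survives in the result; with the matching host, only the path is left.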

Last edited by Demongornot; 11.10.2019 at 00:10.
  #979  
Old 11.10.2019, 20:17
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 66,134
Default

@mgpai/Demongornot: I suggest creating a new thread for the discussion about the development/ideas/questions for the dupe/history support. I can then move the posts to the new thread.
Sorry that I'm so quiet, but I don't have much time at the moment :(
__________________
JD-Dev & Server-Admin
  #980  
Old 11.10.2019, 20:50
dsfsdfasdfasf dsfsdfasdfasf is offline
Vacuum Cleaner
 
Join Date: May 2012
Posts: 15
Default

Hey mgpai, jiaz sent me here. Is it possible to blacklist a proxy via eventscripter when it causes a 403 geoblocking state?
All times are GMT +2. The time now is 13:14.