JDownloader Community - Appwork GmbH

#901 - 24.09.2019, 14:14 - Jiaz (JD Manager)

Thanks @mgpai for your fast help. As always, a great teacher!

#902 - 24.09.2019, 14:16 - Jiaz (JD Manager)

@Amiganer: DLC is an encrypted container format meant for sharing files. It is not required here. @Demongornot: I would recommend a separate script for a context menu action, maybe to add/remove a link from the history. That way Fetter Biff can just add links to the list and manually add them to the history.
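For illustration, a minimal sketch of such a context menu script, assuming the history is a plain-text file of URLs; the trigger name, the lgSelection accessor and the file path are assumptions, not confirmed API:
Code:
// Trigger: "Linkgrabber Contextmenu Button Pressed" (assumed name)
// Appends the right-clicked link's URL to a plain-text history file.
var historyFile = JD_HOME + "/cfg/history.txt"; // hypothetical location
var link = lgSelection.getContextLink();        // assumed accessor
writeFile(historyFile, link.getContentURL() + "\r\n", true);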

#903 - 24.09.2019, 14:22 - Fetter Biff (Guest)

Quote:
maybe those links should be in that database, as those links were for download (and so should be checked)
Yes, that would be good.

DLC is a "container", a file that stores links, the status of links, downloads and some more information, if I am right. One can save the links from JD's download and linkgrabber lists to a DLC, and load the links from a DLC back into the linkgrabber / download window.

#904 - 24.09.2019, 14:23 - Fetter Biff (Guest)

What actually is the history?

#905 - 24.09.2019, 14:25 - Jiaz (JD Manager)

Quote:
Originally Posted by Fetter Biff
Yes, that would be good.

DLC is a "container", a file that stores links, the status of links, downloads and some more information, if I am right.
I'm sorry, but DLC doesn't store status or other meta information. That meta information is not supported at all, which is why this container is purely meant for sharing and not for export/import/backup.


Quote:
Originally Posted by Fetter Biff
One can save the links from JD's download and linkgrabber lists to a DLC, and load the links from a DLC back into the linkgrabber / download window.
You can easily add the downloadListXXX.zip from the cfg folder as a container and then choose to import it into the linkgrabber. That way all meta information is kept!

#906 - 24.09.2019, 14:36 - Fetter Biff (Guest)

Quote:
I'm sorry, but DLC doesn't store status or other meta information. That meta information is not supported at all, which is why this container is purely meant for sharing and not for export/import/backup.
Very sorry. So only the links are stored in a DLC file? Encrypted? So you cannot use them outside of JD?

Quote:
You can easily add the downloadListXXX.zip from the cfg folder as a container and then choose to import it into the linkgrabber.
I do not have that any more, just the DLC.

#907 - 24.09.2019, 14:48 - Jiaz (JD Manager)

DLC containers can be opened with other download managers as well, but not with a text editor, because they are encrypted, correct.
For what you're trying to achieve (training/filling the history), the DLC will do fine.

#908 - 24.09.2019, 14:56 - Fetter Biff (Guest)

Alright, thank you.

#909 - 25.09.2019, 07:23 - Demongornot (JD Beta)

How can I get the list of added crawled links from a crawler ID or a job ID when it is finished?

I tried multiple things using JSON strings and objects, but I always got errors...
Code:
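    // Failed attempts: the query is built as a JSON string instead of an
    // object, and jobIds ends up serialized as a float rather than a long[]
    // (see the deserialization error quoted further down).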
    var lscq = '{\r\n  "collectorInfo" : true,\r\n  "jobIds": ' + jid + '\r\n}';
    var lscq2 = '{"collectorInfo" : true, "jobIds": ' + jid + '}';
    var jst = JSON.stringify(lscq);
    alert(callAPI("linkgrabberv2", "queryLinkCrawlerJobs", lscq2));


Edit:
Never mind about the error; after trying many variants I finally found a way:
Code:
    var lscq = {
        "collectorInfo": true,
        "jobId": jid
    };
    alert(callAPI("linkgrabberv2", "queryLinkCrawlerJobs", lscq));
I still don't know which API call to use to get the links that said crawler job ID added, but I'll keep trying to find it myself until someone can answer.

Last edited by Demongornot; 25.09.2019 at 10:03.

#910 - 25.09.2019, 10:27 - Jiaz (JD Manager)

@Demongornot: When adding links, you can specify to *remember/save* the jobID as meta information, so you can later use the jobID to query all found/processed links from the job. What do you want to achieve? Then I can help better.
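As a rough sketch of that idea (the assignJobID field is part of AddLinksQuery, as discussed later in this thread; that the response carries the job ID as "id" is an assumption):
Code:
// Add links with assignJobID enabled, keep the returned job id, and
// later query exactly the links that resulted from this job.
var job = callAPI("linkgrabberv2", "addLinks", {
    "links": "https://example.com/a https://example.com/b", // hypothetical URLs
    "assignJobID": true
});
var jobId = job.id; // assumed response field
// ... once crawling has finished:
var links = callAPI("linkgrabberv2", "queryLinks", { "jobUUIDs": [jobId] });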

#911 - 25.09.2019, 11:39 - Demongornot (JD Beta)

Well, for the anti-duplicate-downloads script, I am trying to use only a single script with the API event trigger. So far, by looking at all API events using "alert(event);" while downloads end and links are added to the linkgrabber, I found these two that fit my needs:
"event.id : LINK_UPDATE.finished" for download ended, which provides all the info I need.
"event.id : STOPPED event.publisher : linkcrawler"
Though the latter doesn't really give me what I need; only the job ID and crawler ID are really usable, and the event ID "FINISHED" gives only those two.

I tried using those IDs to find the list of links the job added, using the API call "queryLinkCrawlerJobs", which returns nothing, and "CrawledLinkQuery", which doesn't work because the job ID causes an error by being read as a float while a long is required, even if I use "parseInt(jobId);", and even though the same variable containing said job ID works fine with "queryLinkCrawlerJobs".
I previously got the same error with the first code in my previous post; it read:
Code:
Can not deserialize instance of long[] out of VALUE_NUMBER_FLOAT token
 at [Source: {
  "collectorInfo" : false,
  "jobIds" : 1.569388206182E12

Lately I tried to go the opposite way, using "var myCrawlerJob = myCrawledLink.getSourceJob();", but I got a "null" result...

I could simply go through all crawled links and check their URLs, but this isn't a really optimised solution when multiple crawler jobs, each adding multiple links, are running...

Also, I found that "getAllCrawledLinks" after the API trigger "STOPPED" or "FINISHED" only returns a partial list of links when crawling a URL with multiple links; the last links are not in my array, and actually only a few of the crawled links show up... So I was forced to use a sleep delay to get them all...

My other solution would be to take the job ID or crawler ID (whichever is bigger) and go through all the crawled links in descending order, treating every link whose UUID is larger than the job ID or crawler ID. Sadly the list isn't ordered from first to latest added, so one of the added links might sit in the first package, forcing me to go through basically all the other links and check their UUIDs...
The only optimisation I can see is to use the event.data of the "STOPPED" API trigger, count how many links have been added using its "offline", "online" and "links" properties, and end the loop once I have analysed the same number of link UUIDs. But here is the trap: I need a delay to let all the links become available to "getAllCrawledLinks", which means I can overlap with the next crawler job and not get the correct number of links analysed...

So I am out of ideas about how to analyse only the latest links against the already existing ones in a CPU- and memory-friendly way...

#912 - 25.09.2019, 13:20 - Jiaz (JD Manager)

Lots of text! I'll try my best to understand and help.

Notice: please don't use UUIDs the way you do now, because there is no guarantee that they will stay the same. Just treat them as plain identifiers and avoid *magic* like comparing them via greater/less...

Instead of using the API event system, better use the *ON_NEW_LINK* event, which is triggered for every new link added to the linkgrabber list; you just have to check for duplicates and can then modify/remove the link. A minimal sketch follows.
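A minimal sketch of that approach, assuming the history is a plain-text file of URLs (the file path is hypothetical, and removing the duplicate via removeLinks is an assumption; the trigger exposes the new link as "link"):
Code:
// Trigger: "A New Link Has Been Added" (synchronous execution recommended)
var historyFile = JD_HOME + "/cfg/history.txt"; // hypothetical history file
var url = link.getContentURL();
var history = getPath(historyFile).exists() ? readFile(historyFile) : "";
if (history.indexOf(url) > -1) {
    // Duplicate: remove the freshly crawled link from the linkgrabber again.
    callAPI("linkgrabberv2", "removeLinks", [link.getUUID()], []);
} else {
    writeFile(historyFile, url + "\r\n", true); // remember the new URL
}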

Or you can make use of *NEW_CRAWLER_JOB*, then toggle setAssignJobID(true) and (after the next core update) getUUID(), and later make use of the queryLinksParameter method with jobUUIDs, see https://my.jdownloader.org/developers/#tag_265
jobUUIDs is a long array, and that's the cause of the error you got: you're passing a plain number where a long[] is expected to retrieve all links that are the result of the job.

The API event system is very lightweight, whereas the native events provide much more data/methods.

getSourceJob is only available during the crawling process and is cleared once the link is in the linkgrabber.

Remember, you can contact me via mail and IRC chat.

#913 - 25.09.2019, 14:28 - Demongornot (JD Beta)

OK, I see, thanks for the answer. I would still like to use the API event trigger if possible, because it would allow me to use a single script.
Obviously my first thought was to use two scripts, one on new link added and another on download stop. But if I could get the list of added links per job ID or crawler ID, it would be more practical and better optimised, as the shared variables and some of the tests would only be loaded into memory and executed once. Also, LINK_UPDATE.finished is better than the "Download Stopped" trigger, which doesn't necessarily fire only when a download has finished and so forces a test on every download stop. It makes the code simpler: instead of testing whether the download is finished, I only test whether this is the API event I am looking for, which encloses the whole code in one "if" while letting the two main features of this script share functions.
Edit: Being able to get the crawler job from the crawler ID would be not only practical but also quite logical.

Last edited by Demongornot; 25.09.2019 at 14:31.

#914 - 25.09.2019, 16:03 - Jiaz (JD Manager)

Quote:
Originally Posted by Demongornot
Edit: Being able to get the crawler job from the crawler ID would be not only practical but also quite logical.
Hmm, maybe I don't understand, but /linkgrabberv2/queryLinkCrawlerJobs?query is what you are looking for?
https://my.jdownloader.org/developers/#tag_262
There is no method to look up a crawlerJob by crawlerID, because there is no way to retrieve that information.
When you add new links/jobs, you will get a jobID for the crawlerJob and can then use it to query for status/links.
There is no entry point that returns a crawlerID, because multiple crawlerIDs can be involved during crawling.


Quote:
Originally Posted by Demongornot
if I could get the list of added links per job ID or crawler ID, it would be more practical
This requires setAssignJobID to be set to true, so each resulting link will have a reference to its source job. You must either add the links with this option enabled or toggle it via script, because it increases the memory footprint.

Please note that a single jobID can result in multiple crawlers with different IDs:
the crawlerJob is the input -> one or more crawlers process it -> resulting links.
Enable setAssignJobID, and the resulting links for a crawlerJob can be queried via the
queryLinksParameter method with jobUUIDs, see https://my.jdownloader.org/developers/#tag_265
Last edited by Jiaz; 25.09.2019 at 16:06.

#915 - 25.09.2019, 19:58 - Demongornot (JD Beta)

Well, I tried "queryLinkCrawlerJobs" using this:
Code:
var eventX = event;
if (eventX.publisher == 'linkcrawler' && eventX.id == 'STARTED') {
    var dt = JSON.parse(eventX.data);
    var jid = dt.jobId;
    var cid = dt.crawlerId;
    var lscq = {
        "collectorInfo": true,
        "jobId": jid
    };
    alert((callAPI("linkgrabberv2", "queryLinkCrawlerJobs", lscq)));
}
But it returns "[]" only. Also, if I get the terms right, a "crawler ID" is the ID of a crawler searching links from a single URL, while a "job ID" identifies the process of looking for links from all the URLs that were put into JD, which encapsulates as many crawlers as there are URLs (which is why there can be multiple crawler IDs?).
Or did I get it wrong? Between "job", "crawler job" (though I guess those two are the same, but you never know) and "crawler", I am not sure what is actually what...
But well, yes, I get a jobID and I would like to retrieve the links from it.

The issue is that it looks like a complicated mess: "queryLinkCrawlerJobs" returns nothing, and even if it did, I would get a "List<JobLinkCrawler>", but "JobLinkCrawler" isn't used by any API method. I also need to set "setAssignJobID" to "true", but the only place I find it is in "AddLinksQuery", which isn't returned by any API method either; I can't find any "queryLinksParameter" at all...

It looks like it would require a really messy chain of API calls to get from a jobID to the list of links it added...
If only I could get an "added time" for crawled links, it would simplify things: I would simply look for the latest added one and find those that came from the same job using ".getSourceJob". But I don't know how to get the jobID from that, and ".getSourceJob" returns "null"; though with the added date itself I could at least filter out those older than the job...
Also, if I knew which of ".getContainerURL", ".getContentURL", ".getOriginURL", ".getReferrerURL" and ".getURL" was actually the original URL the crawler used to find them, I could simply compare all links sharing that same xxxURL.

I'm out of ideas for how to get only the latest added links, even by going through the whole crawled link list.

#916 - 26.09.2019, 07:41 - mgpai (Script Master)

Quote:
Originally Posted by Demongornot
... I need to set "setAssignJobID" to "true", but the only place I find it is in "AddLinksQuery"...
Quote:
Originally Posted by Jiaz
... you can make use of *NEW_CRAWLER_JOB*, then toggle setAssignJobID(true) ...
Code:
// Store Job ID in crawled links
// Trigger: "New Crawler Job"

job.setAssignJobID(true);

#917 - 26.09.2019, 13:03 - Demongornot (JD Beta)

@mgpai: Thank you!
Just a question: if I use this command after the first link has been added, will said link still be referenced?

@Jiaz:
Does it increase memory only while the crawl is running, or does that stay? And if it is the latter, can it be cleared?

Also, I would love an API method to get a "job" from a "jobId", please!

#918 - 26.09.2019, 14:14 - mgpai (Script Master)

Quote:
Originally Posted by Demongornot
... if I use this command after the first link has been added, will said link still be referenced?
It needs to be enabled/run BEFORE the source url/text is added to JD.

Quote:
Originally Posted by Demongornot
Also, I would love an API method to get a "job" from a "jobId", please!
Code:
var myJobId = jobId;
var apiLinks = callAPI("linkgrabberv2", "queryLinks", {
    "jobUUIDs": [myJobId]
});
alert(apiLinks);

#919 - 26.09.2019, 15:45 - Demongornot (JD Beta)

Well... Using "job.setAssignJobID(true);" on the "New Crawler Job" trigger causes this:
"TypeError: Cannot find function setAssignJobID in object org.jdownloader.extensions.eventscripter.sandboxobjects.CrawlerJobSandbox@1c578d. (#1)"

Details here:
Spoiler:
net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function setAssignJobID in object org.jdownloader.extensions.eventscripter.sandboxobjects.CrawlerJobSandbox@86a6d7. (#1)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
at script(:1)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at org.jdownloader.scripting.JSHtmlUnitPermissionRestricter$SandboxContextFactory.doTopCall(JSHtmlUnitPermissionRestricter.java:119)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at net.sourceforge.htmlunit.corejs.javascript.Context.evaluateString(Context.java:1212)
at org.jdownloader.extensions.eventscripter.ScriptThread.evalUNtrusted(ScriptThread.java:286)
at org.jdownloader.extensions.eventscripter.ScriptThread.executeScipt(ScriptThread.java:178)
at org.jdownloader.extensions.eventscripter.ScriptThread.run(ScriptThread.java:158)


Quote:
Originally Posted by mgpai
It needs to be enabled/run BEFORE the source url/text is added to JD.
Thank you, though sadly for me it probably means I can't toggle it from anything other than the "New Crawler Job" trigger, as I guess the API event "STARTED" from "linkcrawler" fires too late, after the source URL/text has been added. And even if it didn't, by the time I extract the jobId from the API data, get the job and call that line, the first link could already be added anyway...


Quote:
Originally Posted by mgpai
Code:
var myJobId = jobId;
var apiLinks = callAPI("linkgrabberv2", "queryLinks", {
    "jobUUIDs": [myJobId]
});
alert(apiLinks);
Thanks, but it returns [] only, even when I make sure links are still being crawled.
Using this code:
Spoiler:
Code:
if (event.publisher == 'linkcrawler' && event.id == 'STARTED') {
    var dt = JSON.parse(event.data);
    var myJobId = dt.jobId;
    var apiLinks = callAPI("linkgrabberv2", "queryLinks", {
        "jobUUIDs": [myJobId]
    });
    alert(apiLinks);
}

#920 - 26.09.2019, 15:55 - Jiaz (JD Manager)

Quote:
Originally Posted by Demongornot
Well... Using "job.setAssignJobID(true);" on the "New Crawler Job" trigger causes this:
"TypeError: Cannot find function setAssignJobID in object org.jdownloader.extensions.eventscripter.sandboxobjects.CrawlerJobSandbox@1c578d. (#1)"
The new methods in Job have been available since yesterday evening with the latest update. Just update your JDownloader.

Quote:
Originally Posted by Demongornot
Thank you, though sadly for me it probably means I can't toggle it from anything other than the "New Crawler Job" trigger, as I guess the API event "STARTED" from "linkcrawler" fires too late, after the source URL/text has been added. And even if it didn't, by the time I extract the jobId from the API data, get the job and call that line, the first link could already be added anyway...
It's important to make that script blocking/synchronized, so the crawling process doesn't start before you have changed the settings. "New Crawler Job" is the easiest way, as you already have access to the job itself and can change stuff; the crawling process will start only after the script has ended (synchronized). Using the API event will also be possible (after the next core update, and synchronized), but at the moment there is no API method available to change the job remotely.

Quote:
Originally Posted by Demongornot
Thanks, but it returns [] only, even when I make sure links are still being crawled.
Using this code:
Spoiler:
Code:
if (event.publisher == 'linkcrawler' && event.id == 'STARTED') {
    var dt = JSON.parse(event.data);
    var myJobId = dt.jobId;
    var apiLinks = callAPI("linkgrabberv2", "queryLinks", {
        "jobUUIDs": [myJobId]
    });
    alert(apiLinks);
}
jobUUIDs expects a long array, and on STARTED it returns [] because the crawler has not yet started. You will have to wait for crawling to be finished,
or you'll get no/partial results.
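Putting the pieces together, a sketch of the complete flow, assuming (as in the posts above) that setAssignJobID(true) was enabled via the "New Crawler Job" trigger and that event.data carries the jobId:
Code:
// Trigger: the API event trigger used throughout this thread.
// Query the links of a job once its crawler reports FINISHED.
if (event.publisher == 'linkcrawler' && event.id == 'FINISHED') {
    var jobId = JSON.parse(event.data).jobId;
    var apiLinks = callAPI("linkgrabberv2", "queryLinks", {
        "jobUUIDs": [jobId] // a long array, not a bare number
    });
    alert(apiLinks);
}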