#1
Script/LinkCrawler issues (like not respecting synchronous-execution setting)
I made a script that's supposed to crawl a website's profile page of a user's saved favorites. The primary problem is that the API call to LinkCrawler results in LinkCrawler running endlessly. The second problem is that it finds thousands of links I haven't asked it for (I don't even know where they come from, since they're not in the page's source). The third issue is that despite enabling the "Synchronous Execution" setting, LinkCrawler keeps getting called repeatedly even though the first run still hasn't finished. Here's what I'm seeing:
Here's the code:
Code:

// Download all favorited postings' images
if (interval >= 30000) {
    var linkz = [];
    // User profile's JSON URL containing all their saved/favorited/liked postings
    // JSON: **External links are only visible to Support Staff**
    var contentz = JSON.parse(getPage("**External links are only visible to Support Staff**"));
    var prodz = contentz.products;
    // Construct the URLs of individual postings from each one's JSON "slug"
    Object.keys(prodz).forEach(function(key) {
        var itemz = prodz[key].slug;
        linkz.push("**External links are only visible to Support Staff**" + itemz + "/");
    });
    // Initiate URL crawl/download
    callAPI("linkgrabberv2", "addLinks", {
        "autostart": true,   // <- set to 'true' (without quotes) to autostart the downloads
        "deepDecrypt": true, // <- set to 'true' to enable deep analysis
        "links": linkz.join(" ")
    });
}

Here's the LinkCrawler rule I set in Settings (since I couldn't find a way to do this in the script, which I'd prefer to a universal rule in Settings):
Code:
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1586860036960,
  "name" : "linkcrawlz",
  "pattern" : null,
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(**External links are only visible to Support Staff**]P0\\.jpg)",
  "rewriteReplaceWith" : null
} ]

Here's the growing stack of LinkCrawlers I see piling up on top of each other despite "Synchronous Execution" being enabled in the script: JDownloader Scripted LinkCrawler issues_4.13.2020.jpg

And here's the Packagerizer rule I've set: JDownloader Scripted LinkCrawler issues_Packagerizer Rule_4.13.2020.png

Any ideas what the issue(s) may be? All I want is for the script to find and download the handful of P0.jpg's on each page, which turn up within the first 10 seconds of the LinkCrawler run. I'd like LinkCrawler to terminate after finding those P0.jpg's, ignoring everything else, so the URLs can then be sent automatically to the Download tab. I'm sure I'm going about this inefficiently, so any additional pointers would be appreciated. Ideally, I would do everything in the script itself -- the script, the LinkCrawler rule, the Packagerizer rule, and filters if necessary.
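One guess at the flood of unrequested links: "deepDecrypt": true in addLinks deep-scans every page the crawler reaches, and with "pattern": null the rule above never restricts which pages that applies to. Below is a hedged sketch of the same rule with host-agnostic regexes filled in; both pattern strings are assumptions on my part, since the real URLs are redacted above, so adjust them to the actual site:

```json
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1586860036960,
  "name" : "linkcrawlz",
  "pattern" : "https?://[^/]+/.+/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://[^/]+/[^\"]+P0\\.jpg)",
  "rewriteReplaceWith" : null
} ]
```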
#2
they all have different times
why don't you have a pattern with your linkcrawler rule? null should make it an invalid rule. maybe with addLinks in your javascript, use a new line \r\n vs a space to delimit
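A minimal sketch of that delimiter change, with the URL-building from the first post factored into a plain function. The function name, the sample base URL, and the slug values are all hypothetical, since the real URLs are hidden above:

```javascript
// Build the "links" parameter for linkgrabberv2/addLinks from a parsed
// "products" object, joining with CRLF instead of spaces.
function buildLinksParam(products, baseUrl) {
    var links = [];
    Object.keys(products).forEach(function (key) {
        // each product's "slug" identifies its posting page
        links.push(baseUrl + products[key].slug + "/");
    });
    // newline-delimited, per the suggestion above
    return links.join("\r\n");
}

// In the EventScripter script this would then be passed as, e.g.:
// callAPI("linkgrabberv2", "addLinks", { "autostart": true, "links": buildLinksParam(prodz, base) });
```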
__________________
raztoki @ jDownloader reporter/developer http://svn.jdownloader.org/users/170 Don't fight the system, use it to your advantage. :]
#3
Quote:
The reason I didn't add a pattern was that the P0.jpg's don't always have the same base URL, so I wouldn't know how to make a pattern that could capture multiple different top-level domains. Any suggestions on that? And do you happen to know why LinkCrawler is allowed to run again and again despite the first one not finishing? I thought "Synchronous Execution" meant that only one script/linkcrawler could run at a time. Thanks for the help, btw
#4
https?://[^/]+/
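To illustrate how that host-agnostic prefix covers multiple domains, a quick check in plain JavaScript (the example URLs are made up):

```javascript
// "https?://[^/]+/" matches any scheme + host, so a deepPattern built on it
// captures P0.jpg links regardless of which domain serves them.
var p0Pattern = /https?:\/\/[^\/]+\/.*P0\.jpg/;

var urls = [
    "https://cdn-one.example/img/abcP0.jpg",  // matches
    "http://other.example/files/xyzP0.jpg",   // matches, different host + scheme
    "https://cdn-one.example/img/thumb.jpg"   // no match: not a P0.jpg
];
var hits = urls.filter(function (u) { return p0Pattern.test(u); });
```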
and no, on the sync question: I have never played with EventScripter to crawl for links.
#5
Quote:
Use MyJD API methods to check whether the crawler/collector is active on each run, before adding the links.
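A sketch of that check, written so the decision logic can be tested outside JDownloader. It assumes the API's "linkcrawler" namespace answers an isCrawling query (check the method name against your MyJD/EventScripter docs); the wrapper takes the API function as a parameter so it can be stubbed:

```javascript
// Only hand new links to the linkgrabber when no crawl is already running.
// 'api' stands in for the EventScripter's global callAPI function.
function addLinksIfIdle(api, links) {
    if (api("linkcrawler", "isCrawling")) {
        return false; // a previous crawl is still in progress; skip this round
    }
    api("linkgrabberv2", "addLinks", {
        "autostart": true,
        "links": links
    });
    return true;
}
```

In the script itself this would be invoked as something like addLinksIfIdle(callAPI, linkz.join("\r\n")), so repeated timer ticks no longer stack crawlers on top of each other.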