JDownloader Community - Appwork GmbH
 

  #1  
15.04.2020, 03:04
Coldblackice
Wind Gust

Join Date: Sep 2019
Location: San Francisco
Posts: 40
Script/LinkCrawler issues (like not respecting synchronous-execution setting)

I made a script that's supposed to crawl a website's profile page of user-saved favorites, and I'm running into three problems. First, the API call to LinkCrawler results in LinkCrawler running endlessly, never finishing. Second, it's finding thousands of links that I haven't asked it to find (I don't even know where they come from, since they're not in the page's source). Third, despite enabling the "Synchronous Execution" setting, LinkCrawler continues to get called repeatedly even though the first one's run still hasn't finished.

Here's the code:

Code:
// Download all favorited postings' images

if (interval >= 30000) {

    var linkz = [];

    // User profile's JSON URL containing all their saved/favorited/liked postings
    // JSON: **External links are only visible to Support Staff**

    var contentz = JSON.parse(getPage("**External links are only visible to Support Staff**"));
    var prodz = contentz.products;

    var keyz = Object.keys(prodz);

    // Construct the URLs of individual postings from each one's JSON "slug"

    keyz.forEach(function(key) {
        var itemz = prodz[key].slug;
        linkz.push("**External links are only visible to Support Staff**" + itemz + "/");
    });

    // Initiate URL crawl/download

    callAPI("linkgrabberv2", "addLinks", {
        "autostart": true, // <- Set to 'true' (without quotes) to autostart the downloads
        "deepDecrypt": true, // <- Set to 'true' to enable deep analyse
        "links": linkz.join(" ")
    });
}

Here's the LinkCrawler rule I set in Settings (since I couldn't find a way to do this in the script, which I'd prefer rather than a universal rule in Settings):
Code:
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1586860036960,
  "name" : "linkcrawlz",
  "pattern" : null,
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(**External links are only visible to Support Staff**]P0\\.jpg)",
  "rewriteReplaceWith" : null
} ]

Here's the growing stack of LinkCrawlers I see piling up on top of each other despite "Synchronous Execution" being enabled in the script:

JDownloader Scripted LinkCrawler issues_4.13.2020.jpg


And here's the Packagerizer rule I've set:

JDownloader Scripted LinkCrawler issues_Packagerizer Rule_4.13.2020.png


Any ideas what the issue(s) may be? All I want is for the script to find and download the handful of P0.jpg's on each page, which are found in the first 10 seconds of the script's LinkCrawler running. I'd like LinkCrawler to terminate after finding these P0.jpg's, ignoring everything else, so that the URLs can then be automatically sent to the Download tab to be downloaded.

I'm sure I'm going about this inefficiently, so any additional pointers would be appreciated. Ideally, I would do everything in the script itself -- the script, the LinkCrawler rule, the Packagerizer rule, and filters if necessary.
  #2  
15.04.2020, 10:54
raztoki
English Supporter

Join Date: Apr 2010
Location: Australia
Posts: 17,659

They all have different times.
Why don't you have a pattern in your LinkCrawler rule? A null pattern should make it an invalid rule.
Also, in the addLinks call in your JavaScript, maybe use a newline (\r\n) rather than a space to delimit the links.
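
A minimal sketch of the newline suggestion, assuming the rest of the script stays as posted. `buildAddLinksQuery` is a hypothetical helper name, and the example.com URLs are placeholders rather than the real site:

```javascript
// Hypothetical helper: builds the query object passed to
// callAPI("linkgrabberv2", "addLinks", ...) with one URL per line.
function buildAddLinksQuery(links) {
    return {
        "autostart": true,
        "deepDecrypt": true,
        "links": links.join("\r\n") // newline-delimited instead of space-delimited
    };
}

// Placeholder URLs, not the real site
var query = buildAddLinksQuery([
    "https://example.com/posting/first-slug/",
    "https://example.com/posting/second-slug/"
]);

// Inside the Event Scripter this would be followed by:
// callAPI("linkgrabberv2", "addLinks", query);
```

Whether the delimiter has any effect on the endless crawl is not confirmed in this thread; the change only removes one possible source of ambiguity in how the link list is split.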
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #3  
15.04.2020, 12:12
Coldblackice

Quote:
Originally Posted by raztoki
They all have different times.
Why don't you have a pattern in your LinkCrawler rule? A null pattern should make it an invalid rule.
Also, in the addLinks call in your JavaScript, maybe use a newline (\r\n) rather than a space to delimit the links.
Ooooh, is that the case?? I thought if Pattern was null it would just operate without it, using whatever URL it was fed. I didn't know it would invalidate it.

The reason I didn't add a pattern was because the P0.jpg's don't always have the same base URL, so I wouldn't know how to make a pattern that could encompass/capture multiple different top-level domains. Any suggestions on that?

And do you happen to know why LinkCrawler is allowed to run again and again even though the first run hasn't finished? I thought "Synchronous Execution" meant that only one script/LinkCrawler could run at a time.

Thanks for the help, btw
  #4  
15.04.2020, 14:31
raztoki

https?://[^/]+/

As for the sync question: no, I have never used the Event Scripter to crawl for links.
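
To illustrate why the `https?://[^/]+/` pattern covers images hosted on different base URLs, here is a small sketch that runs it against a few placeholder addresses. JDownloader itself evaluates rule patterns with Java regular expressions; this example uses JavaScript's `RegExp`, which reads this particular expression the same way. The hosts and the `matchesAnyHost` name are made up for the example:

```javascript
// The pattern from the post: any http or https URL on any host
var anyHost = new RegExp("https?://[^/]+/");

// Hypothetical helper for the demonstration
function matchesAnyHost(url) {
    return anyHost.test(url);
}

var samples = [
    "https://cdn1.example.com/images/abc_P0.jpg",
    "http://media.example.org/x/P0.jpg",
    "ftp://example.net/P0.jpg" // not http(s), so it should not match
];

var results = samples.map(matchesAnyHost);
```

Note that `test` looks for the pattern anywhere in the string, so this only shows that the host part is unconstrained.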
  #5  
15.04.2020, 19:54
mgpai
Script Master

Join Date: Sep 2013
Posts: 1,533

Quote:
Originally Posted by Coldblackice
... despite enabling the "Synchronous Execution" setting, LinkCrawler continues to get called repeatedly even though the first one's run still hasn't finished
"Synchronous Execution" works as expected. The script run will be finished almost immediately after the "addLinks" method is called, since that method only creates a crawler job and exits. It does not wait for the links to be collected/checked.

Use MYJD API methods to check if crawler/collector is active on each run, before adding the links.
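
One way to follow that advice is sketched below. It assumes the `isCrawling` method of the `linkcrawler` namespace and the `isCollecting` method of the `linkgrabberv2` namespace in the MyJDownloader API; both names should be checked against the API documentation for your build. `addLinksWhenIdle` is a hypothetical helper, and `callAPI` is passed in as a parameter only so the function can also be tried outside JDownloader:

```javascript
// Hypothetical helper: only queue links when no crawl/collect job is active.
// api: a function shaped like Event Scripter's callAPI(namespace, method, params)
function addLinksWhenIdle(api, links) {
    // Assumed MyJDownloader API methods (verify against your JDownloader build)
    var busy = api("linkcrawler", "isCrawling") || api("linkgrabberv2", "isCollecting");
    if (busy) {
        return false; // a previous job is still running, so skip this interval
    }
    api("linkgrabberv2", "addLinks", {
        "autostart": true,
        "deepDecrypt": true,
        "links": links.join("\r\n") // newline-delimited, as suggested earlier in the thread
    });
    return true;
}

// In the Event Scripter, inside the interval check:
// addLinksWhenIdle(callAPI, linkz);
```

With this guard the interval trigger can keep firing every 30 seconds, but a new crawler job is only created once the previous one has finished.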
All times are GMT +2.