JDownloader Community - Appwork GmbH
 

  #1  
10.06.2024, 22:15
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67
Linkgrabber missing expected queryLinkCrawlerJobs results

I'm having an issue where the results from the RemoteAPI "/linkgrabberv2/queryLinkCrawlerJobs" are empty when queried with the jobUUIDs returned by "/linkgrabberv2/addLinks".

After sequentially sending 9000 links (99% mega.nz) via the API, around 200 of them never produced ANY result from "/linkgrabberv2/queryLinkCrawlerJobs" after the link was added.
(Running "/linkgrabberv2/queryLinks" with the jobUUIDs from "/linkgrabberv2/addLinks" does return the expected links, but that does not solve the issue.)

After sending some number of links, or after some hours, a jobUUID will no longer exist in "/linkgrabberv2/queryLinkCrawlerJobs". That is fine in itself, since I normally have the linkUUIDs and packageUUIDs associated with each jobUUID.
(There could be some sort of internal limit? But these jobs appear to still exist after restarting.)

These ~200 link URLs cannot be handled by my application in its current state, as only their jobUUID was saved.
I was expecting "/linkgrabberv2/queryLinkCrawlerJobs" to ALWAYS return a result. I will need to repair these entries by running "/linkgrabberv2/queryLinks" with the saved jobUUID.
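For reference, the repair call is the same shape as in the example further down, just keyed on the saved jobUUID (UUID and rid values here are placeholders):
Code:
"/linkgrabberv2/queryLinks" {"collectorInfo":true, "jobUUIDs":[saved_jobUUID]}:
{"data":[{... PackageUUID, Availability, Url ...}], "rid": my_rid_placeholder_here}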

JD Info:
Code:
Build Data: Fri Jun 07 17:51:22 CEST 2024
Java: AdoptOpenJDK - OpenJDK Runtime Environment - 1.8.0_265(64bit/X86)
OS: WINDOWS(WINDOWS_10_22H2)(64bit)
Core: #48254
Launcher: #5770
AppWork Utilities: #4055
Browser: #48227
Updater: #1061
Log File: **External links are only visible to Support Staff****External links are only visible to Support Staff**
Log File Password: **External links are only visible to Support Staff**(archive password is contained within this URL)

If you have any questions for me, I'll be in IRC (#jDownloader) as @TheGreenUser; DM me if that's more convenient.

I'm unable to provide more detailed logs at this time, but I do have one example below. I can find up to ~200 more if required within the 8 JD log folders I have. If debug logs are truly required, I can reset JD and attempt to re-create the issue.


Example:

Within log folder "1717954683594_", file "jd.controlling.linkcollector.LinkCollector.log.0", find "CrawlerJob:ID:1718007087379".
Code:
"/linkgrabberv2/addLinks" {"assignJobID": true, "overwritePackagizerRules": false, "packageName": "priority": "DEFAULT", "links": "**External links are only visible to Support Staff**, "sourceUrl":"my source url here"}:
{"data" :[[1718007087379]],"rid": my_rid_placeholder_here}"


"/linkgrabberv2/queryLinkCrawlerJobs":
{"collectorInfo":true,"jobIds":[1718007087379]}: {"data" :[],"rid":my_rid_placeholder_here}"

// program loop would normally end here for this response with error("received 0 LinkCrawlerJobs")

"/linkgrabberv2/queryLinks" {"collectorInfo":true, jobUUIDs [1718007087379]}:
{"data" :[{Availability:ONLINE BytesTotal:911906257 Comment: DownloadPassword: Enabled:true Host:mega.co.nz Name:VR PMV - Mother's Daughter.mp4 PackageUUID:1718007088166 Priority: Url:**External links are only visible to Support Staff**rid":1718047882924969911}

Basic program flow (only 1 link is handled at a time):
Code:
for each link in links {

  Send 1 link to "/linkgrabberv2/addLinks"
  Receive 1 CrawlerJob:ID or return error

  Save CrawlerJob:ID to the link's database record

  Wait 2 seconds
  var jobs JobLinkCrawlerSortables

  loop {
    // todo: replace "/linkgrabberv2/queryLinkCrawlerJobs" with "/linkgrabberv2/isCollecting",
    // then fetch the LinkCrawlerJobs once !Collecting (see the sketch after this block)

    jobs, err = "/linkgrabberv2/queryLinkCrawlerJobs" {"collectorInfo":true, "jobIds":[CrawlerJob:ID]}
    if err { return err } // should never error here unless the internet connection or JD itself has an issue

    var completedJobs = 0
    for _, job := range jobs {
      if !job.Crawling && !job.Checking {
        completedJobs++
      }
    }

    if completedJobs >= len(jobs) {
      break // note: this also breaks immediately when jobs is empty
    }

    if loop has taken more than 5 minutes {
      return error("failed to queryLinkCrawlerJobs after 5 minutes") // all 9000+ jobs have taken less than 6 seconds
    }
    Wait 3 seconds
  }

  if len(jobs) == 0 {
    // "data":[]
    return error("received 0 LinkCrawlerJobs") // <--- Here's the issue
  }

  for _, job := range jobs {
    // we should ONLY have one job, but just in case

    query links with job.jobId
    extract the unique packageUUIDs
    rename each packageUUID with a custom prefix

    if the links contain a link which is not "ONLINE" {
      do nothing, just log the issue to the link's database record
    } else {
      send the packageUUIDs to download
    }
  }

  Save any errors, jobIds, packageUUIDs, linkUUIDs to the link's database record
}
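For the todo in the loop above, the replacement polling step would look roughly like this (only a sketch, not implemented yet; queryLinkCrawlerJobs would still be needed afterwards for the job statistics):
Code:
loop {
  collecting, err = "/linkgrabberv2/isCollecting" {}
  if err { return err }
  if !collecting {
    // the crawler is idle, so the job results should be final
    jobs, err = "/linkgrabberv2/queryLinkCrawlerJobs" {"collectorInfo":true, "jobIds":[CrawlerJob:ID]}
    break
  }
  Wait 3 seconds
}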
Thanks
//TGU

Last edited by TGU; 15.06.2024 at 01:46. Reason: clarified Remote API, updated queryLinkCrawlerJobs to 3sec
  #2  
15.06.2024, 00:38
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67

Update: I was able to replicate this issue with Log: Debug Mode and MyJdownloaderSettings: Debug enabled while testing 3 pixeldrain links (it happens with any host) on the current version. The same basic program flow as above still applies. The issue should be quick to find in the logs.

It appears to only occur while downloads are running (possibly): on the 3rd or 4th attempt to reproduce it, I started the downloads and then added the links; but that could just be luck.

14.06.24 17.12.25 <--> 14.06.24 17.19.46 jdlog://4643411370661/

Remote API "/linkgrabberv2/queryLinkCrawlerJobs" is returning an empty data response for the first link

Code:
Link 1 @ 2024-06-14 17:16:55.9338301: j_1718403401910 (received 0 link crawler jobs)
Link 2 @ 2024-06-14 17:17:09.9908334: j_1718403415941,l_1718403427244,l_1718403427245,p_1718403427243
Link 3 @ 2024-06-14 17:17:24.0503298: j_1718403429998,l_1718403441283,p_1718403441282
The times given are those of the final database update, which happens on error or success.
Job, link and package IDs are prefixed with their first character plus an underscore.

Last edited by TGU; 15.06.2024 at 01:32.
  #3  
15.06.2024, 01:36
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

@TGU: A JobLinkCrawler (via "/linkgrabberv2/queryLinkCrawlerJobs") is only available while the job is still waiting/running and for a short time after it has finished. Once it has finished/processed all links, there is no guarantee how long it will remain available.
That is the desired behaviour. In short: when "/linkgrabberv2/queryLinkCrawlerJobs" no longer returns a JobLinkCrawler for a specific jobUUID, that job is finished and has been cleaned up.

You should make use of "assignJobID" in "/linkgrabberv2/addLinks" if you want to get hold of the resulting links via "/linkgrabberv2/queryLinks".
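Roughly like this (a sketch reusing the request shapes from your first post; UUIDs, URLs and rid values are placeholders):
Code:
"/linkgrabberv2/addLinks" {"assignJobID": true, "links": "<your url>", ...}:
{"data":[[<jobUUID>]], "rid": ...}

"/linkgrabberv2/queryLinks" {"jobUUIDs":[<jobUUID>]}:
{"data":[{... PackageUUID, Availability, Url ...}], "rid": ...}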

If necessary, I can introduce a new API method so you can *extend* the reachability of a JobLinkCrawler for a certain amount of time to prevent early cleanup.
__________________
JD-Dev & Server-Admin
  #4  
15.06.2024, 02:01
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67

Quote:
Originally Posted by Jiaz View Post
If necessary, I can introduce a new API method so you can *extend* the reachability of a JobLinkCrawler for a certain amount of time to prevent early cleanup.
Thanks for the quick response. I wasn't aware the cleanup could happen less than 2 seconds after the link was added (with only 1 link handled at a time). I am tracking all jobIds, but I expected queryLinkCrawlerJobs results to last more than a few seconds, though not forever. If the duration could be somewhat known/expected, that would be great.

The reason for using this endpoint instead of "/linkgrabberv2/queryLinks" is that I naturally also want the data about the job itself: isChecking, isCrawling, #broken, #crawled, #filtered, #unhandled. It just makes sense; I'm sure some of these could be derived via queryLinks, but not all of them. It worked well for ~9000 links, but the ~200 caused issues.
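For illustration, a normal response carries roughly these per-job fields (shape based on the fields I use; values are placeholders):
Code:
"/linkgrabberv2/queryLinkCrawlerJobs" {"collectorInfo":true, "jobIds":[<jobUUID>]}:
{"data":[{"jobId":<jobUUID>, "crawling":false, "checking":false, "broken":0, "crawled":2, "filtered":0, "unhandled":0}], "rid": ...}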

Last edited by TGU; 15.06.2024 at 02:04.
  #5  
15.06.2024, 02:04
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

Quote:
Originally Posted by TGU View Post
Thanks for the quick response. I wasn't aware the cleanup could happen less than 2 seconds after the link was added (with only 1 link handled at a time). I am tracking all jobIds, but I expected queryLinkCrawlerJobs results to last more than a few seconds, though not forever. If the duration could be somewhat known/expected, that would be great.
It's unknown; it's up to Java garbage collection. The reference to the JobLinkCrawler is weak, see docs.oracle.com/javase/8/docs/api/java/lang/ref/WeakReference.html
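In practice that means the very same query can flip from data to empty at any time after the job has finished, e.g.:
Code:
"/linkgrabberv2/queryLinkCrawlerJobs" {"jobIds":[<jobUUID>]}: {"data":[{...}], "rid": ...}  // shortly after the job finished
// ... garbage collection runs, the weak reference is cleared ...
"/linkgrabberv2/queryLinkCrawlerJobs" {"jobIds":[<jobUUID>]}: {"data":[], "rid": ...}       // entry is gone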

I think a way to extend the reachability would be the best solution for your case then?
__________________
JD-Dev & Server-Admin
  #6  
15.06.2024, 02:11
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67

Quote:
Originally Posted by Jiaz View Post
I think a way to extend the reachability would be the best solution for your case then?
Any solution would be good. I think an advanced setting would be best, as there is no real need to send the CrawlerJob lifetime duration with every API request. There's no immediate rush, but it would be nice to see in the near future.

:) I'm glad it's just a garbage collection "issue"

Last edited by TGU; 15.06.2024 at 02:14.
  #7  
15.06.2024, 02:19
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

I will think about an easy/fast solution to this.

Quote:
Originally Posted by TGU View Post
Any solution would be good. I think an advanced setting would be best
I agree. Most likely I will add a customizable *keep reachable* timeout via advanced settings.
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 15.06.2024 at 02:30.
  #8  
16.06.2024, 12:34
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

@TGU: To keep it nice & easy & simple: how about a new flag in AddLinksQuery that disables the auto-cleanup of JobLinkCrawler entries, plus a new cleanup call so you can manually remove them once you no longer need the information? That way nothing changes for existing usage, you can change the behaviour on a per-job basis, and you don't have to change advanced settings all the time.
What's your opinion on this?
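Purely as an illustration of the idea (nothing is implemented yet; the flag and method names below are placeholders):
Code:
"/linkgrabberv2/addLinks" {"assignJobID": true, "keepCrawlerJob": true, ...}   // hypothetical flag: skip auto cleanup
// ... query "/linkgrabberv2/queryLinkCrawlerJobs" as often as needed ...
"/linkgrabberv2/cleanupCrawlerJobs" {"jobIds":[<jobUUID>]}                     // hypothetical manual cleanup call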
__________________
JD-Dev & Server-Admin
  #9  
17.06.2024, 03:18
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67

Quote:
Originally Posted by Jiaz View Post
@TGU: To keep it nice & easy & simple: how about a new flag in AddLinksQuery that disables the auto-cleanup of JobLinkCrawler entries, plus a new cleanup call so you can manually remove them once you no longer need the information? That way nothing changes for existing usage, you can change the behaviour on a per-job basis, and you don't have to change advanced settings all the time.
What's your opinion on this?
:thumbup: That will work just fine, and it's more flexible for those who want to use it.

I don't suppose you could also add the ability to get the stored job UUIDs via QueryLinks & QueryPackages, as these aren't really available anywhere other than the "/linkgrabberv2/addLinks" response? (I would create a new thread, but I've made too many recently.)
  #10  
17.06.2024, 13:45
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

Quote:
Originally Posted by TGU View Post
I don't suppose you could also add the ability to get the stored job UUIDs via QueryLinks & QueryPackages, as these aren't really available anywhere other than the "/linkgrabberv2/addLinks" response? (I would create a new thread, but I've made too many recently.)
Packages don't have a job UUID; only links do. You can query links and take the package UUID from the link entry.
With the next update, add
Quote:
"jobUUID":true
to the Query object and the job UUID will be part of the returned links.
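A rough example (the exact response fields may differ):
Code:
"/linkgrabberv2/queryLinks" {"jobUUIDs":[<jobUUID>], "jobUUID":true}:
{"data":[{"uuid":<linkUUID>, "packageUUID":<packageUUID>, "jobUUID":<jobUUID>, ...}], "rid": ...}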
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 17.06.2024 at 13:51.
  #11  
17.06.2024, 22:38
TGU is offline
Mega Loader
 
Join Date: Jun 2024
Location: International Waters where DRM/DMCA protections are ignored
Posts: 67

:thumbup: Perfect, thanks for that; I've been thinking about this for a few years.
I'll provide updates for all the API changes in each thread once the core update is live.
  #12  
17.06.2024, 22:52
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,439

Quote:
Originally Posted by TGU View Post
I'll provide updates for all the API changes in each thread once the core update is live.
I'll ping you once those are live.
__________________
JD-Dev & Server-Admin