JDownloader Community - Appwork GmbH
 

  #1  
Old 26.04.2022, 17:33
I3ordo
Mega Loader
 
Join Date: Mar 2022
Posts: 65
Automation for the crawljob?

For the last two months I have been "religiously" visiting my favourite site and checking a bookmarked (fixed-URL) search results page that lists the newest entries first: **External links are only visible to Support Staff**

After opening each post individually, I Ctrl+A and Ctrl+C the whole page so that the LinkGrabber can find the links and add them. Normally, if I don't open the actual post but just paste the link of the results page into JD, it finds nothing, even with deep crawl...

It's practical already, but only semi-automated. I would love to know if it can be fully automated: opening each post on the first two pages (there are hundreds of pages out there), grabbing the text that contains the links to the archives, and auto-starting the download with the appropriate provider (which it already does).

I have checked that there is an extension called "Folder Watch" which scans the "list.crawljob" file and auto-adds the jobs to the downloads, which is a very good starting point.
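For reference, a Folder Watch job file in its flat key=value flavour looks roughly like this (one job per block, blocks separated by a blank line; the URL and package name here are placeholders, and the extension also accepts a JSON-array variant of the same fields):

```
text=https://example.com/forum/post-12345
packageName=post-12345
autoStart=TRUE
enabled=TRUE

text=https://example.com/forum/post-12346
packageName=post-12346
autoStart=TRUE
enabled=TRUE
```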

Maybe I should find someone with Python automation and Selenium experience to create a script that auto-creates those links for the crawljob?

So the actual question becomes: can JD auto-crawl website pages daily, or should I go for the Python + Selenium option?
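The scripted half of that plan can stay small. A minimal sketch (stdlib only, no Selenium; the regex, key names, and file layout follow the flat crawljob format above and are assumptions, not an official API) that turns a fetched page into a Folder Watch job file:

```python
import re
from pathlib import Path

# Crude link pattern: good enough for a sketch, not for every site.
LINK_RE = re.compile(r'https?://[^\s"\'<>]+')

def extract_links(html):
    """Collect unique http(s) URLs from raw page HTML."""
    return sorted(set(LINK_RE.findall(html)))

def build_crawljob(links):
    """Render links in the flat key=value .crawljob flavour that
    Folder Watch scans: one block per job, blank line between blocks."""
    blocks = [f"text={url}\nautoStart=TRUE" for url in links]
    return "\n\n".join(blocks) + "\n"

def save_crawljob(links, watch_dir, name="list"):
    """Drop the rendered job file into the Folder Watch directory."""
    path = Path(watch_dir) / f"{name}.crawljob"
    path.write_text(build_crawljob(links), encoding="utf-8")
    return path
```

Fetching the page itself (with `urllib.request`, or Selenium if the site needs JavaScript) and scheduling the script daily via Task Scheduler/cron is the only part left outside this sketch.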
  #2  
Old 26.04.2022, 18:50
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,128

Quote:
Originally Posted by I3ordo View Post
So the actual question becomes: can JD auto-crawl website pages daily, or should I go for the Python + Selenium option?
I'm sorry, but JDownloader does not support such a feature *out of the box*.
Best would be to use/write external tools that automate the creation of crawljob files,
and/or use the EventScripter with a script that watches for and automates the adding of links, see
https://board.jdownloader.org/showthread.php?t=70525
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 26.04.2022 at 18:53.
  #3  
Old 26.04.2022, 19:10
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,128

Quote:
Originally Posted by I3ordo View Post
After opening each post individually, I Ctrl+A and Ctrl+C the whole page so that the LinkGrabber can find the links and add them.
This you can easily automate via the EventScripter: a script that runs at an interval, loads the HTML page, parses the links/content you are interested in, and then *feeds* JDownloader with them.
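The "feed JDownloader" step can also be done from an external script. A rough Python sketch, assuming JDownloader's local Click'n'Load service is enabled on its default port 9666 (the endpoint path and form-field names below reflect the Click'n'Load v1 convention and should be treated as an assumption, not guaranteed API):

```python
from urllib import request, parse

# Click'n'Load listens locally when the service is enabled in JD2.
CNL_ENDPOINT = "http://127.0.0.1:9666/flash/add"

def cnl_payload(links, source="https://example.com/"):
    """Encode newline-separated links as Click'n'Load form data."""
    return parse.urlencode({
        "source": source,
        "urls": "\n".join(links),
    }).encode("ascii")

def feed_jdownloader(links):
    """POST the links to a locally running JDownloader instance."""
    req = request.Request(CNL_ENDPOINT, data=cnl_payload(links))
    with request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("ascii", "replace")
```

With this, the external crawler does not even need the Folder Watch detour: it can push freshly parsed links straight into the LinkGrabber.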
__________________
JD-Dev & Server-Admin
  #4  
Old 03.05.2022, 04:48
I3ordo
Mega Loader
 
Join Date: Mar 2022
Posts: 65

Is there an already-working script that I can start with? I doubt I can find someone who would build a script for a few bucks.
  #5  
Old 03.05.2022, 13:47
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,128

@I3ordo: best would be to write a post in https://board.jdownloader.org/showthread.php?t=70525 and explain what you need / what the script should do.
__________________
JD-Dev & Server-Admin
  #6  
Old 28.11.2023, 00:57
Coldblackice
DSL User
 
Join Date: Sep 2019
Location: San Francisco
Posts: 38

Did you ever locate a script that does this?
  #7  
Old 07.01.2024, 17:44
tb21
Junior Loader
 
Join Date: Oct 2021
Posts: 14

It's been some weeks, but I guess some keywords and tools that help create what you are looking for might still be useful for everyone?

Windows machine:
I use a Ruby scripting environment with Watir (a Ruby extension/gem based on Selenium). Also in the mix: chromedriver.exe (required by Watir) and a portable chrome.exe. In JD2, the EventScripter plugin is used as well. All this allows for automatic parsing of websites and creating the required crawljob files for JD2.

So..
- install Ruby (for scripting and using Watir)
- install the Watir gem (for controlling the browser and parsing website content)
- install a chromedriver matching your Chrome version (to control chrome.exe)
- install/enable the Folder Watch plugin in JD2
- install/enable the EventScripter plugin in JD2
- learn Ruby and Watir and how to control the browser and extract links
- create crawljob files with Ruby
- move the crawljobs to the JD2 watch folder
- at this point you can start downloads manually in JD2, with your crawljob results present in the LinkGrabber section
- now learn the JD2 EventScripter plugin and how to handle crawljob results in the LinkGrabber if you need fully automatic processing and downloading.

I can already say you won't get away with a 20-line script. It takes quite some time to learn Ruby, Watir, the JD2 EventScripter etc., but if you can code, a working prototype is doable within a week or two.

Once you have a basic setup for crawling websites and pushing the desired links over to JD2, you are golden. From that point on it is rather easy to adapt the project to other scenarios and use cases.

If you need specific help, I can give some advice, I guess, but it's rather unlikely that anyone will come up with a downloadable package of files and tools plus a ready-to-use script on top, because parsing websites is kind of hard and also a moving target: websites tend to change a lot (e.g. new forum software, new layout, new style, new landing page, new domain, new login etc.).

I have to update my Ruby/Watir/EventScripter project regularly to keep it in working condition. I would also advise building a basic "web scraping" framework right from the start that is as flexible as possible. If you write all your code into one big blob, you will cry when things need to be parsed in a different order, the website changes its HTML rendering, or you update Ruby or any other component and everything falls apart at once.

You need to prepare things like "parse only one page and one link" instead of "parse and download everything". You also probably want logging, in the console and in text files, to be able to track down errors, results, or generally unexpected behaviour. You might also need a log of what you have already parsed/downloaded successfully, so you don't end up with hundreds of duplicate downloads if something goes wrong in the process.

Parsing and downloading a huge amount of links/data will break at some point in the process, I can tell you from experience. So you need to make sure you can continue from that exact moment (page and link) after fixing the problem.
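That "already parsed" log plus resume-after-failure behaviour can be a very small component. A sketch in Python rather than Ruby, purely illustrative (the class name and JSON-file layout are my own invention, not part of any of the tools mentioned here):

```python
import json
from pathlib import Path

class SeenLog:
    """Tiny persistent record of already-processed URLs, so a re-run
    after a crash skips what was handled and continues where it stopped."""

    def __init__(self, path):
        self.path = Path(path)
        self.seen = set()
        if self.path.exists():
            # Reload previous state so duplicates are skipped on restart.
            self.seen = set(json.loads(self.path.read_text(encoding="utf-8")))

    def is_new(self, url):
        return url not in self.seen

    def mark(self, url):
        """Record a URL and flush immediately, so a crash loses nothing."""
        self.seen.add(url)
        self.path.write_text(json.dumps(sorted(self.seen)), encoding="utf-8")
```

The crawler then only processes links where `log.is_new(url)` holds and calls `log.mark(url)` after each successful step; a restarted run naturally continues at the first unseen page/link.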

cu! o)
Provided By AppWork GmbH | Privacy | Imprint
Parts of the Design are used from Kirsch designed by Andrew & Austin
Powered by vBulletin® Version 3.8.10 Beta 1
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.