JDownloader Community - Appwork GmbH
 

  #1  
Old 26.04.2022, 16:33
I3ordo is offline
Mega Loader
 
Join Date: Mar 2022
Posts: 65
Automation for the crawljob?

For the last two months I have been "religiously" visiting my favourite site, or rather a bookmarked (fixed URL) search results page that shows the newest entries first at the top: **External links are only visible to Support Staff**

After opening each post individually, I Ctrl+A and Ctrl+C the whole page so that the LinkGrabber can find the links and add them. Normally, if I don't open the actual post but just paste the link of the results page into JD, it finds nothing, even with deep crawl...

It is already practical, but only semi-automated. I would love to know if it can be made fully automatic: opening each post on the first two pages (there are hundreds of pages out there), grabbing the text that contains the links leading to the archives, and auto-starting the download with the appropriate provider (which it already does).

I have found that there is an extension called "Folder Watch" which scans the "list.crawljob" file and auto-adds its entries to the downloads, which is a very good starting point.
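
For reference, a .crawljob file is just a small text file that the Folder Watch extension picks up from its watch folder and hands to the LinkGrabber. Below is a minimal Python sketch of creating one; the watch-folder path and the exact field names are assumptions based on how Folder Watch is commonly configured, so check them against your own JDownloader installation.

Code:
# Minimal sketch: write one .crawljob file for JDownloader's Folder Watch extension.
# Assumptions: the simple "key=value" crawljob format and a "folderwatch" directory
# inside the JDownloader installation; adjust both to your setup and JD version.
from pathlib import Path
import time

FOLDERWATCH_DIR = Path(r"C:\JDownloader2\folderwatch")  # assumed watch-folder location

def write_crawljob(url: str, package: str) -> Path:
    """Create a .crawljob file so Folder Watch adds `url` to the LinkGrabber."""
    job = "\n".join([
        f"text={url}",             # link(s)/text JDownloader should crawl
        f"packageName={package}",  # package name shown in the LinkGrabber
        "autoStart=TRUE",          # start the download automatically
        "autoConfirm=TRUE",        # move the links from LinkGrabber to the download list
        "enabled=TRUE",
    ])
    path = FOLDERWATCH_DIR / f"{int(time.time())}_{package}.crawljob"
    path.write_text(job, encoding="utf-8")
    return path

if __name__ == "__main__":
    write_crawljob("https://example.com/some-post.html", "my_package")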

Maybe I should find someone with Python automation and Selenium experience to create a script that auto-creates those links for the crawljob?

So the actual question becomes: can JD auto-crawl website pages daily, or should I go for the "Python + Selenium" option?
  #2  
Old 26.04.2022, 17:50
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,516

Quote:
Originally Posted by I3ordo
So the actual question becomes: can JD auto-crawl website pages daily, or should I go for the "Python + Selenium" option?
I'm sorry, but JDownloader does not support such a feature *out of the box*.
The best approach would be to use/write external tools that automate the creation of crawljob files,
and/or to use the EventScripter with a script that watches for and automates the adding of links; see
https://board.jdownloader.org/showthread.php?t=70525
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 26.04.2022 at 17:53.
  #3  
Old 26.04.2022, 18:10
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,516

Quote:
Originally Posted by I3ordo
After opening each post individually, I Ctrl+A and Ctrl+C the whole page so that the LinkGrabber can find the links and add them.
You can easily automate this via the EventScripter: a script that runs at an interval, loads the HTML page, parses the links/content you are interested in, and then *feeds* JDownloader with it.
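
Jiaz is referring to the EventScripter here, which runs JavaScript inside JDownloader. As an external-script stand-in for the same loop (the route the OP was already considering), here is a rough Python sketch: fetch the results page at an interval, pull out the post links, and hand anything new over as a crawljob file. The URL and the link pattern are placeholders, and write_crawljob() is the hypothetical helper sketched earlier in the thread.

Code:
# Rough sketch of "run at an interval, load the HTML page, parse the links, feed JD",
# done as an external Python script (Jiaz's actual suggestion uses the EventScripter
# inside JDownloader). RESULTS_URL and the href pattern are placeholders.
import re
import time
import urllib.request

from crawljob_helper import write_crawljob  # hypothetical module holding the earlier write_crawljob() sketch

RESULTS_URL = "https://example.com/search?sort=newest"              # assumed bookmarked results page
POST_LINK = re.compile(r'href="(https://example\.com/posts/\d+)"')  # assumed link pattern

seen = set()
while True:
    html = urllib.request.urlopen(RESULTS_URL).read().decode("utf-8", "replace")
    for url in POST_LINK.findall(html):
        if url not in seen:
            seen.add(url)
            write_crawljob(url, package="auto")  # hand the new post over to JDownloader
    time.sleep(24 * 60 * 60)                     # check once a day, as asked in the thread
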
__________________
JD-Dev & Server-Admin
  #4  
Old 03.05.2022, 03:48
I3ordo is offline
Mega Loader
 
Join Date: Mar 2022
Posts: 65

Is there an already working script that I could start with? I doubt I can find someone who would build a script for a few bucks.
  #5  
Old 03.05.2022, 12:47
Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,516

@I3ordo: The best approach would be to write a post in https://board.jdownloader.org/showthread.php?t=70525 and explain what you need / what the script should do.
__________________
JD-Dev & Server-Admin
  #6  
Old 27.11.2023, 23:57
Coldblackice is offline
Wind Gust
 
Join Date: Sep 2019
Location: San Francisco
Posts: 40

Did you ever locate a script that does this?
  #7  
Old 07.01.2024, 16:44
tb21 is offline
Junior Loader
 
Join Date: Oct 2021
Posts: 14

It's been some weeks, but I guess some keywords and tools that help with creating what you are looking for might still be useful for everyone?

On a Windows machine:
I use a Ruby scripting environment with Watir (a Ruby extension/gem based on Selenium). Also in the mix: chromedriver.exe (required by Watir) and a portable chrome.exe. In JD2 the EventScripter plugin is used as well; all of this allows for automatic parsing of websites and creation of the required crawljob files for JD2.

So:
- install Ruby (for scripting and using Watir)
- install the Watir gem (for controlling the browser and parsing website content)
- install a chromedriver matching your Chrome version (to control chrome.exe)
- install/enable the Folder Watch plugin in JD2
- install/enable the EventScripter plugin in JD2
- learn Ruby and Watir and how to control the browser and extract links (a Selenium-based sketch of this step follows after this list)
- create the crawljob files with Ruby
- move the crawljobs to the JD2 watch folder
- at this point you can start downloads manually in JD2, with your crawljob results present in the LinkGrabber section
- now learn the JD2 EventScripter plugin and how to handle crawljob results in the LinkGrabber if you need fully automatic processing and downloading.
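
As a rough illustration of the "control the browser and extract links" step above: tb21 does this with Ruby/Watir, but the equivalent in Python with Selenium (the stack the OP mentioned) looks roughly like the sketch below. The URL, the CSS selector and the way links are handed to JDownloader are placeholders for illustration, not details taken from this thread.

Code:
# Rough sketch of the browser-controlled parsing step, using Python + Selenium instead
# of tb21's Ruby/Watir (same idea, different language). URL and CSS selector are
# placeholders for whatever the target site actually uses.
from selenium import webdriver
from selenium.webdriver.common.by import By

from crawljob_helper import write_crawljob  # hypothetical module holding the earlier write_crawljob() sketch

RESULTS_URL = "https://example.com/search?sort=newest"  # assumed bookmarked search page

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")        # run Chrome without a visible window
driver = webdriver.Chrome(options=options)    # needs a chromedriver matching your Chrome

try:
    driver.get(RESULTS_URL)
    # Collect the links to the individual posts on the results page.
    post_links = [
        a.get_attribute("href")
        for a in driver.find_elements(By.CSS_SELECTOR, "a.post-title")  # assumed selector
    ]
    for url in post_links:
        driver.get(url)  # open each post, like the manual Ctrl+A / Ctrl+C workflow
        # Gather the outgoing links from the post and let JDownloader's crawler sort them out.
        hrefs = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
        write_crawljob(" ".join(h for h in hrefs if h), package=url.rsplit("/", 1)[-1])
finally:
    driver.quit()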

I can already tell you that you won't get away with a script that is 20 lines long. It takes quite some time to learn Ruby, Watir, the JD2 EventScripter etc., but if you can code, it is doable within a week or two to get at least a working prototype.

Once you have a basic setup for crawling websites and pushing the desired links over to JD2, you are golden. From that point on it is rather easy to adapt the project to other scenarios and use cases.

If you need specific help, I can give some advice, I guess, but it's rather unlikely that anyone will come up with a set of files and tools in a downloadable package plus a ready-to-use script on top, because parsing websites is kind of hard and also a moving target, since websites tend to change a lot (e.g. new forum software, new layout, new style, new landing page, new domain, new login etc.).

I have to update my Ruby/Watir/EventScripter project regularly to keep it in working condition. It is also advisable to build a basic "web scraping" framework right from the start that is as flexible as possible. If you write all your code into one big blob, you will cry when things need to be parsed in a different order, the website changes its HTML rendering, or you update Ruby or any other component and everything falls apart at once.

You need to prepare things like "parse only one page and one link" instead of "parse and download everything". You also probably want logging to the console and to text files, to be able to track down errors, results or generally unexpected behaviour. You might also need a log of what you have already parsed/downloaded successfully, so you don't end up with hundreds of duplicate downloads if something goes wrong in the process.

Parsing and downloading a huge amount of links/data will break at some point in the process, I can tell you from experience. So you need to make sure you can continue from that moment (page and link) onward after fixing the problem.
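
To cover the "remember what was already parsed/downloaded and resume after a failure" advice above, here is a small sketch in Python using a plain JSON file; the file name and layout are only an example, and write_crawljob() is the hypothetical helper from earlier.

Code:
# Sketch of a persistent "seen" log so an interrupted run can resume without creating
# hundreds of duplicate crawljobs. File name and layout are just an example.
import json
from pathlib import Path

SEEN_FILE = Path("seen_posts.json")

def load_seen() -> set:
    """Return the set of post URLs that were already handed to JDownloader."""
    if SEEN_FILE.exists():
        return set(json.loads(SEEN_FILE.read_text(encoding="utf-8")))
    return set()

def mark_seen(seen: set, url: str) -> None:
    """Record `url` right away, so a crash later in the run does not lose the progress."""
    seen.add(url)
    SEEN_FILE.write_text(json.dumps(sorted(seen)), encoding="utf-8")

# Usage inside the crawl loop:
#   seen = load_seen()
#   for url in post_links:
#       if url in seen:
#           continue              # already processed in an earlier run
#       write_crawljob(url, package="auto")
#       mark_seen(seen, url)      # persist progress after each post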

cu! o)