JDownloader Community - Appwork GmbH
 

JDownloader Community > English Support > General Discussion
  #1  
Old 19.10.2010, 00:49
jdlbot
Guest
 
Posts: n/a
Default I wrote a small RSS feed scraper for jDownloader

Grab it here:
Code:
**External links are only visible to Support Staff**
Check out the source code and README here:
Code:
**External links are only visible to Support Staff**
If you have problems, check the wiki here:
Code:
**External links are only visible to Support Staff**
Please feel free to open issues, send patches or give feedback/suggestions. Consider this an alpha release.

I've already spotted a couple of big issues, one of which is that it DOES NOT support Atom feeds, only RSS. If you get a parser error and you know it is a valid feed, chances are it's Atom. I already have a fix for this in mind.

New version 0.1.1 supports Atom! Resolved bugs when adding links.
New version 0.1.2 detects missing parts of multipart rars. Improved TV episode recognition.

I am aware of flexget and its ilk. I thought that my scripts cobbled together with bits of string and tape would be easier for the uninitiated to use.

Right now, I can only confirm that this works with the latest STABLE jDownloader (and Web Interface).

Happy downloading,
jdlbot

Last edited by jdlbot; 30.11.2010 at 18:32. Reason: new version!
  #2  
Old 19.10.2010, 09:01
zerobyte
Guest
 
Posts: n/a
Default

much appreciated.

i'll try it out tonight. i'm using feedreader at the moment. works ok, but sometimes copy/paste freaks out.

00h.
  #3  
Old 19.10.2010, 10:38
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Nice.

Flexget needs an external program to put the description or HTML page into the clipboard. It could use other interfaces, but that would mean relying on Flexget to find the correct URLs.
  #4  
Old 19.10.2010, 18:21
jdlbot
Guest
 
Posts: n/a
Default

@drbits - I should have been clearer in my initial post. This is an external program that finds the correct URLs and sends them to jDownloader. It uses the jDownloader web interface to communicate.

Unlike flexget, it requires minimal configuration. Set up some feeds, make some filters and presto!

@zerobyte - Unlike feedreader, jDlBot is cross-platform (it will run anywhere perl runs) and will (hopefully) avoid any clipboard issues. Also, it doesn't just scrape the feeds; it will follow the feed links and scrape the resulting pages if desired.
  #5  
Old 19.10.2010, 18:24
Greeny
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by jdlbot View Post
It uses the jDownloader web interface to communicate.
Maybe it's easier to use the Remote Control for interacting with JD?
  #6  
Old 19.10.2010, 18:47
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by Greeny View Post
Maybe it's easier to use the Remote Control for interacting with JD?
I use the stable version of JD myself and found that the version of the remote control in stable does not work consistently. Also, it gives no feedback on the status of links in the linkgrabber queue.

All of this has been changed in the nightly version of the remote control. I have to say that it's awesome (great work!), and I will be targeting that for JD integration after I clean up the initial version of jdlbot.

I have some experience scraping the web with perl, so scraping the Web Interface page and generating appropriate querystrings wasn't really an issue.
  #7  
Old 19.10.2010, 19:07
Greeny
Guest
 
Posts: n/a
Default

All right! When the Nightly becomes the next Stable, I'll look out for the next version of your little app.

Are you familiar with Java? If so, feel free to download the source and implement your app in Java, so that we can release it as an official addon :-)
  #8  
Old 19.10.2010, 20:17
jdlbot
Guest
 
Posts: n/a
Default

@Greeny - That's actually how I started on this. Unfortunately, the only experience I have with Java is in writing web services, not GUI application dev. I wasn't really making much progress on that front, so I just expanded on my existing perl scripts.

I could take another crack at making it an addon, but there were several issues that I had yet to solve: storing and calling feed/filter data, generating the GUI forms (no idea what I was doing there), integrating feed reading and screen scraping libraries, etc.
Old 20.10.2010, 03:51
drbits
Message deleted by drbits.
  #9  
Old 20.10.2010, 04:03
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Feel free to post your program on a file host and link to it here.

If you license the program under GPL3, one of us can take your source, translate it to Java, and convert it into an addon.

On the other hand, it might be better to keep the RSS/ATOM/NNTP feeds as accessory programs (still need the GPL license).

It would be nice if programs that passed just URIs could use the CNL standard. If the program is passing a web page for LinkGrabber, it is best to use the Clipboard.
  #10  
Old 20.10.2010, 04:41
jdlbot
Guest
 
Posts: n/a
Default

@drbits - I'll put a license file in my repo sometime tomorrow. If GPLv3 is best for you guys, I'll put all my source under that. The binary distributions, however, fall under one of the perl licenses (Artistic or GPL, depending on the modules).

Does CNL have a way to check the status of the linkgrabber queue? The only documented feature I noticed was adding links.

jDlBot already extracts the links and posts a URI encoded list to the web interface. It should be easy to change if necessary.
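To illustrate that handoff (a Python sketch rather than jdlbot's actual Perl, and with the form field name `links` as a pure assumption — the real Web Interface field may be named differently):

```python
from urllib.parse import urlencode

def build_add_links_body(links):
    """URL-encode a newline-joined link list for an HTTP POST, the way
    a scraper can hand links to a web form. The field name "links" is
    illustrative, not the Web Interface's documented field name."""
    return urlencode({"links": "\n".join(links)})

body = build_add_links_body([
    "http://example.com/file.part1.rar",
    "http://example.com/file.part2.rar",
])
```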
  #11  
Old 20.10.2010, 22:19
jdlbot
Guest
 
Posts: n/a
Default

Released a new version with a bunch of fixes.
  #12  
Old 21.10.2010, 10:10
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

CNL does not have a way to check status; it is POST-only.

The new (Nightly) remote control has a command to check the number of links in the Link Grabber.
To get help: **External links are only visible to Support Staff** You will probably be interested in:
/get/grabber/count
/get/grabber/isbusy

I know that you don't want to use the Nightly Test version, but the only other way to get the information is to scrape the Web Interface, and that will not be easy.
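For reference, those two Remote Control endpoints are plain HTTP GETs, so building the URLs is trivial (Python sketch; the host and port defaults here are assumptions — use whatever your JD Remote Control is configured to listen on):

```python
def grabber_status_urls(host="127.0.0.1", port=10025):
    """Build the two Remote Control status URLs mentioned above.
    host/port are illustrative defaults, not guaranteed JD settings."""
    base = "http://%s:%d" % (host, port)
    return base + "/get/grabber/count", base + "/get/grabber/isbusy"

count_url, busy_url = grabber_status_urls()
```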
  #13  
Old 21.10.2010, 15:39
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by drbits View Post
... but the only other way to get the information is to scrape the Web interface and that will not be easy.
This is pretty much what I'm doing (**External links are only visible to Support Staff**link) :D

It's not pretty... without getting the links back in the interface, it really just tries to tell which ones were added last and polls the web interface add links page for updates. If every link it thinks it added last is online, then (if desired) it pushes the packages to the download queue.

This could pose a problem if you add links to the clipboard/linkgrabber while jdlbot is pushing links. I've been running this at home for a little while now and it hasn't been an issue.
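The "poll until everything I think I added reports online, then push" loop described above can be sketched like this (Python for illustration only — jdlbot itself is Perl, and the online check is stubbed out here; the real one scrapes the Web Interface's add-links page):

```python
import time

def wait_for_links_online(all_online, poll_interval=5.0, max_polls=60,
                          sleep=time.sleep):
    """Poll until every recently added link reports online.
    all_online is a callable standing in for the real scrape-based check."""
    for _ in range(max_polls):
        if all_online():
            return True   # safe to push the packages to the download queue
        sleep(poll_interval)
    return False          # gave up; the links never all came online
```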

Last edited by jdlbot; 21.10.2010 at 16:46.
  #14  
Old 03.11.2010, 05:13
kiberiada
Guest
 
Posts: n/a
Question A little tutorial

This is exactly the thing we've been missing from JDownloader.

Can you give me a hint how to set up the feeds / filters?

I could run and configure the app to access JDownloader and added some feeds, but except for a short "Checking for updates... No new updates" sequence, nothing happens.

You do suggest something about the filters, but I found the interface a bit over my capabilities. Can you please explain where and what I have to put to get a feed like this to work?

Code:
**External links are only visible to Support Staff**
Thank you for your effort and patience.
  #15  
Old 03.11.2010, 05:36
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

This is great!

However, a lot of the JD interaction will be much easier when the new remote control is ready. That just means waiting for the next release.
  #16  
Old 03.11.2010, 23:12
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by kiberiada View Post
This is exactly the thing we've been missing from JDownloader.

Can you give me a hint how to set up the feeds / filters?

I could run and configure the app to access JDownloader and added some feeds, but except for a short "Checking for updates... No new updates" sequence, nothing happens.

You do suggest something about the filters, but I found the interface a bit over my capabilities. Can you please explain where and what I have to put to get a feed like this to work?

Code:
**External links are only visible to Support Staff**
Thank you for your effort and patience.
The interface is a bit sparse at the moment and will probably change in the next release.

I should make a note also to NOT use Internet Explorer to access the configuration, as IE uh... has issues.

That being said, to add a new feed/filter please follow these steps:

1. Click on the feeds link on the left hand panel
2. Input the feed url, the interval and whether or not you want to follow the feed links
3. Click "Add feed"
4. Wait for either A) an error message or B) the recently added feed to pop up above the new feed area.
5. At this point you should see a "Running Watcher" fire in the terminal window. This means your feed is active.

6. Click on the filters link in the left hand pane
7. Add your filter parameters, hover over the inputs to get help bubbles
7a. Be sure to add at least one expected linktype, e.g. megaupload or hotfile. If there can be multiple link types, put them in a pipe-delimited list: megaupload|hotfile

I'm currently working on expanding this feature.

8. Click "Add filter" and wait for confirmation.

9. If you want to rerun your feed watcher right that second, go to the feeds page and uncheck and recheck the box next to it. You will see "Running Watcher" appear again in the status window.
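The pipe-delimited linktype list from step 7a behaves like a regular-expression alternation. A minimal Python sketch of that matching behavior (not jdlbot's actual code, which is Perl):

```python
import re

def link_matches_type(url, linktypes):
    """linktypes is the pipe-delimited string from the filter form,
    e.g. "megaupload|hotfile"; a link passes if any alternative
    appears somewhere in its URL."""
    return re.search(linktypes, url) is not None

link_matches_type("http://hotfile.com/dl/123/file.html", "megaupload|hotfile")  # True
link_matches_type("http://example.com/other.html", "megaupload|hotfile")        # False
```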



Now... after all that, I've checked out the link you provided, and I see absolutely no useful links in either the feed or the resulting pages. I'm not sure why you would even want to add that feed. (Unless you're trying to scrape myspace pages... then you would add something like myspace in the linktype field of the filter.)

When I made this I had things like katz or ev0 feeds in mind.
  #17  
Old 07.11.2010, 11:43
holtzi
Guest
 
Posts: n/a
Default JDFeedMe

A similar plugin is being developed.
Check it out: JDFeedMe
Code:
http://board.jdownloader.org/showthread.php?p=114316
**External links are only visible to Support Staff**
Maybe we can collaborate.

Last edited by holtzi; 07.11.2010 at 12:34.
  #18  
Old 07.11.2010, 12:17
remi
Guest
 
Posts: n/a
Default

@holtzi

Thanks for this great feature.

I wonder why this is just an "add-on", because it seems to be properly documented and well integrated with jD.
  #19  
Old 08.11.2010, 23:41
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by holtzi View Post
A similar plugin is being developed.
Check it out: JDFeedMe
Code:
http://board.jdownloader.org/showthread.php?p=114316
**External links are only visible to Support Staff**
Maybe we can collaborate.
pm sent!
  #20  
Old 09.11.2010, 11:27
chaver1
Guest
 
Posts: n/a
Default can you please post a link for jd beta release

Because it cannot work on JDownloader until they sign it. Please help me find a beta release. Thanks.
  #21  
Old 09.11.2010, 11:32
remi
Guest
 
Posts: n/a
Default

It seems to work with the Nightly test version of jD. Please visit the Nightly forum and read the sticky posts of that forum.
  #22  
Old 26.01.2011, 09:17
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

I set up a custom rss feed at filestube.com, which looks something like this:
_www.filestube.com/rss.rss?q=robot.chicken.*s05e02.*720p
(the episode-specific feed is just to test things until I get it working)

I fed this into jdlbot, and set up the feed and filter accordingly. When running, it gives me this error:

error, 599 Only http and https URL schemes supported
Failed to follow link: _**External links are only visible to Support Staff**
(the underscores are added to keep the links from disappearing)
Any clue as to what's causing this?

Last edited by buggsy; 26.01.2011 at 09:28.
  #23  
Old 26.01.2011, 10:30
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default Mini Regular Expression tutorial

Try .*? instead of .*.

.* is greedy and matches to the end of the line.
.*? is lazy and slower, but it only matches what is necessary.

**External links are only visible to Support Staff**www.filestube.com/rss.rss?q=robot.chicken.*?s0?5[ex]?\d{1,4}.*?720p

s0? The ? means the 0 is optional.
[ex] Means either character
[ex]? Means either character, but optional
\d means a digit [0-9] (There are other special \ combinations)
{1,4} (those are curly braces) means from 1 to 4 occurrences
\d{1,4} means 1 to 4 digits.

Search the web for ("Regular expression" tutorial OR summary OR nutshell)
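To see the greedy/lazy difference concretely, here is the same idea in Python's re module (any PCRE-style engine behaves the same way; the sample title is made up):

```python
import re

title = "robot.chicken.s05e02.720p.x264"

# Greedy: .* grabs as much as it can, here the whole rest of the string.
greedy = re.search(r"robot\.chicken\.(.*)", title).group(1)       # "s05e02.720p.x264"

# Lazy: .*? stops as soon as the rest of the pattern (\.720p) can match.
lazy = re.search(r"robot\.chicken\.(.*?)\.720p", title).group(1)  # "s05e02"

# The episode pattern from above, with the brace typo fixed:
episode = re.search(r"s0?5[ex]?\d{1,4}", title).group()           # "s05e02"
```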
  #24  
Old 26.01.2011, 13:49
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

Thanks for the quick reply - however, I don't think the expressions are the problem. Replacing the .* with .*? didn't solve it, and my original feed had a list of links that jdlbot should have been able to parse. Getting the links from the RSS is not the problem; the problem is that, for some reason, jdlbot doesn't like the list I'm giving it.
  #25  
Old 27.01.2011, 08:53
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

@ Buggsy,

The addon is not part of JDownloader. It is a user provided program that is separate from JDownloader, but interacts with JDownloader via interfaces.

Try putting .*? at the beginning and end of the expression.

Try debugging your expressions by loading only one at a time and see which work and which do not (let us know the results).

Re-read posts 16 and 19.
__________________
Please, in each forum, read the rules! Helpful links: read before posting.
  #26  
Old 27.01.2011, 19:30
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

@drbits,
Thanks for the help troubleshooting, I appreciate the time that you're putting towards this. However, as I said before, I don't think it's a problem with the expressions, I think it's a problem with how jdlbot interacts with filestube. Any rss feed I grab from filestube gives me the error - I even tried leaving the filter blank in jdlbot to allow it to pull every link. It still gives me this error. I strongly believe it is not a problem with the regex, as it is finding the links but it has an issue trying to follow them.

Also, I understand that this is not part of JDownloader - I assumed this thread would still be an appropriate place for help, though. As for debugging my expressions, do you mean the expressions in the filters of jdlbot, or in the search field of filestube? As I said, even a blank filter in jdlbot gives errors, and I don't think the expression in filestube should matter, as long as there are search results. I tried adding .*? in my search query regardless, and it did not help. I have also tried other filestube rss feeds, including a simple one-word search. Any feed from filestube is unsuccessful.

I should add that I have other feeds working in jdlbot, so it isn't a lack of understanding the interface.

Just a reminder, the error I'm receiving is "error, 599 Only http and https URL schemes supported", followed by a "failed to follow link". Any other suggestions?

Last edited by buggsy; 27.01.2011 at 19:32.
  #27  
Old 28.01.2011, 09:41
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Something is preventing jdlbot from finding the whole string. Try using a jdlbot filter that starts with
http[s]?://(www\.)?filestube\.com/
and so on.
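A quick way to sanity-check a prefix like that (Python sketch; note the escaped dot in filestube\.com, and the last URL shows the kind of non-http scheme the 599 error complains about):

```python
import re

# Anchored prefix filter for filestube links, as suggested above.
prefix = re.compile(r"http[s]?://(www\.)?filestube\.com/")

print(bool(prefix.match("http://www.filestube.com/search.html")))  # True
print(bool(prefix.match("https://filestube.com/abc/file.html")))   # True
print(bool(prefix.match("feed://www.filestube.com/rss.rss")))      # False
```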
  #28  
Old 22.07.2011, 05:41
RobbieG
Guest
 
Posts: n/a
Question Help using?

Hi,

I have installed this, read through the thread, read the wiki, but I cannot figure out how to get this working. Can someone post a sample feed and filter that they use, so I can see what exactly I need to do?

Thanks!
  #29  
Old 26.07.2011, 09:47
remi
Guest
 
Posts: n/a
Default

Are you talking about jdlbot or jdfeedme?

A sample feed and filter depend on what links you want to obtain.
  #30  
Old 07.08.2011, 23:37
jdlbot
Guest
 
Posts: n/a
Default

For jdlbot, here is a (terse) example:

[inline example attachments not preserved in this archive]
Here is a site to learn more about regular expressions:
Code:
**External links are only visible to Support Staff**

Last edited by jdlbot; 07.08.2011 at 23:39.
  #31  
Old 26.08.2011, 05:56
tminusg
Guest
 
Posts: n/a
Default

Nothing's happening. It indicates it's watching the feed and my connections are good, but nothing is going into JDownloader. I set up a couple of test runs with shows that I saw on the RSS feed. Can anyone help me? I really want to get this working. Great idea, by the way.
  #32  
Old 12.03.2012, 10:49
Shinobi
Guest
 
Posts: n/a
Default

Hi,

I just installed jdlbot and I'm getting an error message saying

Error parsing Feed: **External links are only visible to Support Staff**www.rslog.net/feed/

Can anyone tell me where the problem is?

Thank you in advance!
  #33  
Old 12.03.2012, 11:39
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,571
Default

As the plugin is outdated and the original developers no longer work on it, it was removed from our source base. We can't fix/support 3rd-party addons. I'm sorry for this.
__________________
JD-Dev & Server-Admin
  #34  
Old 12.03.2012, 14:02
Shinobi
Guest
 
Posts: n/a
Default

OK, thanks for the fast reply. Is there any alternative?

I just want to get a list of uploaded.to download links for my TV shows. I'm not really sure what I should be looking for...
  #35  
Old 09.12.2016, 02:12
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default

Hi, I just recently picked up from where this was last left off on GitHub.

I managed to get it to work with JDownloader 2. I am now using the CNL API (too lazy/inexperienced with perl to write an API client for my.jdownloader.org).

I also freshened up the user interface and added a feature to automatically create filters from filter titles.

Check it out here: **External links are only visible to Support Staff**
  #36  
Old 09.12.2016, 06:18
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

@thawn
nice. By the way, there are public libraries out for the new API (which my.jd uses) in most languages; you would just need to find one of those rather than reinvent the wheel. It might have some added benefits: for example, if mirrors added were offline, then your RSS scraper could keep adding mirrors from other sources.

raztoki
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]

Last edited by raztoki; 09.12.2016 at 07:05.
  #37  
Old 09.12.2016, 08:43
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default

Quote:
Originally Posted by raztoki View Post
@thawn
nice. By the way, there are public libraries out for the new API (which my.jd uses) in most languages; you would just need to find one of those rather than reinvent the wheel. It might have some added benefits: for example, if mirrors added were offline, then your RSS scraper could keep adding mirrors from other sources.

raztoki
jdlbot is written in pearl (not my idea, but I did not want to start from scratch). Unfortunately I could not find any my.jdownloader API library available for pearl. :(

As soon as a pearl library becomes available, I'll make sure to use it.
  #38  
Old 09.12.2016, 11:42
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default reopen or new thread?

A question for the admins:
should I open a new discussion/support thread for the new version of the RSS scraper, or should we remove the [Erledigt] tag from this thread?
  #39  
Old 09.12.2016, 12:36
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

think you will find its perl and not pearl ;p

OK, I'm surprised that it's not available. A quick look at our support mediums and I can't find it either, nor via Google. There seem to be C#, PHP and Python libs around.

I would just persist within this thread so we have continuity.

The forum tags are more for us support staff; each language the forum software supports has its own tag translation (in English it's 'Solved', which is adequate).
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]

Last edited by raztoki; 09.12.2016 at 12:40.
  #40  
Old 09.12.2016, 15:54
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,571
Default

@thawn: you can also assign
dir=downloadfolder
package=packagename
as additional parameters in the flashgot API that you are using
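A sketch of what passing those extra parameters might look like (Python for illustration; the field names dir and package come from the post above, but the urls field name and overall request shape are assumptions to verify against your JD version):

```python
from urllib.parse import urlencode

def flashgot_body(urls, download_dir=None, package=None):
    """Build a POST body for a flashgot-style add-links call with the
    optional dir/package parameters mentioned above. Field names other
    than dir/package are illustrative assumptions."""
    fields = {"urls": "\n".join(urls)}
    if download_dir is not None:
        fields["dir"] = download_dir   # target download folder
    if package is not None:
        fields["package"] = package    # package name in the LinkGrabber
    return urlencode(fields)

body = flashgot_body(["http://example.com/a.rar"],
                     download_dir="/data/tv", package="MyShow")
```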
__________________
JD-Dev & Server-Admin