JDownloader Community - Appwork GmbH
 

JDownloader Community > English Support > General Discussion
  #1  
Old 19.10.2010, 00:49
jdlbot
Guest
 
Posts: n/a
Default I wrote a small RSS feed scraper for jDownloader

Grab it here:
Code:
**External links are only visible to Support Staff**
Check out the source code and README here:
Code:
**External links are only visible to Support Staff**
If you have problems, check the wiki here:
Code:
**External links are only visible to Support Staff**
Please feel free to open issues, send patches or give feedback/suggestions. Consider this an alpha release.

I've already spotted a couple of big issues, one of which is that it DOES NOT support Atom feeds, only RSS. If you get a parser error and you know it is a valid feed, chances are it's Atom. I already have a fix for this in mind.

New version 0.1.1 supports Atom! Resolved bugs when adding links.
New version 0.1.2 detects missing parts of multipart rars. Improved TV episode recognition.

I am aware of flexget and its ilk. I thought that my scripts cobbled together with bits of string and tape would be easier for the uninitiated to use.

Right now, I can only confirm that this works with the latest STABLE jDownloader (and Web Interface).

Happy downloading,
jdlbot

Last edited by jdlbot; 30.11.2010 at 18:32. Reason: new version!
  #2  
Old 19.10.2010, 09:01
zerobyte
Guest
 
Posts: n/a
Default

much appreciated.

i'll try it out tonight. i'm using feedreader at the moment. works ok, but sometimes copy/paste freaks out.

00h.
  #3  
Old 19.10.2010, 10:38
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Nice.

Flexget needs an external program to put the description or HTML page into the clipboard. It could use other interfaces, but that would mean relying on Flexget to find the correct URLs.
  #4  
Old 19.10.2010, 18:21
jdlbot
Guest
 
Posts: n/a
Default

@drbits - I should have been clearer in my initial post. This is an external program that finds the correct URLs and sends them to jDownloader. It uses the jDownloader web interface to communicate.

Unlike flexget, it requires minimal configuration. Set up some feeds, make some filters and presto!

@zerobyte - Unlike feedreader, jDlBot is cross-platform (it will run anywhere perl runs) and will (hopefully) avoid any clipboard issues. Also, it doesn't just scrape the feeds; it will follow the feed links and scrape the resulting pages if desired.
  #5  
Old 19.10.2010, 18:24
Greeny
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by jdlbot View Post
It uses the jDownloader web interface to communicate.
Maybe it's easier to use the Remote Control for interacting with JD?
  #6  
Old 19.10.2010, 18:47
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by Greeny View Post
Maybe it's easier to use the Remote Control for interacting with JD?
I use the stable version of JD myself and found that the version of the remote control in stable does not work consistently. Also, it gives no feedback on the status of links in the linkgrabber queue.

All of this has been changed in the nightly version of the remote control. I have to say that it's awesome (great work!), and I will be targeting that for JD integration after I clean up the initial version of jdlbot.

I have some experience scraping the web with perl, so scraping the Web Interface page and generating appropriate querystrings wasn't really an issue.
  #7  
Old 19.10.2010, 19:07
Greeny
Guest
 
Posts: n/a
Default

All right! When the Nightly becomes the next Stable, I'll look out for the next version of your little app.

Are you familiar with Java? If so, feel free to download the source and implement your app in Java, so that we can release it as an official addon :-)
  #8  
Old 19.10.2010, 20:17
jdlbot
Guest
 
Posts: n/a
Default

@Greeny - That's actually how I started on this. Unfortunately, the only experience I have with Java is in writing web services, not GUI application dev. I wasn't really making much progress on that front, so I just expanded on my existing perl scripts.

I could take another crack at making it an addon, but there were several issues that I had yet to solve: storing and calling feed/filter data, generating the GUI forms (no idea what I was doing there), integrating feed reading and screen scraping libraries, etc.
Old 20.10.2010, 03:51
drbits
Message deleted by drbits.
  #9  
Old 20.10.2010, 04:03
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Feel free to post your program on a file host and link to it here.

If you license the program under GPL3, one of us can take your source, translate it to Java, and convert it into an addon.

On the other hand, it might be better to keep the RSS/ATOM/NNTP feeds as accessory programs (still need the GPL license).

It would be nice if programs that passed just URIs could use the CNL standard. If the program is passing a web page for LinkGrabber, it is best to use the Clipboard.
  #10  
Old 20.10.2010, 04:41
jdlbot
Guest
 
Posts: n/a
Default

@drbits - I'll put a license file in my repo sometime tomorrow. If GPLv3 is best for you guys, I'll put all my source under that. The binary distributions, however, fall under one of the perl licenses (Artistic or GPL, depending on the modules).

Does CNL have a way to check the status of the linkgrabber queue? The only documented feature I noticed was adding links.

jDlBot already extracts the links and posts a URI encoded list to the web interface. It should be easy to change if necessary.
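To illustrate that handoff (a Python sketch rather than jdlbot's actual Perl, and with the form field name `links` as a pure assumption — the real Web Interface field may be named differently):

```python
from urllib.parse import urlencode

def build_add_links_body(links):
    """URL-encode a newline-joined link list for an HTTP POST, the way
    a scraper can hand links to a web form. The field name "links" is
    illustrative, not the Web Interface's documented field name."""
    return urlencode({"links": "\n".join(links)})

body = build_add_links_body([
    "http://example.com/file.part1.rar",
    "http://example.com/file.part2.rar",
])
```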
  #11  
Old 20.10.2010, 22:19
jdlbot
Guest
 
Posts: n/a
Default

Released a new version with a bunch of fixes.
  #12  
Old 21.10.2010, 10:10
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

CNL does not have a way to check status; it is POST-only.

The new (Nightly) remote control has a command to check the number of links in the Link Grabber.
To get help: **External links are only visible to Support Staff** You will probably be interested in:
/get/grabber/count
/get/grabber/isbusy

I know that you don't want to use the Nightly Test version, but the only other way to get the information is to scrape the Web Interface, and that will not be easy.
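For reference, those two Remote Control endpoints are plain HTTP GETs, so building the URLs is trivial (Python sketch; the host and port defaults here are assumptions — use whatever your JD Remote Control is configured to listen on):

```python
def grabber_status_urls(host="127.0.0.1", port=10025):
    """Build the two Remote Control status URLs mentioned above.
    host/port are illustrative defaults, not guaranteed JD settings."""
    base = "http://%s:%d" % (host, port)
    return base + "/get/grabber/count", base + "/get/grabber/isbusy"

count_url, busy_url = grabber_status_urls()
```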
  #13  
Old 21.10.2010, 15:39
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by drbits View Post
... but the only other way to get the information is to scrape the Web interface and that will not be easy.
This is pretty much what I'm doing (**External links are only visible to Support Staff**link) :D

It's not pretty... without getting the links back in the interface, it really just tries to tell which ones were added last and polls the web interface add links page for updates. If every link it thinks it added last is online, then (if desired) it pushes the packages to the download queue.

This could pose a problem if you add links to the clipboard/linkgrabber while jdlbot is pushing links. I've been running this at home for a little while now and it hasn't been an issue.
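The "poll until everything I think I added reports online, then push" loop described above can be sketched like this (Python for illustration only — jdlbot itself is Perl, and the online check is stubbed out here; the real one scrapes the Web Interface's add-links page):

```python
import time

def wait_for_links_online(all_online, poll_interval=5.0, max_polls=60,
                          sleep=time.sleep):
    """Poll until every recently added link reports online.
    all_online is a callable standing in for the real scrape-based check."""
    for _ in range(max_polls):
        if all_online():
            return True   # safe to push the packages to the download queue
        sleep(poll_interval)
    return False          # gave up; the links never all came online
```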

Last edited by jdlbot; 21.10.2010 at 16:46.
  #14  
Old 03.11.2010, 05:13
kiberiada
Guest
 
Posts: n/a
Question A little tutorial

This is exactly the thing we've been missing from JDownloader.

Can you give me a hint how to set up the feeds / filters?

I could run and configure the app to access JDownloader and added some feeds, but except for a short "Checking for updates... No new updates" sequence, nothing happens.

You do suggest something about the filters, but I found the interface a bit over my capabilities. Can you please explain where and what I have to put to get a feed like this to work?

Code:
**External links are only visible to Support Staff**
Thank you for your effort and patience.
  #15  
Old 03.11.2010, 05:36
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

This is great!

However, a lot of the JD interaction will be much easier when the new remote control is ready. That just means waiting for the next release.
  #16  
Old 03.11.2010, 23:12
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by kiberiada View Post
This is exactly the thing we've been missing from JDownloader.

Can you give me a hint how to set up the feeds / filters?

I could run and configure the app to access JDownloader and added some feeds, but except for a short "Checking for updates... No new updates" sequence, nothing happens.

You do suggest something about the filters, but I found the interface a bit over my capabilities. Can you please explain where and what I have to put to get a feed like this to work?

Code:
**External links are only visible to Support Staff**
Thank you for your effort and patience.
The interface is a bit sparse at the moment and will probably change in the next release.

I should make a note also to NOT use Internet Explorer to access the configuration, as IE uh... has issues.

That being said, to add a new feed/filter please follow these steps:

1. Click on the feeds link on the left hand panel
2. Input the feed url, the interval and whether or not you want to follow the feed links
3. Click "Add feed"
4. Wait for either A) an error message or B) the recently added feed to pop up above the new feed area.
5. At this point you should see a "Running Watcher" fire in the terminal window. This means your feed is active.

6. Click on the filters link in the left hand pane
7. Add your filter parameters, hover over the inputs to get help bubbles
7a. Be sure to add at least one expected linktype, e.g. megaupload or hotfile. If there can be multiple link types, put them in a pipe-delimited list: megaupload|hotfile

I'm currently working on expanding this feature.

8. Click "Add filter" and wait for confirmation.

9. If you want to rerun your feed watcher right that second, go to the feeds page and uncheck and recheck the box next to it. You will see "Running Watcher" appear again in the status window.
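The pipe-delimited linktype list from step 7a behaves like a regular-expression alternation. A minimal Python sketch of that matching behavior (not jdlbot's actual code, which is Perl):

```python
import re

def link_matches_type(url, linktypes):
    """linktypes is the pipe-delimited string from the filter form,
    e.g. "megaupload|hotfile"; a link passes if any alternative
    appears somewhere in its URL."""
    return re.search(linktypes, url) is not None

link_matches_type("http://hotfile.com/dl/123/file.html", "megaupload|hotfile")  # True
link_matches_type("http://example.com/other.html", "megaupload|hotfile")        # False
```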



Now... after all that, I've checked out the link you provided, and I see absolutely no useful links in either the feed or the resulting pages. I'm not sure why you would even want to add that feed. (Unless you're trying to scrape myspace pages... then you would add something like myspace in the linktype field of the filter.)

When I made this I had things like katz or ev0 feeds in mind.
  #17  
Old 07.11.2010, 11:43
holtzi
Guest
 
Posts: n/a
Default JDFeedMe

A similar plugin is being developed.
Check it out: JDFeedMe
Code:
http://board.jdownloader.org/showthread.php?p=114316
**External links are only visible to Support Staff**
Maybe we can collaborate.

Last edited by holtzi; 07.11.2010 at 12:34.
  #18  
Old 07.11.2010, 12:17
remi
Guest
 
Posts: n/a
Default

@holtzi

Thanks for this great feature.

I wonder why this is just an "add-on", because it seems to be properly documented and well integrated with jD.
  #19  
Old 08.11.2010, 23:41
jdlbot
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by holtzi View Post
A similar plugin is being developed.
Check it out: JDFeedMe
Code:
http://board.jdownloader.org/showthread.php?p=114316
**External links are only visible to Support Staff**
Maybe we can collaborate.
pm sent!
  #20  
Old 09.11.2010, 11:27
chaver1
Guest
 
Posts: n/a
Default can you please post a link for jd beta release

Because it cannot work on JDownloader until they sign it. Please help me find a beta release. Thanks.
  #21  
Old 09.11.2010, 11:32
remi
Guest
 
Posts: n/a
Default

It seems to work with the Nightly test version of jD. Please visit the Nightly forum and read the sticky posts of that forum.
  #22  
Old 26.01.2011, 09:17
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

I set up a custom rss feed at filestube.com, which looks something like this:
_www.filestube.com/rss.rss?q=robot.chicken.*s05e02.*720p
(the episode-specific feed is just to test things until I get it working)

I fed this into jdlbot, and set up the feed and filter accordingly. When running, it gives me this error:

error, 599 Only http and https URL schemes supported
Failed to follow link: _**External links are only visible to Support Staff**
(the underscores are added to keep the links from disappearing)
Any clue as to what's causing this?

Last edited by buggsy; 26.01.2011 at 09:28.
  #23  
Old 26.01.2011, 10:30
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default Mini Regular Expression tutorial

Try .*? instead of .*.

.* is greedy and matches to the end of the line.
.*? is lazy and slower, but it only matches what is necessary.

**External links are only visible to Support Staff**www.filestube.com/rss.rss?q=robot.chicken.*?s0?5[ex]?\d{1,4}.*?720p

s0? The ? means the 0 is optional.
[ex] Means either character
[ex]? Means either character, but optional
\d means a digit [0-9] (There are other special \ combinations)
{1,4} (those are curly braces) means from 1 to 4 occurrences
\d{1,4} means 1 to 4 digits.

Search the web for ("Regular expression" tutorial OR summary OR nutshell)
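To see the greedy/lazy difference concretely, here is the same idea in Python's re module (any PCRE-style engine behaves the same way; the sample title is made up):

```python
import re

title = "robot.chicken.s05e02.720p.x264"

# Greedy: .* grabs as much as it can, here the whole rest of the string.
greedy = re.search(r"robot\.chicken\.(.*)", title).group(1)       # "s05e02.720p.x264"

# Lazy: .*? stops as soon as the rest of the pattern (\.720p) can match.
lazy = re.search(r"robot\.chicken\.(.*?)\.720p", title).group(1)  # "s05e02"

# The episode pattern from above, with the brace typo fixed:
episode = re.search(r"s0?5[ex]?\d{1,4}", title).group()           # "s05e02"
```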
  #24  
Old 26.01.2011, 13:49
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

Thanks for the quick reply - however, I don't think the expressions are the problem. Replacing the .* with .*? didn't solve it, and my original feed had a list of links that jdlbot should have been able to parse. Getting the links from the RSS is not the problem; the problem is that, for some reason, jdlbot doesn't like the list I'm giving it.
  #25  
Old 27.01.2011, 08:53
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

@ Buggsy,

The addon is not part of JDownloader. It is a user provided program that is separate from JDownloader, but interacts with JDownloader via interfaces.

Try putting .*? at the beginning and end of the expression.

Try debugging your expressions by loading only one at a time and see which work and which do not (let us know the results).

Re-read posts 16 and 19.
__________________
Please, in each forum, read the rules! Helpful links: read before posting.
  #26  
Old 27.01.2011, 19:30
buggsy buggsy is offline
BugMeNot Account
 
Join Date: Mar 2009
Location: everywhere/nowhere
Posts: 1,120
Default

@drbits,
Thanks for the help troubleshooting, I appreciate the time that you're putting towards this. However, as I said before, I don't think it's a problem with the expressions, I think it's a problem with how jdlbot interacts with filestube. Any rss feed I grab from filestube gives me the error - I even tried leaving the filter blank in jdlbot to allow it to pull every link. It still gives me this error. I strongly believe it is not a problem with the regex, as it is finding the links but it has an issue trying to follow them.

Also, I understand that this is not part of JDownloader - I assumed this thread would still be an appropriate place for help, though. As for debugging my expressions, do you mean the expressions in the filters of jdlbot, or in the search field of filestube? As I said, even a blank filter in jdlbot gives errors, and I don't think the expression in filestube should matter, as long as there are search results. I tried adding .*? in my search query regardless, and it did not help. I have also tried other filestube rss feeds, including a simple one-word search. Any feed from filestube is unsuccessful.

I should add that I have other feeds working in jdlbot, so it isn't a lack of understanding the interface.

Just a reminder, the error I'm receiving is "error, 599 Only http and https URL schemes supported", followed by a "failed to follow link". Any other suggestions?

Last edited by buggsy; 27.01.2011 at 19:32.
  #27  
Old 28.01.2011, 09:41
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,434
Default

Something is preventing jdlbot from finding the whole string. Try using a jdlbot filter that starts with
http[s]?://(www\.)?filestube\.com/
and so on.
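A quick way to sanity-check a prefix like that (Python sketch; note the escaped dot in filestube\.com, and the last URL shows the kind of non-http scheme the 599 error complains about):

```python
import re

# Anchored prefix filter for filestube links, as suggested above.
prefix = re.compile(r"http[s]?://(www\.)?filestube\.com/")

print(bool(prefix.match("http://www.filestube.com/search.html")))  # True
print(bool(prefix.match("https://filestube.com/abc/file.html")))   # True
print(bool(prefix.match("feed://www.filestube.com/rss.rss")))      # False
```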
  #28  
Old 22.07.2011, 05:41
RobbieG
Guest
 
Posts: n/a
Question Help using?

Hi,

I have installed this, read through the thread, read the wiki, but I cannot figure out how to get this working. Can someone post a sample feed and filter that they use, so I can see what exactly I need to do?

Thanks!
  #29  
Old 26.07.2011, 09:47
remi
Guest
 
Posts: n/a
Default

Are you talking about jdlbot or jdfeedme?

A sample feed and filter depend on what links you want to obtain.
  #30  
Old 07.08.2011, 23:37
jdlbot
Guest
 
Posts: n/a
Default

For jdlbot, here is a (terse) example:

[inline example attachments not preserved in this archive]
Here is a site to learn more about regular expressions:
Code:
**External links are only visible to Support Staff**

Last edited by jdlbot; 07.08.2011 at 23:39.
  #31  
Old 26.08.2011, 05:56
tminusg
Guest
 
Posts: n/a
Default

Nothing's happening. It indicates it's watching the feed and my connections are good, but nothing is going into JDownloader. I set up a couple of test runs with shows that I saw on the RSS feed. Can anyone help me? I really want to get this working. Great idea, by the way.
  #32  
Old 12.03.2012, 10:49
Shinobi
Guest
 
Posts: n/a
Default

Hi,

I just installed jdlbot and I'm getting an error message saying

Error parsing Feed: **External links are only visible to Support Staff**www.rslog.net/feed/

Can anyone tell me where the problem is?

Thank you in advance!
  #33  
Old 12.03.2012, 11:39
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,571
Default

As the plugin is outdated and the original developers no longer work on it, it was removed from our source base. We can't fix/support 3rd-party addons. I'm sorry for this.
__________________
JD-Dev & Server-Admin
  #34  
Old 12.03.2012, 14:02
Shinobi
Guest
 
Posts: n/a
Default

OK, thanks for the fast reply. Is there any alternative?

I just want to get a list of uploaded.to download links for my TV shows. I'm not really sure what I should be looking for...
  #35  
Old 09.12.2016, 02:12
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default

Hi, I just recently picked up from where this was last left off on GitHub.

I managed to get it to work with JDownloader 2. I am now using the CNL API (too lazy/inexperienced with perl to write an API client for my.jdownloader.org).

I also freshened up the user interface and added a feature to automatically create filters from filter titles.

Check it out here: **External links are only visible to Support Staff**
  #36  
Old 09.12.2016, 06:18
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

@thawn
nice. By the way, there are public libraries out for the new API (which my.jd uses) in most languages; you would just need to find one of those rather than reinvent the wheel. It might have some added benefits: for example, if mirrors added were offline, then your RSS scraper could keep adding mirrors from other sources.

raztoki
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]

Last edited by raztoki; 09.12.2016 at 07:05.
  #37  
Old 09.12.2016, 08:43
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default

Quote:
Originally Posted by raztoki View Post
@thawn
nice. By the way, there are public libraries out for the new API (which my.jd uses) in most languages; you would just need to find one of those rather than reinvent the wheel. It might have some added benefits: for example, if mirrors added were offline, then your RSS scraper could keep adding mirrors from other sources.

raztoki
jdlbot is written in pearl (not my idea, but I did not want to start from scratch). Unfortunately I could not find any my.jdownloader API library available for pearl. :(

As soon as a pearl library becomes available, I'll make sure to use it.
  #38  
Old 09.12.2016, 11:42
thawn thawn is offline
Junior Loader
 
Join Date: Jul 2012
Posts: 11
Default reopen or new thread?

A question for the admins:
should I open a new discussion/support thread for the new version of the RSS scraper, or should we remove the [Erledigt] tag from this thread?
  #39  
Old 09.12.2016, 12:36
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

think you will find its perl and not pearl ;p

OK, I'm surprised that it's not available. A quick look at our support mediums and I can't find it either, nor via Google. There seem to be C#, PHP and Python libs around.

I would just persist within this thread so we have continuity.

The forum tags are more for us support staff; each language the forum software supports has its own tag translation (in English it's 'Solved', which is adequate).
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]

Last edited by raztoki; 09.12.2016 at 12:40.
  #40  
Old 09.12.2016, 15:54
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,571
Default

@thawn: you can also assign
dir=downloadfolder
package=packagename
as additional parameters in the flashgot API that you are using
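A sketch of what passing those extra parameters might look like (Python for illustration; the field names dir and package come from the post above, but the urls field name and overall request shape are assumptions to verify against your JD version):

```python
from urllib.parse import urlencode

def flashgot_body(urls, download_dir=None, package=None):
    """Build a POST body for a flashgot-style add-links call with the
    optional dir/package parameters mentioned above. Field names other
    than dir/package are illustrative assumptions."""
    fields = {"urls": "\n".join(urls)}
    if download_dir is not None:
        fields["dir"] = download_dir   # target download folder
    if package is not None:
        fields["package"] = package    # package name in the LinkGrabber
    return urlencode(fields)

body = flashgot_body(["http://example.com/a.rar"],
                     download_dir="/data/tv", package="MyShow")
```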
__________________
JD-Dev & Server-Admin