JDownloader Community - Appwork GmbH
 

  #1  
Old 17.02.2020, 03:24
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default Automatically fetch links from that host based on keywords

Is there any way to have JD2 automatically add links from a particular website and host based on keywords? These are the same keywords I use on the website to filter two separate search queries.

For example: from examplewebsitedotcom, add the URLs from host xyz whose posts match the following keywords:

keywords set 1: 1080p Bluray H264 AAC
keywords set 2: 1080p WEB

I would not be downloading everything, since I still need to re-filter; I just need the links added to the LinkGrabber queue.
Reply With Quote
  #2  
Old 17.02.2020, 04:24
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,674
Default

If your regex is good, you could use a LinkCrawler rule. Otherwise, write a decrypter plugin for the given website that filters the results (in one or more passes) and returns the links.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
Reply With Quote
  #3  
Old 17.02.2020, 05:25
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

For the linkcrawler rule I'm trying to find a sample on this site that I can use and edit.
Unfortunately, there are no plugins for rmz.cr.
Reply With Quote
  #4  
Old 17.02.2020, 05:26
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,674
Default

There is already a decrypter plugin.
Reply With Quote
  #5  
Old 17.02.2020, 05:56
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

Where is the plugin?

Last edited by RPNet-user; 17.02.2020 at 08:07.
Reply With Quote
  #6  
Old 17.02.2020, 10:22
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,674
Default

Search the source for the domain name; the class name is rpdmvzcm.
The plugin works off the /release/ URL, not /video.

If you don't want to edit the main plugin (since it returns everything), you could use link filters and ignore everything except what you want via regex.
Reply With Quote
  #7  
Old 17.02.2020, 19:13
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

I tried this combo and it does not work.

[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 0,
  "id" : 1581919157382,
  "name" : "rmz.cr",
  "pattern" : "https?://(www\\.)?rmz\\.cr/[^/]/.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : null,
  "rewriteReplaceWith" : null
} ]
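
One thing that can be checked offline is which URLs the "pattern" field of this rule would match. The following is only a rough sketch: it uses Python's `re` module as a stand-in for Java's regex engine (the two should agree for a pattern this simple), it assumes the pattern has to match the whole URL, and both sample URLs are hypothetical.

```python
import json
import re

# The rule as posted, reduced to the fields needed for this check.
rule_text = r'''
[ {
  "name" : "rmz.cr",
  "pattern" : "https?://(www\\.)?rmz\\.cr/[^/]/.+",
  "rule" : "DEEPDECRYPT"
} ]
'''

rule = json.loads(rule_text)[0]
# JSON decoding turns each \\ into a single backslash, so the regex sees \.
pattern = re.compile(rule["pattern"])

# Hypothetical URLs, for illustration only.
samples = [
    "https://rmz.cr/l/m/2",
    "https://rmz.cr/release/some-title",
]
results = {url: bool(pattern.fullmatch(url)) for url in samples}
for url, ok in results.items():
    print(url, "->", "match" if ok else "no match")
```

Because `[^/]` stands for exactly one character, only URLs whose first path segment is a single character (such as `/l/...`) get through. A `/release/...` URL would need `[^/]+` instead.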
Attached Images
File Type: png Screenshot - 2_17_2020 , 12_07_52 PM.png (27.6 KB, 2 views)

Last edited by RPNet-user; 17.02.2020 at 19:17.
Reply With Quote
  #8  
Old 17.02.2020, 20:39
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 50,932
Default

Hi,

Please explain exactly what you want to do here and post example URLs.
The things you tried to do here will not work.

Please add a detailed description of what currently happens when you add the URLs and what the desired behavior would be.

-psp-
__________________

Ad-free installers || Werbefreie Installer
Windows Setup<--JD2 BETA-->Linux Setup x86 || Linux Setup x64 || Mac Setup
-----=>Support Chat<=-----
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
That's true James
Quote:
Originally Posted by James
People just don't understand that, just because you can also shoot at people with a gun, a shooting club is no place for rampage ideas
Reply With Quote
  #9  
Old 17.02.2020, 21:48
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

Nothing is actually happening with this because it is not doing 'any' of what I'm trying to accomplish.

When I test using
Code:
**External links are only visible to Support Staff**
every link from every host and title/post is being added to the LinkGrabber queue, which would still be the case regardless of filters, rules, etc.

Here is what I'm trying to accomplish:

Allow JD2 to add links to the linkgrabber from Uploaded(UL) and ClicknUpload(CU) from rmz.cr using two sets of keyword patterns:

keywords set 1: 1080p Bluray H264 AAC
keywords set 2: 1080p WEB

Set some kind of date range or limit as I do not want it to crawl indefinitely and/or grab links from months/years ago.

In layman's terms: I want the crawler to add only the (UL) and (CU) links for posts whose movie titles match the above keyword patterns within a date range, for example Feb. 10 to the current date. It may require two separate sets of rules, and I'm OK with that.

Example A: marvelvsdc 2019 1080p BluRay H264 AAC-groupname
Example B: dcvsmarvel 2020 1080p WEBRip x264-groupname

So only the links for posts/titles that contain these keywords will be added: '1080p BluRay H264 AAC' and '1080p WEB'.
Given that the 'WEB' is not specific, it will certainly add links from both 'WEBRip' and 'WEB-DL' and that is perfectly fine.
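
The two keyword sets can be written as regular expressions. The sketch below is only an illustration in Python, not a JD2 rule. It assumes the keywords appear in the stated order and are separated by spaces, dots, dashes or underscores. The third title is a made-up non-match.

```python
import re

# Separator between keywords: one or more spaces, dots, underscores or dashes.
SEP = r"[ ._-]+"

# Keyword set 1: all four words, in order.
SET_1 = re.compile(SEP.join(["1080p", "BluRay", "H264", "AAC"]), re.IGNORECASE)
# Keyword set 2: "WEB" right after 1080p, so WEBRip and WEB-DL both qualify.
SET_2 = re.compile(SEP.join(["1080p", "WEB"]), re.IGNORECASE)


def wanted(title: str) -> bool:
    """True if the title matches at least one of the two keyword sets."""
    return bool(SET_1.search(title) or SET_2.search(title))


titles = [
    "marvelvsdc 2019 1080p BluRay H264 AAC-groupname",
    "dcvsmarvel 2020 1080p WEBRip x264-groupname",
    "othertitle 2020 720p WEBRip x264-groupname",  # hypothetical, should be skipped
]
selected = [t for t in titles if wanted(t)]
print(selected)
```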

If it is not possible to do this with a date range, then running it daily for all newly posted titles that match the filter criteria would be fine as well, since this can easily be automated/scheduled.
Reply With Quote
  #10  
Old 17.02.2020, 21:52
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 50,932
Default

So you want to add posts posted on their mainpage based on keywords, correct?

-psp-
Reply With Quote
  #11  
Old 18.02.2020, 00:21
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

Yes, but for a range of past days if possible; otherwise, on a daily basis as they are being posted.

See the screenshot of both keyword search queries, which I filter every three to four days. For 1080p WEB, I use Chrome to highlight 'WEB' on every page of the results, since their search cannot filter on WEB as well as other sites like scene-rls.net and rarbg can. After three days, the left-side query yields 2-3 pages, whereas the WEB (right-side) query yields 16+ pages.

I just noticed that every single post for both queries includes downloadable links for Rapidrar (RR), so it would be best to substitute RR for CU: some posts do not include downloadable links for either UL or CU, and this way the crawler will not skip those filtered posts. So: add the downloadable RR and UL links for the filtered results.

I also realized that the second keyword set, '1080p WEB', may not work well because it will probably include TV shows, which I do not want. So here is a further breakdown of what I actually need from the 1080p WEB keyword query in the Movies category: 1080p WEBRip VXT and 1080p WEBRip RARBG.

RMZ search queries in Movies do not work properly when they include the group name VXT, since it is too short for the query.
So basically, I'm only interested in 1080p WEBRip (VXT and RARBG) and 1080p Bluray (VXT and RARBG) from the Movies category.

Both queries include all four results; otherwise I would have to use several search queries. This is not a problem on scene-rls.net/releases/index.php and RARBG, where two queries, '1080p VXT' and '1080p RARBG', filter the results accurately for just movies. If both of these keyword queries can be used for the site crawler rules, that would be perfect.
Attached Images
File Type: png RMZ.Keyword.Search.Query.png (178.9 KB, 3 views)

Last edited by RPNet-user; 18.02.2020 at 02:51. Reason: added information
Reply With Quote
  #12  
Old 18.02.2020, 18:40
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 50,932
Default

What you want is far too complicated and cannot be accomplished using link crawler rules.
I've also noticed that this website will sometimes ask for a captcha when you run search queries --> That is another issue.

Please also keep in mind that we do not develop plugins that support search queries.

Here are your options:
- Develop an EventScripter script for this purpose
- Develop any other kind of script that crawls the links and adds them to JD from outside e.g. via myjdownloader API or via the old remote API
- Grab our plugin and extend the code so that it does what you want. We are open source --> Easiest solution probably
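
Whichever option is chosen, the selection step (keywords, hosts, date range) can be kept separate from the crawling and from the hand-off to JD. Below is a minimal sketch of just that step in Python. The `Post` structure, the host names and the example.invalid URLs are all hypothetical. Fetching the pages and passing the result to JDownloader are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    posted: date
    links: dict  # host name -> URL (hypothetical structure)


def select_links(posts, keyword_sets, hosts, since):
    """Return URLs on the wanted hosts for posts dated on or after `since`
    whose title contains every keyword of at least one keyword set."""
    picked = []
    for post in posts:
        if post.posted < since:
            continue
        title = post.title.lower()
        if not any(all(k.lower() in title for k in kws) for kws in keyword_sets):
            continue
        picked.extend(url for host, url in post.links.items() if host in hosts)
    return picked


# Made-up data for illustration only.
posts = [
    Post("marvelvsdc 2019 1080p BluRay H264 AAC-groupname", date(2020, 2, 12),
         {"uploaded": "https://example.invalid/ul/1",
          "otherhost": "https://example.invalid/x/1"}),
    Post("dcvsmarvel 2020 1080p WEBRip x264-groupname", date(2020, 2, 1),
         {"uploaded": "https://example.invalid/ul/2"}),   # too old
    Post("sometitle 2020 720p WEBRip x264-groupname", date(2020, 2, 15),
         {"uploaded": "https://example.invalid/ul/3"}),   # wrong resolution
]
keyword_sets = [("1080p", "BluRay", "H264", "AAC"), ("1080p", "WEB")]
urls = select_links(posts, keyword_sets, {"uploaded", "rapidrar"}, date(2020, 2, 10))
print(urls)
```

Plain substring checks are used here instead of regex, so keyword order and adjacency are not enforced. That is a simplification, not a requirement.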

I will mark this thread as "Declined" because we will not develop the solution you want to have.

-psp-
Reply With Quote
  #13  
Old 18.02.2020, 20:55
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

Thanks, I will look into all three options starting with an eventscripter.
Reply With Quote
  #14  
Old 22.02.2020, 14:52
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default URL pattern in link crawler rules

OK, so I'm trying to crawl 5 pages of a site. However, the first page has no number (null) and there is no page labeled "1", so here is how they are numbered. Regex101 reports a 'pattern error', apparently because the backslashes used in JD2 crawler rules do not match what regex101 expects.

**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**

The pattern that I'm testing with is:
"https?://rmz\\.cr/l/m/[0-5]"
Reply With Quote
  #15  
Old 22.02.2020, 15:54
tony2long's Avatar
tony2long tony2long is offline
English Supporter
 
Join Date: Jun 2009
Posts: 6,321
Default

"https?://rmz\\.cr/l/m/[0-5]?"
__________________
FAQ: How to upload a Log
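
To see what the optional digit in the pattern above changes, here is a small check in Python (with single backslashes, since this is not inside a JSON string). The real URLs are hidden in this thread, so the URL shapes below are guesses, in particular whether the first page ends in a slash. The `alt_pattern` variant is also only a suggestion for the case where it does not.

```python
import re

# Pattern with the optional page digit.
page_pattern = re.compile(r"https?://rmz\.cr/l/m/[0-5]?")
# Hypothetical variant that also tolerates a missing trailing slash.
alt_pattern = re.compile(r"https?://rmz\.cr/l/m(/[0-5]?)?")

# Guessed URL shapes, for illustration only.
candidates = [
    "https://rmz.cr/l/m/",   # first page, assuming a trailing slash and no number
    "https://rmz.cr/l/m/2",
    "https://rmz.cr/l/m/5",
    "https://rmz.cr/l/m/6",  # outside the 0-5 range
    "https://rmz.cr/l/m",    # first page, assuming no trailing slash
]
matches = {url: bool(page_pattern.fullmatch(url)) for url in candidates}
alt_matches = {url: bool(alt_pattern.fullmatch(url)) for url in candidates}
for url in candidates:
    print(url, matches[url], alt_matches[url])
```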
Reply With Quote
  #16  
Old 22.02.2020, 15:55
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,674
Default

The error on regex101 occurs because you have double-escaped. Use the Python flavour and replace \\. with \.

In Java (and in the Eclipse IDE) you need to double-escape some characters.
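
The two levels of escaping can be seen with any JSON parser. The sketch below uses Python, assuming the rule text is read as ordinary JSON. It does not reproduce regex101's error message; it only shows that the doubled backslash belongs to the JSON string and not to the regex. The test URLs are hypothetical.

```python
import json
import re

# What is typed into the rule: backslashes are doubled because it is a JSON string.
json_text = r'"https?://rmz\\.cr/l/m/[0-5]?"'

# What the regex engine receives after JSON decoding: a single backslash.
regex_source = json.loads(json_text)
print(regex_source)  # https?://rmz\.cr/l/m/[0-5]?

decoded = re.compile(regex_source)
ok = bool(decoded.fullmatch("https://rmz.cr/l/m/3"))
dot_is_literal = not decoded.fullmatch("https://rmzXcr/l/m/3")

# Pasting the doubled form straight into a regex tester changes its meaning:
# \\ is then a literal backslash followed by "any character".
pasted = re.compile(r"https?://rmz\\.cr/l/m/[0-5]?")
pasted_ok = bool(pasted.fullmatch("https://rmz.cr/l/m/3"))
print(ok, dot_is_literal, pasted_ok)
```

So the single-backslash form is the one to test in a regex tester, and the doubled form is the one to paste into the rule.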
Reply With Quote
  #17  
Old 22.02.2020, 19:49
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

@tony2long, thanks. Unfortunately, adding the ? quantifier gives the same result as leaving it out: it breaks the deepPattern expression. So if no expression in the pattern makes it work, I will need to modify the deepPattern as well.

@raztoki, thanks. Using the Python flavour without the extra backslash does match all five of my crawler test strings. Unfortunately, JD/Java requires the extra backslash, so testing with other flavours does not help with the test strings.
Reply With Quote
  #18  
Old 23.02.2020, 05:29
tony2long's Avatar
tony2long tony2long is offline
English Supporter
 
Join Date: Jun 2009
Posts: 6,321
Default

Sorry, it's not clear to me.
Reply With Quote
  #19  
Old 23.02.2020, 12:27
RPNet-user RPNet-user is offline
JD Addict
 
Join Date: Apr 2017
Posts: 153
Default

Would someone explain why the linkcrawler is able to crawl and grab from:
**External links are only visible to Support Staff****External links are only visible to Support Staff**

But not crawl and grab from:
**External links are only visible to Support Staff****External links are only visible to Support Staff**

Last edited by raztoki; 23.02.2020 at 13:19.
Reply With Quote
  #20  
Old 23.02.2020, 13:20
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,674
Default

At a guess, the link format did not exist when the plugin was created.
Reply With Quote
All times are GMT +2.