#1
[LinkCrawler Rule] request for eurekaddl.asia
Hi staff,
Can you add support for eurekaddl.asia? Example links:
**External links are only visible to Support Staff**
#2
Hi,
No. A plugin for this website is not required, as it contains the URLs directly in its HTML code. If you want JD to auto-crawl URLs from this website, you can accomplish this by creating a LinkCrawler rule (type: DEEPDECRYPT). -psp-
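For reference, a minimal DEEPDECRYPT rule has roughly this shape. The domain and both patterns below are placeholders for illustration, not a working rule for this site:

```json
[ {
  "enabled"     : true,
  "name"        : "example deep-decrypt rule",
  "pattern"     : "https?://example\\.com/.+",
  "rule"        : "DEEPDECRYPT",
  "deepPattern" : "\"(https?://example\\.com/download/[^\"]+)\""
} ]
```

"pattern" decides which added URLs the rule applies to; "deepPattern" is a regular expression whose capture group extracts URLs from those pages' HTML (if it is null, every URL found on the page is crawled).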
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 12.02.2021 at 17:51. Reason: Fixed typo
#3
Hi psp,
I added this rule: Code:
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 2,
  "id" : 1613401915616,
  "name" : "eureka example rule",
  "pattern" : "https?://eurekaddl\\.asia\\?s=\\d+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\"",
  "rewriteReplaceWith" : null
} ]
**External links are only visible to Support Staff**
but it also crawls many other spam links (for example Facebook, Disqus etc.). This rule, for example, doesn't work for links like
**External links are only visible to Support Staff**
even though I set "maxDecryptDepth" : 2.
LOG Code:
15.02.21 16.04.38 <--> 15.02.21 16.30.10 jdlog://2123725302851/
#4
You need a separate rule for that. I made one which only grabs filecrypt.cc URLs, as it seems that's all they're using. (See the end of this post.) The rule you've made does not work at all: your regular expression is wrong, and your DEEPDECRYPT pattern is also way too open. You can use web tools like regex101.com to test your regular expressions. Please keep in mind that it is not part of our support to teach our users how regular expressions work. Please re-read our LinkCrawler Rules documentation and learn how regular expressions work so you can write your own rules in the future. I've made two rules for your two different types of URLs for this website, see here: Code:
[ {
  "enabled" : true,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 1: crawl filecrypt.cc URLs",
  "pattern" : "https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://filecrypt\\.cc/[^\"]+)",
  "rewriteReplaceWith" : null
}, {
  "enabled" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 2: crawl search results",
  "pattern" : "https?://eurekaddl\\.asia/\\?s=.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/wecvyRtt
Please keep in mind that this rule should be optimized further, as at the moment it will also process "category URLs", which will slow it down. -psp-
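To illustrate how the first rule's deepPattern behaves, here is a quick check in Python. The HTML snippet is invented for illustration; the real page markup is not shown in this thread:

```python
import re

# Invented sample of a post page: one filecrypt.cc download link plus a
# social-media link of the kind the earlier, too-open pattern also caught.
html = (
    '<p>Download from <a href="https://filecrypt.cc/Container/ABCDEF.html">mirror</a></p>'
    '<a href="https://facebook.com/sharer">share</a>'
)

# deepPattern of rule 1: the capture group keeps only filecrypt.cc URLs,
# so the Facebook link is ignored.
deep_pattern = re.compile(r'(https?://filecrypt\.cc/[^"]+)')
print(deep_pattern.findall(html))
# ['https://filecrypt.cc/Container/ABCDEF.html']
```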
#5
Thank you very much
#6
@psp
In your JD crawler rule, is it possible to also specify an HTML tag, so that the second rule doesn't capture all eurekaddl URLs from the /?s= query? For example, I see that the Cosey - Jonathan files I want to capture are 12 URLs, but with your second rule JD also captures URLs outside this HTML tag: Code:
class="container mainBg mainContainer"
#7
Sure.
Well, you can't just "define an area" to search, since you're limited to that one pattern, but you can of course change it to only grab URLs inside tags that come after specific HTML classes representing the search results, e.g.: Code:
"deepPattern" : "<div class=\"teaser-box\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
See regex101.com. Full rule(s): Code:
[ {
  "enabled" : true,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 1: crawl filecrypt.cc URLs",
  "pattern" : "https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://filecrypt\\.cc/[^\"]+)",
  "rewriteReplaceWith" : null
}, {
  "enabled" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 2: crawl search results",
  "pattern" : "https?://eurekaddl\\.asia/\\?s=.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "<div class=\"teaser-box\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/mwHdCcxf -psp-
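The effect of anchoring the capture to the teaser-box class can be checked in Python. The markup below is an assumption about what the search-results page looks like, based only on the class name used in the rule:

```python
import re

# Hypothetical search-results markup: one result inside a teaser-box div,
# one unrelated link elsewhere on the page.
html = '''
<div class="teaser-box">
  <a href="https://eurekaddl.asia/some-show/episode-1/">Episode 1</a>
</div>
<div class="sidebar">
  <a href="https://eurekaddl.asia/other-show/episode-2/">unrelated link</a>
</div>
'''

# deepPattern of rule 2: only <a> tags directly after a teaser-box div match,
# so the sidebar link is skipped even though its URL has the right shape.
deep_pattern = re.compile(
    r'<div class="teaser-box">\s*<a href="(https?://eurekaddl\.asia/[a-z0-9\-]+/[a-z0-9\-]+/)"'
)
print(deep_pattern.findall(html))
# ['https://eurekaddl.asia/some-show/episode-1/']
```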
#8
I tried your edit, but it doesn't seem to work well: it crawls only 3 URLs instead of the 12 URLs inside that query.
You chose teaser-box as the tag, and that's fine, but for some strange reason JD fetches only 3 links. I also tried changing teaser-box to row, but then it fetches nothing, although I think I wrote it correctly: Code:
"deepPattern" : "<div class=\"row\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\",
Code:
16.02.21 16.59.12 <--> 16.02.21 16.55.45 jdlog://3943725302851/
Last edited by nathan1; 16.02.2021 at 17:12.
#9
Works just fine here.
Please make sure to use the exact rule I've posted: pastebin.com/mwHdCcxf
Again: you can check all regular expressions here: regex101.com
Please learn how to use regular expressions on your own. -psp-
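A guess at why the row variant finds nothing (the page's real markup is not shown in the thread, so this is only an assumption): \s* matches whitespace only, so if any nested element sits between the row div and the link, the pattern cannot bridge it:

```python
import re

# Assumed markup: in many themes a "row" div wraps column divs, so the <a>
# does not come directly after the row div.
html = ('<div class="row"><div class="col">'
        '<a href="https://eurekaddl.asia/some-show/episode-1/">x</a>'
        '</div></div>')

row_pattern = re.compile(
    r'<div class="row">\s*<a href="(https?://eurekaddl\.asia/[a-z0-9\-]+/[a-z0-9\-]+/)"'
)
print(row_pattern.findall(html))
# [] -- the nested <div class="col"> is not whitespace, so nothing matches
```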
#10
Yes, you're right.
Strange problem. I switched to another Windows account where I have a second JDownloader installed, and there it captures all 12 links correctly. But the JDownloader on the other Windows account still returns only 3 files. Thanks for everything, psp!
#11
@psp
I tried to test this link:
**External links are only visible to Support Staff**
but it crawls only 12 links, while there should be about 45. Is something wrong?
#12
Just scroll down: there are multiple pages of search results.
You'd need to extend that rule to also grab these pages and accept their URL format. -psp-
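Assuming the site uses the common WordPress-style pagination URLs for search results (for example /page/2/?s=query; this format is an assumption, not confirmed in the thread), the second rule's pattern could be widened along these lines:

```json
"pattern" : "https?://eurekaddl\\.asia/(page/\\d+/)?\\?s=.+"
```

The pagination links themselves would also need to be captured by a deepPattern (with maxDecryptDepth raised accordingly) for JD to follow them automatically rather than each page URL being added by hand.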