#1
[LinkCrawler Rule] request for eurekaddl.asia
Hi staff,
Can you add support for eurekaddl.asia? Example links:
**External links are only visible to Support Staff**
#2
Hi,
No. A plugin for this website is not required, as it contains the URLs directly in its HTML code. If you want JD to auto-crawl URLs from this website, you can accomplish this by creating a LinkCrawler rule (type: DEEPDECRYPT). -psp-
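For reference, a minimal DEEPDECRYPT rule has roughly this shape. The domain and both patterns below are placeholders for illustration, not a working rule for this site:

```json
[ {
  "enabled"     : true,
  "name"        : "example deep-decrypt rule",
  "pattern"     : "https?://example\\.com/.+",
  "rule"        : "DEEPDECRYPT",
  "deepPattern" : "\"(https?://example\\.com/download/[^\"]+)\""
} ]
```

"pattern" decides which added URLs the rule applies to; "deepPattern" is a regular expression whose capture group extracts URLs from those pages' HTML (if it is null, every URL found on the page is crawled).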
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 12.02.2021 at 17:51. Reason: Fixed typo
#3
Hi psp,
I added this rule: Code:
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 2,
  "id" : 1613401915616,
  "name" : "eureka example rule",
  "pattern" : "https?://eurekaddl\\.asia\\?s=\\d+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\"",
  "rewriteReplaceWith" : null
} ]
**External links are only visible to Support Staff**
but it also crawls many other spam links (for example Facebook, Disqus etc.). This rule, for example, doesn't work for links like
**External links are only visible to Support Staff**
even though I set "maxDecryptDepth" : 2.
LOG Code:
15.02.21 16.04.38 <--> 15.02.21 16.30.10 jdlog://2123725302851/
#4
You need a separate rule for that. I made one which only grabs filecrypt.cc URLs, as it seems that's all they're using. (See the end of this post.) The rule you've made does not work at all: your regular expression is wrong, and your DEEPDECRYPT pattern is also way too open. You can use web tools like regex101.com to test your regular expressions. Please keep in mind that it is not part of our support to teach our users how regular expressions work. Please re-read our LinkCrawler Rules documentation and learn how regular expressions work so you can write your own rules in the future. I've made two rules for your two different types of URLs for this website, see here: Code:
[ {
  "enabled" : true,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 1: crawl filecrypt.cc URLs",
  "pattern" : "https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://filecrypt\\.cc/[^\"]+)",
  "rewriteReplaceWith" : null
}, {
  "enabled" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 2: crawl search results",
  "pattern" : "https?://eurekaddl\\.asia/\\?s=.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/wecvyRtt
Please keep in mind that this rule should be optimized further, as at the moment it will also process "category URLs", which will slow it down. -psp-
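To illustrate how the first rule's deepPattern behaves, here is a quick check in Python. The HTML snippet is invented for illustration; the real page markup is not shown in this thread:

```python
import re

# Invented sample of a post page: one filecrypt.cc download link plus a
# social-media link of the kind the earlier, too-open pattern also caught.
html = (
    '<p>Download from <a href="https://filecrypt.cc/Container/ABCDEF.html">mirror</a></p>'
    '<a href="https://facebook.com/sharer">share</a>'
)

# deepPattern of rule 1: the capture group keeps only filecrypt.cc URLs,
# so the Facebook link is ignored.
deep_pattern = re.compile(r'(https?://filecrypt\.cc/[^"]+)')
print(deep_pattern.findall(html))
# ['https://filecrypt.cc/Container/ABCDEF.html']
```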
#5
Thank you very much
#6
@psp
In your JD crawler rule, is it possible to also specify an HTML tag, so that the second rule doesn't capture all eurekaddl URLs from the /?s= query? For example, I see that the Cosey - Jonathan files I want to capture are 12 URLs, but with your second rule JD also captures URLs outside this HTML tag: Code:
class="container mainBg mainContainer"
#7
Sure.
Well, you can't just "define an area" to search, since you're limited to that one pattern, but you can of course change it to only grab URLs inside tags that come after specific HTML classes representing the search results, e.g.: Code:
"deepPattern" : "<div class=\"teaser-box\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
See regex101.com. Full rule(s): Code:
[ {
  "enabled" : true,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 1: crawl filecrypt.cc URLs",
  "pattern" : "https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://filecrypt\\.cc/[^\"]+)",
  "rewriteReplaceWith" : null
}, {
  "enabled" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "eurekaddl.asia 2: crawl search results",
  "pattern" : "https?://eurekaddl\\.asia/\\?s=.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "<div class=\"teaser-box\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/mwHdCcxf -psp-
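The effect of anchoring the capture to the teaser-box class can be checked in Python. The markup below is an assumption about what the search-results page looks like, based only on the class name used in the rule:

```python
import re

# Hypothetical search-results markup: one result inside a teaser-box div,
# one unrelated link elsewhere on the page.
html = '''
<div class="teaser-box">
  <a href="https://eurekaddl.asia/some-show/episode-1/">Episode 1</a>
</div>
<div class="sidebar">
  <a href="https://eurekaddl.asia/other-show/episode-2/">unrelated link</a>
</div>
'''

# deepPattern of rule 2: only <a> tags directly after a teaser-box div match,
# so the sidebar link is skipped even though its URL has the right shape.
deep_pattern = re.compile(
    r'<div class="teaser-box">\s*<a href="(https?://eurekaddl\.asia/[a-z0-9\-]+/[a-z0-9\-]+/)"'
)
print(deep_pattern.findall(html))
# ['https://eurekaddl.asia/some-show/episode-1/']
```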
#8
I tried your edit, but it doesn't seem to work well: it crawls only 3 URLs instead of the 12 URLs inside that query.
You chose teaser-box as the tag, and that's fine, but for some strange reason JD fetches only 3 links. I also tried changing teaser-box to row, but then it fetches nothing, although I think I wrote it correctly: Code:
"deepPattern" : "<div class=\"row\">\\s*<a href=\"(https?://eurekaddl\\.asia/[a-z0-9\\-]+/[a-z0-9\\-]+/)\",
Code:
16.02.21 16.59.12 <--> 16.02.21 16.55.45 jdlog://3943725302851/
Last edited by nathan1; 16.02.2021 at 17:12.
#9
Works just fine here.
Please make sure to use the exact rule I've posted: pastebin.com/mwHdCcxf
Again: you can check all regular expressions here: regex101.com
Please learn how to use regular expressions on your own. -psp-
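A guess at why the row variant finds nothing (the page's real markup is not shown in the thread, so this is only an assumption): \s* matches whitespace only, so if any nested element sits between the row div and the link, the pattern cannot bridge it:

```python
import re

# Assumed markup: in many themes a "row" div wraps column divs, so the <a>
# does not come directly after the row div.
html = ('<div class="row"><div class="col">'
        '<a href="https://eurekaddl.asia/some-show/episode-1/">x</a>'
        '</div></div>')

row_pattern = re.compile(
    r'<div class="row">\s*<a href="(https?://eurekaddl\.asia/[a-z0-9\-]+/[a-z0-9\-]+/)"'
)
print(row_pattern.findall(html))
# [] -- the nested <div class="col"> is not whitespace, so nothing matches
```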
#10
Yes, you're right.
Strange problem. I switched to another Windows account where I have a second JDownloader installed, and there it captures all 12 links correctly. But the JDownloader on the other Windows account still returns only 3 files. Thanks for everything, psp!
#11
@psp
I tried to test this link:
**External links are only visible to Support Staff**
but it crawls only 12 links, while there should be about 45. Is something wrong?
#12
Just scroll down: there are multiple pages of search results.
You'd need to extend that rule to also grab these pages and accept their URL format. -psp-
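Assuming the site uses the common WordPress-style pagination URLs for search results (for example /page/2/?s=query; this format is an assumption, not confirmed in the thread), the second rule's pattern could be widened along these lines:

```json
"pattern" : "https?://eurekaddl\\.asia/(page/\\d+/)?\\?s=.+"
```

The pagination links themselves would also need to be captured by a deepPattern (with maxDecryptDepth raised accordingly) for JD to follow them automatically rather than each page URL being added by hand.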