[Solved] sanet.st sanet.lc softarchive.is Linkcrawler Passing Cookies - JDownloader Community

MediaFanatic · #1 14.04.2022, 05:28

Over the past two years, on and off, I've spent more than 70 hours trying different ways to write LinkCrawler rules in JDownloader that pass cookies.

I've never been able to get them to work. I previously opened a thread on this topic and went into great detail on the syntax and logging; the reply I received was that cookies in LinkCrawler rules are not very good and need to be worked on.

I assumed I would give it time, for these issues to be resolved. Every few months I would try again. To this day, I cannot write a LinkCrawler rule that passes cookies.

Here's a simple example, with sensitive data removed:

Code:

[
 {
  "name"               : "SaNet-SoftArchive",
  "id"                 : 1649901977688,
  "enabled"            : true,
  "pattern"            : "**External links are only visible to Support Staff**",
  "maxDecryptDepth"    : 1,
  "rewriteReplaceWith" : null,
  "passwordPattern"    : null,
  "packageNamePattern" : "<title>(.*?)</title>",
  "rule"               : "DEEPDECRYPT",
  "logging"            : true,
  "formPattern"        : null,
  "deepPattern"        : "(rapidgator\\.net/file/|nitroflare\\.com/view/|nitro\\.download/view/|filefactory\\.com/file/)",
  "updateCookies"      : true,
  "cookies"            : [
                          ["id","123456"],
                          [
                           "sa_remember",
                           "xxxxxd5d26xxxxx0ae03dxxxxxe62170xxxxx359b6xxxxx7517fxxxxx3bxxxxx"
                          ],
                          [
                           "AdskeeperStorage",
                           "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                          ],
                          [
                           "PHPSESSID",
                           "xxxxx2b1cbaxxxxx46cxxxxx669xxxxx"
                          ]
                         ]
 }
]

As you can see, this is an extremely simple example.

This is for the website "SoftArchive" (sanet), the largest OCH-software indexer as rated by traffic.

When I test the exact same settings manually, I am authenticated (the links are shown). Through JDownloader, however, the cookies are not sent.

It's the same situation on every site I've tested, and I've confirmed in the log that the crawler rules are running.
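One way to double-check the manual test outside JDownloader is to send the same cookies yourself. The sketch below assumes that the rule's "cookies" array is simply a list of [name, value] pairs that end up in a standard Cookie request header; it builds such a request with Python's urllib without sending it. The URL is a placeholder, and cookie_header/build_request are hypothetical helpers, not part of JDownloader. The resulting header can be compared with the requests recorded when "logging": true is set in the rule.

```python
import json
import urllib.request

# The "cookies" field in the shape used by the rule above (placeholder values).
RULE_JSON = """
{
  "cookies": [
    ["id", "123456"],
    ["sa_remember", "xxxxx"],
    ["PHPSESSID", "xxxxx"]
  ]
}
"""


def cookie_header(pairs):
    """Join [name, value] pairs into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in pairs)


def build_request(url, pairs):
    """Prepare (but do not send) a GET request carrying the given cookies."""
    req = urllib.request.Request(url)
    req.add_header("Cookie", cookie_header(pairs))
    return req


if __name__ == "__main__":
    pairs = json.loads(RULE_JSON)["cookies"]
    req = build_request("https://example.com/some-page.html", pairs)
    print(req.get_header("Cookie"))  # id=123456; sa_remember=xxxxx; PHPSESSID=xxxxx
```

Passing the prepared request to urllib.request.urlopen would actually send it; that step is left out here on purpose.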

Jiaz · #2 14.04.2022, 10:24

@MediaFanatic:
Next time, please ask for help earlier.

We can often help you faster, or tell you where to look and what the problem might be.

You can enable logging for your rule; then you can see whether the rule matches and what the server responds.
https://support.jdownloader.org/Know...kcrawler-rules
To do so, set:

Quote:

"logging": true,

What I can tell you is that your deepPattern is wrong: it will not return any links at all.

Please note that the deepPattern specifies in which matching group JDownloader should look for links. In your case, the matching group will just be "host.com/file": no protocol, no file ID, nothing.

We can help with the rule, but we need a username/password or cookies. Please send them to support@jdownloader.org (including your rule).
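To illustrate, the Python sketch below applies the rule's deepPattern to a made-up HTML snippet, assuming (as described in this thread) that JDownloader takes whatever the capturing group matched as its result. Python's re stands in for JDownloader's Java regex engine; for a simple pattern like this one both behave the same. The file IDs in the snippet are invented.

```python
import re

# deepPattern from the rule above, after JSON unescaping ("\\." becomes "\.").
DEEP_PATTERN = (
    r"(rapidgator\.net/file/|nitroflare\.com/view/|"
    r"nitro\.download/view/|filefactory\.com/file/)"
)

# Hypothetical page snippet; the file IDs are made up.
HTML = """
<a href="https://rapidgator.net/file/abc123/example.rar.html">RG</a>
<a href="https://nitroflare.com/view/XYZ789/example.rar">NF</a>
"""

# With one capturing group, findall returns only the captured text per match.
found = re.findall(DEEP_PATTERN, HTML)
print(found)  # ['rapidgator.net/file/', 'nitroflare.com/view/']
```

The captured strings have no protocol and no file ID, so none of them is a usable link.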

Jiaz · #3 14.04.2022, 10:32

Quote:

Originally Posted by MediaFanatic

I assumed I would give it time, for these issues to be resolved.

Currently, there are no known issues with cookies in LinkCrawler rules.

MediaFanatic · #4 26.04.2022, 09:15

@Jiaz --
Thank you for the reply. I'm sorry I didn't reply earlier; for some reason I didn't receive the forum notification despite being subscribed (I will check my spam folder).

I should clarify: I did bring this to your attention much earlier. When I wrote about "waiting", it was because my earlier forum posts ended in a dead end. In that thread, @raztoki mentioned there wasn't a solution and that the cookie implementation was not ideal.

I interpreted that to mean the cookie implementation would improve over time. In the meantime, I continued testing different sites, writing my own demo page to test with, and so on.

Of course I've been using logging; however, it has not helped. It does not show why the rule didn't work, as you suggest it would. When I started about two years ago, the custom LinkCrawler logging was better. The logging has changed since then and (I'm sure I could be wrong) from my impression, in the area of custom-LinkCrawlers, it has shown fewer details in the newer logging approach.

If you are interested in my original message that resulted in a dead end, I would be very happy if you could take a look, starting at the first post and reading chronologically: https://board.jdownloader.org/showthread.php?t=83773

I included the full log in that message; it predates the new logging and was a bit more helpful (for my particular case).

Thank you in advance for your time and insight

MediaFanatic · #5 26.04.2022, 09:50

Sorry, one more thing to add:

Quote:

Originally Posted by Jiaz

Please note that the deepPattern specifies in which matching group JDownloader should look for links. In your case, the matching group will just be "host.com/file": no protocol, no file ID, nothing.

The DeepPattern I'm using is in the same format specified in the articles / posts I have been able to find on LinkCrawler Rules.

It's the exact format you mentioned "host.com/file". There is no protocol, no fileID, nothing else -- just as you've written.

Here it is, one more time:

Code:

"(rapidgator\\.net\\/file\\/|nitroflare\\.com\\/view\\/|nitro\\.download\\/view\\/|filefactory\\.com\\/file\\/)"

Because this is a regex, I have to escape the dot "."; in simpler terms, though, I'm just doing what you described (host.com/file) with "OR" logic, nothing more.
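The escaping itself is fine. As a quick check, the Python sketch below uses a shortened two-host version of the pattern and shows that the JSON string's double backslashes decode to single regex backslashes, and that "\." matches only a literal dot. It says nothing about what the pattern captures, which is the real issue raised in the replies.

```python
import json
import re

# A shortened version of the pattern, written exactly as inside the JSON rule.
RULE_TEXT = r'{"deepPattern": "(rapidgator\\.net\\/file\\/|nitroflare\\.com\\/view\\/)"}'

# JSON decoding turns each "\\" into a single backslash for the regex engine.
pattern = json.loads(RULE_TEXT)["deepPattern"]
print(pattern)  # (rapidgator\.net\/file\/|nitroflare\.com\/view\/)

# "\." only matches a literal dot, whereas an unescaped "." matches any character.
print(bool(re.search(pattern, "rapidgator.net/file/")))      # True
print(bool(re.search(pattern, "rapidgatorXnet/file/")))      # False
print(bool(re.search(r"rapidgator.net", "rapidgatorXnet")))  # True
```

Escaping "/" is not required in either Java or Python regexes, but it is harmless.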

There was one mistake, but it was introduced when I entered the sample here in the forum, not in the original rule. I have corrected it and created a sample where you can see very clearly how this simple rule works:
**External links are only visible to Support Staff**

Thank you again for all of your help!

Jiaz · #6 26.04.2022, 15:57

@MediaFanatic: Nothing to be sorry for. Yes, I'm aware of the old thread but I'm currently not aware of any problems with cookie support in linkcrawler rules.

Quote:

Originally Posted by MediaFanatic

The logging has changed since then and (I'm sure I could be wrong) from my impression, in the area of custom-LinkCrawlers, it has shown fewer details in the newer logging approach.

With logging enabled in the rule, the log will contain the request and response of each corresponding request. I'm not aware of any problems with the logging; maybe you can explain what you're missing?
I also took another look at https://board.jdownloader.org/showpo...76&postcount=1, and I wonder about this rule because it contains fields that are not supported, handled, or used at all, like accountPattern/domainPattern. It looks like a LinkCrawler rule mixed with a DomainRule, because accountPattern/domainPattern belong to DomainRules, not LinkCrawler rules. It is also missing the essential pattern field, so JDownloader will not use that rule at all.

I guess the problem is caused by the wrong/mixed-up rule in the first place. pspzocker wrote good help articles on this; see:
https://support.jdownloader.org/Know...kcrawler-rules
https://support.jdownloader.org/Know...kcrawler-rules
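Mistakes like these can be caught before importing a rule. The Python helper below (check_rule is hypothetical, not part of JDownloader) flags a missing "pattern" and the two DomainRule fields named above. Its set of "known" fields is only the set of fields used in the LinkCrawler rules in this thread, not an official list, so an unknown-field warning just means "verify this against the documentation".

```python
import json

# Field names that appear in the LinkCrawler rules in this thread.
# This set is NOT an exhaustive list of supported fields.
SEEN_LINKCRAWLER_FIELDS = {
    "name", "id", "enabled", "pattern", "maxDecryptDepth",
    "rewriteReplaceWith", "passwordPattern", "packageNamePattern",
    "rule", "logging", "formPattern", "deepPattern",
    "updateCookies", "cookies",
}
# Per the reply above, these belong to DomainRules, not LinkCrawler rules.
DOMAIN_RULE_FIELDS = {"accountPattern", "domainPattern"}


def check_rule(rule):
    """Return a list of human-readable warnings for one rule dict."""
    warnings = []
    if not rule.get("pattern"):
        warnings.append("missing 'pattern': the rule will never be used")
    for key in sorted(rule):
        if key in DOMAIN_RULE_FIELDS:
            warnings.append(f"'{key}' is a DomainRule field")
        elif key not in SEEN_LINKCRAWLER_FIELDS:
            warnings.append(f"'{key}' not seen in the examples here; double-check it")
    return warnings


if __name__ == "__main__":
    # A rule list as it would be pasted into JDownloader (JSON array).
    mixed = json.loads('[{"name": "test", "rule": "DEEPDECRYPT", "domainPattern": "x"}]')
    for rule in mixed:
        for warning in check_rule(rule):
            print(warning)
```

For the mixed rule in the example, it reports the missing pattern first and then the domainPattern field.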

Jiaz · #7 26.04.2022, 16:07

Quote:

Originally Posted by MediaFanatic

The DeepPattern I'm using is in the same format specified in the articles / posts I have been able to find on LinkCrawler Rules.

It's the exact format you mentioned "host.com/file". There is no protocol, no fileID, nothing else -- just as you've written.

Here it is, one more time:

Code:

"(rapidgator\\.net\\/file\\/|nitroflare\\.com\\/view\\/|nitro\\.download\\/view\\/|filefactory\\.com\\/file\\/)"

This pattern is wrong. JDownloader will only find the matching group! Your pattern must include the complete link. In your case, JDownloader will just return, for example, "rapidgator.net/file/".
You can see this in your regex101.com/r/x8Z5bo/1 test too: the "match information" panel on the right side shows what JDownloader will *find*, and those are not valid links at all.

Your deepPattern must match either exactly the link you want to find or the region where the links can be found.
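For comparison, here is a hypothetical deepPattern (not taken from the thread) whose capturing group covers the complete link, applied in Python to the same kind of made-up snippet. It assumes that file IDs and names contain no quotes, whitespace or angle brackets; the inner host list uses a non-capturing group "(?:...)" so that only one group remains.

```python
import re

# Hypothetical corrected deepPattern: the capturing group now spans the whole
# URL (protocol, host, path prefix and file ID) instead of just "host/path/".
DEEP_PATTERN = (
    r'(https?://(?:rapidgator\.net/file/|nitroflare\.com/view/|'
    r'nitro\.download/view/|filefactory\.com/file/)[^"\s<>]+)'
)

# Made-up page snippet for illustration.
HTML = """
<a href="https://rapidgator.net/file/abc123/example.rar.html">RG</a>
<a href="https://nitroflare.com/view/XYZ789/example.rar">NF</a>
<a href="https://example.com/about">not a download</a>
"""

print(re.findall(DEEP_PATTERN, HTML))
# ['https://rapidgator.net/file/abc123/example.rar.html',
#  'https://nitroflare.com/view/XYZ789/example.rar']
```

Inside the JSON rule, the backslashes would have to be doubled ("\\." and "\\s") and the double quote written as \".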

I can help with the rule but I need working cookies. You can send them to support@jdownloader.org

MediaFanatic · #8 27.04.2022, 02:54

@Jiaz, thank you!

Yes, exactly: @pspzocker's page you linked is one of the two I used to learn and compose the rule. I also searched posts on this forum.

To your comment re:Invalid Fields --

If you examine my rule above, in the first post, do you see any fields that are invalid? If there is anything unrelated to LinkCrawler in it, I'd like to know. (I didn't see a domainPattern or accountPattern; those may have been in my original thread, where I was still testing different ideas from the forum.) Hopefully I'm no longer including any incorrect fields.

Your second post about the "deepPattern" is perfect.

I had no idea that it had to match the entire link; I thought it was a wildcard, where anything that matched would be automatically searched for the full HREF. I also didn't realize that it needed to be an explicit match-group (separate parentheses to indicate capture group).

Based on your help, I simply wrote a RegEx rule that took everything between the specific HREF HTML tags that included downloads. This worked perfectly!

Thanks to your deepPattern suggestion, which eliminated that issue, I was also able to experiment with the JSON, address another problem, and finally get my first cookie to pass properly. This let me see the cookie issues more clearly and resolve them, at least on the simple test website I created for diagnosis.

Now that I have a working test scenario, I'll return to my original problem with LinkedIn Learning, to see if I can find a way for JDownloader to download my videos.

Thank you again for your help

Jiaz · #9 27.04.2022, 10:25

Quote:

Originally Posted by MediaFanatic

To your comment re:Invalid Fields --
If you examine my rule above, in the first post, do you see any fields that are invalid?

I'm sorry, my bad. I was referring to your other thread
https://board.jdownloader.org/showthread.php?t=83773

Jiaz · #10 27.04.2022, 10:25

Quote:

Originally Posted by MediaFanatic

I had no idea that it had to match the entire link; I thought it was a wildcard, where anything that matched would be automatically searched for the full HREF. I also didn't realize that it needed to be an explicit match-group (separate parentheses to indicate capture group).

I will ask pspzockerscene to explain this more explicitly.

Quote:

Originally Posted by MediaFanatic

Based on your help, I simply wrote a RegEx rule that took everything between the specific HREF HTML tags that included downloads. This worked perfectly!

You're welcome, and sorry it took so long to get to the root of the issue.

Jiaz · #11 27.04.2022, 10:32

Quote:

Originally Posted by MediaFanatic

Thank you again for your help

You're welcome. Next time, better ask right away instead of waiting an eternity.

Sometimes I also just overlook or forget about issues/threads until I read about them again.

pspzockerscene · #12 11.10.2022, 12:18

As a public reply to a ticket we received, I'm posting an updated rule for this website here, since the website has changed slightly.
The new rule also sets the package name automatically from the website's title and supports a few more domains:

Code:

[
  {
    "name": "example rule for sanet.st with login cookies",
    "enabled": true,
    "cookies": [
      [
        "sa_remember",
        "CENSORED"
      ],
      [
        "id",
        "CENSORED"
      ]
    ],
    "updateCookies": true,
    "pattern": "https?://(sanet\\.st|softarchive\\.is|sanet\\.lc)/.*?\\.\\d+\\.html",
    "rule": "DEEPDECRYPT",
    "deepPattern": "<a rel=\"external nofollow noopener\" href=\"(https?://[^\"]+)\" target=\"_blank\">",
    "packageNamePattern": "<title>(.*?)</title>"
  }
]

Rule as plaintext for easier copy & paste:
pastebin.com/raw/nx6tuF59

This rule will only work after you add valid login cookies for that website (see the fields whose values are "CENSORED" in the example)!
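To see what each field of this rule does, the Python sketch below loads its three patterns from JSON (which also checks the escaping) and applies them to a made-up page URL and HTML snippet. Python's re stands in for JDownloader's Java regex engine, and checking "pattern" with fullmatch is an assumption about how strictly JDownloader anchors it. Cookies are omitted because no request is made.

```python
import json
import re

# pspzockerscene's rule from above, cookies omitted.
RULE = json.loads(r'''
{
  "pattern": "https?://(sanet\\.st|softarchive\\.is|sanet\\.lc)/.*?\\.\\d+\\.html",
  "deepPattern": "<a rel=\"external nofollow noopener\" href=\"(https?://[^\"]+)\" target=\"_blank\">",
  "packageNamePattern": "<title>(.*?)</title>"
}
''')

# Made-up page URL and HTML, for illustration only.
PAGE_URL = "https://sanet.st/blogs/example/some_title.1234567.html"
HTML = (
    "<html><head><title>Some Title</title></head><body>"
    '<a rel="external nofollow noopener" href="https://rapidgator.net/file/abc" target="_blank">RG</a>'
    "</body></html>"
)

# "pattern" decides which added URLs the rule applies to.
print(bool(re.fullmatch(RULE["pattern"], PAGE_URL)))         # True
# "deepPattern" group 1 is the link found on the page.
print(re.findall(RULE["deepPattern"], HTML))                 # ['https://rapidgator.net/file/abc']
# "packageNamePattern" group 1 becomes the package name.
print(re.search(RULE["packageNamePattern"], HTML).group(1))  # Some Title
```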

Cpd0day · #13 08.11.2023, 21:01

Hi,
They seem to have changed the login process on the site and this rule no longer works.

Could someone check what modifications I would need to make to get it working?

Sorry, I am not a coder, so the modifications I tried haven't worked.

Thanks
C

pspzockerscene · #14 09.11.2023, 09:54

You don't need to be a coder for this.

To be able to help you, I need the following information via PM:

Your login credentials for said website
Example URLs that you want to add to JDownloader

pspzockerscene · #15 16.11.2023, 11:54

The credentials you sent me via PM do not work.
The website appears to be broken: login fails even with my adblocker disabled.

Cpd0day · #16 23.12.2023, 06:41

I checked and it seems to work for me:

user: cpd0day
pw: REMOVED_BY_PSPZOCKERSCENE

site: sanet.st
alternative urls:

**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**

Thanks in advance

cpd0day

pspzockerscene · #17 28.12.2023, 13:26

So is this just for information, or do you need help?
I'm asking because you wrote "it seems to work for me".

pspzockerscene · #18 28.12.2023, 16:48

I've checked the rule, and indeed it no longer worked, as the website has changed a bit.
I've updated the deepPattern and now it's working again:

Code:

[
  {
    "name": "example rule for sanet.st with login cookies",
    "enabled": true,
    "logging": false,
    "cookies": [
      [
        "sa_remember",
        "CENSORED"
      ],
      [
        "id",
        "CENSORED"
      ]
    ],
    "updateCookies": true,
    "pattern": "https?://(sanet\\.st|softarchive\\.is|sanet\\.lc)/.*?\\.\\d+\\.html",
    "rule": "DEEPDECRYPT",
    "deepPattern": "<a rel=\"external nofollow noopener\" href=\"(https?://[^\"]+)\"[^>]*target=\"_blank\">",
    "packageNamePattern": "<title>(.*?)</title>"
  }
]

Rule on pastebin for easier copy & paste:
pastebin.com/raw/xWjuMyv9
Changes compared to the previous rule: "[^>]*" was added before target in the deepPattern, and the "logging" field was added.
If you want this rule to work for more domains, feel free to add them yourself.
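The deepPattern change can be checked in isolation. The Python sketch below compares the 2022 and the updated deepPattern (both JSON-unescaped) on a made-up link; the extra data-extra attribute is an assumption standing in for whatever markup the site actually added between href and target.

```python
import re

# deepPattern from the 2022 rule and from the updated rule (JSON-unescaped).
OLD = r'<a rel="external nofollow noopener" href="(https?://[^"]+)" target="_blank">'
NEW = r'<a rel="external nofollow noopener" href="(https?://[^"]+)"[^>]*target="_blank">'

# Hypothetical markup: an extra attribute now sits between href and target.
HTML = (
    '<a rel="external nofollow noopener" href="https://rapidgator.net/file/abc"'
    ' data-extra="1" target="_blank">RG</a>'
)

print(re.findall(OLD, HTML))  # []
print(re.findall(NEW, HTML))  # ['https://rapidgator.net/file/abc']
```

Because "[^>]*" cannot cross a ">", the new pattern still stays within a single <a ...> tag.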


JDownloader Community Board - Archive
Provided By AppWork GmbH