JDownloader Community - Appwork GmbH
 

  #1  
01.12.2024, 15:52
DevLA
[LinkCrawler Rules] How to download multiple things from xivmodarchive.com

Request for Guidance: Downloading Preview Images and Mod URLs with JDownloader 2
I am seeking assistance in using JDownloader 2 to capture preview images and mod URLs from **External links are only visible to Support Staff****External links are only visible to Support Staff**. The site requires a Discord login to access its full functionality and uses Cloudflare protection. Would using cookies help bypass these restrictions?

Below are examples of the URLs I am trying to download, categorized as mod pages, preview images, and mod files:

Mod Pages:

**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
Preview Images:

**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
Mod Files:

**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
External Links: Some mods redirect to external platforms, such as Google Drive or Mega.nz:

**External links are only visible to Support Staff****External links are only visible to Support Staff**
Author Pages and Search Results:

Author Page Example
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**

How can I set up JDownloader 2 to efficiently grab all the above content, especially given the site's login and Cloudflare protection? Basically I want each url in its own package with its own content like other websites have it in JDownloader2. Is a plugin or a LinkCrawler rule needed? Any guidance or workarounds would be greatly appreciated.
  #2  
02.12.2024, 13:51
pspzockerscene (Community Manager)

Hi,

Quote:
Originally Posted by DevLA
How can I set up JDownloader 2 to efficiently grab all the above content
Create multiple LinkCrawler rules that do what you want.
Docs:
https://support.jdownloader.org/know...kcrawler-rules

https://support.jdownloader.org/know...kcrawler-rules

Quote:
Originally Posted by DevLA
Basically I want each url in its own package with its own content like other websites have it in JDownloader2
Default handling should work fine for you then.

Here is the rough way:

1. Create a Rule of type DIRECTHTTP for the "mod files" links.
https://support.jdownloader.org/know...ple-directhttp
Test it.

2. To make JD automatically find the "preview images" and/or also the "mod files" links from the "mod pages", you need another rule, this time of type DEEPDECRYPT:
https://support.jdownloader.org/know...le-deepdecrypt

3. About your Google Drive link:
JD supports Google Drive so that will work out of the box.

4. About "Author Pages and Search Results"
Either create more rules (a single rule of type DEEPDECRYPT could do the job) that find the "/modid/" links inside user/search pages, or use the following semi-automatic way:
https://support.jdownloader.org/know...orted-websites

You can find a lot of example rules for other websites in our forums and some basic examples in our knowledgebase.
__________________
JD Supporter, Plugin Dev. & Community Manager

Getting Started & Tutorials || JDownloader 2 Setup Download
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
  #3  
02.12.2024, 16:47
Jiaz (JD Manager)

@DevLA: If you're stuck, need help, or have questions, please just ask.
__________________
JD-Dev & Server-Admin
  #4  
04.12.2024, 01:26
DevLA

So, I tried setting up a LinkCrawler rule. However, when I run it, nothing happens. I checked the LinkCrawler log, and it says it's blocked by Cloudflare. If I try to copy and paste a direct URL link into JDownloader, it will pick it up. But when I attempt to download it, it says it's blocked by Cloudflare.

Here is the linkcrawler rule and setup.

Code:
[
 {
  "cookies"            : [
                          [
                           "cf_clearance",
                           "4MAk312EtgkDrAM9_wO9LUW7mrOHBuk4FARou.u.0GY-1733267423-1.2.1.1-yBtXi8zE3bIyFYfydW6PIU030FpBewhM0d2Rk3HD7DtoOVlN1qGbZJs_X2hpgev0UGl9IY4tX4iW9osSa7f3IOxpBNE1vDfLuobnCn1IX7AlK6yZoU4FaMq9idJ2kbIJbr18skGQHsrPzLiNoWNAXgffggggfgeMDsVkzeU5XFGzR1UuDyMh0Z5YEnHLcNYBcWEfP2xzpsgSgiGKd8wzLnqDy_cKm_Anm37fLK61xzmfIUEsPYK_E6I.zGDOq4KyQEaKze6BKnoXCUmFOCSDGrA_NZ6wQayl.G7oqiTGfTpCCKWr0CLYARmn11oCnf5KSxHm4HWZg60ZqumiDvJlf.N7W.rgPqQ7dcMraBuKCzNHvArRu2h8ddEXvWx.WaOc0Ic276cp8UU7RFqbxKFkkPYT.8_GrITQVubHAkFPRe1PZ8gKe7HAtoDkAJvtgSVTgTrDFl2m2S5h7A"
                          ],
                          [
                           "connect.sid",
                           "s%3ASdILioTgsApfJlw3_J8hrr5TSnLQFsJfggffggfggA7Z2XgDGCT%2B8XFXSIZEnt8rCElG1A"
                          ]
                         ],
  "deepPattern"        : "<a href=\"/private/([a-f0-9\\-]+)\" id=\"mod-download-link\">",
  "formPattern"        : null,
  "headers"            : null,
  "id"                 : 1716000458778,
  "maxDecryptDepth"    : 0,
  "name"               : null,
  "packageNamePattern" : null,
  "passwordPattern"    : null,
  "pattern"            : "**External links are only visible to Support Staff**,
  "propertyPatterns"   : null,
  "rewriteReplaceWith" : null,
  "rule"               : "DEEPDECRYPT",
  "enabled"            : true,
  "logging"            : true,
  "updateCookies"      : false
 }
]
The cookie values were modified so that they are no longer usable.

Last edited by DevLA; 04.12.2024 at 01:31.
  #5  
04.12.2024, 10:38
pspzockerscene (Community Manager)

@DevLA
I cannot reproduce the Cloudflare problem.

Also the deepPattern of your rule is wrong and thus it will not find any results.
Also, you should really create that DIRECTHTTP rule for the "mod files" links first because if you don't, only those "/private" links that contain known file extensions (e.g. .zip) will be detected.
However, for basic testing it is fine to create the DEEPDECRYPT rule first.

Here is a version of your rule with the deepPattern fixed:
Code:
[
  {
    "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">",
    "formPattern": null,
    "headers": null,
    "id": 1716000458778,
    "maxDecryptDepth": 0,
    "name": null,
    "packageNamePattern": null,
    "pattern": "**External links are only visible to Support Staff**,
    "propertyPatterns": null,
    "rewriteReplaceWith": null,
    "rule": "DEEPDECRYPT",
    "enabled": true,
    "logging": true,
    "updateCookies": false
  }
]
Plaintext:
pastebin.com/raw/tF3QwU96

Example used for testing:
xivmodarchive.com/modid/122789

Results: Finds one .zip link
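
To illustrate why the original deepPattern found nothing while the corrected one does, here is a minimal Python sketch. The HTML snippet and the file path inside it are made up for illustration (the real page markup may differ), and Python's `re` module stands in for the Java regex engine used by JDownloader; for patterns this simple the two behave alike.

```python
import re

# Hypothetical anchor tag, modelled on the patterns quoted in this thread.
html = '<a href="/private/0a1b2c3d-4e5f/files/example_mod.zip" id="mod-download-link">Download</a>'

# Original deepPattern: the class [a-f0-9\-]+ cannot match the "/" before
# "files", so the closing quote is never reached and nothing matches.
original = r'<a href="/private/([a-f0-9\-]+)" id="mod-download-link">'

# Corrected deepPattern: [^"]+ consumes everything up to the closing quote,
# and the capture group now holds the whole relative URL.
fixed = r'<a href="(/private/[^"]+)" id="mod-download-link">'

original_match = re.search(original, html)
fixed_match = re.search(fixed, html)

print(original_match)        # None
print(fixed_match.group(1))  # /private/0a1b2c3d-4e5f/files/example_mod.zip
```

Note that the capture group moved as well: the original captured only the ID segment, while the corrected pattern captures the full path that JDownloader should follow.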
  #6  
04.12.2024, 12:44
Jiaz (JD Manager)

@DevLA: Maybe add an additional LinkCrawler rule that also provides cookies for the download URLs? It must be of rule type DIRECTHTTP, and its pattern must match the download URLs. That may work.
  #7  
04.12.2024, 23:03
DevLA

@pspzockerscene
Thank you for the fix! I’m still relatively new to link crawling and have limited experience with regex and wildcards, so I greatly appreciate learning from the resources and guidance provided by you and the team. Your support has been incredibly helpful as I navigate this process.

I would like to test the different patterns and configurations, but I’m currently encountering persistent Cloudflare-related issues. It seems like JDownloader may not be updating with the new cookies I’ve been adding when modifying the LinkCrawler rules. This makes it difficult to determine whether my changes are effective or not.

@Jiaz
Thank you for your suggestion; I’ve incorporated it into the rules to see if it resolves the issue. However, I’m still facing the same problems, and I’m uncertain whether the issue lies with my JDownloader setup or with the format of my LinkCrawler rules.

Here is the linkcrawler rules and here is the linkcrawler log
Code:
[
  {
    "cookies": [
      [""],
      ["connect.sid", ""]
    ],
    "deepPattern": null,
    "formPattern": null,
    "headers": null,
    "id": 1716000458779,
    "maxDecryptDepth": 0,
    "name": "Direct File Links",
    "packageNamePattern": null,
    "passwordPattern": null,
    "pattern": "/private/[a-f0-9\\-]+/files/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$",
    "propertyPatterns": null,
    "rewriteReplaceWith": null,
    "rule": "DIRECTHTTP",
    "enabled": true,
    "logging": true,
    "updateCookies": false
  },
  {
    "cookies": [
      [""],
      ["connect.sid", ""]
    ],
    "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">",
    "formPattern": null,
    "headers": null,
    "id": 1716000458778,
    "maxDecryptDepth": 0,
    "name": "Private Links",
    "packageNamePattern": null,
    "passwordPattern": null,
    "pattern": "**External links are only visible to Support Staff**,
    "propertyPatterns": null,
    "rewriteReplaceWith": null,
    "rule": "DEEPDECRYPT",
    "enabled": true,
    "logging": true,
    "updateCookies": false
  }
]
Log
04.12.24 13.39.35 <--> 04.12.24 14.00.27 jdlog://8817411370661/

Last edited by DevLA; 05.12.2024 at 22:04. Reason: Retracted dummy cookies
  #8  
05.12.2024, 12:39
pspzockerscene (Community Manager)

Quote:
Originally Posted by DevLA
It seems like JDownloader may not be updating with the new cookies I’ve been adding when modifying the LinkCrawler rules.
Please add more details here.
I am able to easily change the cookies inside already added rules.
What do you mean?

Quote:
Originally Posted by DevLA
Here is the linkcrawler rules and here is the linkcrawler log
Code
Your logs indeed show that you are running into Cloudflare.
More information about Cloudflare:
https://board.jdownloader.org/showthread.php?t=83712
If you can't solve this, you won't be able to make this work.
Did you use any kind of proxy/VPN? If so: Turn it off and try again.

Cloudflare may check the cookies against the source User-Agent they were generated with.
Try this:
1. Obtain the User-Agent value of the browser you are using, for example via "whatmyuseragent.com".
2. Take that value and put it into all of the rules you have via:
Code:
"headers": [["User-Agent", "Your User-Agent value here"]]
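
Step 2 can also be scripted. The following Python sketch, offered only as an illustration, adds the same User-Agent header to every rule in a rule list. The function name `add_user_agent` and the shortened sample rules are hypothetical; only the `headers` format (a list of name/value pairs) is taken from the snippet above.

```python
import json


def add_user_agent(rules, user_agent):
    """Return a copy of the rules with a User-Agent header set on each one."""
    updated = []
    for rule in rules:
        rule = dict(rule)  # shallow copy so the input rule is not modified
        # Keep any other headers, but drop an existing User-Agent entry.
        headers = [h for h in (rule.get("headers") or []) if h[0].lower() != "user-agent"]
        headers.append(["User-Agent", user_agent])
        rule["headers"] = headers
        updated.append(rule)
    return updated


# Hypothetical, shortened rules; real rules carry more fields.
rules = [
    {"name": "Direct File Links", "rule": "DIRECTHTTP", "headers": None},
    {"name": "Private Links", "rule": "DEEPDECRYPT", "headers": [["Accept", "*/*"]]},
]

patched = add_user_agent(rules, "Your User-Agent value here")
print(json.dumps(patched, indent=2))
```

The printed JSON can then be pasted back into the LinkCrawler rules setting, after replacing the placeholder with the browser's real User-Agent string.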
I cannot provide further help since I am not running into the Cloudflare issue; it is working fine here.

Lastly some information about your new rules:
1. I highly recommend not posting any cookie values publicly in our forums.
Cookies are potentially sensitive data which only you should handle.

2. About your rule with id "1716000458779":
The pattern is wrong - it needs to be absolute (so "http..." and the host need to be in it).

3. About your rule with id "1716000458778":
Your pattern may work but it is still wrong.
You need to escape the dots. Go ahead and google "dots and regular expressions", ask any AI model like ChatGPT about it, or take a closer look at regex101.com to understand what I mean.
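
The point about escaping dots can be checked quickly. In the sketch below, both patterns and the look-alike host name are hypothetical (the rule's real pattern is not visible in this thread). Python's `re` is used for convenience; the dot behaves the same way in Java regular expressions.

```python
import re

# Hypothetical patterns for a mod page URL; only the dots differ.
unescaped = r"https?://www.xivmodarchive.com/modid/\d+"
escaped = r"https?://www\.xivmodarchive\.com/modid/\d+"

real_url = "https://www.xivmodarchive.com/modid/122789"
# Made-up look-alike host: the dots are replaced by other characters.
lookalike = "https://wwwXxivmodarchive-com/modid/122789"

# An unescaped dot matches any single character, so both URLs match.
print(bool(re.fullmatch(unescaped, real_url)))   # True
print(bool(re.fullmatch(unescaped, lookalike)))  # True

# An escaped dot matches only a literal ".", so the look-alike is rejected.
print(bool(re.fullmatch(escaped, real_url)))     # True
print(bool(re.fullmatch(escaped, lookalike)))    # False
```

This is why a pattern with unescaped dots "may work" on the intended URLs while still being wrong: it also accepts URLs it was never meant to match.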

...then again even perfectly working rules would not fix your Cloudflare problem.
  #9  
07.12.2024, 05:23
DevLA

@pspzockerscene

After two days of frustration, I finally managed to get past one part of Cloudflare and was able to at least get the file to show up in JDownloader. However, when I tried to actually download the file, Cloudflare blocked it. So, progress is being made.

I initially thought my cookie format was incorrect, but it turns out that wasn't the issue.

The first JSON is what finally worked. I added the User-Agent as you suggested, but when I added it before, it still resulted in being blocked. For some reason, when I moved the header to the top near "updateCookies," that’s when I was finally able to bypass Cloudflare and add the download to JDownloader. I then tried duplicating that setup and adding it to the second part of the code. I also moved the DIRECTHTTP rule to the top. So far, I’ve tested putting the cookies in both rules, but nothing has changed.

Once I manage to get through this, I can continue tweaking the rules. I've been using Regex101 to better understand what you meant, and I can see where you're coming from now. I also wasn’t using the Java 8 flavor until later on, so I’ll be sure to look into it more.


Code:
[
  {
    "cookies": [
      ["cf_clearance", ""],
      ["connect.sid", ""]
    ],
    "updateCookies": false,
    "headers": [
      ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"]
    ],
    "logging": true,
    "maxDecryptDepth": 0,
    "name": "example first rule in list of rules",
    "pattern": "**External links are only visible to Support Staff**,
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "formPattern": null,
    "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">",
    "rewriteReplaceWith": null
  },
  {
    "enabled": true,
    "logging": false,
    "maxDecryptDepth": 1,
    "name": "example second rule in list of rules",
    "pattern": "https:\/\/www\\.xivmodarchive\\.com\\/private\\/[a-f0-9\\-]+\\/files\\/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$",
    "rule": "DIRECTHTTP"
  }
]

Version 2

Code:
[
  {
    "cookies": [
      ["cf_clearance", ""],
      ["connect.sid", ""]
    ],
    "enabled": true,
    "updateCookies": false,
    "headers": [
      ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"]
    ],
    "logging": true,
    "maxDecryptDepth": 0,
    "name": "example second rule in list of rules",
    "pattern": "https:\/\/www\\.xivmodarchive\\.com\\/private\\/[a-f0-9\\-]+\\/files\\/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$",
    "deepPattern": null,
    "rule": "DIRECTHTTP"
  },
  {
    "cookies": [
      ["cf_clearance", ""],
      ["connect.sid", ""]
    ],
    "enabled": true,
    "updateCookies": false,
    "headers": [
      ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"]
    ],
    "logging": true,
    "maxDecryptDepth": 0,
    "name": "example first rule in list of rules",
    "pattern": "**External links are only visible to Support Staff**,
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "formPattern": null,
    "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">",
    "rewriteReplaceWith": null
  }
]
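
Since the `pattern` value passes through two layers of escaping (JSON first, then regex), it can help to decode it and test it outside JDownloader. This is a minimal sketch under stated assumptions: the sample URLs are invented, and Python's `re` is used in place of JDownloader's Java regex engine, which treats these particular escapes the same way.

```python
import json
import re

# The DIRECTHTTP pattern exactly as written in the JSON above.
rule_json = r'''
{
  "pattern": "https:\/\/www\\.xivmodarchive\\.com\\/private\\/[a-f0-9\\-]+\\/files\\/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$",
  "rule": "DIRECTHTTP"
}
'''

# Layer 1: JSON decoding turns "\/" into "/" and "\\" into a single backslash.
pattern = json.loads(rule_json)["pattern"]
print(pattern)

# Layer 2: the decoded string is the actual regular expression.
# Hypothetical download URLs, one with and one without "www.".
with_www = "https://www.xivmodarchive.com/private/0a1b2c3d-4e5f/files/example_mod.zip"
without_www = "https://xivmodarchive.com/private/0a1b2c3d-4e5f/files/example_mod.zip"

print(bool(re.search(pattern, with_www)))     # True
print(bool(re.search(pattern, without_www)))  # False
```

As written, the pattern only accepts the "www." host. Whether the site ever serves download links without "www." is not known from this thread, so this is only something to keep in mind when a link is unexpectedly not picked up.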
  #10  
07.12.2024, 14:46
Jiaz (JD Manager)

@DevLA: Thanks for the detailed feedback.
I will have to check whether the DirectHTTP/generic HTTP plugin makes use of the headers from rules. I can't tell for sure whether they are used or not. I will check on Monday and respond here.
  #11  
09.12.2024, 13:15
pspzockerscene (Community Manager)
Fixed.
Indeed, the plugin was not setting the headers, which means that they were used for crawling but not for downloading.

@DevLA
Update your JDownloader and try again.
  #12  
11.12.2024, 11:01
DevLA

@Jiaz & @pspzockerscene
Thank you for addressing the matter. The update successfully resolved the issue, and I can now crawl the domain without any problems. The changes ensure that links are found and downloaded without being blocked. I truly appreciate the time and effort you dedicated to assisting me with this!
  #13  
11.12.2024, 11:19
pspzockerscene (Community Manager)

Thanks to you as well for not giving up and actually finding a bug.
All times are GMT +2.