#1
[LinkCrawler Rules] How to download multiple things from xivmodarchive.com
Request for Guidance: Downloading Preview Images and Mod URLs with JDownloader 2
I am seeking assistance in using JDownloader 2 to capture preview images and mod URLs from **External links are only visible to Support Staff**. The site requires a Discord login to access its full functionality and uses Cloudflare protection. Would using cookies help bypass these restrictions?

Below are examples of the URLs I am trying to download, categorized as mod pages, preview images, and mod files:

Mod Pages:
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**

Preview Images:
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**

Mod Files:
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**

External Links (some mods redirect to external platforms, such as Google Drive or Mega.nz):
**External links are only visible to Support Staff**

Author Pages and Search Results:
Author page example: **External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**

How can I set up JDownloader 2 to efficiently grab all of the above content, especially given the site's login and Cloudflare protection? Basically, I want each URL in its own package with its own content, like other websites have it in JDownloader 2. Is a plugin or a LinkCrawler rule needed? Any guidance or workarounds would be greatly appreciated.
#2
Hi,
Docs:
https://support.jdownloader.org/know...kcrawler-rules
Here is the rough way:

1. Create a rule of type DIRECTHTTP for the "mod files" links:
https://support.jdownloader.org/know...ple-directhttp
Test it. (A rough sketch of such a rule is included at the end of this post.)

2. To make JD automatically find the "preview images" and/or also the "mod files" links from the "mod pages", you need another rule, this time of type DEEPDECRYPT:
https://support.jdownloader.org/know...le-deepdecrypt

3. About your Google Drive link: JD supports Google Drive, so that will work out of the box.

4. About "Author Pages and Search Results": Either create more rules (one rule of type DEEPDECRYPT could do the job) which find the "/modid/" links inside user/search links, or use the following semi-automatic way:
https://support.jdownloader.org/know...orted-websites

You can find a lot of example rules for other websites in our forums and some basic examples in our knowledgebase.
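A minimal sketch of what such a DIRECTHTTP rule could look like, assuming the "mod files" links follow the /private/<id>/files/ structure that appears later in this thread; the pattern and the rule name are illustrative placeholders, not a confirmed configuration:
Code:
[
  {
    "enabled": true,
    "logging": true,
    "name": "xivmodarchive mod files (sketch)",
    "rule": "DIRECTHTTP",
    "pattern": "https?://(?:www\\.)?xivmodarchive\\.com/private/[a-f0-9\\-]+/files/.+"
  }
]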
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
#3
@DevLA: In case you're stuck, need help, or have questions, please just ask.
__________________
JD-Dev & Server-Admin
#4
So, I tried setting up a LinkCrawler rule. However, when I run it, nothing happens. I checked the LinkCrawler log, and it says it's blocked by Cloudflare. If I try to copy and paste a direct URL link into JDownloader, it will pick it up. But when I attempt to download it, it says it's blocked by Cloudflare.
Here is the linkcrawler rule and setup. Code:
[ { "cookies" : [ [ "cf_clearance", "4MAk312EtgkDrAM9_wO9LUW7mrOHBuk4FARou.u.0GY-1733267423-1.2.1.1-yBtXi8zE3bIyFYfydW6PIU030FpBewhM0d2Rk3HD7DtoOVlN1qGbZJs_X2hpgev0UGl9IY4tX4iW9osSa7f3IOxpBNE1vDfLuobnCn1IX7AlK6yZoU4FaMq9idJ2kbIJbr18skGQHsrPzLiNoWNAXgffggggfgeMDsVkzeU5XFGzR1UuDyMh0Z5YEnHLcNYBcWEfP2xzpsgSgiGKd8wzLnqDy_cKm_Anm37fLK61xzmfIUEsPYK_E6I.zGDOq4KyQEaKze6BKnoXCUmFOCSDGrA_NZ6wQayl.G7oqiTGfTpCCKWr0CLYARmn11oCnf5KSxHm4HWZg60ZqumiDvJlf.N7W.rgPqQ7dcMraBuKCzNHvArRu2h8ddEXvWx.WaOc0Ic276cp8UU7RFqbxKFkkPYT.8_GrITQVubHAkFPRe1PZ8gKe7HAtoDkAJvtgSVTgTrDFl2m2S5h7A" ], [ "connect.sid", "s%3ASdILioTgsApfJlw3_J8hrr5TSnLQFsJfggffggfggA7Z2XgDGCT%2B8XFXSIZEnt8rCElG1A" ] ], "deepPattern" : "<a href=\"/private/([a-f0-9\\-]+)\" id=\"mod-download-link\">", "formPattern" : null, "headers" : null, "id" : 1716000458778, "maxDecryptDepth" : 0, "name" : null, "packageNamePattern" : null, "passwordPattern" : null, "pattern" : "**External links are only visible to Support Staff**, "propertyPatterns" : null, "rewriteReplaceWith" : null, "rule" : "DEEPDECRYPT", "enabled" : true, "logging" : true, "updateCookies" : false } ] Last edited by DevLA; 04.12.2024 at 01:31. |
#5
@DevLA
I cannot reproduce the Cloudflare problem. Also, the deepPattern of your rule is wrong, so it will not find any results.

You should also create that DIRECTHTTP rule for the "mod files" links first because if you don't, only those "/private" links that contain known file extensions (e.g. .zip) will be detected. However, for basic testing it is fine to create the DEEPDECRYPT rule first.

Here is a version of your rule with the deepPattern fixed:
Code:
[ { "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">", "formPattern": null, "headers": null, "id": 1716000458778, "maxDecryptDepth": 0, "name": null, "packageNamePattern": null, "pattern": "**External links are only visible to Support Staff**, "propertyPatterns": null, "rewriteReplaceWith": null, "rule": "DEEPDECRYPT", "enabled": true, "logging": true, "updateCookies": false } pastebin.com/raw/tF3QwU96 Example used for testing: xivmodarchive.com/modid/122789 Results: Finds one .zip link
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
#6
@DevLA: Maybe add an additional LinkCrawler rule that also provides cookies for the download URLs? It must be of type DIRECTHTTP and match the pattern of the download URLs. That may work.
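A minimal sketch of what such a cookie-carrying DIRECTHTTP rule could look like; the pattern is assumed from the download links discussed in this thread, and the cookie names (cf_clearance, connect.sid) plus the placeholder values are examples that need to be replaced with your own browser cookies:
Code:
[
  {
    "enabled": true,
    "logging": true,
    "name": "xivmodarchive downloads with cookies (sketch)",
    "rule": "DIRECTHTTP",
    "pattern": "https?://(?:www\\.)?xivmodarchive\\.com/private/[a-f0-9\\-]+/files/.+",
    "cookies": [
      ["cf_clearance", "YOUR_CF_CLEARANCE_VALUE"],
      ["connect.sid", "YOUR_CONNECT_SID_VALUE"]
    ],
    "updateCookies": false
  }
]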
__________________
JD-Dev & Server-Admin
#7
@pspzockerscene
Thank you for the fix! I’m still relatively new to link crawling and have limited experience with regex and wildcards, so I greatly appreciate learning from the resources and guidance provided by you and the team. Your support has been incredibly helpful as I navigate this process.

I would like to test the different patterns and configurations, but I’m currently encountering persistent Cloudflare-related issues. It seems like JDownloader may not be updating with the new cookies I’ve been adding when modifying the LinkCrawler rules. This makes it difficult to determine whether my changes are effective or not.

@Jiaz Thank you for your suggestion; I’ve incorporated it into the rules to see if it resolves the issue. However, I’m still facing the same problems, and I’m uncertain whether the issue lies with my JDownloader setup or with the format of my LinkCrawler rules.

Here are the LinkCrawler rules, followed by the LinkCrawler log:
Code:
[ { "cookies": [ [""], ["connect.sid", ""] ], "deepPattern": null, "formPattern": null, "headers": null, "id": 1716000458779, "maxDecryptDepth": 0, "name": "Direct File Links", "packageNamePattern": null, "passwordPattern": null, "pattern": "/private/[a-f0-9\\-]+/files/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$", "propertyPatterns": null, "rewriteReplaceWith": null, "rule": "DIRECTHTTP", "enabled": true, "logging": true, "updateCookies": false }, { "cookies": [ [""], ["connect.sid", ""] ], "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">", "formPattern": null, "headers": null, "id": 1716000458778, "maxDecryptDepth": 0, "name": "Private Links", "packageNamePattern": null, "passwordPattern": null, "pattern": "**External links are only visible to Support Staff**, "propertyPatterns": null, "rewriteReplaceWith": null, "rule": "DEEPDECRYPT", "enabled": true, "logging": true, "updateCookies": false } ] 04.12.24 13.39.35 <--> 04.12.24 14.00.27 jdlog://8817411370661/ Last edited by DevLA; 05.12.2024 at 22:04. Reason: Retracted dummy cookies |
#8
I'm able to easily change the cookies inside already-added rules. What do you mean?
More information about Cloudflare: https://board.jdownloader.org/showthread.php?t=83712
If you can't solve this, you won't be able to make this work.

Did you use any kind of proxy/VPN? If so, turn it off and try again.

Cloudflare may check the cookies against the source User-Agent they were generated with. Try this:
1. Obtain the User-Agent value of the browser you are using, for example via "whatmyuseragent.com".
2. Take that value and put it into all of the rules you have via:
Code:
"headers": [["User-Agent", "Your User-Agent value here"]] Lastly some information about your new rules: 1. I highly recommend not to pust any cookie values publicly in our forums. Cookies are potentially sensitive data which only you should handle. 2. About your rule with id "1716000458779": The pattern is wrong - it needs to be absolute (so "http..." and the host need to be in it). 3. About your rule with id "1716000458778": Your pattern may work but it is still wrong. You need to escape the dots. Go ahead and google "dots and regular expressions", ask any AI model like ChatGPT about it or take a closer look in regex101.com to understand what I mean. ...then again even perfectly working rules would not fix your Cloudflare problem.
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
#9
@pspzockerscene
After two days of frustration, I finally managed to get past one part of Cloudflare and was able to at least get the file to show up in JDownloader. However, when I tried to actually download the file, Cloudflare blocked it. So, progress is being made.

I initially thought my cookie format was incorrect, but it turns out that wasn't the issue. The first JSON is what finally worked. I added the User-Agent as you suggested, but when I added it before, it still resulted in being blocked. For some reason, when I moved the header to the top near "updateCookies", that’s when I was finally able to bypass Cloudflare and add the download to JDownloader.

I then tried duplicating that setup and adding it to the second part of the code. I also moved the DIRECTHTTP rule to the top. So far, I’ve tested putting the cookies in both rules, but nothing has changed. Once I manage to get through this, I can continue tweaking the rules.

I've been using regex101 to better understand what you meant, and I can see where you're coming from now. I also wasn’t using the Java 8 flavor until later on, so I’ll be sure to look into it more.
Code:
[ { "cookies": [ ["cf_clearance", ""], ["connect.sid", ""] ], "updateCookies": false, "headers": [ ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"] ], "logging": true, "maxDecryptDepth": 0, "name": "example first rule in list of rules", "pattern": "**External links are only visible to Support Staff**, "rule": "DEEPDECRYPT", "packageNamePattern": null, "passwordPattern": null, "formPattern": null, "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">", "rewriteReplaceWith": null }, { "enabled": true, "logging": false, "maxDecryptDepth": 1, "name": "example second rule in list of rules", "pattern": "https:\/\/www\\.xivmodarchive\\.com\\/private\\/[a-f0-9\\-]+\\/files\\/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$", "rule": "DIRECTHTTP" } ] Version 2 Code:
[ { "cookies": [ ["cf_clearance", ""], ["connect.sid", ""] ], "enabled": true, "updateCookies": false, "headers": [ ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"] ], "logging": true, "maxDecryptDepth": 0, "name": "example second rule in list of rules", "pattern": "https:\/\/www\\.xivmodarchive\\.com\\/private\\/[a-f0-9\\-]+\\/files\\/.*\\.(zip|rar|7z|ttmp2|pmp|ttmp)$", "deeppattern": null, "rule": "DIRECTHTTP" }, { "cookies": [ ["cf_clearance", ""], ["connect.sid", ""] ], "enabled": true, "updateCookies": false, "headers": [ ["User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"] ], "logging": true, "maxDecryptDepth": 0, "name": "example first rule in list of rules", "pattern": "**External links are only visible to Support Staff**, "rule": "DEEPDECRYPT", "packageNamePattern": null, "passwordPattern": null, "formPattern": null, "deepPattern": "<a href=\"(/private/[^\"]+)\" id=\"mod-download-link\">", "rewriteReplaceWith": null } ] |
#10
@DevLA: Thanks for the detailed feedback!
I will have to check whether the DirectHTTP/generic HTTP plugin makes use of the headers of rules. I can't tell for sure if they are used or not. I will check on Monday and respond here.
__________________
JD-Dev & Server-Admin
#11
Fixed.
Indeed, the plugin was not setting the headers, which means that, in the end, they were used for crawling but not for downloading.

@DevLA Update your JDownloader and try again.
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
#12
@Jiaz & @pspzockerscene
Thank you for addressing the matter. The update successfully resolved the issue, and I can now crawl the domain without any problems. The changes ensure that links are found and downloaded without being blocked. I truly appreciate the time and effort you dedicated to assisting me with this!
#13
Also thanks to you for not giving up and actually finding a bug
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download