JDownloader Community - Appwork GmbH
 

  #1  
23.09.2020, 02:06
wanko (JD VIP)
Join Date: Aug 2015
Posts: 300
Link Crawler for 4archive

**External links are only visible to Support Staff**

Most of the available links look like https://i.imgur.com/xxxx.jpg

I'm trying it with a rule, but it doesn't seem to work:

[
{
"enabled": true,
"cookies": null,
"updateCookies": true,
"maxDecryptDepth": 0,
"id": 1547771712492,
"name": "Test crawl all imgur urls from 4archive",
"pattern": "https?://4archive\\.org\/board\/[a-z]\/thread\/[0-9]+/.+",
"rule": "DEEPDECRYPT",
"packageNamePattern": null,
"passwordPattern": null,
"formPattern": null,
"deepPattern": "https?://i\\.imgur\\.com/.+",
"rewriteReplaceWith": null
}
]

What I want is to crawl all imgur links inside the thread and put them in one folder,
stored by path or named by title, e.g.:
/board/tv/thread/87855690/cgi-from-1993-is-still-the-best-we-have-ever-seen-how

Last edited by wanko; 23.09.2020 at 02:41.
  #2  
23.09.2020, 15:25
pspzockerscene (Community Manager)
Join Date: Mar 2009
Location: Deutschland
Posts: 70,906

Your regular expression is wrong - you do not have to escape slashes for LinkCrawler Rules.
Here is a simple example rule:
Code:
[ {
  "enabled" : true,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "4archive.org example rule grab all imgur.com URLs",
  "pattern" : "https?://4archive\\.org/board/tv/thread/.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://i\\.imgur\\.com/[A-Za-z0-9]+\\.jpg)",
  "rewriteReplaceWith" : null
} ]
Plaintext to get around our forum protocol censoring:
pastebin.com/aiKE9rHG
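
To see how the `pattern` from post #1 and the one from the example rule behave, the sketch below decodes both the way a JSON parser would and tries them on a thread URL. It is only an approximation: Python's `re` stands in for the Java regex engine, the full URL is assembled from the path quoted in post #1 (the host is an assumption), and how JDownloader applies the pattern internally is not shown here.

```python
import json
import re

# Thread URL assembled from the path quoted in post #1 (host assumed to be 4archive.org).
URL = ("https://4archive.org/board/tv/thread/87855690/"
       "cgi-from-1993-is-still-the-best-we-have-ever-seen-how")

# The two "pattern" values exactly as they appear in the JSON rules.
ORIGINAL_JSON = r'"https?://4archive\\.org\/board\/[a-z]\/thread\/[0-9]+/.+"'
EXAMPLE_JSON = r'"https?://4archive\\.org/board/tv/thread/.+"'

original = json.loads(ORIGINAL_JSON)
example = json.loads(EXAMPLE_JSON)

# JSON decodes "\/" to a plain "/", so the escaped slashes are simply unnecessary.
print(original)

# "[a-z]" matches exactly one character, so the two-letter board "tv" does not fit.
original_matches = re.fullmatch(original, URL) is not None
example_matches = re.fullmatch(example, URL) is not None
print(original_matches, example_matches)
```

In this sketch the original pattern fails on the single-character class `[a-z]`, while the example pattern matches.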

-psp-
EDIT

Regarding download-path:
You cannot set download paths via LinkCrawler Rules.
You can try to use the packagizer for that.

Regarding package name:

You could e.g. try this:
Code:
"packageNamePattern" : "<title>(.*?)</title>"
--> This would simply set the title of the page as packagename.
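
As a rough illustration of what that pattern captures, the Python sketch below runs it against a made-up HTML snippet. The sample page and its title are assumptions, Python's `re` only approximates the Java regex engine, and the sketch assumes the first capture group is what ends up as the package name.

```python
import re

# Hypothetical page source; the real 4archive markup may differ.
SAMPLE_HTML = """<html><head>
<title>/tv/ - CGI from 1993 is still the best we have ever seen</title>
</head><body>...</body></html>"""

PACKAGE_NAME_PATTERN = r"<title>(.*?)</title>"

match = re.search(PACKAGE_NAME_PATTERN, SAMPLE_HTML)
# Group 1 is the text between the <title> tags.
package_name = match.group(1) if match else None
print(package_name)
```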
__________________
JD Supporter, Plugin Dev. & Community Manager

Getting Started & Tutorials || JDownloader 2 Setup Download
Spoiler:

A user's JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz
Do you have Nero installed?

Last edited by pspzockerscene; 23.09.2020 at 15:27.
  #3  
23.09.2020, 19:50
wanko (JD VIP)
Join Date: Aug 2015
Posts: 300

The imgur links won't work because the extension (mostly jpg, then png and gif) is missing from the URL: **External links are only visible to Support Staff**

I checked the imgur links on imgur.com, and they don't have an extension either.

**External links are only visible to Support Staff**
Code:
[{
  "enabled" : true,
  "cookies" : [ [ "__cfduid", "df930d6c3e7ce358a4124684fc0d9dc691600878986" ] ],
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1600879440267,
  "name" : "4archive.org example rule grab all imgur.com URLs",
  "pattern" : "https?://4archive\\.org/board/[A-Za-z0-9]+/thread/.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : "<title>(.*?)</title>",
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://i\\.imgur\\.com/[A-Za-z0-9]+\\.+)",
  "rewriteReplaceWith" : null
}]
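
A quick way to check what this `deepPattern` captures is to run it on a sample snippet. In the Python sketch below, the HTML fragment is made up and Python's `re` stands in for the Java regex engine. Since `\.+` means "one or more literal dots", each match stops right after the dot and the extension is dropped.

```python
import re

# Made-up page fragment with three direct imgur links.
SAMPLE_HTML = (
    '<a href="https://i.imgur.com/AbC123.jpg">1</a> '
    '<a href="https://i.imgur.com/xYz789.png">2</a> '
    '<a href="https://i.imgur.com/Qwe456.gif">3</a>'
)

# deepPattern from the rule above, after JSON decoding ("\\." becomes "\.").
DEEP_PATTERN = r"(https?://i\.imgur\.com/[A-Za-z0-9]+\.+)"

found = re.findall(DEEP_PATTERN, SAMPLE_HTML)
for url in found:
    print(url)  # each URL ends at the dot, without jpg/png/gif
```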


Thank you

Last edited by wanko; 23.09.2020 at 20:43.
  #4  
24.09.2020, 01:30
pspzockerscene (Community Manager)
Join Date: Mar 2009
Location: Deutschland
Posts: 70,906

Hi again,
1. Here is another version which will:
- Allow 4archive URLs with other two-letter board tags and not only "tv"
- Find all kinds of imgur.com URLs
Please keep in mind that while we do help with creating such rules, it is not our job to teach you how regular expressions work.
You can use online tools such as regex101.com to easily play around with and test your regular expressions!

New rule:
Code:
[ {
  "enabled" : true,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "4archive.org example rule grab all imgur.com URLs",
  "pattern" : "https?://4archive\\.org/board/[a-z]{2}/thread/.+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(https?://([a-z0-9]+\\.)?imgur\\.com/[A-Za-z0-9]+[^<>\"\\']+)",
  "rewriteReplaceWith" : null
} ]
New rule as plaintext:
pastebin.com/nJq7tjkh
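
The behaviour of the new rule can be checked offline with a small Python sketch. Everything in the sample data is made up (the HTML fragment, the image IDs, and the one-letter board URL), and Python's `re` is only a stand-in for the Java regex engine, so take the result as an approximation.

```python
import re

# Both patterns from the new rule, after JSON decoding.
PATTERN = r"https?://4archive\.org/board/[a-z]{2}/thread/.+"
DEEP_PATTERN = r"""(https?://([a-z0-9]+\.)?imgur\.com/[A-Za-z0-9]+[^<>"\']+)"""

# Made-up page fragment: direct images with and without subdomain, plus an album link.
SAMPLE_HTML = (
    '<a href="https://i.imgur.com/AbC123.jpg">jpg</a> '
    '<a href="https://imgur.com/a/XyZ789">album</a> '
    '<a href="http://i.imgur.com/Qwe456.png">png</a>'
)

# Group 1 is the whole URL; group 2 is only the optional subdomain.
urls = [m.group(1) for m in re.finditer(DEEP_PATTERN, SAMPLE_HTML)]
for url in urls:
    print(url)

# "[a-z]{2}" accepts exactly two lowercase letters as the board name.
tv_ok = re.fullmatch(PATTERN, "https://4archive.org/board/tv/thread/87855690/x") is not None
# Hypothetical one-letter board, used only to show the limit of "{2}".
g_ok = re.fullmatch(PATTERN, "https://4archive.org/board/g/thread/1/x") is not None
print(tv_ok, g_ok)
```

If boards with names of other lengths need to be covered, `[a-z]{2}` could be loosened to something like `[a-z0-9]+`; that is a suggestion, not part of the rule above.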

2. If you are experiencing issues with imgur.com pictures getting displayed as offline, you will have to wait until the following ticket gets resolved:

As a workaround attempt, you can try adding your imgur.com account to JDownloader, but there is no guarantee that this will help!

-psp-
  #5  
25.09.2020, 01:42
wanko (JD VIP)
Join Date: Aug 2015
Posts: 300

Sorry about that, and thank you for the new rule.
imgur is working now.
  #6  
25.09.2020, 15:44
pspzockerscene (Community Manager)
Join Date: Mar 2009
Location: Deutschland
Posts: 70,906

Thanks for your feedback.

We're still working on imgur, so in case you're experiencing issues, you know why.

-psp-
All times are GMT +2.