#1
crawler rule help
HTML Code:
{
  "maxDecryptDepth": 0,
  "name": "nudecosplaygirls.com rule",
  "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/(?!wp)[\\w-]+/",
  "rule": "DEEPDECRYPT",
  "deepPattern": null,
  "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?</title>"
}
2. Some WordPress websites have directory indexing enabled, so how can I download all the content inside each directory while keeping the package structure domain/wp-content/uploads/year/month/?
**External links are only visible to Support Staff**
Thank you
Last edited by wanko; 28.06.2021 at 13:12.
#2
Re 1: Use deepPattern and provide a regex pattern that finds the content you want. Otherwise JD will return all supported URLs.
Re 2: Create a second rule for this URL structure. I would set a max depth to prevent an infinite loop.
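The remark about setting a max depth can be illustrated with a small Python sketch. This is not JDownloader code: the `LINKS` graph and the `crawl` helper are hypothetical, and the sketch only shows how a generic depth cap keeps a crawl from following links forever when pages link back to each other.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# "/a/" and "/b/" link to each other, so a naive crawl could loop.
LINKS = {
    "/": ["/a/", "/b/"],
    "/a/": ["/b/", "/a/img1.jpg"],
    "/b/": ["/a/", "/b/img2.jpg"],
}

def crawl(start, max_depth):
    """Breadth-first crawl that follows links at most max_depth levels deep."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # depth cap reached: do not follow links from this page
        for nxt in LINKS.get(url, []):
            if nxt not in seen:  # the visited set also guards against cycles
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

print(crawl("/", 1))
```

With a cap of 1 only the start page and the pages it links to directly are visited. How JDownloader interprets its own `maxDecryptDepth` value is described in the LinkCrawler rules documentation, not by this sketch.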
__________________
raztoki @ jDownloader reporter/developer http://svn.jdownloader.org/users/170 Don't fight the system, use it to your advantage. :]
#3
I tried this, but it won't work:
"deepPattern": "(class="entry-content"><img src="([^"]+)")",
This is my first time dealing with a deepPattern class.
#4
You will need to escape the " characters within the HTML, otherwise they will break the JSON.
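One way to avoid escaping by hand is to let a JSON encoder do it. The following Python sketch assumes the regex from the previous post (minus the outer group) and a hypothetical rule name; it only demonstrates the escaping step and is not an official way to author rules.

```python
import json

# The regex as you would type it into a tester, with plain double quotes.
regex = r'class="entry-content"><img src="([^"]+)"'

rule = {
    "name": "example rule",  # hypothetical rule name
    "rule": "DEEPDECRYPT",
    "deepPattern": regex,
}

# json.dumps escapes every double quote and backslash inside string values.
encoded = json.dumps(rule, indent=2)
print(encoded)

# Round trip: decoding gives back exactly the regex we started with.
assert json.loads(encoded)["deepPattern"] == regex
```

In the printed output every `"` inside the pattern appears as `\"`, which is the form the rule needs in order to remain valid JSON.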
#5
See https://support.jdownloader.org/Know...kcrawler-rules
__________________
JD-Dev & Server-Admin
#6
{
  "maxDecryptDepth": 0,
  "name": "nudecosplaygirls.com rule",
  "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/[\\w-]+/",
  "rule": "DEEPDECRYPT",
  "deepPattern": "(<\\class='entry-content'\\><img src='([^']+)//)",
  "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?<\/title>"
}
But still nothing is found, weird.
Last edited by wanko; 28.06.2021 at 18:51.
#7
You've escaped some things that you don't have to escape. My attempt is a bit different, but it works:
Code:
[
  {
    "maxDecryptDepth": 0,
    "name": "nudecosplaygirls.com rule",
    "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/[\\w\\-]+/",
    "rule": "DEEPDECRYPT",
    "deepPattern": "class=\"alignnone size-(?:full|medium) wp-image-[0-9]+\" src=\"(**External links are only visible to Support Staff**]+)\"",
    "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?</title>"
  }
]
pastebin.com/csCjj1dM
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
#8
Hmm, it just crawls and stops right after that, with no result.
28.06.21 23.27.47 <--> 28.06.21 23.30.32 jdlog://0314825302851/
HTML Code:
{
  "maxDecryptDepth": 0,
  "name": "nudecosplaygirls.com rule1",
  "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/[\\w\\-]+/",
  "rule": "DEEPDECRYPT",
  "deepPattern": "class="alignnone size-(?:full|medium) wp-image-[0-9]+" src="(**External links are only visible to Support Staff**]+)"",
  "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?</title>"
},
{
  "enabled": true,
  "maxDecryptDepth": 0,
  "name": "nudecosplaygirls.com replace thumbnail URL to full image URL",
  "pattern": "(https?://nudecosplaygirls\\.com/wp-content/uploads/\\d{4}/\\d{2}/.*)(-\\d+x\\d+)\\.jpg",
  "rule": "REWRITE",
  "packageNamePattern": null,
  "passwordPattern": null,
  "formPattern": null,
  "deepPattern": "(https?://nudecosplaygirls\\.com/wp-content/[^"]+\\.jpg)",
  "rewriteReplaceWith": "$1.jpg"
}
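The second rule above is meant to turn a thumbnail URL into the full-size image URL by dropping the `-<width>x<height>` suffix. As a rough illustration of that substitution only, here is a Python sketch. The host `example.com` and the helper `to_full_image` are placeholders, and Python's `re` module is not the regex engine JDownloader uses, so treat it as a way to reason about the pattern rather than as a test of the rule itself.

```python
import re

# Group 1 captures everything before the "-<width>x<height>" size suffix.
THUMB = re.compile(
    r"(https?://example\.com/wp-content/uploads/\d{4}/\d{2}/.*)(-\d+x\d+)\.jpg"
)

def to_full_image(url):
    """Return url with a trailing -WxH size suffix removed, if present."""
    # Python writes the first group as \1; the rule quoted above uses $1.
    return THUMB.sub(r"\1.jpg", url)

print(to_full_image("https://example.com/wp-content/uploads/2021/06/photo-150x150.jpg"))
```

URLs without a size suffix do not match the pattern and are returned unchanged.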
#9
Please first make sure that the first rule is working for you before starting to work on a 2nd rule...
-psp-
#10
I already checked the first rule on its own, but it is not working.
**External links are only visible to Support Staff**
#11
Which URL did you add to test this?
#12
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
#13
They seem to have totally different pages/alignments.
Wider attempt, which doesn't work for all URLs either:
Spoiler:
Code:
[
  {
    "maxDecryptDepth": 0,
    "name": "nudecosplaygirls.com rule",
    "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/[\\w\\-]+/",
    "rule": "DEEPDECRYPT",
    "deepPattern": "(?:class=\"alignnone size-(?:full|medium) wp-image-[0-9]+\" |wp-block-image size-(?:large|medium).*?)src=\"(**External links are only visible to Support Staff**]+)\"",
    "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?</title>"
  }
]
Widest attempt, which just grabs all WordPress .jpg image URLs:
Code:
[
  {
    "maxDecryptDepth": 0,
    "name": "nudecosplaygirls.com rule",
    "pattern": "https?://(?:www\\.)?nudecosplaygirls\\.com/[\\w\\-]+/",
    "rule": "DEEPDECRYPT",
    "deepPattern": "(**External links are only visible to Support Staff**,
    "packageNamePattern": "<title>(.*?)( - nudecosplaygirls)?</title>"
  }
]
pastebin.com/fELqVEHc
-psp-
#14
The rule can be optimized further to grab only the blog post images.
#15
Quote:
**External links are only visible to Support Staff**
https://imgur.com/a/gipcXQY
**External links are only visible to Support Staff** has only one, but it also grabs **External links are only visible to Support Staff**
**External links are only visible to Support Staff** but this doesn't grab any related thumbnails **External links are only visible to Support Staff**
I have filtered \d\dx\d\d and rta/scaled, so it won't grab any resized images or rta.
Last edited by wanko; 28.06.2021 at 21:13.
#16
Yeah as said, you may need to adjust the deepPattern further to fit your needs.
As linked in our LinkCrawler Rules support article, you can e.g. use the tool "regex101.com" to test your filters against the html code of this website. -psp-
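The same kind of check that regex101.com offers can also be done locally. The Python sketch below is only an approximation: the HTML snippet, the URLs and the candidate pattern are made up for illustration, and JDownloader is a Java application, so its regex flavour can differ from Python's in some details.

```python
import re

# Hypothetical HTML standing in for the page source you would paste
# into a regex tester; markup and URLs are invented for this example.
HTML = '''
<div class="entry-content">
<img class="alignnone size-full wp-image-101" src="https://example.com/wp-content/uploads/2021/06/a.jpg">
<img class="alignnone size-medium wp-image-102" src="https://example.com/wp-content/uploads/2021/06/b-300x200.jpg">
<img class="avatar" src="https://example.com/avatar.png">
</div>
'''

# Candidate pattern (an assumption): one capture group around the image URL.
CANDIDATE = r'class="alignnone size-(?:full|medium) wp-image-[0-9]+" src="([^"]+)"'

# With a single capture group, findall returns the captured URLs.
matches = re.findall(CANDIDATE, HTML)
for url in matches:
    print(url)
```

Only the two post images are listed and the avatar is skipped, which is the sort of result to confirm before pasting a pattern into a rule (remembering to escape the quotes for JSON).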
#17
OK, I think that's because some thumbnails have full resolution, so it grabs those thumbnails too.
These two websites have pagination and I don't know how to make it work. In testing, it just crawls and stops, by the way.
1 - **External links are only visible to Support Staff**
[
  {
    "enabled": true,
    "cookies": null,
    "updateCookies": true,
    "maxDecryptDepth": 1,
    "name": "hentai-img",
    "pattern": "https?://[a-z]\\.hentai-img\\.com/image\\.+//",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "formPattern": null,
    "deepPattern": "(**External links are only visible to Support Staff****External links are only visible to Support Staff**[0-12]\\.hentai-img\\.com/upload/\\d{8}/\\d{3}/\\d{6}/p=/[0-9]+\\.(jpeg|jpg|gif))",
    "rewriteReplaceWith": null
  }
]
2 - **External links are only visible to Support Staff**
[
  {
    "enabled": true,
    "cookies": null,
    "updateCookies": true,
    "maxDecryptDepth": 1,
    "name": "jpg4",
    "pattern": "http?://img\\.jpg4\\.biz\\.+//pic\\d+\\.html",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "formPattern": null,
    "deepPattern": "id="img[0-9]+" src="([^"]+)",
    "rewriteReplaceWith": null
  }
]
#18
I'm unable to access this website at this moment due to Cloudflare...
Pagination via LinkCrawler Rules is not that easy and not always possible. You might need to use browser addons to auto-scroll and copy URLs to make this possible. -psp-
#19
OK, never mind then...
Now I'm trying to crawl this website: **External links are only visible to Support Staff**
{
  "enabled": true,
  "updateCookies": true,
  "maxDecryptDepth": 1,
  "name": "amateurfetishist",
  "pattern": "(https?://amateurfetishist\\.com/wp-content/uploads/\\d{4}/\\d{2}/.*)(-\\d+x\\d+)\\.(jpeg|jpg|gif|mp4)",
  "rule": "DEEPDECRYPT",
  "packageNamePattern": null,
  "passwordPattern": null,
  "formPattern": null,
  "rewriteReplaceWith": null
}
But it is not working: **External links are only visible to Support Staff**
I want to crawl all .(jpeg|jpg|gif|mp4) files and keep the same link structure, e.g. domain/wp-content/uploads/year/month/files. The site has scaled and \d\dx\d\d resized images. Many websites have directory indexing enabled, so I just need one crawler rule example. Thank you.
#20
You're doing it wrong!
Your rule and your regular expression are completely wrong! Please read our LinkCrawler docs again. First you need to tell the rule which URLs it is supposed to handle (pattern), and then what to crawl in those URLs (deepPattern). Also, please invest some time in learning how regular expressions work, and test your custom regular expressions using online tools like "regex101.com". In this case the following rule should do the job:
Code:
[
  {
    "enabled": true,
    "logging": false,
    "maxDecryptDepth": 0,
    "name": "amateurfetishist.com example rule",
    "pattern": "^**External links are only visible to Support Staff**,
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "deepPattern": "(F\\d+-\\d+x\\d+\\.(jpeg|jpg|gif|mp4))"
  }
]
pastebin.com/Qv4gQD22
Please keep in mind that while we always try to help our users, we won't provide countless custom LinkCrawler rules for you. You're supposed to create them on your own, so please invest some time in learning how to do this. We've already provided countless LinkCrawler rules for you! Our forum contains a lot of working example rules for all kinds of cases!
-psp-
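The two-step idea described in this post (first `pattern` decides whether a URL is handled, then `deepPattern` decides what is picked out of that page) can be sketched in Python. Everything here is an assumption made for illustration: the `RULE` values, the `example.com` host, the `apply_rule` helper and the use of a full match for `pattern` are not taken from JDownloader's implementation.

```python
import re

# Hypothetical rule: "pattern" selects which URLs are handled,
# "deepPattern" selects what to pick out of the fetched HTML.
RULE = {
    "pattern": r"https?://example\.com/wp-content/uploads/\d{4}/\d{2}/",
    "deepPattern": r'href="([^"]+\.(?:jpeg|jpg|gif|mp4))"',
}

def apply_rule(url, html, rule=RULE):
    """Return the captured links if url matches the rule, else None."""
    if not re.fullmatch(rule["pattern"], url):
        return None  # step 1 failed: the rule does not handle this URL
    # Step 2: collect group 1 of every deepPattern hit in the page source.
    return [m.group(1) for m in re.finditer(rule["deepPattern"], html)]

# Made-up directory index page.
INDEX_HTML = '<a href="a.jpg">a.jpg</a> <a href="clip.mp4">clip.mp4</a> <a href="notes.txt">notes.txt</a>'
print(apply_rule("https://example.com/wp-content/uploads/2021/06/", INDEX_HTML))
```

A rule whose `pattern` describes the files themselves, as in the earlier attempt, never reaches the second step for the index page, because the index URL does not match it.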