#1
[LinkCrawler Rule] [Request] Plugin for nsfw.xxx
Not sure if this website uses the auto-scrolling behavior that causes issues for JDownloader, but the LinkGrabber only picks up 15 or so images, and only as small thumbnails rather than full-size images/videos, along with a bunch of random UI elements.
It would be great if JDownloader could automatically crawl and download all high-quality images and videos on a page (and 'gallery' photos as well, if possible).
Sample links:
- **External links are only visible to Support Staff** (563 images, 1 video)
- **External links are only visible to Support Staff** (206 images, no videos)
- **External links are only visible to Support Staff** (24 images, 318 videos)
- **External links are only visible to Support Staff** (16 regular images, 1 video, 1 'gallery' with 4 images)
- The 'gallery' included in the previous sample (4 images): **External links are only visible to Support Staff**
If any more info is needed or this post violates the rules somehow, please let me know!
#2
Hi,
I don't see us adding a plugin for this website. To crawl e.g. all images of single posts, you can use custom LinkCrawler Rules. Let us know if you need help with that. -psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
First Steps & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 20.01.2022 at 17:59. Reason: Added missing LinkCrawler rules support article hyperlink
#3
Hi psp, thanks for your response. Where should I even begin with implementing this? :S Any help would be appreciated!
I think the key things I'm trying to achieve are:
1) Handle the 'autoscroll' so that JDownloader doesn't only see the first few results
2) Download the image/video that is magnified when clicking on the link (instead of the small thumbnail shown on the page)
#4
Hi again,
Sorry, I forgot to hyperlink our support article in my last post (just fixed that). Here it is: https://support.jdownloader.org/Know...kcrawler-rules
Quote:
Use an external browser addon or script for this, e.g. scroll all the way down, then use the addon "Link Gopher" to get all URLs.
Quote:
I'm still not sure I understand this website. Where does the content come from? It seems like a lot of it is just from reddit.com? That's kinda strange, but nice if you only look at the technical aspect: we already have plugins for reddit. If all of that content links to reddit, the easiest way might be to find those "original" reddit URLs; the rest will be done automatically by our reddit crawler... -psp-
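If the pages do link back to the original reddit posts (this is not confirmed in the thread; psp only suggests it might be the case), one way to collect those URLs from a saved page's HTML is a regex over reddit's post-permalink format `/r/<subreddit>/comments/<post-id>`. The Python sketch below is illustrative only: `find_reddit_post_urls` is a hypothetical helper, and the resulting URLs would then be pasted into JDownloader so its reddit crawler can take over.

```python
import re
from typing import List

# Matches reddit post permalinks such as
#   https://www.reddit.com/r/<subreddit>/comments/<post-id>/...
# and keeps only the part up to the post ID.
# Assumption: the pages actually contain such links, which is unverified.
REDDIT_POST_RE = re.compile(
    r"https?://(?:www\.|old\.)?reddit\.com/r/[A-Za-z0-9_]+/comments/[A-Za-z0-9]+"
)


def find_reddit_post_urls(html: str) -> List[str]:
    """Return unique reddit post URLs found in html, in first-seen order."""
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for match in REDDIT_POST_RE.finditer(html):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical page snippet for illustration.
    page = (
        '<a href="https://www.reddit.com/r/pics/comments/abc123/some_title/">a</a>'
        '<a href="https://www.reddit.com/r/pics/comments/abc123/">b</a>'
    )
    for url in find_reddit_post_urls(page):
        print(url)
```

Trimming each match at the post ID means the same post linked with different trailing slugs is only collected once.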
#5
Thanks again for your response!
As far as I understand how this site works, it basically acts as an aggregator and archiver of nsfw content on reddit. Other than its image-focused UI, the main features that make it work better than normal reddit browsing (and, likewise, JD's reddit crawler) are:
- It already filters out the huge amount of duplicates that some posters tend to spam/cross-post across numerous subreddits or re-post over time
- It filters out non-nsfw content that users mix in with their posts
- It archives content that might have been deleted and lost (by the original poster, by subreddit moderators, etc.)
I've also just tried to use the "Link Gopher" Chrome extension you mentioned to copy-paste the list of relevant URLs (all those beginning with
Quote:
I'd appreciate any pointers you can give on how to better automate such a process.
#6
Quote:
- That website is sorted better
- Some of it may already be down on reddit
Quote:
https://support.jdownloader.org/Know...kcrawler-rules
I've created a very basic example for you which will process those "/post/" URLs and grab all images. It will still find unwanted stuff such as thumbnails, but you can tweak it further to avoid this. Please keep in mind that I didn't test it with a lot of content, so it might not find all expected results for all URLs.
Rule:
Code:
[
  {
    "enabled": true,
    "updateCookies": true,
    "logging": false,
    "maxDecryptDepth": 1,
    "name": "example rule for nsfw.xxx pictures in single posts",
    "pattern": "**External links are only visible to Support Staff**",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": "<title>(.*?)</title>",
    "deepPattern": "<img src=\"(https?://[^\"]+)"
  }
]
pastebin.com/raw/TZPnEjSF -psp-
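To see what the rule's two regexes actually capture, here is a rough Python imitation that applies `deepPattern` and `packageNamePattern` to a made-up HTML snippet. It is not JDownloader's implementation: it assumes, as the parentheses in the rule suggest, that JD takes the first capture group as the URL, and `apply_rule`, `SAMPLE_HTML` and the example.com URLs are hypothetical. JD uses Java's regex engine; these particular patterns should behave the same in Python's `re`.

```python
import re
from typing import List, Optional, Tuple

# The two regexes from the rule above, with the JSON string escaping removed.
DEEP_PATTERN = re.compile(r'<img src="(https?://[^"]+)')
PACKAGE_NAME_PATTERN = re.compile(r"<title>(.*?)</title>")


def apply_rule(html: str) -> Tuple[Optional[str], List[str]]:
    """Rough imitation of the rule: (package name, URLs captured by group 1)."""
    title = PACKAGE_NAME_PATTERN.search(html)
    urls = [m.group(1) for m in DEEP_PATTERN.finditer(html)]
    return (title.group(1) if title else None), urls


# Hypothetical page HTML; the real site's markup may look different.
SAMPLE_HTML = """<html><head><title>Example post</title></head><body>
<img src="https://example.com/media/full/1.jpg">
<img src="https://example.com/media/thumbnail/1.jpg">
<img class="logo" src="https://example.com/ui/logo.png">
</body></html>"""

if __name__ == "__main__":
    name, urls = apply_rule(SAMPLE_HTML)
    print(name)
    print(urls)
```

Two things the sketch makes visible: the thumbnail URL is captured alongside the full-size image (the "unwanted stuff" mentioned above), and an `<img>` tag whose `src` is not its first attribute is not matched at all, which may be one reason some images could be missed.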
#7
Plugin request for deleted
I just found another thread requesting this site and it appears it won't be supported. Thanks though!
Last edited by DukeM; 13.05.2022 at 12:29. Reason: please delete this thread
#8
Hey, @annodominus! Have you figured out how to download from this site properly?
#9
Quote:
Use the LinkCrawler Rule I provided... Quote:
Which link(s) did you add, what was the expected outcome and what happened instead? -psp-
#10
Hey @psp!
Quote:
I tried the rule you made when I saw this thread earlier, but didn't notice a difference when trying out some links with and without the rule. I tried a /user/ and a /post/ link, changing the one indicated in the rule when appropriate. Tbh, I was too focused on testing out the /user/ links until something annodominus said caught my eye while scrolling this thread to reply to you just now.
So I just tried getting all the proper links (filtering the ones with /post/) with Link Gopher, and after a quick sorting, it does manage to get the direct links of the images! It wasn't obvious to me when I was only testing single /post/ links at first, so I'm sorry for missing that. But thank you!!
If you don't mind a follow-up: I'm not too adept with LinkCrawler rules (or even simple regex), so how do I filter it further so that it won't include links with /thumbnail/ in them? I'm having a hard time separating the two; even my duplicate checker tool misses a few images for some unknown reason.
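The manual step described above (collect every link with Link Gopher, keep only the /post/ ones, then paste them into JDownloader) can also be scripted. A minimal Python sketch, assuming Link Gopher's output has been saved as plain text with one URL per line; `filter_post_urls` and the sample URLs are hypothetical.

```python
from typing import List


def filter_post_urls(link_dump: str) -> List[str]:
    """Keep lines containing '/post/', drop duplicates, keep original order."""
    seen = set()
    result = []
    for line in link_dump.splitlines():
        url = line.strip()
        if "/post/" in url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    # Hypothetical Link Gopher output, one URL per line.
    dump = """https://example.com/post/111
https://example.com/user/someone
https://example.com/post/222
https://example.com/post/111
"""
    # The printed list can be pasted into JDownloader's LinkGrabber.
    print("\n".join(filter_post_urls(dump)))
```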
#11
1. That rule only works for "/post/" links. If it doesn't work for you, please provide more example URLs. Have you used such rules before? Are you sure you've added the rule correctly to your config?
2. Regarding regular expressions, you can use web tools like regex101.com to test/build regular expressions that fit your needs. I will of course provide further help if needed, but I won't *teach* you how to create your own regular expressions. Just as a hint: regular expressions are very powerful and you can do a lot of nice things with them. -psp-
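As one example of such a tweak for the /thumbnail/ question above: a negative lookahead can make `deepPattern` skip any image URL that contains `/thumbnail/`. This is an untested sketch, not a rule confirmed by the JDownloader team; it assumes the unwanted thumbnails really are the only URLs with `/thumbnail/` in their path, and it is checked here with Python's `re`, whose lookahead syntax is the same as Java's. The HTML snippet is hypothetical.

```python
import re

# Original deepPattern plus a negative lookahead: right after "http(s)://",
# refuse to match if the rest of the URL (up to the next double quote)
# contains "/thumbnail/".
NO_THUMBNAIL_PATTERN = re.compile(r'<img src="(https?://(?![^"]*/thumbnail/)[^"]+)')

# Hypothetical markup for illustration only.
html = """<img src="https://example.com/media/full/1.jpg">
<img src="https://example.com/media/thumbnail/1.jpg">
<img src="https://example.com/media/full/2.jpg">"""

if __name__ == "__main__":
    # findall returns the captured group (the URL) for each match.
    print(NO_THUMBNAIL_PATTERN.findall(html))
```

In the rule's JSON, the same regex would need its quotes escaped: `"deepPattern": "<img src=\"(https?://(?![^\"]*/thumbnail/)[^\"]+)"`.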