JDownloader Community - Appwork GmbH
 

  #1  
19.01.2022, 10:49
annodominus
Modem User
Join Date: Aug 2021
Posts: 3
[LinkCrawler Rule] [Request] Plugin for nsfw.xxx

I'm not sure whether this website uses the auto-scrolling (load-on-scroll) behavior that causes issues for JDownloader, but LinkGrabber only finds about 15 images. Those are only small thumbnails, not the full images/videos, and they come with a bunch of random UI elements.

It would be great if JDownloader could automatically crawl through a page and download all high-quality images & videos on it (and 'gallery' photos as well, if possible).

Sample links:
**External links are only visible to Support Staff** (563 images, 1 video)
**External links are only visible to Support Staff** (206 images, no videos)
**External links are only visible to Support Staff** (24 images, 318 videos)
**External links are only visible to Support Staff** (16 regular images, 1 video, 1 'gallery' with 4 images)
Included 'gallery' of above (with 4 images): **External links are only visible to Support Staff**

If any more info is needed or this post violates the rules somehow, please let me know!
  #2  
19.01.2022, 14:06
pspzockerscene
Community Manager
Join Date: Mar 2009
Location: Germany
Posts: 70,922

Hi,
I don't see us adding a plugin for this website.
To crawl e.g. all images of single posts, you can use custom LinkCrawler Rules.
Let us know if you need help with that.

-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager

Getting Started & Tutorials || JDownloader 2 Setup Download
Spoiler:

A user's JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?

Last edited by pspzockerscene; 20.01.2022 at 17:59. Reason: Added missing LinkCrawler rules support article hyperlink
  #3  
19.01.2022, 19:58
annodominus
Modem User
Join Date: Aug 2021
Posts: 3

Hi psp, thanks for your response. Where should I even begin with implementing this? :S Any help would be appreciated!

I think the key things I'm trying to do are:
1) Handle the 'autoscroll' so that JDownloader doesn't only see the first few results
2) Download the image/video that is magnified when clicking on the link (instead of the small thumbnail shown on the page)
  #4  
20.01.2022, 17:59
pspzockerscene
Community Manager
Join Date: Mar 2009
Location: Germany
Posts: 70,922

Hi again,

Quote:
Originally Posted by annodominus
any help would be appreciated!

Sorry, I forgot to hyperlink our support article in the last post (just fixed that). Here it is:
https://support.jdownloader.org/Know...kcrawler-rules

Quote:
Originally Posted by annodominus
1) Handle the 'autoscroll' so that JDownloader doesn't only see the first few results

That's not possible using our simple LinkCrawler Rules.
Use an external browser, add-on or script for this, e.g. scroll all the way down in your browser, then use the add-on "Link Gopher" to collect all URLs.
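
The list copied from Link Gopher will usually contain many URLs besides the single posts. As a minimal sketch, assuming the copied URLs are available as plain text with one URL per line and that single posts have "/post/" in their path (as discussed in this thread), the list could be reduced like this. The host `example.com` and the function name `keep_post_urls` are placeholders made up for illustration; nothing here is part of JDownloader or Link Gopher.

```python
from typing import List
from urllib.parse import urlparse


def keep_post_urls(raw_text: str) -> List[str]:
    """Return unique URLs whose path contains '/post/', in first-seen order."""
    seen = set()
    result = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue
        # Check only the path so a query string cannot cause a false hit.
        if "/post/" not in urlparse(url).path:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    # Hypothetical sample; example.com stands in for the real site.
    sample = """
    https://example.com/post/abc123
    https://example.com/user/someone
    https://example.com/post/abc123
    https://example.com/static/logo.png
    https://example.com/post/def456
    """
    for url in keep_post_urls(sample):
        print(url)
```

The remaining URLs can then be copied into JDownloader's LinkGrabber in one go.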

Quote:
Originally Posted by annodominus
2) Download the image/video that is magnified when clicking on the link (instead of the small thumbnail shown on the page)

We'll get to that later.

I'm still not sure if I understood this website.
Where does the content come from?
It seems like a lot of content is just from reddit.com?
That's kinda strange, but nice if you only look at the technical aspect:
We already have plugins for reddit. If all of that content links to reddit, the easiest way might be to find those "original" reddit URLs; the rest will then be handled automatically by our reddit crawler...

-psp-
  #5  
21.01.2022, 17:40
annodominus
Modem User
Join Date: Aug 2021
Posts: 3

Thanks again for your response!

As far as I understand how this site works, it basically acts as an aggregator and archiver of nsfw content on reddit. Other than its image-focused UI, the main features that make it work better than normal reddit browsing (and likewise JD's crawler for reddit) are:
- It already pre-filters out the huge number of duplicates that some posters tend to spam/cross-post across numerous subreddits or re-post over time
- It pre-filters out non-nsfw content that users mix in with their posts
- It archives content that might have been deleted and lost (by the original poster, by subreddit moderators, etc.)

I've also just tried the "Link Gopher" Chrome extension you mentioned to copy and paste the list of relevant URLs (all those beginning with
Quote:
**External links are only visible to Support Staff**
), but the results (images and videos only) captured by LinkGrabber and downloaded are approx. 3x the expected number of downloads and include a bunch of small thumbnails, random advertisement images, etc.

I'd appreciate any pointers on how to better automate this process.
  #6  
24.01.2022, 17:25
pspzockerscene
Community Manager
Join Date: Mar 2009
Location: Germany
Posts: 70,922

Quote:
Originally Posted by annodominus
As far as I understand how this site works, it basically acts like an aggregator and archiver of nsfw content on reddit. Other than its image-focused UI, its main features that make it work better than normal reddit browsing (and likewise JD's crawler for reddit):

Ahh okay, so I guess it doesn't make any sense to crawl the original reddit links since:
- That website is sorted better
- Some of it may already be down on reddit

Quote:
Originally Posted by annodominus
I'd appreciate any pointers you can give to better automate such a process?

As explained in my first reply, you could use LinkCrawler Rules:
https://support.jdownloader.org/Know...kcrawler-rules

I've created a very basic example for you which processes those "/post/" URLs and grabs all images.
It will still find unwanted stuff such as thumbnails, but you can tweak it further to avoid this.
Please keep in mind that I didn't test it with a lot of content, so it might not find all expected results for all URLs.
Rule:
Code:
[
  {
    "enabled": true,
    "updateCookies": true,
    "logging": false,
    "maxDecryptDepth": 1,
    "name": "example rule for nsfw.xxx pictures in single posts",
    "pattern": "**External links are only visible to Support Staff**",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": "<title>(.*?)</title>",
    "deepPattern": "<img src=\"(https?://[^\"]+)"
  }
]
Rule as plaintext for easier copy & paste:
pastebin.com/raw/TZPnEjSF
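
To see what the rule's "deepPattern" captures, here is a small sketch that applies the same regular expression to a made-up HTML snippet. JDownloader itself is written in Java; Python's `re` module is used here only for illustration, on the assumption that this simple pattern uses syntax common to both regex engines. The HTML, the `cdn.example.com` host and the function name `find_image_urls` are hypothetical and not taken from the real site.

```python
import json
import re

# The "deepPattern" value exactly as it is written inside the JSON rule.
DEEP_PATTERN_JSON = r'"<img src=\"(https?://[^\"]+)"'

# JSON decoding removes the backslash escapes, leaving the actual regex:
# <img src="(https?://[^"]+)
deep_pattern = json.loads(DEEP_PATTERN_JSON)


def find_image_urls(html: str) -> list:
    """Return every URL captured by the rule's deepPattern."""
    return re.findall(deep_pattern, html)


# Hypothetical page source, only for demonstration.
SAMPLE_HTML = """
<div class="post">
  <img src="https://cdn.example.com/full/photo1.jpg" alt="">
  <img src="https://cdn.example.com/thumbnail/photo1.jpg" alt="">
  <img class="lazy" src="https://cdn.example.com/full/photo2.jpg">
  <img src="/static/logo.png">
</div>
"""

if __name__ == "__main__":
    for url in find_image_urls(SAMPLE_HTML):
        print(url)
```

In this sample the pattern captures both the full-size image and its thumbnail, which is why unwanted results still show up. It skips the tag where `src` is not the first attribute as well as the relative URL, so such images would need a different pattern.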

-psp-
  #7  
13.05.2022, 07:38
DukeM
JD Adviser
Join Date: Sep 2019
Posts: 113
Plugin request for deleted

I just found another thread requesting this site and it appears it won't be supported. Thanks though!

Last edited by DukeM; 13.05.2022 at 12:29. Reason: please delete this thread
  #8  
13.05.2022, 12:30
DukeM
JD Adviser
Join Date: Sep 2019
Posts: 113

Hey, @annodominus! Have you figured out how to download from this site properly?
  #9  
13.05.2022, 14:20
pspzockerscene
Community Manager
Join Date: Mar 2009
Location: Germany
Posts: 70,922

Quote:
Originally Posted by DukeM
I just found another thread requesting this site and it appears it won't be supported.

As explained, a plugin is unnecessary.
Use the LinkCrawler Rule I provided...

Quote:
Originally Posted by DukeM
Hey, @annodominus! Have you figured out how to download from this site properly?

What's the problem with the rule I provided?
Which link(s) did you add, what was the expected outcome and what happened instead?

-psp-
  #10  
13.05.2022, 16:01
DukeM
JD Adviser
Join Date: Sep 2019
Posts: 113

Hey @psp!

Quote:
Originally Posted by pspzockerscene
As explained, a plugin is unnecessary.
Use the LinkCrawler Rule I provided...

What's the problem with the rule I provided?
Which link(s) did you add, what was the expected outcome and what happened instead?

I tried the rule you made when I saw this thread earlier, but I didn't notice a difference when trying out some links with and without the rule. I tried a /user/ link and a /post/ link, changing the one indicated in the rule where appropriate.

Tbh, I was too focused on testing out the /user/ links until something annodominus said caught my eye while scrolling through this thread to reply to you just now.

So, I just tried getting all the proper links (filtering for the ones with /post/) with Link Gopher, and after a quick sorting, it does manage to get the direct links of the images! It wasn't obvious to me when I was only testing single /post/ links at first, so I'm sorry for missing that. But thank you!!

If you don't mind a follow-up: I'm not too adept at using LinkCrawler Rules (or even simple regex), but how do I filter the results further so that they won't include links with /thumbnail/ in them? I'm having a hard time separating the two; even my duplicate checker tool misses a few images for some unknown reason.
  #11  
17.05.2022, 01:30
pspzockerscene
Community Manager
Join Date: Mar 2009
Location: Germany
Posts: 70,922

1. That rule only works for "/post/" links.
If it doesn't work for you, please provide more example URLs.
Have you used such rules before?
Are you sure you've added the rule correctly to your config?

2. Regarding regular expressions, you can use web tools like regex101.com for testing/building regular expressions that fit your needs.
I will of course provide further help if needed, but I won't *teach* you how to create your own regular expressions.
Just as a hint:
Regular expressions are very powerful and you can do a lot of nice things with them.
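
As one possible direction for the "/thumbnail/" question, a negative lookahead can reject URLs that contain a given path segment. The sketch below is an assumption-laden illustration and was not tested inside JDownloader. It assumes thumbnails really do have "/thumbnail/" in their URL as described above, uses Python's `re` module instead of JDownloader's Java regex engine (both support this lookahead syntax), and uses the invented names `NO_THUMBNAIL_PATTERN`, `find_full_size_urls` and `as_json_string` plus a hypothetical `cdn.example.com` host.

```python
import json
import re

# Variant of the rule's deepPattern with a negative lookahead added.
# (?![^"]*/thumbnail/) rejects any URL that contains "/thumbnail/"
# before its closing quote.
NO_THUMBNAIL_PATTERN = r'<img src="(https?://(?![^"]*/thumbnail/)[^"]+)'


def find_full_size_urls(html: str) -> list:
    """Return captured image URLs that do not contain '/thumbnail/'."""
    return re.findall(NO_THUMBNAIL_PATTERN, html)


def as_json_string(pattern: str) -> str:
    """Escape the regex so it can be used as a JSON string value."""
    return json.dumps(pattern)


if __name__ == "__main__":
    # Hypothetical page source, only for demonstration.
    sample_html = (
        '<img src="https://cdn.example.com/thumbnail/photo1.jpg">'
        '<img src="https://cdn.example.com/full/photo1.jpg">'
    )
    print(find_full_size_urls(sample_html))
    print(as_json_string(NO_THUMBNAIL_PATTERN))
```

The second printed line is the pattern in JSON-escaped form, which is the form a "deepPattern" value in a LinkCrawler Rule would need. It is worth checking the pattern against real page source on regex101.com before relying on it.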

-psp-
All times are GMT +2.