JDownloader Community - Appwork GmbH
 

  #1  
Old 26.04.2023, 09:33
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default How do I make a Linkgrabber Filter rule only apply to a specific website?

Hello. I have a Linkgrabber rule that filters by File Type and Downloadurl. Without needing to enable/disable it, is there a way to make the filter apply only when the links I'm pasting and crawling (via deep link analysis/deep decrypt) are from a specific website?

Thank you!
  #2  
Old 26.04.2023, 11:55
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

Quote:
Originally Posted by DukeM View Post
only apply the filter if the links I'm pasting and crawling (via deep link analysis/deep decrypt)
use 'Link Origin' and select the sources you want to check, eg 'Paste Links Action'
Quote:
Originally Posted by DukeM View Post
from a specific website?
use 'Sourceurl(s)' condition for this.
  #3  
Old 26.04.2023, 14:06
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by notice View Post
use 'Link Origin' and select the sources you want to check, eg 'Paste Links Action'

use 'Sourceurl(s)' condition for this.
Thanks! But whenever I add something to the Sourceurl part, it starts adding in files from other sources, which defeats the purpose of the filter.

I realised I don't need the File Type filter if the Downloadurl condition works anyway, so I just updated the Linkgrabber filter and now it only has this condition:

Downloadurl > containsnot > hostlinkiwant.com

And it perfectly filters the thing.

If I add the domain of the links I'm pasting to Sourceurl, it starts grabbing links from other websites as well.

Last edited by DukeM; 26.04.2023 at 14:11.
  #4  
Old 26.04.2023, 14:10
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

@DukeM
see here https://board.jdownloader.org/showpo...2&postcount=11 to learn the difference between sourceURL and downloadURL
  #5  
Old 26.04.2023, 14:10
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

Quote:
Originally Posted by DukeM View Post
Thanks! But whenever I add something to the Sourceurl part, it starts adding in other files from other sources which defeats the purpose of the filter.
Can you provide an example link and the rule (e.g. a screenshot)?

Please note that rules are processed from top to bottom and that the conditions are AND-connected.
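Purely as an illustration of the evaluation model described above (hypothetical rule/link structures, not JDownloader's actual internals or API), the top-to-bottom, AND-connected behaviour can be sketched like this:

```python
# Illustration only: hypothetical rule/link dictionaries, not JD's real API.

def link_matches(rule, link):
    # All conditions of a rule are AND-connected: every one must hold.
    for field, op, value in rule["conditions"]:
        text = link[field]
        if op == "contains" and value not in text:
            return False
        if op == "containsnot" and value in text:
            return False
    return True

def first_matching_rule(rules, link):
    # Rules are processed from top to bottom; the first match applies.
    for rule in rules:
        if rule["enabled"] and link_matches(rule, link):
            return rule["name"]
    return None

rules = [
    {"name": "drop foreign hosts", "enabled": True,
     "conditions": [("downloadurl", "containsnot", "hostlinkiwant.com")]},
]
print(first_matching_rule(rules, {"downloadurl": "https://other.example/f.mp4"}))
# -> 'drop foreign hosts' (this link matches the filter and would be removed)
```

A link whose Downloadurl does contain hostlinkiwant.com fails the 'containsnot' condition, matches no rule, and survives the filter.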
  #6  
Old 26.04.2023, 14:34
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by notice View Post
Can you provide an example link and the rule (e.g. a screenshot)?

Please note that rules are processed from top to bottom and that the conditions are AND-connected.
Example links:
Spoiler:
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**
**External links are only visible to Support Staff**


The links I want Linkgrabber to grab are from **External links are only visible to Support Staff**.

Here's a screenshot of the rule: **External links are only visible to Support Staff**

Currently, I think it would work for me if I disable Sourceurl because, as far as I can tell, I only use the right-click > Add New Links function for this purpose. But just in case I have other sources in the future, it would be nice to know how to resolve it now anyway. Thank you!


Edit: After reading the link you gave about Downloadurl vs Sourceurl, I think I get it now. But if that's the case, I also think it makes it impossible to filter this specific website.

Last edited by DukeM; 26.04.2023 at 14:37.
  #7  
Old 26.04.2023, 14:53
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

@DukeM:
1.) You need to be more specific, as many resources contain cdninstagram.com in their URL; better use a 'containsnot' condition with
Code:
https?://[^\.]*\.cdninstagram\.com/
and enable the regex checkbox
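For a quick sanity check outside of JD, that pattern (with the dots escaped so they only match literal dots) can be tested against a couple of invented sample URLs:

```python
import re

# Dots are escaped so they match literal dots; both URLs below are
# invented examples for illustration only.
cdn = re.compile(r"https?://[^.]*\.cdninstagram\.com/")

print(bool(cdn.match("https://scontent.cdninstagram.com/v/abc.jpg")))  # True
print(bool(cdn.match("https://www.instagram.com/p/abc/")))             # False
```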
2.) You can make use of LinkCrawler rules to teach/tell JDownloader exactly which links you're interested in, see
https://support.jdownloader.org/Know...kcrawler-rules
https://support.jdownloader.org/Know...deepdecrypt/22
That way JDownloader doesn't process all found links but only those matching your pattern.
  #8  
Old 26.04.2023, 14:54
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

Quote:
Originally Posted by DukeM View Post
Edit: After reading the link you gave about Downloadurl vs Sourceurl, I think I get it now. But I also think that makes it impossible to filter this specific website then if that's the case.
I don't see any reason why it should not work. I cannot test because of Cloudflare, but with more specific conditions the rule should work just fine.
  #9  
Old 26.04.2023, 15:11
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by notice View Post
@DukeM:
1.) you must be more specific as many resources contain cdninstagram.com
In the example links, there are 29 links from cdninstagram.com and they are all what I want to get. Thanks though!

Quote:
Originally Posted by notice View Post
2.) You can make use of LinkCrawler rules to teach/tell JDownloader exactly which links you're interested in, see
**External links are only visible to Support Staff**...
**External links are only visible to Support Staff**...
That way JDownloader doesn't process all found links but only those matching your pattern.
I actually played around with Linkcrawler rules yesterday before I decided to make a post today. I can't figure it out but then again, I've always had a hard time with RegEx no matter how much I try.

Here's the linkcrawler rule I made if you want to take a look: **External links are only visible to Support Staff**

Do note though, I just tried to follow the format for the link pattern in the rules that were already there.
  #10  
Old 26.04.2023, 15:18
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by notice View Post
I don't see any reason why it should not work. I cannot test because of Cloudflare, but with more specific conditions the rule should work just fine.
Ah, because, like in your example there with reddit.com, if I add **External links are only visible to Support Staff** in the Sourceurl, it will have to process every link it finds within that URL. And since the page includes the post description, where users can add whatever outbound links they want (Facebook, YouTube, Spotify, etc.), JD2 will have to process those as well if they exist. If I wanted to filter them too, I would have to create individual rules for each of those websites, and the possibilities are unlimited.
  #11  
Old 26.04.2023, 16:01
pspzockerscene
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

Here is an example rule that will get the job done:
Code:
[
  {
    "enabled": true,
    "logging": false,
    "maxDecryptDepth": 1,
    "name": "example rule grab single images from picnob.com",
    "pattern": "https?://(?:www\\.)?picnob\\.com/post/\\d+/",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "deepPattern": "class=\"pic\">\\s*<a href=\"(**External links are only visible to Support Staff**]+)"
  }
]
On pastebin:
pastebin.com/raw/51nps1yT
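The URL portion of the rule's "deepPattern" is redacted above. Purely to illustrate how such a pattern pulls hrefs out of the page HTML, here is a sketch with a hypothetical ([^"]+) standing in for the redacted part and an invented HTML snippet shaped like the structure the rule targets:

```python
import re

# Hypothetical stand-in: the real deepPattern's URL part is redacted,
# so ([^"]+) is used here for illustration only.
deep = re.compile(r'class="pic">\s*<a href="([^"]+)')

# Invented HTML snippet, not taken from the real website.
html = '''
<div class="pic">
  <a href="https://example.invalid/photo1.jpg">photo</a>
</div>
<div class="pic"><a href="https://example.invalid/photo2.jpg">photo</a></div>
'''

print(deep.findall(html))
# -> ['https://example.invalid/photo1.jpg', 'https://example.invalid/photo2.jpg']
```

Each capture group match becomes one link that JD would then process further, which is why only URLs inside the pic containers end up in the Linkgrabber.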
__________________
JD Supporter, Plugin Dev. & Community Manager

Erste Schritte & Tutorials || JDownloader 2 Setup Download
Spoiler:

A user's JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
  #12  
Old 26.04.2023, 16:50
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

Quote:
Originally Posted by DukeM View Post
Ah, because, like in your example there with reddit.com, if I add **External links are only visible to Support Staff** in the Sourceurl, it will have to process every link it finds within that URL. And since the page includes the post description, where users can add whatever outbound links they want (Facebook, YouTube, Spotify, etc.), JD2 will have to process those as well if they exist. If I wanted to filter them too, I would have to create individual rules for each of those websites, and the possibilities are unlimited.
Understood. Yes, of course the current sourceURL approach has its limitations, and for this use case it's recommended to use LinkCrawler rules instead, as they give you fine control over which parts of the website to crawl/process further; see pspzockerscene's post.

Please don't hesitate to ask if you need help or have questions about LinkCrawler, Packagizer, or filter rules.
  #13  
Old 26.04.2023, 19:33
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Hi @psp!
Wow, thank you! Looking at how you wrote even just the URL pattern, I still don't get how but I'll keep trying. Haha.

A follow up if you don't mind, for the deepPattern bit, how can I make it so it also grabs the video files? I tried to copy paste another rule and just stupidly changed pic to vid but obviously it didn't work.

@notice
Yeah, it's understandable though; my use case is unique enough that it would be a bit unreasonable to expect it to work specifically for it. Haha. But thank you for helping. You're as awesome as psp and Jiaz!
  #14  
Old 26.04.2023, 20:12
pspzockerscene
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

Quote:
Originally Posted by DukeM View Post
Wow, thank you! Looking at how you wrote even just the URL pattern, I still don't get how but I'll keep trying. Haha.
See regex101.com and/or search Google/YouTube for "regular expressions tutorial"; that is most of what's needed to create such rules.

Quote:
Originally Posted by DukeM View Post
A follow up if you don't mind, for the deepPattern bit, how can I make it so it also grabs the video files? I tried to copy paste another rule and just stupidly changed pic to vid but obviously it didn't work.
It won't work like this!
It's like in math class: every website/challenge is different, and you need to understand the topic in order to create your own rules:
1. Look at the HTML of the website you're trying to crawl from -> press CTRL + U while on a website to view its HTML code.
You need to find out where the URL(s) you want are.
2. Create a working RegEx/"filter" which will only find what you want.
Use tools like regex101.com to learn and test regular expressions.
3. When you're sure that it should work, put it in a LinkCrawler rule and test it in JD.
Testing it in JD before testing your regular expressions doesn't make sense, as you can't see where the problem/mistake is if it doesn't work right away.

Here is an updated rule (I've changed the "deepPattern") which will also pick up videos from picnob.com:
Code:
[
  {
    "enabled": true,
    "logging": false,
    "maxDecryptDepth": 1,
    "name": "example rule grab single images from picnob.com",
    "pattern": "https?://(?:www\\.)?picnob\\.com/post/\\d+/",
    "rule": "DEEPDECRYPT",
    "packageNamePattern": null,
    "passwordPattern": null,
    "deepPattern": "class=\"(?:pic|down)\">\\s*<a[^>]*href=\"(**External links are only visible to Support Staff**]+)"
  }
]
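To see what the changed "deepPattern" does, here is a sketch (again with a hypothetical ([^"]+) in place of the redacted URL part, and invented HTML): the (?:pic|down) alternation now matches the download/video container as well as the image one, and <a[^>]*href= tolerates extra attributes before href.

```python
import re

# Hypothetical ([^"]+) stands in for the redacted URL part of the rule.
pat = re.compile(r'class="(?:pic|down)">\s*<a[^>]*href="([^"]+)')

# Invented snippet: one image container and one download/video container.
html = '''
<div class="pic"><a href="https://example.invalid/image.jpg">img</a></div>
<div class="down"><a target="_blank" href="https://example.invalid/video.mp4">video</a></div>
'''

print(pat.findall(html))
# -> ['https://example.invalid/image.jpg', 'https://example.invalid/video.mp4']
```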
  #15  
Old 27.04.2023, 06:58
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by pspzockerscene View Post
See regex101.com and/or google/youtube-search for "Regular expressions tutorial" that is what's mostly needed to create such rules.
Thanks! I honestly tried to learn it on several occasions, but my brain just doesn't work that way. Haha. My most recent attempt was when I was writing a custom script for a bot called YAGPDB. When I was finished, I had probably 50 lines of code; then a friend saw it and rewrote the whole thing in just 5 lines. I tried comparing the two to at least spot a pattern and help me figure it out, but I still can't.

I'll keep trying though whenever I get the chance. RegEx is pretty useful even in simple tasks like renaming things.

Quote:
Originally Posted by pspzockerscene View Post
It won't work like this!
It's like in math class: every website/challenge is different, and you need to understand the topic in order to create your own rules:
1. Look at the HTML of the website you're trying to crawl from -> press CTRL + U while on a website to view its HTML code.
You need to find out where the URL(s) you want are.
2. Create a working RegEx/"filter" which will only find what you want.
Use tools like regex101.com to learn and test regular expressions.
3. When you're sure that it should work, put it in a LinkCrawler rule and test it in JD.
Testing it in JD before testing your regular expressions doesn't make sense, as you can't see where the problem/mistake is if it doesn't work right away.
Hahaha thanks! Yeah, I can at least see now that the 'class' in deepPattern refers to the HTML div class. I thought it was about File Type. I still have a ways to go, but thanks for the explanation, it'll help me understand it better.

And thanks for the LinkCrawler rule update! It looks like it's still missing a few video links (2 offline, 1 video, and 1 mirror), but I think this will be a great opportunity to figure it out myself. I'll keep at it. Again, thank you so much!
  #16  
Old 27.04.2023, 13:49
notice
JD Supporter
 
Join Date: Mar 2023
Posts: 505
Default

@DukeM: In case you need help, have questions, or just want some hints, please just ask.
  #17  
Old 27.04.2023, 16:23
pspzockerscene
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

Quote:
Originally Posted by DukeM View Post
Thanks! I honestly tried on several occasions to learn/understand but my brain just doesn't work.
No worries.
All I wanted to say is basically: If you need rules for another 50 websites, I'm out.

Quote:
Originally Posted by DukeM View Post
RegEx is pretty useful even in simple tasks like renaming things.
Exactly. And filtering stuff.

Quote:
Originally Posted by DukeM View Post
Hahaha thanks! Yeah, I can at least see now the class in deepPattern was referring to the html div class. I thought it was about File Type. Still have a ways to go but thanks for the explanation, it'll help me understand it better.
You can easily copy the regular expressions of that rule into regex101.com and paste the target website's HTML code into the bottom field; then you can see live how the regex works.

Quote:
Originally Posted by DukeM View Post
And thanks for the Linkcrawler rule update! Looks like it's still somehow missing a few video links (2 offline, 1 video, and 1 mirror)
Please provide example URLs if you can't make it work on your own.
  #18  
Old 29.04.2023, 14:55
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Quote:
Originally Posted by pspzockerscene View Post
No worries.
All I wanted to say is basically: If you need rules for another 50 websites, I'm out.
Hahaha. Thankfully, I'm not entitled to make such an obscene request.

Quote:
Originally Posted by pspzockerscene View Post
You can easily copy the regular expressions of that rule into regex101.com and copy the target-websites' html code in the bottom field then you can "see live how the regex is working".


Please provide example URLs if you can't make it work on your own.
Thank you so much for regex101. I'm still nowhere close to understanding even 1% of it, but at least it feels like I'm getting there. Thank you!
  #19  
Old 08.05.2023, 19:23
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Hi again. Hope it's okay if I just post here and not make another thread.

Do you have any idea why JD2 suddenly stopped crawling this website? I haven't changed anything since the last time I failed at updating the Linkgrabber rule, which was a few days ago. Lol.

Anyway, I noticed this problem once last week, but it was solved by restarting JD2. Now I've tried the same trick, as well as restarting my PC, to no avail.

Here's a log of a test I did:
08.05.23 16.18.20 <--> 08.05.23 16.19.39 jdlog://4525311370661/

Thank you!
  #20  
Old 08.05.2023, 20:15
pspzockerscene
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

Quote:
Originally Posted by DukeM View Post
Hi again. Hope it's okay if I just post here and not make another thread.
Sure - that keeps the context here.

Quote:
Originally Posted by DukeM View Post
Do you have any idea why JD2 suddenly stopped crawling this website? I haven't changed anything since the last time I failed updating the linkgrabber rule which was a few days ago. Lol.
It's actually not trivial: website admins can change their websites at any point in time, which can make previously created rules or even plugins useless.

Quote:
Originally Posted by DukeM View Post
Anyway, I noticed this problem happen once last week but it was solved by restarting JD2. Now, I've tried the same trick as well as restarting my PC to no avail.
That rule is still working fine here.
According to your logs, your rule is failing because of Cloudflare.
More information:

The issue you've reported has been caused by our current Cloudflare issues.
Please read the first post of the linked thread and post in that thread if you have further questions!
In case you've posted a new thread, it will either get merged with the linked Cloudflare thread or we will lock it.
Please post Cloudflare-related questions in the above linked thread only!
Attention: The first post of that linked thread contains useful information for you AND website owners, including the hint that website owners can solve this issue on their side, which would also make our plugin work again!
There is no ETA for a Cloudflare fix from our side, so it might be faster to ask the website admins! Please read the first post of the linked thread completely and consider sending it to the admins of the website you are having issues with!

-psp-

Last edited by pspzockerscene; 08.05.2023 at 20:16. Reason: Added more information.
  #21  
Old 08.05.2023, 20:25
DukeM
JD Adviser
 
Join Date: Sep 2019
Posts: 113
Default

Ah, thank you for the explanation and clarification. I did notice Cloudflare checks are taking longer than usual for me today. Thanks again!
  #22  
Old 08.05.2023, 20:35
pspzockerscene
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

No worries.

We actually implemented detection for Cloudflare and similar protections, but there is no way yet to provide feedback to the user when it happens, e.g. during execution of a LinkCrawler rule.

You will see it though if it happens during a download attempt or during a login-process.