JDownloader Community - Appwork GmbH
 

  #1  
Old 19.06.2020, 23:29
anderson02 anderson02 is offline
Modem User
 
Join Date: Jun 2020
Posts: 4
Question: Grab multiple URLs from the main source?

Hi all! I'm new here.

I'm wondering if there is a way to set a condition that grabs all the URLs from a main URL and then filters them to get my final image packages. Example:

main url: webpageA.com, which contains
secondary urls: webpageB1.com/img1.jpg and webpageB2.com/img2.jpg

I would like to know if I can create a condition to download all the images (*.jpg) from the secondary pages webpageB*.com/*.jpg just by giving the LinkGrabber the main URL webpageA.com.

I have been playing with the Packagizer and/or LinkGrabber Filter, and they work amazingly in other situations, but I failed in this one.

Thanks in advance,
Stay safe. :thumbup:

Last edited by anderson02; 20.06.2020 at 02:26.
  #2  
Old 20.06.2020, 05:22
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,165
Default

You will need to look at creating either a
linkcrawler rule (Advanced Settings) in JSON; there is help and there are numerous examples on the forum
or a
decrypter plugin in Java; a thousand or so examples exist in the SVN.

raztoki
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #3  
Old 22.06.2020, 04:47
anderson02 anderson02 is offline
Modem User
 
Join Date: Jun 2020
Posts: 4
Default

Thanks for your response,

I was unable to get LinkCrawler Rules running correctly. I saw a few examples of them, and I'm confident that what I need can be done with LinkCrawler rules.

I didn't find any videos on the internet regarding LinkCrawler. I found one of your posts (board-jdownloader-org/showthread.php?t=77280), but it has no example of how to use the expressions and/or operators.

"deepPattern" : "class=\"mp4\"><a href=\"([^\"]+)\""
"deepPattern" : "<img id=\"image-\\d+\" data-src=\" (http[^<>\"]+)\"",
"deepPattern" : "(https?://[A-Za-z0-9]+\\.cloudfront\\.net/[^\"]+/P0\\.jpg)",
"deepPattern" : "window\\.open\\('(http[^<>\\'\"]+)'\\)",

1) How can I trigger LinkCrawler after inserting the code in Settings/Advanced Settings/LinkCrawler: Link Crawler Rules? Do I need to do something else? I'm definitely missing something.

2) Can I use LinkCrawler and Packagizer + LinkGrabber Filter at the same time? Is it possible to have all the URLs from the main page source in LinkGrabber, and then use Packagizer + Filter to get what I need?

3) In LinkCrawler, if the problem is how to get all URLs from the main source, would it be something like this?

[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : false,
  "logging" : false,
  "maxDecryptDepth" : 100,
  "name" : "grab_all_links",
  "pattern" : "mywebsite_plus_expresion_and/or_ operators",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : null,
  "rewriteReplaceWith" : null
} ]


If you have any video and/or tutorial references, they would be helpful for us.

Thanks for all.

Last edited by anderson02; 22.06.2020 at 04:51.
  #4  
Old 22.06.2020, 09:55
tony2long tony2long is offline
English Supporter
 
Join Date: Jun 2009
Posts: 6,372
Default

If you provide a real example link, we can test a solution for you.
__________________
FAQ: How to upload a Log
  #5  
Old 22.06.2020, 10:06
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,165
Default

re1) Nothing more, but I would advise that you copy and paste your rule in. Note that String values have to be JSON-encoded within quotation marks. For instance, a regex needs its normal escaping for patterns, and then you need to escape those escapes for JSON (so basically you require double escaping).

re2) You can use a combination of all 3, but you won't need the LinkGrabber filter if you create a deepPattern that finds just the content you want instead of returning all supported content.

re3) Yes, that will work, but see answer 2 for the reasons you shouldn't do that. Also leave out everything other than what you need; the rest is auto-generated and given appropriate defaults. I recommend using name, pattern, deepPattern and packageNamePattern.
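The double escaping from re1 can be produced mechanically instead of by hand. The following is only a sketch in Python, not part of JDownloader: write the regex once with its normal escaping and let a JSON encoder add the second layer. The site name and the rule name are made-up placeholders, and the field selection simply follows the recommendation in re3.

```python
import json

# Regexes exactly as the regex engine should see them (single escaping).
pattern = r"https?://www\.example\.com/item/\d+\.html"  # hypothetical site
deep_pattern = r'<img alt="" src="([^"]*?)"'

# Minimal rule with only the recommended fields; "example_images" is a made-up name.
rule = [{
    "name": "example_images",
    "rule": "DEEPDECRYPT",
    "pattern": pattern,
    "deepPattern": deep_pattern,
    "packageNamePattern": r"<title>(.+?)</title>",
}]

# json.dumps adds the JSON layer of escaping: \. becomes \\. and " becomes \".
rule_json = json.dumps(rule, indent=2)
print(rule_json)

# Round trip: decoding the JSON returns the original single-escaped regex.
decoded = json.loads(rule_json)[0]
print(decoded["pattern"] == pattern)  # True
```

The printed text is what would be pasted into the Link Crawler Rules setting. If a hand-written rule does not survive a `json.loads` round trip like this, the escaping is the first thing to check.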

There aren't any videos, but there are many examples on the forum regarding link crawler rules.

google: linkcrawler rules site:board.jdownloader.org

Note: if you use Chrome, 'view source' reformats the source code (I have seen it strip whitespace) o_O. I experienced this for the first time last night. You will need to base your expressions on the real source code.
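The whitespace point can be shown with a small sketch. It uses Python's `re` module as a stand-in for the Java regex engine, and the HTML snippet is invented for illustration. A pattern copied from a tidied-up view assumes a single space between attributes, while the raw source may contain line breaks or extra spaces. Using `\s+` is one possible way to tolerate that, not something the posts above prescribe.

```python
import re

# Hypothetical raw source as sent by the server (extra spaces and a line break).
raw_source = '<a  class="mp4"\n   href="http://example.com/video.mp4">link</a>'

# Pattern written against a reformatted view: assumes exactly one space.
strict = re.compile(r'<a class="mp4" href="([^"]+)"')
# Pattern that accepts any run of whitespace between the attributes.
tolerant = re.compile(r'<a\s+class="mp4"\s+href="([^"]+)"')

strict_match = strict.search(raw_source)
tolerant_match = tolerant.search(raw_source)

print(strict_match)             # None
print(tolerant_match.group(1))  # http://example.com/video.mp4
```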

raztoki
  #6  
Old 23.06.2020, 00:03
anderson02 anderson02 is offline
Modem User
 
Join Date: Jun 2020
Posts: 4
Default

Thanks @tony2long

This is a real example where LinkGrabber can't get the content of the secondary URLs, even with Deep Link Analyze. Please let me know if there is an option in Settings/Advanced Settings that lets JD grab deep URLs from the main source, or what the JSON would look like in this case. I appreciate the help.

(main page pattern = **External links are only visible to Support Staff**)
(secondary page pattern = **External links are only visible to Support Staff**)

Example#1
**External links are only visible to Support Staff** (15 lines of hidden links)

Example#2
**External links are only visible to Support Staff** (12 lines of hidden links)

Example#3
**External links are only visible to Support Staff** (4 lines of hidden links)

Last edited by anderson02; 23.06.2020 at 00:07.
  #7  
Old 23.06.2020, 09:36
tony2long tony2long is offline
English Supporter
 
Join Date: Jun 2009
Posts: 6,372
Default

I tried with this rule:
Code:
[ {
  "logging" : true,
  "rule" : "DEEPDECRYPT",
  "maxDecryptDepth" : 1,
  "pattern" : "https?://www\\.aliexpress\\.com/item/\\d+\\.html",
  "packageNamePattern" : "<title>(.+?)</title>",
  "deepPattern" : "<img alt=\"\" src=\"([^\"]*?)\""
} ]
The links were picked up and the page was fetched; the response was
HTTP/1.1 200 OK
Content-Type: text/html;charset=UTF-8
but there was no output at all. I don't know why; maybe raztoki can explain.
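To see what the rule above would extract when a page body does arrive, the patterns can be tried outside JDownloader. This is only a sketch: Python's `re` stands in for the Java regex engine, the item number and the sample HTML are invented, and matching the whole URL with `fullmatch` is an assumption about how the rule's "pattern" is applied.

```python
import json
import re

# The rule from the post above, kept as JSON text (raw string keeps the backslashes).
rule_text = r'''
[ {
  "rule" : "DEEPDECRYPT",
  "maxDecryptDepth" : 1,
  "pattern" : "https?://www\\.aliexpress\\.com/item/\\d+\\.html",
  "packageNamePattern" : "<title>(.+?)</title>",
  "deepPattern" : "<img alt=\"\" src=\"([^\"]*?)\""
} ]
'''
rule = json.loads(rule_text)[0]

# Hypothetical page source, and an empty body like the one reported above.
sample_html = '<html><title>Demo item</title><img alt="" src="https://example.com/a.jpg"></html>'
empty_html = ""

def apply_rule(rule, url, html):
    """Return (package name, links) if the URL matches the rule, else None."""
    if not re.fullmatch(rule["pattern"], url):  # assumption: whole-URL match
        return None
    title = re.search(rule["packageNamePattern"], html)
    links = re.findall(rule["deepPattern"], html)
    return (title.group(1) if title else None, links)

url = "https://www.aliexpress.com/item/123456.html"  # made-up item number
print(apply_rule(rule, url, sample_html))  # ('Demo item', ['https://example.com/a.jpg'])
print(apply_rule(rule, url, empty_html))   # (None, [])
```

With an empty body, neither pattern has anything to match, which is consistent with a rule that triggers on the URL but yields no links.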
  #8  
Old 24.06.2020, 04:06
anderson02 anderson02 is offline
Modem User
 
Join Date: Jun 2020
Posts: 4
Default

Hello @raztoki, sorry to bother you again, but @tony2long's code doesn't show any output. Could you give us any idea on this?

[ {
  "logging" : true,
  "rule" : "DEEPDECRYPT",
  "maxDecryptDepth" : 1,
  "pattern" : "https?://www\\.aliexpress\\.com/item/\\d+\\.html",
  "packageNamePattern" : "<title>(.+?)</title>",
  "deepPattern" : "<img alt=\"\" src=\"([^\"]*?)\""
} ]

Thanks in advance, guys, if it's possible to solve this issue.
  #9  
Old 24.06.2020, 05:21
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,165
Default

That won't work. If you check the source code, the images are not present within the example URL. Inspect/dev tools > Network > find the image > Initiator shows that the larger images are requested from JavaScript.

If you look for the image id HTB1EB9jN9zqK1RjSZPxq6A4tVXa1 from your first example URL, you can't see it present anywhere (Network tab search) other than in the requested image itself. So is it constructed via JS?
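The check described above (is the image id anywhere in the page source?) can be sketched in a few lines. Both HTML strings below are invented for illustration and `find_image_urls` is a hypothetical helper; only the image id is taken from the post. A deepPattern can only find what is literally in the fetched source, so an id that appears only after scripts run is out of reach.

```python
import re

image_id = "HTB1EB9jN9zqK1RjSZPxq6A4tVXa1"  # id quoted in the post above

# Hypothetical static source: no image URL, a script builds it at runtime.
static_source = '<html><body><div id="gallery"></div><script src="/app.js"></script></body></html>'
# Hypothetical source where the image URL is written directly into the HTML.
inline_source = f'<html><body><img src="https://example.com/{image_id}.jpg"></body></html>'

def find_image_urls(html, image_id):
    """Return every double-quoted http(s) URL in the HTML that contains the id."""
    return re.findall(r'"(https?://[^"]*' + re.escape(image_id) + r'[^"]*)"', html)

static_hits = find_image_urls(static_source, image_id)
inline_hits = find_image_urls(inline_source, image_id)

print(static_hits)  # []
print(inline_hits)  # ['https://example.com/HTB1EB9jN9zqK1RjSZPxq6A4tVXa1.jpg']
```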

Last edited by raztoki; 24.06.2020 at 05:32.
  #10  
Old 24.06.2020, 07:22
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,165
Default

This was requested in the past, and Jiaz stated the same outcome I found: https://board.jdownloader.org/showth...ght=aliexpress
  #11  
Old 24.06.2020, 08:47
tony2long tony2long is offline
English Supporter
 
Join Date: Jun 2009
Posts: 6,372
Default

@raztoki,
The problem is that there is no output at all, not even <html> ... </html>, as if Content-Length were 0, but there is no "Content-Length" header.
  #12  
Old 24.06.2020, 12:51
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,165
Default

I must admit I didn't test that, tony2long, sorry. Maybe Jiaz can have a look into your findings.
All times are GMT +2. The time now is 12:11.