#1
Grab multiple URLs from main source?
Hi all! I'm new over here!

I'm wondering if there is a way to set a condition and grab all the URLs from a main URL, then filter them to get my final image packages.

Example: the main URL webpageA.com contains the secondary URLs webpageB1.com/img1.jpg and webpageB2.com/img2.jpg. I would like to know if I can create a condition to download all the images (*.jpg) from the secondary pages (webpageB*.com/*.jpg) just by giving the LinkGrabber the main URL webpageA.com.

I have been playing with the Packagizer and/or the LinkGrabber Filter, and they work amazingly in other situations, but I fail in this one.

Thanks in advance, stay safe. :thumbup:

Last edited by anderson02; 20.06.2020 at 02:26.
#2
You will need to look at creating a linkcrawler rule (Advanced Settings) in JSON; help and numerous examples are on the forum. Alternatively, a decrypter plugin in Java; a thousand or so examples exist on the SVN.

raztoki
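For reference, a minimal DEEPDECRYPT rule sketch using the placeholder sites from the opening post (webpageA.com and webpageB*.com are the poster's examples; the patterns here are illustrative only, not tested against any real site):

Code:
[ {
  "enabled"            : true,
  "name"               : "grab jpgs from webpageA",
  "rule"               : "DEEPDECRYPT",
  "pattern"            : "https?://(?:www\\.)?webpageA\\.com/.*",
  "maxDecryptDepth"    : 1,
  "packageNamePattern" : "<title>([^<]+)</title>",
  "deepPattern"        : "\"(https?://webpageB\\d+\\.com/[^\"]+\\.jpg)\""
} ]

Note that every regex backslash is doubled so the pattern survives JSON decoding.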
__________________
raztoki @ jDownloader reporter/developer http://svn.jdownloader.org/users/170 Don't fight the system, use it to your advantage. :]
#3
Thanks for your response.

I was unable to get LinkCrawler Rules running correctly. I have seen a few examples of them, and I'm confident that what I need can be done with LinkCrawler Rules, but I didn't find any videos on the internet regarding LinkCrawler. I found one of your posts (board-jdownloader-org/showthread.php?t=77280), but no example of how to use the expressions and/or operators:

Code:
"deepPattern" : "class=\"mp4\"><a href=\"([^\"]+)\""
"deepPattern" : "<img id=\"image-\\d+\" data-src=\"(http[^<>\"]+)\""
"deepPattern" : "(https?://[A-Za-z0-9]+\\.cloudfront\\.net/[^\"]+/P0\\.jpg)"
"deepPattern" : "window\\.open\\('(http[^<>\\'\"]+)'\\)"

1) How can I trigger LinkCrawler after inserting the code in Settings/Advanced Settings/LinkCrawler: Link Crawler Rules? Do I need to do something else? I'm definitely missing something.

2) Can I use LinkCrawler and the Packagizer + LinkGrabber Filter at the same time? Is it possible to have all the URLs from the main page source in the LinkGrabber, then use the Packagizer + Filter to get what I need?

3) In LinkCrawler, if the problem is "how to get all URLs from the main source", would it be something like this?

Code:
[ {
  "enabled" : true,
  "cookies" : null,
  "updateCookies" : false,
  "logging" : false,
  "maxDecryptDepth" : 100,
  "name" : "grab_all_links",
  "pattern" : "mywebsite_plus_expression_and/or_operators",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : null,
  "rewriteReplaceWith" : null
} ]

If you have any video and/or tutorial references, they would be helpful for us. Thanks for all.

Last edited by anderson02; 22.06.2020 at 04:51.
#4
If you provide a real example link, we can test a solution for you.
__________________
FAQ: How to upload a Log
#5
re 1) Nothing more, but I would advise that you copy-paste your rule in. Note that components have to be JSON-encoded within quotation marks for strings. For instance, regex needs its normal escaping for patterns, and then you need to escape those for JSON (so basically you require double escaping).

re 2) You can use a combo of all 3, but you won't need the LinkGrabber filter if you create a deepPattern that finds just the content you want, versus returning all supported content.

re 3) Yes, that will work, but see answer 2 for the reasons you shouldn't do that. Also, leave out everything other than what you need; the rest is auto-generated on reset and given appropriate defaults. I recommend using name, pattern, deepPattern, and packageNamePattern.

There aren't any videos, but there are many examples on the forum in regard to link crawler rules. Google: linkcrawler rules site:board.jdownloader.org

Note: if you use Chrome, 'view source' reformats the source code (I've seen it strip whitespace) o_O. I first experienced this last night. You will need to base your expressions on the real source code.

raztoki
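To make the double escaping in re 1) concrete (a generic illustration, not a rule for any particular site): a regex that captures the target of an <a href="..."> link ending in .jpg is

href="([^"]+\.jpg)"

Inside a JSON string, each literal quote gets a backslash and each regex backslash is doubled, so the rule component becomes:

Code:
"deepPattern" : "href=\"([^\"]+\\.jpg)\""

If the backslashes go missing when pasting, the rule will fail to load or fail to match.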
#6
Thanks @tony2long.

This is a real example where the LinkGrabber can't get the secondary URLs' content, even with Deep Link Analyse. Please let me know if there is an option in Settings/Advanced Settings where JD can grab deep URLs from the main source, or how the JSON would look in this case. I appreciate the help.

(main page pattern = **External links are only visible to Support Staff**)
(secondary page pattern = **External links are only visible to Support Staff**)

Example #1: **External links are only visible to Support Staff**
Example #2: **External links are only visible to Support Staff**
Example #3: **External links are only visible to Support Staff**

Last edited by anderson02; 23.06.2020 at 00:07.
#7
I tried with this rule:
Code:
[ {
  "logging" : true,
  "rule" : "DEEPDECRYPT",
  "maxDecryptDepth" : 1,
  "pattern" : "https?://www\\.aliexpress\\.com/item/\\d+\\.html",
  "packageNamePattern" : "<title>(.+?)</title>",
  "deepPattern" : "<img alt=\"\" src=\"([^\"]*?)\""
} ]

The response was:

HTTP/1.1 200 OK
Content-Type: text/html;charset=UTF-8

but no output at all. I don't know why; maybe raztoki can explain.
#8
Hello @raztoki, sorry to bother you again, but @tony2long's rule doesn't show any output. Could you give us any idea on this?

Code:
[ {
  "logging" : true,
  "rule" : "DEEPDECRYPT",
  "maxDecryptDepth" : 1,
  "pattern" : "https?://www\\.aliexpress\\.com/item/\\d+\\.html",
  "packageNamePattern" : "<title>(.+?)</title>",
  "deepPattern" : "<img alt=\"\" src=\"([^\"]*?)\""
} ]

Thanks in advance, guys, if it's possible to solve this issue.
#9
That won't work. If you check the source code, the images are not present within the example URL. Inspect/dev tools > Network > find the image > Initiator shows the larger images are requested from JavaScript.

If you look for the image id HTB1EB9jN9zqK1RjSZPxq6A4tVXa1 from your first example URL, you can't even see it presented anywhere (Network tab search) other than in the requested image itself. So it's constructed via JS?
Last edited by raztoki; 24.06.2020 at 05:32.
#10
This was requested in the past; Jiaz has stated the outcome. I found: https://board.jdownloader.org/showth...ght=aliexpress
#11
@raztoki,

The problem is that there is no output at all, not even <html> ... </html>, as if Content-Length were 0, but there is no "Content-Length" header.
#12
I must admit I didn't test that, tony2long, sorry. Maybe Jiaz can have a look into your findings.