#1
asian-sirens.net does NOT return any results in the LinkCrawler...
(I also tried a fresh JD install using the MultiOS JAR.)
Example link: **External links are only visible to Support Staff****External links are only visible to Support Staff**
#2
Hi,
1. This website is not supported by JDownloader, so you will have to use the deep-parser (= clipboard observation will not work!) or use custom LinkCrawler rules. Have you tried that already and did not get any results?
2. This website is using Cloudflare, so if you're unlucky it fails for you because of that.
Our deep-parser is working just fine here with this website (= it finds all of these images), so if you want us to check this, please provide a debug log: Please post your log-ID here.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
#3
Can you help me with the pattern, maybe?
[ {
  "enabled" : true,
  "cookies" : [ ],
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 0,
  "name" : null,
  "pattern" : "**External links are only visible to Support Staff**",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : "<title>(.+)</title>",
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : null,
  "rewriteReplaceWith" : null
} ]
#4
The URL slugs look like these, for example:
/wp/2021/11/satsuki/
/wp/2021/11/davina-stewart-davinastewart_/
/wp/2020/10/arabella_kat/
/wp/2020/10/putrypoyz/
wp/2018/10/vidadadada/
but I don't know how to write the code to make it work with the LinkCrawler!?
I know about /.+?/, (.+) and \\d{6}, but that brings no results... I think it comes down to using the right PATTERN...
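The slugs listed in this post all share a /wp/<year>/<month>/<name>/ shape. As a rough sketch only, here is how a candidate pattern could be checked against them before putting it into a rule. The pattern below is a hypothetical one written for this illustration; it is not the redacted pattern from this thread. Python's `re` module is used purely as a test bench (JDownloader itself uses Java regular expressions, which treat these particular constructs the same way), and the host prefix added to the slugs is an assumption.

```python
import re

# Hypothetical pattern (an assumption, NOT the redacted one from this thread):
# scheme, optional "www.", escaped dots, then /wp/<4 digits>/<2 digits>/<slug>/
PATTERN = re.compile(r"https?://(?:www\.)?asian-sirens\.net/wp/\d{4}/\d{2}/[^/]+/?")

# Slugs copied from the post (leading slash added where it was missing).
slugs = [
    "/wp/2021/11/satsuki/",
    "/wp/2021/11/davina-stewart-davinastewart_/",
    "/wp/2020/10/arabella_kat/",
    "/wp/2020/10/putrypoyz/",
    "/wp/2018/10/vidadadada/",
]
# Assumed host prefix, used only to build full test URLs.
urls = ["https://www.asian-sirens.net" + s for s in slugs]

# One boolean per URL: does the whole URL match the pattern?
matches = [bool(PATTERN.fullmatch(u)) for u in urls]

# A URL outside the /wp/<year>/<month>/ structure should not match.
other = bool(PATTERN.fullmatch("https://www.asian-sirens.net/uploads/2021/pic.jpg"))

print(matches, other)
```

Inside a LinkCrawler rule the same expression would additionally need JSON escaping, i.e. every backslash doubled.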
#5
By the way: I am using the EventScripter with a txt file containing the links to be added for DEEPDECRYPT crawling, with a sleep interval of 250.
#6
I would like to see some more effort from you.
We've helped you with multiple LinkCrawler Rules, yet it seems like you haven't even learned the basics of regular expressions.
Example: https://board.jdownloader.org/showthread.php?t=87038
Please re-read our LinkCrawler Rules article - it even links multiple websites for learning the basics of regular expressions:
https://support.jdownloader.org/Know...kcrawler-rules
Hints:
- Your "pattern" is not escaped (e.g. an unescaped dot means "match any character").
- "deepPattern" is not defined, but I'm quite sure you want it to grab only images or only specific images, e.g.:
Code:
property=\"og:image\" content=\"(https?://www\\.asian-sirens\\.net/uploads/[0-9]{4}/[^\"]+)
Use the websites/webtools we've linked to test your regular expressions, e.g. regex101.com. Play around with it and you'll soon know the meaning of many more RegEx symbols.
Also, please stop double/triple posting! Our forum lets you edit existing posts, which is what you would usually do to add information when your post is the last one in a thread.
-psp-
EDIT
Edit example.
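The two hints in this post can be tried out offline. The following is a minimal sketch, assuming Python's `re` as a stand-in for a tool like regex101.com; the HTML snippet and the look-alike host name are made up for illustration, and the real page markup may differ. The deepPattern is the one from the hint with its JSON escaping removed (`\"` becomes `"` and `\\.` becomes `\.`).

```python
import re

# Hint 1: an unescaped dot matches ANY character, so look-alike hosts match too.
loose = re.compile(r"asian-sirens.net")
# An escaped dot matches only a literal ".".
strict = re.compile(r"asian-sirens\.net")

loose_hit = bool(loose.search("https://asian-sirensXnet/wp/"))
strict_hit = bool(strict.search("https://asian-sirensXnet/wp/"))

# Hint 2: the deepPattern from the post, with the JSON escaping removed.
deep = re.compile(
    r'property="og:image" content="(https?://www\.asian-sirens\.net/uploads/[0-9]{4}/[^"]+)'
)

# Made-up HTML snippet (an assumption about what such a page might contain).
html = (
    '<meta property="og:title" content="Example" />\n'
    '<meta property="og:image" '
    'content="https://www.asian-sirens.net/uploads/2021/example.jpg" />'
)

# findall returns the content of the single capture group for each hit.
found = deep.findall(html)

print(loose_hit, strict_hit, found)
```

Only the text inside the parentheses (the capture group) is returned, which is why the expression wraps just the image URL and not the surrounding attribute text.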
Last edited by pspzockerscene; 02.12.2021 at 00:49. Reason: Added "Edit example."
#7
Could there be internet provider issues or country restrictions on my end because of Cloudflare?
Thank you for your untested example...
property="og:image" content="(https?://www\\.asian-sirens\\.net/uploads/[0-9]{4}/[^"]+)
but I have no idea where to add 'property' - cannot find any help at google searching for example "linkcrawler property"
#8
Your answer shows that, again, you either haven't read any of the help articles I linked or you completely failed to understand them.
If you do not understand parts of our article, please let us know so we can improve it instead of having to describe something again and again in our forums.
I'll try to explain it in more detail.
So back to the beginning: you have a website which JD does not have a plugin for, and you want JD to do two things:
1. Recognize a specific URL structure.
2. Find something on that website (in this case, only those pictures).
This means we first need a regular expression matching the URLs our rule is supposed to work on, because we usually do not want the rule to process all URLs from that website but only those with a specific structure (our pattern).
In this case it is roughly: asian-sirens.net/wp/1234/12/bla-bla/
--> So we could use the following as our pattern value:
Code:
**External links are only visible to Support Staff**
Example:
Code:
[ {
  "enabled": true,
  "logging": false,
  "maxDecryptDepth": 1,
  "name": "example rule crawl EVERYTHING from specific URLs from website asian-sirens.net",
  "pattern": "**External links are only visible to Support Staff**",
  "rule": "DEEPDECRYPT",
  "packageNamePattern": null,
  "deepPattern": null
} ]
pastebin.com/raw/Ppw6CSpf
Now to test it, add the URL you provided in your original post. If JD finds some/all of these pictures, you can be sure that:
- The pattern is working
- There are no issues with Cloudflare
While we're at it: you could also use tools like regex101.com to verify in advance that the pattern we created for our rule matches the URLs we want it to work for.
...so let's continue.
Quote:
but I have no idea where to add 'property' - cannot find any help at google searching for example "linkcrawler property"
Use examples found in our knowledge base and also ones you can find in our forum. It's all explained in our help article, but I'll add a more detailed explanation here:
Now you've got a working rule, but it crawls everything instead of only the pictures you want. So let's add a "deepPattern" to the rule we've created before, to tell it what to look for inside the website's HTML code. That is what we need deepPattern for:
Code:
[ {
  "enabled": true,
  "logging": false,
  "maxDecryptDepth": 1,
  "name": "example rule crawl PICTURES from specific URLs from website asian-sirens.net",
  "pattern": "**External links are only visible to Support Staff**",
  "rule": "DEEPDECRYPT",
  "packageNamePattern": null,
  "deepPattern": "\"(https?://www\\.asian-sirens\\.net/uploads/[0-9]{4}/[^\"]+)"
} ]
pastebin.com/raw/0RPb35i0
(As you can see, I've modified the "deepPattern" once again - it is not the same version that I've shown you in my previous reply.)
Now test this one again and you should get only pictures. If you still get too many, you might have to modify your deepPattern so that it only picks up the items you want. For some websites this might be impossible, but in this case it should be possible.
Again, you could simply use webtools like regex101.com to test your deepPattern against the HTML of your target website, so you can see in advance which results JD would find, for example:
Spoiler: [screenshots]
...by the way, this one works exactly like the other one I've created for you here: https://board.jdownloader.org/showpo...02&postcount=6
Now if you e.g. want to have nicer package names, you could make use of packageNamePattern with another regular expression, but that's up to you.
-psp-
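One point that is easy to miss in the rules in this post is that the deepPattern is written inside a JSON string, so it carries an extra layer of escaping. The sketch below is only an illustration under stated assumptions: it uses Python's `json` and `re` modules as a test bench, leaves out the redacted "pattern" field, and runs against a made-up HTML snippet rather than the real page.

```python
import json
import re

# The second rule from this post, minus "name" and the redacted "pattern" field.
# The raw string keeps the backslashes exactly as they appear in the rule.
RULE_JSON = r'''
[ {
  "enabled": true,
  "logging": false,
  "maxDecryptDepth": 1,
  "rule": "DEEPDECRYPT",
  "packageNamePattern": null,
  "deepPattern": "\"(https?://www\\.asian-sirens\\.net/uploads/[0-9]{4}/[^\"]+)"
} ]
'''

# Parsing the JSON removes one escaping layer: \" -> " and \\. -> \.
rule = json.loads(RULE_JSON)[0]
deep_regex = rule["deepPattern"]

# Made-up HTML (an assumption): one post link, two pictures, one script.
html = (
    '<a href="https://www.asian-sirens.net/wp/2021/11/satsuki/">post</a>\n'
    '<img src="https://www.asian-sirens.net/uploads/2021/a.jpg">\n'
    '<img src="https://www.asian-sirens.net/uploads/2021/b.jpg">\n'
    '<script src="https://www.asian-sirens.net/static/site.js"></script>'
)

# Only URLs under /uploads/<4 digits>/ are captured; the rest is ignored.
images = re.findall(deep_regex, html)

print(deep_regex)
print(images)
```

When pasting the expression into regex101.com, it is the decoded form (single backslashes, plain quotes) that should be tested, not the JSON-escaped form.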
Last edited by pspzockerscene; 02.12.2021 at 17:27. Reason: Added more screenshots
#9
I once again installed a fresh JD with the MultiOS file and added your LinkCrawler code, please see **External links are only visible to Support Staff****External links are only visible to Support Staff**
The only thing I added myself is "packageNamePattern" : "<title>(.+)</title>".
I also added an EventScripter script to crawl a txt file with links from the website...
I get NO results with the EventScripter txt-file crawling, and I get NO results using the 'continue' button via START DEEP link Analyse... I am sorry :(
By the way: pattern and deepPattern handle https and www differently with '?'. I think that was your intention!?
#10
Please first add the rule and manually add links (without any EventScripter script) and check if it works that way! Your post makes me think that you've tried it via EventScripter straight away without any testing...
Which URLs did you use for testing? I only tested it using the URL you provided in your first post.
Again, the fact that you're even asking tells me that you didn't invest even 1 minute to think about it/learn how regular expressions work...
Here is a version which makes the 'www.' optional in both regular expressions:
Code:
[ {
  "enabled": true,
  "logging": false,
  "maxDecryptDepth": 1,
  "name": "example rule crawl PICTURES from specific URLs from website asian-sirens.net",
  "pattern": "**External links are only visible to Support Staff**",
  "rule": "DEEPDECRYPT",
  "packageNamePattern": null,
  "deepPattern": "\"(https?://(?:www\\.)?asian-sirens\\.net/uploads/[0-9]{4}/[^\"]+)"
} ]
pastebin.com/raw/74QsXN4w
Even without any rule, you get no results when you let JD deep-analyze the URL you provided in your first post?? Then maybe something is blocking JD, or that "asian-sirens" website, or Cloudflare (I don't think it's Cloudflare in this case, as I don't see any Cloudflare cookies here).
To narrow this down, please do the following:
1. Change the "logging" parameter in that rule from false to true.
2. Enable debug logs (see link down below).
3. Please post your log-ID here.
-psp-
EDIT
If none of this is working, we can look at this together via a TeamViewer session.
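The difference between the earlier deepPattern and the www-optional one in this post can be seen side by side. This is a small sketch, assuming Python's `re` as a test bench and a made-up HTML snippet; both expressions are shown with the JSON escaping already removed.

```python
import re

# Earlier version: "www." is required. The "?" after "s" only makes the "s" optional,
# so both http:// and https:// are accepted.
with_www_only = re.compile(
    r'"(https?://www\.asian-sirens\.net/uploads/[0-9]{4}/[^"]+)'
)
# Version from this post: the non-capturing group (?:www\.)? makes "www." optional too.
www_optional = re.compile(
    r'"(https?://(?:www\.)?asian-sirens\.net/uploads/[0-9]{4}/[^"]+)'
)

# Made-up HTML (an assumption): one URL with "www.", one without.
html = (
    '<img src="https://www.asian-sirens.net/uploads/2021/a.jpg">\n'
    '<img src="http://asian-sirens.net/uploads/2020/b.jpg">'
)

strict_hits = with_www_only.findall(html)
relaxed_hits = www_optional.findall(html)

print(strict_hits)
print(relaxed_hits)
```

In both expressions a `?` applies only to the element directly before it: a single character in `https?`, and the whole group in `(?:www\.)?`.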
Last edited by pspzockerscene; 02.12.2021 at 19:28. Reason: Fixed some typos