#1
Problems decrypting links for metalarea.org
I have added basic authentication in JD2 for metalarea.org.
I paste these links into JD2, but it does not crawl anything, even though these pages contain links to several hosts: **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff**
Maybe you cannot see the links inside the metalarea pages because you need credentials to see the hidden content (the host links are also hidden in spoilers).
#2
Hi,
I don't think basic authentication will work for this forum-style website. You will probably have to extract the cookies of that website and add a LinkCrawler rule. Without your login credentials I won't be able to help you with that. If you need an example, see your older thread HERE.
-psp-
EDIT
I'll be offline soon and back again tomorrow.
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 10.11.2020 at 17:40.
#3
I sent you my metalarea credentials.
One more thing, if I may. When I copy information from JD2, it looks like this:
Link;2010 - Crusted.rar;**External links are only visible to Support Staff****External links are only visible to Support Staff**
Link;1996 - Incomplete Minds.7z;**External links are only visible to Support Staff****External links are only visible to Support Staff**
Is it possible to also see the original forum link from which I copied the link (in addition to the download link)? I mean like this:
Link;2010 - Crusted.rar;**External links are only visible to Support Staff****External links are only visible to Support Staff**
Link;1996 - Incomplete Minds.7z;**External links are only visible to Support Staff****External links are only visible to Support Staff**
I need the copied information to also contain the forum link from which each download link was taken.
#4
You can customize the CopyToClipboard action via right-click -> context menu -> menu editor; there you can modify what is copied to the clipboard.
Additional tags are:
Quote:
__________________
JD-Dev & Server-Admin |
#5
Also check right-click -> context menu -> properties -> show url, and double-click the URL to see all known URLs.
__________________
JD-Dev & Server-Admin |
#6
The procedure is basically the same as described in your older forum thread, but this time you need the "masession_id" cookie:
Code:
[ {
  "enabled" : true,
  "cookies" : [ ["masession_id", "CENSORED"] ],
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "name" : "metalarea.org example rule with cookie-login",
  "pattern" : "https?://metalarea\\.org/forum/index\\.php\\?showtopic=\\d+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/JNC85fCH
EDIT
Please keep in mind that while we always try to help, creating custom LinkCrawler Rules is something that you should learn yourself! We won't provide example rules for another 100 websites for you. To learn how to do this, you first need to learn how to use regular expressions; you can practice with web tools such as regex101.com.
Regarding your 2nd question: Go to Settings -> User Interface -> Downloadlink address display. Move "Data" to the top and/or deselect all others. This will only work if you add "uncrypted" download links. If you e.g. add .DLC containers, JD won't ever display the direct URLs to you.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download
Last edited by pspzockerscene; 10.11.2020 at 18:05.
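The two regular expressions in the rule above can be checked offline before the rule is pasted into JD2. The following is a minimal Python sketch, not part of the original posts. It assumes that "pattern" has to match the whole added URL and that the first group of "deepPattern" is the link taken from the page source. JDownloader is a Java application, so the real matching is done by Java's regex engine; for patterns this simple Python's `re` should behave the same. The sample HTML and the example.com / example.org URLs are made up.

```python
import json
import re

# Trimmed copy of the rule posted above; "CENSORED" stands in for the real cookie.
# The raw string keeps the JSON escaping (\\. and \") exactly as it appears in the rule.
RULE_JSON = r'''
[ {
  "enabled" : true,
  "cookies" : [ ["masession_id", "CENSORED"] ],
  "pattern" : "https?://metalarea\\.org/forum/index\\.php\\?showtopic=\\d+",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\""
} ]
'''

rule = json.loads(RULE_JSON)[0]


def rule_matches(url):
    """Return True if the rule's "pattern" matches the complete URL (assumption)."""
    return re.fullmatch(rule["pattern"], url) is not None


def extract_links(page_source):
    """Return every link captured by the first group of "deepPattern"."""
    return re.findall(rule["deepPattern"], page_source)


# Hypothetical page fragment: only links preceded by "Download from " are captured.
SAMPLE_HTML = (
    'Download from <a href="https://example.com/file/abc">example.com</a> '
    '<a href="https://example.org/ignored">other</a>'
)

if __name__ == "__main__":
    print(rule_matches("https://metalarea.org/forum/index.php?showtopic=12345"))
    print(extract_links(SAMPLE_HTML))
```

If `extract_links` returns an empty list for a saved copy of the real page, the "deepPattern" needs adjusting, or the page was saved without a valid login.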
#7
@Jiaz
@pspzockerscene Thanks a lot!
#8
@nathan1: also see my comment here https://board.jdownloader.org/showthread.php?t=85914
__________________
JD-Dev & Server-Admin |
#9
Thanks for your feedback.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#10
@Jiaz
@pspzockerscene I added the tags like this:
{type};{name};{url};{url.container};{url.origin};{packagename}
I tested some links, but since the update JD2 has problems recognizing URLs and setting the <title>, or it does not copy the <title> with the Copy Information action. For example, this URL contains a mediafire link **External links are only visible to Support Staff****External links are only visible to Support Staff** but it is not crawled. This link also causes difficulties: **External links are only visible to Support Staff****External links are only visible to Support Staff**
LOG
Code:
11.11.20 22.33.02 <--> 11.11.20 22.28.29 jdlog://6033425302851/
#11
Just another example
For example, this link is not crawled: **External links are only visible to Support Staff****External links are only visible to Support Staff**
The <title> of this URL is
Carach Angren - Franckensteina Strataemontanus (2020), Symphonic Black Metal
But since the update JD2 does not work:
1. It does not crawl the links inside the page.
2. It does not generate the <title> of **External links are only visible to Support Staff****External links are only visible to Support Staff** for the package in which the host links are crawled, or at least it does not copy the <title> via Copy Information (even though I set up {type};{name};{url};{url.container};{url.origin};{packagename}).
#12
All working fine here!
1. Working fine. If it doesn't for you, log out in your browser and log in again --> grab the new value of the cookie and put that in your rule. Cookies can expire; yours might have expired. Mine also expired in my test rule and I had to renew the cookie to make it work again.
2. If you want the rule to set a package title, you have to define that in the rule ("packageNamePattern"). Again, I'm urging you to learn how to use regular expressions, but I've modified the rule once again for you to grab and set the title:
Code:
[ {
  "enabled" : true,
  "cookies" : [ [ "masession_id", "CENSORED" ] ],
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1605027636498,
  "name" : "metalarea.org example rule with cookie-login",
  "pattern" : "https?://metalarea\\.org/forum/index\\.php\\?showtopic=\\d+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : "<title>(.*?)</title>",
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\"",
  "rewriteReplaceWith" : null
} ]
pastebin.com/Cfv2Zgv0
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
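To see what "packageNamePattern" would capture from a page, the expression can be tried on a saved copy of the page source. This is a small Python sketch under the assumption that the first capture group becomes the package name; the sample page and its title are invented, and JD2 itself evaluates the pattern with Java's regex engine.

```python
import re

# The expression from the rule above, written as a Python raw string.
PACKAGE_NAME_PATTERN = r"<title>(.*?)</title>"


def package_name(page_source):
    """Return the text of the first capture group, or None if nothing matches."""
    match = re.search(PACKAGE_NAME_PATTERN, page_source)
    if match is None:
        return None
    return match.group(1).strip()


# Hypothetical page source used only for illustration.
SAMPLE_PAGE = (
    "<html><head><title>Example Band - Example Album (2020)</title></head>"
    "<body>...</body></html>"
)

if __name__ == "__main__":
    print(package_name(SAMPLE_PAGE))
```

The non-greedy `(.*?)` stops at the first closing `</title>`, so whatever the site appends inside the title tag ends up in the package name as well.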
#13
OK, I refreshed the browser and added the new cookie; now it works better, but not perfectly.
Here is an example. I copied these links (I use the "copy selected links" extension for Firefox to copy URLs):
**External links are only visible to Support Staff****External links are only visible to Support Staff** javascript:multi_page_jump('**External links are only visible to Support Staff**, 38, 30 ); **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff**
JD2 crawls these links. When I copy the information, I see this:
Link;Demoniac 1993 Satanas 666 (Rehearsal 11-1993).rar;**External links are only visible to Support Staff****External links are only visible to Support Staff** 1993 Satanas 666 (Rehearsal 11-1993);
Link;Moonblood-The Winter Falls Over The Land [Remastered](CD 2015).rar;**External links are only visible to Support Staff****External links are only visible to Support Staff** Winter Falls Over The Land [Remastered](CD 2015);
Link;st.rar;**External links are only visible to Support Staff****External links are only visible to Support Staff**
What is the problem? For st.rar, "packageNamePattern" : "<title>(.*?)</title>" does not work. The information it returns is
Link;st.rar;**External links are only visible to Support Staff****External links are only visible to Support Staff**
At the end of the line I see ;st; and not ;Moonblood-The Winter Falls Over The Land [Remastered](CD 2015);
I also don't understand why JD2 still requires a login for metalarea if I added the cookie login correctly. When I copy several links, JD2 asks me to log in. Here is the LOG:
Code:
11.11.20 23.48.47 <--> 12.11.20 00.11.04 jdlog://5333425302851/
#14
Oh lol, we actually had a crawler for this website from 2016.
I'll check that with Jiaz tomorrow. The old crawler only listens for "http" URLs, which is why you triggered it. Please wait for us to re-check this tomorrow/later ...
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#15
I've fixed our old metalarea plugin from 2016.
We usually don't make plugins for such simple websites anymore, but you're lucky that this one still exists and didn't require a lot of work to fix. You will not need the above LinkCrawler rule anymore after the next update.
Are you waiting for recently announced changes to get released? Updates do not necessarily get released immediately! Please read our Update FAQ!
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#16
Quote:
Last edited by nathan1; 12.11.2020 at 23:41.
#17
In your package naming I see this:
Quote:
I see that "Metal Area - Extreme Music Portal" is added to every package name. Is it possible to retrieve the real name of the package? The real names have the music genre tag in the <title>. Please look at this example:
Quote:
**External links are only visible to Support Staff****External links are only visible to Support Staff**
Here you can see that the real <title> is
Amenra - Mass Iii (2006), Doom/Sludge/Hardcore
and not
Amenra - Mass Iii (2006) - Metal Area - Extreme Music Portal
This is because the relative URLs are inside the page from which JD2 crawls the links. I would like the package name to refer to that title.
Last edited by nathan1; 13.11.2020 at 00:30.
#18
Quote:
Also, our plugin does this automatically to get the package name.
Quote:
In the future, please use Packagizer rules to correct package titles if you need to remove parts of them.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#19
I tried to change the <tag> to retrieve the correct title, but something doesn't work.
To fix the package titles, instead of
Code:
"packageNamePattern" : "<title>(.*?)</title>",
I use
Code:
"packageNamePattern" : "<td style=\"word-wrap:break-word;\" width=\"99%\">(.*?)</td>",
Code:
[ {
  "enabled" : true,
  "cookies" : [ [ "masession_id", "CENSORED" ] ],
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1605027636498,
  "name" : "metalarea.org example rule with cookie-login",
  "pattern" : "https?://metalarea\\.org/forum/index\\.php\\?showtopic=\\d+",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : "<td style=\"word-wrap:break-word;\" width=\"99%\">(.*?)</td>",
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "Download from <a href=\"(https?://[^\"]+)\"",
  "rewriteReplaceWith" : null
} ]
In your update you only removed "Metal Area.. ", but the real name is inside this tag:
Code:
<td style="word-wrap:break-word;" width="99%"> </td>
For example, in this URL **External links are only visible to Support Staff****External links are only visible to Support Staff** this is the text that I am trying to set as the package title.
#20
I've updated it once again in our plugin.
Again: if a plugin for a website is available, the plugin will be used and LinkCrawler Rules for the same website will be ignored.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#21
@pspzockerscene
Great job! May I ask what you changed to retrieve the text from that tag? What did you write?
#22
@nathan1: he updated the native plugin to fetch the title the way you want it
__________________
JD-Dev & Server-Admin |
#23
@pspzockerscene
It seems that your last update disabled the title fetching from my last request. Please see the LOG:
Code:
18.11.20 09.16.21 <--> 18.11.20 09.15.44 jdlog://9615425302851/
#24
@nathan1: that was my fault, please wait for the next update
__________________
JD-Dev & Server-Admin |
#25
@Jiaz
OK, thanks
#26
@Jiaz & psp
I see that the plugin has problems decrypting links and fetching <title> names if the links are from file.karelia.ru. If you test these URLs
**External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff**
it returns these strange folder names:
Code:
923zjt
rvw36q
wn93q9
qgw44f
gv79df
LOG
Code:
19.11.20 14.03.23 <--> 19.11.20 14.31.10 jdlog://9645425302851/
#27
Updated our file.karelia.ru crawler to only set package names for folders with more than one item.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#28
host url missing from Copy Information (Action)
I think there are some problems with the CopyToClipboard action via right-click -> context menu.
I added these additional tags to the Copy Information action:
Code:
{type};{name};{url};{url.container};{url.origin};{packagename};{url.referrer}
**External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff**
Inside these pages there are links like **External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff** but the URLs of all external hosters such as yadi.sk or sampo.ru are missing from the Copy Information output. This information does not appear when I copy the information. What additional tag do I need? These are insufficient:
Code:
{type};{name};{url};{url.container};{url.origin};{packagename};{url.referrer}
Code:
21.11.20 03.26.47 <--> 21.11.20 03.46.28 jdlog://4695425302851/
Last edited by Jiaz; 21.11.2020 at 12:25.
#29
The data returned by {url} will vary, depending on the url sort order in Settings > User Interface. Use {url.content} instead, to get the download url.
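Once the copy pattern uses {url.content}, the copied lines can be split back into fields for further processing. The following Python sketch is only an illustration: it assumes the pattern {type};{name};{url.content};{url.container};{url.origin};{packagename}, one item per line, and no semicolons inside the values. The sample line and its example URLs are invented.

```python
import csv
import io

# Field names follow the assumed Copy Information pattern, in order.
FIELDS = ["type", "name", "url.content", "url.container", "url.origin", "packagename"]


def parse_copied_lines(text):
    """Split semicolon-separated clipboard lines into one dict per item."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if row]


# Hypothetical clipboard content with an empty container URL.
SAMPLE = "Link;album.rar;https://host.example/file/1;;https://forum.example/topic/1;Example Album\n"

if __name__ == "__main__":
    for item in parse_copied_lines(SAMPLE):
        print(item["name"], item["url.content"], item["url.origin"])
```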
#30
Thank you!
#31
@mgpai: Thanks for the fast and correct help
__________________
JD-Dev & Server-Admin |
#32
@psp
I followed the tips from your last khinsider post and applied them to metalarea.
Quote:
Note: the clipboard observer is enabled.
LOG
Code:
01.09.22 23.15.02 <--> 01.09.22 23.52.12 jdlog://7369211370661/
In theory, shouldn't it show me only the package name (which would be great) without having to check any of the links in it? Why does it still check whether the files are offline, even after I disabled the linkcollector check? I'm only interested in getting the package names, which it does correctly; I can check their online status later, either manually or when I download them.
#33
Quote:
Quote:
- All of those 500 URLs
- Your metalarea username + password
Quote:
This can't be turned off, as JD would simply do nothing then.
Also, some crawlers will do the linkcheck right away because they've already accessed the URL, so yes, even with linkcheck disabled, some items will be displayed as online/offline with filename/filesize information set.
Quote:
Quote:
I can imagine that some of your metalarea links go to a 404 error page, which the crawler will detect and display as offline. This is not a real "online check", but as explained, the crawler needs to access the added link anyway, so that is not a bug. Also, a package in JDownloader cannot be empty; it needs to contain at least one item.
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#34
@psp
OK, the problem is that it does not detect many links: if you copy and paste all 500 links in bulk, JDownloader freezes at some point and "runs idle" for 1-2 minutes, and in any case it does not scan all the links. If you copy and paste 10-20 links instead, it scans them and detects all of them. What I don't understand is why the scan depends on the number of links. Maybe there is some setting or timer that could be adjusted? I sent you my credentials + the 500 metalarea URLs.
#35
Quote:
If you're frequently running into rate-limits, consider adding links with a delay of X seconds. This should be easily possible using external scripts or even an EventScripter script: https://support.jdownloader.org/Know...event-scripter Quote:
I'll check it...
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
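The idea of adding links with a delay of X seconds can be sketched as an external script that hands over small batches with a pause in between. This Python sketch is not an official JDownloader script and does not use any JDownloader API: `submit` is a hypothetical callback that stands for whatever actually passes a batch to JD2 (for example copying it to the clipboard), and the batch size and delay are arbitrary.

```python
import time


def batches(urls, size):
    """Yield consecutive lists of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch = []
    for url in urls:
        batch.append(url)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def add_with_delay(urls, size, delay_seconds, submit, sleep=time.sleep):
    """Pass the URLs to `submit` batch by batch, pausing between batches.

    `submit` is a placeholder for the real hand-over to JD2; `sleep` can be
    replaced in tests. Returns the number of batches submitted.
    """
    count = 0
    for index, batch in enumerate(batches(urls, size)):
        if index > 0:
            sleep(delay_seconds)  # wait before every batch except the first
        submit(batch)
        count += 1
    return count


if __name__ == "__main__":
    demo_urls = ["https://forum.example/topic/%d" % i for i in range(45)]
    add_with_delay(demo_urls, size=20, delay_seconds=0.1, submit=lambda b: print(len(b)))
```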
#36
It was indeed a rate limit.
Metalarea will return HTTP error code 400 when that limit is reached. For the next update I've added some measures to try to prevent running into the rate limit. These measures include:
- Wait 1000ms between requests
- Limit max simultaneous crawler instances for metalarea to 1
- Return dummy URLs for retrying in case the rate limit is hit
Please wait for the next CORE-Update!
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
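The three measures listed above can be illustrated with a small sequential crawler loop. This is a generic Python sketch, not the plugin's actual code: `fetch` is a hypothetical function returning `(status, body)`, the retry count is arbitrary, and the `failed` list only loosely mirrors the "dummy URLs for retrying".

```python
import time

RATE_LIMIT_STATUS = 400  # per the post above, returned when the limit is reached


def crawl_sequentially(urls, fetch, interval_seconds=1.0, max_retries=3, sleep=time.sleep):
    """Fetch URLs one at a time with a fixed pause between requests.

    A response with RATE_LIMIT_STATUS is retried up to `max_retries` times.
    Returns (results, failed): results maps URL -> (status, body), and failed
    lists the URLs that were still rate-limited after all retries.
    """
    results = {}
    failed = []
    first_request = True
    for url in urls:  # one URL at a time, so never more than one request in flight
        for _attempt in range(max_retries + 1):
            if not first_request:
                sleep(interval_seconds)  # fixed wait between any two requests
            first_request = False
            status, body = fetch(url)
            if status != RATE_LIMIT_STATUS:
                results[url] = (status, body)
                break
        else:
            failed.append(url)  # keep it so it can be retried later
    return results, failed
```

As the later posts in this thread show, a pause alone did not help in this case, because the session was blocked after a number of requests regardless of their speed.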
#37
@psp
Thank you for this update
#38
Seems like it is neither a classic rate limit nor a sophisticated one:
They're simply blocking your current session after X (about 250) requests, no matter how fast these requests are performed. I'm still working on it, but I think I should be able to add auto handling and remove the requestInterval so the crawler can crawl at full speed.
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#39
Done. The rate limit for metalarea links shouldn't be a problem anymore after the next set of plugin updates.
-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#40
@psp
Thank you very much!