JDownloader Community - Appwork GmbH
 

  #1  
Old 04.12.2020, 02:15
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default Broken category from website is preventing JD from grabbing links

Within the past six hours, the site admins for rmz broke something on their site, including on their new domain: the filter for the "Movies Only" category now displays a blank white page, so JD2 can no longer add new links to the linkgrabber.

I have submitted an email, a post, and a chat message and hopefully they will fix this soon.
This issue is only in effect when not logged in to their site.

Currently the links in the JD script are set to crawl and grab from "**External links are only visible to Support Staff**, "**External links are only visible to Support Staff**, "**External links are only visible to Support Staff** and so forth.

If I'm logged in to their site, these links display as "**External links are only visible to Support Staff**, "**External links are only visible to Support Staff**, "**External links are only visible to Support Staff** and so forth. I have already tested those after the browser set the cookies for the linkcrawler rule; however, this does not work either, since those links only work while logged in (authenticated to their site).

Is it possible to set the "linkcrawler" rule to grab them while authenticated to their site or is this currently not a supported option?
The linkcrawler rule has three fields that suggest this may be possible: logging, ID, and passwordPattern.

===================================
Update: The website has already fixed the issue.

I'm still curious if it is possible to use the linkcrawler rule/JD2 to grab logged in links as I mentioned above.

Last edited by RPNet-user; 04.12.2020 at 03:40. Reason: update
  #2  
Old 04.12.2020, 14:05
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 59,717
Default

Quote:
Originally Posted by RPNet-user View Post
I'm still curious if it is possible to use the linkcrawler rule/JD2 to grab logged in links as I mentioned above.
Hi,
your question basically requires two answers:
1. Currently it is impossible to add LinkCrawler Rules for patterns/"websites" which are already supported via plugin - there is a ticket with a suggestion to allow this:


2. Yes, generally it is possible to create LinkCrawler rules that allow JD to access websites in a logged-in state (by providing the required cookies).
I've just recently added a new Knowledgebase Article regarding LinkCrawler Rules.
Please keep in mind that I'm still working on providing more example rules so at this moment, you will find the basics in that linked article but you will have to search our forum for example rules.
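
A minimal sketch of what a rule that carries cookies might look like, built as a JavaScript object and printed as JSON. Only "pattern", "logging", "id" and "passwordPattern" are named in this thread; the fields "cookies", "updateCookies" and "rule", the value "DEEPDECRYPT", the example.com pattern and the cookie name/value are assumptions and placeholders that should be checked against the Knowledgebase article.

```javascript
// Sketch only: builds a LinkCrawler-rule-like object and prints it as JSON.
// Field names other than "pattern", "logging", "id" and "passwordPattern"
// (which are mentioned in this thread) are assumptions.
function buildRule(pattern, cookies) {
    return {
        enabled: true,
        logging: false,
        id: 1,                      // placeholder id
        name: "example rule with cookies",
        pattern: pattern,           // regular expression as a plain string
        rule: "DEEPDECRYPT",        // assumed rule type
        cookies: cookies,           // assumed format: [[name, value], ...]
        updateCookies: true,        // assumed field
        passwordPattern: null
    };
}

// example.com and the cookie name/value are placeholders.
var rule = buildRule("https?://example\\.com/members/.+",
                     [["session", "PASTE_VALUE_HERE"]]);
var json = JSON.stringify([rule], null, 2);
console.log(json);
```

JSON.stringify doubles the backslash in the pattern, which is why patterns inside a rule appear as `\\.` rather than `\.`.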

-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager
JDownloader 2 Setup Download
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
That's true James
Quote:
Originally Posted by James
People simply don't understand that just because a weapon can also be used to shoot at people, a shooting club is not a place for ideas about going on a rampage

Last edited by raztoki; 05.12.2020 at 02:39.
  #3  
Old 05.12.2020, 02:20
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

Unfortunately, "the required cookies set by the linkcrawler rule" are not enough. I have tried this several times, and JD does not recognize the links that are only accessible in a logged-in state (rmz.cr/my/b/2), as opposed to the logged-out state (rmz.cr/l/m/2).

So basically, the only way to override this would be an adjustment to the plugin (to also accept /my/b/) or the "support to prefer rule over plugin".

Hopefully, either will become available someday.

Thanks

Last edited by RPNet-user; 05.12.2020 at 04:37. Reason: misspelling
  #4  
Old 07.12.2020, 15:00
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 59,717
Default

Well as long as your URLs differ from the ones that our plugin accepts at this moment, you can add linkcrawler rules even if there is a plugin.
Please post example URLs of the ones you want to add so I can at least compare the structure of those to the currently supported ones.

-psp-
  #5  
Old 07.12.2020, 19:54
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Please provide example URLs, then we will take a look and I will try to finish work on the ticket
__________________
JD-Dev & Server-Admin
  #6  
Old 07.12.2020, 19:54
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

@pspzocker: maybe add cookie settings to plugin settings to let plugin make use of cookies?
  #7  
Old 08.12.2020, 00:47
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

It is not about my URLs; this is about the linkcrawler rule not being able to grab links from a logged-in link structure, e.g. (**External links are only visible to Support Staff**..., **External links are only visible to Support Staff**).

The current plugin only supports grabbing links while in a logged-out state, e.g. (**External links are only visible to Support Staff**..., **External links are only visible to Support Staff**linkcrawler rule" as I have tried them, and it will not work since the plugin's link crawl structure overrides the linkcrawler rules every time.

The only way around this is to either provide support for the plugin to crawl and grab from this link structure---> (rmz.cr/my/b/2), (rmz.cr/my/b/3), (rmz.cr/my/b/4)
or to provide "rule over plugin" support; one or the other should work.

Here are three sample links, however, they must be tested by setting the "linkcrawler rule pattern" to use (/my/b/) instead of (/l/m/) and the event scripter urls to:

Code:
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
**External links are only visible to Support Staff****External links are only visible to Support Staff**
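
Since the script lines above are redacted, here is a rough sketch of how the same kind of list could be built in a loop from the logged-in path structure mentioned in this post (rmz.cr/my/b/2, /3, /4). The helper name buildUrls, the https scheme and the page range are assumptions for illustration; no JDownloader API is called.

```javascript
// Sketch: build the list of listing pages instead of writing one line per URL.
// "buildUrls" is a made-up helper; scheme and page range are assumptions.
function buildUrls(base, firstPage, lastPage) {
    var urls = [];
    for (var page = firstPage; page <= lastPage; page++) {
        urls[urls.length] = base + page; // same append idiom as the script lines
    }
    return urls;
}

var urls = buildUrls("https://rmz.cr/my/b/", 2, 4);
console.log(urls.join("\n"));
```

Switching between the two structures would then only mean changing the base from "/l/m/" to "/my/b/" in one place.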
  #8  
Old 08.12.2020, 10:08
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Quote:
Originally Posted by pspzockerscene View Post
1. Currently it is impossible to add LinkCrawler Rules for patterns/"websites" which are already supported via plugin - there is a ticket with a suggestion to allow this:
This is the cause. The pattern of the plugin matches all URLs and therefore your Linkcrawler rule isn't processed at all. All URLs are processed by plugin but because of missing cookies it will fail with those *must be logged in* URLs
  #9  
Old 08.12.2020, 10:15
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Just for my understanding, what are those /my/ URLs?
  #10  
Old 09.12.2020, 01:40
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

Quote:
Originally Posted by Jiaz View Post
Just for my understanding, what are those /my/ URLs?
The cookies make no difference whether I'm logged in or not. The problem is that the "JD plugin/LC-rule" does not recognize crawling and grabbing links from "/rmz.cr/my/b/2"(logged in). It will only recognize "/rmz.cr/l/m/2"(logged out).
Attached Images
File Type: png RMZ.Logged.In.png (24.5 KB, 2 views)
File Type: png RMZ.Logged.Out.png (28.4 KB, 2 views)
File Type: png Linkcrawler.Rule.and.Event.Script.png (18.4 KB, 5 views)

Last edited by RPNet-user; 09.12.2020 at 02:02.
  #11  
Old 09.12.2020, 13:22
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Please send your Linkcrawler rule with cookies to support@jdownloader.org
Most likely there is something missing/wrong with the rule.

The rule in your screenshot doesn't even match on /my/ URLs ?!

Last edited by Jiaz; 09.12.2020 at 13:24.
  #12  
Old 10.12.2020, 00:43
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

Quote:
Originally Posted by Jiaz View Post
Please send your Linkcrawler rule with cookies to support@jdownloader.org
Most likely there is something missing/wrong with the rule.

The rule in your screenshot doesn't even match on /my/ URLs ?!
Jiaz, it is not supposed to match.
The linkcrawler rule and URLs in the script that I'm currently using, as per my screenshot, have been working perfectly fine all day every day for the past year.

This has nothing to do with my current linkcrawler rule as it is working perfectly fine as it is.

My inquiry was about the fact that the linkcrawler rule and the plugin for that site will not work for crawling and grabbing links if the rule and the URLs in the script were to crawl from a link structure like this---> "/rmz.cr/my/b/2".

I presume that the reason for this is that the plugin for "rmz" is configured to only parse and crawl for links using this link structure--->"/rmz.cr/l/m/2", which is how my rule and script are configured and have been working perfectly fine for the past year.

That is how the link structure appears to you and to everyone who is not logged in to the site. When a user is logged in, however, the structure looks different, e.g. "/rmz.cr/my/b/2". Because the plugin does not support crawling and grabbing from this path structure and the rule does not override the plugin, JD is not able to grab the links when the link path is set to "/my/b/" in both the rule and the script URLs.

JD will crawl but is not able to grab the links.
As I have said, the cookies make no difference as long as the plugin overrides the rule and the plugin does not support grabbing links from "/rmz.cr/my/b/".

So basically, it should work if the plugin supported crawling and grabbing links from either "/my/b/" or "/l/m/", or if the linkcrawler rule could override the plugin.

It is not necessary to update the plugin or add support for letting a rule override the plugin, as the site had a temporary bug that they fixed within eight hours. If the plugin for rmz had supported crawling and grabbing links from "/my/b/", or if the rule had been able to override the plugin, then I could have temporarily changed the rule to grab from "/my/b/" until they had fixed the "bug". However, it was a temporary issue and they fixed it right away, so as far as I'm concerned it is OK to leave this as is and change this post back to "solved".

Last edited by RPNet-user; 10.12.2020 at 10:03.
  #13  
Old 10.12.2020, 12:57
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Quote:
Originally Posted by RPNet-user View Post
My inquiry was in regards to the linkcrawler rule and the plugin for that site will not work for crawling and grabbing links if the rule and the urls in the script were to crawl from a link structure like this---> "/rmz.cr/my/b/2"
The plugin only processes "/release/" URLs. Your Linkcrawler rule (screenshot) ONLY matches "/l/m" URLs. So neither of them matches "/my/" URLs, and that's what I'm trying to say.
The "/my/" URLs require you to be logged in (the browser redirects to login), so you have to modify your LinkCrawler rule to match those as well AND include your logins so JDownloader is able to process them
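
The point above can be checked with a few lines of JavaScript. This is only a sketch: the rule pattern is the one quoted later in this thread, un-escaped from JSON; the plugin pattern is a hypothetical stand-in (the real one is not shown here); the ^ and $ anchors assume whole-URL matching; and the order "plugin first, then rule" is an assumption made for illustration.

```javascript
// Rule pattern as posted in this thread (un-escaped from JSON), anchored
// on the assumption that the whole URL has to match.
var rulePattern = /^https?:\/\/rmz\.cr\/l\/m\/[0-9]*?$/;
// Hypothetical stand-in for the plugin's pattern; the real one is not shown.
var pluginPattern = /^https?:\/\/rmz\.cr\/release\/.+$/;

// Returns which of the two would pick up a URL, or "nothing".
function whoHandles(url) {
    if (pluginPattern.test(url)) return "plugin";
    if (rulePattern.test(url)) return "rule";
    return "nothing";
}

console.log(whoHandles("https://rmz.cr/release/example")); // plugin
console.log(whoHandles("https://rmz.cr/l/m/2"));           // rule
console.log(whoHandles("https://rmz.cr/my/b/2"));          // nothing
```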
  #14  
Old 10.12.2020, 12:58
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Quote:
Originally Posted by RPNet-user View Post
I presume that the reason for this is because the plugin for "rmz" is configured to only parse and crawl for links by using this link structure--->"/rmz.cr/l/m/2" which is how my rule and script is configured and working perfectly fine for the past year.
The plugin only supports "/release" URLs. It's the Linkcrawler rule that supports "/l/m/" URLs
  #15  
Old 10.12.2020, 13:01
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Quote:
Originally Posted by RPNet-user View Post
That is how the link structure appears to you and everyone that is not logged in to the site, however, when a user is logged in, the structure appears different, eg."/rmz.cr/my/b/2", and because the plugin does not support crawling and grabbing from this path-structure and the rule does not over-ride the plugin, then JD will not be able to grab the links when the link-path structure is set to "/my/b/" in both the rule and the script urls.

JD will crawl but not able to grab the links.
As I have said, the cookies make no difference as long as the plugin over-rides the rule and the plugin does not support grabbing links from "/rmz.cr/my/b/".
Already explained above that plugin and rule work perfectly fine because they match on different URLs and NOTHING matches "/my/" URLs so nothing happens
  #16  
Old 10.12.2020, 13:02
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

You just have to update your rule, or add a new rule that
1.) also supports "/my/" URLs
2.) contains the cookies so JDownloader doesn't get redirected to the login page
  #17  
Old 10.12.2020, 13:05
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Quote:
Originally Posted by RPNet-user View Post
The problem is that the "JD plugin/LC-rule" does not recognize crawling and grabbing links from "/rmz.cr/my/b/2"(logged in). It will only recognize "/rmz.cr/l/m/2"(logged out).
exactly! because you have to update your rule to make it match those URLs and add cookies so it will be able to process those URLs
  #18  
Old 10.12.2020, 23:12
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

Quote:
Originally Posted by Jiaz View Post
exactly! because you have to update your rule to make it match those URLs and add cookies so it will be able to process those URLs
I tried all that before I started this post and it does not work.

Quote:
Originally Posted by Jiaz View Post
You just have to update your rule, or add new rule
1.) also support "/my/" URLs
2.) contains the cookies so JDownloader doesn't get redirected to login page
1.) Apparently, it does not support grabbing links from /my/b/ urls, it only crawls the pages.
2.) I already tried it with cookies set while authenticated and it makes no difference. There is "no redirect", ever, it crawls but does not grab the links.

I updated the pattern in the rule from /l/m to /my/b/ and updated the urls in the script from /l/m to /my/b with the cookies set during login.

There are no redirects, it simply crawls but does not grab any links.
The rule-pattern and event script-urls from my screenshot above called "Linkcrawler.Rule.and.Event.Script" was basically inverted from /l/m to /my/b with the cookies set while authenticated.

I sent the JD cookie, the entire linkcrawler rule and the event script that includes 10 URLs to: support@jdownloader.org

Code:
"pattern" : "https?://rmz\\.cr/l/m/[0-9]*?",
This is the only section in the linkcrawler rule that I manually changed and updated. I changed it from /l/m to ---> /my/b/ and I also changed all the urls in the script from rmz.cr/l/m/ to ---> rmz.cr/my/b/
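
For illustration, a pattern that accepts both path forms could look like the sketch below, so the rule would not have to be switched back and forth. The alternation (?:l/m|my/b) and the [0-9]+ quantifier are suggestions, not the rule that was actually used, and the anchoring assumes whole-URL matching; whether links are then grabbed also depends on the cookies.

```javascript
// Suggested pattern (an assumption, not the rule from this thread).
// In JavaScript source the string holds a single backslash before the dot.
var pattern = "https?://rmz\\.cr/(?:l/m|my/b)/[0-9]+";

// As JSON text the backslash is doubled, matching the "\\." seen in the rule.
var asJson = JSON.stringify({ pattern: pattern });
console.log(asJson);

// Anchored check, assuming the whole URL has to match the pattern.
function accepts(url) {
    return new RegExp("^(?:" + pattern + ")$").test(url);
}

console.log(accepts("https://rmz.cr/l/m/2"));   // true
console.log(accepts("https://rmz.cr/my/b/2"));  // true
console.log(accepts("https://rmz.cr/my/x/2"));  // false
```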

Unless I have to change/update something else other than the "pattern" in the rule, or there is something wrong with the cookie, then using /my/b/ will not work; at least not for me. I did try all this during the eight hours that their /l/m/ pages were blank and only accessible as /my/b/ while logged in, so this may have prevented the linkcrawler rule from grabbing the links.

Code:
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";
urls[urls.length] = "**External links are only visible to Support Staff**";

Last edited by RPNet-user; 11.12.2020 at 04:15.
  #19  
Old 11.12.2020, 12:24
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 72,936
Default

Thanks for the mail. Issue is with login cookie because it's bound to browser and IP.
JDownloader gets redirected to /account/sign_in


You will have to wait until rules do support customization of headers, like User-Agent

Can you send me Logins to the site? Then I can check if correct headers will make the rule work
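
As a rough sketch of how such a bounce to the sign-in page could be recognized when testing by hand: the path "/account/sign_in" is the one reported above, while the function name and the idea of checking the status code and Location header are assumptions for illustration, not a description of what JDownloader does internally.

```javascript
// Sketch: decide whether a response looks like a redirect to the sign-in page.
// "/account/sign_in" is the path reported in this thread; the rest is illustrative.
function isLoginRedirect(status, location) {
    var isRedirect = status >= 300 && status < 400;
    return isRedirect &&
        typeof location === "string" &&
        location.indexOf("/account/sign_in") !== -1;
}

console.log(isLoginRedirect(302, "https://rmz.cr/account/sign_in")); // true
console.log(isLoginRedirect(200, null));                             // false
```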

Last edited by Jiaz; 11.12.2020 at 12:29.
  #20  
Old 11.12.2020, 22:45
RPNet-user's Avatar
RPNet-user RPNet-user is offline
Storm
 
Join Date: Apr 2017
Posts: 224
Default

Done, I just replied in the ticket with username and pass.


For some reason, JD never redirected for me. I tested it on two separate computers, one using Firefox and the other using Chrome, and JD never redirected. It simply crawled the pages set in the script but did not add any links.
All times are GMT +2. The time now is 17:38.