JDownloader Community - Appwork GmbH
#1
15.10.2020, 03:04
svArtist
Vacuum Cleaner
Join Date: Mar 2015
Posts: 15
LinkGrabber stripping URL components?

I'm heavily using userScripts in my browser to handle titles for all kinds of downloads, and at some point, I wanted to integrate that into JDownloader and wrote a script for that.
Originally, I wrote it for single URLs that I would copy from my browser's address bar:
My userScripts would get the appropriate title for the download and, depending on how the website was set up, append it to the URL, usually either as a query string ("?title=Good%20Title"), or hash ("#Good%20Title").
When I copied the URL, JDownloader would grab it, and if I wanted, I could run the script from the context menu after right-clicking the desired link in the list.
The script would look at the link (myCrawledLink.getUrl()), extract the title and rename the file and the package accordingly.
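The title-extraction step described above can be sketched roughly as follows. This is a minimal Python sketch, not the original EventScripter script: the function name `extract_title`, the `title` query parameter name and the `host.example` placeholder host are all assumptions for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def extract_title(url: str) -> Optional[str]:
    """Return the title a userscript appended to a URL, or None.

    Assumes the title was appended either as a ?title=... query parameter
    or as the whole #... fragment, percent-encoded in both cases.
    """
    parts = urlsplit(url)
    # parse_qs percent-decodes values (and turns '+' into spaces).
    query_title = parse_qs(parts.query).get("title")
    if query_title:
        return query_title[0]
    if parts.fragment:
        # urlsplit leaves the fragment encoded, so decode it explicitly.
        return unquote(parts.fragment)
    return None


print(extract_title("https://host.example/abc123/file.html#Good%20Title"))  # Good Title
print(extract_title("https://host.example/abc123?title=Good%20Title"))  # Good Title
```

Checking the query string first is just one choice; a site that uses its own hash for something else would need the order or the key names adjusted.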

Today, I wanted to expand on this, but I noticed that after some update, the titles are now being stripped from the URLs.
What used to be "һttps://aparatꓸcam/35sfec251822/Fargo.S04E03.1080p.WEB.H264-CAKES.mkv.mp4.html#Fargo%20Season%204%20Episode%203%20-%20Raddoppiarlo"
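(The "/Fargo.S04E03.1080p.WEB.H264-CAKES.mkv.mp4.html" part is superfluous, I guess, added by the site to make it more readable/for SEO; the "#Fargo%20Season%204%20Episode%203%20-%20Raddoppiarlo" part is also superfluous to the site, added by me to assign the title I acquired before.)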
is now "һttps://aparatꓸcam/35sfec251822"
The components that aren't strictly necessary to identify the link are gone.

For URLs that I copied directly (from the address bar, for example), the original URL is still visible in myCrawledLink.getContentURL(), but no such luck in cases where the "extra info" was in the href attribute of <a> elements that I selected in the page text.

Is there a way to change that?

Last edited by svArtist; 15.10.2020 at 03:16. Reason: Link detection broke example highlighting
#2
15.10.2020, 15:33
raztoki
English Supporter
Join Date: Apr 2010
Location: Australia
Posts: 17,226

Our plugins look for the specific parts of a URL they need, which is usually the minimum required to make it work: typically domain + UID. The plugins then determine filename/filesize through online checks.
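To picture why the title disappears, here is a hypothetical illustration of reducing a link to domain + UID. It is not JDownloader's plugin code: `canonical_link` is a made-up name, and treating the first path segment as the UID is an assumption that only fits hosts shaped like the one in the first post.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Hypothetical illustration of reducing a link to domain + UID.

    Keeps only the scheme, host and first path segment (assumed to be the
    file ID) and drops the SEO filename, query string and fragment.
    """
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return urlunsplit((parts.scheme, parts.netloc, "/" + first_segment, "", ""))


print(canonical_link("https://host.example/35sfec251822/Fargo.S04E03.mkv.mp4.html#Some%20Title"))
# https://host.example/35sfec251822
```

Everything after the ID, including the SEO filename and any appended #title, is gone, which matches what the first post observed.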

The easiest way to make this work would be to create a package customiser rule that searches for your #whatever (most plugins don't use the hash, though MEGA and a few others do) and sets the custom filename you want.

With respect to traceability, you could reference a unique ID or set a comment that you can refer to later.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
#3
15.10.2020, 17:43
Jiaz
JD Manager
Join Date: Mar 2009
Location: Germany
Posts: 68,318

@raztoki: Thanks for the explanation and I would also suggest packagizer rule and comment field.

@svArtist:
There is no way to disable the plugins' internal URL handling; that part removes unnecessary parts and rebuilds URLs with only the important information.
How exactly do you add those URLs?
You should use the Packagizer to either parse the hash fields directly and set values, or copy them to the comment field for later use.
__________________
JD-Dev & Server-Admin
#4
17.10.2020, 01:52
svArtist
Vacuum Cleaner
Join Date: Mar 2015
Posts: 15

Quote:
Originally Posted by raztoki
our plugins look for specific content of urls which is usually the least amount of the url required to make it work. [...]

the easiest way to make this work would be to make a package customiser rule [...]
Quote:
Originally Posted by Jiaz
@raztoki: Thanks for the explanation and I would also suggest packagizer rule and comment field.

@svArtist:
[...] How exactly do you add those URLs? [...]
Thank you both for your useful feedback!
I've never used custom packagizer rules before. On one hand, I think it'd be easier to have my userScripts just hand me a text-only list of the modified URLs that I'll give the LinkGrabber's parser directly, such that I can simply refer to the content URL, which seems to be set to the full original URL in cases where no context URL was found.
But on the other hand, I want to get to know the packagizer stuff, so I'll look into that.
So the packagizer gets the links before they're handled by the individual plugins?

@Jiaz: I use GreaseMonkey for a lot of things. There's one site, for example, that lists useful titles for things like episodes on its site-internal links. But the links to the actual content point to external sites.
So I get the titles from the first listing and track them through site navigation, using URL components, until I get to the external links. I add the titles to these links (depending on the host, I'll usually choose a hash or a query string). Usually, a different userscript for the target site then sets the document title, so that I can download already correctly titled videos with a video downloader extension.

Or, for other sites that don't have the actual titles but have the contents load dynamically in a page, I implemented a textarea in which I can dump a pre-formatted (JSON) list of all the episode numbers and titles that I can generate using another userScript on IMDB, which will change the document title upon episode change.

I just started using JDownloader for some cases where the titles can be appended to several links at the same time, thinking I could just copy them from the modified document. Turns out it's not quite that simple.
#5
17.10.2020, 05:05
svArtist
Vacuum Cleaner
Join Date: Mar 2015
Posts: 15

Nice!
Turns out I can use the Packagizer to do the job for me in most cases anyway!

If anyone is interested in what I did:
Sourceurl(s) -> contains -> "(#)|(customTitle=)(.+)" -> as RegEx
Link origin -> is -> Clipboard
...
Filename -> "<jd:source:3>.<jd:orgfiletype>"
Comment -> "customTitle=<jd:source:3>"
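One thing worth checking in the first rule: in standard regex syntax, `|` has the lowest precedence, so `(#)|(customTitle=)(.+)` means "`(#)`" OR "`(customTitle=)(.+)`". The Python sketch below shows that with leftmost-match semantics (Java's regex engine uses the same precedence), group 3 stays empty for a hash-style URL. Whether JDownloader applies "contains" exactly this way is an assumption on my part; the URL is a placeholder.

```python
import re

url = "https://host.example/abc123/file.html#Good%20Title"

# As written in the rule: the '#' branch matches first, so group 3 is unset.
as_written = re.search(r"(#)|(customTitle=)(.+)", url)
print(as_written.group(3))  # None

# Grouping the alternatives keeps the title in one group for both URL styles.
grouped = re.compile(r"(?:#|customTitle=)(.+)")
print(grouped.search(url).group(1))  # Good%20Title
print(grouped.search("https://host.example/dl?id=1&customTitle=Good%20Title").group(1))  # Good%20Title
```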

It even seems to decode the URL Component for you!

This way, if I need to do some custom stuff, I can use my EventScripter script and refer to the title.

Now I just need to come up with a way to exclude some patterns, or some other way of filtering out cases like MEGA links, which use the hash symbol themselves...
Negative look-aheads in RegEx are so ugly, I'm told...
But hey, this seems to work:
Code:
^(?:(?:https?:\/\/(?!mega\.(?:co\.)?nz)[^#]*#)|(?:.*customTitle=))([^\s&\/#]+).*$
Marked groups as non-capturing, so the only thing that appears on right click now is "Source URL Wildcard (*) #1" -> "<jd:source:1>"
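A quick way to sanity-check that pattern outside JDownloader is to run it in Python, whose `re` module supports the same lookahead and non-capturing-group syntax. The pattern below is copied unchanged; `title_from` and the sample URLs are hypothetical, and the explicit `unquote` stands in for the decoding the post observed JD doing.

```python
import re
from typing import Optional
from urllib.parse import unquote

# The pattern from the post, unchanged (escaped slashes are harmless in Python).
PATTERN = re.compile(
    r"^(?:(?:https?:\/\/(?!mega\.(?:co\.)?nz)[^#]*#)|(?:.*customTitle=))([^\s&\/#]+).*$"
)


def title_from(url: str) -> Optional[str]:
    m = PATTERN.match(url)
    # Group 1 is still percent-encoded, so decode it here.
    return unquote(m.group(1)) if m else None


for u in (
    "https://host.example/abc123/file.html#Good%20Title",
    "https://host.example/dl?id=abc123&customTitle=Good%20Title",
    "https://mega.nz/file/abc123#decryptionKey",
):
    print(u, "->", title_from(u))
```

Note that `[^\s&\/#]+` stops at a literal `/`, so a title must keep any slash percent-encoded as `%2F` to survive intact.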

Last edited by svArtist; 17.10.2020 at 05:08.
#6
17.10.2020, 13:43
raztoki
English Supporter
Join Date: Apr 2010
Location: Australia
Posts: 17,226

I think you're making your package customiser rule more complicated than it needs to be. If the URL doesn't contain customTitle or your unique reference key, the rule won't interfere with JD setting its own filename. As long as your script sets it every time and it's at the end of the URL, there should be no problem.

#uniquereference=(your%20name%20here)
#uniquereference=(.+)

if contains ...

Sourceurl(s) -> contains -> "#uniquereference=(.+)" -> as RegEx

and then
Filename -> "<jd:source:1>.<jd:orgfiletype>"
Comment -> "<jd:source:1>"
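The simpler rule can be tried out the same way. This Python sketch uses the `uniquereference` placeholder key from the post; `custom_name` and the URLs are hypothetical. Because `(.+)` runs to the end of the string, the key has to be the last thing in the URL, as noted above.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Placeholder key from the post; use whatever name your userscript appends.
RULE = re.compile(r"#uniquereference=(.+)")


def custom_name(link: str) -> Optional[str]:
    m = RULE.search(link)
    # None means the rule does not fire and JD keeps its own filename.
    return unquote(m.group(1)) if m else None


print(custom_name("https://host.example/abc123/file.html#uniquereference=your%20name%20here"))
print(custom_name("https://mega.nz/file/abc123#decryptionKey"))
```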

Do you use MEGA?
All times are GMT +2.