JDownloader Community - Appwork GmbH
 

  #1  
15.03.2019, 23:25
armin.beispiel
Vacuum Cleaner
Join Date: Nov 2015
Posts: 19
Plugin request to get HD video tracks

URL: **External links are only visible to Support Staff** (adjara.com has a link to adjaranet.com, which redirects to net.adjara.com)

This is a Georgian site, but still easy to navigate. (I don't speak Georgian.)

The site lets you search for movie titles (it displays English and Georgian titles) and shows the respective covers. When you click a search result, you can watch the movie in your browser. It works reliably.

With the controls near the seekbar, you can set the video quality, the language and subtitles. Quality and language are also reflected in the URL parameters.

Usefulness of support: I regularly use the site when I can't find a German HD movie release, but there are German releases with bad video quality (such as 700MB releases that even have good AC3 audio). Then, I do some checking and calculating and combine the HD video with the German audio track with MKVToolNix. And when the Adjara video has an English audio track, I even have a dual-language version.

A quick look at the DOM shows that the site uses the JW Player. Maybe you have experience with that?

Anyways, currently, I get my download URLs by inspecting the video area with the developer tools, then expanding the following element in close proximity:

HTML Code:
<span class="jwvideo" style="z-index: 0;">
  <video src="**External links are only visible to Support Staff**" crossorigin="anonymous">
    <track kind="captions" label="English" srclang="En" src="**External links are only visible to Support Staff**">
    <track kind="captions" label="Russian" srclang="Ru" src="**External links are only visible to Support Staff**">
  </video>
</span>
The HTML is from this movie page: **External links are only visible to Support Staff**.

When JD is told the Adjara URL, it should get the video file URL that is associated with the URL parameters (quality and language). Additionally, the subtitle files should be retrieved (.vtt).
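
The manual step described above (reading the video and subtitle URLs out of the rendered player markup) could be sketched roughly as follows. This is only an illustration in Python, not anything JDownloader does: it assumes the markup has already been copied from the developer tools, and the URLs in SAMPLE are made-up placeholders because the real ones are hidden in this post.

```python
from html.parser import HTMLParser


class JwVideoParser(HTMLParser):
    """Collects the <video> src and the caption <track> URLs."""

    def __init__(self):
        super().__init__()
        self.video_url = None
        self.subtitles = {}  # label -> subtitle URL

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self.video_url = attrs.get("src")
        elif tag == "track" and attrs.get("kind") == "captions":
            self.subtitles[attrs.get("label")] = attrs.get("src")


# Placeholder markup (hypothetical URLs), shaped like the snippet above.
SAMPLE = """
<span class="jwvideo" style="z-index: 0;">
  <video src="http://video.example.invalid/1_English_1500.mp4" crossorigin="anonymous">
    <track kind="captions" label="English" srclang="En" src="http://static.example.invalid/1_English.vtt">
    <track kind="captions" label="Russian" srclang="Ru" src="http://static.example.invalid/1_Russian.vtt">
  </video>
</span>
"""

parser = JwVideoParser()
parser.feed(SAMPLE)
print(parser.video_url)
print(parser.subtitles)
```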
  #2  
18.03.2019, 15:25
Jiaz
JD Manager
Join Date: Mar 2009
Location: Germany
Posts: 79,289

I'm sorry, but we don't add support for index sites that only link to other (embedded) video streams/sites. You might want to try creating linkcrawler rules that tell JDownloader how to parse those URLs and where to look for the video stream. Use the board search to find examples of linkcrawler rules.
__________________
JD-Dev & Server-Admin
  #3  
19.03.2019, 06:43
armin.beispiel

I was successful in creating custom link crawler rules for net.adjara.com:

Code:
[
	{
		"name": "SOME DOCUMENTATION: https://board.jdownloader.org/showthread.php?t=77280#post422008",
		"enabled": false
	},
	{
		"name": "net.adjara.com JD Link Crawler Rule v2019.03.20 (Part 1: Go to scrapable page)",
		"enabled": true,
		"pattern": "https?://net\\.adjara\\.com/Movie(?:/main)?\\?id=\\d.*",
		"rule": "REWRITE",
		"rewriteReplaceWith": "$0&js=1"
	},
	{
		"name": "net.adjara.com JD Link Crawler Rule v2019.03.20 (Part 2: Find LQ video URLs in all listed languages and set package name)",
		"enabled": true,
		"pattern": "https?://net\\.adjara\\.com/Movie(?:/main)?\\?id=\\d.*?&js=1",
		"rule": "DEEPDECRYPT",
		"maxDecryptDepth": 0,
		"packageNamePattern": "(?x) var\\s+pageTitle\\s*=\\s* ['\"] [\\p{IsGeorgian}:,()/\\x20-]* ( (?: \\\\['\"] | [^'\"] )+ ) ['\"] ",
		"deepPattern": "data-href=\"(https?://[^\"]+?\\.mp4)\""
	},
	{
		"name": "net.adjara.com JD Link Crawler Rule v2019.03.20 (Part 3: Convert LQ to HQ video URLs that are possibly invalid)",
		"enabled": true,
		"pattern": "(https?://\\d+(?:\\.\\d+){3}/.+?/\\d+_(?:Russian|Georgian|English))_300(\\.mp4)",
		"rule": "REWRITE",
		"rewriteReplaceWith": "$1_1500$2"
	}
]
Installation: Settings > Advanced settings > search for "crawler rules" > paste the JSON.

You don't have to manually switch to HD to get the HD files. The quality and language don't even have to be in the URL, which means you can drag URLs directly from the search results into JD. Getting the low-quality video URLs is not supported at all, though, and neither are subtitles.
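
To illustrate what the two REWRITE rules do, here is a minimal Python sketch of Part 1 and Part 3. The patterns are copied from the JSON above with the JSON escaping removed; the Java-style replacements "$0&js=1" and "$1_1500$2" are translated by hand. Whether JDownloader requires a full match is an assumption here, and the example URLs further down are hypothetical.

```python
import re

# Patterns from Part 1 and Part 3 of the rules above (JSON escaping removed).
PART1 = re.compile(r"https?://net\.adjara\.com/Movie(?:/main)?\?id=\d.*")
PART3 = re.compile(
    r"(https?://\d+(?:\.\d+){3}/.+?/\d+_(?:Russian|Georgian|English))_300(\.mp4)"
)


def to_scrapable_page(url):
    """Part 1: '$0&js=1' appends the parameter to the whole match."""
    m = PART1.fullmatch(url)  # assumption: the rule needs a full match
    return m.group(0) + "&js=1" if m else None


def to_hq_url(url):
    """Part 3: '$1_1500$2' swaps the _300 quality marker for _1500."""
    m = PART3.fullmatch(url)
    return m.group(1) + "_1500" + m.group(2) if m else None


# Hypothetical example URLs, not taken from the site.
print(to_scrapable_page("http://net.adjara.com/Movie/main?id=12345"))
print(to_hq_url("http://192.0.2.10/movies/12345_English_300.mp4"))
```

As in the rules themselves, the HQ URL is only guessed from the LQ one, so it may not exist for every movie.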

Last edited by armin.beispiel; 20.03.2019 at 01:55. Reason: Version updated
  #4  
19.03.2019, 10:43
Jiaz

Thanks for your understanding, for taking the time to check how linkcrawler rules work, and for providing working examples.

I guess it would be best to make those rules easily importable via import/export actions in JDownloader.

Just out of interest, was it easy for you to make those rules work? Was it easy to understand how the rules work?
  #5  
20.03.2019, 03:43
armin.beispiel

Quote:
I guess it would be best to make those rules easily importable via import/export actions in JDownloader.
Yes, it would be better, especially for non-technical people, not to have to merge different JSON snippets if they want to use rules from multiple sources.

Quote:
Just out of interest, was it easy for you to make those rules work? Was it easy to understand how the rules work?
Generally yes. It's always desirable to have a central location with the complete documentation, though.

I still don't understand "maxDecryptDepth".

In "packageNamePattern", I originally wanted group #2 to be the package name and had to compromise (originally, I captured the opening quote, "['"]", as its own group and referenced it at the end of the JS string). It would help if you generally (not only in this property) allowed a named group to be the return value, such as "(?<return>...)", "(?<result>...)", "(?<yield>...)" or "(?<output>...)".

For "deepPattern", it seems that either "$0" or "$1" is the return value, depending on whether a capture group is specified. That should also be documented; otherwise, you might tend to use unnecessary look-arounds.
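
The behaviour I observed for "deepPattern" can be sketched like this in Python. It only mirrors the observation above (group 1 if the pattern has a capture group, otherwise the whole match) and is not taken from the JDownloader source; the function name and the sample markup are made up.

```python
import re


def deep_pattern_results(pattern, body):
    """Return group 1 of each match if the pattern has a capture group,
    otherwise the whole match (observed behaviour, not documented)."""
    regex = re.compile(pattern)
    index = 1 if regex.groups >= 1 else 0
    return [m.group(index) for m in regex.finditer(body)]


# Hypothetical response body fragment.
BODY = '<a data-href="http://192.0.2.10/movies/1_English_300.mp4">English</a>'

# With a capture group, as in the rule above.
with_group = deep_pattern_results(r'data-href="(https?://[^"]+?\.mp4)"', BODY)
# Without a capture group, look-arounds are needed for the same result.
with_lookarounds = deep_pattern_results(
    r'(?<=data-href=")https?://[^"]+?\.mp4(?=")', BODY
)
print(with_group, with_lookarounds)
```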

----------------------------------

Here's a feature request for a new rule type that matches against the whole response body (HTML with JS) and that would've been much more useful in my case:

Code:
...
"rule": "GENERATEURLS",
"urlGeneration": [
	{
		"//": "Video file URLs",
		"contentPattern": "(regex to match low-quality video file URLs; that's the only complete set of URLs in the source text)",
		"urlsToGenerate": [
			"$1_lowQuality.mp4",
			"$1_highQuality.mp4"
		]
	},
	{
		"//": "Subtitle file URLs",
		"contentReductionPattern": "(?x) /(?<movieId>\\d+)_.+?\\.mp4'; \\s* var\\s+movieLangs\\s*=\\s* '(?<langList>[^']+)'; ",
		"reducedContent": "${langList}\n${movieId}",
		"//": "Now, contentPattern is matched against reducedContent",
		"contentPattern": "(?x) (?<lang>[A-Za-z]+) (?= .*\\n (?<movieId>\\d+) )",
		"urlsToGenerate": [
			"http DONTHIDEME ://static.domain/subtitles/${movieId}_${lang}.vtt"
		],
		],

		"//": "contentPattern is applied as often as possible. contentReductionPattern can only be applied once, because the whole content will be reducedContent after the one replacement."
	}
],
...
Example page to use it against: **External links are only visible to Support Staff** (Adjara movie URL + "&js=1"). I checked the response body with **External links are only visible to Support Staff**.
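
To make the proposed two-step matching for subtitles more concrete, here is a rough Python sketch of what such a rule would compute. GENERATEURLS does not exist in JDownloader, so this only illustrates the request. The value of movieUrlEmpty in BODY is a guess (the real one is hidden), and "static.domain" is the same placeholder as in the snippet above.

```python
import re

# Step 1 (contentReductionPattern): grab the movie ID and the language list.
REDUCTION = re.compile(
    r"/(?P<movieId>\d+)_.+?\.mp4';\s*var\s+movieLangs\s*=\s*'(?P<langList>[^']+)';"
)
# Step 2 (contentPattern): one match per language in the reduced content.
LANG = re.compile(r"(?P<lang>[A-Za-z]+)(?=.*\n(?P<movieId>\d+))")


def subtitle_urls(body):
    m = REDUCTION.search(body)
    if m is None:
        return []
    # reducedContent: the language list, a newline, then the movie ID.
    reduced = m.group("langList") + "\n" + m.group("movieId")
    return [
        "http://static.domain/subtitles/{movieId}_{lang}.vtt".format(**hit.groupdict())
        for hit in LANG.finditer(reduced)
    ]


# Hypothetical response body excerpt.
BODY = """
    var movieUrlEmpty = 'http://192.0.2.10/movies/12345_{lang}_{quality}.mp4';
    var movieLangs = 'Russian,Georgian,English';
"""
print(subtitle_urls(BODY))
```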

Last edited by armin.beispiel; 20.03.2019 at 04:03.
  #6  
20.03.2019, 10:30
Jiaz

Quote:
Originally Posted by armin.beispiel
I still don't understand "maxDecryptDepth".
Here you can specify how 'deep' the rule may process. Imagine a rule that matches forum posts, and within a post there is a link to another post. With maxDecryptDepth you can control how 'deep' the linkcrawler will follow those links, i.e. the maximum number of times the rule is allowed to match in a chain of results:
Input->Rule1->Rule2->Rule1......
Understood?
  #7  
20.03.2019, 10:33
Jiaz

Quote:
Originally Posted by armin.beispiel
Here's a feature request for a new rule type that matches against the whole response body (HTML with JS) and that would've been much more useful in my case:
Hmm, what exactly do you mean by HTML with JS? JDownloader is no browser. It simply loads the given URL and that's it. It has no knowledge about DOM/JS/CSS...
  #8  
20.03.2019, 10:40
Jiaz

Named groups have been supported by Java regex since Java 1.7, but we have our own wrapper class for easier use of regexes, and it doesn't support named groups yet.

I also like the idea of not having a fixed matching group index but being able to customize the result/return value via an additional rule.
  #9  
20.03.2019, 10:41
Jiaz

  #10  
20.03.2019, 10:42
Jiaz

It would also be possible to add a hook for the Eventscripter, so that you specify a simple linkcrawler rule that tells JDownloader to match on XY URLs and then call a specific Eventscripter script to process and return links. What do you think about that? With JavaScript/API involved, there are many more possibilities than with simple linkcrawler rules.
  #11  
20.03.2019, 16:16
armin.beispiel

Quote:
Originally Posted by Jiaz
Here you can specify how 'deep' the rule may process. Imagine a rule that matches forum posts, and within a post there is a link to another post. With maxDecryptDepth you can control how 'deep' the linkcrawler will follow those links, i.e. the maximum number of times the rule is allowed to match in a chain of results:
Input->Rule1->Rule2->Rule1......
Understood?
I'm not sure.

For some reason, I had it in my mind that maxDecryptDepth only applies to DEEPDECRYPT rules. According to your explanation, that doesn't seem to be correct.

Does an individual rule have a counter that is initialized every time the rule is the first instance in the rule application chain and is increased for every further occurrence of that rule in the chain, and that counter may not exceed the rule's maxDecryptDepth?

Or is there a global value that is set to the number of rules in the rule application chain, and a rule won't be applied, no matter its pattern, when that value is larger than the rule's maxDecryptDepth?

Quote:
Originally Posted by Jiaz
Imagine a rule that matches forum posts, and within a post there is a link to another post.
As I understand it, that specific case would require that pattern and deepPattern both match the same URLs. Or, given your example of "Input->Rule1->Rule2->Rule1", there might be a REWRITE rule that makes the deepPattern URLs fit for matching the DEEPDECRYPT pattern.

-------------------------------

Quote:
Originally Posted by Jiaz
Hmm, what exactly do you mean by HTML with JS? JDownloader is no browser. It simply loads the given URL and that's it. It has no knowledge about DOM/JS/CSS...
I'm just referring to the response body/the text of **External links are only visible to Support Staff**, which is HTML markup with inline JS code. The JS code (the text) contains the data to be mined via regexes. Here's an excerpt:

Code:
    var movieUrlEmpty = '**External links are only visible to Support Staff**';
    var movieLangs = 'Russian,Georgian,English';
    var movieQuals = '1500,300';
Those variable contents are different for every movie.
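
As a rough illustration of mining these variables with regexes, here is a small Python sketch that reads the two comma-separated lists from the plain response text and enumerates the language/quality combinations. The helper name is made up, and the "<language>_<quality>" suffix form is inferred from the file names my Part 3 rule matches (such as "..._English_300.mp4"), so treat it as an assumption.

```python
import re
from itertools import product


def read_js_string(body, name):
    """Return the single-quoted string assigned to `var <name>`, or None."""
    m = re.search(r"var\s+" + re.escape(name) + r"\s*=\s*'([^']*)'", body)
    return m.group(1) if m else None


# Excerpt of the response body shown above.
BODY = """
    var movieLangs = 'Russian,Georgian,English';
    var movieQuals = '1500,300';
"""

langs = read_js_string(BODY, "movieLangs").split(",")
quals = read_js_string(BODY, "movieQuals").split(",")

# Every language/quality suffix the page claims to offer.
variants = [f"{lang}_{qual}" for lang, qual in product(langs, quals)]
print(variants)
```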

-----------------------------------

Quote:
Originally Posted by Jiaz
It would also be possible to add a hook for the Eventscripter, so that you specify a simple linkcrawler rule that tells JDownloader to match on XY URLs and then call a specific Eventscripter script to process and return links. What do you think about that? With JavaScript/API involved, there are many more possibilities than with simple linkcrawler rules.
That may be worth a look. Thanks for the info. But I can't go into this for now.

Last edited by armin.beispiel; 20.03.2019 at 16:19.
  #12  
20.03.2019, 16:35
Jiaz

maxDecryptDepth:
- Internally, each processed 'result' knows its matching rule.
- Whenever a rule is checked for matching, the check counts how often that rule has already matched in the result chain (input->result1->result2...->resultx) and aborts when the maxDecryptDepth limit is hit.
- It's basically a protection against unlimited matching/crawling.
- maxDecryptDepth is not about global matching/depth; it is evaluated again for each new crawling 'result'.
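
The counting described in the list above could be sketched like this in Python. It is only an illustration, not the actual JDownloader code: the function name is made up, and the exact comparison (here a rule with maxDecryptDepth 0 may match once per chain and not again) is an assumption.

```python
def may_apply(rule_name, max_decrypt_depth, result_chain):
    """Decide whether a rule may match the newest result.

    result_chain lists the names of the rules that produced the earlier
    results in this chain, oldest first (input->result1->result2...).
    """
    earlier_matches = result_chain.count(rule_name)
    # Assumption: the rule is skipped once it has matched more often
    # than maxDecryptDepth within this one chain.
    return earlier_matches <= max_decrypt_depth


# Input->Rule1->Rule2->Rule1 with maxDecryptDepth 1 for Rule1:
print(may_apply("Rule1", 1, []))                  # first match in the chain
print(may_apply("Rule1", 1, ["Rule1", "Rule2"]))  # second match in the chain
print(may_apply("Rule1", 1, ["Rule1", "Rule2", "Rule1"]))  # third match
```

The count is per chain and per rule, so other chains starting from a different input are not affected.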

In case it's still not clear enough how this works/what it does, please let me know
  #13  
20.03.2019, 16:37
Jiaz

Quote:
Originally Posted by armin.beispiel
As I understand it, that specific case would require that pattern and deepPattern both match the same URLs. Or, given your example of "Input->Rule1->Rule2->Rule1", there might be a REWRITE rule that makes the deepPattern URLs fit for matching the DEEPDECRYPT pattern.
INPUT->DEEPDECRYPT-> return 4 new links -> each one *(REWRITE -> DEEPDECRYPT->4 new links...)

For example, a DEEPDECRYPT rule matches '...showthread.php?t=<number>' and returns every HTML link on that page. That may lead to many different showthread links and endless crawling through the board.
  #14  
21.03.2019, 11:02
armin.beispiel

Thanks for explaining. I think that's clear enough.
  #15  
21.03.2019, 11:19
Jiaz

Just ask if you have further questions or comments.

All times are GMT +2.