JDownloader Community - Appwork GmbH
 

  #1081  
Old 19.02.2020, 14:48
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 49,953
Default

@mgpai
What he wants is not possible via link crawler rules.
He added a detailed description here:
https://board.jdownloader.org/showpo...3&postcount=11

To sum it up, what he wants is:
- Search that website for keywords
- Add the last X pages of the results
The website might also show a reCAPTCHA v2 or a Cloudflare check when a search is attempted.

I told him that he will probably either need a very customized script or edit our official plugin and add the functionality he wants.

-psp-
__________________

Ad-free installers || Werbefreie Installer
Windows Setup<--JD2 BETA-->Linux Setup x86 || Linux Setup x64 || Mac Setup
-----=>Support Chat<=-----
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
That's true James
Quote:
Originally Posted by James
People simply don't understand that just because a gun can also be used to shoot at people, a shooting club is not a place for rampage ideas
  #1082  
Old 19.02.2020, 15:15
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 713
Default

Quote:
Originally Posted by pspzockerscene View Post
@mgpai
What he wants is not possible via link crawler rules.
He added a detailed description here:

To sum it up, what he wants is:
- Search that website for keywords
- Add the last X pages of the results
The website might also show a reCAPTCHA v2 or a Cloudflare check when a search is attempted.

I told him that he will probably either need a very customized script or edit our official plugin and add the functionality he wants.

-psp-
I had come across that post and am familiar with that site. Should be very much possible to achieve what he wants using the solution I provided.
  1. The main page is regularly updated with new releases. Instead of crawling 'search pages' (which should also work just fine), just create a linkcrawler rule which grabs links from the main URL. Use 'deepPattern' or a linkgrabber filter rule to filter the content.
  2. Add the 'source url' to JD at regular intervals (e.g. every 60 minutes) using a script.
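
Step 2 could be sketched roughly as below, for an Event Scripter script using the "Interval" trigger. This is only a minimal sketch, not the script referred to in the post: SOURCE_URL and buildAddLinksQuery are placeholder names, and the "linkgrabberv2"/"addLinks" call with its "links" and "autostart" fields is an assumption that should be checked against your JD version.

```javascript
// Sketch for the Event Scripter "Interval" trigger (e.g. every 60 minutes).
// SOURCE_URL is a placeholder for the page covered by the linkcrawler rule.
var SOURCE_URL = "https://example.org/";

// Build the query object that is handed to the linkgrabber (assumed field names).
function buildAddLinksQuery(url) {
    return {
        links: url,
        autostart: false // only collect the links, do not start downloads
    };
}

// callAPI only exists inside JDownloader's Event Scripter; skip the call elsewhere.
if (typeof callAPI === "function") {
    callAPI("linkgrabberv2", "addLinks", buildAddLinksQuery(SOURCE_URL));
}
```

The interval itself is set in the trigger settings of the script, not in the code.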
  #1083  
Old 19.02.2020, 15:28
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 49,953
Default

You are right, but how would you handle the "keywords" he wants?
Also via the link crawler rule --> only allow it to pick up URLs containing the keywords?

-psp-
  #1084  
Old 19.02.2020, 16:18
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 713
Default

Quote:
Originally Posted by pspzockerscene View Post
You are right, but how would you handle the "keywords" he wants?
Also via the link crawler rule --> only allow it to pick up URLs containing the keywords?

-psp-
Yes. Specify 'deepPattern' or create a linkgrabber filter rule.

That page lists only the 15 most recent releases, so it will not take long to crawl. A shorter interval can be used if the page is updated frequently.

Linkcrawler rule > deepPattern > Create an HTML/URL pattern which contains the keywords (best/most efficient option).

OR

Linkgrabber filter: Block URLs which do not contain the keyword.
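
To illustrate the difference between the two options, here is a small sketch in plain JavaScript. The release paths are made up, and the helper names pickByKeyword and blockWithoutKeyword are hypothetical; JD itself evaluates rules with Java regular expressions, but for patterns this simple the behaviour should be the same.

```javascript
// Hypothetical release paths, as they might appear in the HTML of the crawled page.
var paths = [
    "/release/some-movie-2020-1080p-bluray",
    "/release/other-show-s01e02-480p-web",
    "/release/third-title-2019-480p-hdtv"
];

// Option 1, 'deepPattern' style: only paths containing the keyword are picked up at all.
// The keyword is inserted into the regex as-is, so it must not contain regex metacharacters.
function pickByKeyword(list, keyword) {
    var pattern = new RegExp("/release/[a-z0-9\\-]*" + keyword + "[a-z0-9\\-]*");
    return list.filter(function (p) { return pattern.test(p); });
}

// Option 2, filter style: everything is crawled first, then URLs without the keyword are dropped.
function blockWithoutKeyword(list, keyword) {
    return list.filter(function (p) { return p.indexOf(keyword) !== -1; });
}

console.log(pickByKeyword(paths, "480p"));
console.log(blockWithoutKeyword(paths, "480p"));
```

Both calls end up with the same two 480p entries; the difference is that option 1 never adds the other links in the first place, which is why it is the more efficient one.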
  #1085  
Old 19.02.2020, 16:21
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 49,953
Default

Hm, you are right, this could work --> and I am wrong.
Sometimes things are easier than expected at first glance.

-psp-
  #1086  
Old 20.02.2020, 03:21
RPNet-user RPNet-user is offline
JD Adviser
 
Join Date: Apr 2017
Posts: 103
Default

In the linkcrawler rule I have tried several combinations for the deepPattern with a single keyword and none of them worked, much less with two keywords separated by a space. Nothing is filtered; it adds everything and crawls forever.

The linkgrabber filter will not work properly either: it shows both the 'filtered' links and the thousands of 'accepted' ones, and it keeps adding everything else indefinitely, obviously because I'm unable to set a proper deepPattern.

Here are three variations of deepPatterns that I have tried, none of which worked:

"deepPattern" : "class="RARBG"><a href="([^"]+)""
"deepPattern" : "(http.+\\RARBG)"
"deepPattern" : "(**External links are only visible to Support Staff**


[ {
"enabled" : true,
"cookies" : null,
"updateCookies" : true,
"logging" : false,
"maxDecryptDepth" : 0,
"id" : 1582157977984,
"name" : "rmz.cr",
"pattern" : "**External links are only visible to Support Staff**,
"rule" : "DEEPDECRYPT",
"packageNamePattern" : null,
"passwordPattern" : null,
"formPattern" : null,
"deepPattern" : "class="RARBG"><a href="([^"]+)"",
"rewriteReplaceWith" : null
} ]

Last edited by RPNet-user; 20.02.2020 at 06:50.
  #1087  
Old 20.02.2020, 16:50
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 49,953
Default

How do you expect this to work?
Your rule does not even contain your keywords anywhere.
Anyway, here is a blank example:
Code:
[ {
  "enabled" : true,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1422443765154,
  "name" : "rmz.cr example rule",
  "pattern" : "https?://rmz\\.cr/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(/release/keyword1[a-z0-9\\-]+keyword2[a-z0-9\\-]+keyword3)",
  "rewriteReplaceWith" : null
} ]
Here is an example which grabs all "release" URLs containing the word "480p" (= only 1 keyword):
Code:
[ {
  "enabled" : true,
  "updateCookies" : true,
  "logging" : false,
  "maxDecryptDepth" : 1,
  "id" : 1422443765154,
  "name" : "rmz.cr example rule",
  "pattern" : "https?://rmz\\.cr/",
  "rule" : "DEEPDECRYPT",
  "packageNamePattern" : null,
  "passwordPattern" : null,
  "formPattern" : null,
  "deepPattern" : "(/release/[a-z0-9\\-]+480p[a-z0-9\\-]+)",
  "rewriteReplaceWith" : null
} ]
-psp-
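
For trying out different keywords, the second rule above can also be generated instead of edited by hand. The following is only a convenience sketch: buildRule is a hypothetical helper, and the field values are copied from the 480p example. JSON.stringify takes care of doubling the backslashes, which is easy to get wrong when typing the rule manually.

```javascript
// Build a linkcrawler rule object for one keyword, modelled on the 480p example rule.
function buildRule(keyword) {
    return {
        enabled: true,
        updateCookies: true,
        logging: false,
        maxDecryptDepth: 1,
        id: 1422443765154,
        name: "rmz.cr example rule",
        pattern: "https?://rmz\\.cr/",
        rule: "DEEPDECRYPT",
        packageNamePattern: null,
        passwordPattern: null,
        formPattern: null,
        // The keyword is inserted as-is; it must be lowercase and free of regex metacharacters.
        deepPattern: "(/release/[a-z0-9\\-]+" + keyword + "[a-z0-9\\-]+)",
        rewriteReplaceWith: null
    };
}

// Print the rule list as JSON text, ready to be pasted into the linkcrawler rules setting.
console.log(JSON.stringify([buildRule("480p")], null, 2));
```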
  #1088  
Old 20.02.2020, 23:37
RPNet-user RPNet-user is offline
JD Adviser
 
Join Date: Apr 2017
Posts: 103
Default

@psp, Thanks

Yes, my sample does contain the keyword, as I was testing with the keyword 'RARBG', but obviously I was not using the proper syntax and regular expressions for the "pattern" : "https?://rmz\\.cr/" and the deepPattern: "/release/keyword[a-z0-9\\-]". Not that it matters anyway, since it still will not work.

I tested your 480p sample and it grabs only the 480p releases. I also tested 1080p by replacing 480p with 1080p, and that works too. However, adding a second keyword, for example RARBG, does not work. So I took your 480p sample and replaced 480p with the keyword RARBG, the same keyword I had tested with the incorrect syntax before my previous post, and it 'does not work'.

There is a single 1080p RARBG release on the front page at the moment, which the crawler does not add. However, when I use only the keyword 1080p, it adds the 1080p RARBG release along with all other 1080p releases, which means that the linkcrawler rule does not work for keywords like 'RARBG', whether added as a second keyword or used by itself.

I also tried single keywords using only the release names, like VXT, ION10, etc., and none of them worked.
BTW, although a-z supposedly accepts either upper or lower case, I also tested with A-Z, since some of these keywords are all uppercase, and that still did not work.

Here is what I have found so far: the regex will accept any keyword in the body of the title before the hyphen "-" that separates the trailing word. For example, in the title title.2020.720p.webrip.x264.aac-expresso, the regex will accept any keyword before "-expresso" but not after the hyphen "-".

@mgpai
How do you create a linkgrabber filter that blocks URLs which do not contain the keyword?
So that it does not show, make visible, include or accept anything other than the URLs that contain that keyword?

Last edited by RPNet-user; 21.02.2020 at 06:02.
  #1089  
Old 21.02.2020, 01:10
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,536
Default

regex, (?!keyword|s)
regex101 dot com is good for helping to write patterns. Paste in a sample of the content to match against, along with some junk, and you can see what's what.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #1090  
Old 21.02.2020, 02:48
RPNet-user RPNet-user is offline
JD Adviser
 
Join Date: Apr 2017
Posts: 103
Default

Quote:
Originally Posted by raztoki View Post
regex, (?!keyword|s)
regex101 dot com is good for helping to write patterns. Paste in a sample of the content to match against, along with some junk, and you can see what's what.
Thanks, but that's not going to help if no pattern will accept those types of keywords.
  #1091  
Old 21.02.2020, 11:46
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 16,536
Default

Sure, you need to have some way to identify objects, within the URL for instance. You can either look for what you want,
or
block everything that is not what you want. That is a negative lookaround: you create a filter that blocks everything BUT what you want. It should work, assuming the information is available within the URL.
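
As a rough illustration of the "block everything BUT what you want" idea, a negative lookahead can be anchored at the start of the string so that the whole URL only matches when the keyword is absent. This is a sketch in plain JavaScript with a hypothetical helper name (buildBlockPattern) and made-up URLs, not a ready-made JD filter.

```javascript
// Build a pattern that matches only strings which do NOT contain the keyword.
// The keyword is inserted as-is, so it must not contain regex metacharacters.
function buildBlockPattern(keyword) {
    return new RegExp("^(?!.*" + keyword + ").*$");
}

var blockPattern = buildBlockPattern("480p");

// No keyword in the URL: the pattern matches, so this URL would be blocked.
console.log(blockPattern.test("https://example.org/release/title-2020-720p-web")); // true

// Keyword present: the lookahead fails, so this URL would be kept.
console.log(blockPattern.test("https://example.org/release/title-2020-480p-web")); // false
```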
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #1092  
Old 21.02.2020, 14:59
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 713
Default

Quote:
Originally Posted by RPNet-user View Post
... I decided to replace the 480p with the keyword RARBG ... and it 'does not work'.
While PSP has gone the extra mile and provided you with example rules, it is up to you to fine-tune them and make them work. The 'release' URLs on that site do not seem to contain any uppercase characters. Hence, if you use uppercase characters in keywords, it will obviously not match. Either use the correct case, or include the case-insensitive flag in the regex.

Quote:
...for example in the title: title.2020.720p.webrip.x264.aac-expresso regex will accept any keyword prior to "-expresso" but not after the hyphen "-".
The 'release' urls on that site also do not seem to contain any 'dot' characters. But, even if we take your string just as an example, it will also not match, as it contains '.' character while the regex doesn't. You have to modify it in order to match the string.

Example:
Code:
[.a-z0-9\\-]+

Quote:
How do you create a linkgrabber filter that blocks urls which do not contain the keyword?
The default selection in filter rule condition is 'contains'. Change it to 'contains not' to exclude urls which do not contain your keywords.

If you specify the correct 'deepPattern' in linkcrawler rule, you will not need to create the linkgrabber filter rule.
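
Both points, case and the missing '.' in the character class, can be checked quickly outside JD. The sketch below uses plain JavaScript and a made-up lowercase slug; in a Java regex (which is what JD rules use) the case-insensitive flag is written inline as (?i) at the start of the pattern, while JavaScript uses the trailing i flag instead.

```javascript
// Hypothetical lowercase release slug, as described for that site.
var slug = "/release/some-title-2020-1080p-rarbg";

var strict = /\/release\/[a-z0-9\-]+RARBG/;   // uppercase keyword, case-sensitive
var relaxed = /\/release\/[a-z0-9\-]+RARBG/i; // same pattern with the case-insensitive flag

console.log(strict.test(slug));  // false: "RARBG" does not match "rarbg"
console.log(relaxed.test(slug)); // true

// The title string from the quote contains dots, so the character class needs a '.' as well.
var title = "title.2020.720p.webrip.x264.aac-expresso";
var withoutDot = /^[a-z0-9\-]+$/;
var withDot = /^[.a-z0-9\-]+$/;

console.log(withoutDot.test(title)); // false
console.log(withDot.test(title));    // true
```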

Quote:
Originally Posted by raztoki View Post
regex, (?!keyword|s)
regex101 dot com is good for helping to write patterns. Paste in a sample of the content to match against, along with some junk, and you can see what's what.
I second that.
  #1093  
Old 21.02.2020, 15:15
RPNet-user RPNet-user is offline
JD Adviser
 
Join Date: Apr 2017
Posts: 103
Smile

Quote:
Originally Posted by mgpai View Post
While PSP has gone the extra mile and provided you with example rules, it is up to you to fine-tune them and make them work. The 'release' URLs on that site do not seem to contain any uppercase characters. Hence, if you use uppercase characters in keywords, it will obviously not match. Either use the correct case, or include the case-insensitive flag in the regex.

The 'release' urls on that site also do not seem to contain any 'dot' characters. But, even if we take your string just as an example, it will also not match, as it contains '.' character while the regex doesn't. You have to modify it in order to match the string.

@mgpai

The title sample that includes those dots is the name pattern used for the actual downloadable file names, not the titles of the posts. It was just an example to point out that none of the keywords after the hyphen "-" are being accepted by the regex keyword search. And yes, I tried all those release names and several others after the hyphen in lowercase and none of them worked, so the case does not appear to affect the regex keyword search. Although I didn't use the "i" flag, I did test with A-Za-z0-9 to verify that it was not a case issue.

Anyway, it is working now. I had to remove the keyword from the middle and add it after the last + quantifier, so it looks like this: (/release/[a-z0-9\\-]+[a-z0-9\\-]+rmteam)
So the keywords after the hyphen will never work anywhere except in the last keyword position of the regex, regardless of the case.
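
One likely explanation for this behaviour: in the 480p example the keyword is followed by [a-z0-9\\-]+, and the + quantifier demands at least one more character after the keyword. A release name sits at the very end of the URL, so nothing is left for that part to match. The sketch below shows this in plain JavaScript with a made-up slug; using * instead of + after the keyword would accept it in either position.

```javascript
// Hypothetical slug where the release name is the last part of the URL.
var slug = "/release/some-title-2020-720p-webrip-x264-rmteam";

var keywordInMiddle = /(\/release\/[a-z0-9\-]+rmteam[a-z0-9\-]+)/; // needs 1+ characters after the keyword
var keywordLast = /(\/release\/[a-z0-9\-]+rmteam)/;                // keyword may end the match
var keywordAnywhere = /(\/release\/[a-z0-9\-]+rmteam[a-z0-9\-]*)/; // * also allows zero characters after it

console.log(keywordInMiddle.test(slug)); // false
console.log(keywordLast.test(slug));     // true
console.log(keywordAnywhere.test(slug)); // true
```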

The site updates regularly, with just 15 posts on the first page, which includes both TV shows and movies (unfiltered). However, at the top of the page there is an option to select "movies only", which adds /l/m to the URL after the top-level domain name. At the bottom of each page there are numbered links to each page in the format /l/m/2, /l/m/3, and so on, so the second page looks like this: rmz.cr/l/m/2.
So if I wanted to add just the first five pages from "/l/m/" to my crawl, I assume I would have to add/change this in the pattern and the deepPattern regex?

Scratch that, I believe that mgpai's script--> "Add urls to linkgrabber at user-defined intervals" will handle that.
I will test and post back with results.
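
For reference, the list of page URLs for such a script could be built like this. It is only a sketch: buildPageUrls is a hypothetical helper, the https scheme is an assumption, and the first page is assumed to be /l/m without a number, as described above.

```javascript
// Build the URLs of the first N listing pages: the base URL itself, then base/2, base/3, ...
function buildPageUrls(base, pages) {
    var urls = [];
    for (var page = 1; page <= pages; page++) {
        urls.push(page === 1 ? base : base + "/" + page);
    }
    return urls;
}

console.log(buildPageUrls("https://rmz.cr/l/m", 5));
```

Each of these URLs still has to be covered by the "pattern" of the linkcrawler rule, otherwise the deepPattern is never applied to it.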

Last edited by raztoki; 22.02.2020 at 05:26. Reason: insert /quote bbcode
  #1094  
Old 22.02.2020, 06:38
Germini Germini is offline
Modem User
 
Join Date: Feb 2020
Posts: 1
Default

Hi, I have tried the following script.



Quote:
Originally Posted by mgpai View Post
Convert AAC/M4A/OGG files to MP3.
Code:
// Convert aac/m4a/ogg files to mp3.
// Trigger required: "A Download Stopped".
// Requires ffmpeg/ffprobe. Uses JD ffmpeg/ffprobe settings if available.
// Overwrites destination file (mp3) if it already exists.

if (link.isFinished()) {
    var fileName = link.name.replace(/(.+)(\..+$)/, "$1");
    var fileType = link.name.replace(/(.+)(\..+$)/, "$2");
    var sourceFile = link.getDownloadPath();
    var audioFile = /\.(aac|m4a|ogg)$/.test(sourceFile);

    if (audioFile) {
        var downloadFolder = package.getDownloadFolder();
        var destFile = downloadFolder + "/" + fileName + ".mp3";
        var ffmpeg = callAPI("config", "get", "org.jdownloader.controlling.ffmpeg.FFmpegSetup", null, "binarypath");
        var ffprobe = callAPI("config", "get", "org.jdownloader.controlling.ffmpeg.FFmpegSetup", null, "binarypathprobe");
        var data = JSON.parse(callSync(ffprobe, "-v", "quiet", "-print_format", "json", "-show_streams", "-show_format", sourceFile));
        var streamsBitrate = data.streams[0].bit_rate ? data.streams[0].bit_rate : 0;
        var formatBitrate = data.format.bit_rate ? data.format.bit_rate : 0;
        var bitrate = Math.max(streamsBitrate, formatBitrate) / 1000;
        var deleteSourceFile = false; // Set this to true to delete source file after conversion.

        if (bitrate > 0) {
            callSync(ffmpeg, "-y", "-i", sourceFile, "-b:a", bitrate + "k", destFile);
            if (deleteSourceFile && getPath(destFile).exists()) deleteFile(sourceFile, false);
        }
    }
}

It worked perfectly when downloading only one file. If I try to download multiple files, it fails. I tried with and without "Synchronous execution of script", but neither worked for multiple files.

This is the error I am getting

Code:
net.sourceforge.htmlunit.corejs.javascript.EcmaError: SyntaxError: Unterminated object literal (#17)
	at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
	at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
	at net.sourceforge.htmlunit.corejs.javascript.NativeJSON.parse(NativeJSON.java:125)
	at net.sourceforge.htmlunit.corejs.javascript.NativeJSON.execIdCall(NativeJSON.java:97)
	at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:89)
	at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
	at script(:17)
	at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
	at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
	at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
	at org.jdownloader.scripting.JSHtmlUnitPermissionRestricter$SandboxContextFactory.doTopCall(JSHtmlUnitPermissionRestricter.java:119)
	at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
	at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
	at net.sourceforge.htmlunit.corejs.javascript.Context.evaluateString(Context.java:1212)
	at org.jdownloader.extensions.eventscripter.ScriptThread.evalUNtrusted(ScriptThread.java:288)
	at org.jdownloader.extensions.eventscripter.ScriptThread.executeScipt(ScriptThread.java:180)
	at org.jdownloader.extensions.eventscripter.ScriptThread.run(ScriptThread.java:160)
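
The trace shows JSON.parse failing in line 17 of the script, which is the line that parses the ffprobe output. Why the output is incomplete with multiple files is not clear from the error alone. One defensive option is to guard the parsing, so that a broken or empty ffprobe result skips the conversion instead of aborting the script. The helper names parseProbeOutput and pickBitrateKbps below are hypothetical; this is a sketch, not a fix confirmed in this thread.

```javascript
// Parse ffprobe's JSON output; return null instead of throwing when it is incomplete.
function parseProbeOutput(output) {
    try {
        return JSON.parse(output);
    } catch (e) {
        return null;
    }
}

// Pick the higher of stream and format bitrate, in kbit/s; 0 when nothing usable is found.
function pickBitrateKbps(data) {
    if (!data) return 0;
    var stream = data.streams && data.streams[0] ? Number(data.streams[0].bit_rate) || 0 : 0;
    var format = data.format ? Number(data.format.bit_rate) || 0 : 0;
    return Math.max(stream, format) / 1000;
}

// Example with a truncated result and with a complete one.
console.log(pickBitrateKbps(parseProbeOutput('{"streams":[')));
console.log(pickBitrateKbps(parseProbeOutput('{"streams":[{"bit_rate":"128000"}],"format":{"bit_rate":"130000"}}')));
```

In the original script, the existing check for bitrate > 0 would then simply skip files whose probe result could not be read.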
After reading the whole topic I found some scripts, which I modified to make them work for my purposes. If someone wants to improve it, that would be awesome, but it works for what I need.

Code:
// Convert aac/m4a/ogg files to mp3 for youtube.com links
// Trigger required: "A Download Stopped".

var deleteSourceFile = true; // Set this to true to delete source file after conversion.
var sourceFile = link.getDownloadPath();
var filetype = getPath(sourceFile).getExtension();
var filename = link.getName();
var extLength = filetype.length + 1; // extension plus the leading dot
var newfilename = filename.substring(0, filename.length - extLength);
var downloadFolder = package.getDownloadFolder();
var destFile = downloadFolder + "\\" + newfilename + ".mp3"; // Windows path separator


if (link.isFinished()) {
    if (link.getHost() == "youtube.com") {
        if (filetype == "m4a" || filetype == "aac" || filetype == "ogg") {
            // Windows-only path to the ffmpeg binary inside the JD folder.
            callSync(JD_HOME + "\\tools\\Windows\\ffmpeg\\x64\\ffmpeg.exe", "-v", "5", "-y", "-i", sourceFile, destFile);
        }
        if (deleteSourceFile && getPath(destFile).exists()) deleteFile(sourceFile, false);
    }
}
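
The file name handling in the script above can be pulled into a small helper that does not depend on the length of the extension. This is only a sketch with a hypothetical name (toMp3Path); the separator is passed in, because the hard-coded "\\" only fits Windows.

```javascript
// Replace the extension of fileName with ".mp3" and join it with the folder.
function toMp3Path(folder, fileName, separator) {
    var dot = fileName.lastIndexOf(".");
    var base = dot > 0 ? fileName.substring(0, dot) : fileName; // keep names without an extension as they are
    return folder + separator + base + ".mp3";
}

console.log(toMp3Path("C:\\Downloads", "my.song.m4a", "\\"));
console.log(toMp3Path("/home/user/downloads", "track.ogg", "/"));
```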
If somebody knows how to make JDownloader show "Demuxing/conversion" or similar as the status while the ffmpeg process is active, it would be awesome.

Now I am trying to make the image from the video the cover. I will update when done. I also want to detect whether there is a square, to cut the borders.

Greetings, Germini

Last edited by Germini; 22.02.2020 at 06:40.
All times are GMT +2.