JDownloader Community - Appwork GmbH
 

  #1  
Old 15.06.2021, 14:31
ScottQ
Junior Loader
 
Join Date: Mar 2021
Posts: 13
[LinkCrawler rule] Using Directory watch plugin

Despite reading many posts, copying and pasting code snippets into a crawljob file, and spending a few hours experimenting, I could not get anything to work using the directory watch (folder watch) plugin. I cannot work out which options are the critical ones that make it work; it is too complex for me. I am using MS Windows. This is what I was trying to do:

Test 1 - set up a rule in a test.crawljob file created in the watch folder
  • copy in a test1.url file that contains:
    Code:
    [InternetShortcut]
    URL=somesite.xyz/a/b/c/page1.htm

I wanted the crawljob to trigger on that link, scan the page1.htm web page, find any upsto.re link on it, and add that link to the download list.

Test 2 - assuming I cannot do that, I tried something simpler
  • copy a test2.url file into the watch folder. The file contains:
    Code:
    [InternetShortcut]
    URL=h**ps://upsto.re/aaaaaa

I wanted upsto.re/aaaaaa to be added to the download list (i.e. triggered whenever a new *.url file is added to the watch folder).

What I tried in my test.crawljob file:
Code:
[ {  
    "pattern" : "file:/.*?\\.url$",
    "rule" : "DEEPDECRYPT",
    "deepPattern": "https?://upsto\\.re/[^ ]+",
    "deepAnalyseEnabled" : true,
    "enabled": true
} ]

Is this something that should be possible using the folder watch plugin with a "test.crawljob"? What should my [ { ... } ] rule look like?

(I know the last case can be done with the clipboard monitor, but I wanted to use a "watch" folder.)

Last edited by ScottQ; 19.06.2021 at 11:20. Reason: clarify content of URL file
  #2  
Old 15.06.2021, 14:45
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

A crawljob file to auto-crawl the given URL:
Quote:
text=....URL
deepAnalyseEnabled=true
but this will find more links than you want.

So instead, use a crawljob
Quote:
text=....URL
plus an additional LinkCrawlerRule that matches your somesite and uses deepPattern to return only the wanted links:
Quote:
[ {
"pattern" : "This pattern has to match your somesite....",
"rule" : "DEEPDECRYPT",
"deepPattern": "https?://upsto\\.re/[^ ]+"
} ]
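
For illustration, here is a minimal sketch of how those two pieces could fit together for the example from the first post; the spelled-out somesite.xyz URL and the packageName value are placeholders, not confirmed settings. The crawljob file dropped into the folder watch folder:
Code:
text=https://somesite.xyz/a/b/c/page1.htm
packageName=Test
and the LinkCrawlerRule, with the pattern filled in to match that site:
Code:
[ {
  "enabled" : true,
  "pattern" : "https?://somesite\\.xyz/.+",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "https?://upsto\\.re/[^ ]+"
} ]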
__________________
JD-Dev & Server-Admin
  #3  
Old 15.06.2021, 14:47
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

Please know that you can always ask for help/hints
__________________
JD-Dev & Server-Admin
  #4  
Old 19.06.2021, 12:49
ScottQ
Junior Loader
 
Join Date: Mar 2021
Posts: 13
"text=" expressions

Quote:
Originally Posted by Jiaz View Post
...
So instead, use a crawljob

Code:
text=....URL
plus an additional LinkCrawlerRule that matches your somesite and uses deepPattern to return only the wanted links
Thanks for clarifying what I need to do. All the examples I have seen in postings so far look like
Code:
text=**External links are only visible to Support Staff**
meaning: fetch the content of
Code:
**External links are only visible to Support Staff**
and pass the content to the user-defined LinkCrawlerRule parser rules.

I tested by copying the test1.url file (shown in my earlier post) into "folderwatch" and did not see anything triggered. I tried many variations of "text=" in my crawljob file, such as:
Code:
text=test1.url
# text=s:\\my\\folderwatch\\test1.url
# text="s:\\my\\folderwatch\\test1.url"
# text=**External links are only visible to Support Staff**
# text=file:///S:/my/folderwatch/test1.url
# text=file:///./TEST1.URL
# text=file:///./test1.url
# filename=test1.url

# I guess this is just wishful thinking?
# text=.+.URL
# text=....URL
JD moved the crawljob file to the "added" sub-folder.

Can I still edit it in the "added" folder for testing?

If I edit the file in the "added" folder, does it have to be moved back to the parent folder for JD to detect the change?

I then copy a test1.url file (see above) into "folderwatch". Should test1.url get moved to the "added" sub-folder, or does it always stay in "folderwatch"? I do not see anything added to the download list to indicate the link in the file was processed.

(@moderators please make post links visible, I always try to use sample links. I understand the redaction rationale, but it's been hard to learn from topics where links are redacted.)

Last edited by ScottQ; 19.06.2021 at 12:53.
  #5  
Old 19.06.2021, 18:14
mgpai
Script Master
 
Join Date: Sep 2013
Posts: 1,533

Crawljob file to add the 'test1.url' file to JD using folderwatch:
Code:
text=file:///c:/downloads/test1.url

Linkcrawler rule to detect/parse '.url' files:
Code:
[ {
  "enabled" : true,
  "pattern" : "file:/.*?\\.url$",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "url=(.+)"
} ]

If the pattern of the links in the '.url' file is supported by JD (e.g. upstore), they will automatically be processed/added to JD. Otherwise, you will have to create an additional linkcrawler rule for them.

While the crawljob file can be edited in the 'added' folder, it will have to be moved or copied back to the folderwatch folder for it to be detected by JD again.
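
Putting mgpai's two pieces together with the sample file from the first post, the on-disk layout could look like the following sketch; the c:/downloads location comes from mgpai's example and the link is the sample upsto.re link with its scheme spelled out, so treat both as placeholders. The crawljob file placed in the folderwatch folder (only this file gets moved to 'added'):
Code:
text=file:///c:/downloads/test1.url
The referenced c:/downloads/test1.url file, which stays where it is:
Code:
[InternetShortcut]
URL=https://upsto.re/aaaaaa
Assuming the deepPattern matching is case-insensitive (mgpai's rule uses lowercase "url=" against the uppercase "URL=" line), the rule extracts the upsto.re link from the file and hands it to JD.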
  #6  
Old 20.06.2021, 11:50
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

@ScottQ: sorry, I guess I did misunderstand you. You want to process html/text files that you have on disk via Folderwatch, right?

As an alternative, you can modify your html/text file and just add the modifiers at the top, with a trailing resttext= line, e.g.:
Quote:
packageName=Test
downloadFolder=....
resttext=
rest of the text/html/... file
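
For the test2.url example from the first post, a minimal sketch of such a modified file might look like this; the downloadFolder line from the template above is left out here rather than guessing a path, and the link is the sample link with its scheme spelled out:
Code:
packageName=Test
resttext=
[InternetShortcut]
URL=https://upsto.re/aaaaaa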
__________________
JD-Dev & Server-Admin
  #7  
Old 20.06.2021, 11:52
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

Quote:
Originally Posted by ScottQ View Post
(@moderators please make post links visible, I always try to use sample links. I understand the redaction rationale, but it's been hard to learn from topics where links are redacted.)
Your links are visible to support staff
__________________
JD-Dev & Server-Admin
  #8  
Old 20.06.2021, 11:53
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

Quote:
Originally Posted by ScottQ View Post
I then copy a test1.url file (see above) into "folderwatch". Should test1.url get moved to the "added" sub-folder, or does it always stay in "folderwatch"? I do not see anything added to the download list to indicate the link in the file was processed.
ONLY the crawljob file itself will be moved to the "added" sub-folder. JDownloader does not move any other referenced files.
__________________
JD-Dev & Server-Admin
  #9  
Old 20.06.2021, 14:09
ScottQ
Junior Loader
 
Join Date: Mar 2021
Posts: 13

Quote:
Originally Posted by Jiaz View Post
@ScottQ: sorry, I guess I did misunderstand you. You want to process html/text files that you have on disk via Folderwatch, right?

As an alternative, you can modify your html/text file and just add the modifiers at the top, with a trailing resttext= line, e.g.:
No, you were understanding it correctly. The test1.url content is shown literally in my first post.

Update: split this post into 2 items

I have not been able to get folderwatch to read a *.url file and extract the URL=**External links are only visible to Support Staff**

Last edited by ScottQ; 01.07.2021 at 23:27.
  #10  
Old 02.07.2021, 00:01
ScottQ
Junior Loader
 
Join Date: Mar 2021
Posts: 13

Q1: "mgpai"'s reply indicates full path in URI format "file://...." is that applicable for JD on MS Windows OS and is that format used everywhere in JD?

I tried testing with just 2 linkcrawler rules (i.e. I turned off folderwatch) to test local file name matching. Here are snippets from each of the rules:

RULE 1:
Code:
    "pattern": "file:/.*?\\.url$",
    "rule" : "DEEPDECRYPT",
    "deepPattern": "URL=(.+)",
RULE 2:
Code:
  "pattern" : "https?://somesite\\.xyz/[^ ]+",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "https?://upsto\\.re/[^ ]+",
For the benefit of other novice users, RULE 2 does the following:
Spoiler:
  1. any time a somesite.xyz/... URL appears in the clipboard text, crawl (= read) that web site page (1st line)
  2. find any upstore.net link on it (upsto.re is an alias domain that goes to upstore.net) (3rd line)
  3. pass the link to the linkgrabber screen
This saves time browsing "somesite.xyz" and finding the upsto.re links manually.
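
For reference, here is a sketch of the two snippets above completed into a single, untested rule list; it only adds the surrounding brackets and the "enabled" flag seen in mgpai's earlier example:
Code:
[ {
  "enabled" : true,
  "pattern" : "file:/.*?\\.url$",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "URL=(.+)"
}, {
  "enabled" : true,
  "pattern" : "https?://somesite\\.xyz/[^ ]+",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "https?://upsto\\.re/[^ ]+"
} ]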


Create C:\test1.url file:
Code:
[InternetShortcut]
URL=**External links are only visible to Support Staff**
A test of copying the "somesite" string to the clipboard triggers the second linkcrawler rule and does what I want.

Clipboard:
Code:
**External links are only visible to Support Staff**
Q2: If I copy the string "C:\test.url" to the clipboard, will JD read the file "C:\test.url" and look for the URL= line as defined in the first linkcrawler rule?

Clipboard:
Code:
C:\test.url
Q3: I have tried copying several string permutations to the clipboard, but nothing seems to trigger RULE 1 followed by RULE 2. Does the linkcrawler rule list stop processing at the first rule that matches?

Last edited by ScottQ; 11.07.2021 at 02:02.