JDownloader Community - Appwork GmbH
 

  #1  
Old 15.06.2021, 13:31
ScottQ
Baby Loader
 
Join Date: Mar 2021
Posts: 7
Question [LinkCrawler rule] Using Directory watch plugin

Despite reading many posts, copy-pasting code snippets into a crawljob file, and spending a few hours experimenting, I could not get anything to work with the Directory Watch plugin. I cannot work out which options are the critical ones that make it work; it is too complex for me. I am using MS Windows. This is what I was trying to do:

Test 1 - set up a rule in test.crawljob created in the watch folder
  • copy in a test1.url file that contains:
    Code:
    [InternetShortcut]
    URL=somesite.xyz/a/b/c/page1.htm

I wanted the crawljob to trigger a scan of the page1.htm web page, find any upsto.re link on it, and add that link to the download list.

Test 2 - assuming I cannot do that, I tried something simpler
  • copy a test2.url file into the watch folder. The file contains:
    Code:
    [InternetShortcut]
    URL=h**ps://upsto.re/aaaaaa

I wanted upsto.re/aaaaaa to be added to the download list (i.e. triggered whenever a new *.url file is added to the watch folder).

What I tried in my test.crawljob file:
Code:
[ {  
    "pattern" : "file:/.*?\\.url$",
    "rule" : "DEEPDECRYPT",
    "deepPattern": "https?://upsto\\.re/[^ ]+",
    "deepAnalyseEnabled" : true,
    "enabled": true
} ]

Is this something that should be possible using the Folder Watch plugin with a "test.crawljob"? What should my [ { ... } ] rule look like?

(I know the last case can be done with the clipboard monitor, but I wanted to use a "watch" folder.)

Last edited by ScottQ; 19.06.2021 at 10:20. Reason: clarify content of URL file
  #2  
Old 15.06.2021, 13:45
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 70,918

A crawljob file to auto-crawl the given URL:
Quote:
text=....URL
deepAnalyseEnabled=true
but this will find more than you want

So instead, use a crawljob:
Quote:
text=....URL
and an additional LinkCrawlerRule that matches your somesite, with a deepPattern that returns only the wanted links:
Quote:
[ {
"pattern" : "This pattern has to match your somesite....",
"rule" : "DEEPDECRYPT",
"deepPattern": "https?://upsto\\.re/[^ ]+"
} ]
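Loosely sketched (a Python illustration, not JD's actual code), a DEEPDECRYPT rule fetches the page that matched `pattern` and scans the fetched text with `deepPattern`:

```python
import re

# The deepPattern from the rule above; [^ ]+ runs until the next space,
# so it behaves best on space-delimited text and may over-match in raw HTML
deep_pattern = re.compile(r"https?://upsto\.re/[^ ]+")

# Hypothetical page text, standing in for the fetched somesite page
page_text = "Download: https://upsto.re/aaaaaa mirror: https://example.com/x"

links = deep_pattern.findall(page_text)
print(links)  # ['https://upsto.re/aaaaaa']
```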
__________________
JD-Dev & Server-Admin
  #3  
Old 15.06.2021, 13:47
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 70,918

Please know that you can always ask for help/hints
__________________
JD-Dev & Server-Admin
  #4  
Old 19.06.2021, 11:49
ScottQ
Baby Loader
 
Join Date: Mar 2021
Posts: 7
Question "text=" expressions

Quote:
Originally Posted by Jiaz View Post
...
Crawljob

Code:
text=....URL
and additional LinkCrawlerRule that matches on your somesite and deepPattern to only return wanted links
Thanks for clarifying what I need to do. All the examples I have seen in posts so far look like
Code:
text=**External links are only visible to Support Staff**
meaning: fetch the content of
Code:
**External links are only visible to Support Staff**
and pass the content to the user-defined LinkCrawlerRule parser rules.

I tested by copying the file test1.url (shown in the earlier post) into "folderwatch" and did not see anything triggered. I tried many variations of "text=" in my crawljob file, such as
Code:
text=test1.url
# text=s:\\my\\folderwatch\\test1.url
# text="s:\\my\\folderwatch\\test1.url"
# text=https://upsto.re/txxxxxx
# text=file:///S:/my/folderwatch/test1.url
# text=file:///./TEST1.URL
# text=file:///./test1.url
# filename=test1.url

# I guess this is just wishful thinking?
# text=.+.URL
# text=....URL
JD moved the crawljob file to the "added" sub-folder.

Can I still edit it in the "added" folder for testing?

If I edit the file in the "added" folder, does it have to be moved back to the parent folder for JD to detect the change?

I then copied a test1.url file (see above) into "folderwatch". Should test1.url get moved to the "added" sub-folder, or does it always stay in "folderwatch"? I do not see anything added to the download list to indicate the link in the file was processed.

(@moderators, please make post links visible. I always try to use sample links; I understand the redaction rationale, but it has been hard to learn from topics where links are redacted.)

Last edited by ScottQ; 19.06.2021 at 11:53.
  #5  
Old 19.06.2021, 17:14
mgpai
Script Master
 
Join Date: Sep 2013
Posts: 1,212

Crawljob file to add the 'test1.url' file to JD using folderwatch:
Code:
text=file:///c:/downloads/test1.url

Linkcrawler rule to detect/parse 'url' files:
Code:
[ {
  "enabled" : true,
  "pattern" : "file:/.*?\\.url$",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "url=(.+)"
} ]

If the pattern of the links in the 'url' file is supported by JD (e.g. upstore), they will be automatically processed and added to JD. Otherwise, you will have to create an additional linkcrawler rule for them.
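As a quick stand-in for what that deepPattern extracts, here is the `url=(.+)` regex applied in Python to the shortcut file from the first post (case-insensitive matching is an assumption here, since the file says `URL=` while the pattern says `url=`):

```python
import re

# Contents of test1.url from the first post
shortcut = "[InternetShortcut]\nURL=somesite.xyz/a/b/c/page1.htm\n"

# The deepPattern above; (.+) captures everything after "url=" to end of line.
# re.IGNORECASE is an assumption, since the file uses "URL=" not "url="
pattern = re.compile(r"url=(.+)", re.IGNORECASE)

link = pattern.search(shortcut).group(1)
print(link)  # somesite.xyz/a/b/c/page1.htm
```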

While the crawljob file can be edited in the 'added' folder, it will have to be moved back or copied to the folderwatch folder for JD to detect it again.
  #6  
Old 20.06.2021, 10:50
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 70,918

@ScottQ: sorry, I guess I misunderstood you. You want to process html/text files that you have on disk via Folderwatch, right?

As an alternative, you can modify your html/text file: prepend the crawljob modifiers, with resttext as the last one, followed by the rest of the file, e.g.
Quote:
packageName=Test
downloadFolder=....
resttext=
rest of the text/html/... file
__________________
JD-Dev & Server-Admin
  #7  
Old 20.06.2021, 10:52
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 70,918

Quote:
Originally Posted by ScottQ View Post
(@moderators please make post links visible, I always try to use sample links. I understand the redaction rationale, but it's been hard to learn from topics where links are redacted.)
Your links are visible to support staff
__________________
JD-Dev & Server-Admin
  #8  
Old 20.06.2021, 10:53
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 70,918

Quote:
Originally Posted by ScottQ View Post
I then copy a test1.url file (see above) into "folderwatch". Should test1.url get moved to "added" sub folder or does it always stay in "folderwatch". I do not see anything added to download list to indicate the link in file was processed.
Only the crawljob file itself will be moved to the "added" sub-folder. JDownloader does not move any other referenced files.
__________________
JD-Dev & Server-Admin
  #9  
Old 20.06.2021, 13:09
ScottQ
Baby Loader
 
Join Date: Mar 2021
Posts: 7

Quote:
Originally Posted by Jiaz View Post
@ScottQ: sorry, I guess I misunderstood you. You want to process html/text files that you have on disk via Folderwatch, right?

As an alternative you modify your html/text file and just append the modifiers with trailing
resttext, eg
No, you were understanding it correctly; the first post literally shows the content of test1.url.

So far I am testing a setup with two linkcrawler rules
Code:
    "pattern": "file:/.*?\\.url$",
    "deepPattern": "URL=(.+)",
and
Code:
  "pattern" : "https?://somesite\\.xyz/[^ ]+",
  "rule" : "DEEPDECRYPT",
  "deepPattern" : "https?://upsto\\.re/[^ ]+",
to do the processing. If I right-click the test1.url file in a file manager and copy it, the linkcrawler sees something on the Windows clipboard. I assume this is what is taking place:
  1. the first rule matches, so the deepPattern with the literal "URL=(.+)" is working
  2. it goes to the correct somesite.xyz web page
  3. rule 2 triggers on somesite.xyz and finds the upstore.net link
  4. the link is put on the linkgrabber screen correctly
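The steps above can be simulated end-to-end in plain Python (the .url content is given a full https:// scheme here so that rule 2's pattern can match it, and the fetched page text is made up):

```python
import re

# Stage 1: the .url file matched pattern "file:/.*?\.url$";
# its deepPattern "URL=(.+)" pulls out the page address
url_file = "[InternetShortcut]\nURL=https://somesite.xyz/a/b/c/page1.htm\n"
page_url = re.search(r"URL=(.+)", url_file).group(1)

# Stage 2: the extracted address matches rule 2's pattern, so JD would
# fetch that page and scan it with the second deepPattern
assert re.fullmatch(r"https?://somesite\.xyz/[^ ]+", page_url)

fetched_page = "mirror: https://upsto.re/aaaaaa done"  # stand-in for the real page
final_links = re.findall(r"https?://upsto\.re/[^ ]+", fetched_page)
print(final_links)  # ['https://upsto.re/aaaaaa']
```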

What's working so far is quite useful for me: I can use the copy action in a file manager on *.url files and JD will get the upstore.net link from the "somesite.xyz" page. In the browser I can copy the page address and the upstore.net links get loaded into JD, which is quicker than manual page navigation. I think I am close to getting it all working.

I would like to fix the "text=" (or whatever it should be) so that it hooks into linkcrawler rule 1.

"mgpai"'s reply indicates a full path in URI format ("file://...."); is that applicable for MS Windows?

Is the string the linkcrawler gets for a local file always "file://...." on MS Windows as well?
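On the file-URI question: Python's pathlib shows what a Windows path looks like in `file://` form (this illustrates the standard URI shape; whether JD normalizes paths exactly this way is an assumption, and the drive/path here is hypothetical):

```python
from pathlib import PureWindowsPath

# A hypothetical Windows folderwatch path converted to a file URI
uri = PureWindowsPath(r"S:\my\folderwatch\test1.url").as_uri()
print(uri)  # file:///S:/my/folderwatch/test1.url
```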