- whenever a URL of the form somesite.xyz/... appears in the clipboard text, crawl (= read) that web page link (1st line)
- find any upstore.net link on that page (upsto.re is an alias domain that points to upstore.net) (3rd line)
- pass the link(s) to the LinkGrabber screen
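As a rough offline illustration of the three steps above, the Python sketch below uses two regular expressions: one spots somesite.xyz links in clipboard text, the other collects upsto.re / upstore.net links from a page's HTML. It assumes the page HTML is already available (nothing is downloaded) and uses Python's re module as a stand-in for the Java regexes JDownloader actually runs. The sample link and HTML are placeholders.

```python
import re

# Step 1: links to the (placeholder) site whose pages should be crawled.
PAGE_LINK = re.compile(r"https?://somesite\.xyz/[^ ]+")
# Step 2: upstore links on that page; upsto.re is an alias of upstore.net.
UPSTORE_LINK = re.compile(r"https?://(?:upsto\.re|upstore\.net)/[^\s\"'<>]+")


def pages_to_crawl(clipboard_text: str) -> list[str]:
    """Return every somesite.xyz link found in the clipboard text."""
    return PAGE_LINK.findall(clipboard_text)


def upstore_links(page_html: str) -> list[str]:
    """Return every upsto.re / upstore.net link found in a page's HTML."""
    return UPSTORE_LINK.findall(page_html)


# Hypothetical sample data standing in for the clipboard and the fetched page.
clipboard = "see https://somesite.xyz/page1.htm for details"
html = '<a href="https://upsto.re/aaaaaa">mirror</a>'

for page in pages_to_crawl(clipboard):
    # Step 3: JDownloader would hand these to the LinkGrabber; here we just print.
    print(page, "->", upstore_links(html))
```

In JDownloader itself, the same split would be expressed as a DEEPDECRYPT LinkCrawler rule, with the first expression as "pattern" and the second as "deepPattern".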
#1
[LinkCrawler rule] Using Directory watch plugin
Despite reading many posts, copying and pasting code snippets into a crawljob file, and spending a few hours experimenting, I could not get anything to work with the Directory Watch plugin. I cannot work out which options are the critical ones that make it work; it is too complex for me. I am using MS Windows. This is what I was trying to do:
Test 1: set up a rule in test.crawljob, created in the watch folder.
I wanted the crawljob to make JD scan the page1.htm web page link, find any upsto.re link on page1.htm and add it to the download list.
Test 2: assuming I cannot do that, I tried something simpler.
I wanted upsto.re/aaaaaa to be added to the download list (i.e. triggered whenever a new *.url file is added to the watch folder).
What I tried, test.crawljob file:
Code:
[
  {
    "pattern" : "file:/.*?\\.url$",
    "rule" : "DEEPDECRYPT",
    "deepPattern": "https?://upsto\\.re/[^ ]+",
    "deepAnalyseEnabled" : true,
    "enabled": true
  }
]
Is this something that should be possible using the folder watch plugin with a "test.crawljob"? What should my [ { ... } ] rule look like? (I know the last case can be done with the clipboard monitor, but I wanted to use a "watch" folder.)
Last edited by ScottQ; 19.06.2021 at 11:20. Reason: clarify content of URL file
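One quick way to see what the rule's "pattern" can accept is to try it against the different ways a file location can be written. The sketch below uses Python's re as an approximation of JDownloader's Java regex, with the JSON double backslash reduced to a single one; the sample paths are hypothetical. It anchors the match at the first character (re.match); whether JD anchors the same way is an assumption. The result suggests that only the file: URI forms can match, not a bare Windows path, and that a case-sensitive match rejects an upper-case ".URL" extension.

```python
import re

# The rule's "pattern" value with JSON escaping removed ("\\." becomes "\.").
RULE_PATTERN = re.compile(r"file:/.*?\.url$")


def rule_accepts(text: str) -> bool:
    """True if the pattern matches starting at the first character of text."""
    return RULE_PATTERN.match(text) is not None


candidates = [
    "file:///S:/my/folderwatch/test1.url",  # file URI with three slashes
    "file:/S:/my/folderwatch/test1.url",    # file URI with one slash
    r"S:\my\folderwatch\test1.url",         # plain Windows path, no scheme
    "file:///S:/my/folderwatch/TEST1.URL",  # upper-case extension
]

for text in candidates:
    print(rule_accepts(text), text)
```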
#2
Crawljob File to auto crawl the given URL
Quote:
so Crawljob Quote:
Quote:
__________________
JD-Dev & Server-Admin |
#3
Please know that you can always ask for help/hints
#4
"text=" expressions
Quote:
Code:
text=**External links are only visible to Support Staff**
Code:
**External links are only visible to Support Staff**
I tested by copying the file test1.url (shown in an earlier post) into "folderwatch" and did not see anything triggered. I tried many variations of "text=" in my crawljob file, such as:
Code:
text=test1.url
# text=s:\\my\\folderwatch\\test1.url
# text="s:\\my\\folderwatch\\test1.url"
# text=**External links are only visible to Support Staff**
# text=file:///S:/my/folderwatch/test1.url
# text=file:///./TEST1.URL
# text=file:///./test1.url
# filename=test1.url
# I guess this is just wishful thinking?
# text=.+.URL
# text=....URL
Can I still edit the crawljob file in the "added" folder for testing? If I edit the file in the added folder, does it have to be moved back to the parent folder for JD to detect the change? I then copy a test1.url file (see above) into "folderwatch". Should test1.url get moved to the "added" subfolder, or does it always stay in "folderwatch"? I do not see anything added to the download list to indicate that the link in the file was processed.
(@moderators please make post links visible; I always try to use sample links. I understand the rationale for the redaction, but it has been hard to learn from topics where links are redacted.)
Last edited by ScottQ; 19.06.2021 at 12:53.
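Most of the variations above are bare file names, Windows paths or wildcards, whereas the reply in post #5 uses an absolute path written as a file:/// URI. The Python sketch below converts a Windows drive-letter path into that form and writes a one-line crawljob file. It is a hypothetical helper, not part of JDownloader: the folder and file names are placeholders, and UNC paths are deliberately rejected because this thread does not say how JD expects them.

```python
import tempfile
from pathlib import Path, PureWindowsPath
from urllib.parse import quote


def to_file_uri(windows_path: str) -> str:
    """Turn an absolute drive-letter path (e.g. S:\\dir\\file.url) into a file:/// URI."""
    p = PureWindowsPath(windows_path)
    if len(p.drive) != 2 or not p.is_absolute():
        raise ValueError(f"expected an absolute drive-letter path, got {windows_path!r}")
    # as_posix() gives "S:/dir/file.url"; quote() percent-encodes spaces etc.
    return "file:///" + quote(p.as_posix(), safe="/:")


def write_crawljob(watch_dir: Path, name: str, target_path: str) -> Path:
    """Write <name>.crawljob into watch_dir with one text= line for target_path."""
    job = watch_dir / f"{name}.crawljob"
    job.write_text(f"text={to_file_uri(target_path)}\n", encoding="utf-8")
    return job


if __name__ == "__main__":
    # A temporary folder stands in for the real folderwatch directory.
    with tempfile.TemporaryDirectory() as tmp:
        job = write_crawljob(Path(tmp), "test1", r"S:\my\folderwatch\test1.url")
        print(job.read_text(encoding="utf-8"), end="")
```

For S:\my\folderwatch\test1.url this produces the line text=file:///S:/my/folderwatch/test1.url, the same shape as the c:/downloads example in post #5.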
#5
Crawljob file to add the 'test1.url' file to JD using folderwatch:
Code:
text=file:///c:/downloads/test1.url
LinkCrawler rule to detect/parse 'url' files:
Code:
[
  {
    "enabled" : true,
    "pattern" : "file:/.*?\\.url$",
    "rule" : "DEEPDECRYPT",
    "deepPattern" : "url=(.+)"
  }
]
If the pattern of the links in the 'url' file is supported by JD (e.g. upstore), they will be processed/added to JD automatically. Otherwise, you will have to create additional LinkCrawler rules for them.
While the crawljob file can be edited in the 'added' folder, it will have to be moved back or copied to the folderwatch folder for JD to detect it again.
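To see what the "deepPattern" above captures, the sketch below applies it to the text of a Windows internet shortcut laid out like the test1.url sample in this thread. Python's re stands in for Java regex, and the URL is a placeholder. Note that the rule uses lower-case "url=" while the sample file uses "URL=": a case-sensitive match finds nothing, so the sketch reports both modes. Whether JDownloader compiles deep patterns case-insensitively is not stated in this thread, so treat that as something to test rather than a known cause.

```python
import re

# Shortcut contents in the same layout as the thread's test1.url sample.
SHORTCUT = "[InternetShortcut]\r\nURL=https://upsto.re/aaaaaa\r\n"

# The rule's deepPattern; the capture group is presumably what JD uses as the link.
DEEP_PATTERN = r"url=(.+)"


def extract(text: str, ignore_case: bool) -> list[str]:
    """Return the captured group of every deepPattern match in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # '.' does not match '\n' but does match '\r', so strip the CRLF leftover.
    return [m.strip() for m in re.findall(DEEP_PATTERN, text, flags)]


print("case-sensitive:  ", extract(SHORTCUT, ignore_case=False))
print("case-insensitive:", extract(SHORTCUT, ignore_case=True))
```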
#6
@ScottQ: sorry, I guess I misunderstood you. You want to process html/text files that you have on disk via Folderwatch, right?
As an alternative, you can modify your html/text file and just append the modifiers with trailing resttext, e.g. Quote:
#7
Your links are visible to support staff
#8
ONLY the crawljob file itself will be moved to the 'added' subfolder. JDownloader does not move any other referenced files.
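Because only the crawljob file is moved to 'added', re-running a test means putting that file back in the watch folder (post #5 notes that copying works as well as moving). A small hypothetical helper for that, assuming a watch folder with its 'added' subfolder; the folder path and file name are placeholders:

```python
import shutil
import tempfile
from pathlib import Path


def rearm_crawljob(watch_dir: Path, job_name: str) -> Path:
    """Copy <watch_dir>/added/<job_name> back into <watch_dir> so it is picked up again.

    Referenced files such as test1.url are never moved, so only the job is copied.
    """
    src = watch_dir / "added" / job_name
    dst = watch_dir / job_name
    shutil.copy2(src, dst)  # copy, leaving the edited original in 'added'
    return dst


if __name__ == "__main__":
    # Simulate a watch folder in which JD has already moved the job to 'added'.
    with tempfile.TemporaryDirectory() as tmp:
        watch = Path(tmp)
        (watch / "added").mkdir()
        (watch / "added" / "test.crawljob").write_text("text=file:///c:/downloads/test1.url\n")
        job = rearm_crawljob(watch, "test.crawljob")
        print(job.read_text(), end="")
```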
#9
Quote:
Update: split this post into 2 items.
I have not been able to get folderwatch to read a *.url file and extract the URL=**External links are only visible to Support Staff**
Last edited by ScottQ; 01.07.2021 at 23:27.
#10
Q1: mgpai's reply uses the full path in URI format ("file://...."). Is that applicable to JD on MS Windows, and is that format used everywhere in JD?
I tried testing with just 2 LinkCrawler rules (i.e. I turned off folderwatch) to test local file name matching. Here are snippets of each rule:
RULE 1:
Code:
"pattern": "file:/.*?\\.url$",
"rule" : "DEEPDECRYPT",
"deepPattern": "URL=(.+)",
RULE 2:
Code:
"pattern" : "https?://somesite\\.xyz/[^ ]+",
"rule" : "DEEPDECRYPT",
"deepPattern" : "https?://upsto\\.re/[^ ]+",
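The two snippets above are fragments of rule objects. The Python sketch below assembles them into the complete JSON array form used in the earlier posts (same "enabled", "pattern", "rule" and "deepPattern" keys), letting json.dumps produce the doubled backslashes instead of typing them by hand. The patterns are copied from the snippets; somesite.xyz remains a placeholder.

```python
import json

# Regexes written as raw strings; json.dumps adds the doubled backslashes JSON needs.
rules = [
    {   # RULE 1: open local .url shortcut files and pick out the URL= value
        "enabled": True,
        "pattern": r"file:/.*?\.url$",
        "rule": "DEEPDECRYPT",
        "deepPattern": r"URL=(.+)",
    },
    {   # RULE 2: crawl somesite.xyz pages and collect the upsto.re links on them
        "enabled": True,
        "pattern": r"https?://somesite\.xyz/[^ ]+",
        "rule": "DEEPDECRYPT",
        "deepPattern": r"https?://upsto\.re/[^ ]+",
    },
]

print(json.dumps(rules, indent=2))
```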
Spoiler:
Create C:\test1.url file:
Code:
[InternetShortcut]
URL=**External links are only visible to Support Staff**
Clipboard:
Code:
**External links are only visible to Support Staff**
Clipboard:
Code:
C:\test.url
Last edited by ScottQ; 11.07.2021 at 02:02.
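To reproduce the test file from the spoiler without hand-editing, the sketch below writes a shortcut in the same [InternetShortcut] / URL= layout and reads it back with Python's configparser. The target URL is a placeholder, the CRLF line endings are an assumption about typical Windows .url files, and a temporary directory is used instead of C:\.

```python
import configparser
import tempfile
from pathlib import Path


def write_url_shortcut(path: Path, target: str) -> None:
    """Write a minimal internet shortcut (.url) pointing at target, with CRLF endings."""
    path.write_bytes(f"[InternetShortcut]\r\nURL={target}\r\n".encode("utf-8"))


def read_url_shortcut(path: Path) -> str:
    """Return the URL= value; configparser option names are case-insensitive."""
    # interpolation=None so a '%' inside a URL is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    return parser["InternetShortcut"]["URL"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shortcut = Path(tmp) / "test1.url"
        write_url_shortcut(shortcut, "https://upsto.re/aaaaaa")
        print(read_url_shortcut(shortcut))
```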