#1
Hello, I've been trying to research this and get it working on my own over the holidays, but I failed badly, so I would like to ask for some help here.
I'd like to crawl specific subsections of the forum **External links are only visible to Support Staff** for rapidgator links, for example the pages from **External links are only visible to Support Staff** to **External links are only visible to Support Staff**. I can easily create a text file with links to each thread-list page, since only the page number changes between URLs. However, I would need to crawl two levels deep to actually go into the threads and collect the links. I have read some threads about crawling deeper than the default setting, but I failed to implement or adapt them. I also have problems setting up the correct filter to catch only rapidgator links; what is the best way to set that up? I've tried countless different ways of filtering, but none worked: either the whole page was added or nothing at all. Any help is greatly appreciated!
#2
Here are your rough options:
1. LinkCrawler rules: https://support.jdownloader.org/know...kcrawler-rules
2. Collect links with browser add-ons: https://support.jdownloader.org/know...orted-websites

In addition to this, you could also set up filter rules in JDownloader to ignore everything that is not a rapidgator.net link. See Settings -> Linkgrabber Filter.
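If you go the LinkCrawler-rule route, a minimal sketch of a single deep-decrypt rule might look like the following. The domain is a placeholder (the real forum URL is hidden above), and the exact field set is an assumption based on the knowledgebase article; the rules are entered as a JSON array in Settings -> Advanced Settings -> "LinkgrabberSettings: Linkcrawler Rules":

```json
[
  {
    "enabled": true,
    "name": "grab only rapidgator links from a thread page (example.com is hypothetical)",
    "rule": "DEEPDECRYPT",
    "pattern": "https?://www\\.example\\.com/threads/.+",
    "deepPattern": "(https?://(www\\.)?rapidgator\\.net/file/[^\"'<\\s]+)"
  }
]
```

The `deepPattern` is applied to the page HTML and only the matched capture groups are returned as results, which is what avoids the "whole page gets added" problem described above.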
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#3
@tr909: A LinkCrawler rule can look for download links AND also follow the *next* page, and thus auto-crawl through the pages. Please give it a try yourself first, but of course you can ask for help when you're stuck.
__________________
JD-Dev & Server-Admin |
#4
I spent the last two hours working on this and learning about regular expressions, and I'm getting closer to making the LinkCrawler work, but I've run into something I don't know how to handle.
The thread-list pages are set up like this: **External links are only visible to Support Staff**www.domain.com/threads/page1, but the single threads containing the rapidgator links are set up like this: **External links are only visible to Support Staff**www.domain.com/threads/examplethread. How should I set up my pattern and deepPattern? I already have my filter up and running, only allowing rapidgator links to be collected. So should my deepPattern target the /threads/ URL or rapidgator?
#5
You need 2 rules:
1.) A rule (deep decrypt) that matches .../forums/... and .../forums/.../page-number URLs; its deepPattern should match the *next* pages and the threads.
2.) A rule (deep decrypt) that matches .../threads/.+; its deepPattern should match the messages section (to avoid finding other stuff).
You can always ask for more help!
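The two rules above could be sketched as JSON roughly like this. The domain and URL layout (`/forums/...`, `/page-N`, `/threads/...`) are assumptions standing in for the hidden forum, so the patterns will need adjusting to the real site:

```json
[
  {
    "enabled": true,
    "name": "1) crawl thread lists and follow next pages (example.com is a placeholder)",
    "rule": "DEEPDECRYPT",
    "pattern": "https?://www\\.example\\.com/forums/[\\w-]+(/page-\\d+)?",
    "deepPattern": "(https?://www\\.example\\.com/(threads/[\\w.-]+|forums/[\\w-]+/page-\\d+))"
  },
  {
    "enabled": true,
    "name": "2) extract rapidgator links from thread pages",
    "rule": "DEEPDECRYPT",
    "pattern": "https?://www\\.example\\.com/threads/.+",
    "deepPattern": "(https?://(www\\.)?rapidgator\\.net/file/[^\"'<\\s]+)"
  }
]
```

The first rule's `deepPattern` deliberately matches both thread URLs and further page-numbered list URLs, so the crawler keeps walking through the pagination; the second rule then pulls only the rapidgator links out of each thread it finds.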
__________________
JD-Dev & Server-Admin |