#1
I am trying to acquire links to attachments in a phpBB-hosted forum. This requires being logged in, so I have provided my user/pass to JD via the 'Basic Authentication' settings. Given a suitable forum page URL, JDownloader natively finds only the thumbnails and does not follow the href links:
Code:
<a class="file-preview " href="/phpBB2/index.php?attachments/capture-jpg.1988988/" target="_blank"> <img src="/phpBB2/data/attachments/1915/1915251-5369dc6cb5e0da0fb7fd46255176b083.jpg" alt="Capture.JPG" width="264" height="200" loading="lazy"> </a>
To this end, I have made a DEEPDECRYPT rule. I have established that its pattern catches the page URL, because the LinkCrawlerRule...log contains the page source and the Rule cookies update, but it does not result in any additions to the LinkGrabber pane, nor any traces in the logs. My deepPattern:
Code:
"deepPattern" : "(?i)<a class=\"file-preview \" (href=\"[^\"]+\") target=\"_blank\">",
It's frustrating to trawl this forum and find that so much of the content of LinkCrawler Rule discussions is submerged under "**External links are only visible to Support Staff**"!
#2
Use some \s* for whitespace, because if they change the HTML with respect to " or ' it will fail easily.
Also, you don't want to capture the href itself, just the component after the = character, inside the " or ' (whichever it uses). You will need a LinkCrawler rule for the newly captured URL pattern so it can then be processed, so yours would be:
Code:
https?://domain/phpBB2/index.php?attachments.+?
There are also support articles.
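To illustrate the two suggestions above (tolerate whitespace and either quote style, and capture only the attribute value), here is a minimal sketch in plain JavaScript. It runs outside JDownloader, whose rules use Java regular expressions, so treat it as an approximation of the idea rather than a drop-in deepPattern. The names `tolerant` and `extractHrefs` are made up for this example; the sample markup and the original pattern come from post #1.

```javascript
// Sketch only: plain JavaScript regexes, not JDownloader's Java regex engine.
// The sample markup is copied from post #1.
const sample =
  '<a class="file-preview " href="/phpBB2/index.php?attachments/capture-jpg.1988988/" target="_blank">';

// Pattern from post #1: the capture group includes the attribute name and quotes.
const original = /<a class="file-preview " (href="[^"]+") target="_blank">/i;

// More tolerant variant (hypothetical): \s+ for whitespace, either quote style,
// extra class names allowed, and the group holds only the attribute value.
const tolerant = /<a\s+class=["']file-preview[^"']*["']\s+href=["']([^"']+)["']/gi;

function extractHrefs(html) {
  // matchAll needs the g flag; m[1] is the first capture group.
  return Array.from(html.matchAll(tolerant), (m) => m[1]);
}

console.log(original.exec(sample)[1]); // attribute name and quotes included
console.log(extractHrefs(sample));     // bare href values only
```

In a real rule the pattern also has to be escaped for JSON, as in the deepPattern shown in post #1.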
__________________
raztoki @ jDownloader reporter/developer http://svn.jdownloader.org/users/170 Don't fight the system, use it to your advantage. :]
#3
Followup question: What does deepPattern emit from the LinkCrawler Rule, when:
a. There is no capture group. Is it the whole pattern?
b. There are one or more capture groups. Is it just one of the capture groups?
c. If the capture group isolates the part after the 'href=' as you suggest, is the URL emitted in its relative form, or is it automatically converted to fully-qualified? If relative URLs are emitted by the first Rule, how can the followup Rule discriminate between relative URLs from different hosts?
I'll be making the regex more defensive once I've got something working!
Last edited by Nimboid; 29.07.2023 at 17:31. Reason: Added info
#4
a+b: No idea - I'd need to look into the code myself. You may as well just try it out.
c. Only full URLs will be returned for further processing. Also, as you seem to have already found out: that forum most likely does not support basic authentication, so you need your login cookies -> creating a LinkCrawler rule is the way to go.
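As a rough illustration of what "full URLs" means for answer c, the sketch below resolves a relative href against its page URL with the standard `URL` constructor, then checks the host. This only shows general URL resolution; it makes no claim about how JDownloader does it internally. The host `forum.example`, the page URL, and the helper names `toAbsolute` and `sameHost` are assumptions made up for this example.

```javascript
// Assumption: forum.example and this thread URL are placeholders, not the real forum.
const pageUrl = "https://forum.example/phpBB2/index.php?threads/some-thread.123/";

function toAbsolute(href, base) {
  // The URL constructor resolves a relative reference against a base URL.
  return new URL(href, base).href;
}

function sameHost(url, host) {
  // Once a URL is absolute, a follow-up pattern can discriminate on its host.
  return new URL(url).hostname === host;
}

const absolute = toAbsolute("/phpBB2/index.php?attachments/capture-jpg.1988988/", pageUrl);
console.log(absolute);
console.log(sameHost(absolute, "forum.example"));
```

Because the resolved URL carries the host, a follow-up rule pattern can anchor on the domain, which addresses the "relative URLs from different hosts" concern.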
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
On Vacation. Start: 2023-12-09 End: TBA
Last edited by raztoki; 01.08.2023 at 13:56.
#5
OK, I'm making progress: I can identify non-standard URLs using a DEEPDECRYPT rule and emit the href links below, which are subsequently handled by a DIRECTHTTP rule.
However, the page source treats different media types differently:
HTML Code:
<a class="u-anchorTarget" id="attachment-2062965"></a>
<a class="file-preview js-lbImage" href="/phpBB2/index.php?attachments/capture-jpg.2062965/" target="_blank">
<a class="u-anchorTarget" id="attachment-2062963"></a>
<a class="file-preview" href="/phpBB2/data/video/1989/1989225-00fcffd963a4710b33ea59fdd998dba7.mp4" target="_blank">
I have discovered that if I remodel the video URLs using the attachment Id thus:
HTML Code:
"/phpBB2/index.php?attachments/2062963/"
HTML Code:
"/phpBB2/index.php?attachments/Rover1.mp4.2062963/"
Bizarrely, if I have regex errors, there are conditions where JDownloader finds the 'human-generated' video names and none of the image files, but also dozens of crud files. I haven't found any clues in the logs, and the URLs it finds are nowhere in the page source.
If my DEEPDECRYPT were to emit the
HTML Code:
"<a class="u-anchorTarget" id="attachment-2062963"></a>"
Or MUST rule "patterns" contain fully-qualified URLs? If this isn't technically possible, is it possible within Event Scripter to create new links and submit them to the Link Crawler? I could then use getPage(myString/*PageURL*/); and pick out the "u-anchorTarget" lines to generate and submit my URLs.
I could also have knocked out a JavaScript routine to paste into my browser console in the time it's taken me to compose this post!
#6
I'm sorry (not meaning to sound rude) but we could waste tons of time doing this in a theoretical way.
It would be way easier if you supplied all the required information (real-life test links, logins, rules you've created so far) so we can make some progress here. You're free to do this via email - we do value your privacy. Send the information to support@jdownloader.org.
That's what I mean - doing this outside JD might be faster than trying to learn the internals especially if you hit a possible dead end like the need to work with multiple snippets of a given html webpage.
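For the "outside JD" route, a minimal standalone JavaScript sketch could pick the attachment ids out of the `u-anchorTarget` anchors and rebuild the `/phpBB2/index.php?attachments/<id>/` form mentioned in post #5. The markup and URL form are taken from that post. The host `forum.example` and the function name `attachmentUrls` are assumptions for illustration, and whether the forum serves every attachment at that URL form is not confirmed in the thread.

```javascript
// Markup copied from post #5: one image attachment and one video attachment.
const pageSource = [
  '<a class="u-anchorTarget" id="attachment-2062965"></a>',
  '<a class="file-preview js-lbImage" href="/phpBB2/index.php?attachments/capture-jpg.2062965/" target="_blank">',
  '<a class="u-anchorTarget" id="attachment-2062963"></a>',
  '<a class="file-preview" href="/phpBB2/data/video/1989/1989225-00fcffd963a4710b33ea59fdd998dba7.mp4" target="_blank">',
].join("\n");

function attachmentUrls(html, base) {
  // Capture only the numeric id from each u-anchorTarget anchor.
  const anchor = /<a\s+class=["']u-anchorTarget["']\s+id=["']attachment-(\d+)["']/gi;
  return Array.from(html.matchAll(anchor), (m) =>
    // Rebuild the attachment URL form from post #5 and make it absolute.
    new URL("/phpBB2/index.php?attachments/" + m[1] + "/", base).href
  );
}

// Assumption: forum.example stands in for the real forum host.
const urls = attachmentUrls(pageSource, "https://forum.example/");
console.log(urls);
```

Keying on the anchor id rather than the href means images and videos are handled the same way, which is the point of the remodelling described in post #5.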
#7
I now have it working to my satisfaction.
As the crucial information is within a short element of the page source
HTML Code:
"<a class="u-anchorTarget" id="attachment-2062963"></a>"
If any of the links given to it address a qualifying page, those links are separated and searched, the remainder being passed onwards using myCrawlerJob.setText(...). Processed links are queued using callAPI("linkgrabberv2", "addLinks", {...}), with login credentials obtained from a partner LinkCrawler Rule.
I know a bit more than I did...
#8
Nice one.
You should consider posting that script here as it may be helpful for other users as well.
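The script itself is not posted in the thread. As a rough sketch of just the link-splitting step described in post #7 (qualifying page links separated out, the remainder passed onwards), something like the following would do. The `qualifies` test and the `threads/` URL shape are hypothetical, since the thread does not say what makes a page "qualifying". The Event Scripter calls named in post #7 (`myCrawlerJob.setText(...)`, `callAPI("linkgrabberv2", "addLinks", {...})`) are deliberately left out because their arguments are not shown.

```javascript
// Hypothetical criterion for a "qualifying" page; the real one is not given in the thread.
const qualifies = (link) => /\/phpBB2\/index\.php\?threads\//i.test(link);

function partitionLinks(text) {
  // Treat the submitted text as whitespace-separated links.
  const links = text.split(/\s+/).filter(Boolean);
  return {
    qualifying: links.filter(qualifies),               // pages to fetch and search
    remainder: links.filter((l) => !qualifies(l)),     // links to pass onwards untouched
  };
}

// Assumption: both URLs below are placeholders.
const submitted =
  "https://forum.example/phpBB2/index.php?threads/some-thread.123/\nhttps://other.example/file.zip";
const parts = partitionLinks(submitted);
console.log(parts.qualifying);
console.log(parts.remainder);
```

In the script described in post #7, the remainder would presumably be joined back into text for myCrawlerJob.setText(...), and the URLs generated from the qualifying pages would be queued through the linkgrabberv2 addLinks call.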