JDownloader Community - Appwork GmbH
 

  #1  
05.11.2019, 00:08
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387
URL Extractor issue, question or feature

I am looking for a similar tool, or is this possible as a feature in JD2?

The tool is URL Extractor. Please read its main page, e.g.:

Code:
"What is URL Extractor?"
and other...

**External links are only visible to Support Staff**

"Anchor Text": the Cyrillic alphabet is not supported here (the characters are not recognized).

I want to add such links to JD2 later for download.

In this case the standard "Deep Search or Crawl Search" does not extract the links.
  #2  
05.11.2019, 00:19
raztoki
English Supporter

Join Date: Apr 2010
Location: Australia
Posts: 17,611

With respect to the topic, this is not an issue: JD works the way we designed it to work. It already returns links to supported content via deep analyse. For example, it only follows links when they are supported, and it only adds items to the LinkGrabber when the content is supported (jpg, png, etc.). If you want to extract URLs and content beyond that, I would recommend using your web browser with a browser extension (plenty exist). These will also return content derived from JavaScript. Alternatively, use the advanced setting LinkCrawler.linkcrawlerrules to add support for additional content.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]

Last edited by raztoki; 05.11.2019 at 00:22.
  #3  
05.11.2019, 07:08
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387

I used this extension.
https://chrome.google.com/webstore/d...gofnhkkchiekoo

It saves everything it extracts, including titles, to a CSV file.
The only problem is with regex.
A regex that works in other tools does not work here.
I have no idea which regex syntax this engine supports or how to modify my pattern so that it works.
I want to extract only specific links.
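
Regex dialects differ between tools, so a pattern written for one engine may need changes in another. Which engine this extension uses is not stated in the thread. One possible workaround, sketched here under that uncertainty, is to export everything to CSV and filter the URLs afterwards with a known engine such as Python's `re` module. The function name `filter_links`, the sample URLs, and the pattern are illustrative assumptions, not part of the extension.

```python
import re


def filter_links(urls, pattern):
    """Return only the URLs that match the given regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [u for u in urls if rx.search(u)]


# Hypothetical sample data standing in for the URL column of the exported CSV.
urls = [
    "https://example.com/files/album.zip",
    "https://example.com/about.html",
    "https://example.com/files/cover.JPG",
]

# Keep archive and image links, with an optional query string at the end.
wanted = filter_links(urls, r"\.(zip|jpe?g|png)(\?.*)?$")
print(wanted)
```

Filtering after the export keeps the pattern independent of whatever syntax the extraction tool accepts.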
  #4  
05.11.2019, 12:15
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387

@raztoki

The problem is that I want to extract:

Text link + Address

That works, but nowhere on the internet can I find a "Multi-Link Extract" tool.
All the tools work on a single link at a time.
How can I enter, for example, 20 links at once?
LinkCrawler.linkcrawlerrules cannot save additional data (EXTRA DATA): "TextLink"
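
For readers with the same need, extracting "text link + address" from several pages at once can be sketched with Python's standard library alone. This is a minimal sketch, not a JDownloader feature: the class `AnchorCollector`, the helpers `extract_links` and `to_csv`, and the sample pages are all hypothetical names and data. The HTML is passed in as strings here; in practice each page would first have to be downloaded and decoded with the correct charset.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect (anchor text, absolute URL) pairs from one HTML document."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []      # finished (text, url) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(pages):
    """pages: iterable of (base_url, html) pairs. Returns all (text, url) rows."""
    rows = []
    for base_url, html in pages:
        parser = AnchorCollector(base_url)
        parser.feed(html)
        parser.close()
        rows.extend(parser.links)
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "url"])
    writer.writerows(rows)
    return buf.getvalue()


# Two hypothetical pages processed in one run; Cyrillic anchor text is kept as-is.
pages = [
    ("https://example.com/a/", '<a href="file1.zip">Первый файл</a>'),
    ("https://example.com/b/", '<p><a href="/dl/file2.zip">Second <b>file</b></a></p>'),
]
rows = extract_links(pages)
print(to_csv(rows))
```

The resulting URL column can then be pasted into JD2, while the text column stays available in the CSV.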
  #5  
05.11.2019, 12:58
Jiaz
JD Manager

Join Date: Mar 2009
Location: Germany
Posts: 79,290

Quote:
Originally Posted by djmakinera View Post
LinkCrawler.linkcrawlerrules cannot save additional data (EXTRA DATA): "TextLink"
What do you mean by *additional data* ?
__________________
JD-Dev & Server-Admin
  #6  
05.11.2019, 13:41
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387

@Jiaz - This is not intended for downloading entire sites, only for extracting each URL together with its title and description.
  #7  
07.11.2019, 15:00
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387

Quote:
Originally Posted by Jiaz View Post
What do you mean by *additional data* ?

I tried the tool. It's quick, but look at the problem with Cyrillic :(
I don't know how to solve it.

**External links are only visible to Support Staff**


https://postimg.cc/K4t7hHV5
  #8  
08.11.2019, 12:31
djmakinera
Banned

Join Date: May 2010
Location: Poland
Posts: 8,387

You can't just open the file as UTF-8 or as Cyrillic (Windows-1251) and then save it as Windows-1251 or UTF-8.
If you do, the text file completely loses its correct encoding.

https://i.postimg.cc/LXy1scNt/Screen...t-11-23-AM.jpg
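
The symptom described above often appears when UTF-8 bytes are interpreted as Windows-1251, or the reverse, somewhere along the way. The thread does not say which encoding the exported file actually uses, so the following is only a sketch under the assumption that the bytes themselves are intact and were merely read with the wrong code page. The helper names `convert_bytes` and `repair_mojibake` are hypothetical.

```python
def convert_bytes(data, source_encoding="cp1251", target_encoding="utf-8"):
    """Re-encode raw bytes from one encoding to another without losing text."""
    return data.decode(source_encoding).encode(target_encoding)


def repair_mojibake(text, wrong="cp1251", right="utf-8"):
    """Undo a wrong decode: turn the text back into bytes, then decode properly."""
    return text.encode(wrong).decode(right)


original = "Привет"

# Simulate the fault: UTF-8 bytes read as if they were Windows-1251.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("cp1251")

# Reverse the faulty step to recover the original text.
repaired = repair_mojibake(garbled)
print(repaired == original)

# Converting a genuine Windows-1251 file body to UTF-8 is a single round trip.
cp1251_bytes = original.encode("cp1251")
converted = convert_bytes(cp1251_bytes)
```

If characters were already replaced by question marks when the file was saved, the original bytes are gone and this kind of repair cannot restore them.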
All times are GMT +2.