JDownloader Community - Appwork GmbH
 

  #1  
Old 20.01.2020, 20:00
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452
regex issue / analyze plain text, extract url

Please do not block my posts; I do not want to have to create new threads and post unnecessarily.

1. JD2 is not a great tool for analyzing links, especially links inside text. All in all, I think Raztoki said the same thing.
Add & Analyze Link does not ignore whitespace.

2. I can't add everything for analysis, because JD2 will always hang or run very slowly.

3. If I have the optimal regex for my engine, I will be able to extract all the links at maximum speed.

Try to understand. Don't lecture and instruct others. It's not nice, because it sounds like you know everything and I am just a layman.
  #2  
Old 20.01.2020, 20:03
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 54,803

I have told you that I will block your posts and you if you keep ignoring our rules and bothering our support!

You are wasting the time of our support staff and my time too.

We advised you to only post if you have specific questions and to provide all the info we need to help you.

-psp-
__________________
JD Supporter, Plugin Dev. & Community Manager

First Steps & Tutorials || JDownloader 2 Setup Download
Spoiler:

A user's JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
That's true James
Quote:
Originally Posted by James
People simply don't understand that just because you can also shoot at people with a gun, a shooting club is not a place for ideas about going on a rampage
  #3  
Old 20.01.2020, 20:26
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

Here I want to paste some text, but the tool cannot analyze the text to find the correct links. Also, a pattern without a protocol could lead to lockups when attempting to match URLs that contain literal parenthesis characters. What I mean is a pattern that attempts to match any sort of URL, using the extended multiline regex format that disregards literal whitespace. Can't I enter a regex here to parse the links more correctly?

Code:
Run JD2 -> LinkGrabber -> Add & Analyze Link
  #4  
Old 21.01.2020, 00:11
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,201

As far as I remember, I never said it does not ignore whitespace.
The Add Links dialog has its own link parser.
It's not designed to analyse text files for URLs (which would mean parsing files many MiB in size); it's more to strip out everything that's not a URL and then display the URLs. This allows end users to analyse URLs before adding them, and to remove content if they want.
I did say to make and use decrypter or LinkCrawler rules instead of doing this yourself, and to filter the content that you want. But you refuse. So we have come full circle again.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #5  
Old 21.01.2020, 03:08
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

raztoki, please see that it is defective!!!
Screenshot:
**External links are only visible to Support Staff**

JD2 does not ignore whitespace!

The detection mechanism should be improved so that it ignores whitespace.
E.g. use this regex, which ignores whitespace:

(https?://)?([\w\.]+)\.([a-z]{2,6}\.?)(/[\w\.]*)*/?

Of course this regex is still NOT PERFECT, because it also extracts things like:

Code:
nebes.html
1.po
ab.3...2916.4011..4234...0.0..0.83.393.5......0....1..gws
4yUUd4.jpg
...za
...bo
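
To see why the pattern above over-matches, here is a minimal Python sketch. It is not JDownloader's parser; the sample strings and the `extract` helper are made up for illustration. Because the scheme group `(https?://)?` is optional, any `name.ext` token counts as a "URL", and because `?` is in none of the character classes, a query string is cut off.

```python
import re

# The regex from the post above, unchanged.
URL_RE = re.compile(r"(https?://)?([\w\.]+)\.([a-z]{2,6}\.?)(/[\w\.]*)*/?")


def extract(text):
    """Return every substring the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in URL_RE.finditer(text)]


# A bare file name is reported as a link because the scheme is optional.
print(extract("see nebes.html now"))
# A normal URL with a path is matched in full.
print(extract("http://example.com/path/file.jpg"))
# The query string is lost: '?', '=' are not covered by the pattern.
print(extract("http://example.com/file.jpg?s=123"))
```

Making the scheme mandatory would remove the bare file names, at the cost of missing links that are written without `http://`.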
  #6  
Old 21.01.2020, 06:08
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,201

I tested the URL from your screenshot on Mac with TextEdit, but could not reproduce the automatic URL encoding in my JD2. I checked Settings > Advanced Settings > LinkgrabberSettings.addlinkspreparserenabled with it set to both true and false. Neither altered the URL.

I know that in the past Jiaz added each variation (encoded and non-encoded), but this led to many incorrect / offline instances.

With respect to %20 (the URL encoding of a standard space): it's made to do this because whitespace is a valid URL component. If it is within the URL, JD just URL-encodes it, as the specification states.

Within HTML, URLs are either already URL-encoded, OR placed within quote marks (' or "), which then allows everything enclosed.
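
As a rough illustration of the percent-encoding described above, here is a sketch using Python's standard `urllib.parse.quote`. It is not JDownloader's code; the URLs and the `encode_url` helper are placeholders. A space inside a URL becomes `%20`, and a trailing space is encoded too unless the input is trimmed first.

```python
from urllib.parse import quote


def encode_url(raw, strip=False):
    """Percent-encode unsafe characters; optionally trim whitespace first."""
    if strip:
        raw = raw.strip()
    # Keep ':' and '/' so the scheme and path separators survive.
    return quote(raw, safe=":/")


# Space inside the URL -> %20
print(encode_url("http://example.com/my file.jpg"))
# Trailing space is encoded as well, which changes the requested path
print(encode_url("http://example.com/file.jpg "))
# Trimming first leaves the URL untouched
print(encode_url("http://example.com/file.jpg ", strip=True))
```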
  #7  
Old 21.01.2020, 10:51
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

Whitespace at the end of the address, unless you are kidding, is not correct! Some addresses will not be parsed, you'll get a 404, and thanks for an analysis that doesn't work.
There was already a post in the forum about which characters are allowed and in which cases they are not.
Jiaz must look at this.
This problem did not start yesterday or today; it has been around for a long time.
If there were no such problem, I would not write and waste my time and yours, but the problem is serious.

So: if JD2 can't parse the text correctly, I have to do it in a third-party tool (editor) to extract the correct URLs without whitespace!
  #8  
Old 21.01.2020, 11:39
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

raztoki - create an address yourself, add a whitespace, and open Add & Analyze Link (not Clipboard!).
It adds unnecessary percent-encoding that can lead to a faulty address (one that doesn't exist).
  #9  
Old 21.01.2020, 14:36
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,201

I copied the URL from your screenshot exactly. The advanced setting should change the URL automatically (I assume within the display and not on continue); at least in my testing it didn't.

I also did further tests with the URL enclosed in quote marks, and I didn't find any changes there either, where I would expect to see changes.

Whitespace at the end of a URL: sure, I agree. But you also had text after the standard space (before the new line), which is considered a continuation of the URL in this case, and I assume Jiaz implemented it in a fashion that is valid and URL-encodes it. I can only go by what's happening on my system and draw assumptions based on history (at least what I recall of the Add Links dialog).

Finding URLs within HTML that complies with standards is a lot easier than within text with no standard.
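
A small sketch of why HTML is the easier case, using Python's standard `html.parser`. The `HrefCollector` class, the `links_from_html` helper and the sample markup are made up for illustration and say nothing about how JDownloader does it. In HTML the quote marks delimit the URL exactly, so a space inside it and the text after it cause no ambiguity.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_from_html(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# The quotes mark where the URL starts and ends, spaces included.
sample = '<p>see <a href="http://example.com/my file.jpg">this</a> text after</p>'
print(links_from_html(sample))
```

In plain text there is no such delimiter, so a parser has to guess whether "text after" still belongs to the URL.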

Last edited by raztoki; 21.01.2020 at 14:39.
  #10  
Old 21.01.2020, 15:56
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

There are two problems:

1. Truncated addresses, which I have already mentioned. Every time, I have to manually add the truncated &s=(\d+)
I'm sick of typing / correcting manually! It detects badly from the clipboard and I need no more convincing! This is a bug. I want to add the full address!

2. This error has been around since the beginning of JD2 and I don't see much improvement.
It doesn't even have to be text to make the link invalid!

3. I don't know what system you have. I have Windows 7, and this problem occurs on many computers: I have checked it with someone else and got the same error, so I don't understand why you are trying to suggest that there is no error.
There are hundreds of different "url regex" programs for specific cases, but here something needs to change, because every time I paste wrong links I have to fix them in the editor again and again. It's sick!

4. I've had billions of files in my life and I've never seen ANY (!) link with whitespace; there is always some character in its place. (But I'm talking about NORMAL (decoded) links and not "%" encoding.)
JD2 does not let me select Decode/Encode!!!
We live in the 21st century and I use the most popular addresses in the world, unless you use some Unicode, Russian, Chinese or who knows what else.
  #11  
Old 21.01.2020, 16:03
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

I would like to ask whether this is possible; it would certainly help solve many problems with adding, detection and analysis.
More advanced options, something like this:

Example advanced setting / feature for URLs:
Screenshot
**External links are only visible to Support Staff**
  #12  
Old 21.01.2020, 16:10
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

The log shows that JD2's analysis of some links is faulty.
It doesn't detect valid links, and I could find a hundred hosts where it doesn't detect them correctly.

21.01.20 15.48.31 <--> 21.01.20 16.07.43 jdlog://6799330900751/
  #13  
Old 21.01.2020, 16:14
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,452

Bug: "ADD_LINKS_DIALOG"

Screen:
**External links are only visible to Support Staff**

21.01.20 15.48.31 <--> 21.01.20 16.14.14 jdlog://9799330900751/
  #14  
Old 21.01.2020, 17:23
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 54,803

Quote:
Originally Posted by djmakinera View Post
There are two problems:

1. Truncated addresses, which I have already mentioned. Every time, I have to manually add the truncated &s=(\d+)
I'm sick of typing / correcting manually! It detects badly from the clipboard and I need no more convincing! This is a bug. I want to add the full address!
This is not a bug.
If your URL looks like e.g. "http:blabla.jpg?s=123", it might get truncated at ".jpg", as this is a file extension.

This is not a bug; this is simply how our parser works.
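
The truncation described here can be reproduced with a toy example. This is only a sketch of the general effect, not JDownloader's parser: both patterns, the extension list and the sample URL are assumptions made for illustration. A matcher that stops at the first known file extension drops the query string, while one that runs to the next whitespace keeps it.

```python
import re

# Hypothetical: stop at the first known file extension.
EXT_ANCHORED = re.compile(r"https?://\S+?\.(?:jpg|png|zip)")
# Hypothetical alternative: take everything up to the next whitespace.
UP_TO_SPACE = re.compile(r"https?://\S+")

url_text = "get http://example.com/blabla.jpg?s=123 please"

# Ends at ".jpg", so "?s=123" is lost.
print(EXT_ANCHORED.search(url_text).group(0))
# Keeps the query string.
print(UP_TO_SPACE.search(url_text).group(0))
```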
Quote:
Originally Posted by djmakinera View Post
Bug: "ADD_LINKS_DIALOG"
No, not a bug.

You are wasting our time!
If you want solutions for your special "problems", go on the Internet, hire a cheap freelancer and let them code whatever you like, e.g. via guru.com.

Closed!
All times are GMT +2.