JDownloader Community - Appwork GmbH
 

  #41  
Old 26.04.2019, 14:31
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

I tried a regular expression, but I see that it still matches a line like this:
Quote:
Про Нижнее бельё

^(?!www|https?:).*[^\!'(),.:;?="А-Яа-я]$

Example1:
Mój człowiek musi najpierw mnie zrozumieć. Wszystkie moje załamania i dziwactwa, które przy okazji mi wystarczają

Example2:
Mój człowiek musi najpierw mnie zrozumieć. Wszystkie moje załamania i dziwactwa, które przy okazji mi wystarczają. Zrozumcie moje hobby i zainteresowania. Dla mnie jest to bardzo ważne, gdy dana osoba jest zainteresowana moimi hobby, a przynajmniej stosuje swoją siłę, aby zrozumieć. Potrzebuję kogoś, kto będzie ze mną na tej samej częstotliwości!

Example3:
Mój człowiek musi najpierw mnie zrozumieć. Wszystkie moje załamania i dziwactwa, które przy okazji mi wystarczają. Zrozumcie moje hobby i zainteresowania. Dla mnie jest to bardzo ważne, gdy dana osoba jest zainteresowana moimi hobby, a przynajmniej stosuje swoją siłę, aby zrozumieć. Potrzebuję kogoś, kto będzie ze mną na tej samej częstotliwości?

Example4:
===
Potrzebuję kogoś, kto będzie ze mną na tej samej częstotliwości

Example5:
Mój człowiek musi najpierw mnie zrozumieć.
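A likely explanation for the match above, offered as a sketch rather than a confirmed diagnosis: the ranges `А-Я` and `а-я` cover U+0410 to U+044F, but `Ё` (U+0401) and `ё` (U+0451) lie outside them, and the quoted line ends in `ё`. The Python snippet below assumes Python's `re` module behaves like the tool used in this thread for these constructs; the names `original`, `with_yo` and `line` are illustrative.

```python
import re

# Pattern from the post above: the last character of the line must not be
# punctuation or a letter in the ranges А-Я / а-я.
original = re.compile(r'''^(?!www|https?:).*[^\!'(),.:;?="А-Яа-я]$''')

# Assumed fix: the same class with Ё (U+0401) and ё (U+0451) added explicitly.
with_yo = re.compile(r'''^(?!www|https?:).*[^\!'(),.:;?="А-Яа-я\u0401\u0451]$''')

# The quoted line; its last letter is written as an escape to make it explicit.
line = "Про Нижнее бель\u0451"

# ё (U+0451) comes after я (U+044F), so the range а-я does not contain it.
print(ord("\u0451") > ord("\u044f"))

# The original pattern still matches the line, the extended one does not.
print(original.search(line) is not None)
print(with_yo.search(line) is None)
```

Listing every letter by hand, as in the later post, has the same effect as adding the two missing letters to the ranges.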
Reply With Quote
  #42  
Old 26.04.2019, 17:31
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

And what is wrong with that? I don't see any difference between *such a word* and the rest of your text.
__________________
JD-Dev & Server-Admin
Reply With Quote
  #43  
Old 26.04.2019, 17:37
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

^(?!www|https?:).*[^\!'(),.:;?="аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ]{1}$

Something is wrong. With {1} it should test only the single last character at the end of the line, but it does not seem to.
Reply With Quote
  #44  
Old 26.04.2019, 18:01
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
And what is wrong with that? I don't see any difference between *such a word* and the rest of your text.
It is not the words but the characters that make the difference.
Reply With Quote
  #45  
Old 27.04.2019, 17:50
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Is there a regex to find all the sentences in a text?
A sentence ends with '.', '?' or '!'.

Enabled Regex:
[^.!?0-9]+[.!?]

[x]Enabled Count Matches


The regular expression is correct, but it should ignore all links (www, http and https).
Example:
**External links are only visible to Support Staff****External links are only visible to Support Staff**
Reply With Quote
  #46  
Old 29.04.2019, 16:07
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

Quote:
^(?!https?:).*(\.|!|,|,)\s*$
maybe like this
__________________
JD-Dev & Server-Admin
Reply With Quote
  #47  
Old 29.04.2019, 19:03
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
maybe like this
Unfortunately, that expression is incorrect.
It counts 2 matches; in this case it should count 10.

https://i.postimg.cc/DZcfKVhM/Screen...t-07-00-PM.jpg
Reply With Quote
  #48  
Old 29.04.2019, 19:36
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

There are only 2 lines in your examples, see 1 and 2 on the left. so only 2 matches
__________________
JD-Dev & Server-Admin
Reply With Quote
  #49  
Old 29.04.2019, 20:33
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
There are only 2 lines in your examples, see 1 and 2 on the left. so only 2 matches
Quote:
Originally Posted by djmakinera View Post
Is there a regex to find all the sentences in a text?

See my regex:
Here it counts 10 matches plus 2 more from the links, which should not be counted.
So there is still something to improve.

https://i.postimg.cc/bNsJZHtV/Screen...t-08-31-PM.jpg
Reply With Quote
  #50  
Old 30.04.2019, 10:12
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

But a sentence doesn't end with a comma, does it?
You already have a working pattern, so why ask for help?
__________________
JD-Dev & Server-Admin
Reply With Quote
  #51  
Old 30.04.2019, 12:32
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
But a sentence doesn't end with a comma, does it?
You already have a working pattern, so why ask for help?
A sentence ends with '.', '?' or '!'.

You misunderstood. The commas there only separated the items in that list.
The regex does not work properly because it also matches characters inside links, even a dot inside a link.
And links are not sentences!
Reply With Quote
  #52  
Old 30.04.2019, 19:12
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

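/^(www|https?:\/\/)?
An optional prefix: www, http:// or https://

([\da-z\.-]+)\.([a-z\.]{2,6})
Digits, lowercase letters, dots or hyphens, then a dot, then two to six lowercase letters or dots

([/\w\.-]*)\/?$/
Slashes, letters, numbers, underscores, dots or hyphens, then an optional trailing slash

[^.!?0-9]+[.!?]
Counts the sentence matches

?!
Negative lookahead

This does not work for me: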
[^.!?0-9]+[.!?]/^(?!www|https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
Reply With Quote
  #53  
Old 30.04.2019, 20:03
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

The pattern is invalid. For example, in
(?!www|https?:\/\/)?
you can't use a negative lookahead and then make it optional with ?
__________________
JD-Dev & Server-Admin
Reply With Quote
  #54  
Old 30.04.2019, 21:00
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

I have corrected it, but the regular expression is still not correct.

Code:
ERROR
The complexity of matching the regular expression exceeded predefined bounds.  Try refactoring the regular expression to make each choice made by the state machine unambiguous.  This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
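One plausible trigger for this error, assuming the corrected pattern still ended in the nested group `([\/\w \.-]*)*` from the earlier post: a starred group whose body is itself starred lets the engine split the same run of characters in very many ways before it gives up. The sketch below uses Python's `re` purely for illustration and shows that a single star accepts the same strings; `nested`, `flat` and the sample strings are illustrative.

```python
import re

# Tail of the pattern from the earlier post: a starred class inside a starred group.
nested = re.compile(r'^([/\w .-]*)*/?$')

# Flattened form: one star over the same character class.
flat = re.compile(r'^[/\w .-]*/?$')

# Short samples only: the nested form can backtrack heavily on long,
# non-matching input, which is the behaviour the error message warns about.
samples = ["/path/to/file.html", "", "/a b!", "index.php?id=1"]

for s in samples:
    # Both forms accept or reject each sample identically.
    print(repr(s), bool(nested.match(s)), bool(flat.match(s)))
```

Dropping the outer `*` removes the ambiguity the message asks to refactor away without changing which strings are accepted.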
Reply With Quote
  #55  
Old 01.05.2019, 00:03
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

The expression still needs to be corrected:
1. It must not match links (they contain the characters . ! ?).
The ?! negative lookahead does NOT work.

2. It must not treat initials in names as sentences, for example:
Меары А. С. Одинцой
A sentence must have at least two characters.

3. It must ignore Cyrillic.

[^.!?0-9\p{Cyrillic}]+[.!?](?!https?:\/\/(www\.)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*))
Reply With Quote
  #56  
Old 02.05.2019, 17:22
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

Negative Lookahead works fine but you cannot use it in combination with ?
Quote:
(?!https?:\/\/(www\.) is correct
(?!https?:\/\/(www\.)? is invalid. you specify that this text may not be present and yet want it optional with ?
__________________
JD-Dev & Server-Admin
Reply With Quote
  #57  
Old 02.05.2019, 17:23
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

You should really start to learn coding in a language and not try to solve all problems with regex!
__________________
JD-Dev & Server-Admin
Reply With Quote
  #58  
Old 02.05.2019, 17:39
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

It's powerful, it just can't solve all your queries ;p
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
Reply With Quote
  #59  
Old 03.05.2019, 16:26
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

I do not need to know everything, and I see nothing wrong with that. In some cases a regular expression may work, just perhaps not in mine. Similar expressions can be found on other forums, and there everything works fine, but not in my case. So I will have to accept that links are counted as sentences; there is no other solution.
Reply With Quote
  #60  
Old 04.05.2019, 03:45
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
(?!https?:\/\/(www\.) is correct
The regular expression contains mismatched '(' and ')'.


(?!http|https):\/\/(www\.)[\w\-_]+(\.[\w]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])

Code:
/ An unescaped delimiter must be escaped with a backslash (\)
The regular expression is correct, but regex101.com reports a pattern error.

?! does not ignore links; instead of being ignored, they are still matched!
Reply With Quote
  #61  
Old 04.05.2019, 08:13
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

Are you sure you just want to ignore the http or https part, i.e. only the protocol?
You would be better off with code, as Jiaz indicated. I would personally recommend a bash-like script, where you can run multiple regular expressions one after another (unlike in most text/word processors).

for example cat \file\text | grep patternexpression1 | grep patternexpression2
This lets you pre-filter the text and then apply additional patterns to find what you want. You can even write the findings to files and parse them multiple times if you need different outcomes.
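The multi-pass idea above can be sketched in Python as well: remove anything that looks like a link in a first pass, then count sentences in what is left. This is a minimal sketch under assumptions: the `URL` pattern (anything starting with `http://`, `https://` or `www.` up to the next whitespace) and the function name `count_sentences` are illustrative, and the sentence pattern is the one quoted earlier in the thread.

```python
import re

# Pass 1: anything that looks like a link (assumed shape, see the note below).
URL = re.compile(r'(?:https?://|www\.)\S+')

# Pass 2: the sentence pattern quoted earlier in the thread.
SENTENCE = re.compile(r'[^.!?0-9]+[.!?]')

def count_sentences(text):
    """Remove links first, then count sentence matches in what is left."""
    without_links = URL.sub(' ', text)
    return len(SENTENCE.findall(without_links))

sample = (
    "Mój człowiek musi najpierw mnie zrozumieć. "
    "Zobacz https://example.com/index.php?id=1 teraz! "
    "Potrzebuję kogoś?"
)

# Count with the pre-filter, then the count of the sentence pattern alone,
# which also picks up the dots and the question mark inside the link.
print(count_sentences(sample))
print(len(SENTENCE.findall(sample)))
```

A link followed directly by a full stop would swallow that full stop, so real text may need a stricter `URL` pattern.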

raztoki
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
Reply With Quote
  #62  
Old 04.05.2019, 13:05
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

The pattern is wrong because it ignores only the protocol; it must be changed to exclude the entire address.
Reply With Quote
  #63  
Old 04.05.2019, 15:11
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

yes i gathered, hence my question. since you know the answer then you should be able to fix the expression.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
Reply With Quote
  #64  
Old 04.05.2019, 17:10
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by raztoki View Post
yes i gathered, hence my question. since you know the answer then you should be able to fix the expression.
I have changed the regex. It now ignores the links, but the sentence selection is wrong, and on some lines it does not mark the whole text.

(?!(http|https?://|**External links are only visible to Support Staff**ftp://|www\.|[^\s:=]+@www\.).*?[a-z_\/0-9\-\#=&])(?=(\.|,|;|\?|\!)?("|'|«|»|\[|\s|\r|\n|$))[^.!?0-9]+[.!?]
Reply With Quote
  #65  
Old 04.05.2019, 17:21
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,614
Default

you are on the right track with encasing, but you have now introduced more issues. anyway I'm not providing you with any assistance with regular expressions. I'm glad you're learning though!
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
Reply With Quote
  #66  
Old 05.05.2019, 17:55
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

(?<!")😊(?!")|(?<!"(?=😊"))😊|😊(?!"(?<="😊"))|(?<!")😊|😊(?!")|😊(?!"(?:(?:[^"]*"){2})*[^"]*)|(?:"😊".*?)*\k😊|(?:(?>{[^}]*?})[^{}]*?)*\k😊
Reply With Quote
  #67  
Old 05.05.2019, 22:24
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

As far as I know, almost the same question has been asked many times on the programming forum stackoverflow.com, but there is no good solution.
"Nothing you do will be perfect." The aim is to reduce the error rate as much as possible: run the program on a large set of texts and add exceptions until you reach an acceptable level of error. However, if you need more than dozens of rules, you'll probably just want to rethink the problem.

Step1:

Find sentences that end with . ! or ?
Example sentence:
!
Code:
Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie!
-or-
.
Code:
Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie.
-or-
?
Code:
Gdy patrzę na świat, to jest tak piękne i straszne w tym samym czasie?

Step2:
A dot must NOT count as the end of a sentence when it follows a number at the beginning of a line:
0. (ANY NUMBER + DOT)
5. (ANY NUMBER + DOT)
156. (ANY NUMBER + DOT)
This applies only at the beginning of a line; everywhere else a number followed by a dot is acceptable.

Step3:
All languages of the world are allowed, except for Russian.

Step4:
Add a search exception for any links (URLs); ignore them completely.

Step5:

Detect a sentence boundary when a sentence ends with "three dots", "three exclamation marks" or "three question marks" and the next one begins with a capital letter:
Example:
Code:
Jestem w innym świecie... W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.
Code:
Jestem w innym świecie!!! W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.
Code:
Jestem w innym świecie??? W świecie o innej kulturze, języku, tradycjach, architekturze, przyrodzie, kuchni, pogodzie.
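Steps 2 and 5 can be sketched in a single pattern, assuming Python's `re` module: an optional `number + dot` prefix is absorbed at the start of a line, and a run of closing marks such as `...`, `!!!` or `???` ends one sentence. Steps 3 and 4 (Cyrillic and links) are deliberately left out, and the names `SENTENCE` and `sentences` are illustrative.

```python
import re

# Optional "12. " numbering at the start of a line, then the sentence text,
# then one or more closing marks so that "...", "!!!" and "???" end a
# single sentence instead of three.
SENTENCE = re.compile(
    r'(?:^\d+\.[ \t]*)?[^\s.!?][^.!?\n]*[.!?]+',
    re.MULTILINE,
)

def sentences(text):
    """Return every sentence match as a string."""
    return [m.group(0) for m in SENTENCE.finditer(text)]

text = (
    "5. Gdy patrzę na świat!\n"
    "Jestem w innym świecie... W świecie o innej kulturze."
)

for s in sentences(text):
    print(s)
```

The numbered line comes out as one sentence including its number, and the line with three dots comes out as two sentences. Abbreviations and decimal numbers are not handled here.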
Reply With Quote
  #68  
Old 06.05.2019, 01:47
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Step2:
Regex ignores URLs and finds the sentences.
Code:
^(?!\d+\.).*[.!?]$
What still remains to be solved is the numbering of some sentences (only at the beginning of a line). Screenshot: https://postimg.cc/TywFXf2z

----------------------------------
At the moment I only have this workaround, and the two expressions work only separately.
I still need a solution in which the numbering at the beginning of a line is treated as part of the whole sentence,
so that sentences match both with and without numbering.

The word "KONIEC" marks the end of a text and is followed by the separator "=".

Code:
^(?!\d+\.)|(?!KONIEC).*[.!?]$
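One thing worth checking in the pattern above is operator precedence: `|` binds more loosely than everything else, so the pattern splits into `^(?!\d+\.)` on one side and `(?!KONIEC).*[.!?]$` on the other. The Python sketch below illustrates the effect; `posted` and `grouped` are illustrative names, and putting both conditions into one lookahead is an assumption about the intent.

```python
import re

# Pattern as posted. The top-level | yields two independent alternatives:
#   ^(?!\d+\.)            an empty match at the start of a non-numbered line
#   (?!KONIEC).*[.!?]$    no ^ anchor and no numbering check
posted = re.compile(r'^(?!\d+\.)|(?!KONIEC).*[.!?]$')

# Both conditions inside one lookahead, so they apply to the same line start.
grouped = re.compile(r'^(?!\d+\.|KONIEC).*[.!?]$')

for line in ["Tekst.", "1. Tekst.", "KONIEC."]:
    m_posted = posted.match(line)
    m_grouped = grouped.match(line)
    print(
        repr(line),
        repr(m_posted.group(0)) if m_posted else None,
        repr(m_grouped.group(0)) if m_grouped else None,
    )
```

If numbered lines should count as sentences too, the `\d+\.` branch can be dropped from the lookahead, leaving `^(?!KONIEC).*[.!?]$`.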
Reply With Quote
  #69  
Old 07.05.2019, 11:48
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Update:
New pattern to ignore spaces before the selection.

Basically, a sentence ends with ".", "?" or "!", OR it begins a line with a number + "." and ends with ".", "?" or "!".

Screenshot:
https://postimg.cc/hzzVcNh2

The only remaining problem is that the regular expression does not ignore the characters in links. That error is still not solved.
Reply With Quote
  #70  
Old 22.05.2019, 14:58
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

@Jiaz - Can you correct the pattern so that it does not detect the characters
. ! ? in links?

Spoiler:
(\S+\.(com|net|org|edu|gov|ru|pl)(\/\S+)?)|((^\d+\..*?|[^\s].*?)(\.\.\.|[\.?!]))
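In code, as opposed to an editor's match counter, the two branches of this pattern can be told apart after matching: group 1 is set when the link branch matched and group 4 when the sentence branch matched. A minimal Python sketch that counts only the sentence branch follows; `PATTERN`, `count_sentences` and the sample text are illustrative. It only helps when a link starts its own match and does not sit in the middle of a sentence.

```python
import re

# Pattern from the post above, unchanged: group 1 is a link, group 4 a sentence.
PATTERN = re.compile(
    r'(\S+\.(com|net|org|edu|gov|ru|pl)(\/\S+)?)'
    r'|((^\d+\..*?|[^\s].*?)(\.\.\.|[\.?!]))',
    re.MULTILINE,
)

def count_sentences(text):
    """Count only the matches in which the sentence branch (group 4) took part."""
    return sum(1 for m in PATTERN.finditer(text) if m.group(4) is not None)

sample = "Pierwsze zdanie. www.example.com/a.html Drugie zdanie!"

# Sentences only, then every match including the link.
print(count_sentences(sample))
print(len(list(PATTERN.finditer(sample))))
```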
Reply With Quote
  #71  
Old 23.05.2019, 11:43
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

^\d+\..*? -> .*? -> you allow everything
^\d+\.[^\.!\?]*?

[^\s].*? -> .*? -> you allow everything
[^\s\.!\?]*?
__________________
JD-Dev & Server-Admin
Reply With Quote
  #72  
Old 23.05.2019, 14:42
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
^\d+\..*? -> .*? -> you allow everything
^\d+\.[^\.!\?]*?

[^\s].*? -> .*? -> you allow everything
[^\s\.!\?]*?

(^\d+\..*?|.*?)(\.\.\.|[\.?!]). Basically, a sentence ends with a ".?!" OR a sentence begins a line with a number + "." and ends with ".?!"

-or-
This works better: (^\d+\..*?|[^\s].*?)(\.\.\.|[\.?!]) to ignore spaces before the selection.

I've tested. Unfortunately, this pattern is incorrect because it matches the links as sentences.

See screenshot:
https://i.postimg.cc/SRqZRvs9/Screen...t-02-37-PM.jpg
Reply With Quote
  #73  
Old 23.05.2019, 15:45
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

Please understand that I simply don't have the time to help you with your pattern. If you want to do this all within a single pattern, then you have to learn it and learn to write more complex patterns, like
emailregex.com
__________________
JD-Dev & Server-Admin
Reply With Quote
  #74  
Old 23.05.2019, 16:11
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Anyway, thanks for partial help.
This question was surprisingly difficult to find an answer for. The regexes I found were too complicated to understand, and anything more than a regex is overkill and too difficult to implement.
Reply With Quote
  #75  
Old 23.05.2019, 16:48
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

Quote:
Originally Posted by Jiaz View Post
like
emailregex.com
A filter, e.g. on a name or a package: which regex engine does it use? Because none of the ready-made "E-MAIL" patterns works :D
Reply With Quote
  #76  
Old 24.05.2019, 17:09
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

Quote:
Originally Posted by djmakinera View Post
A filter, e.g. on a name or a package: which regex engine does it use? Because none of the ready-made "E-MAIL" patterns works :D
normal pattern/regex. no *engine*.
__________________
JD-Dev & Server-Admin
Reply With Quote
  #77  
Old 17.06.2019, 10:26
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

How do I reverse the order of numbers separated by a comma?

Example: 01,03 -> 03,01
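For a pair like the one in the example, a regex with two capture groups and swapped backreferences is enough. The sketch below assumes Python; many editors accept the same idea with `\2,\1` or `$2,$1` as the replacement, depending on their syntax. The function names `reverse_pair` and `reverse_list` are illustrative.

```python
import re

def reverse_pair(text):
    """Swap two comma-separated numbers by exchanging the capture groups."""
    return re.sub(r'(\d+),(\d+)', r'\2,\1', text)

def reverse_list(text):
    """Reverse any number of comma-separated items without a regex."""
    return ",".join(reversed(text.split(",")))

print(reverse_pair("01,03"))
print(reverse_list("01,03,07"))
```

The regex version handles exactly two numbers per match; for longer lists the split-and-join version is the simpler tool.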
Reply With Quote
  #78  
Old 17.06.2019, 11:03
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

There comes a time you should learn some sort of coding language and not try to achieve everything with regex.
__________________
JD-Dev & Server-Admin
Reply With Quote
  #79  
Old 17.06.2019, 16:05
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
Default

There are plenty of such tools for Unix on the net, but I do not see anything for Windows. Besides, these scripts and commands also take time: writing the patterns takes as long as making the changes manually. I do not see how this would make the swap any easier; to convert 20 numbers I would have to spend a few minutes or more each time, and doing it manually is much faster.
Reply With Quote
  #80  
Old 17.06.2019, 16:31
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,522
Default

Ever thought about regex NOT being the answer for all of your *stuff*?
__________________
JD-Dev & Server-Admin
Reply With Quote
All times are GMT +2.