JDownloader Community - Appwork GmbH
 

  #1  
Old 08.04.2019, 12:24
djmakinera djmakinera is offline
Banned
 
Join Date: May 2010
Location: Poland
Posts: 8,387
pattern issue

I wrote the regex below, but it does not work for me.
How can I find titles written only in Russian, possibly mixed with various Unicode punctuation? Each match must contain at least one Russian (Cyrillic) character.
It seems to be complicated and confusing.


^([\p{Cyrillic}]+[\-\.\,\!\…\?\(\)\„\”—0-9\s]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”—0-9\s]*|[\-\.\,\…\?\(\)\„\”—0-9]+[\p{Cyrillic}]+[\p{Cyrillic}\-\.\,\!\…\?\(\)\„\”—0-9\s]*)$

Characters the text may contain:
\p{Cyrillic}
!
?
… (unicode)
— (unicode)
-
.
..
...
,
0-9
(
)
„ (unicode)
” (unicode)
\s (space)
\
/
\x{200b}
*
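A minimal sketch of the stated goal (a whole line built from the allowed characters, with at least one Cyrillic character), written in Python as an assumption since the thread names no language. Python's built-in `re` module does not accept `\p{Cyrillic}`, so the range U+0400 to U+04FF stands in for it here. The allowed set is copied from the character class in the pattern above, and the name `is_cyrillic_title` is hypothetical.

```python
import re

# Assumption: the basic Cyrillic block U+0400-U+04FF stands in for the
# Cyrillic script property, which Python's built-in re does not support.
CYRILLIC = "\u0400-\u04FF"

# Digits, whitespace and punctuation taken from the character class in the post.
ALLOWED = CYRILLIC + r"0-9\s\-.,!?()…„”—"

# The lookahead demands at least one Cyrillic character somewhere on the line;
# the character class then restricts every character to the allowed set.
TITLE = re.compile(rf"^(?=.*[{CYRILLIC}])[{ALLOWED}]+$")


def is_cyrillic_title(line):
    """Return True if the line uses only allowed characters and has Cyrillic."""
    return TITLE.match(line) is not None
```

The lookahead replaces the two long alternatives of the original pattern: instead of spelling out where the Cyrillic run may sit, it only checks that one exists. A line made of digits, dashes and dots alone is rejected because the lookahead fails.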
  #2  
Old 09.04.2019, 18:05
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,286

Please understand that this is neither a *How do I learn regex* forum nor a *Regex is the solution for everything* forum. Maybe you should start thinking about learning to code rather than relying only on (huge, complex) regexes.
__________________
JD-Dev & Server-Admin
  #3  
Old 09.04.2019, 18:35
djmakinera djmakinera is offline

I want to match at least one Cyrillic character; otherwise the pattern would also match a long string of digits, dashes, dots, and so on.
I do not want to match everything, only the characters I specified.
:confused:


^ = match start of string (or line, in multiline mode)
[] = match character in a set
\p{Cyrillic} = match a Cyrillic character
\- = match a literal -
\. = match a literal .
+ = match previous element one or more times.
* = match previous element zero or more times.
$ = match end of string (or line, in multiline mode)
  #4  
Old 09.04.2019, 19:07
Jiaz Jiaz is offline

As always, it's much easier if you also provide examples!
  #5  
Old 09.04.2019, 19:18
djmakinera djmakinera is offline

**External links are only visible to Support Staff****External links are only visible to Support Staff**
  #6  
Old 10.04.2019, 11:07
Jiaz Jiaz is offline

Thanks, and what exactly do you want to achieve? What do you want to find?
  #7  
Old 12.04.2019, 07:14
djmakinera djmakinera is offline

What can I say, it is just text.
  #8  
Old 15.04.2019, 12:36
Jiaz Jiaz is offline

I'm sorry, but I still haven't understood what you're trying to achieve with the pattern.
  #9  
Old 15.04.2019, 14:35
djmakinera djmakinera is offline

Find text on a single line that contains at least one Cyrillic character plus other symbols (but without matching literally everything).
  #10  
Old 15.04.2019, 14:58
Jiaz Jiaz is offline

I'm sorry, but I don't understand. Finding at least one Cyrillic character should be possible via \p{Cyrillic},
but then you lost me. What do you mean by *in one line*?
If you only want to work line by line, then don't enable dotall or multiline.
  #11  
Old 15.04.2019, 15:17
djmakinera djmakinera is offline

That only matches the Cyrillic characters; it does not match the entire title.
Besides, I want to match only the title, not the author's name or the text.
E.g.:

Line1: Title
Line2: The name of the author
Line3: Other name or any text.
Line4: Blank line (not always) or separator "=" (always)
Line5: Another title
Line6: Another name of the author

The following lines continue in the same way.
  #12  
Old 15.04.2019, 15:22
Jiaz Jiaz is offline

so basically a complete line with a minimum of one Cyrillic character?
  #13  
Old 15.04.2019, 15:42
djmakinera djmakinera is offline

Quote:
so basically a complete line with a minimum of one Cyrillic character?
Yeah.

Example:
Line1: **External links are only visible to Support Staff****External links are only visible to Support Staff**
Line2: Дневник настроения. 25 апр. Первая клубника
Line3: Гасник Ирина Александровна
Line4: Dziennik mojego nastroju. 25 kwietnia. Pierwsza truskawka
Line5: Cześć, mój wierny czytelniku!

Line 1 + 2: Find & Extract
Line 3,4,5: Ignore
  #14  
Old 15.04.2019, 19:24
Jiaz Jiaz is offline

I'm sorry, but what's the difference between 2 and 3? If you always have the same format (line 1 up to line 5), why not simply keep lines 1 and 2 and ignore the rest? Why is the pattern with Cyrillic required?
  #15  
Old 15.04.2019, 20:24
djmakinera djmakinera is offline

Because I have over 2000 lines, not 5, and I want to extract the URLs and titles. If I simply sort the text into URLs and other text, I lose the order of the titles, each of which is always stored on the line directly below its URL.

The order, from top to bottom:
Line: URL
Next line: Title
  #16  
Old 16.04.2019, 11:07
Jiaz Jiaz is offline

wtf. I don't understand again. First you say *keep Line 1+2* and *ignore 3-5*, now you say you have 2000 lines?
How do you want to extract titles without knowing which lines are titles?
  #17  
Old 16.04.2019, 14:18
djmakinera djmakinera is offline

It seems to be a complicated task, but it can be done.
Let's say I have a separator. Does that change anything in this situation?
Separator (example:===)
URL
Title
Name of the Author
Text
Next Separator (example:===)
Next URL
Next Title
Next Name of the Author
Next Text

The rest repeats in the same way.
  #18  
Old 16.04.2019, 16:40
Jiaz Jiaz is offline

In that case it's simple. Create a pattern that matches a line full of = (the separator) and then use the next two lines, that's it.
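The approach described here (find the separator, then take the next two lines) can be sketched without a capturing pattern at all. This is only an illustration in Python, assuming a separator line consists solely of `=` characters and that the URL and title are the first two non-empty lines of each record; `extract_url_and_title` is a hypothetical name.

```python
import re


def extract_url_and_title(text):
    """Return (url, title) pairs from records separated by lines of '='."""
    pairs = []
    # Split wherever a whole line consists only of '=' characters.
    for record in re.split(r"(?m)^=+[ \t]*$", text):
        # Ignore blank lines so an empty line after the separator is harmless.
        lines = [ln.strip() for ln in record.splitlines() if ln.strip()]
        if len(lines) >= 2:
            pairs.append((lines[0], lines[1]))
    return pairs
```

Because only the first two lines of each record are used, it does not matter how many lines the author name and text occupy afterwards.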
  #19  
Old 16.04.2019, 19:27
djmakinera djmakinera is offline

But the text can span multiple lines, so it's still not easy. Besides, with this regular expression almost everything matches, instead of only the specific thing I am looking for!
  #20  
Old 17.04.2019, 11:16
Jiaz Jiaz is offline

I'm sorry. Please don't change it all the time.
First you explain the lines are
URL
Title
Name
Text
===================(Separator)
URL
Title
Name
Text
===================(Separator)
....

and now you say *but the text can contain multilines*?

Multiple lines don't matter as long as there is a separator that you can match on.
Without a unique separator, how do you expect to separate them?
  #21  
Old 17.04.2019, 13:37
djmakinera djmakinera is offline

You can consider it,
but nobody knows how to do it anyway.
  #22  
Old 17.04.2019, 16:00
Jiaz Jiaz is offline

Please provide an example file with 100 lines or so and send it to support@jdownloader.org
  #23  
Old 17.04.2019, 18:00
djmakinera djmakinera is offline

Ticket ID: LMN-474-68187
  #24  
Old 18.04.2019, 11:37
Jiaz Jiaz is offline

Tested and works fine in regex101
Quote:
(?:===\s*|^)(https?:.*?)(?:[\r\n]+)(.*?)(?:[\r\n]+)
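For readers who want to try the quoted pattern outside regex101, here is a small Python sketch. The multiline flag is an assumption (the post does not say which flags were set), and the sample text is made up to follow the separator layout described earlier in the thread.

```python
import re

# Pattern quoted from the post; re.MULTILINE is an assumption so that ^ can
# also match at the start of every line, not only at the start of the text.
URL_AND_TITLE = re.compile(
    r"(?:===\s*|^)(https?:.*?)(?:[\r\n]+)(.*?)(?:[\r\n]+)",
    re.MULTILINE,
)

# Made-up sample in the layout: separator, URL, title, author, text.
sample = (
    "===\n"
    "https://example.com/1\n"
    "First title\n"
    "First author\n"
    "Some text\n"
    "===\n"
    "https://example.com/2\n"
    "Second title\n"
    "Second author\n"
)

# Group 1 is the URL line, group 2 is the line that follows it.
pairs = URL_AND_TITLE.findall(sample)
for url, title in pairs:
    print(url, "|", title)
```

Each match consumes the URL line, the title line and the line break after it, so the author and text lines are skipped until the next separator or URL.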
  #25  
Old 18.04.2019, 12:07
djmakinera djmakinera is offline

Quote:
Originally Posted by Jiaz View Post
Tested and works fine in regex101
Thanks for the help.
Unfortunately, there is a problem with the expression.
In this case it selects lines 383 and 384,
but it does not mark the next line, 385.
The same happens with other lines.
See screenshot:
https://postimg.cc/m1s2T5Rk

Engines used:
Perl Regex, Regex++
Onigmo
  #26  
Old 18.04.2019, 12:10
djmakinera djmakinera is offline

Hm, maybe add one more part to the expression
so that it includes 3 lines:

(?:===\s*|^)(https?:.*?)(?:[\r\n]+)(.*?)(?:[\r\n]+)(.*?)(?:[\r\n]+)
  #27  
Old 18.04.2019, 12:26
Jiaz Jiaz is offline

It's up to you if you want to include 1 or 2 lines, but now you have a working pattern
  #28  
Old 18.04.2019, 12:40
djmakinera djmakinera is offline

Thanks, Jiaz, for taking the time to help.
  #29  
Old 18.04.2019, 15:39
Jiaz Jiaz is offline

You're welcome. I can help better and faster if you always provide real examples first and tell what you really want to achieve
  #30  
Old 21.04.2019, 02:26
djmakinera djmakinera is offline

New question...


OK:
Find the missing dot at the end of the line in the text.

Missing dot:
Find the missing dot at the end of the line in the text


Do not check for a dot at the end of these lines:
I want to ignore any URLs
I want to ignore any Cyrillic text

Something is wrong here:
[^\.]\r\n(!?(http|https):\/\/[\w\-_]+(\.[\w]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)+[А-Яа-я])
It should look only at text in Polish or English (normal text), without Unicode.
  #31  
Old 21.04.2019, 14:44
djmakinera djmakinera is offline

This needs some correction:
\r\n(?!http|www|А-Яа-я).+[^\.]
  #32  
Old 24.04.2019, 20:27
Jiaz Jiaz is offline

Sorry, but I don't understand. Can you please provide a new example file and tell me exactly what you want to achieve? I don't understand your *OK* and *Missing dot*.
  #33  
Old 25.04.2019, 10:46
djmakinera djmakinera is offline

I want to find sentences in which there is a missing dot at the end of the line.
This regular expression searches for sentences with a dot at the end of the line, but I want it to find the missing dot.

\.{1,}$

Besides, it must ignore the links in the text, because a link cannot have a dot at the end of the line.

**External links are only visible to Support Staff****External links are only visible to Support Staff**
Обо мне
O mnie
1. Witamy na stronach mojego bloga, który staram się uczynić najbardziej
  #34  
Old 25.04.2019, 15:13
Jiaz Jiaz is offline

how about
Quote:
.*[^\.]$
  #35  
Old 25.04.2019, 15:13
Jiaz Jiaz is offline

So you want all lines without dot at the end, but no links?
  #36  
Old 25.04.2019, 15:28
djmakinera djmakinera is offline

Quote:
Originally Posted by Jiaz View Post
So you want all lines without dot at the end, but no links?
Yes. No links.

Other special characters can be excluded too,
and quotation marks are handled by adding the escape \ before the " in the string:
.*[^"\.?!:=»А-Яа-я"]$
  #37  
Old 25.04.2019, 19:59
Jiaz Jiaz is offline

check this
Quote:
^(?!https?:).*[^\.]$
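A short Python sketch of how the quoted pattern behaves when it is applied to one line at a time. Splitting the text into lines first is a choice made for this illustration, not something the post prescribes, and `lines_without_final_dot` is a hypothetical helper name.

```python
import re

# Pattern quoted from the post, applied to one line at a time.
NO_FINAL_DOT = re.compile(r"^(?!https?:).*[^\.]$")


def lines_without_final_dot(text):
    """Return non-URL lines whose last character is not a dot."""
    return [line for line in text.splitlines() if NO_FINAL_DOT.match(line)]
```

The negative lookahead `(?!https?:)` rejects lines that start with a link before anything else is tried, and `[^\.]$` requires the last character to be something other than a dot. Empty lines are not reported because `[^\.]` needs one character to match.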
  #38  
Old 25.04.2019, 20:23
djmakinera djmakinera is offline

It works, but I still have a question: if I want to add other special characters, in what order do they have to be listed?
Quote:
The regular expression contains an invalid character.

https://i.postimg.cc/9Fhz3PCm/Screen...t-12-57-PM.jpg
  #39  
Old 25.04.2019, 21:22
djmakinera djmakinera is offline

Characters to recognize:
!#$%&'()*+,-./:;=?@\^_~

Characters not allowed at the end of a line:
!'(),.:;?
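One possible reading of this list, sketched in Python: report every non-URL line whose last character is not one of the listed characters. That interpretation is an assumption, since the thread never settles it, and `lines_missing_terminator` is a hypothetical name. Building the character class with `re.escape` sidesteps the question of ordering and the "invalid character" error, because every character that is special in a regex gets escaped automatically.

```python
import re

# Characters from the post above; a line ending in one of these is skipped.
TERMINATORS = "!'(),.:;?"

# re.escape neutralises any character that has a special meaning in a regex,
# so the order of the characters inside the class does not matter.
LINE_WITHOUT_TERMINATOR = re.compile(
    r"^(?!https?:).*[^" + re.escape(TERMINATORS) + r"]$"
)


def lines_missing_terminator(text):
    """Return non-URL lines that do not end with one of TERMINATORS."""
    return [ln for ln in text.splitlines() if LINE_WITHOUT_TERMINATOR.match(ln)]
```

When a class is written by hand, the characters that usually need care are `]`, `\`, `^` (when first) and `-` (when between two other characters); everything else can appear in any order.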
  #40  
Old 26.04.2019, 14:23
Jiaz Jiaz is offline

Can you provide an example again?
You want complete lines without
Quote:
!'(),.:;?
All times are GMT +2.