JDownloader Community - Appwork GmbH
 

  #21  
Old 25.04.2010, 05:04
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,437
Default Why Hotfile Asks for reCaptcha Only Sometimes

It has to do with the size of the file.

Hotfile only issues the reCaptcha challenge for files smaller than about 200,000,000 bytes = 190.73 MiB (the cutoff is > 190.72 and < 190.74 MiB).
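The conversion above can be checked with a few lines of Python. This is only an arithmetic sketch: the 200,000,000-byte figure is the approximate cutoff observed in this post, not a documented Hotfile value, and the names `BYTES_PER_MIB` and `bytes_to_mib` are made up for illustration.

```python
# 1 MiB is 1024 * 1024 bytes.
BYTES_PER_MIB = 1024 ** 2


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a size in bytes to mebibytes (MiB)."""
    return n_bytes / BYTES_PER_MIB


# Approximate cutoff observed in the post (an observation, not a documented limit).
threshold_mib = bytes_to_mib(200_000_000)
print(f"{threshold_mib:.2f} MiB")
```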

Their affiliate program pays different amounts for files of different sizes. Files over 100 MiB receive the highest payout. I think they are afraid that affiliates will use a program like JD to download a file continuously (changing IP address after each download) to cheat Hotfile. The cheater would minimize the size of the downloaded file to maximize profit.

drbits
  #22  
Old 25.04.2010, 12:13
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

@drbits
Well, many pay hosters do not change their site to block JD or add a captcha, maybe only because they don't know about JD.
Anyway, I am not a fan of the "get paid for downloads" system^^

GreeZ pspzockerscene
__________________

Ad-free installers || Werbefreie Installer
Windows Setup<--JD2 BETA-->Linux Setup x86 || Linux Setup x64 || Mac Setup
-----=>Support Chat<=-----
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
That's true James
Quote:
Originally Posted by James
People just don't understand that, just because a gun can also be used to shoot at people, a shooting club is not a place for ideas about a shooting rampage
  #23  
Old 26.04.2010, 11:56
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,437
Default

Actually, I was thinking we could somehow make use of the 200MB boundary. Also, one of the users had asked about why hotfile.com doesn't always ask for a Captcha.

GreeZ drbits
  #24  
Old 26.04.2010, 14:24
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

(This post is only about Hotfile)
Well, the only thing we could do is add a "prevent captchas" setting; if it is enabled, JD will only load files smaller than 100MB.
How about that?
It's easy to do, even I can do it

GreeZ pspzockerscene
  #26  
Old 27.04.2010, 03:05
pauldmps
Guest
 
Posts: n/a
Default

Okay, I've observed some very simple things about reCaptcha. You all might already know these:

1. One word of the two words in the captcha is known & another one is unknown (you all obviously know this). However, it cannot be said which word is known & which is unknown. Sometimes the first is unknown, sometimes the second one.

2. It is important to get the known word right. But it is not important to get the unknown word right.

3. If you could count the number of letters in the unknown word & put as many spaces as the number of letters, the captcha will still be solved.

I am not a programmer so I can't give any hint that way. But let's see how we could approach the problem:

1. Identify the known word. Mostly by trial and error. Let the plugin always think that the first word is known word. If not, the captcha refreshes.

2. Decrypt the known (first) word.

3. Count the number of letters in the second (unknown) word. Put as many spaces as there are letters in the word, plus 1 extra space for the space between the two words.
  #27  
Old 27.04.2010, 10:06
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,437
Default

You are doing well. But you aren't up to speed on everything.

1) The answer to the unknown word is not important, but a non-space answer is required. It is in our best interest to enter an incorrect word for the unknown word. Matching length would be good.

2) The real problem with reCaptcha is that it is so easy for Google to change things compared to how hard it is to solve.

3) Ignoring the obfuscation that Google adds (like the blotches or some changes in the lines), these are from handwritten documents. None of the known OCR engines can solve the word (or it wouldn't be in the challenge as the unknown word). Once a statistically significant number of users give the same answer for an unknown, it becomes a known and is still just as hard.

4) Part of the obfuscation is to thicken the lines. This makes the letters touch and makes letter separation difficult (no counting the letters in advance). An easy example to see in type is "burn", which can read as "Bum" or "Burn". Things get much worse when the letters are freehand. Bunting can look like dantim. pauldmps could look like fwnbrnao or powbna.

5) BeerCan has figured out how to identify the unknown. BeerCan, please correct me. They switch things around, but the odds are good that the second word is the unknown. The known words are only letters. The known words are usually longer than the unknowns.

6) Given a spelling checker and starting with the shortest words, we could identify the ligatures (combined letters, traditionally ae, fi, fl, and ." where the period appears centered under the closing quotation mark). In handwritten documents you would have hundreds of such combinations (1 to 4 per individual letter). Some letters are relatively easy, such as x and q, which each have only a few combinations. Vowels can have many following letters. All of the single-letter and two-letter combinations have been identified with their probabilities in linguistics books. Probabilities for common three-letter combinations are available as well. The probabilities can be adjusted based on the words encountered.

Traditional filter, rotate, and then neural net will not work well here.

The blotches can easily be removed by edge tracing. When there is an anomaly in the edge (a letter crossing the edge), a cubic spline can be used to estimate the missing blotch edge. Then, the blotch can be removed by inverting all pixels that belong to the blotch. Slight adjustments when crossing letters may also help.

Either edge finding or center finding can be used to follow the flow of the letters, but simplify the information into a few kinds of curves, angles, lines, and gaps. That can be what is fed into the Neural Network identification (which returns a collection of letters, with probabilities for the first unknown letter). Where a line is skinny, we can use an intermediate value to represent it (meaning it might be a gap, such as the r+n vs m).

One form of obfuscation that Google seems to have added is thresholding the greyscale image to form a black and white image. The thresholds are apparently not always the same, but they are low enough to widen most letters and hide the "pen" and "hand". At the same time, when the center of the writing nib was drier than the edges, you get two thin lines instead of one fat line. Two close lines (usually from different letters) can appear as a single line, but the line thickness and length can help distinguish these situations. Using center finding allows us to trace thick lines based on the distance from the clockwise edge and later the counterclockwise edges.

Our best shot is to estimate the length (+/- 2), identify the most common letter pairs for the beginning of a word, and then identify the common letter pairs for the letters 2&3. The results would be ordered by the highest "probability" and the first two characters identified that way. The third character is identified in a similar way (now the probability is the probability of the first pair times the probability of the second pair).
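A toy sketch of the ranking described above, in Python. All the numbers are invented for illustration (real values would come from letter-pair frequency tables), and `START_PAIR`, `NEXT_PAIR`, and `rank_three_letter_prefixes` are hypothetical names. It only shows the bookkeeping: the score of a three-letter prefix is the probability of the first pair times the probability of the overlapping second pair.

```python
# Made-up probabilities, purely for illustration.
# START_PAIR: probability that a word begins with this letter pair.
START_PAIR = {"th": 0.030, "bu": 0.008, "da": 0.006}
# NEXT_PAIR: probability of this pair occurring at letters 2 and 3.
NEXT_PAIR = {"hr": 0.002, "he": 0.040, "un": 0.020, "ur": 0.015, "an": 0.030}


def rank_three_letter_prefixes(start_pair, next_pair):
    """Combine overlapping letter pairs and rank the prefixes by score."""
    scored = []
    for first_pair, p_first in start_pair.items():
        for second_pair, p_second in next_pair.items():
            # The second pair must start with the second letter of the first pair.
            if second_pair[0] != first_pair[1]:
                continue
            prefix = first_pair + second_pair[1]
            scored.append((prefix, p_first * p_second))
    # Highest score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_three_letter_prefixes(START_PAIR, NEXT_PAIR)
for prefix, score in ranked:
    print(prefix, f"{score:.5f}")
```

In a fuller version the pair probabilities would also be multiplied by the recognizer's own confidence for each letter, as the summary below describes.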

----------------

Sorry this is so poorly organized. Let me try to summarize.

1) Process out the blotches (the big disks) and apply a despeckling filter.
2) We trace the edges to find a polyline that represents the skeleton of the letter.
3) We train a Neural Network to give us a probability of a letter, given a skeleton. We train it by manually separating the letters. The number of nodes will be around twice the number needed for printed text and the probabilities lower.
3a) A separate NN may be necessary for first letters.
4) Given each first letter (with the NN probability pNN), we apply the binary letter table which supplies pF(L1, L2)
4a) We threshold the probabilities to create a collection of letter pairs
5) We take the first letter in the pair and estimate where the skeletons separate, based on matching the skeleton to the data. We can then apply a NN for the second letter. Again, the probabilities are multiplied.
6) At this point, we have the probability for each letter that it appears at the beginning of a word, the identification strength of the corresponding NN node(s), the probability of letter X appearing after the first (for each of the first letters above a threshold), and the NN node(s) strength for each of those second letters.
6a) Especially for vowels and capital letters, the number of nodes that identify the various forms of the letter is more than one. The usable result is the sum (OR) of the values that indicate that letter. Capital Q has two very distinct and different appearances (one looks like a large 2). The lower case a has the form with the top hook and the form without that hook. The distinction between whether the line segment on the right ends at the top of the curve or beyond it can be handled by a NN node. Lower case q can look like a backwards p, the descender can end in a hook (acute corner to a shorter line segment), a loop (both ways), or the descender line may ascend beyond the curved portion (dq combined).
7) We go to our local NN expert (Jiaz?) to determine which shapes can use the same NN node. For simplicity, we have enough nodes so that no node represents returning two different letters (like q and a without the hooks).
8) The NN for the beginning of a word will likely be different from the NN for intermediate letters (the letter pair tables are divided into beginning, middle, and end tables).
9) We continue until we have choices and probabilities for the first three letters in the word. We can then use a dictionary (in a trie) to identify those combinations that are unlikely (give this a weight to multiply the value by). This eliminates some of the choices. We continue subtracting the skeleton of the guessed letter, running the NN on the next letter, multiplying by the probability of those letters following each other, and comparing with the dictionary.
10) At each point, we limit the number of guesses to a specific number of possibilities (first letter 26, two letters might be 16, three letters and on might be 8 or fewer).
11) Obviously, this needs diagrams.
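Until there are diagrams, steps 9 and 10 can at least be sketched in code. The following Python is a minimal illustration under several assumptions: the per-position letter scores are invented stand-ins for NN output, a plain set of prefixes stands in for the trie, the letter-pair tables and skeleton subtraction are left out, and `build_prefix_set`, `beam_decode`, and `off_dictionary_weight` are hypothetical names.

```python
def build_prefix_set(words):
    """Collect every prefix of every dictionary word (a simple stand-in for a trie)."""
    prefixes = set()
    for word in words:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes


def beam_decode(letter_scores, prefixes, beam_widths, off_dictionary_weight=0.1):
    """Extend guesses one letter at a time, keeping only the best few.

    letter_scores: one dict per position mapping letter -> score (made-up NN output).
    beam_widths:   how many guesses to keep after each position; the last
                   width is reused for all later positions.
    """
    beams = [("", 1.0)]
    for position, scores in enumerate(letter_scores):
        candidates = []
        for prefix, score in beams:
            for letter, letter_score in scores.items():
                guess = prefix + letter
                # Step 9: down-weight guesses the dictionary considers unlikely.
                weight = 1.0 if guess in prefixes else off_dictionary_weight
                candidates.append((guess, score * letter_score * weight))
        candidates.sort(key=lambda item: item[1], reverse=True)
        # Step 10: limit the number of guesses carried forward.
        width = beam_widths[min(position, len(beam_widths) - 1)]
        beams = candidates[:width]
    return beams


prefixes = build_prefix_set(["burn", "bunt", "dart"])
letter_scores = [
    {"b": 0.6, "d": 0.4},
    {"u": 0.7, "a": 0.3},
    {"r": 0.5, "n": 0.5},
    {"n": 0.6, "t": 0.4},
]
result = beam_decode(letter_scores, prefixes, beam_widths=(26, 16, 8))
for word, score in result[:3]:
    print(word, f"{score:.4f}")
```

The beam widths (26, 16, 8) follow the numbers suggested in step 10; the down-weighting factor of 0.1 is arbitrary.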

If necessary, we can use the Gutenberg Foundation's files to determine some of the letter pairings for the pre-typewriter, pre-ballpoint era (typewriters were adopted in business starting around 1900, ballpoint pens were mid 20th century).

Good Morning Europe
drbits
  #28  
Old 04.05.2010, 04:11
pauldmps
Guest
 
Posts: n/a
Default

Any way to crack the audio captcha instead of the text ? If you google "cracking ReCaptcha", you'll find many reports saying that the audio has been cracked but not the text.
  #29  
Old 04.05.2010, 13:58
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

Whatever, if YOU (the users) can help us there, that's okay, but we're doing nothing concerning reCaptcha recognition at the moment...

If you don't like those captchas, buy premium OR just don't use the hosts that use reCaptcha. Is it so hard to do that??

GreeZ pspzockerscene
  #30  
Old 04.05.2010, 14:46
pauldmps
Guest
 
Posts: n/a
Default

Quote:
Originally Posted by pspzockerscene View Post
Whatever, if YOU (users) can help us there its okay but we're doing nothing concerning a Re Captcha recognition atm...

If you don't like those captchas, buy premium OR just don't use the hosts that use Re Captcha, is it so hard to do that ??

GreeZ pspzockerscene
This thread was merely started to discuss potential ways for the developers to crack ReCaptcha. So it is basically a discussion thread. Whatever is being discussed here may not actually be done. So there is no need to get angry here.
  #31  
Old 04.05.2010, 14:50
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

@pauldmps

Don't worry, I am not getting angry.
I'm just saying that the easiest way is to never use services which have those captchas.
I have no problem doing that
Well okay, let's get back to the topic^^

GreeZ pspzockerscene
  #32  
Old 05.05.2010, 09:24
remi
Guest
 
Posts: n/a
Cool

If someone cracks reCaptcha, people can also crack this board.

I think this makes psp a little angry.
  #33  
Old 05.05.2010, 09:57
scr4ve's Avatar
scr4ve scr4ve is offline
JD-Dev & board tech
JD Logo by artcore-illustrations.de
 
Join Date: Feb 2009
Location: Germany, Lower Saxony
Posts: 241
Default

Thanks for your effort, drbits :-)

Two thoughts about cracking ReCaptcha:
  • JD isn't a small project. If we develop a recognition, Google might change its captchas immediately.
  • ReCaptcha is mostly used for spam protection, JD is Open Source. Developing a recognition comes along with providing spammers with a recognition, too.
Anyway, I think it's nice to have, though more manpower is needed for that. As long as none of the big hosts (RS, MU, ...) are using ReCaptcha, this is a feature with relatively low priority for the main devs I guess.

Regards,


scr4ve

Last edited by scr4ve; 05.05.2010 at 10:02.
  #34  
Old 05.05.2010, 14:32
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

@remi
Indeed, it often makes me angry if no one uses the board search, but for now I have calmed down, so don't worry. I won't freak out here because it isn't my thread, so freaking out in it wouldn't be very nice

GreeZ pspzockerscene
  #35  
Old 05.05.2010, 23:05
drbits's Avatar
drbits drbits is offline
JD English Support (inactive)
 
Join Date: Sep 2009
Location: Physically in Los Angeles, CA, USA
Posts: 4,437
Default

This started as a discussion of possibilities. However, I believe that somebody at Google reads our public board <shock, bewilderment>. One of the dangers of being open source.

When we discuss a possible future technique, within days reCaptcha has changed in a way exactly designed to block that technique.

This thread gives people a place to present ideas, without messing up the other thread (which is now one message, followed by a lot of whining).

GreeZ drbits
  #36  
Old 06.05.2010, 23:28
scr4ve's Avatar
scr4ve scr4ve is offline
JD-Dev & board tech
JD Logo by artcore-illustrations.de
 
Join Date: Feb 2009
Location: Germany, Lower Saxony
Posts: 241
Default

Quote:
Originally Posted by drbits View Post
One of the dangers of being open source.
I agree.

After having switched to a new license* we can consider developing a closed-source ReCaptcha Plugin in case of emergency (e.g. megaupload employs ReCaptcha), as we do with the DLC Plugin. This is definitely not a nice solution since we prefer open source, though we really don't want to support spammers.

*) Note: JD stays Open Source. License Change is needed as the GPL forbids us to use closed-source parts like the DLC Plugin.

Regards,

scr4ve
  #37  
Old 06.05.2010, 23:33
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

@scr4ve
Awesome, Zippyshare will like JD even more then

GreeZ pspzockerscene
  #38  
Old 07.05.2010, 09:24
remi
Guest
 
Posts: n/a
Cool

Quote:
Originally Posted by scr4ve View Post
As long as none of the big hosts (RS, MU, ...) are using ReCaptcha, this is a feature with relatively low priority for the main devs I guess.
HF are currently number five in the ranking of hosts. I believe HF are using reCaptcha.

I agree with you that other people should do it. There are enough hackers and crackers out there.

Another way to stop reCaptcha and whatever Gogol might invent to make our lives more difficult, is to simply boycott Gogol. Punish them on a grand scale and they'll have to review their enslaving policy. Letting ordinary people 'scan' other people's books for free should stop.

Note that they currently also are among the worst privacy invaders and spies on Earth. I hope you won't have to use the cliché phrase "Wir haben es nicht gewusst" in the context of this extraordinary powerful and corrupt/evil company!
  #39  
Old 07.05.2010, 19:58
pspzockerscene's Avatar
pspzockerscene pspzockerscene is online now
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 48,698
Default

And I still think that JD will never have a working reCaptcha recognition.
Just imagine we had a reCaptcha recognition (maybe even only for Hotfile).
Hotfile would then just change their page to kill the JD plugin, and the game starts from zero

GreeZ pspzockerscene

If you don't want to enter captchas, use hosts without captchas or buy premium.
Sorry, I always have to write this again^^
  #40  
Old 08.05.2010, 02:54
pauldmps
Guest
 
Posts: n/a
Default

Dear PSP,
Do you know why sites such as hotfile.com are using ReCaptcha?
To stop people from using JDownloader. Yes, they know about it. (I've read an article about JD in their news section.)

Now just imagine if the most important file hosters (if not all) use the same ReCaptcha, JD will be dead. This might happen in a few years. What will you do then ???
All times are GMT +2. The time now is 22:26.
Provided By AppWork GmbH | Privacy | Imprint
Parts of the Design are used from Kirsch designed by Andrew & Austin
Powered by vBulletin® Version 3.8.10 Beta 1
Copyright ©2000 - 2019, Jelsoft Enterprises Ltd.