#1
|
||||
|
||||
ReCapcha solution discussion
ReCapcha recognition might be simpler than OCR. It might be cracked by creating a map from physical characteristics of each "word" to a letter sequence. 98% accuracy is plenty (if multiple attempts are taken before giving up on a link). This approach is simpler, if less reliable than letter or skeleton recognition. The main drawback is periodic redefinition of the tri as new words are added.
This is a measure + dictionary approach and has broad application. What makes this workable is that ReCapcha has a limited dictionary, so specific recognition for each word should work. In other words, word recognition, Measures include:
A tri is better than a neural network, because it is easier to store in a database, requires only one pass training, and contains fewer connections (a neural network contains one input node per input (n), one output node per possible output(N), and n*N connections. If an intermediate row is required this is n*N*N connections. The calculations listed above can all be done in "int32 fixed point". The 256's are to return a range of integers. Floor is specified, because of normal integer division. Round can also be used. Code:
Width and Height are integers. These are defined after the actual word box is found. Density could be the floor of (256 * (total number of pixels with total RGB < 384)) / (width * height) Range: (0-256) X Center of mass (CM1x) could be floor (((sum of (X position [base 1] if RGB < 384) ) *256) / (width * widtj)) averaged over all rows. Example: [ . . H . H . . . . H . . ] is (256 * (3 + 5 + 10) )/(12*12) = 2 Range: (0 ... 256) X Center of momentum (CM2x = second order Center of Mass) could be (CM1x * CM1x). Range: (0 ... 32768) X median (CMm1x) is position (base 1) of middle pixel (averaged over all rows) Example: [ . . H . H . . . . H . . ] is 5 Range: (0 ... width) Standardized X median can be sCMm1x = (256 *CMm1x)/width Example: [ . . H . H . . . . H . . ] is 106 Range: (0 ... 256) Convolution of a saved pattern with a given word box is a last resort. It is compute intensive. It can be done in spacial or frequency domain. After width, height, density, and (X and Y)center of mass, statistical methods should be used to determine the best order to apply characteristics. The tri can be simplified by use of standardized ranges (for a range 0f 0 to 256, the ranges could be <=0, 1..16, 17..32. etc.). This is less accurate than setting ranges by equalizing the number of leaves in that range, but it allows a faster tri build and quick indexing to find the correct key on retrieval. ------------------------------------ Answers to questions posed in http://board.jdownloader.org/showthread.php?t=11101 Quality OCR programs are 100% on known roman typefaces (where the imaging quality is excellent); even "ransom note" typefaces are 100%. Quality OCR on hand written roman block lettering (often called hand printing or manuscript) is about 98% on random samples (multiple hands) for clear images. _________________ OCR does not take a Turing, Knuth, or Dijkstra. It takes somebody smart, but not hyperintelligent. The real requirement is to be able to combine dictionaries, neural networks, expert systems, and image processing techniques in one package. This is called a computer science generalist. Last edited by drbits; 26.11.2009 at 21:23. Reason: Additions |
|
|