reCAPTCHA recognition. Testing a universal CAPTCHA recognizer. How to crack Google's captcha using Google itself

Hello, dear readers of the blog. Anti-captcha (formerly Antigate) is a multifunctional platform for automatically solving so-called captchas (a protection against automated posting by bots, which search engines also use to protect their results from scraping).

Roughly every second site in the world has to resort to this kind of protection when it detects suspicious bot activity. On a blog, for example, a CAPTCHA keeps out floods of near-identical messages, rapid-fire posting and advertising in the comments.

All these bots put a significant load on sites, and a captcha is well suited to picking out a human from an army of bots. However, for every action there is a reaction: some people need to solve captchas constantly (those who use special software for mass registrations, analysis of search results, etc.).

It is exactly these people that Anticaptcha serves with its system for solving the puzzles automatically. The service has a whole army of workers (they are recruited separately, which is the other side of AntiCaptcha). Let's figure out how to use the Anti-captcha resource and what it requires.

What is Captcha and why does it appear?

Antigate can also be useful when promoting a service or a social network page with your own script. In short, it has many areas of application, which sets this resource apart from counterparts that are tailored to just two or three functions and charge high prices.

How to work with the AntiCaptcha service

Before creating your first order, go through a short registration.

All you need is an email address; a password will be sent to it.

Then you need to top up your balance on the service. The minimum deposit is one cent, which is perfect for testing the Anti-CAPTCHA functions. In addition, you can take advantage of a special offer: the first captcha is solved for free.

Further captchas cost from $0.001 (regular) to $0.002 (Google reCAPTCHA). Do not forget to copy the API key, which is generated automatically. You will find it on the main page of the resource.

Download the Antigate application to your PC (you can do this on the official website) and activate the automatic captcha solving mode. To check your statistics, take a look at the menu in the right corner of the page.

You can top up your balance on the main page by clicking the "top up your account" button. Alternatively, open the "Finance" item in the menu and click "Add funds" there. Account management, for example changing the information in your account, is available from the Settings menu.

To make your work more comfortable, use the additional tools that can also be found in the menu. In the "help" window you can view the documentation on how the service works, read the project news and find answers to questions about AntiCaptcha.

Developer contacts

If you have any questions about the site or application, you can always write to the developers. The official address of AntiCaptcha is [email protected]

There are also several specialized forums where you can consult with users of the site. You can find them using any search engine.

Referral system

The creators of the service paid special attention to the terms of the referral system. That is why, if you plan to make money from your own referrals, be sure to read the rules. There are few of them and they are simple to follow, but if you do not meet them, you will not be able to increase your income.

Sub-accounts fall into the following categories:

  1. Inactive - accounts that are registered in the system but do not use the service or deposit funds into it.
  2. Active - registered users with a small number of orders, no more than fifty captchas per week.
  3. Making expenses - accounts that order fifty or more captchas every week.
  4. Making expenses through the application - users who also order more than 50 captchas, but through the special "AppCenter" utility.

By referring new customers, you can earn 10% from referrals in the "making expenses" category and 5% from those "making expenses through the application". To receive funds, you need at least five sub-accounts with an elevated status (active users whom you have attracted to the site). The indicated amount is credited to you immediately after a captcha is uploaded.

The rules do not limit the number of invited referrals, but one user can have no more than ten referral links.

Summary

The "Anticaptcha" resource is well suited to mass mailings and promotion on various sites. The low cost of its services and the possibility of making money on referrals both speak in its favor. By bringing in new users, you can save on captcha solving fees.

All in all, it is an affordable and even inexpensive solution for SEO and SMM specialists. What else is there? Perhaps only their main competitor is worth a look.

Good luck to you! See you soon on the pages of the blog.


  • Recognition of captchas using recognition services and built-in tools also works in the free version of the program
  • Quoting Wikipedia:
    CAPTCHA (from the English "Completely Automated Public Turing test to tell Computers and Humans Apart") is a trademark of Carnegie Mellon University, which developed a computer test used to determine who the user of a system is: a human or a computer.

    More and more often on the Internet, when automating registration, posting messages, comments, announcements and so on, we run into a test that determines who the user of the system is: a human or a computer. This computer test is called a "CAPTCHA". It spreads further every day, and its algorithms and complexity are constantly being improved to make the test harder to bypass and to recognize.
  • As a result, we now have many different types of captcha. The main ones are described below:
  • #1 yaCAPTCHA

    This is one of the earliest and most widespread types of spam protection. It is usually installed for registration on forums and sites. I would not recommend it for blogs: the captcha is quite complicated, and some users are too lazy to decipher and enter it, so they simply do not leave a comment.

    #2 Anti Spam Image

    Very similar to the first type, but here a note is displayed next to the picture, for example "enter only red characters", "enter only numbers" or "enter only letters". So if spammers have a robot that can recognize characters in a picture, it will enter all the characters rather than only those the note asks for.

    #3 SI Captcha Anti-spam

    Like the first two types, this plugin displays a captcha made of numbers and letters, but here you can also listen to what is shown in the picture.

    #4 reCAPTCHA

    Another captcha that displays symbols; here too you can play back the symbols in the picture as audio. Typically the form consists of two words. This type is also better suited to a site where registration is required than to a blog where you just need to leave a comment.

    #5 Simple CAPTCHA

    This captcha displays various characters. You cannot listen to them, but if they are hard to read, you can replace the characters in the picture by clicking the adjacent buttons. This happens without refreshing the page, that is, without losing the comment already typed in the field.

    #6 Math Comment Spam Protection

    Here the captcha form displays two numbers, but you have to enter their sum rather than the numbers themselves. Even if a robot can recognize the numbers in the picture, adding them and entering the sum is a further obstacle for it.

    #7 WP-NOTCAPTCHA

    This is a rather amusing captcha that is simple for a human but difficult for a robot. You just move the slider under the picture until the pictures stand upright.

    #8 ImHuman

    Another interesting form of captcha, and a very difficult one for robots. Several pictures are displayed, and you must select the one named in the note.

    #9 Checkbot

    This is one of the easiest and most convenient ways to protect against spam. You just need to pick the little man with a raised hand.

    #10 Dcaptcha - I am not a robot (YA-ne-robot)

    This is the simplest captcha for blogs. To confirm that you are a human and not a robot, you just need to tick the box.

    Our Human Emulator program can take over all of this captcha-solving routine by using the appropriate captcha services.

    The principle of these services is simple. You register with whichever service suits you and top up your account with the amount you need. In your account you will find the "captcha key", aka $api_key. This is the key of the recognition service, which must be specified in programs such as ours in order to connect the service. The algorithm below is similar for most captcha services:

    1. Your application uploads the captcha to the service's server and gets its unique ID (via HTTP POST, using the multipart or base64 method).
    2. Wait 10 seconds (the average minimum time the service's workers need to type the text from the captcha).
    3. Make an HTTP GET request with the captcha ID to the server. You get either the text from the captcha or the CAPCHA_NOT_READY code, meaning the captcha is not solved yet.
    4. If you receive CAPCHA_NOT_READY, try again in 5 seconds (step 3).
    5. If you receive OK|SOME_TEXT_HERE, then SOME_TEXT_HERE is your captcha text.
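The waiting and polling part of this algorithm (steps 2 to 5) can be sketched in Python roughly as follows. This is a minimal illustration and not any service's official client: `fetch_status` is a hypothetical callable standing in for the HTTP GET request, and the function name, attempt limit and error handling are assumptions. Only the reply strings and the 10 and 5 second delays come from the description above.

```python
import time
from typing import Callable


def wait_for_captcha_text(
    captcha_id: str,
    fetch_status: Callable[[str], str],  # hypothetical: performs the HTTP GET
    first_wait: float = 10.0,
    retry_wait: float = 5.0,
    max_attempts: int = 20,  # assumption: give up after this many polls
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the service until it returns OK|<text> for the given captcha ID."""
    sleep(first_wait)  # step 2: give the workers time to type the answer
    for _ in range(max_attempts):
        reply = fetch_status(captcha_id).strip()  # step 3: ask for the result
        if reply == "CAPCHA_NOT_READY":
            sleep(retry_wait)  # step 4: not solved yet, retry a bit later
            continue
        if reply.startswith("OK|"):
            return reply[3:]  # step 5: everything after "OK|" is the text
        raise RuntimeError(f"unexpected service reply: {reply!r}")
    raise TimeoutError(f"captcha {captcha_id} was not solved in time")
```

Passing the `sleep` function in as a parameter keeps the sketch easy to try out without real delays or network access.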

  • Human Emulator has eight functions for captcha recognition:
    recognize_captcha - recognize a picture from disk as a captcha
    recognize_by_anticaptcha - recognize a captcha image through the anti-captcha service
    recognize_by_rucaptcha - recognize a captcha image through the rucaptcha.com service
    recognize_by_captcha24 - recognize a captcha image through the captcha24.com service
    recognize_by_ripcaptcha - recognize a captcha image through the ripcaptcha.com service
    recognize_by_evecaptcha - recognize a captcha image through the eve.cm service
    recognize_by_bypasscaptcha - recognize a captcha image through the bypasscaptcha.com service
    recognize_by_captchabot - recognize a captcha image through the captchabot.com service
  • For clarity, let's look at an example of recognizing a Google captcha using the antigate.com service:
$xhe_host = "127.0.0.1:7011";

// The following code is required to properly run XWeb Human Emulator
require("../../Templates/xweb_human_emulator.php");

// Go to the captcha example on the Google site
$browser->navigate("http://google.ru/sorry");

// Recognize the captcha, passing your own api_key
echo $captcha = $image->recognize_by_anticaptcha("/sorry/image?id=", "C:\\Temp\\1.jpg", "$api_key is your recognition service key", "http://antigate.com");

// Enter the captcha result in the required field
$input->send_keyboard_input_by_name("captcha", "$captcha");

// Quit
$app->quit();
  • Below are links to the description of objects containing functionality that allows you to use the services API for captcha recognition.
  • Services for working with captchas are becoming ever more relevant and in demand across Internet resources; they are developing rapidly and adding functionality. Human Emulator tries to keep up with the times and keeps building support for these services into its own functionality. To sum up, we can say with confidence that by combining captcha services with our software you can solve most of the types of captcha found on the Internet. But there is no limit to perfection, so we will be happy to add and implement whatever new features these services bring.

    Good day, ladies and gentlemen.

    Automatic captcha recognition services can help in a wide variety of situations. For example, they greatly ease the work of programs for collecting a semantic core (Key Collector, SlovoEB, etc.) and of applications for checking text uniqueness and rewriting (AntiPlagiarism).

    With large volumes of similar text or requests, you may find a captcha prompt popping up every 10 seconds. Not very convenient, right? AntiCaptcha spares you from entering those numbers and letters manually. Other people, who earn money by solving captchas, do it for you. You only need to pay the service that offers automatic captcha input.

    Most programs that work with online services (Wordstat, Google Analytics, etc.) require constant captcha input. Such online projects gain nothing from bots working with them, so they fight them with all their might.

    But what about ordinary webmasters who decide to collect semantics or parse data from analytics services? Do it all by hand? Not a very sensible decision, especially since there are now plenty of programs for solving captchas, some of them free.

    The captcha is solved by real people, who receive a reward for it. They work in a special window whose script forwards the captcha from your program directly to them. When the answer is entered correctly, it is filled in automatically. Your application keeps running smoothly and you no longer have to worry about it.

    Captcha recognition sites pay their workers a flat rate per captcha. You, as a customer, deposit a certain amount to your balance, and it gradually decreases.

    Automatic input services do not require large investments. 300 - 400 rubles will be quite enough for several months, or even half a year, although this depends on how much you use the service.

    Using special codes or the account data from such a site, you can integrate the desired application with the service.

    List of online services for captcha recognition

    If you also want your utilities to work in automatic mode, take a look at this list. Here I present the most popular sites that will free you from entering captchas manually.

    RuCaptcha

    RuCaptcha is a popular project that works with many applications. Prices here are about 10 rubles higher than elsewhere, but the quality and speed of work justify it.

    It can handle every type of robot check, so you don't have to worry if a new Google captcha suddenly pops up asking you to select road signs and the like. RuCaptcha workers can easily cope with it in a couple of minutes.

    In other respects the service is similar to the others: an easy API, integration with almost any program and, most importantly, a large number of workers. Many people work there in their spare time, thereby helping ordinary users.

    2Captcha

    An English-language resource, very similar to RuCaptcha. The average price for 1,000 answers is half a dollar, so prices are about the same as on the CIS market.

    2Captcha works great with Google. As a rule, its workers are English speakers who specialize purely in Google captchas. There may be problems with Russian-language captchas (from Yandex, for instance), but I think a worker will be found for those too.

    Anti Captcha

    Anti Captcha is a modern service (formerly Antigate) for automatically solving character captchas. The project stands out for its very simple API, large number of workers and low prices.

    Its comparatively low prices and high-quality service will not leave you indifferent. The site is well known in Runet, so the average solving time is only 10 - 15 seconds. In other words, you almost never have to wait for your captcha to be solved.

    The project is suitable for recognition directly in the browser. It is a mutually beneficial arrangement that helps newcomers earn money while making the job easier for professionals.

    Which service to choose is up to you. Each has its own advantages and disadvantages. One thing can be said: each of these projects has been operating for quite a long time. You do not have to worry about being deceived, having your money stolen or getting viruses sent to your PC. The same cannot be said about others.

    Be careful when choosing an anti-captcha service. Runet is full of fakes engaged in fraud. If you decide to try a cheaper, unknown project, check the reviews before using it. It may well be a phishing resource that collects money from gullible users.

    Instructions for working with services

    Once you have chosen an online anti-captcha service, you need to connect it. Usually such services use special keys: you get the key in your account and then enter it in a special field in the application. In this article I will use RuCaptcha as the example.

    Go to the “API webmaster” section.

    There is a “captcha KEY” field here, and that is what we need. Copy this key and go to the anti-captcha settings of your program.

    Tick the “Use anti-captcha service” box, select the service from the drop-down list and paste the key. Done! Now the application will automatically “solve” captchas using that service. No further action is required from you; just top up your account on the site in good time.

    The settings in all these programs are almost the same. In Key Collector, in SlovoEB and in any other application, everything looks much like what I described.

    Conclusion

    Now you know how to get around character input and the various "Are you a robot?" checks using online services. It is convenient in practice and simple to set up. You can remove captchas from your life for good, only occasionally topping up the balance. As a rule, these services cost very little, yet the benefit is considerable.

    In Key Collector, for instance, a captcha can pop up very often and keep you from doing your work. Instead, you connect the program to the service, launch the collection of the semantic core and go about your business. The same goes for other utilities that require constant input of characters.

    There are different ways to bypass the CAPTCHAs that protect sites. First, there are special services that use cheap manual labor and offer to solve 1,000 captchas for literally $1. Alternatively, you can try to write an intelligent system that performs the recognition itself according to certain algorithms. The latter can now be done with a special utility.

    Solve CAPTCHA

    CAPTCHA recognition is often a non-trivial task. You have to apply many different filters to the image to remove the distortion and interference that developers add to make the protection more robust. Often you have to implement a trainable system based on neural networks (which, by the way, is not as difficult as it might seem) to achieve an acceptable result in automated captcha solving. To understand what I'm talking about, it's best to dig into the archive and read the wonderful articles “Cracking CAPTCHA: Theory and Practice. Understanding how captchas are broken” and “Let's peep and recognize. Hacking Captcha Filters” from issues #135 and #126 respectively. Today I want to tell you about TesserCap, which its author calls a universal CAPTCHA solver. A curious thing, whatever one may say.

    First look at TesserCap

    What did the author of the program do? He looked at how the problem of automated CAPTCHA solving is usually approached and tried to consolidate that experience in one tool. He noticed that the same filters are used most of the time to remove noise from the image, which is the hardest problem in captcha recognition. It follows that if you build a convenient tool that lets you apply filters to images without complex mathematical transformations, and combine it with an OCR system for text recognition, you get a perfectly workable program. This is exactly what Gursev Singh Kalra from McAfee did. Why was it necessary? The author of the utility decided to check how secure the captchas of large resources are. For testing he selected the most visited websites according to a well-known statistics service. Giants such as Wikipedia and eBay, as well as the captcha provider reCaptcha, became test candidates.

    In general terms, the principle of the program is quite simple. The original captcha goes to the image preprocessing system, which cleans it of noise and distortion and passes the resulting image down the pipeline to the OCR system, which tries to recognize the text in it. TesserCap has an interactive graphical interface and the following properties:

    1. Has a versatile image preprocessing system that can be configured for each individual captcha.
    2. Includes the Tesseract recognition engine, which extracts text from the preprocessed CAPTCHA image.
    3. Supports various character sets in the recognition system.

    I think the general idea is clear, so let's see how it looks. The versatility of the utility inevitably complicated its interface, so the program window can be a little bewildering at first. So, before moving on to recognizing captchas, let's go through its interface and built-in functionality.


    Image preprocessing and text extraction from a captcha

    About the author

    We cannot fail to say at least a couple of words about the author of the wonderful TesserCap utility. His name is Gursev Singh Kalra. He is a Principal Consultant at Foundstone Professional Services, an affiliate of McAfee. Gursev has spoken at conferences such as ToorCon, NullCon and ClubHack. He is the author of the TesserCap and SSLSmart tools and has also developed several tools for the company's internal needs. His favorite programming languages are Ruby, Ruby on Rails and C#. Foundstone® Professional Services, where he works, provides organizations with expertise and training to ensure that their assets are continually and effectively protected from the most pressing threats. The Professional Services team is made up of renowned security experts and developers with extensive experience working with international corporations and government

    Interface. Main tab

    After starting the program, we see a window with three tabs: Main, Options and Image Preprocessing. The Main tab contains the controls used to start and stop a CAPTCHA test, generate test statistics (how many were guessed and how many were not), navigate between images and select an image for preprocessing. The URL field (control #1) must contain the exact URL that the web application uses to retrieve its captchas. You can get the URL by right-clicking the CAPTCHA image and copying its address, or by viewing the page source and taking the URL from the src attribute of the image tag (..site/common/rateit/captcha.asp?). Next to the address field is a control that sets the number of captchas to load for testing. Since the application can display only 12 images at a time, it provides controls for paging through the downloaded captchas, so during large-scale testing we can scroll through them and view the recognition results. The Start and Stop buttons start and stop testing, respectively. After testing, you need to evaluate the recognition results by marking each one as correct or incorrect. The last and most significant function transfers an image to the preprocessing system, where a filter is set up to remove noise and distortion from it. To send an image to the preprocessing system, right-click the image and select Send To Image Preprocessor from the context menu.

    Interface. Options tab

    The Options tab contains various controls for configuring TesserCap. Here you can select an OCR system, set web proxy parameters, enable redirect following and image preprocessing, add custom HTTP headers and specify the character range for the recognition system: numbers, lowercase letters, uppercase letters and special characters.

    Now about each option in more detail. First of all, you can choose an OCR system. By default only one is available, Tesseract-OCR, so there is no real choice to make. Another very interesting feature of the program is the choice of character range. Take, for example, a captcha that contains not a single letter and consists only of numbers. Why would we need extra characters that only increase the likelihood of incorrect recognition? But what about Upper Case? Will the program be able to recognize a captcha made of capital letters of any language? No, it will not. The program takes the list of characters used for recognition from configuration files located in \Program Files\Foundstone Free Tools\TesserCap 1.0\tessdata\configs. Let me explain with an example: if we select the Numerics and Lower Case options, the program refers to the lowernumeric file, which begins with the tessedit_char_whitelist parameter. This is followed by the list of symbols that will be used to solve the captcha. By default the files contain only letters of the Latin alphabet, so to recognize Cyrillic you need to replace or extend the list of characters.
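As a rough illustration of what such a configuration file holds, the sketch below builds a whitelist line in Python. `tessedit_char_whitelist` is a standard Tesseract variable; the helper name `build_whitelist_line`, the order of the characters and the `extra` parameter are assumptions made for this example and do not reproduce the actual contents of TesserCap's files.

```python
import string


def build_whitelist_line(numerics=False, lower=False, upper=False, extra=""):
    """Return one Tesseract config line restricting the recognized characters."""
    chars = ""
    if numerics:
        chars += string.digits
    if lower:
        chars += string.ascii_lowercase
    if upper:
        chars += string.ascii_uppercase
    chars += extra  # e.g. Cyrillic letters appended by hand
    if not chars:
        raise ValueError("the whitelist must contain at least one character")
    # Tesseract config files use "<variable> <value>" pairs, one per line
    return f"tessedit_char_whitelist {chars}"


# Numerics + Lower Case, as in the lowernumeric example from the text
print(build_whitelist_line(numerics=True, lower=True))
```

The narrower the whitelist, the fewer wrong characters the OCR engine can pick, which is the point the paragraph above makes about digit-only captchas.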

    Now a little about what the Http Request Headers field is for. On some websites, for example, you need to log in before you can see the captcha. For TesserCap to access the captcha, the program needs to send HTTP request headers such as Accept, Cookie and Referer. Using a web proxy (Fiddler, Burp, Charles, WebScarab, Paros, etc.), you can intercept the request headers the browser sends and enter them in the Http Request Headers field. Another option that will surely come in handy is Follow Redirects. TesserCap does not follow redirects by default, so if the test URL redirects to the actual image, you must select this option.

    That leaves the last option, which enables or disables the image preprocessing mechanism discussed below. Image preprocessing is disabled by default. Users first set up the preprocessing filters for the CAPTCHA images being tested and then activate this module. All CAPTCHA images loaded after the Enable Image Preprocessing option is turned on are preprocessed and then passed to the Tesseract OCR system for text extraction.

    Interface. Image Preprocessing Tab

    Now we come to the most interesting tab. This is where the filters are configured that remove the various kinds of noise and blur added to captchas to make the recognition system's job as hard as possible. Setting up a universal filter is quite simple and consists of nine stages. The changes to the image are displayed at each stage of preprocessing. In addition, the tab has a validation component that lets you check how well the captcha is recognized with the filter applied. Let's consider each stage in detail.

    Stage 1. Color inversion

    At this stage, the pixel colors of the CAPTCHA images are inverted. The pseudocode below demonstrates how this happens:

    for (each pixel in CAPTCHA) {
        if (invertRed is true) new red = 255 - current red
        if (invertBlue is true) new blue = 255 - current blue
        if (invertGreen is true) new green = 255 - current green
    }

    Inverting one or more colors often opens up new possibilities for analyzing the CAPTCHA under test.
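The same inversion can be written as a small runnable Python function. This is only a sketch: the image is assumed to be a list of rows of (red, green, blue) tuples, and the function and parameter names are made up for the example.

```python
def invert_colors(pixels, invert_red=False, invert_green=False, invert_blue=False):
    """Invert the selected color components of every pixel.

    pixels is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    """
    return [
        [
            (
                255 - r if invert_red else r,
                255 - g if invert_green else g,
                255 - b if invert_blue else b,
            )
            for (r, g, b) in row
        ]
        for row in pixels
    ]


# Invert only the red and blue components of a one-pixel image
print(invert_colors([[(10, 20, 30)]], invert_red=True, invert_blue=True))
```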

    Stage 2. Color change

    At this stage you can change the color components of all pixels in the image. Each numeric field can hold one of 257 possible values (-1 to 255). For the RGB components of each pixel, the following is done depending on the value in the field:

    1. If the value is -1, the corresponding color component is left unchanged.
    2. If the value is not -1, every occurrence of that component (red, green or blue) is set to the value entered in the field. A value of 0 removes the component, a value of 255 sets it to maximum intensity, and so on.
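A minimal Python sketch of these two rules is shown below. The image is again assumed to be a list of rows of (red, green, blue) tuples; the function name and the range check are illustrative assumptions.

```python
def change_colors(pixels, red=-1, green=-1, blue=-1):
    """Overwrite color components; -1 means "leave this component as is"."""
    for value in (red, green, blue):
        if not -1 <= value <= 255:
            raise ValueError("each field accepts values from -1 to 255")

    def pick(current, override):
        # Rule 1: -1 keeps the component; rule 2: any other value replaces it
        return current if override == -1 else override

    return [
        [(pick(r, red), pick(g, green), pick(b, blue)) for (r, g, b) in row]
        for row in pixels
    ]


# Remove red entirely and push blue to maximum intensity
print(change_colors([[(10, 20, 30)]], red=0, blue=255))
```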

    Stage 3. Grayscale conversion

    At the third stage, all images are converted to grayscale. This is the only mandatory conversion step, which cannot be skipped. Depending on the selected button, one of the following operations is applied to the color components of each pixel:

    1. Average -> (Red + Green + Blue) / 3.
    2. Human -> (0.21 * Red + 0.71 * Green + 0.07 * Blue).
    3. Average of minimum and maximum color components -> (Minimum (Red, Green, Blue) + Maximum (Red, Green, Blue)) / 2.
    4. Minimum -> Minimum (Red, Green, Blue).
    5. Maximum -> Maximum (Red, Green, Blue).

    Depending on the intensity and color distribution of the CAPTCHA, any of these filters can improve the extracted image for further processing.
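The five conversions can be sketched in Python as follows. The weights for the "human" mode are taken from the list above; the mode names, the function name and the rounding to the nearest integer are assumptions of this example.

```python
def to_gray(pixel, mode="average"):
    """Convert one (r, g, b) pixel to a single gray value in 0..255."""
    r, g, b = pixel
    if mode == "average":
        value = (r + g + b) / 3
    elif mode == "human":
        value = 0.21 * r + 0.71 * g + 0.07 * b  # weights as given in the text
    elif mode == "minmax":
        value = (min(r, g, b) + max(r, g, b)) / 2
    elif mode == "minimum":
        value = min(r, g, b)
    elif mode == "maximum":
        value = max(r, g, b)
    else:
        raise ValueError(f"unknown grayscale mode: {mode}")
    return round(value)


pixel = (10, 20, 60)
for mode in ("average", "minmax", "minimum", "maximum"):
    print(mode, to_gray(pixel, mode))
```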


    Stage 4. Smoothing and sharpening

    To make it harder to extract text from CAPTCHA images, noise is added to them in the form of single-pixel or multi-pixel dots, extraneous lines and spatial distortion. Smoothing the image spreads this random noise out so that the Bucket or Cutoff filters can then eliminate it. The Passes numeric field specifies how many times the selected image mask is applied before moving on to the next stage. Let's take a look at the filter components for smoothing and sharpening. There are two types of image masks available:

    1. Fixed masks. TesserCap ships with six of the most popular image masks. These masks can smooth the image or sharpen it (Laplace transform). Changes are displayed immediately after a mask is selected with the corresponding button.
    2. Custom image masks. The user can also define a custom mask by entering values in the numeric fields and clicking the Save Mask button. If the sum of the coefficients in these fields is less than zero, an error is generated and the mask is not applied. When a fixed mask is chosen, the Save Mask button is not needed.
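To show what applying an image mask means, here is a minimal Python sketch of a 3x3 convolution over a grayscale image. Several details are assumptions, since the text does not specify them: the mask size, dividing by the sum of the coefficients (or by 1 when the sum is zero), leaving border pixels untouched and clamping the result to 0..255. Only the rejection of masks with a negative sum and the Passes count come from the description above.

```python
def apply_mask(gray, mask, passes=1):
    """Apply a 3x3 mask to a grayscale image (list of rows of ints)."""
    total = sum(sum(row) for row in mask)
    if total < 0:
        raise ValueError("sum of mask coefficients is less than zero")
    divisor = total or 1  # assumption: avoid dividing by zero
    height, width = len(gray), len(gray[0])
    for _ in range(passes):
        out = [row[:] for row in gray]  # border pixels stay unchanged
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                acc = sum(
                    mask[j][i] * gray[y + j - 1][x + i - 1]
                    for j in range(3)
                    for i in range(3)
                )
                out[y][x] = min(255, max(0, round(acc / divisor)))
        gray = out
    return gray


# A simple averaging (smoothing) mask spreads a single bright pixel out
box_blur = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(apply_mask([[0, 0, 0], [0, 90, 0], [0, 0, 0]], box_blur))
```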

    Step 5. Grayscale buckets

    At this stage, the image's pixels may span a wide range of gray levels. This filter displays the grayscale distribution in 20 buckets (ranges): the percentage of pixels with gray values from 0 to 12 falls into bucket 0, the percentage with values from 13 to 25 into bucket 1, and so on. For each range of grayscale values, choose one of the following:

    1. Leave As Is.
    2. Replace with White.
    3. Replace with Black.

    With these options you can control individual grayscale ranges and reduce or remove noise by turning particular gray levels white or black.
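
    A minimal sketch of the bucket filter follows. The bucket formula (value * 20 // 256) is an assumption chosen because it reproduces the ranges quoted above (0 to 12 in bucket 0, 13 to 25 in bucket 1); the function names and the action labels are hypothetical.

```python
LEAVE, WHITE, BLACK = "leave", "white", "black"  # hypothetical action labels


def bucket_index(value, buckets=20):
    """Map a gray level 0..255 to a bucket number 0..buckets-1."""
    return value * buckets // 256


def bucket_histogram(image, buckets=20):
    """Return the percentage of pixels that fall into each bucket."""
    counts = [0] * buckets
    total = 0
    for row in image:
        for value in row:
            counts[bucket_index(value, buckets)] += 1
            total += 1
    return [100.0 * count / total for count in counts]


def apply_buckets(image, actions, buckets=20):
    """Recolor pixels bucket by bucket.

    actions maps a bucket number to LEAVE, WHITE or BLACK;
    buckets that are not listed are left as is.
    """
    result = []
    for row in image:
        new_row = []
        for value in row:
            action = actions.get(bucket_index(value, buckets), LEAVE)
            if action == WHITE:
                new_row.append(255)
            elif action == BLACK:
                new_row.append(0)
            else:
                new_row.append(value)
        result.append(new_row)
    return result
```

    A typical use would be to look at the histogram, find the buckets that hold light-gray noise, and map them to WHITE while mapping the buckets that hold the text to BLACK.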

    Step 6. Setting the cutoff

    This filter plots how often each gray level occurs in the image and lets you select a cutoff value. The principle of the cutoff filter is shown below in pseudocode:

    If (pixel grayscale value <= Cutoff) pixel grayscale value = 0 or 255, depending on which option is selected (<= or >=: set every pixel with value <= / >= Threshold to 0, and the remaining pixels to 255)

    The graph shows the detailed distribution of CAPTCHA pixels by color and helps remove clutter by clipping gray level values.
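
    In runnable form, the cutoff filter might look like the sketch below. The function name cutoff and the parameter low_to_black are assumptions for the example; the logic follows the two options described above.

```python
def cutoff(image, threshold, low_to_black=True):
    """Binarize a grayscale image (rows of 0..255 values) around a threshold.

    low_to_black=True  : values <= threshold become 0, the rest 255.
    low_to_black=False : values >= threshold become 0, the rest 255.
    """
    result = []
    for row in image:
        new_row = []
        for value in row:
            if low_to_black:
                hit = value <= threshold
            else:
                hit = value >= threshold
            new_row.append(0 if hit else 255)
        result.append(new_row)
    return result
```

    After this step the image contains only pure black and pure white pixels, which is the form most OCR engines handle best.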

    Step 7. Chopping

    Even after the smoothing, cutoff, bucket, and other filters have been applied, CAPTCHA images can still be cluttered with single-pixel or multi-pixel dots, extraneous lines, and spatial distortion. The chopping filter works as follows: if the number of adjacent pixels of a given shade of gray is less than the value in the numeric field, the filter assigns them a value of 0 (black) or 255 (white), at the user's choice. The CAPTCHA is analyzed both horizontally and vertically.
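
    One possible reading of the chopping filter is sketched below. It is an interpretation rather than TesserCap's confirmed algorithm: it assumes the image is already black and white, treats a run as consecutive pixels of one target value in a row or column, and processes rows first and then columns. The names chop, min_run, target, and fill are made up for the example.

```python
def _chop_line(line, min_run, target, fill):
    """Replace runs of `target` shorter than min_run with `fill` in one line."""
    out = list(line)
    start = 0
    n = len(line)
    while start < n:
        if line[start] != target:
            start += 1
            continue
        end = start
        while end < n and line[end] == target:
            end += 1
        if end - start < min_run:
            for i in range(start, end):
                out[i] = fill
        start = end
    return out


def chop(image, min_run, target=0, fill=255):
    """Remove short runs of `target` pixels, first in rows, then in columns."""
    rows = [_chop_line(row, min_run, target, fill) for row in image]
    # zip(*rows) transposes the image so columns can be handled as lines.
    cols = [_chop_line(list(col), min_run, target, fill) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

    Note that with this interpretation a stroke thinner than min_run in either direction disappears as well, so the value has to stay below the stroke width of the characters.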

    Step 8. Change the border width

    According to the author of the utility, during the initial research and development of TesserCap he repeatedly observed that some OCR systems fail to recognize the text when a CAPTCHA image has a thick border whose color differs from the main background. This filter handles such borders: a border of the width specified in the numeric field is painted black or white, at the user's choice.
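
    Painting over the border is simple to express in code. The sketch below assumes a grayscale image stored as rows of 0..255 values; the name paint_border and its parameters are hypothetical.

```python
def paint_border(image, width, color=255):
    """Overwrite a frame of `width` pixels around the image with `color`."""
    height, length = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(length):
            on_border = (
                y < width or y >= height - width
                or x < width or x >= length - width
            )
            row.append(color if on_border else image[y][x])
        result.append(row)
    return result
```

    Choosing the same color as the CAPTCHA background makes the frame disappear, so the OCR engine no longer mistakes it for part of the text.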

    Step 9. Grayscale inversion

    This filter goes through each pixel and replaces its gray level value with a new one, as shown in the pseudocode below. Grayscale inversion is performed to fit the image to the color settings of the OCR system.

    For (each pixel in CAPTCHA) new grayscale value = 255 - current grayscale value

    Step 10. Verify CAPTCHA recognition

    The purpose of this step is to pass the preprocessed CAPTCHA image to the OCR system for recognition. The Solve button takes the image produced by the grayscale inversion filter, sends it to the OCR system to extract the text, and displays the returned text in the GUI. If the recognized text matches the text on the captcha, the preprocessing filter has been configured correctly. You can now go to the options tab and turn on Enable Image Preprocessing so that all subsequently downloaded captchas are processed the same way.

    Recognizing captcha

    Well, we seem to have covered all the options of this utility, so now it would be nice to test some captcha for strength.


    The result of analyzing the site's captcha with image preprocessing enabled. Judging by the results, a working filter could not be found.

    So, we launch the utility and go to the magazine's website. We see a list of fresh news, open the first item that comes up, and scroll to the place where you can leave a comment. Yeah, adding a comment is not that easy (of course, otherwise the site would have been spammed long ago): you need to enter a captcha. Well, let's check whether this can be automated. Copy the URL of the picture and paste it into the TesserCap address bar. We specify that 12 captchas should be downloaded and click Start. The program dutifully downloaded 12 pictures and tried to recognize them. Unfortunately, all the captchas were either not recognized at all, as evidenced by the -Failed- label under them, or recognized incorrectly. This is not surprising, since extraneous noise and distortion have not been removed. That is what we will do now.

    Right-click one of the 12 downloaded images and choose Send To Image Preprocessor. Having carefully examined all 12 captchas, we see that they contain only digits, so go to the options tab and specify that only digits need to be recognized (Character Set = Numerics). Now you can go to the Image Preprocessing tab to configure the filters. I must say right away that after playing with the first three filters ("Color Inversion", "Color Change", "Grayscale") I did not see any positive effect, so I left everything there at the defaults. I selected Smooth Mask 2 and set the number of passes to one. I skipped the Grayscale buckets filter and went straight to the cutoff setting. I chose the value 154 and specified that pixels with lower values should be set to 0 and those with higher values to 255. To get rid of the remaining dots, I turned on chopping and changed the border width to 10. There was no point in enabling the last filter, so I immediately clicked Solve.

    The captcha showed the number 714945, but the program recognized it as 711435. This, as you can see, is simply wrong. In the end, no matter how hard I fought, I did not manage to get this captcha recognized properly. I had to switch to pastebin.com, whose captchas I managed to recognize without any problems. But if you turn out to be more diligent and patient and do manage to get the site's captchas recognized correctly, go straight to the options tab and turn on image preprocessing (Enable Image Preprocessing). Then go to Main and click Start to download a fresh batch of captchas, which will now be preprocessed by your filter. After the program finishes, mark each captcha as recognized correctly or incorrectly (the Mark as Correct / Mark as InCorrect buttons). From this moment on, you can view summary recognition statistics with Show Statistics. In effect, this is a kind of report on the security of a particular CAPTCHA. If you have to choose between several CAPTCHA solutions, TesserCap makes it quite possible to run your own tests.

    CAPTCHA check result on popular sites

    Website and share of recognized captchas:

    • Wikipedia: 20-30%
    • Ebay: 20-30%
    • reddit.com: 20-30%
    • CNBC: 50%
    • foodnetwork.com: 80-90%
    • dailymail.co.uk: 30%
    • megaupload.com: 80%
    • pastebin.com: 70-80%
    • cavenue.com: 80%

    Conclusion

    CAPTCHA images are one of the most effective mechanisms for protecting web applications from automated form filling. However, weak captchas protect only against random bots and will not withstand targeted attempts to solve them. As with cryptographic algorithms, the best protection comes from CAPTCHA implementations that have been rigorously tested to provide a high level of security. Based on the statistics provided by the author of the program, I chose reCaptcha for my projects and will recommend it to all my friends: it turned out to be the most resistant of those tested. In any case, do not forget that there are many services on the Web that offer semi-automated CAPTCHA solving. Through a special API you send an image to the service, and a short time later the service returns the solution. The captcha is solved by a real person (for example, from China), who earns a small fee for it. Against that, there is no protection at all. 🙂