Computer Vision. What is computer vision, and what does it mean for a computer to "see"?

Computer vision and recognition is a branch of artificial intelligence (AI) that has become very popular in recent years. The CES 2017 exhibition showcased the latest achievements in this area. Here are some examples of computer vision applications that could be seen at the show.

8 applications of computer vision

Veronica Yolkina

1. Self-driving cars

Some of the biggest computer vision exhibits are in the automotive industry. After all, the technologies behind driverless and self-driving cars work largely thanks to computer vision.

Products from NVIDIA, which has already made great strides in this field, are used in many self-driving cars. For example, the NVIDIA Drive PX 2 supercomputer serves as the base platform for self-driving vehicles from Volvo, Audi, BMW, and Mercedes-Benz.

NVIDIA's DriveNet technology is a self-learning computer vision system based on neural networks. Using lidar, radar, cameras, and ultrasonic sensors, it can recognize the surroundings, road markings, vehicles, and much more.

3. Interfaces

Eye-tracking technologies are being developed not only for gaming laptops but also for mainstream and corporate computers, so that they can be used by people who cannot use their hands. The Tobii Dynavox PCEye Mini is the size of a ballpoint pen, making it an ideal and discreet accessory for tablets and laptops. The same eye-tracking technology is appearing in new Asus gaming laptops and Huawei smartphones.

Meanwhile, gesture control (computer vision technology that recognizes specific hand movements) continues to develop. It will be used in upcoming BMW and Volkswagen cars.

The new HoloActive Touch interface lets drivers control virtual 3D screens and press buttons in mid-air. One could call it a simplified version of Iron Man's fictional holographic interface (it even responds with a slight vibration when an element is pressed). Thanks to technologies such as ManoMotion, gesture control can be added to practically any device. Moreover, to enable gesture control of a virtual 3D object, ManoMotion uses an ordinary 2D camera, so no additional equipment is needed.

eyeSight's Singlecue Gen 2 device uses computer vision (gesture recognition, face analysis, action detection) and allows gesture-based control of TVs, smart lighting systems, and refrigerators.

Hayo

The crowdfunding project Hayo may have the most interesting new interface of all. This technology lets you create virtual controls throughout your home: raise or lower your hand to change the music volume, or turn on the kitchen light by waving your hand over the stove. It is all done by a cylindrical device that combines a computer vision camera with 3D, infrared, and motion sensors.

4. Home appliances

Cameras that show what is inside your refrigerator no longer seem so revolutionary. But what about a program that analyzes images from a camera placed in the refrigerator and notifies you when you are running out of a product?

The elegant FridgeCam by Smarter attaches to the refrigerator wall and can tell you when products are about to expire, report what is in the refrigerator, and recommend recipes based on the products you have. The device will sell at a surprisingly affordable price: only $100.

5. Digital signage

Computer vision can change how banners and advertisements look in stores, museums, stadiums, and amusement parks.

At the Panasonic stand, a demo version of a technology for projecting images onto flags was presented. Using infrared markers invisible to the human eye, together with video stabilization, this technology can project advertising onto hanging banners and even onto flags fluttering in the wind. Moreover, the images look as if they were actually printed on the fabric.

6. Smartphones and augmented reality

Much has been said about the game that first brought augmented reality (AR) elements to the masses. However, like other apps that could be classed as AR, that game relied mostly on GPS and triangulation to make players believe an object was right in front of them. Real computer vision technologies were hardly used in smartphones at all.

However, in the fall Lenovo released the Phab2, the first smartphone to support Google's Tango technology. Tango is a combination of sensors and computer vision software that can recognize images, video, and the surrounding space in real time through the camera lens.

At CES, Asus introduced the ZenFone AR, the first smartphone to support both Google's Tango and Daydream VR. The smartphone can not only track motion, analyze its surroundings, and accurately determine its position, but also uses a Qualcomm Snapdragon 821 processor that distributes the computer vision workload. All of this allows the latest augmented reality technologies to analyze the environment through the smartphone's camera.

Later this year the Changhong H2 will be released: the first smartphone with a built-in molecular scanner. It collects the light reflected from an object, splits it into a spectrum, and then analyzes its chemical composition. Combined with software that processes this data, the information can be used for a variety of purposes: from analyzing food and counting calories to assessing skin condition and hydration levels.

  • Pattern recognition
  • The scope of computer vision is very wide: from barcode readers in supermarkets to augmented reality. In this lecture you will learn where computer vision is used and how it works, how images are represented as numbers, which problems in this field are comparatively easy and which are hard, and why.

    The lecture is intended for high-school students attending the Small ShAD (Yandex School of Data Analysis), but adults will also find plenty to learn from it.

    The ability to see and recognize objects is a natural ability for humans. For a computer, however, it is still extremely difficult. Attempts are now being made to teach the computer at least a fraction of what people do every day without even noticing.

    Probably, the place where most people encounter computer vision most often is the supermarket checkout. Of course, this is barcode reading. Barcodes were designed specifically to make the reading process as simple as possible for the computer. There are also more complex tasks: reading car license plates, analyzing medical images, flaw detection in manufacturing, face recognition, and so on. The use of computer vision for building augmented reality systems is actively developing.

    The difference between human and computer vision
    A child learns to recognize objects gradually. It begins to understand how the shape of an object changes depending on its position and lighting. Later on, when recognizing objects, a person relies on prior experience. Over a lifetime a person accumulates a huge amount of information, and the learning process of the neural network in the brain does not stop for a second. It is not particularly difficult for a person to reconstruct perspective from a flat picture and imagine how it would all look in three dimensions.

    For a computer, all of this is much harder. Primarily because of the problem of accumulating experience: it is necessary to collect a huge number of examples, and so far this is not working out very well.

    In addition, when recognizing an object, a person always takes its surroundings into account. If you pull an object out of its usual surroundings, recognizing it becomes noticeably harder. Here, too, the experience accumulated over a lifetime plays a role, which the computer does not have.

    Boy or girl?
    Imagine that we need to learn to determine, at a glance, the sex of a person (dressed!) in a photograph. First we need to identify factors that may indicate belonging to one class or the other. In addition, we need a training set, and preferably a representative one. As our training sample we take everyone present in the audience. Based on them, we try to find distinguishing factors: for example, hair length, the presence of a beard, makeup, and clothing (skirt or trousers). Knowing what fraction of representatives of each sex exhibits particular combinations of these factors, we can build fairly clear rules: the presence of certain combinations of factors allows us, with some probability, to say whether a boy or a girl is in the photograph.
    Machine learning
    Of course, this is a very simple and contrived example with a small number of top-level factors. In the real tasks posed to computer vision systems there are many more factors. Picking them out by hand and computing the dependencies is beyond human ability, so in such situations there is no way to manage without machine learning. For example, one can specify a few dozen initial factors, as well as positive and negative examples. Then those factors are weighted automatically, and a formula is formed that allows decisions to be made. Quite often the factors themselves are also extracted automatically.
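    A minimal sketch of this idea in Python with scikit-learn, assuming the hand-picked factors above have already been extracted from the photos; all feature values and labels here are invented purely for illustration:

```python
# Sketch of the "boy or girl" classifier described above, with made-up data.
from sklearn.linear_model import LogisticRegression

# Each row: [hair_length_cm, has_beard, wears_makeup, wears_skirt]
X = [
    [5, 1, 0, 0],
    [3, 0, 0, 0],
    [25, 0, 1, 1],
    [40, 0, 1, 0],
    [10, 1, 0, 0],
    [30, 0, 0, 1],
]
y = [0, 0, 1, 1, 0, 1]  # 0 = boy, 1 = girl (training labels)

# The learner weighs the factors automatically instead of us
# writing the decision rules by hand.
clf = LogisticRegression().fit(X, y)

print(clf.predict([[20, 0, 1, 1]]))        # most likely class
print(clf.predict_proba([[20, 0, 1, 1]]))  # the "with some probability" part
```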
    The image as numbers
    Most often, the RGB color space is used to store digital images. In it, each pixel is assigned a color given by three coordinates (channels): red, green, and blue. Each channel carries 8 bits of information, so the intensity along each axis can take values in the range from 0 to 255. All colors in the digital RGB space are obtained by mixing the three primary colors.
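    As a small illustration (using Python with numpy, which is an assumption of this sketch rather than anything in the lecture), an RGB image is simply a height x width x 3 array of 8-bit values:

```python
import numpy as np

img = np.zeros((2, 2, 3), dtype=np.uint8)  # a 2x2 image with 3 channels
img[0, 0] = [255, 0, 0]    # pure red pixel
img[0, 1] = [0, 255, 0]    # pure green
img[1, 0] = [0, 0, 255]    # pure blue
img[1, 1] = [255, 255, 0]  # red + green mix to yellow

print(img.shape)   # (2, 2, 3)
print(img[1, 1])   # [255 255   0] -- each channel is in the range 0..255
```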

    Unfortunately, RGB is not always convenient for analyzing information. Experiments show that the geometric distance between colors in this space has little to do with how close people perceive those colors to be.

    So other color spaces have been devised. Particularly interesting in our context is the HSV (Hue, Saturation, Value) space. It has a Value axis that indicates the amount of light. It is a separate channel, unlike in RGB, where this value has to be computed each time. In effect, it is a black-and-white version of the image, which can already be worked with on its own. Hue is represented as an angle and stands for the basic tone, while Saturation (the distance from the center to the edge) sets the color saturation.
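    A quick way to see this is OpenCV's color conversion. Note that the BGR channel order and the hue range of 0..179 are conventions of the OpenCV library, not of the lecture:

```python
import cv2
import numpy as np

bgr = np.uint8([[[0, 0, 255]]])  # one pure-red pixel in OpenCV's (B, G, R) order
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
print(hsv)  # [[[0 255 255]]] -> hue 0 (red), full saturation, full value

# The V channel alone is essentially the black-and-white version
# of the image mentioned above.
```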

    HSV is much closer to how we actually perceive color. If you show people dark red and dark green objects in the dark, they cannot tell the colors apart. The same happens in HSV: the lower you go along the V axis, the smaller the difference between hues becomes, because the range of saturation values shrinks. In a diagram it looks like a cone, at the tip of which is a single black point.

    Color and light
    Why is it so important to know the amount of light? In most computer vision tasks, color has no significance at all, since it carries no important information. Compare two pictures: a color one and a black-and-white one. Recognizing all the objects in the black-and-white version is no harder than in the color one. In this case color carries no additional meaning for us, yet it creates plenty of computational problems. When we work with the color version of an image, the volume of data is, roughly speaking, cubed.

    Color is used only in those rare cases where it actually makes the computation simpler. For example, when you need to detect a face: it is easier to first find skin-colored regions in the image, guided by the range of skin tones, and thereby avoid analyzing the image as a whole.
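    A sketch of that shortcut with OpenCV. The HSV bounds below are rough illustrative guesses rather than tuned skin-tone values, and the file names are placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg")                 # any test photo (placeholder name)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed lower bound
upper = np.array([25, 180, 255], dtype=np.uint8)  # assumed upper bound
mask = cv2.inRange(hsv, lower, upper)         # 255 where the color is skin-like

# Only these candidate regions need further analysis, not the whole image.
candidates = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite("skin_candidates.jpg", candidates)
```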

    Local and global features
    The features we use to analyze images can be local or global. Looking at the picture here, you would most likely say that it shows a red car:

    Such a statement means that the viewer has singled out one object in the image and described a local feature: its color. By and large, the picture shows a forest, a road, and several cars. In terms of area, the car occupies the smaller part, but we understand that the car is the most important object in this picture. If a person wants to find pictures similar to this one, they will first of all select images that contain a red car.

    Detection and segmentation
    In computer vision this process is called detection and segmentation. Segmentation is the division of an image into many parts that are connected to each other visually or semantically. Detection is the identification of objects in an image. Detection must be clearly distinguished from recognition. For example, in the same picture with the car one can detect a road sign, but it is impossible to recognize it, since it is turned away from us. Likewise, in face recognition the detector can indicate the location of a face, while the "recognizer" tells you whose face it is.

    Descriptors and visual words
    There are many different approaches to recognition.

    For example, this one: first, interest points or salient regions must be found in the image, something that stands out from the background: bright spots, sharp transitions, and so on. There are several algorithms for finding them.

    One of the most widely used methods is called Difference of Gaussians (DoG). By blurring the image with different radii and comparing the results, you can find the most contrasting fragments. The regions around these fragments are the most informative.
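    A minimal DoG computed with OpenCV, followed by SIFT, which internally builds a whole pyramid of such differences to pick keypoints; the file name and sigma values are illustrative:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name

blur_small = cv2.GaussianBlur(gray, (0, 0), sigmaX=1.0)
blur_large = cv2.GaussianBlur(gray, (0, 0), sigmaX=2.0)
dog = cv2.absdiff(blur_small, blur_large)  # high values = contrasting fragments

# SIFT (available in modern OpenCV builds) picks keypoints from such a
# pyramid of differences and computes a descriptor around each one.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)   # e.g. N keypoints x 128 values each
```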

    The picture below shows roughly what this looks like. The data extracted around each point is recorded in a descriptor.

    To make descriptors independent of rotations in the image plane, they are rotated so that the largest gradient vectors point in the same direction. This is not always done; but if it is skipped, two identical objects rotated relative to each other in the plane cannot be matched.

    Descriptors are written in numerical form, so a descriptor can be represented as a point in a high-dimensional space. In our illustration the space is two-dimensional. Our descriptors fall into it as points, and we can cluster them: divide them into groups.

    Next, for each cluster we describe a region in this space. When a descriptor falls into a region, what matters to us is no longer the exact point where it landed, but which of the regions it landed in. We can then compare images by counting how many descriptors of one image ended up in the same clusters as the descriptors of another. Such clusters can be called visual words.
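    A sketch of this "visual words" idea with scikit-learn; random stand-in data replaces real SIFT descriptors here, and the vocabulary size of 50 is an arbitrary illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans

all_desc = np.random.rand(1000, 128).astype(np.float32)  # descriptors of the whole collection
vocab = KMeans(n_clusters=50, n_init=10).fit(all_desc)   # 50 "visual words"

def bag_of_words(desc):
    words = vocab.predict(desc)  # which word region each descriptor falls into
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / hist.sum()     # normalized histogram of visual words

# Images are compared by comparing their histograms: the more descriptors
# land in the same clusters, the more similar the images.
img_a = bag_of_words(np.random.rand(120, 128).astype(np.float32))
img_b = bag_of_words(np.random.rand(80, 128).astype(np.float32))
print(float(np.minimum(img_a, img_b).sum()))  # histogram-intersection similarity
```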

    To find not just identical pictures but images of similar objects, we take many images of the object and many pictures that do not contain it. Then we extract descriptors from them and cluster them. Next, we determine which clusters the descriptors from the images containing our object fall into. Now, if the descriptors of a new image fall into those same clusters, the object is most likely present in it.

    A match between descriptors does not guarantee that the objects themselves are identical. One method of additional verification is geometric validation: checking that the matched descriptors are arranged consistently relative to one another.
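    One common form of geometric validation is fitting a single geometric transform (here, a homography estimated with RANSAC) over the matched points and keeping only the matches that agree with it; a sketch with OpenCV, with placeholder file names:

```python
import cv2
import numpy as np

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder names
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Tentative matches by descriptor distance (Lowe's ratio test).
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# Matches that do not fit the estimated transform are rejected as accidental.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(int(inliers.sum()), "matches survive geometric validation")
```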

    Recognition and classification
    For simplicity, suppose we can divide all images into three classes: architecture, nature, and portrait. In turn, nature can be divided into plants, animals, and birds. And having already decided that it is a bird, we can be more specific: an owl, a gull, or a crow.

    The difference between recognition and classification is somewhat arbitrary. If we identified an owl in the picture, that is closer to recognition. If just a bird, that is an intermediate case. And "nature" is pure classification. So the difference between recognition and classification lies in how deep down this tree we went. And the further computer vision advances, the more the boundary between classification and recognition will blur.


  • machine learning
    For the last few years I have been actively working on tasks related to pattern recognition, computer vision, and machine learning. I have managed to accumulate a fair amount of experience in carrying projects through (both my own and as a hired developer). Besides, ever since I wrote a couple of articles on Habr, readers often contact me asking for help with their tasks or for advice. So I regularly come across completely raw CV ideas.
    But, damn it, in 90% of cases I see one and the same systemic mistake. Again and again. Over the past 5 years I have explained it to dozens of people. And, honestly, from time to time I step on this rake myself...

    In 99% of computer vision tasks, the picture of the problem you have formed in your head, and even more so the solution you have sketched, has nothing to do with reality. Situations will always arise that you could not even have imagined. The only way to specify a task is to collect a database of examples and work from it, covering both ideal and worst-case situations. The broader the database, the more precisely the task is specified. Without a database it is impossible to talk about the task at all.

    A trivial thought. Yet everyone makes this mistake. Absolutely everyone. Below I will give a few examples of such situations: what happens when a task is specified poorly, and what pitfalls to look out for in technical specifications for computer vision systems.

    The task itself, in my opinion, is quite solvable. For example, take the icons as reference points and match against them with the same methods that are used for alignment. But, I repeat: until you run a test on a database of at least a couple of hundred examples, you can never say in advance whether the job can be done successfully. Alas, the author of that article did not like such a proposal... A pity!
    These are the two most telling and representative examples, in my opinion. They show why it is necessary to step back from the idea in your head and look at real footage.
    Here are a few more examples I have run into, in just a couple of words each. In all these cases, the people had not a single photograph in hand at the moment they began discussing whether the task was feasible:
    1) Recognizing marathon runners' numbers on T-shirts in a video stream (picture from a Yandex image search)


    Heh. While preparing this article I came across this very example. A very good one, showing all the potential problems: different fonts, an unstable background with shadows, motion blur and washed-out colors. And, most importantly, the customer imagines an idealized database, shot with a good camera on a sunny day. Try searching Yandex for photos of athletes' race numbers yourself.
    Heh-heh. A couple of days before publication the author was suddenly asked to take on exactly this task :) Well, karma; let's add it to the statistics.

    2) Recognition of text in photographs of phone screens


    3) And my favorite example. A letter received by email:
    "A program for image recognition is needed for commercial use.
    The algorithm of operation is as follows. The program operator specifies images of the object(s) from several angles, etc.
    Then, whenever the same or a sufficiently similar image of the object appears, the program performs the required action.
    Naturally, I cannot reveal the details yet.
    " (spelling and punctuation preserved)

    The good
    But not everything is so bad! Situations where the task is specified perfectly do happen. My favorite: "Software is needed to automatically detect moose in photos.
    Example photos with moose are attached."



    I still regret that this project never took off. We had received a first payment and begun training, but then the customer lost enthusiasm (or found other contractors).
    The specification does not constrain the solution in the least. Just two things: "what needs to be done" and "the input data". Lots of input data. That's all.

    A concluding thought

    The only way to specify a task is to collect a database and, based on it, define the methodology of the work: what you want to get, and what constraints are placed on the algorithm. Without this, the work can be neither delivered nor accepted. And without a database the contractor should tell the customer right away: "This point is critical. Without it I will not take the job."

    How to collect a database

    Probably everything above was just a prelude. The real article begins here. The idea that any CV or ML task needs a test database is obvious. But how do you collect such a database? As far as I remember, three or four times in my life a collected database had to be flushed down the toilet — my own and other people's — because it turned out to be unrepresentative. So what is so hard about it?
    You must understand that "collecting a database" = "specifying the task". The database must:
    1. Represent the problems being solved;
    2. Represent the conditions in which the system will operate;
    3. Formulate the task itself;
    4. Bring the customer and the contractor to a consensus about what is being built.
    The first rake
    A couple of years ago we decided to build a system that would run on mobile phones and recognize license plates... By then we were no longer newcomers to CV systems. We knew that we had to collect a database such that just looking at it would reveal all the problems at once. We collected the following database:


    We built the algorithm, and it turned out badly, recognizing 80-85% of the plates it saw.
    Well, yes... It was just that in our database all the plates were clean, and for real plates photographed on a mobile phone that accuracy was nowhere near enough...
    Biometrics
    Over the years we have worked a great deal with biometrics. And, it seems, we have stepped on every possible rake in collecting biometric databases.
    A database must be collected in different locations. If the collection device sits only at the developers' office, it soon turns out that the system is tied to the local lighting.
    Biometric databases need 5-10 captures per person. And those 5-10 captures must be taken on different days and at different times of day. When a person walks up to a biometric scanner several times in a row, they do it the same way each time; on different days, people approach it differently. Some biometric characteristics can also drift slightly over time.
    A database collected from the developers themselves is unrepresentative. They subconsciously behave so that everything recognizes well...
    Got a new scanner model? Do you really think the old database is still valid?
    Databases must be collected from different scanners: different fields of view, different distortions, different shades, different resolutions, and so on.





    A database for neural networks and learning algorithms
    If your solution involves learning, things get even harder: the training database must be assembled with understanding. Say your recognition task involves two very different fonts. The first occurs in 90% of cases, the second in 10%. If you collect the two fonts in that same proportion and train a single classifier on them, then with high probability the letters of the first font will be recognized and the letters of the second will not. The neural network / SVM will find its local minimum not where 97% of the first font and 97% of the second are recognized, but where 99% of the first font and 0% of the second are. Your database must contain enough examples of each type so that training does not slide into the wrong minimum.
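    A small experiment showing this effect, sketched with scikit-learn on synthetic data standing in for the two fonts; class_weight="balanced" is one of several possible remedies (another is simply oversampling the rare font in the database):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in for "font A" (90% of samples) vs "font B" (10%).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC().fit(X_tr, y_tr)                              # may ignore the rare class
weighted = SVC(class_weight="balanced").fit(X_tr, y_tr)    # rare class counts more

print(classification_report(y_te, plain.predict(X_te)))
print(classification_report(y_te, weighted.predict(X_te)))
# Typically the balanced model trades a little accuracy on the common
# class for much better recall on the rare one -- the "right" minimum.
```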

    How to collect a database when working with a real customer

    One of the non-trivial problems in collecting a database is deciding who is responsible for it: the customer or the contractor. I will start with a couple of illustrative examples from life.
    I am hiring you, so you solve my task!
    I heard this exact phrase once. And, I hope, never again. The database could only be collected at a factory that no one would let us into, and, on top of that, we were given no access to the hardware. The data we did receive killed any hope: an object a few pixels in size, a very noisy camera with impulse interference that glitched periodically, and twenty test pictures in total. To the proposal to install a better camera, choose a better shooting angle, and collect a database of a couple of hundred examples, the customer replied with the phrase in the title.
    We don't have time for this!
    Once the director of a large company (100+ people, offices in several countries) approached us. In the product this company makes, part of the functionality was implemented with very old and very simple algorithms. The director said he had long dreamed of upgrading this functionality to modern algorithms. He had hired two different development teams, but nothing came of it: by his account, one team theorized too much, while the other knew no theory at all and botched the job. He wanted us to try.
    The next day we were given access to a huge amount of raw data, far more than could be looked through by hand. After spending a couple of days analyzing it, we asked: "What exactly do you need from the new algorithms?" We were shown about two dozen situations the old algorithms handled imprecisely. But in a couple of days of digging we had come across only one or two of those situations in the data, and looking through another pile turned up just one more. To the question "Which situations actually bother your clients most often?" neither the director nor any of the lead engineers could give an answer: they had no such statistics.
    We thought the problem over and worked out an algorithm that could automatically mine all the relevant situations. But we needed help with two things. First, the data processing had to run on the company's own servers (we had neither the computing power nor a fast enough channel to where the data was stored), which required the active involvement of one of the company's administrators. Second, a company representative had to classify the collected data by importance and by whether it needed processing (another two or three days of work). By that point we had already spent a good deal of our own time analyzing the data, reading articles on the subject, and writing data-collection programs (no contract had been signed; everyone was working on a volunteer basis).
    To which we were told: "We cannot assign anyone to this task. Figure it out yourselves." At which we bowed out and left.
    The customer provides the database
    There was another story, this time with a smaller customer, whose system was to operate across an entire region, so we could not collect the database ourselves. The customer understood this and tried his best to collect one. He did: very large and varied. And he declared it representative. We started the work; the algorithm was refined; before deployment it was clear that the algorithm worked on the collected database, and the customer was completely satisfied. But the database turned out to be unrepresentative after all: it lacked two-thirds of the situations, and those it did contain were represented out of proportion. On real data the system worked much worse.
    And that is how it ended. We got burned: we were paid for everything we did, although the task turned out far more complex than planned. The customer got burned too: he had spent a great deal of time collecting the database.
    Alas, the result was poor, and we had to fix things on the fly, patching holes as they appeared...
    So who should collect the database?
    The problem is that computer vision specialists are often asked to improve complex systems: systems that dozens of people have worked on for years. Getting up to speed on such a system often costs far more than the task itself. And the customer wants development to start tomorrow. And of course the proposal to pay, for preparing the specification and the database, a sum twice the cost of the development itself, to stretch the schedule threefold, to grant access to their systems and algorithms, and to assign a specialist who will show and explain everything, is met with bewilderment.
    In my opinion, solving any computer vision task requires a constructive dialogue between the customer and the contractor, and the customer has a direct stake in helping formulate the task. The contractor does not know all the nuances of the customer's business and does not know the system from the inside. More than once I was told: "This is trivial for you, have a solution ready by tomorrow." I made one. Did it work as it should?
    I myself try to steer clear of such contracts, and take them on only with companies I have already worked with before.
    In general, the situation can be described like this. Suppose you want to hold a wedding. You can:
    Think through and organize everything yourself from start to finish. In essence, this option is "develop it yourself".
    Think everything through from start to finish, write all the scripts, and hire a performer for each role: a toastmaster so the guests are not bored, a restaurant so the food is cooked and served. Write the script for the toastmaster and the menu for the restaurant. This option is the dialogue: give the contractor the data and specify everything that is needed.
    Think in big blocks without going into the details. Hire a toastmaster without vetting his program, skip approving the restaurant menu, ask a stylist to choose the dress, the hairstyle, and the look. There will be a minimum of headaches, but if undressing contests turn up at the party, you will discover too late that it all went wrong. It is far from a given that a task formulated in the style of "recognize my object" is understood the same way by the contractor and the customer.
    Or hand everything over to an event agency. Expensive, and no need to think at all; but no one knows what will come of it. This is the "make it nice for me" option. Most likely the quality will depend on luck. But not necessarily.

    When a database cannot be collected

    There are such cases. First, tasks where a database is simply too complicated. For example, developing a robot that analyzes video and then makes decisions: some kind of test rig is needed. You can collect databases for individual functions, but it is impossible to build a database for the full cycle of actions. Second, research work. For example, when you are developing not only the algorithms but also the devices that will collect the database: every day a new device, new parameters, and the algorithm changes three times a day. Under such conditions a database goes stale instantly. You can collect small local databases that change daily, but you have to keep their limits in mind.
    Third, sometimes you can build a model instead. Modeling is a very large and separate topic. If you can build a good model cheaply, it is absolutely worth doing. For example, if you need to recognize text set in a single known font, the simplest way is to write a generator that renders synthetic examples.
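    A sketch of such a generator in Python with Pillow; the font file, image sizes, and distortion levels are assumptions chosen purely for illustration:

```python
import random
from PIL import Image, ImageDraw, ImageFont, ImageFilter

font = ImageFont.truetype("DejaVuSans.ttf", 32)  # the one known font (assumed path)

def render_sample(text):
    img = Image.new("L", (200, 48), color=255)   # white grayscale canvas
    draw = ImageDraw.Draw(img)
    draw.text((8, 4), text, font=font, fill=0)
    # Crude rotation, noiseless blur: stand-ins for real shooting conditions.
    img = img.rotate(random.uniform(-3, 3), fillcolor=255)
    img = img.filter(ImageFilter.GaussianBlur(random.uniform(0, 1.2)))
    return img

# Render a labeled training base of arbitrary size instead of collecting one.
for i in range(1000):
    label = "".join(random.choices("0123456789", k=6))
    render_sample(label).save(f"train_{i}_{label}.png")
```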

    Interest in computer vision arose at the very dawn of artificial intelligence, alongside such tasks as automatic theorem proving and intellectual games. The architecture of the first artificial neural network, the perceptron, was proposed by Frank Rosenblatt by analogy with the retina of the eye, and it was studied on the task of recognizing images of characters.

    The significance of the vision problem was never in doubt, but its complexity was seriously underestimated. For example, in 1966 Marvin Minsky, one of the founders of the field of artificial intelligence, now legendarily decided not to tackle the machine vision problem himself but to entrust it to a student over the near term. Meanwhile, creating a program that plays chess at grandmaster level took far longer than expected; yet today it is clear that writing a program to beat a human at chess is simpler than creating an adaptive control system with a computer vision subsystem that could merely rearrange the chess pieces on an arbitrary real board.

    Progress in the field of computer vision is determined by two factors: the development of theory and methods, and the development of hardware. For a long time, theory and academic research ran ahead of what practical computer vision systems could feasibly do. Conventionally, several stages in the development of the theory can be distinguished.

    • By the 1970s, the basic conceptual apparatus of image processing, which underlies research into vision problems, had been formed. The main tasks specific to machine vision were also identified, related to estimating the physical parameters of a scene (distances, motion velocities, surface reflectances, etc.) from images, although a number of these tasks were considered only in the simplified setting of the "world of toy blocks".
    • By the 1980s, a theory of vision and the corresponding methods of image analysis had taken shape, summed up in David Marr's book "Vision: A Computational Investigation into the Human Representation and Processing of Visual Information".
    • In the 1990s, the approaches to the machine vision problems that had by then become classic were systematically developed.
    • Since the mid-90s, there has been a shift toward building and studying large-scale computer vision systems intended for operation in diverse natural conditions.
    • The current stage is marked above all by the development of methods for automatically constructing features in image recognition and computer vision systems on the basis of machine learning.

    At the same time, practical application was held back by computational resources. Even the simplest processing of an image requires going over all of its pixels at least once (and usually more than once). This takes at least hundreds of thousands of operations per second, which for a long time was impossible, so simplifications were necessary.

    For example, for automatic recognition of parts on a production line, a black conveyor belt could be used to simplify separating the object from the background, or moving objects could be scanned with a line of photodiodes under special illumination, which already at the level of signal formation ensured invariant features for recognition without resorting to any complex methods of information analysis. In optical-electronic tracking and recognition systems, physical stencils were used that allowed filtering to be configured "in hardware". Some of these solutions were ingenious from an engineering standpoint, but they worked only in tasks with low a priori uncertainty and therefore transferred poorly to new tasks.

    It is also no surprise that the 1970s saw a peak of interest in optical computing for image processing. It allowed the implementation of a small set of methods (chiefly correlation-based ones) with limited invariance properties, but with great efficiency.

    With the growth of processor performance (as well as the development of digital video cameras), the situation changed. Reaching the threshold of performance needed to process video frames in reasonable time opened the way to a whole avalanche of computer vision applications. It should be noted, however, that this transition was neither instantaneous nor painless.

    At first, image processing algorithms became feasible on special processors: digital signal processors (DSPs) and field-programmable gate arrays (FPGAs), which were widely used then, and are still widely used today, in onboard and industrial systems.

    However, truly mass adoption of computer vision methods came less than ten years after processors in personal and mobile computers reached a similar level of performance. Thus, in practical application, computer vision systems passed through several stages: the stage of one-off solutions (in both hardware and algorithms) to specific problems; the stage of adoption in professional fields (especially industrial and defense), using special processors and specialized imaging systems, with algorithms designed to work under low a priori uncertainty, which nevertheless allowed the solutions to be replicated; and the stage of mass application.

    Generally speaking, a machine vision system includes the following main components: a device for forming images (camera), a computing device, and algorithmic (software) support.

    The most widespread are computer vision systems that use standard cameras and computers for the first two components (the term "computer vision" is more applicable to such systems, although we will not draw a sharp line between machine and computer vision). However, other machine vision systems are no less important. The use of "non-standard" imaging devices (including spectral ranges other than the visible, coherent illumination, structured lighting, hyperspectral devices, time-of-flight and omnidirectional cameras, telescopes, microscopes, etc.) greatly expands the capabilities of machine vision systems. While the algorithmic capabilities of machine vision still fall well short of human vision in interpreting images, in the ability to extract quantitative information about observed objects such systems far surpass it. Image formation, however, has become an independent field, and the methods of working with images from different sensors are so varied that reviewing them would go beyond the scope of this article. We will therefore confine our discussion to computer vision systems based on conventional cameras.

    Applications in robotics

    Robotics is a traditional application area of machine vision. However, for a long time the bulk of the robot fleet consisted of industrial robots, where the senses of the robot operated not in open, but in highly controlled conditions (low uncertainty of the environment), which made it possible to get by with highly specialized solutions, including for machine vision tasks. In addition, industrial applications could afford expensive installations, including their optical and computing systems.

    It is therefore natural (although not connected solely with computer vision) that by the beginning of the 2000s the share of industrial robots in the overall robot fleet had dropped below 50%, as robotics for the mass consumer began to develop. For consumer robots, unlike industrial ones, cost is critical, as is the autonomous operating time of the robot, which dictates the use of mobile, low-power processors. Moreover, such robots must function in non-deterministic environments. For example, in industry, photogrammetric markers stuck onto the observed objects, together with calibration plates, have long been used (and are used to this day) in the most demanding tasks to determine the internal and external parameters of cameras. Naturally, the need to stick such markers onto household objects would seriously undermine the appeal of consumer robots. It is not surprising, then, that the consumer robot market had to wait for a sufficient level of technology, which arrived at the end of the 90s, before its rapid growth could begin.

    The starting point can be considered the release of the first version of Sony's AIBO robot, which, despite its high price ($2,500), was a great success. The first batch of these robots, 5,000 units, sold out on the Internet in 20 minutes, the second batch (also in 1999) in 17 seconds, and the sales rate later reached about 20,000 units per year.

    Also at the end of the 90s, devices appeared in mass production that can be called consumer robots in the fullest sense. The most widespread autonomous household robots are robot vacuum cleaners. The first model, the Roomba, was released in 2002 by iRobot; robot vacuums from LG Electronics, Samsung, and others followed. By 2008, total worldwide sales of robot vacuum cleaners had exceeded a million units.

    Note that the first robot vacuum cleaners equipped with computer vision systems appeared in 2006. By that time, mobile ARM-type processors clocked at 200 MHz made it possible to recognize three-dimensional scenes in ordinary rooms on the basis of invariant descriptors of feature points, performing visual localization of the robot at 5 frames per second. Using vision for the robot to determine its own position became economically justified, whereas previously developers had preferred to use sonars for this purpose.

    Further increases in mobile processor performance make it possible to assign new tasks to computer vision systems in household robots, whose worldwide sales are already counted in millions of units per year. Beyond navigation, robots intended for personal use may need to recognize people and their emotions from their faces, recognize gestures and household objects, including tableware, clothes, pets, and so on, depending on the kind of task the robot performs. Many of these tasks are far from fully solved and are promising from an innovation standpoint.

    Thus, modern robotics poses a wide range of computer vision tasks, which include, for example (a minimal feature-matching sketch follows this list):

    • a set of tasks related to orientation in the surrounding space (for example, the task of simultaneous localization and mapping, SLAM), determining distances to objects, etc.;
    • tasks of recognizing various objects and interpreting scenes as a whole;
    • detecting people, recognizing their faces, and analyzing their emotions.
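    A minimal sketch of the feature-matching front end behind the first item, using ORB features (whose binary descriptors are cheap enough for mobile processors); the frame file names are placeholders:

```python
import cv2

orb = cv2.ORB_create(nfeatures=500)

frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # consecutive frames
frame2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

kp1, des1 = orb.detectAndCompute(frame1, None)
kp2, des2 = orb.detectAndCompute(frame2, None)

# Hamming distance suits ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# In a full SLAM pipeline these point correspondences feed pose
# estimation and map building; here we only report how many were found.
print(len(matches), "tentative correspondences between frames")
```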

    Driver assistance systems

    Alongside household robots, computer vision methods have become widely used in driver assistance systems. Work on detecting road markings and obstacles on the road, recognizing signs, and so on was actively carried out back in the 90s. However, a sufficient level of quality (both in the accuracy and reliability of the methods themselves, and in the performance of processors enabling real-time operation) was achieved considerably later, only in the last decade.

    One illustrative application is stereo vision methods used to detect road obstacles. These methods are extremely demanding with respect to reliability, accuracy, and performance. In particular, to detect pedestrians, a dense range map may need to be computed at a rate close to real time. Such methods can require hundreds of operations per pixel, at accuracies achievable only with image sizes of at least a megapixel, which means hundreds of millions of operations per frame (several billion or more operations per second).
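    As an illustration, OpenCV's block matcher computes such a disparity (range) map from a rectified stereo pair; the parameters and file names below are placeholder choices, not values from a production system:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)   # rectified stereo pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # fixed-point, 16x the true disparity

# Large disparity = close object; a real system converts disparity to
# meters using the camera baseline and focal length, at video frame rates.
cv2.imwrite("disparity.png", cv2.normalize(disparity, None, 0, 255,
                                           cv2.NORM_MINMAX, cv2.CV_8U))
```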

    It is worth emphasizing that progress in computer vision has not come from hardware alone. Previously it was simply impossible to run computationally expensive image processing methods, but the methods themselves also needed further development. Over the past 10-15 years, effective and practical methods have been created for reconstructing three-dimensional scenes, computing dense range maps from stereo vision, detecting and recognizing faces, and so on. The formulations of the corresponding tasks have not changed; rather, it is the accumulated non-trivial technical details and mathematical techniques that have made these methods successful.

    Returning to driver assistance systems, one cannot fail to mention the modern methods for detecting pedestrians and vehicles based on histograms of oriented gradients (HOG). Modern machine learning methods, which will be discussed later, have for the first time allowed the computer to solve a task as complex as road sign recognition with accuracy no worse than a human's.
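    OpenCV ships a pedestrian detector of exactly this HOG type; a minimal sketch of applying it to one frame (the file name and detection parameters are illustrative):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("road.jpg")  # placeholder file name
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)

# Draw a box around each detected pedestrian.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("pedestrians.jpg", frame)
```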

    One of the most visible technical achievements has been Google's driverless car, which, however, carries a rich set of sensors in addition to video cameras, and also does not yet drive on unknown (previously unmapped) roads or in bad weather.

    Thus, driver assistance systems require solving a variety of computer vision problems, including:

    • stereo vision;
    • detection of obstacles;
    • recognition of road signs, lane markings, pedestrians, and cars;
    • as well as tasks related to monitoring the driver's condition.

    Mobile applications

    Computer vision is spreading to personal mobile devices such as smartphones and tablets even faster than to household robots and driver assistance systems. In particular, the number of mobile phones keeps growing and has already practically caught up with the population of the Earth, and the majority of phones ship with cameras. In 2009, the number of camera phones exceeded one billion, creating a colossal market for image processing and computer vision systems; this is reflected in the number of R&D projects carried out both by the mobile device manufacturers themselves and by a large number of start-ups.

    Some of the image processing tasks for camera-equipped mobile devices are inherited from digital cameras. The main differences lie in the particulars of the goals and in the conditions of use. For example, one may want to synthesize a high dynamic range image (HDRI) from a series of photographs taken at different exposures. On mobile devices, however, there is more noise in the images, frames are captured at longer intervals, and the camera moves more in space, all of which makes producing clean HDRI harder and places the burden on the phone's processor. Because of this, solutions to seemingly identical tasks on different devices can differ, and separate work may be needed to cover all the demands of the market.
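
    A minimal sketch of the merging step under the assumptions above, using OpenCV's alignment and exposure-fusion routines to compensate for the slight hand-held camera shift typical of phones (file names are placeholders):

        import cv2

        # A bracketed burst shot at different exposures (placeholder file names).
        frames = [cv2.imread(f) for f in ("ev_minus2.jpg", "ev_0.jpg", "ev_plus2.jpg")]

        # Median-threshold bitmap alignment compensates for small hand-held shifts
        # between frames, which are larger on phones than on tripod-mounted cameras.
        cv2.createAlignMTB().process(frames, frames)

        # Mertens exposure fusion blends the frames without needing exposure times
        # or tone mapping; the result is a float image in roughly [0, 1].
        fused = cv2.createMergeMertens().process(frames)
        cv2.imwrite("hdr_fused.jpg", (fused * 255).clip(0, 255).astype("uint8"))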

    Of greater interest, however, are new applications that did not previously exist on the market. A wide class of such applications for personal mobile devices is associated with augmented reality tasks, which can be quite varied. These include games (which render virtual objects over images of the real scene consistently with camera motion) as well as various utilitarian programs: tourist applications (recognizing landmarks and displaying information about them) and many other applications related to information search and object recognition, such as recognizing signs in foreign languages and displaying their translation, recognizing business cards and automatically entering the information into the phone book, recognizing faces and pulling up their records from the phone book, recognizing movie posters (replacing the poster image with the movie trailer), and so on.
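
    Overlaying content on a recognized planar object such as a poster or business card usually comes down to estimating a homography from matched feature points. A minimal sketch under that assumption, with hypothetical file names:

        import cv2
        import numpy as np

        poster = cv2.imread("poster_reference.png", cv2.IMREAD_GRAYSCALE)
        frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

        # Match ORB feature points between the reference poster and the live frame.
        orb = cv2.ORB_create(nfeatures=2000)
        kp1, des1 = orb.detectAndCompute(poster, None)
        kp2, des2 = orb.detectAndCompute(frame, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

        # Estimate the plane-to-plane homography robustly with RANSAC.
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

        # H maps poster coordinates into the frame, so a trailer video or other
        # virtual content can be warped onto the poster's location in the scene.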

    Augmented reality systems can also take the form of specialized devices such as Google Glass, which further raises the innovative potential of computer vision methods.

    Thus, the class of computer vision tasks whose solutions can be embedded in mobile applications is extremely wide. Image matching methods (matching of feature points) are in great demand, including for estimating the three-dimensional structure of a scene and changes in camera orientation, as are object detection methods and the analysis of people's faces. However, many mobile applications call for the development of specialized computer vision methods. Consider just two of them: recording a board game on a mobile phone with automatic transcription of the moves, and reconstructing the trajectory of a golf club during a swing (a tracking sketch follows below).
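
    For a fast-moving object like a club head, one simple starting point (not necessarily what production applications use) is background subtraction plus per-frame centroid tracking; the trajectory is then the sequence of centroids. A rough sketch, with the video file name as a placeholder:

        import cv2

        cap = cv2.VideoCapture("swing.mp4")
        subtractor = cv2.createBackgroundSubtractorMOG2(history=100)
        trajectory = []

        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # The foreground mask highlights moving regions (club head, arms).
            mask = subtractor.apply(frame)
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            if contours:
                # Track the centroid of the largest moving blob in each frame.
                c = max(contours, key=cv2.contourArea)
                m = cv2.moments(c)
                if m["m00"] > 0:
                    trajectory.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))

        print(f"recovered {len(trajectory)} trajectory points")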

    Information search and recognition

    Many augmented reality tasks are closely related to information search (so much so that the activity of systems such as Google Goggles is hard to assign to any one specific area), which is of substantial interest in its own right.

    Image-based search tasks also vary. They include searching images for specific unique objects such as architectural landmarks, sculptures, and paintings; detecting and recognizing objects of various classes in images (cars, animals, furniture, people's faces, etc., as well as their subclasses); and categorizing scenes (city, forest, mountains, coast, etc.). These tasks arise in a variety of applications: sorting images in home digital photo albums, searching for products by their images in online stores, searching for images in geographic information systems, biometric identification, specialized search in social networks (for example, finding people who look like a given user), and so on, up to image search across the whole Internet.
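
    One of the simplest image search baselines is comparing global color histograms; it ignores objects and geometry entirely, but it illustrates the retrieval loop behind such applications. A minimal sketch with placeholder file names:

        import cv2

        def signature(path):
            # HSV color histogram as a crude global image descriptor.
            img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([img], [0, 1], None, [32, 32], [0, 180, 0, 256])
            return cv2.normalize(hist, hist).flatten()

        query = signature("query.jpg")
        album = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]  # placeholder album

        # Rank album images by histogram correlation with the query (higher = closer).
        ranked = sorted(album,
                        key=lambda p: cv2.compareHist(signature(p), query,
                                                      cv2.HISTCMP_CORREL),
                        reverse=True)
        print(ranked)

    Real systems replace the histogram with learned features and an approximate nearest-neighbor index, but the query-descriptor-ranking structure stays the same.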

    Both the progress already achieved and the prospects for its continuation can be seen in the Large Scale Visual Recognition Challenge, in which the number of recognized classes grew from 20 in 2010 to 200 in 2013.

    Recognizing objects of many classes at once is inconceivable without machine learning methods in computer vision. One of the most popular directions here is deep learning networks, designed to automatically construct rich feature systems for subsequent recognition. The demand in this direction can be seen from the acquisitions of start-ups by corporations such as Google and Facebook. Google bought the company DNNresearch in 2013 and the startup DeepMind in 2014; Facebook also competed for the latter acquisition (having earlier hired Yann LeCun, one of the leaders of the field, to head its laboratory conducting deep learning research), and the purchase price amounted to $400 million. It is worth noting that the road sign recognition method mentioned above, which won that competition, was also based on deep learning.
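
    As a minimal sketch of the idea: in a small convolutional network, the stacked convolutional layers learn the feature hierarchy automatically instead of relying on hand-crafted descriptors. The layer sizes below are arbitrary, chosen for 32x32 RGB inputs such as road-sign crops:

        import torch
        import torch.nn as nn

        # Convolutional layers learn the features; only the last layer classifies.
        model = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 43),  # e.g. 43 sign classes, as in the GTSRB benchmark
        )

        x = torch.randn(1, 3, 32, 32)   # a dummy 32x32 RGB crop
        logits = model(x)
        print(logits.shape)             # torch.Size([1, 43])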

    Deep learning methods require large computational resources: training a network to recognize even a limited class of objects can take several days on a computing cluster. And the methods are likely to become more complex still, demanding even greater computational resources in the future.

    Conclusion

    We have taken a closer look at computer vision applications for the mass consumer. However, there are also many other, less typical applications. For example, computer vision methods can be used in microscopy, optical coherence tomography, and digital holography. Numerous applications of image processing and analysis methods are found in various professional fields: biomedicine, space exploration, criminology, and so on.

    Reconstruction of the 3D profile of a metal sheet observed under a microscope, using the depth-from-focus method

    The range of current applications of computer vision continues to grow. In particular, tasks related to the analysis of video data are becoming ever more relevant. The active development of three-dimensional television is broadening the scope of computer vision systems, calling for effective algorithms that do not yet exist and for still more computing power. Among the tasks in demand here are video format conversion and converting 2D video to 3D.

    It is not surprising that specialized computing hardware is being actively developed for computer vision systems. In particular, general-purpose graphics processors (GPGPU) and cloud computing are now growing in popularity. New solutions are gradually making their way into the personal computer segment as well, expanding the range of possible applications.