What is Big Data: we have collected the most important facts about big data. Encyclopedia of Marketing. What is working with big data?

It has been reported that the total volume of data created and replicated in 2011 was close to 1.8 zettabytes (1.8 trillion gigabytes) - approximately 9 times more than was created in 2006.

It is more complicated than that

However, big data involves more than just the analysis of huge volumes of information. The problem is not that organizations create enormous amounts of data, but that most of it is in formats poorly suited to the traditional structured database model: web logs, video recordings, text documents, machine code, geospatial data, and so on. All of this is stored in many different repositories, sometimes even beyond the boundaries of the organization. As a result, corporations may have access to enormous volumes of their own data yet lack the tools needed to establish relationships within that data and draw meaningful conclusions from it. Add to this the fact that data is now updated more and more frequently, and you get a situation in which traditional methods of information analysis cannot keep up with the huge volumes of constantly refreshed data, which ultimately opens the way for big data technologies.

The best definition

In essence, the concept of big data means working with information of enormous volume and diverse composition, which is updated very frequently and resides in different sources, with the goal of increasing operational efficiency, creating new products, and improving competitiveness. The consulting firm Forrester puts it concisely: "Big data combines techniques and technologies that extract meaning from data at the extreme limits of practicality."

How big is the difference between business analytics and big data?

Craig Baty, executive chief marketing officer and chief technology officer of Fujitsu Australia, has pointed out that business analysis is a descriptive process of analyzing the results a business achieved over a past period, whereas the processing speed of big data makes the analysis predictive, capable of offering the business recommendations for the future. Big data technologies also make it possible to analyze more types of data than business intelligence tools can, allowing a focus on more than just structured sources.

Matt Slocum of O'Reilly Radar notes that although big data and business analytics share the same goal (the search for answers to questions), they differ from one another in three aspects.

  • Big data is used to process larger volumes of information than business analytics, and this of course fits the traditional definition of big data.
  • Big data is used to process data that arrives and changes ever faster, which means deep exploration and interactivity. In some cases results are generated faster than the web page loads.
  • Big data is used to process unstructured data, whose uses we only begin to explore once we have managed to collect and store it, and we need algorithms and dialogue capabilities to make it easier to find the trends contained within these arrays.

According to the white paper "Oracle Information Architecture: An Architect's Guide to Big Data" published by Oracle, when working with big data we approach information differently than when conducting business analysis.

Working with big data is not like the usual business intelligence process, where simply adding up known values produces a result: for example, summing paid invoices yields sales for the year. With big data, the result is obtained in the process of cleaning the data through sequential modeling: first a hypothesis is formulated, then a statistical, visual, or semantic model is built, the validity of the hypothesis is tested against it, and then the next hypothesis is put forward. This process requires the analyst either to interpret visual patterns, or to construct interactive knowledge-based queries, or to develop adaptive machine learning algorithms capable of producing the sought-after result. Moreover, the lifetime of such an algorithm may be quite short.

Techniques for analyzing big data

There are many different techniques for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not pretend to be exhaustive, but it covers the approaches most in demand across various industries. It should be understood that researchers continue to create new techniques and to improve existing ones. Moreover, some of the techniques listed are by no means applicable exclusively to big data and can be successfully used on smaller arrays (for example, A/B testing, regression analysis). Naturally, the larger and more diversified the array being analyzed, the more accurate and relevant the conclusions that can be drawn from it.

A/B testing. A technique in which a control sample is compared in turn against other samples. The goal is to identify the optimal combination of parameters to achieve, for example, the best consumer response to a marketing offer. Big data makes it possible to run an enormous number of iterations and thus obtain a statistically significant result.
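
To make the idea concrete, here is a minimal sketch (in Python, with invented conversion numbers) of how the outcome of one such test might be assessed with a two-proportion z-test; the function name and data are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

def ab_ztest(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error of difference
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Hypothetical data: 10,000 visitors per variant
z, p = ab_ztest(conv_a=1150, n_a=10_000, conv_b=1240, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # small p-value => significant difference
```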

Association rule learning. A set of techniques for discovering relationships, i.e., association rules, between variables in large data sets. Used in data mining.
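
As an illustration, the sketch below computes the two measures at the heart of association rules, support and confidence, over a handful of invented market baskets; production systems use dedicated algorithms such as Apriori or FP-Growth over vastly larger sets.

```python
from itertools import combinations

# Hypothetical transactions (market baskets)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

# Rule X -> Y: confidence = support(X and Y) / support(X)
for x, y in combinations(["bread", "milk", "butter"], 2):
    s_xy = support({x, y})
    if s_xy >= 0.4:  # minimum-support threshold
        conf = s_xy / support({x})
        print(f"{x} -> {y}: support={s_xy:.2f}, confidence={conf:.2f}")
```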

Classification. A set of techniques for predicting the behavior of consumers in a given market segment (purchase decisions, churn, consumption volumes, etc.). Used in data mining.

Cluster analysis. A statistical technique for classifying objects into groups by identifying common features that are not known in advance. Used in data mining.

Crowdsourcing. A technique for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that makes it possible, for example, to analyze comments by social network users and compare them with sales results in real time.

Data mining. A set of techniques for identifying the categories of consumers most receptive to the product or service being promoted, identifying the traits of the most successful employees, and predicting consumers' behavioral model.

Ensemble learning. This technique uses many predictive models at once, thereby improving the quality of the forecasts produced.

Genetic algorithms. In this technique, possible solutions are represented as "chromosomes" that can combine and mutate. As in the process of natural evolution, the fittest individual survives.
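
The toy sketch below shows the idea under deliberately simple assumptions: individuals are bit strings, fitness is just the count of 1-bits, and each generation combines crossover, mutation, and survival of the fittest. Everything here is illustrative, not a production implementation.

```python
import random

TARGET_LEN = 20
fitness = lambda bits: sum(bits)  # toy goal: maximize the number of 1-bits

def crossover(a, b):
    """Combine two parent chromosomes at a random cut point."""
    cut = random.randrange(1, TARGET_LEN)
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    """Flip each bit with a small probability."""
    return [b ^ (random.random() < rate) for b in bits]

population = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(50)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # selection: the fittest survive
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(40)]
    population = survivors + children
print(fitness(population[0]), "of", TARGET_LEN)
```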

Machine learning. A field of computer science (historically known as "artificial intelligence") that pursues the creation of self-learning algorithms based on the analysis of empirical data.

Natural language processing (NLP). A set of techniques, borrowed from computer science and linguistics, for recognizing natural human language.

Network analysis. A set of techniques for analyzing connections between nodes in networks. Applied to social networks, it makes it possible to analyze relationships between individual users, companies, communities, etc.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more metrics. Helps in making strategic decisions, for example the composition of a product line to bring to market, conducting investment analysis, and so on.

Pattern recognition. A set of techniques with elements of self-learning for predicting the behavioral model of consumers.

Predictive modeling. A set of techniques for creating a mathematical model of a predetermined probable scenario. For example, analyzing a CRM system's database for conditions that might prompt subscribers to switch providers.

Regression. A set of statistical methods for identifying a pattern between changes in a dependent variable and one or more independent variables. Often used for forecasting and predictions. Used in data mining.
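
A minimal sketch of the idea on synthetic data: ordinary least squares recovers how a dependent variable (sales) changes with independent variables (ad spend, price). All numbers and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: sales depend on ad spend and price, plus noise
n = 1_000
ad_spend = rng.uniform(0, 100, n)
price = rng.uniform(10, 50, n)
sales = 5.0 + 0.8 * ad_spend - 1.5 * price + rng.normal(0, 3, n)

# Ordinary least squares: solve X @ beta ~ y
X = np.column_stack([np.ones(n), ad_spend, price])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
print("intercept, ad effect, price effect:", beta.round(2))
```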

Sentiment analysis. Techniques for assessing consumer sentiment, based on natural language recognition technologies. They make it possible to pick out, from the general information flow, messages related to the subject of interest (for example, a consumer product), and then to assess the polarity of the judgment (positive or negative), the degree of emotionality, and so on.
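
Real sentiment analysis relies on trained language models, but a crude lexicon-based sketch conveys the polarity-scoring step described above; the word lists and example texts here are invented.

```python
POSITIVE = {"great", "love", "excellent", "comfortable"}
NEGATIVE = {"lost", "delayed", "awful", "rude"}

def polarity(text):
    """Crude lexicon score: count of positive words minus negative words."""
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

tweets = [
    "Awful airline, they lost my luggage again",
    "Excellent crew, comfortable seats, love it",
]
for t in tweets:
    print(polarity(t), t)  # score < 0 is negative, > 0 is positive
```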

Signal processing. A set of techniques borrowed from radio engineering whose goal is to recognize a signal against a background of noise and to analyze it further.

Spatial analysis. A set of methods, partly borrowed from statistics, for analyzing spatial data - the topology of terrain, geographic coordinates, the geometry of objects. Geographic information systems (GIS) often serve as the source of big data here.

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest in this list is Apache Hadoop, open-source software that over the last five years has been proven as a data analysis tool by many large companies. As soon as Yahoo opened the Hadoop code to the open-source community, a whole movement of Hadoop-based products promptly appeared in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop. Their developers range from startups to well-known global companies.

Market solutions for managing big data

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze large volumes of data, colloquially called Big Data, is perceived as a benefit, and unambiguously so. But is that really the case? What could the unrestrained accumulation of data lead to? Most likely to what psychologists, referring to a person, call pathological hoarding, syllogomania, or figuratively "Plyushkin syndrome." In English, the vicious passion for collecting everything is called hoarding (from the English hoard, "stockpile"). According to the classification of mental illnesses, hoarding is classed as a mental disorder. In the digital era, digital hoarding is added to traditional material hoarding, and it can afflict both individuals and entire enterprises and organizations.

The world and Russian market

Big Data Landscape - Main vendors

Virtually all leading IT companies have shown interest in tools for collecting, processing, managing, and analyzing big data, which is entirely natural. First, they encounter this phenomenon directly in their own business; second, big data opens excellent opportunities for conquering new market niches and attracting new customers.

A large number of startups have appeared on the market making a business out of processing huge volumes of data. Some of them use ready-made cloud infrastructure such as the Amazon platform.

Theory and practice of big data across industries

Development history

2017

TmaxSoft forecast: the new "wave" of Big Data will require modernization of DBMSs

Enterprises know that the huge volumes of data they have accumulated contain important information about their business and clients. If a company can successfully apply this information, it gains a significant advantage over its competitors and can offer better products and services. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the necessary storage capacity, data exchange processes, utilities, and applications needed to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, the increased processing power needed to analyze steadily growing volumes of data can require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could otherwise be used to develop new applications and services.

On February 5, 2015, the White House published a report discussing how companies use "big data" to charge different prices to different buyers, a practice known as "price discrimination" or "differentiated pricing" (personalized pricing). The report describes the benefits of big data for both sellers and buyers, and its authors conclude that many of the problematic issues arising from big data and differential pricing can be addressed within the framework of existing anti-discrimination legislation and consumer protection laws.

The report notes that at present there are only isolated facts showing how companies use big data in the context of individualized marketing and differentiated pricing. This information shows that sellers use pricing methods that can be divided into three categories:

  • study of the demand curve;
  • steering and differentiated pricing based on demographic data; and
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: To determine demand and study consumer behavior, marketers often conduct experiments in which clients are randomly assigned to one of two possible price categories. "Technically, these experiments are a form of differentiated pricing, since they result in different prices for clients, even if they are 'non-discriminatory' in the sense that all clients have the same probability of being 'sent' to the higher price."

Steering: This is the practice of presenting products to consumers based on their membership in a certain demographic group. For example, a computer company's website may offer the same laptop to different types of buyers at different prices, based on the information they provide about themselves (for example, depending on whether the buyer represents a government agency, a scientific or commercial institution, or is a private individual) or on their geographic location (for example, determined from the computer's IP address).

Targeted behavioral marketing and individualized pricing: In these cases, buyers' personal data is used for targeted advertising and for customizing the prices of certain products. For example, online advertisers use data collected by advertising networks and through third-party cookies about online activity to target their advertising materials. This approach, on the one hand, lets consumers receive advertising for goods and services that interest them; it may, however, raise concerns among those consumers who do not want their personal data (especially information about visits to sites connected with medical or financial matters) to be collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The authors suggest this may be because such methods are still being developed, or because companies are in no hurry to apply individual pricing (or prefer to keep quiet about it), possibly fearing a negative reaction from consumers.

The report's authors note that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While acknowledging that problems of transparency and discrimination do arise with big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, it also stresses the need for "ongoing oversight" in cases where companies use confidential information in non-transparent ways or in ways not covered by the existing regulatory framework.

This report continues the White House's effort to examine the use of "big data" and discriminatory pricing on the Internet and their consequences for American consumers. It was previously reported that the White House working group on big data published its report on this question in May 2014. The Federal Trade Commission (FTC) also addressed these issues during a seminar it held in the spring of 2014 on discrimination in connection with the use of big data.

2014

Gartner dispels myths about "Big Data"

In a fall 2014 analytical note, Gartner listed a number of Big Data myths widespread among IT leaders and gave rebuttals to them.

  • Everyone is implementing Big Data processing systems faster than we are

Interest in Big Data technologies is at an all-time high: 73% of organizations surveyed by Gartner analysts this year are already investing in related projects or planning to do so. But most such initiatives are still at the very early stages, and only 13% of respondents have already deployed such solutions. The hardest part is figuring out how to extract value from Big Data and where to begin. Many organizations get stuck at the pilot stage because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it

Some IT leaders believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error really does influence the result less, analysts note, but the errors themselves also become more numerous. In addition, most of the data analyzed is external, of unknown structure and origin, so the likelihood of errors grows. Thus, in the world of Big Data, quality actually matters much more.

  • Big Data technologies will eliminate the need for data integration

Big Data supposedly makes it possible to process data in its original format, with the schema generated automatically as the data is read ("schema on read"). It is believed that this allows information from the same sources to be analyzed using several data models. It is claimed that this also lets end users interpret any data set at their own discretion. In reality, most users often want the traditional approach with a ready-made schema, where the data is formatted appropriately and there are agreements about the level of integrity of the information and how it relates to the usage scenario.

  • There is no point in using data warehouses for complex analytics

Many information management system administrators believe there is no point spending time creating a data warehouse, given that complex analytical systems rely on new types of data. In reality, many advanced analytics systems use information from a data warehouse. In other cases, new types of data must be additionally prepared for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation, and the required level of quality, and such preparation may take place outside the warehouse.

  • Data lakes will replace data warehouses

Vendors mislead customers by positioning data lakes as a replacement for data warehouses or as critically important elements of the analytical infrastructure. Data lake technologies still lack the maturity and breadth of functionality inherent in warehouses. Therefore, leaders responsible for data management should wait until lakes reach the same level of development, Gartner believes.

Accenture: 92% of those who deployed big data systems are satisfied with the result

Among the main advantages of big data, respondents named:

  • "finding new sources of income" (56%),
  • "improved customer experience" (51%),
  • "new products and services" (50%),
  • "an influx of new clients and retention of the loyalty of old ones" (47%).

When introducing new technologies, many companies ran into traditional problems. For 51%, the stumbling block was security; for 47%, the budget; for 41%, the shortage of necessary personnel; and for 35%, difficulties integrating with existing systems. Almost all companies surveyed (about 91%) plan to solve the personnel shortage by hiring big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the Internet did. 79% of respondents noted that companies that do not engage with big data will lose their competitive advantage.

However, respondents disagreed about what exactly should be considered big data. 65% of respondents believe it means "large data files," 60% "advanced analytics and analysis," and 50% "data from visualization tools."

Madrid spends 14.7 million euros on managing big data

In July 2014, it became known that Madrid would use big data technologies to manage the city's infrastructure. The project will cost 14.7 million euros, and the solutions deployed will be based on technologies for analyzing and managing big data. With their help, the city administration will manage the work of each service provider and pay accordingly depending on the level of service.

The project concerns the administration's contractors, who monitor the condition of the streets, lighting, irrigation, and green spaces, clean up the territory, and remove and process waste. In the course of the project, 300 key performance indicators for city services have been developed for dedicated inspectors, on the basis of which 1.5 thousand various checks and measurements will be carried out daily. In addition, the city will begin using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

2013

Experts: The peak of the Big Data fashion

Virtually all vendors in the data management market currently have technologies for Big Data management. This new technological trend is also actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As DataSift found, by the end of 2013 the discussion around "big data" had exceeded all conceivable dimensions. After analyzing the number of mentions of Big Data on social networks, DataSift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. This is equivalent to 260 mentions per hour, with the peak reaching 3,070 mentions per hour.

Gartner: Every second IT director is ready to invest in Big Data

After several years of experiments with Big Data technologies and the first deployments in 2013, the adoption of such solutions will grow significantly, Gartner predicts. Researchers surveyed IT leaders around the world and found that 42% of them have already invested in Big Data technologies or plan to make such investments in the near future (data as of February 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is rapidly changing and demands new approaches to information processing. Many companies have already realized that large amounts of data are critically important, and working with them yields benefits unattainable with traditional sources of information and methods of processing it. In addition, the constant discussion of big data in the media fuels interest in the relevant technologies.

Frank Buytendijk, vice president of Gartner, urged companies to temper their ardor, since some of them show concern that they are lagging behind competitors in mastering Big Data.

"There is no need to worry; the possibilities for implementing ideas based on big data technologies are virtually limitless," he said.

Gartner predicts that by 2015, 20% of Global 1000 companies will adopt a strategic focus on “information infrastructure.”

In anticipation of the new opportunities that technologies for processing "big data" will bring, many organizations are already organizing the process of collecting and storing various kinds of information.

For commercial and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data: e-mail messages, multimedia, and other similar content. According to Gartner, the winners of the data race will be those who learn to handle the widest variety of information sources.

Cisco research: Big Data will help increase IT budgets

A spring 2013 study titled the Cisco Connected World Technology Report, conducted in 18 countries by the independent research company InsightExpress, surveyed 1,800 college students and an equal number of young professionals aged 18 to 30. The survey was conducted to determine the level of readiness of IT departments to implement Big Data projects and to understand the associated problems, the technological shortcomings, and the strategic value of such projects.

Most companies collect, record, and analyze data. Nevertheless, the report says, many companies face a whole range of complex business and information technology problems in connection with Big Data. For example, 60 percent of respondents acknowledge that Big Data solutions can improve decision-making and increase competitiveness, but only 28 percent said they are already receiving real strategic advantages from the accumulated information.

More than half of the IT executives surveyed believe that Big Data projects will help increase IT budgets in their organizations, since there will be increased demands on technology, personnel, and professional skills. At the same time, more than half of respondents expect such projects to increase IT budgets in their companies as early as 2012. 57 percent are confident that Big Data will increase their budgets over the next three years.

Eighty-one percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies may affect the speed of adoption of Big Data solutions and the value of these solutions for business.

Companies collect and use data of a wide variety of types, both structured and unstructured. Here are the findings highlighted by the study's participants (Cisco Connected World Technology Report):

Nearly half (48 percent) of IT executives predict that the load on their networks will double over the next two years. (This is especially typical of China, where 68 percent of respondents hold this view.) 23 percent of respondents expect network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for explosive growth in network traffic.

27 percent of respondents acknowledged that they need clearer IT policies and stronger information security measures.

21 percent pointed to the need for greater bandwidth.

Big Data opens up new opportunities for IT departments to create value and build close relationships with business units, allowing them to increase revenue and strengthen the company's financial position. Big Data projects make IT departments a strategic partner of the business units.

According to 73% of respondents, the IT department will become the main driver of the implementation of the Big Data strategy. At the same time, respondents believe that other departments will also be involved in implementing the strategy. Above all, these are the departments of finance (named by 24 percent of respondents), R&D (20 percent), operations (20 percent), and engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Managing big data will require millions of new jobs

World IT spending will reach $3.7 trillion in 2013, which is 3.8% more than spending on information technology in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at a much faster pace, says Gartner.

By 2015, 4.4 million jobs in information technology will be created worldwide to service big data, 1.9 million of them in the United States. Moreover, since each such job will give rise to three additional jobs outside the IT sector, in the United States alone 6 million people will soon be working to support the information economy.

According to Gartner experts, the main problem is that the industry lacks the talent for this: neither the private nor the public education system, in the United States for example, is able to supply the industry with enough qualified personnel. As a result, of the predicted new jobs, only one in three will be filled by IT specialists.

Analysts believe that the role of nurturing qualified IT personnel should be taken on directly by companies that urgently need them, since such employees will be their ticket into the new information economy of the future.

2012

First skepticism about "Big Data"

Analysts at Ovum and Gartner suggest that for big data, the fashionable topic of 2012, the time of liberation from illusions may soon come.

The term "Big Data" typically refers to the constantly growing volume of information coming online from social media, sensor networks, and other sources, as well as to the growing range of tools used to process the data and identify important business trends on its basis.

"Because of (or despite) the hype around the idea of big data, manufacturers in 2012 watched this trend with enormous hope," said Tony Baer, an analyst at Ovum.

Baer reported that DataSift had conducted a retrospective analysis of mentions of big data on social networks at the end of 2012.

We regularly come across fashionable buzzwords whose meaning we seem to understand intuitively, yet we lack a clear picture of what the thing actually is and how it works.

One of these is Big Data. In Russian there is a literal translation of the term, but more often people simply say and write: Big Data. Everyone has surely come across these words on the Internet, yet what exactly they mean is something that humanities-minded office workers, far removed from the subtleties of the digital world, do not always understand.

An excellent attempt to fill this gap for the widest circle of users is an article by one of our favorite authors, Bernard Marr, titled "What is Big Data? A super simple explanation for everyone." Without abstruse jargon, it explains the key ideas of this phenomenon for anyone, regardless of education or field of activity.

In fact, for the last few years we have already been living in a world thoroughly permeated by Big Data, but we continue to be confused about what it actually is. This is partly because the very concept of Big Data is constantly being transformed and reinterpreted, since the world of high technology and the processing of large volumes of information is changing very quickly, with ever newer options appearing. And the volume of this information is constantly growing.

So, what does Big Data mean in 2017?

It all started with the explosive growth in the amount of data we have created since the beginning of the digital era. This became possible mainly thanks to the growth in the number and power of computers, the expansion of the Internet, and the development of technologies capable of capturing information from the real, physical world we all live in and converting it into digital data.

In 2017, we generate data when we go online, when we use our GPS-equipped smartphones, when we chat with friends on social networks, when we download mobile apps or music, and when we shop.

One could say that we leave behind a multitude of digital footprints whatever we do, so long as our actions involve any digital transactions. That is, almost always and everywhere.

Moreover, the volume of data generated by machines themselves is growing at a tremendous rate. Data is created and transmitted when our smart devices communicate with each other. Manufacturing plants around the world are equipped with devices that collect and transmit data day and night.

In the near future, our streets will be filled with self-driving cars that plot their own routes based on maps whose data is generated in real time.

What can we do with Big Data?

The endlessly growing stream of sensor information, photographs, text messages, audio, and video lies at the heart of Big Data, which we can use in ways that would have been impossible to imagine just a few years ago.

Right now, projects based on Big Data are helping to:

- Treat diseases and prevent cancer. Medicine driven by Big Data analyzes huge numbers of medical records and images, which makes very early diagnosis possible and helps create new treatment methods.

- Fight hunger. Agriculture is undergoing a genuine Big Data revolution, which makes it possible to use resources in a way that maximizes yields with minimal disturbance to the ecosystem and optimizes the use of machines and equipment.

- Discover distant planets. NASA, for example, analyzes an enormous amount of data and uses it to model future missions to distant worlds.

- Predict emergencies of various kinds and minimize possible damage. Data from numerous sensors can predict where and when a disaster might strike, as well as the probable behavior of people in an emergency, which increases the chances of survival.

- Prevent crimes through technologies that allow resources to be allocated more efficiently and directed where they are most needed.

And for most of us, Big Data simply makes everyday life easier, from online shopping and trip planning to navigating around a big city.

Choosing the best time to buy plane tickets and deciding which movie or series to watch have become much easier thanks to the work of Big Data.

How does this work?

Big Data works on the principle: the more you know about something, the more accurately you can predict what will happen next. Comparing individual data points and the relationships between them (we are talking about enormous volumes of data and an incredibly large number of possible connections between them) makes it possible to discover previously hidden patterns. This makes it possible to look inside a problem and, ultimately, to understand how we can manage a particular process.

Most often, the processing of large volumes of information involves building models based on the collected data and running simulations, during which the key settings are gradually varied while the system monitors how each "adjustment" affects the possible result.
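
A toy sketch of that loop, under invented assumptions: a simulated market model is evaluated over a grid of "key settings" (price and ad budget here), and the combination with the best average outcome is kept. Every number and name is illustrative only.

```python
import itertools
import random

def simulate(price, ad_budget, trials=2_000):
    """Toy market simulation: average profit for one combination
    of the 'key settings' (all coefficients are invented)."""
    profit = 0.0
    for _ in range(trials):
        demand = max(0.0, 120 - 2.0 * price + 0.3 * ad_budget
                     + random.gauss(0, 10))
        profit += demand * (price - 8) - ad_budget
    return profit / trials

# Sweep the settings and watch how each adjustment changes the outcome
grid = itertools.product(range(10, 31, 5), range(0, 101, 25))
best = max(grid, key=lambda s: simulate(*s))
print("best (price, ad budget):", best)
```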

This process is fully automated: millions of simulations are analyzed and all possible options are tried until the pattern (the required scheme) is found, or until the "insight" arrives that helps solve the very problem for which everything was started.

Unlike the objects familiar to us, data is often collected in unstructured form, meaning it is difficult to put into the tables with rows and columns familiar to us humans. A large share of data is transmitted as images and video: from satellite photos to the selfies you post on Instagram or Facebook, as well as e-mail and messenger entries and telephone calls.

To give practical meaning to this endless and heterogeneous stream of data, Big Data often relies on the most advanced analysis technologies, including artificial intelligence and machine learning (when a computer program learns from data).

Computers themselves are learning to determine what a given piece of information represents, for example by recognizing images or language, and they can do this much faster than humans.

Big Brother?

In proportion to the unprecedented opportunities that Big Data gives us today, the number of concerns associated with it is also growing.

PRIVACY OF PERSONAL DATA. Big Data collects an enormous amount of information about our private lives, much of which we would prefer to keep secret.

SECURITY. Even if we decide there is nothing wrong with handing over all our personal data to a machine for the sake of some specific, tangible benefit, can we be sure our data is stored in a safe place?
Who can guarantee this to us, and how?

DISCRIMINATION. If everything is known, is it acceptable to discriminate against people based on what Big Data knows about them? Banks use your credit history, and insurance companies set car insurance rates based on what they know about you. How far can this go?

It can be assumed that, in order to minimize risks, companies, government agencies, and even private individuals will take into account what can be learned about us and, accordingly, restrict our access to resources and information.

For all its achievements, we must admit that these open questions are an equally integral part of Big Data. Until recently it was scientists who puzzled over them, but now the time has come when they directly concern business, which wants to use the advantages of Big Data for its own purposes. And ignoring them can threaten catastrophic consequences.

Foreword

"Big data" is a fashionable new term that appears at almost all professional conferences devoted to data analysis, predictive analytics, data mining, and CRM. The term is used in fields where working with qualitatively large volumes of data is relevant and where the speed of the data flowing into the organizational process is constantly increasing: economics, banking, manufacturing, marketing, telecommunications, information technology, web analytics, medicine, and others.

Along with the rapid accumulation of information, data analysis technologies are developing at a rapid pace. Whereas a few years ago it was only possible, say, to segment customers into groups with similar preferences, now it is possible to build models for each customer in real time, analyzing, for example, their movement around the Internet in search of a specific product. The customer's interests can be analyzed, and the resulting model can be used to display suitable advertising or specific offers. The model can also be tuned and rebuilt in real time, something that was unthinkable just a few years ago.

In the telecommunications field, for example, technologies have been developed to determine the physical location of cell phones and their owners, and it seems that the idea described in the 2002 science fiction film Minority Report, where advertising in shopping centers is tailored to the interests of specific passers-by, will soon become reality.

At the same time, there are situations where enthusiasm for new technologies can lead to disappointment. For example, sparse data, which gives important insight into reality, is often far more valuable than Big Data describing mountains of frequently non-essential information.

The goal of this article is to clarify and discuss the new possibilities of Big Data and to illustrate how the STATISTICA analytical platform from StatSoft can help you use Big Data effectively to optimize processes and solve problems.

How big is Big Data?

Of course, the correct answer to this question should be: "it depends..."

In current discussions, the concept of Big Data is described as data whose volume is on the order of terabytes.

In practice (if we are talking about gigabytes or terabytes), such data is easy to store and manage using "traditional" databases and standard hardware (database servers).

The STATISTICA software uses multithreaded technology in its algorithms for data access (reading), transformation, and the building of predictive (and scoring) models, so such data samples can be analyzed easily and do not require specialized tools.

Some of StatSoft's current projects process samples of approximately 9 to 12 million rows. Multiply them by 1,000 parameters (variables) collected and organized in a data warehouse for building risk or predictive models, and a file of this kind will be "only" about 100 gigabytes. This is, of course, not a small data store, but its size does not exceed the capabilities of standard database technology.

The STATISTICA product line for batch analysis and the building of scoring models (STATISTICA Enterprise), for real-time decision making (STATISTICA Live Score), and the analytical tools for creating and managing models (STATISTICA Data Miner, Decisioning) scale easily across multiple servers with multi-core processors.

In practice, this means that sufficient speed of the analytical models (for example, forecasts of credit risk, the probability of fraud, the reliability of equipment components, and so on) to enable operational decisions can almost always be achieved with standard STATISTICA tools.

From large data volumes to Big Data

As a rule, discussions of Big Data center on data stores (and analyses based on such stores) whose volume is much larger than just a few terabytes.

In particular, some data stores can grow to thousands of terabytes, that is, to petabytes (1,000 terabytes = 1 petabyte).

Beyond petabytes, the accumulation of data can be measured in exabytes; for example, the global manufacturing sector in 2010 is estimated to have accumulated a total of 2 exabytes of new information (Manyika et al., 2011).

There are industries where data is collected and accumulated even more intensively.

For example, in manufacturing environments such as power plants, a continuous stream of data is generated every minute or even every second for tens of thousands of parameters.

In addition, so-called "smart grid" technologies are increasingly being deployed, allowing utilities to measure the electricity consumption of individual households every minute or every second.

For such applications, in which data must be stored for years, the accumulated data is classified as Extremely Big Data.

The number of Big Data applications is also growing in the commercial and government sectors, where the data in the stores can amount to hundreds of terabytes or petabytes.

Modern technologies make it possible to "track" people and their behavior in various ways. For example, when we use the Internet, shop in online stores or large retail chains such as Walmart (according to Wikipedia, Walmart's data store is estimated at more than 2 petabytes), or move around with our cell phones switched on, we leave a trail of our actions, which leads to the accumulation of new information.

Various means of communication, from simple phone calls to the uploading of information to social networking sites such as Facebook (according to Wikipedia, 30 billion items of information are exchanged each month), or video sharing on sites such as YouTube (YouTube states that 24 hours of video are uploaded every minute; see Wikipedia), generate an enormous amount of new data every day.

Similarly, modern medical technologies generate large volumes of data relevant to the provision of medical care (images, video, real-time monitoring).

Therefore, the classification of data volumes can be presented as follows:

Large data sets: from 1,000 megabytes (1 gigabyte) to hundreds of gigabytes

Huge data sets: from 1,000 gigabytes (1 terabyte) to several terabytes

Big Data: from several terabytes to hundreds of terabytes

Extremely Big Data: from 1,000 to 10,000 terabytes = from 1 to 10 petabytes

Tasks associated with Big Data

There are three types of tasks related to Big Data:

1. Storage and management

Data volumes in the hundreds of terabytes or petabytes do not allow the data to be easily stored and managed with traditional relational databases.

2. Unstructured information

Most Big Data is unstructured. That is: how can text, video, images, and so on be organized?

3. Big Data analysis

How can unstructured information be analyzed? How can simple reports be compiled on the basis of Big Data, and how can deeper predictive models be built and deployed?

Storing and managing Big Data

Big Data is usually stored and organized in distributed file systems.

In general terms, the information is stored on dozens (sometimes thousands) of hard drives on standard computers.

A so-called "map" keeps track of where (on which computer and/or disk) each specific piece of information is stored.

To ensure fault tolerance and reliability, each piece of information is usually saved several times, for example three times.

So, for example, suppose you have collected individual transactions from a large retail chain. Details of each transaction are stored on different servers and hard drives, and the "map" indexes where exactly the information about the corresponding transaction is kept.
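
The sketch below is a schematic illustration of such a "map," not any real file system's API: each block of data is deterministically assigned to three machines by hashing its identifier, and the map simply records the assignment. All names are invented.

```python
import hashlib

MACHINES = [f"node-{i:02d}" for i in range(12)]
REPLICAS = 3

def place(block_id: str) -> list[str]:
    """Pick three distinct machines for a block, deterministically,
    by hashing the block id (a toy stand-in for the real 'map')."""
    h = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    start = h % len(MACHINES)
    return [MACHINES[(start + i) % len(MACHINES)] for i in range(REPLICAS)]

# The 'map' then simply records block -> machines holding its replicas
block_map = {b: place(b) for b in ["txn-000001", "txn-000002"]}
print(block_map)
```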

Using standard hardware and open-source software for managing this distributed file system (for example, Hadoop), it is relatively easy to implement reliable data stores on the petabyte scale.

Unstructured information

Most of the information collected in a distributed file system consists of unstructured data such as text, images, photographs, and video.

This has its advantages and disadvantages.

The advantage is that the ability to store big data allows you to save "all the data" without worrying about which part of it will be relevant for later analysis and decision making.

The disadvantage is that in such cases, further processing of these huge data arrays is required to extract useful information.

Although some of these operations may be simple (for example, simple counts, etc.), others require more complex algorithms that must be specially designed to run efficiently on the distributed file system.

One top manager once told StatSoft that he had "spent a fortune on IT and data storage but still hadn't started making money from it," because he had not thought about how best to use this data to improve his core business.

So, while data volumes can grow exponentially, the ability to extract information and act on the basis of that information is limited and will asymptotically approach a ceiling.

It is important that methods and procedures for building and updating models, as well as for automating decision making, be developed alongside the data storage systems to ensure that such systems are useful and beneficial to the business.

Big Data Analysis

This is indeed the big problem associated with unstructured Big Data: how to analyze it usefully. Far less has been written about this than about data storage and Big Data management technologies.

There are a number of questions worth considering here.

Map-Reduce

When analyzing hundreds of terabytes or petabytes of data, it is not feasible to extract the data into some other location for analysis (for example, into STATISTICA Enterprise Analysis Server).

The process of transferring the data over channels to a separate server or servers (for parallel processing) would take too much time and generate too much traffic.

Instead, analytical computations must be performed physically close to the place where the data is stored.

The Map-Reduce algorithm is a model for distributed computing. Its principle is as follows: the input data is distributed across the worker nodes (individual nodes) of the distributed file system for preliminary processing (the map step), and then the already pre-processed data is aggregated (the reduce step).

Thus, for example, to compute a total sum, the algorithm computes subtotals in each node of the distributed file system in parallel, and then adds up these intermediate values.
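
Here is the idea in miniature, in plain Python on a single machine; in a real distributed file system, each partial sum in the map step would be computed on the node that physically holds that partition of the data.

```python
from functools import reduce

# Pretend each sublist lives on a different node of the cluster
partitions = [
    [12.0, 7.5, 3.1],   # node 1
    [8.8, 0.4],         # node 2
    [5.5, 9.9, 1.2],    # node 3
]

# map step: every node computes its own partial sum (in parallel, in reality)
partial_sums = [sum(p) for p in partitions]

# reduce step: the coordinator combines the intermediate results
total = reduce(lambda a, b: a + b, partial_sums)
print(total)
```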

A large amount of information is available on the Internet about how various computations can be performed with the map-reduce model, including for predictive analytics.

Simple statistics, Business Intelligence (BI)

For building simple BI reports, there are many open-source products that can compute sums, averages, proportions, and the like using map-reduce.

Thus, obtaining precise counts and other simple statistics for reporting is very easy.

Predictive modeling, advanced statistics

At first glance, it may seem that building predictive models on a distributed file system is fundamentally similar, but this is not the case at all. Let us consider the preliminary stages of data analysis.

Data preparation. Some time ago, StatSoft carried out a series of large and successful projects involving very large data sets describing the minute-by-minute parameters of a power plant's operation. The goal of the analysis was to increase the efficiency of the power plant's operation and reduce emissions (Electric Power Research Institute, 2009).

It is important to note that, even though the data sets can be very large, the information contained in them may have a significantly smaller dimensionality.

For example, while data accumulates every second or every minute, many parameters (gas and furnace temperatures, flows, damper positions, and so on) remain stable over long intervals. In other words, data recorded every second is largely a repetition of the same information.

Thus, it is necessary to perform "smart" aggregation of the data, obtaining for modeling and optimization a data set that contains only the necessary information about the dynamic changes affecting the efficiency of the power plant's operation and the amount of emissions.

Text classification and data pre-processing. Let us illustrate once again how large data sets can contain much less useful information.

For example, StatSoft participated in projects involving text mining of tweets reflecting how satisfied passengers were with airlines and their services.

Even though a huge number of relevant tweets were retrieved hourly and daily, the sentiments expressed in them were fairly simple and monotonous. Most messages were complaints or short one-sentence messages about a "bad experience." In addition, the number and "strength" of these sentiments were relatively stable over time and across specific issues (for example, lost luggage, bad food, cancelled flights).

Thus, reducing the actual tweets to a sentiment score using text mining methods (such as those implemented in STATISTICA Text Miner) yields a much smaller volume of data, which can then easily be matched with existing structured data (actual ticket sales, or information about frequent flyers). The analysis makes it possible to divide customers into groups and study their characteristic complaints.

There are many tools for performing this kind of data aggregation (for example, sentiment scoring) on a distributed file system, which makes this analytical process easy to accomplish.

Building models

The task is often to quickly build accurate models for the data stored in a distributed file system.

There are map-reduce implementations of various data mining/predictive analytics algorithms suitable for large-scale parallel processing of data in a distributed file system (which can be supported by the STATISTICA platform from StatSoft).

However, just because you have processed a huge volume of data, can you be sure that the resulting model is actually more accurate?

In fact, it is most likely more convenient to build models for smaller segments of the data in a distributed file system.

As a recent Forrester report puts it: "Two plus two equals 3.9, and that is usually good enough" (Hopkins & Evelson, 2011).

The statistical and mathematical point is that a linear regression model including, say, 10 predictors, based on a correctly drawn probability sample of 100,000 observations, will be just as accurate as a model built on 100 million observations.
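
This claim is easy to probe on synthetic data. The sketch below (scaled down so it runs quickly: 1 million rows stand in for the "100 million," and a 50,000-row sample for the "100,000") fits OLS with 10 predictors on both the full data and a random sample and compares the coefficients; the tiny difference illustrates the point.

```python
import numpy as np

rng = np.random.default_rng(42)
p, n_full, n_sample = 10, 1_000_000, 50_000  # scaled-down stand-ins

X = rng.normal(size=(n_full, p))
true_beta = rng.uniform(-2, 2, p)
y = X @ true_beta + rng.normal(0, 1, n_full)

def ols(Xm, ym):
    beta, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
    return beta

idx = rng.choice(n_full, n_sample, replace=False)  # correctly drawn random sample
beta_full = ols(X, y)
beta_sample = ols(X[idx], y[idx])
print("max coefficient difference:",
      np.abs(beta_full - beta_sample).max())  # typically on the order of 0.01
```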

Big data is a blanket term for the non-traditional strategies and technologies needed to collect, organize, and process information from large data sets. Although the problem of working with data that exceeds the computing power or storage capacity of a single computer is not new, in recent years the scale and value of this type of computing has expanded significantly.

In this article you will find the basic concepts you may encounter when exploring big data. It also looks at some of the processes and technologies currently used in this area.

What is big data?

An exact definition of "big data" is difficult to formulate because projects, vendors, practitioners, and business professionals use it quite differently. With this in mind, big data can be defined as:

  • Large data sets.
  • The category of computing strategies and technologies used to process large data sets.

In this context, "large data set" means a data set too large to be processed or stored with traditional tools or on a single computer. This means that the general scale of large data sets is constantly changing and may vary significantly from organization to organization.

Big data systems

The basic requirements for working with big data are the same as for any other data set. However, the massive scale, the speed of processing, and the characteristics of the data encountered at every stage of the process present serious new challenges for tool design. The goal of most big data systems is to extract insights and connections from large volumes of heterogeneous data that would be impossible to obtain by conventional methods.

In 2001, Gartner's Doug Laney presented the "three V's of big data" to describe the characteristics that distinguish big data processing from the processing of other types of data:

  1. Volume (the volume of data).
  2. Velocity (the speed of data accumulation and processing).
  3. Variety (the variety of types of data processed).

Data volume

The exceptional scale of the information processed helps define big data systems. Such data sets can be orders of magnitude larger than traditional sets, which demands more attention at every stage of processing and storage.

Since the requirements can exceed the capabilities of a single computer, the problem often arises of pooling, distributing, and coordinating resources across groups of computers. Cluster management and algorithms that split tasks into smaller parts become increasingly important here.

Speed of accumulation and processing

Another characteristic that fundamentally distinguishes big data from other data systems is the speed with which information moves through the system. Data often arrives in the system from several sources and must be processed in real time to update the system's current state.

This emphasis on instant feedback has led many practitioners to abandon the batch-oriented approach in favor of real-time streaming systems. Data is constantly being added, processed, and analyzed in order to keep up with the influx of new information and to obtain valuable insights at an early stage, when they are most relevant. This requires reliable systems with highly available components to protect against failures along the data pipeline.

Variety of types of collected data

Big data has unique problems associated with the wide variety of sources processed and their relative quality.

Data can come from internal systems such as application and server logs, from social media channels and other external APIs, from physical device sensors, and from other sources. The goal of big data systems is to process potentially useful data regardless of origin by combining all the information into a single system.

The formats and types of media can also vary significantly. Media files (images, video, and audio) are combined with text files, structured logs, and so on. Whereas more traditional data processing systems expect data to enter the pipeline already labeled, formatted, and organized, big data systems usually accept and store data, trying to preserve its original state. Ideally, any transformations or changes to the raw data occur in memory at the time of processing.

Other characteristics

Over time, experts and organizations have proposed expanding the original "three V's," although these innovations tend to describe problems rather than characteristics of big data.

  • Veracity: the variety of sources and the complexity of processing can lead to problems in assessing the quality of the data (and, consequently, the quality of the resulting analysis).
  • Variability: changes in the data lead to wide variations in quality. Identifying, processing, or filtering low-quality data may require additional resources that improve its quality.
  • Value: the ultimate aim of big data is value. Sometimes the systems and processes are so complex that using the data and extracting actual value becomes difficult.

The big data life cycle

So how is data actually processed in a big data system? While approaches to implementation differ, there are commonalities in the strategies and software involved. The general categories of activity are:

  • Ingesting data into the system
  • Persisting the data in storage
  • Computing and analyzing the data
  • Visualizing the results

Before we look at these four categories of workflow, let's talk about clustered computing, an important strategy used by most big data solutions. Setting up a computing cluster is the foundation of the technology used at each stage of the life cycle.

Clustered computing

Because of the qualities of big data, individual computers are often inadequate for handling it. Clusters are a better fit, since they can meet the storage and computational needs of big data.

Big data clustering software combines the resources of many smaller machines, providing a number of benefits:

  • Resource pooling: processing large data sets requires large amounts of CPU and memory resources, as well as plenty of available space to store data.
  • High availability: clusters can provide varying levels of fault tolerance and availability so that hardware or software failures do not affect access to data and its processing. This is especially important for real-time analytics.
  • Easy scalability: clusters support horizontal scaling (adding new machines to the group).

Working in a cluster requires tools for managing cluster membership, coordinating the distribution of resources, and scheduling work on individual nodes. Cluster membership and resource allocation can be handled by software like Hadoop YARN (Yet Another Resource Negotiator) or Apache Mesos.

The assembled computing cluster often acts as the foundation that other software interacts with to process the data. The machines in the computing cluster are typically also involved in managing a distributed storage system.

Data ingestion

Data ingestion is the process of adding raw data to the system. The complexity of this operation depends largely on the format and quality of the data sources and on how far the data is from the desired state before processing.

Large amounts of data can be added to the system using dedicated ingestion tools. Technologies like Apache Sqoop can take existing data from relational databases and add it to a big data system. Apache Flume and Apache Chukwa are projects designed to aggregate and import application and server logs. Message brokers like Apache Kafka can be used as an interface between various data generators and the big data system. Frameworks like Gobblin can aggregate and optimize the output of these tools at the end of the ingestion pipeline.
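For illustration, here is a minimal sketch of the broker pattern in Python using the kafka-python client. The broker address, topic name, and event fields are assumptions invented for this example, not part of any particular pipeline:

```python
import json
from kafka import KafkaProducer  # kafka-python package

# Connect to a broker; the address is an assumption for this sketch.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# An application emits events; the broker buffers them for the big data system.
event = {"user_id": 42, "action": "page_view", "url": "/pricing"}
producer.send("raw-events", value=event)  # hypothetical topic name
producer.flush()  # block until the event is actually handed to the broker
```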

During ingestion, some level of analysis, sorting, and labeling usually takes place. This process is sometimes called ETL (extract, transform, load). Although the term traditionally refers to legacy data warehousing processes, some of the same concepts apply to data entering a big data system. Typical operations include modifying the incoming data for formatting, categorizing and labeling it, filtering out unneeded or invalid records, and validating that it meets expectations.

Ideally, the ingested data is kept as raw as possible, with only minimal formatting applied up front.
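A rough sketch of this ingestion-time ETL idea in plain Python, with made-up log lines and labeling rules, might look like this:

```python
import json

RAW_LOG_LINES = [
    "2023-05-01T10:00:00 INFO user=alice action=login",
    "2023-05-01T10:00:05 DEBUG heartbeat",          # will be filtered out
    "2023-05-01T10:01:12 INFO user=bob action=purchase",
]

def transform(line):
    """Parse a raw log line into a labeled record, or None if irrelevant."""
    timestamp, level, *fields = line.split()
    if level != "INFO":
        return None  # filtering: drop records that fail the relevance check
    record = dict(f.split("=") for f in fields)   # categorize/label fields
    record["timestamp"] = timestamp               # light formatting only
    return record

# Extract -> transform -> load, keeping the data close to its raw state.
cleaned = [r for r in (transform(l) for l in RAW_LOG_LINES) if r]
print(json.dumps(cleaned, indent=2))
```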

Data storage

After ingestion, the data passes to the components that manage storage.

Raw data is usually persisted in a distributed file system. Solutions like HDFS in Apache Hadoop allow large volumes of data to be written across multiple nodes in a cluster. Such a system gives computing resources access to the data, can load it into the cluster's RAM for in-memory operations, and handles component failures. Other distributed file systems, including Ceph and GlusterFS, can be used in place of HDFS.

Data can also be imported into other distributed systems for more structured access. Distributed databases, especially NoSQL databases, are well suited to this role because they can handle heterogeneous data. There are many different types of distributed databases to choose from, depending on how you want to organize and present the data.
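As a sketch of persisting raw data to a distributed file system, here is a hypothetical example using the Python `hdfs` (HdfsCLI) client over WebHDFS; the namenode address, user, and paths are all assumptions:

```python
from hdfs import InsecureClient  # the `hdfs` (HdfsCLI) package

# The WebHDFS address, user, and paths here are assumptions for the example.
client = InsecureClient("http://namenode:9870", user="hadoop")

# Persist a batch of raw records; HDFS replicates the blocks across nodes.
client.write(
    "/data/raw/events-2023-05-01.jsonl",
    data='{"user_id": 42, "action": "page_view"}\n',
    encoding="utf-8",
    overwrite=True,
)

# Listing the directory confirms the file landed in the distributed store.
print(client.list("/data/raw"))
```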

Computing and analyzing the data

Once the data is available, the system can begin processing it. The computing layer is perhaps the most diverse part of the system, and the possible approaches differ depending on the kind of insights required. Data is often processed repeatedly, either iteratively with a single tool or with a series of tools for different types of data.

Batch processing is one method of computing over a large data set. The process involves breaking the data into smaller pieces, scheduling each piece on an individual machine, reshuffling the data based on the intermediate results, and then computing and assembling the final result. This strategy underlies MapReduce in Apache Hadoop. Batch processing is most useful for very large data sets that require a significant amount of computation.
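The map, shuffle, and reduce phases can be sketched in a few lines of plain Python; this toy word count mimics on one machine what MapReduce distributes across a cluster:

```python
from collections import defaultdict
from itertools import chain

documents = ["big data systems", "big clusters", "data pipelines"]

# Map: each "machine" turns its chunk into intermediate (key, value) pairs.
mapped = chain.from_iterable(
    ((word, 1) for word in doc.split()) for doc in documents
)

# Shuffle: group intermediate values by key, as if routing between nodes.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: collapse each group to a single value and assemble the result.
result = {word: sum(counts) for word, counts in groups.items()}
print(result)  # {'big': 2, 'data': 2, 'systems': 1, 'clusters': 1, 'pipelines': 1}
```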

Other workloads require real-time processing. In this case, information must be processed and made ready immediately, and the system must react as new information arrives. One way to implement real-time processing is to handle a continuous stream of data made up of individual elements. Another common characteristic of real-time processors is in-memory computation across the cluster, which avoids having to write to disk.

Apache Storm, Apache Flink, and Apache Spark offer different ways of implementing real-time processing. Each of these technologies involves trade-offs, so the best approach depends on the specific problem. Real-time processing is best suited for analyzing small chunks of data that change rapidly or arrive in the system at high speed.
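To make the contrast with batch concrete, here is a toy Python sketch of stream processing: state is kept in memory and an updated result is emitted after every incoming element, rather than after the whole data set:

```python
from collections import Counter, deque

def sliding_window_counts(stream, window_size=3):
    """Emit per-window counts as each element arrives, with no batch step."""
    window = deque(maxlen=window_size)   # state lives in memory, not on disk
    for event in stream:
        window.append(event)
        yield Counter(window)            # an up-to-date view after every event

events = ["login", "purchase", "login", "error", "login"]
for snapshot in sliding_window_counts(iter(events)):
    print(dict(snapshot))
```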

All of these programs are frameworks. But there are many other ways to compute over and analyze data in a big data system. These tools often plug into the frameworks above and provide additional interfaces for interacting with the lower layers. For example, Apache Hive provides a data warehouse interface for Hadoop, Apache Pig provides a high-level query interface, and SQL-style interaction with the data is provided by Apache Drill, Apache Impala, Apache Spark SQL, and Presto. For machine learning, there are Apache SystemML, Apache Mahout, and MLlib from Apache Spark. For direct analytical programming with broad support in the data ecosystem, R and Python are popular choices.
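As one example of such a higher-level interface, here is a minimal PySpark sketch of querying distributed data with SQL; the file path and column names are assumptions carried over from the earlier examples:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

# The file path and column names are assumptions for the example.
df = spark.read.json("/data/raw/events-2023-05-01.jsonl")
df.createOrReplaceTempView("events")

# A SQL interface over distributed data: the query is planned and executed
# across the cluster, not on a single machine.
top_actions = spark.sql(
    "SELECT action, COUNT(*) AS n FROM events GROUP BY action ORDER BY n DESC"
)
top_actions.show()
```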

Visualization of results

Often, spotting trends and changes in data over time matters more than the values themselves. Visualizing data is one of the most useful ways to identify trends and make sense of a large number of data points.

Real-time processing is frequently used to visualize application and server metrics. The data changes often, and large deltas in the metrics usually indicate a significant impact on the health of the systems or the organization. Projects like Prometheus can be useful for processing streams of data as a time series and visualizing that information.

One popular way to visualize data is the Elastic Stack, formerly known as the ELK stack. Logstash is used for data collection, Elasticsearch for indexing, and Kibana for visualization. The Elastic Stack can process large volumes of data, visualize the results of computations, and interact with raw metrics. A similar stack can be assembled by using Apache Solr for indexing and a Kibana fork called Banana for visualization. This stack is known as Silk.
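For a feel of how data reaches such a stack, here is a hypothetical Python sketch that indexes a metric document directly into Elasticsearch; the address, index name, and fields are invented, and in a real Elastic Stack deployment Logstash would typically fill this role:

```python
from datetime import datetime
from elasticsearch import Elasticsearch  # official Python client

# The address and index name are assumptions for this sketch.
es = Elasticsearch("http://localhost:9200")

es.index(
    index="app-metrics",
    document={
        "service": "checkout",
        "latency_ms": 172,
        "timestamp": datetime.utcnow().isoformat(),
    },
)

# Kibana queries the same index to draw dashboards; here is the raw query.
hits = es.search(index="app-metrics", query={"match": {"service": "checkout"}})
print(hits["hits"]["total"])
```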

Another visualization technology used for interactive data science work is the data notebook. Such projects allow interactive exploration and visualization of data in a format convenient for sharing, presenting, and collaborating. Popular examples of this type of interface are Jupyter Notebook and Apache Zeppelin.

Big data glossary

  • Big data: a broad term for data sets that cannot be reasonably handled by traditional computers or tools because of their volume, velocity, and variety. The term is also applied to the technologies and strategies used to work with such data.
  • Batch processing: a computing strategy that involves processing data in large sets. It is typically ideal for non-time-sensitive work that operates on very large data sets.
  • Clustered computing: the practice of pooling the resources of multiple machines and managing their combined capabilities to complete tasks. This requires a cluster management layer that handles communication between individual nodes and coordinates the work.
  • Data lake: a large repository of collected data in a relatively raw state. The term is often used to refer to unstructured and frequently changing big data.
  • Data mining: a broad term for the practice of finding patterns in large data sets. It is an attempt to refine a mass of data into a more comprehensible and cohesive body of information.
  • Data warehouse: a large, ordered repository of data for analysis and reporting. Unlike a data lake, a warehouse consists of data that has been cleaned, integrated with other sources, and well ordered. Data warehouses are often mentioned alongside big data, but they are usually components of more conventional data processing systems.
  • ETL (extract, transform, load): the process of taking raw data and preparing it for use. Although traditionally associated with data warehouses, aspects of this process are also found in the ingestion pipelines of big data systems.
  • Hadoop: an open-source Apache project for big data. It consists of a distributed file system called HDFS and a cluster and resource scheduler called YARN. Batch processing capabilities are provided by the MapReduce computation engine. Modern Hadoop deployments can run other computational and analytical systems alongside MapReduce.
  • In-memory computing: a strategy that involves moving the working data sets entirely into a cluster's collective memory. Intermediate calculations are not written to disk; they are held in memory instead. This gives such systems an enormous speed advantage over I/O-bound systems.
  • Machine learning: the study and practice of designing systems that can learn, adjust, and improve based on the data fed to them. This typically involves implementations of predictive and statistical algorithms.
  • Map reduce (not to be confused with Hadoop's MapReduce): an algorithm for scheduling work on a computing cluster. The process involves splitting the problem set and distributing it across nodes, computing intermediate results, shuffling the results to group like sets, and then reducing the results by outputting a single value for each set.
  • NoSQL: a broad term for databases designed outside the traditional relational model. NoSQL databases have different trade-offs than relational databases, but their flexibility and distributed architecture often make them well suited to big data systems.
  • Stream processing: the practice of computing over individual data items as they move through a system. This allows real-time analysis of the data and is suitable for processing time-sensitive operations using high-velocity metrics.

I first came across the term “Big Data” thanks to German Gref (the head of Sberbank). He said his bank was already actively working on its implementation, since it would help cut the time spent working with each client.

The second time I came across this concept was at a client's online store, which we were overhauling as we grew the assortment from a couple of thousand to tens of thousands of product items.

The third time was when I saw that Yandex was looking for a big data analyst. Then I decided to dig deeper into this topic and, along the way, write an article explaining this term that excites the minds of TOP managers and the Internet space.

What is it?

I'll start the article by explaining the term. I'm doing this not to show off how clever I am, but because the topic is genuinely complex and needs careful explanation.

For example, you can read what big data is on Wikipedia, understand nothing, and then come back to this article to understand its meaning for business and its practical application. So, let's start with a description and then move on to business examples.

Big data means big data. Amazing, isn't it? In reality, it simply translates from English as “large data”. But that is, one might say, a definition for dummies.

Big data technology is an approach/method of processing very large volumes of data to extract new information that is hard to obtain by ordinary means.

The data can be both processed (structured) and fragmented (that is, unstructured).

The term itself appeared relatively recently. In 2008, a scientific journal predicted this approach as something necessary for working with large volumes of information growing exponentially.

For example, the volume of information on the Internet that needs to be stored, and of course processed, grows by 40% every year. Once again: +40% of new information appears on the Internet every year.

If printed documents are understandable and the methods of processing them are clear (transfer to electronic form, stitch into one folder, number the pages), then what do you do with information presented in entirely different “media” and in entirely different volumes:

  • Internet documents;
  • Blogs and social networks;
  • Audio and video sources;
  • Measuring devices.

There are characteristics that allow information and data to be classified as big data; that is, not every data set is suitable for analytics. These characteristics contain the key concept of big data. They all fit into the three Vs.

  1. Volume. The data is measured by the physical size of the documents to be analyzed;
  2. Velocity. The data does not stand still but constantly grows, which is why fast processing is needed to obtain results;
  3. Variety. The data may not be in a single format, i.e. it may be scattered, structured, or only partially structured.

However, from time to time a fourth V (veracity, the reliability of the data) is added to the VVV, and sometimes a fifth V (in some versions viability, in others value).

Somewhere I have even seen 7V mentioned, characterizing data related to big data. But in my opinion this is already like the marketing P series (where new Ps are periodically added, although the original 4 are enough for understanding).


Who needs it?

The logical question is: who could need to analyze this information (after all, the data runs to hundreds and thousands of terabytes)?

Or, put differently: here is a mass of information. What has been devised for it to date? What is the state of big data in marketing and business?

  1. Ordinary databases cannot store and process (I'm not even talking about analytics here, just storage and processing) enormous volumes of information.
    Big data solves this problem. It successfully stores and manages information of great volume;
  2. It structures data coming from various sources (video, images, audio and text documents) into a single, understandable and digestible form;
  3. It enables analytics and accurate forecasts based on structured and processed information.

It sounds complicated. But, to put it simply, any marketer will understand that if you study a large volume of information (about you, your company, your competitors, your industry), you can get very decent results:

  • A full understanding of your company and your business in numbers;
  • The ability to study your competitors and, in turn, pull ahead by gaining the upper hand over them;
  • New information about your clients.

And precisely because big data technology delivers such results, everyone is making a fuss over it. Companies are trying to harness it in their business to boost sales and cut costs. Specifically, that means:

  1. Increasing cross-sales and upsells through better knowledge of customer preferences;
  2. Finding popular products and the reasons they are bought (and vice versa);
  3. Improving products and services;
  4. Raising the level of service;
  5. Increasing customer loyalty and focus;
  6. Preventing fraud (more relevant for the banking sector);
  7. Cutting unnecessary expenses.

The most widely cited example, in every sense, is Apple, which collects data about its users (phone, watch, computer).

It is thanks to its ecosystem that the corporation knows so much about its users and then uses this to earn a profit.

You can read these and other examples of its use in almost any other article on the subject.

Here's an example

Let me tell you about another project. Or rather, about a man who is building for the future using big data solutions.

This is Elon Musk and his company Tesla. His main dream is to make cars autonomous: you get behind the wheel, turn on the autopilot from Moscow to Vladivostok and... fall asleep, because you don't need to drive the car at all; it does everything itself.

Sounds like fantasy? But no! Elon simply acted much more wisely than Google, which tries to control cars with the help of dozens of satellites. He went a different way:

  1. Every car that is sold is equipped with an on-board computer that collects all the information.
    “All” means literally everything: about the driver, their driving style, the roads around them, the movement of other cars. The volume of such data reaches 20-30 GB per year;
  2. Next, this information is transmitted via satellite link to a central computer that processes the data;
  3. On the basis of the big data gathered by this computer, a model of the driverless car is built.

In fact, while Google is doing rather poorly and its cars keep getting into accidents, things are going much better for Musk because he works with big data: even the test models show very good results.

But... this has all been about economics. Why do we keep going on about profit? Much of what big data can determine has nothing to do with earnings and money.

Google's statistics, based on big data, reveal an interesting thing.

Before doctors announce the start of an epidemic in a region, the number of search queries about treating that disease grows considerably in that region.

Thus, proper study of the data and its analysis can produce forecasts and predict the start of an epidemic (and, accordingly, help prevent it) far faster than the conclusions and actions of official bodies.

Application in Russia

However, Russia, as always, lags a little behind. Thus, the very definition of big data appeared in Russia no more than 5 years ago (I'm talking about ordinary companies here).

And this despite it being one of the fastest-growing markets in the world (drugs and arms nervously smoke on the sidelines): every year the market for software for collecting and analyzing big data grows by 32%.

To characterize the big data market in Russia, I'm reminded of an old joke. Big data is like sex before 18: everyone talks about it, there is a lot of hype and little real action, and everyone is ashamed to admit that they themselves aren't doing it. Indeed, there is plenty of hype around it, but little real action.

However, the research company Gartner announced back in 2015 that big data is no longer a growing trend (like, say, artificial intelligence) but a fully independent set of tools for analysis and for developing advanced technologies.

The most active niches where big data is applied in Russia are banking/insurance (it's no accident I opened the article with the head of Sberbank), telecommunications, retail, security, and the government sector.

Now let's talk in more detail about a couple of sectors of the economy that big data algorithms have touched.

1. Banks

Let's start with banks and the information they collect about us and our actions. For an example, I took the TOP 5 Russian banks that are actively investing in big data:

  1. Sberbank;
  2. Gazprombank;
  3. VTB 24;
  4. Alfa Bank;
  5. Tinkoff Bank.

Alfa-Bank is especially pleasant to see among the Russian leaders. At the very least, it's nice to know that a bank you are an official partner of understands the need to introduce new marketing tools into its company.

But I want to show examples of the use and successful implementation of big data through a bank that I like for the unconventional outlook and actions of its founder.

I'm talking about Tinkoff Bank. Their main task was to develop a system for analyzing big data in real time, driven by their growing client base.

The results: internal processes sped up by at least 10 times, and some by more than 100 times.

Now a small digression. Do you know why I mentioned the unconventional antics and actions of Oleg Tinkov? It's just that, in my opinion, it was precisely these that helped him turn from a run-of-the-mill businessman, of which there are thousands in Russia, into one of the most famous and recognizable entrepreneurs. As proof, watch this unusual video:

2. Real estate

With real estate, everything is much more complicated. And this is exactly the example I want to give you so that you understand big data within the bounds of an ordinary business. Initial data:

  1. A large volume of text documentation;
  2. Open sources (private satellites transmitting data on changes to the land);
  3. An enormous volume of uncontrolled information on the Internet;
  4. Constant changes in the sources and the data.

And on the basis of this, it is necessary to prepare and assess the value of a plot of land, for example, near a Ural village. For a professional, this would take a long time.

The Russian Society of Appraisers & ROSECO, using big data analysis software of their own development, cut what would otherwise be some 30 hours of painstaking work down to 30 minutes. A colossal difference.

Tools and technologies

Of course, huge volumes of information cannot be stored and processed on ordinary hard drives.

And the software that structures and analyzes the data is, strictly speaking, intellectual property, and each time it is a custom development. However, there are tools on the basis of which all this beauty is built:

  • Hadoop & MapReduce;
  • NoSQL databases;
  • Data Discovery class tools.

To be honest, I can't clearly explain to you how they differ from one another, since people learn to work with these things at physics and mathematics institutes.

Then why did I bring them up if I can't explain them? Remember how in all the movies the robbers walk into any bank and see a huge number of metal boxes connected by wires? The same goes for big data. For example, here is a model that is currently one of the market leaders.

A big data tool

Its price in the maximum configuration reaches 27 million rubles per rack. This is, of course, the luxury version. But I mention it so you can gauge in advance what creating big data for your business might involve.

Briefly about the main thing

You may ask: why would you, a small or medium-sized business, need to work with big data?

I'll answer with a quote from one person: “In the near future, customers will seek out companies that better understand their behavior and habits and match them as closely as possible.”

Let's face the truth. To implement big data in a small business, you need not only a sizable budget for developing and deploying the software, but also for keeping specialists on staff, such as a big data analyst and a system administrator.

And I'm not even getting into whether you have such data to process in the first place.

OK. For small business the topic is almost never applicable. But this does not mean you should forget everything you've read above. Just study not your own data, but the published results of data analytics from well-known foreign and Russian companies.

For example, the retail chain Target, using big data analytics, found that pregnant women before the second trimester of pregnancy (from the 1st to the 12th week) actively buy unscented products.

Using this data, Target sends them coupons and discounts on unscented products with a limited validity period.

And what if you're, say, a small cafe? It's very simple. Use a loyalty program. After a while, thanks to the accumulated information, you will be able not only to offer customers products relevant to their needs, but also to spot your unsold and highest-margin products with just a couple of mouse clicks (as in the sketch below).
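As a sketch of that “couple of clicks”, here is what such an analysis could look like in Python with pandas; every product name and number below is invented for illustration:

```python
import pandas as pd

# Toy loyalty-program history; every name and number here is invented.
sales = pd.DataFrame({
    "product": ["espresso", "cheesecake", "matcha", "espresso", "cheesecake"],
    "sold":    [120, 15, 4, 95, 11],
    "margin":  [0.65, 0.80, 0.75, 0.65, 0.80],
})

summary = sales.groupby("product").agg(
    total_sold=("sold", "sum"),
    margin=("margin", "first"),
)

# High-margin products that barely sell are natural promotion candidates.
slow_high_margin = summary[(summary["total_sold"] < 20) & (summary["margin"] > 0.7)]
print(slow_high_margin)
```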

Hence the conclusion: a small business is unlikely to implement big data on its own, but it absolutely should use the results and approaches of bigger companies.