It is 2018, and “big data” is the buzzword of the year in computer science. “Glory to Big Data” … But what is it?
For some time now, certain companies (web, retail, telecom, …) have had to handle large volumes of data. This domain was reserved for them and required specific projects tailored to their needs. In 2001, Doug Laney, an analyst at META Group (acquired by Gartner in 2005), set out the main principles and defined what would become big data through the “3 Vs”.
1 – Volume
The firm IDC expects data volume to grow by 45% per year, which means it roughly doubles every 2 years. The NSA’s main datacenter, reported to hold data on the order of yottabytes (one yottabyte is a thousand billion terabytes), is an extreme example. Obviously, not everyone handles such large volumes of data. Still, many applications will soon outgrow the gigabyte scale, especially when they aggregate different data sources.
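As a quick sanity check on the IDC figures: at a constant growth rate r, data doubles after ln 2 / ln(1 + r) years. The minimal sketch below computes this for the 45% rate quoted above. It assumes the rate stays constant from year to year, and the function names are purely illustrative.

```python
import math


def doubling_time(annual_growth: float) -> float:
    """Years needed for a quantity growing at a constant annual rate to double."""
    # Solve (1 + r)^t = 2 for t.
    return math.log(2) / math.log(1 + annual_growth)


def projected_volume(initial: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at `annual_growth` per year."""
    return initial * (1 + annual_growth) ** years


if __name__ == "__main__":
    print(f"Doubling time at 45%/year: {doubling_time(0.45):.2f} years")
    # Growth factor over two years, starting from 1 unit of data.
    print(f"Growth over 2 years: x{projected_volume(1, 0.45, 2):.2f}")
```

The result, about 1.87 years, fits the "doubling every 2 years" rule of thumb.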
2 – Variety
The volume of unstructured data such as videos, images, tags and text is growing five times faster than that of structured data, and this data holds a great deal of added value. Among the new areas of analysis are Twitter, with real-time monitoring of buzz, and the analysis of tweets to understand what kind of information is being exchanged at scale. Videos or images can be analyzed for facial recognition, license plate reading, etc. Audio streams, meanwhile, can be transcribed into text for easier processing.
3 – Velocity
These huge data streams need to be captured and analyzed quickly. This is made possible by NoSQL databases, which are more flexible and scalable than SQL databases, and by massive parallelization (Hadoop, appliances). For example, Facebook ingests 2000 TB of data each day into HDFS and analyzes 1050 TB of data in 30 minutes with Hive. HDFS and Hive are both part of the Hadoop ecosystem.
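To put the Facebook figures in perspective, they can be converted into average rates. The sketch below assumes decimal units (1 TB = 1000 GB) and a steady rate over the period; the helper names are hypothetical.

```python
def throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Average aggregate throughput in GB/s, using decimal units (1 TB = 1000 GB)."""
    seconds = minutes * 60
    return terabytes * 1000 / seconds


def daily_ingest_gb_per_s(terabytes_per_day: float) -> float:
    """Average ingest rate in GB/s for a given daily volume, assumed evenly spread."""
    return throughput_gb_per_s(terabytes_per_day, 24 * 60)


if __name__ == "__main__":
    print(f"Hive analysis: {throughput_gb_per_s(1050, 30):.0f} GB/s")
    print(f"HDFS ingest: {daily_ingest_gb_per_s(2000):.1f} GB/s on average")
```

Roughly 583 GB/s for the analysis is an aggregate over a whole cluster. No single machine reads data that fast, which is why the work has to be spread across many nodes in parallel.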
4 – And a few other Vs …
Depending on the vendor and the points it wants to emphasize (… and its marketing), other “V”s are highlighted, such as value, visualization and veracity. These additional terms vary from one vendor to another. Although interesting, they are not part of the original definition of big data.
“This is a revolution” full of potential! Data sources of all types can now be cross-referenced and processed at massive scale.