Big Data Analytics – A look ahead to 2020
As 2020 begins, we are entering a new era for data analytics, one centered on a few core transformations of the entire industry. In 2019 we saw corporate data migrate to the cloud faster than most people had imagined possible in the previous world of on-premises data warehouses. We also saw the emergence of serverless, pre-trained, general-purpose machine learning systems and the rising dominance of the three major cloud providers. These themes, and several new ones, will occupy the days and minds of Chief Data Officers and others responsible for data and analytics in 2020. Here are some things to look forward to in this new year.
Last Throes of On-Premises Data Management
Hadoop has been called by some the biggest head-fake in the technology world in the last 50 years. Just a few years ago there was so much momentum behind the big data platform that every other data management approach seemed to be on a fast track to the dustbin of history. Cloudera has since consolidated the Hadoop industry and pivoted away from talking much about Hadoop at all, and companies’ appetite for a complex data management platform is starting to wane. Data warehousing platforms such as Teradata, Netezza, and Greenplum were hard enough for companies to scale, operate, and pay for, and Hadoop ended up taking that complexity to an even higher level.
What people want now is a simple platform for data storage and data analysis, one that is easy to operate and use and is compatible with all of their existing data analytics and data transformation technologies. They also want a data platform that can be an on-ramp into the world of machine learning. As we enter 2020, no remaining on-premises data platform can meet these needs while also providing the combination of cost control and instant scalability that cloud-native data platforms achieve.
The Emergence of Full-Featured Cloud Data Platforms
Maturing seemingly just in time to catch the fallout of the above trend, cloud-native data platforms are the obvious solution as we enter 2020. Even large companies that a few years ago said they would never store and process their data on a public cloud system are doing just that. The cost savings, the scalability benefits, and, perhaps most importantly, the ability to access cutting-edge machine learning and AI services seamlessly are too compelling to ignore. Further, concerns about the security of the cloud have turned out to be relatively unfounded: companies realize that the large cloud providers have hired many of the best security experts, and that they may as well leverage that expertise rather than try to hire their own.
As we enter 2020, cloud data platforms cover both the storage and the processing of data, including data loading and transformation, database queries (both transactional and analytical), and machine learning operations. The platforms offered by the large cloud mega-vendors Google, Amazon, and Microsoft, as well as by independent data platform upstart Snowflake (together, the “GAMS Platforms”), represent the best-of-breed cloud data platform offerings today. If you are not using, or at least exploring, one of these platforms as the core of your data management strategy by the end of this year, you will be in a small minority.
Serverless, Pre-Trained Machine Learning Systems
One of the benefits of using a modern cloud data platform in 2020 is seamless access to the pre-trained machine learning models that the large cloud vendors are starting to provide. In the past, you needed to assemble training data, train an ML model, and then deploy it to get any benefit from machine learning. These cloud vendors have now done all of that for you in popular machine learning areas such as forecasting, image recognition, sentiment analysis, and personalization.
For example, Amazon Forecast draws on forecasting technology that Amazon built and refined using data from its own business, some of which would likely be impossible for any other company to replicate. It takes your own historical data, applies what Amazon has learned, and predicts a future time series based on what has happened in the past. Amazon Personalize is similar, but it predicts the best offer or item to present to a user, based on what similar users have done in the past.
If your data is stored in a cloud data platform, it is easy and inexpensive to run these best-of-breed machine learning models against your data without doing the usual legwork of assembling training data and training models. This area is well worth exploring in 2020, as large benefits can be achieved with relatively little effort and without hiring your own army of data scientists.
Privacy Regulation
California’s new CCPA legislation took effect on January 1, 2020, and it affects the way companies need to collect, store, and use data well beyond the hills and shores of the great state of California. Together with the European GDPR, the CCPA for the first time puts strong bounds on how companies can use certain kinds of personal data. And since most companies aren’t about to start turning away business from Europeans or Californians, the impact of these regulations is vast.
By now you likely have a GDPR and CCPA compliance plan in place, and if you don’t, you had better get moving on one quickly. But beyond the basic legal checkboxes, these regulations are likely to have a big effect on how you process and especially share data with other companies. You may find, for example, that you can no longer share data containing personal information about consumers, even with your own business partners. You may also want to look into technologies such as Immuta and SecuPi that can mask personal data in your users’ query results, so that they can analyze your customer base without seeing the PII in your customer databases.
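To make the idea of masking personal data in query results concrete, here is a minimal Python sketch. It does not show how Immuta or SecuPi work internally; the column names, the sample rows, the secret key, and the `mask_value` and `mask_rows` helpers are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac

# Hypothetical policy: which columns count as PII. A real deployment would
# typically take this from a data catalog or a governance tool.
PII_COLUMNS = {"name", "email", "phone"}

# Hypothetical secret used to key the pseudonyms. A real key would come from
# a secrets manager, never from source code.
SECRET_KEY = b"replace-with-a-real-secret"


def mask_value(value: str) -> str:
    """Replace a PII value with a short, stable, keyed pseudonym."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:8]


def mask_rows(rows, pii_columns=PII_COLUMNS):
    """Return copies of query result rows with the PII columns masked."""
    masked = []
    for row in rows:
        masked.append({
            col: mask_value(str(val)) if col in pii_columns and val is not None else val
            for col, val in row.items()
        })
    return masked


# Made-up query result, for illustration only.
query_result = [
    {"name": "Ada Example", "email": "ada@example.com", "state": "CA", "orders": 3},
    {"name": "Bo Sample", "email": "bo@example.com", "state": "NY", "orders": 5},
]

masked_result = mask_rows(query_result)
for row in masked_result:
    print(row)
```

Because the same input always maps to the same pseudonym, analysts can still count, group, and join on the masked columns without seeing the underlying values. Whether pseudonymization of this kind is sufficient under GDPR or CCPA is a question for your legal team, not something this sketch settles.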
2020: Just the Beginning on All These Fronts
And 2020 is just the beginning in these areas. Cloud data platforms’ capabilities are leaping forward by the day. More and more pre-trained machine learning models are being offered by large and small companies alike. And privacy concerns and regulations may start out as a constraint or drag on your business, but if you comply with them properly, they could unlock new avenues of value-add for your customers and increase trust between you and your customer base. So keep an eye on these trends, and make sure you are focused on getting the most out of them as we kick off 2020, as their importance will only grow from here!
———————————————————————————————————————————————————————————————————————————-
This post was authored by Tatiana Langseth.
Author Details:
Tatiana Langseth
Founder & CEO- Augaroo, Inc.