Cloud ML In Perspective: Surprises of 2021, Projections for 2022

Let’s take a closer look at the Cloud ML market of 2021 in retrospect (with occasional looks back at the realities of 2020, too). Read this in-depth analysis.



By George Vyshnya, Co-Founder/CTO at SBC




~ Life is just a series of conformity tests. — William Joseph Donovan

 

Introduction

 
 
In my KDnuggets article of January 2021, I made the following forecasts for the Cloud Computing and Cloud ML industries in 2021:

  • The next couple of years will be crucial in the battle of Cloud Computing giants for minds, arms, and budgets in the Data Science and ML industry. Although AWS’s position still looks stronger than that of its top rivals, the challenge from GCP could be the intriguing part of the market reshaping in the years to come. At the same time, MS Azure seems to keep its strong position in North America (while having little chance of penetrating other continents significantly vs. AWS and GCP).
  • However, we entered the age of global turbulence. 2021, the year under the Star of Kings, may expose us to unexpected surprises in every aspect of our lives.

I should admit that both forecasts came true.

The Battle of Giants, with AWS, GCP and MS Azure competing for market share and revenue, continued in 2021, and each giant had its strengths and weaknesses in positioning its products across the globe.


As for the surprises, they did happen. In the Cloud ML industry, we observed several hard-to-predict phenomena in 2021:

- Google lost market share after the recent consolidation of its individual cloud ML products under a unified product umbrella (Google Cloud Vertex AI)

- Databricks jumped into the top 3 products in the Cloud ML market, displacing Google’s rival offerings there


Let’s take a closer look at the Cloud ML market of 2021 in retrospect (with occasional looks back at the realities of 2020, too). The next sections of this article present the industry insights.

Note: The insights are drawn from the data collected in Kaggle’s surveys ‘State of Data Science and Machine Learning 2020’ (https://www.kaggle.com/c/kaggle-survey-2020) and ‘State of Data Science and Machine Learning 2021’ (https://www.kaggle.com/c/kaggle-survey-2021).

 

Databricks Breaking the Wall and Other Cloud ML Trends in 2021

 
 



The 2021 data reveals a number of notable facts and trends in the Cloud ML industry. They are as follows:

  • Google’s move toward consolidating all its Cloud ML products under a new unified platform (that is, Google Cloud Vertex AI) in 2021 did not increase Google’s share of the enterprise (commercial) Cloud ML market.
  • Moreover, the major Cloud ML rivals within the Big Three Cloud Giants (Amazon SageMaker and Azure ML Studio) improved their positions and became the clear market leaders as of 2021 (with Amazon SageMaker in first place, slightly ahead of Azure ML Studio).
  • Databricks takes third place in the Cloud ML product ranking as of 2021 (above Google Cloud Vertex AI and slightly below Amazon SageMaker and Azure ML Studio).
  • Other Cloud ML challenger products (DataRobot, Dataiku, Alteryx, Rapidminer) are well below Databricks and Google Cloud Vertex AI.
  • Amazon SageMaker has kept the leading position in the Cloud ML market for two years in a row (2020–2021).
  • Still, a large fraction of the respondents indicate they do not use Cloud ML in their daily activities at all (as a side note, ‘None’ in 2020 and ‘None’ in 2021 could have quite different meanings, due to the drastic difference between the lists of Cloud ML products offered as survey options in 2020 and 2021).
  • Databricks mounted a strong offensive against Google’s unified AI platform in 2021 (Google’s recent decision to invest in Databricks was probably a trigger for the significant growth of Databricks’s market share in 2021).
  • The Cloud ML industry leaders (Amazon SageMaker and Azure ML Studio) maintain a strong position vs. Databricks and Google Cloud Vertex AI as of the end of 2021.
  • India and the USA are the top two countries by number of users of all the Cloud ML products listed in the Kaggle 2021 Survey questionnaire.
  • Amazon SageMaker takes the leading position in both India and the USA.
  • Azure Machine Learning Studio takes second place in India.
  • Databricks takes second place in the USA, and it has good growth potential in the UK and EU countries in 2022 as well.
  • Companies with the highest ML spending prefer to work with Amazon SageMaker and Databricks, and Azure ML Studio takes third place in this ranking (Google Cloud Vertex AI is well behind them).
  • Google Cloud Vertex AI’s position is better among companies with moderate and small ML spending (which could be an opportunity to convert them into higher-paying customers in the future); it can potentially explore the opportunity to grow in multiple locations outside North America and the EU in 2022.

Let’s now jump into details to see the data-driven story behind the insights listed above.

Note: if you would like to reproduce my analysis in detail, you are welcome to review the relevant Jupyter notebooks in the GitHub repo (https://github.com/gvyshnya/kaggle-2021-survey).

 

Usage of Cloud ML Products By Occupation and Programming Experience (2020–2021)

 
 
Let’s review how Cloud ML products mentioned in the 2021 survey have been used by the survey participants (with the breakdown by their occupation and programming experience).

The breakdown by occupation in 2021 looks as follows



The programming experience of the users of the major Cloud ML products is pictured below



From the charts above, it is evident that

  • A large fraction of the survey respondents do not use any Cloud ML products in their daily activities
  • Among the smaller fraction of respondents who do use such tools, Data Scientists predominate
  • Amazon SageMaker and Azure ML Studio lead in terms of the number of respondents using them
  • The rival product from Google (that is, Google Cloud Vertex AI) is behind the leaders as well as behind Databricks, a challenger product that appeared in the survey only this year
  • Other challenger products (DataRobot, Dataiku, Alteryx, Rapidminer) are well behind the leaders
  • Additionally, Cloud ML products are used most by professionals with 1–3 years and 10–20 years of programming experience
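
For readers who want to see how a breakdown like this can be computed, here is a minimal pandas sketch. It is not the code from the notebooks referenced above. It assumes a layout in which every product option has its own column that holds the product name when the respondent ticked it and is empty otherwise; the column names (`occupation`, `product_a`, and so on), the product labels, and the rows are hypothetical placeholders, not survey data.

```python
import pandas as pd

# Toy stand-in for survey responses (synthetic rows, NOT real survey data).
# Assumed layout: one column per product option; a cell holds the product
# label if the respondent ticked it and is missing otherwise.
responses = pd.DataFrame(
    {
        "occupation": ["Data Scientist", "Data Scientist", "Student", "Data Engineer"],
        "product_a": ["Product A", None, None, "Product A"],
        "product_b": [None, "Product B", None, "Product B"],
        "product_none": [None, None, "None", None],
    }
)

product_columns = [c for c in responses.columns if c.startswith("product_")]

# Reshape to long form: one row per (respondent, ticked product).
long_form = responses.melt(
    id_vars="occupation", value_vars=product_columns, value_name="product"
).dropna(subset=["product"])

# Count ticked products per occupation (rows) and product (columns).
usage_by_occupation = pd.crosstab(long_form["occupation"], long_form["product"])
print(usage_by_occupation)
```

Replacing `occupation` with a programming-experience column would give the second breakdown in the same way.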

Now, let’s review how Cloud ML products mentioned in the 2020 survey have been used by the survey participants (with the breakdown by their occupation and programming experience).

The breakdown by occupation as of 2020 looked as follows



The spread of users of Cloud ML products by programming experience as of 2020 is provided below



We can see that

  • Google Cloud AI Platform / Google Cloud ML Engine led the Cloud ML product usage ‘nomination’ as of 2020
  • The second and third places went to Amazon SageMaker and Azure Machine Learning Studio, respectively
  • Data Scientists were the top users of cloud ML products (for every product investigated)
  • There was a huge chunk of respondents who indicated they do not use cloud ML products at all; this indicates the market is under-saturated and has good growth potential, subject to resolving the marketing and end-user barriers along the way
  • Additionally, professionals with 3–5 years and 5–10 years of programming experience used the cloud ML products the most as of 2020

If we consolidate the observations for 2020 and 2021, we see that

  • Google’s move toward consolidating all of its Cloud ML products under a new unified platform (that is, Google Cloud Vertex AI) in 2021 did not increase Google’s share of the enterprise (commercial) Cloud ML market.
  • Moreover, the major Cloud ML rivals within the Big Three Cloud Giants (Amazon SageMaker and Azure ML Studio) improved their positions and became the clear market leaders as of 2021 (with Amazon SageMaker in first place, slightly ahead of Azure ML Studio).
  • Azure Cognitive Services was excluded from the survey in 2021, so it is hard to estimate the changes in its market position among the Kagglers vs. 2020.
  • It is hard to estimate the market position changes for the challenger Cloud ML products (DataRobot, Databricks, Dataiku, Alteryx, Rapidminer) in 2021 vs. 2020 since they were not listed in the 2020 survey.
  • However, Databricks takes third place in the Cloud ML product ranking as of 2021 (above Google Cloud Vertex AI and slightly below Amazon SageMaker and Azure ML Studio).
  • Other Cloud ML challenger products (DataRobot, Dataiku, Alteryx, Rapidminer) are well below Databricks and Google Cloud Vertex AI.
  • Still, a large fraction of the respondents indicate they do not use Cloud ML in their daily activities at all (as a side note, ‘None’ in 2020 and ‘None’ in 2021 could have quite different meanings, due to the drastic difference between the lists of Cloud ML products offered as survey options in 2020 and 2021).
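
The caveat about differing option lists can be made concrete with a small pandas sketch. It is an illustration under stated assumptions, not the analysis code behind this article: the long-form layout, the column names (`year`, `respondent_id`, `product`), and all rows are hypothetical. "Product B" plays the role of an option dropped after 2020 (as Azure Cognitive Services was), and "Product C" the role of an option that first appeared in 2021 (as the challenger products did).

```python
import pandas as pd

# Synthetic long-form answers (NOT real survey data): one row per
# (year, respondent, ticked product). Labels and counts are illustrative only.
answers = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021, 2021],
        "respondent_id": [1, 1, 2, 3, 10, 11, 11, 12, 13],
        "product": [
            "Product A", "Product B", "None", "Product A",
            "Product A", "Product C", "Product A", "None", "Product C",
        ],
    }
)

# Respondents per year (the denominator for a usage share).
respondents_per_year = answers.groupby("year")["respondent_id"].nunique()

# Respondents per (product, year), then the share of that year's respondents.
counts = answers.groupby(["product", "year"])["respondent_id"].nunique().unstack("year")
shares = counts.div(respondents_per_year, axis="columns")

# Only options offered in both years can be compared directly.
comparable = shares.dropna()
print(comparable)
```

Note that ‘None’ survives this filter even though, as mentioned above, its meaning depends on which products were offered in each year, so its two shares should still be read with care.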

If we interpret the observations above in light of the major business development, fundraising, and publicity progress Databricks achieved between February and August 2021 (see the Appendix below for more details), we can tell that

  • Databricks mounted a strong offensive against Google’s unified AI platform in 2021 (Google’s recent decision to invest in Databricks was probably a trigger for the good growth of Databricks’s market share in 2021).
  • The Cloud ML industry leaders (Amazon SageMaker and Azure ML Studio) maintained a strong position vs. Databricks and Google Cloud Vertex AI as of the end of 2021.

 

Usage of Cloud ML Tools by Organization Size and Industries (2021)

 
 
As for the breakdown of Cloud ML users by industry as of 2021, we observe the following picture



It is evident that

  • A lot of organizations in every industry do not use Cloud ML products as of 2021
  • Companies in Computers/Technologies use Cloud ML the most, with organizations in Academia/Education taking second place and Accounting/Finance taking third
  • For every Cloud ML product under review, most of the users are in the Computers/Technologies industry
  • For Azure Machine Learning Studio and Google Cloud Vertex AI, the second most popular industry is Academia/Education
  • For Amazon SageMaker and Databricks, the second most popular industry is Accounting/Finance

As for the usage of Cloud ML in organizations of various sizes as of 2021, it looks as follows



We find that

  • A lot of organizations of every size do not use Cloud ML products as of 2021
  • Google Cloud Vertex AI and DataRobot are the most popular with the smallest organizations (0–49 employees)
  • Azure Machine Learning Studio and Amazon SageMaker are equally popular with the smallest (0–49 employees) and the largest (10k+ employees) organizations
  • Databricks is more popular with large organizations (those with 10k+ and 1000–9999 employees)

 

Cloud ML and ML Spending within Organizations

 
 
The picture of Cloud ML product usage by ML spending as of 2021 is provided below



As we can see, the industry trends as of 2021 are as follows

  • Companies with the highest ML spending prefer to work with Amazon SageMaker and Databricks
  • Azure ML Studio takes third place in this ranking
  • Google Cloud Vertex AI is well behind the three leaders above among the organizations with high ML spending
  • Google Cloud Vertex AI’s position is better among companies with moderate and small ML spending (which could be an opportunity to convert them into higher-paying customers in 2022)
  • The rest of the Cloud ML products are far behind
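
A ranking of products within a spending category, like the one summarized above, could be derived along the following lines. This is a hedged sketch only: the bucket labels (`low`, `moderate`, `high`) are placeholders for the survey’s spending ranges, the product labels and rows are synthetic, and the column names are hypothetical.

```python
import pandas as pd

# Synthetic long-form answers (NOT real survey data). The bucket labels
# "low" / "moderate" / "high" are placeholders for the survey's spending ranges.
usage = pd.DataFrame(
    {
        "ml_spending": ["high", "high", "high", "moderate", "moderate", "low", "low"],
        "product": [
            "Product A", "Product B", "Product A",
            "Product C", "Product A", "Product C", "Product C",
        ],
    }
)

# Keep the buckets in a meaningful order instead of alphabetical order.
usage["ml_spending"] = pd.Categorical(
    usage["ml_spending"], categories=["low", "moderate", "high"], ordered=True
)

# Rows: spending bucket; columns: product; cells: number of respondents.
by_spending = pd.crosstab(usage["ml_spending"], usage["product"])

# Rank products within the highest-spending bucket.
top_spenders_ranking = by_spending.loc["high"].sort_values(ascending=False)
print(top_spenders_ranking)
```

Reading the `low` and `moderate` rows of `by_spending` in the same way shows where a product that trails among the big spenders may still do comparatively well.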

If we look back at 2020, we can see the picture below



It is evident that

  • Amazon SageMaker clearly took the leading role in every ML spending category as of 2020 (like it does as of 2021).
  • Azure ML Studio and one of the then-current Google Cloud ML products (Google Cloud AI Platform/Google Cloud ML Engine) were almost on a par in the race for second place after Amazon SageMaker, as of 2020.

 

Cloud ML Geography (2021)

 
 
Let’s look at the geographic spread of the users of the leading Cloud ML products among the Kaggle Survey 2021 participants.

First of all, it should be noted that India and the USA are the countries with the largest numbers of ML engineers.

With that said, let’s look at the geographical insights for each of the leading Cloud ML products.
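
As a rough illustration of how per-product country rankings of this kind can be obtained, consider the pandas sketch below. It is not taken from the notebooks behind this article: the long-form layout, the column names, and the product and country labels are hypothetical placeholders, and the rows are synthetic.

```python
import pandas as pd

# Synthetic long-form answers (NOT real survey data): one row per respondent
# who ticked the product. Country and product labels are placeholders.
usage = pd.DataFrame(
    {
        "product": ["Product A"] * 6 + ["Product B"] * 6,
        "country": [
            "Country X", "Country X", "Country X", "Country Y", "Country Y", "Country Z",
            "Country Y", "Country Y", "Country Y", "Country X", "Country X", "Country W",
        ],
    }
)

# Users per (product, country), largest groups first within each product.
users = (
    usage.groupby(["product", "country"])
    .size()
    .rename("users")
    .reset_index()
    .sort_values(["product", "users"], ascending=[True, False])
)

# Top two countries for each product.
top_countries = users.groupby("product").head(2)
print(top_countries)
```

Because the counts are raw numbers of respondents, countries with many survey participants overall will tend to appear at the top for every product, which is worth keeping in mind when reading the rankings below.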

 

Amazon SageMaker

 
 



As we see, the geography of Amazon SageMaker users indicates that

  • India takes first place (and it looks like Amazon SageMaker is the top Cloud ML product in this country, too)
  • The US takes second place
  • Other countries are well behind India and the USA
  • Japan takes third place in terms of the Amazon SageMaker user base (among the respondents to the Kaggle 2021 survey)
  • We can see moderate popularity of the product in Brazil, the UK, EU countries, and Nigeria

 

Azure Machine Learning Studio

 
 



As we see, the geography of Azure Machine Learning Studio users indicates that

  • India is the major location of the product’s users (its popularity in India is only slightly below that of Amazon SageMaker)
  • The USA takes second place (however, Azure Machine Learning Studio is well below Amazon SageMaker in popularity there)
  • Other countries are well behind India and the USA
  • Nigeria takes third place in terms of the Azure Machine Learning Studio user base (among the respondents to the Kaggle 2021 survey)
  • We can see moderate popularity of the product in Brazil, the UK, EU countries, Pakistan, and Kenya

 

Google Cloud Vertex AI

 
 



As we see, the geography of Google Cloud Vertex AI users indicates that

  • India is the major location of the product’s users (although its popularity in India is much lower than that of Amazon SageMaker and Azure Machine Learning Studio)
  • The USA takes second place (however, as in India, its popularity in the US is much lower than that of Amazon SageMaker and Azure Machine Learning Studio)
  • Other countries are well behind India and the USA
  • Surprisingly, Indonesia takes third place in the list of popular locations for Google Cloud Vertex AI
  • A number of countries across the globe are close to Indonesia in terms of the number of survey respondents who use the product (Nigeria, Taiwan, China, Japan, South Korea, Turkey); these areas could be potential opportunities for growing the sales and revenue from Google Cloud Vertex AI
  • There is very low interest in Google Cloud Vertex AI in Canada, the UK and EU countries (the latter can be explained by some state- and industry-level policies that discriminate against Google Cloud as a platform in favor of MS Azure and Amazon AWS in the EU and UK)

 

Databricks

 
 



As we see, the geography of Databricks users indicates that

  • India is the country with the largest number of Databricks users among the survey respondents in 2021 (Databricks outperforms Google Cloud Vertex AI in India, and it underperforms vs. Amazon SageMaker and Azure Machine Learning Studio)
  • The USA takes second place in terms of the number of users of the product across the globe (Databricks actually takes second place in the US, after Amazon SageMaker, outperforming both Azure Machine Learning Studio and Google Cloud Vertex AI)
  • Other countries are well behind India and the USA
  • Brazil takes third place in the world’s ‘Databricks rank’ (however, the UK and EU countries are quite close to it)

 

What Is in it for 2022?

 
 


The trends described above will shape the future of the industry in 2022 for sure. From the insights I collected in my data-driven drill, I can forecast that

  • Being the leader of the industry in 2020–2021, Amazon SageMaker is going to retain its top position across the globe in 2022
  • Further proliferation of Databricks as a Cloud ML product in 2022 could be affected by the ongoing discussion of whether doing ML on top of Spark is a good idea (while the Spark ML libs are incredibly powerful when it comes to BigData-scale datasets, certain API drawbacks of Spark and Databricks leave some room for criticism).
  • Despite the mentioned Spark ML debates, Databricks is likely to further improve its leading positions in both the USA and China, with good growth potential in the UK and EU countries in 2022 as well.
  • The Azure ML Studio offering is quite protected from the ‘Databricks offensive’, as the Azure Cloud platform has ‘contained’ Databricks as solely a BigData Engineering tool (with the Azure Databricks service running in this role for a few years already).
  • Google Cloud Vertex AI can potentially explore the opportunity to grow in multiple locations outside North America and the EU in 2022, targeting medium-sized and quickly growing companies there.
  • The game changer in the Google Cloud ML ecosystem in 2022 could be a massive promotion of the BigQuery ML offering; since it can be a real alternative to the BigData-scale ML offering of Databricks, it may change the Cloud ML tools landscape in the upcoming year.

 

Methodology and References

 
 
This article draws on insights from the data collected in Kaggle’s surveys ‘State of Data Science and Machine Learning 2020’ (https://www.kaggle.com/c/kaggle-survey-2020) and ‘State of Data Science and Machine Learning 2021’ (https://www.kaggle.com/c/kaggle-survey-2021).

Kaggle (www.kaggle.com) is a global community made up of data scientists and machine learners from all over the world with a variety of skills and backgrounds. The community has more than 3 million active members. Although it is not rigorously representative of the entire population of Data Science and ML professionals across the globe from the sociological perspective, it still constitutes a significant fraction of the practitioners and professionals in the field. Therefore, the results of the survey can give a reasonable indication of where the Data Science and AI/ML industry is likely to evolve in the next couple of years.

My comprehensive EDA-style notebooks for Kaggle 2020 Survey and Kaggle 2021 Survey are listed below

The source code for the data-driven drill behind the insights described in this article is available in the GitHub repo at https://github.com/gvyshnya/kaggle-2021-survey

My article “Cloud Computing, Data Science and ML Trends in 2020–2022: The battle of giants” was published by KDnuggets in January 2021 at /2021/01/cloud-computing-data-science-ml-trends-2020-2022-battle-giants.html

You can also find a lot of interesting insights in “State of AI Report 2021” (https://www.stateof.ai/).

 

Appendix. History of Databricks

 
 
Since launching its original data platform built on Apache Spark in 2015, Databricks has grown into a one-stop home for (un)structured data, automated ETL, collaborative data science notebooks, business intelligence using SQL, and full-stack machine learning built on open source MLflow. Interestingly, all three major cloud vendors — Amazon, Google, and Microsoft — invested in Databricks in February 2021.

Databricks is headquartered in San Francisco. It also has operations in Canada, UK, Netherlands, Singapore, Australia, Germany, France, Japan, China, and India.

With investments from all three Cloud Computing giants (Amazon, Microsoft, and Google) as of Feb 2021, as well as the largest valuation among its competitors as of Aug 2021, the company has become a strong challenger in the industry. From this perspective, Databricks products/offerings can potentially

- partially cannibalize other BigData, Cloud ML and AutoML products of the Big 3 Cloud Computing providers (AWS, MS Azure, Google Cloud Platform)

- pose strong competition to the challenger products in the Cloud ML and enterprise AutoML segments (H2O.ai, DataRobot etc.)

The chart below displays the key points in the corporate history of the company


 
Bio: George Vyshnya is Co-Founder/CTO at SBC, helping CEOs and CTOs to grow revenue via implementing smart AI, BI and Web solutions.

Original. Reposted with permission.
