Which Data Science Skills are core and which are hot/emerging ones?

We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.



The latest KDnuggets Poll asked
1. Which skills / knowledge areas do you currently have (at the level you can use in work or research)? and
2. Which skills do you want to add or improve?


We selected a list of 30 skills based on a number of previous KDnuggets articles and polls - see useful links at the end of this post, as well as external sources.

Altogether(*), this poll received over 1,500 votes, a large enough sample to make meaningful inferences. On average, a voter reported having 10 skills and wanted to add or improve 6.5.
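As a rough consistency check on these averages: if every percentage is taken over the same set of voters, the sum of the %Have values across all 30 skills, divided by 100, equals the mean number of skills held per voter (and likewise for %Want). Here is a minimal Python sketch using the figures from Tables 1 to 3 further down; the variable and function names are ours, for illustration only.

```python
# %Have and %Want for all 30 skills, copied from Tables 1-3 of this post
# (Table 1 rows first, then Table 2, then Table 3).
HAVE = [
    71.2, 69.0, 66.7, 66.5, 65.9, 64.3, 63.8, 57.3, 57.0, 52.6, 48.3, 45.1, 44.1,
    7.0, 4.2, 8.9, 19.1, 16.0, 10.9, 25.9, 14.0, 25.0, 14.5, 22.3,
    25.7, 22.3, 15.1, 12.7, 10.9, 2.0,
]
WANT = [
    37.1, 25.3, 15.5, 4.6, 16.5, 41.0, 27.8, 16.0, 22.2, 17.5, 14.1, 19.8, 24.0,
    29.6, 13.3, 27.4, 46.4, 34.6, 22.7, 49.6, 23.2, 33.8, 18.4, 27.7,
    15.2, 19.0, 7.7, 7.2, 7.9, 6.9,
]


def mean_selections(percentages):
    """Mean number of options ticked per voter.

    ASSUMPTION: every percentage is computed over the same voter base.
    """
    return sum(percentages) / 100.0


avg_have = mean_selections(HAVE)
avg_want = mean_selections(WANT)
print(f"average skills held:   {avg_have:.1f}")
print(f"average skills wanted: {avg_want:.1f}")
```

The sums come out at roughly 10.3 skills held and 6.7 wanted, close to the reported 10 and 6.5. A small gap is plausible if the published percentages are rounded or do not all share one voter base, for example because the Julia and MATLAB figures are estimates (see the note at the end of this post).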

Fig. 1 below shows the key findings. The X-axis shows % Have Skill (answers to the first poll question) and the Y-axis shows % Want Skill (answers to the second poll question). The size of each circle is proportional to the percentage of voters who have that skill, while the color depends on the Want/Have ratio (red is high, more than 1; blue is low, less than 1).
Note: the Other Big Data Tools entry covers Big Data tools other than Hadoop or Spark.

Fig. 1: Data Science-related Skills, Have skill vs Want to add or improve skill
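The encoding used in Fig. 1 can be sketched in a few lines of Python. This is an illustration only, not the code behind the chart: the function name and the `size_scale` parameter are hypothetical, and the two-color rule is a simplification (the original chart may well use a gradient).

```python
def bubble_encoding(have, want, size_scale=10.0):
    """Map one skill's poll percentages to scatter-plot attributes.

    have, want: %Have and %Want for the skill (0-100).
    size_scale: hypothetical scaling factor, not taken from the post.
    """
    ratio = want / have
    return {
        "x": have,                                # X-axis: % Have Skill
        "y": want,                                # Y-axis: % Want Skill
        "size": have * size_scale,                # circle size grows with %Have
        "color": "red" if ratio > 1 else "blue",  # red: Want/Have above 1
        "ratio": ratio,
    }


# Example values taken from Tables 1 and 2 of this post.
python_point = bubble_encoding(71.2, 37.1)      # Python
tensorflow_point = bubble_encoding(19.1, 46.4)  # TensorFlow
print(python_point["color"], tensorflow_point["color"])
```

With these inputs, Python lands on the right of the chart as a large blue circle, while TensorFlow sits further left as a smaller red one.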

We note two main clusters in this chart.

Cluster 1, in the blue dashed rectangle on the right side of the chart, includes skills that over 40% of all voters have and where the Want/Have ratio is less than 1. We call them Core Data Science Skills. They are listed in Table 1.

Table 1: Core Data Science Skills, in decreasing order of %Have
Skill                    %Have    %Want    %Want/%Have
Python                   71.2%    37.1%    0.52
Data Visualization       69.0%    25.3%    0.37
Critical Thinking        66.7%    15.5%    0.23
Excel                    66.5%     4.6%    0.07
Communications Skills    65.9%    16.5%    0.25
Machine Learning         64.3%    41.0%    0.64
Statistics               63.8%    27.8%    0.44
SQL/Database Coding      57.3%    16.0%    0.28
Business Understanding   57.0%    22.2%    0.39
Math                     52.6%    17.5%    0.33
ETL - Data Preparation   48.3%    14.1%    0.29
R                        45.1%    19.8%    0.44
Scikit-learn             44.1%    24.0%    0.54



Of these, the skills that voters most want to add or improve are Machine Learning (41%) and Python (37%). The slowest-growing skill is Excel: only 4.6% want to add or improve their Excel skills, a Want/Have ratio of just 0.07.

The second cluster, on the left in Fig. 1 and marked with a red border, includes skills that are currently less popular (%Have < 30%) but growing, with a %Want/%Have ratio over 1 (see Table 2). We call them Hot / Emerging Data Science Skills.

Table 2: Hot / Emerging Data Science Skills, in decreasing order of %Want/%Have
Skill                    %Have    %Want    %Want/%Have
PyTorch                   7.0%    29.6%    4.26
Scala                     4.2%    13.3%    3.14
Other Big Data Tools      8.9%    27.4%    3.08
TensorFlow               19.1%    46.4%    2.44
Apache Spark             16.0%    34.6%    2.16
Hadoop                   10.9%    22.7%    2.08
Deep Learning            25.9%    49.6%    1.92
No-SQL Databases         14.0%    23.2%    1.65
NLP - Text Processing    25.0%    33.8%    1.35
Kaggle                   14.5%    18.4%    1.27
Unstructured Data        22.3%    27.7%    1.24

Interestingly, despite opinions that Hadoop is declining, in this poll more people want to learn Hadoop than already know it, so it may still grow in popularity.
We did not include Julia among the hot/emerging skills despite its high Want/Have ratio (3.4) because, with only 2% of voters selecting it, it does not yet have enough support.

The remaining skills (XGBoost, Software Engineering, Java, MATLAB, and SAS) are held by between 10% and 30% of voters but are not growing: each has a Want/Have ratio below 1. Table 3 lists them, together with Julia.

Table 3: Other Data Science Skills, in decreasing order of %Have
Skill                    %Have    %Want    %Want/%Have
Software Engineering     25.7%    15.2%    0.59
XGBoost                  22.3%    19.0%    0.85
Java                     15.1%     7.7%    0.51
SAS                      12.7%     7.2%    0.57
MATLAB                   10.9%     7.9%    0.73
Julia                     2.0%     6.9%    3.44
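The grouping rules behind Tables 1 to 3 can be written down as a short Python sketch. The 40% and 30% thresholds and the ratio cutoff of 1 come from the text above. The `MIN_SUPPORT` value is our assumption: the post states no explicit cutoff, so 3% is chosen only because it separates Julia (2.0%, excluded) from Scala (4.2%, included). All names in the code are illustrative.

```python
# (skill, %Have, %Want), copied from Tables 1-3 of this post.
SKILLS = [
    ("Python", 71.2, 37.1), ("Data Visualization", 69.0, 25.3),
    ("Critical Thinking", 66.7, 15.5), ("Excel", 66.5, 4.6),
    ("Communications Skills", 65.9, 16.5), ("Machine Learning", 64.3, 41.0),
    ("Statistics", 63.8, 27.8), ("SQL/Database Coding", 57.3, 16.0),
    ("Business Understanding", 57.0, 22.2), ("Math", 52.6, 17.5),
    ("ETL - Data Preparation", 48.3, 14.1), ("R", 45.1, 19.8),
    ("Scikit-learn", 44.1, 24.0),
    ("PyTorch", 7.0, 29.6), ("Scala", 4.2, 13.3),
    ("Other Big Data Tools", 8.9, 27.4), ("TensorFlow", 19.1, 46.4),
    ("Apache Spark", 16.0, 34.6), ("Hadoop", 10.9, 22.7),
    ("Deep Learning", 25.9, 49.6), ("No-SQL Databases", 14.0, 23.2),
    ("NLP - Text Processing", 25.0, 33.8), ("Kaggle", 14.5, 18.4),
    ("Unstructured Data", 22.3, 27.7),
    ("Software Engineering", 25.7, 15.2), ("XGBoost", 22.3, 19.0),
    ("Java", 15.1, 7.7), ("SAS", 12.7, 7.2), ("MATLAB", 10.9, 7.9),
    ("Julia", 2.0, 6.9),
]

CORE_MIN_HAVE = 40.0  # core: over 40% of voters have the skill (from the post)
HOT_MAX_HAVE = 30.0   # hot: fewer than 30% of voters have the skill (from the post)
MIN_SUPPORT = 3.0     # ASSUMPTION: not stated in the post; separates Julia from Scala


def classify(have, want):
    """Return 'core', 'hot', or 'other' for one skill."""
    ratio = want / have
    if have > CORE_MIN_HAVE and ratio < 1:
        return "core"
    if MIN_SUPPORT <= have < HOT_MAX_HAVE and ratio > 1:
        return "hot"
    return "other"


groups = {"core": [], "hot": [], "other": []}
for name, have, want in SKILLS:
    groups[classify(have, want)].append(name)

for label, names in groups.items():
    print(f"{label} ({len(names)}): {', '.join(names)}")
```

Under these rules the 30 skills split into 13 core, 11 hot/emerging, and 6 other, matching the three tables. Changing `MIN_SUPPORT` to any value between 2.0 and 4.2 gives the same split, so the exact number matters less than the fact that some minimum support is required.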

Here is more detail on the poll. Fig. 2 ranks all the skills in decreasing order of %Have.

Fig. 2: Data Science Skills KDnuggets readers have


Fig. 3 shows the skills readers want to add or improve, overlaid with the skills they have.

Fig. 3: Data Science Skills KDnuggets readers want to add or improve (red) and have (blue)
We see that the top skills current and aspiring Data Scientists want to add are Deep Learning, TensorFlow, Machine Learning, and Python.

The poll also asked about employment type:
  • Industry/Self-employed, 64.4%
  • Government/non-profit, 7.2%
  • Academia/University, 7.0%
  • Student, 14.3%
  • Other/NA, 7.1%
Regional distribution was:
  • US/Canada, 37.9%
  • Europe, 28.3%
  • Asia, 19.3%
  • Latin America, 6.1%
  • Africa/Middle East, 4.8%
  • Other, 3.5%
This post presents an initial analysis. Depending on its popularity, we will take a more detailed look later at associations between skills, employment types, and regions.

(*) Note: We originally launched this poll using Google Forms, and it was attacked by bots that cast over 50,000 votes each for Julia and MATLAB. We removed the bot votes while keeping the other votes, and relaunched the poll on another platform without Julia and MATLAB to avoid another attack. The final Julia and MATLAB results are estimated from the valid votes in the first version of the poll.



Related: How to Become More Marketable as a Data Scientist