In a world flooded with data, the ability to transform raw numbers into meaningful stories has become a sought-after skill. Data storytelling is more than just presenting facts; it’s an art that breathes life into information and sparks curiosity. Let’s explore the enchanting realm of data storytelling and learn how to captivate audiences with the persuasive magic of data-driven narratives.
Data storytelling is a method of communicating insights and information derived from data through the use of compelling narratives, visuals, and data-driven evidence. It involves presenting data in a way that makes it easier for people to understand, engage with, and draw meaningful conclusions from the information presented. By weaving data into a cohesive and persuasive story, data storytelling enables organizations and individuals to make informed decisions, influence stakeholders, and create impactful presentations.
The art of data storytelling is simple and complex at the same time. Stories provoke thought and bring out insights that could not have been understood or explained before. Yet storytelling is often overlooked in data-driven operations because we believe it’s a trivial task.
What we fail to understand is that even the best stories end up being useless if they are not presented well!
In several firms, the first step towards analyzing anything is storyboarding. It starts with questions like: Why do we have to analyze this? What decisions can we make from it? Sometimes, data alone tells such visual and intricate stories that we don’t need to run complex correlations to confirm them.
The best example of why we need stories and visuals to explain data is Anscombe’s quartet. Anscombe’s quartet is a set of four datasets with nearly identical statistical summaries that look completely different when you visualize them.
These are the four datasets that make up Anscombe’s quartet. If we look at the numbers alone, we find that their summary statistics are almost identical.
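The claim about near-identical summary statistics can be checked directly. The sketch below is a minimal illustration using only the Python standard library; the values are the published Anscombe (1973) data, while the helper names (`mean`, `pearson_r`, `summarize`) are ours and not part of the original analysis.

```python
from math import sqrt

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def summarize(xs, ys):
    """Summary statistics rounded to two decimals, as a report would show them."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "r": round(pearson_r(xs, ys), 2),
    }

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, stats)
```

At two decimal places, all four datasets report the same means and the same correlation, which is exactly why the numbers alone cannot distinguish them.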
Let’s see how they appear when we visualize them.
Did you ever think these four datasets would have such varying visuals?
Creating a story or plot is the first step to putting your best foot forward when selling your ideas. Most people fail to think their stories through and cannot differentiate themselves from mediocrity. Let me take an example and guide you through the steps of creating stories.
We will be exploring a dataset that has news headlines and details of every stock price from the NASDAQ 100 tech companies. The columns selected are as follows:
Visually engaging presentations will inspire your audience, but they take more work to put together. Some of the best presentations were first drafted on scrap paper and tissue paper. Writing down your ideas and flow before you start structuring your story is essential to your final product. The single most important thing you can do to dramatically improve your analytics is to have a story to tell. Without a planned flow, your end result can have a lot of friction.
Aristotle’s classic five-point plan that helps deliver strong impacts is:
I structured my report around plots that would give me a better understanding of my data. The first question I had was: how can I make better business decisions about stocks using the data that I have?
A line graph would help me analyze trends in specific stock prices.
As I can see, all stocks dropped in February 2016. Knowing this, I can scrape news articles from only that period to identify what caused the drop. Now, how do I select which news source to scrape from?
By identifying which news source reported most about a particular stock, we would have reason to believe that this is a good source for the specific stock.
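Picking the source that reported most about each stock comes down to a simple count. The sketch below assumes the headlines have already been reduced to (ticker, source) pairs; the tickers, the source names, and the `top_source_per_stock` helper are all hypothetical placeholders, not the columns of the dataset used in the report.

```python
from collections import Counter, defaultdict

# Hypothetical (ticker, news source) pairs, one per scraped headline.
headlines = [
    ("TSLA", "Source A"), ("TSLA", "Source B"), ("TSLA", "Source A"),
    ("AAPL", "Source B"), ("AAPL", "Source B"), ("AAPL", "Source C"),
    ("TSLA", "Source A"),
]

def top_source_per_stock(pairs):
    """Return {ticker: (source, article_count)} for the most frequent source."""
    counts = defaultdict(Counter)
    for ticker, source in pairs:
        counts[ticker][source] += 1
    # most_common(1) yields a one-element list of (source, count) tuples.
    return {ticker: c.most_common(1)[0] for ticker, c in counts.items()}

best = top_source_per_stock(headlines)
for ticker, (source, n) in best.items():
    print(f"{ticker}: {source} ({n} articles)")
```

The same counts can feed a bar chart per stock, which makes the dominant source visible at a glance.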
Now that you have put forward all the points of your story, your conclusion should be short and powerful. In my report, I wrote short three-to-four-line summaries explaining why to buy a particular stock.
Let us see the common types of data we encounter and how to tell stories from those, by selecting the best-fit charts. Commonly encountered types of data:
When data is found in this form, it’s usually useful to find how often a word has been used or what the sentiment of the text is. Stories can be told best using this form of data. One of the best-suited visualizations for textual data is the word cloud. A word cloud brings the more frequent words to the center and enlarges them, giving us a clear picture of the general idea of the text.
For example, the word cloud displayed above represents a Twitter dataset. It shows that “love” is the most frequent positive term used in the tweets.
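A word cloud is driven by word frequencies, so the counting step is worth seeing on its own. This is a minimal sketch with made-up example tweets and a deliberately tiny, hypothetical stop-word list; a real analysis would use the actual dataset and a fuller stop-word list before handing the counts to a plotting library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "the", "is", "i", "this", "and", "to", "my", "so"}

def word_frequencies(texts, top_n=5):
    """Count lowercase words across all texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Made-up tweets for illustration only.
tweets = [
    "I love this song",
    "Love the weather today",
    "So happy, love my friends",
    "Happy to be here",
]

top_words = word_frequencies(tweets)
print(top_words)
```

The words with the highest counts are the ones a word cloud would draw largest and closest to the center.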
When our data consists of numeric or any other variety of formats, we need to know which ones are important and give us better insights from our dataset.
The preferred visual for this kind of data can vary; here I will show you how to use facet grids for the data. I will be using the Titanic Passenger Data.
As this plot shows us, women and first-class passengers had a higher chance of survival than men who were part of the crew or in the lower classes.
Isn’t that what really happened on the Titanic?
Another way to visualize this kind of data is by trying a multivariate plot. The dataset in use for this plot is the Car Performance and Specifications dataset.
Here we can see how cars with a heavier build are slower than the ones with lighter bodies. Makes sense, right?
When we encounter this kind of data, we’re usually looking for trends or lines that depict numbers. The visual that would suit numeric data best would be a line or a step graph.
Here, we can very clearly see the rise of prices at a local attraction for adults and children. See how easy it is to see the growth at each year interval?
Some of the datasets we encounter are related to stocks. Stock market data is primarily a time series of numeric values, but as a trader or an investor, I would like to understand each date and drop carefully.
One of the most visually captivating charts in this regard is the candlestick chart.
Here, we take the example of Tesla’s stock. Candlestick charts can be used to move across each date and see the lows and highs of the stock individually. This could help us make better investment decisions based on current or past market trends.
As the graph shows us, Tesla’s stock dropped in February 2016. We could now use this information to understand other market conditions and economic situations to make decisions about the stock.
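Each candle in a candlestick chart summarizes one period with four numbers: open, high, low, and close. The sketch below shows how those four values can be derived from raw price ticks; the dates and prices are hypothetical and do not represent Tesla or any real stock, and `to_candles` is an illustrative helper rather than a library function.

```python
# Hypothetical (date, price) ticks in chronological order.
ticks = [
    ("2024-01-02", 150.0), ("2024-01-02", 153.5),
    ("2024-01-02", 146.0), ("2024-01-02", 148.0),
    ("2024-01-03", 148.5), ("2024-01-03", 149.0),
    ("2024-01-03", 141.0), ("2024-01-03", 143.5),
]

def to_candles(ticks):
    """Aggregate chronological ticks into one open/high/low/close dict per date."""
    candles = {}
    for date, price in ticks:
        if date not in candles:
            # The first tick of the day sets all four values.
            candles[date] = {"open": price, "high": price, "low": price, "close": price}
        else:
            candle = candles[date]
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price  # the last tick seen so far is the close
    return candles

candles = to_candles(ticks)
for date, c in candles.items():
    direction = "down" if c["close"] < c["open"] else "up"
    print(date, c, direction)
```

A charting tool draws the candle body between open and close and the wicks out to high and low, so a run of "down" candles is what a drop looks like on the chart.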
When we have data pertaining to specific locations and areas, we use maps to add clarity and meaning to our analysis.
In this example, we can see how countries fared at and after the 2002 World Cup. Germany has scored the maximum number of goals, being one of the most dominant teams in world football ever since.
We are often asked how our stories and visuals help when it’s time to build mathematical models. During all stages of predictive modeling, storytelling can be a vital addition to your analysis.
Let us understand the basic steps involved in creating models out of our data and go through telling stories within them.
The first step of model building is understanding your data. I’ll give you instances and show you how you can explore your data without computing complex statistics.
Let’s consider a dataset on Wine Quality. The structure of the dataset is as follows:
Here, we can see the associated summary statistics of the dataset in use.
So, if we need to see whether there is any correlation between alcohol volumes and wine qualities, how do we do it?
We could compute Pearson’s ‘r’. It would help us in building a model, but it would not help much with analysis.
This shows a very strong correlation between Alcohol content and wine quality. But does it tell you anything else?
Not really. So, what does?
Let’s see how we can visualize these and tell a lot more from them.
First, we’ll begin by seeing how Wine Quality relates to Alcohol content.
Here, we can see that higher alcohol volumes relate to better wine quality, which gives us a better understanding of our data. We can also spot outliers more easily in this view.
Next, you might wonder how the acid content of a wine affects its quality.
This would be one way to visualize the effects of acid. Where the violin plot is wider, more data points fall within that range.
After you generate features, how do you see how well each one predicts?
Graphs tell us how far away our predicted points are from our fitted line.
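The distances such a graph shows are the residuals: the gap between each observed value and the fitted line. The following is a minimal sketch with made-up points and an ordinary least-squares line fitted by hand; `fit_line` and `residuals` are illustrative helpers, not the functions used to produce the article's plots.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Observed value minus fitted value for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up feature values and targets for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
for x, r in zip(xs, res):
    print(f"x={x}: residual={r:+.3f}")
```

Plotting these residuals against the feature shows at a glance whether the errors are small and random or whether the feature misses a pattern.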
Another example where we might have to visualize newly created features is Principal Component Analysis (PCA). If you want to get an in-depth understanding of PCA, you can go through this article.
This is the Iris dataset found in RStudio.
When we run the principal component analysis on this dataset, we find these statistics.
When we plot this, however, we find that the resulting visual is much more informative than the statistics.
Coming to the model creation phase, we usually find the need to understand how our data is being fitted.
This is a model that predicts whether the car should go fast or slow, based on the grade of the road and bumpiness.
As you can see, the decision boundary clearly classifies most of the data but an accuracy of 88.21% doesn’t tell much of a story. Here we can even see how far the misclassified points are from the decision boundary.
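To make the contrast between a single accuracy figure and the picture concrete, the sketch below computes both for a toy classifier. It assumes a straight-line decision boundary with hypothetical weights and made-up (grade, bumpiness) points; the model behind the plot above may use a different kind of boundary entirely.

```python
from math import hypot

# Hypothetical linear boundary: w[0] * grade + w[1] * bumpiness + b = 0.
w, b = (1.0, 1.0), -1.0

def signed_distance(point):
    """Perpendicular distance from the boundary; the sign tells which side."""
    return (w[0] * point[0] + w[1] * point[1] + b) / hypot(*w)

def predict(point):
    """Points on the positive side of the boundary are classified as 'slow'."""
    return "slow" if signed_distance(point) > 0 else "fast"

# Made-up ((grade, bumpiness), true label) samples.
samples = [
    ((0.2, 0.1), "fast"),
    ((0.9, 0.8), "slow"),
    ((0.6, 0.5), "fast"),
    ((0.3, 0.3), "fast"),
    ((0.4, 0.5), "slow"),
]

correct = sum(predict(p) == label for p, label in samples)
accuracy = correct / len(samples)

# For each misclassified point, record how far it sits from the boundary.
misclassified = [
    (p, label, signed_distance(p))
    for p, label in samples
    if predict(p) != label
]

print(f"accuracy = {accuracy:.2f}")
for p, label, d in misclassified:
    print(f"{p} is truly '{label}', distance from boundary = {d:+.3f}")
```

The accuracy alone says how many points were wrong; the distances say whether those errors are near misses hugging the boundary or points deep inside the wrong region, which is what the plot lets you see.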
We can also compare certain algorithms and techniques by looking at their decision boundaries as we did above.
Another example using the Iris dataset is shown below.
Here, there’s not much information to derive valuable insights about our model.
To learn more about Support Vector Machines, you can go through this article.
On the other hand, this plot shows us a clear classification boundary where the Species separate from each other.
Now that you know the scenarios where we can use storytelling to make our point, I will give you a few practical tips for when you take this up on your own.
In this data-driven age, storytelling has evolved beyond the realms of fiction and imagination. Data storytelling has emerged as a powerful tool, bridging the gap between raw numbers and captivating narratives. Mastering this art can unlock new avenues of influence, decision-making, and success.
A. Data storytelling conveys data-driven insights and analysis through a compelling narrative that engages and informs the audience, making data more understandable and actionable.
A. The three key elements of data storytelling are data, visualization, and narrative. Data provides the foundation, visualization aids in comprehension, and a well-crafted narrative contextualizes and communicates the insights.
A. Data Analysis: Extracting insights from data using statistical methods.
Data Storytelling: Communicating insights through engaging narratives and visuals.
Focus: Analysis is about uncovering patterns; storytelling is about communication.
Techniques: Analysis uses statistical methods; storytelling uses narratives and visuals.
Purpose: Analysis uncovers insights; storytelling communicates them effectively.
Audience Engagement: Analysis informs; storytelling engages and influences.
A. An example of data storytelling could be a presentation that uses data and visualizations to explain the impact of a marketing campaign on sales, highlighting the key metrics, trends, and the story of how the campaign led to business growth.
Such an important topic in Data Analysis. Great article - thanks for your insights. Do you have a favourite storyboard tool?
Thanks a lot, David! I've been very loyal to PowerBI by Microsoft, and it's yielded me fabulous results as well :)
Great article. Some very good clues on presenting data.
Thanks a lot!
Very Informative. The Concept of story telling using data visualization has been explained very well
Thank you, Pragati!
Thanks a lot Shantanu! I learnt few things from this.
Thanks a lot, Vineet! :)
Dear Kumar, This is so insightful. Thank you for this masterpiece
Glad I could help, Frederick!
This is a really insightful article; I love the layout, how in-depth it is and the visual aids coupled with some well-explained situations. Thanks for sharing!
Thanks, Brandon! Means a lot :)
Thanks for such a comprehensive article. Great piece information and work.
Thank you! :)
Great article. Well done
Thanks!
Lovely story told:) Enjoyed reading...
Hi I am ba graduate with 5 years exp in bpo. Can I do this course data scientist without it background. Can anyone suggest me
Thank you for this educational experience. It’s more a post. Reading this was a college class. Wow! Thank you.
Thanks a lot, Glenda! Glad you liked it :)
Hello Shantanu, have you published any research papers on data analytics. Please let me know, if you have published any. I will go through them.
Hi Sridevi! Yes I have, you can find it on my LinkedIn that is associated with my author profile. Thanks for your time!
Learnt some new things in a great way. Thanks Shantanu :)
Thanks a lot, Gaurav! :D
This is amazing and it will help me to make good stories out of my data.
Great Insights! Do you conduct workshops also on similar topics.
Nice writeup
Such an important topic in Data Analysis very informative Great article - thanks for your insights.