How to Create Waterfall Charts with Matplotlib and Plotly?

Kashish Rastogi 29 Aug, 2023 • 6 min read

Introduction

The waterfall chart, also known as a floating bricks or flying bricks chart, is a two-dimensional visualization used to analyze incremental positive and negative changes across time or across multiple steps. As Anthony T. Hincks humorously notes, waterfalls can take on diverse forms. In this article, we look at why waterfall charts are useful and show how to create them with Matplotlib and Plotly.

 This article was published as a part of the Data Science Blogathon.

What is a Waterfall Chart?

The waterfall chart is frequently used in financial analysis to understand the positive and negative effects of multiple factors on a particular asset. The chart can show these effects either over time or by category. Category-based charts show the gain or loss for expenses, sales, or any other variable with a sequence of positive and negative values. Time-based charts show the gain or loss over a time period.

Waterfall charts usually read horizontally, from left to right. The first column typically starts from the horizontal axis, and the following values appear as floating columns that represent positive or negative changes. Sometimes the columns are connected with lines.

Waterfall Chart
Source: Wikipedia
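To make the "floating" idea concrete, here is a minimal sketch in plain Python that computes where each bar of a waterfall chart starts and ends, given a list of changes. Each bar spans from the running total before the change to the running total after it. The helper name `floating_bars` is hypothetical and not part of any charting library.

```python
from typing import List, Tuple


def floating_bars(changes: List[float], start: float = 0.0) -> List[Tuple[float, float]]:
    """Return (bottom, top) for each bar of a waterfall chart.

    Hypothetical helper: each bar spans the running total before the
    change and the running total after it.
    """
    bars = []
    running = start
    for change in changes:
        new_total = running + change
        # A positive change rises from the old total; a negative one falls to the new total
        bars.append((min(running, new_total), max(running, new_total)))
        running = new_total
    return bars


if __name__ == "__main__":
    # First three days of the weekly sales changes used later in this article
    for day, bar in zip(["mon", "tue", "wed"], floating_bars([10, -30, -7.5])):
        print(day, bar)
```

A charting library does this bookkeeping for you; the sketch only shows why the bars appear to float.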

Need for a Waterfall Chart

Building a waterfall chart is not difficult; the more important question is when and where to use one, so let’s look at an example. We will use some dummy data and a Kaggle dataset to build waterfall charts.

Example

If I give you two things, a styled pandas table (not a plain one) and a waterfall chart, which one is easier to read?

This table shows one week of sales data. I used a seaborn color palette with pandas’ background_gradient styling to shade the cells like a heatmap.

import pandas as pd
import seaborn as sns
# data: daily change in sales for one week
a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
df2 = pd.DataFrame({'week': a, 'values': b})
# table: shade the numeric column with a green gradient (background_gradient needs matplotlib)
cm = sns.light_palette("green", as_cmap=True)
df2.style.background_gradient(cmap=cm)
Waterfall Chart data

Now, look at the table and waterfall chart side by side.

Waterfall Chart weekly sales

The table shows the values in order, but it is hard to read the overall trend from it. In the waterfall chart, on the other hand, you can see at a glance that the yellow bars show decreases and the red bars show increases.

Waterfall chart with Plotly

The data we are going to use is the Netflix Movies and TV Shows dataset from Kaggle; the notebook can be found here.

We are going to use Plotly, an open source charting library.

Importing the library

import pandas as pd
import plotly.graph_objects as go

Dataset

df = pd.read_csv(r'D:/netflix_titles.csv')

Converting date_added into a proper datetime format and extracting the year and month:

# some date strings have stray leading spaces, so strip them before parsing
df["date_added"] = pd.to_datetime(df['date_added'].str.strip())
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
df.head(3)
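In the Netflix data, date_added holds strings such as "September 25, 2021". As a rough, stdlib-only illustration of what the conversion above does, assuming that "Month day, Year" format and that some entries carry stray spaces, each string can be stripped and parsed with `datetime.strptime`. The helper name `year_and_month` is hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple


def year_and_month(date_added: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a 'Month day, Year' string and return (year, month).

    Hypothetical helper. Returns None for missing or blank values,
    roughly like pd.to_datetime producing NaT.
    """
    if date_added is None or not date_added.strip():
        return None
    # strip() removes stray leading/trailing spaces before parsing
    parsed = datetime.strptime(date_added.strip(), "%B %d, %Y")
    return parsed.year, parsed.month


if __name__ == "__main__":
    for raw in ["September 25, 2021", " August 4, 2017", None]:
        print(repr(raw), "->", year_and_month(raw))
```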

Let’s prepare the data

d2 = df[df["type"] == "Movie"]
col = "year_added"
# movies per year; rename_axis + reset_index(name=...) gives columns [year_added, count] in pandas 1.x and 2.x
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)
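The pandas chain above counts titles per year, converts the counts to percentages of the total, and sorts by year. The same logic in plain Python with `collections.Counter` might look like the sketch below. The function name `counts_by_year` is hypothetical, and it assumes missing years have already been dropped (pandas’ value_counts drops them for you).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def counts_by_year(years: Iterable[int]) -> List[Tuple[int, int, float]]:
    """Return (year, count, percent) rows sorted by year.

    Assumes missing values were removed beforehand.
    """
    counts = Counter(years)
    total = sum(counts.values())
    # Percent of all titles, like vc2['percent'] in the pandas version
    return [(year, n, 100 * n / total) for year, n in sorted(counts.items())]


if __name__ == "__main__":
    for row in counts_by_year([2019, 2020, 2019, 2021]):
        print(row)
```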

Now we will make a waterfall chart of the movies added over the years, using Plotly’s go.Waterfall() trace.

fig2 = go.Figure(go.Waterfall(
    name = "Movie", orientation = "v", 
    x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
    textposition = "auto",
    text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
    y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
    connector = {"line":{"color":"#b20710"}},
    increasing = {"marker":{"color":"#b20710"}},
    decreasing = {"marker":{"color":"orange"}},
))
Waterfall chart with Plotly
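In the code above, the x, y, and text lists are typed in by hand. If you instead want each bar to show the change from the previous year, so that the running total at each year equals that year’s count, the lists can be derived from the sorted counts. This is a sketch under that assumption, not the author’s method; `waterfall_inputs` is a hypothetical helper whose output is meant to be passed to go.Waterfall(x=..., y=..., text=...).

```python
from typing import Dict, List, Sequence


def waterfall_inputs(years: Sequence[int], counts: Sequence[int]) -> Dict[str, List]:
    """Build x, y and text lists for a year-over-year waterfall.

    The first bar is the first year's count; every later bar is the
    change from the previous year, so the running total tracks the counts.
    """
    if len(years) != len(counts):
        raise ValueError("years and counts must have the same length")
    deltas: List[int] = []
    previous = 0
    for count in counts:
        # For the first year previous is 0, so the bar equals the count itself
        deltas.append(count - previous)
        previous = count
    return {
        "x": [str(year) for year in years],
        "y": deltas,
        # Signed labels make the direction of each change explicit
        "text": [f"{d:+d}" for d in deltas],
    }


if __name__ == "__main__":
    print(waterfall_inputs([2008, 2009, 2010], [1, 2, 1]))
```

With pandas, vc2['count'].diff() produces the same differences for every year after the first.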

Let’s go through each parameter one by one:

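  • name: The trace name, shown in the legend and hover labels
  • orientation: "v" for vertical bars, "h" for horizontal bars
  • x: The values which are going to be on the x-axis (here, the years)
  • y: The height of each bar; by default each value is a relative change added to the running total
  • text: The labels displayed on the bars
  • textposition: Where to put the labels: "inside" or "outside" the bars, "auto" to let Plotly decide, or "none"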
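By default, Plotly treats every y value as a relative change. go.Waterfall also accepts a `measure` list whose entries can be "relative", "absolute", or "total"; Plotly computes the height of a "total" bar from the running sum, so a placeholder such as 0 is passed for its y entry. The hypothetical helper below appends a closing total bar to existing x and y lists; its result is meant to be passed as go.Waterfall(x=..., y=..., measure=...).

```python
from typing import Dict, List, Sequence


def with_total(x: Sequence[str], y: Sequence[float], label: str = "Total") -> Dict[str, List]:
    """Return x, y and measure lists with a final 'total' bar appended.

    Hypothetical helper for building go.Waterfall arguments.
    """
    return {
        "x": list(x) + [label],
        # Placeholder 0: Plotly computes the total bar from the running sum
        "y": list(y) + [0],
        "measure": ["relative"] * len(y) + ["total"],
    }


if __name__ == "__main__":
    print(with_total(["mon", "tue", "wed"], [10, -30, -7.5]))
```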

To make the chart more elegant, we will also color the bars and the connector lines. Increasing bars are red (#b20710) and decreasing bars are orange.

The styling parameters:

  • connector: Sets the color of the connector lines
  • increasing: Sets the color of the increasing bars
  • decreasing: Sets the color of the decreasing bars
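If you build several waterfall charts with the same look, one option is to keep these styling arguments in a dictionary and unpack it into each trace, for example go.Waterfall(x=..., y=..., **WATERFALL_STYLE). The sketch below is an assumption about how you might organize your own code, not part of Plotly; WATERFALL_STYLE and style_for are hypothetical names. It also sets `totals`, the Plotly attribute that styles bars whose measure is "total" (the grey used here is an arbitrary choice).

```python
import copy
from typing import Any, Dict

# Colors from the chart above; "totals" only affects bars with measure="total"
WATERFALL_STYLE: Dict[str, Any] = {
    "connector": {"line": {"color": "#b20710"}},
    "increasing": {"marker": {"color": "#b20710"}},
    "decreasing": {"marker": {"color": "orange"}},
    "totals": {"marker": {"color": "#8a8d93"}},
}


def style_for(increasing_color: str, decreasing_color: str) -> Dict[str, Any]:
    """Return a copy of the shared style with different bar colors."""
    style = copy.deepcopy(WATERFALL_STYLE)  # keep the shared dict untouched
    style["increasing"]["marker"]["color"] = increasing_color
    style["decreasing"]["marker"]["color"] = decreasing_color
    return style


if __name__ == "__main__":
    print(style_for("green", "red"))
```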

The chart already looks good, but let’s make it more attractive:

fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Watching Movies over the year', height=350,
                   margin=dict(t=80, b=20, l=50, r=50),
                   hovermode="x unified",
                   xaxis_title=' ', yaxis_title=" ",
                   plot_bgcolor='#333', paper_bgcolor='#333',
                   title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                   font=dict(color='#8a8d93'))

Now the chart looks much cleaner.

Let’s look at the parameters now.

  • title: Title of the chart
  • height: Height of the chart in pixels
  • margin: Margins around the plot area: top, bottom, left, right
  • hovermode="x unified": Shows a single hover label for everything at the same x position
  • plot_bgcolor: Background color of the plotting area
  • paper_bgcolor: Background color of the whole figure
  • font: Font properties for the chart text
  • title_font: Font properties for the title
  • I hid the y-axis with update_yaxes(visible=False) and removed the gridlines with showgrid=False.

The full code:

import pandas as pd
import plotly.graph_objects as go
# load the data and extract the year each title was added
df = pd.read_csv(r'D:/netflix_titles.csv')
df["date_added"] = pd.to_datetime(df['date_added'].str.strip())
df['year_added'] = df['date_added'].dt.year
# movies per year, with percentages, sorted by year
d2 = df[df["type"] == "Movie"]
col = "year_added"
vc2 = d2[col].value_counts().rename_axis(col).reset_index(name="count")
vc2['percent'] = 100 * vc2['count'] / vc2['count'].sum()
vc2 = vc2.sort_values(col)
fig2 = go.Figure(go.Waterfall(
    name = "Movie", orientation = "v", 
    x = ["2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020", "2021"],
    textposition = "auto",
    text = ["1", "2", "1", "13", "3", "6", "14", "48", "204", "743", "1121", "1366", "1228", "84"],
    y = [1, 2, -1, 13, -3, 6, 14, 48, 204, 743, 1121, 1366, -1228, -84],
    connector = {"line":{"color":"#b20710"}},
    increasing = {"marker":{"color":"#b20710"}},
    decreasing = {"marker":{"color":"orange"}},
))
fig2.update_xaxes(showgrid=False)
fig2.update_yaxes(showgrid=False, visible=False)
fig2.update_traces(hovertemplate=None)
fig2.update_layout(title='Watching Movies over the year', height=350,
                   margin=dict(t=80, b=20, l=50, r=50),
                   hovermode="x unified",
                   xaxis_title=' ', yaxis_title=" ",
                   plot_bgcolor='#333', paper_bgcolor='#333',
                   title_font=dict(size=25, color='#8a8d93', family="Lato, sans-serif"),
                   font=dict(color='#8a8d93'))
fig2.show()

Waterfall chart with Matplotlib

Installing the waterfallcharts library using pip:

!pip install waterfallcharts

Importing the libraries:

import pandas as pd
import waterfall_chart
import matplotlib.pyplot as plt
%matplotlib inline

Let’s plot a waterfall chart for the weekly sales data:

a = ['mon','tue','wed','thu','fri','sat','sun']
b = [10,-30,-7.5,-25,95,-7,45]
waterfall_chart.plot(a, b);

 

Waterfall Chart in Matplotlib

If we look closely at the chart, the bars with positive values are green, the bars with negative values are red, and the total is blue by default.
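The blue bar is the net of all the changes, that is, the running total after the last day. A quick stdlib check of that arithmetic for the weekly data, as a sketch using `itertools.accumulate`:

```python
from itertools import accumulate

days = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
sales_change = [10, -30, -7.5, -25, 95, -7, 45]

# Running total after each day: one edge of each floating bar
running_totals = list(accumulate(sales_change))
# The net (total) bar equals the final running total
net = running_totals[-1]

for day, total in zip(days, running_totals):
    print(f"{day}: {total:+.1f}")
print(f"net: {net:+.1f}")
```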

Adding some parameters to the chart

waterfall_chart.plot(a, b, net_label='Total', rotation_value=360)

Parameters of the chart:

  • net_label: Sets the label of the last bar, which shows the net total
  • rotation_value: Rotation angle, in degrees, of the x-axis tick labels (360 is a full turn, so the labels stay horizontal)

Conclusion

In conclusion, the waterfall chart is invaluable in understanding the intricate dynamics of incremental changes. Its ability to visually represent positive and negative shifts over time or steps offers clarity in various scenarios. Whether tracking financial performance or analyzing project progress, the waterfall chart brings insights to the forefront. As you delve deeper into data visualization, consider expanding your skills with our BlackBelt program. This advanced program empowers you to master waterfall charts and many other data visualization techniques, enhancing your proficiency in analytics.

Frequently Asked Questions

Q1. What is a waterfall chart used for? 

A. A waterfall chart visualizes incremental changes in a total value, displaying the impact of positive and negative contributions over time or steps. It helps in understanding how different factors contribute to the final result.

Q2. What is a waterfall chart example?

A. An example of a waterfall chart could be tracking a company’s annual profit. It would show the initial profit, followed by positive factors like increased sales and cost reductions, and negative factors like expenses, resulting in the final profit.

Q3. Is a waterfall chart available in Excel? 

A. Yes. Excel 2016 and later include a built-in waterfall chart type, which makes Excel a popular tool for this kind of visualization and lets users display the cumulative effect of a series of values.

Q4. What is a waterfall chart in data visualization? 

A. In data visualization, a waterfall chart represents the cumulative impact of sequentially introduced positive and negative values on an initial point. It’s used to depict how different factors contribute to an outcome, making it easier to comprehend complex changes.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
