Skewness and Kurtosis: Quick Guide (Updated 2024)

Suvarna Gawali 07 Feb, 2024 • 7 min read

Introduction

“Skewness essentially is a commonly used measure in descriptive statistics that characterizes the asymmetry of a data distribution, while kurtosis determines the heaviness of the distribution tails.”

Understanding the shape of data is crucial when practicing data science. It helps you see where most of the information lies and analyze the outliers in a given dataset. In this article, we’ll learn about the shape of data, the importance of skewness and kurtosis in statistics, the types of skewness and kurtosis, and how to analyze the shape of a given dataset. Let’s first understand what skewness and kurtosis are.

Learning Objectives

  • In this article, you will learn about Skewness and its different types.
  • You will learn how to calculate the Skewness Coefficient.
  • This article will also help you learn about Kurtosis and its types.

What Is Skewness?

Skewness is a statistical measure that assesses the asymmetry of a probability distribution. It quantifies the extent to which the data is skewed or shifted to one side.

Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side. Skewness helps in understanding the shape and outliers in a dataset.

Depending on the model, skewness in the values of a specific independent variable (feature) may violate model assumptions or diminish the interpretation of feature importance.

A distribution exhibits skewness when it deviates from the symmetrical shape of the normal distribution (bell curve). Skewness is the statistical measure of that asymmetry.

In a skewed data set, typical values fall between the first quartile (Q1) and the third quartile (Q3).

The normal distribution is a useful reference point for skewness. In a normal distribution, the data are symmetrically distributed. A symmetrical distribution has zero skewness, and all measures of central tendency (mean, median, and mode) lie in the middle.


In a symmetrically distributed dataset, the left-hand side and the right-hand side have an equal number of observations. (If the dataset has 90 values, then the left-hand side has 45 observations and the right-hand side has 45 observations.) But what if the data is not symmetrically distributed? Such data is called asymmetrical data, and that is where skewness comes into the picture.

Types of Skewness

Positively Skewed or Right-Skewed (Positive Skewness)

In statistics, a positively skewed or right-skewed distribution has a long right tail. Its measures of central tendency are spread apart, unlike symmetrically distributed data, where the mean, median, and mode equal each other. The name refers to the sign of the skewness value, which is positive; the mean, median, and mode themselves do not have to be positive.


In a positively skewed distribution, the mean of the data is greater than the median because a few large values in the right tail pull the mean upward. In other words, most of the results are concentrated toward the lower side. The median is the middle value and the mode is the most frequent value, so neither is pulled by the tail as strongly as the mean; typically the mode is less than the median, which is less than the mean.

Extreme positive skewness is often undesirable, as a high level of skewness can cause misleading results. Data transformations can bring skewed data closer to a normal distribution. For positively skewed distributions, the best-known transformation is the log transformation, which replaces each value in the dataset with its natural logarithm (this requires all values to be positive).
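To make the effect of the log transformation concrete, here is a minimal sketch using only Python’s standard library. The sample values and the helper name moment_skewness are made up for illustration, and the skewness used here is the moment-based (population) version, which is an assumption since the article does not fix a formula at this point.

```python
import math
import statistics


def moment_skewness(values):
    """Population (moment-based) skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


# Hypothetical right-skewed sample; every value is strictly positive.
incomes = [12, 15, 18, 20, 22, 25, 30, 40, 75, 200]

# Natural-log transform; math.log raises ValueError for zero or negative input.
log_incomes = [math.log(x) for x in incomes]

skew_before = moment_skewness(incomes)
skew_after = moment_skewness(log_incomes)

print(f"skewness before log transform: {skew_before:.2f}")
print(f"skewness after log transform:  {skew_after:.2f}")
```

On this sample the skewness stays positive after the transformation but is noticeably smaller. If the data contains zeros, a shifted variant such as log(x + 1) is a common workaround.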

Negatively Skewed or Left-Skewed (Negative Skewness)

A distribution with a long left tail, known as negatively skewed or left-skewed, is the mirror image of a positively skewed distribution. In statistics, a negatively skewed distribution is one where more values are plotted on the right side of the graph and the tail of the distribution spreads out on the left side.

In a negatively skewed distribution, the mean of the data is less than the median because a few small values in the left tail pull the mean downward, while most of the data sits on the right-hand side. The name refers to the sign of the skewness value, which is negative; the mean, median, and mode themselves do not have to be negative.


Median is the middle value, and mode is the most frequent value. Due to an unbalanced distribution, the median will be higher than the mean.

How to Calculate the Skewness Coefficient?

Skewness can be calculated in several ways. Pearson’s coefficients are among the simplest and most commonly taught.

Pearson’s first coefficient of skewness
To calculate skewness values, subtract the mode from the mean, and then divide the difference by standard deviation.

               Sk1  =  (Mean – Mode) / Standard deviation

Pearson’s coefficients of skewness should not be confused with Pearson’s correlation coefficient. The correlation coefficient ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 indicating no linear relationship, because the covariance is divided by the product of the standard deviations. The skewness coefficients are also scaled by the standard deviation, which makes them unit-free, but they are not confined to the range of -1 to +1.

Pearson’s first coefficient of skewness is useful when the data has a strong, well-defined mode. However, if the data has a weak mode or multiple modes, it is preferable not to use Pearson’s first coefficient. Pearson’s second coefficient may be the better choice in that case, as it does not depend on the mode.

Pearson’s second coefficient of skewness
To calculate it, subtract the median from the mean, multiply the difference by 3, and divide the product by the standard deviation.

               Sk2  =  3 × (Mean – Median) / Standard deviation
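Both of Pearson’s coefficients can be sketched in a few lines with Python’s statistics module. The function names and the sample data below are illustrative, and the sample standard deviation (statistics.stdev) is an assumption, since the article does not say whether the sample or the population version is intended.

```python
import statistics


def pearson_first_skewness(values):
    """(mean - mode) / standard deviation. Assumes one clear mode."""
    mean = statistics.fmean(values)
    # statistics.mode returns the first mode encountered if there are ties.
    mode = statistics.mode(values)
    return (mean - mode) / statistics.stdev(values)


def pearson_second_skewness(values):
    """3 * (mean - median) / standard deviation."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


# Hypothetical right-skewed sample with a single clear mode (2).
data = [1, 2, 2, 2, 3, 3, 4, 5, 7, 11]

print(f"Pearson's first coefficient:  {pearson_first_skewness(data):.3f}")
print(f"Pearson's second coefficient: {pearson_second_skewness(data):.3f}")
```

Both coefficients come out positive for this sample, which matches its long right tail. The two values differ because they use different reference points (mode versus median), so they should not be compared against each other directly.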

Rule of thumb:

  • For skewness values between -0.5 and 0.5, the data exhibit approximate symmetry.
  • Skewness values within the range of -1 and -0.5 (negative skewed) or 0.5 and 1 (positive skewed) indicate slightly skewed data distributions.
  • Data with skewness values less than -1 (negative skewed) or greater than 1 (positive skewed) are considered highly skewed.
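The rule of thumb can be written as a small helper. This is only a sketch: the function name classify_skewness and the label strings are made up for illustration, and treating the boundary values (exactly ±0.5 and ±1) as belonging to the milder category is an assumption, because the rule itself does not say.

```python
def classify_skewness(skewness):
    """Label a skewness value using the rule-of-thumb thresholds."""
    direction = "positive" if skewness > 0 else "negative"
    if -0.5 <= skewness <= 0.5:
        return "approximately symmetric"
    if -1 <= skewness <= 1:
        return f"slightly skewed ({direction})"
    return f"highly skewed ({direction})"


for value in (0.2, -0.8, 2.2):
    print(value, "->", classify_skewness(value))
```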

What Is Kurtosis?

Kurtosis is a statistical measure that quantifies the shape of a probability distribution. It provides information about the tails and peakedness of the distribution compared to a normal distribution.

Positive excess kurtosis (kurtosis above that of the normal distribution) indicates heavier tails and a more peaked distribution, while negative excess kurtosis suggests lighter tails and a flatter distribution. Kurtosis helps in analyzing the characteristics and outliers of a dataset.

The measure of Kurtosis refers to the tailedness of a distribution. Tailedness refers to how often the outliers occur.

Peakedness in a data distribution is the degree to which data values are concentrated around the mean. Datasets with high kurtosis tend to have a distinct peak near the mean, decline rapidly, and have heavy tails. Datasets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

In finance, kurtosis is used as a measure of financial risk. A large kurtosis is associated with a high level of risk for an investment because it indicates that there are high probabilities of extremely large and extremely small returns. On the other hand, a small kurtosis signals a moderate level of risk because the probabilities of extreme returns are relatively low.

What Is Excess Kurtosis?

In statistics and probability theory, researchers use excess kurtosis to compare the kurtosis coefficient with that of a normal distribution. Excess kurtosis can be positive (leptokurtic distribution), negative (platykurtic distribution), or near zero (mesokurtic distribution). Since normal distributions have a kurtosis of 3, excess kurtosis is calculated by subtracting 3 from the kurtosis.

               Excess kurtosis  =  Kurt – 3
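As a rough illustration of the formula above, the sketch below computes kurtosis as the mean of the fourth powers of the z-scores (the population, moment-based definition, which is an assumption here) and then subtracts 3. The function names and both samples are hypothetical.

```python
import statistics


def kurtosis(values):
    """Population kurtosis: the mean of the fourth powers of the z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values)


def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution (Kurt - 3)."""
    return kurtosis(values) - 3


# Evenly spread values: no outliers, so the tails are light.
flat = list(range(1, 11))

# Mostly identical values with two extreme observations: heavy tails.
spiky = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(f"flat:  excess kurtosis = {excess_kurtosis(flat):.2f}")
print(f"spiky: excess kurtosis = {excess_kurtosis(spiky):.2f}")
```

The evenly spread sample gives a negative excess kurtosis and the sample with two extreme values gives a positive one. Statistical libraries often apply a small-sample bias correction, so their results can differ slightly from this plain moment-based version.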

Types of Excess Kurtosis

  1. Leptokurtic or heavy-tailed distribution (kurtosis more than normal distribution).
  2. Mesokurtic (kurtosis same as the normal distribution).
  3. Platykurtic or short-tailed distribution (kurtosis less than normal distribution).

Leptokurtic (Kurtosis > 3)

A leptokurtic distribution has very long and thick tails, which means there are more chances of outliers. Positive values of excess kurtosis indicate that the distribution is peaked and possesses thick tails. Extremely positive excess kurtosis indicates a distribution where extreme values in the tails occur far more often than they would under a normal distribution.

Platykurtic (Kurtosis < 3)

A platykurtic distribution has thin tails and is stretched around the center, so extreme values are rare and most data points lie within a moderate distance of the mean. A platykurtic distribution is flatter (less peaked) when compared with the normal distribution.

Mesokurtic (Kurtosis = 3)

A mesokurtic distribution has the same kurtosis as the normal distribution, which means its excess kurtosis is near 0. Mesokurtic distributions are moderate in breadth, and their curves have a medium peak height.
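The three types can be told apart with a simple comparison against 3. The sketch below is illustrative only: the function name classify_kurtosis is made up, and the tolerance that decides what counts as “near” the normal distribution is an arbitrary assumption, not a standard threshold.

```python
def classify_kurtosis(kurt, tolerance=0.1):
    """Label a kurtosis value by comparing it with the normal value of 3.

    tolerance is a hypothetical cut-off for "close enough to 3".
    """
    excess = kurt - 3
    if excess > tolerance:
        return "leptokurtic"
    if excess < -tolerance:
        return "platykurtic"
    return "mesokurtic"


# Theoretical kurtosis values of three well-known distributions.
examples = {"uniform": 1.8, "normal": 3.0, "Laplace": 6.0}

for name, kurt in examples.items():
    print(f"{name}: kurtosis {kurt} -> {classify_kurtosis(kurt)}")
```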

Conclusion

Skewness and Kurtosis naturally complement each other in analyzing data distributions. Skewness, which measures the symmetry or asymmetry of a data distribution, helps us understand whether the data is stretched towards one side or the other. For instance, positive skewness indicates a distribution whose tail extends towards the right side, while negative skewness implies a tail that extends towards the left side. Kurtosis, on the other hand, helps determine whether the data exhibits a heavy-tailed or light-tailed distribution. By incorporating both Skewness and Kurtosis into our analysis, we gain a more comprehensive understanding of the shape and characteristics of the data.

Skewed data may cause the tail region to act as an outlier for the statistical model, and such outliers can adversely impact the performance of the model, particularly in regression-based models. Some models, such as tree-based models, are robust to outliers, but relying only on them limits the possibility of trying other models. It is therefore often necessary to transform skewed data so that it is close enough to a normal distribution.

Key Takeaways

  • Skewness is a statistical measure of the asymmetry of a probability distribution. It characterizes the extent to which the distribution of a set of values deviates from a normal distribution.
  • Skewness between -0.5 and 0.5 indicates approximately symmetrical data.
  • Kurtosis determines whether the data exhibits a heavy-tailed or light-tailed distribution.
  • Data sets with high kurtosis have heavy tails and more outliers, while data sets with low kurtosis tend to have light tails and fewer outliers.
  • Excess kurtosis can be positive (Leptokurtic distribution), negative (Platykurtic distribution), or near zero (Mesokurtic distribution).

Frequently Asked Questions

Q1. Is kurtosis a measure of shape?

A. Yes. Kurtosis describes the shape of a distribution’s tails in relation to its overall shape. High kurtosis is associated with heavy tails and a sharper peak, while low kurtosis is associated with light tails and a flatter top.

Q2. What does negative kurtosis indicate?

A. A distribution with a negative excess kurtosis value (kurtosis below 3) has lighter tails than the normal distribution.

Q3. What is the shape of a data distribution?

A. A distribution of data item values may be symmetrical or asymmetrical. Two common examples of symmetry and asymmetry are the ‘normal distribution’ and the ‘skewed distribution.’

The media shown in this article on Skewness and Kurtosis are not owned by Analytics Vidhya and are used at the Author’s discretion.
