Important SQL Functions Every Data Analyst Must Know!

Vishesh Arora 01 Aug, 2021 • 5 min read

This article was published as a part of the Data Science Blogathon

Introduction

SQL is one of the most important skills for an Analyst. SQL helps you query your data, which helps you find answers to all your questions and validate hypotheses. In this article, I’ve listed some of the most important functions you should know if you are starting out as an Analyst.

SELECT and FROM

Let us imagine we have a database called ‘PRODUCTS’ which contains information on different products available in a store. What if we want to look at the entire data at once? We can do that using the SELECT statement and FROM statement in SQL. You can use the code below:

SELECT Category, Vendor, Price
FROM PRODUCTS;

You will get the following output:

select SQL Functions

The SELECT statement allows you to select columns from the database. The FROM statement is to indicate which table the data needs to be picked up from.

In case you don’t want to type out individual column names, and want to select all the columns in the table, you can use * to denote that:

SELECT *
FROM PRODUCTS;

This will return all columns present in the PRODUCTS table.

DISTINCT

From the data, it is clear that the category column contains a set of fixed values. What if we want to select all distinct values present in the category column? We can use the DISTINCT statement for that. You can use the code below:

SELECT DISTINCT Category
FROM PRODUCTS;

You will get the following output:

distinct | SQL Functions

WHERE

What if we only want to select only those records when a certain criterion is met? We can use the WHERE statement in such a situation. WHERE acts as a filter that helps us to filter rows basis certain conditions. For instance, if we want to filter rows where the price is more than 60, we can use the code below:

SELECT Category, Vendor, Price, Rating
FROM PRODUCTS
WHERE Price > 60;

You will get the following output:

where | SQL Functions

You can see that the price is greater than 60 in all records.

We can also use multiple conditions (filters) under where statement using the AND clause. For instance, the below code will filter out records where the price is more than 60 and the rating is more than 3.

SELECT Category, Vendor, Price, Rating
FROM PRODUCTS
WHERE Price > 60
AND Rating > 3;

ORDER BY

What if we want to see all records sorted by rating, or by price? The ORDER BY clause in SQL will help us do that. ORDER BY helps us in ordering records based on the values present in a column.

SELECT Category, Vendor, Price, Rating
FROM PRODUCTS
WHERE Price > 50
ORDER BY Rating;

The above code will list all records where the price is more than 50 in increasing order of rating. The output of the above piece of code will look like this:

order by

It will order records in ascending order by default, but we can also arrange records in descending order by specifying DESC at the end of the ORDER BY statement.

SELECT Category, Vendor, Price, Rating
FROM PRODUCTS
WHERE Price >= 50
ORDER BY Rating DESC;

The above code will list out records in decreasing order of rating.

MIN, MAX, and AVG

MIN helps us in identifying the minimum value in a column, MAX in the maximum, and AVG in the average value in a column. Let us use these functions to calculate the minimum, maximum, and average value in the price column. You can use the code below:

SELECT MIN(Price), MAX(Price), AVG(Price)
FROM PRODUCTS;

The output will look like this:

Min max avg

If you want to change the column (field) name and replace it with an alias, you can use the AS clause.

SELECT MIN(Price) AS "Minimum Price", MAX(Price) AS "Maximum Price", AVG(Price) AS "Average Price"
FROM PRODUCTS;

The output will look like this:

GROUP BY and HAVING

The GROUP BY clause in SQL is used to group records that have the same value in a particular column. For example, you can group on the category column in our dataset and see what is the average price in each category. You can use the below code:

SELECT Category, AVG(Price) AS "Average Price"
FROM PRODUCTS
GROUP BY Category;

The output will look something like this:

group by

The GROUP BY clause is often used in combination with aggregate functions like AVG, MIN, MAX, and COUNT. You can also use the HAVING clause in conjunction with the GROUP BY clause to filter records.

SELECT Category, AVG(Price) AS "Average Price"
FROM PRODUCTS
GROUP BY Category
HAVING MIN(Price) > 20;

The HAVING statement at the end will help us filter out categories where the minimum price is less than 20. The output will now contain only those categories where the minimum price is more than 20.

IN

The IN operator helps you give a range of values instead of a single value. It is often used with the WHERE clause. For example, look at the code below:

SELECT Title, Category, Vendor, Price, Rating
FROM PRODUCTS
WHERE Category IN ('Gizmo', 'Gadget')
ORDER BY Price DESC;

The above code will give us all products in the Gizmo and Gadget category and arrange them in decreasing order of price. The output will look like this:

In | SQL Functions

BETWEEN

BETWEEN is a useful operator that helps us filter values between a range. You can specify the minimum and maximum values of the range, and BETWEEN will filter records that lie in between these values. Keep in mind that BETWEEN is inclusive on both ends.

SELECT Title, Category, Vendor, Price
FROM PRODUCTS
WHERE Price BETWEEN 25 AND 50;

The above code will filter out values where the Price value is between 25 and 50. BETWEEN is often used with dates to filter records within a certain date range.

EndNote

In this article, we learnt about the most commonly used SQL functions and operators that are important for every Analyst. This article is written by Vishesh Arora. You can connect with me on LinkedIn.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
Vishesh Arora 01 Aug 2021

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers

Clear