How much Mathematics do you need to know for Machine Learning?

KAVITA MALI 31 Mar, 2023 • 8 min read

This article was published as a part of the Data Science Blogathon


No matter how long-running a love-hate relationship you have with maths, understanding its core concepts is essential for designing Machine Learning Models and making strategic decisions. Mathematics for Machine Learning is a prerequisite for building a career in Data Science and AI, so embracing its concepts and implementing them in your future work is crucial.

Machine learning is built on mathematics, which in turn helps create ML algorithms that learn from the data provided to form accurate predictions. The prediction might be as simple as classifying cats or dogs in a given set of images, or deciding what kind of products to recommend to a customer based on past purchases. Having a proper understanding of the mathematics behind ML algorithms will help you choose the right algorithms for your projects in data science and machine learning.

Once you understand why the maths is used, you’ll find it more interesting. You’ll also understand why we pick one machine learning algorithm over another and how that choice affects the performance of the model.

We will try to cover the following points in this blog post:

  • Vectors and Vector Spaces
  • Linear Transformations and Matrices

Vectors and Vector Spaces

The ability to visualize data is one of the most useful skills to possess as a data science professional, and a solid foundation in linear algebra enables one to do that. Some concepts and algorithms are quite easy to understand if one can visualize them as vectors and matrices, rather than looking at the data as lists and arrays of numbers.

Linear Algebra is the workhorse of Data Science and ML. While training a machine learning model using a library (such as in R or Python), much of what happens behind the scenes is a bunch of matrix operations. The most popular deep learning library today, TensorFlow, is essentially an optimized (i.e. fast and reliable) matrix manipulation library, and so is scikit-learn, the Python library for machine learning.

Vectors

A vector is an object having both magnitude and direction. Vectors are usually represented in two ways – as ordered lists, such as $x = [x_1, x_2, \dots, x_n]$, or using the ‘hat’ notation, such as $x = x_1\hat{i} + x_2\hat{j} + x_3\hat{k}$, where $\hat{i}$, $\hat{j}$, $\hat{k}$ represent the three perpendicular directions (or axes).

The number of elements in a vector is the dimensionality of the vector. For example, $x = [x_1, x_2]$ is a two-dimensional (2-D) vector, $x = [x_1, x_2, x_3]$ is a 3-D vector, and so on.

The magnitude of a vector is the distance of its tip from the origin. For an n-dimensional vector $x = [x_1, x_2, \dots, x_n]$, the magnitude is given by,

$$\|x\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$$

A unit vector is one whose distance from the origin is exactly 1 unit. E.g. the vectors $\hat{i}$, $\hat{j}$, and $\frac{\hat{i}}{\sqrt{2}} + \frac{\hat{j}}{\sqrt{2}}$ are unit vectors.
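The magnitude and unit-vector definitions are easy to check in code. Here is a short NumPy sketch (the vector values are illustrative):

```python
import numpy as np

# An illustrative 3-D vector
x = np.array([3.0, 4.0, 12.0])

# Magnitude: sqrt(3^2 + 4^2 + 12^2) = sqrt(169) = 13
magnitude = np.linalg.norm(x)

# Dividing a vector by its magnitude produces a unit vector
x_unit = x / magnitude
```

`np.linalg.norm` computes the Euclidean (L2) norm by default, which matches the magnitude formula above.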

Vector Operations 

1. Vector Addition/Subtraction:  It is the element-wise sum/difference of two vectors.
Mathematically,

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \pm \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 \pm y_1 \\ x_2 \pm y_2 \\ \vdots \\ x_n \pm y_n \end{bmatrix}$$

2. Scalar Multiplication/Division: It is the element-wise multiplication/division of a vector by a scalar value.
Mathematically,

$$a \cdot \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a x_1 \\ a x_2 \\ \vdots \\ a x_n \end{bmatrix}$$

3. Vector Multiplication or Dot Product: It is the sum of the element-wise products of two vectors. The dot product of two vectors returns a scalar quantity. Mathematically,

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$

Geometrically,

$$\vec{x} \cdot \vec{y} = \|\vec{x}\| \, \|\vec{y}\| \cos\theta$$

where $\theta$ is the angle between the two vectors.

The dot product of two perpendicular vectors (also called orthogonal vectors) is 0. The dot product can be used to compute the angle between two vectors using the formula,

$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$

This simple property of the dot product is extensively used in data science applications.
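The vector operations above, and the angle formula, can be sketched in a few lines of NumPy (the vectors below are illustrative):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

v_sum  = x + y          # element-wise sum: [7, 7]
scaled = 2 * x          # scalar multiplication: [6, 8]
dot    = np.dot(x, y)   # 3*4 + 4*3 = 24, a scalar

# Angle between the vectors from the dot-product formula
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))

# Perpendicular (orthogonal) vectors have a dot product of 0
perp_dot = np.dot(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In practice, this is exactly how cosine similarity between feature vectors is computed in data science applications.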

Vector Spaces

  1. Basis Vector: A basis of a vector space V is defined as a subset $(v_1, v_2, \dots, v_n)$ of vectors in V that are linearly independent and span V. Consequently, if $(v_1, v_2, \dots, v_n)$ is a list of vectors in V, then these vectors form a basis if and only if every $v$ in V can be uniquely written as $v = a_1 v_1 + a_2 v_2 + \dots + a_n v_n$.
  2. Span: The span of two or more vectors is the set of all possible vectors that one can get by changing the scalars and adding them.
  3.  Linear Combination: The linear combination of two vectors is the sum of the scaled vectors.
  4. Linearly Dependent: A set of vectors is called linearly dependent if any one or more of the vectors can be expressed as a linear combination of the other vectors.
  5. Linearly Independent: If none of the vectors in a set can be expressed as a linear combination of the other vectors, the vectors are called linearly independent.
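Linear dependence can be checked numerically: stack the vectors as rows of a matrix and compare its rank to the number of vectors. A NumPy sketch with illustrative vectors:

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([2.0, 3.0, 0.0])  # v3 = 2*v1 + 3*v2, so the set is dependent

M = np.vstack([v1, v2, v3])     # one vector per row

# The vectors are linearly independent iff the rank equals their count
rank = np.linalg.matrix_rank(M)
independent = (rank == M.shape[0])
```

Here the rank is 2 rather than 3, confirming that one vector is a linear combination of the others.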

Linear Transformations and Matrices

Matrices are a time-tested and powerful data structure used to perform numerical computations. Briefly, a matrix is a collection of values stored as rows and columns, i.e.

$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

Matrices:

  1. Rows: Rows are horizontal. The matrix A has m rows. Each row itself is a vector, so they are also called row vectors.
  2. Columns: Columns are vertical. The matrix A has n columns. Each column itself is a vector, so they are also called column vectors.
  3. Entries: Entries are the individual values in a matrix. For a given matrix A, the value in row i and column j is represented as $A_{ij}$.
  4. Dimensions: The number of rows and columns. For m rows and n columns, the dimensions are (m × n).
  5. Square Matrices: These are matrices where the number of rows is equal to the number of columns, i.e m = n.
  6. Diagonal Matrices: These are square matrices where all the off-diagonal elements are zero, i.e.,
$$\begin{bmatrix} x_{11} & 0 & \cdots & 0 \\ 0 & x_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{nn} \end{bmatrix}$$
  7. Identity Matrices: These are diagonal matrices where all the diagonal elements are 1, i.e.,
$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
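Diagonal and identity matrices map directly onto NumPy helpers; a brief sketch with illustrative values:

```python
import numpy as np

D = np.diag([2.0, 5.0, 7.0])  # diagonal matrix: off-diagonal entries are 0
I = np.eye(3)                 # 3x3 identity matrix: 1s on the diagonal

# Multiplying by the identity leaves a matrix unchanged
same = I @ D
```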

Matrix Operations

  1. Matrix Addition/Subtraction: It is the element-wise sum/difference of two matrices. Mathematically,
$$\begin{bmatrix} x_{11} & \cdots & x_{1n} \\ x_{21} & \cdots & x_{2n} \\ \vdots & & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix} \pm \begin{bmatrix} y_{11} & \cdots & y_{1n} \\ y_{21} & \cdots & y_{2n} \\ \vdots & & \vdots \\ y_{m1} & \cdots & y_{mn} \end{bmatrix} = \begin{bmatrix} x_{11} \pm y_{11} & \cdots & x_{1n} \pm y_{1n} \\ x_{21} \pm y_{21} & \cdots & x_{2n} \pm y_{2n} \\ \vdots & & \vdots \\ x_{m1} \pm y_{m1} & \cdots & x_{mn} \pm y_{mn} \end{bmatrix}$$
  2. Scalar Multiplication/Division: It is the element-wise multiplication/division of a matrix by a scalar value. Mathematically,
$$a \cdot \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ x_{21} & \cdots & x_{2n} \\ \vdots & & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix} = \begin{bmatrix} a x_{11} & \cdots & a x_{1n} \\ a x_{21} & \cdots & a x_{2n} \\ \vdots & & \vdots \\ a x_{m1} & \cdots & a x_{mn} \end{bmatrix}$$
  3. Matrix Multiplication or Dot Product: The (i, j) element of the output matrix is the dot product of the ith row of the first matrix and the jth column of the second matrix. Mathematically, for a matrix A of dimensions (m × n) and a matrix B of dimensions (n × p), the product AB has dimensions (m × p) and
$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
Not all matrices can be multiplied with each other. For the matrix multiplication AB to be valid, the number of columns in A must equal the number of rows in B, i.e. for two matrices A and B with dimensions (m × n) and (o × p), AB exists if and only if n = o, and BA exists if and only if p = m. Matrix multiplication is not commutative, i.e., in general,
    AB ≠ BA.
  4. Matrix Inverse: The inverse of a matrix A is the matrix $A^{-1}$ such that $AA^{-1} = I$ (the identity matrix).
  5. Matrix Transpose: The transpose of a matrix produces a matrix in which the rows and columns are interchanged. Mathematically,
$$A^T = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{m1} \\ x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix} \quad \text{where} \quad A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$
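The matrix operations above can be sketched in a few lines of NumPy (the matrices are illustrative 2×2 examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])

S     = A + B              # element-wise sum
AB    = A @ B              # matrix product
BA    = B @ A              # generally different from AB
A_inv = np.linalg.inv(A)   # inverse (A must be square and non-singular)
A_T   = A.T                # transpose: rows and columns interchanged
```

`A @ A_inv` recovers the identity matrix up to floating-point error, and `AB` differs from `BA`, illustrating that matrix multiplication is not commutative.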
Linear Transformations:

 


     Any transformation can be geometrically visualized as the distortion of the n-dimensional space (it can be squishing, stretching, rotating, etc.). The distortion of space can be visualized as a distortion of the grid lines that make up the coordinate system. Space can be distorted in several different ways. A linear transformation, however, is a special distortion with two distinct properties,
  1.  Straight lines remain straight and parallel to each other
  2.  The origin remains fixed
Consider a linear transformation where the original basis vectors $\hat{i}$ and $\hat{j}$ (the unit vectors along the x-direction and y-direction of the coordinate system, respectively) move to the new points $\hat{i} = [1, -2]$ and $\hat{j} = [3, 0]$. This means that $\hat{i}$ moves to (1, −2) from (1, 0) and $\hat{j}$ moves to (3, 0) from (0, 1) under the transformation; geometrically, the space is rotated and stretched. One can combine the two vectors where $\hat{i}$ and $\hat{j}$ land and write them as a single matrix, i.e.,
$$L = \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix}$$
As can be seen, each of these vectors forms one column of the matrix (and hence they are often called column vectors). This matrix fully represents the linear transformation. Now, if one wants to find where any given vector v would land after this transformation, one simply needs to multiply the vector v with the matrix L, i.e. $v_{new} = L \cdot v$. It is convenient to think of this matrix as a function that describes the transformation, i.e. it takes the original vector v as the input and returns the new vector $v_{new}$.
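This "matrix as a function" view is a one-liner in NumPy; a sketch using the matrix L above and an arbitrary illustrative vector:

```python
import numpy as np

# Columns of L are the images of the basis vectors i-hat and j-hat
L = np.array([[1.0, 3.0],
              [-2.0, 0.0]])

v = np.array([1.0, 1.0])   # an arbitrary input vector
v_new = L @ v              # where v lands after the transformation

# The basis vector i-hat = (1, 0) lands on the first column of L
i_new = L @ np.array([1.0, 0.0])
```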


Formally, a transformation is linear if it satisfies the following two properties:
  1. Additivity, i.e. L(v + w) = L(v) + L(w).
  2. Homogeneity, i.e. L(cv) = cL(v), where c is a scalar.
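Both properties can be verified numerically for any matrix transformation; a small NumPy sketch with illustrative vectors:

```python
import numpy as np

L = np.array([[1.0, 3.0],
              [-2.0, 0.0]])

def T(v):
    """Apply the linear transformation represented by the matrix L."""
    return L @ v

v = np.array([1.0, 2.0])
w = np.array([-3.0, 0.5])
c = 4.0

additive    = np.allclose(T(v + w), T(v) + T(w))  # L(v+w) = L(v) + L(w)
homogeneous = np.allclose(T(c * v), c * T(v))     # L(cv)  = c L(v)
```

Every matrix transformation satisfies both checks, which is exactly why matrices and linear transformations are interchangeable views of the same object.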

Conclusion

I hope you enjoyed the article! There are plenty of valuable resources available on the internet that explain concepts like matrix decompositions, vector calculus, geometry, the mathematics behind principal component analysis, support vector machines, and many more. The following links may help you understand the mathematical concepts:

  1. Khan Academy’s courses – Comprehensive free courses for complex mathematical concepts.

  2. 3Blue1Brown – Here you will find in-depth visual explanations of most of these mathematical concepts.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


A Mathematics student turned Data Scientist. I am an aspiring data scientist who aims to learn all the necessary Data Science concepts in detail. I am passionate about Data Science, including data manipulation, data visualization, data analysis, EDA, and Machine Learning, which help in finding valuable insights from data.

