Support Vector Regression Tutorial for Machine Learning

Alakh Sethi 27 May, 2024

Introduction

Support Vector Machines (SVM) are widely used in machine learning for classification problems, but the same ideas also apply to regression problems through Support Vector Regression (SVR). SVR follows the same principles as SVM but predicts continuous outputs rather than class labels. This tutorial explains how SVR works and how kernels such as the quadratic, radial basis function (RBF), and sigmoid kernels let it handle complex, non-linear relationships in data. We will also walk through implementing SVR in Python on a small dataset.

Learning Outcomes

  • Grasp the fundamental concepts of SVM, including hyperplanes, margins, and how SVM separates data into different classes.
  • Recognize the key differences between Support Vector Machines for classification and Support Vector Regression for regression problems.
  • Learn about important SVR hyperparameters, such as kernel types (quadratic, radial basis function, and sigmoid), and how they influence the model’s performance.
  • Gain practical experience in implementing Support Vector Regression using Python, including data preprocessing, feature scaling, and model training.
  • Use SVR to predict continuous outputs in various contexts, demonstrating its application in fields like finance, engineering, and healthcare.
  • Develop skills to visualize the results of SVR models, understanding how to interpret the best-fit line and the impact of different kernels on the model’s predictions.
  • Learn how to assess the performance of SVR models using appropriate metrics and techniques, ensuring accurate and reliable predictions.

What is a Support Vector Machine (SVM)?

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. SVM works by finding a hyperplane in a high-dimensional space that best separates data into different classes. It aims to maximize the margin (the distance between the hyperplane and the nearest data points of each class) while minimizing classification errors. SVM can handle both linear and non-linear classification problems by using various kernel functions. It’s widely used in tasks such as image classification, text categorization, and more.

Let’s start by understanding SVM in simple terms. Say we have a plot of two labeled classes, as shown in the figure below:

[Figure: plot of two labeled classes]

Can you decide what the separating line will be? You might have come up with this:

[Figure: a straight line separating the two classes]

The line separates the classes fairly well. This is what SVM essentially does: simple class separation. Now, what if the data looked like this:

[Figure: two classes that no straight line can separate]

Here, no simple line separates the two classes. So we add a new dimension along the z-axis. In this higher-dimensional space, we can now separate the two classes:

[Figure: the two classes separated after adding the z-axis]

When we project this separating boundary back onto the original plane, it maps to a circular boundary, as shown here:

[Figure: circular boundary between the two classes in the original plane]

This is exactly what SVM does! It tries to find a line or hyperplane (in multidimensional space) that separates the two classes. It then classifies a new point according to which side of the hyperplane, positive or negative, the point lies on.
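The z-axis idea above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the toy points, the mapping z = x² + y², and the threshold of 1.0 are all chosen for this example and are not taken from the figures.

```python
import math

# Hypothetical toy data: an inner cluster (class 0) and an outer ring (class 1).
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5)]


def lift(point):
    """Map a 2-D point (x, y) to 3-D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


# In the lifted space, the flat plane z = threshold separates the classes.
threshold = 1.0  # assumed value, chosen between the two rings
inner_z = [lift(p)[2] for p in inner]  # all close to 0.25
outer_z = [lift(p)[2] for p in outer]  # all close to 4.0


def classify(point):
    """Return 1 if the lifted point is above the plane, else 0."""
    return 1 if lift(point)[2] > threshold else 0


print([classify(p) for p in inner])
print([classify(p) for p in outer])
```

Back in the original plane, the plane z = 1 corresponds to the circle x² + y² = 1, which is the circular boundary described above.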

Hyperparameters of the Support Vector Machine (SVM) Algorithm

There are a few important SVM terms that you should be aware of before proceeding further:

  • Kernel: A kernel lets us work with a hyperplane in a higher-dimensional space without explicitly computing the coordinates of the data in that space, which keeps the computational cost manageable. Moving to a higher dimension is needed when we cannot find a separating hyperplane in the original dimension.
  • Hyperplane: In SVM, this is the separating boundary between two data classes. In Support Vector Regression, it is the line (or surface) used to predict the continuous output.
  • Decision Boundary: For simplicity, think of a decision boundary as a demarcation line with positive examples on one side and negative examples on the other. Examples that fall exactly on this line may be classified as either positive or negative. The same concept carries over to Support Vector Regression.
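To make the kernel idea concrete, here is a small sketch in plain Python. It uses a degree-2 polynomial kernel on 2-D inputs as an assumed example and checks that the kernel value equals the dot product in an explicit 3-D feature space, without the kernel ever building that space. The function names are illustrative only.

```python
import math


def poly_kernel(u, v):
    """Degree-2 polynomial kernel K(u, v) = (u . v)^2 for 2-D inputs."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot ** 2


def explicit_map(u):
    """Explicit 3-D feature map whose dot product reproduces poly_kernel."""
    x1, x2 = u
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot3(a, b):
    """Plain dot product of two 3-D vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


u, v = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(u, v)                           # computed in 2-D
explicit_value = dot3(explicit_map(u), explicit_map(v))    # computed in 3-D

print(kernel_value, explicit_value)
```

Both values agree (up to floating-point rounding), which is why a kernel can stand in for the higher-dimensional dot product.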

To understand SVM from scratch, I recommend this tutorial: Understanding Support Vector Machine(SVM) algorithm from examples.

Introduction to Support Vector Regression (SVR)

Support Vector Regression (SVR) is a type of machine learning algorithm used for regression analysis. The goal of SVR is to find a function that approximates the relationship between the input variables and a continuous target variable, while minimizing the prediction error.

Unlike Support Vector Machines (SVMs) used for classification tasks, SVR seeks a hyperplane that best fits the data points in a continuous space. Rather than maximizing the margin between classes, SVR looks for a function that is as flat as possible while keeping most data points within a fixed distance (epsilon) of it. The input variables can first be mapped to a high-dimensional feature space so that this fit can capture non-linear patterns.

SVR can handle non-linear relationships between the input variables and the target variable by using a kernel function to map the data to a higher-dimensional space. This makes it a powerful tool for regression tasks where there may be complex relationships between the input variables and the target variable.

Support Vector Regression (SVR) uses the same principle as SVM, but for regression problems. Let’s spend a few minutes understanding the idea behind SVR.

The Idea Behind Support Vector Regression

The regression problem is to find a function that approximates the mapping from an input domain to the real numbers on the basis of a training sample. So let’s now dive in and understand how SVR actually works.

[Figure: SVR hyperplane (green line) between two decision boundary lines (red)]

Consider the two red lines as the decision boundary and the green line as the hyperplane. Our objective in SVR is to consider the points that lie within the decision boundary lines. The best-fit line is the hyperplane whose boundary contains the maximum number of points.

The first thing to understand is the decision boundary (the red lines above). Think of these lines as being at some distance, say ‘a’, from the hyperplane. So these are the lines we draw at distances ‘+a’ and ‘-a’ from the hyperplane. This ‘a’ is what is usually referred to as epsilon.

Assuming that the equation of the hyperplane is as follows:

Y = wx + b (equation of the hyperplane)

Then the equations of the decision boundaries become:

Y = wx + b + a

Y = wx + b - a

Thus, any data point (x, Y) that lies within the decision boundaries satisfies:

-a < Y - (wx + b) < +a

Our main aim here is to choose a decision boundary at distance ‘a’ from the hyperplane such that as many data points as possible fall within that boundary. In the standard SVR formulation, the support vectors are the points that lie on or outside this boundary, and they are the ones that determine the fit.

Hence, errors for points within the decision boundary (the margin of tolerance) are ignored, and only points outside it are penalized. This gives us a better-fitting model.
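The margin of tolerance can be written as a small loss function, often called the epsilon-insensitive loss. The sketch below is illustrative: the line Y = 2x + 1, the value of epsilon, and the three data points are assumptions made up for this example.

```python
def predict(x, w, b):
    """Value of the hyperplane Y = wx + b at x."""
    return w * x + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y_true - y_pred| <= epsilon, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Assumed hyperplane and tolerance, for illustration only.
w, b, epsilon = 2.0, 1.0, 0.5

# (x, Y) pairs: the first two lie inside the tube, the third lies outside.
points = [(1.0, 3.2), (2.0, 5.0), (3.0, 8.0)]

losses = [epsilon_insensitive_loss(y, predict(x, w, b), epsilon) for x, y in points]
print(losses)  # -> [0.0, 0.0, 0.5]
```

Only the third point, which misses the line by 1.0, contributes to the loss, and only by the amount it exceeds epsilon.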

Implementing Support Vector Regression (SVR) in Python

Time to put on our coding hats! In this section, we’ll understand the use of Support Vector Regression with the help of a dataset. Here, we have to predict the salary of an employee given a few independent variables. A classic HR analytics project!

[Figure: preview of the dataset]

Step 1: Importing the libraries

Step 2: Reading the dataset
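A minimal sketch of these first two steps, assuming pandas is available. The column names `Level` and `Salary` and all the values are made-up placeholders, not the dataset used in this article. With a real file, you would pass its path to `pd.read_csv` instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; every value here is a placeholder.
csv_text = """Level,Salary
1,40000
2,44000
3,50000
4,60000
5,75000
6,100000
7,140000
8,200000
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Features must be 2-D (n_samples, n_features); the target is 1-D.
X = dataset[["Level"]].to_numpy(dtype=float)
y = dataset["Salary"].to_numpy(dtype=float)

print(X.shape, y.shape)
```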

Step 3: Feature Scaling

A real-world dataset contains features that vary in magnitude, units, and range. I would suggest normalizing the features when their raw scale is irrelevant or misleading.

Feature scaling brings the data into a comparable range. Some model classes are insensitive to feature scale or handle it internally, but the SVR class does not, so we have to scale the features ourselves in Python.
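One way to do this, assuming scikit-learn is the library in use, is `StandardScaler`, which rescales each column to zero mean and unit variance. The arrays below are made-up placeholders. A separate scaler is kept for the target so that predictions can later be converted back to the original units.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data, for illustration only.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([40000.0, 44000.0, 50000.0, 60000.0, 75000.0])

sc_X = StandardScaler()
sc_y = StandardScaler()

X_scaled = sc_X.fit_transform(X)
# StandardScaler expects 2-D input, so reshape the target and flatten it back.
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

# inverse_transform undoes the scaling and recovers the original values.
y_restored = sc_y.inverse_transform(y_scaled.reshape(-1, 1)).ravel()

print(X_scaled.ravel())
print(y_restored)
```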

Step 4: Fitting SVR to the dataset

The kernel is the most important hyperparameter to choose. There are many types of kernels, such as linear, polynomial, and Gaussian (RBF), and the right one depends on the dataset. To learn more about this, read this: Support Vector Machine (SVM) in Python and R
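A hedged sketch of the fitting step, assuming scikit-learn's `SVR` class with an RBF kernel and its default `C` and `epsilon`. The data is a made-up placeholder, and both the feature and the target are standardized first.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data, for illustration only.
X = np.arange(1, 9, dtype=float).reshape(-1, 1)
y = np.array([40000.0, 44000.0, 50000.0, 60000.0,
              75000.0, 100000.0, 140000.0, 200000.0])

# Scale the feature and the target with separate scalers.
sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

# Fit SVR with the RBF (Gaussian) kernel on the scaled data.
regressor = SVR(kernel="rbf")
regressor.fit(X_scaled, y_scaled)

# support_ holds the indices of the training points used as support vectors.
print(len(regressor.support_), "support vectors out of", len(X))
```

Swapping `kernel="rbf"` for `"linear"`, `"poly"`, or `"sigmoid"` is how the other kernel types would be tried.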

Step 5: Predicting a New Result

So, the prediction for y_pred(6.5) will be 170,370.
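The sketch below shows one way the prediction step can look when the model was trained on scaled data, again assuming scikit-learn and made-up placeholder values, so it will not reproduce the figure quoted above. The new input has to be scaled with the same feature scaler, and because the model predicts in scaled units, `inverse_transform` is needed to bring the result back to the original units.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data, for illustration only.
X = np.arange(1, 9, dtype=float).reshape(-1, 1)
y = np.array([40000.0, 44000.0, 50000.0, 60000.0,
              75000.0, 100000.0, 140000.0, 200000.0])

sc_X, sc_y = StandardScaler(), StandardScaler()
X_scaled = sc_X.fit_transform(X)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf").fit(X_scaled, y_scaled)

# Scale the new input with the scaler that was fitted on the training features.
new_level = np.array([[6.5]])
pred_scaled = regressor.predict(sc_X.transform(new_level))

# The model outputs scaled values; convert them back to the original units.
pred = sc_y.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()
print(pred)
```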

Step 6: Visualizing the SVR results (for higher resolution and smoother curve)

This is the output we get: the best-fit line, whose boundary contains the maximum number of points. Quite accurate!
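A possible way to draw such a plot, assuming matplotlib and scikit-learn and using made-up placeholder data. Predicting on a dense grid of inputs, rather than only at the training points, is what gives the higher resolution and smoother curve. The `Agg` backend and the output file name are assumptions so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical data, for illustration only.
X = np.arange(1, 9, dtype=float).reshape(-1, 1)
y = np.array([40000.0, 44000.0, 50000.0, 60000.0,
              75000.0, 100000.0, 140000.0, 200000.0])

sc_X, sc_y = StandardScaler(), StandardScaler()
regressor = SVR(kernel="rbf").fit(
    sc_X.fit_transform(X), sc_y.fit_transform(y.reshape(-1, 1)).ravel()
)

# Dense grid of inputs for a smooth prediction curve.
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
y_grid = sc_y.inverse_transform(
    regressor.predict(sc_X.transform(X_grid)).reshape(-1, 1)
).ravel()

fig, ax = plt.subplots()
ax.scatter(X.ravel(), y, color="red", label="training data")
ax.plot(X_grid.ravel(), y_grid, color="blue", label="SVR prediction")
ax.set_title("SVR fit (hypothetical data)")
ax.set_xlabel("Level")
ax.set_ylabel("Salary")
ax.legend()
fig.savefig("svr_fit.png")
```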

Conclusion

Support Vector Regression (SVR) extends the principles of Support Vector Machines (SVM) to regression problems, offering a powerful tool for predicting continuous outputs. By leveraging kernels such as quadratic, radial basis function, and sigmoid, SVR can handle complex and non-linear relationships in the data. In this tutorial, we covered the essential concepts and hyperparameters, walked through an SVR implementation in Python, and applied it to a sample dataset. Whether the data comes from finance, engineering, or healthcare, SVR provides a robust approach to modeling continuous targets.

Key Takeaways

  • SVR extends Support Vector Machines (SVM) into regression problems, allowing for the prediction of continuous outcomes rather than classifying data into discrete categories as with a classifier.
  • SVR utilizes various kernel functions, such as quadratic, radial basis function, and sigmoid, to handle non-linear relationships in data.
  • Effective hyperparameter tuning, including choosing the right kernel and setting the epsilon and regularization parameters, is vital for maximizing SVR performance.
  • SVR offers greater flexibility and robustness compared to traditional linear regression, by finding a hyperplane that best fits the data within a specified margin, making it suitable for more complex datasets.
  • Unlike logistic regression, which is primarily used for binary classification problems, Support Vector Regression (SVR) focuses on predicting continuous outcomes. SVR leverages kernel functions to handle non-linear relationships in data, offering a more versatile approach for regression tasks.

Frequently Asked Questions

Q1. What are the applications of SVM regression?

A. Support Vector Regression (SVR) is a versatile algorithm used in finance, engineering, bioinformatics, and healthcare to predict continuous values. Typical uses include stock price prediction, machine performance prediction, and medical outcome prediction. The closely related SVM classifier is used for tasks such as text classification, sentiment analysis, and object recognition.

Q2. How does the regularization parameter in SVM affect the regression model?

A. Regularization is a technique used to avoid overfitting by penalizing large coefficients in the model. In Support Vector Regression, the regularization parameter determines the trade-off between achieving a low error on the training data and keeping the regression model simple. Stronger regularization increases the penalty for large coefficients, which helps prevent the model from fitting the noise in the training data. Note that in the common formulation with a parameter C (as in scikit-learn’s SVR), the strength of regularization is inversely proportional to C, so a larger C means weaker regularization and a closer fit to the training data.
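The effect can be seen in a small experiment, sketched here under assumptions: scikit-learn's `SVR` with its `C` parameter, and a synthetic noisy sine curve generated only for this example. It compares the training error for a small, medium, and large value of C.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

# Synthetic data for illustration: a sine curve with added noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)

train_mse = {}
for C in (0.01, 1.0, 100.0):
    # Small C = strong regularization; large C = weak regularization.
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X, y)
    train_mse[C] = mean_squared_error(y, model.predict(X))

for C, mse in train_mse.items():
    print(f"C={C}: training MSE = {mse:.4f}")
```

A low training error at large C does not by itself mean a better model, since a very large C can also fit the noise. That is why C is usually chosen with held-out data.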

Q3. What are the benefits of using a polynomial kernel in SVM for regression?

A. A polynomial kernel helps fit a regression model that can capture more complex relationships in the input data. It implicitly maps the original features to polynomial features of a given degree, allowing the model to learn non-linear relationships. This is especially beneficial when the relationship between the dependent and independent variables is not linear, since it gives a more flexible model.

Q4. How does cross-validation help in tuning the parameters of an SVR model?

A. Cross-validation is a method for assessing how the model performs with different parameter settings. It involves splitting the training set into multiple smaller subsets (folds), then training on some folds and validating on the remaining one in turn. This helps identify the set of parameters that generalizes best to unseen data. It is particularly useful in SVR for selecting the regularization parameter, the kernel type (such as polynomial or RBF), and other hyperparameters that affect the model’s accuracy.
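As a rough sketch of this process, assuming scikit-learn, `GridSearchCV` can try each parameter combination with 5-fold cross-validation. The synthetic data and the specific grid values below are assumptions chosen for illustration. The scaler sits inside the pipeline so that it is re-fitted on each training fold.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data for illustration: a sine curve with added noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=80)

pipeline = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Assumed candidate values; a real search would adapt these to the data.
param_grid = {
    "svr__kernel": ["rbf", "poly"],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.05, 0.1, 0.2],
}

# Shuffle the folds because the inputs are sorted.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MSE of the best setting
```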

Responses From Readers

Rahul Dev 19 Sep, 2020

Thanks for the article, it gave an intuitive understanding of SVR. It would be really helpful if you could also include the dataset used for the demonstration.

Venkat 16 Nov, 2020

The code is completely irrelevant to the dataset shown in the picture. Also, this code is from the Udemy course by Kiril Ermenko. At least give them credit when you have plagiarized the code and content of the tutorial from elsewhere.

Junior Mukenze 02 Nov, 2022

Thank you for this article, it is very clear and helpful. However, I have one question about the example you gave, concerning the feature variables (X) and the target variable (Y). How do we use SVR if we have more than one feature variable, for example if we want to model Salary against position level and age?

Rayan 24 Feb, 2023

Why did we use inverse transform in step 5, line 2?
