Tuning XGBoost Hyperparameters

Hyperparameter tuning is about finding a set of optimal hyperparameter values which maximizes the model's performance, minimizes loss, and produces better outputs.




To recap, XGBoost stands for Extreme Gradient Boosting and is a supervised learning algorithm that falls under the gradient-boosted decision tree (GBDT) family of machine learning algorithms.

GBDT models make their predictions by combining a set of weaker models, decision trees that split the data through if-then-else true/false feature questions. The trees are built sequentially, each one assessing and correcting the errors of the trees before it to improve the final prediction.

import xgboost as xgb
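
Building on the import above, here is a minimal sketch of training an XGBoost model with the native API. The toy dataset from scikit-learn is only a stand-in for your own data, and the parameter values are simply the defaults discussed later in this article.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data purely for illustration -- swap in your own dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# DMatrix is XGBoost's internal data structure, optimized for speed and memory.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# Train a small ensemble of boosted trees.
booster = xgb.train(
    params={'objective': 'binary:logistic', 'max_depth': 6, 'eta': 0.3},
    dtrain=dtrain,
    num_boost_round=100,
)

preds = booster.predict(dtest)  # predicted probabilities for the positive class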


Before we get into tuning XGBoost hyperparameters, let's understand why tuning is important.

 

Why is Hyperparameter Tuning Important?

 

Hyperparameter tuning is a vital part of improving the overall behavior and performance of a machine learning model. A hyperparameter is a type of parameter whose value is set before the learning process begins and lives outside of the model itself.

A lack of hyperparameter tuning can lead to inaccurate results because the loss function is not properly minimized. Our aim is for the model to produce as few errors as possible.

Hyperparameters govern how the model parameters are learned, so different hyperparameter values can produce different model parameter values.

Hyperparameter tuning is all about finding the set of optimal hyperparameter values which maximizes the model's performance, minimizes loss, and produces better outputs.

 

Features of XGBoost

 

1. Gradient Tree Boosting

 

A fixed number of trees are added one at a time, and each iteration should show a reduction in the value of the loss function.

 

2. Regularized Learning 

 

Regularized learning adds a penalty term to the loss function being minimized, which helps prevent overfitting and smooths out the final learned weights.
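
For reference, the regularized objective from the original XGBoost paper combines the training loss with a penalty on each tree's complexity:

$$\text{Obj} = \sum_{i} l\big(y_i, \hat{y}_i\big) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}$$

where $l$ is the training loss, $f_k$ is the $k$-th tree, $T$ is the number of leaves in a tree, $w$ are the leaf weights, and $\gamma$ and $\lambda$ are the regularization hyperparameters covered later in this article.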

 

3. Shrinkage and Feature Subsampling

 

Two further techniques are used to prevent overfitting: shrinkage, which scales down the contribution of each newly added tree, and feature subsampling, which builds each tree on a random subset of the columns.

These features will be further explored in the hyperparameter tuning of XGBoost. 

 

XGBoost Parameter Tuning

 

XGBoost parameters are divided into 4 groups:

 

1. General parameters

 

This relates to which booster we are using to do the boosting; the most common types are tree-based and linear models.

 

2. Booster parameters

 

These depend on which booster you have chosen.

 

3. Learning task parameters

 

This is about deciding on the learning scenario, for example regression or classification, along with the corresponding objective and evaluation metric.

 

4. Command line parameters

 

This relates to the behavior of the CLI version of XGBoost.

General parameters, Booster parameters and Task parameters are set before running the XGBoost model. The Command line parameters are only used in the console version of XGBoost.

 

General Parameters

 

These are the general parameters in XGBoost:

 
booster [default=gbtree]

This chooses which booster to use: gbtree or dart for tree-based models, and gblinear for linear models.

 
verbosity [default=1]

This controls the printing of messages, where valid values are 0 (silent), 1 (warning), 2 (info), and 3 (debug).

 
validate_parameters [default=false, except for the Python, R, and CLI interfaces]

When this is set to True, XGBoost will perform validation of input parameters to check whether a parameter is used or not.

 
nthread

This is the number of parallel threads used to run XGBoost; if it is not set, it defaults to the maximum number of threads available.

 
disable_default_eval_metric [default=false]

This flag disables the default evaluation metric. Set it to 1 or true to disable it.

 
num_feature 

This is the feature dimension used in boosting, set to the maximum dimension of the features. It is set automatically by XGBoost and does not normally need to be set by the user.
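
As a rough sketch of how these fit together, the general parameters go into the same dictionary as everything else; the values below are illustrative rather than recommendations.

general_params = {
    'booster': 'gbtree',           # tree-based booster; alternatives are 'dart' and 'gblinear'
    'verbosity': 1,                # 0 = silent, 1 = warning, 2 = info, 3 = debug
    'validate_parameters': True,   # warn about parameters that are not used
    'nthread': 4,                  # number of parallel threads
}

# Merged with the booster and learning task parameters at training time, e.g.:
# booster = xgb.train({**general_params, 'objective': 'binary:logistic'}, dtrain)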

 

Most Common Parameters for Tree-based Learners

 

max_depth [default=6]

This is the maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit, so choose it carefully. The value must be an integer greater than 0.

 
eta [default=0.3, alias: learning_rate]

This determines the step size at each iteration. The value must be between 0 and 1 and the default is 0.3. A low learning rate makes computation slower, and will need more rounds to achieve a reduction in error in comparison with a model with a higher learning rate. 

 
n_estimators [default=100]

This is the number of trees in our ensemble, which is the same as the number of boosting rounds (n_estimators is the scikit-learn API name; the native API uses num_boost_round).

The value must be an integer greater than 0 and the default is 100.

 
subsample [default=1]

This represents the fraction of observations that need to be sampled for each tree. A lower value helps to prevent overfitting, but raises the possibility of under-fitting. The value must be between 0 and 1, where the default is 1.

 
colsample_bytree, colsample_bylevel, colsample_bynode [default=1]

This is a family of parameters for subsampling of columns. Feature subsampling helps to prevent overfitting and also speeds up computations of the parallel algorithm.

 

params = {
    # Tree-based parameters that we are going to tune (shown here at their default values).
    'max_depth': 6,
    'n_estimators': 100,
    'eta': 0.3,
    'subsample': 1,
    'colsample_bytree': 1,
}
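
One common way to search over these values is with XGBoost's scikit-learn wrapper and GridSearchCV. The sketch below assumes a binary classification problem and uses a small, purely illustrative grid.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Toy data purely for illustration -- swap in your own dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# A small, illustrative grid around the parameters above.
param_grid = {
    'max_depth': [3, 6, 9],
    'n_estimators': [100, 200],
    'learning_rate': [0.1, 0.3],     # scikit-learn alias for eta
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0],
}

search = GridSearchCV(
    estimator=XGBClassifier(objective='binary:logistic', eval_metric='logloss'),
    param_grid=param_grid,
    scoring='accuracy',
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)

GridSearchCV tries every combination in the grid, so keep it small; randomized or Bayesian search scales better for larger search spaces.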


 

Regularization Parameters

 

alpha [default=0, alias: reg_alpha]

L1 (lasso regression) regularization on the weights. Increasing this value makes the model more conservative. When you work with a large number of features, it can also improve speed. It can be any non-negative number and the default is 0.

 
lambda [default=1, alias: reg_lambda]

L2 (ridge regression) regularization on the weights. Increasing this value makes the model more conservative and can help to reduce overfitting. It can be any non-negative number and the default is 1.

 
gamma [default=0, alias: min_split_loss]

This is the minimum loss reduction required to make a further partition on a leaf node of the tree. The higher the gamma value, the stronger the regularization. It can be any non-negative number and the default is 0.
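
To experiment with these, one option is to add them to the same parameter dictionary and cross-validate with the native API; the values below are placeholders, not recommendations.

import xgboost as xgb
from sklearn.datasets import make_classification

# Toy data purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'binary:logistic',
    'max_depth': 6,
    'eta': 0.3,
    # Regularization parameters -- placeholder values to experiment with.
    'alpha': 0.1,     # L1 regularization (alias: reg_alpha)
    'lambda': 1.0,    # L2 regularization (alias: reg_lambda)
    'gamma': 0.5,     # minimum loss reduction required to split a leaf
}

# 5-fold cross-validation with early stopping on the evaluation metric.
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=200,
    nfold=5,
    metrics='logloss',
    early_stopping_rounds=10,
)
print(cv_results.tail())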

 
You can find the complete list of XGBoost parameters in the official XGBoost documentation.

 

Conclusion

 

From this article, you will have seen how the three features of XGBoost are supported by the four groups of tuning parameters, which we then explored through the individual XGBoost parameters.

 
 
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice or tutorials and theory based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence is/can benefit the longevity of human life. A keen learner, seeking to broaden her tech knowledge and writing skills, whilst helping guide others.