
10 Confusing XGBoost Hyperparameters and How to Tune Them Like a Pro in 2023


1. num_boost_round / n_estimators

First, you have to decide the number of decision trees (often called base learners in XGBoost) to plant during training, using num_boost_round. The default is 100, but that is hardly enough for today's large datasets.

Increasing the parameter plants more trees, but it also significantly increases the chances of overfitting as the model becomes more complex.

One trick I learned from Kaggle is to set a high number like 100,000 for num_boost_round and use early stopping rounds.

In each boosting round, XGBoost plants one more decision tree to improve the collective score of the previous ones. That's why it is called boosting. The process continues for num_boost_round rounds, regardless of whether each new round improves on the last or not.
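You can see this additive behavior by scoring the same data with only the first k trees. A minimal sketch, assuming you already have a trained Booster called bst and a DMatrix called dmat (both hypothetical names), and a recent xgboost version that supports iteration_range:

# Predictions using only the first 10 vs. the first 100 trees
pred_10 = bst.predict(dmat, iteration_range=(0, 10))
pred_100 = bst.predict(dmat, iteration_range=(0, 100))
# pred_100 is pred_10's trees plus 90 more rounds of corrections

The second prediction is usually better, but past some point extra rounds stop helping, which is exactly what early stopping detects.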

But by using early stopping, we can stop the training, and thus the planting of unnecessary trees, when the score hasn't been improving for the last 5, 10, 50, or any arbitrary number of rounds.

With this trick, we can find the optimal number of decision trees without even tuning num_boost_round, and we save time and computation resources. Here is what it looks like in code:

import xgboost as xgb

# Define the rest of the params
params = {...}

# Build the train/validation sets
dtrain_final = xgb.DMatrix(X_train, label=y_train)
dvalid_final = xgb.DMatrix(X_valid, label=y_valid)

bst_final = xgb.train(
    params,
    dtrain_final,
    num_boost_round=100000,  # Set a high number
    evals=[(dvalid_final, "validation")],
    early_stopping_rounds=50,  # Enable early stopping
    verbose_eval=False,
)

The above code would let XGBoost build up to 100k decision trees, but thanks to early stopping, training stops when the validation score hasn't improved for the last 50 rounds. Usually, the number of trees required ends up well below 5,000–10,000.
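After training finishes, you can check where early stopping actually landed. A minimal sketch, reusing bst_final and dvalid_final from above; the best_iteration and best_score attributes are set on the booster when early stopping is enabled:

# The round with the best validation score, and that score
print(bst_final.best_iteration)
print(bst_final.best_score)

# Predict using only the trees up to the best round
# (iteration_range is available in recent xgboost versions)
preds = bst_final.predict(
    dvalid_final, iteration_range=(0, bst_final.best_iteration + 1)
)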

Controlling num_boost_round is also one of the biggest factors in how long the training process runs, since more trees require more resources.
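If you prefer the scikit-learn API, n_estimators plays the role of num_boost_round, and the same trick applies. A minimal sketch, assuming a regression task with the same X_train/X_valid splits and a recent xgboost version (1.6+, where early_stopping_rounds is accepted by the estimator's constructor; older versions take it in fit() instead):

import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=100000,       # analogous to num_boost_round
    early_stopping_rounds=50,  # stop when validation stops improving
    eval_metric="rmse",        # hypothetical choice of metric
)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],  # watched for early stopping
    verbose=False,
)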
