
XGBoost: The Definitive Guide (Part 2)


Implementation of the XGBoost algorithm in Python from scratch


In the previous article, we discussed the XGBoost algorithm and showed its implementation in pseudocode. In this article, we will implement the algorithm in Python from scratch.

The provided code is a concise and lightweight implementation of the XGBoost algorithm (only about 300 lines of code), intended to demonstrate its core functionality. As such, it is not optimized for speed or memory usage, and does not include the full spectrum of features provided by the XGBoost library (see https://xgboost.readthedocs.io/ for more details on the features of the library). More specifically:

  1. The code is written in pure Python, whereas the core of the XGBoost library is written in C++ (its Python classes are only thin wrappers over the C++ implementation).
  2. It does not include the various optimizations that allow XGBoost to handle huge amounts of data, such as weighted quantile sketch, out-of-core tree learning, and parallel and distributed processing of the data. These optimizations will be discussed in more detail in the next article in the series.
  3. The implementation currently supports only regression and binary classification tasks, whereas the XGBoost library also supports multi-class classification and ranking problems.
  4. The implementation supports only a small subset of the hyperparameters that exist in the XGBoost library. Specifically, it supports the following hyperparameters (a minimal constructor sketch follows after this list):
  • n_estimators (default = 100): the number of regression trees in the ensemble (which is also the number of boosting iterations).
  • max_depth (default = 6): the maximum depth (number of levels) of each tree.
  • learning_rate (default = 0.3): the step size shrinkage applied to the trees.
  • reg_lambda (default = 1): L2 regularization term applied to the weights of the leaves.
  • gamma (default = 0): minimum loss reduction required to split a given node.

For consistency, I have kept the same names and default values of these hyperparameters as they are defined in the XGBoost library.
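To make the interface concrete, here is a minimal sketch of how an estimator exposing these hyperparameters might be initialized. The class name XGBRegressorScratch is hypothetical and is not taken from the article's code; only the hyperparameter names and default values are the ones listed above.

    # Minimal sketch (hypothetical class name): a constructor exposing the
    # supported hyperparameters, with the same names and defaults as in the
    # XGBoost library.
    class XGBRegressorScratch:
        def __init__(self, n_estimators=100, max_depth=6, learning_rate=0.3,
                     reg_lambda=1, gamma=0):
            self.n_estimators = n_estimators    # number of trees / boosting iterations
            self.max_depth = max_depth          # maximum depth of each tree
            self.learning_rate = learning_rate  # step size shrinkage applied to the trees
            self.reg_lambda = reg_lambda        # L2 regularization on leaf weights
            self.gamma = gamma                  # minimum loss reduction required to split a node

For example, XGBRegressorScratch(n_estimators=50, max_depth=4) would build a smaller ensemble while leaving the remaining hyperparameters at their XGBoost-compatible defaults.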
