In a previous article, you learned how to rewrite decision trees using a Differentiable Programming approach, as suggested by the NODE paper. The idea behind this paper is to replace XGBoost with a Neural Network.
More specifically, after explaining why the standard way of constructing Decision Trees is not differentiable, it introduced the mathematical tools needed to smooth the two key operations performed by a decision node:
- Feature Selection
- Branching decision
The NODE paper shows that both can be handled using the entmax function.
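To give a concrete feel for why this works, here is a small illustration. Computing entmax itself requires a bit more machinery, so the sketch below implements the closely related sparsemax (the α = 2 member of the same family): like softmax, it maps scores to a probability vector, but it can assign exactly zero to low-scoring entries, which is what makes feature selection and branching both sparse and differentiable. This is an illustration, not the NODE paper's exact implementation.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: a sparse, differentiable alternative to softmax
    (the alpha=2 special case of the entmax family)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # scores sorted in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # entries kept in the support
    k_z = k[support][-1]                     # size of the support
    tau = (cumsum[support][-1] - 1) / k_z    # threshold subtracted from the scores
    return np.maximum(z - tau, 0.0)

# Feature-selection scores: the dominant feature gets all the mass,
# the others are set exactly to zero.
print(sparsemax([0.1, 0.2, 2.0, 0.3]))   # -> [0. 0. 1. 0.]
print(sparsemax([1.0, 1.1, 0.2]))        # -> [0.45 0.55 0.  ]
```

Softmax would spread a little probability over every feature; sparsemax (and entmax) concentrate it on the few relevant ones while remaining differentiable almost everywhere.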
To summarize, we have shown how to create a binary tree without using comparison operators.
The previous article ended with open questions regarding the training of a smoothed decision tree. It’s time to answer these questions.
If you’re considering a deep dive into Gradient Boosting Methods, have a look at my book:
First, based on what we presented in the previous article, let’s create a new Python class: SmoothBinaryNode.
This class encodes the behavior of a smooth binary node. There are two key parts in its code:
- The selection of the features, handled by the method `_choices`
- The evaluation of these features with respect to a given threshold, and the identification of the path to follow: left or right. This is handled by the methods `left` and `right`, as sketched below.
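The following is a minimal sketch of such a class, written in plain NumPy so it stays self-contained. It uses softmax both for the feature selection and for the smooth comparison against the threshold; the article's actual implementation relies on entmax and may differ in its details, and the `scale` temperature parameter is an assumption added here to control how sharp the smooth comparison is.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax, used here as a smooth stand-in for entmax."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

class SmoothBinaryNode:
    """A smooth (differentiable) binary decision node.

    weights   -- one score per input feature; softmax(weights) gives a
                 smooth feature-selection vector.
    threshold -- the split value the selected feature is compared against.
    scale     -- temperature controlling how sharp the comparison is
                 (hypothetical parameter, not from the original article).
    """

    def __init__(self, weights, threshold, scale=1.0):
        self.weights = np.asarray(weights, dtype=float)
        self.threshold = float(threshold)
        self.scale = float(scale)

    def _choices(self, x):
        """Smooth feature selection: a weighted mix of the input columns."""
        selection = softmax(self.weights)              # sums to 1 over features
        return np.asarray(x, dtype=float) @ selection  # one value per sample

    def right(self, x):
        """Smooth probability of going right (selected feature > threshold)."""
        value = self._choices(x)
        logits = np.stack([(value - self.threshold) * self.scale,
                           np.zeros_like(value)], axis=-1)
        return softmax(logits)[..., 0]

    def left(self, x):
        """Smooth probability of going left (selected feature <= threshold)."""
        return 1.0 - self.right(x)

# Usage: two samples with two features; the node (almost) selects feature 0
# and splits it around 0.5.
x = np.array([[0.2, 3.5],
              [0.9, -1.0]])
node = SmoothBinaryNode(weights=[5.0, -5.0], threshold=0.5, scale=10.0)
print(node.left(x))   # close to [1, 0]: first sample goes left
print(node.right(x))  # close to [0, 1]: second sample goes right
```

Because every operation in the node is a smooth function of `weights` and `threshold`, both can be learned by gradient descent, which is exactly what the rest of the article builds on.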