## Learn to adjust regression algorithms to predict any quantile of data

Regression is a machine learning task where the goal is to predict a real value based on a set of feature vectors. There exist many regression algorithms: linear regression, gradient boosting, neural networks and others. During training, each of these algorithms adjusts the weights of a model based on the loss function used for optimization.

The choice of a loss function depends on the task at hand and the particular metric to be optimized. Many loss functions (like MSE, MAE, RMSLE, etc.) focus on predicting the expected value of a variable given a feature vector.

In this article, we will take a look at a special loss function called **quantile loss**, which is used to predict particular quantiles of a variable. Before diving into the details of quantile loss, let us briefly review the notion of a quantile.

A quantile *qₐ* is a value that divides a given set of numbers in such a way that *α · 100%* of the numbers are lower than the value and *(1 − α) · 100%* of the numbers are greater than it.

Quantiles *qₐ* for *α = 0.25*, *α = 0.5* and *α = 0.75* are often used in statistics and are called **quartiles**. They are denoted *Q₁*, *Q₂* and *Q₃* respectively, and together the three quartiles split the data into 4 equal parts.

Similarly, there are **percentiles** *pₐ*, which divide a given set of numbers into 100 equal parts. A percentile is denoted *pₐ*, where *α* is the percentage of numbers lower than the corresponding value. Quartiles *Q₁*, *Q₂* and *Q₃* correspond to percentiles *p₂₅*, *p₅₀* and *p₇₅* respectively.

In the example below, all three quartiles are found for a given set of numbers.
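As a quick sketch of the same computation, the quartiles of an arbitrary set of numbers (the article's original numbers are not shown, so these are made up) can be found with NumPy:

```python
import numpy as np

# An arbitrary set of numbers, used only for illustration
data = np.array([3, 7, 8, 5, 12, 14, 21, 13, 18])

# Q1, Q2 (the median) and Q3 split the sorted data into 4 equal parts
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])
print(q1, q2, q3)  # → 7.0 12.0 14.0
```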

Machine learning algorithms aiming to predict a particular quantile of a variable use quantile loss as the loss function. Before going to the formulation, let us consider a simple example.

Imagine a problem where the goal is to predict the 75th percentile of a variable. In fact, this statement is equivalent to saying that prediction errors should be negative in 75% of cases and positive in the other 25%. That is the intuition behind quantile loss.

## Formulation

The quantile loss formula is shown below. The *α* parameter refers to the quantile that needs to be predicted.
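Since the original illustration is not reproduced here, the formula can be written out explicitly; this is the standard pinball-loss form of the quantile loss for a true value *y* and a prediction *ŷ*:

```latex
L_\alpha(y, \hat{y}) =
\begin{cases}
\alpha \, (y - \hat{y}), & y \ge \hat{y} \\
(1 - \alpha) \, (\hat{y} - y), & y < \hat{y}
\end{cases}
```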

The value of the quantile loss depends on whether a prediction is less than or greater than the true value. To better understand the logic behind it, suppose our objective is to predict the 80th quantile, so the value *α* = 0.8 is plugged into the equations. As a result, the formula looks like this:
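Substituting *α* = 0.8 into the two branches (reconstructed here, since the original rendered formula is an image) gives:

```latex
L_{0.8}(y, \hat{y}) =
\begin{cases}
0.8 \, (y - \hat{y}), & y \ge \hat{y} \\
0.2 \, (\hat{y} - y), & y < \hat{y}
\end{cases}
```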

Basically, in such a case, the quantile loss penalizes under-estimated predictions 4 times more than over-estimated ones. This way the model is more critical of under-estimation errors and tends to predict higher values. As a result, the fitted model will over-estimate results in roughly 80% of cases and produce under-estimates in the remaining 20%.

Now assume that two predictions for the same target were obtained. The target has a value of 40, while the predictions are 30 and 50. Let us calculate the quantile loss in each case. Despite the fact that the absolute error of 10 is the same in both cases, the loss values differ:

- for 30, the loss value is *l = 0.8 · 10 = 8*
- for 50, the loss value is *l = 0.2 · 10 = 2*.
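This calculation can be sketched as a small Python function, a direct translation of the quantile loss definition (not the article's original code):

```python
def quantile_loss(y_true: float, y_pred: float, alpha: float) -> float:
    """Pinball loss: under-estimates are scaled by alpha,
    over-estimates by (1 - alpha)."""
    error = y_true - y_pred
    return alpha * error if error >= 0 else (alpha - 1) * error

# The worked example above: true value 40, alpha = 0.8
print(quantile_loss(40, 30, alpha=0.8))  # under-estimate: 0.8 * 10 = 8.0
print(quantile_loss(40, 50, alpha=0.8))  # over-estimate: 0.2 * 10 ≈ 2.0
```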

This loss function is illustrated in the diagram below, which shows loss values for various values of *α* when the true value is 40.
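A diagram like this can be reproduced with a short matplotlib sketch (hypothetical plotting code, not the article's original snippet):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

y_true = 40
y_pred = np.linspace(0, 80, 200)
error = y_true - y_pred

fig, ax = plt.subplots()
for alpha in (0.2, 0.5, 0.8):
    # Vectorized pinball loss for a range of predictions
    loss = np.where(error >= 0, alpha * error, (alpha - 1) * error)
    ax.plot(y_pred, loss, label=f"alpha = {alpha}")
ax.set_xlabel("prediction")
ax.set_ylabel("quantile loss (true value = 40)")
ax.legend()
fig.savefig("quantile_loss.png")
```

Each curve is V-shaped with its minimum at the true value; the asymmetry of the two slopes is governed by *α*.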

Conversely, if the value of *α* were 0.2, then over-estimated predictions would be penalized 4 times more than under-estimated ones.

The problem of predicting a certain quantile of a variable is known as **quantile regression**.

Let us create a synthetic dataset with 10,000 samples where the ratings of players in a video game are estimated based on the number of hours played.
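The article's original generation snippet is not included; below is one possible way to build such a dataset. The exact generating process (coefficients, noise shape) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical process: rating grows with hours played, and the noise
# spread also grows with hours (so the quantiles fan out)
hours = rng.uniform(0, 100, size=n)
rating = 10 + 2.5 * hours + rng.normal(0, 1 + 0.1 * hours, size=n)

X = hours.reshape(-1, 1)  # feature matrix with a single column
y = rating
```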

Let us split the data into train and test sets in an 80:20 proportion:
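A sketch of the split using scikit-learn's `train_test_split`; the placeholder arrays here stand in for the hours/ratings data described above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the hours/ratings dataset
X = np.arange(10_000, dtype=float).reshape(-1, 1)
y = np.arange(10_000, dtype=float)

# test_size=0.2 gives the 80:20 proportion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # → (8000, 1) (2000, 1)
```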

For comparison, let us build 3 regression models with different *α* values: 0.2, 0.5 and 0.8. Each of the regression models will be created with LightGBM, a library with an efficient implementation of gradient boosting.

According to the official documentation, LightGBM allows solving quantile regression problems by setting the **objective** parameter to *‘quantile’* and passing a corresponding value of **alpha**.

After training the 3 models, they can be used to obtain predictions.

Let us visualize the predictions with the code snippet below:
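The original snippet is not included; here is a minimal matplotlib sketch along the same lines. The true conditional quantiles of the assumed generating process stand in for the three models' predictions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripting
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data and predictions (stand-ins for the fitted models)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 100, size=500)
rating = 10 + 2.5 * hours + rng.normal(0, 1 + 0.1 * hours)
z = {0.2: -0.8416, 0.5: 0.0, 0.8: 0.8416}  # standard normal quantiles

fig, ax = plt.subplots()
ax.scatter(hours, rating, s=4, color="gray", label="true ratings")
for alpha, z_a in z.items():
    # Conditional alpha-quantile of the assumed noise model
    ax.scatter(hours, 10 + 2.5 * hours + z_a * (1 + 0.1 * hours),
               s=4, label=f"alpha = {alpha}")
ax.set_xlabel("hours played")
ax.set_ylabel("rating")
ax.legend()
fig.savefig("quantile_predictions.png")
```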

From the scatter plot above, it is clear that with greater values of *α*, models tend to generate more over-estimated results. Moreover, let us compare the predictions of each model with all target values.

This results in the following output:

The pattern in the output is clearly visible: for any *α*, predicted values are greater than true values in roughly *α · 100%* of cases. Therefore, we can experimentally conclude that our prediction models work correctly.
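The experimental check can be sketched as follows; here an exact normal quantile stands in for a fitted model's prediction, so the fractions come out close to *α* by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=10_000)

# A perfect alpha-quantile predictor over-estimates alpha * 100% of targets
for alpha, z_a in [(0.2, -0.8416), (0.5, 0.0), (0.8, 0.8416)]:
    frac = np.mean(z_a > y_true)
    print(f"alpha = {alpha}: prediction > target in {frac:.1%} of cases")
```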

Prediction errors of quantile regression models are negative in roughly *α · 100%* of cases and positive in *(1 − α) · 100%* of cases.

We have covered quantile loss, a flexible loss function that can be incorporated into any regression model to predict a certain quantile of a variable. Based on the LightGBM example, we saw how to adjust a model so that it solves a quantile regression problem. In fact, many other popular machine learning libraries also allow setting quantile loss as the loss function.

The code used in this article is available here:

*All images unless otherwise noted are by the author.*