A deep dive into scoring functions to be used in RandomizedSearchCV, GridSearchCV and cross_val_score

RandomizedSearchCV, GridSearchCV, and cross_val_score are all tools to optimize and evaluate machine learning models in scikit-learn. Each of those tools offers a scientific approach to hyper-parameter tuning and model performance assessment.
For a very long time, I used these tools out of the box without giving the scoring function a second thought. However, I eventually learned that, unless told otherwise, scikit-learn falls back on the model’s own built-in scoring function to gauge performance. That default scoring metric isn’t always appropriate, which can lead to misinformed decisions about the model.
The rest of this article covers how and when to use custom scoring functions in scikit-learn.
In this example, we develop a regressor for predicting future insurance claim costs, a task complicated by the inherent uncertainty in insurance data. That uncertainty stems from two sources.
- Once someone has purchased an insurance policy, there is no guarantee they will ever file a claim. This results in a high concentration of zeroes in the target.
- If someone does file a claim, the size of that claim could be large or small. This results in a large variance in our target variable.
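To make these two properties concrete, here is a small sketch that simulates a target with the same shape. Everything in it is an illustrative assumption rather than real insurance data: the 10% claim probability, the log-normal claim sizes, and the variable names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_policies = 10_000
claim_probability = 0.10  # hypothetical: about 1 in 10 policies files a claim

# Most policies never file a claim, so most targets are exactly zero.
has_claim = rng.random(n_policies) < claim_probability

# When a claim is filed, its size varies widely (hypothetical log-normal sizes).
claim_size = rng.lognormal(mean=7.0, sigma=1.5, size=n_policies)

y = np.where(has_claim, claim_size, 0.0)

print(f"share of zero targets: {np.mean(y == 0):.2f}")
print(f"mean claim cost: {y.mean():.0f}, standard deviation: {y.std():.0f}")
```

In a target like this the standard deviation is several times larger than the mean, which is the kind of distribution that metrics built around squared error tend to handle poorly.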
By default, RandomizedSearchCV, GridSearchCV, and cross_val_score use the default scoring metric of the classifier or regressor passed to them. For scikit-learn regressors that default is R² (the value returned by the estimator’s score method), and traditional metrics like R² and RMSE are the usual choices for evaluating regressors. However, using these metrics to tune parameters or evaluate model performance on insurance data will often lead to incorrect decisions and results.
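The default behaviour is easy to check for yourself. The sketch below uses a small synthetic regression problem and a Ridge regressor, both of which are stand-ins chosen for the example rather than anything from the insurance task. It compares cross_val_score with no scoring argument against the same call with scoring="r2", and then shows that a different scoring string produces a different set of numbers.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only to have something to fit.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

model = Ridge(alpha=1.0)
# A fixed random_state keeps the folds identical across the three calls.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# No scoring argument: falls back on model.score, which is R² for regressors.
default_scores = cross_val_score(model, X, y, cv=cv)

# The same metric, requested explicitly.
r2_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

# A different metric. scikit-learn negates error metrics so that higher is better.
rmse_scores = cross_val_score(
    model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
)

print("default matches R²:", np.allclose(default_scores, r2_scores))
print("negated RMSE per fold:", rmse_scores)
```

The point is that leaving scoring unset is itself a choice of metric. If R² is not the right yardstick for the problem, the tuning results will be optimized for the wrong thing.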
Consequently, when working with insurance data, we need to pass an appropriate scoring function to these tools so that we set the model’s parameters and evaluate its performance accurately. More generally, any time you use these tools you need to be certain that the scoring function being…