
Ensemble Learning with Scikit-Learn: A Friendly Introduction

Ensemble learning algorithms like XGBoost or Random Forests are among the top-performing models in Kaggle competitions. How do they work?

Basic learning algorithms such as logistic regression or linear regression are often too simple to achieve adequate results on a machine learning problem. A possible alternative is to use neural networks, but they require an enormous amount of training data, which is rarely available. Ensemble learning techniques can boost the performance of simple models even with a limited amount of data.

Imagine asking a person to guess how many jellybeans there are inside a giant jar. One person's answer is unlikely to be a precise estimate of the correct number. If instead we ask a thousand people the same question, the average answer will likely be close to the actual number. This phenomenon is known as the wisdom of the crowd [1]: when dealing with complex estimation tasks, the crowd can be considerably more precise than an individual.

Ensemble learning algorithms take advantage of this simple principle by aggregating the predictions of a group of models, such as regressors or classifiers. For an aggregation of classifiers, the ensemble model can simply pick the most common class among the predictions of the low-level classifiers. For a regression task, the ensemble can instead use the mean or the median of all the predictions.
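
To make this concrete, here is a minimal sketch of majority-vote aggregation using scikit-learn's VotingClassifier. The synthetic dataset and the three base classifiers are illustrative choices, not prescriptions:

```python
# A minimal sketch of majority-vote ("hard" voting) aggregation.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# With voting="hard", the ensemble predicts the most common class
# among the predictions of its base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(f"Ensemble accuracy: {ensemble.score(X_test, y_test):.3f}")
```

Setting voting="soft" would instead average the predicted class probabilities, which often works better when the base classifiers are well calibrated.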

By aggregating a large number of weak learners, i.e. classifiers or regressors that are only slightly better than random guessing, we can achieve surprisingly good results. Consider a binary classification task: by aggregating 1,000 independent classifiers, each with an individual accuracy of 51%, we can create an ensemble reaching an accuracy of about 75% [2].
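
We can check this figure with a short computation. If the 1,000 classifiers were perfectly independent, the number of correct votes would follow a binomial distribution. The sketch below (an illustrative verification, counting a 500-500 tie in the ensemble's favor) recovers roughly the 75% figure:

```python
# Verify the 51% -> ~75% claim, assuming the 1,000 classifiers
# make fully independent errors.
from scipy.stats import binom

n_classifiers = 1000
p_correct = 0.51  # accuracy of each individual weak learner

# Probability that at least half of the classifiers vote correctly.
p_ensemble = 1 - binom.cdf(n_classifiers // 2 - 1, n_classifiers, p_correct)
print(f"Ensemble accuracy: {p_ensemble:.3f}")  # ~0.747
```

In practice the base models are never fully independent, so the real gain is smaller: ensembles work best when the individual models make different kinds of errors.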

This is the reason why ensemble algorithms are often the winning solutions in many machine learning competitions!

There are several techniques for building an ensemble learning algorithm; the main ones are bagging, boosting, and stacking (a quick preview of the corresponding scikit-learn classes is sketched below). In the following…
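
As a quick preview, scikit-learn ships an estimator class for each of these three techniques. The sketch below instantiates them with illustrative base models and hyperparameters, not the article's own choices:

```python
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Bagging: many copies of the same model, each trained on a bootstrap sample.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)

# Boosting: models trained sequentially, each one correcting its predecessors.
boosting = GradientBoostingClassifier(n_estimators=100)

# Stacking: a meta-model learns how to combine the base models' predictions.
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("lr", LogisticRegression())],
    final_estimator=LogisticRegression(),
)
```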
