2.1. Support Vector Machines and Iris Data Set
In a previous post I used Grid Search, Random Search and Bayesian Optimization for hyperparameter optimization using the Iris data set provided by scikit-learn. The Iris data set contains petal and sepal measurements for three iris species and is commonly used for classification exercises. In this post, we are going to use the same data set, but with a Support Vector Machine (SVM) as the model and two parameters to optimize:
C: Regularization parameter, which trades off misclassification of training examples against simplicity of the decision surface.
gamma: Kernel coefficient, which defines how much influence a single training example has. The larger gamma is, the closer other examples have to be to be affected.
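To make the description of gamma concrete, here is a minimal sketch of the RBF kernel, which is the default kernel of scikit-learn's SVC. The helper rbf_kernel and the two points are hypothetical and only for illustration; they are not part of the tutorial's code.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF similarity exp(-gamma * ||x - z||^2) between two points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

# Two hypothetical points one unit apart (illustrative values, not Iris rows).
point_a = (0.0, 0.0)
point_b = (1.0, 0.0)

# Same distance, increasing gamma: the similarity shrinks.
for gamma in (0.01, 0.1, 1.0):
    similarity = rbf_kernel(point_a, point_b, gamma)
    print(f"gamma={gamma}: similarity={similarity:.3f}")
```

At a fixed distance the similarity drops as gamma grows, so with a large gamma only very close examples still influence each other.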
Because the goal of this exercise is to walk through hyperparameter optimization, I won’t go deeper into what SVMs do, but for those who are interested, I find this scikit-learn post helpful.
We’ll generally follow the same steps that we used in the simple example earlier, but will also visualize the process at the end:
1. Import necessary libraries and packages
2. Define the objective function and the search space
3. Run the optimization process
4. Visualize the optimization
2.1.1. Step 1 — Import Libraries and Packages
Let’s import the libraries and packages and then load the data set.
# Import libraries and packages
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from hyperopt import fmin, tpe, hp, Trials

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
2.1.2. Step 2 — Define Objective Function and Search Space
Let’s start by defining the objective function, which trains an SVM and returns the negative of the cross-validation score; that is the value we want to minimize. Note that we minimize the negative of the cross-validation score to be consistent with the general goal of “minimizing” the objective function (instead of “maximizing” the cross-validation score).
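def objective_function(parameters):
    # Train an SVM with the candidate hyperparameters
    clf = SVC(**parameters)
    # Mean accuracy across 5 cross-validation folds
    score = cross_val_score(clf, X, y, cv=5).mean()
    # Negate so that minimizing this value maximizes the score
    return -score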
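As a small illustration of why negating the score works, the sketch below compares minimizing a negated score with maximizing the score directly. The candidate names and accuracy values are made up for this example and are not results from the Iris experiment.

```python
# Hypothetical mean cross-validation accuracies for three candidate settings
# (made-up numbers for illustration only).
candidate_scores = {"setting_a": 0.90, "setting_b": 0.95, "setting_c": 0.93}

def toy_objective(name):
    """Return the negated score, mirroring the sign flip in objective_function."""
    return -candidate_scores[name]

# Minimizing the negated score and maximizing the score pick the same candidate.
best_by_minimizing = min(candidate_scores, key=toy_objective)
best_by_maximizing = max(candidate_scores, key=candidate_scores.get)
print(best_by_minimizing, best_by_maximizing)
```

Both selections agree, which is why a minimizer can be used without modification once the score is negated.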
Next we are going to define the search space, which consists of the values that our parameters C and gamma can take. Note that we’ll use Hyperopt’s hp.uniform(label, low, high), which returns a value uniformly between “low” and “high” (source).
# Search Space
search_space = {
    'C': hp.uniform('C', 0.1, 10),
    'gamma': hp.uniform('gamma', 0.01, 1)
}
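To give a feel for what candidates drawn from this space look like, here is a rough sketch that uses Python's random.uniform as a stand-in for uniform sampling. It is not Hyperopt's sampler, and the names bounds and sample_parameters are hypothetical; only the bounds are taken from the search space above.

```python
import random

# Same bounds as the search space above; random.uniform is a stand-in here,
# not Hyperopt's sampler.
bounds = {"C": (0.1, 10), "gamma": (0.01, 1)}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_parameters():
    """Draw one candidate dict with each value uniform in [low, high]."""
    return {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}

samples = [sample_parameters() for _ in range(5)]
for params in samples:
    print(params)
```

Each sample is a dictionary with the keys C and gamma, which is the shape of argument that objective_function unpacks into SVC(**parameters).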
2.1.3. Run Optimization
As in the simple example earlier, we will use the TPE algorithm and store the results in a Trials object.
# Trials object to store the results
trials = Trials()

# Run optimization
best = fmin(fn=objective_function, space=search_space, algo=tpe.suggest, trials=trials, max_evals=100)
Results: