NumPy
First, we set up the same example from above:
import numpy as np

# Binary Classification
samples = np.array([0.22, 0.64, 0.92, 0.42, 0.51, 0.15, 0.70, 0.37, 0.83])
true_labels = np.array([0,1,0,0,0,1,1,0,1])
We then define the ECE function as follows:
def expected_calibration_error(samples, true_labels, M=3):
    # uniform binning approach with M number of bins
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary class predictions from confidences
    predicted_label = (samples > 0.5).astype(float)

    # get a boolean list of correct/false predictions
    accuracies = predicted_label == true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        # determine if sample is in bin m (between bin lower & upper)
        in_bin = np.logical_and(confidences > bin_lower.item(),
                                confidences <= bin_upper.item())
        # calculate the empirical probability of a sample falling into bin m: (|Bm|/n)
        prop_in_bin = in_bin.astype(float).mean()
        if prop_in_bin.item() > 0:
            # get the accuracy of bin m: acc(Bm)
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            # get the average confidence of bin m: conf(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()
            # calculate |acc(Bm) - conf(Bm)| * (|Bm|/n) and add it to the total ECE
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece
Calling the function on the binary example returns the same value we calculated above: 0.23778 (rounded).
expected_calibration_error(samples, true_labels)
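To see where this number comes from, here is a quick sanity-check sketch (my own addition, reusing the same uniform binning as the function) that prints each bin's accuracy, confidence and contribution:

# optional sanity check: per-bin terms for the binary example (M=3)
bin_boundaries = np.linspace(0, 1, 4)
for lo, hi in zip(bin_boundaries[:-1], bin_boundaries[1:]):
    in_bin = (samples > lo) & (samples <= hi)
    if in_bin.any():
        acc = ((samples > 0.5).astype(float) == true_labels)[in_bin].mean()
        conf = samples[in_bin].mean()
        print(f"bin ({lo:.2f}, {hi:.2f}]: acc={acc:.3f}, conf={conf:.3f}, "
              f"term={abs(acc - conf) * in_bin.mean():.4f}")

The three terms (approximately 0.0700, 0.1178 and 0.0500) sum to 0.23778.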
You should now know how to calculate ECE for binary classification, both by hand and using NumPy.
In addition to the binary example, we can add the option for multi-class classification with a few lines of extra code. Let’s use James D. McCaffrey’s example. This gives us 5 target classes and the associated sample confidences. We really only need the target indices for our calculation: [0,1,2,3,4] and can, with regard to ECE, ignore the labels they correspond to. Looking at sample i=1, we can see that instead of only one estimated probability we now have an estimate associated with each class: [0.25,0.2,0.22,0.18,0.15].
# Multi-class Classification
samples_multi = np.array([[0.25,0.2,0.22,0.18,0.15],
[0.16,0.06,0.5,0.07,0.21],
[0.06,0.03,0.8,0.07,0.04],
[0.02,0.03,0.01,0.04,0.9],
[0.4,0.15,0.16,0.14,0.15],
[0.15,0.28,0.18,0.17,0.22],
[0.07,0.8,0.03,0.06,0.04],
[0.1,0.05,0.03,0.75,0.07],
[0.25,0.22,0.05,0.3,0.18],
[0.12,0.09,0.02,0.17,0.6]])

true_labels_multi = np.array([0,2,3,4,2,0,1,3,3,2])
We now need to change the ‘confidences’ variable in our code to take the maximum value, as that one will now determine the predicted label. For sample i=1 the maximum estimated probability is 0.25.
if binary:
    # keep confidences / predicted "probabilities" as they are
    confidences = samples
    # get binary predictions from confidences
    predicted_label = (samples > 0.5).astype(float)
else:
    # get max probability per sample i
    confidences = np.max(samples, axis=1)
    # get predictions from confidences (positional in this case)
    predicted_label = np.argmax(samples, axis=1).astype(float)
To get the predicted label, we then change the ‘predicted_label’ variable to take the argmax over the samples, which for i=1 gives us index 0, corresponding to the label ‘democrat’.
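Putting the pieces together, here is a minimal sketch of the combined function (the binary flag and its default are my own naming choice; the notebook may organize this differently):

def expected_calibration_error(samples, true_labels, M=3, binary=True):
    # uniform binning approach with M number of bins
    bin_boundaries = np.linspace(0, 1, M + 1)
    bin_lowers = bin_boundaries[:-1]
    bin_uppers = bin_boundaries[1:]

    if binary:
        # keep confidences / predicted "probabilities" as they are
        confidences = samples
        # get binary predictions from confidences
        predicted_label = (samples > 0.5).astype(float)
    else:
        # get max probability per sample i
        confidences = np.max(samples, axis=1)
        # get predictions from confidences (positional in this case)
        predicted_label = np.argmax(samples, axis=1).astype(float)

    # get a boolean list of correct/false predictions
    accuracies = predicted_label == true_labels

    ece = np.zeros(1)
    for bin_lower, bin_upper in zip(bin_lowers, bin_uppers):
        # determine if sample is in bin m (between bin lower & upper)
        in_bin = np.logical_and(confidences > bin_lower.item(),
                                confidences <= bin_upper.item())
        # empirical probability of a sample falling into bin m: (|Bm|/n)
        prop_in_bin = in_bin.astype(float).mean()
        if prop_in_bin.item() > 0:
            # accuracy of bin m: acc(Bm)
            accuracy_in_bin = accuracies[in_bin].astype(float).mean()
            # average confidence of bin m: conf(Bm)
            avg_confidence_in_bin = confidences[in_bin].mean()
            # |acc(Bm) - conf(Bm)| * (|Bm|/n), added to the total ECE
            ece += np.abs(avg_confidence_in_bin - accuracy_in_bin) * prop_in_bin
    return ece

expected_calibration_error(samples_multi, true_labels_multi, binary=False)

With M=3 bins, this should come out to roughly 0.192 (rounded) for the multi-class example.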
Give the Google Colab Notebook a go and try it out for yourself in NumPy or PyTorch (see below).
Now you can also calculate ECE for multi-class classification 🙂