A real-world case study of performance optimization in NumPy
This will be a relatively short article. In it, I'll use a real-world scenario as an example to explain how to use Numexpr expressions on multidimensional NumPy arrays to achieve substantial performance improvements.
There aren't many articles explaining how to use Numexpr expressions on multidimensional NumPy arrays, so I hope this one will help you.
Recently, while reviewing some of my old work, I stumbled upon this piece of code:
def predict(X, w, b):
    z = np.dot(X, w)
    y_hat = sigmoid(z)
    y_pred = np.zeros((y_hat.shape[0], 1))

    for i in range(y_hat.shape[0]):
        if y_hat[i, 0] < 0.5:
            y_pred[i, 0] = 0
        else:
            y_pred[i, 0] = 1
    return y_pred
This code converts the predicted probabilities of a logistic regression model into classification results of 0 or 1.
But heavens, who would use a for loop to iterate over a NumPy ndarray?
You can foresee that once the data reaches a certain size, this code will not only occupy a lot of memory, but its performance will also be poor.
That's right, the person who wrote this code was me, when I was younger.
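To make the problem concrete, here is a minimal sketch that isolates the thresholding step and compares the loop with a single vectorized NumPy call. The function names `predict_loop` and `predict_vectorized` and the random test data are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np


def predict_loop(y_hat):
    """Threshold one element at a time, as the original snippet does."""
    y_pred = np.zeros((y_hat.shape[0], 1))
    for i in range(y_hat.shape[0]):
        y_pred[i, 0] = 0 if y_hat[i, 0] < 0.5 else 1
    return y_pred


def predict_vectorized(y_hat):
    """Apply the same threshold to the whole array in one operation."""
    return np.where(y_hat < 0.5, 0.0, 1.0)


# Hypothetical probabilities standing in for the sigmoid output.
rng = np.random.default_rng(0)
y_hat = rng.random((1000, 1))

# Both versions should produce identical labels.
same = np.array_equal(predict_loop(y_hat), predict_vectorized(y_hat))
```

Timing the two functions (for example with `timeit`) on a larger array is an easy way to see how much the Python-level loop costs.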
With a sense of responsibility, I plan to rewrite this code with the Numexpr library today.
Along the way, I'll show you how to use Numexpr and Numexpr's where expression on multidimensional NumPy arrays to achieve significant performance improvements.
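As a quick preview, the thresholding step might look roughly like this with Numexpr's where expression. This is only a sketch under my own assumptions: the function name `predict_labels`, the variable name `p` inside the expression, and the sample array are all made up for illustration, and the `numexpr` package needs to be installed.

```python
import numpy as np
import numexpr as ne


def predict_labels(y_hat):
    """Map probabilities to 0.0 / 1.0 labels with a Numexpr where expression."""
    # The name `p` is passed explicitly through local_dict so the expression
    # does not depend on variable lookup in the calling frame.
    return ne.evaluate("where(p < 0.5, 0.0, 1.0)", local_dict={"p": y_hat})


# Hypothetical probabilities in an (n, 1) column, like the sigmoid output.
y_hat = np.array([[0.1], [0.5], [0.9]])
y_pred = predict_labels(y_hat)
```

The result keeps the two-dimensional shape of the input, so it can be used wherever the original `y_pred` was expected.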
If you aren't familiar with the basic usage of Numexpr, you can refer to this article: