Ridge regression is a model tuning method used to analyse data that suffers from multicollinearity. It performs L2 regularization. When multicollinearity occurs, least-squares estimates are unbiased, but their variances are large, so the predicted values may be far from the actual values.

The cost function for ridge regression is:

Min(||Y – XB||² + λ||B||²)

Lambda (λ) is the penalty term. The λ given here is denoted by the alpha parameter in the ridge function, so by changing the value of alpha we control the penalty term. The higher the value of alpha, the larger the penalty, and therefore the magnitude of the coefficients is reduced.
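This shrinking effect can be illustrated with a small sketch (on synthetic data, not the food dataset used later in this article): fit scikit-learn's `Ridge` at several alpha values and compare the overall magnitude of the coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.5, size=100)

# As alpha grows, the L2 penalty dominates and the coefficients shrink toward zero
norms = []
for alpha in [0.01, 1, 100]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha}: ||coef|| = {norms[-1]:.3f}")
```

The printed coefficient norms decrease as alpha increases, which is exactly the behaviour the penalty term is designed to produce.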

- It shrinks the parameters; therefore, it is used to prevent multicollinearity
- It reduces the model complexity by coefficient shrinkage

**Ridge Regression Models**

For any type of regression machine learning model, the standard regression equation forms the base, which is written as:

Y = XB + e

Where Y is the dependent variable, X represents the independent variables, B represents the regression coefficients to be estimated, and e represents the errors, or residuals.

When we add the lambda term to this equation, the variance that is not evaluated by the general model is taken into account. Once the data is prepared and identified as suitable for L2 regularization, there are steps that one can undertake.
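Concretely, adding the penalty replaces the ordinary least-squares solution B = (XᵀX)⁻¹XᵀY with the ridge solution B = (XᵀX + λI)⁻¹XᵀY. A minimal NumPy sketch (on synthetic data) shows that this closed form matches scikit-learn's `Ridge` when no intercept is fitted:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

lam = 1.0
# Closed-form ridge solution: B = (X'X + lambda*I)^(-1) X'y
B = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge (fit_intercept=False so both solve the same problem)
sk = Ridge(alpha=lam, fit_intercept=False).fit(X, y)
print(np.allclose(B, sk.coef_, atol=1e-6))
```

Note that scikit-learn does not penalize the intercept, which is why `fit_intercept=False` is needed for the two solutions to coincide exactly.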

**Standardization**

In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations. This creates a challenge in notation, since we must somehow indicate whether the variables in a particular formula are standardized or not. As far as standardization is concerned, all ridge regression calculations are based on standardized variables. When the final regression coefficients are displayed, they are adjusted back to their original scale. However, the ridge trace is on a standardized scale.
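The back-adjustment of coefficients can be sketched as follows (synthetic one-feature data, purely illustrative): if b_std is a slope fitted on standardized x and y, the original-scale slope is b_std × sd(y) / sd(x).

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
x = rng.normal(loc=10, scale=3, size=200)
y = 4.0 * x + rng.normal(scale=1.0, size=200)  # true slope is 4.0

# Standardize both variables (subtract mean, divide by standard deviation)
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()

# Fit ridge on the standardized data (tiny alpha, so it is close to OLS)
b_std = Ridge(alpha=1e-6).fit(xs.reshape(-1, 1), ys).coef_[0]

# Adjust the coefficient back to the original scale
b_orig = b_std * y.std() / x.std()
print(round(b_orig, 2))  # close to the true slope of 4.0
```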

Also Read: Support Vector Regression in Machine Learning

**Bias and variance trade-off**

The bias and variance trade-off is generally complicated when it comes to building ridge regression models on an actual dataset. However, the general trend to remember is:

- The bias increases as λ increases.
- The variance decreases as λ increases.
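This trend can be checked with a small simulation (a sketch on synthetic data): fit ridge on many resampled datasets and compare the average error and the spread of the coefficient estimate at a small and a large λ.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
true_coef = 2.0

def estimates(alpha, n_rep=300, n=30):
    """Ridge coefficient estimates over many simulated datasets."""
    out = []
    for _ in range(n_rep):
        x = rng.normal(size=(n, 1))
        y = true_coef * x[:, 0] + rng.normal(size=n)
        out.append(Ridge(alpha=alpha).fit(x, y).coef_[0])
    return np.array(out)

small, large = estimates(alpha=0.1), estimates(alpha=50.0)

# Bias: distance of the average estimate from the true coefficient
bias_small = abs(small.mean() - true_coef)
bias_large = abs(large.mean() - true_coef)
# Variance: spread of the estimates across simulated datasets
var_small, var_large = small.var(), large.var()
print(f"bias: {bias_small:.3f} -> {bias_large:.3f} (increases with lambda)")
print(f"variance: {var_small:.4f} -> {var_large:.4f} (decreases with lambda)")
```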

**Assumptions of Ridge Regressions**

The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, as ridge regression does not provide confidence limits, the errors need not be assumed to be normally distributed.

Now, let's take an example of a linear regression problem and see how ridge regression, if implemented, helps us to reduce the error.

We will consider a data set on food restaurants trying to find the best combination of food items to improve their sales in a particular region.

**Import Required Libraries**

```
import numpy as np
import pandas as pd
import os
import seaborn as sns
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import matplotlib.style
plt.style.use('classic')
import warnings
warnings.filterwarnings("ignore")
df = pd.read_excel("food.xlsx")
```

After performing EDA on the data and treating the missing values, we can now go ahead with creating dummy variables, as we cannot have categorical variables in the dataset.

```
df = pd.get_dummies(df, columns=cat, drop_first=True)
```

Here, cat is the list of all the categorical variables in the data set.

After this, we need to standardize the data set for the linear regression method.

**Scaling the variables as continuous variables have different weightage**

```
# Scale the data; essentially returns the z-scores of each attribute
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
df['week'] = std_scale.fit_transform(df[['week']])
df['final_price'] = std_scale.fit_transform(df[['final_price']])
df['area_range'] = std_scale.fit_transform(df[['area_range']])
```

**Train-Test Split**

```
# Copy all the predictor variables into the X dataframe
X = df.drop('orders', axis=1)
# Copy the target into the y dataframe. The target variable is log-transformed.
y = np.log(df[['orders']])
# Split X and y into training and test sets in a 75:25 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25 , random_state=1)
```

**Linear Regression Model**

Also Read: What’s Linear Regression?

```
# Invoke the LinearRegression function and find the best-fit model on the training data
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)
# Let us explore the coefficients for each of the independent attributes
for idx, col_name in enumerate(X_train.columns):
    print("The coefficient for {} is {}".format(col_name, regression_model.coef_[0][idx]))
```

Output:

```
The coefficient for week is -0.0041068045722690814
The coefficient for final_price is -0.40354286519747384
The coefficient for area_range is 0.16906454326841025
The coefficient for website_homepage_mention_1.0 is 0.44689072858872664
The coefficient for food_category_Biryani is -0.10369818094671146
The coefficient for food_category_Desert is 0.5722054451619581
The coefficient for food_category_Extras is -0.22769824296095417
The coefficient for food_category_Other Snacks is -0.44682163212660775
The coefficient for food_category_Pasta is -0.7352610382529601
The coefficient for food_category_Pizza is 0.499963614474803
The coefficient for food_category_Rice Bowl is 1.640603292571774
The coefficient for food_category_Salad is 0.22723622749570868
The coefficient for food_category_Sandwich is 0.3733070983152591
The coefficient for food_category_Seafood is -0.07845778484039663
The coefficient for food_category_Soup is -1.0586633401722432
The coefficient for food_category_Starters is -0.3782239478810047
The coefficient for cuisine_Indian is -1.1335822602848094
The coefficient for cuisine_Italian is -0.03927567006223066
The coefficient for center_type_Gurgaon is -0.16528108967295807
The coefficient for center_type_Noida is 0.0501474731039986
The coefficient for home_delivery_1.0 is 1.026400462237632
The coefficient for night_service_1 is 0.0038398863634691582
```

```
# Checking the magnitude of the coefficients
from pandas import Series, DataFrame
predictors = X_train.columns
coef = Series(regression_model.coef_.flatten(), predictors).sort_values()
plt.figure(figsize=(10,8))
coef.plot(kind='bar', title="Model Coefficients")
plt.show()
```

Variables showing a positive effect on the regression model are food_category_Rice Bowl, home_delivery_1.0, food_category_Desert, food_category_Pizza, website_homepage_mention_1.0, food_category_Sandwich, food_category_Salad, and area_range; these factors highly influence our model.

The higher the value of the beta coefficient, the higher the impact.

Dishes like Rice Bowl, Pizza, and Desert, together with facilities like home delivery and website_homepage_mention, play a crucial role in the demand, i.e., the number of orders being placed with high frequency.

Variables showing a negative effect on the regression model for predicting restaurant orders are cuisine_Indian, food_category_Soup, food_category_Pasta, and food_category_Other_Snacks.

Final_price has a negative effect on the number of orders, as expected.

Dishes like Soup, Pasta, other_snacks, and the Indian food category have a negative effect on the predicted number of orders placed at restaurants, keeping all other predictors constant.

Some variables that hardly affect the model's prediction of order frequency are week and night_service.

Through the model, we can see that object-type (categorical) variables are more significant than continuous variables.

Also Read: Introduction to Regular Expression in Python

**Regularization**

- Alpha is a hyperparameter of Ridge, which means it is not automatically learned by the model; instead, it has to be set manually. We run a grid search for optimum alpha values
- To find the optimum alpha for ridge regularization, we apply GridSearchCV

```
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
ridge = Ridge()
parameters = {'alpha': [1e-15, 1e-10, 1e-8, 1e-3, 1e-2, 1, 5, 10, 20, 30, 35, 40, 45, 50, 55, 100]}
ridge_regressor = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)
ridge_regressor.fit(X, y)
print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)
```

Output:

```
{'alpha': 0.01}
-0.3751867421112124
```

The negative sign is not an error: with scoring='neg_mean_squared_error', GridSearchCV reports the negated mean squared error, because scikit-learn's model selection tools always maximize a score and "higher is better" must hold for error metrics too. The best mean squared error here is therefore about 0.375.
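A small sketch (on synthetic data) shows this convention in action: the scores come back negative, and negating their mean recovers the actual MSE.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=100)

# scoring='neg_mean_squared_error' returns the negative of the MSE,
# so that "higher is better" holds for GridSearchCV / cross_val_score
scores = cross_val_score(Ridge(alpha=1.0), X, y, scoring='neg_mean_squared_error', cv=5)
mse = -scores.mean()  # negate to recover the actual mean squared error
print(round(mse, 3))
```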

```
# Fit a ridge model using the best alpha found by the grid search
ridgeReg = Ridge(alpha=0.01)
ridgeReg.fit(X_train, y_train)
predictors = X_train.columns
coef = Series(ridgeReg.coef_.flatten(), predictors).sort_values()
plt.figure(figsize=(10,8))
coef.plot(kind='bar', title="Model Coefficients")
plt.show()
```

From the above analysis, we can decide that the final model can be defined as:

Orders = 4.65 + 1.02 * home_delivery_1.0 + 0.46 * website_homepage_mention_1.0 + (-0.40 * final_price) + 0.17 * area_range + 0.57 * food_category_Desert + (-0.22 * food_category_Extras) + (-0.73 * food_category_Pasta) + 0.49 * food_category_Pizza + 1.6 * food_category_Rice_Bowl + 0.22 * food_category_Salad + 0.37 * food_category_Sandwich + (-1.05 * food_category_Soup) + (-0.37 * food_category_Starters) + (-1.13 * cuisine_Indian) + (-0.16 * center_type_Gurgaon)

Top 5 variables influencing regression model are:

- food_category_Rice Bowl
- home_delivery_1.0
- food_category_Pizza
- food_category_Desert
- website_homepage_mention_1

The higher the beta coefficient, the more significant the predictor. Hence, with a certain level of model tuning, we can find the best variables that influence a business problem.

If you found this blog helpful and want to learn more about such concepts, you can join Great Learning Academy's free online courses today.