**Contributed by: Dinesh Kumar**

**Introduction**

In this blog, we will look at the techniques used to overcome overfitting in a lasso regression model. Regularization is one of the methods most widely used to make a model more generalized.

**What’s Lasso Regression?**

Lasso regression is a regularization technique. It is used over plain regression methods for a more accurate prediction. This model uses shrinkage, where data values are shrunk towards a central point, such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well-suited for models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, like variable selection/parameter elimination.

Lasso Regression uses the L1 regularization technique (discussed in more detail later in this article). It is used when we have a large number of features, since it automatically performs feature selection.
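As a small illustration of this automatic feature selection, here is a minimal sketch on synthetic data (the data, the `alpha` value, and the feature counts are all invented for the example):

```
# Minimal sketch: Lasso zeroes out the coefficients of irrelevant features.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # 10 candidate features
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The eight irrelevant coefficients are driven to exactly zero
print(np.count_nonzero(lasso.coef_))  # → 2
```

Only the two informative features keep non-zero coefficients; the rest are eliminated without any manual variable selection.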

**Lasso Meaning**

The word “LASSO” stands for **L**east **A**bsolute **S**hrinkage and **S**election **O**perator. It is a statistical formula for the regularization of data models and feature selection.

**Regularization**

Regularization is an important concept used to avoid overfitting of the data, especially when the training and test data vary widely.

Regularization is implemented by adding a “penalty” term to the best fit derived from the training data, to achieve a better fit on the test data. It also restricts the influence of predictor variables on the output variable by compressing their coefficients.

In regularization, we typically keep the same number of features but reduce the magnitude of the coefficients. We can do this by using different types of regression techniques that apply regularization to overcome this problem. So, let us discuss them.

**Lasso Regularization Techniques**

There are two main regularization techniques, namely Ridge Regression and Lasso Regression. They differ in the way they assign a penalty to the coefficients. In this blog, we will try to understand the Lasso Regularization technique in more detail.

**L1 Regularization**

If a regression model uses the L1 regularization technique, it is called Lasso Regression. If it uses the L2 regularization technique, it is called Ridge Regression. We will study these in more detail in the later sections.

L1 regularization adds a penalty equal to the absolute value of the magnitude of the coefficients. This type of regularization can result in sparse models with few coefficients: some coefficients can become zero and be eliminated from the model. Larger penalties result in coefficient values that are closer to zero (ideal for producing simpler models). L2 regularization, on the other hand, does not eliminate any coefficients or produce sparse models. Thus, Lasso Regression is easier to interpret compared to Ridge.
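To make the contrast concrete, here is a hedged sketch (synthetic data, an arbitrary penalty strength) comparing the coefficients produced by the two penalties:

```
# L1 vs L2: with one informative feature, Lasso produces exact zeros,
# while Ridge only shrinks the noise coefficients towards zero.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = 5 * X[:, 0] + rng.normal(scale=0.5, size=300)  # one informative feature

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso exact zeros:", int(np.sum(lasso.coef_ == 0)))  # most coefficients
print("Ridge exact zeros:", int(np.sum(ridge.coef_ == 0)))  # typically none
```

The L1 penalty eliminates the uninformative features outright, while the L2 penalty keeps all of them with small non-zero values.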

**Mathematical equation of Lasso Regression**

**Residual Sum of Squares + λ * (Sum of the absolute values of the coefficients)**

Where,

- λ denotes the amount of shrinkage.
- λ = 0 implies all features are considered, and the model is equivalent to linear regression, where only the residual sum of squares is used to build the predictive model.
- λ = ∞ implies no feature is considered, i.e. as λ approaches infinity it eliminates more and more features.
- The bias increases as λ increases.
- The variance increases as λ decreases.
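The bullets above can be sketched numerically: sweeping λ (called `alpha` in scikit-learn) over a synthetic problem shows features being eliminated as the penalty grows (the data and the alpha grid are invented for the example):

```
# As alpha (λ) grows, Lasso eliminates more and more features.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = np.array([4.0, -3.0, 2.0, 1.0, 0, 0, 0, 0, 0, 0])
y = X @ true_w + rng.normal(scale=0.2, size=200)

for alpha in [0.001, 0.1, 1.0, 5.0]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    print(f"alpha={alpha:<6} non-zero coefficients: {np.count_nonzero(model.coef_)}")
```

At a tiny alpha the fit is essentially plain least squares with all features kept; at a large enough alpha every coefficient is eliminated, exactly as the λ = 0 and λ = ∞ bullets describe.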

**Lasso Regression in Python**

For this example code, we will consider a dataset from MachineHack’s Predicting Restaurant Food Cost Hackathon.

**About the Data Set**

The task here is to predict the average cost of a meal. The data consists of the following features.

Size of training set: 12,690 records

Size of test set: 4,231 records

**Columns/Features**

**TITLE**: The feature of the restaurant that helps identify what it serves and for whom it is suitable.

**RESTAURANT_ID**: A unique ID for each restaurant.

**CUISINES**: The variety of cuisines that the restaurant offers.

**TIME**: The open hours of the restaurant.

**CITY**: The city in which the restaurant is located.

**LOCALITY**: The locality of the restaurant.

**RATING**: The average rating of the restaurant by customers.

**VOTES**: The overall votes received by the restaurant.

**COST**: The average cost of a two-person meal.

After completing all the steps up to (but excluding) Feature Scaling, we can proceed to building a Lasso regression. We skip a separate feature-scaling step because lasso regression historically came with a `normalize` parameter that normalized the data while fitting it to the model. Note that this parameter was removed in scikit-learn 1.2; in newer versions, the equivalent is a `StandardScaler` inside a pipeline.

**Lasso regression example**

```
import numpy as np
```

**Creating New Train and Validation Datasets**

```
from sklearn.model_selection import train_test_split
data_train, data_val = train_test_split(new_data_train, test_size = 0.2, random_state = 2)
```

**Classifying Predictors and Target**

```
#Classifying Independent and Dependent Features
#_______________________________________________
#Dependent Variable
Y_train = data_train.iloc[:, -1].values
#Independent Variables
X_train = data_train.iloc[:,0 : -1].values
#Independent Variables for Test Set
X_test = data_val.iloc[:,0 : -1].values
```

**Evaluating the Model With RMSLE**

```
def score(y_pred, y_true):
    # 1 - RMSLE (Root Mean Squared Log Error); higher is better
    error = np.square(np.log10(y_pred + 1) - np.log10(y_true + 1)).mean() ** 0.5
    return 1 - error

actual_cost = list(data_val['COST'])
actual_cost = np.asarray(actual_cost)
```

**Building the Lasso Regressor**

```
#Lasso Regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

#The normalize parameter was removed in scikit-learn 1.2, so we standardize
#the data inside a pipeline instead
lasso_reg = make_pipeline(StandardScaler(), Lasso())
#Fitting the training data to the Lasso regressor
lasso_reg.fit(X_train, Y_train)
#Predicting for X_test
y_pred_lass = lasso_reg.predict(X_test)
#Printing the score with RMSLE
print("\n\nLasso SCORE : ", score(y_pred_lass, actual_cost))
```

**Output**

**0.7335508027883148**

**The Lasso Regression attained a score of about 0.73 (1 − RMSLE) on the given dataset.**

**Lasso Regression in R**

Let us take “The Big Mart Sales” dataset, in which we have product-wise sales for multiple outlets of a chain.

In the dataset, we can see characteristics of the sold item (fat content, visibility, type, price) and some characteristics of the outlet (year of establishment, size, location, type), along with the number of items sold for that particular item. Let’s see if we can predict sales using these features.


**Ridge and Lasso Regression**

Lasso Regression differs from ridge regression in that it uses absolute coefficient values for normalization.

As the loss function only considers the absolute values of the coefficients (weights), the optimization algorithm will penalize high coefficients. This is known as the L1 norm.

Geometrically, the constraint region for lasso is a diamond, while for ridge it is a circle; around the least-squares solution lie the contours (ellipses) of the loss function, i.e. the RSS.

For both regression techniques, the coefficient estimates are given by the first point at which a contour (an ellipse) touches the constraint region (circle or diamond).

However, the lasso constraint, due to its diamond shape, has corners on the axes, so the ellipse will often touch the constraint region at an axis. As a consequence, at least one of the coefficients will equal zero.

Consequently, when λ is sufficiently large, lasso regression shrinks some of the coefficient estimates to exactly 0. That is the reason lasso provides sparse solutions.

The main problem with lasso regression is that when we have correlated variables, it retains only one variable and sets the other correlated variables to zero. That can result in some loss of information and lower accuracy in our model.

That was the Lasso Regularization technique, and I hope you now understand it better. You can use it to improve the accuracy of your machine learning models.

Difference Between Ridge Regression and Lasso Regression

| Ridge Regression | Lasso Regression |
|---|---|
| The penalty term is the sum of the squares of the coefficients (L2 regularization). | The penalty term is the sum of the absolute values of the coefficients (L1 regularization). |
| Shrinks the coefficients but does not set any coefficient to zero. | Can shrink some coefficients to exactly zero, effectively performing feature selection. |
| Helps to reduce overfitting by shrinking large coefficients. | Helps to reduce overfitting by shrinking and discarding features with less importance. |
| Works well when many features are relevant. | Works well when only a small number of features are relevant. |
| Shrinks all coefficients proportionally, without thresholding. | Performs “soft thresholding” of coefficients. |

In short, Ridge is a shrinkage model, and Lasso is a feature selection model. Ridge tries to balance the bias-variance trade-off by shrinking the coefficients, but it does not select features; it keeps all of them. Lasso balances the bias-variance trade-off by shrinking some coefficients all the way to zero. In this way, Lasso can be seen as an optimizer for feature selection.
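These shrinkage behaviours can be written in closed form for an orthonormal design: Ridge rescales every least-squares coefficient by 1/(1 + λ), while Lasso applies the soft-threshold operator sign(w) · max(|w| − λ, 0). A small sketch with arbitrary coefficients and λ:

```
import numpy as np

def ridge_shrink(w_ols, lam):
    # Ridge under an orthonormal design: proportional shrinkage, never exactly zero
    return w_ols / (1.0 + lam)

def lasso_soft_threshold(w_ols, lam):
    # Lasso under an orthonormal design: soft thresholding, small weights become 0
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

w = np.array([3.0, -0.4, 0.05, -2.0])
print(ridge_shrink(w, 0.5))           # all four entries shrunk, none exactly zero
print(lasso_soft_threshold(w, 0.5))   # the two small entries become exactly zero
```

This makes the table concrete: Ridge divides every coefficient by the same factor, while Lasso subtracts λ from each magnitude and clips at zero, which is why the small coefficients vanish entirely.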


**Interpretations and Generalizations**

**Interpretations**:

- Geometric Interpretations
- Bayesian Interpretations
- Convex relaxation interpretations
- Making λ easier to interpret with an accuracy-simplicity tradeoff

**Generalizations**

- Elastic Net
- Group Lasso
- Fused Lasso
- Adaptive Lasso
- Prior Lasso
- Quasi-norms and bridge regression
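As a brief sketch of the first generalization in the list: scikit-learn's `ElasticNet` mixes the L1 and L2 penalties via `l1_ratio`, which lets correlated features share weight instead of all but one being pruned (the data and parameter values here are invented for the example):

```
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.01, size=500)  # two highly correlated features
X = np.column_stack([x1, x2])
y = 2 * x1 + 2 * x2 + rng.normal(scale=0.1, size=500)

# l1_ratio=0.5: half L1 (sparsity), half L2 (grouping of correlated features)
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print(enet.coef_)  # weight is spread across both correlated features
```

Unlike plain Lasso, which would typically zero one of the two near-duplicate features, the L2 component keeps both coefficients non-zero and roughly equal.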

**What is Lasso regression used for?** Lasso regression is used for automatic variable selection and the elimination of unimportant features.

**What are lasso and ridge regression?** Lasso regression can shrink coefficients to exactly zero, while ridge regression is a model tuning method used for analyzing data affected by multicollinearity.

**What is Lasso Regression in machine learning?** Lasso Regression is a linear regression model regularized with the L1 norm; it shrinks coefficients, can set some of them exactly to zero, and thereby performs feature selection.

**Why does Lasso shrink coefficients to zero?** The L1 regularization performed by Lasso causes the regression coefficients of the less contributing variables to shrink to zero or near zero.

**Is lasso better than Ridge?** Lasso is sometimes considered better than ridge because it selects only some features and reduces the coefficients of the others to zero; which one performs better depends on the data.

**How does Lasso regression work?** Lasso regression uses shrinkage, where data values are shrunk towards a central point such as the mean value.

**What is the Lasso penalty?** The Lasso penalty shrinks the coefficient values towards zero. Less contributing variables are therefore allowed to have a zero or near-zero coefficient.

**Is lasso L1 or L2?** A regression model using the L1 regularization technique is called Lasso Regression, while a model using L2 is called Ridge Regression. The difference between the two is the penalty term.

**Is lasso supervised or unsupervised?** Lasso is a supervised regularization method used in machine learning.