A Guide to Linear Regression in Machine Learning – 2023
What’s Linear Regression?

Linear Regression is the most fundamental type of regression analysis. It assumes that there is a linear relationship between the dependent variable and the predictor(s). In regression, we attempt to calculate the best-fit line, which describes the relationship between the predictors and the predicted/dependent variable.

There are four assumptions associated with a linear regression model:

  1. Linearity: The relationship between the independent variables and the mean of the dependent variable is linear.
  2. Homoscedasticity: The variance of the residuals should be equal.
  3. Independence: Observations are independent of one another.
  4. Normality: The dependent variable is normally distributed for any fixed value of an independent variable.

Isn’t Linear Regression from Statistics?

Before we dive into the details of linear regression, you might be asking yourself why we are looking at this algorithm.

Isn’t it a technique from statistics? Machine learning, more specifically the field of predictive modeling, is primarily concerned with minimizing the error of a model, or making the most accurate predictions possible, at the expense of explainability. In applied machine learning, we borrow and reuse algorithms from many different fields, including statistics, and use them toward these ends.

As such, linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables. Nonetheless, it has been borrowed by machine learning, and it is both a statistical algorithm and a machine learning algorithm.

Linear Regression Model Representation

Linear regression is an attractive model because the representation is so simple.
The representation is a linear equation that combines a specific set of input values (x), the solution to which is the predicted output for that set of input values (y). As such, both the input values (x) and the output value (y) are numeric.

The linear equation assigns one scale factor to each input value or column, called a coefficient and represented by the Greek letter Beta (β). One additional coefficient is added, giving the line an extra degree of freedom (e.g., moving up and down on a two-dimensional plot); it is often called the intercept or the bias coefficient.

For instance, in a simple regression problem (a single x and a single y), the form of the model would be:
Y = β0 + β1x

In higher dimensions, when we have more than one input (x), the line is called a plane or a hyperplane. The representation, therefore, is the form of the equation together with the specific values used for the coefficients (e.g., β0 and β1 in the above example).

Performance of Regression

The regression model’s performance can be evaluated using various metrics like MAE, MAPE, RMSE, R-squared, etc.

Mean Absolute Error (MAE)

Using MAE, we calculate the average absolute difference between the actual values and the predicted values.

Mean Absolute Percentage Error (MAPE) 

MAPE is defined as the average absolute deviation of the predicted value from the actual value, expressed relative to the actual value. It is the average of the ratio of the absolute difference between the actual and predicted values to the actual values.

Root Mean Square Error (RMSE)

RMSE is the square root of the average of the squared differences between the actual and the predicted values.
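For reference, the standard formulas for these metrics (with y_i the actual values, ŷ_i the predicted values, and n the number of observations) are:

MAE = (1/n) Σ |y_i - ŷ_i|

MAPE = (1/n) Σ |(y_i - ŷ_i) / y_i|

RMSE = √[ (1/n) Σ (y_i - ŷ_i)² ]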

R-squared values

The R-squared value depicts the percentage of the variation in the dependent variable that is explained by the independent variables in the model.

RSS = Residual sum of squares: It measures the difference between the predicted and the actual output. A small RSS indicates a good fit of the model to the data (the formula is given below).

TSS = Total sum of squares: It is the sum of the squared deviations of the data points from the mean of the response variable.
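For reference, the standard definitions (with y_i the actual values, ŷ_i the predicted values, and ȳ the mean of the actual values) are:

RSS = Σ (y_i - ŷ_i)²

TSS = Σ (y_i - ȳ)²

R² = 1 - RSS / TSS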

The R² value ranges from 0 to 1. The higher the R-squared value, the better the model. The value of R² increases if we add more variables to the model, regardless of whether the variables contribute to the model or not. This is the drawback of using R².

Adjusted R-squared values

The Adjusted R² value fixes this drawback of R². The adjusted R² value improves only if the added variable contributes significantly to the model; otherwise it penalizes the model.
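The commonly used formula (the expression the next paragraph refers to) is:

Adjusted R² = 1 - [ (1 - R²)(n - 1) / (n - k - 1) ]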

where R² is the R-squared value, n is the total number of observations, and k is the total number of variables used in the model. If we increase the number of variables, the denominator (n - k - 1) becomes smaller and the overall ratio becomes higher. Subtracting it from 1 reduces the overall Adjusted R². So, to increase the Adjusted R², the contribution of the added features to the model must be significantly high.

Simple Linear Regression Example

For the given equation for Linear Regression,

Y = β0 + β1X

if there is only one predictor available, then it is called Simple Linear Regression.

While making the prediction, there is an error term (ε) associated with the equation: Y = β0 + β1X + ε.

The SLR model aims to find the estimated values of β1 and β0 by keeping the error term (ε) to a minimum.

Multiple Linear Regression Example

For the given equation of Linear Regression, if there is more than one predictor available, then it is called Multiple Linear Regression.

The equation for MLR is:

Y = β0 + β1X1 + β2X2 + β3X3 + … + βnXn + ε

β1 = coefficient for the X1 variable

β2 = coefficient for the X2 variable

β3 = coefficient for the X3 variable, and so on…

β0 is the intercept (constant term). While making the prediction, there is an error term (ε) associated with the equation.

The goal of the MLR model is to find the estimated values of β0, β1, β2, β3… by keeping the error term (ε) to a minimum.

Broadly speaking, supervised machine learning algorithms are classified into two types:

  1. Regression: Used to predict a continuous variable
  2. Classification: Used to predict a discrete variable

In this post, we will discuss one of the regression techniques, “Multiple Linear Regression,” and its implementation using Python.

Linear regression is one of the statistical methods of predictive analytics used to predict the target variable (dependent variable). When we have one independent variable, we call it Simple Linear Regression. If the number of independent variables is more than one, we call it Multiple Linear Regression.

Assumptions for Multiple Linear Regression

  1. Linearity: There should be a linear relationship between the dependent and independent variables, as shown in the example graph below.
  2. Multicollinearity: There should not be a high correlation between two or more independent variables. Multicollinearity can be checked using a correlation matrix, Tolerance, and the Variance Inflation Factor (VIF); a sketch of a VIF check is shown after this list.
  3. Homoscedasticity: If the variance of the errors is constant across the independent variables, it is called homoscedasticity. The residuals should be homoscedastic. Plots of standardized residuals versus predicted values are used to check homoscedasticity, as shown in the figure below. The Breusch-Pagan and White tests are well-known tests used to check homoscedasticity. Q-Q plots are also used to check homoscedasticity.
  4. Multivariate Normality: Residuals should be normally distributed.
  5. Categorical Data: Any categorical data present should be converted into dummy variables.
  6. Minimum records: There should be a minimum of 20 records of independent variables.
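As an illustration of the multicollinearity check in point 2, here is a minimal sketch using statsmodels’ variance_inflation_factor. The DataFrame and its columns X1, X2, X3 are made-up placeholders, not part of the original article:

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical predictor data; replace with your own independent variables
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "X1": rng.normal(size=100),
    "X2": rng.normal(size=100),
})
df["X3"] = df["X1"] * 0.9 + rng.normal(scale=0.1, size=100)  # deliberately correlated with X1

X = add_constant(df)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)  # VIF values above roughly 5-10 usually signal problematic multicollinearity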

A mathematical formulation of Multiple Linear Regression

In Linear Regression, we try to find a linear relationship between the independent and dependent variables by fitting a linear equation to the data.

The equation of a straight line is:

Y = mx + c

where m is the slope and c is the intercept.

In Linear Regression, we are actually trying to find the best m and c values for the dependent variable Y and independent variable x. We fit many lines and take the best line, the one that gives the least possible error. We then use the corresponding m and c values to predict the y value.

The same concept can be used in Multiple Linear Regression, where we have multiple independent variables x1, x2, x3…xn.

Now the equation changes to:

Y = M1X1 + M2X2 + M3X3 + … + MnXn + C

The above equation is not a line but a plane in multiple dimensions.

Model Evaluation:

A model can be evaluated by using the methods below:

  1. Mean absolute error (MAE): the mean of the absolute values of the errors.
  2. Mean squared error (MSE): the mean of the squared errors.
  3. Root mean squared error (RMSE): the square root of the MSE.

Applications

  1. The effect of the independent variables on the dependent variable can be quantified.
  2. Used to predict trends.
  3. Used to find how much change can be expected in the dependent variable for a given change in an independent variable.

Polynomial Regression

Polynomial regression fits a non-linear relationship. In polynomial regression, the relationship between the dependent variable and the independent variable is modeled as an nth-degree polynomial of the independent variable.

The equation of polynomial regression is:

Y = β0 + β1x + β2x^2 + … + βnx^n + ε

Underfitting and Overfitting

When we fit a model, we try to find the optimized, best-fit line, which can describe the impact of a change in the independent variable on a change in the dependent variable by keeping the error term to a minimum. While fitting the model, there are two situations that can lead to poor model performance. These are:

  1. Underfitting
  2. Overfitting

Underfitting 

Underfitting is the condition in which the model cannot fit the data well enough. An under-fitted model has low accuracy. Therefore, the model is unable to capture the relationship, trend, or pattern in the training data. Underfitting can be avoided by using more data or by optimizing the parameters of the model.

Overfitting

Overfitting is the opposite case of underfitting, i.e., when the model predicts very well on training data but is not able to predict well on test or validation data. The main reason for overfitting is that the model memorizes the training data and is unable to generalize to a test/unseen dataset. Overfitting can be reduced by performing feature selection or by using regularization techniques.

The above graphs depict the three cases of model performance.

Implementing Linear Regression in Python

Dataset Introduction

The data concerns city-cycle fuel consumption in miles per gallon (mpg), which is to be predicted. There are a total of 392 rows, 5 independent variables, and 1 dependent variable. All 5 predictors are continuous variables.

 Attribute Information:

  1. mpg:              continuous (Dependent Variable)
  2. cylinders:        multi-valued discrete
  3. displacement:     continuous
  4. horsepower:       continuous
  5. weight:           continuous
  6. acceleration:     continuous

The objective of the problem statement is to predict the miles per gallon using a Linear Regression model.

Python Packages for Linear Regression

Import the necessary Python packages to perform the various steps, like reading the data, plotting the data, and performing linear regression. Import the following packages:
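The original post does not show the import block, so the following is an assumed, minimal set based on the steps that follow:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures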

Read the data

Download the data and save it in the data directory of the project folder.
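A minimal sketch of loading the dataset; the file name auto-mpg.csv and the column selection are assumptions, so adjust them to match your copy of the data:

# Load the fuel-consumption dataset (file name and columns are assumed)
data = pd.read_csv("data/auto-mpg.csv")
data = data[["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration"]]
data.head()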

Simple Linear Regression With scikit-learn

Simple Linear Regression has just one predictor variable and one dependent variable. From the above dataset, let’s consider the effect of horsepower on the ‘mpg’ of the vehicle.

Let’s take a look at what the data looks like:
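A sketch of the plot, continuing from the data loaded above (the column names horsepower and mpg are assumed):

plt.scatter(data["horsepower"], data["mpg"])
plt.xlabel("horsepower")
plt.ylabel("mpg")
plt.title("horsepower vs. mpg")
plt.show()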

From the above graph, we can infer a negative linear relationship between horsepower and miles per gallon (mpg): as horsepower increases, mpg decreases.

Now, let’s perform the Simple Linear Regression.
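A minimal sketch of the fit with scikit-learn (variable names are assumptions; the coefficients quoted below are taken from the original post’s output):

X = data[["horsepower"]]   # predictors must be 2-D for scikit-learn
y = data["mpg"]

slr_model = LinearRegression()
slr_model.fit(X, y)

print(slr_model.intercept_, slr_model.coef_)   # intercept and slope of the fitted line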

From the output of the above SLR model, the equation of the best-fit line of the model is

mpg = 39.94 + (-0.16) * horsepower

By comparing the above equation to the SLR model equation Y = β0 + β1X, we get β0 = 39.94 and β1 = -0.16.

Now, check the model’s relevance using its R2 and RMSE values.
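A sketch of computing these metrics on the fitted model, continuing from the code above:

y_pred = slr_model.predict(X)

print("R2:", r2_score(y, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y, y_pred)))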

The R2 and RMSE (Root Mean Square Error) values are 0.6059 and 4.89, respectively. This means that 60% of the variance in mpg is explained by horsepower. For a Simple Linear Regression model, this result is okay but not great, since other variables like cylinders, acceleration, etc., may also have an effect. The RMSE value is also fairly low.

Let’s check how well the line fits the data.

From the graph, we can infer that the best-fit line is able to explain the effect of horsepower on mpg.

Multiple Linear Regression With scikit-learn

Since the data is already loaded in the system, we will start performing Multiple Linear Regression.

The actual data has 5 independent variables and 1 dependent variable (mpg).
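A minimal sketch of the fit; the feature names are assumed to match the attribute list above:

features = ["cylinders", "displacement", "horsepower", "weight", "acceleration"]
X = data[features]
y = data["mpg"]

mlr_model = LinearRegression()
mlr_model.fit(X, y)

print(mlr_model.intercept_, mlr_model.coef_)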

The best-fit line for Multiple Linear Regression is

Y = 46.26 - 0.4*cylinders - 8.313e-05*displacement - 0.045*horsepower - 0.01*weight - 0.03*acceleration

By comparing the best-fit line equation with the MLR model equation Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5, we get

β0 (intercept) = 46.26, β1 = -0.4, β2 = -8.313e-05, β3 = -0.045, β4 = -0.01, β5 = -0.03

Now, let’s check the R2 and RMSE values.
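As before, a sketch of the metric computation, continuing from the MLR fit above:

y_pred = mlr_model.predict(X)
print("R2:", r2_score(y, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y, y_pred)))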

The R2 and RMSE (Root Mean Square Error) values are 0.707 and 4.21, respectively. This means that ~71% of the variance in mpg is explained by all the predictors together, which depicts a good model. Both values improve on the results of the Simple Linear Regression, which suggests that adding more variables to the model helps model performance. In general, the higher the R2 value and the lower the RMSE, the better the model.

Multiple Linear Regression – Implementation Using Python

Let us take a small data set and try building a model using Python.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics

data=pd.read_csv("Consumer.csv")
data.head()

The above figure shows the top 5 rows of the data. We are trying to predict AmountCharged (the dependent variable) based on the other two independent variables, Income and HouseholdSize. We first check the assumptions on our data set.

  1. Check for Linearity
plt.figure(figsize=(14,5))
plt.subplot(1,2,1)
plt.scatter(data['AmountCharged'], data['Income'])
plt.xlabel('AmountCharged')
plt.ylabel('Income')
plt.subplot(1,2,2)
plt.scatter(data['AmountCharged'], data['HouseholdSize'])
plt.xlabel('AmountCharged')
plt.ylabel('HouseholdSize')
plt.show()

We are able to see from the above graph, there exists a linear relationship between the Amount Charged and Income, Household Size.

2. Check for Multicollinearity

sns.scatterplot(x='Income', y='HouseholdSize', data=data)

From the above graph, there appears to be no collinearity between Income and HouseholdSize.

We split our data into train and test sets in a ratio of 80:20, respectively, using the train_test_split function.

X = pd.DataFrame(np.c_[data['Income'], data['HouseholdSize']], columns=['Income','HouseholdSize'])
y=data['AmountCharged']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=9)

3. Check for Homoscedasticity

First, we need to fit the model and generate predictions on the test set; then we can calculate the residuals:

resi = y_test - prediction
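A minimal sketch of the full sequence, assuming the model is fit with scikit-learn’s LinearRegression on the train/test split created above; the residuals are then plotted against the predictions to eyeball homoscedasticity:

model = LinearRegression()
model.fit(X_train, y_train)

prediction = model.predict(X_test)
resi = y_test - prediction

plt.scatter(prediction, resi)
plt.axhline(0, color="red")          # residuals should scatter evenly around zero
plt.xlabel("Predicted AmountCharged")
plt.ylabel("Residuals")
plt.show()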

Polynomial Regression With scikit-learn

For Polynomial Regression, we will use the same data that we used for Simple Linear Regression.

The graph shows that the relationship between horsepower and miles per gallon is not perfectly linear; it is slightly curved.

The graph of the best-fit line for Simple Linear Regression is shown below:

From the plot, we can infer that the best-fit line is able to explain the effect of the independent variable; however, it does not fit many of the data points well.

Let’s try Polynomial Regression on the above dataset with degree = 2.
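A minimal sketch using scikit-learn’s PolynomialFeatures, continuing from the horsepower/mpg data above (variable names are assumptions):

X = data[["horsepower"]]
y = data["mpg"]

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)        # adds the squared horsepower term

poly_model = LinearRegression()
poly_model.fit(X_poly, y)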

Now, visualize the Polynomial Regression results.

From the graph, the best-fit curve looks better than the Simple Linear Regression line.

Let’s look at the model performance by calculating the Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error.
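A sketch of the comparison, assuming slr_model and poly_model were fit as in the sketches above:

from sklearn.metrics import mean_absolute_error

for name, pred in [("Simple Linear Regression", slr_model.predict(data[["horsepower"]])),
                   ("Polynomial Regression (degree = 2)", poly_model.predict(X_poly))]:
    print(name)
    print("  MAE :", mean_absolute_error(y, pred))
    print("  MSE :", mean_squared_error(y, pred))
    print("  RMSE:", np.sqrt(mean_squared_error(y, pred)))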

Simple Linear Regression Model Performance:

Polynomial Regression (degree = 2) Model Performance:

From the above results, we can see that the error values are lower for Polynomial Regression, but there is not much improvement. We can increase the polynomial degree and experiment with the model performance.

Advanced Linear Regression with statsmodels

There are many ways to perform regression in Python, most notably:

  1. scikit-learn
  2. statsmodels

In the MLR section above, we performed MLR using the scikit-learn library. Now, let’s perform MLR using the statsmodels library.

Import the required libraries below.

Now, perform Multiple Linear Regression using statsmodels.
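A minimal sketch with statsmodels’ OLS; the variable names continue from the scikit-learn section, and the printed summary is where the R2, adjusted R2, and p-values discussed below come from:

import statsmodels.api as sm

X = data[["cylinders", "displacement", "horsepower", "weight", "acceleration"]]
y = data["mpg"]

X_const = sm.add_constant(X)          # statsmodels does not add an intercept by default
sm_model = sm.OLS(y, X_const).fit()
print(sm_model.summary())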

From the above results, R2 and adjusted R2 are 0.708 and 0.704, respectively. All the independent variables together explain almost 71% of the variation in the dependent variable. The R2 value is the same as the result from the scikit-learn library.

Judging by the p-values of the independent variables, the intercept, horsepower, and weight are significant variables, since their p-values are less than 0.05 (the significance level). We can try performing MLR again after removing the variables that are not contributing to the model, and then select the best model.

Now, let’s check the model performance by calculating the RMSE value:
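A sketch of the computation, continuing from the statsmodels fit above:

y_pred = sm_model.predict(X_const)
print("RMSE:", np.sqrt(mean_squared_error(y, y_pred)))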

Linear Regression in R

To see an example of Linear Regression in R, we will use cars, which is a built-in dataset in R. Typing cars in the R console displays the dataset. We can observe that the dataset has 50 observations and 2 variables, namely distance and speed. The objective here is to predict the distance traveled by a car when the speed of the car is known. We also need to establish a linear relationship between them with the help of an arithmetic equation. Before getting into modeling, it is always advisable to do an Exploratory Data Analysis, which helps us understand the data and the variables.

Exploratory Data Analysis

This section aims to build a Linear Regression Model that can help predict distance. The following are the basic visualizations that will help us understand more about the data and the variables:

  1. Scatter Plot – to help establish whether there exists a linear relationship between distance and speed.
  2. Box Plot – to check whether there are any outliers in the dataset.
  3. Density Plot – to check the distribution of the variables; ideally, they should be normally distributed.

Below are the steps to make these graphs in R.

Scatter Plots to Visualize the Relationship

A scatter diagram plots pairs of numerical data with one variable on each axis, and helps establish the relationship between the independent and dependent variables.

Steps in R

If we carefully observe the scatter plot, we can see that the variables are correlated, as they fall along a line/curve. The higher the correlation, the closer the points will be to the line/curve.

As discussed earlier, the scatter plot shows a linear and positive relationship between distance and speed. Thus, it fulfills one of the assumptions of Linear Regression, i.e., that there should be a linear relationship between the dependent and independent variables.

Check for Outliers using Boxplots.

A boxplot, also called a box-and-whisker plot, is used in statistics to represent the five-number summary. It is used to check whether the distribution is skewed and whether there are any outliers in the dataset.

Wikipedia defines an ‘outlier’ as an observation point that is distant from the other observations in the dataset.

Now, let’s plot the boxplots to check for outliers.

After observing the boxplots for both speed and distance, we can say that there are no outliers in speed, and there appears to be only a single outlier in distance. Thus, there is no need for outlier treatment.

Checking the Distribution of the Data using Density Plots

One of the key assumptions for performing Linear Regression is that the data should be normally distributed. This can be checked with the help of density plots. A density plot helps us visualize the distribution of a numeric variable.

From the density plots, we can conclude that the data set is approximately normally distributed.

Linear Regression Modelling

Now, let’s get into building the Linear Regression Model. But before that, there is one check we need to perform: ‘Correlation Computation’. The correlation coefficient helps us check how strong the relationship between the dependent and independent variables is. The value of the correlation coefficient ranges from -1 to 1.

A correlation of 1 indicates a perfect positive relationship. It means that if one variable’s value increases, the other variable’s value also increases.

A correlation of -1 indicates a perfect negative relationship. It means that if the value of variable x increases, the value of variable y decreases.

A correlation of 0 indicates there is no relationship between the variables.

The output of the above R code is 0.8068949. It shows that the correlation between speed and distance is about 0.8, which is close to 1, indicating a strong positive correlation.

The linear regression model in R is built with the help of the lm() function.

The function uses two main parameters:

Formula – an object of class formula.

Data – the variable containing the dataset.

The results show us the intercept and the beta coefficient of the variable speed.

From the output above, we can write the regression equation as distance = -17.579 + 3.932 * speed.

Model Diagnostics

Just building the model and using it for prediction is only half the job done. Before using the model, we need to ensure that the model is statistically significant. This means:

  1. Checking that there is a statistically significant relationship between the dependent and independent variables.
  2. Checking that the model we built fits the data very well.

We do this by looking at a statistical summary of the model using the summary() function in R.

The summary output shows the following:

  1. Call – the function call used to compute the regression model.
  2. Residuals – the distribution of the residuals, which ideally has a mean of 0. Thus, the median should not be far from 0, and the minimum and maximum should be roughly equal in absolute value.
  3. Coefficients – the regression beta coefficients and their statistical significance.
  4. Residual standard error (RSE), R-Squared, and F-Statistic – metrics used to check how well the model fits our data.

Detecting t-statistics and P-Value

The t-statistic and associated p-values are very important metrics when checking model fit.

The t-statistic tests whether there is a statistically significant relationship between the independent and dependent variables, i.e., whether the beta coefficient of the independent variable is significantly different from 0. So, the higher the t-value, the better.

Whenever there is a p-value, there is always a null as well as an alternative hypothesis associated with it. The p-value helps us test the null hypothesis, i.e., that the coefficients are equal to 0. A low p-value means we can reject the null hypothesis.

The statistical hypotheses are as follows:

Null Hypothesis (H0) – the coefficients are equal to zero.

Alternate Hypothesis (H1) – the coefficients are not equal to zero.

As discussed earlier, when the p-value is < 0.05, we can safely reject the null hypothesis.

In our case, since the p-value is less than 0.05, we can reject the null hypothesis and conclude that the model is highly significant. This means there is a significant association between the independent and dependent variables.

R – Squared and Adjusted R – Squared

R-Squared (R2) is a basic metric which tells us how much of the variance has been explained by the model. It ranges from 0 to 1. In Linear Regression, if we keep adding new variables, the value of R-Squared will keep increasing regardless of whether the variables are significant. This is where Adjusted R-Squared comes to help. Adjusted R-Squared effectively credits only those variables whose addition to the model is significant. So, while performing Linear Regression, it is always preferable to look at Adjusted R-Squared rather than just R-Squared.

  1. An Adjusted R-Squared value close to 1 indicates that the regression model has explained a large proportion of the variability.
  2. A value close to 0 indicates that the regression model did not explain much of the variability.

In our output, the Adjusted R-Squared value is 0.6438, which is closer to 1, indicating that our model has been able to explain the variability.

AIC and BIC

AIC and BIC are widely used metrics for model selection. AIC stands for Akaike Information Criterion, and BIC stands for Bayesian Information Criterion. These help us check the goodness of fit of our model. For model comparison, the model with the lowest AIC and BIC is preferred.

Which Regression Model is the Best Fit for the Data?

There are a number of metrics that help us decide on the best-fit model for our data, but the most widely used are given below:

Statistic – Criterion
R-Squared – higher is better
Adjusted R-Squared – higher is better
t-statistic – higher t-values (and lower p-values) are better
F-statistic – higher is better
AIC – lower is better
BIC – lower is better
Mean Squared Error (MSE) – lower is better

Predicting Linear Models

Now we know how to build a Linear Regression Model in R using the full dataset. But this approach does not tell us how well the model will perform on and fit new data.

To solve this problem, the general practice in the industry is to split the data into Train and Test datasets in the ratio of 80:20 (Train 80% and Test 20%). With the help of this method, we can obtain predictions for the test dataset and compare them with the values from the actual dataset.

Splitting the Data

We do this with the help of the sample() function in R.

Building the Model on Train Data and Predicting on Test Data

Model Diagnostics

If we look at the p-value, since it is less than 0.05, we can conclude that the model is significant. Also, if we compare the Adjusted R-Squared value with that of the original dataset, it is close to it, thus validating that the model is significant.

K – Fold Cross-Validation

Now, we have seen that the model performs well on the test dataset as well. But this does not guarantee that the model will be a good fit in the future too. The reason is that there may be cases where a few data points in the dataset are not representative of the whole population. Thus, we need to check the model performance as much as possible. One way to ensure this is to check whether the model performs well on different train and test data chunks. This can be done with the help of K-Fold Cross-Validation.

The procedure for K-Fold Cross-Validation is given below:

  1. Randomly shuffle the dataset.
  2. Split the data into k folds/sections/groups.
  3. For each fold/section/group:
     a. Make that fold/section/group the test data.
     b. Take the remaining data as train data.
     c. Fit the model on the train data and evaluate it on the test data.
     d. Keep the evaluation score and discard the model.

After performing K-Fold Cross-Validation, we can observe that the R-Squared value is close to that of the original data, and the MAE is 12%, which helps us conclude that the model is a good fit.

Benefits of Using Linear Regression

  1. The Linear Regression method is very easy to use. If the relationship between the variables (independent and dependent) is known, we can easily implement the appropriate regression method (Linear Regression for a linear relationship).
  2. Linear Regression provides the significance level of each attribute contributing to the prediction of the dependent variable. With this information, we can choose between the variables, keeping the highly contributing/important ones.
  3. After performing linear regression, we get the best-fit line, which is used for prediction, and which we can use according to the business requirement.

Limitations of Linear Regression

The main limitation of linear regression is that its performance is not up to the mark when the relationship is nonlinear. Linear regression can also be affected by the presence of outliers in the dataset. The presence of high correlation among the variables likewise leads to poor performance of the linear regression model.

Linear Regression Examples

  1. Linear Regression can be used for product sales prediction to optimize inventory management.
  2. It can be used in the insurance domain, for example, to predict the insurance premium based on various features.
  3. Monitoring daily website click counts using linear regression can help in optimizing website efficiency, etc.
  4. Feature selection is one of the applications of Linear Regression.

Linear Regression – Learning the Model

With Simple Linear Regression, when we have a single input, we can use statistics to estimate the coefficients.
This requires that you calculate statistical properties from the data, such as the mean, standard deviation, correlation, and covariance. All of the data must be available to traverse and calculate these statistics.
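A minimal sketch of this idea in Python, using toy numbers rather than the article’s dataset: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means.

import numpy as np

# Toy data (hypothetical values, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # slope = cov(x, y) / var(x)
b0 = y.mean() - b1 * x.mean()                    # intercept from the means
print(b0, b1)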

When we have more than one input, we can use Ordinary Least Squares to estimate the values of the coefficients.
The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals. This means that, given a regression line through the data, we calculate the distance from each data point to the regression line, square it, and sum all of the squared errors together. This is the quantity that Ordinary Least Squares seeks to minimize.
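A minimal sketch of an Ordinary Least Squares fit with NumPy’s least-squares solver (toy data again; the column of ones provides the intercept):

import numpy as np

# Toy data with two inputs (hypothetical values)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

A = np.column_stack([np.ones(len(X)), X])            # prepend a column of ones for the intercept
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(coef)                                          # [intercept, coefficient of x1, coefficient of x2]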

Another way to learn the coefficients, when there are one or more inputs, is to optimize their values iteratively by minimizing the error of the model on the training data. This operation is called Gradient Descent and works by starting with random values for each coefficient. The sum of the squared errors is calculated for each pair of input and output values. A learning rate is used as a scale factor, and the coefficients are updated in the direction of minimizing the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible.
When using this method, you must select a learning rate (alpha) parameter that determines the size of the improvement step to take on each iteration of the procedure.
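A minimal sketch of gradient descent for simple linear regression (toy data; the learning rate and iteration count are arbitrary illustrative choices):

import numpy as np

# Toy data (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

b0, b1 = 0.0, 0.0          # start from arbitrary coefficient values
alpha = 0.01               # learning rate
for _ in range(10000):
    error = (b0 + b1 * x) - y
    # Move each coefficient a small step in the direction that reduces the squared error
    b0 -= alpha * error.mean()
    b1 -= alpha * (error * x).mean()

print(b0, b1)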

There are extensions of the training of the linear model called regularization methods. These seek to minimize the sum of the squared error of the model on the training data (using Ordinary Least Squares) while also reducing the complexity of the model (such as the number or absolute size of the sum of all coefficients in the model).
Two popular examples of regularization procedures for linear regression are:
– Lasso Regression: where Ordinary Least Squares is modified to also minimize the absolute sum of the coefficients (called L1 regularization).
– Ridge Regression: where Ordinary Least Squares is modified to also minimize the squared sum of the coefficients (called L2 regularization).
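A minimal sketch with scikit-learn’s Lasso and Ridge estimators (toy data; the alpha regularization strengths are arbitrary illustrative values):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data with two inputs (hypothetical values)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can shrink some coefficients to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients toward zero

print(lasso.coef_, lasso.intercept_)
print(ridge.coef_, ridge.intercept_)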

Preparing Data for Linear Regression

Linear regression has been studied at great length, and there is a lot of literature on how your data must be structured to make the best use of the model. In practice, you can use the following rules of thumb when using Ordinary Least Squares Regression, the most common implementation of linear regression.

Try different preparations of your data using these heuristics and see what works best for your problem.

  • Linear Assumption
  • Noise Removal
  • Remove Collinearity
  • Gaussian Distributions

Summary

In this post, you discovered the linear regression algorithm for machine learning.
You covered a lot of ground, including:

  • The common names used when describing linear regression models.
  • The representation used by the model.
  • The learning algorithms used to estimate the coefficients of the model.
  • Rules of thumb to consider when preparing data for use with linear regression.

Try out linear regression and get comfortable with it. If you are planning a career in Machine Learning, here are some Must-Haves On Your Resume and the most common interview questions to prepare for.
