In this article, we discuss lasso regression, one of the regression models available for analyzing data. The model is explained with an example, and the formula is listed for reference.
LASSO stands for Least Absolute Shrinkage and Selection Operator.
Lasso regression is one of the regularization methods that creates parsimonious models in the presence of a large number of features, where "large" means either of the following:
1. Large enough to increase the model's tendency to over-fit. Even as few as ten variables can cause overfitting.
2. Large enough to cause computational challenges, as can happen with millions or billions of features.
Lasso regression performs L1 regularization: it adds a penalty equal to the sum of the absolute values of the coefficients. The minimization objective is as follows.
Minimization objective = LS Obj + λ (sum of absolute value of coefficients)
Here LS Obj stands for the least squares objective, which is simply the linear regression objective without regularization, and λ is the tuning parameter that controls the amount of regularization. As λ increases, the bias increases and the variance decreases, because a larger λ means more shrinkage.
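To make the objective concrete, here is a minimal NumPy sketch (the data and the coefficient vector are made up for illustration) that evaluates LS Obj plus the L1 penalty for a given λ:

```python
import numpy as np

# Made-up data: y depends mainly on the first of three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=50)

def lasso_objective(beta, lam):
    """LS Obj + lam * (sum of absolute values of the coefficients)."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(np.abs(beta))

beta = np.array([3.0, 0.0, 0.0])
print(lasso_objective(beta, lam=0.0))  # plain least squares objective
print(lasso_objective(beta, lam=1.0))  # adds 1.0 * |3.0| = 3.0 to the above
```

Raising λ by one unit increases the objective by exactly the L1 norm of the coefficients (3.0 here), which is why larger λ pushes the optimizer toward smaller coefficients.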
When was lasso regression developed, and what is its purpose?
Lasso regression was introduced by Robert Tibshirani in 1996. It is essentially an alternative to the classic least squares estimate that avoids many of the problems with overfitting that arise when we have a large number of independent variables.
A large coefficient puts heavy emphasis on a particular feature, treating it as a strong predictor of the outcome. When coefficients grow too large, the algorithm starts modeling intricate relations to fit the output and ends up overfitting to the training data. Lasso regression counters this by adding the sum of the absolute values of the coefficients to the optimization objective.
Now let us understand the lasso regression formula with a working example:
The lasso regression estimate is defined as
Lasso estimate = argmin over β of [ Σᵢ (yᵢ − β₀ − Σⱼ xᵢⱼ βⱼ)² + λ Σⱼ |βⱼ| ]
that is, the least squares objective plus the L1 penalty described above.
Here the tuning parameter λ controls the strength of the penalty, that is
When λ = 0: We get the same coefficients as simple linear regression
When λ = ∞: All coefficients are zero
When 0 < λ < ∞: We get coefficients whose magnitudes fall between 0 and those of simple linear regression
So when λ is between the two extremes, we are balancing two goals: fitting the training data well and shrinking the coefficients to keep the model simple.
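The three regimes of λ can be sketched with scikit-learn, assuming it is available. Note that scikit-learn calls the tuning parameter `alpha` and scales the least squares term by 1/(2n), so its `alpha` plays the role of λ up to that scaling; the data below is synthetic:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: two informative features out of five.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# lambda = 0: plain least squares (scikit-learn discourages Lasso with alpha=0,
# so we use LinearRegression for that case).
ols = LinearRegression().fit(X, y)
# 0 < lambda < infinity: shrunken coefficients.
mild = Lasso(alpha=0.5).fit(X, y)
# Large lambda: every coefficient driven to zero.
heavy = Lasso(alpha=100.0).fit(X, y)

print(np.round(ols.coef_, 2))
print(np.round(mild.coef_, 2))
print(np.round(heavy.coef_, 2))  # all zeros
```

Comparing the three printed coefficient vectors shows the shrinkage ordering: the penalized fits are closer to zero than the unpenalized one, and a large enough penalty zeroes everything.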
But the nature of the L1 penalty causes some coefficients to be shrunk exactly to zero. Hence, unlike ridge regression, lasso regression is able to perform variable selection in the linear model. As the value of λ increases, more coefficients are set to zero (so fewer variables are selected), and among the nonzero coefficients, more shrinkage is applied. The working example below illustrates this.
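This selection effect can be demonstrated with a short scikit-learn sketch (synthetic data and a hypothetical `alpha` grid), counting how many variables remain selected as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only two of ten features drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

counts = []
for alpha in [0.01, 0.1, 1.0, 5.0]:  # hypothetical penalty grid
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    counts.append(int(np.count_nonzero(coef)))
    print(f"alpha={alpha}: {counts[-1]} nonzero coefficients")
```

With a small penalty, many noise features keep small nonzero coefficients; as `alpha` grows, the fitted model keeps only the genuinely informative features, which is exactly the variable selection that ridge regression cannot perform.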
Consider analyzing prostate-specific antigen and clinical measures among patients who were about to have their prostates removed. Ridge regression can give good results when there are many true nonzero coefficients. But if only a few coefficients truly predict the outcome, lasso regression is the better option, since lasso performs better than ridge when the true coefficients are few.
We have now discussed lasso regression and examined the formula in detail. If you have any suggestions that would be useful for readers, please share them in the comments section below.