In today’s article, we will discuss ridge regression, one of the standard regression models available for analyzing data in detail. The model is then explained with the help of a formula and an example.
Though linear regression and logistic regression are the most beloved members of the regression family, according to a recent talk at NYC Data Science Academy, you have to be very special to use regression without regularization.
Ridge regression is one of the most fundamental regularization techniques, yet many avoid it because of the complex mathematics behind it. If you have an overall idea of the concept of multiple regression, it is not so difficult to explore the science behind ridge regression. While the overall idea of regression stays the same, what makes regularization different is how the model coefficients are determined.
Ridge regression is a technique specialized for analyzing multiple regression data that suffer from multicollinearity.
The term multicollinearity refers to the statistical concept of collinearity: one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.
Multicollinearity occurs when there are high correlations between two or more predictor variables.
For example: a person’s height, weight, age, and annual income.
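A quick way to spot multicollinearity is to inspect the correlation matrix of the predictors. The sketch below uses made-up height, weight, and age values (assumed purely for illustration) in which height and weight are deliberately close to linearly related:

```python
import numpy as np

# Hypothetical data: height (cm), weight (kg), age (years) for six people.
# Height and weight are constructed to be strongly correlated.
height = np.array([160, 165, 170, 175, 180, 185], dtype=float)
weight = np.array([55, 60, 66, 71, 78, 84], dtype=float)
age = np.array([23, 41, 35, 52, 29, 47], dtype=float)

X = np.column_stack([height, weight, age])
corr = np.corrcoef(X, rowvar=False)  # 3x3 correlation matrix of the predictors

print(np.round(corr, 2))
# An entry with magnitude close to 1 between two different predictors
# (here height vs. weight) signals multicollinearity.
```

In practice one would compute this on the real design matrix; variance inflation factors are another common diagnostic.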
Ridge regression is used to create a parsimonious model when the number of predictor variables is large relative to the number of observations, or when the predictors are highly correlated.
There are two main regularization techniques for creating parsimonious models with a large number of features: ridge regression and lasso regression. Though both serve the same purpose, their practical use and inherent properties are completely different.
Ridge regression performs L2 regularization: a penalty proportional to the square of the magnitude of the coefficients is added to the least squares objective. Taking a response vector y ∈ Rn and a predictor matrix X ∈ Rn×p, the ridge regression coefficients are defined as the minimizer of

‖y − Xβ‖² + λ ‖β‖²

which has the closed-form solution β̂ = (XᵀX + λI)⁻¹ Xᵀy.
Here λ is the tuning parameter that controls the strength of the penalty term.
If λ = 0, the objective becomes similar to simple linear regression. So we get the same coefficients as simple linear regression.
If λ = ∞, the coefficients will all be zero, because of the infinite weightage on the square of the coefficients: any nonzero coefficient makes the objective infinite.
If 0 < λ < ∞, the magnitude of λ decides the weightage given to the different parts of the objective.
In simple terms, the minimization objective = LS Obj + λ (sum of the square of coefficients)
Where LS Obj is the least squares objective, i.e. the linear regression objective without regularization.
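The decomposition above can be sketched directly in code. This is a minimal illustration on assumed toy data: it defines the objective as LS Obj plus the λ-weighted sum of squared coefficients, and checks that with λ = 0 the ordinary least-squares fit is indeed the minimizer:

```python
import numpy as np

def ridge_objective(beta, X, y, lam):
    """Minimization objective = LS Obj + lambda * (sum of squared coefficients)."""
    ls_obj = np.sum((y - X @ beta) ** 2)  # least squares objective
    penalty = lam * np.sum(beta ** 2)     # L2 (ridge) penalty
    return ls_obj + penalty

# Toy data (assumed): the response depends mainly on the first predictor.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=30)

# With lam = 0 the objective is plain least squares, so the ordinary
# least-squares coefficients minimize it.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any perturbed coefficient vector scores worse on the lam = 0 objective.
assert ridge_objective(beta_ols, X, y, 0.0) < ridge_objective(beta_ols + 0.1, X, y, 0.0)
```

For λ > 0 the same function simply adds λ times the squared coefficient norm on top of the least squares term.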
Because ridge regression shrinks the coefficients towards zero, it introduces some bias. But it can reduce the variance to a great extent, which results in a better mean-squared error. The amount of shrinkage is controlled by λ, which multiplies the ridge penalty: a larger λ means more shrinkage, so we get different coefficient estimates for different values of λ.
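The shrinkage effect can be seen numerically. The sketch below (on assumed, seeded toy data) fits the closed-form ridge solution for increasing values of λ and tracks the size of the coefficient vector:

```python
import numpy as np

# Toy data (assumed for illustration), seeded for reproducibility.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 5))
true_beta = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_beta + rng.normal(scale=0.5, size=40)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam*I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The squared norm of the coefficients shrinks as lambda grows.
norms = [np.sum(ridge_fit(X, y, lam) ** 2) for lam in (0.0, 1.0, 10.0, 100.0)]
```

Libraries such as scikit-learn expose the same model (with λ called `alpha` in `sklearn.linear_model.Ridge`), but the closed form keeps the sketch self-contained.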
For example, ridge regression can be used for the analysis of prostate-specific antigen and clinical measures among people who were about to have their prostates removed.
Ridge regression performs well when there is a subset of true coefficients that are small or even zero. It does not give good results when all the true coefficients are moderately large. However, even then it can still outperform linear regression over a fairly narrow range of (small) λ values.
We have now discussed the ridge regression model, the concept of multicollinearity, and how multicollinearity motivates ridge regression analysis. If you have any suggestions on this topic, please share them in the comments section below so that others can gain a fuller understanding of ridge regression.