Ridge Regression

How Ridge Regression Model Works

What is ridge regression? Ridge regression formula and example

In this article, we discuss ridge regression, one of the standard regression models used to analyze data in detail. The model is explained with the help of its formula and an example.

So let us understand what ridge regression is all about.

Though linear regression and logistic regression are the most beloved members of the regression family, according to a recorded talk at NYC Data Science Academy, you have to be very special to use regression without regularization.

Ridge regression is one of the most fundamental regularization techniques, yet it is not used by many because of the complex science behind it. If you have an overall idea of multiple regression, it is not so difficult to explore the science behind ridge regression. While the overall idea of regression stays the same, what makes regularization different is the way the model coefficients are determined.

So how can you define ridge regression?

Ridge regression is a technique specialized for analyzing multiple regression data that suffer from multicollinearity.

So let's understand what multicollinearity is in detail:

Multicollinearity, also called collinearity, is a concept from statistics. In this phenomenon, one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

Multicollinearity occurs when there are high correlations between two or more predictor variables.

For example: a person's height, weight, age, annual income, etc.
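As a rough illustration of how correlated predictors can be spotted, the sketch below builds a small synthetic data set (the numbers are invented, not real measurements) in which weight is constructed to depend on height while age is independent. It then computes the correlation matrix and the variance inflation factors, taken here as the diagonal of the inverse correlation matrix. The variable names are assumptions made for this example only.

```python
import numpy as np

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170.0, 10.0, size=n)                      # centimetres
weight = 0.9 * height - 85.0 + rng.normal(0.0, 3.0, size=n)   # built from height
age = rng.normal(40.0, 12.0, size=n)                          # independent of both

X = np.column_stack([height, weight, age])

# Pairwise correlations between the predictor columns.
corr = np.corrcoef(X, rowvar=False)

# Variance inflation factor of each predictor: the j-th diagonal entry of
# the inverse correlation matrix equals 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on the remaining predictors.
vif = np.diag(np.linalg.inv(corr))

print(np.round(corr, 2))
print(np.round(vif, 2))
```

Height and weight should show a correlation close to 1 and clearly inflated factors, while age should stay near a factor of 1.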

Regularization techniques:

Ridge regression is used to create a parsimonious model in the following scenarios.

  1. The number of predictor variables in a given set exceeds the number of observations.
  2. The dataset has multicollinearity (that is correlations between predictor variables).
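The first scenario can be made concrete with a small sketch. With more predictors than observations, the matrix XᵀX that ordinary least squares has to invert is rank deficient, whereas adding λ times the identity matrix (with λ > 0) makes it full rank. The sizes and the value of λ below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 10                      # fewer observations than predictors
X = rng.normal(size=(n, p))

gram = X.T @ X                    # p x p matrix, rank at most n
rank_ols = np.linalg.matrix_rank(gram)

lam = 1.0                         # any positive penalty strength
ridge_gram = gram + lam * np.eye(p)
rank_ridge = np.linalg.matrix_rank(ridge_gram)

print(rank_ols, rank_ridge)
```

Because every eigenvalue of XᵀX is non-negative, adding λ > 0 to each of them keeps all eigenvalues strictly positive, which is why the penalized system has a unique solution.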

Regularization techniques work by doing two things at once.

  1. Penalize the magnitude of coefficients of features
  2. Minimize the error between the actual and predicted observations

Though there are two regularization techniques, ridge regression and lasso regression, for creating parsimonious models with a large number of features, their practical uses and inherent properties are completely different.

Now let us understand how the ridge regression model actually works:

Ridge regression performs L2 regularization. Here a penalty equivalent to the square of the magnitude of the coefficients is added to the objective. The minimization objective is as follows.

Taking a response vector y ∈ R^n and a predictor matrix X ∈ R^(n×p), the ridge regression coefficients are defined as

$$\hat{\beta}^{\text{ridge}} = \underset{\beta \in \mathbb{R}^p}{\arg\min} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$

Here λ is the tuning parameter that controls the strength of the penalty term.

If λ = 0, the objective reduces to that of ordinary linear regression, so we get the same coefficients as linear regression.

If λ = ∞, the coefficients will be zero: with infinite weight on the sum of squared coefficients, any coefficient other than zero makes the objective infinite.

If 0 < λ < ∞, the magnitude of λ decides the weightage given to the different parts of the objective.
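The three cases can be checked numerically. The sketch below assumes data without an intercept term and uses the standard closed-form solution of the penalized least squares problem, which solves (XᵀX + λI)β = Xᵀy. The helper name `ridge_coefficients` and the synthetic data are made up for this example.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y (no intercept term)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=50)

# Ordinary least squares fit for comparison.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

beta_zero = ridge_coefficients(X, y, lam=0.0)    # no penalty
beta_mid = ridge_coefficients(X, y, lam=10.0)    # moderate penalty
beta_huge = ridge_coefficients(X, y, lam=1e12)   # stands in for "infinite"

print(beta_ols, beta_zero, beta_mid, beta_huge, sep="\n")
```

With λ = 0 the result should agree with the least squares fit, the very large λ should drive every coefficient towards zero, and the moderate value should land in between.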

In simple terms, the minimization objective = LS Obj + λ (sum of the squares of the coefficients)

where LS Obj is the least squares objective, that is, the linear regression objective without regularization.

As ridge regression shrinks the coefficients towards zero, it introduces some bias. But it can reduce the variance to a great extent, which can result in a better mean-squared error. The amount of shrinkage is controlled by λ, which multiplies the ridge penalty. Because a larger λ means more shrinkage, we get different coefficient estimates for different values of λ.
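To make the two parts of the objective tangible, here is a minimal sketch that writes LS Obj and the penalized objective as plain functions and checks that the standard closed-form solution, β = (XᵀX + λI)⁻¹Xᵀy, does better than nearby coefficient vectors. It assumes no intercept term; the function names, the value of λ and the data are hypothetical choices for this example.

```python
import numpy as np

def ls_objective(X, y, beta):
    """Least squares objective: sum of squared residuals."""
    residual = y - X @ beta
    return float(residual @ residual)

def ridge_objective(X, y, beta, lam):
    """LS Obj + lam * (sum of the squares of the coefficients)."""
    return ls_objective(X, y, beta) + lam * float(beta @ beta)

def ridge_closed_form(X, y, lam):
    """Closed-form minimizer of the ridge objective (no intercept term)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(7)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=40)
lam = 3.0

beta_hat = ridge_closed_form(X, y, lam)
best = ridge_objective(X, y, beta_hat, lam)

# Randomly perturbed coefficient vectors should all score worse.
perturbed = [
    ridge_objective(X, y, beta_hat + rng.normal(scale=0.1, size=4), lam)
    for _ in range(100)
]

# The gradient of the objective should vanish at the minimizer.
gradient = -2.0 * X.T @ (y - X @ beta_hat) + 2.0 * lam * beta_hat

print(best, min(perturbed))
```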
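The dependence of the estimates on λ can be sketched by fitting the same data over a grid of penalty values and watching the overall size of the coefficient vector. The example below assumes no intercept term, uses two deliberately near-duplicate predictors, and picks an arbitrary grid of λ values; none of these choices come from a real data set.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y (no intercept term)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Two almost identical predictors, invented for illustration only.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=1.0, size=n)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
path = np.array([ridge_coefficients(X, y, lam) for lam in lambdas])

# Euclidean length of the coefficient vector at each penalty value.
norms = np.linalg.norm(path, axis=1)

for lam, beta, norm in zip(lambdas, path, norms):
    print(f"lambda={lam:8.1f}  beta={np.round(beta, 3)}  norm={norm:.3f}")
```

Each row is a different coefficient estimate, and the length of the coefficient vector should fall steadily as λ increases, which is the shrinkage described above.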


For example, ridge regression can be used for the analysis of prostate-specific antigen and clinical measures among people who were about to have their prostates removed.

Ridge regression performs well when there is a subset of true coefficients that are small or even zero. It does not give results that are as good when all the true coefficients are moderately large. However, even in that case it can still outperform linear regression over a narrow range of (small) λ values.


So we have talked about the ridge regression model, the concept of multicollinearity, and how ridge regression is used to analyze data that exhibits it. If you have any suggestions on this topic, please share them in the comments section below so that others can benefit and gain a fuller understanding of ridge regression.
