Shrinkage Methods in Linear Regression – Busigence (2024)

Have you ever wondered, "Why does Linear Regression give me such good accuracy on the training set but low accuracy on the test set, even though I added all the available predictor features to the model?"

The question above puzzles many people, but it is answered by a concept called overfitting: your model, in addition to learning the underlying pattern in the data, also learns the noise present in it, and hence fits the training points a bit too perfectly.

How do you solve it?

This is where shrinkage methods (also known as regularization) come into play. These methods add a penalty term to the loss function used in the model. Since minimizing the loss function is equivalent to maximizing prediction accuracy, we need to look more closely at the loss function used in Linear Regression.

Linear Regression uses Least Squares to estimate its coefficients: it minimizes the squared difference between the actual and predicted values in order to find the regression fit that gives the best prediction accuracy.
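
To make this concrete, here is a minimal sketch of ordinary least squares in Python. It uses scikit-learn and synthetic data, neither of which comes from the original article; the coefficients and noise level are purely illustrative.

# Minimal sketch: fit ordinary least squares and compute the residual sum of
# squares (RSS), the quantity the fit minimizes. Data is synthetic/illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                              # 100 samples, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=100)

ols = LinearRegression().fit(X, y)
rss = np.sum((y - ols.predict(X)) ** 2)                    # sum of squared residuals
print(f"Training RSS: {rss:.3f}")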

Now, what does shrinking do?

When we perform shrinking, we essentially pull the coefficient estimates closer to 0, and shrinking the estimates in this way can significantly reduce their variance.

The need for shrinkage methods arises from the issues of underfitting or overfitting the data. To minimize the mean error (Mean Squared Error (MSE) in the case of Linear Regression), we need to optimize the bias-variance trade-off.

What is this bias-variance trade-off?

The bias-variance trade-off indicates the level of underfitting or overfitting of the data with respect to the Linear Regression model applied to it. High bias with low variance means the model is underfitted, while low bias with high variance means the model is overfitted. We need to trade off bias against variance to find the combination that gives the minimum Mean Squared Error, as shown by the graph below.

[Figure: squared bias, variance, and MSE plotted against the regularization parameter λ]
In this figure, the green curve is the variance, the black curve is the squared bias, and the purple curve is the MSE. λ (lambda) is the regularization parameter, which will be covered later.
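
One way to see this trade-off numerically, rather than from the figure alone, is to sweep the regularization strength and watch the test MSE fall and then rise again. The sketch below uses scikit-learn's Ridge, whose alpha parameter plays the role of λ; the data set and the grid of values are my own illustrative choices.

# Illustrative sweep of the regularization parameter: test MSE typically
# decreases and then increases again, tracing the U-shaped curve above.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 30))                              # many features for few samples
true_coef = np.zeros(30)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]                # only 5 features matter
y = X @ true_coef + rng.normal(scale=2.0, size=80)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
for lam in [0.001, 0.1, 1.0, 10.0, 100.0, 1000.0]:
    ridge = Ridge(alpha=lam).fit(X_tr, y_tr)
    print(f"lambda = {lam:>8}: test MSE = {mean_squared_error(y_te, ridge.predict(X_te)):.2f}")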

How do we use shrinkage methods?

The best-known shrinkage methods are Ridge Regression and Lasso Regression, which are often used in place of plain Linear Regression.
Ridge Regression, like Linear Regression, aims to minimize the Residual Sum of Squares (RSS), but with a slight change. As we know, Linear Regression estimates the coefficients as the values that minimize the following quantity:
\mathrm{RSS} = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
Ridge Regression adds a penalty term, weighted by a tuning parameter λ (lambda), that shrinks the coefficients toward 0:

\sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \;=\; \mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2

Ridge Regression's advantage over Linear Regression is that it exploits the bias-variance trade-off: as λ increases, the coefficients shrink further towards 0, which reduces the variance of the estimates at the cost of a small increase in bias.
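
A small sketch of that behaviour, again with scikit-learn (alpha standing in for λ) and made-up data: as λ grows, the ridge coefficients shrink towards zero but are never set exactly to zero.

# Ridge coefficients shrink towards zero as lambda increases, but no
# coefficient becomes exactly zero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(size=60)

for lam in [0.01, 1.0, 10.0, 1000.0]:
    coef = Ridge(alpha=lam).fit(X, y).coef_
    print(f"lambda = {lam:>7}: sum|coef| = {np.abs(coef).sum():.3f}, "
          f"exact zeros = {int(np.sum(coef == 0))}")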

Ridge Regression has a major disadvantage: it keeps all p predictors in the final model regardless of how small their coefficients become, which can be a problem for a model with a huge number of features. This disadvantage is overcome by Lasso Regression, which performs variable selection. Lasso Regression uses an L1 penalty instead of Ridge Regression's L2 penalty; rather than squaring each coefficient, it takes its absolute value, as shown below:

\sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \;=\; \mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|

Ridge Regression brings the coefficient values close to 0, whereas Lasso Regression forces some of them to be exactly 0. It is important to optimize the value of λ in Lasso Regression as well, in order to minimize the MSE.
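
As a sketch of both points, the snippet below fits a lasso whose λ is chosen by cross-validation (scikit-learn's LassoCV) and then counts how many coefficients were driven exactly to zero. The data and the number of informative features are assumptions for illustration only.

# Lasso with lambda chosen by cross-validation; some coefficients end up
# exactly zero, which is the variable selection described above.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
true_coef = np.array([5.0, -4.0, 3.0, 0, 0, 0, 0, 0, 0, 0])   # only 3 features matter
y = X @ true_coef + rng.normal(size=100)

lasso = LassoCV(cv=5).fit(X, y)
print("lambda chosen by cross-validation:", round(lasso.alpha_, 4))
print("coefficients set exactly to zero :", int(np.sum(lasso.coef_ == 0)))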

Take Away

In conclusion, shrinkage methods give us better regression models because adding a penalty term to the RSS reduces the risk of overfitting the data, trading a little extra bias for a large reduction in variance. They also dispel the misconception that if a Linear Regression model predicts with good accuracy on the training set, it will predict with the same accuracy on the test set.
We now know that there are better options than simple Linear Regression, in the form of Ridge Regression and Lasso Regression, which guard against overfitting the data.

Interested in knowing more about such niche techniques? Check out http://research.busigence.com

FAQs

What is the shrinkage method in linear regression?

In the linear regression context, subsetting means choosing a subset of the available variables to include in the model, thus reducing its dimensionality. Shrinkage, on the other hand, means reducing the size of the coefficient estimates. When some coefficients are shrunk all the way to zero, shrinkage can also be seen as a kind of subsetting.

What are the advantages of shrinkage methods?

Adding a penalty term to the RSS reduces the variance of the coefficient estimates, which lowers the risk of overfitting the training data and generally improves prediction accuracy on new data. The regularization parameter λ still needs to be tuned to minimize the MSE.

Are both ridge and lasso regression able to shrink coefficients all the way to zero?

Ridge regression shrinks all regression coefficients towards zero; the lasso tends to give a set of zero regression coefficients and leads to a sparse solution. Note that for both ridge regression and the lasso the regression coefficients can move from positive to negative values as they are shrunk toward zero.

Are shrinkage estimators biased?

Yes. If we shrink an unbiased estimator β̂ by a factor a, the shrunken estimator β̃ = β̂ / a no longer has the true parameter as its expected value, so it is not unbiased anymore. Its variance, however, becomes var(β̃) = var(β̂) / a², so the bigger a gets, the bigger the bias and the smaller the variance.
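
As a short worked version of that answer (the notation σ² for var(β̂) is mine, not the FAQ's):

\mathrm{E}[\tilde{\beta}] = \tfrac{1}{a}\,\mathrm{E}[\hat{\beta}] = \tfrac{\beta}{a},
\qquad
\mathrm{Bias}(\tilde{\beta}) = \tfrac{\beta}{a} - \beta = \beta\,\tfrac{1-a}{a},
\qquad
\mathrm{Var}(\tilde{\beta}) = \tfrac{1}{a^{2}}\,\mathrm{Var}(\hat{\beta}) = \tfrac{\sigma^{2}}{a^{2}}

So as a grows, the bias grows in magnitude while the variance shrinks towards zero.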

Why use lasso instead of ridge regression?

Use Ridge when you have many correlated predictors and want to avoid multicollinearity. Use Lasso when feature selection is crucial or when you want a sparse model.

Why can the lasso shrink coefficients to zero?

The lasso constraint region has "corners"; in two dimensions it is a diamond. If the contours of the residual sum of squares first touch the constraint region at one of these corners, the solution lies on an axis, so the coefficient for the other axis is shrunk exactly to zero.

Why can L1 shrink weights to 0 but not L2?

It turns out they have different but equally useful properties. From a practical standpoint, L1 tends to shrink coefficients to zero whereas L2 tends to shrink coefficients evenly. L1 is therefore useful for feature selection, as we can drop any variables associated with coefficients that go to zero.
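
A quick way to see the difference in practice, using scikit-learn on made-up data (both the penalty strengths and the data are illustrative assumptions):

# The L1-penalized fit (Lasso) produces exact zeros; the L2-penalized fit
# (Ridge) only makes coefficients small.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)       # 2 informative features

l1 = Lasso(alpha=0.5).fit(X, y)
l2 = Ridge(alpha=0.5).fit(X, y)
print("Lasso (L1) exact zeros:", int(np.sum(l1.coef_ == 0)), "of", l1.coef_.size)
print("Ridge (L2) exact zeros:", int(np.sum(l2.coef_ == 0)), "of", l2.coef_.size)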

Can you have a negative shrinkage?

Yes. With no shrinkage, var(ηᵢ) = ω² when var(ηᵢ) is calculated on an infinitely large sample. In practice, var(ηᵢ) is calculated on a limited sample whose size is set by the number of individuals, so its value can by chance be a little bigger than ω², leading to a slightly negative shrinkage estimate.

What is shrinkage in multiple regression?

In statistics, shrinkage is the reduction in the effects of sampling variation. In regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular the value of the coefficient of determination 'shrinks'.
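
A minimal illustration of that effect with scikit-learn (synthetic data, deliberately using few samples and many features so the drop is visible):

# R^2 measured on the data used for fitting is optimistic; on new data it "shrinks".
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 15))                 # few samples, many features
y = X[:, 0] + rng.normal(size=60)             # only one feature is truly informative

X_fit, X_new, y_fit, y_new = train_test_split(X, y, test_size=0.5, random_state=0)
ols = LinearRegression().fit(X_fit, y_fit)
print("R^2 on the fitting data:", round(ols.score(X_fit, y_fit), 3))
print("R^2 on new data        :", round(ols.score(X_new, y_new), 3))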

How do you tell if an estimator is biased or unbiased?

In order for an estimator to be unbiased, its expected value must exactly equal the value of the population parameter. The bias of an estimator is the difference between the expected value of the estimator and the actual parameter value. Thus, if this difference is non-zero, then the estimator has bias.

What is the shrinkage method in ML?

The idea is to shrink some of the parameters towards, or exactly to, zero by adding a penalty term to the optimization objective. It is a good method for improving prediction accuracy, but it is hard to say it improves interpretability when the coefficients are merely made very small rather than set exactly to zero.
