• The assumptions of this regression are the same as those of least squares regression, except that normality is not assumed
• It can shrink some coefficients to exactly zero, which helps with feature selection
• It is a regularization method that uses L1 regularization, i.e. it penalizes the sum of the absolute values of the coefficients
• If a group of predictors is highly correlated, lasso tends to pick only one of them and shrink the others to zero
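
The points above can be illustrated with a minimal sketch. It assumes the common formulation of the lasso objective, (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1 (the same form scikit-learn's Lasso uses), and solves it by cyclic coordinate descent with soft-thresholding. The function names, the toy data, and alpha = 0.5 are all illustrative assumptions, not part of the original notes. Plain Python is used only to keep the example dependency-free; in practice a library implementation would be the usual choice.

```python
import math

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def standardize(column):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]

def lasso_coordinate_descent(columns, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1 (no intercept).

    `columns` is a list of feature columns, each a list of n floats.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # residual = y - Xb, and b starts at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back in
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual)) / n
            z = sum(x * x for x in col) / n
            new_coef = soft_threshold(rho, alpha) / z
            # Keep the residual consistent with the updated coefficient
            delta = new_coef - coefs[j]
            residual = [r - x * delta for x, r in zip(col, residual)]
            coefs[j] = new_coef
    return coefs

# Hypothetical toy data (8 samples), built from mutually orthogonal +/-1 patterns
x1 = [1, 1, 1, 1, -1, -1, -1, -1]          # the predictor that drives y
e = [1, -1, 1, -1, 1, -1, 1, -1]           # perturbation orthogonal to x1
x2 = [a + 0.5 * b for a, b in zip(x1, e)]  # correlated with x1 (correlation about 0.89)
x3 = [1, 1, -1, -1, 1, 1, -1, -1]          # unrelated to y
w = [1, -1, -1, 1, 1, -1, -1, 1]           # small "noise" pattern
y = [3 * a + 0.2 * c for a, c in zip(x1, w)]  # already has mean zero

columns = [standardize(c) for c in (x1, x2, x3)]
coefs = lasso_coordinate_descent(columns, y, alpha=0.5)
print(coefs)  # approximately [2.5, 0.0, 0.0]
```

In this toy setup the coefficient on x1 is shrunk from 3 to about 2.5, while the correlated predictor x2 and the irrelevant predictor x3 are set to exactly zero. Which member of a correlated group survives depends on the data, so this should be read as one illustrative case rather than a general guarantee.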