Linear Regression Models Main
Notes and Ideas:
- Simple Linear Regression
- Multiple Linear Regression
- Piecewise Regression
- Standardized Regression
- Weighted Least Squares Regression
- Polynomial Regression
- LOWESS (Local Regression)
- Least Mean Square Regression
- Robust Regression
- General Linear Model
- Non-Linear Model:
Definition:
Linear regression models the target as a linear combination of the features; this is the basic form of the General Linear Model.
Function:
- $$\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$
- From the equation above, we have the linear model based on the n features. Considering only a single feature, $w$ will be the slope and $b$ will represent the intercept.
Cost Function:
- We are looking to optimize $w$ and $b$ such that they minimize the cost function. Assume the dataset has M instances and p features:
- $$J(w, b) = \sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2 = \sum_{i=1}^{M}\left(y_i - \sum_{j=1}^{p} w_j x_{ij} - b\right)^2$$
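A minimal NumPy sketch of this model and its cost (the toy data and variable names are illustrative assumptions, not from the original notes):

```python
import numpy as np

# Toy data: M = 5 instances, p = 2 features (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 4.2, 7.8, 9.1, 11.0])

def predict(X, w, b):
    """Linear model: y_hat = X @ w + b."""
    return X @ w + b

def cost(X, y, w, b):
    """Residual sum of squares, the cost being minimized."""
    residuals = y - predict(X, w, b)
    return np.sum(residuals ** 2)

w = np.array([1.0, 1.0])  # initial slopes
b = 0.0                   # initial intercept
print(cost(X, y, w, b))
```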
Linear Regression Assumptions:
- Ordinary Least Squares: we are looking for the coefficients that minimize the residual sum of squares defined above.
- Linear relationship between the predictors and the target variable, meaning the pattern must be in the form of a straight line (or a hyperplane in the case of multiple linear regression)
- This assumption is validated if there is no discernible nonlinear pattern in the residual plot (see the diagnostic sketch after this list). Let’s consider the following example
- Example:
- In the above case, the assumption is violated since a U-shaped pattern is apparent. In other words, the true relationship is nonlinear.
- Homoscedasticity, i.e., constant variance of the residuals
- This assumption is validated if the residuals are scattered evenly (about the same distance) with respect to the zero-horizontal line throughout the x-axis in the residual plot.
- Example:
- In the below case, the assumption is violated since the variance gets smaller at larger fitted values
- Independent observations; this is actually equivalent to independent residuals
- This assumption is validated if there is no discernible pattern between several consecutive residuals in the residual plot.
- Example:
- In the below case, the assumption is violated since there are discernible patterns (both are linear with a negative slope) between consecutive residuals
- Normality of residuals, i.e., the residuals follow the normal distribution
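A minimal residual-diagnostics sketch for checking these assumptions visually, assuming statsmodels and matplotlib are available (the simulated data is purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Illustrative data: one predictor plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 200)

# Fit OLS and inspect residuals against fitted values.
model = sm.OLS(y, sm.add_constant(x)).fit()
residuals = model.resid
fitted = model.fittedvalues

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Linearity / homoscedasticity: residuals should scatter evenly around zero.
ax1.scatter(fitted, residuals, s=10)
ax1.axhline(0, color="red")
ax1.set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs fitted")
# Normality: points should hug the 45-degree line in a Q-Q plot.
sm.qqplot(residuals, line="45", fit=True, ax=ax2)
ax2.set_title("Normal Q-Q")
plt.tight_layout()
plt.show()
```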
Find the Best Model:
- see Model Selection
Apply a Penalty (Ridge and Lasso regression are simple techniques to reduce model complexity and prevent the over-fitting that may result from simple linear regression):
- Ridge Regression
Ridge Regression (L2 Penalty)
Definition and Ideas:
- In Ridge Regression, the cost function is altered by adding a penalty equivalent to the square of the magnitude of the coefficients:
- $$J(\beta) = \sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$
- This is equivalent to saying we minimize the cost function in the equation above under the condition that, for some c > 0, $\sum_{j=1}^{p}\beta_j^2 \le c$.
- Ridge regression puts a constraint on the coefficients ($\beta_j$). The penalty term ($\lambda$) regularizes the coefficients such that if the coefficients take large values, the optimization function is penalized.
- We take the correlation matrix and add a constant $e$ to its diagonal:
- $$\begin{vmatrix} 1+e & & & \\ & 1+e & & \\ & & 1+e & \\ & & & 1+e \end{vmatrix}$$
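For reference, the "add a constant to the diagonal" idea above corresponds to the standard closed-form ridge estimator (a well-known result, stated here as context rather than taken from the notes):
$$\hat{\beta}^{\text{ridge}} = \left(X^{\top}X + \lambda I\right)^{-1} X^{\top} y$$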
Issues with Ridge Regression:
- While it can have better prediction error than linear regression, it works best when there is a subset of the true coefficients that are small or zero.
- It will never set coefficients to zero exactly, and therefore cannot perform variable selection in the linear model.
Apply Ridge Regression:
R Code:
Python Code:
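A minimal scikit-learn sketch for the Python case; the data, the alpha value, and the pipeline setup are illustrative assumptions rather than the notes' original code:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative data: 100 instances, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.5, 0.0, -2.0, 0.0]) + rng.normal(0, 0.5, 100)

# alpha is the penalty strength (lambda in the cost function above);
# standardizing features first is common practice for penalized regression.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)

print("coefficients:", model.named_steps["ridge"].coef_)
print("intercept:", model.named_steps["ridge"].intercept_)
```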
Link to original
- Lasso Regression
Lasso Regression (L1 Penalty)
Definition and Ideas:
- In Lasso Regression, the cost function for lasso (least absolute shrinkage and selection operator) regression can be written as:
- $$J(\beta) = \sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$$
- This is equivalent to minimizing the cost function above under the condition that, for some t > 0, $\sum_{j=1}^{p}|\beta_j| \le t$.
- This type of regularization not only helps in reducing over-fitting, but it can also help us with Feature Selection.
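A minimal scikit-learn sketch of lasso's feature-selection effect (the data and the alpha value are illustrative assumptions, not from the original notes):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Illustrative data: only 2 of the 5 features truly matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(0, 0.5, 100)

# The L1 penalty drives some coefficients exactly to zero,
# which is why lasso can act as a feature selector.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)
print("coefficients:", model.named_steps["lasso"].coef_)
```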
Link to original