In both examples, linear regression assumes a straight-line relationship between variables, while nonlinear regression introduces more complex equations that capture nonlinear patterns.
Example 1.1 Continued - Coffee Sales and Shelf Space
For the coffee data, we observe the following summary statistics in Table 2.
Table 2: Summary Calculations - Coffee sales data
From this, we obtain the following sums of squares and cross-products.
From these, we obtain the least squares estimate of the true linear regression relation.
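The coffee-specific numbers come from Table 2; in general notation (the standard formulas, with the Table 2 values to be plugged in), the quantities are:

$$SS_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad SS_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}),$$

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}, \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$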
The key assumptions (conditions) that need to be satisfied for the Gauss-Markov theorem to hold:
Linearity in Parameters: The regression model is linear in parameters. In the context of a simple linear regression, this can be represented as $y = \beta_0 + \beta_1 x + \varepsilon$, where $y$ is the dependent variable, $x$ is the independent variable, and $\varepsilon$ is the error term.
Random Sampling: The data (observations) are obtained using a random sampling process.
No Perfect Collinearity: For the multiple regression case, no independent variable is a perfect linear function of other explanatory variables. In simple linear regression, this is trivially satisfied since there’s only one independent variable.
Zero Conditional Mean: The expected value of the error term, given the independent variable(s), is zero. Mathematically, this is represented as $E[\varepsilon \mid x] = 0$. This implies that any variation in the error term is not systematically related to the independent variable(s).
Homoscedasticity: The variance of the error term is constant across all values of the independent variable(s). Formally, $\mathrm{Var}(\varepsilon \mid x) = \sigma^2$. This means that the spread of the residuals remains constant as the value of the independent variable(s) changes.
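As a rough illustration of these conditions, here is a minimal sketch (assuming numpy and statsmodels are available; the data are simulated for illustration, not the coffee data) that generates data satisfying them and fits an ordinary least squares line:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data satisfying the conditions above (illustrative values only):
# linear mean function, independent random draws, E[eps | x] = 0, constant error variance.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)        # a single predictor, so no collinearity concern
eps = rng.normal(0.0, 2.0, size=n)    # zero-mean, homoscedastic errors
y = 3.0 + 1.5 * x + eps               # "true" line: beta0 = 3, beta1 = 1.5

# Ordinary least squares fit; the estimates should land near (3, 1.5).
fit = sm.OLS(y, sm.add_constant(x)).fit()
print(fit.params)   # [intercept_hat, slope_hat]
print(fit.bse)      # standard errors of the estimates
```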
Residuals in a statistical or machine learning model are the differences between observed and predicted values of the data, $e_i = y_i - \hat{y}_i$. They are a diagnostic measure used when assessing the quality of a model, and are sometimes loosely called errors.
Each residual $e_i$ is an estimate of the corresponding error term $\varepsilon_i$; for a least squares fit with an intercept, $E[e_i] = 0$ and the residuals sum to zero, so they are unbiased.
Note:
$\hat{\beta}_1$ is an unbiased estimator of $\beta_1$: $E[\hat{\beta}_1] = \beta_1$.
Sampling distribution of $\hat{\beta}_1$:
Normally distributed (when the errors are normal), with $\hat{\beta}_1 \sim N\!\left(\beta_1, \; \sigma^2 / SS_{xx}\right)$.
Out-of-sample error:
Given $x_{new}$ (new data), the prediction is $\hat{y}_{new} = \hat{\beta}_0 + \hat{\beta}_1 x_{new}$, and the out-of-sample (prediction) error is $y_{new} - \hat{y}_{new}$.
Properties: it has mean zero and variance $\sigma^2\!\left(1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{SS_{xx}}\right)$.
Sampling distribution of $y_{new} - \hat{y}_{new}$:
Normally distributed (when the errors are normal).
Use the $t$ distribution with $n-2$ degrees of freedom, with $s^2 = SSE/(n-2)$ in place of the unknown $\sigma^2$, for inference and prediction intervals.
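Putting these pieces together (a standard result, stated here for completeness), the $(1-\alpha)$ prediction interval for a new observation is:

$$\hat{y}_{new} \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{SS_{xx}}}, \qquad s = \sqrt{\frac{SSE}{n-2}}.$$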
Residuals are important when determining the quality of a model. We can examine residuals in terms of their magnitude and/or whether they form a pattern.
Where the residuals are all 0, the model predicts perfectly. The further the residuals are from 0, the less accurate the model. In the case of linear regression, the greater the sum of squared residuals, the smaller the R-squared statistic, all else being equal (see the identity below).
Where the average residual is not 0, it implies that the model is systematically biased (i.e., consistently over- or under-predicting).
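The link between the squared residuals and R-squared mentioned above is the standard decomposition (a general identity, not specific to the coffee data):

$$R^2 = 1 - \frac{SSE}{SST}, \qquad SSE = \sum_{i=1}^{n} e_i^2, \qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$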
Residual plot:
In the case of simple linear regression (regression with 1 predictor), we set the predictor as the x-axis and the residual as the y-axis
In the case of multiple linear regression (regression with >1 predictor), we set the fitted value as the x-axis and the residual as the y-axis
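A minimal plotting sketch along these lines, assuming matplotlib and the same kind of simulated data and fitted model as in the earlier sketch (all names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated data and OLS fit, as in the earlier sketch (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, size=200)
fit = sm.OLS(y, sm.add_constant(x)).fit()

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# SLR convention: predictor on the x-axis, residuals on the y-axis.
axes[0].scatter(x, fit.resid, s=10)
axes[0].axhline(0, color="red", linewidth=1)
axes[0].set(xlabel="predictor x", ylabel="residual", title="Residuals vs predictor")

# Multiple-regression convention: fitted values on the x-axis, residuals on the y-axis.
axes[1].scatter(fit.fittedvalues, fit.resid, s=10)
axes[1].axhline(0, color="red", linewidth=1)
axes[1].set(xlabel="fitted value", ylabel="residual", title="Residuals vs fitted")

plt.tight_layout()
plt.show()
```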
The F test and t test are equivalent in SLR -> the p-value of F equals the p-value of t, and the test statistics are related by $F = t^2$.
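A quick numerical check of this identity, reusing the kind of simulated data from the earlier sketch (statsmodels assumed available; names and values are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# Same simulated data as in the earlier sketch (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 1.5 * x + rng.normal(0.0, 2.0, size=200)
fit = sm.OLS(y, sm.add_constant(x)).fit()

# In SLR, the overall F statistic is the square of the slope's t statistic,
# and their p-values coincide.
print(fit.tvalues[1] ** 2, fit.fvalue)
print(fit.pvalues[1], fit.f_pvalue)
```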
The F test is used to compare a “shorter” (reduced) model with a “longer” (full) model:
$SSE_{reduced} \ge SSE_{full}$: adding predictors to a linear regression model never increases the SSE, so on the training data SSE (and the RMSE computed from it) always favors the model with the most predictors.
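The test statistic for this comparison (the standard nested-model F test) is:

$$F = \frac{\left(SSE_{reduced} - SSE_{full}\right) / \left(df_{reduced} - df_{full}\right)}{SSE_{full} / df_{full}},$$

where $df$ denotes the error degrees of freedom of each model; large values of $F$ favor the longer (full) model.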
Lower values of RMSE indicate a better fit of the model to the data, while higher values indicate a poorer fit. However, by itself, RMSE doesn’t tell you whether the model’s predictions are biased. It merely indicates the magnitude of the error.
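For reference, RMSE here is the root mean squared error of the predictions (the standard definition, stated in the notation used above):

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}.$$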