Testing:


  1. Global ANOVA F-test: is the model significant overall?
  2. t-test for an individual slope $\beta_j$
  3. General Linear / Partial F-test

Global:

  • “Fit the full model” to the data:
    • Obtain the least squares estimates of $\beta_0, \beta_1, \ldots, \beta_k$.
    • Determine the error sum of squares, which we denote “SSE(F).”
  • “Fit the reduced model” to the data:
    • Obtain the least squares estimate of $\beta_0$.
    • Determine the error sum of squares, which we denote “SSE(0).”

$$H_0: y=\beta_0+\varepsilon \ \text{(Null Model)} \qquad H_1: y=\beta_0+\beta_1 x_1+\cdots+\beta_k x_k+\varepsilon \ \text{(The model you choose to fit)}$$

Equivalently, $H_0: \beta_1=\beta_2=\cdots=\beta_k=0$ vs. $H_1:$ at least one $\beta_j \neq 0$, for $j=1,\ldots,k$

note: $H_1$ says that at least one predictor is a significant predictor of $Y$

ANOVA Table

| Source     | df      | SS  | MS                | F             |
|------------|---------|-----|-------------------|---------------|
| Regression | $k$     | SSR | $MSR = SSR/k$     | $F = MSR/MSE$ |
| Error      | $n-k-1$ | SSE | $MSE = SSE/(n-k-1)$ |             |
| Total      | $n-1$   | SST |                   |               |

Reject $H_0$ if $F > F_{\alpha,\,k,\,n-k-1}$ or p-value $< \alpha$. (If $H_0$ is rejected, the model is significantly better than the null model; the test does not tell you which predictors are significant.)

Visual comparison of the reduced model and the full model

Full model: $y=\beta_0+\beta_1 x_1+\cdots+\beta_k x_k+\varepsilon$ &nbsp;&nbsp; Reduced model: $y=\beta_0+\varepsilon$

note: SSE(Reduced) $\geq$ SSE(Full). Adding predictors to a linear regression model never increases the SSE, so on the training data RMSE always selects the model with the most predictors.

Lower values of RMSE indicate a better fit of the model to the data, while higher values indicate a poorer fit. However, by itself, RMSE doesn’t tell you whether the model’s predictions are biased. It merely indicates the magnitude of the error.
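The full-vs-reduced comparison above can be verified by hand for simple linear regression. A minimal pure-Python sketch with made-up data (the numbers are hypothetical; no libraries assumed):

```python
# Global F-test sketch for SLR: compare SSE of the reduced (intercept-only)
# model against SSE of the full model. Toy data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least squares estimates for the full model y = b0 + b1*x
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Reduced (null) model fits ybar, so SSE(0) = SST
sse_0 = sum((yi - ybar) ** 2 for yi in y)
sse_f = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

k = 1                        # number of predictors in the full model
msr = (sse_0 - sse_f) / k    # = SSR / k
mse = sse_f / (n - k - 1)
F = msr / mse
print(f"SSE(0)={sse_0:.3f}  SSE(F)={sse_f:.3f}  F={F:.1f}")
```

With this data the fit is very tight, so F is huge and $H_0$ would be rejected at any common $\alpha$.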


t-test for an individual coefficient ($\beta_j$):

In SLR:

$$t = \frac{b_1 - 0}{se(b_1)}, \qquad se(b_1)=\sqrt{\frac{MSE}{S_{xx}}}$$

Testing $H_0: \beta_1 = 0$ is testing for feature importance.

Test set up: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$

Rejection Rule: Reject $H_0$ when $|t| > t_{\alpha/2,\,n-2}$ or p-value $< \alpha$. If $H_0$ is rejected, then $X$ is a significant factor of $y$.

Global t-test: in SLR, the t-test for $\beta_1$ is equivalent to the global F-test ($t^2 = F$).

In MLR, the t-test for $\beta_j$ checks the importance of $x_j$:

$$t = \frac{b_j - 0}{se(b_j)}$$

In matrix form: $\mathbf{b}=(X^{\top}X)^{-1}X^{\top}y$ and $se(b_j)=\sqrt{MSE\,[(X^{\top}X)^{-1}]_{jj}}$

Reject $H_0$ if $|t| > t_{\alpha/2,\,n-k-1}$ or p-value $< \alpha$.

If $H_0$ is rejected, $y$ changes significantly (by $\beta_j$) when $x_j$ increases by one unit, given that all other predictors stay the same; thus $x_j$ is a significant predictor of $y$ given all the other predictors in the model.
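The slope t-statistic can be computed the same way by hand. A pure-Python sketch with hypothetical data, using the standard SLR formula $se(b_1)=\sqrt{MSE/S_{xx}}$:

```python
import math

# t-test for the slope in SLR, on toy data (illustration only).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                  # error df = n - 2 in SLR
se_b1 = math.sqrt(mse / sxx)
t = (b1 - 0) / se_b1                 # test statistic for H0: beta1 = 0
print(f"b1={b1:.3f}  se(b1)={se_b1:.4f}  t={t:.1f}")
# In SLR, t^2 equals the global F statistic.
```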


ANOVA Test

sequential sum of squares

  • sequential sum of squares (Type I):
    • definition:
      • It is the reduction in the error sum of squares (SSE) when one or more predictor variables are added to the model.
      • Or, it is the increase in the regression sum of squares (SSR) when one or more predictor variables are added to the model.
    • note: Sequential sum of squares (Type I) breaks down the variability explained by each predictor as they’re added to the model. For each predictor, we assess how much additional variance it explains beyond preceding predictors. However, in cases of multicollinearity, the order of adding predictors can significantly influence the results.
    • Theory model: $y=\beta_0+\beta_1 x_1+\beta_2 x_2+\varepsilon$, with predictors entered in the stated order.

For a test using $ss(x_2 \mid x_1)$:

$$ss(x_2 \mid x_1) = SSE(x_1) - SSE(x_1, x_2), \qquad F = \frac{ss(x_2 \mid x_1)/1}{MSE(x_1, x_2)}$$

  • In this example, $SSE(x_1)$ denotes the error sum of squares when $x_1$ is the only predictor in the model, and $SSE(x_1, x_2)$ denotes the error sum of squares when $x_1$ and $x_2$ are both in the model.
  • Rejection of $H_0$ indicates the significance of $x_2$ given the predictors entered before it.
note:
  • All sequential SS are calculated in the above way; therefore, changing the order of the predictors in the model will CHANGE THE RESULT!

$$\begin{array}{ll} y \sim x_1+x_2+x_3 & ss\left(x_2\right)=ss\left(x_2 \mid x_1\right) \\ y \sim x_3+x_2+x_1 & ss\left(x_2\right)=ss\left(x_2 \mid x_3\right) \\ y \sim x_1+x_3+x_2 & ss\left(x_2\right)=ss\left(x_2 \mid x_1, x_3\right) \end{array}$$

- When you change the order, you can see that the "importance" of the predictors changes.
- SSE in the table is the SSE of the full model ($y \sim$ all the predictors in the table).
- The Type I sequential SS decompose SST exactly: $ss(x_1)+ss(x_2 \mid x_1)+ss(x_3 \mid x_1, x_2)+SSE = SST$.

### Partial sum of squares:

- definition:
  - We wish to know what percent of the variation _not_ explained by $x_1$ _is_ explained by $x_2$ and $x_3$. In other words, given $x_1$, what additional percent of the variation can be explained by $x_2$ and $x_3$?
- Partial sum of squares (Type = 2):

$$\begin{aligned} & x_1: \operatorname{ss}\left(x_1 \mid x_2, x_3\right) \quad H_0: y=\beta_0+\beta_2 x_2+\beta_3 x_3+\varepsilon \quad H_1: y=\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3+\varepsilon \\ & x_2: \operatorname{ss}\left(x_2 \mid x_1, x_3\right) \\ & x_3: \operatorname{ss}\left(x_3 \mid x_1, x_2\right) \end{aligned}$$

- Note: unlike the Type I decomposition, $ss(x_1 \mid x_2, x_3)+ss(x_2 \mid x_1, x_3)+ss(x_3 \mid x_1, x_2)+SSE \neq SST$ in general.
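The order dependence of Type I sums of squares can be demonstrated numerically. A self-contained sketch on hypothetical data; the tiny normal-equations solver is written only for illustration, not as a production OLS routine:

```python
# Type I (sequential) SS: ss(x2 | x1) = SSE(x1) - SSE(x1, x2).
# With correlated predictors, ss(x2) depends on entry order.

def fit_sse(X_cols, y):
    """SSE of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in X_cols] for i in range(n)]
    p = len(X[0])
    # Normal equations A b = c, with A = X'X and c = X'y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(A[r][j]))
        A[j], A[piv] = A[piv], A[j]
        c[j], c[piv] = c[piv], c[j]
        for r in range(j + 1, p):
            f = A[r][j] / A[j][j]
            for k in range(j, p):
                A[r][k] -= f * A[j][k]
            c[r] -= f * c[j]
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (c[j] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    fitted = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    return sum((y[i] - fitted[i]) ** 2 for i in range(n))

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]     # strongly correlated with x1, so order matters
y  = [3.1, 2.8, 7.0, 6.1, 10.9, 9.8, 14.2, 13.1]

sst = fit_sse([], y)              # intercept-only model: SSE = SST
ss_x2_given_x1 = fit_sse([x1], y) - fit_sse([x1, x2], y)   # x2 entered second
ss_x2_alone    = sst - fit_sse([x2], y)                    # x2 entered first
print(ss_x2_given_x1, ss_x2_alone)   # different values: order changes Type I SS
```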

Summary:

  1. t-test for Coefficients
  • Test if each coefficient is significantly different from zero.
  • The coefficients:

$$\mathbf{b} = (X^{\top}X)^{-1}X^{\top}y$$

with variance $\operatorname{Var}(\mathbf{b}) = \sigma^2 (X^{\top}X)^{-1}$ are assumed normally distributed.

  • t-statistic formula for $b_j$:

$$t = \frac{b_j}{se(b_j)}$$

where $se(b_j) = \sqrt{MSE\,[(X^{\top}X)^{-1}]_{jj}}$

  • Hypothesis: $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$
  • Decision Rule: Reject $H_0$ if $|t| > t_{\alpha/2,\,n-k-1}$

or if p-value $< \alpha$.


  1. Partial F-test (Type II)
  • Compares the fit of the chosen (full) model to a reduced/null model.
  • Hypothesis:
  • $H_0: y = \beta_0 + \varepsilon$ (Null Model)
  • $H_1: y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$ (Chosen Model)
  • F-statistic formula:

$$F = \frac{\left(SSE(R) - SSE(F)\right)/(df_R - df_F)}{SSE(F)/df_F}$$

  • Decision Rule: Reject $H_0$ if $F > F_{\alpha,\,df_R - df_F,\,df_F}$

or if p-value $< \alpha$.

  • Note: Rejecting $H_0$ indicates the chosen model is significantly better than the null model. However, it does not identify which predictors are significant.
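A quick arithmetic sketch of the formula, with hypothetical SSE and degrees-of-freedom values (not from any real data set):

```python
# Partial / general-linear F statistic on made-up numbers.
sse_r, df_r = 500.0, 12   # reduced model: SSE(R) and its error df
sse_f, df_f = 300.0, 10   # full model:    SSE(F) and its error df

F = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(F)  # (200/2) / (300/10) = 100/30 ≈ 3.33
```

Compare this F to $F_{\alpha,\,2,\,10}$ (or use its p-value) to decide whether the two extra predictors are jointly significant.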

  1. Analysis of Variance (ANOVA) in Regression
  • Type I (Sequential F-test): Tests the significance of predictors in the order they are entered into the model.
  • Type II (Partial F-test): Tests each predictor’s significance after controlling for the other predictors.
  • Type III (Marginal F-test): Tests the significance of each predictor after controlling for all other terms in the model, including interactions and higher-order terms. This test is particularly useful when interactions are present, since each term gets a test that does not depend on entry order.

reference:
- Class notes
- https://online.stat.psu.edu/stat501/lesson/6/6.3#:~:text=What%20is%20a%20%22sequential%20sum,are%20added%20to%20the%20model.
- https://stats.stackexchange.com/questions/20452/how-to-interpret-type-i-type-ii-and-type-iii-anova-and-manova