MLR - Categorical Predictors (09/14)


Notes:

  • Y is numerical (values taken from a continuous scale): MLR
  • Y is categorical: classification (e.g., logistic regression)
  • x can be any data type; the type of x doesn’t affect the modeling approach

Approach:

Dummy encoding in MLR

  • use (k-1) dummy variables for a predictor with k categories
  • when a dummy variable equals 1, the obs is in that category; when all dummies are 0, the obs is in the reference category
  • ex:
    • car brand: Toyota, Honda, Ford (k = 3, so 2 dummies)

Dummy Encoding: For dummy encoding, we’ll convert the “Car Brand” categorical variable into separate binary (0 or 1) columns for each brand. We’ll use (k-1) dummy variables, where k is the number of categories. This avoids the “dummy variable trap”, which can cause multicollinearity issues in MLR.

Model: Price = β0 + β1 · is_Toyota + β2 · is_Honda + β3 · Horsepower + ε

Using the above principle (a code sketch of this encoding and fit follows the list below):

  • Toyota: is_Toyota = 1, is_Honda = 0

  • Honda: is_Toyota = 0, is_Honda = 1

  • Ford: is_Toyota = 0, is_Honda = 0 (reference level)

  • If both is_Toyota and is_Honda are 0, the car is implicitly a Ford.

  • The coefficients β1 and β2 capture the average price differences between Toyota and Ford, and Honda and Ford, respectively, after accounting for horsepower.

  • The coefficient β3 captures the price change for a one-unit increase in horsepower, holding brand constant.

  • class notes (09/14, page 4)
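
A minimal sketch of the car-brand example in Python (pandas + statsmodels), assuming a small made-up dataset; the column and variable names just mirror the example above.

```python
# Minimal sketch of the car-brand example, using a tiny made-up dataset.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "Brand": ["Toyota", "Honda", "Ford", "Toyota", "Honda", "Ford"],
    "Horsepower": [130, 140, 150, 170, 160, 180],
    "Price": [22000, 23000, 21000, 26000, 25500, 24000],
})

# k = 3 brands -> k - 1 = 2 dummy columns.  drop_first drops the first level
# alphabetically (Ford), which becomes the reference level and avoids the
# dummy variable trap.
dummies = pd.get_dummies(df["Brand"], prefix="is", drop_first=True).astype(int)

X = sm.add_constant(pd.concat([dummies, df["Horsepower"]], axis=1))
model = sm.OLS(df["Price"], X).fit()

# Coefficients on is_Toyota / is_Honda are average price offsets from Ford,
# holding Horsepower constant; the Horsepower coefficient is the per-unit
# price change holding brand constant.
print(model.summary())
```

Note that get_dummies picks the reference level alphabetically here; to choose a different baseline, pass the column as a pd.Categorical with an explicit category order.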


t test:

test for H0: βj = 0 vs. Ha: βj ≠ 0, for each dummy coefficient

If H0 is rejected: the mean Y value of that category is significantly different from the mean Y value of the reference level. From the example, if is_Toyota has a significant coefficient, it would mean that being a Toyota (rather than the reference, Ford) has a significant impact on the car’s price after accounting for the other variables in the model.

If we fail to reject:

  • The category represented by that coefficient does not have a statistically significant difference in mean outcome compared to the reference category.

  • If the coefficient for is_Toyota is not statistically significant, it suggests that, after adjusting for the other predictors, there is no significant evidence to claim that Toyota cars have a different average price than the reference category (e.g., Ford).

  • Not rejecting the null hypothesis can still offer insight about the relationship between the categorical predictor and the outcome: at least in the context of the current dataset and model, that category doesn’t appear to influence the outcome any differently than the reference category does (a sketch for reading these t-test p-values off the fitted model follows).


F test: used to test the categorical predictor as a whole. Given the fitted model, test H0: β1 = β2 = ⋯ = β(k-1) = 0 (all dummy coefficients are zero) vs. Ha: at least one βj ≠ 0.

Type 1: sequential sums of squares, where each term is tested in the order it enters the model.

Type 2: partial sums of squares, where each term is tested after adjusting for all the other terms (a code sketch comparing the two follows).
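
Assuming “Type 1” and “Type 2” above refer to Type I (sequential) and Type II (partial) sums of squares, here is a sketch of both F tables via statsmodels’ anova_lm, continuing with the same df as before; the formula interface re-encodes Brand internally, with Ford again as the reference.

```python
# Continuing with the same df: F tests for the categorical predictor as a whole.
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

fit = smf.ols("Price ~ C(Brand) + Horsepower", data=df).fit()

print(anova_lm(fit, typ=1))  # Type I: sequential SS, order of terms matters
print(anova_lm(fit, typ=2))  # Type II: each term adjusted for the others
```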


Notes:

  • Don’t simply drop a dummy variable just because you fail to reject H0 (same caveat as for any other predictor: failing to reject is not proof of no relationship)
  • MLR does not handle too many predictors well
    • more predictors means:
      • more data needed to support estimation
      • more parameters to estimate
      • a more complex model
  • When a predictor has too many categories:
    • regroup into fewer categories (see the sketch below)
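
One possible way to regroup a high-cardinality categorical predictor: keep the most frequent levels and lump the rest into an “Other” bucket. The keep=5 cutoff and the “Model” column name below are just placeholders for illustration.

```python
# Sketch: collapse rare levels of a categorical column into "Other".
import pandas as pd

def regroup_rare_levels(s: pd.Series, keep: int = 5, other: str = "Other") -> pd.Series:
    """Keep the `keep` most frequent levels; replace everything else with `other`."""
    top = s.value_counts().nlargest(keep).index
    return s.where(s.isin(top), other)

# Hypothetical usage: a "Model" column with dozens of levels collapses to at
# most keep + 1 levels, so dummy encoding needs only `keep` columns.
# df["Model_grouped"] = regroup_rare_levels(df["Model"], keep=5)
```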
