MLR - Categorical Predictors (09/14)
Notes:
- Y is numerical (values taken from a continuous scale): MLR
- Y is categorical: classification, e.g. logistic regression
- x can be any data type; the type of x doesn’t affect the modeling approach
Approach:
Dummy encoding in MLR
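- a categorical predictor with k categories needs (k-1) dummy variables; the remaining category is the reference
- when a dummy variable equals 1, the obs is in the corresponding category; when all dummies are 0, the obs is in the reference category
- ex: car brand (Toyota, Honda, Ford)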
Dummy Encoding:
For dummy encoding, we’ll convert the “Car Brand” categorical variable into separate binary (0 or 1) columns, one for each brand except the reference brand.
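We’ll use “Ford” as the reference category, so the model is
$$\text{Price} = \beta_0 + \beta_1\,\text{is\_Toyota} + \beta_2\,\text{is\_Honda} + \beta_3\,\text{Horsepower} + \varepsilon$$
Using the above principle:
- Toyota: is_Toyota = 1, is_Honda = 0
- Honda: is_Toyota = 0, is_Honda = 1
- Ford: is_Toyota = 0, is_Honda = 0
- If both is_Toyota and is_Honda are 0, the car is implicitly a Ford.
- The coefficients $\beta_1$ (on is_Toyota) and $\beta_2$ (on is_Honda) capture the average price differences between Toyota and Ford, and Honda and Ford, respectively, after accounting for horsepower.
- The coefficient $\beta_3$ (on Horsepower) captures the price change for a one-unit increase in horsepower, holding brand constant.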
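The encoding and the interpretation of the coefficients can be checked with a small sketch. Everything here is an assumption for illustration: the six cars, the horsepower values, and the "true" coefficients used to generate noise-free prices are made up, and the fit uses NumPy's least-squares solver rather than any particular tool from class.

```python
import numpy as np

# Toy data, invented for illustration: (brand, horsepower).
cars = [("Toyota", 130), ("Toyota", 150), ("Honda", 120),
        ("Honda", 160), ("Ford", 110), ("Ford", 140)]

# Assumed coefficients used to generate noise-free prices.
TRUE = {"intercept": 5000.0, "is_Toyota": 3000.0, "is_Honda": 2000.0, "hp": 100.0}

def encode(brand):
    """k-1 dummy encoding with Ford as the reference: (is_Toyota, is_Honda)."""
    return (1.0 if brand == "Toyota" else 0.0,
            1.0 if brand == "Honda" else 0.0)

rows, prices = [], []
for brand, hp in cars:
    is_toyota, is_honda = encode(brand)
    rows.append([1.0, is_toyota, is_honda, float(hp)])  # leading 1.0 = intercept column
    prices.append(TRUE["intercept"] + TRUE["is_Toyota"] * is_toyota
                  + TRUE["is_Honda"] * is_honda + TRUE["hp"] * hp)

X = np.array(rows)
y = np.array(prices)

# Ordinary least squares fit of Price on the dummies and horsepower.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_toyota, b_honda, b_hp = beta

print("Toyota vs Ford at equal horsepower:", round(b_toyota, 2))
print("Honda vs Ford at equal horsepower: ", round(b_honda, 2))
print("Price change per unit of horsepower:", round(b_hp, 2))
```

Because the prices are generated without noise, the fit recovers the assumed coefficients, which makes it easy to see that the dummy coefficients are differences from Ford and not price levels of their own.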
class notes (0914, page 4)
t test:
If is_Toyota has a significant coefficient, it means that being a Toyota rather than the reference brand (Ford) is associated with a significantly different average price, after accounting for the other variables in the model.
If we fail to reject:
- The category represented by that coefficient does not have a statistically significant difference in the mean outcome when compared to the reference category.
- If the coefficient for is_Toyota is not statistically significant, it suggests that, after adjusting for other predictors, there’s no significant evidence that Toyota cars have a different average price than the reference category (e.g., Ford).
- Not rejecting the null hypothesis can still offer insight into the relationship between the categorical predictor and the outcome. It tells us that, at least in the context of the current dataset and model, the specific category doesn’t appear to influence the outcome any differently than the reference category does.
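A minimal sketch of the t test on the dummy coefficients, assuming simulated data: 10 cars per brand, a Toyota vs Ford difference of 3000, no Honda vs Ford difference, and normally distributed noise. All of these numbers are made up for illustration. The t statistic is the estimated coefficient divided by its standard error, compared here against a tabulated 5% critical value.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
n_per_brand = 10
brands = np.repeat(["Toyota", "Honda", "Ford"], n_per_brand)
hp = rng.integers(100, 200, size=brands.size).astype(float)

# k-1 dummies with Ford as the reference category.
is_toyota = (brands == "Toyota").astype(float)
is_honda = (brands == "Honda").astype(float)

# Simulated prices: Toyota differs from Ford by 3000, Honda does not differ.
noise = rng.normal(0.0, 500.0, size=brands.size)
price = 5000 + 3000 * is_toyota + 0 * is_honda + 100 * hp + noise

X = np.column_stack([np.ones(brands.size), is_toyota, is_honda, hp])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

resid = price - X @ beta
dof = X.shape[0] - X.shape[1]     # n minus number of estimated coefficients
sigma2 = resid @ resid / dof      # unbiased estimate of the error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se               # tests H0: coefficient = 0

T_CRIT = 2.056  # two-sided 5% critical value for 26 df, from a standard t table
names = ["intercept", "is_Toyota", "is_Honda", "horsepower"]
for name, b, t in zip(names, beta, t_stats):
    verdict = "reject H0" if abs(t) > T_CRIT else "fail to reject H0"
    print(f"{name:>10}: coef={b:9.1f}  t={t:7.2f}  -> {verdict}")
```

Since the simulated Honda difference is zero, its t statistic will usually, though not always, fall below the critical value. That outcome means "no evidence of a difference from Ford in this sample", not proof that the difference is exactly zero.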
F test: Given:
Type1:
Type2:
Notes:
- don’t simply drop a dummy variable because you fail to reject (same as relationship); dropping it merges that category into the reference category and changes what the remaining coefficients mean
- MLR does not handle too many predictors well
- more predictors means:
  - more data needed to support the estimation
  - more parameters to estimate
  - a more complex model
- when a predictor has too many categories:
  - regroup them into fewer categories
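One possible way to regroup is to merge rarely observed categories into a single catch-all level before dummy encoding. The sketch below assumes a count threshold `min_count` and a label `"Other"`. Both are hypothetical choices, as are the extra brands in the toy list, and the notes do not prescribe a specific regrouping rule.

```python
from collections import Counter

def regroup_rare(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Toy data, invented for illustration.
brands = ["Toyota", "Toyota", "Toyota", "Honda", "Honda",
          "Ford", "Ford", "Kia", "Saab"]
grouped = regroup_rare(brands, min_count=2)

# With k categories, (k-1) dummy variables are needed.
print("dummies before regrouping:", len(set(brands)) - 1)
print("dummies after regrouping: ", len(set(grouped)) - 1)
```

Regrouping by subject-matter knowledge (for example, by manufacturer country or market segment) is often preferable to a pure frequency cutoff, since the merged level should still be interpretable as a category.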