12.2 Dummy Variables
So far, the independent variables have been quantitative, such as price, income, square footage, or miles. Very often, however, a qualitative characteristic such as religion or gender must be modeled. For this purpose, dummy variables are used, which take only the values 0 or 1. Each dummy variable represents a single qualitative characteristic. For example, consider the price (\(y_i\)) of a car depending on miles (\(x_i\)) and whether the car has all-wheel drive (AWD) or rear-wheel drive (RWD). This characteristic can be modeled using a dummy variable (\(d_i\)): if \(d_i=1\), the car has AWD, and if \(d_i=0\), the car has RWD. The regression equation can be written as follows: \[y_i=\beta_0+\beta_1 \cdot x_i+\beta_2 \cdot d_i+\epsilon_i\] Substituting the two possible values of \(d_i\) separates this regression into two equations:
- RWD: \(y_i=\beta_0+\beta_1 \cdot x_i+\epsilon_i\)
- AWD: \(y_i=(\beta_0+\beta_2)+\beta_1 \cdot x_i+\epsilon_i\)
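The two equations above describe parallel lines: the same slope \(\beta_1\), with intercepts that differ by \(\beta_2\). The following is a minimal sketch of that idea, not the method used elsewhere in this section. It assumes hypothetical coefficients, six made-up cars, noise-free prices, and miles measured in thousands to keep the numbers small. The helper names `price`, `solve`, and `ols` are introduced here for illustration only.

```python
# Hypothetical coefficients, chosen only for illustration (not estimated from any data).
# Miles are measured in thousands, so BETA1 is the price change per 1,000 miles.
BETA0, BETA1, BETA2 = 40000.0, -250.0, 3000.0


def price(miles, d):
    """Noise-free price from the dummy-variable model; d is 0 (RWD) or 1 (AWD)."""
    return BETA0 + BETA1 * miles + BETA2 * d


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up cars: (miles in thousands, d) with d = 1 for AWD and d = 0 for RWD.
cars = [(10, 0), (25, 1), (40, 0), (55, 1), (70, 0), (85, 1)]
X = [[1.0, miles, d] for miles, d in cars]  # columns: intercept, miles, dummy
y = [price(miles, d) for miles, d in cars]

b0, b1, b2 = ols(X, y)
print(f"RWD intercept (b0):      {b0:.2f}")
print(f"AWD intercept (b0 + b2): {b0 + b2:.2f}")
print(f"common slope (b1):       {b1:.2f}")
```

Because the synthetic prices contain no error term, the fitted coefficients should reproduce the hypothetical values up to floating-point rounding. With real data, the estimates would deviate from the true coefficients because of \(\epsilon_i\).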
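To interpret a dummy variable, it is necessary to know how it is coded. In the above case, RWD (\(d_i=0\)) is the reference category, so \(\beta_2\) measures the difference in expected price between an AWD car and an RWD car with the same mileage. If \(\beta_2\) is positive, all-wheel drive adds to the price; that is, \(\beta_2\) represents the value of all-wheel drive. Estimating this model in R on a data set named bmw gives the following output: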
##
## Call:
## lm(formula = price ~ miles + allwheeldrive, data = bmw)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3874.1 -1724.0  -176.5  1604.5  5355.0
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)    4.047e+04  1.711e+03  23.660  < 2e-16 ***
## miles         -2.728e-01  4.044e-02  -6.745 3.05e-07 ***
## allwheeldrive  3.429e+03  1.063e+03   3.227  0.00327 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2449 on 27 degrees of freedom
## Multiple R-squared: 0.6287, Adjusted R-squared: 0.6012
## F-statistic: 22.86 on 2 and 27 DF, p-value: 1.553e-06
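As a worked reading of the output above, the rounded estimates can be substituted into the regression equation. This assumes that allwheeldrive is coded like \(d_i\) (1 for AWD, 0 for RWD). The mileage of 20,000 used below is an illustrative assumption, not a value taken from the data. \[\hat{y}_i=40470-0.2728 \cdot x_i+3429 \cdot d_i\] Setting \(d_i=0\) and \(d_i=1\) gives the two fitted lines:
- RWD: \(\hat{y}_i=40470-0.2728 \cdot x_i\)
- AWD: \(\hat{y}_i=43899-0.2728 \cdot x_i\)
For a car with 20,000 miles, the predicted price is \(40470-0.2728 \cdot 20000=35014\) with RWD and \(35014+3429=38443\) with AWD. The difference of 3429 is the estimated coefficient on allwheeldrive, which is statistically significant at the 1% level (p-value 0.00327).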