12.2 Dummy Variables

So far, the independent variables have been quantitative, such as price, income, square footage, miles, and so on. Very often, however, a qualitative characteristic such as religion or gender must be modeled. For this purpose, dummy variables, which take only the values 0 or 1, are used. Each dummy variable represents a single qualitative characteristic. For example, consider the price (\(y_i\)) of a car depending on miles (\(x_i\)) and whether the car has all-wheel drive (AWD) or rear-wheel drive (RWD). This characteristic can be modeled using a dummy variable (\(d_i\)). If \(d_i=1\), the car has AWD, and if \(d_i=0\), the car has RWD. The regression equation can be written as follows:

\[y_i=\beta_0+\beta_1 \cdot x_i+\beta_2 \cdot d_i+\epsilon_i\]

This regression can theoretically be separated into two single equations:

  • RWD: \(y_i=\beta_0+\beta_1 \cdot x_i+\epsilon_i\)
  • AWD: \(y_i=(\beta_0+\beta_2)+\beta_1 \cdot x_i+\epsilon_i\)

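To interpret a dummy variable, it is necessary to know how it is coded. In the above case, RWD (\(d_i=0\)) is the reference category, so the coefficient \(\beta_2\) is the difference in expected price between an AWD car and a RWD car with the same mileage. If \(\beta_2\) is positive, the dummy variable adds to the price. That is, the coefficient \(\beta_2\) represents the value of all-wheel drive.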
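The role of the coding can be illustrated with a minimal sketch in R. The data here are simulated purely for illustration (the data frame `cars`, the variable names `awd` and `rwd`, and all parameter values are assumptions, not the data used in this section). The sketch fits the dummy-variable regression, recovers the two intercepts, and then reverses the coding to show that the coefficient changes sign but not magnitude.

```r
# Hypothetical simulated data; all names and parameter values are assumptions.
set.seed(1)
n     <- 200
miles <- runif(n, 10000, 80000)
awd   <- rbinom(n, 1, 0.5)                  # dummy: 1 = AWD, 0 = RWD
price <- 40000 - 0.25 * miles + 3000 * awd + rnorm(n, sd = 2000)
cars  <- data.frame(price, miles, awd)

# Regression with the dummy variable
fit <- lm(price ~ miles + awd, data = cars)
b   <- coef(fit)

# Intercepts of the two parallel regression lines
intercept_rwd <- b[["(Intercept)"]]               # beta_0
intercept_awd <- b[["(Intercept)"]] + b[["awd"]]  # beta_0 + beta_2

# Predicted price gap at identical mileage equals the dummy coefficient
new_cars <- data.frame(miles = c(30000, 30000), awd = c(0, 1))
pred     <- predict(fit, newdata = new_cars)
gap      <- unname(pred[2] - pred[1])

# Reversed coding: 1 = RWD, 0 = AWD
cars$rwd <- 1 - cars$awd
fit_rev  <- lm(price ~ miles + rwd, data = cars)
b_rev    <- coef(fit_rev)
```

With the reversed coding, AWD becomes the reference category, so `b_rev[["rwd"]]` measures the price difference of RWD relative to AWD. The fitted values of both models are identical; only the interpretation of the intercept and the dummy coefficient changes.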

## 
## Call:
## lm(formula = price ~ miles + allwheeldrive, data = bmw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3874.1 -1724.0  -176.5  1604.5  5355.0 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.047e+04  1.711e+03  23.660  < 2e-16 ***
## miles         -2.728e-01  4.044e-02  -6.745 3.05e-07 ***
## allwheeldrive  3.429e+03  1.063e+03   3.227  0.00327 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2449 on 27 degrees of freedom
## Multiple R-squared:  0.6287, Adjusted R-squared:  0.6012 
## F-statistic: 22.86 on 2 and 27 DF,  p-value: 1.553e-06
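
As a worked reading of this output, assume that `allwheeldrive` is coded like \(d_i\) above (1 for AWD, 0 for RWD). Using the rounded estimates from the coefficient table, the two fitted lines are approximately:

\[
\begin{aligned}
\text{RWD:}\quad \hat{y}_i &= 40470 - 0.2728 \cdot x_i \\
\text{AWD:}\quad \hat{y}_i &= (40470 + 3429) - 0.2728 \cdot x_i = 43899 - 0.2728 \cdot x_i
\end{aligned}
\]

For a hypothetical car with 30,000 miles (a value chosen only for illustration), the predicted prices are roughly:

\[
\begin{aligned}
\text{RWD:}\quad \hat{y} &= 40470 - 0.2728 \cdot 30000 = 32286 \\
\text{AWD:}\quad \hat{y} &= 43899 - 0.2728 \cdot 30000 = 35715
\end{aligned}
\]

The difference of 3429 is the estimated coefficient on `allwheeldrive` and is the same at every mileage, because the two lines share the slope. With a p-value of 0.00327, this estimate is statistically significant at the 1% level.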