Analysis of Variance (ANOVA) models (also known as Dummy Variable Regression models) are regressions in which all independent variables are dummy variables. An ANOVA model with two independent variables can be written as follows:
\[y_i = \beta_0 + \beta_1 \cdot d_1 + \beta_2 \cdot d_2\]
where \(d_1\) and \(d_2\) are dummy variables. Consider the following model using the nfl data for the year 2005:
\[total = \beta_0 + \beta_1 \cdot draft1 + \beta_2 \cdot veteran\]
where draft1 and veteran are dummy variables: \(draft1=1\) if the player was selected in the first round of the draft (and 0 otherwise), and \(veteran=1\) if the player has played multiple seasons in the NFL (and 0 otherwise). To distinguish \(j\) categories, only \(j-1\) dummy variables are needed when the model includes an intercept. Including all \(j\) dummies would make them sum to the intercept column, which produces perfect multicollinearity. The category without a dummy variable is the base category.
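The rule of \(j-1\) dummies can be checked numerically. The following is a minimal sketch on simulated data; the factor position, its three levels, and the outcome y are made up for illustration and are not part of the nfl data.

```r
# Hypothetical toy data: a made-up factor with j = 3 categories.
set.seed(1)
position <- factor(rep(c("QB", "RB", "WR"), each = 10))
y <- rnorm(30)

# One dummy per category (j = 3 dummies in total).
d_qb <- as.numeric(position == "QB")
d_rb <- as.numeric(position == "RB")
d_wr <- as.numeric(position == "WR")

# Intercept plus all j dummies: the dummies sum to the intercept column,
# so the design matrix has 4 columns but only rank 3.
X_all <- cbind(1, d_qb, d_rb, d_wr)
rank_all <- qr(X_all)$rank

# Intercept plus j - 1 dummies: full column rank, QB is the base category.
X_ok <- cbind(1, d_rb, d_wr)
rank_ok <- qr(X_ok)$rank

# lm() signals the redundancy by reporting NA for one aliased coefficient.
fit_all <- lm(y ~ d_qb + d_rb + d_wr)
coef(fit_all)
```

Dropping one dummy (here d_qb) restores full rank, and the remaining coefficients are then read as differences from the omitted base category.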
## Warning: In subset.data.frame(nfl, year = 2005) :
## extra argument 'year' will be disregarded
## Call:
## lm(formula = total ~ draft1 + veteran, data = subset(nfl, year = 2005))
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.340 -1.865 -0.702  0.792 32.429
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.9999     0.2534   3.945 8.63e-05 ***
## draft1        2.6422     0.4262   6.200 8.81e-10 ***
## veteran       1.6083     0.2820   5.703 1.62e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 3.083 on 848 degrees of freedom
## (158 observations deleted due to missingness)
## Multiple R-squared: 0.05191, Adjusted R-squared: 0.04968
## F-statistic: 23.22 on 2 and 848 DF, p-value: 1.526e-10
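The warning at the top of this output arises because subset() reads year = 2005 as a named argument rather than as a condition, so no rows are filtered out. The sketch below illustrates the difference on a small made-up data frame; nfl_toy and all of its values are hypothetical stand-ins, not the nfl data.

```r
# Hypothetical toy data frame standing in for the nfl data.
nfl_toy <- data.frame(
  year    = c(2004, 2004, 2005, 2005, 2005, 2005, 2005, 2006),
  total   = c(0.8, 3.1, 1.2, 4.0, 2.5, 0.7, 5.1, 0.9),
  draft1  = c(0, 1, 0, 1, 0, 0, 1, 0),
  veteran = c(0, 1, 1, 1, 1, 0, 0, 0)
)

# `year = 2005` is an unused named argument: no filter is applied and
# every row is returned (the warning is suppressed here for brevity).
all_rows <- suppressWarnings(subset(nfl_toy, year = 2005))

# `year == 2005` is a logical condition and keeps only the 2005 rows.
only_2005 <- subset(nfl_toy, year == 2005)

nrow(all_rows)   # 8
nrow(only_2005)  # 5

# The regression call written with the comparison operator.
fit_2005 <- lm(total ~ draft1 + veteran,
               data = subset(nfl_toy, year == 2005))
nobs(fit_2005)   # 5
```

Applied to the real data, the same change from = to == would restrict the sample to 2005, and the estimates would generally differ from those printed above.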
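For a player who was not drafted in the first round and is not a veteran (the base category), the estimated income is close to $1 million, which is the intercept. The coefficients on the dummy variables measure differences relative to this base category: holding the other variable fixed, a first-round pick earns about $2.64 million more and a veteran about $1.61 million more. Both dummy variables are statistically significant at the 0.1% level. The R-squared is very low, however: the two dummies explain only about 5% of the variation in total. Note also that the warning at the top of the output shows that the argument year = 2005 was disregarded, because subset() requires the comparison year == 2005. The estimates are therefore based on all years in the data rather than on 2005 alone.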
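To make the role of the base category concrete, the fitted values for all four combinations of the two dummies can be read off the coefficient table. This is a worked sketch that assumes total is measured in millions of dollars, as the interpretation of the intercept suggests:
\[
\begin{aligned}
\widehat{total}\,(draft1=0,\ veteran=0) &= 0.9999 \\
\widehat{total}\,(draft1=1,\ veteran=0) &= 0.9999 + 2.6422 = 3.6421 \\
\widehat{total}\,(draft1=0,\ veteran=1) &= 0.9999 + 1.6083 = 2.6082 \\
\widehat{total}\,(draft1=1,\ veteran=1) &= 0.9999 + 2.6422 + 1.6083 = 5.2504
\end{aligned}
\]
Because the model has no interaction term, the first-round premium is the same for veterans and non-veterans.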