11.1 Measuring the Strength of the Relationship

To measure the strength of the hypothesized statistical relationship between the dependent and independent variables of the regression equation, we calculate a value called \(R^2\). The value of \(R^2\) can also be thought of as an indicator of goodness of fit, i.e., how well the sample regression line fits the sample data. To see how this statistic is constructed, we decompose the variation of \(y\) in the sample into two components, i.e., the unexplained and the explained variation. Let the total sum of squares (TSS) be

\[TSS = \sum_{i=1}^N (y_i-\bar{y})^2\]

Let the explained sum of squares (ESS) be

\[ESS = \sum_{i=1}^N (\hat{y}_i-\bar{y})^2\]

And let the unexplained (residual) sum of squares (RSS) be

\[RSS = \sum_{i=1}^N (y_i-\hat{y}_i)^2\]
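To make the three quantities concrete, here is a minimal sketch in Python (using NumPy, with made-up illustrative data) that fits a simple least squares line and computes TSS, ESS, and RSS directly from the definitions above:

```python
import numpy as np

# Illustrative (hypothetical) sample data -- any paired sample works here.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit a simple least squares line: y_hat = b0 + b1*x.
b1, b0 = np.polyfit(x, y, 1)        # polyfit returns highest degree first
y_hat = b0 + b1 * x
y_bar = y.mean()

# The three sums of squares, following the definitions above.
TSS = np.sum((y - y_bar) ** 2)      # total variation of y
ESS = np.sum((y_hat - y_bar) ** 2)  # variation explained by the line
RSS = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation

print(TSS, ESS, RSS)                # TSS should equal ESS + RSS
```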

Thus, the total sum of squares is equal to the explained sum of squares plus the unexplained sum of squares, i.e., TSS = ESS + RSS. (This decomposition holds for a least squares regression with an intercept, since the cross-product term \(\sum_{i=1}^N (\hat{y}_i-\bar{y})(y_i-\hat{y}_i)\) is then zero.) The RSS represents the "unexplained" variation, since it measures the error (or residual) in the prediction of \(y\), i.e., the difference between the actual value of \(y\) and its predicted value. The ESS represents the variation of the predicted values of \(y\) around \(\bar{y}\), and indicates the gain in predictive power achieved by using \(\hat{y}\) as a predictor of \(y\) instead of \(\bar{y}\). Hence, the ESS is the amount of total variation in \(y\) which is accounted for (or explained) by the regression line. So \(R^2\) is defined as

\[R^2 = \frac{ESS}{TSS}=1-\frac{RSS}{TSS}\]
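As a reusable sketch (assuming the fitted values come from a least squares regression with an intercept; the function name and signature are illustrative), \(R^2\) can be computed either way, and the assertion checks that the two forms agree:

```python
import numpy as np

def r_squared(y, y_hat):
    """Goodness of fit: R^2 = ESS/TSS = 1 - RSS/TSS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    y_bar = y.mean()
    tss = np.sum((y - y_bar) ** 2)
    ess = np.sum((y_hat - y_bar) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    # The two forms coincide only when TSS = ESS + RSS, i.e., for an
    # OLS fit with an intercept.
    assert np.isclose(ess / tss, 1.0 - rss / tss)
    return 1.0 - rss / tss
```

Applied to the fitted values from the earlier snippet, `r_squared(y, y_hat)` returns the share of the total variation in \(y\) that the sample regression line accounts for.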