## 8.3 Sample Size Calculation for a Proportion

Recall the confidence interval for a proportion: \[\hat{p} \pm \underbrace{z \cdot \sqrt{\frac{\hat{p} \cdot (1-\hat{p})}{n}}}_{\text{Margin of Error (ME)}}\] If a maximum margin of error is desired, e.g., \(\pm 2\%\), then the above expression must be solved for \(n\). For a 95% confidence level (\(z = 1.96\)), this leads to the following sample size formula: \[n \geq \frac{1.96^2 \cdot p \cdot (1-p)}{ME^2}\] In order to use the formula, the value of \(p\) must be known. The problem is that the value is only known after the survey or poll is conducted and not before. To ensure that the margin of error stays within the desired limit, a value of \(p=0.5\) is chosen. If the survey or poll results in a proportion of exactly 0.5 being in favor of an issue or a candidate, then the margin of error will correspond exactly to the chosen one. If the resulting proportion is different from 0.5, the margin of error is smaller. Thus, the value of \(p=0.5\) can be considered a worst-case scenario because it maximizes the variance term \(p \cdot (1-p)\). If there is prior knowledge about the likely proportion, then that value can be used instead. For example, using a value of 0.5 to determine the sample size for the U.S. unemployment rate would be prohibitively expensive and unnecessary because the rate is known to be far below 50%.
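The worst-case property of \(p=0.5\) can be checked numerically. The following is a minimal sketch, assuming a 95% confidence level; the function name `margin_of_error` and the sample size of 1,000 are hypothetical choices made only for illustration.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def margin_of_error(p, n, z=Z_95):
    """Margin of error for a proportion p estimated from n observations."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample size, used only to compare different proportions.
n = 1000

# The margin of error is largest at p = 0.5 and shrinks toward 0 and 1.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  ME = {margin_of_error(p, n):.4f}")
```

For any fixed \(n\), the printed margin of error peaks at \(p=0.5\) and is symmetric around it, which is why planning with \(p=0.5\) guarantees that the realized margin of error does not exceed the target.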

To illustrate the concept of sample size calculation, consider a survey that is interested in the proportion of people in support of a property tax reform. You do not have any knowledge about the population proportion but want the estimate to be within 2 percentage points. For this reason, you adopt an initial estimate of \(p=0.5\), which is the worst-case scenario. \[n= \frac{1.96^2 \cdot 0.5 \cdot (1-0.5)}{0.02^2} =2401\] If the desired margin of error is cut in half, then the required sample size does not double but quadruples (to 9604 for a margin of error of 1 percentage point). This is because the margin of error enters the formula as a squared term in the denominator.
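The calculation above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not a definitive implementation: the function name `sample_size` is hypothetical, the default critical value of 1.96 corresponds to a 95% confidence level, and the result is rounded up because a sample size must be a whole number.

```python
import math


def sample_size(me, p=0.5, z=1.96):
    """Smallest whole-number n whose margin of error does not exceed me."""
    n = z**2 * p * (1 - p) / me**2
    # Round to 9 decimals first so that floating-point noise
    # (e.g. 2401.0000000000005) does not push the ceiling up by one.
    return math.ceil(round(n, 9))


n_2pct = sample_size(0.02)  # margin of error of 2 percentage points
n_1pct = sample_size(0.01)  # margin of error cut in half

print(n_2pct, n_1pct, n_1pct / n_2pct)
```

The ratio of the two results is 4, which reflects the squared margin of error in the denominator.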

In some (rare) cases, the necessary sample size depends on the population size. This matters when the population is small relative to the required sample. Suppose you are interested in how many students support a privatization of parking. The sample size for a finite population of size \(N\) is calculated as follows: \[n_f = \frac{n_{\infty} \cdot N}{n_{\infty} + (N-1)}\] The term \(n_\infty\) represents the sample size for an infinite population. Consider a college of 10,000 students. The sample size calculation proceeds as follows: \[n_f = \frac{2401 \cdot 10000}{2401 +( 10000-1)} \approx 1936.3\] Rounding up gives a required sample size of 1937 students.
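The finite population adjustment can be sketched in the same way. The function name `finite_population_sample_size` is a hypothetical choice, and rounding up to the next whole number is an assumption made so that the target margin of error is not exceeded.

```python
import math


def finite_population_sample_size(n_inf, N):
    """Adjust the infinite-population sample size n_inf for a population of size N."""
    return math.ceil(n_inf * N / (n_inf + (N - 1)))


# College example: 10,000 students and n_inf = 2401.
print(finite_population_sample_size(2401, 10_000))

# For a very large population the adjustment becomes negligible.
print(finite_population_sample_size(2401, 10**9))
```

The second call illustrates why the adjustment is rarely needed: once the population is much larger than \(n_\infty\), the adjusted sample size is practically identical to \(n_\infty\).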