8.1 Confidence Interval for a Proportion

Recall from the Bernoulli distribution that \(p+q=1\) and \(\sigma=\sqrt{p \cdot q}\). Replacing \(p\) with the sample proportion \(\hat{p}\), the standard error of the sample proportion is
\[\sigma_{\hat{p}}=\sqrt{\frac{\hat{p} \cdot (1-\hat{p})}{n}}\]
Thus, the 95% confidence interval is constructed as follows:
\[\hat{p} \pm 1.96 \cdot \sigma_{\hat{p}} \Leftrightarrow \hat{p} \pm 1.96 \cdot \sqrt{\frac{\hat{p} \cdot (1-\hat{p})}{n}}\]
Consider the voting data in gss2018. The mean of the 0/1 variable, that is, the sample proportion, is \(\hat{p}=0.67\) with \(n=772\) observations, so the standard error is
\[\sigma_{\hat{p}}=\sqrt{\frac{0.67 \cdot (1-0.67)}{772}}=0.0169\]

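In this case, the margin of error is \(1.96 \cdot 0.0169=0.033\). In R, a confidence interval for a proportion can be obtained by applying the function t.test() to the 0/1 data, which gives a t-based interval that is nearly identical to the formula above in large samples. Before using the function, the manual calculation is presented.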

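# Keep only respondents with vote20 coded 1 or 2
voting         = subset(gss,vote20 %in% c(1,2),select=c("vote20"))
# Recode to 0/1 so that the mean equals the sample proportion of code 1
voting$vote20  = ifelse(voting$vote20==1,1,0)
voting         = as.numeric(voting$vote20)
# voting is now a vector, so use length() rather than nrow()
nobs           = length(voting)
meandata       = mean(voting)
# Critical value for a 95% interval
z              = qnorm(0.975)
stderror       = sqrt(meandata*(1-meandata)/nobs)
CI_lower       = meandata-z*stderror
CI_upper       = meandata+z*stderror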
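
The same steps can be wrapped in a small function and checked against the numbers quoted earlier (\(\hat{p}=0.67\), \(n=772\)). This is only a minimal sketch: the name `wald_ci` and its arguments are hypothetical and not part of the text or of any package.

```r
# Hypothetical helper (not from the text): normal-approximation interval
# for a proportion, given the sample proportion and the sample size.
wald_ci <- function(p_hat, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)      # about 1.96 for level = 0.95
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Values quoted in the text: p_hat = 0.67, n = 772
ci <- wald_ci(0.67, 772)
print(round(ci, 3))
```

With these inputs the interval runs from roughly 0.637 to 0.703, which agrees with the margin of error of 0.033 computed by hand.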

Using the function t.test() is simpler:

t.test(voting)

## 
##  One Sample t-test
## 
## data:  voting
## t = 107.04, df = 3154, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.7697887 0.7985156
## sample estimates:
## mean of x 
## 0.7841521
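
The interval reported by t.test() is not exactly the normal-approximation interval, because t.test() uses a t quantile and the sample standard deviation (with an \(n-1\) denominator). The sketch below illustrates this on simulated 0/1 data; the data are hypothetical and only stand in for the voting vector, so the numbers will not match the output above.

```r
# Simulated 0/1 data (an assumption for illustration, not the GSS data)
set.seed(1)
x     <- rbinom(500, size = 1, prob = 0.7)
n     <- length(x)
p_hat <- mean(x)

# Normal-approximation interval: z quantile and sqrt(p_hat * (1 - p_hat) / n)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

# What t.test() computes: t quantile with n - 1 df and sd(x) / sqrt(n)
by_hand <- p_hat + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
from_t  <- as.numeric(t.test(x)$conf.int)

print(rbind(wald, by_hand, from_t))
```

The rows `by_hand` and `from_t` coincide, while `wald` is marginally narrower. With a sample as large as the one above, the difference is negligible in practice.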