9.1 One-Group: Proportions
To carry out a hypothesis test for a population proportion, we assume that the data is categorical, with the population proportion \(p\) defined in the context. Assuming that the sample size is above 30, the test statistic is \[z=\frac{\hat{p}-p_0}{\sqrt{p_0 \cdot (1-p_0)/n}}\] Recall that the sampling distribution of a sample proportion has mean \(p\) and standard deviation \(\sqrt{p \cdot (1-p)/n}\); under the null hypothesis, \(p=p_0\). The z-score therefore measures the number of standard errors between the sample proportion \(\hat{p}\) and the hypothesized value \(p_0\). The significance level \(\alpha\) determines how strong the evidence against the null hypothesis must be before we reject it.
For example, assume we have a sample size of \(n=100\) and that \(\hat{p}=0.48\). We hypothesize that the population proportion is 0.5, i.e., \(H_0\): \(p=0.5\), so that \(p_0=0.5\). The standard error is \[S.E. = \sqrt{\frac{0.5 \cdot 0.5}{100}}=0.05\] Thus, the z-score is \[z=\frac{0.48-0.5}{0.05}=\frac{-0.02}{0.05}=-0.4\] For a two-sided hypothesis test at the \(\alpha=0.05\) level, we fail to reject the null hypothesis because \(|z|=0.4<1.96\).
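The arithmetic of this example can be reproduced in a few lines of base R. This is only a minimal sketch: the variable names (`p_hat`, `p0`, `se`, `z`, `p_value`) are chosen here for illustration, and the two-sided p-value is obtained from the standard normal distribution with `pnorm`.

```r
# Worked example from the text: n = 100, sample proportion 0.48, H0: p = 0.5
n     = 100
p_hat = 0.48
p0    = 0.5

# Standard error under the null hypothesis
se = sqrt(p0 * (1 - p0) / n)

# Number of standard errors between the sample proportion and p0
z = (p_hat - p0) / se

# Two-sided p-value from the standard normal distribution
p_value = 2 * pnorm(-abs(z))

print(c(se = se, z = z, p_value = p_value))

# Decision at the 5% level: reject only if |z| exceeds the critical value
abs(z) > qnorm(0.975)
```

Comparing \(|z|\) with `qnorm(0.975)` (approximately 1.96) and comparing the p-value with 0.05 are equivalent ways of reaching the same decision.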
Consider the data in gsssocialmedia. Suppose that Instagram claims that 1/3 of Americans use their service. This claim can be checked with a two-sided hypothesis test:
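# Keep only the Instagram variable and drop missing responses
socialmedia = gss[c("instagrm")]
socialmedia = na.omit(socialmedia)
# Recode to 0/1 so that the mean of the variable equals the sample proportion
socialmedia$instagrm = ifelse(socialmedia$instagrm=="yes",1,0)
# Two-sided test of H0: p = 1/3
t.test(socialmedia$instagrm,mu=1/3,alternative="two.sided")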
##
## One Sample t-test
##
## data: socialmedia$instagrm
## t = -2.0065, df = 1371, p-value = 0.045
## alternative hypothesis: true mean is not equal to 0.3333333
## 95 percent confidence interval:
## 0.2838431 0.3327750
## sample estimates:
## mean of x
## 0.308309
The p-value (0.045) is just below 5%, so we reject the null hypothesis at the 5% level; consistent with this, the 95% confidence interval does not contain 1/3. Now suppose that Instagram claims that more than 1/3 of Americans use their service. The term “more than” suggests a one-sided hypothesis. Treating the claim as the null hypothesis, the hypotheses are formulated as follows: \[ H_0: p \geq 1/3\\ H_a: p < 1/3\]
It is very important to state the alternative hypothesis correctly in R. Here, the alternative is "less":
t.test(socialmedia$instagrm,mu=1/3,alternative="less")
##
## One Sample t-test
##
## data: socialmedia$instagrm
## t = -2.0065, df = 1371, p-value = 0.0225
## alternative hypothesis: true mean is less than 0.3333333
## 95 percent confidence interval:
## -Inf 0.3288373
## sample estimates:
## mean of x
## 0.308309
In this case, the p-value of 0.0225 is below 5% and thus, the null hypothesis is rejected.
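The tests above apply `t.test` to a 0/1 variable. As a cross-check, the z statistic defined at the start of this section can be computed for the same data. The sketch below does not load the survey data. Instead, it assumes counts reconstructed from the printed output: \(n=1372\) (the degrees of freedom plus one) and 423 "yes" answers (since \(423/1372 \approx 0.308309\)). These two numbers are an inference from the output, not values read from the data set. The comparison with `prop.test` (without continuity correction) is included only as a consistency check.

```r
# Assumed counts, reconstructed from the t-test output above:
# n = df + 1 = 1372 and x = 423 "yes" answers (423 / 1372 = 0.308309)
n  = 1372
x  = 423
p0 = 1/3

p_hat = x / n
se    = sqrt(p0 * (1 - p0) / n)   # standard error under H0
z     = (p_hat - p0) / se

# Two-sided p-value (Ha: p != 1/3) and one-sided p-value (Ha: p < 1/3)
p_two_sided = 2 * pnorm(-abs(z))
p_less      = pnorm(z)

print(c(z = z, p_two_sided = p_two_sided, p_less = p_less))

# Consistency check: without continuity correction, the chi-squared
# statistic reported by prop.test equals z^2
pt = prop.test(x, n, p = p0, alternative = "two.sided", correct = FALSE)
print(c(chi_squared = unname(pt$statistic), z_squared = z^2))
```

The z statistic uses the standard error under the null hypothesis, while `t.test` estimates the standard deviation from the sample, so the two statistics differ slightly. Because \(z\) is negative here, the one-sided p-value for "less" is exactly half of the two-sided p-value, which matches the relationship between the two t-test outputs (0.0225 versus 0.045).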