9.1 One-Group: Proportions

To execute a hypothesis test for a population proportion, we assume that the data are categorical and that the population proportion $p$ is defined in context. Assuming the sample is large enough for the normal approximation to hold (for example, a sample size above 30 with $n p_0$ and $n(1-p_0)$ both at least 10), the test statistic is

$$z=\frac{\hat{p}-p_0}{\sqrt{p_0 \cdot (1-p_0)/n}}.$$

Recall that the sampling distribution of a sample proportion has mean $p$ and standard deviation $\sqrt{p \cdot (1-p)/n}$. Under the null hypothesis, $p$ is replaced by $p_0$. This z-score measures the number of standard errors between the sample proportion $\hat{p}$ and the hypothesized proportion $p_0$. The significance level $\alpha$ sets how strong the evidence must be before we reject $H_0$.

For example, assume a sample size of $n=100$ with $\hat{p}=0.48$, and we hypothesize $H_0: p = 0.5$. The standard error is

$$S.E. = \sqrt{\frac{0.5 \cdot 0.5}{100}}=0.05,$$

so the z-score is

$$z=\frac{0.48-0.5}{0.05}=\frac{-0.02}{0.05}=-0.4.$$

For a two-sided test at the $\alpha=0.05$ level, the critical values are $\pm 1.96$. Because $|z| = 0.4 < 1.96$, we fail to reject the null hypothesis.
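The arithmetic of this example can be reproduced in R as a quick check. This is a minimal sketch using only base R (`pnorm()`, `qnorm()`, `prop.test()`); the variable names are illustrative. `prop.test()` with `correct = FALSE` reports a chi-squared statistic, which for a one-sample test equals $z^2$, so it should agree with the hand calculation.

```r
# Worked example from the text: n = 100, p-hat = 0.48, H0: p = 0.5
n     <- 100
p_hat <- 0.48
p0    <- 0.5

se     <- sqrt(p0 * (1 - p0) / n)   # standard error under H0: 0.05
z      <- (p_hat - p0) / se         # test statistic: -0.4
z_crit <- qnorm(1 - 0.05 / 2)       # two-sided critical value at alpha = 0.05: about 1.96
p_val  <- 2 * pnorm(-abs(z))        # two-sided p-value: about 0.689

reject <- abs(z) > z_crit           # FALSE, so we fail to reject H0
c(se = se, z = z, z_crit = z_crit, p_value = p_val)

# Cross-check with base R's prop.test() without continuity correction;
# its X-squared statistic equals z^2 for a one-sample test.
pt <- prop.test(x = 48, n = 100, p = 0.5, correct = FALSE)
pt$statistic   # X-squared = 0.16
pt$p.value     # same two-sided p-value as p_val
```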

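Consider the General Social Survey data in `gss`, where the variable `grass` records whether a respondent thinks marijuana should be made legal (coded 1) or not. Suppose someone claims that 1/3 of Americans support legalization. Using the 2022 responses, this claim can be checked with a two-sided hypothesis test. Because `grass` is recoded to 0/1, its mean is the sample proportion, and for a sample this large `t.test()` leads to the same conclusion as the z-test above: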

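legalgrass       = gss[c("year", "grass")]              # keep survey year and the marijuana item
legalgrass       = na.omit(legalgrass)                  # drop missing responses
legalgrass$grass = ifelse(legalgrass$grass == 1, 1, 0)  # 1 = should be legal, 0 = otherwise
df               = subset(legalgrass, year == 2022)     # restrict to the 2022 survey
t.test(df$grass, mu = 1/3, alternative = "two.sided")   # mean of a 0/1 variable = sample proportion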

## 
##  One Sample t-test
## 
## data:  df$grass
## t = 26.969, df = 1122, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0.3333333
## 95 percent confidence interval:
##  0.6748924 0.7284914
## sample estimates:
## mean of x 
## 0.7016919
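For comparison with the z-statistic defined at the start of this section, the sketch below implements that formula directly. `prop_z_test()` is a hypothetical helper, not part of base R or the original text, and the counts (788 of 1123) are inferred from the `t.test()` output above (n = df + 1 = 1123, and 0.7016919 × 1123 = 788) rather than read from the data. Because `t.test()` estimates the standard error from the sample standard deviation instead of $\sqrt{p_0 \cdot (1-p_0)/n}$, its statistic (26.969) differs slightly from the z-statistic, but the conclusion is the same.

```r
# Hypothetical helper (not from the original text): one-sample z-test for a
# proportion using the standard error under H0, sqrt(p0 * (1 - p0) / n).
prop_z_test <- function(x, n, p0, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  se    <- sqrt(p0 * (1 - p0) / n)
  z     <- (p_hat - p0) / se
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),            # both tails
    less      = pnorm(z),                      # lower tail
    greater   = pnorm(z, lower.tail = FALSE))  # upper tail
  list(p_hat = p_hat, z = z, p_value = p_value)
}

# Counts inferred from the t.test() output above: 788 of 1123 respondents coded 1.
res <- prop_z_test(x = 788, n = 1123, p0 = 1/3, alternative = "two.sided")
res$z        # about 26.2, close to the t statistic of 26.969
res$p_value  # essentially 0, so H0: p = 1/3 is rejected at the 5% level
```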

The p-value is far below 5% (p < 2.2e-16), so we reject the null hypothesis that the proportion equals 1/3; the sample proportion is about 0.70. Now suppose the claim is instead that the proportion is more than 1/3. The term "more than" suggests a one-sided test. To check whether the data contradict the claim, we place the claim in the null hypothesis:

$$H_0: p \geq 1/3, \qquad H_a: p < 1/3$$

In R, the `alternative` argument must describe $H_a$, not $H_0$. Here $H_a: p < 1/3$, so we use `alternative = "less"`:

t.test(df$grass, mu = 1/3, alternative = "less")

## 
##  One Sample t-test
## 
## data:  df$grass
## t = 26.969, df = 1122, p-value = 1
## alternative hypothesis: true mean is less than 0.3333333
## 95 percent confidence interval:
##      -Inf 0.724177
## sample estimates:
## mean of x 
## 0.7016919

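In this case, the p-value is 1, far above 5%, so we fail to reject $H_0: p \geq 1/3$. The data give no evidence against the claim that the proportion exceeds 1/3; indeed, the sample proportion of about 0.70 lies well above 1/3.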
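A one-sided p-value is a single tail area of the same test statistic, which is why the direction passed to `alternative` changes the result so sharply. A minimal sketch, assuming the z-approximation rather than `t.test()` and the counts inferred from the output above (788 of 1123):

```r
# Counts inferred from the t.test() output (n = 1123, mean 0.7016919); p0 = 1/3.
x  <- 788
n  <- 1123
p0 <- 1/3
z  <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)   # same z statistic as the two-sided test

p_less    <- pnorm(z)                      # Ha: p < 1/3, i.e. alternative = "less"
p_greater <- pnorm(z, lower.tail = FALSE)  # Ha: p > 1/3, i.e. alternative = "greater"

round(c(z = z, p_less = p_less, p_greater = p_greater), 4)
```

With $H_a: p < 1/3$ the lower-tail area is essentially 1, matching the `t.test()` output. If the claim were instead placed in the alternative ($H_0: p \leq 1/3$, $H_a: p > 1/3$), the upper-tail p-value would be essentially 0 and the data would support the claim.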