7 Basic Statistics and Sampling

Slides: Basic Statistics and Sampling.pdf

In the previous sections, we assumed that the parameters of the probability distributions were known. From now on, we are interested in estimating those parameters by sampling from a population. It is important to differentiate between the population (whose parameters remain unknown to the researcher) and a sample (i.e., a subset) taken from the population. The sample can tell us something about the population parameters. The sampling distribution is the probability distribution of a statistic, e.g., the mean or variance, computed from the sample. Put differently, if we take a sample and calculate its mean, the sampling distribution describes which values that estimate would take if we had drawn a different sample.

Suppose you have a set of random variables \(X_1, X_2, X_3, \dots , X_n\) which represent the results of repeating an experiment. The random variables are independent and identically distributed (i.i.d.) with mean \(\mu\) and variance \(\sigma^2\). The sample mean is written as: \[\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}\] Its expectation equals the population mean, \(E(\bar{X}_n) = \mu\). The sampling variance of the mean is expressed as: \[Var(\bar{X}_n) = \frac{\sigma^2}{n}\]
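To see where the factor \(1/n\) in the sampling variance comes from, here is a short derivation. It assumes that every \(X_i\) has the same finite variance \(Var(X_i) = \sigma^2\) and uses independence to split the variance of the sum into a sum of variances: \[Var(\bar{X}_n) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}\]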

The standard error of the mean is the square root of the sampling variance: \[\sigma_{\bar{X}_n} = \sqrt{Var(\bar{X}_n)} = \frac{\sigma}{\sqrt{n}}\]
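The formula can be checked with a small simulation. The sketch below is illustrative only: the helper `simulate_sample_means`, the normal population, and the values \(\mu = 10\), \(\sigma = 2\), \(n = 25\) are assumptions chosen for the example, not part of the course material. It draws many samples of size \(n\), computes the mean of each, and compares the spread of those means with \(\sigma / \sqrt{n}\).

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, mu, sigma, seed=0):
    """Draw num_samples i.i.d. normal samples of size n; return the mean of each.

    Hypothetical helper for illustration; the normal population is an assumption.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


mu, sigma, n = 10.0, 2.0, 25  # assumed population parameters and sample size
means = simulate_sample_means(n=n, num_samples=20_000, mu=mu, sigma=sigma, seed=42)

# Standard deviation of the simulated sample means (empirical standard error).
empirical_se = statistics.stdev(means)
# Value predicted by sigma / sqrt(n).
theoretical_se = sigma / math.sqrt(n)

print(f"empirical standard error:   {empirical_se:.3f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

With these settings the two printed values should be close to each other. Increasing `n` shrinks both, since the standard error falls in proportion to \(1/\sqrt{n}\).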

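The sampling variance is different from the sample variance! The sample variance measures the spread of the observations within a single sample, whereas the sampling variance represents the variation of a particular statistic, e.g., the mean, across repeated samples. Before we get into the details, let us introduce two very important concepts: (1) the Law of Large Numbers and (2) the Central Limit Theorem.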