Central Limit Theorem and Sampling Distribution
The Central Limit Theorem (CLT) applies even to quantities that do not follow a normal distribution.
It states that if you take many samples of size 'n' from a population and plot the mean of each sample,
those means will approximately follow a normal distribution, provided 'n' is large enough.
The CLT also states that the mean of these sample means will be equal to the population
mean, and that their standard deviation will be equal to the standard deviation of the
population divided by the square root of the sample size 'n'.
It is because of these two statements that the normal distribution is so widely applicable.
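The standard deviation of the sample means, written s or $\sigma_{\bar{x}}$ and often called the standard error, is given as:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the standard deviation of the population and $n$ is the sample size.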
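The two CLT statements can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with rate 1 (a skewed, clearly non-normal distribution whose mean and standard deviation are both 1), and the helper name `simulate_sample_means`, the sample size of 50, and the 5000 repetitions are arbitrary choices made for this example.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, rate=1.0, seed=0):
    """Draw num_samples samples of size n from an exponential population
    and return the list of their means (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(num_samples)
    ]


# Assumed population: Exponential(rate=1), which is strongly skewed.
# Its mean and standard deviation are both exactly 1.
population_mean = 1.0
population_sd = 1.0
n = 50

means = simulate_sample_means(n=n, num_samples=5000)
observed_mean = statistics.fmean(means)   # should sit near the population mean
observed_sd = statistics.stdev(means)     # should sit near sigma / sqrt(n)
predicted_sd = population_sd / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean {population_mean})")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

Plotting a histogram of `means` should give a roughly bell-shaped curve even though the individual exponential values are heavily skewed.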
Example:
Let's say we want to check the air quality in our community. We decide to take a sample of 20 measurements and compute the mean and standard deviation of a particular pollutant. We can estimate the air quality from this one sample. The problem is that if we take another 20 measurements, we will come up with a different mean and standard deviation. Which one is the correct value?
Here the CLT comes to the rescue! If we repeat this 20-measurement experiment many more times and plot the mean of each experiment, the mean of that distribution will be very close to the true mean of the air quality.
What about the standard deviation? The standard deviation of the sampling distribution will be less than the standard deviation of the population. If our sample size were n=30, the standard deviation of the sampling distribution would be smaller than with n=20. The reason is that as the sample gets larger, each sample is more likely to include a wide range of values, so the high and low values average out in the sample mean. The means therefore vary less from sample to sample, which translates into the fact that the spread of values about the true mean (the standard deviation of the sampling distribution) gets smaller as 'n' gets larger.
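The effect of the sample size can be sketched the same way for the air quality example. Everything numeric here is an assumption made for illustration, not real pollutant data: concentrations are drawn from a gamma distribution with shape 2 and scale 10 (true mean 20, true standard deviation about 14.14), and `spread_of_sample_means` is a hypothetical helper that repeats the experiment 4000 times.

```python
import math
import random
import statistics

# Hypothetical pollutant concentrations: gamma(shape=2, scale=10).
# For a gamma distribution, mean = shape * scale and sd = sqrt(shape) * scale.
SHAPE, SCALE = 2.0, 10.0
TRUE_MEAN = SHAPE * SCALE
TRUE_SD = math.sqrt(SHAPE) * SCALE


def spread_of_sample_means(n, num_experiments=4000, seed=1):
    """Repeat the n-measurement experiment and return the (mean, sd)
    of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gammavariate(SHAPE, SCALE) for _ in range(n)])
        for _ in range(num_experiments)
    ]
    return statistics.fmean(means), statistics.stdev(means)


# Compare the two sample sizes discussed in the text.
results = {n: spread_of_sample_means(n) for n in (20, 30)}
for n, (center, spread) in results.items():
    predicted = TRUE_SD / math.sqrt(n)
    print(f"n={n}: mean of means={center:.2f}, "
          f"sd of means={spread:.2f}, sigma/sqrt(n)={predicted:.2f}")
```

Under these assumptions, both sample sizes should center on the true mean of 20, while the spread of the sample means for n=30 should come out smaller than for n=20, in line with the standard deviation of the population divided by the square root of 'n'.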
The following figure illustrates this concept graphically.