Essay on variability and graphing data

It would be prudent to use a sample as a statistical representation of the entire population.

A suitable sample is one where $n \le 30$, with an appropriate distribution assumed for the data. For a sample $x_i$, $i = 1, 2, \ldots, n$, where $n \le 30$, with mean $\bar{x}$, consider an individual observation $x_i$. Then $(x_i - \bar{x})$ is the signed distance of $x_i$ from the mean, also called the deviation. The two can be thought of as points plotted on a line. It might seem logical to use $|x_i - \bar{x}|$ in analyzing this deviation, but the graph of the absolute value function has a nasty kink where the deviation is zero, that is, at $x_i = \bar{x}$. The better option is to use $(x_i - \bar{x})^2$, since this gives a smooth curve. The sample variance is the sum of these squared deviations divided by $n-1$, not $n$, which gives an unbiased estimate of the population variance. This can be shown by taking the expected value of the estimator; the bias term in its mean squared error is then zero.
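As a small illustration of these definitions, the Python sketch below computes the deviations, the sum of squared deviations, and the two candidate variances (dividing by n-1 and by n). The data values and function names are hypothetical, chosen only for illustration.

```python
import statistics


def deviations(xs):
    """Return each observation's signed distance from the sample mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def sum_of_squared_deviations(xs):
    """Sum of (x_i - mean)^2 over the sample."""
    return sum(d * d for d in deviations(xs))


def sample_variance(xs):
    """Divide by n - 1: the unbiased estimate of the population variance."""
    return sum_of_squared_deviations(xs) / (len(xs) - 1)


def variance_divided_by_n(xs):
    """Divide by n: the biased alternative."""
    return sum_of_squared_deviations(xs) / len(xs)


# Hypothetical sample, not taken from the essay.
data = [4.0, 7.0, 6.0, 3.0, 5.0]

print("deviations:", deviations(data))
print("variance (n - 1):", sample_variance(data))
print("variance (n):", variance_divided_by_n(data))

# Cross-check against the standard library's implementations.
print("statistics.variance:", statistics.variance(data))
print("statistics.pvariance:", statistics.pvariance(data))
```

The deviations always sum to zero, which is one reason they cannot be averaged directly and are squared first.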
We define $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$; this is the sample variance, and its square root $S$ is the sample standard deviation. However, if n is large enough, the difference between dividing by $n-1$ and dividing by $n$ becomes negligible. That is, $S^2 \approx \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ as $n \to \infty$. But the fact that $S^2$ is an unbiased estimator of the population variance does not mean that $S$ is an unbiased estimator of the population standard deviation, because taking the square root is a nonlinear operation. A biased estimator of the standard deviation can still be a suitable estimate if it deviates very little from the true value, that is, if it has a small spread. Using the mean squared error as a measure of efficiency, we can sometimes show that a biased estimator is more suitable than an unbiased one.
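The claim that $S^2$ is unbiased while $S$ is not can be checked with a rough simulation. The sketch below assumes normally distributed data with a known standard deviation; the sample size, seed, and function names are illustrative assumptions, not part of the essay.

```python
import math
import random


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate(n=5, sigma=2.0, reps=20000, seed=0):
    """Average S^2 and S over many simulated normal samples of size n."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = sample_variance(xs)
        total_s2 += s2
        total_s += math.sqrt(s2)
    return total_s2 / reps, total_s / reps


mean_s2, mean_s = simulate()
print("true variance: 4.0, average S^2:", round(mean_s2, 3))
print("true std dev:  2.0, average S:  ", round(mean_s, 3))
```

Under these assumptions the average of $S^2$ should land close to the true variance, while the average of $S$ should fall noticeably below the true standard deviation, and the gap shrinks as n grows.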
Technically, the mean squared error is MSE = variance + bias$^2$, so if the estimator is unbiased then MSE = variance. For consistency, it is desirable that the MSE approaches zero as n becomes larger; a biased estimator with this property can still be a good estimator, since its variability is low. This is why the sample standard deviation $S$, computed as the square root of the sample variance $S^2$, may not look like a good estimator of the population standard deviation if it is judged on bias alone. So I would say that my standard deviation, based on my sample $x_i$, $i = 1, 2, \ldots, n$, with $n \le 30$, is a biased estimate, but provided it has a small mean squared error it is a suitable estimate.
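The decomposition MSE = variance + bias$^2$ can be illustrated numerically. The sketch below compares the n-1 and n divisors as estimators of the population variance, assuming normal data; the parameters and function names are hypothetical.

```python
import random


def variance_estimates(xs):
    """Return (unbiased, biased) variance estimates for one sample."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (n - 1), ss / n


def mse_parts(estimates, true_value):
    """Empirical bias, variance, and MSE of a list of estimates."""
    m = len(estimates)
    avg = sum(estimates) / m
    bias = avg - true_value
    var = sum((e - avg) ** 2 for e in estimates) / m
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    return bias, var, mse


def run(n=5, sigma=1.0, reps=20000, seed=1):
    """Simulate many normal samples and summarize both estimators."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        u, b = variance_estimates(xs)
        unbiased.append(u)
        biased.append(b)
    true_var = sigma ** 2
    return mse_parts(unbiased, true_var), mse_parts(biased, true_var)


unbiased_parts, biased_parts = run()
for label, (bias, var, mse) in (("n - 1", unbiased_parts), ("n", biased_parts)):
    print(f"divisor {label}: bias={bias:.3f} variance={var:.3f} MSE={mse:.3f}")
```

For normal data with n = 5 and unit variance, theory gives an MSE of $2\sigma^4/(n-1) = 0.5$ for the n-1 divisor and $(2n-1)\sigma^4/n^2 = 0.36$ for the n divisor, so in this setting the biased estimator has the smaller mean squared error. This is one example, not a general rule.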
When comparing the standard deviation to the variance, I would prefer to interpret the variance but base my decision on the standard deviation. One can interpret the variance by comparing it to the CRLB (Cramér-Rao lower bound), which gives a lower bound on the variance of any unbiased estimator. On the other hand, the variance is obtained by squaring the standard deviation, which exaggerates large deviations and expresses the spread in squared units, so the standard deviation is the more stable measurement to use, since it stays in the units of the data.
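As a worked comparison with the CRLB, assume (this is an added assumption, not stated in the essay) that the data are a normal sample with unknown mean. Then the variance of $S^2$ and the Cramér-Rao lower bound for unbiased estimators of $\sigma^2$ are:

```latex
\[
\operatorname{Var}(S^2) = \frac{2\sigma^4}{n-1}
\;\ge\;
\mathrm{CRLB}(\sigma^2) = \frac{2\sigma^4}{n},
\qquad
\text{efficiency} = \frac{\mathrm{CRLB}(\sigma^2)}{\operatorname{Var}(S^2)} = \frac{n-1}{n}.
\]
```

So under normality $S^2$ does not quite attain the bound, but with n = 30 its efficiency is 29/30, or about 0.97.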


Peck, R., Olsen, C., & Devore, J. L. (2012). Introduction to statistics and data analysis. Boston, MA: Brooks/Cole Cengage Learning.
Morley, S., & Adams, M. (1991). Graphical analysis of single‐case time series data. British Journal of Clinical Psychology, 30(2), 97-115.