When a set of numbers is obtained from a sample (e.g. income, voting preference, number of children, number of product defects, etc.), it is often useful to know something about how ‘spread out’ the values are. Are they all bunched closely together, are they spread across a wide range, or something in between?
The simplest measure of spread is the range, which is the difference between the highest and the lowest value. Although very easy to calculate, the range contains no information about the values in between the highest and lowest, and so is not particularly informative.
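As a small illustration of this limitation, the sketch below computes the range for two made-up samples. The function name `value_range` and both data sets are assumptions chosen for the example, not part of the original text.

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)

# Two hypothetical samples: one bunched around 5, one spread more evenly
bunched = [1, 5, 5, 5, 5, 5, 9]
spread_out = [1, 2, 4, 5, 6, 8, 9]

print(value_range(bunched))     # 8
print(value_range(spread_out))  # 8
```

Both samples have the same range even though their values are distributed quite differently between the extremes, which is why the range alone says little about spread.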
By far the most common measure of spread is called the variance. Calculating the variance involves several steps:
- Calculate the mean
- Take each value in the sample and calculate its deviation from the mean (i.e. subtract the mean from each value)
- Square all the deviations from the mean
- Calculate the mean (average) of these squared deviations (not the mean of the original sample, the mean of the squared deviations)
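The steps above can be sketched in a few lines of Python. This is a minimal illustration: the function name `variance` and the sample values are assumptions made for the example. It divides by the number of values, as the steps describe; many textbooks and libraries instead divide by one less than the number of values when estimating variance from a sample.

```python
def variance(values):
    """Average squared deviation from the mean (dividing by n)."""
    n = len(values)
    # Step 1: calculate the mean
    mean = sum(values) / n
    # Step 2: deviation of each value from the mean
    deviations = [x - mean for x in values]
    # Step 3: square each deviation
    squared = [d ** 2 for d in deviations]
    # Step 4: mean of the squared deviations
    return sum(squared) / n

# Hypothetical sample with mean 5
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(sample))  # 4.0
```

In Python's standard library, `statistics.pvariance` corresponds to this divide-by-n version, while `statistics.variance` divides by n - 1.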
The variance is thus the average squared deviation of each value from the sample mean. A large variance means that many values are far away from the mean, whereas a small variance means that most values are bunched close to the mean. The deviations are squared so that positive deviations (values higher than the mean) do not cancel out negative deviations (values lower than the mean). Without squaring, the deviations would always sum to zero; squaring them gives a measure of the total variation that includes deviations both above and below the mean.
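The cancellation can be seen directly with a short sketch. The sample values here are made up for illustration.

```python
# Hypothetical sample with mean 5
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(sample) / len(sample)

# Raw deviations: negative below the mean, positive above it
deviations = [x - mean for x in sample]

print(sum(deviations))                  # 0.0
print(sum(d ** 2 for d in deviations))  # 32.0
```

The raw deviations sum to zero regardless of how spread out the values are, whereas the squared deviations accumulate into a positive total.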
Another measure of spread closely related to the variance is called the standard deviation, which is simply the square root of the variance. The purpose of taking the square root is to offset the squaring of each deviation that is done in order to calculate the variance. By taking the square root in this way, the standard deviation becomes a measure of spread expressed in the same units as the original values, so it can be directly compared to the mean. So, for example, if the mean is 5 and the standard deviation is 10, this means (roughly) that values in the sample typically lie about 10 away from the mean (either above or below). Many individual values may be closer or further away, but this represents the ‘average deviation’ from the mean.
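A minimal sketch of the standard deviation follows, using the divide-by-n variance described earlier. The function name `standard_deviation` and the sample values are assumptions chosen for the example.

```python
import math

def standard_deviation(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Variance: mean of the squared deviations (dividing by n)
    var = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(var)

# Hypothetical sample with mean 5 and variance 4
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(sample))  # 2.0
```

Here the variance is 4 (in squared units), while the standard deviation of 2 is in the same units as the values themselves, so it can be read as a typical distance from the mean of 5.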