Considering measures of variability
The variability of a distribution tells us important information
about how different the scores are from one another. Variability helps
us understand the shape of the distribution, understand the difference
between the mean and median of the distribution, and make decisions about
what the data distribution means with regard to our research question.
We'll concentrate on two measures of variability, the range and
the standard deviation. Consider a very simple distribution (scores
on a 30 point quiz), with only eight data points.
21, 22, 23, 24, 25, 26, 27, 28
The simplest measure of variability is the range.
The range is the difference between the largest (maximum) X
value and the smallest (minimum) X value.
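As a minimal sketch of this definition (Python is used here only for illustration and is an assumption of this example; the lab itself uses SPSS), the range is just the maximum minus the minimum. The variable names are made up for the example.

```python
# Quiz scores from the example distribution above
scores = [21, 22, 23, 24, 25, 26, 27, 28]

# Range = largest (maximum) value minus smallest (minimum) value
score_range = max(scores) - min(scores)
print("Range:", score_range)
```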
We can also calculate the standard deviation of the distribution. It measures how far off all of the individuals in the distribution are
from a standard, where that standard is the mean of the distribution. In other words, it is the "average distance" that each point is from the mean.
(1) Look at the data set above and compute the range
of the distribution. What is the range of quiz scores?
STEP 1: Our first step is to compute the mean of the distribution.
(3f) Let's think about these measures for a minute:
(2) Compute the mean for this quiz score distribution.
STEP 2: The next step is to figure out how far away each of the
data points in the distribution is from this standard (the deviations).
(3a) Calculate the deviation of each score from the
mean by subtracting the mean you calculated above from each score in the
distribution (i.e., score - mean = deviation). List the deviation scores below.
You should have one deviation score for each of the 8 quiz scores listed above.
Notice that if you add up all of the deviations they should/must equal
0. Think about it at a conceptual level. What you are doing is taking one
side of the distribution and making it positive, and the other side negative
and adding them together. They should cancel each other out.
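If you want to see the cancelling-out for yourself, here is a small sketch in Python (an illustration only, not part of the SPSS lab; the variable names are hypothetical). It computes score - mean for every quiz score and adds the results.

```python
# Quiz scores from the example distribution
scores = [21, 22, 23, 24, 25, 26, 27, 28]

# The standard we measure from: the mean of the distribution
mean = sum(scores) / len(scores)

# One deviation per score (score - mean); scores below the mean are negative
deviations = [x - mean for x in scores]

# Negative and positive deviations cancel each other out
total = sum(deviations)
print("Deviations:", deviations)
print("Sum of deviations:", total)
```

With other data sets the sum may come out as a tiny nonzero number because of floating-point rounding, so compare it to 0 with a small tolerance rather than exactly.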
STEP 3: The next step is to find the "average" of these differences.
But notice that we have a problem. If they always add up to 0, then the
average difference will always be zero. So what we have to do is get rid
of the negative signs. We do this by squaring the deviations (and then
later we'll reverse this by taking the square root of the average of the squared
deviations). So to do this step we'll square the deviations first and then
add them together. We'll refer to this value as the Sum of the Squared
Deviations (Sum of Squares for short).
(3b) Square each of the deviation scores and then add up all
of the values you get. This will be the sum of squared deviations. What
is the sum of squares for this distribution?
STEP 4: Now we have the sum of squares (SS), but remember that
we're looking for the average of the squared deviations. So to get the
mean, we need to divide by the number of individuals in the sample minus
1 (n - 1). The reason that we need to subtract 1 is related to the
fact that our deviations always add up to 0. Because we know the mean of
our sample in advance, this constrains one of the data points. We will
discuss this issue (called degrees of freedom) in more detail later in
the course. If we are computing the standard deviation for a population
rather than a sample, then we figure out the average deviation by dividing
by n alone. Some calculators and statistical packages will give
you the option of calculating the standard deviation for either a sample
or a population.
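Python's standard library is one example of a package that offers both options. The sketch below is only an illustration (it is not part of the SPSS lab): statistics.stdev divides the sum of squares by n - 1 (sample), and statistics.pstdev divides by n (population).

```python
import statistics

# Quiz scores from the example distribution
scores = [21, 22, 23, 24, 25, 26, 27, 28]

# Sample standard deviation: sum of squares divided by n - 1
sample_sd = statistics.stdev(scores)

# Population standard deviation: sum of squares divided by n
population_sd = statistics.pstdev(scores)

print("Sample SD (n - 1):", round(sample_sd, 2))
print("Population SD (n):", round(population_sd, 2))
```

Because n - 1 is smaller than n, the sample value is always a little larger than the population value for the same data.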
(3c) Divide your sum of squares value by n - 1 (i.e.,
8 - 1 = 7) to find the average sum of squares. List it below.
STEP 5: The last step is to reverse that squaring we did to get
rid of the signs of the deviations. We need to take the square root to
get our standard deviation.
(3d) Take the square root of the average sum of squares
value. This will give you the standard deviation (i.e., the average difference
from the mean for the scores in the distribution). The value you get should
be about 2.45 (with some room for rounding error). If you didn't get something
close to this value, go back through the steps again and make sure you
did each step correctly.
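Steps 1 through 5 can be written compactly as formulas. The symbols here are assumptions of this sketch rather than notation from the worksheet: X is a score, M is the mean of the scores, n is the number of scores, s is the standard deviation when we treat the data as a sample, and sigma is the standard deviation when we treat the data as a population.

```latex
SS = \sum (X - M)^2, \qquad
s = \sqrt{\frac{SS}{n - 1}} \quad \text{(sample)}, \qquad
\sigma = \sqrt{\frac{SS}{n}} \quad \text{(population)}
```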
You should always ask yourself, "does this answer make sense?" Look
back at the deviations. These represent the actual difference between each
score and the mean. If you ignored the sign for each and just took the
average, you'd have an average of 2. The standard deviation will not match
this value exactly, because squaring gives the larger deviations more weight,
but it should be in the same neighborhood. If this were a population, we would
divide by n and get about 2.29. But remember, since this is a sample we divide
by n - 1 and get a slightly larger number, 2.45. This represents the typical
difference between the scores and the mean, and it makes sense for this
distribution. With a larger sample, the difference between dividing by n and
dividing by n - 1 would shrink.
STEP 1: Compute the mean of the distribution
STEP 2: Find the deviation scores by subtracting the mean
from each score
STEP 3: Square the deviation scores to get rid of the sign and
add them up to get the sum of squares
STEP 4: Take the average of the sum of squares by dividing by
n for a population or n - 1 for a sample
STEP 5: Take the square root to reverse the squaring that was
done in Step 3
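The five steps can also be sketched as a short Python function. This is an illustration under stated assumptions, not part of the SPSS lab: the function name standard_deviation and its sample parameter are made up for this example.

```python
import math

def standard_deviation(scores, sample=True):
    """Compute the standard deviation by following the five steps."""
    n = len(scores)
    mean = sum(scores) / n                        # STEP 1: the mean
    deviations = [x - mean for x in scores]       # STEP 2: score - mean
    ss = sum(d ** 2 for d in deviations)          # STEP 3: sum of squares
    average_ss = ss / (n - 1 if sample else n)    # STEP 4: n - 1 or n
    return math.sqrt(average_ss)                  # STEP 5: square root

quiz = [21, 22, 23, 24, 25, 26, 27, 28]
print(round(standard_deviation(quiz), 2))                # treated as a sample
print(round(standard_deviation(quiz, sample=False), 2))  # treated as a population
```

Writing each step on its own line makes it easy to print the intermediate values and compare them with the ones on your worksheet.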
How many scores in the distribution is the range based on?
How many scores in the distribution is the standard
deviation based on?
From your answers here, which measure should
give you more information about the entire distribution?
Using SPSS to compute measures of variability
Let's check our calculations using SPSS to find the range and standard
deviation for this distribution.
Open SPSS and instead of opening a data file,
choose the option to type in new data. Type in the quiz scores (21, 22,
23, 24, 25, 26, 27, 28) into a single column. Follow the steps below to
calculate the range and standard deviation. Type the values into your worksheet.
You can access these descriptive statistics in the same way that we
accessed the mean, median, and mode. Go to "Analyze", "Descriptive Statistics",
This will open up a window so that you can choose the variable you want
descriptive statistics on. Choose the variable and click on the arrow tab
in the middle of the window to put it in the box. Then click "OK." Your
output will display minimum and maximum values, the mean of the distribution,
and the standard deviation.
You can get the range by choosing "Options" and clicking the range box
to check it.
Then click "OK." Your output will display minimum and maximum values,
the range, the mean of the distribution, and the standard deviation.
(4) Type the range and standard deviation values into
your worksheet. Compare your calculated range and sd values with the ones
given by SPSS. If they are different, explain why you think that would be.
Comparing variability for different distributions
The results are presented as two distributions of reaction times, one for
large squares and one for small squares.
(5a) Briefly describe the similarities and differences
between the two distributions (e.g. shape, center, and spread).
(5b) Based on these distributions, what effect does square size have on your
reaction time?
One last example. Consider the following three sets of data:
distribution1: 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9
distribution2: 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8
distribution3: 1, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 9
(6) Just looking at the numbers, for which distribution
is variability the lowest? Why did you come to that conclusion?
Next, we will type the data from the 3 distributions
into a new file in SPSS. You can think of the 3 distributions above as
3 different variables for SPSS - that is, you will enter data from 'distribution1'
into the first column, data from 'distribution2' into the second column,
and 'distribution3' into the 3rd column. You may name these 3 new variables
whatever you'd like. Once you've entered this data into SPSS, complete
the following questions:
(7) For each distribution, use SPSS to construct a histogram.
Cut and paste the histogram into your worksheet.
(8) For each distribution use SPSS to compute the range
and standard deviation. Type the answers into your worksheet.
(9) Which measure of variability is most affected by
extreme values (hint: compare the 2nd and 3rd distributions)?
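If you would like a second check on your SPSS output for question 8, the sketch below computes both measures for all three distributions in Python. It is only an illustration, not part of the lab; the dictionary keys reuse the variable names suggested above, and statistics.stdev uses the sample formula (n - 1).

```python
import statistics

# The three distributions from the example above
distributions = {
    "distribution1": [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9],
    "distribution2": [3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "distribution3": [1, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 9],
}

ranges = {}
sds = {}
for name, scores in distributions.items():
    ranges[name] = max(scores) - min(scores)   # uses only two scores
    sds[name] = statistics.stdev(scores)       # uses every score (sample SD)
    print(name, "range:", ranges[name], "SD:", round(sds[name], 2))
```

Comparing the printed values for the 2nd and 3rd distributions shows how each measure responds when only the most extreme scores change.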
Let's go back to the students.sav file and look at the distributions of
quiz scores (quizzes 1-5) that we looked at in lab 8 to compare them based
on their variability.
Open the students.sav file and calculate the range
and standard deviation for each quiz variable. Cut and paste the output into your worksheet.
Then answer the questions below:
(10) Which quiz appears to do a better job of discriminating
student learning of the material on the quiz (that is, which quiz best
differentiates between high and low performers/students)? Why do you think so?
Properties of the standard deviation
1) Adding a constant to each score in the distribution will not change
the standard deviation.
So if you add 2 to every score in the distribution, the mean changes
(by 2), but the standard deviation stays the same (since none of the deviations
would change because you add 2 to each score and the mean changes by 2).
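Here is a small sketch of this first property in Python (an illustration only, not part of the SPSS lab). It reuses the eight quiz scores from earlier in the lab and adds 2 to each one; the variable names are made up for the example.

```python
import math
import statistics

scores = [21, 22, 23, 24, 25, 26, 27, 28]

# Add the constant 2 to every score in the distribution
shifted = [x + 2 for x in scores]

mean_change = statistics.mean(shifted) - statistics.mean(scores)
sd_before = statistics.stdev(scores)
sd_after = statistics.stdev(shifted)

print("Change in mean:", mean_change)
print("SD before:", round(sd_before, 2), "SD after:", round(sd_after, 2))
```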
2) Multiplying each score by a constant causes the standard deviation
to be multiplied by the same constant.
This one is easier to think of with numbers. Suppose that your mean
is 20, and that two of the individuals in your distribution are 21 and
23. If you multiply 21 and 23 by 2 you get 42 and 46, and your mean also
changes by a factor of 2 and is now 40. Before, your deviations were (21
- 20 = 1) and (23 - 20 = 3). But now, your deviations are (42 - 40 =
2) and (46 - 40 = 6). So your deviations are getting twice as big as
well.
Let's look at these properties (along with the properties of the mean).
Enter the following set of data into an SPSS data file.
1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9
(11) What are the mean and standard deviation for this
distribution?
(12) Transform the distribution by adding 3 points to
each score (adding a constant to every score in the distribution). (Hint:
you can use the compute function in SPSS to do this.) What will your mean
and standard deviation for the new distribution be? Check your answer with SPSS.
(13) Transform the original distribution (not the new
one from question 12) by multiplying every score by 3 (again, use the compute
function). What will your mean and standard deviation for the new distribution
be? Check your answer with SPSS.
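After you have made your predictions and checked them in SPSS, you can run one more cross-check with the sketch below. It is an illustration in Python, not part of the lab; the helper name describe is hypothetical, and statistics.stdev uses the sample formula (n - 1).

```python
import statistics

original = [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9]

def describe(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Each transformation starts from the original scores, not from the other one
plus_three = [x + 3 for x in original]
times_three = [x * 3 for x in original]

for label, data in [("original", original),
                    ("add 3", plus_three),
                    ("multiply by 3", times_three)]:
    mean, sd = describe(data)
    print(label, "mean:", round(mean, 2), "SD:", round(sd, 2))
```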