For this lab, you will need the SPSS file students.sav.
Computing variability
The variability of a
distribution tells us important information
about how different the scores are from one
another. Variability helps us understand the
shape of the distribution, understand the
difference between the mean and median of the
distribution, and make decisions about what the
data distribution means with regard to our
research question.
The Range
Consider a very simple
distribution of a sample of 8 scores on a 30
point quiz.
21, 22, 23, 24, 25, 26, 27, 28
The simplest measure of variability is
the range.
The range is the
difference between the largest (maximum) X
value and the smallest (minimum) X value. In
this case the answer would be 28−21=7
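If you want to double-check the subtraction outside of SPSS, here is a minimal Python sketch of the same calculation. The function name value_range is made up for this illustration; it is not part of SPSS or of the lab files.

```python
# Quiz scores from the example above (8 scores on a 30-point quiz).
scores = [21, 22, 23, 24, 25, 26, 27, 28]


def value_range(values):
    """Return the range: the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range(scores))  # 28 - 21 = 7
```

Notice that only two of the eight scores (the minimum and the maximum) are used in the calculation.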
(1)
Look at the data set above (the
distribution of 8 scores from the 30 point
quiz) and compute the range of the
distribution. What is the range of quiz
scores?
The Standard Deviation
Calculating the standard
deviation by hand
We can calculate the standard
deviation of the distribution. It
measures how far off all of the individuals in
the distribution are from a standard, where
that standard is the mean of the distribution.
In other words, it is the "average distance"
that each point is from the mean.
STEP 1: Compute
the mean of the distribution.
STEP 2: Figure out
how far away each of the data points in the
distribution is from this standard (the deviations).
Calculate the deviation of each score from the mean
by subtracting the mean from each score in the
distribution (i.e., score - mean = deviation).
Hint: remember that
you will have some negative
deviations.
Notice that if you add up all of the
deviations they should/must equal 0. Think
about it at a conceptual level. What you are
doing is taking one side of the distribution
and making it positive, and the other side
negative and adding them together. They
should cancel each other out.
STEP 3:
What we want to do is
find the "average" of these
deviations. However, notice that we
have a problem. Usually, to find an
average, we add up the scores and divide by
the total number of scores. In this
case, if we add up all of the deviations,
they sum to 0, because the negative and
positive deviations cancel each other out
(the point made above). So we will do an
additional step first: square each of the
deviations, then add the squared deviations
together to get the Sum of
the Squared Deviations (Sum of
Squares for short).
STEP 4:
To get the average of
the squared deviations, divide the Sum of
Squares by the number of scores in the
sample minus 1 (n - 1).
This average squared deviation is called the variance.
The reason that we need to subtract 1 is
related to the fact that our deviations
always add up to 0. Because we know the mean
of our sample in advance, this constrains
one of the data points. We will discuss this
issue (called degrees of freedom) in more
detail later in the course.
Note:
If we are computing the standard
deviation for a population, then we
divide by n alone. Some
calculators and statistical packages
will give you the option of calculating
the standard deviation for either a
sample or a population, but the default
in SPSS and most other software is the
sample formula.
STEP 5:
Reverse that squaring
we did to get rid of the signs of the
deviations. We need to take the square
root to get our Standard Deviation.
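The five steps above can be sketched in Python as follows. This is only an illustration: the function name sample_standard_deviation and the five example scores are made up here (they are not the quiz data), so you can follow the logic without seeing the answers to the questions below.

```python
import math


def sample_standard_deviation(scores):
    """Follow the five hand-calculation steps for a sample of scores."""
    n = len(scores)

    # STEP 1: compute the mean of the distribution.
    mean = sum(scores) / n

    # STEP 2: deviation of each score from the mean (score - mean).
    deviations = [x - mean for x in scores]

    # STEP 3: square the deviations and add them up (Sum of Squares).
    sum_of_squares = sum(d ** 2 for d in deviations)

    # STEP 4: divide by n - 1 to get the sample variance.
    variance = sum_of_squares / (n - 1)

    # STEP 5: take the square root to undo the squaring.
    standard_deviation = math.sqrt(variance)

    return {
        "mean": mean,
        "deviations": deviations,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "standard_deviation": standard_deviation,
    }


# Hypothetical example data, chosen only because the arithmetic is easy.
example_scores = [1, 3, 5, 7, 9]
result = sample_standard_deviation(example_scores)

print(result["mean"])                          # 5.0
print(sum(result["deviations"]))               # 0.0 (deviations cancel out)
print(result["sum_of_squares"])                # 40.0
print(result["variance"])                      # 10.0
print(round(result["standard_deviation"], 2))  # 3.16
```

The second printed value shows why Step 3 is needed: the raw deviations always add up to 0, so they have to be squared before they can be averaged.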
Because calculation of the standard
deviation requires many steps, it will be
much easier if you create a table that looks
like the one below and then fill in the
blanks as you go.
Score X | Mean μ | Deviation X−μ | Squared Deviation (X−μ)² |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
Totals | | | |
(2) Now let's
calculate the standard deviation of the
quiz distribution (the 8 score
distribution from the 30 point quiz). You
may wish to use the table in the worksheet
(like the one above) to complete these
steps.
(a) What is the mean for this quiz score
distribution?
(b)
Calculate the deviation of each score from
the mean by subtracting the mean you
calculated above from each score in the
distribution (i.e., score - mean = deviation). List
the deviation scores below. You should have
one deviation score for each of the 8 quiz
scores listed above.
(c)
Square each of the deviation scores and then
add up the values you get. This will be the
sum of squared deviations. What is the sum
of squares for this distribution?
(d)
Divide your sum of squares value by n - 1
(i.e., 8 - 1 = 7) to find the variance (the
average squared deviation). List it below.
(e)
Take the square root of the variance.
This will give you the
standard deviation (i.e., roughly the typical
distance from the mean for the scores in
the distribution). The value you get should
be about 2.45 (with some room for rounding
error). If you didn't get something close to
this value, go back through the steps again
and make sure you did each step correctly.
You should always ask yourself, "does this
answer make sense?" Look back at the
deviations. These represent the actual
difference between each score and the mean. If
you ignored the sign for each and just took
the average, you'd have an average of 2. The
standard deviation should land close to this
value, though usually a little above it,
because squaring gives the larger deviations
more weight. If this were a population, we
would divide by n and get about 2.29. But
remember, since this is a sample we divide by
n - 1 and get a slightly larger number, 2.45.
This is roughly the typical distance between
the scores and the mean, and it makes sense for
this distribution. With a larger sample, the
difference between dividing by n and dividing
by n - 1 becomes smaller.
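As a sanity check on the numbers in this paragraph, here is a short Python sketch that computes the average of the unsigned deviations alongside the population and sample standard deviations for the quiz scores. It uses Python's built-in statistics module as a stand-in for SPSS; the variable names are just for illustration.

```python
import statistics

# The 8 quiz scores from the example above.
scores = [21, 22, 23, 24, 25, 26, 27, 28]

mean = statistics.mean(scores)

# Average of the deviations with the signs ignored.
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)

# Population formula: divide the sum of squares by n.
population_sd = statistics.pstdev(scores)

# Sample formula: divide the sum of squares by n - 1 (the SPSS default).
sample_sd = statistics.stdev(scores)

print(round(mean_abs_dev, 2))   # 2.0
print(round(population_sd, 2))  # 2.29
print(round(sample_sd, 2))      # 2.45
```

All three values are in the same neighborhood, which is what you would expect if the hand calculation was done correctly.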
(3) Take a step back and
think about these measures for a minute:
How many scores in the
distribution is the range based on?
How many scores in the
distribution is the standard deviation based
on?
From your answers here,
which measure should give you more
information about the entire distribution?
Why the standard deviation is better
than the range
Let’s compare the standard deviation and the
range for a second. The range is based on just
2 numbers: the lowest and the highest. The
standard deviation is based on every number in
the distribution. Every number matters. Thus
it is easy to see why statisticians prefer the
standard deviation over the range as an
accurate and informative measure of
variability.
Using SPSS to compute measures of
variability
Let’s check our calculations using SPSS to find
the range and standard deviation for this
distribution.
Open SPSS and instead of opening a data file,
choose the option to type in new data or just
click Cancel. SPSS opens to the Variable
View tab by default. In the Name
column, type in the name of a new variable. It
does not matter what you call it. You can do
what you want, but whenever I don’t care what
something is named, I name it Fred.
Click the Data View tab at the bottom
left and type in the quiz scores (21, 22, 23,
24, 25, 26, 27, 28) into a single column.
You can access these descriptive statistics in
the same way that we accessed the mean, median,
and mode. In the menu, go to Analyze→Descriptive
Statistics→Descriptives.
This will open up a window so that you can
choose the variable you want descriptive
statistics on. Choose the variable and click on
the arrow tab in the middle of the window to put
it in the box. Then click OK. Your
output will display minimum and maximum values,
the mean of the distribution, and the standard
deviation.
You can get the range by choosing Options
and clicking the range box to check it.
Click Continue and then click OK.
Your output will display minimum and maximum
values, the range, the mean of the distribution,
and the standard deviation.
(4)
Open students.sav.
Access these descriptive statistics in the
same way as the mean, median, and mode.
Go to Analyze, Descriptive Statistics,
and Descriptives.
Select quiz1-quiz5 and
final. (Hint: you can do this all
at once if you highlight all of them and move
them as a block.)
(a) What are the
mean, standard deviation, and range for
quizzes 1-5?
(b) Which quiz has the largest variability
based on range? Based on standard
deviation?
Comparing variability for different
distributions
Consider the following
three sets of data:
distribution 1 | 1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9 |
distribution 2 | 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8 |
distribution 3 | 1, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 9 |
Type the data from the 3
distributions into a new file in SPSS. You
can think of the 3 distributions above as 3
different variables for SPSS, that is, you
will enter data from distribution1
into the first column, data from distribution2
into the second column, and distribution3
into the 3rd column. Once you've entered
this data into SPSS, complete the following:
For each distribution, use
SPSS to construct a histogram and to compute
the range and standard deviation.
(5) Just looking at
the numbers, for which distribution is
variability the lowest? Why did you come
to that conclusion?
(6) For each
distribution, use SPSS to construct a
histogram. Include the histogram in your
SPSS output file that you attach to the
assignment.
(7) For each
distribution use SPSS to compute the range
and standard deviation. Type the answers
into your worksheet.
(8) Which measure of
variability is most affected by extreme
values (hint: compare the 2nd and 3rd
distributions)?
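If you would like to cross-check your SPSS output for the three distributions, a sketch like the one below computes the same two measures in Python. It assumes the sample (n - 1) formula, which matches the SPSS default described earlier; the helper name summarize is made up for this example. Run it yourself to see the values, and compare them with what SPSS reports.

```python
import statistics

# The three distributions from the table above, one list per SPSS variable.
distributions = {
    "distribution 1": [1, 2, 3, 3, 4, 4, 4, 5, 5, 5,
                       5, 5, 6, 6, 6, 7, 7, 8, 9, 9],
    "distribution 2": [3, 3, 3, 3, 4, 4, 4, 5, 5, 5,
                       5, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "distribution 3": [1, 3, 3, 4, 4, 5, 5, 5, 5, 5,
                       5, 5, 5, 5, 6, 6, 6, 7, 7, 9],
}


def summarize(values):
    """Return the range and the sample standard deviation of the values."""
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample formula (divides by n - 1)
    }


for name, values in distributions.items():
    summary = summarize(values)
    print(f"{name}: range = {summary['range']}, sd = {summary['sd']:.2f}")
```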
Properties of the standard deviation
Let’s look at some properties of the
standard deviation using SPSS.
Enter the following set of data into an
SPSS data file and calculate the standard
deviation:
Transform the distribution by adding 3
points to each score (adding a constant to
every score in the distribution). (Hint:
you can use the Compute function
in the Transform menu to do
this).
Calculate the standard deviation for the
transformed variable.
(9)
Comparing the standard deviations of the
original and transformed variables, what
happened?
Now transform the original distribution
by multiplying every score by 2 (again,
use the compute function). Calculate the
new standard deviation.
(10)
Comparing the standard deviations of the
original and transformed variables, what
happened?
Here is what I hope you learned by doing
this:
1) Adding a constant to each score in the
distribution will not change the standard
deviation.
So if you add 2 to every
score in the distribution, the mean
changes by 2, but the standard deviation
stays the same.
2) Multiplying each score by a constant
causes the standard deviation to be
multiplied by the same constant.
This one is easier to
think of with numbers. Suppose that your
mean is 20, and that two of the
individuals in your distribution are 21
and 23.
If you multiply 21 and 23
by 2 you get 42 and 46, and your mean also
changes by a factor of 2 and is now 40.
Before your deviations
were (21−20 = 1) & (23−20 = 3).
But now, your deviations
are (42−40 = 2) & (46−40 = 6).
So your deviations are
getting twice as big as well.
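Both properties can also be checked with a few lines of Python. This sketch reuses the 8 quiz scores from earlier in the lab as stand-in data (an assumption made for illustration; any set of scores would show the same pattern), and the variable names are made up.

```python
import math
import statistics

# Stand-in data: the 8 quiz scores from earlier in the lab.
scores = [21, 22, 23, 24, 25, 26, 27, 28]

original_sd = statistics.stdev(scores)

# Property 1: add a constant (3 points) to every score.
shifted = [x + 3 for x in scores]
shifted_sd = statistics.stdev(shifted)

# Property 2: multiply every score by a constant (2).
doubled = [x * 2 for x in scores]
doubled_sd = statistics.stdev(doubled)

# Adding a constant leaves the standard deviation unchanged.
print(math.isclose(shifted_sd, original_sd))      # True

# Multiplying by 2 doubles the standard deviation.
print(math.isclose(doubled_sd, 2 * original_sd))  # True
```

The mean moves in both cases (up by 3, or doubled), but only multiplication changes how spread out the scores are.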
Now attach
your Word worksheet and SPSS output
file (or copy and paste the required
parts of the output into your Word
document) to the Lab 9
assignment.