Outline

Consider what measures of variability tell us about a distribution
Compute measures of variability using Excel & SPSS
Compare distributions with different amounts of variability

Lab 9

Describing distributions: Variability

For this lab, you will need SPSS file students.sav.

Computing variability

The variability of a distribution tells us important information about how different the scores are from one another. Variability helps us understand the shape of the distribution, understand the difference between the mean and median of the distribution, and make decisions about what the data distribution means with regard to our research question.

The Range

Consider a very simple distribution of a sample of 8 scores on a 30 point quiz.

21, 22, 23, 24, 25, 26, 27, 28

The simplest measure of variability is the range.

The range is the difference between the largest (maximum) X value and the smallest (minimum) X value. In this case the answer would be 28−21=7
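As a quick illustration, the range calculation can be sketched in a few lines of Python. This is only a sketch for checking hand work; the variable names `scores` and `score_range` are made up for this example and are not part of the lab materials.

```python
# Quiz scores from the example distribution
scores = [21, 22, 23, 24, 25, 26, 27, 28]

# Range = largest (maximum) score minus smallest (minimum) score
score_range = max(scores) - min(scores)

print(score_range)  # 7
```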

(1) Look at the data set above (the distribution of 8 scores from the 30 point quiz) and compute the range of the distribution. What is the range of quiz scores?

The Standard Deviation

Calculating the standard deviation by hand

We can calculate the standard deviation of the distribution. It measures how far off all of the individuals in the distribution are from a standard, where that standard is the mean of the distribution. In other words, it is the "average distance" that each point is from the mean.

STEP 1: Compute the mean of the distribution.
STEP 2: Figure out how far away each data point in the distribution is from this standard (the deviations). Calculate the deviation of each score by subtracting the mean from that score (i.e., score - mean = deviation).

Hint: remember that you will have some negative deviations.
Notice that if you add up all of the deviations, the sum must equal 0. Think about it at a conceptual level: scores above the mean produce positive deviations and scores below it produce negative deviations. Because the mean is the balance point of the distribution, the two sides cancel each other out when you add them together.

STEP 3: What we want is the "average" of these deviations, but we have a problem. Usually, to find an average, we add up the scores and divide by the total number of scores. Here, if we add up all of the deviations, they sum to 0 because, as noted above, the negative and positive deviations cancel each other out. So we take an additional step first: square each deviation, then add the squared deviations together to get the Sum of the Squared Deviations (Sum of Squares for short).

STEP 4: To get the average of the squared deviations, divide the Sum of Squares by the number of scores in the sample minus 1 (n - 1). This average squared deviation is called the Variance.

The reason that we need to subtract 1 is related to the fact that our deviations always add up to 0. Because we know the mean of our sample in advance, this constrains one of the data points. We will discuss this issue (called degrees of freedom) in more detail later in the course.

Note: If we are computing the standard deviation for a population, then we divide by n alone. Some calculators and statistical packages will give you the option of calculating the standard deviation for either a sample or a population, but the default in SPSS and most other software is the sample formula.

STEP 5: Reverse that squaring we did to get rid of the signs of the deviations. We need to take the square root to get our Standard Deviation.
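The five steps above can be sketched in Python as a way to check hand calculations. This is a minimal sketch using the 8 quiz scores from this lab and the sample formula (dividing by n - 1); the variable names are chosen for this example only.

```python
import math

scores = [21, 22, 23, 24, 25, 26, 27, 28]
n = len(scores)

# STEP 1: compute the mean of the distribution
mean = sum(scores) / n

# STEP 2: deviation of each score from the mean (score - mean)
deviations = [x - mean for x in scores]

# The positive and negative deviations cancel, so they sum to 0
deviation_total = sum(deviations)

# STEP 3: square each deviation and add them up (Sum of Squares)
sum_of_squares = sum(d ** 2 for d in deviations)

# STEP 4: divide by n - 1 to get the sample variance
variance = sum_of_squares / (n - 1)

# STEP 5: take the square root to get the standard deviation
standard_deviation = math.sqrt(variance)

print(mean)                          # 24.5
print(deviation_total)               # 0.0
print(sum_of_squares)                # 42.0
print(variance)                      # 6.0
print(round(standard_deviation, 2))  # 2.45
```

Each printed value corresponds to one row or total in the hand-calculation table, so the output can be compared against the worksheet step by step.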

Because calculation of the standard deviation requires many steps, it will be much easier if you create a table that looks like the one below and then fill in the blanks as you go.

Score $X$	Mean $μ$	Deviation $X - μ$	Squared Deviation $(X - μ)^2$

Totals

(2) Now let's calculate the standard deviation of the quiz distribution (the 8-score distribution from the 30 point quiz). You may wish to use the table in the worksheet, like the one above, to complete these steps.
    (a) What is the mean for this quiz score distribution?
    (b) Calculate the deviation of each score from the mean by subtracting the mean you calculated above from each score in the distribution (i.e., score - mean = deviation). List the deviation scores below. You should have one deviation score for each of the 8 quiz scores listed above.
    (c) Square each of the deviation scores and then add up the values you get. This will be the sum of squared deviations. What is the sum of squares for this distribution?
    (d) Divide your sum of squares value by n - 1 (i.e., 8 - 1 = 7) to find the average of the squared deviations (the variance). List it below.
    (e) Take the square root of the variance. This will give you the standard deviation (i.e., roughly the average distance of the scores in the distribution from the mean). The value you get should be about 2.45 (with some room for rounding error). If you didn't get something close to this value, go back through the steps again and make sure you did each step correctly.

You should always ask yourself, "Does this answer make sense?" Look back at the deviations. These represent the actual difference between each score and the mean. If you ignored the sign of each and just took the average, you'd get an average distance of 2. The standard deviation will not match this value exactly, because squaring gives the larger deviations extra weight, but it should be in the same neighborhood. If this were a population, we would divide by n and get about 2.29. Since this is a sample, we divide by n - 1 and get a slightly larger value, 2.45. Either way, the result is close to the typical distance between the scores and the mean, so it makes sense for this distribution. With a larger sample, the difference between dividing by n and dividing by n - 1 becomes smaller.
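As a numerical check on the figures in this paragraph, the following Python sketch compares the average absolute deviation with the sample (n - 1) and population (n) standard deviations for the quiz scores. It uses the standard library's `statistics.stdev` (sample formula) and `statistics.pstdev` (population formula); the variable names are illustrative only.

```python
import statistics

scores = [21, 22, 23, 24, 25, 26, 27, 28]
mean = statistics.mean(scores)

# Average of the deviations with their signs ignored
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)

# Sample standard deviation: divides the sum of squares by n - 1
sample_sd = statistics.stdev(scores)

# Population standard deviation: divides the sum of squares by n
population_sd = statistics.pstdev(scores)

print(mean_abs_dev)             # 2.0
print(round(sample_sd, 2))      # 2.45
print(round(population_sd, 2))  # 2.29
```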

(3) Take a step back and think about these measures for a minute:

How many scores in the distribution is the range based on?
How many scores in the distribution is the standard deviation based on?
From your answers here, which measure should give you more information about the entire distribution?

Why the standard deviation is better than the range

Let’s compare the standard deviation and the range for a second. The range is based on just 2 numbers: the lowest and the highest. The standard deviation is based on every number in the distribution. Every number matters. Thus it is easy to see why statisticians prefer the standard deviation over the range as an accurate and informative measure of variability.
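To see this difference concretely, here is a small Python sketch with two made-up distributions (they are hypothetical and not part of the lab data). Both have the same lowest and highest scores, so the range cannot tell them apart, but the standard deviation can.

```python
import statistics

# Hypothetical data: same minimum (1) and maximum (9) in both sets
clustered = [1, 5, 5, 5, 5, 5, 5, 9]   # most scores sit at the mean
spread_out = [1, 1, 1, 1, 9, 9, 9, 9]  # every score is far from the mean

range_clustered = max(clustered) - min(clustered)
range_spread_out = max(spread_out) - min(spread_out)

sd_clustered = statistics.stdev(clustered)
sd_spread_out = statistics.stdev(spread_out)

print(range_clustered, range_spread_out)  # 8 8
print(sd_clustered < sd_spread_out)       # True
```

The range is identical because it looks only at the two extreme scores, while the standard deviation is larger for the second set because every score contributes to it.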

Using SPSS to compute measures of variability

Let’s check our calculations using SPSS to find the range and standard deviation for this distribution.

Open SPSS and, instead of opening a data file, choose the option to type in new data or just click Cancel. SPSS opens to the Variable View tab by default. In the Name column, type in the name of a new variable. It does not matter what you call it. You can use any name you want, but whenever I don’t care what something is named, I name it Fred.

Click the Data View tab at the bottom left and type in the quiz scores (21, 22, 23, 24, 25, 26, 27, 28) into a single column.

You can access these descriptive statistics in the same way that we accessed the mean, median, and mode. In the menu, go to Analyze→Descriptive Statistics→Descriptives.

This will open up a window so that you can choose the variable you want descriptive statistics on. Choose the variable and click on the arrow tab in the middle of the window to put it in the box. Then click OK. Your output will display minimum and maximum values, the mean of the distribution, and the standard deviation.

You can get the range by choosing Options and clicking the range box to check it.

Click Continue and then click OK. Your output will display minimum and maximum values, the range, the mean of the distribution, and the standard deviation.

(4) Open students.sav. Access these descriptive statistics in the same way that you accessed the mean, median, and mode. Go to Analyze, Descriptive Statistics, and Descriptives. Select quiz1-quiz5 and final. (Hint: you can do this all at once if you highlight all of them and move them as a block.)

(a) What are the mean, standard deviation, and range for quizzes 1-5?
(b) Which quiz has the largest variability based on range? Based on standard deviation?

Comparing variability for different distributions

Consider the following three sets of data:

distribution 1	distribution 2	distribution 3
1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9	3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8	1, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 9

Type the data from the 3 distributions into a new file in SPSS. You can think of the 3 distributions above as 3 different variables for SPSS, that is, you will enter data from distribution1 into the first column, data from distribution2 into the second column, and distribution3 into the 3rd column. Once you've entered this data into SPSS, complete the following:

For each distribution, use SPSS to construct a histogram and to compute the range and standard deviation.

(5) Just looking at the numbers, for which distribution is variability the lowest? Why did you come to that conclusion?

(6) For each distribution, use SPSS to construct a histogram. Include the histogram in your SPSS output file that you attach to the assignment.
(7) For each distribution use SPSS to compute the range and standard deviation. Type the answers into your worksheet.
(8) Which measure of variability is most affected by extreme values (hint: compare the 2nd and 3rd distributions)?

Properties of the standard deviation

Let’s look at some properties of the standard deviation using SPSS.

Enter the following set of data into an SPSS data file and calculate the standard deviation:

2, 2, 8, 14, 14

Transform the distribution by adding 3 points to each score (adding a constant to every score in the distribution). (Hint: you can use the Compute function in the Transform menu to do this).

Calculate the standard deviation for the transformed variable.

(9) Comparing the standard deviations of the original and transformed variables, what happened?

Now transform the original distribution by multiplying every score by 2 (again, use the compute function). Calculate the new standard deviation.

(10) Comparing the standard deviations of the original and transformed variables, what happened?

Here is what I hope you learned by doing this:

1) Adding a constant to each score in the distribution will not change the standard deviation.

So if you add 2 to every score in the distribution, the mean changes by 2, but the standard deviation stays the same.

2) Multiplying each score by a constant causes the standard deviation to be multiplied by the same constant.

This one is easier to think of with numbers. Suppose that your mean is 20, and that two of the individuals in your distribution are 21 and 23.

If you multiply 21 and 23 by 2 you get 42 and 46, and your mean also changes by a factor of 2 and is now 40.

Before, your deviations were (21−20 = 1) & (23−20 = 3).

But now, your deviations are (42−40 = 2) & (46−40 = 6).

So your deviations are getting twice as big as well.
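Both properties can also be checked outside SPSS. The sketch below uses Python's `statistics.stdev` (sample formula) on the data set from this section; the variable names are chosen for this example only.

```python
import statistics

original = [2, 2, 8, 14, 14]

# Transformation 1: add a constant (3) to every score
plus_three = [x + 3 for x in original]

# Transformation 2: multiply every score by a constant (2)
times_two = [x * 2 for x in original]

sd_original = statistics.stdev(original)
sd_plus_three = statistics.stdev(plus_three)
sd_times_two = statistics.stdev(times_two)

print(sd_original)    # 6.0
print(sd_plus_three)  # 6.0  (unchanged by adding a constant)
print(sd_times_two)   # 12.0 (doubled by multiplying by 2)
```

Adding 3 shifts every score and the mean by the same amount, so the deviations do not change. Multiplying by 2 doubles every deviation, which doubles the standard deviation.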

Now attach your Word worksheet and your SPSS output file to the Lab 9 assignment (or copy and paste the required parts of the output into your Word document and attach that).