
Outline

  • Consider what measures of variability
    tell us about a distribution
  • Compute measures of variability using SPSS
  • Compare distributions with different
    amounts of variability

Lab 10

Describing distributions: Variability

Download Lab 10 Worksheet | Tutorial 10

 

Considering measures of variability

The variability of a distribution tells us important information about how different the scores are from one another. Variability helps us understand the shape of the distribution, understand the difference between the mean and median of the distribution, and make decisions about what the data distribution means with regard to our research question.

We'll concentrate on two measures of variability, the range and the standard deviation. Consider a very simple distribution (scores on a 30 point quiz), with only eight data points.

 21, 22, 23, 24, 25, 26, 27, 28

    The simplest measure of variability is the range.
      The range is the difference between the largest (maximum) X value and the smallest (minimum) X value.

      (1) Look at the data set above and compute the range of the distribution. What is the range of quiz scores?
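If you'd like to check your hand calculation outside of SPSS, here is a minimal Python sketch of the range. Python is not part of this lab, and the names `value_range` and `quiz_range` are made up for illustration.

```python
# Quiz scores from the example above (30-point quiz, eight data points).
scores = [21, 22, 23, 24, 25, 26, 27, 28]


def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


quiz_range = value_range(scores)
print("Range:", quiz_range)
```

Notice that the function only ever looks at two of the scores, the maximum and the minimum.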
       
       

    We can also calculate the standard deviation of the distribution. It measures how far off all of the individuals in the distribution are from a standard, where that standard is the mean of the distribution. In other words, it is the "average distance" that each point is from the mean.
      STEP 1: Our first step is to compute the mean of the distribution.

      (2) Compute the mean for this quiz score distribution.

      STEP 2: The next step is to figure out how far away each of the data points in the distribution is from this standard (the deviations).

      (3a) Calculate the deviation of each score from the mean by subtracting the mean you calculated above from each score in the distribution (i.e., score - mean = deviation). List the deviation scores below. You should have one deviation score for each of the 8 quiz scores listed above.

      Notice that if you add up all of the deviations, they must equal 0. Think about it at a conceptual level. Scores above the mean give positive deviations and scores below the mean give negative deviations. Because the mean is the balance point of the distribution, the two sides cancel each other out when you add them together.
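The cancelling of the deviations can be checked with a few lines of Python. This is only a sketch for checking your work, not part of the SPSS steps, and the variable names are arbitrary.

```python
# Quiz scores from the example above.
scores = [21, 22, 23, 24, 25, 26, 27, 28]

# STEP 1: the mean of the distribution.
mean = sum(scores) / len(scores)

# STEP 2: deviation = score - mean, one deviation per score.
deviations = [x - mean for x in scores]

print("Mean:", mean)
print("Deviations:", deviations)
print("Sum of deviations:", sum(deviations))
```

Try changing one of the scores and running it again. The mean shifts, and the deviations still add up to 0 (apart from tiny rounding error when the mean is not a tidy number).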

      STEP 3: The next step is to find the "average" of these differences. But notice that we have a problem. If they always add up to 0, then the average difference will always be zero. So what we have to do is get rid of the negative signs. We do this by squaring the deviations (and later we'll reverse this by taking the square root of the average of the squared deviations). So for this step we square each deviation and then add the squares together. We'll refer to this value as the Sum of the Squared Deviations (Sum of Squares for short).

      (3b) Square each of the deviation scores and then add up the values you get. This will be the sum of squared deviations. What is the sum of squares for this distribution?

      STEP 4: Now we have the sum of squares (SS), but remember that we're looking for the average of the squared deviations. So to get the mean, we need to divide by the number of individuals in the sample minus 1 (n - 1). The reason that we need to subtract 1 is related to the fact that our deviations always add up to 0. Because we know the mean of our sample in advance, this constrains one of the data points. We will discuss this issue (called degrees of freedom) in more detail later in the course. If we are computing the standard deviation for a population rather than a sample, then we figure out the average deviation by dividing by n alone. Some calculators and statistical packages will give you the option of calculating the standard deviation for either a sample or a population.
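To see the difference between dividing by n - 1 and dividing by n, here is a short Python sketch. It assumes Python's standard `statistics` module, whose `variance` function divides by n - 1 (sample) and whose `pvariance` function divides by n (population). The names `sample_avg` and `population_avg` are made up for this example.

```python
import statistics

# Quiz scores from the example above.
scores = [21, 22, 23, 24, 25, 26, 27, 28]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations (SS)

sample_avg = ss / (n - 1)  # sample: divide by n - 1
population_avg = ss / n    # population: divide by n

# The library versions should agree with the hand calculation.
print("Sample:", sample_avg, statistics.variance(scores))
print("Population:", population_avg, statistics.pvariance(scores))
```

The sample version is always a little larger than the population version, because the same sum of squares is divided by a smaller number.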

      (3c) Divide your sum of squares value by n - 1 (i.e., 8 - 1 = 7) to find the average sum of squares. List it below.

      STEP 5: The last step is to reverse that squaring we did to get rid of the signs of the deviations. We need to take the square root to get our standard deviation.

      (3d) Take the square root of the average sum of squares value. This will give you the standard deviation (i.e., the average difference from the mean for the scores in the distribution). The value you get should be about 2.45 (with some room for rounding error). If you didn't get something close to this value, go back through the steps again and make sure you did each step correctly.

      You should always ask yourself, "Does this answer make sense?" Look back at the deviations. These represent the actual difference between each score and the mean. If you ignored the sign of each and just took the average, you'd get 2. The standard deviation will not match this value exactly, because squaring gives the larger deviations more weight, but it should be in the same neighborhood. If this were a population, dividing by n would give about 2.29. Since this is a sample, we divide by n - 1 and get a slightly larger number, 2.45. This is roughly the typical distance between the scores and the mean, so it makes sense for this distribution. With a larger sample, the difference between dividing by n and dividing by n - 1 would shrink.

      To review:

        STEP 1: Compute the mean of the distribution
        STEP 2: Find the deviation scores by subtracting the mean from each score
        STEP 3: Square the deviation scores to get rid of the sign and add them up to get the sum of squares
        STEP 4: Average the squared deviations by dividing the sum of squares by n for a population or n - 1 for a sample
        STEP 5: Take the square root to reverse the squaring that was done earlier
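The five steps in the review can be written out as one small Python function. Treat this as a sketch for checking your hand calculation. The function name `standard_deviation` and its `sample` switch are made up for illustration and are not SPSS features.

```python
import math


def standard_deviation(values, sample=True):
    """Standard deviation, following the five review steps in order."""
    n = len(values)
    mean = sum(values) / n                    # STEP 1: mean
    deviations = [x - mean for x in values]   # STEP 2: score - mean
    ss = sum(d ** 2 for d in deviations)      # STEP 3: sum of squares
    average = ss / (n - 1 if sample else n)   # STEP 4: n - 1 (sample) or n (population)
    return math.sqrt(average)                 # STEP 5: undo the squaring


scores = [21, 22, 23, 24, 25, 26, 27, 28]
sd = standard_deviation(scores)
print(round(sd, 2))  # the lab gives about 2.45 for this sample
```

Calling `standard_deviation(scores, sample=False)` gives the population version, which divides by n instead.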
    (3f) Let's think about these measures for a minute:

      How many scores in the distribution is the range based on?
      How many scores in the distribution is the standard deviation based on?
      From your answers here, which measure should give you more information about the entire distribution?


Using SPSS to compute measures of variability

Let's check our calculations using SPSS to find the range and standard deviation for this distribution.

Open SPSS and instead of opening a data file, choose the option to type in new data. Type in the quiz scores (21, 22, 23, 24, 25, 26, 27, 28) into a single column. Follow the steps below to calculate the range and standard deviation. Type the values into your worksheet.

You can access these descriptive statistics in the same way that we accessed the mean, median, and mode. Go to "Analyze", "Descriptive Statistics", and "Descriptives".

This will open up a window so that you can choose the variable you want descriptive statistics on. Choose the variable and click on the arrow tab in the middle of the window to put it in the box. Then click "OK." Your output will display minimum and maximum values, the mean of the distribution, and the standard deviation.

You can get the range by choosing "Options" and clicking the range box to check it.

Then click "OK." Your output will display minimum and maximum values, the range, the mean of the distribution, and the standard deviation.

(4) Type the range and standard deviation values into your worksheet. Compare your calculated range and sd values with the ones given by SPSS. If they are different, explain why you think that would occur.


Comparing variability for different distributions

    Example 1

      Let's do a little experiment. In this experiment your task is to click on a square with the mouse. You'll be asked to do this a number of times (about 40). It'll only take a couple of minutes.

      Click on the button, read the instructions, and perform the experiment.

      The results are presented as two distributions of reaction times, one for large squares and one for small squares.

      (5a) Briefly describe the similarities and differences between the two distributions (e.g. shape, center, and spread).

      (5b) Based on these distributions, what effect does square size have on your clicking speed?


    Another example:

      Consider the following three sets of data:
      distribution 1:  1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9
      distribution 2:  3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8
      distribution 3:  1, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 9

      • (6) Just looking at the numbers, for which distribution is variability the lowest? Why did you come to that conclusion?

      Next, we will type the data from the 3 distributions into a new file in SPSS. You can think of the 3 distributions above as 3 different variables for SPSS - that is, you will enter data from 'distribution1' into the first column, data from 'distribution2' into the second column, and 'distribution3' into the 3rd column. You may name these 3 new variables whatever you'd like. Once you've entered this data into SPSS, complete the following...

      • (7) For each distribution, use SPSS to construct a histogram. Cut and paste the histogram into your worksheet.
      • (8) For each distribution use SPSS to compute the range and standard deviation. Type the answers into your worksheet.
      • (9) Which measure of variability is most affected by extreme values (hint: compare the 2nd and 3rd distributions)?
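If you want a second check on your SPSS output for these three distributions, the following Python sketch computes the same two measures. It assumes the standard `statistics` module, whose `stdev` function divides by n - 1 (the same sample version we computed by hand earlier in this lab). The dictionary and variable names are arbitrary.

```python
import statistics

# The three data sets listed above (20 scores each).
distributions = {
    "distribution 1": [1, 2, 3, 3, 4, 4, 4, 5, 5, 5,
                       5, 5, 6, 6, 6, 7, 7, 8, 9, 9],
    "distribution 2": [3, 3, 3, 3, 4, 4, 4, 5, 5, 5,
                       5, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "distribution 3": [1, 3, 3, 4, 4, 5, 5, 5, 5, 5,
                       5, 5, 5, 5, 6, 6, 6, 7, 7, 9],
}

summary = {}
for name, values in distributions.items():
    value_range = max(values) - min(values)  # largest minus smallest
    sd = statistics.stdev(values)            # sample SD (divides by n - 1)
    summary[name] = (value_range, sd)
    print(f"{name}: range = {value_range}, SD = {sd:.2f}")
```

When you compare the output for the 2nd and 3rd distributions, look at how much each measure changes relative to its own size.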

    One last example:

      Let's go back to the students.sav file and look at the distributions of quiz scores (quizzes 1-5) that we looked at in lab 8 to compare them based on their variability.

      Open the students.sav file and calculate the range and standard deviation for each variable. Cut and paste the output into your worksheet. Then answer the questions below:

      • (10) Which quiz appears to do a better job of discriminating student learning of the material on the quiz (that is, which quiz best differentiates between high and low performers/students)? Why do you think this?



Properties of the standard deviation

    1) Adding a constant to each score in the distribution will not change the standard deviation.

      So if you add 2 to every score in the distribution, the mean changes (by 2), but the standard deviation stays the same. None of the deviations change, because each score and the mean both go up by 2.
    2) Multiplying each score by a constant causes the standard deviation to be multiplied by the same constant.
      This one is easier to think of with numbers. Suppose that your mean is 20, and that two of the individuals in your distribution are 21 and 23. If you multiply 21 and 23 by 2 you get 42 and 46, and your mean also changes by a factor of 2 and is now 40. Before your deviations were (21 - 20 = 1) & (23 - 20 = 3). But now, your deviations are (42 - 40 = 2) & (46 - 40 = 6). So your deviations are getting twice as big as well.
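Both properties can be tried out in a short Python sketch. It reuses the eight quiz scores from the start of this lab rather than the data set for the questions further down, and it assumes the standard `statistics` module. The constant 2 and the variable names are just for illustration.

```python
import statistics

# Quiz scores from the first part of the lab.
scores = [21, 22, 23, 24, 25, 26, 27, 28]

shifted = [x + 2 for x in scores]  # property 1: add a constant to every score
scaled = [x * 2 for x in scores]   # property 2: multiply every score by a constant

sd_original = statistics.stdev(scores)
sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)

print("Means:", statistics.mean(scores), statistics.mean(shifted), statistics.mean(scaled))
print("SDs:  ", round(sd_original, 2), round(sd_shifted, 2), round(sd_scaled, 2))
```

The mean moves in both cases, but the standard deviation only changes when the scores are multiplied.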
       
       
Let's look at these properties (along with the properties of the mean) using SPSS.

Enter the following set of data into an SPSS data file.

1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9
  • (11) What are the mean and standard deviation for this distribution?
  • (12) Transform the distribution by adding 3 points to each score (adding a constant to every score in the distribution). (hint: you can use the compute function in SPSS to do this). What will your mean and standard deviation for the new distribution be? Check your answer with SPSS.
  • (13) Transform the original distribution (not the new one from question 12) by multiplying every score by 3 (again, use the compute function). What will your mean and standard deviation for the new distribution be? Check your answer with SPSS.