So far we've discussed two of the three characteristics used to describe distributions; now we need to discuss the remaining one: variability. Notice in our distributions that not every score is the same, e.g., not everybody gets the same score on the exam. So we need a way to describe these varied results, roughly, to describe the width of the distribution.
In other words, variability refers to the degree of "differentness" of the scores in the distribution. High variability means that the scores differ by a lot, while low variability means that the scores are all similar (homogeneous).
The simplest measure of variability is the range, which we've already mentioned in our earlier discussions.
So look at your frequency distribution table, find the highest and lowest scores, and subtract the lowest from the highest (note: if the variable is continuous, you must consider the real limits).
X    f    cf    c%
10   2    25    100
 9   8    23     92
 8   4    15     60
 7   6    11     44
 6   4     5     20
 5   1     1      4
if X is discrete then: range = highest X - lowest X = 10 - 5 = 5
if X is continuous then: range = upper real limit of the highest X - lower real limit of the lowest X = 10.5 - 4.5 = 6
- there are some drawbacks to using the range as the description of the variability of a distribution: it is based on only the two most extreme scores, so it ignores everything in between and is completely determined by outliers
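The range calculation above can be sketched in a few lines (a minimal sketch, not from the notes; the `scores` list is just the X column of the table above):

```python
# Scores from the frequency table's X column.
scores = [10, 9, 8, 7, 6, 5]

# Discrete case: range = highest score - lowest score.
discrete_range = max(scores) - min(scores)

# Continuous case: use the real limits, which extend 0.5 below the
# lowest score and 0.5 above the highest (for unit-width intervals).
continuous_range = (max(scores) + 0.5) - (min(scores) - 0.5)

print(discrete_range, continuous_range)  # 5 6.0
```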
So think back to percentiles. The 50th percentile is the point at which exactly half of the distribution lies on one side and the other half on the other side. Similarly, the 25th percentile (the first quartile, Q1) and the 75th percentile (the third quartile, Q3) mark off the middle half of the distribution, and the interquartile range is Q3 - Q1.
X    f    %      c%
7    4    12.5   100
6    4    12.5    87.5
5    4    12.5    75
4    8    25      62.5
3    4    12.5    37.5
2    4    12.5    25
1    4    12.5    12.5
So for the above distribution (assume that it is a continuous variable): Q1 (the 25th percentile) = 2.5, Q3 (the 75th percentile) = 5.5, so the interquartile range = Q3 - Q1 = 5.5 - 2.5 = 3.0.
Note that the interquartile range is often transformed into the semi-interquartile range, which is half of the interquartile range:

SIQR = (Q3 - Q1) / 2

So for our example the semi-interquartile range is (3.0)(0.5) = 1.5.
So the interquartile range focuses on the middle half of all of the scores in the distribution. Thus it is more representative of the distribution as a whole compared to the range, and extreme scores (i.e., outliers) will not influence the measure (it is sometimes referred to as being robust). However, this still means that half of the scores in the distribution are not represented in the measure.
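The quartile computation above can be sketched in code (a sketch, not from the notes; `percentile_point` is a hypothetical helper that interpolates within the real limits, assuming unit-width score intervals):

```python
def percentile_point(freqs, p):
    """freqs: list of (score, frequency) pairs sorted ascending; p in [0, 100].
    Returns the point below which p% of the scores fall, interpolating
    within the real limits of each unit-width interval."""
    n = sum(f for _, f in freqs)
    target = p / 100 * n          # number of scores below the point
    cum = 0
    for score, f in freqs:
        lower = score - 0.5       # lower real limit of this interval
        if cum + f >= target:
            # interpolate within this interval (width 1)
            return lower + (target - cum) / f
        cum += f
    return freqs[-1][0] + 0.5

# Frequencies from the table above.
freqs = [(1, 4), (2, 4), (3, 4), (4, 8), (5, 4), (6, 4), (7, 4)]
q1 = percentile_point(freqs, 25)   # 2.5
q3 = percentile_point(freqs, 75)   # 5.5
iqr = q3 - q1                      # 3.0
siqr = iqr / 2                     # 1.5
print(q1, q3, iqr, siqr)
```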
The standard deviation is the most popular and most important measure of variability. It takes into account all of the individuals in the distribution.
In essence, the standard deviation measures how far off all of the individuals in the distribution are from a standard, where that standard is the mean of the distribution.
So to get a measure of the deviation we need to subtract the population mean from every individual in our distribution.
Example: consider the following data set: the population of heights (in inches) for the class
69, 67, 72, 74, 63, 67, 64, 61, 69, 65, 70, 60, 75, 73, 63, 63, 69, 65, 64, 69, 65
mean = μ = 1407 / 21 = 67
Σ(X - μ) = (69 - 67) + (67 - 67) + .... + (65 - 67)
= 2 + 0 + 5 + 7 + (-4) + 0 + (-3) + (-6) + 2 + (-2) + 3 + (-7) + 8 + 6 + (-4) + (-4) + 2 + (-2) + (-3) + 2 + (-2)
= 0
Notice that if you add up all of the deviations, they must equal 0. Think about it at a conceptual level: what you are doing is taking one side of the distribution and making it positive, and the other side negative, and adding them together. They cancel each other out.
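This property is easy to verify (a quick sketch, not part of the notes, using the height data above):

```python
# Heights (in inches) from the example population.
heights = [69, 67, 72, 74, 63, 67, 64, 61, 69, 65, 70,
           60, 75, 73, 63, 63, 69, 65, 64, 69, 65]
mu = sum(heights) / len(heights)        # 1407 / 21 = 67.0

# Deviations from the mean always sum to zero.
deviations = [x - mu for x in heights]
print(sum(deviations))                  # 0.0
```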
So what we have to do is get rid of the negative signs. We do this by squaring each deviation and summing the squared deviations (later, after averaging, we will take a square root to get back to the original units).
Sum of Squares = SS = Σ(X - μ)² = (69 - 67)² + (67 - 67)² + .... + (65 - 67)²

SS = 4 + 0 + 25 + 49 + 16 + 0 + 9 + 36 + 4 + 4 + 9 + 49 + 64 + 36 + 16 + 16 + 4 + 4 + 9 + 4 + 4

SS = 362
The equation that we just used (SS = Σ(X - μ)²) is referred to as the definitional formula for the sum of squares. However, there is another way to compute the SS, referred to as the computational formula. The two equations are mathematically equivalent; however, sometimes one is easier to use than the other. The advantage of the computational formula is that it works with the X values directly.
The computational formula for SS is:
SS = ΣX² - (ΣX)² / N
So for our example:
SS = [(69)² + (67)² + ..... + (69)² + (65)²] - (69 + 67 + ... + 69 + 65)² / 21
= 94631 - (1407)² / 21
= 94631 - 94269
= 362
Now we have the sum of squares (SS), but what we want is the population variance, which is simply the average of the squared deviations. (We want the variance rather than just the SS because the SS depends on the number of individuals in the population, so we take the mean.) To get this mean, we divide by the number of individuals in the population:

population variance = σ² = SS / N = 362 / 21 ≈ 17.24

Finally, the population standard deviation is the square root of the variance:

σ = √(σ²) = √17.24 ≈ 4.15
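Both SS formulas, the variance, and the standard deviation can be checked in a few lines (a sketch, assuming the height data above):

```python
# Heights (in inches) from the example population.
heights = [69, 67, 72, 74, 63, 67, 64, 61, 69, 65, 70,
           60, 75, 73, 63, 63, 69, 65, 64, 69, 65]
N = len(heights)                       # 21
mu = sum(heights) / N                  # 67.0

# Definitional formula: SS = sum of squared deviations from the mean.
ss_def = sum((x - mu) ** 2 for x in heights)                      # 362.0

# Computational formula: SS = sum(X^2) - (sum(X))^2 / N.
ss_comp = sum(x ** 2 for x in heights) - sum(heights) ** 2 / N    # 362.0

variance = ss_def / N                  # ~17.24
sigma = variance ** 0.5                # ~4.15
print(ss_def, ss_comp, variance, sigma)
```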
To review: for a population, variance = σ² = SS / N and standard deviation = σ = √(SS / N). For samples, the computation changes slightly:
- we need to adjust the computation to take into account that a sample will typically be less variable than the corresponding population
- if you have a good, representative sample, then your sample and population means should be very similar, and the overall shape of the two distributions should be similar. However, notice that the variability of the sample is smaller than the variability of the population.
- to account for this the sample variance is divided by n - 1 rather than just n
sample variance = s² = SS / (n - 1)
- and the same is true for the sample standard deviation: s = √(SS / (n - 1))
So what we're doing when we subtract 1 from n is using degrees of freedom to adjust our sample deviations to make an unbiased estimate of the population values.
What are degrees of freedom? Think of it this way. You know what the sample mean is ahead of time (you need it to figure out the deviations). So you can vary all but one item in the distribution, but the last item is fixed: there is only one value it can take to make the mean equal what it does. So n - 1 means all the values but one are free to vary.
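A small simulation illustrates the bias (a sketch, not from the notes; it treats the height data as the population and resamples from it with replacement):

```python
import random

random.seed(0)

# Treat the height data as the population.
population = [69, 67, 72, 74, 63, 67, 64, 61, 69, 65, 70,
              60, 75, 73, 63, 63, 69, 65, 64, 69, 65]
mu = sum(population) / len(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)  # ~17.24

n, trials = 5, 20000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    sample = [random.choice(population) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased += ss / n              # dividing by n underestimates
    unbiased += ss / (n - 1)      # dividing by n - 1 corrects for this

# The SS/n average falls below pop_var; the SS/(n-1) average lands near it.
print(pop_var, biased / trials, unbiased / trials)
```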
Example:
Okay, so let's do an example of computing the standard deviation of a sample. Consider the sample: 1, 2, 3, 4, 4, 5, 6, 7 (n = 8).
step 1: compute the SS
Using the definitional formula, with sample mean M = 32 / 8 = 4:
SS = Σ(X - M)² = (-3)² + (-2)² + (-1)² + 0² + 0² + 1² + 2² + 3² = 9 + 4 + 1 + 0 + 0 + 1 + 4 + 9 = 28.0
-- OR --
You can still use the computational formula to get SS
SS = ΣX² - (ΣX)² / n
= (1 + 4 + 9 + 16 + 16 + 25 + 36 + 49) - (1 + 2 + 3 + 4 + 4 + 5 + 6 + 7)² / 8
= 156 - (32)² / 8
= 156 - 128
= 28.0
step 2: determine the variance of the sample (remember it is a sample, so we need to take this into account)
sample variance = s² = SS / (n - 1) = 28 / (8 - 1) = 28 / 7 = 4.0
step 3: determine the standard deviation of the sample
s = √4.0 = 2.0
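The three steps above can be checked with a short sketch (the sample values are read off the computational-formula line):

```python
sample = [1, 2, 3, 4, 4, 5, 6, 7]
n = len(sample)                          # 8
m = sum(sample) / n                      # 4.0

# Step 1: sum of squares.
ss = sum((x - m) ** 2 for x in sample)   # 28.0

# Step 2: sample variance (divide by n - 1, since this is a sample).
s_squared = ss / (n - 1)                 # 4.0

# Step 3: sample standard deviation.
s = s_squared ** 0.5                     # 2.0
print(ss, s_squared, s)
```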
Properties of the standard deviation (Transformations)
Comparing Measures of Variability