Psychology 340: Describing Distributions I

Psychology 340 Syllabus
Statistics for the Social Sciences

Illinois State University
J. Cooper Cutting
Fall 2002

Describing Distributions I

What are distributions

What are the properties of distributions

shape
center
spread

categorical vs. quantitative variables

Exploring distributions with tables and graphs

frequency distribution tables
bar charts
pie charts
histograms
stem and leaf plots

measures of center

What is a distribution?

Consider the final round scores in the 2002 NEC World Golf Championship

65 69 68 68 68 67 70 71 71 72 69
69 71 72 70 70 71 74 67 67 68 69
72 71 72 72 72 70 70 71 71 71 72
73 65 71 74 72 74 70 75 72 72 75
71 70 73 72 70 70 78 74 74 71 73
71 68 74 73 70 69 68 77 72 70 70
74 73 70 69 78 74 73 69 84 75 73

These are all of the final round scores of the 77 golfers who participated. In other words, this is the distribution of final round scores.

It is difficult to get a sense of the overall distribution by just looking at the raw scores. Instead, we use several descriptive statistical methods to summarize, simplify, and describe the distribution.

Three characteristics of distributions

Three characteristics are used to describe a distribution completely: shape, central tendency, and variability. We'll be talking about central tendency (roughly, the center of the distribution) and variability (how spread out the distribution is) in future chapters.

Shape

Skewness

kurtosis

In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is an exact mirror image of the other.


In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end.

The section where the scores taper off towards one end of a distribution is called the tail of the distribution.

Negatively skewed: the tail points to the left, toward the low end of the scale.

Positively skewed: the tail points to the right, toward the high end of the scale.

Kurtosis is a relative measure of the body and tail portions of the distribution.

Distributions that are "flat" are platykurtic.

Distributions that are "peaked" are leptokurtic.
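To make skewness and kurtosis a little more concrete, here is a minimal sketch that computes both for the golf scores above. The notes do not give formulas, so this assumes one common moment-based definition (population form), and the function names are made up for illustration. A positive skewness value means the tail points toward the high scores; a positive "excess" kurtosis is usually read as leptokurtic and a negative one as platykurtic.

```python
# Final round scores from the notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]


def central_moment(data, k):
    """Average of (x - mean)**k over the data (population form)."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    # Third central moment scaled by the cube of the standard deviation.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    # Fourth central moment scaled by the squared variance, minus 3
    # so that a normal distribution comes out at 0.
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


print("skewness:", round(skewness(scores), 2))
print("excess kurtosis:", round(excess_kurtosis(scores), 2))
```

For the golf scores both values come out positive: the single score of 84 stretches the tail toward the high end.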

In addition to the shapes mentioned above, one should also look for whether a distribution is uni-modal or multi-modal.

If there are two (or more) clear peaks, then the distribution is bi-modal (or multi-modal if more than two).

Measures of Center

Central tendency is a statistical measure that identifies a single score as representative of an entire distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.

We will focus on three measures of central tendency: the mean, the median, and the mode. All are measures of central tendency, but for some distributions, some are more meaningful or appropriate than the others.

Measures of Variability

Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together.

In other words, variability refers to the degree of "differentness" of the scores in the distribution. High variability means that the scores differ by a lot, while low variability means that the scores are all similar ("homogeneousness").

We'll concentrate on three measures of variability: the range, the interquartile range, and the standard deviation.
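As a preview, here is a rough sketch of those three measures for the golf scores, using Python's standard `statistics` module. Two choices here are assumptions, not something these notes specify: the quartiles use the module's default ("exclusive") method, and the standard deviation is the population version (`pstdev`). Other textbooks and programs may give slightly different quartiles.

```python
import statistics

# Final round scores from the notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Interquartile range: distance between the 25th and 75th percentiles.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Standard deviation (population form, an assumption for this sketch).
sd = statistics.pstdev(scores)

print("range:", score_range)
print("IQR:", iqr)
print("SD:", round(sd, 2))
```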

Graphic and Tabular organizational methods

1) A frequency distribution table is an organized tabulation of the number of individuals located in each category on the scale of measurement.

Notice that if you add up the frequency column, you get the total number of observations:
Σ f = N

_____________________________
 X	f	%	c%   
84	1	1.3	100
83	0	0	98.7	
82	0	0	98.7
81	0	0	98.7
80	0	0	98.7
79	0	0	98.7
78	2	2.6	98.7
77	1	1.3	96.1
76	0	0	94.8
75	3	3.9	94.8
74	8	10.4	90.9
73	7	9.1	80.5
72	12	15.6	71.4
71	12	15.6	55.8
70	13	16.9	40.3
69	7	9.1	23.4
68	6	7.8	14.3
67	3	3.9	6.5
66	0	0	2.6
65	2	2.6	2.6
______________________________
	77	100
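The table above can be rebuilt from the raw scores. The following is a minimal sketch, not the method used to produce the original table: it counts each score, then accumulates from the lowest score upward so that c% is the percentage of golfers at or below each score.

```python
from collections import Counter

# Final round scores from the notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]

counts = Counter(scores)
n = len(scores)

rows = []
cumulative = 0
# Walk from the lowest score to the highest so the cumulative count builds up.
for x in range(min(scores), max(scores) + 1):
    f = counts.get(x, 0)          # scores nobody got have frequency 0
    cumulative += f
    rows.append((x, f, 100 * f / n, 100 * cumulative / n))

# Print with the highest score first, like the table in the notes.
print(" X   f      %     c%")
for x, f, pct, cum_pct in reversed(rows):
    print(f"{x:2d}  {f:2d}  {pct:5.1f}  {cum_pct:5.1f}")
print(f"    {n:2d}  {100.0:5.1f}")
```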

If you wanted to know what the total of all of the X's was, how would you do it? The easiest way would be to multiply the X and f columns row by row and then add (sum) the results:
Σ (Xf)
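Here is a small sketch of that calculation, working only from the X and f columns of the table (rows with f = 0 are left out because they add nothing to the sum). The variable names are just for illustration.

```python
# X -> f, copied from the frequency distribution table (zero rows omitted).
freq_table = {
    84: 1, 78: 2, 77: 1, 75: 3, 74: 8, 73: 7, 72: 12,
    71: 12, 70: 13, 69: 7, 68: 6, 67: 3, 65: 2,
}

n = sum(freq_table.values())                          # sum of f = N
total_x = sum(x * f for x, f in freq_table.items())   # sum of (X * f)

print("N =", n)                # 77
print("sum of Xf =", total_x)  # 5493
print("mean =", round(total_x / n, 2))
```

Dividing the total by N gives the mean, which comes up again in the section on measuring the center.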

Percentages. What percent of the group got this value for X? To get this, divide the frequency by the total number of observations and multiply by 100:
% = (f / N) × 100

For a histogram, vertical bars are drawn above each score so that 1) the height of the bar corresponds to the frequency, and 2) the width of the bar extends to the real limits of the score. A histogram is used when the data are measured on an interval or a ratio scale.

For a bar graph, a vertical bar is drawn above each score (or category) so that 1) the height of the bar corresponds to the frequency, and 2) there is a space separating each bar from the next. A bar graph is used when the data are measured on a nominal or an ordinal scale.

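Stem and leaf displays - These displays break each number down into a left part called the stem and a right part called the leaf. If the numbers are two digits, then the left digit is the stem and the right digit is the leaf. The advantage is that you get a picture of the distribution and can still recover all of the individual data points. In the display below, each stem is split into two rows: leaves 5 through 9 on the upper row and leaves 0 through 4 on the lower row.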

 8  |  
 8  |  4
 7  |  555788
 7  |  0000000000000111111111111222222222222333333344444444
 6  |  557778888889999999
 6  |
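The display above can be generated from the raw scores. This is a minimal sketch that assumes two-digit whole-number scores and splits each stem into a high row (leaves 5-9) and a low row (leaves 0-4); the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# Final round scores from the notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]


def stem_and_leaf(data):
    """Return the display as a list of text lines, highest stem first."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)        # 74 -> stem 7, leaf 4
        half = 1 if leaf >= 5 else 0      # 1 = high row (5-9), 0 = low row (0-4)
        leaves[(stem, half)].append(str(leaf))

    top_stem = max(stem for stem, _ in leaves)
    bottom_stem = min(stem for stem, _ in leaves)

    lines = []
    for stem in range(top_stem, bottom_stem - 1, -1):
        for half in (1, 0):               # high row first, then low row
            row = "".join(leaves.get((stem, half), []))
            lines.append(f"{stem:2d} | {row}")
    return lines


for line in stem_and_leaf(scores):
    print(line)
```

Because every leaf is kept, you can read the original scores back off the display: stem 7 with leaf 8 is a score of 78.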

Measuring the center of a distribution

There are a number of different measures of center. Which is appropriate largely depends on the kind of variable and the shape of the distribution. So consider these three distributions:

Where is the single value that is most representative of the entire distribution? For the first, it is 5. For the second, is it 7 or 5? (This one is negatively skewed.) For the third, is it 5? Nobody is at 5. This one is bimodal, so it may be most appropriate to talk about having two middles. More on this in a bit.

The most commonly known measure of central tendency is the arithmetic average, or the mean. We've already talked about how you would go about figuring this out from the data in a frequency distribution table.

The mean for a distribution is the sum of the scores divided by the number of scores.

The formula for the mean is:
mean = Σ X / N, that is, the sum of all scores (X's) divided by the total number of scores (N)

We can think of the mean in a couple of different ways.

Weighted means

The weighted mean of two (or more) groups is found by adding the groups' sums of scores and dividing by the sum of the sample sizes.

e.g.,

weighted mean = (Σ X₁ + Σ X₂) / (n₁ + n₂)

So suppose that I were to decide to make up my grading scale collapsing over all of my sections of stats. If I know that one section (n = 20) had a mean of 5 and the other (n = 30) had a mean of 6, how would I figure out the weighted mean?

[(20)(5) + (30)(6)] / (20 + 30) = (100 + 180) / 50 = 280 / 50 = 5.6
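The same calculation as a short sketch. Multiplying each group's n by its mean recovers that group's sum of scores, so the function only needs (n, mean) pairs. The name `weighted_mean` is made up for this example.

```python
def weighted_mean(groups):
    """groups is a list of (n, mean) pairs, one pair per group."""
    total_sum = sum(n * mean for n, mean in groups)  # n * mean = that group's sum of X
    total_n = sum(n for n, _ in groups)
    return total_sum / total_n


sections = [(20, 5), (30, 6)]       # the two stats sections from the example
print(weighted_mean(sections))      # 5.6

# Compare with simply averaging the two means, which ignores section size.
print((5 + 6) / 2)                  # 5.5
```

The weighted mean (5.6) sits closer to 6 than the simple average (5.5) does because the larger section counts for more.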

Effects of linear transformations on the mean

2) If you add (or subtract) a constant to each score, then the mean changes by that same constant. Suppose that you want to factor out the fact that each girl spent $2 buying supplies for the bake sale, so you subtract 2 from each amount. Now the total is $180, so the mean is 180/10 = $18. But notice that you could have just subtracted $2 from the previous mean of $20 and arrived at the same answer.

3) If you multiply (or divide) each score by a constant, then the mean is multiplied (or divided) by that constant. Suppose that the troop sponsor agreed to match the money made by each girl scout. That is, they agree to give each girl scout an additional amount of money equal to however much she makes on the sale. So now the total is $400, and the mean for each girl is 400/10 = $40, which is just the previous mean of $20 multiplied by 2.
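You can check both rules with a few lines of code. This sketch assumes the bake sale amounts are the ten girl scout values listed in the median example in these notes (they add up to $200, which matches the mean of $20 used above).

```python
from statistics import mean

# Assumed bake sale amounts for the ten girl scouts (total = $200).
sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]

original_mean = mean(sales)                      # 200 / 10 = 20

# Rule 2: subtract a constant ($2 for supplies) from every score.
after_supplies = [x - 2 for x in sales]
print(mean(after_supplies), original_mean - 2)   # both 18

# Rule 3: multiply every score by a constant (the sponsor doubles each amount).
after_match = [x * 2 for x in sales]
print(mean(after_match), original_mean * 2)      # both 40
```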

The median is the score that divides a distribution exactly in half. Exactly 50% of the individuals in a distribution have scores at or below the median. The median is equivalent to the 50th percentile.

So how do we find the median? Let's start by assuming that we have discrete categories.

3, 4, 4, 5, 5, 5, 6, 6, 7

2) With an even number of scores, just list them in order from lowest to highest. Then find the middle two scores and determine the point exactly midway between them. To do this add them together and divide by two.
So what is the median for our girl scouts?

$8, 10, 12, 15, 15, 18, 18, 19, 25, 60

The middle two scores are 15 and 18, so the median is (15 + 18) / 2 = 33 / 2 = 16.5
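Here is a minimal sketch of the procedure for both cases, an odd and an even number of scores. It only covers this simple "list them in order and find the middle" method, and the function name is made up for illustration.

```python
def median(scores):
    """Middle score of an ordered list; midpoint of the middle two if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: average the middle two


print(median([3, 4, 4, 5, 5, 5, 6, 6, 7]))               # 5
print(median([8, 10, 12, 15, 15, 18, 18, 19, 25, 60]))   # 16.5
```

Notice that the girl scout median (16.5) is below the mean ($20): the one very large amount ($60) pulls the mean up but does not move the median.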

The final measure of central tendency that we'll consider is the mode.

In a frequency distribution, the mode is the score or category that has the greatest frequency.

so the mode is 5

However, be aware that a frequency distribution may have more than one mode.

so the modes are 2 and 8
If one peak were bigger than the other, it would be called the major mode and the other would be the minor mode.
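Finding modes by computer is just a matter of counting. This sketch uses `statistics.multimode` from Python's standard library (Python 3.8 or later), which returns every score tied for the greatest frequency. The second list is invented for illustration so that it has two modes, 2 and 8; it is not data from this class.

```python
from statistics import multimode

# One clear peak: the nine scores used in the median example.
one_peak = [3, 4, 4, 5, 5, 5, 6, 6, 7]
print(multimode(one_peak))    # [5]

# Hypothetical scores with two equally tall peaks.
two_peaks = [1, 2, 2, 2, 3, 5, 7, 8, 8, 8, 9]
print(multimode(two_peaks))   # [2, 8]
```

Note that `multimode` only reports exact ties, so it would not pick up a smaller minor mode.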

So how do you know which measure of central tendency to use?

- the answer depends on a number of factors.

- You cannot find a mean or median for a nominal scale; however, you can find a mode for a nominal scale.

- Use the median if:

2) there are undetermined values - if for some reason you don't know the value of one (or more) of your items (e.g., the person died before answering your question)

3) your distributions are 'open-ended' - by this we mean that there is no upper or lower limit on the possible values of your variable (e.g., the top answer on your questionnaire is '5 or more')

4) If your data are on an ordinal scale (rankings), then use the median.

How do the shapes of distributions relate to our measures of central tendency?

symmetric distribution: mean = median = mode
positively skewed distribution: mode < median < mean
negatively skewed distribution: mean < median < mode
bimodal distribution: mean = median, 2 modes
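As a quick check of these patterns, here is a sketch that computes all three measures for the golf scores using Python's standard `statistics` module. The patterns above are rules of thumb, so real data will not always follow them exactly, but the golf scores do.

```python
from statistics import mean, median, mode

# Final round scores from the notes (77 golfers).
scores = [
    65, 69, 68, 68, 68, 67, 70, 71, 71, 72, 69,
    69, 71, 72, 70, 70, 71, 74, 67, 67, 68, 69,
    72, 71, 72, 72, 72, 70, 70, 71, 71, 71, 72,
    73, 65, 71, 74, 72, 74, 70, 75, 72, 72, 75,
    71, 70, 73, 72, 70, 70, 78, 74, 74, 71, 73,
    71, 68, 74, 73, 70, 69, 68, 77, 72, 70, 70,
    74, 73, 70, 69, 78, 74, 73, 69, 84, 75, 73,
]

golf_mode = mode(scores)       # most frequent score
golf_median = median(scores)   # middle (39th) score of the 77
golf_mean = mean(scores)       # sum of X divided by N

print("mode:", golf_mode)
print("median:", golf_median)
print("mean:", round(golf_mean, 2))
```

The result is mode (70) < median (71) < mean (about 71.34), the pattern listed for a positively skewed distribution. That fits the data: a few high scores, especially the 84, stretch the tail to the right.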

We will discuss the third characteristic, variability (or spread), next time.

If you have any questions, please feel free to contact me at jccutti@mail.ilstu.edu.
