Although frequency distributions can be complex, it is often very useful to summarize or describe a distribution with a single numerical value. However, we need to take care to select the value that is most representative of the entire distribution, that is, of all of the individuals.
This is what we mean by central tendency.
Central tendency is a statistical measure that identifies a single score as representative of an entire distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.
We will focus on three measures of central tendency: the mean, the median, and the mode. All are measures of central tendency, but for some distributions, some are more meaningful or appropriate than others.
So consider these three distributions:
Where is the single value that is most representative of the entire distribution? For the first, it is 5. For the second, is it 7 or 5? (This one is negatively skewed.) For the third, is it 5? Nobody is at 5. This one is bimodal, so it may be most appropriate to talk about it as having two middles (more on this in a bit).
The most commonly known measure of central tendency is the arithmetic average, or the mean. (Note: in everyday speech, the term "average" can actually refer to any of the three measures of central tendency; for examples of this, see gray box 3.4, p. 90.) We've already talked about how you would go about figuring this out from the data in a frequency distribution table.
The mean for a distribution is the sum of the scores divided by the number of scores.
The formula for the population mean is:
$$\mu = \frac{\Sigma X}{N}$$
The formula for the sample mean is:
$$\bar{X} = \frac{\Sigma X}{n}$$
We can think of the mean in a couple of different ways.
2) The mean can also be thought of as the "balance point" of the distribution. If you put the observations on an imaginary see-saw (teeter-totter) with the mean at the center point, the two sides of the see-saw will balance (that is, both sides are off the ground and the see-saw is level).
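To make the balance-point idea concrete, here is a minimal sketch that borrows the ten Girl Scout sale amounts used later in these notes ($8 through $60) as the observations. "Balanced" means the distances below the mean exactly cancel the distances above it, so the deviations from the mean sum to zero. The function names `mean` and `deviations` are illustrative, not from the notes.

```python
def mean(scores):
    """Sum of the scores divided by the number of scores (sum X / N)."""
    return sum(scores) / len(scores)


def deviations(scores):
    """Signed distance of each score from the mean (X - mean)."""
    m = mean(scores)
    return [x - m for x in scores]


sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]  # Girl Scout sale amounts in dollars

print(mean(sales))             # 20.0
print(sum(deviations(sales)))  # 0.0: the negative and positive deviations cancel
```

Here the scores below $20 are 45 dollars short in total, and the scores above $20 are 45 dollars over, so the see-saw is level at 20.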
Weighted means
The weighted mean of two (or more) groups is found by adding the groups' sums and dividing by the sum of the sample sizes.
For example, with two groups:
$$\bar{X} = \frac{\Sigma X_1 + \Sigma X_2}{n_1 + n_2}$$
So suppose that I decided to make up my grading scale by combining all of my sections of stats. If I know that one section (n = 20) had a mean of 5 and the other (n = 30) had a mean of 6, how would I figure out the weighted mean?
$$\bar{X} = \frac{(20)(5) + (30)(6)}{20 + 30} = \frac{100 + 180}{50} = \frac{280}{50} = 5.6$$
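The same calculation as a short sketch, assuming each group is described only by its size and mean (as in the two-section example). Multiplying n by the mean recovers each group's ΣX; the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(groups):
    """Combine (n, mean) pairs: total of all scores divided by total number of scores."""
    total_scores = sum(n * m for n, m in groups)  # n * mean gives each group's sum X
    total_n = sum(n for n, _ in groups)
    return total_scores / total_n


sections = [(20, 5), (30, 6)]   # (n, mean) for the two stats sections
print(weighted_mean(sections))  # 5.6
```

Note that simply averaging the two means, (5 + 6) / 2 = 5.5, would give the smaller section as much influence as the larger one; the weighted mean of 5.6 sits closer to the larger section's mean of 6.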
A mean has several properties or characteristics:
2) If you add (or subtract) a constant to (or from) each score, the mean changes by that same constant. Suppose that you want to factor out the fact that each girl spent $2 buying supplies for the bake sale, so you subtract 2 from each amount. Now the total is $180, so the mean is 180/10 = $18. But notice that you could have just subtracted $2 from the previous mean of $20 and arrived at the same answer.
3) If you multiply (or divide) each score by a constant, the mean is multiplied (or divided) by that same constant. Suppose that the troop sponsor agreed to match the money made by each Girl Scout; that is, the sponsor gives each girl an additional amount equal to however much she made on the sale, which doubles each score. So now the total is $400, and the mean for each girl is 400/10 = $40, which is just the previous mean of $20 multiplied by 2.
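A quick check of properties 2 and 3, assuming the ten Girl Scout amounts listed in the median section below (they total $200, matching the $20 mean used here). The variable names are illustrative.

```python
sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]  # dollars; total = 200


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


minus_supplies = [x - 2 for x in sales]  # subtract the $2 each girl spent on supplies
matched = [x * 2 for x in sales]         # sponsor matches each girl's amount (times 2)

print(mean(sales))           # 20.0
print(mean(minus_supplies))  # 18.0, the same as 20 - 2
print(mean(matched))         # 40.0, the same as 20 * 2
```

In each case, transforming every score and then averaging gives the same result as transforming the original mean once.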
So how do we find the median? Let's start by assuming that we have discrete categories.
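1) With an odd number of scores, list them in order from lowest to highest; the median is the middle score. For example, in 3, 4, 4, 5, 5, 5, 6, 6, 7 the middle (5th) score is 5.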
2) With an even number of scores, list them in order from lowest to highest. Then find the middle two scores and determine the point exactly midway between them: add them together and divide by two.
So what is the median for our Girl Scouts?
$8, 10, 12, 15, 15, 18, 18, 19, 25, 60
The middle two scores are $15 and $18, so the median is (15 + 18)/2 = 33/2 = $16.50.
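Both rules for discrete scores fit in one small function. This is a sketch with a hypothetical name, `median`; Python's standard `statistics.median` applies the same odd/even rule.

```python
def median(scores):
    """Middle score of the ordered list; midpoint of the middle two if N is even."""
    ordered = sorted(scores)  # list the scores from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd N: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the middle two


print(median([3, 4, 4, 5, 5, 5, 6, 6, 7]))              # 5
print(median([8, 10, 12, 15, 15, 18, 18, 19, 25, 60]))  # 16.5
```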
Now let's make life a little more complex and consider continuous variables.
So suppose that the information we were using in the Girl Scout case was really based on an interval scale of continuous data; that is, the girl whom we credited with $8 really earned $7.89. Because the data are continuous, we must consider the real limits of the intervals to compute the median. When you think about it, we already know how to do this: the median is the 50th percentile.
X ($) | f | % | c%
60 | 1 | 10 | 100
25 | 1 | 10 | 90
19 | 1 | 10 | 80
18 | 2 | 20 | 70
15 | 2 | 20 | 50  <- 50% is reached here, but remember the real limits: this is really $15.50 (the upper real limit of 15)
12 | 1 | 10 | 30
10 | 1 | 10 | 20
8 | 1 | 10 | 10
Total | 10 | |

See your book for another example: gray box 3.3 on page 85 and figure 3.7b at the top of page 84.
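In the table above the 50% point lands exactly at the top of an interval, so the median can be read straight off the c% column. When it falls partway through an interval, a common approach is to interpolate linearly within that interval's real limits. The sketch below assumes whole-dollar scores whose real limits are X ± 0.5 (interval width 1); the function name `interpolated_median` and the `width` parameter are illustrative.

```python
from collections import Counter


def interpolated_median(scores, width=1.0):
    """50th percentile for continuous data, interpolating within real limits.

    Assumes each score X stands for the interval from X - width/2 to X + width/2.
    """
    half = len(scores) / 2                    # number of scores below the median
    counts = sorted(Counter(scores).items())  # (score, frequency), lowest score first
    cum_below = 0                             # cumulative frequency below this interval
    for x, f in counts:
        if cum_below + f >= half:
            lower_limit = x - width / 2
            # move up through the interval just far enough to reach the 50% point
            return lower_limit + (half - cum_below) / f * width
        cum_below += f
    raise ValueError("empty score list")


sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]
print(interpolated_median(sales))  # 15.5
```

For the Girl Scout data, three scores lie below the 15 interval and both 15s are needed to reach five scores (50%), so the result is the upper real limit 14.5 + 1 = 15.5, matching the table.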
The final measure of central tendency that we'll consider is the mode.
In a frequency distribution, the mode is the score or category that has the greatest frequency.
So look at your frequency table or graph and pick the score (or category) that has the highest frequency.
So the mode is 5.
However, be aware that a frequency distribution may have more than one mode.
So the modes are 2 and 8.
If one peak were taller than the other, the taller one would be called the major mode and the shorter one the minor mode.
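A sketch of "pick the score with the highest frequency" that also reports ties, so a distribution with more than one mode returns all of them. The function name `modes` is hypothetical; Python's standard `statistics.multimode` behaves similarly. Because it only counts, it also works for nominal categories.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest frequency (several if multimodal)."""
    counts = Counter(scores)   # frequency of each score or category
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


print(modes([3, 4, 4, 5, 5, 5, 6, 6, 7]))              # [5]
print(modes([8, 10, 12, 15, 15, 18, 18, 19, 25, 60]))  # [15, 18]
print(modes(["red", "blue", "blue"]))                  # ['blue']
```

Incidentally, the Girl Scout amounts have two modes ($15 and $18), each with a frequency of 2.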
So how do you know which measure of central tendency to use? The answer depends on a number of factors.
The mean is the most preferred measure: it takes every score in the distribution into account, and it is closely related to measures of variability (which we'll talk about next week). However, there are times when the mean isn't the appropriate measure.
- You cannot find a mean or median for a nominal scale; however, you can find a mode for a nominal scale.
- Use the median if:
2) There are undetermined values: for some reason you don't know the value of one (or more) of your scores (e.g., the person died before answering your question).
3) Your distribution is "open-ended": there is no upper or lower limit on the possible values of your variable (e.g., the top response option on your questionnaire is "5 or more").
4) If your data are on an ordinal scale (rankings), then use the median.
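Why the median still works for undetermined or open-ended values: as long as the unknown score is known to fall at one extreme, it does not affect which scores are in the middle. The sketch below assumes, purely for illustration, that the top Girl Scout amount were recorded only as "$60 or more" and tries a few possible values for it.

```python
import statistics

sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]

# Hypothetical: the top amount is only known to be "$60 or more".
for top in (60, 100, 500):
    scores = sales[:-1] + [top]
    # the median stays at 16.5, but the mean depends on the unknown value
    print(top, statistics.median(scores), statistics.mean(scores))
```

The median is 16.5 in every case, while the mean moves from 20 to 24 to 64, so without the exact top value the mean cannot be computed but the median can.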
Distribution shape | Relationship among the measures
Symmetric | mean = median = mode
Positively skewed | mode < median < mean
Negatively skewed | mean < median < mode
Bimodal (symmetric) | mean = median, 2 modes
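A small check of these relationships using the two data sets from these notes, assuming Python 3.8+ for `statistics.multimode`. The symmetric list has all three measures equal; the Girl Scout amounts have a long tail on the high end (the $60), which pulls the mean above the median as in a positively skewed distribution. That data set has two modes ($15 and $18), so it does not cleanly show the mode < median part of the pattern.

```python
import statistics

symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]
sales = [8, 10, 12, 15, 15, 18, 18, 19, 25, 60]  # one large value on the high end

for name, data in (("symmetric", symmetric), ("sales", sales)):
    print(name,
          statistics.mean(data),        # arithmetic mean
          statistics.median(data),      # middle score (midpoint of middle two if N even)
          statistics.multimode(data))   # all most-frequent scores
```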