Finding the Center (or central tendency) of a
Distribution
We’ll use the same students.sav
file that you used in the last lab.
A measure of central tendency
is a statistic that identifies a single value as
representative of an entire distribution. The
goal of central tendency is to find the single
value that is most typical or most
representative of the entire group.
The mean is the arithmetic
average of all of the scores in the
distribution.
The median is the score in
the middle of the rest of the scores. That is,
half of the scores are to the right of the
median and half are to the left.
The mode is the most
frequently occurring score in the
distribution.
For now let’s learn how to compute these three
measures of center.
We’ll begin by looking at the distribution for
quiz4. Look at the histogram.
If you had to pick a single value that is
representative of the entire distribution, there
are several reasonable options:
Select 7.0 because it is the most frequent
score. It is the mode.
Select the value in the middle of the
distribution, at which half of the scores are
to the right and half are to the left. This
will again be 7.0. This is the median.
Select the arithmetic average. For this
distribution the mean is
6.89.
Calculating by Hand
Calculating the Mode
This is easy. You simply find the number that
occurs most often. For example, consider this
distribution:
11, 3, 5, 3, 1, 7
Rearrange it in ascending order:
1, 3, 3, 5, 7, 11
Now it is easy to see that the most frequent
number (i.e., the mode) is a 3.
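If you want to check a hand count, the same tally can be sketched in a few lines of Python. This is only an illustration and not part of the SPSS workflow in this lab. It uses the example distribution above and the standard library's Counter.

```python
from collections import Counter

scores = [11, 3, 5, 3, 1, 7]  # the example distribution from the text

# Tally how often each score occurs.
counts = Counter(scores)

# The mode is the score with the highest count.
mode, frequency = counts.most_common(1)[0]

print(sorted(scores))   # [1, 3, 3, 5, 7, 11]
print(mode, frequency)  # 3 2
```

Note that this sketch reports only one value even when two scores are tied for most frequent.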
Calculating the Median
This is only slightly more complicated. After
you have arranged a distribution of numbers in
ascending order, simply count how many numbers
there are. Let’s call this total N
(which is also called the sample size
because it is the number of participants in your
study). Notice that N is italicized.
This is proper for all letters (even Greek ones)
that stand for statistics or parameters.
If N is an odd number, the median is
the middle number in the ordered distribution.
To find which number is the middle, find $(N+1)/2$.
If N is 7, then
$(7+1)/2 = 4$.
This tells us that the median number is the 4th
number in the distribution. So if the
distribution were:
45, 52, 76, 88, 99, 100, 100
then the median would be the 4th
number, 88.
In the earlier example above (1, 3, 3, 5, 7,
11), N was 6. When N is
even, the median is the average of both middle
scores. Actually the formula from above still
works. The median is the score that falls in
slot number $(N+1)/2$.
In this case, $(6+1)/2 = 3.5$.
What is the 3.5th number? By
convention it is the average of the 3rd
and 4th numbers. In this case, the 3rd
and 4th numbers are 3 and 5 so the
median is $(3+5)/2 = 4$.
More complicated formulas exist for the median
but we will not concern ourselves with them in
this class.
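The slot rule described above can be written out as a short Python sketch. The function name median_by_slot is made up for this illustration, and the sketch assumes the list has at least one score.

```python
def median_by_slot(scores):
    """Median using the (N + 1) / 2 slot rule described above."""
    ordered = sorted(scores)
    n = len(ordered)
    slot = (n + 1) / 2               # 1-based position of the median
    if slot.is_integer():            # N is odd: a single middle score
        return ordered[int(slot) - 1]
    lower = ordered[int(slot) - 1]   # score just below the half slot
    upper = ordered[int(slot)]       # score just above the half slot
    return (lower + upper) / 2       # N is even: average the two


print(median_by_slot([45, 52, 76, 88, 99, 100, 100]))  # 88
print(median_by_slot([11, 3, 5, 3, 1, 7]))             # 4.0
```

Both results match the hand calculations: the 4th of 7 scores is 88, and the "3.5th" of 6 scores is the average of 3 and 5.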
Calculating the Mean
This is something that is familiar to you.
Simply add up all the numbers and divide by N
(the number of numbers in the distribution). In
the example above, the mean is
$(1+3+3+5+7+11)/6 = 30/6 = 5$.
Technically, this is called the arithmetic
mean. Note that there are other
kinds of averages that won’t be covered in
this course. Each kind of average has its
specific purposes but the arithmetic mean will
do for most purposes in this course.
Generally, unless it makes sense to do
otherwise, we will calculate the mean and other
kinds of statistics to 2 decimal places of
precision.
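As a quick check on the arithmetic, here is a minimal Python sketch of the same calculation for the example distribution, reported to 2 decimal places. It is only an illustration of the hand method.

```python
scores = [1, 3, 3, 5, 7, 11]

n = len(scores)        # N, the sample size
total = sum(scores)    # 1 + 3 + 3 + 5 + 7 + 11 = 30
mean = total / n       # 30 / 6 = 5.0

print(f"{mean:.2f}")   # 5.00
```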
(1)
Answer the following questions about this
dataset (Note: calculate these by hand, not
with SPSS):
(a) What is the mean of
this distribution?
(b) What
is the mode of this distribution?
(c) What
is the median of this distribution?
Calculating the measures of central tendency
in SPSS
The following section outlines how to compute
the mean, median, and mode in SPSS.
We can get SPSS to compute all of these values
with a single command. Go to the Analyze
menu, select the Descriptive
Statistics submenu, and then the Frequencies
option.
This should open a window that looks like this:
Select quiz4 as your variable.
And then click on the Statistics
button.
This will open another window.
In this window select mean, median,
and mode. Then click Continue.
This will take you back to the previous window.
Now click OK.
Now SPSS should open up an output window that
includes a table that looks like this:
That’s all there is to it.
Let’s try to do what I just outlined above but
with a different variable. For the variable final
in your students.sav file I’d like you to answer
the following questions.
(2)
Using SPSS for your computations, answer the
following questions.
(a) What
is the mean for the "final" variable?
(b) What
is the median for the "final" variable?
(c) What
is the mode for the "final" variable?
(d) What
percent of students scored lower than the mode
on the final?
(hints: Don't include the
students who scored the mode exactly.
Don't include "%" in your
answer.
Round to 1 decimal place. Thus, 22.2% would be
entered as 22.2)
Properties of Central Tendency Measures
The mean, median, and mode are descriptive
statistics that are designed to tell us
something about the center of a distribution,
that is, where most of the data are.
So how do you know which measure of central
tendency should be used?
The answer depends on a number of factors,
including the shape of the
distribution and the scale of
measurement that you use.
Use the mean if you can.
The mean is the preferred
measure of central tendency for interval and
ratio level data. It takes every item in the
distribution into account, and it is more stable
than the median and the mode. In this context, stable
refers to the tendency to get similar results
when you draw multiple samples from the same
population and calculate a statistic. In one
sample of numbers ranging from 0 to 24, the mean
might be 10.4 and in another sample the mean
might be 10.3. 10.4 and 10.3 are similar to each
other. If we drew many samples from the same
population and the mean was roughly the same
each time, we would say that the sample mean is
stable. In small samples, the mode is very
unstable. That is, from one sample to the next,
the mode can be very different each time. The
median is more stable than the mode, but
typically less stable than the mean, especially
in smaller samples.
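To see what stability means in practice, here is a small simulation sketch in Python. Everything in it is an assumption made for illustration: the population is made up (whole-number scores from 0 to 24, spread evenly), and the sample size of 10 and the 2000 repetitions are arbitrary choices. It draws many samples and measures how much each statistic varies from sample to sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: whole-number scores from 0 to 24.
population = [random.randint(0, 24) for _ in range(10_000)]


def spread_across_samples(stat, n_samples=2000, sample_size=10):
    """Standard deviation of a statistic over many random samples."""
    values = [stat(random.sample(population, sample_size))
              for _ in range(n_samples)]
    return statistics.pstdev(values)


sd_mean = spread_across_samples(statistics.mean)
sd_median = spread_across_samples(statistics.median)
# statistics.mode returns the first mode on ties (Python 3.8 or later).
sd_mode = spread_across_samples(statistics.mode)

# A smaller spread means a more stable statistic.
print(f"mean:   {sd_mean:.2f}")
print(f"median: {sd_median:.2f}")
print(f"mode:   {sd_mode:.2f}")
```

With this made-up population, the mean should show the smallest spread and the mode the largest, which is the ordering described above. A differently shaped population could give a different picture.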
If you change a given score, add an
observation, or delete an observation, then
the mean will generally change.
If you add (or subtract) a constant to each
score, then the mean will increase (or decrease)
by that constant.
If you multiply (or divide) each score by a
constant, then the mean will be multiplied
(or divided) by that constant.
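The last two properties are easy to verify with a short Python sketch. It reuses the example distribution from the hand calculations. The constant 10 is an arbitrary value chosen for illustration.

```python
from statistics import mean

scores = [1, 3, 3, 5, 7, 11]   # mean is 5
constant = 10                  # arbitrary constant for illustration

shifted = [x + constant for x in scores]  # add the constant to each score
scaled = [x * constant for x in scores]   # multiply each score by it

print(mean(scores))   # 5
print(mean(shifted))  # 15, the original mean plus the constant
print(mean(scaled))   # 50, the original mean times the constant
```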
However, there are times when the mean is not
the appropriate measure.
Use the median if you must.
The median is the better choice when:
The variable is ordinal.
The interval or ratio data have extreme
values (outliers) that cause the distribution
to be highly skewed.
Use the mode as a last resort.
The mode is pretty much your only choice if you
are looking at a nominal scale.
Distribution Shape and Central Tendencies
Okay, now let’s look at a few distributions to
examine how these different measures of central
tendency relate to one another.
In a distribution like this one (symmetric
distribution), the mean, median, and mode will
have similar values.
However, in a positively skewed distribution,
the mean will be larger than the median, which
will be larger than the mode.
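Here is a small Python sketch of that ordering. The scores are made up for illustration: most of them are low, and a couple of high scores stretch the right tail.

```python
import statistics

# Hypothetical positively skewed scores: most are low, a few are very high.
scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

mean = statistics.mean(scores)      # 45 / 10 = 4.5
median = statistics.median(scores)  # average of the 5th and 6th scores: 3.0
mode = statistics.mode(scores)      # 2 occurs three times

print(mode, median, mean)  # 2 3.0 4.5
```

The two high scores (9 and 14) pull the mean above the median, while the mode stays at the peak of the distribution.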
The opposite is true for a negatively skewed
distribution:
(3)
Look at the distribution in the histogram
above.
(a) From
lowest to highest, rank the mean, median, and
mode.
(b) Of the three measures of
central tendency (mean, median, and mode),
which would be the
least representative number
in this case?
(c) Which kind of skewness
is evident in this distribution?