Frequency Distribution Tables
For this lab we'll use a new
SPSS data file that includes hypothetical course
grade information. Click on students.sav
to get this file and then select save.
(Note: Computer actions will be indicated by red bold typeface.) You may need to name the file
"students.sav" so that SPSS will recognize
it (all SPSS data files must have a .sav ending
in the name). Save the file somewhere you can
find it for future use. Then open SPSS and open
the data file students.sav from where you saved it.
In this file there are a number
of variables; click on Variable
View and go through the variables,
checking each for the kind of scale it is. (We
will indicate variables by bold
typeface.) Remember that variable
names cannot have spaces in them. Notice the
labels (which can be many words, as needed) and
values for year, lowup, and review.
For now we'll work with quiz1
and quiz2; click on their row
numbers and all the information about the
variables will be highlighted. Your task for
this part of the lab is to create a frequency
distribution table and graph for each of these
variables.
We'll start by looking at
quiz1. Did some people do a lot worse
or better than the rest of the class? Overall,
was the quiz easy or hard? It is hard to answer
these questions just by looking at the numbers
as they are. Instead we can use some
statistical procedures to organize the data so
that it is easier to understand.
Statistical
procedures are found in the top menu in SPSS,
so they are accessible from either Data View
or Variable View. We will use four of these
menus in this course: Data, Transform, Analyze,
and Graphs.
Our first procedure will be to sort the datafile by
quiz1. To do this,
select the top menu Data
and select from the submenu Sort Cases.
A new window opens up for
selecting variables. This kind of window will
open up for all statistical procedures. Click on
quiz1, then click on the arrow
in the middle to move it to the Sort by box:
Now click on the OK
button. A new Output
window opens but for this
procedure it does not show a new table or
graph. Instead the column for quiz1
has been sorted on the Data
View page. You can begin to see the
pattern of the distribution. For example, now
it is easy to see what the lowest and highest
scores are (now at the top and bottom of the
column). However, usually just sorting the
variable isn't enough. Also, you can only sort
one variable at a time. (If you sort quiz2,
quiz1 scores no longer are
completely sorted.)
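The effect of sorting can also be sketched outside SPSS. The short Python example below is only an illustration under stated assumptions: the four records and the helper name sort_cases are hypothetical and are not taken from students.sav. It shows that after sorting by quiz2, the quiz1 column is no longer in order.

```python
# Hypothetical records; the real students.sav values are not reproduced here.
students = [
    {"id": 1, "quiz1": 7, "quiz2": 9},
    {"id": 2, "quiz1": 3, "quiz2": 10},
    {"id": 3, "quiz1": 10, "quiz2": 6},
    {"id": 4, "quiz1": 5, "quiz2": 8},
]


def sort_cases(rows, variable):
    """Return the rows ordered by one variable, lowest value first."""
    return sorted(rows, key=lambda row: row[variable])


# Sort by quiz1: lowest score ends up first, highest last.
by_quiz1 = sort_cases(students, "quiz1")
quiz1_sorted = [row["quiz1"] for row in by_quiz1]
print(quiz1_sorted)

# Re-sort the same cases by quiz2 and look at the quiz1 column again.
by_quiz2 = sort_cases(students, "quiz2")
quiz1_after_quiz2_sort = [row["quiz1"] for row in by_quiz2]
print(quiz1_after_quiz2_sort)
```

Because each row stays together as one case, putting quiz2 in order rearranges the quiz1 values along with it.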
The
standard statistical procedure for displaying
a distribution is to make a frequency
distribution table.
As
elaborated in the text, a frequency distribution is
an organized tabulation of the number of
individual scores located in each response
category. If only integers are allowed and the
discrete measure has a specific range, then the
number of categories will be limited. If the
measure is continuous, then there are
potentially huge numbers of different entries
and a grouped frequency table is preferable, as
presented below.
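As a rough illustration of the tabulation just described, the Python sketch below counts how often each score occurs and adds percent and cumulative percent columns. The scores and the function name frequency_table are hypothetical; this is not how SPSS computes its output internally, only a small worked version of the same idea.

```python
from collections import Counter


def frequency_table(scores):
    """Build rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(scores)
    total = len(scores)
    rows = []
    running = 0  # number of scores at or below the current value
    for value in sorted(counts):
        running += counts[value]
        rows.append((
            value,
            counts[value],
            round(100 * counts[value] / total, 1),
            round(100 * running / total, 1),
        ))
    return rows


quiz_scores = [10, 8, 7, 10, 9, 7, 8, 10, 5, 6]  # hypothetical scores
table = frequency_table(quiz_scores)

print("value  frequency  percent  cumulative")
for value, f, pct, cum_pct in table:
    print(f"{value:>5}  {f:>9}  {pct:>7}  {cum_pct:>10}")
```

The cumulative percent for a value answers questions of the form "what percentage of the scores is at or below this score?"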
We will now go to Analyze in the top menu,
which contains the procedures that we will
select from during the rest of the course; all
of these produce tables of results. Select Analyze
and then select Descriptive
Statistics from the drop-down menu.
From the submenu that pops up, select Frequencies.
The standard window for
selecting variables opens up. Select quiz1
and click on the arrow to move it
to the Variable(s) box.
Click the OK
button. The Output window will open up. The
frequency table for quiz1
should look something like this:
Now look at your finished
frequency distribution table and answer the
following questions:
(1) What percentage of the scores is at or
below a score of 7?
(2) Where does it appear that most of the
scores are located?
(3) What does your answer to (2) tell you
about the difficulty of the quiz?
NOW
MAKE A FREQUENCY DISTRIBUTION FOR QUIZ 2.
To do so, you need to return to
the Data View page. After you select the
Frequencies procedure again, the variable
selection window will open, still containing
quiz1. Click on it and the arrow to move it
back to the variable list. Then click on quiz2 and the arrow to
move it to the Variable(s) box.
When you are finished, compare the two
distributions and answer the following
questions:
(4) For which quiz do the
scores appear to be more evenly distributed
across the scale?
(5)
Which quiz appeared to be harder? How do you
know this?
Grouped Frequency Distribution Tables
When there are too many
different response categories to list every
category in a frequency distribution table, we
can group the
scores into class intervals
and use the intervals as the X values in our
table. For example, think of a percentage
grading scale (A = 90-100, B = 80-89, ...). In
the students.sav
file, the letter grade earned on the final exam
is shown as ExamGrade.
Use SPSS to create a grouped frequency table:
First we will recode the quiz1
scores into grouped values.
Click Transform at the
top of your screen and click Recode into
Different Variables.
Select quiz1 and
click the arrow to move the quiz1
variable into the white box.
Click Old and New Values.
Select Range, LOWEST
through value on the left and enter 2.
Then on the upper right corner of the box enter
2 where it says Value.
Next, click Add.
Enter the rest of the grouped
values by first selecting Range on the
left and enter the range between 2 and 4. Enter
4 where it says Value
on the upper right and then click Add.
Recode the ranges for 4-6, 6-8, and
8-10 into the values 6, 8, and 10, respectively.
Then click Continue.
Enter the new variable name quiz1grouped
in the Name box. Then click Change.
Click OK.
Click Analyze at the top
of the screen and go to Descriptive
Statistics. Click Frequencies.
Select quiz1grouped
at the bottom of the variable list and move it
into the white box with the blue arrow. Click OK.
You should see a table that looks like this:
quiz1grouped | Frequency | Percent | Valid Percent | Cumulative Percent |
2.00 | 3 | 2.9 | 2.9 | 2.9 |
4.00 | 12 | 11.4 | 11.4 | 14.3 |
6.00 | 18 | 17.1 | 17.1 | 31.4 |
8.00 | 30 | 28.6 | 28.6 | 60.0 |
10.00 | 42 | 40.0 | 40.0 | 100.0 |
Total | 105 | 100.0 | 100.0 | |
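The recode step and the percentages in the grouped table can be checked with a short Python sketch. It rests on two assumptions that should be read as such: that a score sitting on a shared boundary (for example 4, which falls in both 2-4 and 4-6) takes the value of the first range that matches, in the order the ranges were added, and that the names RULES, recode, and percent_columns are purely illustrative. The frequencies are the ones printed in the table.

```python
# Recode rules in the order they were added: (low, high, new value).
# float("-inf") stands in for LOWEST.
RULES = [
    (float("-inf"), 2, 2),
    (2, 4, 4),
    (4, 6, 6),
    (6, 8, 8),
    (8, 10, 10),
]


def recode(score, rules=RULES):
    """Return the grouped value for a score, using the first matching range."""
    for low, high, new_value in rules:
        if low <= score <= high:
            return new_value
    return None  # no rule matched; treated here as missing


def percent_columns(frequencies):
    """Return (percent, cumulative percent) lists, rounded to one decimal."""
    total = sum(frequencies)
    percents, cumulative, running = [], [], 0
    for f in frequencies:
        running += f
        percents.append(round(100 * f / total, 1))
        cumulative.append(round(100 * running / total, 1))
    return percents, cumulative


recoded = [recode(s) for s in (1, 2, 3, 4, 7, 10)]
print(recoded)

frequencies = [3, 12, 18, 30, 42]  # Frequency column of the grouped table
percents, cumulative = percent_columns(frequencies)
print(percents)
print(cumulative)
```

Under the first-match assumption, the group labeled 4 holds scores above 2 up to and including 4, the group labeled 6 holds scores above 4 up to and including 6, and so on.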
(6) Finish
the table in the Word document that looks like
the table below for the variable percent,
which represents final course grades in the
students.sav file. Use the Recode
function described above.
X | f | p | cf | cp |
0-49.99 | | | | |
50-59.99 | | | | |
60-69.99 | | | | |
70-79.99 | | | | |
80-89.99 | | | | |
90-100 | | | | |
Bar graphs
To
display the distribution of a nominal or
ordinal (that
is, categorical) variable
one should use a bar graph (pie charts are also
used, but we won't be discussing them). SPSS
makes a number of different kinds of bar charts,
but we'll focus on simple and clustered. Excel
also makes graphs, but it is somewhat difficult
and we will not cover it.
Bar chart (simple, clustered, and stacked):
These are used most often to display
the distribution of subjects or cases
in certain categories, such as the
number of A, B, C, D, and F grades in
a given class.
Let's start by looking at the distribution of
ethnicity in our students.sav data
file. Our graph will show the count
(or frequency) for each ethnic
category.
- From the menu click Graphs→Chart Builder.
- Drag the Simple Bar chart into the white box
at the top.
- Drag the ethnicity variable into
the X-axis box.
- Click OK.
You should get a bar chart that looks something
like this.
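The counts that a simple bar chart plots can be sketched in a few lines of Python. The category labels below are placeholders, since the actual coding of ethnicity in students.sav is not reproduced here, and text_bar_chart is a hypothetical helper that draws one row of # marks per category.

```python
from collections import Counter

# Hypothetical category labels; the coding in students.sav may differ.
ethnicity = ["A", "B", "A", "C", "B", "A", "D", "A", "C", "B"]

# One count per category: this is the height of each bar.
counts = Counter(ethnicity)


def text_bar_chart(counts):
    """Return one text line per category, with one # per case."""
    lines = []
    for category in sorted(counts):
        bar = "#" * counts[category]
        lines.append(f"{category:<4}{bar} {counts[category]}")
    return lines


chart = text_bar_chart(counts)
print("\n".join(chart))
```

Each bar's length is simply the frequency of that category, which is why a bar chart of a categorical variable is a picture of its frequency table.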
Bar charts are also
useful for presenting distributions that are
broken into different categories.
For example suppose that we wanted to know the
mean scores on quiz1 broken down
by the three different sections.
- From the menu click Graphs→Chart
Builder.
- Drag the Simple Bar chart into the
white box at the top.
- Drag the section variable into the
X-axis box.
- Drag the
quiz1 variable into
the Y-axis box.
- Click OK.
We should end up with a graph that looks like
this.
Suppose that we want to look at the same means
by section but broken down by ethnicity. To do
this we must use a clustered bar graph.
Open Chart Builder again, but this time drag
the Clustered Bar chart into the white box. Set
up the axes as we did in the example above, then
drag the ethnicity variable into the Cluster
on X: set color box, as shown in the box below.
We should end up with a graph that looks like
this.
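The numbers behind these two charts are group means. The Python sketch below computes the mean of quiz1 for each section, and then for each combination of section and ethnicity, which is what a clustered bar chart displays. The rows, the category labels, and the helper name group_means are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (section, ethnicity, quiz1).
rows = [
    (1, "A", 8), (1, "B", 6), (1, "A", 10),
    (2, "A", 5), (2, "B", 7), (2, "B", 9),
    (3, "A", 4), (3, "B", 8),
]


def group_means(rows, key):
    """Average quiz1 within each group defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {name: mean(values) for name, values in sorted(groups.items())}


# Simple bar chart: one bar (mean) per section.
by_section = group_means(rows, key=lambda r: r[0])
print(by_section)

# Clustered bar chart: one bar per (section, ethnicity) pair.
by_section_and_ethnicity = group_means(rows, key=lambda r: (r[0], r[1]))
print(by_section_and_ethnicity)
```

The clustered version uses the same calculation with a finer grouping, so each section's single bar is split into one bar per ethnicity category.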
Make a bar graph of the counts of the final
grades (called grades in the file)
in the class (i.e. A, B, C,...) and paste it
into the Word document.
Make a bar graph of the counts of the final
grades in the class (i.e. A, B, C,...), further
broken down by whether they attended the review
session or not and paste it into the Word
document.
MAKE
A BAR GRAPH OF GRADES ON THE FINAL EXAM.
Recall that it is called
ExamGrade in the file.
(7) What was the most common grade in the
course?
NOW
MAKE A BAR GRAPH OF THE COUNTS OF THE FINAL
GRADES FURTHER BROKEN DOWN BY WHETHER
STUDENTS ATTENDED THE REVIEW SESSION OR NOT. (You can set either color or
pattern for the clustered variable.)
(8)
Based on the graph, would you
conclude that attending the review session had
an impact on final grades? Why?
Histograms
We already created frequency
distribution tables. Now we will create
histograms to display the
distributions of SPSS scale (that is,
interval or ratio quantitative) variables. We
will construct a histogram rather than a bar
graph for quiz1 because it is
a continuous quantitative variable (its SPSS
symbol is a ruler).
Histogram: A
histogram is a pictorial
representation of the distribution of
values for a particular variable. The
bars represent the number of
occurrences of each value. These look
similar to bar graphs except they are
used more often to indicate the number
of subjects or cases in ranges of
values for a continuous variable.
Creating a histogram of the students' scores on
quiz1.
- At the top of the data window is a row of
menus. Click the Graphs menu.
- Under this menu a large number of graphing
options will appear. On the bottom third of
the list is histogram. This is the
option that we'll use to look at distributions
(for this lab at least).
- Select histogram. Now you'll get a
window that looks like this:
- Select
quiz1 as your variable
and then click OK.
This should result in a new window (the output
window) opening up, and it should have your
histogram in it. The histogram of quiz1
is basically just a picture of the frequency
distribution table. Below is a frequency
distribution table and a histogram for quiz1 .
For quiz1 the frequency table output
should look something like this:
In this case the histogram is a little different
from what you might expect after comparing it to
the frequency distribution table above. Why?
Because the above histogram is based on a grouped
frequency distribution table of quiz1
(see previous lab for discussion). Go ahead and
group scores 10 & 9, 8 & 7, 6 & 5,
etc. and see if the histogram now looks as you'd
expect.
An important lesson from this is that the size
of the interval that you plot may influence the
overall shape of the histogram.
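To see how interval size changes the picture, the Python sketch below bins the same set of scores with three different interval widths. The scores and the function name bin_counts are hypothetical, and the intervals here start at 0 and include their lower edge, which need not match the intervals SPSS chooses for its histograms.

```python
from collections import Counter


def bin_counts(scores, width, start=0):
    """Count scores per interval [low, low + width), keyed by low."""
    counts = Counter(
        (score - start) // width * width + start for score in scores
    )
    return dict(sorted(counts.items()))


quiz_scores = [1, 3, 3, 5, 5, 5, 7, 7, 7, 7, 9, 9]  # hypothetical scores

width_1 = bin_counts(quiz_scores, width=1)
width_2 = bin_counts(quiz_scores, width=2)
width_4 = bin_counts(quiz_scores, width=4)

for label, bins in (("width 1", width_1), ("width 2", width_2), ("width 4", width_4)):
    print(label)
    for low, count in bins.items():
        print(f"  {low:>2} {'#' * count}")
```

With a width of 1 the plot has gaps at every even score, with a width of 2 the gaps disappear, and with a width of 4 most of the detail collapses into a single tall middle bar.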
NOW MAKE HISTOGRAMS OF THE OTHER 3
QUIZZES (2, 3, and 4).
(9) What are the differences
between the different distributions?
(10)
Which quiz was the hardest? Which was the
easiest? Why do you come to that conclusion?
(11)
On which quiz(zes) did most people get the
same score? On which quiz(zes) were the scores
widely distributed?
(12)
Are there some quizzes where some students did
especially well (compared to the rest) or
especially poorly?
(13)
Which quiz(zes) was/were positively skewed?
Which quiz(zes) was/were negatively skewed?
Are there any that are not skewed (i.e. are
roughly symmetric)?
(14)
Are there any scores that may be potential
outliers?
FINALLY, save your
SPSS datafile and ATTACH THE OUTPUT OF
ALL GRAPHS & HISTOGRAMS IN THIS LAB TO
ASSIGNMENT 7.