Outline

Make frequency distribution tables
Make graphs using SPSS

Lab 7

SPSS: Tables and Graphs for Frequency Distributions

Frequency Distribution Tables

For this lab we'll use a new SPSS data file that includes hypothetical course grade information. Click on students.sav to get this file and then select save. (Note: Computer actions will be indicated by red bold typeface.) You may need to name the file "students.sav" so that SPSS will recognize it (all SPSS data file names must end in .sav). Save the file somewhere you can find it for future use. Then open SPSS and open the data file students.sav from where you saved it.

In this file there are a number of variables; click on Variable View and go through the variables, checking each for the kind of scale it is. (We will indicate variables by bold typeface.) Remember that variable names cannot have spaces in them. Notice the labels (which can be many words, as needed) and values for year, lowup, and review. For now we'll work with quiz1 and quiz2; click on their row numbers and all the information about the variables will be highlighted. Your task for this part of the lab is to create a frequency distribution table and graph for each of these variables.

We'll start by looking at quiz1. Did some people do a lot worse or better than the rest of the class? Overall, was the quiz easy or hard? It is hard to answer these questions just by looking at the raw numbers. Instead we can use statistical procedures to organize the data so that it is easier to understand.

Statistical procedures are found in the top menu in SPSS, so they are accessible from either Data View or Variable View. We will use four menus in this course: Data, Transform, Analyze, and Graphs. Our first procedure will be to sort the data file by quiz1. To do this, select the top menu Data and select Sort Cases from the submenu.

A new window opens up for selecting variables. This kind of window will open up for all statistical procedures. Click on quiz1, then click on the arrow in the middle to move it to the Sort by box:

Now click on the OK button. A new Output window opens but for this procedure it does not show a new table or graph. Instead the column for quiz1 has been sorted on the Data View page. You can begin to see the pattern of the distribution. For example, now it is easy to see what the lowest and highest scores are (now at the top and bottom of the column). However, usually just sorting the variable isn't enough. Also, you can only sort one variable at a time. (If you sort quiz2, quiz1 scores no longer are completely sorted.)
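The effect of sorting can be sketched outside SPSS. The short Python example below is only an illustration: the four records and their values are made up and are not taken from students.sav. It shows that sorting moves whole cases (rows), which is why sorting by quiz2 afterwards leaves quiz1 out of order.

```python
# Hypothetical cases; the values are invented for illustration
# and are not the contents of students.sav.
records = [
    {"id": 1, "quiz1": 7, "quiz2": 9},
    {"id": 2, "quiz1": 3, "quiz2": 10},
    {"id": 3, "quiz1": 10, "quiz2": 6},
    {"id": 4, "quiz1": 5, "quiz2": 8},
]

# Sorting reorders entire rows, not just one column.
by_quiz1 = sorted(records, key=lambda r: r["quiz1"])
print([r["quiz1"] for r in by_quiz1])  # lowest quiz1 score first, highest last

# Re-sorting by quiz2 puts quiz2 in order and scrambles quiz1 again.
by_quiz2 = sorted(records, key=lambda r: r["quiz2"])
print([r["quiz1"] for r in by_quiz2])
```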

The standard statistical procedure for displaying a distribution is to make a frequency distribution table. As elaborated in the text, a frequency distribution is an organized tabulation of the number of individual scores located in each response category. If the measure is discrete, allows only integers, and has a specific range, then the number of categories will be limited. If the measure is continuous, then there are potentially huge numbers of different entries and a grouped frequency table is preferable, as presented below.
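To make the definition concrete, here is a minimal Python sketch of the tabulation. The scores are hypothetical (not the quiz1 values in students.sav) and the function name frequency_table is made up for this illustration. For each score it reports the frequency, the percent, and the cumulative percent.

```python
from collections import Counter

# Hypothetical integer quiz scores; not the values in students.sav.
scores = [7, 9, 10, 7, 8, 5, 9, 7, 10, 8]

def frequency_table(values):
    """Return rows of (score, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for score in sorted(counts):
        cumulative += counts[score]
        rows.append((
            score,
            counts[score],
            round(100 * counts[score] / n, 1),
            round(100 * cumulative / n, 1),
        ))
    return rows

table = frequency_table(scores)
print("score  freq  percent  cum pct")
for row in table:
    print("{:>5} {:>5} {:>8} {:>8}".format(*row))
```

The cumulative percent column is the one to read for a question such as "what percentage of the scores is at or below a given score".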

We will now go to Analyze in the top menu, which contains the procedures that we will select from during the rest of the course; all of these produce tables of results. Select Analyze and then select Descriptive Statistics from the drop-down menu. From the submenu that pops up, select Frequencies.

The standard window for selecting variables opens up. Select quiz1 and click on the arrow to move it to the Variable(s) box.

Click the OK button. The Output window will open up. The frequency table for quiz1 should look something like this:

Now look at your finished frequency distribution table and answer the following questions:
(1) What percentage of the scores is at or below a score of 7?
(2) Where does it appear that most of the scores are located?
(3) What does your answer to (2) tell you about the difficulty of the quiz?

NOW MAKE A FREQUENCY DISTRIBUTION FOR QUIZ 2.

To do so, you need to return to the Data View page. After you select your procedure, the variable selection window will open, still containing quiz1. Click on it and then on the arrow to move it back to the variable list. Then click on quiz2 and the arrow to move it to the Variable(s) box.

When you are finished, compare the two distributions and answer the following questions:
(4) For which quiz do the scores appear to be more evenly distributed across the scale?
(5) Which quiz appeared to be harder? How do you know this?

Grouped Frequency Distribution Tables

When there are too many different response categories to list every category in a frequency distribution table, we can group the scores into class intervals and use the intervals as the X values in our table. For example, think of a percentage grading scale (A = 90-100, B = 80-89, ...). In the students.sav file, the letter grade earned on the final exam is shown as ExamGrade.

Use SPSS to create a grouped frequency table:

First we will recode the quiz1 scores into grouped values.

Click Transform at the top of your screen and click Recode into Different Variables.

Select quiz1 and click the arrow to move the quiz1 variable into the white box.

Click Old and New Values.

Select Range, LOWEST through value on the left and enter 2. Then on the upper right corner of the box enter 2 where it says Value. Next, click Add.

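Enter the rest of the grouped values by first selecting Range on the left and entering the range 2 through 4. Enter 4 where it says Value on the upper right and then click Add.

Recode the ranges for 4–6, 6–8, and 8–10 into the values 6, 8, and 10, respectively. Then click Continue. (The ranges share their boundary values. SPSS applies the first rule in the list that matches, so a score of exactly 4 is recoded to 4, not 6.)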

Enter the new variable name quiz1grouped in the Name box. Then click Change.

Click OK.

Click Analyze at the top of the screen and go to Descriptive Statistics. Click Frequencies.

Select quiz1grouped at the bottom of the variable list and move it into the white box with the blue arrow. Click OK.

You should see a table that looks like this:

quiz1grouped
	Frequency	Percent	Valid Percent	Cumulative Percent
2.00	3	2.9	2.9	2.9
4.00	12	11.4	11.4	14.3
6.00	18	17.1	17.1	31.4
8.00	30	28.6	28.6	60.0
10.00	42	40.0	40.0	100.0
Total	105	100.0	100.0
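The recode rules and the percent columns can be checked with a short Python sketch. It assumes that the rules are tried in the order they were entered and that the first match wins; the names RULES and recode are made up for this illustration. The frequencies are copied from the quiz1grouped table above.

```python
# Recode rules from the steps above, as (upper bound, new value) pairs.
# Assumption: rules are tried in order and the first match wins, so a
# boundary score such as 4 falls in the lower interval.
RULES = [(2, 2), (4, 4), (6, 6), (8, 8), (10, 10)]

def recode(score):
    """Map a quiz score to its grouped value, or None if no rule applies."""
    for upper, new_value in RULES:
        if score <= upper:
            return new_value
    return None  # no rule covers scores above 10

# Frequencies copied from the quiz1grouped table.
frequencies = {2: 3, 4: 12, 6: 18, 8: 30, 10: 42}
total = sum(frequencies.values())

rows = []
cumulative = 0
for value, f in frequencies.items():
    cumulative += f
    percent = round(100 * f / total, 1)
    cumulative_percent = round(100 * cumulative / total, 1)
    rows.append((value, f, percent, cumulative_percent))

for row in rows:
    print("{:>5} {:>5} {:>6} {:>6}".format(*row))
```

Each percent is the frequency divided by the total of 105, and each cumulative percent is the running sum of frequencies divided by 105.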

(6) In the Word document, complete the table shown below for the variable percent, which represents the final course grades in the students.sav file. Use the Recode function described above.

X	f	p	cf	cp
0–49.99
50–59.99
60–69.99
70–79.99
80–89.99
90–100

Bar graphs

To display the distribution of a nominal or ordinal (that is, categorical) variable one should use a bar graph (pie charts are also used, but we won't be discussing them). SPSS makes a number of different kinds of bar charts, but we'll focus on simple and clustered. Excel also makes graphs, but it is somewhat difficult and we will not cover it.

Bar chart (simple, clustered, and stacked): These are used most often to display the distribution of subjects or cases in certain categories, such as the number of A, B, C, D, and F grades in a given class.

Let’s start by looking at the distribution of ethnicity in our students.sav data file. Our graph will show the count (or frequency) for each ethnic category.

From the menu click Graphs→Chart Builder.
Drag the Simple Bar chart into the white box at the top.
Drag the ethnicity variable into the X-axis box.
Click OK.

You should get a bar chart that looks something like this.

Bar charts are also useful for presenting distributions that are broken into different categories.

For example, suppose that we wanted to know the mean scores on quiz1 broken down by the three different sections.

From the menu click Graphs→Chart Builder.
Drag the Simple Bar chart into the white box at the top.
Drag the section variable into the X-axis box.
Drag the quiz1 variable into the Y-axis box.
Click OK.

We should end up with a graph that looks like this.
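The height of each bar in such a chart is simply the mean of quiz1 within one section. A minimal Python sketch of that calculation follows; the (section, quiz1) pairs are hypothetical and are not taken from students.sav.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (section, quiz1) pairs; not the values in students.sav.
cases = [(1, 8), (1, 6), (2, 9), (2, 7), (2, 8), (3, 5), (3, 7)]

# Collect the quiz1 scores belonging to each section.
by_section = defaultdict(list)
for section, quiz1 in cases:
    by_section[section].append(quiz1)

# One mean per section: these are the bar heights.
section_means = {s: mean(v) for s, v in sorted(by_section.items())}

for s, m in section_means.items():
    print(f"section {s}: mean quiz1 = {m:.2f} " + "#" * round(m))
```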

Suppose that we want to look at the same means by section but broken down by ethnicity. To do this we must use a clustered bar graph.

Open the Chart Builder again, but this time drag the Clustered Bar chart into the white box at the top. Enter the variables as in the example above (section on the X-axis, quiz1 on the Y-axis), then drag the ethnicity variable into the Cluster on X: set color box, as shown in the box below.

We should end up with a graph that looks like this.

Make a bar graph of the counts of the final grades (called grades in the file) in the class (i.e. A, B, C,...) and paste it into the Word document.

Make a bar graph of the counts of the final grades in the class (i.e. A, B, C,...), further broken down by whether they attended the review session or not and paste it into the Word document.

MAKE A BAR GRAPH OF GRADES ON THE FINAL EXAM.

Recall that it is called ExamGrade in the file.

(7) What was the most common grade in the course?

NOW MAKE A BAR GRAPH OF THE COUNTS OF THE FINAL GRADES FURTHER BROKEN DOWN BY WHETHER STUDENTS ATTENDED THE REVIEW SESSION OR NOT. (You can set either color or pattern for the clustered variable.)

(8) Based on the graph, would you conclude that attending the review session had an impact on final grades? Why?
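The counts behind a clustered bar chart form a two-way tally: one count for every combination of the two categorical variables. The Python sketch below illustrates the idea with made-up data. The grade letters and the "yes"/"no" codes for review attendance are assumptions for this example and may not match the value labels in students.sav.

```python
from collections import Counter

# Hypothetical (grade, attended_review) pairs; labels and values are invented.
cases = [
    ("A", "yes"), ("A", "yes"), ("B", "yes"), ("B", "no"),
    ("C", "no"), ("C", "no"), ("A", "no"), ("B", "yes"),
]

# Count each (grade, review) combination.
counts = Counter(cases)

# One cluster per grade, one bar per review group within the cluster.
grades = sorted({grade for grade, _ in cases})
for grade in grades:
    yes = counts[(grade, "yes")]
    no = counts[(grade, "no")]
    print(f"{grade}: review yes {'#' * yes} ({yes})   review no {'#' * no} ({no})")
```

When the two groups differ in size, comparing raw counts can mislead; percentages within each group are often easier to compare.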

Histograms

We already created frequency distribution tables. Now we will create histograms to display the distributions of SPSS scale (that is, interval or ratio quantitative) variables. We will construct a histogram rather than a bar graph for quiz1 because it is a continuous quantitative variable (its SPSS symbol is a ruler).

Histogram: A histogram is a pictorial representation of the distribution of values for a particular variable. The bars represent the number of occurrences of each value. Histograms look similar to bar graphs, except that they are used to show the number of subjects or cases in ranges of values for a continuous variable, such as the number of students whose quiz scores fall in each interval.

Creating a histogram of the students' scores on quiz1.

At the top of the data window is a row of menus. Click the Graphs menu.
Under this menu a large number of graphing options will appear. In the bottom third of the list is Histogram. This is the option that we'll use to look at distributions (for this lab at least).
Select histogram. Now you'll get a window that looks like this:
Select quiz1 as your variable and then click OK.

This should result in a new window (the output window) opening up, and it should have your histogram in it. The histogram of quiz1 is basically just a picture of the frequency distribution table. Below is a frequency distribution table and a histogram for quiz1.

For quiz1 the frequency table output should look something like this:

In this case the histogram is a little different than you might expect after comparing it to the frequency distribution table above. Why?

The reason is that the histogram above is based on a grouped frequency distribution of quiz1 (see previous lab for discussion). Go ahead and group scores 10 & 9, 8 & 7, 6 & 5, etc., and see whether the histogram now looks as you would expect.

An important lesson from this is that the size of the interval that you plot may influence the overall shape of the histogram.
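The effect of interval size can be seen in a small text histogram. The Python sketch below uses hypothetical scores (not the quiz1 values in students.sav) and a made-up helper named histogram. Starting the intervals at 0 is an assumption of this example, so with a width of 2 the scores pair up as 8 & 9 and 6 & 7. Widening the intervals smooths out the dips that appear when every score gets its own bar.

```python
from collections import Counter

# Hypothetical quiz scores (0-10); not the values in students.sav.
scores = [4, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10]

def histogram(values, width):
    """Count values in intervals of the given width, with intervals starting at 0."""
    bins = Counter((v // width) * width for v in values)
    return dict(sorted(bins.items()))

for width in (1, 2):
    print(f"interval width {width}")
    for start, count in histogram(scores, width).items():
        label = str(start) if width == 1 else f"{start}-{start + width - 1}"
        print(f"{label:>6} " + "*" * count)
```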

NOW MAKE HISTOGRAMS OF THE OTHER 3 QUIZZES (2, 3, and 4).

(9) What are the differences between the different distributions?

(10) Which quiz was the hardest? Which was the easiest? Why do you come to that conclusion?

(11) On which quiz(zes) did most people get the same score? On which quiz(zes) were the scores widely distributed?

(12) Are there some quizzes where some students did especially well (compared to the rest) or especially poorly?

(13) Which quiz(zes) was/were positively skewed? Which quiz(zes) was/were negatively skewed? Are there any that are not skewed (i.e. are roughly symmetric)?

(14) Are there any scores that may be potential outliers?
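The direction of skew can also be checked numerically. The Python sketch below uses two hypothetical score sets and the third standardized moment as one simple skewness measure. This is an illustration, not the procedure used in this lab, and SPSS reports a sample-adjusted skewness statistic whose value may differ somewhat. A negative value means the tail points toward the low scores; a positive value means it points toward the high scores.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Third standardized moment (population form) as a simple skewness measure."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical score sets; not the values in students.sav.
easy_quiz = [10, 10, 9, 9, 9, 8, 8, 7, 5, 2]  # most scores high, tail toward low scores
hard_quiz = [0, 0, 1, 1, 1, 2, 2, 3, 5, 8]    # most scores low, tail toward high scores

for name, data in (("easy_quiz", easy_quiz), ("hard_quiz", hard_quiz)):
    print(f"{name}: mean={mean(data):.2f} median={median(data):.2f} "
          f"skewness={skewness(data):.2f}")
```

In a skewed distribution the mean is pulled toward the tail, so comparing the mean with the median gives a quick check on the sign.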

FINALLY, save your SPSS datafile and ATTACH THE OUTPUT OF ALL GRAPHS & HISTOGRAMS IN THIS LAB TO ASSIGNMENT 7.