Cross tabulation and the Pearson Chi-Square Test



Suppose you have noticed that far more psychology majors are women than men. It could be that more women are simply enrolled in the university, so you'd expect more women psych majors than men. Or it could be that something about the psychology major attracts women (or repels men?).

When do we use these methods?


Crosstabulation

Example

Suppose that you are interested in whether there is a relationship between gender and educational level (undergraduate vs. graduate students) at ISU (year 2002). That is, are men and women equally likely to pursue a graduate education relative to an undergraduate education?

Set up our data in a "cross tabulation" of our two variables. The data are observed frequencies (fo).

                Student level
Sex          Undergraduate   Graduate
Male                 7,715        938
Female              10,780      1,625

The next step in the crosstabulation procedure is to compute the marginals for the rows and columns. This simply means add the frequencies across the rows and down the columns.

                        Student level
Sex                  Undergraduate   Graduate   Row marginals
Male                         7,715        938           8,653
Female                      10,780      1,625          12,405
Column marginals            18,495      2,563
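
The marginals can also be checked with a few lines of code. The following is only a sketch in Python (the variable names are illustrative and not part of the course materials): it adds the observed frequencies across each row and down each column.

```python
# Observed frequencies (fo) from the ISU 2002 table above.
observed = {
    "Male": {"Undergraduate": 7715, "Graduate": 938},
    "Female": {"Undergraduate": 10780, "Graduate": 1625},
}

# Row marginals: add the frequencies across each row.
row_marginals = {sex: sum(levels.values()) for sex, levels in observed.items()}

# Column marginals: add the frequencies down each column.
column_marginals = {}
for levels in observed.values():
    for level, count in levels.items():
        column_marginals[level] = column_marginals.get(level, 0) + count

# Grand total: every student counted once.
grand_total = sum(row_marginals.values())

print(row_marginals)     # {'Male': 8653, 'Female': 12405}
print(column_marginals)  # {'Undergraduate': 18495, 'Graduate': 2563}
print(grand_total)       # 21058
```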

So what can we tell from this table?


Hypothesis Testing with Chi-squared

The Chi-Square Formula
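
In the notation used here, with f_o for the observed frequencies and f_e (an assumed symbol) for the expected frequencies, the standard Pearson chi-square statistic can be written as:

```latex
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e},
\qquad
f_e = \frac{(\text{row marginal}) \times (\text{column marginal})}{N},
\qquad
df = (R - 1)(C - 1)
```

The sum runs over every cell of the crosstabulation, N is the total number of observations, and R and C are the numbers of rows and columns.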

Example

A manufacturer of watches takes a sample of 200 people. Each person is classified by age and watch type preference (digital vs. analog). The question: is there a relationship between age and watch preference?

Set up our data in a "cross tabulation" of our two variables. The data are observed frequencies (fo).

               Watch preference
Age         digital   analog   undecided
under 30         90       40          10
over 30          10       40          10

Step 1: State the hypotheses and select an alpha level

Step 2:

Step 3: Collect your data and compute your test statistic

So let's enter the predicted (expected) values (in parentheses) into our crosstabulation, along with the row and column marginals.

                        Watch preference
Age                  digital    analog     undecided   Row marginals
under 30             90 (70)    40 (56)    10 (14)     140
over 30              10 (30)    40 (24)    10 (6)       60
Column marginals     100        80         20
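
The expected values in this table can be reproduced from the marginals. The sketch below assumes the usual rule that each expected frequency is (row marginal × column marginal) / N; the variable names are illustrative only.

```python
# Observed frequencies (fo): rows are age groups, columns are watch preferences.
observed = [
    [90, 40, 10],  # under 30: digital, analog, undecided
    [10, 40, 10],  # over 30:  digital, analog, undecided
]

row_totals = [sum(row) for row in observed]           # row marginals
column_totals = [sum(col) for col in zip(*observed)]  # column marginals
n = sum(row_totals)                                   # total sample size

# Expected frequency for each cell: row marginal * column marginal / N.
expected = [[r * c / n for c in column_totals] for r in row_totals]

for row in expected:
    print(row)
# [70.0, 56.0, 14.0]
# [30.0, 24.0, 6.0]
```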

Part 3: Compute the Chi-squared statistic
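
As a worked check, assuming the standard Pearson formula (the sum over all cells of the squared difference between observed and expected, divided by expected), the six cells of the watch table contribute:

```latex
\chi^2 = \frac{(90-70)^2}{70} + \frac{(40-56)^2}{56} + \frac{(10-14)^2}{14}
       + \frac{(10-30)^2}{30} + \frac{(40-24)^2}{24} + \frac{(10-6)^2}{6}
       \approx 5.714 + 4.571 + 1.143 + 13.333 + 10.667 + 2.667
       \approx 38.095
```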

Step 4: Compare this computed statistic (38.09) against the critical value (5.99) and make a decision about your hypotheses
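
The whole decision can be sketched in Python as follows. Two things here are assumptions rather than statements from the example: the alpha level of 0.05 (inferred because the critical value 5.99 matches df = 2 at that level) and the shortcut used for the critical value, which is valid only when df = 2.

```python
import math

# Observed frequencies (fo) from the watch preference table.
observed = [
    [90, 40, 10],  # under 30
    [10, 40, 10],  # over 30
]
row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (fo - fe)^2 / fe over all cells,
# where fe = row marginal * column marginal / N.
chi_square = sum(
    (fo - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(observed, row_totals)
    for fo, c in zip(row, column_totals)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# ASSUMPTION: alpha = 0.05 is inferred, not stated in the example.
alpha = 0.05
# For df = 2 ONLY, the upper-tail probability of chi-square is exp(-x / 2),
# so the critical value is -2 * ln(alpha). Other df values need a table
# or a statistics library.
critical_value = -2 * math.log(alpha)

reject_null = chi_square > critical_value
print(f"chi-square = {chi_square:.3f}, df = {df}, critical = {critical_value:.2f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Because the computed statistic is far larger than the critical value, the sketch rejects the null hypothesis that age and watch preference are independent.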



Computing Crosstabs and Chi-squared in SPSS


Assumptions of the Chi-Square


1) Gender differences in dream content are well documented. Suppose that a researcher studies aggression content in the dreams of men and women. Each subject reports his or her most recent dream. Then each dream is judged by a panel of experts to have low, medium, or high aggression content. The observed frequencies are shown in the following table. Is there a relationship between gender and the aggression content of dreams? Test with α = 0.01. Be sure to state your hypotheses.

            Aggression content
Gender      low   medium   high
Female       18        4      2
Male          4       17     15

2) Suppose that you're interested in whether there is a relationship between sex and membership in an after-school club (in high school students). So you randomly selected 30 students from a local high school and recorded their sex and whether or not they were members of an after-school club. Create a crosstabulation for the following data.

3) Using SPSS, compute the marginals, expected values, and chi-square for the data in (2).

For the following two questions download the file students.sav.

4) Were juniors and seniors more likely than freshmen and sophomores to attend the review sessions? Provide a bar chart showing the breakdown. Assuming α = 0.05, test whether these variables are independent. Remember to state your hypotheses.

5) Were men more likely than women to do an extra credit assignment? Report the number of people who did and didn't do the extra credit project, broken down by gender. Assuming α = 0.05, test whether gender and extra credit participation are independent. Remember to state your hypotheses.



If you have any questions, please feel free to contact me at jccutti@mail.ilstu.edu.