SPSS: Crosstabulation and chi-square test
For the following instructions:
* = A single click of the left mouse button
** = A double-click of the left mouse button
Crosstabulation
Crosstabulation is useful for showing the relationship between two or more categorical variables. Continuous data are usually not used in chi-square analyses, because categorizing a continuous variable discards a great deal of information.
Crosstabs Example
Do people with different work statuses (e.g., full-time, retired, etc.) differ in (a) how exciting life is
and (b) happiness?
This amounts to tabulating frequencies for life excitement and happiness, but it must be broken down
by work status
Crosstabs gives frequencies for one variable separately for each level of another variable
Computing Crosstabs in SPSS
Choose Statistics, Summarize, Crosstabs
Select categorical variables; put one in Row and the other in Column
Output:
Case Processing Summary shows missing values for each table
Crosstab shows frequencies of one variable for each level of the other
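To make the idea of a crosstab concrete outside SPSS, here is a minimal Python sketch that counts one variable separately for each level of another. The six responses are invented for illustration only, and the names records and table are assumptions, not anything SPSS produces.

```python
from collections import Counter

# Hypothetical (invented) responses: (work status, how exciting life is)
records = [
    ("full-time", "exciting"), ("full-time", "routine"),
    ("retired", "routine"), ("full-time", "exciting"),
    ("retired", "exciting"), ("retired", "routine"),
]

counts = Counter(records)                 # frequency of each (row, column) pair
rows = sorted({r for r, _ in records})    # levels of the row variable
cols = sorted({c for _, c in records})    # levels of the column variable

# table[row][col] = frequency of col within that level of row
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

for r in rows:
    print(r, table[r])
```

Each printed line corresponds to one row of the Crosstab table.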
Calculating Percentages
Choose Cells, Row Percentages to show percentages across each row
Choose Cells, Column Percentages to show percentages across each column
Choose Cells, Total Percentages to find the percentage of all respondents who fall in each cell
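The three kinds of percentages differ only in what each cell count is divided by. The sketch below shows this in Python with invented counts; the table observed and the names row_pct, col_pct and total_pct are illustrative assumptions.

```python
# Hypothetical (invented) counts: work status (rows) by life (columns)
observed = {
    "full-time": {"exciting": 60, "routine": 40},
    "retired": {"exciting": 30, "routine": 70},
}
cols = ["exciting", "routine"]

n = sum(sum(row.values()) for row in observed.values())
col_totals = {c: sum(observed[r][c] for r in observed) for c in cols}

# Row percentages: divide each cell by its row total
row_pct = {r: {c: 100 * observed[r][c] / sum(observed[r].values()) for c in cols}
           for r in observed}
# Column percentages: divide each cell by its column total
col_pct = {r: {c: 100 * observed[r][c] / col_totals[c] for c in cols}
           for r in observed}
# Total percentages: divide each cell by N
total_pct = {r: {c: 100 * observed[r][c] / n for c in cols} for r in observed}

for r in observed:
    print(r, "row %:", row_pct[r])
```

Row percentages sum to 100 across each row, column percentages sum to 100 down each column, and total percentages sum to 100 over the whole table.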
Expected Counts
Expected counts are based on marginal percentages
Multiply the marginal percentages together to get the expected percentage for that cell, then multiply
by N to get expected counts
Or, have SPSS compute them -- Choose Cells, Expected Counts
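The hand calculation described above can be sketched in Python as follows. The counts are invented for illustration, and the names row_prop, col_prop and expected are assumptions chosen for this example.

```python
# Hypothetical (invented) counts: work status (rows) by life (columns)
observed = {
    "full-time": {"exciting": 60, "routine": 40},
    "retired": {"exciting": 30, "routine": 70},
}
cols = ["exciting", "routine"]

n = sum(sum(row.values()) for row in observed.values())

# Marginal proportions for rows and columns
row_prop = {r: sum(observed[r].values()) / n for r in observed}
col_prop = {c: sum(observed[r][c] for r in observed) / n for c in cols}

# Expected count = row proportion * column proportion * N
expected = {
    r: {c: row_prop[r] * col_prop[c] * n for c in cols} for r in observed
}

for r in observed:
    print(r, {c: round(expected[r][c], 1) for c in cols})
```

With these invented counts, half of the respondents are in each row and 45% are in the "exciting" column, so the expected count for each "exciting" cell is .50 x .45 x 200 = 45.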
Residuals
Difference between observed and expected counts (observed minus expected)
Choose Cells, Unstandardized Residuals
Standardized Residuals can be read roughly like z-scores (each residual is divided by an estimate of its standard deviation, the square root of the expected count)
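As a rough illustration of both kinds of residual, the Python sketch below computes them for an invented 2 x 2 table. It assumes the standardized residual is (observed - expected) / sqrt(expected); the variable names are illustrative.

```python
import math

# Hypothetical (invented) counts
# rows: full-time, retired; columns: exciting, routine
observed = [[60, 40], [30, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Unstandardized residual: observed minus expected
residuals = [[o - e for o, e in zip(orow, erow)]
             for orow, erow in zip(observed, expected)]

# Standardized residual: residual divided by sqrt(expected)
std_residuals = [[(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
                 for orow, erow in zip(observed, expected)]

for res_row, std_row in zip(residuals, std_residuals):
    print(res_row, [round(z, 2) for z in std_row])
```

A standardized residual beyond about 2 in absolute value flags a cell whose observed count is notably far from what independence would predict.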
Controlling for a Third Variable
Controlling for a variable means it is held constant
This allows us to look at crosstabs separately for each value of a third variable
Example: wrkstat by life separately for men and women
In SPSS add sex as a layer in Crosstabs
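Layering simply produces one crosstab per level of the third variable. A minimal Python sketch, using six invented cases and hypothetical names (cases, layered), might look like this:

```python
from collections import Counter

# Hypothetical (invented) cases: (sex, wrkstat, life)
cases = [
    ("male", "full-time", "exciting"),
    ("male", "retired", "routine"),
    ("male", "full-time", "exciting"),
    ("female", "full-time", "routine"),
    ("female", "retired", "exciting"),
    ("female", "retired", "routine"),
]

counts = Counter(cases)
layers = sorted({sex for sex, _, _ in cases})
rows = sorted({wrk for _, wrk, _ in cases})
cols = sorted({life for _, _, life in cases})

# One wrkstat-by-life table per level of the layer variable (sex)
layered = {
    sex: {wrk: {life: counts[(sex, wrk, life)] for life in cols} for wrk in rows}
    for sex in layers
}

for sex in layers:
    print(sex)
    for wrk in rows:
        print("  ", wrk, layered[sex][wrk])
```

Within each printed layer, sex is constant, which is what "holding it constant" means here.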
Bar Charts
Simple
Can only show frequencies of one variable
Choose Graphs, Bar, Simple
Cluster and Stacked
Can show frequencies of one variable broken down by another
Percentage information can also be shown
Crosstabs
Compute percentages of happy for different values of wrkstat
Compute percentages of wrkstat for different values of life; include expected values and residuals
Compute percentages of wrkstat for different values of life, layered by gender
Compute bar charts for wrkstat by life
Chi-Square (χ²)
Chi-Square Lecture
Chi-Square Example
Research questions - Are there gender differences in happiness? How about in how important it is to
have a fulfilling job?
What would you hypothesize?
The hypothesis test for whether the pattern of percentages in one variable differs as a function of
another is called the chi-square test
Hypothesis Testing
We test the null hypothesis that nothing interesting is happening (versus alternative hypothesis that
findings are interesting)
The null hypothesis is rejected only if the probability of obtaining findings like ours by chance alone (that is, if the null hypothesis were true) is less than .05
Hypothesis tests determine the extent to which our findings may be due to chance
Computing the Pearson Chi-Square test in SPSS
Chi-Square (χ²) Tests of Independence: SPSS can compute the expected value for each cell, based on the assumption that the two variables are independent of each other. If there is a large discrepancy between the observed values and the expected values, the χ² statistic will be large, which suggests a significant difference between observed and expected values. In addition, a probability value is also computed.
- *Statistics, *Summarize, *Crosstabs
- * the desired variable in the list to the left, then * the uppermost of the right arrows to
indicate that this variable be the row variable.
- * a second variable, and * the middle right arrow (to indicate the column variable).
- For three or more variables: use the lowest box in this window. * on the third variable
under section list, and then * the lowest of the three right arrows.
- * OK when complete.
- You can now conduct a chi-square analysis. * Statistics. Here, many different tests of
independence or association are listed. * Chi-square, * Phi and Cramer's V, * Continue, * OK
- To conduct a cross tabulation and chi-square analysis on a subset of a certain variable,
select the variables for crosstabulation, choose cell values, and the desired statistics. Then, * Data (in the Menu Bar at the top of the screen). * Select Cases, *If
condition is satisfied, *If. Select desired variable from list on the left, *
right arrow to paste it in the "active" box, type in selected levels to consider.
* Continue when completed.
- Output shows Pearson chi-square and "Asymp. Sig." (significance level)
- If "Asymp. Sig." is less than .05 then the residuals differ as a function of the independent variable
The chi-square test essentially tells us whether the results of a crosstab are statistically significant
A chi-square will be significant if the residuals for one level of a variable differ as a function of
another variable
The chi-square value does not tell us the nature of the differences
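To see where a Pearson chi-square and its significance level come from, here is a hedged Python sketch for a 2 x 2 table. The counts are invented, the statistic is the sum of (observed - expected) squared over expected, and the p-value shortcut used here (math.erfc) is valid only when df = 1. Phi is computed as sqrt(chi-square / N). SPSS output for a 2 x 2 table also includes other rows, such as a continuity correction, which this sketch does not reproduce.

```python
import math

# Hypothetical (invented) counts
# rows: men, women; columns: happy, not happy
observed = [[40, 60], [60, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for orow, erow in zip(observed, expected)
    for o, e in zip(orow, erow)
)

# For a 2x2 table df = 1, and the upper-tail probability of a
# chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

phi = math.sqrt(chi2 / n)       # strength of association for a 2x2 table
significant = p_value < 0.05    # same decision rule as "Asymp. Sig." < .05

print(round(chi2, 3), round(p_value, 4), round(phi, 3), significant)
```

The chi-square and p-value say only that the pattern of percentages differs; the percentages and residuals in the crosstab are still needed to describe how.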
The Chi-Square Formula
χ² = Σ [ (fo − fe)² / fe ]
What are all those symbols?
χ² = chi-square
Σ = Sigma (sum of...)
fo = frequency observed
fe = frequency expected
Degrees of freedom are necessary to compute the significance of the chi-square: df = (#rows - 1)(#columns - 1)
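The formula and the degrees of freedom can be worked through on a table larger than 2 x 2. This is a minimal Python sketch with invented counts for three work statuses; the function name chi_square_and_df is a hypothetical helper, not an SPSS feature.

```python
def chi_square_and_df(observed):
    """Return (chi-square, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # frequency expected
            chi2 += (fo - fe) ** 2 / fe              # one term of the sum

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical (invented) counts
# rows: full-time, part-time, retired; columns: exciting, routine
table = [[60, 40], [50, 50], [30, 70]]
chi2, df = chi_square_and_df(table)
print(round(chi2, 2), df)
```

With three rows and two columns, df = (3 - 1)(2 - 1) = 2.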
Assumptions of the Chi-Square
Categories are mutually exclusive (no overlap), so each case is counted in exactly one cell
Must have an expected count of at least 5 in each cell
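One way to check the expected-count assumption by hand is to list every cell whose expected count falls below 5. The Python sketch below does this for an invented table; low_expected_cells is a hypothetical helper name.

```python
def low_expected_cells(observed, minimum=5):
    """Return (row index, column index, expected count) for cells below minimum."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    flagged = []
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            e = rt * ct / n          # expected count under independence
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Hypothetical (invented) counts with a sparse first column
small = [[3, 7], [2, 28]]
flagged = low_expected_cells(small)
for i, j, e in flagged:
    print("row", i, "column", j, "expected", e)
```

An empty result means every cell meets the minimum; otherwise, combining sparse categories is a common remedy.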
Remember that, for the same pattern of percentages, larger samples produce larger chi-squares, thus making it easier to find a significant chi-square (this is related to statistical power)
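The effect of sample size can be demonstrated by doubling every count in a table, which leaves all the percentages unchanged. This Python sketch uses invented counts and compares each chi-square with 3.841, the .05 critical value for df = 1; pearson_chi2 is a hypothetical helper name.

```python
def pearson_chi2(observed):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - rt * ct / n) ** 2 / (rt * ct / n)
        for row, rt in zip(observed, row_totals)
        for o, ct in zip(row, col_totals)
    )


CRITICAL_DF1 = 3.841  # chi-square critical value for df = 1 at the .05 level

# Hypothetical (invented) counts: same percentages, different sample sizes
small = [[45, 55], [55, 45]]                    # N = 200
large = [[2 * o for o in row] for row in small]  # N = 400

chi2_small = pearson_chi2(small)
chi2_large = pearson_chi2(large)

print(chi2_small, chi2_small > CRITICAL_DF1)
print(chi2_large, chi2_large > CRITICAL_DF1)
```

Doubling the sample doubles the chi-square here, so the same 45% versus 55% split is not significant at N = 200 but is significant at N = 400.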