SPSS : Regression Procedures
For the following instructions:
* = A single click of the left mouse button
**= A double-click of the left mouse button
SPSS allows you to perform both simple and multiple regression.
The output produced by the Regression command includes four different values:
- A score which measures the strength of the relationship between the DV and the IV. This
is designated with a capital R (the same as the bivariate correlation "r").
- A probability value (p) associated with R which indicates the significance of that association.
- R square, which is the proportion of variance in one variable accounted for by the other
variable.
- The constant and the coefficient (called B-values) for the regression equation.
To perform simple linear and curvilinear regression:
- *Analyze, *Regression, *Linear
- A new dialog box opens which allows you to conduct regression analysis. Here, enter the independent and dependent variables you wish to use: highlight a variable in the left-hand column, then * on the right arrow next to the Dependent or Independent(s) box.
- * OK when completed.
Multiple Regression Analysis:
SPSS can also perform multiple regression analysis, which shows the influence of two or more variables on a designated dependent variable. In multiple regression analysis you may use any number of variables as predictors. However, more variables are not necessarily better. Instead, you want to find variables which significantly influence the dependent variable. SPSS has procedures in which only significant predictors are entered into the regression equation. The Regression procedure will cease to add new variables when the p value associated with the inclusion of an additional variable rises above the .05 significance level. (You may also designate another level of significance as a criterion for entry into the equation.)
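As a rough sketch of the "enter only significant predictors" idea, the snippet below (using entirely hypothetical data and variable names) ranks candidate predictors by the strength of their simple correlation with the dependent variable. This is only the intuition behind the first entry step; SPSS actually uses the p value to enter, which is not computed here.

```python
# Simplified sketch of the idea behind entering predictors one at a time.
# Data and variable names ("support", "stress", "shoesize") are hypothetical.
def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

dv = [2, 4, 5, 4, 6, 7, 8, 9]                 # hypothetical dependent variable
candidates = {
    "support":  [1, 2, 2, 3, 4, 5, 5, 6],    # hypothetical predictors
    "stress":   [9, 7, 8, 6, 5, 4, 4, 2],
    "shoesize": [7, 9, 8, 8, 7, 9, 8, 7],
}

# Rank candidates by |r| with the DV; the strongest would be entered first.
ranked = sorted(candidates, key=lambda v: abs(pearson_r(candidates[v], dv)),
                reverse=True)
print("entry order by |r|:", ranked)
```

With this made-up data, "support" correlates most strongly with the DV and would enter first, while a near-zero predictor like "shoesize" would never meet an entry criterion.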
Also notice the menu box labeled Method. This offers five different methods of entering variables into the regression equation. * on the down arrow to make them appear.
- Enter: This is the forced entry option. SPSS will enter at one time all specified variables
regardless of significance levels.
- Forward: This method will enter variables one at a time, based on the significance value to
enter.
- Backward: This enters all independent variables at one time and then removes variables
one at a time based on a preset significance value to remove.
- Stepwise: This combines both forward and backward procedures. Since intercorrelations
are complex, the variance due to certain variables will change when new variables
are entered into the equation. This is the most frequently used of the regression
methods.
- Remove: This is the forced removal option. It requires an initial regression analysis using the Enter procedure. In the next block (Block 1 of 1) you may specify one or more variables to remove. SPSS will then remove the specified variables and run the analysis again.
By * on Statistics, two options appear. Estimates will produce the B values, associated standard errors, t values, and significance values. Model fit will produce the Multiple R, R², an ANOVA table, and associated F ratios and significance values.
Does self-esteem increase as one's social support increases?
Do psychiatric symptoms decrease as one's social support increases?
Correlation and regression are appropriate tests of the relationship between two continuous variables
Basics
Correlation describes the relationship between two continuous variables
Correlation and regression test the null hypothesis that the two variables are independent of one another
Regression allows one to predict scores on one variable given a score on another
Let's Start With Scatterplots
A visual description of the relationship between two continuous variables
In SPSS choose Graphs, Scatter, Simple, Define
Select X-axis and Y-axis variables as independent and dependent
Scatterplots do not always clearly show a relationship
The correlation coefficient ("r") gives two pieces of information:
strength of relationship, measured by absolute value
direction of relationship, indicated by a positive sign or negative sign
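The two pieces of information above can be seen in a small hand computation. This is a sketch on made-up scores, just to show where SPSS's r comes from:

```python
# Pearson r computed by hand on hypothetical data.
def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

support = [1, 2, 3, 4, 5]   # hypothetical social support scores
esteem  = [2, 4, 5, 4, 6]   # hypothetical self-esteem scores

r = pearson_r(support, esteem)
print(round(r, 3))          # 0.853: |r| gives strength, the + sign gives direction
```

Here the positive sign tells you esteem rises with support, and the absolute value (about .85) tells you the relationship is strong.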
Correlations in SPSS
Choose Analyze, Correlate, Bivariate
Enter the two variables in the "Variables" box
Select "OK"
Is correlation significant?
If so, is it positive or negative?
Correlation and Causality
Correlation does not imply causality
Conditions for causality
Temporal sequence
There is a relationship (correlation) between two variables
Relationship cannot be explained by a third variable
Linear Regression
Used to predict one's score on a variable given a score on a second variable
The mean is the best predictor of a variable (let's call it "Y") in the absence of any other information
With information about a related independent variable, prediction can be improved
The Regression Line
A regression line provides more precise predictions than simply predicting the mean for each observation
Regression line is Y = a + bX
"a" is the intercept (value of Y when X=0)
"b" is the slope (change in Y per unit increase in X)
Using the Regression Line
Use number of close friends to predict number of health center visits per year with this regression line: Y = 5 + (-1)X
Use number of close friends to predict number of late-night phone calls per week with this regression line: Y = 1 + (0.5)X
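Plugging a score into each line gives the prediction. For example:

```python
# Predictions from the two regression lines above.
def visits(friends):
    return 5 + (-1) * friends     # Y = 5 + (-1)X: more friends, fewer visits

def calls(friends):
    return 1 + 0.5 * friends      # Y = 1 + (0.5)X: more friends, more calls

print(visits(2))   # 2 close friends -> 3 predicted health center visits
print(calls(4))    # 4 close friends -> 3.0 predicted late-night calls
```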
Explaining Variance
"Explaining variance" is the extent to which an independent variable accounts for variation in the dependent variable
Percentage of variance explained is found by squaring the correlation (r²)
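For example, using the height/weight correlation of .794 that appears later in this lab:

```python
# Variance explained is the square of the correlation.
r = 0.794                    # height/weight correlation from the lab
r_squared = r ** 2
print(round(r_squared, 3))   # about .63 of the variance is explained
```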
Regression in SPSS
Choose Analyze, Regression, Linear
Enter dependent variable in the "Dependent" box
Enter independent variable in the "Independent" box
Select "OK"
Interpreting Output
Model Summary -- shows r and r²
ANOVA table -- shows significance of r and r² as an F statistic
Coefficients -- shows the regression line
Under B column is intercept and slope
Beta column shows standardized slope (the correlation)
Significance of slope is indicated by t
Assumptions and Cautions
We assume the relationship between X and Y is linear
We assume the variance of Y is equal at all values of X (homoscedasticity)
Do not assume causality
Using SPSS to compute the correlation coefficient
Under the Analyze menu you will find the Correlate submenu.
From the Correlate submenu you want to select "bivariate"
In the bivariate correlation window, select the variables that you want correlated (you can have more than two at a time). For today's lab, make sure that Pearson is selected (the others are other kinds of correlations).
The output that you get is a correlation matrix. It correlates each variable against each variable (including itself). You should notice that the table has redundant information in it (e.g., you'll find an r for height correlated with weight, and an r for weight correlated with height. These two values are identical.)
In SPSS you'll also get some additional information in the correlation matrix.
For now you can ignore the "Sig. 2-tailed" stuff. N is simply the number of cases in your data set.
So in the correlation matrix above, height and weight have an r = .794. This is a fairly strong positive correlation.
Getting SPSS to put a least squares regression line on our scatterplot
Okay, so now we know how regression works and (if we must) we can do it by hand. Now let's see how to do regression in SPSS. We'll start with how to get SPSS to put a least squares regression line on our scatterplot and then we'll discuss how to get the regression equation.
- The first step is to create the scatterplot (remember, height should be your response variable Y).
- After the scatterplot is created, we can fit a least squares regression line on the plot by using the Chart Editor. Recall that to open the chart editor you need to double click on the graph of interest. This will open up a new window, the chart editor.
- Now you need to go into the "Chart" menu and select "Options".
- This will open up the options window. In this window you should click on "Total" in the fit line box (upper right corner). Then click 'OK'. That's it, your scatterplot should now have a line on it.
Getting SPSS to compute the least squares regression equation
So for this relationship the linear equation is:
Y = 1.2X - 12.9
Some facts about using least squares regression
- As we already mentioned, unlike correlation, in regression the distinction between explanatory and response variables is very important. If you look back at the doing-regression-by-hand part of the lab you'll notice that we are only looking at the deviations from the line for the Y variable (in the Y direction). That is because we are trying to use X to predict Y, or to explain the variability in Y.
- There is a close connection between correlation and the slope of the least-squares line. This was also discussed above.
- The least-squares line always passes through the point (x̄, ȳ), the means of x and y.
- The correlation r describes the strength of a straight-line relationship. In the regression setting, this description takes a specific form: the square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. We'll discuss more about this point in the next lab.
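Two of these facts are easy to check numerically. The sketch below (hypothetical data) verifies that the fitted line passes through the point of means, and that the fraction of variation explained by the regression equals r²:

```python
# Checking two least-squares facts on hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Fact 1: the line passes through (x-bar, y-bar).
assert abs((a + b * mx) - my) < 1e-9

# Fact 2: explained variation / total variation is r squared.
pred = [a + b * xi for xi in x]
ss_total = sum((yi - my) ** 2 for yi in y)
ss_model = sum((p - my) ** 2 for p in pred)
print(round(ss_model / ss_total, 3))   # 0.727, i.e. r squared for this data
```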
r² in SPSS.
If you look at the output of the regression analysis you'll find r² in the "Model Summary" box (don't worry about the "Adjusted R Square").
Residuals and residual plots
The predicted value is not perfect (unless r = ±1.0). Notice that it may be that none of the observed data points fall exactly on the line.
In other words, there is some error. We refer to the error between each observed point and the predicted point as the residual:
residual = observed y - predicted y = Y - Ŷ
The sum of the residuals should always equal 0 (as should their mean). This is because the least squares regression line balances the error: the error above the line equals the error below the line.
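You can see this balancing act directly. A sketch on hypothetical data, computing each residual as Y minus the predicted Y:

```python
# Least-squares residuals sum to (essentially) zero; hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(sum(residuals), 10))   # ~0: positive and negative errors balance
```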
However, in addition to summing to zero, we also want the residuals to be randomly distributed. That is, there should be no pattern to the residuals. If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables. To examine the residuals we can graph them in a residual plot - a scatterplot of the regression residuals against the explanatory variable.
This is the pattern that we want to see if there is a simple linear relationship between X and Y.
This pattern suggests that a non-linear (probably a curved relationship) might be a better description of the relationship.
A pattern like this suggests that as X increases, so does the variability of the residuals. This is referred to as a violation of homogeneity of variance.
Getting residual plots in SPSS
This is done when SPSS performs the regression analysis. At the bottom of the regression window there is a button labeled "save".
When you click the save button, this window opens. Click the save residuals box in the upper right corner.
Below, I've provided a link to a very nice tool for getting a feel for regression (and residuals). On this page, you can place points on a scatterplot. The page will automatically compute the least squares regression line corresponding to the points (you can have the line drawn on the plot, too). Additionally, you can have it open up another window on which it will display a residual plot. I strongly suggest that you play with this.
It takes a little time to load the page, so be patient.