SPSS : Regression Procedures
For the following instructions:
* = A single click of the left mouse button
**= A double-click of the left mouse button
SPSS allows you to perform both simple and multiple regression.
The output produced by the Regression command includes four different values:
- A score which measures the strength of the relationship between the DV and the IV. This
is designated with a capital R (the same as the bivariate correlation "r").
- A probability value (p) associated with R which indicates the significance of that association.
- R square, which is the proportion of variance in one variable accounted for by the other
variable.
- The constant and the coefficient (called B-values) for the regression equation.
To perform simple linear and curvilinear regression:
- *Analyze, *Regression, *Linear
- A new dialog box opens which allows you to conduct regression analysis. Here, enter the independent and dependent variables you wish to use: highlight a variable in the left-hand column, then * on the right arrow next to the Dependent or Independent(s) box.
- * OK when completed.
Multiple Regression Analysis:
SPSS can also perform multiple regression analysis, which shows the influence of two or more variables on a designated dependent variable. In multiple regression analysis you may use any number of variables as predictors. However, more variables are not necessarily better. Instead, you want to find variables which significantly influence the dependent variable. SPSS has procedures in which only significant predictors are entered into the regression equation. The Regression procedure will cease to add new variables when the p value associated with the inclusion of an additional variable rises above the .05 significance level. (You may also designate another level of significance as a criterion for entry into the equation.)
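As a rough sketch of the "enter only significant predictors" idea, the snippet below (using entirely hypothetical data and variable names) ranks candidate predictors by the strength of their simple correlation with the dependent variable. This is only the intuition behind the first entry step; SPSS actually uses the p value to enter, which is not computed here.

```python
# Simplified sketch of the idea behind entering predictors one at a time.
# Data and variable names ("support", "stress", "shoesize") are hypothetical.
def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

dv = [2, 4, 5, 4, 6, 7, 8, 9]                 # hypothetical dependent variable
candidates = {
    "support":  [1, 2, 2, 3, 4, 5, 5, 6],    # hypothetical predictors
    "stress":   [9, 7, 8, 6, 5, 4, 4, 2],
    "shoesize": [7, 9, 8, 8, 7, 9, 8, 7],
}

# Rank candidates by |r| with the DV; the strongest would be entered first.
ranked = sorted(candidates, key=lambda v: abs(pearson_r(candidates[v], dv)),
                reverse=True)
print("entry order by |r|:", ranked)
```

With this made-up data, "support" correlates most strongly with the DV and would enter first, while a near-zero predictor like "shoesize" would never meet an entry criterion.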
Also notice the menu box labeled Method. This offers five different methods of entering variables into the regression equation. * on the down arrow to make them appear.
- Enter: This is the forced entry option. SPSS will enter at one time all specified variables
regardless of significance levels.
- Forward: This method will enter variables one at a time, based on the significance value to
enter.
- Backward: This enters all independent variables at one time and then removes variables
one at a time based on a preset significance value to remove.
- Stepwise: This combines both forward and backward procedures. Since intercorrelations
are complex, the variance due to certain variables will change when new variables
are entered into the equation. This is the most frequently used of the regression
methods.
- Remove: This is the forced removal option. It requires an initial regression analysis using the Enter procedure. In the next block (Block 1 of 1) you may specify one or more variables to remove. SPSS will then remove the specified variables and run the analysis again.
By * on Statistics, two options appear. Estimates will produce the B values, associated standard errors, t values, and significance values. Model fit will produce the Multiple R, R², an ANOVA table, and associated F ratios and significance values.
Does self-esteem increase as one's social support increases?
Do psychiatric symptoms decrease as one's social support increases?
Correlation and regression are appropriate tests of the relationship between two continuous variables
Basics
Correlation describes the relationship between two continuous variables
Correlation and regression test the null hypothesis that the two variables are independent of one another
Regression allows one to predict scores on one variable given a score on another
Let's Start With Scatterplots
A visual description of the relationship between two continuous variables
In SPSS choose Graphs, Scatter, Simple, Define
Select X-axis and Y-axis variables as independent and dependent
Scatterplots do not always clearly show a relationship
The correlation coefficient ("r") gives two pieces of information:
strength of relationship, measured by absolute value
direction of relationship, indicated by a positive sign or negative sign
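The two pieces of information above can be seen in a small hand computation. This is a sketch on made-up scores, just to show where SPSS's r comes from:

```python
# Pearson r computed by hand on hypothetical data.
def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5

support = [1, 2, 3, 4, 5]   # hypothetical social support scores
esteem  = [2, 4, 5, 4, 6]   # hypothetical self-esteem scores

r = pearson_r(support, esteem)
print(round(r, 3))          # 0.853: |r| gives strength, the + sign gives direction
```

Here the positive sign tells you esteem rises with support, and the absolute value (about .85) tells you the relationship is strong.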
Correlations in SPSS
Choose Analyze, Correlate, Bivariate
Enter the two variables in the "Variables" box
Select "OK"
Is correlation significant?
If so, is it positive or negative?
Correlation and Causality
Correlation does not imply causality
Conditions for causality
Temporal sequence
There is a relationship (correlation) between two variables
Relationship cannot be explained by a third variable
Linear Regression
Used to predict one's score on a variable given a score on a second variable
The mean is the best predictor of a variable (let's call it "Y") in the absence of any other information
With information about a related independent variable, prediction can be improved
The Regression Line
A regression line provides more precise predictions than simply predicting the mean for each observation
Regression line is Y = a + bX
"a" is the intercept (value of Y when X=0)
"b" is the slope (change in Y per unit increase in X)
Using the Regression Line
Use number of close friends to predict number of health center visits per year with this regression line: Y = 5 + (-1)X
Use number of close friends to predict number of late-night phone calls per week with this regression line: Y = 1 + (0.5)X
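Plugging a score into each line gives the prediction. For example:

```python
# Predictions from the two regression lines above.
def visits(friends):
    return 5 + (-1) * friends     # Y = 5 + (-1)X: more friends, fewer visits

def calls(friends):
    return 1 + 0.5 * friends      # Y = 1 + (0.5)X: more friends, more calls

print(visits(2))   # 2 close friends -> 3 predicted health center visits
print(calls(4))    # 4 close friends -> 3.0 predicted late-night calls
```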
Explaining Variance
"Explaining variance" is the extent to which an independent variable accounts for variation in the dependent variable
Percentage of variance explained is found by squaring the correlation (r²)
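For example, using the height/weight correlation of .794 that appears later in this lab:

```python
# Variance explained is the square of the correlation.
r = 0.794                    # height/weight correlation from the lab
r_squared = r ** 2
print(round(r_squared, 3))   # about .63 of the variance is explained
```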
Regression in SPSS
Choose Analyze, Regression, Linear
Enter dependent variable in the "Dependent" box
Enter independent variable in the "Independent" box
Select "OK"
Interpreting Output
Model Summary -- shows r and r²
ANOVA table -- shows significance of r and r² as an F statistic
Coefficients -- shows the regression line
Under B column is intercept and slope
Beta column shows standardized slope (the correlation)
Significance of slope is indicated by t
Assumptions and Cautions
We assume the relationship between X and Y is linear
We assume the variance of Y is equal at all values of X (homoscedasticity)
Do not assume causality
Using SPSS to compute the correlation coefficient
Under the Analyze menu you will find the Correlate submenu.
From the Correlate submenu you want to select "bivariate"
In the bivariate correlation window, select the variables that you want correlated (you can have more than two at a time). For today's lab, make sure that Pearson is selected (the others are other kinds of correlations).
The output that you get is a correlation matrix. It correlates each variable against each variable (including itself). You should notice that the table has redundant information in it (e.g., you'll find an r for height correlated with weight, and an r for weight correlated with height. These two values are identical.)
In SPSS you'll also get some additional information in the correlation matrix.
For now you can ignore the "Sig. 2-tailed" stuff. N is simply the number of cases in your data set.
So in the correlation matrix above, height and weight have an r = .794. This is a fairly strong positive correlation.
Getting SPSS to put a least squares regression line on our scatterplot
Okay, so now we know how regression works and (if we must) we can do it by hand. Now let's see how to do regression in SPSS. We'll start with how to get SPSS to put a least squares regression line on our scatterplot and then we'll discuss how to get the regression equation.
- The first step is to create the scatterplot (remember, height should be your response variable Y).
- After the scatterplot is created, we can fit a least squares regression line on the plot by using the Chart Editor. Recall that to open the chart editor you need to double click on the graph of interest. This will open up a new window, the chart editor.
- Now you need to go into the "Chart" menu and select "Options".
- This will open up the options window. In this window you should click on "Total" in the fit line box (upper right corner). Then click 'OK'. That's it, your scatterplot should now have a line on it.
Getting SPSS to compute the least squares regression equation
So for this relationship the linear equation is:
Y = 1.2X - 12.9
Some facts about using least squares regression
- As we already mentioned, unlike correlation, in regression the distinction between explanatory and response variables is very important. If you look back at the doing-regression-by-hand part of the lab you'll notice that we are only looking at the deviations from the line for the Y variable (in the Y direction). That is because we are trying to use X to predict Y, or to explain the variability in Y.
- There is a close connection between correlation and the slope of the least-squares line. This was also discussed above.
- The least-squares line always passes through the point (x̄, ȳ), the means of x and y.
- The correlation r describes the strength of a straight-line relationship. In the regression setting, this description takes a specific form: the square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. We'll discuss more about this point in the next lab.
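Two of these facts are easy to check numerically. The sketch below (hypothetical data) verifies that the fitted line passes through the point of means, and that the fraction of variation explained by the regression equals r²:

```python
# Checking two least-squares facts on hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# Fact 1: the line passes through (x-bar, y-bar).
assert abs((a + b * mx) - my) < 1e-9

# Fact 2: explained variation / total variation is r squared.
pred = [a + b * xi for xi in x]
ss_total = sum((yi - my) ** 2 for yi in y)
ss_model = sum((p - my) ** 2 for p in pred)
print(round(ss_model / ss_total, 3))   # 0.727, i.e. r squared for this data
```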
r² in SPSS.
If you look at the output of the regression analysis you'll find r² in the "Model Summary" box (don't worry about the "Adjusted R Square").
Residuals and residual plots
The predicted value is not perfect (unless r = ±1.0). Notice that it may be that none of the observed data points fall exactly on the line.
In other words, there is some error. We refer to the error between each observed point and the predicted point as the residual:
residual = observed y - predicted y = Y - Ŷ
The sum of the residuals should always equal 0 (as should their mean). This is because the least squares regression line balances the error: the error above the line equals the error below the line.
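You can see this balancing act directly. A sketch on hypothetical data, computing each residual as Y minus the predicted Y:

```python
# Least-squares residuals sum to (essentially) zero; hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(sum(residuals), 10))   # ~0: positive and negative errors balance
```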
However, in addition to summing to zero, we also want the residuals to be randomly distributed. That is, there should be no pattern to the residuals. If there is a pattern, it may suggest that there is more than a simple linear relationship between the two variables. To examine the residuals we can graph them in a residual plot - a scatterplot of the regression residuals against the explanatory variable.
This is the pattern that we want to see if there is a simple linear relationship between X and Y.
This pattern suggests that a non-linear (probably a curved relationship) might be a better description of the relationship.
A pattern like this suggests that as X increases, so does the variability of the residuals. This is referred to as a violation of homogeneity of variance.
Getting residual plots in SPSS
This is done when SPSS performs the regression analysis. At the bottom of the regression window there is a button labeled "save".
When you click the save button, this window opens. Click the save residuals box in the upper right corner.
Below, I've provided a link to a very nice tool for getting a feel for regression (and residuals). On this page, you can place points on a scatterplot. The page will automatically compute the least squares regression line corresponding to the points (you can have the line drawn on the plot, too). Additionally, you can have it open up another window on which it will display a residual plot. I strongly suggest that you play with this.
It takes a little time to load the page, so be patient.