Psychology 340 Syllabus
Statistics for the Social Sciences

Illinois State University
J. Cooper Cutting
Fall 2002



Correlation

  • What is correlation?
  • How do we compute correlation?
  • How do we interpret correlation?
  • Testing hypotheses with correlation.
  • Correlation in SPSS


    Consider the following example:

    Suppose that you want to know if there really is a relationship between amount of time studying and test performance. So you get 6 of your fellow students to volunteer to report how much time they spent studying for the exam (in hours) and their score on the exam (on a scale of 0 to 6, with 6 being the maximum score). The data are presented below in two ways. The table shows each person's exam score and the number of hours that they studied. The graph (a scatterplot) shows the same information in a different way. Each point corresponds to a person; the location of the point is determined by that person's values on the two variables (study time [X-axis] and exam score [Y-axis]).

    Data Set

    Person    Hrs (X)    Exam score (Y)
      A          1             1
      B          1             3
      C          3             2
      D          4             5
      E          6             4
      F          7             5

    Scatterplot: hours studied (X-axis) against exam score (Y-axis), one point per person.

    Correlation is a statistical technique that measures and describes the relationship between two variables. (Notice that this means that there must be at least two scores from each individual, one for each of the two variables.)

    Why (and When) do we use correlations?


    Computing Pearson's correlation coefficient

    Okay, so how do we quantify the idea of correlation? There are a number of different correlations; we will focus on the most common measure, the Pearson product-moment correlation.

    r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
      = (covariability of X and Y) / (variability of X and Y separately)
    

    Now let's consider how we actually compute r.

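    r = SP / sqrt(SSX × SSY), where SP is the sum of the products of deviations and SSX and SSY are the sums of squared deviations for X and Y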

    note: your book uses what looks like a very different formula.


    However, if you compare that formula with the one I'll use, you'll see that they really are the same thing (the book's formula removes 1/(n-1) from the summation, which turns our SSX and SSY into standard deviations, sX and sY).
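    To check that the two versions agree, here is a minimal Python sketch. It takes the formula used in these notes to be r = SP / sqrt(SSX × SSY) and assumes the book's version divides the covariance, SP / (n - 1), by the product of the sample standard deviations. The function names are illustrative only, and the data are the study-time example from the top of this page.

```python
import math
import statistics

# Study-time example: hours studied (X) and exam score (Y) for persons A-F.
hours = [1, 1, 3, 4, 6, 7]
scores = [1, 3, 2, 5, 4, 5]


def r_from_sums(x, y):
    """r = SP / sqrt(SSX * SSY), using sums of squares and products."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def r_from_sds(x, y):
    """r = [SP / (n - 1)] / (sX * sY), using sample standard deviations."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    covariance = sp / (n - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))


r1 = r_from_sums(hours, scores)
r2 = r_from_sds(hours, scores)
print(round(r1, 3), round(r2, 3))  # both versions give the same r
```

    The (n - 1) terms cancel between the numerator and the denominator, which is why the two versions return the same value.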

    
             X      Y     X - mean X   Y - mean Y   (devX)(devY)
             0      1         -6           -1             6
            10      3         +4           +1             4
             4      1         -2           -1             2
             8      2         +2            0             0
             8      3         +2           +1             2
    sum     30     10                                    14
    mean   6.0    2.0
    
    

    So: SP = 14
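    The table can be reproduced step by step. This is a small Python sketch of the same arithmetic; the variable names are just for illustration.

```python
# The five (X, Y) pairs from the table above.
x = [0, 10, 4, 8, 8]
y = [1, 3, 1, 2, 3]

mean_x = sum(x) / len(x)  # 30 / 5 = 6.0
mean_y = sum(y) / len(y)  # 10 / 5 = 2.0

# Deviation of each score from its mean, then the product for each pair.
dev_x = [a - mean_x for a in x]
dev_y = [b - mean_y for b in y]
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

# SP is the sum of the products of deviations.
sp = sum(products)

for row in zip(x, y, dev_x, dev_y, products):
    print(*row, sep="\t")
print("SP =", sp)  # SP = 14.0
```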

    Okay, now let's compute the Pearson correlation (r).

    
    r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
      = (covariability of X and Y) / (variability of X and Y separately)
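    Continuing with the five pairs from the table, here is a minimal Python sketch of this step. It assumes r = SP / sqrt(SSX × SSY), where SSX and SSY are the sums of squared deviations for X and Y.

```python
import math

# The five (X, Y) pairs from the table, and SP as computed there.
x = [0, 10, 4, 8, 8]
y = [1, 3, 1, 2, 3]
sp = 14  # sum of the products of deviations

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Variability of each variable on its own: sums of squared deviations.
ss_x = sum((a - mean_x) ** 2 for a in x)  # 36 + 16 + 4 + 4 + 4 = 64
ss_y = sum((b - mean_y) ** 2 for b in y)  # 1 + 1 + 1 + 0 + 1 = 4

r = sp / math.sqrt(ss_x * ss_y)  # 14 / sqrt(256) = 14 / 16
print("SSX =", ss_x, "SSY =", ss_y, "r =", r)  # r = 0.875
```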
    


    Now that we know how to compute a correlation, we need to consider how we interpret it. We already know the basics:

    But there are some additional things that we need to consider.

    Let's look at each point in a little more depth


    Hypothesis Testing with Pearson r

    Okay, what about hypothesis testing? Can we test hypotheses with correlations?
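    One common way to test H0 (no correlation in the population) is to convert r into a t statistic with df = n - 2. The Python sketch below assumes that approach, which may look different from the table of critical r values in your book. The function name is illustrative, and the critical value 3.182 is the two-tailed t value for df = 3 at the .05 level from a standard t table. It uses the five-pair example, where SP = 14, SSX = 64, and SSY = 4.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson r from n pairs into a t statistic with df = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Five-pair example: SP = 14, SSX = 64, SSY = 4 (sums of squared deviations).
n = 5
r = 14 / math.sqrt(64 * 4)  # 0.875
t = t_from_r(r, n)
df = n - 2

# Two-tailed critical t for df = 3 at the .05 level, taken from a t table.
t_critical = 3.182

reject_h0 = abs(t) > t_critical
print(f"t({df}) = {t:.3f}, reject H0: {reject_h0}")
```

    With only five pairs, even a correlation this large falls just short of the critical value, so H0 is not rejected at the .05 level. Sample size matters a great deal when testing r.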


    Using SPSS for Hypothesis Testing with Pearson r

    We can also use SPSS to conduct a hypothesis test with Pearson r. We calculate the Pearson r with SPSS and then look at the output to make our decision about H0. The output gives us a p value for our Pearson r (listed under Sig. in the output). We compare this p value with α: if p is less than α, the result is in the critical region.

    Under the Analyze menu you will find the Correlate submenu.

    From the Correlate submenu, select "Bivariate".

    In the bivariate correlation window, select the variables that you want correlated (you can have more than two at a time). For today's lab, make sure that Pearson is selected (the others are other kinds of correlations).

    The output that you get is a correlation matrix. It correlates each variable against each variable (including itself). You should notice that the table has redundant information in it (e.g., you'll find an r for height correlated with weight, and an r for weight correlated with height; these two values are identical).
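    The same redundancy shows up if you build a small correlation matrix by hand. This Python sketch is only an illustration, not SPSS output: it uses the study-time data from the top of this page rather than the height and weight data, and the helper name pearson_r is made up for the example.

```python
import math

# Study-time example: one list of scores per variable.
data = {
    "hours": [1, 1, 3, 4, 6, 7],
    "score": [1, 3, 2, 5, 4, 5],
}


def pearson_r(x, y):
    """r = SP / sqrt(SSX * SSY) for two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


# Correlate every variable with every variable, including itself.
names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

print("\t" + "\t".join(names))
for a in names:
    print(a + "\t" + "\t".join(f"{matrix[a][b]:.3f}" for b in names))
```

    Each variable correlates perfectly with itself (the diagonal), and the two off-diagonal cells hold the same r.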

    In SPSS you'll also get some additional information in the correlation matrix. This is the information we are now interested in. Look where it says "Sig. 2-tailed". This is where we'll find the p value we're looking for to compare with α. In this case, the given p is .000 (meaning p < .001). If this value is lower than α (which it should be), we can reject H0. N is simply the number of paired scores that were in the comparison.

    So in the correlation matrix above, height and weight have an r = .794. This is a fairly strong positive correlation.


    Now try a few of these types of problems on your own. Show all four steps of hypothesis testing in your answer (some questions will require more for each step than others) and be sure to state hypotheses in terms of r.

    (1) A high school counselor would like to know if there is a relationship between mathematical skill and verbal skill. A sample of n = 25 students is selected, and the counselor records achievement test scores in mathematics and English for each student. The Pearson correlation for this sample is r = +0.50. Do these data provide sufficient evidence for a real relationship in the population? Test at the α = .05 level, two tails.

    (2) It is well known that similarity in attitudes, beliefs, and interests plays an important role in interpersonal attraction. Thus, correlations for attitudes between married couples should be strong and positive. Suppose a researcher developed a questionnaire that measures how liberal or conservative one's attitudes are. Low scores indicate that the person has liberal attitudes, while high scores indicate conservatism. Here are the data from the study:

    Test the researcher's hypothesis with α set at .05.

    (3) A researcher believes that a person's belief in supernatural events (e.g., ghosts, ESP, etc.) is related to their education level. For a sample of n = 30 people, he gives them a questionnaire that measures their belief in supernatural events (where a high score means they believe in more of these events) and asks them how many years of schooling they've had. He finds that SSbeliefs = 10, SSschooling = 10, and SP = -8. With α = .01, test the researcher's hypothesis.

    (4) To measure the relationship between anxiety and test performance, a researcher asked his students to come to the lab 15 minutes before they were to take an exam in his class. The researcher measured the students' heart rates and then matched these scores with their exam performance after they had taken the exam. Use the data below and SPSS to conduct a hypothesis test for the correlation between anxiety and test performance in the population. Use α = .05.



    If you have any questions, please feel free to contact me at jccutti@mail.ilstu.edu.