t-tests and z-scores are great, but they are limited: they can't be used to compare more than 2 groups. What we need instead is a new inferential statistical procedure: Analysis of Variance (ANOVA).
Why do we have to use variance?
How would we compute a single score that would describe the difference between these distributions? A simple difference just doesn't cut it, but variance does (recall that it is a measure of how much these three distributions differ from one another).
Now let's consider the sources of variance (that is, what causes the variability that we observe in our dependent variable).
So when is an ANOVA the appropriate analysis? Check the decision tree.
Find the string of decisions that leads to a 1-way between-groups Analysis of Variance.
We'll start by talking about the overall logic of ANOVA and discuss some new notation. Then we'll work through the process and some examples. We'll end the chapter with a discussion of post-hoc tests (don't worry about what these are yet).
We'll be using the same hypothesis testing logic that we used in previous chapters, but the details will change (as they did from chapter to chapter).
Okay, let's start by considering a new example that is a single-factor, independent-measures research design (more complex designs will follow in future weeks).
Example research project
Step 1: Setting your alpha level works the same way as it has in the past. Because we're dealing with variances, the F-distribution only has a positive tail that we worry about, so consider all tests to be 1-tailed.
The big difference with ANOVA is that we've got more than one possible alternative hypothesis. However, the basic ANOVA is still only testing the null hypothesis. Selecting between different alternative hypotheses requires further tests (we'll discuss these a couple of lectures from now).
The basic alternative hypothesis is that "not all groups are the same." However, there are a number of ways that this could be satisfied.
Step 2: Compute our degrees of freedom
There are a number of different degrees of freedom that we need to compute, one for the between group variance and one for the within groups variance (and a third for total df).
$df_{within}$ = total number of scores - number of groups
$df_{between}$ = number of groups - 1
Step 3: The F-ratio is conceptually similar to the tests that we've already examined, but the number of computations that we have to do increases (note: they don't get harder, just more numerous).
Okay, here is where things will start to get complex. For now, we'll stick with talking about things at the conceptual level (no computations yet). In many ways the ANOVA is very similar to a two independent samples t-test (in fact, we'll later talk about how they are nearly identical under certain circumstances).
$$t_{obs} = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance}}$$
For ANOVA the test statistic (called the F-ratio) is similar except that we use variance rather than just difference.
$$F = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$
Think back to the sources of variance.
So the F-ratio can be expressed as:
$$\text{F-ratio} = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$

$$\text{F-ratio} = \frac{\text{variance between treatments}}{\text{variance within treatments}}$$

$$\text{F-ratio} = \frac{\text{treatment effect} + \text{individual differences} + \text{random error}}{\text{individual differences} + \text{random error}}$$
The experimental situations that we're dealing with are more complex than those we've dealt with so far. We aren't going to need to use any new kinds of math (still just adding, subtracting, multiplying, and dividing), but we do need some new notation. The computations that we'll go through are pretty much the same things that we've done in the past (e.g. computing sums of squares, means, etc.), but there are more of them.
- $K$ = number of treatment conditions (or groups), each of which is called a level of the factor
- $n_i$ = number of observations in the $i$th group (needed if the group sizes are not equal)
- $N = \Sigma n_i$ = total sample size
- $T_i = \Sigma X_{ij}$ = sum of the scores in the $i$th group (where $X_{ij}$ is the $j$th observation in the $i$th group)
- $G = \Sigma X_{ij}$ = the sum of all the X's = grand sum
- $SS_i$ = the sum of squares for each group = $\Sigma (X_{ij} - \bar{X}_i)^2$
Basically, the new notation takes into account the need to know means and standard deviations for each group alone and for all of the data together (collapsed across the different groups).
Example:
Study method:

| Method A: book alone | Method B: taking notes | Method C: borrowing notes |
|---|---|---|
| 0 | 4 | 1 |
| 1 | 3 | 2 |
| 3 | 6 | 2 |
| 1 | 3 | 0 |
| 0 | 4 | 0 |
| $T_1 = 5$ | $T_2 = 20$ | $T_3 = 5$ |
| $SS_1 = 6$ | $SS_2 = 6$ | $SS_3 = 4$ |
| $n_1 = 5$ | $n_2 = 5$ | $n_3 = 5$ |
$\Sigma X^2 = 106$
$G = 30$ = grand sum
$N = 15$ = total sample size
$G/N = 30/15 = 2$ = grand mean
$K = 3$ = number of treatment conditions (or groups), each of which is called a level of the factor
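The notation can be checked against the study-method data with a few lines of plain Python. This is only a sketch: the variable names (`groups`, `T`, `SS`, and so on) are chosen here to mirror the symbols above and are not part of any statistics package.

```python
# Scores from the study-method example, one list per treatment condition.
groups = {
    "A (book alone)": [0, 1, 3, 1, 0],
    "B (taking notes)": [4, 3, 6, 3, 4],
    "C (borrowing notes)": [1, 2, 2, 0, 0],
}

K = len(groups)                                   # number of levels of the factor
n = {g: len(x) for g, x in groups.items()}        # n_i: observations per group
T = {g: sum(x) for g, x in groups.items()}        # T_i: sum of scores per group
means = {g: T[g] / n[g] for g in groups}          # group means
# SS_i: squared deviations of each score from its own group mean
SS = {g: sum((v - means[g]) ** 2 for v in x) for g, x in groups.items()}

N = sum(n.values())                               # total sample size
G = sum(T.values())                               # grand sum
sum_x_squared = sum(v ** 2 for x in groups.values() for v in x)
grand_mean = G / N

print("K =", K, " N =", N, " G =", G, " grand mean =", grand_mean)
print("T_i =", list(T.values()))
print("SS_i =", list(SS.values()))
print("Sum of X^2 =", sum_x_squared)
```

Each printed value should match the corresponding entry in the table and the summary lines above.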
Recall that what we're after is:

$$\text{F-ratio} = \frac{\text{variance between treatments}}{\text{variance within treatments}}$$

So what we need to do is figure out how to get these two variances.
In the past we've used $s^2 = SS/df$. That's essentially what we're going to be doing here, but things will look a bit more complex.
There is a shorter way to do the computation:
$SS_{total} = \Sigma X^2 - \frac{G^2}{N}$
$SS_{total} = 106 - \frac{30^2}{15} = 106 - 60 = 46$
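As a quick check, the computational shortcut can be compared with the definitional sum of squares (the sum of squared deviations from the grand mean, as used for SS in earlier chapters). A minimal Python sketch, with variable names chosen for illustration:

```python
# All 15 scores from the study-method example, collapsed across groups.
scores = [0, 1, 3, 1, 0,   # Method A
          4, 3, 6, 3, 4,   # Method B
          1, 2, 2, 0, 0]   # Method C

N = len(scores)
G = sum(scores)
grand_mean = G / N

# Definitional formula: squared deviations from the grand mean.
ss_total_definitional = sum((x - grand_mean) ** 2 for x in scores)

# Computational formula: sum of X^2 minus G^2 / N.
ss_total_computational = sum(x ** 2 for x in scores) - G ** 2 / N

print(ss_total_definitional, ss_total_computational)
```

Both routes give 46 for this data set; the computational version simply avoids subtracting the mean from every score.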
But it isn't the SStotal that we really need. Remember that what we really need are the within and between groups variabilities.
We add together all of the SS estimates for each group:
$SS_{within} = \Sigma SS_{\text{inside each treatment}} = \Sigma SS_i$
$= 6 + 6 + 4 = 16$
But, there are two drawbacks of doing it this way:
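| definitional | computational |
|---|---|
| $SS_{between} = \Sigma n_i(\bar{X}_i - \bar{X}_{grand})^2$ | $SS_{between} = \Sigma \frac{T_i^2}{n_i} - \frac{G^2}{N}$ |
| $= 5(1 - 2)^2 + 5(4 - 2)^2 + 5(1 - 2)^2$ | $= \frac{5^2}{5} + \frac{20^2}{5} + \frac{5^2}{5} - \frac{30^2}{15}$ |
| $= 5 + 20 + 5 = 30$ | $= 5 + 80 + 5 - 60 = 30$ |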
Let's check our math.
$SS_{total} = SS_{within} + SS_{between} = 16 + 30 = 46$ (and that's the number we got before)
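The same check can be run in code. The sketch below (plain Python, illustrative variable names) computes the between-groups sum of squares both ways and confirms that the within and between pieces add up to the total.

```python
# Study-method data, one list per group (Methods A, B, C).
groups = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]

N = sum(len(g) for g in groups)
G = sum(sum(g) for g in groups)
grand_mean = G / N

# Definitional: n_i times the squared distance of each group mean from the grand mean.
ss_between_definitional = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
)

# Computational: sum of T_i^2 / n_i, minus G^2 / N.
ss_between_computational = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N

# Within: squared deviations of each score from its own group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

# Total: sum of X^2 minus G^2 / N.
ss_total = sum(x ** 2 for g in groups for x in g) - G ** 2 / N

print(ss_between_definitional, ss_between_computational, ss_within, ss_total)
```

For this data the output is 30, 30, 16 and 46, so the partition SS total = SS within + SS between holds.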
Okay, let's return to what we're interested in: Variances.
We've got our SS, now we need to figure out our df. There are going to be two (or three depending on how you look at it) degrees of freedom, one for the between group variance and one for the within groups variance (and a third for total df).
$df_{within} = N - K$
$df_{between} = K - 1$
$df_{total} = df_{within} + df_{between}$
For our example:
$df_{within} = 15 - 3 = 12$
$df_{between} = 3 - 1 = 2$
$df_{total} = 15 - 1 = 14$, which is also $= 12 + 2$
$MS_{between} = SS_{between}/df_{between}$
for our example = 30/2 = 15
$MS_{within} = MS_{error}$ = Mean Square Error = $SS_{within}/df_{within}$
for our example = 16/12 = 1.33

$$\text{F-ratio} = \frac{\text{variance between treatments}}{\text{variance within treatments}} = \frac{MS_{between}}{MS_{within}}$$
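So the F-ratio for our example is: 15/1.33 = 11.28 (this uses the rounded $MS_{within}$; carrying the unrounded 16/12 gives 11.25, and the conclusion is the same either way).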
| Source | SS | df | MS | |
|---|---|---|---|---|
| Between treatments | 30 | 2 | 15.0 | F = 11.28 |
| Within treatments | 16 | 12 | 1.33 | |
| Total | 46 | 14 | | |
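The whole summary table can be produced in one pass. Below is a minimal sketch in plain Python; `one_way_anova` is a name made up for this illustration, not a library function. Because it does not round $MS_{within}$ before dividing, it reports F = 11.25 rather than the hand-computed 11.28.

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way between-groups ANOVA.

    `groups` is a list of lists, one inner list of scores per treatment.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)

    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - grand_sum ** 2 / n_total
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k

    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "SS_between": ss_between, "SS_within": ss_within,
        "SS_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "df_total": df_between + df_within,
        "MS_between": ms_between, "MS_within": ms_within,
        "F": ms_between / ms_within,
    }


study_groups = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]
table = one_way_anova(study_groups)

print(f"Between: SS={table['SS_between']:.0f} df={table['df_between']} "
      f"MS={table['MS_between']:.2f} F={table['F']:.2f}")
print(f"Within:  SS={table['SS_within']:.0f} df={table['df_within']} "
      f"MS={table['MS_within']:.2f}")
print(f"Total:   SS={table['SS_total']:.0f} df={table['df_total']}")
```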
So what's the next step?
$df_{within} = 12$
$df_{between} = 2$
Critical values of F. In each cell the first entry is the critical value for α = 0.05 and the second is for α = 0.01. Columns give the df in the numerator.

| df in denominator | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 161 / 4052 | 200 / 4999 | 216 / 5403 | 225 / 5625 | 230 / 5764 |
| 2 | 18.51 / 98.49 | 19.00 / 99.00 | 19.16 / 99.17 | 19.25 / 99.25 | 19.30 / 99.30 |
| 3 | 10.13 / 34.12 | 9.55 / 30.82 | 9.28 / 29.46 | 9.12 / 28.71 | 9.01 / 28.24 |
| : | : | : | : | : | : |
| 12 | 4.75 / 9.33 | 3.88 / 6.93 | 3.49 / 5.95 | 3.26 / 5.41 | 3.11 / 5.06 |
| 13 | 4.67 / 9.07 | 3.80 / 6.70 | 3.41 / 5.74 | 3.18 / 5.20 | 3.02 / 4.86 |
| : | : | : | : | : | : |
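The table lookup can also be done numerically. As a sketch in plain Python: when the numerator df is 2 (three groups, as in our example), the upper tail of the F distribution has a simple closed form, $P(F > f) = (1 + 2f/df_{within})^{-df_{within}/2}$, so no statistics library is needed. The function names below are made up for this illustration, and the shortcut does not apply to other numerator df (for those you would use a statistics package or the table).

```python
def f_p_value_df2(f_obs, df_within):
    """Upper-tail p-value for an observed F when df_between = 2 (closed form)."""
    return (1 + 2 * f_obs / df_within) ** (-df_within / 2)


def f_crit_df2(alpha, df_within):
    """Critical F when df_between = 2, obtained by inverting the formula above."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


# Critical values for our example: df_between = 2, df_within = 12.
crit_05 = f_crit_df2(0.05, 12)
crit_01 = f_crit_df2(0.01, 12)
print(f"F_crit(2,12): alpha=.05 -> {crit_05:.3f}, alpha=.01 -> {crit_01:.3f}")

# p-value for the observed F (unrounded value 15 / (16/12) = 11.25).
p_obs = f_p_value_df2(11.25, 12)
print(f"p for F(2,12) = 11.25: {p_obs:.4f}")
```

The critical values agree with the row for 12 denominator df to within the table's rounding, and the observed F is well beyond both of them.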
In analysis of variance, a factor is an independent variable. A study that involves only one independent variable is called a single-factor design. A study with more than one independent variable is called a factorial design. The individual treatment conditions that make up a factor are called levels of the factor.
$F(df_{between}, df_{within}) = F_{obs}$, p-value
for example:
A one-way ANOVA yielded a significant effect of study method, F(2,12) = 11.28, p < 0.01.
| Source | SS | df | MS | |
|---|---|---|---|---|
| Between treatments | 30 | 2 | 15.0 | F = 11.28 |
| Within treatments | 16 | 12 | 1.33 | |
| Total | 46 | 14 | | |
Note: in this day and age, computers have the family of F distributions built in, so your output may give you the actual p-value. Remember that the logic of the test is such that you must specify your α level ahead of time. If you select 0.01, then that's the level that you are using for all of your tests. So if you do two experiments and your stats program tells you that in experiment 1 your p-value = .001, and in experiment 2 your p-value = .01, they are both equally statistically significant. Rejecting the H0 is a yes/no decision. In this example the answer to both is YES. The results in Experiment 1 are NOT "more significant" than those in Experiment 2.
To set up a paired samples t-test you will need two columns of data, one for each sample (related samples) or one for each measurement (repeated measures).
Note: To do One-way ANOVA you'll need to have two variables (columns) in your data file (this is just like with the 2 independent samples t-test, except now your independent variable will have more than two categories). One column will contain the data (your dependent measure). The other column will be an independent variable that specifies which group the subject belongs to (e.g., 1 for group 1, 2 for group 2).

Go to the Analyze menu and select the submenu Compare Means. In this submenu you'll see several tests. The one that we're interested in today is One-way ANOVA.

[Screenshot: Analyze menu, Compare Means submenu]

After selecting One-way ANOVA you'll get a dialog window. Here you should select the variables that you are testing. Your test variable is your dependent variable. Your group variable is the independent variable that assigns each subject to a group.

[Screenshot: One-way ANOVA dialog window]

Here is what the output will look like.

[Screenshot: One-way ANOVA output]

Notice that the output is given in the standard ANOVA table format. SPSS doesn't tell you to reject or fail to reject the H0, nor does it give you the Fcrit. To make your decision about the H0 you must compare the p-value with your α-level. If the p-value is equal to or smaller than your α-level, then you should reject the H0; otherwise you should fail to reject H0.
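Outside of SPSS, the same two-column layout and decision rule can be sketched in plain Python. The names `rows`, `by_group`, and `decide` are illustrative only; the group codes 1, 2, 3 stand for Methods A, B, and C from the study-method example.

```python
from collections import defaultdict

# Two-column ("long") layout: one row per subject.
# First value = dependent measure, second value = group code.
rows = [
    (0, 1), (1, 1), (3, 1), (1, 1), (0, 1),
    (4, 2), (3, 2), (6, 2), (3, 2), (4, 2),
    (1, 3), (2, 3), (2, 3), (0, 3), (0, 3),
]

# Regroup the scores by the grouping variable.
by_group = defaultdict(list)
for score, group in rows:
    by_group[group].append(score)


def decide(p_value, alpha):
    """Yes/no decision about H0: reject when p is at or below alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(dict(by_group))
print(decide(0.001, alpha=0.01))  # reject H0
print(decide(0.01, alpha=0.01))   # reject H0 (equal to alpha counts)
print(decide(0.02, alpha=0.01))   # fail to reject H0
```

Note that the first two calls give the same decision: once α is fixed, a smaller p-value does not make a result "more significant".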
One factor independent samples ANOVA

$$\text{F-ratio} = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$

$$\text{F-ratio} = \frac{\text{treatment effect} + \text{individual differences} + \text{random error}}{\text{individual differences} + \text{random error}}$$
Single variable: one factor

- Two levels (t-test)
  - Basically you want to compare two groups
  - The statistics are pretty easy, a t-test
  - Disadvantages:
    - "True" shape of the function is hard to see
    - Interpolation and extrapolation are not a good idea
    - More complex theories typically need more complex designs (more than two levels of one IV)
- More than two levels (ANOVA)
  - Gives a better picture of the relationship (function)
  - Requires more complex statistical analysis (analysis of variance and pairwise comparisons)
  - Needs more resources (participants and/or stimuli)
Let's finish up with the situation that we started with in the first
two ANOVA lectures.
Here was the data for this study.
With α = 0.05, our Fcrit(2,12) =
3.88. So we rejected the H0.
Recall what the H0 is: $\mu_1 = \mu_2 = \mu_3$
If we were to write out our alternative hypotheses what would we
write?
All that our ANOVA analysis has told us is that we
reject H0. However, we may want to know which of the
groups are different (in other words, which alternative hypotheses can we
reject). In our current example this is pretty easy to answer just by
looking at the means.
Groups 1 and 3 clearly aren't different (1 - 1 = 0). Groups 1 and 2
differ by 3 (4 - 1), as do groups 2 and 3. So here the differences that the
ANOVA flags as statistically significant must be between groups 1 & 2 and 2 & 3.
However it is not always this easy to tell just by looking at the means.
Usually what we have to do are some additional statistical tests.
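Listing every pairwise difference by hand gets tedious as the number of groups grows. A minimal Python sketch, using the three group means from our example (the dictionary layout is just an illustrative choice):

```python
from itertools import combinations

# Group means from the study-method example.
means = {1: 1.0, 2: 4.0, 3: 1.0}

# Every pair of groups, and the absolute difference between their means.
pairs = list(combinations(means, 2))
differences = {(a, b): abs(means[a] - means[b]) for a, b in pairs}

for (a, b), diff in differences.items():
    print(f"group {a} vs group {b}: difference = {diff}")

print("number of pairwise comparisons:", len(pairs))
```

With K groups there are K(K - 1)/2 pairs, so 3 groups give 3 comparisons. The differences alone say nothing about significance; that still requires a statistical test.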
There are two sets of tests that are used to determine which groups are
different: planned means comparisons and post hoc
tests.
The downside of making many comparisons (whether planned or post hoc) is
that you increase the chance of making a Type I error. Recall that the
α-level that we set is the probability of
rejecting the H0 when there really isn't a difference. This is
true for each comparison. However, when we start testing a whole set of
comparisons, the overall α-level
increases. In other words, the more comparisons that we make, the bigger
the chance of making a Type I error. The combined chance of making a Type
I error is referred to as experimentwise error.
The reason that we limit the number of planned comparisons (typically to
no more than K - 1) is to keep our experimentwise error level low. Post
hoc tests are designed to take experimentwise error into account, although
the way in which it is taken into account varies depending on which test
you use.
Each of these comparisons is a separate hypothesis test, and each one
has a risk of making a Type I error. So, the more
comparisons that you make, the higher the risk of
concluding that there is a difference when there really isn't
one. This is called the experimentwise alpha level (or
familywise error):

$\alpha_{EW} = 1 - (1 - \alpha)^c$, where $c$ = number of comparisons

So for our example, if we chose α = 0.05
and make 3 comparisons:

$\alpha_{EW} = 1 - (1 - \alpha)^c = 1 - (.95)^3 = 1 - .857 = .143$

Our chance of making a Type I error when making the comparisons
is now 1 in 7 rather than 1 in 20.
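To see how quickly the experimentwise level grows, the formula can be evaluated for several values of c. A small Python sketch (the function name is illustrative; note that the formula treats the comparisons as independent, which is an approximation for pairwise tests on the same data):

```python
def experimentwise_alpha(alpha, c):
    """Chance of at least one Type I error across c comparisons,
    each tested at level alpha (assumes independent comparisons)."""
    return 1 - (1 - alpha) ** c


alpha_ew_3 = experimentwise_alpha(0.05, 3)
print(f"3 comparisons at alpha = .05: {alpha_ew_3:.3f}")

# How the experimentwise level grows with the number of comparisons.
for c in range(1, 7):
    print(c, round(experimentwise_alpha(0.05, c), 3))
```

For c = 3 this reproduces the .143 worked out above, and by c = 6 (all pairs among 4 groups) the level has climbed past .26.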
Most post hoc tests have been designed to control the experimentwise
error.
Rather than doing planned comparisons or post hoc tests by hand, we'll
focus on how to do them and interpret them using SPSS.
Suppose that for your senior research project you decide to test the
effectiveness of three different studying methods on learning. Students
assigned to Method A only read the textbook, but don't go to class. Students
assigned to Method B go to class and take notes, but don't read the
textbook. Students in the Method C group don't read the textbook or go
to class; they just get to look at another student's class notes.
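Study method:

| Method A: book alone | Method B: taking notes | Method C: borrowing notes |
|---|---|---|
| 0 | 4 | 1 |
| 1 | 3 | 2 |
| 3 | 6 | 2 |
| 1 | 3 | 0 |
| 0 | 4 | 0 |
| $n_1 = 5$ | $n_2 = 5$ | $n_3 = 5$ |
| $\bar{X}_1 = 1$ | $\bar{X}_2 = 4$ | $\bar{X}_3 = 1$ |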
And here were the results.

| Source | SS | df | MS | |
|---|---|---|---|---|
| Between treatments | 30 | 2 | 15.0 | F = 11.28 |
| Within treatments | 16 | 12 | 1.33 | |
| Total | 46 | 14 | | |
There are several possibilities:

- $\mu_1 \neq \mu_2 \neq \mu_3$
- $\mu_1 \neq \mu_2 = \mu_3$
- $\mu_1 = \mu_2 \neq \mu_3$
- $\mu_1 = \mu_3 \neq \mu_2$

More generally, the alternative hypothesis is that not all the groups
are equal.
Group means: $\bar{X}_1 = 1$, $\bar{X}_2 = 4$, $\bar{X}_3 = 1$
Using SPSS to do single factor ANOVA: Planned comparisons and Post hoc
tests
Reminder: To do One-way ANOVA you'll need to have two variables (columns) in your data file (this is just like with the 2 independent samples t-test, except now your independent variable will have more than two categories). One column will contain the data (your dependent measure). The other column will be an independent variable that specifies which group the subject belongs to (e.g., 1 for group 1, 2 for group 2, 3 for group 3).

Go to the Analyze menu and select the submenu Compare Means. In this submenu you'll see several tests. The one that we're interested in today is One-way ANOVA.

[Screenshot: Analyze menu, Compare Means submenu]

After selecting One-way ANOVA you'll get a dialog window. Here you should select the variables that you are testing. Your test variable is your dependent variable. Your group variable is the independent variable that assigns each subject to a group.

[Screenshot: One-way ANOVA dialog window]
[Screenshot: SPSS output for the planned comparisons]

Here is how to read the planned comparison output from SPSS:

- Comparison 1 (group 1 vs 2) is significant (p = 0.001)
- Comparison 2 (group 1 vs 3) is not significant (p > 0.05)
- Comparison 3 (group 2 vs 3) is significant (p = 0.001)
The output for these three tests is presented below. For each, you will see the results of each pairwise comparison. For example, in the Tukey HSD test, book alone vs. notes alone is significant (p = 0.004), while book alone vs. borrowed notes is not significant (p > 0.05). The next set is notes alone vs. book alone and notes alone vs. borrowed notes, and the set after that is borrowed notes vs. book alone and borrowed notes vs. notes alone. The results for the other post hoc tests are arranged in the same way.
Based on these results (either the planned comparisons, if we had some reason in advance to test the groups against one another, or the post hocs, if we first rejected the H0 based on our ANOVA) we can reject several of the alternative hypotheses:
| Alternative hypothesis | Decision |
|---|---|
| $\mu_1 \neq \mu_2 \neq \mu_3$ | REJECT |
| $\mu_1 \neq \mu_2 = \mu_3$ | REJECT |
| $\mu_1 = \mu_2 \neq \mu_3$ | REJECT |
| $\mu_1 = \mu_3 \neq \mu_2$ | FAIL TO REJECT |
An example
A drug company is developing several new pain killers. It wants to test
the effectiveness of the drugs compared to a placebo. They give each of 4 groups
of participants one of 4 treatments (drug A, B, C, or a placebo) and then measure
their pain tolerance. Consider the following set of data. Use SPSS to
perform the One-way ANOVA.

Drug type:

| Placebo | Drug A | Drug B | Drug C |
|---|---|---|---|
| 0 | 0 | 3 | 8 |
| 0 | 1 | 4 | 5 |
| 3 | 2 | 5 | 5 |

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Between treatments | 54 | 3 | 18.0 | 9.0 | 0.006 |
| Within treatments | 16 | 8 | 2.0 | | |
| Total | 70 | 11 | | | |

Looking at the p-value, we should reject H0.

Planned comparisons (each drug against the placebo):

| Comparison | t | p |
|---|---|---|
| 1 (1, -1, 0, 0) | 0.0 | > 0.05 |
| 2 (1, 0, -1, 0) | -2.6 | 0.032 |
| 3 (1, 0, 0, -1) | -4.3 | 0.003 |

Looking at the p-values, we should conclude that drugs B and C differ from
the placebo, but drug A does not.

Post hoc tests:

| Comparison | difference | p |
|---|---|---|
| Placebo vs. drug A | 0.0 | > 0.05 |
| Placebo vs. drug B | -3.0 | 0.117 |
| Placebo vs. drug C | -5.0 | 0.011 |

Looking at the p-values, we should conclude that C differs from the
placebo, but A and B don't.

Notice that this differs from the results of the planned comparisons.
Planned comparisons are more statistically powerful than post hoc
tests.
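The ANOVA table and the planned-comparison t values for the drug example can be reproduced by hand. The sketch below is plain Python with illustrative names; it assumes each comparison's t is the difference between two group means divided by a standard error built from the pooled $MS_{within}$, which is the calculation that matches the t values reported above. The p-values and the post hoc results are not computed here.

```python
import math

# Pain tolerance scores, one list per treatment.
drug_groups = {
    "Placebo": [0, 0, 3],
    "Drug A": [0, 1, 2],
    "Drug B": [3, 4, 5],
    "Drug C": [8, 5, 5],
}


def mean(xs):
    return sum(xs) / len(xs)


all_scores = [x for g in drug_groups.values() for x in g]
N, K = len(all_scores), len(drug_groups)
grand_mean = mean(all_scores)

ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in drug_groups.values())
ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in drug_groups.values())
ms_between = ss_between / (K - 1)
ms_within = ss_within / (N - K)
f_obs = ms_between / ms_within


def comparison_t(name_a, name_b):
    """t for a two-group comparison, using the pooled MS_within as the error term."""
    a, b = drug_groups[name_a], drug_groups[name_b]
    standard_error = math.sqrt(ms_within * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / standard_error


print(f"SS_between={ss_between:.0f}, SS_within={ss_within:.0f}, F={f_obs:.1f}")
t_values = {drug: comparison_t("Placebo", drug) for drug in ("Drug A", "Drug B", "Drug C")}
for drug, t in t_values.items():
    print(f"Placebo vs {drug}: t = {t:.1f}")
```

The output should agree with the tables above: F = 9.0, and t values of 0.0, -2.6, and -4.3 on 8 degrees of freedom.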