Your textbook: Consider the following scenario, with two factors:

Reading session duration | 5 mins | 15 mins | 30 mins
Age | 3 yrs | 8 yrs | 14 yrs
T-tests and z-scores are great, but they are limited: they can't be used for more than 2 groups. Instead, what we need is a new inferential statistical procedure: Analysis of Variance (ANOVA).
The individual treatment conditions that make up a factor are called levels of the factor.
So the study described above is a factorial design, with two between groups factors, and each factor has 3 levels (sometimes described as a 3 by 3 between groups design).
We'll start by talking about the overall logic of ANOVA, and discuss some new notation. Then we'll work through the process and some examples. We'll end the chapter with a discussion of post-hoc tests (don't worry about what these are, yet).
We'll be using the same hypothesis testing logic that we used in chapters 8-11, but the details will change (as they did from chapter to chapter).
Okay, let's consider a new example that is a single-factor, independent-measures research design.
H1: one of the groups is different from one or more of the other groups, so there are really lots of possible alternative hypotheses.
Often, people will just give the null hypothesis, because there are just too many alternatives (imagine how many we could have for our original 3x3 design).
step 3: figure out the df for your test. We'll save this for a little later, when we start using a concrete example. One thing to note is that we're going to have several dfs to consider (or worry about, if that's the way you feel).
step 4: find the critical F-statistic from the table (new table starting on pg 695)
df in the numerator
df in denominator | 1 | 2 | 3 | 4 | 5
1 | 161 / 4052 | 200 / 4999 | 216 / 5403 | 225 / 5625 | 230 / 5764
2 | 18.51 / 98.49 | 19.00 / 99.00 | 19.16 / 99.17 | 19.25 / 99.25 | 19.30 / 99.30
3 | 10.13 / 34.12 | 9.55 / 30.92 | 9.28 / 29.46 | 9.12 / 28.71 | 9.01 / 28.24
: | : | : | : | : | :
Recall that we'll have two dfs: we use one to find the correct row and the other to find the correct column. You'll also note that there are two numbers per cell. The first (lighter) number corresponds to α = 0.05, the second (bold) number corresponds to α = 0.01.
In many ways the ANOVA is very similar to a two independent samples t-test (in fact, later we'll talk about how they are nearly identical under certain circumstances).
For the independent samples t-test:

t_obs = (obtained difference between sample means) / (difference expected by chance)

For ANOVA the test statistic (called the F-ratio) is similar:

F = (variance (differences) between sample means) / (variance (differences) expected by chance (error))
Why do we have to use variance?
How would we compute a single score that would describe the difference between these distributions? Difference just doesn't cut it, but variance does (recall it is a measure of how much these three distributions differ from one another).
Notice that this is how ANOVA gets its name. Analysis of Variance.
The variance within treatments reflects individual differences and random error, BUT NOT treatment/group effects - this is the key.
So the F-ratio can be expressed as:
F-ratio = (variance (differences) between sample means) / (variance (differences) expected by chance (error))

F-ratio = (variance between treatments) / (variance within treatments)

F-ratio = (treatment effect + individual differences + random error) / (individual differences + random error)
If H0 is true, then what should the value of the treatment effect be?

H0: μ1 = μ2 = μ3

So there are no differences, so the treatment-effect variance should be 0.

If that variance = 0, then what is the value of the F-ratio?

F-ratio = (0 + individual differences + random error) / (individual differences + random error) = 1.0

If H0 is false, then the F-ratio will be greater than 1.
K = # of treatment conditions (or groups), each of which is called a level of the factor
n = # of observations in each group (if they are equal)
ni = # of observations in the ith group (if they are not equal)
N = Σni = total sample size
Ti = ΣXij, summed over the ith group (where Xij is the jth observation in the ith group)
G = ΣXij, summed over all groups = the sum of all the X's = grand sum
G-bar = G/N = grand mean
SSi = the sum of squares for each group = Σ(Xij - X-bari)^2
So let's consider some data for our proposed experiment.
Study method:

Method A (book alone) | Method B (taking notes) | Method C (borrowing notes)
0 | 4 | 1
1 | 3 | 2
3 | 6 | 2
1 | 3 | 0
0 | 4 | 0
T1 = 5 | T2 = 20 | T3 = 5
SS1 = 6 | SS2 = 6 | SS3 = 4
n1 = 5 | n2 = 5 | n3 = 5
X-bar1 = 1 | X-bar2 = 4 | X-bar3 = 1
ΣX^2 = 106
G = 30 = grand sum
N = 15 = total sample size
G-bar = 30/15 = 2 = grand mean
K = 3 = # of treatment conditions (or groups), each of which is called a
level of the factor
recall that what we're after is:
F-ratio = (variance between treatments) / (variance within treatments)

So what we need to do is figure out how to get these two variances.
In the past we've used s^2 = SS/df. That's essentially what we're going to be doing here, but things will look a bit more complex.
SStotal = ΣX^2 - (G^2/N)
SStotal = 106 - (30^2/15) = 106 - 60 = 46
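As a quick check, these totals can be reproduced in a few lines of Python (a sketch; the scores are the ones from the study-method table above):

```python
# Scores for the three study-method groups (from the table above).
data = {
    "A (book alone)":      [0, 1, 3, 1, 0],
    "B (taking notes)":    [4, 3, 6, 3, 4],
    "C (borrowing notes)": [1, 2, 2, 0, 0],
}

scores = [x for group in data.values() for x in group]
sum_x2 = sum(x * x for x in scores)  # sum of squared scores = 106
G = sum(scores)                      # grand sum = 30
N = len(scores)                      # total sample size = 15
ss_total = sum_x2 - G**2 / N         # 106 - 60 = 46

print(sum_x2, G, N, ss_total)
```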
SStotal = SSwithin + SSbetween
we add together all of the SS estimates for each group
SSwithin = Σ(SS inside each treatment) = ΣSSi
= 6 + 6 + 4 = 16
But there are two drawbacks of doing it this way:
definitional:
SSbetween = Σ[ni(X-bari - G-bar)^2]
= 5(1 - 2)^2 + 5(4 - 2)^2 + 5(1 - 2)^2
= 5 + 20 + 5 = 30

computational:
SSbetween = Σ(Ti^2/ni) - G^2/N
= 5^2/5 + 20^2/5 + 5^2/5 - 30^2/15
= 5 + 80 + 5 - 60 = 30
Let's check our math.
SStotal = SSwithin + SSbetween = 16 + 30 = 46 (and that's the number we got before)
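The partition of SStotal into within and between pieces can be verified directly (a sketch in Python; the group scores are from the table above):

```python
groups = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]  # methods A, B, C

def group_ss(xs):
    """Sum of squared deviations from the group mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

ss_within = sum(group_ss(g) for g in groups)  # 6 + 6 + 4 = 16
G = sum(sum(g) for g in groups)               # grand sum = 30
N = sum(len(g) for g in groups)               # total sample size = 15
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G**2 / N  # 30

print(ss_within + ss_between)  # prints 46.0, matching SStotal
```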
Okay, let's return to what we're interested in.
s^2 = SS/df.
We've got our SS, now we need to figure out our df. There are going to be two (or three depending on how you look at it) degrees of freedom, one for the between group variance and one for the within groups variance (and a third for total df).
dfwithin = N - K
dfbetween = K - 1
dftotal = dfwithin + dfbetween = N - 1

For our example:
dfwithin = 15 - 3 = 12
dfbetween = 3 - 1 = 2
dftotal = 15 - 1 = 14, which is also = 12 + 2
MSbetween = SSbetween/dfbetween
for our example = 30/2 = 15
MSwithin = MSerror = Mean Square Error = SSwithin/dfwithin
--> for our example = 16/12 = 1.33
Almost done. Let's look back at what we're after:
F-ratio = (variance between treatments) / (variance within treatments) = MSbetween / MSwithin
So the F-ratio for our example is: 15/1.33 = 11.28
Source | SS | df | MS
Between treatments | 30 | 2 | 15.0 (F = 11.28)
Within treatments | 16 | 12 | 1.33
Total | 46 | 14
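The whole computation, from raw scores to the F-ratio, can be sketched as one small Python function. Note that the text's F = 11.28 comes from rounding MSwithin to 1.33 before dividing; carrying full precision gives 11.25:

```python
def one_way_anova(groups):
    """One-factor, between-groups ANOVA from raw scores; returns the F-ratio."""
    N = sum(len(g) for g in groups)
    K = len(groups)
    G = sum(sum(g) for g in groups)
    sum_x2 = sum(x * x for g in groups for x in g)
    ss_total = sum_x2 - G**2 / N
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G**2 / N
    ss_within = ss_total - ss_between
    df_between, df_within = K - 1, N - K
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within

F = one_way_anova([[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]])
print(round(F, 2))  # prints 11.25 (the text's 11.28 reflects rounding MSwithin to 1.33)
```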
So what's the next step?
dfwithin = 12
dfbetween = 2
df in the numerator
df in denominator | 1 | 2 | 3 | 4 | 5
1 | 161 / 4052 | 200 / 4999 | 216 / 5403 | 225 / 5625 | 230 / 5764
2 | 18.51 / 98.49 | 19.00 / 99.00 | 19.16 / 99.17 | 19.25 / 99.25 | 19.30 / 99.30
3 | 10.13 / 34.12 | 9.55 / 30.92 | 9.28 / 29.46 | 9.12 / 28.71 | 9.01 / 28.24
: | : | : | : | : | :
12 | 4.75 / 9.33 | 3.88 / 6.93 | 3.49 / 5.95 | 3.26 / 5.41 | 3.11 / 5.06
13 | 4.67 / 9.07 | 3.80 / 6.70 | 3.41 / 5.74 | 3.18 / 5.20 | 3.02 / 4.86
: | : | : | : | : | :
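Once the critical values are read off the table for df = (2, 12), the decision step is just a comparison. A sketch (the two critical values are taken from the table above):

```python
# Critical F values for df = (2, 12), taken from the F-table above.
f_crit = {0.05: 3.88, 0.01: 6.93}

F_obs = 11.28  # from our ANOVA source table

for alpha, crit in f_crit.items():
    decision = "reject H0" if F_obs > crit else "fail to reject H0"
    print(f"alpha = {alpha}: F_obs = {F_obs} vs F_crit = {crit} -> {decision}")
# F_obs exceeds both critical values, so the effect is significant even at alpha = .01
```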
How would one report this (this is what you'll want to know for the Holcomb exercise)?
"A one-way ANOVA yielded a significant effect of study method, F(2,12) = 11.28, p < 0.01."
Note: in this day and age, computers actually have the family of F distributions built in, so your stats output may give you an actual p-value. Remember that the logic of the test is such that you must specify your α level ahead of time. If you select 0.01, then that's the level you are using for all of your tests. So if you do two experiments and your computer stats program tells you that in Experiment 1 your p-value = .001, and in Experiment 2 your p-value = .01, they are both equally statistically significant. The H0 decision is a yes/no decision. In this example the answer to both is YES. The results in Experiment 1 are NOT "more significant" than those in Experiment 2.
The possible alternative hypotheses include:

μ1 ≠ μ2 = μ3
μ1 = μ3 ≠ μ2
μ1 = μ2 ≠ μ3
μ1 ≠ μ2 ≠ μ3
So typically, after getting a significant result from your ANOVA (rejecting the H0), one would then perform some post hoc tests. Post hoc tests allow us to compare the groups to one another, to see which are different from which.

Basically, what the post hoc tests allow you to do is go back and compare each treatment group to each other treatment group, two at a time. This is called making pairwise comparisons.

So in our above example, we could go back and compare μ1 to μ2, μ1 to μ3, and μ2 to μ3.
Anybody see a potential problem with doing this?

Each of these comparisons is a separate hypothesis test, and each one has a risk of making a Type I error. So the more comparisons that you make, the higher the risk of concluding that there is a difference when there really isn't one. This accumulated risk is called the experimentwise alpha level (or familywise error).
αEW = 1 - (1 - α)^c, where c = # of comparisons

So for our example, if we chose α = 0.05 and make 3 comparisons:

αEW = 1 - (1 - α)^c = 1 - (.95)^3 = 1 - .857 = .143

Our chance of making a Type I error somewhere among the comparisons is now about 1 in 7 rather than 1 in 20.
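The experimentwise alpha formula is easy to check numerically (a minimal sketch):

```python
def alpha_experimentwise(alpha, c):
    """Probability of at least one Type I error across c comparisons,
    each run at level alpha (assuming independent comparisons)."""
    return 1 - (1 - alpha) ** c

print(round(alpha_experimentwise(0.05, 3), 3))   # prints 0.143, about 1 in 7
print(round(alpha_experimentwise(0.05, 10), 3))  # the risk grows quickly with more comparisons
```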
Most post hoc tests have been designed to control the experimentwise error. We'll talk about two such tests: Tukey's HSD test (honestly significant difference) and the Scheffé test.

Tukey's HSD test allows us to compute a single value that determines the minimum difference between treatment means that we must have to consider the difference statistically significant. This test requires that the groups have equal sample sizes.
HSD = q * sqrt(MSwithin / n)

The value for q is found in Table B.5 (in the Appendix, p. A-32). To figure out q you must know K, dfwithin, and what αEW you want to use.

So for our study methods example (pick αEW = .05):

HSD = q * sqrt(MSwithin / n) = (3.77) * sqrt(1.33/5) = (3.77)(.516) = 1.94
Recall: X-bar1 = 1, X-bar2 = 4, X-bar3 = 1
Comparison 1: H0: μ1 = μ2
X-bar2 - X-bar1 = 4.0 - 1.0 = 3.0
HSD = 1.94 < 3.0, so we reject H0

Comparison 2: H0: μ1 = μ3
X-bar3 - X-bar1 = 1.0 - 1.0 = 0.0
HSD = 1.94 > 0.0, so we fail to reject H0

Comparison 3: H0: μ2 = μ3
X-bar2 - X-bar3 = 4.0 - 1.0 = 3.0
HSD = 1.94 < 3.0, so we reject H0

So group B is different from A and C, but A and C don't differ from one another.
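The HSD computation and the three pairwise decisions can be sketched in Python. The value q = 3.77 is the Table B.5 lookup given in the text; everything else follows from our example:

```python
import math

q = 3.77          # from Table B.5: K = 3, df_within = 12, alpha_EW = .05
ms_within = 1.33  # from our ANOVA source table
n = 5             # per-group sample size (HSD requires equal n)

hsd = q * math.sqrt(ms_within / n)  # about 1.94

means = {"A": 1.0, "B": 4.0, "C": 1.0}
for g1, g2 in [("A", "B"), ("A", "C"), ("B", "C")]:
    diff = abs(means[g1] - means[g2])
    verdict = "reject H0" if diff > hsd else "fail to reject H0"
    print(f"{g1} vs {g2}: |diff| = {diff} vs HSD = {hsd:.2f} -> {verdict}")
```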
Scheffé test

Uses the F-ratio to test the differences. This is an extremely conservative test (it reduces the risk of Type I error, but increases the risk of Type II error). You CAN use this test with unequal n's.

We will recompute MSbetween to reflect only the comparison that we test each time. Note: we use the overall dfbetween and the overall MSwithin.
Recall:
Study method:

Method A (book alone) | Method B (taking notes) | Method C (borrowing notes)
0 | 4 | 1
1 | 3 | 2
3 | 6 | 2
1 | 3 | 0
0 | 4 | 0
T1 = 5 | T2 = 20 | T3 = 5
SS1 = 6 | SS2 = 6 | SS3 = 4
n1 = 5 | n2 = 5 | n3 = 5
X-bar1 = 1 | X-bar2 = 4 | X-bar3 = 1
Source | SS | df | MS
Between treatments | 30 | 2 | 15.0 (F = 11.28)
Within treatments | 16 | 12 | 1.33
Total | 46 | 14
Comparison 1: H0: μ1 = μ2

SSbetween = T1^2/n1 + T2^2/n2 - (T1 + T2)^2/(n1 + n2) = 5 + 80 - 62.5 = 22.5
MSbetween = SSbetween/dfbetween = 22.5/2 = 11.25
MSwithin = 16/12 = 1.33
F-ratio = MSbetween/MSwithin = 11.25/1.33 = 8.46

Now go look at the F-table. At α = .05, Fcrit(2,12) = 3.88.
8.46 > 3.88, so we reject H0
Comparison 2: H0: μ1 = μ3

SSbetween = T1^2/n1 + T3^2/n3 - (T1 + T3)^2/(n1 + n3) = 5 + 5 - 10 = 0
MSbetween = 0/2 = 0
MSwithin = 16/12 = 1.33
F-ratio = MSbetween/MSwithin = 0/1.33 = 0

Now go look at the F-table. At α = .05, Fcrit(2,12) = 3.88.
0 < 3.88, so we fail to reject H0
Comparison 3: H0: μ2 = μ3

SSbetween = T2^2/n2 + T3^2/n3 - (T2 + T3)^2/(n2 + n3) = 80 + 5 - 62.5 = 22.5
MSbetween = 22.5/2 = 11.25
MSwithin = 16/12 = 1.33
F-ratio = MSbetween/MSwithin = 11.25/1.33 = 8.46

Now go look at the F-table. At α = .05, Fcrit(2,12) = 3.88.
8.46 > 3.88, so we reject H0
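The three Scheffé comparisons follow one pattern, so they can be sketched as a single helper. The SSbetween here is the computational formula applied to just the two groups being compared, and we use the overall dfbetween and MSwithin, as the text notes. Carrying full precision gives F = 8.44 where the text's rounding of MSwithin to 1.33 gives 8.46:

```python
def scheffe_f(t_i, t_j, n_i, n_j, df_between, ms_within):
    """F-ratio for a pairwise Scheffe comparison of two treatment totals."""
    ss = t_i**2 / n_i + t_j**2 / n_j - (t_i + t_j) ** 2 / (n_i + n_j)
    return (ss / df_between) / ms_within

ms_within = 16 / 12  # overall MSwithin (the text rounds this to 1.33)
f_crit = 3.88        # Fcrit(2, 12) at alpha = .05

for label, ti, tj in [("A vs B", 5, 20), ("A vs C", 5, 5), ("B vs C", 20, 5)]:
    F = scheffe_f(ti, tj, 5, 5, df_between=2, ms_within=ms_within)
    verdict = "reject H0" if F > f_crit else "fail to reject H0"
    print(f"{label}: F = {F:.2f} -> {verdict}")
```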
One final note: the relation to the t-test.

What is the difference between an independent samples t-test and a one-factor between-groups ANOVA with only two levels? Not much: F = t^2.

Think about it. The difference between t-tests and ANOVA is that t-tests look at differences between two means, and ANOVAs look at variance. But when there are only two groups, variance basically boils down to squared differences. So square the t statistic and you get the F statistic.
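This identity is easy to verify with groups A and B from our example (a pure-Python sketch; t here is the standard pooled-variance independent-samples t):

```python
import math

a = [0, 1, 3, 1, 0]  # Method A
b = [4, 3, 6, 3, 4]  # Method B

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Independent-samples t with pooled variance.
sp2 = (ss(a) + ss(b)) / (len(a) + len(b) - 2)  # (6 + 6) / 8 = 1.5
t = (mean(b) - mean(a)) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))

# One-way ANOVA on the same two groups.
G, N = sum(a) + sum(b), len(a) + len(b)
ss_between = sum(a)**2 / len(a) + sum(b)**2 / len(b) - G**2 / N  # 22.5
ms_between = ss_between / 1                                      # df_between = 2 - 1 = 1
ms_within = (ss(a) + ss(b)) / (N - 2)                            # 1.5
F = ms_between / ms_within

print(round(t**2, 6), round(F, 6))  # the two values match: F = t^2
```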
Post hoc tests

Recall that the ANOVA is a test of H0: μ1 = μ2 = μ3. This is a binary (reject/fail to reject) decision. It does not tell us which alternative hypothesis is supported. In other words, we know that some groups are different from other groups, but we don't know which ones they are.