Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population.
In other words, we want to be able to make claims about populations based on samples.
 Results of comprehension test (fictional):
 Problem: is this 4% difference "real", or is it just due to sampling error?
Hypothesis testing: the big picture view (more details will follow)
The alternative hypothesis (H_{1}) predicts that the independent variable will have an effect on the dependent variable for the population; we'll talk more about how specific this hypothesis may be.
So part of the first step is to set up your null hypothesis and your alternative hypothesis
The other part of this step is to decide what criterion you are going to use to either reject or fail to reject (not accept) the null hypothesis.
To deal with this problem, the researcher must set a criterion in advance.
Setting a criterion in advance is concerned with the part about saying "that's pretty small". When we set the criterion in advance, we are essentially saying how small a chance is small enough to reject the null hypothesis or, in other words, how big a difference we need to have to reject the null hypothesis.
That's the big picture of setting the criterion; now let's look at the details.
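Actual situation vs. experimenter's conclusions:

                                     H_{0} is true        H_{0} is false
Experimenter rejects H_{0}           Type I error (α)     Correct decision
Experimenter fails to reject H_{0}   Correct decision     Type II error (β)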
Type I error (α, alpha): H_{0} is actually correct, but the experimenter rejected it.
Type II error (β, beta): H_{0} is really wrong, but the experimenter didn't feel that they could reject it.
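Actual situation vs. the jury's verdict (H_{0}: the defendant is innocent):

                      Defendant is innocent    Defendant is guilty
Verdict: guilty       Type I error             Correct verdict
Verdict: not guilty   Correct verdict          Type II error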
In scientific research, we typically take a conservative approach and set our criterion so as to minimize the chance of making a Type I error (concluding that there is an effect of something when there really isn't). In other words, scientists focus on setting an acceptable alpha level (α), or level of significance.
The alpha level (α), or level of significance, is a probability value that defines the very unlikely sample outcomes when the null hypothesis is true. Whenever an experiment produces very unlikely data (as defined by alpha), we will reject the null hypothesis. Thus, the alpha level also defines the probability of a Type I error, that is, the probability of rejecting H_{0} when it is actually true.
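One way to see that α is the Type I error rate is to simulate many experiments in which H_{0} really is true and count how often a test rejects it. The sketch below is illustrative only: it assumes a two-tailed z-test with known σ, borrows μ = 65, σ = 10 and n = 25 from the later example, and the function name `simulate_type_i_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulate_type_i_rate(mu=65.0, sigma=10.0, n=25, alpha=0.05,
                         trials=20_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5                          # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Draw the sample from the null population itself, so H0 is true.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / se
        if abs(z) > z_crit:
            rejections += 1                        # every rejection here is a Type I error
    return rejections / trials

print(simulate_type_i_rate())  # should land close to alpha = 0.05
```

Changing `alpha` to 0.01 should shrink the rejection rate accordingly, since the critical region is exactly the set of outcomes that alpha labels as "very unlikely" under H_{0}.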
Almost done, but we need to talk a bit about the other kind of error that we might make.
Type II error (β): H_{0} is really wrong, but the experimenter didn't feel that they could reject it.
The power of a statistical test is the probability that the test will correctly reject a false null hypothesis. So power is 1 - β.
So, the more "powerful" the test, the more readily it will detect a treatment effect.
So to consider power, we need to consider the situation where H_{0} is wrong, that is, when there are two populations: the treatment population and the null population.
Power is the probability of obtaining sample data in the critical region when the null hypothesis is false.
So when there are two populations, the power will be related to how big a difference there is between the two.
A big difference between the two populations: notice that the shaded region is large, so the chance to correctly reject the null hypothesis is good.
A smaller difference between the two populations: notice that the shaded region is smaller, so the chance to correctly reject the null hypothesis is not nearly as good.
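The shaded region in these pictures can be computed directly: find the cutoff for the sample mean under the null population, then ask how much of the treatment population's distribution of sample means lies beyond it. A minimal sketch, assuming an upper-tailed z-test with known σ; the treatment means 70 and 66 and the function name `power_one_tailed` are hypothetical choices for illustration.

```python
from statistics import NormalDist

def power_one_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test when the treatment population mean is mu1."""
    se = sigma / n ** 0.5
    # Cutoff for the sample mean, computed under the null population (mu0).
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power = probability that a sample mean from the treatment population
    # lands beyond that cutoff, i.e. in the critical region.
    return 1 - NormalDist(mu1, se).cdf(cutoff)

big = power_one_tailed(mu0=65, mu1=70, sigma=10, n=25)
small = power_one_tailed(mu0=65, mu1=66, sigma=10, n=25)
print(f"big difference: {big:.3f}, small difference: {small:.3f}")
```

The Type II error rate for each case is simply β = 1 - power.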
Factors that affect power
2) One-tailed tests have more power than two-tailed tests, given that you have specified the correct tail.
One-tailed test, α = 0.05: all of the critical region (α) is on one side of the distribution.

Two-tailed test, α = 0.05: because a specific direction is not predicted, the critical region (α) is spread out equally on both sides of the distribution; as a result, the power is smaller.
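To compare the two cases numerically, the sketch below computes power for the same treatment effect with the critical region either entirely in the upper tail or split as α/2 in each tail. It assumes a z-test with known σ and a treatment that really does raise the mean (so the one-tailed test points at the correct tail); the values and the function name `power_z_test` are illustrative assumptions.

```python
from statistics import NormalDist

def power_z_test(mu0, mu1, sigma, n, alpha=0.05, tails=2):
    """Power of a z-test; for tails=1 the critical region is the upper tail only."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    se = sigma / n ** 0.5
    treatment = NormalDist(mu1, se)          # distribution of sample means if H1 is true
    if tails == 1:
        upper = mu0 + NormalDist().inv_cdf(1 - alpha) * se
        return 1 - treatment.cdf(upper)
    # Two-tailed: alpha is split, alpha/2 in each tail of the null distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * se, mu0 + z * se
    return treatment.cdf(lower) + (1 - treatment.cdf(upper))

one = power_z_test(65, 70, 10, 25, tails=1)
two = power_z_test(65, 70, 10, 25, tails=2)
print(f"one-tailed: {one:.3f}, two-tailed: {two:.3f}")
```

The one-tailed cutoff (z of about 1.645) sits closer to the null mean than the two-tailed cutoff (about 1.96), which is why more of the treatment distribution falls in the critical region.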
3) Increasing sample size increases power by reducing the standard error.
Small n, α = 0.05: relatively large standard error.

Larger n, α = 0.05: smaller standard error; as a result, the power is greater.
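The effect of sample size runs entirely through the standard error σ/√n. The sketch below holds the populations fixed and only changes n; it assumes an upper-tailed z-test, and the treatment mean of 68, the sample sizes, and the function name `power_upper_tail` are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def power_upper_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Return (standard error, power) for an upper-tailed z-test."""
    se = sigma / n ** 0.5                                # shrinks as n grows
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # critical sample mean under H0
    return se, 1 - NormalDist(mu1, se).cdf(cutoff)

for n in (4, 25, 100):
    se, power = power_upper_tail(mu0=65, mu1=68, sigma=10, n=n)
    print(f"n = {n:3d}: standard error = {se:.2f}, power = {power:.3f}")
```

As n grows, both distributions of sample means get narrower, so they overlap less and the same 3-point difference becomes easier to detect.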
Let's look at this with pictures of distributions to try and connect this with what we've been talking about so far.
Consider the following sample mean distributions.
In these figures, α = the probability of making a Type I error.
General alternative hypothesis (two-tailed test): H_{0}: no difference; H_{1}: there is a difference.
Specific alternative hypothesis (one-tailed test): H_{1}: there is a difference, and the new group should have a higher mean.
So how do we interpret these graphs?
The critical region is composed of extreme sample values that are very unlikely to be obtained if the null hypothesis is true. The size of the critical region is determined by the alpha level. Sample data that fall in the critical region will warrant the rejection of the null hypothesis.
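Because the critical region is fixed entirely by α and by whether the test is one- or two-tailed, its boundary in z units can be computed rather than looked up. A minimal sketch with hypothetical helper names; the one-tailed case assumes H_{1} predicts an increase, matching the specific hypothesis above.

```python
from statistics import NormalDist

def z_critical(alpha=0.05, tails=2):
    """Boundary of the critical region in z units (upper boundary if one-tailed)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # One-tailed: all of alpha in the upper tail; two-tailed: alpha/2 in each tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

def rejects_h0(z, alpha=0.05, tails=2):
    """True if z falls in the critical region.

    The one-tailed case assumes H1 predicts an increase, so only the upper tail counts.
    """
    boundary = z_critical(alpha, tails)
    return z > boundary if tails == 1 else abs(z) > boundary

print(round(z_critical(0.05, 1), 3), round(z_critical(0.05, 2), 3))  # 1.645 1.96
```

For example, z = 1.8 lands in the one-tailed critical region but not in the two-tailed one.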
Population distribution 
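So the population μ = 65 and σ = 10. Suppose that you take a sample of n = 25, give them a treatment that is predicted to increase scores, and get X̄ = 69.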
Did the treatment work? Does it affect the population of individuals?
Which distribution should you look at? 
distribution of sample means 
Look at the distribution of sample means.
Find your sample mean in the distribution. Look up the probability of getting that mean or higher for the sample (see last chapter).
Let's assume α = 0.05. Now we need to find our standard error: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{25} = 10/5 = 2$.
What is our critical region? Well, this is a one-tailed test, so look at the unit normal table and find the z-score that cuts off a tail area of α = 0.05: z = 1.65 (conservative; the exact value is 1.645). Translate this into a sample mean: $\bar{X}_{critical} = z\sigma_{\bar{X}} + \mu = (1.65)(2) + 65 = 68.3$. So if $\bar{X} = 69$, which is beyond 68.3, we reject H_{0}.
Another way that we could have done this question is just to use z-scores.
$z_{\bar{X}} = (\bar{X} - \mu)/\sigma_{\bar{X}} = (69 - 65)/2 = 2.0$. Since 2.0 > $z_{critical}$ = 1.65, we can reject H_{0}.
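The two approaches above can be checked side by side. This sketch reuses the numbers from the example (μ = 65, σ = 10, n = 25, X̄ = 69) and the rounded table value z = 1.65; the one-tailed p-value line is an extra step not in the notes, computed with Python's `statistics.NormalDist` instead of a printed unit normal table.

```python
from statistics import NormalDist

mu, sigma, n = 65, 10, 25      # null population and sample size from the example
x_bar = 69                      # observed sample mean
z_crit = 1.65                   # table value used in the notes (exact: 1.645)

se = sigma / n ** 0.5           # 10 / 5 = 2

# Approach 1: convert the critical z into a critical sample mean.
x_crit = z_crit * se + mu       # 68.3
reject_by_mean = x_bar > x_crit

# Approach 2: convert the sample mean into a z-score.
z = (x_bar - mu) / se           # 2.0
reject_by_z = z > z_crit

# One-tailed p-value: probability of a mean this large or larger if H0 is true.
p_value = 1 - NormalDist().cdf(z)

print(f"critical mean = {x_crit:.1f}, z = {z:.1f}, p = {p_value:.4f}")
print(reject_by_mean, reject_by_z)
```

Both approaches must agree, because the z-score transformation preserves order: a sample mean beyond the critical mean always has a z-score beyond the critical z.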
However, the most common way to do hypothesis testing is to make a more general hypothesis, that the treatment will change the mean, either increase or decrease.
Population distribution 
So the population μ = 65 and σ = 10.
Suppose that you take a sample of n = 25, give them the treatment, and get X̄ = 69.
Did the treatment work? Does it affect the population of individuals?
Which distribution should you look at: the population, or the distribution of sample means?
Look at the distribution of sample means. Find your sample mean in the distribution. Look up the probability of getting a sample mean at least that far from μ in either direction (see last chapter).
Let's assume α = 0.05. Now we need to find our standard error: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{25} = 2$.
What is our critical region? Well, this is a two-tailed test.
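The notes stop at this point; the sketch below shows how the example would continue under the standard two-tailed z-test procedure, splitting α = 0.05 as 0.025 in each tail. The numbers it produces are computed here, not taken from the notes, and `statistics.NormalDist` stands in for the unit normal table.

```python
from statistics import NormalDist

mu, sigma, n, x_bar, alpha = 65, 10, 25, 69, 0.05

se = sigma / n ** 0.5                               # 2
z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # about 1.96: alpha/2 in each tail

# Critical region boundaries expressed as sample means.
lower = mu - z_crit * se                            # about 61.08
upper = mu + z_crit * se                            # about 68.92

z = (x_bar - mu) / se                               # 2.0
reject = abs(z) > z_crit

# Two-tailed p-value: a mean at least this far from mu in EITHER direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"critical region: below {lower:.2f} or above {upper:.2f}")
print(f"z = {z:.2f}, reject H0: {reject}, p = {p_value:.4f}")
```

With the same data, the two-tailed critical value (about 1.96) is farther out than the one-tailed value (1.65), so the margin by which X̄ = 69 clears the boundary is much thinner here.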
Assumptions of hypothesis testing
Violations of any of these assumptions will severely compromise any conclusions that you make about the population based on your sample (basically, you need to use other kinds of inferential statistics that can deal with violations of various assumptions).
Everything that we did in the last four chapters is related to this chapter. However, the logic of what we are doing here, estimation, is different from the logic used in hypothesis testing.
In the last several chapters, we tested a null hypothesis that basically asked the question, "Is this different from that?" Estimation asks a different question. With estimation, we are making educated guesses as to the value of a population parameter.
When do we use estimates?
We'll focus on two kinds of estimates of the population mean.
2) Interval estimates (confidence intervals) of the mean: using a range of values as your estimate of an unknown quantity. When an interval is accompanied by a specific level of confidence (or probability), it is called a confidence interval.
Both kinds of estimates are determined by the same equation, the difference is that for the point estimates, we'll just compute a single number (that's why it is called a point estimate), but for the interval estimate, we'll compute an interval between two points.
Let's start at the conceptual level. Consider the following population distribution.
Suppose that we guess that the mean is somewhere between 71 and 99. How confident are we in this guess?
Disadvantages of each kind of estimate:
Point estimate: it doesn't convey any sense of how much precision we have in making that estimate.
Interval estimate: we often need to have one specific value, and a range of possible values just may not be enough.
Okay, now let's begin with a point estimate of the mean. What will be the best single estimate of the population mean?
Figure: the population distribution and the distribution of sample means.
However, suppose that all we have is a single sample. Now what is our best guess?
How can we get an estimate where we'd have a better chance of being right? Instead of giving a point estimate, we can estimate an interval.
Okay, now let's formalize things a bit. Let's first talk about the logic of estimation, and then move on to the actual formulas that we'll use.
Okay, so what's the formula? It is the same one (or ones) that we've been using all along, but we do a little algebra to rearrange it so that instead of solving for a z-score, we solve for the population parameter.
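$z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} \;\Rightarrow\; z\sigma_{\bar{X}} = \bar{X} - \mu \;\Rightarrow\; \mu = \bar{X} - z\sigma_{\bar{X}}$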
Step 2: for a point estimate we use the center of the distribution, z = 0, and we see that $\mu = \bar{X}$ is our most reasonable estimate.
Okay, so that's the formula for point estimation. What about for an interval estimation?
Upper bound: $\mu = \bar{X} + z\sigma_{\bar{X}} = 85 + (1.65)(5/\sqrt{25}) = 86.65$
Lower bound: $\mu = \bar{X} - z\sigma_{\bar{X}} = 85 - (1.65)(5/\sqrt{25}) = 83.35$
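The interval computation can be written as a small function. This sketch assumes σ is known (treating the 5 in the example as σ) and uses the hypothetical name `confidence_interval`; the last lines show that z = 1.65 covers roughly the middle 90% of the standard normal distribution, which is the confidence level this interval carries.

```python
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, z):
    """Interval estimate of mu: x_bar plus or minus z standard errors."""
    se = sigma / n ** 0.5
    return x_bar - z * se, x_bar + z * se

low, high = confidence_interval(x_bar=85, sigma=5, n=25, z=1.65)
print(f"{low:.2f} to {high:.2f}")   # 83.35 to 86.65

# Confidence level implied by a given z: the central area between -z and +z.
level = NormalDist().cdf(1.65) - NormalDist().cdf(-1.65)
print(f"confidence level for z = 1.65: {level:.3f}")
```

A larger z (for example 1.96) widens the interval and raises the confidence level, which is the trade-off between precision and confidence described earlier.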