Psy138

Outline

  • Consider the logic of Hypothesis
    Testing
  • Discuss H0 and H1
  • Consider assumptions of Hypothesis
    Testing

Lab 14

Inferential statistics

Hypothesis testing

Download Lab 14 Worksheet Tutorial 14

 

In the last lab we learned to use the inferential statistical procedure of estimation. In this lab (and the next) we'll learn about a related inferential procedure: hypothesis testing.

    Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population.

    Hypothesis testing - the big picture view (more details will follow)

      Step 1: State assumptions, make a hypothesis, and select a criterion for the decision
        - the assumptions are related to the stat test you'll do and we'll talk more about those as we discuss each individual test
        - your hypothesis is an educated guess/prediction about the effect of particular events/treatments/factors (which result in differences between populations)
        - your hypothesis may be general (e.g., this course will change comprehension abilities), or specific (e.g., this course will improve comprehension abilities by at least 10%).

      Step 2: Collect a sample

        - randomly select individuals from a population
        - randomly assign selected individuals to specific treatment groups
        - after the treatment, the question, roughly, is this: are all of our individuals still in the same population, or do some of them now belong to a new population because of our treatment?

      Step 3: Compute a test statistic

        - things like z-scores, t-tests, F-tests (ANOVA)

      Step 4: Compare the test statistic to a distribution to make an inference about the population parameter, and hence draw a conclusion about the population

        - roughly, how likely is this difference due to sampling error? Given this probability, what should we conclude?

    The reasoning of statistical tests, like that of confidence intervals, is based on asking what would happen if we repeated the experiment over and over again.
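    A minimal Python sketch of that "repeat it over and over" idea is below (standard library only). It assumes a normal population with μ = 65 and σ = 10 and samples of n = 25, the same numbers used in an example later in this lab; the helper name `simulate_sample_means`, the seed, and the 10,000 repetitions are arbitrary choices for illustration. The sample means pile up around μ, and their spread comes out close to the standard error σ/√n = 2.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from a normal population
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(mu=65, sigma=10, n=25, reps=10_000)
print(round(statistics.fmean(means), 2))  # close to mu = 65
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n) = 2
```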

    Let's look at each of these steps in more detail.

      Step 1: Make a hypothesis and select a criterion for the decision. The standard logic that underlies hypothesis testing is that there are always (at least) two hypotheses: the null hypothesis and the alternative hypothesis.

        The null hypothesis (H0) predicts that the independent variable (treatment) has no effect on the dependent variable for the population.

        The alternative hypothesis (Ha, also written H1) predicts that the independent variable will have an effect on the dependent variable for the population.

        The hypothesis testing procedure assumes we are trying to reject the null hypothesis, not trying to prove the alternative hypothesis.

          Why?
            Generally, it is easier to show that something isn't true than to prove that it is. This is especially true when we are dealing with samples. Remember that we aren't testing every individual in the population, only a subset.

        Think about it this way. Suppose we had a hypothesis that all dogs have 4 legs. To reject this hypothesis, we'd need to have a sample which includes 1 or more dogs with more or fewer than 4 legs. To accept it, we'd need to examine every dog in the population and count their legs. It's much easier to get a sample to show it's wrong than to test the whole population to show that it's correct.

      Example: Suppose that we know that in the US, on average, 30% of registered voters vote in each election. We want to increase that number with an ad campaign that encourages more people to vote. So we conduct the ad campaign before a major election and then record the percentage of registered voters who vote in that election.

        What will our hypotheses be in this case? H0 states that the independent variable will have no effect, so our H0 is that μ = 30% (indicating no effect of the ad campaign). Our H1 is the opposite: that μ ≠ 30%.

        Alternatively, we could make a specific (directional) alternative hypothesis if we chose to. This would change our H0 too. Let's consider the specific case above, where we expect the ad campaign to INCREASE voting. This means that we expect a higher voting rate for our sample than for the population (30%). Here our Ha is that μ > 30%. That means that our H0 is μ ≤ 30%.


Try some on your own. Each of the following situations calls for a significance test for a population mean μ. State the null hypothesis H0 and the alternative hypothesis Ha in each case.

    (1) The diameter of a spindle in a small motor is supposed to be 5 mm. If the spindle is either too small or too large, the motor will not work properly. The manufacturer measures the diameter in a sample of motors to determine whether the mean diameter has moved away from the target.

    (2) Census Bureau data show that the mean household income in the area served by a shopping mall is $52,500 per year. A market research firm questions shoppers at the mall. The researchers suspect the mean household income of mall shoppers is higher than that of the general population.

    (3) The examinations in a large psychology class are scaled after grading so that the mean score is 50. The professor thinks that one teaching assistant is a poor teacher and suspects that his students have a lower mean than the class as a whole. The TA's students this semester can be considered a sample from the population of all students in the course, so the professor compares their mean score with 50.


So part of the first step is to set up your null hypothesis and your alternative hypothesis (which we did above).

The other part of this step is to decide what criterion you are going to use to either reject or fail to reject (not accept) the null hypothesis. This is sometimes referred to as setting your α level (that's alpha level).

    So consider the problem that we have. We have a sample and its descriptive statistics are different from the population's parameters. How do we decide whether the difference that we see is due to a "real" difference (which reflects a difference between two populations) or is due to sampling error?

To deal with this problem, the researcher must set a criterion in advance.

    For example, think of the kinds of questions we were doing in earlier labs.

    Given a population with μ = 65 and σ = 10, what is the probability that our sample (of size n = 25) will have a mean of 70 or more?

    To figure this out we computed the standard error and then a z-score.

    $p(\bar{X} \ge 70)$: First we need the standard error, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 10/\sqrt{25} = 2$. So $z = (70 - 65)/2 = 2.5$, and $p(\bar{X} \ge 70) = p(z \ge 2.5) = 0.0062$.
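    The same arithmetic can be checked in Python. This is a sketch that uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of the unit normal table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 65, 10   # population mean and standard deviation
n, xbar = 25, 70     # sample size and the sample mean we ask about

se = sigma / sqrt(n)               # standard error of the mean: 10 / 5 = 2.0
z = (xbar - mu) / se               # (70 - 65) / 2 = 2.5
p_upper = 1 - NormalDist().cdf(z)  # p(z >= 2.5), the upper-tail area

print(se, z, round(p_upper, 4))    # 2.0 2.5 0.0062
```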

We're going to be asking the same questions here, but taking it a step further and saying things like, "Gee, the probability that my sample has a mean of 70 or higher is 0.0062. That's pretty small. I'll bet that my sample isn't really from this population, but is instead from another population."

Setting a criterion in advance is concerned with the part about saying "that's pretty small". When we set the criterion in advance, we are essentially deciding how small a chance is small enough to reject the null hypothesis, or, in other words, how big a difference we need to see to reject the null hypothesis. This cutoff p value is called alpha (α).

    Note: often alpha is determined by convention within your own discipline. For example, some fields may say that p < 0.05 is low enough to reject H0, while other fields may choose α = 0.01 (p < 0.01).
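    As a rough sketch, the decision rule is just a comparison of p with the chosen α. The helper `reject_h0` below is a hypothetical name; it reuses the p value of about 0.0062 from the example above and shows that the same result can lead to different decisions under different conventions.

```python
from statistics import NormalDist

def reject_h0(p_value, alpha):
    """Decision rule: reject H0 when the p value is smaller than alpha."""
    return p_value < alpha

p = 1 - NormalDist().cdf(2.5)  # p(z >= 2.5), about 0.0062

for alpha in (0.05, 0.01, 0.001):
    decision = "reject H0" if reject_h0(p, alpha) else "fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

    Here p is small enough to reject H0 at α = .05 or .01, but not under a much stricter (hypothetical) α = .001.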


Now let's look at some examples of this procedure (like the ones we did in Lab 13) with our new context of how small p is.

(4) A bottling company uses a filling machine to fill plastic bottles with cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with μ = 298 ml and a standard deviation σ = 3 ml. What is the probability that the mean contents of the bottles in a six-pack is less than 295 ml? How small is this probability (i.e., do you think it is very likely that a sample of 6 bottles would have average contents of less than 295 ml)?

(5) IQ scores for the general population form a normal distribution with μ = 100 and σ = 15. However, there are data indicating that children's intelligence can be affected if their mothers had German measles during pregnancy. Using hospital records, a researcher obtained a sample of n = 20 school children whose mothers all had German measles during their pregnancies. The average IQ for this sample was 97.3. What is the probability that this sample came from the general population described in the first sentence [Hint: you're looking for p(X̄ < 97.3)]? Use alpha = .05, and assume that p < alpha is low enough to reject H0 and conclude that the sample is different because the children's mothers had German measles. Do you think that there is enough evidence here to decide that the sample came from a different population than the general one described above?


That's the big picture of setting the criteria, now let's look at the details.

    What are the possible real world situations?
      - H0 is correct
      - H0 is wrong
    What are the possible conclusions?
      - H0 is correct
      - H0 is wrong
    So this sets up four possibilities (2 * 2):
      - 2 ways of making mistakes
      - 2 chances to be correct

                                Actual situation
Experimenter's conclusion | H0 is correct          | H0 is wrong
Reject H0                 | Type I error (oops!)   | correct (Yay!)
Fail to reject H0         | correct (Yay!)         | Type II error (oops!)
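The four cells of the table can also be written as a tiny lookup function. This is only an illustration; `decision_outcome` is a made-up name, and in a real study we never know the first argument (whether H0 is actually true).

```python
def decision_outcome(h0_is_true, rejected_h0):
    """Return the cell of the 2 x 2 decision table for one study."""
    if rejected_h0:
        # Rejecting a true H0 is a Type I error; rejecting a false H0 is correct.
        return "Type I error" if h0_is_true else "correct"
    # Failing to reject a true H0 is correct; failing to reject a false H0 is a Type II error.
    return "correct" if h0_is_true else "Type II error"

for h0_is_true in (True, False):
    for rejected_h0 in (True, False):
        print("H0 true:", h0_is_true, "| rejected H0:", rejected_h0,
              "->", decision_outcome(h0_is_true, rejected_h0))
```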

    The two kinds of errors each have their own name, because they really are reflecting different things.

      Type I error (α, alpha) - the H0 is actually correct, but the experimenter rejected it

        - e.g., there really is only one population; the probability of getting such an extreme sample was really small, but you just happened to get one of those rare samples

      Type II error (β, beta) - the H0 is really wrong, but the experiment didn't give us the evidence we need to reject it

        - e.g., your sample really does come from another population, but your sample mean is so close to the original population mean that you can't rule out the possibility that there is only one population

In scientific research, we typically take a conservative approach and set our criterion so as to minimize the chance of making a Type I error (concluding that there is an effect of something when there really isn't). In other words, scientists focus on setting an acceptable alpha level (α), or level of significance.

    The alpha level (α), or level of significance, is a probability value that defines the very unlikely sample outcomes when the null hypothesis is true. Whenever an experiment produces very unlikely data (as defined by alpha), we will reject the null hypothesis. Thus, the alpha level also defines the probability of a Type I error - that is, the probability of rejecting H0 when it is actually true.

      Note: In psychology, α is usually set at 0.05.
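The claim that α is the probability of a Type I error can be checked by simulation. The sketch below assumes a two-tailed z test and a population where H0 is true (borrowing μ = 100, σ = 15, n = 20 from problem (5)); it runs many simulated experiments and counts how often H0 is wrongly rejected. The function name `type_i_error_rate`, the seed, and the number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu, sigma, n, alpha, reps=20_000, seed=1):
    """Simulate `reps` experiments in which H0 is TRUE (every sample really
    comes from the null population) and return the fraction of experiments
    in which a two-tailed z test at level `alpha` rejects H0."""
    rng = random.Random(seed)                     # reproducible random numbers
    se = sigma / sqrt(n)                          # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    rejections = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        z = (xbar - mu) / se
        if abs(z) >= z_crit:                      # sample landed in a critical region
            rejections += 1
    return rejections / reps

rate = type_i_error_rate(mu=100, sigma=15, n=20, alpha=0.05)
print(rate)  # should be close to 0.05
```

With α = .05 the rejection rate should come out near 0.05; changing `alpha` to 0.01 should bring it near 0.01.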

Let's look at pictures of distributions to try and connect this with what we've been talking about so far.

Consider the following sample mean distributions (figures not shown). In each, the shaded critical region has a total area of α, the probability of making a Type I error.

    General (non-directional) alternative hypothesis
      H0: no difference    H1: there is a difference
      Two-tailed test, α = 0.05, so 0.025 is in each tail (0.025 + 0.025 = 0.05)

    Specific (directional) alternative hypothesis
      H0: no difference
      H1: there is a difference, and the new group should have a higher mean
      One-tailed test, α = 0.05, so all 0.05 is in the upper tail

So how do we interpret these graphs?

    If our sample mean falls into the shaded areas, then we reject H0. On the other hand, if our sample mean falls outside of the shaded areas, then we fail to reject H0. These shaded regions are called the critical regions. This is the same thing as comparing p with alpha, since the total area of the shaded regions equals the proportion set by alpha.

      The critical region is composed of extreme sample values that are very unlikely to be obtained if the null hypothesis is true. The size of the critical region is determined by the alpha level. Sample data that fall in the critical region will warrant the rejection of the null hypothesis.
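      Instead of shading areas by eye, the boundaries of the critical regions can be computed from α. The sketch below assumes a standard normal (z) distribution and uses the standard library's `statistics.NormalDist.inv_cdf`; `critical_z` is a hypothetical helper name, not part of any library.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Return the z cutoff(s) that bound the critical region.
    tails=2: alpha is split equally between the two tails.
    tails=1: all of alpha is in the upper tail (H1 predicts an increase)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == 2:
        cut = std_normal.inv_cdf(1 - alpha / 2)
        return (-cut, cut)
    if tails == 1:
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tails must be 1 or 2")

print(critical_z(0.05, tails=2))  # approximately (-1.96, 1.96)
print(critical_z(0.05, tails=1))  # approximately 1.645
```

      A sample mean whose z-score lies beyond these cutoffs falls in the critical region, which is the same decision as finding p < α.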


(6) Suppose we think that listening to classical music will affect the amount of time it takes a person to fall asleep so we conduct a study to test this idea.

    (a) Suppose that the average person in the population falls asleep in 15 minutes (without listening to classical music), with σ = 6 min. State the null and alternative hypotheses for this study.

    (b) Assume that the amount of time it takes people in the population to fall asleep is normally distributed. In the study, we have a sample of people listen to classical music, and then we measure how long it takes them to fall asleep. Suppose the sample of 36 people falls asleep in an average of 12 minutes. What is the probability of obtaining a sample mean of 12 minutes or smaller? Assuming α = .05, is your sample mean in the critical region (Hint: remember to consider two critical regions)?

    (c) Using your answer to part (b), what decision should be made about the null hypothesis you stated in part (a)?

    (d) Assume now that in reality classical music does not affect how long it takes people to fall asleep. In this case, what kind of decision (correct, Type I error or Type II error) have you made in part (c)?

(7) A developmental psychologist believes that a new technique can help kids learn math skills faster than the current technique. He measures math skills with a standardized math skills test. It is known that the population of 5th graders in the US scores an average of 80 on this test. The psychologist uses the new technique on a sample of 5th graders for one year and then has them take the standardized test at the end of the year to compare their scores with the population mean.

    (a) What are the researcher's null and alternative hypotheses (Hint: remember that he believes the new technique will increase scores)?

    (b) Suppose the psychologist calculated the z-score for his sample mean and found that there is a .0890 chance of getting a sample mean that large or larger. If his alpha level is .05, is his sample in the critical region? If his new technique really does have an effect, what kind of decision will he make for his test?


One-sample z test

Assumptions of the test (and most hypothesis testing)

    1) Random sample - the samples must be representative of the populations. Random sampling helps to ensure representativeness.
    2) Independent observations - also related to the representativeness issue, each observation should be independent of all of the other observations. That is, the probability of a particular observation happening should remain constant, regardless of the other observations.
    3) σ is known and is constant - the standard deviation of the original population must stay constant. Why? The treatment is assumed to add (or subtract) a constant to every individual in the population. So the mean of that population may change as a result of the treatment; however, recall that adding (or subtracting) a constant to every individual does not change the standard deviation.
    4) the sampling distribution is relatively normal - either because the distribution of the raw observations is relatively normal, or because of the Central Limit Theorem (or both).
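Assumption 3 relies on the fact that adding a constant shifts the mean but leaves the standard deviation alone. A quick check in Python, using made-up scores and a hypothetical treatment effect of +5:

```python
from statistics import fmean, pstdev

scores = [12, 15, 9, 20, 14]       # made-up raw scores
treated = [x + 5 for x in scores]  # hypothetical treatment: add 5 to everyone

print(fmean(scores), fmean(treated))                        # 14.0 19.0 (mean shifts by 5)
print(round(pstdev(scores), 3), round(pstdev(treated), 3))  # 3.633 3.633 (SD unchanged)
```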

Violations of any of these assumptions will severely compromise any conclusions that you make about the population based on your sample (basically, you need to use other kinds of inferential statistics that can deal with violations of the various assumptions).

So far we've been discussing the logic of our Hypothesis Testing procedure. In this lab, we're going to cover one type of test statistic and put all the steps of hypothesis testing together. We're going to conduct hypothesis testing using the one-sample z-test. We've already covered the logic of how this works, but now we'll make it more formal as an inferential test statistic.

Let's quickly recall the decision tree that we saw earlier in the semester.

Find the string of decisions that lead to a 1-sample z-test.

The one-sample z-test is used to compare a single sample to a known population mean μ when we know:
(a) that the distribution of sample means is normal
AND
(b) the population standard deviation σ.
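Putting the pieces together, a one-sample z test fits in a short function. This is a sketch, not a library API: `one_sample_z_test` and its `tail` argument are hypothetical names, and it assumes the conditions listed above (known σ, normal sampling distribution). The example call reuses the earlier μ = 65, σ = 10, n = 25 question.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu, sigma, n, tail="two"):
    """One-sample z test (sketch).

    xbar  -- sample mean
    mu    -- population mean under H0
    sigma -- known population standard deviation
    n     -- sample size
    tail  -- "greater" (Ha: mean > mu), "less" (Ha: mean < mu),
             or "two" (Ha: mean != mu)
    Returns (z, p_value).
    """
    se = sigma / sqrt(n)          # standard error of the mean
    z = (xbar - mu) / se
    cdf = NormalDist().cdf(z)     # area below z in the unit normal
    if tail == "greater":
        p = 1 - cdf
    elif tail == "less":
        p = cdf
    elif tail == "two":
        p = 2 * min(cdf, 1 - cdf)  # area in both tails
    else:
        raise ValueError('tail must be "greater", "less", or "two"')
    return z, p

z, p = one_sample_z_test(xbar=70, mu=65, sigma=10, n=25, tail="greater")
print(z, round(p, 4))  # 2.5 0.0062
```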

Let's look at a complete example using our hypothesis testing steps:

    Suppose we were interested in whether the number of hours students spend studying differs by class status (i.e., freshmen, sophomores, juniors, seniors). Specifically, we want to know if seniors spend more time studying than the average college student of any year. We know that the general population of college students in the US spends an average of 4 hours a day studying for their courses, with σ = 1 hour. To answer our question, we'll ask a sample of 50 seniors how much time they spend studying per day.

    In this study we are comparing a sample to a known population μ and σ. We also know that the distribution of sample means will be (approximately) normal because our sample size is greater than 30 (the Central Limit Theorem). That means we can use our z-score procedure to conduct this test.

      Step 1: State hypotheses and decision criterion.

      For Ha: μ_seniors > 4
      (because we are predicting that seniors study MORE)

      For H0: μ_seniors ≤ 4
      (because these are the other possibilities for the comparison)

      Since our Ha is a directional hypothesis (we're only predicting an increase in study hours), we'll have a one-tailed test because we'll only need to consider the critical region above the null population mean.

      We also need to set our alpha level for our decision criterion. We'll use α = .05. This is the largest probability of getting a sample mean this extreme (assuming H0 is true) that we'll still accept as evidence against H0.

      Step 2: Collect sample data

      In this step, we actually collect our data. Suppose we asked the 50 seniors we selected for our sample how many hours a day they study, and the sample mean was X̄ = 4.4 hours. With our sample mean, we're ready to move on to Step 3 and calculate our z-score test statistic.

      Step 3: Calculate test statistic

      The first thing we need to do here is to calculate the standard error, $\sigma_{\bar{X}}$. We'll have:

      $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 1/\sqrt{50} = 1/7.07 \approx 0.14$

      Then we're ready to calculate z:

      $z = (\bar{X} - \mu)/\sigma_{\bar{X}} = (4.4 - 4)/0.14 \approx 2.86$

      This is our test statistic. We'll need to know the probability of getting a sample mean this large or larger so we need to find z = 2.86 in the unit normal table.

      We find that the probability of a sample mean this large or larger is .0021. Now it's time to do our last step to make our decision.
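      Step 3 and the p value lookup can be reproduced in Python, with `statistics.NormalDist` standing in for the unit normal table (a sketch; the variable names are illustrative). It computes z twice, once with the rounded standard error of .14 used above and once with the unrounded value, because rounding shifts the result slightly.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 4, 1    # population: 4 hours a day, sigma = 1 hour
n, xbar = 50, 4.4   # 50 seniors with a sample mean of 4.4 hours a day

se_exact = sigma / sqrt(n)  # about 0.1414
se_rounded = 0.14           # the rounded value used in the worked example

for label, se in (("rounded SE", se_rounded), ("exact SE", se_exact)):
    z = (xbar - mu) / se
    p = 1 - NormalDist().cdf(z)  # one-tailed: p(sample mean >= 4.4) under H0
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

      With the rounded standard error this reproduces z ≈ 2.86 and p ≈ .0021; the unrounded standard error gives z ≈ 2.83 and p ≈ .0023. Either way p is well below α = .05, so the decision in Step 4 is the same.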

      Step 4: Make a decision about H0

      We need to determine where our test statistic (z = 2.86) falls in the distribution by comparing its p value with alpha. Is it in the critical region? Yes: its p value (.0021) is lower than .05, so it falls in the shaded region (equivalently, z = 2.86 is beyond the one-tailed critical value of about 1.645). This means there is only about a .0021 chance that we'd get a sample mean of 4.4 or larger if seniors were the same as the general population. Since that's less than .05 (our decision criterion), it's low enough that we can decide that seniors come from a different population, with a different mean than the general population of college students. This also means we have enough evidence to reject H0 and accept Ha, that seniors study more than the general population of college students. So our decision is:

      Reject the null hypothesis and accept the alternative hypothesis.

      Of course remember that there's still a chance (less than 5%) that we made a Type I error, but we're reasonably sure we made the right decision.


One last note about statistical significance: Remember from the last lab that the larger the sample size, the more likely we are to detect an effect that exists, because larger samples make us more likely to reject the null hypothesis. However, this also means that with large sample sizes, even if the effect is really small, we're more likely to reject the null and decide there's an effect. Therefore, with very large samples we may detect effects that lack practical significance because they are too small to matter. When we find statistical significance, we must be careful to check that our findings also have practical significance, meaning the effect of the treatment is large enough to be important.
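To see why sample size matters here, the sketch below holds a tiny, hypothetical effect fixed (seniors studying 4.02 hours a day instead of 4, about one extra minute) and recomputes the one-tailed p value for larger and larger samples. The numbers are invented for illustration, and `upper_tail_p` is a made-up helper name.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p(xbar, mu, sigma, n):
    """One-tailed p value, p(sample mean >= xbar), for a one-sample z test."""
    z = (xbar - mu) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

# Hypothetical tiny effect: 0.02 hours (about 1 minute) more per day,
# using the same population as the seniors example (mu = 4, sigma = 1).
for n in (50, 500, 50_000):
    p = upper_tail_p(xbar=4.02, mu=4, sigma=1, n=n)
    print(n, round(p, 4))
```

With n = 50 the p value is far above .05, but with n = 50,000 it falls far below .05, even though a one-minute difference is unlikely to matter in practice.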

Ok, now try some on your own. For problems (8) and (9), write out each step of hypothesis testing.

(8) A psychologist examined the effect of chronic alcohol abuse on memory. In this study, a standardized memory test was used. Scores on this test for the general population form a normal distribution with μ = 50 and σ = 6. A sample of n = 22 alcohol abusers had a mean score of X̄ = 47. Is there evidence for memory impairment among alcoholics? Use α = .05 for a one-tailed test.

(9) On a vocational interest inventory that measures interest in several categories, a very large standardization group of adults (i.e., a population) has an average score of μ = 22 and σ = 4. Scores are normally distributed. A researcher would like to determine if scientists differ from the general population in terms of writing interests. A random sample of scientists is selected from the directory of a national science society. The scientists are given the inventory, and their test scores on the literary scale are as follows: 21, 20, 23, 28, 30, 24, 23, 19. Do scientists differ from the general population in their writing interests? Test at the .05 level of significance for two tails.