Outline

  • Consider the logic & assumptions of hypothesis testing
  • Discuss H0 & HA
  • Create a distribution of sample means
  • Determine likelihood of obtaining particular samples from populations

Lab 14

Hypothesis Testing and the Distribution of Sample Means

Hypothesis Testing

In the earlier sections of the course we learned to use descriptive statistical procedures to describe distributions (and relationships between distributions). In the remainder of the course we will focus on inferential statistical procedures, which are used to make claims about populations based on data collected from samples.

In today's lab we will begin discussing the inferential procedure of Hypothesis testing. The reasoning of statistical tests is based on asking what would happen if we repeated the experiment over and over again.
    Hypothesis testing is an inferential procedure that uses sample data to evaluate the credibility of a hypothesis about a population.
      Step 1: State the hypotheses and select a criterion for the decision
      Step 2: Collect a sample
      Step 3: Compute a test statistic
      Step 4: Compare the test statistic to a distribution to make an inference about the parameter and hence draw a conclusion about the population

Let's look at each of these steps in more detail.


    Step 1: State the hypotheses and select a criterion for the decision

      - the assumptions are related to the stat test you'll do and we'll talk more about those as we discuss each individual test
      - your hypothesis is an educated guess/prediction about the effect of particular events/treatments/factors (which result in differences between populations)
      - your hypothesis may be general (e.g., this course will change comprehension abilities), or specific (e.g., this course will improve comprehension abilities by at least 10%).
    The standard logic that underlies hypothesis testing is that there are always (at least) two hypotheses: the null hypothesis and the alternative hypothesis

      The null hypothesis (H0) predicts that the independent variable (treatment) has no effect on the dependent variable for the population.

      The alternative hypothesis (Ha) predicts that the independent variable will have an effect on the dependent variable for the population

      The hypothesis testing procedure assumes we are trying to reject the null hypothesis, not trying to prove the alternative hypothesis.

        Why?
          Generally, it is easier to show that something isn't true, than to prove that it is. This is especially true when we are dealing with samples. Remember that we aren't testing every individual in the population, only a subset.

      Think about it this way. Suppose we had a hypothesis that all dogs have 4 legs. To reject this hypothesis, we'd need to have a sample which includes 1 or more dogs with more or fewer than 4 legs. To accept it, we'd need to examine every dog in the population and count their legs. It's much easier to get a sample to show it's wrong than to test the whole population to show that it's correct.

      Example: Suppose that we know that in the US, on average, 30% of registered voters vote in each election. We want to increase that number with an ad campaign that encourages more people to vote. So we conduct the ad campaign before a major election and then record the percentage of registered voters who vote in that election.
        What will our hypotheses be in this case? H0 states that the independent variable will have no effect, so our H0 is that μ = 30% (indicating no effect of the ad campaign). Our Ha is the opposite: that μ will not equal 30%.

        Alternatively, we could make a more specific (directional) alternative hypothesis if we chose. This would change our H0 too. Let's consider the case above where we expect that the ad campaign will INCREASE voting. This means that we expect a higher voting rate for our sample than the rate in the population (30%). Here our Ha is that μ > 30%. That means that our H0 is μ ≤ 30%.


Try some on your own. Each of the following situations calls for a significance test for a population mean μ. State the null hypothesis H0 and the alternative hypothesis Ha in each case.
    (1a) The diameter of a spindle in a small motor is supposed to be 5mm. If the spindle is either too small or too large, the motor will not work properly. The manufacturer measures the diameter in a sample of motors to determine whether the mean diameter has moved away from the target.

    (1b) Census Bureau data show that the mean household income in the area served by a shopping mall is $52,500 per year. A market research firm questions shoppers at the mall. The researchers suspect the mean household income of mall shoppers is higher than that of the general population.

    (1c) The examinations in a large psychology class are scaled after grading so that the mean score is 50. The professor thinks that one teaching assistant is a poor teacher and suspects that his students have a lower mean than the class as a whole. The TA's students this semester can be considered a sample from the population of all students in the course, so the professor compares their mean score with 50.


    So part of the first step is to set up your null hypothesis and your alternative hypothesis (which we did above).

    The other part of this step is to decide what criterion you are going to use to either reject or fail to reject (not accept) the null hypothesis. This is sometimes referred to as setting your α level (that's alpha level).

      So consider the problem that we have. We have a sample and its descriptive statistics are different from the population's parameters. How do we decide whether the difference that we see is due to a "real" difference (which reflects a difference between two populations) or is due to sampling error?

    To deal with this problem the researcher must set a criterion in advance.

        What are the possible real world situations?
          - H0 is correct
          - H0 is wrong
        What are the possible conclusions?
          - H0 is correct
          - H0 is wrong
        So this sets up four possibilities (2 * 2):
          - 2 ways of making mistakes
          - 2 chances to be correct


                                         Actual situation
      Experimenter's conclusion    H0 is correct           H0 is wrong
      Reject H0                    Type I error (oops!)    correct (Yay!)
      Fail to reject H0            correct (Yay!)          Type II error (oops!)

        The two kinds of errors each have their own name, because they really are reflecting different things.

          Type I error (α, alpha) - the H0 is actually correct, but the experimenter rejected it

            - e.g., there really is only one population; the probability of getting a sample like yours from it was really small, but you just happened to get one of those rare samples

          Type II error (β, beta) - the H0 is really wrong, but the experiment didn't give us the evidence we need to reject it

            - e.g., your sample really does come from another population, but your sample mean is so close to the original population mean that you can't rule out the possibility that there is only one population
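The four outcomes described above can be written as a small lookup. This is only an illustrative Python sketch; the function name classify_outcome is made up for this example, and in a real study we never get to see whether H0 is actually true.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Name the outcome of a test, given the (normally unknown) truth about H0."""
    if h0_is_true and rejected_h0:
        return "Type I error"       # rejected an H0 that was actually correct
    if not h0_is_true and not rejected_h0:
        return "Type II error"      # failed to reject an H0 that was actually wrong
    return "correct decision"


# Walk through all 2 * 2 combinations of actual situation and conclusion
for h0_is_true in (True, False):
    for rejected_h0 in (True, False):
        print(h0_is_true, rejected_h0, classify_outcome(h0_is_true, rejected_h0))
```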

      In scientific research, we typically take a conservative approach, and set our criterion such that we try to minimize the chance of making a Type I error (concluding that there is an effect of something when there really isn't). In other words, scientists focus on setting an acceptable alpha level (α), or level of significance.

        The alpha level (α), or level of significance, is a probability value that defines the very unlikely sample outcomes when the null hypothesis is true. Whenever an experiment produces very unlikely data (as defined by alpha), we will reject the null hypothesis. Thus, the alpha level also defines the probability of a Type I error - that is, the probability of rejecting H0 when it is actually true.

          Note: In psychology α is usually set at 0.05
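To see why the alpha level is also the probability of a Type I error, here is a rough simulation sketch in Python. Everything in it is an assumption made for illustration: the population values (mean 100, standard deviation 15), the sample size of 9, and the use of a two-tailed z-test, which this lab has not formally introduced yet. Because every sample is drawn from a population where H0 is true, each rejection is a Type I error.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
MU, SIGMA, N = 100, 15, 9      # hypothetical population and sample size
TRIALS = 20_000

rng = random.Random(1)         # fixed seed so the run is repeatable
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-tailed cutoff, about 1.96
std_error = SIGMA / N ** 0.5

rejections = 0
for _ in range(TRIALS):
    # H0 is true here: every sample really comes from the MU, SIGMA population
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    z = (sum(sample) / N - MU) / std_error
    if abs(z) > z_crit:
        rejections += 1        # we rejected a true H0: a Type I error

type_1_rate = rejections / TRIALS
print(f"proportion of Type I errors: {type_1_rate:.3f}")   # should be close to ALPHA
```

Changing ALPHA to 0.01 should make the rejections correspondingly rarer.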

    (2) A researcher would like to test the effectiveness of a newly developed growth hormone. The researcher knows that under normal circumstances laboratory rats reach an average weight of 1000 grams at 10 weeks of age. When a sample of 10 rats given the hormone is weighed at 10 weeks, their mean weight is 1010 grams.

      (a) Assuming that the growth hormone has no effect, what would a type I error be in this situation?
      (b) Assuming that the growth hormone does have an effect, what would a type II error be in this situation?




Step 2: Collecting your sample

    When we discussed z-scores, we were using z-scores to locate a score or set of scores in the population.

    Now we are dealing with situations in which we are looking not at single scores, but rather at samples of scores. These samples consist of scores that were randomly selected from a population of scores. As we saw in our earlier discussions of probability, random events are predictable in the long run. The following discussion and exercises demonstrate how we use this knowledge within the hypothesis testing framework to make claims about populations from samples.

      Suppose that you take 3 different random samples from the same population. They are probably going to be different from one another. See the figures below for an example of what I mean.



      [Figure: samples drawn from a normal population]

      The samples may have different shapes, different means, and different variability. So how do you figure out what the best estimate of the population mean is?

      There are essentially an infinite number of samples that can be taken from a population if we sample with replacement (put the ones we choose back into the population each time). But the huge set of possible samples forms a simple, orderly, and predictable pattern (a sampling distribution). Because of this, we are able to base our predictions about sample characteristics on the distribution of sample means.

      The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

    In other words, what we want to do is look at all of the possible samples (of a particular size; this part is important) and make predictions based on the properties of all of them. We do this the same way that we've done in the past: we essentially find the average of those properties.



    The Distribution of Sample Means

We can create a distribution of sample means by looking at all possible samples of a certain size (n) and considering the means of each of those samples.

Let's look at a concrete example:

    Consider the following small population of scores: 2, 4, 6, 8

    Because this population is so small we actually can know the mean (and variability): mean = (2+4+6+8)/4 = 5, but suppose that we didn't, and wanted to be able to make an estimate of this population from samples chosen from the population (like we do when we conduct a research study).

      step 1: pick a sample size: for this example we'll pick samples of n = 2

        - we'll talk more about sample size a little later, but typically the bigger your sample size, the more likely that your samples will be similar to one another (and to the population as a whole)

      step 2: Because we selected such a small population, we can actually list all of the possible samples that you could get (sampling with replacement, so the same score can be drawn twice), and look at their distribution.

        Okay, imagine that each score is a number on a tile. Put all four tiles in a bag. To create the first sample, pull out one tile, record the number on the tile, replace the tile in the bag, and then select the second tile (recall our sample size is 2) and record the value on that tile. The table below presents all 16 possible n = 2 samples that could result from this process along with the mean of each sample.

      sample    first score    second score    sample mean
      ____________________________________________________
      1         2              2               2
      2         2              4               3
      3         2              6               4
      4         2              8               5
      5         4              2               3
      6         4              4               4
      7         4              6               5
      8         4              8               6
      9         6              2               4
      10        6              4               5
      11        6              6               6
      12        6              8               7
      13        8              2               5
      14        8              4               6
      15        8              6               7
      16        8              8               8
      
      Distribution of the sample means
        Now let's plot all of the different sample means (and provide a frequency distribution table). This is the distribution of sample means (where scores are means from the samples you chose).

      [Figure: frequency distribution graph of the sample means]

       sample mean    f
       2              1
       3              2
       4              3
       5              4
       6              3
       7              2
       8              1
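The same table of samples and the frequency distribution of their means can be generated mechanically. The following is a minimal Python sketch, assuming (as in the tile example) that samples are ordered pairs drawn with replacement; the variable names are just for illustration.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Every ordered sample of size n drawn with replacement: 4 * 4 = 16 samples
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency table of the sample means
freq = Counter(sample_means)
print("mean\tf")
for m in sorted(freq):
    print(f"{m:g}\t{freq[m]}")
```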

      step 3: Now you're ready to answer questions like: What is the probability of getting a sample with a mean greater than 7? p(sample mean > 7) = ?

        Looking at our distribution of sample means, we find that 1 out of the 16 samples has a mean greater than 7. So that's our answer: 1/16 = .0625 = 6.25%
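As a check on this kind of counting, here is a short Python sketch that computes the proportion of all possible n = 2 samples whose mean exceeds a cutoff. The helper name prob_mean_greater_than is hypothetical; exact fractions are used so the answer is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# Means of all 16 ordered samples drawn with replacement, kept as exact fractions
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

def prob_mean_greater_than(cutoff):
    """Proportion of all possible samples whose mean exceeds the cutoff."""
    hits = sum(1 for m in sample_means if m > cutoff)
    return Fraction(hits, len(sample_means))

p = prob_mean_greater_than(7)
print(p, float(p))   # 1/16 0.0625
```

The same function answers related questions, for example prob_mean_greater_than(5) for the proportion of samples with a mean above 5.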



    Properties of the distribution of sample means

    • Mean:
        the average of all of the sample means will equal the mean of the population.

    • Variability:
        the standard deviation of the distribution of sample means is called the standard error of the mean.

    • Shape:
        the shape of the distribution of sample means tends to be a normal distribution. In fact, when n is large (around 30 or more), the distribution of sample means is almost perfectly normal.

    The expected value of the mean

    Open Dist. of Sample Means and click Begin.

    The top distribution is the population distribution. We'll start with a normally distributed population, which is set as the default. On the left are the descriptive statistics that describe this distribution.


    From this population distribution we can draw random samples.

    (3) Click on the "animated sample" button. This will randomly sample 5 individuals (n = 5) from the population (they'll drop down on to the graph immediately below the population graph). The applet will also compute the mean of this sample and plot the sample mean on the third plot (below the plot of the 5 individuals). In your worksheet, record the mean of the first sample. How does this mean compare to the mean of the population? How much sampling error is there (sampling error is the difference between the population mean and the sample mean)?


Click the "clear lower 3" button.

    (4) Now let's try more samples. Click on the "5 samples" button. This will randomly select 5 samples (each sample will have n=5). The distribution of sample means plot (the third one down) will now have 5 sample means in it. What are the 5 means? What is the mean of the distribution of sample means (the mean of the 5 sample means)? How does this mean of the distribution of sample means compare to the actual population mean?

    Click the "clear lower 3" button.

    (5) Now click the "10,000 samples" button. This will take 10,000 samples (of size n=5) and plot all of the sample means on the distribution of sample means plot. What is the mean of the distribution of sample means? How does this mean of the distribution of sample means compare to the actual population mean?

    (6) State a generalization about the relationship between the population mean and the mean of the distribution of sample means.
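If the applet is not available, the exercises above can be roughly imitated in Python. This is a sketch under stated assumptions, not a copy of the applet: the population mean of 50 and standard deviation of 10 are made-up values, and draw_sample_mean is a hypothetical helper name.

```python
import random
from statistics import mean

POP_MEAN, POP_SD = 50, 10    # hypothetical values; the applet's population may differ
N = 5                        # sample size used in the exercises above

rng = random.Random(14)      # fixed seed so the run is repeatable

def draw_sample_mean():
    """Draw one random sample of size N from a normal population; return its mean."""
    return mean(rng.gauss(POP_MEAN, POP_SD) for _ in range(N))

# One sample, then 5 samples, then 10,000 samples, as in exercises (3) to (5)
for n_samples in (1, 5, 10_000):
    means = [draw_sample_mean() for _ in range(n_samples)]
    print(f"{n_samples:>6} samples: mean of sample means = {mean(means):.2f}")
```

Compare each printed value with POP_MEAN and note how the sampling error changes as more samples are included.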


    Standard Error

    The standard deviation of the distribution of sample means is called the standard error. The standard error is influenced by two factors: the variability of the population (σ) and the sample size (n).

    We'll consider each of these factors below:

      (A) the variability of the population - the bigger the variability of the population, the more variability you'll have in the sample means.

      [Figure: large σ - big differences from the population mean]
      [Figure: small σ - small differences from the population mean]

      (B) the size of the sample - the larger your sample size (n), the more accurately the sample represents the population. This is known as the Law of large numbers.

        think of it this way:

      - If I randomly select 1 score, how accurately will that score predict the population's mean?
      - Suppose that I take 5 scores. Are things more accurate?
      - What about 100 scores?

      These two characteristics are combined in the formula for the standard error.

      standard error = σ / √n (the population standard deviation divided by the square root of the sample size)
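The standard error formula, the population standard deviation divided by the square root of n, can be checked against the small population 2, 4, 6, 8 used earlier. The Python sketch below assumes sampling with replacement (all 16 ordered samples of n = 2); standard_error is just an illustrative function name.

```python
from itertools import product
from math import sqrt
from statistics import pstdev

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    return sigma / sqrt(n)

population = [2, 4, 6, 8]
n = 2
sigma = pstdev(population)               # population standard deviation

# Standard deviation of all 16 possible sample means, computed directly
all_means = [sum(s) / n for s in product(population, repeat=n)]
sd_of_means = pstdev(all_means)

print(f"sigma / sqrt(n)    = {standard_error(sigma, n):.4f}")
print(f"SD of sample means = {sd_of_means:.4f}")
```

The two printed values should agree, which is the sense in which the formula combines population variability and sample size.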

      Now go back to the distribution of sample means applet again (or click below to open it again).

      Dist. of Sample Means

We can use the applet to take samples of different sizes from the same population for comparison. Set the bottom graph to take samples of size 20 (n=20) and to plot the "mean" (change the "none" to mean).
(7) Click on the "1,000 samples" button. This will take 1,000 samples of size n = 5 and 1,000 samples of size n = 20. What are the means of each of these sampling distributions? How do they compare? What are the standard errors (standard deviations of the sampling distributions) for each?
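A comparison like the one in exercise (7) can also be sketched in Python. The population values below (mean 50, standard deviation 10) are assumptions chosen for illustration and will not match the applet's numbers; simulate_sample_means is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import mean, pstdev

POP_MEAN, POP_SD = 50, 10     # hypothetical population values
N_SAMPLES = 1_000

rng = random.Random(7)        # fixed seed so the run is repeatable

def simulate_sample_means(n, n_samples=N_SAMPLES):
    """Return the means of n_samples random samples of size n."""
    return [mean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
            for _ in range(n_samples)]

results = {}
for n in (5, 20):
    means = simulate_sample_means(n)
    results[n] = (mean(means), pstdev(means))   # (mean, simulated standard error)
    print(f"n = {n:>2}: mean = {results[n][0]:.2f}, "
          f"simulated SE = {results[n][1]:.2f}, "
          f"sigma/sqrt(n) = {POP_SD / sqrt(n):.2f}")
```

Both sampling distributions should center near the population mean, while the larger sample size gives the smaller standard error.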


    The shape of the distribution of sample means.

    Open up the distribution of sample means applet again.

    Dist. of Sample Means

    We can change the shape of the population distribution.

    (8) Change the "normal" option to "skewed." As in the exercise above, sample two different sizes, n = 5 & n = 20. Click the "1,000 samples" button. How do the shapes of these distributions look? Are they skewed or fairly symmetrical? Which appears to be closer to Normal (hint: you can click the "fit normal" boxes to overlay a normal distribution)?


    Central limit theorem

    All of these properties (shape, mean, variability) are covered in the Central Limit Theorem.
      Central Limit Theorem: For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will approach a normal distribution with a mean of μ and a standard deviation of σ/√n as n approaches infinity.
        Note: for practical purposes this holds true for n > 30 (that is, for samples larger than n = 30).
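The shape claim in the theorem can be illustrated with a simulation. The Python sketch below assumes an exponential population as a stand-in for a "skewed" population (the applet's skewed distribution may be different) and uses a simple skewness measure, where 0 means symmetric; both helper names are made up for this example.

```python
import random
from statistics import mean, pstdev

rng = random.Random(3)        # fixed seed so the run is repeatable

def skewness(values):
    """Mean cubed z-score: 0 for a symmetric shape, positive for a right skew."""
    m, sd = mean(values), pstdev(values)
    return mean(((v - m) / sd) ** 3 for v in values)

def sample_means(n, n_samples=5_000):
    """Means of random samples of size n from a skewed (exponential) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(n_samples)]

skew_by_n = {n: skewness(sample_means(n)) for n in (5, 20)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of the sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is what "approaches a normal distribution" looks like in practice.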


Using the Distribution of Sample Means to Determine Sample Likelihood

    Often we are not concerned with where a single individual is in a distribution, but rather where a sample is in the distribution of sample means. This tells us how likely we are to get that sample from a specific population. We can do this with the z-score distribution (and the unit normal table) if we know that the distribution of sample means is normal. (Remember that using the Central Limit Theorem, we know the distribution of sample means is normal if n is greater than 30 OR the population is normal.)

      Example:
        Consider the following situation. An instructor is interested in the IQ of her students. She has 9 students in her class and thinks that they are, on the average, really smart. What is the probability that the group of students has a mean greater than or equal to 112?

        In other words, we don't want to know the probability of each individual having a score of 112 or better separately. Instead we want to know as a group, what is the probability of getting an average score of 112 or better.

          We need to start by getting the population parameters
            for the standardized IQ test: mean = 100, standard deviation = 15

          Next we need to get the mean and standard deviation of the distribution of the samples (note: we'll assume a normal distribution because the original population distribution of IQ scores is normally distributed) so that we can calculate the z-score.

          mean of the distribution of sample means = μ = 100 (because the population mean is 100).

          standard error = σ/√n = 15/√9 = 15/3 = 5

          Now we need to figure out the z-score that corresponds to this sample mean. The z-score formula pretty much looks like what we've used before (except now we're locating a sample in the distribution of sample means rather than finding a single score in a population): z = (sample mean - μ) / standard error

            so for our example:

            P(sample mean ≥ 112) = P(z ≥ (112 - 100)/5) = P(z ≥ 2.4) = 0.0082

            In other words, the probability that we'll get a sample of size n = 9 students with an average IQ equal to or greater than 112 is very small (0.0082). In our next labs we will extend this result to make claims concerning hypotheses about our population and our sample.
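The same calculation can be done without a unit normal table. This is a minimal Python sketch using the standard library's NormalDist class (available in Python 3.8 and later); the variable names are only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15      # population parameters for the IQ test
n = 9                    # class size
sample_mean = 112

std_error = sigma / sqrt(n)                 # 15 / 3 = 5
z = (sample_mean - mu) / std_error          # (112 - 100) / 5 = 2.4
p = 1 - NormalDist().cdf(z)                 # area above z in the unit normal

print(f"z = {z:.2f}, P(sample mean >= {sample_mean}) = {p:.4f}")
```

Setting n = 1 in this sketch gives the probability for a single individual instead, which shows how much the group size matters.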

          Does this answer make sense? Let's look at the pictures of our distributions.

          Population distribution
          - at first it looks wrong

          - it seems like 112 should be less than a z = 1, because 115 is where z should equal 1

          Distribution of Sample means
          - however, we must remember that this isn't the correct distribution to be looking at, we need to look at the distribution of sample means.

          - we know that the distribution of sample means has a standard error = 5 and a mean = 100.

          - So 112 should have a z >2

      Let's look at a different kind of example.

        Example:
          How high a mean would a group of 25 have to have on IQ to be in the top 10% of the IQ distribution for groups of this size?
            First we need to get the mean and standard deviation (i.e., standard error) for the distribution of the samples
              population mean = 100

              standard error = σ/√n = 15/√25 = 15/5 = 3

            Now we need to work backwards because we don't know the z-score. We can determine the z-score for the range based on the portion of the distribution we're looking for. We want the top 10% of the distribution. This corresponds to a proportion of .1000 for the distribution. If we look in the unit normal table, we find that .1000 corresponds to a z-score of 1.28 (that's as close to .1000 as we can get). You can verify this by looking at the unit normal table.

            So for our example:

              step 1: look at unit normal table for 10%

              step 2: work backwards through the z-score formula to solve for x

              x = z * stderr + population mean = (1.28)(3) + 100 = 103.84

            so, for a group of 25 people, they'd have to have a mean of just under 104 to be in the top 10%
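Working backwards from a proportion to a sample mean can also be sketched in Python. The example below uses NormalDist.inv_cdf from the standard library in place of the unit normal table, so the z-score is slightly more precise than the table value of 1.28; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15      # population parameters for the IQ test
n = 25                   # group size

std_error = sigma / sqrt(n)                  # 15 / 5 = 3
z = NormalDist().inv_cdf(0.90)               # z-score with 10% of the area above it
cutoff = mu + z * std_error                  # x = z * standard error + population mean

print(f"z = {z:.2f}, cutoff mean = {cutoff:.2f}")
```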

          (9) Suppose we think that listening to classical music will affect the amount of time it takes a person to fall asleep so we conduct a study to test this idea.
            (a) Suppose that the average person in the population falls asleep in 15 minutes (without listening to classical music), with σ = 6 min. State the null and alternative hypotheses for this study.

            (b) Assume that the amount of time it takes people in the population to fall asleep is normally distributed. In the study we have a sample of people listen to classical music and then we measure how long it takes them to fall asleep. Suppose the sample of 36 people falls asleep in an average of 12 minutes. What is the probability of obtaining a sample mean of 12 minutes or smaller?

    Next time: Finishing up hypothesis testing, steps 3 and 4