Outline

  • construct confidence intervals

Lab 23

Confidence intervals (estimation)

So far we've talked about one way to use inferential statistics: hypothesis testing. In this lab, we're going to look at another way to use inferential statistics: estimating the population mean from a sample.

    Estimation is an inferential procedure in which we make an educated guess about the value of a population parameter.

    This lab will focus on the basic logic of estimation using the tests we've talked about this semester (i.e., z-test and t-tests).

When do we use estimates?

    (1) You want some basic information about a population, but you can't measure the whole group, so you take a sample instead.
    (2) You already know that there is an effect, but you want to know how big it is.
    (3) You have done a hypothesis test and rejected H0.
      "So we reject the claim that the treatment makes no difference, but we still want to know how much of a difference there is."

We'll focus on two kinds of estimates of the population mean.

    (1) point estimates of the mean: using a single number as your estimate of an unknown quantity

    (2) interval estimates (confidence intervals) of the mean: using a range of values as your estimate of an unknown quantity. When an interval is accompanied by a specific level of confidence (or probability), it is called a confidence interval.

Both kinds of estimates are determined by the same equation. The difference is that for a point estimate we compute a single number (that's why it is called a point estimate), whereas for an interval estimate we compute two points that bound an interval.

Let's start at the conceptual level. Consider the following population distribution.

    Suppose we guess that the mean is 85. How confident are we in this guess?

    Suppose we guess that the mean is somewhere between 71 and 99. How confident are we in this guess?

      You should feel more confident about the range than about the single value. This difference corresponds to the difference between point and interval estimation.


      Estimate            Disadvantage
      Point estimate      It doesn't convey any sense of how much precision we have in making that estimate.
      Interval estimate   We often need one specific value; a range of possible values just may not be enough.

So what do we mean by confidence?

Remember what a confidence interval is: an interval estimate of the population mean based on the data from a sample. A 90% confidence level means that if we drew many samples and built an interval from each one in the same way, about 90% of those intervals would contain the actual population mean (and, of course, about 10% of them would not).
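The repeated-sampling meaning of "90% confident" can be checked with a small simulation. The Python sketch below is illustrative only: it assumes a Normal population with a known mean and standard deviation (the values $\mu = 85$, $\sigma = 5$, and n = 25 are hypothetical), builds a z-based 90% interval from each simulated sample, and counts how often the interval captures $\mu$. The function name `coverage_of_z_intervals` is made up for this example.

```python
import random
import statistics


def coverage_of_z_intervals(mu, sigma, n, z_crit, trials, seed=0):
    """Simulate `trials` samples of size n from a Normal(mu, sigma) population and
    return the fraction of intervals xbar ± z_crit * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    se = sigma / n ** 0.5              # standard error of the mean
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Count this sample's interval if it captures the true mean.
        if xbar - z_crit * se <= mu <= xbar + z_crit * se:
            hits += 1
    return hits / trials


# Hypothetical population and design: mu = 85, sigma = 5, n = 25, 90% intervals (z = 1.65)
rate = coverage_of_z_intervals(mu=85, sigma=5, n=25, z_crit=1.65, trials=10_000)
print(f"{rate:.3f} of the simulated 90% intervals contain mu")
```

The printed fraction should land close to 0.90. Any single interval either contains $\mu$ or it doesn't; the 90% describes how the procedure performs over many samples.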

Consider a point estimate of the mean. What will be the best single estimate of the population mean?

    If we have access to all possible random samples, then our best estimate is the mean of the distribution of sample means. (Recall that the population mean is equal to the mean of the distribution of sample means).

    [Figure: two distributions, labeled "population" and "sample means"]

    However, suppose that all we have is a single sample. Now what is our best guess?

      The sample mean. So how good is it?
        (1) It is the only piece of evidence that we have, so it is our best guess.
        (2) Recall that most sample means will be pretty close to the population mean, so there is a good chance that our sample mean is close.

    How can we get an estimate where we'd have a better chance of being right? Instead of giving a point estimate, we can estimate an interval.

      Again, consider the distribution of sample means. If we think in terms of z-scores and pick a range of ±1 z-unit around the population mean, about 68% of the possible sample means fall within that range. Turning this around, for about 68% of samples the population mean lies within ±1 z-unit of the sample mean, so we can be fairly confident that the population mean fits into that range. If we pick a range of ±2 z-units, we can be very confident: that range covers about 95% of the distribution.

      What do the z-units correspond to? The standard error, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ (the standard deviation of the distribution of sample means). The standard error is essentially the average amount that a sample mean will deviate from the population mean. In other words, most sample means will be close to $\mu$, but some are farther away. The variability of these sample means represents the standard distance between $\mu$ and $\bar{X}$, or the "standard" error distance. Because it depends on n, it defines the relationship between sample size and the accuracy with which $\bar{X}$ represents $\mu$.
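The ±1 and ±2 standard-error claims can be checked by simulating the distribution of sample means. This is a minimal Python sketch under assumed, hypothetical values ($\mu = 85$, $\sigma = 5$, n = 25, so the standard error is 1); the function name `fraction_of_means_within` is illustrative, not from any library. It also shows how the standard error $\sigma/\sqrt{n}$ shrinks as n grows.

```python
import random
import statistics


def fraction_of_means_within(mu, sigma, n, k, trials, seed=1):
    """Draw `trials` samples of size n from Normal(mu, sigma) and return the fraction
    of sample means lying within ±k standard errors of mu."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5                      # standard error: sigma / sqrt(n)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
             for _ in range(trials)]
    return sum(abs(m - mu) <= k * se for m in means) / trials


# Hypothetical population: mu = 85, sigma = 5, samples of n = 25 (standard error = 1)
within_1 = fraction_of_means_within(85, 5, 25, k=1, trials=5_000)
within_2 = fraction_of_means_within(85, 5, 25, k=2, trials=5_000)
print(f"within ±1 SE: {within_1:.3f}, within ±2 SE: {within_2:.3f}")

# Larger samples shrink the standard error, so sample means cluster more tightly.
for n in (25, 100, 400):
    print(f"n = {n}: standard error = {5 / n ** 0.5}")
```

The two fractions should come out near 0.68 and 0.95, and the standard error halves each time n is multiplied by 4.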

So far we have discussed the general concept of estimation. Now we will formalize things a bit (i.e., do the math). Let's first talk about the logic of estimation, and then move on to the actual formulas that we'll use.

    Step 1: Begin by making a reasonable estimate of what the z value should be.
    • For a point estimate, what z do you want? z = 0, right in the middle of the distribution.
    • For an interval estimate, the z values depend on how confident you want to be in your estimate.
    Step 2: Put your "reasonable" value for the test statistic into the formula and solve for the unknown population parameter. Because you used a reasonable value for the test statistic, you should get a reasonable estimate of the population parameter.

      Okay, so what's the formula? The formula to use will depend on the inferential test that is appropriate for the situation. It turns out that the math that we did when we discussed hypothesis tests is essentially the same math that we'll use for estimation. As was the case with hypothesis testing, the research design determines the formula that we use.

      Remember,

      Test                          Use when
      One-sample z-test             You have one sample, you know the population standard deviation $\sigma$, and you have a sample mean to compare to the population.
      One-sample t-test             You have one sample but don't know $\sigma$. You have a sample mean to compare to the population and a sample standard deviation (s) to use to calculate the estimated standard error.
      Related-samples t-test        You have a matched or related-samples design.
      Independent-samples t-test    You have two independent samples, each representing one of the populations being compared.

    Estimation of the population mean using one sample and the population standard deviation (the one-sample z design)

      We'll look at an example for the one-sample z-test (in other words, we know the population standard deviation $\sigma$ but not the population mean $\mu$). We do a little algebra to rearrange the formula so that, instead of solving for a z-score, we solve for the population parameter.
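        For the example, let's assume that $\bar{X} = 85$, $\sigma = 5$, and $n = 25$, so the standard error is $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 5/\sqrt{25} = 1$.
      $$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \;\Longrightarrow\; z\,\sigma_{\bar{X}} = \bar{X} - \mu \;\Longrightarrow\; \mu = \bar{X} - z\,\sigma_{\bar{X}}$$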

        Step 1: We need to estimate $\mu$, so we make a reasonable estimate of z. Our best guess is z = 0, so we plug that into the formula:
        $$\mu = \bar{X} - 0 \cdot (5/\sqrt{25}) = 85.0$$

        Step 2: We see that $\mu = \bar{X}$ is our most reasonable estimate.

      Okay, so that's the formula for point estimation. What about for an interval estimation?

        We use the same formula, except we change the minus sign to a plus-or-minus sign. This gives us a high value and a low value for the interval:
        $$\mu = \bar{X} \pm z\,\sigma_{\bar{X}}$$

      So the first thing we need to decide is how confident we want to be in our estimate. Let's choose 90%. We go to the unit normal table and find the two z-scores between which 90% of the sample means lie. The remaining 10% lies outside that range, so we want two tails with 5% in each; the z-scores are ±1.65.
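The table lookup for a two-tailed critical z can also be done with Python's standard library. This sketch uses `statistics.NormalDist().inv_cdf`, which returns the z with a given area below it; the helper name `two_tailed_z_crit` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_z_crit(confidence):
    """Return the z such that `confidence` of the unit normal distribution
    lies between -z and +z (the 'table lookup' step)."""
    tail = (1 - confidence) / 2                  # area in each tail, e.g. 0.05 for 90%
    return NormalDist().inv_cdf(1 - tail)        # z with that much area above it


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = ±{two_tailed_z_crit(level):.3f}")
```

The exact 90% value is about 1.645, halfway between the table entries 1.64 and 1.65; this lab uses 1.65.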

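      $$\mu_{\text{upper}} = \bar{X} + z\,\sigma_{\bar{X}} = 85 + (1.65)(5/\sqrt{25}) = 86.65$$
      $$\mu_{\text{lower}} = \bar{X} - z\,\sigma_{\bar{X}} = 85 - (1.65)(5/\sqrt{25}) = 83.35$$

      So we are 90% confident that $\mu$ lies between 83.35 and 86.65.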
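The two steps of the worked example (z = 0 for the point estimate, z = ±1.65 for the 90% interval) can be sketched as one small Python function. The name `z_interval` is hypothetical; it simply evaluates $\bar{X} \pm z\,\sigma/\sqrt{n}$.

```python
import math


def z_interval(xbar, sigma, n, z_crit):
    """Return (lower, upper) for mu = xbar ± z_crit * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)       # standard error of the mean
    margin = z_crit * se            # distance from xbar to each end of the interval
    return xbar - margin, xbar + margin


# Values from the worked example: xbar = 85, sigma = 5, n = 25
point = z_interval(85, 5, 25, z_crit=0)[0]       # z = 0 collapses the interval to xbar
lower, upper = z_interval(85, 5, 25, z_crit=1.65)
print(f"point estimate: {point}")
print(f"90% CI: {lower:.2f} to {upper:.2f}")
```

With these inputs it prints a point estimate of 85.0 and an interval of 83.35 to 86.65, matching the hand calculation.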

    The example above used z-scores. The same logic applies to the other statistics we've talked about (e.g., t-statistics).

    Here is a list of all the formulas for estimation by test:

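    Test                          Formula
    One-sample z-test             $\mu = \bar{X} \pm z\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
    One-sample t-test             $\mu = \bar{X} \pm t_{crit}\, s_{\bar{X}}$, where $s_{\bar{X}} = s/\sqrt{n}$
    Related-samples t-test        $\mu_D = \bar{D} \pm t_{crit}\, s_{\bar{D}}$
    Independent-samples t-test    $\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{crit}\, s_{(\bar{X}_1 - \bar{X}_2)}$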
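The t-based rows of the table can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the inputs are sample means and sums of squares (SS), as in the exercises below; the sample variance is taken as $s^2 = SS/(n-1)$; the independent-samples standard error uses the usual pooled variance $s_p^2 = (SS_1 + SS_2)/(df_1 + df_2)$; and $t_{crit}$ is passed in from a t table rather than computed, since Python's standard library has no t distribution. The function names are hypothetical, and the example numbers are made up (they are not the lab exercises).

```python
import math


def one_sample_t_interval(xbar, ss, n, t_crit):
    """mu = xbar ± t_crit * s_xbar, with s^2 = SS / (n - 1) and s_xbar = sqrt(s^2 / n)."""
    s2 = ss / (n - 1)                  # sample variance
    se = math.sqrt(s2 / n)             # estimated standard error
    return xbar - t_crit * se, xbar + t_crit * se


def related_t_interval(dbar, ss_d, n, t_crit):
    """mu_D = dbar ± t_crit * s_dbar: the one-sample formula applied to difference scores."""
    return one_sample_t_interval(dbar, ss_d, n, t_crit)


def independent_t_interval(xbar1, ss1, n1, xbar2, ss2, n2, t_crit):
    """mu1 - mu2 = (xbar1 - xbar2) ± t_crit * s_(xbar1 - xbar2), using pooled variance."""
    pooled = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))   # s_p^2 = (SS1 + SS2) / (df1 + df2)
    se = math.sqrt(pooled / n1 + pooled / n2)     # estimated standard error of the difference
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# Made-up data: n = 16, SS = 240, so s^2 = 16 and s_xbar = 1; t_crit = 2.131 (df = 15, 95%)
print(one_sample_t_interval(50, 240, 16, t_crit=2.131))

# Made-up data: two groups of 8; t_crit = 2.145 (df = 14, 95%)
print(independent_t_interval(20, 70, 8, 17, 42, 8, t_crit=2.145))
```

Passing $t_{crit}$ in keeps the sketch close to the lab's procedure, where the critical value comes from the t table for the appropriate df and confidence level.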

      (1) Suppose that you give a sample of 1077 students a test of quantitative (math) skills. You find that the sample mean is $\bar{X} = 275$. Assume that we know the population standard deviation is $\sigma = 60$.
        (a) What is the point estimate of the population mean? What is the standard error for the distribution of sample means for samples of n = 1077? Give a 95% confidence interval for the mean score $\mu$ in the population of students. What does this confidence interval mean?
        (b) Suppose that the same result, $\bar{X} = 275$, had come from a sample of n = 250 students. What is the 95% confidence interval for the mean score $\mu$ in the population of students?
        (c) Suppose that the same result, $\bar{X} = 275$, had come from a sample of n = 4000 students. What is the 95% confidence interval for the mean score $\mu$ in the population of students?

      Now try estimation using other research designs we've talked about (you'll need to figure out which test to use before you start the problem).

      (2) Suppose that you give a sample of 244 students a test of verbal skills twice, once under normal conditions and once with a noise distraction. You find that the mean of the difference scores is $\bar{D} = 315$ and the SS of the difference scores is 25.

        (a) What is the point estimate of the population mean difference $\mu_D$? What is the standard error for the distribution of sample mean differences for samples of n = 244? Give a 95% confidence interval for $\mu_D$ in the population of students. What does this confidence interval mean?
        (b) Using the same result ($\bar{D} = 315$, n = 244), what is the 90% confidence interval for $\mu_D$ in the population of students?
        (c) Using the same result ($\bar{D} = 315$, n = 244), what is the 80% confidence interval for $\mu_D$ in the population of students?

      (3) In families with several children, the first-born tend to be more reserved and serious, whereas the last-born tend to be more outgoing and happy-go-lucky. A psychologist is using a standardized personality inventory to measure the magnitude of this difference. Two independent samples are used: 8 first-born children and 8 last-born children. Each child is given the personality test. The descriptive statistics are as follows: first-born group, mean = 11.4, SS = 26; last-born group, mean = 13.9, SS = 30 (high scores reflect more extroversion). Make an interval estimate of the population mean difference so that you are 80% confident that the true mean difference is in your interval.

      (4) A professor notices that students who get an A in physics have high grade point averages in their engineering courses. The professor selects a sample of n = 16 engineering majors who have earned A's in physics. The mean GPA in engineering courses for this sample is 3.30, with SS = 10. Use this sample to find the 99% confidence interval for the population mean.

      (5) How do confidence intervals behave?

        (a) As the critical test value gets smaller, what happens to the confidence interval?
        (b) As the variability gets smaller, what happens to the confidence interval?
        (c) As you increase sample size n, what happens to the confidence interval?


A few cautions about estimation

  • The data must be from a simple random sample. These formulas are not correct for more complex sampling designs (there are other formulas that we can use for those designs).
  • There is no correct method for inference from data from a biased sample.
  • Because the sample mean is strongly influenced by a few extreme observations, outliers can have a large effect on the confidence interval.
  • If the sample size is small and the population is not Normally distributed, the true confidence level will differ from the level implied by the critical z value used in the formula.
  • To use z as your critical value, you must know the population standard deviation $\sigma$. If you don't know $\sigma$, use the t-based formulas above, which replace $\sigma_{\bar{X}}$ with the estimated standard error and z with $t_{crit}$.


Estimating Confidence Intervals with SPSS

Estimating the population mean of a sample (One-sample t-test)

When you wish to estimate the population mean from a sample of scores in SPSS, the procedure is very simple.

Click Analyze > Compare Means > One-Sample T Test.

Select the variable in question in the list at the left and click the arrow button to move it to the list at the right. Leave the Test Value box at 0. If you want a confidence level other than 95%, click the Options button. Click OK when ready.

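Find the lower and upper bounds of the 95% confidence interval in the output. SPSS reports this interval for the difference between the mean and the test value; because the test value is 0, the bounds apply to the mean itself.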

Estimating the population mean of the difference between two paired sample means (Paired-samples t-test)

Proceed as if conducting a paired-samples t-test and then read the upper and lower bounds of the confidence interval from the output.

Estimating the population mean of the difference between two independent sample means (Independent-samples t-test)

Proceed as if conducting an independent-samples t-test and then read the upper and lower bounds of the confidence interval from the output.