Finding Probabilities in Distributions
We have already seen the f / N formula
that we're now going to use. Think
back to our frequency distribution tables. We
used this formula to figure out proportions.
In fact, probabilities are most often given as
proportions (but you can also give them as
fractions or percentages).
Consider the following
frequency distribution table and its
histogram.
 X     f     p
___________________
 5     2    .05
 4    10    .25
 3    16    .40
 2     8    .20
 1     4    .10
You can see that our
proportion column corresponds to probability.
It in turn corresponds to the area under
the curve (or in this case under the bars)
for those intervals.
Imagine that the scores are
numbered tokens in a bag, and that your task
is to reach in and pull one out.
What is the probability of selecting
(sampling) a 3?
What is the probability
of selecting (sampling) a 5?
p(5) = f / N = 2 / 40 = .05
What about more complex
questions?
What is the probability
of selecting a token with a value greater
than 2?
p(X > 2) = ?
.05 + .25 + .40 = .70
What is the probability
of selecting a token with a value less
than 5?
p(X < 5) = ?
.10 + .20 + .40 + .25 = .95
What is the probability
of selecting a token with a value greater
than 1 & less than 4?
p(4 > X > 1) = ?
.20 + .40 = .60
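The token calculations above can be reproduced in a few lines of Python. This is only an illustrative sketch: the names `freq` and `prob` are made up for this example and are not part of the course materials. Each probability is just f / N summed over the scores that satisfy the question.

```python
# Frequency table from the example: score X -> frequency f
freq = {5: 2, 4: 10, 3: 16, 2: 8, 1: 4}
N = sum(freq.values())  # total number of tokens in the bag

def prob(condition):
    """Sum f / N over every score X for which condition(X) is true."""
    return sum(f for x, f in freq.items() if condition(x)) / N

p_five = prob(lambda x: x == 5)        # p(X = 5)
p_gt_2 = prob(lambda x: x > 2)         # p(X > 2)
p_lt_5 = prob(lambda x: x < 5)         # p(X < 5)
p_between = prob(lambda x: 1 < x < 4)  # p(4 > X > 1)

print(p_five, p_gt_2, p_lt_5, p_between)
```

The four values should match the hand calculations (.05, .70, .95 and .60).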
(1) Consider the
following data. Draw a histogram of
these data and answer the following
questions.
a) What proportion of the scores is
less than 4?
b) What proportion of the scores is
greater than 4?
c) What proportion of the scores is
greater than 1 and less than 4?
 X     f     p
___________________
 5     5    .10
 4    12    .24
 3    18    .36
 2    10    .20
 1     5    .10
The Normal Distribution
One of the most commonly
occurring distributions is the Normal
Distribution.
Let's examine the Normal
Distribution and see how we work with
probabilities to find the area under the curve
for different ranges of scores. If a
distribution is normally distributed
then it is symmetrical and unimodal.
A graph of a normal distribution is shown below.
A few things to note about normal
distributions:
- Not all unimodal, symmetrical curves are
normal, but a lot are.
- For this class, we will not worry about
how close a distribution is to normal. In
fact, for most of the course we'll assume
that the distribution is normal.
- The equation that defines a smooth curve
like that above is referred to as a probability
density function. In the case of the
normal curve with a mean μ and a
standard deviation σ, the
probability density function is:
$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$
- The area under the normal curve (or any
other probability density curve) must sum to 1.
Why? Remember
that the area under the curve refers to the
probabilities (or proportions) and the total
probability must equal 1.
- The normal distribution is often
transformed into z-scores.
- In the image below, you can see the
proportions between each standard deviation
interval.
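In the normal distribution with mean μ
and a standard deviation σ:
- 34.13% of the scores will fall
between the mean μ and 1 σ above it.
- 13.59% of the scores will fall
between 1 σ & 2 σ above the mean.
- 2.28% of the scores will fall
beyond 2 σ above the mean (about 2.14%
between 2 σ & 3 σ, and the small
remainder beyond 3 σ).
By symmetry, the same proportions hold below the mean.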
This
relationship is sometimes referred to
as the "68-95-99.7 Rule".
In the normal
distribution with mean μ and
standard deviation σ
- 68% of
the observations fall within 1 σ
of μ
- 95% of
the observations fall within 2 σ
of μ
- 99.7% of
the observations fall within 3 σ
of μ
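If you want to check these percentages yourself, here is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The helper name `area_between` is just an illustrative choice. It computes the one-sided bands and then the symmetric intervals of the 68-95-99.7 rule.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (z-score units)
std = NormalDist(mu=0, sigma=1)

def area_between(z_low, z_high):
    """Proportion of the distribution between two z-scores."""
    return std.cdf(z_high) - std.cdf(z_low)

# Bands on one side of the mean
mean_to_1 = area_between(0, 1)  # about 0.3413
one_to_2 = area_between(1, 2)   # about 0.1359
two_to_3 = area_between(2, 3)   # about 0.0214
beyond_2 = 1 - std.cdf(2)       # about 0.0228 (everything above 2 sd)

# Symmetric intervals around the mean (the 68-95-99.7 rule)
within = {k: area_between(-k, k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} sd of the mean: {p:.4f}")
```

Note that the band from 2 σ to 3 σ and the whole tail beyond 2 σ are computed separately, since they are easy to mix up.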
Answer the following
questions about the normal distribution:
(2a) What percentage of the area under the
curve is between the mean and the right
most end of the curve?
(2b) What percentage of the area under the
curve is within one standard deviation of
the mean (on either side)?
Using z-scores with Normal Distributions
An important tool that we'll use is the unit
normal table. You'll find it in the
back of your reading packet. This table lists
z-scores and proportions for the
Standard Normal Distribution (which is the
Normal distribution standardized into z-scores). In
other words, this table allows you to figure out
the area under the curve (and thus the
probability of sampling) at nearly every
position on the curve (defined in z-scores). One thing to keep in mind
about this table is that there are
several ways that it gets organized, depending
on the source. So make sure that you
understand the Unit Normal Table that you are
using.
Using the unit normal table:
first column - the z-score in question (to the first decimal digit)
the rest of the columns - p(Z < z), the
proportion of the distribution to the
left of the z-score. The column headings
give the second decimal digit of the z-score.
  z       .00       .01
-3.4    0.0003    0.0003
-3.3    0.0005    0.0005
  :       :         :
 0.0    0.5000    0.5040
  :       :         :
 1.0    0.8413    0.8438
  :       :         :
 3.3    0.9995    0.9995
 3.4    0.9997    0.9997
So by using the table, we can ask about
different areas under the curve. We can also go
in both directions. That is, from
z-scores to probabilities and/or from
probabilities to z-scores.
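Both directions of the table lookup can be imitated in Python. This is only a sketch, assuming the left-of-z table layout described above; the function names `table_entry` and `z_from_left_area` are invented for this example. `NormalDist.cdf` goes from a z-score to the proportion on its left, and `NormalDist.inv_cdf` goes from a left-side proportion back to a z-score.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_entry(z):
    """Proportion to the left of z, rounded to 4 places like the printed table."""
    return round(std.cdf(z), 4)

def z_from_left_area(p):
    """Reverse lookup: the z-score with proportion p to its left, to 2 places."""
    return round(std.inv_cdf(p), 2)

left_of_1_01 = table_entry(1.01)             # row 1.0, column .01
right_of_1_01 = round(1 - std.cdf(1.01), 4)  # area to the right is the complement
z_back = z_from_left_area(0.5040)            # should land on row 0.0, column .01

print(left_of_1_01, right_of_1_01, z_back)
```

Because the table gives areas to the left, an area to the right is always 1 minus the table value.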
(3a) Find the probabilities
that correspond to the area to the right of
the following z-scores: 2.0, 0.5, -0.75,
-2.0 (hint: sketch the distribution and
locate the score).
(3b) Find the z-scores that correspond to
the following probabilities: 0.5000, 0.8413,
0.3050. These probabilities correspond to
areas to the right of the z-score (hint: a
sketch will be helpful again).
What follows are procedures and
examples of using the Unit Normal
Table
- Finding probabilities from
z-scores
- Finding z-scores from
probabilities
- Finding Percentile ranks
Note:
Don't underestimate the value of
drawing a picture of the
distribution and trying to just
"eyeball" the answer (in addition
to doing the math). It just may
save you from making a mistake.
Here is the "best" way to find a
probability from the table:
step 1: sketch the distribution,
showing the mean & standard deviation
step 2: sketch the score in question,
being sure to place it on the correct side
of the mean & roughly the correct
distance from the mean
step 3: read the problem again to see
if you need the probability of getting a
score > or <. Shade this area on your
sketch.
step 4: translate the X score into a
Z-score
step 5: Use the correct column (and
sign) to find the probability in the unit
normal table.
Let's look at an example like this:
Example:
Suppose we have a normal distribution of
IQ scores with mean = 100 and sd = 15.
What is the probability of having
an IQ of 85 or less?
p(X < 85)?
For IQ scores, μ = 100,
σ = 15,
z = (IQ − μ)/σ = (85 − 100)/15 = −1
Thus, 85 is 1 standard deviation
below the mean (z = −1).
--look at the table--> p = 0.1587
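Steps 4 and 5 of the procedure (translate X into a z-score, then find the area on the correct side) can be sketched in Python as follows. The function names `z_score` and `probability` and the `tail` argument are hypothetical, chosen only for this illustration; the sketching steps still have to be done by hand.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Step 4: translate a raw score X into a z-score."""
    return (x - mu) / sigma

def probability(x, mu, sigma, tail="less"):
    """Step 5: area below the score (tail='less') or above it (tail='more')."""
    left = NormalDist().cdf(z_score(x, mu, sigma))
    return left if tail == "less" else 1 - left

z = z_score(85, mu=100, sigma=15)                   # 85 is 1 sd below the mean
p = probability(85, mu=100, sigma=15, tail="less")  # p(X < 85)

print(z, round(p, 4))
```

Choosing the tail explicitly plays the same role as shading the correct region on your sketch.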
Now let's look at finding the z score if
we know the probability. In this case we
start with a probability and find the z
score in the table. Once we have the z score
we can use the z-score formula to solve for
X to get the score.
Here is the "best" way to find a
Z-score from a probability:
step 1: Sketch the normal
distribution
step 2: shade the region
corresponding to the required probability
step 3: locate the probability in the
correct column of the table
step 4: label the edge of the shaded
region with the z-score from the step above
step 5: compute the corresponding raw
score (X).
** keep in mind that the percentile rank
is equal to the probability of being at or
below a given score. Thus, percentile
ranks less than 50% refer to the lower
tail.
Let's look at an example like this:
Example:
What IQ score do you need to have
to be in the top 5% of the
population?
The upper-tail is needed.
p = 0.05
---- look at the table --->
z = 1.65
so X = (1.65)(15) + 100 = 124.75
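Going from a probability back to a raw score can also be checked in Python. This is a sketch under the assumption that "top 5%" means 95% of the distribution lies below the cutoff; the variable names are illustrative. The exact z-score is a little smaller than the rounded table value of 1.65, so the raw score comes out slightly lower than 124.75.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # IQ scores
top_share = 0.05     # proportion in the upper tail

# The table lists areas to the left, so look up 1 - 0.05 = 0.95
z_exact = NormalDist().inv_cdf(1 - top_share)

# Solve z = (X - mu) / sigma for X
x_exact = z_exact * sigma + mu
x_table = 1.65 * sigma + mu  # using the rounded z from the printed table

print(round(z_exact, 3), round(x_exact, 2), round(x_table, 2))
```

The small difference between the two raw scores comes only from rounding z to two decimal places in the table.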
Sometimes we need to find the probability
that X will fall between two scores rather
than simply above a score or below
a score.
step 1: Sketch the curve & shade
the region of interest
step 2: Translate both scores to
Z-scores
step 3: Look up the probabilities of
scoring < or > each of the two
z-scores
step 4: Add (or subtract) the
probabilities accordingly
Example:
What is the prob. of scoring
between 300 and 650 on the SAT?
recall: μ = 500, σ = 100
p(z < (650 - 500)/100) = p(z < 1.5) = 0.9332
p(z < (300 - 500)/100) = p(z < -2.0) = 0.0228
the .9332 from 650 includes the
lower tail, so we determine the
proportion in the lower tail, and
subtract that: p(300 < X < 650)
= .9332 - .0228 = .9104
You might want to know what percentage
lies outside two points (essentially
the opposite of the last situation).
Example:
What is the prob. of scoring
lower than 300 or higher than 650 on
the SAT?
recall: μ = 500, σ = 100
p(z > (650 - 500)/100) = p(z > 1.5) = 0.0668
p(z < (300 - 500)/100) = p(z < -2.0) = 0.0228
the two numbers both reflect the
proportions in the tails, so we just
need to add them together: p(X < 300 or
X > 650) = .0668 + .0228 = .0896
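The two SAT examples (between two scores, and outside two scores) can be checked together. The sketch below assumes the same μ = 500 and σ = 100; the variable names are made up for illustration. Passing the mean and standard deviation to `NormalDist` lets us work with raw scores directly, which is equivalent to converting to z-scores first.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

below_300 = sat.cdf(300)     # lower tail, p(X < 300)
below_650 = sat.cdf(650)     # everything to the left of 650
above_650 = 1 - below_650    # upper tail, p(X > 650)

p_between = below_650 - below_300  # subtract the lower tail
p_outside = below_300 + above_650  # add the two tails

print(round(p_between, 4), round(p_outside, 4))
```

The two answers are complements, so they should add up to 1.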
Another thing that you can use the unit
normal table for is to find percentile
ranks
Example:
What is your percentile
rank if you have an IQ of 130?
for IQ scores μ = 100,
σ =15
z = (130 - 100)/15 = 2.0
--look at the table--> p =
0.9772 --> percentile rank 97.72
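Since a percentile rank is just the proportion at or below a score expressed as a percentage, it is a one-line calculation. The helper `percentile_rank` below is a hypothetical name used only for this sketch.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

def percentile_rank(score, dist):
    """Percentage of the distribution at or below the score."""
    return 100 * dist.cdf(score)

rank_130 = percentile_rank(130, iq)
print(round(rank_130, 2))
```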
Asking questions about the probability of
getting a single score from the population.
Examples:
What is the probability of having
an IQ of 130 or above?
p(X > 130)?
for IQ scores μ = 100, σ
=15
z = (130 - 100)/15 = 2.0
--look at the table--> p =
0.0228
What is the probability of
having an IQ of 85 or less?
p(X < 85)?
for IQ scores μ = 100, σ = 15,
z = (85 - 100)/15 = -1.0
--look at the table--> p =
0.1587
(4) Now try some on
your own:
(a) What is the
probability of having an IQ of 130 or
above?
(b) What is the probability of having an
IQ of 120 or above?
(c) What is the probability of having an
IQ score of 91 or less?
(5) The scale for the
SAT is set so that the distribution of
scores is approximately normal with mean
= 500 and standard deviation = 100. You
think that you might need a tutor. You
know of a tutoring service for students
who score between 350 and 650 on the
SAT. You think that you probably fit
within their range. What is the
probability that you will get an SAT
score between 350 and 650?
(6) The National
Collegiate Athletic Association (NCAA)
requires Division I athletes to score at
least 820 on the combined mathematics
and verbal parts of the SAT exam in
order to compete in their first college
year. In 1999, the scores of the
millions of students taking the SATs
were approximately normal with a mean =
1017 and a standard deviation = 209.
What is the probability of scoring an
820 or less?
The Area Under the Normal Curve Spreadsheet
There are a variety of alternatives to looking
things up in the unit normal table. For
example, I've got an app on my iPhone called
"Bell Curve" that provides an easy visual
interface to get these values. Dr. Joel
Schneider has developed an Excel spreadsheet
tool to do this as well. If you are
interested you can download this
Excel spreadsheet tool and use it.
Whatever method you use, I strongly suggest that
you compare your answers with what you find in
the table at first to make sure that you are
using the method correctly.
To use this spreadsheet:
- Select Score to Proportion if you
know the score(s) and wish to calculate
proportions or probabilities. Select Proportion
to Score if you wish to know a raw
score when you already know the probability or
proportion.
- Select Less Than, More Than,
Between, or Exclude Between,
depending on what you wish to do.
- Enter the mean in the dark box at the top
left.
- Enter the standard deviation in the dark box
at the top right.
- Enter the raw score(s) or proportion(s) that
are known in the dark boxes below the mean and
standard deviation boxes. Remember that
proportions MUST range from 0 to 1. Any value
outside this range will result in an error.
Here is a silent demonstration of how to use
the file:
Example:
Suppose you wish to know what proportion of
scores are less than 5 when μ = 10
and σ = 3.
- You know the score (i.e., 5) and you want to
know a proportion so you select Score to
Proportion.
- You want to know how much of the scores are
less than 5 so select Less Than.
- Enter 10 as the mean
- Enter 3 as the standard deviation.
- Enter 5 as the raw score.
You should now see the answer (0.05) in the Proportion
Under Curve box.
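If you don't have Excel handy, the same four kinds of question can be sketched in Python. To be clear, this is not how the spreadsheet itself is implemented; the function `proportion_under_curve` and its `mode` labels are assumptions made up for this example, loosely mirroring the Less Than, More Than, Between and Exclude Between options.

```python
from statistics import NormalDist

def proportion_under_curve(mean, sd, mode, score, upper=None):
    """Proportion of a normal distribution selected by `mode`.

    mode is 'less', 'more', 'between' or 'exclude'; the last two
    also need `upper`, the higher of the two raw scores.
    """
    dist = NormalDist(mu=mean, sigma=sd)
    if mode == "less":
        return dist.cdf(score)
    if mode == "more":
        return 1 - dist.cdf(score)
    if mode in ("between", "exclude"):
        if upper is None:
            raise ValueError("two scores are needed for this mode")
        inside = dist.cdf(upper) - dist.cdf(score)
        return inside if mode == "between" else 1 - inside
    raise ValueError(f"unknown mode: {mode}")

# The example above: proportion of scores less than 5 when the mean is 10, sd is 3
answer = proportion_under_curve(10, 3, "less", 5)
print(round(answer, 4))
```

Rounded to two decimal places this agrees with the 0.05 shown by the spreadsheet.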