The background is the closing weeks of the 2000
presidential election
We had two polls at the same time with apparently different
findings
The NBC poll showed 45% for
Gore and 43% for Bush
The CNN poll showed 42% for Gore and 46%
for Bush
Was one of these polls wrong?
How much variability in polls
is acceptable?
This topic deals with predicting
how much a proportion will vary
Modeling the Distribution of Sample Proportions
Let's imagine simulating many, many random samples of 1,000
respondents
Let's make a histogram of our findings
Let's also assume p (the center of our distribution)
= 0.46
Figure 18.1 on page 459 shows histogram results
of 2,000 simulated samples of 808 each
note that the histogram is very close to
a normal curve with its characteristic bell shape
as you have learned from past application of the normal
model, it is necessary to know two parameters
the mean, which in this case is p
and the standard deviation [square root (pq ÷
n)], where q = 1 - p
the special feature of proportions is that once we have
the mean p, we automatically can derive the standard deviation
with the simple formula [square root (pq ÷ n)]
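The simulation idea above can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's procedure: it assumes p = 0.46, samples of 1,000 respondents, and 2,000 repetitions, and the function name simulate_sample_proportions and the seed are hypothetical choices. It compares the mean and standard deviation of the simulated proportions with p and with square root (pq ÷ n).

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    proportions = []
    for _ in range(num_samples):
        # Each respondent independently answers "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.46, 1000  # assumed values taken from the notes
props = simulate_sample_proportions(p, n, num_samples=2000)

print("mean of simulated proportions:", round(statistics.mean(props), 4))
print("SD of simulated proportions:  ", round(statistics.stdev(props), 4))
print("model SD, sqrt(pq/n):         ", round(math.sqrt(p * (1 - p) / n), 4))
```

If the normal model is reasonable here, the first number should land close to 0.46 and the second close to the third.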
for the 2000 presidential election Bush received 47.9% of
the popular vote
how probable would this 47.9% be, if the assumption for
p = 46.0% were true?
if we let the mean of a normal curve be 46.0%, how far from
the mean is a proportion of 47.9%?
what is the standard deviation in this situation?
the standard deviation = square root [(0.46 * 0.54) ÷
1000] ≈ 0.016 = 1.6%
using the 68, 95, 99.7 Rule, about 95% of normally distributed
values fall within two standard deviations of the mean, here
46.0% ± 3.2% (42.8% to 49.2%)
the election's popular vote result (47.9%) as well as Bush's share
in the NBC (43%) and CNN (46%) polls all fall within this range
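The arithmetic in this example can be checked with a short Python sketch. It assumes, as the calculation in these notes does, p = 0.46 and n = 1,000 (the polls' real sample sizes are not given here), and it uses the standard library's NormalDist as the normal model. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

p = 0.46          # assumed true proportion
n = 1000          # assumed sample size
observed = 0.479  # Bush's share of the popular vote

sd = math.sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion
z = (observed - p) / sd          # distance from the mean in standard deviations

model = NormalDist(mu=p, sigma=sd)
upper_tail = 1 - model.cdf(observed)  # chance of a proportion this high or higher

low, high = p - 2 * sd, p + 2 * sd    # the "95%" range from the 68, 95, 99.7 Rule

print(f"sd = {sd:.4f}, z = {z:.2f}, upper tail = {upper_tail:.3f}")
print(f"95% range: {low:.3f} to {high:.3f}")
```

Under these assumptions 47.9% sits about 1.2 standard deviations above the mean, which is not unusual for a normal model.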
How Good is the Normal Model?
pretty good, if certain assumptions are met
the book reminds us that models are just that, models ("...all
models are wrong, but some are useful...")
models are not perfect nor are they intended to be - they
are intended to be approximate
for example, the normal model does not fit small samples very well
however, larger samples do behave in accordance with a NORMAL
MODEL
What must be true to use the normal model for the distribution
of sample proportions?
the sampled values must be independent of
one another
the size of the sample, n, must be sufficiently
large
because assumptions can be very difficult or impossible
to check, we assume them
selected conditions can be checked to help support assumptions
10% condition - sample should be no larger than 10% of the
population
sample should be large enough so that np and nq are each at least 10
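The two checkable conditions can be written as a small helper. This is a sketch under stated assumptions: the function name check_normal_model_conditions is hypothetical, the cutoff of 10 is treated as "at least 10", and the population size of 100,000,000 voters is an illustrative round figure rather than a number from these notes. Independence cannot be checked by arithmetic, so it is left out.

```python
def check_normal_model_conditions(n, p, population_size, threshold=10):
    """Return which of the two checkable conditions hold for a sample proportion."""
    q = 1 - p
    return {
        # 10% condition: the sample is no more than 10% of the population.
        "ten_percent": n <= 0.10 * population_size,
        # Sample-size condition: expected successes and failures both reach the cutoff.
        "large_enough": n * p >= threshold and n * q >= threshold,
    }


# Illustrative values: a poll of 1,000 from an assumed 100,000,000 voters.
print(check_normal_model_conditions(n=1000, p=0.46, population_size=100_000_000))

# A sample of 15 is too small: np = 6.9 falls short of 10.
print(check_normal_model_conditions(n=15, p=0.46, population_size=100_000_000))
```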
The Central Limit Theorem
We imagine performing simulations under 5 different
conditions (rolling 1 die, rolling 2 dice, rolling 3 dice, rolling
5 dice, and rolling 20 dice)
each condition will contain 10,000 rolls
we roll the single die 10,000 times and make
a frequency distribution plot - this plot is uniform with approximately equal
frequencies for each of the die's six results: 1, 2, 3, 4, 5, 6
with two dice, 10,000 rolls, take the average of the two dice
on each roll and make a frequency distribution
if you have not stopped to think about the average for
rolling two dice, the frequency distribution will not make much sense
to you
look at the shape of the resulting frequency distribution
with three dice, 10,000 rolls, take the average of the three dice
on each roll and make a frequency distribution
compare this distribution with the previous one
with five dice, 10,000 rolls, take the average of the five dice
on each roll and make a frequency distribution
compare this distribution with the previous ones
with 20 dice, 10,000 rolls, take the average of the 20 dice
on each roll and make a frequency distribution
compare this distribution with the previous ones
what should we be noticing when we compare these frequency
distributions?
what is happening with the shape of the distributions?
what is happening with the spread?
what is happening with the range?
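The dice experiment is easy to try in Python. The sketch below assumes fair six-sided dice and 10,000 rolls per condition, as described above; the function names and the fixed seed are illustrative choices. Instead of drawing each frequency distribution, it prints the mean, standard deviation, and range of the averages so the spread and range can be compared across the five conditions.

```python
import random
import statistics


def average_of_dice(num_dice, rng):
    """Roll num_dice fair dice once and return the average of the faces."""
    return sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice


def simulate_dice_averages(num_dice, num_rolls=10_000, seed=0):
    """Repeat the roll num_rolls times and collect the averages."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    return [average_of_dice(num_dice, rng) for _ in range(num_rolls)]


summary = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    summary[num_dice] = {
        "mean": statistics.mean(averages),
        "sd": statistics.pstdev(averages),
        "range": max(averages) - min(averages),
    }
    s = summary[num_dice]
    print(f"{num_dice:>2} dice: mean={s['mean']:.3f}  sd={s['sd']:.3f}  range={s['range']:.2f}")
```

The means should all stay near 3.5 while the standard deviation shrinks as more dice are averaged.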
what is true of this dice simulation is true of means for
any repeated samples
this phenomenon is known as the Central Limit Theorem
the sampling distribution of any mean becomes more nearly
Normal as the sample size grows
this is true even if we sample from a non-normal distribution
as the sample size gets larger the variability gets
smaller
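in fact, the standard deviation of the sample mean falls by a
factor of the square root of the sample size (it equals σ ÷ √n,
where σ is the standard deviation of the population)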
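The square-root relationship can be checked with a non-normal population. The sketch below is only an illustration under assumptions not in these notes: it samples from an exponential distribution with rate 1 (strongly skewed, with population standard deviation 1), uses sample sizes 4, 16, and 64, and takes 5,000 samples of each size. The function name sd_of_sample_means is hypothetical.

```python
import math
import random
import statistics


def sd_of_sample_means(sample_size, num_samples=5000, seed=0):
    """Standard deviation of the means of many samples from a skewed population."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable output
    means = []
    for _ in range(num_samples):
        # Exponential with rate 1: skewed, population mean 1 and SD 1.
        total = sum(rng.expovariate(1.0) for _ in range(sample_size))
        means.append(total / sample_size)
    return statistics.pstdev(means)


population_sd = 1.0
for sample_size in (4, 16, 64):
    observed = sd_of_sample_means(sample_size)
    predicted = population_sd / math.sqrt(sample_size)
    print(f"n={sample_size:>2}: observed SD={observed:.3f}  predicted={predicted:.3f}")
```

Each time the sample size is multiplied by 4, the standard deviation of the sample means should be cut roughly in half.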