We need to concentrate on three basic ideas surrounding
the WHO in a survey
Sample part of the whole
Make sure one obtains a random sample
Make sure the size of the sample is adequate
Common problems
Volunteers
Samples of Convenience
Lack of coverage
Sources of bias not related to the sample
survey validity or reliability
survey length
missing data
Experimentation
Non-experimental methods
observation
retrospection
correlational
lack of a control group
Experimental
randomization
control
manipulation
Four Principles of Experimental Design
randomization
control
replication
block
Evaluation of experimental data
how are the data distributed
what is the proper test to apply to the data distribution?
Probability
Do you know the difference between randomness and chaos?
Is the probability of an event its long-run relative frequency?
What is the difference between probability and odds?
What is a trial? For any random phenomenon, each attempt
or trial generates an outcome
What is an event? A combination of outcomes is an event
Simple probability deals with random and independent events
what does this mean?
Law of Large Numbers (LLN): the long-run relative
frequency of repeated independent events gets closer and closer to the true
relative frequency (the probability) as the number of trials increases. How does this
compare with the law of averages?
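A quick simulation can make the LLN concrete. The sketch below is only an illustration, assuming a fair coin simulated with Python's pseudo-random generator; the function name and the fixed seed are choices made here, not part of the lecture.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    """Flip a simulated fair coin num_flips times; return the share of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# The relative frequency should settle near 0.5 as the number of trials grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Short runs can sit well away from 0.5; it is only the long-run relative frequency that the LLN describes.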
Probability in a formal sense takes on values between 0
and 1
During a trial something has to happen. The set of all possible outcomes
is known as the sample space. The probabilities of all outcomes in the sample space sum to 1.
If A represents an event, the set of outcomes not in A is
the complement of A. The probability that an event occurs is 1 minus the probability
that it does not occur.
What are disjoint events?
simple probability = favorable events divided by the total possible events
- this applies where all outcomes are equally likely
probability of flipping a heads in a single coin flip =
0.5
1 favorable event (heads) divided by 2 possible events
(heads or tails)
probability = 0.5 for flipping a tails in a single coin
flip
what is the probability of selecting the correct one of
4 multiple choice answers when you have absolutely no idea about the question?
assuming that there is only a single correct response
and you are going to guess, the probability is 1/4 (or 0.25)
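The two examples above can be checked with exact fractions. This is a minimal sketch; `simple_probability` is a helper name invented here, and it assumes every outcome is equally likely.

```python
from fractions import Fraction

def simple_probability(favorable, possible):
    """Favorable events divided by total possible events (equally likely outcomes)."""
    return Fraction(favorable, possible)

p_heads = simple_probability(1, 2)  # 1 favorable (heads) out of 2 (heads or tails)
p_guess = simple_probability(1, 4)  # 1 correct answer out of 4 choices

print(p_heads, float(p_heads))  # 1/2 0.5
print(p_guess, float(p_guess))  # 1/4 0.25
```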
Additive Law
when there is more than one way to achieve a favorable
event (in only one attempt or trial), each of these favorable events must
be summed and divided by the total possible events
the probability of drawing a diamond OR a club
from a 52 card deck
there are 13 ways to draw a diamond and 13 more totally
different (or mutually exclusive) ways to draw a club
thus (13 diamond favorable events + 13 club favorable
events) ÷ (52 total possibilities) = 0.5
it is sometimes difficult to accurately note all the separate
favorable events as well as the separate possible events
the text example warns us that the probability of
drawing (in a single draw) a king OR a diamond from a deck is
not the sum of the probability of drawing a king plus the probability of drawing a diamond,
since the king of diamonds satisfies both the king and diamond condition
simultaneously and would be counted twice: (4 kings + 13 diamonds - 1 king of diamonds) ÷ 52 = 16/52
in contrast there is no overlap if we want to determine
the probability of rolling a one OR three with a single die in
a single roll
note that so far we have only talked about a single
trial or attempt (a single draw from a card deck or a single roll of
a die)
however, we have noted that the probability increases
if we include multiple potential favorable outcomes for the single trial
as the text advises, the additive law can be used any
time when the conjunction "OR" connects mutually exclusive
favorable events that could occur in a single trial
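One way to check the card and die examples is to list every outcome and count the favorable ones. The sketch below builds a standard 52 card deck; the `probability` helper and the rank and suit labels are naming choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def probability(event, outcomes):
    """Count the outcomes that satisfy `event` and divide by all outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

# Mutually exclusive events: the additive law applies directly.
p_diamond_or_club = probability(lambda c: c[1] in ("diamonds", "clubs"), DECK)

# Overlapping events: the king of diamonds must not be counted twice.
p_king = probability(lambda c: c[0] == "K", DECK)
p_diamond = probability(lambda c: c[1] == "diamonds", DECK)
p_king_or_diamond = probability(lambda c: c[0] == "K" or c[1] == "diamonds", DECK)

# A single roll of a die: a one OR a three, no overlap.
p_one_or_three = probability(lambda face: face in (1, 3), list(range(1, 7)))

print(p_diamond_or_club)                     # 1/2
print(p_king + p_diamond, p_king_or_diamond) # 17/52 4/13
print(p_one_or_three)                        # 1/3
```

The naive sum 17/52 overstates the king OR diamond probability by exactly the one double-counted card; the direct count gives 16/52 = 4/13.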
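Multiplicative Law
when the conjunction "AND" connects independent events across separate trials,
the probabilities of the events are multiplied
there are 36 different possibilities when rolling a single
die two times
numbering them in order (1-1, 1-2, 1-3, ... 6-6), trial 3 is the single favorable outcome of rolling
a 1 and THEN a 3
the probability is the favorable event divided by all possible
events = 1/36 = (1÷6) x (1÷6)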
Let's say we change the wording of this last problem slightly
so that we want to know the probability of rolling a 1 and a 3 in any order
- that is, either first a 1 AND then a 3 OR first
a 3 AND then a 1
notice that we now have a situation that is compounded by
having both an AND and an OR
as you probably would guess, we will need to apply the multiplicative
law for the use of the AND and the additive law for the use of the
OR
there remain a total of 36 possibilities for the two rolls
of the die
however, of these 36 outcomes, both trial 3 (a 1 then a 3)
and trial 13 (a 3 then a 1) are favorable events
thus, we have 1/18 = 2/36 = (1÷6) x (1÷6) + (1÷6)
x (1÷6)
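The two-roll examples can be reproduced by listing all 36 outcomes. The sketch below assumes the outcomes are numbered 1 to 36 with the first roll changing slowest (1-1, 1-2, ... 6-6); that ordering is an assumption, chosen because it puts the favorable outcomes at trials 3 and 13 as described.

```python
from fractions import Fraction
from itertools import product

# All ordered results of rolling one die twice, numbered from 1.
rolls = list(product(range(1, 7), repeat=2))
numbered = {i: roll for i, roll in enumerate(rolls, start=1)}

# A 1 AND THEN a 3: order matters.
ordered = [i for i, roll in numbered.items() if roll == (1, 3)]

# A 1 and a 3 in any order: (1, 3) OR (3, 1).
any_order = [i for i, roll in numbered.items() if sorted(roll) == [1, 3]]

print(len(rolls))                                        # 36
print(ordered, Fraction(len(ordered), len(rolls)))       # [3] 1/36
print(any_order, Fraction(len(any_order), len(rolls)))   # [3, 13] 1/18

# The same answers from the multiplicative and additive laws.
one_sixth = Fraction(1, 6)
print(one_sixth * one_sixth)                          # 1/36
print(one_sixth * one_sixth + one_sixth * one_sixth)  # 1/18
```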
The above examples have been fairly simple, but this style
of problem can really become complicated, and I suggest that you try to solve many
more of the problems than are listed in this week's assignment.
Random & Independent Selection
Random - all elements in a population have an equal chance
of being selected
Independent Selection - the probability of a given event
is not dependent on any previous event
the importance of meeting these two selection criteria is
for proper generalizing from a sample to a population
it is worth noting that although these two criteria are
almost always assumed, they are not so frequently achieved
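As a rough illustration of random selection, the sketch below draws a simple random sample in which every element has the same chance of being chosen. The population of 500 numbered students, the sample size of 50, and the function name are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed is optional; it only makes the draw repeatable
    return rng.sample(population, n)

student_ids = list(range(1, 501))  # hypothetical population of 500 students
sample = simple_random_sample(student_ids, 50, seed=42)
print(sorted(sample))
```

Note that `sample` draws without replacement, so successive draws are not strictly independent; drawing with replacement (for example with `rng.choices`) would be needed for that, which is one reason the two criteria are easier to assume than to achieve.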
Binomial Distribution
the binomial distribution is one of a number of distributions
that statisticians use to compute probabilities associated with events or
outcomes
the binomial distribution has a structure of n independent
trials, each of which can have only two possible outcomes, (1) "success"
P or (2) "failure" Q
binomial expansion makes it relatively easy to compute
the probability of any combination of P and Q outcomes
in the case of flipping a coin, the probability of P
and Q are equal
in the first few examples we will use H for heads and
T for tails
the probability for a heads or a tails in a single flip
is 1/2 or 0.5
there are 10 levels or rows to this numerical pyramid
numbers on each row represent coefficients for the expanded
binomial terms
each numbered row corresponds to the exponent N in the general
formula
the leading 1 in the pyramid is always the coefficient for
the first term which is always raised to the power of the row number
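The rows of the numerical pyramid (Pascal's triangle) can be generated directly from binomial coefficients. The sketch below prints the rows for N = 1 through 10; numbering the rows by the exponent N and stopping at 10 are assumptions made to match the description above.

```python
from math import comb

def pascal_row(n):
    """Coefficients of the expanded binomial raised to the power n."""
    return [comb(n, k) for k in range(n + 1)]

for n in range(1, 11):
    print(n, pascal_row(n))
```

Row 3, for example, is 1 3 3 1, and the coefficients in row N always sum to 2 to the power N.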
expanding the binomial for N = 3 gives (H + T)^3 = H^3 + 3H^2T + 3HT^2 + T^3
the first term, the probability of 3 heads in 3 flips,
1/8 = 1/2 x 1/2 x 1/2
the exponent 3 is the key
3 heads in 3 flips
similarly for 3 tails as the last term, 1/8 = 1/2 x
1/2 x 1/2
similarly, the exponent 3 is the key
3 tails in 3 flips
the 2nd term is for 2 heads and 1 tail, 3/8 = 3 x 1/2
x 1/2 x 1/2
the exponents for H and T are 2 and 1 respectively
indicating 2 heads and 1 tail in 3 flips
the 3rd term is for 1 head and 2 tails, 3/8 = 3 x 1/2
x 1/2 x 1/2
the exponents for H and T are 1 and 2 respectively
indicating 1 head and 2 tails in 3 flips
note how the exponents in each term sum to 3
note also, if we sum the probabilities for all four
terms, 1/8+1/8+3/8+3/8, we obtain 1
let's examine the coefficients more closely
both the first and last term have an implied 1 indicating
only 1 way to get 3 consecutive heads (or tails) in only 3 flips
the coefficient of 3 on the second and third terms indicates the
3 different ways in which one can get either 2 heads and 1 tail (or 1
head and 2 tails) in 3 flips of a coin
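To see where the coefficients come from, one can list all 8 possible sequences of 3 flips and count how many contain each number of heads. This is only a sketch of that counting argument for a fair coin.

```python
from collections import Counter
from itertools import product

# Every possible sequence of 3 flips, e.g. "HHT".
sequences = ["".join(flips) for flips in product("HT", repeat=3)]

# Number of sequences for each count of heads.
ways = Counter(seq.count("H") for seq in sequences)

for heads in (3, 2, 1, 0):
    matching = [seq for seq in sequences if seq.count("H") == heads]
    print(f"{heads} heads: {ways[heads]} way(s) {matching} -> {ways[heads]}/8")
```

The counts 1, 3, 3, 1 are the coefficients of the four terms, and dividing each by the 8 equally likely sequences gives 1/8, 3/8, 3/8 and 1/8.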
As indicated, binomial expansion can get to be problematic for
larger exponents, but the general formula for any set of binomial events is
not so bad if you will take a few moments to break this formula into smaller
components.
Pr(X) = [N! ÷ (X! x (N - X)!)] x P^X x Q^(N - X)
Pr(X) = probability of X binomial favorable events
N = total number of events (trials)
! = factorial operation (e.g., 5! = 5 x 4 x 3 x 2 x 1 =
120)
P = probability of success in a single event
Q = probability of failure in a single event
Let's take an example dealing with rolling a single die. What
is the probability of rolling exactly one 6 in three rolls of a die?
From the information supplied in the last question, we need
to first substitute values for each variable.
X = 1
N = 3
P = 1/6
Q = 5/6
now we can build the formula with numeric substitutions
and solve for Pr(X)
Pr(1) = [3! ÷ (1! x 2!)] x (1/6)^1 x (5/6)^2 = 3 x (1/6) x (25/36) = 75/216
thus, the probability of rolling exactly one 6 in 3 rolls of
a die = 0.347
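The substitution can be checked with a few lines of code. The sketch below uses the standard binomial formula with the values X = 1, N = 3, P = 1/6; the function name is chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_probability(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, x) * p**x * q**(n - x)

pr = binomial_probability(1, 3, Fraction(1, 6))
print(pr, round(float(pr), 3))  # 25/72 0.347
```

Here `comb(3, 1)` is the N! ÷ (X! x (N - X)!) part of the formula, which counts the 3 positions in which the single 6 can fall.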
We have now seen how to calculate some probabilities of random
events, but what does this have to do with either the practice of statistics
or Decision Making?
Perhaps we should return to where we left off in Lecture 4
with the coin toss for who buys coffee
remember we were going to make a decision about whether
or not our friend was cheating by using a biased coin
the friend had successfully won the coin toss on 4 consecutive
days with 4 heads in 4 consecutive flips
with 4 heads in 4 trials, we have a probability of 0.0625 (about 0.06)
= 1/16 = (1/2)^4
since we have p = 0.06 and that is greater than the
0.05 cutoff, we cannot reject our null hypothesis - (anyone remember what
our null hypothesis was?)
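The decision can be written out as a short calculation. The sketch below assumes the null hypothesis is that the coin is fair (probability of heads = 1/2) and uses the 0.05 cutoff from the notes; the variable names are chosen here.

```python
from fractions import Fraction

ALPHA = Fraction(5, 100)        # the 0.05 cutoff
p_heads = Fraction(1, 2)        # assumed null hypothesis: a fair coin

# Probability of 4 heads in 4 independent flips if the coin is fair.
p_value = p_heads ** 4
print(p_value, float(p_value))  # 1/16 0.0625

decision = "reject the null" if p_value < ALPHA else "cannot reject the null"
print(decision)                 # cannot reject the null
```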
Let's take a fresh example dealing with a study performed here
at AU.
an HFM graduate student was doing a thesis project on smoking
she had found statistics published by RJ Reynolds that approximately
23% of the US population over 18 currently smokes cigarettes
she had observed what she considered a higher incidence
of smoking at AU
she formulated a hypothesis that the incidence of smoking
on the AU campus exceeded the incidence claimed by RJR
what was her null hypothesis?
she suggested at her proposal defense that all she needed
to find was at least 16 smokers among 50 randomly selected students and she
could reject her null hypothesis
how could you determine if she were on the right track?
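One way to check her claim is to ask how likely 16 or more smokers in 50 would be if the campus rate were really 23%. The sketch below sums the binomial probabilities for 16 through 50; it assumes a null hypothesis of a 23% smoking rate, a one-sided test, and the 0.05 cutoff used earlier, and the function name is invented here.

```python
from math import comb

def upper_tail(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

N_STUDENTS = 50
P_NULL = 0.23  # smoking rate under the assumed null hypothesis

# Chance of seeing at least 16 smokers if the null hypothesis is true.
p_value = upper_tail(16, N_STUDENTS, P_NULL)
print(round(p_value, 4))

# Compare nearby cutoffs to see where the tail probability drops below 0.05.
for k in range(14, 21):
    print(k, round(upper_tail(k, N_STUDENTS, P_NULL), 4))
```

If the tail probability for 16 is above 0.05, then 16 smokers would not be enough to reject the null hypothesis, and the loop shows how many would be.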
Summary
most research problems focus on cases exhibiting one or
more variables
when we have selected our cases randomly and independently,
we will be able to create a frequency distribution of our variables
depending on the nature of the variable(s) our frequency
distribution will be a specific type of probability distribution
we have focused on the Binomial Distribution, but there
are other distributions
each distribution has characteristic ways in which sample
means and sample standard deviations are distributed
statistical inference is based on determining the nature
of a distribution and using the proper statistical tools to describe the distribution