The use of STANDARD SCORES provides a method of converting
any given units of measure to a standard that can be compared with standard
units of another measure. A rough analogy is the way we often use percentages
(proportions) for the purposes of making comparisons.
To see how percentages convert counts to an objective standard unit,
let's consider the class data set for Fall 2002. Below you can see that I have
abstracted a few columns from these data, including ID, age, and gender.
these data are sorted by gender AND ALSO age within gender
you can see that there are 22 total cases, including 20
women and 2 men
there are clearly more women in the group than men (91%
are women and 9% are men; why is this so?)
if I were to ask you about the relative number of women
to men less than 22 years of age there are at least two ways that you could
respond....
we can see that there are five
women and one man with ages less than 22 years
however, since there are a total of 20 women and a total
of 2 men, we could also say that in this present data set, 25.0%
(5 ÷ 20) of the women and 50.0% (1 ÷ 2) of the men exhibit
an age less than 22 years;
since our class is not evenly split (i.e., 50% women &
50% men) between women and men, we can standardize the relative number of
women and men < 22 years of age, by creating two proportions
the proportion is created by dividing the number of counts
(of a certain condition such as < 22 years of age) by the appropriate denominator
reflecting the total number of women or men
thus, our proportions of women (25%) and men (50%) <
22 years of age are a sort of standard score
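The proportion arithmetic above can be sketched in a few lines of Python. This is only an illustration using the counts quoted in the text; the names `totals`, `under_22`, and `proportion` are made up for this example.

```python
# Counts taken from the class data set described above.
totals = {"women": 20, "men": 2}
under_22 = {"women": 5, "men": 1}


def proportion(count, total):
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total


# Dividing each count by its own group total puts both groups on the same scale.
percent_under_22 = {g: proportion(under_22[g], totals[g]) for g in totals}

for group, pct in percent_under_22.items():
    print(f"{group}: {under_22[group]} of {totals[group]} = {pct:.1f}%")
```

The raw counts (five women, one man) and the percentages (25% and 50%) point in opposite directions, which is why the choice of denominator matters.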
Standard Scores perform a function allowing us to make similar
comparisons - let's take an example...
Are you heavier than you are tall? This
sounds absurd, but statistics can provide a pretty logical answer.
to answer this question we need to see that we must compare
two variables with different units of measure, weight and height
to proceed, we must know not only our weight and height,
but also have some tools for helping us normalize the units of measure for
height and weight
the tools are in the form of NORMS (aka statistics and
sometimes parameters)
one necessary NORM is the average height and the standard
deviation of height for the appropriate population
another necessary NORM is the average weight and the standard
deviation of weight for the appropriate population
our hypothetical person is 73 inches tall and weighs 210
pounds
for our population mean height = 70 inches with a standard
deviation of 3 inches
for our population mean weight = 170 pounds with a standard
deviation of 20 pounds
NOTE: WE CAN
EASILY SEE THAT THE HYPOTHETICAL CASE IS BOTH ABOVE THE AVERAGE HEIGHT AND
ABOVE THE AVERAGE WEIGHT...the issue is which measure in this case is MORE above
the mean, height or weight?
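the formula for computing the
normalized or standard score is
z = (X - mean) / standard deviation
for height, z = (73 - 70) / 3 = 1.00; for weight, z = (210 - 170) / 20 = 2.00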
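As a minimal sketch, the z-score comparison for the hypothetical person can be computed as follows. The function name `z_score` is an assumption for illustration; the numbers are the population norms and measurements given above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical person and population norms from the text.
z_height = z_score(73, mean=70, sd=3)      # inches
z_weight = z_score(210, mean=170, sd=20)   # pounds

# Once both measures are in standard deviation units they can be compared directly.
heavier_than_tall = z_weight > z_height

print(f"z for height = {z_height:.2f}")
print(f"z for weight = {z_weight:.2f}")
print("heavier than tall" if heavier_than_tall else "not heavier than tall")
```

Because both results are expressed in standard deviation units, inches and pounds no longer get in the way of the comparison.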
There are a number of other statistical
techniques for calculating standard scores, and most are derived from the general
standard score formula:
standard score = z × (standard score standard deviation) + (standard score mean)
note that there are two new terms in this general formula
that were not apparent in the z-score calculation for our hypothetical
heavier than tall person
in the z-score calculation the standard score standard
deviation = 1
further, the standard score mean = 0
thus, we actually used these two terms, but there was no
visible impact of multiplying by one and adding a zero.
The textbook describes two other specific standard scores
a STANINE uses a standard score standard deviation = 2 and
a mean = 5
a Z score (capital Z, as opposed to the lower case z used above)
is scaled differently with a standard score standard deviation = 10 and a
standard score mean = 50
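The rescaling described above can be sketched as a single function. This is an illustration under the assumption that every standard score is a z score multiplied by a new standard deviation and shifted by a new mean; the function and parameter names are hypothetical.

```python
def standard_score(x, mean, sd, new_mean=0.0, new_sd=1.0):
    """Convert a raw score to a standard score with the chosen mean and SD.

    With the defaults (new_mean=0, new_sd=1) this is just the z score.
    """
    z = (x - mean) / sd
    return z * new_sd + new_mean


# Weight of the hypothetical person: 210 pounds, population mean 170, SD 20.
z = standard_score(210, 170, 20)                                # z scale
stanine_scale = standard_score(210, 170, 20, new_mean=5, new_sd=2)
capital_z = standard_score(210, 170, 20, new_mean=50, new_sd=10)

print(z, stanine_scale, capital_z)
```

In practice stanines are usually reported as whole numbers from 1 to 9; this sketch only applies the linear formula and does not round or truncate.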
Both stanine and Z scores are frequently used, but we will
deal with them only briefly. Another commonly used standard score that I
believe you will find more practical is the ORDINAL STANDARD SCORE,
sometimes referred to as centiles. In contrast to the INTERVAL standard
scores we have already discussed, ordinal standard scores convey only rank order.
calculation of ordinal standard scores is very simple,
but does require the construction of a CUMULATIVE FREQUENCY DISTRIBUTION
using the example in the textbook, we see that we have 10
scores, which is the cumulative frequency of all the scores
there are five unique scores in this example from 1 - 5
and the distribution as a simple list looks like this: 1, 1, 1, 2, 2, 2, 3,
3, 4, 5
we can easily turn this into a table representing the cumulative
frequency distribution
the X column contains the raw scores 1-5
the f column illustrates the frequency of each score
the cf column reflects the cumulative frequency of each X
score
start at the bottom with X = 1, which has a frequency
of 3
moving up a single row, X = 2 has a frequency of 3
and a cumulative frequency equal to the 3 for X = 1 plus the 3 for X = 2,
that is 3 + 3 = 6
as you can see the cumulative frequency for the entire
distribution is 10, which is the sum of all the individual frequencies
in column f and also = N, the total number of cases
the CENTILE formula is:
we can now complete the cumulative frequency table with
the centile scores
prior to now, you probably preferred to hear that a result
was in the 98th percentile rather than a result was 2 standard deviations
above the mean
now however, you understand both of these results with equal
clarity
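The cumulative frequency table for the ten example scores can be rebuilt with a short script. This is a sketch only: it assumes the centile for a score is the cumulative frequency divided by N, times 100, which may differ from the textbook's exact definition (some texts count only the scores below X, or half of the scores at X).

```python
from collections import Counter

# The ten example scores listed in the text.
scores = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]

freq = Counter(scores)   # f: how often each X occurs
n = len(scores)          # N: total number of cases

rows = []                # (X, f, cf, centile), built from the lowest X upward
cf = 0
for x in sorted(freq):
    cf += freq[x]                      # running total of frequencies
    centile = 100.0 * cf / n           # ASSUMED centile definition: cf / N * 100
    rows.append((x, freq[x], cf, centile))

# Print with the highest score on top, so X = 1 sits in the bottom row.
print(f"{'X':>2} {'f':>2} {'cf':>3} {'centile':>8}")
for x, f, cf_x, centile in reversed(rows):
    print(f"{x:>2} {f:>2} {cf_x:>3} {centile:>8.0f}")
```

The last cumulative frequency equals N, which is a quick check that no score was dropped.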
Normal Distribution
Description of data is not the final goal of research;
rather, we aim to make inferences so that we can explain and predict
research findings.
a statistical inference is decision making in the absence
of complete information
to understand inference we must know about distributions
the value of a set of scores (X) and their corresponding
frequency (f) are the basis of a distribution
when a distribution exhibits a certain appearance and
characteristics it is distinguished by calling it a normal distribution
Characteristics
Because of the characteristic shape, the normal distribution
is frequently called a bell-shaped curve
this distribution is symmetrical with half of the curve's
area on either side of the middle of the curve
the middle of the normal curve is simultaneously the mode,
median, and mean
not all normal curves are superimposable; they can exhibit
different central tendency and variability
moreover the normality of a distribution does not depend
on either the mean or the variability, but on a characteristic mathematical
equation
in fact, perfectly symmetrical distributions are rarely
seen in practice
and when a sample is drawn from a normally distributed
population, the resulting sample mean and sample standard deviation rarely
exactly match population mean and population standard deviation
however, the sample STATISTICS usually closely
approximate the population PARAMETERS
sample statistics M = mean; S = standard deviation
population parameters μ = mean; σ = standard deviation
any normal distribution, regardless of mean and standard
deviation, can be converted into the z distribution
z = (X - μ)/σ
Area Under the Normal Curve
no matter what the absolute units of measure of the content
under a curve, the total content under a curve is 100% of the area
with mathematics beyond the scope of this course one can
determine specific areas under the curve, such as between major X score
divisions
with z scores reflecting standard deviations, it is practical
to know the relative percentage of areas associated with each standard deviation
above you can see that 34.13% of the area is contained
between the mean at 0 and 1 standard deviation from the mean
since the normal distribution is symmetrical, an additional
34.13% of area falls between the mean at 0 and -1 standard deviation
thus 68.26% of the area under a normal curve falls
between -1 and +1 standard deviation from the mean
as you can see above, an additional 13.59% of the area
falls between 1 and 2 standard deviations
and the area beyond 2 standard deviations is 2.28%
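The percentages quoted above can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the variable names are only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the z distribution: mean 0, standard deviation 1

# cdf(z) is the proportion of the area to the left of z.
area_0_to_1 = std_normal.cdf(1) - std_normal.cdf(0)
area_1_to_2 = std_normal.cdf(2) - std_normal.cdf(1)
area_beyond_2 = 1 - std_normal.cdf(2)
area_within_1 = std_normal.cdf(1) - std_normal.cdf(-1)

print(f"mean to +1 SD : {100 * area_0_to_1:.2f}%")
print(f"+1 SD to +2 SD: {100 * area_1_to_2:.2f}%")
print(f"beyond +2 SD  : {100 * area_beyond_2:.2f}%")
print(f"-1 SD to +1 SD: {100 * area_within_1:.2f}%")
```

The 68.26% in the text comes from doubling the rounded 34.13%; the unrounded area is slightly larger, closer to 68.27%.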
So how can we do anything practical by knowing any of these
percentages?
let's assume that you have just taken a qualifying test
for a new job
you have heard that only people who score in the top 5.0%
get to have an interview
you have been among 100 individuals who have taken the
test
you know that the score distribution is normal with a
mean of 50 and standard deviation = 10
you also know that you scored 70
Let's see if you can answer these questions:
1) will you be among those who get an interview?
2) what is the lowest raw score to qualify for an interview?
3) how many individuals will have an interview opportunity?
4) how many people had a lower score than you?
5) how many people had a higher score than you?
6) how many people scored less than the mean?
7) how many people scored between 30 and 70?
8) how many people scored higher than 1.5 standard deviations
above the mean?
click here to obtain the
Excel file with solutions
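The eight questions above can also be worked through in code as a check on your own answers. This is a sketch, not the Excel solution: it assumes the scores follow the stated normal distribution exactly, so the head counts are expected values rather than exact tallies, and all variable names are made up for illustration.

```python
from statistics import NormalDist

scores_dist = NormalDist(mu=50, sigma=10)  # stated test score distribution
n_people = 100
my_score = 70
top_fraction = 0.05                        # top 5% get an interview

# Raw score that cuts off the top 5% (the 95th centile).
cutoff = scores_dist.inv_cdf(1 - top_fraction)

gets_interview = my_score >= cutoff                                   # question 1
n_interviewed = top_fraction * n_people                               # question 3
n_below_me = scores_dist.cdf(my_score) * n_people                     # question 4
n_above_me = (1 - scores_dist.cdf(my_score)) * n_people               # question 5
n_below_mean = scores_dist.cdf(50) * n_people                         # question 6
n_30_to_70 = (scores_dist.cdf(70) - scores_dist.cdf(30)) * n_people   # question 7
n_above_1_5_sd = (1 - scores_dist.cdf(50 + 1.5 * 10)) * n_people      # question 8

print(f"1) interview? {gets_interview}")
print(f"2) lowest qualifying raw score: {cutoff:.2f}")
print(f"3) expected number interviewed: {n_interviewed:.1f}")
print(f"4) expected number scoring below you: {n_below_me:.1f}")
print(f"5) expected number scoring above you: {n_above_me:.1f}")
print(f"6) expected number below the mean: {n_below_mean:.1f}")
print(f"7) expected number between 30 and 70: {n_30_to_70:.1f}")
print(f"8) expected number above +1.5 SD: {n_above_1_5_sd:.1f}")
```

A score of 70 is 2 standard deviations above the mean, which is why the answers to questions 4, 5, and 7 line up with the areas under the normal curve discussed earlier.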
...making decisions in the absence of complete information
How does one construct an objective method of practicing
inference?
What is meant by HYPOTHESIS?
How is a hypothesis constructed?
What is meant by the NULL HYPOTHESIS?
What is meant by the Alternate (or Experimental)
Hypothesis?
The above picture of overlapping
curves and explanation is an important key to understanding the process of statistical
decision making through hypothesis testing. A few more comments about this key
will enhance the explanation.
to test a research idea, this idea must be stated in terms
of a hypothesis
the syntax of the idea is usually of the form, "if
x..., then y...." and this statement is normally considered H1
one obtains support for H1,
by rejecting its counterpart, H0
one can either retain or reject H0,
but not both
in this way H0 and H1 are mutually exclusive
whether one rejects or retains H0,
there is always, at least some small chance of error
if H0 is true, the distribution is depicted by the right-hand curve
small values of this right-hand curve are unlikely
similarly, if H1 is true, the distribution is depicted by the left-hand curve
large values of the left-hand curve are also unlikely
These two distributions show why H0
is rejected for small values of the right-hand curve and retained for large
values of the left-hand curve.
if H0 is true, the most likely values of the right-hand curve will tend
to cluster around its mean, μ0
however, if H1 is true, the most likely values will tend to cluster around μ1
Decision
Table
we saw above that we have four possible outcomes based on
two possible states of reality in combination with two possible decisions.
a tabular representation can help us visualize the four
possible outcomes
notice below that the two columns are titled "REALITY
or TRUTH"
the two rows are titled "DECISION"
the intersection of two columns with the two rows gives
us the four possible outcomes
based on a statistical result we must choose one and only
one of these four possible outcomes
a frequent source of misunderstanding for students of statistics (and
this includes many professionals) is that for most problems we cannot KNOW
"Reality" or "Truth"; instead we are trying to
use a statistical calculation to make a "Decision" in the
face of incomplete information
we make a "Decision" to either Reject or
Retain H0
You and your good friend Ima Matthwicz meet each morning before
going to work and school. Ima has talked you into flipping a coin to see who
buys coffee at the corner Starbucks. Curiously, Ima produces a coin for you
to flip and she calls it in the air. She calls it heads and is correct. You
buy. The following morning you meet and she again hands you a coin to flip;
she calls it heads correctly and again you buy. She has won two for two.
These circumstances repeat exactly for two more consecutive mornings so that
she has allowed you to flip the coin, but she has called it heads correctly
and you have bought coffee each of four days in a row. Could Ima be cheating
somehow -- is the coin that she is handing you to flip a biased coin that always
comes up heads? How many consecutive heads does it take to determine that a
coin is weighted so that it always lands on the same face? In many statistical
situations, we decide that chance is defied if an event would occur by chance
less than or equal to five percent of the time (p ≤ 0.05). That is, an outcome
that chance alone would produce more than five percent of the time is attributed
to chance, whereas an outcome that chance would produce five percent of the time
or less suggests that there is some compelling reason that the event has
happened and that something other than chance is involved.
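The coin question can be answered with a short calculation. The sketch below assumes the null hypothesis is a fair coin (probability of heads = 0.5) and independent flips, and asks how many consecutive heads are needed before the run has a probability of 0.05 or less; the function name is illustrative.

```python
ALPHA = 0.05  # conventional cutoff for "chance is defied"


def prob_all_heads(n_flips, p_heads=0.5):
    """Probability that n_flips independent flips all land heads."""
    return p_heads ** n_flips


for n in range(1, 7):
    p = prob_all_heads(n)
    verdict = "reject H0 (fair coin)" if p <= ALPHA else "retain H0 (fair coin)"
    print(f"{n} heads in a row: p = {p:.5f} -> {verdict}")

# Smallest run of heads whose probability under a fair coin is <= ALPHA.
first_suspicious_run = next(n for n in range(1, 100) if prob_all_heads(n) <= ALPHA)
print("consecutive heads needed:", first_suspicious_run)
```

Under these assumptions, four heads in a row (p = 0.0625) is not yet enough to reject a fair coin at the 0.05 level, while a fifth consecutive head (p = 0.03125) would be.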
the tables below and the one above are parallel
note that there are two ways in which one can be correct
and there are two ways in which one can make an error
alpha error is the one you have probably heard about
the most (e.g., p = 0.05)
and generally, statisticians are a little more concerned
with alpha errors,
but one cannot avoid at least some worry over beta errors
since alpha and beta are related in a counterbalanced
way, both errors cannot simultaneously be reduced to extremely small levels
most importantly, we NEVER can use this decision making model
to PROVE either a null or an alternate hypothesis; we can only REJECT
or RETAIN the null hypothesis!
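The four outcomes of the decision table can be written out as a small lookup. This is a minimal sketch in which the function name `outcome` is only illustrative; it pairs each state of reality with each decision and labels the result using the alpha and beta error terms from above.

```python
def outcome(h0_is_true, reject_h0):
    """Label one cell of the decision table (reality x decision)."""
    if h0_is_true and reject_h0:
        return "alpha error (rejected a true H0)"
    if h0_is_true and not reject_h0:
        return "correct decision (retained a true H0)"
    if not h0_is_true and reject_h0:
        return "correct decision (rejected a false H0)"
    return "beta error (retained a false H0)"


# Columns are REALITY, rows are DECISION, as in the table described above.
for reject_h0 in (True, False):
    decision = "Reject H0" if reject_h0 else "Retain H0"
    for h0_is_true in (True, False):
        reality = "H0 true" if h0_is_true else "H0 false"
        print(f"{decision:<10} | {reality:<8} | {outcome(h0_is_true, reject_h0)}")
```

Two of the four cells are correct decisions and two are errors, and only the decision row is ever observed in practice.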
Statistical
Power
POWER is the probability that
a false null hypothesis will be rejected
ideally, we would like for power to be large (e.g., 90%,
95%, or 99%)
maximize our ability to support our alternate hypothesis
a good chance (or high probability) for us to reject
a false null hypothesis
if power is too low, there is little chance of finding
a significant difference, even if a real difference exists
Power = 1 - β
in addition to β, there are a handful of other
drivers of statistical power
α, normally set to 0.05
N - the size of the sample
effect size, that is the magnitude of the difference
between two means, μ0 & μ1
for continuous measures, the standard deviation σ
thus, power is related to each of these other measures (i.e.,
α, β, N, effect size, σ)
although power is useful to determine the likelihood
of finding a real effect...
...it is also useful in estimating an appropriate sample
size
it is very practical to use power calculations to help us
estimate appropriate sample size
this is possible by solving the power formula for N
why would we want to estimate sample size?
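Solving for N can be sketched with one common textbook approximation. The power formula itself is not given in these notes, so the code below ASSUMES a two-sided, one-sample z test with known standard deviation, where N = ((z for α/2 + z for power) × σ / (μ1 - μ0))². The function name and the example values (means of 50 and 55, standard deviation of 10) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_one_sample_z(mu0, mu1, sigma, alpha=0.05, power=0.90):
    """Approximate N for a two-sided one-sample z test (ASSUMED formula)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = std_normal.inv_cdf(power)          # power = 1 - beta
    effect = abs(mu1 - mu0) / sigma              # effect size in SD units
    return math.ceil(((z_alpha + z_power) / effect) ** 2)


# Hypothetical example: detect a shift from 50 to 55 when sigma = 10.
n_needed = sample_size_one_sample_z(mu0=50, mu1=55, sigma=10)
print("N for 90% power:", n_needed)

# A larger effect needs fewer cases; more power needs more cases.
print("N for a larger effect:", sample_size_one_sample_z(50, 60, 10))
print("N for 95% power:", sample_size_one_sample_z(50, 55, 10, power=0.95))
```

The sketch makes the trade-offs visible: N rises as the desired power rises or as α shrinks, and falls as the effect size grows.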
if you would like to review my narrative
of power and some examples of sample size estimation, follow this link.
For some additional
examples, please examine this page!
What are the potential limitations on designing a study to
have 90% - 95% power?
how is power related to sample size?
how is power related to effect size?
how is power related to α error?
Some Examples
are right-handed and left-handed people equally common in
a group that has been sampled randomly?
the problem: essentially, are people right- and left-handed
in equal proportions?
the solution: a null hypothesis, alternate hypothesis, and
a test
H0: there is NO difference between the proportion of left-
and right-handed people, or in other words right-handers = 50% and left-handers
= 50%
H1: there is a difference between the proportion of right- and
left-handed people, or in other words the proportions of right-handers and
left-handers are not equal
the book claims that we need to find only 61 right-handers
out of 100 randomly selected people to reject H0
we must attempt to reject the null hypothesis by demonstrating
that our OBSERVED proportion of right-handers (61%) is greater than the
EXPECTED proportion of right-handers (50%), and the OBSERVED proportion
of left-handers (39%) is less than EXPECTED proportion of left-handers
(50%)
in fact, we can reject H0 because
the probability is less than 0.05 that we would obtain such a great difference
between our OBSERVATIONS and our EXPECTATIONS
since we were able to reject the null hypothesis, we must
have had enough statistical power
what hypothesis do we retain?
what would happen if we tried to do the same study with
a total of 50 observations? (see this link to compare the p values for a sample
of 100 randomly selected people and a sample of 50 randomly selected people)
the null hypothesis would remain the same
the alternate hypothesis would remain the same
we would apply the same test
let us assume that we have the same proportion of right-handers
as before 61%
we cannot reject the null hypothesis now because the probability
is greater than 0.05
we are using the same proportions as before, but what
is different?
the total number of cases has been cut in half
the reduction in the sample size has reduced our statistical
power
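The effect of halving the sample can be illustrated numerically. The notes do not name the test, so this sketch ASSUMES a chi-square goodness-of-fit test with 1 degree of freedom against an expected 50/50 split; the function name is hypothetical, and the p value uses the identity that the upper tail of a 1 df chi-square equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_equal_split(n_right, n_left):
    """Chi-square statistic and p value for H0: right = left = 50% (1 df)."""
    n = n_right + n_left
    expected = n / 2
    chi2 = ((n_right - expected) ** 2 + (n_left - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return chi2, p_value


# 61% right-handers out of 100 people.
chi2_100, p_100 = chi_square_equal_split(61, 39)

# Same 61% out of 50 people; 30.5 is not a possible head count, but it keeps
# the proportion identical so that only the sample size changes.
chi2_50, p_50 = chi_square_equal_split(30.5, 19.5)

for label, chi2, p in (("N = 100", chi2_100, p_100), ("N = 50", chi2_50, p_50)):
    decision = "reject H0" if p <= 0.05 else "retain H0"
    print(f"{label}: chi-square = {chi2:.2f}, p = {p:.4f} -> {decision}")
```

With the proportions held fixed, cutting N in half cuts the chi-square statistic in half, which is enough here to move the p value from below 0.05 to above it.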
Assignment
#4, Due prior to class 9/24/09
Text Reading & Text Problems
Read De Veaux Chapter 6, 7, 8
Problems: All problems are additional problems based on Chapter
6.
Additional Problem A
1) Make your own solution of the class data set, computing
heavier than tall for each of the 13 students. In order according to case
ID, display the corresponding z-scores, and indicate for each of the 13
cases whether or not they are heavier than they are tall. Assume that the sample
summary statistics are representative of the population parameters.
2) What proportion of our 13 cases is heavier than they
are tall?
Additional Problem B; download the class example z-score
solutions for whether or not you will get an interview
from your qualifying test score
A new cancer treatment has been developed and it has been
put into clinical trials around the world. It is thought that this new treatment
could escalate total cholesterol levels and thus cholesterol needs to be monitored
closely. You are in charge of the data for two of the clinical centers, including
one here in Washington and the other in Milan, Italy. Each of these two centers
has enrolled 50 patients and each of the two centers has a normal distribution
for cholesterol values, but unfortunately the normal ranges are different.
In Washington, the mean and standard deviation are 225 ±
33 mg/dl and in Milan the mean and standard deviation are 190 ± 30
mg/dl. Further, the most extreme patient in Washington has total cholesterol
of 320 mg/dl and the most extreme patient in Milan has a total cholesterol
of 280 mg/dl.
1) is the value from Washington or Milan more deviant?
2) what is the Washington raw score delineating the top
10% of all total cholesterol values?
3) in Milan, how many patients exhibit a total cholesterol
less than 150 mg/dl?
4) in Washington, how many patients exhibit a total cholesterol
less than 150 mg/dl?
5) how confident should we be that these extreme values
would not be exceeded in the future and why?
Additional Problem C
1) In your own words, describe the difference between the
null hypothesis and the experimental hypothesis.
2) State in your own words an experimental hypothesis involving
some aspect of health and behavior. If you were to test this hypothesis, provide
a statement of the hypothesis that you would test.