
HFIT-565
Assessment & Evaluation
of Health Fitness Parameters
Fall Semester 2008
Dr. Marc Schaeffer
mschaef@american.edu
Lecture Notes Class #11
Thursday November 6, 2008

χ² (chi-square)
This statistical test for comparing proportions is simple and
convenient, but it is often abused. To perform this test we need data
in the form of observations, and these observations are compared with formula-generated
expectations. Many students become unnecessarily confused in pondering
the difference between observed and expected values.
- observed values are the data gathered and tallied
to test a hypothesis
- in contrast, the expected values are the counts we would see
if the null hypothesis were true; they are derived from the observed totals
It may not be immediately obvious, but the χ²
test does NOT test absolute counts; it tests relative proportions. Although
observed counts are the raw input, the results are adjusted through a process
of generating expected values and comparing each observed value with its expected
value. It is well worth your time to spend a few moments considering the formula
for this test.
As an initial example, let's go back to a problem we first
looked at in Lecture #4; it is repeated below. As you
can see, the emphasis is more on the added power one derives from a larger
sample. As you read through this example again, you should be asking yourself
how the EXPECTED VALUES are derived.
- let's consider the example problem in Lecture #4 dealing
with right- & left-handedness
- the problem: essentially, are people right- and left-handed
in equal proportions?
- the solution: a null hypothesis, alternate hypothesis, and
a test
- H0: there is NO difference between the proportion of left-
and right-handed people, or in other words right-handers = 50% and left-handers
= 50%
- H1: there is a difference between the proportion of right- and
left-handed people, or in other words the proportions of right-handers and left-handers
are not equal
- Lecture #4 suggests that we need to find only 61 right-handers
out of 100 randomly selected people to reject H0
- we must attempt to reject the null hypothesis by demonstrating
that our OBSERVED proportion of right-handers (61%) is greater
than the EXPECTED proportion of right-handers (50%), and the OBSERVED
proportion of left-handers (39%) is less than EXPECTED proportion
of left-handers (50%)
- in fact, we can reject H0 because
the probability is less than 0.05 that we would obtain such a great difference
between our OBSERVATIONS and our EXPECTATIONS
- since we were able to reject the null hypothesis, we must
have had enough statistical power
- what hypothesis do we retain?
- what would happen if we tried to do the same study with
a total of 50 observations? (compare the p values for a sample of
100 randomly selected people and a sample of 50 randomly selected people)
- the null hypothesis would remain the same
- the alternate hypothesis would remain the same
- we would apply the same test
- let us assume that we have the same proportion of right-handers
as before, 61%
- we canNOT reject the null hypothesis now because
the probability is greater than 0.05
- we are using the same proportions as before, but what
is different?
- the total number of cases has been cut in half
- the reduction in the sample size has reduced our statistical
power
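The effect of sample size in this example can be checked numerically. The sketch below is a minimal illustration, assuming a χ² goodness-of-fit test with 1 df (Lecture #4 may have used a different test) and allowing the fractional count of 30.5 right-handers that 61% of 50 implies. The function name is hypothetical.

```python
import math


def chi_square_equal_split(right_handers, total):
    """Chi-square for H0: right-handers = left-handers = 50% (1 df)."""
    observed = [right_handers, total - right_handers]
    expected = [total / 2, total / 2]  # H0 splits the total evenly
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df only, the upper-tail p value equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2_100, p_100 = chi_square_equal_split(61, 100)
# 61% of 50 is 30.5; the fractional count is an assumption for illustration.
chi2_50, p_50 = chi_square_equal_split(30.5, 50)

print(f"N = 100: chi-square = {chi2_100:.2f}, p = {p_100:.3f}")
print(f"N = 50:  chi-square = {chi2_50:.2f}, p = {p_50:.3f}")
```

With the proportions held at 61% and 39%, halving N halves the χ² value, which is why the smaller sample no longer reaches p < 0.05.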
If you were awake when you read the above example, you recognized
early that the null hypothesis is that there is no difference between the proportion
of individuals who are right- and left-handed. Because there must be exactly
the same total of observed cases as expected cases,
you may be able to figure out how the expected values were generated in this
problem. There is a very simple set of rules for generating the expected
values for any problem of this nature, and you will see these in the next example.
Read through the given parts
of this hypothetical problem, paying attention to how the Expected
values are obtained.
Hypothetical example:
- we are evaluating a worksite health promotion program at
an office with 5,000 employees
- in many, but not all programs of this nature, we see greater
membership of men than women
- one question to answer is whether or not men are participating
in greater proportion than women
- what is the null hypothesis and what is the alternate hypothesis?
- we need to know:
- how many men and women are members of the health promotion
program?
- there are 664 members who are men
- there are 397 members who are women
- how many men and women are employed at this location?
- there are 3,000 employees who are men
- there are 2,000 employees who are women

- the above facts are our OBSERVED data
- we now have all of the raw data to generate a solution
- we must now generate our EXPECTED data
- generation of χ² EXPECTED values
- calculate the marginal values for the observed
matrix
- the marginal totals for women and men as seen above
are 2,000 and 3,000 respectively
- the marginal totals for Members (no, yes) are 3,939
and 1,061 respectively
- calculate the expected values by multiplying the marginal
row & column totals and dividing by the total number of cases
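The rule above can be sketched in a few lines of Python. This is only an illustration under the counts given in the example; the variable names are hypothetical, and the non-member counts are obtained by subtracting members from employees.

```python
# Observed counts from the worksite example, by sex and membership.
observed = {
    "men": {"yes": 664, "no": 3000 - 664},
    "women": {"yes": 397, "no": 2000 - 397},
}

# Marginal totals for the rows (sex) and columns (membership).
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}
col_totals = {
    status: sum(observed[sex][status] for sex in observed)
    for status in ("yes", "no")
}
grand_total = sum(row_totals.values())

# Expected count = (row total x column total) / total number of cases.
expected = {
    sex: {
        status: row_totals[sex] * col_totals[status] / grand_total
        for status in col_totals
    }
    for sex in observed
}

for sex in expected:
    for status in expected[sex]:
        print(sex, status, round(expected[sex][status], 1))
```

Note that the expected counts add up to the same 5,000 cases as the observed counts.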


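- each of the four cells of this design will be substituted
into this formula, where O is the observed count and E is the expected count for a cell:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
- thus, the following would be necessary if
computing by hand (but we will be using Excel):
$$\chi^2 = \frac{(664 - 636.6)^2}{636.6} + \frac{(2336 - 2363.4)^2}{2363.4} + \frac{(397 - 424.4)^2}{424.4} + \frac{(1603 - 1575.6)^2}{1575.6}$$
- this expression simplifies to a χ²
value = 3.742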
- now we must determine the proper degrees of freedom
- number of rows minus 1 = rows df
- number of columns minus 1 = column df
- multiply the rows df by column df to obtain
the χ² degrees of freedom
- in this example, df = (2 - 1) x (2 - 1) = 1
- in the text Appendix D (on page A60), we read the χ²
critical value (at p = 0.05) for 1 df = 3.841
- we do not reject the null hypothesis because we did not
meet or beat the tabled value (tabled value = 3.841) with our test statistic
(test statistic = 3.742)
- Assumptions
- must use frequency data (counts of persons or events)
- if the expected value in any cell is less than
5, another test should be used
- the sum of the observed frequencies must equal
the sum of the expected frequencies
- each frequency count must be independent of each other
count
- Using Excel to assist with χ² solutions
- there are three different χ² functions
- use chitest to generate a p value from
observed and expected data
- use chidist to determine the p value if you
know chi-square and df
- use chiinv if you want to determine a chi-square
value for a given p value and df
- you will probably find chitest to be the most useful application
in Excel
- an example of each of these χ²
functions will be demonstrated in class
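The worksite result can also be checked outside Excel. The following is a minimal sketch in Python, assuming the four expected counts (636.6, 2363.4, 424.4, 1575.6) that follow from the marginal totals. The erfc shortcut for the p value holds only for 1 df, and the printed p value is meant to play the same role as the one chitest reports from observed and expected data, although that correspondence has not been verified here.

```python
import math

# Observed and expected counts for the worksite example, in the order
# men-members, men-non-members, women-members, women-non-members.
observed = [664, 2336, 397, 1603]
expected = [636.6, 2363.4, 424.4, 1575.6]  # assumed from the marginal totals

# Chi-square: sum over cells of (O - E)^2 / E.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = (2 - 1) * (2 - 1)  # (rows - 1) x (columns - 1)

# Upper-tail p value; this closed form is valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

critical_value = 3.841  # tabled value at p = 0.05 for 1 df
reject_h0 = chi2 >= critical_value

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```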
Student's t Distribution
- similar to the z distribution, the t distribution
is important in estimating the true mean (μ) from the sample mean
(M)
- the t distribution, however, does NOT make use of the
true standard deviation (σ)
- there are a few requirements
- sample should be drawn from a normally distributed population
- the sample should have a mean (M) and standard
deviation (S)
- additionally, we need an estimate of σ_M, and
for this we can use S_M, the sample standard deviation divided by the square root of N:
$$S_M = \frac{S}{\sqrt{N}}$$

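- you should be able to see the subtle but important difference
below: z divides by σ_M, whereas t divides by S_M
$$z = \frac{M - \mu}{\sigma_M} \qquad t = \frac{M - \mu}{S_M}$$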

- also important: the z distribution
is normal, but the t distribution is not
- additionally, the t distribution changes with the
size of the sample
- however, when the sample becomes infinitely large, the z
and t distributions are exactly the same
- consequently, we must have a method of dealing with changes
in the t distribution as the sample size varies
- this is accomplished through the associated degrees
of freedom (df) for t
- you will need to learn how to read Appendix E to interpret
t values properly
- for t, df = N-1
Testing Hypotheses about Sample Means
- let's say we know a population mean and we want to compare
a sample mean to it
- assuming we have the population mean and the sample's size, mean,
and standard deviation, we can proceed
- let's solve the example where we know the adolescent boy
population mean for daily calcium intake and want to compare it to a sample
mean for 64 impoverished boys from an urban ghetto
- we are given the population mean of 1400 mg and the sample
(N = 64) mean is 1200 mg with a standard deviation of 200 mg
- how many cases (N) are there in this sample? what
is the square root of N?
- what is the null hypothesis?
- what is the alternate hypothesis?
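- first we calculate the value for S_M
$$S_M = \frac{S}{\sqrt{N}} = \frac{200}{8} = 25$$
- next we use the t formula; the sign is negative because the sample mean lies
below the population mean, and we compare the magnitude, 8.000, with the tabled value
$$t = \frac{M - \mu}{S_M} = \frac{1200 - 1400}{25} = -8.000$$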
- with N = 64, df = 63
- we look in Appendix D (on page A59) to see that we need
to exceed a t value of about 2.000 to reject the null hypothesis
- so, can we reject the null hypothesis with our t
= 8.000? yes, we can since 8.000 is > 2.000
by a large margin
- what is the interpretation of this finding?
- what is the 95% CI of the true mean? (see below
for how we make this calculation)
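The calcium example can be worked through in code as a check on the hand calculation. This is a minimal sketch; the variable names are hypothetical, and the critical value of 2.000 is the approximate tabled value quoted above rather than something computed here.

```python
import math

population_mean = 1400.0  # mg of calcium per day
sample_mean = 1200.0      # mg, sample of impoverished boys
sample_sd = 200.0         # mg
n = 64

# Standard error of the mean: S_M = S / sqrt(N) = 200 / 8
standard_error = sample_sd / math.sqrt(n)

# t = (M - mu) / S_M
t = (sample_mean - population_mean) / standard_error
df = n - 1

critical_t = 2.000  # approximate tabled value at alpha = 0.05
reject_h0 = abs(t) > critical_t

print(f"S_M = {standard_error}, t = {t}, df = {df}, reject H0: {reject_h0}")
```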
Assumptions
- random sampling
- scores in the sample are independent of one another
- sample comes from a normal distribution
- the population standard deviation is unknown and the sample
standard deviation is used
- the measurement scale is at least interval
Confidence Intervals
- if we take the t formula and solve for μ, we obtain:
$$\mu = M \pm t \cdot S_M$$
we must know M, S_M, and t
- given the above, we can then calculate the 95% CI of the
true mean
- let's calculate the 95% CI for the last problem
involving dietary calcium in the normal adolescent population compared with
impoverished boys
- how do we start this process?
- what do we want to know? the CI of the true mean
- what formula must we use? and what variable are we solving
for?
- we will use the t formula solved for μ
- what values must be given to proceed? we must know M,
S_M, and t
- let's substitute the values that
are given for M, S_M, and t
- M = 1200, S_M = 25,
but how do we determine t?
- how many df are there in this problem? - reminder:
there are 64 subjects, so there are 63 df
- if we now turn to Appendix E and read the t value
for 63 df for alpha = 0.05 we see t = 2.000
- thus, we now have μ = 1200 ± (2.000)*(25)
- now that we have covered all this, what violation of
assumption have we made?
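The interval arithmetic above can be sketched as follows. This is only an illustration using the values already given (M = 1200, S_M = 25, t = 2.000); the variable names are hypothetical.

```python
sample_mean = 1200.0    # M, mg of calcium per day
standard_error = 25.0   # S_M
t_critical = 2.000      # tabled t for 63 df at alpha = 0.05, as read above

# mu = M +/- t * S_M
margin = t_critical * standard_error
lower = sample_mean - margin
upper = sample_mean + margin

print(f"95% CI of the true mean: {lower:.0f} to {upper:.0f} mg")
```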
Testing the difference between two sample means
- for us, this is perhaps more practical than comparing a
sample mean with the population mean
- in other words, it is probably more useful for us to focus
on problems involving t-distributions when we are trying to determine
if two sample means came from two populations with the same mean
- since we have two sample means, we need to allow for the respective
SEMs for each mean
- when the size of each of the two samples is the same, we
have the following expression in the denominator of the t formula:
$$\sqrt{S_{M_1}^2 + S_{M_2}^2}$$
- so when the two sample sizes are equal, the revised t
formula becomes:
$$t = \frac{M_1 - M_2}{\sqrt{S_{M_1}^2 + S_{M_2}^2}}$$
- the df for this t = N1 + N2 - 2 (e.g., N1 = N2
= 15; df = 28)
- now, let's do an example to see how all of this works
We have randomly selected 16 high school seniors and randomly
assigned 8 subjects into each of two groups. One group served as study controls.
The other group received a daily two-month intervention of climbing a 50 ft
rope. To test whether or not upper body strength was affected after the two-month
intervention, each group member was asked to perform as many pull-ups as possible.
With the following counts for pull-ups per individual, does it appear as though
the intervention group performed better than the control group on the upper
body strength test?
- what are the independent and dependent variables?
- what is the alternate hypothesis and what is the null hypothesis?
- what is the value of t?
- how many df are there for this problem?
- do we reject or retain the null hypothesis?
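The mechanics of the equal-N t formula can be sketched in Python. The pull-up counts for this problem are not reproduced in these notes, so the counts below are HYPOTHETICAL values invented purely to show the calculation; they are not the class data, and the function name is also an assumption.

```python
import math
import statistics


def equal_n_t(group_1, group_2):
    """t for two independent samples of equal size, with df = N1 + N2 - 2."""
    if len(group_1) != len(group_2):
        raise ValueError("this form of the formula assumes equal sample sizes")
    n = len(group_1)
    # Squared standard error of each mean: S^2 / N
    se_1_squared = statistics.variance(group_1) / n
    se_2_squared = statistics.variance(group_2) / n
    difference = statistics.mean(group_1) - statistics.mean(group_2)
    t = difference / math.sqrt(se_1_squared + se_2_squared)
    df = 2 * n - 2
    return t, df


# HYPOTHETICAL pull-up counts, 8 subjects per group (not the class data).
intervention = [4, 6, 6, 6, 7, 7, 9, 11]
control = [2, 4, 4, 4, 5, 5, 7, 9]

t, df = equal_n_t(intervention, control)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t would then be compared with the tabled t value for the same df, exactly as in the single-sample case.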
Assignment #11,
Due prior to class 11/13/08
Text Reading & Text Problems
- Read De Veaux Chapters 23 and 24
- Problems Chapter 23: 3, 4, 6, 8, 14, 18
Additional Chi-square & t-test problems: you
will have to use the mid-term exam raw data to perform each of the following.
- A. derive the proper chi-square to agree with the
mid-term Table 1 row for females sex (SEX is the variable name)
- 1 give the null hypothesis
- 2 produce the table of observed counts
- 3 produce the table of expected counts
- 4 how many degrees of freedom are there
- 5 what is the chi square test statistic
- 6 what is the p-value for this test statistic
- 7 conclusion
- 8 interpretation
- B. derive the proper chi-square to agree with the
mid-term Table 1 row for marital status of participants (MSTAT is the
variable name)
- 1 give the null hypothesis
- 2 produce the table of observed counts
- 3 produce the table of expected counts
- 4 how many degrees of freedom are there
- 5 what is the chi square test statistic
- 6 what is the p-value for this test statistic
- 7 conclusion
- 8 interpretation
- C. derive the proper chi-square for
mid-term variable SATIS (as the DV), splitting the data into two groups
as follows: split the 500 cases into two unequal-sized groups, one being
those with WT_CHG less than or equal to 2 lbs and the other being
those with WT_CHG greater than 2 lbs; the sum of these two groups should
still be 500 cases. Thus, the two groups representing two levels of the
IV are ≤ 2 lbs WT_CHG and > 2 lbs WT_CHG, and the DV is SATIS.
- 1 give the null hypothesis
- 2 produce the table of observed counts
- 3 produce the table of expected counts
- 4 how many degrees of freedom are there
- 5 what is the chi square test statistic
- 6 what is the p-value for this test statistic
- 7 conclusion
- 8 interpretation
- 9 plot the appropriate graph to exhibit this result
- D. derive the proper t-test to agree with the mid-term
Table 1 row for weight loss (WT_CHG is the variable name)
- 1 give the null hypothesis
- 2 what is the independent variable and how is it scaled
- 3 what is the dependent variable and how is it scaled
- 4 how many degrees of freedom are there
- 5 what is the t-test statistic
- 6 what is the p-value for this test statistic
- 7 conclusion
- 8 interpretation
- 9 develop the 95% CI for each condition of treatment
Solutions


- this page last modified by M Schaeffer
- on November 13, 2008