a correlation of 0.000 indicates a total lack of relationship
a correlation of either -1.000 or +1.000 is called a
perfect relationship - why?
the ± sign of the correlation reflects the negative/positive
slope of regression line
negative correlation does not necessarily mean worse
than positive
which relationship is stronger, r = -0.750
or r = +0.750?
we will try to always use three decimal places in our correlation coefficients
example
a distribution with mean = 50, sd = 10
10,000 ordered pairs drawn randomly from this distribution
(in sets of 10 pairs each)
the known correlation is 0.000
now we plot the Frequency of the 10,000 correlations
correlation coefficients (r) on the x-axis
frequency on the y-axis
note that 0.000 and low correlations are in the fat part
of this curve
note also that correlations > 0.632 comprise 2.5% of
the curve, and that correlations < -0.632 comprise another
2.5% of the curve
thus, even when sampling from data with no correlation,
5.0% of the correlation coefficients appear to reflect strong relationships
(2.5% positive and 2.5% negative)
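The sampling exercise above can be approximated with a short simulation. This is only a sketch under stated assumptions: the notes give a mean and sd but not the shape of the distribution, so a normal distribution is assumed; X and Y are drawn independently (true correlation 0.000); and the example is read as 10,000 sets of 10 pairs each. The function names and the seed are illustrative, not part of the original example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_null_correlations(n_sets=10_000, pairs_per_set=10,
                               mean=50.0, sd=10.0, seed=1):
    """Draw independent X and Y values (assumed normal) and return one r per set."""
    rng = random.Random(seed)
    rs = []
    for _ in range(n_sets):
        xs = [rng.gauss(mean, sd) for _ in range(pairs_per_set)]
        ys = [rng.gauss(mean, sd) for _ in range(pairs_per_set)]
        rs.append(pearson_r(xs, ys))
    return rs


rs = simulate_null_correlations()
upper_tail = sum(r > 0.632 for r in rs) / len(rs)
lower_tail = sum(r < -0.632 for r in rs) / len(rs)
print(f"share of r >  0.632: {upper_tail:.3f}")
print(f"share of r < -0.632: {lower_tail:.3f}")
```

Under these assumptions each tail should come out near 2.5%, with some run-to-run variation depending on the seed.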
NOTE how much more closely the points approximate
a straight line in this graph than in the one above
first let us obtain the XY products and sum them: sum of XY = 158
now divide the summed products by N: 158 ÷ 5 = 31.6
find the mean of X and the mean of Y; 6 and 4 respectively
now calculate sigma sub x and sigma sub y
sigma sub x = 4
sigma sub y = 2
assemble terms and remove the parentheses: (31.6 - 6 × 4) ÷ (4 × 2)
= (31.6 - 24) ÷ 8 = 7.6 ÷ 8 = 0.950
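The arithmetic in these steps can be checked with a few lines of Python. This is a minimal sketch that uses only the summary values given in the notes (sum of XY = 158, N = 5, means 6 and 4, sigmas 4 and 2). It assumes the sigmas are population standard deviations (computed with N, not N - 1), which is what this form of the formula requires. The function name `r_from_summary` is hypothetical.

```python
def r_from_summary(sum_xy, n, mean_x, mean_y, sigma_x, sigma_y):
    """r = (mean of XY products - mean_x * mean_y) / (sigma_x * sigma_y)."""
    mean_of_products = sum_xy / n                    # 158 / 5 = 31.6
    covariance = mean_of_products - mean_x * mean_y  # 31.6 - 24 = 7.6
    return covariance / (sigma_x * sigma_y)          # 7.6 / 8


r = r_from_summary(sum_xy=158, n=5, mean_x=6, mean_y=4, sigma_x=4, sigma_y=2)
print(f"r = {r:.3f}")  # prints: r = 0.950
```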
is this r = 0.950 statistically significant?
we have N-2 df or 5-2 = 3 df in this problem
for 3 df we need an r of at least 0.878
to reject the null hypothesis
thus, we can reject the null hypothesis with r = 0.950
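The critical value of 0.878 can be recovered from a t table. The sketch below assumes a two-tailed test at alpha = 0.05 (the notes do not state alpha, but 0.878 matches that choice) and takes the critical t for 3 df, 3.182, from a standard t table rather than computing it. The helper names are hypothetical.

```python
import math


def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def t_from_r(r, df):
    """Convert an observed r into its t statistic with df = N - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


T_CRIT_DF3 = 3.182  # assumed: two-tailed, alpha = 0.05, 3 df (standard t table)

df = 5 - 2
r_observed = 0.950
r_crit = critical_r(T_CRIT_DF3, df)
t_observed = t_from_r(r_observed, df)

print(f"critical r for {df} df = {r_crit:.3f}")  # prints: critical r for 3 df = 0.878
print(f"observed t = {t_observed:.3f}")
print("reject null" if abs(r_observed) >= r_crit else "fail to reject null")
```

Comparing r with the critical r and comparing t with the critical t are equivalent; both lead to the same decision here.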
let's take a look at the best fitting regression line
in this new graph
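The line in such a graph can be reconstructed from the summary statistics alone. This is a sketch that assumes the graph shows the regression of Y on X for the worked example (means 6 and 4, sigmas 4 and 2, r = 0.950); the function name is hypothetical. The slope is r × (sigma sub y ÷ sigma sub x), so it always carries the same sign as r.

```python
def regression_line(r, mean_x, mean_y, sigma_x, sigma_y):
    """Least-squares line of Y on X from summary statistics."""
    slope = r * sigma_y / sigma_x          # same sign as r
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = regression_line(r=0.950, mean_x=6, mean_y=4,
                                   sigma_x=4, sigma_y=2)
print(f"predicted Y = {intercept:.3f} + {slope:.3f} * X")
# prints: predicted Y = 1.150 + 0.475 * X
```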
what do you observe in comparing these last two
line-of-best-fit graphs?
Assignment #5,
Due prior to class 10/1/09
Text Reading &
Text Problems
Be sure to obtain
a copy of the download for your assignment.
Read De Veaux Chapters 9, 12, 13
CHAPTER 10 will NOT be covered
Chapter 11 will be skipped temporarily
Use either the Excel CORREL or PEARSON function to derive
r when the correlation is requested in each of the following
TEXTBOOK problems and where instructed in the Additional Problems.
Extremely important! Chapter 7, problems 8, 35, 38,
40, 42 have different questions (as stated below) than are stated in
the text. Please answer the questions below as instructed, but you will
need to read the text to become oriented to the circumstances and given
information.
Problems Chapter 7: 4, 8, 10, 35, 38, 40, 42
4 please answer each part a through d using the prompts
in the answer sheet blanks, AND no Excel work
is required on this problem
8 there is no need to show any work or access the data
from the text's accompanying CD, AND no Excel
work is required on this problem
a) which variable is the response variable; why;
which of the four measurement scales does the variable represent?
b) describe the direction, form, & strength
c) describe the rate of performance change across
the range of time depicted
10 please answer each part a through c, noting that
the data are on the worksheet for the histogram (i.e., count of days as
a function of daily revenue) - please answer text problems as stated
35 the raw data are on the designated worksheet tab
a) make an appropriate scatterplot
b) determine the correlation r
c) describe the direction, form, & strength
d) interpret the plot, considering the value of
r
38 the raw data are on the designated worksheet tab
a) make an appropriate scatterplot
b) determine the correlation r
c) describe the direction, form, & strength
d) interpret the plot, considering the value of
r
40 the raw data are on the designated worksheet tab
a) make an appropriate scatterplot for attendance
and wins
b) determine the correlation r
c) describe the direction, form, & strength
d) interpret the plot, considering the value of
r
e) compare the interpretation of your plot with
the scatterplot exhibiting attendance as a function of runs
42 the raw data are on the designated worksheet tab
a) make an appropriate scatterplot
b) determine the correlation r
c) describe the direction, form, & strength
d) interpret the plot, considering the value of
r
Please notice you have a formatted answer sheet and most
of your worksheets are started for you. For each of the chapter 7 problems
you need not show any work to support the text boxes other than as instructed
above. Please notice that the text boxes will accept text in a customary word
processing manner with automatic carriage returns. Do
not overflow a text box or else.... Please remember to place
correlation work on the corresponding worksheet tab.
Additional Problems
Each of the following problems pertains to the attached raw
data in your download. I strongly suggest that you review all of the problems
and the accompanying worksheets before you begin this series of questions.
This exercise is extremely similar to part of the mid-term exam.
A.) For A, please complete the worksheet tab labeled "codebook".
These answers should NOT be displayed
on the answer sheet, but only on the codebook tab worksheet. I highly recommend
using Min, Max functions, sorting, and/or AutoFilter for this task. No
work needs to be shown on this problem.
For each of the following, B.1 through B.8b, fill in the yellow
cells with numbers only; NO
TEXT, please. In contrast, on problems B.8c, B.8d, B.9b and B.9c,
you need to enter text, but you must limit your TEXT
response (text only) to the size of the box. Please notice that
I have formatted the boxes so that your text will wrap automatically without
having to hit a line feed. Anybody who tries to expand the box size or reduce
the font size (below 10 pt) or in any other way tries to find a way to jam more
words in these boxes than will fit will lose full credit.
For B.2 through B.8, I highly recommend that you use at least
1 page for each problem, and you can see that I have already inserted a page
tab for each problem subpart B.2 through B.8. You should also note that I have
intentionally deleted some of the unnecessary columns of the full raw data set
on selected pages from B.2 to B.8. This is to make your solution easier to follow.
B.1) complete the yellow-shaded, boxed cell for the sample
total N
For B.2 through B.7, complete the shaded cells using the Excel 'AutoFilter'
feature and, if necessary, other Excel functions.
B.2) using Excel determine how many patients are in the
placebo group and how many are in the drug group
B.3) using Excel determine how many males and how many females
are in the placebo and drug groups
B.4) using Excel determine how many male and female patients in EACH
treatment had a successful PTCA
B.5) use Excel to determine the mean age for male and female
patients in EACH treatment
B.6) use Excel to determine the mean height AND sd for all
patients in EACH treatment
B.7) use Excel to determine the mean weight AND sd for all
patients in EACH treatment
B.8a) use Excel to graph weight as a function of height
for all patients; to receive full credit on this part, you must scale
your x-axis from 140 - 240 cm in increments of 10 cm and your y-axis from
40 - 140 kg in increments of 10 kg
B.8b) the correlation coefficient (r) for
the association between weight and height is: ___
B.8c) describe the direction, form, & strength
B.8d) interpret the plot, considering the value of r
EXTRA CREDIT
B.9a) Plot a column graph to compare the mean ages for patients
in the placebo group and the drug treatment group (to obtain credit, please
show all work)
B.9b) state the null hypothesis to go with the graph in
B.9a
B.9c) state the corresponding experimental hypothesis for
B.9b