Hypothesis Tests in R
Directions (read before starting)
\(~\)
This lab focuses on the implementation of several common hypothesis tests in R. You should recognize that many of these functions were first introduced in Lab 11 (confidence intervals in R).
You are not expected to memorize any of these functions for Exam 2. However, you should be prepared to use them on your project, and you should be prepared to interpret their output on the exam.
\(~\)
In this part of the lab we’ll use the “infant heart” data set that has been used in several previous labs. As a refresher, these data come from a Harvard Medical School study where infants born with congenital heart defects were randomly assigned to receive one of two different surgeries, low-flow bypass or circulatory arrest, then their development, measured by their MDI (mental development index) and PDI (psychomotor development index) scores, was recorded 2 years after the surgery.
inf_heart = read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
In R we can perform hypothesis tests involving a single proportion or a difference in proportions using the prop.test() function. The example below demonstrates a hypothesis test on a single proportion:
## Find the numerator and denominator of the sample proportion
n_male = sum(inf_heart$Sex == "Male")
n_total = nrow(inf_heart)
## Use prop.test and print the p-value for H0: p = 0.5
results = prop.test(x = n_male, n = n_total, p = 0.5)
results$p.value
## [1] 6.310591e-06
Notice that the null hypothesis in this test was \(H_0: p = 0.5\), which we set using the argument p = 0.5. This hypothesis implies that male and female infants were equally likely to be selected into this study (ie: that congenital heart defects are equally common among each sex). We have overwhelming statistical evidence against this null hypothesis.
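For reference, prop.test() reports a chi-squared statistic (X-squared) rather than a z statistic. When the continuity correction is turned off, that statistic is exactly the square of the one-proportion z statistic you would compute by hand. The sketch below illustrates this using the counts from the example above (99 males out of 143 infants); note that correct = FALSE is needed for an exact match, since prop.test() applies a continuity correction by default:

```r
## Counts taken from the example above: 99 males out of 143 infants
n_male  <- 99
n_total <- 143

## By-hand z statistic for H0: p = 0.5
p_hat <- n_male / n_total
z <- (p_hat - 0.5) / sqrt(0.5 * (1 - 0.5) / n_total)

## With correct = FALSE, prop.test()'s X-squared statistic is z^2
results <- prop.test(x = n_male, n = n_total, p = 0.5, correct = FALSE)
c(z_squared = z^2, X_squared = unname(results$statistic))
```

This is why the default output of prop.test() will not exactly match a by-hand z-test unless you disable the correction.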
In previous analyses of these data some groups have debated whether there is an imbalance in the proportion of male infants across each type of surgery. We can assess this using a difference in proportions test:
## Find the numerator and denominator of both sample proportions
n_male_lf = sum(inf_heart$Sex == "Male" & inf_heart$Treatment == "Low-flow bypass")
n_total_lf = sum(inf_heart$Treatment == "Low-flow bypass")
n_male_ca = sum(inf_heart$Sex == "Male" & inf_heart$Treatment == "Circulatory arrest")
n_total_ca = sum(inf_heart$Treatment == "Circulatory arrest")
## Use prop.test, printing the sample difference in proportions and the corresponding p-value for H0: p1 - p2 = 0
results = prop.test(x = c(n_male_lf, n_male_ca), n = c(n_total_lf, n_total_ca), correct = FALSE)
results$estimate
## prop 1 prop 2
## 0.6714286 0.7123288
results$p.value
## [1] 0.5962946
Here we can conclude that there is no statistical evidence of an imbalance in the distribution of sexes assigned to each type of surgery. This should align with our expectations as each infant in the study was randomly assigned to the type of surgery they received.
Question 1: Using a Normal approximation and the test statistic approach, replicate “by hand” the hypothesis test assessing whether there was an imbalance in the distribution of sexes assigned to each type of surgery. Your answer should clearly state the null and alternative hypotheses, show the calculations of your test statistic, and provide a two-sided p-value that you found using StatKey which is close to the one found in the example given above.
Question 2: Various studies have consistently found that male infants are more likely to be born with congenital heart defects than female infants. Many of these studies suggest a 2:1 sex imbalance, implying that roughly two-thirds of this study's sample should be male if it were to accurately reflect the target population. Using R, perform a hypothesis test to evaluate whether the sample data significantly deviate from this expectation. Be sure to clearly state your hypotheses and provide both a \(p\)-value and a brief conclusion involving the context surrounding the test.
\(~\)
The procedure used by prop.test() relies upon a Normal approximation that is only reasonable when at least 10 observations in your sample(s) experienced each outcome involved in the proportion (ie: \(np \geq 10\) and \(n(1-p) \geq 10\) for a single proportion).
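As a quick sketch, you can check this condition in R before trusting prop.test(). The counts below are taken from the single-proportion example above (a sample of 143 infants, testing \(H_0: p = 0.5\)):

```r
## Check n*p >= 10 and n*(1-p) >= 10 under H0: p = 0.5
n  <- 143   # sample size from the infant heart data
p0 <- 0.5   # hypothesized proportion

c(n * p0, n * (1 - p0))            # both quantities should be at least 10
n * p0 >= 10 & n * (1 - p0) >= 10  # TRUE when the approximation is reasonable
```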
For a single proportion, an exact hypothesis test that doesn't rely upon this approximation is implemented in the binom.test() function. The arguments of this function are largely the same as prop.test():
## Find the numerator and denominator of the sample proportion
n_male = sum(inf_heart$Sex == "Male")
n_total = nrow(inf_heart)
## Now use binom.test
results = binom.test(x = n_male, n = n_total, p = 0.5)
results$p.value
## [1] 4.887162e-06
Notice this \(p\)-value is very close to the one produced by prop.test() and it still suggests overwhelming evidence against the null hypothesis. The similarity shouldn't be a surprise when considering that these data contain 99 males and 44 females (both values are well above 10).
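To see what binom.test() is doing behind the scenes, the sketch below rebuilds its exact two-sided \(p\)-value using dbinom(): it sums the probabilities of every possible count whose probability is no larger than that of the observed count (R's implementation uses a small relative tolerance when making this comparison, which the code mimics):

```r
## Exact two-sided p-value "by hand" for 99 males out of 143, H0: p = 0.5
probs <- dbinom(0:143, size = 143, prob = 0.5)   # probability of every possible count
obs_prob <- dbinom(99, size = 143, prob = 0.5)   # probability of the observed count

## Sum the probabilities of all outcomes at least as "extreme" as what was observed
## (the 1 + 1e-7 factor mirrors the tolerance used internally by binom.test)
p_exact <- sum(probs[probs <= obs_prob * (1 + 1e-7)])
p_exact
```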
Question 3: Early in the semester we worked with data from an impaired driving experiment where subjects smoked cannabis then engaged in 3 simulated drives at various time points after smoking. Before each drive they were asked whether they felt ready to drive on real roadways.
rtd = read.csv("https://remiller1450.github.io/data/Ready.csv")
Part A: Use binom.test() or prop.test() to evaluate whether a majority of cannabis users feel ready to drive 180 minutes after smoking.
Part B: Repeat the test using the other of the two functions (binom.test() or prop.test()). How does this \(p\)-value compare to the one you found in Part A?
Part C: Did it matter whether you used binom.test() or prop.test() to calculate the \(p\)-value? If so, which conclusion is more justifiable?
\(~\)
As you should recall, a difference in proportions is sometimes called a risk difference, and it is only one of several ways to summarize the association between two categorical variables. In most circumstances it is more appropriate to describe an association using an odds ratio. The fisher.test() function is used to calculate an odds ratio and perform a test of the hypothesis \(H_0: OR = 1\); you should recognize that an odds ratio of 1 indicates the odds of the value/outcome of interest are the same in both groups being compared.
## Example using relevel() to compare the odds of "Yes"
result = fisher.test(x = factor(inf_heart$Sex, levels = c("Male","Female")),
y = factor(inf_heart$Treatment, levels = c("Circulatory arrest", "Low-flow bypass")))
result$estimate # Odds ratio of receiving circulatory arrest for males vs. females
## odds ratio
## 1.210118
result$p.value # p-value for H0: OR = 1
## [1] 0.7172831
You should be mindful of the way R will order the categorical values involved in the odds ratio calculation; the factor() function can be used to provide the ordering you desire.
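One way to stay oriented is to cross-check fisher.test() against the underlying 2x2 table. The counts below are reconstructed from the proportions printed earlier (47 of 70 males in the low-flow group and 52 of 73 in the circulatory arrest group; these group sizes are inferred from the printed output). Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, so it will be close to, but not exactly equal to, the sample odds ratio \(ad/bc\):

```r
## 2x2 counts reconstructed from the proportions printed earlier
tab <- matrix(c(47, 23, 52, 21), nrow = 2,
              dimnames = list(Sex = c("Male", "Female"),
                              Treatment = c("Low-flow bypass", "Circulatory arrest")))

## Sample odds ratio (ad/bc): odds of "Male" in low-flow vs. circulatory arrest
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

## fisher.test() reports the conditional MLE, which is close but not identical
ft <- fisher.test(tab)
c(sample_or = sample_or, fisher_est = unname(ft$estimate), p_value = ft$p.value)
```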
Question 4: Modify the example above to analyze the odds ratio that compares the odds of an infant in the low-flow group being male to the odds of an infant in the circulatory arrest group being male. Notice that the estimate is different, but the \(p\)-value is the same. This is because the same variables and data are being used in a test of association, but the association is being described differently.
\(~\)
The t.test() function is used to perform both one-sample and two-sample t-tests:
## One sample t-test of H0: mu = 100
result = t.test(x = inf_heart$PDI, mu = 100)
result$p.value
## [1] 0.0001294405
## Two sample t-test of H0: mu_low_flow = mu_circ_arrest
result = t.test(PDI ~ Treatment, data = inf_heart)
result$p.value
## [1] 0.02640162
We can also use t.test() to obtain a few other pieces of information involved in the hypothesis test. This can be useful for checking your work when performing a test “by hand”.
## Two-sample test
result = t.test(PDI ~ Treatment, data = inf_heart)
result$stderr # Standard error
## [1] 2.60861
result$statistic # T-value (test statistic)
## t
## -2.243961
result$parameter # Degrees of freedom - notice it's not n-1 for a two-sample t-test
## df
## 140.2472
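If you want to see where these numbers come from, the sketch below recomputes the Welch (unequal variances) standard error, test statistic, and degrees of freedom from the group summaries; each value should match the t.test() output above:

```r
## Recompute the Welch two-sample quantities "by hand"
inf_heart <- read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
grp <- split(inf_heart$PDI, inf_heart$Treatment)  # groups in alphabetical order

m <- sapply(grp, mean)     # group means
v <- sapply(grp, var)      # group variances
n <- sapply(grp, length)   # group sizes

se <- sqrt(sum(v / n))                   # Welch standard error
t_stat <- unname((m[1] - m[2]) / se)     # test statistic (circulatory arrest minus low-flow)
df <- se^4 / sum((v / n)^2 / (n - 1))    # Welch-Satterthwaite degrees of freedom
c(se = se, t = t_stat, df = df)
```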
Question 5: In the United States a newborn is considered to be “low birth weight” if they weigh less than 2500 grams at birth.
Use the filter() function to create the necessary subsets of data.
\(~\)
Calculating \(p\)-values in R
In the previous section we saw an example where the test statistic was \(t = -2.243961\) and the degrees of freedom were \(df = 140.2472\). While you could use StatKey to translate this information into a \(p\)-value, it's worthwhile knowing how you could calculate the \(p\)-value yourself in R using the pt() function:
pt(q = -2.243961, df = 140.2472)
## [1] 0.01320082
The code above calculates the area to the left of q = -2.243961 in a t-distribution with 140.2472 degrees of freedom. You should notice that this is exactly half of the two-sided \(p\)-value of 0.02640162 given by the t.test() function in the corresponding example in the previous section.
Because pt() always returns the area to the left of the value q, you will need to be careful when using it with a positive test statistic (since you want the area to the right of the test statistic in these scenarios). In these situations you can take 1 minus the output of pt(), or you can input your test statistic as a negative value (even if it were calculated to be positive). Because we'll generally use two-sided \(p\)-values, we will multiply this area by two anyway in order to reflect possible differences from \(H_0\) that could occur in either direction.
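Putting this together, a common sketch for turning a t statistic into a two-sided \(p\)-value is to take the area beyond the absolute value of the statistic in one tail and double it:

```r
t_stat <- -2.243961   # test statistic from the example above
df <- 140.2472        # degrees of freedom from the same example

## Two-sided p-value: double the area beyond |t| in one tail
2 * pt(-abs(t_stat), df = df)

## Equivalent right-tail form (useful when the statistic is positive)
2 * (1 - pt(abs(t_stat), df = df))
```

Using -abs() means the same line of code works whether the test statistic came out positive or negative.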
Question 6: The pnorm() function can be used to calculate areas from the Standard Normal distribution in a similar manner to pt() (noting that you don't need to provide df since the Normal curve doesn't depend upon degrees of freedom).
Part A: Use the pnorm() function to find the two-sided \(p\)-value corresponding to a test statistic of -2.243961.
Part B: How does this \(p\)-value compare to the one found using pt()?
\(~\)
In this part of the lab you will need to decide for yourself which hypothesis testing approach should be used.
Question 7: Early in the semester you worked with the “police deaths” data set. These data were aggregated by the Washington Post and made publicly available here. The data set documents all fatal shootings by a police officer that took place between 2015 and mid-2020.
## Read the police data; na.omit() will remove any cases with missing values
library(dplyr)
police = read.csv("https://remiller1450.github.io/data/Police.csv") %>% na.omit()
Question 8: In Lab #9 (sampling variability) we worked with the words in the Gettysburg Address. According to an analysis of articles published in the New York Times, the newspaper's average word length is 4.9 characters. The premise of this question is to use hypothesis testing to evaluate whether the word length in Abraham Lincoln's speeches significantly differs from this reference point.
gettysburg = read.csv("https://remiller1450.github.io/data/gettysburg.csv")
Part A: Use the nchar() (introduced in Lab 9) and mutate() (introduced in Lab 4) functions to create a new variable in the gettysburg data called word_length.
Part B: Use the summarize() function to find each quantity that you need to perform a “by hand” hypothesis test for the scenario described above.
Part C: Perform this hypothesis test “by hand” in R.
Part D: Use an R function to perform the same hypothesis test.