Hypothesis Tests in R
Directions (read before starting)
\(~\)
This lab focuses on the implementation of several common hypothesis tests in R. You should recognize that many of these functions were first introduced in Lab 11 (confidence intervals in R).
You are not expected to memorize any of these functions for Exam 2. However, you should be prepared to use them on your project, and you should be prepared to interpret their output on the exam.
\(~\)
In this part of the lab we’ll use the “infant heart” data set that has been used in several previous labs. As a refresher, these data come from a Harvard Medical School study where infants born with congenital heart defects were randomly assigned to receive one of two different surgeries, low-flow bypass or circulatory arrest, then their development, measured by their MDI (mental development index) and PDI (psychomotor development index) scores, was recorded 2 years after the surgery.
inf_heart = read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
In R we can perform hypothesis tests involving a single proportion or a difference in proportions using the prop.test() function. The example below demonstrates a hypothesis test on a single proportion:
## Find the numerator and denominator of the sample proportion
n_male = sum(inf_heart$Sex == "Male")
n_total = nrow(inf_heart)
## Use prop.test and print the p-value for H0: p = 0.5
results = prop.test(x = n_male, n = n_total, p = 0.5)
results$p.value
## [1] 6.310591e-06
Notice that the null hypothesis in this test was \(H_0: p = 0.5\), which we set using the argument p = 0.5. This hypothesis implies that male and female infants were equally likely to be selected into this study (ie: that congenital heart defects are equally common among each sex). We have overwhelming statistical evidence against this null hypothesis.
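For reference, prop.test() reports a chi-squared statistic (X-squared) rather than a z statistic. When the continuity correction is turned off, that statistic is exactly the square of the one-proportion z statistic you would compute by hand. The sketch below illustrates this using the counts from the example above (99 males out of 143 infants); note that correct = FALSE is needed for an exact match, since prop.test() applies a continuity correction by default:

```r
## Counts taken from the example above: 99 males out of 143 infants
n_male  <- 99
n_total <- 143

## By-hand z statistic for H0: p = 0.5
p_hat <- n_male / n_total
z <- (p_hat - 0.5) / sqrt(0.5 * (1 - 0.5) / n_total)

## With correct = FALSE, prop.test()'s X-squared statistic is z^2
results <- prop.test(x = n_male, n = n_total, p = 0.5, correct = FALSE)
c(z_squared = z^2, X_squared = unname(results$statistic))
```

This is why the default output of prop.test() will not exactly match a by-hand z-test unless you disable the correction.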
In previous analyses of these data some groups have debated whether there is an imbalance in the proportion of male infants across each type of surgery. We can assess this using a difference in proportions test:
## Find the numerator and denominator of both sample proportions
n_male_lf = sum(inf_heart$Sex == "Male" & inf_heart$Treatment == "Low-flow bypass")
n_total_lf = sum(inf_heart$Treatment == "Low-flow bypass")
n_male_ca = sum(inf_heart$Sex == "Male" & inf_heart$Treatment == "Circulatory arrest")
n_total_ca = sum(inf_heart$Treatment == "Circulatory arrest")
## Use prop.test, printing the sample difference in proportions and the corresponding p-value for H0: p1 - p2 = 0
results = prop.test(x = c(n_male_lf, n_male_ca), n = c(n_total_lf, n_total_ca), correct = FALSE)
results$estimate
## prop 1 prop 2
## 0.6714286 0.7123288
results$p.value
## [1] 0.5962946
Here we can conclude that there is no statistical evidence of an imbalance in the distribution of sexes assigned to each type of surgery. This should align with our expectations as each infant in the study was randomly assigned to the type of surgery they received.
Question 1: Using a Normal approximation and the test statistic approach, replicate “by hand” the hypothesis test assessing whether there was an imbalance in the distribution of sexes assigned to each type of surgery. Your answer should clearly state the null and alternative hypotheses, show the calculations of your test statistic, and provide a two-sided p-value that you found using StatKey which is close to the one found in the example given above.
Question 2: Various studies have consistently found that male infants are more likely to be born with congenital heart defects than female infants. Many of these studies suggest a 2:1 sex imbalance, implying that roughly two-thirds of this study's sample should be male if it were to accurately reflect the target population. Using R, perform a hypothesis test to evaluate whether the sample data significantly deviate from this expectation. Be sure to clearly state your hypotheses and provide both a \(p\)-value and a brief conclusion involving the context surrounding the test.
\(~\)
The procedure used by prop.test() relies upon a Normal approximation that is only reasonable when at least 10 observations in your sample(s) experienced each outcome involved in the proportion (ie: \(np \geq 10\) and \(n(1-p) \geq 10\) for a single proportion).
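As a quick sketch, you can check this condition in R before trusting prop.test(). The counts below are taken from the single-proportion example above (a sample of 143 infants, testing \(H_0: p = 0.5\)):

```r
## Check n*p >= 10 and n*(1-p) >= 10 under H0: p = 0.5
n  <- 143   # sample size from the infant heart data
p0 <- 0.5   # hypothesized proportion

c(n * p0, n * (1 - p0))            # both quantities should be at least 10
n * p0 >= 10 & n * (1 - p0) >= 10  # TRUE when the approximation is reasonable
```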
For a single proportion, an exact hypothesis test that doesn't rely upon this approximation is implemented in the binom.test() function. The arguments of this function are largely the same as prop.test():
## Find the numerator and denominator of the sample proportion
n_male = sum(inf_heart$Sex == "Male")
n_total = nrow(inf_heart)
## Now use binom.test
results = binom.test(x = n_male, n = n_total, p = 0.5)
results$p.value
## [1] 4.887162e-06
Notice this \(p\)-value is very close to the one produced by prop.test() and it still suggests overwhelming evidence against the null hypothesis. The similarity shouldn't be a surprise when considering that these data contain 99 males and 44 females (both values are well above 10).
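To see what binom.test() is doing behind the scenes, the sketch below rebuilds its exact two-sided \(p\)-value using dbinom(): it sums the probabilities of every possible count whose probability is no larger than that of the observed count (R's implementation uses a small relative tolerance when making this comparison, which the code mimics):

```r
## Exact two-sided p-value "by hand" for 99 males out of 143, H0: p = 0.5
probs <- dbinom(0:143, size = 143, prob = 0.5)   # probability of every possible count
obs_prob <- dbinom(99, size = 143, prob = 0.5)   # probability of the observed count

## Sum the probabilities of all outcomes at least as "extreme" as what was observed
## (the 1 + 1e-7 factor mirrors the tolerance used internally by binom.test)
p_exact <- sum(probs[probs <= obs_prob * (1 + 1e-7)])
p_exact
```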
Question 3: Early in the semester we worked with data from an impaired driving experiment where subjects smoked cannabis then engaged in 3 simulated drives at various time points after smoking. Before each drive they were asked whether they felt ready to drive on real roadways.
rtd = read.csv("https://remiller1450.github.io/data/Ready.csv")
Part A: Use binom.test() or prop.test() to evaluate whether a majority of cannabis users feel ready to drive 180 minutes after smoking.
Part B: Repeat the test using the other of the two functions (binom.test() or prop.test()). How does this \(p\)-value compare to the one you found in Part A?
Part C: Did it matter whether you used binom.test() or prop.test() to calculate the \(p\)-value? If so, which conclusion is more justifiable?
\(~\)
As you should recall, a difference in proportions is sometimes called a risk difference, and it is only one of several ways to summarize the association between two categorical variables. In most circumstances it is more appropriate to describe an association using an odds ratio. The fisher.test() function is used to calculate an odds ratio and perform a test of the hypothesis \(H_0: OR = 1\); you should recognize that an odds ratio of 1 indicates the odds of the value/outcome of interest are the same in both groups being compared.
## Example using relevel() to compare the odds of "Yes"
result = fisher.test(x = factor(inf_heart$Sex, levels = c("Male","Female")),
y = factor(inf_heart$Treatment, levels = c("Circulatory arrest", "Low-flow bypass")))
result$estimate # Odds ratio of receiving circulatory arrest for males vs. females
## odds ratio
## 1.210118
result$p.value # p-value for H0: OR = 1
## [1] 0.7172831
You should be mindful of the way R will order the categorical values involved in the odds ratio calculation; the factor() function can be used to provide the ordering you desire.
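One way to stay oriented is to cross-check fisher.test() against the underlying 2x2 table. The counts below are reconstructed from the proportions printed earlier (47 of 70 males in the low-flow group and 52 of 73 in the circulatory arrest group; these group sizes are inferred from the printed output). Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, so it will be close to, but not exactly equal to, the sample odds ratio \(ad/bc\):

```r
## 2x2 counts reconstructed from the proportions printed earlier
tab <- matrix(c(47, 23, 52, 21), nrow = 2,
              dimnames = list(Sex = c("Male", "Female"),
                              Treatment = c("Low-flow bypass", "Circulatory arrest")))

## Sample odds ratio (ad/bc): odds of "Male" in low-flow vs. circulatory arrest
sample_or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

## fisher.test() reports the conditional MLE, which is close but not identical
ft <- fisher.test(tab)
c(sample_or = sample_or, fisher_est = unname(ft$estimate), p_value = ft$p.value)
```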
Question 4: Modify the example above to analyze the odds ratio that compares the odds of an infant in the low-flow group being male to the odds of an infant in the circulatory arrest group being male. Notice that the estimate is different, but the \(p\)-value is the same. This is because the same variables and data are being used in a test of association, but the association is being described differently.
\(~\)
The t.test() function is used to perform both one-sample and two-sample t-tests:
## One sample t-test of H0: mu = 100
result = t.test(x = inf_heart$PDI, mu = 100)
result$p.value
## [1] 0.0001294405
## Two sample t-test of H0: mu_low_flow = mu_circ_arrest
result = t.test(PDI ~ Treatment, data = inf_heart)
result$p.value
## [1] 0.02640162
We can also use t.test() to obtain a few other pieces of information involved in the hypothesis test. This can be useful for checking your work when performing a test “by hand”.
## Two-sample test
result = t.test(PDI ~ Treatment, data = inf_heart)
result$stderr # Standard error
## [1] 2.60861
result$statistic # T-value (test statistic)
## t
## -2.243961
result$parameter # Degrees of freedom - notice it's not n-1 for a two-sample t-test
## df
## 140.2472
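If you want to see where these numbers come from, the sketch below recomputes the Welch (unequal variances) standard error, test statistic, and degrees of freedom from the group summaries; each value should match the t.test() output above:

```r
## Recompute the Welch two-sample quantities "by hand"
inf_heart <- read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
grp <- split(inf_heart$PDI, inf_heart$Treatment)  # groups in alphabetical order

m <- sapply(grp, mean)     # group means
v <- sapply(grp, var)      # group variances
n <- sapply(grp, length)   # group sizes

se <- sqrt(sum(v / n))                   # Welch standard error
t_stat <- unname((m[1] - m[2]) / se)     # test statistic (circulatory arrest minus low-flow)
df <- se^4 / sum((v / n)^2 / (n - 1))    # Welch-Satterthwaite degrees of freedom
c(se = se, t = t_stat, df = df)
```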
Question 5: In the United States a newborn is considered to be “low birth weight” if they weigh less than 2500 grams at birth.
Use the filter() function to create the necessary subsets of data.
\(~\)
Calculating \(p\)-values in R
In the previous section we saw an example where the test statistic was \(t = -2.243961\) and the degrees of freedom were \(df = 140.2472\). While you could use StatKey to translate this information into a \(p\)-value, it's worthwhile knowing how you could calculate the \(p\)-value yourself in R using the pt() function:
pt(q = -2.243961, df = 140.2472)
## [1] 0.01320082
The code above calculates the area to the left of q = -2.243961 in a t-distribution with 140.2472 degrees of freedom. You should notice that this is exactly half of the two-sided \(p\)-value of 0.02640162 given by the t.test() function in the corresponding example in the previous section.
Because pt() always returns the area to the left of the value q, you will need to be careful when using it with a positive test statistic (since you want the area to the right of the test statistic in these scenarios). In these situations you can take 1 minus the output of pt(), or you can input your test statistic as a negative value (even if it were calculated to be positive). Because we'll generally use two-sided \(p\)-values, we will multiply this area by two anyway in order to reflect possible differences from \(H_0\) that could occur in either direction.
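Putting this together, a common sketch for turning a t statistic into a two-sided \(p\)-value is to take the area beyond the absolute value of the statistic in one tail and double it:

```r
t_stat <- -2.243961   # test statistic from the example above
df <- 140.2472        # degrees of freedom from the same example

## Two-sided p-value: double the area beyond |t| in one tail
2 * pt(-abs(t_stat), df = df)

## Equivalent right-tail form (useful when the statistic is positive)
2 * (1 - pt(abs(t_stat), df = df))
```

Using -abs() means the same line of code works whether the test statistic came out positive or negative.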
Question 6: The pnorm() function can be used to calculate areas from the Standard Normal distribution in a similar manner to pt() (noting that you don't need to provide df since the Normal curve doesn't depend upon degrees of freedom).
Part A: Use the pnorm() function to find the two-sided \(p\)-value corresponding to a test statistic of -2.243961.
Part B: How does this \(p\)-value compare to the one found using pt()?
\(~\)
In this part of the lab you will need to decide for yourself which hypothesis testing approach should be used.
Question 7: Early in the semester you worked with the “police deaths” data set. These data were aggregated by the Washington Post and made publicly available here. The data set documents all fatal shootings by a police officer that took place between 2015 and mid-2020.
## Read the police data; na.omit() will remove any cases with missing values
library(dplyr)
police = read.csv("https://remiller1450.github.io/data/Police.csv") %>% na.omit()
Question 8: In Lab #9 (sampling variability) we worked with the words in the Gettysburg Address. According to an analysis of articles published in the New York Times, the newspaper's average word length is 4.9 characters. The premise of this question is to use hypothesis testing to evaluate whether the word length in Abraham Lincoln's speeches significantly differs from this reference point.
gettysburg = read.csv("https://remiller1450.github.io/data/gettysburg.csv")
Part A: Use the nchar() (introduced in Lab 9) and mutate() (introduced in Lab 4) functions to create a new variable in the gettysburg data called word_length.
Part B: Use the summarize() function to find each quantity that you need to perform a “by hand” hypothesis test for the scenario described above.
Part C: Perform this hypothesis test “by hand” in R.
Part D: Use an R function to perform the same hypothesis test.