This lab is intended to provide practice and insight in applying two-sample hypothesis testing methods to real data.
Directions (Please read before starting)
\(~\)
Similar to Lab #6, which covered methods for one-sample data, this lab is intended to provide you with practice applying statistical methods used on two-sample data.
As review, we’ve covered the following hypothesis tests for two-sample categorical data:
fisher.test
prop.test
The \(Z\)-test is more computationally efficient than Fisher’s exact test, and it should be used for large datasets. Otherwise, Fisher’s exact test should be preferred in most circumstances.
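As a minimal sketch of how these two functions are called, the example below applies both to a hypothetical 2x2 table. The counts and the names `tab`, `fisher_res`, `prop_res`, and `diff_hat` are made up for illustration and do not come from either dataset used in this lab.

```r
# Hypothetical 2x2 table of counts (rows = groups, columns = outcome)
tab <- matrix(c(12, 28,
                 5, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Outcome = c("Yes", "No")))

# Fisher's exact test: exact p-value, no large-sample condition
fisher_res <- fisher.test(tab)

# Two-sample test of proportions (large-sample approximation);
# the first column is treated as the count of "successes"
prop_res <- prop.test(tab)

# Observed difference in proportions (Group A minus Group B)
p_hat <- tab[, "Yes"] / rowSums(tab)
diff_hat <- unname(p_hat["A"] - p_hat["B"])

fisher_res$p.value
prop_res$p.value
diff_hat  # 12/40 - 5/40 = 0.175
```

Note that `prop.test()` reports a chi-squared statistic rather than \(Z\); for two groups this is the square of the \(Z\) statistic, and a continuity correction is applied by default (`correct = TRUE`).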
We’ve also covered the following tests for two-sample quantitative data:
t.test
wilcox.test
Generally speaking, the \(t\)-test is more powerful than the Wilcoxon rank-sum test and should be used whenever its conditions are met.
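The sketch below shows how both tests might be run on the same two samples. The data are simulated, and the names `group_a`, `group_b`, `t_res`, `w_res`, and `mean_diff` are hypothetical; nothing here comes from the lab's datasets.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated outcomes for two hypothetical groups (not lab data)
group_a <- rnorm(40, mean = 100, sd = 15)
group_b <- rnorm(40, mean = 108, sd = 15)

# Two-sample t-test (Welch's version by default, var.equal = FALSE)
t_res <- t.test(x = group_a, y = group_b)

# Wilcoxon rank-sum test on the same data
w_res <- wilcox.test(x = group_a, y = group_b)

# Observed difference in sample means (group A minus group B)
mean_diff <- mean(group_a) - mean(group_b)

t_res$p.value
w_res$p.value
mean_diff
```

Comparing the two \(p\)-values on data like these gives a rough feel for how the tests behave when the \(t\)-test's conditions are reasonable.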
\(~\)
Some infants are born with congenital heart defects that require surgery shortly after birth. In this study, researchers at Harvard Medical School randomly assigned 143 infants in need of heart surgery to either the current standard of care known as “circulatory arrest”, which had the downside of cutting off the flow of blood to the brain during the surgery, or a new alternative surgical approach known as “low-flow bypass”, which maintains circulation to the brain but uses an external pump that might lead to other types of brain injuries.
ih = read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
The researchers followed up on these infants a few years later to assess their mental and physical development via the outcomes:
Additionally, the research team recorded data on the following variables for each infant:
Question #1: Considering the design of this study, can a causal relationship between the type of surgery and an infant’s development (measured via PDI or MDI) be established? Briefly explain.
Question #2: Using graphical methods, verify that “weight” and “length” were balanced across both types of surgery. Then, briefly explain why you’d expect these variables to be balanced given the design of the study.
\(~\)
Question #3: Create side-by-side boxplots depicting the outcome variable “PDI” for each type of surgery. Then, considering the sample sizes for each surgery and the distributions seen in these boxplots, determine whether a two-sample \(t\)-test can be used to evaluate a difference in mean PDI scores across the two types of surgery.
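As a reminder of the mechanics, side-by-side boxplots and per-group sample sizes can be produced along the lines of the sketch below. It uses a simulated data frame; `sim`, `group`, and `score` are hypothetical names rather than columns of the lab data.

```r
set.seed(2)

# Simulated data frame standing in for a two-group study (not lab data)
sim <- data.frame(
  group = rep(c("A", "B"), times = c(35, 45)),
  score = c(rnorm(35, mean = 95, sd = 14), rnorm(45, mean = 100, sd = 16))
)

# Sample size in each group
n_by_group <- table(sim$group)

# Side-by-side boxplots: one box per level of `group`
bp <- boxplot(score ~ group, data = sim,
              xlab = "Group", ylab = "Score",
              main = "Simulated scores by group")

n_by_group
```

The formula `score ~ group` reads as "outcome by grouping variable", and the same pattern applies to any quantitative outcome and categorical grouping column.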
Question #4: Regardless of your answer to Question #3, perform a two-sample \(t\)-test comparing the mean PDI scores across each type of surgery. You should report the observed difference in means, the \(p\)-value of the test, and a conclusion in the context of the application.
Question #5: Create side-by-side boxplots depicting the outcome variable “MDI” for each type of surgery. Then, considering the sample sizes for each surgery and the distributions seen in these boxplots, determine whether a two-sample \(t\)-test can be used to evaluate a difference in mean MDI scores across the two types of surgery.
Question #6: Regardless of your answer to Question #5, perform a two-sample \(t\)-test comparing the mean MDI scores across each type of surgery. You should report the observed difference in means, the \(p\)-value of the test, and a conclusion in the context of the application.
Question #7: The code and output below use Wilcoxon rank-sum tests to statistically evaluate differences in median PDI and MDI scores across the two types of surgery. How do the \(p\)-values and the conclusions drawn from these tests compare with those you found in Questions #4 and #6? Briefly explain why the two approaches produce results that are similar and/or different.
## Testing median PDI
wilcox.test(x = ih$PDI[ih$Treatment == "Circulatory arrest"], y = ih$PDI[ih$Treatment == "Low-flow bypass"])
##
## Wilcoxon rank sum test with continuity correction
##
## data: ih$PDI[ih$Treatment == "Circulatory arrest"] and ih$PDI[ih$Treatment == "Low-flow bypass"]
## W = 1975.5, p-value = 0.01891
## alternative hypothesis: true location shift is not equal to 0
## Testing median MDI
wilcox.test(x = ih$MDI[ih$Treatment == "Circulatory arrest"], y = ih$MDI[ih$Treatment == "Low-flow bypass"])
##
## Wilcoxon rank sum test with continuity correction
##
## data: ih$MDI[ih$Treatment == "Circulatory arrest"] and ih$MDI[ih$Treatment == "Low-flow bypass"]
## W = 2140.5, p-value = 0.09424
## alternative hypothesis: true location shift is not equal to 0
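The subsetting calls above can also be written with R's formula interface. The sketch below illustrates the equivalence on a small simulated data frame; `sim`, `arm`, and `score` are hypothetical names, and the values are not the lab data.

```r
set.seed(3)

# Simulated two-arm data (not lab data)
sim <- data.frame(
  arm = rep(c("Circulatory arrest", "Low-flow bypass"), each = 30),
  score = c(rnorm(30, mean = 92, sd = 15), rnorm(30, mean = 98, sd = 15))
)

# Subsetting form, as in the lab's code
w_subset <- wilcox.test(x = sim$score[sim$arm == "Circulatory arrest"],
                        y = sim$score[sim$arm == "Low-flow bypass"])

# Formula form: outcome ~ grouping variable
# (the alphabetically first group plays the role of x)
w_formula <- wilcox.test(score ~ arm, data = sim)

# Both forms give the same statistic and p-value
c(subset = w_subset$p.value, formula = w_formula$p.value)
```

The formula form is shorter and avoids repeating the subsetting condition, but it requires the grouping variable to have exactly two levels.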
\(~\)
Early in the semester we looked at data from a widely cited study which considered trial verdicts for the offenders in all murders that took place during felonies committed in the state of Florida between 1972 and 1977.
dp = read.csv("https://remiller1450.github.io/data/DeathPenaltySentencing.csv")
This dataset contains the following variables:
The researchers who assembled the dataset were interested in whether or not juries exhibited racially biased sentencing in death penalty verdicts.
Question #8: Considering the design of this study, can a causal relationship between the race of the offender and the death penalty verdict be established? Briefly explain.
Question #9: Using graphical methods, verify that “VictimRace” is imbalanced across the two categories of offender’s race. Then, briefly explain why this result is not unexpected given the design of the study.
\(~\)
Question #10: Using an appropriate statistical test, evaluate the null hypothesis \(H_0: p_1 - p_2 = 0\), where \(p_1\) is the proportion of black offenders that receive the death penalty, and \(p_2\) is the proportion of white offenders that receive the death penalty. Report the observed difference in proportions, a \(p\)-value, and a brief conclusion.
Question #11: Suppose an individual looks at the results of the hypothesis test you performed in Question #10 and claims that the results of the test provide proof that death penalty sentences were not racially biased in Florida during the 1970s. Briefly explain two common hypothesis testing mistakes that this individual has committed.
\(~\)
A proper statistical analysis of these data will control for the race of the victim. One way of doing this is a stratified analysis, which performs separate statistical tests on the sub-groups that are created when the data is split according to a confounding variable. In the context of this study, a stratified analysis would split the data into cases involving white victims and cases involving black victims, then evaluate the death penalty rates for white and black offenders within each of those strata. The stratified table below summarizes these data:
| | Death | Not |
|---|---|---|
| Black Offenders | 37 | 41 |
| White Offenders | 46 | 144 |

| | Death | Not |
|---|---|---|
| Black Offenders | 1 | 101 |
| White Offenders | 0 | 8 |
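To illustrate the mechanics of a stratified analysis, the sketch below stores hypothetical counts in a 2 x 2 x 2 array and runs a separate Fisher's exact test within each stratum. The counts, the labels `G1`, `G2`, `S1`, `S2`, and the object names are all made up for illustration; they are not the death penalty data. The expected-count calculation is included as one commonly used way to judge whether a large-sample test would be reasonable, which is an assumption about how you might check that condition rather than a rule stated in this lab.

```r
# Hypothetical 2 x 2 x 2 array of counts: group x outcome x stratum
# (values fill column-wise within each stratum)
counts <- array(c(12, 18, 13, 32,   # stratum S1
                   3,  1, 40, 12),  # stratum S2
                dim = c(2, 2, 2),
                dimnames = list(Group = c("G1", "G2"),
                                Outcome = c("Yes", "No"),
                                Stratum = c("S1", "S2")))

# Run Fisher's exact test separately within each stratum
p_values <- apply(counts, 3, function(tab) fisher.test(tab)$p.value)

# Smallest expected count in each stratum under independence:
# (row total * column total) / grand total
min_expected <- apply(counts, 3, function(tab) {
  min(outer(rowSums(tab), colSums(tab)) / sum(tab))
})

p_values
min_expected
```

In this made-up example the second stratum has a smallest expected count below 1, so an exact test is the safer choice there, while the first stratum has enough data for either approach.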
Question #12: Perform separate hypothesis tests for each of the two strata defined above (cases involving white victims and cases involving black victims). Report the \(p\)-value and a brief conclusion for each hypothesis test.
Question #13: Could a two-sample \(Z\)-test have been used to evaluate a possible difference in death penalty sentencing rates for white and black offenders in one, both, or neither of the two strata defined above? Briefly explain.