MATH-146 Lab #6 - Two-sample Hypothesis Testing

Goals:

The purpose of this lab is to provide practice using two-sample Z and T tests within the framework of the scientific method.

Directions:

You are expected to progress through the analyses described in this document as a group, recording your answers in a shared document. It’s completely up to your group how you’d like to organize this: some groups like using a shared Google Doc, while others might designate one person to be the group’s recorder.
You are expected to work together. Any attempts to “divide and conquer” the lab questions may result in point deductions on your group’s lab score.
Labs are graded primarily for completion, and we will get together as a group for the last 10-15 minutes of class to discuss some of the lab questions. This means you should focus on learning the material (while also helping the teammates in your group) rather than seeing labs as an assessment (like homework or exams).
Please upload your responses to the lab’s questions on Canvas. The expectation is that everyone uploads their own copy (they can be identical within your group).
Use the snipping tool on Windows or take a Mac screenshot to add screenshots to your lab write-up as requested.


Study #1 - Death Penalty Sentencing (revisited)

Early in the semester we looked at data from a widely cited study which considered all murders that took place during felonies committed in the state of Florida between 1972 and 1977.

The Death Penalty Sentencing Data contains the following variables:

OffenderRace - whether the person tried for the crime was white or black
VictimRace - whether the murder victim was white or black
DeathPenalty - whether or not the person tried received the death penalty

The researchers who assembled the dataset were interested in whether or not juries exhibited racially biased sentencing in death penalty verdicts.

Question #1: Consider the null hypothesis \(H_0: p_1 - p_2 = 0\), where \(p_1\) is the proportion of black offenders that receive the death penalty, and \(p_2\) is the proportion of white offenders that receive the death penalty. What is the sample statistic related to this hypothesis?

Question #2: Considering the null hypothesis stated in Question #1, what is the pooled proportion that should be used in the standard error calculation when performing a \(Z\)-test to evaluate this hypothesis?

Question #3: Using the information from your answers to Questions #1 and #2, conduct a two-sample \(Z\)-test evaluating whether or not these data provide sufficient statistical evidence that the death penalty was applied at unequal rates to white and black offenders. Show the calculation of your \(Z\)-value, provide a two-sided \(p\)-value, and make a conclusion.
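
As a reference for the mechanics of Questions #1 through #3, here is a minimal Python sketch of a pooled two-sample \(Z\)-test. The function name `two_proportion_z_test` and the counts passed to it are hypothetical and chosen only for illustration; they are not the lab data. The two-sided \(p\)-value is computed from the standard normal distribution using `math.erfc`.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample Z-test of H0: p1 - p2 = 0.

    x1, x2 are the numbers of "successes" and n1, n2 the group sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, so pool the successes.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: P(|Z| > |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1_hat - p2_hat, pooled, z, p_value

# Hypothetical counts for illustration only (NOT the lab data).
diff, pooled, z, p_value = two_proportion_z_test(x1=30, n1=100, x2=20, n2=100)
print(f"difference = {diff:.3f}, pooled = {pooled:.3f}, Z = {z:.3f}, p = {p_value:.4f}")
```

Your write-up should still show the hand calculation; a sketch like this is best used to check your arithmetic.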

Question #4: Suppose an individual looks at the results of the hypothesis test you performed in Question #3 and uses it as proof that death penalty sentences were not racially biased in Florida during the 1970s. Briefly explain two common hypothesis testing mistakes that this individual has committed.


Stratification

A proper statistical analysis of these data will control for the race of the victim. One way of doing this is a stratified analysis, which performs separate statistical tests on the sub-groups that are created when the data is split according to a confounding variable. In the context of this study, a stratified analysis would split the data into cases involving white victims and cases involving black victims, then evaluate the death penalty rates for white and black offenders within each of those strata. The stratified table below summarizes these data:

Cases involving a White Victim
Offender Race	Death Penalty	No Death Penalty
Black Offenders	37	41
White Offenders	46	144

Cases involving a Black Victim
Offender Race	Death Penalty	No Death Penalty
Black Offenders	1	101
White Offenders	0	8
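
To make the idea of stratification concrete, the following Python sketch stores the two tables above and computes the death penalty rate for each offender group within each stratum. The counts come directly from the tables; the variable and function names (`strata`, `death_rate`, `rates`) are illustrative choices and not part of the dataset.

```python
# Counts from the stratified tables: (death penalty, no death penalty).
strata = {
    "White victim": {"Black offenders": (37, 41), "White offenders": (46, 144)},
    "Black victim": {"Black offenders": (1, 101), "White offenders": (0, 8)},
}

def death_rate(death, no_death):
    """Proportion of cases in a group that ended in a death sentence."""
    return death / (death + no_death)

# Compute the rate separately inside each stratum (victim race).
rates = {
    victim: {offender: death_rate(*counts) for offender, counts in groups.items()}
    for victim, groups in strata.items()
}

for victim, groups in rates.items():
    for offender, rate in groups.items():
        n = sum(strata[victim][offender])
        print(f"{victim:12s} | {offender:15s} | n = {n:3d} | rate = {rate:.3f}")
```

Comparing rates only within a stratum is what "controlling for" the victim's race means here: offenders are compared against others whose cases involved a victim of the same race.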

Question #5: Using the information above, perform a two-sample \(Z\)-test for cases involving a white victim. Show your calculation of the pooled proportion and your \(Z\)-value. Report your \(p\)-value along with a conclusion regarding whether you believe these data provide statistically compelling evidence of racially biased death penalty sentencing.

Question #6: Briefly explain why it wouldn’t be prudent to perform a two-sample \(Z\)-test for the cases involving a black victim. That is, what might make a statistician reluctant to trust a \(Z\)-test if it were used in this situation?


Study #2 - Infant Heart Surgery (revisited)

We’ve previously looked at data from a study involving infants born with congenital heart defects that require surgery shortly after birth.

Infant Heart Surgery dataset

In this study, researchers at Harvard Medical School randomly assigned 143 infants in need of heart surgery to either the current standard of care, known as “circulatory arrest”, which had the downside of cutting off the flow of blood to the brain during the surgery, or a new alternative surgical approach known as “low-flow bypass”, which maintains circulation to the brain but uses an external pump that might lead to other types of brain injuries. The researchers followed up on these infants a few years later to assess their mental and physical development via the following outcomes:

Psychomotor Development Index (PDI) - a composite score measuring physiological development, with higher scores indicating greater development
Mental Development Index (MDI) - a composite score measuring mental development, with higher scores indicating greater development

Additionally, the research team recorded the following variables for each infant:

Treatment - the type of surgery the infant received
Weight - the infant’s weight (in grams)
Length - the infant’s length (in cm)
Age - the infant’s age (in hours)
Sex - the infant’s sex (male or female)

Question #7: Perform a \(Z\)-test or \(T\)-test to evaluate whether the average PDI score differs across the two different types of surgery. Your answer should clearly state your null and alternative hypotheses, show how you calculated your \(Z\)-value or \(T\)-value (including the \(SE\) calculation), provide a two-sided \(p\)-value, and provide a brief conclusion.
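
If you want to check your hand calculations for a two-sample \(T\)-test, the Python sketch below computes the \(SE\), the \(T\)-value, and a two-sided \(p\)-value from group summary statistics. It assumes the unpooled (Welch) standard error and the Welch-Satterthwaite degrees of freedom, which may differ from the degrees of freedom convention used in our course notes. The function names and the summary statistics passed in are hypothetical and are not taken from the Infant Heart Surgery dataset. The \(p\)-value is approximated by numerically integrating the \(t\) density with Simpson's rule, so only the standard library is needed.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (SE, T, df) for H0: mu1 - mu2 = 0 using the unpooled SE."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, t, df

def t_two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value by integrating the t density on [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

# Hypothetical summary statistics for illustration only (NOT the lab data).
se, t, df = welch_t(mean1=100, sd1=15, n1=40, mean2=93, sd2=16, n2=45)
p_value = t_two_sided_p(t, df)
print(f"SE = {se:.3f}, T = {t:.3f}, df = {df:.1f}, p = {p_value:.4f}")
```

With sample sizes as large as those in this study, a \(Z\)-test and a \(T\)-test will give very similar \(p\)-values, so either choice is defensible as long as you state which one you used.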

Question #8: Is it possible that the hypothesis test you performed in Question #7 resulted in an error? If so, would it have been a Type I or a Type II error? Briefly explain.

Question #9: Perform a \(Z\)-test or \(T\)-test to evaluate whether the average MDI score differs across the two different types of surgery. Your answer should clearly state your null and alternative hypotheses, show how you calculated your \(Z\)-value or \(T\)-value (including the \(SE\) calculation), provide a two-sided \(p\)-value, and provide a brief conclusion.

Question #10: Is it possible that the hypothesis test you performed in Question #9 resulted in an error? If so, would it have been a Type I or a Type II error? Briefly explain.

Question #11: Without actually performing any additional hypothesis tests, explain why the design of this study makes it unlikely for there to be a statistically significant difference in the average weight, average length, average age, or proportion male across the two different surgical groups.

Question #12: As we’ve previously discussed during class, a paired study design offers a number of statistical advantages over a two-sample design (see the last example in the “One Sample Testing Procedures” notes for a refresher). So, why did the researchers in this study use a two-sample design rather than a paired design? Briefly explain (in 1-2 sentences).