Directions:
Introduction
The purpose of this lab is to practice applying concepts and procedures related to hypothesis testing, specifically \(Z\)-tests and \(T\)-tests performed on a single sample or used to compare data from two different samples/groups.
A hypothesis test uses sample data to assess the evidence against (that is, to try to falsify) a specified null hypothesis.
We’ve now learned how to do this using the \(Z\)-test (categorical outcomes) and the \(T\)-test (quantitative outcomes).
Below is a review of the standard error (\(SE\)) formulas that follow from the Central Limit Theorem (CLT):
| Summary measure | population parameter | sample estimate | \(SE\) |
|---|---|---|---|
| single proportion | \(p\) | \(\hat{p}\) | \(\sqrt{\tfrac{p(1-p)}{n}}\) |
| difference of proportions | \(p_1-p_2\) | \(\hat{p}_1-\hat{p}_2\) | \(\sqrt{\tfrac{p_1(1-p_1)}{n_1} + \tfrac{p_2(1-p_2)}{n_2}}\) |
| single mean | \(\mu\) | \(\bar{x}\) | \(\tfrac{\sigma}{\sqrt{n}}\) |
| difference of means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\sqrt{\tfrac{\sigma_1^2}{n_1} + \tfrac{\sigma_2^2}{n_2}}\) |
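The four \(SE\) formulas in the table translate directly into code. The Python sketch below evaluates each one; the function names are our own, chosen only for illustration, and the inputs are made up. In practice the unknown parameters \(p\) and \(\sigma\) are replaced by a null value or by a sample estimate such as \(\hat{p}\) or the sample standard deviation \(s\).

```python
import math


def se_single_proportion(p: float, n: int) -> float:
    """SE of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """SE of a difference of two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_single_mean(sigma: float, n: int) -> float:
    """SE of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_diff_means(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """SE of a difference of two independent sample means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Illustrative (made-up) inputs
print(se_single_proportion(0.5, 100))   # about 0.05
print(se_single_mean(10, 25))           # 2.0
print(se_diff_means(3, 9, 4, 16))       # sqrt(2), about 1.414
```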
The \(Z\)-test and \(T\)-test both rely on test statistics of the form:
\[\text{test statistic} = \frac{\text{observed} - \text{null}}{SE}\]
The test statistic is compared against an appropriate probability model (the standard normal distribution for a \(Z\)-test, a \(t\)-distribution for a \(T\)-test) to find the \(p\)-value and reach a conclusion.
Some infants are born with congenital heart defects that require surgery shortly after birth. The standard surgical approach, known as “circulatory arrest”, has the downside of cutting off the flow of blood to the brain during surgery, potentially leading to brain damage. An alternative approach, “low-flow bypass”, maintains circulation to the brain, but does so with an external pump that might lead to other types of brain injury. The goal of this study is to determine which surgical approach yields better developmental outcomes for infants born with congenital heart defects.
The Infant Heart Surgery dataset contains data from a randomized experiment conducted by surgeons at Harvard Medical School. The data document the outcomes of 70 infants who received low-flow bypass surgery, and 73 infants who received surgery under a circulatory arrest approach. The study considered two primary outcomes:
Additionally, the research team recorded the following variables for each infant:
CLICK HERE to download the dataset.
Question #1: What are the explanatory and response variables in this study? With this in mind, what graphs or tables might you use to convey the relationships or distributions of these variables? The purpose of this question is to help you practice for your final project.
Question #2: Determine whether the conditions are met to use a \(t\)-test to evaluate whether the mean PDI scores differ in the two surgical groups. (Hint: you can find these conditions on Slide #18 of our two-sample hypothesis testing slides.)
Question #3: Perform a two-sample \(t\)-test to determine whether the new low-flow surgery leads to significantly better physiological development. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.
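As a sketch of the mechanics only (not a worked answer), the Python below computes a two-sample \(T\) statistic from summary statistics (group means, standard deviations, and sizes), using the difference-of-means \(SE\) with sample standard deviations in place of \(\sigma_1\) and \(\sigma_2\). It assumes Welch-Satterthwaite degrees of freedom; some courses instead use the more conservative \(\min(n_1-1, n_2-1)\), so check the slides for the convention used in this class. Because Python's standard library has no \(t\)-distribution, the tail area is found by numerically integrating the \(t\) density with Simpson's rule; with SciPy installed, `scipy.stats.t.sf` does the same job. All function names and summary numbers are hypothetical and are not taken from the Infant Heart Surgery data. The same calculation applies to the MDI comparison in Question #5 and the budget comparison in Question #9.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T >= t), via Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, null_diff=0.0):
    """Return (T statistic, Welch-Satterthwaite df) for H0: mu1 - mu2 = null_diff."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    t = ((mean1 - mean2) - null_diff) / se
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical summaries; H_a: mu1 > mu2 (one-sided, upper tail)
t, df = two_sample_t(mean1=102, sd1=15, n1=50, mean2=96, sd2=15, n2=50)
print(round(t, 3), round(df, 1))       # 2.0 98.0
print(round(t_upper_tail(t, df), 4))   # one-sided p-value, about 0.024
```

For a lower-tailed alternative the \(p\)-value is `1 - t_upper_tail(t, df)`, and for a two-sided alternative it is `2 * t_upper_tail(abs(t), df)`.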
Question #4: Determine whether the conditions are met to use a \(t\)-test to evaluate whether the mean MDI scores differ in the two surgical groups.
Question #5: Perform a two-sample \(t\)-test to determine whether the new low-flow surgery leads to significantly better mental development. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.
Question #6: Suppose a critic of this study is concerned that there might be a sex imbalance across the two surgical groups. To address this concern, perform an appropriate two-sample hypothesis test comparing the proportion of male infants in each group. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.
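A sketch of the two-proportion \(Z\)-test mechanics in Python, using made-up counts rather than the study's data. Because \(H_0\) states \(p_1 = p_2\), this version plugs the pooled proportion \(\hat{p} = (x_1 + x_2)/(n_1 + n_2)\) into both terms of the difference-of-proportions \(SE\). Pooling is an assumption about the course's convention; setting the hypothetical `pooled=False` flag plugs in \(\hat{p}_1\) and \(\hat{p}_2\) separately instead. The `alternative` argument also covers one-sided questions such as Question #10.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided", pooled=True):
    """Z-test of H0: p1 = p2 from success counts x1, x2 out of n1, n2 trials."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if pooled:
        # Under H0 both groups share one proportion, estimated from all the data
        p_pool = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    else:
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = (p1_hat - p2_hat) / se  # null difference is 0
    cdf = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "less":
        p_value = cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical counts: 30 of 60 in group 1 vs. 36 of 60 in group 2
z, p = two_proportion_z_test(30, 60, 36, 60)
print(round(z, 3), round(p, 3))   # about -1.101 and 0.271
```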
Question #7: Considering the design of this study, briefly explain why the \(p\)-value you found in Question #6 was not statistically significant.
The Hollywood Movies Dataset contains information on 970 movies released by various Hollywood production studios between 2007 and 2013. It contains the following variables:
CLICK HERE to download the dataset.
Question #8: While animated movies tend to be very memorable, they make up a relatively small fraction of major films. For this question, use these data to statistically test whether fewer than 10% of Hollywood movies belong to the “Animation” genre. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.
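The one-sample version follows the same pattern, with the \(SE\) computed from the null value \(p_0\) (here 0.10) rather than from \(\hat{p}\), a common convention for proportion tests. In this Python sketch the function name is hypothetical and the count of animated films is a made-up placeholder, not a value taken from the Hollywood Movies data; the lower-tailed alternative \(H_a: p < 0.10\) matches the question's wording.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float):
    """Lower-tailed Z-test of H0: p = p0 versus H_a: p < p0."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)   # SE evaluated at the null value p0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)       # lower tail, since H_a is p < p0
    return p_hat, z, p_value


# Hypothetical: 40 animated films among 500 sampled movies
p_hat, z, p = one_proportion_z_test(x=40, n=500, p0=0.10)
print(p_hat, round(z, 3), round(p, 3))   # about 0.08, -1.491, 0.068
```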
Question #9: Perform a hypothesis test to determine whether there is statistical evidence to conclude that Paramount Studios’ movies have higher budgets than Universal Studios’ movies. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.
Question #10: Perform a hypothesis test to determine whether there is statistical evidence to conclude that a lower proportion of movies produced by Paramount are in the “Action” genre relative to the movies produced by Universal Studios. Be sure to clearly state your hypotheses, show how your test statistic is calculated, and provide a \(p\)-value with an appropriate conclusion.