Directions:
\(~\)
Introduction
The purpose of this lab is to practice applying concepts and procedures related to hypothesis testing, specifically \(Z\) and \(T\) tests performed on a single sample.
\(~\)
A hypothesis test seeks to falsify a certain null hypothesis using sample data.
We’ve now learned how to do this using the \(Z\)-test (categorical outcomes) and the \(T\)-test (quantitative outcomes).
Below is a review of the different \(SE\) formulas from the CLT:
Summary measure | population parameter | sample estimate | \(SE\) |
---|---|---|---|
single proportion | \(p\) | \(\hat{p}\) | \(\sqrt{\tfrac{p*(1-p)}{n}}\) |
difference of proportions | \(p_1-p_2\) | \(\hat{p}_1-\hat{p}_2\) | \(\sqrt{\tfrac{p_1*(1-p_1)}{n_1} + \tfrac{p_2*(1-p_2)}{n_2}}\) |
single mean | \(\mu\) | \(\bar{x}\) | \(\tfrac{\sigma}{\sqrt{n}}\) |
difference of means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\sqrt{\tfrac{\sigma_1^2}{n_1} + \tfrac{\sigma_2^2}{n_2}}\) |
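The four \(SE\) formulas in the table can be written as small Python functions. This is only a sketch for checking hand calculations; the function and argument names are illustrative and are not part of the lab materials.

```python
from math import sqrt

def se_single_proportion(p, n):
    """SE of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of a difference of two sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_single_mean(sigma, n):
    """SE of a sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_diff_means(sigma1, n1, sigma2, n2):
    """SE of a difference of two sample means."""
    return sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Example calls with made-up inputs
print(se_single_proportion(0.5, 100))
print(se_single_mean(10, 25))
```

In practice the population values \(p\) and \(\sigma\) are unknown, so a hypothesis test plugs in either the null value or a sample estimate.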
The \(Z\)-test and \(T\)-test each rely on test statistics of the form:
\[\text{test statistic} = \frac{\text{observed} - \text{null}}{SE}\]
The test statistic is compared against an appropriate probability model to find the \(p\)-value and reach a conclusion.
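As a rough illustration of the test statistic formula, the sketch below runs a one-sample \(Z\)-test for a proportion using only Python's standard library. The counts and the null value are hypothetical numbers chosen for the example; they do not come from either dataset in this lab. The \(SE\) is computed with the null proportion, and the standard normal distribution serves as the probability model.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers for illustration only (not from the lab datasets)
n = 100          # sample size
successes = 60   # observed count in the category of interest
p_null = 0.5     # proportion stated by the null hypothesis

p_hat = successes / n                       # observed sample proportion
se = sqrt(p_null * (1 - p_null) / n)        # SE under the null hypothesis
z = (p_hat - p_null) / se                   # (observed - null) / SE

# Two-sided p-value from the standard normal model
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

With these made-up inputs the test statistic is about 2 and the two-sided \(p\)-value is about 0.046. A one-sided test would use only the tail in the direction of the alternative hypothesis.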
\(~\)
In an investigation of whether oatbran cereal might be effective in reducing LDL cholesterol, researchers randomly assigned 14 adult males with high cholesterol into two groups:
We’ll analyze each subject’s difference in LDL cholesterol when they were on the oatbran diet relative to when they were on the cornflakes diet. This outcome is recorded as the variable “difference” in the dataset linked below. You should recognize that a positive value of “difference” indicates a reduction in LDL cholesterol on the oatbran diet.
Click Here to download the data from this study.
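The sketch below shows how a one-sample \(T\) statistic could be computed from paired measurements. It assumes the difference is defined as the cornflakes value minus the oatbran value, which matches the statement that a positive difference indicates a reduction on the oatbran diet. The numbers are invented for illustration and are not the study data.

```python
from math import sqrt
from statistics import mean, stdev

# Invented LDL values for four hypothetical subjects (NOT the study data)
cornflakes = [5.0, 6.0, 7.0, 6.0]
oatbran = [4.0, 4.0, 4.0, 4.0]

# Assumed sign convention: positive difference = lower LDL on oatbran
differences = [c - o for c, o in zip(cornflakes, oatbran)]

n = len(differences)
x_bar = mean(differences)      # observed mean difference
s = stdev(differences)         # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)               # estimated SE of the mean difference
mu_null = 0                    # null hypothesis: mean difference is zero

t_stat = (x_bar - mu_null) / se
df = n - 1                     # degrees of freedom for the t distribution

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The \(p\)-value comes from comparing the statistic to a \(t\) distribution with \(n - 1\) degrees of freedom. The standard library has no \(t\) distribution, so one option (assuming SciPy is installed) is `scipy.stats.t.sf(t_stat, df)` for the upper-tail probability.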
\(~\)
Question #1: Briefly describe one population that the researchers can reasonably generalize the results of this study to. Additionally, briefly describe another population that the researchers should avoid generalizing the results of this study to.
Question #2: Is this a randomized experiment or an observational study? With that in mind, how concerned are you about the study’s outcome being influenced by bias or confounding variables? You should respond in 2-3 sentences.
\(~\)
Question #3: These researchers wanted to determine whether the oatbran diet was capable of reducing LDL cholesterol levels as measured by the variable “difference”. With that in mind, state the null hypothesis the researchers should evaluate. Be sure to define (in words) any population parameters you include in the null hypothesis (i.e., define the meaning of \(\mu\), \(p\), etc.).
Question #4: Perform a \(Z\) or \(T\) test to evaluate the null hypothesis you proposed in Question #3. Your answer should show how you calculated your test statistic, and it should provide a \(p\)-value alongside a brief conclusion that addresses the context of this application.
Question #5: Notice how this dataset contains columns labeled “OatBran” and “CornFlakes”. For this question, briefly explain why it is more reasonable to analyze the “Difference” column instead of comparing the averages found in the “OatBran” and “CornFlakes” columns.
\(~\)
Intensive care units, or ICUs, are primary spaces in hospitals that are reserved for patients in critical condition. The dataset linked below is a random sample of \(n = 200\) ICU patients from a research hospital affiliated with Carnegie Mellon University (CMU).
Link: https://remiller1450.github.io/data/ICUAdmissions.csv
The data dictionary below documents each variable contained within the dataset:
\(~\)
Question #6: Briefly describe one population that the researchers can reasonably generalize the results of this study to. Additionally, briefly describe another population that the researchers should avoid generalizing the results of this study to.
Question #7: Is this a randomized experiment or an observational study? What role, if any, does the design of this study have on the strength of the conclusions you can reach by analyzing it?
\(~\)
Question #8: The population of the Pittsburgh, PA metropolitan area (where CMU is located) is 85% non-Hispanic white according to the most recent census. Based upon this information, use a \(Z\) or \(T\) test to evaluate whether these data provide evidence that racial minorities are overrepresented among ICU patients at this CMU-affiliated hospital. Your answer should clearly state the null hypothesis, and show how you calculated your test statistic. It should then provide a \(p\)-value alongside a brief conclusion that addresses the context of this application.
Question #9: According to the American Heart Association, a healthy systolic blood pressure is 120 mm Hg. Based upon this information, use a \(Z\) or \(T\) test to evaluate whether these data provide evidence that ICU patients are admitted with systolic blood pressures that differ from the healthy level. Your answer should clearly state the null hypothesis, and show how you calculated your test statistic. It should then provide a \(p\)-value alongside a brief conclusion that addresses the context of this application.
Question #10: Use a \(Z\) or \(T\) test to evaluate whether these data provide evidence of a sex imbalance among ICU patients. Your answer should clearly state the null hypothesis, and show how you calculated your test statistic. It should then provide a \(p\)-value alongside a brief conclusion that addresses the context of this application.