Goals:
The purpose of this lab is to provide practice using Chi-squared tests (goodness of fit and association) within the framework of the scientific method.
Directions:
Intensive care units, or ICUs, are primary spaces in hospitals that are reserved for patients in critical condition. The dataset linked below is a random sample of n = 200 ICU patients from a research hospital affiliated with Carnegie Mellon University (CMU).
Link: https://remiller1450.github.io/data/ICUAdmissions.csv
The data dictionary below documents each variable contained within the dataset:
Question #1: According to recent US Census data, the population of the Pittsburgh, PA metropolitan area (where CMU is located) is 85% white, 8% black, and 7% other races. Based upon this information, do any racial groups appear to be disproportionately represented among the ICU patients at this hospital? Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.
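As a rough illustration of the goodness-of-fit mechanics, the sketch below computes expected counts and the test statistic from a set of null proportions. Python is an assumption here, since the lab does not prescribe a language, and the observed counts are hypothetical placeholders rather than values from the ICU dataset. The \(p\)-value line uses the closed form \(e^{-x/2}\), which holds only when there are 2 degrees of freedom (three categories).

```python
import math

def chi2_goodness_of_fit(observed, null_props):
    """Return expected counts, the chi-squared statistic, and degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, stat, df

# Null proportions stated in the question: white, black, other.
null_props = [0.85, 0.08, 0.07]

# HYPOTHETICAL observed counts (placeholders, not the ICU data). Replace
# them with the counts tabulated from the race variable in the dataset.
observed = [160, 25, 15]

expected, stat, df = chi2_goodness_of_fit(observed, null_props)

# Upper-tail probability of a chi-squared distribution with df = 2.
# This shortcut is NOT valid for other degrees of freedom.
p_value = math.exp(-stat / 2)

print("expected counts:", [round(e, 1) for e in expected])
print("statistic:", round(stat, 3), "df:", df, "p-value:", round(p_value, 4))
```

For tables with a different number of categories, a statistics library's chi-squared distribution function would be needed in place of the closed form.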
Question #2: Using a Chi-squared test, do these data provide statistical evidence of a difference in survival depending upon the level of consciousness of a patient arriving at the ICU? Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.
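For a test of association, each expected count is the row total times the column total divided by the overall sample size. The sketch below shows that calculation on a hypothetical 3-by-2 table (three levels of consciousness by survived/died); the counts are placeholders, not the ICU data. It also counts the cells with an expected count below 5, a common rule of thumb for the traditional test. A 3-by-2 table has (3 - 1)(2 - 1) = 2 degrees of freedom, so the same \(e^{-x/2}\) closed form applies for the \(p\)-value.

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# HYPOTHETICAL counts (not the ICU data).
# Rows: three levels of consciousness; columns: survived, died.
table = [[40, 10],
         [6, 4],
         [4, 6]]

expected = expected_counts(table)
stat = chi2_statistic(table, expected)
df = (len(table) - 1) * (len(table[0]) - 1)

# Closed-form upper tail, valid only because df = 2 for a 3x2 table.
p_value = math.exp(-stat / 2)

# Number of cells whose expected count falls below 5.
small_cells = sum(e < 5 for row in expected for e in row)

for row in expected:
    print([round(e, 2) for e in row])
print("statistic:", round(stat, 3), "df:", df, "p-value:", round(p_value, 4))
print("cells with expected count < 5:", small_cells)
```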
Question #3: Consider the hypothesis test you performed in Question #2. Based upon the table of expected counts, do you believe this test to be a statistically reliable choice? That is, would a statistician have an issue with using this particular hypothesis test given the nature of these data?
Question #4: Use a randomization Chi-squared test to evaluate the hypothesis you considered in Question #2. Briefly comment on whether using a randomization Chi-squared test (a test that is robust to violated assumptions) instead of a traditional Chi-squared test (a test that is sensitive to violated assumptions) appears to make any difference in this application.
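One way to sketch a randomization Chi-squared test is to shuffle the outcome labels many times, recompute the statistic for each shuffle, and report the share of shuffled statistics at least as large as the observed one. The code below is a minimal version under that assumption; the level labels, the counts, the number of repetitions, and the seed are all hypothetical choices for illustration, not values taken from the lab or the ICU data.

```python
import random
from collections import Counter

def chi2_from_pairs(groups, outcomes):
    """Chi-squared statistic for the two-way table of paired labels."""
    n = len(groups)
    row_totals = Counter(groups)
    col_totals = Counter(outcomes)
    cell_counts = Counter(zip(groups, outcomes))
    stat = 0.0
    for g, r in row_totals.items():
        for o, c in col_totals.items():
            expected = r * c / n
            stat += (cell_counts[(g, o)] - expected) ** 2 / expected
    return stat

def randomization_p_value(groups, outcomes, reps=2000, seed=1):
    """Share of label shuffles whose statistic is >= the observed statistic."""
    rng = random.Random(seed)
    observed = chi2_from_pairs(groups, outcomes)
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any link between group and outcome
        # Small tolerance guards against floating-point summation order.
        if chi2_from_pairs(groups, shuffled) >= observed - 1e-9:
            extreme += 1
    return observed, extreme / reps

# HYPOTHETICAL individual-level data (labels and counts are placeholders).
groups = ["conscious"] * 50 + ["intermediate"] * 10 + ["coma"] * 10
outcomes = (["survived"] * 40 + ["died"] * 10
            + ["survived"] * 6 + ["died"] * 4
            + ["survived"] * 4 + ["died"] * 6)

observed_stat, p_value = randomization_p_value(groups, outcomes)
print("observed statistic:", round(observed_stat, 3))
print("randomization p-value:", p_value)
```

Because the reference distribution is built from the data themselves, this approach does not lean on the expected-count condition that the traditional test requires.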
Question #5: Find and interpret the odds ratio that describes the estimated odds of survival for a patient who was conscious upon arrival relative to a patient who was in a coma upon arrival.
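The odds ratio compares the odds of an outcome in one group with the odds in a reference group. The sketch below shows the arithmetic for a 2-by-2 table; the function name and the counts are hypothetical, chosen only to illustrate the calculation.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a / b) / (c / d) for a 2x2 table.

    a, b: survived and died counts in the group of interest
    c, d: survived and died counts in the reference group
    """
    return (a / b) / (c / d)

# HYPOTHETICAL counts (not the ICU data).
conscious_survived, conscious_died = 40, 10
coma_survived, coma_died = 4, 6

or_estimate = odds_ratio(conscious_survived, conscious_died,
                         coma_survived, coma_died)
print("estimated odds ratio:", round(or_estimate, 2))
```

A value above 1 indicates higher odds of survival in the first group than in the reference group, and a value below 1 indicates lower odds.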
Question #6: Based upon these data, evaluate whether it appears statistically plausible that all three recorded levels of consciousness are equally likely among new arrivals to the ICU. Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.
Enzyme-Linked Immunosorbent Assays, or ELISA, are commonly used to determine if an individual has human immunodeficiency virus (HIV). An ELISA test is often used as a screening test prior to blood donation to prevent transmission of HIV. However, as with most medical diagnostic tests, it is not infallible. Experts estimate that if an individual truly has HIV, they’ll test positive during an ELISA screening 97.7% of the time. If an individual does not have HIV, they’ll test negative 92.6% of the time.
Question #7: Based upon the information above, what is the sensitivity of an ELISA test?
Question #8: Based upon the information above, what is the specificity of an ELISA test?
Question #9: Suppose an individual tests positive during an ELISA screening prior to a blood donation. How likely do you think it is that this person actually has HIV? In your write-up, record whether you think this probability is closest to 0.1, 0.5, or 0.9. For now, there is no correct answer; I’m only looking for your intuitive judgment and a brief explanation.
Imagine a hypothetical population of 1,000,000 people and suppose that 0.5% of this population actually has HIV (this is roughly the percentage in the US).
Question #10: How many people in this population have HIV (as a count)? How many do not have HIV (as a count)?
Question #11: Considering all of the members of the population who have HIV, if 97.7% of them would test positive on an ELISA screening, how many positive and negative tests will there be (as counts) in this group?
Question #12: Considering all of the members of the population who do not have HIV, if 92.6% of them would test negative on an ELISA screening, how many positive and negative tests will there be (as counts) in this group?
Question #13: Using your previous answers, fill out the contingency table outlined below:
| | Positive ELISA | Negative ELISA | Total |
|---|---|---|---|
| Has HIV | | | |
| Doesn’t have HIV | | | |
| Total | | | 1,000,000 |
Question #15: Using the hypothetical contingency table you created in Question #13, what proportion of those who test positive on an ELISA screening test actually have HIV? Why is this probability smaller than what most people would expect?
Question #16: Using the hypothetical contingency table you created in Question #13, what proportion of people who test negative on an ELISA screening test are actually free of HIV? How does this probability estimate compare with what you’d expect?
Note: The probability estimated in Question #15 is called the test’s positive predictive value, while the probability from Question #16 is called its negative predictive value.
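The hypothetical-population reasoning above can be sketched in code as a check on hand calculations. The function names below are illustrative assumptions; the inputs (population size, prevalence, and the two test accuracy rates) are the values stated in the text.

```python
def screening_table(population, prevalence, sensitivity, specificity):
    """Split a hypothetical population into the four cells of a 2x2 table."""
    has_condition = population * prevalence
    no_condition = population - has_condition
    true_pos = has_condition * sensitivity   # have it, test positive
    false_neg = has_condition - true_pos     # have it, test negative
    true_neg = no_condition * specificity    # do not have it, test negative
    false_pos = no_condition - true_neg      # do not have it, test positive
    return {"true_pos": true_pos, "false_neg": false_neg,
            "false_pos": false_pos, "true_neg": true_neg}

def predictive_values(cells):
    """Positive and negative predictive values from the four cell counts."""
    ppv = cells["true_pos"] / (cells["true_pos"] + cells["false_pos"])
    npv = cells["true_neg"] / (cells["true_neg"] + cells["false_neg"])
    return ppv, npv

# Values stated in the text: 1,000,000 people, 0.5% prevalence,
# 97.7% positive among those with HIV, 92.6% negative among those without.
cells = screening_table(population=1_000_000, prevalence=0.005,
                        sensitivity=0.977, specificity=0.926)
ppv, npv = predictive_values(cells)

for name, count in cells.items():
    print(name, round(count))
print("positive predictive value:", round(ppv, 4))
print("negative predictive value:", round(npv, 6))
```

Rerunning the function with a different prevalence shows how strongly the positive predictive value depends on how common the condition is in the screened population.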