Goals:

The purpose of this lab is to provide practice using Chi-squared tests (goodness of fit and association) within the framework of the scientific method.

Directions:

  • You are expected to progress through the analyses described in this document as a group, recording your answers in a shared document. It’s completely up to your group how you’d like to organize this - some groups like using a shared Google Doc, while others might designate one person to be the group’s recorder.
  • You are expected to work together; any attempts to “divide and conquer” the lab questions may result in point deductions on your group’s lab score.
  • Labs are graded primarily for completion, and we will get together as a group for the last 10-15 minutes of class to discuss some of the lab questions. This means you should focus on learning the material (while also helping the teammates in your group) rather than seeing labs as an assessment (like homework or exams).
  • Please upload your responses to the lab’s questions on Canvas. The expectation is that everyone uploads their own copy (they can be identical within your group).
  • Use the snipping tool on Windows or take a Mac screenshot to add screenshots to your lab write-up as requested.

\(~\)

Study #1 - ICU Admissions

Intensive care units, or ICUs, are primary spaces in hospitals that are reserved for patients in critical condition. The dataset linked below is a random sample of n = 200 ICU patients from a research hospital affiliated with Carnegie Mellon University (CMU).

Link: https://remiller1450.github.io/data/ICUAdmissions.csv

The data dictionary below documents each variable contained within the dataset:

  • ID - Patient ID number
  • Status - Patient status: 0=lived or 1=died
  • Age - Patient’s age (in years)
  • Sex - 0=male or 1=female
  • Race - Patient’s race: 1=white, 2=black, or 3=other
  • Service - Type of service: 0=medical or 1=surgical
  • Cancer - Is cancer involved? 0=no or 1=yes
  • Renal - Is chronic renal failure involved? 0=no or 1=yes
  • Infection - Is infection involved? 0=no or 1=yes
  • CPR - Patient received CPR prior to admission? 0=no or 1=yes
  • Systolic - Systolic blood pressure (in mm of Hg)
  • HeartRate - Pulse rate (beats per minute)
  • Previous - Previous admission to ICU within 6 months? 0=no or 1=yes
  • Type - Admission type: 0=elective or 1=emergency
  • Fracture - Fractured bone involved? 0=no or 1=yes
  • PO2 - Partial oxygen level from blood gases under 60? 0=no or 1=yes
  • PH - pH from blood gas under 7.25? 0=no or 1=yes
  • PCO2 - Partial carbon dioxide level from blood gas over 45? 0=no or 1=yes
  • Bicarbonate - Bicarbonate from blood gas under 18? 0=no or 1=yes
  • Creatinine - Creatinine from blood gas over 2.0? 0=no or 1=yes
  • Consciousness - Level upon arrival: 1=conscious, 2=deep stupor, or 3=coma

\(~\)

Statistical Analyses

Question #1: According to recent US Census data, the population of the Pittsburgh, PA metropolitan area (where CMU is located) is 85% white, 8% black, and 7% other races. Based upon this information, do any racial groups appear to be disproportionately represented among the ICU patients at this hospital? Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.
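As a rough illustration of the mechanics behind a goodness of fit test, the Python sketch below computes expected counts, the Chi-squared statistic, and a \(p\)-value. The counts and null proportions are hypothetical and are not taken from the ICU data, and the function names `goodness_of_fit` and `p_value_df2` are made up for this example. The \(p\)-value shortcut relies on the fact that a Chi-squared distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), so it only applies when there are three categories.

```python
import math


def goodness_of_fit(observed, null_props):
    """Return expected counts, the Chi-squared statistic, and degrees of freedom."""
    n = sum(observed)
    # Expected count = total sample size * proportion claimed by the null hypothesis
    expected = [n * p for p in null_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return expected, statistic, df


def p_value_df2(statistic):
    """Upper-tail probability of a Chi-squared distribution with exactly 2 df."""
    return math.exp(-statistic / 2)


# Hypothetical counts for three categories (NOT the ICU data)
observed = [60, 25, 15]
null_props = [0.5, 0.3, 0.2]

expected, statistic, df = goodness_of_fit(observed, null_props)
p_value = p_value_df2(statistic)  # valid here because df == 2

print("Expected counts:", expected)
print("Test statistic:", round(statistic, 3), "on", df, "df")
print("p-value:", round(p_value, 4))
```

Statistical software will normally report these quantities directly; the sketch is only meant to show where each number comes from.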

Question #2: Using a Chi-squared test, do these data provide statistical evidence of a difference in survival depending upon the level of consciousness of a patient arriving at the ICU? Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.
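For a test of association, the expected count in each cell is (row total × column total) / overall total. The Python sketch below applies that rule to a hypothetical 2×3 table (the counts are invented for illustration and are not the ICU data) and also reports the smallest expected count, which is the quantity usually checked when judging whether the Chi-squared approximation is trustworthy. The helper names are assumptions of this example, and the \(p\)-value shortcut exp(-x/2) only holds because a 2×3 table has (2-1)(3-1) = 2 degrees of freedom.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence of the row and column variables."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x3 table: rows are two outcomes, columns are three groups
table = [
    [30, 20, 10],
    [10, 20, 30],
]

expected = expected_counts(table)
statistic = chi2_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)
p_value = math.exp(-statistic / 2)  # upper tail of Chi-squared, valid only for df == 2
smallest_expected = min(min(row) for row in expected)

print("Expected counts:", expected)
print("Test statistic:", statistic, "on", df, "df")
print("p-value:", p_value)
print("Smallest expected count:", smallest_expected)
```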

Question #3: Consider the hypothesis test you performed in Question #2. Based upon the table of expected counts, do you believe this test to be a statistically reliable choice? That is, would a statistician have an issue with using this particular hypothesis test given the nature of these data?

Question #4: Use a randomization Chi-squared test to evaluate the hypothesis you considered in Question #2. Briefly comment on whether using a randomization Chi-squared test (a test that is robust to violated assumptions) instead of a traditional Chi-squared test (a test that is sensitive to violated assumptions) appears to make any difference in this application.
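One common way to carry out a randomization Chi-squared test is to shuffle one variable's labels many times, recompute the Chi-squared statistic for each shuffle, and report the fraction of shuffles that produce a statistic at least as large as the one observed. The Python sketch below shows that idea under stated assumptions: the data are hypothetical (not the ICU data), the function names are invented for this example, and the number of shuffles and the seed are arbitrary choices. The software used in class may implement the procedure somewhat differently.

```python
import random
from collections import Counter


def chi2_from_pairs(groups, outcomes):
    """Chi-squared statistic for two categorical variables given as paired lists."""
    n = len(groups)
    group_totals = Counter(groups)
    outcome_totals = Counter(outcomes)
    cell_counts = Counter(zip(groups, outcomes))
    stat = 0.0
    for g, g_total in group_totals.items():
        for o, o_total in outcome_totals.items():
            expected = g_total * o_total / n
            stat += (cell_counts[(g, o)] - expected) ** 2 / expected
    return stat


def randomization_p_value(groups, outcomes, reps=2000, seed=1):
    """Share of label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi2_from_pairs(groups, outcomes)
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # breaks any link between group and outcome
        if chi2_from_pairs(groups, shuffled) >= observed - 1e-12:
            extreme += 1
    # Adding 1 to both counts treats the observed table as one of the shuffles
    return (extreme + 1) / (reps + 1)


# Hypothetical data: 20 cases in each of two groups
groups = ["a"] * 20 + ["b"] * 20
outcomes = ["lived"] * 18 + ["died"] * 2 + ["lived"] * 6 + ["died"] * 14

observed_stat = chi2_from_pairs(groups, outcomes)
p_value = randomization_p_value(groups, outcomes)
print("Observed statistic:", observed_stat)
print("Randomization p-value:", p_value)
```

Because shuffling keeps the row and column totals fixed, the expected counts never change between shuffles; only the observed cell counts do.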

Question #5: Find and interpret the odds ratio that describes the estimated odds of survival for a patient who was conscious upon arrival relative to a patient who was in a coma upon arrival.
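As a reminder of the arithmetic, an odds ratio divides the odds of an outcome in one group by the odds of the same outcome in another group, where odds are (count with the outcome) / (count without it). The Python sketch below uses hypothetical counts that are not taken from the ICU data, and the function name `odds_ratio` is an assumption of this example.

```python
def odds_ratio(a_yes, a_no, b_yes, b_no):
    """Odds of 'yes' in group A divided by the odds of 'yes' in group B."""
    odds_a = a_yes / a_no
    odds_b = b_yes / b_no
    return odds_a / odds_b


# Hypothetical counts: group A has 90 'yes' and 10 'no'; group B has 4 'yes' and 6 'no'
result = odds_ratio(a_yes=90, a_no=10, b_yes=4, b_no=6)
print(f"Odds ratio: {result:.2f}")
```

An odds ratio above 1 means the outcome has higher odds in group A than in group B; a value of 1 means the odds are the same in both groups.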

Question #6: Based upon these data, evaluate whether it appears statistically plausible that all three recorded levels of consciousness are equally likely among new arrivals to the ICU. Your answer should: clearly state a null hypothesis, provide a table of expected counts, provide the test statistic, provide the \(p\)-value, and make a conclusion.

\(~\)

Study #2 - HIV Diagnostic Testing

Enzyme-Linked Immunosorbent Assays, or ELISA, are commonly used to determine if an individual has human immunodeficiency virus (HIV). An ELISA test is often used as a screening test prior to blood donation to prevent transmission of HIV. However, as with most medical diagnostic tests, it is not infallible. Experts estimate that if an individual truly has HIV, they’ll test positive during an ELISA screening 97.7% of the time. If an individual does not have HIV, they’ll test negative 92.6% of the time.

Question #7: Based upon the information above, what is the sensitivity of an ELISA test?

Question #8: Based upon the information above, what is the specificity of an ELISA test?

Question #9: Suppose an individual tests positive during an ELISA screening prior to a blood donation. How likely do you think it is that this person actually has HIV? In your write-up, record whether you think this probability is closest to 0.1, 0.5, or 0.9. For now, there’s no correct answer; I’m only looking for your intuitive judgment and a brief explanation.

\(~\)

Contingency Tables

Imagine a hypothetical population of 1,000,000 people and suppose that 0.5% of this population actually has HIV (this is roughly the percentage in the US).

Question #10: How many people in this population have HIV (as a count)? How many do not have HIV (as a count)?

Question #11: Considering all of the members of the population who have HIV, if 97.7% of them would test positive on an ELISA screening, how many positive and negative tests will there be (as counts) in this group?

Question #12: Considering all of the members of the population who do not have HIV, if 92.6% of them would test negative on an ELISA screening, how many positive and negative tests will there be (as counts) in this group?

Question #13: Using your previous answers, fill out the contingency table outlined below:

| | Positive ELISA | Negative ELISA | Total |
|---|---|---|---|
| Has HIV | | | |
| Doesn’t have HIV | | | |
| Total | | | 1,000,000 |
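The bookkeeping in Questions #10 through #13 can be sketched in a few lines of Python. To avoid giving away the lab's answers, the example uses a made-up screening test and population (10,000 people, 2% prevalence, 90% sensitivity, 95% specificity); these values and the function name `screening_table` are assumptions chosen only for illustration.

```python
def screening_table(population, prevalence, sensitivity, specificity):
    """Expected counts for a 2x2 condition-by-test table, rounded to whole people."""
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_pos = round(diseased * sensitivity)   # have the condition, test positive
    false_neg = diseased - true_pos            # have the condition, test negative
    true_neg = round(healthy * specificity)    # no condition, test negative
    false_pos = healthy - true_neg             # no condition, test positive
    return {
        "diseased": {"positive": true_pos, "negative": false_neg, "total": diseased},
        "healthy": {"positive": false_pos, "negative": true_neg, "total": healthy},
        "total": {
            "positive": true_pos + false_pos,
            "negative": false_neg + true_neg,
            "total": population,
        },
    }


# Hypothetical inputs (NOT the ELISA figures from this lab)
table = screening_table(population=10_000, prevalence=0.02,
                        sensitivity=0.90, specificity=0.95)
for row_name, row in table.items():
    print(row_name, row)
```

Each row is filled in from its own total, so the row totals always add up to the population size.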

\(~\)

Reversing the Conditional Probabilities

Question #15: Using the hypothetical contingency table you created in Question #13, what proportion of those who test positive on an ELISA screening test actually have HIV? Why is this probability smaller than what most people would expect?

Question #16: Using the hypothetical contingency table you created in Question #13, what proportion of people who test negative on an ELISA screening test are actually free of HIV? How does this probability estimate compare with what you’d expect?

Note: The probability estimated in Question #15 is called the test’s positive predictive value, while the probability from Question #16 is called its negative predictive value.
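For reference, the contingency table calculation can also be written as a formula. The notation here is introduced only for this note: \(p\) is the prevalence of the condition in the population, \(Se\) is the test's sensitivity, and \(Sp\) is its specificity. Under those assumptions, the two predictive values are:

\[
\text{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
\qquad
\text{NPV} = \frac{Sp \cdot (1 - p)}{Sp \cdot (1 - p) + (1 - Se) \cdot p}
\]

In each fraction, the numerator is the share of the population that is correctly classified by that test result, and the denominator is the share of the population that receives that test result at all. Dividing every cell of the contingency table by the population size gives the same quantities.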