Goals:

The purpose of this lab is to practice identifying different study designs and understanding the implications these designs have on subsequent analyses of the data.

Directions:

  • You are expected to progress through the analyses described in this document as a group, recording your answers in a shared document. It’s completely up to your group how you’d like to organize this - some groups like using a shared Google Doc, while others might designate one person to be the group’s recorder.
  • You are expected to work together; any attempts to “divide and conquer” the lab questions may result in point deductions on your group’s lab score.
  • Labs are graded primarily for completion, and we will get together as a group for the last 10-15 minutes of class to discuss some of the lab questions. This means you should focus on learning the material (while also helping the teammates in your group) rather than seeing labs as an assessment (like homework or exams).
  • Please upload your responses to the Lab’s questions on Canvas. The expectation is that everyone uploads their own copy (they can be identical within your group).
  • Use the snipping tool on Windows or take a Mac screenshot to add screenshots to your lab write-up as requested.

\(~\)

Study #1 - Infant Heart Surgery

Some infants are born with congenital heart defects that require surgery shortly after birth. The standard surgical approach is “circulatory arrest”, which has the downside of cutting off blood flow to the brain during the surgery and potentially leading to brain damage. An alternative surgical approach is “low-flow bypass”, which maintains circulation to the brain using an external pump, which in turn might lead to other types of brain injuries. The goal of this study is to determine which surgical approach yields better developmental outcomes for infants born with congenital heart defects.

The Infant Heart Surgery dataset contains data from a study conducted by surgeons at Harvard Medical School. In the study, 143 infants were randomly assigned to receive either the low-flow or the circulatory arrest surgical approach. Two years later, the researchers followed up on each infant to measure two developmental scores:

  • Psychomotor Development Index (PDI) - a composite score measuring physiological development, with higher scores indicating greater development
  • Mental Development Index (MDI) - a composite score measuring mental development, with higher scores indicating greater development

Additionally, the research team recorded the following variables for each infant:

  • Weight - the infant’s weight (in grams)
  • Length - the infant’s length (in cm)
  • Age - the infant’s age (in hours)
  • Sex - the infant’s sex (male or female)

Question #1: Is this an experiment or an observational study? Briefly explain your answer.

Question #2: Suppose that for an infant “Variable X” is predictive of their PDI score at two years. Considering the design of this study, without analyzing any data, would you expect “Variable X” to be a confounding variable in the primary analyses of this study? Briefly explain.

Question #3: Construct a 95% confidence interval estimate for the difference in the proportions of male infants in the low-flow and circulatory arrest groups. That is, define \(p_1\) as the proportion of low-flow surgery recipients that are male, and \(p_2\) as the proportion of circulatory arrest recipients that are male, then find a 95% CI estimate for \(p_1 - p_2\). You may use either bootstrapping or a CLT formula to find your interval.
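If your group would like to script the CLT calculation, a minimal sketch is shown below. It assumes Python, which this lab does not require, and the function name `diff_in_proportions_ci` and the counts passed to it are made-up placeholders rather than values from the Infant Heart Surgery dataset. The sketch uses the usual CLT (Wald) standard error, \(\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}\), with 1.96 as the approximate 95% multiplier.

```python
import math


def diff_in_proportions_ci(x1, n1, x2, n2, z=1.96):
    """CLT (Wald) interval for p1 - p2.

    x1, x2: number of "successes" (e.g. male infants) in each group
    n1, n2: group sizes
    z:      normal multiplier; 1.96 corresponds to roughly 95% confidence
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Placeholder counts for illustration only (NOT the lab's data):
# replace them with the counts your group finds in the dataset.
lower, upper = diff_in_proportions_ci(x1=30, n1=50, x2=25, n2=50)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

Substitute the observed counts of male infants and the group sizes from the dataset before reporting an interval.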

Question #4: Interpret the results of your 95% CI in the context of the design of this study (ie: consider your answer to Question #2). That is, why is this interval not surprising when you consider the design of this study?

Question #5: The 95% CI for difference in mean PDI scores (low-flow - circulatory arrest) is (0.7, 10.9). To conclude that the low-flow surgery causes better physiological development than the circulatory arrest surgery, all of the following factors must be ruled out:

  1. Confounding variables
  2. Bias
  3. Random chance/sampling variability

For Question #5, briefly comment upon whether you believe factors 1 - 3 can be sufficiently ruled out for this study. If a factor can be ruled out, explain why. If a factor cannot be ruled out, explain what you’d need to see to change your mind.

\(~\)

Study #2 - Lead Smelting in El Paso, TX

In 1887, Robert Towne built a metals smelter approximately two and a half miles northwest of El Paso, Texas, just across the river from Ciudad Juarez in Mexico and close to several small towns in nearby New Mexico. The smelter, which processed metal ore from regional mines, was quickly acquired by ASARCO (American Smelting and Refining Company) and became an important economic institution in the region. ASARCO owned and maintained the land adjacent to the smelter, an area called “Smeltertown” that was home to thousands of residents at its peak, but the town’s population dwindled to a few hundred in the early 1970s.

Prior to modern environmental regulations, heavy metals were often emitted as a by-product of smelting. Metals such as lead and arsenic tend to end up in soil and dust, and can be easily ingested by children, prompting researchers to compare children living in Smeltertown with those living further away from the smelter. Because lead exposure is known to stunt intellectual development, one variable the researchers were specifically interested in was the age-adjusted IQ score of these children.

Question #6: Suppose you’re asked to conduct a study investigating the negative effects of the smelter on the children of Smeltertown in hopes of finding compelling evidence to support relocating families away from other smelters across the United States. With this goal in mind, briefly address the following questions:

  • A): What is the target population? Briefly explain why this population makes sense given your study’s goal.
  • B): Suppose you collected data on every child living in Smeltertown. Would these data be a sample or a population? If they are a sample, describe whether it is a random sample or a convenience sample.
  • C): Suppose you use US Census records to select a random sample of children from the El Paso area (some of whom would reside in Smeltertown). Is this a biased sample? Briefly explain. (Hint: think of the target population)

\(~\)

The Lead IQ dataset contains partial results from the aforementioned study. It includes two variables:

  • Distance - “near” if the child lived near the smelter (ie: lived in Smeltertown), “far” if the child lived at least 1 mile from the smelter (ie: other parts of the El Paso area)
  • IQ - The age-adjusted IQ score of the child. Age-adjusted IQ scores are standardized measures of intellectual ability.

Question #7: Create an appropriate graph displaying the relationship between the variables “Distance” and “IQ”. Briefly compare the median IQ scores and IQRs of each distance group.
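For the numerical comparison that goes with your graph, the sketch below shows one way to compute a median and an IQR for a single group. It assumes Python, which this lab does not require. The helper name `median_and_iqr` and the scores are hypothetical placeholders, not values from the Lead IQ dataset. Different software uses slightly different quartile conventions, so your results may differ a little from those of another tool.

```python
import statistics


def median_and_iqr(values):
    """Return (median, IQR) for a list of numbers.

    statistics.quantiles(values, n=4) returns the three quartile cut
    points; the IQR is the distance between the first and third.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# Placeholder scores for illustration only (NOT the lab's data)
example_scores = [85, 90, 95, 100, 105, 110, 115]
med, iqr = median_and_iqr(example_scores)
print(f"median = {med}, IQR = {iqr}")
```

Apply the same helper separately to the “near” and “far” groups to compare them.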

Question #8: Find a 95% confidence interval estimate for the difference in means comparing the IQ scores of each group of children (“near” and “far”). You may use either bootstrapping or the CLT to construct your interval.
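If your group chooses bootstrapping, a minimal sketch of a percentile bootstrap interval is shown below. It assumes Python, which this lab does not require. The function name `bootstrap_diff_in_means_ci`, the number of resamples, the seed, and the scores are all illustrative placeholders, not part of the Lead IQ dataset. The difference is taken as “near” minus “far”, which is also an assumption. Either order is fine as long as you state which one you used.

```python
import random
import statistics


def bootstrap_diff_in_means_ci(group_a, group_b, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    # Cut off the lowest and highest (1 - level) / 2 share of the resamples
    tail = (1 - level) / 2
    lower = diffs[round(tail * reps)]
    upper = diffs[round((1 - tail) * reps) - 1]
    return lower, upper


# Placeholder scores for illustration only (NOT the lab's data)
near = [88, 92, 95, 101, 97, 84, 90, 99]
far = [96, 104, 99, 110, 93, 101, 107, 98]
lower, upper = bootstrap_diff_in_means_ci(near, far)
print(f"95% bootstrap CI for mean(near) - mean(far): ({lower:.2f}, {upper:.2f})")
```

Because the interval comes from random resampling, changing the seed or the number of resamples will shift the endpoints slightly.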

\(~\)

Suppose the age-adjusted IQ scores were measured by trained professionals using an unbiased testing instrument. Further, suppose the researchers who conducted the study were careful not to tell these individuals whether the child they were evaluating lived in Smeltertown or another part of El Paso.

Question #9: Statistically speaking, considering the information above, why wouldn’t the researchers want the staff to know whether a child lived in Smeltertown during their IQ assessment? Briefly explain using the appropriate statistical terms.

Question #10: Is this an example of an observational study or an experiment? How do you know?

Question #11: Listed below are several possible explanations for the observed difference in IQ scores found in these data. For this question, briefly explain which of these explanations can be ruled out based on your current understanding of these data (ie: your answers to Questions 6-10).

  • A) Living near the smelter is harmful to a child’s intellectual development.
  • B) Living near the smelter is probably not harmful; the observed difference in IQ scores can be explained by random chance (ie: variability in which children ended up getting sampled).
  • C) A confounding variable makes it appear as though living near the smelter is harmful to a child’s intellectual development.
  • D) The relationship found in this study can be attributed to it using a biased sample that is not representative of all children who have been exposed to smelting by-products.