Goals:
The purpose of this lab is to practice identifying different study designs and understanding the implications these designs have on subsequent analyses of the data.
Directions:
\(~\)
Some infants are born with congenital heart defects that require surgery shortly after birth. The standard surgical approach is “circulatory arrest”, which has the downside of cutting off blood flow to the brain during the surgery and potentially leading to brain damage. An alternative surgical approach is “low-flow bypass”, which maintains circulation to the brain using an external pump, though the pump might lead to other types of brain injuries. The goal of this study is to determine which surgical approach yields better developmental outcomes for infants born with congenital heart defects.
The Infant Heart Surgery dataset contains data from a study conducted by surgeons at Harvard Medical School. In the study, 143 infants were randomly assigned to receive either the low-flow or the circulatory arrest surgical approach. Two years later, the researchers followed up on each infant to measure two developmental scores:
Additionally, the research team recorded the following variables for each infant:
Question #1: Is this an experiment or an observational study? Briefly explain your answer.
Question #2: Suppose that for an infant “Variable X” is predictive of their PDI score at two years. Considering the design of this study, without analyzing any data, would you expect “Variable X” to be a confounding variable in the primary analyses in this study? Briefly explain.
Question #3: Construct a 95% confidence interval estimate for the difference in the proportions of male infants in the low-flow and circulatory arrest groups. That is, define \(p_1\) as the proportion of low-flow surgery recipients that are male, and \(p_2\) as the proportion of circulatory arrest recipients that are male, then find a 95% CI estimate for \(p_1 - p_2\). You may use either bootstrapping or a CLT formula to find your interval.
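As a rough illustration of the CLT approach mentioned in Question #3, the sketch below computes a 95% interval for \(p_1 - p_2\) using the usual unpooled standard error. The function name `diff_in_proportions_ci` and the counts passed to it are hypothetical placeholders, not values from the Infant Heart Surgery dataset; substitute the counts you tabulate from the data.

```python
import math

def diff_in_proportions_ci(x1, n1, x2, n2, z=1.96):
    """CLT-based CI for p1 - p2 (hypothetical helper, not part of the lab).

    x1, x2: number of "successes" (e.g. male infants) in each group
    n1, n2: group sizes
    z:      critical value; 1.96 corresponds to roughly 95% confidence
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Unpooled standard error of the difference in sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Made-up counts for illustration only (NOT the study's data)
lower, upper = diff_in_proportions_ci(x1=25, n1=50, x2=27, n2=50)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

The normal approximation behind this formula assumes each group has a reasonable number of infants in both categories; with small counts, bootstrapping may be the safer choice.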
Question #4: Interpret the results of your 95% CI in the context of the design of this study (i.e., consider your answer to Question #2). That is, why is this interval not surprising when you consider the design of this study?
Question #5: The 95% CI for difference in mean PDI scores (low-flow - circulatory arrest) is (0.7, 10.9). To conclude that the low-flow surgery causes better physiological development than the circulatory arrest surgery, all of the following factors must be ruled out:
For Question #5, briefly comment upon whether you believe the factors A - C can be sufficiently ruled out for this study. If a factor can be ruled out, explain why. If a factor cannot be ruled out, explain what you’d need to see to change your mind.
\(~\)
In 1887, Robert Towne built a metals smelter approximately two and a half miles northwest of El Paso, Texas, just across the river from Ciudad Juarez in Mexico and close to several small towns in nearby New Mexico. The smelter, which processed metal ore from regional mines, was quickly acquired by ASARCO (American Smelting and Refining Company) and became an important economic institution in the region. ASARCO owned and maintained the land adjacent to the smelter, an area called “Smeltertown” that was home to thousands of residents at its peak, but the town’s population dwindled to a few hundred in the early 1970s.
Prior to modern environmental regulations, heavy metals were often emitted as a by-product of smelting. Metals such as lead and arsenic tend to end up in soil and dust, and can be easily ingested by children, prompting researchers to compare children living in Smeltertown with those living further away from the smelter. Because lead exposure is known to stunt intellectual development, one variable the researchers were specifically interested in was the age-adjusted IQ score of these children.
Question #6: Suppose you’re asked to conduct a study investigating the negative effects of the smelter on the children of Smeltertown in hopes of finding compelling evidence to support relocating families away from other smelters across the United States. With this goal in mind, briefly address the following questions:
\(~\)
The Lead IQ dataset contains partial results from the aforementioned study. It includes two variables:
Question #7: Create an appropriate graph displaying the relationship between the variables “Distance” and “IQ”. Briefly compare the median IQ scores and IQRs of each distance group.
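For the numerical part of Question #7, a minimal sketch of how the median and IQR of each distance group might be computed is shown below. The helper `median_and_iqr` and the IQ values are hypothetical and are not taken from the Lead IQ dataset. Note that quartile conventions differ between software packages, so the IQR reported here (Python's default "exclusive" method) may differ slightly from the one your own software reports.

```python
import statistics

def median_and_iqr(values):
    """Return (median, IQR) for a list of numbers (hypothetical helper)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1

# Made-up IQ scores for illustration only (NOT the study's data)
groups = {
    "near": [85, 92, 78, 101, 89, 94, 83, 90],
    "far": [95, 102, 88, 110, 97, 105, 91, 99],
}

for name, scores in groups.items():
    med, iqr = median_and_iqr(scores)
    print(f"{name}: median = {med}, IQR = {iqr}")
```

Side-by-side boxplots display the same two summaries graphically, which is why they are a natural choice for comparing a quantitative variable across two groups.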
Question #8: Find a 95% confidence interval estimate for the difference in means comparing the IQ scores of each group of children (“near” and “far”). You may use either bootstrapping or the CLT to construct your interval.
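The bootstrapping option in Question #8 can be sketched as follows, under the assumption that a percentile bootstrap is acceptable and that each group is resampled separately with replacement. The function `bootstrap_mean_diff_ci`, its defaults, and the IQ values are all hypothetical; they are placeholders rather than the Lead IQ data.

```python
import random
import statistics

def bootstrap_mean_diff_ci(group_a, group_b, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Hypothetical helper: resamples each group independently, with
    replacement, and records the difference in resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    alpha = 1 - level
    # Take the middle `level` share of the sorted bootstrap differences
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return diffs[lo_idx], diffs[hi_idx]

# Made-up IQ scores for illustration only (NOT the study's data)
far = [95, 102, 88, 110, 97, 105, 91, 99]
near = [85, 92, 78, 101, 89, 94, 83, 90]

lower, upper = bootstrap_mean_diff_ci(far, near)
print(f"95% bootstrap CI for mean(far) - mean(near): ({lower:.1f}, {upper:.1f})")
```

Whichever order of subtraction you choose, state it alongside the interval so the sign of the endpoints can be interpreted.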
\(~\)
Suppose the age-adjusted IQ scores were measured by trained professionals using an unbiased testing instrument. Further, suppose the researchers who conducted the study were careful not to tell these professionals whether the child they were evaluating lived in Smeltertown or another part of El Paso.
Question #9: Statistically speaking, considering the information above, why wouldn’t the researchers want the staff to know whether a child lived in Smeltertown during their IQ assessment? Briefly explain using the appropriate statistical terms.
Question #10: Is this an example of an observational study or an experiment? How do you know?
Question #11: Listed below are several possible explanations for the observed difference in IQ scores found in these data. For this question, briefly explain which of these explanations can be ruled out based on your current understanding of these data (i.e., your answers to Questions 6-10).