From the Introduction to Modern Statistics (IMS) textbook, complete
the following exercises:
- Ch 2.5 Exercises: #2, #8, #16
- Ch 12.5 Exercises: #1, #2, #6, #8
Also complete the additional question given below:
Question #1: This question uses data from a CDC study
of children residing in El Paso, Texas, who
lived either “Near” (within 1 mile) or “Far” (farther than 1 mile) from
a large lead smelter. The study’s explanatory variable is the distance
from the smelter (either “Near” or “Far”), and the response variable is
the child’s age-adjusted IQ score.
lead_iq = read.csv("https://remiller1450.github.io/data/LeadIQ.csv")
- Part A: Create an appropriate data visualization
and state whether these variables appear to be associated.
- Part B: The researchers in this study are
interested in drawing a conclusion, for all children in the United
States, about the difference in mean age-adjusted IQ scores between
children exposed to pollution from lead smelting and children who are
not exposed. We’ll denote this difference in means as
\(\mu_{near} - \mu_{far}\). What is the point estimate of this unknown
population parameter from the sample data in this study?
- Part C: Use StatKey to create a 90% confidence
interval estimate of the population parameter from Part B using the
“percentile bootstrap approach”.
- Part D: Briefly interpret the confidence interval
from Part C. Then, based on the confidence interval, can we confidently
conclude that living near a lead smelter results in lower mean IQ scores
in the population represented by these data?
- Part E: Considering the target population described
in Part B, does your confidence interval from Parts C and D adequately
account for possible sampling bias? Briefly explain.
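
For reference, the “percentile bootstrap approach” named in Part C can be sketched in R. This is only a minimal illustration of the general idea, not a replacement for the StatKey step: the vectors `near` and `far` hold made-up placeholder values (not the LeadIQ data), the names `point_est`, `boot_diffs`, and `ci_90` are hypothetical, and resampling each group separately is an assumption about how the bootstrap is set up.

```r
# Percentile bootstrap for a difference in means (near - far).
# NOTE: these values are made-up placeholders, NOT the LeadIQ data.
set.seed(1)
near <- c(82, 90, 95, 88, 101, 79, 93, 86)
far  <- c(96, 104, 89, 99, 110, 92, 97, 105, 91, 100)

# Point estimate: difference in the two sample means
point_est <- mean(near) - mean(far)

# Resample each group with replacement (group sizes stay fixed)
# and record the difference in resampled means each time
n_boot <- 5000
boot_diffs <- replicate(n_boot, {
  mean(sample(near, replace = TRUE)) - mean(sample(far, replace = TRUE))
})

# 90% percentile interval: the 5th and 95th percentiles
# of the bootstrap differences
ci_90 <- quantile(boot_diffs, probs = c(0.05, 0.95))

point_est
ci_90
```

The interval endpoints are simply the values that cut off the lowest 5% and highest 5% of the bootstrap differences, which is why a 90% interval uses `probs = c(0.05, 0.95)`.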