Sta-209-04 (Spring 24) Homework #5

From the Introduction to Modern Statistics (IMS) textbook, complete the following exercises:

Ch 2.5 Exercises: #2, #8, #16
Ch 12.5 Exercises: #1, #2, #6, #8

Also complete the additional question given below:

Question #1: This question uses data from a study conducted by the CDC involving children residing in El Paso, Texas, who lived either “Near” (within 1 mile) or “Far” (farther than 1 mile) from a large lead smelter. The study’s explanatory variable is the distance from the smelter (either “Near” or “Far”), and the response variable is the child’s age-adjusted IQ score.

lead_iq = read.csv("https://remiller1450.github.io/data/LeadIQ.csv")

Part A: Create an appropriate data visualization and state whether these variables appear to be associated.
Part B: The researchers in this study are interested in drawing a conclusion, for all children in the United States, about the difference in mean age-adjusted IQ scores between children exposed to pollution from lead smelting and children who are not exposed. We’ll denote this difference in means as \(\mu_{near} - \mu_{far}\). What is the point estimate of this unknown population parameter from the sample data in this study?
Part C: Use StatKey to create a 90% confidence interval estimate of the population parameter from Part B using the “percentile bootstrap approach”.
Part D: Briefly interpret the confidence interval from Part C. Then, based on the confidence interval, can we confidently conclude that living near a lead smelter results in lower mean IQ scores in the population represented by these data?
Part E: Considering the target population described in Part B, does your confidence interval from Parts C and D adequately account for possible sampling bias? Briefly explain.
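For reference, the quantities in Parts B and C can be written out as follows. The notation is assumed here rather than taken from the assignment: \(\bar{x}_{near}\) and \(\bar{x}_{far}\) are the sample mean age-adjusted IQ scores in each group, and \(B\) is the number of bootstrap samples generated. The point estimate of \(\mu_{near} - \mu_{far}\) is the corresponding difference in sample means:

\[
\hat{\theta} = \bar{x}_{near} - \bar{x}_{far}
\]

In the percentile bootstrap approach, each group is resampled with replacement (keeping its original sample size), the statistic \(\hat{\theta}^*_b = \bar{x}^*_{near,b} - \bar{x}^*_{far,b}\) is recorded for \(b = 1, \dots, B\), and a 90% interval is taken from the middle 90% of those bootstrap statistics:

\[
\left[\, \hat{\theta}^*_{(0.05)},\ \hat{\theta}^*_{(0.95)} \,\right]
\]

where \(\hat{\theta}^*_{(p)}\) denotes the \(p\)-th quantile of the \(B\) bootstrap statistics.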