Sta-209 (Spring 2025) Homework #6

Directions:

Submit your assignment via P-web.
Submit only a compiled R Markdown document (pdf, word, or html output are all okay, but you may need to “zip” an html file)
- If you want to compile to a pdf you can install the tinytex package by running install.packages('tinytex') followed by tinytex::install_tinytex()
Only submit your .Rmd file if you are unable to compile it due to errors (in the future you will be penalized for this)

Question #1

The American Community Survey (ACS) is a component of the US Census that is administered to a random sample of US addresses on a rolling basis. When the mailed version is combined with in-person visits and telephone calls, the survey has a 95% response rate. The data linked below are a random sample of employed individuals drawn from a recent ACS (2020 Census):

acs = read.csv("https://remiller1450.github.io/data/EmployedACS.csv")

The ACS data linked above includes the following variables:

Sex - “1” for males and “0” for females
Age - age in years
Married - “1” for married individuals and “0” for unmarried individuals
Income - annual income (thousands of dollars)
HoursWk - average hours worked per week
Race - self-described race
USCitizen - citizenship status, “1” for US citizens and “0” for non-citizens
HealthInsurance - “1” if the individual has health insurance, “0” otherwise
Language - “1” if the individual’s first/native language is English, “0” otherwise
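
Because several of these variables are stored as 0/1 codes, it can help to relabel them before summarizing. The sketch below uses a small made-up data frame (`toy`, which is hypothetical and not the ACS data) with the same coding; the category labels are assumptions based on the variable descriptions above.

```r
# Hypothetical rows that mimic the 0/1 coding described above (not real ACS records)
toy <- data.frame(
  Sex     = c(1, 0, 0, 1, 0),
  Married = c(1, 1, 0, 0, 0),
  Income  = c(52, 38, 41, 67, 29)   # thousands of dollars
)

# Relabel the 0/1 codes so tables and plots are easier to read
toy$Sex     <- factor(toy$Sex, levels = c(0, 1), labels = c("Female", "Male"))
toy$Married <- factor(toy$Married, levels = c(0, 1), labels = c("Unmarried", "Married"))

# Counts by category and a quick look at the structure
sex_counts <- table(toy$Sex)
str(toy)
```

The same `factor()` calls could be applied to the corresponding columns of `acs` once it has been loaded.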

Each of the following parts of this question will describe a scenario that you will address using a confidence interval estimate. To receive full credit, you must use a method that produces valid confidence intervals (e.g., exact binomial for a single proportion when the sample size is small) for the population parameter of interest. Not all of these scenarios will ask for the same confidence level, so be mindful of what you’re asked to find.
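
As a reminder of the mechanics, base R's test functions accept a `conf.level` argument and return the interval in the `conf.int` element of their result. The sketch below runs on simulated toy data (the data frame `toy` and its columns are hypothetical, not the ACS sample), so it only illustrates how to request different confidence levels; choosing a valid method for each part is still up to you.

```r
# Simulated toy data (hypothetical, NOT the ACS sample)
set.seed(209)
n <- 200
toy <- data.frame(
  group   = rep(c(0, 1), each = n / 2),        # 0/1 grouping variable
  outcome = rbinom(n, size = 1, prob = 0.7),   # 0/1 categorical outcome
  x       = rnorm(n, mean = 40, sd = 10)       # quantitative variable
)
toy$y <- 2 * toy$x + rnorm(n, sd = 15)         # second quantitative variable

# Single proportion: exact binomial interval at the 99% level
ci_prop <- binom.test(sum(toy$outcome), n, conf.level = 0.99)$conf.int

# Single mean: t-based interval at the 95% level
ci_mean <- t.test(toy$y, conf.level = 0.95)$conf.int

# Difference in means between two groups: Welch t-based interval at the 90% level
ci_diff_means <- t.test(y ~ group, data = toy, conf.level = 0.90)$conf.int

# Difference in proportions: large-sample (Normal approximation) interval at the 99% level
successes <- c(sum(toy$outcome[toy$group == 0]), sum(toy$outcome[toy$group == 1]))
ci_diff_props <- prop.test(successes, n = c(n / 2, n / 2), conf.level = 0.99)$conf.int

# Pearson correlation: interval based on Fisher's z-transformation at the 98% level
ci_cor <- cor.test(toy$x, toy$y, conf.level = 0.98)$conf.int
```

Each result is a length-2 numeric vector (lower endpoint, upper endpoint) carrying a `conf.level` attribute, so printing it shows which level was used.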

Part A: Using the ACS data, find a 99% confidence interval estimate for the proportion of all employed Americans who speak English as their native/primary language.
Part B: Find a 95% confidence interval estimate for the average personal income of employed US adults.
Part C: Find a 99% confidence interval estimate for the difference in proportions of married and unmarried individuals who have health insurance.
Part D: Based upon the interval estimate you found in Part C, can you confidently conclude that married US adults are more likely to have health insurance than unmarried US adults? Briefly explain.
Part E: Find a 98% confidence interval estimate for the correlation between hours worked and income.
Part F: Based upon the interval estimate you found in Part E, can you confidently conclude that, among US adults, working more hours is associated with earning higher incomes? Briefly explain.
Part G: Find a 90% confidence interval estimate for the difference in mean incomes of male and female US adults.
Part H: Based upon the interval from Part G, can you confidently conclude that in the United States adult males tend to earn higher incomes, on average, than adult females? Briefly explain.


Question #2

The scenarios below each describe a potential change. For each scenario you are to indicate whether that change is expected to increase or decrease the width of a confidence interval estimate. You should also include a 1-sentence justification explaining your belief.

Part A: Deciding to take a random sample of size \(n=100\) rather than one of size \(n=75\).
Part B: Taking a random sample from a population that is more homogeneous (lower standard deviation among cases) compared to sampling from a population that is less homogeneous (higher standard deviation among cases).
Part C: Increasing the confidence level from 90% to 95%.
Part D: Using the t-distribution as a probability model rather than the Normal distribution.


Question #3

The scenarios below each involve the interpretation of a confidence interval estimate. For each scenario, you are to indicate whether the interpretation is correct or incorrect. You should include a 1-sentence justification explaining your belief.

Part A: A 95% confidence interval estimate for the mean cholesterol level of US adults of (202.4, 225.6) suggests that 95% of the US adult population has a cholesterol level between 202.4 and 225.6.
Part B: You and your friend each take a random sample and you each calculate a 95% confidence interval estimate for the proportion of student athletes at Grinnell who have taken a statistics course using a statistically valid procedure. Your friend’s sample contained \(n=50\) student athletes, while yours only contained \(n=30\) student athletes. This leads your friend to interpret that their confidence interval is more likely to contain the true proportion than yours.
Part C: As part of a statistics research project, many random samples are drawn from a population where the quantitative variables \(X\) and \(Y\) are independent. If we use a statistically valid procedure to calculate a 95% confidence interval estimate for the correlation coefficient relating \(X\) and \(Y\) for each of these random samples, roughly 5% of the random samples will produce an interval that doesn’t contain 0 within the interval’s endpoints.
Part D: A representative sample of \(n=1000\) US adults is polled to gauge support for a proposed economic policy change. It was found that 57% of the sampled individuals supported the change, with a 90% confidence interval estimate of (0.53, 0.61). This interval tells us that 90% of future random samples of US adults will show between 53% and 61% of the population supporting the proposed change.