Sta-209 (Fall 2025) Homework #7

Directions:

- Submit your work via the “Assignments” tab on Canvas.
- For this assignment, you should record your answers and code using R Markdown.
    - Please upload HTML, Word, or PDF output created using R Markdown, and make sure it contains your code, output, and written answers. You should not include extraneous output, such as printing an entire data frame.
    - At this point in the course, you are responsible for knowing how to properly knit an R Markdown document, so uploading any other file format will result in a point deduction on the assignment.
- Homework is an individual assignment. It’s okay to check your work or collaborate with your classmates, student mentors, and others, but it is not okay to pass off their work as your own.
    - Please clearly acknowledge any help you get from anyone other than yourself, or from resources other than the materials on our course website (such as external websites and AI).

Question #1

Previous assignments have introduced the American Community Survey (ACS) data set, which is a random sample of US adults that is collected as part of the US Census. The data provided below are from the most recent ACS (part of the 2020 Census):

```r
acs = read.csv("https://remiller1450.github.io/data/EmployedACS.csv")
```

The ACS data linked above includes the following variables:

- Sex - '1' for males and '0' for females
- Age - age in years
- Married - '1' for married individuals and '0' for unmarried individuals
- Income - annual income (in thousands of dollars)
- HoursWk - average hours worked per week
- Race - self-described race
- USCitizen - citizenship status, '1' for US citizens and '0' for non-citizens
- HealthInsurance - '1' if the individual has health insurance, '0' otherwise
- Language - '1' if the individual’s first/native language is English, '0' otherwise

$~$

Part A: Consider using the annual incomes reported by ACS respondents to estimate the average income of all US adults. Show how the margin of error of a 95% confidence interval is calculated in this scenario. You should obtain the calibration value, $c$, from StatKey, and you should show the calculation of the standard error as an intermediate step.
Part B: Use the margin of error you found in Part A to produce a 95% confidence interval estimate for the average annual income of all US adults. Report the endpoints of the interval.
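A minimal sketch of the calculation in Parts A and B, run on simulated incomes rather than the ACS data (the vector `income`, its size, and the seed are hypothetical). It assumes the calibration value comes from a $t$-distribution with $n-1$ degrees of freedom, which is what `qt()` returns; the value you read from StatKey may be rounded slightly differently.

```r
# Hypothetical example: simulated incomes (in thousands of dollars), NOT the ACS data
set.seed(209)
income <- rexp(200, rate = 1/45)    # right-skewed population with mean 45

n    <- length(income)              # sample size
xbar <- mean(income)                # sample mean (point estimate)
s    <- sd(income)                  # sample standard deviation

se    <- s / sqrt(n)                # standard error of the sample mean
c_val <- qt(0.975, df = n - 1)      # 95% calibration value from a t-distribution with n - 1 df
me    <- c_val * se                 # margin of error

ci <- c(lower = xbar - me, upper = xbar + me)   # point estimate +/- margin of error
ci
```

As a cross-check, `t.test(income)$conf.int` should reproduce the same endpoints, since it uses the same $t$-based formula.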
Part C: The average annual income in the ACS sample is $44,519 (a mean of 44.519 in the Income variable, which is recorded in thousands of dollars). According to the confidence interval you found in Part B, is it plausible that the average annual income in the population represented by these data is $46,000? Briefly explain.
Part D: Use an appropriate R function to find a 90% confidence interval estimate for the difference in mean incomes of male and female US adults.
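`t.test()` is one R function that fits Part D. The sketch below runs it on simulated data (the data frame `sim` and its values are hypothetical, not the ACS). With a formula such as `Income ~ Sex`, the reported interval is for the first group's mean minus the second group's; because `Sex` is coded 0/1, the first group is the one coded 0. By default, `t.test()` uses the Welch (unequal-variance) procedure, and `conf.level` sets the confidence level.

```r
# Hypothetical data: two simulated groups, NOT the ACS data
set.seed(209)
sim <- data.frame(
  Sex    = rep(c(0, 1), each = 150),         # coded like the ACS: 1 = male, 0 = female
  Income = c(rexp(150, rate = 1/40),         # group coded 0, population mean 40
             rexp(150, rate = 1/48))         # group coded 1, population mean 48
)

# Two-sample t-interval for the difference in means at the 90% level
out <- t.test(Income ~ Sex, data = sim, conf.level = 0.90)
out$conf.int   # interval for mean(group 0) - mean(group 1)
```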
Part E: Based upon the interval you found in Part D, is it plausible that males and females earn the same income on average? Briefly explain.

$~$

Question #2

For each scenario described below, explain whether the change that is described will increase or decrease the width of the confidence interval estimate. Unless it is explicitly mentioned as changing, you are to assume that everything else involved in the calculation of the interval remains unchanged.

Part A: A random sample of size $n=100$ is taken instead of a random sample of size $n=80$.
Part B: A quantitative variable is sampled from a population that is more homogeneous (less variability among its members) rather than from a population that is less homogeneous.
Part C: The confidence level is increased from 90% to 95%.
Part D: The interval uses the $t$-distribution as its underlying probability model rather than the Normal distribution.
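One way to check your reasoning numerically, sketched for the special case of a one-sample interval for a mean, whose width is $2 \times c \times s/\sqrt{n}$. The helper `ci_width()` is hypothetical (not a course-provided function), and the baseline values $n=80$ and $s=10$ are arbitrary choices; compare each printed width to the baseline.

```r
# Hypothetical helper: width of a one-sample interval for a mean, 2 * c * s / sqrt(n)
ci_width <- function(n, s, conf = 0.95, dist = c("t", "normal")) {
  dist  <- match.arg(dist)
  p     <- 1 - (1 - conf) / 2                            # upper-tail quantile level
  c_val <- if (dist == "t") qt(p, df = n - 1) else qnorm(p)
  2 * c_val * s / sqrt(n)
}

ci_width(n = 80,  s = 10)                    # baseline: n = 80, s = 10, 95%, t model
ci_width(n = 100, s = 10)                    # Part A: larger sample size
ci_width(n = 80,  s = 5)                     # Part B: less variable population
ci_width(n = 80,  s = 10, conf = 0.90)       # Part C: 90% level (baseline is 95%)
ci_width(n = 80,  s = 10, dist = "normal")   # Part D: Normal model instead of t
```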

$~$

Question #3

Below are various interpretations involving confidence interval estimates. You are to determine whether each interpretation is correct or incorrect. For any incorrect interpretations you must provide a one-sentence explanation of what is incorrect.

Part A: Suppose a scientifically rigorous study reports a 95% confidence interval estimate for the mean cholesterol level of US adults as (202.4, 225.6). This interval suggests 95% of the US adult population has a cholesterol level between 202.4 and 225.6.
Part B: A representative sample of $n=1000$ US adults is polled to gauge support for a proposed policy change. It was found that 57% of the sampled individuals supported the change, with a 90% confidence interval estimate of (0.53, 0.61). This interval tells us that 90% of future random samples of US adults will show between 53% and 61% of the population supporting the proposed change.
Part C: You and your friend each take a random sample and, using a statistically valid procedure, each calculate a 95% confidence interval estimate for the proportion of student athletes at Grinnell who have taken a statistics course. Your friend’s sample contained $n=50$ student athletes, while yours contained only $n=30$ student athletes. This distinction leads your friend to conclude that their confidence interval is more likely to contain the true proportion than yours.
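A small simulation can make the meaning of a confidence level concrete: it describes how often the interval-building procedure captures the true parameter across many random samples. The settings below (a Normal population with mean 50 and standard deviation 10, samples of size 30, 2000 repetitions) are hypothetical choices for illustration only.

```r
# Repeatedly sample from a known population and record whether each
# 95% t-interval for the mean captures the true population mean.
set.seed(209)
true_mean <- 50        # hypothetical population mean
reps      <- 2000      # number of simulated samples
n         <- 30        # size of each sample

covered <- replicate(reps, {
  x  <- rnorm(n, mean = true_mean, sd = 10)        # one random sample
  ci <- t.test(x, conf.level = 0.95)$conf.int      # its 95% interval
  ci[1] <= true_mean && true_mean <= ci[2]         # did it capture the true mean?
})

mean(covered)   # long-run capture rate; should be close to 0.95
```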