Directions:
- Submit your work via the “Assignments” tab on Canvas.
- For this assignment, you should record your answers and code using
R Markdown.
- Please upload HTML, Word, or PDF output created using R Markdown and
make sure it contains your code, output, and written answers. You should
not include extraneous output, such as printing an entire data
frame.
- At this point in the course, you are responsible for knowing how to
properly knit an R Markdown document, so uploading any other file format
will result in a point deduction on the assignment.
- Homework is an individual assignment. It’s okay to check
your work or collaborate with your classmates, student mentors, and
others, but it is not okay to pass off their work as your own.
- Please clearly acknowledge any help you receive from other individuals,
or from resources other than the materials on our course website (such as
external websites and AI).
Question #1
Previous assignments have introduced the American Community Survey
(ACS) data set, which is a random sample of US adults that is collected
as part of the US Census. The data provided below are from the most
recent ACS (part of the 2020 Census):
acs = read.csv("https://remiller1450.github.io/data/EmployedACS.csv")
The ACS data linked above includes the following variables:
- Sex - '1' for males and '0' for females
- Age - age in years
- Married - '1' for married individuals and '0' for unmarried individuals
- Income - annual income (in thousands of dollars)
- HoursWk - average hours worked per week
- Race - self-described race
- USCitizen - citizenship status, '1' for US citizens and '0' for non-citizens
- HealthInsurance - '1' if the individual has health insurance, '0' otherwise
- Language - '1' if the individual’s first/native language is English, '0' otherwise
- Part A: Consider using the annual incomes reported
by ACS respondents to estimate the average income of all US adults. Show
how the margin of error of a 95% confidence interval is calculated in
this scenario. You should get the calibration value, \(c\), from StatKey, and you
should show the calculation of the standard error as an intermediate step.
- Part B: Use the margin of error you found in Part A
to produce a 95% confidence interval estimate for the average annual
income of all US adults. Report the endpoints of the interval.
- Part C: The average annual income in the ACS sample
is $44,519. According to the confidence interval you found in Part B, is
it plausible that the average annual income in the population
represented by these data is $46,000? Briefly explain.
- Part D: Use an appropriate R function to find a 90% confidence interval
estimate for the difference in mean incomes of male and female US adults.
- Part E: Based on the interval you found in Part D, is it plausible that
males and females earn the same income on average? Briefly explain.
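The standard-error and margin-of-error steps that Parts A and B ask for follow the usual pattern \(SE = s/\sqrt{n}\), margin of error \(= c \times SE\), and interval \(= \bar{x} \pm\) margin of error. The R sketch below shows that arithmetic under stated assumptions: it uses a small hypothetical vector of incomes rather than the ACS data, and `c_val` is a placeholder standing in for the calibration value you would read from StatKey.

```r
# Minimal sketch of the margin-of-error arithmetic for a one-sample mean.
# The incomes below are HYPOTHETICAL (thousands of dollars), not the ACS data.
x <- c(30, 45, 52, 38, 61, 27, 49, 55)

n <- length(x)       # sample size
x_bar <- mean(x)     # sample mean (the point estimate)
s <- sd(x)           # sample standard deviation

# Standard error of the sample mean: s / sqrt(n)
se <- s / sqrt(n)

# Placeholder calibration value (an assumption); in the assignment, c comes from StatKey
c_val <- 2.36

# Margin of error = c * SE; the interval is the estimate +/- the margin of error
moe <- c_val * se
ci <- c(lower = x_bar - moe, upper = x_bar + moe)
ci
```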
Question #2
For each scenario described below, explain whether the described change
will increase or decrease the width of the confidence interval estimate.
Unless something is explicitly described as changing, assume that everything
else involved in calculating the interval remains the same.
- Part A: A random sample of size \(n=100\) is taken instead of a random sample
of size \(n=80\).
- Part B: A quantitative variable is sampled from a
population that is more homogeneous (less variability among its members)
rather than from a population that is less homogeneous.
- Part C: The confidence level is increased from 90%
to 95%.
- Part D: The interval uses the \(t\)-distribution as its underlying
probability model rather than the Normal distribution.
Question #3
Below are various interpretations involving confidence interval
estimates. Determine whether each interpretation is correct or incorrect.
For any incorrect interpretation, provide a one-sentence explanation of
what is incorrect.
- Part A: Suppose a scientifically rigorous study
reports a 95% confidence interval estimate for the mean cholesterol
level of US adults as (202.4, 225.6). This interval suggests 95% of the
US adult population has a cholesterol level between 202.4 and
225.6.
- Part B: A representative sample of \(n=1000\) US adults is polled to gauge
support for a proposed policy change. It was found that 57% of the
sampled individuals supported the change, with a 90% confidence interval
estimate of (0.53, 0.61). This interval tells us that 90% of future
random samples of US adults will show between 53% and 61% of the
population supporting the proposed change.
- Part C: You and your friend each take a random
sample and you each calculate a 95% confidence interval estimate for the
proportion of student athletes at Grinnell who have taken a statistics
course using a statistically valid procedure. Your friend’s sample
contained \(n=50\) student athletes,
while yours only contained \(n=30\)
student athletes. This distinction leads your friend to conclude that
their confidence interval is more likely to contain the true proportion
than yours.