Directions:
- Submit your assignment via P-web.
- Submit only a compiled R Markdown document (PDF, Word, or HTML
output are all okay, but you may need to “zip” an HTML file).
- If you want to compile to a PDF, you can install the tinytex
package by running
install.packages('tinytex')
followed by
tinytex::install_tinytex()
- Only submit your .Rmd file if you are unable to compile it due to
errors (in the future you will be penalized for this)
Question #1
The American Community Survey (ACS) is a component of the US Census
that is administered to a random sample of US addresses on a rolling
basis. When the mailed version is combined with in-person visits and
telephone calls, the survey has a 95% response rate. The data linked
below are a random sample of employed individuals drawn from a recent
ACS (2020 Census):
acs = read.csv("https://remiller1450.github.io/data/EmployedACS.csv")
The ACS data linked above includes the following variables:
- Sex - “1” for males and “0” for females
- Age - age in years
- Married - “1” for married individuals and “0” for unmarried
individuals
- Income - annual income (thousands of dollars)
- HoursWk - average hours worked per week
- Race - self-described race
- USCitizen - citizenship status, “1” for US citizens and “0” for
non-citizens
- HealthInsurance - “1” if the individual has health insurance, “0”
otherwise
- Language - “1” if the individual’s first/native language is English,
“0” otherwise
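Before computing any intervals, it can help to confirm how the 0/1 indicator variables are coded. The sketch below uses a small made-up data frame (toy_acs, with hypothetical values) that borrows a few of the column names listed above instead of downloading the real file; the same calls should work on acs, assuming its columns match this description.

```r
# Hypothetical stand-in for a few ACS columns (all values are made up)
toy_acs <- data.frame(
  Sex             = c(1, 0, 0, 1, 1, 0),
  Income          = c(42, 18.5, 60, 33, 75, 27),
  HealthInsurance = c(1, 1, 0, 1, 1, 0)
)

# Inspect column types and the first few rows
str(toy_acs)
head(toy_acs)

# Counts of each level of a 0/1 indicator
insurance_counts <- table(toy_acs$HealthInsurance)

# The mean of a 0/1 indicator is the sample proportion of 1s
insured_prop <- mean(toy_acs$HealthInsurance)
```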
Each of the following parts of this question will describe a scenario
that you will address using a confidence interval estimate. To receive
full credit, you must use a method that produces valid confidence
intervals (e.g., an exact binomial interval for a single proportion when the
sample size is small) for the population parameter of interest. Not all of
these scenarios will ask for the same confidence level, so be mindful of
what you’re asked to find.
- Part A: Using the ACS data, find a 99% confidence interval
estimate for the proportion of all employed Americans who speak English
as their native/primary language.
- Part B: Find a 95% confidence interval estimate for
the average personal income of employed US adults.
- Part C: Find a 99% confidence interval estimate for
the difference in proportions of married and unmarried individuals who
have health insurance.
- Part D: Based upon the interval estimate you found
in Part C, can you confidently conclude that married US adults are more
likely to have health insurance than unmarried US adults? Briefly
explain.
- Part E: Find a 98% confidence interval estimate for
the correlation between hours worked and income.
- Part F: Based upon the interval estimate you found
in Part E, can you confidently conclude that, among US adults, working
more hours is associated with earning higher incomes? Briefly
explain.
- Part G: Find a 90% confidence interval estimate for
the difference in mean incomes of male and female US adults.
- Part H: Based upon the interval from Part G, can
you confidently conclude that, in the United States, adult males tend to
earn higher incomes, on average, than adult females? Briefly
explain.
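The parts above ask for several kinds of intervals at different confidence levels. As a rough sketch, base R's stats functions binom.test, t.test, prop.test, and cor.test each accept a conf.level argument. The example below runs them on simulated toy data (the data frame toy and its columns are hypothetical, not the ACS sample), so it illustrates the calls without being a solution; whether each method is valid for a given part still needs to be checked.

```r
# Simulated toy data (hypothetical; not the ACS sample)
set.seed(1)
toy <- data.frame(
  group = rep(c(0, 1), each = 40),
  y     = rnorm(80, mean = 50, sd = 10),
  x     = rnorm(80, mean = 40, sd = 5),
  flag  = rbinom(80, size = 1, prob = 0.8)
)

# Single proportion: exact binomial interval at 99% confidence
ci_prop <- binom.test(sum(toy$flag), nrow(toy), conf.level = 0.99)$conf.int

# Single mean: t-based interval at 95% confidence
ci_mean <- t.test(toy$y, conf.level = 0.95)$conf.int

# Difference in proportions between two groups at 99% confidence
successes <- as.vector(tapply(toy$flag, toy$group, sum))
trials    <- as.vector(tapply(toy$flag, toy$group, length))
ci_diff_prop <- prop.test(successes, trials, conf.level = 0.99)$conf.int

# Pearson correlation at 98% confidence
ci_cor <- cor.test(toy$x, toy$y, conf.level = 0.98)$conf.int

# Difference in means between two groups (Welch t-interval) at 90% confidence
ci_diff_mean <- t.test(y ~ group, data = toy, conf.level = 0.90)$conf.int
```

Each result is a length-two numeric vector (lower and upper endpoint) carrying a conf.level attribute, which is a quick way to confirm that the requested level was actually used.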
Question #2
The scenarios below each describe a potential change. For each
scenario you are to indicate whether that change is expected to increase
or decrease the width of a confidence interval estimate. You should also
include a 1-sentence justification explaining your belief.
- Part A: Deciding to take a random sample of size
\(n=100\) rather than one of size \(n=75\).
- Part B: Taking a random sample from a population
that is more homogeneous (lower standard deviation among cases) compared
to sampling from a population that is less homogeneous (higher standard
deviation among cases).
- Part C: Increasing the confidence level from 90% to
95%.
- Part D: Using the t-distribution as a probability
model rather than the Normal distribution.
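As a reference point for the scenarios above, consider one common case: an interval for a single mean (an assumption made here for illustration; other interval types have analogous forms). With sample mean \(\bar{x}\), sample standard deviation \(s\), sample size \(n\), and a critical value \(c\) determined by the confidence level and the chosen probability model, the interval and its width can be written as:

```latex
\[
\bar{x} \pm c \cdot \frac{s}{\sqrt{n}},
\qquad
\text{width} = 2c \cdot \frac{s}{\sqrt{n}}
\]
```

Each scenario changes one of \(n\), \(s\), or \(c\) while the other two are held fixed.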
Question #3
The scenarios below each involve the interpretation of a confidence
interval estimate. For each scenario, you are to indicate whether the
interpretation is correct or incorrect. You should include a 1-sentence
justification explaining your belief.
- Part A: A 95% confidence interval estimate for the
mean cholesterol level of US adults of (202.4, 225.6) suggests that 95%
of the US adult population has a cholesterol level between 202.4 and
225.6.
- Part B: You and your friend each take a random
sample and you each calculate a 95% confidence interval estimate for the
proportion of student athletes at Grinnell who have taken a statistics
course using a statistically valid procedure. Your friend’s sample
contained \(n=50\) student athletes,
while yours only contained \(n=30\)
student athletes. This leads your friend to conclude that their
confidence interval is more likely to contain the true proportion
than yours.
- Part C: As part of a statistics research project,
many random samples are drawn from a population where the quantitative
variables \(X\) and \(Y\) are independent. If we use a
statistically valid procedure to calculate a 95% confidence interval estimate for
the correlation coefficient relating \(X\) and \(Y\) for each of these random samples,
roughly 5% of the random samples will produce an interval that does not
contain 0.
- Part D: A representative sample of \(n=1000\) US adults is polled to gauge
support for a proposed economic policy change. It was found that 57% of
the sampled individuals supported the change, with a 90% confidence
interval estimate of (0.53, 0.61). This interval tells us that 90% of
future random samples of US adults will show between 53% and 61% of the
population supporting the proposed change.
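One way to build intuition for what a confidence level describes is a small simulation. The sketch below assumes a Normal population with a known mean (true_mean, a made-up value), draws many random samples, and records how often a 95% t-interval captures that mean. All names and settings here are illustrative assumptions, not part of the assignment.

```r
# Simulated coverage check (all settings are hypothetical)
set.seed(42)
true_mean <- 10   # known population mean in this simulation
n_reps    <- 2000 # number of repeated random samples

covers <- replicate(n_reps, {
  s  <- rnorm(30, mean = true_mean, sd = 2)    # one random sample of size 30
  ci <- t.test(s, conf.level = 0.95)$conf.int  # 95% t-interval for the mean
  ci[1] <= true_mean && true_mean <= ci[2]     # TRUE if the interval captures the mean
})

# Proportion of the simulated intervals that captured the true mean
coverage <- mean(covers)
coverage
```

The quantity being tracked is a property of the interval procedure over repeated samples, with the population mean held fixed throughout.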