Directions:

  • My main expectation is that you thoughtfully work through labs collaboratively with your group, discussing the embedded questions and recording your responses in a shared document.
    • At times you might be asked to add screenshots to your write-up. If you are on a Windows PC, an easy way to do this is the “snipping tool”, which you can find using the search bar along the bottom of your screen. If you are on a Mac, you can find instructions on how to take a screenshot at this link.
  • Everyone should upload their own copy of the lab write-up to Canvas.
  • Only a couple of questions on each lab will be graded for accuracy, so your focus should be on learning the material rather than “getting the right answers” as quickly as possible.


Introduction

The purpose of this lab is to practice applying concepts and procedures related to confidence interval estimation.


Concepts

A \(P\%\) confidence interval is an interval estimate found using a procedure with a \(P\%\) long-run success rate. This means that an individual sample might lead to a confidence interval estimate that will “miss” the population characteristic, but if the same procedure were applied to many different random samples it would succeed in capturing the true population parameter \(P\%\) of the time.
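The long-run interpretation above can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal with a made-up mean and standard deviation, the function name `coverage_rate` is hypothetical, and the critical value comes from the normal distribution rather than the \(t\)-distribution, so the observed success rate may fall slightly below the nominal level.

```python
import random
import statistics
from statistics import NormalDist


def coverage_rate(level, n=40, reps=2000, mu=10.0, sigma=3.0, seed=1):
    """Fraction of simulated intervals that capture the true mean mu.

    mu and sigma describe a hypothetical population; they are not from the lab.
    """
    rng = random.Random(seed)
    # Two-sided critical value c (normal approximation, an assumption here).
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)                  # point estimate
        se = statistics.stdev(sample) / n ** 0.5        # standard error
        if xbar - c * se <= mu <= xbar + c * se:        # did the interval "succeed"?
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Share of 2000 simulated 80% intervals that contain the true mean.
    print(coverage_rate(0.80))
```

Any single simulated interval either captures the true mean or misses it; the confidence level describes only the share of captures across many repeated samples.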

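\[\text{Point Estimate} \pm \text{Margin of Error} \\ = \text{Point Estimate} \pm c*SE\]

Here \(c\) is a critical value determined by the confidence level and \(SE\) is the standard error of the point estimate.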

The key to achieving the proper long-run success rate is the margin of error. If the margins of error are systematically too large, the intervals will capture the population parameter more often than the stated confidence level; if the margins of error are too small, the intervals will capture it less often than stated.
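To see how the size of the margin of error drives the success rate, the sketch below multiplies a nominal 95% margin by a scale factor and records how often the resulting intervals capture the true mean. It is a rough illustration, not a prescribed method: the standard normal population, the function name `coverage_with_scaled_margin`, and the scale factors are all assumptions chosen for the example, and the normal critical value stands in for the \(t\)-distribution.

```python
import random
import statistics
from statistics import NormalDist


def coverage_with_scaled_margin(scale, level=0.95, n=40, reps=2000, seed=2):
    """Capture rate when the nominal margin of error is multiplied by `scale`."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # hypothetical population values
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        margin = scale * c * statistics.stdev(sample) / n ** 0.5
        # The interval succeeds when the true mean is within the margin.
        if abs(xbar - mu) <= margin:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for scale in (0.5, 1.0, 1.5):
        print(scale, coverage_with_scaled_margin(scale))
```

Because the same seed is reused for each scale factor, the three runs see identical samples, so differences in the capture rate come only from the width of the margin.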

Question #1: Based upon the definition given above, if you were given 200 different (but properly constructed) 90% confidence interval estimates, how many of them would you expect to actually contain the population characteristic of interest?

Question #2: Using this StatKey menu, consider a sample of \(n = 30\) flips of a fair coin. Hit “generate 100 samples” to display the results of 100 different repeats of \(n = 30\) coin flips, then click the “Confidence Intervals” tab in the right panel to display the 95% confidence interval calculated from each of these samples. Using this output, answer the following:

  1. What does each horizontal line in the panel represent?
  2. Why are some lines colored red while others are colored green?
  3. What percentage of lines are colored green? How is this related to the confidence level used?


Application - the American Community Survey

The American Community Survey (ACS) is a component of the US Census that is administered to a random sample of US addresses on a rolling basis. When the mailed version is combined with in-person visits and telephone calls, the survey has a 95% response rate. The data linked below are a random sample of employed individuals drawn from a recent ACS:

The ACS data linked above includes the following variables:

  • Sex - “1” for males and “0” for females
  • Age - age in years
  • Married - “1” for married individuals and “0” for unmarried individuals
  • Income - annual income (thousands of dollars)
  • HoursWk - average hours worked per week
  • Race - self-described race
  • USCitizen - citizenship status, “1” for US citizens and “0” for non-citizens
  • HealthInsurance - “1” if the individual has health insurance, “0” otherwise
  • Language - “1” if the individual’s first/native language is English, “0” otherwise


Directions: On each of the following questions you should: 1) be mindful of the confidence level, 2) find the point estimate using StatKey, 3) show the work involved in calculating the interval (i.e., starting from point estimate \(\pm\) margin of error and ending with endpoints). Be careful to use the \(t\)-distribution for quantitative outcomes (i.e., a mean or difference in means).
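As an illustration of the kind of work requested, here is a worked example using made-up summary statistics (not the ACS data). Suppose a sample of \(n = 25\) has mean \(\bar{x} = 50\) and standard deviation \(s = 10\), and assume the standard error of the mean is computed as \(s/\sqrt{n}\). For a 95% interval, the \(t\)-distribution with \(n - 1 = 24\) degrees of freedom gives \(c \approx 2.064\), so:

\[\bar{x} \pm c*SE = 50 \pm 2.064*\frac{10}{\sqrt{25}} = 50 \pm 4.128 = (45.872,\ 54.128)\]

The same layout (point estimate, then margin of error, then endpoints) applies to the other parameters; only the point estimate, the standard error, and the source of \(c\) change.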

Question #3: Using the ACS data, provide a 99% confidence interval estimate for the proportion of all Americans who speak English as their primary language.

Question #4: Using the ACS data, find a 95% confidence interval estimate for the average personal income of employed US adults.

Question #5: Using the ACS data, find a 95% confidence interval estimate for the difference in proportions of married and unmarried individuals who have health insurance. Based upon this interval, can you confidently conclude that married individuals are more likely to have health insurance?

Question #6: Using the ACS data, find a 98% confidence interval estimate for the correlation between hours worked and income. Based upon this interval, can you confidently conclude that people who work more hours tend to earn higher incomes?

Question #7: Using the ACS data, find a 90% confidence interval estimate for the difference in mean incomes of males and females. Based upon this interval, can you confidently conclude that males tend to earn higher incomes?