Directions:

Question #1

In a previous assignment you worked with the American Community Survey (ACS) data, which are a component of the US Census administered to a random sample of US addresses on a rolling basis. When the mailed version is combined with in-person visits and telephone calls, the survey has a 95% response rate. The data linked below are a random sample of employed individuals drawn from a recent ACS (2020 Census):

acs = read.csv("https://remiller1450.github.io/data/EmployedACS.csv")

The ACS data linked above include the following variables:

In this question you will perform several hypothesis tests. For each hypothesis test (Parts A-C) you should include the following steps:

  1. State the null and alternative hypotheses using either words or statistical notation.
  2. Use either StatKey or an appropriate R function to find the \(p\)-value.
  3. Provide a one-sentence conclusion summarizing the results of your hypothesis test. Be sure to follow the guidelines from this week’s lecture slides.
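
The three steps above can be sketched in R as follows. This is only an illustration of the workflow on made-up counts, not an answer to any part of this question: the values `n_total` and `n_success` and the null value of 0.5 are hypothetical and are not taken from the ACS data. It assumes a test for a single proportion, for which `binom.test()` is one appropriate R function.

```r
# Hypothetical counts for illustration only (not taken from the ACS data)
n_total   <- 100   # sample size
n_success <- 60    # number of cases with the outcome of interest

# Step 1: state the hypotheses
#   H0: p = 0.5   versus   HA: p != 0.5

# Step 2: find the p-value with an exact binomial test for one proportion
result <- binom.test(x = n_success, n = n_total, p = 0.5,
                     alternative = "two.sided")

# Step 3: report the sample proportion and p-value to support a
# one-sentence conclusion
p_hat <- n_success / n_total
cat("Sample proportion:", p_hat, "\n")
cat("p-value:", signif(result$p.value, 3), "\n")
```

For other kinds of variables a different function would be needed (for example, `t.test()` for a single mean), so the choice of test should follow the type of variable involved.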


Question #2

Rosiglitazone is the active ingredient in the controversial type 2 diabetes medicine Avandia and has been linked to an increased risk of serious cardiovascular problems such as stroke, heart failure, and death. A common alternative treatment is Pioglitazone, the active ingredient in a diabetes medicine called Actos. In a nationwide retrospective observational study of 227,571 Medicare beneficiaries aged 65 years or older, it was found that 2,593 of the 67,593 patients using Rosiglitazone and 5,386 of the 159,978 using Pioglitazone had serious cardiovascular problems. These data are summarized in the contingency table below.

Treatment        No CV problems   CV problems   Total
Pioglitazone     154592           5386          159978
Rosiglitazone    65000            2593          67593
Total            219592           7979          227571
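
As a starting point, the table can be entered in R and the proportion of patients with cardiovascular problems computed for each treatment. This is a minimal sketch: the object names `cv` and `risk` are made up for illustration, and the code only reproduces the counts given above without carrying out any hypothesis test.

```r
# Enter the 2x2 table of counts from the study (rows are treatments)
cv <- matrix(c(154592, 5386,
                65000, 2593),
             nrow = 2, byrow = TRUE,
             dimnames = list(Treatment = c("Pioglitazone", "Rosiglitazone"),
                             Outcome   = c("No CV problems", "CV problems")))

# Add row and column totals to check against the table in the text
print(addmargins(cv))

# Proportion of patients with CV problems within each treatment group
risk <- cv[, "CV problems"] / rowSums(cv)
print(round(risk, 4))
```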


Question #3

In modern biomedical studies it is relatively common to record measurements for thousands of genetic features at once. Suppose a cancer researcher collects data on 2000 genes, and 20 of these genes are truly related to the cancer that the researcher is studying. For each of the 2000 genes, the researcher performs a hypothesis test and compares the \(p\)-value to a decision threshold of \(\alpha = 0.05\).
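
To see why this setup deserves care, it helps to work out how many false positives to expect. The calculation below is a sketch under one assumption not stated above: each test on a gene unrelated to the cancer rejects its null hypothesis with probability exactly \(\alpha\).

\[
\underbrace{(2000 - 20)}_{\text{genes with a true null}} \times \alpha = 1980 \times 0.05 = 99
\]

Under that assumption, about 99 unrelated genes would be expected to produce a \(p\)-value below 0.05 by chance alone, which is far more than the 20 genes that are truly related.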