Directions:
\(~\)
Introduction
The purpose of this lab is to practice applying concepts and procedures related to hypothesis testing.
\(~\)
A hypothesis test seeks to falsify a certain null hypothesis using sample data. The major steps are:
In many circumstances we can use the Central Limit Theorem (CLT) to find a Normal model for the null distribution:
Below is a review of the different \(SE\) formulas from the CLT:
Summary measure | Population parameter | Sample estimate | \(SE\) |
---|---|---|---|
single proportion | \(p\) | \(\hat{p}\) | \(\sqrt{\tfrac{p*(1-p)}{n}}\) |
difference of proportions | \(p_1-p_2\) | \(\hat{p}_1-\hat{p}_2\) | \(\sqrt{\tfrac{p_1*(1-p_1)}{n_1} + \tfrac{p_2*(1-p_2)}{n_2}}\) |
single mean | \(\mu\) | \(\bar{x}\) | \(\tfrac{\sigma}{\sqrt{n}}\) |
difference of means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\sqrt{\tfrac{\sigma_1^2}{n_1} + \tfrac{\sigma_2^2}{n_2}}\) |
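
The four \(SE\) formulas in the table can be sketched as small Python functions. This is only an illustration: the function names are made up for this example, and the numbers at the bottom (a null value of 0.5, \(n = 100\), and a sample proportion of 0.6) are hypothetical rather than taken from any data set in this lab. The last helper assumes a Normal null model is appropriate and that the test is two-sided.

```python
import math
from statistics import NormalDist


def se_single_proportion(p, n):
    """SE of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_diff_proportions(p1, n1, p2, n2):
    """SE of a difference of two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_single_mean(sigma, n):
    """SE of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """SE of a difference of two independent sample means."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


def two_sided_p_value(estimate, null_value, se):
    """Two-sided p-value from a Normal null model centered at null_value."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: H0 says p = 0.5, and a sample of n = 100 gives p-hat = 0.6.
se = se_single_proportion(0.5, 100)
p_value = two_sided_p_value(0.6, 0.5, se)
print(f"SE = {se:.3f}, p-value = {p_value:.4f}")
```

Note that for a test of a single proportion the \(SE\) is computed using the value of \(p\) stated in the null hypothesis, not the sample estimate \(\hat{p}\).
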
Other times we might need to rely on simulation to generate a null distribution, something we can do using StatKey.
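
For readers who want to see the idea behind a simulated null distribution in code, here is a minimal Python sketch. It is not StatKey's own implementation, only a rough analogue: it repeatedly draws samples of size \(n\) from a population where the null hypothesis is true and records the sample proportion each time. The null value (0.5), sample size (50), and observed count (32) are hypothetical, and the function names are invented for this example.

```python
import random


def simulate_null_proportions(p0, n, reps=5000, seed=1):
    """Simulate sample proportions from samples of size n when H0: p = p0 is true."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    props = []
    for _ in range(reps):
        successes = sum(rng.random() < p0 for _ in range(n))
        props.append(successes / n)
    return props


def two_sided_sim_p_value(null_props, p0, observed):
    """Share of simulated proportions at least as far from p0 as the observed one."""
    gap = abs(observed - p0)
    # The small tolerance guards against floating-point ties being missed.
    extreme = sum(abs(p - p0) >= gap - 1e-12 for p in null_props)
    return extreme / len(null_props)


# Hypothetical example: 32 successes in a sample of 50, tested against H0: p = 0.5.
null_props = simulate_null_proportions(p0=0.5, n=50)
p_value = two_sided_sim_p_value(null_props, p0=0.5, observed=32 / 50)
print(f"simulated two-sided p-value: {p_value:.3f}")
```

The simulated \(p\)-value will vary slightly with the seed and the number of repetitions, which is also true of results produced by simulation tools in general.
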
\(~\)
The Washington Post maintains a comprehensive database, dating back to 2015, of instances where police officers have used deadly force on a suspect.
These data contain the following variables:
Question #1: According to the US Census, the current racial composition of the US is 61.5% Non-Hispanic White, 17.6% Hispanic (of any race), 12.3% Black, 5.3% Asian, 0.7% Native American, and 2.6% other (source). Based upon this information, perform a hypothesis test to evaluate whether the percentage of Hispanic individuals among those killed by the police is statistically different from what would be expected according to the US Census. In doing so, please organize your response by answering the following parts (A - E):
Question #2: According to a report published by the Bureau of Justice Statistics (BJS) in November 2018, 47% of law enforcement agencies had acquired body cameras, and among these agencies 29 body cameras were available for every 100 officers. This suggests a 13.63% chance that a randomly selected officer will be wearing a body camera at any moment in time. Based upon this information, perform a hypothesis test to determine whether the presence of body cameras during police involved killings is statistically different from the expected rate of 13.63%. Please organize your response by answering the following parts:
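
The 13.63% figure in Question #2 comes from multiplying the two rates in the BJS report. The short check below reproduces that arithmetic; the variable names are invented for this example, and the calculation assumes (as the question implicitly does) that officers are spread evenly across agencies with and without cameras.

```python
# Share of law enforcement agencies that had acquired body cameras (from the question).
share_agencies_with_cameras = 0.47

# Within those agencies, 29 cameras were available per 100 officers.
cameras_per_officer = 29 / 100

# Chance that a randomly selected officer is wearing a camera, under the
# simplifying assumption stated in the lead-in.
expected_rate = share_agencies_with_cameras * cameras_per_officer
print(f"{expected_rate:.4f}")  # 0.1363
```
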
Question #3: Is it possible that the conclusion you reached in Question #1 was an error? If so, would it be a Type 1 or Type 2 error? Briefly explain.
Question #4: Is it possible that the conclusion you reached in Question #2 was an error? If so, would it be a Type 1 or Type 2 error? Briefly explain.
\(~\)
A ballot initiative refers to a law, provision, or constitutional change that is voted on directly by the population of a state. These initiatives generally appear on the ballot during regularly scheduled elections held within the state. In most states, adding an initiative to the election ballot requires a formal petition that is signed by a minimum number of registered voters.
According to Ballotpedia, the minimum number of signatures needed to place a measure on the ballot in Ohio is based on the total number of votes cast for the governor in the preceding general election. For example, the current threshold for getting a proposed change to the state constitution onto the ballot is 442,958 signatures (until Nov. 2022 when the next gubernatorial race is held).
For this application we’ll consider an advocacy group that has collected 490,000 signatures on a petition for a proposed change to the Ohio constitution. Before the proposed change is added to the ballot, these signatures must be verified to ensure they actually correspond to registered voters. Because signature verification is a very time-intensive process, it is common to validate only a sample of the signatures and use statistical methods to determine if the signature threshold is met.
As an example of how this works, let’s suppose that government officials take a simple random sample of \(n = 2000\) signatures, and 1826 are verified as belonging to registered voters.
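
The quantities in this scenario can be set up in a few lines of Python. This is only a sketch of the bookkeeping, with variable names chosen for this example; it computes the share of the collected signatures that must be valid to reach the threshold, along with the share verified in the sample, and leaves the interpretation to the questions that follow.

```python
# Figures stated in the text above.
signatures_collected = 490_000
signature_threshold = 442_958
sample_size = 2000
sample_valid = 1826

# Proportion of the collected signatures that must be valid to meet the threshold.
required_proportion = signature_threshold / signatures_collected

# Proportion of the sampled signatures that were verified.
sample_proportion = sample_valid / sample_size

print(f"required: {required_proportion:.4f}, sample: {sample_proportion:.4f}")
```
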
Question #5: What proportion of the group's 490,000 signatures need to be valid for the change to make it onto the ballot? What proportion of signatures were valid in the random sample of \(n = 2000\)? Does the fact that the proportion is higher in the sample provide sufficient evidence that the change should be on the ballot? Briefly explain.
Question #6: Are you confident that this sample of \(n = 2000\) signatures accurately represents the entire population of 490,000 signatures? Briefly explain.
Question #7: Perform a hypothesis test evaluating whether the sample of \(n = 2000\) signatures provides sufficient evidence to put the proposed change onto the ballot. Be sure to: 1) clearly state your null and alternative hypotheses, 2) use an appropriate null model to calculate a \(p\)-value, 3) provide a 1-sentence conclusion that puts the results of your test into context.
Question #8: What would a Type 1 error mean for the hypotheses you evaluated in Question #7? Is it possible your decision was a Type 1 error?
Question #9: What would a Type 2 error mean for the hypotheses you evaluated in Question #7? Is it possible your decision was a Type 2 error?
Question #10: For this application, which do you think is worse, a Type 1 or Type 2 error? What could be done to reduce the likelihood of making a Type 2 error?
\(~\)
In the 1980s, Pepsi launched what they called the “Pepsi Challenge”, where they had individuals taste unlabeled cups of Pepsi and Coca-Cola and report which they preferred. Wikipedia describes the challenge’s methodology as follows:
At malls, shopping centers, and other public locations, a Pepsi representative sets up a table with two cups: one containing Pepsi and one with Coca-Cola. Shoppers are encouraged to taste both colas, and then select which drink they prefer. Then the representative reveals the two bottles so the taster can see whether they preferred Coke or Pepsi. The results of the test leaned toward a consensus that Pepsi was preferred by more Americans.
If you’re curious, a television ad from 1983 that demonstrates the challenge is embedded below:
\(~\)
Question #11: Briefly critique the study of the Pepsi Challenge. That is, identify and discuss (using proper statistical terms when applicable) what Pepsi did to address confounding variables and biases as possible explanations for the choices made by study participants.
Question #12: In both words and statistical notation, state the null hypothesis that would be of interest in this application.
Question #13: In a trial of the Pepsi Challenge using \(n = 71\) study participants, 42 chose the cup containing Pepsi. Use this information to come up with a Normal probability model that can be used to evaluate the null hypothesis you stated in Question #12.
Question #14: Find the \(p\)-value using the model you described in Question #13, then provide a brief conclusion using the \(\alpha = 0.01\) threshold for statistical significance.