\(~\)
This lab focuses on the concepts that are essential to statistical testing. All of the \(p\)-values you report in this lab will be found via simulation using StatKey. Although real scientific research rarely uses simulation to find \(p\)-values, it is valuable to see how your observed outcome compares to outcomes that could have occurred if the null hypothesis were true.
For each hypothesis test you perform you will be expected to report:
\(~\)
The examples in this lab will use the random sample of \(n=200\) games played in the National Football League (NFL) between 2018 and 2023 that you worked with in our previous lab on confidence intervals. These data contain the following variables of interest:
- season - the year in which the game was played
- game_type - whether the game was a regular season game (game_type = "Reg") or a playoff game (game_type = "Playoff")
- game_outcome - whether the game’s home team won (game_outcome = 1), lost (game_outcome = 0), or tied (game_outcome = 0.5)
- home_score - the points scored by the home team
- away_score - the points scored by the away team
- score_diff - the score of the home team minus the score of the away team

## Libraries
library(dplyr)
library(ggplot2)
## Data used in examples
nfl = read.csv("https://remiller1450.github.io/data/nfl_sample.csv")
## We'll re-code "game_outcome" to count ties as losses
nfl$win_binary = ifelse(nfl$game_outcome == 1, "Win", "Loss or Tie")
\(~\)
Statistical tests involving a single proportion use hypotheses of the form: \[H_0: p = \text{hypothesized value} \\ H_a: p \ne \text{hypothesized value}\]
Oftentimes (but not always!) the hypothesized value is 0.5, as this reflects both outcomes of a binary random variable being equally likely.
The null distribution for a single proportion depends upon both the hypothesized value and the sample size. Consequently, you must enter both of these into StatKey in order to simulate the null distribution.
Our example of a single proportion test will evaluate whether home teams are more likely to win than away teams. In this example, we’ll obtain the necessary information to simulate the null distribution in StatKey from a one-way frequency table:
## One-way frequency table to find the numerator and denominator of the sample proportion
table(nfl$win_binary)
##
## Loss or Tie Win
## 94 106
In StatKey, we enter the observed count (106 wins), the sample size (200 games), and the null hypothesis (\(p=0.5\)). Then, we simulate outcomes to approximate the null distribution; generally we’ll want at least a couple thousand simulated outcomes to get a good approximation.
The figure below shows 2000 simulated outcomes with the simulated proportions that are at least as extreme as the proportion observed in our sample (\(106/200 = 0.53\)) highlighted in red. To highlight these samples you should first click the “two-tailed” button, then click the box underneath the distribution’s right tail and change its value to 0.53 (the sample proportion).
From this figure we estimate the \(p\)-value to be 0.448, which indicates that our sample provides insufficient evidence that home teams are more likely to win than away teams.
Note: StatKey assumes symmetry when determining two-sided \(p\)-values, which is why not all of the simulated samples with proportions of 0.47 are counted towards the \(p\)-value.
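Though StatKey performs this simulation for you, the underlying logic can be sketched in a few lines of R. The following is a rough sketch of the same idea, not StatKey's exact algorithm: under \(H_0: p = 0.5\), each simulated sample amounts to \(n = 200\) fair coin flips.

```r
# Sketch of the simulation behind a single-proportion test.
# Assumes H0: p = 0.5 with n = 200, matching the NFL example above.
set.seed(123)                  # for reproducibility
n <- 200                       # sample size
p_hat <- 106 / n               # observed sample proportion (0.53)

# 2000 simulated sample proportions generated under the null hypothesis
sims <- rbinom(2000, size = n, prob = 0.5) / n

# Two-sided p-value: fraction of simulations at least as far from 0.5 as p_hat
p_value <- mean(abs(sims - 0.5) >= abs(p_hat - 0.5))
p_value                        # should land in the vicinity of the 0.448 reported above
```

Note that computing the two-sided \(p\)-value as a distance from 0.5 mirrors the symmetry assumption StatKey makes when counting both tails.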
Question #1: For this question you will use the Iowa City home sales data set to evaluate whether there is statistical evidence that the Johnson County assessor is systematically overvaluing homes in its assessments. To do this you should perform a hypothesis test involving the new variable over_assessed that is created below. Be sure to include all 3 steps mentioned in the lab’s “Onboarding” section when reporting your hypothesis test and results.
## Data for Question #1
homes = read.csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
## A home is "Over Assessed" when its assessed value exceeds its sale price
homes$over_assessed = ifelse(homes$assessed > homes$sale.amount, "Over Assessed", "Under Assessed")
\(~\)
Statistical tests involving a difference in proportions use hypotheses of the form: \[H_0: p_1 - p_2= \text{hypothesized value} \\ H_a: p_1-p_2 \ne \text{hypothesized value}\]
A hypothesized value of 0 is almost always used in this context, as this reflects both groups having the same proportion. Thus, rejecting the null hypothesis amounts to concluding that the variable defining the groups is associated with the variable expressed in the proportions.
The null distribution for a difference in proportions depends upon both the hypothesized value and the numerator and denominator of both proportions. The reason for this is that there are infinitely many ways for \(p_1 - p_2\) to equal any hypothesized value, but not all of those ways are equally plausible, so the most realistic null distribution pools all of the data together such that \(p_1 = p_2 = p_{\text{pooled}}\).
Our example will explore whether the likelihood of the home team winning differs between playoff games and regular season games. The null hypothesis for this application is \(H_0: p_1 - p_2 = 0\).
## Two-way frequency table to find the numerator and denominator of each proportion
table(nfl$game_type, nfl$win_binary)
##
## Loss or Tie Win
## Playoff 6 5
## Reg 88 101
From the two-way frequency table, we observe the sample proportions \(\hat{p}_1 = 5/11 \approx 0.45\) for playoff games and \(\hat{p}_2 = 101/189 \approx 0.53\) for regular season games.
If the chances that the home team wins are independent of the type of game (playoff or regular season), then it shouldn’t matter whether a game in our sample was played during the playoffs or the regular season, and we should use our entire sample to estimate the likelihood of the home team winning. So, \(\hat{p}_{\text{pooled}}= 106/200\) is used to simulate the null distribution (though StatKey will handle the pooling for us).
Shown below are 2000 simulated differences in proportions under \(H_0: p_1 - p_2 = 0\):
The observed difference in proportions is \(5/11 - 101/189 \approx -0.08\), so the \(p\)-value can be estimated by the proportion of simulated samples at least as unusual as \(-0.08\), which I found to be around 0.68. Thus, there is insufficient evidence that the likelihood of the home team winning differs between NFL regular season and playoff games.
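To make the role of the pooled proportion concrete, here is a rough R sketch of one reasonable way to simulate this null distribution (drawing both groups' win counts from \(\hat{p}_{\text{pooled}}\)); StatKey's exact algorithm may differ.

```r
# Sketch of simulating the null distribution for a difference in proportions.
# Assumes H0: p1 - p2 = 0, so both groups' wins are generated from the
# pooled proportion of home wins.
set.seed(123)
n1 <- 11;  n2 <- 189                  # playoff and regular season game counts
p_pooled <- 106 / 200                 # pooled proportion of home wins
obs_diff <- 5/11 - 101/189            # observed difference (about -0.08)

# 2000 simulated differences in proportions under the null hypothesis
sim_diff <- rbinom(2000, n1, p_pooled) / n1 -
            rbinom(2000, n2, p_pooled) / n2

# Two-sided p-value: fraction of simulated differences at least as extreme
p_value <- mean(abs(sim_diff) >= abs(obs_diff))
```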
Question #2: In a study on drug-impaired driving, participants were asked if they had ever driven within two hours of consuming cannabis (recorded as Ever = "Yes"). Participants went on to drive in a simulator 30, 90, and 180 minutes after consuming cannabis. Before each drive, they were asked if they ‘felt ready to safely drive a real vehicle on real roadways,’ and their responses are recorded as RT2, RT3, and RT4, respectively. For this question you should evaluate whether participants who have previously driven within two hours of consuming cannabis are more likely to report feeling ready to drive 180 minutes after consuming cannabis (recorded as RT4).
## Data for Question #2
ready = read.csv("https://remiller1450.github.io/data/Ready.csv")
\(~\)
Statistical tests involving a single mean use hypotheses of the form: \[H_0: \mu= \text{hypothesized value} \\ H_a: \mu \ne \text{hypothesized value}\]
Generally, there is no single ‘hypothesized value’ that is most common. However, in a special situation known as a paired design a hypothesized value of zero amounts to “no association”. In a paired design, a single variable (column) represents the paired differences between two conditions.
In our NFL sample, the variable score_diff fits this definition, as it is calculated as the difference between the home team’s score and the away team’s score, which are paired because they occur in the same game. Thus, we can test the hypothesis \(H_0: \mu = 0\) to determine whether our sample provides evidence that home teams outscore away teams on average.
Unfortunately, simulating the null distribution for a single mean requires providing the entire column of data to StatKey. This means downloading the data CSV file and using the “Upload File” button on StatKey.
You will also need to modify the hypothesized value to reflect your null hypothesis before simulating results. If you correctly select the score_diff column and modify the null hypothesis to \(H_0: \mu = 0\), you should be able to simulate a null distribution that looks like the one shown below:
While it’s not shown in the null distribution displayed above, you should be able to confirm on your own that the \(p\)-value in this application is around 0.38.
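One common way to simulate a null distribution for a single mean, which you can sketch in R yourself, is to shift the sample so its mean matches the hypothesized value and then bootstrap from the shifted data. StatKey uses a similar idea; treat this as an illustration rather than its exact algorithm.

```r
# Sketch of a simulation-based test for a single mean (assumes H0: mu = 0).
set.seed(123)
nfl <- read.csv("https://remiller1450.github.io/data/nfl_sample.csv")

obs_mean <- mean(nfl$score_diff)       # observed mean score differential
shifted <- nfl$score_diff - obs_mean   # shifted sample has mean exactly 0

# 2000 bootstrap means drawn from the shifted (null-consistent) sample
sim_means <- replicate(2000, mean(sample(shifted, replace = TRUE)))

# Two-sided p-value; should be near the 0.38 mentioned above
p_value <- mean(abs(sim_means) >= abs(obs_mean))
```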
The steps are similar for a hypothesis test involving a difference in means, but you must also select a categorical variable that defines each group.
The results below show the simulated null distribution and \(p\)-value for a difference in means test of the hypothesis \(H_0: \mu_1 - \mu_2 = 0\), where \(\mu_1\) is the mean score differential for regular season games and \(\mu_2\) is the mean score differential for playoff games.
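For a difference in means, a standard simulation approach is a randomization test: shuffle the group labels so that game type is unrelated to score differential, exactly as the null hypothesis claims. The sketch below illustrates this idea in R (it may not match StatKey's implementation detail for detail).

```r
# Sketch of a randomization test for a difference in means
# (assumes H0: mu1 - mu2 = 0).
set.seed(123)
nfl <- read.csv("https://remiller1450.github.io/data/nfl_sample.csv")

# Helper: mean score differential for regular season minus playoff games
diff_in_means <- function(outcome, group) {
  mean(outcome[group == "Reg"]) - mean(outcome[group == "Playoff"])
}

obs_diff <- diff_in_means(nfl$score_diff, nfl$game_type)

# 2000 simulated differences with game type labels randomly reshuffled
sim_diff <- replicate(2000,
  diff_in_means(nfl$score_diff, sample(nfl$game_type)))

# Two-sided p-value
p_value <- mean(abs(sim_diff) >= abs(obs_diff))
```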
A few things to note:
Question #3: At the 2008 Olympics, an unprecedented number of world records were set by swimmers wearing a new type of scientifically designed wetsuit known as the LZR racer. The suit was so controversial that in 2010 new international rules were created to regulate swimsuit coverage and material. For this question you’ll analyze data from a study of \(n=12\) competitive swimmers who swam a 1500m time trial with a scientifically designed wetsuit and with a normal swimsuit. Because each participant swam under each condition, this is a paired design, and you should perform an appropriate hypothesis test using the variable difference (the difference in velocity under each condition) to evaluate whether there is compelling statistical evidence that the new wetsuits improve performance.
Question #4: In Question #3 you performed a paired differences test, which is the appropriate approach given the design of the wetsuit study. In this question you’ll perform a difference in means test that inappropriately ignores the paired study design. To do this, you should use the version of the data linked below:
Question #5: In statistics the term power describes the likelihood that a statistical test correctly rejects the null hypothesis (given that the null hypothesis is false). A test with higher power will correctly reject a false null hypothesis more often than a test with lower power. Comparing the results of Question #3 (paired difference test) and Question #4 (ordinary difference in means test), which test do you think was more powerful?