\(~\)

Onboarding

This lab focuses on the concepts that are essential to statistical testing. All of the \(p\)-values you report in this lab will be found via simulation using StatKey. Although real scientific research rarely uses simulation to find \(p\)-values, it is valuable to see how your observed outcome compares to outcomes that could have occurred if the null hypothesis were true.

For each hypothesis test you perform you will be expected to report:

  1. The null hypothesis and alternative hypothesis using both words and statistical notation.
  2. The two-sided \(p\)-value of the test, estimated using StatKey.
  3. A one-sentence summary of the results that follows the guidelines of this week’s lecture slides.

\(~\)

Lab

The examples in this lab will use the random sample of n=200 games played in the National Football League (NFL) between 2018 and 2023 that you worked with in our previous lab on confidence intervals. These data contain the following variables of interest:

  • season - the year in which the game was played
  • game_type - whether the game was a regular season game (game_type = "Reg") or a playoff game (game_type = "Playoff")
  • game_outcome - whether the game’s home team won (game_outcome = 1), or lost (game_outcome = 0), or tied (game_outcome = 0.5)
  • home_score - the points scored by the home team
  • away_score - the points scored by the away team
  • score_diff - the score of the home team minus the score of the away team
## Libraries
library(dplyr)
library(ggplot2)

## Data used in examples
nfl = read.csv("https://remiller1450.github.io/data/nfl_sample.csv")

## We'll re-code "game_outcome" to count ties as losses
nfl$win_binary = ifelse(nfl$game_outcome == 1, "Win", "Loss or Tie")

\(~\)

Single Proportions

Statistical tests involving a single proportion use hypotheses of the form: \[H_0: p = \text{hypothesized value} \\ H_a: p \ne \text{hypothesized value}\]

Oftentimes (but not always!) the hypothesized value is 0.5, as this reflects both outcomes of a binary random variable being equally likely.

The null distribution for a single proportion depends upon both the hypothesized value and the sample size. Consequently, you must enter both of these into StatKey in order to simulate the null distribution.

Our example of a single proportion test will evaluate whether home teams are more likely to win than away teams. In this example, we’ll obtain the necessary information to simulate the null distribution in StatKey from a one-way frequency table:

## One-way frequency table to find the numerator and denominator of the sample proportion
table(nfl$win_binary)
## 
## Loss or Tie         Win 
##          94         106

In StatKey, we enter the observed count (106 wins), the sample size (200 games), and the null hypothesis (\(p=0.5\)). We then simulate at least a couple thousand outcomes, which is generally enough to get a good approximation of the null distribution.

The figure below shows 2000 simulated outcomes with the simulated proportions that are at least as extreme as the proportion observed in our sample (\(106/200 = 0.53\)) highlighted in red. To highlight these samples you should first click the “two-tailed” button, then click the box underneath the distribution’s right tail and change its value to 0.53 (the sample proportion).

From this figure we estimate the \(p\)-value to be 0.448, which indicates that our sample provides insufficient evidence that home teams are more likely to win than away teams.

Note: StatKey assumes symmetry when determining two-sided \(p\)-values, which is why not all of the simulated samples with proportions of 0.47 are counted towards the \(p\)-value.
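The same null distribution can also be approximated directly in R. The sketch below is not how StatKey is implemented; it simply assumes that, under the null hypothesis, each of the 200 games is an independent coin flip with \(p = 0.5\), and it uses `rbinom()` to generate 2000 simulated win counts. The object names (`n_games`, `obs_wins`, `p_null`, `sim_wins`, `p_value`) are our own choices for illustration.

```r
## Sketch: simulate the null distribution for a single proportion (H0: p = 0.5)
set.seed(123)                      # arbitrary seed so the results are reproducible

n_games  = 200                     # sample size from the one-way table
obs_wins = 106                     # observed number of home wins
p_null   = 0.5                     # hypothesized value under H0

## Each simulated outcome is the number of wins in 200 games when H0 is true
sim_wins = rbinom(2000, size = n_games, prob = p_null)

## Two-sided p-value: share of simulated counts at least as far from the
## expected count under H0 (100 wins) as the observed count, in either direction
expected_wins = n_games * p_null
p_value = mean(abs(sim_wins - expected_wins) >= abs(obs_wins - expected_wins))
p_value
```

Because this sketch counts both tails directly instead of doubling one tail, its estimate may differ slightly from the value StatKey reports, and it will change a little from one set of simulations to the next.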

Question #1: For this question you will use the Iowa City home sales data set to evaluate whether there is statistical evidence that the Johnson County assessor is systematically overvaluing homes in its assessments. To do this you should perform a hypothesis test involving the new variable over_assessed that is created below. Be sure to include all 3 steps mentioned in the lab’s “Onboarding” section when reporting your hypothesis test and results.

## Data for Question #1
homes = read.csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
## A home is over-assessed when its assessed value exceeds its sale price
homes$over_assessed = ifelse(homes$assessed > homes$sale.amount, "Over Assessed", "Under Assessed")

\(~\)

Differences in Proportions

Statistical tests involving a difference in proportions use hypotheses of the form: \[H_0: p_1 - p_2= \text{hypothesized value} \\ H_a: p_1-p_2 \ne \text{hypothesized value}\]

A hypothesized value of 0 is almost always used in this context, as this reflects both groups having the same proportion. Thus, rejecting the null hypothesis amounts to concluding that the variable defining the groups is associated with the variable expressed in the proportions.

The null distribution for a difference in proportions depends upon both the hypothesized value and the numerator and denominator of both proportions. The reason for this is that there are infinitely many ways for \(p_1 - p_2\) to equal any hypothesized value, but not all of those ways are equally plausible, so the most realistic null distribution pools all of the data together such that \(p_1 = p_2 = p_{\text{pooled}}\).

Our example will explore whether the likelihood of the home team winning differs between playoff games and regular season games. The null hypothesis for this application is \(H_0: p_1 - p_2 = 0\).

## Two-way frequency table to find the numerator and denominator of each proportion
table(nfl$game_type, nfl$win_binary)
##          
##           Loss or Tie Win
##   Playoff           6   5
##   Reg              88 101

From the two-way frequency table, we observe the sample proportions:

  • \(\hat{p}_1 = 5/11\)
  • \(\hat{p}_2 = 101/189\)

If the chance that the home team wins is independent of the type of game (playoff or regular season), it shouldn’t matter whether a game in our sample was played during the playoffs or the regular season, and we should use our entire sample to estimate the likelihood of the home team winning. So, \(\hat{p}_{\text{pooled}}= 106/200\) is used to simulate the null distribution (though StatKey will handle the pooling for us).
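As a quick check on the pooling arithmetic, the pooled proportion combines the wins and the games from both rows of the two-way table: \[\hat{p}_{\text{pooled}} = \frac{5 + 101}{11 + 189} = \frac{106}{200} = 0.53\]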

Shown below are 2000 simulated differences in proportions under \(H_0: p_1 - p_2 = 0\):

The observed difference in proportions, \(5/11 - 101/189\), is approximately \(-0.08\), so the \(p\)-value can be estimated by the proportion of simulated samples at least as unusual as \(-0.08\), which I found to be around 0.68. Thus, there is insufficient evidence that the likelihood of the home team winning differs between NFL regular season and playoff games.
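For readers who want to see the mechanics, here is a minimal R sketch of one common way to simulate this null distribution: shuffle the game-type labels so that game type cannot be related to the outcome, then recompute the difference in proportions. This is an assumption about a reasonable procedure, not a description of StatKey's internals, so its \(p\)-value need not match the one above exactly. The 200 games are rebuilt from the counts in the two-way table, and the names `home_win`, `diff_props`, `obs_diff`, and `sim_diffs` are our own.

```r
## Sketch: randomization test for a difference in proportions (H0: p1 - p2 = 0)
set.seed(123)                      # arbitrary seed so the results are reproducible

## Rebuild the 200 games from the two-way table (1 = home win, 0 = loss or tie)
game_type = rep(c("Playoff", "Reg"), times = c(11, 189))
home_win  = c(rep(c(1, 0), times = c(5, 6)),      # playoffs: 5 wins, 6 losses or ties
              rep(c(1, 0), times = c(101, 88)))   # regular season: 101 wins, 88 losses or ties

## Difference in sample proportions (playoff minus regular season)
diff_props = function(outcome, group) {
  mean(outcome[group == "Playoff"]) - mean(outcome[group == "Reg"])
}

obs_diff = diff_props(home_win, game_type)

## Under H0 the labels are interchangeable, so shuffle them and recompute
sim_diffs = replicate(2000, diff_props(home_win, sample(game_type)))

## Two-sided p-value: share of shuffled differences at least as far from 0 as
## the observed difference (small tolerance guards against rounding error)
p_value = mean(abs(sim_diffs) >= abs(obs_diff) - 1e-12)

round(obs_diff, 2)
p_value
```

Shuffling keeps the overall win proportion fixed at \(106/200\) in every simulated data set, which is how the pooling enters this version of the simulation.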

Question #2: In a study on drug-impaired driving, participants were asked if they had ever driven within two hours of consuming cannabis (recorded as Ever = "Yes"). Participants went on to drive in a simulator 30, 90, and 180 minutes after consuming cannabis. Before each drive, they were asked if they ‘felt ready to safely drive a real vehicle on real roadways,’ and their responses are recorded as RT2, RT3, and RT4, respectively. For this question you should evaluate whether participants who have previously driven within two hours of consuming cannabis are more likely to report feeling ready to drive 180 minutes after consuming cannabis (recorded as RT4).

## Data for Question #2
ready = read.csv("https://remiller1450.github.io/data/Ready.csv")

\(~\)

Means and Differences in Means

Statistical tests involving a single mean use hypotheses of the form: \[H_0: \mu= \text{hypothesized value} \\ H_a: \mu \ne \text{hypothesized value}\]

Generally, there is no single ‘hypothesized value’ that is most common. However, in a special situation known as a paired design a hypothesized value of zero amounts to “no association”. In a paired design, a single variable (column) represents the paired differences between two conditions.

In our NFL sample, the variable score_diff fits this definition: it is the difference between the home team’s score and the away team’s score, and the two scores are paired because they occur in the same game. Thus, we can test the hypothesis \(H_0: \mu = 0\) to determine whether our sample provides evidence that home teams outscore away teams on average.

Unfortunately, simulating the null distribution for a single mean requires providing the entire column of data to StatKey. This means downloading the data CSV file and using the “Upload File” button on StatKey.

You will also need to modify the hypothesized value to reflect your null hypothesis before simulating results. If you correctly select the score_diff column and modify the null hypothesis to \(H_0: \mu = 0\) you should be able to simulate a null distribution that looks like the one shown below:

While it’s not shown in the null distribution displayed above, you should be able to confirm on your own that the \(p\)-value in this application is around 0.38.

The steps are similar for a hypothesis test involving a difference in means, but you must also select a categorical variable that defines each group.

The results below show the simulated null distribution and \(p\)-value for a difference in means test of the hypothesis \(H_0: \mu_1 - \mu_2 = 0\), where \(\mu_1\) is the mean score differential for regular season games and \(\mu_2\) is the mean score differential for playoff games.

A few things to note:

  1. For tests involving a single proportion or a difference in the proportions it is straightforward to simulate outcomes under the null hypothesis using what amounts to sets of weighted coin flips (where the “weight” is the hypothesized proportion of the outcome of interest).
  2. For tests involving a mean or a difference in means, it is more challenging to simulate outcomes under the null hypothesis. For tests involving a single mean, the data are shifted by a constant value such that their new mean value is the one specified in the null hypothesis. These shifted data-points are then sampled with replacement.
  3. For tests involving a difference in means, the group identities of each data-point are randomly re-assigned, as the null hypothesis implies independence between the categorical variable defining each group and the quantitative variable in the analysis.
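The two resampling schemes in notes 2 and 3 can be sketched in a few lines of R. This is an illustration of the ideas as described above, not StatKey's actual code, and the function names (`sim_null_mean`, `sim_null_diff_means`, `two_sided_p`) and their arguments are hypothetical helpers introduced here.

```r
## Sketch: simulating null distributions for means (hypothetical helper functions)

## Single mean: shift the data so its mean equals mu_null, then resample the
## shifted values with replacement and record the mean of each resample
sim_null_mean = function(x, mu_null = 0, reps = 2000) {
  shifted = x - mean(x) + mu_null
  replicate(reps, mean(sample(shifted, replace = TRUE)))
}

## Difference in means: randomly re-assign the group labels and record the
## difference in group means (group g1 minus group g2) after each shuffle
sim_null_diff_means = function(y, group, g1, g2, reps = 2000) {
  replicate(reps, {
    shuffled = sample(group)
    mean(y[shuffled == g1]) - mean(y[shuffled == g2])
  })
}

## Two-sided p-value: share of simulated statistics at least as far from the
## null value as the observed statistic
two_sided_p = function(sim_stats, observed, null_value = 0) {
  mean(abs(sim_stats - null_value) >= abs(observed - null_value))
}

## Example calls, assuming the nfl data frame loaded earlier in this lab:
## null_means = sim_null_mean(nfl$score_diff, mu_null = 0)
## two_sided_p(null_means, observed = mean(nfl$score_diff))
##
## null_diffs = sim_null_diff_means(nfl$score_diff, nfl$game_type, "Reg", "Playoff")
## obs_diff = mean(nfl$score_diff[nfl$game_type == "Reg"]) -
##            mean(nfl$score_diff[nfl$game_type == "Playoff"])
## two_sided_p(null_diffs, observed = obs_diff)
```

Because both schemes rely on random resampling, the resulting \(p\)-values will vary slightly between runs and are not expected to match StatKey's output exactly.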

Question #3: At the 2008 Olympics, an unprecedented number of world records were set by swimmers wearing a new type of scientifically designed wetsuit known as the LZR racer. The suit was so controversial that in 2010 new international rules were created to regulate swimsuit coverage and material. For this question you’ll analyze data from a study of \(n=12\) competitive swimmers who swam a 1500m time trial with a scientifically designed wetsuit and with a normal swimsuit. Because each participant swam under each condition, this is a paired design, and you should perform an appropriate hypothesis test using the variable difference (the difference in velocity under each condition) to evaluate whether there is compelling statistical evidence that the new wetsuits improve performance.

Question #4: In Question #3 you performed a paired differences test, which is the appropriate approach given the design of the wetsuit study. In this question you’ll perform a difference in means test that inappropriately ignores the paired study design. To do this, you should use the version of the data linked below:

Question #5: In statistics the term power describes the likelihood that a statistical test correctly rejects the null hypothesis (given the null hypothesis is false). A test with higher power will correctly reject a false null hypothesis more often than a test with lower power. Comparing the results of Question #3 (paired difference test) and Question #4 (ordinary difference in means test), which test do you think was more powerful?