Directions
In this lab we will explore hypothesis testing using a couple of case studies. We will focus on using randomization to estimate the null distribution, a procedure sometimes called randomization testing.
Have you ever been waiting for a parking spot and felt like people take forever to exit their spot?
Psychologists Ruback and Juieng investigated this question in the research paper Territorial defense in parking lots: Retaliation against waiting drivers, which describes a series of studies investigating how various factors relate to how quickly someone leaves a public parking space.
In the first of these studies, Ruback and Juieng observed 200 drivers departing from a public parking lot. For each departing driver they recorded the time (in seconds) between when each driver first entered their car and when they exited their parking space. Additionally, they recorded whether another car was waiting for the space while the driver got into their car and exited their space.
The Parking Dataset contains the results of this study.
Question #1
Construct a graph that compares the distribution of exit time when another vehicle is waiting and when another vehicle is not waiting. Do these distributions appear skewed? Is there an association between leaving time and the presence of a waiting vehicle?
In this study, researchers wanted to determine whether drivers exited faster when another car was waiting for their spot.
To answer this, they might use statistical testing to evaluate whether the mean leaving time is the same for each group (where groups are defined by whether another car is waiting), or if the mean is lower when another car is waiting.
Question #2
Using proper statistical notation, what is the null hypothesis of this test? What is the alternative hypothesis? Is this a one-sided or two-sided test?
Question #3
Using proper statistical notation, what is your best estimate of the parameter specified in your null hypothesis in Question #2? Provide both the notation for your best estimate, and its actual numeric value.
Randomization testing simulates the data collection process in a world where the null hypothesis is true. The different estimates that arise from these simulations are used to construct the randomization distribution, which is an estimate of the null distribution. This allows us to understand the estimates that we’d expect to see had the null hypothesis been true.
If the observed estimate in the original study is deemed sufficiently rare (relative to the possible estimates shown in the null distribution), we declare the observed difference in our sample to be statistically significant.
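To make the simulation idea concrete, here is a minimal Python sketch of a randomization test for a difference in means. The exit times and waiting labels below are made-up numbers for illustration only (not the study's data); StatKey automates this same label-shuffling procedure for you.

```python
import random

# Hypothetical exit times (seconds) and whether a car was waiting.
# These values are invented for illustration -- not the study's data.
times = [42.0, 55.0, 38.0, 61.0, 47.0, 70.0, 52.0, 44.0, 66.0, 58.0]
waiting = [True, True, True, True, True, False, False, False, False, False]

def mean_diff(times, labels):
    """Statistic of interest: mean(waiting group) - mean(not-waiting group)."""
    g1 = [t for t, w in zip(times, labels) if w]
    g2 = [t for t, w in zip(times, labels) if not w]
    return sum(g1) / len(g1) - sum(g2) / len(g2)

random.seed(1)
observed = mean_diff(times, waiting)   # -9.4 with the numbers above

# Build the randomization distribution: shuffling the labels simulates a
# world where the label carries no information about exit time (H0 true).
rand_stats = []
for _ in range(5000):
    shuffled = random.sample(waiting, k=len(waiting))
    rand_stats.append(mean_diff(times, shuffled))
```

Each entry of `rand_stats` is one simulated estimate; together they form the randomization distribution against which `observed` is compared.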
StatKey is a tool that allows us to create randomization distributions for a few common situations. In the parking example, we are interested in performing a “randomization test for a difference in means”.
As we saw in the bootstrapping lab, you’ll need to make use of the “Edit Data” option. When you click on “Edit Data” you can see how your data needs to be formatted; once you recognize the correct format, you can simply copy-paste the correct columns from Minitab.
After you have the data loaded into StatKey, you can view it in the “Original Sample” panel. You should always check this panel to make sure the data were loaded in correctly (for example, make sure \(\bar{x}_1 - \bar{x}_2\) is the same as what you see using Minitab).
To simulate the data collection process under the null hypothesis, click on “Generate 1 Sample”.
Question #4:
When you click on “Generate 1 Sample”, what is plotted in the panel titled “Randomization Dotplot of \(\bar{x}_1 - \bar{x}_2\)”? Be very specific in your answer.
Depending on the type of parameter you are estimating, StatKey will simulate randomization samples differently. The way this happens is summarized below:
Question #5:
When you clicked on “Generate 1 Sample” in Lab Question #4, what was plotted in the panel titled “Randomization Sample”? Be specific.
Question #6:
Before you clicked on “Generate 1 Sample”, could you have known the total number of data-points in the randomization sample with waiting times longer than 70 seconds? Could you have known which groups these data-points would belong to? Briefly explain.
To get an accurate assessment of how rare our observed sample is (if the null hypothesis were true), we need to compare it with a large number of randomization samples. My general advice is to generate randomization samples until the standard error of the randomization distribution stays roughly constant as you generate additional samples. For most applications, this takes a few thousand randomization samples.
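As a rough illustration of this “generate until the standard error stabilizes” advice, the sketch below tracks the standard error of a growing collection of simulated randomization statistics. The statistics here are stand-in random draws, not real label shuffles:

```python
import random
import statistics

random.seed(2)

# Stand-in for one randomization statistic (e.g., a shuffled mean difference).
# A normal draw is used purely for illustration.
def one_randomization_stat():
    return random.gauss(0, 5)

stats_so_far = []
for n in (100, 500, 1000, 2000, 5000):
    while len(stats_so_far) < n:
        stats_so_far.append(one_randomization_stat())
    se = statistics.stdev(stats_so_far)
    print(f"after {n:>4} samples, SE of randomization distribution = {se:.2f}")
```

Once the printed standard errors stop changing much from row to row, generating additional samples yields diminishing returns.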
Question #6
Reset your randomization plot and generate 1,000 randomization samples. Include your plot in your lab write-up and answer the following questions: How many dots are there and what do they represent? Why are most of these dots close to zero?
Once you’ve constructed the randomization distribution, the next step in the hypothesis test is to determine how rare/unexpected the observed estimate would be had the null hypothesis been true. This can be done by checking the appropriate “Left Tail” or “Right Tail” box. In this step, you need to be aware of the direction of your test, because there are two different one-sided tests.
Once you’ve selected the correct tail, you can click the box on the x-axis of the randomization dotplot to input the estimate from your original sample. StatKey will tell you the proportion of randomization samples that are at least as extreme as the value you entered. This proportion is the p-value, which provides the information you need to make a ruling on whether or not you believe the null hypothesis.
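In code, the p-value StatKey reports is just a proportion. Here is a sketch using a made-up set of randomization statistics and a made-up observed estimate:

```python
# Hypothetical randomization statistics and observed estimate (illustration
# only). Suppose the alternative is H1: mu1 - mu2 < 0, so "at least as
# extreme" means less than or equal to the observed value.
rand_stats = [-3.1, -1.2, 0.4, 2.2, -9.8, 5.0, -0.7, 1.9, -4.4, 0.1]
observed = -9.4   # made-up observed difference in means

p_value = sum(1 for s in rand_stats if s <= observed) / len(rand_stats)
print(p_value)   # 0.1 here: only -9.8 is as extreme as -9.4
```

A real analysis would use thousands of randomization statistics rather than ten, but the computation is the same.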
The plot below corresponds with the test of \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_1: \mu_1 - \mu_2 > 0\). Notice here that \(\bar{x}_1\) is the sample mean of the “Smile” group:
The plot below corresponds with the test of \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_1: \mu_1 - \mu_2 < 0\). Notice the impact of specifying the wrong direction:
Question #7:
Using your randomization plot constructed in Lab Question #6, formally conduct a test of the hypothesis that drivers leave faster when another car is waiting. Report your null and alternative hypotheses (recall that you already stated them earlier), your p-value, and a one-sentence conclusion.
One-sided hypothesis tests can be risky: if you specify the direction incorrectly, you can completely miss out on an interesting discovery. This might sound like a minor inconvenience; couldn’t we just switch hypotheses after seeing the data?
The answer is “no”. In its purest sense, statistical testing is only meaningful when a hypothesis is specified a priori (ahead of time). Important properties of statistical testing will not hold if you form your hypotheses around patterns you’ve already seen in the data.
Post hoc testing (and hypothesis formation) is suspected to be a contributing factor in the reproducibility crisis facing many areas of scientific research. Practically speaking, one-sided tests are almost never used: not only do they look suspicious (as if you peeked at your data ahead of time), they also risk missing important findings.
Two-sided tests are a little trickier to perform in StatKey due to the ambiguity regarding how to determine values “at least as unexpected” in two opposite directions. One possible definition, which you can find by clicking “Two-Tail”, is to double the p-value from the correctly specified one-sided test. An example of this is shown below; the two-sided p-value in this example is 0.058:
An alternate, but equally valid, approach is to interpret “at least as unexpected” in terms of distance from the null value. This often requires separate specification of the right and left tail cutoffs; notice that the two-sided p-value here is 0.057:
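The two definitions can be compared directly with a small sketch; all values below are hypothetical:

```python
# Hypothetical randomization statistics and observed estimate; the null
# value is 0. These numbers are invented for illustration.
rand_stats = [-3.1, -1.2, 0.4, 2.2, -9.8, 5.0, -0.7, 1.9, -4.4, 0.1]
observed = -4.0

# Definition 1: double the p-value from the correctly specified one-sided
# test (here the observed estimate falls in the left tail).
one_sided = sum(1 for s in rand_stats if s <= observed) / len(rand_stats)
p_doubled = min(1.0, 2 * one_sided)

# Definition 2: count randomization statistics at least as far from the
# null value (0) as the observed estimate, in either direction.
p_distance = sum(1 for s in rand_stats if abs(s) >= abs(observed)) / len(rand_stats)

print(p_doubled, p_distance)   # the two definitions can disagree slightly
```

With these made-up numbers, doubling gives 0.4 while the distance definition gives 0.3, mirroring the small discrepancy (0.058 vs 0.057) seen in the StatKey plots.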
Question #8:
Using your randomization plot constructed in Lab Question #6, conduct a two-sided hypothesis test. Include a screenshot of your randomization distribution, report your p-value, and make a one-sentence conclusion addressing the original research question.
To end this section of the lab, we should briefly comment on why StatKey generates its randomization samples in the ways that it does. Generally speaking, randomization samples are created such that the following are satisfied:
Question #9:
In randomization testing for a single mean, StatKey re-samples the shifted data points with replacement. Why is replacement necessary? Explain in 1-2 sentences.
Question #10:
For randomization testing of a difference in means (or a difference in proportions), StatKey reallocates (shuffles) the group labels. Briefly explain why this approach satisfies both 1) and 2) stated prior to Question 9.
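For reference, the mechanics of label reallocation can be sketched in a few lines; the outcomes and labels below are made up for illustration (this shows only what shuffling does, not why it works, which Question #10 asks you to explain):

```python
import random

random.seed(3)
# Hypothetical pooled outcomes (1 = concussion) and group labels; these
# are invented values, not the lab's data.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
labels = ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"]

# Reallocating (shuffling) the labels leaves every outcome value in the
# data set and keeps the original group sizes; only the pairing of label
# to outcome changes.
shuffled = random.sample(labels, k=len(labels))

assert sorted(shuffled) == sorted(labels)   # same labels, same group sizes
```

Each shuffle produces one randomization sample, from which a difference in proportions can be recomputed.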
You can find the data “Concussions” at this link. These data originate from a study published in the Journal of Athletic Training in 2003 by Covassin, Swanik, and Sachs titled “Sex Differences and the Incidence of Concussions Among Collegiate Athletes”. In the study, the authors used data from the NCAA Injury Surveillance System (ISS), a voluntary injury reporting system used by athletic trainers at colleges across the United States. The NCAA ISS is considered to be a representative sample of all US colleges. The data we’ll analyze contain the following variables:
Question #11:
Use a Minitab formula to create a new column displaying the proportion of concussions for each sex, sport, and year combination. By inspecting, graphing, or summarizing this column, which sport appears to lead to the highest proportion of concussions?
Question #12:
Within the “Stat -> Descriptive Statistics” menu, click on the button titled “Statistics” and select the checkbox for “Sum” (you can also uncheck the other boxes if you want to de-clutter the output). Then, on the main menu, enter the variables “Concussion” and “No Concussion” in the “Variables” panel and include “Sex” as a by variable. This will provide the total number of concussions and non-concussions for each sex. With this information, use a randomization test to answer the question: “do a higher proportion of female athletes (in these sports) sustain concussions?” Use only 1,000 randomization samples to avoid crashing StatKey. Clearly state your null and alternative hypotheses using proper statistical notation.
Question #13:
Return to the “Stat -> Descriptive Statistics” menu and this time include both “Sex” and “Sport” as by variables. With this information, report the difference in proportions (female minus male) separately for each sport. Do these results seem consistent with your findings in Question 12? Do you think that sport might be a confounding variable? (Hint: You don’t need to do any statistical tests to answer this question)
Question #14:
Using an approach similar to those described in previous questions, perform a randomization test to determine whether the proportion of concussions in 1999 differs from the proportion of concussions in 1997. Use only 1,000 randomization samples to avoid crashing StatKey. Clearly state your null and alternative hypotheses using proper statistical notation.
Question #15:
Critics might point out that the proportions in these data are very small, and that we shouldn’t be worried about the male/female and year-to-year differences in concussions that we analyzed. However, in situations involving rare events it is common for researchers to look at ratios of proportions (a measure called relative risk) rather than differences in proportions. What is the female/male relative risk of concussion based on these data? Also, had a randomization test been done on the relative risk, what would the null hypothesis be?
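For reference, relative risk is computed as a ratio of two proportions; the counts in this sketch are hypothetical, not values from the concussion data:

```python
# Hypothetical counts (illustration only, not the lab's data):
female_concussions, female_exposures = 30, 2000
male_concussions, male_exposures = 20, 2500

p_female = female_concussions / female_exposures   # 0.015
p_male = male_concussions / male_exposures         # 0.008

# Relative risk: the ratio of the two proportions.
relative_risk = p_female / p_male
print(relative_risk)   # ~1.875: the female rate is about 1.9x the male rate
```

Note how a tiny difference in proportions (0.007) can correspond to a large relative risk, which is why ratios are often preferred for rare events.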
Question #16:
For this question I’d like you and your group to use the data from either case study presented in this lab to form and test a hypothesis of your choosing. You should include: