Sta-209 Lab #5 - Randomization Tests

In this lab we will explore hypothesis testing using a couple of case studies. The lab focuses on using randomization to estimate the null distribution, a procedure sometimes called randomization testing.

Case Study #1 - Parking Behavior

Have you ever been waiting for a parking spot and felt like people take forever to exit their spot? Ruback and Juieng investigated this question in “Territorial defense in parking lots: Retaliation against waiting drivers”. The paper describes a series of studies investigating how various factors relate to how quickly someone leaves a public parking space.

For the first of these studies, Ruback and Juieng observed 200 drivers departing from a public parking lot. For each departing driver they recorded the time (in seconds) between when the driver entered their car and when they left the parking space. They also recorded whether someone in another car was waiting for the space.

The dataset “Parking” (Link) contains the results of this study:

time documents how many seconds it took the driver to exit the parking space after entering their vehicle
waiting is a binary variable indicating whether another car was waiting for the spot (“yes” if another vehicle was waiting)
gender is a variable documenting the gender of the driver exiting the parking space

Question #1

Construct an appropriate graph displaying the distribution of leaving times depending upon whether another car was waiting. Do the distributions appear skewed? Does there appear to be an association between leaving time and the presence of a waiting vehicle?

Hypothesis Testing

In this study, researchers wanted to determine whether drivers left faster when another car was waiting. One way to do this is to test whether the mean leaving time is the same for each group (where groups are defined by whether another car is waiting), or if the mean is lower when another car is waiting.

Question #2

Using proper statistical notation, what is the null hypothesis of this test? What is the alternative hypothesis?

Question #3

Using proper statistical notation, what is your best estimate of the quantity specified in your null hypothesis in Lab Question #2? Provide both the notation for your best estimate, and its actual numeric value.

Randomization Testing in StatKey

Randomization testing simulates the original data collection process in a world where the null hypothesis is true. The values of the statistic of interest that arise from these simulations are used to construct the randomization distribution, which is an estimate of the null distribution. It gives us an idea of what estimates would be expected if the null hypothesis were true.

If our observed estimate is deemed sufficiently rare, for example, if a value at least as extreme would occur less than 1 time in 20 when the null hypothesis is true, then we declare the observed difference in our sample to be statistically significant and reject the notion that the null hypothesis is true.
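To make the idea concrete, here is a minimal Python sketch of one common way to simulate a world where the null hypothesis is true for a difference in means: shuffle the group labels and recompute the statistic. This is an illustration only, not StatKey's own code. The function names and the ten leaving times are made up for the example and are not the Parking data.

```python
import random
from statistics import mean

def diff_in_means(values, labels, group_a, group_b):
    """Return mean(group_a) - mean(group_b)."""
    a = [v for v, g in zip(values, labels) if g == group_a]
    b = [v for v, g in zip(values, labels) if g == group_b]
    return mean(a) - mean(b)

def randomization_distribution(values, labels, group_a, group_b, n_samples, seed=0):
    """Shuffle the group labels n_samples times, recomputing the statistic each time."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    shuffled = list(labels)
    stats = []
    for _ in range(n_samples):
        rng.shuffle(shuffled)      # break any real link between label and value
        stats.append(diff_in_means(values, shuffled, group_a, group_b))
    return stats

# Made-up leaving times in seconds (hypothetical, NOT the Parking data)
times = [32.1, 45.0, 38.4, 50.2, 41.7, 36.9, 47.3, 52.8, 44.5, 39.6]
waiting = ["no"] * 5 + ["yes"] * 5

observed = diff_in_means(times, waiting, "yes", "no")
null_stats = randomization_distribution(times, waiting, "yes", "no", n_samples=2000)

print("observed difference:", round(observed, 2))
print("center of randomization distribution:", round(mean(null_stats), 2))
```

Each entry of `null_stats` is one difference in means computed from one shuffled version of the data, so the list as a whole plays the role of the randomization distribution.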

StatKey is a tool that allows us to create the randomization distribution for a few common situations. For the parking case study, we are interested in a “randomization test for a difference in means”. When you click this button you should be brought to a menu that looks like this:

As we saw in the bootstrapping lab, StatKey comes with several pre-loaded data sets. To use our own data set, you’ll need to make use of the “Edit Data” option. When you click on “Edit Data” you can see how your data needs to be formatted; once you know this, you can simply delete the default data and copy-paste your columns from Minitab.

Once you have the right data loaded into StatKey, you can view them in the “Original Sample” panel. You should always check this panel to make sure the data were read in correctly (for example, make sure \(\bar{x}_1 - \bar{x}_2\) is the same as what you found using Minitab).

To simulate the data collection under the null hypothesis, click on “Generate 1 Sample”.

Question #4

When you click on “Generate 1 Sample”, what is plotted in the panel titled “Randomization Dotplot of \(\bar{x}_1 - \bar{x}_2\)”? Be very specific in your answer.

Question #5

When you clicked on “Generate 1 Sample” in Lab Question #4, what was plotted in the panel titled “Randomization Sample”? Be very specific in your answer.

Finding the \(p\)-value

To get an accurate assessment of how rare our observed sample is (if the null hypothesis were true), we need to compare it with a larger number of randomization samples. My general advice is to generate randomization samples until the standard error of the randomization distribution stays roughly constant as you generate additional samples. For most applications, this takes a few thousand randomization samples.
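The stopping advice above can be checked numerically. The sketch below, written in Python under the same made-up data assumption (hypothetical values, not the Parking data), tracks the standard error of the randomization distribution at a few checkpoints. Shuffling the values and splitting them into two groups is equivalent to shuffling the group labels.

```python
import random
from statistics import mean, stdev

def shuffled_diff(values, n_first, rng):
    """Shuffle the values, then return mean(first n_first) - mean(remaining)."""
    pool = list(values)
    rng.shuffle(pool)
    return mean(pool[:n_first]) - mean(pool[n_first:])

# Made-up leaving times in seconds (hypothetical, NOT the Parking data)
values = [32.1, 45.0, 38.4, 50.2, 41.7, 36.9, 47.3, 52.8, 44.5, 39.6]

rng = random.Random(1)   # fixed seed so the sketch is reproducible
stats = []
se_by_count = {}
for checkpoint in (100, 500, 1000, 3000):
    # keep adding randomization samples until we reach the checkpoint
    while len(stats) < checkpoint:
        stats.append(shuffled_diff(values, 5, rng))
    # the standard error is the standard deviation of the statistics so far
    se_by_count[checkpoint] = stdev(stats)
    print(checkpoint, "samples -> standard error", round(se_by_count[checkpoint], 3))
```

If the printed standard errors barely change between the last checkpoints, generating more randomization samples is unlikely to change your conclusions.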

Question #6

Reset your randomization plot and generate 3,000 randomization samples. How many dots are in this plot? What does each dot represent? Include your plot in your lab write-up.

Once you’ve constructed the randomization distribution, the next step in the hypothesis test is to determine how rare/unexpected the observed estimate would be had the null hypothesis been true. This can be done by checking the appropriate “Left Tail” or “Right Tail” box. In this step, you need to be aware of the direction of your test because two different one-sided tests are possible.

Once you’ve selected the correct tail, you can click the box on the x-axis of the randomization dotplot to input the estimate from your original sample. StatKey will tell you the proportion of randomization samples that are at least as extreme as the value you entered. This proportion is the p-value, which provides the information you need to make a ruling on whether or not you believe the null hypothesis.
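The tail proportion described above is simple to compute by hand. The Python sketch below uses a tiny, made-up list of ten randomization statistics so the counting is easy to follow. The function name is hypothetical, and counting ties as “at least as extreme” is an assumption about the convention, not a statement about StatKey's internals.

```python
def one_sided_p_value(null_stats, observed, tail):
    """Proportion of randomization statistics at least as extreme as observed."""
    if tail == "left":
        extreme = sum(1 for s in null_stats if s <= observed)
    elif tail == "right":
        extreme = sum(1 for s in null_stats if s >= observed)
    else:
        raise ValueError("tail must be 'left' or 'right'")
    return extreme / len(null_stats)

# Ten made-up randomization statistics (hypothetical, for illustration only)
null_stats = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
observed = -2

p_left = one_sided_p_value(null_stats, observed, "left")    # 2 of 10 values are <= -2
p_right = one_sided_p_value(null_stats, observed, "right")  # 9 of 10 values are >= -2

print(p_left)   # 0.2
print(p_right)  # 0.9
```

The same observed value gives very different p-values in the two tails, which is why the direction of the test must match your alternative hypothesis.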

The plot below corresponds with the test of \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_1: \mu_1 - \mu_2 > 0\). Notice here that \(\bar{x}_1\) is the sample mean of the “Smile” group:

The plot below corresponds with the test of \(H_0: \mu_1 - \mu_2 = 0\) vs \(H_1: \mu_1 - \mu_2 < 0\). Notice the impact of specifying the wrong direction:

Question #7

Using your randomization plot constructed in Lab Question #6, formally conduct a test of the hypothesis that drivers leave faster when another car is waiting. Report your null and alternative hypotheses (recall that you already stated them earlier), your p-value, and a one-sentence conclusion.

Two-sided Tests

One-sided hypothesis tests are risky: if you get the direction wrong, you can completely miss out on an interesting discovery. This might seem like a minor inconvenience; couldn’t we just switch hypotheses after seeing the data?

In its purest sense, statistical testing is only meaningful when a hypothesis is specified a priori (ahead of time). Once you’ve seen the data, important properties of statistical testing might not hold if you form your hypotheses around what you saw. Post hoc testing (and hypothesis formation) is suspected to be a contributing factor in the reproducibility crisis facing many areas of scientific research. Practically speaking, one-sided tests are almost never used, because not only do they look suspicious, they also risk missing important findings.

Two-sided tests are a little trickier to conduct in StatKey due to ambiguity as to how to define “at least as unexpected” in two opposite directions. One possible definition, which you can use by clicking “Two-Tail”, is to double the p-value from the correctly specified one-sided test. An example of this is shown below; the two-sided p-value in this example is 0.058:

An alternate, but equally valid, approach is to interpret “at least as unexpected” in terms of distance from the null value. This often requires separate specification of the right and left tail cutoffs. Notice that the two-sided p-value here is 0.057:
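Both definitions of a two-sided p-value can be sketched in a few lines of Python. The function names and the ten randomization statistics are made up for illustration. Taking the smaller of the two tail proportions as the “correctly specified” one-sided p-value, and capping the doubled value at 1, are assumptions of this sketch.

```python
def two_sided_by_doubling(null_stats, observed):
    """Double the smaller one-sided tail proportion (capped at 1)."""
    n = len(null_stats)
    left = sum(1 for s in null_stats if s <= observed) / n
    right = sum(1 for s in null_stats if s >= observed) / n
    return min(1.0, 2 * min(left, right))

def two_sided_by_distance(null_stats, observed, null_value=0.0):
    """Proportion of statistics at least as far from the null value as observed."""
    n = len(null_stats)
    distance = abs(observed - null_value)
    return sum(1 for s in null_stats if abs(s - null_value) >= distance) / n

# Ten made-up, slightly asymmetric randomization statistics (hypothetical)
null_stats = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 3]
observed = 3

p_doubling = two_sided_by_doubling(null_stats, observed)
p_distance = two_sided_by_distance(null_stats, observed)

print(p_doubling)  # 0.2
print(p_distance)  # 0.1
```

The two approaches agree exactly when the randomization distribution is symmetric around the null value. With a few thousand randomization samples they are usually close, as in the 0.058 and 0.057 values reported above.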

Question #8

Using your randomization plot constructed in Lab Question #6, conduct a two-sided hypothesis test. Include a screenshot of your randomization distribution, report your p-value, and make a one-sentence conclusion addressing the original research question.

How is StatKey Generating Randomization Samples?

Up until now we’ve been vague about how the randomization samples are simulated. In general, they are constructed such that the following are satisfied:

1) The randomization samples, on average, will satisfy the null hypothesis (i.e., the center of the randomization distribution is the null value)
2) The standard error of the randomization distribution correctly reflects the variability in the observed data

The details of how these properties are fulfilled in various testing situations are described below:

Randomization Testing for a Single Mean: StatKey takes the original data points and shifts their location such that the shifted mean is the value that is hypothesized under the null. The shifted data points are then sampled with replacement to ensure the appropriate amount of variability.
Randomization Testing for a Single Proportion: StatKey simulates weighted coin flips using the proportion specified in the null hypothesis. For example, under \(H_0: p = .3\), StatKey flips weighted coins with a 30% chance of coming up “heads”.
Randomization Testing for a Difference in Means: StatKey provides you three different options here. The default is to reallocate (shuffle) the group labels to each existing data point.
Randomization Testing for a Difference in Proportions: StatKey also reallocates group labels here.
Randomization Testing for Correlation and Regression: The Y variable (the response) is shuffled so that each value of X (the explanatory variable) ends up paired with a different, randomly chosen value of Y in each randomization sample.
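The single-mean, single-proportion, and correlation mechanisms described above can be sketched in Python as follows. This is an illustration of the descriptions in this section, not StatKey's actual code. All function names and data values are hypothetical.

```python
import random
from statistics import mean

def single_mean_sample(data, null_mean, rng):
    """Shift the data so its mean equals null_mean, then resample with replacement."""
    shift = null_mean - mean(data)
    shifted = [x + shift for x in data]
    resample = rng.choices(shifted, k=len(shifted))  # sampling WITH replacement
    return mean(resample)

def single_proportion_sample(n, null_p, rng):
    """Flip n weighted coins with P(heads) = null_p and return the proportion of heads."""
    heads = sum(1 for _ in range(n) if rng.random() < null_p)
    return heads / n

def shuffle_response(x, y, rng):
    """Pair each x with a randomly reordered y, breaking any real association."""
    y_shuffled = list(y)
    rng.shuffle(y_shuffled)
    return list(zip(x, y_shuffled))

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Made-up data with sample mean 8, tested against a hypothetical H0: mu = 10
data = [4.0, 6.0, 7.0, 9.0, 14.0]
mean_stats = [single_mean_sample(data, 10.0, rng) for _ in range(2000)]

# Hypothetical H0: p = 0.3 with a sample size of 50
prop_stats = [single_proportion_sample(50, 0.3, rng) for _ in range(2000)]

# One shuffled pairing of made-up X and Y values
pairs = shuffle_response([1, 2, 3, 4], [10, 20, 30, 40], rng)

print("center of single-mean distribution:", round(mean(mean_stats), 2))
print("center of single-proportion distribution:", round(mean(prop_stats), 3))
print("one shuffled pairing:", pairs)
```

In each case the center of the simulated statistics should land close to the null value, which is the first of the two properties listed at the start of this section.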

Question #9

In randomization testing for a single mean, StatKey re-samples the shifted data points with replacement. Why is replacement necessary? Explain in 1-2 sentences.

Question #10

For randomization testing of a difference in means (or a difference in proportions), StatKey reallocates the group labels. Briefly explain why this approach satisfies the criteria of 1) and 2) stated at the start of this section.

Case Study #2 - Hair Color and Eye Color

At the University of Delaware, Snee (1974) collected data on the hair color, eye color, and sex of 592 statistics students. The variables recorded in the data are:

Hair - Black, Brown, Red, Blond
Eye - Brown, Blue, Hazel, Green
Sex - Male, Female

These data were originally collected to illustrate various possible frequency tables. We will use them to try to answer a few questions about the relationship between hair and eye color. You can find the data “HairEyeColor” at the link here.

Question #11

Suppose we’d like to determine whether individuals with brown hair are more likely to have blue eyes or brown eyes. Briefly explain why hypothesis testing is necessary to answer this research question. Why can’t we just conclude that the larger conditional proportion is the more prevalent eye color?

Question #12

Translate the research question in Lab Question #11 into statistical hypotheses using the proper notation. Report these hypotheses and the relevant quantities from these data (use Minitab to find these).

Question #13

Use StatKey to perform the hypothesis test you detailed in Lab Question #12. Include a screenshot of your randomization distribution that shows your test’s p-value. Also provide a one-sentence conclusion in the context of the original research question.

Question #14

The hypothesis test you performed in Lab Question #13 evaluated whether differences in the proportions you observed could be due to random chance. How is random chance involved in this case study? (i.e., where does the randomness come from here?)

Question #15

The hypothesis test you performed in Lab Question #13 evaluated the role of random chance. Could any other factors explain the difference in proportions that you observed? Briefly explain why or why not.

On Your Own

Question #16

For this question I’d like you and your group to use the data from either case study presented in this lab to form and test a hypothesis of your choosing. You should include:

A sentence articulating your research question
Your null and alternative hypotheses
Your observed estimate and its \(p\)-value
A screenshot of the randomization distribution you used to determine the \(p\)-value
A sentence summarizing the results of the hypothesis test

Endnotes:

The data used in this lab are an artificial recreation designed to closely mimic those used by Ruback and Juieng (1997). To my knowledge, the original data are not publicly available.