Lab #15 - Analysis of Variance (ANOVA)

Directions (read before starting)

Please work together with your assigned partner. Make sure you both fully understand something before moving on.
Record your answers to lab questions separately from the lab’s examples. You and your partner should only turn in responses to lab questions, nothing more and nothing less.
Ask for help, clarification, or even just a check-in if anything seems unclear.

$~$

Introduction

This brief lab covers the major steps involved in one-way analysis of variance (ANOVA) in R. It is intentionally short to allow you additional time to work on your project this week.

Examples and Overview

One-way ANOVA is used to statistically assess the relationship between a quantitative outcome variable and a categorical explanatory variable, usually when the categorical variable contains more than 2 categories.

The one-way ANOVA model states: \[y_i = \mu_i + \epsilon_i\]

In words, the outcome observed for the $i^{th}$ case is determined by two components:
- A group-specific mean, $\mu_i$, where the group is determined by the categorical explanatory variable
- A deviation (or error) from the group mean
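To make the two components concrete, here is a minimal sketch using R's built-in PlantGrowth data (plant weights under three treatment groups), chosen only as a stand-in that needs no download; it is not one of this lab's data sets. Each case's fitted value from aov() is its group mean, and its residual is the deviation from that mean.

```r
# Stand-in data: built-in PlantGrowth (weight by treatment group: ctrl, trt1, trt2)
fit <- aov(weight ~ group, data = PlantGrowth)

# Group-specific means
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_means

# mu_i: each case's own group mean, repeated once per case
mu_i <- ave(PlantGrowth$weight, PlantGrowth$group)
fitted_vals <- unname(fitted(fit))

# eps_i: deviations of each case from its group mean
eps_i <- unname(residuals(fit))

all.equal(fitted_vals, mu_i)                  # TRUE: fitted values are the group means
all.equal(mu_i + eps_i, PlantGrowth$weight)   # TRUE: y_i = mu_i + eps_i
```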

We compare the fit of this model versus the fit of the null model which ignores the groups created by the categorical explanatory variable and simply pools all of the data together.
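This comparison can be made explicit by fitting both models with lm() and comparing them with anova(). The sketch below again uses the built-in PlantGrowth data as an illustrative stand-in; the nested-model $F$-statistic it reports should match the one in the summary(aov()) table.

```r
# Null model: one common mean for every case (groups ignored, data pooled)
null_model <- lm(weight ~ 1, data = PlantGrowth)

# One-way ANOVA model: a separate mean for each group
group_model <- lm(weight ~ group, data = PlantGrowth)

# F-test comparing the fit of the two nested models
cmp <- anova(null_model, group_model)
cmp

# The F-statistic agrees with the one reported by summary(aov())
F_nested <- cmp$F[2]
F_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]
all.equal(F_nested, F_aov)   # TRUE
```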

Step 1 - Hypotheses

One-way ANOVA evaluates the global null hypothesis $H_0: \mu_1 = \mu_2 = \ldots = \mu_k$, which states that the means within every group created by the categorical variable are all equal.

The alternative hypothesis is that at least one group has a different mean.
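Written out for $k$ groups, the alternative hypothesis only requires a single pair of means to differ; it does not claim that every group mean is different from every other:

\[H_A: \mu_j \neq \mu_l \text{ for at least one pair of groups } j \neq l\]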

Step 2 - Comparison of null and alternative models

In R, the aov() function will fit a one-way ANOVA model, and we can use the summary() function to extract the relevant results from that model:

tsa = read.csv("https://remiller1450.github.io/data/tsa_small.csv")

my_model = aov(Close_Amount ~ Claim_Site, data = tsa)
summary(my_model)

##               Df    Sum Sq Mean Sq F value   Pr(>F)    
## Claim_Site     2   2949087 1474543   16.75 5.64e-08 ***
## Residuals   4997 439979039   88049                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this example, the $F$-statistic summarizing the comparison between the null model and the one-way ANOVA model is 16.75, and the corresponding $p$-value is $< 0.0001$ (5.64e-08), suggesting these data provide strong evidence that the average close amount is not the same at every claim site (at least one claim site's mean differs).
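The $F$-statistic is a ratio of two mean squares, each a sum of squares divided by its degrees of freedom. As a check on the table above, with $k = 3$ claim sites and $n - k = 4997$ residual degrees of freedom (so $n = 5000$ claims used in the fit):

\[F = \frac{SSG/(k-1)}{SSE/(n-k)} = \frac{2949087/2}{439979039/4997} \approx \frac{1474543.5}{88048.6} \approx 16.75\]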

The Residuals row of the ANOVA table describes the “error” source of variability, or $SSE$, while the Claim_Site row describes the “group” source of variability, or $SSG$. The table omits the “total” or $SST$, but we could calculate it via addition if desired.
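A sketch of that addition in R, pulling the sums of squares out of the summary() object. It runs on the built-in PlantGrowth data as a stand-in; using my_model in place of fit would give the TSA values. It also confirms that $SST$ is the sum of squared deviations from the pooled mean (the error of the null model) and recomputes $F$ from the mean squares.

```r
# Stand-in data: built-in PlantGrowth; my_model from above works the same way
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]              # ANOVA table as a data frame

SSG <- tab[["Sum Sq"]][1]             # group row
SSE <- tab[["Sum Sq"]][2]             # Residuals row
SST <- SSG + SSE                      # total variability

# SST equals the squared deviations from the overall (pooled) mean
y <- PlantGrowth$weight
all.equal(SST, sum((y - mean(y))^2))  # TRUE

# F = (SSG / df_group) / (SSE / df_residual)
df_g <- tab[["Df"]][1]
df_e <- tab[["Df"]][2]
F_manual <- (SSG / df_g) / (SSE / df_e)
all.equal(F_manual, tab[["F value"]][1])  # TRUE
```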

Step 3 - Model assessment

One-way ANOVA makes two main assumptions:

- The residuals are Normally distributed (an assumption related to $\epsilon_i$ in the null and alternative models)
- Each group has an equal standard deviation (combined with the first assumption, this implies that the outcomes in each group follow a Normal curve with the same amount of variability but possibly a different mean)

If these assumptions are not met, the $p$-value resulting from one-way ANOVA can be inaccurate, producing an incorrect or misleading conclusion.

We can evaluate the first assumption by graphing the residuals:

library(ggplot2)   # provides ggplot() and geom_histogram()
my_residuals = my_model$residuals
ggplot() + geom_histogram(aes(x = my_residuals), bins = 25)

Unfortunately, the residuals do not appear to follow a Normal distribution, so we might want to report these ANOVA results with caution or consider transforming our data (i.e., using mutate() and functions such as log() to create a new, less skewed outcome variable).
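A minimal sketch of that transformation. The lab suggests dplyr's mutate(); to avoid package dependencies this sketch uses base R's transform(), which adds a column the same way (with dplyr loaded, mutate(warpbreaks, log_breaks = log(breaks)) is equivalent). The built-in warpbreaks data (yarn break counts by loom tension) is only a stand-in for a strictly positive outcome. Because log() of 0 is -Inf, an outcome containing zeros would need a different choice, such as log(y + 1). As one option for re-checking Normality, the sketch uses a Normal quantile-quantile (Q-Q) plot from base R's qqnorm() and qqline().

```r
# Stand-in data: built-in warpbreaks (breaks = count, tension = L/M/H)
wb <- transform(warpbreaks, log_breaks = log(breaks))

# log() gives -Inf for zeros and NaN for negatives, so confirm all values are finite
stopifnot(all(is.finite(wb$log_breaks)))

# Refit the one-way ANOVA using the log-transformed outcome
log_model <- aov(log_breaks ~ tension, data = wb)
summary(log_model)

# Normal Q-Q plot of the new residuals: points near the line suggest roughly
# Normal residuals; an upward bend at the right end is a sign of right skew
log_res <- residuals(log_model)
qqnorm(log_res, main = "Normal Q-Q plot of residuals (log scale)")
qqline(log_res)
```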

We can evaluate the second assumption by looking at the standard deviation within each group using the group_by() and summarize() functions:

library(dplyr)   # provides %>%, group_by(), and summarize()
tsa %>% group_by(Claim_Site) %>% summarize(Group_Stdev = sd(Close_Amount))

## # A tibble: 3 × 2
##   Claim_Site      Group_Stdev
##   <chr>                 <dbl>
## 1 Checked Baggage        269.
## 2 Checkpoint             418.
## 3 Other                  771.

A popular “rule of thumb” is that one-way ANOVA is valid so long as the largest group standard deviation is no more than double the smallest group standard deviation.

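Unfortunately, this condition is also violated for these data: the largest group standard deviation (771, Other) is nearly three times the smallest (269, Checked Baggage).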
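The rule of thumb reduces to one ratio, the largest group standard deviation divided by the smallest, compared against 2. A base R sketch using tapply() in place of group_by()/summarize(), run on the built-in PlantGrowth data as a stand-in (for the TSA data, the outcome and grouping columns would be Close_Amount and Claim_Site):

```r
# Standard deviation of the outcome within each group
group_sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group_sds

# Rule of thumb: the largest SD should be no more than double the smallest
sd_ratio <- max(group_sds) / min(group_sds)
sd_ratio        # about 1.79 for PlantGrowth
sd_ratio <= 2   # TRUE for PlantGrowth
```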

Since neither one-way ANOVA condition is met, this is a scenario where we might opt for a simulation-based (randomization) test using StatKey.
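The same randomization idea can also be sketched directly in R: shuffle the group labels many times, recompute the $F$-statistic for each shuffle, and report the proportion of shuffled $F$-statistics at least as large as the observed one. The helper name f_stat(), the 5000 shuffles, the seed, and the PlantGrowth stand-in data are all illustrative choices, not part of the lab.

```r
# One-way ANOVA F-statistic computed from its definition
f_stat <- function(y, g) {
  k <- length(unique(g))
  n <- length(y)
  group_mean <- ave(y, g)                   # each case's group mean
  ssg <- sum((group_mean - mean(y))^2)      # between-group sum of squares
  sse <- sum((y - group_mean)^2)            # within-group sum of squares
  (ssg / (k - 1)) / (sse / (n - k))
}

y <- PlantGrowth$weight
g <- PlantGrowth$group
obs_F <- f_stat(y, g)

# Shuffling labels mimics data generated under H0 (no group differences)
set.seed(123)
perm_F <- replicate(5000, f_stat(y, sample(g)))

# Randomization p-value: share of shuffled F-statistics >= the observed F
p_perm <- mean(perm_F >= obs_F)
p_perm
```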

Another option is the rank-based Kruskal-Wallis test, implemented via kruskal.test(), which we will not have time to cover in detail. Like the Spearman rank correlation coefficient, this test uses the ranks of the data rather than the raw values.
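A minimal call, shown on the built-in PlantGrowth data as a stand-in; kruskal.test() accepts the same outcome ~ group formula as aov(). The last two lines show the ranking step the test is built on.

```r
# Kruskal-Wallis rank sum test with the same formula as aov()
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# Ranks of the pooled outcome (ties get average ranks); the group
# averages show which groups tend to sit higher
ranks <- rank(PlantGrowth$weight)
tapply(ranks, PlantGrowth$group, mean)
```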

Step 4 - Post-hoc testing

For the moment, we’ll set aside the violated assumptions we identified in Step 3 and move on to our final step, post-hoc pairwise testing.

The phrase “post-hoc” comes from Latin and means “after this” or “after the fact”. As the name suggests, we only do this step if the global test in Step 2 was statistically significant. Our goal in post-hoc testing is to determine which pairwise group differences produced the significant result.

There are many post-hoc tests that can be performed, but we’ll use Tukey’s Honest Significant Differences test:

TukeyHSD(my_model, conf.level = 0.95)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Close_Amount ~ Claim_Site, data = tsa)
## 
## $Claim_Site
##                                 diff        lwr       upr     p adj
## Checkpoint-Checked Baggage  65.91855  37.227230  94.60987 0.0000002
## Other-Checked Baggage      149.76277  -6.151553 305.67709 0.0628946
## Other-Checkpoint            83.84422 -73.976342 241.66478 0.4264091

The $p$-values from this test are automatically adjusted for multiple comparisons, keeping the family-wise Type 1 error rate at no more than 1 minus the given confidence level. So, in this example we can compare each adjusted $p$-value against $\alpha = 0.05$ and be confident that our chance of making any false positive finding (i.e., a Type 1 error) across all three comparisons is at most 5%.

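Thus, we can conclude that checkpoint claims have significantly higher close amounts than checked baggage claims (adjusted $p$-value $< 0.0001$). Neither comparison involving “Other” claims is statistically significant at $\alpha = 0.05$; the Other versus Checked Baggage comparison comes closest, with an adjusted $p$-value of about 0.063.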

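We might also notice that the largest estimated difference is between Other and Checked Baggage, with Other having an average close amount that is approximately $\$150$ larger. However, its 95% confidence interval (roughly $-\$6$ to $\$306$) includes zero, which is why this difference is not statistically significant.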
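TukeyHSD() returns a list with one matrix per factor, so the pairs below $\alpha$ can be filtered programmatically rather than read off by eye. A sketch on the built-in PlantGrowth data as a stand-in; for the TSA model, the matrix would be TukeyHSD(my_model)$Claim_Site.

```r
# Stand-in data: built-in PlantGrowth (three treatment groups)
fit <- aov(weight ~ group, data = PlantGrowth)
tk <- TukeyHSD(fit, conf.level = 0.95)

# One matrix per factor, named after it; columns are diff, lwr, upr, p adj
tab <- tk$group

# Keep only the pairs whose adjusted p-value falls below alpha
alpha <- 0.05
sig <- tab[tab[, "p adj"] < alpha, , drop = FALSE]
sig
```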

$~$

Lab

This lab contains two short applications, and the expectation is that you mirror the steps outlined in the previous section for each. You should report all relevant information, check assumptions, and adjust your approach if necessary.

Application #1

An individual’s critical flicker frequency is the highest frequency at which a flickering light source can be detected. At frequencies above the critical frequency, the light source appears to be continuous even though it is actually flickering.

The data below come from a study titled “The effect of iris color on critical flicker frequency” published in the Journal of General Psychology. The study recorded the critical flicker frequency and iris color (part of the eye) for $n = 19$ subjects:

flicker = read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/flicker/flicker.txt')

Question 1:

Part A: Write the null and alternative hypotheses for the use of one-way ANOVA to analyze these data.
Part B: Use aov() to fit the one-way ANOVA model. Print the ANOVA table and interpret the $p$-value. Be sure to make a conclusion that involves the context of these data.
Part C: Check both assumptions of the one-way ANOVA model for these data. If one or more assumptions seem unreasonable, perform a randomization test using StatKey or use the Kruskal-Wallis rank sum test (i.e., kruskal.test()).
Part D: If you found statistical significance in Part B, perform post-hoc testing to evaluate which pairs of groups differ. If you did not find statistical significance, briefly explain why post-hoc testing should not be done.

$~$

Application #2

A previous lecture discussed data from a driving simulator experiment where participants were grouped according to the “hardest” substance they regularly used and engaged in a simulated drive where they had to follow a lead vehicle to a destination. The outcome variable, D, is the subject’s average following distance (ft) throughout the drive.

You might recall that in this lecture we discussed the problems that arise when using multiple pairwise hypothesis tests in the same analysis.
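As a quick reminder of the issue: if $m$ tests are each run at $\alpha = 0.05$ and every null hypothesis is true, the chance of at least one false positive is $1 - (1 - 0.05)^m$, assuming the tests are independent. That assumption is a simplification (pairwise comparisons that share a group are not independent), so treat the numbers below as a rough guide. With $k$ groups there are $m = k(k-1)/2$ pairwise comparisons:

```r
alpha <- 0.05
k <- 2:6                      # number of groups
m <- choose(k, 2)             # number of pairwise comparisons
fwer <- 1 - (1 - alpha)^m     # P(at least one Type 1 error), assuming independence
data.frame(groups = k, comparisons = m, fwer = round(fwer, 3))
```

For example, three groups already give three comparisons and a family-wise error rate of about 0.143, nearly triple the nominal 0.05.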

driving = read.csv("https://remiller1450.github.io/data/Tailgating.csv")

Question 2:

Part A: Write the null and alternative hypotheses for the use of one-way ANOVA to analyze these data.
Part B: Use aov() to fit the one-way ANOVA model. Print the ANOVA table and interpret the $p$-value. Be sure to make a conclusion that involves the context of these data.
Part C: Check both assumptions of the one-way ANOVA model for these data. If one or more assumptions seem unreasonable, perform a randomization test using StatKey or use the Kruskal-Wallis rank sum test (i.e., kruskal.test()).
Part D: If you found statistical significance in Part B, perform post-hoc testing to evaluate which pairs of groups differ. If you did not find statistical significance, briefly explain why post-hoc testing should not be done.
Part E: A common strategy statisticians use to analyze right-skewed data is applying a log-transformation. In this data set, the column LD is the natural logarithm of the column D, making it a log-transformed variable. For this part, refit the one-way ANOVA model using LD as the outcome and briefly describe whether the model's assumptions seem more reasonable, as well as how the $p$-value changes compared with what you found in Part B.