This lab is intended to provide practice and insight in applying ANOVA to real data. Due to time constraints, it will be slightly shorter than previous labs.

Directions (Please read before starting)

  1. Please work together with your assigned group. Even though you’ll turn in a write-up that is later scored, labs are intended to be formative, and a substantial portion of the credit you’ll receive is based upon effort and completion.
  2. Please record your responses and code in an R Markdown document following the conventions we’ve used in previous labs.

\(~\)

Case Study - Fisher’s Iris Data

Ronald Fisher, creator of Fisher’s exact test, popularizer of the \(p\)-value, and arguably the most famous historical statistician, is also credited with assembling a well-known dataset consisting of 50 samples from each of three different species of iris flowers (150 flowers in total): setosa, versicolor, and virginica.

data("iris")  ## The iris data are pre-loaded into R

Four features are recorded for each flower:

  • Petal.Length and Petal.Width - dimensions (in cm) of each flower’s petals
  • Sepal.Length and Sepal.Width - dimensions (in cm) of each flower’s sepals (the leaf-like parts adjacent to the flower’s petals)
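
Before starting the questions, it can help to confirm the structure described above. The following is a minimal sketch using only base R functions; it checks the size of the data frame and the number of flowers recorded for each species.

```r
data("iris")          # load the built-in iris data frame

dim(iris)             # 150 rows (flowers) and 5 columns
str(iris)             # four numeric measurements plus the factor Species
table(iris$Species)   # 50 flowers for each of the three species
```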

\(~\)

Question #1: Use side-by-side boxplots to compare the distribution of “Sepal.Width” across these three species of iris. Based upon what you see, do you believe that one-way ANOVA is likely to produce a statistically significant \(p\)-value?
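
As a reminder of the syntax, here is a minimal sketch of side-by-side boxplots using base R's formula interface. It deliberately uses a different variable, Petal.Length, so the comparison asked for in this question is still yours to make; the axis labels are illustrative choices, and you should follow whatever plotting conventions your group used in previous labs.

```r
data("iris")

# Formula interface: numeric response ~ grouping factor.
# boxplot() draws the plot and invisibly returns the summary
# statistics it used, which are stored here in bp.
bp <- boxplot(Petal.Length ~ Species, data = iris,
              xlab = "Species", ylab = "Petal length (cm)")

bp$names   # the group labels, one per box
```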

Question #2: Use one-way ANOVA to determine whether average sepal widths are significantly different across different species of iris. Your answer should clearly state the null and alternative hypotheses, provide the \(F\)-value and \(p\)-value, and make a conclusion about the overall relationship between species and sepal width.
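
One possible way to fit a one-way ANOVA model in R is with the aov function, sketched below. The example again uses Petal.Length as a stand-in response, and the object name fit is an arbitrary choice. The summary table reports the degrees of freedom, sums of squares, mean squares, the \(F\)-value, and the \(p\)-value.

```r
data("iris")

# Fit a one-way ANOVA: numeric response ~ grouping factor
fit <- aov(Petal.Length ~ Species, data = iris)

# ANOVA table: one row for Species (between groups),
# one row for Residuals (within groups)
summary(fit)
```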

Question #3: Create a QQ-plot to evaluate whether the Normality assumption of one-way ANOVA is satisfied for the model you used in Question #2. Briefly comment upon whether you think that one-way ANOVA could be appropriate for these data without making any data transformations.
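
A QQ-plot for an ANOVA model is typically drawn from the model's residuals (each observation minus its group mean). The sketch below shows one way to do this with base R, assuming a model fit with aov; as before, it uses Petal.Length rather than the variable in this question, and the names fit and res are arbitrary.

```r
data("iris")

fit <- aov(Petal.Length ~ Species, data = iris)
res <- residuals(fit)   # observation minus its species mean

qqnorm(res)             # sample quantiles vs. theoretical Normal quantiles
qqline(res)             # reference line through the first and third quartiles
```

Points that track the reference line closely suggest the residuals are roughly Normal; systematic curvature or heavy tails suggest otherwise.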

Question #4: Use the subset and sd functions (or the dplyr package) to find the sample standard deviation of “Sepal.Width” for each species of iris (you’ll need to do this three times, once for each species). Based upon these values, do you think the equal variance assumption of one-way ANOVA is satisfied for the model you used in Question #2?
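
The subset-then-sd pattern can be sketched as follows for a single species, again using Petal.Length as a stand-in. The tapply call at the end is an alternative not mentioned above; it is a base R function that applies sd within each level of Species in one step, and is shown only as a way to check your work.

```r
data("iris")

# Keep only the rows for one species, then take the standard deviation
setosa <- subset(iris, Species == "setosa")
sd_setosa <- sd(setosa$Petal.Length)
sd_setosa

# Alternative (an assumption, not required by the lab):
# compute the standard deviation within every species at once
sds <- tapply(iris$Petal.Length, iris$Species, sd)
sds
```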

Question #5: Use Tukey’s HSD method to perform a post-hoc analysis. Based upon this analysis, which species appear to be the most different (in terms of sepal width)?
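
Tukey’s HSD can be applied to a fitted aov model with the base R function TukeyHSD, as in the sketch below. The example uses Petal.Length as a stand-in response, and the 95% confidence level is written out explicitly even though it is the function's default.

```r
data("iris")

fit <- aov(Petal.Length ~ Species, data = iris)

# All pairwise comparisons of species means, with
# family-wise adjusted confidence intervals and p-values
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey

# Each row is one pair of species; columns are the estimated
# difference (diff), interval bounds (lwr, upr), and adjusted p-value (p adj)
tukey$Species
```

With three species there are three pairwise comparisons; the pair with the largest estimated difference (in absolute value) is the most different on the response being analyzed.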