This lab is intended to provide practice and insight in applying ANOVA to real data. Due to time constraints, it will be slightly shorter than previous labs.
Directions (Please read before starting)
Ronald Fisher, creator of Fisher’s exact test, popularizer of the \(p\)-value, and arguably the most famous statistician in history, is also credited with assembling a well-known dataset of 150 iris flowers: 50 samples from each of three species (setosa, versicolor, and virginica).
data("iris") ## The iris data are pre-loaded into R
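Four features are recorded for each flower, each measured in centimeters: “Sepal.Length”, “Sepal.Width”, “Petal.Length”, and “Petal.Width”. A fifth column, “Species”, identifies the species of each flower.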
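Before starting the questions, it can help to confirm the layout of the data. The following is a minimal sketch using only base R functions:

```r
data("iris")          # load the built-in data frame

str(iris)             # variable names and types for each column
table(iris$Species)   # number of flowers recorded for each species
```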
Question #1: Use side-by-side boxplots to compare the distribution of “Sepal.Width” across these three species of iris. Based upon what you see, do you believe that one-way ANOVA is likely to produce a statistically significant \(p\)-value?
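As a syntax reminder, here is a sketch of side-by-side boxplots on a different built-in dataset, PlantGrowth (chosen here only for illustration; it is not part of this lab). The same formula pattern, outcome ~ group, should carry over to the iris data.

```r
data("PlantGrowth")  # built-in example data: plant weight for three treatment groups

# The formula outcome ~ group draws one box for each level of the grouping factor
bp <- boxplot(weight ~ group, data = PlantGrowth,
              xlab = "Treatment group", ylab = "Dried plant weight",
              main = "Weight by treatment group")
```

Boxes whose medians sit far apart relative to their spread suggest that ANOVA may find a difference in means.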
Question #2: Use one-way ANOVA to determine whether average sepal widths are significantly different across different species of iris. Your answer should clearly state the null and alternative hypotheses, provide the \(F\)-value and \(p\)-value, and make a conclusion about the overall relationship between species and sepal width.
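For three groups, the hypotheses take the form \(H_0: \mu_1 = \mu_2 = \mu_3\) versus \(H_a:\) at least one group mean differs from the others. The sketch below fits a one-way ANOVA with aov, again using the PlantGrowth data purely as a stand-in example, and shows one way to pull the \(F\)-value and \(p\)-value out of the summary table.

```r
data("PlantGrowth")

# Fit the one-way ANOVA model: numeric outcome ~ grouping factor
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)                      # prints the full ANOVA table

# The first element of the summary is the table itself;
# its first row belongs to the grouping factor
tab     <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
c(F = f_value, p = p_value)
```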
Question #3: Create a QQ-plot to evaluate whether the Normality assumption of one-way ANOVA is satisfied for the model you used in Question #2. Briefly comment upon whether you think that one-way ANOVA could be appropriate for these data without making any data transformations.
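The Normality assumption concerns the model's residuals, so a QQ-plot for ANOVA is usually drawn from the residuals of the fitted model. A minimal sketch, assuming a model fit with aov on the illustrative PlantGrowth data:

```r
data("PlantGrowth")
fit <- aov(weight ~ group, data = PlantGrowth)

res <- residuals(fit)   # observed value minus its group mean
qqnorm(res)             # sample quantiles against theoretical Normal quantiles
qqline(res)             # reference line; points near it support Normality
```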
Question #4: Use the subset and sd functions (or the dplyr package) to find the sample standard deviation of “Sepal.Width” for each type of iris (you’ll need to do this three times, once for each type of iris). Based upon these values, do you think the equal variance assumption of one-way ANOVA is satisfied for the model you used in Question #2?
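The subset-then-sd pattern can be sketched as follows, once more on the illustrative PlantGrowth data rather than iris. The tapply call at the end is an optional base R shortcut that computes all group standard deviations at once.

```r
data("PlantGrowth")

# Keep only the rows for one group, then take the sd of the outcome column
ctrl    <- subset(PlantGrowth, group == "ctrl")
sd_ctrl <- sd(ctrl$weight)
sd_ctrl

# Shortcut: apply sd to weight within each level of group
sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
sds
```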
Question #5: Use Tukey’s HSD method to perform a post-hoc analysis. Based upon this analysis, which pair of species appears to differ the most in terms of sepal width?
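TukeyHSD takes a model fit with aov and returns every pairwise difference in group means, along with a confidence interval and an adjusted \(p\)-value for each pair. A minimal sketch on the illustrative PlantGrowth data:

```r
data("PlantGrowth")
fit <- aov(weight ~ group, data = PlantGrowth)

tk <- TukeyHSD(fit)   # all pairwise comparisons, 95% family-wise confidence by default
tk                    # columns: diff, lwr, upr, p adj
plot(tk)              # intervals that exclude zero indicate a significant difference

# The pair with the largest absolute difference in means
pairs_tab <- tk$group
rownames(pairs_tab)[which.max(abs(pairs_tab[, "diff"]))]
```

The result is a list with one matrix per factor in the model, so tk$group here is named after the grouping variable in the formula.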