Directions:
- Submit your work via the “Assignments” tab on Canvas
- For this assignment you should record your answers/code using R Markdown
- Please upload HTML, Word, or PDF output created using R Markdown and
make sure it contains your code, output, and written answers. You should
not include extraneous output, such as printing an entire data
frame.
- At this point in the course you are responsible for knowing how to
properly knit an R Markdown document, so uploading any other file format
will result in a point deduction on the assignment
- Homework is an individual assignment. It’s okay to check
your work or collaborate with your classmates, student mentors, and
others, but it is not okay to pass off their work as your own.
- Please clearly acknowledge any help you get from individuals other
than yourself, or resources other than the materials on our course
website (such as external websites and AI)
Question #1
For this question you should use the “diet” data set provided below.
These data come from a randomized experiment seeking to compare the
efficacy of three different weight loss diets. A subject’s assigned diet
is encoded as either 1, 2, or 3, and is recorded in the variable Diet.
The provided code coerces this variable to a factor (categorical variable).
diet_data = read.csv("https://remiller1450.github.io/data/diet.csv")
diet_data$Diet = factor(diet_data$Diet)
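To illustrate why the coercion to a factor matters, here is a minimal sketch on made-up data (the data frame toy and its columns code, code_f, and y are hypothetical and are not part of the diet data). When group labels are left as the numbers 1, 2, 3, aov() treats them as a single numeric predictor; after factor(), each label becomes its own group.

```r
# Hypothetical toy data: group labels stored as the numbers 1, 2, 3
toy <- data.frame(
  code = rep(1:3, each = 3),
  y    = c(1, 2, 3, 4, 5, 6, 8, 9, 10)
)

# Numeric labels: aov() fits one slope, so the term uses 1 degree of freedom
df_numeric <- summary(aov(y ~ code, data = toy))[[1]][["Df"]][1]

# Factor labels: aov() compares 3 group means, so the term uses 2 degrees of freedom
toy$code_f <- factor(toy$code)
df_factor <- summary(aov(y ~ code_f, data = toy))[[1]][["Df"]][1]

print(c(numeric = df_numeric, factor = df_factor))
print(levels(toy$code_f))
```

A quick check such as is.factor() or levels() on the coerced column is one way to confirm the coercion worked before running any analysis.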
- Part A: These data contain two variables that were
recorded at the end of the experiment, postWeight (final
body weight at the end of the experiment) and weightChange
(change in body weight from the start to the end of the experiment).
Which of these variables is the better outcome for the researchers to
focus on? Briefly explain your reasoning.
- Part B: Create an appropriate data visualization
showing the relationship between the explanatory variable Diet
and the outcome variable weightChange.
Based upon this visualization, does there appear to be an association
between diet and weight change?
- Part C: Use group_by() and summarize() from the dplyr
library to find the mean and standard deviation of the variable
weightChange in each assigned group.
- Part D: Consider using one-way ANOVA to analyze the
relationship between diet and weight change. Participant #66 was a
41-year-old female weighing 76 kg at the start of the study who was
assigned to Diet #3 and experienced a weight change of -5.0 kg. What is
this participant’s residual under the null model? What is this
participant’s residual under the alternative model? Show your
calculations of each.
- Part E: Use one-way ANOVA to evaluate the
relationship between diet and weight change. Provide a one-sentence
conclusion that includes appropriate context and cites the \(p\)-value. You do not need to check the
assumptions of your ANOVA yet.
- Part F: Using the ANOVA table resulting from the
test you performed in Part E, report the sums of squared residuals for
both the null and alternative models. Would you describe it as
likely or unlikely for these sums of squared residuals to be this
different if diet and weight change were independent? Briefly explain
your reasoning.
- Part G: Evaluate the two primary assumptions of the
one-way ANOVA you performed in Part E using either graphs, descriptive
statistics, or both. Briefly explain whether you believe these
assumptions are reasonable or not.
- Part H: Perform post-hoc pairwise testing to
determine which pairs of diets produced statistically significant
differences in weight change.
- Part I: Now use one-way ANOVA to evaluate the
relationship between the variables Diet and postWeight.
Provide a one-sentence conclusion that includes
appropriate context and cites the \(p\)-value. You do not need to check the
assumptions of this ANOVA test.
- Part J: Without actually performing any tests,
indicate whether you think it would be valuable to perform post-hoc
testing to expand upon the results of the one-way ANOVA you performed in
Part I. Briefly explain why you believe post-hoc testing would or would
not be worthwhile in this situation.
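As a reminder of the mechanics behind residuals and sums of squares in one-way ANOVA, the sketch below works through a made-up example. It assumes a hypothetical data frame toy with columns group and y. It does not use the diet data and is not a solution to any part above. Under the null model every observation is predicted by the overall mean; under the alternative model each observation is predicted by its own group mean.

```r
# Hypothetical toy data (NOT the diet data): 3 groups of 3 observations
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 3)),
  y     = c(1, 2, 3, 4, 5, 6, 8, 9, 10)
)

# Null model: a single overall mean predicts every observation
grand_mean <- mean(toy$y)
resid_null <- toy$y - grand_mean

# Alternative model: each observation is predicted by its group mean
group_means <- ave(toy$y, toy$group)  # ave() uses FUN = mean by default
resid_alt   <- toy$y - group_means

# Sums of squared residuals under each model
ss_null <- sum(resid_null^2)  # 80 for this toy data
ss_alt  <- sum(resid_alt^2)   # 6 for this toy data

# The ANOVA table reports the same quantities: the "Residuals" row is the
# alternative model's sum of squares, and the group row is the reduction
# in going from the null model to the alternative model
tab        <- summary(aov(y ~ group, data = toy))[[1]]
ss_between <- tab[["Sum Sq"]][1]
ss_within  <- tab[["Sum Sq"]][2]

print(c(null = ss_null, alternative = ss_alt,
        between = ss_between, within = ss_within))
```

In this toy example the two rows of the table add up to the null model's sum of squared residuals (74 + 6 = 80), which is the decomposition the F-test is built on.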
Note: There will be two additional questions added to this
assignment after Fall break on the topics of testing errors/multiple
comparisons and data transformations.