Directions:
Install the tinytex package by running
install.packages('tinytex')
followed by
tinytex::install_tinytex()
The Indoor Obstacle Course (IOCT) is an obstacle course that all Cadets at West Point Academy are tested on and must successfully complete prior to graduation. The data set below records IOCT completion times for 384 graduates of the academy:
ioct = read.csv("https://data.scorenetwork.org/data/ioct_west_point.csv")
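Before starting the analysis it can help to confirm that a CSV was read as expected. The following is a minimal sketch that uses a small inline CSV as a stand-in; the object names `csv_text` and `toy` and all values are hypothetical and are not taken from the IOCT data.

```r
# Hypothetical inline CSV standing in for a downloaded file (NOT the IOCT data)
csv_text <- "id,group,value
1,a,10.5
2,b,NA
3,a,7.25"

# read.csv() accepts a 'text' argument in place of a file path or URL
toy <- read.csv(text = csv_text)

# Column types and the first few values
str(toy)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```

The same `str()` and `colSums(is.na(...))` calls can be applied to any data frame returned by `read.csv()`.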
Part A: Find Pearson’s correlation coefficient between height and IOCT_Time. Based upon this correlation, do taller Cadets tend to finish the course faster or slower than shorter Cadets? Briefly explain.
Part B: Create a scatterplot of the explanatory variable height and the response variable IOCT_Time that includes both linear and moving average smoothers. Briefly explain why Pearson’s correlation coefficient is appropriate for measuring the strength of association for these data.
Part C: Fit the linear regression model IOCT_Time ~ height and interpret the slope coefficient that describes the effect of changes in height on the expected completion time in the fitted model.
Part D: Create a visualization to assess whether sex and height are associated in these data. Using your visualization, briefly explain whether or not you believe these variables are associated.
Part E: Create a visualization to assess whether sex and IOCT_Time are associated in these data. Using your visualization, briefly explain whether or not you believe these variables are associated.
Part F: Briefly explain whether sex confounds the relationship between height and IOCT_Time.
Part G: Use the group_by() and summarize() functions in the dplyr package to perform a stratified analysis that reports the correlation between height and IOCT_Time separately for each category of sex. Hint: If you are specifying the data as the first step in the pipeline, you should reference the variables height and IOCT_Time without using the $ operator inside of the summarize() function.
Part H: Does height appear associated with IOCT_Time after you account for the variable sex?
Part I: Fit the linear regression model IOCT_Time ~ height + sex and interpret the slope coefficient that describes the effect of changes in height on the expected completion time in the fitted model. Briefly explain the difference between this effect and the one from Part C.
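The hint above about avoiding the $ operator inside summarize() can be illustrated on a toy data frame. This is only a sketch: the data frame `toy`, its columns `group`, `x`, and `y`, and all values are hypothetical and are not the IOCT data. Bare column names are evaluated within each group, whereas `toy$x` always refers to the full column and so ignores the grouping.

```r
library(dplyr)

# Hypothetical toy data (NOT the IOCT data): two groups with made-up values
toy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(2, 4, 6, 9, 6, 3)
)

# Bare column names: cor() is computed separately within each group
by_group <- toy %>%
  group_by(group) %>%
  summarize(r = cor(x, y), n = n())
print(by_group)

# Using $ pulls the entire column, so every group reports the same
# overall correlation and the stratification is silently lost
not_stratified <- toy %>%
  group_by(group) %>%
  summarize(r = cor(toy$x, toy$y))
print(not_stratified)
```

In the first result the two groups have different correlations; in the second both rows are identical, which is the mistake the hint is warning about.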
For this question you’ll work with the “2023 Boston Marathon” data set found below:
marathon = read.csv("https://data.scorenetwork.org/data/boston_marathon_2023.csv")
This data set records information on each finisher of the 2023 Boston Marathon, with the variable finish_net_sec (finishing time in seconds) being the outcome of interest for this question.
Part A: Create a visualization of the variable age_group and briefly describe the distribution of finisher ages.
Part B: Fit the linear regression model finish_net_sec ~ age_group. Based upon your fitted model, which age group has the fastest expected finish time?
Part C: Create a visualization of the relationship between half_time_sec (the runner’s time at the race’s halfway point) and finish_net_sec. Using this visualization, briefly describe the relationship between these two variables.
Part D: Fit the linear regression model finish_net_sec ~ half_time_sec. Briefly interpret the slope and intercept of this model. If the intercept is not meaningful, you should indicate this.
Part E: Fit the linear regression model finish_net_sec ~ half_time_sec + age_group. Explain why the coefficient of “age_group50-54” in this model is so different from the coefficient in the model you fit in Part B.
Part F: Compare the models finish_net_sec ~ half_time_sec + age_group and finish_net_sec ~ half_time_sec. Does it seem like including age group improves the model? Or does including this variable produce an overfit model?
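One possible way to compare two nested regression models is sketched below. It uses R's built-in mtcars data purely as a stand-in, not the marathon data. The choice of adjusted R-squared, AIC, BIC, and a nested F-test is an assumption, so use whichever comparison criteria your course materials specify.

```r
# Built-in mtcars data used only as a stand-in (NOT the marathon data)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Criteria that penalize extra parameters; adjusted R-squared is better
# when higher, AIC and BIC are better when lower
comparison <- data.frame(
  model  = c("mpg ~ wt", "mpg ~ wt + factor(cyl)"),
  adj_r2 = c(summary(fit_small)$adj.r.squared,
             summary(fit_large)$adj.r.squared),
  aic    = c(AIC(fit_small), AIC(fit_large)),
  bic    = c(BIC(fit_small), BIC(fit_large))
)
print(comparison)

# F-test of whether the added categorical predictor reduces the
# residual sum of squares by more than chance alone would suggest
nested_test <- anova(fit_small, fit_large)
print(nested_test)
```

If the larger model improves the penalized criteria and the F-test gives a small p-value, the extra variable is probably contributing real information. If the criteria barely move or get worse, the added terms may be overfitting.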