From the Introduction to Modern Statistics (IMS) textbook, complete the following exercises:
Also complete the additional questions given below:
Question #1: This question uses data from a large, public university facing accusations of sex-based discrimination in its graduate school admissions. The data set includes each admission decision for the 6 largest graduate departments. Each applicant was given an anonymous identifier.
adm = read.csv("https://remiller1450.github.io/data/admissions.csv")
[...] dept and sex. Briefly explain how these variables are related.
[...] dept and admit. Briefly explain how these variables are related.
Is dept confounding the relationship between the variables sex and admit? Briefly explain.
Use the filter() function to create a data set containing only cases that applied to department A.
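As a rough illustration of how filter() keeps only the rows that satisfy a condition, the sketch below uses a small made-up data frame. The object toy, its columns, and its values are hypothetical and are not the admissions data; the sketch also assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, NOT the admissions data set
toy = data.frame(
  id    = 1:6,
  group = c("A", "B", "A", "C", "B", "A"),
  score = c(10, 12, 9, 15, 11, 14)
)

# Keep only the rows where group equals "A"; all columns are retained
toy_A = filter(toy, group == "A")

# Check how many rows survived the filter
nrow(toy_A)
```

The condition inside filter() is evaluated row by row, so any logical expression built from the data frame's columns can be used in place of group == "A".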
Question #2: This question uses team data for each of the 82 regular season games played by the Golden State Warriors during their record-setting 2015-16 season.
gsw = read.csv("https://remiller1450.github.io/data/GSWarriors.csv")
Use the mutate() function in the dplyr package to create a new variable called “pt_margin” that is the difference between the points scored by the Warriors (i.e., Points) and the points scored by their opponent (i.e., OppPoints). This variable should have positive values when the Warriors outscored their opponent. Hint: The mutate() function was covered in the dplyr lab.
[...] FG3A (the number of 3-point shot attempts by the Warriors) and the variable pt_margin (that you created in Part A). Briefly describe the relationship between these two variables.
Fit a regression model with the response variable pt_margin and the explanatory variable FG3A. Interpret the estimated slope coefficient for the variable FG3A.
Fit a regression model with the response variable pt_margin and the explanatory variables FG3 (the number of made 3-point shots) and FG3A. Interpret the estimated slope coefficient for the variable FG3A.
Explain why the estimated slope coefficient for FG3A is very different in the model from Part E when compared to its value in the model from Part D.
Fit a regression model that uses Location and FG3 to predict the response variable pt_margin. Interpret the estimated coefficient of the re-coded variable LocationHome.
Remove FG3 from the model you fit in Part G (i.e., fit a regression model with the single predictor Location and the response variable pt_margin). Provide a brief explanation as to why the estimated coefficient of the re-coded variable LocationHome in this model is approximately the same as the estimated coefficient of this variable in the model from Part G (which also included FG3).
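As a hedged sketch of the two tools this question relies on, the example below applies mutate() and then fits a regression with a categorical predictor on a small made-up data frame. The object toy, its column names, and its values are hypothetical and are not the Warriors data. Using base R's lm() to fit the model is an assumption, since the assignment does not name a fitting function; the sketch also assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, NOT the Warriors data set
toy = data.frame(
  team_pts = c(102, 95, 110, 99, 120, 101),
  opp_pts  = c(98, 100, 104, 99, 111, 107),
  site     = c("Home", "Away", "Home", "Away", "Home", "Away")
)

# mutate() adds a new column computed from existing columns;
# margin is positive when team_pts exceeds opp_pts
toy = mutate(toy, margin = team_pts - opp_pts)

# Regress the new column on a categorical predictor. R re-codes site
# into a 0/1 indicator named "siteHome" (reference level: "Away")
fit = lm(margin ~ site, data = toy)

# Named vector holding the intercept and the siteHome coefficient
coef(fit)
```

With a single two-level categorical predictor, the coefficient of the re-coded indicator equals the difference between the two group means of the response, which is one way to sanity-check the fitted value.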