From the Introduction to Modern Statistics (IMS) textbook, complete the following exercises:
Also complete the additional questions given below:
Question #1: This question uses data from a large, public university facing accusations of sex-based discrimination in its graduate school admissions. The data set includes each admission decision for the 6 largest graduate departments. Each applicant was given an anonymous identifier.
adm = read.csv("https://remiller1450.github.io/data/admissions.csv")
[...] dept and sex. Briefly explain how these variables are related.
[...] dept and admit. Briefly explain how these variables are related.
Is dept confounding the relationship between the variables sex and admit? Briefly explain.
Use the filter() function to create a data set containing only cases that applied to department A.
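The steps in Question #1 can be sketched in R on a small made-up data frame. This is an illustration under stated assumptions, not the intended solution: the rows and the category labels ("Male", "Admitted", and so on) are hypothetical, and only the column names dept, sex, and admit come from the question. The real admissions.csv file may code these columns differently.

```r
library(dplyr)

# Hypothetical toy data: values are invented for illustration only.
toy_adm <- data.frame(
  dept  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sex   = c("Male", "Male", "Male", "Female",
            "Male", "Female", "Female", "Female"),
  admit = c("Admitted", "Admitted", "Rejected", "Admitted",
            "Rejected", "Rejected", "Rejected", "Admitted")
)

# Row proportions: how applicants of each sex are spread across departments
sex_by_dept <- prop.table(table(toy_adm$sex, toy_adm$dept), margin = 1)
print(sex_by_dept)

# Row proportions: admission outcomes within each department
admit_by_dept <- prop.table(table(toy_adm$dept, toy_adm$admit), margin = 1)
print(admit_by_dept)

# Keep only the cases that applied to department A
dept_a <- filter(toy_adm, dept == "A")
print(dept_a)
```

A variable such as dept can only confound the sex and admit relationship if it is associated with both of them, which is why the sketch tabulates dept against each variable before subsetting to one department.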
Question #2: This question uses team data for each of the 82 regular season games played by the Golden State Warriors during their record-setting 2015-16 season.
gsw = read.csv("https://remiller1450.github.io/data/GSWarriors.csv")
Part A: Use the mutate() function in the dplyr package to create a new variable called “pt_margin” that is the difference between the points scored by the Warriors (i.e., Points) and the points scored by their opponent (i.e., OppPoints). This variable should have positive values when the Warriors outscored their opponent. Hint: The mutate() function was covered in the dplyr lab.
[...] FG3A (the number of 3-point shot attempts by the Warriors) and the variable pt_margin (that you created in Part A). Briefly describe the relationship between these two variables.
Part D: Fit a regression model with the response variable pt_margin and the explanatory variable FG3A. Interpret the estimated slope coefficient for the variable FG3A.
Part E: Fit a regression model with the response variable pt_margin and the explanatory variables FG3 (the number of made 3-point shots) and FG3A. Interpret the estimated slope coefficient for the variable FG3A.
Part F: Briefly explain why the estimated slope coefficient of FG3A is very different in the model from Part E when compared to its value in the model from Part D.
Part G: Fit a regression model that uses Location and FG3 to predict the response variable pt_margin. Interpret the estimated coefficient of the re-coded variable LocationHome.
Part H: Remove the variable FG3 from the model you fit in Part G (i.e., fit a regression model with the single predictor Location and the response variable pt_margin). Briefly explain why the estimated coefficient of the re-coded variable LocationHome in this model is approximately the same as the estimated coefficient of this variable in the model from Part G (which also included FG3).
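The workflow in Question #2 can be sketched in R as follows. This is a minimal illustration under stated assumptions: the six games in toy_gsw are invented, and only the column names (Points, OppPoints, FG3A, FG3, Location) come from the question. The "Home" and "Away" labels are also an assumption, suggested by the name LocationHome. Coefficients from this toy data say nothing about the real season.

```r
library(dplyr)

# Hypothetical toy games: values are invented for illustration only.
toy_gsw <- data.frame(
  Points    = c(111, 104, 120, 98, 115, 109),
  OppPoints = c(95, 108, 101, 99, 100, 112),
  FG3A      = c(30, 25, 35, 22, 31, 28),
  FG3       = c(14, 8, 16, 7, 13, 9),
  Location  = c("Home", "Away", "Home", "Away", "Home", "Away")
)

# Positive pt_margin means the Warriors outscored their opponent
toy_gsw <- mutate(toy_gsw, pt_margin = Points - OppPoints)

# Simple regression: FG3A is the only explanatory variable
fit_simple <- lm(pt_margin ~ FG3A, data = toy_gsw)

# Multiple regression: the FG3A slope is now adjusted for FG3,
# i.e. it compares games with the same number of made 3-point shots
fit_multi <- lm(pt_margin ~ FG3 + FG3A, data = toy_gsw)

# Categorical predictor: lm() re-codes Location into the indicator
# LocationHome, using the alphabetically first level (Away) as reference
fit_loc_fg3 <- lm(pt_margin ~ Location + FG3, data = toy_gsw)
fit_loc     <- lm(pt_margin ~ Location, data = toy_gsw)

print(coef(fit_simple))
print(coef(fit_multi))
print(coef(fit_loc_fg3))
print(coef(fit_loc))
```

Comparing the printed FG3A coefficients from fit_simple and fit_multi, and the LocationHome coefficients from fit_loc_fg3 and fit_loc, shows how adding or removing a predictor can change (or fail to change) the other estimates, depending on how strongly the predictors are related to each other.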