These questions are intended to help you practice for Exam #1. The real exam will feature 2-3 questions that follow a similar format. All course content up until this point, including the summarizing data and sampling and study design lectures, Labs 1 and 2, and Problem Set #1 may appear on the exam.
You should record your answers in an R Markdown document. You are welcome to use the template available on Canvas.
Data are available here: https://remiller1450.github.io/data/GSWarriors.csv
gsw <- read.csv("https://remiller1450.github.io/data/GSWarriors.csv")
Study Description:
The Warriors are a professional basketball team based in Oakland, California. In 2015-16 they set the NBA record for most wins in a regular season with a win-loss record of 73-9 (breaking the 1995-96 Chicago Bulls' record of 72-10). Additionally, the Warriors set 25 different NBA records during the 2015-16 season, which is regarded as one of the best seasons in NBA history.
Our goal in analyzing these data is to better understand factors related to the Warriors’ success during their record-breaking season.
Data Dictionary:
1-A: Are these data best viewed as a sample or a population? Explain your answer in no more than 2 sentences (No use of R is required for this question)
1-B: Create an appropriate data visualization depicting the relationship between “Location” and “Win”. Then write 1-2 sentences describing the relationship you see between these variables.
1-C: Report the difference in the proportion of games won by location (i.e., winning percentage at home minus winning percentage away).
1-D: Create an appropriate data visualization depicting the relationship between “OppPoints” and “Points”. Then write 1-2 sentences describing the relationship you see between these variables.
1-E: Can a causal relationship between “Location” and “Win”, or between “OppPoints” and “Points”, be established by this study? If so, briefly explain. If not, briefly describe an alternative explanation for one of these observed associations. (No use of R is required for this question)
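As a reminder of the kind of calculation 1-C asks for, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical, and the codings "Home"/"Away" and "W"/"L" are assumptions; check how “Location” and “Win” are actually recorded in `gsw` before adapting it.

```r
# Hypothetical toy data: 4 home games and 4 away games (NOT the Warriors data)
toy <- data.frame(
  Location = c("Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away"),
  Win      = c("W", "W", "W", "L", "W", "L", "W", "L")
)

# Row proportions: the share of wins and losses within each location
props <- prop.table(table(toy$Location, toy$Win), margin = 1)

# Difference in winning proportion, home minus away
diff_prop <- props["Home", "W"] - props["Away", "W"]
diff_prop
```

Using `margin = 1` conditions on the row variable (location), so each row of `props` sums to 1.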
Data are available here: https://remiller1450.github.io/data/ChickWeight.csv
chicks <- read.csv("https://remiller1450.github.io/data/ChickWeight.csv")
Study Description:
At birth, 71 chicks (baby chickens) were randomly assigned to one of six diets, and their weight was measured every second day until they reached 21 days old. These diets differed only in terms of the protein source used in the feed mixture.
The data you are provided contain each chick’s assigned diet and that chick’s weight (in grams) at 21 days old.
Data Dictionary:
2-A: In one sentence, state the research question of this study. (No use of R is required for this question)
2-B: Create a histogram of the variable “weight” and describe the distribution of this variable. Please do not consider the variable “feed” when answering this question.
2-C: Create an appropriate data visualization depicting the relationship between “feed” and “weight”. Briefly describe which diets appeared the most successful (in terms of achieving the greatest weight gain).
2-D: Based upon the graph you created in 2-C, which diet appeared to have the greatest variability in the 21-day weights of the chicks assigned to it? Justify your answer by referencing an appropriate measure of variability (you do not need to calculate the exact number).
2-E: The birth weight of each chick is not included in these data despite being a factor that is clearly associated with that chick’s 21-day weight. Does the omission of this variable pose a problem when attempting to establish a causal relationship between a chick’s diet and its weight? Briefly explain. (No use of R is required for this question)
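For questions like 2-C and 2-D, one possible approach (a sketch, not the only acceptable answer) is side-by-side boxplots, where the height of each box shows that group's interquartile range. The example below uses made-up data; `toy`, the feed labels "A" and "B", and all weights are hypothetical and unrelated to the chick data.

```r
# Hypothetical toy data: two made-up diets with five chicks each
toy <- data.frame(
  feed   = rep(c("A", "B"), each = 5),
  weight = c(100, 110, 120, 130, 140, 60, 100, 120, 140, 180)
)

# Side-by-side boxplots of a numeric variable by a categorical variable
boxplot(weight ~ feed, data = toy, xlab = "feed", ylab = "weight (grams)")

# The IQR of each group matches the height of its box
iqr_by_feed <- tapply(toy$weight, toy$feed, IQR)
iqr_by_feed
```

In this toy example the box for diet "B" is taller than the box for diet "A", which is the kind of visual evidence 2-D asks you to reference.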