These questions are intended to help you practice for Exam #1. The real exam will feature 2-3 questions that follow a similar format. All course content up to this point, including the lectures on summarizing data and on sampling and study design, Labs 1 and 2, and Problem Set #1, may appear on the exam.

You should record your answers in an R Markdown document. You are welcome to use the template available on Canvas.


Study #1 - The Golden State Warriors historic season

Data are available here: https://remiller1450.github.io/data/GSWarriors.csv

gsw <- read.csv("https://remiller1450.github.io/data/GSWarriors.csv")

Study Description:

The Warriors are a professional basketball team based in Oakland, California. In 2015-2016 they set the NBA record for most wins in a regular season with a win-loss record of 73-9 (breaking the 1995-96 Chicago Bulls' record of 72-10). Additionally, the Warriors set 25 different NBA records during the 2015-16 season, which is regarded as one of the best seasons in NBA history.

Our goal in analyzing these data is to better understand factors related to the Warriors’ success during their record-breaking season.

Data Dictionary:

  • Game: A chronologically determined number identifying each game within the 82-game season
  • Date: The date the game was played
  • Location: Whether the game was home or away
  • Opp: Opposing team’s name
  • Win: Whether the game was a win (W) or a loss (L) for Golden State
  • Points: Number of points scored by Golden State
  • OppPoints: Number of points scored by the opponent
  • FG: Number of field goals made by Golden State
  • FGA: Number of field goals attempted by Golden State
  • FG3: Number of 3-point shots made by Golden State
  • FG3A: Number of 3-point shots attempted by Golden State
  • FT: Number of free throws made by Golden State
  • FTA: Number of free throws attempted by Golden State
  • Rebounds: Total number of rebounds by Golden State
  • OffReb: Number of offensive rebounds by Golden State
  • Assists: Number of assists by Golden State
  • Steals: Number of steals by Golden State
  • Blocks: Number of blocked shots by Golden State
  • Turnovers: Number of turnovers made by Golden State
  • Fouls: Number of fouls committed by Golden State
  • OppFG: Number of field goals made by the opponent
  • OppFGA: Number of field goals attempted by the opponent
  • OppFG3: Number of 3-point shots made by the opponent
  • OppFG3A: Number of 3-point shots attempted by the opponent
  • OppFT: Number of free throws made by the opponent
  • OppFTA: Number of free throws attempted by the opponent
  • OppRebounds: Total number of rebounds by the opponent
  • OppOffReb: Number of offensive rebounds by the opponent
  • OppAssists: Number of assists by the opponent
  • OppSteals: Number of steals by the opponent
  • OppBlocks: Number of blocked shots by the opponent
  • OppTurnovers: Number of turnovers made by the opponent
  • OppFouls: Number of fouls committed by the opponent


1-A: Are these data best viewed as a sample or a population? Explain your answer in no more than 2 sentences. (No use of R is required for this question)

1-B: Create an appropriate data visualization depicting the relationship between “Location” and “Win”. Then write 1-2 sentences describing the relationship you see between these variables.

1-C: Report the difference in the proportion of games won by location (i.e., winning percentage at home - winning percentage away).
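
As a reminder of the mechanics, a difference in proportions of this kind can be computed from a two-way table of row proportions. The sketch below is only an illustration: it uses a small made-up data frame (`toy`) rather than the Warriors data, and the category labels `Home` and `Away` are assumptions, so check the actual labels (for example with `table(gsw$Location)`) before adapting it.

```r
# Toy data, made up for illustration (NOT the GSWarriors data);
# the labels "Home"/"Away" and "W"/"L" are assumed for this sketch
toy <- data.frame(
  Location = c("Home", "Home", "Home", "Home", "Away", "Away", "Away", "Away"),
  Win      = c("W", "W", "W", "L", "W", "L", "L", "W")
)

# Two-way table of counts: rows are locations, columns are outcomes
counts <- table(toy$Location, toy$Win)

# Row proportions: within each location, the share of wins and losses
props <- prop.table(counts, margin = 1)

# Difference in winning proportion (home minus away)
diff_prop <- props["Home", "W"] - props["Away", "W"]
diff_prop
```

Using `margin = 1` conditions on the row variable, so each row of `props` sums to 1 and the `W` column holds the winning proportion at each location.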

1-D: Create an appropriate data visualization depicting the relationship between “OppPoints” and “Points”. Then write 1-2 sentences describing the relationship you see between these variables.

1-E: Can a causal relationship between “Location” and “Win”, or between “OppPoints” and “Points”, be established by this study? If so, briefly explain. If not, briefly describe an alternative explanation for one of these observed associations. (No use of R is required for this question)


Study #2 - Chicken growth in response to diet

Data are available here: https://remiller1450.github.io/data/ChickWeight.csv

chicks <- read.csv("https://remiller1450.github.io/data/ChickWeight.csv")

Study Description:

At birth, 71 chicks (baby chickens) were randomly assigned to one of six diets, and their weight was measured every second day until they reached 21 days old. These diets differed only in terms of the protein source used in the feed mixture.

The data you are provided contain each chick’s assigned diet and its weight (in grams) at 21 days old.

Data Dictionary:

  • weight - the chick’s weight in grams at day 21
  • feed - the protein source used in feed for the chick’s diet


2-A: In one sentence, state the research question of this study. (No use of R is required for this question)

2-B: Create a histogram of the variable “weight” and describe the distribution of this variable. Please do not consider the variable “feed” when answering this question.

2-C: Create an appropriate data visualization depicting the relationship between “feed” and “weight”. Briefly describe which diets appeared the most successful (in terms of achieving the greatest weight gain).

2-D: Based upon the graph you created in Part C, which diet appeared to have the greatest variability in the 21-day weights of chicks that adhered to it? Justify your answer by referencing an appropriate measure of variability (you do not need to calculate the exact number).

2-E: The birth weight of each chick is not included in these data despite being a factor that is clearly associated with that chick’s 21-day weight. Does the omission of this variable pose a problem when attempting to establish a causal relationship between a chick’s diet and its weight? Briefly explain. (No use of R is required for this question)