Directions

  • You may choose to work on this lab individually or with your final project group
    • If you work as a group, all members are responsible for the content of your lab write-up. I strongly encourage you to use voice chat software (Skype, Google Hangouts, etc.) while working together
    • If you choose to work as an individual, you are not required to answer questions tagged (Group Only)
  • Read through the entire lab (not just the questions). The lab will introduce course content that you will be responsible for on exams/homework.
  • Answer all questions in a separate document, attaching Minitab output if needed.

Introduction

In the typical statistics class (or textbook) you’ll learn about a particular method, practice applying it to a few examples, and then move on. This workflow is useful for learning new statistical techniques, but it doesn’t prepare you for the real world, where you don’t know which chapter’s methods are best suited for the question you’re trying to answer. One of the most challenging aspects of applied statistics is knowing how to choose the right approach for a given situation. The goal of this lab is for you to:

  1. Gain experience making decisions regarding which statistical approaches are appropriate for answering certain research questions
  2. Become more comfortable implementing a variety of basic data manipulations and statistical procedures using Minitab

Some of the decisions we will focus on include:

  • Whether to use confidence intervals or hypothesis tests (or both) to address the research question(s)
  • Which hypothesis test(s) are appropriate for the research question(s)
  • What, if any, data processing steps (stratification, transformations, etc.) are necessary before you can perform well-justified statistical inference
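To make the first decision concrete, the following is a minimal Python sketch (the lab itself uses Minitab) contrasting a confidence interval with a hypothesis test for a single proportion. The counts are invented for illustration and come from neither dataset, the function names are hypothetical, and the large-sample normal approximation is an assumption that may not suit every question in this lab.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_proportion_ci(successes, n, z_star=1.96):
    """Large-sample (Wald) interval; z_star=1.96 gives roughly 95% confidence."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0, using the null standard error."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# Invented counts (NOT from the lab data): 40 "successes" out of 200 cases.
low, high = one_proportion_ci(40, 200)
z, p_value = one_proportion_z_test(40, 200, p0=0.25)

print(f"Interval estimate: ({low:.3f}, {high:.3f})")
print(f"Test of p = 0.25: z = {z:.2f}, p-value = {p_value:.3f}")
```

The interval answers "how large is the proportion?", while the test answers "is the proportion different from a specific reference value?". Which of the two a question calls for depends on whether it supplies such a reference value.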

To gain broader experience, this lab will involve two very different datasets.

Police Killings Data

The data we will analyze in this part of the lab come from the FiveThirtyEight article “Where Police Have Killed Americans in 2015”. These data can be accessed here, and contain the following variables:

  • Name: Name of the deceased
  • Age: Age of the deceased at time of death
  • Gender: Gender of the deceased
  • RaceEthnicity: Racial/Ethnic category of the deceased
  • Month: The month when the incident occurred
  • Day: The day of the month when the incident occurred
  • Year: The year when the incident occurred
  • StreetAddress: The street address or intersection nearest to where the incident occurred
  • City: The city in which the incident occurred
  • State: The state in which the incident occurred
  • Latitude: The latitude of the street address nearest to where the incident occurred
  • Longitude: The longitude of the street address nearest to where the incident occurred
  • LawEnforcementAgency: The law enforcement agency involved
  • Cause: Cause of death
  • Armed: Whether the deceased was armed and, if so, what they were armed with
  • Pov: The census tract poverty rate
  • Urate: The census tract unemployment rate
  • College: The census tract share of the age 25+ population with a bachelor’s degree (or higher)

Because these data only contain incidents that occurred in 2015, we will consider them a sample that represents additional years (including future years which haven’t yet occurred).

Question 1

A Pennsylvania jury recently acquitted a police officer who fatally shot an unarmed teenager (this NPR Article provides details). Based upon this event, we might wonder how common police killings of unarmed individuals are among all police killings. Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 2

The Pennsylvania case mentioned in Question #1 received a lot of publicity because it involved a white police officer killing an unarmed black individual. We might wonder, among those killed by the police, is the proportion of blacks who were unarmed different from the proportion of whites who were unarmed? Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 3

Everyone who appears in the database was killed by the police (by definition), which means we don’t have data on individuals who were not killed by the police. In this situation we might seek to bring in external information to shape our analysis. It is estimated that the racial composition of the United States in 2015 was 61.8% non-Hispanic white, 13.2% black, 17.8% Hispanic (of any race), 5.2% Asian, 0.8% Native American, and 1.2% other. Based upon this, we might wonder if police killings are equally prevalent across races, or if some racial/ethnic groups are disproportionately involved in police killings. Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.
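As a small arithmetic aid, the Python sketch below checks that the population percentages quoted above sum to 100% and converts them into the counts each group would have in a sample if it exactly mirrored the population. The sample size of 500 is a placeholder, not the number of rows in the Police Killings data, and the group labels are copied from the paragraph above, so they may not match the coding of the RaceEthnicity variable.

```python
# Population shares quoted in Question 3 (percent of the 2015 US population).
shares = {
    "non-Hispanic white": 61.8,
    "black": 13.2,
    "Hispanic (any race)": 17.8,
    "Asian": 5.2,
    "Native American": 0.8,
    "other": 1.2,
}

def expected_counts(shares_pct, n):
    """Counts each group would have in a sample of size n that mirrored the population."""
    total = sum(shares_pct.values())
    return {group: n * pct / total for group, pct in shares_pct.items()}

# Placeholder sample size for illustration only (NOT the real number of rows).
n_hypothetical = 500
expected = expected_counts(shares, n_hypothetical)

print(f"Shares sum to {sum(shares.values()):.1f}%")
for group, count in expected.items():
    print(f"{group}: {count:.1f}")
```

This only sets up the comparison; choosing and justifying the method that compares observed counts with these reference values is still part of the question.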

Question 4

Critics of the analysis described in Question 3 might argue that because of socio-economic factors not all racial/ethnic groups commit crimes at the same rate, and therefore exposure to situations with a possibility of being killed by the police is unequal across groups. It might be possible to evaluate this criticism using external information from the National Crime Victimization Survey (NCVS). According to the NCVS, 22.7% of the victims of violent crimes report that the perpetrator of the crime was black. Based upon this, we might wonder if the proportion of black individuals in the Police Killings data differs from the proportion of crimes with black perpetrators (given by the NCVS). Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 5 (Group Only)

If socio-economic factors are related to the demographics of those killed by the police, we’d expect police killings to occur in census tracts that are worse off economically than the rest of the country. In 2015, the national unemployment rate was 5.2%, the national poverty rate was 13.5%, and 35% of the US population (age 25 or over) had a bachelor’s degree or higher. We might wonder if, on average, the locations of police killings have poverty rates, unemployment rates, and levels of college education that differ from these national averages, as well as how different they are. Use statistical methods to answer this question and provide a short rationale for your approach.

Golden State Warriors Data

For this section we will shift to a lighter topic and analyze data from the Golden State Warriors’ record-setting 2015-16 season. The Warriors are a professional basketball team based in Oakland, California. In 2015-16 they set the record for the most wins in an NBA regular season with a win-loss record of 73-9 (breaking the 1995-96 Chicago Bulls’ record of 72-10). Additionally, the Warriors set 25 different NBA records during the 2015-16 season, which is regarded as one of the best seasons in NBA history.

The data we will analyze document each of the 82 games played by the 2015-16 Warriors team; they can be accessed here, and they contain the following variables:

  • Game: ID number for each game
  • Date: Date the game was played
  • Location: Whether the game was home or away
  • Opp: Opposing team’s name
  • Win: Whether the game was a win (W) or a loss (L) for Golden State
  • Points: Number of points scored by Golden State
  • OppPoints: Number of points scored by the opponent
  • FG: Number of field goals made by Golden State
  • FGA: Number of field goals attempted by Golden State
  • FG3: Number of 3-point shots made by Golden State
  • FG3A: Number of 3-point shots attempted by Golden State
  • FT: Number of free throws made by Golden State
  • FTA: Number of free throws attempted by Golden State
  • Rebounds: Total number of rebounds by Golden State
  • OffReb: Number of offensive rebounds by Golden State
  • Assists: Number of assists by Golden State
  • Steals: Number of steals by Golden State
  • Blocks: Number of blocked shots by Golden State
  • Turnovers: Number of turnovers made by Golden State
  • Fouls: Number of fouls committed by Golden State
  • OppFG: Number of field goals made by the opponent
  • OppFGA: Number of field goals attempted by the opponent
  • OppFG3: Number of 3-point shots made by the opponent
  • OppFG3A: Number of 3-point shots attempted by the opponent
  • OppFT: Number of free throws made by the opponent
  • OppFTA: Number of free throws attempted by the opponent
  • OppRebounds: Total number of rebounds by the opponent
  • OppOffReb: Number of offensive rebounds by the opponent
  • OppAssists: Number of assists by the opponent
  • OppSteals: Number of steals by the opponent
  • OppBlocks: Number of blocked shots by the opponent
  • OppTurnovers: Number of turnovers made by the opponent
  • OppFouls: Number of fouls committed by the opponent
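Several questions below refer to shooting proportions, which are not stored directly but can be derived from the made/attempted columns. The Python sketch below (the lab itself uses Minitab) shows the difference between a proportion computed from season totals and proportions computed game by game. The three rows are invented values that only borrow the column names above, and the helper names are hypothetical.

```python
# Invented rows using the lab's column names; the values are NOT real game data.
games = [
    {"Game": 1, "FT": 20, "FTA": 25, "OppFT": 15, "OppFTA": 20},
    {"Game": 2, "FT": 10, "FTA": 15, "OppFT": 18, "OppFTA": 24},
    {"Game": 3, "FT": 30, "FTA": 40, "OppFT": 12, "OppFTA": 16},
]

def season_proportion(rows, made_col, attempts_col):
    """Pool all games: total made divided by total attempted."""
    made = sum(r[made_col] for r in rows)
    attempts = sum(r[attempts_col] for r in rows)
    return made / attempts

def per_game_proportions(rows, made_col, attempts_col):
    """One proportion per game: made divided by attempted within each row."""
    return [r[made_col] / r[attempts_col] for r in rows]

pooled = season_proportion(games, "FT", "FTA")
by_game = per_game_proportions(games, "FT", "FTA")
mean_by_game = sum(by_game) / len(by_game)

print(f"Pooled season proportion: {pooled:.3f}")
print(f"Mean of per-game proportions: {mean_by_game:.3f}")
```

The two summaries generally differ because pooling weights each game by its number of attempts. Season totals treat each shot as an observation, whereas per-game values treat each game as an observation, and that choice affects which procedure is appropriate.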

Question 6 (Group Only)

The Warriors have a reputation as one of the best shooting teams of all time. Based upon this claim, we might wonder whether the Warriors are better than their opponents at making free throws. Evaluate this conjecture using season totals. Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 7 (Group Only)

The 2015-16 Warriors were led by Stephen Curry, one of the best three-point shooters of all time. Because of Curry’s presence, we might wonder if the Warriors attempted more three-point shots than their opponent in each game. Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 8

Critics of the Warriors have called them over-reliant on three-point shooting. If this is the case, we’d expect the Warriors to have made a lower proportion of their three-point attempts in the team’s losses than in the team’s wins. Evaluate this hypothesis using season totals. Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 9

It has been well established statistically that there is a home-court advantage in the NBA. Based upon this, we might wonder how much more likely the Warriors were to win at home (relative to on the road). Use statistical methods to answer this question, including all relevant Minitab output, your conclusion, and a short rationale for your approach.

Question 10 (Group Only)

For this question you may use either of the data sets in this lab. For your chosen data set I’d like you to create a brief report highlighting one interesting feature of the data. Your report must include:

  1. A short description of the data
  2. An informative visualization
  3. A summary of one or more important statistical findings
  4. A conclusion tying the statistical results back to the nature of the data

This should all fit on a single page. It should not be a bulleted list; rather, it should be a coherent paragraph (or multiple paragraphs) accompanied by your figure.

Submission Directions

  • Email your completed write-up to Professor Miller with a subject heading that includes the text “Sta-209-Lab8”. Please include this exact character string, including the dashes. You will lose 1 point off the top of your score if you don’t do so.
  • If you’d like to provide feedback on your group, fill out the optional review form at this link: https://forms.gle/wNWRFMbbra8oK4LJ8