Today’s reading describes 10 different ways that statistical information can be used to mislead. For your reference, these 10 points are summarized below:
The aim of the following activity is for you to better understand these ideas by trying some of them out for yourself. That said, not all of them are applicable to the NYPD data or the graphics that can be generated by the two R Shiny apps we’ll use today.
Questions 1-5 will provide some practice looking at graphics that could be used as misleading evidence for certain arguments, and Question 6 will give you an opportunity to craft your own misleading argument.
Open the R Shiny app we used in our previous meeting:
For your reference, here is the link to more information about the app and the variables it displays: https://stat2labs.sites.grinnell.edu/Handouts/nypd/NYPDVariableDescriptions.pdf
Question #1: Set the y-axis of the app to “Firearms”, the x-axis to “Year”, and the measurement setting to “percentage of stops”. This graph displays the percentage of stops where the officer reported the presence of a firearm. Does this graph suggest that firearms have become substantially more prevalent in New York City in recent years? Which of the 10 points mentioned in the introduction might be relevant here?
Question #2: Restrict the years to 2006-2016, change the y-axis to “Stopped”, the x-axis to “Year”, the measurement to “percentage of arrests”, then facet by “Crime Type”. Does this graph support the claim that theft, weapons crimes, and substance crimes substantially decreased after the 2012 court ruling? How would you explain the y-axis of this graph?
For the following questions we’ll use a different R Shiny app to continue looking at the NYPD stop-and-frisk data.
This app uses the exact same underlying data as the previous app, but it uses spatial information to display the data at the precinct level on a choropleth map. In this app you can click on a precinct to show a pop-up with additional information.
Question #3: Change the year to 2010, color the precincts by arrests weighted by population, and set the color scale to linear. Which precinct stands out on this graph? Why might that precinct be showing such a high per capita arrest rate? Hint: zoom in on the map. Do you notice any landmarks within this precinct?
Question #4: Keeping everything else the same as Question 3, change the precinct coloring to “total number of arrests” so that a precinct’s population is no longer considered. Notice how the precinct you identified in Question 3 no longer stands out. Considering your previous response, what might explain this difference? That is, what could explain why this precinct has a very high per capita arrest rate relative to other precincts, but does not have a particularly high total number of arrests?
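If the difference between the two measures in Questions 3 and 4 is unclear, the short Python sketch below may help. It assumes that “arrests weighted by population” means the number of arrests divided by the precinct’s resident population (the app’s exact calculation may differ). The precinct names and counts are invented for illustration and are not taken from the NYPD data.

```python
# Hypothetical precincts: these counts are invented for illustration
# and are NOT taken from the NYPD stop-and-frisk data.
precincts = {
    "Precinct X": {"arrests": 300, "population": 1_000},
    "Precinct Y": {"arrests": 1_500, "population": 100_000},
}


def arrests_per_1000(arrests, population):
    """Return arrests per 1,000 residents (an assumed per capita measure)."""
    return 1000 * arrests / population


for name, p in precincts.items():
    rate = arrests_per_1000(p["arrests"], p["population"])
    print(f"{name}: total arrests = {p['arrests']}, per 1,000 residents = {rate:.1f}")
```

In this made-up example, Precinct Y has more total arrests, but Precinct X has the higher per capita rate, so the two measures rank the precincts differently.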
Question #5: Now color the precincts by the number of arrests by race, weighted by population, and select “Black” on the Race drop-down menu. Make a mental note of how this map looks, then change the color scale setting to “logarithmic”, which will take the natural log of the arrest rate and use it to determine precinct color. Considering the “linear” color scale and the “logarithmic” color scale, how could you use one of these color schemes to support a misleading argument?
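To see how the two color scale settings treat the same numbers, here is a minimal Python sketch. It assumes that a color scale places each precinct at a position between 0 (lightest color) and 1 (darkest color) based on its value relative to the smallest and largest values on the map. The function `scale_position` and the rates are hypothetical and are not taken from the app or the NYPD data, and the app’s actual color mapping may differ in its details.

```python
import math


def scale_position(value, lo, hi, log=False):
    """Map a value to a position in [0, 1] on a color scale spanning lo..hi.

    With log=True the natural log is applied first, so all inputs
    must be positive (the log of zero is undefined).
    """
    if log:
        value, lo, hi = math.log(value), math.log(lo), math.log(hi)
    return (value - lo) / (hi - lo)


# Hypothetical arrest rates per 1,000 residents (invented, NOT NYPD data).
rates = {"Precinct A": 1.0, "Precinct B": 2.0, "Precinct C": 10.0, "Precinct D": 100.0}

lo, hi = min(rates.values()), max(rates.values())
for name, rate in rates.items():
    linear = scale_position(rate, lo, hi)
    logarithmic = scale_position(rate, lo, hi, log=True)
    print(f"{name}: linear = {linear:.2f}, logarithmic = {logarithmic:.2f}")
```

With these made-up rates, Precincts A, B, and C all sit near the bottom of the linear scale and would look almost identical on the map, while the logarithmic scale spreads them out and places Precinct C halfway between the extremes.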
Question #6: Your task is now to craft a misleading argument using a graphic created by one of the two R Shiny apps as evidence/support. You should intentionally try to exploit one or more of the ways that statistical information can be presented in a misleading manner as summarized in the introduction. You are welcome to borrow ideas from Questions 1-5 when coming up with your argument, or you may pursue something entirely different.
When you are finished with Questions 1-6, have one partner upload your responses to P-web.