
Onboarding

Below is a generic contingency table:

| | Event | No Event |
|---|---|---|
| Exposure | | |
| No Exposure | | |
  • “Exposure”/“No Exposure” are categories of a binary explanatory variable.
  • “Event”/“No Event” are categories of a binary response variable.
  • If one or more of our variables of interest are not binary, it is possible to combine several categories together using the ifelse() function to create a new variable.

    ## Load data
    colleges <- read.csv("https://remiller1450.github.io/data/Colleges_2024_Complete.csv")
    
    ## Create a new variable "California"
    colleges$California = ifelse(colleges$State == "CA", "California", "Another State")
    
    ## Contingency table
    table(colleges$California, colleges$HSI)
    ##                
    ##                  No Yes
    ##   Another State 843  82
    ##   California     15   8

    Not every analysis can be reduced to a pair of binary variables, but this is a technique you should be aware of, particularly when calculating measures of association such as odds ratios.
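
    The same idea extends to combining several categories at once. Below is a minimal sketch that assumes a small made-up vector of state abbreviations (the names region and west_coast are hypothetical and are not part of the lab's data sets). It uses the %in% operator inside ifelse() to group three categories into one:

    ```r
    ## Hypothetical toy data (not from the lab's data sets)
    region <- c("CA", "OR", "WA", "IA", "NY", "CA")

    ## Combine "CA", "OR", and "WA" into one category; everything else into another
    west_coast <- ifelse(region %in% c("CA", "OR", "WA"), "West Coast", "Elsewhere")

    ## One-way table of the new binary variable
    table(west_coast)
    ```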


    Lab

    As a reminder, you should work on the lab with your assigned partner(s) using the principles of pair programming. Everyone should keep and submit their own copy of your group’s work.

    In this lab we’ll use a data set assembled by the Washington Post that attempts to document all fatal shootings by police officers from 2015 until mid-2020.

    police <- read.csv("https://remiller1450.github.io/data/Police.csv")

    You’ll also need the same packages used in our previous lab, ggplot2 and forcats. Remember that you do not need to install these packages if you’ve done so previously, but you do need to load them using the library() function:

    # install.packages("ggplot2")
    # install.packages("forcats")
    
    library(ggplot2)
    library(forcats)


    Two-way frequency tables and conditional proportions

    Recall that we previously used the table() function to create a one-way frequency table for a single categorical variable.

    ## One-way table of "threat level"
    table(police$threat_level)
    ## 
    ##       attack        other undetermined 
    ##         4750         2516          282

    We can use this function to create a two-way frequency table by supplying a second variable as an additional argument to table():

    ## Two-way table of "threat level" and "armed"
    table(police$threat_level, police$armed)
    ##               
    ##                armed unarmed
    ##   attack        4573     177
    ##   other         2283     233
    ##   undetermined   235      47

    Question #1: Create a two-way table showing the frequencies of each combination of the variable flee and the variable armed.


    Conditional proportions

    From a two-way frequency table we can calculate conditional proportions “by hand” using indices (ie: accessing elements of the table via square brackets, [ and ]).

    ## Store previous table
    threat_armed_table = table(police$threat_level, police$armed)
    
    ## Proportion of attackers who were armed
    threat_armed_table[1,1]/sum(threat_armed_table[1,])
    ## [1] 0.9627368

    Two things to notice about this example:

    1. We use threat_armed_table[1,1] to access the frequency of cases whose value of threat_level was “attack” and whose value of armed was “armed”. This is the upper left cell in the two-way table, or 4573.
    2. The denominator of this conditional proportion should be the total number of cases whose value of threat_level was “attack”. We can get this by taking the entire corresponding row from the table, which is given by threat_armed_table[1,]. Leaving the second index blank indicates that we want the entire first row (ie: all columns in the first row). The sum() function is then used to add up the elements of this row, thereby giving us the total number of cases whose value of threat_level was “attack”.
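
    Numeric positions are easy to mix up, so it can help to index by row and column names instead. The sketch below is self-contained: it rebuilds the table from the counts printed earlier (the name threat_armed_counts is introduced here only for illustration) and repeats the calculation using names. The same style of indexing should also work on a table created directly by table(), since those tables carry the category names.

    ```r
    ## Rebuild the two-way table from the counts printed above
    threat_armed_counts <- as.table(matrix(
      c(4573, 177,
        2283, 233,
         235,  47),
      nrow = 3, byrow = TRUE,
      dimnames = list(c("attack", "other", "undetermined"),
                      c("armed", "unarmed"))
    ))

    ## Same conditional proportion, indexing by name instead of position
    prop_attack_armed <- threat_armed_counts["attack", "armed"] /
      sum(threat_armed_counts["attack", ])
    prop_attack_armed
    ## [1] 0.9627368
    ```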

    It is also possible to use the prop.table() function to calculate conditional proportions for every cell in the table:

    ## Row proportions
    prop.table(threat_armed_table, margin = 1)
    ##               
    ##                     armed    unarmed
    ##   attack       0.96273684 0.03726316
    ##   other        0.90739269 0.09260731
    ##   undetermined 0.83333333 0.16666667
    ## Column proportions
    prop.table(threat_armed_table, margin = 2)
    ##               
    ##                    armed   unarmed
    ##   attack       0.6449020 0.3873085
    ##   other        0.3219574 0.5098468
    ##   undetermined 0.0331406 0.1028446
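
    A quick way to confirm which margin you used is to check which sums equal 1: with margin = 1 every row sums to 1, and with margin = 2 every column sums to 1. The sketch below rebuilds the table from the counts shown earlier (the name tab is hypothetical) so that it can be run on its own:

    ```r
    ## Rebuild the two-way table from the counts printed earlier
    tab <- as.table(matrix(
      c(4573, 177,
        2283, 233,
         235,  47),
      nrow = 3, byrow = TRUE,
      dimnames = list(c("attack", "other", "undetermined"),
                      c("armed", "unarmed"))
    ))

    ## Row proportions: each row should sum to 1
    row_check <- rowSums(prop.table(tab, margin = 1))
    row_check

    ## Column proportions: each column should sum to 1
    col_check <- colSums(prop.table(tab, margin = 2))
    col_check
    ```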

    Question #2:

    • Part A: Starting with the table you made in Question #1, use a “by hand” approach to find the proportion of unarmed individuals who did not flee.
    • Part B: Apply the prop.table() function to your table from Question #1 to print a table that shows the proportion of armed/unarmed subjects within each category of the variable flee.
    • Part C: Based upon your table of conditional proportions in Part B, do these two variables appear to be associated? Briefly explain.
    • Part D: Create a conditional bar chart (using examples from Lab 3) that displays the same information as the table you created in Part B.


    Risk Difference and Relative Risk

    Recall that risk difference and relative risk are two different ways of measuring association using conditional proportions. Thus, to calculate these measures of association we can use conditional proportions obtained via prop.table(), or calculated “by hand”.

    Interpreting a relative risk requires less subject-area expertise than interpreting a risk difference:

    | Relative Risk (RR) | Interpretation |
    |---|---|
    | RR = 1 | There is no difference in risk between the two groups. |
    | RR = 1.2 | The event is 20% more likely to occur in the first group than in the second group. This is often considered a small increased risk. |
    | RR = 1.5 | The event is 50% more likely to occur in the first group than in the second group. This is often considered a moderate increased risk. |
    | RR = 2 | The event is twice as likely to occur in the first group as in the second group. This is often considered a large increased risk. |
    | RR = 0.6 | The event is 40% less likely to occur in the first group than in the second group. This is often considered a moderate decreased risk. |
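
    The percentages in these interpretations come from subtracting 1 from the relative risk and multiplying by 100. A minimal sketch using the RR values listed above (the names rr and pct_change are only for illustration):

    ```r
    ## Relative risks from the interpretation guide
    rr <- c(1, 1.2, 1.5, 2, 0.6)

    ## Percent change in risk for the first group relative to the second
    ## (positive = more likely, negative = less likely)
    pct_change <- (rr - 1) * 100
    pct_change
    ```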

    The example below calculates the risk difference and relative risk comparing the risk of being “unarmed” among subjects killed by police with and without previous signs of mental illness.

    ## Create the two-way table
    mental_armed_table = table(police$signs_of_mental_illness, police$armed)
    
    ## Store the proper conditional proportions (row props in this example)
    mental_armed_props = prop.table(mental_armed_table, margin = 1)
    
    ## Arithmetic w/ the relevant proportions (risk difference)
    mental_armed_props[1,2] - mental_armed_props[2,2]
    ## [1] 0.008799235
    ## Relative risk
    mental_armed_props[1,2]/mental_armed_props[2,2]
    ## [1] 1.16405

    In this example the risk difference (difference in proportions) was 0.0088, meaning the chances of an individual without previous signs of mental illness being unarmed are 0.88 percentage points higher than those of an individual with previous signs of mental illness being unarmed.

    Because it is relatively rare for unarmed individuals to be killed by police, relative risk might provide a better measure of association for the data in this table. Because the relative risk is 1.16, we can state that those without previous signs of mental illness are 16% more likely to have been unarmed relative to those with signs of mental illness.

    Throughout these examples you should pay careful attention to the indices that were used. These determine which elements of the conditional proportions table are being used. Note: It’s typical to use the group with the larger risk as the numerator in relative risk calculations, as in the example above, so that the relative risk is greater than 1.
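
    To see how the indices select the relevant proportions in a different setting, the sketch below repeats both calculations on the colleges table from the Onboarding section. It rebuilds that table from its printed counts and indexes by name; the object names hsi_table, hsi_props, risk_diff, and rel_risk are introduced here only for illustration.

    ```r
    ## Rebuild the Onboarding contingency table from its printed counts
    hsi_table <- as.table(matrix(
      c(843, 82,
         15,  8),
      nrow = 2, byrow = TRUE,
      dimnames = list(c("Another State", "California"),
                      c("No", "Yes"))
    ))

    ## Row proportions: the risk of "Yes" within each group
    hsi_props <- prop.table(hsi_table, margin = 1)

    ## Risk difference (California minus Another State), about 0.26
    risk_diff <- hsi_props["California", "Yes"] - hsi_props["Another State", "Yes"]
    risk_diff

    ## Relative risk (California relative to Another State), about 3.9
    rel_risk <- hsi_props["California", "Yes"] / hsi_props["Another State", "Yes"]
    rel_risk
    ```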

    Question #3:

    • Part A: Find the risk difference comparing the risk of an individual being unarmed when a body camera was present (a value of TRUE) vs. when a body camera was not present (a value of FALSE).
    • Part B: Write a single sentence that communicates the risk difference you found in Part A. Your sentence should address all relevant details (ie: how are the risks defined, which group has the higher risk, etc.) You may use “the chances of an individual without previous signs of mental illness being unarmed are 0.88 percentage points higher than those of an individual with previous signs of mental illness being unarmed” as an example.
    • Part C: Find the relative risk comparing the risk of an individual being unarmed when a body camera was present (a value of TRUE) vs. when a body camera was not present (a value of FALSE).
    • Part D: Write one sentence communicating the relative risk you found in Part C. Your sentence should address all relevant details (ie: how are the risks defined, which group has the higher risk, etc.)
    • Part E: Which of these two measures, risk difference or relative risk, do you think is a more appropriate way of describing the strength of association between these two variables?


    Odds and Odds Ratios

    We can calculate odds and odds ratios using indices in the same way we calculated conditional proportions and relative risks.

    ## Example table
    mental_armed_table = table(police$signs_of_mental_illness, police$armed)
    
    
    ## Odds of "unarmed" among those with signs of mental illness
    odds_ua_mental = mental_armed_table[2,2]/mental_armed_table[2,1]
    odds_ua_mental
    ## [1] 0.05667752
    ## Odds of "unarmed" among those with no signs of mental illness
    odds_ua_no = mental_armed_table[1,2]/mental_armed_table[1,1]
    odds_ua_no
    ## [1] 0.06659467
    ## Odds ratio (no signs of mental illness relative to signs of mental illness)
    odds_ua_no/odds_ua_mental
    ## [1] 1.174975

    In this example, the odds ratio of 1.175 suggests that cases without previous signs of mental illness have 17.5% higher odds of being unarmed than those with signs of mental illness.

    You might notice that this odds ratio shows approximately the same strength of association as the relative risk, which we found to be 1.16 in the previous section. This similarity between odds ratios and relative risks holds more generally for uncommon (rare) events. If you think about the calculation of odds vs. the calculation of risk, you’ll notice that they use the same numerator, and their denominators will be approximately the same when the count of the event of interest is small relative to the overall group size.
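
    One way to see this is to convert a few risks into odds using odds = risk / (1 - risk). The sketch below uses made-up risk values and a hypothetical helper named risk_to_odds(); it is meant only to illustrate the pattern and does not use the lab's data.

    ```r
    ## Hypothetical helper: convert a risk (proportion) into odds
    risk_to_odds <- function(p) p / (1 - p)

    ## Made-up risks ranging from rare to common
    p <- c(0.01, 0.05, 0.25, 0.50)

    ## For rare events the odds are close to the risk;
    ## for common events the odds are noticeably larger
    odds_vs_risk <- data.frame(risk = p, odds = risk_to_odds(p))
    odds_vs_risk
    ```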

    Question #4:

    • Part A: Calculate the odds that an individual was unarmed in an incident where a body camera was present.
    • Part B: Calculate the odds ratio comparing the odds of an individual being unarmed when a body camera was present relative to the odds of an individual being unarmed when a body camera was not present.
    • Part C: Calculate the odds ratio comparing the odds of an individual being armed when a body camera was not present relative to the odds of an individual being armed when a body camera was present.
    • Part D: Is it a coincidence that the two odds ratios you found in Parts B and C are identical?