\(~\)
Below is a generic contingency table:
 | Event | No Event |
---|---|---|
Exposure | | |
No Exposure | | |
If one or more of our variables of interest are not binary, it is possible to combine several categories together using the ifelse() function to create a new variable.
## Load data
colleges <- read.csv("https://remiller1450.github.io/data/Colleges_2024_Complete.csv")
## Create a new variable "California"
colleges$California = ifelse(colleges$State == "CA", "California", "Another State")
## Contingency table
table(colleges$California, colleges$HSI)
##
##                  No Yes
##   Another State 843  82
##   California     15   8
Not every analysis can be reduced to a pair of binary variables, but this is a technique you should be aware of, particularly when calculating measures of association such as odds ratios.
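The same idea works when several categories should be merged into one group. The sketch below is only an illustration: the data frame toy and its columns are made up for this example (they are not part of the colleges data), and the %in% operator is used to test membership in a set of categories.

```r
## Hypothetical toy data (made up for illustration, not from the colleges file)
toy <- data.frame(state = c("CA", "OR", "WA", "IA", "NY", "CA"))

## Collapse three states into a single "West Coast" category
toy$region <- ifelse(toy$state %in% c("CA", "OR", "WA"), "West Coast", "Elsewhere")

## One-way table of the new binary variable
table(toy$region)
```

Using %in% avoids chaining several == comparisons together with the | operator.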
\(~\)
As a reminder, you should work on the lab with your assigned partner(s) using the principles of paired programming. Everyone should keep and submit their own copy of your group’s work.
In this lab we’ll use a data set assembled by the Washington Post that attempts to document all fatal shootings made by a police officer from 2015 until mid-2020.
police <- read.csv("https://remiller1450.github.io/data/Police.csv")
You’ll also need the same packages used in our previous lab, ggplot2 and forcats. Remember that you do not need to install these packages if you’ve done so previously, but you do need to load them using the library() function:
# install.packages("ggplot2")
# install.packages("forcats")
library(ggplot2)
library(forcats)
\(~\)
Recall that we’ve previously used the table() function to create a one-way frequency table for a single categorical variable.
## One-way table of "threat level"
table(police$threat_level)
##
##       attack        other undetermined
##         4750         2516          282
We can use this function to create a two-way frequency table by supplying a second variable as an additional argument to table():
## Two-way table of "threat level" and "armed"
table(police$threat_level, police$armed)
##
##                armed unarmed
##   attack        4573     177
##   other         2283     233
##   undetermined   235      47
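It can also help to see the row and column totals alongside the cell counts. The sketch below rebuilds the two-way table from the counts printed above (so it runs without downloading the data) and uses the base R function addmargins() to append the totals. The object names threat_armed and with_totals are chosen here for illustration.

```r
## Rebuild the two-way table from the counts printed above
threat_armed <- as.table(matrix(c(4573, 177,
                                  2283, 233,
                                   235,  47),
                                nrow = 3, byrow = TRUE,
                                dimnames = list(c("attack", "other", "undetermined"),
                                                c("armed", "unarmed"))))

## Append a "Sum" row and a "Sum" column holding the totals
with_totals <- addmargins(threat_armed)
with_totals
```

The row totals should agree with the one-way table of threat_level shown earlier.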
Question #1: Create a two-way table showing the frequencies of each combination of the variable flee and the variable armed.
\(~\)
From a two-way frequency table we can calculate conditional proportions “by hand” using indices (i.e., accessing elements of the table via square brackets, [ and ]).
## Store previous table
threat_armed_table = table(police$threat_level, police$armed)
## Proportion of attackers who were armed
threat_armed_table[1,1]/sum(threat_armed_table[1,])
## [1] 0.9627368
Two things to notice about this example:
1. We used threat_armed_table[1,1] to access the frequency of cases whose value of threat_level was “attack” and whose value of armed was “armed”. This is the upper left cell in the two-way table, or 4573.
2. The denominator is the total number of cases whose value of threat_level was “attack”. We can get this by taking the entire corresponding row from the table, which is given by threat_armed_table[1,]. Leaving the second index blank indicates we want the entire first row (i.e., all columns in the first row). The sum() function is then used to add up the elements of this row, thereby giving us the total number of cases whose value of threat_level was “attack”.
It is also possible to use the prop.table() function to calculate conditional proportions for every cell in the table:
## Row proportions
prop.table(threat_armed_table, margin = 1)
##
##                     armed    unarmed
##   attack       0.96273684 0.03726316
##   other        0.90739269 0.09260731
##   undetermined 0.83333333 0.16666667
## Column proportions
prop.table(threat_armed_table, margin = 2)
##
##                    armed   unarmed
##   attack       0.6449020 0.3873085
##   other        0.3219574 0.5098468
##   undetermined 0.0331406 0.1028446
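To see what the margin argument is doing, it may help to compare prop.table() with the equivalent calculation “by hand”. The sketch below rebuilds the table from the counts printed earlier (the object names are chosen for illustration) and divides each row by its total using rowSums().

```r
## Rebuild the two-way table from the counts printed earlier
threat_armed <- as.table(matrix(c(4573, 177,
                                  2283, 233,
                                   235,  47),
                                nrow = 3, byrow = TRUE,
                                dimnames = list(c("attack", "other", "undetermined"),
                                                c("armed", "unarmed"))))

## Row proportions "by hand": divide every cell by its row total
by_hand <- threat_armed / rowSums(threat_armed)

## Row proportions from prop.table()
via_prop <- prop.table(threat_armed, margin = 1)

## Compare the two sets of values
same <- all.equal(as.vector(by_hand), as.vector(via_prop))
same

## In a table of row proportions, the proportions in each row add up to 1
rowSums(via_prop)
```

With margin = 2 the same logic applies to columns, so each column of the result adds up to 1 instead.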
Question #2: Apply the prop.table() function to your table from Question #1 to print a table that shows the proportion of armed/unarmed subjects within each category of the variable flee.
\(~\)
Recall that risk difference and relative risk are two different ways of measuring association using conditional proportions. Thus, to calculate these measures of association we can use conditional proportions obtained via prop.table(), or calculated “by hand”.
Interpreting a relative risk requires less subject-area expertise than interpreting a risk difference:
Relative Risk (RR) | Interpretation |
---|---|
RR = 1 | There is no difference in risk between the two groups. |
RR = 1.2 | The event is 20% more likely to occur in the first group than in the second group. This is often considered a small increased risk. |
RR = 1.5 | The event is 50% more likely to occur in the first group than in the second group. This is often considered a moderate increased risk. |
RR = 2 | The event is twice as likely to occur in the first group as in the second group. This is often considered a large increased risk. |
RR = 0.6 | The event is 40% less likely to occur in the first group than in the second group. This is often considered a moderate decreased risk. |
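The percentages in the table come from the arithmetic (RR - 1) x 100. As a small sketch, the helper below (rr_to_percent is a hypothetical name introduced here, not a built-in R function) applies that arithmetic to the relative risks listed in the table.

```r
## Hypothetical helper: percent change in risk implied by a relative risk
## Positive values mean "more likely", negative values mean "less likely"
rr_to_percent <- function(rr) {
  (rr - 1) * 100
}

## Apply it to the relative risks from the table above
rr_to_percent(c(1, 1.2, 1.5, 2, 0.6))
```

A relative risk of 2 corresponds to a 100% increase, which is the same statement as “twice as likely”.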
The example below calculates the risk difference and relative risk comparing the risk of a subject who was killed by police being “unarmed” for subjects with and without previous signs of mental illness.
## Create the two-way table
mental_armed_table = table(police$signs_of_mental_illness, police$armed)
## Store the proper conditional proportions (row props in this example)
mental_armed_props = prop.table(mental_armed_table, margin = 1)
## Arithmetic w/ the relevant proportions (risk difference)
mental_armed_props[1,2] - mental_armed_props[2,2]
## [1] 0.008799235
## Relative risk
mental_armed_props[1,2]/mental_armed_props[2,2]
## [1] 1.16405
In this example the risk difference (difference in proportions) was 0.0088. That is, the chance of an individual without previous signs of mental illness being unarmed is 0.88 percentage points higher than the chance of an individual with previous signs of mental illness being unarmed.
Because it is relatively rare for unarmed individuals to be killed by police, relative risk might provide a better measure of association for the data in this table. Because the relative risk is 1.16, we can state that those without previous signs of mental illness are 16% more likely to have been unarmed relative to those with previous signs of mental illness.
Throughout these examples you should pay careful attention to the indices that were used. These determine which elements of the conditional proportions table are being used. Note: It’s typical to use the group with the larger risk as the numerator in relative risk calculations, so that the relative risk is greater than 1 (as in the example above).
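One way to reduce indexing mistakes is to access cells by their row and column names rather than by position. The sketch below is an illustration using the threat level counts printed earlier in this lab (rebuilt by hand so the code runs on its own), and it compares the risk of being “unarmed” for the “other” and “attack” groups. The object names are chosen for this example.

```r
## Rebuild the threat level by armed table from the counts printed earlier
threat_armed <- as.table(matrix(c(4573, 177,
                                  2283, 233,
                                   235,  47),
                                nrow = 3, byrow = TRUE,
                                dimnames = list(c("attack", "other", "undetermined"),
                                                c("armed", "unarmed"))))

## Row proportions: risk of armed/unarmed within each threat level
props <- prop.table(threat_armed, margin = 1)

## Access cells by name instead of by numeric position
risk_other  <- props["other", "unarmed"]
risk_attack <- props["attack", "unarmed"]

## Risk difference and relative risk ("other" relative to "attack")
risk_diff <- risk_other - risk_attack
rel_risk  <- risk_other / risk_attack
risk_diff
rel_risk
```

Named indices keep working even if the order of the categories changes, which is not true of numeric indices such as [1,2].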
Question #3:
… (a value of TRUE) vs. when a body camera was not present (a value of FALSE).
… (a value of TRUE) vs. when a body camera was not present (a value of FALSE).
\(~\)
We can calculate odds and odds ratios using indices in the same way we calculated conditional proportions and relative risks.
## Example table
mental_armed_table = table(police$signs_of_mental_illness, police$armed)
## Odds of "unarmed" among those with signs of mental illness
odds_ua_mental = mental_armed_table[2,2]/mental_armed_table[2,1]
odds_ua_mental
## [1] 0.05667752
## Odds of "unarmed" among those with no signs of mental illness
odds_ua_no = mental_armed_table[1,2]/mental_armed_table[1,1]
odds_ua_no
## [1] 0.06659467
## Odds ratio (no signs of mental illness relative to signs of mental illness)
odds_ua_no/odds_ua_mental
## [1] 1.174975
In this example, the odds ratio of 1.175 suggests that cases without previous signs of mental illness have 17.5% higher odds of being unarmed than those with previous signs of mental illness.
You might notice that this odds ratio shows approximately the same strength of association as the relative risk, which we found to be 1.16 in the previous section. This similarity between odds ratios and relative risks holds more generally for uncommon (rare) events. If you think about the calculation of odds vs. the calculation of risk, you’ll notice that they use the same numerator, and their denominators will be approximately the same when the count of the event of interest is small relative to the overall group size.
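A quick numerical check can make this concrete. The sketch below uses made-up counts (they are not from the police data) and a hypothetical helper, rr_and_or(), to compare the relative risk and the odds ratio when the event is rare and when it is common.

```r
## Hypothetical helper: relative risk and odds ratio from event counts
## events1/n1 describe the first group, events2/n2 describe the second group
rr_and_or <- function(events1, n1, events2, n2) {
  risk1 <- events1 / n1                # risk = events / group size
  risk2 <- events2 / n2
  odds1 <- events1 / (n1 - events1)    # odds = events / non-events
  odds2 <- events2 / (n2 - events2)
  c(relative_risk = risk1 / risk2, odds_ratio = odds1 / odds2)
}

## Rare event (made-up counts): 20 of 1000 vs. 10 of 1000
rare <- rr_and_or(20, 1000, 10, 1000)
rare

## Common event (made-up counts): 800 of 1000 vs. 400 of 1000
common <- rr_and_or(800, 1000, 400, 1000)
common
```

Both scenarios have the same relative risk, but the odds ratio stays close to it only in the rare-event scenario, where the number of non-events is nearly equal to the group size.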
Question #4: