Throughout the semester we’ve reported various measures of association (odds ratios, correlation, etc.) found in our sample data as estimates of what might be true of the population we’re studying. Looking beyond the metric we use to summarize an association, statisticians also distinguish between a marginal effect, the association between two variables across the entire population, and conditional effects, the association between those variables within subgroups defined by a third variable.
In circumstances where the marginal effect differs from the conditional effects found within subgroups, statisticians might calculate an adjusted effect: the expected effect of an explanatory variable after controlling for other variables that are associated with both it and the response.
Consider an observational study comparing the blood pressure of individuals who exercise at least 180 minutes each week with those who exercise fewer than 180 minutes each week.
In this lab you’ll work with the “Florida Death Penalty Sentencing” data set. These data are from an influential study published in 1981 that analyzed the sentencing outcomes of individuals who were convicted of murder committed in the course of another felony in the state of Florida between 1972 and 1977. The researchers were studying racial bias in death penalty sentencing, and they recorded the race of the offender, the race of the murder victim, and whether or not the offender was sentenced to receive the death penalty.
death_penalty <- read.csv("https://remiller1450.github.io/data/DeathPenaltySentencing.csv")
You’ll also need the ggplot2 and dplyr
libraries, so make sure you’ve loaded them:
library(ggplot2)
library(dplyr)
Recall that we can tabulate counts using either the
table() function or group_by() and
summarize() from the dplyr package. Below is a
two-way table relating the variables OffenderRace (whether
the individual being tried was White or Black) and
DeathPenalty (whether or not the death penalty was
awarded):
## Two-way table using table()
table(death_penalty$OffenderRace, death_penalty$DeathPenalty)
##        
##         death not
##   black    38 142
##   white    46 152
Below is the same information found using group_by() and
summarize():
death_penalty %>% group_by(OffenderRace, DeathPenalty) %>% summarize(Count = n())
## # A tibble: 4 × 3
## # Groups: OffenderRace [2]
## OffenderRace DeathPenalty Count
## <chr> <chr> <int>
## 1 black death 38
## 2 black not 142
## 3 white death 46
## 4 white not 152
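Both approaches above report counts rather than proportions. As a minimal sketch, the counts can be turned into the proportion of each offender race that did or did not receive the death penalty with base R's prop.table(), where margin = 1 divides each row by its row total. To keep the example self-contained, the counts from the two-way table are typed in by hand (the object name tab is hypothetical); with the lab data you could instead pass table(death_penalty$OffenderRace, death_penalty$DeathPenalty) to prop.table().

```r
# Counts copied from the two-way table of OffenderRace by DeathPenalty above
tab <- matrix(c(38, 142,
                46, 152),
              nrow = 2, byrow = TRUE,
              dimnames = list(OffenderRace = c("black", "white"),
                              DeathPenalty = c("death", "not")))

# margin = 1 divides each cell by its row total, so each row sums to 1
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

The "death" column of row_props holds the sample risk of the death penalty for each offender race (38/180 for Black offenders and 46/198 for White offenders).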
Question #1: Use prop.test() to find a 95%
confidence interval estimate for the difference in proportions of White
and Black offenders who received the death penalty. Does there appear to
be statistically convincing evidence of a difference?
Question #2: Use fisher.test() to find a
95% confidence interval estimate for the odds ratio comparing
the odds of a Black offender receiving the death penalty with the odds
of a White offender receiving the death penalty. Does there appear to be
statistically convincing evidence of an association between the
offender’s race and the death penalty verdict?
An advantage of using group_by() and
summarize() is that we can add arbitrarily many
grouping variables. This allows us to find conditional effects
for subgroups present in the sample.
This is demonstrated below by finding the sample counts for every
combination of OffenderRace, VictimRace, and
DeathPenalty:
death_penalty %>% group_by(OffenderRace, VictimRace, DeathPenalty) %>% summarize(Count = n())
## # A tibble: 7 × 4
## # Groups: OffenderRace, VictimRace [4]
## OffenderRace VictimRace DeathPenalty Count
## <chr> <chr> <chr> <int>
## 1 black black death 1
## 2 black black not 101
## 3 black white death 37
## 4 black white not 41
## 5 white black not 8
## 6 white white death 46
## 7 white white not 144
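As a sketch of how grouping turns these counts into conditional risks, the example below rebuilds a case-level data frame from the counts printed in the three-way table and computes the proportion of death sentences within each OffenderRace and VictimRace subgroup. The objects cell_counts and dp are hypothetical stand-ins for death_penalty, used only so the example runs without downloading the data; the .groups = "drop" argument simply returns an ungrouped result.

```r
library(dplyr)

# Counts copied from the three-way table above (one row per observed combination)
cell_counts <- data.frame(
  OffenderRace = c("black", "black", "black", "black", "white", "white", "white"),
  VictimRace   = c("black", "black", "white", "white", "black", "white", "white"),
  DeathPenalty = c("death", "not",   "death", "not",   "not",   "death", "not"),
  Count        = c(1, 101, 37, 41, 8, 46, 144)
)

# Expand to one row per case (a stand-in for the real death_penalty data)
dp <- cell_counts[rep(seq_len(nrow(cell_counts)), cell_counts$Count),
                  c("OffenderRace", "VictimRace", "DeathPenalty")]

# Conditional risk of the death penalty within each subgroup
cond_risk <- dp %>%
  group_by(OffenderRace, VictimRace) %>%
  summarize(Cases = n(),
            Risk = mean(DeathPenalty == "death"),
            .groups = "drop")
cond_risk
```

Grouping by only OffenderRace and VictimRace and averaging a logical keeps the White offender/Black victim subgroup, with a risk of 0. In the count table above that subgroup has no "death" row, because group_by() only returns combinations that actually occur in the data.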
Alternatively, we could also perform a subgroup analysis by filtering
the original data set and applying functions like table()
to that subset. This approach is demonstrated below:
# Keep only the cases involving a White victim, then cross-tabulate
white_victim_subset <- death_penalty %>% filter(VictimRace == "white")
table(white_victim_subset$OffenderRace, white_victim_subset$DeathPenalty)
##        
##         death not
##   black    37  41
##   white    46 144
Question #3: Use prop.test() to test the
hypothesis that equal proportions of White and Black offenders receive
the death penalty among the subset of cases involving a White victim.
Report a one-sentence conclusion that includes the strength of evidence,
context, and the observed sample proportions (see our Lab 5 for examples
of this type of conclusion).
Question #4: In Questions #1 and #2 the marginal effect suggested that White offenders were slightly more likely to receive the death penalty; however, Question #3 and the three-way table above show that the conditional effect of the offender’s race is such that Black offenders were more likely to receive the death penalty regardless of the race of the victim. This question will explore how this type of surprising result is possible.
First, create a bar chart displaying the distribution of DeathPenalty for each value of VictimRace (a
conditional bar chart), and briefly describe the relationship between these
two variables.
Then, create a conditional bar chart displaying the distribution of OffenderRace for each value of VictimRace,
and briefly describe the relationship between these two variables.
In Questions #3 and #4 you saw that VictimRace was a
confounding variable in the relationship between
OffenderRace and DeathPenalty. More generally,
a confounding variable is a third variable that is associated
with both the explanatory and response variables in an analysis. The
presence of a confounding variable can obscure the relationship between
the explanatory and response variables and make the marginal effect
misleading.
One strategy to address confounding is stratification, which involves separating the sample into subsets (strata) based upon the confounding variable and reporting conditional effects within each of these subsets. If every confounding variable is addressed via stratification, we can expect an accurate picture of how the explanatory and response variables are related.
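A minimal sketch of stratification, assuming we summarize each stratum with a risk difference (this choice of effect measure is ours; an odds ratio would work the same way, except that the Black-victim stratum contains a zero count). The deaths and case totals come from the three-way table, arranged with one row per VictimRace stratum and one column per OffenderRace; the object names are illustrative.

```r
# Deaths and total cases from the three-way table:
# rows are VictimRace strata, columns are OffenderRace
deaths <- matrix(c(1,  0,
                   37, 46),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(VictimRace = c("black", "white"),
                                 OffenderRace = c("black", "white")))
cases <- matrix(c(102, 8,
                  78,  190),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(deaths))

# Conditional risk of the death penalty within each stratum
risk <- deaths / cases

# Stratum-specific risk difference: Black offenders minus White offenders
risk_diff <- risk[, "black"] - risk[, "white"]
round(risk_diff, 3)
```

Both stratum-specific differences are positive, in the opposite direction of the marginal comparison, although the Black-victim stratum contains only 8 cases with a White offender.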
A downside of stratification is that we now have a different effect within each stratum to interpret rather than a single number describing the entire dataset. This might be manageable if we are stratifying by one or two variables, but it quickly becomes untenable if there are a lot of confounding variables that we need to address.
As an alternative, we can recognize that the marginal effect in the
death penalty sentencing study was misleading due to an imbalance in
VictimRace across the different categories of
OffenderRace (as well as the association between
VictimRace and DeathPenalty). So, we could
resolve the confounding effects of VictimRace if we were
able to force an equal distribution of VictimRace
within each category of OffenderRace. To see how this
works, consider the marginal risk of a White offender receiving the
death penalty (which we had previously found to be 0.232): \[\text{Risk}_{\text{death}|\text{white off}} =
\frac{190}{198}\cdot 0.242 + \frac{8}{198} \cdot 0 = 0.232\]
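From the above expression we can see that the marginal risk is a weighted sum of the conditional risks, where each conditional risk is weighted by the proportion of White offenders’ cases in that subgroup: 190/198 involved a White victim and 8/198 involved a Black victim.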
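The decomposition can be checked numerically. This sketch uses the White-offender counts from the three-way table (46 deaths among 190 White-victim cases and 0 deaths among 8 Black-victim cases); the variable names are illustrative.

```r
# White offenders only: cases and deaths by victim race
white_cases  <- c(white = 190, black = 8)
white_deaths <- c(white = 46,  black = 0)

white_cond_risk <- white_deaths / white_cases        # conditional risks: about 0.242 and 0
sample_weights  <- white_cases / sum(white_cases)    # weights: 190/198 and 8/198

# Weighted sum of the conditional risks
marginal_risk <- sum(sample_weights * white_cond_risk)
all.equal(marginal_risk, 46 / 198)   # matches the marginal risk 46/198
```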
Instead of using these “weights” that naturally occurred in our sample, we could use another set of weights. Two logical choices are:
The example below shows the adjusted risk using the first approach: \[\text{Adjusted Risk}_{\text{death}|\text{white off}} = \frac{268}{378}\cdot 0.242 + \frac{110}{378} \cdot 0 = 0.172\]
Notice that overall, regardless of offender’s race, 268 of the 378
cases in the data set involved a White victim, while 110 involved a
Black victim. These values could be found using a simple one-way
frequency table of VictimRace, and here they are being used
to determine the risk of the death penalty for White offenders if White
offenders had the same distribution of victim’s race as was observed in
the entire dataset.
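The same reweighting, often called direct standardization, can be written as a short calculation. This sketch hard-codes the counts quoted above for White offenders; the names victim_totals, std_weights, and white_off_risk are illustrative.

```r
# Overall distribution of VictimRace across all 378 cases (any offender race)
victim_totals <- c(white = 268, black = 110)
std_weights   <- victim_totals / sum(victim_totals)

# Conditional risks for White offenders within each victim-race stratum
white_off_risk <- c(white = 46 / 190, black = 0 / 8)

# Adjusted risk: reweight the conditional risks by the overall distribution
adj_risk_white <- sum(std_weights * white_off_risk[names(std_weights)])
round(adj_risk_white, 3)
```

Indexing white_off_risk by names(std_weights) keeps each weight paired with the matching stratum even if the two vectors are written in different orders.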
Question #5: Use the overall distribution of VictimRace to find an adjusted
risk of a Black offender receiving the death penalty. Show
your calculation.