Lab #13 - Confounding, Stratification, and Adjusted Effects


Onboarding

Throughout the semester we’ve reported various measures of association (odds ratios, correlation, etc.) found in our sample data as estimates of what might be true of the population we’re studying. Beyond the choice of metric used to summarize an association, statisticians also distinguish between a few types of association:

Marginal effects - The average effect of the explanatory variable across all individuals in the population.
Conditional effects - The effect of an explanatory variable for a specific individual or subgroup of the population.

In circumstances where the marginal effect differs from the conditional effects found within subgroups, statisticians might calculate an adjusted effect: the effect of the explanatory variable we would expect to see if other variables related to both it and the response were held fixed (controlled for).

Example

Consider an observational study comparing the blood pressure of individuals who exercise at least 180 minutes each week with those who exercise fewer than 180 minutes each week.

The overall difference in average blood pressure between these two groups in the sample is a marginal effect.
The difference in average blood pressure between these two groups among those aged 65+ is a conditional effect.
If the age composition of the two groups differs, we might modify the marginal effect to account for those age differences, which gives an adjusted effect (we’ll see one way of doing this later in the lab).


Lab

In this lab you’ll work with the “Florida Death Penalty Sentencing” data set. These data are from an influential study published in 1981 that analyzed the sentencing outcomes of individuals who were convicted of murder committed in the course of another felony in the state of Florida between 1972 and 1977. The researchers were studying racial bias in death penalty sentencing, and they recorded the race of the offender, the race of the murder victim, and whether or not the offender was sentenced to receive the death penalty.

death_penalty <- read.csv("https://remiller1450.github.io/data/DeathPenaltySentencing.csv")

You’ll also need the ggplot2 and dplyr libraries, so make sure you’ve loaded them:

library(ggplot2)
library(dplyr)

Marginal Effects

Recall that we can tabulate the counts needed to find proportions using either the table() function, or group_by() and summarize() from the dplyr package. Below is a two-way table relating the variables OffenderRace (whether the convicted individual was White or Black) and DeathPenalty (whether or not the death penalty was imposed):

## Two-way table using table()
table(death_penalty$OffenderRace, death_penalty$DeathPenalty)

##        
##         death not
##   black    38 142
##   white    46 152

Below is the same information found using group_by() and summarize():

death_penalty %>% group_by(OffenderRace, DeathPenalty) %>% summarize(Count = n())

## # A tibble: 4 × 3
## # Groups:   OffenderRace [2]
##   OffenderRace DeathPenalty Count
##   <chr>        <chr>        <int>
## 1 black        death           38
## 2 black        not            142
## 3 white        death           46
## 4 white        not            152
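Both outputs above report counts rather than proportions. As a minimal sketch of one way to convert them, the code below re-enters the four counts from the two-way table by hand (so it runs without downloading the data) and uses prop.table() to divide each cell by its row total. The object names tab and row_props are illustrative choices, not part of the data set.

```r
# Counts copied by hand from the two-way table printed above
tab <- matrix(c(38, 142,
                46, 152),
              nrow = 2, byrow = TRUE,
              dimnames = list(OffenderRace = c("black", "white"),
                              DeathPenalty = c("death", "not")))

# margin = 1 divides each cell by its row total, so every row sums to 1
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

With the full data set loaded, passing table(death_penalty$OffenderRace, death_penalty$DeathPenalty) to prop.table() in place of tab should give the same proportions.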

Question #1:

Part A: Using the two-way table provided in this section, calculate the proportion of White offenders who received the death penalty.
Part B: Now calculate the proportion of Black offenders who received the death penalty. Does this proportion seem substantially different from the proportion of White offenders who received the death penalty?
Part C: Use prop.test() to find a 95% confidence interval estimate for the difference in proportions of White and Black offenders who received the death penalty. Does there appear to be statistically convincing evidence of a difference?
Part D: In your own words, explain why the difference in proportions you calculated in Part C is a marginal effect.


Question #2:

Part A: Using the two-way table provided in this section, calculate the odds of a White offender receiving the death penalty.
Part B: Now calculate the odds of a Black offender receiving the death penalty. Do these odds seem substantially different from those you found in Part A?
Part C: Use fisher.test() to find a 95% confidence interval estimate for the odds ratio comparing the odds of a Black offender receiving the death penalty with the odds of a White offender receiving the death penalty. Does there appear to be statistically convincing evidence of an association between the offender’s race and the death penalty verdict?
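If it has been a while since you used these functions, the sketch below shows one way to call them on a 2×2 table of counts. It assumes the counts are re-entered by hand from the two-way table in this section, with "death" as the first column so that it is treated as the outcome of interest. The names tab, pt, and ft are illustrative.

```r
# Counts copied by hand from the two-way table in this section
tab <- matrix(c(38, 142,
                46, 152),
              nrow = 2, byrow = TRUE,
              dimnames = list(OffenderRace = c("black", "white"),
                              DeathPenalty = c("death", "not")))

# prop.test() treats column 1 as "successes" and column 2 as "failures";
# its interval is for (row 1 proportion) - (row 2 proportion)
pt <- prop.test(tab, conf.level = 0.95)
pt$conf.int

# fisher.test() estimates the odds ratio of column 1 for row 1 versus row 2
ft <- fisher.test(tab, conf.level = 0.95)
ft$conf.int
```

Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, so it may differ slightly from the ratio of the sample odds you calculate by hand.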

Conditional Effects

An advantage of using group_by() and summarize() is that we can add arbitrarily many different grouping variables. This allows us to find conditional effects for subgroups present in the sample.

This is demonstrated below by finding the sample counts for every combination of OffenderRace, VictimRace, and DeathPenalty:

death_penalty %>% group_by(OffenderRace, VictimRace, DeathPenalty) %>% summarize(Count = n())

## # A tibble: 7 × 4
## # Groups:   OffenderRace, VictimRace [4]
##   OffenderRace VictimRace DeathPenalty Count
##   <chr>        <chr>      <chr>        <int>
## 1 black        black      death            1
## 2 black        black      not            101
## 3 black        white      death           37
## 4 black        white      not             41
## 5 white        black      not              8
## 6 white        white      death           46
## 7 white        white      not            144

Alternatively, we could also perform a subgroup analysis by filtering the original data set and applying functions like table() to that subset. This approach is demonstrated below:

## Keep only the cases with a White victim, then cross-tabulate
white_victim_subset <- death_penalty %>% filter(VictimRace == "white")
table(white_victim_subset$OffenderRace, white_victim_subset$DeathPenalty)

##        
##         death not
##   black    37  41
##   white    46 144
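A third option, sketched below, is to cross-tabulate all three variables at once and then pull out one stratum at a time. To keep the example self-contained, it rebuilds the table from the seven counts printed earlier in this section using xtabs(). The names counts and strata are illustrative.

```r
# Cell counts copied by hand from the three-way summary in this section
counts <- data.frame(
  OffenderRace = c("black", "black", "black", "black", "white", "white", "white"),
  VictimRace   = c("black", "black", "white", "white", "black", "white", "white"),
  DeathPenalty = c("death", "not",   "death", "not",   "not",   "death", "not"),
  Count        = c(1, 101, 37, 41, 8, 46, 144)
)

# Three-way table: rows = offender race, columns = sentence, layers = victim race
strata <- xtabs(Count ~ OffenderRace + DeathPenalty + VictimRace, data = counts)

# Each layer is the two-way table for one value of VictimRace
strata[, , "white"]
strata[, , "black"]
```

Combinations that never occur in the sample (a White offender with a Black victim who was sentenced to death) show up as 0 rather than being dropped. With the raw data, table(death_penalty$OffenderRace, death_penalty$DeathPenalty, death_penalty$VictimRace) should produce the same layers.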

Question #3:

Part A: Considering only cases involving a White victim, report the proportion of White offenders and the proportion of Black offenders who received the death penalty.
Part B: Use prop.test() to test the hypothesis that equal proportions of White and Black offenders receive the death penalty among the subset of cases involving a White victim. Report a one-sentence conclusion that includes the strength of evidence, context, and the observed sample proportions (see our Lab 5 for examples of this type of conclusion).
Part C: Now consider only cases involving a Black victim. Report the proportion of White offenders and the proportion of Black offenders who received the death penalty.


Question #4: In Questions #1 and #2 the marginal effect suggested that White offenders were slightly more likely to receive the death penalty; however, in Question #3 the conditional effects showed that Black offenders were more likely to receive the death penalty regardless of the race of the victim. This question explores how this type of surprising result is possible.

Part A: Using the entire data set, create a bar chart that shows the proportion of cases in each category of DeathPenalty for each value of VictimRace (a conditional bar chart). Briefly describe the relationship between these two variables.
Part B: Again using the entire data set, create a conditional bar chart showing the proportion of cases in each category of OffenderRace for each value of VictimRace. Briefly describe the relationship between these two variables.
Part C: Using the results from Parts A and B, explain why the marginal effect seemingly suggests White offenders are more likely to receive the death penalty despite Black offenders receiving the death penalty more often for both categories of victim’s race. Note: this phenomenon is an example of Simpson’s paradox.
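As a reminder of what a conditional bar chart looks like, the sketch below draws one for a pair of variables not asked about in this question: DeathPenalty for each value of OffenderRace. It assumes the four counts are re-entered by hand from the two-way table in the Marginal Effects section, and the names counts and p are illustrative.

```r
library(ggplot2)

# Counts copied by hand from the two-way table in the Marginal Effects section
counts <- data.frame(
  OffenderRace = c("black", "black", "white", "white"),
  DeathPenalty = c("death", "not", "death", "not"),
  Count        = c(38, 142, 46, 152)
)

# position = "fill" rescales every bar to height 1, so the segments show
# conditional proportions within each offender race rather than raw counts
p <- ggplot(counts, aes(x = OffenderRace, y = Count, fill = DeathPenalty)) +
  geom_col(position = "fill") +
  labs(y = "Proportion of cases")
p
```

When working from the raw data (one row per case) there is no Count column, so one common alternative is ggplot(death_penalty, aes(x = OffenderRace, fill = DeathPenalty)) + geom_bar(position = "fill").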


Adjusted Effects

In Questions #3 and #4 you saw that VictimRace was a confounding variable in the relationship between OffenderRace and DeathPenalty. More generally, a confounding variable is defined as a third variable that is associated with both the explanatory and response variables in an analysis. The presence of a confounding variable can distort the apparent relationship between the explanatory and response variables, making the marginal effect misleading.

One strategy to address confounding is stratification, which involves separating the sample into subsets based upon the confounding variable and reporting conditional effects within each of these subsets. If all confounding variables are addressed via stratification, we expect to have an accurate picture of how the explanatory and response variables are related.

A downside of stratification is that we now have a different effect within each stratum to interpret rather than a single number describing the entire dataset. This might be manageable if we are stratifying by one or two variables, but it quickly becomes untenable if there are a lot of confounding variables that we need to address.

As an alternative, we can recognize that the marginal effect in the death penalty sentencing study was misleading due to an imbalance in VictimRace across the different categories of OffenderRace (as well as the association between VictimRace and DeathPenalty). So, we could resolve the confounding effects of VictimRace if we were able to force an equal distribution of VictimRace within each category of OffenderRace. To see how this works, consider the marginal risk of a White offender receiving the death penalty (which we had previously found to be 0.232): \[\text{Risk}_{\text{death}|\text{white off}} = \frac{190}{198}\cdot 0.242 + \frac{8}{198} \cdot 0 = 0.232\]

From the above expression we can see that the marginal risk can be decomposed into a weighted sum of the conditional risks:

Recall that in Question #3 you found the conditional risk of the death penalty for White offenders to be 0.242 (or 46/190) when the victim was White, and 0.0 (or 0/8) when the victim was Black.
Additionally, notice that of the 198 cases involving a White offender, 190 involved a White victim, and 8 involved a Black victim.
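The decomposition above can be checked numerically. The sketch below is one way to do so, assuming the White-offender counts are re-entered by hand from the three-way summary earlier in the lab. All object names are illustrative.

```r
# White-offender counts copied by hand from the three-way summary in this lab
cases  <- c(white_victim = 190, black_victim = 8)  # cases in each stratum
deaths <- c(white_victim = 46,  black_victim = 0)  # death sentences in each stratum

cond_risk <- deaths / cases       # conditional risks: 46/190 and 0/8
weights   <- cases / sum(cases)   # sample weights: 190/198 and 8/198

weighted_sum  <- sum(weights * cond_risk)
marginal_risk <- sum(deaths) / sum(cases)   # 46/198

round(c(weighted_sum = weighted_sum, marginal = marginal_risk), 3)
all.equal(weighted_sum, marginal_risk)
```

The two quantities agree because the stratum sizes cancel: multiplying the weight 190/198 by the conditional risk 46/190 leaves 46/198, which is exactly that stratum's contribution to the marginal risk.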

Instead of using these “weights” that naturally occurred in our sample, we could use another set of weights. Two logical choices are:

The overall proportions of White and Black victims in the entire dataset
The distribution of victim’s race within a reference group (either White offenders or Black offenders)

The example below shows the adjusted risk using the first approach: \[\text{Adjusted Risk}_{\text{death}|\text{white off}} = \frac{268}{378}\cdot 0.242 + \frac{110}{378} \cdot 0 = 0.172\]

Notice that overall, regardless of offender’s race, 268 of the 378 cases in the data set involved a White victim, while 110 involved a Black victim. These values could be found using a simple one-way frequency table of VictimRace, and here they are used to determine the risk of the death penalty that White offenders would face if they had the same distribution of victim’s race as was observed in the entire dataset.
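The same calculation can be sketched in R. This approach is often called direct standardization. The code below assumes the counts quoted in this section are entered by hand, and the object names are illustrative.

```r
# Victim-race totals for the whole data set: 268 White and 110 Black out of 378
overall_weights <- c(white_victim = 268, black_victim = 110) / 378

# Conditional risks for White offenders within each victim-race stratum
cond_risk_white_off <- c(white_victim = 46 / 190, black_victim = 0 / 8)

# Weighted sum of the conditional risks, using the overall weights
adjusted_risk <- sum(overall_weights * cond_risk_white_off)
round(adjusted_risk, 3)   # 0.172
```

To use a reference group instead, the only change would be to replace overall_weights with that group's own distribution of victim’s race.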

Question #5:

Part A: Verify that the marginal risk of a Black offender receiving the death penalty (which you previously found to be 0.211) can be expressed as a weighted sum of conditional risks. To do this, you should show a calculation similar to the one in this section’s example.
Part B: Similar to the example in this section, use the overall distribution of VictimRace to find an adjusted risk of a Black offender receiving the death penalty. Show your calculation.
Part C: Is the adjusted risk you found in Part B higher or lower than the marginal risk? Why do you think this happened?
Part D: Compare the adjusted risk you found in Part B with the adjusted risk for White offenders (shown to be 0.172 in the example). Do these adjusted risks suggest possible racial bias? Explain your reasoning.