Directions (read before starting)
\(~\)
At this point you should begin working independently with your assigned partner(s).
A widely cited study, published in 1981, analyzed data on all murders which took place during a felony that were committed in the state of Florida between 1972 and 1977. The study recorded numerous attributes pertaining to each of these murders, with the outcome of interest being whether the offender was sentenced to the death penalty. The researchers were interested in investigating potential racial bias in death penalty sentencing.
Data from this study are available at this URL:
https://remiller1450.github.io/data/DeathPenaltySentencing.csv
These data contain the following variables:
Question 1: This question explores whether the
explanatory variable OffenderRace
is associated with the
response variable DeathPenalty
.
OffenderRace
and the columns are the
values of DeathPenalty
.\(~\)
These data contain a third variable, the race of the victim. For reasons that will soon be apparent, it common for statisticians to analyze subgroups of cases, or segments of the data created by conditioning on a variable that is not directly involved in the primary research question.
The technique of splitting up a data set according to a variable that is neither the explanatory nor response variable is called stratification. After stratification, conditional analyses are performed within each stratum.
For the death penalty sentencing data, we can perform a stratified
analysis using the third variable VictimRace
by using the
filter()
function (from the dplyr
package)
before repeating the steps performed in Question 1.
Question 2: In this question you’ll perform a
stratified analysis on the subgroups created by the variable
VictimRace
.
filter()
function in
the dplyr
package to create a subset of data that contains
only the cases involving a White victimOffenderRace
and the columns are the values of
DeathPenalty
.filter()
function
to create a subset of data containing only cases involving a Black
victim. Use this data to create a two-way frequency table where the rows
are values of OffenderRace
and the columns are the values
of DeathPenalty
.\(~\)
In this study, “VictimRace” is a confounding variable, or a third variable that is associated with both the explanatory and response variables in the primary analysis. A consequence of confounding is that the real relationship between the explanatory and response variables can be obscured.
Stratification can neutralize the impact of a confounding variable, as every case belonging to a particular stratum has identical values of the confounding variable. Thus, what was a confounding variable no longer meets the definition of confounding within a stratum because it is no longer associated with variable involved in the primary analysis.
Question 3: This question explores why the
variable VictimRace
confounds the relationship between
OffenderRace
and DeathPenalty
.
DeathPenalty
for each value of VictimRace
(this is a conditional bar chart). Provide a brief
interpretation of the relationship you see in this bar chart.OffenderRace
for each value of VictimRace
(this is a conditional bar chart). Provide a brief
interpretation of the relationship you see in this bar chart.VictimRace
satisfy the
definition of confounding?