The candy weighing activity demonstrated the critical importance of how data are collected if valid inferences are to be made. In this lab, we’ll see that even when our data are a representative sample or an entire population, it is still challenging to accurately answer a research question when multiple variables are involved.

Directions

  • Read through the entire lab (not just the questions). The lab will introduce course content that you will be responsible for on exams/homework.
  • Answer all questions in a separate document, attaching Minitab output if needed.
  • Do not use a “divide and conquer” strategy. While it is tempting to get done quicker, this approach negatively impacts you and your classmates. You are expected to work through the lab as a team. Also, you should recognize that Prof. Miller is happy to devote more class time to a lab if it is taking longer than anticipated.

Florida Death Penalty Data

A widely cited study, published in 1981, analyzed data on murders committed during a felony in the state of Florida between 1972 and 1977. The study recorded numerous attributes pertaining to each of these murders, with the outcome of interest being whether the offender was sentenced to the death penalty. The researchers were interested in investigating potential racial bias in death penalty sentencing. We will analyze some of the data collected by these researchers in today’s lab.

Question #1

Inspect the Death Penalty Sentencing Data, specifically considering the variables: “OffenderRace”, which describes the race of the offender, and “DeathPenalty”, which describes whether the offender received the death penalty. Suppose the research goal is to determine whether there is racial bias in death penalty sentencing (in Florida during the 1970s). For this question, identify the explanatory and response variables, and briefly explain your choices.

Question #2

Use Minitab to create a two-way frequency table displaying the race of the offender (rows) and whether the offender received the death penalty (columns).

Question #3

Use Minitab to create a stacked, conditional bar chart displaying the conditional proportions of white offenders and black offenders who received the death penalty. Note that these should be conditional proportions, so each “stack” should have a height of one. Be sure to include your graph in your lab write-up.

Question #4

Using the table you created in Question #2, report the conditional proportion of white offenders who received the death penalty and the conditional proportion of black offenders who received the death penalty. Then, using this result and the graph from Question #3, answer the following: Do the variables “OffenderRace” and “DeathPenalty” appear to be associated? Does there appear to be evidence of racial bias in death penalty sentences?

Causation vs. Association

Before using these data to make definitive statements about racial bias (or lack thereof), we need to get more specific about variable associations.

Two variables are said to be associated if certain values of one variable tend to correspond with certain values of the other variable across cases.

Many statistical applications center around discovering associations, but much of the time researchers are seeking to find a more specific type of association known as causation.

Two variables are said to be causally associated if directly changing the value of one variable (for a given case) directly affects the value of the other variable. For example, consider the binary categorical variables “developed polio?” and “received polio vaccine?”:

  • The variables are associated if a lower (or higher) proportion of people who received the vaccine developed polio
  • The variables are causally associated if giving an individual the polio vaccine directly reduces the likelihood that they go on to develop polio
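The association half of this example can be checked numerically by comparing conditional proportions. Below is a minimal Python sketch (the lab itself uses Minitab); the counts are hypothetical, invented purely for illustration, and the function name is an assumption of this sketch.

```python
# Hypothetical counts, invented for illustration only (not real vaccine data).
# Keys are (received_vaccine, developed_polio) pairs.
counts = {
    ("yes", "yes"): 5,
    ("yes", "no"): 995,
    ("no", "yes"): 20,
    ("no", "no"): 980,
}

def conditional_proportion(counts, vaccine, polio="yes"):
    """Proportion with the given polio status among cases with the given vaccine status."""
    # Condition on vaccine status: the denominator is that group's total only.
    group_total = sum(n for (v, _), n in counts.items() if v == vaccine)
    return counts[(vaccine, polio)] / group_total

p_vaccinated = conditional_proportion(counts, "yes")
p_unvaccinated = conditional_proportion(counts, "no")

print(f"Proportion developing polio, vaccinated:   {p_vaccinated:.3f}")
print(f"Proportion developing polio, unvaccinated: {p_unvaccinated:.3f}")
```

If the two conditional proportions differ, the variables are associated in these (made-up) data. Whether that association is causal cannot be read from the table alone.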

Sometimes it is very easy to dismiss the idea that an association is a causal association.

  • Consider the binary categorical variables: “Plays professional basketball?”, and “fits comfortably into a commercial airplane seat?”
  • The variables are clearly associated: people who play professional basketball are less likely to fit comfortably into an airplane seat than those who don’t
  • But the relationship isn’t causal: it should be obvious that taking a random person and forcing them onto a professional basketball roster won’t suddenly make them less comfortable on airplanes

Check out the website: Spurious Correlations to see some examples of obvious non-causal associations.

Question #5

Returning to the Florida death penalty sentencing case study, briefly explain what a causal association would mean in the context of the variables described in Questions #1-4.

Stratification

The Florida courts data contain another variable: the race of the victim. For reasons that will soon be apparent, it is common for statisticians to analyze subgroups of cases, or groups that are created by conditioning on a particular variable. For example, we might want to look at first-year students and upperclassmen separately if we think that the relationship between the explanatory and response variables might differ between these groups. To account for this, we could split the data based upon each case’s class year.

The technique of splitting up the cases based upon a certain variable is called stratification, and any analysis done within a stratum (a subgroup) is said to be conditional on the variable used to split the data. For example, “the age distribution of first-year students” is synonymous with “the distribution of age, conditional upon class being first-year”.

One way to implement stratification is to subset your Minitab worksheet using Data -> Subset Worksheet. While we won’t do that here, it is a useful technique that you might need in the future. For our specific situation, we can avoid subsetting the worksheet by adding a layer to our two-way frequency table. This is sometimes called a stratified two-way table, or a three-way table.
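Outside of Minitab, the same idea can be sketched in a few lines of Python. The example below uses the class-year illustration from above; the cases are hypothetical and the function name `stratify` is an assumption of this sketch, not part of the lab.

```python
from collections import defaultdict

# Hypothetical cases, invented for illustration: (class_year, age).
cases = [
    ("first-year", 18), ("first-year", 19), ("first-year", 18),
    ("upperclass", 20), ("upperclass", 21), ("upperclass", 22),
]

def stratify(cases):
    """Split the cases into strata keyed by class year."""
    strata = defaultdict(list)
    for class_year, age in cases:
        strata[class_year].append(age)
    return dict(strata)

strata = stratify(cases)

# Any summary computed inside one stratum is conditional on class year.
for class_year, ages in strata.items():
    mean_age = sum(ages) / len(ages)
    print(f"{class_year}: ages {sorted(ages)}, mean age {mean_age:.2f}")
```

Each list in `strata` plays the role of one subsetted worksheet: the ages in `strata["first-year"]` are “the distribution of age, conditional upon class being first-year”.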

Question #6

Create a stratified two-way frequency table using the race of the offender as the row variable, the death penalty sentence as the column variable, and the race of the victim as the layer variable. Include the results in your lab write-up.

Question #7

For each stratum (there are two strata here, one stratum contains cases involving white victims, the other stratum contains cases involving black victims), find the conditional proportion of white offenders who received the death penalty and the conditional proportion of black offenders who received the death penalty. You should report four different conditional proportions for this question.

Question #8

Construct a bar chart displaying the four conditional proportions you calculated in Question #7. Include your output in your lab write-up. (Hint: Use VictimRace as the outermost variable and “Take Percent and/or Accumulate” within categories at “level 2”)

Question #9

Based upon what you saw in Questions #7 and #8, do you see any evidence of racial bias? Briefly explain your answer.

Confounding Variables

A confounding variable is a third variable that is associated with both the explanatory and response variables in an analysis. The presence of a confounding variable can obscure the relationship between the explanatory and response variables.
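To make the definition concrete, here is a minimal Python sketch using hypothetical counts from an invented medical setting (not the death penalty data). The explanatory variable is treatment (A or B), the response is recovery, and the third variable is illness severity. All counts and function names are assumptions made for illustration.

```python
# Hypothetical counts, invented for illustration only.
# (severity, treatment) -> (number recovered, number of cases)
data = {
    ("mild", "A"): (9, 10),
    ("mild", "B"): (80, 100),
    ("severe", "A"): (30, 100),
    ("severe", "B"): (2, 10),
}

def select(severity=None, treatment=None):
    """Return the (recovered, total) cells matching the given filters."""
    return [cell for (s, t), cell in data.items()
            if severity in (None, s) and treatment in (None, t)]

def rate(cells):
    """Recovery proportion pooled over the given cells."""
    return sum(r for r, _ in cells) / sum(n for _, n in cells)

def share_severe(treatment):
    """Proportion of a treatment group's cases that are severe."""
    severe_cases = data[("severe", treatment)][1]
    return severe_cases / sum(n for _, n in select(treatment=treatment))

# Third variable vs. explanatory variable: severity differs by treatment.
print("Share severe, A:", round(share_severe("A"), 3))
print("Share severe, B:", round(share_severe("B"), 3))

# Third variable vs. response variable: recovery differs by severity.
print("Recovery, mild:  ", round(rate(select(severity="mild")), 3))
print("Recovery, severe:", round(rate(select(severity="severe")), 3))

# Explanatory vs. response, ignoring and then conditioning on severity.
print("Recovery, A overall:", round(rate(select(treatment="A")), 3))
print("Recovery, B overall:", round(rate(select(treatment="B")), 3))
for s in ("mild", "severe"):
    for t in ("A", "B"):
        print(f"Recovery, {s}, {t}:", round(rate(select(s, t)), 3))
```

In these made-up counts, severity is associated with both treatment and recovery, so it fits the definition above. The overall comparison favors B, while the comparison within each severity level favors A, which is the kind of obscuring the definition warns about.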

Question #10

In the death penalty sentencing case study, race of the victim is a confounding variable. For this question, use the definition stated above to briefly explain why “VictimRace” qualifies as a confounding variable. Use statistical methods (such as conditional proportions) to justify your answer.

Neutralizing Confounding Variables

How to establish causality in the presence of confounding variables is a fundamental question in any field of scientific research. When thinking about what would be necessary to neutralize a confounding variable, we should consider the following proposition:

  • If we could somehow force the confounding variable to be unrelated to the explanatory variable, then it would no longer be a confounding variable (i.e., it wouldn’t fit the definition of confounding).
  • One way to achieve this is via stratification, as within each stratum there is no relationship between the explanatory variable and the variable defining the strata (because every case in a stratum has the same value of the confounder!).

Question #11

Suppose we conduct a study and gather data on 100 cases. In our analysis of these data, we identify 10 possible confounding variables. Briefly explain why stratification wouldn’t be a reasonable way of addressing the identified confounding variables in this situation.

Randomization

Stratification gives us a tool to break the link between a known confounder and our explanatory variable. But what if there are confounding variables that we didn’t collect data on?

A more reliable approach to neutralizing all confounding variables is to collect the data using a design known as a randomized experiment. In a randomized experiment, representative cases from a target population are first recruited, then the values of the explanatory variable are randomly assigned to these cases.
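The random assignment step can be sketched in Python as follows. This is one simple scheme (shuffle the cases, then deal them into groups), assuming hypothetical case IDs and group labels; the function name and the seed value are assumptions of this sketch, and real studies may use more elaborate procedures.

```python
import random

def randomly_assign(case_ids, groups=("treatment", "control"), seed=None):
    """Randomly assign each case to one of the groups, keeping group sizes balanced."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled cases round-robin so group sizes differ by at most one.
    return {case: groups[i % len(groups)] for i, case in enumerate(shuffled)}

# Ten hypothetical recruited cases, labeled 1 through 10.
assignment = randomly_assign(range(1, 11), seed=209)

for case in sorted(assignment):
    print(f"Case {case}: {assignment[case]}")
```

Because chance alone decides which value of the explanatory variable each case receives, no characteristic of the cases, measured or unmeasured, can systematically line up with the assignment.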

We should also be aware of the statistical meaning of the word experiment, since it differs slightly from how the word gets used in everyday conversation. In statistics, an experiment is a study design where the researchers directly manipulate or assign the explanatory variable. Random assignment is a popular way to conduct an experiment, but the explanatory variable could be assigned non-randomly.

The “opposite” of an experiment is an observational study. In this type of design, the researchers do not manipulate the explanatory variable; they merely observe it as is. We will discuss various types of observational studies in greater detail later in the course, but for now the important thing to know is that observational studies are inherently troubled by confounding.

Question #12

Returning to the death penalty sentencing case study, is this a randomized experiment or an observational study? Briefly explain your answer.

Question #13

Randomized experiments are the best way to establish causal association. That said, could a randomized experiment be used to investigate racial bias in death penalty sentencing? If so, briefly describe how you’d design the study. If not, briefly describe why a randomized experiment isn’t feasible for this research question.

Question #14

In your own words, summarize what you learned about confounding variables in this lab in 2-3 sentences.

Submission Directions

  • Double check that you’ve completed all of the lab’s questions, making sure that everyone in your group agrees with the answers you’ve provided. You will receive a single group score for the lab.
  • Make sure that everyone’s name is on the write-up.
  • Email your completed write-up to Professor Miller with a subject heading that includes the text “Sta-209-Lab2”. Please include this exact character string, including the dashes. You will lose 1 point off the top of your score if you don’t do so.
  • If you’d like to provide feedback on your group, fill out the optional review form at this link: https://forms.gle/wNWRFMbbra8oK4LJ8