Lab #3 - R Markdown and Contingency Tables


Onboarding

So far we’ve been relying upon R scripts to store our work; however, RStudio supports several other file types, including R Markdown, a document type that allows R code, that code’s output, and text written using the stylistic conventions of Markdown to coexist within the same document.

Recent installations of RStudio should come with R Markdown already available. You can check this by navigating to:

File -> New File -> R Markdown

If you do not see “R Markdown” displayed in this drop-down menu, you’ll need to install the rmarkdown package:

# install.packages("rmarkdown")
library(rmarkdown)
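
If you would rather check from the console than from the menu, a minimal sketch along these lines reports whether the package is already installed. The variable name has_rmarkdown is only illustrative; requireNamespace() is a base R function that returns TRUE or FALSE without attaching the package.

```r
# Check for the rmarkdown package without attaching it
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

if (has_rmarkdown) {
  message("rmarkdown is installed")
} else {
  # Run the install command yourself in the console,
  # not inside a document you plan to knit
  message('rmarkdown is missing; run install.packages("rmarkdown") in the console')
}
```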

Components of an .Rmd file

At the top of an R Markdown document is the header:

The header is opened by a line containing --- and closed by another line containing ---
Here you can provide title text, authors, and other information that will appear at the top of the document created by your R Markdown file

After the end of the header you’ll see a code chunk:

Code chunks are opened by ```{r} and closed by ```
The first code chunk in most documents is used to set up options for the remainder of the document. In fact, the text “setup” that you see in ```{r setup} gives this chunk the name “setup”. You should keep this chunk as it appears and use other code chunks to add your own code.
You can run the code in any code chunk using the green arrow in its upper right corner. The icon beside it (a grey triangle above a green bar) runs, in order, all code chunks above the current one.

After the setup code chunk you’ll see a section header:

Section headers are created using varying numbers of the # character, with the number determining the size of the header (fewer # characters give a larger header)

Following the section header you’ll see ordinary text:

Ordinary text uses Markdown conventions, and math typed between dollar signs is rendered as LaTeX, so the text `$H_0: \mu = 0$` will appear as $H_0: \mu = 0$ in your document.
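
To see the four components together, the sketch below writes a minimal .Rmd file to a temporary folder and prints it. The file name example.Rmd, the title, and the section name are placeholders chosen for this illustration; the setup chunk mirrors the one RStudio typically generates, which is an assumption about your template.

```r
# Three backticks open and close a code chunk; built here with strrep() only
# so that this example does not contain a literal chunk fence
fence <- strrep("`", 3)

# Lines of a minimal .Rmd file: header, setup chunk, section header, text, code chunk
rmd_lines <- c(
  "---",
  'title: "Lab 3"',
  'author: "Names of everyone in your group"',
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::opts_chunk$set(echo = TRUE)",
  fence,
  "",
  "## Question 1",
  "",
  "Ordinary text, including math such as $H_0: \\mu = 0$.",
  "",
  paste0(fence, "{r}"),
  'table(c("a", "b", "a"))',
  fence
)

# Write the file to a temporary folder (hypothetical file name)
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# Print the file contents to see the components in order
cat(readLines(rmd_path), sep = "\n")
```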


Knitting

The purpose of R Markdown is to seamlessly blend R code, output, and written text. This is accomplished by “knitting” your file into a completed report. You can knit a file using the “Knit” button (blue yarn ball icon), or (on Windows) by pressing Ctrl+Shift+K.

A few things to know about knitting:

When you knit an .Rmd file it begins with an empty environment, so the file might not knit if you’ve been testing your code out of order, or if your code depends upon things that you’ve since deleted while working.
Commands like install.packages() and View() cannot be used in the environment where the document is knit. You should comment-out or remove these commands before knitting to prevent errors.
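
One way to keep interactive-only commands from breaking a knit is to guard them rather than delete them, as in this sketch. It relies on the knitr.in.progress option, which knitr sets to TRUE while a document is being knit; the data frame toy is a made-up stand-in for your own data.

```r
# Made-up stand-in for a data set loaded earlier in the same document
toy <- data.frame(group = c("a", "b", "a"))

# TRUE only while knitr is knitting the document
knitting <- isTRUE(getOption("knitr.in.progress"))

# Only open the data viewer when working interactively and not knitting
if (interactive() && !knitting) {
  View(toy)
}
```

Because the data frame is created inside the document itself, this chunk also works from the empty environment that knitting starts with.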

Lab

As a reminder, you should work on the lab with your assigned partner(s) using the principles of pair programming. Everyone in your lab group is responsible for the contents of the lab, but you should only submit a single document (created using R Markdown).

Frequency Tables

The simplest way to create a table from a categorical variable in R is using the table() function. Below is an example of a one-way frequency table of the armed variable in the police data set:

## Load police data
police <- read.csv("https://remiller1450.github.io/data/Police.csv")

## One-way table of the `armed` variable
table(police$armed)

## 
##   armed unarmed 
##    7091     457

You should notice how we access the variable of interest using the $ operator that was introduced in Lab #1.
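
As a small illustration of the $ operator, the sketch below uses a made-up four-row data frame (toy, not the police data) to show that $ and double square brackets pull out the same column, which table() then counts.

```r
# Made-up data frame for illustration only (not the police data)
toy <- data.frame(armed = c("armed", "unarmed", "armed", "armed"))

# Both lines extract the same column as a vector
col_dollar <- toy$armed
col_brackets <- toy[["armed"]]
identical(col_dollar, col_brackets)

# One-way frequency table of the extracted column
toy_counts <- table(col_dollar)
toy_counts
```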

The table function can accept any number of variables in your data separated by commas. For example, we can create a two-way table of the armed and manner_of_death variables via the code below:

## Two-way table
table(police$armed, police$manner_of_death)

##          
##           shot shot and Tasered
##   armed   6805              286
##   unarmed  411               46

You should be aware that table() creates a unique type of object that can be stored and used as an input to other functions. For example, the code below stores the previous two-way table and transposes it using the t() function:

my_table = table(police$armed, police$manner_of_death)
t(my_table)

##                   
##                    armed unarmed
##   shot              6805     411
##   shot and Tasered   286      46
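
Because a table is an ordinary R object, it can be passed to functions other than t(). The sketch below rebuilds the two-way table from the counts printed above (so it runs without downloading the data) and uses addmargins() to append row and column totals. The dimension labels armed and manner_of_death are added here for readability and are an assumption of this example.

```r
# Rebuild the two-way table from the counts shown in the output above
counts <- matrix(
  c(6805, 286,
     411,  46),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    armed = c("armed", "unarmed"),
    manner_of_death = c("shot", "shot and Tasered")
  )
)
my_table <- as.table(counts)

# Append row and column totals (labelled "Sum")
with_totals <- addmargins(my_table)
with_totals
```

The row totals should match the one-way table of armed shown earlier (7091 and 457).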

Question #1: The data frame rtd, loaded below, contains data from an impaired driving experiment in which participants completed a baseline drive on a driving simulator, consumed cannabis, then completed three additional simulator drives while high. Participants were asked if they felt ready to drive (on real roads, not in the simulator) before each drive, and the responses are stored in the variables “RT1” (before the baseline) through “RT4” (before the fourth/final drive). They were also asked if they’d ever driven within two hours of using cannabis (the variable “Ever”), and if they thought they were a better driver after using cannabis (the variable “Better”).

rtd <- read.csv("https://remiller1450.github.io/data/Ready.csv")

Part A: Use a frequency table to find the number of drivers who felt ready to drive at time-point #2, which occurred roughly 30 minutes after cannabis dosing.
Part B: Create a two-way frequency table showing the relationship between whether a subject has ever driven within two hours of cannabis use and whether a subject was ready to drive at time-point #2. Display the values of the readiness to drive at time-point #2 variable using the table’s columns.


Conditional Proportions

The prop.table() function accepts a table as its input and will convert the table’s frequencies to proportions:

my_table = table(police$armed, police$manner_of_death)
prop.table(my_table)

##          
##                 shot shot and Tasered
##   armed   0.90156333       0.03789083
##   unarmed 0.05445151       0.00609433

By default, the function calculates proportions out of the total number of cases contained in the table. However, the function can calculate conditional proportions if the margin argument is used:

## Margin = 1 gives row proportions
prop.table(my_table, margin = 1)

##          
##                 shot shot and Tasered
##   armed   0.95966718       0.04033282
##   unarmed 0.89934354       0.10065646

## Margin = 2 gives column proportions
prop.table(my_table, margin = 2)

##          
##                 shot shot and Tasered
##   armed   0.94304324       0.86144578
##   unarmed 0.05695676       0.13855422
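
A quick sanity check on the margin argument is that row proportions should sum to 1 within each row, and column proportions within each column. The sketch below rebuilds the table from the counts printed earlier and also reproduces margin = 1 by hand; the object names row_props, col_props, and by_hand are illustrative.

```r
# Rebuild the example table from the counts printed earlier
my_table <- as.table(matrix(
  c(6805, 286,
     411,  46),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("armed", "unarmed"), c("shot", "shot and Tasered"))
))

row_props <- prop.table(my_table, margin = 1)
col_props <- prop.table(my_table, margin = 2)

# Each row of row_props, and each column of col_props, sums to 1
rowSums(row_props)
colSums(col_props)

# margin = 1 is the same as dividing each row by its row total
by_hand <- my_table / rowSums(my_table)
all.equal(as.vector(by_hand), as.vector(row_props))
```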

Question #2: For this question you should use the rtd data set introduced in Question #1. Begin by creating a two-way frequency table where the rows represent values of the variable Ever and the columns represent values of the variable RT2, then use this table (and other R commands as necessary) to answer the following:

Part A: Among subjects who reported being ready to drive at time-point #2, what proportion have previously driven within two hours of using cannabis?
Part B: Among subjects who have never driven within two hours of using cannabis, what proportion reported being ready to drive at time-point #2?
Part C: Of the two variables used in this table, which should be viewed as the explanatory variable and which should be viewed as the response variable? Briefly explain.


Relative Risks and Odds

Table objects store their values in indexed positions similar to how values are stored in a data frame. Thus, we could access the value in row 1, column 1 of our example table using the following code:

my_table = table(police$armed, police$manner_of_death)
my_table[1,1]

## [1] 6805
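
Positions can also be given by row and column names, which is often easier to read than numeric indices. This sketch again rebuilds the example table from the counts printed earlier so that it runs on its own; unarmed_row is an illustrative name.

```r
# Rebuild the example table from the counts printed earlier
my_table <- as.table(matrix(
  c(6805, 286,
     411,  46),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("armed", "unarmed"), c("shot", "shot and Tasered"))
))

# Same cell as my_table[1, 1], selected by name
my_table["armed", "shot"]

# Leaving an index blank returns an entire row (or column)
unarmed_row <- my_table["unarmed", ]
unarmed_row
```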

We will later learn about some built-in functions that calculate measures like odds ratios (and a whole lot of other output), but for now we can use index positions to calculate descriptive statistics like the risk difference, relative risk, odds, and odds ratio ourselves.
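
For reference, these measures can be written in terms of a two-way table whose first row has counts $a$ and $b$ and whose second row has counts $c$ and $d$, with the outcome of interest in the first column. This notation is introduced here for illustration and is not part of the lab's own materials.

$$
p_1 = \frac{a}{a+b}, \qquad p_2 = \frac{c}{c+d}
$$

$$
\text{Risk difference} = p_1 - p_2, \qquad \text{Relative risk} = \frac{p_1}{p_2}
$$

$$
\text{Odds (row 1)} = \frac{p_1}{1-p_1} = \frac{a}{b}, \qquad \text{Odds ratio} = \frac{a/b}{c/d} = \frac{ad}{bc}
$$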

Below are a few examples:

# Two-way frequency table
my_table = table(police$armed, police$manner_of_death)

# Row props
my_row_props = prop.table(my_table, margin = 1)

## Risk difference of being shot w/o first being tased for armed vs. unarmed
my_row_props[1,1] - my_row_props[2,1]

## Relative risk of being shot w/o first being tased for armed vs. unarmed 
my_row_props[1,1]/my_row_props[2,1]

## Odds of being shot w/o first being tased for armed individuals
my_table[1,1]/my_table[1,2]

## Odds ratio of being shot w/o first being tased for armed vs. unarmed
(my_table[1,1]*my_table[2,2])/(my_table[1,2]*my_table[2,1])

## [1] 0.06032364
## [1] 1.067075
## [1] 23.79371
## [1] 2.663043
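
If you find yourself repeating these calculations, they can be wrapped in a small helper. The function below, risk_summary(), is a hypothetical helper written for this sketch (not a built-in R function); it assumes a 2x2 table with the groups in rows and the outcome of interest in the first column.

```r
# Hypothetical helper: risk difference, relative risk, and odds ratio
# for a 2x2 table (row 1 vs. row 2, outcome of interest in column 1)
risk_summary <- function(tab) {
  stopifnot(identical(dim(tab), c(2L, 2L)))
  row_props <- prop.table(tab, margin = 1)
  c(
    risk_difference = row_props[1, 1] - row_props[2, 1],
    relative_risk   = row_props[1, 1] / row_props[2, 1],
    odds_ratio      = (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
  )
}

# Rebuild the example table from the counts printed earlier
my_table <- as.table(matrix(
  c(6805, 286,
     411,  46),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("armed", "unarmed"), c("shot", "shot and Tasered"))
))

results <- risk_summary(my_table)
results
```

Applied to the police table, the three values should agree with the corresponding lines of output shown above.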

Question #3: Similar to Questions #1 and #2, begin by creating a two-way frequency table from the rtd data where the rows represent values of the variable Ever and the columns represent values of the variable RT2, then use this table (and other R commands as necessary) to answer the following:

Part A: Find the risk difference describing the difference in risk of feeling ready to drive 30 minutes after consuming cannabis for those who have driven within 2 hours of cannabis use relative to those who have not.
Part B: In your own words, use a single, complete sentence to report the risk difference you found in Part A as if you were writing up the results of a scientific study.
Part C: What is the relative risk of being ready to drive for individuals who have driven within 2 hours of cannabis use relative to those who have not?
Part D: In your own words, use a single, complete sentence to report the relative risk you found in Part C as if you were writing up the results of a scientific study.
Part E: What is the odds ratio of being ready to drive for individuals who have driven within 2 hours of cannabis use relative to those who have not?
Part F: In your own words, use a single, complete sentence to report the odds ratio you found in Part E as if you were writing up the results of a scientific study.
Part G: Which of these measures of risk would you be most likely to report for this study if you could only choose one? Briefly explain. (Note: there may not be a single right or wrong answer to this question.)


Submission Directions

On P-Web, have one person submit a compiled (i.e., knitted) document created using R Markdown. You are welcome to knit to HTML, PDF (requires LaTeX or using the RStudio server), or Word (requires that you have MS Word installed).
- Make sure everyone’s names are somewhere in the document
If you knit to HTML, you will need to compress your file into a zipped folder before P-Web will let you upload it. On Windows, you can do this with the following steps:

Right click on the HTML file -> Send to -> Compressed (zipped) folder