\(~\)
This lab contains no onboarding section; you should read through the sections below and work through them with your partners.
The chisq.test() function is used to perform Chi-squared Goodness of Fit Tests. The function expects the sample data to be provided as a frequency table, and the null hypothesis is specified using the argument p. In most circumstances, we’ll use the table() function to create the table used as the primary input to chisq.test().
## Create the table (w/ some made up data)
x = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C")
my_table = table(x)
## Use the table as an input to chisq.test()
chisq.test(x = my_table, p = c(1/3, 1/3, 1/3))
##
## Chi-squared test for given probabilities
##
## data: my_table
## X-squared = 1.5, df = 2, p-value = 0.4724
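To see where this result comes from, the sketch below recomputes the test statistic and p-value by hand using the same made-up data. The p-value is the upper tail of a Chi-squared distribution with (number of categories - 1) degrees of freedom:

```r
## Reproduce the Chi-squared statistic by hand (same made-up data as above)
x = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C")
observed = table(x)                            ## counts: A = 5, B = 2, C = 5
expected = sum(observed)*c(1/3, 1/3, 1/3)      ## 4 expected in each category
X2 = sum((observed - expected)^2/expected)
X2                                             ## 1.5, matching chisq.test()
pchisq(X2, df = length(observed) - 1, lower.tail = FALSE)  ## 0.4724
```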
However, in some circumstances we might need to enter the table ourselves as a data.frame object. This is demonstrated below for our AP Exam example:
## Enter the data ourselves as a data frame
ap_exam_data = data.frame(A = 85, B = 90, C = 79, D = 78, E = 68)
## Perform the test
chisq.test(x = ap_exam_data, p = c(0.2, 0.2, 0.2, 0.2, 0.2))
##
## Chi-squared test for given probabilities
##
## data: ap_exam_data
## X-squared = 3.425, df = 4, p-value = 0.4894
In addition to reporting the \(p\)-value, the conclusion we draw from a hypothesis test should involve any directional relationships that are identified. For a Chi-squared test, this can be accomplished by comparing the observed and expected frequencies to identify where the largest discrepancies are:
## Store the test results
test_results = chisq.test(x = ap_exam_data, p = c(0.2, 0.2, 0.2, 0.2, 0.2))
## Table of expected counts
test_results$expected
## [1] 80 80 80 80 80
Notice that the category with the largest deviation from its expected count was the answer choice “E”. This was easy to see in our example because every category had the same expected frequency, but a more general approach is to look at standardized residuals: \[\text{stdres} = \frac{\text{Observed Frequency} - \text{Expected Frequency}}{SE}\]
Here SE is the standard error of the table cell in question, which is estimated using the expected count adjusted by a scaling factor.
Below are the standardized residuals for our AP Exam example:
## Table of standardized residuals
test_results$stdres
## [1] 0.625 1.250 -0.125 -0.250 -1.500
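These values can be reproduced by hand. For a goodness of fit test, the SE of a cell with null proportion \(p\) works out to \(\sqrt{n p (1-p)}\):

```r
## Verify the standardized residuals by hand (same AP Exam data as above)
observed = c(A = 85, B = 90, C = 79, D = 78, E = 68)
p = rep(0.2, 5)
n = sum(observed)       ## 400 exams in total
expected = n*p          ## 80 expected in each category
## SE of each cell is sqrt(n*p*(1-p)), here sqrt(400*0.2*0.8) = 8
(observed - expected)/sqrt(n*p*(1 - p))
##      A      B      C      D      E
##  0.625  1.250 -0.125 -0.250 -1.500
```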
Using these results, we can see that the largest discrepancy belonged to the “E” category, which was only 1.5 standard deviations below its expected count. The largest positive deviation was “B”, at only 1.25 standard deviations above its expected count.
Additionally, you should notice that none of these standardized residuals exceeds 2, which is consistent with the Chi-squared test failing to reject the null hypothesis.
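The cutoff of 2 used here comes from the Normal distribution: under the null hypothesis, each standardized residual behaves approximately like a standard Normal variable, so values beyond \(\pm 2\) should be rare:

```r
## Under the null, a standardized residual is approximately N(0,1), so
## a value beyond +/- 2 occurs only about 5% of the time by chance
2*pnorm(-2)
## [1] 0.04550026
```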
Question #1: In court cases jurors are selected from a pool of eligible adults that is supposed to be randomly chosen from the local community. The American Civil Liberties Union (ACLU) has studied the racial composition of jury pools in Alameda County, California, and shown below are the racial/ethnic composition of \(n=1453\) individuals included in these jury pools along with the distribution of eligible jurors (according to US Census data for Alameda County):
Race/Ethnicity | Number in Jury Pools | US Census Percentage |
---|---|---|
Non-Hispanic White | 780 | 54% |
Black | 117 | 18% |
Hispanic | 114 | 12% |
Asian | 384 | 15% |
Other | 58 | 1% |
Total | 1453 | 100% |

Use these data to conduct a Chi-squared goodness of fit test assessing whether the racial/ethnic composition of the jury pools differs from what would be expected if jurors were randomly selected from the eligible population.
Question #2: As part of a 2009 study, researchers collected data on the moves of 119 novice players in the game rock-paper-scissors against a computer opponent. The data below record each player’s first and second moves:
rps = read.csv("https://remiller1450.github.io/data/rock_paper_scissors.csv")
\(~\)
Chi-squared tests of independence are also performed using the chisq.test() function. Here, we must provide a two-way frequency table, and we do not use the p argument, as the null proportions for this test are derived from the data.
The example below evaluates whether the first and second moves of novice rock-paper-scissors players are independent using the data from Question #2:
## Create the required two-way frequency table
rps_table = table(rps$first_move, rps$second_move)
## Chi-squared test using the two-way table
chisq.test(rps_table)
## Warning in chisq.test(rps_table): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: rps_table
## X-squared = 6.784, df = 4, p-value = 0.1478
We will revisit this warning later. For now, know that the Chi-squared distribution only serves as a reasonable probability model when there are sufficiently large counts in each cell of the two-way frequency table (expected counts of at least 5 is a common guideline).
Similar to goodness of fit tests, we can follow up on our test results by looking at standardized residuals:
## Store test results
rps_test_results = chisq.test(rps_table)
## Standardized Residuals
rps_test_results$stdres
##
##                  Paper       Rock   Scissors
##   Paper      0.7754986 -1.2571960  0.5169031
##   Rock      -0.3258697  1.9778030 -1.7726677
##   Scissors  -0.6271109 -1.2193823  1.9814473
Notice that all of these residuals reflect deviations that are less than two standard deviations above/below what is expected. Nevertheless, this table does suggest that more players than expected are selecting the same choice for their second move as they had already used for their first move. For example, the number of players who chose rock on their first and second move is 1.98 standard deviations higher than what would be expected under independence.
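For two-way tables, the SE in the standardized residual formula also accounts for the row and column margins. The sketch below (using made-up counts, not the rps data) reproduces what chisq.test() reports as stdres:

```r
## Standardized residuals for a two-way table (made-up counts)
tbl = matrix(c(30, 10, 15, 45), nrow = 2)
n = sum(tbl)
expected = outer(rowSums(tbl), colSums(tbl))/n
## SE of a cell: sqrt(E*(1 - row total/n)*(1 - column total/n))
SE = sqrt(expected*outer(1 - rowSums(tbl)/n, 1 - colSums(tbl)/n))
(tbl - expected)/SE      ## identical to chisq.test(tbl)$stdres
```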
Question #3: We’ve previously worked with the “TSA Claims” data set, which contained all claims made against the Transportation Security Administration between 2003 and 2008. For this question you will analyze a random sample of \(n=5000\) claims from this time period.
tsa_sample = read.csv("https://remiller1450.github.io/data/tsa_small.csv")
Consider the variables Month (the month when the claim occurred) and Status (whether a claim was approved, denied, or settled). If these variables are independent, what proportion of claims would you expect to be approved, denied, and settled in each month? That is, find the distribution of Status under the null hypothesis of independence.
\(~\)
Standardized residuals are most useful when the variables involved in a Chi-squared test each contain a modest number of categories. For two-way frequency tables with a large number of cells, it can become difficult to attribute a significant Chi-squared test result to any small set of individual cells, so it may make sense to describe the strength of the relationship holistically.
Cramer’s V is a popular measure of the association between two nominal categorical variables. It takes on values between 0 (independence) and +1 (complete dependence), allowing it to be interpreted similarly to Pearson’s correlation coefficient.
\[V = \sqrt{\frac{X^2/n}{\text{min}(r-1, c-1)}}\]
Here \(X^2\) is the Chi-squared test statistic, \(n\) is the sample size, and \(r\) and \(c\) are the numbers of rows and columns in the two-way frequency table.
There are a few R packages that will calculate Cramer’s V (rcompanion and lsr), but it’s easy enough to do ourselves. The code below calculates Cramer’s V for our rock-paper-scissors example:
## Table and Test results
rps_table = table(rps$first_move, rps$second_move)
rps_test_results = chisq.test(rps_table)
## Calculate Cramer's V
Cramers_V = as.numeric(sqrt(rps_test_results$statistic/nrow(rps)/
                              min(nrow(rps_table)-1, ncol(rps_table)-1)))
## Print the result
Cramers_V
## [1] 0.1688317
Here Cramer’s V is relatively close to zero, which is unsurprising as our Chi-squared test was not statistically significant.
Moving beyond this application, it is worth noting that Cramer’s V is useful because its standardized scale allows us to compare the strength of association across many different pairings of categorical variables.
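Since such comparisons involve computing Cramer’s V for several tables, it can be convenient to wrap the calculation in a function. Below is one possible sketch (the function name cramers_v is our own); suppressWarnings() is used because small tables can trigger the expected-count warning discussed later in this lab:

```r
## A reusable function for Cramer's V
cramers_v = function(tbl) {
  X2 = suppressWarnings(chisq.test(tbl)$statistic)
  as.numeric(sqrt((X2/sum(tbl))/min(nrow(tbl) - 1, ncol(tbl) - 1)))
}

## Sanity checks on made-up tables
cramers_v(diag(10, 3))                          ## perfect dependence gives 1
cramers_v(matrix(c(10, 20, 10, 20), nrow = 2))  ## exact independence gives 0
```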
Question #4: Using the tsa_sample data described in Question #3, use Cramer’s V to determine whether there is a stronger association between the variables Month and Status or between Claim_Type and Status.
\(~\)
In the test for independence performed on the rock-paper-scissors data, we saw a red warning message when using chisq.test() due to the small expected counts in some cells of the two-way frequency table. This occurs because the Chi-squared test relies on the assumption that, for a sufficiently large sample, the frequencies in each cell of the two-way frequency table will be approximately Normally distributed. A common “rule of thumb” states that this assumption is reasonable when every cell in the table of expected counts exceeds 5.
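We can check this rule of thumb directly by inspecting the table of expected counts, reusing the rock-paper-scissors data from earlier:

```r
## Inspect the expected counts behind the warning (rps data from earlier)
rps = read.csv("https://remiller1450.github.io/data/rock_paper_scissors.csv")
rps_table = table(rps$first_move, rps$second_move)
expected_counts = suppressWarnings(chisq.test(rps_table)$expected)
expected_counts
any(expected_counts < 5)   ## TRUE when at least one cell violates the guideline
```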
If some cells of the table of expected counts fall below 5, Fisher’s Exact Test provides an exact testing approach that doesn’t rely upon a Normality assumption. The test works by considering all possible two-way frequency tables with row and column totals fixed at their observed values. As you might expect, this becomes computationally expensive for large tables, and the assumption of fixed row and column totals makes the test conservative (less powerful) in most settings. If you’re interested, I encourage you to read about Fisher’s “lady tasting tea” experiment.
Fisher’s Exact Test is performed using the fisher.test() function, which behaves similarly to chisq.test():
## Fisher's exact test on our rock-paper-scissors experiment
fisher.test(rps_table)
##
## Fisher's Exact Test for Count Data
##
## data: rps_table
## p-value = 0.166
## alternative hypothesis: two.sided
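To build intuition for what fisher.test() is doing, the sketch below uses a small made-up 2x2 table (not the rps data) and checks the two-sided p-value against the hypergeometric distribution, which describes the first cell of the table once the margins are held fixed:

```r
## Fisher's exact p-value from first principles (made-up 2x2 table)
tbl = matrix(c(8, 2, 1, 5), nrow = 2)  ## margins: rows 9 and 7, columns 10 and 6
fisher.test(tbl)$p.value
## With the margins fixed, the [1,1] cell is hypergeometric; the two-sided
## p-value sums the probabilities of all tables no more probable than the
## one we observed
x = 3:9                                ## possible values of the [1,1] cell
probs = dhyper(x, m = 10, n = 6, k = 9)
sum(probs[probs <= dhyper(8, m = 10, n = 6, k = 9)])   ## same p-value
```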
A few things to note:

- For tables larger than 2x2, the argument simulate.p.value = TRUE can be used in settings where the full test requires too many computational resources.

Question #5: The data below come from an occupational health study that asked individuals working in a biology lab to self-report how frequently they wore their lab gloves outside of the lab. These individuals were grouped by educational attainment, and the researchers were interested in whether the likelihood of wearing lab gloves outside of the lab environment is independent of educational attainment.
occ_health = read.csv("https://remiller1450.github.io/data/occ_health.csv")