This lab provides a brief overview of the R functions used to perform statistical tests that are frequently discussed in an introductory statistics course.

If you are unfamiliar with hypothesis testing, I encourage you to look over my course notes on the topic.

Also note that this lab is entirely optional. Its format is closer to that of a reference guide, and all questions appear at the end of the document; however, they’ll require you to be familiar with the earlier sections.

\(~\)

One-sample Tests

One-sample tests are used to decide whether a summary statistic from your sample data is statistically different from a hypothesized value.

For categorical data, a common null hypothesis is \(H_0: p = p_0\), where \(p_0\) is a hypothesized proportion for the categorical outcome of interest. This hypothesis can be evaluated using a one-sample Z-test:

acs <- read.csv("https://remiller1450.github.io/data/EmployedACS.csv")  ## Random sample of 1287 employed individuals from the American Community Survey

n_male <- sum(acs$Sex == 1)  ## Number of males among respondents
n_total <- nrow(acs)         ## Total sample size
prop.test(x = n_male, n = n_total, p = 0.5, alternative = "two.sided")
## 
##  1-sample proportions test with continuity correction
## 
## data:  n_male out of n_total, null probability 0.5
## X-squared = 3.1826, df = 1, p-value = 0.07443
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4975476 0.5528050
## sample estimates:
##         p 
## 0.5252525

or an exact binomial test:

binom.test(x = n_male, n = n_total, p = 0.5, alternative = "two.sided")
## 
##  Exact binomial test
## 
## data:  n_male and n_total
## number of successes = 676, number of trials = 1287, p-value = 0.07439
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4975508 0.5528386
## sample estimates:
## probability of success 
##              0.5252525

For either test, you must provide the numerator and denominator of the sample proportion (i.e., the count of males and the total sample size), as well as the hypothesized population proportion via the argument p.
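
By default, both functions perform a two-sided test; the alternative argument also accepts "less" or "greater" for one-sided hypotheses. For example, a test of \(H_a: p > 0.5\) could be specified as:

## One-sided version of the Z-test above
prop.test(x = n_male, n = n_total, p = 0.5, alternative = "greater")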

For quantitative data, the one-sample \(t\)-test should be used to assess the hypothesis: \(H_0: \mu = \mu_0\), where \(\mu_0\) is a hypothesized mean.

t.test(x = acs$Income, mu = 40, alternative = "two.sided")
## 
##  One Sample t-test
## 
## data:  acs$Income
## t = 2.9449, df = 1286, p-value = 0.003289
## alternative hypothesis: true mean is not equal to 40
## 95 percent confidence interval:
##  41.50877 47.53074
## sample estimates:
## mean of x 
##  44.51976

In this example, a vector containing the quantitative variable is given as the x argument, and the hypothesized mean is defined by the argument mu.
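
The t.test function accepts the same alternative argument, and the confidence level of the reported interval can be adjusted via conf.level. For example, a one-sided test of \(H_a: \mu > 40\) with a 99% confidence level:

## One-sided t-test with a 99% confidence level
t.test(x = acs$Income, mu = 40, alternative = "greater", conf.level = 0.99)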

\(~\)

Two-sample Tests

Two-sample tests are used to assess whether two sample groups differ by more than would be expected by chance. A common example is A/B testing, where experimental participants are randomly assigned to one of two conditions (A or B) and an outcome is recorded.

For categorical data, you might use a difference in proportions Z-test:

## First you'll need the numerator and denominator of each sample's proportion
ins_white <- sum(acs$HealthInsurance == 1 & acs$Race == "white")
n_white <- sum(acs$Race == "white")
ins_black <- sum(acs$HealthInsurance == 1 & acs$Race == "black")
n_black <- sum(acs$Race == "black")

## Z-test
prop.test(x = c(ins_white, ins_black), n = c(n_white, n_black))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(ins_white, ins_black) out of c(n_white, n_black)
## X-squared = 0.14491, df = 1, p-value = 0.7034
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.04383785  0.07295584
## sample estimates:
##    prop 1    prop 2 
## 0.9283521 0.9137931

Notice that the x argument is given a vector containing two values; in this example, they are the numbers of insured individuals in each group. Similarly, the n argument is given the total number of individuals belonging to each group (also as a two-element vector).
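
If you’d like to double-check which group each estimate corresponds to, you can compute the sample proportions directly:

ins_white / n_white   ## matches "prop 1" in the output above
ins_black / n_black   ## matches "prop 2"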

For quantitative data, you should use a two-sample \(t\)-test:

t.test(Income ~ Sex, data = acs)
## 
##  Welch Two Sample t-test
## 
## data:  Income by Sex
## t = -4.921, df = 1231.2, p-value = 9.776e-07
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -20.650616  -8.878211
## sample estimates:
## mean in group 0 mean in group 1 
##        36.76471        51.52913

The syntax relating “Income” and “Sex” in this example uses formula notation. Here, the formula Income ~ Sex indicates the quantitative outcome, “Income”, should be evaluated according to the two groups created by the variable “Sex”.

You might read the ~ symbol as “is predicted by”, so this entire formula can be read as “Income is predicted by Sex”.
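
If your two groups are stored as separate vectors rather than in one data frame, t.test also accepts them directly through its x and y arguments. A sketch equivalent to the formula call above:

## Equivalent two-sample t-test using two vectors instead of a formula
t.test(x = acs$Income[acs$Sex == 0], y = acs$Income[acs$Sex == 1])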

\(~\)

Chi-Squared Tests

The one-sample and two-sample testing procedures above assume all categorical variables are binary. However, it is often too simplistic to reduce a nominal categorical variable (many categories) to a binary variable (two categories).

In these circumstances, you might consider a Chi-squared Test (either goodness of fit or association) or Fisher’s Exact Test (association):

## Goodness of fit Chi-squared test
chisq.test(x = table(acs$Race), p = c(0.05, 0.15, 0.1, 0.7))
## 
##  Chi-squared test for given probabilities
## 
## data:  table(acs$Race)
## X-squared = 54.6, df = 3, p-value = 8.356e-12

## Chi-squared test of association
chisq.test(x = table(acs$Race, acs$HealthInsurance))
## 
##  Pearson's Chi-squared test
## 
## data:  table(acs$Race, acs$HealthInsurance)
## X-squared = 25.378, df = 3, p-value = 1.287e-05

## Fisher's exact test
fisher.test(x = table(acs$Race, acs$HealthInsurance))
## 
##  Fisher's Exact Test for Count Data
## 
## data:  table(acs$Race, acs$HealthInsurance)
## p-value = 0.0001573
## alternative hypothesis: two.sided

For goodness-of-fit testing, the x argument should be a one-way table, while for tests of association it should be a two-way table. The argument p is only used for goodness-of-fit testing, where it gives the hypothesized proportion of each category.
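
One caution for goodness-of-fit testing: the elements of p must sum to 1, and they are matched to the categories of x in the order those categories appear in the table. Printing the table first is an easy way to confirm this order (in the example above, 0.05 corresponds to “asian”, 0.15 to “black”, and so on):

table(acs$Race)  ## categories appear alphabetically: asian, black, other, white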

\(~\)

ANOVA

Similarly, some studies involve comparing a numeric outcome across several groups. In these settings, you should use one-way ANOVA:

anova_mod <- aov(Income ~ Race, data = acs)
summary(anova_mod)
##               Df  Sum Sq Mean Sq F value   Pr(>F)    
## Race           3   56523   18841   6.291 0.000309 ***
## Residuals   1283 3842204    2995                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that aov is another hypothesis testing function that uses formula notation. It turns out that one-way ANOVA is equivalent to a pooled-variance two-sample \(t\)-test when there are only two groups.
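
You can verify this equivalence yourself. One wrinkle: t.test defaults to the Welch test, so var.equal = TRUE is needed to match ANOVA’s pooled-variance assumption (and since “Sex” is coded numerically in these data, it is wrapped in factor() for the ANOVA):

## Two-group comparison via ANOVA and via a pooled-variance t-test
summary(aov(Income ~ factor(Sex), data = acs))
t.test(Income ~ Sex, data = acs, var.equal = TRUE)  ## the F value above equals t^2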

\(~\)

Post-hoc Testing

A statistically significant ANOVA test indicates that at least one pair of groups differs by more than could be expected by random chance.

This finding should be followed up by post-hoc testing to determine which groups are different from each other. One method for this is Tukey’s Honest Significant Differences:

TukeyHSD(anova_mod)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Income ~ Race, data = acs)
## 
## $Race
##                   diff        lwr        upr     p adj
## black-asian -26.750588 -46.403386 -7.0977905 0.0026929
## other-asian -30.045124 -50.285682 -9.8045650 0.0008099
## white-asian -15.697207 -31.049147 -0.3452658 0.0428292
## other-black  -3.294535 -22.402516 15.8134456 0.9708621
## white-black  11.053382  -2.771118 24.8778819 0.1680528
## white-other  14.347917  -0.300105 28.9959390 0.0573859

The input to TukeyHSD is an object containing the results of our ANOVA test. We see that the function provides pairwise confidence intervals and p-values for all combinations of groups. These are adjusted for multiple comparisons (to keep the family-wise Type I error rate at \(\alpha = 0.05\)).
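
TukeyHSD objects also have a plot method, which displays the pairwise confidence intervals graphically and makes it easy to spot intervals that exclude zero:

plot(TukeyHSD(anova_mod))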

\(~\)

Correlation

The final combination of variable types yet to be considered in this tutorial is two quantitative variables.

In this situation, the correlation coefficient can form the basis of a hypothesis test of \(H_0: \rho = 0\), or no correlation between the variables being studied:

cor.test(x = acs$HoursWk, y = acs$Income, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  acs$HoursWk and acs$Income
## t = 12.893, df = 1285, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2891509 0.3859513
## sample estimates:
##       cor 
## 0.3384462

Note that cor.test expects two vectors, x and y, of the same length. The method argument can be changed to calculate other types of correlation (e.g., Spearman’s rank correlation).
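
For example, the same test using Spearman’s rank correlation (ties in the data may produce a warning that an exact p-value cannot be computed):

cor.test(x = acs$HoursWk, y = acs$Income, method = "spearman")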

\(~\)

Closing Remarks

This tutorial is intended to provide a quick reference to several functions used to perform common statistical tests.

It does not cover:

  • the assumptions of the aforementioned hypothesis tests. You should always check that your data are appropriate for the statistical test you are using (a brief example appears after this list).
  • the interpretation of hypothesis testing results, effect size, or confidence intervals. All of these topics are critical in providing an accurate assessment of the information your data provide.
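
As a brief illustration of the first point, base R graphics offer a quick, informal check of the normality assumption behind the \(t\)-based procedures shown earlier:

## Informal check of normality for the Income variable
hist(acs$Income)     ## look for strong skew or extreme outliers
qqnorm(acs$Income)   ## points should roughly follow a straight line...
qqline(acs$Income)   ## ...which this function overlays for reference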

\(~\)

Practice

For each scenario, write out a reasonable null hypothesis and evaluate it using the proper statistical test.

Question #1: The “infant heart” data set documents the results of an experiment investigating two developmental indices, PDI and MDI, after random assignment to one of two surgical approaches, low-flow bypass and circulatory arrest.

ih <- read.csv("https://remiller1450.github.io/data/InfantHeart.csv")

Part A: Use a statistical test to determine if there is sufficient statistical evidence to conclude that one of the two surgeries yields significantly greater PDI outcomes (indicating better physical development).

Part B: Use a statistical test to determine if there is sufficient statistical evidence to conclude that an infant’s PDI and MDI scores are related to each other.

Part C: Use a statistical test to determine if there is sufficient statistical evidence that a larger share of male infants was assigned to the low-flow bypass group than to the circulatory arrest group.

\(~\)

Question #2: The “commute tracker” data set is a sample of daily commutes tracked by a GPS app for a worker in the Greater Toronto Area.

ct <- read.csv("https://remiller1450.github.io/data/CommuteTracker.csv")

Part A: Use a statistical test to determine if there is sufficient statistical evidence to conclude that the commuter is more likely to take Hwy 407 on certain days of the week.

Part B: Use a statistical test to determine if there is sufficient statistical evidence to conclude that the average value of “MaxSpeed” differs by month. If it does, decide which months are statistically different from each other.

Part C: Use a statistical test to determine if there is sufficient statistical evidence to conclude that the commuter is more likely to not record a commute whose destination is “Home” (as opposed to one whose destination is “GSK”, the commuter’s place of employment). Hint: If each trip is equally likely to be missing, you’d expect half of the recorded commutes to be going “Home”.