\(~\)
Our previous lab introduced key concepts in hypothesis testing using simulations to approximate the null distribution of outcomes that might have been observed had the null hypothesis been true. You’ll recall that the observed outcome is compared against the null distribution to determine the \(p\)-value, which is a measure of how much evidence the data provide against the null hypothesis.
In this lab you’ll use probability models and standardized procedures rather than simulations to find the null distribution and \(p\)-value. To begin, we’ll revisit the topic of standardization, or transforming data onto a common scale by adjusting the mean and standard deviation.
When studying the correlation coefficient, we introduced the concept of Z-scores. For example, in Pearson’s height data, sons had an average height of \(\overline{x} = 63.3\) inches and a standard deviation of heights of \(s = 2.8\). Thus, the Z-score for a son with a height of 68.7 inches is given by:
\[Z = \frac{68.7 - 63.3}{2.8} = 1.9\]
We used this to conclude that a son who is 68.7 inches is 1.9 standard deviations above average.
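The same arithmetic can be reproduced in R. This is only a minimal sketch: the variable names are illustrative and do not come from any data set used in the lab, and the numbers are copied from the example above.

```r
## Z-score for a son who is 68.7 inches tall (values copied from the text)
x_bar = 63.3    # average height of sons
s = 2.8         # standard deviation of sons' heights
height = 68.7   # height of the son of interest

## Number of standard deviations above (or below) the average
z = (height - x_bar) / s
round(z, 1)
```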
This same idea applies to descriptive statistics such as sample means, proportions, and many others. Suppose we take a sample of \(n=5\) cases and obtain a sample mean of \(\overline{x} = 5\) and \(s = 3\). If we assume the population’s mean is \(\mu = 2\) we can calculate the following Z-score:
\[Z = \frac{5 - 2}{\frac{3}{\sqrt{5}}} = 2.24\]
Thus we can conclude that this sample’s mean is 2.24 standard errors above the hypothesized mean. You might recall that we use the term “standard error” to describe the standard deviation of a descriptive statistic across different samples; see our notes on confidence intervals and the corresponding lab for details.
Going one step further, if the conditions are right for \(Z\) to follow a known probability distribution we can use that distribution to calculate the probability of observing a sample mean at least as extreme as \(\overline{x} = 5\) under the assumption we made of the population’s mean being \(\mu = 2\). Since we’re working with a single mean and using an estimate of the population’s standard deviation, the \(t\)-distribution is appropriate:
Thus, the probability of observing a sample mean at least as extreme as 2.24 standard errors from the hypothesized mean of 2, in either direction, is 0.089 (shaded in blue). By definition, this is the \(p\)-value.
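The calculation behind this \(p\)-value can be sketched with pt(), the cumulative distribution function of the \(t\)-distribution in R. The sketch assumes \(n - 1 = 4\) degrees of freedom and counts both tails of the distribution; the variable names are illustrative.

```r
## Summary statistics from the example above
n = 5       # sample size
x_bar = 5   # sample mean
s = 3       # sample standard deviation
mu_0 = 2    # hypothesized population mean

## Standardize: (observed statistic - hypothesized value) / standard error
se = s / sqrt(n)
t_stat = (x_bar - mu_0) / se

## Two-sided p-value: area beyond |t_stat| in both tails, with n - 1 df
p_value = 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

round(t_stat, 2)
round(p_value, 3)
```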
To conclude, we can use this example to establish a standardized hypothesis testing procedure based upon a test statistic of the form: \[\text{Test Statistic} = \frac{\text{observed statistic} - \text{hypothesized value}}{\text{standard error}}\]
The standard error can be found via the Central Limit theorem (or computational methods) and the distribution of the test statistic can be used to determine the \(p\)-value.
\(~\)
Throughout this lab you’ll use the “Infant Heart” data set, which comes from an experiment performed by researchers at Harvard Medical School who randomly assigned infants born with a congenital heart defect to one of two surgical approaches: low-flow bypass or circulatory arrest. The researchers followed each infant for two years, with the child’s MDI (mental development index) and PDI (psychomotor development index) scores at 2-years being the study’s primary outcomes.
## Libraries
library(dplyr)
library(ggplot2)
## Data used in the lab
infants = read.csv("https://remiller1450.github.io/data/InfantHeart.csv")
\(~\)
Recall that statistical tests involving a single proportion use hypotheses of the form: \[H_0: p = \text{hypothesized value} \\ H_a: p \ne \text{hypothesized value}\]
As an example, we can hypothesize that congenital heart defects are equally likely for male and female infants, which would be a null hypothesis of \(H_0: p = 0.5\) (suggesting 50% of babies born with congenital heart defects are male).
Using the table() function we can see that our sample is mostly male:
table(infants$Sex)
##
## Female Male
## 44 99
A hypothesis test will evaluate whether this could have happened due to chance (sampling variability), or if there’s statistical evidence that male babies are more likely to be born with congenital heart defects.
To perform the test “by hand” we’d need all of the components of the test statistic formula: \[\text{Test Statistic} = \frac{\text{observed statistic} - \text{hypothesized value}}{\text{standard error}} = \frac{99/143 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{143}}}=4.6\]
Note: The standard error (the denominator of the test statistic) arises from the Central Limit theorem. See our previous lab on confidence intervals for a table of various standard error formulas.
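The test statistic above can be reproduced from the counts in the table. This is a sketch that hard-codes the counts (99 males out of 143 infants) rather than reading them from the infants data; the variable names are illustrative.

```r
## Counts taken from table(infants$Sex) above
x = 99      # number of male infants
n = 143     # total number of infants
p_0 = 0.5   # hypothesized proportion under the null

## Observed proportion and the standard error under the null hypothesis
p_hat = x / n
se_null = sqrt(p_0 * (1 - p_0) / n)

## (observed statistic - hypothesized value) / standard error
z = (p_hat - p_0) / se_null
round(z, 2)
```

Note that the standard error is computed using the hypothesized proportion, not the observed one, because the null distribution describes what would happen if the null hypothesis were true.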
We can find the \(p\)-value by inputting our test statistic in pnorm(). In doing so we must be careful to set lower.tail = FALSE to ensure we calculate \(Pr(Z>4.6)\) and not \(Pr(Z<4.6)\), and we need to multiply this probability by 2 to account for the extreme outcomes on the other side of the distribution (which is symmetric).
## Finding the p-value "by hand"
2*pnorm(q=4.6, lower.tail = FALSE)
## [1] 4.224909e-06
We can compare our “by hand” result with the results from prop.test() (which also uses a Normal probability model) and binom.test() (which uses the binomial distribution, an exact probability model for proportions).
## Finding the p-value using prop.test
prop.test(x = 99, n = 143, p = 0.5, correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 99 out of 143, null probability 0.5
## X-squared = 21.154, df = 1, p-value = 4.238e-06
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.6124572 0.7620965
## sample estimates:
## p
## 0.6923077
## Finding the p-value using binom.test
binom.test(x = 99, n = 143, p = 0.5)
##
## Exact binomial test
##
## data: 99 and 143
## number of successes = 99, number of trials = 143, p-value = 4.887e-06
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.6097308 0.7667202
## sample estimates:
## probability of success
## 0.6923077
Regardless of the precise approach chosen, the \(p\)-value is on the order of 1e-6, or around 0.000001. This indicates overwhelming statistical evidence that males are more common among infants born with congenital heart defects.
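The three approaches are closely related, which the sketch below tries to make concrete. It assumes the counts from the table above (99 of 143). Without the continuity correction, the X-squared value reported by prop.test() is the square of the unrounded Z statistic, so the small gap between 4.22e-06 and 4.238e-06 comes from rounding the test statistic to 4.6. The doubling shortcut used for the exact binomial tail relies on the null distribution being symmetric, which holds here only because the hypothesized proportion is 0.5.

```r
## Counts and hypothesized value from the example above
x = 99; n = 143; p_0 = 0.5

## Unrounded Z statistic
z = (x / n - p_0) / sqrt(p_0 * (1 - p_0) / n)

## prop.test without continuity correction: X-squared equals z^2
res = prop.test(x = x, n = n, p = p_0, correct = FALSE)
c(z_squared = z^2, X_squared = unname(res$statistic))
c(by_hand = 2 * pnorm(abs(z), lower.tail = FALSE), prop_test = res$p.value)

## Exact binomial: Pr(X >= 99) doubled (valid because p_0 = 0.5 is symmetric)
exact = 2 * pbinom(x - 1, size = n, prob = p_0, lower.tail = FALSE)
c(by_hand = exact, binom_test = binom.test(x = x, n = n, p = p_0)$p.value)
```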
Question #1: Citing a paper in Nature, Google’s search AI describes a study where the implied proportion of male infants needing surgery for congenital heart defects is 0.62. In this question you will consider the null hypothesis \(H_0: p=0.62\), which suggests that 62% of infants needing surgery for congenital heart defects are male, and perform a test of this hypothesis using the infants data set.
Using the infants data set and the null hypothesis provided above, show the calculation of the test statistic for the hypothesis test that is described.
Use the pnorm() function to confirm the \(p\)-value you found in Part B.
Question #2: Repeat a similar hypothesis test to the one you performed in Question 1 using the binom.test() function. You only need to provide the code necessary to perform this test, though you should note a \(p\)-value that is similar to the one you found “by hand”.
\(~\)
Statistical tests involving a difference in proportions use hypotheses of the form: \[H_0: p_1 - p_2= \text{hypothesized value} \\ H_a: p_1-p_2 \ne \text{hypothesized value}\]
A hypothesized value of 0 is almost always used in this context.
Difference in proportions tests are conducted using prop.test(), a function we’ve previously used to find confidence interval estimates for a difference in proportions. To make use of the function, we must provide the following:
x - a vector containing the frequencies of the event of interest in each group
n - a vector containing the sample sizes of each group
An example is shown below:
## Using the NFL games data set
nfl = read.csv("https://remiller1450.github.io/data/nfl_sample.csv")
## We'll re-code "game outcome" as a binary variable
nfl$win_binary = ifelse(nfl$game_outcome == 1, "Win", "Loss or Tie")
## Table relating game type and game outcome
type_outcome_table = table(nfl$game_type, nfl$win_binary)
## Printing the table
type_outcome_table
##
## Loss or Tie Win
## Playoff 6 5
## Reg 88 101
In order to test for a difference in the win rate of home teams in regular season and playoff games, we should provide the vector (5,101) as the x argument and (11,189) as the n argument. This is accomplished by the code below:
## Extract the vectors needed
my_x = c(type_outcome_table[1,2], type_outcome_table[2,2])
my_n = rowSums(type_outcome_table)
## Use prop.test
prop.test(x = my_x, n = my_n)
##
## 2-sample test for equality of proportions with continuity correction
##
## data: my_x out of my_n
## X-squared = 0.042056, df = 1, p-value = 0.8375
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.4306698 0.2709776
## sample estimates:
## prop 1 prop 2
## 0.4545455 0.5343915
In this example, the \(p\)-value is 0.8375, which suggests that this sample provides insufficient evidence of a relationship between game type and the win rate of the home team.
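To connect prop.test() back to the general test statistic formula, the sketch below computes a two-proportion Z statistic "by hand" from the counts in the table. It assumes the usual pooled standard error for a hypothesized difference of zero, and it switches off the continuity correction (correct = FALSE) so that the two calculations agree. Because prop.test() applies the correction by default, these numbers will not match the output shown above.

```r
## Wins and games for each group (Playoff, Reg), from the table above
x = c(5, 101)
n = c(11, 189)

## Sample proportions and the pooled proportion under H0: p1 - p2 = 0
p_hat = x / n
p_pool = sum(x) / sum(n)

## Pooled standard error of the difference in proportions
se_pool = sqrt(p_pool * (1 - p_pool) * sum(1 / n))

## Test statistic and two-sided p-value from the Normal model
z = (p_hat[1] - p_hat[2]) / se_pool
p_value = 2 * pnorm(abs(z), lower.tail = FALSE)

## Same test via prop.test, without the continuity correction
res = prop.test(x = x, n = n, correct = FALSE)
c(z_squared = z^2, X_squared = unname(res$statistic))
c(by_hand = p_value, prop_test = res$p.value)
```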
Question #3: Use prop.test() to evaluate whether there is statistical evidence that the proportions of male infants in each type of surgery (low-flow and circulatory arrest) are unequal. You should clearly state your null hypothesis, then you should provide the \(p\)-value and a 1-sentence summary.
\(~\)
Statistical tests involving a single mean use hypotheses of the form: \[H_0: \mu= \text{hypothesized value} \\ H_a: \mu \ne \text{hypothesized value}\]
Tests involving a difference in means use hypotheses of the form: \[H_0: \mu_1 - \mu_2 = \text{hypothesized value} \\ H_a: \mu_1 - \mu_2 \ne \text{hypothesized value}\]
For a difference in means test, the hypothesized value is almost always zero, while for a test of a single mean, it can be any value depending on the application.
Both of these tests are performed using the t.test() function, which uses the \(t\)-distribution as a probability model. The \(t\)-distribution accounts for the additional variability that arises because the standard error of the mean requires us to estimate both the mean and standard deviation of the population using the same data.
To use t.test() to perform a test involving a single mean we provide the relevant quantitative variable as the x argument and the hypothesized value as the mu argument. The example below tests whether the average score differential in NFL games is zero:
## Example of t.test for a single mean
t.test(x = nfl$score_diff, mu = 0)
##
## One Sample t-test
##
## data: nfl$score_diff
## t = 0.91137, df = 199, p-value = 0.3632
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -1.140453 3.100453
## sample estimates:
## mean of x
## 0.98
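The output of t.test() can be tied back to the test statistic formula with a small sketch. The vector below is hypothetical (it is not taken from the NFL data) and was chosen so that its mean is 5 and its standard deviation is 3, matching the introductory example with a hypothesized mean of 2.

```r
## Hypothetical sample with mean 5 and standard deviation 3
x = c(2, 8, 2, 8, 5)
mu_0 = 2   # hypothesized population mean

## Test statistic "by hand": (mean - hypothesized value) / standard error
n = length(x)
t_stat = (mean(x) - mu_0) / (sd(x) / sqrt(n))

## Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value = 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

## The same quantities as reported by t.test
res = t.test(x = x, mu = mu_0)
c(by_hand = t_stat, t_test = unname(res$statistic))
c(by_hand = p_value, t_test = res$p.value)
```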
To use t.test() to perform a difference in means \(t\)-test we can use formula syntax similar to what we’ve previously seen for regression models. The example below tests whether the average points scored by the home team differs for playoff and regular season games.
## Example of t.test for a difference in means
t.test(home_score ~ game_type, mu = 0, data = nfl)
##
## Welch Two Sample t-test
##
## data: home_score by game_type
## t = 0.26493, df = 11.128, p-value = 0.7959
## alternative hypothesis: true difference in means between group Playoff and group Reg is not equal to 0
## 95 percent confidence interval:
## -6.404705 8.160357
## sample estimates:
## mean in group Playoff mean in group Reg
## 23.45455 22.57672
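The output header above shows that t.test() ran a Welch two-sample test, which does not assume equal variances in the two groups and therefore reports non-integer degrees of freedom. A minimal sketch of that calculation follows. The data frame, its column names, and its values are hypothetical and are not drawn from the NFL data.

```r
## Hypothetical two-group data (illustration only)
dat = data.frame(
  score = c(21, 30, 17, 24, 28, 14, 20, 27, 23, 31, 10, 19),
  group = rep(c("A", "B"), times = c(5, 7))
)
a = dat$score[dat$group == "A"]
b = dat$score[dat$group == "B"]

## Each group's contribution to the squared standard error
va = var(a) / length(a)
vb = var(b) / length(b)

## Welch test statistic and Welch-Satterthwaite degrees of freedom
t_stat = (mean(a) - mean(b)) / sqrt(va + vb)
df_welch = (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
p_value = 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)

## The same quantities as reported by t.test with formula syntax
res = t.test(score ~ group, data = dat)
c(by_hand = t_stat, t_test = unname(res$statistic))
c(by_hand = df_welch, t_test = unname(res$parameter))
c(by_hand = p_value, t_test = res$p.value)
```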
Question #4: PDI scores are calculated such that a score of 100 reflects the population average for children of a certain age. In this question you will test whether the PDI scores at two-years of age for infants born with congenital heart defects deviate from the population average.
Use the t.test() function to confirm the test statistic and \(p\)-value you found in Parts A and B.
Question #5: A primary analysis goal in this study was to compare the mean PDI scores of infants receiving each type of surgery. For this question you should perform a hypothesis test that evaluates whether the study found a significant difference in mean PDI scores. You should clearly state your null and alternative hypotheses, calculate a \(p\)-value using t.test(), and report a 1-sentence summary of what you conclude from the hypothesis test.
\(~\)
The purpose of this section is to practice your ability to decide the proper statistical test and hypotheses to use when given a new research question and data set. Each question will introduce a data set and research question, and your task is to perform an appropriate hypothesis test and report a 1-sentence conclusion based upon the results of that test.
When deciding upon a test you should consider the variables involved in the research questions and their type:
[...] (prop.test() or binom.test()) is appropriate.
[...] (prop.test()) is most likely appropriate.
Question #6: The “Oatbran” data set (read into R below) contains the results of an experiment where 14 male participants were randomly assigned to eat a certain type of cereal for two weeks, after which their LDL cholesterol levels (mmol/L) were measured. After a washout period they ate a second type of cereal for two weeks, and their LDL cholesterol was measured again after this period. Participants were randomly assigned to either consume oatbran cereal first, followed by cornflakes, or to eat cornflakes first, followed by oatbran during the second study period.
The data set contains 3 columns:
CornFlakes - the LDL measurement at the end of the period in which the subject was on the cornflakes diet
OatBran - the LDL measurement at the end of the period in which the subject was on the oatbran diet
difference - the difference in LDL measurements (CornFlakes minus OatBran) for a subject
## Data for Question #6
oatbran = read.csv("https://remiller1450.github.io/data/Oatbran.csv")
For this question, you should perform an appropriate hypothesis test to evaluate whether this study provides evidence that oat bran cereal helps lower LDL cholesterol. Be sure your answer includes all of the components requested at the start of this section.
\(~\)
Question #7: The “Commute Tracker” data set (loaded below) contains a sample of daily commutes tracked by a GPS app used by a worker in the greater Toronto area.
ct <- read.csv("https://remiller1450.github.io/data/CommuteTracker.csv")
For this question, you should use a hypothesis test to evaluate whether the worker is more likely to take Hwy 407, a toll road which is faster but more expensive than their normal route, when they are headed to their workplace (GoingTo = 'GSK') or headed home (GoingTo = 'Home'). Be sure your answer includes all of the components requested at the start of this section.
\(~\)
Question #8: For this question, you will use the “Commute Tracker” data introduced in Question 7. You are to perform a hypothesis test to evaluate whether the average total time of trips back to the worker’s home (GoingTo = 'Home') is longer than the average total time of trips to the worker’s office (GoingTo = 'GSK'). Be sure your answer includes all of the components requested at the start of this section.