Lab #1 - A First Hypothesis Test

\(~\)

Onboarding

In today’s lab, you’ll analyze data from a 2007 experiment published in the journal Nature in an article titled “Social evaluation by preverbal infants”.

In the experiment, 16 infants watched two different demonstrations, one after the other, seeing each one several times. In one of the demonstrations, a “helper” toy assists the main character in climbing up a hill:

Video link: https://www.youtube.com/watch?v=WqEV9Otdp58

In the other demonstration, a “hinderer” toy blocks the main character from climbing up the hill:

Video link: https://www.youtube.com/watch?v=YX6PTixcS5I

After watching the sequence of demonstrations, each infant was presented with the two toys they had seen (the “helper” and the “hinderer”) and chose one to play with.

The researchers conjectured that infants would prefer the “helper”, which would imply that infants can identify friendly or cooperative behavior and prefer it.

\(~\)

Lab

What follows is a series of questions that you are expected to answer with your assigned group. You may record your answers individually, but only one person should submit your group’s lab responses to Canvas. For this lab, type your responses in a Word doc (or Google Doc); future labs will have a different format.

Initial Questions

The following questions are about randomness and how it relates to the study described in the onboarding section. Try your best to answer them with your group. We will discuss them together at the end of class.

Question #1: The researchers in this study were interested in the percentage of infants that would prefer the “helper” toy. Is the percentage they observed in the data they collected the outcome of a random process? Briefly explain your answer.

\(~\)

Question #2: The researchers in this study were careful to randomly assign the color and shape of each toy for each infant. They also randomly assigned whether the “helper” would appear on the left or right side during the toy choice. Why do you believe the researchers took these steps?

\(~\)

Hypotheses

Question #3: Consider the hypothesis that infants were randomly choosing between the “helper” and “hinderer” without any preference for one toy over the other. Which of the following probability statements accurately reflects this hypothesis? Indicate the correct statement and briefly explain what makes the other statements inappropriate.

A - \(Pr(\text{helper chosen}) = 0.5\)
B - \(Pr(\text{helper chosen}| \text{infants choose randomly}) = 0.5\)
C - \(Pr(\text{infants choose randomly} | \text{helper chosen}) = 0.5\)

\(~\)

Question #4: Under the hypothesis that infants were randomly choosing between the “helper” and “hinderer” toys without any preference, how many of the 16 infants would you expect to choose the “helper”? Would it be incompatible with this hypothesis if 9 infants in the study chose the “helper”?

\(~\)

Hypothesis Testing

In the field of statistics, standard practice is to set up a null hypothesis as a “straw man”. The data are then used as evidence against the null hypothesis in order to support an alternative conclusion. The null hypothesis must be falsifiable, meaning it specifies at least one precise numerical value, such as 0, 0.5, 1, or another value depending on the context of the application. For example, a null hypothesis of “the average August temperature is not 75°F” is not appropriate, because only an average temperature of exactly 75.00…0 degrees would provide evidence against it.

As an example, a null hypothesis might be “the drug has no effect on weight loss”, and a statistician would then use data from a clinical trial to assess whether there is enough evidence to refute this hypothesis. If the individuals in the trial who took the drug lost substantially more weight than those who took a placebo, that would be evidence against the null hypothesis.

The amount of evidence that this difference in weight loss provides against the null hypothesis is measured by the conditional probability of observing an effect (here, a difference in weight loss) at least as large as the one found in the study, given that the null hypothesis is true. This conditional probability is known as the \(p\)-value. If the average difference in weight loss were 5 lbs in our example, the \(p\)-value could be expressed as \(Pr(\text{Difference in weight loss }\geq 5|\text{Drug has no effect})\).

Question #5: Which of the following is an appropriate null hypothesis for the helper/hinderer study described throughout this lab? Indicate the correct choice(s) and briefly explain what makes the other hypotheses inappropriate.

A - \(\geq 8 \text{ infants choose helper}\)
B - \(\text{exactly } 8 \text{ infants choose helper}\)
C - the proportion of infants who choose the helper is not 50%
D - the proportion of infants who choose the helper is 50%

Question #6: Using your answer from Question #5 and the example from this section, how would the \(p\)-value be defined in this study? You may express your answer in words or as a probability statement (i.e., \(Pr(...)\)).

\(~\)

Calculating a \(p\)-value

Recall that the probability of an event is defined as its long-run frequency over many repetitions of the underlying random process. In the biased coin example from today’s lecture, we considered 30 flips of a coin that may or may not have been fair, 18 of which were “heads” (60%). If our null hypothesis is “the coin is fair”, the \(p\)-value is defined as \(Pr(\geq 60\% \text{ Heads} | \text{Coin is fair})\), the probability of observing at least 60% heads given that the coin is fair.
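Although this lab estimates the \(p\)-value by simulation, for a fair coin this particular probability can also be worked out exactly by counting, a binomial calculation that goes beyond what the lab requires. Each of the \(2^{30}\) possible sequences of 30 flips is equally likely, and \(\binom{30}{k}\) of them contain exactly \(k\) heads. Since 60% of 30 flips is 18 heads, summing over 18 or more heads gives:

\[
Pr(\geq 60\% \text{ Heads} | \text{Coin is fair}) = Pr(\geq 18 \text{ Heads} | \text{Coin is fair}) = \sum_{k=18}^{30} \binom{30}{k}\left(\frac{1}{2}\right)^{30} = \frac{194{,}129{,}627}{1{,}073{,}741{,}824} \approx 0.181
\]

This exact value is what the simulation-based estimates in this section are approximating.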

We can estimate this probability by repeating the underlying random process a large number of times. Here that random process is a set of 30 flips of a fair coin. We’ll rely on StatKey, an online statistics app, to perform these sets of 30 coin flips and track the results:

Use this StatKey link

Using the link given above, perform the following steps:

1. Click the “Edit Data” button and change the count to 18 and the sample size to 30 to reflect the observed data in the coin flip scenario.
2. Check that the “Null hypothesis” is displayed as \(p=0.5\), which reflects the assumption that the coin is fair.
3. Click the “Generate 1 Sample” button. You should see a dot appear somewhere on the graph. This dot is the outcome of a single repetition of the random process (flipping a fair coin 30 times).
4. Click the “Generate 1000 Samples” button; estimating a probability requires a large number of repetitions of the underlying random process. The graph now shows the distribution of outcomes we could expect from a fair coin.
5. Since the observed proportion of heads, 60%, lies on the right side of this distribution, check the “Right Tail” box, then click the blue box that appears along the x-axis and change its value to 0.6. StatKey will then highlight all of the simulated results with 60% or more heads. The blue box that appears above these highlighted results shows the proportion of all simulated outcomes that are greater than or equal to the value you set. This proportion is an estimate of the \(p\)-value.
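For readers curious about what happens behind StatKey’s buttons, the steps above can be sketched in Python. This is a minimal illustration under our own assumptions, not StatKey’s code: the function name, its parameters, and the fixed `seed` are hypothetical choices made so the example is reproducible.

```python
import random

def simulated_right_tail_p_value(observed_heads, n_flips, n_samples, seed=None):
    """Estimate Pr(at least observed_heads heads | coin is fair) by simulation.

    Each sample is one repetition of the random process: n_flips flips of a
    fair coin. The estimate is the fraction of samples that land in the right tail.
    """
    rng = random.Random(seed)
    in_tail = 0
    for _ in range(n_samples):
        # One repetition: each flip is heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        # Compare head counts (18 or more) rather than proportions (0.6 or more).
        if heads >= observed_heads:
            in_tail += 1
    return in_tail / n_samples

# Coin example: 18 heads observed in 30 flips, 1000 simulated samples.
estimate = simulated_right_tail_p_value(observed_heads=18, n_flips=30, n_samples=1000, seed=1)
print(estimate)  # a simulation-based estimate; it changes with the seed
```

Comparing integer head counts instead of proportions sidesteps any floating-point ambiguity at exactly 0.6, which is the boundary StatKey’s “Right Tail” setting includes.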

After executing these steps, you should arrive at a \(p\)-value in the 0.17-0.20 range. Repeating the random process more times (i.e., clicking “Generate 1000 Samples” several times) can help stabilize the estimate of the \(p\)-value. Interpreted literally, this \(p\)-value indicates that if the coin were fair, we’d expect to observe at least 60% heads in a set of 30 coin flips roughly 18% of the time. This suggests that an outcome of 18 heads in 30 flips is not unusual for a fair coin, so we don’t have compelling evidence that our coin is biased.
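To illustrate why more repetitions stabilize the estimate, the following minimal Python sketch repeats the whole coin-flip simulation five times at each of several sample sizes and compares the results with the exact probability obtained by summing binomial counts with `math.comb`. It is a hypothetical illustration, not StatKey’s code; the function names and the seed are assumptions.

```python
import math
import random
import statistics

# Exact Pr(>= 18 heads in 30 flips | fair coin), by summing binomial counts.
EXACT = sum(math.comb(30, k) for k in range(18, 31)) / 2 ** 30

def simulated_p_value(n_samples, rng, observed_heads=18, n_flips=30):
    """Fraction of n_samples simulated sets of fair-coin flips with >= observed_heads heads."""
    in_tail = 0
    for _ in range(n_samples):
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            in_tail += 1
    return in_tail / n_samples

def repeated_estimates(n_samples, n_repeats=5, seed=0):
    """Run the whole simulation n_repeats times and collect the p-value estimates."""
    rng = random.Random(seed)
    return [simulated_p_value(n_samples, rng) for _ in range(n_repeats)]

print(f"exact: {EXACT:.4f}")  # → exact: 0.1808
for n_samples in (100, 1000, 10000):
    estimates = repeated_estimates(n_samples)
    spread = max(estimates) - min(estimates)
    # The spread of the estimates tends to shrink as n_samples grows.
    print(f"{n_samples:>6} samples: mean {statistics.mean(estimates):.4f}, spread {spread:.4f}")
```

With only 100 samples per run, the five estimates typically differ from one another by several percentage points; with 10,000 samples per run they tend to agree to within about a percentage point of each other and of the exact value.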

Question #7: In the helper/hinderer study, 14 of the 16 infants selected the “helper” toy. Following the steps in this section, use StatKey and this information to calculate the \(p\)-value for the null hypothesis you chose in Question #5. Report your \(p\)-value and write a 1-2 sentence conclusion about whether the data provide sufficient evidence that infants were not choosing randomly between the two toys.

\(~\)

Question #8: Suppose the researchers modified their original study by adding a decoy toy, one that had not appeared in either demonstration, to the tray presented to each infant, so that each infant chooses among three toys rather than two. How would this change your null hypothesis if you still want to find evidence against the notion that infants were choosing their toys randomly? Would you expect the \(p\)-value to be larger or smaller if 14 of 16 infants still chose the “helper” in this modified study? Briefly explain your answers (2-3 sentences).