Goals:
The purpose of this lab is to provide practice applying hypothesis tests (via randomization) to a wide variety of scenarios.
Directions:
Introduction:
On Exam #2, you should be prepared to execute hypothesis tests from two different starting points:
On the exam, you should expect to be asked to analyze data in these forms using both randomization tests (performed entirely in StatKey) and \(Z\)- or \(T\)-tests (performed partially “by hand”). In some cases, you may be given the choice between these methods.
Gregor Mendel is generally regarded as the first to identify the principles governing how traits are inherited across generations. Mendel conducted breeding experiments using Pisum sativum, a species of pea plant, and recorded the traits exhibited by the parent and offspring plants.
A lasting legacy of Mendel’s work is his representation of genes using two alleles, where a capital letter (e.g., “A”) represents a dominant allele and a lowercase letter (e.g., “a”) represents a recessive allele. Since every individual possesses two alleles for each gene, Mendel’s framework categorizes individuals as “AA”, “Aa”, or “aa”.
One particularly illustrative experiment Mendel performed was a dihybrid cross, in which he bred two parents that were both hybrid (“Aa”) for two different traits: “shape” (round or wrinkled) and “color” (green or yellow). In the next generation, Mendel observed:
Under his model of inheritance, Mendel hypothesized a 3:1 ratio of dominant (“AA” or “Aa”) to recessive (“aa”) phenotypes for each trait. That is, he expected 3 round plants for every 1 wrinkled plant, and 3 yellow plants for every 1 green plant.
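One way to see where the expected proportions of 0.75 (3:1) and 0.5625 (9/16, from 9:3:3:1) come from is to enumerate a Punnett square. The sketch below is illustrative only: it uses “A” for shape and “B” for color (a labeling assumed here, not taken from the lab), treats uppercase letters as dominant, and assumes the two traits assort independently when combining them.

```python
from fractions import Fraction
from itertools import product


def shows_dominant(alleles):
    # A plant shows the dominant phenotype if it carries at least one uppercase allele
    return any(a.isupper() for a in alleles)


def single_trait_ratio(parent1="Aa", parent2="Aa"):
    # Punnett square: each parent passes one allele; all 4 pairings are equally likely
    offspring = list(product(parent1, parent2))
    dominant = sum(shows_dominant(child) for child in offspring)
    return Fraction(dominant, len(offspring))


def dihybrid_joint_ratio():
    # Assumes independent assortment: each "AaBb" parent makes gametes AB, Ab, aB, ab equally often
    gametes = list(product("Aa", "Bb"))
    offspring = list(product(gametes, gametes))  # 16 equally likely gamete pairings
    both_dominant = sum(
        shows_dominant((g1[0], g2[0])) and shows_dominant((g1[1], g2[1]))
        for g1, g2 in offspring
    )
    return Fraction(both_dominant, len(offspring))


print(single_trait_ratio())    # 3/4
print(dihybrid_joint_ratio())  # 9/16
```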
Question #1: For the “shape” trait, a relevant null hypothesis is \(H_0: p = 0.75\), where \(p\) is used to represent the proportion of plants with the “round” phenotype. What is the sample statistic that corresponds to this hypothesis? State your answer in proper statistical notation and provide a numeric value.
Question #2: Using the proper CLT formula, find the standard error that describes the variability expected in this sample statistic if \(H_0: p = 0.75\) were true. Show your calculation.
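For reference, the CLT-based standard error of a sample proportion is usually evaluated at the null value \(p_0\) (here \(p_0 = 0.75\)), where \(n\) is the total number of plants in the sample:

\[
SE = \sqrt{\frac{p_0(1 - p_0)}{n}}
\]

Using \(p_0\) rather than \(\hat{p}\) reflects the assumption that \(H_0\) is true.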
Question #3: Using the information from Questions #1 and #2, perform either a \(Z\)-test or \(T\)-test to evaluate whether Mendel’s experiment provides statistically compelling evidence that the proportion of plants with the “round” phenotype differs from the expected proportion of \(p = 0.75\). Show how you calculated your \(Z\) or \(T\) value, provide a two-sided \(p\)-value, and make a conclusion.
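The arithmetic of a one-proportion \(Z\)-test can be sketched in Python as below. This is a minimal sketch, not the StatKey procedure: the function name `one_prop_z_test` is hypothetical, the standard error uses the null value \(p_0\), and the two-sided \(p\)-value comes from the standard normal distribution via `math.erf`. The counts in the usage line are made up for illustration and are not Mendel’s data.

```python
import math


def normal_cdf(z):
    # Standard normal CDF expressed through the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def one_prop_z_test(successes, n, p0):
    # Z-test for H0: p = p0, with the CLT standard error computed under H0
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_two_sided = 2 * (1 - normal_cdf(abs(z)))
    return p_hat, se, z, p_two_sided


# Hypothetical example: 45 "successes" out of 100, testing H0: p = 0.5
p_hat, se, z, p = one_prop_z_test(45, 100, 0.5)
print(f"p_hat={p_hat:.3f}, SE={se:.3f}, Z={z:.2f}, p-value={p:.4f}")
# prints p_hat=0.450, SE=0.050, Z=-1.00, p-value=0.3173
```

The same function applies to the later proportion questions by changing the count, sample size, and null value.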
Question #4: Now conduct an analogous hypothesis test for the “color” trait, using \(p\) to represent the proportion of plants with the “yellow” phenotype. Your answer should clearly state your null and alternative hypotheses, show how you calculated your \(Z\) or \(T\) value (including the \(SE\) calculation), and provide a \(p\)-value with a conclusion.
Question #5: Beyond a 3:1 ratio for each individual trait, Mendel expected a 9:3:3:1 ratio for the joint phenotype (the four combinations of “shape” and “color”) if the two traits are inherited independently. Letting \(p\) represent the proportion of round, yellow plants, test the hypothesis \(H_0: p = 0.5625\) using the sample statistic \(\hat{p} = 315/556\). In doing so, show how you calculated your \(Z\) or \(T\) value, provide a two-sided \(p\)-value, and make a conclusion.
Question #6: Do the results of the three hypothesis tests you’ve now conducted using these data prove that Mendel’s beliefs regarding genetic inheritance are true? Briefly explain.
Question #7: What would a Type I error and a Type II error be in regard to any of the hypothesis tests you conducted in your analysis of Mendel’s data? Briefly explain, in the context of this application, what each error would entail.
Study #1 Reference: https://www.nature.com/scitable/topicpage/gregor-mendel-and-the-principles-of-inheritance-593/
A waiter at a national chain restaurant located in a suburban shopping mall recorded data on the tables they served during a three-month period in the early 1990s, hoping to demonstrate to their boss that their average tip percentage was significantly lower than the 20% rate they had been told to expect.
These data can be downloaded here: https://remiller1450.github.io/data/Tips.csv
Question #8: In both words and statistical symbols, what is the null hypothesis that this waiter would like to use their data to disprove?
Question #9: Use the sample data to test the hypothesis you set up in Question #8 using either a \(Z\)-test or \(T\)-test. Note that you will need to use StatKey to find some components of the test statistic. Your answer should show how you calculated your \(Z\) or \(T\) value (including the \(SE\) calculation), provide a \(p\)-value, and give a brief conclusion.
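If you would like to check your by-hand work, the \(T\) statistic for a single mean can be sketched as follows. This assumes you already have each table’s tip percentage as a number (e.g., 18.5 for 18.5%); how to derive it from the columns of Tips.csv is not assumed here, and both the function name `one_sample_t` and the five values in the usage line are hypothetical. The sketch returns the degrees of freedom rather than a \(p\)-value, since the \(p\)-value would still be found from StatKey’s \(t\)-distribution using the tail that matches your alternative hypothesis.

```python
import math
import statistics


def one_sample_t(values, mu0):
    # T statistic and degrees of freedom for H0: mu = mu0
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)
    t = (x_bar - mu0) / se
    return x_bar, se, t, n - 1


# Hypothetical tip percentages, testing H0: mu = 20
x_bar, se, t, df = one_sample_t([18, 22, 15, 17, 19], mu0=20)
print(f"mean={x_bar:.1f}, SE={se:.3f}, T={t:.2f}, df={df}")
# prints mean=18.2, SE=1.158, T=-1.55, df=4
```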
Question #10: Use these data to find a 95% confidence interval estimate for the average tip percentage for this server (generalizing beyond just this three-month period). Briefly explain why this confidence interval estimate agrees with the results of the hypothesis test you performed in Question #9.
Question #11: Suppose the waiter who collected these data presents the results of the hypothesis test you performed in Question #9 to their boss, but the boss argues that the lower-than-expected tip percentage could be explained by sampling bias. Is the boss’s argument statistically valid? Briefly explain.
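A matching sketch for a \(T\)-based confidence interval (point estimate \(\pm\, t^* \times SE\)) is below. It assumes you supply the critical value \(t^*\) yourself, for example from StatKey’s \(t\)-distribution with \(n - 1\) degrees of freedom; the function name `t_confidence_interval` and the sample values are hypothetical. The value 2.776 used here is the 95% critical value for 4 degrees of freedom.

```python
import math
import statistics


def t_confidence_interval(values, t_star):
    # Point estimate +/- t* x SE for a population mean
    n = len(values)
    x_bar = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    margin = t_star * se
    return x_bar - margin, x_bar + margin


# Hypothetical tip percentages; t* = 2.776 for 95% confidence with df = 4
lo, hi = t_confidence_interval([18, 22, 15, 17, 19], t_star=2.776)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
# prints 95% CI: (14.99, 21.41)
```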