Goals:

The purpose of this lab is to provide practice applying hypothesis tests (via randomization) to a wide variety of scenarios.

Directions:

  • You are expected to progress through the analyses described in this document as a group, recording your answers in a shared document. It’s completely up to your group how you’d like to organize this: some groups like using a shared Google Doc, while others might designate one person to be the group’s recorder.
  • You are expected to work together; any attempt to “divide and conquer” the lab questions may result in point deductions on your group’s lab score.
  • Labs are graded primarily for completion, and we will get together as a group for the last 10-15 minutes of class to discuss some of the lab questions. This means you should focus on learning the material (while also helping your teammates) rather than treating labs as an assessment (like homework or exams).
  • Please upload your responses to the lab’s questions on Canvas. Everyone is expected to upload their own copy (copies can be identical within your group).
  • Use the Snipping Tool on Windows or take a Mac screenshot to add screenshots to your lab write-up as requested.


Introduction:

On Exam #2 you should be prepared to execute hypothesis tests from two different starting points:

  1. Data and descriptive statistics summarized in text (Study #1 below)
  2. Data stored in a CSV file such that you must find the descriptive statistics yourself (Study #2 below)

On the exam, you should expect to be asked to analyze data appearing in these forms using both randomization tests (performed entirely in StatKey) and Z or T tests (performed partially “by hand”). Sometimes you may be given the choice between these methods.
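To see the idea behind a randomization test for a single proportion, the sketch below simulates many samples under the null hypothesis and counts how often a simulated sample lands at least as far from the null value as the observed one. It is a rough Python analogue of what StatKey does, not StatKey’s actual implementation; the helper name `randomization_p_value`, the 5,000 repetitions, and the seed are assumptions chosen for illustration. It uses the round-seed count from Study #1 below (423 of 556 plants).

```python
import random


def randomization_p_value(successes, n, p0, reps=5000, seed=1):
    """Two-sided randomization p-value for H0: p = p0 (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    expected = p0 * n  # count expected if H0 is true
    observed_gap = abs(successes - expected)
    as_extreme = 0
    for _ in range(reps):
        # Simulate one sample of n plants, each "round" with probability p0
        count = sum(rng.random() < p0 for _ in range(n))
        if abs(count - expected) >= observed_gap:
            as_extreme += 1
    return as_extreme / reps


# Round-seed count from Study #1: 423 of 556 plants
p_value = randomization_p_value(423, 556, 0.75)
print(round(p_value, 3))
```

Because the result comes from simulation, a different seed or number of repetitions will give a slightly different \(p\)-value, just as repeated runs in StatKey do.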


Study #1 - Gregor Mendel’s Experiment

Gregor Mendel is generally regarded as the first to identify the principles governing how traits are inherited across generations. Mendel conducted breeding experiments using Pisum sativum, a type of pea plant, and he recorded the traits exhibited in the parent and offspring plants.

A lasting effect of Mendel’s work is his representation of genes using two alleles, where a capital letter (e.g., “A”) represents a dominant allele and a lowercase letter (e.g., “a”) represents a recessive allele. Since every individual possesses two alleles, Mendel’s framework categorizes individuals as “AA”, “Aa”, or “aa”.

One particularly illustrative experiment Mendel performed was a dihybrid cross, in which he bred two parents that both had the hybrid “Aa” genotype for two different genetic traits: “shape” (round or wrinkled) and “color” (green or yellow). In the next generation Mendel observed:

  • 315 plants with round, yellow seeds
  • 108 plants with round, green seeds
  • 101 plants with wrinkled, yellow seeds
  • 32 plants with wrinkled, green seeds

For each trait, Mendel hypothesized a 3:1 ratio of dominant (“AA” or “Aa”) to recessive (“aa”) phenotypes. That is, he expected 3 round plants for every 1 wrinkled plant, as well as 3 yellow plants for every 1 green plant.
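The single-trait counts needed for the shape and color tests can be tallied from the four phenotype counts listed above. A short Python sketch (the variable names are illustrative only):

```python
# Phenotype counts reported for Mendel's dihybrid cross
counts = {
    ("round", "yellow"): 315,
    ("round", "green"): 108,
    ("wrinkled", "yellow"): 101,
    ("wrinkled", "green"): 32,
}

n = sum(counts.values())  # total number of plants
round_count = sum(c for (shape, _), c in counts.items() if shape == "round")
yellow_count = sum(c for (_, color), c in counts.items() if color == "yellow")

# Under a 3:1 ratio, 3/4 of plants are expected to show the dominant phenotype
expected_dominant = 0.75 * n

print(n, round_count, yellow_count, expected_dominant)
# → 556 423 416 417.0
```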

Question #1: For the “shape” trait, a relevant null hypothesis is \(H_0: p = 0.75\), where \(p\) is used to represent the proportion of plants with the “round” phenotype. What is the sample statistic that corresponds to this hypothesis? State your answer in proper statistical notation and provide a numeric value.

Question #2: Using the proper CLT formula, find the standard error that describes the variability expected in this sample statistic if \(H_0: p = 0.75\) were true. Show your calculation.
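For a sample proportion, the CLT-based standard error under the null hypothesis plugs the hypothesized value \(p_0\) into \(SE = \sqrt{p_0(1-p_0)/n}\). A minimal Python sketch of that arithmetic, assuming \(n = 556\) plants (the function name is hypothetical):

```python
import math


def null_standard_error(p0, n):
    """Standard error of a sample proportion when H0: p = p0 is true."""
    return math.sqrt(p0 * (1 - p0) / n)


se = null_standard_error(0.75, 556)
print(round(se, 4))
# → 0.0184
```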

Question #3: Using the information from Questions #1 and #2, perform either a \(Z\)-test or \(T\)-test to evaluate whether Mendel’s experiment provides statistically compelling evidence that the proportion of plants with the “round” phenotype differs from the expected proportion of \(p = 0.75\). Show how you calculated your \(Z\) or \(T\) value, provide a two-sided \(p\)-value, and make a conclusion.
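The \(Z\) statistic is the distance between the sample statistic and the null value measured in standard errors, \(Z = (\hat{p} - p_0)/SE\), and the two-sided \(p\)-value is \(P(|Z| \ge |z|)\) for a standard normal \(Z\). The sketch below checks a hand calculation in Python; it uses the identity \(P(|Z| \ge |z|) = \text{erfc}(|z|/\sqrt{2})\) so that only the standard library is needed, and the function name is an assumption for illustration.

```python
import math


def one_proportion_z_test(p_hat, p0, n):
    """Z statistic and two-sided p-value for H0: p = p0, using the null SE."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = one_proportion_z_test(423 / 556, 0.75, 556)
print(round(z, 2), round(p_value, 2))
# → 0.59 0.56
```

The \(p\)-value from StatKey’s normal distribution should match this closely, up to rounding of \(Z\).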

Question #4: Now conduct the same hypothesis test for the “color” trait, using \(p\) to represent the proportion of plants with a “yellow” phenotype. Your answer should clearly state your null and alternative hypotheses, show how you calculated your \(Z\) or \(T\) value (include the \(SE\) calculation), and provide a \(p\)-value with a conclusion.

Question #5: Beyond a 3:1 ratio across each individual trait, Mendel expected a 9:3:3:1 ratio for the joint phenotype (the four combinations of “shape” and “color”). Letting \(p\) represent the proportion of round, yellow plants, test the hypothesis \(H_0: p = 0.5625\) using the sample statistic \(\hat{p} = 315/556\). In doing so, show how you calculated your \(Z\) or \(T\) value, provide a two-sided \(p\)-value, and make a conclusion.
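Questions #3 through #5 all apply the same one-proportion \(Z\)-test, changing only the count and the null value; the 9:3:3:1 ratio gives \(9/16 = 0.5625\) for the round, yellow combination. The sketch below runs all three tests in one loop so you can check your hand calculations. It assumes the normal approximation is reasonable here, which is plausible because every expected count (\(np_0\) and \(n(1-p_0)\)) is well above 10 with 556 plants.

```python
import math

N = 556  # total plants in Mendel's dihybrid cross

# (label, observed count, hypothesized proportion p0)
tests = [
    ("round (shape)", 315 + 108, 3 / 4),
    ("yellow (color)", 315 + 101, 3 / 4),
    ("round & yellow (joint)", 315, 9 / 16),
]

results = {}
for label, count, p0 in tests:
    p_hat = count / N
    se = math.sqrt(p0 * (1 - p0) / N)  # SE computed assuming H0 is true
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    results[label] = (p_hat, se, z, p_value)
    print(f"{label:24s} p_hat={p_hat:.4f} SE={se:.4f} Z={z:.3f} p={p_value:.3f}")
```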

Question #6: Do the results of the three hypothesis tests you’ve now conducted using these data prove that Mendel’s beliefs regarding genetic inheritance are true? Briefly explain.

Question #7: What would a Type I error and a Type II error be with regard to any of the hypothesis tests you conducted in your analysis of Mendel’s data? Briefly explain in the context of this application what each error would entail.


Study #1 Reference: https://www.nature.com/scitable/topicpage/gregor-mendel-and-the-principles-of-inheritance-593/


Study #2 - Restaurant Tips

A waiter at a national chain restaurant located in a suburban shopping mall in the early 1990s recorded data on the tables they served during a three-month period, hoping to demonstrate to their boss that their average tip percentage had been significantly lower than the 20% rate they were told they could expect.

These data can be downloaded here: https://remiller1450.github.io/data/Tips.csv

Question #8: In both words and statistical symbols, what is the null hypothesis that this waiter would like to use their data to disprove?

Question #9: Use the sample data to test the hypothesis you set up in Question #8 using either a \(Z\)-test or \(T\)-test. Note that you will need to use StatKey to find some components of the test statistic. Your answer should show how you calculated your \(Z\) or \(T\) value (including the \(SE\) calculation), provide a \(p\)-value, and give a brief conclusion.
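For a single mean, the \(T\) statistic is \(T = (\bar{x} - \mu_0)/SE\) with \(SE = s/\sqrt{n}\) and \(n - 1\) degrees of freedom. The Python sketch below performs that arithmetic for any list of tip percentages. The list shown is made up purely to demonstrate the function and is not taken from Tips.csv; the function name is hypothetical, and no column name from the CSV is assumed. The \(p\)-value can then be found from StatKey’s \(t\)-distribution with the printed degrees of freedom, using the tail that matches your alternative hypothesis.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """T statistic, SE, and degrees of freedom for H0: mu = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)
    t = (mean - mu0) / se
    return t, se, n - 1


# Made-up values for illustration only (NOT from Tips.csv);
# replace them with the tip percentages from the downloaded data.
example_tips = [15.0, 18.5, 22.0, 12.0, 16.5]
t, se, df = one_sample_t(example_tips, 20)
print(round(t, 3), round(se, 3), df)
```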

Question #10: Use these data to find a 95% confidence interval estimate for the average tip percentage for this server (generalizing beyond just this three-month period). Briefly explain why this confidence interval estimate agrees with the results of the hypothesis test you performed in Question #9.
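A \(t\)-based confidence interval for a mean has the form \(\bar{x} \pm t^* \times SE\), where \(t^*\) is the critical value for the chosen confidence level with \(n - 1\) degrees of freedom. The sketch below assumes you look up \(t^*\) yourself (for example, in StatKey) and pass it in; the function name and the made-up example values are hypothetical and not drawn from Tips.csv.

```python
import math
import statistics


def t_confidence_interval(values, t_star):
    """Return (lower, upper) for mean +/- t_star * SE.

    t_star is the critical value for the desired confidence level with
    n - 1 degrees of freedom, looked up separately.
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    margin = t_star * se
    return mean - margin, mean + margin


# Made-up values for illustration only (NOT from Tips.csv)
example_tips = [15.0, 18.5, 22.0, 12.0, 16.5]
lower, upper = t_confidence_interval(example_tips, 2.776)  # 95% t* for df = 4
print(round(lower, 2), round(upper, 2))
```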

Question #11: Suppose the waiter who collected these data presents the results of the hypothesis test you performed in Question #9 to their boss, but the boss argues that the lower-than-expected tip percentage could be explained by sampling bias. Is the boss’s argument statistically valid? Briefly explain.