Directions
When reading modern scientific research, it might seem that the statistical tests we've been learning about are outdated and rarely used. In reality, they remain in frequent use, notably in a practice known as A/B testing.
A/B testing is a general methodology centered on a statistical comparison of two randomly assigned conditions, “A” and “B”, which can often be viewed as treatment and control groups. For example, a business trying to optimize their webpage design might randomly assign new visitors to receive one of two variants of the page. The business can track the clicks made by the visitors assigned to each page, paying particular attention to certain behaviors (clicking on promotions/ads, adding items to their cart, completing a purchase, etc.).
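For those curious how random assignment might look in code, here is a minimal sketch in Python. The function name and visitor IDs are hypothetical; real platforms use more elaborate assignment and logging infrastructure.

```python
import random

def assign_variant(visitor_ids, seed=42):
    """Randomly assign each visitor to condition 'A' (control) or 'B' (treatment).

    The seed makes the assignment reproducible for illustration.
    """
    rng = random.Random(seed)
    return {v: rng.choice(["A", "B"]) for v in visitor_ids}

# Assign ten hypothetical visitors to the two page variants.
assignments = assign_variant(range(10))
```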
In this lab we will analyze data from an A/B test conducted by an anonymous company in January 2017 to evaluate the efficacy of two landing page designs for their website. The outcome of interest for this analysis is conversion; in this example, conversion refers to a successful sale of the company’s product.
Question 1:
Can a causal relationship be inferred from the A/B testing protocol described in the paragraphs above? What would a causal relationship mean in the context of the application we’ll be investigating in this lab?
Download and load the A/B Testing Data into Minitab. The data set contains the following variables:
In nearly any randomized experiment it is possible that the randomization protocol does not work as intended. We’ve already seen several examples of this, including one where the distribution of a key confounding variable was not balanced across groups (the lab monkey example), and another where there was disproportionate cross-over between the assigned treatments (the Minneapolis domestic abuse study). For these reasons, an important preliminary step in analyzing data from a randomized experiment is to determine whether randomization was successful.
Question 2:
Construct a “histogram with groups” of the variable “timestamp” using the variable “group” as the grouping variable. Based upon this graph, do you believe that this experiment’s randomization procedure was successful in preventing date from being a confounding variable in the relationship between “group” and “converted”? Include your graph and briefly explain your reasoning.
Question 3:
The creators of this experiment intended for an equal number of new visitors to be assigned to view each page design and constructed their randomization scheme accordingly. Using the appropriate statistical test, is there evidence that an imbalanced proportion of the sample was assigned to either page? You may conduct your test in Minitab, but you should include the relevant output and a 1-2 sentence conclusion in your lab write-up.
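Although this question asks for Minitab output, the underlying test (a one-proportion z-test of whether the assignment probability equals 0.5) can be sketched in Python. The counts below are hypothetical placeholders, not the values from the actual data set.

```python
import math

# Hypothetical counts -- substitute the actual group sizes from the data set.
n_treatment = 147_000   # visitors assigned to the treatment page
n_total = 294_500       # total visitors in the experiment
p_hat = n_treatment / n_total

# One-proportion z-test of H0: p = 0.5 (equal allocation to each page).
se = math.sqrt(0.5 * 0.5 / n_total)
z = (p_hat - 0.5) / se

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```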
Question 4:
A final concern in some randomized experiments is adherence. For various reasons (browser compatibility, browsing history, etc.), not every new visitor ended up viewing the version of the page they were randomly assigned to visit. To answer this question, use a two-way frequency table to evaluate adherence. Do you think that lack of adherence is a major concern (ie: source of bias) in this experiment? Briefly explain. (Hint: you’re not required to use a statistical test here, rather you should consider the data along with the background of the experiment and make a judgement)
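The two-way frequency table used to check adherence can also be built in code. The sketch below uses pandas with a tiny hypothetical sample; the variable names `group` and `landing_page` match the lab data, but the values shown are illustrative.

```python
import pandas as pd

# Tiny hypothetical sample -- the real data set has one row per visitor.
df = pd.DataFrame({
    "group": ["control", "control", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "old_page", "new_page", "old_page", "new_page"],
})

# Two-way frequency table: rows = assigned group, columns = page actually viewed.
# Off-diagonal cells count visitors who did not adhere to their assignment.
table = pd.crosstab(df["group"], df["landing_page"])
print(table)
```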
In this section we will analyze the impact of page design on conversion. We will consider three different hypothesis tests for the purposes of learning about the similarities and differences in these tests. In practice you’d only be interested in performing a single test.
Question 5:
Use a \(z\)-test to evaluate whether conversion rates are different for the treatment and control groups. I’d like you to perform this test “by hand”, but you may check your work in Minitab. Your answer should include: the statement of your null and alternative hypotheses using proper statistical notation, the calculation of your test statistic, the \(p\)-value of your test, and a one sentence conclusion.
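If you'd like to check your by-hand arithmetic, the two-proportion z-test can be computed step by step in Python. The conversion counts and group sizes below are hypothetical placeholders for the values you'll read off the data.

```python
import math

# Hypothetical counts -- substitute the conversions and group sizes from the data.
x1, n1 = 17_200, 145_000   # conversions and visitors in the control group
x2, n2 = 17_500, 145_300   # conversions and visitors in the treatment group

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled proportion under H0: p1 = p2

# Standard error of the difference in proportions under the null.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```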
Question 6:
Use the appropriate exact test in Minitab to evaluate whether conversion rates are different for the treatment and control groups. Include your Minitab output in your lab write-up.
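For reference, the exact test for a 2x2 table of group by conversion is Fisher's exact test, which is available in SciPy. The table entries below are hypothetical, scaled-down counts for illustration.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = group, columns = (converted, not converted).
table = [[172, 1278],    # control
         [175, 1278]]    # treatment

# Fisher's exact test conditions on the table margins; no normal approximation.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
```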
Question 7:
Using Statkey, perform an appropriate randomization test to evaluate whether conversion rates are different for the treatment and control groups. Include a screenshot of your randomization distribution (showing the \(p\)-value) in your lab write-up.
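The randomization test StatKey performs can also be sketched directly: shuffle the group labels many times and see how often a difference in conversion rates at least as extreme as the observed one arises by chance. The counts below are small hypothetical values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes: 1 = converted, 0 = not converted (100 visitors per group).
control = np.array([1] * 12 + [0] * 88)
treatment = np.array([1] * 18 + [0] * 82)

observed = treatment.mean() - control.mean()
pooled = np.concatenate([control, treatment])

# Re-randomize the group labels many times and recompute the difference in rates.
diffs = []
for _ in range(5_000):
    perm = rng.permutation(pooled)
    diffs.append(perm[:100].mean() - perm[100:].mean())
diffs = np.array(diffs)

# Two-sided p-value: share of shuffled differences at least as extreme as observed.
p_value = np.mean(np.abs(diffs) >= abs(observed))
```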
Question 8:
How do the results of the three different tests in Questions 5-7 compare with one another? What do you think is the reason for these similarities/differences?
In our notes on randomized experiments we first learned about the intention to treat principle (ITT), an analysis approach that should be considered when it is possible for subjects not to adhere to their randomly assigned treatment. In the A/B testing data analyzed in this lab it was possible for subjects not to end up viewing the page they were randomly assigned to, meaning we should consider the ramifications of an ITT analysis versus an analysis using the variable “landing_page” (which is sometimes called an “as-treated” analysis).
Question 9:
What does intention-to-treat analysis mean in the context of this application? That is, what would the explanatory and response variables of this analysis be?
Question 10:
Conduct an appropriate test to evaluate whether conversion rates are different for each “landing_page” (ie: perform an as-treated analysis). In your lab write-up you should include a screenshot of the output from your test, along with a one sentence conclusion.
Question 11:
Does the hypothesis test you performed in Question #10 agree with the results of your ITT analysis (which you performed in the previous section) of these data?
Question 12:
Does it seem necessary to use the intention to treat principle in this application? Briefly explain.
Question 13:
Based upon the statistical tests you performed on these data (and the conclusion you reached), is it possible that you made a Type I error? Briefly explain.
Question 14:
Based upon the statistical tests you performed on these data (and the conclusion you reached), is it possible that you made a Type II error? Briefly explain.
Question 15:
If you were going to re-run this experiment, how could you increase its statistical power? Briefly explain.
Question 16:
Suppose that a more powerful repeat of this experiment is to be run, and you are asked to decide upon a sample size that you’d expect to have an 80% chance of producing a statistically significant result at the \(\alpha = 0.05\) level. Based upon the effect size observed in these data, use the power calculator at this link to determine the necessary sample size.
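The calculation the power calculator performs can be sketched with the standard normal-approximation formula for a two-proportion z-test. The conversion rates passed in at the bottom are hypothetical; plug in the rates you observed in the data.

```python
import math
from statistics import NormalDist

def sample_size_two_props(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-proportion z-test.

    Uses the normal-approximation formula:
    n = (z_{alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2
    return math.ceil(n)

# Hypothetical conversion rates -- substitute the rates observed in the data.
n_per_group = sample_size_two_props(0.120, 0.118)
```

Note that a small difference in conversion rates (the effect size) drives the required sample size up dramatically, since it appears squared in the denominator.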