Directions
Like the previous lab, this lab will provide you another opportunity to practice your decision-making skills. For many questions, I will not tell you the statistical approach to use, instead it is your responsibility to choose a reasonable method and justify your decision.
The Commute Tracker dataset documents the daily commutes of a worker in the greater Toronto area collected using a GPS app. The variables in this dataset include:
Question 1:
Construct an appropriate graph to determine if there are any outliers in the variable “TotalTime”. Include your graph and a sentence or two describing whether you believe these outliers should be excluded from an analysis in your lab write-up. (Hint: use “Comments” to argue whether the outliers appear to be real data-points).
Question 2:
Conduct a full graphical analysis to discover which (if any) of the following variables are related to “TotalTime”. The variables you should consider are “DayOfWeek”, “GoingTo”, “AvgMovingSpeed”, and “Take407All”
You do not need to paste your graphs into your lab write-up, but for each of these four comparisons you should describe the type of graph you used and the key patterns you observed in that graph (ie: do the variables appear associated)
Question 3: (Group Only)
Use an appropriate graph to assess the relationship between the variables “MovingTime” and “TotalTime”. Do you notice anything unusual about the relationship between these two variables? Include your graph and a sentence or two describing what you see in your lab write-up.
Question 4:
Use an appropriate statistical test to determine whether commutes going to work tend to be significantly faster than trips coming home. Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.
Question 5:
Use an appropriate statistical test to determine if day of week is associated with commute time (TotalTime). Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.
Question 6:
Use an appropriate statistical test to determine if day of week is associated with whether the driver chose to take the 407 toll highway. Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.
Question 7:
Use the “graphs” menu under one-way ANOVA to explore the residuals of an ANOVA model using “DayOfWeek” to predict the variable “MovingTime”. Based upon what you see, do you have any concerns using ANOVA to test for an association between these two variables?
Question 8:
Regardless of your answer to Question 7, use the “comparisons” menu under one-way ANOVA to conduct Tukey’s HSD for the ANOVA model using “DayOfWeek” to predict the variable “MovingTime”. Using these results, which groups are the most different? Is that difference statistically significant?
Question 9:
The Tukey’s HSD test you conducted in Question 8 involved 10 different pairwise comparisons. Based upon Minitab’s default choice of \(\alpha = 0.05\), what is expected the Type I error rate for this entire set of comparisons?
Question 10: (Group Only)
Apply a log-transformation to the variable “MovingTime” and repeat the analysis described in Question 7. What impact did this transformation have upon the residuals of the one-way ANOVA model? Do you trust this analysis more than that of Question 7?
Question 11: (Group Only)
Use Tukey’s HSD to conduct post-hoc testing on the one-way ANOVA model described in Question 10 (the one using the log-transformed version of “MovingTime”). Using the 95% confidence intervals provided in Minitab’s output, interpret the estimated difference for the two most different days of the week. That is, how do you estimate the average commute times compare across those two days in the population represented by these data?
Question 12: (Group Only)
You’ve now performed analyses involving “TotalTime” and “MovingTime”, for this question you are to make a recommendation on which of these variables offers more useful information for the user of this app. You may assume that their general motivation for collecting these data is to better understand their daily commute in order to optimize future decision making. You should consider both practical and statistical strengths/weaknesses of these two variables when making your recommendation.
Question 13:
For this question you are to prepare a brief executive summary summarizing what you identify as the key findings in these data. You are to determine which trends are noteworthy and include them in your summary. Your summary should include one visual and you should limit yourself to one paragraph of writing. You may choose to explore other aspects of this dataset in your determination of what trends are important.