Directions

  • You may choose to work on this lab individually or with your final project group.
    • If you work as a group, all members are responsible for the content of your lab write-up. I strongly encourage you to use voice chat software (Skype, Google Hangouts, etc.) while working together.
    • If you choose to work as an individual, you are not required to answer questions tagged (Group Only).
  • Read through the entire lab (not just the questions). The lab will introduce course content that you will be responsible for on exams/homework.
  • Answer all questions in a separate document, attaching Minitab output if needed.

Introduction

Like the previous lab, this lab will give you another opportunity to practice your decision-making skills. For many questions, I will not tell you which statistical approach to use; instead, it is your responsibility to choose a reasonable method and justify your decision.

Commute Tracker Dataset

The Commute Tracker dataset documents the daily commutes of a worker in the greater Toronto area collected using a GPS app. The variables in this dataset include:

  • Date of the trip
  • Month of the year
  • StartTime: when getting into the car
  • DayOfWeek: the day of the week
  • GoingTo: direction of travel
  • Distance: distance travelled, in kilometers
  • MaxSpeed: fastest speed recorded (all trips are on the 407 highway for some portion)
  • AvgSpeed: the average speed for the entire trip
  • AvgMovingSpeed: the average speed recorded only while the car is moving
  • TotalTime: duration of the entire trip, in minutes
  • MovingTime: duration that the car was moving (i.e. not counting traffic delays, accidents, or time while the car is stationary)
  • Take407All: is Yes if the 407 toll highway was taken for the entire trip. The app comments: “I try to avoid taking the 407, taking slower back routes to save costs. But some days I’m running late, or just lazy, and take it all the way.”
  • Comments: comments from the driver about that day’s travel
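
The speed and time variables above are closely linked. As a minimal sketch, assuming speeds are reported in km/h and that the app computes each average as distance divided by the corresponding duration (neither is stated in the dataset description), the relationship could look like this. The function name and trip values are hypothetical and are not taken from the dataset.

```python
def avg_speed_kmh(distance_km: float, minutes: float) -> float:
    """Average speed in km/h for a trip of distance_km lasting `minutes`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Multiply by 60 to convert km per minute into km per hour.
    return distance_km * 60.0 / minutes


# Hypothetical trip (illustration only, not a row from the dataset).
distance_km = 50.0
total_time_min = 40.0    # plays the role of TotalTime
moving_time_min = 37.5   # plays the role of MovingTime

# Assumed analogue of AvgSpeed: distance over the whole trip duration.
avg_speed = avg_speed_kmh(distance_km, total_time_min)

# Assumed analogue of AvgMovingSpeed: distance over moving time only.
avg_moving_speed = avg_speed_kmh(distance_km, moving_time_min)

print(avg_speed, avg_moving_speed)
```

Under these assumptions, MovingTime can never exceed TotalTime, so AvgMovingSpeed is always at least as large as AvgSpeed.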

Exploratory Analysis

Question 1:

Construct an appropriate graph to determine if there are any outliers in the variable “TotalTime”. Include your graph and a sentence or two describing whether you believe these outliers should be excluded from an analysis in your lab write-up. (Hint: use “Comments” to argue whether the outliers appear to be real data-points).

Question 2:

Conduct a full graphical analysis to discover which (if any) of the following variables are related to “TotalTime”. The variables you should consider are “DayOfWeek”, “GoingTo”, “AvgMovingSpeed”, and “Take407All”.

You do not need to paste your graphs into your lab write-up, but for each of these four comparisons you should describe the type of graph you used and the key patterns you observed in that graph (i.e., do the variables appear to be associated?).

Question 3: (Group Only)

Use an appropriate graph to assess the relationship between the variables “MovingTime” and “TotalTime”. Do you notice anything unusual about the relationship between these two variables? Include your graph and a sentence or two describing what you see in your lab write-up.

Statistical Inference

Question 4:

Use an appropriate statistical test to determine whether commutes going to work tend to be significantly faster than trips coming home. Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.

Question 5:

Use an appropriate statistical test to determine if day of week is associated with commute time (TotalTime). Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.

Question 6:

Use an appropriate statistical test to determine if day of week is associated with whether the driver chose to take the 407 toll highway. Include a screenshot of your Minitab output and a thoughtful 1-3 sentence conclusion. Be sure to check the assumptions of your test to ensure what you did was statistically valid.

Question 7:

Use the “graphs” menu under one-way ANOVA to explore the residuals of an ANOVA model using “DayOfWeek” to predict the variable “MovingTime”. Based upon what you see, do you have any concerns using ANOVA to test for an association between these two variables?

Question 8:

Regardless of your answer to Question 7, use the “comparisons” menu under one-way ANOVA to conduct Tukey’s HSD for the ANOVA model using “DayOfWeek” to predict the variable “MovingTime”. Using these results, which groups are the most different? Is that difference statistically significant?

Question 9:

The Tukey’s HSD test you conducted in Question 8 involved 10 different pairwise comparisons. Based upon Minitab’s default choice of \(\alpha = 0.05\), what is the expected Type I error rate for this entire set of comparisons?
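
To see where the count of 10 comes from, the short sketch below lists every unordered pair of groups. It assumes “DayOfWeek” takes five values, Monday through Friday; the day labels are an assumption for illustration and may not match the spelling used in the dataset.

```python
from itertools import combinations

# Assumed group labels: five working days (not confirmed by the dataset description).
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Every unordered pair of distinct days is one pairwise comparison.
pairs = list(combinations(days, 2))

for first, second in pairs:
    print(f"{first} vs {second}")

print(len(pairs))  # number of pairwise comparisons among five groups
```

In general, k groups produce k(k - 1)/2 pairwise comparisons, which is 10 when k = 5.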

Question 10: (Group Only)

Apply a log-transformation to the variable “MovingTime” and repeat the analysis described in Question 7. What impact did this transformation have upon the residuals of the one-way ANOVA model? Do you trust this analysis more than that of Question 7?
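
As a rough illustration of what a log transformation does to right-skewed values, the sketch below uses made-up moving times (not taken from the dataset) and the natural log. The `tail_ratio` helper is a hypothetical, informal skewness measure introduced only for this example; it is not a Minitab statistic.

```python
import math
from statistics import median


def tail_ratio(values):
    """Upper-tail length divided by lower-tail length, both measured from the median.

    A value near 1 suggests rough symmetry; larger values suggest right skew.
    """
    mid = median(values)
    return (max(values) - mid) / (mid - min(values))


# Hypothetical moving times in minutes, with one unusually long trip.
moving_time_min = [28.0, 30.0, 31.0, 33.0, 36.0, 60.0]

# Natural log of each value; another base would only rescale the results.
log_moving_time = [math.log(t) for t in moving_time_min]

raw_ratio = tail_ratio(moving_time_min)
log_ratio = tail_ratio(log_moving_time)

print(raw_ratio, log_ratio)
```

For these made-up values the ratio is smaller on the log scale, because the transformation compresses large values more than small ones. Whether the same holds for the residuals of your ANOVA model is something to check in Minitab.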

Question 11: (Group Only)

Use Tukey’s HSD to conduct post-hoc testing on the one-way ANOVA model described in Question 10 (the one using the log-transformed version of “MovingTime”). Using the 95% confidence intervals provided in Minitab’s output, interpret the estimated difference for the two most different days of the week. That is, how do you estimate the average commute times compare across those two days in the population represented by these data?

Question 12: (Group Only)

You’ve now performed analyses involving “TotalTime” and “MovingTime”. For this question, you are to make a recommendation on which of these variables offers more useful information for the user of this app. You may assume that their general motivation for collecting these data is to better understand their daily commute in order to optimize future decision making. You should consider both practical and statistical strengths/weaknesses of these two variables when making your recommendation.

Executive Summary

Question 13:

For this question, you are to prepare a brief executive summary of what you identify as the key findings in these data. You are to determine which trends are noteworthy and include them in your summary. Your summary should include one visual, and you should limit yourself to one paragraph of writing. You may choose to explore other aspects of this dataset in your determination of what trends are important.

Submission Directions

  • Email your completed write-up to Professor Miller with a subject heading that includes the text “Sta-209-Lab9”. Please include this exact character string, including the dashes. You will lose 1 point off the top of your score if you don’t do so.
  • If you’d like to provide feedback on your group, fill out the optional review form at this link: https://forms.gle/wNWRFMbbra8oK4LJ8