Directions:
For this question you’ll use the “2015 Boston Marathon” data set located at the URL:
https://remiller1450.github.io/data/BostonMarathon2015.csv
These data record the demographics, residence, finish time and place, and splits of everyone who finished the race.
Load these data into R as marathon_data and inspect them (using head(), print(), dim() and/or View()).
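A minimal sketch of the load-and-inspect step. It reads a small made-up CSV from text so it runs offline; for the assignment, the URL above would be passed to read.csv() in place of the text argument. The object name toy_data, the Name column, and all values are hypothetical; only the column names Age and Official.Time come from the directions.

```r
# Toy CSV text standing in for the real file (hypothetical values, not the actual data)
csv_text <- "Name,Age,Official.Time
Runner A,25,2:09:17
Runner B,31,2:10:22
Runner C,42,3:45:10
Runner D,38,4:01:55"

# read.csv() accepts a file path or URL; `text =` is used here so the sketch runs offline
toy_data <- read.csv(text = csv_text)

head(toy_data, 2)   # first two rows only
dim(toy_data)       # number of rows, then number of columns: 4 3
```

Each row is one case and each column is one variable, so dim() gives both counts at once.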
Afterwards, use an R comment (or plain text if using R Markdown) to
provide a brief written statement that includes: what a case represents
in this data set, how many cases are present, and how many variables
(columns) are present.

Use an R function to find the type of the variable Official.Time. Based upon this type, is it possible to use the mean() function to find the average finish time without coercion? Briefly explain.

Use an R function to find the type of the variable Age. Based upon this type, is it possible to use the mean() function to find the average age of cases without
coercion? Briefly explain.

For this question you’ll use the “NYC Airplanes” data set located at the URL:
https://remiller1450.github.io/data/planes.csv
These data record the characteristics of all airplanes that departed from one of New York City’s major commercial airports (JFK, LGA, or EWR) in the year 2013.
Load these data into R as airplanes_data and inspect them (using head(), print(), dim() and/or View()).
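Both data sets raise the same practical point: whether a column can be averaged depends on how R stores it, and NA values affect the result. The sketch below uses a made-up data frame to show this. The object toy, the column names finish and age, and every value are hypothetical, and the sketch assumes that clock times are stored as text, which may or may not match the real files.

```r
# Hypothetical toy data frame; values are made up for illustration
toy <- data.frame(
  finish = c("2:09:17", "2:10:22", "3:45:10"),  # clock times stored as text
  age    = c(25L, 31L, 42L),                    # whole numbers
  speed  = c(NA, 105, 112),                     # numeric with one missing value
  stringsAsFactors = FALSE
)

sapply(toy, class)   # storage class of every column

# mean() needs numeric (or logical) input; on character data it warns and returns NA
suppressWarnings(mean(toy$finish))

mean(toy$age)                   # works directly on integer data
mean(toy$speed)                 # NA, because one value is missing
mean(toy$speed, na.rm = TRUE)   # drops the NA before averaging: 108.5
```

A column that is mostly NA can still be numeric, so the class reported by R, not the amount of missing data, is what to check.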
Based upon your assessment, how many quantitative variables are
recorded? (Hint: the NA you see in the variable speed is how R represents missing data. You should still count this variable as quantitative despite its high prevalence of missing data.)

Use table()
to create a one-way frequency table of the variable engines (be careful not to use the variable engine). Based upon what you see, do you think it’d be appropriate to treat the variable engines as categorical despite the fact that it records numerical information? Briefly explain.

Use the mean()
and median() functions to find the mean and median of the variable seats. Noting that the standard deviation of seats is 73.7 seats, would you consider the mean and median to be similar to each other (suggesting a roughly symmetric distribution) or different from each other (suggesting a skewed distribution)? Briefly explain.
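The last two items can be sketched on made-up vectors. The values below are hypothetical and are not taken from the planes data. Dividing the gap between mean and median by the standard deviation is one informal yardstick, offered here as an assumption and not as a rule from the directions.

```r
# Hypothetical values for illustration only
engines <- c(2, 2, 1, 2, 4, 2, 1, 2)
seats   <- c(2, 4, 55, 140, 149, 178, 182, 379)

# One-way frequency table: how often each distinct value occurs
table(engines)

m   <- mean(seats)     # 136.125
med <- median(seats)   # 144.5
s   <- sd(seats)

# Informal yardstick: the mean-median gap as a fraction of the standard deviation
(m - med) / s
```

When a numeric variable takes only a handful of distinct values, table() makes that visible. A gap between mean and median that is small relative to the standard deviation points toward rough symmetry.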