Directions:
tinytext
package by running
install.packages('tinytex')
followed by
tinytex::install_tinytex()
For this question you’ll use the “NYC Airplanes” data set located at the URL:
https://remiller1450.github.io/data/planes.csv
These data record the characteristics of all airplanes that departed from one of New York City’s major commercial airports (JFK, LGA, or EWR) in the year 2013.
airplanes_data
head()
,
print()
, dim()
and/or View()
).
Based upon your assessment, how many quantitative variables are
recorded? (Hint: the NA
you see in the variable
speed
is how R
represents missing data. You
should still count this variable as quantitative despite its high
prevalence of missing data)table()
to create a
one-way frequency table of the variable engines
(be careful
not to use the variable engine
). Based upon what
you see, do you think it’d be appropriate to treat the variable
engines
as categorical despite the fact that it records
numerical information? Briefly explain.mean()
and
median()
functions to find the mean and median of the
variable seats
. Noting that standard deviation of
seats
is 73.7 seats, would you consider the mean and median
to be similar to each other (suggesting a roughly symmetric
distribution) or different from each other (suggesting a skewed
distribution)? Briefly explain.\(~\)
For this question you’ll use the “Police” data set located at the URL:
https://remiller1450.github.io/data/Police.csv
This data set was assembled by the Washington Post to document all fatal shootings by a police officer from 2015 through mid-2020.
police
race
. Briefly describe the
distribution.table()
and
prop.table()
functions to calculate the proportion of cases
belonging to each category of the variable race
race
and
body_camera
under the hypothesis that an individual’s race
is linked to how likely it is that a body camera were present during the
incident. Do you believe these variables are associated? Briefly
explain.race
and body_camera
. Use
this table to find the proportion of White individuals
(race = "w"
) for whom a body camera was present at the time
of their shooting. Show how this proportion is calculated.\(~\)
For this question you’ll use the “Infant Heart” data set located at the URL:
https://remiller1450.github.io/data/InfantHeart.csv
This data set records the results of a randomized experiment at Harvard Medical School to compare two different surgical approaches, “low-flow bypass” and “circulatory arrest”, in infants born with congenital heart defects that require surgery shortly after birth. The study’s main outcomes are the child’s Psychomotor Development Index (PDI), a composite score measuring physiological development, and their Mental Development Index (MDI), a composite score measuring mental development, 2-years after the surgery. For both measures a higher score reflects greater development.
heart
, then create a data visualization showing the
distribution of the outcome variable MDI
MDI
using your graph from Part ATreatment
and
MDI
. Do you believe these variables are associated? Briefly
explain.MDI
and PDI
. Add a smoother to your graph to help show the
relationship between these variables.MDI
and
PDI