Directions:
If it is not already installed, install the tinytex package by running install.packages('tinytex') followed by tinytex::install_tinytex().

For this question you’ll use the “NYC Airplanes” data set located at the URL:
https://remiller1450.github.io/data/planes.csv
These data record the characteristics of all airplanes that departed from one of New York City’s major commercial airports (JFK, LGA, or EWR) in the year 2013.
Load these data into R as an object named airplanes_data, then inspect them (using functions like head(), print(), dim() and/or View()).
Based upon your assessment, how many quantitative variables are recorded? (Hint: the NA you see in the variable speed is how R represents missing data. You should still count this variable as quantitative despite its high prevalence of missing data.)
Use table() to create a one-way frequency table of the variable engines (be careful not to use the variable engine). Based upon what you see, do you think it’d be appropriate to treat the variable engines as categorical despite the fact that it records numerical information? Briefly explain.
Use the mean() and median() functions to find the mean and median of the variable seats. Noting that the standard deviation of seats is 73.7 seats, would you consider the mean and median to be similar to each other (suggesting a roughly symmetric distribution) or different from each other (suggesting a skewed distribution)? Briefly explain.
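
As a rough illustration of the functions named in this question, the sketch below applies table(), mean() and median() to a small made-up data frame. The object toy_planes, the vector speed and all of their values are hypothetical and are not taken from planes.csv; only the function calls carry over.

```r
# Hypothetical toy data; NOT drawn from planes.csv
toy_planes <- data.frame(
  engines = c(2, 2, 1, 2, 4, 2),
  seats   = c(150, 180, 4, 200, 400, 178)
)

# One-way frequency table: counts how many rows take each value
engine_counts <- table(toy_planes$engines)
print(engine_counts)

# Mean and median of a quantitative variable
seat_mean   <- mean(toy_planes$seats)
seat_median <- median(toy_planes$seats)

# When a variable contains NA, mean() and median() return NA
# unless na.rm = TRUE is supplied
speed <- c(NA, 90, NA, 105)
speed_mean <- mean(speed, na.rm = TRUE)
```

Note that table() reports a count for each distinct value, which is why it is most informative for variables that take only a handful of values.
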
For this question you’ll use the “Police” data set located at the URL:
https://remiller1450.github.io/data/Police.csv
This data set was assembled by the Washington Post to document all fatal shootings by a police officer from 2015 through mid-2020.
Load these data into R as an object named police, then create a data visualization showing the distribution of the variable race. Briefly describe the distribution.
Use the table() and prop.table() functions to calculate the proportion of cases belonging to each category of the variable race.
Create a data visualization showing the relationship between race and body_camera under the hypothesis that an individual’s race is linked to how likely it is that a body camera was present during the incident. Do you believe these variables are associated? Briefly explain.
Create a two-way frequency table of race and body_camera. Use this table to find the proportion of White individuals (race = "w") for whom a body camera was present at the time of their shooting. Show how this proportion is calculated.
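
The table() and prop.table() steps above can be sketched on made-up data as follows. The data frame toy_police, its category labels "a" and "b", and the logical coding of body_camera are all assumptions for illustration; the real Police.csv columns may be coded differently.

```r
# Hypothetical toy data; NOT drawn from Police.csv
toy_police <- data.frame(
  race        = c("a", "b", "a", "a", "b", "b", "a", "b"),
  body_camera = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# One-way table of counts, then converted to proportions
race_counts <- table(toy_police$race)
race_props  <- prop.table(race_counts)

# Two-way frequency table: rows are race, columns are body_camera
two_way <- table(toy_police$race, toy_police$body_camera)
print(two_way)

# Row proportions: within each race category, the share with and without a camera
row_props <- prop.table(two_way, margin = 1)

# The same proportion computed by hand for category "a":
# (count with a camera in row "a") / (total count in row "a")
manual_a <- two_way["a", "TRUE"] / sum(two_way["a", ])
```

Using margin = 1 conditions on the row variable, so each row of row_props sums to 1; margin = 2 would condition on the column variable instead.
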
For this question you’ll use the “Infant Heart” data set located at the URL:
https://remiller1450.github.io/data/InfantHeart.csv
This data set records the results of a randomized experiment at Harvard Medical School to compare two different surgical approaches, “low-flow bypass” and “circulatory arrest”, in infants born with congenital heart defects that require surgery shortly after birth. The study’s main outcomes are the child’s Psychomotor Development Index (PDI), a composite score measuring physiological development, and their Mental Development Index (MDI), a composite score measuring mental development, two years after the surgery. For both measures a higher score reflects greater development.
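Load these data into R as an object named heart, then create a data visualization showing the distribution of the outcome variable MDI.
Describe the distribution of MDI using your graph from Part A.
Create a data visualization showing the relationship between Treatment and MDI. Do you believe these variables are associated? Briefly explain.
Create a data visualization showing the relationship between MDI and PDI. Add a smoother to your graph to help show the relationship between these variables.
MDI and PDI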
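
The kinds of graphs this question asks for can be sketched in base R as shown below. This is only one possible approach: the choice of base graphics and of lowess() as the smoother is an assumption (the course may intend a different plotting package), and toy_heart with its simulated values and group labels "A" and "B" is hypothetical, not drawn from InfantHeart.csv.

```r
# Hypothetical simulated data; NOT drawn from InfantHeart.csv
set.seed(1)
toy_heart <- data.frame(
  Treatment = rep(c("A", "B"), each = 20),
  PDI       = round(rnorm(40, mean = 95, sd = 15))
)
toy_heart$MDI <- round(0.5 * toy_heart$PDI + rnorm(40, mean = 50, sd = 10))

# Distribution of one quantitative variable
hist(toy_heart$MDI, main = "Toy MDI scores", xlab = "MDI")

# Quantitative outcome split by a categorical variable
boxplot(MDI ~ Treatment, data = toy_heart)

# Scatterplot of two quantitative variables with a lowess smoother drawn on top
plot(toy_heart$PDI, toy_heart$MDI, xlab = "PDI", ylab = "MDI")
smooth_fit <- lowess(toy_heart$PDI, toy_heart$MDI)
lines(smooth_fit, col = "blue")
```
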