For this assignment you should edit the “author” info in the header to include your name. Note that you should save your file as “HW1_Your_Name.Rmd” when knitting.
For each question, you should add or modify code in the code blocks provided. You should provide any written responses in the section that follows the code block.
If code for a question has been provided, but you were not asked to modify it, please don’t delete it or your .Rmd file might not knit properly.
Homework #1 is due 9/5 by 11:59pm
Question 1
Part A: Write code that stores the data at the following URL as a data frame named `admissions_data`:
https://remiller1450.github.io/data/admissions.csv
Next, print the dimensions of this data frame. In a sentence below your code chunk, briefly describe the total number of applicants and the number of characteristics recorded for each applicant in the data set.
Note: These data were queried in response to concerns about sex-based discrimination in graduate program admissions at a large US university in the 1970s. The column `dept` indicates the department applied to, `gpa` is the applicant’s undergraduate grade point average, and `admit == "Y"` indicates that an applicant was admitted.
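The sketch below illustrates `read.csv()` and `dim()` on a tiny hypothetical CSV rather than the assignment data. The `text` argument supplies the CSV inline so that the example runs without a network connection. `read.csv()` also accepts a complete URL as its first argument. The column names in this toy data are illustrative only.

```r
# Tiny hypothetical CSV supplied inline via `text`; a file path or URL
# could be passed as the first argument instead
toy_data <- read.csv(text = "sex,dept,admit
Male,A,Y
Female,B,N
Female,A,Y")

class(toy_data)  # read.csv() returns a data.frame
dim(toy_data)    # number of rows (observations), then columns (variables)
```

Because `dim()` reports rows first, the first number corresponds to the number of records, and the second to the number of variables.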
## Your code for question 1-A goes here
Your written answer to question 1-A should go here
Indicate any help you received on this question here
Part B: Use the `table()` and `prop.table()` functions to find the proportion of applicants of each sex who were admitted.
Then, without performing any statistical tests, briefly describe whether there appears to be a meaningful discrepancy in admission rates.
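As an illustration of how these two functions work together, the sketch below uses the built-in `mtcars` data instead of the admissions data. `table()` with two vectors produces a cross-tabulation of counts. `prop.table()` with `margin = 1` divides each cell by its row total, so each row sums to 1. The variable placed first becomes the rows.

```r
# Built-in mtcars data, used only for illustration
# am: 0 = automatic, 1 = manual; vs: 0 = V-shaped engine, 1 = straight engine
tab <- table(mtcars$am, mtcars$vs)
tab  # counts for each combination of transmission and engine shape

# margin = 1: proportions within each row (each transmission type)
row_props <- prop.table(tab, margin = 1)
row_props
```

Leaving out `margin` would instead give proportions of the grand total, which answers a different question.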
## Your code for question 1-B goes here
Your written answer to question 1-B should go here
Indicate any help you received on this question here
Part C: Use the `ggplot()` function (you may assume the required package, ggplot2, is already installed, but you should load it with `library()`) to create a stacked conditional bar chart displaying the proportion of applicants of each sex who were admitted within each department.
Hint: A conditional bar chart is one where each “stack” of bars sums to 1. Consider the `position` argument available in the appropriate geom. You can visit the ggplot cheatsheet for more help. Your graph should be faceted by department.
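The sketch below shows one way to build a faceted conditional bar chart. It uses the `mpg` data bundled with ggplot2 rather than the admissions data, and it assumes ggplot2 is installed. Setting `position = "fill"` in `geom_bar()` rescales each stack to a height of 1, and `facet_wrap()` draws one panel per level of the faceting variable.

```r
library(ggplot2)

# mpg ships with ggplot2; cyl, year and drv stand in for the assignment's variables
p <- ggplot(data = mpg, mapping = aes(x = factor(cyl), fill = factor(year))) +
  # position = "fill" stacks the bars and rescales each stack to sum to 1
  geom_bar(position = "fill") +
  # one panel for each drive type
  facet_wrap(~ drv) +
  labs(x = "Cylinders", y = "Proportion", fill = "Model year")
p
```

Within each panel, every bar shows the split of model years for that cylinder count. Because every bar has the same height, the proportions can be compared across groups even when the group sizes differ.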
## Your code for question 1-C goes here
Indicate any help you received on this question here
Part D: Beginning with the visualization you created in Part C, make the following modifications: use `theme_minimal()`, and use the `scale_fill_manual()` function to change the colors used to represent “admit” and “not” to “green” and “grey” (respectively).
## Your code for question 1-D goes here
Indicate any help you received on this question here
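The sketch below combines `theme_minimal()` with `scale_fill_manual()` on the built-in `mtcars` data. It assumes ggplot2 is installed. Transmission type is recoded to a factor whose level names are chosen only for this example. A named `values` vector pairs each level with a color, so the names must match the levels of the variable mapped to `fill`.

```r
library(ggplot2)

# Recode mtcars variables into factors; the labels are illustrative choices
cars <- data.frame(
  trans = factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual")),
  gear  = factor(mtcars$gear)
)

p2 <- ggplot(cars, aes(x = gear, fill = trans)) +
  geom_bar(position = "fill") +
  # named vector: each factor level gets the color listed next to it
  scale_fill_manual(values = c(automatic = "green", manual = "grey")) +
  # minimal theme: no grey background panel
  theme_minimal()
p2
```

An unnamed vector would also work. Colors would then be assigned in the order of the factor levels, which is easier to get wrong.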
Question 2
Consider the following `ggplot` code:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
Part A: Why are the data points in the scatter plot not colored blue? Briefly explain.
Part B: Modify the code so that the points are properly shown as blue.
## Put your code for 2-B here
Question 3
The Washington Post maintains a database of fatal shootings by police officers in the line of duty. Additional details on the methodology can be found here: https://github.com/washingtonpost/data-police-shootings
The URL below contains data for all individuals entered into the database between 2015 and 2019:
https://remiller1450.github.io/data/Police2019.csv
Part A: Write code that stores these data as a data frame named `police`. Then find the average age of the individuals in this data set, removing missing values as necessary.
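The short sketch below shows why missing values must be removed. It uses a hypothetical vector of ages, not the police data. In R, `mean()` returns `NA` whenever its input contains an `NA`, unless `na.rm = TRUE` is set.

```r
# Hypothetical ages, one of which is missing
ages <- c(34, NA, 27, 51)

mean(ages)                # NA, because a missing value is present
mean(ages, na.rm = TRUE)  # average of the three non-missing ages
```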
## Put your code for 3-A here
Part B: Print the names of every individual who did not have an age listed in the database (i.e., all individuals removed when calculating the mean in Part A).
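Missing values are usually located with `is.na()`, which returns a logical vector that can index another column. The sketch below uses a small hypothetical data frame. The column names `name` and `age` are assumptions for illustration and may not match the police data exactly.

```r
# Hypothetical data frame; column names are illustrative assumptions
people <- data.frame(
  name = c("A. Smith", "B. Jones", "C. Lee"),
  age  = c(40, NA, 22),
  stringsAsFactors = FALSE
)

# is.na() is TRUE where age is missing; use it to select the matching names
people$name[is.na(people$age)]
```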
## Put your code for 3-B here
Part C: Subset the `police` data set to include only individuals who were armed with a gun (you may ignore any categories involving a gun and something else). Then, determine the fraction of these individuals who had a threat level of “attack”.
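The sketch below shows the two steps, subsetting followed by a proportion, on a hypothetical data frame. The column names `armed` and `threat_level` and their values are assumptions for illustration. `subset()` keeps the rows where the condition is `TRUE`, which also drops rows where the condition is `NA`. Taking the mean of a logical vector gives the fraction of `TRUE` values.

```r
# Hypothetical rows; column names and values are illustrative only
incidents <- data.frame(
  armed        = c("gun", "gun", "knife", "gun", "gun and knife"),
  threat_level = c("attack", "other", "attack", "attack", "attack"),
  stringsAsFactors = FALSE
)

# Keep only rows whose armed value is exactly "gun"
gun_only <- subset(incidents, armed == "gun")

# Fraction of the remaining rows with threat_level "attack"
mean(gun_only$threat_level == "attack")
```

The exact match `armed == "gun"` excludes combined categories such as "gun and knife", consistent with the instruction to ignore them.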
## Put your code for 3-C here