Directions

For this assignment you should edit the “author” info in the header to include your name. Note that you should save your file as “HW1_Your_Name.Rmd” when knitting.

For each question, you should add or modify code in the blocks provided. You should provide any written responses in the section that follows the code block.

If code for a question has been provided, but you were not asked to modify it, please don’t delete it or your .Rmd file might not knit properly.

Homework #1 is due 9/5 by 11:59pm


Question #1 - Part A

Write code that stores the data at the following URL as a data.frame named admissions_data:

https://remiller1450.github.io/data/admissions.csv

Next, print the dimensions of this data frame and write a sentence below your code chunk that briefly describes the total number of applicants and characteristics contained in the data set.

Note: These data were queried in response to sex-based discrimination in graduate program admissions at a large US university in the 1970s. The column dept indicates the department applied to, gpa is the applicant’s undergraduate grade point average, and admit == "Y" indicates that an applicant was admitted.
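
As a rough illustration of reading CSV data into a data frame and checking its size, the sketch below uses a small made-up CSV string (the names csv_text and toy are hypothetical, and the values are not from the admissions data). read.csv() also accepts a URL string as its first argument; the text argument is used here only so the example runs offline.

```r
# Toy CSV text standing in for a remote file (made-up values)
csv_text <- "id,group,score
1,a,3.5
2,b,2.9
3,a,3.8"

# Parse the CSV text into a data.frame
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(toy)    # number of rows, then number of columns
names(toy)  # column names
```

The first element of dim() counts observations and the second counts variables.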

## Your code for question 1-A goes here

Your written answer to question 1-A should go here

Indicate any help you received on this question here

Question #1 - Part B

Use the table() and prop.table() functions to find the proportions of applicants of each sex that were admitted.

Then, without performing any statistical tests, briefly describe whether you think there appears to be a meaningful discrepancy in admissions rates.
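
For reference, here is a minimal sketch of how table() and prop.table() fit together, using the built-in mtcars data rather than the admissions data. The margin argument controls which totals are used as the denominator, so it is worth checking that the proportions sum to 1 in the direction you intend.

```r
# Built-in mtcars data: am is transmission type (0/1), vs is engine shape (0/1)
counts <- table(mtcars$am, mtcars$vs, dnn = c("am", "vs"))
counts

# margin = 1 divides each cell by its row total, so every row sums to 1
row_props <- prop.table(counts, margin = 1)
row_props

# Quick check of which direction the proportions were computed in
rowSums(row_props)
```

Using margin = 2 instead would divide by column totals, which answers a different question.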

## Your code for question 1-B goes here

Your written answer to question 1-B should go here

Indicate any help you received on this question here

Question #1 - Part C

Use the ggplot() function to create a stacked conditional bar chart displaying the proportion of applicants of each sex that were admitted within each department. You may assume the required package is already installed, but you should load the library.

Hint: A conditional bar chart is one where each “stack” of bars sums to 1. You should consider the position argument available in the proper geom. You can visit the ggplot cheatsheet for more help. Your graph should be faceted by department.
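
To make the hint concrete, the sketch below builds a conditional bar chart on the mpg data that ships with ggplot2 (not the admissions data). The choice of drv, class, and year is purely illustrative.

```r
library(ggplot2)

# drv and class are categorical columns of the built-in mpg data
p <- ggplot(data = mpg, mapping = aes(x = drv, fill = class)) +
  geom_bar(position = "fill") +  # rescale every bar so its segments sum to 1
  facet_wrap(~ year)             # one panel per model year
p
```

With position = "fill" the y-axis shows proportions rather than counts, so bars can be compared even when group sizes differ.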

## Your code for question 1-C goes here

Indicate any help you received on this question here

Question #1 - Part D

Beginning with the visualization you created in Part C, make the following modifications:

  1. Change the theme to theme_minimal
  2. Change the labels given to at least one of the variables used in the graph
  3. Use the scale_fill_manual() function to change the colors used to represent “admit” and “not” to “green” and “grey” (respectively)
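
The three kinds of modification listed above can be sketched on the built-in mtcars data as follows. The label text and colors here are arbitrary choices for illustration, not the ones requested in this question.

```r
library(ggplot2)

# factor() makes the numeric cyl and am columns discrete
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill") +
  theme_minimal() +
  labs(x = "Cylinders", y = "Proportion", fill = "Transmission") +
  # a named vector ties each color to a specific level of the fill variable
  scale_fill_manual(values = c("0" = "steelblue", "1" = "orange"))
p
```

Naming the elements of values avoids relying on the order of the factor levels.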

## Your code for question 1-D goes here

Indicate any help you received on this question here


Question #2

Consider the following ggplot code:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

Part A: Why are the data points in the scatter plot not colored blue? Briefly explain.

Part B: Modify the code so that the points are properly shown as blue.

## Put your code for 2-B here


Question #3 - Part A

The Washington Post maintains a database of fatal shootings by police officers in the line of duty. Additional details on the methodology can be found here: https://github.com/washingtonpost/data-police-shootings

The URL below contains data for all individuals entered into the database between 2015 and 2019:

https://remiller1450.github.io/data/Police2019.csv

Write code that stores these data as a data frame named police. Then find the average age of the individuals in this data set, removing missing values as necessary.
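
As a reminder of how missing values affect summary functions, here is a short sketch on the built-in airquality data, whose Ozone column contains missing values. It is shown on a different data set on purpose.

```r
# Missing values propagate, so the plain mean is NA
mean(airquality$Ozone)

# na.rm = TRUE drops missing values before averaging
mean(airquality$Ozone, na.rm = TRUE)

# Number of values that were dropped
sum(is.na(airquality$Ozone))
```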

## Put your code for 3-A here

Question #3 - Part B

Print the names of every individual who did not have an age listed in the database (i.e., all individuals removed when calculating the mean in Part A).
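
One common way to pull out the rows with a missing value is logical indexing with is.na(). The sketch below again uses the built-in airquality data, and the name missing_ozone is hypothetical.

```r
# TRUE for rows where Ozone is missing, FALSE otherwise
missing_ozone <- is.na(airquality$Ozone)

# Use the logical vector to select rows, and a column name to select one column
airquality[missing_ozone, "Day"]
```

The number of values printed should match the number of missing entries in the column.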

## Put your code for 3-B here

Question #3 - Part C

Subset the police data set to include only individuals who were armed with a gun (you may ignore any categories involving a gun and something else). Then, determine the fraction of these individuals that had a threat level of “attack”.
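
A fraction of rows meeting a condition can be computed as the mean of a logical vector. The sketch below shows the idea on the built-in iris data. The cutoff of 5 is arbitrary and the name setosa is just a label for the subset.

```r
# Keep only rows whose Species is exactly "setosa" (== matches the whole value)
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)

# TRUE counts as 1 and FALSE as 0, so the mean is the fraction that is TRUE
frac_long <- mean(setosa$Sepal.Length > 5)
frac_long
```

If the column being tested contains missing values, mean() will also need na.rm = TRUE.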

## Put your code for 3-C here