From the Introduction to Modern Statistics (IMS) textbook, complete
the following exercises:
- Ch 4.8: #4, #5, #6
- Ch 5.10: #1, #2
Additionally, please complete the following R exercise:
Question #1: For this question you will use the
following data, which were obtained from the US Department of Education’s
College Scorecard and include information from 2019 on all
primarily undergraduate colleges with at least 400 enrolled
students.
The data are available at this URL: https://remiller1450.github.io/data/Colleges2019_Complete.csv
- Part A: Write code that loads these data into R as a data.frame
object named “colleges”.
- Part B: Find the number of cases and variables in
the data set. Briefly describe what constitutes a case given the format
of the data.
- Part C: Use the ggplot2 package to create a scatter plot showing the
relationship between the variables Cost (a school’s total annual cost of
attendance without considering financial aid) and Salary10yr_median (the
median salary of a school’s graduates 10 years after receiving their
degree).
- Part D: In 1 or 2 sentences, briefly describe the
relationship you see in the scatter plot you created in Part C. Write
your description as an R comment or in R Markdown outside
of a code chunk.
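As a rough illustration of the functions these parts rely on, here is a minimal sketch on a small made-up data set (not the colleges data). The object name toy, its columns, and its values are all hypothetical and exist only for this example. The same pattern should carry over once you pass read.csv() the URL above and substitute the variable names from Part C. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# read.csv() accepts a file path or a URL and returns a data.frame.
# Here a small in-memory CSV (hypothetical values) stands in for a real file.
csv_text <- "name,height_cm,weight_kg
A,150,52
B,160,58
C,170,69
D,180,80"
toy <- read.csv(text = csv_text)

# dim() returns the number of rows (cases) and columns (variables)
dim(toy)

# A scatter plot maps one variable to x and another to y, then adds points
p <- ggplot(toy, aes(x = height_cm, y = weight_kg)) +
  geom_point() +
  labs(x = "Height (cm)", y = "Weight (kg)")
print(p)

# A written description can be recorded as a comment like this one.
```

In an R script, lines beginning with # are comments, which is one way to record a written answer next to the code that produced the plot.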
Submission instructions
- If you’re comfortable using R Markdown, I encourage you to use it to
record your answers to all of the homework questions.
- If you knit to HTML output, you’ll need to send it to a zipped
folder (right click, then “Send to” -> “Compressed Folder”) before
uploading it to P-web.
- If you prefer to answer the textbook questions on paper (or using a
more familiar software program like Word) you may submit two files, a
picture/scan/copy of your answers to the textbook questions and a
separate R script (or R Markdown file) with the additional
exercise.