Directions:
This lab covers univariate and bivariate graphs and summaries. Its goal is to give you practice with these topics on two real datasets before the Midterm Project. Please do not use a “divide and conquer” strategy; any group that uses such a strategy will be penalized proportionately.
The College Scorecard is a government-run database that stores institution-level data on all accredited colleges and universities in the United States. A new version of these data is published each year, and it contains over 400 variables. We’ll use a simplified version of these data for the 2019-2020 academic year.
The College 2019 Dataset is a reduced version of the 2019-2020 College Scorecard data that contains fewer variables and is filtered to include only primarily undergraduate institutions with at least 400 enrolled students.
A brief description of each variable is given below:
Note: This dataset has been filtered further to exclude colleges with missing data for one or more of these variables (missing data is not compatible with StatKey).
Question #1: Summarize the distribution of median salaries 10 years after graduation for the colleges in this dataset. In doing so, you should comment upon the shape, central tendency, and spread of the relevant variable.
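If you want to cross-check the summary statistics StatKey reports, here is a minimal Python sketch using only the standard library. The function name `summarize` is hypothetical, and the input is assumed to be a plain list of values (for example, the median salaries). Quartile conventions vary between tools, so Q1 and Q3 may differ slightly from StatKey's.

```python
import statistics


def summarize(values):
    """Return common shape/center/spread statistics for one quantitative variable.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other tools (including StatKey) may use a different convention.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
    }
```

Comparing the mean with the median gives a rough shape check: a mean well above the median is consistent with right skew, which a histogram can confirm.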
Question #2: Is there an association between whether a college is private or public and the median salary of its students 10 years after graduation? Answer this question using side-by-side graphs and summary statistics. In doing so, write 1-2 sentences describing the association (or lack thereof) that is present in these data.
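A grouped version of the same idea puts the private and public summaries side by side. This sketch assumes the data are available as (control, salary) pairs; the labels "Private" and "Public" and the function name `summarize_by_group` are hypothetical.

```python
import statistics
from collections import defaultdict


def summarize_by_group(pairs):
    """Group (label, value) pairs by label and summarize each group's values."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {
        label: {
            "n": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            # The sample SD needs at least two values; report NaN otherwise.
            "sd": statistics.stdev(vals) if len(vals) > 1 else float("nan"),
        }
        for label, vals in groups.items()
    }
```

The difference between the two group means (or medians) is one number you could quote when describing the strength of the association.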
Question #3: Summarize the distribution of median ACT scores of the colleges in this dataset. In doing so, you should comment upon the shape, central tendency, and spread of the relevant variable(s).
Question #4: In 1-2 sentences, describe the relationship between the median ACT score of a college and the median salary of its graduates 10 years after graduation. Include any relevant StatKey output in your lab write-up.
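The correlation coefficient and least-squares line that StatKey displays with a scatterplot follow from the standard formulas \(r = S_{xy}/\sqrt{S_{xx}S_{yy}}\) and slope \(= S_{xy}/S_{xx}\), where \(S_{xx}\), \(S_{yy}\), and \(S_{xy}\) are sums of squared deviations and of cross-products of deviations from the means. The sketch below assumes paired lists of median ACT scores (x) and median salaries (y); `fit_line` is a hypothetical name.

```python
import math


def fit_line(xs, ys):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```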
Question #5: Using a linear regression equation, predict the difference in the median 10-year salary of a college with a median ACT of 30 compared to a college with a median ACT of 20.
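Because the intercept cancels, the predicted difference depends only on the slope. Writing the fitted line with \(b_0\) and \(b_1\) (our notation) for the intercept and slope that StatKey reports:

\[
\widehat{\text{Salary}} = b_0 + b_1 \cdot \text{ACT}
\]

\[
\widehat{\text{Salary}}(30) - \widehat{\text{Salary}}(20) = (b_0 + 30\,b_1) - (b_0 + 20\,b_1) = 10\,b_1
\]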
Question #6: Identify any two variables that you suspect might have an interesting connection. Then, use StatKey to explore their relationship. In your lab write-up, provide a 2-3 sentence summary of your investigation, along with any relevant graphs or descriptive statistics.
The Washington Post maintains a comprehensive database, dating back to 2015, of instances in which police officers have used deadly force on a suspect.
These data contain the following variables:
Question #7: According to the US Census, the current racial composition of the US is 61.5% Non-Hispanic White, 17.6% Hispanic (of any race), 12.3% Black, 5.3% Asian, 0.7% Native American, and 2.6% other races. Does the racial distribution of individuals killed by the police appear to mirror that of the Census? Or do certain racial groups appear to be overrepresented? Include the appropriate descriptive statistics and/or graphs created by StatKey to support your answers.
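One way to quantify over- or underrepresentation is to divide each group's share of the deaths by its Census share, so a ratio above 1 indicates overrepresentation. The sketch below hard-codes the Census percentages quoted above. The group labels and the function name `representation_ratios` are assumptions: the labels must be matched to however race is coded in the dataset, and records with no race recorded, if any, would need to be dropped or counted separately beforehand.

```python
# Census shares quoted in Question #7 (labels are hypothetical).
CENSUS_SHARE = {
    "White": 0.615,
    "Hispanic": 0.176,
    "Black": 0.123,
    "Asian": 0.053,
    "Native American": 0.007,
    "Other": 0.026,
}


def representation_ratios(counts, reference=CENSUS_SHARE):
    """Compare each group's share of `counts` with its reference (Census) share.

    A ratio above 1 means the group is more common among those killed than in
    the population; a ratio below 1 means it is less common.
    """
    total = sum(counts.values())
    result = {}
    for group, ref_share in reference.items():
        observed = counts.get(group, 0) / total
        result[group] = {"observed": observed, "census": ref_share, "ratio": observed / ref_share}
    return result
```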
Question #8: Use these data to assess whether previous signs of mental illness are associated with a decreased likelihood of the individual attempting to flee the scene during a deadly confrontation with police. Support your answer using an appropriate set of descriptive statistics.
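Conditional proportions address this kind of question: compare the proportion who attempted to flee among individuals with signs of mental illness to the same proportion among those without. This sketch assumes each record has already been reduced to a (signs_of_mental_illness, fled) pair of booleans; how the dataset's flee categories map to "fled" is a choice you would need to make and justify, and `conditional_proportions` is a hypothetical name.

```python
from collections import Counter


def conditional_proportions(pairs):
    """Given (group, outcome) pairs, return the proportion of True outcomes within each group."""
    totals = Counter()
    hits = Counter()
    for group, outcome in pairs:
        totals[group] += 1
        if outcome:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}
```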
Question #9: For the span of these data, which state had the most police-involved deaths? Can you think of a possible explanation (aside from issues related to policing) for why this particular state had the most cases in this dataset?
Question #10: Does the proportion of police-involved deaths with a body-camera present appear to be increasing, decreasing, or remaining approximately constant over time? Justify your answer using an appropriate set of descriptive statistics.
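Tracking the proportion, rather than the raw count, of cases with a body camera present each year adjusts for the number of cases varying from year to year. This sketch assumes dates in ISO `YYYY-MM-DD` format and a boolean body-camera flag per record; both the input format and the function name `body_camera_share_by_year` are assumptions.

```python
from collections import defaultdict
from datetime import date


def body_camera_share_by_year(records):
    """Return a list of (year, proportion with a body camera) sorted by year.

    `records` is an iterable of (iso_date_string, body_camera_bool) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [cases with camera, total cases]
    for iso_date, camera in records:
        year = date.fromisoformat(iso_date).year
        counts[year][1] += 1
        if camera:
            counts[year][0] += 1
    return [(year, with_cam / total) for year, (with_cam, total) in sorted(counts.items())]
```

If the most recent year is only partially covered, its proportion rests on fewer cases and deserves less weight when judging a trend.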
Question #11: Identify any two variables that you suspect might have an interesting connection. Then, use StatKey to explore their relationship. In your lab write-up, provide a 2-3 sentence summary of your investigation, along with any relevant graphs or descriptive statistics.