From the Introduction to Modern Statistics (IMS) textbook, complete
the following exercises:
- Ch 16.4 Exercise: #28
- Ch 17.5 Exercises: #7, #18, #22
- Ch 19.4 Exercises: #14, #17
For any question involving a hypothesis test you must:
- Clearly state your null and alternative hypotheses using
both words and statistical symbols
- Clearly state the test you’re using and show the calculation of an
appropriate test statistic for that test
- Find a two-sided \(p\)-value and
provide a one-sentence interpretation in the context of the
application
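As a reminder of the general shape of these calculations, here is a sketch of a standardized test statistic and its two-sided \(p\)-value. It assumes a test whose null distribution is symmetric about zero (for example, a standard normal or \(t\) distribution). The point estimate, null value, and standard error depend on the exercise, and the symbols \(T\) and \(t_{\text{obs}}\) are notation chosen here for illustration only.

\[
T = \frac{\text{point estimate} - \text{null value}}{SE},
\qquad
p\text{-value} = 2 \cdot P\left(T \ge |t_{\text{obs}}|\right)
\]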
Additionally, complete the following exercise:
Question #1: For this question you’ll use the
“Professor Salaries” data set provided below. For context, this data set
contains the 9-month academic year salaries of faculty members at a
major US public university. In addition to salaries from the 2008-09
academic year, it also contains de-identified documentation of the sex,
rank, discipline, and experience of each faculty member. These data were
obtained from the “carData” R package.
https://remiller1450.github.io/data/Salaries.csv
For this question you should perform all hypothesis tests using R. However, you should still clearly state your null and
alternative hypotheses and provide the \(p\)-value of your test along with a
context-based conclusion.
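One way to read the data into R is sketched below. The helper name load_salaries() is hypothetical and not part of the assignment. Setting stringsAsFactors = TRUE is an assumption made here so that text columns such as “sex” are stored as factors.

```r
# Hypothetical helper (not part of the assignment): read the Salaries CSV
# from the URL given above, or from any local copy of the file.
load_salaries <- function(path = "https://remiller1450.github.io/data/Salaries.csv") {
  # Assumption: storing text columns as factors is convenient for
  # grouping variables such as "sex".
  read.csv(path, stringsAsFactors = TRUE)
}
```

Calling salaries <- load_salaries() and then str(salaries) shows the variables and their types before any plotting or testing.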
- Part A: Use ggplot() to create an
appropriate data visualization showing the relationship between the
explanatory variable “sex” and the response variable “salary”. Using
your graph, briefly explain whether these variables appear to be
associated.
- Part B: Perform an appropriate hypothesis test to
evaluate whether there is a difference in the mean salaries of male and
female faculty at this university.
- Part C: Now consider the third variable
“yrs.since.phd” and create an appropriate data visualization showing the
relationship between this variable and “salary”. Do these variables
appear to be associated?
- Part D: Perform an appropriate hypothesis test
to evaluate whether there is a statistically significant relationship
between “yrs.since.phd” and “salary” (at the \(\alpha = 0.05\) significance
level).
- Part E: Create an appropriate data visualization
showing the relationship between “yrs.since.phd” and “sex”. Do these
variables appear associated?
- Part F: Now perform an appropriate hypothesis test
to evaluate whether there is a statistically significant relationship
between “yrs.since.phd” and “sex” (at the \(\alpha = 0.05\) significance level).
- Part G: Considering your responses to Parts C-F,
does the variable “yrs.since.phd” confound the relationship between
“sex” and “salary”? How does this impact the conclusions you draw from
the hypothesis test you performed in Part B? Briefly explain.
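The general pattern for the plotting and testing steps above can be sketched in R as follows. This is a sketch of one possible set of choices, not the required ones: the function names are hypothetical, the boxplot, Welch two-sample t-test, and Pearson correlation test are assumptions about what might be appropriate, and the argument is assumed to be a data frame with the columns “sex”, “salary”, and “yrs.since.phd” named in the question.

```r
# Hypothetical helpers; the plot type and tests are illustrative choices.
# The ggplot2 package is assumed to be installed for the plotting helper.

# A categorical explanatory variable against a numeric response:
# side-by-side boxplots are one common display.
plot_salary_by_sex <- function(salaries) {
  ggplot2::ggplot(salaries, ggplot2::aes(x = sex, y = salary)) +
    ggplot2::geom_boxplot()
}

# Two-sided Welch two-sample t-test (R's default does not assume
# equal variances) comparing mean salary between the two sex groups.
test_salary_by_sex <- function(salaries) {
  t.test(salary ~ sex, data = salaries)
}

# Two-sided test of zero Pearson correlation between two numeric variables.
test_salary_by_experience <- function(salaries) {
  cor.test(salaries$yrs.since.phd, salaries$salary)
}
```

Each test helper returns an object whose statistic and p.value components hold the test statistic and the two-sided \(p\)-value. The hypotheses and the context-based conclusion still need to be written out by hand.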