Directions:
The Armed Forces Qualification Test (AFQT) is a multiple-choice test designed to measure an individual’s reasoning and verbal skills in order to assess their suitability for various military service roles. Throughout the 1970s, the test was administered very broadly, which led to its percentile scores becoming a commonly used measure of general intelligence.
The data provided below are a random sample of \(n=2584\) Americans who took the AFQT in 1979. They were later interviewed in 2006 about their educational attainment and annual income.
afqt <- read.csv("https://remiller1450.github.io/data/AFQT.csv")
Create a graph showing the distribution of Income2005. Briefly describe the shape of this distribution.

Create a scatterplot that displays AFQT (x-axis), Income2005 (y-axis), and number of years of education completed (the color aesthetic). Based upon this graph, which combinations of variables appear related? You should clearly indicate whether you see an association for each of the three pairs (AFQT and Income2005, AFQT and Educ, Educ and Income2005) and briefly describe the direction of that association (i.e., higher AFQT is associated with higher income, etc.).

Fit a simple linear regression model that uses AFQT to predict Income2005. Report and interpret the marginal effect (slope coefficient) from the fitted model.

Fit a multiple linear regression model that uses AFQT and Educ to predict Income2005. Report and interpret the adjusted effect of AFQT after adjusting for differences in educational attainment.

The marginal effect of a 1-percentile increase in AFQT is roughly twice as large as the adjusted effect of a 1-percentile increase in AFQT. In 1-2 sentences, explain why this is. That is, what is it about the relationships in these data that would lead to this type of difference?

Use an \(F\)-test to compare the nested models Income2005 ~ AFQT + Educ and Income2005 ~ AFQT. Report the \(p\)-value of this test and provide a brief conclusion.

Use an \(F\)-test to compare the nested models log2(Income2005) ~ AFQT + Educ and log2(Income2005) ~ AFQT. Why might it be prudent to apply a log-transformation to these data when seeking to statistically evaluate the role of the predictor Educ? Hint: Think about the diagnostic plots you created in Part G.

Create diagnostic plots for the model log2(Income2005) ~ AFQT + Educ. Do the assumptions that underlie the \(F\)-test seem more reasonable or less reasonable after applying a log-transformation to the outcome variable? Briefly explain. Hint: Pay attention to the axis scales in the diagnostic plots when judging departures from the assumed conditions.

Using the model log2(Income2005) ~ AFQT + Educ, interpret the adjusted effect of a 1-percentile increase in AFQT on income. Note that you must properly apply an inverse transformation to facilitate an appropriate interpretation.
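The regression fits and \(F\)-tests above can be sketched in R with lm() and anova(). This is a minimal sketch, assuming a hypothetical simulated data frame named sim whose column names mimic afqt but whose values and coefficients are invented, so its output says nothing about the real AFQT sample; substitute afqt for sim to work with the actual data.

```r
# Hypothetical simulated data standing in for `afqt`; the coefficients below
# are made up for illustration and are NOT estimates from the real data.
set.seed(1979)
n <- 2000
AFQT <- runif(n, min = 1, max = 99)
# Education is generated to depend on AFQT, so the two predictors are correlated
Educ <- round(pmin(pmax(8 + 0.08 * AFQT + rnorm(n, sd = 1.5), 6), 20))
# Income is generated on the log2 scale, so it is right-skewed on the raw scale
Income2005 <- 2^(13 + 0.008 * AFQT + 0.15 * Educ + rnorm(n, sd = 0.8))
sim <- data.frame(AFQT, Educ, Income2005)

# Marginal effect of AFQT (simple linear regression)
fit_marg <- lm(Income2005 ~ AFQT, data = sim)
# Adjusted effect of AFQT (multiple linear regression)
fit_adj <- lm(Income2005 ~ AFQT + Educ, data = sim)
coef(fit_marg)["AFQT"]
coef(fit_adj)["AFQT"]

# Nested-model F-test: does adding Educ significantly reduce the residual sum of squares?
anova(fit_marg, fit_adj)

# Same comparison with a log2-transformed outcome
fit_log_small <- lm(log2(Income2005) ~ AFQT, data = sim)
fit_log_big <- lm(log2(Income2005) ~ AFQT + Educ, data = sim)
f_test_log <- anova(fit_log_small, fit_log_big)
f_test_log

# Back-transform: under this model, each 1-percentile increase in AFQT
# multiplies the median income by 2^slope, holding Educ fixed
mult_change <- 2^coef(fit_log_big)[["AFQT"]]
mult_change
```

The back-transformation follows from the model form: if \(\log_2(Y) = \beta_0 + \beta_1\,\text{AFQT} + \beta_2\,\text{Educ} + \varepsilon\), a one-unit increase in AFQT adds \(\beta_1\) to \(\log_2(Y)\), which multiplies \(Y\) by \(2^{\beta_1}\). Calling plot(fit_adj) or plot(fit_log_big) draws R's standard residual diagnostic plots for an lm fit, which is one way to compare the raw and log-scale models.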
The “email spam” data set contains roughly 4000 emails received by the Gmail account of the statistician David Diez in the early months of 2012. Additional details on the data can be found here. This question focuses on the variables defined below:
spam - the outcome variable, a binary indicator of whether the user considered an email to be spam.
dollar - the number of times a \(\$\) symbol or the word “dollar” appeared in the email.
exclaim_mess - the number of exclamation points, or ! symbols, that appeared in the email.

emails <- read.csv("https://remiller1450.github.io/data/email_spam.csv")
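The exponentiation requested in the questions below follows from the form of the logistic regression model. Writing \(p\) for the probability that an email is spam and \(\beta_0, \beta_1\) for the intercept and slope (notation introduced here), a model with dollar as its only predictor gives:

\[
\begin{aligned}
\log\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1 \cdot \text{dollar} \\
\frac{p}{1-p} &= e^{\beta_0} \left(e^{\beta_1}\right)^{\text{dollar}} \\
\frac{\text{odds}(\text{dollar}+1)}{\text{odds}(\text{dollar})} &= e^{\beta_1}
\end{aligned}
\]

So \(e^{\beta_0}\) is the odds of spam for an email with dollar = 0, and \(e^{\beta_1}\) is the factor by which those odds are multiplied for each additional \(\$\) symbol or “dollar”. When exclaim_mess is added to the model, the exponentiated dollar coefficient is read the same way while holding exclaim_mess fixed.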
Considering the outcome variable spam, briefly explain why logistic regression is a more appropriate model for these data than linear regression.

Fit a logistic regression model that uses dollar to predict spam. Interpret the estimated intercept of this model. Be sure you exponentiate to facilitate a meaningful interpretation.

Interpret the marginal effect of dollar on the likelihood of an email being spam using the model you fit in Part B. Be sure you exponentiate to facilitate a meaningful interpretation.

Fit a logistic regression model that uses dollar and exclaim_mess to predict the likelihood of an email being spam. Interpret the adjusted effect of dollar in this model. Why might the adjusted effect differ from the marginal effect (which you found in Part C)? Briefly explain.
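One way to carry out these logistic regression fits in R is sketched below with glm() and family = binomial. It is a minimal sketch, assuming a hypothetical simulated data frame named sim_emails whose counts and coefficients are invented (they are not the Diez spam data); it also checks that the exponentiated coefficients match odds computed from predicted probabilities. Substitute emails for sim_emails to work with the real data.

```r
# Hypothetical simulated data standing in for `emails`; counts and
# coefficients are invented for illustration only.
set.seed(2012)
n <- 3000
dollar <- rpois(n, lambda = 1.5)
# exclaim_mess is generated to increase with dollar, so the predictors are correlated
exclaim_mess <- rpois(n, lambda = 2 + dollar)
spam <- rbinom(n, size = 1, prob = plogis(-2.5 + 0.3 * dollar + 0.1 * exclaim_mess))
sim_emails <- data.frame(spam, dollar, exclaim_mess)

# Logistic regression: family = binomial uses the logit link by default
fit_marg <- glm(spam ~ dollar, family = binomial, data = sim_emails)
fit_adj <- glm(spam ~ dollar + exclaim_mess, family = binomial, data = sim_emails)

# Exponentiated coefficients: the intercept becomes the odds of spam when
# dollar = 0, and each slope becomes an odds ratio for a one-unit increase
exp(coef(fit_marg))
exp(coef(fit_adj))

# Check the interpretation directly from predicted probabilities
odds <- function(p) p / (1 - p)
p0 <- predict(fit_marg, newdata = data.frame(dollar = 0), type = "response")
p1 <- predict(fit_marg, newdata = data.frame(dollar = 1), type = "response")
odds(p0)             # equals exp(intercept) of fit_marg
odds(p1) / odds(p0)  # equals exp(slope for dollar) of fit_marg
```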