Directions:
Install the tinytex package by running
install.packages('tinytex')
followed by
tinytex::install_tinytex()
For this question you’ll use the “118th Congress” data set located at the URL:
https://remiller1450.github.io/data/congress_2024.csv
As a reminder, this data set documents the age and political party of all members of the 118th US Congress.
Before beginning, run the code provided below. It reads these data and uses the ifelse() function to create a new binary categorical variable, “Baby_Boomer”, that indicates whether a member of Congress (here, anyone older than 60 and younger than 79) belongs to the baby boomer demographic cohort.
## Read data
congress = read.csv("https://remiller1450.github.io/data/congress_2024.csv")
## Create the "Baby_Boomer" variable
congress$Baby_Boomer = ifelse(congress$Age > 60 & congress$Age < 79, "Boomer", "Not Boomer")
Party = 'R') belonging to the baby boomer demographic cohort.
Party = 'D') belonging to the baby boomer cohort.
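For the two parts above on Republicans and Democrats, one way to get these conditional proportions is a two-way table whose rows are rescaled to sum to 1. This is a minimal sketch, assuming the Party column uses the codes 'R' and 'D' shown above; boomer_by_party() is a hypothetical helper name, not part of the assignment.

```r
## Hypothetical helper: within each party, the proportion of members in each
## Baby_Boomer category (each row of the result sums to 1)
boomer_by_party = function(df) {
  counts = table(df$Party, df$Baby_Boomer)
  prop.table(counts, margin = 1)
}

## Usage, after running the code that creates congress$Baby_Boomer:
# shares = boomer_by_party(congress)
# shares["R", "Boomer"]  # proportion of Republicans who are boomers
# shares["D", "Boomer"]  # proportion of Democrats who are boomers
```

Using margin = 1 conditions on party (rows); margin = 2 would instead give the party breakdown within each Baby_Boomer group, which answers a different question.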
For this question you’ll use the “Hollywood Movies” data set located at the URL:
https://remiller1450.github.io/data/HollywoodMovies.csv
This data set documents the box office performance and critic ratings of major films released between 2007 and 2013.
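Several parts of this question call for cor() and lm() on Budget and TheatersOpenWeek. The following minimal sketch shows those calls. budget_theaters_summary() is a hypothetical helper name. The Spearman line is an assumption: the opening of each correlation part is missing here, so it is only a guess that one of them asks for a rank-based measure.

```r
## Hypothetical helper (not part of the assignment): summarizes the
## Budget/TheatersOpenWeek relationship for a data frame of films
budget_theaters_summary = function(movies) {
  ## Pearson correlation, dropping films missing either value
  pearson = cor(movies$Budget, movies$TheatersOpenWeek, use = "complete.obs")
  ## Assumption: a rank-based (Spearman) correlation is also of interest
  spearman = cor(movies$Budget, movies$TheatersOpenWeek,
                 use = "complete.obs", method = "spearman")
  ## Linear regression with Budget as explanatory and TheatersOpenWeek as
  ## response; under R's default na.action (na.omit), rows with NAs are dropped
  fit = lm(TheatersOpenWeek ~ Budget, data = movies)
  list(pearson = pearson, spearman = spearman, coefficients = coef(fit))
}

## Usage with the course data (needs an internet connection):
# movies = read.csv("https://remiller1450.github.io/data/HollywoodMovies.csv")
# budget_theaters_summary(movies)
```

Because Budget is in millions of USD, the fitted slope estimates the change in the average number of opening-weekend theaters per additional million dollars of budget. The intercept is the predicted number of theaters for a film with a budget of zero.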
Budget, the amount of money used to produce the film (millions of USD), and the response variable TheatersOpenWeek, the number of theaters worldwide that aired the film on its opening weekend.
Budget and TheatersOpenWeek. Do you believe this is an appropriate measure of association to describe the relationship between these variables? Briefly explain. Hint: be sure to use the argument use = "complete.obs", since there are missing values in these data.
Budget and TheatersOpenWeek. Do you believe this is an appropriate measure of association to describe the relationship between these variables? Briefly explain.
lm() function to fit a linear regression model using Budget as the explanatory variable and TheatersOpenWeek as the response variable. Print the model’s estimated coefficients (intercept and slope) and provide a brief interpretation of each.
Budget and TheatersOpenWeek. Without fitting the second model, briefly explain why it will have a higher \(R^2\) value.
select() function to create a new data frame containing only the variables RottenTomatoes (the film’s rating by professional critics), AudienceScore (the film’s rating by ordinary viewers), and WorldGross (the total amount of money generated by the film). Then, create a correlation matrix using Pearson’s correlation to determine which of these two ratings has the stronger linear association with WorldGross.
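The last part can be sketched as follows. This assumes the dplyr package, which provides select(), is installed. ratings_cor_matrix() and stronger_rating() are hypothetical helper names. Following the earlier hint, use = "complete.obs" is assumed here as well, which drops any film with a missing value in any of the three columns. Strength of a linear association is judged by the absolute value of Pearson's r, so the second helper compares |r| rather than r.

```r
library(dplyr)

## Hypothetical helper: keep the two ratings and WorldGross, then compute
## Pearson correlations (cor()'s default method), dropping rows with any NA
ratings_cor_matrix = function(movies) {
  ratings = select(movies, RottenTomatoes, AudienceScore, WorldGross)
  cor(ratings, use = "complete.obs")
}

## Hypothetical helper: name of the rating whose correlation with WorldGross
## is largest in absolute value
stronger_rating = function(cm) {
  r = cm[c("RottenTomatoes", "AudienceScore"), "WorldGross"]
  names(which.max(abs(r)))
}

## Usage with the course data (needs an internet connection):
# movies = read.csv("https://remiller1450.github.io/data/HollywoodMovies.csv")
# cm = ratings_cor_matrix(movies)
# cm
# stronger_rating(cm)
```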