Sta-209-04 (Spring 24) Homework #6

From the Introduction to Modern Statistics (IMS) textbook, complete the following exercises:

Ch 13.8 Exercises: #2 (skip part B-i), #4
Ch 19.4 Exercises: #8, #13, #16, #21 (complete part B only)

Also complete the additional question given below:

Question #1: The data below are a random sample of n = 5000 claims made against the US Transportation Security Administration (TSA). For background, a claim is a formal request for compensation due to damages. People file claims against the TSA when their personal property is stolen or damaged, or when they are injured by TSA processes. If the claim is approved or settled, the TSA pays an amount of money, the close amount, to the individual who made the claim.

tsa = read.csv("https://remiller1450.github.io/data/tsa_small.csv")

Part A: Filter the tsa data set to include only claims on the following items: "Cell Phones" and "Computer - Laptop". Then use the mutate() and ifelse() functions to create a new binary variable "Denied" that records whether or not each claim was denied. Store your results in a new data frame and use it throughout the remainder of this question.
Part B: Using the sample data from Part A, calculate a point estimate of the odds ratio comparing the odds that a claim on a computer/laptop is denied with the odds that a claim on a cell phone is denied.
Part C: Use R to find a 95% confidence interval estimate of the odds ratio described in Part B.
Part D: Based upon your interval from Part C, are you confident that the TSA is more likely to deny claims made on laptops than it is to deny claims made on cell phones? Briefly explain.
Part E: Filter your data from Part A to include only claims that were for less than 1000 dollars (in claim amount) and were not denied. Store the results in a data frame to be used in Parts F and G.
Part F: Fit a single-variable linear regression model using the predictor Status to model the outcome variable Close_Amount. Find and interpret a 95% confidence interval estimate for the coefficient of the re-coded variable StatusSettled. Can you be confident that settled claims provide payments that differ from those of approved claims?
Part G: Fit a multivariable linear regression model that predicts the outcome variable Close_Amount using the predictors Status and Claim_Amount. Find and interpret a 95% confidence interval estimate for the coefficient of the re-coded variable StatusSettled. Does this interval lead you to a different conclusion about the payouts of settled claims vs. approved claims than your results from Part F? If so, explain what is different. If not, briefly explain why.