Directions:
A study published in 2023 investigated whether dogs could be trained to detect cancer using their sense of smell. As part of the study, researchers collected breath samples from healthy controls and cancer patients. They then ran repeated trials in which a trained dog was presented with five bags containing breath samples, one from a cancer patient and the other four from healthy controls. The dogs had been trained to select exactly one bag in each trial, and during training they were rewarded for identifying cancer samples. In each trial, the researchers recorded whether or not the trained dog correctly identified the breath sample from the cancer patient.
(R) The Transportation Security Administration (TSA) is an agency within the US Department of Homeland Security with authority over the safety and security of travel in the United States. For this question, you will analyze data from a random sample of \(n=5000\) claims made by travelers against the TSA between 2003 and 2008, the first five years that the agency existed. These data are found at the following link:
The relevant variables in this analysis are:
Status - whether the claim was approved (paid in full), settled (partially paid/negotiated), or denied (not paid at all)
Claim_Amount - the amount of monetary damages requested in the initial claim
Part A: Write R code that stores these data in a data frame object named tsa_data. Then find the average amount of monetary damages claimed by travelers.
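As a rough illustration of the Part A workflow, the sketch below assumes the data are supplied as a CSV file with a header row. Because the link is not reproduced here, it reads a few made-up rows from an inline string (the name csv_text and every value in it are hypothetical); with the real data, the file path or URL would be passed as the first argument to read.csv() instead.

```r
# Hypothetical stand-in for the real file: made-up rows containing the two
# variables named in the assignment.
csv_text <- "Status,Claim_Amount
Denied,120.50
Approved,45.00
Settled,300.00
Denied,80.25"

# read.csv() normally takes a file path or URL; `text =` is used here only
# so the sketch runs without access to the real data.
tsa_data <- read.csv(text = csv_text)

# Mean claim amount; na.rm = TRUE skips any missing values.
mean_claim <- mean(tsa_data$Claim_Amount, na.rm = TRUE)
mean_claim
```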
Part B: Create a data visualization showing the distribution of Claim_Amount. Briefly describe 1 or 2 things you can learn from this distribution that you could not have known using just the average value you calculated in Part A.
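One possible way to visualize a quantitative variable in base R is a histogram. The sketch below uses a made-up vector (claims, hypothetical values, with dollars assumed as the unit) rather than the real Claim_Amount column. It also compares the mean and median, since a single large value can pull the mean well above a typical claim.

```r
# Made-up, right-skewed claim amounts (not from the real data).
claims <- c(20, 35, 40, 55, 60, 75, 90, 120, 250, 1800)

# Histogram of the distribution; with the real data this would be
# hist(tsa_data$Claim_Amount, ...).
hist(claims,
     breaks = 10,
     main = "Distribution of Claim_Amount",
     xlab = "Claim amount (dollars, assumed unit)")

# Compare the center measures: skew shows up as mean > median.
claim_mean <- mean(claims)
claim_median <- median(claims)
c(mean = claim_mean, median = claim_median)
```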
Part C: Create a table displaying the frequencies of each status. Using this table, calculate the proportion of claims that are denied.
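A frequency table and a proportion can be obtained with table(), as in the sketch below. The status vector is made up for illustration; with the real data it would be tsa_data$Status, and the category labels are assumed to match the descriptions above.

```r
# Made-up claim statuses (not from the real data).
status <- c("Denied", "Approved", "Settled", "Denied", "Denied", "Approved")

# Frequency of each status.
status_counts <- table(status)
status_counts

# Proportion denied = denied count / total count.
prop_denied <- status_counts[["Denied"]] / sum(status_counts)
prop_denied
```

prop.table(status_counts) gives the proportions for all categories at once.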
Part D: Create a data visualization showing the distribution of claim statuses.
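For a categorical variable, a bar chart of the frequency table is one reasonable choice. The sketch below again uses a made-up status vector in place of tsa_data$Status.

```r
# Made-up claim statuses (not from the real data).
status <- c("Denied", "Approved", "Settled", "Denied", "Denied", "Approved")

# barplot() draws one bar per category from a frequency table and
# returns the bar midpoints.
bar_positions <- barplot(table(status),
                         main = "Distribution of claim status",
                         xlab = "Status",
                         ylab = "Number of claims")
```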
Part E: According to research by Weiss Ratings, USAA denies the highest percentage of home insurance claims, rejecting 48.1% of the claims they receive. Suppose we use this value to inform a null hypothesis. Does the sample of claims against the TSA support the conclusion that the TSA rejects a lower percentage of claims than USAA? Report a \(p\)-value and an appropriate one-sentence conclusion.
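The hypotheses implied by Part E are \(H_0: p = 0.481\) versus \(H_a: p < 0.481\), where \(p\) is the proportion of TSA claims that are denied. The sketch below shows one way to get a one-sided \(p\)-value in R using an exact binomial test, which is a substitute for a StatKey randomization test rather than a reproduction of it. The denied count is hypothetical and does not come from the real data.

```r
n <- 5000        # sample size stated in the assignment
denied <- 2300   # HYPOTHETICAL denied count, for illustration only

# One-sided exact binomial test of H0: p = 0.481 against Ha: p < 0.481.
test_result <- binom.test(denied, n, p = 0.481, alternative = "less")

p_value <- test_result$p.value
p_value
```

A small \(p\)-value would indicate that a denial rate this low is unlikely if the TSA denied claims at the same rate as USAA.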
Part F: Suppose we had used a smaller sample in Part E, such as a random sample of 500 claims rather than 5000. Would you expect the \(p\)-value to be larger or smaller? Try dividing the observed count and sample size by 10 (rounding to the nearest whole number), then inputting these new “data” into StatKey to verify your expectation.
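The effect of sample size can also be checked in R by running the same one-sided test on the original and the scaled-down counts. As before, the denied count is hypothetical, and the exact binomial test stands in for StatKey.

```r
n_large <- 5000
denied_large <- 2300   # HYPOTHETICAL count, for illustration only

# Divide both values by 10 and round, as Part F suggests.
n_small <- round(n_large / 10)
denied_small <- round(denied_large / 10)

# Same null value and direction for both tests.
p_large <- binom.test(denied_large, n_large, p = 0.481,
                      alternative = "less")$p.value
p_small <- binom.test(denied_small, n_small, p = 0.481,
                      alternative = "less")$p.value

c(n_5000 = p_large, n_500 = p_small)
```

The sample proportion is the same in both cases, so any difference between the two \(p\)-values comes from the sample size alone.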