Sta-209 (Spring 2025) Homework #5

Directions:

Submit your assignment via P-web.
Submit only a compiled R Markdown document (PDF, Word, or HTML output are all okay, but you may need to “zip” an HTML file).
- If you want to compile to a PDF, you can install the tinytex package by running install.packages('tinytex') followed by tinytex::install_tinytex()
Only submit your .Rmd file if you are unable to compile it due to errors (in the future you will be penalized for this)

Question #1

Michael Jordan is a retired professional basketball player who is recognized by many as the greatest player of all time. Since Jordan has been retired for more than two decades, it makes sense to consider each game he played in as a case within the population that constitutes his career. Given below are two independently drawn random samples of games played by Michael Jordan. One sample consists of \(n=25\) games, and the other consists of \(n=200\) games.

## Sample of 25 games
mj25 = read.csv("https://remiller1450.github.io/data/mj25.csv")

## Sample of 200 games
mj200 = read.csv("https://remiller1450.github.io/data/mj200.csv")

Part A: Let the random variable \(X\) denote the points Michael Jordan scores in a randomly chosen game. Is this a discrete or continuous random variable? Briefly explain.
Part B: Let the random variable \(Y\) denote the average number of points Michael Jordan scores in a random sample of \(n=25\) games. Is this a discrete or continuous random variable? Briefly explain.
Part C: Create a histogram showing the distribution of the variable pts using the mj25 data set (the one with 25 games). Do you believe that this data set provides enough information for you to reliably approximate the underlying probability distribution of \(X\) with a continuous function (i.e., Normal curve, uniform distribution, etc.)? Briefly explain.
Part D: Now create a histogram showing the distribution of the variable pts using the mj200 data set (the one with 200 games). Do you believe that this data set provides enough information for you to reliably approximate the underlying probability distribution of \(X\) with a continuous function? Briefly explain.
Part E: In responding to Parts C and D, you may have considered how many times you were able to observe \(X\). Keeping this in mind, how many times have you been able to observe \(Y\) considering only the mj25 data set?
Part F: Do you expect the probability distribution of \(Y\) to exhibit more variability or less variability than the probability distribution of \(X\)? Briefly explain.
Part G: Consider the sample mean of pts in the mj25 data set, and the sample mean in the mj200 data set. Without performing any calculations, which of these sample means would you expect to be closer to the population mean, which is Michael Jordan’s points per game for his entire career?
Part H: Michael Jordan’s career average points per game was 30.1. Calculate the sample mean of the variable pts in both the mj25 and mj200 samples. Which of these samples produces an estimate closer to the truth?
Part I: Do your findings in Part H suggest there may be sampling bias in how games were selected for the mj25 and mj200 samples? Briefly explain.
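
One way to build intuition for the ideas behind Parts E through G is a small simulation. The sketch below uses a made-up population of scores; the Poisson population, the seed, and the names `pop` and `sample_means` are assumptions for illustration only and are not Michael Jordan's actual data. It repeatedly draws samples of 25 so that the behavior of a sample mean can be compared with the behavior of single observations.

```r
set.seed(209)  # arbitrary seed so the sketch is reproducible

# Hypothetical population of 1000 game scores (NOT real data)
pop <- rpois(1000, lambda = 30)

# Draw 500 random samples of n = 25 and record each sample mean
sample_means <- replicate(500, mean(sample(pop, size = 25)))

# Compare the spread of single observations with the spread of sample means
sd(pop)
sd(sample_means)

# Histograms of one sample of 25 games and of the 500 sample means
hist(sample(pop, size = 25), main = "One sample of n = 25", xlab = "pts")
hist(sample_means, main = "500 sample means (n = 25)", xlab = "mean pts")
```

The same `hist()` and `mean()` calls can be pointed at `mj25$pts` and `mj200$pts` for the histograms and sample means the question asks about.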

\(~\)

Question #2

In this question you’ll continue using the two samples of Michael Jordan’s regular season games described in Question 1; however, this question will focus on probability models.

Part A: Calculate the proportion of games in the mj200 sample where Michael Jordan scored more than 30 points. Hint: You might find the examples in Lab 6 to be helpful in using the summarize() function to obtain this proportion.
Part B: Consider a Normal probability model for the number of points Michael Jordan scores in a game. Use the data in mj200 to find appropriate parameters for this model.
Part C: Using your probability model from Part B, find the probability that Michael Jordan scores more than 30 points in a game.
Part D: How does your answer in Part C relate to the proportion you found in Part A? Briefly explain.
Part E: Construct a different probability model using the Normal distribution and parameters estimated from the mj25 sample (the one with only \(n=25\) games). Using this model, find the probability that Michael Jordan scores more than 30 points in a game.
Part F: Michael Jordan scored 30+ points in 562 of his 1,072 career regular season games. Did the probability model from Part B or from Part E come closer to accurately estimating this probability? Is that consistent with what you had expected? Briefly explain.
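
The steps in Parts A through C can be sketched on simulated data. The vector `pts_demo`, the Normal parameters used to generate it, and the seed are assumptions chosen for illustration; substituting `mj200$pts` or `mj25$pts` would apply the same steps to the homework data. Base R is used here, with the `summarize()` approach from the hint noted in a comment.

```r
set.seed(209)  # arbitrary seed so the sketch is reproducible

# Hypothetical scores (NOT the homework data)
pts_demo <- round(rnorm(200, mean = 28, sd = 9))

# Empirical proportion of games with more than 30 points
# (dplyr equivalent for a data frame df: df %>% summarize(prop = mean(pts > 30)))
prop_over_30 <- mean(pts_demo > 30)

# Normal model parameters estimated from the sample
mu_hat <- mean(pts_demo)
sigma_hat <- sd(pts_demo)

# Model-based probability of scoring more than 30 points
p_model <- pnorm(30, mean = mu_hat, sd = sigma_hat, lower.tail = FALSE)

# Career benchmark quoted in Part F
career_prop <- 562 / 1072

c(empirical = prop_over_30, model = p_model, career = career_prop)
```

Setting `lower.tail = FALSE` returns the area to the right of 30 under the fitted Normal curve, which is the same quantity as `1 - pnorm(30, mu_hat, sigma_hat)`.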

\(~\)

Question #3

For each of the following scenarios, indicate whether the data being described are most reasonably characterized as a population or a sample. If it is a sample, indicate whether it is a biased sample or a representative sample. You should provide a 1-2 sentence justification for each of your answers.

Part A: To estimate the size of trout in a lake, an angler records the weights of the 12 trout he caught over a weekend.
Part B: A subscription-based music streaming app tracks the listening history of all of its users.
Part C: Using random digit dialing to contact over 14,000 US adults, Gallup finds that 46% of Americans politically identify as Republican or Republican leaning.
Part D: A poll posted on X asks users: “Do you like or dislike the idea of wearing a consistent uniform to work every day, similar to what Steve Jobs did?”
Part E: A college’s institutional research team uses a random number generator to select 50 course syllabi to use in a study.