Directions:
- Submit your assignment via P-web.
- Submit only a compiled R Markdown document (PDF, Word, or HTML
output is fine, but you may need to “zip” an HTML file)
- If you want to compile to a PDF, you can install the tinytex
package by running
install.packages('tinytex')
followed by
tinytex::install_tinytex()
- Only submit your .Rmd file if you are unable to compile it due to
errors (in the future you will be penalized for this)
Question #1
Michael Jordan is a retired professional basketball player who is
recognized by many as the greatest player of all time. Since Jordan has
been retired for more than two decades, it makes sense to consider each
game he played in as a case within the population that constitutes his
career. Given below are two independently drawn random samples
of games played by Michael Jordan. One sample consists of \(n=25\) games, and the other consists of
\(n=200\) games.
## Sample of 25 games
mj25 = read.csv("https://remiller1450.github.io/data/mj25.csv")
## Sample of 200 games
mj200 = read.csv("https://remiller1450.github.io/data/mj200.csv")
- Part A: Let the random variable \(X\) denote the points Michael Jordan scores
in a randomly chosen game. Is this a discrete or continuous random
variable? Briefly explain.
- Part B: Let the random variable \(Y\) denote the average number of
points Michael Jordan scores in a random sample of \(n=25\) games. Is this a discrete or
continuous random variable? Briefly explain.
- Part C: Create a histogram showing the distribution of the variable
pts using the mj25 data set (the one with 25 games). Do you believe
that this data set provides enough information for you to reliably
approximate the underlying probability distribution of \(X\) with a
continuous function (i.e., a Normal curve, uniform distribution,
etc.)? Briefly explain.
- Part D: Now create a histogram showing the distribution of the
variable pts using the mj200 data set (the one with 200 games). Do you
believe that this data set provides enough information for you to
reliably approximate the underlying probability distribution of \(X\)
with a continuous function? Briefly explain.
- Part E: In responding to Parts C and D, you may have considered how
many times you were able to observe \(X\). Keeping this in mind, how
many times have you been able to observe \(Y\) using only the mj25
data set?
- Part F: Do you expect the probability distribution
of \(Y\) to exhibit more variability or
less variability than the probability distribution of \(X\)? Briefly explain.
- Part G: Consider the sample mean of pts in the mj25 data set and the
sample mean of pts in the mj200 data set. Without performing any
calculations, which of these sample means would you expect to be closer
to the population mean, which is Michael Jordan’s points per game for
his entire career?
- Part H: Michael Jordan averaged 30.1 points per game over his career.
Calculate the sample mean of the variable pts in both the mj25 and
mj200 samples. Which of these samples produces an estimate closer to
the truth?
- Part I: Do your findings in Part H suggest there may be sampling bias
in how games were selected for the mj25 and mj200 samples? Briefly
explain.
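The mechanics behind Parts C and H can be sketched in base R. This is a minimal illustration under stated assumptions: toy is a hypothetical data frame with a pts column, filled with simulated values (Normal with mean 30 and SD 8, chosen arbitrarily), not games from mj25 or mj200. Substituting mj25$pts or mj200$pts for toy$pts would apply the same steps to the real samples.

```r
# Hypothetical stand-in for mj25: 25 simulated point totals.
# These values are invented for illustration; they are NOT Jordan's games.
set.seed(1)
toy <- data.frame(pts = round(rnorm(25, mean = 30, sd = 8)))

# Histogram of pts (base R); hist() also returns the bin counts invisibly.
# ggplot2 alternative (after library(ggplot2)):
#   ggplot(toy, aes(x = pts)) + geom_histogram(bins = 10)
h <- hist(toy$pts, breaks = 10,
          main = "Points per game (simulated)", xlab = "pts")

# Sample mean of pts, the kind of statistic compared with 30.1 in Part H
toy_mean <- mean(toy$pts)
print(toy_mean)
```

Changing the seed, or changing 25 to 200, reruns the sketch on a different simulated sample.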
Question #2
In this question you’ll continue using the two samples of Michael
Jordan’s regular season games described in Question 1; however, this
question will focus on probability models.
- Part A: Calculate the proportion of games in the mj200 sample in
which Michael Jordan scored more than 30 points. Hint: You might find
the examples in Lab 6 helpful for using the summarize() function to
obtain this proportion.
- Part B: Consider a Normal probability model for the number of points
Michael Jordan scores in a game. Use the data in mj200 to find
appropriate parameters for this model.
- Part C: Using your probability model from Part B,
find the probability that Michael Jordan scores more than 30 points in a
game.
- Part D: How does your answer in Part C relate to
the proportion you found in Part A? Briefly explain.
- Part E: Construct a different probability model using the Normal
distribution and parameters estimated from the mj25 sample (the one
with only \(n=25\) games). Using this model, find the probability that
Michael Jordan scores more than 30 points in a game.
- Part F: Michael Jordan scored 30+ points in 562 of his 1,072 career
regular season games. Did the probability model from Part B or the one
from Part E come closer to this proportion? Is that consistent with
what you expected? Briefly explain.
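The calculations in Parts A through C can be sketched the same way. Again, toy is a hypothetical data frame of 200 simulated point totals standing in for mj200 (the mean of 30 and SD of 8 are arbitrary choices), so the printed numbers say nothing about Jordan himself. The sketch assumes the common choice of estimating the Normal parameters with the sample mean and sample standard deviation, and it uses pnorm() with lower.tail = FALSE to get an upper-tail probability.

```r
# Hypothetical stand-in for mj200: 200 simulated point totals (NOT real games)
set.seed(2)
toy <- data.frame(pts = round(rnorm(200, mean = 30, sd = 8)))

# Part A style: proportion of games with more than 30 points.
# mean() of a logical vector is the proportion of TRUE values.
emp_prop <- mean(toy$pts > 30)
# dplyr equivalent (assuming dplyr is loaded): summarize(toy, prop = mean(pts > 30))

# Part B style: Normal parameters estimated from the sample
mu_hat <- mean(toy$pts)
sigma_hat <- sd(toy$pts)

# Part C style: P(X > 30) under the fitted Normal model (upper tail)
model_prob <- pnorm(30, mean = mu_hat, sd = sigma_hat, lower.tail = FALSE)

print(c(empirical = emp_prop, normal_model = model_prob))
```

Because the Normal model is continuous, it assigns the same probability to "more than 30" and "30 or more"; the empirical proportion, computed on whole-number point totals, treats those two events differently (mean(toy$pts >= 30) versus mean(toy$pts > 30)).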
Question #3
For each of the following scenarios indicate whether the data being
described is most reasonably described as a population or a
sample. If it is a sample, indicate whether it a biased
sample or a representative sample. You should provide a
1-2 sentence justification for each of your answers.
- Part A: To estimate the size of trout in a lake, an
angler records the weights of the 12 trout he caught over a
weekend.
- Part B: A subscription-based music streaming app
tracks the listening history of all of its users.
- Part C: Using random digit
dialing to contact over 14,000 US adults, Gallup finds that 46% of
Americans politically identify as Republican or Republican leaning.
- Part D: A poll posted on X asks users: “Do you like
or dislike the idea of wearing a consistent uniform to work every day,
similar to what Steve Jobs did?”
- Part E: A college’s institutional research team
uses a random number generator to select 50 course syllabi to use in a
study.