Directions:

  • My main expectation is that you thoughtfully work through labs collaboratively with your group, discussing the embedded questions and recording your responses in a shared document.
    • At times you might be asked to add screenshots to your write-up. If you are on a Windows PC, an easy way to do this is the “snipping tool”, which you can find using the search bar along the bottom of your screen. If you are on a Mac, you can find instructions on how to take a screenshot at this link.
  • Everyone should upload their own copy of the lab write-up to Canvas.
  • Only a couple of questions on each lab will be graded for accuracy, so your focus should be on learning the material rather than “getting the right answers” as quickly as possible.

\(~\)

Introduction

Lately we’ve turned our attention to understanding random processes, with important examples being random sampling and random assignment. Today we will practice applying probability models that will help us understand the potential outcomes of a random process. We will look at examples for discrete and continuous random variables.

\(~\)

Discrete Probability Models

Insurance companies are essentially sophisticated gamblers: they estimate possible payouts and the probability of each, then use this information to decide how much to charge and whether to accept certain customers.

To see how this relates to our study of random processes, we’ll consider a simplistic hypothetical home insurance company.

In a given year, this company will provide the policy holder:

  • $200,000 if their home is completely destroyed by a major natural disaster (such as a tornado, hurricane, earthquake, etc.)
  • $40,000 if there is moderate structural damage caused by a major natural disaster
  • $10,000 if there is minor damage caused by a major natural disaster

Question #1: Let \(X\) denote the amount the insurance company pays out to a randomly selected customer in the upcoming year. Briefly explain why \(X\) is a random variable.

Question #2: What is the sample space of \(X\)? That is, what are the possible values that the company might expect to pay?

\(~\)

Discrete Probability Models

In order to determine prices, this company might use historical data to estimate the likelihood of each possible payout in a given year.

Let’s suppose the company estimates a 1 in 10,000 chance that a home is completely destroyed in a given year, a 1 in 3,000 chance it experiences moderate structural damage, and a 1 in 250 chance it experiences minor structural damage.

Question #3: Using these historical numbers, create a table that displays a discrete probability model for the amounts the insurance company might pay.
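
As a rough illustration of what a complete discrete probability model looks like, the Python sketch below builds one for a made-up phone warranty. The payouts, probabilities, and the helper name `complete_model` are all hypothetical and are not the lab's insurance numbers. The point it shows is that any probability not assigned to a listed payout belongs to the "no payout" outcome, so the probabilities in the finished table sum to 1.

```python
from fractions import Fraction

# Hypothetical phone warranty (illustrative numbers, NOT the lab's insurance data).
# Each entry maps a payout in dollars to the chance of that payout in one year.
claim_probs = {
    800: Fraction(1, 200),  # phone replaced
    150: Fraction(1, 50),   # screen repaired
}


def complete_model(claim_probs):
    """Assign the leftover probability to a $0 payout and return the full model."""
    leftover = 1 - sum(claim_probs.values())
    if leftover < 0:
        raise ValueError("claim probabilities add up to more than 1")
    model = {0: leftover}
    model.update(claim_probs)
    return model


warranty_model = complete_model(claim_probs)

# Print the model as a small table: payout, then probability.
for payout, prob in sorted(warranty_model.items()):
    print(f"${payout:>4}  {float(prob):.4f}")

# A valid discrete model has probabilities that sum to exactly 1.
assert sum(warranty_model.values()) == 1
```

Exact fractions are used here only so the sum-to-one check is not affected by rounding; ordinary decimals would work as well with a small tolerance.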

\(~\)

Expected Value and Standard Deviation

Two of the most important characteristics of a probability model are its expected value and standard deviation. In this application, the expected value describes how much the company should expect to pay out, on average, to a randomly chosen customer in a single year. The standard deviation describes how far a single payout typically falls from that expected value.
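
The sketch below shows one way these two quantities could be computed for a discrete model, using the usual definitions: the expected value is the sum of each value times its probability, and the standard deviation is the square root of the probability-weighted squared deviations from the expected value. The raffle model and the function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical raffle ticket (illustrative numbers, NOT the lab's insurance data).
# Keys are prize amounts in dollars; values are their probabilities.
raffle_model = {0: 0.90, 10: 0.09, 100: 0.01}


def expected_value(model):
    """Sum of (value * probability) over every outcome in the model."""
    return sum(x * p for x, p in model.items())


def standard_deviation(model):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = expected_value(model)
    variance = sum((x - mu) ** 2 * p for x, p in model.items())
    return math.sqrt(variance)


print(f"Expected value:     ${expected_value(raffle_model):.2f}")
print(f"Standard deviation: ${standard_deviation(raffle_model):.2f}")
```

In this made-up example the standard deviation is several times larger than the expected value, which is typical when a rare outcome carries a large payout.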

Question #4: Using the probability model you created in Question #3, calculate the expected value of the random variable \(X\). Based upon this information, how much would you recommend this company charge its customers each year?

Question #5: Using the probability model you created in Question #3 and the expected value you found in Question #4, find the standard deviation of the random variable \(X\). How might this influence your recommendations regarding what the company should charge its customers?

\(~\)

Continuous Probability Models

Michael Jordan is a former professional basketball player who is widely recognized as the greatest NBA player of all time. In this case study, we will apply the concepts of random variables and probability models to a random sample of \(n = 200\) games to make inferences about Jordan’s performance.

The Michael Jordan game data is a random sample of \(n = 200\) regular season games from Michael Jordan’s career. It contains the following variables:

  • date - date the game was played
  • age - Jordan’s age reported as years-days
  • team - Jordan’s team
  • opp - opponent
  • result - game result: win (W) or loss (L) and score difference
  • mp - minutes played
  • fg - field goals made
  • fga - field goals attempted
  • fgp - field goal percentage
  • three - three-point shots made
  • threeatt - three-point shots attempted
  • threep - three-point shot percentage
  • ft - free-throws made
  • fta - free-throws attempted
  • ftp - free-throw percentage
  • orb - offensive rebounds
  • drb - defensive rebounds
  • trb - total rebounds (offensive + defensive)
  • ast - assists
  • stl - steals
  • blk - blocks
  • tov - turnovers
  • pts - total points scored (by Jordan)

Michael Jordan has been retired for over a decade, so it might seem difficult to view his game performances as random processes; however, sampling a game from his career is itself a random process.

Question #6: Consider a randomly chosen game of Michael Jordan’s and let \(X\) denote the number of points he scored in that game. Briefly explain why it makes sense to treat \(X\) as a continuous random variable.

Question #7: Upload the Michael Jordan game data (click here to download) into StatKey and create a histogram of the variable “pts”. Based upon what you see, do you think a Normal probability model is reasonable?

\(~\)

Applying the Normal Model

To use a Normal probability model, you must define the parameters \(\mu\) and \(\sigma\) (the Normal curve’s expected value and standard deviation respectively). Typically, the values of these parameters are estimated from the sample data.
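
For readers who want to see the same idea outside StatKey, here is a minimal Python sketch that estimates \(\mu\) and \(\sigma\) from a sample and then uses the resulting Normal model to find tail probabilities. The commute-time data and the helper names are hypothetical and are not the Jordan data; `NormalDist`, `mean`, and `stdev` come from Python's standard `statistics` module.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of commute times in minutes (NOT the Jordan game data).
commute_minutes = [22, 31, 27, 35, 29, 24, 33, 28, 30, 26]

mu_hat = mean(commute_minutes)      # sample mean, our estimate of mu
sigma_hat = stdev(commute_minutes)  # sample SD (n - 1 denominator), our estimate of sigma

normal_model = NormalDist(mu=mu_hat, sigma=sigma_hat)


def prob_at_most(dist, value):
    """Lower-tail area under the curve: P(X <= value)."""
    return dist.cdf(value)


def prob_at_least(dist, value):
    """Upper-tail area under the curve: P(X >= value)."""
    return 1 - dist.cdf(value)


print(f"mu estimate: {mu_hat:.2f}, sigma estimate: {sigma_hat:.2f}")
print(f"P(X >= 35) is roughly {prob_at_least(normal_model, 35):.3f}")
```

Because the model is continuous, the probability of landing exactly on a single value is zero, so "at least 35" and "more than 35" give the same area.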

Question #8: Use StatKey to find the sample mean and sample standard deviation of the variable “pts” in our sample of 200 games. These are our best estimates of \(\mu\) and \(\sigma\).

Question #9: Use the theoretical distributions section of StatKey to display your Normal model for the number of points Michael Jordan scores in a randomly selected game. Based upon this model, estimate the probability that Michael Jordan scores 30 or more points.

Question #10: Based upon your Normal model, estimate the probability that Michael Jordan scores 9 or fewer points. How does this compare to the empirical probability that Michael Jordan scores 9 or fewer points? (Hint: use StatKey to make a dotplot of the variable “pts” to help you determine the empirical probability.)
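
The difference between an empirical probability and a model-based probability can be sketched in a few lines of Python. The empirical probability is simply the fraction of sampled values at or below the cutoff, while the model-based probability is the area under a fitted Normal curve. The quiz scores and function names below are hypothetical and stand in for any numeric sample.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical quiz scores (NOT the Jordan game data).
quiz_scores = [12, 15, 9, 18, 14, 7, 16, 13, 11, 15]


def empirical_prob_at_most(values, cutoff):
    """Fraction of observed values that are less than or equal to the cutoff."""
    return sum(1 for v in values if v <= cutoff) / len(values)


def normal_prob_at_most(values, cutoff):
    """Area at or below the cutoff under a Normal curve fitted to the sample."""
    model = NormalDist(mean(values), stdev(values))
    return model.cdf(cutoff)


empirical = empirical_prob_at_most(quiz_scores, 9)
modeled = normal_prob_at_most(quiz_scores, 9)
print(f"Empirical P(X <= 9):    {empirical:.3f}")
print(f"Normal-model P(X <= 9): {modeled:.3f}")
```

The two numbers rarely match exactly. A large gap can suggest that the Normal curve fits poorly in that part of the distribution, or simply that the sample is small.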

\(~\)

Z-Scores

As discussed in class, there are a number of reasons why statisticians prefer working with standardized data and the Standard Normal Distribution (which we’ll denote \(N(0,1)\)).
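
A short Python sketch of standardization follows, assuming the usual definition \(z = (x - \mu)/\sigma\). The values of `mu`, `sigma`, and `x` are hypothetical. It also checks numerically that the area to the right of \(z\) under \(N(0,1)\) matches the area to the right of \(x\) under the original Normal curve.

```python
from statistics import NormalDist


def z_score(value, mu, sigma):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mu) / sigma


# Hypothetical parameters (NOT estimates from the Jordan data).
mu, sigma = 100.0, 15.0
x = 130.0

z = z_score(x, mu, sigma)

# Area to the right of z under the Standard Normal curve N(0, 1).
area_standard = 1 - NormalDist(0, 1).cdf(z)

# Area to the right of x under the original Normal curve N(mu, sigma).
area_original = 1 - NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}")
print(f"Right-tail area via N(0,1):      {area_standard:.4f}")
print(f"Right-tail area via N(mu,sigma): {area_original:.4f}")
```

Because Z-scores are unit-free, they can also be compared across variables measured on different scales: the value with the larger absolute Z-score is the more unusual one relative to its own distribution.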

Question #11: Using the sample mean and sample standard deviation of the variable “pts” you found in Question #8, calculate the Z-score corresponding to a game in which Michael Jordan scores 30 points.

Question #12: Use the Standard Normal Distribution to find the area to the right of the Z-score you found in Question #11. How does this compare to the probability you found in Question #9?

Question #13: Now find a Z-score corresponding to a game in which Michael Jordan finishes with 10 total rebounds (the variable “trb” in these data). Then, comparing this Z-score with the one you found in Question #11, which is more unexpected for Michael Jordan: a 30-point game or a 10-rebound game?