Last week we discussed bootstrapping and uncertainty. The main idea was that data, and estimates derived from it, come with uncertainty, and we can use simulation (drawing bootstrap samples) to better understand the uncertainty present in our data.
There are several ways that we can incorporate uncertainty into our decision making:
Things become more complicated as we become interested in combinations of uncertain numbers.
A random variable is a variable that represents the unknown outcome of a random process.
For example, suppose you’re a venture capitalist who can invest in a film in exchange for a share of its profits. You might use the random variable \(X\) to denote the amount you get back for each $100,000 invested in the film.
Every random variable has a probability distribution, which describes how likely the variable is to take on each possible value. Looking at the data displayed above, we might conjecture that \(X\) follows an exponential distribution with rate parameter \(\lambda = 1\) (measured in units of $100,000), which yields the following probability model:
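Using this model, we might ask ourselves two questions: what is the expected return on our investment, and what is the probability that we lose money (that is, get back no more than the $100,000 we invested)?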
Although the exponential curve displayed in the previous section depicts a probability distribution, which is a mathematical function, it was actually graphed using the following code:
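library(ggplot2)
## Draw 200,000 simulated returns (in dollars) per $100,000 invested
x = 100000*rexp(rate = 1, n = 200000)
## Plot a smoothed density estimate of the simulated returns
ggplot() + geom_density(aes(x))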
All this code does is draw 200,000 random samples from the exponential distribution with rate parameter \(\lambda = 1\), scale each one by 100,000, and plot a smoothed density estimate of the results.
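As a rough check that the simulated curve really does approximate the mathematical function, the sketch below (base R only) compares a kernel density estimate of the simulated draws, computed with `density()` using the same default bandwidth rule as `geom_density()`, against the exact density from `dexp()`. On the dollar scale, \(X = 100000 \cdot Y\) with \(Y\) exponential with rate 1 is itself exponential with rate 1/100,000. The seed and the evaluation points are arbitrary choices made for illustration.

```r
## Compare the simulated density with the exact exponential density it approximates
set.seed(1)  # arbitrary seed, for reproducibility only
x = 100000 * rexp(rate = 1, n = 200000)

## Kernel density estimate of the simulated draws (default "nrd0" bandwidth,
## similar to what geom_density() computes)
kde = density(x)

## Evaluate both curves at a few arbitrary dollar amounts
grid_points = c(50000, 100000, 200000, 300000)
simulated = approx(kde$x, kde$y, xout = grid_points)$y
exact = dexp(grid_points, rate = 1 / 100000)  # X ~ Exponential(rate = 1/100000)

print(data.frame(x = grid_points, simulated = simulated, exact = exact))
```

The two columns should agree closely, and more simulated draws would bring them closer still.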
We can use these simulated samples to answer the questions we had posed earlier, without needing to rely on any complicated mathematics:
## Monte Carlo estimate of the expected value
mean(x)
## [1] 100077.8
## Estimated probability of losing money (getting back $100,000 or less)
sum(x <= 100000)/length(x)
## [1] 0.630415
It’s worth pointing out that the actual expected value of this model is exactly $100,000; the deviation we see is due to something known as Monte Carlo error, and we could reduce it by generating more samples.
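Under this model both questions also have exact answers: \(E[X] = 100{,}000 / \lambda = 100{,}000\) and \(P(X \le 100{,}000) = 1 - e^{-1} \approx 0.632\). The sketch below computes these with `pexp()` and shows how the Monte Carlo error shrinks as the number of draws grows. The standard error of `mean(x)` is roughly \(\mathrm{sd}(X)/\sqrt{n}\); since \(\mathrm{sd}(X) = 100{,}000\) here, 200,000 draws give a standard error of about $224, so a deviation like the $77.8 seen above is unremarkable. The sample sizes and seed are arbitrary choices.

```r
## Exact answers under the assumed model X = 100000 * Y, Y ~ Exponential(rate = lambda)
lambda = 1
exact_mean = 100000 / lambda           # E[X] = 100000 * E[Y] = 100000 / lambda
exact_p_loss = pexp(1, rate = lambda)  # P(X <= 100000) = P(Y <= 1) = 1 - exp(-lambda)

## Monte Carlo estimates for increasing numbers of simulated draws
set.seed(1)  # arbitrary seed, for reproducibility only
for (n in c(2000L, 200000L, 2000000L)) {
  x = 100000 * rexp(rate = lambda, n = n)
  se = sd(x) / sqrt(n)  # standard error of mean(x); shrinks like 1 / sqrt(n)
  cat(sprintf("n = %7d: mean = %9.1f (SE %6.1f), P(loss) = %.4f (exact %.4f)\n",
              n, mean(x), se, mean(x <= 100000), exact_p_loss))
}
```

Multiplying the number of draws by 100 cuts the standard error by a factor of 10.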
However, what if the films followed the example given in The Flaw of Averages, with an average profit of $20,000,000 (shared proportionally with you as an investor) and a 25% chance of losing money?
In reality, you’re unlikely to be limited to investing in a single film. So what if you spread your investment evenly across two films?
## Split $100,000 evenly across two films whose returns are drawn independently
x = 50000*rexp(rate = 1, n = 200000) + 50000*rexp(rate = 1, n = 200000)
## chances of losing money
sum(x <= 100000)/length(x)
## [1] 0.59326
Now suppose you spread your investment across five films:
## Split $100,000 evenly across five films whose returns are drawn independently
x = 20000*rexp(rate = 1, n = 200000) +
20000*rexp(rate = 1, n = 200000) +
20000*rexp(rate = 1, n = 200000) +
20000*rexp(rate = 1, n = 200000) +
20000*rexp(rate = 1, n = 200000)
## chances of losing money
sum(x <= 100000)/length(x)
## [1] 0.55942
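The same pattern can be pushed further by generalizing the two- and five-film code to \(k\) films. The sketch below assumes, as the code above does, that each film independently returns an exponentially distributed amount; `simulate_portfolio()` is a hypothetical helper written only for this example, and the list of \(k\) values, the seed, and the ±$20,000 window are arbitrary choices. Because the sum of \(k\) independent Exponential(1) draws has a Gamma(shape = \(k\), rate = 1) distribution, the exact probability of losing money is `pgamma(k, shape = k, rate = 1)`, which gives a check on the simulation.

```r
## Hypothetical helper: total return when $100,000 is split evenly across k films,
## assuming each film independently returns (100000 / k) * Exponential(rate = 1)
simulate_portfolio = function(k, n = 200000) {
  total = numeric(n)
  for (i in seq_len(k)) {
    total = total + (100000 / k) * rexp(n, rate = 1)
  }
  total
}

set.seed(1)  # arbitrary seed, for reproducibility only
ks = c(1L, 2L, 5L, 20L, 100L)
results = data.frame(k = ks, p_loss_sim = NA_real_,
                     p_loss_exact = NA_real_, p_near_mean = NA_real_)
for (i in seq_along(ks)) {
  x = simulate_portfolio(ks[i])
  results$p_loss_sim[i] = mean(x <= 100000)
  ## Sum of k independent Exponential(1) draws is Gamma(shape = k, rate = 1)
  results$p_loss_exact[i] = pgamma(ks[i], shape = ks[i], rate = 1)
  ## Chance the total lands within +/- $20,000 of the expected $100,000
  results$p_near_mean[i] = mean(abs(x - 100000) <= 20000)
}
print(results)
```

Under these assumptions the probability of losing money drifts down toward 50% as \(k\) grows, while the chance of ending up near the expected $100,000 rises steadily.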
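The benefit of diversification is that a return close to the expected value becomes more likely. This is a consequence of the Central Limit Theorem: the average of many independent returns is approximately normally distributed, with a spread that shrinks as more films are added. Copied below is a GIF animation of the example used in The Flaw of Averages: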
This material was intended to provide a brief overview of a few core topics in probability, statistics, and decision making. It is meant to complement our reading of The Flaw of Averages.
Assignment: To receive completion credit for participating in class today, you are expected to turn in a 3-5 sentence paragraph summarizing the single main idea of this material and what you conclude from it. Your focus in this paragraph should be clarity, conciseness, and technical accuracy.