Individually, without sharing your responses with your partner, record whether you agree or disagree with each of the following statements as a reflection of yourself:
\(~\)
I’m going to randomly pair you with someone in our class, and you need to guess your partner’s response to each item before you know who you’re paired with. Your goal is to correctly guess whether your partner answered “agree” or “disagree” for each item.
Take a moment to record your guesses using the format:
Try to think carefully about these questions and how they might relate to each other when coming up with your guesses.
\(~\)
Now exchange responses with your partner and check the number you got correct.
Where did it fall on the distribution of simulation results we looked at on StatKey?
\(~\)
At the time I crafted this activity, it was impossible for me to know exactly how your guesses would unfold, but I anticipate that the results fall further into the tails of the model we saw earlier than you’d expect. That is, I’d anticipate more people getting 7 or more, or 3 or fewer, of their partner’s answers correct than that model predicts.
The reasoning behind this is rooted in the concept of covariance, which measures the extent to which two variables jointly vary. If these variables are appropriately standardized, the covariance between the standardized variables is their correlation coefficient.
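The link between covariance and correlation can be written out as follows. The symbols \(X\), \(Y\), \(\mu\), \(\sigma\), and \(Z\) are introduced here for illustration, with "standardized" taken to mean subtracting the mean and dividing by the standard deviation:

```latex
\[
\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big]
\]
\[
Z_X = \frac{X - \mu_X}{\sigma_X}, \qquad Z_Y = \frac{Y - \mu_Y}{\sigma_Y}
\]
\[
\operatorname{Cov}(Z_X, Z_Y) = E[Z_X Z_Y]
  = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}
  = \rho_{XY}
\]
```

Because \(Z_X\) and \(Z_Y\) each have mean 0 and standard deviation 1, their covariance is unit-free and always falls between -1 and 1.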
The questions you answered are part of an established psychometric questionnaire designed to measure the “big five” personality traits:
The Open-Source Psychometrics Project allows people to take this questionnaire (with responses on a 1-5 integer scale of disagree to agree with a neutral option).
The covariance matrix of standardized responses to the 10 statements you answered, computed from 19,719 individuals, is shown below:
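library(dplyr)
library(knitr)  # provides kable()
# Keep the 10 items E1 through E10
big5 <- read.delim("https://remiller1450.github.io/data/big5data.csv") %>% select(E1:E10)
# Standardize each item, then compute the sample covariance matrix
X <- scale(big5)
kable(1/(nrow(X)-1)*t(X) %*% X, digits = 2)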
| | E1 | E2 | E3 | E4 | E5 | E6 | E7 | E8 | E9 | E10 |
---|---|---|---|---|---|---|---|---|---|---|
E1 | 1.00 | -0.42 | 0.47 | -0.48 | 0.48 | -0.35 | 0.59 | -0.37 | 0.46 | -0.41 |
E2 | -0.42 | 1.00 | -0.45 | 0.53 | -0.54 | 0.57 | -0.48 | 0.37 | -0.36 | 0.46 |
E3 | 0.47 | -0.45 | 1.00 | -0.48 | 0.59 | -0.39 | 0.58 | -0.32 | 0.42 | -0.47 |
E4 | -0.48 | 0.53 | -0.48 | 1.00 | -0.51 | 0.47 | -0.50 | 0.45 | -0.45 | 0.51 |
E5 | 0.48 | -0.54 | 0.59 | -0.51 | 1.00 | -0.48 | 0.63 | -0.34 | 0.42 | -0.54 |
E6 | -0.35 | 0.57 | -0.39 | 0.47 | -0.48 | 1.00 | -0.41 | 0.32 | -0.33 | 0.41 |
E7 | 0.59 | -0.48 | 0.58 | -0.50 | 0.63 | -0.41 | 1.00 | -0.34 | 0.43 | -0.53 |
E8 | -0.37 | 0.37 | -0.32 | 0.45 | -0.34 | 0.32 | -0.34 | 1.00 | -0.51 | 0.38 |
E9 | 0.46 | -0.36 | 0.42 | -0.45 | 0.42 | -0.33 | 0.43 | -0.51 | 1.00 | -0.37 |
E10 | -0.41 | 0.46 | -0.47 | 0.51 | -0.54 | 0.41 | -0.53 | 0.38 | -0.37 | 1.00 |
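As a quick check on the matrix computation above, the sketch below uses simulated stand-in data (not the questionnaire responses) to confirm that the cross-product of standardized columns, divided by \(n - 1\), matches what `cor()` returns. The column names `a`, `b`, `c` and the simulated values are assumptions made purely for illustration.

```r
# Simulated stand-in data; these are NOT the questionnaire responses
set.seed(1)
n <- 500
z <- rnorm(n)                      # shared component that induces correlation
sim <- cbind(a = z + rnorm(n),     # positively related to z
             b = -z + rnorm(n),    # negatively related to z
             c = rnorm(n))         # unrelated to z

# Standardize each column (mean 0, sd 1), as scale() does by default
X <- scale(sim)

# Sample covariance matrix of the standardized columns
cov_std <- t(X) %*% X / (nrow(X) - 1)

# Largest absolute difference from the correlation matrix of the raw columns
max_diff <- max(abs(cov_std - cor(sim)))

print(round(cov_std, 2))
print(max_diff)
```

The difference should be zero up to floating-point rounding, which is why the diagonal of the table above is exactly 1.00.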
\(~\)
This questionnaire includes 50 items, or 10 per big five dimension, so we could also look at the covariance between all 50 statements:
full_big5 = read.delim("https://remiller1450.github.io/data/big5data.csv") %>% select(E1:O10) %>% scale()
corrplot::corrplot(cor(full_big5), method = "color")
You can clearly see the big five emerge when looking at the covariances of the ratings of each statement.
\(~\)
In a variety of areas you’ll frequently need to deal with multiple uncertainties. However, if the uncertainties are positively correlated, you’re disproportionately going to encounter outcomes that are much better or much worse than expected.
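A small simulation can illustrate how positive correlation pushes outcomes into the tails. This is only a sketch under stated assumptions: the "independent" case treats each of 10 guesses as a fair coin flip, and the "correlated" case gives each simulated guesser a single success probability, drawn from a Beta(2, 2) distribution, that is shared across all 10 items. The Beta(2, 2) choice and all variable names are illustrative and are not estimated from the class or questionnaire data.

```r
set.seed(2024)
n_sims  <- 100000   # number of simulated guessers
n_items <- 10       # number of statements guessed

# Independent guesses: each item is correct with probability 0.5
indep <- rbinom(n_sims, size = n_items, prob = 0.5)

# Positively correlated guesses (an illustrative assumption): each guesser
# has one success probability with mean 0.5, shared by all 10 of their items
p_shared <- rbeta(n_sims, shape1 = 2, shape2 = 2)
correl   <- rbinom(n_sims, size = n_items, prob = p_shared)

# Average number correct in each scenario
means <- c(independent = mean(indep), correlated = mean(correl))

# Share of guessers with 7 or more, or 3 or fewer, correct
tails <- c(independent = mean(indep >= 7 | indep <= 3),
           correlated  = mean(correl >= 7 | correl <= 3))

print(round(means, 2))
print(round(tails, 2))
```

Both scenarios average about 5 correct, but the correlated scenario places a noticeably larger share of guessers at 7 or more, or 3 or fewer.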
On Tuesday we discussed diversification, and it’s worthwhile to recognize that negative covariance can be beneficial in reducing risk without altering the expected value. In Chapter 13 of The Flaw of Averages, Sam Savage uses the example of splitting an investment between airlines and licorice (uncorrelated), or between airlines and petroleum (negatively correlated). The negative correlation reduces the variability of the combined return, because petroleum is highly profitable when airlines are doing poorly and vice versa.
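The role of covariance in a split investment can be seen from the standard formulas for a weighted sum. As a sketch, suppose a fraction \(w\) is placed in an investment with return \(X\) and the remaining \(1 - w\) in one with return \(Y\) (these symbols are introduced here for illustration and are not taken from Savage’s example):

```latex
\[
E\big[wX + (1 - w)Y\big] = w\,E[X] + (1 - w)\,E[Y]
\]
\[
\operatorname{Var}\big(wX + (1 - w)Y\big)
  = w^2 \operatorname{Var}(X) + (1 - w)^2 \operatorname{Var}(Y)
  + 2w(1 - w)\operatorname{Cov}(X, Y)
\]
```

The expected value does not involve the covariance at all, while for \(0 < w < 1\) the variance gets smaller as \(\operatorname{Cov}(X, Y)\) becomes more negative.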