Introduction

Individually, without sharing your responses with your partner, record whether you agree or disagree with each of the following statements as a reflection of yourself:

  1. I am the life of the party.
  2. I don’t talk a lot.
  3. I feel comfortable conversing with unfamiliar acquaintances.
  4. I prefer to keep in the background.
  5. I frequently start conversations with others.
  6. I often have little to say in social situations.
  7. I talk to a lot of different people at parties.
  8. I don’t like to draw attention to myself.
  9. I don’t mind being the center of attention.
  10. I am quiet around strangers.

\(~\)

Predictions

I’m going to randomly pair you with someone in our class, and you need to guess your partner’s response to each item before you know who you’re paired with. Your goal is to correctly guess whether your partner answered “agree” or “disagree” for each item.

Take a moment to record your guesses using the format:

  1. Disagree
  2. Agree
  \(\ldots\)
  10. Agree

Try to think carefully about these questions and how they might relate to each other when coming up with your guesses.

\(~\)

The uncorrelated model

Before checking your accuracy, let’s consider a simple probability model for this scenario.

Question #1: How could flipping a coin 10 times and recording the number of heads act as a reasonable probability model for this scenario?

Question #2: What is the expected value (for the number of items you get correct) of the model described in Question #1?

\(~\)

The proportion of correct answers under the model described in Question #1 can be explored via simulation using this StatKey page.

Question #3: Using the “right tail” check box and the blue box on the lower left, estimate the probability that you guess at least 7 of 10 answers correctly (according to this model).

Question #4: Using your own intuition, does this probability seem too high or too low? Briefly explain.

\(~\)

Testing your predictions

Now exchange responses with your partner and check the number you got correct.

Where did it fall on the distribution of simulation results we looked at on StatKey?

\(~\)

Covariance

When I wrote this activity, I couldn’t know exactly how your guesses would turn out, but I anticipate that they fall further into the tails than the model we saw earlier predicts. That is, I’d anticipate more people getting 7 or more, or 3 or fewer, of their partner’s answers correct than the coin-flip model suggests.

The reasoning behind this is rooted in the concept of covariance, which measures the extent to which two variables jointly vary. If these variables are appropriately standardized, the covariance between the standardized variables is their correlation coefficient.
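Written out, the definitions in the paragraph above look like this. This is a sketch in standard notation; the symbols \(\mu\), \(\sigma\), \(Z\), and \(n\) are introduced here for illustration only.

\[
\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big],
\qquad
\operatorname{Cor}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}
= \operatorname{Cov}\!\left(\frac{X - \mu_X}{\sigma_X},\; \frac{Y - \mu_Y}{\sigma_Y}\right)
\]

For data, if \(Z\) is an \(n \times p\) matrix whose columns have each been standardized to a sample mean of 0 and a sample standard deviation of 1, the sample covariance matrix of those columns is

\[
\frac{1}{n - 1} Z^{\top} Z ,
\]

which is the quantity that the expression `1/(nrow(X)-1)*t(X) %*% X` computes in the R code further down.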

The questions you answered are part of an established psychometric questionnaire designed to measure the “big five” personality traits:

  1. Extroversion
  2. Agreeableness
  3. Openness
  4. Conscientiousness
  5. Neuroticism

The Open-Source Psychometrics Project allows people to take this questionnaire (with responses on a 1-5 integer scale of disagree to agree with a neutral option).

The covariance matrix of standardized responses to the 10 statements you answered, computed from 19,719 individuals, is shown below:

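library(dplyr)
library(knitr)  # provides kable()
# Keep only the 10 extroversion items (columns E1 through E10)
big5 <- read.delim("https://remiller1450.github.io/data/big5data.csv") %>% select(E1:E10)
# Standardize each column to mean 0 and standard deviation 1
X <- scale(big5)
# Covariance of the standardized columns, which equals their correlation matrix
kable(1/(nrow(X)-1)*t(X) %*% X, digits = 2)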
        E1     E2     E3     E4     E5     E6     E7     E8     E9    E10
E1    1.00  -0.42   0.47  -0.48   0.48  -0.35   0.59  -0.37   0.46  -0.41
E2   -0.42   1.00  -0.45   0.53  -0.54   0.57  -0.48   0.37  -0.36   0.46
E3    0.47  -0.45   1.00  -0.48   0.59  -0.39   0.58  -0.32   0.42  -0.47
E4   -0.48   0.53  -0.48   1.00  -0.51   0.47  -0.50   0.45  -0.45   0.51
E5    0.48  -0.54   0.59  -0.51   1.00  -0.48   0.63  -0.34   0.42  -0.54
E6   -0.35   0.57  -0.39   0.47  -0.48   1.00  -0.41   0.32  -0.33   0.41
E7    0.59  -0.48   0.58  -0.50   0.63  -0.41   1.00  -0.34   0.43  -0.53
E8   -0.37   0.37  -0.32   0.45  -0.34   0.32  -0.34   1.00  -0.51   0.38
E9    0.46  -0.36   0.42  -0.45   0.42  -0.33   0.43  -0.51   1.00  -0.37
E10  -0.41   0.46  -0.47   0.51  -0.54   0.41  -0.53   0.38  -0.37   1.00

\(~\)

This questionnaire includes 50 items, or 10 per big five dimension, so we could also look at the covariance between all 50 statements:

# Standardize all 50 items (10 per trait), then plot their correlation matrix
full_big5 <- read.delim("https://remiller1450.github.io/data/big5data.csv") %>% select(E1:O10) %>% scale()
corrplot::corrplot(cor(full_big5), method = "color")

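You can clearly see the big five emerge in the covariances between the statement ratings: statements that measure the same trait covary much more strongly with each other than with statements that measure a different trait.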

\(~\)

Pragmatic use of covariance

In a variety of areas you’ll frequently need to deal with multiple uncertainties at once. If those uncertainties are positively correlated, you’ll encounter outcomes that are much better or much worse than expected more often than you would if they were independent.
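To make this concrete, here is a minimal simulation sketch in base R. It compares the number of "successes" out of 10 when the outcomes are independent coin flips with the number when they share a common underlying factor. The latent-factor setup, the value `rho = 0.45`, the seed, and all variable names are assumptions chosen for illustration; they are not estimated from the class data or the big five data set.

```r
set.seed(1)        # arbitrary seed, used only for reproducibility
n_sims  <- 10000   # number of simulated sets of 10 outcomes
n_items <- 10
rho     <- 0.45    # hypothetical correlation between the latent scores

# Independent model: each of the 10 outcomes is a fair coin flip
indep <- rbinom(n_sims, size = n_items, prob = 0.5)

# Correlated model: a shared factor z nudges all 10 outcomes the same way.
# z is recycled down the columns, so every entry in row i uses z[i].
z      <- rnorm(n_sims)
noise  <- matrix(rnorm(n_sims * n_items), nrow = n_sims)
latent <- sqrt(rho) * z + sqrt(1 - rho) * noise
correlated <- rowSums(latent > 0)   # count of successes in each row

# Proportion of simulations landing in either tail (7+ or 3 and under)
tail_prob <- function(x) mean(x >= 7 | x <= 3)

# Both models average about 5 successes, but the tails differ
round(c(independent = mean(indep), correlated = mean(correlated)), 2)
round(c(independent = tail_prob(indep), correlated = tail_prob(correlated)), 3)
```

Both models have the same expected count, since each outcome is a success with probability 0.5. Under these assumptions, only the spread of the count changes.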

On Tuesday we discussed diversification, and it’s worthwhile to recognize that negative covariance can be beneficial in reducing risk without altering the expected value. In Chapter 13 of The Flaw of Averages, Sam Savage uses the example of splitting an investment between airlines and licorice (uncorrelated), or between airlines and petroleum (negatively correlated). The negative correlation reduces the variability of the combined return, because petroleum is highly profitable when airlines are doing poorly and vice versa.
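The role of covariance here follows from the standard formula for the variance of a weighted sum. As a sketch, let \(X\) and \(Y\) be the returns of the two investments and let \(w\) be the fraction placed in the first; these symbols are introduced here for illustration and are not taken from Savage's book.

\[
E\big[wX + (1 - w)Y\big] = w\,E[X] + (1 - w)\,E[Y]
\]

\[
\operatorname{Var}\big(wX + (1 - w)Y\big) = w^2 \operatorname{Var}(X) + (1 - w)^2 \operatorname{Var}(Y) + 2w(1 - w)\operatorname{Cov}(X, Y)
\]

The expected value does not involve the covariance at all, while the variance shrinks as the covariance becomes more negative. For an even split (\(w = 1/2\)) between two investments that, as a simplifying assumption, share the same variance \(\sigma^2\) and have correlation \(\rho\), the variance reduces to

\[
\operatorname{Var}\!\left(\tfrac{1}{2}X + \tfrac{1}{2}Y\right) = \frac{\sigma^2}{2}\,(1 + \rho),
\]

so an uncorrelated pair (\(\rho = 0\)) halves the variance, and a negatively correlated pair (\(\rho < 0\)) reduces it further.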