Question

The “Iowa City home sales” data were introduced in labs 7 and 8.

This question first prepares the data for analysis by filtering to include only the two most common home types, “1 Story Frame” and “2 Story Frame”, and then by creating a new categorical variable “Size” that takes on the values:

  • Large if the home has over 2100 sq feet of living area
  • Medium if the home has between 1500 and 2100 sq feet of living area
  • Small if the home has less than 1500 sq feet of living area
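
The preparation step could be sketched in Python as follows. This is a minimal sketch and not the lab's own code: the record layout and the field names `style` and `living_area` are assumptions, and the example records are made up. The boundary values 1500 and 2100 are treated as Medium because the definitions above reserve Large for "over 2100" and Small for "less than 1500".

```python
KEPT_STYLES = {"1 Story Frame", "2 Story Frame"}

def size_category(living_area):
    """Map living area in square feet to the Size category."""
    if living_area > 2100:
        return "Large"
    if living_area < 1500:
        return "Small"
    return "Medium"  # 1500 to 2100, boundaries included

def prepare(homes):
    """Keep the two frame styles and attach a Size label to each record."""
    return [
        {**home, "size": size_category(home["living_area"])}
        for home in homes
        if home["style"] in KEPT_STYLES
    ]

# Hypothetical records for illustration only (field names are assumed)
example = [
    {"style": "1 Story Frame", "living_area": 1200},
    {"style": "2 Story Frame", "living_area": 2400},
    {"style": "Some Other Style", "living_area": 1800},
]
prepared = prepare(example)
```

Only the first two example records survive the filter, and each carries a new `size` field.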

The goal of this analysis is to describe the relationship between the explanatory variable “Style” and the response variable “Sale Amount”.

Shown below are a few data visualizations and a table that will be used for this question:

Two-way table of Size by Style

Style            Small   Medium   Large
1 Story Frame    0.830   0.138    0.032
2 Story Frame    0.446   0.304    0.250

Descriptive table of Sale Amount by Size

Size     Mean sale amount   Median sale amount   Std Dev sale amount
Small    143814.0           139900.0             42617.12
Medium   228906.5           218937.5             60551.21
Large    354631.6           310000.0             139087.11

Part A: In this question, the primary analysis aims to determine whether there is an association between “Style” and “Sale Amount”. Ignoring all other factors, is there an association between these two variables? Please provide a brief explanation.

  • Yes. As the first set of boxplots shows, the median sale amount is higher for 2-story frame homes than for 1-story frame homes, so these two variables are associated.

Part B: Consider the table involving the variables “Style” and “Size”. Is this table showing proportions or conditional proportions? If it is showing conditional proportions, which variable is it conditioning on?

  • It is showing conditional proportions, and it is conditioning on the variable “Style”.
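
One way to check which variable a table conditions on is to see which direction sums to 1. The short sketch below copies the proportions from the two-way table; the variable names are illustrative only.

```python
# Proportions copied from the two-way table of Size by Style
table = {
    "1 Story Frame": {"Small": 0.830, "Medium": 0.138, "Large": 0.032},
    "2 Story Frame": {"Small": 0.446, "Medium": 0.304, "Large": 0.250},
}

# Sum within each Style (row) and over the whole table
row_sums = {style: sum(props.values()) for style, props in table.items()}
grand_total = sum(row_sums.values())

for style, total in row_sums.items():
    print(f"{style}: row sum = {total:.3f}")
print(f"all cells: {grand_total:.3f}")
```

Each row sums to 1 while the cells together sum to 2. That pattern indicates proportions conditional on the row variable, “Style”; unconditional (joint) proportions would instead sum to 1 over the entire table.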

Part C: Is the variable “Style” associated with the variable “Size”? Briefly explain.

  • Yes, the distribution of sizes is different for each style. For example, 1-story homes have a much higher proportion of “small” in comparison to 2-story homes (0.83 vs. 0.446).

Part D: Is the variable “Size” associated with the variable “Sale Amount”? Briefly explain.

  • Yes, both the mean and median sale amount differ substantially across the categories of “Size”. For example, large homes sell for more than medium homes, which in turn sell for more than small homes.

Part E: Let’s revisit the relationship between “Style” and “Sale Amount”. Is this relationship confounded by the third variable “Size”? Justify your answer using the definition of confounding.

  • Yes. The variable “Size” is associated with both the explanatory variable “Style” (established in Part C) and the response variable “Sale Amount” (established in Part D), so it satisfies the definition of a confounding variable.
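
To see how much the difference in size mix alone could matter, the two tables can be combined in a small thought experiment. The sketch below assumes, purely for illustration, that within each size category both styles sell at the overall mean for that size; this assumption is not part of the original analysis, and the function name is hypothetical.

```python
# Conditional proportions of Size given Style, copied from the two-way table
size_share = {
    "1 Story Frame": {"Small": 0.830, "Medium": 0.138, "Large": 0.032},
    "2 Story Frame": {"Small": 0.446, "Medium": 0.304, "Large": 0.250},
}
# Mean sale amount by Size, copied from the descriptive table
mean_by_size = {"Small": 143814.0, "Medium": 228906.5, "Large": 354631.6}

def implied_mean(style):
    """Mean sale amount implied by a style's size mix alone (assumed model)."""
    return sum(size_share[style][s] * mean_by_size[s] for s in mean_by_size)

one_story = implied_mean("1 Story Frame")
two_story = implied_mean("2 Story Frame")
gap = two_story - one_story
print(f"1 Story: {one_story:,.0f}  2 Story: {two_story:,.0f}  gap: {gap:,.0f}")
```

Under that assumption the implied mean is about 162,000 for 1-story homes and about 222,000 for 2-story homes. A gap of roughly 60,000 appears even though style has no effect within any size category, which is why “Size” must be controlled for.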

Part F: Regardless of your answer to Part E, perform a stratified analysis using differences in medians to describe the relationship between the primary variables. You do not need to provide exact values; ballpark estimates are sufficient. Be clear about which median you’re subtracting from which, so that positive and negative differences are clearly defined.

  • Note that a stratified analysis is a separate analysis performed within each stratum created by the confounding variable. Thus, we should report a difference in medians (1-story minus 2-story) for each category of “Size”:
    • For small homes, 1-story homes have approximately the same median sale amount as 2-story homes (difference of around 0).
    • For medium homes, the median 1-story home sold for approximately 40,000 more than the median 2-story home.
    • For large homes, the median 1-story home sold for approximately 30,000 more than the median 2-story home.
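
The stratified comparison can be sketched in Python as below. This is a minimal illustration under stated assumptions: the field names `size`, `style`, and `sale_amount` are hypothetical, and the sale amounts are made up rather than taken from the Iowa City data. The difference is always computed as the 1-story median minus the 2-story median, so a positive value means 1-story homes sold for more.

```python
from collections import defaultdict
from statistics import median

def stratified_median_diffs(homes):
    """Return median(1 Story) - median(2 Story) sale amount within each Size."""
    groups = defaultdict(list)
    for home in homes:
        groups[(home["size"], home["style"])].append(home["sale_amount"])
    diffs = {}
    for size in ("Small", "Medium", "Large"):
        one = groups.get((size, "1 Story Frame"))
        two = groups.get((size, "2 Story Frame"))
        if one and two:  # skip strata that lack one of the styles
            diffs[size] = median(one) - median(two)
    return diffs

# Made-up sale amounts for illustration only
toy = [
    {"size": "Small", "style": "1 Story Frame", "sale_amount": 130000},
    {"size": "Small", "style": "1 Story Frame", "sale_amount": 140000},
    {"size": "Small", "style": "1 Story Frame", "sale_amount": 150000},
    {"size": "Small", "style": "2 Story Frame", "sale_amount": 135000},
    {"size": "Small", "style": "2 Story Frame", "sale_amount": 140000},
    {"size": "Small", "style": "2 Story Frame", "sale_amount": 145000},
    {"size": "Medium", "style": "1 Story Frame", "sale_amount": 230000},
    {"size": "Medium", "style": "1 Story Frame", "sale_amount": 250000},
    {"size": "Medium", "style": "2 Story Frame", "sale_amount": 190000},
    {"size": "Medium", "style": "2 Story Frame", "sale_amount": 210000},
]
diffs = stratified_median_diffs(toy)
```

Fixing the subtraction order inside the function keeps the sign convention the same in every stratum.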

Part G: Is Simpson’s paradox present in this analysis? Briefly explain.

  • Yes. Simpson’s paradox is a reversal of the direction of an association that occurs after controlling for a confounding variable. Here we saw higher median sale amounts for 2-story homes at first, but once we controlled for the variable “Size”, 1-story homes had median sale amounts that were about equal (small homes) or higher (medium and large homes) within every category of “Size”.