\(~\)

Question

The “Iowa City home sales” data were introduced in labs 7 and 8.

This question prepares the data for analysis in two steps: first, by filtering to include only the two most common home types, “1 Story Frame” and “2 Story Frame”; then, by creating a new categorical variable “Size” that takes on the values:

  • Large if the home has over 2100 sq feet of living area
  • Medium if the home has between 1500 and 2100 sq feet of living area
  • Small if the home has less than 1500 sq feet of living area
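
The preparation steps above can be sketched as follows. This is only an illustration in plain Python: the records are invented, the field names `style` and `area` are assumptions rather than the actual column names in the data set, and the treatment of homes at exactly 1500 or 2100 sq feet (counted as Medium here) is one reading of "between".

```python
# Hypothetical records; the field names "style" and "area" are assumptions,
# not the actual column names in the Iowa City home sales data.
homes = [
    {"style": "1 Story Frame", "area": 1200},
    {"style": "2 Story Frame", "area": 1500},
    {"style": "2 Story Frame", "area": 2100},
    {"style": "1 Story Frame", "area": 2400},
    {"style": "Split Foyer Frame", "area": 1800},
]

# The two home types kept for the analysis.
KEEP_STYLES = {"1 Story Frame", "2 Story Frame"}


def size_category(area):
    """Map living area (sq feet) to Small, Medium, or Large."""
    if area > 2100:
        return "Large"
    if area >= 1500:
        # Assumption: "between 1500 and 2100" includes both endpoints.
        return "Medium"
    return "Small"


# Step 1: filter to the two styles. Step 2: attach the new Size variable.
prepared = [
    {**home, "size": size_category(home["area"])}
    for home in homes
    if home["style"] in KEEP_STYLES
]

for home in prepared:
    print(home["style"], home["area"], home["size"])
```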

The goal of this analysis is to describe the relationship between the explanatory variable “Style” and the response variable “Sale Amount”.

Shown below are a few data visualizations and a table that will be used for this question:

Two-way table of Size by Style

Style            Small    Medium   Large
1 Story Frame    0.830    0.138    0.032
2 Story Frame    0.446    0.304    0.250

Descriptive table of Sale Amount by Size

Size      Mean sale amount   Median sale amount   Std Dev sale amount
Small     143814.0           139900.0             42617.12
Medium    228906.5           218937.5             60551.21
Large     354631.6           310000.0             139087.11

Part A: In this question, the primary analysis aims to determine whether there is an association between “Style” and “Sale Amount”. Ignoring all other factors, is there an association between these two variables? Please provide a brief explanation.

\(~\)

Part B: Consider the table involving the variables “Style” and “Size”. Is this table showing proportions or conditional proportions? If it is showing conditional proportions, which variable is it conditioning on?
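
One way to approach this kind of question is to check which totals add up to 1. The sketch below copies the values from the two-way table and sums them by row, by column, and overall. As a general rule, joint proportions sum to 1 across the whole table, while conditional proportions sum to 1 within each level of the conditioning variable. The variable names are illustrative only.

```python
# Proportions copied from the two-way table of Size by Style.
table = {
    "1 Story Frame": {"Small": 0.830, "Medium": 0.138, "Large": 0.032},
    "2 Story Frame": {"Small": 0.446, "Medium": 0.304, "Large": 0.250},
}
sizes = ("Small", "Medium", "Large")

# Sum across each row (one total per Style).
row_sums = {
    style: round(sum(cells.values()), 3) for style, cells in table.items()
}

# Sum down each column (one total per Size).
col_sums = {
    size: round(sum(table[style][size] for style in table), 3)
    for size in sizes
}

# Sum of every cell in the table.
grand_total = round(sum(row_sums.values()), 3)

print("Row sums:", row_sums)
print("Column sums:", col_sums)
print("Grand total:", grand_total)
```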

\(~\)

Part C: Is the variable “Style” associated with the variable “Size”? Briefly explain.

\(~\)

Part D: Is the variable “Size” associated with the variable “Sale Amount”? Briefly explain.

\(~\)

Part E: Let’s revisit the relationship between “Style” and “Sale Amount”. Is this relationship confounded by the third variable “Size”? Justify your answer using the definition of confounding.

\(~\)

Part F: Regardless of your answer to Part E, perform a stratified analysis using differences in medians to describe the relationship between the primary variables. You do not need to provide exact values; ballpark estimates are sufficient. Be clear about which median you’re subtracting from which, so that positive and negative differences are clearly defined.
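
As a rough illustration of what a stratified difference in medians looks like, the sketch below groups records by Size and, within each stratum, subtracts the “1 Story Frame” median from the “2 Story Frame” median. The sale amounts are made up purely for illustration and are not taken from the Iowa City data; the function name and record layout are also assumptions.

```python
from statistics import median

# Hypothetical (style, size, sale_amount) records, invented for illustration.
sales = [
    ("1 Story Frame", "Small", 130000),
    ("1 Story Frame", "Small", 140000),
    ("1 Story Frame", "Small", 150000),
    ("2 Story Frame", "Small", 120000),
    ("2 Story Frame", "Small", 135000),
    ("1 Story Frame", "Large", 300000),
    ("1 Story Frame", "Large", 340000),
    ("2 Story Frame", "Large", 290000),
    ("2 Story Frame", "Large", 310000),
    ("2 Story Frame", "Large", 330000),
]


def stratified_median_differences(records):
    """Within each Size stratum, return median(2 Story) - median(1 Story)."""
    strata = {}
    for style, size, amount in records:
        strata.setdefault(size, {}).setdefault(style, []).append(amount)

    diffs = {}
    for size, by_style in strata.items():
        # Only compare strata that contain both home styles.
        if "1 Story Frame" in by_style and "2 Story Frame" in by_style:
            diffs[size] = (
                median(by_style["2 Story Frame"])
                - median(by_style["1 Story Frame"])
            )
    return diffs


diffs = stratified_median_differences(sales)
for size, diff in diffs.items():
    # Positive means 2 Story homes have the higher median in that stratum.
    print(size, diff)
```

Fixing the direction of subtraction once, inside the function, keeps the sign of every stratum's difference comparable.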

\(~\)

Part G: Is Simpson’s paradox present in this analysis? Briefly explain.