\(~\)
The “Iowa City home sales” data were introduced in labs 7 and 8.
This question will prepare the data for analysis, first by filtering to include only the two most common home types, “1 Story Frame” and “2 Story Frame”, and then by creating a new categorical variable “Size” that takes on the values:

- Large if the home has over 2100 sq feet of living area
- Medium if the home has between 1500 and 2100 sq feet of living area
- Small if the home has less than 1500 sq feet of living area

The goal of this analysis is to describe the relationship between the variable “Style” and the response variable “sale amount”.
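The preparation steps described above can be sketched in Python with pandas. This is only an illustration under stated assumptions: the toy rows and the column names `style`, `area_living`, and `sale_amount` are hypothetical stand-ins for the lab data, and homes of exactly 1500 or 2100 sq feet are assumed to count as Medium.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the home sales data; rows and column names are hypothetical.
homes = pd.DataFrame({
    "style": ["1 Story Frame", "2 Story Frame", "1 Story Brick",
              "2 Story Frame", "1 Story Frame"],
    "area_living": [1200, 2100, 1800, 2500, 1500],
    "sale_amount": [140000, 230000, 200000, 360000, 150000],
})

# Step 1: keep only the two most common home types.
keep = ["1 Story Frame", "2 Story Frame"]
frame_homes = homes[homes["style"].isin(keep)].copy()

# Step 2: create the categorical variable "Size".
# Conditions are checked in order and the first match wins, so a home that is
# not over 2100 but is at least 1500 becomes "Medium" (assumed inclusive
# endpoints), and everything else is "Small".
area = frame_homes["area_living"]
frame_homes["Size"] = np.select(
    [area > 2100, area >= 1500],
    ["Large", "Medium"],
    default="Small",
)

print(frame_homes)
```

Writing the cutoffs as explicit conditions, rather than as bins, keeps the handling of the boundary values 1500 and 2100 visible and easy to change.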
Shown below are a few data visualizations and a table that will be used for this question:
Style | Small | Medium | Large |
---|---|---|---|
1 Story Frame | 0.830 | 0.138 | 0.032 |
2 Story Frame | 0.446 | 0.304 | 0.250 |

Size | Mean sale amount | Median sale amount | Std Dev sale amount |
---|---|---|---|
Small | 143814.0 | 139900.0 | 42617.12 |
Medium | 228906.5 | 218937.5 | 60551.21 |
Large | 354631.6 | 310000.0 | 139087.11 |
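A summary table of this kind (mean, median, and standard deviation of sale amount within each size group) could be produced along the following lines. This is a minimal sketch with made-up numbers; the column names `Size` and `sale_amount` are assumptions, and the values do not come from the Iowa City data.

```python
import pandas as pd

# Made-up sale amounts, two homes per size group (not the lab data).
sales = pd.DataFrame({
    "Size": ["Small", "Small", "Medium", "Medium", "Large", "Large"],
    "sale_amount": [100000, 120000, 200000, 240000, 300000, 400000],
})

# One row per size group, one column per summary statistic.
# pandas computes the sample standard deviation (ddof=1) by default.
summary = sales.groupby("Size")["sale_amount"].agg(["mean", "median", "std"])

# groupby sorts labels alphabetically; reorder them from smallest to largest.
summary = summary.reindex(["Small", "Medium", "Large"])

print(summary)
```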
Part A: In this question, the primary analysis aims to determine whether there is an association between “Style” and “Sale Amount”. Ignoring all other factors, is there an association between these two variables? Please provide a brief explanation.
\(~\)
Part B: Consider the table involving the variables “Style” and “Size”. Is this table showing proportions or conditional proportions? If it is showing conditional proportions, which variable is it conditioning on?
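To make the distinction between proportions and conditional proportions concrete, the sketch below builds all three versions of a two-way table from a tiny made-up data set. The column names `style` and `Size` are assumptions, and the example deliberately does not say which version matches the table in this question.

```python
import pandas as pd

# Five made-up homes (not the lab data).
toy = pd.DataFrame({
    "style": ["1 Story Frame"] * 3 + ["2 Story Frame"] * 2,
    "Size": ["Small", "Small", "Medium", "Small", "Large"],
})

# Joint proportions: every cell is divided by the grand total.
joint = pd.crosstab(toy["style"], toy["Size"], normalize="all")

# Conditional on the row variable: each row sums to 1.
by_row = pd.crosstab(toy["style"], toy["Size"], normalize="index")

# Conditional on the column variable: each column sums to 1.
by_col = pd.crosstab(toy["style"], toy["Size"], normalize="columns")

print(joint, by_row, by_col, sep="\n\n")
```

Checking whether the rows, the columns, or all cells together sum to 1 is a quick way to tell which kind of table is being shown.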
\(~\)
Part C: Is the variable “Style” associated with the variable “Size”? Briefly explain.
\(~\)
Part D: Is the variable “Size” associated with the variable “Sale Amount”? Briefly explain.
\(~\)
Part E: Let’s revisit the relationship between “Size” and “Sale Amount”. Is this relationship confounded by the third variable “Style”? Justify your answer using the definition of confounding.
\(~\)
Part F: Regardless of your answer to Part D, perform a stratified analysis using differences in medians to describe the relationship between the primary variables. You do not need to provide exact values; ballpark estimates are sufficient. Be clear about which median you’re subtracting from which, so that positive and negative differences are clearly defined.
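The mechanics of a stratified difference in medians can be sketched as follows. Everything here is illustrative: the data are invented, the column names are assumptions, and the direction of subtraction (2 Story Frame minus 1 Story Frame) is one possible convention rather than a required one.

```python
import pandas as pd

# Invented data: two homes per style within each size stratum (not the lab data).
homes = pd.DataFrame({
    "Size": ["Small"] * 4 + ["Large"] * 4,
    "style": ["1 Story Frame", "1 Story Frame", "2 Story Frame", "2 Story Frame"] * 2,
    "sale_amount": [100000, 120000, 130000, 150000,
                    300000, 340000, 280000, 300000],
})

# Median sale amount for each style within each size stratum.
medians = (
    homes.groupby(["Size", "style"])["sale_amount"]
    .median()
    .unstack("style")
)

# Difference within each stratum: 2 Story Frame minus 1 Story Frame.
# A positive value means 2 Story Frame homes had the higher median.
diff = medians["2 Story Frame"] - medians["1 Story Frame"]

print(diff)
```

Stating the subtraction order once, in a comment or in the write-up, is what gives the sign of each difference a clear meaning.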
\(~\)
Part G: Is Simpson’s paradox present in this analysis? Briefly explain.