Directions:
Provide R code and/or written/typed calculations sufficient for a third party to understand how you arrived at your solution.
\(~\)
A Gallup Youth Poll was conducted to determine the topics that teenagers most want to discuss with their parents. The findings show that 46% would like more discussion about the family’s financial situation, 37% would like to talk about school, and 30% would like to talk about religion. The survey was based on a national sampling of 505 teenagers, selected at random from all U.S. teenagers.
\(~\)
This exercise involves the Boston housing data set. To begin, load the Boston data set, which is part of the MASS library in R (note: this package comes installed by default; a full description of the data set is available at this link).
library(MASS)
data <- Boston
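A quick check that the data loaded as expected is to look at its dimensions and column types. The sketch below repeats the two loading lines so that it runs on its own; the choice of `medv` for the summary is only an illustration, and the help page `?Boston` describes every variable.

```r
library(MASS)        # provides the Boston data set
data <- Boston       # same assignment as in the assignment text

dim(data)            # number of rows and columns
str(data)            # column names and storage types
summary(data$medv)   # summary of one column (median home value), as an example
```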
\(~\)
Describe the differences between a parametric and a non-parametric modeling approach. What are the advantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)? What are its disadvantages? (a satisfactory answer should be 2-5 sentences)
\(~\)
Mechanical engineers at the University of Newcastle (Australia) investigated the use of timber in high-efficiency small wind turbine blades (Wind Engineering, January 2004). The strengths of two types of timber (radiata pine and hoop pine) were compared. Twenty specimens (called “coupons”) of each timber blade were fatigue tested by measuring the stress (in MPa) on the blade after various numbers of blade cycles. A simple linear regression analysis of the data (one conducted for each type of timber) yielded the following results (where y = stress and x = natural logarithm of number of cycles):
Radiata Pine: \(\hat{y} = 97.37 - 2.50 x\)
Hoop Pine: \(\hat{y} = 122.03 - 2.36 x\)
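To see how the two fitted lines are used, the sketch below evaluates each one at a given number of cycles and shows how the slope translates into a change in predicted stress. The helper names `predict_radiata` and `predict_hoop` and the value of `cycles` are hypothetical, chosen only for illustration; the chosen value may lie outside the range tested in the study.

```r
# Fitted lines reported in the text (y = stress in MPa, x = ln(number of cycles))
predict_radiata <- function(cycles) 97.37 - 2.50 * log(cycles)
predict_hoop    <- function(cycles) 122.03 - 2.36 * log(cycles)

cycles <- 1e6   # hypothetical number of blade cycles (illustration only)
c(radiata = predict_radiata(cycles), hoop = predict_hoop(cycles))

# Because x is on the natural-log scale, doubling the number of cycles changes
# the predicted stress by slope * log(2), whatever the starting number of cycles.
c(radiata = -2.50 * log(2), hoop = -2.36 * log(2))
```

Note that `log()` in R is the natural logarithm, which matches the definition of x above.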
\(~\)
A study was conducted to model the thermal performance of integral-fin tubes used in the refrigeration and process industries (Journal of Heat Transfer, August 1990). Twenty-four specially manufactured integral-fin tubes with rectangular fins made of copper were used in the experiment. Vapor was released downward into each tube and the vapor-side heat transfer coefficient (based on the outside surface area of the tube) was measured.
The dependent variable for the study is the heat transfer enhancement ratio, y, defined as the ratio of the vapor-side coefficient of the fin tube to the vapor-side coefficient of a smooth tube evaluated at the same temperature. Theoretically, heat transfer will be related to the area at the top of the tube that is “unflooded” by condensation of the vapor. The data in the table are the unflooded area ratio (x) and heat transfer enhancement (y) values recorded for the 24 integral-fin tubes.
The data needed for this question are given below:
Q5_data <- data.frame(
  unflooded_area_ratio = c(1.93, 1.95, 1.78, 1.64,
                           1.54, 1.32, 2.12, 1.88,
                           1.70, 1.58, 2.47, 2.37,
                           2.00, 1.77, 1.62, 2.77,
                           2.47, 2.24, 1.32, 1.26,
                           1.21, 2.26, 2.04, 1.88),
  heat_transfer_enhancement = c(4.4, 5.3, 4.5, 4.5,
                                3.7, 2.8, 6.1, 4.9,
                                4.9, 4.1, 7.0, 6.7,
                                5.2, 4.7, 4.2, 6.0,
                                5.8, 5.2, 3.5, 3.2,
                                2.9, 5.3, 5.1, 4.6)
)
\(~\)
Refer to the Journal of Heat Transfer study of the straight-line relationship between heat transfer enhancement (y) and unflooded area ratio (x), Exercise 3.22 (p. 109 in the textbook, or the previous question in this assignment). Construct a 95% confidence interval for \(\beta_1\), the slope of the line. Interpret the result.
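One possible way to obtain the interval is sketched below: fit the straight-line model with `lm()`, ask `confint()` for the slope interval, and cross-check it against the textbook formula \(\hat{\beta}_1 \pm t_{0.025,\,n-2} \cdot s_{\hat{\beta}_1}\). The data are repeated so the block runs on its own; the object names `fit`, `ci`, and `manual` are illustrative choices, not part of the assignment.

```r
# Data from the integral-fin tube study (24 tubes), repeated for self-containment
Q5_data <- data.frame(
  unflooded_area_ratio = c(1.93, 1.95, 1.78, 1.64, 1.54, 1.32, 2.12, 1.88,
                           1.70, 1.58, 2.47, 2.37, 2.00, 1.77, 1.62, 2.77,
                           2.47, 2.24, 1.32, 1.26, 1.21, 2.26, 2.04, 1.88),
  heat_transfer_enhancement = c(4.4, 5.3, 4.5, 4.5, 3.7, 2.8, 6.1, 4.9,
                                4.9, 4.1, 7.0, 6.7, 5.2, 4.7, 4.2, 6.0,
                                5.8, 5.2, 3.5, 3.2, 2.9, 5.3, 5.1, 4.6)
)

# Simple linear regression of y (enhancement) on x (unflooded area ratio)
fit <- lm(heat_transfer_enhancement ~ unflooded_area_ratio, data = Q5_data)

# 95% confidence interval for the slope, computed by R
ci <- confint(fit, "unflooded_area_ratio", level = 0.95)
ci

# Same interval by hand: estimate +/- t critical value * standard error
b1     <- coef(fit)[["unflooded_area_ratio"]]
se_b1  <- summary(fit)$coefficients["unflooded_area_ratio", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))   # n - 2 = 22 degrees of freedom
manual <- b1 + c(-1, 1) * t_crit * se_b1
manual
```

The two endpoints bound the change in mean heat transfer enhancement associated with a one-unit increase in the unflooded area ratio, at the 95% confidence level and under the usual regression assumptions.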
\(~\)
The British Journal of Sports Medicine (April 2000) published a study of the effect of massage on boxing performance. Two variables measured on the boxers were blood lactate concentration (mM) and the boxer’s perceived recovery (28-point scale). Based on information provided in the article, the data in the table were obtained for 16 five-round boxing performances, where a massage was given to the boxer between rounds. Conduct a test to determine whether blood lactate level (y) is linearly related to perceived recovery (x). Use \(\alpha = 0.10\) as the significance threshold.
The data needed for this question are given below:
Q7_data <- data.frame(
  blood_lactate_level = c(3.8, 4.2, 4.8, 4.1, 5.0, 5.3,
                          4.2, 2.4, 3.7, 5.3, 5.8, 6.0,
                          5.9, 6.3, 5.5, 6.5),
  perceived_recovery = c(7, 7, 11, 12, 12, 12, 13, 17,
                         17, 17, 18, 18, 21, 21, 20, 24)
)
\(~\)
The table below provides a data set containing six observations, three predictors, and one response variable.
Obs. | X1 | X2 | X3 | Y |
---|---|---|---|---|
1 | 0 | 3 | 0 | 2.5 |
2 | 2 | 0 | 0 | 1 |
3 | 0 | 1 | 3 | 7 |
4 | 0 | 1 | 2 | 5 |
5 | −1 | 0 | 1 | 0.5 |
6 | 1 | 1 | 1 | 3 |
Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.
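The first step in a K-nearest neighbors prediction is to measure how far each observation lies from the test point. The sketch below assumes Euclidean distance and, since Y is numeric in this table, assumes the prediction is the average of Y over the K nearest observations; neither the distance metric nor K is fixed by the text above. The names `knn_data`, `x0`, and `knn_predict` are hypothetical.

```r
# The six observations from the table
knn_data <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1, 0, 1),
  X3 = c(0, 0, 3, 2, 1, 1),
  Y  = c(2.5, 1, 7, 5, 0.5, 3)
)

# Test point: X1 = X2 = X3 = 0
x0 <- c(X1 = 0, X2 = 0, X3 = 0)

# Euclidean distance from each observation to the test point
predictors    <- as.matrix(knn_data[, c("X1", "X2", "X3")])
knn_data$dist <- sqrt(rowSums(sweep(predictors, 2, x0)^2))
knn_data[order(knn_data$dist), ]   # observations sorted from nearest to farthest

# KNN regression prediction: mean of Y over the k nearest observations (assumed rule)
knn_predict <- function(k) {
  nearest <- order(knn_data$dist)[seq_len(k)]
  mean(knn_data$Y[nearest])
}

knn_predict(1)   # uses only the single nearest observation
knn_predict(3)   # averages the three nearest observations
```

Small K follows the nearest points closely and gives a flexible but noisy prediction, while larger K averages over more points and gives a smoother one.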