Directions:

Question #1 (Gradient Descent)

Recall that the cost function for squared error loss can be written as:

\[\text{Cost} = \tfrac{1}{n}(\mathbf{y} - \mathbf{\hat{y}})^T(\mathbf{y} - \mathbf{\hat{y}})\]

Here \(\mathbf{y}\) is the vector of observed outcomes and \(\mathbf{\hat{y}}\) is a vector of model predictions.

In Poisson regression, \(\mathbf{\hat{y}} = e^{\mathbf{X}\mathbf{\hat{w}}}\).

The standard procedure for Poisson regression is to estimate the unknown weights using maximum likelihood estimation. However, for this question I’ll ask you to estimate a reasonable set of weights by differentiating the squared error cost function and minimizing it via gradient descent (which is not equivalent to maximum likelihood estimation in this scenario).
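For reference, differentiating this cost with respect to \(\mathbf{w}\) (applying the chain rule through the exponential) gives the gradient used in each descent update:

\[\nabla_{\mathbf{w}}\,\text{Cost} = \tfrac{2}{n}\,\mathbf{X}^T\big[(\mathbf{\hat{y}} - \mathbf{y}) \odot \mathbf{\hat{y}}\big]\]

where \(\odot\) denotes element-wise multiplication.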

```python
## Setup trial data
import numpy as np
import pandas as pd

ic = pd.read_csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
ic_y = ic['bedrooms']
ic_X = ic[['assessed','area.living']]

## Scale X
from sklearn.preprocessing import StandardScaler
ic_Xs = StandardScaler().fit_transform(ic_X)

## Fit via gradient descent, 250 iterations w/ 0.01 learning rate
gdres = grad_descent(X=ic_Xs, y=ic_y, w=np.zeros(2), alpha=0.01, n_iter=250)

## Minimum of the cost function across iterations
print(min(gdres[1]))
```
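The call above assumes a user-written `grad_descent` function. Below is a minimal sketch of one possible implementation, using the gradient derived earlier; the return value (a tuple holding the weight history and the per-iteration costs, so that `gdres[1]` is the cost sequence) is an assumption inferred from the usage above, not a required interface.

```python
import numpy as np

def grad_descent(X, y, w, alpha, n_iter):
    ## Gradient descent on squared error loss with Poisson-style predictions y_hat = exp(Xw)
    ## Returns (weight history, cost history); this interface is assumed, not prescribed
    n = X.shape[0]
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    w_hist, costs = [w.copy()], []
    for _ in range(n_iter):
        y_hat = np.exp(X @ w)                           ## current predictions
        resid = y - y_hat
        costs.append((resid @ resid) / n)               ## squared error cost
        grad = (2 / n) * (X.T @ ((y_hat - y) * y_hat))  ## chain rule through exp
        w = w - alpha * grad
        w_hist.append(w.copy())
    return w_hist, costs
```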

Comments: Poisson regression can be fit via maximum likelihood in sklearn using the PoissonRegressor class from sklearn.linear_model. The arguments alpha=0 and fit_intercept=False can be used to mimic the model fit to the example data in this question. However, you should expect somewhat different weight estimates, since maximum likelihood estimation is not equivalent to minimizing squared error loss for the Poisson regression model. Further, in this example, the variables “assessed” and “area.living” are highly collinear, so there are many combinations of weights that will fit the training data similarly well.
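As a point of comparison, a minimal sketch of this maximum likelihood fit (reusing the objects created above) might look like the following:

```python
from sklearn.linear_model import PoissonRegressor

## Maximum likelihood fit for comparison with the gradient descent weights;
## expect similar but not identical estimates
pr = PoissonRegressor(alpha=0, fit_intercept=False).fit(ic_Xs, ic_y)
print(pr.coef_)
```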

\(~\)

Question #2 (Softmax Regression vs. kNN)

For this question you should use the dataset available here:

https://remiller1450.github.io/data/beans.csv

These data were originally introduced in Homework #1. They were constructed using a computer vision system that segmented 13,611 images of 7 types of dry beans and extracted 16 features from each bean (12 dimension features and 4 shape form features). Additional details are contained in this paper. A starting sketch of the comparison named in the question title appears below.
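This is only a sketch under assumptions not stated in the question: the class label column is assumed to be named `Class`, and the split proportion, neighborhood size, and scaling choices are illustrative defaults rather than requirements.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

beans = pd.read_csv("https://remiller1450.github.io/data/beans.csv")
X = beans.drop(columns='Class')  ## label column name is an assumption
y = beans['Class']
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

## Softmax (multinomial logistic) regression vs. kNN, both on standardized features
softmax = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
for name, model in [('softmax', softmax), ('kNN', knn)]:
    print(name, model.fit(X_tr, y_tr).score(X_te, y_te))
```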

\(~\)

Question #3 (Linear Regression, Feature Expansion, and Regularization)

For this question you should use the dataset available here:

https://remiller1450.github.io/data/Ozone.csv

These data contain Ozone concentrations measured in New York City in 1973. For context, Ozone is a pollutant that has been linked to numerous health problems. The goal of this application is to develop methods for accurately predicting the Ozone concentration on a future date based upon the expected solar radiation, wind speed, and temperature on that date.
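One hedged sketch of the workflow suggested by the question title (feature expansion via polynomial terms, followed by a regularized linear fit) is shown below. The column names `Ozone`, `Solar.R`, `Wind`, and `Temp` are assumptions about the file’s contents, and the candidate penalty values are illustrative:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

oz = pd.read_csv("https://remiller1450.github.io/data/Ozone.csv").dropna()
X = oz[['Solar.R', 'Wind', 'Temp']]  ## assumed column names
y = oz['Ozone']

## Degree-2 feature expansion, standardization, then a ridge-penalized linear model
pipe = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                     StandardScaler(), Ridge())
search = GridSearchCV(pipe, {'ridge__alpha': [0.01, 0.1, 1, 10, 100]},
                      cv=5, scoring='neg_root_mean_squared_error')
search.fit(X, y)
print(search.best_params_, -search.best_score_)  ## best penalty and CV RMSE
```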

\(~\)

Question #4 (Modeling Choices)

Shown below is a hypothetical example data set that we discussed in the early stages of the semester. You will not be given access to the underlying data, so you should base your assessments on the information that is visible below: