Directions:

\(~\)

Question 1

Consider gradient boosting using the squared error cost function with linear regression as the base learner. You may use the fact that linear regression has a closed-form solution for its weights; you do not need to re-derive this solution yourself.
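
For squared error loss, the negative gradient of the cost with respect to the current predictions is proportional to the residuals, so each boosting round amounts to fitting the base learner to the residuals of the ensemble so far. Below is a minimal sketch of this idea; the function name boost_linear and the arguments n_rounds and lr are illustrative, not part of the question.

## Sketch: gradient boosting with linear regression base learners
import numpy as np

def boost_linear(X, y, n_rounds=10, lr=0.1):
    pred = np.zeros(len(y))
    models = []
    for _ in range(n_rounds):
        resid = y - pred                           ## residuals = negative gradient direction
        w = np.linalg.solve(X.T @ X, X.T @ resid)  ## closed-form least squares fit
        pred = pred + lr * (X @ w)                 ## shrunken additive update
        models.append(w)
    return models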

\(~\)

Question 2

Recall that the squared error cost function can be written as:

\[Cost = \tfrac{1}{n}(\mathbf{y} - \mathbf{\hat{y}})^T(\mathbf{y} - \mathbf{\hat{y}})\]

Here \(\mathbf{y}\) is the vector of observed outcomes and \(\mathbf{\hat{y}}\) is a vector of model predictions.
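
In numpy, this cost might be computed as follows (one possible helper function, not something required by the question):

## Squared error cost: (1/n) * (y - y_hat)^T (y - y_hat)
import numpy as np

def squared_error_cost(y, y_hat):
    resid = y - y_hat
    return (resid @ resid) / len(y)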

Poisson regression uses the form: \(\mathbf{\hat{y}} = e^{\mathbf{X}\mathbf{\hat{w}}}\)

The standard approach to Poisson regression is to estimate the unknown weights using maximum likelihood estimation. However, for this question I’ll ask you to estimate a reasonable set of weights by differentiating the squared error cost function and minimizing it via gradient descent (which is not equivalent to maximum likelihood estimation in this scenario).
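
Differentiating this cost with respect to \(\mathbf{w}\), with \(\mathbf{\hat{y}} = e^{\mathbf{X}\mathbf{w}}\), gives:

\[\nabla Cost = -\tfrac{2}{n}\mathbf{X}^T\big[(\mathbf{y} - \mathbf{\hat{y}}) \odot \mathbf{\hat{y}}\big]\]

Here \(\odot\) denotes element-wise multiplication. One way a grad_descent function might be organized around this gradient is sketched below; it is not the required solution, and the return format (final weights followed by a per-iteration cost history) is an assumption chosen to match the example call that follows.

## Sketch: gradient descent for the Poisson model under squared error
import numpy as np

def grad_descent(X, y, w, alpha, n_iter):
    n = X.shape[0]
    costs = []
    for _ in range(n_iter):
        y_hat = np.exp(X @ w)                        ## model predictions
        resid = y - y_hat
        costs.append((resid @ resid) / n)            ## squared error cost
        grad = -(2 / n) * (X.T @ (resid * y_hat))    ## gradient derived above
        w = w - alpha * grad                         ## descent update
    return w, costs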

## Set up trial data
import pandas as pd
import numpy as np

ic = pd.read_csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
ic_y = ic['bedrooms']
ic_X = ic[['assessed','area.living']]

## Scale X
from sklearn.preprocessing import StandardScaler
ic_Xs = StandardScaler().fit_transform(ic_X)

## Fit via grad descent, 250 iter w/ 0.01 learning rate 
gdres = grad_descent(X=ic_Xs,y=ic_y,w=np.zeros(2),alpha=0.01, n_iter=250)

## Min of cost function
print(min(gdres[1]))

Comments: Poisson regression can be fit via maximum likelihood in sklearn using the PoissonRegressor estimator. The arguments alpha=0 and fit_intercept=False can be used to mimic the model fit to the example data in this question. However, you should expect somewhat different weight estimates, since maximum likelihood estimation is not equivalent to minimizing squared error loss for the Poisson regression model. Further, in this example the variables “assessed” and “area.living” are highly correlated, so there are many combinations of weights that will fit the training data similarly well.
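
For reference, the maximum likelihood fit described in these comments can be obtained as follows (reusing ic_Xs and ic_y from the code above):

## Maximum likelihood fit, no penalty and no intercept
from sklearn.linear_model import PoissonRegressor

pr = PoissonRegressor(alpha=0, fit_intercept=False)
pr.fit(ic_Xs, ic_y)
print(pr.coef_)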

\(~\)

Question 3

Consider a simple neural network consisting of 1 hidden layer containing 4 neurons that use the sigmoid activation function, and a final output layer that applies the sigmoid function to a linear combination of activated outputs from the hidden layer. You are to apply the squared error cost function to the final activated output.

The input data used in your network will be generated by the make_moons() function in sklearn:

## Set up trial data
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1)

Note that the data have 2 input features and a binary outcome.
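
As a point of reference, the forward pass of this network might be organized as follows; the parameter names (W1, b1, w2, b2) and helper functions are illustrative assumptions, not a required interface.

## Sketch: forward pass for the 2-4-1 sigmoid network described above
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, W1, b1, w2, b2):
    A1 = sigmoid(X @ W1 + b1)      ## hidden activations, shape (n, 4)
    return sigmoid(A1 @ w2 + b2)   ## final activated output, shape (n,)

def cost(y, y_hat):
    return np.mean((y - y_hat) ** 2)  ## squared error cost on the output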