Directions:
Recall that the cost function for squared error loss can be written as:
\[Cost = \tfrac{1}{n}(\mathbf{y} - \mathbf{\hat{y}})^T(\mathbf{y} - \mathbf{\hat{y}})\]
Here \(\mathbf{y}\) is the vector of observed outcomes and \(\mathbf{\hat{y}}\) is a vector of model predictions.
In Poisson regression, \(\mathbf{\hat{y}} = e^{\mathbf{X}\mathbf{\hat{w}}}\).
The standard procedure for Poisson regression is to estimate the unknown weights using maximum likelihood estimation. However, for this question I’ll ask you to estimate a reasonable set of weights by differentiating the squared error cost function and optimizing it via gradient descent (which is not equivalent to maximum likelihood estimation for this scenario).
You may use np.diag() to set up a diagonal matrix. For reference, the minimum cost for the example data (see below) should be between 4.69 and 4.70.
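To make this concrete, here is one minimal sketch of what such a routine could look like. It is an illustration under assumptions rather than the required solution: the gradient it uses, \(\tfrac{2}{n}\mathbf{X}^T \mathrm{diag}(\mathbf{\hat{y}})(\mathbf{\hat{y}} - \mathbf{y})\), follows from applying the chain rule to the cost above with \(\mathbf{\hat{y}} = e^{\mathbf{X}\mathbf{w}}\), and the function name, arguments, and return value (final weights plus the cost at every iteration) are assumptions chosen to match the example call below.

import numpy as np

def grad_descent(X, y, w, alpha, n_iter):
    ## Hypothetical sketch: gradient descent on the squared error cost with y_hat = exp(Xw)
    y = np.asarray(y, dtype=float)
    n = len(y)
    costs = []
    for _ in range(n_iter):
        y_hat = np.exp(X @ w)                          # Poisson-style predictions
        costs.append((y - y_hat) @ (y - y_hat) / n)    # squared error cost at the current weights
        ## Gradient of the cost: (2/n) * X^T diag(y_hat) (y_hat - y)
        grad = (2 / n) * X.T @ (np.diag(y_hat) @ (y_hat - y))
        w = w - alpha * grad
    return w, costs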
## Setup trial data
import numpy as np
import pandas as pd

ic = pd.read_csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
ic_y = ic['bedrooms']
ic_X = ic[['assessed','area.living']]
## Scale X
from sklearn.preprocessing import StandardScaler
ic_Xs = StandardScaler().fit_transform(ic_X)
## Fit via grad descent, 250 iter w/ 0.01 learning rate
gdres = grad_descent(X=ic_Xs, y=ic_y, w=np.zeros(2), alpha=0.01, n_iter=250)
## Min of cost function
print(min(gdres[1]))
Comments: Poisson regression can be fit via maximum likelihood in sklearn using the PoissonRegressor function. The arguments alpha=0 and fit_intercept=False can be used to mimic the model fit to the example data in this question. However, you should expect somewhat different weight estimates since maximum likelihood estimation is not equivalent to minimizing squared error loss for the Poisson regression model. Further, in this example, the variables “assessed” and “area.living” are highly collinear, so there are many combinations of weights that will fit the training data similarly well.
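For instance, a minimal sketch of such a maximum likelihood fit on the same scaled example data (reusing ic_Xs and ic_y from the code above) might look like:

from sklearn.linear_model import PoissonRegressor

## Maximum likelihood fit with no regularization and no intercept, for comparison
pr = PoissonRegressor(alpha=0, fit_intercept=False)
pr.fit(ic_Xs, ic_y)
print(pr.coef_)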
\(~\)
For this question you should use the dataset available here:
https://remiller1450.github.io/data/beans.csv
These data were originally introduced in Homework #1. They were constructed using a computer vision system that segmented 13,611 images of 7 types of dry beans, extracting 16 features (12 dimension features and 4 shape form features). Additional details are contained in this paper.
random_state=1, and separate the outcome (Class) from the predictors. Then set up a pre-processing pipeline that includes three steps:
1. A normalizing transformation using the Yeo-Johnson method,
2. Rescaling using a min-max scaler,
3. Model fitting using Softmax regression with no regularization.
Use the “saga” solver with a maximum of 1000 iterations. You may ignore the “divide by zero encountered …” and “overflow encountered …” warning messages.
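A sketch of one way to assemble such a pipeline is shown below. It is only an illustration under assumptions: the train/test split proportion and the object names (beans, train_X, train_y, softmax_pipe) are not specified in the question, and penalty=None may need to be written as penalty='none' in older versions of sklearn.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, MinMaxScaler
from sklearn.linear_model import LogisticRegression

## Read the data and separate the outcome from the predictors
beans = pd.read_csv("https://remiller1450.github.io/data/beans.csv")
beans_y = beans['Class']
beans_X = beans.drop('Class', axis=1)

## Train/test split (the test proportion here is an assumption)
train_X, test_X, train_y, test_y = train_test_split(beans_X, beans_y, test_size=0.2, random_state=1)

## Pipeline: Yeo-Johnson normalization -> min-max scaling -> Softmax regression
softmax_pipe = Pipeline([
    ('yeojohnson', PowerTransformer(method='yeo-johnson')),
    ('minmax', MinMaxScaler()),
    ('model', LogisticRegression(penalty=None, solver='saga', max_iter=1000))
])
softmax_pipe.fit(train_X, train_y)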
Use the GridSearchCV function to compare the out-of-sample classification accuracy of the Softmax regression model from Parts A/B with a k-nearest neighbors classifier using 30 neighbors, distance weighting, and Euclidean distance. Which model performs better?
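One possible way to set up that comparison is sketched below; it reuses the softmax_pipe, train_X, and train_y objects from the previous sketch, and the 5-fold setting is an assumption since the number of folds is not stated in the question.

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

## Candidate models to swap into the final pipeline step
params = [
    {'model': [LogisticRegression(penalty=None, solver='saga', max_iter=1000)]},
    {'model': [KNeighborsClassifier(n_neighbors=30, weights='distance', metric='euclidean')]}
]

## Cross-validated comparison using classification accuracy
grid = GridSearchCV(softmax_pipe, params, scoring='accuracy', cv=5)
grid.fit(train_X, train_y)
print(grid.cv_results_['mean_test_score'])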
\(~\)

For this question you should use the dataset available here:
https://remiller1450.github.io/data/Ozone.csv
These data contain Ozone concentrations in New York City in 1973. For context, Ozone is a pollutant that has been linked to numerous health problems. The goal of this application is to develop methods for accurately predicting the Ozone concentration on a future date based upon the expected solar radiation, wind speed, and temperature on that date.
random_state=3. Then, separate the outcome from the predictors (dropping the “Day” column), and create a pre-processing pipeline that performs standardization before fitting a linear regression model.
Use the SplineTransformer to expand the original set of features to facilitate non-linear relationships. Then evaluate this new approach using 5-fold cross-validation (using the ‘neg_root_mean_squared_error’ score). Does the feature expansion seem beneficial, or does it appear to result in overfitting?
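As a rough illustration (assuming the oz_train_X and oz_train_y objects from the previous sketch; the SplineTransformer settings shown are its defaults rather than values given in the question):

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.linear_model import LinearRegression

## Pipeline: spline basis expansion -> standardization -> linear regression
spline_pipe = Pipeline([
    ('splines', SplineTransformer()),
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
])

## 5-fold cross-validated RMSE (negated, so values closer to zero are better)
scores = cross_val_score(spline_pipe, oz_train_X, oz_train_y, cv=5,
                         scoring='neg_root_mean_squared_error')
print(scores.mean())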
Use RidgeCV with a log-spaced sequence of regularization amounts going from 0.01 (1.0e-02) to 100 (1.0e+02). Use 5-fold cross-validation and neg_root_mean_squared_error as the scoring metric. Report the optimal amount of regularization.
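One possible setup is sketched below; keeping the spline expansion from the previous part and the number of points in the alpha grid are both assumptions, and the upper end of the grid follows the written value of 100.

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler

## Log-spaced grid of regularization amounts from 0.01 to 100 (grid size is an assumption)
alphas = np.logspace(-2, 2, num=20)

## Ridge regression with the penalty chosen by 5-fold cross-validation
ridge_pipe = Pipeline([
    ('splines', SplineTransformer()),
    ('scaler', StandardScaler()),
    ('model', RidgeCV(alphas=alphas, cv=5, scoring='neg_root_mean_squared_error'))
])
ridge_pipe.fit(oz_train_X, oz_train_y)
print(ridge_pipe.named_steps['model'].alpha_)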
\(~\)

Shown below is a hypothetical example data set that we discussed in the early stages of the semester. You will not be given access to the underlying data, so you should base your assessments on the information that is visibly available: