Directions:
- Homework must be completed individually
- Please type your responses, clearly separating each question and
sub-question (A, B, C, etc.)
- You may type your written answers using Markdown chunks in a
Jupyter Notebook, or you may use any word-processing software and submit
your Python code separately
- Questions that require Python coding should include all commands
used to reach the answer, nothing more and nothing less
- Submit your work via P-web
Question #1 (Neural Networks and Back-propagation)
Consider a simple neural network consisting of 1 hidden layer
containing 4 neurons that use the sigmoid activation function. The
network will predict a numeric outcome, so the weighted outputs of each
neuron contribute directly to the outcome (rather than being passed into
another sigmoid function).
For future reference, this network will be applied to the Iowa City home
sales data set, using sale.amount as the outcome and area.living,
area.lot, and bedrooms as the predictors.
Because the outcome is numeric, you will use the squared error cost
function: \[Cost = \tfrac{1}{n}(\mathbf{y} -
\mathbf{\hat{y}})^T(\mathbf{y} - \mathbf{\hat{y}})\]
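This cost is just the mean of the squared residuals, written as an inner product. A minimal sketch in numpy (the vectors below are made-up values, not model output):

```python
import numpy as np

# Illustrative only: y and y_hat are arbitrary stand-in vectors.
y = np.array([3.0, 1.0, 2.0])
y_hat = np.array([2.5, 1.5, 2.0])

n = len(y)
resid = y - y_hat
cost = (resid @ resid) / n  # (1/n)(y - y_hat)^T (y - y_hat)
print(cost)
```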
- Part A: How many weight parameters are used to
generate the neurons in this network from the input features? Briefly
explain.
- Part B: How many weight parameters are used to
generate the predicted outcome from this network from the hidden
neurons? Briefly explain.
- Part C: Consider data on a single training example
with its predictors stored in the vector \(\mathbf{x}_i\). To prepare for a
single stochastic gradient descent update, differentiate the cost function
with respect to each weight parameter in the final layer of the
network.
- Part D: Now find the gradient component for each of
the remaining parameters, which are the biases and the remaining
weights.
- Part E: Using functions from Lab 4, part 2
as a guide, write a function that performs stochastic gradient
descent to train this network.
- Part F: Load the Iowa City home sales data (located
at https://remiller1450.github.io/data/IowaCityHomeSales.csv),
perform an 80-20 train-test split with random_state=7, then
separate the predictors from the outcome.
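The split-and-separate step might look like the following sketch. To keep it self-contained, a tiny made-up data frame stands in for the real CSV; in your homework you would read the file from the URL given in Part F instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# In the assignment you would load the real file:
# homes = pd.read_csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
# The values below are invented stand-ins with the same column names.
homes = pd.DataFrame({
    "sale.amount": [150000, 200000, 175000, 210000, 160000,
                    185000, 230000, 145000, 199000, 205000],
    "area.living": [1200, 1800, 1500, 2000, 1300, 1600, 2100, 1100, 1700, 1900],
    "area.lot":    [5000, 8000, 6500, 9000, 5200, 7000, 9500, 4800, 7500, 8200],
    "bedrooms":    [2, 3, 3, 4, 2, 3, 4, 2, 3, 3],
})

# 80-20 split with the required random_state
train, test = train_test_split(homes, test_size=0.2, random_state=7)

# Separate the predictors from the outcome
X_train = train[["area.living", "area.lot", "bedrooms"]]
y_train = train["sale.amount"]
X_test = test[["area.living", "area.lot", "bedrooms"]]
y_test = test["sale.amount"]
```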
- Part G: Standardize the model outcome, sale.amount, by subtracting
its mean then dividing by its standard deviation (given by np.std),
and rescale the predictors using MinMaxScaler(). Note: these steps
provide numerical stability.
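These two preprocessing steps can be sketched as follows; the arrays are made-up stand-ins for the training outcome and predictor matrix from Part F.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented values standing in for sale.amount and the predictors.
y_train = np.array([150000., 200000., 175000., 210000., 160000.])
X_train = np.array([[1200., 5000., 2.],
                    [1800., 8000., 3.],
                    [1500., 6500., 3.],
                    [2000., 9000., 4.],
                    [1300., 5200., 2.]])

# Standardize the outcome; np.std uses the population denominator n.
y_std = (y_train - y_train.mean()) / np.std(y_train)

# Rescale each predictor column to the interval [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)
```

Note that the same fitted scaler (and the training mean and standard deviation) should be applied to the test set rather than refitting on it.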
- Part H: Train your network with a learning rate of
0.00001 and graph the cost function over 100 passes through the training
data. Your network should reach a minimum cost of approximately 0.91 (if
you’ve standardized properly).
- Part I: Use MLPRegressor, the implementation of neural networks
in sklearn, to fit the same model to the same data. Be sure to
specify the proper activation function, number of hidden layers, and
number of neurons. Then, print the fitted model’s final loss and
verify that it’s roughly 0.45.
Note: this loss should be half of what you found because
sklearn defines squared error cost as \(\tfrac{1}{2n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\),
but we haven’t included the “2”.
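The MLPRegressor call might look like the sketch below. The network architecture (one hidden layer of 4 logistic neurons) matches Question 1, but the training data here are synthetic stand-ins, and the solver and learning rate are illustrative choices, not requirements from the assignment.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the standardized Iowa City training data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)

# One hidden layer of 4 sigmoid ("logistic") neurons, as in Question 1.
net = MLPRegressor(hidden_layer_sizes=(4,), activation="logistic",
                   solver="sgd", learning_rate_init=0.01,
                   max_iter=2000, random_state=7)
net.fit(X, y)

# sklearn's loss_ attribute is (1/2n) * sum of squared errors
print(net.loss_)
```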
Question #2 (Application)
Lab 2, part 2 introduced the MNIST handwritten digit data, which is
available at this link:
https://remiller1450.github.io/data/mnist_small.csv
Recall that this CSV file contains a flattened version of the
handwritten examples, which were originally 28x28 pixel grayscale
images.
- Part A: Load the MNIST data into Python, create a
90-10 training-testing split of these data using random_state=10,
then separate the outcome from the predictors.
- Part B: Using the flattened data, build an
artificial neural network containing 3 hidden layers that will classify
each of the 10 digits. You may use any number of neurons you deem
appropriate in these layers.
- Part C: Optimize the parameters of your network
from Part B using the training data, the cross-entropy cost function,
and the pytorch implementation of stochastic gradient descent. Create
a line graph to demonstrate that the cost has reached a minimum.
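A full-batch version of such a training loop can be sketched as follows. The data are random stand-ins for the flattened MNIST training set, and the hidden-layer sizes, learning rate, and epoch count are arbitrary choices (the assignment leaves them up to you).

```python
import torch
import torch.nn as nn

torch.manual_seed(10)

# Made-up stand-in for flattened MNIST data: 64 "images" of 784 pixels.
X = torch.rand(64, 784)
y = torch.randint(0, 10, (64,))

# Three hidden layers; sizes here are an arbitrary illustrative choice.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),  # one output per digit class
)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

costs = []
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    costs.append(loss.item())

# A line graph of `costs` (e.g. plt.plot(costs)) should show the
# cost flattening out near its minimum.
```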
- Part D: Find and report the classification accuracy
of your trained network from Parts B/C on the test data.
- Part E: Reshape the data into a properly formatted
tensor to be used in a convolutional neural network following the
conventions discussed in Lab 8.
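The reshape itself is a one-liner once you know the convention PyTorch convolutional layers expect, namely (batch, channels, height, width). A sketch with a made-up stand-in for the flattened predictors:

```python
import numpy as np

# Stand-in for the flattened MNIST predictors: 5 rows of 784 pixels.
X_flat = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# PyTorch convolutional layers expect (batch, channels, height, width);
# MNIST images are 28x28 grayscale, so there is a single channel.
X_images = X_flat.reshape(-1, 1, 28, 28)
print(X_images.shape)  # (5, 1, 28, 28)
```

The resulting array can then be wrapped with torch.tensor() before being passed to the network.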
- Part F: Using the reshaped data, build a
convolutional neural network containing at least 2 convolutional layers.
You may use any parameters, such as kernel size and stride, that you
deem appropriate in these layers.
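One possible architecture is sketched below; the kernel sizes, strides, pooling, and channel counts are arbitrary choices, since the assignment leaves these parameters up to you. The comments track how each layer changes the spatial dimensions.

```python
import torch
import torch.nn as nn

# Channel counts, kernel sizes, and strides here are illustrative only.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),   # 28x28 -> 28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                                       # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),  # 14x14 -> 14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                                       # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                             # 10 digit classes
)

# Forward pass on a dummy batch to check that the shapes line up.
out = model(torch.rand(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```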
- Part G: Optimize the parameters of your network
from Part F using the training data, the cross-entropy cost function,
and the pytorch implementation of stochastic gradient descent. Create
a line graph to demonstrate that the cost has reached a minimum.
- Part H: Find and report the classification accuracy
of your trained network from Parts F/G on the test data. Based upon
these results, did the use of convolutional layers lead to improved
performance? Briefly explain why you believe performance did/didn’t
improve when using convolutional layers.