Directions:
- Homework must be completed individually
- Please type your responses, clearly separating each question and
sub-question (A, B, C, etc.)
- You may type your written answers using Markdown chunks in a
Jupyter Notebook, or you may use any word-processing software and submit
your Python code separately
- Questions that require Python coding should include all commands
used to reach the answer, nothing more and nothing less
- Submit your work via P-web
Question #1 (Neural Networks and Back-propagation)
Consider a simple neural network consisting of 1 hidden layer
containing 4 neurons that use the sigmoid activation function. The
network will predict a numeric outcome, so the weighted outputs of each
neuron contribute directly to the outcome (rather than being passed into
another sigmoid function).
For future reference, this network will be applied to the Iowa City home
sales data set, using sale.amount as the outcome and area.living,
area.lot, and bedrooms as the predictors.
Because the outcome is numeric, you will use the squared error cost
function: \[Cost = \tfrac{1}{n}(\mathbf{y} -
\mathbf{\hat{y}})^T(\mathbf{y} - \mathbf{\hat{y}})\]
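This cost is just the mean of the squared residuals, written as an inner product. A minimal sketch in numpy (the vectors below are made-up values, not model output):

```python
import numpy as np

# Illustrative only: y and y_hat are arbitrary stand-in vectors.
y = np.array([3.0, 1.0, 2.0])
y_hat = np.array([2.5, 1.5, 2.0])

n = len(y)
resid = y - y_hat
cost = (resid @ resid) / n  # (1/n)(y - y_hat)^T (y - y_hat)
print(cost)
```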
- Part A: How many weight parameters are used to
generate the neurons in this network from the input features? Briefly
explain.
- Part B: How many weight parameters are used to
generate the predicted outcome from this network from the hidden
neurons? Briefly explain.
- Part C: Consider data on a single training example
with its predictors stored in the vector \(\mathbf{x}_i\). To prepare for a
single stochastic gradient descent update, differentiate the cost function
with respect to each weight parameter in the final layer of the
network.
- Part D: Now find the gradient component for each of
the remaining parameters, which are the biases and the remaining
weights.
- Part E: Using functions from Lab 4, part 2
as a guide, write a function that performs stochastic gradient
descent to train this network.
- Part F: Load the Iowa City home sales data (located
at https://remiller1450.github.io/data/IowaCityHomeSales.csv),
perform an 80-20 train-test split with random_state=7, then
separate the predictors from the outcome.
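The split-and-separate step might look like the following sketch. To keep it self-contained, a tiny made-up data frame stands in for the real CSV; in your homework you would read the file from the URL given in Part F instead.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# In the assignment you would load the real file:
# homes = pd.read_csv("https://remiller1450.github.io/data/IowaCityHomeSales.csv")
# The values below are invented stand-ins with the same column names.
homes = pd.DataFrame({
    "sale.amount": [150000, 200000, 175000, 210000, 160000,
                    185000, 230000, 145000, 199000, 205000],
    "area.living": [1200, 1800, 1500, 2000, 1300, 1600, 2100, 1100, 1700, 1900],
    "area.lot":    [5000, 8000, 6500, 9000, 5200, 7000, 9500, 4800, 7500, 8200],
    "bedrooms":    [2, 3, 3, 4, 2, 3, 4, 2, 3, 3],
})

# 80-20 split with the required random_state
train, test = train_test_split(homes, test_size=0.2, random_state=7)

# Separate the predictors from the outcome
X_train = train[["area.living", "area.lot", "bedrooms"]]
y_train = train["sale.amount"]
X_test = test[["area.living", "area.lot", "bedrooms"]]
y_test = test["sale.amount"]
```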
- Part G: Standardize the model outcome, sale.amount, by subtracting
its mean then dividing by its standard deviation (given by np.std),
and rescale the predictors using MinMaxScaler(). Note: these steps
provide numerical stability.
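These two preprocessing steps can be sketched as follows; the arrays are made-up stand-ins for the training outcome and predictor matrix from Part F.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented values standing in for sale.amount and the predictors.
y_train = np.array([150000., 200000., 175000., 210000., 160000.])
X_train = np.array([[1200., 5000., 2.],
                    [1800., 8000., 3.],
                    [1500., 6500., 3.],
                    [2000., 9000., 4.],
                    [1300., 5200., 2.]])

# Standardize the outcome; np.std uses the population denominator n.
y_std = (y_train - y_train.mean()) / np.std(y_train)

# Rescale each predictor column to the interval [0, 1].
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)
```

Note that the same fitted scaler (and the training mean and standard deviation) should be applied to the test set rather than refitting on it.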
- Part H: Train your network with a learning rate of
0.00001 and graph the cost function over 100 passes through the training
data. Your network should reach a minimum cost of approximately 0.91 (if
you’ve standardized properly).
- Part I: Use MLPRegressor, the implementation of neural networks
in sklearn, to fit the same model to the same data. Be sure to
specify the proper activation function, number of hidden layers, and
number of neurons. Then, print the fitted model’s final loss and
verify that it’s roughly 0.45.
Note: this loss should be half of what you found because
sklearn defines squared error cost as \(\tfrac{1}{2n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\),
but we haven’t included the “2”.
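The MLPRegressor call might look like the sketch below. The network architecture (one hidden layer of 4 logistic neurons) matches Question 1, but the training data here are synthetic stand-ins, and the solver and learning rate are illustrative choices, not requirements from the assignment.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the standardized Iowa City training data.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)

# One hidden layer of 4 sigmoid ("logistic") neurons, as in Question 1.
net = MLPRegressor(hidden_layer_sizes=(4,), activation="logistic",
                   solver="sgd", learning_rate_init=0.01,
                   max_iter=2000, random_state=7)
net.fit(X, y)

# sklearn's loss_ attribute is (1/2n) * sum of squared errors
print(net.loss_)
```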
Question #2 (Application)
Lab 2, part 2 introduced the MNIST handwritten digit data, which is
available at this link:
https://remiller1450.github.io/data/mnist_small.csv
Recall that this CSV file contains a flattened version of the
handwritten examples, which were originally 28x28 pixel grayscale
images.
- Part A: Load the MNIST data into Python, create a
90-10 training-testing split of these data using random_state=10,
then separate the outcome from the predictors.
- Part B: Using the flattened data, build an
artificial neural network containing 3 hidden layers that will classify
each of the 10 digits. You may use any number of neurons you deem
appropriate in these layers.
- Part C: Optimize the parameters of your network
from Part B using the training data, the cross-entropy cost function,
and the pytorch implementation of stochastic gradient descent. Create
a line graph to demonstrate that the cost has reached a minimum.
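A full-batch version of such a training loop can be sketched as follows. The data are random stand-ins for the flattened MNIST training set, and the hidden-layer sizes, learning rate, and epoch count are arbitrary choices (the assignment leaves them up to you).

```python
import torch
import torch.nn as nn

torch.manual_seed(10)

# Made-up stand-in for flattened MNIST data: 64 "images" of 784 pixels.
X = torch.rand(64, 784)
y = torch.randint(0, 10, (64,))

# Three hidden layers; sizes here are an arbitrary illustrative choice.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),  # one output per digit class
)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

costs = []
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
    costs.append(loss.item())

# A line graph of `costs` (e.g. plt.plot(costs)) should show the
# cost flattening out near its minimum.
```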
- Part D: Find and report the classification accuracy
of your trained network from Parts B/C on the test data.
- Part E: Reshape the data into a properly formatted
tensor to be used in a convolutional neural network following the
conventions discussed in Lab 8.
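The reshape itself is a one-liner once you know the convention PyTorch convolutional layers expect, namely (batch, channels, height, width). A sketch with a made-up stand-in for the flattened predictors:

```python
import numpy as np

# Stand-in for the flattened MNIST predictors: 5 rows of 784 pixels.
X_flat = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# PyTorch convolutional layers expect (batch, channels, height, width);
# MNIST images are 28x28 grayscale, so there is a single channel.
X_images = X_flat.reshape(-1, 1, 28, 28)
print(X_images.shape)  # (5, 1, 28, 28)
```

The resulting array can then be wrapped with torch.tensor() before being passed to the network.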
- Part F: Using the reshaped data, build a
convolutional neural network containing at least 2 convolutional layers.
You may use any parameters, such as kernel size and stride, that you
deem appropriate in these layers.
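One possible architecture is sketched below; the kernel sizes, strides, pooling, and channel counts are arbitrary choices, since the assignment leaves these parameters up to you. The comments track how each layer changes the spatial dimensions.

```python
import torch
import torch.nn as nn

# Channel counts, kernel sizes, and strides here are illustrative only.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1),   # 28x28 -> 28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                                       # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),  # 14x14 -> 14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                                       # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                             # 10 digit classes
)

# Forward pass on a dummy batch to check that the shapes line up.
out = model(torch.rand(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```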
- Part G: Optimize the parameters of your network
from Part F using the training data, the cross-entropy cost function,
and the pytorch implementation of stochastic gradient descent. Create
a line graph to demonstrate that the cost has reached a minimum.
- Part H: Find and report the classification accuracy
of your trained network from Parts F/G on the test data. Based upon
these results, did the use of convolutional layers lead to improved
performance? Briefly explain why you believe performance did/didn’t
improve when using convolutional layers.