Directions:

Question #1 (Minkowski distance and clustering)

The Minkowski distance of order \(k\) between two points \(\mathbf{x_1}\) and \(\mathbf{x_2}\), each with \(p\) coordinates, is defined as:

\[D(\mathbf{x_1}, \mathbf{x_2}) = \bigg(\sum_{j=1}^{p}|x_{1,j} - x_{2,j}|^k\bigg)^{1/k}\]
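
As a quick check on the formula, here is a minimal Python sketch that evaluates it directly. The function name `minkowski_distance` and the example points are illustrative and not part of the assignment. The sketch assumes \(k \geq 1\), the range for which the formula defines a true distance metric.

```python
def minkowski_distance(x1, x2, k):
    """Minkowski distance of order k between two points with equal length."""
    if len(x1) != len(x2):
        raise ValueError("points must have the same number of coordinates")
    if k < 1:
        # Assumption of this sketch: only orders k >= 1 are allowed.
        raise ValueError("k must be at least 1")
    # Sum |x_{1,j} - x_{2,j}|^k over the p coordinates, then take the k-th root.
    total = sum(abs(a - b) ** k for a, b in zip(x1, x2))
    return total ** (1.0 / k)


# Illustrative points (not from the assignment data).
a = [0.0, 0.0]
b = [3.0, 4.0]
print(minkowski_distance(a, b, 1))  # k = 1: 3 + 4 = 7.0
print(minkowski_distance(a, b, 2))  # k = 2: sqrt(9 + 16) = 5.0
```

Setting \(k = 1\) gives the Manhattan distance and \(k = 2\) gives the Euclidean distance, so the two printed values illustrate how the choice of \(k\) changes the distance between the same pair of points.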

Question #2 (clustering application)

The data stored at this URL: https://remiller1450.github.io/data/Colleges2019_Complete.csv contain information on all primarily undergraduate colleges in the United States with at least 400 enrolled students for the year 2019. The given data have been further restricted to exclude any colleges that do not report one or more of the recorded variables.

Question #3 (principal components)

The data stored at this URL: https://remiller1450.github.io/data/mnist_small.csv are a subset of size \(n = 6000\) from the MNIST database of handwritten digits. This database contains 28 by 28 pixel grayscale images of the handwritten digits 0-9. The images in the sample at the given URL have been flattened, meaning each row represents one example image, with 784 columns (28 × 28) holding the grayscale intensities of that image’s pixels.
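
To make "flattened" concrete, the following Python sketch converts a 28 by 28 grid into a single row of 784 values and back again. The helper names `flatten` and `unflatten` are hypothetical. The row-major ordering (first image row, then second, and so on) is an assumption, since the ordering used in the file is not stated here. The toy image is made up for illustration and is not taken from the data.

```python
ROWS, COLS = 28, 28


def flatten(image):
    """Flatten a grid (list of rows) into one list, assuming row-major order."""
    return [pixel for row in image for pixel in row]


def unflatten(pixels, rows=ROWS, cols=COLS):
    """Rebuild a rows x cols grid from a flat list, assuming row-major order."""
    if len(pixels) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(pixels)))
    return [pixels[r * cols:(r + 1) * cols] for r in range(rows)]


# Toy image: all black (0) except one bright pixel at row 2, column 5.
image = [[0] * COLS for _ in range(ROWS)]
image[2][5] = 255

row = flatten(image)
print(len(row))        # 784 values, one per pixel
print(row.index(255))  # 2 * 28 + 5 = 61 under row-major order
print(unflatten(row) == image)  # True: the round trip recovers the grid
```

Under this assumed ordering, the pixel at grid position (r, c) lands at index 28r + c of the flattened row, which is how a single row of the data can be reshaped into an image for display.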