Overview

For this project you will engage in a comprehensive data modeling analysis of a dataset of your choosing. You will report the results of your analysis in the form of a 3-page executive report. You will be expected to fully document all code necessary to reproduce your modeling results.


Acceptable Datasets/Topics

For this project you may choose any dataset of sufficient complexity so long as it is real and does not have crowd-sourced analysis available on the internet.

Data found on websites like “data.world”, “Kaggle”, or websites corresponding to textbooks or academic sources are not acceptable choices for this project. However, data found on government websites, sports reference databases, economics databases, etc. are acceptable.

You also may work with data you’ve encountered in another capacity, such as another class, another research project, or an internship - but only if the analysis you conduct is sufficiently different from your previous work with the data.


Project Timeline

There are three required steps in completing this project: the initial proposal, two progress meetings during class, and the submission of your final report.


Project Topic and Proposal

All topics/datasets must be approved by Monday 4/19 at 11:59pm. To request approval, submit via Canvas a brief proposal containing:

  1. A link to your data source
  2. A brief description of the data
  3. A few sentences describing the nature of the question you seek to answer using a model


Progress Report Meetings

During each of the final two weeks of the semester, you will be expected to meet via Zoom to provide a progress update on your project. These updates will be scored qualitatively, and if you have not made sufficient progress by each meeting, there will be a negative impact on your project’s grade.

During the week of 4/19 - 4/23, you should be prepared to share your data cleaning/processing steps, along with multiple exploratory data visualizations at our meeting for your progress to be deemed sufficient.

During the week of 4/26 - 4/30, you should be prepared to share your model selection process, along with some preliminary modeling results at our meeting for your progress to be deemed sufficient.


Final Report

Your final report, your raw data, and any R code necessary to reproduce your analysis are due by noon on Thursday, May 6th. Note that this deadline coincides with our assigned final exam timeslot. You do not need to show up in person or virtually on this date; you can simply upload the required files to Canvas to submit your project.


Paper Components

Your final report is expected to contain the following five components:

  1. An introduction section that describes the study design, data collection, context, and purpose of project
  2. A research question (within your introduction section) that is specific, answerable, and interesting (e.g., “After adjusting for demographic/institutional factors, are there regional differences in the net tuition costs of private colleges?”)
  3. A methods section that describes how you approached creating a model to answer your research question. This should describe the steps of your analysis in detail without giving away any results.
  4. A results section that reports the results of the processes described in your methods section.
  5. A discussion section that puts your results into context and acknowledges any limitations of your data and/or the analysis approach you used.
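
To make the example research question in item 2 more concrete, here is a minimal sketch of one way an "after adjusting for ..." question could be framed as a comparison of nested models. It is only an illustration, not a required approach. The built-in mtcars data is a stand-in so the code runs (it would not be an acceptable project dataset), and the roles given to the variables (cyl as the grouping factor, wt and hp as adjustment variables) are assumptions made for the example.

```r
# Stand-in data: mtcars ships with R and is used only so the sketch runs.
dat <- mtcars
dat$cyl <- factor(dat$cyl)  # grouping factor, playing the role of "region"

# Reduced model: adjustment variables only
fit_reduced <- lm(mpg ~ wt + hp, data = dat)

# Full model: adjustment variables plus the grouping factor
fit_full <- lm(mpg ~ wt + hp + cyl, data = dat)

# Nested-model F-test: does the grouping factor explain additional
# variation in the outcome after adjusting for wt and hp?
group_test <- anova(fit_reduced, fit_full)
print(group_test)
```

A comparison like this would be described (without results) in the methods section, with the test output itself reported in the results section.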

Additionally, you are expected to include the following:

  1. A descriptive graph that summarizes the outcome of interest
  2. A table that summarizes your modeling results
  3. One or more additional graphs/tables that support any aspects of your results not captured in the graph/table mentioned above
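
As a rough illustration of the table in item 2, the sketch below collects coefficient estimates, 95% confidence intervals, and p-values from a fitted model into a single data frame that could then be formatted for the report. The mtcars data and the model formula are placeholders chosen so the code runs, and the column names of results_table are hypothetical.

```r
# Placeholder data and model; substitute your own dataset and formula.
dat <- mtcars
fit <- lm(mpg ~ wt + hp, data = dat)

# Coefficient matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
est <- coef(summary(fit))

# 95% confidence intervals (the default level for confint())
ci <- confint(fit)

# Combine into one tidy table, rounded for presentation
results_table <- data.frame(
  term     = rownames(est),
  estimate = round(est[, "Estimate"], 3),
  ci_lower = round(ci[, 1], 3),
  ci_upper = round(ci[, 2], 3),
  p_value  = signif(est[, "Pr(>|t|)"], 3),
  row.names = NULL
)
print(results_table)
```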

Finally, you may choose to include additional supporting materials in an appendix that does not count towards the three-page limit. This would be the place to include unformatted output from functions like stepAIC(), summary(), anova(), etc. that isn’t professional or polished enough in appearance to be directly included in a formal research report.
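
One possible way to collect unformatted output for an appendix is sketched below. It uses capture.output() to save printed results as text. Note that step() from base R's stats package is used here in place of MASS::stepAIC() so the example has no package dependencies; the data, the model formula, and the output file name are all placeholders.

```r
# Placeholder data and starting model; substitute your own.
dat <- mtcars
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = dat)

# AIC-based backward selection; trace = 0 suppresses the step-by-step log.
# (stats::step() stands in for MASS::stepAIC() in this sketch.)
selected_fit <- step(full_fit, direction = "backward", trace = 0)

# Capture the raw printed output as character vectors
appendix_lines <- c(
  "Appendix: raw model output",
  capture.output(summary(selected_fit)),
  capture.output(anova(selected_fit))
)

# Write to a text file (hypothetical name, placed in a temporary directory)
appendix_file <- file.path(tempdir(), "appendix_output.txt")
writeLines(appendix_lines, appendix_file)
```

Keeping the raw output in a separate file like this makes it easy to paste into an appendix while the body of the report shows only polished tables and graphs.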


Scoring

An outline of how your project will be scored is displayed below. Each component will be scored holistically based upon how well you fulfilled that aspect of the project.

Total: 100 points