For this project you will engage in a comprehensive data modeling analysis of a dataset of your choosing. You will report the results of your analysis in the form of a 3-page executive report. You will be expected to fully document all code necessary to reproduce your modeling results.
\(~\)
For this project you may choose any dataset of sufficient complexity so long as it is real and does not have crowd-sourced analysis available on the internet.
Data found on websites like “data.world”, “Kaggle”, or websites corresponding to textbooks or academic sources are not acceptable choices for this project. However, data found on government websites, sports reference databases, economics databases, etc. are acceptable.
You also may work with data you’ve encountered in another capacity, such as another class, another research project, or an internship - but only if the analysis you conduct is sufficiently different from your previous work with the data.
\(~\)
There are three required steps in completing this project: the initial proposal, two progress meetings during class, and the submission of your final report.
\(~\)
All topics/datasets must be approved by Monday 4/19 at 11:59pm. You should submit, via Canvas, a brief proposal containing:
\(~\)
During each of the final two weeks of the semester, you will be expected to meet via Zoom to provide a progress update on your project. These updates will be qualitatively scored, and if sufficient progress is not made between meetings, there will be a negative impact on your project’s grade.
During the week of 4/19 - 4/23, you should be prepared to share your data cleaning/processing steps, along with multiple exploratory data visualizations at our meeting for your progress to be deemed sufficient.
During the week of 4/26 - 4/30, you should be prepared to share your model selection process, along with some preliminary modeling results at our meeting for your progress to be deemed sufficient.
\(~\)
Your final report, along with your raw data and any R code necessary to reproduce your analysis, is due by noon on Thursday, May 6th. Note that this deadline coincides with our assigned final exam timeslot. You do not need to show up in-person or virtually on this date; rather, you can simply upload the required files onto Canvas to submit your project.
\(~\)
Your final report is expected to be broken up into five sections:
Additionally, you are expected to include the following:
Finally, you may choose to include additional supporting materials in an appendix that does not count towards the three-page limit. This would be the place to include unformatted output from functions like stepAIC(), summary(), anova(), etc. that isn’t professional or polished enough in appearance to be directly included in a formal research report.
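As a rough illustration of collecting this kind of raw output for an appendix, the sketch below captures the console output of stepAIC(), summary(), and anova() as text and writes it to a file. It is only one possible approach, not a required workflow. The built-in mtcars data is a placeholder so that the code runs (it would not be an acceptable project dataset), and the model formula and the file name appendix_output.txt are assumptions made for the example.

```r
# Illustrative sketch only: mtcars is a placeholder dataset and the
# formula below is an arbitrary example, not a recommended model.
library(MASS)  # provides stepAIC()

full_model <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)

# capture.output() returns the printed console output as a character vector.
# The assignment inside the call still creates selected_model in this environment.
step_output <- capture.output(
  selected_model <- stepAIC(full_model, direction = "backward")
)
summary_output <- capture.output(summary(selected_model))
anova_output   <- capture.output(anova(selected_model))

# Write the unformatted output to a plain-text file (hypothetical file name).
writeLines(
  c(step_output, "", summary_output, "", anova_output),
  "appendix_output.txt"
)
```

The resulting text file can be pasted into the appendix as-is, which keeps the three-page body of the report free for formatted tables and figures.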
\(~\)
An outline of how your project will be scored is displayed below. Each component will be scored holistically based upon how well you fulfilled that aspect of the project.
Total: 100 points