In this project you will work either individually, or with up to two classmates, on an applied statistical analysis. You are free to choose your topic and dataset (pending instructor approval of your proposal). The primary output of the analysis will be a 3-page paper. While the formal project requirements do not depend upon the size of your group, expectations will be higher for larger groups.
I encourage you to choose a topic that overlaps with your other interests (either academic or non-academic), and I have no issue with you using data that you’ve previously worked with in another context, such as another class, an internship, or a research project, so long as the way in which you use that data in this project is new.
Proposal:
Please submit (via Canvas) 1-2 paragraphs outlining:
If you are working with one or more partners, each group member needs to upload the same proposal to Canvas.
If you are having trouble thinking of a topic, I suggest browsing the catalog of over 200,000 publicly available datasets at data.gov as a starting point. Some other great data sources are Sports Reference and World Bank Open Data.
Please note: you may not use data from Kaggle.com, or any other source where users frequently post their own analyses.
Dataset and Data Dictionary:
Please submit (via Canvas) the following two items:
Paper:
Your paper should be no more than 3 pages (any spacing you find appropriate), including graphs and tables, but not including references or supplemental information. It should include the following components:
I encourage you to write your paper using R Markdown.
R Code:
Along with your paper, you should submit any R code used at any point in the project (including data cleaning and statistical analysis). If you wrote your paper in R Markdown, that is great! You can simply turn in both the “knit” and “Rmd” files.
Grading:
A rubric outlining how your paper will be evaluated is available here: Rubric Link
Your final score will be out of 60 points, with 50 coming from the paper itself and 10 coming from the proposal and dataset/data dictionary (these 10 points are entirely based upon timely completion and meeting the minimum requirements listed above).
The structure of this assignment is based upon the Undergraduate Statistics Class Project Competition (USCLAP), a national competition for class projects in introductory or intermediate statistics classes. I encourage you to consider submitting your paper in the introductory category. Each year numerous students win awards and honorable mentions, which are great items to include on your resume. In addition, submissions that place are invited to present at a virtual conference (another resume booster).
If you are unsure what a good project might look like, here are a few examples of papers that would receive near perfect scores:
The following opportunity can be used to recoup up to 6 points missed on exams during the semester. The premise is that you independently work through an R lab describing how to use methods, functions, or procedures we did not cover in class, and then you incorporate those methods into your project. You will receive 3 pts for each lab/method you complete and implement. Listed below are eligible methods and their corresponding labs:
ggplot2 graphics - Lab Link - you might make the plots and figures in your project report extra fancy with nice labels, colors, etc.
dplyr - Lab Link - you might merge multiple datasets together and use the combined dataset in your project.
leaflet - Lab Link - you might use this knowledge to add a map as supplementary information to your project report.
tidyr and dplyr - Lab Link #1 and Lab Link #2 - you might use this knowledge to help you clean and process your data, or you might use it to aggregate cases into a more meaningful format.
stringr - Lab Link - you might use this knowledge to process and analyze textual data as part of your project.
R to scrape a web source
To receive this extra credit, you must submit answers to the lab questions along with a brief explanation (just a couple of sentences is fine) describing how you incorporated content from the lab into your project. If you are working with one or more partners on the project, you are expected to complete these labs independently, but obviously the content will appear in everyone’s project (since you’re turning in the same report).