Overview

In this project you will analyze county-level data from the state of Ohio. The goal is to identify county characteristics that are predictive of cancer incidence (new cases) using aggregate data from three recent years (2015, 2016, and 2017). You will have the opportunity to model incidence for the cancer type/site of your choice as a function of the demographic characteristic that you determine to be most important.

\(~\)

Data Sources

\(~\)

Guidelines

\(~\)

Getting Started

  1. This project requires you to merge the two datasets. You should do this first, but be aware that the “Midwest” dataset contains counties in states other than Ohio whose names are identical to those of Ohio counties.
  2. Population sizes vary widely by county, so you should consider accounting for this when you form your outcome variable.
  3. You can choose a cancer type/site that you are interested in without any formal/statistical justification.
  4. When determining your model’s explanatory variable, please recognize that you have the freedom to report on any single demographic characteristic that you find most interesting. You also have the freedom to choose the type of model that is best suited to what you’d like to report. That said, I am expecting you to justify how you made these choices. You should follow the principles exhibited in the Data Exploration lab (Lab #2) to help you choose an explanatory variable, and you should follow the principles in the Model Fitting (Lab #3) and Model Evaluation (Lab #4) labs when choosing a model and presenting your results.
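
As a rough illustration of points 1 and 2, the sketch below filters a multi-state table down to Ohio before merging on county name, then forms a population-adjusted outcome. It is only one possible approach, written in plain Python rather than the tools used in the labs. Every column name (`county`, `state`, `population`, `cases`) is a hypothetical stand-in for whatever the real datasets use, the numbers are made up, and the per-100,000 scaling is an assumption, not a requirement.

```python
# Toy rows with MADE-UP numbers; all column names are hypothetical.
cancer = [  # Ohio cancer counts, one row per county
    {"county": "Union", "cases": 30},
    {"county": "Adams", "cases": 12},
]
midwest = [  # demographics for several states; county names repeat across states
    {"county": "Union", "state": "OH", "population": 50000},
    {"county": "Union", "state": "IN", "population": 7000},
    {"county": "Adams", "state": "OH", "population": 28000},
    {"county": "Adams", "state": "IL", "population": 66000},
]

# Keep only Ohio rows BEFORE merging, so each county name matches exactly once.
ohio = {row["county"]: row for row in midwest if row["state"] == "OH"}

merged = []
unmatched = []
for row in cancer:
    demo = ohio.get(row["county"])
    if demo is None:
        # No demographic match: record it for inspection instead of dropping silently.
        unmatched.append(row["county"])
        continue
    # Population-adjusted outcome: cases per 100,000 residents (an assumed scaling).
    rate = row["cases"] / demo["population"] * 100_000
    merged.append({**demo, **row, "rate_per_100k": rate})

for row in merged:
    print(row["county"], row["state"], round(row["rate_per_100k"], 1))
```

Merging on county name alone, without the state filter, would pair each Ohio county with every same-named county in the other states, so it is worth checking that the merged table has exactly one row per Ohio county.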

\(~\)

Grading

A rubric detailing how your project will be scored is available at this link