Introductory Activity (Sta-330, Sp25)

The cycle begins with data collection. For many projects the data has already been collected, but you still need to devote time and attention to how it was collected and the broader context behind it. For other projects, you may be expected to contribute to the planning of data collection.

These steps are repeated in response to new insights learned during previous passes through the cycle. It’s very difficult to achieve an ideal model, analysis, or conclusion on the first try.

At some point, results are either disseminated/deployed, or additional data is collected in hopes it might facilitate a better outcome.

Today’s activity

The rest of today will be devoted to a mini-project aimed at helping everyone get to know one another, covering presentation guidelines, and reflecting on the data science life cycle.

The data you’ll work with is available here:

https://remiller1450.github.io/data/admissions.csv

These data were collected by a public US university and were queried in response to allegations of sex-based discrimination in admissions to the university’s graduate programs.

Below are brief descriptions of the variables contained in these data:

ID - a unique applicant identifier
dept - an identifier of the graduate department the applicant applied to
sex - the sex of the applicant
gpa - the applicant’s undergraduate grade point average
admit - whether the applicant was admitted
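
Before starting any analysis, it can help to confirm that the file actually contains the variables described above. The following is a minimal sketch in Python using only the standard library. It assumes the column names in the CSV header match the list above exactly (including capitalization), which this page does not guarantee. The helper names `parse_admissions`, `missing_columns`, and `load_admissions` are made up for illustration.

```python
import csv
import io
from urllib.request import urlopen

# URL given above; the expected names are taken from the variable list
# and are an assumption about the actual CSV header.
DATA_URL = "https://remiller1450.github.io/data/admissions.csv"
EXPECTED_COLUMNS = ["ID", "dept", "sex", "gpa", "admit"]


def parse_admissions(text):
    """Parse CSV text into a list of dicts, one per applicant."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_columns(rows, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the parsed rows."""
    present = set(rows[0].keys()) if rows else set()
    return [name for name in expected if name not in present]


def load_admissions(url=DATA_URL):
    """Download the CSV and parse it (requires network access)."""
    with urlopen(url) as response:
        return parse_admissions(response.read().decode("utf-8"))
```

Calling `rows = load_admissions()` followed by `missing_columns(rows)` should return an empty list if the header matches the descriptions. Every value is read as a string, so `gpa` would need to be converted with `float` before any numeric summary.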

Your group’s goal is to use these data to decide whether there is sufficient evidence of discrimination in the university’s admissions.

You will prepare a short executive summary (2-4 sentences), along with data visualizations or models that support the claims made in your summary.
You are expected to go through the data science life cycle loop at least twice (i.e., revisit data manipulation and data visualization and exploration at least once after your initial analysis).
- One person in your group should keep a brief log of what you did during the first and second passes through the cycle.
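
One possible way to structure two passes through the cycle is to compute the same summary at two levels of grouping. The sketch below is only one option, not the required approach. It assumes the data has been read into a list of dicts keyed by the variable names above. The function `admit_rate_by` and the `is_admitted` argument are hypothetical, and because the coding of `admit` is not described on this page, the caller supplies the rule for what counts as admitted.

```python
from collections import defaultdict


def admit_rate_by(rows, keys, is_admitted):
    """Return {group: admission rate}, where group is a tuple of the values of `keys`.

    `is_admitted` is a caller-supplied function that maps a raw `admit`
    value to True or False, since the coding of `admit` is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [number admitted, number of applicants]
    for row in rows:
        group = tuple(row[key] for key in keys)
        if is_admitted(row["admit"]):
            counts[group][0] += 1
        counts[group][1] += 1
    return {group: admitted / total for group, (admitted, total) in counts.items()}
```

A first pass might call `admit_rate_by(rows, ["sex"], is_admitted)`, and a second pass might call `admit_rate_by(rows, ["dept", "sex"], is_admitted)`. Recording both results in your group's log makes it easy to compare what each pass suggested.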

After completing your analysis, you will be paired with another group with whom you’ll share your executive summaries, provide critiques, and attempt to reach a shared conclusion.
