Sta-330 Intro

Project #1

Client: Collin Nolte (University of Iowa), collin-nolte@uiowa.edu

Project description:

Spoken word recognition is a dynamic process that unfolds in time and is often measured through the analysis of eyetracking data. The degree to which a particular word is recognized is known as lexical activation. In a typical eyetracking study, a participant considers four on-screen images and selects the one that corresponds to a spoken word. A number of linguistically interesting phenomena influence this behavior, including phonetic similarity between words (wizard/lizard) and semantic similarity (snake/rope). In this project, we consider the impact of several such phenomena in the Hebrew language: instances in which words are morphologically related (share the same root consonants), semantically related (have similar meanings), or both. Our goal is to determine whether the trajectory of word recognition differs across these experimental conditions. Challenges include working with a relatively novel experimental paradigm and a small sample size. The project is largely exploratory; tasks will include standard data processing, writing functions in R to transform between different types of eyetracking-specific data, fitting non-linear curves to potentially sparse data, and exploring the use of novel statistical methodologies in analyzing the data.
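As a rough illustration of what "transforming between different types of eyetracking data" can mean, the sketch below converts fixation records into the proportion of trials fixating each image at a series of time points. It is written in Python for illustration only (the project itself calls for R), and the record layout `(trial_id, start_ms, end_ms, looked_at)`, the function name `fixation_proportions`, and the example values are all hypothetical, not the client's actual data format.

```python
from collections import defaultdict

def fixation_proportions(fixations, step_ms=100, max_ms=500):
    """Proportion of trials fixating each object at sampled time points.

    fixations: list of (trial_id, start_ms, end_ms, looked_at) tuples
               (hypothetical layout, assumed for this sketch).
    """
    n_trials = len({trial for trial, _, _, _ in fixations})
    time_points = range(0, max_ms, step_ms)
    counts = {t: defaultdict(int) for t in time_points}
    for _, start, end, looked_at in fixations:
        for t in time_points:
            # A fixation covers time t if it started at or before t and ended after t.
            if start <= t < end:
                counts[t][looked_at] += 1
    return {
        t: {obj: n / n_trials for obj, n in counts[t].items()}
        for t in time_points
    }

# Made-up example: two trials, times in milliseconds after word onset.
fixations = [
    (1, 0, 200, "target"),
    (1, 200, 500, "competitor"),
    (2, 0, 500, "target"),
]
proportions = fixation_proportions(fixations)
for t, props in proportions.items():
    print(t, props)
```

The resulting proportion-over-time series is the kind of trajectory to which a non-linear curve could then be fitted for each experimental condition.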

Possible methods: Data processing, developing eyetracking transformation functions for integration into future packages, analyzing differences in time series data

Possible outcomes: A collection of usable R code for data analysis and exploration, and a written report of findings with tables and visualizations

\(~\)

Project #2

Client: Dr. Tess Kulstad (Anthropology), kulstadt@grinnell.edu

Project description:

Ethnographic research suggests child fosterage arrangements take place in the context of family or economic crisis situations. The Demographic and Health Surveys (DHS) program regularly conducts comprehensive surveys on a broad range of gender-related data across the domains of empowerment, education, health and wellness, and human rights. This project uses DHS survey data for the Dominican Republic with the goal of developing statistical models to investigate whether mothers facing economic and family crises are more likely to give up their children for adoption than mothers not facing such circumstances. The available survey data record information at different levels (household, parent, child, etc.) that will need to be processed and aggregated prior to modeling. For the planned project, data cleaning steps will entail the identification, cleaning, and merging of information pertaining to the mother, child, household, and spouse. Statistical models will require the use of random effects to account for the nested structures present in the data. Some data preparation for this project has already occurred, so the initial steps will be unpacking and building upon the work of previous students.
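To make the aggregation step concrete, here is a minimal sketch of rolling child-level records up to one row per mother and attaching household-level fields. It is in Python for illustration only, and every field name (`mother_id`, `household_id`, `lives_with_mother`, `wealth_index`) and every value is a hypothetical stand-in, not an actual DHS variable or record.

```python
from collections import defaultdict

def build_mother_table(mothers, children, households):
    """Return one row per mother with child counts and household fields attached."""
    # Group child records by the mother they belong to.
    children_by_mother = defaultdict(list)
    for child in children:
        children_by_mother[child["mother_id"]].append(child)

    rows = []
    for mother in mothers:
        kids = children_by_mother.get(mother["mother_id"], [])
        household = households.get(mother["household_id"], {})
        rows.append({
            "mother_id": mother["mother_id"],
            "household_id": mother["household_id"],
            "n_children": len(kids),
            "n_children_away": sum(1 for k in kids if not k["lives_with_mother"]),
            **household,  # household-level fields are repeated for each mother
        })
    return rows

# Made-up records at three levels: mother, child, household.
mothers = [
    {"mother_id": "m1", "household_id": "h1"},
    {"mother_id": "m2", "household_id": "h1"},
    {"mother_id": "m3", "household_id": "h2"},
]
children = [
    {"child_id": "c1", "mother_id": "m1", "lives_with_mother": True},
    {"child_id": "c2", "mother_id": "m1", "lives_with_mother": False},
    {"child_id": "c3", "mother_id": "m3", "lives_with_mother": True},
]
households = {"h1": {"wealth_index": 2}, "h2": {"wealth_index": 4}}

mother_table = build_mother_table(mothers, children, households)
for row in mother_table:
    print(row)
```

Keeping `household_id` on every row preserves the nesting of mothers within households, which is the grouping a random effect would later need.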

Possible methods:

Data wrangling, feature engineering and selection, mixed effects regression, propensity score analysis, mediation analysis.

Possible outcomes:

Written report containing modeling results/code, professional visualizations, tables, and text that can be used in a future publication. Possible opportunity for an R Shiny dashboard displaying various modeling choices and the corresponding results.

\(~\)

Project #3

Client: Dr. Monty Roper (Director, Build a Better Grinnell), roperjm@grinnell.edu

Project description:

The Build a Better Grinnell 2030 Community Visioning Project is a comprehensive community project involving needs assessment, asset mapping, and action planning. The initial phase of the project administered a general survey to the entire Grinnell community, with specific actions taken to intentionally target certain populations within the broader community. The survey consisted of a series of open-ended questions aimed at identifying strengths, assets, and values, as well as needs and visions. Respondents also had the opportunity to self-describe their identity within the community. The aim of this project is to explore differences in responses across self-described identities and affinity groups. This will involve working with unstructured textual data for several hundred survey participants and extracting meaningful groupings and outcomes. Later stages of the project might involve network analysis or mapping of community organizations/assets mentioned by participants. More information on the Build a Better Grinnell project is available here: http://www.buildabettergrinnell.org/

Possible methods:

Data wrangling, natural language processing, clustering, network analysis.

Possible outcomes:

Data visualizations, interactive dashboards, or a technical report that can be used to inform or shape future stages of the Build a Better Grinnell 2030 Community Visioning Project.

\(~\)

Project #4

Client: Dr. Anna Olsen (Math and Statistics), olsenanna@grinnell.edu

Project description:

While the US Pacific coast is well known for its earthquakes, the most widely felt earthquake in US history was the magnitude 5.8 earthquake in Mineral, Virginia in 2011, because seismic waves can travel greater distances in the Central and Eastern US than similar waves on the Pacific coast. Engineers establish building protocols using assessments of the maximum plausible magnitude of earthquakes in a region, but the estimates for the Central and Eastern US may be unsatisfactory. This project looks to estimate the maximum plausible magnitude of earthquakes in the Central and Eastern US using a range of earthquake monitoring databases and statistical methods. For some data sources, such as the Global Centroid Moment Tensor catalog (https://www.globalcmt.org/CMTsearch.html), data can be obtained via querying and web scraping. Additional data sources may also be considered, which will require careful merging to ensure that no observations are repeated.
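One way to think about merging catalogs without repeating observations is to treat two records as the same earthquake when their origin times and locations nearly coincide. The sketch below is an illustration in Python under that assumption; the record fields, the tolerances (10 seconds, 0.5 degrees), the function names, and the example events are all hypothetical and would need to be chosen to suit the real catalogs.

```python
from datetime import datetime, timedelta

def is_same_event(a, b, max_dt=timedelta(seconds=10), max_deg=0.5):
    """Heuristic match: close in origin time and in latitude/longitude."""
    return (
        abs(a["time"] - b["time"]) <= max_dt
        and abs(a["lat"] - b["lat"]) <= max_deg
        and abs(a["lon"] - b["lon"]) <= max_deg
    )

def merge_catalogs(*catalogs):
    """Combine catalogs, keeping the first record seen for each event."""
    merged = []
    for catalog in catalogs:
        for event in catalog:
            if not any(is_same_event(event, kept) for kept in merged):
                merged.append(event)
    return sorted(merged, key=lambda e: e["time"])

# Made-up records: the first event appears in both catalogs with slightly
# different reported times, locations, and magnitudes.
catalog_a = [
    {"time": datetime(2000, 1, 1, 12, 0, 0), "lat": 37.9, "lon": -77.9,
     "mag": 5.8, "source": "catalog_a"},
]
catalog_b = [
    {"time": datetime(2000, 1, 1, 12, 0, 3), "lat": 38.0, "lon": -78.0,
     "mag": 5.7, "source": "catalog_b"},
    {"time": datetime(2000, 6, 1, 8, 30, 0), "lat": 35.0, "lon": -90.0,
     "mag": 4.2, "source": "catalog_b"},
]

merged = merge_catalogs(catalog_a, catalog_b)
for event in merged:
    print(event["time"], event["mag"], event["source"])
```

Because earlier catalogs take priority, the order of the arguments decides which source's magnitude is kept for a duplicated event. Note that this simple rule also collapses near-simultaneous records within a single catalog, which may or may not be desirable (for example, for closely spaced aftershocks).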

Additionally, Anna has several other earthquake-related projects that can be pursued if they are deemed of greater interest than the one described above. These include modeling ground motion based on earthquake sources, which are inferred from data collected at measurement stations, and investigating the calibration and conversion of historical earthquakes measured on different scales.

Possible methods: Web scraping, data wrangling, probability theory, extreme value theory, maps and data visualization

Possible outcomes: Modeling results, data visualizations, and/or a webpage and dashboard addressing the research question.