This page highlights a few different research areas where I’ve made contributions throughout my career. If you are a Grinnell College student interested in working with me on a research project, please read the information below:
\(~\)
I am currently working on passive detection of drug-impaired driving using machine learning and time-series data streams (ocular and vehicle) from driver monitoring systems. In particular, I’m planning on the following projects for Summer 2026:
The only requirement for me to consider your application is intermediate-level programming experience in either R or Python (i.e., organized projects involving more than just a few lines of data analysis code). Experience in machine learning and/or statistical modeling is desirable, but not required.
To better understand this line of research, I encourage you to browse the presentation slides linked below:
\(~\)
Many penalized regression methods, such as LASSO, elastic net, SCAD, and MCP, naturally perform variable selection during the model fitting process. For these models, a simple question that an analyst might ask is: “How many of the variables selected by the model are expected to be false discoveries?”
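As a rough illustration of that question, the sketch below approximates the expected number of noise features a lasso-penalized linear regression would select at a given penalty. It is a simplified heuristic, not necessarily the estimator used in the linked work. It assumes the objective is scaled as (1/(2n))·||y − Xb||² + λ·||b||₁, that features are standardized so each column has squared norm n, and that a pure-noise feature is independent of the residuals. The function names and arguments (`p`, `n`, `lam`, `sigma`, `n_selected`) are hypothetical and chosen only for this example.

```python
from math import sqrt
from statistics import NormalDist


def expected_false_discoveries(p, n, lam, sigma):
    """Approximate expected number of noise features selected by the lasso.

    Assumptions (illustrative only): linear model, objective scaled by 1/(2n),
    standardized features, and noise features independent of the residuals.
    Using all p features as the count of potential noise features makes the
    result an upper bound.
    """
    # For a noise feature, (1/n) * x_j' r is roughly N(0, sigma^2 / n), and the
    # feature is selected when its absolute value exceeds lam.
    tail = NormalDist().cdf(-lam * sqrt(n) / sigma)
    return 2.0 * p * tail


def marginal_fdr(p, n, lam, sigma, n_selected):
    """Expected false discoveries divided by the number selected, capped at 1."""
    if n_selected == 0:
        return 0.0
    return min(1.0, expected_false_discoveries(p, n, lam, sigma) / n_selected)
```

For example, `marginal_fdr(p=1000, n=100, lam=0.3, sigma=1.0, n_selected=10)` gives the estimated fraction of ten selected features that would be noise under these assumptions. In practice `sigma` is unknown and would have to be estimated from the fitted model's residuals.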
Read more:
\(~\)
With many states contemplating cannabis legalization, a better understanding of how the drug can impact all areas of driving performance is of interest. The National Advanced Driving Simulator (NADS) conducts cutting-edge research in the area of drugged and impaired driving using advanced driving simulator technology that allows for experimental designs that cannot be executed on real roadways. I have been actively involved in creating statistical models that evaluate the impact of cannabis (and other substances) on driver performance in scenarios involving distracted driving.
Read more:
\(~\)
A few other areas that I’m interested in are:
\(~\)
Fig 1: The figure above displays marginal false discovery estimates for a series of lasso penalized survival models for the survival outcomes of 442 early-stage lung cancer subjects in response to 22,283 gene expression measurements and additional clinical covariates. The left panel shows the number of genetic features selected by the lasso relative to the expected number of marginal false discoveries, while the right panel shows the expected marginal false discovery rate of each model.
Fig 2: The figure above displays modeling results from a single simulated data set containing various types of variables (features). The left panel shows the standard LASSO coefficient path that is returned by default from most standard software packages such as glmnet. From this path it is difficult to distinguish between important features and noise. The cross-validated model, which is indicated by the dotted vertical line, contains several noise variables that cannot be easily identified using just the coefficient path. The right panel displays each feature’s local marginal false discovery rate (mfdr) along the same sequence of models. This approach is capable of clearly distinguishing between important variables and noise; the method characterizes each of the noise variables in the cross-validated model as having a greater than 50% chance of being a false discovery.
Fig 3: The figure above shows blood THC concentrations by administered cannabis and alcohol doses during the first occurrence of the side-mirror task for each of the 19 participants. Each line represents a single subject across the six dosing conditions (Pla = Placebo, Alc = Alcohol, Low = Low THC, High = High THC).
Fig 4: Parallel analysis is an approach used to inform dimension reduction in questionnaires. This plot depicts the results of parallel analysis applied to a survey of suicide ideation for n=2213 students from 3 Cincinnati-area schools. The results indicate that 13 principal factors can sufficiently represent the 50+ item questionnaire.
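For readers unfamiliar with parallel analysis, here is a minimal sketch of the general idea. It compares the eigenvalues of the observed correlation matrix against eigenvalues from random normal data of the same shape, and retains leading components only while they exceed the noise threshold. This is an assumption-laden simplification: it uses the full correlation matrix (a principal components variant) and simulated data, not the survey data or the exact procedure behind Fig 4. The function name `parallel_analysis` and its parameters are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, quantile=0.95, seed=0):
    """Count leading components whose eigenvalues exceed those of pure noise.

    Returns (n_retain, observed_eigenvalues, noise_thresholds), with both
    eigenvalue arrays sorted from largest to smallest.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    def sorted_eigs(x):
        # Eigenvalues of the correlation matrix, largest first.
        return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

    observed = sorted_eigs(data)

    # Reference distribution: eigenvalues from uncorrelated normal data.
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        simulated[i] = sorted_eigs(rng.standard_normal((n, p)))
    threshold = np.quantile(simulated, quantile, axis=0)

    # Retain components up to the first one that fails to beat the threshold.
    exceeds = observed > threshold
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold


# Simulated questionnaire: 8 items driven by 2 latent factors (made-up data).
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8  # items 1-4 load on the first factor
loadings[1, 4:] = 0.8  # items 5-8 load on the second factor
items = factors @ loadings + 0.6 * rng.standard_normal((500, 8))

n_retain, observed, threshold = parallel_analysis(items)
print("components retained:", n_retain)
```

The 95th percentile cutoff and 200 simulations are arbitrary defaults for this sketch; other choices (such as the mean of the simulated eigenvalues) are also used in practice.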