This page highlights a few different areas I’ve actively contributed towards throughout my career.
I’m currently working on vehicle-based detection of impaired driving using machine learning methods. Some of this work was presented at the 2024 Transportation Research Board Annual Meeting.
There are a variety of areas that I’d like to devote further attention to:
I am open to working with any student with coding experience (Python or R) and interest/experience in machine learning.
\(~\)
Many penalized regression methods, such as LASSO, elastic net, SCAD, and MCP, naturally perform variable selection during the model fitting process. For these models, a simple question that an analyst might ask is: “How many of the variables selected by the model are expected to be false discoveries?”
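As a rough illustration of how that question can be approached, here is a minimal sketch for the simplest case: a lasso-penalized linear regression with standardized features, where a pure-noise feature enters the model when its marginal correlation with the residuals exceeds the penalty `lam`. Under those assumptions, the expected number of marginal false discoveries is bounded by roughly `2 * p * Phi(-sqrt(n) * lam / sigma)`. The function names and the example numbers below are hypothetical and are not taken from any particular package or data set.

```python
from math import erfc, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * erfc(-z / sqrt(2.0))


def expected_false_discoveries(lam: float, n: int, p: int, sigma: float) -> float:
    """Approximate expected number of noise features selected at penalty `lam`.

    Assumes a linear model, features standardized so that x_j'x_j = n, and
    treats all p features as potential noise (a conservative choice).
    """
    return 2.0 * p * normal_cdf(-sqrt(n) * lam / sigma)


def estimated_mfdr(lam: float, n: int, p: int, sigma: float, n_selected: int) -> float:
    """Estimated marginal false discovery rate: expected false discoveries
    divided by the number of features the model actually selected."""
    if n_selected == 0:
        return 0.0
    return min(1.0, expected_false_discoveries(lam, n, p, sigma) / n_selected)


if __name__ == "__main__":
    # Hypothetical setting: 200 observations, 1000 features, residual SD of 1,
    # and a model that selected 12 features at lam = 0.25.
    print(expected_false_discoveries(lam=0.25, n=200, p=1000, sigma=1.0))
    print(estimated_mfdr(lam=0.25, n=200, p=1000, sigma=1.0, n_selected=12))
```

In practice `sigma` is unknown and has to be estimated from the fitted model, so the result should be read as an estimate rather than an exact rate.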
Read more:
\(~\)
With many states contemplating cannabis legalization, a better understanding of how the drug can impact all areas of driving performance is of interest. The National Advanced Driving Simulator (NADS) conducts cutting-edge research in the area of drugged and impaired driving using advanced driving simulator technology that allows for experimental designs that cannot be executed on real roadways. I have been actively involved in creating statistical models that evaluate the impact of cannabis (and other substances) on driver performance in scenarios involving distracted driving.
Read more:
\(~\)
A few other areas that I’m interested in are:
\(~\)
Fig 1: The figure above displays marginal false discovery estimates for a series of lasso penalized survival models for the survival outcomes of 442 early-stage lung cancer subjects in response to 22,283 gene expression measurements and additional clinical covariates. The left panel shows the number of genetic features selected by the lasso relative to the expected number of marginal false discoveries, while the right panel shows the expected marginal false discovery rate of each model.
Fig 2: The figure above displays modeling results from a single simulated data set containing various types of variables (features). The left panel shows the standard LASSO coefficient path that is returned by default from most standard software packages such as glmnet. From this path it is difficult to distinguish between important features and noise. The cross-validated model, which is indicated by the dotted vertical line, contains several noise variables that cannot be easily identified using just the coefficient path. The right panel displays each feature’s local marginal false discovery rate (mfdr) along the same sequence of models. This approach is capable of clearly distinguishing between important variables and noise; the method characterizes each of the noise variables in the cross-validated model as having a greater than 50% chance of being a false discovery.
Fig 3: The figure above shows blood THC concentrations by administered cannabis and alcohol doses during the first occurrence of the side-mirror task for each of the 19 participants. Each line represents a single subject across the six dosing conditions (Pla = Placebo, Alc = Alcohol, Low = Low THC, High = High THC).
Fig 4: Parallel analysis is an approach used to inform dimension reduction in questionnaires. This plot depicts the results of parallel analysis applied to a survey of suicide ideation for n=2213 students from 3 Cincinnati-area schools. The results indicate that 13 principal factors can sufficiently represent the 50+ item questionnaire.