This page highlights a few different areas I’ve actively contributed towards throughout my career.

Current Opportunities

I’m currently working on vehicle-based detection of impaired driving using machine learning methods. Some of this work was presented at the 2024 Transportation Research Board Annual Meeting

There are a variety of areas that I’d like to devote further attention to:

  1. Performance of deep learning models (Inception Time) developed to detect cannabis impairment on other sources of impairment (alcohol, combined cannabis+alcohol, drowsiness) and on-road control data.
  2. Trade-offs between complexity of data and performance. The idea would be to explore whether features that are less invasive to collect (speed, acceleration, lateral position) can be used to develop models that perform as well as models involving more invasive features (brake pedal/steering wheel inputs)
  3. Two-stage approaches that match input data to a driving scenario (ie: 4-lane interstate, 2-lane rural, or urban city) and apply an impairment detection model that was trained for that scenario.
  4. Applications of explainable AI (XAI) methods to better understand successful predictors of various types of impairment

I am open to working with any student with coding experience (Python or R) and interest/experience in machine learning.

\(~\)

Areas of Interest

1. False Discovery Rate Methods in High Dimensional Regression Modeling

Many penalized regression methods such as LASSO, elastic net, SCAD, and MCP, naturally perform variable selection during the model fitting process. For these models a simple question that an analyst might ask is: “How many of the variables selected by the model are expected to be false discoveries?”

Read more:

  • Marginal false discovery rate control for likelihood-based penalized regression models, Miller R and Breheny P, Biometrical Journal, 2019. LINK
  • Feature-specific inference for penalized regression models using local false discovery rates, Miller R and Breheny P, Statistics in Medicine, 2023. LINK
  • Marginal false discovery rates for group sparse penalized regression, Miller R, In progress, LINK

\(~\)

2. Statistical Modeling in Drugged Driving Applications

With many states contemplating cannabis legalization, a better understanding of how the drug can impact all of areas of driving performance and is of interest. The National Advanced Driving Simulator (NADS) conducts cutting-edge research in the area drugged and impaired driving using advanced driving simulator technology that allows for experimental designs that cannot be executed on real roadways. I have been actively involved in creating statistical models that evaluate the impact of cannabis (and other substances) on driver performance in scenarios involving distracted driving.

Read more:

  • Impact of cannabis and low alcohol concentration on divided attention tasks during driving, Miller, R, Brown, T, Lee, S, Tibrewal, I, Milavetz, G, Gaffney, G, Hartman, Hartman R, D Gorelick, R Compton, Huestis, M, Traffic Injury Prevention, 2020. LINK
  • Influence of cannabis use history on the impact of acute cannabis smoking on simulated driving performance during a distraction task, Miller, R, Brown, T, Wrobel, J, Kosnett, M, Brooks-Russell, A, Traffic Injury Prevention, 2022. LINK
  • Predicting changes in driving performance in individuals who use cannabis following acute use based on self-reported readiness to drive, Miller, R, Brown, T, Schmitt, R, Gaffney, G, Milavetz, G, Accident Analysis and Prevention, 2024. LINK

\(~\)

3. Other Interests

A few other areas that I’m interested are:

  • Applications in Public Health and Biology - how statistical methods and data analysis can contribute to the understand of biological processes and health outcomes
  • Statistics in Sports - how to leverage data and statistical methods to gain new insights in the realm of sports
  • Data Science Education - determining the content belongs in a data science curriculum, and what is the most effective way to teach that content to students

\(~\)

Select Data Visualizations

Marginal False Discovery Rates

Fig 1: The figure above displays marginal false discovery estimates for a series of lasso penalized survival models for the survival outcomes of 442 early-stage lung cancer subjects in response to 22,283 gene expression measurements and additional clinical covariates.  The left panel shows the number of genetic features selected by the lasso relative to the expected number of marginal false discoveries, while the right panel shows the expected marginal false discovery rate of each model.

Fig 1: The figure above displays marginal false discovery estimates for a series of lasso penalized survival models for the survival outcomes of 442 early-stage lung cancer subjects in response to 22,283 gene expression measurements and additional clinical covariates. The left panel shows the number of genetic features selected by the lasso relative to the expected number of marginal false discoveries, while the right panel shows the expected marginal false discovery rate of each model.

Local Marginal False Discovery Rates

Fig 2: The figure above displays modeling results from a single simulated data set containing various types of variables (features). The left panel shows the standard LASSO coefficient path that is returned by default from most standard software packages such as glmnet. From this path it is difficult to distinguish between important features and noise. The cross-validated model, which is indicated by the dotted vertical line, contains several noise variables that cannot be easily identified using just the coefficient path. The right panel displays each feature’s local marginal false discovery rate (mfdr) along the same sequence of models. This approach is capable of clearly distinguishing between important variables and noise; the method characterizes each of the noise variables in the cross-validated model as having a greater than 50% chance of being a false discovery.

Fig 2: The figure above displays modeling results from a single simulated data set containing various types of variables (features). The left panel shows the standard LASSO coefficient path that is returned by default from most standard software packages such as glmnet. From this path it is difficult to distinguish between important features and noise. The cross-validated model, which is indicated by the dotted vertical line, contains several noise variables that cannot be easily identified using just the coefficient path. The right panel displays each feature’s local marginal false discovery rate (mfdr) along the same sequence of models. This approach is capable of clearly distinguishing between important variables and noise; the method characterizes each of the noise variables in the cross-validated model as having a greater than 50% chance of being a false discovery.

Blood THC Concentrations While Driving

Fig 3: The figure above shows blood THC concentrations by administered cannabis and alcohol doses during the first occurrence of the side-mirror task for each of the 19 participants. Each line represents a single subject across the six dosing conditions (Pla = Placebo, Alc = Alcohol, Low = Low THC, High = High THC).

Fig 3: The figure above shows blood THC concentrations by administered cannabis and alcohol doses during the first occurrence of the side-mirror task for each of the 19 participants. Each line represents a single subject across the six dosing conditions (Pla = Placebo, Alc = Alcohol, Low = Low THC, High = High THC).

Optimization of Suicide Ideation Questionaire

Fig 4: Parallel analysis is an approach used to inform dimension reduction in questionaires  This plot depicts the results of parallel analysis applied to a suvery of suicide ideation for n=2213 students from 3 Cincinnati-area schools.  The results indicate that 13 principal factors can sufficiently represent the 50+ item questionaire.

Fig 4: Parallel analysis is an approach used to inform dimension reduction in questionaires This plot depicts the results of parallel analysis applied to a suvery of suicide ideation for n=2213 students from 3 Cincinnati-area schools. The results indicate that 13 principal factors can sufficiently represent the 50+ item questionaire.