Prediction and Factor Identification for Crash Severity: Comparison of Discrete Choice and Tree-Based Models

Wang, XY; Kim, SH

Kim, SH (reprint author), Georgia Inst Technol, Sch Civil & Environm Engn, Atlanta, GA 30332 USA.



Crash severity is one of the most widely studied topics in traffic safety, and scholars have studied it with various types of models. Using the publicly available 2017 Maryland crash data from the Department of Maryland State Police, the authors develop a multinomial logit (MNL) model and a random forest (RF) model, which belong to the discrete choice and tree-based model families, respectively, to (1) identify factors contributing to crash severity and (2) compare the prediction performance and interpretability of the two models. Based on the model results, major contributing factors of crash severity are identified, including collision type, occupant age, and speed limit. For the given dataset, RF predicts better than MNL on multiple measures (precision, recall, and F1 score), although the differences are not dramatic. Sensitivity analysis results show that RF is less sensitive than MNL. RF can automatically capture the non-linear effects of continuous variables and reduce the influence of collinearity among explanatory variables. This study shows that sensitivity analysis can enhance understanding of MNL and RF results, and it uncovers unique characteristics of the discrete choice and tree-based models.
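
The abstract compares the two models on precision, recall, and F1 score. As a rough illustration of how those measures are computed for a multi-class severity outcome, the sketch below scores one set of predictions per class and then macro-averages them. The severity labels ("PDO", "injury", "fatal"), the toy label lists, and the function names are assumptions made for illustration; they are not taken from the study or its data, and macro-averaging is only one possible way to summarize the per-class scores.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) across classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical observed and predicted severity levels (not real crash data).
y_true = ["PDO", "PDO", "PDO", "injury", "injury", "fatal"]
y_pred = ["PDO", "PDO", "injury", "injury", "PDO", "fatal"]

scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)
```

Running the same functions on the predictions of two different models, evaluated on the same held-out crashes, would give the kind of side-by-side comparison the abstract describes.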
