Journal of Traffic and Transportation Engineering (English Edition) 2016; 3 (4): 308-323

journal homepage: www.elsevier.com/locate/jtte

Original Research Paper

Factors associated with crash severity on rural roadways in Wyoming

Debbie S. Shinstine a, Shaun S. Wulff b, Khaled Ksaibati a

a Wyoming Technology Transfer Center, University of Wyoming, Laramie, WY 82071, USA

b Department of Statistics, University of Wyoming, Laramie, WY 82071, USA

ARTICLE INFO

Article history: Received 21 July 2015; Received in revised form 9 December 2015; Accepted 14 December 2015; Available online 8 August 2016

Keywords: Crash data; Crash analysis; Logistic regression; Roadway safety; Indian reservations; Highway system

ABSTRACT

The ability to identify risk factors associated with crashes is critical to determine appropriate countermeasures for improving roadway safety. Many studies have identified risk factors for urban systems and intersections, but few have addressed crashes on rural roadways, and none have analyzed crashes on Indian Reservations. This study analyzes crash severity for rural highway systems in Wyoming. These rural systems include interstates, state highways, rural county local roads, and the roadway system on the Wind River Indian Reservation (WRIR). In alignment with the Wyoming strategic highway safety goal of reducing critical crashes (fatal and serious injury), crash severity was treated as a binary response in which crashes were classified as severe or not severe. Multiple logistic regression models were developed for each of the highway systems. Five effects were prevalent on all systems: animals, driver impairment, motorcycles, mean speed, and safety equipment use. With the exception of animal crashes, all of these effects increased the probability that a crash would be severe. Based upon these results, DOTs can pursue effective policies and targeted design decisions to reduce the severity of crashes on rural highways.

© 2016 Periodical Offices of Chang'an University. Production and hosting by Elsevier B.V. on behalf of Owner. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Strategic Highway Safety Plans are implemented to establish goals and objectives for agencies and communities to reduce crash rates on their roadway systems (FHWA, 2012). In order to develop effective strategies, it is necessary to identify potential risk factors for crashes and mitigate these risks as much as possible. This study analyzes crash severity for rural highway systems in Wyoming.

Wyoming is uniquely characterized by a vast rural roadway network of over 6400 miles comprising interstates, state and U.S. highways, county roads, and Indian reservation roads. There are approximately 800 miles of interstates and over 4000 miles of state and U.S. highways (WYDOT, 2013). Traffic volumes are relatively low across the state due to the sparse population. However, the vehicle miles traveled (VMT) in 2013 were 2.4 billion on the interstates and over 2.6 billion on all state and U.S. highways. Of these, 860 million were truck VMT (TVMT) on the interstates and 300 million TVMT on state and U.S. highways (WYDOT, 2013). The Wind River Indian Reservation has over 1200 miles of roadways in its inventory (NCHRP, 2007). Average daily traffic (ADT) and VMT data are not available for the local county and reservation roads. Rural roadways typically serve areas with lower population densities and have longer travel distances, higher speeds, and more complex road geometrics (TRIP, 2015).

* Corresponding author. Tel.: +1 307 766 6743; fax: +1 307 766 6784.

E-mail addresses: dshinsti@gmail.com (D. S. Shinstine), wulff@uwyo.edu (S. S. Wulff), khaled@uwyo.edu (K. Ksaibati). Peer review under responsibility of Periodical Offices of Chang'an University. http://dx.doi.org/10.1016/j.jtte.2015.12.002

2095-7564/© 2016 Periodical Offices of Chang'an University. Production and hosting by Elsevier B.V. on behalf of Owner. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Each rural highway system has unique characteristics when crash severity is assessed. On rural roads, factors that contribute to high severity crashes include extreme terrain, higher speeds, a larger number of crashes involving alcohol use, and longer response times for emergency services (Atkinson et al., 2014; Ksaibati and Evans, 2009). Indian reservations have many similarities with rural communities concerning their roadway systems (Shinstine and Ksaibati, 2013). There are also behavioral factors that may affect crash severity. For example, alcohol and seat belt use have been identified by the Native American community as some of the greatest concerns in improving highway safety (Herbel and Kleiner, 2010; Shinstine et al., 2015).

Crash severity is the level of injury experienced by the victim of the crash and can be categorized in many ways. Typically, the KABCO scale is used, which classifies crash severity as fatal (K), incapacitating injury (A), non-incapacitating injury (B), possible injury (C), and property damage only (O) (National Safety Council, 1970; Niessner, 2010). This paper utilizes two categories: severe (fatal and incapacitating injuries) and not severe (non-incapacitating injury, possible injury, and property damage only). While this is not the common approach, there are two important reasons for using this binary representation of severity in this paper. First, Wyoming is a sparsely populated state with hundreds of miles of very low traffic volume roadways. This is reflected in the crash data, which have very low frequencies in some of the categories of the KABCO scale, particularly fatalities. However, low frequencies do not equate to low risks. By combining fatal and serious injury crashes, risk factors for severe crashes can be better identified and modeled. Second, the goal of Wyoming's Strategic Highway Safety Plan (WSHSP) is to reduce critical crashes (Wyoming Highway Safety Management System Committee, 2012). Critical crashes are defined as fatal and incapacitating injury crashes, which correspond to the severe category of the binary response used in this paper. Reducing fatal and serious injury crashes is also the goal of the national strategy of Toward Zero Deaths (Ward et al., 2010). Binary representations of crash severity have also been used by Andreen and Ksaibati (2012) and Bham et al. (2012).
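The binary coding of severity described above can be illustrated with a short sketch. It assumes that each crash record carries a single KABCO letter; the function name `severity_indicator` and the sample records are hypothetical and are not taken from the WYDOT data.

```python
# KABCO codes grouped as described in the text.
SEVERE_CODES = {"K", "A"}            # fatal, incapacitating injury
NOT_SEVERE_CODES = {"B", "C", "O"}   # non-incapacitating, possible injury, property damage only


def severity_indicator(kabco_code: str) -> int:
    """Return 1 for a severe (critical) crash and 0 for a non-severe crash."""
    code = kabco_code.strip().upper()
    if code in SEVERE_CODES:
        return 1
    if code in NOT_SEVERE_CODES:
        return 0
    raise ValueError(f"Unknown KABCO code: {kabco_code!r}")


# Hypothetical crash records, invented for illustration only.
crashes = ["O", "K", "C", "A", "B", "O"]
y = [severity_indicator(code) for code in crashes]
print(y)  # [0, 1, 0, 1, 0, 0]
```

Raising an error on unrecognized codes, rather than silently coding them as not severe, keeps missing or malformed severity values visible.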

1.1. Background

Crash data have been analyzed through various types of statistical models to help researchers determine related factors, and to identify countermeasures to improve roadway safety. Many models have been developed for urban applications and intersections. There is extensive research analyzing crash risks and factors that concentrate on specific predictors. However, there appear to be no studies that attempt to identify significant predictors for crashes on rural highways or on Indian reservations. Savolainen et al. (2011) provided an excellent review of statistical models for crashes. A brief overview of pertinent statistical modeling of crash severity is provided below.

Andreen and Ksaibati (2012) used multiple logistic regression to model crash severity on interstates 80 and 25 in Wyoming based upon several predictor variables. Logistic regression was used on a dichotomized response in which crashes were classified as "severe" and "not severe". These models were used to identify factors associated with crash severity on interstates in Wyoming. The predictor variables in these models were limited to those obtained from a standard report of the crash data throughout the state, and did not include many variables that were known to be of concern for severe crashes on roadways such as seat belt usage, driver distraction, and roadway geometrics. The study concluded that factors varied between the two interstates, I-80 and I-25. For example, on I-25, motorcycles and sobriety were determined to be important predictors of crash severity. It also recommended that more predictor variables could be included in the model such as roadway geometrics, driver distraction (use of cell phones), seat belt use, and emergency response time.

Logistic regression models have been used for urban applications to identify different factors contributing to crashes. Bham et al. (2012) discussed the use of logistic regression models of collision crashes on urban highways. In their study, crash severity was modeled as severe or not severe. The basis for this choice was that the crash reporting was more accurate for severe crashes than for the other three non-severe categories. Results were compared between divided and undivided urban highways. The analysis showed that alcohol involvement doubled the risk of crash severity for collision crashes on divided highways and was also significant in single vehicle crashes. Roadway geometrics were also significant in predicting crash severity. The study recommended that safety studies include collision type in the analysis as well as driver distraction.

Mooradian et al. (2012) used ordinal logistic regression to model crash severity. The response for this model included five levels for crash severity: fatal, serious injury, minor injury, possible injury, and no injury. Ordinal logistic regression was used in this study to account for the ordering associated with these categories. The analysis showed that significant trends existed for senior drivers leading to higher injury severity levels. The researchers stated that the statistical significance was not fully reliable, but provided information for long term patterns and for further investigation.

Ordered probit models can also be used to model ordered discrete response values. Here the ordinal responses are typically assumed to be unobserved measures of injury severity (Quddus et al., 2002; Weiss, 1992). Pei and Fu (2014) used an ordered probit model to model injury severity with four levels (no injury, slight injury, severe injury, and fatal injury) at unsignalized intersections. Several factors affecting crash severity were identified. These factors consisted of binary predictors indicating one of two categories. Interaction terms were introduced for lighting conditions with other variables. The study concluded that crash severity could be associated with the predictors road conditions, collision types, and highway classification. Other significant variables that were not available in the crash reports, such as traffic volume, intersection geometry, and turning movement, limited the scope of the study.

Other types of models have been used to account for discrete responses arising from the study of crashes. The Poisson regression model has been widely used to model crash frequency (Uhm et al., 2012). It requires the equality of the mean and variance, but the variance often exceeds the mean with crash frequency data (Uhm et al., 2012). One approach to deal with this problem of overdispersion is through the negative binomial distribution. The Highway Safety Manual utilized a negative binomial regression model for their safety performance functions (SPFs) (Niessner, 2010). SPFs are statistical models that estimate average crash frequency based on specific roadway facility type and base conditions. These models enable analysts to consider different safety improvements to determine their effectiveness for a given roadway segment by predicting crash rates based on historical crash data and the application of the SPFs for a given improvement.

The most appropriate approach to model crash severity on rural roads for this study would be multiple logistic regression. As previously discussed, crashes in this study are modeled as "severe" for critical crashes and "not severe" for all other crashes. This is due to the low crash rates in these rural areas as well as the WSHSP goal of reducing critical crashes across the state. Thus, multiple logistic regression modeling is conducted to model crash severity as in Andreen and Ksaibati (2012) and Bham et al. (2012). This approach also advances the work by Andreen and Ksaibati (2012) who had recommended the expansion of the model to include more variables and systems to provide a comprehensive analysis that would benefit the state DOT in addressing the reduction of critical crashes.

1.2. Objective

The objective of this research is to develop a statistical model of crash severity for rural highway systems in Wyoming. These highway systems include interstates, state and U.S. highways, county local rural roads, and Indian reservation roads. The model will identify combinations of risk factors affecting crash severity for these highway systems. Results from the analyses will provide helpful information for decision makers to identify strategies for reducing critical crashes. This information will consist of two parts: (1) identification of important predictors of crash severity, and (2) separate identification of these predictors based on the highway system. Of particular interest here is the identification of these predictors for Indian reservations.

2. Description of data

The Wyoming Department of Transportation (WYDOT) maintains a crash database for all roadways in Wyoming (WYDOT, 2009). This database includes information for every recorded crash in the state as reported by law enforcement at the time of the crash. The raw crash data were obtained from WYDOT along with data on traffic counts, roadway geometrics, pavements, driver behaviors, and vehicle information. The raw data were compiled for all crashes across the state for a ten-year period from 2002 to 2011, resulting in 96,791 crashes. Four bulk data sets were used which included base bulk data on every crash, vehicle, driver and geometric data. The geometric data were a compilation of inventory records on the roadway types, vertical and horizontal alignment, pavement width, shoulders, medians, rumble strip locations, and traffic data (WYDOT, 2013). In addition, the highway system type was identified for each crash location. The highway systems that were included in the statistical analysis were interstates, state highways, U.S. primary and secondary highways, county rural local roads, and Indian reservation roads (IRR). Since the Wind River Indian Reservation includes all highway system types except interstates, a separate dataset was developed for the reservation that included all the highway systems within its boundaries.

Once all the crash data were compiled, this information was used to create a list of predictor variables. Following the approach of Pei and Fu (2014), all predictor variables were coded as binary with values of 0 or 1. This was done to handle the numerous categorical predictors, and to make predictors more interpretable.

Several crashes involved more than two vehicles. The record of a particular crash includes driver and vehicle information for all vehicles in that crash. Information on multiple vehicles is incorporated through an indicator which takes the value of 1 if more than one vehicle is involved in the crash and 0 otherwise. This approach accounts for the effect of the number of vehicles involved in a crash on the severity of the crash. For multi-vehicle crashes, variables such as age and gender reflect only the first driver or vehicle. This is reasonable since most of the data concerning these factors are only listed for the first driver, who is typically at fault as stated in the crash report.

Since road geometrics were closely related to one another, many of these variables were aggregated to account for important aspects of road geometrics and to minimize problems of multi-collinearity (Kutner et al., 2004). Left and right shoulder information included width and shoulder type. One variable was used for each shoulder (left and right) whether a shoulder existed or not. Each horizontal and vertical alignment had several categories that needed to be consolidated. Vertical alignment was eventually reduced to level or not level. Horizontal alignment was reduced to curve or no curve.

The largest reduction in the number of predictors came with consideration of the first harmful event (FHE). In the crash report, there were over 60 characterizations for FHE. These were consolidated into five meaningful categories (animal, rollover, collision with another vehicle, fixed object, and guardrail) plus an "other" category. The "other" category included a variety of events that accounted for <10% of all crashes and was thus not included in the variable selection.

Age was a continuous variable, but it deserved special consideration. The best approach was to divide it into groups and code each age group separately. According to the Centers for Disease Control and Prevention (CDC), injuries from vehicle crashes are the greatest health threat to young drivers aged 16-19 (Centers for Disease Control and Prevention). Among American Indians, vehicle crashes are the leading cause of unintentional injury for ages up to 44 and the leading cause of death for young people under 20 years old (Centers for Disease Control and Prevention, 2012). Senior drivers are also at high risk to experience severe crashes. Drivers over 65 years tend to have longer perception reaction times and lower visual acuity (Mooradian et al., 2012). Based on these trends and preliminary analyses, two age groups are selected for the model. These include indicators for drivers aged 25 and below and drivers aged over 65.

Driver distraction data identify whether a driver is distracted or not. There is a high percentage of unknown or missing data for distraction. Driver distraction is an important factor that is gaining attention as cell phone use and texting while driving have become so prevalent. Unfortunately, many crash records do not include this information, so it is necessary to remove this variable from all models except for the global model.

Average daily traffic (ADT), average daily truck traffic (ADTT), vehicle miles traveled (VMT), and truck vehicle miles traveled (TVMT) are important variables that are included in the roadway inventories. Exposure data are important to crash analysis when segments are compared (Golembiewski and Chandler, 2011). These variables are dichotomized according to whether the traffic data values at the crash location are above or below the mean values for the respective roadway systems (interstate, state, county, WRIR). None of these traffic data are available for the county roads, and they are incomplete for the state system. Two speeds are initially considered. These include vehicle speed at the time of the crash as reported in the crash report, and the speed limit at the given location of the crash. The mean speed refers to the vehicle speed, and the mean posted speed refers to the speed limit.
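The dichotomization of the exposure variables relative to the system mean can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the record layout, the function name `dichotomize_by_system_mean`, and the ADT values are hypothetical, and coding a value equal to the mean as 0 is an assumption, since the text does not address ties.

```python
from statistics import mean


def dichotomize_by_system_mean(records, field):
    """Code `field` as 1 if above the mean for the crash's highway system, else 0.

    Records with a missing value (None) stay None so that they can be
    handled at the modeling stage.
    """
    # Collect the observed values for each highway system.
    by_system = {}
    for rec in records:
        if rec[field] is not None:
            by_system.setdefault(rec["system"], []).append(rec[field])
    system_means = {name: mean(values) for name, values in by_system.items()}

    coded = []
    for rec in records:
        value = rec[field]
        if value is None:
            coded.append(None)
        else:
            # Assumption: a value equal to the mean is coded 0.
            coded.append(1 if value > system_means[rec["system"]] else 0)
    return coded


# Hypothetical ADT values, invented for illustration only.
records = [
    {"system": "interstate", "adt": 12000},
    {"system": "interstate", "adt": 8000},
    {"system": "state", "adt": 900},
    {"system": "state", "adt": 2100},
    {"system": "state", "adt": None},
]
print(dichotomize_by_system_mean(records, "adt"))  # [1, 0, 0, 1, None]
```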

3. Study methodology

3.1. Logistic regression model

Severity of the crash is the response variable, $Y$, which represents whether a crash is severe (outcome 1) or not severe (outcome 0). Severe crashes include fatal and incapacitating injuries. A non-severe crash includes non-incapacitating injury, possible injury, and no injury. Since the response is binary, a Bernoulli distribution is assumed, which is a discrete probability distribution where the outcome 1 is a "success" with probability $\pi$ and the outcome 0 is a "failure" with probability $1 - \pi$. Thus, the expected value of $Y$ equals $\pi$.

Several predictor variables are used to model crash severity on rural roads in Wyoming. Thus, multiple logistic regression is used to formulate the model. Let $\mathbf{x}$ denote a $q \times 1$ vector of $p$ predictor variables and $h$ pairwise interactions specified in the set $H$. Let $\boldsymbol{\beta}$ denote the corresponding $q \times 1$ vector of regression coefficients.

Kutner et al. (2004) presented the multiple logistic model with logit link in the following form

$$\mathbf{x}'\boldsymbol{\beta} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \sum_{(k,k') \in H} \beta_{kk'} x_k x_{k'} \tag{1}$$

$$\ln\left[\frac{\pi}{1 - \pi}\right] = \mathbf{x}'\boldsymbol{\beta}, \qquad \pi = \frac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})} \tag{2}$$

In Eq. (2), $\text{odds} = \pi/(1 - \pi)$ denotes the odds of a severe crash. It is often of interest to examine the odds ratio (OR) given by

$$\mathrm{OR} = \left[\frac{\pi_1}{1 - \pi_1}\right]\left[\frac{1 - \pi_2}{\pi_2}\right] \tag{3}$$

This is the ratio of the odds from the probability $\pi_1$ obtained from one combination of regressors $\mathbf{x}_1$ to the odds from the probability $\pi_2$ obtained from another combination of regressors $\mathbf{x}_2$.

Using maximum likelihood, it is possible to obtain estimates of the parameter vector $\boldsymbol{\beta}$ in Eq. (2) (SAS Institute Inc., 2008). A hat will be used to denote an estimate of the corresponding quantity. Thus, the estimates of the regression coefficients ($\hat{\boldsymbol{\beta}}$), probability ($\hat{\pi}$), odds ($\widehat{\text{odds}}$), and odds ratio ($\widehat{\mathrm{OR}}$) are considered. It is often of interest to obtain $\widehat{\mathrm{OR}}$ for interpretation. There are particular choices of $\mathbf{x}_1$ and $\mathbf{x}_2$ that are especially meaningful. First, consider a binary predictor $X_j$ that is not involved in any interaction effect. Compared with $X_j = 0$, the estimated odds ratio for $X_j = 1$ is $\widehat{\mathrm{OR}} = \exp(\hat{\beta}_j)$. Even though it will not be explicitly stated, this expression assumes all other regressor variables are the same for $\mathbf{x}_1$ and $\mathbf{x}_2$. Now, consider a particular binary predictor $X_k$ that is involved in a single interaction effect involving the binary predictor $X_{k'}$. When there is an interaction effect, the effects of the predictors $X_k$ and $X_{k'}$ cannot be assessed separately. For example, when $x_k = 1$, $x_{k'} = 0$ is compared with $x_k = 0$, $x_{k'} = 0$, the estimated odds ratio is as follows

$$\widehat{\mathrm{OR}} = \exp(\hat{\beta}_k) \tag{4}$$

On the other hand, when $x_k = 1$, $x_{k'} = 1$ is compared with $x_k = 0$, $x_{k'} = 1$, the estimated odds ratio is as follows

$$\widehat{\mathrm{OR}} = \exp(\hat{\beta}_k + \hat{\beta}_{kk'}) \tag{5}$$

Again, these expressions assume all other regression variables are the same for $\mathbf{x}_1$ and $\mathbf{x}_2$. Furthermore, if $x_k$ is involved in another pairwise interaction term with $x_m$, it is also necessary to assume $x_m = 0$ in order to maintain the interpretations in Eqs. (4) and (5).
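The odds ratio interpretations with and without an interaction term can be checked numerically. The sketch below evaluates the logit model for two binary predictors and one interaction. The coefficient values and the predictor names `impaired` and `dark` are hypothetical, chosen only for illustration; they are not estimates from this study.

```python
import math


def linear_predictor(beta0, main, inter, x):
    """Intercept plus main effects plus pairwise interaction terms."""
    eta = beta0
    for name, coef in main.items():
        eta += coef * x[name]
    for (k, k_prime), coef in inter.items():
        eta += coef * x[k] * x[k_prime]
    return eta


def probability(eta):
    """Inverse logit: probability of a severe crash."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def odds(prob):
    return prob / (1.0 - prob)


# Hypothetical coefficients, invented for illustration only.
beta0 = -3.0
main = {"impaired": 0.8, "dark": 0.3}
inter = {("impaired", "dark"): 0.4}


def odds_ratio(x1, x2):
    """Odds of a severe crash under x1 divided by the odds under x2."""
    p1 = probability(linear_predictor(beta0, main, inter, x1))
    p2 = probability(linear_predictor(beta0, main, inter, x2))
    return odds(p1) / odds(p2)


# Impaired versus not impaired when the interacting predictor is 0.
or_daylight = odds_ratio({"impaired": 1, "dark": 0}, {"impaired": 0, "dark": 0})
# Impaired versus not impaired when the interacting predictor is 1.
or_dark = odds_ratio({"impaired": 1, "dark": 1}, {"impaired": 0, "dark": 1})

print(math.isclose(or_daylight, math.exp(0.8)))        # True
print(math.isclose(or_dark, math.exp(0.8 + 0.4)))      # True
```

Under these assumed values, the odds ratio for the first predictor depends on the level of the second, which is why the two effects cannot be assessed separately when an interaction is present.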

The model defined above needs to be built through suitable selection of the p predictors and h interactions. The model building strategy described by Hosmer et al. (2013) is utilized. These steps include univariable analysis to identify possible predictors, stepwise variable analysis to select the set of p predictors, a detailed evaluation of possible pairwise interactions among the p predictors, and checks of the model fit. The implementation of each of these steps is discussed below. All analyses were performed using the SAS statistical software (SAS Institute Inc., 2008).

3.2. Univariate analysis

Over 50 variables are initially considered for this study. Further discussion of these variables along with descriptive statistics can be found in Shinstine and Ksaibati (2013). Numerous two-by-two frequency tables were also examined as part of the descriptive analysis to inspect the relationships between the binary predictor and severity (Frequency tables from univariate analysis). Univariable analysis consists of fitting the logistic regression model in Eq. (2) with only a single predictor. Such models are also called simple logistic regression models (Kutner et al., 2004). A predictor was included in the possible set of predictors if it either had a relationship with severity in the simple logistic regression model or if it was recognized as an important predictor in the literature.

Through the univariable analysis, 50 variables are reduced to 33. These 33 variables are shown in Table 1 along with the number of missing observations for each predictor. Nine of the original variables include information on the second vehicle or second driver. These are determined not to be statistically significant, which may be attributed to the lack of data on the second driver or vehicle. Thirteen of the variables are related to roadway geometrics, and as discussed previously, are combined to reduce problems of multi-collinearity. The results from the univariable analysis are shown in Table 2. Notice that the predictors VMT and driver age are not statistically significant in the univariable analysis. However, given the support in the literature for these variables, they are included in the set of predictors in order to assess their role in the presence of the other predictors.

Table 1 - Possible predictors along with corresponding missing data for crashes by system.

| Variable | Global | Interstate | State | County | WRIR |
|---|---|---|---|---|---|
| Weekend | 620 | 71 | 356 | 124 | 17 |
| Animal (FHE) | 654 | 78 | 368 | 128 | 17 |
| Rollover (FHE) | 654 | 78 | 368 | 128 | 17 |
| Guardrail (FHE) | 654 | 78 | 368 | 128 | 17 |
| Fixed object (FHE) | 654 | 78 | 368 | 128 | 17 |
| Number of vehicles | 622 | 71 | 356 | 124 | 17 |
| FHE location | 6884 | 908 | 4345 | 696 | 307 |
| Lighting | 935 | 106 | 532 | 180 | 34 |
| Impaired | 9478 | 1209 | 6322 | 680 | 387 |
| Road condition | 7387 | 1955 | 2853 | 680 | 91 |
| Mean posted speed | 5036 | 420 | 2383 | 1742 | 201 |
| Pavement (surface) | 9794 | 1259 | 6494 | 725 | 396 |
| Level grade | 10,279 | 1357 | 6729 | 769 | 402 |
| Horizontal alignment | 10,320 | 1350 | 6756 | 793 | 402 |
| Truck | 9798 | 1241 | 6558 | 706 | 392 |
| Motorcycle | 9798 | 1241 | 6558 | 706 | 392 |
| Mean speed | 5042 | 641 | 3040 | 748 | 134 |
| Vehicle state | 2117 | 71 | 1335 | 301 | 62 |
| Vehicle maneuver | 1056 | 126 | 621 | 183 | 30 |
| Driver age <25 years | 1642 | 183 | 1057 | 209 | 56 |
| Driver age >65 years | 1642 | 183 | 1057 | 209 | 56 |
| Driver gender | 1394 | 141 | 907 | 181 | 39 |
| Driver safety equipment | 7974 | 587 | 5293 | 1453 | 307 |
| Driver distraction | 34,654 | 6834 | 16,440 | 4483 | 978 |
| Median | 620 | 71 | 356 | 124 | 17 |
| Rumble strip | 620 | 71 | 356 | 124 | 17 |
| Left shoulder | 620 | 71 | 356 | 124 | 17 |
| Right shoulder | 620 | 71 | 356 | 124 | 17 |
| ADT | 10,825 | 412 | 1880 | 7980 | 589 |
| ADTT | 10,825 | 412 | 1880 | 7980 | 589 |
| VMT | 10,825 | 412 | 1880 | 7980 | 589 |
| TVMT | 10,825 | 412 | 1880 | 7980 | 589 |
| Total crashes | 96,791 | 34,266 | 54,381 | 7980 | 2212 |

3.3. Variable selection

Stepwise variable selection is used to identify the statistically significant predictors for the model from the set of predictors in Table 2. This approach is similar to forward selection, except that predictors already in the model in a previous step do not necessarily remain in the model (Kutner et al., 2004). The significance levels ($\alpha$) for the predictor to enter and stay are from the Wald Chi-square test. The value used for a covariate to enter the model is $\alpha_{\text{enter}} = 0.10$ and to stay in the model is $\alpha_{\text{stay}} = 0.05$. These selected values are based upon their use in Andreen and Ksaibati (2012) and Mooradian et al. (2012).
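The enter and stay logic of stepwise selection can be outlined in a few lines. The sketch below is a schematic of the general technique, not a reproduction of the SAS procedure used in this study. It assumes a hypothetical callable `wald_pvalues(model_vars)` that refits the model on the listed predictors and returns their Wald p-values. Here that callable is replaced by a stub with invented p-values.

```python
def stepwise_select(candidates, wald_pvalues, alpha_enter=0.10, alpha_stay=0.05,
                    max_steps=100):
    """Stepwise selection: predictors may enter and may later be removed."""
    selected = []
    for _ in range(max_steps):
        changed = False

        # Entry step: add the candidate with the smallest p-value below alpha_enter.
        best_var, best_p = None, alpha_enter
        for var in candidates:
            if var in selected:
                continue
            p_value = wald_pvalues(selected + [var])[var]
            if p_value < best_p:
                best_var, best_p = var, p_value
        if best_var is not None:
            selected.append(best_var)
            changed = True

        # Removal step: drop the weakest predictor if it exceeds alpha_stay.
        if selected:
            pvals = wald_pvalues(selected)
            worst = max(selected, key=lambda v: pvals[v])
            if pvals[worst] > alpha_stay:
                selected.remove(worst)
                if worst == best_var:
                    break  # the variable just entered was removed; stop to avoid cycling
                changed = True

        if not changed:
            break
    return selected


def toy_pvalues(model_vars):
    """Stub with hypothetical p-values, invented for illustration only.

    "speed" loses significance once "impaired" is also in the model.
    """
    base = {"speed": 0.001, "impaired": 0.002, "weekend": 0.40}
    pvals = {v: base[v] for v in model_vars}
    if "speed" in model_vars and "impaired" in model_vars:
        pvals["speed"] = 0.20
    return pvals


print(stepwise_select(["speed", "impaired", "weekend"], toy_pvalues))  # ['impaired']
```

In this toy run, `speed` enters first, `impaired` enters second, and `speed` is then removed because its p-value exceeds the stay threshold, which illustrates how stepwise selection differs from forward selection.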

3.4. Interactions

Interaction terms are examined next for inclusion into the logistic regression model. Checking for interactions is part of the logistic regression model building strategy described by Hosmer et al. (2013). Interactions are important to consider since the effect of a particular risk factor upon crash severity may depend on the values of other risk factors. For example, Andreen and Ksaibati (2012) identified a significant interaction between lighting and gender when predicting crash severity. In this case, a crash was more likely to be severe if a male was driving at night. Classen et al. (2008) also identified important interactions among person, vehicle, and environment associated with crashes and injuries involving older drivers. In this study, candidates included pairwise interactions between the variables identified from the stepwise variable selection. It was not feasible to consider all possible interactions because of the large number of variables that were selected. An interaction term indicates that the impact of a predictor on severity is not the same across the values of the other predictor. Thus, specific interactions of interest are considered in which the impact of a variable on crash severity may be related to another variable. In particular, interactions may be expected among the variables lighting, impairment, speed, and distraction. A list of the interactions that are considered is given in Table 3.

Each of the interaction terms in Table 3 is tested in the model. Insignificant interactions are removed through an iterative process starting with the removal of interactions that have a Wald Chi-square p-value >0.05. This is done until all of the interaction p-values are <0.05. Once the interaction effects are selected, insignificant main effect terms are removed from the model if they are not involved in any of the interaction terms. This step is conducted since the incorporation of the interaction effects may have accounted for the main effect in the model.

Table 2 - Predictor variables, variable codes, and Wald Chi-square p-values in the univariate analysis.

| Variable | Code/value | Chi-square p-value |
|---|---|---|
| Weekend | 0 = M, T, W, R; 1 = F, Sa, Su | <0.0001 |
| Animal (FHE) | 0 = no animal; 1 = animal | <0.0001 |
| Rollover (FHE) | 0 = no rollover; 1 = rollover | <0.0001 |
| Guardrail (FHE) | 0 = guardrail; 1 = no guardrail | <0.0001 |
| Fixed object (FO) (FHE) | 0 = no FO; 1 = fixed object | <0.0001 |
| Number of vehicles | 0 = one vehicle; 1 = more than one vehicle | <0.0001 |
| FHE location | 0 = on roadway; 1 = off roadway | <0.0001 |
| Lighting | 0 = daylight; 1 = darkness | <0.0001 |
| Impaired | 0 = not impaired; 1 = impaired | <0.0001 |
| Road condition | 0 = dry; 1 = wet, snow, etc. | <0.0001 |
| Mean posted speed | 0 = below mean posted speed; 1 = above mean posted speed | <0.0001 |
| Pavement (surface) | 0 = paved; 1 = unpaved | 0.0106 |
| Level grade | 0 = level; 1 = not level | <0.0001 |
| Horizontal alignment | 0 = straight; 1 = curve | <0.0001 |
| Truck | 0 = truck; 1 = not truck | <0.0001 |
| Motorcycle (MC) | 0 = not MC; 1 = MC | <0.0001 |
| Mean speed | 0 = below mean speed; 1 = above mean speed | <0.0001 |
| Vehicle state | 0 = Wyoming; 1 = out of state | <0.0001 |
| Vehicle maneuver | 0 = straight; 1 = not straight | <0.0001 |
| Driver age (year) | 1 = under 26; 0 = 26 or over | 0.0593 |
| Driver age (year) | 1 = over 65; 0 = 65 or under | 0.2610 |
| Driver gender | 0 = female; 1 = male | <0.0001 |
| Driver safety equipment | 0 = used; 1 = not used | <0.0001 |
| Driver distraction | 0 = not distracted; 1 = distracted | <0.0001 |
| Median | 0 = median; 1 = no median | <0.0001 |
| Rumble strip | 0 = rumble strip; 1 = no rumble strip | 0.0409 |
| Left shoulder | 0 = left shoulder; 1 = no left shoulder | 0.0011 |
| Right shoulder | 0 = right shoulder; 1 = no right shoulder | 0.0010 |
| ADT | 0 = below mean ADT; 1 = above mean ADT | <0.0001 |
| ADTT | 0 = below mean ADTT; 1 = above mean ADTT | 0.0117 |
| VMT | 0 = below mean VMT; 1 = above mean VMT | 0.2648 |
| TVMT | 0 = below mean TVMT; 1 = above mean TVMT | <0.0001 |

Table 3 - Interactions considered for inclusion in the logistic regression model.

| Possible interaction | Possible interaction |
|---|---|
| Animal and lighting | Weekend and impaired |
| Rollover and lighting | Median and left shoulder |
| Guardrail and lighting | Rumble strip and right shoulder |
| Fixed object and lighting | Mean speed and MC |
| FHE location and lighting | Mean speed and animal |
| Lighting and impaired | Mean speed and rollover |
| Lighting and road cond. | Mean speed and fixed object |
| Lighting and age >65 years | Mean speed and impaired |
| Impaired and rollover | Mean speed and road cond. |
| Impaired and fixed object | Mean speed and surface |
| Impaired and road cond. | Mean speed and alignment |
| Impaired and alignment | Mean speed and age <25 years |
| Impaired and maneuver | Mean speed and gender |
| Impaired and level | Distracted and mean speed |
| Impaired and age <25 years | Distracted and lighting |
| Impaired and MC | Distracted and impaired |
| Impaired and gender | Distracted and road cond. |
| Road cond. and MC | Distracted and alignment |
| Road cond. and alignment | Distracted and maneuver |
| Road cond. and age >65 years | Distracted and level |
| Mean post sp. and surface | Distracted and rollover |
| Mean post sp. and level | Distracted and age <25 years |
| Mean post sp. and alignment | Distracted and gender |

3.5. Model adequacy

The adequacy of the model is assessed once the predictors are incorporated and the interactions are chosen. Model adequacy can be assessed by goodness-of-fit tests that assess the difference between the observed and fitted values. A standard test for goodness-of-fit in logistic regression is the Hosmer-Lemeshow test (Hosmer et al., 2013; Kutner et al., 2004). The data are grouped into classes with similar fitted values with approximately the same number of observations. Based on these groupings, the Pearson Chi-square statistic is calculated.

Model adequacy can also be assessed by its classification or predictive ability. The model can be used to obtain the estimated probability (p̂) of a severe crash. When p̂ is high, the outcome 1 (severe crash) is predicted, and when p̂ is low, the outcome 0 (not severe crash) is predicted. The sensitivity is the proportion of severe crashes that are predicted by the model to be severe. The specificity is the proportion of non-severe crashes predicted to be non-severe. A good model has both high sensitivity and high specificity. However, these calculations depend on a cut-off value that determines how large p̂ must be to classify a crash as severe. A more complete description of the predictive ability of a model is the receiver operating characteristic (ROC) curve (Hosmer et al., 2013). The ROC curve is a plot of sensitivity against 1 − specificity across a range of cut-points. The area under the ROC curve is commonly used as a summary measure of the predictive ability of the model. General guidelines provided by Hosmer et al. (2013) suggest that values >0.7 indicate acceptable predictive ability, >0.8 indicate excellent predictive ability, and >0.9 indicate outstanding predictive ability.
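The following sketch shows how sensitivity, specificity, and the area under the ROC curve can be computed from observed outcomes and fitted probabilities. The function names and the four-crash toy data are hypothetical. The area is computed through its rank interpretation (the probability that a randomly chosen severe crash has a higher fitted probability than a randomly chosen non-severe crash, with ties counted as one half), which is equivalent to integrating the ROC curve.

```python
def sensitivity_specificity(y, p_hat, cutoff):
    """Classify a crash as severe when p_hat >= cutoff and score the result."""
    tp = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi >= cutoff)
    fn = sum(1 for yi, pi in zip(y, p_hat) if yi == 1 and pi < cutoff)
    tn = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi < cutoff)
    fp = sum(1 for yi, pi in zip(y, p_hat) if yi == 0 and pi >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y, p_hat):
    """Area under the ROC curve via pairwise comparison of severe and non-severe crashes."""
    severe = [pi for yi, pi in zip(y, p_hat) if yi == 1]
    not_severe = [pi for yi, pi in zip(y, p_hat) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in severe for b in not_severe)
    return wins / (len(severe) * len(not_severe))

# Toy data (hypothetical): 1 = severe crash, 0 = not severe.
y = [0, 0, 1, 1]
p_hat = [0.10, 0.40, 0.35, 0.80]
sens, spec = sensitivity_specificity(y, p_hat, cutoff=0.5)
auc = roc_auc(y, p_hat)
print(sens, spec, auc)
```

The pairwise form is quadratic in the number of crashes, so for data sets of the size analyzed here a sort-based implementation would be preferable; the simple form is used only to keep the definition visible.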

3.6. Models for highway system

Multiple logistic regression models are developed for five different rural highway systems across the state: the global system, interstate system, state system, county system, and WRIR system. The global system is a combination of the interstate, state, county, and WRIR systems. The state system includes U.S. and state highways maintained by the Wyoming Department of Transportation (WYDOT). The county system includes all rural county local highways. The Wind River Indian Reservation maintains Indian reservation roads (IRR) and some county roads. However, state and U.S. highways traverse the reservation as well, so the WRIR system includes all highway systems on the reservation.

Each of these models starts with the same set of predictor variables listed in Table 2 from the univariable analysis. Stepwise variable selection is applied separately to model each highway system. The interaction terms in Table 3 are then assessed separately for each highway system. Through this process, the important predictors of crash severity could be identified for each system and then compared across systems.

4. Results

4.1. Model specification

The main effects models are those obtained from the stepwise variable selection procedure before incorporating interaction terms. The modeling at this stage is affected by the combinations of missing values for the variables shown in Table 1. A crash is dropped from the analysis at this stage if any of the variables included in the stepwise selection procedure has a missing value. As a result, a number of crashes are dropped from the analysis. An alternative would be multiple imputation, which uses a specified model to predict values for the missing data in order to obtain a complete data set (Hosmer et al., 2013). This approach is not used in order to work with the existing information provided by WYDOT, and to avoid making the subjective and complex assumptions required for the model specification in multiple imputation. The variables obtained from the stepwise procedure for each roadway system are shown in Table 4.

The "X" indicates that the variable remained in the model through the stepwise procedure. The "All" column for each roadway system model shows the variables that are selected in the stepwise procedure when all 33 variables are included. The resulting model is referred to as the all model. The "Rem" column for each system model shows the variables that are selected by the stepwise procedure after the variables with large amounts of missing data are removed. The resulting model is referred to as the removed model. The variables with a large number of missing values include driver distraction and the traffic data (ADT, ADTT, VMT, TVMT). The value "R" in the table indicates that the variable is removed in that model. Table 4 also shows the number and percent of crashes and severe crashes used in the stepwise procedure. The percent of severe crashes increases slightly when the variables with a large number of missing values are removed ("Rem"). For the global, interstate, and state systems, the share of severe crashes is about two percentage points higher for the removed model than for the all model. For the county and WRIR systems, it is about four to five percentage points higher. From Table 4, it can also be noted that the variables animal, impaired, motorcycle (MC), and driver safety equipment remain in all roadway system models. The removed models are based on more crashes: the share of crashes used increases by 19 to 32 percentage points, depending on the system. They also capture more severe crashes. However, a drawback of this approach is that some of the removed predictors are important in some of the models. In addition, incorporation of traffic volumes and driver exposure is important in performing an accurate crash analysis. Distracted driving is a major concern and is included in this study due to the results from Andreen and Ksaibati (2012) and Bham et al. (2012).
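The effect of dropping crashes with missing values can be illustrated with a small complete-case filter. The record layout, the field names, and the four toy crashes below are hypothetical; they only show why excluding sparsely recorded variables (the removed model) retains more crashes than requiring every variable to be present (the all model).

```python
def complete_cases(crashes, variables):
    """Keep only crashes with a recorded value for every listed variable."""
    return [c for c in crashes if all(c.get(v) is not None for v in variables)]

def severe_share(crashes):
    """Proportion of crashes coded as severe (severe == 1)."""
    return sum(c["severe"] for c in crashes) / len(crashes)

# Toy records (hypothetical); None marks a value missing from the crash report.
crashes = [
    {"severe": 0, "impaired": 0, "belt": 1, "distracted": None, "adt": 1200},
    {"severe": 1, "impaired": 1, "belt": 0, "distracted": None, "adt": None},
    {"severe": 0, "impaired": 0, "belt": 1, "distracted": 0, "adt": 800},
    {"severe": 1, "impaired": 0, "belt": 0, "distracted": 1, "adt": None},
]

all_vars = ["impaired", "belt", "distracted", "adt"]
removed_vars = ["impaired", "belt"]  # sparsely recorded variables excluded

used_all = complete_cases(crashes, all_vars)
used_removed = complete_cases(crashes, removed_vars)
print(len(used_all), len(used_removed), severe_share(used_removed))
```

In this toy example the share of severe crashes also changes between the two subsets, which mirrors the pattern reported in Table 4.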

Table 4 shows that the variables animal, impaired, motorcycle, and seat belts are included in every model and in all cases. In addition to these variables, the global, interstate, and state systems include the variables number of vehicles, FHE location, road surface condition, mean posted speed, level grade, horizontal alignment, mean speed, driver age >65 years or <25 years, gender, and ADT (when included). Only the model for the county system included the variable pavement surface. Only the global (removed) and the WRIR system models contained the variable state (state of residence of driver).

As a result of the above considerations, two main effects models are created for each highway system and the global system. One includes all of the variables from Table 2 (all), and the other is based on the removal of driver distraction and the traffic data (ADT, ADTT, VMT, TVMT) (removed). These five predictors are removed because they have such a large number of missing values.

Final models are obtained for each roadway system as described in the methodology section. In light of the previous observations, the global system is modeled both with all 33 regressors (all model) and without driver distraction and traffic data (removed model). The final models for the other systems are obtained without driver distraction and traffic data (removed model). The interaction terms in Table 3 are examined provided the corresponding terms are included in the main effects model. Final models are obtained after testing these interaction terms.

The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) were calculated for the models and for individual variables in the model to provide additional measures for model comparison (SAS Institute Inc., 2008). Models with smaller values of these information criteria are preferred. For the individual variables, these information criteria are for models not including that particular term. Thus, a term would be deemed necessary when its AIC or BIC value increases above that reported for the final model. AIC, BIC,

Table 4 - Variables selected for main effect models using forward stepwise selection along with the number of observations used in the selection procedure for the all and removed models.

Variable Global Interstate State County WRIR

All Rem. All Rem. All Rem. All Rem. All Rem.

Weekend X

Animal (FHE) X X X X X X X X X X

Rollover (FHE) X X X X X X

Guardrail (FHE) X X X X X

Fixed object (FHE) X X X X X X

Number of vehicles X X X X X X

FHE location X X X X X X

Lighting X X

Impaired X X X X X X X X X X

Road condition X X X X X X X X X

Mean posted speed X X X X X X X X

Pavement (surface) X X

Level grade X X X X X X X X

Horizontal alignment X X X X X X

Truck X X

MC X X X X X X X X X X

Mean speed X X X X X X X X

Vehicle state X X

Vehicle maneuver X X X X

Driver age <25 years X X X X

Driver age >65 years X X X X X X X

Driver gender X X X X X X

Driver safety equipment X X X X X X X X X X

Driver distraction X R X R R R R

Median X X

Rumble strip

Left shoulder

Right shoulder

ADT X R X R X R NA R R

ADTT X R R R NA R R

VMT X R R R NA R R

TVMT R R X R NA R R

Total crashes 96,791 96,791 34,266 34,266 54,381 54,381 7980 7980 2212 2212

Data used 39,467 64,761 13,892 25,010 25,593 35,954 1757 3811 644 1170

Percentage of data used (%) 40.78 66.91 40.54 72.99 47.06 66.12 22.02 47.76 29.11 52.89

Severe crashes 2355 5483 847 2044 1512 3016 122 419 49 150

Percentage of severe crashes (%) 5.97 8.47 6.10 8.17 5.91 8.39 6.94 10.99 7.61 12.82

and likelihood ratio tests (LRTs) are also calculated for the final model and for the main effect model without the interactions in order to assess the necessity of incorporating interaction terms (SAS Institute Inc., 2008).
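As a minimal sketch of these comparison measures, the snippet below uses the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k ln(n), together with the likelihood ratio statistic for nested models. The log-likelihood values, parameter counts, and sample size are hypothetical and are not taken from the fitted models.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: smaller is preferred."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: penalizes each parameter by ln(n_obs)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def lrt_statistic(log_lik_full, log_lik_reduced):
    """Likelihood ratio statistic for a reduced model nested in a full model."""
    return 2.0 * (log_lik_full - log_lik_reduced)

# Hypothetical fit summaries: the full model has one extra (interaction) term.
n_obs = 1000
full = {"log_lik": -400.0, "n_params": 6}
reduced = {"log_lik": -404.0, "n_params": 5}

aic_full = aic(full["log_lik"], full["n_params"])
aic_reduced = aic(reduced["log_lik"], reduced["n_params"])
bic_full = bic(full["log_lik"], full["n_params"], n_obs)
bic_reduced = bic(reduced["log_lik"], reduced["n_params"], n_obs)
lrt = lrt_statistic(full["log_lik"], reduced["log_lik"])
print(aic_full, aic_reduced, bic_full, bic_reduced, lrt)
```

Because the BIC penalty grows with ln(n), it can favor the reduced model in large samples even when AIC and the likelihood ratio test favor the full model.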

4.2. Global system

The final all model and the final removed model for the global system are given in Table 5. The Wald Chi-square test values, p-values from the Wald Chi-square test, AIC values, and BIC values are given for each of the statistically significant variables. AIC and BIC values presented for a variable are those obtained when that variable is removed from the model. For comparison purposes, the likelihood ratio test results, AIC, and BIC values are also presented for the full model and for the model containing only main effects. The final models have been obtained after removing main effects not selected by the stepwise procedure and after incorporating significant interactions. The final all model is based upon 39,649 crashes among which 2378 are severe. The final removed model is based on 64,835 crashes among which 5490 are severe. The information criteria values for AIC and BIC in the table are based on these crash numbers. AIC and BIC values for individual terms represent the information criteria value for a reduced model not including that particular term.

Each of these models includes five interaction terms. For the all model, the likelihood ratio test (Chi-square = 37.42, df = 5, p-value < 0.0001) and AIC indicate that interaction terms should be included in the model whereas BIC does not. For the removed model, the likelihood ratio test (Chi-square = 55.76, df = 5, p-value < 0.0001) and AIC indicate that interaction terms should be included, while BIC does not discriminate between the two models.
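The likelihood ratio statistic and the information criteria are linked: for nested models that differ by df parameters, AIC(main) − AIC(full) = LRT − 2·df and BIC(main) − BIC(full) = LRT − df·ln(n). The sketch below applies these identities to the values printed in the columns of Table 5, assuming the standard definitions of AIC and BIC and taking n as the number of crashes in each final model. The helper `chi2_sf` is a hypothetical function written for this example; it uses closed-form expressions for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))  # twice the upper normal tail
    term = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * root
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

# Values as printed in Table 5 (AIC and BIC are rounded to integers there).
table5 = {
    "all":     {"lrt": 37.424, "n": 39649, "aic": (14994, 15021), "bic": (15226, 15210)},
    "removed": {"lrt": 55.759, "n": 64835, "aic": (30209, 30254), "bic": (30445, 30445)},
}
df = 5  # five interaction terms in each model

checks = {}
for name, m in table5.items():
    aic_gap = m["aic"][1] - m["aic"][0]   # main effects minus full
    bic_gap = m["bic"][1] - m["bic"][0]
    checks[name] = {
        "aic_diff": aic_gap - (m["lrt"] - 2 * df),
        "bic_diff": bic_gap - (m["lrt"] - df * math.log(m["n"])),
        "p_value": chi2_sf(m["lrt"], df),
    }
print(checks)
```

Differences of less than one unit are within the rounding of the tabulated criteria. A negative BIC gap for the all model and a zero gap for the removed model correspond to BIC not favoring the interaction terms.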

Table 6 shows the estimates (b), standard errors [sd(b)], odds ratio estimates (odds), and 95% confidence interval estimates for the odds ratio as described in Section 3.1. Estimates and confidence intervals involving interaction terms are based on Eqs. (4) and (5), and must be interpreted accordingly.
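For a main effect, the tabulated odds ratio and its interval can be reproduced from the estimate and standard error, assuming the usual Wald construction exp(b ± z·se) with z ≈ 1.96. The sketch below applies this to the rollover row of the global all model in Table 6; the function name is hypothetical, and small discrepancies arise because the tabulated estimates are rounded to three decimals.

```python
import math

def odds_ratio_ci(estimate, std_error, z=1.959964):
    """Return (odds ratio, lower limit, upper limit) of a Wald-type 95% interval."""
    return (math.exp(estimate),
            math.exp(estimate - z * std_error),
            math.exp(estimate + z * std_error))

# Rollover, global all model (Table 6): Est. = 0.875, S.E. = 0.082
odds, lower, upper = odds_ratio_ci(0.875, 0.082)
print(round(odds, 3), round(lower, 3), round(upper, 3))
```

The tabulated values for this row are 2.400, 2.044, and 2.817.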

For the all model, the estimated coefficients for mean ADT and mean ADTT are negative, which indicates that crashes in higher traffic volumes are less likely to be severe. This may be a result of reduced speeds, more attentive driving behaviors, or a higher likelihood of crashes that happen not to be severe. The positive estimated coefficient of 0.13 for mean VMT indicates that the estimated odds of a severe crash are 1.14 times higher for a crash where VMT exceeds the mean than for one where it does not. The all global system model results show that distracted driving and traffic data (ADT, ADTT, VMT, and TVMT) have important associations with crash severity. Thus, it is imperative that crash investigators record this information in the crash record.

The effect of distracted and impaired driving behaviors on crash severity cannot be separated, as indicated by the presence of the corresponding interaction term. The effect of impaired driving also cannot be separated from the effects of lighting and rollover due to the presence of the corresponding interaction terms in the model. In particular, the estimated odds of a severe crash for a non-impaired (x_k = 0), distracted driver (x_k′ = 1) are 1.27 times those of a crash for a non-impaired (x_k = 0), non-distracted driver (x_k′ = 0). On the other hand, the estimated odds of a severe crash for an impaired (x_k = 1), distracted driver (x_k′ = 1) are 6.15 times those of a crash for an impaired (x_k = 1), non-distracted driver (x_k′ = 0). The estimated odds of a severe crash for an impaired (x_k = 1), distracted driver (x_k′ = 1) are 10.70 times those of a crash for a distracted (x_k′ = 1), non-impaired driver (x_k = 0), assuming the crash is not a rollover and occurs during daylight hours. However, the associated confidence intervals are fairly wide for these latter two terms, likely due to the lack of information on driver distraction in the recorded crash reports. Another important risk factor combination involves motorcycles and mean speed. For crashes below the mean speed, the estimated odds of a severe crash are 7.08 times higher when a motorcycle is involved than when no motorcycle is involved. Similarly, for crashes above the mean speed, the estimated odds of a severe crash are 13.80 times higher when a motorcycle is involved than when no motorcycle is involved.
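The three odds ratios for impaired and distracted driving can be traced back to Table 6 with a short calculation. The sketch assumes a model with two indicator variables and their product, so that the log-odds contribution is b_imp·x_imp + b_dis·x_dis + b_int·x_imp·x_dis with all other variables held at their reference levels. The interaction coefficient is not tabulated directly; here it is inferred as the difference between the combined estimate (1.816) and the distracted main effect (0.240), which is an assumption of this example.

```python
import math

B_IMPAIRED = 0.795                    # Table 6, global all model
B_DISTRACTED = 0.240                  # Table 6, global all model
B_INTERACTION = 1.816 - B_DISTRACTED  # inferred, not reported directly

def log_odds_part(impaired, distracted):
    """Contribution of the two indicators and their product to the log-odds."""
    return (B_IMPAIRED * impaired
            + B_DISTRACTED * distracted
            + B_INTERACTION * impaired * distracted)

def odds_ratio(profile_a, profile_b):
    """Odds of a severe crash for profile_a relative to profile_b."""
    return math.exp(log_odds_part(*profile_a) - log_odds_part(*profile_b))

# Profiles are (impaired, distracted) pairs.
or_dist_given_sober = odds_ratio((0, 1), (0, 0))
or_dist_given_impaired = odds_ratio((1, 1), (1, 0))
or_impaired_given_dist = odds_ratio((1, 1), (0, 1))
print(round(or_dist_given_sober, 2),
      round(or_dist_given_impaired, 2),
      round(or_impaired_given_dist, 2))
```

The results agree with the reported 1.27, 6.15, and 10.70 up to rounding of the tabulated estimates, and show that the effect of one behavior depends on the level of the other.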

For the removed global system, many of the estimated coefficients increase in magnitude, particularly for those that are expected to be associated with crash severity. This

Table 5 - Logistic regression model results for the global system for the all model and removed model.

Variable Global all Global removed

Chi-square p-value AIC BIC Chi-square p-value AIC BIC

Full model 14,994 15,226 30,209 30,445

Main effects model 37.424 <0.0001 15,021 15,210 55.759 <0.0001 30,254 30,445

Intercept 594.447 <0.0001 15,857 16,081 1159.706 <0.0001 31,630 31,857

Animal 44.686 <0.0001 15,042 15,265 255.318 <0.0001 30,506 30,733

Rollover 114.681 <0.0001 15,109 15,332 397.438 <0.0001 30,621 30,848

Guardrail 16.068 <0.0001 15,010 15,233 44.010 <0.0001 30,255 30,482

Vehicle 199.709 <0.0001 15,212 15,435 152.462 <0.0001 30,363 30,590

FHE location 12.540 0.0004 15,004 15,228 42.615 <0.0001 30,250 30,477

Lighting 6.892 0.0087 14,999 15,222

Impaired 23.207 <0.0001 15,013 15,236 247.545 <0.0001 30,431 30,658

Road condition 62.064 <0.0001 15,056 15,280 54.502 <0.0001 30,264 30,491

Mean posted speed 54.700 <0.0001 15,048 15,272 40.534 <0.0001 30,249 30,476

Level 19.037 <0.0001 15,011 15,234 47.605 <0.0001 30,252 30,479

Alignment 31.635 <0.0001 15,022 15,246 45.298 <0.0001 30,251 30,478

Truck 16.342 <0.0001 30,224 30,451

Motorcycle (MC) 182.998 <0.0001 15,145 15,368 309.835 <0.0001 30,481 30,708

Mean speed 48.124 <0.0001 15,041 15,265 106.323 <0.0001 30,317 30,544

State 4.406 0.0358 30,211 30,438

Maneuver 5.225 0.0223 14,997 15,220 13.998 0.0002 30,221 30,448

Age <25 years 44.251 <0.0001 30,252 30,479

Age >65 years 29.923 <0.0001 15,019 15,243 24.633 <0.0001 30,230 30,457

Gender 12.161 0.0005 15,004 15,228 1.419 0.2336 30,208 30,435

Safety equip. use 421.569 <0.0001 15,365 15,588 1436.924 <0.0001 31,537 31,764

Distracted 6.868 0.0088 14,998 15,222

Median 10.364 0.0013 30,217 30,444

Mean ADT 77.471 <0.0001 15,070 15,293

Mean ADTT 20.593 <0.0001 15,013 15,236

Mean VMT 5.984 0.0144 14,998 15,221

Animal and lighting 4.678 0.0305 14,996 15,220

Lighting and impaired 5.097 0.0240 14,997 15,220

Rollover and impaired 5.161 0.0231 14,998 15,221 4.604 0.0319 30,211 30,438

MC and mean speed 14.327 0.0002 15,006 15,230 12.090 0.0005 30,219 30,446

Mean posted sp. and level 27.345 <0.0001 30,234 30,460

Impaired and distracted 9.214 0.0024 15,000 15,224

Road cond. and mean speed 5.017 0.0251 30,212 30,439

Mean speed and gender 4.187 0.0407 30,211 30,438

includes animal, impairment, and safety equipment use. This model also contains various interactions involving mean speed and mean posted speed, showing how their effects on crash severity depend upon motorcycles, level, road conditions, and gender. For example, the estimated odds of a severe crash on a non-level grade for a vehicle exceeding the mean posted speed are 2.27 times higher than those of a crash on a non-level grade for a vehicle not exceeding the mean posted speed. In addition, the estimated odds of a severe crash are 1.82 times higher for a male driver exceeding the mean speed than for a male driver not exceeding the mean speed, assuming the crash occurred without a motorcycle and in dry conditions. The estimated odds of a severe crash are 2.12 times higher if the driver involved in the crash exceeded the mean speed than if the driver did not exceed the mean speed, assuming the crash occurred without a motorcycle, in dry conditions, and involved a female driver.

Both global system models demonstrate similar predictive ability with areas under the ROC curve of 0.7998 for the all model and 0.8101 for the removed model. Under the guidelines of Hosmer et al. (2013), the removed model has excellent predictive ability and the all model sits at the boundary between acceptable and excellent. However, the Hosmer-Lemeshow goodness-of-fit test showed evidence against the assumption of an adequate model fit with a p-value < 0.0001 for the all model and 0.0006 for the removed model. This evidence of lack of fit could indicate a failure to adequately account for the large amount of information in the global system, which consists of nearly 40,000 crashes for the all model and over 60,000 crashes for the removed model. Recall that the

Table 6 - Estimates (Est.), standard errors (S.E.), odds ratio estimates (Odds), and 95% confidence interval estimates (2.5% and 97.5%) for statistically significant variables in the all and removed logistic regression models of the global system.

Variable Global all Global removed

Est. S.E. Odds 2.5% 97.5% Est. S.E. Odds 2.5% 97.5%

Intercept -4.536 0.186 0.011 0.007 0.015 -4.410 0.129 0.012 0.009 0.016

Animal -0.974 0.146 0.378 0.284 0.502 -1.378 0.086 0.252 0.213 0.299

Rollover 0.875 0.082 2.400 2.044 2.817 0.902 0.045 2.464 2.255 2.693

Guardrail 0.628 0.157 1.874 1.378 2.547 0.541 0.082 1.718 1.464 2.017

Vehicle 1.146 0.081 3.146 2.684 3.688 0.660 0.053 1.935 1.742 2.149

FHE location 0.267 0.076 1.307 1.127 1.515 0.275 0.042 1.316 1.212 1.429

Lighting 0.153 0.058 1.165 1.040 1.306

Impaired 0.795 0.165 2.214 1.602 3.059 1.050 0.067 2.857 2.507 3.256

Road condition -0.445 0.056 0.641 0.574 0.716 -0.511 0.069 0.600 0.524 0.687

Mean posted speed 0.573 0.077 1.773 1.523 2.063 0.419 0.066 1.520 1.336 1.730

Level -0.219 0.050 0.803 0.728 0.886 -0.472 0.068 0.624 0.546 0.713

Alignment 0.331 0.059 1.393 1.241 1.563 0.250 0.037 1.284 1.194 1.381

Truck 0.222 0.055 1.249 1.121 1.391

Motorcycle (MC) 1.957 0.145 7.079 5.331 9.400 1.857 0.106 6.405 5.208 7.876

Mean speed 0.467 0.067 1.594 1.397 1.819 0.751 0.073 2.120 1.838 2.445

State 0.074 0.035 1.077 1.005 1.155

Maneuver -0.121 0.053 0.886 0.798 0.983 -0.151 0.040 0.860 0.794 0.931

Age <25 years -0.239 0.036 0.787 0.734 0.845

Age >65 years 0.426 0.078 1.532 1.315 1.784 0.305 0.061 1.356 1.202 1.530

Gender -0.175 0.050 0.840 0.761 0.926 -0.075 0.063 0.928 0.821 1.049

Safety equip. use 1.223 0.060 3.396 3.022 3.816 1.453 0.038 4.275 3.965 4.608

Distracted 0.240 0.092 1.272 1.062 1.522

Median 0.113 0.035 1.120 1.045 1.200

Mean ADT -0.633 0.072 0.531 0.461 0.611

Mean ADTT -0.283 0.062 0.753 0.667 0.851

Mean VMT 0.128 0.052 1.136 1.026 1.259

Animal and lighting + animal -1.368 0.145 0.255 0.192 0.339

Animal and lighting + lighting -0.241 0.173 0.786 0.560 1.104

Lighting and impaired + impaired 1.291 0.157 3.637 2.676 4.943

Lighting and impaired + lighting 0.650 0.212 1.915 1.263 2.903

Rollover and impaired + impaired -0.330 0.504 0.719 0.268 1.929 0.823 0.085 2.278 1.927 2.694

Rollover and impaired + rollover -0.249 0.495 0.779 0.295 2.058 0.676 0.102 1.965 1.609 2.399

MC and mean speed + MC 2.625 0.118 13.802 10.948 17.401 2.309 0.086 10.065 8.504 11.913

MC and mean speed + mean speed 1.134 0.172 3.109 2.218 4.357 1.203 0.142 3.331 2.523 4.400

Mean post sp. and level + level -0.070 0.037 0.932 0.866 1.003

Mean post sp. and level + mean post sp. 0.821 0.054 2.272 2.045 2.524

Impaired and distracted + distracted 1.816 0.511 6.146 2.257 16.734

Impaired and distracted + impaired 2.370 0.525 10.699 3.825 29.925

Road cond. and mean speed + mean speed 0.574 0.087 1.774 1.495 2.106

Road cond. and mean speed + road cond. -0.688 0.043 0.502 0.462 0.546

Mean speed and gender + gender -0.227 0.042 0.797 0.734 0.865

Mean speed and gender + mean speed 0.599 0.057 1.821 1.627 2.037

global system embodies all of the other highway systems. Thus, lack of fit might be detected if there is lack of fit in any other system or if there are differences between the systems since such differences are not accounted for by this model. The different roadway systems are now discussed separately below.

4.3. Interstate system

The results for the final model obtained for the interstate system are shown in Table 7. This model includes three interaction terms, all involving mean speed. The likelihood ratio test (Chi-square = 38.68, df = 3, p-value < 0.0001), AIC, and BIC indicate that interaction terms should be included in the model. With an area under the ROC curve of 0.7961, the predictive ability of the interstate system model sits at the boundary between acceptable and excellent. The Hosmer-Lemeshow test also shows no evidence against the assumption of adequate model fit as the p-value is 0.1137. Estimates and confidence intervals for this model are shown in Table 8.

Many of the same variables remain in the final models for the interstate system and the global system. These include the number of vehicles, impaired driving, and safety equipment use (seat belt use), for which the estimated odds ratios are 2.94, 2.19, and 5.50, respectively. This means that the estimated odds of a severe crash are 5.5 times higher without a seat belt than with a seat belt. One surprise is that the estimated coefficient associated with median is negative for the interstate system, which indicates that the probability of a severe crash is lower when there is no median. Only 2.6% (889) of all crashes on the interstate occur without a median. Of these crashes, 5% (44) are severe, compared with the 7% (2310) of crashes that are severe with a median. Nevertheless, it is expected that most interstates would have a median. Further investigation should be made about the locations of these crashes to determine why there are no medians. The effects of rollovers, mean speed, and motorcycles are linked through various interactions. For example, the estimated odds of a severe crash occurring above the mean speed are 26.6 times higher when the crash involves a motorcycle than when it does not.
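The median percentages can be checked with simple arithmetic. The sketch assumes that the total of 34,266 interstate crashes from Table 4 is the relevant denominator and that the 2310 severe crashes occurred among the remaining crashes, that is, those on sections with a median. Both assumptions are made for this example.

```python
total_interstate = 34266      # total interstate crashes (Table 4)
no_median = 889               # crashes on sections without a median
severe_no_median = 44
severe_with_median = 2310     # assumed to refer to sections with a median

with_median = total_interstate - no_median

share_no_median = 100 * no_median / total_interstate
severe_rate_no_median = 100 * severe_no_median / no_median
severe_rate_with_median = 100 * severe_with_median / with_median

print(round(share_no_median, 1),
      round(severe_rate_no_median, 1),
      round(severe_rate_with_median, 1))
```

Under these assumptions the three rounded values match the 2.6%, 5%, and 7% quoted in the text.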

4.4. State system

The results for the final model obtained for the state system are shown in Table 7. This model includes four interaction terms involving either mean speed or mean posted speed. The likelihood ratio test (Chi-square = 39.28, df = 4, p-value < 0.0001) and AIC indicate that interaction terms should be included in the model whereas BIC does not favor incorporation of interaction terms. The predictive ability of the state model is excellent with an area under the ROC curve of 0.8287. However, the Hosmer-Lemeshow test indicates that the model does not provide an adequate fit, with a p-value of 0.0059. This could be because the highways included in the Wyoming state system vary widely in geometry and maintenance levels among state highways and primary and secondary U.S. highways. In addition, this model includes these highways across the entire state, where terrain and surrounding conditions differ from one location to the next. Additional modeling may be necessary to account for such differences. The lack of fit in the state system likely also contributes to the lack of fit found in the global system. Estimates and confidence intervals for this model are shown in Table 8.

Animal and safety equipment are predominant main effects with odds ratio estimates of 0.183 and 3.728, respectively. The mean posted speed plays an important role in this model as it interacts with level and alignment. In particular, the estimated odds of a severe crash where the vehicle exceeds the mean posted speed are 2.38 times higher than those of a crash where the vehicle does not exceed the mean posted speed, assuming the crash occurred on a level grade with a straight horizontal alignment. The estimated odds of a severe crash on a non-level grade where the vehicle exceeds the mean posted speed are 3.47 times higher than those of a crash on a non-level grade where the vehicle does not exceed the mean posted speed, assuming the crash occurred with a straight horizontal alignment. Impairment, motorcycles, and mean speed also play important roles in this model through interaction terms. For example, the estimated odds of a severe crash involving a rollover are 2.10 times higher than those of a crash not involving a rollover when the crash occurred below the mean speed. Alternatively, the estimated odds of a severe crash involving a rollover above the mean speed are 1.37 times higher than those of a crash not involving a rollover above the mean speed. While the effects of rollover are important, they are not as large as for the interstate system.

4.5. County system

The results for the final model obtained for the county system are shown in Table 7. This model is based on a smaller number of crashes, around 4000. As a result, it is expected that fewer predictors may be identified by the model selection procedure. This model includes one interaction term. The likelihood ratio test (Chi-square = 7.89, df = 1, p-value = 0.0050) and AIC indicate that the interaction term should be included in the model whereas BIC does not favor one model over the other. The predictive ability of the county system model is excellent with an area under the ROC curve of 0.8345. The Hosmer-Lemeshow test also does not provide evidence against the assumption of an adequate model fit as the p-value is 0.7797. Table 9 shows the estimates and confidence intervals for this model.

Impairment, motorcycles, and seat belt use are predominant main effects in this model with estimated odds ratios of 3.26, 6.30, and 4.28, respectively. Thus, the estimated odds of a severe crash are more than 6 times higher if that crash involves a motorcycle. This is the only model to identify pavement surface as an important predictor. This is understandable since many county roads are unpaved. According to the model results, the effect of surface on crash severity depends on mean speed. Thus, the estimated odds of a severe crash on an unpaved surface are 0.47 times as high as those of a crash on a paved surface, assuming the crash occurs below the mean speed. Alternatively, the estimated

Table 7 - Logistic regression model results for the interstate, state, county, and WRIR systems.

Variable Interstate State

Chi-sq. p-value AIC BIC Chi-sq. p-value AIC BIC

All variables 11,668 11,863 16,229 16,425

Main effect variables 38.681 <0.0001 11,701 11,871 39.275 <0.0001 16,260 16,422

Intercept 551.913 <0.0001 12,285 12,472 331.992 <0.0001 16,628 16,814

Animal 34.939 <0.0001 11,707 11,894 243.936 <0.0001 16,490 16,677

Rollover 76.418 <0.0001 11,745 11,932 22.701 <0.0001 16,249 16,435

Guardrail 35.651 <0.0001 11,703 11,890 4.120 0.0424 16,231 16,418

FO 0.197 0.6568 11,666 11,853 22.131 <0.0001 16,249 16,436

Vehicle 109.202 <0.0001 11,782 11,969 30.864 <0.0001 16,259 16,446

FHE location 29.765 <0.0001 11,697 11,884 11.119 0.0009 16,238 16,425

Impaired 50.431 <0.0001 11,713 11,900 210.970 <0.0001 16,425 16,612

Road condition 103.802 <0.0001 11,771 11,958 146.644 <0.0001 16,386 16,573

Mean posted speed 33.129 <0.0001 11,702 11,889 47.046 <0.0001 16,279 16,466

Surface

Level 8.104 0.0044 11,674 11,861 13.815 0.0002 16,240 16,427

Alignment 18.376 <0.0001 11,684 11,871 26.001 <0.0001 16,251 16,438

Truck 5.319 0.0211 11,672 11,858

Motorcycle (MC) 47.585 <0.0001 11,707 11,894 157.801 <0.0001 16,367 16,554

Mean speed 4.405 0.0358 11,671 11,857 80.099 <0.0001 16,309 16,496

Maneuver 6.420 0.0113 11,673 11,860 6.600 0.0102 16,234 16,421

Age <25 years 13.563 0.0002 11,680 11,680 22.875 <0.0001 16,250 16,437

Age >65 years 7.233 0.0072 11,673 11,860 15.450 <0.0001 16,242 16,429

Gender 20.715 <0.0001 11,686 11,873 12.274 0.0005 16,239 16,426

Safety equipment 587.006 <0.0001 12,193 12,380 716.303 <0.0001 16,891 17,078

Median 4.589 0.0322 11,671 11,858

MC and mean speed 19.982 <0.0001 11,687 11,874 3.873 0.0491 16,231 16,418

Rollover and mean speed 16.162 <0.0001 11,682 11,869 8.044 0.0046 16,235 16,422

FO and mean speed 15.245 <0.0001 11,682 11,869

Mean post sp and level 8.844 0.0029 16,236 16,423

Mean post sp and alignment 11.280 0.0008 16,238 16,425

Surface and mean speed

Road cond. and age >65 years

County WRIR

Chi-sq. p-value AIC BIC Chi-sq. p-value AIC BIC

2206 2288 776 834

7.888 0.0050 2212 2288 5.315 0.0211 779 832

166.270 <0.0001 2402 2478 154.392 <0.0001 1008 1061

40.060 <0.0001 2270 2346 26.834 <0.0001 814 866

15.223 <0.0001 2220 2296 7.889 0.0050 782 835

62.993 <0.0001 2263 2339 26.184 <0.0001 799 852

27.408 <0.0001 2234 2310 6.878 0.0087 782 834

22.175 <0.0001 2226 2302

14.659 0.0001 2219 2295

10.282 0.0013 2214 2290

78.469 <0.0001 2280 2356 11.619 0.0007 785 837

0.008 0.9305 2204 2280 10.130 0.0015 784 837

5.057 0.0245 779 832

7.689 0.0056 2212 2288

0.780 0.3771 774 827

152.733 <0.0001 2356 2432 61.965 <0.0001 836 889

7.780 0.0053 2212 2288

5.968 0.0146 779 832

Table 8 - Estimates (Est.), standard errors (S.E.), odds ratio estimates (Odds), and 95% confidence interval estimates (2.5% and 97.5%) for statistically significant variables in the logistic regression models of the interstate and state systems.

Variable Interstate State

Est. S.E. Odds 2.5% 97.5% Est. S.E. Odds 2.5% 97.5%

Intercept -4.593 0.196 0.010 0.007 0.015 -3.806 0.209 0.022 0.015 0.033

Animal -1.163 0.197 0.312 0.212 0.459 -1.699 0.109 0.183 0.148 0.226

Rollover 1.142 0.131 3.132 2.425 4.046 0.743 0.156 2.103 1.549 2.855

Guardrail 0.690 0.116 1.994 1.590 2.501 0.305 0.150 1.356 1.011 1.821

FO 0.075 0.169 1.078 0.774 1.502 -0.435 0.092 0.647 0.540 0.776

Vehicle 1.077 0.103 2.936 2.399 3.593 0.467 0.084 1.595 1.353 1.881

FHE location 0.358 0.066 1.430 1.258 1.626 0.207 0.062 1.230 1.089 1.390

Impaired 0.785 0.111 2.193 1.766 2.724 0.993 0.068 2.699 2.361 3.086

Road condition -0.564 0.055 0.569 0.510 0.634 -0.675 0.056 0.509 0.456 0.568

Mean posted speed 0.530 0.092 1.700 1.419 2.036 0.867 0.126 2.380 1.858 3.049

Surface

Level -0.147 0.052 0.863 0.780 0.955 -0.429 0.115 0.651 0.519 0.816

Alignment 0.256 0.060 1.292 1.149 1.453 0.634 0.124 1.886 1.478 2.406

Truck 0.168 0.073 1.183 1.026 1.364

Motorcycle (MC) 1.828 0.265 6.219 3.700 10.454 1.672 0.133 5.325 4.102 6.912

Mean speed 0.205 0.098 1.227 1.014 1.486 0.657 0.073 1.929 1.670 2.227

Maneuver -0.182 0.072 0.833 0.724 0.960 -0.136 0.053 0.872 0.786 0.968

Age <25 years -0.222 0.060 0.801 0.712 0.901 -0.229 0.048 0.795 0.724 0.873

Age >65 years 0.282 0.105 1.325 1.079 1.627 0.309 0.079 1.362 1.167 1.588

Gender -0.259 0.057 0.771 0.690 0.863 -0.164 0.047 0.849 0.775 0.930

Safety equipment 1.705 0.070 5.504 4.794 6.318 1.316 0.049 3.728 3.386 4.105

Median -0.462 0.215 0.630 0.413 0.961

MC and mean speed + MC 3.280 0.207 26.573 17.718 39.853 1.979 0.100 7.235 5.950 8.797

MC and mean speed + mean speed 1.657 0.320 5.243 2.801 9.815 0.963 0.148 2.621 1.961 3.502

Rollover and mean speed + mean speed 0.718 0.087 2.051 1.731 2.430 0.228 0.134 1.256 0.967 1.632

Rollover and mean speed + rollover 1.655 0.107 5.234 4.247 6.451 0.314 0.081 1.369 1.167 1.606

FO and mean speed + FO 0.740 0.125 2.096 1.640 2.678

FO and mean speed + mean speed 0.869 0.146 2.386 1.793 3.174

Mean post sp. and level + level -0.053 0.052 0.949 0.856 1.051

Mean post sp. and level + mean post sp. 1.243 0.077 3.468 2.984 4.029

Mean post sp. and alignment + alignment 0.182 0.056 1.200 1.075 1.339

Mean post sp. and alignment + mean post sp. 0.415 0.143 1.514 1.143 2.005

odds of a severe crash on an unpaved road when the vehicle exceeds the mean speed are 1.95 times higher than those of a crash on an unpaved road when the vehicle travels below the mean speed.

4.6. WRIR system

The results for the final model obtained for the WRIR system are shown in Table 7. The model for this system involves the fewest crashes. Of these roughly 1200 crashes, just over 150 are severe. This model includes one interaction term. The likelihood ratio test (Chi-square = 5.32, df = 1, p-value = 0.0211) and AIC indicate that the interaction term should be included in the model whereas BIC does not favor the model with the interaction term. The model for the WRIR system has excellent predictive ability with an area under the ROC curve of 0.8545. The Hosmer-Lemeshow test also does not provide evidence against the assumption of adequate model fit as the p-value is 0.7401. Table 9 shows the estimates and confidence intervals for this model.

Animal, impairment, motorcycles, and seat belt use are predominant predictors of crash severity. The estimates of these effects are similar in magnitude to those in the model for the county system, with the exception of motorcycles. The estimated odds ratios for these effects are 0.13 for animal, 3.21 for impairment, 4.18 for motorcycles, and 4.83 for seat belt use. This is the only model, other than the global removed system model, to identify state as a significant factor. The estimated odds of a severe crash for an out-of-state vehicle are 0.46 times those for an in-state vehicle. Driver age is also an important variable in this model and interacts with road condition. Assuming dry road conditions, the estimated odds of a severe crash for an elderly driver are 1.43 times those for a driver under 65 years. If the road conditions are wet, the estimated odds of a severe crash for an elderly driver are 9.37 times those for a driver under 65 years. However, the confidence intervals involving elderly drivers are quite wide (0.65 to 3.13 for dry conditions and 2.55 to 34.37 for wet conditions).
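Each odds ratio in Table 9 is the exponential of a coefficient on the log-odds scale, and the reported limits are consistent with a Wald interval of the form exp(estimate ± 1.96 × S.E.). The sketch below reproduces the WRIR values for elderly drivers under that assumption. The 1.96 multiplier and the helper name are assumptions, and the interaction coefficient is inferred here as the difference between the two reported estimates rather than taken from the paper.

```python
import math

Z_95 = 1.96  # ASSUMPTION: normal quantile for a 95% Wald interval

def odds_ratio_ci(est, se, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) for a log-odds estimate."""
    return (math.exp(est), math.exp(est - z * se), math.exp(est + z * se))

# WRIR estimates and standard errors as reported in Table 9 (log-odds scale).
age65_dry = (0.354, 0.401)  # age >65 vs. younger driver, dry road
age65_wet = (2.237, 0.663)  # age >65 vs. younger driver, wet road

or_dry, lo_dry, hi_dry = odds_ratio_ci(*age65_dry)
or_wet, lo_wet, hi_wet = odds_ratio_ci(*age65_wet)

# Inferred interaction coefficient: the change in the age effect on a wet road.
interaction = age65_wet[0] - age65_dry[0]

print(f"dry: OR = {or_dry:.3f}, 95% CI = ({lo_dry:.3f}, {hi_dry:.3f})")
print(f"wet: OR = {or_wet:.3f}, 95% CI = ({lo_wet:.3f}, {hi_wet:.3f})")
print(f"inferred interaction coefficient = {interaction:.3f}")
```

The computed limits agree with the tabulated ones up to rounding of the published estimates, and the width of the wet-road interval shows why the elderly-driver effect should be interpreted cautiously.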

4.7. Summary of results

A review of each Wyoming highway system identifies the predictors linked to crash severity on that system. Identifying these predictors allows for a targeted approach to highway safety. On the other hand, predictors common to all of the systems point to a possible statewide strategy to promote highway safety.

In all four highway systems, five main effects are consistently identified: animal, impairment, motorcycle, mean speed, and safety equipment use. A crash

Table 9 - Estimates (Est.), standard errors (S.E.), odds ratio estimates (Odds), and 95% confidence interval estimates (2.5% and 97.5%) for statistically significant variables in the logistic regression models of the county and WRIR systems.

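| Variable | County Est. | County S.E. | County Odds | County 2.5% | County 97.5% | WRIR Est. | WRIR S.E. | WRIR Odds | WRIR 2.5% | WRIR 97.5% |
|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | -2.224 | 0.172 | 0.108 | 0.077 | 0.152 | -2.693 | 0.217 | 0.068 | 0.044 | 0.103 |
| Animal | -2.064 | 0.326 | 0.127 | 0.067 | 0.241 | -2.011 | 0.388 | 0.134 | 0.063 | 0.286 |
| Rollover | | | | | | | | | | |
| Guardrail | | | | | | | | | | |
| FO | -0.615 | 0.158 | 0.541 | 0.397 | 0.736 | | | | | |
| Vehicle | | | | | | | | | | |
| FHE location | | | | | | | | | | |
| Impaired | 1.180 | 0.149 | 3.255 | 2.432 | 4.356 | 1.165 | 0.228 | 3.206 | 2.052 | 5.010 |
| Road condition | -0.817 | 0.156 | 0.442 | 0.325 | 0.600 | -0.838 | 0.319 | 0.433 | 0.231 | 0.809 |
| Mean posted speed | 0.571 | 0.121 | 1.770 | 1.396 | 2.245 | | | | | |
| Surface | -0.755 | 0.197 | 0.470 | 0.319 | 0.692 | | | | | |
| Level | -0.372 | 0.116 | 0.690 | 0.549 | 0.865 | | | | | |
| Alignment | | | | | | | | | | |
| Motorcycle (MC) | 1.841 | 0.208 | 6.302 | 4.193 | 9.470 | 1.430 | 0.420 | 4.180 | 1.837 | 9.515 |
| Mean speed | -0.014 | 0.158 | 0.986 | 0.723 | 1.345 | 0.642 | 0.202 | 1.900 | 1.280 | 2.821 |
| Maneuver | | | | | | | | | | |
| Age <25 years | -0.322 | 0.116 | 0.724 | 0.577 | 0.910 | | | | | |
| Age >65 years | | | | | | 0.354 | 0.401 | 1.425 | 0.649 | 3.128 |
| Gender | | | | | | | | | | |
| Safety equipment | 1.454 | 0.118 | 4.279 | 3.398 | 5.389 | 1.574 | 0.200 | 4.827 | 3.262 | 7.143 |
| Surface and mean speed - mean speed | 0.670 | 0.192 | 1.954 | 1.342 | 2.844 | | | | | |
| Surface and mean speed - surface | -0.072 | 0.151 | 0.931 | 0.693 | 1.250 | | | | | |
| Road cond. and age >65 years - age >65 years | | | | | | 2.237 | 0.663 | 9.365 | 2.552 | 34.369 |
| Road cond. and age >65 years - road cond. | | | | | | 1.045 | 0.709 | 2.844 | 0.708 | 11.419 |

Note: Two further WRIR entries (Est., S.E., Odds, 2.5%, 97.5%) appear in the source without a recoverable row label: -0.707, 0.252, 0.493, 0.301, 0.808; and -0.775, 0.345, 0.461, 0.234, 0.905. The odds ratio of 0.461 matches the state effect discussed in Section 4.6.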

involving an animal is less likely to be severe than a crash involving some other type of first harmful event. Twenty-two percent of all crashes over the ten-year period involve animals, and only 1.17% of all crashes are severe crashes involving an animal. These crashes tend to be less severe and are typically property-damage-only crashes. Impairment has consistently high estimated odds ratios across the systems, ranging from 2.2 to 3.3 without interactions. The effect of impairment for the WRIR system is only slightly lower than that for the county system, but both are larger than those observed for the interstate system and the state system. There are also indications that the effects of impairment on crash severity could increase dramatically with distracted driving. The effect of motorcycles on the probability of a severe crash can be quite large, with estimated odds ratios of 6.30 and 4.18 for the county system and the WRIR system, respectively. The effect of motorcycles on crash severity increases with mean speed on the interstate system and the state system. The failure to use seat belts also increases the estimated odds of a severe crash, with estimated odds ratios ranging from 3.73 for the state system to 5.50 for the interstate system. In particular, this estimated odds ratio is second highest (4.83) for the WRIR system.

Some of the major differences among the models are due to the interaction terms. On the interstate system, mean speed interacts with motorcycles, fixed objects, and rollovers. For the state system, the mean posted speed interacts with the geometric effects of level and alignment. The county system is the only system that includes the main effect of roadway surface. The WRIR system includes a unique interaction between age older than 65 years and road condition. The WRIR is the only system that includes state as an effect, and the results indicate that drivers from Wyoming are more likely to be in a severe crash.

5. Conclusions and recommendations

Rural roadway systems differ from urban systems because of the lower population densities, longer travel times, more extreme terrain, and other nonurban conditions. These factors along with behavioral factors need to be considered when developing a highway safety program for rural systems. This study examines crash severity in Wyoming based upon the objective set forth in the Wyoming Strategic Highway Safety Plan to reduce critical crashes. Since crash severity is defined as a dichotomous variable (whether a crash was severe or not), multiple logistic regression is used. A careful methodology is followed for model development to identify important predictors of crash severity on rural highway systems and on Indian reservation roads in Wyoming. A global system model is first developed for the entire state and then refined for each of four rural highway systems: the state highway system, interstate system, county rural local roads, and the roadway system on the Wind River Indian Reservation. The key findings are as follows.

(1) The main predictors of crash severity for all models include animal, impairment, motorcycle, mean speed, and safety equipment use.

(2) The estimated odds ratios for impairment range from 2.2 on the interstate to 3.3 on the county system.

(3) Motorcycle crashes have high estimated odds ratios: 6.3 on county roads, followed by 4.2 on the WRIR. On the interstate, the estimated odds of a crash being severe are 26.6 times higher when the crash involves a motorcycle traveling above the mean speed than when a motorcycle is not involved and the vehicle is traveling above the mean speed.

(4) When safety equipment was not used, the estimated odds ratios for a crash being severe range from 3.7 on the state system to 5.5 on the interstate.

(5) Animal crashes are predicted to be less severe with estimated odds ratios ranging from 0.13 to 0.31.

(6) The estimated odds ratios on the reservation are of similar magnitude compared with other systems, especially the county roads.

(7) Distracted driving and traffic data have important associations with crash severity. However, few crash reports have information on these factors.

(8) Based on the estimated odds ratios, a crash is 1.27 times more likely to be severe if the driver is distracted than if the driver is not, and 10.7 times more likely to be severe for an impaired, distracted driver than for a non-impaired, distracted driver.
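An odds ratio is not a ratio of probabilities, so phrases such as "times more likely" in the findings above refer to odds. As a rough illustration, the sketch below converts an odds ratio into a probability for an assumed baseline. The 12.5% baseline is a hypothetical value chosen for illustration, loosely based on the roughly 150 severe crashes among roughly 1200 crashes on the WRIR system; it is not the model's reference-level probability, and the function name is illustrative.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability obtained after multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

p0 = 0.125        # ASSUMPTION: hypothetical baseline probability of a severe crash
or_safety = 4.83  # WRIR odds ratio for safety equipment not used, from Table 9

p1 = apply_odds_ratio(p0, or_safety)
risk_ratio = p1 / p0

print(f"probability with the factor present: {p1:.3f}")
print(f"ratio of probabilities: {risk_ratio:.2f} (odds ratio: {or_safety})")
```

With this assumed baseline, the ratio of probabilities is noticeably smaller than the odds ratio, and the gap widens as the baseline probability grows.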

Rural and Indian communities alike recognize that impairment, seat belt use, and speeding are major contributors to critical crashes on their roadways. The results of these models provide values that quantify these concerns. By targeting strategies that address these factors, WYDOT and other transportation departments across the country can apply countermeasures effectively to achieve roadway safety goals that align with the findings of this study. The following recommendations are provided to achieve these goals:

(1) Because distracted driving has a significant impact on predicted crash severity, priority should be given to increasing the reporting of distracted driving. Crash investigators need to provide accurate and complete information in the crash record.

(2) More emphasis should be placed on collecting traffic data such as ADT, ADTT, VMT, and TVMT on county and WRIR roads as well as improving collection on the state system.

(3) Due to the similarities of the odds ratios on the county and WRIR roads, the safety of these two classes of rural highways can be enhanced by applying similar safety countermeasures.

(4) To address the behavioral issues associated with impairment, safety equipment use, distracted driving, and motorcycles, enforcement and education are practical countermeasures to reduce critical crashes.

(5) High speeds associated with rural systems can be addressed through better speed limits and roadway geometry, along with the behavioral countermeasures of enforcement and education.

Many of these recommendations could be incorporated in the state DOT design considerations and development of policies. The methodology developed in this research for the rural roadway systems in Wyoming to predict crash severity could be used for other rural communities worldwide to address roadway safety concerns and to reduce the severity of crashes.

Acknowledgments

The WYT2/LTAP center at the University of Wyoming provided extensive resources to assist in the compilation of the data sets used and the development of the models. WYDOT was extremely helpful and responsive to provide the needed bulk crash data, inventories and traffic data. Special acknowledgement goes to WYDOT and FHWA who provided the resources to make this research possible.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jtte.2015.12.002.

REFERENCES

Andreen, B., Ksaibati, K., 2012. Comparing Crash Trends and Severity in the Northern Rocky Mountain Region. FHWA-WY-12/03F. University of Wyoming, Laramie.

Atkinson, J., Chandler, B., Betkey, V., et al., 2014. Manual for Selecting Safety Improvements on High Risk Rural Roads. FHWA-SA-14-075. Federal Highway Administration (FHWA), Washington DC.

Bham, G., Javvadi, B., Manepalli, U., 2012. Multinomial logistic regression model for single-vehicle and multi-vehicle collisions on urban highways in Arkansas. Journal of Transportation Engineering 138 (6), 786-797.

Centers for Disease Control and Prevention. CDC - research update: motor vehicle crashes among young drivers - teen drivers — motor vehicle. Available at: http://www.cdc.gov/ MotorvehicleSafety/teen_drivers/GDL/youngdrivers.html (accessed 02.07.15.).

Centers for Disease Control and Prevention, 2012. Injuries among American Indians/Alaskan Native (AI/AN): fact sheet. Available at: http://www.cdc.gov/Motorvehiclesafety/native/ factsheet.html (accessed 02.07.15.).

Classen, S., Awadzi, K., Mkanta, W., 2008. Person-vehicle-environment interactions predicting crash-related injury among older drivers. American Journal of Occupational Therapy 62 (5), 580-587.

FHWA, 2012. Strategic Highway safety plan. Available at: http:// safety.fhwa.dot.gov/hsip/shsp/ (accessed 02.07.15.).

Golembiewski, G., Chandler, B., 2011. Roadway Safety Information Analysis: A Manual for Local Rural Road Owners. FHWA-SA-11-10. FHWA, Washington DC.

Herbel, S., Kleiner, B., 2010. National Tribal Transportation Safety Summit Report. FHWA-FLH-10-007. FHWA, Washington DC.

Hosmer, D., Lemeshow, S., Sturdivant, R., 2013. Applied Logistic Regression, third ed. John Wiley and Sons, Inc., Hoboken.

Ksaibati, K., Evans, B., 2009. WRRSP: Wyoming Rural Road Safety Program. FHWA-WY-09/06F. FHWA, Washington DC.

Kutner, M., Nachtsheim, C., Neter, J., 2004. Applied Linear Regression Models, fourth ed. McGraw-Hill, New York.

Mooradian, J., Ivan, J., Ravishanker, N., et al., 2012. Temporal modeling of highway crash severity for seniors and other involved persons. Transportation Research Record 3582, 1-17.

National Cooperative Highway Research Program (NCHRP), 2007. NCHRP Synthesis 366, Tribal Transportation Programs, a Synthesis of Highway Practice. NCHRP, Washington DC.

National Safety Council, 1970. Manual on Classification of Motor Vehicle Traffic Accidents, third ed. National Safety Council, Chicago.

Niessner, C.W., 2010. Highway Safety Manual. American Association of State Highway and Transportation Officials, Washington DC.

Pei, Y., Fu, C., 2014. Investigating crash injury severity at unsignalized intersections in Heilongjiang Province, China. Journal of Traffic and Transportation Engineering (English Edition) 1 (4), 272-279.

Quddus, M.A., Noland, R.B., Chin, H.C., 2002. An analysis of motorcycle injury and vehicle damage severity using ordered probit models. Journal of Safety Research 33 (4), 445-462.

SAS Institute Inc, 2008. SAS/STAT® 9.2 User's Guide. SAS Institute, Inc., Cary.

Savolainen, P., Mannering, F., Lord, D., et al., 2011. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accident Analysis & Prevention 43 (5), 1666-1676.

Shinstine, D., Ksaibati, K., 2013. Indian reservation safety improvement program: a methodology and case study. Transportation Research Record 2364, 80-89.

Shinstine, D., Ksaibati, K., Gross, F., 2015. Strategic safety management plan for Wind River Indian reservations. Transportation Research Record 2472, 75-82.

TRIP, 2015. Rural connections: challenges and opportunities in America's Heartland. Available at: https://www.tripnet.org (accessed 02.07.15.).

Uhm, T., Chitturi, M., Bill, A., 2012. Comparing statistical methods for analyzing crash frequencies. Transportation Research Record 4472, 1-14.

Ward, N.J., Linkenbach, J., Keller, S.N., et al., 2010. Toward Zero Deaths: A National Strategy on Highway Safety. Western Transportation Institute, Bozeman.

Weiss, A.A., 1992. The effects of helmet use on the severity of head injuries in motorcycle accidents. Journal of the American Statistical Association 87 (417), 48-56.

WYDOT, 2013. Traffic volume and vehicle miles book. Available at: https://www.dot.state.wy.us/home/planning_projects/ Traffic_Data.default.html (accessed 02.07.15.). Wyoming Department of Transportation (WYDOT), 2009. Critical Analysis Reporting Environment (CARE). WYDOT, Cheyenne. Wyoming Highway Safety Management System Committee, 2012. Wyoming Strategic Highway Safety Plan. WYDOT, Cheyenne.

Debbie S. Shinstine, PhD, P.E., received her BS degree in Civil Engineering from the University of Wyoming and her MS from the University of Arizona. She received her PhD from the University of Wyoming. She is currently an adjunct professor and research engineer at the University of Wyoming.

Shaun S. Wulff, PhD, is an associate professor at the University of Wyoming. He received his MS in Statistics from Montana State University and his PhD in Statistics from Oregon State University. His research interests include linear models, mixed models, mathematical statistics, and statistical applications to engineering.

Khaled Ksaibati, PhD, P.E., received his BS degree from Wayne State University and his MS and PhD from Purdue University. Dr. Ksaibati worked for a couple of years for the Indiana Department of Transportation prior to coming to the University of Wyoming in 1990. He was promoted to associate professor in 1997 and to full professor in 2002. Dr. Ksaibati has been the director of the Wyoming Technology Transfer Center since 2003.