CC BY-NC-ND
Academic journal
IATSS Research

EMPLOYMENT OF ATMS TRAFFIC CONTROL DEVICE DATA TO ASSIST IN IDENTIFICATION OF CRASH-PRONE INTERSECTIONS

Kevin P. HWANG

Associate Professor, Department of Transportation and Communication Management Science, National Cheng-Kung University, Tainan, Taiwan

(Received January 30, 2008)

This paper employs information from the advanced traffic management system (ATMS) of Kaohsiung, Taiwan to help identify crash-prone intersections by discriminant analysis. From the 25,604 crash records of 2005, the 1,977 crashes that occurred at 119 intersections with traffic exposure data were compiled to calibrate and validate the model. Three types of data are used as predicting variables: the road attributes in the crash records, traffic control devices, and movement exposure. The correct ratios range from 78.33% for model calibration to 67.80% for validation. If traffic movements are removed, the correct ratios drop slightly, to 76.67% and 66.10%. The findings indicate that including or excluding exposure data does not make a significant difference when identifying high crash-prone intersections in an urban environment. In addition, layout and traffic control devices could explain about 66.10% to 78.33% of the likelihood that an intersection becomes a high crash intersection. This suggests that the developed approach could offset budget constraints and difficulties in continuing exposure data collection, and that ATMS information could help identify crash-prone urban intersections.

Key Words: ATMS, Discriminant analysis, Crash-prone intersections

1. INTRODUCTION

Much highway safety research focuses on the evaluation of specific improvements 1-4, such as where and how to implement measures 2,4,5 and what the effects are 3,4. Crash data from both before and after implementation are collected and studied to evaluate the effectiveness of the measures and procedures 5,6. Some safety consultations, especially those related to insurance policy or legal matters, focus on the appraisal, arbitration, or analysis of a single accident or site and conduct liability or cause analysis 7. Research has placed less emphasis on identifying crash-prone sites from the possible population. It appears that most safety professionals consider identification a simple routine of statistical analysis.

1.1 Motivation

In many developing countries, vehicle crashes are not properly documented, and traffic surveys that collate data are not regularly administered. Therefore, methodologies employed by developed countries cannot be directly adopted 8,9.

Identification of High Crash Locations (HCL) is normally the first step to enhance road safety before a cause analysis is performed. Much international research has discussed crash causes and used relevant data to predict crash frequencies 10,11. Crash records, roadway geometry, and traffic volume as exposure data are the basic inputs for highway safety research. Since the development of the Advanced Traffic Management System (ATMS), much data has been made available from the system 12-14. Many developing countries are also developing ATMS to achieve better control of traffic 15,16. It is therefore the intention of this research to discover whether data on roadway geometry and relevant control devices can be acquired through ATMS for safety analysis, achieving additional benefits aside from traffic control.

1.2 Study scope

In Taiwan, almost every city with a population of more than 300,000 has developed its own ATMS to control its signals on-line. The city of Kaohsiung, with 1.5 million residents, has configured its ATMS 12 to also provide a layout of roadway geometry and information on relevant traffic control devices, such as markings that show the distribution of traffic lanes and the locations of signals and sign installations. They are updated frequently and displayed through the ATMS, as shown in Figure 1, which depicts a reshaped roundabout at the Chung-Chan and Chung-Sung intersection. The city also maintains a vehicle crash database as well as a biannual traffic survey. Thus, this study chose Kaohsiung as the test city and employed those data to evaluate whether ATMS data can help identify high crash-prone locations.

Fig. 1 An illustration of ATMS at Chung-Chan and Chung-Sung intersection 12

2. LITERATURE REVIEW AND METHODOLOGY

2.1 Methodologies to identify crash-prone locations

Though much research has discussed crash causes, improvement performance, and effectiveness of specific product or working procedures, there is still little research that has focused on the identification of crash-prone locations. The available research and conclusions are summarized in Table 1.

It is known from the above literature review that almost all HCL (high crash location) identification or refinement processes require the use of exposure data and crash records. From these, the crash rate or severity rate were used to identify the HCLs. This is a sound theoretical procedure. However, it is also known in some Asian countries, such as Thailand 9 or Korea 8, other methods have been developed to substitute for such a statistical procedure or deal with the uncertainty of crashes.

In Taiwan, crash records have been well documented over the past 15 years to identify HCLs for improvement. However, with the restructuring of provincial governments in 1990, the routine traffic survey was no longer administered for lack of budget, except in metropolitan Taipei and Kaohsiung. Without exposure data, the crash rate cannot be calculated, and identification of HCLs has to rely only on crash numbers, which may cause a bias that underestimates the importance of low volume locations. It is therefore both interesting and important to know whether crash-prone locations can be identified without exposure data, and how accurate such identification might be.

Table 1 Research on identification of high crash or crash-prone locations

Wang, Y.W., Wen, M.J., & Ting, K.L. (1994) 17

Purpose: Identify rural crash-prone bridges and pertinent causes.

Result and findings: Developed a crash-prone bridge discriminant function and identified slope discontinuity, percentage of heavy vehicles, curvature of the approach road, travel speed, and reduction of shoulder width as objective key factors. The discriminant correct rate is 71.88%.

Stamatiadis, N., Jones, S., & Hall, L. (1999) 18

Purpose: Low-volume roads comprise a significant portion of the rural roadway network. Because of documented higher crash frequencies and more severe injuries on such roads, it is necessary to further examine causal factors of these crashes and to determine if crash characteristics follow the patterns of other highways.

Result and findings: The results showed that (a) low-volume roads present similar crash trends as those observed on other roads; (b) drivers under the age of 25 and drivers over the age of 65 have higher crash propensities than middle-aged drivers; (c) female drivers are safer on average than male drivers; (d) young drivers (under the age of 25) experience more single-vehicle crashes and drivers over 65 are more likely to be involved in two-vehicle crashes; (e) drivers of older vehicles have higher two-vehicle crash propensities on low-volume roads than drivers of newer vehicles; (f) in single-vehicle crashes, drivers of older vehicles are more likely to have a serious injury than drivers of new vehicles; and (g) large trucks have the highest two-vehicle crash propensity on low-volume roads, followed by sedans, pick-up trucks, vans, and station wagons.

Kim, K. & Yamashita, E. (2002) 19

Purpose: Various types of land uses tend to generate and attract different types of trips, and trip-making behavior affects the nature and volume of traffic. As the use of land intensifies, it does not seem unreasonable to expect that the potential exposure to crashes would also increase.

Result and findings: Yet upon closer inspection, it was evident that crashes were more a function of the characteristics of drivers and travelers than the underlying uses of land. Using comprehensive police crash data linked to a land use database, the relationships between land use and automobile crashes in Hawaii were investigated. Recent developments in geographic information system technologies and the availability of spatial databases provide a rich source of information with which to investigate the relationships between crashes and the environments in which they occur.

Bernhardt, K.L.S., & Virkler, M.R. (2002) 20

Purpose: Briefly described the HCL (high crash location) analysis process, examined some common errors, and provided additional background on five important concepts for understanding and analyzing HCLs.

Result and findings: 1. Employment of EPDO (equivalent of property damage only) would significantly affect the result of analysis. Weight selection requires special attention. 2. Do not apply a normal distribution; instead employ a Poisson distribution when crashes are fewer than 9. 3. Proper explanation of significance. 4. Crashes occur over time but regression is applied toward the means. 5. When improvements do not impact one another, B/C (benefit/cost) analysis can be useful for ranking independent alternatives; otherwise use incremental B/C analysis.

Espino, E.R., Gonzalez, J.S. & Gan, A. (2003) 21

Purpose: Developed a model to identify high pedestrian crash sites.

Result and findings: Using the Poisson model with 1 mile as a measuring unit, a model with 0.1 significance can be obtained. It appears that a divided 4-lane road tends to have a higher crash frequency of 4 per year as a threshold.

Hwang, K.P., Tsai, M.Y., & Ou, T.C. (2005) 22

Purpose: Discussed a photo-logging survey methodology to identify urban crash-prone intersections.

Result and findings: Found 32 variables and used 10 principal components to explain over 81.5% of variance. If traffic variables were deleted, 7 principal components were identified to explain 79.4% of variance. Image information could help develop better discriminant functions.

Ivan, J.N. (2005) 23

Purpose: Attempted to develop a methodology to overcome the deficiency of nonlinearity between crashes and volumes.

Result and findings: The crash rate calculated was different from the average daily crash rate. Proper dealing with crash rates is important.

Pei, Y.L., & Dai, T.Y. (2005) 24

Purpose: Discussed ways to cope with identification of high crash locations when significant traffic variation exists.

Result and findings: Used fatality per 100 million vehicle kilometers, per 10 thousand vehicles; crash number and severity index per 10 thousand vehicle equivalent to develop a bi-level fuzzy evaluation model.

Son, B., Park, M., & Lee, S. (2005) 8

Purpose: Tried to improve the current hazardous road identification method of Korea.

Result and findings: Applied 1. crash severity, 2. road geometry, 3. traffic characteristics, and 4. local government opinions in order to develop an AHP evaluation approach.

Pei, Y.L., & Ding, J.M. (2005) 25

Purpose: Identified crash-prone locations and found their attributes.

Result and findings: The attributes included 1. curve radius < 1,000 meter, 2. grade = 3%, and 3. insufficient sight distance and frequent rear-end crashes.

Fukuda, T., Tangpaisalkit, C., Ishizaka, T., & Sinlapabutra, T. (2005) 9

Purpose: Developed an alternative method to identify potential black spots when statistical crash records were not available.

Result and findings: Employed the descriptions of eyewitnesses to identify potential black spots, then clustered those descriptions into specific sites. Concluded that the Hiyari-Hatto theory of traffic psychology could be combined with public participation as an alternative method.

Pei, Y.L. (2006) 26

Purpose: Improved the rate quality control identification method.

Result and findings: Using the Gamma distribution as a crash distribution could render better identification results.

Cafiso, S., La Cava, G., & Montella, A. (2007) 27

Purpose: Tested if the inconsistency between curves could be used to identify high crash locations with further refinement.

Result and findings: Proved the inclusion of a safety index with the inconsistency of curves could better explain the cause of crashes.

Xiao, Q., Ghazan, K., & Noyce, D.A. (2007) 28

Purpose: Devised a unique and innovative approach to identifying and prioritizing snow-prone locations in specific corridors or road segments for the implementation of road weather safety audits.

Result and findings: A grid-based structure for the entire state was used with snow-related crashes aggregated for each grid. A normalized "relative crash rate" was also calculated for each grid section. 20 and 98 grids of the over 6,000 total grids in Wisconsin consistently appeared as high snow crash locations for three years or two out of three years, respectively.

2.2 Methodology employed

After the literature review and data collection, this study first used cluster analysis to group the included locations by their crash records. Geometry, traffic control devices, and exposure data were collected for those sites. Thereafter, principal component analysis was used to extract the variables that explained a high share of the variance. Factor analysis was used to group the variables, name the groups, and match them with the categories of data used by traditional methods to identify HCLs. Discriminant analysis was then used to calibrate a discriminant function that differentiates high crash intersections from average intersections.

The correct (hit) rate was used as an index to evaluate the discriminant functions. The Spearman rank correlation test was also used to test the accuracy of the discriminant function.

3. DATA COLLECTION AND ATTRIBUTE ANALYSIS

3.1 Crash records

A total of 25,604 crash records were obtained from Kaohsiung city for the year 2005. Of these, 11,046 crashes (43.1%) occurred at mid-block road sections. The remaining 14,558 (56.9%) were intersection-related crashes. Only 1,977 intersection-related crashes (13.6%) occurred at the 119 intersections with available traffic counts. The other 12,581 crashes (86.4%) occurred at intersections without exposure data. Therefore, only 7.7% of the total crash population was usable for the analysis, which is a constraint of this study. Figure 2 displays the crash distribution at those 119 intersections.

Fig. 2 Distribution of crashes at 119 Kaohsiung intersections (axes: intersection number; intersection crash statistics)
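The shares quoted in this subsection can be rechecked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable and function names are illustrative choices, not part of the study.

```python
# Crash counts for Kaohsiung, 2005, as quoted in the text.
TOTAL = 25_604
MID_BLOCK = 11_046
INTERSECTION = 14_558
WITH_EXPOSURE = 1_977  # intersection crashes at the 119 sites with traffic counts


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Intersection crashes at sites that have no traffic counts.
without_exposure = INTERSECTION - WITH_EXPOSURE

print("mid-block share of all crashes:", share(MID_BLOCK, TOTAL))
print("intersection share of all crashes:", share(INTERSECTION, TOTAL))
print("intersection crashes with exposure data:", share(WITH_EXPOSURE, INTERSECTION))
print("intersection crashes without exposure data:", share(without_exposure, INTERSECTION))
print("usable share of all crashes:", share(WITH_EXPOSURE, TOTAL))
```

The last figure is the 7.7% constraint noted above: the model is calibrated on a small fraction of all recorded crashes.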

From the 1,977 crash records at these 119 intersections, we applied a two-stage cluster analysis, first with the Ward method and then with the K-means method, to obtain two crash groups. The high crash group had 36 intersections with an average of 34.8 crashes, and the remaining 83 intersections were clustered as an average crash group with an average of 8.7 crashes. We systematically divided both groups into two subgroups: one with 60 intersections for modeling and the other with 59 intersections for validation. With the assistance of GIS, the spatial distribution of the 119 intersections is illustrated in Figure 3. To keep the illustration simple, only major streets are shown. The spatial distribution also shows that many high crash intersections fall on the two north-south corridors.
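The two-stage clustering can be sketched in plain Python for one-dimensional data such as crash counts. This is a minimal illustration, assuming that the Ward stage is used only to supply starting centres for the K-means stage; the paper does not state how the stages were linked, and the crash counts below are hypothetical, not the Kaohsiung records.

```python
def mean(values):
    return sum(values) / len(values)


def ward_two_groups(values):
    """Agglomerative clustering with Ward's criterion, stopped at two clusters."""
    clusters = [[v] for v in values]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * (mean(a) - mean(b)) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans_1d(values, centres, max_iter=100):
    """Refine the given starting centres with plain k-means on 1-D data."""
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda k: abs(v - centres[k]))
            groups[nearest].append(v)
        updated = [mean(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:
            break
        centres = updated
    return centres, groups


# Hypothetical crash counts per intersection (illustration only).
crashes = [3, 5, 7, 8, 9, 10, 12, 14, 28, 31, 36, 44]

# Stage 1: Ward clustering gives two starting centres (sorted low to high).
seeds = sorted(mean(c) for c in ward_two_groups(crashes))
# Stage 2: k-means refines the centres and assigns each intersection.
centres, (average_group, high_group) = kmeans_1d(crashes, seeds)

print("group means:", centres)
print("high crash group:", high_group)
```

With well-separated counts the K-means stage leaves the Ward partition unchanged; it matters mainly when the hierarchical stage places borderline intersections in the wrong group.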

3.2 Traffic counts

Exposure data from the 2005 Kaohsiung Traffic Survey and Characteristic Analysis 29 were compiled for the 119 intersections. They included data on vehicle type, through and turning movements, etc. Kaohsiung city has a very high percentage of motorcycles and container trucks; the first is a common phenomenon in Taiwan, and the second is a consequence of Kaohsiung harbor, which is illustrated in the lower part of Figure 3. After compilation, 6 kinds of traffic counts were used as input exposure variables: daily left-turn vehicles (LT), daily right-turn vehicles (RT), daily heavy vehicles (DHV), daily light vehicles (DLV), daily motorcycles (DM), and daily container trucks (DCT). In addition, vehicle counts were used directly, without conversion to PCU (passenger-car units), because traffic conflicts are counted by vehicle numbers rather than PCU.

Fig. 3 Spatial distribution of high crash vs. normal intersections

3.3 Geometry and facility data

From the crash records, the basic geometry of an intersection was obtained, such as whether it is a regular cross intersection or an oblique or Y-shaped intersection, along with pavement deficiency, obstruction, etc. In addition, much more detailed layouts of the 119 intersections were obtained from the Kaohsiung ATMS website 12. Figure 4 illustrates the enlarged layout of the lower left corner of Figure 1. It shows that left turns are forbidden in the northbound direction, while the eastbound direction has an exclusive left-turn lane. In addition, two double-faced signals can be seen in the illustration. There are also square markings indicating that motorcycles are required to make a two-stage left turn to avoid conflict with through traffic. Therefore, traffic control devices including signals, signs, and channelization, such as the content of signs and divided medians, can all be acquired from the ATMS database. The basic geometry from the crash records thus becomes redundant, because it is available in much greater detail from the ATMS.

There were 20 variables in this category used as input data. Those related to road geometry were road class (RC), speed limit (SL), road type (RY), intersection type (IT), pavement condition (PC), pavement deficiency (PD), obstruction (O), and quality of sight distance (SD). Those related to signal control were the signal category (SC), whether the signal functions properly (SF), and the number of signal installations (NSI). Those related to intersection approach distribution were total fast lanes (FL), total fast and slow lanes (F&SL), total lanes including curb lanes for parking (TL), total left-turn lanes (TLT), and total right-turn lanes (TRT). Those related to signs were the number of speed limit signs (NSL) and the number of other regulatory and warning signs (NRW). In addition, the class of channelization (CC) and roadside delineations (RD) were also considered.

3.4 Others

Besides those exposure, geometric, and traffic control data, the weather (W) and lighting (L) of each crash occurrence were also entered as inputs.
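For bookkeeping, the input variables named in Sections 3.2 to 3.4 can be collected in one structure. The sketch below is only an inventory: the abbreviations come from the text, while the group labels are illustrative names chosen here.

```python
# Input variables by data source; abbreviations as defined in Sections 3.2-3.4.
# The dictionary keys are hypothetical labels, not terms used by the study.
VARIABLES = {
    "exposure": ["LT", "RT", "DHV", "DLV", "DM", "DCT"],
    "road geometry": ["RC", "SL", "RY", "IT", "PC", "PD", "O", "SD"],
    "signal control": ["SC", "SF", "NSI"],
    "approach lanes": ["FL", "F&SL", "TL", "TLT", "TRT"],
    "signs": ["NSL", "NRW"],
    "channelization and delineation": ["CC", "RD"],
    "crash circumstances": ["W", "L"],
}

# Groups that make up the geometry and facility category of Section 3.3.
GEOMETRY_AND_FACILITY = [
    "road geometry",
    "signal control",
    "approach lanes",
    "signs",
    "channelization and delineation",
]

n_geometry = sum(len(VARIABLES[g]) for g in GEOMETRY_AND_FACILITY)
n_total = sum(len(names) for names in VARIABLES.values())

print("geometry and facility variables:", n_geometry)
print("all input variables:", n_total)
```

The counts agree with the 20 geometry and facility variables stated in Section 3.3 and the 28 variables analyzed in Section 4.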

Fig. 4 Detailed illustrations of Chung-Chan and Chung-Sung intersection

4. DATA ANALYSIS

In order to know the relative importance of the 28 variables before discriminant analysis, principal component analysis was used. Thereafter, factor analysis was used to investigate the clustering characteristics of the variables.

4.1 Principal component analysis

Table 2 shows the results of the principal component analysis, including the factor loading of each variable relative to each component. At this stage, only the data of the 60 intersections used for calibrating the discriminant model were included. Ten principal components with eigenvalues greater than 1 could be identified. They explained 73.54% of the total variance of the 28 variables included in the analysis. In addition, 9 variables were identified with absolute factor loadings greater than 0.5 as major components of 7 of the factors. However, no variable was specifically correlated with the first and second factors, which suggests that these were general representations of all the variables.

It was also learned from the principal component analysis that road class (RC), pavement condition (PC), pavement deficiency (PD), obstruction (O), intersection type (IT), number of signal installations (NSI), daily heavy vehicles (DHV), and daily container trucks (DCT) were important variables in the relevant factors and may be useful in identifying crash-prone intersections.
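The screening for loadings above 0.5 can be reproduced from Table 2. In the sketch below, each variable is paired with its largest-magnitude loading and the factor it falls on, as read from Table 2; the 0.5 cut-off is applied to the absolute value because several of the large loadings are negative. The dictionary and variable names are illustrative.

```python
# variable: (factor number, loading) for the largest-magnitude loading in Table 2.
DOMINANT_LOADING = {
    "W": (4, 0.6688), "L": (9, 0.4571), "RC": (10, -0.8782), "SL": (5, 0.4226),
    "RY": (1, -0.2363), "PC": (4, 0.6541), "PD": (9, 0.5187), "O": (8, -0.5504),
    "SD": (8, -0.4652), "SC": (1, -0.3696), "SF": (1, -0.3672), "CC": (1, 0.3312),
    "FL": (1, 0.3314), "F&SL": (8, -0.3834), "RD": (9, 0.2609), "IT": (5, -0.5435),
    "TL": (2, -0.3481), "TLT": (9, -0.3454), "TRT": (7, 0.2927), "NSL": (3, -0.3179),
    "NSI": (6, 0.5165), "NRW": (7, 0.3206), "LT": (6, -0.3847), "RT": (7, -0.3343),
    "DHV": (3, 0.5395), "DLV": (8, -0.3117), "DM": (2, -0.3578), "DCT": (3, 0.5303),
}

THRESHOLD = 0.5

# Keep the variables whose dominant loading exceeds the cut-off in magnitude.
strong = {
    name: (factor, loading)
    for name, (factor, loading) in DOMINANT_LOADING.items()
    if abs(loading) > THRESHOLD
}
factors_with_strong_variable = sorted({factor for factor, _ in strong.values()})

print("variables above the cut-off:", sorted(strong))
print("factors carrying them:", factors_with_strong_variable)
```

Factors 1 and 2 do not appear in the result, which matches the observation that no single variable dominates the first two components.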

Table 2 Principal component analysis (F1 to F10 denote Factor 1 to Factor 10)

Variable        F1       F2       F3       F4       F5       F6       F7       F8       F9      F10
W           0.0039  -0.0393   0.0256   0.6688   0.0276  -0.0781  -0.0645   0.0822  -0.1031  -0.0297
L          -0.0309   0.0031  -0.0769   0.1827  -0.1802   0.0516   0.0689   0.0230   0.4571  -0.2600
RC         -0.0206  -0.0106   0.0088  -0.0107   0.0513  -0.0099   0.0659  -0.1291   0.1617  -0.8782
SL          0.0311  -0.0361   0.1266   0.0220   0.4226   0.0382   0.2363  -0.2526   0.0804  -0.0617
RY         -0.2363   0.2283   0.1992   0.0376  -0.0775  -0.0400   0.2096  -0.0760  -0.0479  -0.0017
PC         -0.0051  -0.0620   0.0257   0.6541   0.0268  -0.1043  -0.0638   0.1034  -0.1134  -0.0138
PD         -0.0240  -0.0054   0.0122   0.0865   0.0012  -0.0361   0.0871   0.0246   0.5187   0.1651
O           0.0318   0.0072  -0.0115   0.0469  -0.3262  -0.1888  -0.1515  -0.5504   0.0408   0.0943
SD          0.0036   0.0080  -0.0233   0.0755  -0.1710  -0.1280  -0.3712  -0.4652   0.2656   0.1199
SC         -0.3696   0.2865   0.0727   0.0345   0.0922   0.0477  -0.0325  -0.0521  -0.0661   0.0099
SF         -0.3672   0.3003   0.0956   0.0252   0.0702   0.0367  -0.0284  -0.0376  -0.0296   0.0050
CC          0.3312  -0.2916  -0.0996   0.0059   0.0558  -0.0106   0.1009  -0.0396  -0.0074   0.0168
FL          0.3314  -0.2966  -0.0533  -0.0050   0.0378   0.0012   0.0370  -0.0205  -0.0387  -0.0056
F&SL        0.0739  -0.0397   0.1851   0.0902   0.0503   0.3478   0.2611  -0.3834  -0.0849   0.0845
RD         -0.0553   0.0000  -0.0208  -0.1982  -0.0927  -0.1746  -0.0816   0.1677   0.2609   0.0126
IT         -0.1363  -0.1694   0.0161   0.0138  -0.5435   0.2855   0.0623   0.1663  -0.0630  -0.0681
TL         -0.2595  -0.3481   0.1517  -0.0172   0.0555   0.2069  -0.0807   0.0413   0.0806   0.0350
TLT        -0.1710  -0.2087   0.0667  -0.0754  -0.1252  -0.3223   0.1500  -0.1272  -0.3454  -0.0625
TRT        -0.1421  -0.2565   0.2389  -0.0321  -0.2091  -0.1317   0.2927  -0.0595  -0.0467  -0.0339
NSL        -0.1537  -0.0443  -0.3179   0.0392   0.2855   0.1082   0.2029  -0.0391   0.0776   0.0910
NSI         0.0290   0.0396  -0.1482  -0.0079  -0.1684   0.5165  -0.3091  -0.0832  -0.2529  -0.1540
NRW        -0.0020  -0.0382   0.2278   0.0968  -0.1919   0.2385   0.3206   0.0819   0.2797   0.2172
LT         -0.2011  -0.2508   0.1129  -0.0530   0.0010  -0.3847  -0.0666   0.1052  -0.0451  -0.0426
RT         -0.2375  -0.2279  -0.1641   0.0034   0.1974   0.0568  -0.3343   0.0653   0.1505   0.0516
DHV         0.1338  -0.0258   0.5395  -0.0460   0.1097   0.0607  -0.2575   0.0587   0.0259  -0.0448
DLV        -0.2182  -0.2939   0.0168  -0.0075   0.2130   0.1092  -0.0470  -0.3117  -0.0145   0.0211
DM         -0.3070  -0.3578  -0.0484  -0.0179   0.0199   0.1333  -0.1012   0.0717   0.0247   0.0207
DCT         0.1559  -0.0019   0.5303  -0.0363   0.1132   0.0555  -0.2708   0.0959   0.0540  -0.0256

4.2 Factor analysis

In order to confirm that the variables could be categorized according to our pre-classification and to reveal their relative importance in each group, factor analysis was executed. We employed the Varimax method of principal components analysis to execute the factor analysis. It revealed that the 28 variables could be represented by four factors, each having an eigenvalue greater than 1, with only 14 variables left. The explained variance of each factor decreased from 23.98% to 8.97%, and the total explained variance reached 67.10% (Table 3). Table 4 indicates the classification of the 14 remaining variables and the naming of the four factors.

For the first factor, road type (RY), signal category (SC), class of channelization (CC), and total fast and slow lanes (F&SL) represented geometry and control devices. Left-turn traffic (LT), right-turn traffic (RT), daily motorcycles (DM), daily light vehicles (DLV), and daily container trucks (DCT) represented the exposure data, which is the major content of factor 2. Total left-turn lanes (TLT), total right-turn lanes (TRT), and the number of signal installations (NSI) represented the intersection layout and its major control device, which is the major content of factor 3. Factor 4 included the number of speed limit signs (NSL) and the number of other regulatory and warning signs (NRW), which represent the signing of the traffic control devices. Therefore, factors 1 to 4 can be named "Intersection Geometry", "Traffic Exposure", "Intersection Layout and Signal Control", and "Sign Control".

The factor analysis showed that the 4 factors were similar to the categories of the data source, except that data from the ATMS, such as total fast and slow lanes (F&SL), could be added to factor 1 and were used to generate factors 3 and 4. The details of the intersection layout and traffic control devices showed their importance through the factor analysis, although they are seldom mentioned in most safety analyses. Nevertheless, factors 1, 3, and 4 were all related to intersection geometry, layout, and traffic control. Therefore, the classification of input data into two to four categories was appropriate.

Table 3 Group results of factor analysis (eigenvalues; extraction: principal components)

Factor  Eigenvalue  % Total variance  Cumulative eigenvalue  Cumulative %
1       3.356772    23.97695          3.356772               23.97695
2       3.081096    22.00783          6.437868               45.98477
3       1.701029    12.15021          8.138897               58.13498
4       1.255727     8.96948          9.394626               67.10446

Table 4 Loadings of factor analysis (unrotated factor loadings; extraction: principal components)

Variable                        Factor 1   Factor 2   Factor 3   Factor 4
RY                              0.791977   0.386947   0.279459  -0.034338
SC                              0.724768   0.344185   0.324457   0.083840
CC                             -0.826007  -0.282530  -0.040329   0.090678
F&SL                           -0.635710  -0.291738  -0.386683  -0.061287
TLT                            -0.042029   0.474891  -0.574131   0.265057
TRT                             0.140298   0.429069  -0.561367  -0.225255
NSL                             0.081284   0.242016  -0.360976  -0.661045
NSI                             0.389698   0.380928  -0.459093   0.203163
NRW                            -0.027042   0.022518   0.260991  -0.697570
LT                             -0.442781   0.721187   0.280148   0.021639
RT                             -0.639412   0.498973   0.351631  -0.008926
DM                             -0.076556   0.724054  -0.182784   0.149665
DLV                            -0.447534   0.729726   0.081768  -0.242735
DCT                            -0.355160   0.456424   0.265216   0.260420
Explained variance              3.356772   3.081096   1.701029   1.255727
Proportion of total variance    0.239769   0.220078   0.121502   0.089695
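The percentages in Table 3 follow directly from the eigenvalues. The sketch below assumes, as is usual for principal component extraction on standardized variables, that the total variance equals the number of retained variables (14 here); that assumption is consistent with the proportions printed in Table 4 but is not stated explicitly in the text.

```python
# Eigenvalues of the four retained factors, as listed in Table 3.
EIGENVALUES = [3.356772, 3.081096, 1.701029, 1.255727]
N_VARIABLES = 14  # assumed total variance: one unit per standardized variable


def explained_percent(eigenvalue, n_variables=N_VARIABLES):
    """Share of total variance carried by one factor, in percent."""
    return 100 * eigenvalue / n_variables


cumulative = 0.0
for number, eigenvalue in enumerate(EIGENVALUES, start=1):
    percent = explained_percent(eigenvalue)
    cumulative += percent
    print(f"Factor {number}: {percent:.2f}% (cumulative {cumulative:.2f}%)")
```

The factor 2 share of about 22% is the figure referred to later when a model based only on exposure data is tested.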

5. DISCRIMINANT ANALYSIS

After the principal component analysis and factor analysis of the collected data, this section discusses the application of discriminant analysis to calibrate a discriminant function that classifies an intersection as either a high crash intersection or an average intersection. Both the correct (hit) rate and Spearman's ρ rank correlation test were used to appraise the goodness of the model and validate the hypotheses of this study. The stepwise method was used to avoid multicollinearity among the variables.

5.1 Modeling

Including exposure data

Taking the first group of 60 intersections to execute discriminant analysis, the overall correct rate was 78.33%, with 72.22% correct rate for High Crash Intersections (HCI) and an 80.95% for average intersections. The discriminant function is as follows:

Y= -1.1598(RY)+1.8717(SC)-0.1043(CC)-0.1791(F&SL)+ 0.9607(TLT)+0.3799(TRT)-0.0249(NSL)+0.1178(NSI) + 1.1518(NRW)-0.0001(LT)-8.2114(RT)+0.00003(DM) +0.0001(DLV)-0.0002(DCT)-4.0509

If Y>0, an intersection is judged as a high crash intersection, otherwise it is an average intersection.

After calibration of the discriminant function, Spearman's ρ rank correlation test was used to check the consistency between the ranking of crash numbers and the ranking of Y for the 60 intersections. A 0.578 correlation coefficient was obtained with a 0.477 Spearman's correlation coefficient.
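Spearman's rank correlation compares two rankings rather than the raw values. A minimal sketch in plain Python is given below, computing the coefficient as the ordinary correlation of average ranks so that ties are handled; the crash counts and discriminant scores are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def correlation(x, y):
    """Ordinary product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: the correlation of the ranks of x and y."""
    return correlation(average_ranks(x), average_ranks(y))


# Hypothetical crash numbers and discriminant scores Y for five intersections.
crash_numbers = [40, 31, 12, 9, 5]
scores = [1.8, 0.4, 0.9, -0.7, -1.2]

print("rank correlation:", round(spearman_rho(crash_numbers, scores), 3))
print("raw correlation:", round(correlation(crash_numbers, scores), 3))
```

In this toy example only the second and third intersections swap places between the two rankings, so the rank correlation stays high.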

Excluding exposure data

By removing the exposure data and redoing the discriminant analysis, an overall correct rate of 76.67% was obtained, with a 72.22% correct rate for high crash intersections and 78.57% for average intersections. The discriminant function is as follows:

Y= -1.2747(RY)+1.7506(SC)-0.1105(CC)-0.4484(F&SL)+ 0.9555(TLT)+0.5004(TRT)+0.1935(NSL)+0.1568(NSI) +1.1680(NRW)-1.4362

Spearman's ρ rank correlation test was executed again. A 0.484 correlation coefficient was obtained with a 0.473 Spearman's correlation coefficient.
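Applying a calibrated function to an intersection amounts to a weighted sum followed by a sign check. The sketch below uses the coefficients of the function calibrated without exposure data and assumes the same decision rule as stated for the first function (Y > 0 means high crash). The numeric coding of categorical variables such as RY, SC, and CC is not given in the text, so the input values are purely hypothetical.

```python
# Coefficients of the discriminant function calibrated without exposure data.
COEFFICIENTS = {
    "RY": -1.2747, "SC": 1.7506, "CC": -0.1105, "F&SL": -0.4484,
    "TLT": 0.9555, "TRT": 0.5004, "NSL": 0.1935, "NSI": 0.1568,
    "NRW": 1.1680,
}
CONSTANT = -1.4362


def discriminant_score(attributes):
    """Weighted sum of the intersection attributes plus the constant term."""
    return sum(COEFFICIENTS[name] * attributes[name] for name in COEFFICIENTS) + CONSTANT


def classify(attributes):
    """Assumed rule: Y > 0 is a high crash intersection, otherwise average."""
    return "high crash" if discriminant_score(attributes) > 0 else "average"


# Hypothetical intersection; the variable coding is an assumption for illustration.
example = {
    "RY": 1, "SC": 1, "CC": 2, "F&SL": 4, "TLT": 2,
    "TRT": 1, "NSL": 0, "NSI": 4, "NRW": 1,
}

print(round(discriminant_score(example), 4), classify(example))
```

Because only layout and control device attributes enter the score, such a function can be evaluated for any intersection covered by the ATMS, including those without traffic counts.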

5.2 Validation

Including exposure data

When the second group of 59 intersections was applied to the calibrated discriminant function, the overall correct rate was 67.80%, with an 83.33% correct rate for high crash intersections and 60.98% for average intersections.

Spearman's ρ rank correlation test showed a 0.391 correlation coefficient with a 0.537 Spearman's correlation coefficient.

Excluding exposure data

When the second group of 59 intersections was applied to the calibrated discriminant function, the overall correct rate was 66.10%, with an 88.89% correct rate for high crash intersections and 56.10% for average intersections.

Spearman's ρ rank correlation test showed a 0.289 correlation coefficient with a 0.493 Spearman's correlation coefficient.
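The overall correct rate is the pooled hit rate over both groups, so it can be rebuilt from the per-group rates once the group sizes are known. The counts in the sketch below are not reported in the paper; they are inferred from the published percentages under the assumption that the validation group holds 18 high crash and 41 average intersections.

```python
def hit_rates(high_hits, high_total, average_hits, average_total):
    """Return (high crash rate, average rate, overall rate) in percent."""
    overall = (high_hits + average_hits) / (high_total + average_total)
    return (
        round(100 * high_hits / high_total, 2),
        round(100 * average_hits / average_total, 2),
        round(100 * overall, 2),
    )


# Inferred counts (assumption): 18 high crash and 41 average intersections.
with_exposure = hit_rates(15, 18, 25, 41)
without_exposure = hit_rates(16, 18, 23, 41)

print("validation including exposure data:", with_exposure)
print("validation excluding exposure data:", without_exposure)
```

Under these inferred counts the pooled rate with exposure data comes to 67.80%, the validation figure quoted in the abstract, and the rate without exposure data comes to 66.10%.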

5.3 Reverse test data

The roles of the two groups were then reversed: the original second group (validation data) was used as calibration data, and the original first group (calibration data) was used for validation. Principal component analysis was executed first and also rendered 10 components, which explained 70.74% of the variance of the 28 variables, slightly lower than the original 73.54%.

Remodeling including exposure data

Taking the second group of 59 intersections and executing discriminant analysis, the overall correct rate was 72.88%, with a 66.67% correct rate for high crash intersections and a 75.61% for average intersections. The discriminant function is as follows:

Y= -0.3919(RY)-0.1882(CC)+0.4184(FL)-0.1129(F&SL)+ 4.38576(TL)+1.4653(LT)-0.83932(RT)+1.5777(DLV)+ 1.3525(DM)+0.3197(DCT)-5.823

Spearman's ρ rank correlation test showed a 0.702 correlation coefficient with a 0.467 Spearman's correlation coefficient.

Remodeling excluding exposure data

By removing the exposure data, then redoing the discriminant analysis, the overall correct rate was 76.27%, with a 77.78% correct rate for high crash intersections and a 75.61% for average intersections. The discriminant function is as follows:

Y= -0.52172(RY)-0.19862(CC)+0.353942(FL)-0.0197(F&SL)+6.7098(TL)-5.2591

Spearman's ρ rank correlation test showed a 0.647 correlation coefficient and a 0.470 Spearman's correlation coefficient.

Validation of remodeling including exposure data

Bringing the first group of 60 intersections into the discriminant function, the overall correct rate was 70.00%, with a 66.67% correct rate for high crash intersections and a 71.43% for average intersections.

The Spearman's ρ ranking test showed a 0.4137 correlation coefficient with a 0.284 Spearman's correlation coefficient.

Validation of remodeling excluding exposure data

By bringing the first group of 60 intersections into the discriminant function, the overall correct rate was 68.33%, with a 61.11% correct rate for high crash intersections and a 71.43% for average intersections.

The Spearman's ρ ranking test showed a 0.386 correlation coefficient and a 0.258 Spearman's correlation coefficient.

5.4 Modeling only with exposure data

After the aforementioned model calibration and validation, this research took a further step to test how the model would perform if only exposure data were taken into account, even though factor 2 of the factor analysis explained only 22.0% of the variance (Table 3). Taking the first group of 60 intersections to execute the discriminant analysis, the overall correct rate was 66.67%, with a 72.22% correct rate for high crash intersections and a 64.29% for average intersections. The discriminant function is as follows:

Y = -0.000095(LT) - 0.000069(RT) + 0.000035(DM) + 0.00011(DLV) - 0.000072(DCT) - 2.197

The Spearman's ρ ranking test was used to check the consistency between crash numbers versus Y of the 60 intersections. A 0.678 correlation coefficient was obtained with a 0.463 Spearman's correlation coefficient.
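The rank comparison described here can be sketched as follows. This is a plain implementation of Spearman's rank correlation (the Pearson correlation of average ranks, so ties are handled), not necessarily the procedure of the statistical package used in the study. The crash counts and Y scores are made-up illustrative values, not data from the 60 intersections.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical crash counts and discriminant scores for five intersections.
crashes = [30, 12, 25, 8, 17]
scores = [1.4, -0.6, 0.2, -1.1, 0.9]
rho = spearman_rho(crashes, scores)
print(round(rho, 3))
```

A coefficient near 1 would mean that the ordering of intersections by Y closely follows their ordering by crash number.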

By bringing the second group of 59 intersections into the discriminant function to validate the model, the overall correct rate was 72.88%, with a 61.11% correct rate for high crash intersections and a 78.05% for average intersections.

The Spearman's ρ ranking test showed a 0.536 correlation coefficient with a 0.392 Spearman's correlation coefficient.

Remodeling with reverse data set

In reverse, by taking the second group of 59 intersections to calibrate the discriminant function, the overall correct rate was 67.79%, with a 61.11% correct rate for high crash intersections and a 70.73% for average intersections. The discriminant function is as follows:

Y = 0.39628(LT) - 0.97149(RT) + 1.60121(DLV) + 0.584527(DM) + 0.297235(DCT) - 2.8921

The Spearman's ρ ranking test was executed. A 0.613 correlation coefficient was obtained with a 0.373 Spearman's correlation coefficient.

Taking the first group of 60 intersections to validate the discriminant function, the overall correct rate was 71.67%, with a 72.22% correct rate for high crash intersections and a 71.43% for average intersections.

A 0.153 correlation coefficient was obtained with a 0.134 Spearman's correlation coefficient from the Spearman's ρ ranking test.

The assembled results of the modeling and validation are summarized in Table 5.

Signs of variables

After model calibration, it was also important to check the consistency of the signs of variables in the discriminant functions. They are summarized in Table 6.

Table 5 Summary of discriminant analysis

| Model | Data set (intersections) | Overall correct rate | HCI correct rate | Average correct rate | Spearman's correlation coefficient |
|---|---|---|---|---|---|
| 1. Including exposure data | Calibration (60) | 78.33% | 72.22% | 80.95% | 0.477 |
| 1. Including exposure data | Validation (59) | 67.80% | 83.33% | 60.98% | 0.537 |
| 2. Excluding exposure data | Calibration (60) | 76.67% | 72.22% | 78.57% | 0.473 |
| 2. Excluding exposure data | Validation (59) | 66.10% | 88.89% | 56.10% | 0.493 |
| 3. Remodeling including exposure data | Calibration (59) | 72.88% | 66.67% | 75.61% | 0.467 |
| 3. Remodeling including exposure data | Validation (60) | 70.00% | 66.67% | 71.43% | 0.284 |
| 4. Remodeling excluding exposure data | Calibration (59) | 76.27% | 77.78% | 75.61% | 0.470 |
| 4. Remodeling excluding exposure data | Validation (60) | 68.33% | 61.11% | 71.43% | 0.258 |
| 5. Modeling only with exposure data | Calibration (60) | 66.67% | 72.22% | 64.29% | 0.463 |
| 5. Modeling only with exposure data | Validation (59) | 72.88% | 61.11% | 78.05% | 0.392 |
| 6. Remodeling only with exposure data | Calibration (59) | 67.79% | 61.11% | 70.73% | 0.373 |
| 6. Remodeling only with exposure data | Validation (60) | 71.67% | 72.22% | 71.43% | 0.134 |

Table 6 Sign of variable in discriminant functions

The signs of variables are presented in model sequence: with exposure data (60/59 intersections), then without exposure data (60/59 intersections), and finally only with exposure data (60/59 intersections).

| Variable | With exposure data (60/59) | Without exposure data (60/59) | Only with exposure data (60/59) | Implication |
|---|---|---|---|---|
| RY | (-/-) | (-/-) | (NA/NA) | Multi-leg intersection has fewer crashes. |
| SC | (+/NA) | (+/NA) | (NA/NA) | The more complete signal control tends to cause more crashes. |
| CC | (-/-) | (-/-) | (NA/NA) | Channelization helps reduce crashes. |
| FL | (NA/+) | (NA/+) | (NA/NA) | The more the fast lanes the more the crashes. |
| F&SL | (-/-) | (-/-) | (NA/NA) | The more the sum of fast and slow lanes the fewer the crashes. |
| TL | (NA/+) | (NA/+) | (NA/NA) | The more the total lanes the more the crashes. |
| TLT | (+/NA) | (+/NA) | (NA/NA) | The more the left-turn lanes the more the crashes. |
| TRT | (+/NA) | (+/NA) | (NA/NA) | The more the right-turn lanes the more the crashes. |
| NSL | (-/NA) | (+/NA) | (NA/NA) | Lack of consistency in speed limit signs. |
| NSI | (+/NA) | (+/NA) | (NA/NA) | The more signal installations the more crashes. |
| NRW | (+/NA) | (+/NA) | (NA/NA) | The more signs the more crashes. |
| LT | (-/+) | (NA/NA) | (-/+) | Lack of consistency in left-turn traffic. |
| RT | (-/-) | (NA/NA) | (-/-) | The more right-turn traffic the fewer the crashes. |
| DM | (+/+) | (NA/NA) | (+/+) | The more daily motorcycles the more the crashes. |
| DLV | (+/+) | (NA/NA) | (+/+) | The more daily light vehicles the more the crashes. |
| DCT | (-/+) | (NA/NA) | (-/+) | Lack of consistency in daily container trucks. |

Note: 60/59 indicates the data set used to calibrate the model. "NA" means "not applicable": the variable is either not included in the calibrated model or intentionally excluded from the calibration process.
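The sign check summarized in Table 6 can be expressed compactly. The sketch below takes the signs of the exposure variables in the two models calibrated with exposure data (first group of 60, then second group of 59), as listed in Table 6, and flags variables whose sign flips between the data sets. The function name and data layout are illustrative assumptions.

```python
# Signs per variable for the models including exposure data, as (60-set, 59-set).
SIGNS = {
    "LT": ("-", "+"),
    "RT": ("-", "-"),
    "DM": ("+", "+"),
    "DLV": ("+", "+"),
    "DCT": ("-", "+"),
}

def inconsistent_variables(signs):
    """Return variables whose sign differs between the two calibrations."""
    return sorted(name for name, (first, second) in signs.items() if first != second)

flipped = inconsistent_variables(SIGNS)
print(flipped)
```

Variables flagged this way correspond to the rows of Table 6 whose implication reads "lack of consistency".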

The above table first reveals that FL (total fast lanes) and TL (total lanes, including curb lanes for parking) were not kept after factor analysis of the original calibration data (Table 4), but they were significant in model calibration using the validation data. All the variables from factor analysis played certain roles in the final models. Second, different data sets are very likely to produce different research results. Even though the data sets were systematically separated into two from the selected 119 intersections (36 high crash intersections and 83 average crash intersections), some differences can still be noticed in the correct rates, the Spearman's ρ ranking test, and the signs of variables. This indicates that each crash has its uniqueness, which causes variation among crashes as well as difficulty in modeling.

In addition, the variation among the calibrated models indicates that validating a model with different data sets, and knowing how to interpret the results, are important in reaching a reliable conclusion.

From the positive or negative signs of variables, some understanding can be extracted. At first glance, the more legs an intersection has (RY), the fewer the crashes. It seems that the complexity of an intersection is negatively correlated with crashes. However, other more detailed information, such as signal categories (SC), total lanes (TL), and total left-turn and right-turn lanes (TLT, TRT), indicates that the more complex an intersection, the more crashes it will cause. This indicates that more detailed data are crucial to achieving a better explanatory model, which is the value of ATMS in providing traffic control device data. In addition, the numbers of lanes, left-turn and right-turn lanes, signals, and signs are more important than the macro road type itself in causing a high-crash intersection. A correlation analysis also suggests that those variables have correlation coefficients above 0.40. Thus they are not fully correlated, but some degree of collinearity does exist and is difficult to eliminate even with stepwise discriminant analysis. The geometric factors should be used to reflect the complexity of an intersection, and they can only be obtained through field survey or through the databases of ATMS.

In addition, daily motorcycle traffic (DM) and daily light vehicle traffic (DLV) would affect an intersection to be recognized as a high-crash intersection. Right-turn traffic (RT) is considered to have a negative impact, but the number of right-turn lanes (TRT) is calibrated to have a positive impact. As for left-turn traffic (LT) and daily container trucks (DCT), they do not appear to have a consistent positive or negative impact.

Implication of the discriminant analysis

The discriminant analysis yielded overall correct rates of 67.80% ~ 78.33%, 66.10% ~ 76.67%, and 66.67% ~ 72.88% for models including exposure data, excluding exposure data, and only with exposure data, respectively. The difference, therefore, appears insignificant, although the models only with exposure data tend to be less accurate than the other models. For the high crash intersections, the correct rates range over 66.67% ~ 83.33%, 61.11% ~ 88.89%, and 61.11% ~ 72.22%, respectively; again the models only with exposure data tend to be less accurate than the other models. For the average intersections, the correct rates range over 60.98% ~ 80.95%, 56.10% ~ 76.51%, and 64.29% ~ 78.05%, respectively; here the models without exposure data tend to be less accurate than the other models. However, factor analysis indicates that factor 2, the exposure data, explains only 20.0% of the total variance, yet it could achieve a correct rate above 56.10%. This suggests that traffic itself is indeed an important factor in causing vehicle crashes.

As for the Spearman's ρ ranking test, only the validation with the 59 intersections for the model calibrated with exposure data produced a coefficient higher than 0.5. This indicates that the discriminant function cannot produce a series of results (scores of Y) that is reasonably consistent with the crash numbers of the intersections. It also implies that calibrating a model that predicts the crash numbers for a set of intersections with satisfactory results is relatively difficult.

It is also worth noting that the addition of a control device or the arrangement of an approach layout follows a discrete form, which is different from the growth of traffic. Sullivan (2004) 30 indicated that information overload causes anxiety among librarians in their work. Kurschner, et al. (2006) 31 also indicated that, when taking in more information, one will ignore the relatively more detailed information. This might help explain how the complexity of an intersection, which is often overloaded with traffic control information, is correlated with a high crash intersection. A complicated intersection puts pressure on drivers and makes them anxious about how to cope with it. Obeying various control devices also demands a lot of attention from drivers, which in turn creates a situation of information overload. Thus an improvement of an intersection is often offset by those complicated control mechanisms.

This research suggests that the more complex an intersection is, the higher the possibility that it will turn into a high-crash intersection. This implies that the higher the traffic demand at an intersection, the more likely the intersection is to be widened and the more control devices are likely to be installed. Thereafter, more crashes will occur at that intersection. This is a certainty from the viewpoint of the crash rate with consideration of exposure data. This again induces more traffic, as the intersection improvement is based on the deficiency between demand and supply. Therefore the increase of vehicle traffic demand in a city causes an endless loop between the increase of vehicle crashes and the proposed improvement of intersections.

Another interesting concern is the container truck traffic (DCT) in Kaohsiung. Kaohsiung port ranks as the sixth busiest port in the world, and had been the second busiest in the 1990s. The heavy container truck traffic has a bad reputation in Kaohsiung, for it causes a lot of crashes. However, the discriminant analysis shows that container truck traffic does not have a consistent impact on the creation of high-crash intersections. This might be an indication that the city has put a lot of effort into counteracting the negative impact of container truck traffic. Since this research does not take accident severity into account, as no consensus on fatality or injury weights has been reached in Taiwan, it is not known whether the impact of container trucks would be consistent if severity were considered.

6. CONCLUSION AND RECOMMENDATION

Based on the motivating hypotheses and the analysis of intersection crash records in Kaohsiung, this research reached the following conclusions and proposes some recommendations.

6.1 Conclusions

1. Discriminant analysis is able to differentiate the 119 intersections of Kaohsiung into two groups with a correct rate of 66.10 ~ 78.33%, which implies that 2/3 ~ 3/4 of the intersections can be correctly identified;

2. The correct rates of models including or excluding exposure data fall into the same range, which implies that the exposure data are redundant and can be removed. However, this holds only on the condition that some variables are coded through the application of ATMS to depict the details of the intersection;

3. Modeling only with exposure data achieves correct rates slightly lower than those of the overall models or the models without exposure data. Prediction of high crash-prone intersections can be conducted without exposure data, or only with exposure data. The example of this research suggests that if ATMS can be used to provide detailed data on intersection geometry and traffic control devices, the costs of collecting exposure data can be eliminated or significantly reduced;

4. Adding complexity is not an appropriate way to counteract a safety deficiency. Handling high travel demand and keeping the intersection simple are contradictory goals. In the long term, traffic engineers should consider other, non-engineering methods to resolve safety problems;

5. Since roadway geometry changes relatively less often than traffic control devices or the lane assignment of an approach, Table 6 suggests that, to improve an identified high-crash intersection, it is better to improve the approach layout than to increase the complexity or number of traffic control devices.

6.2 Recommendation

1. It is not known how the model might change if crash severity is taken into account. In addition, this study is set in Kaohsiung; further studies could calibrate the relevant data by taking severity into account and could use data from other settings to cross-examine the conclusion of this research.

2. To focus on the intended hypotheses with sufficient intersections for modeling and validation, this research did not classify the intersections by their daily volumes. Further research could take this into account and cross-examine if the application of ATMS data could also be used for rural intersections or roadway sections.

3. Data on traffic control devices can be obtained through ATMS, while the intersection layout can be obtained through aerial photos or satellite images. When satellite images, such as those from Google Earth 32, can provide better resolution, roadway geometry and layout can then be acquired. Many cities without ATMS will be able to predict their crash-prone intersections if sufficient follow-up research can cross-examine the conclusion of this research. The advantages and disadvantages of using those kinds of data sources can be further investigated.

REFERENCES

1. Tang, K.H. A field study on validation of supplemental brake lamp with flashing turn signals for motorcycles. "INTERNATIONAL JOURNAL OF INDUSTRIAL ERGONOMICS" 31: pp.295-302. (2003).

2. Mok, J.H., Landphair, H.C. & Naderi, J.R. Landscape improvement impacts on roadside safety in Texas. "LANDSCAPE AND URBAN PLANNING" 78(3): pp.263-274. (2006).

3. Metz, D. Accident saving overvalued in road scheme appraisal. "PROCEEDINGS OF THE INSTITUTION OF CIVIL ENGINEERS-TRANSPORT" 159(4): pp.159-163. (2006).

4. Perez, I. Safety impact of engineering treatments on undivided rural roads. "ACCIDENT ANALYSIS AND PREVENTION" 38(1): pp.192-200. (2006).

5. Mok, S.C., & Savage, I. Why has safety improved at rail-highway grade crossings? "RISK ANALYSIS" 25(4): pp.867-881. (2005).

6. Christie, N., Ward, H., Kimberlee, R., Towner, E, & Sleney, J. Understanding high traffic injury risks for children in low socioeconomic areas: a qualitative study of parents' views. "INJURY PREVENTION" 13: pp.394-397. (2007).

7. Website of rmis.com Library, http://www.rmis.com/sites/risk-mautol.php (2007).

8. Son, B., Park, M., & Lee, S. A study for hazardous road selection criteria for provincial roads. "JOURNAL OF THE EASTERN ASIA SOCIETY FOR TRANSPORTATION STUDIES" 6: pp.3426-3440. (2005).

9. Fukuda, T., Tangpaisalkit, C., Ishizaka, T., & Sinlapabutra, T. Empirical study on identifying potential black spots through public participation approach: a case study of Bangkok. "JOURNAL OF THE EASTERN ASIA SOCIETY FOR TRANSPORTATION STUDIES" 6: pp.3683-3696. (2005).

10. Hadayeghi, A., Shalaby, A.S., & Persaud, B.N. Macrolevel accident prediction models for evaluating safety of urban transportation systems. "TRANSPORTATION RESEARCH RECORD" 1840: pp.97-105. (2003).

11. Wong, S.C., Sze, N.N., & Li, Y.C. Contributory factors to traffic crashes at signalized intersections in Hong Kong. "ACCIDENT ANALYSIS AND PREVENTION" 39(6): pp.1107-1113. (2007).

12. Kaohsiung City Traffic Devices Website, http://gis.tbkc.gov.tw/KsTraffic/ (2007).

13. Abdel-Aty, M., & Pande, A. ATMS implementation system for identifying traffic conditions leading to potential crashes. "IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS" 7(1): pp.78-91. (2006).

14. Chen, S.C., Shyu, M.L., Peeta, S., Zhang, C.C. Learning-based spatio-temporal vehicle tracking and indexing for transportation multimedia database systems. "IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS" 4(3): pp.154-167. (2003).

15. Tangpaisalkit, C. Intelligent transport system (ITS) and transportation in Thailand. Office of Transport and Traffic Policy and Planning (OTP), Ministry of Transport (MOT), Thailand, presentation. (2006).

16. Feng, Q., Zhou, X.P., & Du, Y.C. An urban freeway traffic management system. "COMPUTER AND COMMUNICATIONS" 23(1): pp.67-70. (2005). (in Chinese)

17. Wang, Y.W., Wen, M.J., & Ting, K.L. Research on identification of rural crash-prone bridges. "PROCEEDING OF THE 9TH ANNUAL CONFERENCE OF CHINESE INSTITUTE OF TRANSPORTATION" pp.395-402. (1994). (in Chinese)

18. Stamatiadis, N., Jones, S., & Hall, L. Causal factors for accidents on southeastern low-volume rural roads. "TRANSPORTATION RESEARCH RECORD" 1652: pp.111-117. (1999).

19. Kim, K. & Yamashita, E. Motor vehicle crashes and land use empirical analysis from Hawaii. "TRANSPORTATION RESEARCH RECORD" 1784: pp.73-79. (2002).

20. Bernhardt, K.L.S., & Virkler, M.R. Improving the identification, analysis and correction of high-crash locations. "ITE JOURNAL" 72(1): pp.38-42. (2002).

21. Espino, E.R., Gonzalez, J.S. & Gan, A. Identifying pedestrian high-crash locations as part of Florida's highway safety improvement program: a systematic approach. "TRANSPORTATION RESEARCH RECORD" 1828: pp.83-88. (2003).

22. Hwang, K.P., Tsai, M.Y., & Ou, T.C. Development of identification and investigation technology for urban crash-prone intersections. "2005 TRAFFIC SAFETY AND ENFORCEMENT CONFERENCE" pp.295-311. (2005). (in Chinese)

23. Ivan, J.N. New approach for including traffic volumes in crash rate analysis and forecasting. "TRANSPORTATION RESEARCH RECORD" 1897: pp.134-141. (2005).

24. Pei, Y.L., & Dai, T.Y. Fuzzy evaluating method to distinguish black spot. "JOURNAL OF HIGHWAY AND TRANSPORTATION RESEARCH AND DEVELOPMENT" 22(6): pp.121-125,138. (2005). (in Chinese)

25. Pei, Y.L., & Ding, J.M. Outstanding factor method to black spot differentiation, "CHINA JOURNAL OF HIGHWAY AND TRANSPORT" 18(3): pp.99-103. (2005). (in Chinese)

26. Pei, Y.L. Improvement in the quality control method to distinguish the black spot of the road. "JOURNAL OF HARBIN INSTITUTE OF TECHNOLOGY" 36(1): pp.97-100. (2006). (in Chinese)

27. Cafiso, S., La Cava, G., Montella, A. Safety index for evaluation of two-lane rural highways. "2007 TRANSPORTATION RESEARCH BOARD ANNUAL MEETING" #07-0870. (2007).

28. Xiao, Q., Ghazan, K., & Noyce, D.A. Spatial statistical approach to identifying snow crash-prone locations. "2007 TRANSPORTATION RESEARCH BOARD ANNUAL MEETING" Paper #07-0909. (2007).

29. 2005 Kaohsiung Traffic survey and characteristic analysis. Transportation Bureau, Kaohsiung City. (2005).

30. Sullivan, P. Information overload: keeping current without being overwhelmed. "SCIENCE & TECHNOLOGY LIBRARIES" 25(1-2): pp.109-125. (2004).

31. Kurschner, C., Seufert, T., Hauck, G., Schnotz, W., & Eid, M. Construction of visio-spatial representations during listening and reading comprehension. "ZEITSCHRIFT FUR PSYCHOLOGIE" 214(3): pp.117-132. (2006).

32. Website of Google Map http://maps.google.com/ (2007).