Scholarly article on topic 'ATM performance measurement in Europe, the US and China'

ATM performance measurement in Europe, the US and China Academic research paper on "Social and economic geography"

CC BY-NC-ND
Academic journal
Chinese Journal of Aeronautics
OECD Field of science
Keywords
{"Air traffic management" / Data / Metric / Performance / Sampling / Topology}

Abstract of research paper on Social and economic geography, author of scientific article — Andrew Cook, Seddik Belkoura, Massimiliano Zanin

Abstract Air traffic management (ATM) performance and the metrics used in its assessment are investigated for the first time across the three largest ATM world regions: Europe, the US and China. The market structure and flow management practices of each region are presented. A wide range of performance data across these three regions is synthesised. For topological and performance assessment, the notion of a ‘sufficient’ sample is often non-intuitive: many metrics may behave non-monotonically as a function of sampling fraction. Missing and under-developed metrics are identified, and the need for a balance between standardisation and flexibility is proposed. Longitudinal and cross-sectional metric trade-offs are identified.


Chinese Journal of Aeronautics, (2017), xxx(xx): xxx-xxx

Chinese Society of Aeronautics and Astronautics & Beihang University

Chinese Journal of Aeronautics

cja@buaa.edu.cn www.sciencedirect.com

ATM performance measurement in Europe, the US and China

Andrew Cook a,*, Seddik Belkoura b, Massimiliano Zanin b,c

a Department of Planning and Transport, University of Westminster, 35 Marylebone Road, London NW1 5LS, United Kingdom

b The Innaxis Foundation and Research Institute, Jose Ortega y Gasset 20, Madrid 28006, Spain

c Faculdade de Ciencias e Tecnologia, Departamento de Engenharia Electrotecnica, Universidade Nova de Lisboa, Lisboa 2829-516, Portugal

Received 13 June 2016; revised 5 October 2016; accepted 22 December 2016

KEYWORDS

Air traffic management; Data; Metric; Performance; Sampling; Topology


© 2017 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Air traffic management (ATM) performance assessment is a vital tool for improving air transport service delivery. We investigate such performance and the metrics used in the assessment thereof, across the three largest ATM regions of the world: Europe, the US and China. In addition to synthesising a wide range of data across these three regions, we set out to establish the importance of data sampling with respect to the characterisation and assessment of ATM.

In Section 2, we compare and contrast the market structure (development of airline operations) and flow management practices of each region. Data availabilities, metric definitions and high-level performance data are also presented. Since this paper is concerned in large part with the impacts of sampling on performance assessment, it is first necessary to set a higher-level context of how the three regions of interest are defined, and to present some data on their characteristics, in order to facilitate interpretation of the performance data available from the corresponding states, and the results of our analyses. We will briefly set the scene regarding the development of airline operations and flow management practice in these regions. It may naturally be expected that the drivers and constraints of market and operational development will affect the type of network that emerges, and hence the complexity metrics used in this paper to characterise these networks and metrics quantifying performance. It will later be demonstrated that the results for China reflect a different type of network evolution, such that more detail on this region's market development will be presented.

* Corresponding author.

E-mail address: cookaj@westminster.ac.uk (A. Cook). Peer review under responsibility of Editorial Committee of CJA.

http://dx.doi.org/10.1016/j.cja.2017.01.001

1000-9361 © 2017 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

In Section 3, the impacts of airport and airline sampling are presented. Network topologies and delay performance are the focus of these analyses. In the concluding section, we discuss the context of international harmonisation and identify several challenges ahead regarding performance assessment and data management.

2. Regional contrasts

2.1. Establishing context

In the context of assessing the impact of sampling on performance data, it might be expected that at least the fundamental definitions of Europe, the US and China would be straightforward. Whilst this holds for the US, it is slightly more complicated for China, and much more complex for Europe. Unless otherwise indicated, the 'US' refers to air navigation services provided by the United States of America in the 48 contiguous states located on the North American continent south of the border with Canada, plus the District of Columbia, but excluding Alaska, Hawaii and Oceanic areas (the 'US CONUS'). Air transport movement data for China often include Hong Kong, Macao and Taiwan, whereas airport counts usually do not. 'European' data may refer to the European Union (EU), geographical Europe, or the area flow-managed by EUROCONTROL, comprising 44 states participating in the European Civil Aviation Conference (ECAC). In Europe, the formation of nine supranational Functional Airspace Blocks is part of a move towards the goal of defragmentation, viz. a Single European Sky (launched in 2000 by the European Commission specifically in response to performance management and the challenge of increasing delays). The Single European Sky (SES) area comprises the 28 EU members plus Norway and Switzerland. Of these 'European' areas, the EU is the smallest, such that one has to be wary when referring to 'EU' data only. Complicating matters further, 'European' forecasts often refer to the ESRA08 traffic region (EUROCONTROL Statistical Reference Area, comprising 34 traffic zones1). Turkey, for example, is in ECAC, included in ESRA08, and a member of EUROCONTROL, but is not in the EU or SES. In 2014, Turkey was the main contributor to European traffic growth, without correspondingly noteworthy delays, yet, in contrast, was not subject to the determined costs air navigation charging methods central to the SES performance scheme.2 The primary focus in this paper is ATM performance, and such data usually refer to ECAC (although the full European flow management and flight planning situation is actually even more complicated3 than the summary presented here).

As will be discussed in Section 2.4, three primary types of data are collected within each region, involving automated tracking and network operational data collection, in addition to airline data sampling. Not only can the inclusion or exclusion of one or more states clearly affect the data, but, as will be demonstrated, topological and performance metrics can vary as a function of the number of airports or airlines included, and even these delineations are open to variable definitions.

2.2. Market structure

All three regions have witnessed considerable mergers, and groupings into global alliances, with most of the largest airlines now operating as airline groups. Major liberalisation in the airline industry first started in the US market in 1978. European deregulation occurred more gradually, growing from numerous bilateral 'open sky' agreements in response to a European Court of Justice ruling in 1986.4 The main change here was deregulation of international routes within the EU in 1993 (to coincide with the launch of the single European market) and this was extended to domestic routes in 1997; the main multilateral agreement was between the EU and the US in 2008. Europe and the US are now both established free markets, with a full range of operator types, and with very limited state intervention in airline planning and operations. Recently, there has been a significant growth in low-cost carriers (LCCs),2 serving fare-driven markets, as exemplified below by the fact that LCCs appear in the top four airlines by passengers carried in both Europe and the US.

Development in China has been more complicated, as the market has changed from a fully planned, state-controlled system, to more of a market economy, in which new forms of airline ownership and operations have emerged. The summary presented here draws mainly on three published works.5-7 Chinese airlines were officially separated from military jurisdiction in 1980, and merged into three large airline groups in 2002 (Air China, China Eastern and China Southern). Regional airlines emerged essentially as supplementary carriers, with relatively greater regional and local government control and support. These comprise some quarter of all routes, albeit more policy- than market-driven.8 (As will be observed later, this may have wider consequences for hub development.) 2005 saw investment deregulation and the emergence of nonstate airlines (some private, some jointly-owned), including some LCCs, only to be followed by a suspension of new airline applications in 2007. Further state-led consolidation took place after the global financial crisis in 2008, with new mergers and acquisitions in place by 2010. In a comparison5 of the relative efficiency of these airline types in China, it is stated that some route and schedule advantages remain for the larger airlines with state planning, relative to newer operational models, such as the LCCs. Although dominant status still continues for these three large groups, there is evidence7 of significant competition between them for market share.

In Table 1, we summarise the market structure in each region, through the four largest airlines in each, drawing primarily on Flightglobal data (Flightglobal company profiles: https://www.flightglobal.com/. Accessed May 2016). Of note is that some traditional demarcation between LCC and mainline (legacy) carriers is breaking down, for example with Vueling in IAG, and Transavia part of Air France-KLM. The Lufthansa Group owns LCC Eurowings and it is understood that the alliance is actively looking for further LCC partners. In the US, Delta and United both have some LCC ownership, whereas American does not. In China, there were 38 state-owned and 13 private airlines in 2014.9 Only half a dozen or so LCCs in operation have around twenty or more aircraft. The first three Chinese airlines listed in Table 1 have no LCC ownership, whilst Hainan has large shareholdings in very small LCCs.

Table 1 Carrier market structure by region.

| Region | Four largest carriers (groups) | Alliance | Carrier type | Ownership |
|---|---|---|---|---|
| Europe | Lufthansa Group | Star Alliance | Mainline | Wholly or majority private holdings |
| | Ryanair | No global alliance | LCC | |
| | IAG (a) | oneworld | Mainline | |
| | Air France-KLM | SkyTeam | Mainline | |
| US | American Airlines | oneworld | Mainline | Public companies (e.g. under holding groups) |
| | Delta Air Lines | SkyTeam | Mainline | |
| | Southwest Airlines | No global alliance | LCC | |
| | United Airlines | Star Alliance | Mainline | |
| China | Air China | Star Alliance | Mainline | Majority state shareholding |
| | China Eastern | SkyTeam | Mainline | Majority state shareholding |
| | China Southern | SkyTeam | Mainline | Majority state shareholding |
| | Hainan Airlines | No global alliance | Mainline | China's largest privately-owned airline |

(a) International Airlines Group, formed by the merger of Iberia and British Airways.

2.3. Flow management practices

Air traffic flow management (ATFM) in the three regions has many principles in common. Both the US (Federal Aviation Administration) and China (Air Traffic Management Bureau, Civil Aviation Administration of China) have one service provider. The US 'CONUS' airspace (the 48 contiguous states located on the North American continent) is operated by 20 air route traffic control centres.10 China is operated as eight upper air control areas (regional ATFM units).11 The intergovernmental organisation, EUROCONTROL, oversees ATFM in Europe, comprising 41 states and 63 en-route centres.10 Europe thus faces the additional challenges of fragmentation through service delivery from multiple sovereign states, notwithstanding the Single European Sky initiative mentioned in the previous section. In all three regions special use airspace (SUA) presents significant challenges, being present in the core operating areas of Europe and China. Although these impacts are often difficult to quantify through performance metrics, some work has been undertaken on this within Europe.2 (Nevertheless, it may be noted in Table 2 that China reports military activity as an explicit delay classification, unlike Europe or the US. Military airspace is prominent in China.) All three regions operate collaborative decision-making between ATFM, airlines and airports, although this is more mature in Europe and the US.

In Europe, emphasis is put on strategic planning, with strategic agreements on airport capacities and airport slots. In the US, very few airports have schedule limitations, and flow restrictions are usually due to weather. Whilst airport restrictions strongly dominate ATFM in the US, there is a more even split between airport and en-route restrictions in Europe.10 The default is to apply holding at-gate in Europe, whilst implementing a ground delay program (GDP) is a last resort in the US - as versatile management in the airborne phase, redirecting entire flows around large weather systems, is made possible by management through one service provider.

Table 2 Network manager high-level ATM delay reporting, by region.

| Data level by region | Europe | US | China |
|---|---|---|---|
| Focus on arrival or departure delay | Departure | Arrival | Departure |
| Delay threshold | >5 min | >15 min | >\|5 min\| (a) |
| Main delay causes reported (ordering, and simplified labels, to ease comparison) | Airline | Airline | Airline |
| | Weather | Weather | Weather |
| | ATFM, weather | ATFM | ATFM |
| | ATFM, airports | Security | |
| | ATFM, en-route | | Military |
| | Reactionary | Reactionary | |

(a) But classified as on-time if any departure delay is zero or less on arrival. Plans are in place to move to a single, 15-min arrival delay threshold in future.

In the airborne phase, miles in trail (MIT; separation by a common distance) is being replaced by Time-Based Metering (more efficient, individual flight spacing).10 More extreme ground holding (a 'ground stop') may be applied when all departures bound for a constrained airport are postponed. Capacity limitations in the airspace, most likely due to weather, may result in airspace flow programs (AFP) - these are more practical than multiple GDPs when a large geographical area is impacted. In contrast, en-route spacing or metering is very rare in Europe due to the fragmented service provision and SUA distribution. Sequencing tools and speed control are usually used only within state boundaries, often associated with Required Times of Arrival (e.g. at an airport).10 Cross-Border Arrival management (XMAN) is less common. China somewhat represents a mixture of the European and US systems. Whilst at the strategic level it has coordinated airports, at the tactical level, GDPs, AFP, MIT (with advanced planning based on demand) and collaborative routing are deployed by the Air Traffic Management Bureau (ATMB) for resolving trans-regional flow management.11

2.4. Performance data - Collection and coverage

When reporting performance using ATM data, three primary sources are used:

(1) Trajectory (radar track) data;

(2) Network manager (e.g. ATFM) delay data, with causes (e.g. airport or en-route restrictions);

(3) Airline data (e.g. on delay and cancellation causes).

In addition, numerous other exogenous data sources may be required to set the above data into a meaningful context (such as meteorological data, airport and sector capacities, and military activity rates) and to clean or correct the data (e.g. using schedule data to ascertain the originally intended plans of the airline regarding a given flight). Indeed, care may need to be taken to differentiate between network manager and airline sources. Coverage of the former may often be better, but ATFM delay data may relate to the last-filed flight plan of the operator (as typically the case in European reporting), and may thus underestimate total delay relative to the schedule (as reported by an airline). On the other hand, an airline might, for example, have a 10-min handling delay within a 20-min fixed ATFM delay, and report both as 10 min, thus under-reporting the ATFM delay.

Regarding airline data, US carriers are required to report performance data if they represent at least 1% of total domestic scheduled-service passenger revenues (some report voluntarily, in addition). In 2013, this represented 68% of instrument flight rules (IFR) flights at the main 34 airports used for more detailed performance tracking.10 Coverage thus varies in terms of the contributing airlines: for example, 16 in 2014, and 12 in 2016. As of January 2011, carriers operating more than 35 000 flights per year within the European Union airspace are legally required to submit data to EUROCONTROL. In 2013, the coverage was approximately 63% of total scheduled IFR flights (and approximately 76% of flights at the 34 main airports, to compare with the US coverage).10 In 2014, these data in Europe covered just over 100 airlines (personal communication), and 69% of commercial flights12 in the ECAC region.

Table 2 summarises delay reporting characteristics for EUROCONTROL, the FAA and ATMB. Although Europe differs in having a focus on departure delay reporting, due to the lack of en-route delay management in Europe, departure delay and arrival delay are closely correlated.2 (For example, the average arrival delay of arrival-delayed (>5 min) flights is only 4.6% greater than the average departure delay of departure-delayed (>5 min) flights.12) Whilst the table shows high-level delay reporting categories, further breakdowns are available, although Europe and the US do not quantify military delay in this manner, and China does not currently publish reactionary delay as per the US or Europe (since causality is attributed to the first rotation cause, until the chain has an on-time rotation) or cancellation rates (although estimates of the latter based on sampling from the ten major airlines suggest rather lower rates than in Europe or the US).
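As a worked illustration of the 4.6% relationship quoted above, the following minimal Python sketch scales an average departure delay to an estimated average arrival delay. The function name and the fixed proportional uplift are illustrative assumptions, not part of any reporting standard; the 9.7 min input is the European average departure delay for 2014 reported in Table 3.

```python
def estimate_arrival_delay(avg_departure_delay_min, scalar=0.046):
    """Scale an average departure delay (delayed flights only) to an
    estimated average arrival delay, assuming a fixed proportional uplift.

    The default scalar of 0.046 is the 4.6% figure quoted in the text;
    treating it as a constant multiplier is a simplifying assumption.
    """
    return avg_departure_delay_min * (1.0 + scalar)


# European average departure delay for 2014, as reported in Table 3.
europe_arrival_estimate = estimate_arrival_delay(9.7)
print(round(europe_arrival_estimate, 1))
```

Rounded to one decimal place, the result agrees with the authors' own estimate of 10.1 min given in the notes to Table 3.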

Table 3 summarises key performance parameters for the regions for 2014. To the best of our knowledge, this is the first time that such comparative data across these three regions have been published. It would be possible to dedicate the remainder of this paper to comparing data collection impacts on each of these reported values, but instead the data will be used to draw high-level comparisons and to pave the way for further sampling analyses in Section 3.

Table 3 European, US and Chinese performance in 2014.

| Region | Total airports | Total pax (m) | Total flights (m) (f) | Delayed > 5 min (i) | Delayed > 15 min (j) | Average delay (min) (k) | Reactionary delay | ATFM delay | Cancelled |
|---|---|---|---|---|---|---|---|---|---|
| Europe | 609 (a, t) | [illegible] (e) | 9.6 (g) | 37.4% (m) | 16.3% (g) | 9.7 (m) | 39.6% (m, n) | 13.3% (g) | 1.5% (g) |
| US | 516 (c) | 852 (b) | 15.1 (t) | - | 21.3% (b) | 13.4 (l) | 41.9% (o, p) | 23.5% (o, q) | 2.2% (b) |
| China | 202 (d) | 392 (d) | 8.6 (h, t) | 31.6% (d) | - | 21 (h, t) | (s) | 30.1% (h, r, t) | (s) |

(a) Airports Council International (ACI) EUROPE, ECAC area, excludes non-commercial airports.
(b) US Department of Transportation (DoT), Bureau of Transportation Statistics (BTS): http://www.transtats.bts.gov (Accessed May 2016).
(c) EUROCONTROL and Federal Aviation Administration.10
(d) Civil Aviation Administration of China, 2014.9
(e) Eurostat: http://ec.europa.eu/eurostat/data/database European Union (28 states) (Accessed May 2016).
(f) Total IFR flights, includes overflights. ECAC area for Europe.
(g) EUROCONTROL, Performance Review Commission, 2015.2
(h) ATMB, Civil Aviation Administration of China.
(i) Departure delay; > 5 min in China (where 'on-time' measure includes flights up to 5 min early).
(j) Arrival delay; > 15 min in Europe.
(k) Departure delay for Europe (authors' estimate of arrival delay: 10.1 min, using 4.6% scalar cited); arrival delay for US (fiscal year 2014) and China.
(l) http://www.faa.gov/data_research/aviation_data_statistics/operational_metrics (Accessed May 2016).
(m) EUROCONTROL, Central Office for Delay Analysis, 2015.12
(n) Rotational delay value: corrected from total reactionary delay.
(o) US DoT, BTS: http://www.rita.dot.gov/bts/help/aviation/html/understanding.html (Accessed May 2016).
(p) "Aircraft arriving late" (rotational) delay classification.
(q) "National Aviation System" delay classification.
(r) "ATC" delay classification.
(s) Not currently reported as such in China.
(t) Personal communication.

The main observations are as follows. Whilst China handles fewer passengers (the totals include international passengers) through fewer airports than Europe, the number of flights handled is comparable with the latter. Although China performs better than Europe for delays measured at the lower cut-off threshold of >5 min, the average delay per flight, and the contribution thereto from ATFM, are notably poorer in China relative to the other two regions. With a smaller network than Europe, the main drivers of somewhat lower performance in the Chinese network are not clear. The market structure described in Section 2.2 has probably not contributed to this greatly. Reactionary delays are critical to performance in the European and US contexts, since they comprise two fifths of all delay minutes in both regions, but it is difficult to assess their contribution in China in the current absence of corresponding published data. Nevertheless, during the early 2000s, the vast majority of air travellers in China were simple origin and destination passengers,7 such that passenger reactionary effects are probably not major contributors. Since ATM service fragmentation is not an issue either, this leaves the proposition that less mature ATFM stakeholder collaboration processes and further requirements regarding the development of the flexible use of airspace (for SUA) are the primary factors.

3. Sampling and performance analysis

3.1. Flight trajectory data sources

Air transport data sampling is often purposive, for example when the topology of one airline (or alliance) is the focus, when limited by availability, or driven by airport or airline size (as indeed in the US airline sampling protocol described above). Larger airports and airlines are thus often over-represented to the detriment of smaller ones. In Section 3, we deploy the first type of data identified in Section 2.4, i.e. trajectory data, to explore the impacts of sampling on metrics characterising networks and describing performance. The data sources are shown in Table 4. Although the data refer to different years, a number of broad comparisons are still useful to demonstrate the effects of sampling. The OpenFlights data are not available historically, but represent the only common source of data for comparison across the three regions. Note also that the two higher delineation sources for Europe and the US both include delay data associated with the flights. For Europe, the ALL-FT+ data show up to five IATA (International Air Transport Association) delay codes with magnitudes in minutes; for the US, the RITA data show delays of each flight across the five (US) category attributes shown in Table 2. The research team does not have access to corresponding higher delineation data for China. The available delay data will be used in Section 3.3.

In Table 4, it may be observed that for Europe the higher delineation data cover approximately four times more airports but a third fewer airlines than the lower delineation data. In the US, the higher delineation data cover half as many airports as the lower delineation data, and 16 (large) carriers only. Table 4 also reproduces the total number of airports in 2014, as indicated in Table 3. It should be noted again, however, that such counts are significantly dependent on the criteria applied and, to a lesser extent, the definition of the region covered.
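The coverage comparisons in the preceding paragraph follow directly from the counts in Table 4. The short Python sketch below recomputes them; the dictionary layout and the function name are illustrative choices only.

```python
# Airport and airline counts copied from Table 4.
lower = {
    "Europe": {"airports": 497, "airlines": 153},
    "US": {"airports": 595, "airlines": 81},
}
higher = {
    "Europe": {"airports": 1854, "airlines": 100},
    "US": {"airports": 286, "airlines": 16},
}


def coverage_ratio(region, key):
    """Higher-delineation count divided by lower-delineation count."""
    return higher[region][key] / lower[region][key]


for region in ("Europe", "US"):
    for key in ("airports", "airlines"):
        print(region, key, round(coverage_ratio(region, key), 2))
```

For Europe the airport ratio is about 3.7 and the airline ratio about 0.65 (roughly a third fewer airlines); for the US the airport ratio is about 0.48 (roughly half as many airports).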

The Chinese market is the most dynamic regarding new infrastructure. The 202 airports cited refer to "certified transport airports" at the end of 2014, an increase of nine from 2013, with new airports at Heilongjiang Fuyuan, Hubei Shennongjia, Qinghai Delingha, Shanxi Luliang, Jilin Tonghua, Guangxi Hechi, Sichuan Aba, Guizhou Liupanshui and Hu'nan Hengyang.9 The 609 European airports indicated relate to the ECAC area, using data from ACI EUROPE for 2014, excluding non-commercial airports. This number jumps to 3347, however, when all ECAC IFR flights are included, embracing military, cargo and general aviation movements (in-house analyses of EUROCONTROL data covering 21AUG14-15OCT14 - the count would increase somewhat further if the whole year were taken into account). The US has a similarly long tail: FAA data cite (http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_statistics/index.html. Accessed May 2016) over 5000 airports in the US in 2014 if smaller (public) airports are included, as compared with the 516 indicated.

Table 4 Flight trajectory data sources, by level and region.

| Data level by region | Europe | US | China |
|---|---|---|---|
| Lower delineation (no aircraft types or delay data) | OpenFlights (a) | OpenFlights (a) | OpenFlights (a) |
| No. of airports | 497 | 595 | 185 |
| No. of airlines | 153 | 81 | 17 |
| Higher delineation (with aircraft types and delay data) | ALL-FT+ (b) | RITA (c) | N/A |
| No. of airports | 1854 | 286 | - |
| No. of airlines | 100 | 16 | - |
| No. of airports in 2014 | 609 | 516 | 202 |
| of which > 1 m passengers | 221 (d) | 87 (e) | 64 (f) |
| % > 1 m passengers | 36% | 17% | 32% |

(a) Open source repository, flights and airport data, worldwide coverage; http://openflights.org Flights for June 2015.
(b) Provided by EUROCONTROL; all intra-European IFR flights for March through December, 2011.
(c) On-Time Performance data, provided by the Research and Innovative Technology Administration (RITA), US DoT. Intra-US flights, same period as (b).
(d) ACI EUROPE, personal communication.
(e) http://www.faa.gov/airports/planning_capacity/passenger_allcargo_stats/passenger/ (Accessed May 2016).
(f) Civil Aviation Administration of China, 2014.9

Fig. 1 Link density and maximum degree by airport sampling fraction.

Table 5 Three common complexity metrics.

| Metric | Definition | Range | Remarks |
|---|---|---|---|
| Link density | The number of (active) links in the network, divided by the maximum number of links that could be present | 0 | A void, or empty network |
| | | 1 | All nodes connected to all other nodes |
| Maximum degree | The maximum degree of a network is defined as the degree of the most connected node | 0 | A void, or empty network |
| | | x | x determined by most connected node |
| Assortativity (degree correlation) | Pearson's correlation coefficient (ρ) between the degrees of pairs of nodes connected by a link (ρ = 1: all nodes connected to nodes of the same degree) | 0 < ρ ≤ 1 | Assortative |
| | | ρ ≈ 0 | Non-assortative (random connections) |
| | | 0 > ρ ≥ -1 | Disassortative |

Considering in general the different definitions for exactly what comprises an airport in a network (e.g. in terms of traffic types, minimum qualifying volumes, and time periods), and in particular the dynamicity of the Chinese market, combined with varying levels of data access across platforms and researchers, it is not surprising that one finds different numbers of airports cited for the same region in different publications.

3.2. Network topology as a function of sampling

Simulating purposive sampling bias, in Fig. 1 airports are sequentially added to reconstructions of the three regional networks, according to their number of connections - airports with a larger number of connections are added first, smaller ones last. This sampling method allows us to see how the network evolves, from the core backbone (i.e. the core created by the most connected airports) to the whole structure. This also allows us to simulate incomplete data sets, which may only contain information for the larger elements of the network, for example. The evolution of two of the metrics shown in Table 5 is plotted as a function of the fraction of nodes (airports) sampled from the original data set, using the OpenFlights data described in Table 4. Thus, values close to zero signify that only the most connected airports are included; values close to one signify that all airports are considered. Similarly, Fig. 2 plots the evolution of two of the metrics as a function of the fraction of airlines included (largest first). (The two insets for the US network represent scaled data from the main graphs, to clearly show the evolution of the two metrics.) These two plots represent part of wider work13 using seven complexity metrics in total, simplified here.
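The degree-ordered airport sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: airports are ranked once by their degree in the full network, links are treated as undirected, and both metrics are recomputed on the subnetwork induced by the retained airports. The function name and the toy network are hypothetical.

```python
from collections import defaultdict


def degree_ordered_metrics(links, fractions):
    """For each sampling fraction, keep the most connected airports and
    return (fraction, link density, maximum degree) for the induced
    subnetwork."""
    # Undirected adjacency sets built from (origin, destination) pairs.
    adj = defaultdict(set)
    for a, b in links:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    # Rank airports by degree in the full network, most connected first
    # (ties broken alphabetically so that the ordering is reproducible).
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    results = []
    for f in fractions:
        k = max(1, round(f * len(ranked)))
        kept = set(ranked[:k])
        # Degree of each retained airport, counting retained neighbours only.
        degrees = [len(adj[n] & kept) for n in kept]
        n_links = sum(degrees) // 2
        max_links = k * (k - 1) // 2
        density = n_links / max_links if max_links else 0.0
        results.append((f, density, max(degrees)))
    return results


# Hypothetical toy network: one hub (A), a secondary hub (B) and spokes.
toy_links = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
for row in degree_ordered_metrics(toy_links, [0.4, 0.6, 1.0]):
    print(row)
```

In this toy case the core is fully connected (density 1) at low sampling fractions, and the density falls once the weakly connected spokes are added, while the maximum degree keeps growing up to the full network.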

The metrics typically vary quite strongly as a function of the number of airports included. Whilst the two metrics shown demonstrate monotonic behaviour, others changed the direction of their evolution.13 Notably, many do not saturate, i.e. they do not reach a stable value even at high sampling fractions. These observations imply that the network topology is changing as nodes, even small ones, are added. There is, therefore, no obvious sampling threshold by which nodes may be safely discarded. Fig. 2 gives somewhat contrasting results. Here, a low(er) number of airlines is usually sufficient to recover a good approximation of the complete topology. This is probably because sampling airlines involves sampling both larger and smaller airports, and corresponds better to a sampling of the system than through selected (usually larger) airports only.

Fig. 2 Link density and assortativity by airline sampling fraction.
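Airline-based sampling can be sketched in the same spirit. The Python example below adds airlines largest first and recomputes link density on the accumulated network after each addition. It rests on several assumptions that the text does not specify: airline size is taken as the number of routes operated, links are undirected, and the density denominator uses only the airports served so far. All names and the toy data are hypothetical.

```python
def airline_ordered_link_density(routes_by_airline):
    """Add airlines largest first (by number of routes) and return a list of
    (fraction of airlines included, link density of the accumulated network)."""
    ranked = sorted(routes_by_airline,
                    key=lambda a: (-len(routes_by_airline[a]), a))
    links, airports, out = set(), set(), []
    for i, airline in enumerate(ranked, start=1):
        for a, b in routes_by_airline[airline]:
            if a == b:
                continue  # ignore self-loops
            links.add(frozenset((a, b)))  # undirected link, duplicates merged
            airports.update((a, b))
        n = len(airports)
        max_links = n * (n - 1) // 2
        density = len(links) / max_links if max_links else 0.0
        out.append((i / len(ranked), density))
    return out


# Hypothetical toy data: three airlines with overlapping route sets.
toy_routes = {
    "X1": [("A", "B"), ("A", "C"), ("A", "D")],
    "X2": [("B", "C"), ("A", "B")],
    "X3": [("D", "E")],
}
for fraction, density in airline_ordered_link_density(toy_routes):
    print(round(fraction, 2), round(density, 3))
```

Because each airline brings both its large and its small airports into the sample, the node set and the link set grow together, which is one way to see why airline sampling can approximate the full topology sooner than sampling by airport size.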

From Fig. 1, it may be observed that a higher maximum degree is maintained for a higher airport sampling fraction for China (c.f. Europe and US). This may be ascribed to the evolutionary constraints applied to the planned market (c.f. free markets): relatively more airports have a high degree (better connectivity), probably as a result of the national and regional policies described in Section 2.2. The assortativity of the Chinese network (Fig. 2) as a function of the airline sampling fraction is more monotonic (less 'volatile' c.f. Europe and US), further suggesting that Chinese airline networks are more homogeneous in this respect, again reflecting a planned-market evolution.

However, the final assortativity (i.e. of the full network) in China is quite low (approximately -0.14) compared with the values for Europe (approximately -0.05) and the US (approximately -0.01). Although all of these networks have long-tailed degree distributions (plot not shown), this may suggest that, topologically, the Chinese network still has some efficiency characteristics of many biological (and technological) networks (which are often scale-free and (weakly) disassortative,14 as high degree nodes tend to attach to low degree nodes, with attendant transportation efficiencies). Whilst this is consistent with the values observed, it is not fully sufficient supporting evidence, particularly as the (final) assortativity values are quite small for all three regions. Thedchanamoorthy et al.14 are amongst researchers who have raised the issue of the need to look at local assortativity. They conclude that 'strong' rich-clubs (where the majority of links from hubs terminate at other hubs) are rarely present in real-world networks, whereas 'weak' rich-clubs (where the link density is merely higher for hubs, compared to the entire network) are present in many real-world networks.
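To make the assortativity values concrete, the following Python sketch computes degree assortativity as defined in Table 5, i.e. as Pearson's correlation coefficient between the degrees at either end of each link. It assumes an undirected network in which every link is counted once in each direction; the function name and the star-shaped example are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def degree_assortativity(links):
    """Pearson correlation between the degrees at either end of each link,
    with every undirected link counted in both directions."""
    adj = defaultdict(set)
    for a, b in links:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    xs, ys = [], []
    for a in adj:
        for b in adj[a]:
            xs.append(len(adj[a]))
            ys.append(len(adj[b]))
    n = len(xs)
    if n == 0:
        return float("nan")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # all degrees equal: correlation undefined
    return cov / sqrt(var_x * var_y)


# A pure hub-and-spoke (star) network is the fully disassortative extreme.
star = [("HUB", spoke) for spoke in ("S1", "S2", "S3", "S4")]
print(degree_assortativity(star))
```

A pure star network sits at the disassortative limit of -1, since every link joins the single high-degree hub to a degree-one spoke. The weakly negative values reported for the three regions are therefore far from a pure hub-and-spoke structure.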

The average degree for the Chinese network is 15, midway between the European (19) and US values (10) (calculations not shown), indicating that Chinese airports are well connected, especially considering the smaller absolute number of airports. This is compatible with more of a point-to-point, than hub-and-spoke system, as observed elsewhere.7 Nevertheless, as shown in Table 4, the Chinese network has approximately the same proportion of airports with more than 1 million passengers as Europe, and a comparable absolute number (64) when compared with those of the larger, US network (87).

At the same time, it has been observed8 that policy-related factors leading to highly connected airports might well enhance the attractiveness of these airports to airlines, such that they then grow further as hubs under the influence of free-market forces. Again, we are led to the conclusion that China reflects aspects of both planned- and free-market evolutions when topological metrics relating to airline and airport sampling are considered. The relatively poorer performance in terms of flow management, discussed above, probably has relatively little impact on these topologies. However, it is the issue of network dynamics to which we turn next.

3.3. Delay performance as a function of sampling

Fig. 3 shows the evolution of the average arrival delay observed in the network, as an increasing number of airports are included in the analysis. Airports are added, as in Fig. 2, in decreasing order of the number of connections. The dashed blue lines represent the fraction of flights included (using the right-hand axis). Black solid lines correspond to the average delay calculated by discarding negative values, as is the industry norm. The green solid lines show the results when negative values are included, purely to demonstrate the full variability of arrival times. In both cases, a similar behaviour can be observed: after an initial transient, the average delay reaches a local minimum in the middle of the graph, subsequently slightly increasing as more airports are considered. Sampling too few airports (even the ten largest, for Europe) generally leads to a significant overestimation of delay. EUROCONTROL reporting on European performance covers the top 30 airports,2 whilst standard, comparative reporting on performance in Europe and the US includes the top 34 airports10 in each region. From Fig. 3, considering 34 airports corresponds to an estimation error of 2.2% for Europe, and 1.5% for the US. Whilst these measures are thus quite robust, caution might be advised on changes of the order of one percentage point for such values.

Fig. 3 Evolution of average delay as a function of the number of airports.
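The sampling effect described for Fig. 3 can be illustrated with a short sketch. This is not the procedure used to produce the figure: the function names and the four airports are hypothetical, and each flight is assumed to be attributed to its arrival airport only. Airports are added in decreasing order of connections, the flight-weighted mean delay is recomputed at each step, and the estimation error is taken as the relative difference from the full-network mean.

```python
def cumulative_mean_delay(airports):
    """Mean delay over the top-k airports, for k = 1..len(airports).

    `airports` is a list of (connections, flights, total_delay_min)
    tuples, one per airport (a hypothetical input layout). Airports are
    added in decreasing order of their number of connections.
    """
    ranked = sorted(airports, key=lambda a: a[0], reverse=True)
    means, flights, delay = [], 0, 0.0
    for _, n_flights, total_delay in ranked:
        flights += n_flights
        delay += total_delay
        means.append(delay / flights)
    return means


def estimation_error(means, k):
    """Relative error of the top-k estimate against the full-network mean."""
    full = means[-1]
    return abs(means[k - 1] - full) / full


# Hypothetical values, chosen only to reproduce the qualitative shape
# described in the text (not data from the paper).
airports = [
    (200, 1000, 15000.0),  # largest airport: 15.0 min per flight
    (150, 800, 8000.0),
    (90, 500, 4500.0),
    (40, 200, 2600.0),     # smallest airport: 13.0 min per flight
]

means = cumulative_mean_delay(airports)
for k, mean in enumerate(means, start=1):
    print(k, round(mean, 2), f"{estimation_error(means, k):.1%}")
```

In this toy case the estimate starts high, passes through a minimum when three airports are included, and rises again when the smallest airport is added, so a larger sample does not move monotonically towards the full-network value.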

The average US arrival delay, taking into account the whole network, is 11.4 min, as compared with the value of 13.4 min in Table 3. The lower average taken from Fig. 3 is largely driven by the inclusion of all delay values, rather than as per the (US) sampling threshold of 15 min shown in Table 2. The red dotted line in Fig. 3(b) represents the evolution of reactionary delay in the US network. Using the reactionary asymptote of 4.1 min gives a reactionary ratio of 36%, in reasonably good agreement with Table 3 (approximately 42%), but demonstrating further the difference in values that may be obtained based on the sampling protocol. The reactionary delay line is fairly well correlated with the average delay (black line), corroborating reported relationships between airport size and reactionary delay, especially regarding 'back propagation' into hubs.15-18
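The dependence of these averages on the sampling protocol can be sketched as follows. The helper names and the seven delay values are hypothetical; 'discarding negative values' is assumed here to mean dropping early arrivals from the average, and the threshold protocol is assumed to retain only flights delayed by at least the threshold. The final line reproduces the reactionary ratio quoted above from the two reported values (4.1 min and 11.4 min).

```python
def mean_delay(delays_min, min_delay=None):
    """Mean of the delays that are >= min_delay (all values if None)."""
    kept = [d for d in delays_min if min_delay is None or d >= min_delay]
    if not kept:
        raise ValueError("no flights meet the threshold")
    return sum(kept) / len(kept)


def reactionary_ratio(reactionary_min, total_min):
    """Share of the average delay that is reactionary."""
    return reactionary_min / total_min


# Hypothetical arrival delays in minutes (negative = early arrival).
delays = [-8, -3, 0, 4, 12, 20, 45]

print(mean_delay(delays))       # all values, early arrivals included
print(mean_delay(delays, 0))    # negative values discarded
print(mean_delay(delays, 15))   # 15-min threshold applied

# Values quoted in the text for the US network (Fig. 3(b)).
print(round(reactionary_ratio(4.1, 11.4), 2))
```

The same set of flights thus yields three different averages, which is why values drawn from sources with different thresholds should be compared with some care.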

The European average delay from Fig. 3(a) is 12.7 min, compared with the departure delay value of 9.7 min in Table 3 (with an arrival delay estimate slightly higher, at 10.1 min). The European data for Fig. 3 only refer to intra-European flights (c.f. all flights to/from/within the ECAC region, for the value in Table 3), although it is not clear what effect we might expect this to have on the average. The 10.1 min estimate is based on a (comparatively high resolution) 5-min delay threshold, so this is probably not driving a very great difference between the 10.1 and 12.7 min averages. As an opposing effect, the data for Fig. 3 refer to last-filed flight plans, which should tend to underestimate the airline-reported delay relative to schedule. On balance, it is likely that most of the observed difference is attributable to airline coverage (69% for the Table 3 value; c.f. practically full coverage for the ALL-FT+ data).

Fig. 4 presents an analogous analysis to Fig. 3, this time considering a sampling of airlines (same colour coding applied). Here again, airlines are included in decreasing order of the number of connections that they offer. Europe and the US seem to behave in quite different ways: increasing the number of airlines (by number of flights) decreases the average observed delay in the former, but increases it in the latter. Nevertheless, one should note that the RITA dataset includes only a fraction of the total number of airlines operating in the US. It is quite possible that if more airlines were included, the observed delay would decrease - in a behaviour similar to Fig. 3.

Fig. 5 presents an analysis of the evolution of the average delay, as a function of the number of airports considered, and by day. (The coloured sidebars to the right of each plot show the decile contours of average delay, in minutes, as a complement to the vertical axis.) In the case of the European system, it can be seen that the behaviour observed in Fig. 3(a) is always present, i.e. the maximum delay is observed when considering the three to seven largest airports. On the other hand, the US system presents a different behaviour: the initial peak is observed only on the two days with most delays, demonstrating that the observations of Fig. 3(b) are the result of the aggregation of rather different diurnal patterns.

Airport delay multipliers, i.e. average airport departure delay divided by average airport arrival delay, have been studied by several researchers (see Cook et al.18 and Hao and Hansen19 for reviews). Such metrics afford insights into the role key nodes play in (reactionary) delay propagation in networks. The value of research identifying delay-multiplier airports and the role that schedule buffer and turnaround times play in delay propagation has also been discussed,20 in a joint analytical-statistical approach. Here, an analytical model is used to calculate propagated delay using US on-time performance data for 2007. The optimal timing of buffers during the day and varying airline strategies regarding buffer application are discussed.

Research studying the temporal evolution of the European air transport system,21 using two network layers (the air navigation route network and the airport network) has shown that the air navigation route network is dominated by summer/winter seasonal variations, whilst the airport network also shows such seasonal variations in addition to peak/off-peak weekly patterns. In both network layers, hub airports are identified as potential delay multipliers.

Fig. 4 Evolution of average delay as a function of the number of airlines.

(a) Europe (b) US

Fig. 5 Temporal evolution of average delay as a function of airports included.

Fig. 6 Delay multipliers as a function of airports included.

Fig. 6 explores such delay multipliers as a function of the size of the network sampled. The delay multiplier plots build up the network sequentially. For example, the bars labelled '25' show the distribution of the multipliers for the top 25 airports, considering only flights between these airports. The black lines thus show the results as if produced by a network of 25 airports and the flights between them. For the top 50, the first 25 are of course the same as the top-25 network. However, the corresponding delay multipliers may vary, as they are now calculated considering all flights connecting the 50 largest airports. As the sampling fraction is increased, it is clear that more extreme delay multiplier airports appear, as would be expected. The inclusion of smaller airports has an important effect. It has been reported18 that for some smaller European airports, arrival delay is doubled (or even tripled) into reactionary delay. This is likely due to reduced delay recovery potential at such airports, for example through: fewer flexible or expedited turnarounds; fewer spare crew and aircraft resources; and, whether a given airport has sufficient connectivity and capacity to reaccommodate disrupted passengers.
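The sequential build-up used for Fig. 6 can be sketched as below, using the definition given earlier (average airport departure delay divided by average airport arrival delay) and retaining only flights between the sampled airports. The function name, the input layout and the six flights are hypothetical; airports without a positive average arrival delay are simply omitted, which is a choice of this sketch rather than of the original analysis.

```python
from collections import defaultdict


def delay_multipliers(flights, airports):
    """Departure/arrival delay ratio per airport, using only flights whose
    origin and destination are both in `airports`.

    `flights` is an iterable of (origin, destination, dep_delay_min,
    arr_delay_min) tuples (a hypothetical input layout).
    """
    keep = set(airports)
    dep = defaultdict(list)  # departure delays, recorded at the origin
    arr = defaultdict(list)  # arrival delays, recorded at the destination
    for origin, dest, dep_delay, arr_delay in flights:
        if origin in keep and dest in keep:
            dep[origin].append(dep_delay)
            arr[dest].append(arr_delay)

    multipliers = {}
    for airport in keep:
        if dep[airport] and arr[airport]:
            mean_dep = sum(dep[airport]) / len(dep[airport])
            mean_arr = sum(arr[airport]) / len(arr[airport])
            if mean_arr > 0:  # ratio undefined otherwise; airport omitted
                multipliers[airport] = mean_dep / mean_arr
    return multipliers


# Hypothetical flights between three airports ranked A > B > C by size.
flights = [
    ("A", "B", 10, 8), ("B", "A", 6, 5),
    ("A", "C", 4, 6), ("C", "A", 30, 15),
    ("B", "C", 5, 5), ("C", "B", 20, 12),
]

top2 = delay_multipliers(flights, ["A", "B"])
top3 = delay_multipliers(flights, ["A", "B", "C"])
print(top2)
print(top3)
```

In this toy case the multiplier for airport A changes once flights to and from C are included, and the smallest airport contributes the most extreme value, mirroring the two effects described for Fig. 6.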

3.4. Passenger context

These discussions have focused so far on flight delay. However, it has been established in the literature that passenger delay and flight delay are not the same. Using large data sets for passenger bookings and flight operations from a major US airline, it was shown22 that passenger-centric metrics are superior to flight-based metrics for assessing passenger delays, primarily because the latter do not take account of replanned itineraries of passengers disrupted due to flight-leg cancellations and missed connections. For August 2000, the average passenger delay (across all passengers) was estimated to be 1.7 times greater than the average flight-leg delay. Based on a model using 2005 US data for flights, it was concluded that "flight delay data is a poor proxy for measuring passenger trip delays".23 For passengers (on single-segment routes) and flights, delayed alike by more than 15 min, the ratio of the separate delay metrics was estimated at 1.6. In the first full European network simulation model with explicit passenger itineraries, the busiest 199 ECAC airports in 2010 were modelled, in addition to the major flows with the rest of the world.18 Approximately 30,000 flights and 2.5 million passengers, distributed amongst 150,000 distinct passenger routings, were modelled under various scenarios. The ratio of arrival-delayed passenger minutes over arrival-delayed flight minutes (both pertaining to delays of greater than 15 min) ranged between 1.3 and 1.9, under the various scenarios, thus in good agreement with the US values cited. A topological analysis24 based on 2007 schedule data investigated the connectivity of airport networks in the same three regions as the current paper, whereby a time-dependent, minimum-path approach is employed to estimate the minimum travel time for passengers between each pair of airports in the three networks, inclusive of flight and connection times.
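The distinction between the two metrics can be shown with a minimal sketch. The exact definitions differ between the studies cited, so the ratio is assumed here to be the mean delay of passengers delayed by more than 15 min divided by the mean delay of flights delayed by more than 15 min. The function names and all values are hypothetical and chosen purely for illustration.

```python
def mean_over_threshold(delays_min, threshold=15):
    """Mean of the delays strictly greater than `threshold` minutes."""
    late = [d for d in delays_min if d > threshold]
    return sum(late) / len(late) if late else 0.0


def passenger_to_flight_ratio(passenger_delays, flight_delays, threshold=15):
    """Mean delayed-passenger delay over mean delayed-flight delay."""
    flight_mean = mean_over_threshold(flight_delays, threshold)
    if flight_mean == 0:
        raise ValueError("no flights delayed beyond the threshold")
    return mean_over_threshold(passenger_delays, threshold) / flight_mean


# Three hypothetical flights, arriving 5, 20 and 40 min late.
flight_delays = [5, 20, 40]

# Delay of each passenger at the final destination. Twenty passengers on
# the second flight miss a connection and are rebooked, arriving 180 min
# late, which the flight-centric metric does not capture.
passenger_delays = [5] * 100 + [20] * 80 + [180] * 20 + [40] * 50

print(passenger_to_flight_ratio(passenger_delays, flight_delays))
```

A small number of missed connections is sufficient to push the ratio well above one, because the rebooked passengers accrue delay that is not recorded against any flight.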

Furthermore, several works have demonstrated that passenger delay effects are not apparent when considering flight-centric metrics alone18,25,26 and several proposals18,23,27-29 have been put forward for dedicated passenger metrics. It is concluded that passenger-centric metrics are required in comprehensive ATM stakeholder assessment frameworks, a theme that is developed in the following section.

4. Discussion and outlook

In this paper we have presented the first such comparison of ATM performance across Europe, the US and China. In this section, we discuss further the context of international harmonisation and identify some challenges ahead regarding corresponding metrics and data management. A key actor in this domain is the International Civil Aviation Organization (ICAO). It has contributed significantly to ATM system performance measurement and its international harmonisation. In its manual30 on global performance of the air navigation system, ICAO identifies eleven key performance areas (KPAs) - safety; security; environmental impact; cost effectiveness; capacity; flight efficiency; flexibility; predictability; access and equity; participation and collaboration; interoperability. Harmonised key performance indicators have been developed according to a Memorandum of Cooperation signed between the US and the European Union,10 with ATFM delay proving to be a leading performance indicator. European and US analyses presented in this paper have been coordinated with ICAO and have also been reflected in reporting by Airservices Australia. There are several governmental mandates regarding data provision and the reporting of performance metrics in Europe10,12 and the US.10,31 Indeed, service provider compliance with SES performance scheme targets is legally binding in Europe. The Civil Aviation Administration of China plays a key role in data integration within China.

Table 6 summarises high-level targets defined by the ATM improvement programmes implemented within each of the three regions. Whilst these are broadly comparable, some are more ambitious than others: for example, the 3-fold increase in capacity in China mirrors the European target, but is set relative to a 2015 baseline and in the challenging context of the highest current contributions to flight delay by ATFM (as seen in Table 3). Within SESAR, "performance ambitions" are categorised according to the KPAs of safety, environment, capacity, cost efficiency (definition adopted for consistency with SES regulations, c.f. ICAO's cost effectiveness), operational efficiency (expressed as measurements of delay and fuel savings, to be useable by the SES performance scheme under the capacity and environment KPAs, respectively) and security. These are aligned with the SES "high-level goals", first formulated in 2005. Other KPAs (not shown) are incorporated into these programmes. Predictability is an important example of a complementary metric. Both SESAR and NextGen aim to improve the predictability of flight arrivals. As a key outcome of the SESAR Target Concept, 70% of flights (in alignment with European-US comparative analyses10) in Europe are targeted to arrive at the gate within a 2-min time window, by 2035. Improving flight predictability by reducing variances in flying times between core airports is a target set by 2018, by the FAA.33

Table 6 High-level targets defined within ATM programmes.

| Region | Europe | US | China |
| --- | --- | --- | --- |
| Programme | SESAR32 | NextGen33 | ATMB Strategic Development Programme^a |
| Programme target | 2035^a | 2025^b | 2030 |
| Baseline year for comparison of relative changes | 2005 | 2009 | 2015 |
| (ICAO) KPA | | | |
| Safety | Improve safety 10-fold | Commercial carrier fatalities < 6.2 per 100 million pax | Reduce ATC-attributable accident rate by 20% by flight volume |
| Capacity | Increase capacity 3-fold | 12% increase in core airports throughput | Increase capacity 3-fold |
| Efficiency | 1-3 min. reduction in average delay^c; En-route ATFM average delay 0.5 min^d | Reduce delays by 27% | Average ATC-attributable delay < 5 min |
| Environment | 10% reduction in impact of flights on environment | Reduce fuel burned per miles flown by > 2% annually | Reduce CO2 by 10% (kg/km) |

a Personal communication.

b Selected targets shown relate to intermediate target year 2018. Delay reduction allocated to efficiency KPA by authors for ease of comparison.

c Declared within SES performance scheme within capacity KPA; target relative to 2012.

d Corresponding target set within SES performance scheme for 2015-2019.

Social and political priorities in Europe are shifting in favour of the passenger, as evidenced by high-level position documents such as 'Flightpath 2050'34 and the European Commission's 2011 White Paper.35 SESAR's 'Performance Target'36 refers frequently to the concept of society and the passenger. The 'societal outcome' cluster of key performance areas is defined as being of "high visibility", since the effects are of a political nature and are even visible to those who do not use the air transport system. Turning to the US, the FAA published a new strategic plan in 2011, 'Destination 2025',33 streamlining strategic goals. Also mindful of the passenger, these include goals that will "serve the needs of the traveling public and the aviation industry to provide unencumbered access to the aviation system" and "enhance aviation's value to the public by improving travel throughout the National Airspace System, and beyond".33

Notwithstanding the importance of differentiated passenger metrics for assessing ultimate stakeholder delivery, as discussed in Section 3.4, neither Europe, the US nor China has performance metrics oriented specifically to the passenger. The importance of understanding reactionary delay effects is also clear from the high proportion of the total delay that these comprise in Europe and the US. Nevertheless, as was observed in Table 3, this metric is currently not (comparably) reported in China, and only Europe36 has a specific metric relating to reactionary delay in its ATM performance programme. Much constructive work has been undertaken within these regions, and in comparative studies between Europe and the US, but key performance metrics are evidently missing if progress is to be made towards better measurement of delivery to the passenger, and better understanding of propagation effects and delay multiplier nodes in the networks.

There is also further scope for standardisation of such metrics, and opportunity to further assess these in the context of exogenous variables (such as military activity) and varying baseline (market) conditions. The trade-offs between the ICAO KPAs have long since been recognised, in that performance improvements for one (e.g. flexibility) will inevitably come at a price to be paid for another (e.g. predictability).

Limited research has been carried out in this area, but far more lies ahead, particularly with regard to quantifying non-linearities across these relationships and understanding significant challenges posed by conflicts between stakeholder delivery (e.g. passenger punctuality), and regulatory/market-forces effects (e.g. airlines cancelling flights to mitigate passenger delay compensation required by regulation). Many of these metrics may be reasonably well monetised, such as the cost of delay and the cost of capacity, whereas the inclusion of largely non-monetised metrics (e.g. emissions impacts and ATM system resilience) poses further problems. Undertaking such analyses cross-sectionally is difficult enough, but looking forward to the 2025, 2030, 2035 and 2050 horizons cited above, the extent to which these objectives converge or diverge is distinctly unclear.

Metrics need to be intelligible (preferably fairly simple), sensitive (accurately reflecting the aspect of performance being measured) and consistent (they cannot be continually refined without losing comparability). These desirable qualities present yet another challenge. For example, designing metrics that suitably take exogenous variables and baseline conditions into account not only often renders them less simple to explain, but also further drives the requirement to continually review them to maintain appropriate sensitivity.

It is clear that progress in performance assessment will not be driven by mandate alone, but that such advances will also be data-driven. Such data may be provided through governmental or private enterprise, but both the diversity and volume of such data are increasing. We have sought to demonstrate in Section 3, for topological and performance assessment alike, that the notion of a 'sufficient' sample is often non-intuitive, and that many metrics may behave non-monotonically as a function of sampling fraction. This is particularly true for relatively smaller samples, with which analysts often have to work: due to limited accessibility (e.g. to airline or airport data) or purposive sampling - both in turn often determined by cost. The analyst not only has to assess different values obtained for the same metric from various data sources, but also the robustness of changes in metrics relative to the estimation errors of the sample. Furthermore, it is very rare to see statistical significance testing carried out on changes between reporting periods, or on differences between regions.
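As one possible way of testing whether a change between two reporting periods is distinguishable from sampling noise, a two-sided permutation test on the difference in mean delay is sketched below. This is an illustrative choice of test, not a method used in this paper; the function name, the default number of resamples and the delay values are all hypothetical.

```python
import random


def permutation_test(period_a, period_b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in mean delay.

    Returns an estimated p-value: the share of random relabellings of the
    pooled observations whose absolute difference in means is at least as
    large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    n_a, n_b = len(period_a), len(period_b)
    observed = abs(sum(period_a) / n_a - sum(period_b) / n_b)

    pooled = list(period_a) + list(period_b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical daily average delays (minutes) for two reporting periods.
period_1 = [10.0] * 20
period_2 = [30.0] * 20

p_same = permutation_test(period_1, period_1)
p_diff = permutation_test(period_1, period_2)
print(p_same, p_diff)
```

A test of this kind makes no distributional assumption about delays, which are typically strongly skewed, although it does assume that the observations are exchangeable, which may not hold for consecutive days of operation.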

Data accessibility currently decreases from the US, to Europe, through to China. The situation in Europe is improving with some momentum. There remain several opportunities to apply some of the analytical techniques that have already brought useful insights on either side of the Atlantic, in a further developed context of mutual data sharing with colleagues in China. Greater challenges await in all three regions with regard to the advance of big data. With growing volume, this applies particularly to open architectures (in the context of an increasing diversity of data formats and demands from client interfaces) and data integrity. There may be a trend towards increasing dynamic metrics and data consumption, if the cost of warehousing does not decrease sufficiently in the near term.

For performance assessment, keeping one eye on international standardisation, not least through collaborative effort with ICAO, and another on avoiding a 'one size fits all' approach, is key to future success if we are to continue to foster a learning environment across nations' experiences and solutions - common and diverse.

Acknowledgements

The authors are indebted to Dr. Xiaoqian Sun (Associate Professor, School of Electronic and Information Engineering, Beihang University, Beijing) for kindly offering invaluable insights and support during the production of this paper, particularly with regard to ATM operations in China. The authors are also grateful for helpful comments received from two anonymous reviewers.

References

1. Standard inputs for EUROCONTROL cost-benefit analyses. Ed. 7. Brussels (Belgium): EUROCONTROL; 2015.

2. Performance Review Commission. Performance review report 2014 - An assessment of air traffic management in Europe during the calendar year 2014. Brussels (Belgium): EUROCONTROL, Performance Review Commission; 2015.

3. Tanner G. The principles of flight planning and ATM messaging. In: Cook A, editor. European air traffic management - principles, practice and research. Aldershot: Ashgate Publishing Ltd; 2008. p. 35-63.

4. Sjögren S, Söderberg M. Productivity of airline carriers and its relation to deregulation, privatisation and membership in strategic alliances. Transport Res Part E 2011;47(47):228-37.

5. Cao Q, Lv J, Zhang J. Productivity efficiency analysis of the airlines in China after deregulation. J Air Transport Manage 2015;42:135-40.

6. Chow CKW, Fung MKY. Measuring the effects of China's airline mergers on the productivity of state-owned carriers. J Air Transport Manage 2012;25:1-4.

7. Zhang Y, Round DK. The effects of China's airline mergers on prices. J Air Transport Manage 2009;15(6):315-23.

8. Zhang Y, Peng T, Fu C, Cheng S. Simulation analysis of factors affecting air route connection in China. J Air Transport Manage 2016;50:12-20.

9. Civil Aviation Administration of China. Statistical bulletin of civil aviation industry development in 2014. Beijing (China): Civil Aviation Administration of China; 2016.

10. EUROCONTROL, Federal Aviation Administration. Comparison of air traffic management-related operational performance: U.S./ Europe - 2013. Brussels, Belgium and Washington DC, USA: EUROCONTROL and Federal Aviation Administration; 2014.

11. International Civil Aviation Organization. The fourth meeting of ICAO Asia/Pacific air traffic flow management steering group (ATFM/SG/4) - Current CDM/ATFM status in China. Bangkok (Thailand): International Civil Aviation Organization; 2014.

12. CODA digest, all-causes delay and cancellations to air transport in Europe - 2014. Brussels (Belgium): EUROCONTROL, Central Office for Delay Analysis; 2015.

13. Belkoura S, Cook A, Peña JM, Zanin M. On the multi-dimensionality and sampling of air transport networks. Transport Res Part E 2016;94:95-109.

14. Thedchanamoorthy G, Piraveenan M, Kasthuriratna D, Senanayake U. Node assortativity in complex networks: An alternative approach. Procedia Comput Sci 2014;29:2449-61.

15. Pyrgiotis N, Malone KM, Odoni A. Modelling delay propagation within an airport network. Transport Res Part C 2013;27(2):60-75.

16. Jetzki M. The propagation of air transport delays in Europe [Dissertation]. North Rhine-Westphalia: RWTH Aachen University; 2009.

17. Pyrgiotis N. A public policy model of delays in a large network of major airports. J Transport Res Board 2011;2206(2206):69-83.

18. Cook A, Tanner G, Cristobal S, Zanin M. Delay propagation - new metrics, new insights. 11th USA/Europe air traffic management research and development seminar; 2015 June 23-26; Lisbon, Portugal; 2015.

19. Hao L, Hansen M. How airlines set scheduled block times. 10th USA/Europe air traffic management research and development seminar; 2013 June 10-13; Chicago IL, USA; 2013.

20. Kafle N, Zou B. Modeling delay propagation: a joint analytical-statistical approach. Proceedings of the 18th air transport research society world conference; 2014 July 17-20; Bordeaux, France. Red Hook, NY: Curran Associates; 2014.

21. Sun X, Wandelt S, Linke F. Temporal evolution analysis of the European air transportation system: Air navigation route network and airport network. Transportmetrica B 2015;3(2):153-68.

22. Bratu S, Barnhart C. An analysis of passenger delays using flight operations and passenger booking data. Pittsburgh PA, USA: Sloan Industry Studies; 2004. Working Paper: WP-2004-20.

23. Sherry L, Wang D, Xu N, Larson M. Statistical comparison of passenger trip delay and flight delay metrics. Transportation research board 87th annual meeting; 2008 January 13-17; Washington D.C., USA; 2008.

24. Paleari S, Redondi R, Paolo MP. A comparative study of airport connectivity in China, Europe and US: which network provides the best service to passengers? Transport Res Part E 2010;46(2):198-210.

25. Wang D. Methods for analysis of passenger trip performance in a complex networked transportation system [Dissertation]. Fairfax VA: George Mason University; 2007.

26. Calderon-Meza G, Sherry L, Donohue G. Passenger trip delays in the U.S. airline transportation system in 2007. 3rd international conference on research in air transportation; 2008 Jun 1-4; Fairfax VA, USA; 2008.

27. Sherry L, Wang D, Donohue G. Air travel consumer protection: a metric for passenger on-time performance. Transportation research board 86th annual meeting; 2007 January 21-25; Washington D.C., USA; 2007.

28. Ball M, Barnhart C, Dresner M, Hansen M, Neels K, et al. Total delay impact study: A comprehensive assessment of the costs and impacts of flight delay in the United States. Maryland, USA: University of Maryland; 2010. Final Report: 01219967.

29. Sherry L. Passenger trip delays statistics for 2010. Transportation research board 91st annual meeting; 2012 January 22-26; Washington D.C., USA; 2012.

30. International Civil Aviation Organization. Manual on global performance of the air navigation system (Doc 9883). Ed. 1. Montreal (Canada): International Civil Aviation Organization; 2009.

31. Federal Aviation Administration. Report on NextGen performance metrics, pursuant to FAA Modernization and Reform Act of 2012. Washington D.C., USA: Federal Aviation Administration; 2013.

32. SESAR Joint Undertaking. European ATM master plan, Edition 2015. Brussels (Belgium): SESAR Joint Undertaking; 2015.

33. Federal Aviation Administration. Destination 2025. Washington DC (USA): Federal Aviation Administration; 2011.

34. European Commission. Flightpath 2050 - Europe's vision for aviation (report of the high level group on aviation research). Brussels (Belgium): European Commission; 2011.

35. European Commission. White paper: Roadmap to a single European transport area - towards a competitive and resource efficient transport system. Brussels (Belgium): European Commission; 2011.

36. SESAR Consortium. SESAR definition phase: Milestone deliverable 2, air transport framework - the performance target. Brussels (Belgium): SESAR Consortium; 2006.