Procedia Manufacturing 3 (2015) 2482 - 2488

6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015

Air traffic controller trust in automation in NextGen

Tannaz Mirchi*, Kim-Phuong Vu, James Miles, Lindsay Sturre, Sam Curtis, Thomas Z. Strybel

Center for Human Factors in Advanced Aeronautics Technologies (CHAAT), Department of Psychology, California State University, Long Beach, 1250 Bellflower Blvd., Long Beach, CA 90815, U.S.

Abstract

NextGen introduces new automated tools to help air traffic controllers (ATCos) manage the projected increase in air traffic over the next decades. Assuring that ATCos have an appropriate level of trust in automation is critical to establishing the proper use of these automated technologies. The current study examined differences between subjective trust scales used to examine air traffic controllers' trust in automation and the relationship of these trust metrics to ATCo behaviors. The study was carried out at the Center for Human Factors in Advanced Aeronautics Technologies (CHAAT) over the course of a 16-week internship involving twelve student ATCos. Results indicated that the Modified Human-Automation Trust Scale (M-HAT) was sensitive to changes in trust levels over the course of the internship. Student ATCos scoring high on the trust scale showed reduced situation awareness (SA) during high traffic density scenarios compared to students who scored low on the trust scale, possibly due to automation-induced complacency.

© 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license

(http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of AHFE Conference

Keywords: Trust in automation; NextGen; Air traffic control

1. Introduction

The Next Generation Air Transportation System (NextGen) is a new automated system expected to accommodate the projected increase in air traffic in the National Airspace System (NAS) [1]. NextGen should create a more efficient, safe, cost effective, and environmentally friendly approach to meet future aviation demands. To achieve these benefits, air traffic controllers and pilots will need new automation tools to identify and mitigate potential air-traffic conflicts. New features such as automated conflict detection and resolution will enable air traffic controllers (ATCos) to resolve potential aircraft conflicts faster. Data Comm will reduce the number of controller-pilot verbal communications. These tools should increase safety through more accurate transfer of information and a reduction in

* Corresponding author. E-mail address: tmirchi@hotmail.com

2351-9789 © 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license

(http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of AHFE Conference

doi: 10.1016/j.promfg.2015.07.509

operator workload [2]. NextGen tools promise to provide many benefits, but these will be realized only if the tools are being used properly. One important factor determining proper usage of NextGen technologies is trust in automation. The goal of the present study was to determine proper measures of trust in automation, effective training methods for ensuring proper trust in automation, and the effects of trust on ATCo performance and situation awareness (SA).

1.1. Measuring trust in automation

Human-machine trust is the dynamic expectation of reliability in an automation tool, undergoing predictable changes as a result of experience with the tool [3]. For optimal use of automation, the operator must be aware of and understand several characteristics of the automation tool, such as its current reliability, false alarm rate, and miss rate. In other words, the operator's trust must be properly calibrated to the characteristics of the automation. Calibration is the correspondence between an operator's perceived reliability of an automated system and its actual reliability [4]. When the operator's trust is not appropriately calibrated, negative consequences may result. Overtrust is defined as human trust exceeding the automation's capabilities, leading to over-reliance on the automation. Overtrust can produce a loss of SA and out-of-the-loop phenomena [4]. Distrust occurs when operators underestimate the reliability of the automation and fail to rely on it fully. Distrust prevents the realization of the full benefits of the automation tool [4].
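The calibration taxonomy above (overtrust, calibration, distrust) can be made concrete with a small decision rule. The sketch below is our own illustration, not part of the study: the function name and the tolerance band are assumptions, and reliability is treated as a simple proportion.

```python
def trust_state(perceived, actual, tolerance=0.05):
    """Classify trust calibration by comparing an operator's perceived
    reliability of an automated system against its actual reliability.
    Both values are proportions in [0, 1]; the tolerance band is an
    assumed calibration margin, not a value from the paper."""
    if perceived > actual + tolerance:
        return "overtrust"   # over-reliance; risk of loss of SA
    if perceived < actual - tolerance:
        return "distrust"    # under-reliance; benefits go unrealized
    return "calibrated"

# An operator who believes a 90%-reliable tool is 99% reliable is overtrusting.
print(trust_state(perceived=0.99, actual=0.90))  # overtrust
```

In practice, perceived reliability would come from a subjective trust rating and actual reliability from the automation's logged hit and false alarm rates.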

Valid measures of trust are essential in order to properly calibrate an operator's trust in automation. There are multiple methods for measuring trust in automation reported in the literature, most of which are subjective measures of trust. The current study used the following subjective measures of trust: 1) the Automation-Induced Complacency-Potential Rating Scale (CPRS) [5], 2) the Complacency Potential Rating Scale (CPRS) [6], and 3) the Modified Human-Automation Trust Scale (M-HAT) [7]. Singh, Molloy, and Parasuraman (1993) developed the Automation-Induced Complacency-Potential Rating Scale for assessing general trust in automation [5]. In the current study, we used a version of the original CPRS with twelve items on a 5-point Likert scale [8]. Verma, Kozon, Ballinger, Lozito, and Subramanian (2011) modified the Automation-Induced Complacency-Potential Rating Scale in order to make it specific to trust in air traffic management automation [6]. This scale has 36 items on a 5-point Likert scale. The Human-Automation Trust Scale (HAT) is an empirically based tool that measures human trust in automated systems [9]. The scale consists of twelve items, using a 7-point Likert scale. Kunii (2006) developed a modified version of the HAT to address negative attitudes towards the original wording of the questions during pilot testing [7]. Some of the wording was changed without altering the meaning of the questions to create the Modified Human-Automation Trust Scale, also containing twelve items [7]. For the remainder of the paper, the Automation-Induced Complacency-Potential Rating Scale [5] will be referred to as the CPRS, the Complacency Potential Rating Scale [6] as the ATM-CPRS, and the Modified Human-Automation Trust Scale [7] as the M-HAT. For all three scales, higher scores reflect higher levels of trust and lower scores reflect lower levels of trust. Additionally, each of these scales is based on four factors related to trust in automation: trust, safety or security, confidence, and reliance.
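All three instruments are Likert scales, so scoring reduces to averaging over items, with higher means reflecting more trust. The sketch below illustrates this with invented responses (the papers cited above do not publish raw item data); real administrations typically also reverse-score negatively worded items, which is omitted here for brevity.

```python
def trust_score(responses, scale_max):
    """Mean of Likert item responses; higher scores reflect more trust.
    `responses` holds one rating per item, each between 1 and scale_max."""
    assert all(1 <= r <= scale_max for r in responses)
    return sum(responses) / len(responses)

# Hypothetical 12-item M-HAT administration on its 7-point scale.
mhat_items = [4, 5, 5, 4, 3, 5, 6, 4, 5, 4, 5, 4]
print(trust_score(mhat_items, scale_max=7))  # 4.5
```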

1.2. Training trust in automation

Although many factors influence trust [10], effective training methods for trust in automation have received little attention in the literature. Training operators to properly calibrate their level of trust in automation could reduce the occurrence of misuse and disuse [4]. Training for trust in automation should focus on the way the automation works, the principles behind the design of the automation technology, and the conditions determining the effectiveness of the automation [11]. Operators should be specially trained to deal appropriately with the conflicting demands that automation presents, such as passive monitoring versus active control [12]. A better understanding of these principles will equip future designers and supervisors to provide valuable training methods for automation.

Higham, Vu, Miles, Strybel, and Battiste (2013) examined training trust in automation for air traffic management [13]. A group of fifteen student ATCos participating in a 16-week internship were divided into two training classes, Trust Training or No-Trust Training. The Trust Training group was provided with verbal feedback from the instructor if they moved an equipped aircraft that came close to losing separation but did not. The feedback was provided in order to further develop student knowledge of the dependability of the NextGen conflict detection tool.

The No-Trust Training group received no feedback. Higham et al. (2013) showed that trust training may be feasible because the Trust Training group was less likely to move the near-miss aircraft by the final exam [13]. Additionally, the Trust Training group reported lower workload on scenarios containing all NextGen-equipped aircraft. However, this experiment was limited because only one near-miss aircraft pair was provided in test scenarios.

1.3. Present study

The present study further elaborated on the findings of Higham et al. (2013) by providing more opportunities for ATCos to show trust in NextGen automation tools [13]. We also examined differences in sensitivity and validity for three subjective trust scales, the CPRS [5], ATM-CPRS [6], and M-HAT [7], each administered over the course of the 16-week internship. We also investigated the effects of controllers' trust levels on situation awareness (SA), which is defined as the operator's understanding of the dynamic situation. Previous research by Endsley (1995) showed that monitoring automation led to complacency and a loss of SA, suggesting that operators became more complacent and over-reliant on the automated tools [14]. Automation complacency has been related to putting a high level of trust in an automated system [15].

2. Method

2.1. Participants

Twelve students (one female and 11 males) from Mount San Antonio College (an FAA CTI Institution) participated in the study as part of a 16-week radar simulation internship at the Center for Human Factors in Advanced Aeronautics Technologies (CHAAT). The average age of the students was 23.5 years, and they were compensated $10.00 per hour for their participation in the study ($120 for full participation).

2.2. Apparatus/materials

Testing and training simulations used the Multi Aircraft Control System (MACS). MACS is a medium-fidelity software tool for human-in-the-loop air-traffic simulations [16]. The scenarios portrayed Indianapolis Center (ZID 91) traffic that included departures, arrivals, and overflights. During lab sessions, students switched between serving as pseudopilots and air traffic controllers. For the experiment, confederate researchers served as pseudopilots. Two test scenarios consisting of low-density traffic and two test scenarios consisting of high-density traffic were used. For all scenarios, an equal mix of NextGen-equipped aircraft and current-day unequipped aircraft was included.

For equipped aircraft, students had several NextGen tools for managing traffic, including Data Comm, conflict detection, and conflict probing. Data Comm enabled digital handoffs, frequency changes, and clearances. The conflict detection tool alerted the ATCo of any loss of separation (LOS) between equipped aircraft pairs that would occur in the next eight minutes by flashing the aircraft in red and showing the number of minutes to LOS next to the call sign. Note that this tool was perfectly reliable for equipped aircraft. The conflict probing tool allowed the ATCo to plan a new conflict-free path for an aircraft by shading a conflict area in blue. Throughout the internship, the instructor reminded the students of the reliability and consistency of the conflict detection tool in detecting potential conflicts between equipped aircraft. Therefore, moving any non-alerting NextGen-equipped aircraft for potential traffic conflicts suggested mistrust in the automated tools.
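The alerting behavior described above, warning when an equipped pair will lose lateral separation within the next eight minutes, can be sketched with straight-line extrapolation. This is our own simplified reconstruction, not MACS code: positions are in nautical miles, velocities in nm per minute, and the vertical criterion is ignored for brevity.

```python
import math

def minutes_to_los(p1, v1, p2, v2, sep_nm=5.0, horizon_min=8.0):
    """Return minutes until the pair first comes within `sep_nm`
    laterally under straight-line extrapolation, or None if no
    loss of separation occurs within `horizon_min` minutes."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position (nm)
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity (nm/min)
    # Solve |r + v t| = sep_nm for the earliest t >= 0.
    a = vx * vx + vy * vy
    b = 2 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - sep_nm ** 2
    if a == 0:
        return 0.0 if c <= 0 else None      # no relative motion
    disc = b * b - 4 * a * c
    if disc < 0:
        return None                          # paths never come that close
    t = (-b - math.sqrt(disc)) / (2 * a)
    if t < 0 and c <= 0:
        t = 0.0                              # already inside separation
    return t if 0 <= t <= horizon_min else None

# Head-on pair 40 nm apart closing at 8 nm/min: 5 nm reached at (40-5)/8 min.
t = minutes_to_los((0, 0), (4, 0), (40, 0), (-4, 0))
print(t)  # 4.375
```

A real implementation would also track the 1,000 ft vertical criterion and re-evaluate continuously as trajectories change.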

2.3. Procedure

2.3.1. Training

The 16-week radar simulation internship in CHAAT took place every Saturday from 8:00 AM to 6:00 PM. The lab portion lasted 3.25 hours while the lecture portion lasted 1.5 hours. A retired, radar-certified air traffic controller taught the internship lab and class. On the first day of the internship, all participants signed informed consent forms and completed the three trust questionnaires and a demographics questionnaire. During the


first eight weeks of the internship, students were introduced to MACS and taught basic ATM techniques such as altitude, speed, vector, structure, and phraseology. Students also learned how to manage NextGen equipped and unequipped aircraft equally.

2.3.2. Experimental test

Experiment sessions occurred in Week 9 of the internship for the midterm exam and Week 16 for the final exam. Testing procedures were the same for both sessions. At the start of the testing session, participants completed the three trust questionnaires. They were then briefed for thirty minutes on the general purpose of the study, the MACS tools, scenarios, and probe questions. Following the briefing, students were taken into the testing cubicles and instructed to initiate a voice check with the pilots. Next, they participated in a ten-minute training scenario to warm up their ATCo skills and practice using a touch screen for probe questions. Lastly, the participants began the four 42-minute mixed-equipage experimental scenarios.

The Situation Present Assessment Method (SPAM) was used to measure SA [17]. Probe questions queried participants about their sector at regular intervals throughout the testing scenario. Probing began six minutes into the scenario and continued at three-minute intervals through minute 38 of the scenario. Each scenario included ten probe questions. Participants were instructed during the briefing to answer the questions as quickly and accurately as possible. The order of the scenarios and probe questions was counterbalanced across participants for both the midterm and the final exam. Immediately after each trial, participants were given a ten-minute break. At the end of the final exam, participants were debriefed on their experiences during the internship and exam sessions.

2.4. Measures

2.4.1. Subjective trust measures

Three trust questionnaires were administered to participants on the first day of the internship (Week 1), at the beginning of the midterm exam (Week 9), and at the beginning of the final exam (Week 16). The three trust questionnaires were: CPRS [5], ATM-CPRS [6], and M-HAT [7].

2.4.2. Behavioral trust measure

In addition to subjective measures, we also used a behavioral measure of trust: the number of near-miss equipped aircraft that were moved. A near miss occurred when two equipped aircraft came within 6-10 nautical miles (nm) laterally but did not lose separation (less than 5 nm laterally and 1,000 ft vertically). These aircraft pairs were not alerted by the automated conflict detection tool. A total of three near misses were included in each scenario. The number of near-miss aircraft that were moved was calculated using a Visual Basic program, which records the movements of any pre-programmed near-miss aircraft per participant and scenario. Therefore, by moving these aircraft the participant showed mistrust in the automated conflict detection tool.
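These separation bands translate directly into a classification rule. The sketch below is our own illustration of the stated definitions (the function name is an assumption, and pairs between 5 and 6 nm laterally, which the paper's bands leave unclassified, fall through to "clear"):

```python
def separation_status(lateral_nm, vertical_ft):
    """Classify an aircraft pair using the study's definitions:
    loss of separation = under 5 nm laterally AND under 1,000 ft
    vertically; near miss = 6-10 nm laterally without an LOS."""
    if lateral_nm < 5.0 and vertical_ft < 1000.0:
        return "loss of separation"
    if 6.0 <= lateral_nm <= 10.0:
        return "near miss"
    return "clear"

# Hypothetical (lateral nm, vertical ft) pairs from one scenario.
pairs = [(7.2, 0.0), (4.1, 500.0), (12.0, 2000.0)]
print([separation_status(lat, vert) for lat, vert in pairs])
```

Counting how many classified near-miss pairs a participant moved then gives the behavioral mistrust score described above.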

3. Results

3.1. Subjective trust

A 3 x 3 within-subjects analysis of variance (ANOVA) was conducted to evaluate whether there were differences in sensitivity between trust ratings measured over the course of the internship. The two independent variables in this analysis were Trust Scale (CPRS, ATM-CPRS, M-HAT) and Internship Week (1, 9, 16); the dependent variable was the trust score. For all analyses, we used a more liberal criterion of p < .10 due to the small sample size. Furthermore, if any analyses had Mauchly's Test of Sphericity violations, Huynh-Feldt corrections were used. There was a significant difference between the three trust scales administered, F(2, 22) = 28.87, p < .001, which is not surprising due to the differences in scale values. Over the 16 weeks of the internship, there was also a significant increase in the levels of trust reported by the participants, F(2, 22) = 6.86, p = .005, suggesting that trust in automation increased with training. These main effects were modified by a significant interaction between Trust Scale and Internship Week, F(2.51, 27.63) = 2.65, p = .078 (see Fig. 1).

[Fig. 1: line plot of CPRS, ATM-CPRS, and M-HAT trust ratings at Week 1, Midterm, and Final; the plot itself is not reproduced here.]

Fig. 1. Interaction between Trust Scale and Internship Week. The following trust scales were used for this analysis: Automation-Induced Complacency-Potential Rating Scale (CPRS) [5], Complacency Potential Rating Scale (ATM-CPRS) [6], and the Modified Human-Automation Trust Scale (M-HAT) [7]. The midterm exam took place at Week 9 and the final exam at Week 16.

Simple effect analyses were conducted to further break down the interaction of Trust Scale and Internship Week. The CPRS and ATM-CPRS trust ratings did not change significantly over the internship, ps > .10. However, the M-HAT showed a significant increase in trust ratings over the course of the internship, F(1.30, 14.29) = 6.16, p = .020. Participants' trust ratings on the M-HAT on the first day of the internship (M = 3.85, SEM = .21) were significantly lower than at the final exam (M = 4.56, SEM = .13), p = .063. Participants' trust ratings on the M-HAT at the midterm exam (M = 4.28, SEM = .08) were also significantly lower than at the final exam (M = 4.56, SEM = .13), p = .089. Since the M-HAT was the most sensitive to differences in trust ratings over the course of the internship, these ratings were used to divide the participants into high and low trust groups using a median split (Mdn = 4.14) for further analyses.
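The median-split grouping takes only a few lines. The scores below are simulated for illustration (only the median, 4.14, is reported in the paper), and ties at the median are assigned to the low group, since the paper does not state its tie rule:

```python
import statistics

def median_split(scores):
    """Split trust scores into low/high groups about the sample median.
    Scores at or below the median go to 'low', above it to 'high'
    (this tie handling is an assumption, not the paper's stated rule)."""
    mdn = statistics.median(scores)
    low = [s for s in scores if s <= mdn]
    high = [s for s in scores if s > mdn]
    return mdn, low, high

# Twelve simulated M-HAT scores chosen so the sample median matches the
# reported Mdn = 4.14.
scores = [3.9, 4.0, 4.05, 4.1, 4.12, 4.13, 4.15, 4.2, 4.3, 4.4, 4.5, 4.6]
mdn, low, high = median_split(scores)
print(mdn, len(low), len(high))
```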

3.2. Behavioral trust

A 2 (Trust Group: High, Low) x 2 (Test Session: Midterm, Final) x 2 (Traffic Density: High, Low) mixed-design ANOVA was conducted to assess whether the high and low trust groups differed in the average number of near-miss aircraft moved at the midterm and the final with high and low-density traffic. Trust Group was a between-subjects factor, and Test Session and Traffic Density were within-subjects factors. By using these two measures (trust group based on M-HAT ratings and average number of near-miss aircraft moved), we can consider whether subjective trust scale ratings were able to provide evidence of trusting behavior. The dependent variable in this analysis was the average number of near-miss aircraft moved. All main effects and interactions on the average number of near-miss aircraft moved were non-significant (ps > .20). At the midterm, students moved on average 1.00 (SD = .77) near-miss aircraft in high-density scenarios, and 1.29 (SD = 1.08) near-miss aircraft in low-density scenarios. At the final, students moved on average .96 (SD = .86) near-miss aircraft in high-density scenarios, and 1.04 (SD = 1.05) near-miss aircraft in low-density scenarios.

Fig. 2. Three-way interaction between Test Session, Traffic Density, and Trust Group.

3.3. Situation awareness

A 2 (Trust Group: High, Low) x 2 (Test Session: Midterm, Final) x 2 (Traffic Density: High, Low) mixed-design ANOVA was also conducted to examine differences between the high and low trust groups in situation awareness at the midterm and final, with high and low traffic density. Again, Trust Group was a between-subjects factor, and both Test Session and Traffic Density were within-subjects factors. The dependent variables were SPAM probe accuracy and probe latency. All effects of the factors on SPAM probe latency were non-significant (ps > .30). For probe accuracy, Test Session had a significant main effect, F(1, 9) = 16.69, p = .003. SPAM probe accuracy at the final exam (M = 77.12, SEM = 1.72) was higher than at the midterm exam (M = 64.02, SEM = 3.41). There was also a significant interaction between Traffic Density and Trust Group, F(1, 9) = 7.25, p = .025, and a significant three-way interaction between Test Session, Traffic Density, and Trust Group, F(1, 9) = 3.79, p = .083 (see Fig. 2). For the final exam, low trust participants had higher probe accuracy for the high-density scenarios (M = 86.31, SEM = 2.02) compared to the low-density scenarios (M = 70.58, SEM = 3.98). A reverse effect was seen for the high trust participants, who had higher probe accuracy for low-density scenarios (M = 81.22, SEM = 4.71) than high-density scenarios (M = 72.81, SEM = 2.38), p = .006. At the midterm, no differences were observed between high and low-density scenarios for either trust group.

4. Discussion

In the present study we examined whether subjective trust ratings were sensitive to changes in trust over time brought about by training, and whether subjective trust measures were related to trust behaviors. We also investigated the effect of trust on SA. Three subjective trust scales were used: the CPRS [5], ATM-CPRS [6], and M-HAT [7]. The average number of near-miss aircraft moved measured trust behavior, and SA was measured with SPAM. Near misses were defined as two aircraft coming close to losing separation but remaining 6-10 nm apart. There were three near misses per scenario.

The M-HAT was sensitive to changes in trust throughout the 16-week internship. From the first day, to the midterm, and to the final, the trust ratings significantly increased and students reported more trust. However, the average number of near-miss aircraft moved did not differ significantly by Trust Group, Test Session, or Traffic Density. Although not significant, all participants, especially those in the high trust group, tended to move more near-miss aircraft in low-density scenarios compared to high-density scenarios during the midterm. On average, at the midterm exam participants moved one near miss out of three per scenario. A significant difference was difficult to obtain due to the small sample size and low power. Three near misses per scenario, an increase from the previous study using the same measure of trust behavior [13], still did not show great differences in trust behaviors. It is difficult to include too many near misses because controllers tend to change their strategies if they notice a similar situation reoccurring during a simulation.

For situation awareness at the final exam during high-density scenarios, participants who scored lower on the trust scale had higher SPAM probe accuracy while those scoring higher in trust had lower SPAM probe accuracy. One possible explanation could be that high trust participants became more complacent with the automation during high-density scenarios, and this negatively affected their situation awareness.

Acknowledgements

This study was supported by NASA cooperative agreement NNX09AU66A, Group 5 University Research Center: Center for Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).

References

[1] Joint Planning and Development Office. (2010). Concept of operations for the next generation air transportation system version 3.2. Retrieved from http://jpe.jpdo.gov/ee/docs/conops/NextGen_ConOps_v3_2.pdf.

[2] A. Kiken, R.C. Rorie, L.P. Bacon, S. Billinghurst, J.M. Kraut, T.Z. Strybel, & V. Battiste. Effect of ATC training with NextGen tools and online situation awareness and workload probes on operator performance. In Human Interface and the Management of Information. Interacting with Information, Springer Berlin Heidelberg, 2011;483-492.

[3] B.M. Muir. Trust between humans and machines, and the design of decision aids. In E. Hollnagel, G. Mancini, & D.D. Woods (Eds.), Cognitive engineering in complex dynamic worlds, Academic, London, 1998;71-83.

[4] J.D. Lee, & K.A. See. Trust in automation: Designing for appropriate reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society 2004;46:50-80.

[5] I.L. Singh, R. Molloy, & R. Parasuraman. Automation-induced "complacency": Development of the complacency-potential rating scale. The International Journal of Aviation Psychology 1993;3:111-122.

[6] S. Verma, T. Kozon, D. Ballinger, S. Lozito, & S. Subramanian. Role of the controller in an integrated pilot-controller study for parallel approaches. In Digital Avionics Systems Conference 2011;3B1-1.

[7] Y. Kunii. Student Pilot Situational Awareness: The Effects of Trust in Technology (Doctoral dissertation, Embry-Riddle Aeronautical University). 2006.

[8] S.M. Merritt, & D.R. Ilgen. Not all trust is created equal: Dispositional and history-based trust in human-automation interactions. Human Factors: The Journal of the Human Factors and Ergonomics Society 2008;50:194-210.

[9] J.Y. Jian, A.M. Bisantz, C.G. Drury, & J. Llinas. Foundations for an empirically determined scale of trust in automated systems. International Journal of Cognitive Ergonomics 2000;4:53-71.

[10] R. Parasuraman, T.B. Sheridan, & C.D. Wickens. Situation awareness, mental workload, and trust in automation: Viable, empirically supported cognitive engineering constructs. Journal of Cognitive Engineering and Decision Making 2008;2:140-160.

[11] R. Parasuraman, & V. Riley. Humans and automation: Use, misuse, disuse, abuse. Human Factors: The Journal of the Human Factors and Ergonomics Society 1997;39:230-253.

[12] A.L. Singh, T. Tiwari, & I.L. Singh. Effects of automation reliability and training on automation-induced complacency and perceived mental workload. Journal of the Indian Academy of Applied Psychology 2009;35:9-22.

[13] T.M. Higham, K.P.L. Vu, J. Miles, T.Z. Strybel, & V. Battiste. Training air traffic controller trust in automation within a NextGen environment. In Human Interface and the Management of Information. Information and Interaction for Health, Safety, Mobility and Complex Environments, Springer Berlin Heidelberg, 2013;76-84.

[14] M.R. Endsley. Automation and situation awareness. In Automation and human performance: Theory and applications, 1995;163-181.

[15] R. Parasuraman, R. Molloy, & I.L. Singh. Performance consequences of automation-induced "complacency." International Journal of Aviation Psychology 1993;3:1-23.

[16] T. Prevot. Exploring the many perspectives of distributed air traffic management: The Multi-Aircraft Control System MACS. In S. Chatty, J. Hansman, & G. Boy (Eds.), Proceedings of HCI-Aero, AAAI Press, Menlo Park, CA, 2002;149-154.

[17] F.T. Durso, A.R. Dattel, S. Banbury, & S. Tremblay. SPAM: The real-time assessment of SA. In A cognitive approach to situation awareness: Theory and application 2004;1.