Procedia Manufacturing 3 (2015) 2998-3004

6th International Conference on Applied Human Factors and Ergonomics (AHFE 2015) and the Affiliated Conferences, AHFE 2015

May I interrupt? The effect of SPAM probe questions on air traffic controller performance

Jillian Keeler, Henri Battiste, Elyse C. Hallett, Zach Roberts, Alice Winter, Karen Sanchez, Thomas Z. Strybel, Kim-Phuong L. Vu

Center for Human Factors in Advanced Aeronautics Technologies (CHAAT), Department of Psychology, California State University, Long Beach, United States


Abstract

The use of probe questions for measuring situation awareness is often regarded as being intrusive on operator performance and workload (Pierce, 2012). Moreover, the probe questions themselves may change the operator's situation awareness. However, the intrusive effects of probe questions can be diminished through optimized presentation and collection of responses (Bacon & Strybel, 2013). The present study analyzed data from a large sample of 54 student controllers to determine whether an optimized presentation method for administering Situation Present Assessment Method (SPAM) probe questions negatively impacted the students' workload or performance. Results were consistent with prior research (e.g., Bacon & Strybel, 2013) showing that probe questions were not intrusive and could be used as a method for measuring SA in experimental studies.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license


Peer-review under responsibility of AHFE Conference

Keywords: Situation awareness measurement; Online probe technique; SPAM; Intrusiveness

1. Introduction

The national airspace system (NAS) is expected to experience exponential growth in the number of aircraft (AC) over the next decades [1]. Without any additional tools, this projected increase in air traffic may negatively affect air traffic controller (ATCo) performance, potentially leading to a decrease in air transportation safety. Under the Next Generation Air Transportation System (NextGen), a series of advanced air traffic management tools will be implemented to assist ATCos with the projected increase in traffic flows. These new NextGen tools need to be evaluated to determine their impact on ATCos' situation awareness (SA), workload, and performance.

ISSN 2351-9789. doi: 10.1016/j.promfg.2015.07.843

Simulation studies examining the impact of advanced tools on ATCo performance often measure the operator's SA, which is the operator's understanding of his or her task environment. However, much debate continues on how to operationally define and measure SA. Many studies have indicated that performance is often correlated with measures of situation awareness. Measuring the effect of new technologies on operator SA is important because loss of awareness can cause serious errors in performance [2].

One method researchers use to measure SA during experimental studies is to administer probe questions during the scenario. The Situation Present Assessment Method (SPAM) is a probe method that measures operator SA and workload by administering questions pertaining to the current scenario while the operators are performing their tasks. Another probe method is the Situation Awareness Global Assessment Technique (SAGAT), which freezes the operator's scenario and removes all displays before querying the operator. Because it freezes the scenario, SAGAT is not feasible for real-time measurement of SA in an operational setting. Researchers may also employ subjective measures of SA, including the Situation Awareness Rating Technique (SART), a survey asking participants to rate several dimensions relating to perceived SA. As with many subjective measures, though, conclusions based on subjective measures of SA should be supported by other converging evidence. Use of probe questions can have adverse effects on an operator's performance because they add a secondary task [3]. Online probe questions may even alert the operator to events in the simulation, thus positively affecting their performance or changing their SA [4].

Pierce (2012) used an auditory probe method to administer probe questions. The auditory method was intended to remove nuisance variability, as the operator would not have to perform two visual tasks simultaneously. Participants were trained to use the Air Traffic Scenarios Test (ATST) [5], a low-fidelity radar simulation, which recorded five performance measures: (i) handoff delay, (ii) en route delay, (iii) aircraft interactions, (iv) number of correct exits, and (v) simulation errors. There were four conditions in the experiment: a baseline condition, a traditional SPAM condition, an auditory-shadowing condition, and a list memory condition. Pierce found significant decreases in participants' interactions with aircraft right after a probe was administered across all probe scenarios. ATCo performance also suffered in the SPAM condition, as there were longer handoff delays, greater numbers of AC incorrectly handled, and fewer interactions with aircraft than in the baseline condition.

However, several subsequent studies have shown that the intrusiveness of probe questions can be minimized through optimizing the presentation method [6]. Instead of using an auditory presentation method [3], Bacon and Strybel presented SPAM probe questions on a separate touch-screen display using a multiple-choice format. Student ATCos participated in medium-fidelity simulations with scheduled flight-plan deviations. These deviations would result in losses of separation (LOS) if left unresolved. SPAM probes were administered during the scenario, with pre-event questions that were relevant to the deviation conflict, relevant to another conflict, or not relevant to any conflict. LOS and situation awareness were measured. Bacon and Strybel found that overall probe latency was significantly correlated with LOS. Additionally, their results failed to indicate that time to detect the flight-plan deviations and numbers of LOS were changed by the presence of relevant probe questions administered before the scheduled event. Bacon and Strybel's [6] results refute the claim that SPAM probes change operators' situation awareness, and provide evidence for the validity of SPAM as a measurement of situation awareness. Their findings also advocate use of a more optimized method for probe presentation.

In addition, Silva et al. [7] further examined the intrusive effects of SPAM questions using Bacon and Strybel's [6] method while taking into account the students' level of air traffic management proficiency. In Silva et al.'s study, student ATCos achieved "journeyman status" if they met proficiency requirements by the eighth week of a 16-week internship. They found that students who achieved journeyman status managed traffic more efficiently, but their performance was not affected by the presence of SPAM probe questions. Silva et al.'s findings provide further support for the validity of SPAM probes as a measure of SA. However, because the sample size was small in both Bacon and Strybel's [6] and Silva et al.'s [7] studies, their findings need to be verified using a larger sample size.

The current study investigated whether the presence of SPAM probe questions, using Bacon and Strybel's [6] optimized administration technique, would negatively impact ATCo performance, workload, and SA with a larger sample of students, including those who participated in Silva et al.'s [7] study. Performance on scenarios containing probe questions was compared with scenarios in which no probes were administered. SPAM probe questions were presented at three-minute intervals. Performance metrics included average time through sector, number of LOS, and average handoff acceptance time. Situation Awareness Rating Technique (SART) and NASA Task Load Index (TLX) scores were also collected as subjective measures of SA and workload. In addition, a survey was administered to the student ATCos at the end of the simulation to capture their views of the probe questions and the probe administration technique.

2. Methods

2.1. Participants

Fifty-four individuals enrolled in the Aviation Science program at Mount San Antonio College were recruited on a first-come, first-served basis for a 16-week radar simulation internship. The internship took place at the Center for Human Factors in Advanced Aeronautics Technologies at California State University, Long Beach. Students were not paid for their participation in the internship, but were paid $10 per hour for 8 hours of testing (4 hours at the midterm and 4 hours at the final). Recruitment and participation occurred across four separate semesters.

2.2. Apparatus

Students were trained using the Multi Aircraft Control System (MACS) equipped with both voice communication and NextGen tools including: (i) conflict alerting, (ii) trial planner with conflict probe, and (iii) controller-pilot Data Comm. The MACS program simulated Indianapolis Center, Sector ZID-91, with incoming and outgoing normal-density air traffic. SPAM multiple-choice questions were administered on a separate touch-screen display. Pseudo-pilots, who were highly trained student confederates, flew all AC within the sector and interacted with air traffic controllers via push-to-talk headsets through a voice IP server.

2.3. SPAM probe questions

SPAM questions pertained to sector and conflict events in the simulation, as well as levels of perceived workload. They were multiple-choice or true-or-false questions. The type of questions was counterbalanced across scenarios to diminish carryover effects. Probe questions were developed beforehand in conjunction with scenario development. Query topics included comparing relative altitude levels, determining in which quadrant a conflict would most likely occur next, and judging the distance between an AC and a waypoint within the sector. Participants were also queried about their perceived workload at regular intervals and asked to rate it on a seven-point Likert scale, with one being low and seven being high workload.

2.4. Measures

The NASA Task Load Index (TLX) was used to measure subjective levels of workload. Participants rated six workload dimensions (mental demand, temporal demand, physical demand, effort, frustration, and performance) at the end of each scenario. The Situation Awareness Rating Technique (SART) measured subjective levels of situation awareness based on three dimensions: demand on attentional resources, supply of attentional resources, and understanding. Composite scores for both NASA TLX and SART were calculated per prior research standards.
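The paper does not state which composite formulas were applied. As a minimal sketch, the following assumes the unweighted ("raw") TLX, taken as the mean of the six subscale ratings, and the commonly used SART combination of understanding minus the difference between demand and supply. The function names, dictionary keys, and rating ranges are hypothetical and are not taken from the study.

```python
from statistics import mean

# The six NASA TLX dimensions listed in the text (key names are hypothetical).
TLX_DIMENSIONS = (
    "mental", "temporal", "physical", "effort", "frustration", "performance",
)


def raw_tlx(ratings: dict[str, float]) -> float:
    """Unweighted TLX composite: the mean of the six subscale ratings.

    Assumption: the study may instead have used the weighted TLX procedure.
    """
    return mean(ratings[d] for d in TLX_DIMENSIONS)


def sart_composite(demand: float, supply: float, understanding: float) -> float:
    """Commonly used SART combination: understanding - (demand - supply)."""
    return understanding - (demand - supply)
```

Under these assumptions, a participant who rates understanding at 6, demand at 4, and supply at 5 would receive a SART composite of 7.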

Performance metrics included (i) average time through sector, (ii) average handoff acceptance time, and (iii) number of losses of separation (LOS). Average handoff acceptance time was defined as the time it took on average for an ATCo to accept an AC into their sector. Average time through sector was defined as the average amount of time it took an AC to completely travel through an ATCo's sector. A LOS was counted when any two AC came within 1,000 feet vertically and 5 nautical miles laterally of each other. Average handoff acceptance time and average time through sector were objective measures of efficiency, whereas number of LOS was an indicator of safety.
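The LOS criterion above can be sketched as a simple pairwise check. This is an illustration only, not the study's Visual Basic scoring program: the `Aircraft` record, the flat x/y coordinates in nautical miles, and the use of strict inequalities are all assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot

VERTICAL_MIN_FT = 1000.0  # vertical separation minimum from the text
LATERAL_MIN_NMI = 5.0     # lateral separation minimum from the text


@dataclass(frozen=True)
class Aircraft:
    """Hypothetical position record; positions are flat x/y in nautical miles."""
    callsign: str
    x_nmi: float
    y_nmi: float
    altitude_ft: float


def is_loss_of_separation(a: Aircraft, b: Aircraft) -> bool:
    """True when both the lateral and the vertical minima are violated."""
    lateral = hypot(a.x_nmi - b.x_nmi, a.y_nmi - b.y_nmi)
    vertical = abs(a.altitude_ft - b.altitude_ft)
    return lateral < LATERAL_MIN_NMI and vertical < VERTICAL_MIN_FT


def count_los_pairs(snapshot: list[Aircraft]) -> int:
    """Number of aircraft pairs in conflict within one radar snapshot."""
    return sum(is_loss_of_separation(a, b) for a, b in combinations(snapshot, 2))
```

Note that this counts conflicting pairs in a single snapshot. Counting LOS events over a whole scenario would also require tracking when each conflict begins and ends, which the paper does not describe.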

A post simulation survey was administered at the end of the 16-week internship. Questions pertained to items related to the midterm and final tests, as well as participants' opinions and attitudes towards the probe presentations.

Four questions relevant to this paper were analyzed: (i) "how interfering was it to answer questions when they appeared on your probe screen (1= low interference and 7 = extreme interference)?", (ii) "was your workload changed by having to respond to questions when they appeared on your probe screen (1 = a significant decrease in workload, 4 = no change in workload, and 7 = significant increase in workload)?", (iii) "to what extent did the probe questions and your responses to the probe questions change your awareness of traffic (1 = no change, 4 = some change, and 7 = significant change)?", and (iv) "to what extent did the probe questions and your responses to them change your strategies for managing traffic (1 = no influence, 4 = some influence, 7 = significant influence)?"

2.5. Procedure

Participants received training for eight weeks, learning how to utilize NextGen technologies to manage air traffic in conjunction with traditional voice communications. After eight weeks, the students participated in a midterm exam that tested their air traffic management (ATM) skills. During the testing sessions, participants received an eighteen-minute warm-up trial prior to the test scenarios. The four test scenarios comprised one with 0%, two with 50% (one with probes and one without probes), and one with 100% equipage of NextGen AC. The order of the scenarios was counterbalanced across participants. Each scenario was 40 minutes in length. The air traffic density was designed to represent current-day traffic levels.

Three minutes into each probe scenario, a "ready" prompt would appear, accompanied by an audio alert in the ATCo's headset. Participants had one minute to accept the "ready" prompt. They were instructed to accept the prompt only if they felt they had the capacity to read and attempt to answer the probe question. If they failed to accept the prompt within one minute, the ready prompt was removed, and the next ready prompt was presented two minutes later. When participants accepted the ready prompt, the probe question appeared immediately. Participants had one minute to respond to the question before a timeout occurred. Correct and incorrect responses were recorded, as well as response times to the ready prompt and probe question.
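The timing rules of this procedure can be sketched as follows. The sketch assumes that ready prompts fall on a fixed three-minute grid even after an accepted probe, which the text states explicitly only for missed prompts. All names and the outcome labels are hypothetical.

```python
from typing import Optional

READY_TIMEOUT_S = 60            # time allowed to accept the "ready" prompt
ANSWER_TIMEOUT_S = 60           # time allowed to answer the probe question
PROBE_INTERVAL_S = 180          # first prompt at 3 min, then every 3 min (assumed grid)
SCENARIO_LENGTH_S = 40 * 60     # each scenario lasted 40 minutes


def ready_prompt_onsets(scenario_length_s: int = SCENARIO_LENGTH_S) -> list[int]:
    """Onset times, in seconds, of every ready prompt in one scenario."""
    return list(range(PROBE_INTERVAL_S, scenario_length_s, PROBE_INTERVAL_S))


def probe_outcome(
    accept_latency_s: Optional[float],
    answer_latency_s: Optional[float],
    correct: Optional[bool],
) -> str:
    """Classify one probe from its recorded latencies (None means no response)."""
    if accept_latency_s is None or accept_latency_s > READY_TIMEOUT_S:
        return "not_accepted"   # prompt removed after one minute
    if answer_latency_s is None or answer_latency_s > ANSWER_TIMEOUT_S:
        return "timed_out"      # question shown but not answered in time
    return "correct" if correct else "incorrect"
```

Keeping the ready-prompt latency separate from the answer latency mirrors the procedure, in which both response times were recorded.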

Performance measures were collected via MACS output during run time and calculated via a Visual Basic program after the simulation was completed. At the end of each trial, the NASA TLX and SART were administered.

At the end of the midterm, participants were debriefed and allowed to discuss various issues and topics regarding their experience within the simulation. After completing the 16-week internship, a post-simulation survey regarding the internship overall, including assessment of the students' attitudes towards online probe questions, was administered.

3. Results

The current study uses a subset of data from a larger study (see also Silva et al., 2011; Winter et al., 2015; Miramontes et al., 2015). Since the present study is only concerned with the effects of SPAM probe questions on ATCo performance, only data from the two 50% equipage scenarios at the midterm were analyzed.

3.1. Performance metrics

The main effect of probe condition on handoff acceptance time approached statistical significance, F(1, 53) = 3.978, p = .051, as shown in Figure 1a. The no probe condition had lower average handoff acceptance times (M = 40.96 s, SD = 21.62 s) compared to the probe condition (M = 45.60 s, SD = 23.84 s). This data pattern is the opposite of what would be expected if the online probes were intrusive to performance. The average time through sector was not significantly affected by probe questions, F(1, 53) = .042, p = .838 (see Figure 1c). Finally, the mean number of LOS did not differ between the probe and no probe conditions, F(1, 53) < 1, p = .99 (see Figure 1b). Averages for LOS were similar across the probe (M = .96, SD = .80) and no probe (M = .96, SD = 1.01) conditions.
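With only two within-subject conditions (probe and no probe), a repeated-measures F test with 1 and n - 1 degrees of freedom is equivalent to the square of a paired t statistic, which is why 54 participants yield F(1, 53). The sketch below illustrates that relationship. The function name is hypothetical, and the data in the usage example are made up, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(cond_a: list[float], cond_b: list[float]) -> tuple[float, tuple[int, int]]:
    """F statistic for a two-level within-subject factor, computed as t squared.

    Returns the F value and its degrees of freedom (1, n - 1).
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("each participant needs a score in both conditions")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # Paired t: mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


# Illustrative, made-up scores for four participants in two conditions.
f_value, df = paired_f([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
```

The p value would then be read from the F distribution with those degrees of freedom, for example with a statistics package.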

Fig. 1. (a) Average handoff acceptance times; (b) average losses of separation; (c) average time through sector.

Fig. 2. (a) NASA Task Load Index composite scores; (b) Situation Awareness Rating Technique composite scores.

3.2. SA and workload

No significant differences were found between the two conditions for TLX workload scores, F(1, 53) = .013, p = .908 (see Figure 2a), or for SART SA scores, F(1, 53) = 1.326, p = .255 (see Figure 2b).

3.3. Post-simulation survey

Due to small differences in data collection across simulations, only two semesters of data were available for this analysis (N = 29). The proportions of responses for each question are shown in Figure 3. When asked if probe questions interfered with ATCo responsibilities, participants' responses were distributed across the scale (see Figure 3a), with 40% responding below 4 and 44% responding above 4. When asked if they saw a change in their workload (see Figure 3b), 68% of participants indicated a small increase in workload with a rating of 5. Participants were divided when asked to rate any perceived change in awareness of traffic (see Figure 3c): 44% responded above 4 and 28% responded below 4. Participants were again divided when asked whether their air traffic management strategies changed with the presence of probe questions: 32% reported some change or significant change, and 40% reported no change in their traffic management strategies (see Figure 3d).
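Proportions like those reported above can be tallied from raw seven-point ratings as sketched below. The function name and the example responses are hypothetical. The scale midpoint of 4 follows the survey anchors described in Section 2.4.

```python
def likert_split(responses: list[int], midpoint: int = 4) -> dict[str, float]:
    """Proportion of ratings below, at, and above the scale midpoint."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses to summarize")
    below = sum(r < midpoint for r in responses)
    above = sum(r > midpoint for r in responses)
    return {
        "below": below / n,
        "at": (n - below - above) / n,
        "above": above / n,
    }


# Made-up ratings from ten respondents, for illustration only.
summary = likert_split([1, 2, 4, 5, 5, 6, 7, 3, 4, 5])
```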

Fig. 3. (a) proportion of response for level of interference; (b) proportion of response for level of perceived change in workload; (c) proportion of response for perceived changes in awareness of traffic; (d) proportion of response for perceived change in strategies for air traffic management.

4. Discussion

Findings from the present study suggest that probe questions are not intrusive to operator performance, SA, or workload. Neither objective nor subjective measures were negatively impacted by SPAM probe questions. In the post-simulation survey, most participants indicated that they saw neither change in workload nor change in air traffic management strategies when answering probe questions. Thus, the present study suggests that SPAM probe questions can be used effectively to capture SA in experimental research studies. These results are consistent with prior research conducted by Bacon and Strybel [6].

Conversely, our results are contrary to prior research conducted by Pierce [3]. First, the choice of presentation method for probe questions (e.g., auditory versus visual presentation) may account for some of the discrepancy in results. Considering the results of both studies, auditory probe questions might interfere more with an ATCo's tasks than visual probe questions, as visual probe questions can be attended to at the operator's discretion. Second, the participants' level of experience in air traffic management may account for the differences in the result patterns. Pierce's [3] participants had no prior background in aviation and very little training in the ATM task. The current study used highly trained student ATCos with backgrounds in aviation sciences. Thus, the level of experience of the operators may be a mediating variable for the potential negative effects of probe questions on performance, SA, and workload observed by Pierce [3]. Likewise, the type of testing environment (e.g., low versus medium fidelity) differed greatly between the two studies, and this factor could account for the differences in findings.

We end this paper by noting several limitations of the current study. Since post-trial workload and SA were measured through self-report, these measures might not have been sensitive enough to detect small changes in workload and situation awareness brought on by the probe questions. Thus, further research might benefit from analyzing objective measures of SA and workload (e.g., response latency times) alongside subjective measures of both constructs. Similarly, some of our performance metrics were low in occurrence (e.g., LOS) and may not have been sensitive enough to be influenced by the probe questions. Examination of more sensitive measures (e.g., average vertical and lateral distance between AC) in place of number of LOS might be beneficial in future studies. Finally, the conclusions of this study are based on null effects, and converging evidence from other studies is needed to corroborate our findings.


This project was supported by NASA cooperative agreement NNX09AU66U, Group 5 University Research Center: Center for Human Factors in Advanced Aeronautics Technologies (Brenda Collins, Technical Monitor).


References

[1] Joint Planning and Development Office (JPDO). (2010).

[2] F. T. Durso, C. A. Hackworth, T. R. Truitt, J. Crutchfield, D. Nikolic. Situation Awareness as a Predictor of Performance for En Route Air Traffic Controllers, Oklahoma University Norman Department of Psychology, 1999.

[3] R. S. Pierce. Human Factors: The Journal of the Human Factors and Ergonomics Society. 54 (2012) 838-848.

[4] M. R. Endsley. Proceedings of the National Aerospace and Electronics Conference (NAECON). (1988) 789-795.

[5] Federal Aviation Administration (FAA). (2011).

[6] L. P. Bacon, T. Z. Strybel. Safety Science. 56 (2013) 89-95.

[7] H. I. Silva, J. Ziccardi, T. Grigoleit, V. Battiste, T. Z. Strybel, K. P. L. Vu. Human Interface and the Management of Information: Information and Design. (2013) 269-275.