Eye Gaze as a Human-computer Interface



Procedia Technology 17 (2014) 376 - 383

Conference on Electronics, Telecommunications and Computers - CETC 2013


Rafael Santos*, Nuno Santos, Pedro M. Jorge, Arnaldo Abrantes

Instituto Superior de Engenharia de Lisboa, Rua Conselheiro Emidio Navarro, 1, Lisbon 1959-007, Portugal


Abstract

This work describes an eye tracking system for a natural user interface, based only on non-intrusive devices such as a simple webcam. Through image processing the system is able to convert the focus of attention of the user to the corresponding point on the screen. Experimental tests were performed displaying to the users a set of known points on the screen. These tests show that the application has promising results.

© 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

Peer-review under responsibility of ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, PORTUGAL.

Keywords: Eye gaze; image processing; human-computer interaction

1. Introduction

The advance of technology makes possible the development of new human-machine interfaces. The act of looking at a screen is part of most natural interaction processes, but the information that eye gaze can provide is still not fully exploited today. Gathering and processing a user's eye gaze to interact with a machine has already been studied, but most approaches rely on specific technologies that are not available in mass-market devices such as laptops or tablets. This work describes a system to detect eye gaze based on a laptop web camera, enabling a more natural form of human-machine interaction. The use of a common image acquisition device allows this application to run on mobile devices such as smartphones or tablets, where the use of specific hardware is very difficult.

This paper is organized as follows: related work is presented in section 2; section 3 describes the proposed system; experimental results are presented in section 4; conclusions and future work are discussed in section 5.

* Corresponding author. Tel.: +351 962 622 761; E-mail address:

ISSN 2212-0173. doi:10.1016/j.protcy.2014.10.247

2. Related Work

Eye gaze is a natural form of interaction, accomplished by identifying where a person is looking. However, replicating this procedure automatically within the scope of human-machine interaction is not simple. Eye gaze techniques have been studied for more than a hundred years and have been the subject of several studies over the last decade [1,2,3]. Early on, Rayner and Pollatsek [9] developed a system based on electro-oculography, using electrodes placed on the skin to measure electric potential differences. Duchowski [5] developed a system based on contact lenses. However, the most reliable systems, which produce the best results, use specific and expensive equipment [14]. New systems are being developed that use non-intrusive and more affordable devices. Among these systems, the best performances are obtained with a source of infrared light [3]. Distrust of infrared light exposure motivated the development of eye gaze detection systems that use everyday technology, such as webcams [2]. These systems still have limitations associated with head movement compensation [2,3] and with random and involuntary eye movements [6]. Real-time processing algorithms and hardware also still need to improve [3].

There is a commercial application developed in Portugal by Luis Figueiredo [13] which implements this kind of human-machine interface. However, it uses more sophisticated and expensive hardware (infrared illumination and a high-speed camera), which increases its commercial cost. The works presented in [1,2] describe a possible eye gaze technique based on the corneal reflex, using infrared illumination and an infrared camera. This reflection produces a small, stable white point just below the pupil (figure 1), contributing to a good eye gaze and positioning system. Although promising results are obtained, a little-studied issue of this approach is the eye's exposure time to infrared light and its consequences [2].

Figure 1 - Eye captured with infrared camera

The work presented in this paper is related to the work of Wild [3]. Wild uses the same sequential methodology and supporting libraries, but his image processing algorithms are based on Haar classifiers and the Hough circle transform, which are computationally demanding and difficult to apply in real-time applications. The system presented in this paper follows a similar implementation, but the image processing techniques used proved to be faster due to the use of a tracking algorithm instead of continuous (frame-by-frame) detection. The proposed system also assumes only a simple webcam, available in most computers and mobile devices on the market today.

3. Eye Gaze Detection System

The proposed system has the following requirements: acquire user images, process them, detect the points essential for eye tracking, and perform a calibration process. The block diagram of the proposed system is presented in figure 2.

Figure 2 - System block diagram

The application starts with image acquisition, either from a web camera or from a pre-recorded video for testing purposes. After image acquisition, there is a calibration block where eye gaze is initialized for the pointer position on the screen. This block is decomposed into two stages:

i) detection of pupil position;

ii) estimation of the transformation matrix that will convert the centre of the pupil into a point on the screen.

The eye tracking block consists of pupil detection and tracking along the image sequence. With the pupil position and the transformation matrix, it is possible to determine the point on the screen where the user is looking. The system is based on OpenCV, the Open Source Computer Vision library [15], which implements important algorithms such as:

• Haar Cascade classifier;

• Hough transform;

• Mean Shift (Cam Shift);

• Kalman filter.

System operation can be divided into two phases: calibration (second block in figure 2) and real time processing (first, third and fourth block in figure 2).

Figure 3 - Initial image processing

The main blocks of the initial image processing operation mode are shown in figure 3. First, face detection is performed with Haar classifiers [12]. These classifiers are based on feature extraction: they look for contrast variations inside a group of pixels, distinguishing darker and lighter areas. The classifiers are trained with two groups of images, positive and negative examples of the specific features. The use of a common web camera imposes some limitations on the quality of the acquired image, particularly regarding lighting conditions. These limitations can cause missed face detections due to lack of contrast. To solve this problem, the proposed algorithm uses several different Haar filters, which makes the face detection step more robust. Figure 4(a) shows an example of the face detection procedure. To minimize detection errors and reduce processing time, relevant regions are cropped for further processing. For eye detection, only the top half of the face image is used (see figure 4(a)). This procedure helped us minimize errors and speed up the eye detection step (see figure 4(b)).

Figure 4 - (a) Face detection; (b) - Eye detection

Once the system has the eye positions, pupil detection is performed. Pupil detection is one of the most important steps in the calibration block because of its accuracy requirements. For this task, the Hough circle transform is used: it identifies the iris and, consequently, yields the pupil position, as shown in figure 5. To speed up the process, only the image window containing the eyes is used (see figure 4(b)). In this step the image is converted to grayscale, and histogram equalization and a Gaussian blur filter are applied to reduce noise.

Figure 5 - Iris and pupil detection

After the pupil center is obtained, the calibration step is finished. The main algorithms of this phase (Haar classifiers and the Hough transform) are computationally demanding and not suitable for the real-time processing operation mode. Therefore, a tracking algorithm is used to follow a set of interest points (including the center of the pupil) through the acquired image sequence. The Mean Shift procedure [11] is used as the tracking algorithm (see figure 6).

Mean Shift is a mode-seeking algorithm that can be used for segmentation and tracking. A probability map is created by projecting the image against the object's histogram (back-projection), and the algorithm locates the object based on this map and on the object's position in the previous image.

To smooth out the small variations of the pupil position during detection, a Kalman filter is used, assuming a fixed-position motion model (see figure 7).



Figure 6 - Eye tracking real-time sequence (Mean Shift positioning, linear correlation, Kalman filter)

Figure 7 - (a) Pupil position without Kalman filter; (b) With Kalman Filter

From figure 7 it is possible to see the advantage of using the Kalman filter: the tracked pupil positions are more stable. The Kalman position model used is given by equations (1) and (2):

x_k = x_{k-1} + w_k    (1)

z_k = x_k + v_k    (2)

A pupil coordinate transform is needed to make the conversion between eye coordinates and screen coordinates possible. Let (x1, x2) be the eye coordinates and (y1, y2) the corresponding screen coordinates; the linear transformation matrix M is given by equation (3):

M = R_x^{-1} r_xy    (3)

where R_x = (1/N) Σ_{i=1}^{N} x_i x_i^T and r_xy = (1/N) Σ_{i=1}^{N} x_i y_i^T.
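Given the N calibration pairs, this is the ordinary least-squares solution for a linear map from pupil to screen coordinates. A minimal numpy sketch (illustrative; the paper does not publish its code):

```python
import numpy as np

def estimate_transform(X, Y):
    """Least-squares estimate of the matrix M mapping pupil coordinates to
    screen coordinates, from (N, 2) arrays of calibration pairs.
    Solves X @ M ~= Y, i.e. M = (X^T X)^{-1} X^T Y, equation (3) in sample form."""
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M

def to_screen(pupil_xy, M):
    """Map one pupil coordinate pair to a screen point."""
    return np.asarray(pupil_xy) @ M
```

A purely linear map cannot model an offset between the coordinate systems; in practice an affine term (a constant column appended to X) would usually be added.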

4. Testing and Analysis

The testing environment comprises two stages:

• Pre-recorded video with 800x600 pixels resolution;

• Webcam real time stream with 640x480 pixels resolution.

Because of the low image resolution, the Kalman filter uses a measurement covariance of about 100, much larger than the process covariance of 0.1. These are default values and can be adjusted to fit the imaging conditions.

Recorded videos are useful for testing the algorithms and estimating the average error. To record a test video, a sequence of yellow dots on a black background is displayed on the computer screen at known positions in a specific order (top left, top right, bottom right, bottom left). Two videos were acquired for each user, one to train and one to test the system. Each training video was processed to determine the pupil positions and calibrate the system. With the test sequences, the errors between the true positions of the points on the screen and the estimated positions given by the eye gaze are computed; these values are presented in table 1.
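The error metric is the distance, in pixels, between each known dot and the corresponding gaze estimate; a minimal sketch (the helper name is ours):

```python
import numpy as np

def gaze_errors(true_points, estimated_points):
    """Euclidean distance, in pixels, between each true screen point
    and the corresponding eye gaze estimate."""
    d = np.asarray(true_points, float) - np.asarray(estimated_points, float)
    return np.hypot(d[:, 0], d[:, 1])
```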

             Training phase          Testing phase
             Eye 1      Eye 2        Eye 1      Eye 2
              56         33           40         45
              53         31           28         34
              56         37           36         66
              73         32           89         91
              65         33          103         71

Table 1 - Average error (in pixels) of the user eye gaze in video with 800x600 resolution.

The rows of table 1 show the average errors between the real screen coordinates and the eye gaze estimates at each known position. The results show that the overall average error (the arithmetic mean for each eye) is smaller than 62 pixels. Figure 8 shows an example of a frame from the calibration phase in which the user is looking at the point in the bottom-right corner of the screen. T1 and T2 mark the positions estimated from each eye. The grid divides the screen into sections so that the system can classify them for internal calculations, such as the average error.

Figure 8 - Training phase example

The T1 and T2 dots represent the estimated cursor positions, one for the left eye (pink) and the other for the right eye (white). The same calibration and test procedure was performed with real-time webcam streaming at a resolution of 640x480 pixels. In this case the calibration phase has to be assisted: the person testing the system must follow the yellow dots on the screen. Based on this information, the system computes the errors between the real positions and the estimated ones.

Figure 9 - Calibration phase with the webcam.

Figure 9 shows an example of the calibration phase with the webcam. In this image, the user is looking at the top-right corner of the screen (yellow dot). Table 2 presents the average error between the real coordinates and the eye gaze estimates for the webcam test.

             Training phase          Testing phase
             Eye 1      Eye 2        Eye 1      Eye 2
             103         44          364        320
             182         62          171        154
             116        140          233        195

Table 2 - Average error (in pixels) for user eye gaze using the webcam.

The test results show high variability. Average errors are lower in the calibration (training) phase than in the test phase, because the transformation matrix (Eq. 3) is a purely linear transformation estimated during calibration. Because of this variability of the transformation matrix, errors are noticeably larger in the webcam tests than with the pre-recorded videos. The webcam's low resolution makes the system lose important information.

5. Conclusion

This paper presents an eye gaze tracking system for real-time applications using a simple webcam. Experimental results show that an acceptable level of precision is achieved. However, some aspects must be guaranteed. An important one is the lighting conditions, specifically face and eye illumination, which should be homogeneous and as natural as possible (sunlight is better than artificial light). Another important aspect is camera quality. With higher resolutions, the system will have larger and sharper eye images, which will decrease detection and eye gaze errors. Higher refresh rates will also help the system estimate pupil positions more accurately. The tests performed in this work used low-resolution webcams, and some promising results were obtained; with higher resolutions and better refresh rates, the results are expected to improve.

Another approach, although somewhat removed from our goal, would be to use infrared cameras [1,13,14]. With those cameras, very good eye image quality is obtained, as shown in figure 1, and better eye gaze tracking results should be achieved.

The system presented in this paper had two main goals: (i) performing actions through eye movements and (ii) real-time eye gaze. The first was not completely achieved; however, it is possible to move a pointer on the screen. The second goal was successfully achieved: the proposed system can process information fast enough to be used in real time. This was possible thanks to the improvements in image processing, the useful crops of the analysed images, and the application of Mean Shift. In general terms, the main objectives were achieved, confirming the possibility of developing a webcam-based eye gaze system for human-computer interaction.


References

[1] Poole, A. and Ball, L. J. (2005). Eye Tracking in Human-Computer Interaction and Usability Research: Current Status and Future Prospects. In C. Ghaoui (Ed.), Encyclopedia of Human-Computer Interaction. Pennsylvania: Idea Group, Inc.

[2] Tunhua, B. B. W., Changle, L. S. Z. and Kunhui, L. (2010). Real-time Non-intrusive Eye Tracking for Human-Computer Interaction. Proceedings of the 5th International Conference on Computer Science and Education (ICCSE), 1092-1096.

[3] Wild, D. J. (2012). Gaze tracking using a regular web camera. Retrieved from

[4] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679-698.

[5] Duchowski, A. T. (2003). Eye Tracking Methodology: Theory and Practice. Springer-Verlag Ltd, London.

[6] Jacob, R. J. K. and Karn, K. S. (2003). Eye tracking in Human-Computer Interaction and usability research: Ready to deliver the promises. In: The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, p. 573-605. Elsevier, Amsterdam.

[7] Jaimes, A. and Sebe, N. (2007). Multimodal human computer interaction: A survey. Computer Vision and Image Understanding, 1-2(108):116-134.

[8] Kaur, M., Tremaine, M., Huang, N., Wilder, J., Gacovski, Z., Flippo, F., and Mantravadi, C. S. (2003). Where is "it"? Event synchronization in gaze-speech input systems. In Proceedings of the Fifth International Conference on Multimodal Interfaces, p. 151-158, NY. ACM Press.

[9] Rayner, K. and Pollatsek, A. (1989). The Psychology of Reading. Prentice Hall, NJ.

[10] Welch, G. and Bishop, G. (2006). An Introduction to the Kalman Filter.

[11] Comaniciu, D. and Meer, P. (2002). Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603-619.

[12] Viola, P. and Jones, M. J. (2001). Rapid Object Detection using a Boosted Cascade of Simple Features. IEEE CVPR.

[13] Figueiredo, L. and Gomes, I., ESTG - Portugal. COGAIN, 2007.

[14] Fundação PT, Magic Eye.

[15] OpenCV, Open Source Computer Vision library.