Depth from Gaze

Publication - IEEE ICIP 2018

Tzu-Sheng Kuo, Kuang-Tsu Shih, Sheng-Lung Chung, and Homer H. Chen, “Depth from Gaze,” IEEE International Conference on Image Processing (ICIP 2018), pp. 2910–2914, 2018.

Figure 1. Typical eye trackers estimate the point a user is gazing at on a 2D monitor (left). In this paper, we present an algorithm and model to estimate the depth of the user's gaze point in the 3D real world with conventional eye trackers (right).

For decades, exploiting human gaze for 2D graphical user interfaces has been a popular research topic in human-computer interaction, gaming, and psychology. Recently, the rising interest in augmented reality (AR) and virtual reality (VR) has further fueled the development of effective means for interacting with a 3D visual world. This paper investigates the estimation of depth in 3D space using an eye tracker. Specifically, the 3D depth of a gazed object with respect to the viewer is estimated from the gaze information obtained by an eye tracker. We consider the eye tracker because it has become a popular component of see-through devices, where it tracks the eye movement of the user and thereby controls the presentation of images. In this application scenario, besides revealing how the user navigates the visual world, the gaze information can be used to compute the 3D depth of each scene point the user looks at during visual navigation. A sparse depth map of the attended visual stimuli can thus be obtained. A critical step toward this goal is the depth estimation of a gazed object.

We believe it is possible to address the depth estimation problem using gaze information because the eyeballs rotate when gazing at objects at different depths. However, the depth of a gazed object in the context of this work differs from the depth of gaze at a particular time instant: the gaze position, measured as the intersection of the visual axes of the two eyes, varies over time even when the same point is being gazed at. In this paper, we propose a depth-from-gaze method applicable to typical indoor interactions with depths ranging from 0.65 m to 2 m. The estimation is achieved by modeling the temporal variation of gaze as Gaussian noise and by processing the gaze data over a fixation time interval. Furthermore, a Gaussian model is developed to determine the minimal distance between two depths that remain statistically distinguishable when acquired by an eye tracker.
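The paper's exact noise model and calibration procedure are given in the full text. As a rough illustration of the underlying geometry and statistics, the sketch below shows how averaging Gaussian-noisy vergence angles over a fixation interval yields a depth estimate, and how a confidence bound on the averaged angle translates into a minimal statistically distinguishable depth gap. All numerical values here (interpupillary distance, noise level, sampling rate, confidence level) are assumptions chosen for illustration, not figures from the paper.

```python
import numpy as np

# --- Assumed parameters, not taken from the paper ---
IPD = 0.063               # interpupillary distance in meters (assumed)
SIGMA = np.deg2rad(0.5)   # assumed per-sample Gaussian noise std on vergence angle
FS = 60                   # assumed eye-tracker sampling rate in Hz

def vergence_to_depth(alpha, ipd=IPD):
    """Depth of a fixated point from the binocular vergence angle,
    assuming the point lies on the midline between the eyes."""
    return (ipd / 2.0) / np.tan(alpha / 2.0)

def depth_to_vergence(d, ipd=IPD):
    """Inverse mapping: vergence angle for a midline point at depth d."""
    return 2.0 * np.arctan((ipd / 2.0) / d)

def estimate_depth(vergence_samples):
    """Average the noisy vergence samples over a fixation interval, then
    convert the mean angle to depth. Averaging N i.i.d. Gaussian samples
    reduces the noise std by a factor of sqrt(N)."""
    return vergence_to_depth(np.mean(vergence_samples))

def min_distinguishable_gap(d, n_samples, sigma=SIGMA, z=1.96):
    """Smallest depth increment beyond d whose vergence change exceeds the
    z-sigma bound on the difference of two averaged estimates (a two-sided
    test on the difference of two sample means)."""
    threshold = z * sigma * np.sqrt(2.0 / n_samples)
    alpha = depth_to_vergence(d)
    # A farther point has a smaller vergence angle.
    return vergence_to_depth(alpha - threshold) - d

# Example: simulate a 0.5 s fixation on a point at 1.5 m depth.
rng = np.random.default_rng(0)
n = int(0.5 * FS)
true_alpha = depth_to_vergence(1.5)
samples = true_alpha + rng.normal(0.0, SIGMA, size=n)
print(f"estimated depth: {estimate_depth(samples):.3f} m")
print(f"min distinguishable gap at 1.5 m: {min_distinguishable_gap(1.5, n):.3f} m")
```

Because the vergence angle flattens with distance, the same angular confidence bound maps to a larger depth gap at 2 m than at 0.65 m, which matches the intuition that depth-from-gaze accuracy degrades with distance.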

Figure 2. My poster presented at ICIP 2018 in Athens.