Perceptual Rendering of Immersive Displays

Constant demand for higher quality pushes display manufacturers to keep increasing the capabilities of new display devices, e.g., high spatial and temporal resolution as well as a high dynamic luminance range. These developments impose significant requirements on the quality provided by computer graphics (CG) techniques. At the same time, the human visual system has a number of limitations. For example, it can perceive very fine spatial details only in the foveal region, and high temporal resolution is crucial only in certain situations, e.g., for quickly moving objects. The discrepancy between the high quality demand on the display side and the inherent limitations of the human visual system raises the question of whether the quality provided by CG techniques should always match the capabilities of displays perfectly, or whether it should rather be governed by the capabilities of the human visual system. Recently, eye tracking has become a commodity technology: even low-cost eye trackers provide accurate gaze estimates at high refresh rates and can be integrated into head-mounted displays. Such solutions can provide useful information about the quality requirements in different regions of a display, and allow rendering to be optimized according to the ever-changing attentional focus, or even in a predictive manner.

In this project we aim to leverage the capabilities of current eye tracking systems to guide rendering for new display technologies. We investigate computational techniques that track gaze location, eye movements, and focal distance, and use this information to match the quality shown on display devices with the capabilities of the human visual system. This approach has a twofold benefit. First, it saves resources that would otherwise be spent producing information invisible to the observer; e.g., rendering high-resolution images in peripheral regions can be avoided. As a result, rendering devices can become more lightweight and energy efficient. Second, such an approach enables resources to be reallocated in a predictive manner to improve image quality in crucial places, e.g., where the observer is predicted to focus next. We consider various display devices, ranging from regular high-resolution 2D screens to newly emerging head-mounted displays (HMDs) used for augmented and virtual reality. The latter raise additional issues, e.g., limits on total weight, but also the additional motion of the head.

In our first work, we have explored means to exploit the limited human visual field to increase rendering speed. More specifically, we have contributed a concept for combining foveated rendering with optimizations regarding the HMD’s lens astigmatism [IEEE VR’16]. Astigmatism is an optical distortion that cannot be removed in software; inside the HMD, only the central area can be perceived with high acuity, while the image becomes increasingly blurred with distance from the center. We have proposed a gaze-contingent sampling method that renders with higher image quality in the center and with lower quality in the peripheral regions. Based on a standard sampling map optimized only for lens astigmatism (Fig. 1, left) and a real-time estimate of the currently foveated region (Fig. 1, middle), our method calculates a final sampling map by taking the per-pixel minimum of the lens-astigmatism map and the gaze-contingent one (Fig. 1, right). A prototype system implementing this method, based on an Oculus Rift DK2 with an integrated PUPIL eye tracker, has achieved improvements in rendering speed of up to 20%.
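
The combination step can be sketched as follows. This is a minimal Python/NumPy illustration, not the implementation from the paper: the map resolution, the Gaussian falloff standing in for the foveated-region estimate, and all names and parameters are assumptions.

import numpy as np

def combined_sampling_map(astigmatism_map, gaze_xy, sigma=0.25):
    # astigmatism_map: 2D array in [0, 1], precomputed once for the HMD lens
    # gaze_xy: current gaze position in normalized [0, 1] screen coordinates
    # sigma: width of the foveal falloff (illustrative value)
    h, w = astigmatism_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs / (w - 1)
    ys = ys / (h - 1)
    # Radial falloff around the tracked gaze point, standing in for the
    # real-time estimate of the foveated region (Fig. 1, middle).
    d2 = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2
    gaze_map = np.exp(-d2 / (2.0 * sigma ** 2))
    # Final sampling map (Fig. 1, right): per-pixel minimum, so a region is
    # sampled densely only where both the lens and the current gaze allow it.
    return np.minimum(astigmatism_map, gaze_map)

The resulting map can then drive the per-region sampling density used by the renderer.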

In another work, we have proposed a method that exploits two further effects occurring in head-mounted displays [ACM VRST’16]: a lens defect whereby, depending on how far the gaze moves from the center, parts of the screen towards the edges are no longer visible to the user, and a perceptual effect whereby, when the user looks in one direction, large parts of the screen in the opposite direction fall outside the visual field. Our method calculates these invisible areas in real time, skips rendering for them, and instead reuses the pixel colors from the previous frame (Fig. 2). We have further introduced a one-time calibration routine to measure both effects for a particular user and HMD. We have demonstrated that this approach can achieve a speed-up of up to 2x. While the calibration currently takes about two minutes, future work could develop a generic model for a specific HMD that works well for most users, with an optional, short calibration for fine-tuning.
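
A possible structure of this per-frame decision is sketched below in Python. The function and parameter names are hypothetical, and the two boolean visibility maps stand in for the quantities obtained from the one-time calibration.

import numpy as np

def compose_frame(render_visible, prev_frame, lens_visibility, gaze_visibility):
    # prev_frame: H x W x 3 color buffer from the previous frame
    # lens_visibility: boolean H x W map of pixels still visible through the
    #                  lens at the current gaze eccentricity (from calibration)
    # gaze_visibility: boolean H x W map of pixels inside the visual field for
    #                  the current gaze direction (from calibration)
    # render_visible:  callable that shades only the masked pixels and returns
    #                  an N x 3 array of colors for them
    visible = lens_visibility & gaze_visibility
    frame = prev_frame.copy()
    # Invisible pixels keep last frame's colors; only visible ones are shaded.
    frame[visible] = render_visible(visible)
    return frame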

In a third work, we have presented a computational model to predict users' spatio-temporal visual attention for graphical user interfaces [ACM CHI’16, best paper honourable mention award]. Like existing models of bottom-up visual attention in computer vision, our model does not require any eye tracking equipment. Instead, it predicts attention solely from information available to the interface, specifically the users' input as well as the UI components they interact with. We have demonstrated that our model predicts attention maps more accurately than state-of-the-art methods. As a next step, we plan to bring this model to HMD interfaces and, to this end, generalize it to other tasks and input modalities more common in VR environments (such as mid-air gestures), as well as explore the use of eye tracking to complement or replace the other input modalities.
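
To give a rough idea of the kind of interface signal such a model operates on (and not to reproduce the model from the paper), the sketch below builds a spatial attention map from the latest input position and recently used UI components; all names, weights, and parameters are assumptions.

import numpy as np

def input_driven_attention_map(shape, cursor_xy, ui_elements,
                               sigma_cursor=40.0, sigma_ui=80.0):
    # shape: (H, W) of the interface in pixels
    # cursor_xy: (x, y) of the most recent input event
    # ui_elements: list of ((x, y), recency_weight) for interacted components
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Attention mass around the current input position.
    att = np.exp(-((xs - cursor_xy[0]) ** 2 + (ys - cursor_xy[1]) ** 2)
                 / (2.0 * sigma_cursor ** 2))
    for (ex, ey), weight in ui_elements:
        # Components interacted with more recently contribute more strongly.
        att += weight * np.exp(-((xs - ex) ** 2 + (ys - ey) ** 2)
                               / (2.0 * sigma_ui ** 2))
    return att / att.max()  # normalized attention map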

In a fourth work, we address the problem of latency in gaze-contingent rendering systems. The problem is most severe during fast eye movements (i.e., saccades), when the prediction of the current gaze direction lags behind the true gaze direction. As a result, the gaze-contingent rendering does not provide images that match the requirements of the human visual system. For example, in the case of foveated rendering, the high-quality image region does not match the actual position at which the user is gazing. This quality mismatch is visible and hampers the adoption of gaze-contingent techniques. To address this problem, we suggest a new way of updating images in gaze-contingent rendering during saccades. Instead of rendering according to the current gaze prediction coming from the eye tracker, our technique predicts where the saccade is likely to end and provides an image for the new fixation location as soon as the prediction is available (Fig. 4, left). While the quality mismatch during the saccade remains unnoticed due to saccadic suppression, a correct image for the new fixation is provided before the fixation is established. To enable such updates, we have derived a model for predicting saccade landing positions and demonstrated how it can be used in the context of gaze-contingent rendering to reduce the influence of system latency on perceived quality. We consider both a personalized model and an averaged model derived from many observers. We have validated our technique in a series of experiments for various combinations of display frame rate and eye tracker sampling rate (Fig. 4, right).
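
As an illustration of how the landing position of an ongoing saccade could be extrapolated from its first gaze samples, consider the following sketch. The polynomial fit, the expected saccade duration, and all names are assumptions for exposition and do not correspond to the model derived in this work.

import numpy as np

def predict_landing_position(samples, expected_duration_ms):
    # samples: list of (t_ms, x, y) gaze samples recorded since saccade onset
    #          (at least three samples are needed for the fit below)
    # expected_duration_ms: expected saccade duration, e.g., from a
    #          personalized or population-averaged model of saccade dynamics
    t = np.array([s[0] for s in samples], dtype=float)
    xy = np.array([[s[1], s[2]] for s in samples], dtype=float)
    # Project the observed motion onto the (approximate) saccade direction.
    direction = xy[-1] - xy[0]
    direction = direction / (np.linalg.norm(direction) + 1e-9)
    displacement = (xy - xy[0]) @ direction
    # Fit a smooth profile to the observed displacement and extrapolate it
    # to the expected end of the saccade.
    coeffs = np.polyfit(t, displacement, deg=2)
    final_displacement = np.polyval(coeffs, expected_duration_ms)
    return xy[0] + final_displacement * direction

In such a scheme, the renderer would place the high-quality region at the predicted landing position as soon as the estimate is available, rather than at the lagging gaze sample reported by the eye tracker.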

Project Team

Principal Investigators
Dr. Andreas Bulling

Dr. Piotr Didyk

Dr. Karol Myszkowski