Algorithms for Low Cost Depth Imaging

"Physical model for Time-of-Flight depth cameras allowing camera-specific noise reduction"

  • "Exact Modelling of Time-of-Flight Cameras for Optimal Depth Maps", Mirko Schmidt and Bernd Jähne, International Conference on Computational Photography (ICCP) 2010
  • "Variational Image Denoising with Adaptive Constraint Sets", Frank Lenzen, Florian Becker, Jan Lellmann, Stefania Petra and Christoph Schnörr, Scale Space and Variational Methods (SSVM) 2011

"Evaluation of 2D and 2.5D optical (range) flow with synthetic data"

  • "Real Versus Realistically Rendered Scenes For Optical Flow Evaluation", Stephan Meister and Daniel Kondermann, ITG Conference on Electronic Media Technology 2011

Very cheap single-view depth imaging cameras, e.g. Time-of-Flight (ToF) cameras or Microsoft's Kinect system, are entering the mass consumer market. While a steadily increasing number of publications and startup companies address the hardware design of such systems, only a few approaches currently tackle the challenges of enhancing and analyzing the obtained 2.5D depth data. In general, the acquired images have a low spatial resolution and suffer from noise as well as technology-specific artifacts. The goal of this project is to provide algorithmic solutions to the entire depth imaging pipeline, ranging from preprocessing to depth image analysis.

The aim of this project is to focus on the "software side" of depth systems and to provide holistic algorithmic solutions to the imaging pipeline, ranging from preprocessing to image understanding. We organize the workload into three interconnected sub-projects: image enhancement, image analysis and benchmarking. In general, we assume that most "real world" application scenarios of depth systems require real-time processing of the entire pipeline. Hence, one focus of our work will be to provide algorithms capable of meeting this constraint. The project started in October 2010.

Anisotropic smoothing of a sequence of depth maps. Top: one frame of a sequence of depth maps taken with a ToF camera. Bottom: the corresponding frame from the smoothed sequence.

Subproject I: Enhancement

Depth Map Filtering, Scene Flow, Super-Resolution and Fusion, Segmentation: the first subproject covers preprocessing and enhancement algorithms. The low spatial resolution and low signal-to-noise ratio of depth cameras imply the need for filtering, edge recovery and super-resolution methods. Since most cameras also provide a gray-scale or color image, multi-modal image fusion techniques will be part of the preprocessing; a minimal example is sketched below. Highly accurate super-resolution from image sequences, as well as analysis and segmentation, will also be tackled in this project. Sufficient preprocessing is required to guarantee the robust and efficient image analysis considered in the second subproject (see below).
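
To illustrate the multi-modal fusion idea, the following is a minimal sketch of a joint (cross) bilateral filter in Python/NumPy: the registered gray-scale image guides the smoothing of the depth map, so depth edges that coincide with intensity edges survive. The function name, parameters and their defaults are our own illustrative choices, not the project's actual implementation.

```python
import numpy as np

def joint_bilateral_depth_filter(depth, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Denoise a depth map using a registered intensity image as guide.

    depth, guide: float arrays of identical shape (guide scaled to [0, 1]).
    Borders are handled by wrap-around (np.roll) for brevity.
    """
    out = np.zeros_like(depth, dtype=np.float64)
    weights = np.zeros_like(depth, dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Shift the whole image once per offset instead of looping
            # over pixels; this keeps the sketch short and vectorized.
            shifted_d = np.roll(np.roll(depth, dy, axis=0), dx, axis=1)
            shifted_g = np.roll(np.roll(guide, dy, axis=0), dx, axis=1)
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            w_range = np.exp(-((shifted_g - guide) ** 2) / (2.0 * sigma_r ** 2))
            out += w_spatial * w_range * shifted_d
            weights += w_spatial * w_range
    return out / weights
```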

Moreover, we will address the interconnections between the two sub-projects and plan a fine adjustment between the preprocessing and analysis methods. In this step, information about noise characteristics and other specifics of depth imaging systems will be taken into account explicitly.

A key aspect of our research is to utilize our internal knowledge of the depth system specifications to design specifically tailored methods which are able to compensate for systematic camera errors; a small illustration of such noise-aware processing follows below.
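
As a small illustration of noise-aware processing (assuming a generic continuous-wave ToF sensor, not the exact model of the ICCP 2010 paper cited above), the per-pixel depth noise can be predicted from the measured modulation amplitude and used for inverse-variance weighting, e.g. when fusing repeated measurements:

```python
import numpy as np

def tof_depth_variance(amplitude, k=1.0):
    """Per-pixel depth variance estimate for a continuous-wave ToF camera.

    To first order, the depth standard deviation is inversely proportional
    to the measured modulation amplitude; k lumps together modulation
    frequency, integration time and sensor constants and would come from
    a camera-specific calibration (illustrative default here).
    """
    amplitude = np.maximum(amplitude, 1e-6)  # guard against division by zero
    return (k / amplitude) ** 2

def variance_weighted_fusion(depths, amplitudes, k=1.0):
    """Fuse a stack of depth measurements (stacked along axis 0) by
    inverse-variance weighting, so low-amplitude (noisy) pixels
    contribute less to the result."""
    w = 1.0 / tof_depth_variance(amplitudes, k)
    return (w * depths).sum(axis=0) / w.sum(axis=0)
```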

A second goal of the first subproject is to formulate a functional that estimates scene flow and depth while simultaneously reducing noise and artifacts in both the depth maps and the intensity data. Finally, we are working on reducing the complexity and computational cost of the developed algorithms and on making them real-time capable by employing refined multi-grid approaches and implementations on parallel hardware architectures.
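
As an illustration of what such a joint functional could look like (in the spirit of classical range flow estimation, not necessarily the formulation developed in this project), one can combine brightness constancy, the range flow constraint and a smoothness prior:

```latex
E(u, v, w) = \int_\Omega
    \underbrace{(I_x u + I_y v + I_t)^2}_{\text{brightness constancy}}
  + \lambda \, \underbrace{(Z_x u + Z_y v + Z_t - w)^2}_{\text{range flow constraint}}
  + \alpha \left( |\nabla u|^2 + |\nabla v|^2 + |\nabla w|^2 \right) \mathrm{d}x
```

Here (u, v) is the lateral flow, w the depth velocity, I the intensity data, Z the depth map, and λ, α are regularization weights; additional data terms would account for the noise and artifact reduction.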

Kinect calibration process: left, RGB image; right, infrared channel.

Subproject II: Analysis

The second part of the project focuses on the analysis and interpretation of (low-resolution) depth images and sequences for "real world" applications.

Our aim is to create an algorithmic middle layer providing a set of generic tools which can be used as a basis for more complex tasks, such as real-time object recognition (including body parts like hands or heads) from partial depth views, object tracking and pose estimation. We plan to achieve these goals by designing application-generic feature descriptors which provide a robust multi-modal (depth + gray-scale or color) local encoding of the recorded data; a toy example is sketched below.
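
As a deliberately simple, hypothetical illustration of such a multi-modal local encoding (not the descriptors actually developed in the project), one could concatenate gradient orientation histograms computed on corresponding intensity and depth patches:

```python
import numpy as np

def multimodal_patch_descriptor(gray_patch, depth_patch, bins=8):
    """Toy multi-modal descriptor: an orientation histogram of intensity
    gradients concatenated with one of depth gradients, each L2-normalized.
    """
    def orientation_hist(patch):
        gy, gx = np.gradient(patch.astype(np.float64))
        mag = np.hypot(gx, gy)                 # gradient magnitude
        ang = np.arctan2(gy, gx)               # orientation in [-pi, pi]
        hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi),
                               weights=mag)
        norm = np.linalg.norm(hist)
        return hist / norm if norm > 0 else hist
    return np.concatenate([orientation_hist(gray_patch),
                           orientation_hist(depth_patch)])
```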

Kinect Calibration:

The Kinect system contains a standard RGB camera as well as an infrared camera combined with an infrared pattern projector to estimate the depth of a scene. While the quality of the depth map is quite good, the system does not provide a correct alignment of the depth and RGB data.

Therefore, we use the raw RGB and infrared data to calibrate the Kinect output into aligned multi-channel sequences; a calibration sketch follows the figures below.

Uncalibrated overlay of the depth and RGB channels of the Kinect
Calibrated overlay of the depth and RGB channels of the Kinect
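
A minimal sketch of such a calibration using OpenCV's standard checkerboard pipeline; the pattern geometry, the assumption of equal image resolutions, and all parameter values are illustrative, not the actual setup used:

```python
import cv2
import numpy as np

PATTERN = (9, 6)    # inner checkerboard corners per row and column (assumed)
SQUARE = 0.025      # square edge length in meters (assumed)

# 3D board coordinates of the corners, identical for every view.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

def calibrate_kinect(rgb_imgs, ir_imgs):
    """Estimate both cameras' intrinsics and the rigid transform between
    them from simultaneously captured gray-scale checkerboard views."""
    obj_pts, rgb_pts, ir_pts = [], [], []
    for rgb, ir in zip(rgb_imgs, ir_imgs):
        ok1, c1 = cv2.findChessboardCorners(rgb, PATTERN)
        ok2, c2 = cv2.findChessboardCorners(ir, PATTERN)
        if ok1 and ok2:              # keep only frames seen by both cameras
            obj_pts.append(objp)
            rgb_pts.append(c1)
            ir_pts.append(c2)
    size = rgb_imgs[0].shape[::-1]   # (width, height), assumed equal for both
    _, K_rgb, d_rgb, _, _ = cv2.calibrateCamera(obj_pts, rgb_pts, size, None, None)
    _, K_ir, d_ir, _, _ = cv2.calibrateCamera(obj_pts, ir_pts, size, None, None)
    # R, T map points from the IR (depth) camera frame into the RGB frame,
    # which is what is needed to reproject the depth map onto the RGB image.
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, rgb_pts, ir_pts, K_rgb, d_rgb, K_ir, d_ir, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return K_rgb, d_rgb, K_ir, d_ir, R, T
```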

Multi-Modal Range Flow:

Given the calibrated sequences, we investigate multi-modal depth flow algorithms. In our first approach, we use the RGB texture for an initial estimate of the x-y flow and then add a depth term; a sketch of this two-step approach follows the result figure below.

First results from our Range Flow algorithm on Time of Flight data. Top left: gray-scale image, top right: depth image, bottom left: vector coded flow in x-y direction, bottom right: color coded flow in z direction.
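
The two-step approach can be sketched as follows, with OpenCV's off-the-shelf Farneback flow standing in for the project's own x-y flow estimator (all parameter values are illustrative):

```python
import cv2
import numpy as np

def multimodal_range_flow(gray0, gray1, depth0, depth1):
    """gray0/gray1: uint8 gray-scale frames; depth0/depth1: float32 depth maps.

    Step 1 estimates the lateral (x-y) flow on the better-textured
    intensity data; step 2 derives the z-flow from the depth difference
    of each tracked point."""
    flow = cv2.calcOpticalFlowFarneback(
        gray0, gray1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = depth0.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    # Warp the second depth map back along the estimated flow ...
    warped = cv2.remap(depth1, xs + flow[..., 0], ys + flow[..., 1],
                       cv2.INTER_LINEAR)
    # ... so that the z-flow is the depth change of each tracked point.
    z_flow = warped - depth0
    return flow, z_flow
```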

Subproject III: Evaluation

One key aspect of the entire project is that we need to be able to evaluate the performance of our own and external algorithms.

Therefore, we plan to establish a publicly available set of reference images and sequences in combination with reference depth maps and benchmarking measures; two simple examples of such measures are sketched below. To guarantee reproducibility, we will also distribute the source code of our algorithms where possible.
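
As an example of simple benchmarking measures (illustrative choices, not necessarily those the benchmark will prescribe), the root-mean-square error and a bad-pixel ratio against a reference depth map could be computed as follows:

```python
import numpy as np

def depth_error_metrics(estimate, ground_truth, valid=None, bad_thresh=0.05):
    """Return (RMSE, bad-pixel fraction) between an estimated and a
    reference depth map; bad_thresh is in the unit of the depth maps
    (here an assumed 5 cm for metric data)."""
    if valid is None:
        valid = np.isfinite(ground_truth)   # ignore pixels without reference
    err = np.abs(estimate - ground_truth)[valid]
    rmse = np.sqrt(np.mean(err ** 2))
    bad = np.mean(err > bad_thresh)
    return rmse, bad
```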

Object for database on scanning table.
3D mesh of object.

Our approach to solve the ground truth problem is twofold:

First, we are using a high-resolution stereo system and a full 3D scanner to record a database of static scenes. The database will contain high-resolution reference gray-scale and depth images, as well as depth data from several camera types. For the object and body part recognition, tracking and pose estimation tasks, we are recording corresponding sequences with ground truth, as well as an object database with full 3D scans and partial depth views from different angles.

Second, we are investigating under which circumstances it is feasible to use realistic simulations based on our physical ToF model for algorithm evaluation.

The advantage is that, given an application with known image data characteristics, we can synthetically create data with perfectly known ground truth.
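
As a much-simplified stand-in for the full physical model, the following sketch simulates the standard four-bucket measurement of a continuous-wave ToF camera from a ground-truth depth map, adds Gaussian noise, and reconstructs depth from the noisy samples (all constants are illustrative):

```python
import numpy as np

C = 3e8        # speed of light in m/s
F_MOD = 20e6   # modulation frequency; 20 MHz is a typical ToF value

def simulate_tof(depth_gt, amplitude=1.0, offset=2.0, noise_std=0.01, seed=0):
    """Simulate the four phase images of a continuous-wave ToF camera and
    recover depth from them (standard four-bucket textbook model)."""
    rng = np.random.default_rng(seed)
    phase = 4.0 * np.pi * F_MOD * depth_gt / C          # true phase shift
    taps = [offset + amplitude * np.cos(phase + i * np.pi / 2.0)
            + rng.normal(0.0, noise_std, depth_gt.shape)
            for i in range(4)]
    # Phase (and hence depth) recovered from the four noisy samples.
    phase_hat = np.arctan2(taps[3] - taps[1], taps[0] - taps[2])
    return C * np.mod(phase_hat, 2.0 * np.pi) / (4.0 * np.pi * F_MOD)

# A synthetic depth ramp gives data with perfectly known ground truth:
gt = np.tile(np.linspace(0.5, 4.0, 320), (240, 1))
sim = simulate_tof(gt)
print("RMSE [m]:", np.sqrt(np.mean((sim - gt) ** 2)))
```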

In cooperation with the Heidelberg Collaboratory for Image Processing.