RealityScan – An End-User Usable 3D Scanning Pipeline

Realistic 3D content that is automatically reconstructed from photographs will not be flawless in the near future. It is therefore essential to automatically evaluate the quality of reconstructed scenes and point out errors to the users. This is especially true for end-users who have not been trained in 3D reconstruction and are thus more likely to capture unsuitable images, which in turn produce flawed reconstructions. When reconstructing previously unknown scenes, a reconstruction must be evaluated without 3D ground truth. However, 2D ground truth is always available in the form of the photographs of the scene themselves: renderings of a perfect reconstruction should be identical to the input images.

Over the last year, we have worked on a project that realizes this idea: given a set of images, we separate it into a reconstruction and an evaluation set. In the next step, we feed the reconstruction set into a 3D reconstruction algorithm, render the resulting model from the viewpoints of the evaluation images, and compute the difference between renderings and evaluation images. This procedure can be seen in Figure 1. As a result, we obtain an error score for the whole model that can later be compared, e.g., to the scores of other reconstruction algorithms. Additionally, it is possible to take the created difference images (Figure 1c) and project them back into the scene to obtain a graphical visualization of the local reconstruction error, as shown in Figure 2. This can, e.g., be used to show users areas in which they need to take more photos in order to improve reconstruction quality. We experimentally analyzed our approach and evaluated how it relates to established error metrics that require 3D ground truth. Moreover, we implemented an online benchmark based on the same concept. Since the only precondition of our method is that the investigated reconstruction system can produce renderings from novel viewpoints, this benchmark allows us to directly compare image-based reconstruction and rendering systems. In the near future, we plan to go live with this benchmark and draw first conclusions from it.
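The evaluation procedure described above can be sketched in a few lines. The following is a minimal illustration, assuming images are NumPy arrays and treating reconstruction and rendering as black boxes; the split fraction, the mean-absolute-difference metric, and all function names are illustrative choices, not the exact ones used in the project.

```python
import numpy as np

def split_views(images, eval_fraction=0.2, seed=0):
    """Randomly separate the input images into a reconstruction
    set and a held-out evaluation set (fraction is illustrative)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_eval = max(1, int(len(images) * eval_fraction))
    eval_idx = set(order[:n_eval].tolist())
    recon_set = [im for i, im in enumerate(images) if i not in eval_idx]
    eval_set = [im for i, im in enumerate(images) if i in eval_idx]
    return recon_set, eval_set

def difference_image(rendering, photo):
    """Per-pixel difference between a rendering of the reconstructed
    model and the corresponding evaluation photograph (Figure 1c).
    Mean absolute difference is used here as a stand-in metric."""
    return np.abs(rendering.astype(np.float64) - photo.astype(np.float64))

def model_error_score(renderings, eval_photos):
    """Aggregate all difference images into one scalar error score
    for the whole model."""
    return float(np.mean([difference_image(r, p).mean()
                          for r, p in zip(renderings, eval_photos)]))
```

In a full pipeline, the reconstruction set would be passed to the reconstruction algorithm and `renderings` would be produced from the viewpoints of the evaluation images; the difference images themselves can additionally be projected back onto the model for the local error visualization.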

Another project we worked on in the last year is a massively parallel solver for Markov random fields. Since many problems in computer vision (e.g., optical flow, stereo vision, texture acquisition, and global structure from motion) and computer graphics (e.g., mesh segmentation) boil down to global discrete optimization, Markov random fields have become key tools in visual computing. Most often they are also the main bottleneck in a visual computing pipeline. Especially when applied to real-world data, they frequently reach sizes that cannot be handled efficiently by any of the existing solvers, because those solvers do not make full use of multi- or many-core systems. We developed a massively parallel algorithm that works on Markov random fields of arbitrary (potentially sparse and very large) structure, handles arbitrary energy term types, and can even deal with label costs. Our implementations based on CUDA and Intel’s TBB successfully harness the power of modern multi-/many-core systems such as GPUs or Intel’s Xeon Phi, and outperform state-of-the-art solvers by two orders of magnitude on very large datasets: For example, in our 3D reconstruction texture acquisition framework [Waechter et al. 2014], we managed to reduce the optimization time from almost one hour to about one minute without reducing the quality of the results (see Figure 3). This speed improvement is remarkable, especially considering that our new solver is generic – even more general than most solvers – and is not geared to that particular application. We believe that our solver brings execution times of core algorithms in visual computing, amongst them algorithms that we employ in our RealityScan pipeline, much closer to durations that are acceptable for end-users.
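To make the kind of optimization concrete: a pairwise Markov random field assigns each site a discrete label so as to minimize a sum of per-site (unary) costs and neighbour-disagreement (pairwise) costs. The sketch below defines this energy on a 4-connected grid with Potts pairwise terms and runs checkerboard iterated conditional modes, where all pixels of one parity have no neighbours of the same parity and could therefore be updated fully in parallel, e.g., on a GPU. This toy scheme only illustrates the parallelization idea; it is not the algorithm of our solver, which is generic and handles arbitrary structures and energy terms.

```python
import numpy as np

def mrf_energy(labels, unary, lam=1.0):
    """Energy of a 4-connected grid MRF with Potts pairwise terms:
    E(x) = sum_p unary[p, x_p] + lam * sum_{(p,q)} [x_p != x_q]."""
    h, w = labels.shape
    e = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    e += lam * (labels[:, 1:] != labels[:, :-1]).sum()
    e += lam * (labels[1:, :] != labels[:-1, :]).sum()
    return float(e)

def icm_checkerboard(unary, lam=1.0, iters=10):
    """Iterated conditional modes with checkerboard sweeps. Each
    half-sweep updates only pixels of one parity, whose grid
    neighbours all have the other parity and stay fixed, so the
    update is embarrassingly parallel."""
    h, w, n_labels = unary.shape
    labels = unary.argmin(axis=2)          # start from the unary optimum
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    parity_map = (ii + jj) % 2
    for _ in range(iters):
        for parity in (0, 1):
            # Pad with a sentinel label -1: border "neighbours" then add
            # the same constant to every label's cost, leaving argmin intact.
            pad = np.pad(labels, 1, constant_values=-1)
            nbs = np.stack([pad[:-2, 1:-1], pad[2:, 1:-1],
                            pad[1:-1, :-2], pad[1:-1, 2:]])   # (4, h, w)
            # cost[i, j, l] = unary + lam * (#neighbours disagreeing with l)
            cost = unary + lam * (nbs[..., None] != np.arange(n_labels)).sum(axis=0)
            new = cost.argmin(axis=2)
            labels = np.where(parity_map == parity, new, labels)
    return labels
```

For instance, with strong smoothness a single noisy pixel whose unary term prefers the "wrong" label is outvoted by its four neighbours and flips, lowering the total energy.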

Project Team

Principal Investigators
Prof. Dr.-Ing. Michael Goesele