Efficient Multi-View Video Streaming System

Live TV over IP - adaptively optimized transport vs. RTP

Multiview video (MVV) is an emerging form of visual-content representation enabled by advances in multi-camera capturing systems and image-based modeling and rendering algorithms. In its free-viewpoint form ("Free Viewpoint Video"), a recorded scene can be continuously rendered from different viewpoints (angles) under the interactive control of a user.

The key obstacles to the efficient deployment of multiview video systems are the lack of a unified system view and the diversity of consumption scenarios and usage environments. Specifically, the main system requirement is to transmit, decode and render multiview video in real time and without visually disturbing artifacts. In addition, the system needs to operate efficiently on different networks (fixed wired, fixed wireless, portable wireless and mobile channels) and devices (servers, desktop PCs as well as battery-powered Ultrabooks™, tablets and smartphones). Correspondingly, the scientific focus of our project is the design and implementation of an efficient system for multiview video streaming with the following distinct properties.

With respect to streaming, we focus on accurate selection of coding rates in adaptive HTTP video streaming. The main challenge in rate selection is that streaming performance suffers significantly from the delayed and noisy throughput estimates obtained at the application layer of the streaming client. As a result, streaming applications implement large receiver buffers in order to maintain continuous video rendering with acceptable quality and bandwidth utilization. This is a serious limitation for multiview streaming applications that aim to realize viewpoint adaptation with low delay. To achieve a lower delay, we propose a solution based on two components. First, we implement a server-side simulation of the streaming client's buffer, which provides low-delay feedback for the rate selection. Second, we design a hybrid rate-adaptation logic based on both the estimated throughput and the buffer information, which stabilizes the adaptive response to the dynamics of the transport layer. The results show that our approach improves the user-perceived video quality for dynamic streaming with a delay as low as the video-chunk duration.

Additionally, we explore new parameters for multiview video streaming to make full use of the opportunities offered by the presence of multiple views of the same scene. One such approach is to drop certain views from the transmission when the available bandwidth is limited, instead of reducing the overall video quality. The missing views can then be reconstructed with slightly lower quality on the receiver using the remaining views. To exploit this, we develop an algorithm that determines the optimal choice of views to transmit under a given bandwidth constraint and the number of views required on the client.
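The interplay of the two components can be illustrated with a minimal sketch. This is not the project's actual implementation; the function names, the buffer-scaling heuristic and all constants are assumptions chosen for illustration only.

```python
# Illustrative sketch of hybrid rate adaptation: a server-side buffer
# model supplies low-delay feedback, and rate selection combines it with
# the (noisy) application-layer throughput estimate. All names and the
# weighting heuristic below are assumptions, not the project's code.

CHUNK_DURATION = 1.0  # seconds of video per chunk (assumed)

def simulate_client_buffer(buffer_level, chunk_size_bits, throughput_bps):
    """Advance the server-side model of the client buffer by one chunk.

    Since the server knows which chunks it has delivered and when, it can
    track the client's buffer without waiting for delayed client reports.
    """
    download_time = chunk_size_bits / throughput_bps
    # The buffer drains during the download, then gains one chunk of playback.
    return max(0.0, buffer_level - download_time) + CHUNK_DURATION

def select_rate(bitrates, throughput_bps, buffer_level,
                target_buffer=4.0, safety=0.9):
    """Pick the highest coding rate the hybrid logic deems sustainable.

    bitrates       -- available coding rates in bit/s, ascending
    throughput_bps -- application-layer throughput estimate in bit/s
    buffer_level   -- simulated client buffer occupancy in seconds
    """
    # Below the target buffer level, scale the usable throughput down so
    # the buffer can refill; at or above target, use the full estimate.
    buffer_factor = min(1.0, buffer_level / target_buffer)
    budget = safety * throughput_bps * buffer_factor
    candidates = [r for r in bitrates if r <= budget]
    return candidates[-1] if candidates else bitrates[0]
```

The buffer term damps the reaction to throughput spikes: a momentary overestimate only raises the selected rate if the simulated buffer confirms there is slack to absorb a misprediction.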
The basic streaming research has been transferred to the SALT project; within this project, we now focus on the aspects specific to multiview video streaming.

In the area of multiview coding we concentrate our efforts on reaching real-time performance, since current state-of-the-art algorithms for encoding and decoding multiview video still struggle to reach acceptable speeds for more than two views. To solve this problem, we follow a distributed approach that makes use of the video-encoding capabilities of modern microprocessors and exploits the similarities between encoded standard video and encoded multiview video. This allows us to create a single standard-compliant multiview video from an arbitrary number of standard videos without having to re-encode any of the inputs.
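The structure of this approach can be sketched as follows. This is only a schematic illustration under strong assumptions: a stub stands in for the hardware video encoder, and `encode_view`, `merge_views` and `encode_multiview` are hypothetical names, not the project's API.

```python
# Schematic sketch of the distributed multiview-encoding pipeline:
# each view is encoded independently (in parallel) by a standard
# encoder, and the encoded per-view streams are then multiplexed
# frame-by-frame into one multiview stream without re-encoding.
# The "encoder" below is a stub; a real system would invoke the
# hardware video encoder of the CPU or GPU here.

from concurrent.futures import ThreadPoolExecutor

def encode_view(view_id, raw_frames):
    """Stub per-view encoder: tags each frame with its view id."""
    return [(view_id, frame_no) for frame_no, _ in enumerate(raw_frames)]

def merge_views(encoded_views):
    """Interleave per-view access units frame-by-frame, mimicking how
    encoded single-view streams can be multiplexed into one multiview
    stream without touching the compressed frame data."""
    merged = []
    for frame_units in zip(*encoded_views):
        merged.extend(frame_units)
    return merged

def encode_multiview(views):
    # Encode all views in parallel, one worker per view.
    with ThreadPoolExecutor() as pool:
        encoded = list(pool.map(encode_view, range(len(views)), views))
    return merge_views(encoded)
```

The key point the sketch captures is that the merge step only reorders already-encoded units; the expensive encoding work is done once per view and can run concurrently.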

Rendering quality of interpolated views

For multiview video rendering, we propose a real-time algorithm for the creation of virtual views from multiview video input material. Our algorithm does not require precomputed depth maps or other scene geometry information. Instead, the depth of each vertex of a virtual-view mesh is determined using a plane-sweep approach. We combine the depth estimation with median filtering to suppress false depth edges. We implement this algorithm by transferring most of the depth estimation and rendering complexity to the rendering pipeline of a GPU. In particular, our OpenCL-based depth estimation divides the process into a large number of small units such that computations can be performed in separate threads on a GPU. For efficient rendering we use a modern OpenGL shader pipeline which moves additional complexity to the GPU. Our results show that we can render virtual views at 30 to 50 frames per second with resolutions of up to 3600x3000. In this way, we achieve a performance improvement over state-of-the-art rendering algorithms by a factor of 10.
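The core of the plane-sweep idea, together with the median filtering, can be sketched in a simplified CPU form. The sketch assumes rectified grayscale views so that each depth hypothesis reduces to a horizontal disparity; the actual system evaluates such hypotheses per mesh vertex in parallel OpenCL kernels on the GPU, which NumPy array operations merely stand in for here.

```python
# Simplified plane-sweep depth estimation for two rectified grayscale
# views: each depth hypothesis corresponds to a horizontal disparity,
# and the per-pixel hypothesis with the lowest matching cost wins.
# A 3x3 median filter then suppresses false depth edges.
# This is an illustrative CPU sketch, not the project's OpenCL code.

import numpy as np

def plane_sweep_depth(left, right, disparities):
    """Return the per-pixel disparity (a proxy for depth) minimizing
    the absolute intensity difference between the two views."""
    h, w = left.shape
    cost = np.full((len(disparities), h, w), np.inf)
    for i, d in enumerate(disparities):
        if d == 0:
            cost[i] = np.abs(left - right)
        else:
            # Shift the right view by d pixels; compare where valid.
            cost[i, :, d:] = np.abs(left[:, d:] - right[:, :-d])
    best = np.argmin(cost, axis=0)
    return np.asarray(disparities)[best]

def median_filter3(depth):
    """3x3 median filter on the interior of the depth map, removing
    isolated outliers that would appear as false depth edges."""
    h, w = depth.shape
    out = depth.copy()
    windows = [depth[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out[1:-1, 1:-1] = np.median(np.stack(windows), axis=0)
    return out
```

Because every pixel's cost evaluation is independent, the inner loop maps naturally onto the many small work units the OpenCL implementation dispatches to separate GPU threads.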


Project team

Principal Investigator
Prof. Dr.-Ing. Thorsten Herfet

Former Principal Investigator
Dr. Goran Petrovic

Tobias Lange
Yongtao Shuai