Efficient Distributed Computing in the Visual Cloud

The cloud has traditionally focused on optimized processing within the data center. For visual computing, however, the cloud needs to expand into hybrid scenarios with processing distributed between mobile clients, the backend data center, and possibly cloud edge devices in between. The hardware architectures and optimization targets of these devices often vary dramatically, and algorithms need to be distributed across all of them. Our aim is to explore and experimentally develop new ways of developing software for such environments.

This project addresses these challenges on both the algorithmic and the software/hardware infrastructure level, in the context of typical use cases in the film and video industry, e.g., by contributing to and benefiting from the Dreamspace EU project. The scalable distributed rendering approach we developed is also described there.

For our research on global illumination, we use large ray streams to efficiently distribute the workload to GPUs and CPUs simultaneously. This allows us to take advantage of ray sorting and vectorization:

Each iteration of the integrator generates a wide stream of rays that is sorted according to material and then processed in parallel. This wavefront approach improves performance by 5%–15%, depending on the scene, and generalizes nicely beyond simple Path Tracing: we also used it for Bidirectional Path Tracing, Photon Mapping, and Vertex Connection and Merging (VCM).
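The binning step at the heart of this approach can be sketched as follows. This is a hypothetical, minimal illustration (names and data layout are ours, not the actual system): rays carry the id of the material they hit, and sorting the wide stream by material lets each bin be shaded coherently, which in the real renderer happens in parallel with SIMD.

```python
from collections import defaultdict

def wavefront_iteration(rays, shade_fns):
    """One wavefront iteration.

    rays: list of (ray_id, material_id) pairs from the current bounce.
    shade_fns: maps material_id -> shading function.
    """
    # Bin the wide ray stream by material (a counting/radix sort in practice).
    bins = defaultdict(list)
    for ray_id, material_id in rays:
        bins[material_id].append(ray_id)
    # Shade each coherent bin; every ray in a bin runs the same shader code,
    # so the loop body maps well to SIMD lanes and GPU warps.
    results = {}
    for material_id, ray_ids in bins.items():
        shader = shade_fns[material_id]
        for ray_id in ray_ids:
            results[ray_id] = shader(ray_id)
    return results
```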

The differences in hardware architectures are addressed by exploring new common abstractions and compiling from a single source to different hardware targets. We applied this in several contexts: to accelerate a core part of many lighting algorithms, we designed a portable acceleration structure for Photon Mapping in AnyDSL. Our new implementation is based on hash grids and works efficiently on CPUs and GPUs, e.g., taking advantage of AVX SIMD instructions on the CPU.
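The idea behind such a hash grid can be sketched as below. This is an illustrative sketch, not the AnyDSL implementation: photons are hashed by their quantized cell coordinates, so a radius query only visits the handful of neighboring cells instead of scanning all photons.

```python
import math
from collections import defaultdict

class HashGrid:
    """Toy hash grid over 3D points, as used for photon gathering."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, p):
        # Quantize a position to integer cell coordinates (the hash key).
        return tuple(math.floor(c / self.cell_size) for c in p)

    def insert(self, photon_pos):
        self.cells[self._cell(photon_pos)].append(photon_pos)

    def query(self, p, radius):
        """Return all photons within `radius` of point `p`."""
        r2 = radius * radius
        cx, cy, cz = self._cell(p)
        reach = math.ceil(radius / self.cell_size)
        found = []
        # Visit only the cells the query sphere can overlap.
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for q in self.cells.get((cx + dx, cy + dy, cz + dz), []):
                        if sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                            found.append(q)
        return found
```

On the CPU, the inner distance tests over a cell's photon list are exactly the kind of loop that can be mapped to AVX SIMD lanes.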

In collaboration with the compiler design lab of Prof. Hack, we are exploring the potential for auto-vectorization of traversal kernels. So far, we have successfully vectorized the traversal for packets of rays. Initial results show that our automatically vectorized code is already within 20% of the hand-vectorized version. We are currently improving and generalizing this result, exploring other traversal variants (e.g., hybrid single-ray/packet traversal) and shading operations, which would significantly benefit from the functional programming style of AnyDSL.
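The shape of a packet traversal step can be sketched with a slab test against a bounding box. This is a hypothetical illustration in plain Python (the function name and layout are ours): the per-ray inner loop is the part an auto-vectorizer maps onto SIMD lanes, one lane per ray in the packet.

```python
def packet_hits_aabb(origins, inv_dirs, box_min, box_max):
    """Slab test for a packet of rays against one AABB (BVH node).

    origins, inv_dirs: per-ray origin and reciprocal direction (x, y, z).
    Returns one hit flag per ray.
    """
    hits = []
    for o, inv_d in zip(origins, inv_dirs):  # conceptually: one SIMD lane per ray
        t_near, t_far = 0.0, float('inf')
        for axis in range(3):
            # Intersect the ray with the two slab planes of this axis.
            t0 = (box_min[axis] - o[axis]) * inv_d[axis]
            t1 = (box_max[axis] - o[axis]) * inv_d[axis]
            if t0 > t1:
                t0, t1 = t1, t0
            t_near = max(t_near, t0)
            t_far = min(t_far, t1)
        hits.append(t_near <= t_far)
    # In a packet traversal, the node is descended if ANY ray hits it.
    return hits
```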

We also explored the fine-grained shared virtual memory (SVM) architecture of newer Intel CPUs (e.g., Skylake). Using AnyDSL, we specialized image processing kernels to adaptively split the workload between the CPU (e.g., boundary regions) and the GPU, both working on the same shared data. We describe the algorithm and the shape of the kernel once and, from this single source, generate native code for both the CPU and the integrated GPU (via OpenCL). This approach improves performance by up to 40%. Research on distributed image processing and faster shading is ongoing.
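The workload split can be sketched as below. This is a hypothetical illustration in which both "device" paths are plain Python functions operating in place on the same buffer; in the real system, a single AnyDSL source is compiled to native CPU code and to OpenCL for the integrated GPU, and SVM makes the buffer visible to both without copies.

```python
def scale_row(image, y, factor):
    # Stand-in for an image processing kernel applied to one row.
    image[y] = [v * factor for v in image[y]]

def process_shared(image, factor, boundary=1):
    """Apply the kernel to a shared buffer, split between CPU and GPU paths."""
    height = len(image)
    # "CPU" path: irregular boundary rows (branch-heavy work stays on the CPU).
    for y in range(boundary):
        scale_row(image, y, factor)
    for y in range(height - boundary, height):
        scale_row(image, y, factor)
    # "GPU" path: regular interior rows, same shared buffer, no data transfer.
    for y in range(boundary, height - boundary):
        scale_row(image, y, factor)
    return image
```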

Our research has already led to two new HPC projects (Prothos and Metacca, funded by the BMBF) that started in January 2017 and run for three years (700 k€, plus additional funding for the group of Sebastian Hack).

Project team

Principal Investigator
Philipp Slusallek

Dr.-Ing. Richard Membarth