User-centric Video Processing

Organizing community videos of a place into a structured, navigable graph. 3D transitions between videos are automatically computed from the video data.

In recent years, mobile devices have become abundant, and almost all are now capable of capturing video. Video sharing on community sites is popular too, with hours of footage uploaded every minute. The research community has recently been successful in developing tools to improve the capture, editing and browsing of photo collections, and companies are providing integrated services and experiences. However, comparatively little research has investigated video, as its added temporal dimension creates hard algorithmic and computational problems. With advances in algorithms and systems, there is a great opportunity to improve the capturing, editing and browsing of video for amateur users.

We aim to improve the quality and flexibility of capturing, editing and exploring video. We will automatically discover content links within and between videos, and use these links to build large interconnected graphs of videos. We call this user-centric video processing: it will make video easier for amateur users and, by exploiting inter- and intra-video content connections, will provide new empowering video experiences, from novel browsing to object removal. Finding these connections has challenged the computer vision and graphics communities for decades, and current solutions are not useful to everyday users: computation times are too long and interfaces are too complicated.
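Such a graph of videos can be sketched in a few lines. In this illustration the pairwise similarity scores are simply given as input, since computing them robustly from video content is exactly the research problem; the video ids, scores, and threshold below are all hypothetical.

```python
from itertools import combinations

def build_video_graph(similarity, threshold=0.5):
    """Link every pair of videos whose content similarity exceeds a
    threshold, yielding a navigable adjacency-list graph.
    `similarity` maps an unordered pair of video ids to a score in
    [0, 1]; how that score is computed (shared scenes, matched
    features) is the hard part this project targets."""
    videos = sorted({v for pair in similarity for v in pair})
    graph = {v: [] for v in videos}
    for a, b in combinations(videos, 2):
        score = similarity.get((a, b), similarity.get((b, a), 0.0))
        if score >= threshold:
            graph[a].append(b)
            graph[b].append(a)
    return graph

# Toy scores between four community videos of the same place.
scores = {("v1", "v2"): 0.8, ("v2", "v3"): 0.6, ("v3", "v4"): 0.1}
graph = build_video_graph(scores)
```

A browsing interface would then offer transitions along graph edges, e.g. from `v2` to either `v1` or `v3`, while `v4` remains unconnected.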

Video inpainting to remove a person from a crowded scene. Left: The person to be removed. Right top: Occluding person. Right bottom: Removed person.
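As a rough single-frame sketch of the principle behind such removal, the hole left by a removed object can be filled by diffusing surrounding pixel values inward. This toy NumPy version assumes a grayscale frame and a boolean mask; real video inpainting would also exploit temporal information, borrowing pixels from frames where the occluded background is visible.

```python
import numpy as np

def inpaint_frame(frame, mask, iterations=200):
    """Fill masked pixels by iterative neighbor averaging (diffusion).
    A deliberately simple stand-in for spatio-temporal inpainting.
    frame: 2D float array; mask: boolean array, True = pixel to fill."""
    result = frame.copy()
    result[mask] = 0.0
    for _ in range(iterations):
        # Average of the four axis-aligned neighbors (edge-padded).
        padded = np.pad(result, 1, mode="edge")
        neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
        result[mask] = neighbors[mask]  # update only the hole
    return result

# Synthetic example: a smooth gradient with a square hole cut out.
h, w = 32, 32
frame = np.linspace(0.0, 1.0, w)[None, :].repeat(h, axis=0)
mask = np.zeros((h, w), dtype=bool)
mask[10:20, 12:22] = True
filled = inpaint_frame(frame, mask)
```

On this smooth gradient the diffusion converges back to the original values; on real footage the result is only plausible where the background is locally smooth, which is why content from other frames matters.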

We will develop new algorithms and systems to solve these problems. With the help of Intel, they will exploit processing both on mobile devices and across cloud servers, making these new and useful video technologies accessible to all.

The goals of this project are three-fold:

  • Finding intra-video spatio-temporal relationships through scene analysis and similarity detection methods, to allow better video editing and bring new video tools to users,
  • Finding inter-video spatio-temporal relationships by discovering semantic links between videos, possibly combining centralized and distributed computing environments with many mobile devices of limited computational power to exploit user-created video databases,
  • Exploiting these intra- and inter-video relationships to improve upon and provide new capturing, editing, and browsing experiences that will increase the quality, flexibility and explorability of user-captured video.
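As one concrete illustration of the similarity detection mentioned in the first goal, a minimal sketch compares frames by normalized color histograms. This is a deliberately simple stand-in for the scene-analysis methods the project proposes; the bin count and synthetic test frames are illustrative only.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Per-channel intensity histogram, normalized to a unit vector."""
    hist = np.concatenate([
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]).astype(float)
    return hist / np.linalg.norm(hist)

def frame_similarity(a, b):
    """Cosine similarity of histograms: near 1.0 = similar content."""
    return float(color_histogram(a) @ color_histogram(b))

# Two dark frames should score high; a dark vs. a bright frame, low.
rng = np.random.default_rng(0)
dark_a = rng.integers(0, 50, size=(24, 24, 3))
dark_b = rng.integers(0, 50, size=(24, 24, 3))
bright = rng.integers(200, 256, size=(24, 24, 3))
```

Scores like these, computed within one video, can flag repeated scenes for editing tools; computed between videos, they are one possible ingredient for the semantic links of the second goal.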