Hardware Accelerated Object Recognition

A fundamental problem in computer vision is identifying the type or specific instance of individual objects within a physical space. Object recognition is broadly applicable to video tracking, intelligent systems, content-based image indexing, and semantic tagging. However, object recognition is challenging, and few practical implementations exist beyond those with highly constrained inputs (e.g. eye and gesture tracking, optical character recognition, manufacturing quality control). Our focus is on developing an object recognition system for real-world environments with minimally constrained lighting conditions and camera pose, and with densely clustered objects. This system is based on a novel hardware-accelerated design that combines 2D and 3D recognition with 3D contextual information extracted from the video stream of a commodity monocular camera.

To illustrate the challenges associated with object recognition in a practical application, consider a hypothetical augmented reality maintenance tool in a semiconductor factory. A single step in a maintenance procedure for a photolithography machine requires the technician to identify and close 17 valves in several locations. Computer-guided identification of the correct valves is problematic. For example, the appearance of a valve can vary with changes in orientation, camera pose, and lighting conditions. Further, two nearby valves may be indistinguishable without additional spatial context, or a valve may be difficult to locate due to occlusion by other components. Existing applications in this space avoid the need for object recognition by registering a high-resolution 3D model of the machine to the viewpoint of the technician using head tracking and fiducial markers. However, markers are impractical in topologically complex machines, and accurate, high-resolution models may be unavailable due to IP issues or in cases where the component layout varies between instances of the same machine model.

Our goal is to achieve both high run-time performance and high recognition accuracy in applications of this type. To reach this goal, our system will incorporate monocular 3D localization and mapping, 3D segmentation based on application-specific geometric constraints, and hardware-accelerated, context-aware object recognition. The system will employ machine learning techniques with time-budget constraints for application-independent run-time optimization. Further, we are developing a proof-of-concept augmented reality guided-learning application based on this system to evaluate its efficacy in a modern semiconductor manufacturing facility.


Project team

Principal Investigator
Mario Fritz

Gregory Johnson
Daniel McCulley
Daniel Pohl