Advances in scene understanding: object detection, reconstruction, layouts, and inference
Date
01/07/2019
Author
Henderson, Paul Matthew
Abstract
The goal of scene understanding is to capture the full content of an image in a human-interpretable
representation. This must describe the different objects present, including
their attributes such as class, shape, and pose, as well as the relations between objects.
Moreover, the representation should be globally consistent across the entire image. In
this thesis, we consider four sub-tasks within scene understanding, and make contributions
to each.
When describing the content of an image, it is natural to start by detecting all the
objects that are present—that is, localising and classifying them. Our first contribution
is to show how to train a neural-network-based object class detector end-to-end
in a principled fashion, using the evaluation metric as the training loss, and using the
same pipeline at both training and test time. This is simpler and more elegant than
the traditional approach of using a surrogate loss, yet we show it achieves comparable
performance.
Once the location and class of an object are known, we can estimate its shape and
pose in 3D space. Our second contribution is a new approach to these tasks, which
supports training purely from 2D images—without 3D supervision, multiple views, or
annotations such as pose or keypoints. Moreover, this model is generative, and so allows
sampling new object shapes a priori.
To produce a globally consistent description of a scene, it is important to reason
over all objects simultaneously, rather than considering each individually. Our third
contribution is a probabilistic generative model over complete indoor scene layouts.
It models complex arrangements in 3D space, including high-order spatial relations
among furniture and other objects.
One common approach to generating predictions that are consistent over all objects
in a scene, or pixels in an image, is to formulate and solve a discrete energy minimisation
problem. The energy is defined as a sum over factors, and the factor structure
greatly affects which minimisation algorithms work well. Our fourth contribution is a
method that automatically selects a suitable algorithm to solve a given energy minimisation
problem. To do so, it learns to predict the best algorithm based on characteristics
of the problem instance.
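
As a rough illustration of what "an energy defined as a sum over factors" can look like, the sketch below builds a toy discrete energy over three binary variables, with one unary factor per variable and one pairwise factor per edge, and minimises it by exhaustive search. The variable count, costs, chain structure, and function names are all hypothetical and chosen only for illustration; they are not taken from the thesis, and exhaustive search is a stand-in for the specialised minimisation algorithms the thesis selects between.

```python
from itertools import product

# Hypothetical toy problem: three binary variables arranged in a chain.
NUM_LABELS = 2
NUM_VARS = 3

# Unary factors: unary[i][l] is the cost of giving variable i the label l.
unary = [
    [0.0, 1.0],
    [0.5, 0.2],
    [1.0, 0.0],
]

# Pairwise factors: a fixed penalty whenever two linked variables disagree.
edges = [(0, 1), (1, 2)]
SMOOTHNESS = 0.6


def energy(labels):
    """Total energy of a labelling: the sum of all unary and pairwise factors."""
    total = sum(unary[i][labels[i]] for i in range(NUM_VARS))
    total += sum(SMOOTHNESS for i, j in edges if labels[i] != labels[j])
    return total


def brute_force_minimise():
    """Return the labelling with the lowest energy by trying every labelling."""
    return min(product(range(NUM_LABELS), repeat=NUM_VARS), key=energy)


best = brute_force_minimise()
print(best, energy(best))
```

Exhaustive search costs NUM_LABELS ** NUM_VARS evaluations, so it is only feasible for tiny problems. Larger problems need algorithms that exploit the factor structure (for example, a chain like this one can be solved exactly by dynamic programming), which is why the choice of algorithm depends on the problem instance.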