Urban 3D scene understanding from images

Kundu, Abhijit

Files

KUNDU-DISSERTATION-2018.pdf (27.36 MB)

Author(s)

Kundu, Abhijit

Advisor(s)

Rehg, James M.

Associated Organization(s)

College of Computing

School of Interactive Computing

Collections

Theses and Dissertations

Permanent Link

http://hdl.handle.net/1853/61114

Abstract

Human vision is remarkably good at obtaining a structured representation of complex dynamic scenes, including the spatial layout of the scene, the organization of the scene into its constituent objects, and the support of each object. We also perceive the complete extent of the scene, even the parts that are occluded. For example, even when the part of the scene directly below a car is not visible, we infer that it is part of the road. This kind of structured and complete 3D scene understanding is very useful for applications such as autonomous driving. Our objective is to build a 3D scene representation of complex, real-world urban scenes from images alone, much as human vision does. The classic top-down "analysis-by-synthesis" approach offers an elegant account of this richness in human vision, but it is computationally expensive, and the resulting energy landscape is highly multi-modal and thus difficult to optimize. Combining top-down analysis with fast, discriminatively trained bottom-up predictors promises to solve this problem. However, even recent versions of this hybrid approach are still restricted to toy problems. We revisit the analysis-by-synthesis approach for complex real-world 3D scene understanding in light of advances in deep-learning methods and the availability of large-scale training data in the form of annotated images and 3D CAD models. In this thesis, we explore three different scene understanding frameworks with increasing richness of representation. The presented frameworks reason jointly about scene structure and semantic labels, along with the 3D orientation and position of object instances over time. We also demonstrate seamless integration of different constraints and prior knowledge into our model, and an effective fusion of measurements from multiple images in a video into a final representation of the scene. We evaluate these scene understanding frameworks on challenging real-world datasets of complex urban scenes.

Date Issued

2018-01-22

Resource Type

Text

Resource Subtype

Dissertation

Georgia Tech Library
