Real-time 3D Pose Estimation with a Monocular Camera Using Deep Learning and Object Priors

Ankit Dhall

September 2018

arXiv preprint, 2019

Type

Thesis

Abstract

We propose a complete pipeline that detects multiple object instances and simultaneously estimates their poses from a single image. We introduce a novel “keypoint regression” scheme with a cross-ratio term that exploits prior information about the object’s shape and size to regress specific feature points. A priori 3D information about the object is then used to establish 2D-3D correspondences and accurately estimate object positions at distances of up to 15 m. We present a detailed discussion of the results and an in-depth analysis of the pipeline. The pipeline runs efficiently on a low-power Jetson TX2 and is deployed as part of the perception pipeline of a real-time autonomous vehicle with a top speed of 54 km/h.
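
To illustrate why a cross-ratio term can encode an object prior, the sketch below computes the cross-ratio of four collinear points and checks that it is unchanged by perspective projection. This is a minimal illustration only, not the loss used in the thesis: the function names `cross_ratio` and `project`, the camera intrinsics, and the point coordinates are all assumptions made up for the example.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio of four collinear points: (|p1p3| * |p2p4|) / (|p2p3| * |p1p4|)."""
    def dist(a, b):
        return np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return (dist(p1, p3) * dist(p2, p4)) / (dist(p2, p3) * dist(p1, p4))

def project(points_3d, focal=800.0, cx=320.0, cy=240.0):
    """Pinhole projection with hypothetical intrinsics (camera frame, z forward)."""
    pts = np.asarray(points_3d, dtype=float)
    u = focal * pts[:, 0] / pts[:, 2] + cx
    v = focal * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)

# Four collinear 3D points along a slanted object edge (made-up values).
start = np.array([0.2, 0.3, 5.0])
direction = np.array([0.1, -0.3, 0.05])
offsets = [0.0, 1.0, 2.5, 4.0]
pts3d = np.array([start + t * direction for t in offsets])

# Project into the image and compare the cross-ratio before and after.
pts2d = project(pts3d)
cr_3d = cross_ratio(*pts3d)
cr_2d = cross_ratio(*pts2d)
print(cr_3d, cr_2d)
```

Because the cross-ratio of collinear points is preserved under projective transformations, the value measured on the 3D object should match the value measured on correctly located image keypoints, so a deviation between the two can serve as a geometric penalty during keypoint regression.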