• Traditional Simultaneous Localization  and Mapping (SLAM) approaches build maps based on geometric primitives such as points, lines or planes [1].
  • A remaining challenge is to represent and incorporate semantics in the for use in navigation, interaction, planning, etc.
  • We aim to combine semantics and mapping structure in a coherent representation of the environment.
  • The semantics are added by recognizing the object instances and registering those consecutively in the estimated map.


  • We seek to establish detected objects as possible landmarks while simultaneously localizing and building a map. Neither the type nor location of the objects are known in advance.
  • Objects can be recognized using a deep neural network toolbox [2]. But such classifiers are tailored for detection from a single image rather than association across multiple images.
  • We demonstrate a novel use of the nonparametric statistical methods, the Mann-Whitney statistic [3], to successfully associate objects in two consecutive frames using a confidence interval and large sample approximation.
  • We then employ an unsupervised clustering method, HDBSCAN [4], to establish the existence of objects in the map. We also present a method to execute clustering intermittently, which reduces computational complexity.



  • Object Detection:
  • We retrieve regions and class of detected objects using a Deep Convolutional Neural Network. These regions are later associated with their regions in the last frame. We used Mobilenets and Faster-RCNN, but any region-based detector can be used.
  • Non-Parametric Data Association (NPDA):
  • The Mann-Whitney statistic test compares depth distributions to determine if detected objects in consecutive frames are similar and likely correspond to the same object. It can maintain association for an object as long as the object stays in view.
  • The Mann Whitney test statistic is given by, U=|{d_i^t (x),d_j^(t-1) (y)}|, s.t.d_i^t (x) 〖<d〗_j^(t-1) (y)
  • Object Back-Projection using SLAM Pose:
  • Each detected object is back-projected to the map using estimated location information.
  • Intermittent Clustering Process:
  • We use intermittent clustering process with NPDA to estimate the probable locations of the object in the map simultaneously.



Future Work

  • In future plans, we intend to incorporate loop closure using existing semantics and reinforcement learning methods.
  • We will demonstrate the loop closure problem in a 2d simulated environment and solve the problem using a deep reinforcement learning agent. A simulated 2D case is shown here.  Preliminary results with low noise can successfully determine if there should be a loop within a one grid square neighborhood with 98% accuracy.

  • Next, we will expand this into a real world environments using our object localization method with SLAM.


1.C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Transactions on Robotics, vol. 32, pp. 1309–1332, Dec 2016.

2.M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.

3.H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The annals of mathematical statistics , pp. 50–60, 1947.

4.R. J. Campello, D. Moulavi, and J. Sander, “Density-based clustering based on hierarchical density estimates,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 160–172, Springer, 2013.


This work was supported by the Advanced Driver Assistance System (ADAS) group at Texas Instruments (TI) in Dallas, TX.