Researchers at NVIDIA, the University of Washington, Stanford University, and the University of Illinois Urbana-Champaign have recently developed a Rao-Blackwellized particle filter for 6-D pose tracking, called PoseRBPF. The approach estimates the 3-D translation of an object together with the full distribution over its 3-D rotation. The paper describing the filter, pre-published on arXiv, will be presented at the upcoming Robotics: Science and Systems (RSS) conference in Freiburg, Germany.
Tracking the 6-D poses of objects in videos can enhance the performance of robots in a variety of tasks, including manipulation and navigation. Most existing techniques for object pose estimation predict a single estimate of the 6-D pose (i.e., xyz translation and 3-D orientation) of an object in each camera frame.
These methods have several limitations. For instance, they cannot determine the pose of partially or completely occluded objects. Moreover, in some situations object symmetries mean there is no single correct answer for the pose, which complicates the task further.
“It turns out that many objects in our everyday environments are symmetric, such as dinner plates, bowls, bottles, or cubes,” Arsalan Mousavian, one of the researchers who carried out the study, told TechXplore. “These objects do not have a unique 3-D orientation since they look identical from many different viewing angles. To circumvent these problems, we proposed a method to track the full distribution of the pose of an object (as opposed to single pose estimate) through time. This distribution accurately captures the uncertainty in the object’s pose, and tracking over time helps disambiguate the pose of the object. For example, if an object is visible at some point and becomes occluded, the method can recover the pose by tracking it from previous frames.”
PoseRBPF, the approach developed by Mousavian and his colleagues, can track the full distribution over the 6-D pose (i.e., 3-D translation and 3-D orientation) of a given object relative to a particular camera. Probability distributions over 6-D space are highly complex, so unless they are represented efficiently, they cannot be updated in real time. To keep the tracked distributions both accurate and tractable, the researchers decoupled their estimates of 3-D object translation and 3-D object orientation using a technique called Rao-Blackwellized particle filtering.
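The decoupling can be written compactly. The following is a sketch of the standard Rao-Blackwellized factorization, not a formula quoted from the paper: T is the translation, R the rotation, Z the image observation, and the particle count N, particle translations T_i and weights w_i are notation assumed here for illustration.

```latex
% Factor the 6-D posterior into a translation part and a rotation part
% that is conditioned on the translation.
P(T, R \mid Z) = P(T \mid Z)\, P(R \mid Z, T)

% Particle approximation (assumed notation): translation is sampled,
% while each particle i carries a full discrete rotation distribution.
P(T, R \mid Z) \approx \sum_{i=1}^{N} w_i \, \delta(T - T_i)\, P(R \mid Z, T_i)
```

Only the 3-D translation has to be sampled. The rotation distribution is evaluated in full for each sample, which is what keeps the number of particles manageable.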
“In Rao-Blackwellized particle filtering, the object translations are represented by samples, or particles, and the orientation is discretized into small chunks of close to 200,000 possible orientations,” Mousavian explained. “We used a deep learning technique to pre-compute embeddings that represent what the object might look like in all these orientations and under arbitrary lighting conditions. Taking advantage of highly parallelized NVIDIA GPU processing, our approach can then compare the current camera image to these pre-computed embeddings for all possible orientations and update the distribution in real time.”
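The comparison Mousavian describes can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the function names, the use of cosine similarity, the softmax temperature, and the small random codebook (standing in for roughly 200,000 learned embeddings) are all assumptions made for illustration, and it runs on the CPU in plain Python rather than on a GPU.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rotation_distribution(code, codebook, temperature=0.1):
    """Turn code-to-codebook similarities into a probability per orientation.

    The softmax and its temperature are illustrative assumptions.
    """
    sims = [cosine_similarity(code, entry) for entry in codebook]
    top = max(sims)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in sims]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical codebook: 200 discretized orientations, 32-D embeddings.
random.seed(0)
codebook = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(200)]

# Simulate an observation: the embedding of orientation 42 plus small noise.
observed = [v + 0.05 * random.gauss(0.0, 1.0) for v in codebook[42]]

probs = rotation_distribution(observed, codebook)
best = max(range(len(probs)), key=probs.__getitem__)
```

For a symmetric object, several codebook entries would score almost equally well, so the probability mass would spread across them instead of collapsing onto one orientation.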
At each time step, the approach devised by the researchers updates the set of particles by sampling from the previous particle set, following a model that predicts how the object and camera might move from one step to another. This process allows PoseRBPF to accumulate information over time, which in turn leads to more robust and accurate pose estimates.
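The update step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the function name is hypothetical, the motion model is a plain random walk with Gaussian noise, and the weights are given by hand rather than computed from an image.

```python
import random


def propagate_and_resample(particles, weights, motion_noise=0.01, rng=random):
    """Resample translation particles by weight, then apply a motion model.

    particles: list of (x, y, z) translation hypotheses.
    weights:   non-negative weight for each particle.
    The random-walk motion model is an assumption made for this sketch.
    """
    n = len(particles)
    # Draw n particles with probability proportional to their weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # Perturb each coordinate to model object and camera motion.
    return [
        tuple(c + rng.gauss(0.0, motion_noise) for c in particle)
        for particle in resampled
    ]


rng = random.Random(0)
particles = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (5.0, 5.0, 5.0)]
weights = [0.5, 0.5, 0.0]  # the third hypothesis has no support
new_particles = propagate_and_resample(particles, weights, rng=rng)
```

Hypotheses with zero weight are never drawn, so the particle set concentrates on translations that agree with recent images while the added noise keeps it able to follow motion.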
Visualization of rotation distributions. The lines represent the probability for rotations that are higher than a threshold. The length of each line is proportional to the probability of that viewpoint. As can be seen, PoseRBPF naturally represents uncertainties due to various kinds of symmetries, including rotational symmetry of the bowl, mirror symmetry of the foam brick, and discrete rotational symmetries of the T-LESS objects on the right.
Illustration of the computation for the conditional rotation likelihood by codebook matching. Left) Each particle crops the image based on its translation hypothesis. The RoI for each particle is resized and the corresponding code is computed using the encoder. Right) The rotation distribution P(R|Z, T) is computed from the distance between the code for each hypothesis and those in the codebook.
For each particle, the orientation distribution is estimated conditioned on translation estimation, while the translation estimation is evaluated with the corresponding RoIs.
By conditioning orientation estimation on translation, the tracking system proposed by Mousavian and his colleagues can effectively represent complex uncertainty distributions over the space of 6-D object poses. Their framework also provides uncertainty information about a given object’s pose, which could be particularly useful in robot manipulation tasks. Moreover, the system was trained using synthetic and non-annotated data, so it can spare researchers the time and resources otherwise spent on annotating data.
“Our method combines the classical Bayesian estimation framework of particle filtering with deep learning,” Mousavian said. “It thereby brings together well established estimation techniques developed over the last decades and the power of recent deep learning approaches. As a result, PoseRBPF can robustly estimate poses of arbitrary objects, including symmetric ones.”
The researchers evaluated their approach on two 6-D pose estimation datasets: the YCB-Video dataset and the T-LESS dataset. PoseRBPF achieved state-of-the-art results, outperforming other pose estimation techniques. In the future, the particle filter developed by Mousavian and his colleagues could improve the performance of robots in a variety of settings, for instance by enhancing their object manipulation capabilities.
“Moving forward, we will investigate how to use the uncertainty estimates provided by PoseRBPF in the context of object manipulation,” Mousavian said. “Another avenue for future work is to actively move the camera so as to reduce uncertainty in an object’s pose, such as looking at an object from a different viewpoint to resolve ambiguity.”