TY - GEN
T1 - Hierarchical semantic parsing for object pose estimation in densely cluttered scenes
AU - Li, Chi
AU - Bohren, Jonathan
AU - Carlson, Eric
AU - Hager, Gregory D.
N1 - Funding Information:
This work is supported by the National Science Foundation under Grant No. NRI-1227277. This work is also supported by the National Aeronautics and Space Administration under Grant No. NNX12AM45H.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/6/8
Y1 - 2016/6/8
AB - Densely cluttered scenes are composed of multiple objects that are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios, mainly because of objects with textureless surfaces and similar appearances, and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm that partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm is a generalized pooling scheme that constructs robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset that serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method achieves improved pose estimation performance on this new dataset compared with other state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=84977518532&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977518532&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2016.7487712
DO - 10.1109/ICRA.2016.7487712
M3 - Conference contribution
AN - SCOPUS:84977518532
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 5068
EP - 5075
BT - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
Y2 - 16 May 2016 through 21 May 2016
ER -