TY - GEN
T1 - Hybrid Frame-Event Solution for Vision-Based Grasp and Pose Detection of Objects
AU - Wang, Kyra
AU - Yang, Sihan
AU - Kumar, Deepesh
AU - Thakor, Nitish
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
N2 - A key challenge in object manipulation using prosthetic hands is grasp detection and pose estimation, especially in cluttered scenes. Vision-based robotic grasping solutions typically use only conventional frame-based video cameras with high spatiotemporal redundancy, which is unsuitable for mobile platforms like prostheses with low processing power. On the other hand, while event-based dynamic vision sensors (DVS) have low spatiotemporal redundancy, their low resolution results in poor object segmentation and detection performance. In this paper, we outline a novel hybrid solution inspired by the two-streams hypothesis of the neural processing of vision, utilizing both a frame-based video camera and a DVS to counter the pitfalls of both systems. By using computationally efficient object detection methods on the frame-based camera to highlight regions of interest (ROIs) for the DVS, we are able to perform pose estimation by computing the smallest axis of DVS events generated in the ROI. The proposed approach allows us to rapidly determine the required wrist rotation and a suitable grasp type to pick up objects using a prosthetic hand. Results on a laptop show that our method matches the accuracy of a conventional solution that employs only a frame-based video camera, while achieving 77.29% faster inference speed.
AB - A key challenge in object manipulation using prosthetic hands is grasp detection and pose estimation, especially in cluttered scenes. Vision-based robotic grasping solutions typically use only conventional frame-based video cameras with high spatiotemporal redundancy, which is unsuitable for mobile platforms like prostheses with low processing power. On the other hand, while event-based dynamic vision sensors (DVS) have low spatiotemporal redundancy, their low resolution results in poor object segmentation and detection performance. In this paper, we outline a novel hybrid solution inspired by the two-streams hypothesis of the neural processing of vision, utilizing both a frame-based video camera and a DVS to counter the pitfalls of both systems. By using computationally efficient object detection methods on the frame-based camera to highlight regions of interest (ROIs) for the DVS, we are able to perform pose estimation by computing the smallest axis of DVS events generated in the ROI. The proposed approach allows us to rapidly determine the required wrist rotation and a suitable grasp type to pick up objects using a prosthetic hand. Results on a laptop show that our method matches the accuracy of a conventional solution that employs only a frame-based video camera, while achieving 77.29% faster inference speed.
KW - Computer vision
KW - Grasping
KW - Neuromorphic engineering
KW - Pose estimation
KW - Prosthetic hand
UR - http://www.scopus.com/inward/record.url?scp=85094165396&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094165396&partnerID=8YFLogxK
U2 - 10.1109/CASE48305.2020.9216970
DO - 10.1109/CASE48305.2020.9216970
M3 - Conference contribution
AN - SCOPUS:85094165396
T3 - IEEE International Conference on Automation Science and Engineering
SP - 1383
EP - 1388
BT - 2020 IEEE 16th International Conference on Automation Science and Engineering, CASE 2020
PB - IEEE Computer Society
T2 - 16th IEEE International Conference on Automation Science and Engineering, CASE 2020
Y2 - 20 August 2020 through 21 August 2020
ER -