Proceedings of the
9th International Conference of Asian Society for Precision Engineering and Nanotechnology (ASPEN2022)
15 – 18 November 2022, Singapore

Segmentation for Grasping: An Approach toward Autonomous Table Clearing

Ka-Shing Chung1,a, Marcelo H Ang Jr1, Wei Lin2, Haiyue Zhu2, Joel Short2 and Pey Yuen Tao2

1Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117575, Singapore

2SIMTech, A*STAR, 2 Fusionopolis Way, Singapore 138634, Singapore


Autonomous clearing of food trays and crockery at hawker centres involves robotic tasks such as item recognition, grasp point estimation, and grasp execution. Assuming there are no dynamic obstacles in the manipulator workspace, we focus on the machine intelligence required for the first two tasks. Our problem statement is as follows: given an RGBD view of a cluttered scene consisting of one or more known objects at rest on a flat surface, determine a feasible grasp pose for each of these objects.
Recent approaches treat the whole pipeline as one black box and predict grasp poses directly from the RGBD input. However, the pipeline is intricate and contains many sub-tasks that merit deeper exploration. Moreover, related work that succeeds with end-to-end training merely outputs the highest-ranked grasp poses of any reachable object, without any semantic concept of the item being picked up. This lack of scene understanding ultimately inhibits optimization of the grasping algorithm, as there is no way to deliberately select a particular object for manipulation.
In our work, we break grasp pose determination down into several components and solve them individually. First, we parse the input scene by passing it through a convolutional neural network trained for instance segmentation. The network outputs an image mask and a depth mask for each object detected in the scene, as well as the object class; we assume a database of known objects is available. Next, we use the object masks to project a partial point cloud, which is registered to a complete point cloud of the corresponding object in our library; registration yields the transformation that aligns the two point clouds. Finally, we apply the same transformation to the grasp pose stored in the library, producing a grasp pose for the object as seen in the initial scene. Our approach allows the user to select the object to be grasped, and also lays the groundwork for an automated object selection strategy in the future.
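Two of the steps above can be sketched briefly: back-projecting the masked depth pixels into a partial point cloud, and transferring the library grasp pose through the alignment transformation. The sketch below is a minimal NumPy illustration, not the authors' implementation; the pinhole intrinsic matrix K, the helper names, and the use of 4x4 homogeneous pose matrices are our assumptions, and the registration step itself (e.g. ICP) is omitted.

```python
import numpy as np

def backproject_mask(depth, mask, K):
    """Lift masked depth pixels into a partial 3-D point cloud (camera frame).

    depth : (H, W) depth image in metres
    mask  : (H, W) boolean instance mask from the segmentation network
    K     : (3, 3) pinhole camera intrinsic matrix
    Returns an (N, 3) array of 3-D points.
    """
    v, u = np.nonzero(mask)          # pixel coordinates inside the mask
    z = depth[v, u]                  # depth at those pixels
    x = (u - K[0, 2]) * z / K[0, 0]  # standard pinhole back-projection
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def transfer_grasp(T_align, grasp_pose_lib):
    """Map a library grasp pose into the observed scene.

    T_align        : (4, 4) transform aligning library model to partial cloud
    grasp_pose_lib : (4, 4) grasp pose stored with the library model
    Returns the (4, 4) grasp pose in the camera/scene frame.
    """
    return T_align @ grasp_pose_lib
```

In practice the partial cloud returned by `backproject_mask` would be fed to a registration routine to estimate `T_align`, after which `transfer_grasp` yields an executable pose for the selected object.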

Keywords: Robotic grasping, Instance Segmentation.
