1. School of Automation, Southeast University, Nanjing 210096, Jiangsu, China
2. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, Jiangsu, China
3. Key Laboratory of Space Photoelectric Detection and Perception, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
[ "陈敏佳(1999-),女,江苏苏州人,硕士研究生,2021年于南京理工大学获得学士学位,主要从事三维视觉等方面的研究。E-mail:czz122421@163.com" ]
[ "盖绍彦(1979-),男,山东青岛人,博士,2008年于东南大学获得博士学位,现为东南大学自动化学院副教授,主要从事计算机视觉、模式识别、三维测量等方面的研究。E-mail:qxxymm@163.com" ]
[ "俞 健(1988-),男,安徽芜湖人,博士,2021年于东南大学获得博士学位,现为东南大学自动化学院助理研究员,主要从事三维测量、多视几何与位姿估计等方面的研究。E-mail:yujian@seu.edu.cn" ]
Published in print: 2024-03-25
Received: 2023-07-26
Revised: 2023-09-08
CHEN Minjia, GAI Shaoyan, DA Feipeng, et al. Object 6-DoF pose estimation using auxiliary learning[J]. Optics and Precision Engineering, 2024, 32(6): 901-914. DOI: 10.37188/OPE.20243206.0901.
Abstract: To accurately estimate the position and orientation of an object in the camera coordinate system under challenging conditions such as severe occlusion and scarce texture, while improving network efficiency and simplifying the network architecture, this paper proposes a 6-DoF pose estimation method that applies auxiliary learning to RGB-D data. The network takes the target object image patch, the corresponding depth map, and the CAD model as inputs. First, a dual-branch point cloud registration network predicts point clouds in both the model space and the camera space. Then, in the auxiliary learning network, the target object image patch and the Depth-XYZ map derived from the depth map are fed into a multi-modal feature extraction and fusion module, followed by coarse-to-fine pose estimation; the estimated result serves as a prior for optimizing the loss computation. Finally, during performance evaluation, the auxiliary learning branch is discarded, and only the outputs of the dual-branch point cloud registration network are used for 6-DoF pose estimation via point pair feature matching. Experimental results show that the proposed method achieves an AUC of 95.9% and an ADD-S < 2 cm of 99.0% on the YCB-Video dataset, an average ADD(-S) of 99.4% on the LineMOD dataset, and an average ADD(-S) of 71.3% on the LM-O dataset. Compared with existing 6-DoF pose estimation methods, the auxiliary-learning approach offers better model performance and a substantial improvement in pose estimation accuracy.
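The Depth-XYZ input mentioned in the abstract is, in common usage, the back-projection of a depth map into a per-pixel map of camera-space coordinates via the pinhole camera model. A minimal sketch of that conversion follows; the function name and the intrinsics parameters (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def depth_to_xyz(depth, fx, fy, cx, cy):
    """Back-project a depth map of shape (H, W), in metres, into a
    per-pixel XYZ map of shape (H, W, 3) under a pinhole camera model.

    x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth(v, u)
    """
    h, w = depth.shape
    # Pixel coordinate grids: u varies along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

Pixels with zero (invalid) depth map to the origin under this sketch; real pipelines typically mask them out before feeding the XYZ map to the network.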
Keywords: 6-DoF pose estimation; auxiliary learning; RGB-D image; 3D point cloud
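The ADD and ADD-S figures reported in the abstract follow the standard pose-accuracy metrics: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S (for symmetric objects) averages the distance to the nearest predicted point. A small sketch under those standard definitions (the function names are ours, and the brute-force nearest-neighbour search is for clarity, not speed):

```python
import numpy as np

def add_metric(model_pts, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between model points transformed by the
    ground-truth pose and by the predicted pose (non-symmetric objects)."""
    p_gt = model_pts @ R_gt.T + t_gt
    p_pred = model_pts @ R_pred.T + t_pred
    return np.mean(np.linalg.norm(p_gt - p_pred, axis=1))

def adds_metric(model_pts, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its
    closest predicted point (used for symmetric objects)."""
    p_gt = model_pts @ R_gt.T + t_gt
    p_pred = model_pts @ R_pred.T + t_pred
    # Pairwise distances, then the nearest predicted point for each GT point.
    d = np.linalg.norm(p_gt[:, None, :] - p_pred[None, :, :], axis=2)
    return np.mean(d.min(axis=1))
```

A pose is counted as correct when the metric falls below a threshold; "ADD-S < 2 cm" in the abstract is the fraction of test poses whose ADD-S is under 2 cm, and the AUC integrates that accuracy over a range of thresholds.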