3D object detection based on fusion of point cloud and image by mutual attention

CHEN Jun-ying; BAI Tong-yao; ZHAO Liang

doi:10.37188/OPE.20212909.2247

Information Science | Updated: 2021-10-08

- 3D object detection based on fusion of point cloud and image by mutual attention
- Optics and Precision Engineering, 2021, Vol. 29, No. 9, pp. 2247-2254
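- Affiliation:
  
  School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055
- Author biographies:
  
  CHEN Jun-ying (1980-), female, born in Fengzhen, Inner Mongolia; Ph.D., associate professor, master's supervisor, and visiting scholar at the University of New South Wales. She received her master's degree in 2004 and her Ph.D. in 2010, both from Xi'an Jiaotong University. She is currently a faculty member of the School of Information and Control Engineering, Xi'an University of Architecture and Technology, and her research focuses on computer vision and machine learning. E-mail: chenjy@xauat.edu.cn
  
  BAI Tong-yao (1993-), male, born in Hanzhong, Shaanxi; master's student. His research focuses on object detection algorithms.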
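- Funding:
  
  National Natural Science Foundation of China (Nos. 51209167, 61803293); Natural Science Foundation of Shaanxi Province (No. 2019JM-474); Xi'an Science and Technology Plan Project (No. 2020KJRC0055)
- DOI: 10.37188/OPE.20212909.2247
  CLC number: TP391
- Received: 2021-03-09
  
  Revised: 2021-04-24
  
  Print publication date: 2021-09-15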

CHEN Jun-ying, BAI Tong-yao, ZHAO Liang. 3D object detection based on fusion of point cloud and image by mutual attention[J]. Optics and Precision Engineering, 2021, 29(09): 2247-2254. DOI: 10.37188/OPE.20212909.2247.

Abstract

To use image information to assist point cloud data in improving the accuracy of 3D object detection, the image feature space and the point cloud feature space must be adaptively aligned and fused. This paper proposes a deep learning network for 3D object detection based on adaptive fusion of multimodal features. First, the point cloud is partitioned into even voxels, a feature representation of each voxel is learned from the features of the points it contains, and a 3D sparse convolutional neural network extracts the point cloud features. In parallel, a ResNet-like neural network extracts the image features. Next, a mutual attention module adaptively aligns the image features with the point cloud features, yielding point cloud features enhanced by the image features. Finally, a region proposal network (RPN) and a multitask learning network for classification and regression are applied to the enhanced features to perform 3D object detection. Experiments on the KITTI 3D object detection dataset show that the average precision for car detection is 88.76%, 77.63%, and 76.14% at the easy, moderate, and hard difficulty levels, respectively. The proposed method effectively fuses image and point cloud information and improves the accuracy of 3D object detection.
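The abstract does not give the formulation of the mutual attention module, so the following is only a minimal sketch of one plausible ingredient: generic scaled dot-product cross attention in which point cloud (voxel) features act as queries and image features act as keys and values, followed by a residual addition. The function name `cross_attention_fuse`, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and all dimensions are assumptions made for illustration and are not taken from the paper. Only the image-to-point-cloud direction is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(voxel_feats, image_feats, w_q, w_k, w_v):
    """Enhance voxel features with image features (hypothetical sketch).

    voxel_feats: N x Cp, one feature vector per non-empty voxel (queries)
    image_feats: M x Ci, one feature vector per image location (keys, values)
    w_q: Cp x D, w_k: Ci x D, w_v: Ci x Cp projection matrices (assumed)
    Returns (fused, attn) with shapes N x Cp and N x M.
    """
    q = matmul(voxel_feats, w_q)          # N x D
    k = matmul(image_feats, w_k)          # M x D
    v = matmul(image_feats, w_v)          # M x Cp
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # D x M
    scores = matmul(q, k_t)               # N x M similarity scores
    # Each voxel gets a probability distribution over image locations.
    attn = [softmax([s / scale for s in row]) for row in scores]
    attended = matmul(attn, v)            # N x Cp image context per voxel
    # Residual addition keeps the original point cloud feature.
    fused = [[p + a for p, a in zip(p_row, a_row)]
             for p_row, a_row in zip(voxel_feats, attended)]
    return fused, attn


def random_matrix(rows, cols, rng, scale=1.0):
    """Random Gaussian matrix standing in for learned weights or features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, M, CP, CI, D = 4, 6, 8, 10, 5  # toy sizes, chosen arbitrarily
voxel_feats = random_matrix(N, CP, rng)
image_feats = random_matrix(M, CI, rng)
w_q = random_matrix(CP, D, rng, 0.1)
w_k = random_matrix(CI, D, rng, 0.1)
w_v = random_matrix(CI, CP, rng, 0.1)

fused, attn = cross_attention_fuse(voxel_feats, image_feats, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # rows and columns of the fused features
print(len(attn), len(attn[0]))    # rows and columns of the attention map
```

In this sketch each row of `attn` sums to 1, so every voxel receives a weighted average of projected image features. A real network would learn the projections, and it could also attend in the opposite direction, but those details are not stated in the abstract.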

