Autonomous driving semantic segmentation with convolution neural networks

Zhong-yu WANG; Xian-yang NI; Zhen-dong SHANG

doi:10.3788/OPE.20192711.2429

Information Science | Updated: 2020-08-13

- Optics and Precision Engineering, 2019, Vol. 27, No. 11, pp. 2429-2438
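- Author affiliations:
  1. School of Instrumentation Science and Optoelectronic Engineering, Beihang University, Beijing 100191
  2. School of Mechatronics Engineering, Henan University of Science and Technology, Luoyang, Henan 471023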
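- About the authors:
  - Zhong-yu WANG (1963-), male, from Luoyang, Henan; Ph.D., professor. He received his bachelor's and master's degrees from Hefei University of Technology in 1985 and 1988, respectively, and his Ph.D. from Huazhong University of Science and Technology in 1996. His research focuses on opto-mechatronic technology and instruments. E-mail: mewan@buaa.edu.cn
  - Xian-yang NI (1994-), male, from Langfang, Hebei; master's student. He received his bachelor's degree from Beihang University in 2017. His research focuses on semantic segmentation and neural networks. E-mail: overflow010@buaa.edu.cn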
- Funding: Supported by the Beijing Natural Science Foundation (No. 3172020)
- CLC number: TP391
- Received: 2019-05-06; accepted: 2019-07-23; print publication: 2019-11-15
Cite this article: Zhong-yu WANG, Xian-yang NI, Zhen-dong SHANG. Autonomous driving semantic segmentation with convolution neural networks[J]. Optics and Precision Engineering, 2019, 27(11): 2429-2438. DOI: 10.3788/OPE.20192711.2429.

Abstract (Chinese version)

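Semantic image segmentation is an essential part of modern autonomous driving systems, because accurate understanding of the scene around the vehicle is the key to navigation and motion planning. The popular convolutional-neural-network-based semantic segmentation model DeepLab v3+ cannot make effective use of attention information, which leads to problems such as rough segmentation boundaries. To improve the accuracy of semantic image segmentation in autonomous driving scenes, this paper proposes a semantic segmentation network that fuses low-level pixel information with channel and spatial information. Attention modules inserted into the convolutional neural network extract semantic-level image information, and richer features are obtained by learning the positional and channel information of the image. Unary potentials are computed from the per-class scores output by the convolutional neural network, and pairwise potentials are obtained from the preliminary segmentation map and the original image, so that a fully connected conditional random field can model every pixel of the image and refine its local details; the fully connected conditional random field then produces the final segmentation result through iteration. On the Cityscapes dataset, compared with DeepLab v3+, the improved model raises the key metrics mean intersection over union and mean pixel accuracy by 1.07 and 3.34 percentage points, respectively. It segments objects more finely, largely resolves the problem of rough segmentation boundaries, and effectively suppresses over-smoothing in boundary regions and unreasonable isolated islands.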
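The abstract says only that the inserted attention modules learn channel and positional (spatial) information. As an illustration, the following is a minimal pure-Python sketch assuming a CBAM-style module (Woo et al., ECCV 2018): channel attention computed from average- and max-pooled channel descriptors passed through a shared two-layer MLP, followed by spatial attention computed by a convolution over the channel-wise average and max maps. It is not claimed to be the authors' exact module; the function names and the weights `w1`, `w2` and `kernel` are hypothetical, and in a real network these weights would be learned.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a C x H x W feature map (nested lists).

    w1 (R x C) and w2 (C x R) are hypothetical weights of a shared two-layer
    MLP (biases omitted for brevity).
    """
    C = len(fmap)
    # Squeeze each channel to one number with average and max pooling.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(map(max, ch)) for ch in fmap]

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]  # ReLU
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    a, m = mlp(avg), mlp(mx)
    scale = [sigmoid(a[c] + m[c]) for c in range(C)]  # one weight per channel
    return [[[v * scale[c] for v in row] for row in fmap[c]] for c in range(C)]


def spatial_attention(fmap, kernel):
    """CBAM-style spatial attention.

    kernel is a hypothetical 2 x k x k weight tensor (k odd) applied, with zero
    padding, to the channel-wise average map and channel-wise max map.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg_map = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < H and 0 <= x < W:
                        s += kernel[0][di][dj] * avg_map[y][x] + kernel[1][di][dj] * max_map[y][x]
            att[i][j] = sigmoid(s)  # one weight per spatial position
    return [[[fmap[c][i][j] * att[i][j] for j in range(W)] for i in range(H)] for c in range(C)]


def cbam(fmap, w1, w2, kernel):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(fmap, w1, w2), kernel)


if __name__ == "__main__":
    random.seed(0)
    C, H, W, R, k = 4, 5, 5, 2, 3
    fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(R)]
    w2 = [[random.gauss(0, 0.5) for _ in range(R)] for _ in range(C)]
    kernel = [[[random.gauss(0, 0.2) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    out = cbam(fmap, w1, w2, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # 4 5 5
```

Every attention weight lies strictly between 0 and 1, so the module only rescales features, emphasizing informative channels and positions relative to the rest. Applying channel attention before spatial attention follows the ordering that the CBAM authors reported working best.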

Abstract

Semantic image segmentation is an essential part of modern autonomous driving systems, because accurate understanding of the scene around the vehicle is the key to navigation and motion planning. The existing state-of-the-art semantic segmentation model based on convolutional neural networks, DeepLab v3+, cannot make effective use of attention information, which leads to rough segmentation boundaries. To improve the accuracy of semantic image segmentation in autonomous driving scenarios, this paper proposes a segmentation model that combines low-level pixel information with channel and spatial information. By inserting attention modules into the convolutional neural network, semantic-level image information is extracted, and richer features are obtained by learning the positional and channel information of the image. The unary potentials are computed from the per-class scores output by the convolutional neural network, and the pairwise potentials are obtained from the preliminary segmentation and the original input image, so that every pixel of the image can be modeled by a fully connected conditional random field (CRF) and the local details of the image can be refined. The final semantic segmentation result is obtained from the fully connected CRF through iterative inference. Compared with the existing DeepLab v3+ network on the Cityscapes dataset, the improved model raises the key metrics mean intersection over union (mIoU) and mean pixel accuracy (mPA) by 1.07 and 3.34 percentage points, respectively. It segments objects more finely and better suppresses both the excessive smoothing of boundary regions and unreasonable isolated islands.
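The abstract does not give the exact form of the potentials. The following is a minimal sketch assuming the standard fully connected CRF of Krähenbühl and Koltun (NIPS 2011): unary potentials are the negative log of the CNN's softmax probabilities, the pairwise term uses a Potts label compatibility with an appearance (bilateral) kernel over position and colour plus a smoothness kernel over position, and inference runs parallel mean-field updates starting from the CNN's preliminary segmentation. That paper makes each update linear in the number of pixels with permutohedral-lattice filtering; this sketch sums over all pixel pairs by brute force, so it only suits tiny images. The function name `dense_crf` and every default parameter value are hypothetical placeholders, not the authors' settings.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def dense_crf(scores, image, n_iters=5, w_bilateral=10.0, theta_alpha=80.0,
              theta_beta=13.0, w_gaussian=3.0, theta_gamma=3.0):
    """Naive mean-field inference for a fully connected CRF with a Potts model.

    scores: H x W x L per-class scores (logits) from the CNN.
    image:  H x W x 3 colour values of the input image.
    Returns an H x W map of label indices.
    """
    H, W, L = len(scores), len(scores[0]), len(scores[0][0])
    pix = [(i, j) for i in range(H) for j in range(W)]
    n = len(pix)

    # Unary potential: negative log of the CNN's softmax probability.
    probs = [softmax(scores[i][j]) for i, j in pix]
    unary = [[-math.log(max(p, 1e-12)) for p in pr] for pr in probs]

    # Pairwise kernel for every pixel pair (O(n^2)): bilateral term on
    # position + colour, plus a Gaussian smoothness term on position only.
    K = [[0.0] * n for _ in range(n)]
    for a in range(n):
        ia, ja = pix[a]
        for b in range(a + 1, n):
            ib, jb = pix[b]
            d_pos = (ia - ib) ** 2 + (ja - jb) ** 2
            d_col = sum((u - v) ** 2 for u, v in zip(image[ia][ja], image[ib][jb]))
            k = (w_bilateral * math.exp(-d_pos / (2 * theta_alpha ** 2) - d_col / (2 * theta_beta ** 2))
                 + w_gaussian * math.exp(-d_pos / (2 * theta_gamma ** 2)))
            K[a][b] = K[b][a] = k  # K[a][a] stays 0: no message from a pixel to itself

    Q = [list(p) for p in probs]  # start from the preliminary (CNN) segmentation
    for _ in range(n_iters):
        new_Q = []
        for a in range(n):
            msg = [sum(K[a][b] * Q[b][c] for b in range(n)) for c in range(L)]
            total = sum(msg)
            # Potts compatibility: label c is penalised by the mass on all other labels.
            energy = [unary[a][c] + total - msg[c] for c in range(L)]
            new_Q.append(softmax([-e for e in energy]))
        Q = new_Q

    labels = [[0] * W for _ in range(H)]
    for a, (i, j) in enumerate(pix):
        labels[i][j] = max(range(L), key=lambda c: Q[a][c])
    return labels


if __name__ == "__main__":
    scores = [[[2.0, 0.0], [0.0, 0.3], [2.0, 0.0]]]  # middle pixel weakly prefers class 1
    image = [[[100, 100, 100], [100, 100, 100], [100, 100, 100]]]  # uniform colour
    print(dense_crf(scores, image, n_iters=0))  # [[0, 1, 0]]  (plain CNN argmax)
    print(dense_crf(scores, image))             # [[0, 0, 0]]
```

On this toy input the middle pixel's weak preference for class 1 is overridden by its two confident, identically coloured neighbours, a small-scale instance of the kind of isolated-island removal the abstract describes.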
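Both reported metrics can be computed from a confusion matrix: per-class IoU is TP / (TP + FP + FN) and per-class pixel accuracy is TP / (TP + FN), each averaged over classes. Because both are fractions, the reported gains are absolute differences (percentage points) rather than relative increases. The sketch below assumes rows index the ground truth, an ignore label of 255 (a common convention for Cityscapes training IDs), and that classes absent from the ground truth are skipped; the abstract does not state the authors' exact evaluation protocol, and the helper names are hypothetical.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count (ground-truth, predicted) label pairs over two H x W label maps."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            if g == ignore_index:  # pixels without a valid label are not scored
                continue
            cm[g][p] += 1
    return cm


def miou_mpa(cm):
    """Return (mIoU, mPA) from a confusion matrix whose rows are ground truth."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fn == 0:                           # class absent from ground truth: skip it
            continue
        ious.append(tp / (tp + fp + fn))
        accs.append(tp / (tp + fn))
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious), sum(accs) / len(accs)


if __name__ == "__main__":
    gt = [[0, 0, 1, 1]]
    pred = [[0, 1, 1, 1]]
    print([round(x, 4) for x in miou_mpa(confusion_matrix(gt, pred, 2))])  # [0.5833, 0.75]
```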
