Semantic image segmentation is an essential part of modern autonomous driving systems, because an accurate understanding of the scene around the vehicle is the key to navigation and motion planning. DeepLab v3+, an advanced semantic segmentation model based on convolutional neural networks, cannot exploit attention information, which leads to coarse segmentation boundaries. To improve semantic segmentation accuracy in autonomous driving scenarios, this paper proposes a segmentation model that combines low-level pixel information with channel and spatial information. An attention module inserted into the convolutional neural network extracts semantic-level information from the image, and richer features are obtained by learning the positional and channel information of the image. The unary potential is computed from the per-category scores output by the convolutional neural network, and the pairwise potential is obtained from the preliminary segmentation and the original input image, so that every pixel of the image is modeled by a fully connected conditional random field and the local details of the image are refined. The final segmentation result is obtained by iterative inference in the fully connected conditional random field. Compared with the existing DeepLab v3+ network, the improved model raises the key indicators mean intersection over union (mIoU) and mean pixel accuracy (mPA) by 1.07 and 3.34 percentage points, respectively. It segments objects more finely and better suppresses over-smoothing of boundary regions and implausible isolated islands.
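The two reported indicators can be illustrated with a minimal sketch that uses their common definitions. It assumes that mPA is the mean of per-class pixel accuracies and mIoU the mean of per-class intersection over union, both derived from a confusion matrix. The function names and the toy label maps are hypothetical, and the exact evaluation protocol used in the paper (for example, the handling of ignored labels) is not stated here.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Assumes every label is an integer in [0, num_classes).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_pixel_accuracy(cm):
    """Average over classes of (correct pixels / ground-truth pixels)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = np.diag(cm) / cm.sum(axis=1)
    # Classes absent from the ground truth give NaN and are skipped.
    return float(np.nanmean(per_class))

def mean_iou(cm):
    """Average over classes of intersection / union."""
    tp = np.diag(cm)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = tp / union
    # Classes absent from both maps give NaN and are skipped.
    return float(np.nanmean(per_class))

# Toy 2x3 label maps with three classes (illustrative data only).
truth = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
cm = confusion_matrix(truth, pred, num_classes=3)
print(cm)
print("mPA:", mean_pixel_accuracy(cm))
print("mIoU:", mean_iou(cm))
```

Both scores lie in [0, 1], so a gain quoted in percentage points corresponds to the difference between two such scores multiplied by 100.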