MEI Ruo-heng, MA Hui-min. Occlusion image data generation system[J]. Optics and Precision Engineering, 2021, 29(05): 1136-1144. DOI: 10.37188/OPE.20212905.1136.
Abstract
Current datasets are inadequate for systematically evaluating object detection algorithms under occlusion, and some real-world data are difficult to acquire. To address both problems, this paper proposes an occlusion image data generation system that generates occluded images together with their annotations, and uses it to build an occlusion image dataset, MOCOD (More than Common Object Dataset). For system construction, a scene and global management module, a control module, and a data processing module are designed to generate and process the data. For data generation, pixel-level annotations of opaque objects are produced from a stencil-ID post-processed image, and pixel-level annotations of translucent objects are produced by sampling the 3D temporal space with ray marching. The occlusion rate of each target object is then computed from the generated annotations and assigned to an occlusion level. Experiments show that the system annotates instance-segmentation-level data efficiently, with an average annotation time of approximately 0.07 s per image. The generated annotations provide ten occlusion levels, giving a finer occlusion-level division and higher annotation accuracy than other datasets. The annotation of translucent occluders introduced by the system extends the occlusion types covered and makes the dataset more complete for evaluating occlusion. Overall, the system builds occlusion datasets efficiently, and the more accurate annotations of the resulting dataset allow a better assessment of the bottlenecks and performance of object detection algorithms under occlusion.
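The step from pixel-level annotations to an occlusion level can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the occlusion rate is defined or how the ten levels are divided, so the sketch assumes that the rate is the hidden fraction of an object's unoccluded silhouette and that the levels are ten equal-width bins. The names `occlusion_rate`, `occlusion_level`, `visible_ids`, and `full_mask` are hypothetical, and only the opaque case with one integer ID per pixel is covered.

```python
from typing import List

IdMap = List[List[int]]    # one integer object ID per pixel, 0 = background
Mask = List[List[bool]]    # True where the object would appear if nothing hid it


def occlusion_rate(visible_ids: IdMap, full_mask: Mask, object_id: int) -> float:
    """Return the hidden fraction of the object's unoccluded silhouette.

    Assumption: visible_ids is the per-pixel ID image of the composed scene,
    and full_mask is the silhouette of the same object rendered alone.
    """
    full = 0
    visible = 0
    for id_row, mask_row in zip(visible_ids, full_mask):
        for pixel_id, in_silhouette in zip(id_row, mask_row):
            if in_silhouette:
                full += 1
                if pixel_id == object_id:
                    visible += 1
    if full == 0:
        raise ValueError("object has an empty silhouette")
    return 1.0 - visible / full


def occlusion_level(rate: float, num_levels: int = 10) -> int:
    """Map a rate in [0, 1] to a level in 1..num_levels (equal-width bins, assumed)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # A rate of exactly 1.0 is clamped into the last bin.
    return min(int(rate * num_levels), num_levels - 1) + 1


# Toy 2x3 scene: object 1 is mostly covered by object 2.
visible_ids = [[1, 2, 2],
               [0, 2, 2]]
full_mask_obj1 = [[True, True, True],
                  [False, True, False]]

rate = occlusion_rate(visible_ids, full_mask_obj1, object_id=1)
level = occlusion_level(rate)
```

In this toy scene one of the four silhouette pixels of object 1 remains visible, so the rate is 0.75 and, under the equal-width assumption, the object falls in level 8 of 10.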