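1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, Shaanxi 710055, China
2. Guangdong Laboratory of Artificial Intelligence and Digital Economy (Guangzhou), Guangzhou, Guangdong 510000, China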
Received: 24 February 2021
Revised: 10 April 2021
Published: 15 September 2021
MENG Yue-bo, SHI De-wang, LIU Guang-hui, et al. Dense irregular text detection based on multi-dimensional convolution fusion[J]. Optics and Precision Engineering, 2021, 29(09): 2210-2221. DOI: 10.37188/OPE.20212909.2210.
Deep-learning-based natural scene text detection has advanced considerably, but text with a dense, irregular layout remains difficult. Its small spacing and dense distribution make feature extraction hard and leave detections incomplete. In addition, existing text detection methods often fuse features of different dimensions by direct concatenation. This fuses multi-scale features insufficiently and loses semantic information. To address these problems, this paper proposes a dense irregular text detection method based on multi-dimensional convolution fusion.

The network follows the FPN structure. A text enhancement module (TEM) introduces an additional global text map that strengthens the network's attention to text information. A channel fusion strategy (CFS) builds, in a bottom-up manner, an information chain linking high- and low-dimensional features. This generates feature maps with richer semantics and reduces information loss. In the prediction stage, text predictions are produced by progressively expanding text kernels.

Experiments on the DAST1500, ICDAR2015, and CTW1500 datasets yield F values of 81.8%, 83.0%, and 79.0%, respectively. The proposed algorithm performs better on dense, irregular text and is also competitive on general natural scene text (multi-oriented and curved text).
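The abstract says only that predictions come from progressively expanding text kernels; it gives no details of that step. Below is a minimal sketch, assuming the step works like the progressive scale expansion used by PSENet (CVPR 2019). The sketch makes these further assumptions:

- Kernels arrive as binary NumPy maps ordered from the most shrunk kernel to the full text map.
- Pixels are 4-connected.
- A pixel reachable by two instances goes to whichever reaches it first.

The function names `label_components` and `progressive_expand` are hypothetical and are not taken from the paper.

```python
from collections import deque

import numpy as np

# Assumption: 4-connectivity, as in common PSENet-style post-processing.
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def label_components(mask: np.ndarray) -> np.ndarray:
    """Label 4-connected components of a binary mask as 1, 2, ... (0 = background)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                next_label += 1
                labels[y, x] = next_label
                queue = deque([(y, x)])
                while queue:  # flood-fill the current component
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_expand(kernels: list[np.ndarray]) -> np.ndarray:
    """Hypothetical sketch: grow text instances from the smallest kernel outward.

    kernels[0] is the most shrunk kernel; kernels[-1] is the full text map.
    Each round runs a multi-source BFS from every labeled pixel into the next,
    larger kernel, so a contested pixel goes to the nearest instance
    (ties broken by queue order).
    """
    labels = label_components(kernels[0])  # separated seeds, one per text instance
    h, w = labels.shape
    for kernel in kernels[1:]:
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]  # inherit the instance id
                    queue.append((ny, nx))
    return labels


if __name__ == "__main__":
    # Two touching text lines: the full map is one blob, but the small kernels are separate.
    full = np.ones((3, 6), dtype=bool)
    small = np.zeros((3, 6), dtype=bool)
    small[1, 0:2] = True
    small[1, 4:6] = True
    # Columns 0-2 become instance 1 and columns 3-5 become instance 2.
    print(progressive_expand([small, full]))
```

The demo shows the motivation for kernel-based prediction with densely packed text. Adjacent lines that merge in the full text map stay separate instances because expansion starts from kernels that do not touch. This is an illustrative sketch, not the authors' post-processing code.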