1. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730000, China
3. Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China
[ "刘仲民(1978-),男,甘肃靖远人,副教授,博士,硕士生导师,2002年、2009年和2018年于兰州理工大学分别获得学士、硕士和博士学位,主要研究方向为模式识别、图像修复和图像描述。E-mail:liuzhmx@163.com通讯作者:" ]
[ "陈 恒(1993-),男,陕西西安人,硕士研究生,主要研究方向为模式识别和图像描述。E-mail:Chen664234@163.com" ]
Received: 2022-07-27; Revised: 2022-08-22; Published in print: 2023-05-10
LIU Zhongmin, CHEN Heng, HU Wenjin. Application of SENet generative adversarial network in image semantics description [J]. Optics and Precision Engineering, 2023, 31(9): 1379-1389. DOI: 10.37188/OPE.20233109.1379.
To address the problems of inaccurate sentence descriptions and the limited treatment of sentiment in image semantic description, an image semantic description method based on an SENet generative adversarial network is proposed. The method adds a channel attention mechanism to the feature extraction stage of the generator model so that the network can more fully and completely extract features from the salient regions of an image; the extracted features are then fed into the encoder. A sentiment corpus is added to the original text corpus, and word vectors are generated through natural language processing; these word vectors are combined with the encoded image features and input to the decoder, which, through continuous adversarial training, generates a sentiment-bearing sentence that matches the content of the image. Finally, the method is compared with existing methods in simulation experiments: its BLEU score is approximately 15% higher than that of the SentiCap method, and other related metrics also improve. In self-comparison experiments, the method improves the CIDEr metric by approximately 3%. The network extracts image features well, producing image descriptions that are more accurate and richer in sentiment.
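For readers less familiar with the channel attention mechanism added to the generator's feature extraction stage, the following is a minimal PyTorch sketch of a squeeze-and-excitation (SE) block in the spirit of Hu et al.'s SENet (see the reference list); the tensor shapes, reduction ratio, and variable names here are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (after Hu et al., CVPR 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pool: (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # squeeze: one descriptor per channel
        w = self.excite(w).view(b, c, 1, 1)   # excitation: learn channel importance
        return x * w                          # recalibrate the feature maps

# Hypothetical usage: reweight CNN feature maps before they enter the encoder.
features = torch.randn(2, 256, 14, 14)  # assumed extractor output shape
weighted = SEBlock(256)(features)       # same shape, channels recalibrated
```

In the method summarized above, this kind of per-channel recalibration is what allows the generator to emphasize features from salient image regions before they are encoded and combined with the sentiment word vectors.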
LI P ZH, WAN X, LI SH Y. Image caption of space science experiment based on multi-modal learning [J]. Optics and Precision Engineering, 2021, 29(12): 2944-2955. (in Chinese). doi: 10.37188/OPE.2021.0244
ZHAO H Y, ZHOU W, HOU X G, et al. Multi-label classification of traditional national costume pattern image semantic understanding [J]. Optics and Precision Engineering, 2020, 28(3): 695-703. (in Chinese). doi: 10.3788/OPE.20202803.0695
ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. IEEE, 2018: 6077-6086. doi: 10.1109/cvpr.2018.00636
ZHOU Z W, WANG CH Y, XU L. Design and application of image captioning algorithm based on fusion gate neural network [J]. Optics and Precision Engineering, 2021, 29(4): 906-915. (in Chinese). doi: 10.37188/OPE.20212904.0906
GAI R L, CAI J R, WANG SH Y, et al. Research review on image recognition based on deep learning [J]. Journal of Chinese Computer Systems, 2021, 42(9): 1980-1984. (in Chinese). doi: 10.3969/j.issn.1000-1220.2021.09.030
WANG J, TANG J H, YANG M K, et al. Improving OCR-based image captioning by incorporating geometrical relationship [C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA. IEEE, 2021: 1306-1315. doi: 10.1109/cvpr46437.2021.00136
XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention [C]. Proceedings of the 32nd International Conference on Machine Learning, July 6-11, 2015, Lille, France. New York: ACM, 2015: 2048-2057.
KULKARNI G, PREMRAJ V, ORDONEZ V, et al. BabyTalk: understanding and generating simple image descriptions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903. doi: 10.1109/tpami.2012.162
ELLIOTT D, DE VRIES A. Describing images using inferred visual dependency representations [C]. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015: 42-52. doi: 10.3115/v1/p15-1005
TANTI M, GATT A, CAMILLERI K P. What is the role of recurrent neural networks (RNNs) in an image caption generator? [EB/OL]. 2017: arXiv:1708.02043. https://arxiv.org/abs/1708.02043. doi: 10.18653/v1/w17-3506
MUN J, CHO M, HAN B. Text-guided attention model for image captioning [C]. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. New York: ACM, 2017: 4233-4239. doi: 10.1609/aaai.v31i1.11237
LIU S Q, ZHU Z H, YE N, et al. Improved image captioning via policy gradient optimization of SPIDEr [C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy. IEEE, 2017: 873-881. doi: 10.1109/iccv.2017.100
BANSAL M, KUMAR M, SACHDEVA M, et al. Transfer learning for image classification using VGG19: Caltech-101 image data set [J]. Journal of Ambient Intelligence and Humanized Computing, 2021.
SHAHA M, PAWAR M. Transfer learning for image classification [C]. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India. IEEE, 2018: 656-660. doi: 10.1109/iceca.2018.8474802
HUANG L, WANG W M, CHEN J, et al. Attention on attention for image captioning [C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Korea (South). IEEE, 2020: 4633-4642. doi: 10.1109/iccv.2019.00473
CHEN F J, ZHU F, WU Q X, et al. A survey about image generation with generative adversarial nets [J]. Chinese Journal of Computers, 2021, 44(2): 347-369. (in Chinese). doi: 10.11897/SP.J.1016.2021.00347
HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. IEEE, 2018: 7132-7141. doi: 10.1109/cvpr.2018.00745
MAZZIA V, SALVETTI F, CHIABERGE M. Efficient-CapsNet: capsule network with self-attention routing [J]. Scientific Reports, 2021, 11: 14634. doi: 10.1038/s41598-021-93977-0
MATHEWS A, XIE L X, HE X M. SentiCap: generating image descriptions with sentiments [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 30(1): 3574-3580. doi: 10.1609/aaai.v30i1.10475
DAI B, FIDLER S, URTASUN R, et al. Towards diverse and natural image descriptions via a conditional GAN [C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy. IEEE, 2017: 2989-2998. doi: 10.1109/iccv.2017.323
WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA. IEEE, 2020: 11531-11539. doi: 10.1109/cvpr42600.2020.01155
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [M]. Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 3-19. doi: 10.1007/978-3-030-01234-2_1