1. College of Electrical and Information Engineering, Lanzhou University of Technology, Lanzhou 730050, China
2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730000, China
3. Key Laboratory of Gansu Advanced Control for Industrial Processes, Lanzhou 730050, China
[ "刘仲民(1978-),男,甘肃靖远人,副教授,博士,硕士生导师,2002年、2009年和2018年于兰州理工大学分别获得学士、硕士和博士学位,主要研究方向为模式识别、图像修复和图像描述。E-mail:liuzhmx@163.com通讯作者:" ]
[ "陈 恒(1993-),男,陕西西安人,硕士研究生,主要研究方向为模式识别和图像描述。E-mail:Chen664234@163.com" ]
Received: 2022-07-27; Revised: 2022-08-22; Published in print: 2023-05-10
LIU Zhongmin, CHEN Heng, HU Wenjin. Application of SENet generative adversarial network in image semantics description [J]. Optics and Precision Engineering, 2023, 31(9): 1379-1389. DOI: 10.37188/OPE.20233109.1379.
To address the problems of inaccurate sentence descriptions and the limited treatment of sentiment in image semantic description, an image semantic description method based on an SENet generative adversarial network is proposed. The method adds a channel attention mechanism to the feature extraction stage of the generator model so that the network can more fully and completely extract features from the salient regions of an image; the extracted features are then fed into the encoder. A sentiment corpus is added to the original text corpus, and word vectors are generated through natural language processing; these word vectors are combined with the encoded image features and input to the decoder, which, through continuous adversarial training, generates a sentiment-bearing sentence that matches the content of the image. Finally, the method is compared with existing methods in simulation experiments: its BLEU score is approximately 15% higher than that of the SentiCap method, and other related metrics also improve. In self-comparison experiments, the method improves the CIDEr metric by approximately 3%. The network extracts image features well, producing image descriptions that are more accurate and richer in sentiment.
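For readers less familiar with the channel attention mechanism added to the generator's feature extraction stage, the following is a minimal PyTorch sketch of a squeeze-and-excitation (SE) block in the spirit of Hu et al.'s SENet (see the reference list); the tensor shapes, reduction ratio, and variable names here are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (after Hu et al., CVPR 2018)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pool: (B, C, H, W) -> (B, C, 1, 1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # squeeze: one descriptor per channel
        w = self.excite(w).view(b, c, 1, 1)   # excitation: learn channel importance
        return x * w                          # recalibrate the feature maps

# Hypothetical usage: reweight CNN feature maps before they enter the encoder.
features = torch.randn(2, 256, 14, 14)  # assumed extractor output shape
weighted = SEBlock(256)(features)       # same shape, channels recalibrated
```

In the method summarized above, this kind of per-channel recalibration is what allows the generator to emphasize features from salient image regions before they are encoded and combined with the sentiment word vectors.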
LI P ZH, WAN X, LI SH Y. Image caption of space science experiment based on multi-modal learning [J]. Optics and Precision Engineering, 2021, 29(12): 2944-2955. (in Chinese). doi: 10.37188/OPE.2021.0244
ZHAO H Y, ZHOU W, HOU X G, et al. Multi-label classification of traditional national costume pattern image semantic understanding [J]. Optics and Precision Engineering, 2020, 28(3): 695-703. (in Chinese). doi: 10.3788/OPE.20202803.0695
ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. IEEE, 2018: 6077-6086. doi: 10.1109/cvpr.2018.00636
ZHOU Z W, WANG CH Y, XU L. Design and application of image captioning algorithm based on fusion gate neural network [J]. Optics and Precision Engineering, 2021, 29(4): 906-915. (in Chinese). doi: 10.37188/OPE.20212904.0906
GAI R L, CAI J R, WANG SH Y, et al. Research review on image recognition based on deep learning [J]. Journal of Chinese Computer Systems, 2021, 42(9): 1980-1984. (in Chinese). doi: 10.3969/j.issn.1000-1220.2021.09.030
WANG J, TANG J H, YANG M K, et al. Improving OCR-based image captioning by incorporating geometrical relationship [C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA. IEEE, 2021: 1306-1315. doi: 10.1109/cvpr46437.2021.00136
XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention [C]. Proceedings of the 32nd International Conference on Machine Learning, July 6-11, 2015, Lille, France. New York: ACM, 2015: 2048-2057.
KULKARNI G, PREMRAJ V, ORDONEZ V, et al. BabyTalk: understanding and generating simple image descriptions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903. doi: 10.1109/tpami.2012.162
ELLIOTT D, DE VRIES A. Describing images using inferred visual dependency representations [C]. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015: 42-52. doi: 10.3115/v1/p15-1005
TANTI M, GATT A, CAMILLERI K P. What is the role of recurrent neural networks (RNNs) in an image caption generator? [EB/OL]. 2017: arXiv:1708.02043. https://arxiv.org/abs/1708.02043. doi: 10.18653/v1/w17-3506
MUN J, CHO M, HAN B. Text-guided attention model for image captioning [C]. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence. New York: ACM, 2017: 4233-4239. doi: 10.1609/aaai.v31i1.11237
LIU S Q, ZHU Z H, YE N, et al. Improved image captioning via policy gradient optimization of SPIDEr [C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy. IEEE, 2017: 873-881. doi: 10.1109/iccv.2017.100
BANSAL M, KUMAR M, SACHDEVA M, et al. Transfer learning for image classification using VGG19: Caltech-101 image data set [J]. Journal of Ambient Intelligence and Humanized Computing, 2021.
SHAHA M, PAWAR M. Transfer learning for image classification [C]. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India. IEEE, 2018: 656-660. doi: 10.1109/iceca.2018.8474802
HUANG L, WANG W M, CHEN J, et al. Attention on attention for image captioning [C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), October 27-November 2, 2019, Seoul, Korea (South). IEEE, 2020: 4633-4642. doi: 10.1109/iccv.2019.00473
CHEN F J, ZHU F, WU Q X, et al. A survey about image generation with generative adversarial nets [J]. Chinese Journal of Computers, 2021, 44(2): 347-369. (in Chinese). doi: 10.11897/SP.J.1016.2021.00347
HU J, SHEN L, SUN G. Squeeze-and-excitation networks [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. IEEE, 2018: 7132-7141. doi: 10.1109/cvpr.2018.00745
MAZZIA V, SALVETTI F, CHIABERGE M. Efficient-CapsNet: capsule network with self-attention routing [J]. Scientific Reports, 2021, 11: 14634. doi: 10.1038/s41598-021-93977-0
MATHEWS A, XIE L X, HE X M. SentiCap: generating image descriptions with sentiments [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 30(1): 3574-3580. doi: 10.1609/aaai.v30i1.10475
DAI B, FIDLER S, URTASUN R, et al. Towards diverse and natural image descriptions via a conditional GAN [C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy. IEEE, 2017: 2989-2998. doi: 10.1109/iccv.2017.323
WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks [C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA. IEEE, 2020: 11531-11539. doi: 10.1109/cvpr42600.2020.01155
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [M]. Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018: 3-19. doi: 10.1007/978-3-030-01234-2_1