Few-shot object detection on Thangka via multi-scale context information

HU Wenjin; TANG Huiyuan; YUE Chaoyang; SONG Huafei

doi:10.37188/OPE.20233112.1859

Information Sciences | Updated: 2023-07-17

- Optics and Precision Engineering Vol. 31, Issue 12, Pages: 1859-1869(2023)
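- Author affiliations:
  1. Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Lanzhou 730030, Gansu, China
  2. School of Mathematics and Computer Science, Northwest Minzu University, Lanzhou 730030, Gansu, China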
- DOI: 10.37188/OPE.20233112.1859
  CLC: TP391
- Received: 22 August 2022, Revised: 28 October 2022, Published: 25 June 2023
HU Wenjin, TANG Huiyuan, YUE Chaoyang, et al. Few-shot object detection on Thangka via multi-scale context information[J]. Optics and Precision Engineering, 2023, 31(12): 1859-1869. DOI: 10.37188/OPE.20233112.1859.

Abstract

Classifying and locating objects of interest in Thangka images helps people understand the rich semantic information these images carry and promotes cultural inheritance. Thangka detection suffers from scarce image samples, complex backgrounds, occluded targets, and low detection accuracy. To address these problems, this paper proposes a few-shot object detection algorithm for Thangka images that combines multi-scale context information with dual attention guidance. First, a new multi-scale feature pyramid is constructed to learn the multi-level features and contextual information of Thangka images, improving the model's ability to discriminate multi-scale targets. Second, a dual attention guidance module is added at the end of the feature pyramid to strengthen the model's representation of key features while reducing the impact of noise. Finally, Rank & Sort Loss replaces the cross-entropy classification loss, which simplifies model training and increases detection accuracy. Experimental results show that, in 10-shot experiments, the proposed method achieves a mean average precision of 19.7% on a Thangka dataset and 11.2% on the COCO dataset.
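
In the few-shot detection literature, a K-shot experiment typically means fine-tuning on only K annotated object instances per class, so "10-shot" would correspond to K = 10. The sketch below illustrates how such a subset might be drawn from a list of annotations. It is an assumption for illustration only, not the authors' sampling procedure: the function name `sample_k_shot`, the `(image_id, class_name, box)` tuple layout, and the class names `class_a` and `class_b` are all hypothetical.

```python
import random
from collections import defaultdict


def sample_k_shot(annotations, k, seed=0):
    """Pick at most k annotated instances per class.

    annotations: list of (image_id, class_name, box) tuples, one per object
    instance (a hypothetical layout chosen for this sketch).
    Returns a dict mapping class_name -> list of selected annotations.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann[1]].append(ann)

    subset = {}
    for cls in sorted(by_class):
        pool = by_class[cls]
        # A class with fewer than k instances contributes everything it has.
        subset[cls] = rng.sample(pool, min(k, len(pool)))
    return subset


# Hypothetical toy annotations: 25 instances of one class, 4 of another.
toy = [(f"img{i:03d}", "class_a", (0, 0, 10, 10)) for i in range(25)]
toy += [(f"img{i:03d}", "class_b", (5, 5, 20, 20)) for i in range(4)]

shots = sample_k_shot(toy, k=10)
print({cls: len(anns) for cls, anns in shots.items()})
# prints {'class_a': 10, 'class_b': 4}
```

Sampling per instance rather than per image keeps the count of labeled objects per class fixed, which is what makes results at a given shot number comparable across methods.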
