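1. School of Information Engineering, Ningxia University, Yinchuan, Ningxia 750021, China
2. Ningxia Key Laboratory of Artificial Intelligence and Information Security for "East Data, West Computing", Ningxia University, Yinchuan, Ningxia 750021, China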
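JIN You (1999-), male, born in Wuzhong, Ningxia, is a master's student at Ningxia University. His research focuses on deep-learning-based object detection. E-mail: 1210791546@qq.com
LIU Libo (1974-), female, born in Pingluo, Ningxia, Ph.D., is a professor and doctoral supervisor. She completed her postdoctoral research at Beijing Institute of Technology in 2012, and her research focuses on intelligent information processing and computer vision. E-mail: liulib@163.com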
Received: 2025-03-24
Revised: 2025-06-04
Published in print: 2025-09-25
JIN You, DENG Zhen, LIU Libo. Knowledge integration guided dual-branch fusion enhanced open-vocabulary object detection [J]. Optics and Precision Engineering, 2025, 33(18): 2929-2943. DOI: 10.37188/OPE.20253318.2929. CSTR: 32169.14.OPE.20253318.2929.
To address the weak understanding of novel-class concepts, label confusion, and insufficient novel-class detection performance of detectors in open scenarios, this paper proposes a Knowledge Integration-guided Dual-Branch Fusion enhanced Open-Vocabulary object Detection (KI-DBFOVD) method. First, a Knowledge Integration (KI) module embeds pseudo-labels generated by a vision-language model (VLM) into the detector to promote the learning of novel-class concepts. Second, a Label Match (LM) module refines the label matching process through multi-level threshold adjustment and independent matching of base and novel classes, alleviating the confusion between base-class and novel-class labels during detector training. Finally, a Dual-Branch Fusion (DBF) module fuses the conventional visual branch and the vision-language branch by geometric averaging, so that novel-class objects are discovered and localized more effectively while base-class detection accuracy is maintained, which further improves the overall detection performance of KI-DBFOVD. Experimental results show that the method achieves a novel-class detection accuracy of 38.6% on the COCO dataset and 25.4% on the LVIS dataset, which contains far more categories and is harder to detect on. These results exceed those of several mainstream methods and indicate that the approach is better suited to diverse open scenarios.
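As a rough illustration of the geometric-averaging fusion described in the abstract, the sketch below combines per-class scores from a visual (detector) branch and a vision-language branch with a weighted geometric mean. The function name `fuse_scores`, the weights `alpha` and `beta`, the separate weighting of base and new (novel) classes, and the default values are all assumptions made for illustration. The abstract states only that the two branches are fused by geometric averaging, so this should not be read as the authors' exact formulation.

```python
def fuse_scores(det_scores, vlm_scores, is_base, alpha=0.35, beta=0.65):
    """Fuse per-class scores of one region by a weighted geometric mean.

    det_scores: per-class probabilities from the visual (detector) branch.
    vlm_scores: per-class probabilities from the vision-language branch.
    is_base:    per-class flags, True for base classes, False for new classes.
    alpha/beta: hypothetical weights of the vision-language branch for base
                and new classes (illustrative defaults, not from the paper).
    """
    if not (len(det_scores) == len(vlm_scores) == len(is_base)):
        raise ValueError("all inputs must have one entry per class")

    fused = []
    for p_det, p_vlm, base in zip(det_scores, vlm_scores, is_base):
        # Assumed design: base classes lean on the detector branch,
        # new classes lean on the vision-language branch.
        w = alpha if base else beta
        # Weighted geometric mean: p_det^(1 - w) * p_vlm^w
        fused.append((p_det ** (1.0 - w)) * (p_vlm ** w))
    return fused


if __name__ == "__main__":
    # Two classes for one region: the first is a base class, the second is new.
    scores = fuse_scores([0.90, 0.10], [0.60, 0.80], [True, False])
    print(scores)
```

With equal weights (`alpha = beta = 0.5`) this reduces to the plain geometric mean of the two branch scores. Unlike an arithmetic mean, a geometric mean stays low whenever either branch assigns a low score, which is one plausible reason to prefer it when the two branches disagree.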
