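School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
CHEN Tao (1984-), male, born in Guilin, Guangxi, Ph.D., professor and doctoral supervisor. He received his Ph.D. from Xidian University in 2013. His research focuses on terahertz science, technology, and applications. E-mail: tchen@guet.edu.cn
Received: 2025-06-25
Revised: 2025-08-17
Published in print: 2025-10-25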
CHEN Tao, ZHAO Li. Application of XGBoost explainable model improved by black-winged kite algorithm optimization in the identification of genetically modified cotton seed oil by terahertz spectroscopy[J]. Optics and Precision Engineering, 2025, 33(20): 3192-3202. DOI: 10.37188/OPE.20253320.3192. CSTR: 32169.14.OPE.20253320.3192.
To classify genetically modified and non-genetically modified cottonseed oil accurately, this study proposes an explainable classification model in which an improved black-winged kite algorithm optimizes an extreme gradient boosting (XGBoost) model. First, a terahertz time-domain spectroscopy (THz-TDS) system was used to collect terahertz absorption spectra of genetically modified and non-genetically modified cottonseed oil samples in the 0.3-1.8 THz band. Then, the traditional black-winged kite algorithm (BKA) was improved with three strategies: a dual-objective fitness function, opposition-based learning for population initialization, and a Lévy flight controlled by a Rayleigh distribution function. The improved algorithm (DLBKA) was used for dual-objective hyperparameter optimization of the tree depth, learning rate, and maximum number of iterations of the XGBoost model, yielding the DLBKA-XGBoost classification model. Finally, the model was applied to identify genetically modified cottonseed oil, and its identification results were interpreted with the SHAP method. The results show that the DLBKA-XGBoost model improves identification accuracy: its test-set accuracy reaches 97.78%, which is 4.45% higher than that of the model optimized by the traditional BKA and 14.45% higher than that of the model optimized by the traditional whale optimization algorithm (WOA). The SHAP analysis also explains the model by clarifying how key feature frequencies positively influence the identification results, which improves the model's transparency and credibility. This study therefore provides a fast and accurate analytical method for identifying genetically modified cottonseed oil and a useful reference for identifying other genetically modified substances.
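The population initialization strategy named in the abstract (opposition-based, or "reverse", learning) can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything below is an assumption made for illustration: the search ranges in BOUNDS, the stand-in objective toy_fitness (in the paper the fitness is dual-objective and comes from training XGBoost), and the function name opposition_init. The sketch only shows the generic idea of sampling candidates, mirroring each one inside its bounds, and keeping the best of both sets.

```python
import random

# Hypothetical search ranges (not from the paper):
# tree depth, learning rate, maximum number of boosting iterations.
BOUNDS = [(2.0, 10.0), (0.01, 0.3), (50.0, 500.0)]


def toy_fitness(candidate):
    """Stand-in objective (lower is better): normalized squared distance
    to an arbitrary target point. A real run would train and score XGBoost."""
    target = (6.0, 0.1, 200.0)
    return sum(
        ((x - t) / (hi - lo)) ** 2
        for x, t, (lo, hi) in zip(candidate, target, BOUNDS)
    )


def opposition_init(fitness, bounds, pop_size, seed=0):
    """Sample a random population, build the opposite of every candidate,
    and keep the pop_size candidates with the lowest fitness."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)
    ]
    # The opposite of x in [lo, hi] is lo + hi - x (mirror about the midpoint).
    opposites = [
        [lo + hi - x for x, (lo, hi) in zip(cand, bounds)]
        for cand in population
    ]
    return sorted(population + opposites, key=fitness)[:pop_size]


initial_population = opposition_init(toy_fitness, BOUNDS, pop_size=10)
```

Because the merged pool always contains the original random candidates, the best member of the returned population is never worse than the best purely random one. In an actual tuning loop, the tree depth and iteration count would presumably be rounded to integers before being passed to the classifier.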
