Audio object detection network with multimodal cross level feature knowledge transfer
Information Sciences | Updated: 2024-02-01
“The use of sound for object detection has made new progress. Current target localization methods that rely solely on monitoring environmental sounds have low robustness. To address this, researchers have proposed a multimodal self-supervised object detection network based on cross-level feature knowledge transfer. The network introduces a multi-teacher cross-level feature knowledge transfer loss based on attention fusion to improve its learning ability, and a localization distillation loss to compensate for missing localization information. Experiments on the Multimodal Audio-Visual Detection (MAVD) dataset show that the network's mAP improves over the baseline network by 6.71% at an IoU threshold of 0.5, by 14.36% at an IoU threshold of 0.75, and by 10.32% on average, demonstrating the advantage of the detection network. The work offers a new approach to using sound for object detection and opens new directions for research in related fields.”
Optics and Precision Engineering, Vol. 32, Issue 2, Pages: 237-251 (2024)
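The mAP figures in the summary are reported at IoU thresholds of 0.5 and 0.75. As a minimal sketch of what such a threshold means, the Python below computes the intersection over union of two axis-aligned boxes and checks whether a prediction counts as a match. The `(x1, y1, x2, y2)` box format, the function names `iou` and `matches_at`, and the example coordinates are assumptions made for illustration; they are not taken from the paper or from the MAVD evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at(pred_box, gt_box, threshold):
    """True if the predicted box overlaps the ground truth by at least `threshold`."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction is shifted 2 units to the right.
pred = (2.0, 0.0, 12.0, 10.0)
gt = (0.0, 0.0, 10.0, 10.0)

print(matches_at(pred, gt, 0.5))   # overlap is 80 / 120, so this passes
print(matches_at(pred, gt, 0.75))  # the stricter threshold rejects the same box
```

The same prediction is accepted at 0.5 and rejected at 0.75, which is why a gain at the stricter threshold is usually read as a sign of tighter localization.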