A multivariate information aggregation method for crowd density estimation and counting

LIU Guanghui; WANG Qinmeng; CHEN Xuanrun; MENG Yuebo

doi:10.37188/OPE.20223010.1228

Information Science | Updated: 2022-06-21

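- A multivariate information aggregation method for crowd density estimation and counting
- Optics and Precision Engineering, 2022, Vol. 30, No. 10, pp. 1228-1239
- Author affiliations:
  
  1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, Shaanxi, China
  2. Geovis Spatial Technology Co., Ltd. (中科星图空间技术有限公司), Xi'an 710199, Shaanxi, China
- Author biographies:
  
  LIU Guanghui (1976-), male, born in Xi'an, Shaanxi. Master's supervisor at the School of Information and Control Engineering, Xi'an University of Architecture and Technology. He received his doctorate in engineering from Xi'an University of Architecture and Technology in 2016. His research covers computer vision perception and understanding, artificial intelligence, and intelligent automation systems. E-mail: guanghuil@163.com
  MENG Yuebo (1979-), female, born in Xi'an, Shaanxi. Master's supervisor at the School of Information and Control Engineering, Xi'an University of Architecture and Technology. She received her doctorate in engineering from Xi'an Jiaotong University in 2014. Her research covers computer perception and understanding, artificial intelligence and intelligent automation systems, and intelligent building technology. E-mail: mengyuebo@163.com
- Funding:
  
  General Program of the Natural Science Basic Research Plan of Shaanxi Province (2020JM-473; 2020JM-472); Key Research and Development Program of Shaanxi Province (2021SF-429)
- DOI: 10.37188/OPE.20223010.1228
  CLC number: TP391
- Received: 2022-01-19
  
  Revised: 2022-01-26
  
  Published in print: 2022-05-25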
LIU Guanghui, WANG Qinmeng, CHEN Xuanrun, et al. A multivariate information aggregation method for crowd density estimation and counting[J]. Optics and Precision Engineering, 2022, 30(10): 1228-1239. DOI: 10.37188/OPE.20223010.1228.

Abstract

Crowd density estimation and counting means estimating the distribution and number of people in a crowded scene, and it matters for safety systems and traffic control. High-density images make feature extraction difficult, spatial and semantic information hard to obtain, and feature fusion insufficient. To address these problems, this paper proposes a multivariate information aggregation (MIA) method for crowd density estimation. First, a multivariate information extraction network is designed: VGG-19 serves as the backbone to deepen feature extraction, a multi-layer semantic supervision strategy encodes the low-level features to strengthen their semantic representation, and spatial information embedding enriches the spatial representation of the high-level features. Second, a multi-scale context information aggregation network is designed: two lightweight atrous spatial pyramid pooling structures with strided convolution (Simplify-atrous spatial pyramid pooling, S-ASPP) aggregate global multi-scale context while reducing parameter redundancy. Finally, strided convolution is applied at the end of the network to speed up the network without affecting accuracy. Comparative experiments were run on the ShanghaiTech, UCF-QNRF, and NWPU datasets. On ShanghaiTech, the MAE and MSE are 59.4 and 96.2 on Part_A and 7.7 and 11.9 on Part_B. On UCF-QNRF, an ultra-high-density multi-view dataset, the MAE is 89.3 and the MSE is 164.5. On NWPU, the MAE is 87.9 and the MSE is 417.2. The proposed method improves on the compared methods, and results from a real-scene application confirm that it performs well.
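The abstract reports MAE and MSE but does not define them. The sketch below shows one common way such numbers are obtained, under two assumptions that are not stated in the source: the predicted count for an image is the sum of its predicted density map, and "MSE" follows the convention widespread in crowd-counting papers of reporting the square root of the mean squared count error. The function names and the toy data are hypothetical and exist only for illustration.

```python
import math


def count_from_density(density_map):
    """Estimated head count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    if not true_counts or len(pred_counts) != len(true_counts):
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(true_counts)


def root_mse(pred_counts, true_counts):
    """Square root of the mean squared count error.

    Assumption: this is the quantity reported as "MSE"; the source
    abstract does not give its definition.
    """
    if not true_counts or len(pred_counts) != len(true_counts):
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(total / len(true_counts))


# Toy density maps (made-up values, not data from the paper).
density_maps = [
    [[0.25, 0.25], [0.25, 0.25]],  # sums to 1.0
    [[0.5, 1.0], [1.0, 0.5]],      # sums to 3.0
]
true_counts = [2, 3]

pred_counts = [count_from_density(d) for d in density_maps]
print("predicted counts:", pred_counts)
print("MAE:", mae(pred_counts, true_counts))
print("root MSE:", root_mse(pred_counts, true_counts))
```

Both metrics are computed per image on counts, not per pixel on density maps, so a model can score well on them while still misplacing density within an image.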
