SUN Ming-si, ZHAO Hong-wei, ZHAO Hao-yu, et al. Research on improved VLAD using spatial distribution entropy[J]. Optics and Precision Engineering, 2021, 29(01): 152-159. DOI: 10.37188/OPE.20212901.0152.
Research on improved VLAD using spatial distribution entropy
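A study of the Vector of Locally Aggregated Descriptors (VLAD) image feature showed that it lacks the spatial distribution information of the Scale Invariant Feature Transform (SIFT) descriptors. To improve image retrieval accuracy, a method that improves VLAD using spatial distribution entropy is proposed. First, the VLAD feature of the image is computed. Second, the spatial distribution information of the SIFT descriptors is divided into several sets according to the correspondence between descriptors and clusters. Third, a spatial distribution entropy is generated for each set, and the entropy values of all sets are expressed as a spatial distribution entropy vector. Finally, this vector is used to represent the degree of disorder in the spatial distribution of the descriptors and is combined with VLAD. Experimental results show that, with a codebook size of 64, the mean average accuracy rises from 0.519 to 0.601 on the Holidays dataset and from 0.395 to 0.408 on the Oxford5k dataset. The method substantially improves the average accuracy of image retrieval with VLAD features.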
Abstract
To improve the accuracy of image retrieval, an improved VLAD method based on spatial distribution entropy is proposed. A study of the VLAD features of images showed that they lack the spatial distribution information of the SIFT descriptors. First, the VLAD features of the image are computed. Second, the spatial distribution information of the SIFT descriptors is divided into several sets according to the correspondence between descriptors and clusters. Third, a spatial distribution entropy is generated for each set, and the entropies of all sets are expressed as a spatial distribution entropy vector. Finally, this entropy vector is used to represent the degree of disorder in the spatial distribution of the descriptors, and it is combined with the VLAD. Experimental results show that, with a codebook size of 64, the mean average accuracy increases from 0.521 to 0.601 on the Holidays dataset and from 0.393 to 0.408 on the Oxford5k dataset. The method therefore improves the average accuracy of VLAD features in image retrieval, most clearly on the Holidays dataset.
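The four steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the entropy formula or say how the entropy vector is combined with the VLAD, so the grid histogram, the normalization by the maximum entropy, the concatenation, and all function and parameter names (`vlad`, `spatial_entropy_vector`, `vlad_with_entropy`, `grid`, `weight`) are assumptions made for illustration. Keypoint positions are assumed to be scaled to [0, 1], and toy 2-D descriptors stand in for 128-D SIFT descriptors.

```python
import math


def nearest_center(desc, centers):
    """Index of the codebook center closest to desc (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centers[i])))


def vlad(descriptors, centers):
    """Return (L2-normalized VLAD vector, cluster index of each descriptor)."""
    k, dim = len(centers), len(centers[0])
    residuals = [[0.0] * dim for _ in range(k)]
    assign = []
    for desc in descriptors:
        i = nearest_center(desc, centers)
        assign.append(i)
        # Accumulate the residual between the descriptor and its center.
        for j in range(dim):
            residuals[i][j] += desc[j] - centers[i][j]
    flat = [x for row in residuals for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    if norm > 0:
        flat = [x / norm for x in flat]
    return flat, assign


def spatial_entropy_vector(positions, assign, k, grid=4):
    """One entropy value per cluster (assumed formulation).

    The positions of the descriptors assigned to a cluster are binned into a
    grid x grid histogram; the Shannon entropy of that histogram is divided
    by its maximum, log2(grid * grid), so each value lies in [0, 1].
    """
    if grid < 2:
        raise ValueError("grid must be at least 2")
    counts = [[0] * (grid * grid) for _ in range(k)]
    for (x, y), i in zip(positions, assign):
        cx = min(int(x * grid), grid - 1)
        cy = min(int(y * grid), grid - 1)
        counts[i][cy * grid + cx] += 1
    max_entropy = math.log2(grid * grid)
    entropies = []
    for hist in counts:
        total = sum(hist)
        h = 0.0
        for c in hist:
            if c:
                p = c / total
                h -= p * math.log2(p)
        entropies.append(h / max_entropy)
    return entropies


def vlad_with_entropy(descriptors, positions, centers, grid=4, weight=1.0):
    """Concatenate the VLAD vector and the weighted entropy vector (assumed fusion)."""
    v, assign = vlad(descriptors, centers)
    e = spatial_entropy_vector(positions, assign, len(centers), grid)
    return v + [weight * x for x in e]


# Toy data: two clusters, two descriptors each.
centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0], [10.0, 11.0]]
positions = [(0.1, 0.1), (0.9, 0.9), (0.1, 0.1), (0.1, 0.1)]
feature = vlad_with_entropy(descriptors, positions, centers, grid=2)
```

In this toy case the descriptors of the first cluster fall in two different grid cells and those of the second cluster share one cell, so the first cluster receives a higher entropy value than the second. With a codebook of size k and d-dimensional descriptors, the combined feature under these assumptions has k × d + k components.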