WU Di, CAO Jie, WANG Jin-hua. Speaker Recognition Based on Adapted Gaussian Mixture Model and Static and Dynamic Auditory Feature Fusion [J]. Optics and Precision Engineering, 2013, 21(6): 1598-1604. DOI: 10.3788/OPE.20132106.1598.
Speaker Recognition Based on Adapted Gaussian Mixture Model and Static and Dynamic Auditory Feature Fusion
By optimizing the feature vectors and Gaussian Mixture Models (GMMs), a hybrid compensation method operating in both the model and feature domains is proposed. The method addresses two problems: speaker-recognition features corrupted by noise, and the degraded performance of GMMs when the training data shrink under various unexpected noise environments. By emulating the human auditory system, Gammatone Filter Cepstral Coefficients (GFCC) are derived from a Gammatone filter-bank model. Because GFCC capture only static properties, Gammatone Filter Shifted Delta Cepstral Coefficients (GFSDCC) are additionally extracted using the Shifted Delta Cepstral technique. The adaptation of each GMM trained with sufficient data is then reformulated as a shift factor based on factor analysis. When the training data are insufficient, the coordinates of the shift factor are learned from the GMM mixture components that are insensitive to the training data and are then used to compensate the remaining components. Experimental results show that the proposed method achieves a recognition rate of 98.46%, indicating that the performance of the speaker recognition system is improved under several kinds of noise environments.
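The dynamic GFSDCC features described above rest on the standard Shifted Delta Cepstral (SDC) construction: for each frame, delta vectors computed at several time shifts are stacked into one long feature vector. The sketch below shows a minimal SDC computation in NumPy under common parameter conventions (delta spread `d`, block shift `P`, block count `k`); the function name and the edge-replication padding are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def shifted_delta_cepstra(cep, d=1, P=3, k=7):
    """Stack k shifted delta vectors per frame (standard SDC construction).

    cep : (T, N) array of static cepstral frames (e.g. GFCC).
    d   : spread used for each delta, delta(t) = c[t + d] - c[t - d].
    P   : shift between successive delta blocks.
    k   : number of delta blocks stacked per frame.
    Returns a (T, N * k) array; frames near the signal edges reuse the
    first/last frame (edge replication), an assumption of this sketch.
    """
    T, N = cep.shape
    pad = k * P + d                       # enough margin for every index used
    padded = np.pad(cep, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros((T, N * k))
    for t in range(T):
        tc = t + pad                      # position of frame t in padded array
        blocks = [padded[tc + i * P + d] - padded[tc + i * P - d]
                  for i in range(k)]
        out[t] = np.concatenate(blocks)
    return out
```

With the common 7-1-3-7 configuration (N=7, d=1, P=3, k=7), each 7-dimensional static frame expands to a 49-dimensional dynamic vector, which is what lets the fused features capture longer-span temporal behavior than plain deltas.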