WU Di, CAO Jie, WANG Jin-hua. Speaker Recognition Based on Adapted Gaussian Mixture Model and Static and Dynamic Auditory Feature Fusion [J]. Optics and Precision Engineering, 2013, 21(6): 1598-1604. DOI: 10.3788/OPE.20132106.1598.
Speaker Recognition Based on Adapted Gaussian Mixture Model and Static and Dynamic Auditory Feature Fusion
By optimizing the feature vectors and Gaussian Mixture Models (GMMs), a hybrid compensation method operating in both the model and feature domains is proposed. The method addresses two problems: speaker-recognition features corrupted by noise, and the degraded performance of GMMs when the training data shrink under various unexpected noise environments. By emulating the human auditory system, Gammatone Filter Cepstral Coefficients (GFCC) are derived from a Gammatone filter-bank model. Because GFCC capture only static properties, Gammatone Filter Shifted Delta Cepstral Coefficients (GFSDCC) are additionally extracted using the Shifted Delta Cepstral technique. The adaptation of each GMM trained with sufficient data is then reformulated as a shift factor based on factor analysis. When the training data are insufficient, the coordinates of the shift factor are learned from the GMM mixture components that are insensitive to the training data and are then used to compensate the remaining components. Experimental results show that the proposed method achieves a recognition rate of 98.46%, indicating that the performance of the speaker recognition system is improved under several kinds of noise environments.
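The dynamic GFSDCC features described above rest on the standard Shifted Delta Cepstral (SDC) construction: for each frame, delta vectors computed at several time shifts are stacked into one long feature vector. The sketch below shows a minimal SDC computation in NumPy under common parameter conventions (delta spread `d`, block shift `P`, block count `k`); the function name and the edge-replication padding are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def shifted_delta_cepstra(cep, d=1, P=3, k=7):
    """Stack k shifted delta vectors per frame (standard SDC construction).

    cep : (T, N) array of static cepstral frames (e.g. GFCC).
    d   : spread used for each delta, delta(t) = c[t + d] - c[t - d].
    P   : shift between successive delta blocks.
    k   : number of delta blocks stacked per frame.
    Returns a (T, N * k) array; frames near the signal edges reuse the
    first/last frame (edge replication), an assumption of this sketch.
    """
    T, N = cep.shape
    pad = k * P + d                       # enough margin for every index used
    padded = np.pad(cep, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros((T, N * k))
    for t in range(T):
        tc = t + pad                      # position of frame t in padded array
        blocks = [padded[tc + i * P + d] - padded[tc + i * P - d]
                  for i in range(k)]
        out[t] = np.concatenate(blocks)
    return out
```

With the common 7-1-3-7 configuration (N=7, d=1, P=3, k=7), each 7-dimensional static frame expands to a 49-dimensional dynamic vector, which is what lets the fused features capture longer-span temporal behavior than plain deltas.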