Speaker Recognition Using Discriminant Neighborhood Embedding

LIANG Chunyan; YUAN Wenhao; LI Yanling; XIA Bin; SUN Wenzhu

doi: 10.11999/JEIT180761 cstr: 32379.14.JEIT180761

1. College of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China
2. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China

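Funding: The National Natural Science Foundation of China (11704229, 61701286, 61562068), the Shandong Provincial Natural Science Foundation (ZR2017LA011, ZR2015FL003, ZR2017MF047), the Shandong Province Higher Educational Science and Technology Program (J17KA078), and the Natural Science Foundation of Inner Mongolia Autonomous Region of China (2015MS0629)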

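About the authors:
LIANG Chunyan: female, born 1986, lecturer; research interests: speaker recognition and language identification.

YUAN Wenhao: male, born 1985, lecturer; research interests: speech signal processing and speech enhancement.

LI Yanling: female, born 1978, associate professor; research interests: natural language processing, spoken language understanding and machine learning.

XIA Bin: male, born 1973, associate professor; research interests: deep learning, and signal and information processing.

SUN Wenzhu: male, born 1983, lecturer; research interests: multimedia signal transmission.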

Corresponding author:
LIANG Chunyan, liangchunyan@sdut.edu.cn

CLC number: TP391.42
Publication history
- Received: 2018-08-03
- Revised: 2019-01-21
- Published online: 2019-02-24
- Issue date: 2019-07-01

Abstract

This paper introduces the Discriminant Neighborhood Embedding (DNE) algorithm into speaker recognition. DNE is a manifold learning method. It builds an adjacency graph to capture the local neighborhood structure of the data and preserves that structure in the embedding. It also makes full use of between-class discriminant information. This gives it stronger discriminative power than neighborhood-preserving methods that ignore class labels, such as Neighborhood Preserving Embedding (NPE). Experimental results on the telephone-telephone core condition of the NIST 2010 Speaker Recognition Evaluation (SRE) dataset demonstrate the effectiveness of DNE.

Keywords:
- Speaker recognition
- Total variability factor analysis
- Neighborhood Preserving Embedding (NPE)
- Discriminant Neighborhood Embedding (DNE)
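The abstract describes DNE only in words. The following NumPy sketch shows one common way to realize the two ideas it names, under stated assumptions.

- The adjacency graph is assumed to be a symmetric k-nearest-neighbor graph. Its edges get weight +1 between samples of the same class (here, the same speaker) and -1 between samples of different classes.
- The projection P is assumed to be formed from the eigenvectors of X^T (S - F) X with the smallest eigenvalues, subject to P^T P = I.

Minimizing sum_ij F_ij ||P^T x_i - P^T x_j||^2 pulls same-speaker neighbors together and pushes different-speaker neighbors apart. The function names, the neighborhood size k, the output dimension and the toy data are hypothetical. The paper's exact graph weights, and where DNE sits in its total-variability pipeline, are not specified above.

```python
import numpy as np


def dne_adjacency(X, labels, k):
    """Signed adjacency matrix F of the DNE graph (assumed +1/-1 weighting).

    X: (n, D) array, one sample per row; labels: (n,) class (speaker) ids; k < n.
    F[i, j] = +1 if i and j are k-nearest neighbors (in either direction) with the
    same label, -1 if they are neighbors with different labels, 0 otherwise.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a sample is not its own neighbor
    knn = np.argsort(dist, axis=1)[:, :k]             # indices of the k nearest samples
    neighbor = np.zeros((n, n), dtype=bool)
    neighbor[np.repeat(np.arange(n), k), knn.ravel()] = True
    neighbor = neighbor | neighbor.T                  # symmetrize the kNN relation
    same = labels[:, None] == labels[None, :]
    F = np.zeros((n, n))
    F[neighbor & same] = 1.0                          # within-class neighbors: pull together
    F[neighbor & ~same] = -1.0                        # between-class neighbors: push apart
    return F


def dne_projection(X, labels, k=5, dim=None):
    """Orthonormal P (D x dim) minimizing sum_ij F_ij ||P^T x_i - P^T x_j||^2."""
    X = np.asarray(X, dtype=float)
    F = dne_adjacency(X, labels, k)
    L = np.diag(F.sum(axis=1)) - F                    # S - F; every row sums to zero
    M = X.T @ L @ X                                   # objective = 2 tr(P^T M P)
    M = (M + M.T) / 2.0                               # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    if dim is None:
        # Negative eigenvalues: directions where between-class neighbor spread
        # exceeds within-class neighbor spread.
        dim = int(np.sum(eigvals < 0))
    return eigvecs[:, :dim], eigvals[:dim]


# Usage on hypothetical toy data: two "speakers", 40 vectors each, in 4-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 4)),
               rng.normal(0.0, 1.0, (40, 4)) + np.array([2.0, 0.0, 0.0, 0.0])])
y = np.array([0] * 40 + [1] * 40)
P, ev = dne_projection(X, y, k=5, dim=2)
Z = X @ P                                             # 80 x 2 embedded vectors
```

Every row of S - F sums to zero, so the objective depends only on differences between samples. The data therefore do not need to be centered before the projection is computed.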

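Table 1 Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (no channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE | 5.76 | 0.0575 | 6.98 | 0.0744 |
| DNE | 5.28 | 0.0544 | 6.35 | 0.0683 |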

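The tables report two standard verification metrics. As a reference for how they are obtained, the sketch below computes both from lists of target (same-speaker) and non-target trial scores.

- The EER is approximated as the operating point where the miss rate equals the false-alarm rate.
- minDCF is the minimum over all thresholds of the detection cost C_det = C_miss · P_miss · P_target + C_fa · P_fa · (1 - P_target).

The cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01, and the absence of normalization, are assumptions chosen for illustration. The tables do not state which DCF settings were used. The function names and scores are hypothetical.

```python
def error_rates(targets, nontargets, threshold):
    """Miss and false-alarm rates when accepting every trial with score >= threshold."""
    # Miss: a target (same-speaker) trial scored below the threshold is rejected.
    p_miss = sum(s < threshold for s in targets) / len(targets)
    # False alarm: a non-target (different-speaker) trial at or above the threshold is accepted.
    p_fa = sum(s >= threshold for s in nontargets) / len(nontargets)
    return p_miss, p_fa


def eer_and_min_dcf(targets, nontargets, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Approximate EER and unnormalized minDCF by sweeping thresholds over the scores.

    The default cost parameters are an assumption, not taken from the paper.
    O(N^2) for clarity; a sorted cumulative sweep would be O(N log N).
    """
    thresholds = sorted(set(targets) | set(nontargets))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold above every score (reject all)
    best_gap, eer, min_dcf = float("inf"), 0.0, float("inf")
    for t in thresholds:
        p_miss, p_fa = error_rates(targets, nontargets, t)
        # EER: the operating point where miss and false-alarm rates are (closest to) equal.
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2.0
        # Detection cost at this threshold; keep the minimum over all thresholds.
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        min_dcf = min(min_dcf, dcf)
    return eer, min_dcf


# Usage with hypothetical toy scores.
targets = [2.1, 1.8, 1.5, 0.9, 0.4]
nontargets = [-2.0, -1.2, -0.5, 0.1, 0.6]
eer, min_dcf = eer_and_min_dcf(targets, nontargets)
print(f"EER = {eer:.2f}, minDCF = {min_dcf:.3f}")  # EER = 0.20, minDCF = 0.020
```

The tables express EER as a percentage, so an EER of 0.0528 from this function would correspond to the 5.28% entry format used above.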

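Table 2 Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+LDA | 4.71 | 0.0492 | 6.11 | 0.0633 |
| DNE+LDA | 4.19 | 0.0453 | 5.57 | 0.0604 |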


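Table 3 Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (WCCN channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+WCCN | 5.07 | 0.0512 | 6.49 | 0.0677 |
| DNE+WCCN | 4.59 | 0.0478 | 5.83 | 0.0617 |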


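Table 4 Comparison of EER and minDCF between DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA+WCCN channel compensation)

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| NPE+LDA+WCCN | 4.41 | 0.0476 | 5.72 | 0.0584 |
| DNE+LDA+WCCN | 4.15 | 0.0434 | 5.24 | 0.0553 |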


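Table 5 Comparison of EER and minDCF between DNE and PLDA on the NIST SRE 2010 telephone-telephone test set

| System | Male EER (%) | Male minDCF | Female EER (%) | Female minDCF |
|---|---|---|---|---|
| DNE+LDA+WCCN | 4.15 | 0.0434 | 5.24 | 0.0553 |
| PLDA | 4.12 | 0.0428 | 5.37 | 0.0532 |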


