Speaker Recognition Using Discriminant Neighborhood Embedding
doi: 10.11999/JEIT180761
1. College of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China
2. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China
Abstract: The Discriminant Neighborhood Embedding (DNE) algorithm is introduced into a speaker recognition system. DNE is a manifold learning method that captures the local neighborhood structure of the data by constructing an adjacency graph. It also makes full use of between-class discriminant information, which gives it stronger discriminative power. Experimental results on the telephone-telephone core condition of the NIST 2010 Speaker Recognition Evaluation (NIST SRE 2010) demonstrate the effectiveness of the algorithm.
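The abstract's description of DNE (an adjacency graph that encodes local neighborhoods plus class labels) can be illustrated with a minimal sketch. It assumes the commonly cited DNE formulation, in which neighboring pairs get weight +1 when they share a class and -1 otherwise, and the projection is taken from the eigenvectors with the smallest eigenvalues of X^T (S - F) X. The exact variant used in this paper is not given on this page, and the function name `dne_projection`, the parameters `k` and `dim`, and the random stand-in data are hypothetical.

```python
import numpy as np


def dne_projection(X, labels, k=3, dim=2):
    """Sketch of a DNE-style projection (assumed formulation, not the paper's code).

    X: (n_samples, n_features) array, e.g. one row per utterance-level vector.
    labels: (n_samples,) class (speaker) labels.
    Returns an (n_features, dim) matrix with orthonormal columns.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise squared Euclidean distances; a sample is not its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)

    # Symmetric k-nearest-neighbor relation: i ~ j if either is in the other's kNN.
    knn = np.argsort(d2, axis=1)[:, :k]
    neighbor = np.zeros((n, n), dtype=bool)
    neighbor[np.repeat(np.arange(n), k), knn.ravel()] = True
    neighbor |= neighbor.T

    # Adjacency weights: +1 for same-class neighbors, -1 for different-class
    # neighbors, 0 for non-neighbors.
    same = labels[:, None] == labels[None, :]
    F = np.where(neighbor, np.where(same, 1.0, -1.0), 0.0)
    S = np.diag(F.sum(axis=1))

    # Minimizing tr(P^T X^T (S - F) X P) with P^T P = I pulls same-class
    # neighbors together and pushes different-class neighbors apart.
    M = X.T @ (S - F) @ X
    M = (M + M.T) / 2.0  # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, :dim]


# Usage on random stand-in data (two hypothetical speakers, 10 vectors each).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
y_demo = np.repeat([0, 1], 10)
P_demo = dne_projection(X_demo, y_demo, k=3, dim=2)
Z_demo = X_demo @ P_demo  # projected vectors, shape (20, 2)
```

In a speaker recognition pipeline, the projected vectors would then be scored (for example with cosine similarity), optionally after channel compensation such as LDA or WCCN as in the tables below.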
-
Table 1 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (no channel compensation)
System    Male EER(%)    Male minDCF    Female EER(%)    Female minDCF
NPE       5.76           0.0575         6.98             0.0744
DNE       5.28           0.0544         6.35             0.0683
Table 2 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA channel compensation)
System     Male EER(%)    Male minDCF    Female EER(%)    Female minDCF
NPE+LDA    4.71           0.0492         6.11             0.0633
DNE+LDA    4.19           0.0453         5.57             0.0604
Table 3 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (WCCN channel compensation)
System      Male EER(%)    Male minDCF    Female EER(%)    Female minDCF
NPE+WCCN    5.07           0.0512         6.49             0.0677
DNE+WCCN    4.59           0.0478         5.83             0.0617
Table 4 Comparison of EER and minDCF for DNE and NPE on the NIST SRE 2010 telephone-telephone test set (LDA+WCCN channel compensation)
System          Male EER(%)    Male minDCF    Female EER(%)    Female minDCF
NPE+LDA+WCCN    4.41           0.0476         5.72             0.0584
DNE+LDA+WCCN    4.15           0.0434         5.24             0.0553
Table 5 Comparison of EER and minDCF for DNE and PLDA on the NIST SRE 2010 telephone-telephone test set
System          Male EER(%)    Male minDCF    Female EER(%)    Female minDCF
DNE+LDA+WCCN    4.15           0.0434         5.24             0.0553
PLDA            4.12           0.0428         5.37             0.0532