Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone

Han Zhao-bing; Zhang Hua-yun; Zhang Shu-wu; Xu Bo

Citation:

Han Zhao-bing, Zhang Hua-yun, Zhang Shu-wu, Xu Bo. Dynamic Channel Compensation Based on Statistical Model for Mandarin Speech Recognition over Telephone[J]. Journal of Electronics & Information Technology, 2004, 26(11): 1714-1720.

Publication history
- Received: 2003-06-12
- Revised: 2004-03-23
- Published: 2004-11-19

Abstract

Abstract: Automatic speech recognition over telephone networks still achieves lower accuracy than in desktop environments, and improving it is an urgent prerequisite for practical deployment. Previous work has shown that the degradation is caused mainly by the mismatch between the telephone channels of the training and test data. This paper proposes a statistical-model-based dynamic channel compensation algorithm (SMDC) that reduces this mismatch using a phone-conditioned prior statistical model of the channel bias. The algorithm applies Bayesian estimation to estimate the telephone channel and to track its time variation dynamically. In experiments on Mandarin large-vocabulary continuous speech recognition (LVCSR) over telephone lines, the character error rate (CER) falls by about 27% relative; in isolated-word (short-utterance) tests, the word error rate (WER) falls by about 30% relative. The structural delay and computational cost of the algorithm are small, with an average delay of about 200 ms, so it can readily be embedded in practical telephone speech recognition applications.

Keywords: telephone speech recognition; dynamic channel compensation; maximum likelihood estimation; maximum a posteriori estimation
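The abstract gives no equations, so the following is only a minimal sketch of the general idea behind Bayesian channel-bias tracking, not the SMDC algorithm itself. It assumes the channel acts as an additive bias in the cepstral domain, that each frame has already been assigned to an acoustic class with a known diagonal Gaussian, and that the bias has a single Gaussian prior (the paper's prior is phone-conditioned). The class name `MapBiasTracker`, its parameters, and the `forgetting` factor are all hypothetical.

```python
from typing import List, Sequence


class MapBiasTracker:
    """Sequential MAP estimate of an additive cepstral channel bias.

    Assumed model (illustrative, not taken from the paper):
        y_t = x_t + b,  x_t ~ N(class_mean, class_var),  b ~ N(prior_mean, prior_var)
    All covariances are diagonal, so each cepstral dimension is independent.
    """

    def __init__(self, prior_mean: Sequence[float], prior_var: Sequence[float],
                 forgetting: float = 1.0) -> None:
        self.prior_mean = [float(m) for m in prior_mean]
        self.prior_prec = [1.0 / float(v) for v in prior_var]
        # forgetting < 1.0 discounts old frames so the estimate can follow drift
        self.forgetting = forgetting
        dim = len(self.prior_mean)
        self.acc_prec = [0.0] * dim  # accumulated frame precisions
        self.acc_wsum = [0.0] * dim  # accumulated precision-weighted residuals

    def update(self, frame: Sequence[float], class_mean: Sequence[float],
               class_var: Sequence[float]) -> List[float]:
        """Fold one observed frame into the statistics and return the new bias."""
        for d, (y, mu, var) in enumerate(zip(frame, class_mean, class_var)):
            prec = 1.0 / var
            self.acc_prec[d] = self.forgetting * self.acc_prec[d] + prec
            self.acc_wsum[d] = self.forgetting * self.acc_wsum[d] + prec * (y - mu)
        return self.bias()

    def bias(self) -> List[float]:
        """Posterior mean of the bias: precision-weighted mix of prior and data."""
        return [
            (p0 * m0 + ws) / (p0 + ap)
            for p0, m0, ws, ap in zip(self.prior_prec, self.prior_mean,
                                      self.acc_wsum, self.acc_prec)
        ]

    def compensate(self, frame: Sequence[float]) -> List[float]:
        """Subtract the current bias estimate from an observed frame."""
        return [y - b for y, b in zip(frame, self.bias())]
```

With `forgetting=1.0` the estimate is the exact Gaussian posterior mean given all frames seen so far: before any frame arrives it equals the prior mean, and it moves toward the observed residuals as evidence accumulates. Because the estimate is available after every frame, a scheme of this kind adds little latency, which is in keeping with the low delay the abstract reports.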
