Dynamic Channel Compensation Based on a Statistical Model for Mandarin Speech Recognition over Telephone
Abstract: Speech recognition over telephone networks is still markedly less accurate than recognition in desktop environments, and raising that accuracy is a prerequisite for practical telephone-based applications. Previous work has shown that the degradation is caused mainly by the mismatch between the telephone channels of the training and test data. This paper proposes a statistical-model-based dynamic channel compensation (SMDC) algorithm that reduces this mismatch using a phone-conditioned prior statistical model of the channel bias. The algorithm applies Bayesian estimation to estimate the telephone channel and to track its time variation dynamically. In experiments on Mandarin large-vocabulary continuous speech recognition (LVCSR) over telephone lines, the character error rate (CER) falls by about 27% relative; in an isolated-word (short-utterance) test, the word error rate (WER) falls by about 30% relative. The structural delay and computational cost of the algorithm are small, with an average delay of about 200 ms, so it can be embedded readily in practical telephone speech recognition applications.

Keywords: telephone speech recognition; dynamic channel compensation; maximum likelihood estimation; maximum a posteriori estimation
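The abstract does not give the SMDC equations, so the following is only a minimal sketch of the general idea of tracking a channel bias with Bayesian (MAP) estimation, not the paper's algorithm. It assumes, for illustration, that the channel acts as an additive bias on each cepstral dimension, that the bias has a Gaussian prior, and that each frame is already aligned to a Gaussian clean-speech model with a known mean and variance. All names (`BiasPosterior`, `init_posterior`, `update`, `compensate`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BiasPosterior:
    """Gaussian belief about the channel bias, one entry per cepstral dimension."""
    mean: List[float]
    precision: List[float]  # inverse variance of the belief


def init_posterior(prior_mean: Sequence[float],
                   prior_var: Sequence[float]) -> BiasPosterior:
    """Start from the prior (assumed Gaussian; a phone-conditioned prior
    would supply a different mean/variance per phone)."""
    return BiasPosterior(list(prior_mean), [1.0 / v for v in prior_var])


def update(post: BiasPosterior,
           observed: Sequence[float],
           clean_mean: Sequence[float],
           clean_var: Sequence[float]) -> BiasPosterior:
    """Fold one frame into the belief using the conjugate Gaussian update.

    Under the assumed model observed = clean + bias, the frame provides a
    noisy measurement (observed - clean_mean) of the bias with variance
    clean_var. The new mean is the MAP estimate of the bias so far.
    """
    new_mean, new_prec = [], []
    for m, p, y, mu, var in zip(post.mean, post.precision,
                                observed, clean_mean, clean_var):
        frame_prec = 1.0 / var
        total = p + frame_prec
        new_mean.append((p * m + frame_prec * (y - mu)) / total)
        new_prec.append(total)
    return BiasPosterior(new_mean, new_prec)


def compensate(observed: Sequence[float], post: BiasPosterior) -> List[float]:
    """Remove the current bias estimate from an observed feature vector."""
    return [y - b for y, b in zip(observed, post.mean)]
```

Because the estimate is updated frame by frame, compensation can begin before the utterance ends, which is the property that keeps structural delay low in a scheme of this kind. A fixed Gaussian belief like this one converges rather than tracks; following a time-varying channel would need some form of forgetting, which is left out here.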