A Chinese Pinyin-to-Character Conversion Method That Incorporates Part-of-Speech Information into a Trigram-Based Statistical Model
Abstract: This paper presents a combined Chinese language model that incorporates part-of-speech (POS) information into a trigram-based statistical language model. Theoretical analysis and experiments both show that the combined model has lower perplexity (PP) than the plain trigram model and is also less dependent on the domain of the test text.
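The abstract does not say how the POS information is combined with the word trigram, so the following is only a minimal sketch under assumptions: it uses linear interpolation between a word-trigram probability and a POS-class-based probability, which is one common way to build a combined model, and it computes perplexity in the standard way from per-token log probabilities. The function names, the weight `lam`, and all probability values are hypothetical and are not taken from the paper.

```python
import math


def interpolate(p_word, p_class, lam):
    """Linearly interpolate a word-trigram probability with a
    class-based (POS) probability. `lam` is an assumed mixing weight
    in [0, 1]; the paper's actual combination scheme is not given."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


def perplexity(probs):
    """Perplexity of a token sequence given the model probability of
    each token: PP = 2 ** (-(1/N) * sum(log2 p_i))."""
    if not probs:
        raise ValueError("need at least one probability")
    log2_sum = sum(math.log2(p) for p in probs)
    return 2.0 ** (-log2_sum / len(probs))


# Hypothetical per-token probabilities for a four-token test sentence.
# The word trigram gives a very small probability to an unseen context
# (third token), while the POS-based estimate stays moderate.
p_word = [0.20, 0.10, 0.001, 0.25]
p_class = [0.05, 0.05, 0.04, 0.05]

combined = [interpolate(w, c, lam=0.7) for w, c in zip(p_word, p_class)]

pp_trigram = perplexity(p_word)
pp_combined = perplexity(combined)
```

With these made-up numbers the combined model yields the lower perplexity because the class-based term smooths the sparsely observed trigram; real results depend on the corpus and on how the weight is estimated.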