Neural Network Language Modeling Using an Improved Topic Distribution Feature
doi: 10.11999/JEIT170219
The National Natural Science Foundation of China (11590770-4, U1536117, 11504406, 11461141004), The National Key Research and Development Plan (2016YFB0801203, 2016YFB0801200), The Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1)
Abstract: Appending a feature vector that represents the topic of the current word to the input of a Recurrent Neural Network (RNN) language model is an effective way to exploit long-span history information. Because topic distributions usually differ greatly across documents, this paper proposes a method that uses the document-level topic probabilities to improve the topic feature of the current word, and applies the improved feature to a recurrent language model built on Long Short-Term Memory (LSTM) units. Experiments show that on the Penn TreeBank (PTB) dataset the proposed method reduces the perplexity of the language model by 11.8% relative to the baseline. In N-best rescoring experiments on the SWitchBoarD (SWBD) dataset, the proposed feature reduces the Word Error Rate (WER) of the LSTM model by 6.0% relative to the baseline; on the Wall Street Journal (WSJ) dataset it yields a 6.8% relative WER reduction, and on the eval92 test set the improved Latent Dirichlet Allocation (LDA) feature brings the RNN up to performance comparable with the LSTM.
Keywords:
- speech recognition
- language model
- Latent Dirichlet Allocation
- Long Short-Term Memory
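The abstract above does not give the exact rule for combining the document-level topic probabilities with the current word's topic feature, so the NumPy sketch below only illustrates one plausible reading: the word's topic posterior under a trained LDA model is re-weighted by the document's topic distribution and renormalized. The function name `improved_topic_feature` and the combination rule itself are assumptions for illustration, not the paper's formula.

```python
# Minimal sketch (assumed combination rule, not the paper's exact formula):
# re-weight the word's LDA topic evidence by the document's topic distribution.
import numpy as np

def improved_topic_feature(word_id, phi, theta_doc):
    """Return a K-dimensional topic feature for one word occurrence.

    phi:       (K, V) topic-word probabilities from a trained LDA model
    theta_doc: (K,)   topic distribution of the current document
    """
    scores = phi[:, word_id] * theta_doc              # word-level evidence times document-level prior
    total = scores.sum()
    if total == 0.0:                                  # fall back to a uniform feature for unseen words
        return np.full(len(theta_doc), 1.0 / len(theta_doc))
    return scores / total                             # renormalize into a distribution over topics

# Toy example: 50 topics, 1000-word vocabulary.
rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(1000), size=50)           # each row is a distribution over the vocabulary
theta = rng.dirichlet(np.ones(50))                    # topic distribution of one document
feat = improved_topic_feature(word_id=42, phi=phi, theta_doc=theta)
print(feat.shape, feat.sum())                         # (50,) 1.0
```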
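On the model side, the abstract states that the (improved) topic feature is attached to the input of the recurrent language model. The PyTorch sketch below shows such input-level concatenation for an LSTM language model; the class name `TopicLSTMLM` and all layer sizes are illustrative assumptions rather than the configuration used in the paper.

```python
# Minimal sketch of an LSTM language model whose input concatenates the word
# embedding with a per-word topic feature vector (e.g. the improved LDA feature).
import torch
import torch.nn as nn

class TopicLSTMLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=200, topic_dim=50, hidden_dim=400):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The LSTM sees [word embedding ; topic feature] at every time step.
        self.lstm = nn.LSTM(embed_dim + topic_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, topic_feats, state=None):
        # word_ids:    (batch, seq_len)            token indices
        # topic_feats: (batch, seq_len, topic_dim) per-word topic vectors
        x = torch.cat([self.embed(word_ids), topic_feats], dim=-1)
        h, state = self.lstm(x, state)
        return self.out(h), state                 # next-word logits per position

# Toy usage: batch of 2 sequences of length 5, vocabulary of 1000 words, 50 topics.
model = TopicLSTMLM(vocab_size=1000)
words = torch.randint(0, 1000, (2, 5))
topics = torch.softmax(torch.randn(2, 5, 50), dim=-1)  # stand-in topic features
logits, _ = model(words, topics)
print(logits.shape)                                     # torch.Size([2, 5, 1000])
```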