
Neural Network Language Modeling Using an Improved Topic Distribution Feature

LIU Chang, ZHANG Yike, ZHANG Pengyuan, YAN Yonghong

Citation: LIU Chang, ZHANG Yike, ZHANG Pengyuan, YAN Yonghong. Neural Network Language Modeling Using an Improved Topic Distribution Feature[J]. Journal of Electronics & Information Technology, 2018, 40(1): 219-225. doi: 10.11999/JEIT170219

doi: 10.11999/JEIT170219
Funds:

The National Natural Science Foundation of China (11590770-4, U1536117, 11504406, 11461141004), The National Key Research and Development Plan (2016YFB0801203, 2016YFB0801200), The Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1)

  • Abstract: Adding a feature vector that represents the topic of the current word to the input of a recurrent neural network (RNN) language model is an effective way to exploit long-span history information. Since the topic probability distributions of different documents usually differ greatly, this paper proposes a method that uses document-level topic probabilities to improve the topic feature of the current word, and applies the improved feature to a recurrent neural network language model based on Long Short-Term Memory (LSTM) units. Experiments show that on the PTB dataset the proposed method reduces the perplexity of the language model by 11.8% relative to the baseline system. In N-best rescoring experiments on the SWBD dataset, the proposed feature gives the LSTM model a 6.0% relative reduction in word error rate (WER) over the baseline; on the WSJ dataset it gives a 6.8% relative WER reduction, and on the eval92 test set the improved Latent Dirichlet Allocation (LDA) feature makes the RNN perform comparably to the LSTM.
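The abstract describes concatenating a topic-distribution feature to each input word of an LSTM language model. The sketch below illustrates that idea in PyTorch; it is not the authors' implementation, and re-weighting the word-level topic posterior p(z|w) by the document-level distribution p(z|d) is only one plausible reading of "improving the word topic feature with document topic probabilities". All names, dimensions, and the normalization step are illustrative assumptions.

```python
# Minimal sketch, not the paper's implementation: an LSTM language model whose
# word embedding is concatenated with an LDA topic-distribution feature.
import torch
import torch.nn as nn


def improved_topic_feature(word_topic, doc_topic, eps=1e-8):
    """Assumed re-weighting: scale the word topic posterior p(z|w) by the
    document topic distribution p(z|d) and renormalize to a distribution."""
    weighted = word_topic * doc_topic                 # (batch, seq, topics)
    return weighted / (weighted.sum(dim=-1, keepdim=True) + eps)


class TopicLSTMLM(nn.Module):
    """LSTM language model with a topic feature appended to each input word."""
    def __init__(self, vocab_size, num_topics, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # LSTM input = word embedding + topic-distribution feature
        self.lstm = nn.LSTM(embed_dim + num_topics, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, words, topic_feats, state=None):
        # words:       (batch, seq)          token ids
        # topic_feats: (batch, seq, topics)  improved topic features per word
        x = torch.cat([self.embed(words), topic_feats], dim=-1)
        h, state = self.lstm(x, state)
        return self.out(h), state             # logits over the vocabulary
```

In use, word_topic and doc_topic would come from an LDA model trained on the corpus, and the network would be trained with the usual cross-entropy objective for next-word prediction.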
Metrics
  • Article views: 1389
  • HTML full-text views: 134
  • PDF downloads: 322
  • Citations: 0
Publication history
  • Received: 2017-03-17
  • Revised: 2017-10-06
  • Published: 2018-01-19
