
Spatial Smoothing Regularization for Bi-direction Long Short-term Memory Model

Wenjie LI, Fengpei GE, Pengyuan ZHANG, Yonghong YAN

Citation: Wenjie LI, Fengpei GE, Pengyuan ZHANG, Yonghong YAN. Spatial Smoothing Regularization for Bi-direction Long Short-term Memory Model[J]. Journal of Electronics & Information Technology, 2019, 41(3): 544-550. doi: 10.11999/JEIT180314


doi: 10.11999/JEIT180314
Funds: The National Key Research and Development Plan (2016YFB0801203, 2016YFB0801200), The National Natural Science Foundation of China (11590770-4, U1536117, 11504406, 11461141004), The Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (2016A03007-1)
Details
    About the authors:

    Wenjie LI: Female, born in 1993, Ph.D. candidate. Her research interests include speech signal processing, speech recognition, acoustic modeling, and far-field speech recognition

    Fengpei GE: Female, born in 1982, associate researcher. Her research interests include speech recognition, pronunciation quality assessment, acoustic modeling, and adaptation

    Pengyuan ZHANG: Male, born in 1978, researcher and master's supervisor. His research interests include large-vocabulary speaker-independent continuous speech recognition, keyword search, acoustic modeling, and robust speech recognition

    Yonghong YAN: Male, born in 1967, researcher and doctoral supervisor. His research interests include speech signal processing, speech recognition, spoken dialogue and multimodal systems, and human-machine interface technology

    Corresponding author:

    Pengyuan ZHANG, pzhang@hccl.ioa.ac.cn

  • CLC number: TN912.34

  • Abstract:

    The bi-directional long short-term memory (BLSTM) model has become the dominant acoustic model architecture in speech recognition, owing to its strong temporal sequence modeling ability and good training stability. However, its larger computational cost and parameter count make it prone to overfitting during neural network training, which prevents it from reaching ideal recognition performance. In practice, several techniques are commonly used to alleviate overfitting; adding an L2 regularization term to the objective function is one of them. This paper proposes a spatial smoothing method: the activation vector of the BLSTM model is reorganized into a 2-D grid, its spatial information is extracted with a filtering transform, and smoothing this spatial information is taken as an auxiliary optimization target that, together with the conventional loss function, forms the learning criterion for optimizing the network parameters. Experiments on a conversational telephone speech recognition task show that this method achieves a relative 4% word error rate (WER) reduction over the baseline model. The complementarity between L2-norm regularization and spatial smoothing is further explored, and applying both together yields a relative 8.6% WER reduction.
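The abstract describes the smoothing objective only at a high level. Below is a minimal PyTorch sketch of what such an auxiliary penalty could look like, assuming a 1024-unit BLSTM layer whose activation vector is reshaped into a 32×32 grid (cf. Figure 2) and a simple neighbour-averaging filter; the function name spatial_smoothing_loss, the grid size, and the filter choice are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def spatial_smoothing_loss(h, grid=(32, 32)):
        # h: BLSTM activations of shape (batch, time, hidden),
        # where hidden == grid[0] * grid[1] (e.g. 1024 = 32 x 32).
        b, t, d = h.shape
        assert d == grid[0] * grid[1], "hidden size must fill the 2-D grid"
        img = h.reshape(b * t, 1, grid[0], grid[1])  # one 2-D map per frame
        # 3x3 filter averaging the 8 neighbours of each unit (centre excluded)
        kernel = torch.tensor([[1., 1., 1.],
                               [1., 0., 1.],
                               [1., 1., 1.]],
                              dtype=h.dtype, device=h.device) / 8.0
        neigh_mean = F.conv2d(img, kernel.view(1, 1, 3, 3), padding=1)
        # penalise units that deviate from their local neighbourhood mean
        return ((img - neigh_mean) ** 2).mean()

In training, this term would be scaled by the smoothing weight c (Tables 1 and 2) and added to the conventional loss, so that neighbouring units in the grid are encouraged to take similar activation values.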

  • Figure 1  The memory cell of the LSTM network

    Figure 2  Reshaping the 1-D activation vector into a 2-D grid

    Figure 3  Model architecture

    Table 1  Results of spatial smoothing at different positions

    Smoothing position   Smoothing weight (c)   CallHm WER (%)   Swbd WER (%)   Total WER (%)
    – (baseline)         –                      20.0             10.3           15.2
    P1                   0.0020                 19.9             10.4           15.2
    P1                   0.0010                 19.9             10.0           15.0
    P1                   0.0007                 20.0             10.3           15.2
    P2                   0.0020                 19.7             10.0           14.9
    P2                   0.0010                 19.7             9.8            14.8
    P2                   0.0007                 19.9             9.8            15.0
    P3                   0.0020                 20.1             10.3           15.2
    P3                   0.0010                 20.0             9.8            15.0
    P3                   0.0007                 20.0             10.1           15.1
    P4                   0.0010                 20.9             10.6           15.8
    P4                   0.0007                 20.6             10.3           15.5
    P4                   0.0006                 20.5             10.6           15.6

    Table 2  Spatial smoothing results on the cell state $c_t$ under different weights

    Smoothing weight (c)   CallHm WER (%)   Swbd WER (%)   Total WER (%)
    – (baseline)           20.0             10.3           15.2
    0.0100                 20.3             10.4           15.4
    0.0010                 19.7             9.8            14.8
    0.0009                 19.3             9.8            14.6
    0.0008                 19.6             9.7            14.7
    0.0007                 19.9             9.8            15.0

    Table 3  Results after adding L2 regularization to the network

    L2 regularization   Spatial smoothing   CallHm WER (%)   Swbd WER (%)   Total WER (%)
    No                  No                  20.0             10.3           15.2
    No                  Yes                 19.3             9.8            14.6
    Yes                 No                  19.0             9.5            14.3
    Yes                 Yes                 18.5             9.3            13.9
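Table 3 suggests the two regularizers are complementary. The sketch below combines them into one training criterion under the same assumptions as the earlier spatial_smoothing_loss sketch; the toy model, batch, and the L2 coefficient l2_weight are hypothetical placeholders (the page does not state the paper's L2 setting), while 0.0009 is the best smoothing weight from Table 2.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy BLSTM and batch, for illustration only.
    model = nn.LSTM(input_size=40, hidden_size=512,
                    bidirectional=True, batch_first=True)
    head = nn.Linear(1024, 4000)               # toy classifier over 4000 states
    feats = torch.randn(8, 100, 40)            # (batch, time, feature)
    targets = torch.randint(0, 4000, (8,))     # toy utterance-level labels
    l2_weight = 1e-5                           # hypothetical L2 coefficient

    h, _ = model(feats)                        # BLSTM activations: (8, 100, 1024)
    logits = head(h.mean(dim=1))               # pool over time, then classify
    ce = F.cross_entropy(logits, targets)      # conventional loss
    l2 = sum(p.pow(2).sum() for p in model.parameters())  # L2 weight penalty
    loss = ce + l2_weight * l2 + 0.0009 * spatial_smoothing_loss(h)
    loss.backward()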
Publication history
  • Received: 2018-04-03
  • Revised: 2018-11-22
  • Available online: 2018-12-03
  • Published: 2019-03-01
