Adaptively Reserved Likelihood Ratio-based Robust Voice Activity Detection with Sub-band Double Features

HE Weijun, HE Qianhua, WU Junfeng, YANG Jichen

Citation: HE Weijun, HE Qianhua, WU Junfeng, YANG Jichen. Adaptively Reserved Likelihood Ratio-based Robust Voice Activity Detection with Sub-band Double Features[J]. Journal of Electronics & Information Technology, 2016, 38(11): 2879-2886. doi: 10.11999/JEIT160157

Funds: The National Natural Science Foundation of China (61571192), The Science and Technology Foundation of Guangdong Province (2015A010103003), The Fundamental Research Funds for the Central Universities, SCUT (2015ZM143)

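Abstract: To further improve the accuracy of Voice Activity Detection (VAD) at low Signal-to-Noise Ratio (SNR), this paper proposes an adaptively reserved likelihood ratio-based robust VAD algorithm with sub-band double features. The algorithm uses two features, the sub-band normalized maximum autocorrelation function and the sub-band normalized average zero-crossing rate, to set the reserving weights of the frequency-component likelihood ratios. It also adaptively estimates the reserving threshold of the likelihood ratio from the VAD decisions made over a past window of fixed duration and the corresponding sub-band feature parameters. Experimental results show that, compared with the original reserved likelihood ratio algorithm, the proposed algorithm improves VAD accuracy by 1.2%, 7.2% and 8.1% under stationary white noise at 10 dB, 0 dB and -10 dB respectively, and by 1.6% and 3.4% under non-stationary Babble noise at 10 dB and 0 dB respectively. When applied to a 2.4 kbps low-rate vocoder system, it raises the Perceptual Evaluation of Speech Quality (PESQ) score of the synthesized speech over the original vocoder system by 0.098 to 0.153 under white noise and by 0.157 to 0.186 under Babble noise.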
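The abstract only names the ingredients of the detector, so the following is a rough illustrative sketch, not the authors' algorithm. It computes the two per-sub-band features the abstract mentions (normalized maximum autocorrelation and normalized zero-crossing rate) and uses them to weight per-frequency-bin log likelihood ratios before a frame-level decision. Assumptions, all introduced here: the Gaussian statistical-model likelihood ratio of Sohn et al. (1999), equal-width sub-bands obtained by FFT masking, a known noise power spectrum `noise_psd`, and the maximum-likelihood a priori SNR estimate. The feature-to-weight mapping `r * (1 - z)` and every function name are hypothetical, and the paper's adaptive threshold estimation is not reproduced (the threshold is simply an argument).

```python
import numpy as np

def subband_signals(frame, n_bands):
    """Split a frame into n_bands equal-width sub-band signals by FFT masking.
    Equal-width bands and FFT masking are assumptions of this sketch."""
    spec = np.fft.rfft(frame)
    edges = np.linspace(0, len(spec), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(frame)))
    return bands, edges

def normalized_max_autocorr(x, min_lag, max_lag):
    """Largest autocorrelation over lags [min_lag, max_lag] divided by the
    zero-lag value, computed on the mean-removed signal."""
    x = x - np.mean(x)
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return 0.0
    peak = max(float(np.dot(x[:-k], x[k:])) for k in range(min_lag, max_lag + 1))
    return peak / energy

def normalized_zcr(x):
    """Fraction of adjacent sample pairs with differing signs, in [0, 1]."""
    s = np.signbit(x)
    return np.count_nonzero(s[1:] != s[:-1]) / (len(x) - 1)

def sohn_log_lr(gamma, xi):
    """Per-bin log likelihood ratio of the Gaussian model (Sohn et al., 1999):
    log L_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)."""
    return gamma * xi / (1.0 + xi) - np.log1p(xi)

def frame_is_speech(frame, noise_psd, threshold, n_bands=4, min_lag=20, max_lag=160):
    """Feature-weighted LR decision for one frame (hypothetical weighting).
    noise_psd must have len(np.fft.rfft(frame)) entries. Lags 20..160 assume
    8 kHz sampling and a pitch range of roughly 50..400 Hz."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    gamma = power / noise_psd                  # a posteriori SNR per bin
    xi = np.maximum(gamma - 1.0, 0.0)          # ML a priori SNR (simplification)
    log_lr = sohn_log_lr(gamma, xi)

    bands, edges = subband_signals(frame, n_bands)
    weights = np.empty_like(log_lr)
    for b, x in enumerate(bands):
        r = np.clip(normalized_max_autocorr(x, min_lag, max_lag), 0.0, 1.0)
        z = normalized_zcr(x)
        # HYPOTHETICAL mapping: periodic, low-ZCR sub-bands get larger weights;
        # the small floor keeps the weight sum positive.
        weights[edges[b]:edges[b + 1]] = r * (1.0 - z) + 1e-3
    score = np.sum(weights * log_lr) / np.sum(weights)
    return score > threshold, score
```

In practice the noise spectrum would be tracked over time and the a priori SNR would usually come from the decision-directed estimator of Ephraim and Malah rather than the plain maximum-likelihood estimate used here; both were kept fixed only to keep the sketch short.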
Publication history
  • Received: 2016-02-04
  • Revised: 2016-06-27
  • Published: 2016-11-19
