Speech Bandwidth Extension Based on Restricted Boltzmann Machines

WANG Yingxue; ZHAO Shenghui; YU Yingying; KUANG Jingming

doi: 10.11999/JEIT151034; CSTR: 32379.14.JEIT151034

Publication history
- Received: 2015-09-14
- Revised: 2016-03-03
- Published: 2016-07-19

Abstract: Speech Bandwidth Extension (BWE) is a technique that improves speech quality by reconstructing the missing High Frequency (HF) components from the correlation between the Low Frequency (LF) and HF parts of wideband speech. Methods based on the Gaussian Mixture Model (GMM) are widely used, but they assume that the LF and HF parts follow Gaussian distributions and model only a linear relationship between them, which distorts the reconstructed HF speech. This study proposes a new speech BWE method that uses two Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBMs) to extract the high-order statistical characteristics of the LF and HF spectral envelopes, respectively. A Feedforward Neural Network (FNN) then maps the high-order features of the LF part to those of the HF part. By extracting high-order statistics from both bands, the proposed method learns a deeper relationship between the LF and HF spectral envelopes, models the distribution of the spectral envelopes more precisely, and synthesizes higher-quality speech. Objective and subjective test results show that the proposed method outperforms the conventional GMM-based method.
Keywords:
- Speech bandwidth extension
- Restricted Boltzmann machines
- Feedforward neural network
- Gaussian mixture model
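
The abstract names the core building block, a GBRBM whose hidden-unit activations serve as high-order features of a spectral-envelope frame, but gives no training details. Below is a minimal sketch in plain Python, not the authors' implementation. It assumes unit-variance visible units (each envelope coefficient normalized to zero mean and unit variance beforehand), training by one-step contrastive divergence (CD-1) with a mean-field reconstruction, and toy two-pattern data standing in for real LF or HF envelopes. The class and method names (`GBRBM`, `hidden_probs`, `visible_mean`, `cd1_update`, `train`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class GBRBM:
    """Gaussian-Bernoulli RBM with unit-variance visible units (sketch).

    Assumption: every visible dimension is pre-normalized to zero mean and
    unit variance, so visible standard deviations are fixed at 1, not learned.
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases (Gaussian means)
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij): the "features".
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_mean(self, h):
        # E[v_i | h] = b_i + sum_j W_ij * h_j (unit variance).
        return [self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh))
                for i in range(self.nv)]

    def cd1_update(self, v0, lr=0.01):
        """One CD-1 step on a single frame; returns squared reconstruction error."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]
        # Mean-field reconstruction instead of sampling v1 ~ N(mean, 1),
        # a common variance-reducing simplification.
        v1 = self.visible_mean(h0)
        ph1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((a - r) ** 2 for a, r in zip(v0, v1))

    def train(self, data, epochs=20, lr=0.01):
        """Run CD-1 over shuffled data; returns the mean error of each epoch."""
        history = []
        for _ in range(epochs):
            order = list(range(len(data)))
            self.rng.shuffle(order)
            errs = [self.cd1_update(data[k], lr) for k in order]
            history.append(sum(errs) / len(errs))
        return history


# Toy usage: two noisy 4-dimensional patterns stand in for envelope frames.
data_rng = random.Random(1)
protos = [[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]]
data = [[x + data_rng.gauss(0.0, 0.1) for x in protos[k % 2]] for k in range(200)]
rbm = GBRBM(n_visible=4, n_hidden=3, seed=0)
history = rbm.train(data, epochs=30, lr=0.05)
features = rbm.hidden_probs(data[0])  # hidden-unit features for one frame
```

Following the scheme in the abstract, one such model would be trained on LF envelopes and another on HF envelopes, and an FNN would map the LF model's `hidden_probs` output to the HF model's features. The abstract does not say how the predicted HF features are turned back into an envelope; decoding them with the HF model's `visible_mean` is one plausible choice and is an assumption here.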
