A New Voice and Noise Activity Detection Algorithm and Its Application to Dual Microphone Noise Suppression System for Handset
doi: 10.11999/JEIT151302
Funds:
Natural Science Foundation of Jiangsu Province; Fund of the Jiangsu Provincial Key Laboratory of Audio Technology Engineering (BE2014139)
Program of Natural Science Research of Jiangsu Higher Education Institutions of China; Program of Science and Technology of Jiangsu (BE2014139)
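Abstract: Existing dual-channel Voice Activity Detection (VAD) algorithms rely on fixed thresholds, which makes it difficult to detect speech and noise accurately across diverse noise environments. In handset noise suppression systems this leads to speech distortion or insufficient noise removal. To address this, this paper proposes a VAD algorithm based on a neural network that uses the sub-band energy difference and the normalized inter-channel cross-correlation as features and classifies frames as speech or noise. Building on this, the neural-network VAD is combined with a VAD based on the inter-channel signal power ratio, yielding a new voice and noise activity detection algorithm for handset noise suppression that detects speech and noise separately. Noise suppression is then driven by these decisions, which reduces the performance loss caused by VAD misjudgments. Experimental results show that the method outperforms existing noise suppression algorithms in suppressing background noise and reducing speech distortion, and that it also suppresses directional speech interference effectively.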
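The abstract names the two classifier features but not how they are computed. The sketch below illustrates one plausible per-frame computation under stated assumptions: the bands are equal-width groups of DFT bins, the level difference is expressed in dB (a log power ratio) rather than as a linear difference, and the normalized cross-correlation is taken as its peak over a small lag range. All function and parameter names (`subband_pld_db`, `normalized_xcorr`, `frame_features`, `num_bands`, `max_lag`) are hypothetical and not taken from the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N/2 using a direct DFT (adequate for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def subband_pld_db(primary, secondary, num_bands=4, eps=1e-12):
    """Per-band power level difference (dB) between the primary and secondary mics.

    Assumption: equal-width bands of DFT bins, not the paper's band layout.
    """
    p1 = power_spectrum(primary)
    p2 = power_spectrum(secondary)
    bins = len(p1)
    edges = [round(b * bins / num_bands) for b in range(num_bands + 1)]
    plds = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        e1 = sum(p1[lo:hi]) + eps
        e2 = sum(p2[lo:hi]) + eps
        plds.append(10.0 * math.log10(e1 / e2))
    return plds


def normalized_xcorr(primary, secondary, max_lag=4, eps=1e-12):
    """Peak normalized cross-correlation over lags in [-max_lag, max_lag]."""
    n = len(primary)
    norm = math.sqrt(sum(x * x for x in primary) * sum(y * y for y in secondary)) + eps
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                acc += primary[t] * secondary[u]
        best = max(best, acc / norm)
    return best


def frame_features(primary, secondary, num_bands=4):
    """Feature vector for one frame: band level differences, then the correlation."""
    return subband_pld_db(primary, secondary, num_bands) + [normalized_xcorr(primary, secondary)]
```

For a near-field talker the secondary microphone picks up an attenuated copy of the speech, so every band shows a positive level difference and the correlation approaches 1; for diffuse noise the level differences fluctuate around 0 dB and the correlation stays low. A classifier trained on such vectors can exploit exactly this contrast.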
Abstract: Existing dual-microphone Voice Activity Detection (VAD) algorithms normally use a fixed threshold, which cannot provide accurate VAD under various noise environments. This degrades voice quality, particularly in handset applications. This paper proposes a new VAD algorithm based on a Neural Network (NN) that uses both the sub-band power level difference and the inter-microphone cross-correlation as features. The NN-based VAD is then combined with a method based on the inter-microphone signal power ratio to obtain a new voice and noise activity detection algorithm. The algorithm is applied to handset noise suppression to avoid the performance degradation caused by VAD misjudgment. Experimental results show that the proposed method provides better noise suppression and lower speech distortion than existing methods.
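The abstract does not give the network topology, the thresholds, or the rule that fuses the two detectors. As a hypothetical sketch only, the code below assumes a one-hidden-layer MLP with tanh hidden units and a sigmoid output, a frame-energy ratio between a primary (mouth-side) and secondary microphone, placeholder thresholds, and an "agree or abstain" fusion rule: a frame is labelled voice or noise only when both detectors agree, and is otherwise left uncertain. The noise estimate is then updated only on noise frames. The names `mlp_speech_prob`, `power_ratio`, `classify_frame`, and `update_noise_psd` are illustrative, not the paper's.

```python
import math


def mlp_speech_prob(features, w_hidden, b_hidden, w_out, b_out):
    """Speech probability from a one-hidden-layer MLP (tanh hidden, sigmoid output)."""
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def power_ratio(primary, secondary, eps=1e-12):
    """Frame energy of the primary mic divided by that of the secondary mic."""
    return (sum(x * x for x in primary) + eps) / (sum(y * y for y in secondary) + eps)


def classify_frame(p_speech, ratio, p_hi=0.7, p_lo=0.3, r_hi=2.0, r_lo=1.2):
    """Combine the two detectors into 'voice', 'noise' or 'uncertain'.

    Voice and noise are decided separately; each label requires both detectors
    to agree, and any disagreement yields 'uncertain'. Thresholds are placeholders.
    """
    nn_voice, nn_noise = p_speech >= p_hi, p_speech <= p_lo
    pr_voice, pr_noise = ratio >= r_hi, ratio <= r_lo
    if nn_voice and pr_voice:
        return "voice"
    if nn_noise and pr_noise:
        return "noise"
    return "uncertain"


def update_noise_psd(noise_psd, frame_psd, label, alpha=0.9):
    """Recursively average the noise PSD, but only on frames labelled 'noise'."""
    if label != "noise":
        return list(noise_psd)  # voice or uncertain: freeze the estimate
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_psd, frame_psd)]
```

Freezing the noise estimate on uncertain frames is one plausible way to keep misjudged speech from leaking into the noise estimate (which would cause speech distortion) while still tracking genuine noise; the fusion rule actually used in the paper may differ.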
Key words:
- Voice Activity Detection (VAD) /
- Speech enhancement /
- Neural Network (NN)