
Speaker Recognition Based on Multimodal Generative Adversarial Nets with Triplet-loss

Ying CHEN, Huangkang CHEN

Citation: Ying CHEN, Huangkang CHEN. Speaker Recognition Based on Multimodal Generative Adversarial Nets with Triplet-loss[J]. Journal of Electronics & Information Technology, 2020, 42(2): 379-385. doi: 10.11999/JEIT190154


doi: 10.11999/JEIT190154
Funds: The National Natural Science Foundation of China (61573168)
詳細(xì)信息
    作者簡(jiǎn)介:

    陳瑩:女,1976年生,教授,博士,研究方向?yàn)樾畔⑷诤?、模式識(shí)別

    陳湟康:男,1994年生,碩士生,研究方向?yàn)檎f(shuō)話人識(shí)別

    通訊作者:

    陳瑩 chenying@jiangnan.edu.cn

  • 中圖分類號(hào): TN912.3, TP391

  • Abstract:

    To exploit the correlation between face and voice in speaker recognition, this paper designs a multimodal Generative Adversarial Network (GAN) that maps face features and voice features into a more closely related common space. A triplet loss then further constrains the relation between the two modalities, reducing the feature distance between cross-modal samples of the same identity and enlarging the feature distance between cross-modal samples of different identities. Finally, the cross-modal cosine distance between the common-space features is used to decide whether a face and a voice match, and a Softmax classifier identifies the speaker. Experimental results show that the proposed method effectively improves speaker recognition accuracy.
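    The pipeline described above (a shared embedding space, a cross-modal triplet loss, cosine-distance matching, and Softmax identification) can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the function and class names, the margin value of 0.5, and the decision threshold of 0.5 are illustrative assumptions, and the GAN generators that produce the common-space embeddings are assumed to exist upstream.

        # Minimal sketch of the cross-modal triplet loss, cosine-distance matching,
        # and Softmax identification described in the abstract (illustrative only).
        import torch
        import torch.nn.functional as F

        def cross_modal_triplet_loss(anchor_face, pos_voice, neg_voice, margin=0.5):
            # Pull same-identity face/voice pairs together and push
            # different-identity pairs apart in the common space.
            d_pos = 1.0 - F.cosine_similarity(anchor_face, pos_voice)  # distance to matching voice
            d_neg = 1.0 - F.cosine_similarity(anchor_face, neg_voice)  # distance to non-matching voice
            return F.relu(d_pos - d_neg + margin).mean()

        def is_match(face_emb, voice_emb, threshold=0.5):
            # Decide whether a face/voice pair belongs to the same speaker
            # by thresholding the cross-modal cosine distance.
            return (1.0 - F.cosine_similarity(face_emb, voice_emb)) < threshold

        class SpeakerClassifier(torch.nn.Module):
            # Identification head: concatenate the two common-space features
            # and classify the speaker identity with a Softmax layer.
            def __init__(self, feat_dim, num_speakers):
                super().__init__()
                self.fc = torch.nn.Linear(2 * feat_dim, num_speakers)

            def forward(self, face_emb, voice_emb):
                logits = self.fc(torch.cat([face_emb, voice_emb], dim=-1))
                return F.softmax(logits, dim=-1)

    In training, the raw logits would typically be fed to a cross-entropy loss and combined with the triplet and adversarial losses; the margin and threshold shown here correspond to the hyper-parameters whose effect is explored in Fig. 2 and Fig. 3.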

  • Figure 1  Structure of the proposed network

    Figure 2  ROC curves for different margin values

    Figure 3  Recognition results under different thresholds

    Figure 4  Comparison of ROC curves with and without the common layer

    Figure 5  Comparison of recognition results with and without the feature-matching discrimination network

    Table 1  Identification accuracy of different features (%)

    Feature                      ID accuracy
    Voice common-space feature   95.57
    Face common-space feature    99.41
    Concatenated feature         99.59

    Table 2  Speaker identification accuracy (%)

    Method                                   ID accuracy   Matching accuracy
    Multimodal Correlated NN[6]              83.26         –
    Multimodal CNN[5]                        86.12         –
    Multimodal LSTM[7]                       90.15         94.35
    Deep Heterogeneous Feature Fusion[8]     97.80         –
    AVGATN (proposed)                        99.41         99.02
  • [1] BREDIN H and CHOLLET G. Audio-visual speech synchrony measure for talking-face identity verification[C]. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, Honolulu, USA, 2007: II-233–II-236.
    [2] HAGHIGHAT M, ABDEL-MOTTALEB M, and ALHALABI W. Discriminant correlation analysis: Real-time feature level fusion for multimodal biometric recognition[J]. IEEE Transactions on Information Forensics and Security, 2016, 11(9): 1984–1996. doi: 10.1109/TIFS.2016.2569061
    [3] CHENG H T, CHAO Y H, YEH S L, et al. An efficient approach to multimodal person identity verification by fusing face and voice information[C]. 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, Netherlands, 2005: 542–545.
    [4] SOLTANE M, DOGHMANE N, and GUERSI N. Face and speech based multi-modal biometric authentication[J]. International Journal of Advanced Science and Technology, 2010, 21(6): 41–56.
    [5] HU Yongtao, REN J S J, DAI Jingwen, et al. Deep multimodal speaker naming[C]. The 23rd ACM International Conference on Multimedia, Brisbane, Australia, 2015: 1107–1110.
    [6] GENG Jiajia, LIU Xin, and CHEUNG Y M. Audio-visual speaker recognition via multi-modal correlated neural networks[C]. 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, Omaha, USA, 2016: 123–128.
    [7] REN J, HU Yongtao, TAI Y W, et al. Look, listen and learn - A multimodal LSTM for speaker identification[C]. The 30th AAAI Conference on Artificial Intelligence, Phoenix, USA, 2016: 3581–3587.
    [8] LIU Yuhang, LIU Xin, FAN Wentao, et al. Efficient audio-visual speaker recognition via deep heterogeneous feature fusion[C]. The 12th Chinese Conference on Biometric Recognition, Shenzhen, China, 2017: 575–583.
    [9] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
    [10] TANG Xianlun, DU Yiming, LIU Yuwei, et al. Image recognition with conditional deep convolutional generative adversarial networks[J]. Acta Automatica Sinica, 2018, 44(5): 855–864. (in Chinese)
    [11] SUN Liang, HAN Yuxuan, KANG Wenjing, et al. Multi-view learning and reconstruction algorithms via generative adversarial networks[J]. Acta Automatica Sinica, 2018, 44(5): 819–828. (in Chinese)
    [12] ZHENG Wenbo, WANG Kunfeng, and WANG Feiyue. Background subtraction algorithm with Bayesian generative adversarial networks[J]. Acta Automatica Sinica, 2018, 44(5): 878–890. (in Chinese)
    [13] RADFORD A, METZ L, and CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv: 1511.06434, 2015.
    [14] DENTON E, CHINTALA S, SZLAM A, et al. Deep generative image models using a Laplacian pyramid of adversarial networks[C]. The 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015: 1486–1494.
    [15] LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 105–114.
    [16] WANG Xiaolong and GUPTA A. Generative image modeling using style and structure adversarial networks[C]. The 14th European Conference on Computer Vision, Amsterdam, Netherlands, 2016: 318–335.
    [17] PENG Yuxin and QI Jinwei. CM-GANs: Cross-modal generative adversarial networks for common representation learning[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(1): 98–121.
    [18] HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. Computer Science, 2012, 3(4): 212–223.
Publication history
  • Received: 2019-03-15
  • Revised: 2019-09-09
  • Published online: 2019-09-19
  • Issue published: 2020-02-19
