

TSF-AADNet: A Short-time Window ElectroEncephaloGram Auditory Attention Decoding Network Based on Multi-dimensional Characteristics of Temporal-spatial-frequency

WANG Chunli, LI Jinxu, GAO Yuxin, WANG Chenming, ZHANG Jiahao

Citation: WANG Chunli, LI Jinxu, GAO Yuxin, WANG Chenming, ZHANG Jiahao. TSF-AADNet: A Short-time Window ElectroEncephaloGram Auditory Attention Decoding Network Based on Multi-dimensional Characteristics of Temporal-spatial-frequency[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240867


doi: 10.11999/JEIT240867
Funds: Lanzhou Jiaotong University-Tianjin University Joint Innovation Fund (LH2023002); Tianjin Natural Science Foundation (21JCZXJC00190)
Article information
    About the authors:

    WANG Chunli: female, Associate Professor; her research interests include brain-computer interfaces, speech separation, and sound source localization

    LI Jinxu: female, master's student; her research interest is EEG-based auditory attention decoding

    GAO Yuxin: female, master's student; her research interest is EEG-based auditory attention decoding

    WANG Chenming: male, master's student; his research interest is brain-computer interfaces

    ZHANG Jiahao: male, master's student; his research interest is brain-computer interfaces

    Corresponding author:

    LI Jinxu, l17339820919@163.com

  • CLC number: TN911.7

  • Abstract: In a cocktail-party scenario, listeners with normal hearing can selectively attend to a particular speaker, whereas hearing-impaired listeners struggle to do so. Auditory Attention Decoding (AAD) aims to infer which speaker a listener is attending to by analysing the response characteristics of the listener's ElectroEncephaloGram (EEG) signals. Existing AAD models consider only a single temporal- or frequency-domain feature of the EEG, or a combination of the two (e.g., time-frequency features), ignoring the complementarity among temporal-, spatial-, and frequency-domain features; this limits the models' discriminative ability and, in turn, their decoding accuracy within a given decision window. Moreover, most existing AAD models achieve high decoding accuracy only with long decision windows (1~5 s). This paper proposes TSF-AADNet, a short-time window EEG auditory attention decoding network based on multi-dimensional temporal-spatial-frequency features, to improve decoding accuracy for short decision windows (0.1~1 s). The model consists of two parallel feature-extraction branches, temporal-spatial and frequency-spatial, together with a feature fusion and classification module. The temporal-spatial branch comprises a temporal-spatial convolutional block and a High-order Feature Interaction (HFI) module, while the frequency-spatial branch adopts a 3-D convolutional module with Frequency-Spatial Attention (FSA-3DCNN). The temporal-spatial and frequency-spatial features extracted by the two branches are then fused to obtain the final binary auditory attention decoding result. Experimental results show that, with a 0.1 s decision window, TSF-AADNet achieves decoding accuracies of 91.8% and 81.1% on the KULeuven auditory attention detection dataset (KUL) and the EEG and audio dataset for auditory attention decoding (DTU), respectively, improvements of 5.40% and 7.99% over the latest AAD model, the Dual-Branch Parallel Network based on time-frequency fusion (DBPNet). As a new AAD model for short decision windows, TSF-AADNet provides an effective reference for the diagnosis of hearing impairment and the development of neuro-steered hearing aids.
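The following is a minimal PyTorch-style sketch of the dual-branch idea described in the abstract: one branch consumes a raw EEG segment (channels × time), the other consumes band-power topographic maps (bands × height × width), and their low-dimensional outputs are concatenated for binary classification. All layer choices, kernel sizes, and names are illustrative assumptions, not the authors' TSF-AADNet implementation (in particular, the HFI module and FSA-3DCNN are omitted here).

```python
# Minimal dual-branch sketch (assumptions only; not the published architecture).
import torch
import torch.nn as nn

class DualBranchAAD(nn.Module):
    def __init__(self, n_classes: int = 2, branch_dim: int = 4):
        super().__init__()
        # Temporal-spatial branch: raw EEG segment, shape (batch, 1, 64 channels, T samples).
        self.ts_branch = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(1, 17), padding=(0, 8)),   # temporal convolution (assumed kernel)
            nn.Conv2d(64, 64, kernel_size=(64, 1), groups=64),       # spatial convolution over electrodes
            nn.ELU(),
            nn.AdaptiveAvgPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(64, branch_dim),
        )
        # Frequency-spatial branch: band-power topographic maps, shape (batch, 1, 5 bands, 32, 32).
        self.fs_branch = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.ELU(),
            nn.AdaptiveAvgPool3d((1, 1, 1)),
            nn.Flatten(),
            nn.Linear(32, branch_dim),
        )
        # Fusion: concatenate the two branch outputs and classify (attended speaker 1 vs. 2).
        self.classifier = nn.Linear(2 * branch_dim, n_classes)

    def forward(self, eeg: torch.Tensor, topo: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.cat([self.ts_branch(eeg), self.fs_branch(topo)], dim=1))

# Example: a batch of 1 s windows (64 channels at 128 Hz) and their 5-band topographic maps.
model = DualBranchAAD()
logits = model(torch.randn(8, 1, 64, 128), torch.randn(8, 1, 5, 32, 32))
print(logits.shape)  # torch.Size([8, 2])
```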
  • Figure 1  Schematic diagram of TSF-AADNet

    Figure 2  Schematic diagram of the temporal-spatial convolutional block, consisting of a temporal convolution layer, a spatial convolution layer, and an average pooling layer

    Figure 3  Schematic diagram of the frequency-spatial attention, composed of a Frequency Attention Module (FAM) and a Spatial Attention Module (SAM)

    Figure 4  Auditory attention decoding accuracy of TSF-AADNet for all subjects on the KUL and DTU datasets with short decision windows

    Figure 5  AAD accuracy of TSAnet and FSAnet for all subjects in the KUL and DTU datasets

    Figure 6  AAD accuracy of M1 and M2 for all subjects in the KUL and DTU datasets

    Figure 7  AAD accuracy of different attention models for all subjects in the KUL and DTU datasets
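As a rough illustration of the frequency-spatial attention in Figure 3, the sketch below applies a generic FAM/SAM-style gating to a (batch, channel, band, height, width) feature volume, in the spirit of CBAM-like channel and spatial attention. The reduction ratio, kernel size, and pooling choices are assumptions; the paper's exact FSA-3DCNN gating is not reproduced.

```python
# Generic frequency/spatial attention sketch (assumed design, not the paper's FSA-3DCNN).
import torch
import torch.nn as nn

class FrequencyAttention(nn.Module):
    """Re-weight the frequency bands of a (B, C, F, H, W) feature volume."""
    def __init__(self, n_bands: int = 5, reduction: int = 1):
        super().__init__()
        hidden = max(n_bands // reduction, 1)
        self.gate = nn.Sequential(
            nn.Linear(n_bands, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bands), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Average over channel and spatial axes -> one descriptor per band.
        w = self.gate(x.mean(dim=(1, 3, 4)))            # (B, F)
        return x * w[:, None, :, None, None]            # broadcast over C, H, W

class SpatialAttention(nn.Module):
    """Re-weight scalp locations (H x W) of a (B, C, F, H, W) feature volume."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=(1, 2))                        # pooled over channels and bands: (B, H, W)
        mx = x.amax(dim=(1, 2))                         # (B, H, W)
        w = torch.sigmoid(self.conv(torch.stack([avg, mx], dim=1)))  # (B, 1, H, W)
        return x * w[:, :, None, :, :]                  # broadcast over C, F

x = torch.randn(2, 16, 5, 32, 32)
y = SpatialAttention()(FrequencyAttention()(x))
print(y.shape)  # torch.Size([2, 16, 5, 32, 32])
```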

    Table 1  Input and output feature dimensions of each layer in the temporal-spatial branch, the frequency-spatial branch, and the feature fusion and classification layer

    Branch                                      Layer                                   Input feature dim   Output feature dim
    Temporal-spatial branch (TSAnet)            Convolutional block                     1 × 64 × 128        64 × 1 × 64
                                                High-order Feature Interaction (HFI)    64 × 1 × 64         64 × 1 × 64
                                                2-D convolution layer                   64 × 1 × 64         4 × 1 × 64
                                                Adaptive average pooling layer          4 × 1 × 64          4 × 1 × 1
                                                Fully connected layer                   4 × 1 × 1           4
    Frequency-spatial branch (FSAnet)           FSA-3DCNN                               1 × 5 × 32 × 32     128 × 5 × 4 × 4
                                                3-D convolution layer                   128 × 5 × 4 × 4     4 × 5 × 4 × 4
                                                Adaptive average pooling layer          4 × 5 × 4 × 4       4 × 1 × 1 × 1
                                                Fully connected layer                   4 × 1 × 1 × 1       4
    Feature fusion and classification layer     Concatenation (Concat)                  8                   8
                                                Fully connected layer                   8                   2
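For reference, the shape walk-through below reproduces the TSAnet dimensions listed in Table 1 using a plain temporal convolution, spatial convolution, and average pooling stack (cf. Figure 2), followed by the 2-D convolution, adaptive pooling, and flattening steps. The kernel sizes are guesses chosen only so that the intermediate shapes match the table; the shape-preserving HFI module (64 × 1 × 64 → 64 × 1 × 64) is skipped.

```python
# Shape check against Table 1 (kernel sizes are assumptions, chosen to match the listed dims).
import torch
import torch.nn as nn

x = torch.randn(1, 1, 64, 128)                               # 1 x 64 channels x 128 samples

conv_block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=(1, 17), padding=(0, 8)),   # temporal convolution
    nn.Conv2d(64, 64, kernel_size=(64, 1), groups=64),       # spatial convolution across 64 electrodes
    nn.AvgPool2d(kernel_size=(1, 2)),                        # average pooling halves the time axis: 128 -> 64
)
h = conv_block(x)
print(h.shape)                                               # torch.Size([1, 64, 1, 64])  -> 64 x 1 x 64

# The HFI module keeps the 64 x 1 x 64 shape and is omitted here.
head = nn.Sequential(
    nn.Conv2d(64, 4, kernel_size=1),                         # 2-D convolution layer: 64 -> 4 feature maps
    nn.AdaptiveAvgPool2d((1, 1)),                            # adaptive average pooling -> 4 x 1 x 1
    nn.Flatten(),                                            # 4-dimensional branch output
)
print(head(h).shape)                                         # torch.Size([1, 4])
```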

    Table 2  Average AAD accuracy (%) of the compared models on the KUL and DTU datasets for four short decision windows

    Dataset   Model               0.1 s   0.2 s   0.5 s   1.0 s
    KUL       CNN[14]             74.3    78.2    80.6    84.1
              STAnet[17]          80.8    84.3    87.2    90.1
              RGCnet[28]          87.6    88.9    90.1    91.4
              mRFInet[29]         87.4    89.7    90.8    92.5
              DBPNet[30]          87.1    89.9    92.9    95.0
              TSF-AADNet (ours)   91.8    94.1    96.3    98.3
    DTU       CNN[14]             56.7    58.4    61.7    63.3
              STAnet[17]          65.7    68.1    70.8    71.9
              RGCnet[28]          66.4    68.4    72.1    76.9
              mRFInet[29]         65.4    68.7    72.3    75.1
              DBPNet[30]          75.1    78.9    81.9    83.9
              TSF-AADNet (ours)   81.1    83.5    86.1    88.8

    Table 3  Details of the EEG datasets (KUL, DTU) used in the experiments

    Dataset   Subjects   Stimulus language   Trial duration per subject (min)   Total duration (h)
    KUL       16         Flemish             48                                 12.8
    DTU       18         Danish              50                                 15.0
  • [1] CHERRY E C. Some experiments on the recognition of speech, with one and with two ears[J]. The Journal of the Acoustical Society of America, 1953, 25(5): 975–979. doi: 10.1121/1.1907229.
    [2] WANG Deliang. Deep learning reinvents the hearing aid[J]. IEEE Spectrum, 2017, 54(3): 32–37. doi: 10.1109/MSPEC.2017.7864754.
    [3] ZHANG Malu, WU Jibin, CHUA Yansong, et al. MPD-AL: An efficient membrane potential driven aggregate-label learning algorithm for spiking neurons[C]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Hawaii, USA, 2019: 1327–1334. doi: 10.1609/aaai.v33i01.33011327.
    [4] MESGARANI N and CHANG E F. Selective cortical representation of attended speaker in multi-talker speech perception[J]. Nature, 2012, 485(7397): 233–236. doi: 10.1038/nature11020.
    [5] DING Nai and SIMON J Z. Emergence of neural encoding of auditory objects while listening to competing speakers[J]. Proceedings of the National Academy of Sciences of the United States of America, 2012, 109(29): 11854–11859. doi: 10.1073/pnas.1205381109.
    [6] O'SULLIVAN J A, POWER A J, MESGARANI N, et al. Attentional selection in a cocktail party environment can be decoded from single-trial EEG[J]. Cerebral Cortex, 2015, 25(7): 1697–1706. doi: 10.1093/cercor/bht355.
    [7] MESGARANI N and CHANG E F. Selective cortical representation of attended speaker in multi-talker speech perception[J]. Nature, 2012, 485(7397): 233–236. doi: 10.1038/nature11020.
    [8] CICCARELLI G, NOLAN M, PERRICONE J, et al. Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods[J]. Scientific Reports, 2019, 9(1): 11538. doi: 10.1038/s41598-019-47795-0.
    [9] FUGLSANG S A, DAU T, and HJORTKJÆR J. Noise-robust cortical tracking of attended speech in real-world acoustic scenes[J]. NeuroImage, 2017, 156: 435–444. doi: 10.1016/j.neuroimage.2017.04.026.
    [10] WONG D D E, FUGLSANG S A, HJORTKJÆR J, et al. A comparison of regularization methods in forward and backward models for auditory attention decoding[J]. Frontiers in Neuroscience, 2018, 12: 531. doi: 10.3389/fnins.2018.00531.
    [11] DE CHEVEIGNÉ A, WONG D D E, DI LIBERTO G M, et al. Decoding the auditory brain with canonical component analysis[J]. NeuroImage, 2018, 172: 206–216. doi: 10.1016/j.neuroimage.2018.01.033.
    [12] DE CHEVEIGNÉ A, DI LIBERTO G M, ARZOUNIAN D, et al. Multiway canonical correlation analysis of brain data[J]. NeuroImage, 2019, 186: 728–740. doi: 10.1016/j.neuroimage.2018.11.026.
    [13] ZWICKER E and FASTL H. Psychoacoustics: Facts and Models[M]. 2nd ed. New York: Springer, 1999.
    [14] VANDECAPPELLE S, DECKERS L, DAS N, et al. EEG-based detection of the locus of auditory attention with convolutional neural networks[J]. eLife, 2021, 10: e56481. doi: 10.7554/eLife.56481.
    [15] CAI Siqi, SU Enze, SONG Yonghao, et al. Low latency auditory attention detection with common spatial pattern analysis of EEG signals[C]. Proceedings of the INTERSPEECH 2020, Shanghai, China, 2020: 2772–2776. doi: 10.21437/Interspeech.2020-2496.
    [16] CAI Siqi, SU Enze, XIE Longhan, et al. EEG-based auditory attention detection via frequency and channel neural attention[J]. IEEE Transactions on Human-Machine Systems, 2022, 52(2): 256–266. doi: 10.1109/THMS.2021.3125283.
    [17] SU Enze, CAI Siqi, XIE Longhan, et al. STAnet: A spatiotemporal attention network for decoding auditory spatial attention from EEG[J]. IEEE Transactions on Biomedical Engineering, 2022, 69(7): 2233–2242. doi: 10.1109/TBME.2022.3140246.
    [18] JIANG Yifan, CHEN Ning, and JIN Jing. Detecting the locus of auditory attention based on the spectro-spatial-temporal analysis of EEG[J]. Journal of Neural Engineering, 2022, 19(5): 056035. doi: 10.1088/1741-2552/ac975c.
    [19] CAI Siqi, SCHULTZ T, and LI Haizhou. Brain topology modeling with EEG-graphs for auditory spatial attention detection[J]. IEEE Transactions on Biomedical Engineering, 2024, 71(1): 171–182. doi: 10.1109/TBME.2023.3294242.
    [20] XU Xiran, WANG Bo, YAN Yujie, et al. A DenseNet-based method for decoding auditory spatial attention with EEG[C]. Proceedings of the ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, 2024: 1946–1950. doi: 10.1109/ICASSP48485.2024.10448013.
    [21] GEIRNAERT S, FRANCART T, and BERTRAND A. Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns[J]. IEEE Transactions on Biomedical Engineering, 2021, 68(5): 1557–1568. doi: 10.1109/TBME.2020.3033446.
    [22] SCHIRRMEISTER R T, SPRINGENBERG J T, FIEDERER L D J, et al. Deep learning with convolutional neural networks for EEG decoding and visualization[J]. Human Brain Mapping, 2017, 38(11): 5391–5420. doi: 10.1002/hbm.23730.
    [23] LAWHERN V J, SOLON A J, WAYTOWICH N R, et al. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces[J]. Journal of Neural Engineering, 2018, 15(5): 056013. doi: 10.1088/1741-2552/aace8c.
    [24] RAO Yongming, ZHAO Wenliang, TANG Yansong, et al. HorNet: Efficient high-order spatial interactions with recursive gated convolutions[C]. Proceedings of the 36th International Conference on Neural Information Processing Systems, New Orleans, USA, 2022: 752.
    [25] LIU Yongjin, YU Minjing, ZHAO Guozhen, et al. Real-time movie-induced discrete emotion recognition from EEG signals[J]. IEEE Transactions on Affective Computing, 2018, 9(4): 550–562. doi: 10.1109/TAFFC.2017.2660485.
    [26] CAI Siqi, SUN Pengcheng, SCHULTZ T, et al. Low-latency auditory spatial attention detection based on spectro-spatial features from EEG[C]. Proceedings of 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Mexico, Mexico, 2021: 5812–5815. doi: 10.1109/EMBC46164.2021.9630902.
    [27] DAS N, FRANCART T, and BERTRAND A. Auditory attention detection dataset KULeuven (OLD VERSION)[J]. Zenodo, 2019. doi: 10.5281/zenodo.3997352.
    [28] FUGLSANG S A, WONG D D E, and HJORTKJÆR J. EEG and audio dataset for auditory attention decoding[J]. Zenodo, 2018. doi: 10.5281/zenodo.1199011.
    [29] CAI Siqi, LI Jia, YANG Hongmeng, et al. RGCnet: An efficient recursive gated convolutional network for EEG-based auditory attention detection[C]. Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Sydney, Australia, 2023: 1–4. doi: 10.1109/EMBC40787.2023.10340432.
    [30] LI Jia, ZHANG Ran, and CAI Siqi. Multi-scale recursive feature interaction for auditory attention detection using EEG signals[C]. Proceedings of 2024 IEEE International Symposium on Biomedical Imaging, Athens, Greece, 2024: 1–5. doi: 10.1109/ISBI56570.2024.10635751.
Figures (7) / Tables (3)
Metrics
  • Article views:  111
  • Full-text HTML views:  39
  • PDF downloads:  9
  • Citations: 0
Publication history
  • Received:  2024-10-15
  • Revised:  2025-02-19
  • Published online:  2025-02-25
