Adaptive Multi-scale Information Fusion Based on Dynamic Receptive Field for Image-to-image Translation

Mengxiao YIN, Zhenfeng LIN, Feng YANG

Citation: Mengxiao YIN, Zhenfeng LIN, Feng YANG. Adaptive Multi-scale Information Fusion Based on Dynamic Receptive Field for Image-to-image Translation[J]. Journal of Electronics & Information Technology, 2021, 43(8): 2386-2394. doi: 10.11999/JEIT200675


doi: 10.11999/JEIT200675
Funds: The National Natural Science Foundation of China (61762007, 61861004), The Natural Science Foundation of Guangxi (2017GXNSFAA198269, 2017GXNSFAA198267)
Detailed information
    About the authors:

    Mengxiao YIN: Female, born in 1978. Ph.D., associate professor, CCF member. Her research interests include computer graphics and virtual reality, digital geometry processing, and image and video editing.

    Zhenfeng LIN: Male, born in 1996. Master's student. His research interests include image generation and image-to-image translation.

    Feng YANG: Male, born in 1979. Ph.D., associate professor, CCF member. His research interests include artificial intelligence, network information security, big data and high-performance computing, and precision medicine.

    Corresponding author:

    Feng YANG, yf@gxu.edu.cn

  • CLC number: TN911.73; TP391

  • Abstract: To improve the quality of images produced by image-to-image translation models, this paper improves the generator of the translation model and explores diversified image translation to extend the model's generative capability. For the generator, the dynamic receptive field mechanism of the Selective Kernel Block (SKBlock) is used to obtain and fuse the multi-scale information of every upsampled feature in the generator, and a Selective Kernel Generative Adversarial Network (SK-GAN) is constructed from the multi-scale feature information and the dynamic receptive field. Compared with the traditional generator, SK-GAN improves the quality of the generated images through a generative structure that acquires multi-scale information with dynamic receptive fields. For diversified image translation, a Guided Selective Kernel Generative Adversarial Network (GSK-GAN) is proposed on top of SK-GAN for the task of synthesizing realistic images from sketches. This model uses a guide image to direct the translation of the source image: a guide image encoder extracts the guide image features, and a Parameter Generator (PG) and Feature Transformation layer (FT) transfer the guide image information to the generator. In addition, a dual-branch guide image encoder is proposed to improve the model's editing capability, and images with random styles are generated by exploiting the latent-variable distribution of the guide image. Experiments show that the improved generator helps to improve the quality of the generated images and that SK-GAN obtains reasonable results on multiple datasets. GSK-GAN not only guarantees the quality of the generated images but also generates images with more diverse styles.
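To make the SKBlock mechanism described above concrete, the sketch below shows a minimal selective-kernel block in PyTorch, following the published SKNet design [13] that SK-GAN builds on. The two-branch configuration mirrors the K35 setting of Table 5; the class name, normalization choice, and reduction ratio are illustrative assumptions, not the authors' released code.

```python
# Minimal selective-kernel block (sketch): several convolution branches with
# different receptive fields, fused by channel-wise attention over the branches.
import torch
import torch.nn as nn

class SKBlock(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5), reduction=4):
        super().__init__()
        # One branch per receptive field, e.g. 3x3 and 5x5 (the K35 combination).
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )
        hidden = max(channels // reduction, 8)
        # Squeeze the fused feature into a compact per-channel descriptor ...
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
        )
        # ... and expand it into one attention logit per branch and channel.
        self.attend = nn.Conv2d(hidden, channels * len(kernel_sizes), 1)
        self.num_branches = len(kernel_sizes)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, n, C, H, W)
        fused = feats.sum(dim=1)                                   # element-wise fusion
        logits = self.attend(self.squeeze(fused))                  # (B, n*C, 1, 1)
        weights = logits.view(x.size(0), self.num_branches, -1, 1, 1).softmax(dim=1)
        # Softmax across branches = dynamic receptive-field selection per channel.
        return (feats * weights).sum(dim=1)
```

Dropped in after each upsampling layer of the generator (the role Fig. 2 and Fig. 3 describe), such a block lets the network re-weight small and large receptive fields per channel instead of committing to a single fixed kernel size.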
  • Fig. 1  Architecture of the translation model

    Fig. 2  Upsampling process in the generator

    Fig. 3  Structure of SKBlock and its dynamic feature selection process

    Fig. 4  Architecture of the GSK-GAN model

    $\mu $ and $\sigma $ are the mean and standard deviation of the guide image's latent-variable distribution, $z$ is the latent variable, and $ \odot $ denotes feature concatenation along the channel dimension (a minimal sketch of this latent path follows the figure list).

    Fig. 5  Transfer modes for guide image information

    Fig. 6  Comparison of experimental results for synthesizing realistic images from sketches

    Fig. 7  Comparison of experimental results for synthesizing realistic images from semantic images

    Fig. 8  Comparison of results generated by multimodal image translation

    Fig. 9  Generation results with the dual-branch guide image encoder on the Edges2shoes dataset

    Fig. 10  Generation results with latent variables on the Edges2shoes dataset

    Fig. 11  Generation results with mismatched textures on the Edges2shoes dataset

    Fig. 12  Selection weights of the multi-scale information for the upsampling-layer features on multiple datasets

    Fig. 13  Diverse generation results under different transfer modes for guide image information
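As the note under Fig. 4 states, the guide image encoder yields $\mu $ and $\sigma $, and the generator consumes a latent variable $z$ concatenated with the source features along the channel dimension. A conventional way to realize this path is the reparameterization trick of variational autoencoders [3] followed by channel-wise concatenation; the sketch below is an assumption-level illustration (the function name and shapes are hypothetical), not the paper's code.

```python
# Sketch: sample z from the guide image's latent distribution and concatenate
# it with the source features along the channel dimension (the ⊙ of Fig. 4).
import torch

def sample_and_concat(mu, sigma, source_feat):
    # mu, sigma: (B, C_z) statistics predicted by the guide image encoder.
    z = mu + sigma * torch.randn_like(sigma)      # reparameterized z ~ N(mu, sigma^2)
    # Broadcast z over the spatial grid so it can be concatenated as a feature map.
    z_map = z[:, :, None, None].expand(-1, -1, *source_feat.shape[2:])
    return torch.cat([source_feat, z_map], dim=1)  # channel-wise concatenation
```

Sampling a different $z$ at test time is then what produces random-style generations of the kind Fig. 10 illustrates.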

    Table 1  Quantitative comparison on the Edges2shoes and Edges2handbags datasets

                      Edges2shoes                     Edges2handbags
             Pix2pix[1]  DRPAN[7]  SK-GAN    Pix2pix[1]  DRPAN[7]  SK-GAN
    SSIM        0.749      0.764    0.788       0.641      0.671    0.676
    PSNR       20.001     19.739   20.606      16.475     17.384   17.171
    FID        69.213     43.883   45.168      73.675     69.606   68.957
    LPIPS       0.183      0.176    0.161       0.267      0.260    0.254

    Table 2  Quantitative comparison on the Cityscapes dataset

                  Per-pixel acc  Per-class acc  Class IOU
    L1+CGAN[1]        0.63           0.21          0.16
    CRN[22]           0.69           0.21          0.20
    DRPAN[7]          0.73           0.24          0.19
    SK-GAN            0.76           0.25          0.20

    Table 3  Quantitative comparison of multimodal image translation on the Edges2shoes and Edges2handbags datasets

                         Edges2shoes                           Edges2handbags
            TextureGAN[9]  Ref.[10]  GSK-GAN     TextureGAN[9]  Ref.[10]  GSK-GAN
    FID         44.190      118.988   45.041         61.068      73.290    60.753
    LPIPS        0.123        0.123    0.119          0.171       0.162     0.154

    Table 4  Image quality comparison for different upsampling processes in the generator

                   SSIM    PSNR      FID     LPIPS
    Mode 1        0.267   12.821  102.771    0.415
    Mode 2        0.267   12.853   92.608    0.404
    Mode 3        0.284   12.981   89.718    0.405
    Mode 3 (GAN)  0.262   12.568   97.828    0.399

    Table 5  Image quality comparison for different receptive-field branch combinations in SKBlock

           SSIM    PSNR      FID     LPIPS
    K13   0.276   12.961  100.532    0.398
    K35   0.284   12.981   89.718    0.405
    K57   0.268   13.007   98.132    0.400
  • [1] ISOLA P, ZHU Junyan, ZHOU Tinghui, et al. Image-to-image translation with conditional adversarial networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 5967–5976. doi: 10.1109/CVPR.2017.632.
    [2] CHEN Wengling and HAYS J. SketchyGAN: Towards diverse and realistic sketch to image synthesis[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 9416–9425. doi: 10.1109/CVPR.2018.00981.
    [3] KINGMA D P and WELLING M. Auto-encoding variational Bayes[EB/OL]. https://arxiv.org/abs/1312.6114, 2013.
    [4] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
    [5] RADFORD A, METZ L, and CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. https://arxiv.org/abs/1511.06434, 2015.
    [6] SUNG T L and LEE H J. Image-to-image translation using identical-pair adversarial networks[J]. Applied Sciences, 2019, 9(13): 2668. doi: 10.3390/app9132668.
    [7] WANG Chao, ZHENG Haiyong, YU Zhibin, et al. Discriminative region proposal adversarial networks for high-quality image-to-image translation[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 796–812. doi: 10.1007/978-3-030-01246-5_47.
    [8] ZHU Junyan, ZHANG R, PATHAK D, et al. Toward multimodal image-to-image translation[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 465–476.
    [9] XIAN Wenqi, SANGKLOY P, AGRAWAL V, et al. TextureGAN: Controlling deep image synthesis with texture patches[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8456–8465. doi: 10.1109/CVPR.2018.00882.
    [10] ALBAHAR B and HUANG Jiabin. Guided image-to-image translation with bi-directional feature transformation[C]. The 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 9015–9024. doi: 10.1109/ICCV.2019.00911.
    [11] SUN Wei and WU Tianfu. Learning spatial pyramid attentive pooling in image synthesis and image-to-image translation[EB/OL]. https://arxiv.org/abs/1901.06322, 2019.
    [12] ZHU Junyan, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2242–2251. doi: 10.1109/ICCV.2017.244.
    [13] LI Xiang, WANG Wenhai, HU Xiaolin, et al. Selective kernel networks[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 510–519. doi: 10.1109/CVPR.2019.00060.
    [14] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[EB/OL]. https://arxiv.org/abs/1602.07261, 2016.
    [15] LIU Changyuan, WANG Qi, and BI Xiaojun. Research on rain removal method for single image based on multi-channel and multi-scale CNN[J]. Journal of Electronics & Information Technology, 2020, 42(9): 2285–2292. doi: 10.11999/JEIT190755.
    [16] LI Juncheng, FANG Faming, MEI Kangfu, et al. Multi-scale residual network for image super-resolution[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 527–542. doi: 10.1007/978-3-030-01237-3_32.
    [17] MAO Xudong, LI Qing, XIE Haoran, et al. Least squares generative adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2813–2821. doi: 10.1109/ICCV.2017.304.
    [18] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6629–6640.
    [19] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 586–595. doi: 10.1109/CVPR.2018.00068.
    [20] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 3213–3223. doi: 10.1109/CVPR.2016.350.
    [21] TYLEČEK R and ŠÁRA R. Spatial pattern templates for recognition of objects with regular structure[C]. The 35th German Conference on Pattern Recognition, Saarbrücken, Germany, 2013: 364–374. doi: 10.1007/978-3-642-40602-7_39.
    [22] CHEN Qifeng and KOLTUN V. Photographic image synthesis with cascaded refinement networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 1520–1529. doi: 10.1109/ICCV.2017.168.
Figures (13) / Tables (5)
Metrics
  • Article views: 2227
  • Full-text HTML views: 1017
  • PDF downloads: 113
  • Citations: 0
Publication history
  • Received: 2020-08-04
  • Revised: 2021-01-04
  • Available online: 2021-01-10
  • Published: 2021-08-10
