Adaptive Multi-scale Information Fusion Based on Dynamic Receptive Field for Image-to-image Translation
doi: 10.11999/JEIT200675
1. School of Computer and Electronics Information, Guangxi University, Nanning 530004, China
2. Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
Abstract: To improve the quality of images generated by image-to-image translation models, this work improves the generator of the translation model and explores diversified image translation to expand the model's generative capability. For the generator, the dynamic receptive field mechanism of the Selective Kernel Block (SKBlock) is used to obtain and fuse the multi-scale information of each upsampled feature; with this multi-scale information and the dynamic receptive field, the Selective Kernel Generative Adversarial Network (SK-GAN) is constructed. Compared with a traditional generator, SK-GAN improves the quality of generated images by acquiring multi-scale information through dynamic receptive fields. For diversified image translation, the Selective Kernel Generative Adversarial Network with Guide (GSK-GAN) is built on SK-GAN for the task of synthesizing realistic images from sketches. GSK-GAN uses a guide image to steer the translation of the source image: a guide-image encoder extracts guide-image features, and a Parameter Generator (PG) and Feature Transformation (FT) layer pass the information in these features to the generator. In addition, a dual-branch guide-image encoder is proposed to improve the editing ability of the translation model, and random-style image generation is realized by exploiting the latent variable distribution of the guide image. Experiments show that the improved generator helps raise the quality of generated images and that SK-GAN obtains reasonable results on multiple datasets; GSK-GAN not only preserves the quality of generated images but also produces images in a wider variety of styles.

Keywords:
- Image-to-image translation /
- Multi-scale information /
- Dynamic receptive field /
- Adaptive feature selection
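The selective-kernel mechanism described in the abstract can be made concrete with a short sketch. The following minimal PyTorch illustration assumes two convolution branches with 3×3 and 5×5 kernels (the K35 combination evaluated in Table 5) fused by channel-wise soft attention, in the spirit of Selective Kernel Networks (Li et al.); the branch count, kernel sizes, and reduction ratio are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a selective-kernel block: two branches with different
# receptive fields are fused by channel-wise soft attention, so the block
# adaptively selects its receptive field per channel ("dynamic receptive
# field"). Kernel sizes (3x3/5x5) and reduction ratio are assumptions.
import torch
import torch.nn as nn

class SKBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Two branches with different receptive fields (3x3 and 5x5).
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        hidden = max(channels // reduction, 4)
        self.fc = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        # One attention head per branch; softmax across branches yields
        # per-channel weights that select between receptive fields.
        self.att3 = nn.Linear(hidden, channels)
        self.att5 = nn.Linear(hidden, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))   # fuse branches, global average pool
        z = self.fc(s)                   # compact channel descriptor
        a = torch.stack([self.att3(z), self.att5(z)], dim=1).softmax(dim=1)
        a3, a5 = a[:, 0], a[:, 1]        # per-channel branch weights
        return u3 * a3[..., None, None] + u5 * a5[..., None, None]
```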
圖 4 GSK-GAN模型結(jié)構(gòu)
$\mu $和$\sigma $分別為引導(dǎo)圖像隱變量分布均值和標(biāo)準(zhǔn)差,$z$為隱變量,$ \odot $表示沿通道方向拼接特征。
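To clarify how the caption's symbols might fit together, here is a minimal sketch assuming the standard VAE-style reparameterization for sampling $z$ from $(\mu, \sigma)$, spatial broadcasting followed by channel-wise concatenation for $\odot$, and a FiLM-style affine modulation as the FT layer driven by the PG; all names and exact forms are assumptions, and the paper's formulation may differ.

```python
# Sketch of the Figure 4 symbols under stated assumptions: z is sampled via
# the reparameterization trick, broadcast spatially, and concatenated with
# source features along channels; the FT layer is sketched as FiLM-style
# affine modulation with parameters produced by the PG.
import torch
import torch.nn as nn

def sample_latent(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    return mu + sigma * torch.randn_like(sigma)

def concat_latent(src_feat: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Broadcast z over the spatial grid, then concatenate along channels."""
    b, _, h, w = src_feat.shape
    z_map = z[:, :, None, None].expand(b, z.shape[1], h, w)
    return torch.cat([src_feat, z_map], dim=1)

class ParamGenerator(nn.Module):
    """PG sketch: maps guide-image features to per-channel (gamma, beta)."""
    def __init__(self, guide_dim: int, feat_channels: int):
        super().__init__()
        self.to_gamma = nn.Linear(guide_dim, feat_channels)
        self.to_beta = nn.Linear(guide_dim, feat_channels)

    def forward(self, guide_feat: torch.Tensor):
        return self.to_gamma(guide_feat), self.to_beta(guide_feat)

def feature_transform(gen_feat: torch.Tensor, gamma: torch.Tensor,
                      beta: torch.Tensor) -> torch.Tensor:
    """FT sketch: affine modulation of generator features by PG output."""
    return gamma[..., None, None] * gen_feat + beta[..., None, None]
```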
Table 4: Image-quality comparison of different upsampling schemes in the generator

| Upsampling scheme | SSIM  | PSNR   | FID     | LPIPS |
|-------------------|-------|--------|---------|-------|
| Mode 1            | 0.267 | 12.821 | 102.771 | 0.415 |
| Mode 2            | 0.267 | 12.853 | 92.608  | 0.404 |
| Mode 3            | 0.284 | 12.981 | 89.718  | 0.405 |
| Mode 3 (GAN)      | 0.262 | 12.568 | 97.828  | 0.399 |
Table 5: Image-quality comparison of different receptive-field branch combinations in SKBlock

| Branch combination | SSIM  | PSNR   | FID     | LPIPS |
|--------------------|-------|--------|---------|-------|
| K13                | 0.276 | 12.961 | 100.532 | 0.398 |
| K35                | 0.284 | 12.981 | 89.718  | 0.405 |
| K57                | 0.268 | 13.007 | 98.132  | 0.400 |
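For context, the four metrics reported in Tables 4 and 5 (higher SSIM/PSNR is better; lower FID/LPIPS is better) are commonly computed with open-source tools. The sketch below assumes the scikit-image, lpips, and pytorch-fid packages with hypothetical inputs; the paper's exact evaluation protocol may differ.

```python
# Sketch of computing the tables' metrics with common open-source tools.
# Assumes scikit-image, lpips, and pytorch-fid; inputs and preprocessing
# are hypothetical and may not match the paper's protocol.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity, peak_signal_noise_ratio
from pytorch_fid import fid_score

def to_tensor(img: np.ndarray) -> torch.Tensor:
    """HWC uint8 in [0, 255] -> NCHW float in [-1, 1], as lpips expects."""
    return (torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

def pairwise_metrics(real: np.ndarray, fake: np.ndarray):
    """SSIM, PSNR, and LPIPS between one real/generated image pair."""
    ssim = structural_similarity(real, fake, channel_axis=2)
    psnr = peak_signal_noise_ratio(real, fake)
    lpips_fn = lpips.LPIPS(net='alex')          # downloads pretrained weights
    dist = lpips_fn(to_tensor(real), to_tensor(fake)).item()
    return ssim, psnr, dist

# FID compares two folders of images as distributions, not image pairs:
# fid = fid_score.calculate_fid_given_paths(
#     ['real_dir', 'fake_dir'], batch_size=50, device='cpu', dims=2048)
```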