doi: 10.11999/JEIT190056
Image Semantic Segmentation Based on Region and Deep Residual Network
1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract: An image semantic segmentation model combining regions and a deep residual network is proposed. Region-based methods extract overlapping regions at multiple scales, which allows objects of various sizes to be identified and yields fine segmentation boundaries. Fully convolutional methods learn features automatically with a Convolutional Neural Network (CNN) and can be trained end-to-end for per-pixel classification, but they typically produce coarse segmentation boundaries. The proposed model combines the advantages of both: first, candidate regions are generated by a region proposal network; the image is then passed through a deep residual network with dilated convolutions to obtain feature maps; the candidate regions and the feature maps are combined to produce region features, which are mapped back onto each pixel in the regions. Finally, a global average pooling layer performs per-pixel classification. In addition, multiple models are obtained by training the same network with candidate regions of different sizes; at test time, the final segmentation is obtained by fusing the classification results of these models at the classification layer. Experimental results on the SIFT FLOW and PASCAL Context datasets show that the proposed method achieves higher mean accuracy than several state-of-the-art algorithms.
Key words:
- Semantic segmentation /
- Region /
- Deep residual network /
- Ensemble
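The region-to-pixel mapping described in the abstract, where region-level class scores are projected back onto the pixels each candidate region covers, can be sketched in pure Python. This toy example is an assumption-laden illustration, not the paper's implementation: regions are simplified to axis-aligned half-open boxes with one score per class, and overlapping regions simply accumulate their scores per pixel before an argmax.

```python
def regions_to_pixel_labels(height, width, regions):
    """Map region class scores onto pixels: each pixel accumulates the
    scores of every candidate region covering it, then takes the argmax
    class. `regions` is a list of (y0, x0, y1, x1, scores) tuples with
    half-open boxes and one score per class."""
    n_classes = len(regions[0][4])
    acc = [[[0.0] * n_classes for _ in range(width)] for _ in range(height)]
    for y0, x0, y1, x1, scores in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                for c, s in enumerate(scores):
                    acc[y][x][c] += s
    return [[max(range(n_classes), key=lambda c: acc[y][x][c])
             for x in range(width)] for y in range(height)]

# Two overlapping regions on a 2x3 image, 2 classes:
labels = regions_to_pixel_labels(2, 3, [
    (0, 0, 2, 2, [0.9, 0.1]),   # left region votes for class 0
    (0, 1, 2, 3, [0.2, 0.8]),   # right region votes for class 1
])
print(labels)  # [[0, 0, 1], [0, 0, 1]]
```

In the overlap column both regions vote, and the stronger left-region score wins; the rightmost column is covered only by the second region and takes class 1.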
Table 3 Performance comparison of three schemes for applying dilated convolution kernels

Experiment  Operation                                              Final conv-layer output size  SIFT FLOW MA (%)
1           No operation                                           19×19                         64.50
2           Remove stride only: Res4 (stride=1)                    38×38                         26.61
3           Remove stride only: Res5 (stride=1)                    38×38                         37.47
4           Remove stride only: Res4 (stride=1) + Res5 (stride=1)  75×75                         39.76
5           + set dilation: Res4 (dilated=2)                       38×38                         64.20
6           + set dilation: Res5 (dilated=4)                       38×38                         63.60
7           + set dilation: Res4 (dilated=2) + Res5 (dilated=4)    75×75                         65.50
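The output sizes in Table 3 follow directly from the backbone's stage strides: setting a stage's stride to 1 doubles the spatial resolution of everything after it, while dilation restores the lost receptive field without changing the output size. A minimal sketch of that arithmetic (the 600×600 input size and the standard ResNet stage strides are assumptions for illustration, not values stated by the paper):

```python
import math

def backbone_output_size(input_size, strides):
    """Spatial size after a chain of strided stages (ceil division per stage)."""
    size = input_size
    for s in strides:
        size = math.ceil(size / s)
    return size

# Default ResNet: conv1 (2), maxpool (2), Res3 (2), Res4 (2), Res5 (2) -> total stride 32
print(backbone_output_size(600, [2, 2, 2, 2, 2]))  # 19
# Res5 set to stride=1 (dilation 4 keeps its receptive field) -> total stride 16
print(backbone_output_size(600, [2, 2, 2, 2, 1]))  # 38
# Res4 and Res5 both set to stride=1 (dilations 2 and 4) -> total stride 8
print(backbone_output_size(600, [2, 2, 2, 1, 1]))  # 75
```

The three computed sizes (19×19, 38×38, 75×75) match the three output-size groups in Table 3.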
Table 4 Comparison of four single models and their fused models on SIFT FLOW

Model                          Candidate region size  SIFT FLOW MA (%)
1                              7×7                    64.20
2                              9×9                    64.80
3                              13×13                  65.30
4                              15×15                  65.20
Fusion of models 3, 4          –                      65.70
Fusion of models 3, 4          –                      66.00
Fusion of models 1, 2, 3, 4    –                      66.20
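The fusion rows in Table 4 combine models at the classification layer. One common way to realize such late fusion, shown here purely as an illustrative sketch rather than the paper's exact scheme, is to average the per-class scores that each model assigns to a pixel and then take the argmax:

```python
def fuse_pixel_scores(model_scores):
    """Late fusion for one pixel: average the per-class scores produced by
    several models, then pick the class with the highest averaged score."""
    n_models = len(model_scores)
    n_classes = len(model_scores[0])
    avg = [sum(m[c] for m in model_scores) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

# Three models scoring one pixel over 3 classes; two lean toward class 2:
print(fuse_pixel_scores([[0.5, 0.1, 0.4],
                         [0.2, 0.1, 0.7],
                         [0.3, 0.1, 0.6]]))  # 2
```

Averaging lets models trained on different candidate-region sizes correct each other's uncertain pixels, which is consistent with the fused rows in Table 4 outperforming every single model.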