doi: 10.11999/JEIT190242
Multi-level Attention Feature Network for Few-shot Learning
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
- Image processing /
- Multi-scale images /
- Few-shot learning /
- Multi-level attention features /
- Similarity measurement
Abstract: Existing few-shot methods suffer from three problems: features are extracted at a single scale, the learned class representations are inaccurate, and similarity computation still relies on standard metrics. To address these problems, a multi-level attention feature network is proposed. First, images at multiple scales are obtained by scale processing, features are extracted from each scale, and an image-level attention mechanism fuses them into image-level attention features. Then, class-level attention features are learned with a class-level attention mechanism. Finally, classification is performed by using a network to compute similarity scores between features. The proposed method is evaluated on the Omniglot and MiniImageNet datasets. The experimental results show that, compared with single-scale image features and average prototypes, the multi-level attention feature network further improves classification accuracy under few-shot conditions.
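The image-level attention fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the attention scores are computed, so the dot-product scoring against a vector `score_weights`, the function names, and the toy feature values below are all assumptions made for illustration rather than the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(features, score_weights):
    """Fuse equal-length feature vectors into one vector.

    score_weights is a hypothetical learned vector (an assumption):
    each feature's attention score is its dot product with it.
    """
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    alphas = softmax(scores)  # one attention weight per scale
    dim = len(features[0])
    fused = [sum(a * f[i] for a, f in zip(alphas, features)) for i in range(dim)]
    return fused, alphas

# Toy features for three image scales (made-up values).
scale_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Zero scoring weights give uniform attention, i.e. a plain average.
fused, alphas = attention_fuse(scale_features, [0.0, 0.0])
```

With all-zero scoring weights the attention weights are uniform, so the fused feature reduces to the average over scales; non-zero weights shift the emphasis toward particular scales.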
Table 1 Branch structures of the feature extraction network for images at different scales

| Network | Branch 1 | Branch 2 | Branch 3 |
|---|---|---|---|
| Structure | $\left[\begin{array}{l} \mathrm{C}: 3 \times 3, 64 \\ \mathrm{MP}: 2 \times 2 \end{array}\right]$ | $\left[\begin{array}{l} \mathrm{C}: 3 \times 3, 64 \\ \mathrm{MP}: 2 \times 2 \end{array}\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ |
| | $\left[\begin{array}{l} \mathrm{C}: 3 \times 3, 64 \\ \mathrm{MP}: 2 \times 2 \end{array}\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ |
| | $\left[\mathrm{C}: 3 \times 3, 64\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ |
| | $\left[\mathrm{C}: 3 \times 3, 64\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ | $\left[\mathrm{C}: 3 \times 3, 64\right]$ |
Table 2 Few-shot classification accuracy on the Omniglot dataset (%)

| Method | Fine-tuning | 5-way 1-shot | 5-way 5-shot | 20-way 1-shot | 20-way 5-shot |
|---|---|---|---|---|---|
| MANN | No | 82.8 | 94.9 | – | – |
| MATCHING NETS | Yes | 97.9 | 98.7 | 93.5 | 98.7 |
| PROTOTYPICAL NETS | No | 98.8 | 99.7 | 96.0 | 98.9 |
| MAML | Yes | 98.7±0.4 | 99.9±0.1 | 95.8±0.3 | 98.9±0.2 |
| RELATION NET | No | 99.6±0.2 | 99.8±0.1 | 97.6±0.2 | 99.1±0.1 |
| Proposed method | No | 99.6 | 99.7 | 97.8 | 99.2 |
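The N-way K-shot settings in the table refer to episodes with N classes and K labeled support images per class. A minimal sketch of how such an episode might be sampled follows; the helper `sample_episode`, the toy dataset, and the number of query images per class are assumptions for illustration and are not taken from the paper.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode.

    dataset maps a class label to a list of examples. Returns
    (support, query) lists of (example, label) pairs.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label in classes:
        # Draw disjoint support and query examples for this class.
        items = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Toy dataset: 30 classes with 20 placeholder images each (made-up names).
toy = {f"class{c}": [f"img{c}_{i}" for i in range(20)] for c in range(30)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=3,
                                rng=random.Random(0))
```

In a 5-way 1-shot episode the support set therefore holds five labeled images, one per class, and accuracy is measured on the query images drawn from the same five classes.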
Table 3 Few-shot classification accuracy on the MiniImageNet dataset (%)

| Method | Fine-tuning | 5-way 1-shot | 5-way 5-shot |
|---|---|---|---|
| MATCHING NETS | No | 43.56±0.84 | 53.11±0.73 |
| META-LEARN LSTM | No | 43.44±0.77 | 60.60±0.71 |
| MAML | Yes | 48.70±1.84 | 63.11±0.92 |
| PROTOTYPICAL NETS | No | 49.42±0.78 | 68.20±0.66 |
| RELATION NETS | No | 50.44±0.82 | 65.32±0.70 |
| Proposed method | No | 53.18±0.80 | 66.72±0.71 |
| Proposed method (L2 regularization) | No | 54.56±0.81 | 67.39±0.68 |
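Table 4 Comparison of class feature methods on the MiniImageNet dataset (%)

| Class feature | 5-way 5-shot accuracy |
|---|---|
| Proposed method (mean class prototype) | 65.80±0.65 |
| Proposed method (sum) | 65.56±0.66 |
| Proposed method (class-level attention feature) | 66.43±0.68 |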
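The three class feature variants compared in the table differ only in how the support features of one class are combined. The sketch below shows the mean and sum variants, plus a weighted combination in which the weights stand in for the output of a class-level attention module; how the paper computes those weights is not stated here, so the `weights` argument and all function names are assumptions.

```python
def mean_prototype(support):
    """Element-wise mean of the support feature vectors of one class."""
    k = len(support)
    return [sum(col) / k for col in zip(*support)]

def sum_prototype(support):
    """Element-wise sum of the support feature vectors of one class."""
    return [sum(col) for col in zip(*support)]

def weighted_prototype(support, weights):
    """Weighted combination of support features.

    weights is a hypothetical stand-in for class-level attention
    weights (one per support example, assumed to sum to 1).
    """
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*support)]

# Two toy support features for one class (made-up values).
support_features = [[1.0, 2.0], [3.0, 4.0]]
mean_feat = mean_prototype(support_features)                        # [2.0, 3.0]
sum_feat = sum_prototype(support_features)                          # [4.0, 6.0]
attn_feat = weighted_prototype(support_features, [0.25, 0.75])      # [2.5, 3.5]
```

With uniform weights the weighted variant coincides with the mean prototype, so the attention variant can be read as letting informative support examples count for more.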
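Table 5 Comparison of image feature methods on the MiniImageNet dataset (%)

| Image feature | 5-way 1-shot | 5-way 5-shot |
|---|---|---|
| Proposed method (single-scale features) | 52.20±0.82 | 66.43±0.68 |
| Proposed method (two-scale features) | 53.93±0.79 | 66.89±0.71 |
| Proposed method (image-level attention features) | 54.56±0.81 | 67.39±0.68 |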
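Table 6 Comparison of multi-scale approaches on the MiniImageNet dataset (%)

| Multi-scale method | 5-way 1-shot | 5-way 5-shot |
|---|---|---|
| Feature pyramid network | 53.42±0.76 | 66.50±0.69 |
| Different convolution kernels | 53.27±0.83 | 66.29±0.66 |
| Proposed method | 54.56±0.81 | 67.39±0.68 |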
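Table 7 Comparison of similarity measurement methods on the MiniImageNet dataset (%)

| Similarity measure | 5-way 1-shot | 5-way 5-shot |
|---|---|---|
| Proposed method (Euclidean distance) | 48.43±0.78 | 63.52±0.71 |
| Proposed method (cosine similarity) | 46.54±0.82 | 60.50±0.70 |
| Proposed method (network-computed) | 54.56±0.81 | 67.39±0.68 |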
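The two fixed metrics in the table can be written down directly. The sketch below shows Euclidean distance and cosine similarity used for nearest-class assignment; the learned, network-computed similarity is not reproduced because its architecture is not given in this section. The `classify` helper and the toy class features are assumptions for illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity; defined as 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def classify(query, class_features, score):
    """Return the label whose class feature scores highest against query."""
    return max(class_features, key=lambda label: score(query, class_features[label]))

# Toy class features and query (made-up values).
class_features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
query_feature = [0.9, 0.2]

# A distance is turned into a score by negating it.
by_euclid = classify(query_feature, class_features,
                     lambda q, c: -euclidean_distance(q, c))
by_cosine = classify(query_feature, class_features, cosine_similarity)
```

Swapping the `score` argument is the only change needed to move between metrics, which mirrors how the table varies the similarity measure while holding the rest of the method fixed.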
GIRSHICK R. Fast R-CNN[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440–1448. doi: 10.1109/ICCV.2015.169.
HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2261–2269. doi: 10.1109/CVPR.2017.243.
HE Di, XIA Yingce, QIN Tao, et al. Dual learning for machine translation[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 820–828.
LI Feifei, FERGUS R, and PERONA P. One-shot learning of object categories[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594–611. doi: 10.1109/TPAMI.2006.79.
MEHROTRA A and DUKKIPATI A. Generative adversarial residual pairwise networks for one shot learning[EB/OL]. https://arxiv.org/abs/1703.08033, 2017.
DIXIT M, KWITT R, NIETHAMMER M, et al. AGA: Attribute-guided augmentation[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7455–7463. doi: 10.1109/CVPR.2017.355.
HARIHARAN B and GIRSHICK R. Low-shot visual recognition by shrinking and hallucinating features[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3037–3046. doi: 10.1109/iccv.2017.328.
FINN C, ABBEEL P, and LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]. The 34th International Conference on Machine Learning, Sydney, Australia, 2017: 1126–1135.
RAVI S and LAROCHELLE H. Optimization as a model for few-shot learning[EB/OL]. https://openreview.net/forum?id=rJY0-Kcll, 2017.
SANTORO A, BARTUNOV S, BOTVINICK M, et al. Meta-learning with memory-augmented neural networks[C]. The 33rd International Conference on Machine Learning, New York, USA, 2016: 1842–1850.
KOCH G. Siamese neural networks for one-shot image recognition[EB/OL]. http://www.cs.utoronto.ca/~gkoch/files/msc-thesis.pdf, 2015.
VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3630–3638.
SNELL J, SWERSKY K, and ZEMEL R. Prototypical networks for few-shot learning[C]. The 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 4080–4090.
SUNG F, YANG Yongxin, ZHANG Li, et al. Learning to compare: Relation network for few-shot learning[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199–1208. doi: 10.1109/cvpr.2018.00131.
WANG Peng, LIU Lingqiao, and SHEN Chunhua. Multi-attention network for one shot learning[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6212–6220. doi: 10.1109/CVPR.2017.658.
HILLIARD N, HODAS N O, and CORLEY C D. Dynamic input structure and network assembly for few-shot learning[EB/OL]. https://arxiv.org/abs/1708.06819v1, 2017.