Multi-level Attention Feature Network for Few-shot Learning
doi: 10.11999/JEIT190242
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Abstract:
Existing metric-based few-shot learning methods suffer from single-scale feature extraction, inaccurate learned class representations, and similarity computation that relies on standard metrics. To address these problems, a multi-level attention feature network is proposed. First, multiple scale images are obtained by rescaling the input image. Second, the features extracted from these scale images are fused by an image-level attention mechanism to obtain image-level attention features. On this basis, a class-level attention mechanism learns a class-level attention feature for each class. Finally, classification is predicted by using a network to compute similarity scores between sample features and the class-level attention feature of each class. The proposed method is evaluated on the Omniglot and MiniImageNet datasets. Experimental results show that, compared with single-scale image features and mean class prototypes, the multi-level attention feature network further improves classification accuracy under few-shot conditions.

Keywords:
- Image processing /
- Multi-scale image /
- Few-shot learning /
- Multi-level attention feature /
- Similarity metric
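To make the multi-scale step in the abstract concrete, the following is a minimal sketch of one way to build the scale pyramid, assuming three scales produced by bilinear downsampling; the scale factors are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical multi-scale preprocessing; scale factors are assumptions.
import torch
import torch.nn.functional as F

def make_scale_pyramid(images: torch.Tensor) -> list:
    """Return the image batch at full, half, and quarter resolution."""
    half = F.interpolate(images, scale_factor=0.5, mode="bilinear",
                         align_corners=False)
    quarter = F.interpolate(images, scale_factor=0.25, mode="bilinear",
                            align_corners=False)
    return [images, half, quarter]
```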
Table 1 Branch structures of the feature extraction network for different scale images

Layer   Branch 1           Branch 2           Branch 3
1       C:3×3,64; MP:2×2   C:3×3,64; MP:2×2   C:3×3,64
2       C:3×3,64; MP:2×2   C:3×3,64           C:3×3,64
3       C:3×3,64           C:3×3,64           C:3×3,64
4       C:3×3,64           C:3×3,64           C:3×3,64

(C: 3×3 convolution with 64 filters; MP: 2×2 max pooling. The branches apply two, one, and zero pooling layers respectively, consistent with the three scales producing feature maps of the same spatial size.)
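A minimal PyTorch sketch of the branch structure as reconstructed in Table 1 above, assuming the three branches differ only in how many of their blocks max-pool; the BatchNorm and ReLU placement is an assumption, since Table 1 lists only the convolution and pooling layers.

```python
import torch.nn as nn

def conv_block(in_ch: int, pool: bool) -> nn.Sequential:
    """One C:3x3,64 block, optionally followed by MP:2x2."""
    layers = [nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
              nn.BatchNorm2d(64), nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

def make_branch(num_pools: int) -> nn.Sequential:
    """Four conv blocks; the first num_pools of them also max-pool."""
    return nn.Sequential(*[conv_block(3 if i == 0 else 64, pool=(i < num_pools))
                           for i in range(4)])

# Branch 1 (full scale), Branch 2 (half scale), Branch 3 (quarter scale).
branches = [make_branch(2), make_branch(1), make_branch(0)]
```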
Table 2 Few-shot classification accuracy on the Omniglot dataset (%)

Method              Fine-tune   5-way 1-shot   5-way 5-shot   20-way 1-shot   20-way 5-shot
MANN                No          82.8           94.9           –               –
MATCHING NETS       Yes         97.9           98.7           93.5            98.7
PROTOTYPICAL NETS   No          98.8           99.7           96.0            98.9
MAML                Yes         98.7±0.4       99.9±0.1       95.8±0.3        98.9±0.2
RELATION NET        No          99.6±0.2       99.8±0.1       97.6±0.2        99.1±0.1
Ours                No          99.6           99.7           97.8            99.2
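For context, the N-way K-shot numbers in Tables 2 and 3 come from episodic evaluation. Below is a sketch of episode sampling, assuming the dataset is a dict mapping each class to a tensor of its images; this layout and the function shape are assumptions for illustration.

```python
import random
import torch

def sample_episode(data: dict, n_way: int, k_shot: int, n_query: int):
    """Sample one N-way K-shot episode with n_query query images per class."""
    classes = random.sample(sorted(data), n_way)
    support, query, labels = [], [], []
    for label, c in enumerate(classes):
        idx = torch.randperm(len(data[c]))[:k_shot + n_query]
        support.append(data[c][idx[:k_shot]])   # (k_shot, C, H, W)
        query.append(data[c][idx[k_shot:]])     # (n_query, C, H, W)
        labels.extend([label] * n_query)
    return torch.stack(support), torch.cat(query), torch.tensor(labels)
```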
Table 3 Few-shot classification accuracy on the MiniImageNet dataset (%)

Method                     Fine-tune   5-way 1-shot   5-way 5-shot
MATCHING NETS              No          43.56±0.84     53.11±0.73
META-LEARN LSTM            No          43.44±0.77     60.60±0.71
MAML                       Yes         48.70±1.84     63.11±0.92
PROTOTYPICAL NETS          No          49.42±0.78     68.20±0.66
RELATION NETS              No          50.44±0.82     65.32±0.70
Ours                       No          53.18±0.80     66.72±0.71
Ours (L2 regularization)   No          54.56±0.81     67.39±0.68
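Assuming the "L2 regularization" variant in Table 3 denotes weight regularization rather than feature normalization, it can be reproduced in PyTorch-style code through the optimizer's weight decay; the coefficient below is an assumed placeholder, not the paper's value.

```python
import torch

model = torch.nn.Linear(64, 5)  # placeholder for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```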
Table 4 Comparison of class feature methods on the MiniImageNet dataset (%)

Class feature                          5-way 5-shot accuracy
Ours (mean class prototype)            65.80±0.65
Ours (sum)                             65.56±0.66
Ours (class-level attention feature)   66.43±0.68
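A hedged sketch of the class-level attention feature compared in Table 4: instead of averaging the K support features of a class (the mean prototype row) or summing them, attention weights are learned over the shots. The linear scorer is an assumption about the design, not the paper's exact module.

```python
import torch
import torch.nn as nn

class ClassLevelAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # assumed attention scorer

    def forward(self, support: torch.Tensor) -> torch.Tensor:
        # support: (K, dim) features of the K shots of one class
        w = torch.softmax(self.score(support), dim=0)  # (K, 1) weights
        return (w * support).sum(dim=0)                # weighted class feature

# The mean-prototype baseline in Table 4 is simply support.mean(dim=0).
```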
Table 5 Comparison of image feature methods on the MiniImageNet dataset (%)

Image feature                           5-way 1-shot   5-way 5-shot
Ours (single-scale features)            52.20±0.82     66.43±0.68
Ours (two-scale features)               53.93±0.79     66.89±0.71
Ours (image-level attention features)   54.56±0.81     67.39±0.68
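Similarly, the image-level attention row in Table 5 fuses the per-scale features instead of using one scale. A minimal sketch, assuming a learned softmax weighting over scales; the scorer is again an illustrative assumption.

```python
import torch
import torch.nn as nn

class ImageLevelAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # assumed per-scale scorer

    def forward(self, scale_feats: torch.Tensor) -> torch.Tensor:
        # scale_feats: (num_scales, B, dim), one feature per scale image
        w = torch.softmax(self.score(scale_feats), dim=0)  # (num_scales, B, 1)
        return (w * scale_feats).sum(dim=0)                # (B, dim) fused feature
```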
Table 6 Comparison of multi-scale approaches on the MiniImageNet dataset (%)

Multi-scale method              5-way 1-shot   5-way 5-shot
Feature pyramid network         53.42±0.76     66.50±0.69
Different convolution kernels   53.27±0.83     66.29±0.66
Ours                            54.56±0.81     67.39±0.68
表 7 MiniImageNet數(shù)據(jù)集上相似性度量方法的對比(%)
度量方式 5-way 分類準(zhǔn)確率 1-shot 5-shot 本文方法(歐氏距離) 48.43±0.78 63.52±0.71 本文方法(余弦相似度) 46.54±0.82 60.50±0.70 本文方法(網(wǎng)絡(luò)計算) 54.56±0.81 67.39±0.68 下載: 導(dǎo)出CSV
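The "network-computed" row in Table 7 replaces the fixed metric with a learned one, in the spirit of the Relation Network cited in the references; the layer sizes in this sketch are assumptions.

```python
import torch
import torch.nn as nn

class SimilarityNet(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())  # similarity score in (0, 1)

    def forward(self, query: torch.Tensor, class_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate query and class features, then learn their similarity.
        return self.net(torch.cat([query, class_feat], dim=-1))
```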
References

[1] GIRSHICK R. Fast R-CNN[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440–1448. doi: 10.1109/ICCV.2015.169.
[2] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2261–2269. doi: 10.1109/CVPR.2017.243.
[3] HE Di, XIA Yingce, QIN Tao, et al. Dual learning for machine translation[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 820–828.
[4] LI Feifei, FERGUS R, and PERONA P. One-shot learning of object categories[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594–611. doi: 10.1109/TPAMI.2006.79.
[5] MEHROTRA A and DUKKIPATI A. Generative adversarial residual pairwise networks for one shot learning[EB/OL]. https://arxiv.org/abs/1703.08033, 2017.
[6] DIXIT M, KWITT R, NIETHAMMER M, et al. AGA: Attribute-guided augmentation[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7455–7463. doi: 10.1109/CVPR.2017.355.
[7] HARIHARAN B and GIRSHICK R. Low-shot visual recognition by shrinking and hallucinating features[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3037–3046. doi: 10.1109/ICCV.2017.328.
[8] FINN C, ABBEEL P, and LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]. The 34th International Conference on Machine Learning, Sydney, Australia, 2017: 1126–1135.
[9] RAVI S and LAROCHELLE H. Optimization as a model for few-shot learning[EB/OL]. https://openreview.net/forum?id=rJY0-Kcll, 2017.
[10] SANTORO A, BARTUNOV S, BOTVINICK M, et al. Meta-learning with memory-augmented neural networks[C]. The 33rd International Conference on Machine Learning, New York, USA, 2016: 1842–1850.
[11] KOCH G. Siamese neural networks for one-shot image recognition[EB/OL]. http://www.cs.utoronto.ca/~gkoch/files/msc-thesis.pdf, 2015.
[12] VINYALS O, BLUNDELL C, LILLICRAP T, et al. Matching networks for one shot learning[C]. The 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3630–3638.
[13] SNELL J, SWERSKY K, and ZEMEL R. Prototypical networks for few-shot learning[C]. The 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 4080–4090.
[14] SUNG F, YANG Yongxin, ZHANG Li, et al. Learning to compare: Relation network for few-shot learning[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1199–1208. doi: 10.1109/CVPR.2018.00131.
[15] WANG Peng, LIU Lingqiao, and SHEN Chunhua. Multi-attention network for one shot learning[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6212–6220. doi: 10.1109/CVPR.2017.658.
[16] HILLIARD N, HODAS N O, and CORLEY C D. Dynamic input structure and network assembly for few-shot learning[EB/OL]. https://arxiv.org/abs/1708.06819v1, 2017.