Object Classification Method Based on Weakly Supervised E2LSH and Saliency Map Weighting
doi: 10.11999/JEIT150337
Funds:
The National Natural Science Foundation of China (60872142, 61301232)
Abstract: The most popular approach to object classification is based on the bag-of-visual-words model. However, several fundamental problems restrict its classification performance: low time efficiency, the synonymy and polysemy of visual words, and the lack of spatial information among visual words. To address these problems, an object classification method based on weakly supervised Exact Euclidean Locality Sensitive Hashing (E2LSH) and saliency map weighting is proposed. First, E2LSH is employed to cluster the SIFT features of the training image set into a group of visual dictionaries. A weakly supervised strategy, inspired by the idea of random forests, supervises the selection of the E2LSH hash functions, which reduces the randomness of E2LSH and makes the visual dictionaries more discriminative. Second, the Graph-Based Visual Saliency (GBVS) algorithm is applied to compute the saliency map of each image, and each visual word is weighted according to the saliency of the region in which it lies. Finally, a saliency-map-weighted visual language model is used to accomplish object classification. Experimental results on the Caltech-256 and Pascal VOC 2007 datasets indicate that the proposed method generates visual dictionaries more efficiently, improves the discriminative power of the object representation, and outperforms current mainstream object classification methods.
Keywords:
- Object classification
- Bag-of-visual-words model
- Exact Euclidean Locality Sensitive Hashing (E2LSH)
- Visual saliency map
- Visual language model
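The pipeline summarized in the abstract (E2LSH visual dictionaries, saliency-weighted visual words, language-model classification) can be illustrated with a small pure-Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Hash functions follow the standard p-stable E2LSH form h(v) = ⌊(a·v + b)/w⌋ with Gaussian a and b drawn uniformly from [0, w), and every non-empty bucket of each hash table is treated as one visual word, so L tables give a group of L dictionaries. The weakly supervised selection of hash functions is not modelled: tables are drawn purely at random. Each feature's weight is assumed to be the GBVS saliency value at its location, precomputed and normalized to [0, 1]. The visual language model is simplified to a per-class unigram model with additive smoothing, scored by saliency-weighted log-likelihood summed over all dictionaries. All names (`E2LSHTable`, `build_dictionaries`, `weighted_histogram`, `train`, `classify`) and parameter values (`k`, `w`, `n_tables`, `alpha`) are hypothetical.

```python
import math
import random
from collections import defaultdict


class E2LSHTable:
    """One E2LSH table: g(v) = (h_1(v), ..., h_k(v)), h(v) = floor((a . v + b) / w)."""

    def __init__(self, dim, k, w, rng):
        self.w = w
        # a ~ N(0, 1)^dim (Gaussian is 2-stable, matching Euclidean distance); b ~ U[0, w)
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0.0, w) for _ in range(k)]

    def key(self, v):
        # The k-tuple of bucket indices identifies one bucket, i.e. one visual word.
        return tuple(
            math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )


def build_dictionaries(train_feats, dim, k=4, w=4.0, n_tables=3, seed=0):
    """Each random table yields one dictionary: non-empty bucket -> word index.
    Assumption: the paper's weakly supervised choice of hash functions is NOT modelled."""
    rng = random.Random(seed)
    tables = [E2LSHTable(dim, k, w, rng) for _ in range(n_tables)]
    vocabs = []
    for table in tables:
        vocab = {}
        for f in train_feats:
            vocab.setdefault(table.key(f), len(vocab))
        vocabs.append(vocab)
    return tables, vocabs


def weighted_histogram(feats, saliency, table, vocab):
    """Accumulate each feature's saliency weight (assumed in [0, 1]) on its visual word."""
    hist = defaultdict(float)
    for f, s in zip(feats, saliency):
        word = vocab.get(table.key(f))
        if word is not None:  # features falling in unseen buckets are ignored
            hist[word] += s
    return hist


def train(train_sets, tables, vocabs, alpha=1.0):
    """train_sets: {label: [(feats, saliency), ...]}; one smoothed unigram model per table and class."""
    models = []
    for table, vocab in zip(tables, vocabs):
        per_class = {}
        for label, images in train_sets.items():
            counts = [alpha] * len(vocab)  # additive (Laplace) smoothing
            for feats, sal in images:
                for word, c in weighted_histogram(feats, sal, table, vocab).items():
                    counts[word] += c
            total = sum(counts)
            per_class[label] = [math.log(c / total) for c in counts]
        models.append(per_class)
    return models


def classify(feats, sal, tables, vocabs, models):
    """Return the class with the highest saliency-weighted log-likelihood summed over all dictionaries."""
    scores = defaultdict(float)
    for table, vocab, per_class in zip(tables, vocabs, models):
        hist = weighted_histogram(feats, sal, table, vocab)
        for label, logp in per_class.items():
            scores[label] += sum(c * logp[word] for word, c in hist.items())
    return max(scores, key=scores.get)


# Toy usage: 2-D stand-ins for SIFT descriptors, hand-picked saliency values.
train_sets = {
    "A": [([[0.0, 0.0]] * 3, [0.9, 0.8, 0.7])],
    "B": [([[100.0, 100.0]] * 3, [0.9, 0.8, 0.7])],
}
all_feats = [f for images in train_sets.values() for feats, _ in images for f in feats]
tables, vocabs = build_dictionaries(all_feats, dim=2)
models = train(train_sets, tables, vocabs)
# Salient feature near class B, low-saliency background feature near class A.
print(classify([[100.0, 100.0], [0.0, 0.0]], [0.9, 0.1], tables, vocabs, models))  # B
```

In the toy call, the highly salient feature lies near class B and the background feature near class A, so the saliency-weighted score favors B; in this symmetric setup, giving both features the same weight would leave the two classes tied.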