Robust Visual Tracking via Perceptive Deep Neural Network
doi: 10.11999/JEIT151449
The National Natural Science Foundation of China (61175029, 61473309), The Natural Science Foundation of Shaanxi Province (2015JM6269, 2016JM6050)
Abstract: In a visual tracking system, efficient feature representation is the key to robust tracking, and multi-cue fusion is an effective means of handling complex tracking conditions. This paper first proposes a perceptive deep neural network composed of multiple parallel networks that are triggered adaptively. A fragment-based target model that fuses multiple cues through deep learning is then established. Fragmenting the target reduces the network input dimension several-fold, which greatly lowers the computational complexity of network training. During tracking, the model dynamically adjusts the weight of each fragment according to its confidence, improving adaptability to complex situations such as target posture change, illumination change, and occlusion. Qualitative and quantitative analysis of results on a large set of challenging benchmark sequences shows that the proposed algorithm is highly robust and tracks targets stably.
Key words:
- Visual tracking /
- Feature description /
- Deep learning /
- Perceptive deep neural network
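The confidence-weighted fragment fusion described in the abstract can be sketched as follows. This is a minimal illustration only: the function and variable names are hypothetical, and the paper's actual model uses deep-network outputs rather than the hand-set confidences shown here.

```python
def fuse_fragments(confidences, positions):
    """Fuse per-fragment target-centre estimates by their confidence.

    confidences: non-negative reliability score of each fragment
    positions:   each fragment's (x, y) estimate of the target centre
    Returns the confidence-weighted target centre.
    """
    total = sum(confidences)
    # Normalize confidences so the weights sum to 1.
    weights = [c / total for c in confidences]
    x = sum(w * p[0] for w, p in zip(weights, positions))
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return (x, y)

# An occluded fragment (low confidence) barely influences the fused centre.
centre = fuse_fragments([0.9, 0.8, 0.1], [(10, 10), (12, 10), (40, 35)])
```

In the paper's setting, down-weighting unreliable fragments in this way is what lets the tracker tolerate partial occlusion: the occluded fragment's drifting estimate is suppressed rather than allowed to pull the whole target estimate away.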