Region Proposal Generation for Object Detection Using Tree-DDQN by Action Attention
doi: 10.11999/JEIT180358
-
1.
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
-
2.
Beijing Key Laboratory of Computing Intelligence and Intelligent System, Beijing 100124, China
-
Abstract:
To address the problem of object detection by robots in home environments, this paper proposes a Tree-structured Double Deep Q-Network (TDDQN) region proposal method based on an action attention strategy. The method combines the Double Deep Q-Network (DDQN) with a hierarchical tree structure, executing actions that reshape the detection window so that the object gradually becomes concentrated inside it. First, DDQN selects the best action for the current state, obtaining a qualified region proposal after only a few actions. The process is then repeated from the state reached by each selected action, forming multiple "best" paths in the tree. Finally, non-maximum suppression selects the best proposal from the qualified candidates. Experimental results on Pascal VOC2007 and Pascal VOC2012 show that, across different numbers of region proposals, different Intersection-over-Union (IoU) thresholds, and objects of different sizes and classes, the proposed method achieves better detection performance than the compared methods.
-
Key words:
- Object detection /
- Region proposal /
- Tree structure /
- Double Deep Q Network (DDQN) /
- Action attention
-
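The Double DQN rule at the core of TDDQN decouples action selection from action evaluation: the online network picks the next action, and a separate target network scores it, which curbs the Q-value overestimation of vanilla DQN. A minimal NumPy sketch (the function name and the plain-array Q-values are illustrative, not the paper's implementation):

```python
import numpy as np

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.9, done=False):
    """Double DQN bootstrap target: the online network selects the next
    action, the target network evaluates it."""
    if done:
        return reward                                # terminal state: no bootstrap
    best_action = int(np.argmax(q_online_next))      # selection by the online net
    return reward + gamma * q_target_next[best_action]  # evaluation by the target net
```

For example, with `q_online_next = [1.0, 3.0, 2.0]` the online net selects action 1, and the target net's value for that action (say 1.5) enters the target: 1.0 + 0.9 × 1.5 = 2.35, even if the target net would rank another action higher.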
Table 1  TDDQN-based region proposal extraction
Input: current state (region proposal, root node of the tree)
Output: next state (new region proposal, child node of the tree)
Step 1: Initialize the IoU threshold $\tau $ and the maximum tree depth n; set the initial depth to 1.
Step 2: For the current state, select from the coarse-adjustment action group and from the fine-adjustment action group the action with the highest value predicted by DDQN.
Step 3: The state reached by executing the coarse action becomes the left child; the state reached by executing the fine action becomes the right child.
Step 4: Increase the tree depth by 1.
Step 5: If the current depth is less than n and some branch has not yet terminated, go to Step 6; otherwise go to Step 7.
Step 6: If the left child's IoU exceeds $\tau $, terminate that branch; otherwise take the left child as the current state of its path and go to Step 2. Treat the right child in the same way.
Step 7: Apply non-maximum suppression to all leaf nodes to select the best region proposal.
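The control flow of Table 1 can be sketched in Python. The Q-function, the two action groups, and `apply_action` below are illustrative stand-ins (a real system would use the trained DDQN and the paper's coarse/fine window transformations); only the tree expansion and termination logic mirror the table:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def grow_tree(box, target, predict_q, coarse_actions, fine_actions,
              apply_action, tau=0.6, max_depth=5, depth=1):
    """Steps 2-6 of Table 1: at each node take the highest-Q coarse action
    (left child) and the highest-Q fine action (right child); a branch
    terminates when its IoU with the target exceeds tau or the maximum
    depth is reached.  Returns the leaf boxes of all paths."""
    if depth >= max_depth or iou(box, target) > tau:
        return [box]                       # leaf node: a candidate region
    leaves = []
    for actions in (coarse_actions, fine_actions):
        best = max(actions, key=lambda a: predict_q(box, a))   # Step 2
        child = apply_action(box, best)                        # Step 3
        leaves += grow_tree(child, target, predict_q, coarse_actions,
                            fine_actions, apply_action, tau, max_depth,
                            depth + 1)                         # Steps 4-6
    return leaves   # Step 7 applies non-maximum suppression to these leaves
```

Each recursive call spawns one coarse and one fine branch, so the leaves collected at the end are exactly the terminated "best" paths that Step 7 feeds to non-maximum suppression.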
Table 2  Average precision (%) per object class on the Pascal VOC2007 dataset for different methods

| Method | Bottle | Chair | Table | Dog | Person | Sofa | TV | mAP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RPN (vgg16)+Fast R-CNN (ResNet-101) | 54.3 | 60.2 | 70.8 | 84.1 | 76.2 | 78.7 | 73.0 | 71.0 |
| Faster R-CNN (ResNet-101) | 55.6 | 56.4 | 69.1 | 88.0 | 77.8 | 79.5 | 71.7 | 71.2 |
| DQN (vgg16)+Fast R-CNN (ResNet-101) | 50.4 | 54.3 | 61.8 | 80.2 | 71.1 | 73.5 | 68.9 | 65.7 |
| DDQN (vgg16)+Fast R-CNN (ResNet-101) | 52.6 | 55.2 | 61.3 | 80.5 | 71.3 | 74.0 | 69.1 | 66.3 |
| TRL (vgg16)+Fast R-CNN (ResNet-101) | 55.0 | 60.1 | 73.3 | 84.5 | 76.3 | 79.6 | 73.4 | 71.7 |
| TDDQN (vgg16)+Fast R-CNN (ResNet-101) | 55.7 | 60.2 | 74.2 | 85.3 | 77.4 | 79.6 | 73.7 | 72.3 |
Table 3  Average precision (%) per object class on the Pascal VOC2012 dataset for different methods

| Method | Bottle | Chair | Table | Dog | Person | Sofa | TV | mAP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RPN (vgg16)+Fast R-CNN (ResNet-101) | 50.5 | 48.6 | 57.1 | 90.0 | 79.0 | 66.1 | 65.9 | 65.3 |
| Faster R-CNN (ResNet-101) | 50.8 | 48.5 | 59.0 | 91.9 | 80.5 | 66.3 | 65.4 | 66.1 |
| DQN (vgg16)+Fast R-CNN (ResNet-101) | 49.3 | 45.7 | 50.8 | 82.8 | 73.9 | 59.9 | 63.6 | 60.9 |
| DDQN (vgg16)+Fast R-CNN (ResNet-101) | 51.5 | 47.6 | 52.3 | 82.9 | 75.2 | 61.1 | 63.8 | 62.1 |
| TRL (vgg16)+Fast R-CNN (ResNet-101) | 53.1 | 51.7 | 55.6 | 87.8 | 80.7 | 66.6 | 67.6 | 66.2 |
| TDDQN (vgg16)+Fast R-CNN (ResNet-101) | 53.4 | 51.9 | 58.7 | 88.0 | 80.9 | 66.8 | 67.9 | 66.8 |
Table 4  Average detection time per image on different datasets (s)

| Dataset | TDDQN (vgg16)+Fast R-CNN (ResNet-101) | TRL (vgg16)+Fast R-CNN (ResNet-101) | Faster R-CNN (ResNet-101) |
| --- | --- | --- | --- |
| VOC2007 | 0.9 | 1.6 | 0.4 |
| VOC2012 | 1.0 | 1.8 | 0.5 |
-
[1] TANG K, JOULIN A, LI L J, et al. Co-localization in real-world images[C]. Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1464–1471.
[2] WU Xiru, HUANG Guoming, and SUN Lining. Fast visual identification and location algorithm for industrial sorting robots based on deep learning[J]. Robot, 2016, 38(6): 711–719. doi: 10.13973/j.cnki.robot.2016.0711
[3] DALAL N and TRIGGS B. Histograms of oriented gradients for human detection[C]. Computer Vision and Pattern Recognition, San Diego, USA, 2005: 886–893.
[4] VAN DE SANDE K E A, UIJLINGS J R R, GEVERS T, et al. Segmentation as selective search for object recognition[C]. International Conference on Computer Vision, Barcelona, Spain, 2011: 1879–1886.
[5] ZITNICK C L and DOLLAR P. Edge boxes: Locating object proposals from edges[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 391–405.
[6] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587.
[7] GONZALEZ-GARCIA A, VEZHNEVETS A, and FERRARI V. An active search strategy for efficient object class detection[C]. Computer Vision and Pattern Recognition, Boston, USA, 2015: 3022–3031.
[8] CAICEDO J C and LAZEBNIK S. Active object localization with deep reinforcement learning[C]. International Conference on Computer Vision, Santiago, Chile, 2015: 2488–2496.
[9] BELLVER M, GIRO-I-NIETO X, MARQUES F, et al. Hierarchical object detection with deep reinforcement learning[OL]. http://arxiv.org/abs/1611.03718v2, 2016. doi: 10.3233/978-1-61499-822-8-164
[10] JIE Zequn, LIANG Xiaodan, FENG Jiashi, et al. Tree-structured reinforcement learning for sequential object localization[C]. International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 127–135.
[11] VAN HASSELT H. Double Q-learning[C]. International Conference on Neural Information Processing Systems, Whistler, Canada, 2010: 2613–2621.
[12] VAN HASSELT H, GUEZ A, and SILVER D. Deep reinforcement learning with double Q-learning[C]. Association for the Advancement of Artificial Intelligence, Phoenix, USA, 2016: 2094–2100.
[13] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
[14] NAJEMNIK J and GEISLER W S. Optimal eye movement strategies in visual search[J]. Nature, 2005, 434(7031): 387–391. doi: 10.1038/nature03390