
Action Recognition Based on Multi-model Voting with Cross Layer Fusion

Huilan LUO, Fei LU, Yuan YAN

Citation: Huilan LUO, Fei LU, Yuan YAN. Action Recognition Based on Multi-model Voting with Cross Layer Fusion[J]. Journal of Electronics & Information Technology, 2019, 41(3): 649-655. doi: 10.11999/JEIT180373


doi: 10.11999/JEIT180373
Details
    About the authors:

    Huilan LUO: Female, born in 1974, Ph.D., professor. Her research interests include machine learning and pattern recognition

    Fei LU: Male, born in 1994, master's student. His research interests include action recognition in video and image semantic segmentation

    Yuan YAN: Male, born in 1991, master's student. His research interests include action recognition in video

    Corresponding author:

    Huilan LUO, luohuilan@sina.com

  • CLC number: TP391.4

Action Recognition Based on Multi-model Voting with Cross Layer Fusion

Funds: The National Natural Science Foundation of China (61462035, 61862031), The Young Scientist Training Project of Jiangxi Province (20153BCB23010), The Natural Science Foundation of Jiangxi Province (20171BAB202014)
  • Abstract:

    To address the loss of action features as they propagate through convolutional neural network models, as well as model overfitting, an action recognition method based on a cross-layer fusion model and multi-model voting is proposed. In the preprocessing stage, rank pooling is used to aggregate the motion information of a video into an approximate dynamic image. A structure that horizontally flips the feature maps is inserted before the fully connected layer, forming the no-fusion model. On top of the no-fusion model, a structure that fuses the output features of the second layer with those of the fifth layer is added, forming the cross-layer fusion model. During training, the two base models, no-fusion and cross-layer fusion, are trained with three data-split schemes and two orders of generating approximate dynamic images, yielding multiple distinct classifiers. At test time, the predictions of these classifiers are combined by voting to produce the final classification. On the UCF101 dataset, the proposed no-fusion and cross-layer fusion methods achieve considerably higher recognition rates than the dynamic image network method, and multi-model voting effectively alleviates overfitting, increases the robustness of the algorithm, and yields better average performance.
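    The preprocessing stage can be made concrete with a short sketch. The function below condenses a clip into an approximate dynamic image using the simplified rank pooling coefficients alpha_t = 2t - T - 1 popularized by the dynamic image networks of Bilen et al. [7]; that the paper uses exactly these coefficients is an assumption here, not something stated in the abstract.

```python
import numpy as np

def approximate_dynamic_image(frames):
    """Collapse a (T, H, W, C) clip into one approximate dynamic image.

    Each frame t (1-indexed) is weighted by alpha_t = 2t - T - 1, so late
    frames contribute positively and early frames negatively, encoding the
    temporal evolution of appearance in a single image.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=np.float64)
    alpha = 2.0 * t - T - 1.0                      # rank pooling weights
    di = np.tensordot(alpha, frames, axes=(0, 0))  # weighted sum over time
    di -= di.min()                                 # rescale to [0, 255]
    if di.max() > 0:
        di *= 255.0 / di.max()
    return di

# The paper's "reverse order" images can be obtained by flipping the clip
# in time before pooling: approximate_dynamic_image(frames[::-1])
```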

  • Figure 1  No-fusion model

    Figure 2  Cross-layer fusion model

    Table 1  Average recognition accuracy (%) of the fusion model under four different fusion weights

    Model            | Fusion 0.50 | Fusion 0.25 | Fusion 0.20 | Fusion 0.10
    Average accuracy | 53.89       | 63.12       | 63.94       | 64.82
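    Table 1 suggests that the shallow branch should enter the fusion with a small weight: 0.10 performs best. The PyTorch-style sketch below shows one way to realize such a weighted cross-layer fusion; the block sizes, the 1x1 projection, and the reading of the table's number as the weight on the layer-2 branch are illustrative assumptions, not the paper's exact architecture (see Figure 2).

```python
import torch.nn as nn

class CrossLayerFusion(nn.Module):
    """Sketch of cross-layer fusion: early (layer-2) features are projected
    to the shape of late (layer-5) features and mixed in with weight w."""

    def __init__(self, w=0.10, num_classes=101):
        super().__init__()
        self.conv1_2 = nn.Sequential(   # blocks 1-2: shallow features
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.conv3_5 = nn.Sequential(   # blocks 3-5: deep features
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU())
        # 1x1 projection so layer-2 output matches layer-5 channels/resolution
        self.skip = nn.Conv2d(128, 512, 1, stride=4)
        self.w = w
        # the paper also flips feature maps horizontally before the fully
        # connected layer (its no-fusion model); omitted here for brevity
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(512, num_classes))

    def forward(self, x):               # x: (N, 3, 224, 224) dynamic images
        f2 = self.conv1_2(x)            # layer-2 features
        f5 = self.conv3_5(f2)           # layer-5 features
        fused = (1 - self.w) * f5 + self.w * self.skip(f2)
        return self.head(fused)
```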

    Table 2  Action recognition accuracy (%) of the cross-layer fusion model

    Action class     | Hula Hoop | Typing    | Military Parade | Playing Guitar | Throw Discus | Class average
    split1 + forward | **87.14** | 80.40     | _87.14_         | _91.33_        | _77.45_      | 82.47
    split1 + reverse | _86.29_   | 79.63     | **87.90**       | **91.65**      | 76.86        | 82.16
    split2 + forward | 77.28     | 88.35     | 86.64           | 89.29          | 73.60        | _83.06_
    split2 + reverse | 76.66     | _88.88_   | 86.27           | 90.88          | 71.31        | **83.87**
    split3 + forward | 78.72     | **89.25** | 87.02           | 91.21          | **78.20**    | 83.03
    split3 + reverse | 78.91     | 86.46     | 86.99           | 90.66          | 76.65        | 82.79
    Note: bold marks the highest recognition rate in each action class; underscores mark the second highest (underlined in the original).
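    The six classifiers of Table 2 (three data splits x two generation orders), together with their no-fusion counterparts, are combined by voting at test time. Below is a minimal sketch of hard majority voting, assuming each classifier outputs an integer class label per test sample; the classifiers list and its predict interface are hypothetical stand-ins.

```python
import numpy as np

def vote(predictions):
    """Majority vote over labels from several classifiers.

    predictions: (num_models, num_samples) integer class labels.
    Returns (num_samples,) labels; ties resolve to the smallest class index.
    Averaging softmax scores instead of hard labels is a common variant.
    """
    num_classes = predictions.max() + 1
    counts = np.apply_along_axis(        # per-sample histogram of votes
        lambda col: np.bincount(col, minlength=num_classes), 0, predictions)
    return counts.argmax(axis=0)

# e.g. preds = np.stack([clf.predict(test_images) for clf in classifiers])
#      final_labels = vote(preds)
```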

    Table 3  Recognition accuracy (%) of VADMMR on the five action classes

    Action class | Hula Hoop | Typing | Military Parade | Playing Guitar | Throw Discus | Class average
    VADMMR       | 83.77     | 87.43  | 88.83           | 91.58          | 79.83        | 84.67

    Table 4  Comparison of the proposed VADMMR with other action recognition methods

    Reference  | Technique                   | Year | Average recognition rate (%)
    Ref. [9]   | Spatial Stream ConvNet      | 2014 | 73.0
    Ref. [9]   | Temporal Stream ConvNet     | 2014 | 83.7
    Ref. [24]  | Composite LSTM              | 2015 | 84.3
    Ref. [7]   | Dynamic image network (MDI) | 2016 | 70.9
    Ref. [23]  | Spatial-C3D                 | 2017 | 83.6
    This paper | VADMMR                      | 2018 | 84.67
  • BLACKBURN J and RIBEIRO E. Human Motion Recognition Using Isomap and Dynamic Time Warping[M]. Berlin Heidelberg: Springer, 2007: 285–298.
    QU Hang and CHENG Jian. Human action recognition based on adaptive distance generalization of isometric mapping[C]. Proceedings of the International Congress on Image and Signal Processing, Bangalore, India, 2013: 95–98. doi: 10.1109/cisp.2012.6469785.
    WANG Heng, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1): 60–79. doi: 10.1007/s11263-012-0594-8
    WANG Heng and SCHMID C. Action recognition with improved trajectories[C]. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 2013: 3551–3558. doi: 10.1109/iccv.2013.441.
    OHNISHI K, HIDAKA M, and HARADA T. Improved dense trajectory with cross streams[C]. ACM on Multimedia Conference, Amsterdam, Holland, 2016: 257–261. doi: 10.1145/2964284.2967222.
    AHAD M A R, TAN J, KIM H, et al. Action recognition by employing combined directional motion history and energy images[C]. IEEE Conference On Computer Vision and Pattern Recognition. San Francisco, USA, 2010: 73–78. doi: 10.1109/CVPRW.2010.5543160.
    BILEN H, FERNANDO B, GAVVES E, et al. Dynamic image networks for action recognition[C]. Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 3034–3042. doi: 10.1109/cvpr.2016.331.
    CHERIAN A, FERNANDO B, HARANDI M, et al. Generalized rank pooling for activity recognition[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017: 3222–3231. doi: 10.1109/cvpr.2017.172.
    SIMONYAN K and ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]. Proceedings of the International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 568–576.
    LIU Hong, TU Juanhui, and LIU Mengyuan. Two-stream 3D convolutional neural network for skeleton-based action recognition[OL]. https://arxiv.org/abs/1705.08106, 2017.
    MOLCHANOV P, GUPTA S, KIM K, et al. Hand gesture recognition with 3D convolutional neural networks[C]. Proceedings of the Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 1–7. doi: 10.1109/cvprw.2015.7301342.
    ZHU Yi, LAN Zhenzhong, NEWSAM S, et al. Hidden two-stream convolutional networks for action recognition[OL]. https://arxiv.org/abs/1704.00389, 2017.
    WEI Xiao, SONG Li, XIE Rong, et al. Two-stream recurrent convolutional neural networks for video saliency estimation[C]. Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, Cagliari, Italy, 2017: 1–5. doi: 10.1109/bmsb.2017.7986223.
    SHI Yemin, TIAN Yonghong, WANG Yaowei, et al. Sequential deep trajectory descriptor for action recognition with three-stream CNN[J]. IEEE Transactions on Multimedia, 2017, 19(7): 1510–1520. doi: 10.1109/TMM.2017.2666540
    SONG Sibo, CHANDRASEKHAR V, MANDAL B, et al. Multimodal multi-stream deep learning for egocentric activity recognition[C]. Proceedings of the Computer Vision and Pattern Recognition Workshops, Las Vegas, USA, 2016: 24–31. doi: 10.1109/cvprw.2016.54.
    NISHIDA N and NAKAYAMA H. Multimodal Gesture Recognition Using Multi-Stream Recurrent Neural Network[M]. New York: Springer-Verlag, 2015: 682–694.
    ZHU Li, WU Yuchuan, HU Feng, et al. Study on action recognition system for the aged[J]. Computer Engineering and Applications, 2017, 53(14): 24–31. doi: 10.3778/j.issn.1002-8331.1703-0470 (in Chinese)
    SHOU Zhibin. Research on image recognition based on neural networks and model combination[D]. [Master dissertation], South China University of Technology, 2015. (in Chinese)
    HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    DIETTERICH T G. Ensemble methods in machine learning[C]. 1st International Workshop on Multiple Classifier Systems, 2000, LNCS 1857: 1–15. doi: 10.1007/3-540-45014-9_1
    FERNANDO B, GAVVES E, ORAMAS M J, et al. Modeling video evolution for action recognition[C]. Proceedings of the Computer Vision and Pattern Recognition, Boston, USA, 2015: 5378–5387. doi: 10.1109/cvpr.2015.7299176.
    SOOMRO K, ZAMIR A R, and SHAH M. UCF101: A dataset of 101 human actions classes from videos in the wild[OL]. https://arxiv.org/abs/1212.0402, 2012.
    TRAN A and CHEONG L F. Two-stream flow-guided convolutional attention networks for action recognition[C]. Proceedings of the IEEE International Conference on Computer Vision Workshop, Venice, Italy, 2017: 3110–3119. doi: 10.1109/iccvw.2017.368.
    SRIVASTAVA N, MANSIMOV E, and SALAKHUTDINOV R. Unsupervised learning of video representations using LSTMs[C]. International Conference on Machine Learning, Lille, France, 2015: 843–852.
Publication history
  • Received: 2018-04-24
  • Revised: 2018-11-02
  • Published online: 2018-11-12
  • Issue date: 2019-03-01
