Android Malware Detection Based on Deep Learning: Achievements and Challenges
doi: 10.11999/JEIT200009
1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
2. Chinese University of Hong Kong, Hong Kong 999077, China
3. Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, Beijing 100093, China
4. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: With the widespread use of Android applications, the number of Android malware samples has grown rapidly, posing increasingly serious threats to users' property and privacy. In recent years, Android malware detection based on deep learning has become a research hotspot in the security field. This paper analyzes and summarizes the existing academic results in this direction from four aspects: data collection, application features, network structure, and detection performance. It then discusses their limitations and the challenges they face, and outlines the priorities for future research in this direction.
Key words: Mobile security / Android malware / Android application / Deep learning / Machine learning
Table 1 Statistics of public Android malware datasets
| Dataset | Number of malware samples | Collection period | Detection method | Download link |
|---|---|---|---|---|
| VirusShare[24] | 34311879 | 2011 to present | Not stated | https://virusshare.com |
| AndroZoo[25] | 1302968 | 2011 to present | VirusTotal | https://androzoo.uni.lu |
| ArgusLab[26] | 24650 | 2010–2016 | VirusTotal | http://amd.arguslab.org |
| Drebin[28] | 5560 | 2010–2012 | VirusTotal | http://contagiominidump.blogspot.com |
| ISCX[29] | 1929 | 2012–2015 | VirusTotal | https://www.unb.ca/cic/datasets/index.html |
| Genome[30] | 1260 | 2010–2011 | Not stated | http://www.malgenomeproject.org |
| Contagio[27] | 252 | 2011–2018 | Not stated | http://contagiominidump.blogspot.com |
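Table 3 Comparison of existing deep learning models with traditional machine learning models on the same data (%)
| Work | Metric | Deep learning model | Traditional machine learning models (values in source order) |
|---|---|---|---|
| Ref. [12] | m4 | 96.5 | 80.0, 77.5, 79.0, 78.0 |
| Ref. [14] | m1 | 100 | 53.3, 47.0 |
| Ref. [14] | m2 | 98.3 | 34.8, 54.0 |
| Ref. [14] | m4 | 99.4 | 66.0, 82.0 |
| Ref. [19] | m1 | 95.77 | 92.08, 75.09, 79.22, 64.18 |
| Ref. [19] | m2 | 97.84 | 93.75, 98.64, 91.82, 95.91 |
| Ref. [19] | m4 | 96.76 | 92.84, 82.95, 83.86, 71.19 |
| Ref. [22] | m1 | 99.52 | 94.23, 93.77, 95.64, 97.04, 95.40 |
| Ref. [22] | m2 | 99.83 | 95.89, 94.68, 95.90, 94.69, 93.16 |
| Ref. [22] | m3 | 99.74 | 95.05, 94.22, 95.77, 95.85, 94.27 |
| Ref. [22] | m4 | 99.68 | 94.97, 94.13, 95.82, 95.93, 94.29 |
| Ref. [32] | m1 | 94.82 | 87.6, 92, 76.5, 93.8 |
| Ref. [32] | m2 | 97.76 | 87.5, 92, 76.8, 93.8 |
| Ref. [32] | m5 | 90.86 | 94.4, 95.5, 85.5, 97.1 |
| Ref. [32] | m6 | 9.14 | 5.6, 4.5, 14.5, 2.9 |
| Ref. [32] | m7 | 2.24 | 24.2, 13.9, 38, 12 |
Note: The traditional machine learning columns are, in order, support vector machine, decision tree, naive Bayes, logistic regression, random forest, and k-nearest neighbors. Blank cells are not marked in this copy, so the traditional-model values are listed in source order and cannot be assigned to individual columns. The metrics are defined as follows. m1: precision; m2: recall/true positive rate (TPR); m3: F-measure; m4: accuracy; m6: false positive rate (FPR); m7: false negative rate (FNR).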
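Table 4 Comparison of existing deep-learning-based methods with traditional-machine-learning-based methods on different data and different features
| Work | Machine learning model | Reported values (source order) |
|---|---|---|
| Ref. [11] | Deep learning | 98, 99 |
| Ref. [28] | Support vector machine | 93.9 |
| Ref. [62] | Decision tree | 78 |
| Ref. [63] | Naive Bayes | 93 |
| Ref. [61] | K-nearest neighbors | 99 |
| Ref. [64] | Extreme gradient boosting decision tree | 97, 97 |
| Ref. [35] | Deep learning | 96, 93, 9, 0.5 |
| Ref. [28] | Support vector machine | 94.0, 1.0, 0.75 |
| Ref. [65] | Random forest | 95.3, 92, 0.34, 19.8 |
| Ref. [39] | Deep learning | 98.84, 98.47, 98.65, 98.86 |
| Ref. [66] | Logistic regression | 80.99, 87.11, 83.93, 83.26 |
| Ref. [44] | Deep learning | 98.98, 1.58 |
| Ref. [67] | Random forest | 97.42, 4.33 |
| Ref. [20] | Deep learning | 99, 95, 97, 98 |
| Ref. [68] | Support vector machine | 98 |
| Ref. [69] | Naive Bayes | 94, 91, 92, 91 |
| Ref. [67] | Random forest | 98, 97, 97, 97 |
Note: The value columns are, in order, m1(%), m2(%), m3(%), m4(%), m6(%), m7(%), and m8(s). Blank cells are not marked in this copy, so each row lists its reported values in source order and they cannot be assigned to individual columns. The metrics are defined as follows. m1: precision; m2: recall/true positive rate (TPR); m3: F-measure; m4: accuracy; m6: false positive rate (FPR); m7: false negative rate (FNR); m8: detection time.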
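Table 5 Comparison among deep-learning-based Android malware detection works (%)
| Work | Reported values (source order) |
|---|---|
| Ref. [11] | 99, 98 |
| Ref. [19] | 96.8 |
| Ref. [20] | 86, 87 |
| Ref. [35] | 96, 93, 9, 0.5 |
| Ref. [20] | 99.3, 99, 3, 2.5 |
| Ref. [39] | 98.87, 98.47, 98.65, 98.86 |
| Ref. [13] | 83.24, 87.67, 85.39, 84.95 |
| Ref. [18] | 94.76, 91.31, 93.00, 93.10 |
| Ref. [20] | 67, 98.47, 71.00, 69.00 |
| Ref. [44] | 98.98, 1.58 |
| Ref. [20] | 89.50, 6.72 |
| Ref. [32] | 98.09, 99.56, 98.82, 98.5 |
| Ref. [33] | 93.96, 93.36, 93.68, 93.68 |
| Ref. [19] | 96.78, 96.76, 96.76, 96.76 |
| Ref. [20] | 99, 95, 97, 98 |
| Ref. [21] | 95.31 |
| Ref. [35] | 93 |
| Ref. [33] | 93.68 |
Note: The value columns are, in order, m1, m2, m3, m4, m6, and m7. Blank cells are not marked in this copy, so each row lists its reported values in source order and they cannot be assigned to individual columns. The metrics are defined as follows. m1: precision; m2: recall/true positive rate (TPR); m3: F-measure; m4: accuracy; m6: false positive rate (FPR); m7: false negative rate (FNR).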
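The metrics named in the table notes (m1 to m4, m6, m7) all follow from the four counts of a binary confusion matrix. The sketch below shows one common way to compute them. It assumes that "positive" means malware and that F-measure is the balanced F1 score (the harmonic mean of precision and recall); the survey does not state which F-measure variant each cited work uses. The names `ConfusionCounts` and `detection_metrics` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary detector; 'positive' is assumed to mean malware."""
    tp: int  # malware correctly flagged as malware
    fp: int  # benign apps wrongly flagged as malware
    tn: int  # benign apps correctly passed as benign
    fn: int  # malware wrongly passed as benign


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(c: ConfusionCounts) -> dict:
    """Compute the metrics from the table notes as fractions in [0, 1]."""
    precision = _ratio(c.tp, c.tp + c.fp)            # m1
    recall = _ratio(c.tp, c.tp + c.fn)               # m2 (TPR)
    # Assumption: F-measure is taken to be F1, the balanced harmonic mean.
    f_measure = _ratio(2 * precision * recall, precision + recall)  # m3
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)       # m4
    fpr = _ratio(c.fp, c.fp + c.tn)                  # m6
    fnr = _ratio(c.fn, c.fn + c.tp)                  # m7
    return {
        "m1_precision": precision,
        "m2_recall": recall,
        "m3_f_measure": f_measure,
        "m4_accuracy": accuracy,
        "m6_fpr": fpr,
        "m7_fnr": fnr,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; they are not taken from any cited work.
    example = ConfusionCounts(tp=90, fp=10, tn=80, fn=20)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {100 * value:.2f}%")
```

Because FNR and TPR share the same denominator, m7 = 100% − m2 whenever both are measured on the same test set. The deep learning values reported for [32] in Table 3 (m2 = 97.76, m7 = 2.24) are consistent with this identity, which makes it a quick sanity check when reading the tables.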
References
[1] CHAU M and REITH R. Smartphone market share[EB/OL]. https://www.idc.com/promo/smartphone-market-share/os, 2019.
[2] Tencent Mobile Butler. Tencent mobile security lab mobile security report in the first half year of 2019[EB/OL]. https://m.qq.com/security_lab/news_detail_517.html, 2019.
[3] WANG Bolun, YAO Yuanshun, SHAN S, et al. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks[C]. 2019 IEEE Symposium on Security and Privacy, San Francisco, USA, 2019: 707–723. doi: 10.1109/SP.2019.00031.
[4] SAFAVIAN S R and LANDGREBE D. A survey of decision tree classifier methodology[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1991, 21(3): 660–674. doi: 10.1109/21.97458.
[5] SUYKENS J A K and VANDEWALLE J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9(3): 293–300. doi: 10.1023/A:1018628609742.
[6] MCCALLUM A and NIGAM K. A comparison of event models for naive Bayes text classification[C]. AAAI-98 Workshop on Learning for Text Categorization, Madison, Wisconsin, USA, 1998: 41–48.
[7] WANG Xin, LI Ke, NING Chen, et al. Remote sensing image classification method based on deep convolution neural network and multi-kernel learning[J]. Journal of Electronics & Information Technology, 2019, 41(5): 1098–1105. doi: 10.11999/JEIT180628.
[8] XU Shaoping, ZHANG Guizhen, LI Chongxi, et al. A fast random-valued impulse noise detection algorithm based on deep belief network[J]. Journal of Electronics & Information Technology, 2019, 41(5): 1130–1136. doi: 10.11999/JEIT180558.
[9] YANG Hongyu and WANG Fengyan. Meteorological radar noise image semantic segmentation method based on deep convolutional neural network[J]. Journal of Electronics & Information Technology, 2019, 41(10): 2373–2381. doi: 10.11999/JEIT190098.
[10] STATISTA. Number of available applications in the Google Play Store from December 2009 to June 2020[EB/OL]. https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/, 2019.
[11] KIM T, KANG B, RHO M, et al. A multimodal deep learning method for Android malware detection using various features[J]. IEEE Transactions on Information Forensics and Security, 2019, 14(3): 773–788. doi: 10.1109/TIFS.2018.2866319.
[12] YUAN Zhenlong, LU Yongqiang, WANG Zhaoguo, et al. Droid-sec: Deep learning in android malware detection[C]. 2014 ACM Conference on SIGCOMM, Chicago, USA, 2014: 371–372. doi: 10.1145/2619239.2631434.
[13] XIAO Xi, WANG Zhenlong, LI Qing, et al. Back-propagation neural network on Markov chains from system call sequences: A new approach for detecting Android malware with system call sequences[J]. IET Information Security, 2017, 11(1): 8–15. doi: 10.1049/iet-ifs.2015.0211.
[14] NIX R and ZHANG Jian. Classification of Android apps and malware using deep neural networks[C]. 2017 International Joint Conference on Neural Networks, Anchorage, USA, 2017: 1871–1878. doi: 10.1109/IJCNN.2017.7966078.
[15] HUANG Na, XU Ming, ZHENG Ning, et al. Deep android malware classification with API-based feature graph[C]. The 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science and Engineering, Rotorua, New Zealand, 2019: 296–303. doi: 10.1109/TrustCom/BigDataSE.2019.00047.
[16] ABDERRAHMANE A, ADNANE G, YACINE C, et al. Android malware detection based on system calls analysis and CNN classification[C]. 2019 IEEE Wireless Communications and Networking Conference Workshop, Marrakech, Morocco, 2019: 1–6. doi: 10.1109/WCNCW.2019.8902627.
[17] WANG Wei, ZHAO Mengxue, and WANG Jigang. Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network[J]. Journal of Ambient Intelligence and Humanized Computing, 2019, 10(8): 3035–3043. doi: 10.1007/s12652-018-0803-6.
[18] XIAO Xi, ZHANG Shaofeng, MERCALDO F, et al. Android malware detection based on system call sequences and LSTM[J]. Multimedia Tools and Applications, 2019, 78(4): 3979–3999. doi: 10.1007/s11042-017-5104-0.
[19] YUAN Zhenlong, LU Yongqiang, and XUE Yibo. Droiddetector: Android malware characterization and detection using deep learning[J]. Tsinghua Science and Technology, 2016, 21(1): 114–123. doi: 10.1109/TST.2016.7399288.
[20] MCLAUGHLIN N, MARTINEZ DEL RINCON J, KANG B, et al. Deep android malware detection[C]. The 7th ACM on Conference on Data and Application Security and Privacy, Scottsdale, USA, 2017: 301–308. doi: 10.1145/3029806.3029823.
[21] NAWAY A and LI Yuancheng. Using deep neural network for Android malware detection[EB/OL]. https://arxiv.org/pdf/1904.00736, 2019.
[22] WANG Zhiqiang, LI Gefei, CHI Yaping, et al. Android malware detection based on convolutional neural networks[C]. The 3rd International Conference on Computer Science and Application Engineering, Sanya, China, 2019: 1–151. doi: 10.1145/3331453.3361306.
[23] SABHADIYA S, BARAD J, and GHEEWALA J. Android malware detection using deep learning[C]. The 3rd International Conference on Trends in Electronics and Informatics, Tirunelveli, India, 2019: 1254–1260. doi: 10.1109/ICOEI.2019.8862633.
[24] MELISSA. VirusShare.com: Because sharing is caring[EB/OL]. https://virusshare.com, 2019.
[25] ALLIX K, BISSYANDÉ T F, KLEIN J, et al. Androzoo: Collecting millions of android apps for the research community[C]. The 13th IEEE/ACM Working Conference on Mining Software Repositories, Austin, TX, USA, 2016: 468–471.
[26] WEI Fengguo, LI Yuping, ROY S, et al. Deep ground truth analysis of current android malware[C]. The 14th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 2017: 252–276. doi: 10.1007/978-3-319-60876-1.
[27] MILAPARKOUR. Contagio mobile mobile malware mini dump[EB/OL]. http://contagiominidump.blogspot.com, 2019.
[28] ARP D, SPREITZENBARTH M, HUBNER M, et al. Drebin: Effective and explainable detection of android malware in your pocket[C]. The 21st Annual Network and Distributed System Security Symposium, San Diego, California, USA, 2014: 23–26. doi: 10.14722/ndss.2014.23247.
[29] KADIR A F A, STAKHANOVA N, and GHORBANI A A. Android botnets: What URLs are telling us[C]. The 9th International Conference on Network and System Security, New York, NY, 2015: 78–79. doi: 10.1007/978-3-319-25645-0_6.
[30] ZHOU Yajin and JIANG Xuxian. Dissecting android malware: Characterization and evolution[C]. 2012 IEEE Symposium on Security and Privacy, San Francisco, USA, 2012: 95–109. doi: 10.1109/SP.2012.16.
[31] YE Yanfang, HOU Shifu, CHEN Lingwei, et al. Out-of-sample node representation learning for heterogeneous graph in real-time android malware detection[C]. The 28th International Joint Conference on Artificial Intelligence, Macao, China, 2019: 4150–4156. doi: 10.24963/ijcai.2019/576.
[32] ALZAYLAEE M K, YERIMA S Y, and SEZER S. DL-Droid: Deep learning based android malware detection using real devices[J]. Computers & Security, 2020, 89: 101663. doi: 10.1016/j.cose.2019.101663.
[33] HOU Shifu, SAAS A, CHEN Lifei, et al. Deep4MalDroid: A deep learning framework for android malware detection based on Linux kernel system call graphs[C]. 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops, Omaha, USA, 2016: 104–111. doi: 10.1109/WIW.2016.040.
[34] HOU Shifu, SAAS A, YE Yanfang, et al. Droiddelver: An android malware detection system using deep belief network based on API call blocks[C]. WAIM 2016 International Conference on Web-Age Information Management, Nanchang, China, 2016: 54–55. doi: 10.1007/978-3-319-47121-1_5.
[35] HUANG T H D and KAO H Y. R2-D2: Color-inspired convolutional neural network (CNN)-based android malware detections[C]. 2018 IEEE International Conference on Big Data, Seattle, USA, 2018: 2633–2642. doi: 10.1109/BigData.2018.8622324.
[36] NAUMAN M, TANVEER T A, KHAN S, et al. Deep neural architectures for large scale android malware analysis[J]. Cluster Computing, 2018, 21(1): 569–588. doi: 10.1007/s10586-017-0944-y.
[37] DUC N V and GIANG P T. NADM: Neural network for Android detection malware[C]. The 9th International Symposium on Information and Communication Technology, Danang City, Vietnam, 2018: 449–455. doi: 10.1145/3287921.3287977.
[38] AAFER Y, DU Wenliang, and YIN Heng. Droidapiminer: Mining API-level features for robust malware detection in android[C]. The 9th International Conference on Security and Privacy in Communication Systems, Sydney, Australia, 2013: 86–103. doi: 10.1007/978-3-319-04283-1_6.
[39] PEKTAŞ A and ACARMAN T. Deep learning for effective Android malware detection using API call graph embeddings[J]. Soft Computing, 2020, 24(2): 1027–1043. doi: 10.1007/s00500-019-03940-5.
[40] SUN Yizhou and HAN Jiawei. Mining heterogeneous information networks: Principles and methodologies[J]. Synthesis Lectures on Data Mining and Knowledge Discovery, 2012, 3(2): 1–159. doi: 10.2200/S00433ED1V01Y201207DMK005.
[41] FAN Yujie, HOU Shifu, ZHANG Yiming, et al. Gotcha-sly malware!: Scorpion a metagraph2vec based malware detection system[C]. The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 2018: 253–262. doi: 10.1145/3219819.3219862.
[42] DONG Yuxiao, CHAWLA N V, and SWAMI A. Metapath2vec: Scalable representation learning for heterogeneous networks[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 135–144. doi: 10.1145/3097983.3098036.
[43] FU Taoyang, LEE W C, and LEI Zhen. Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning[C]. 2017 ACM on Conference on Information and Knowledge Management, Singapore, 2017: 1797–1806. doi: 10.1145/3132847.3132953.
[44] MA Zhuo, GE Haoran, LIU Yang, et al. A combination method for Android malware detection based on control flow graphs and machine learning algorithms[J]. IEEE Access, 2019, 7: 21235–21245. doi: 10.1109/ACCESS.2019.2896003.
[45] CROSS G R and JAIN A K. Markov random field texture models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, PAMI-5(1): 25–39. doi: 10.1109/TPAMI.1983.4767341.
[46] CSÁJI B C. Approximation with artificial neural networks[D]. [Master dissertation], Eötvös Loránd University, 2001: 7.
[47] KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 1106–1114.
[48] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. doi: 10.1109/5.726791.
[49] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. The 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9. doi: 10.1109/CVPR.2015.7298594.
[50] BERGSTRA J, BARDENET R, BENGIO Y, et al. Algorithms for hyper-parameter optimization[C]. The 25th Annual Conference on Neural Information Processing Systems 2011, Granada, Spain, 2011: 2546–2554.
[51] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2818–2826. doi: 10.1109/CVPR.2016.308.
[52] HOCHREITER S and SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735–1780. doi: 10.1162/neco.1997.9.8.1735.
[53] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2204–2212.
[54] SALAKHUTDINOV R and MURRAY I. On the quantitative analysis of deep belief networks[C]. The 25th International Conference on Machine Learning, Helsinki, Finland, 2008: 872–879. doi: 10.1145/1390156.1390266.
[55] ALONSO J M and CHEN Yao. Receptive field[J]. Scholarpedia, 2009, 4(1): 5393. doi: 10.4249/scholarpedia.5393.
[56] ALLEN F E. Control flow analysis[J]. ACM SIGPLAN Notices, 1970, 5(7): 1–19. doi: 10.1145/390013.808479.
[57] SCARSELLI F, GORI M, TSOI A C, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61–80. doi: 10.1109/TNN.2008.2005605.
[58] JIANG Bo, ZHANG Ziyan, LIN Doudou, et al. Semi-supervised learning with graph learning-convolutional networks[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 11305–11312. doi: 10.1109/CVPR.2019.01157.
[59] MARON H, BEN-HAMU H, SERVIANSKY H, et al. Provably powerful graph networks[C]. The 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, 2019: 2153–2164.
[60] GORI M, MONFARDINI G, and SCARSELLI F. A new model for learning in graph domains[C]. 2005 IEEE International Joint Conference on Neural Networks, Montreal, Canada, 2005: 729–734.
[61] WU Dongjie, MAO C H, WEI T E, et al. Droidmat: Android malware detection through manifest and API calls tracing[C]. The 7th Asia Joint Conference on Information Security, Tokyo, Japan, 2012: 62–69. doi: 10.1109/AsiaJCIS.2012.18.
[62] HUANG C Y, TSAI Y T, and HSU C H. Performance evaluation on permission-based detection for android malware[C]. International Computer Symposium ICS 2012 Held at Hualien, Taipei, China, 2013: 111–120. doi: 10.1007/978-3-642-35473-1_12.
[63] ZHANG Mu, DUAN Yue, YIN Heng, et al. Semantics-aware android malware classification using weighted contextual API dependency graphs[C]. 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, USA, 2014: 1105–1116. doi: 10.1145/2660267.2660359.
[64] FEREIDOONI H, CONTI M, YAO Danfeng, et al. ANASTASIA: Android malware detection using static analysis of applications[C]. The 8th IFIP International Conference on New Technologies, Mobility and Security, Larnaca, Cyprus, 2016: 1–5. doi: 10.1109/NTMS.2016.7792435.
[65] YANG Chao, XU Zhaoyan, GU Guofei, et al. Droidminer: Automated mining and characterization of fine-grained malicious behaviors in android applications[C]. The 19th European Symposium on Research in Computer Security, Wroclaw, Poland, 2014: 163–182. doi: 10.1007/978-3-319-11203-9_10.
[66] DIMJAŠEVIĆ M, ATZENI S, UGRINA I, et al. Evaluation of android malware detection based on system calls[C]. 2016 ACM on International Workshop on Security and Privacy Analytics, New Orleans, USA, 2016: 1–8. doi: 10.1145/2875475.2875487.
[67] YERIMA S Y, SEZER S, and MUTTIK I. High accuracy android malware detection using ensemble learning[J]. IET Information Security, 2015, 9(6): 313–320. doi: 10.1049/iet-ifs.2014.0099.
[68] JEROME Q, ALLIX K, STATE R, et al. Using opcode-sequences to detect malicious Android applications[C]. 2014 IEEE International Conference on Communications, Sydney, Australia, 2014: 914–919. doi: 10.1109/ICC.2014.6883436.
[69] YERIMA S Y, SEZER S, MCWILLIAMS G, et al. A new android malware detection approach using Bayesian classification[C]. The 27th IEEE International Conference on Advanced Information Networking and Applications, Barcelona, Spain, 2013: 121–128. doi: 10.1109/AINA.2013.88.
[70] POWERS D M W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation[J]. Journal of Machine Learning Technologies, 2011, 2(1): 37–63.