Method for Generating Malicious Code Adversarial Samples Based on Genetic Algorithm
doi: 10.11999/JEIT191059
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China
2. Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Machine learning is widely used in malicious code detection and plays an important role in malicious code detection products. Constructing adversarial samples against machine learning models for malicious code detection is key to discovering defects in those models and to evaluating and improving malicious code detection systems. This paper proposes a method for generating malicious code adversarial samples based on a genetic algorithm. The generated samples effectively evade machine-learning-based malicious code detection models while remaining executable and preserving their malicious behavior, which improves both the authenticity of the generated adversarial samples and the accuracy of adversarial evaluation of the models. Experiments show that the proposed method reduces the detection accuracy of the MalConv malicious code detection model by 14.65 percentage points (from 98.88% to 84.23%), and that the generated samples directly interfere with four commercial machine-learning-based malicious code detection engines on VirusTotal. Among them, the detection accuracy of Cylance is only 53.55%.
Key words:
- Malware detection
- Machine learning
- Adversarial sample
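Table 1 Atomic rewriting operations for PE files

| Rewriting module | Rewriting content |
| --- | --- |
| PE header | Modify PE flag bits |
| PE header | Modify the PE file checksum |
| Section table | Add redundant imported functions to the import table |
| Section table | Rename a section |
| Section table | Pad a section with redundant information |
| Section table | Add a new section |
| PE file | Packing and unpacking |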
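The abstract describes a genetic-algorithm search, and Table 1 lists the atomic operations it can draw on. The following is a minimal sketch of how such a search loop could be organized. It is not the authors' implementation. The operation labels, the `toy_score` function, the population settings, and the choice of truncation selection with single-point crossover are all assumptions made for illustration. The operations are plain strings that modify no file, and the score is a toy stand-in rather than a real detector. The check that a candidate stays executable and keeps its behavior, which the paper requires, is omitted here.

```python
import random

# Hypothetical labels for the atomic operations in Table 1. They are plain
# strings used as genes and do not touch any file.
OPERATIONS = [
    "modify_pe_flags",
    "modify_checksum",
    "add_redundant_imports",
    "rename_section",
    "pad_section",
    "add_section",
    "pack_or_unpack",
]


def toy_score(sequence):
    """Toy stand-in for a detector score (lower is better for the search).

    Assumption for illustration only: sequences with more distinct
    operations receive a lower score.
    """
    return 1.0 / (1 + len(set(sequence)))


def evolve(generations=20, pop_size=12, seq_len=5, mutation_rate=0.2, seed=0):
    """Search for a low-scoring operation sequence with a simple GA."""
    rng = random.Random(seed)
    population = [
        [rng.choice(OPERATIONS) for _ in range(seq_len)] for _ in range(pop_size)
    ]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=toy_score)
        history.append(toy_score(population[0]))
        parents = population[: pop_size // 2]  # truncation selection
        children = [list(parents[0])]          # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, seq_len)    # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: replace each gene with a random operation at a small rate.
            child = [
                rng.choice(OPERATIONS) if rng.random() < mutation_rate else op
                for op in child
            ]
            children.append(child)
        population = children
    best = min(population, key=toy_score)
    history.append(toy_score(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print(best, history[0], history[-1])
```

Because the best individual is copied unchanged into the next generation, the best score in `history` never gets worse from one generation to the next. In a real setting the score would come from the model under evaluation, and each candidate would also have to pass a functionality check before it is kept.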
Table 2 Statistics of the experimental data

| Sample | Training set | Test set |
| --- | --- | --- |
| Benign samples | 7059 | 784 |
| Malicious samples | 6593 | 732 |
| Total | 13652 | 1516 |
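Table 3 Detection results of the malicious code detection engine

| Evaluation sample set | Benign-sample misreports | Malicious-sample misreports | Total misreports | Model detection accuracy (%) |
| --- | --- | --- | --- | --- |
| Original sample set | 7 | 10 | 17 | 98.88 |
| First-generation adversarial sample set | 37 | 9 | 46 | 96.97 |
| Optimized adversarial sample set | 228 | 11 | 239 | 84.23 |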
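The accuracy column of Table 3 can be reproduced from the misreport counts and the test-set size in Table 2 (784 benign plus 732 malicious samples). The short check below is a sketch of that arithmetic. The function and variable names are illustrative and do not come from the paper.

```python
# Test-set size from Table 2: benign + malicious test samples.
TEST_SET_SIZE = 784 + 732

# Total misreports per evaluation set, from Table 3.
TOTAL_MISREPORTS = {
    "original": 7 + 10,
    "first_generation": 37 + 9,
    "optimized": 228 + 11,
}


def accuracy_percent(misreports, n=TEST_SET_SIZE):
    """Detection accuracy in percent, rounded to two decimals."""
    return round(100.0 * (1 - misreports / n), 2)


accuracies = {name: accuracy_percent(m) for name, m in TOTAL_MISREPORTS.items()}
# Drop in accuracy between the original and the optimized adversarial set.
accuracy_drop = round(accuracies["original"] - accuracies["optimized"], 2)

if __name__ == "__main__":
    print(accuracies, accuracy_drop)
```

The computed values agree with the table (98.88, 96.97, 84.23), and the difference between the first and last rows is the 14.65 figure quoted in the abstract, which is therefore a drop in percentage points.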
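Table 4 Detection success rate of vendor products

| Malicious code detection engine | Misreported samples | Detection evasion rate (%) |
| --- | --- | --- |
| Cylance | 111 | 46.45 |
| Endgame | 43 | 17.99 |
| Sophos ML | 50 | 20.92 |
| Trapmine | 35 | 14.64 |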
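The source does not say which sample count the evasion rates in Table 4 are computed over. One reading that fits the numbers is that the denominator is the 239 samples misreported by MalConv in the optimized adversarial set of Table 3. This is an assumption, not something the source states. The sketch below checks that reading and also reproduces the Cylance accuracy quoted in the abstract as 100 minus the evasion rate. All names are illustrative.

```python
# Assumption (not stated in the source): evasion rates are taken over the
# 239 misreported samples of the optimized adversarial set in Table 3.
ASSUMED_DENOMINATOR = 228 + 11

# engine -> (misreported sample count, reported evasion rate in %), from Table 4
TABLE_4 = {
    "Cylance": (111, 46.45),
    "Endgame": (43, 17.99),
    "Sophos ML": (50, 20.92),
    "Trapmine": (35, 14.64),
}


def evasion_rate_percent(count, total=ASSUMED_DENOMINATOR):
    """Share of samples that evaded the engine, in percent (unrounded)."""
    return 100.0 * count / total


def max_deviation():
    """Largest gap between the recomputed and the reported evasion rates."""
    return max(
        abs(evasion_rate_percent(count) - reported)
        for count, reported in TABLE_4.values()
    )


# Detection accuracy implied by the reported Cylance evasion rate.
cylance_accuracy = round(100.0 - TABLE_4["Cylance"][1], 2)

if __name__ == "__main__":
    print(max_deviation(), cylance_accuracy)
```

Under this assumption, three of the four rows match the table to two decimals, and the Cylance row differs only in the last digit, which may be a rounding difference. The implied Cylance accuracy is 53.55%, the value given in the abstract.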