Method for Generating Malicious Code Adversarial Samples Based on Genetic Algorithm

YAN Jia; YAN Jia; NIE Chujiang; SU Purui

doi:10.11999/JEIT191059

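1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China
2. Trusted Computing and Information Assurance Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China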

Funds: The National Natural Science Foundation of China (61902384, U1836117, U1836113)

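Author biographies:
YAN Jia: male, born 1991, Ph.D. candidate; research interest: network and system security

YAN Jia: male, born 1986, associate research professor; research interest: network and system security

NIE Chujiang: male, born 1983, associate research professor; research interest: network and system security

SU Purui: male, born 1976, research professor; research interest: network and system security

Corresponding author:
SU Purui, purui@iscas.ac.cn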

CLC number: TP309.5
Publication history
- Received: 2019-12-31
- Revised: 2020-05-30
- Published online: 2020-07-21
- Published: 2020-09-27

Abstract

Machine learning is widely used in malware detection and plays an important role in commercial malware detection products. Constructing adversarial samples against machine-learning-based malware detection models is key to uncovering the weaknesses of these models and to evaluating and improving malware detection systems. This paper proposes a genetic-algorithm-based method for generating adversarial malware samples. The generated samples effectively evade machine-learning-based malware detection models while remaining executable and preserving the original malicious behavior, which makes the adversarial samples more realistic and the adversarial evaluation of the models more accurate. Experiments show that the proposed method lowers the detection accuracy of the MalConv malware detection model by 14.65 percentage points and directly disrupts four commercial machine-learning-based malware detection engines on VirusTotal; against Cylance, detection accuracy falls to only 53.55%.

Keywords:
- Malware detection
- Machine learning
- Adversarial sample

Figure 1 Structure of the PE file format

Figure 2 Flowchart of the genetic-algorithm-based adversarial sample generation algorithm

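The flowchart of Figure 2 is not reproduced in this text. The following is a minimal sketch of a genetic-algorithm search of the kind the title and abstract describe, not the authors' implementation. It assumes that each individual is a fixed-length sequence of the PE rewriting operations from Table 1 (the identifiers in `OPERATIONS` are hypothetical), that fitness is a detector's malicious score to be pushed below a threshold, and that selection, crossover, and mutation are binary tournament with elitism, single-point crossover, and per-gene resampling. The paper's requirement that rewritten samples stay executable and behave identically is modeled only as a hypothetical `is_valid` callback, and `toy_score` is a made-up stand-in for a real detector such as MalConv.

```python
import random
from typing import Callable, List

# Hypothetical identifiers for the atomic rewrites listed in Table 1.
OPERATIONS = [
    "modify_pe_flags",
    "modify_checksum",
    "add_redundant_imports",
    "rename_section",
    "pad_section",
    "add_section",
    "pack_or_unpack",
]

Genome = List[int]  # indices into OPERATIONS, applied left to right


def evolve(
    score: Callable[[Genome], float],
    is_valid: Callable[[Genome], bool] = lambda g: True,
    genome_len: int = 5,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.2,
    threshold: float = 0.5,
    seed: int = 0,
) -> Genome:
    """Search for an operation sequence whose rewritten sample scores below threshold."""
    rng = random.Random(seed)
    n_ops = len(OPERATIONS)

    def fitness(g: Genome) -> float:
        # Lower is better; sequences that break execution or behavior are discarded.
        return score(g) if is_valid(g) else float("inf")

    def tournament(pop: List[Genome]) -> Genome:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    population = [[rng.randrange(n_ops) for _ in range(genome_len)] for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        if fitness(best) < threshold:
            break  # the detector would now label the sample benign
        children = [best[:]]  # elitism: carry the best genome over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, genome_len)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(genome_len):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_ops)  # point mutation
            children.append(child)
        population = children
        best = min(population, key=fitness)
    return best


# Hypothetical stand-in for a detector: more distinct rewrites -> lower malicious score.
def toy_score(genome: Genome) -> float:
    return 1.0 - 0.15 * len(set(genome))


best = evolve(toy_score)
print([OPERATIONS[i] for i in best], toy_score(best))
```

In a real pipeline, `score` would apply the genome's operations to a copy of the sample and query the detector, so each fitness call is expensive; caching scores per genome would be a natural next step.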

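Table 1 Atomic rewriting operations for PE files

Rewritten part	Modification
PE header	Modify PE flag bits
	Modify the PE file checksum
Section table	Add redundant imported functions to the import table
	Rename sections
	Fill sections with redundant data
	Add new sections
Whole PE file	Packing and unpacking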

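Table 1 names the rewrites but not how they are applied. As a minimal sketch, assuming that "PE flag bits" refers to the COFF `Characteristics` field (the table does not say which flags are meant), the helpers below apply the two PE-header operations directly to the bytes of an image using the standard header layout, and recompute the optional-header `CheckSum` with the usual algorithm (16-bit word sum with the carry folded back in, plus the file size). All function names are hypothetical and this is not the authors' code; a production tool would more likely use a PE parser such as the `pefile` Python library.

```python
import struct


def pe_offsets(data: bytes) -> tuple:
    """Return (coff_header, optional_header, checksum_field) offsets of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]  # DOS header points to the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4   # IMAGE_FILE_HEADER (20 bytes)
    optional = coff + 20  # IMAGE_OPTIONAL_HEADER
    return coff, optional, optional + 64  # CheckSum is at offset 64 in PE32 and PE32+


def set_characteristics(data: bytes, flags: int) -> bytes:
    """Overwrite the COFF Characteristics flags (last WORD of the file header)."""
    coff, _, _ = pe_offsets(data)
    out = bytearray(data)
    struct.pack_into("<H", out, coff + 18, flags & 0xFFFF)
    return bytes(out)


def set_checksum(data: bytes, value: int) -> bytes:
    """Overwrite the optional-header CheckSum field."""
    _, _, off = pe_offsets(data)
    out = bytearray(data)
    struct.pack_into("<I", out, off, value & 0xFFFFFFFF)
    return bytes(out)


def pe_checksum(data: bytes) -> int:
    """Recompute the image checksum (assumes e_lfanew is even, as in normal images)."""
    _, _, off = pe_offsets(data)
    padded = data + b"\0" * (len(data) % 2)
    total = 0
    for i in range(0, len(padded), 2):
        if off <= i < off + 4:
            continue  # the CheckSum field itself is excluded from the sum
        total += padded[i] | (padded[i + 1] << 8)
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total + len(data)


# A tiny synthetic header (not a runnable program), used only to exercise the helpers.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\0\0"
image = bytes(image)

patched = set_checksum(set_characteristics(image, 0x0102), 0xDEADBEEF)
print(hex(pe_checksum(image)), hex(pe_checksum(patched)))
```

Because `pe_checksum` skips the `CheckSum` field, writing an arbitrary value there leaves the recomputed sum unchanged, while changing flag bits does alter it. Whether a given flag change keeps the sample executable still has to be verified, which is the executability constraint the method imposes.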

Table 2 Statistics of the experimental dataset

Sample	Training set	Test set
Benign	7059	784
Malicious	6593	732
Total	13652	1516

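Table 3 Detection results of the malware detection engine

Evaluation set	Benign false reports	Malicious false reports	Total false reports	Detection accuracy (%)
Original samples	7	10	17	98.88
First-generation adversarial samples	37	9	46	96.97
Optimized adversarial samples	228	11	239	84.23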

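The accuracy column of Table 3 can be reproduced from its error counts, assuming (an assumption, not stated in the table) that every evaluation set has the same 1,516 samples as the test set in Table 2. The sketch below recomputes the three accuracies and the MalConv drop quoted in the abstract. It shows that the 14.65 figure is a difference in percentage points between the rounded accuracies; computed from the raw counts, the drop is about 14.64 points.

```python
# Test-set size from Table 2: 784 benign + 732 malicious samples.
TEST_SET_SIZE = 784 + 732

# "Total false reports" column of Table 3.
total_false_reports = {
    "original": 17,
    "first_generation": 46,
    "optimized": 239,
}


def accuracy(errors: int, total: int = TEST_SET_SIZE) -> float:
    """Detection accuracy in percent, rounded to two decimals as in Table 3."""
    return round(100 * (1 - errors / total), 2)


accuracies = {name: accuracy(n) for name, n in total_false_reports.items()}

# Drop quoted in the abstract: difference of the rounded accuracies.
reported_drop = round(accuracies["original"] - accuracies["optimized"], 2)

# Same drop computed from the raw counts before rounding.
extra_errors = total_false_reports["optimized"] - total_false_reports["original"]
exact_drop = round(100 * extra_errors / TEST_SET_SIZE, 2)

print(accuracies, reported_drop, exact_drop)
```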

Table 4 Detection success rates of vendor products

Malware detection engine	Misreported samples	Detection evasion rate (%)
Cylance	111	46.45
Endgame	43	17.99
Sophos ML	50	20.92
Trapmine	35	14.64

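The evasion rates in Table 4 are consistent with a common denominator of 239 scanned samples, which matches the total in the optimized row of Table 3; that denominator is an inference, not something the table states. The sketch below recomputes the rates under that assumption. Endgame, Sophos ML, and Trapmine match to two decimals. For Cylance, 111/239 gives 46.44% rather than the 46.45% in the table (equivalently 53.56% rather than the abstract's 53.55% accuracy), a 0.01-point gap that cannot be resolved from the counts alone.

```python
# Assumption: 239 samples were scanned, matching the optimized-row total in Table 3.
SUBMITTED = 239

# "Misreported samples" column of Table 4.
evaded = {"Cylance": 111, "Endgame": 43, "Sophos ML": 50, "Trapmine": 35}


def evasion_rate(n_evaded: int, total: int = SUBMITTED) -> float:
    """Share of submitted samples an engine failed to flag, in percent."""
    return round(100 * n_evaded / total, 2)


rates = {engine: evasion_rate(n) for engine, n in evaded.items()}
print(rates)
```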

References (29)

LANDAGE J and WANKHADE M P. Malware and malware detection techniques: A survey[J]. International Journal of Engineering Research & Technology, 2013, 2(12): 61–68.

SAXE J and BERLIN K. Deep neural network based malware detection using two dimensional binary program features[C]. The 10th International Conference on Malicious and Unwanted Software (MALWARE), Fajardo, USA, 2015: 11–20. doi: 10.1109/MALWARE.2015.7413680.

ARP D, SPREITZENBARTH M, HUBNER M, et al. Drebin: Effective and explainable detection of android malware in your pocket[C]. Network and Distributed System Security Symposium, San Diego, USA, 2014: 23–26. doi: 10.14722/ndss.2014.23247.

RAFF E, SYLVESTER J, and NICHOLAS C. Learning the PE header, malware detection with minimal domain knowledge[C]. The 10th ACM Workshop on Artificial Intelligence and Security, Dallas, USA, 2017: 121–132. doi: 10.1145/3128572.3140442.

RAFF E, ZAK R, COX R, et al. An investigation of byte n-gram features for malware classification[J]. Journal of Computer Virology and Hacking Techniques, 2018, 14(1): 1–20. doi: 10.1007/s11416-016-0283-1

Cylance Inc. What’s new in CylancePROTECT and CylanceOPTICS[EB/OL]. https://s7d2.scene7.com/is/content/cylance/prod/cylance-web/en-us/resources/knowledge-center/resource-library/briefs/Whats-New-CylancePROTECT-and-CylanceOPTICS.pdf, 2020.

Sophos Inc. Sophos central migration tool articles, documentation and resources[EB/OL]. https://community.sophos.com/kb/en-us/122264#Product%20Information, 2020.

LIANG Guanghui, PANG Jianmin, and SHAN Zheng. Malware sandbox evasion detection based on code evolution[J]. Journal of Electronics & Information Technology, 2019, 41(2): 341–347. doi: 10.11999/JEIT180257

GROSSE K, PAPERNOT N, MANOHARAN P, et al. Adversarial perturbations against deep neural networks for malware classification[J]. arXiv, 2016, 1606.04435.

XU Weilin, QI Yanjun, and EVANS D. Automatically evading classifiers[C]. The 23rd Annual Network and Distributed System Security Symposium, San Diego, USA, 2016: 21–24. doi: 10.14722/ndss.2016.23115.

HU Weiwei and TAN Ying. Generating adversarial malware examples for black-box attacks based on GAN[J]. arXiv, 2017, 1702.05983.

HU Weiwei and TAN Ying. Black-box attacks against RNN based malware detection algorithms[C]. The Workshops of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018.

RAFF E, BARKER J, SYLVESTER J, et al. Malware detection by eating a whole exe[C]. The Workshops of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 268–276.

VirusTotal. VirusTotal: Free online virus, malware and URL scanner[EB/OL]. https://www.virustotal.com/en, 2012.

PASCANU R, STOKES J W, SANOSSIAN H, et al. Malware classification with recurrent networks[C]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015: 1916–1920. doi: 10.1109/ICASSP.2015.7178304.

KOLOSNJAJI B, ZARRAS A, WEBSTER G, et al. Deep learning for classification of malware system call sequences[C]. The 29th Australasian Joint Conference on Artificial Intelligence, Hobart, Australia, 2016: 137–149. doi: 10.1007/978-3-319-50127-7_11.

HUANG Wenyi and STOKES J W. MtNet: A multi-task neural network for dynamic malware classification[C]. The 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, San Sebastián, Spain, 2016: 399–418. doi: 10.1007/978-3-319-40667-1_20.

MANNING C D, RAGHAVAN P, and SCHÜTZE H. Introduction to Information Retrieval[M]. Cambridge: Cambridge University Press, 2008.

HAN K S, LIM J H, KANG B, et al. Malware analysis using visualized images and entropy graphs[J]. International Journal of Information Security, 2015, 14(1): 1–14. doi: 10.1007/s10207-014-0242-0

KANCHERLA K and MUKKAMALA S. Image visualization based malware detection[C]. 2013 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Singapore, 2013: 40–44. doi: 10.1109/CICYBS.2013.6597204.

LIU Xinbo, LIN Yaping, LI He, et al. A novel method for malware detection on ML-based visualization technique[J]. Computers & Security, 2020, 89: 101682. doi: 10.1016/j.cose.2019.101682

Skylight. Cylance, I kill you![EB/OL]. https://skylightcyber.com/2019/07/18/cylance-i-kill-you/, 2019.

MOHURLE S and PATIL M. A brief study of wannacry threat: Ransomware attack 2017[J]. International Journal of Advanced Research in Computer Science, 2017, 8(5): 1938–1940. doi: 10.26483/ijarcs.v8i5.4021

DANG Hung, HUANG Yue, and CHANG E C. Evading classifiers by morphing in the dark[C]. 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, USA, 2017: 119–133. doi: 10.1145/3133956.3133978.

QI Li. Windows PE: The Definitive Guide[M]. Beijing: Machinery Industry Press, 2011: 67–68.

KOZA J R. Genetic Programming II: Automatic Discovery of Reusable Subprograms[M]. Cambridge, MA, USA: MIT Press, 1994: 32.

Cuckoo Sandbox. Cuckoo Sandbox–Automated malware analysis[EB/OL]. http://www.cuckoosandbox.org, 2017.

BANON S. Elastic endpoint security[EB/OL]. https://www.elastic.co/cn/blog/introducing-elastic-endpoint-security, 2019.

Trapmine Inc. TRAPMINE integrates machine learning engine into VirusTotal[EB/OL]. https://trapmine.com/blog/trapmine-machine-learning-virustotal/, 2018.
