Malware Family Classification Based on Attributed Dataflow Graph

信息安全研究 (Journal of Information Security Research), 2020, Vol. 6, Issue 3: 226-234.

YANG Pin¹, ZHU Yue¹, ZHANG Lei²

1. College of Cybersecurity, Sichuan University
2. Technology Department, Wuhan Deepin Technology Co., Ltd.

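Received: 2020-03-02  Online: 2020-03-10  Published: 2020-03-02
Corresponding author: YANG Pin
About the authors:
YANG Pin, PhD, professor; main research interests: public opinion analysis and software security. yangpin@scu.edu.cn
ZHU Yue, master's student; main research interests: malware detection and analysis. zhuyue.y@qq.com
ZHANG Lei, PhD, assistant research fellow; main research interests: malware detection and analysis. zhanglei2018@scu.edu.cn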

Abstract

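The large number of newly emerging malware samples poses a serious threat to cybersecurity, and a large proportion of them are variants derived from existing malware. Classifying malware into families helps analysts study how malware families evolve and trace cybercrime groups. This paper proposes a malware family classification method based on attributed dataflow graphs and graph convolutional networks. First, each malware sample is run in a sandbox to obtain its API call sequence; next, the API call sequence is abstracted into data-flow events, from which an attributed dataflow graph is built; then, an improved graph convolutional network is trained on the attributed dataflow graphs; finally, the trained network classifies malware into families. Experimental results show that the proposed method reaches a classification accuracy of 96.79%, outperforming a method based on API call graphs.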

Key words: malware, classification, graph convolutional networks, dynamic analysis, dataflow graph
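The pipeline summarized in the abstract (API calls, then data-flow events, then an attributed graph, then a graph convolutional network) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the record format `(pid, api, resource_type, resource_name)`, the API-to-flow table `FLOW_DIRECTION`, the entity types in `ENTITY_TYPES` and all function names are hypothetical. The graph layer is the standard GCN propagation of Kipf and Welling, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), not the paper's improved variant; it ignores edge direction and API edge labels, and the weights are passed in by hand rather than learned.

```python
import math

# Hypothetical API-to-flow table (an illustrative assumption, not the
# paper's abstraction rules): "in" means data moves from the resource into
# the calling process, "out" means it moves from the process to the resource.
FLOW_DIRECTION = {
    "ReadFile": "in",
    "RegQueryValueExW": "in",
    "recv": "in",
    "WriteFile": "out",
    "RegSetValueExW": "out",
    "send": "out",
}

# Hypothetical entity types, encoded as one-hot node attributes.
ENTITY_TYPES = ["process", "file", "registry", "socket"]


def build_dataflow_graph(calls):
    """Turn (pid, api, resource_type, resource_name) records into an
    attributed directed graph: a list of node attribute vectors and a
    list of (src, dst, api) edges."""
    node_index = {}
    node_attrs = []
    edges = []

    def node(kind, name):
        key = (kind, name)
        if key not in node_index:
            node_index[key] = len(node_attrs)
            node_attrs.append([1.0 if t == kind else 0.0 for t in ENTITY_TYPES])
        return node_index[key]

    for pid, api, res_type, res_name in calls:
        direction = FLOW_DIRECTION.get(api)
        if direction is None:
            continue  # calls that move no data are dropped in this sketch
        proc = node("process", pid)
        res = node(res_type, res_name)
        src, dst = (res, proc) if direction == "in" else (proc, res)
        edges.append((src, dst, api))
    return node_attrs, edges


def gcn_layer(x, edges, weight):
    """Standard GCN layer: ReLU(D^-1/2 (A+I) D^-1/2 X W).
    Edge direction and API labels are ignored here for simplicity."""
    n = len(x)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loops: A + I
    for src, dst, _api in edges:
        adj[src][dst] = adj[dst][src] = 1.0
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        # Symmetrically normalized sum of neighbour (and own) features.
        agg = [0.0] * len(x[0])
        for j in range(n):
            if adj[i][j]:
                c = adj[i][j] / math.sqrt(deg[i] * deg[j])
                for k, v in enumerate(x[j]):
                    agg[k] += c * v
        # Linear transform followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
            for m in range(len(weight[0]))
        ])
    return out


def classify(x, edges, w_gcn, w_out):
    """One GCN layer, mean-pool readout, linear scores, argmax family index."""
    h = gcn_layer(x, edges, w_gcn)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    scores = [
        sum(p * w_out[k][c] for k, p in enumerate(pooled))
        for c in range(len(w_out[0]))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, a trace in which one process calls `ReadFile` on a file, `send` on a socket and `RegSetValueExW` on a registry key yields four nodes (the process plus three resources) and three edges, while a call such as `Sleep`, which moves no data, is skipped. In a real system `w_gcn` and `w_out` would be learned from labeled samples and several layers would typically be stacked.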

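Cite this article: YANG Pin, ZHU Yue, ZHANG Lei. Malware family classification based on attributed dataflow graph[J]. Journal of Information Security Research, 2020, 6(3): 226-234.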
