Malware Detection Based on Frequent Subgraph Mining of Malware Family Behavior

Abstract

Graph-based malware detection methods must build a behavior dependency graph for every known malware sample. In traditional detection methods based on dynamic taint analysis, the number of behavior dependency graphs is huge and matching is time-consuming, which makes these methods difficult to apply in practice. To address this problem, we propose a malware detection method based on frequent subgraph mining of malware family behavior. First, we use dynamic taint analysis to mark the parameters of system-call APIs (application program interfaces) with taint tags, and obtain the system API call relationships by tracing the propagation of the tainted data. Second, we generate the behavior dependency graph of each individual sample from these relationships. Third, we use a frequent subgraph mining method to extract the frequent behavior subgraphs of each malware family. Finally, we take the family frequent behavior subgraphs as family behavioral features and build a random forest classifier on them to detect malware. Compared with traditional methods based on API call sequences and methods based solely on per-sample behavior dependency graphs, the proposed method is not affected by code obfuscation, substantially reduces the number of behavior dependency graphs without losing malicious behavior features, and improves both detection efficiency and classification accuracy.

Key words: malware detection, behavior dependency graph, dynamic taint analysis, frequent subgraph, classification
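The mining and feature-extraction steps summarized in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes each API name appears as a single node, so that a subgraph match reduces to an edge-subset test, and it enumerates only small connected edge sets instead of running a full frequent subgraph miner. The function names (`build_dependency_graph`, `mine_frequent_subgraphs`, `to_feature_vector`), the `max_edges` limit, and the sample traces are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_dependency_graph(taint_flows):
    """Model one sample's behavior dependency graph as a set of directed edges.

    taint_flows: iterable of (source_api, sink_api) pairs, meaning the sink
    consumed tainted data produced by the source (assumed trace format).
    """
    return frozenset((src, dst) for src, dst in taint_flows if src != dst)


def is_connected(edges):
    """Return True if the edges form a single weakly connected component."""
    edges = list(edges)
    if not edges:
        return False
    seen = set(edges[0])
    remaining = edges[1:]
    changed = True
    while remaining and changed:
        changed = False
        for edge in list(remaining):
            if edge[0] in seen or edge[1] in seen:
                seen.update(edge)
                remaining.remove(edge)
                changed = True
    return not remaining


def mine_frequent_subgraphs(graphs, min_support, max_edges=2):
    """Return connected edge sets contained in at least min_support of graphs."""
    total = len(graphs)
    edge_counts = Counter(edge for graph in graphs for edge in graph)
    frequent_edges = sorted(
        edge for edge, count in edge_counts.items() if count / total >= min_support
    )
    patterns = []
    for size in range(1, max_edges + 1):
        for combo in combinations(frequent_edges, size):
            if not is_connected(combo):
                continue
            support = sum(1 for graph in graphs if graph.issuperset(combo))
            if support / total >= min_support:
                patterns.append(frozenset(combo))
    return patterns


def to_feature_vector(graph, patterns):
    """Binary vector: 1 where the sample's graph contains the family pattern."""
    return [1 if graph.issuperset(pattern) else 0 for pattern in patterns]


# Hypothetical traces for three samples of one family.
family = [
    build_dependency_graph([("NtCreateFile", "NtWriteFile"),
                            ("NtWriteFile", "NtClose"),
                            ("NtOpenKey", "NtSetValueKey")]),
    build_dependency_graph([("NtCreateFile", "NtWriteFile"),
                            ("NtWriteFile", "NtClose")]),
    build_dependency_graph([("NtCreateFile", "NtWriteFile"),
                            ("NtOpenKey", "NtSetValueKey")]),
]
patterns = mine_frequent_subgraphs(family, min_support=0.6)

unknown = build_dependency_graph([("NtCreateFile", "NtWriteFile"),
                                  ("NtWriteFile", "NtClose")])
features = to_feature_vector(unknown, patterns)
```

Under these assumptions, a vector such as `features` is the kind of input a random forest classifier would be trained on, with one column per mined family pattern. Graphs in which the same API occurs as several distinct nodes would need a real subgraph isomorphism test in place of the subset check.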

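Zhu Xuebing, Zhou Anmin, Zuo Zheng. Malware Detection Based on Frequent Subgraph Mining of Malware Family Behavior[J]. Journal of Information Security Research (信息安全研究), 2019, 5(2): 105-113.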
