Journal of Information Security Research ›› 2019, Vol. 5 ›› Issue (2): 105-113.

• Academic Paper •

Malware Detection Based on Frequent Subgraph Mining of Family Behavior

Zhu Xuebing1, Zhou Anmin2, Zuo Zheng3

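  1. Sichuan University
  2. College of Electronics and Information Engineering, Sichuan University
  3. Institute of Information Security, Sichuan University
  • Received: 2019-02-14 Online: 2019-02-15 Published: 2019-02-14
  • Corresponding author: Zhu Xuebing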

Malware Detection Based on Family Behavior Frequent Subgraph Mining

  • Received: 2019-02-14 Online: 2019-02-15 Published: 2019-02-14

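Abstract: Graph-based malware detection methods must build a behavior dependency graph for every known malware sample. Traditional detection methods based on dynamic taint analysis therefore produce a huge number of behavior dependency graphs, matching against them is time-consuming, and they are hard to use in practice. To address this problem, a malware detection method based on frequent subgraph mining of malware family behavior is proposed. First, dynamic taint analysis is used to taint the parameters of system call APIs (application program interfaces), and the system API call relationships are obtained by tracing the propagation of the tainted data. Second, dynamic taint analysis is used to generate the behavior dependency graph of each individual sample. Then, a frequent subgraph mining method is used to mine the frequent behavior subgraphs of each malware family. Finally, the family frequent behavior subgraphs serve as family behavior features, and a classifier is built with the random forest algorithm to detect malware. Compared with traditional detection methods based on API sequences and methods based on a single malware behavior dependency graph, the proposed method is not affected by code obfuscation, greatly reduces the number of behavior dependency graphs without losing malicious behavior features, and improves both the efficiency of malware detection and the classification accuracy.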

Keywords: malware detection, behavior dependency graph, dynamic taint analysis, frequent subgraph, classification
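
The step that turns tainted API parameters into a behavior dependency graph can be illustrated with a minimal sketch. It is not the authors' implementation: real dynamic taint analysis tracks data at the instruction and byte level, whereas this sketch assumes a pre-recorded trace in which each call lists the values it consumes and produces. The function name `build_dependency_graph`, the trace format, and the sample trace are all hypothetical.

```python
def build_dependency_graph(trace):
    """Build API dependency edges from a simplified call trace.

    trace: list of (api_name, input_values, output_values) in call order.
    Every output value is tagged (tainted) with the API that produced it.
    When a later call consumes a tagged value, an edge
    producer -> consumer is added to the graph.
    """
    taint_source = {}  # value -> name of the API that last produced it
    edges = set()
    for api, inputs, outputs in trace:
        for value in inputs:
            if value in taint_source:
                edges.add((taint_source[value], api))
        for value in outputs:
            taint_source[value] = api
    return edges


# Hypothetical trace: a file is read and its contents are sent over a socket.
trace = [
    ("CreateFileW", [], ["h1"]),
    ("ReadFile", ["h1"], ["buf1"]),
    ("socket", [], ["s1"]),
    ("send", ["s1", "buf1"], []),
    ("CloseHandle", ["h1"], []),
]
graph = build_dependency_graph(trace)
```

Because edges follow data flow rather than call order, inserting unrelated calls between `ReadFile` and `send` would leave the graph unchanged, which is the intuition behind the claimed robustness to obfuscation.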

Abstract: Graph-based malware detection methods must build a behavior dependency graph for each known malware sample, so the number of behavior graphs is huge and matching is time-consuming, which makes these methods difficult to apply in practice. To solve this issue, we propose a malware detection method based on frequent subgraph mining of malware family behavior. First, we use dynamic taint analysis to mark system call API parameters with taint tags, and we obtain the system API call relationships by tracing the propagation of the tainted data. Second, we generate the behavior dependency graph of each sample. Third, we mine the frequent behavior subgraphs of each malware family and use them to represent the family's behavioral features. Finally, we build a random forest classifier on these features to detect malware. Compared with traditional detection methods based on API call sequences or on single malware behavior dependency graphs, our method is not affected by code obfuscation, reduces the number of behavior dependency graphs without losing malicious behavior features, and improves both detection efficiency and classification accuracy.

Key words: malware detection, behavior graphs, dynamic taint analysis, frequent subgraphs, classification
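
The idea of replacing many per-sample graphs with a few family-level frequent subgraphs can be sketched as follows. The abstract does not name the mining algorithm, so this sketch swaps in a simplified level-wise (Apriori-style) search over connected edge sets rather than a full frequent subgraph miner such as gSpan. It assumes every node is identified by a unique API name, so that subgraph containment reduces to set inclusion. All function names, the `max_edges` limit, and the sample graphs are hypothetical.

```python
from itertools import chain


def support(pattern, graphs):
    """Count how many graphs contain every edge of the pattern."""
    return sum(1 for g in graphs if pattern <= g)


def mine_frequent_subgraphs(graphs, min_support, max_edges=3):
    """Level-wise search for connected edge sets found in at least
    min_support (a fraction) of the family's graphs."""
    threshold = min_support * len(graphs)
    all_edges = set(chain.from_iterable(graphs))
    frequent_edges = {
        e for e in all_edges if support(frozenset([e]), graphs) >= threshold
    }
    level = {frozenset([e]) for e in frequent_edges}
    result = set(level)
    for _ in range(max_edges - 1):
        next_level = set()
        for pattern in level:
            nodes = {node for edge in pattern for node in edge}
            for e in frequent_edges - pattern:
                # Grow only by edges that touch the pattern (keeps it connected).
                if e[0] in nodes or e[1] in nodes:
                    candidate = pattern | {e}
                    if support(candidate, graphs) >= threshold:
                        next_level.add(candidate)
        result |= next_level
        level = next_level
    return result


def to_feature_vector(graph, patterns):
    """One binary feature per family pattern: 1 if the graph contains it."""
    return [1 if p <= graph else 0 for p in patterns]


# Hypothetical behavior dependency graphs for three samples of one family.
g1 = {("CreateFileW", "ReadFile"), ("ReadFile", "send"), ("socket", "send")}
g2 = {("CreateFileW", "ReadFile"), ("ReadFile", "send"),
      ("RegOpenKeyExW", "RegSetValueExW")}
g3 = {("CreateFileW", "ReadFile"), ("ReadFile", "send"), ("socket", "send")}

patterns = sorted(
    mine_frequent_subgraphs([g1, g2, g3], min_support=1.0),
    key=lambda p: (len(p), sorted(p)),
)
benign = {("CreateFileW", "ReadFile")}
malicious_features = to_feature_vector(g1, patterns)
benign_features = to_feature_vector(benign, patterns)
```

Here three sample graphs collapse into three shared patterns, and each sample becomes a short binary vector. Vectors of this kind could then be passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`, not shown here) for the final detection step.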