Journal of Information Security Research ›› 2018, Vol. 4 ›› Issue (4): 322-328.

A Static Tagging Method of Malicious Code Family Based on Multi-Feature


  • Received:2018-04-18 Online:2018-04-15 Published:2018-04-20


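Liu Liang (刘亮)1, Liu Luping (刘露平)2, He Shuai (何帅)3, Liu Jiayong (刘嘉勇)4

  1. School of Cyberspace, Sichuan University
  2. College of Electronics and Information, Sichuan University
  3. Sichuan University
  4. School of Cyberspace Security, Sichuan University
  • Corresponding author: Liu Liang
  • About the authors: Liu Liang, M.Eng., engineer; main research interests: information system security and malicious code detection. Liu Luping, master's degree, Ph.D. candidate; main research interests: software and system security, binary program analysis, and vulnerability discovery. He Shuai, master's student; main research interests: intrusion detection and driver development. Liu Jiayong, Ph.D., professor; main research interests: network information security, network information processing, and big data analysis.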

Abstract: This paper describes a multi-feature method for statically tagging malicious code families. To overcome the limitation of existing techniques that extract features from a single source, the method uses malicious code visualization to render each sample as an image and extracts features from multiple sources (image and text) and at multiple levels (byte code and operation code). To make better use of these multi-level features, the paper designs a three-layer multi-classifier joint framework for feature learning, consisting of a feature combination layer, a classification layer, and a union layer. The learned model can then tag malicious code automatically. To verify the method, we ran family-tagging experiments on the 9 malicious code families in Microsoft's data set. The results show that accuracy, precision, recall, and F1-score all exceed 90% for every family except Simda. The experiments demonstrate the validity and reliability of the method.

Key words: malicious code family, malicious code image, machine learning, multi-feature, multi-classifier joint framework
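
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the image layout, the concrete features, the classifiers, or the rule used in the union layer. The row width, the byte histogram, the opcode n-grams, the majority-vote union rule, and all function names below are assumptions chosen for illustration only.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Lay the bytes out row by row as grayscale pixels (0-255).

    The last row is zero-padded to the chosen width (an assumed convention).
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def byte_histogram(data: bytes) -> list[float]:
    """Normalized 256-bin byte frequency vector (an assumed byte-level feature)."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def opcode_ngrams(opcodes: list[str], n: int = 2) -> Counter:
    """Count opcode n-grams (an assumed opcode-level feature)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def majority_vote(predictions: list[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def joint_predict(
    feature_groups: dict[str, list[float]],
    combos: list[tuple[str, ...]],
    classifiers: list[Callable[[list[float]], str]],
) -> str:
    """Hypothetical three-layer joint prediction for one sample."""
    votes = []
    for names, classify in zip(combos, classifiers):
        # Feature combination layer: concatenate the selected feature groups.
        vector = [x for name in names for x in feature_groups[name]]
        # Classification layer: one classifier per feature combination.
        votes.append(classify(vector))
    # Union layer: merge the per-classifier labels (majority vote assumed here).
    return majority_vote(votes)
```

In this sketch each classifier is any callable that maps a feature vector to a family label, so trained models from a machine learning library could be plugged in without changing the layer structure.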
