Journal of Information Security Research (信息安全研究), 2023, Vol. 9, Issue (8): 762-.

• Academic Paper

Analysis on the Homology of Malware Families Based on Open-Set Recognition

刘亚倩 (Liu Yaqian)

  1. (Beijing Topsec Network Security Technology Co., Ltd., Beijing 100085)
  • Online: 2023-08-01  Published: 2023-09-05
  • Corresponding author and author profile: Liu Yaqian, M.S.; main research interests: machine learning and cybersecurity. Email: 2395744091@qq.com

Analysis on the Homology of Malware Families Based on  Openset Recognition

  • Online:2023-08-01 Published:2023-09-05

Abstract: At present, homology analysis of malware families mostly focuses on the closed-set problem; that is, it assumes that every sample under test belongs to one of the known classes. However, the open world contains many malware families, and unknown classes usually account for the majority, so closed-set recognition cannot accurately identify malware families in an open world. To address this problem, this paper proposes a homology analysis method for malware families based on open-set recognition. Malware executable files are converted into grayscale images using an N-gram sliding window and the Doc2vec sentence-embedding method; features of the grayscale images are extracted with the convolutional neural network model MobileNet; and the Open Long-Tailed Recognition (OLTR) model is used to perform open-set recognition of malware families. In experiments on 9 known and 9 unknown malware families, the results show that the proposed method can identify malware families of unknown classes while maintaining high accuracy on both known and unknown families.
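The abstract names the pipeline stages but not their parameters, so the Python sketch below only illustrates the idea of three of them under explicit assumptions. `sliding_ngrams` tokenizes raw bytes with an N-gram sliding window (byte granularity, n=4 and stride 1 are assumptions; the paper may tokenize differently). `to_grayscale` is a hypothetical min-max mapping from a matrix of embedding values, such as stacked Doc2vec vectors, to 0..255 pixels; the paper's actual vector-to-image layout is not described here. `family_centroids` and `open_set_predict` are a deliberately simplified stand-in for OLTR, not OLTR itself: a sample is assigned to the nearest known-family centroid, or rejected as "unknown" when even that centroid lies farther than a threshold. Doc2vec and MobileNet are not reproduced, and every function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helper: byte-level N-gram tokenization with a sliding window.
# Byte granularity, n=4 and stride=1 are assumptions, not the paper's settings.
def sliding_ngrams(data: bytes, n: int = 4, stride: int = 1) -> List[str]:
    """Emit every n-byte window of `data` as a lowercase hex token."""
    if n <= 0 or stride <= 0:
        raise ValueError("n and stride must be positive")
    return [data[i:i + n].hex() for i in range(0, len(data) - n + 1, stride)]


# Hypothetical helper: min-max scale a matrix of embedding values (for example,
# one Doc2vec vector per row) into 0..255 grayscale pixel intensities.
def to_grayscale(rows: Sequence[Sequence[float]]) -> List[List[int]]:
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # constant input maps to all zeros instead of dividing by 0
    return [[round(255 * (v - lo) / span) for v in row] for row in rows]


# Simplified stand-in for OLTR (not the OLTR model itself): the mean feature
# vector of each known family serves as that family's centroid.
def family_centroids(features: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    return {
        family: [sum(col) / len(vecs) for col in zip(*vecs)]
        for family, vecs in features.items()
    }


def open_set_predict(x: Sequence[float], centroids: Dict[str, List[float]],
                     threshold: float) -> str:
    """Return the nearest known family, or 'unknown' if it is farther than threshold."""
    family, dist = min(
        ((name, math.dist(x, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return family if dist <= threshold else "unknown"


if __name__ == "__main__":
    print(sliding_ngrams(b"\x4d\x5a\x90\x00\x03", n=2))  # ['4d5a', '5a90', '9000', '0003']
    cents = family_centroids({"FamA": [[0, 0], [2, 0]], "FamB": [[10, 10], [12, 10]]})
    print(open_set_predict([1.5, 0.5], cents, threshold=2.0))  # FamA
    print(open_set_predict([6.0, 5.0], cents, threshold=2.0))  # unknown
```

The rejection rule loosely echoes the reachability idea in OLTR (distance from a sample's feature to the nearest class centroid), but OLTR learns its embeddings and centroids end to end, whereas here the centroids are plain means and the threshold is a fixed hyperparameter that would have to be chosen on validation data.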