Journal of Information Security Research ›› 2025, Vol. 11 ›› Issue (7): 594-.

• Academic Paper •

Dual-Branch Malicious Code Homology Analysis Model Based on Feature Fusion

Liu Fengchun1,2,3,5,6, Zhang Zhifeng1, Xue Tao1,3,5, Yang Guanghui1,3,4,6, and Wei Qun1

    1(College of Science, North China University of Science and Technology, Tangshan, Hebei 063210)
    2(School of Light Industry, North China University of Science and Technology, Tangshan, Hebei 063210)
    3(Hebei Engineering Research Center of Iron Ore Optimization and Intelligent Iron Preprocess (North China University of Science and Technology), Tangshan, Hebei 063210)
    4(Hebei Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210)
    5(The Key Laboratory of Engineering Computing in Tangshan City (North China University of Science and Technology), Tangshan, Hebei 063210)
    6(Tangshan Intelligent Industry and Image Processing Technology Innovation Center (North China University of Science and Technology), Tangshan, Hebei 063210)
  • Online: 2025-07-29 Published: 2025-07-29
  • Corresponding author: Liu Fengchun, lnobliu@ncst.edu.cn
  • About the authors:
    Liu Fengchun, PhD, professor. Main research interests: network security, data mining, and machine learning. lnobliu@ncst.edu.cn
    Zhang Zhifeng, master's degree. Main research interest: malicious code classification. Zhangzf@stu.ncst.edu.cn
    Xue Tao, master's degree, associate professor. Main research interests: deep learning, multimedia technology, and machine learning. xuetao@ncst.edu.cn
    Yang Guanghui, PhD, lecturer. Main research interests: network and cyberspace security, data mining, and deep learning. yangguanghui@ncst.edu.cn
    Wei Qun, PhD, associate professor. Main research interests: databases and intelligent information retrieval. ts_weiqun@163.com

Abstract: In malicious code homology analysis, encryption, obfuscation, and packing produce large numbers of malicious code variants, which leaves deep learning models with insufficient ability to extract malicious code features. To address this problem, a dual-branch homology analysis model built from multi-branch convolution and a Transformer, MCATNet (multi-branch convolution and TransformerNet), is proposed. First, the MCATNet dual-branch network is constructed. One branch is a CNN branch built from multi-branch convolution (MBC) modules, with the CBAM hybrid attention mechanism introduced so that the network pays more attention to core features while still taking local features into account. The other branch is a Transformer module with ViT as its backbone, which extracts global feature information from malicious code images; a downsampling module is also proposed that finely preserves global features while aligning the Transformer and CNN feature maps at the same spatial scale. Second, a cascade strategy fuses the local features of the CNN branch with the global features of the Transformer branch, so that the network no longer focuses on only a single type of feature. Finally, a Softmax classifier performs homology analysis of malicious code families. Experimental results show that the dual-branch model based on feature fusion reaches a classification accuracy of 99.24%, which is 0.11% and 0.65% higher than that of the single-branch CNN and single-branch Transformer models, respectively.
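The fusion pipeline outlined in the abstract (align the two feature maps spatially, fuse them, then classify with Softmax) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the cascade fusion is read as channel-wise concatenation, average pooling stands in for the downsampling module (whose design the abstract does not describe), and the function names `avg_pool_to`, `fuse_by_concat`, `global_avg_pool`, and `classify` are hypothetical, as is the single linear layer before Softmax.

```python
import math
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def avg_pool_to(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Average-pool every channel down to (out_h, out_w).

    Assumption: a stand-in for the paper's downsampling module.
    The input size must be an exact multiple of the output size.
    """
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("input size must be a multiple of the output size")
    kh, kw = in_h // out_h, in_w // out_w
    pooled = []
    for channel in fmap:
        rows = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                window = [channel[i * kh + di][j * kw + dj]
                          for di in range(kh) for dj in range(kw)]
                row.append(sum(window) / len(window))
            rows.append(row)
        pooled.append(rows)
    return pooled


def fuse_by_concat(local: FeatureMap, global_: FeatureMap) -> FeatureMap:
    """Concatenate two spatially aligned feature maps along the channel axis."""
    local_size = (len(local[0]), len(local[0][0]))
    global_size = (len(global_[0]), len(global_[0][0]))
    if local_size != global_size:
        raise ValueError("feature maps must share the same spatial size")
    return local + global_


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel to its mean, giving one value per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(local: FeatureMap, global_: FeatureMap,
             weights: List[List[float]], bias: List[float]) -> List[float]:
    """Align, fuse, pool, apply one linear layer, and return class probabilities."""
    aligned = avg_pool_to(global_, len(local[0]), len(local[0][0]))
    fused = global_avg_pool(fuse_by_concat(local, aligned))
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy inputs: a 1-channel 2x2 "CNN" map and a 1-channel 4x4 "Transformer" map.
    cnn_feat = [[[1.0, 2.0], [3.0, 4.0]]]
    vit_feat = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
    # Two classes, two fused channels; weights are arbitrary illustrative values.
    print(classify(cnn_feat, vit_feat, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

Concatenation keeps the local and global channels side by side, so the classifier can weight each branch's features independently. In a real model the toy lists would be replaced by tensors from a deep learning framework and the linear layer by trained parameters.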

Key words: doublebranched, feature fusion, multibranch convolution, attention mechanism, downsampling

CLC Number: