Malware Detection and Classification Based on GHM Visualization and Deep Learning

Journal of Information Security Research, 2024, Vol. 10, Issue 3: 216-.

Zhang Shuhui1,2,3,4, Hu Changdong1, Wang Lianhai1,2,3,4, Xu Shujiang1,2,3,4, Shao Wei1,2,3,4, and Lan Tian1

1(Qilu University of Technology (Shandong Academy of Sciences) Shandong Computing Center (National Supercomputing Jinan Center), Jinan 250014)
2(Key Laboratory of Computing Power Network and Information Security, Ministry of Education (Qilu University of Technology (Shandong Academy of Sciences)), Jinan 250014)
3(Shandong Provincial Key Laboratory of Computer Networks (Shandong Computing Center (National Supercomputing Jinan Center)), Jinan 250014)
4(Shandong Fundamental Research Center for Computer Science (Qilu University of Technology (Shandong Academy of Sciences)), Jinan 250014)

Publication date: 2024-03-23  Release date: 2024-03-08
Corresponding author: Hu Changdong, master's student. Main research interest: malware detection. 10431210649@stu.qlu.edu.cn
About the authors:
Zhang Shuhui, Ph.D., research professor. Main research interests: malware detection and blockchain. zhangshh@sdas.org
Hu Changdong, master's student. Main research interest: malware detection. 10431210649@stu.qlu.edu.cn
Wang Lianhai, Ph.D., research professor. Main research interests: digital forensics and blockchain. wanglh@sdas.org
Xu Shujiang, Ph.D., research professor. Main research interest: blockchain. xushuj@sdas.org
Shao Wei, Ph.D. Main research interest: blockchain. shaow@sdas.org
Lan Tian, M.S. Main research interest: malware detection. 10431200585@stu.qlu.edu.cn

Abstract

Malware detection is becoming increasingly challenging as malicious code grows more complex and variable. Most mutated or previously unknown malicious programs are created by modifying or obfuscating the logic of existing malicious code. Identifying the malware family a sample belongs to and determining its malicious behavior is therefore increasingly important. This paper proposes a novel malware visualization method based on GHM (Gray, HOG, Markov) for data preprocessing. Unlike traditional visualization methods, it extracts more effective data features during visualization through HOG (histogram of oriented gradients) and Markov analysis, and combines them into a three-channel color image. In addition, a classification model named VLMal, built on a CNN and an LSTM, is constructed to detect and classify malware from the visualized images. Experimental results show that the method detects and classifies malicious code effectively, with good accuracy and stability.

Key words: malware detection, deep learning, malware classification, memory forensics, visualization
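The abstract names three ingredients of a GHM image: a gray-scale byte image, HOG features, and a Markov component. These are stacked into a three-channel color image. The sketch below shows one plausible way to build such an image with NumPy alone. It is not the authors' implementation, and several details are assumptions made for illustration:

- the 256x256 output size;
- the fixed row width with nearest-neighbour row resampling;
- modelling the Markov channel as a byte-transition probability matrix;
- the simplified HOG-style channel, which takes the dominant orientation bin per 8x8 cell and omits HOG block normalization.

The function names `gray_channel`, `hog_channel`, `markov_channel` and `ghm_image` are hypothetical.

```python
import numpy as np

SIDE = 256  # assumed output size; matches the 256x256 Markov matrix so no resizing is needed


def gray_channel(data: bytes, side: int = SIDE) -> np.ndarray:
    """Byte-to-gray image: fixed row width, then nearest-neighbour row resampling to side x side."""
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = max(1, -(-arr.size // side))  # ceiling division; at least one row
    padded = np.zeros(rows * side, dtype=np.uint8)
    padded[: arr.size] = arr  # zero-pad the last row
    img = padded.reshape(rows, side)
    row_idx = (np.arange(side) * rows) // side  # pick source rows evenly
    return img[row_idx]


def hog_channel(gray: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG-style map (assumption): per cell, summed magnitude of the dominant orientation bin."""
    g = gray.astype(np.float64)
    gy, gx = np.gradient(g)  # derivatives along rows (axis 0) and columns (axis 1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(np.intp), bins - 1)
    out = np.zeros_like(g)
    h, w = g.shape
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            hist = np.bincount(
                bin_idx[r : r + cell, c : c + cell].ravel(),
                weights=mag[r : r + cell, c : c + cell].ravel(),
                minlength=bins,
            )
            out[r : r + cell, c : c + cell] = hist.max()
    peak = out.max()
    if peak > 0:
        out = out / peak * 255.0  # scale to the 0-255 pixel range
    return out.round().astype(np.uint8)


def markov_channel(data: bytes) -> np.ndarray:
    """Assumed Markov channel: P[i, j] = P(next byte = j | byte = i), scaled to 0-255."""
    arr = np.frombuffer(data, dtype=np.uint8).astype(np.intp)
    counts = np.zeros((256, 256), dtype=np.float64)
    if arr.size > 1:
        np.add.at(counts, (arr[:-1], arr[1:]), 1.0)  # count consecutive byte pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return (probs * 255.0).round().astype(np.uint8)


def ghm_image(data: bytes) -> np.ndarray:
    """Stack Gray, HOG-style and Markov channels into one SIDE x SIDE x 3 uint8 image."""
    gray = gray_channel(data)
    return np.stack([gray, hog_channel(gray), markov_channel(data)], axis=-1)


if __name__ == "__main__":
    sample = bytes(range(256)) * 64  # synthetic bytes standing in for a real sample
    img = ghm_image(sample)
    print(img.shape, img.dtype)  # (256, 256, 3) uint8
```

Fixing the size at 256 keeps the Markov matrix at its natural resolution. A real pipeline would more likely resize every channel to whatever input size the downstream CNN+LSTM classifier expects.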

CLC number: TP393.08

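Zhang Shuhui, Hu Changdong, Wang Lianhai, Xu Shujiang, Shao Wei, Lan Tian. Malware detection and classification based on GHM visualization and deep learning[J]. Journal of Information Security Research, 2024, 10(3): 216-.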
