Journal of Information Security Research (信息安全研究) ›› 2025, Vol. 11 ›› Issue (1): 43-.

• Academic Papers •

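A Binary Modularization Approach Based on Graph Community Detection Method

Liu Xinpeng (刘新鹏), Fu Qiang (傅强), Zhang Hongbao (张红宝), Chen Xiaoguang (陈晓光), and Yang Manzhi (杨满智)

  1. (Eversec Technology Co., Ltd., Beijing 100080)
  • Online: 2025-01-24 Published: 2025-01-24
  • Corresponding author: Liu Xinpeng, liuxinpeng@eversec.cn
  • About the authors:
    Liu Xinpeng, master's postgraduate, senior engineer. Main research interests: network attack and defense, software security, malicious traffic analysis, attack source tracing, and incident response. liuxinpeng@eversec.cn
    Fu Qiang. Main research interests: network security, traffic analysis, reverse engineering, and cryptographic algorithms. fuqiang@eversec.cn
    Zhang Hongbao, master's degree. Main research interests: AI and network security. zhanghongbao@eversec.cn
    Chen Xiaoguang, doctoral postgraduate, general manager of Eversec. Main research interests: network security and artificial intelligence. chenxiaoguang@eversec.cn
    Yang Manzhi, doctoral postgraduate. Main research interests: network security, anti-fraud, and AI-based decision-making in related fields. yangmanzhi@eversec.cn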

Abstract: As information technology develops, software keeps growing in scale. Complex, large-scale software is built by combining components, each implementing an independent functional module. Once source code is compiled into a binary file, however, this modular information is lost; the goal of binary modularization is to reconstruct it. Binary modularization has many downstream applications, such as binary code reuse detection, binary similarity detection, and binary software composition analysis. We propose a new graph community detection algorithm and design a binary modularization method based on it. The method's effectiveness is verified by modularizing 7839 binary files from Linux systems. Experiments show that the method achieves a Normalized Turbo MQ of 0.557, a 58.6% improvement over the existing state-of-the-art method, and that its running time is far lower than that of existing methods. We also propose a library-granularity binary modularization method: whereas existing binary modularization methods can only decompose a binary into several modules, the library-granularity method decomposes a binary into several libraries. We demonstrate the application of this method to the family classification of cryptomining malware.

Key words: software security, binary analysis, software modularization, graph neural network, community detection
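
The abstract reports a Normalized Turbo MQ score but does not define the metric. As a rough illustration only, the sketch below computes TurboMQ in its commonly cited form from the software clustering literature: each cluster i gets a cluster factor CF_i = 2μ_i / (2μ_i + ε_i), where μ_i counts edges inside the cluster and ε_i counts edges crossing its boundary in either direction, and TurboMQ is the sum of the cluster factors. Dividing by the number of clusters to normalize is an assumption made here, not something the abstract states, and the function names, toy call graph, and module assignment are all hypothetical.

```python
from collections import defaultdict


def turbo_mq(edges, cluster_of):
    """Sum of cluster factors for a partition of a directed graph.

    edges: iterable of (caller, callee) pairs, each with unit weight.
    cluster_of: dict mapping every node to its cluster id.
    """
    intra = defaultdict(int)  # mu_i: edges with both ends in cluster i
    inter = defaultdict(int)  # eps_i: edges entering or leaving cluster i
    for src, dst in edges:
        a, b = cluster_of[src], cluster_of[dst]
        if a == b:
            intra[a] += 1
        else:
            inter[a] += 1
            inter[b] += 1

    total = 0.0
    for c in set(cluster_of.values()):
        mu = intra[c]
        if mu > 0:  # a cluster without internal edges contributes 0
            total += 2 * mu / (2 * mu + inter[c])
    return total


def normalized_turbo_mq(edges, cluster_of):
    """TurboMQ divided by the cluster count (assumed normalization)."""
    k = len(set(cluster_of.values()))
    return turbo_mq(edges, cluster_of) / k if k else 0.0


# Hypothetical call graph: functions a, b, c in module 0; d, e in module 1.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("c", "d")]
CLUSTERS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

if __name__ == "__main__":
    print(turbo_mq(EDGES, CLUSTERS))
    print(normalized_turbo_mq(EDGES, CLUSTERS))
```

In this toy partition, module 0 has three internal edges and one crossing edge (CF = 6/7), and module 1 has one internal edge and one crossing edge (CF = 2/3), so TurboMQ is 32/21 and the assumed normalized value is 16/21. Under this normalization the score lies in [0, 1], and higher values indicate modules that are cohesive internally and loosely coupled to each other.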
