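A Binary Modularization Approach Based on Graph Community Detection Method

Journal of Information Security Research (信息安全研究), 2025, Vol. 11, Issue (1): 43-.

Liu Xinpeng, Fu Qiang, Zhang Hongbao, Chen Xiaoguang, Yang Manzhi

(Eversec Technology Co., Ltd., Beijing 100080)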

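Online: 2025-01-24 Published: 2025-01-24
Corresponding author: Liu Xinpeng, liuxinpeng@eversec.cn
About the authors:
Liu Xinpeng, master's postgraduate, senior engineer. Main research interests: network attack and defense, software security, malicious traffic analysis, attack tracing, and incident response. liuxinpeng@eversec.cn
Fu Qiang. Main research interests: network security, traffic analysis, reverse engineering, and cryptographic algorithms. fuqiang@eversec.cn
Zhang Hongbao, master's degree. Main research interests: AI and network security. zhanghongbao@eversec.cn
Chen Xiaoguang, doctoral postgraduate, general manager of Eversec. Main research interests: network security and artificial intelligence. chenxiaoguang@eversec.cn
Yang Manzhi, doctoral postgraduate. Main research interests: network security, anti-fraud, and AI-based decision-making in related fields. yangmanzhi@eversec.cn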


Abstract

Abstract: As information technology develops, software keeps growing in scale. Complex, large-scale software is built by combining components that each implement an independent functional module. However, once source code is compiled into a binary, this modular information is lost; the goal of binary modularization is to reconstruct it. Binary modularization has many downstream applications, such as detecting binary code reuse, binary similarity detection, and binary software composition analysis. We propose a new graph community detection algorithm and design a binary modularization method based on it. The method's effectiveness is verified by modularizing 7839 binary files from Linux systems. Experiments show that the method achieves a Normalized Turbo MQ of 0.557, a 58.6% improvement over the existing state-of-the-art method, with a running time far lower than that of existing methods. We also propose a library-granularity binary modularization method: whereas existing binary modularization methods can only decompose a binary into modules, the library-granularity method decomposes a binary into libraries. We demonstrate the application of this method to family classification of cryptomining malware.

Key words: software security, binary analysis, software modularization, graph neural network, community detection
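The abstract reports a Normalized Turbo MQ score but does not define it. As a rough illustration only, the sketch below uses the TurboMQ formulation common in the software clustering literature: each module i gets a cluster factor 2μ_i / (2μ_i + Σ_j(ε_ij + ε_ji)), where μ_i is the total weight of edges inside the module and ε is the weight of edges crossing its boundary. Dividing the sum of cluster factors by the number of modules is an assumption about how the normalization is done, and the function name, parameter names, and toy call graph are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def normalized_turbo_mq(edges, module_of):
    """Score a partition of a weighted call graph (higher is more modular).

    edges: iterable of (caller, callee, weight) tuples.
    module_of: dict mapping each function name to its module id.
    Assumption: normalization divides the TurboMQ sum by the module count.
    """
    intra = defaultdict(float)  # mu_i: weight of edges inside module i
    inter = defaultdict(float)  # weight of edges entering or leaving module i
    for src, dst, weight in edges:
        a, b = module_of[src], module_of[dst]
        if a == b:
            intra[a] += weight
        else:
            inter[a] += weight
            inter[b] += weight

    modules = set(module_of.values())
    total = 0.0
    for m in modules:
        mu = intra[m]
        if mu > 0:  # a module with no internal edges contributes 0
            total += 2 * mu / (2 * mu + inter[m])
    return total / len(modules) if modules else 0.0


# Toy call graph: f1 -> f2 -> f3 -> f4, split into two modules.
edges = [("f1", "f2", 1.0), ("f2", "f3", 1.0), ("f3", "f4", 1.0)]
module_of = {"f1": "A", "f2": "A", "f3": "B", "f4": "B"}
score = normalized_turbo_mq(edges, module_of)
print(round(score, 4))  # 0.6667
```

In this toy case each module has one internal edge and one boundary edge, so each cluster factor is 2/3. Under this formulation the score rewards partitions that keep call edges inside modules.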

CLC number: TP311.5

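Liu Xinpeng, Fu Qiang, Zhang Hongbao, Chen Xiaoguang, Yang Manzhi. A Binary Modularization Approach Based on Graph Community Detection Method[J]. Journal of Information Security Research, 2025, 11(1): 43-.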

