Source Code Vulnerability Detection Based on Few-Shot Learning

Journal of Information Security Research ›› 2024, Vol. 10 ›› Issue (5): 440-.

Chen Hongsen1, Fang Yong1, Hao Chengling1, Yang Yuntao1, and Zhang Qi2

1(School of Cyber Science and Engineering, Sichuan University, Chengdu 610207)
2(Chengdu Internet Information Center, Chengdu 610041)

Published: 2024-05-20 Online: 2024-05-20
Corresponding author: Zhang Qi, master's degree. Main research interests: network data security policy and data security management. sczhangxqi@126.com
About the authors:
Chen Hongsen, master's degree. Main research interest: vulnerability detection. modengxian@protonmail.com
Fang Yong, Ph.D., professor and doctoral supervisor. Main research interest: network confrontation techniques. yfang@scu.edu.cn
Hao Chengling, master's degree. Main research interests: intrusion detection and graph neural networks. 1612170458@qq.com
Yang Yuntao, master's degree. Main research interests: graph neural networks and APT attribution and detection. ttmonica111@163.com
Zhang Qi, master's degree. Main research interests: network data security policy and data security management. sczhangxqi@126.com

Abstract

Abstract: Source code vulnerability detection is an important means of discovering and locating threats to critical systems, and applying deep learning to it has become a research hotspot. However, because source code vulnerability samples are scarce, existing detection methods perform poorly in small-sample scenarios. This paper proposes a source code vulnerability detection method based on few-shot learning, aimed at detection scenarios with limited sample sizes. The method consists of four key components: source code slicing and encoding, meta-learning-based dataset processing, vulnerability class vector generation based on a dynamic routing algorithm, and vulnerability class vector matching based on a neural tensor network. The method is compared with a convolutional neural network, a prototypical network, and a relation network. The experimental results show that it outperforms the other methods in accuracy and can effectively cope with the sparsity of source code vulnerability samples. In the 2-way 5-shot and 2-way 10-shot settings, the method achieves accuracies of 93.92% and 95.08%, respectively.

Key words: few-shot learning, vulnerability detection, induction network, code slicing, meta-learning

CLC number: TP183

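Cite this article: Chen Hongsen, Fang Yong, Hao Chengling, Yang Yuntao, Zhang Qi. Source Code Vulnerability Detection Based on Few-Shot Learning[J]. Journal of Information Security Research, 2024, 10(5): 440-.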

