Journal of Information Security Research ›› 2022, Vol. 8 ›› Issue (1): 35-.


Research on Source Code Vulnerability Detection Based on Abstract Syntax Tree Compression Coding

  

  • Online: 2022-01-09 Published: 2022-01-07

Vulnerability Detection Method Based on Abstract Syntax Tree Compression Coding

Chen Chuantao (陈传涛), Pan Limin (潘丽敏), Luo Senlin (罗森林)

  1. (Information System & Security and Countermeasures Experiments Center, Beijing Institute of Technology, Beijing 100081)

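  • Corresponding author: Chen Chuantao, master's student. Main research interest: information security. chencht163@163.com
  • About the authors: Chen Chuantao, master's student. Main research interest: information security. chencht163@163.com Pan Limin, master's supervisor and senior experimentalist. Main research interests: network security, text security, media security, and data mining. panlimin2016@gmail.com Luo Senlin, professor and doctoral supervisor. Main research interests: information security, data mining, and text security. luosenlin2012@gmail.com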

Abstract: Source code vulnerability detection methods based on abstract syntax trees have difficulty fully extracting syntactic and structural features from large syntax trees, which leads to insufficient vulnerability characterization and low detection accuracy. To address this problem, a source code vulnerability detection method based on abstract syntax tree compressed coding (ASTCC) is proposed. First, the abstract syntax tree of a program is split into a set of subtrees, one per code statement, and each subtree is encoded by a recursive neural network to extract the syntactic information within the statement. Next, each subtree in the original syntax tree is replaced by its encoding node, which reduces the depth and the number of leaf nodes of the tree while retaining its structural features. Finally, a tree-based convolutional neural network with an attention mechanism performs source code vulnerability detection. Experimental results on the public NVD and SARD datasets show that ASTCC reduces the size of the abstract syntax tree, strengthens the model's ability to represent source code vulnerabilities, and improves vulnerability detection accuracy.

Key words: vulnerability detection, abstract syntax tree, tree-based convolutional neural network, attention mechanism
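
As a rough illustration of the split, encode, and replace steps described in the abstract, the Python sketch below compresses a small syntax tree at statement granularity. It rests on several assumptions that do not come from the paper: Python's built-in `ast` module stands in for the parser (the abstract names neither a parser nor a target language), a hash-based embedding with an untrained tanh recursion stands in for the recursive neural network, and the names `Node`, `to_tree`, `embed`, `encode`, and `compress` are hypothetical. The tree-based convolutional network with attention is not shown.

```python
import ast
import hashlib
import math
from dataclasses import dataclass, field
from typing import List, Optional

DIM = 4  # toy encoding size, chosen arbitrarily for this sketch


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    vector: Optional[List[float]] = None  # set only on encoded statement nodes


def to_tree(py_node: ast.AST) -> Node:
    """Copy a Python ast node into a plain labeled tree."""
    return Node(type(py_node).__name__,
                [to_tree(c) for c in ast.iter_child_nodes(py_node)])


def embed(label: str) -> List[float]:
    """Deterministic pseudo-embedding; a stand-in for a learned embedding."""
    digest = hashlib.sha256(label.encode()).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def encode(node: Node) -> List[float]:
    """Untrained recursive encoder: h(n) = tanh(embed(n) + sum of h(children))."""
    h = embed(node.label)
    for child in node.children:
        h = [a + b for a, b in zip(h, encode(child))]
    return [math.tanh(x) for x in h]


def compress(py_node: ast.AST) -> Node:
    """Replace each statement's own subtree with a single encoded node.

    Nested statements stay as children, so the statement-level structure
    survives. Simplification: statements nested under non-statement nodes
    (for example except handlers) are folded into the parent's encoding.
    """
    kids = list(ast.iter_child_nodes(py_node))
    if not isinstance(py_node, ast.stmt):
        return Node(type(py_node).__name__, [compress(c) for c in kids])
    own = Node(type(py_node).__name__,
               [to_tree(c) for c in kids if not isinstance(c, ast.stmt)])
    nested = [compress(c) for c in kids if isinstance(c, ast.stmt)]
    return Node(own.label, nested, vector=encode(own))


def depth(n: Node) -> int:
    return 1 + max((depth(c) for c in n.children), default=0)


def leaves(n: Node) -> int:
    return 1 if not n.children else sum(leaves(c) for c in n.children)


SOURCE = """
def copy(buf, data, n):
    i = 0
    while i < n:
        buf[i] = data[i]
        i = i + 1
    return buf
"""

module = ast.parse(SOURCE)
original = to_tree(module)      # full syntax tree
compressed = compress(module)   # one encoded node per statement

print("depth :", depth(original), "->", depth(compressed))
print("leaves:", leaves(original), "->", leaves(compressed))
```

In this toy setting every statement collapses to one node that carries a fixed-size vector, so the compressed tree keeps only the nesting of statements (function, loop, loop body), while expression-level detail lives in the vectors. A trained encoder would replace `embed` and `encode`.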
