Journal of Information Security Research (信息安全研究)

• Academic Papers

PDF File Malicious Code Detection Method Based on Spectrum Analysis

  1. Institute of Information Security, Sichuan University, Chengdu 610065
  • Received: 2016-01-03 Online: 2016-02-05 Published: 2016-02-06

Abstract: Building on earlier research into a spectrum-analysis method for detecting malicious code in compound documents, this paper proposes a machine-learning-based spectrum-analysis method for detecting malicious code in PDF files. It first introduces the PDF malicious code detection system and further optimizes the previously improved real-sequence FFT algorithm. The contents of all fields are extracted from the PDF file storage structure and transformed into the frequency domain. The resulting spectrum maps are then analyzed, and machine learning algorithms are used to extract their feature attributes. The model obtained through learning is used to decide whether a PDF file is infected with malicious code. Finally, experimental analysis verifies the correctness and effectiveness of the method, and the implementation scheme of the whole system and the test result data are presented.

Key words: portable document format (PDF), malicious code detection, spectrum transform, machine learning, feature extraction
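The spectrum-transform step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the authors' optimized real-sequence FFT, their field-extraction procedure, or their learning algorithm, so this example makes several assumptions: NumPy's standard `numpy.fft.rfft` stands in for the optimized FFT, the input is an arbitrary byte string rather than an extracted PDF field, and the function name `byte_spectrum` and the band-averaging feature scheme are hypothetical choices made only for illustration.

```python
import numpy as np


def byte_spectrum(data: bytes, n_bins: int = 8) -> np.ndarray:
    """Return n_bins band-averaged FFT magnitudes for a byte sequence.

    Hypothetical helper: it is not the paper's algorithm, only an
    illustration of turning raw bytes into spectral features.
    """
    if not data:
        return np.zeros(n_bins)
    # Interpret the bytes as a real-valued signal.
    x = np.frombuffer(data, dtype=np.uint8).astype(np.float64)
    # Remove the mean so the zero-frequency term does not dominate.
    x -= x.mean()
    # Real-input FFT: only the non-negative frequencies are computed.
    magnitude = np.abs(np.fft.rfft(x))
    # Average the magnitudes over n_bins contiguous frequency bands
    # to obtain a fixed-length feature vector.
    bands = np.array_split(magnitude, n_bins)
    return np.array([b.mean() if b.size else 0.0 for b in bands])


if __name__ == "__main__":
    # A rapidly alternating byte pattern concentrates energy in the
    # highest-frequency band; a constant pattern has no energy at all.
    print(byte_spectrum(b"\x00\xff" * 64))
    print(byte_spectrum(b"A" * 100))
```

Fixed-length vectors of this kind could then be passed to any standard classifier. Which features and which classifier the paper actually uses is not stated in the abstract.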