Journal of Information Security Research ›› 2023, Vol. 9 ›› Issue (6): 524-.

• Special Topic on Security Risks and Privacy Protection of Artificial Intelligence •

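Research on Detection Algorithms for Large Language Model Generated Content and Bypass Mechanisms

Ye Luchen (叶露晨); Fan Yuan (范渊); Wang Xin (王欣); Ruan Wenbo (阮文波)

  • Online: 2023-06-04 Published: 2023-06-03
  • Corresponding author: Ye Luchen, lucien.ye@dbappsecurity.com.cn
  • About the authors:
    Ye Luchen, M.S., assistant researcher. Main research interests: cryptography and AI security. lucien.ye@dbappsecurity.com.cn
    Fan Yuan, M.S., professor-level senior engineer. Main research interests: network security and data security. frank.fan@dbappsecurity.com.cn
    Wang Xin, senior engineer. Main research interests: network attack and defense. xin.wang@dbappsecurity.com.cn
    Ruan Wenbo, M.S., assistant researcher. Main research interests: network and system security. leo.ruan@dbappsecurity.com.cn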

Abstract: Large language model (LLM) technology has risen rapidly in recent years. Although AI chatbots such as ChatGPT have numerous built-in safety mechanisms, attackers can still craft question-and-answer exchanges that bypass them, and then use the chatbots to automate the production of phishing emails and carry out network attacks. As a result, identifying AI-generated text has become a widely discussed problem. To run LLM-generated content detection experiments, we collected a number of question-and-answer samples from an Internet social platform and from ChatGPT, and we propose a series of detection strategies according to the conditions under which AI text is available: (1) text similarity analysis when AI reference samples can be obtained online; (2) text data mining based on statistical differences under offline conditions; (3) adversarial analysis of the LLM generation method when no AI samples are available; and (4) AI model analysis that builds a classifier by fine-tuning the target LLM itself. We compute and compare the detection capability of the analysis engine in each case. In addition, from the perspective of network attack and defense, we present several evasion techniques against AI text detection engines, based on the characteristics of each detection strategy.

Key words: large language model, phishing emails, AI text detection, ChatGPT, network attack and defense, AI detection evasion
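
As an illustration of the first strategy named in the abstract (text similarity analysis against an AI reference sample obtained online), the following is a minimal sketch only. The abstract does not say which similarity measure or threshold the authors use; the bag-of-words cosine similarity, the function names, and the 0.8 threshold below are all assumptions made for illustration, not the paper's method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the tokens shared by both texts.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # An empty text has a zero vector; treat its similarity as 0.
    return dot / norm if norm else 0.0


def looks_ai_generated(candidate: str, ai_reference: str,
                       threshold: float = 0.8) -> bool:
    """Flag the candidate when it is close to the AI reference answer.

    `ai_reference` is assumed to be the answer obtained by submitting the
    same question to the LLM; `threshold` is a hypothetical cut-off that
    would need to be tuned on labeled data.
    """
    return cosine_similarity(candidate, ai_reference) >= threshold
```

In use, the same question would be submitted to the LLM to obtain `ai_reference`, and the suspect answer would be passed as `candidate`. Word-count cosine similarity is chosen here only because it needs no external libraries; an embedding-based measure could be substituted without changing the overall flow.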