A Large Language Model Detection System for Domain-Specific Jargon

Journal of Information Security Research, 2024, Vol. 10, Issue 9: 795-.

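Ji Xu, Zhang Jianyi, Zhao Zhangchi, Zhou Ziyin, Li Yilong, and Sun Zezheng

(Department of Cyberspace Security, Beijing Electronic Science and Technology Institute, Beijing 100070)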

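Online: 2024-09-25; Published: 2024-09-29
Corresponding author: Zhang Jianyi, Ph.D., associate professor, CCF member. Main research interests: privacy protection and system security. zjy@besti.edu.cn
About the authors:
Ji Xu, master's student. Main research interests: large language model security and knowledge graphs. 1164972083@qq.com
Zhang Jianyi, Ph.D., associate professor, CCF member. Main research interests: privacy protection and system security. zjy@besti.edu.cn
Zhao Zhangchi. Main research interest: artificial intelligence security. xuelianwinter@gmail.com
Zhou Ziyin, master's student. Main research interest: data privacy. mrzhouziyin@126.com
Li Yilong, master's student. Main research interests: natural language processing and artificial intelligence security. elonisme@163.com
Sun Zezheng, master's student. Main research interests: federated learning and privacy security. 420264993@qq.com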

Abstract

Abstract: Large language models (LLMs) retrieve knowledge from the model itself and from reasoning to generate responses to user queries, so evaluating their reasoning capabilities has become an active research topic. However, although these models have demonstrated strong reasoning and comprehension in generic language tasks, their ability to understand and reason about jargon in special domains, such as telecommunications fraud, has not yet been evaluated. To address this gap, this paper designs and tests the first system for evaluating LLMs on domain-specific jargon and proposes the first domain-specific jargon dataset, which covers many special topics. To address the data cross-matching problem and the complex data calculation problem, we propose a collaborative harmony algorithm and a data-aware algorithm based on indicator functions, respectively. Together, these algorithms assess the performance of large language models from multiple perspectives. Experimental results show that the system can flexibly and thoroughly evaluate the recognition accuracy of LLM question answering within specialized domains. The results also reveal, for the first time, how recognition accuracy varies with question style and with the contextual cues given to the models. The system can serve as an objective auditing tool to help improve the reliability and security of large language models, particularly when they are applied to specialized domains.

Keywords: large language model, domain-specific jargon, cant language detection, evaluation system, slang, reasoning

CLC number: TP309

Ji Xu, Zhang Jianyi, Zhao Zhangchi, Zhou Ziyin, Li Yilong, and Sun Zezheng. A Large Language Model Detection System for Domain-Specific Jargon[J]. Journal of Information Security Research, 2024, 10(9): 795-.
