[1] Yao J, Ning K, Liu Z, et al. LLM Lies: Hallucinations are not bugs, but features as adversarial examples[J]. arXiv preprint, arXiv:2310.01469, 2024
[2] Hughes S, Bae M, Li M, et al. Hallucination leaderboard[EB/OL]. 2025 [2025-09-30]. https://github.com/vectara/hallucination-leaderboard?tab=readme-ov-file
[3] Singh A, Schlesinger A, Fry A, et al. o3 and o4-mini system card[EB/OL]. 2025 [2025-09-30]. https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
[4] Janet P, Freud S, Loftus E, et al. False memory[EB/OL]. Wikipedia, [2025-09-30]. https://en.wikipedia.org/wiki/False_memory
[5] Dan Y. The eyewitness memory effect[EB/OL]. MBA智库百科, [2025-09-30]. https://wiki.mbalib.com/wiki/证人的记忆效应
[6] Scheck B, Neufeld P. Innocence Project[EB/OL]. Wikipedia, [2025-09-30]. https://en.wikipedia.org/wiki/Innocence_Project
[7] Huang L, Yu W, Ma W, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions[J]. ACM Trans on Information Systems, 2025, 43(2): 1-55
[8] Min S, Krishna K, Lyu X, et al. FActScore: Fine-grained atomic evaluation of factual precision in long form text generation[C] //Proc of the Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2023: 12076-12100
[9] Dhuliawala S, Komeili M, Xu J, et al. Chain-of-verification reduces hallucination in large language models[C] //Proc of Findings of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2024: 3563-3578
[10] Fabbri A R, Wu C S, Liu W, et al. QAFactEval: Improved QA-based factual consistency evaluation for summarization[J]. arXiv preprint, arXiv:2112.08542, 2021
[11] Laban P, Kryściński W, Agarwal D, et al. LLMs as factual reasoners: Insights from existing benchmarks and beyond[J]. arXiv preprint, arXiv:2305.14540, 2023
[12] Adlakha V, BehnamGhader P, Lu X H, et al. Evaluating correctness and faithfulness of instruction-following models for question answering[J]. Trans of the Association for Computational Linguistics, 2024, 12: 681-699
[13] Wei J, Wang X, Schuurmans D, et al. Chain-of-thought prompting elicits reasoning in large language models[C] //Proc of the 36th Int Conf on Neural Information Processing Systems. New York: Curran Associates, 2022: 24824-24837
[14] Nye M, Andreassen A J, Gur-Ari G, et al. Show your work: Scratchpads for intermediate computation with language models[J]. arXiv preprint, arXiv:2112.00114, 2021
[15] Li C, Liang J, Zeng A, et al. Chain of code: Reasoning with a language model-augmented code emulator[C] //Proc of the 41st Int Conf on Machine Learning. Cambridge, MA: JMLR, 2024: 28259-28277
[16] Chen W, Ma X, Wang X, et al. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks[J]. arXiv preprint, arXiv:2211.12588, 2022
[17] Gao L, Madaan A, Zhou S, et al. PAL: Program-aided language models[C] //Proc of the 40th Int Conf on Machine Learning. Cambridge, MA: JMLR, 2023: 10764-10799
[18] Wen J, Guan J, Wang H, et al. CodePlan: Unlocking reasoning potential in large language models by scaling code-form planning[J]. arXiv preprint, arXiv:2409.12452, 2024
[19] Taylor F. The Principles of Scientific Management[M]. New York: Harper and Brothers, 1911
[20] Gawande A. The Checklist Manifesto[M]. New York: Metropolitan Books, 2009
[21] Goldreich O. P, NP, and NP-Completeness: The Basics of Computational Complexity[M]. New York: Cambridge University Press, 2010
[22] Rafailov R, Sharma A, Mitchell E, et al. Direct preference optimization: Your language model is secretly a reward model[C] //Proc of the 37th Int Conf on Neural Information Processing Systems. New York: Curran Associates, 2023: 53728-53741
[23] Luo Y, Yang Z, Meng F, et al. An empirical study of catastrophic forgetting in large language models during continual fine-tuning[J]. IEEE Trans on Audio, Speech and Language Processing, 2025, 33: 3776-3786