Journal of Information Security Research, 2025, Vol. 11, Issue (12): 1117-.

• Academic Paper

An Efficient Detection Method of Phishing Email Based on Language Model and LoRA

Li Zichuan, Ji Duo, and Zhou Song
  

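  1. (College of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110854)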
  • Publication date: 2025-12-12 Release date: 2025-12-04
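  • Corresponding author: Li Zichuan, M.S., lecturer; research interests: cybersecurity and electronic data forensics (lizic008792@126.com)
  • About the authors: Li Zichuan, M.S., lecturer; research interests: cybersecurity and electronic data forensics (lizic008792@126.com). Ji Duo, M.S., associate professor; research interest: natural language processing (18640037173@163.com). Zhou Song, Ph.D., lecturer; research interests: cybersecurity and artificial intelligence (718281245@qq.com).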
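  • Funding:
    National Natural Science Foundation of China (62406342); Natural Science Foundation of Liaoning Province (2022MS168); Key Research Project of Criminal Investigation Police University of China (2022XKGJ0108)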

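Abstract: Phishing email detection is critical to cybersecurity, but it remains difficult because phishing emails vary widely in form and are complex in content. This paper proposes a phishing email detection method that combines the pretrained language model DistilBERT with Low-Rank Adaptation (LoRA). DistilBERT extracts deep semantic features from the email text, while LoRA fine-tunes only a small number of parameters. This reduces the dependence on large-scale labeled data and improves the model's generalization. Experimental results show that, compared with traditional machine learning methods and deep learning models (RNN, LSTM, and bidirectional LSTM), DistilBERT+LoRA performs better on all key metrics, namely accuracy, precision, recall, and F1-score. It reaches 96% accuracy and a 97% F1-score, clearly outperforming the baseline methods. It also balances precision and recall better than the other deep learning models and shows stronger robustness and adaptability, particularly on complex phishing emails. The experiments further show that model performance improves as the LoRA rank increases. By combining the feature extraction capability of pretrained language models with the efficient fine-tuning of LoRA, the proposed method offers an effective solution for accurate and efficient phishing email detection.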

Key words: cybersecurity, phishing email detection, pretrained language model, low-rank adaptation (LoRA), semantic feature extraction
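The abstract attributes the method's efficiency to LoRA fine-tuning only a small number of parameters. The following pure-Python layer is a minimal sketch that assumes the standard LoRA formulation h = W0 x + (alpha / r) · B(A x), since the abstract does not give the exact equations. It keeps the pretrained weight W0 frozen and treats only the low-rank factors A and B as trainable. The class name `LoRALinear`, the initialization scale, and the toy matrix are hypothetical illustrations, not the authors' code.

```python
"""Minimal sketch of a LoRA-adapted linear layer in pure Python.

Assumed formulation: h = W0 x + (alpha / r) * B (A x), with W0 frozen and
only A (r x d_in) and B (d_out x r) trainable. All names are hypothetical.
"""
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by column vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


class LoRALinear:
    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, never updated
        # A starts small and random; B starts at zero, so the adapter
        # initially leaves the pretrained layer's output unchanged.
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))  # low-rank path B (A x)
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are trained: r * d_in + d_out * r values.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def merged_weight(self) -> Matrix:
        """Fold the adapter into one matrix W0 + (alpha / r) * B A for inference."""
        r = len(self.a)
        return [
            [w + self.scale * sum(self.b[i][k] * self.a[k][j] for k in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.w0)
        ]


if __name__ == "__main__":
    w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # toy 2 x 3 "pretrained" weight
    layer = LoRALinear(w0, r=1, alpha=2.0)
    x = [1.0, 2.0, 3.0]
    print(layer.forward(x))  # equals W0 x = [7.0, -1.0] because B is zero
    print(layer.trainable_params())  # 5 trainable values versus 6 in W0
```

Initializing B to zero means fine-tuning starts exactly from the pretrained model. After training, `merged_weight` folds the low-rank update into a single matrix, so inference needs no extra matrix multiplications.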

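The abstract also reports that performance improves as the LoRA rank increases. The following back-of-the-envelope count is a hypothetical illustration of how the number of trainable parameters grows with the rank r. It assumes the DistilBERT-base configuration (6 Transformer layers, hidden size 768). It also assumes LoRA is applied only to two square projections per layer, such as the query and value projections, which is a common choice that the abstract does not confirm. The classification head is not counted.

```python
"""Hypothetical estimate of LoRA trainable parameters for DistilBERT-base.

Assumptions (not stated in the paper): 6 layers, hidden size 768, and LoRA
adapters on two square d_model x d_model projections per layer.
The classification head and any other trainable layers are not counted.
"""

DISTILBERT_BASE_PARAMS = 66_000_000  # approximate size of DistilBERT-base


def lora_trainable_params(rank: int, n_layers: int = 6, d_model: int = 768,
                          adapted_per_layer: int = 2) -> int:
    """Count parameters in the factors A (rank x d_model) and B (d_model x rank)."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    per_matrix = rank * d_model + d_model * rank
    return n_layers * adapted_per_layer * per_matrix


if __name__ == "__main__":
    for r in (1, 4, 8, 16):
        n = lora_trainable_params(r)
        share = 100 * n / DISTILBERT_BASE_PARAMS
        print(f"r={r:>2}: {n:>7,} LoRA parameters (~{share:.2f}% of the base model)")
```

Under these assumptions the count is 18,432 · r, so r = 8 gives 147,456 trainable values. The count grows linearly with the rank. This is consistent with a larger rank giving the adapter more capacity, although the count alone does not explain the accuracy gains reported in the paper.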

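The abstract evaluates the detector by accuracy, precision, recall, and F1-score. The following code is a minimal sketch of how these metrics are usually computed for binary classification. It assumes that phishing is the positive class, because the abstract does not say how the metrics are averaged. The class and function names are hypothetical, and the counts in the usage line are made up for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # phishing emails correctly flagged
    fp: int  # legitimate emails wrongly flagged as phishing
    fn: int  # phishing emails that were missed
    tn: int  # legitimate emails correctly passed


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(ConfusionCounts(tp=8, fp=2, fn=4, tn=6)))
```

F1 can equivalently be written as 2TP / (2TP + FP + FN). It ignores true negatives, so it can sit above or below accuracy, as with the 97% F1-score and 96% accuracy reported in the abstract.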