Journal of Information Security Research ›› 2026, Vol. 12 ›› Issue (4): 383-.

Research on Log Anomaly Detection Method Integrating Semantic Features

Chen Hanwen¹, Zhang Le¹, Chi Yaping¹, Jiang Bo², and Wang Zhiqiang¹

  ¹ School of Cyberspace Security, Beijing Electronics Science & Technology Institute, Beijing 100070
  ² Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100089

  • Online: 2026-04-07 Published: 2026-04-07

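  • Corresponding author: Chi Yaping (池亚平), M.S., professor. Main research interests: network security protection and cloud computing security. chiyp_besti@163.com
  • About the authors: Chen Hanwen (陈瀚文), master's student. Main research interests: log parsing and log anomaly detection. hanwen_chen@foxmail.com Zhang Le (章乐), Ph.D., lecturer. Main research interests: deep reinforcement learning, computer vision, and theoretical computer science. lezhang12@tsinghua.org.cn Chi Yaping (池亚平), M.S., professor. Main research interests: network security protection and cloud computing security. chiyp_besti@163.com Jiang Bo (姜波), Ph.D., associate research fellow. Main research interests: cyber threat analysis and situational awareness. jiangbo@iie.ac.cn Wang Zhiqiang (王志强), Ph.D., associate professor. Main research interests: vulnerability discovery and attack-defense techniques. wangzq@besti.edu.cn
  • Funding:
    Fundamental Research Funds for the Central Universities (3282024050); National Key Research and Development Program of China (2023YFC2206402)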

Abstract: As system functionality continues to expand, the volume of system logs has grown rapidly, posing substantial challenges to conventional anomaly detection approaches. Deep learning-based log anomaly detection has become a research hotspot because of its strong feature extraction capability. This study proposes LogSem, a semi-supervised log anomaly detection model that integrates semantic features. The model introduces log content vectors that carry the semantic information of the main log content and is trained with two semi-supervised tasks, masked log key prediction and hypersphere volume minimization, to mine the semantic features of logs in depth. Experiments on three mainstream datasets show that the proposed method achieves a higher F1 score than the LogBERT baseline model. The study also explores and verifies the feasibility of addressing the out-of-vocabulary problem through semi-supervised learning.

Key words: log anomaly detection, log analysis, deep learning, semi-supervised learning, BERT model

