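A Noisy Label Detection Method for Encrypted Malicious Traffic

Journal of Information Security Research, 2023, Vol. 9, Issue (10): 1023-.

Tong Jiacheng, Chen Wei, Ni Jiayi, and Li Pin

(School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023)

Online: 2023-10-17 Published: 2023-10-28
Corresponding author: Chen Wei, PhD, professor. Main research interests: wireless network security and mobile Internet security. chenwei@njupt.edu.cn
About the authors: Tong Jiacheng, master's student. Main research interests: network security and encrypted malicious traffic detection. Oc34nus@outlook.com. Chen Wei, PhD, professor. Main research interests: wireless network security and mobile Internet security. chenwei@njupt.edu.cn. Ni Jiayi, master's student. Main research interests: network security and network intrusion detection. njiay@outlook.com. Li Pin, associate professor. Main research interest: network and information security. lipin7421@163.com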

Abstract


Abstract: Handling noisy datasets remains a challenge when training and evaluating data-driven detection models for encrypted malicious traffic. This paper proposes a noisy label detection method based on KRPDDT. Following the idea of differential training, the method trains two identical models simultaneously, extracts each sample's training loss in both models, and detects noisy samples from the difference in training behavior between clean and noisy samples. To amplify the loss differences between samples, a relative noise weight estimation method based on KLIEPRPD is also proposed; it estimates the relative probability density of each sample and uses it as the weight of that sample's loss behavior. After the CICDoHBrw2020 dataset was cleaned with this method, the performance of a malicious DoH traffic detection model was effectively restored. Experiments show that the method is stable and outperforms several other noise detection methods.
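The loss-based idea in the abstract can be illustrated with a small sketch. This is not the paper's KRPDDT method: it assumes a plain logistic-regression model, synthetic two-class data, and a simple score (the mean per-sample loss over all epochs in both models), and it assumes that the second model is trained on a random half of the data, as in the general differential training framework. All function names (`per_sample_loss`, `train_and_track`, `detect_noisy`, `make_demo`) are hypothetical.

```python
import numpy as np

def per_sample_loss(w, b, X, y):
    """Binary cross-entropy of every sample under a logistic model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_and_track(X, y, train_idx, epochs=50, lr=0.1):
    """Train on X[train_idx] with gradient descent and, after each epoch,
    record the loss of ALL samples. Returns an (epochs, n) array."""
    w, b = np.zeros(X.shape[1]), 0.0
    Xt, yt = X[train_idx], y[train_idx]
    history = []
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xt @ w + b)))
        w -= lr * Xt.T @ (p - yt) / len(yt)
        b -= lr * np.mean(p - yt)
        history.append(per_sample_loss(w, b, X, y))
    return np.array(history)

def detect_noisy(X, y, noise_rate, seed=0):
    """Flag the samples whose loss stays highest in two identical models.
    Assumption: model A sees all data, model B a random half."""
    rng = np.random.default_rng(seed)
    n = len(y)
    subset = rng.choice(n, size=n // 2, replace=False)
    loss_a = train_and_track(X, y, np.arange(n))
    loss_b = train_and_track(X, y, subset)
    # Clean samples are fitted quickly (loss falls); mislabeled ones are not.
    score = 0.5 * (loss_a.mean(axis=0) + loss_b.mean(axis=0))
    k = int(round(noise_rate * n))
    return np.argsort(score)[-k:]

def make_demo(n=400, noise_rate=0.1, seed=1):
    """Two Gaussian blobs with a fraction of labels flipped (synthetic data)."""
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y_true[:, None] == 1, 2.0, -2.0)
    flipped = rng.choice(n, size=int(round(noise_rate * n)), replace=False)
    y_noisy = y_true.copy()
    y_noisy[flipped] = 1 - y_noisy[flipped]
    return X, y_noisy.astype(float), flipped
```

Calling `detect_noisy(X, y, noise_rate=0.1)` on the output of `make_demo()` returns the indices of the suspected mislabeled samples, which can be compared with `flipped`. A real pipeline would replace the fixed noise rate and the mean-loss score with whatever detection rule the paper defines.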

Key words: noisy label detection, noise weight, encrypted malicious traffic, DoH traffic, differential training
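The noise weight described in the abstract rests on estimating a relative probability density. The sketch below shows generic density-ratio estimation in the style of KLIEP (the Kullback-Leibler importance estimation procedure of Sugiyama et al.), assuming that this is what the "KLIEP" in KLIEPRPD refers to. How the paper turns such a ratio into a per-sample weight is not stated in the abstract, so nothing here should be read as the authors' procedure. The kernel width, step size, and function names (`gaussian_kernel`, `kliep_fit`, `kliep_ratio`) are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, centers, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kliep_fit(X_num, X_den, sigma=1.0, n_centers=50, lr=1e-3, iters=500, seed=0):
    """Estimate w(x) ~ p_num(x) / p_den(x) as a non-negative kernel mixture.
    Returns the kernel centers and their coefficients alpha."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_num), size=min(n_centers, len(X_num)), replace=False)
    centers = X_num[idx]
    A = gaussian_kernel(X_num, centers, sigma)               # numerator samples
    b = gaussian_kernel(X_den, centers, sigma).mean(axis=0)  # denominator mean
    alpha = np.ones(len(centers))
    alpha /= b @ alpha
    for _ in range(iters):
        # Gradient ascent on sum(log w(x)) over the numerator samples.
        alpha = alpha + lr * A.T @ (1.0 / (A @ alpha))
        # Move back toward the constraint mean(w(x_den)) = 1, keep alpha >= 0.
        alpha = alpha + (1.0 - b @ alpha) * b / (b @ b)
        alpha = np.maximum(alpha, 0.0)
        alpha = alpha / (b @ alpha)
    return centers, alpha

def kliep_ratio(X, centers, alpha, sigma=1.0):
    """Evaluate the fitted density ratio at the rows of X."""
    return gaussian_kernel(X, centers, sigma) @ alpha
```

One possible use, stated only as an assumption, is to take the samples that share a given label as `X_num` and the remaining samples as `X_den`, so that a low ratio marks a sample that sits in a region dominated by the other class.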

CLC number: TP393.08

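Tong Jiacheng, Chen Wei, Ni Jiayi, Li Pin. A Noisy Label Detection Method for Encrypted Malicious Traffic[J]. Journal of Information Security Research, 2023, 9(10): 1023-.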

