Journal of Information Security Research (信息安全研究) ›› 2025, Vol. 11 ›› Issue (4): 351-.

• Academic Paper •

High-Utility Time Series Data Generation Method Combining Sequence Correlation Graph and GAN

Wan Yunwei (万韵伟), Cheng Yao (程瑶), and Men Yuanhao (门元昊)

  1. (School of Information and Electronics, Beijing Institute of Technology, Beijing 100081)
  • Online: 2025-04-30 Published: 2025-05-01
  • Corresponding author: Wan Yunwei. weiweijack@126.com
  • About the authors: Wan Yunwei, master's degree. Main research interests: network security and data privacy protection. weiweijack@126.com. Cheng Yao, master's degree. Main research interests: network security and natural language processing. chengyao_bit@126.com. Men Yuanhao, PhD candidate. Main research interests: network security and data security. menyuanhao@bit.edu.cn

Abstract: Long time series data are difficult to obtain in practice, which seriously restricts applications such as situational awareness and threat analysis in cyberspace security. Deep-learning-driven data generation methods can effectively protect the privacy of the original data, and ensuring the high utility and diversity of the generated data is crucial. However, existing methods construct the model's training data by randomly splicing short sequences, which cannot guarantee that the distribution of the generated data meets expectations and therefore reduces the utility of the generated data. To address this problem, this paper proposes a high-utility time series data generation method that combines a sequence correlation graph with a generative adversarial network (GAN). By constructing a sequence correlation graph and a probability-weighted GAN, the method accurately fits the distribution characteristics of the original data. Experimental results on multiple real-world datasets show that the method can generate long time series data with high utility and diversity from original data of short sequence length, indicating its potential for practical applications.

Key words: data generation, data security, time series data, short sequence, generative adversarial network (GAN)

CLC number: