Journal of Information Security Research ›› 2026, Vol. 12 ›› Issue (3): 220-.


Differentially Private Text Synthesis Based on Gradient Direction Filtering

Li Li1, Zhao Linlu2, Guo Guojiang2, Jin Jianwei1, and Duan Xiaoyi1   

  1(Department of Electronic and Communication Engineering, Beijing Electronic Science and Technology Institute, Beijing 100070)
    2(Department of Cyberspace Security, Beijing Electronic Science and Technology Institute, Beijing 100070)
  • Online:2026-03-12 Published:2026-03-12

  • Corresponding author: Zhao Linlu, master's degree candidate. Main research interest: information security. zhaolinlu01@163.com
  • About the authors: Li Li, PhD, professor. Main research interests: network and system security, embedded system security applications. laury_li@126.com  Zhao Linlu, master's degree candidate. Main research interest: information security. zhaolinlu01@163.com  Guo Guojiang, master's degree candidate. Main research interest: information security. 1518836979@qq.com  Jin Jianwei, master's degree candidate. Main research interest: digital signal processing. 1051021291@qq.com  Duan Xiaoyi, PhD, associate professor. Main research interest: information security. xiaoyi_duan@sina.com

Abstract: Deep learning models improve performance by memorizing training data, but this memorization also creates a risk of training data leakage. Differential privacy, a mainstream privacy protection method, effectively mitigates this risk; however, existing differentially private data synthesis approaches suffer from slow model convergence and low data usability. To address these issues, we propose the TVDPSGDLM_D framework. It introduces TVDPSGD, a threshold-validated differentially private optimization algorithm that uses a validation mechanism to filter gradient directions during differentially private model training; by discarding ineffective updates, it accelerates model convergence. TVDPSGDLM embeds TVDPSGD into a language generation model to synthesize labeled text datasets that remain statistically similar to the original data. Additionally, a pretrained classifier filters the generated text, removing samples whose classification results do not match their assigned labels, thereby improving the quality of the synthetic dataset. Experimental results on public datasets demonstrate that the proposed method preserves data privacy while achieving a classification accuracy of 89.4% on the processed synthetic dataset.
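The gradient-direction filtering described above can be sketched as a variant of a DP-SGD step: clip and noise per-example gradients as usual, then accept the resulting update only if it improves a validation loss by more than a threshold. This is an illustrative sketch only, not the paper's TVDPSGD algorithm: the function names, the quadratic toy loss, the acceptance criterion, and all hyperparameters are assumptions, and the privacy cost of the validation check is ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_and_noise(per_example_grads, clip_norm=1.0, noise_mult=1.1):
    # Standard DP-SGD aggregation: clip each per-example gradient to
    # L2 norm `clip_norm`, sum, add Gaussian noise, and average.
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

def tvdpsgd_step(w, per_example_grads, val_loss, lr=0.1, threshold=0.0):
    # Hypothetical threshold-validated step: compute the noisy DP
    # gradient, then keep the update only if it improves the
    # validation loss by more than `threshold`; otherwise discard it.
    g = clip_and_noise(per_example_grads)
    candidate = w - lr * g
    if val_loss(w) - val_loss(candidate) > threshold:
        return candidate, True    # accepted: effective direction
    return w, False               # rejected: ineffective update

# Toy demo: minimize a quadratic "loss" (here the validation loss is
# the same quadratic; the paper would use a held-out validation set).
val_loss = lambda w: float(np.sum(w ** 2))
w = np.ones(5)
for _ in range(50):
    grads = [2 * w + rng.normal(0.0, 0.1, size=5) for _ in range(8)]
    w, accepted = tvdpsgd_step(w, grads, val_loss)
```

Because rejected updates leave the parameters untouched, the validation loss is non-increasing by construction, which is one way noisy, ineffective gradient directions can be discarded without derailing training.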

Key words: differential privacy, gradient direction filtering, contrastive filtering, text synthesis, conditional control code
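The classifier-based filtering step from the abstract (dropping synthetic samples whose predicted class disagrees with the assigned label) reduces to a simple consistency check. The sketch below is an assumption-laden illustration: `filter_by_label_consistency` and the keyword-rule `toy_classifier` are invented stand-ins for the paper's pretrained classifier.

```python
def filter_by_label_consistency(samples, classifier):
    # Keep only synthetic (text, label) pairs whose assigned label
    # matches the prediction of the given classifier.
    return [(text, label) for text, label in samples
            if classifier(text) == label]

# Stand-in "pretrained classifier": a keyword rule, purely for
# illustration of the filtering interface.
toy_classifier = lambda text: "positive" if "good" in text else "negative"

synthetic = [
    ("a good and moving film", "positive"),
    ("a dull, plodding mess", "positive"),   # inconsistent: dropped
    ("a dull, plodding mess", "negative"),
]
cleaned = filter_by_label_consistency(synthetic, toy_classifier)
```

In the framework described above, the surviving pairs form the released synthetic dataset; the filtering itself operates only on generated text, so it spends no additional privacy budget on the original data.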


