信息安全研究 (Journal of Information Security Research) ›› 2023, Vol. 9 ›› Issue (12): 1145-.

• Academic Papers •


Research on Text Classification Model Based on Federated Learning and Differential Privacy

Sheng Xuechen and Chen Danwei


  1. (School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023)

  • Online: 2023-12-20  Published: 2023-12-28


Abstract: As a distributed machine learning framework, federated learning can train a model without exposing users' raw data. However, recent attacks have shown that merely keeping data local during training does not provide sufficient privacy protection. To address the privacy risks that arise during federated training, this paper proposes a BERT-based text classification model that combines differential privacy (DP) with federated learning (FL), so that the exchange of model parameters during federated training is protected against inference attacks. Experiments show that the proposed method preserves privacy while maintaining high model accuracy.
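The abstract describes adding differential privacy to the parameters exchanged during federated training. The Python sketch below illustrates one common way such a combination is realized: each client clips its model update to a fixed L2 bound and adds Gaussian noise before the server averages the updates. This is a generic illustration under assumed placeholder hyperparameters (clip_norm, epsilon, delta), not the paper's exact algorithm.

import numpy as np

def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Clip an update to L2 norm <= clip_norm and add Gaussian noise (Gaussian mechanism)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP on a clipped update.
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)

def federated_average(client_updates, **dp_kwargs):
    """Server-side aggregation of client updates that were privatized before transmission."""
    noisy = [privatize_update(u, **dp_kwargs) for u in client_updates]
    return np.mean(noisy, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Pretend each of 5 clients computed an update for a 10-dimensional model slice.
    updates = [rng.normal(size=10) for _ in range(5)]
    print(federated_average(updates, clip_norm=1.0, epsilon=2.0, delta=1e-5))

In practice the noisy updates would be applied to the shared model (e.g., the BERT classifier mentioned in the abstract) each communication round, and the privacy cost would be tracked across rounds with a composition accountant.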

Key words: text classification, distributed computing, federated learning, differential privacy, privacy protection

CLC number: