Journal of Information Security Research ›› 2025, Vol. 11 ›› Issue (8): 710-.

A Privacy Budget Allocation Method Based on Differential Privacy kmeans++

Yan Ling1 and Zhao Hailiang2   

  1. Department of Mathematics Teaching, Sichuan University Jinjiang College, Meishan, Sichuan 620860
  2. Department of Mathematics, Southwest Jiaotong University, Chengdu 611756
  • Online: 2025-08-28 Published: 2025-08-28

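  • Corresponding author: Yan Ling, master's degree, teaching assistant. Main research interests: intelligent information processing and data mining. user_yanling@163.com
  • About the authors: Yan Ling, master's degree, teaching assistant. Main research interests: intelligent information processing and data mining. user_yanling@163.com Zhao Hailiang, Ph.D., professor. Main research interests: intelligent robots and automatic driving control of vehicles, multi-objective intelligent control systems, and fuzzy information processing theory. hailiang@home.swjtu.edu.cn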

Abstract: In the traditional differential privacy kmeans++ algorithm, the commonly used equal division method allocates the privacy budget uniformly and cannot adapt to the different budget needs of different stages, while the binary division method consumes the budget so quickly that excessive noise is added in later iterations. Both impair clustering performance. To address this problem, a new privacy budget allocation method is proposed that combines the arithmetic progression method with the equal division method. When the initial centers are selected, the privacy budget is allocated by equal division. When the centers are updated, the allocation takes a minimum privacy budget into account: the early iterations use an arithmetic progression and the later iterations switch to equal division. This gives the early iterations a relatively large budget, so that the cluster centers are not severely distorted, and keeps budget consumption moderate later on, so that excessive noise does not degrade the clustering result. A series of experiments on real data shows that the lowest error relative to the original kmeans++ is only 0.09%, and that clustering accuracy improves by up to 14.9% and 16.9% over the equal division and binary division methods, respectively. The method therefore clearly outperforms both equal division and binary division and can, to a certain extent, improve the usability and accuracy of the clustering results.

Key words: information security, data mining, differential privacy protection, kmeans++, privacy budget allocation
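
The allocation strategies compared in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the function names, the `switch` parameter (the number of early rounds), and the choice to let the arithmetic progression decrease linearly until it reaches the minimum budget `eps_min`. It covers only the center-update phase and should not be read as the authors' exact method.

```python
def equal_division(total, rounds):
    """Give every round the same share of the budget."""
    return [total / rounds] * rounds


def binary_division(total, rounds):
    """Give each round half of whatever budget is still unspent."""
    return [total / 2 ** (t + 1) for t in range(rounds)]


def hybrid_schedule(total, rounds, eps_min, switch):
    """Hypothetical hybrid schedule (an assumption, not the paper's formula).

    The first `switch` rounds follow a decreasing arithmetic progression
    that ends at eps_min; the remaining rounds each receive eps_min.
    The first term is chosen so that the whole schedule sums to `total`.
    """
    if not 2 <= switch <= rounds:
        raise ValueError("switch must satisfy 2 <= switch <= rounds")
    if total < rounds * eps_min:
        raise ValueError("total is too small to give every round eps_min")

    late = [eps_min] * (rounds - switch)
    early_total = total - (rounds - switch) * eps_min
    # Sum of an arithmetic progression: switch * (first + eps_min) / 2.
    first = 2 * early_total / switch - eps_min
    step = (first - eps_min) / (switch - 1)
    early = [first - t * step for t in range(switch)]
    return early + late
```

For example, `hybrid_schedule(1.0, 10, 0.05, 5)` spends more than an equal share in the first rounds and never drops below `eps_min`, whereas `binary_division(1.0, 10)` falls below that floor after a few rounds, which is the behavior the abstract identifies as the source of excessive late-stage noise.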
