Journal of Information Security Research ›› 2023, Vol. 9 ›› Issue (5): 402-.

• Academic Paper •

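K-anonymity Mechanism Based on Iterative Binary Clustering

Wang Tao1, Tan Hu1, Xu Tingting1, Xin Baojiang1, Liu Gang1, Zhou Pan2

  1. Weifang Power Supply Company, State Grid Shandong Electric Power Company, Weifang, Shandong 261021
  2. School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074
  • Online: 2023-05-01 Published: 2023-04-29
  • Corresponding author: Wang Tao, senior engineer. Main research interest: power system automation. twang1127@163.com
  • About the authors:
    Wang Tao, senior engineer. Main research interest: power system automation. twang1127@163.com
    Tan Hu, master's degree, senior engineer. Main research interests: information and communication technology, and network security. tanhu8621950@sina.com
    Xu Tingting, senior engineer. Main research interest: data security. nydiaxt@sina.com
    Xin Baojiang, master's degree, senior engineer. Main research interest: electric power big data applications. xinbaojiang@163.com
    Liu Gang, engineer. Main research interest: electric power big data applications. lqdy0127@163.com
    Zhou Pan, PhD, professor. Main research interests: data and network security and privacy. panzhou@hust.edu.cn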

Abstract: As data sharing spreads across fields, protecting the individual privacy contained in shared data has become an increasingly prominent problem, and K-anonymity, an advanced privacy-protection theory, is widely used in data sharing and distribution. Because K-anonymity protects privacy by generalizing data, it inevitably causes some information loss. How to preserve data availability and reduce information loss as far as possible while still satisfying K-anonymity is therefore a question worth studying. To address it, a K-anonymity algorithm for numerical data is proposed: KABIBC (K-anonymous algorithm based on iterative binary clustering). First, the within-group sum of distance (WGSD) is defined, and all tuples in the data table are treated as a single cluster. This cluster is then split into two subclusters using an iterative strategy, and each resulting subcluster is processed recursively in the same way. During each bisection, the assignment of tuples to the two subclusters is adjusted on the principle of minimizing information loss. The process continues until the smallest subclusters that satisfy the K-anonymity requirement are obtained, so that the information loss tends toward the optimum. Theoretical and experimental analyses are given, showing that the mechanism effectively reduces information loss while running efficiently.

Key words: iterative optimization, binary clustering, privacy protection, K-anonymity, generalization
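
The recursive bisection outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KABIBC implementation, since the abstract gives no formulas or pseudocode. The sketch makes these assumptions: WGSD is taken to be the sum of pairwise Euclidean distances within a group; each bisection is seeded with the two mutually farthest tuples; and the tuple assignment is adjusted by choosing, among splits that leave at least k tuples on each side, the one with the smallest total WGSD. All function names (`wgsd`, `centroid`, `bisect_cluster`, `kabibc_sketch`, `generalize`) are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def wgsd(cluster):
    """Assumed WGSD: sum of pairwise Euclidean distances in a cluster."""
    return sum(dist(a, b) for a, b in combinations(cluster, 2))


def centroid(cluster):
    """Attribute-wise mean of the tuples in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def bisect_cluster(cluster, k, max_iter=10):
    """Split a cluster of at least 2*k tuples into two parts of size >= k."""
    n = len(cluster)
    # Assumed seeding: the two mutually farthest tuples.
    s1, s2 = max(combinations(cluster, 2), key=lambda p: dist(*p))
    for _ in range(max_iter):
        # Order tuples from "closest to s1" to "closest to s2".
        ranked = sorted(cluster, key=lambda t: dist(t, s1) - dist(t, s2))
        # Adjust the assignment: among cuts that keep >= k tuples on each
        # side, take the one with the smallest total WGSD.
        cut = min(range(k, n - k + 1),
                  key=lambda c: wgsd(ranked[:c]) + wgsd(ranked[c:]))
        left, right = ranked[:cut], ranked[cut:]
        new_seeds = (centroid(left), centroid(right))
        if new_seeds == (s1, s2):  # centers stopped moving
            break
        s1, s2 = new_seeds
    return left, right


def kabibc_sketch(table, k):
    """Recursively bisect until no cluster can be split into two of size >= k."""
    if len(table) < 2 * k:
        return [table]
    left, right = bisect_cluster(table, k)
    return kabibc_sketch(left, k) + kabibc_sketch(right, k)


def generalize(group):
    """Replace every tuple by the group's per-attribute (min, max) interval."""
    box = tuple((min(col), max(col)) for col in zip(*group))
    return [box] * len(group)


# Hypothetical numerical records: (age, monthly income).
records = [(25, 3000), (27, 3200), (26, 3100), (52, 8000),
           (50, 7900), (51, 8100), (38, 5000)]
groups = kabibc_sketch(records, k=3)
anonymized = [row for g in groups for row in generalize(g)]
```

Every group returned by `kabibc_sketch` holds between k and 2k - 1 tuples, provided the table has at least k tuples, so each generalized interval row is shared by at least k records. The attributes are used unscaled here; a practical run would probably normalize them first so that one attribute does not dominate the distance.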

CLC Number: