K-anonymity Mechanism Based on Iterative Binary Clustering

Journal of Information Security Research, 2023, Vol. 9, Issue 5: 402-.

Academic Paper

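Wang Tao¹, Tan Hu¹, Xu Tingting¹, Xin Baojiang¹, Liu Gang¹, Zhou Pan²

¹ (State Grid Shandong Electric Power Company Weifang Power Supply Company, Weifang, Shandong 261021)
² (School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074)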

Publication date: 2023-05-01  Release date: 2023-04-29
Corresponding author: Wang Tao, senior engineer. Main research interest: power system automation. twang1127@163.com
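About the authors:
Wang Tao, senior engineer. Main research interest: power system automation. twang1127@163.com
Tan Hu, master's degree, senior engineer. Main research interests: information and communication technology and network security. tanhu8621950@sina.com
Xu Tingting, senior engineer. Main research interest: data security. nydiaxt@sina.com
Xin Baojiang, master's degree, senior engineer. Main research interest: electric power big data applications. xinbaojiang@163.com
Liu Gang, engineer. Main research interest: electric power big data applications. lqdy0127@163.com
Zhou Pan, PhD, professor. Main research interests: data and network security and privacy. panzhou@hust.edu.cn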


Abstract

Abstract: As data sharing spreads across more fields, protecting the privacy of the individuals represented in the data has become an increasingly prominent problem. K-anonymity, an advanced theory of privacy protection, is widely used in data sharing and distribution. Because K-anonymity protects privacy by generalizing data, it inevitably causes some information loss, so how to preserve data utility and reduce information loss while still satisfying K-anonymity is a question worth studying. To address this problem for numerical data, this paper proposes KABIBC (K-anonymous algorithm based on iterative binary clustering), a K-anonymity algorithm based on iterative binary clustering. The within-group sum of distance (WGSD) is first defined, and all tuples in the data table are treated as a single cluster. This cluster is then split into two subclusters by an iterative strategy, and each resulting subcluster is processed recursively in the same way. During each bisection, the assignment of tuples to the two subclusters is adjusted according to the principle of minimizing information loss, until the smallest subclusters that still satisfy the K-anonymity requirement are obtained, so that the information loss tends toward the optimum. Theoretical and experimental analyses show that the mechanism effectively reduces information loss while also running efficiently.

Key words: iterative optimization, binary clustering, privacy protection, K-anonymity, generalization
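The recursive bisection described in the abstract can be illustrated with a minimal Python sketch. The abstract gives neither the WGSD formula nor the exact rule for adjusting tuple assignments, so everything below is an assumption made for illustration: WGSD is taken to be the sum of Euclidean distances from each tuple to the cluster centroid, the bisection is a plain 2-means loop, and the adjustment step simply moves the nearest tuples into a subcluster that has fewer than K members. The names `wgsd`, `bisect`, `kabibc_sketch`, and `generalize` are hypothetical and do not come from the paper.

```python
import math

def centroid(cluster):
    """Attribute-wise mean of a non-empty list of numeric tuples."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def wgsd(cluster):
    """Assumed WGSD: sum of Euclidean distances from each tuple to the centroid."""
    c = centroid(cluster)
    return sum(math.dist(t, c) for t in cluster)

def bisect(cluster, k, max_iter=50):
    """Split a cluster of at least 2k tuples into two parts with at least k tuples each."""
    # Seed with two far-apart tuples: the one farthest from the centroid,
    # then the one farthest from that first seed.
    c = centroid(cluster)
    seed_a = max(cluster, key=lambda t: math.dist(t, c))
    seed_b = max(cluster, key=lambda t: math.dist(t, seed_a))
    centers = [seed_a, seed_b]
    labels = None
    for _ in range(max_iter):
        # Assign every tuple to its nearer center.
        new_labels = [
            0 if math.dist(t, centers[0]) <= math.dist(t, centers[1]) else 1
            for t in cluster
        ]
        # Simplified adjustment: if one side has fewer than k tuples, move in
        # the tuples from the other side that lie closest to its center.
        for small in (0, 1):
            short = k - new_labels.count(small)
            if short > 0:
                donors = sorted(
                    (i for i, lab in enumerate(new_labels) if lab != small),
                    key=lambda i: math.dist(cluster[i], centers[small]),
                )
                for i in donors[:short]:
                    new_labels[i] = small
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        centers = [
            centroid([t for t, lab in zip(cluster, labels) if lab == side])
            for side in (0, 1)
        ]
    left = [t for t, lab in zip(cluster, labels) if lab == 0]
    right = [t for t, lab in zip(cluster, labels) if lab == 1]
    return left, right

def kabibc_sketch(table, k):
    """Bisect recursively until a cluster is too small to yield two parts of size k."""
    if len(table) < k:
        raise ValueError("need at least k tuples")
    if len(table) < 2 * k:
        return [table]
    left, right = bisect(table, k)
    return kabibc_sketch(left, k) + kabibc_sketch(right, k)

def generalize(cluster):
    """Replace every attribute value by the (min, max) range over the cluster."""
    ranges = tuple((min(col), max(col)) for col in zip(*cluster))
    return [ranges] * len(cluster)

# Hypothetical numeric table: (age, monthly income).
table = [(25, 3000), (27, 3200), (26, 3100), (52, 8000),
         (55, 8300), (51, 7900), (38, 5000), (40, 5200)]
for group in kabibc_sketch(table, k=2):
    print(len(group), generalize(group)[0], round(wgsd(group), 2))
```

Because a cluster is only split when it holds at least 2K tuples and each half is forced to keep at least K, every final group in this sketch contains between K and 2K-1 tuples, which is what makes the generalized output K-anonymous. An adjustment step that compares the resulting WGSD of candidate reassignments, as the abstract suggests, would replace the nearest-tuple rule used here.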

CLC number: TP391

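Wang Tao, Tan Hu, Xu Tingting, Xin Baojiang, Liu Gang, Zhou Pan. K-anonymity Mechanism Based on Iterative Binary Clustering[J]. Journal of Information Security Research, 2023, 9(5): 402-.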


