K-anonymity Mechanism Based on Iterative Binary Clustering

Abstract

As data sharing deepens across many fields, protecting the individual privacy contained in shared data has become an increasingly prominent concern. K-anonymity, an advanced privacy-protection theory, is widely used in data sharing and distribution. However, because K-anonymity protects privacy by generalizing data, it inevitably causes some information loss. How to preserve data availability and minimize information loss while still satisfying K-anonymity is therefore a question worth studying. To address this problem for numerical data, this paper proposes KABIBC (K-anonymous algorithm based on iterative binary clustering). First, the within-group sum of distance (WGSD) is defined, and all tuples in the data table are treated as a single cluster. An iterative strategy is then used to split this cluster in two, and the resulting subclusters are processed recursively in the same way. During each bisection, the assignment of tuples to the two subclusters is adjusted according to the principle of minimizing information loss. The recursion continues until the smallest subclusters that satisfy the K-anonymity requirement are obtained, so that the overall information loss tends toward the optimum. Theoretical and experimental analyses show that this mechanism effectively reduces information loss while maintaining high operating efficiency.
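The recursive bisection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' KABIBC implementation: the abstract does not give the exact WGSD formula or the adjustment rule, so the sketch assumes WGSD is the sum of Euclidean distances from each tuple to its cluster centroid, uses a 2-means-style split, and rebalances by moving the nearest tuples into any subcluster that falls below K. All function names (`wgsd`, `bisect`, `kabibc_sketch`, `generalize`) are hypothetical.

```python
import math

def centroid(cluster):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def wgsd(cluster):
    """Within-group sum of distance (assumed form: distance to the centroid)."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster)

def rebalance(small, large, target, k):
    """Move the tuples of `large` nearest to `target` into `small` until it holds k."""
    large.sort(key=lambda p: math.dist(p, target))
    need = k - len(small)
    small.extend(large[:need])
    del large[:need]

def bisect(cluster, k, max_iter=20):
    """Split a cluster of at least 2k tuples into two parts of at least k each."""
    # Seed with two mutually distant tuples (a heuristic chosen for this sketch).
    a = max(cluster, key=lambda p: math.dist(p, cluster[0]))
    b = max(cluster, key=lambda p: math.dist(p, a))
    left, right = [], []
    for _ in range(max_iter):
        left = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
        # Assumed adjustment rule: top up an undersized side with its nearest tuples.
        if len(left) < k:
            rebalance(left, right, a, k)
        elif len(right) < k:
            rebalance(right, left, b, k)
        new_a, new_b = centroid(left), centroid(right)
        if (new_a, new_b) == (a, b):  # centers stopped moving
            break
        a, b = new_a, new_b
    return left, right

def kabibc_sketch(table, k):
    """Recursively bisect until no cluster can be split into two parts of size >= k."""
    if len(table) < 2 * k:
        return [table]
    left, right = bisect(table, k)
    return kabibc_sketch(left, k) + kabibc_sketch(right, k)

def generalize(cluster):
    """Replace every tuple by the per-attribute (min, max) range of its cluster."""
    box = tuple((min(col), max(col)) for col in zip(*cluster))
    return [box] * len(cluster)

# Usage on a toy numerical table with two attributes (made-up values).
table = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 92), (40, 70)]
groups = kabibc_sketch(table, k=3)
for g in groups:
    print(len(g), round(wgsd(g), 2), generalize(g)[0])
```

Every resulting group contains at least K tuples, so publishing the generalized ranges instead of the raw values makes each tuple indistinguishable from at least K-1 others on these attributes. The input table must itself contain at least K tuples.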

Key words: iterative optimization, binary clustering, privacy protection, K-anonymity, generalization


CLC Number: TP391

王涛, 谭虎, 徐亭亭, 辛保江, 刘刚, 周潘. K-anonymity Mechanism Based on Iterative Binary Clustering [J]. Journal of Information Security Research, 2023, 9(5): 402-.

