Journal of Information Security Research ›› 2021, Vol. 7 ›› Issue (11): 1007-.

• Academic Paper •

Association Rules Mining for Privacy Protection

Fan Min¹ and Yang Geng¹,²

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023
  2. Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, Nanjing 210023
  • Online: 2021-11-07 Published: 2021-11-05
  • Corresponding author: Fan Min
  • About the authors: Fan Min, master's student. Main research interests: network and information security, privacy protection. 193474168@qq.com. Yang Geng, Ph.D., professor, doctoral supervisor. Main research interests: Internet of Things security, privacy protection, cloud computing security. yangg@njupt.edu.cn

Abstract: Association rule mining is an important task in data mining. To address privacy leakage during mining, and in particular to improve the utility of the mining results under a given level of privacy protection, this paper proposes TS-ARM, an association rule mining algorithm that satisfies differential privacy. TS-ARM uses a new transaction segmentation method to reduce the dimensionality of the data set: a dual-condition mechanism segments transactions that exceed the length threshold, where a segmentation must satisfy both the segmentation length threshold and the segmentation support threshold. Compared with the traditional transaction truncation method, this segmentation method lowers the sensitivity of the queries and the scale of the noise required, while effectively controlling the information loss caused by adding noise. Theoretical analysis and experiments show that the proposed algorithm preserves privacy and, under a given security level, improves the utility of the mining results.
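The contrast between truncation and segmentation can be illustrated with a minimal Python sketch. The abstract does not specify the TS-ARM segmentation rule, so everything here is an assumption made for illustration: the function names (`item_support`, `truncate`, `segment`, `laplace_scale`, `noisy_item_counts`), the choice to drop low-support items before cutting a long transaction into pieces, and the use of item counts as the query. The only standard fact relied on is that when every record holds at most `max_len` items, adding or removing one record changes the item-count vector by at most `max_len` in L1 norm, so Laplace noise of scale `max_len / epsilon` suffices for that query.

```python
import random
from collections import Counter


def item_support(transactions):
    """Count, for each item, how many transactions contain it."""
    return Counter(item for t in transactions for item in set(t))


def truncate(transaction, max_len):
    """Traditional truncation: keep the first max_len items, drop the rest."""
    return [list(transaction[:max_len])]


def segment(transaction, max_len, support, min_support):
    """Hypothetical dual-condition segmentation (not the paper's exact rule).

    Transactions within the length threshold are left alone. Longer ones
    first lose items whose support is below min_support, then the remaining
    items are cut into consecutive pieces of at most max_len items.
    """
    if len(transaction) <= max_len:
        return [list(transaction)]
    kept = [item for item in transaction if support[item] >= min_support]
    return [kept[i:i + max_len] for i in range(0, len(kept), max_len)]


def laplace_scale(max_len, epsilon):
    """Laplace scale = sensitivity / epsilon, with sensitivity = max_len."""
    return max_len / epsilon


def noisy_item_counts(transactions, max_len, epsilon, rng):
    """Add Laplace noise to each item count.

    The difference of two independent exponential samples with rate
    1 / scale follows a Laplace(0, scale) distribution.
    """
    scale = laplace_scale(max_len, epsilon)
    counts = item_support(transactions)
    return {
        item: count + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        for item, count in counts.items()
    }


if __name__ == "__main__":
    db = [["a", "b", "c", "d", "e"], ["a", "b"], ["a", "c"], ["e"]]
    support = item_support(db)
    truncated = [piece for t in db for piece in truncate(t, 2)]
    segmented = [piece for t in db for piece in segment(t, 2, support, 2)]
    print(sum(map(len, truncated)), "items kept by truncation")
    print(sum(map(len, segmented)), "items kept by segmentation")
    print(noisy_item_counts(segmented, 2, 1.0, random.Random(0)))
```

This sketch is not a differentially private algorithm as written. It reads item supports from the raw data, and one original transaction can contribute several pieces, so a real privacy analysis would need to account for both. It only shows why bounding transaction length shrinks the noise scale and why segmentation can retain more items than truncation at the same length threshold.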