Journal of Information Security Research, 2016, Vol. 2, Issue 3: 251-257.


Search Log Anonymity Publish Based on Differential Privacy and Classification Technique


  • Received: 2016-03-15; Online: 2016-03-15; Published: 2016-03-16



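  1. Beijing Information Science and Technology University
  • Corresponding author: Kang Haiyan (康海燕)
  • About the author: Ph.D., professor, and master's supervisor; main research interests are information system security, network privacy protection, and intelligent information processing.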

Abstract: Search log analysis is an important research area in data mining and machine learning, and protecting the private data contained in network search logs has become a major challenge. This paper proposes an anonymous publishing method for search logs that combines classification-based anonymization with differential privacy. First, k-anonymity and classification-based anonymization are extended to clustering: quasi-identifier attributes are generalized by classification to guide cluster formation, and a proposed query-term similarity measure improves clustering accuracy. Second, exponential noise is added within each cluster so that the published data satisfy differential privacy. Finally, the protected data set is released. Experiments show that the method prevents leakage of sensitive information, protects the private data in network search logs, and improves data utility.

Key words: differential privacy, privacy preserving, search log, data publishing, classification technique
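The abstract says quasi-identifier attributes are clustered under a k-anonymity constraint, with a query-term similarity measure steering which records end up together. The abstract does not give that measure, so the minimal Python sketch below assumes a hypothetical one (Jaccard similarity over lower-cased query words) and a simple greedy k-member clustering in which every cluster holds at least k records. The names `query_similarity` and `k_member_clusters` are illustrative, not the authors'; the classification-based generalization of other quasi-identifier attributes is omitted.

```python
from typing import List, Set


def query_similarity(q1: str, q2: str) -> float:
    """Hypothetical query-term similarity: Jaccard index of the two queries'
    lower-cased word sets (a stand-in, not the paper's own measure)."""
    a: Set[str] = set(q1.lower().split())
    b: Set[str] = set(q2.lower().split())
    if not a and not b:
        return 1.0  # two empty queries are treated as identical
    return len(a & b) / len(a | b)


def k_member_clusters(queries: List[str], k: int) -> List[List[int]]:
    """Greedy k-member clustering: returns clusters of record indices, each
    holding at least k records, so after in-cluster generalization every
    record is indistinguishable from at least k - 1 others."""
    if k < 1 or len(queries) < k:
        raise ValueError("need k >= 1 and at least k records")
    unassigned = list(range(len(queries)))
    clusters: List[List[int]] = []
    while len(unassigned) >= k:
        seed = unassigned.pop(0)
        # Rank the remaining records by similarity to the seed (stable sort,
        # so ties keep their original order) and take the k - 1 closest.
        unassigned.sort(
            key=lambda j: query_similarity(queries[seed], queries[j]),
            reverse=True,
        )
        clusters.append([seed] + unassigned[: k - 1])
        unassigned = unassigned[k - 1:]
    # Fewer than k records remain: attach each to the cluster whose seed
    # query it most resembles, so no cluster drops below k.
    for j in unassigned:
        best = max(clusters, key=lambda c: query_similarity(queries[c[0]], queries[j]))
        best.append(j)
    return clusters
```

For example, with the six queries `["cheap flights beijing", "python tutorial", "beijing flights", "learn python", "flights to beijing", "python list sort"]` and `k=3`, the sketch returns `[[0, 2, 4], [1, 3, 5]]`, grouping the flight queries and the Python queries. Greedy seeding is a simplification chosen for brevity; it does not minimize information loss the way a tuned clustering would.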

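Abstract: Search log analysis is an important topic in data mining and machine learning, and the security of private data in network search logs has become a major current challenge. This paper proposes an anonymous publishing method for search logs that combines classification-based anonymization with differential privacy. First, the ideas of k-anonymity and classification-based anonymization are extended to clustering: quasi-identifier attributes are generalized by classification to guide cluster formation, and the proposed query-term similarity measure effectively improves clustering accuracy. Second, exponential noise is added within each cluster so that the published data satisfy differential privacy. Finally, the processed data are published. Experiments show that the method effectively prevents leakage of sensitive information in search logs and improves the practical utility of the data.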

Key words: differential privacy, privacy preservation, network search log, data publishing, classification technique
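The abstract states that exponential noise is added within each cluster so the release satisfies differential privacy, but it does not spell out the mechanism. One standard way to do this is the exponential mechanism, which selects an output c with probability proportional to exp(εu(c)/(2Δu)). The Python sketch below assumes that mechanism, a public candidate domain of queries fixed in advance, and in-cluster query frequency as the utility u (so Δu = 1 when one record is added or removed). `selection_probabilities` and `release_cluster` are hypothetical names, and the sketch covers only the selection step inside one cluster, not a privacy guarantee for the whole pipeline.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional


def selection_probabilities(utility: Dict[str, float], epsilon: float,
                            sensitivity: float = 1.0) -> Dict[str, float]:
    """Exponential mechanism: P(c) is proportional to
    exp(epsilon * u(c) / (2 * sensitivity))."""
    if not utility:
        raise ValueError("candidate domain must be non-empty")
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(utility.values())
    # Shifting every utility by the maximum leaves the probabilities
    # unchanged but keeps math.exp from overflowing.
    weights = {c: math.exp(epsilon * (u - top) / (2 * sensitivity))
               for c, u in utility.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def release_cluster(cluster_queries: List[str], domain: List[str],
                    epsilon: float, rng: Optional[random.Random] = None) -> str:
    """Hypothetical per-cluster release: publish one representative query
    chosen from a public, data-independent domain. Utility is the query's
    frequency in the cluster; adding or removing one record changes one
    count by 1, so the sensitivity is 1."""
    counts = Counter(cluster_queries)
    utility = {q: float(counts[q]) for q in domain}  # absent queries score 0
    probs = selection_probabilities(utility, epsilon, sensitivity=1.0)
    rng = rng or random.Random()
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]
```

With a small ε the published representative is frequently not the most common query in the cluster; a larger ε tracks the true mode more closely at the cost of weaker privacy. Fixing the candidate domain in advance matters: deriving it from the cluster's own queries would leak which queries are present.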