Journal of Information Security Research (信息安全研究) ›› 2024, Vol. 10 ›› Issue (2): 130-.

• Special Topic on Artificial Intelligence Security •

Model Inversion of Voiceprint Recognition Systems Based on the Divide-and-Conquer Method

Zhang Junfei, Zhang Xiongwei, Sun Meng

  1. (College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210001)

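  • Online: 2024-02-21 Published: 2024-02-26
  • About the authors: Zhang Junfei, master's student. Main research interests: information content security and intelligent speech processing. junfeizh@163.com Zhang Xiongwei, Ph.D., professor. Main research interests: speech and image processing, intelligent information processing. xwzhang9898@163.com Sun Meng, Ph.D., associate professor. Main research interests: intelligent speech processing and machine learning. sunmeng@aeu.edu.cn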

Model Inversion of Voiceprint Recognition System Based on Divide-and-Conquer Method

Zhang Junfei, Zhang Xiongwei, and Sun Meng

  1. (College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210001)

  • Online: 2024-02-21 Published: 2024-02-26

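Abstract (translated from the Chinese): Model inversion is drawing growing attention to privacy: it can reconstruct private data from a model and thereby cause more serious information security problems. Targeting speech information security, this work makes a first attempt at a new model inversion application: extracting the spectrogram features of a speaker's speech from a voiceprint recognition system. To reduce the complexity and error of the inversion process, the model is inverted layer by layer following the divide-and-conquer idea, and with effective cycle-consistency supervision, inversion samples consistent with the speaker's identity are successfully reconstructed. Moreover, because of the particular nature of speech, the model's feature layers already contain rich speaker information; after the similarity of semantic information is further weakened, the improved method significantly increases the recognition accuracy of the inversion samples, showing that the inverted spectrograms contain information that effectively represents speaker identity. The experimental results demonstrate the feasibility of model inversion on spectrograms and highlight the risk of private information leakage posed by deep network models that extract such speech features.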

Key words (translated from the Chinese): model inversion, neural network, voiceprint recognition, spectrogram, information security

Abstract: Model inversion (MI) has raised increasing privacy concerns: it can reconstruct private data from a recognition or classification model, leading to more serious information security problems. For speech information security, this paper makes the first attempt at a new model inversion application: extracting spectrogram features of a speaker's speech from a voiceprint recognition system. To reduce the complexity and error of the inversion process, the paper adopts a divide-and-conquer approach that inverts the model layer by layer and, under the effective supervision of cycle consistency, successfully reconstructs inversion samples consistent with the speaker's identity. In addition, owing to the particular nature of speech, the model's feature layers already contain rich speaker information; after further weakening the similarity of semantic information, the improved method significantly raises the recognition accuracy of the inversion samples, indicating that the inverted spectrograms contain information that effectively represents the speaker's identity. The results show that MI of the recognition model is feasible on spectrogram features, highlighting the risk of private information leakage posed by deep network models that extract such speech features.

Key words: model inversion, neural network, voiceprint recognition, spectrogram, information security

CLC number:
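
The abstract describes inverting the model layer by layer (divide and conquer) under cycle-consistency supervision. The following is a minimal toy sketch of that general idea, not the authors' implementation: it assumes a hypothetical two-layer "model" made of scalar affine layers, and every name and value in it (`LAYERS`, `PROBES`, `fit_layer_inverse`, the learning rate and step count) is an illustrative assumption. Each layer gets its own small inverse, fitted so that passing the inverse's output back through the layer reproduces the feature it started from; the per-layer inverses are then chained from the output back to the input.

```python
# Toy sketch of divide-and-conquer (layer-by-layer) inversion with a
# cycle-consistency loss. Scalar affine layers stand in for real network
# layers; all names and numbers are hypothetical.

# Hypothetical "model": two affine layers, each h -> w * h + c.
LAYERS = [(2.0, 1.0), (-0.5, 3.0)]

# Probe features used to fit each per-layer inverse (assumed values).
PROBES = [i / 10.0 for i in range(-10, 11)]


def forward(x):
    """Run the toy model: apply every layer in order."""
    for w, c in LAYERS:
        x = w * x + c
    return x


def fit_layer_inverse(w, c, probes, lr=0.1, steps=2000):
    """Fit g(h) = a * h + b for one layer by gradient descent.

    The loss is the cycle-consistency error (layer(g(h)) - h) ** 2,
    averaged over the probes: the inverted sample, pushed back through
    the layer, should reproduce the feature it was inverted from.
    """
    a, b = 0.0, 0.0
    n = len(probes)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for h in probes:
            residual = w * (a * h + b) + c - h  # layer(g(h)) - h
            grad_a += 2.0 * residual * w * h / n
            grad_b += 2.0 * residual * w / n
        a -= lr * grad_a
        b -= lr * grad_b
    return lambda h: a * h + b


# Divide: fit one inverse per layer, each independently of the others.
INVERSES = [fit_layer_inverse(w, c, PROBES) for w, c in LAYERS]


def invert(y):
    """Conquer: chain the per-layer inverses from the last layer to the first."""
    for g in reversed(INVERSES):
        y = g(y)
    return y


if __name__ == "__main__":
    x = 0.3
    y = forward(x)
    print("input:", x, "output:", y, "reconstructed input:", invert(y))
```

Fitting each inverse separately keeps every sub-problem small, which is the motivation the abstract gives for the layer-by-layer approach. In a real attack the layers would be nonlinear network blocks and the inverses would be learned networks, so exact recovery as in this toy case should not be expected.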