[1] Gartner. Hype cycle for data, analytics and AI in China [EB/OL]. 2023 [2024-01-20]. https://www.gartner.com/en/documents/4538299
[2] McAfee. Artificial intelligence voice scams on the rise with 1 in 4 adults impacted [EB/OL]. 2023 [2024-01-20]. https://www.businesswire.com/news/home/20230501005587/en/Artificial-Intelligence-Voice-Scams-on-the-Rise-with-1-in-4-Adults-Impacted
[3] Coker C H. A model of articulatory dynamics and control [J]. Proceedings of the IEEE, 1976, 64(4): 452-460
[4] Klatt D H. Software for a cascade/parallel formant synthesizer [J]. The Journal of the Acoustical Society of America, 1980, 67(3): 971-995
[5] Charpentier F, Moulines E. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones [J]. Speech Communication, 1990, 9(5-6): 453-467
[6] Tokuda K, Yoshimura T, Masuko T, et al. Speech parameter generation algorithms for HMM-based speech synthesis [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2000: 1315-1318
[7] van den Oord A, Dieleman S, Zen H, et al. WaveNet: A generative model for raw audio [J]. arXiv preprint, arXiv:1609.03499, 2016
[8] Ren Yi, Hu Chenxu, Tan Xu, et al. FastSpeech 2: Fast and high-quality end-to-end text to speech [J]. arXiv preprint, arXiv:2006.04558, 2020
[9] Khachatryan L, Movsisyan A, Tadevosyan V, et al. Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators [J]. arXiv preprint, arXiv:2303.13439, 2023
[10] Shuang Z W, Bakis R, Shechtman S, et al. Frequency warping based on mapping formant parameters [C] Proc of the 9th Int Conf on Spoken Language Processing. Grenoble, France: ISCA, 2006: 2290-2293
[11] Kaneko T, Kameoka H. Parallel-data-free voice conversion using cycle-consistent adversarial networks [J]. arXiv preprint, arXiv:1711.11293, 2017
[12] Hsu C C, Hwang H T, Wu Y C, et al. Voice conversion from non-parallel corpora using variational autoencoder [C] Proc of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conf. Hawaii: APSIPA, 2016: 1-6
[13] Shi Z. A survey on audio synthesis and audio-visual multimodal processing [J]. arXiv preprint, arXiv:2108.00443, 2021
[14] Kirchhuebel C, Brown G. Spoofed speech from the perspective of a forensic phonetician [C] Proc of Interspeech. Grenoble, France: ISCA, 2022: 1308-1312
[15] Todisco M, Delgado H, Lee K A, et al. Integrated presentation attack detection and automatic speaker verification: Common features and Gaussian back-end fusion [C] Proc of Interspeech. Grenoble, France: ISCA, 2018: 77-81
[16] Todisco M, Delgado H, Evans N W. A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients [C/OL] Proc of Odyssey. 2016 [2024-01-16]. https://www.odyssey2016.org
[17] Wu Z, Chng E S, Li H. Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition [C] Proc of Interspeech. Grenoble, France: ISCA, 2012: 1700-1703
[18] Gomez-Alanis A, Peinado A M, Gonzalez J A, et al. A light convolutional GRU-RNN deep feature extractor for ASV spoofing detection [C] Proc of Interspeech. Grenoble, France: ISCA, 2019: 1068-1072
[19] Baevski A, Zhou Y, Mohamed A, et al. wav2vec 2.0: A framework for self-supervised learning of speech representations [J]. Advances in Neural Information Processing Systems, 2020, 33: 12449-12460
[20] Lv Z, Zhang S, Tang K, et al. Fake audio detection based on unsupervised pretraining models [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2022: 9231-9235
[21] Patel T B, Patil H A. Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs spoofed speech [C] Proc of Interspeech. Grenoble, France: ISCA, 2015: 2062-2066
[22] Novoselov S, Kozlov A, Lavrentyeva G, et al. STC anti-spoofing systems for the ASVspoof 2015 challenge [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2016: 5475-5479
[23] Wu Z, Das R K, Yang J, et al. Light convolutional neural network with feature genuinization for detection of synthetic speech attacks [J]. arXiv preprint, arXiv:2009.09637, 2020
[24] Tak H, Jung J W, Patino J, et al. End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection [J]. arXiv preprint, arXiv:2107.12710, 2021
[25] Jung J W, Heo H S, Tak H, et al. AASIST: Audio anti-spoofing using integrated spectro-temporal graph attention networks [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2022: 6367-6371
[26] Zheng Fang, Xu Mingxing, Cheng Xingliang. Data feature extraction method, recording replay detection method, storage medium, and electronic device: China, ZL201910646885.5 [P]. 2019-11-05
[27] Sun Chengcheng. Research on network security governance countermeasures [J]. Netinfo Security, 2023, 23(6): 104-110