[1] Gartner. Hype cycle for data, analytics and AI in China [EB/OL]. 2023 [2024-01-20]. https://www.gartner.com/en/documents/4538299
[2] McAfee. Artificial intelligence voice scams on the rise with 1 in 4 adults impacted [EB/OL]. 2023 [2024-01-20]. https://www.businesswire.com/news/home/20230501005587/en/Artificial-Intelligence-Voice-Scams-on-the-Rise-with-1-in-4-Adults-Impacted
[3] Coker C H. A model of articulatory dynamics and control [J]. Proceedings of the IEEE, 1976, 64(4): 452-460
[4] Klatt D H. Software for a cascade/parallel formant synthesizer [J]. The Journal of the Acoustical Society of America, 1980, 67(3): 971-995
[5] Charpentier F, Moulines E. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones [J]. Speech Communication, 1990, 9(5-6): 453-467
[6] Tokuda K, Yoshimura T, Masuko T, et al. Speech parameter generation algorithms for HMM-based speech synthesis [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2000: 1315-1318
[7] van den Oord A, Dieleman S, Zen H, et al. WaveNet: A generative model for raw audio [J]. arXiv preprint, arXiv:1609.03499, 2016
[8] Ren Yi, Hu Chenxu, Tan Xu, et al. FastSpeech 2: Fast and high-quality end-to-end text to speech [J]. arXiv preprint, arXiv:2006.04558, 2020
[9] Khachatryan L, Movsisyan A, Tadevosyan V, et al. Text2Video-Zero: Text-to-image diffusion models are zero-shot video generators [J]. arXiv preprint, arXiv:2303.13439, 2023
[10] Shuang Z W, Bakis R, Shechtman S, et al. Frequency warping based on mapping formant parameters [C] Proc of the 9th Int Conf on Spoken Language Processing. Grenoble, France: ISCA, 2006: 2290-2293
[11] Kaneko T, Kameoka H. Parallel-data-free voice conversion using cycle-consistent adversarial networks [J]. arXiv preprint, arXiv:1711.11293, 2017
[12] Hsu C C, Hwang H T, Wu Y C, et al. Voice conversion from non-parallel corpora using variational autoencoder [C] Proc of the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conf. Hawaii: APSIPA, 2016: 1-6
[13] Shi Z. A survey on audio synthesis and audio-visual multimodal processing [J]. arXiv preprint, arXiv:2108.00443, 2021
[14] Kirchhuebel C, Brown G. Spoofed speech from the perspective of a forensic phonetician [C] Proc of Interspeech. Grenoble, France: ISCA, 2022: 1308-1312
[15] Todisco M, Delgado H, Lee K A, et al. Integrated presentation attack detection and automatic speaker verification: Common features and Gaussian back-end fusion [C] Proc of Interspeech. Grenoble, France: ISCA, 2018: 77-81
[16] Todisco M, Delgado H, Evans N W. A new feature for automatic speaker verification anti-spoofing: Constant Q cepstral coefficients [C/OL] Proc of Odyssey. 2016 [2024-01-16]. https://www.odyssey2016.org
[17] Wu Z, Chng E S, Li H. Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition [C] Proc of Interspeech. Grenoble, France: ISCA, 2012: 1700-1703
[18] Gomez-Alanis A, Peinado A M, Gonzalez J A, et al. A light convolutional GRU-RNN deep feature extractor for ASV spoofing detection [C] Proc of Interspeech. Grenoble, France: ISCA, 2019: 1068-1072
[19] Baevski A, Zhou Y, Mohamed A, et al. wav2vec 2.0: A framework for self-supervised learning of speech representations [J]. Advances in Neural Information Processing Systems, 2020, 33: 12449-12460
[20] Lv Z, Zhang S, Tang K, et al. Fake audio detection based on unsupervised pretraining models [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2022: 9231-9235
[21] Patel T B, Patil H A. Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs spoofed speech [C] Proc of Interspeech. Grenoble, France: ISCA, 2015: 2062-2066
[22] Novoselov S, Kozlov A, Lavrentyeva G, et al. STC anti-spoofing systems for the ASVspoof 2015 challenge [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2016: 5475-5479
[23] Wu Z, Das R K, Yang J, et al. Light convolutional neural network with feature genuinization for detection of synthetic speech attacks [J]. arXiv preprint, arXiv:2009.09637, 2020
[24] Tak H, Jung J W, Patino J, et al. End-to-end spectro-temporal graph attention networks for speaker verification anti-spoofing and speech deepfake detection [J]. arXiv preprint, arXiv:2107.12710, 2021
[25] Jung J W, Heo H S, Tak H, et al. AASIST: Audio anti-spoofing using integrated spectro-temporal graph attention networks [C] Proc of the Int Conf on Acoustics, Speech, and Signal Processing. Piscataway, NJ: IEEE, 2022: 6367-6371
[26] Zheng Fang, Xu Mingxing, Cheng Xingliang. Data feature extraction method, recording replay detection method, storage medium, and electronic device: China, ZL201910646885.5 [P]. 2019-11-05
[27] Sun Chengcheng. Research on network security governance countermeasures [J]. Netinfo Security, 2023, 23(6): 104-110