A Survey of Deepfake Generation and Detection Techniques

Abstract

Abstract: Deepfake technology, which has emerged in recent years, can tamper with or generate highly realistic audio and video content that is difficult to distinguish from genuine material, and it has been widely used for both benign and malicious purposes. Researchers in China and abroad have studied deepfake generation and detection in depth and have proposed corresponding generation and detection schemes. This paper presents a comprehensive overview and detailed analysis of existing deep-learning-based audio and video deepfake generation techniques, detection techniques, and datasets, as well as future research directions. It is intended to help readers understand deepfakes and to support research on defending against and detecting malicious deepfakes.

Key words: deep learning, deepfakes, generation technology, detection technology

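Zhang Yuzhi, Wang Ruifang, Zhu Liang, Zhao Kunyuan, Liu Mengqi. A Survey of Deepfake Generation and Detection Techniques[J]. Journal of Information Security Research (信息安全研究), 2022, 8(3): 258-.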
