Journal of Information Security Research (信息安全研究) ›› 2024, Vol. 10 ›› Issue (1): 25-.

• Academic Papers •

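Face Spoofing Detection Model Fusing Convolutional Neural Network and Transformer

Huang Ling1, He Xiping1,2, He Dan1, Yang Chutian1, Kuang Qixian1

  1. School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing 400067
  2. Chongqing Engineering Laboratory for Detection, Control and Integrated System (Chongqing Technology and Business University), Chongqing 400067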

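  • Publication date: 2024-01-10  Release date: 2024-01-21
  • Corresponding author: Huang Ling, master's student. Main research interests: computer vision, deep learning, and liveness detection. 2207346808@qq.com
  • About the authors:
    Huang Ling, master's student. Main research interests: computer vision, deep learning, and liveness detection. 2207346808@qq.com
    He Xiping, PhD, professor. Main research interests: computer vision, machine learning, big data, and pattern recognition. jsjhxp@ctbu.edu.cn
    He Dan, master's student. Main research interests: computer vision, deep learning, and liveness detection. 2020613001@email.ctbu.edu.cn
    Yang Chutian, master's student. Main research interests: computer vision, deep learning, and face attribute editing. chutian_yang@163.com
    Kuang Qixian, master's student. Main research interests: computer vision, deep learning, and face attribute editing. 1363164655@qq.com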

Face Spoofing Detection Model with Fusion of Convolutional Neural Network and Transformer

Huang Ling1, He Xiping1,2, He Dan1, Yang Chutian1, and Kuang Qixian1

  1. School of Artificial Intelligence, Chongqing Technology and Business University, Chongqing 400067
  2. Chongqing Engineering Laboratory for Detection, Control and Integrated System (Chongqing Technology and Business University), Chongqing 400067

  • Online:2024-01-10 Published:2024-01-21

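Abstract: In face anti-spoofing, most existing detection models are based on convolutional neural networks (CNNs). These methods can learn to recognize faces with relatively few parameters, but their receptive fields are local. Transformer-based methods perceive global context, but their parameter counts and computational costs are very large, which prevents wide deployment on mobile or edge devices. To address these problems, a face spoofing detection model that fuses CNN and Transformer is proposed, aiming to balance parameter count and accuracy while retaining the ability to extract both global and local facial features. First, local face image patches are cropped and selected as input, which effectively avoids overfitting. Second, a feature extraction module based on coordinate attention is designed. Finally, a fused CNN-Transformer module is designed, which extracts local and global image features through local-global-local information exchange. Experimental results show that the model achieves an accuracy of 99.31% and an average error rate of 0.54% on the CASIA-SURF (Depth modality) dataset, and even achieves a zero error rate on the CASIA-FASD and Replay-Attack datasets, with a model parameter size of only 0.59 MB, far smaller than Transformer-family models.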
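
The abstract states that local face patches are cropped and used as input, but it gives no details of the cropping scheme. As a rough illustration only, the sketch below assumes uniformly random square patches taken from a single-channel image stored as a list of rows. The function name `random_patch`, the patch size, and the toy image are hypothetical and are not taken from the paper.

```python
import random


def random_patch(image, patch_size, rng=None):
    """Return a random patch_size x patch_size crop of a 2D image.

    `image` is a list of equal-length rows (e.g. a toy depth map).
    Random square cropping is an assumption made for illustration;
    the paper's actual cropping strategy is not described in the abstract.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size exceeds image dimensions")
    # Pick the top-left corner so the patch stays inside the image.
    top = rng.randint(0, height - patch_size)
    left = rng.randint(0, width - patch_size)
    return [row[left:left + patch_size] for row in image[top:top + patch_size]]


# Toy 8x8 "depth map" whose pixel value encodes its position (row * 8 + col).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patch = random_patch(image, 4, random.Random(0))
```

Training on several such patches per face, rather than on the full image, is one common way to enlarge the effective training set, which is consistent with the overfitting motivation given in the abstract.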

Key words: face spoofing detection, CNN, Transformer, model fusion, attention mechanism

Abstract: In the field of face anti-spoofing, methods based on convolutional neural networks (CNNs) can learn feature representations with fewer parameters, yet their receptive fields remain local. In contrast, Transformer-based methods offer global perception but require a very large number of parameters and computations, which hinders widespread deployment on mobile or edge devices. To address these challenges, this paper proposes a face spoofing detection model that integrates CNN and Transformer, aiming to balance the number of parameters against accuracy while maintaining the ability to extract global and local features. First, local face images are cropped and selected as input to effectively avoid overfitting. Second, a feature extraction module based on coordinate attention is designed. Finally, a fused CNN-Transformer module is designed to extract local and global image features through local-global-local information exchange. The experimental results show that the model achieves an accuracy of 99.31% and an average error rate of 0.54% on the CASIA-SURF (Depth modality) dataset. Moreover, a zero error rate is achieved on the CASIA-FASD and Replay-Attack datasets, and the model parameters occupy only 0.59 MB, much smaller than Transformer-series models.
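
The "average error rate" reported above is not defined in the abstract. A minimal sketch follows, assuming it refers to the average classification error rate (ACER) commonly used in face anti-spoofing work: the mean of APCER (attack samples accepted as live) and BPCER (live samples rejected as attacks). All attack types are pooled together here for simplicity. The function name `acer_metrics` and the toy labels are hypothetical.

```python
def acer_metrics(labels, predictions):
    """Compute (APCER, BPCER, ACER) for binary liveness decisions.

    Convention assumed here: 1 = bona fide (live) face, 0 = attack.
    APCER: fraction of attack samples classified as live.
    BPCER: fraction of live samples classified as attack.
    ACER:  mean of APCER and BPCER.
    """
    attack_preds = [p for y, p in zip(labels, predictions) if y == 0]
    live_preds = [p for y, p in zip(labels, predictions) if y == 1]
    if not attack_preds or not live_preds:
        raise ValueError("need at least one attack and one live sample")
    apcer = sum(1 for p in attack_preds if p == 1) / len(attack_preds)
    bpcer = sum(1 for p in live_preds if p == 0) / len(live_preds)
    return apcer, bpcer, (apcer + bpcer) / 2


# Toy example: 4 live samples followed by 4 attack samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 1, 1]
apcer, bpcer, acer = acer_metrics(labels, predictions)
# Here 2 of 4 attacks are accepted and 1 of 4 live faces is rejected,
# so APCER = 0.5, BPCER = 0.25, ACER = 0.375.
```

Under this definition, the zero error rate reported on CASIA-FASD and Replay-Attack would mean that no attack was accepted and no live face was rejected on the test data.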

Key words: face spoofing detection, CNN, Transformer, model fusion, attention mechanism

CLC number: