Journal of Information Security Research ›› 2024, Vol. 10 ›› Issue (8): 729-.


Behavior Conflict Detection Model Based on Transformer and Graph Convolution Networks

Wen Jin1, Jiang Kaiyuan2, Han Yuyang1, Wang Zhiqiang1, Luo Leqi1, and Tian Wenliang1

  1 (Beijing Electronic Science and Technology Institute, Beijing 100070)
  2 (National Center for Public Credit Information, Beijing 100045)

  • Online: 2024-08-08  Published: 2024-08-08


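  • Corresponding author: Wang Zhiqiang (王志强), PhD, associate professor and master's supervisor. Main research interests: vulnerability discovery, malware detection, and AI security. wangzq@besti.edu.cn
  • About the authors:
    Wen Jin (文津), master's degree. Main research interests: action recognition and AI security. 20212818@mail.besti.edu.cn
    Jiang Kaiyuan (蒋凯元), professor-level senior engineer. Main research interests: e-government informatization, data management, and social credit system construction. jiangky@creditchina.gov.cn
    Han Yuyang (韩禹洋), PhD candidate. Main research interests: vulnerability mining, deepfake detection, and AI security. 20232005@mail.besti.edu.cn
    Wang Zhiqiang (王志强), PhD, associate professor and master's supervisor. Main research interests: vulnerability discovery, malware detection, and AI security. wangzq@besti.edu.cn
    Luo Leqi (罗乐琦), master's degree. Main research interests: vulnerability mining and network attack and defense. 20211909@mail.besti.edu.cn
    Tian Wenliang (田文亮), master's degree. Main research interest: private set intersection. 1070274287@qq.com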

Abstract: In recent years, the growing number of surveillance cameras and the rapid development of the Internet have produced ever more surveillance and online video. Automatically detecting behavior conflict in videos reduces the risk of private information leaking through manual review, and helps maintain public order and keep the online environment clean. To fully extract behavior-conflict features from videos and obtain a model with good generalization ability and detection performance, we use I3D (inflated 3D convolutional network) and VGGish to extract multimodal features from the XD-Violence dataset, and propose TGBCDM, a behavior conflict detection model based on Transformer and graph convolution networks. The model contains a Transformer encoder module and a graph convolution module, which capture long-range dependencies in videos while attending to both the global and the local information in the video features. Experiments show that the model outperforms eight existing methods.

Key words: behavior conflict detection, action recognition, multimodal feature fusion, Transformer, graph convolution networks
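
The abstract names the model's building blocks (multimodal snippet features, a Transformer encoder module, a graph convolution module) but not how they are wired together. The NumPy sketch below is therefore only an illustration of how such pieces can be combined, not the authors' TGBCDM implementation. Everything specific in it is an assumption: the function names, the single-head attention, fusion by concatenation, the similarity-based and proximity-based adjacency matrices, the toy feature sizes, and the random weights and inputs that stand in for trained parameters and for real I3D and VGGish features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T snippets."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (T, T) pairwise snippet affinities
    return softmax(scores) @ v

def similarity_adjacency(x):
    """Row-normalized graph from cosine similarity (assumed 'global' branch)."""
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = np.maximum(unit @ unit.T, 0.0)     # keep only non-negative similarities
    return sim / (sim.sum(axis=1, keepdims=True) + 1e-8)

def proximity_adjacency(t, sigma=1.0):
    """Row-normalized graph from temporal distance (assumed 'local' branch)."""
    idx = np.arange(t)
    adj = np.exp(-np.abs(idx[:, None] - idx[None, :]) / sigma)
    return adj / adj.sum(axis=1, keepdims=True)

def graph_conv(x, adj, w):
    """One graph convolution layer: ReLU(A X W)."""
    return np.maximum(adj @ x @ w, 0.0)

def score_snippets(visual, audio, p):
    """Return one conflict score in (0, 1) per video snippet."""
    x = np.concatenate([visual, audio], axis=1)                   # fuse the two modalities
    h = x + self_attention(x, p["w_q"], p["w_k"], p["w_v"])       # residual connection
    g_sim = graph_conv(h, similarity_adjacency(h), p["w_sim"])
    g_prox = graph_conv(h, proximity_adjacency(len(h)), p["w_prox"])
    logits = np.concatenate([g_sim, g_prox], axis=1) @ p["w_out"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))                    # sigmoid

# Toy sizes and random values, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
T, d_vis, d_aud, d_k, d_hid = 8, 16, 4, 8, 6
d = d_vis + d_aud
params = {
    "w_q": rng.normal(scale=0.1, size=(d, d_k)),
    "w_k": rng.normal(scale=0.1, size=(d, d_k)),
    "w_v": rng.normal(scale=0.1, size=(d, d)),
    "w_sim": rng.normal(scale=0.1, size=(d, d_hid)),
    "w_prox": rng.normal(scale=0.1, size=(d, d_hid)),
    "w_out": rng.normal(scale=0.1, size=(2 * d_hid, 1)),
}
visual = rng.normal(size=(T, d_vis))  # stand-in for per-snippet visual features
audio = rng.normal(size=(T, d_aud))   # stand-in for per-snippet audio features
scores = score_snippets(visual, audio, params)
print(scores.shape)  # (8,)
```

In this sketch the attention step lets every snippet attend to every other snippet, which is one way to model long-range dependencies, while the two graph branches aggregate over feature-similar and temporally adjacent snippets respectively.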

