Journal of Information Security Research, 2026, Vol. 12, Issue (5): 463-.


A Compressionrobust Video Watermarking Method Based on Multiscale Convolutional Attention and Dualbranch Adversarial Training#br#
#br#

Zhu Shunzhe¹, Niu Ke¹,², Lu Yihang¹, Xu Qianhui¹, and Li Jun¹,²

  ¹ School of Cryptography Engineering, Engineering University of PAP, Xi’an 710016
  ² Key Laboratory of Information Security of People’s Armed Police (Engineering University of PAP), Xi’an 710016
  • Online: 2026-05-23   Published: 2026-05-23

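  • Corresponding author: Niu Ke, Ph.D., professor. Main research interests: information hiding and multimedia security. niuke@163.com
  • About the authors:
    Zhu Shunzhe, master’s student. Main research interests: video information hiding and artificial intelligence. 18700909363@163.com
    Niu Ke, Ph.D., professor. Main research interests: information hiding and multimedia security. niuke@163.com
    Lu Yihang, master’s student. Main research interests: video information hiding and artificial intelligence. luyihang135@163.com
    Xu Qianhui, master’s student. Main research interests: video information hiding and artificial intelligence. xu2000qianhui@163.com
    Li Jun, Ph.D., lecturer. Main research interest: information hiding. lijun9250lj@163.com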

Abstract: To overcome the limitations of current deep-learning-based video watermarking methods, namely reliance on single-scale feature extraction, functionally limited adversarial training mechanisms, and insufficient robustness against compression, this paper proposes MSCAGAN (multiscale convolutional attention generative adversarial network), a robust video watermarking model that integrates a multiscale convolutional attention mechanism with a dual-branch adversarial training framework. The model uses a lightweight multiscale attention module to extract key features from video frames at scales ranging from local to global. Combined with depthwise separable convolution, this module reduces computational complexity while precisely locating the regions where the watermark is embedded and controlling the embedding strength, thereby improving invisibility. The paper also introduces a novel dual-branch adversarial training structure in which a learnable adversary network simulates real-world attacks, strengthening the model’s robustness against common threats such as compression, cropping, and scaling. Experimental results show that the watermarked videos generated by MSCAGAN achieve an average peak signal-to-noise ratio (PSNR) of 44.61 dB and a structural similarity index (SSIM) of 0.964, significantly outperforming existing methods. Under H.264 compression, the average decoding accuracy reaches 94.01%, and the model remains robust even under severe cropping and scaling attacks. MSCAGAN thus offers an efficient and reliable solution for copyright protection of multimedia content, and it could potentially be extended to emerging coding standards such as H.265 to further improve its robustness in complex application scenarios.

Key words: video watermarking, compression resistance, robustness, multiscale convolutional attention, dual-branch adversarial training
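The abstract credits depthwise separable convolution with reducing computational complexity. A minimal sketch of the underlying weight-count arithmetic follows. The 3×3 kernel and 64-channel layer are illustrative assumptions, not MSCAGAN’s actual configuration (the abstract does not give one), and the function names are hypothetical.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 filters that mix channels
    return depthwise + pointwise


def cost_ratio(k: int, c_out: int) -> float:
    """Closed-form separable/standard ratio: 1/c_out + 1/k^2 (c_in cancels)."""
    return 1.0 / c_out + 1.0 / (k * k)


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernels, 64 input and 64 output channels.
    # Per output spatial position, the multiply-accumulate count equals the weight count.
    k, c_in, c_out = 3, 64, 64
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
    # prints "standard: 36864, separable: 4672, ratio: 0.127"
    print(f"closed form: {cost_ratio(k, c_out):.3f}")
```

Under these assumptions the separable layer needs roughly 13% of the weights and multiply-accumulates of a standard one. Because the ratio is 1/c_out + 1/k², the saving grows with both kernel size and output channel count.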
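The abstract reports invisibility as PSNR and SSIM and robustness as decoding accuracy. A minimal sketch of two of these metrics is shown below, assuming 8-bit pixel values (peak 255) and a watermark compared bit by bit. The function names are hypothetical, and the paper’s exact evaluation protocol (color space, per-frame averaging, SSIM window) is not stated in the abstract. SSIM is omitted here because it is normally computed over local windows and is better taken from an established image-processing library.

```python
import math
from typing import Sequence


def psnr(original: Sequence[float], marked: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two equally sized pixel sequences."""
    if not original or len(original) != len(marked):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_psnr(frames: Sequence[Sequence[float]], marked_frames: Sequence[Sequence[float]]) -> float:
    """Average of per-frame PSNR values (an assumed convention for video)."""
    if not frames or len(frames) != len(marked_frames):
        raise ValueError("frame lists must be non-empty and of equal length")
    values = [psnr(f, m) for f, m in zip(frames, marked_frames)]
    return sum(values) / len(values)


def bit_accuracy(embedded: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of watermark bits that the decoder recovered correctly."""
    if not embedded or len(embedded) != len(decoded):
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(1 for e, d in zip(embedded, decoded) if e == d)
    return matches / len(embedded)


if __name__ == "__main__":
    frame = [100, 120, 130, 140]
    marked = [101, 119, 130, 141]  # squared errors 1, 1, 0, 1 -> MSE = 0.75
    print(round(psnr(frame, marked), 2))  # prints 49.38
    print(bit_accuracy([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1, 0]))  # prints 0.875
```

A real evaluation would run these functions on full decoded frames after the attack under test (for example, H.264 re-encoding) rather than on toy lists.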

