[1] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. arXiv preprint, arXiv:1706.03762, 2017
[2] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv preprint, arXiv:2010.11929, 2020
[3] 潘琪亮, 李明, 皮振中, 等. Exploration of building a general-purpose artificial intelligence model for the information security domain based on the DeepSpeed framework[J]. 信息安全研究 (Journal of Information Security Research), 2024, 10(Suppl 1): 155-160
[4] 彭祯方, 邢国强, 陈兴跃. A survey of applications and techniques of artificial intelligence in cybersecurity[J]. 信息安全研究 (Journal of Information Security Research), 2022, 8(2): 110-116
[5] Fu Y, Zhang S, Wu S, et al. Patch-Fool: Are vision transformers always robust against adversarial perturbations?[J]. arXiv preprint, arXiv:2203.08392, 2022
[6] Lovisotto G, Finnie N, Munoz M, et al. Give me your attention: Dot-product attention considered harmful for adversarial patch robustness[C] //Proc of the IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2022: 15234-15243
[7] Joshi A, Jagatap G, Hegde C. Adversarial token attacks on vision transformers[J]. arXiv preprint, arXiv:2110.04337, 2021
[8] Selvaraju R R, Cogswell M, Das A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 618-626
[9] Han K, Xiao A, Wu E, et al. Transformer in transformer[J]. Advances in Neural Information Processing Systems, 2021, 34: 15908-15919
[10] Wang W, Xie E, Li X, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C] //Proc of the IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2021: 568-578
[11] Liu Ze, Lin Yutong, Cao Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C] //Proc of the IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2021: 10012-10022
[12] Chu Xiangxiang, Tian Zhi, Wang Yuqing, et al. Twins: Revisiting the design of spatial attention in vision transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 9355-9366