Research on Harmful Website Detection Based on Graph Neural Network and Multifeature Fusion

Journal of Information Security Research, 2026, Vol. 12, Issue 5: 420-.

Qu Miaozhang (1), Shi Zhibin (1), Chang Zhaoyu (1), and Zhang Wei (2)

(1) School of Computer Science and Technology, North University of China, Taiyuan 030051
(2) School of Computer Information Engineering, Shanxi Technology and Business University, Taiyuan 030032

Online: 2026-05-23 Published: 2026-05-23

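Corresponding author: Shi Zhibin (师智斌), PhD, associate professor. Main research interest: network security. 1637350520@qq.com

About the authors:
Qu Miaozhang (瞿淼樟), master's student. Main research interest: network security. qumz8109@163.com
Shi Zhibin (师智斌), PhD, associate professor. Main research interest: network security. 1637350520@qq.com
Chang Zhaoyu (常赵宇), master's student. Main research interest: network security. czy2514881241@163.com
Zhang Wei (张薇), master's degree. Main research interests: network security and data mining. 254315045@qq.com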

Abstract

To address the limitations of current harmful website detection methods in deep semantic mining of text and in multimodal feature co-perception, this study proposes GATConvNeXt, a multifeature fusion detection model based on graph attention networks (GAT) and ConvNeXt. The model uses GloVe (global vectors for word representation) word embeddings to build semantic representations of website text and maps the text into a graph structure based on word co-occurrence relationships. The adaptive attention mechanism of GAT dynamically captures latent associations between non-contiguous words, while ConvNeXt extracts local details and global contextual information from website images. A cross-attention-based multifeature fusion module dynamically aligns the text and image features and lets them interact. Experimental results show that the proposed model achieves 99.10% accuracy on a four-category website classification task, significantly improving detection accuracy. The work offers a useful reference for identifying harmful online content and for cybersecurity governance.

Key words: harmful website detection, graph neural network, multifeature fusion, graph attention network (GAT), ConvNeXt, cross-attention
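The abstract names two mechanisms that are easy to picture in code: turning text into a word co-occurrence graph, and fusing text and image features with cross-attention. The sketch below is illustrative only and is not the authors' implementation. It assumes a sliding-window definition of co-occurrence and plain scaled dot-product attention without learned projections. The function names, the `max_distance` parameter, and the toy inputs are all hypothetical. GloVe vectors, the learned GAT attention weights, and the ConvNeXt image encoder are omitted.

```python
import math
from collections import defaultdict


def build_cooccurrence_graph(tokens, max_distance=2):
    """Return {word: {neighbor: count}}, linking words at most `max_distance` tokens apart.

    Assumption: co-occurrence is defined by a sliding window; the paper's
    exact rule is not given in the abstract.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + max_distance, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            graph[word][other] += 1
            graph[other][word] += 1  # undirected edge
    return {w: dict(nbrs) for w, nbrs in graph.items()}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values.

    Here queries stand in for text features and keys/values for image
    features (hypothetical toy vectors, no learned projection matrices).
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused


# Toy usage: a six-token "page" and two-dimensional stand-in features.
tokens = "free bonus casino deposit bonus casino".split()
graph = build_cooccurrence_graph(tokens, max_distance=2)

text_features = [[1.0, 0.0]]
image_features = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(text_features, image_features, image_features)
```

In a full model, each graph node would carry an embedding such as a GloVe vector, and a graph attention layer would weight neighbors with learned scores instead of raw co-occurrence counts.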

CLC Number: TP309.2

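Qu Miaozhang, Shi Zhibin, Chang Zhaoyu, Zhang Wei. Research on Harmful Website Detection Based on Graph Neural Network and Multifeature Fusion [J]. Journal of Information Security Research, 2026, 12(5): 420-.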
