Journal of Information Security Research ›› 2026, Vol. 12 ›› Issue (6): 575-.

Chinese Dark Web Product Detection and Classification Based on Multimodal Data Augmentation

Yang Kaijie1, Luo Wenhua1, and Li Jing2   

  1. 1(School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110035)
    2(Basic Teaching and Research Department, Criminal Investigation Police University of China, Shenyang 110035)
  • Online:2026-06-07 Published:2026-06-07

基于多模态数据增强的中文暗网商品检测与分类

杨凯杰1罗文华1李晶2   

  1. 1(中国刑事警察学院公安信息技术与情报学院沈阳110035)
    2(中国刑事警察学院基础教研部沈阳110035)
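  • Corresponding author: Luo Wenhua, master's degree, professor. Main research interest: cybersecurity law enforcement technology. luowenhua770404@126.com
  • About the authors: Yang Kaijie, master's student. Main research interest: cybersecurity law enforcement technology. 2571761460@qq.com. Luo Wenhua, master's degree, professor. Main research interest: cybersecurity law enforcement technology. luowenhua770404@126.com. Li Jing, master's degree, lecturer. Main research interest: public security studies. 592579981@qq.com
  • Funding:
    National Key Research and Development Program of China (2021YFC3301801); Basic Scientific Research Project for Universities of the Liaoning Provincial Department of Education (LJ212410175002); Fundamental Research Funds for the Central Universities (C2024012); Graduate Innovation Ability Improvement Project of the Criminal Investigation Police University of China (2025YCZD03)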

Abstract: To address the coarse granularity of existing dark Web intelligence classification research and the predominance of English-language datasets, this paper proposes a fine-grained analysis method for Chinese dark Web content. To overcome the scarcity of Chinese dark Web data and the misalignment of multimodal data, it augments text with a large language model prompt-rewriting strategy and images with a differentiated image augmentation strategy. Mixing in product data from a surface Web platform yields a dataset of 14,052 product records. A feature selection optimization module is designed to establish an inter-task coupling mechanism, and a Chinese dark Web product detection and classification model based on multimodal data augmentation is proposed. Experimental results show that the model achieves macro-F1 scores of 0.992 and 0.941 on the dark Web product detection and classification tasks, respectively, an improvement of approximately 2% over the best baseline model on the classification task, and significantly outperforms existing single-modal and multimodal methods. The approach improves fine-grained classification of Chinese dark Web intelligence and offers new ideas and methods for dark Web intelligence analysis.

Key words: dark Web product, multimodal, data augmentation, detection, classification

摘要: 解决现有暗网情报分类研究粒度较粗且数据集多为英文的问题,提出针对中文暗网内容的细粒度分析方法.针对中文暗网数据稀缺及多模态数据不对齐问题,利用大语言模型提示词改写策略及差异化图像增强策略实现文本与图像数据增强,并通过混合明网某平台商品数据,构建包含14052条商品记录的数据集,设计特征选择优化模块建立任务间耦合机制,提出基于多模态数据增强的中文暗网商品检测与分类模型.实验结果表明,该模型在暗网商品检测和分类任务中,宏F1值分别达到0.992和0.941,在分类任务上较最佳基线模型提升约2%,显著优于现有单模态和多模态方法,有效提升了中文暗网情报细粒度分类任务的性能,为暗网情报分析提供了新思路和方法.

关键词: 暗网商品, 多模态, 数据增强, 检测, 分类
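
The abstract reports results as macro-F1, the unweighted mean of the per-class F1 scores, so that rare product categories count as much as common ones. The sketch below shows one way that metric can be computed; it is not the authors' evaluation code, and the function name `macro_f1` and the class labels "A", "B", "C" are hypothetical placeholders, not categories from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
score = macro_f1(y_true, y_pred)
```

For these placeholder labels the per-class F1 scores are 0.5, 0.8, and 2/3, so averaging them without weighting by class size gives roughly 0.656.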
