Journal of Information Security Research ›› 2019, Vol. 5 ›› Issue (9): 798-804.


Design and Implementation of Dark Net Data Crawler Based on Tor


  • Received:2019-09-06 Online:2019-09-15 Published:2019-09-06



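  1. Criminal Investigation Police University of China (中国刑事警察学院)
  • Corresponding author: Tang Yanjun (汤艳君)
  • About the authors: Tang Yanjun (汤艳君) is a professor and master's supervisor at the Criminal Investigation Police University of China; her main research area is electronic data forensics. An Junlin (安俊霖) is a master's student; his main research area is electronic data forensics.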

Abstract: With the development of anonymous communication technology, more and more users are turning to anonymous communication to protect their personal privacy. Tor, the most popular anonymous communication system, effectively prevents traffic sniffing, eavesdropping, and similar attacks. While the dark web protects users' privacy from theft, it is also exploited by many criminals, which poses great challenges for public security supervision. Strengthening the supervision of, and the crackdown on, illegal information on dark web sites is an urgent problem. Crawling data from these anonymous websites is therefore an important basis for supervising them effectively. This paper briefly introduces Tor, the most mainstream dark web anonymous communication system, analyzes its technical principles, and designs a dark web data crawler that uses Selenium to enter the Tor network, crawls dark web pages in bulk, and saves the data locally. The crawler will help public security departments further monitor and analyze dark web content, and it offers police a feasible technical means of supervising the dark web.

Key words: dark web, Tor, onion routing system, Selenium, crawler
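
The abstract describes the crawler only at a high level: Selenium enters the Tor network, pages are crawled in bulk, and the data is saved locally. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation. It assumes a Tor client is already running locally with its SOCKS5 proxy on 127.0.0.1:9050 (Tor's default SocksPort), that Firefox and geckodriver are installed, and that Firefox's `network.dns.blockDotOnion` preference must be disabled for .onion lookups to reach the proxy. The function names (`make_tor_driver`, `local_path_for`, `save_page`, `crawl`) and the file-naming scheme are hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse

# Assumption: a local Tor client listens on its default SOCKS port.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def make_tor_driver():
    """Build a headless Firefox driver that sends its traffic through Tor."""
    # Imported lazily so the file helpers below work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    options.set_preference("network.proxy.type", 1)  # 1 = manual proxy settings
    options.set_preference("network.proxy.socks", TOR_SOCKS_HOST)
    options.set_preference("network.proxy.socks_port", TOR_SOCKS_PORT)
    options.set_preference("network.proxy.socks_version", 5)
    # Resolve host names through the proxy so .onion names are handled by Tor.
    options.set_preference("network.proxy.socks_remote_dns", True)
    # Assumption: stock Firefox refuses .onion lookups unless this is disabled.
    options.set_preference("network.dns.blockDotOnion", False)
    return webdriver.Firefox(options=options)


def local_path_for(url, out_dir):
    """Map a URL to a stable file name: <host>_<first 16 hex of sha256(url)>.html."""
    host = urlparse(url).hostname or "unknown"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(out_dir) / f"{host}_{digest}.html"


def save_page(url, html, out_dir):
    """Write the page source to disk and return the path that was written."""
    path = local_path_for(url, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path


def crawl(urls, out_dir, driver=None):
    """Visit each URL in turn and save its page source under out_dir.

    A driver may be injected (for example, a stub in tests); otherwise a
    Tor-proxied Firefox driver is created here and closed at the end.
    """
    owns_driver = driver is None
    if owns_driver:
        driver = make_tor_driver()
    saved = []
    try:
        for url in urls:
            try:
                driver.get(url)
                saved.append(save_page(url, driver.page_source, out_dir))
            except Exception as exc:  # hidden services are often unreachable
                print(f"skipped {url}: {exc}")
    finally:
        if owns_driver:
            driver.quit()
    return saved
```

With Tor running, a call such as `crawl(["http://exampleonionaddress.onion/"], "pages")` (a placeholder address, not a real service) would write one HTML file per reachable page into `pages/`. Hashing the full URL into the file name keeps distinct pages on the same host from overwriting each other.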
