Research on Anti-Scanning Technology Based on Machine Learning

信息安全研究 (Journal of Information Security Research), 2019, Vol. 5, Issue 4: 303-308.


Tang Qibiao, Yang Bo, Pan Limin

Storm Center, Hangzhou DBAPPSecurity Information Technology Co., Ltd. (杭州安恒信息技术股份有限公司风暴中心)

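Received: 2019-04-08 Online: 2019-04-15 Published: 2019-04-08
Corresponding author: Tang Qibiao
About the authors:
Tang Qibiao, M.S. Main research interests: network security, intelligent cloud WAF protection, and adversarial defense. qibiao.tang@dbappsecurity.com.cn
Yang Bo, M.S., senior engineer. Main research interests: threat intelligence, situational awareness, financial security, and smart city security. bob.yang@dbappsecurity.com.cn
Pan Limin, M.S., senior engineer. Main research interests: cloud security, perimeter security, data security, and intelligent security. limin.pan@dbappsecurity.com.cn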


Abstract

With the development of Internet technology, Web application systems are widely used in government portals, e-commerce, Internet services, and other sectors. They make life and work more convenient but also introduce network security risks. Attackers use scanning to find server vulnerabilities to exploit, and the large volume of packets that scanning generates also consumes substantial network bandwidth and can disrupt normal network communication. To address this problem, this paper proposes a method that parses client access logs and extracts seven features for the current IP: (1) the response code of its requests within a 2 s window, (2) its share of all IPs' requests within the 2 s window, (3) the proportion of its requests within the 2 s window that returned a 404 response code, (4) the variance of the ports it accessed within the 2 s window, (5) its share of the requests among 100 log entries, (6) the number of its 404 responses among 100 log entries, and (7) the variance of the ports it accessed among 100 log entries. Scanning behavior is then identified with the naive Bayes classification algorithm. The naive Bayes implementation in Spark MLlib is trained on scan logs stored on HDFS, and the model is updated periodically so that detection keeps pace with malicious scanning. Finally, scanning IPs are blocked at the network layer with iptables. The method improves recognition accuracy, reduces the false positive rate, effectively reduces malicious traffic, and protects customer websites.

Key words: anti-scanning, machine learning, naive Bayes algorithm, network security, Spark, iptables
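
The seven features listed in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `LogEntry` record, its field names, and the helper functions are hypothetical; feature (1) is assumed to be the status code of the current request; and both windows are assumed to end at the current log entry.

```python
from collections import namedtuple
from statistics import pvariance

# Hypothetical log record; the field names are assumptions, not taken from the paper.
LogEntry = namedtuple("LogEntry", "ts ip port status")

def window_stats(entries, ip):
    """Return (request share, 404 count, 404 ratio, port variance) for `ip` within `entries`."""
    own = [e for e in entries if e.ip == ip]
    if not own:
        return 0.0, 0, 0.0, 0.0
    n404 = sum(1 for e in own if e.status == 404)
    share = len(own) / len(entries)
    port_var = pvariance([e.port for e in own])  # population variance of the ports hit
    return share, n404, n404 / len(own), port_var

def extract_features(logs, idx, window_s=2.0, window_n=100):
    """Build a 7-element feature vector for logs[idx]; `logs` must be sorted by timestamp."""
    cur = logs[idx]
    history = logs[: idx + 1]
    recent = [e for e in history if cur.ts - e.ts <= window_s]  # time window (2 s)
    last_n = history[-window_n:]                                # count window (100 entries)
    share_t, _, ratio404_t, var_t = window_stats(recent, cur.ip)
    share_n, n404_n, _, var_n = window_stats(last_n, cur.ip)
    return [cur.status, share_t, ratio404_t, var_t, share_n, n404_n, var_n]

sample = [
    LogEntry(0.0, "10.0.0.1", 80, 200),
    LogEntry(0.5, "10.0.0.2", 22, 404),
    LogEntry(1.0, "10.0.0.2", 80, 404),
    LogEntry(1.5, "10.0.0.2", 443, 200),
]
print(extract_features(sample, 3))
```

In the pipeline the abstract describes, vectors like these would be labeled and passed to a naive Bayes classifier trained with Spark MLlib, and IPs classified as scanners would then be blocked with iptables. Those steps are omitted from this sketch.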

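Tang Qibiao, Yang Bo, Pan Limin. Research on Anti-Scanning Technology Based on Machine Learning[J]. Journal of Information Security Research, 2019, 5(4): 303-308.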
