Journal of Information Security Research, 2025, Vol. 11, Issue (6): 561-.

• Academic Paper

Android Malware Detection Based on Three-Way Decision Feature Selection

Chen Lifang1,2, Wang Jiayou1, Shi Yonghui1, Han Yang1, and Dai Qi1

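  1(College of Science, North China University of Science and Technology, Tangshan, Hebei 063210)
  2(Hebei Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan, Hebei 063210)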
  • Online: 2025-06-22  Published: 2025-06-22
  • Corresponding author: Dai Qi (dai18232576157@163.com)
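  • About the authors:
    Chen Lifang, PhD, professor. Main research interests: data mining and processing, neural network modeling, network and information security. hblg_clf@163.com
    Wang Jiayou, master's student. Main research interests: network and information security, machine learning. 826780249@qq.com
    Shi Yonghui, master's student. Main research interests: federated learning, network and information security. 2786250969@qq.com
    Han Yang, PhD, associate professor. Main research interests: steel industry big data, metallurgical mathematical models, intelligent computing. hanyang@ncst.edu.cn
    Dai Qi, PhD, lecturer. Main research interests: data mining, machine learning. dai18232576157@163.com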

Android Malware Detection Based on Threeway Decision Feature Selection

Chen Lifang1,2, Wang Jiayou1, Shi Yonghui1, Han Yang1, and Dai Qi1   

  1. 1(College of Science, North China University of Science and Technology, Tangshan, Hebei 063210)
    2(Hebei Key Laboratory of Data Science and Application(North China University of Science and Technology), Tangshan, Hebei 063210)
  • Online:2025-06-22 Published:2025-06-22

Abstract: Android malware detection datasets contain a large number of irrelevant and redundant features, and a single feature selection method cannot remove them effectively. Moreover, removing features that carry a large amount of information can easily cause model collapse. To address these issues, this paper proposes an Android malware detection method based on three-way decision feature selection (3WDFS). Following the idea of three-way decision, the method evaluates the features of the dataset with several feature selection methods in parallel and divides them into disjoint positive, negative, and boundary regions. It then uses the approximate Markov blanket and information difference to delete the inter-class and intra-class redundant features in the boundary region, respectively, which yields a low-redundancy boundary region. Finally, the positive region and the low-redundancy boundary region are concatenated through learnable weight parameters and fed into the classification model for training. Experimental results on public datasets show that 3WDFS effectively removes irrelevant and redundant features in Android malware detection and improves both the efficiency and the accuracy of malware detection.

Key words: Android software, three-way decision, feature selection, redundant features, malware detection
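
The abstract describes the 3WDFS pipeline only at a high level. The Python sketch below illustrates two of its steps under explicit assumptions and is not the authors' implementation.

The partition rule is hypothetical. Each scorer votes for its top-k features. A feature chosen by every scorer goes to the positive region, a feature chosen by none goes to the negative region, and the rest form the boundary region.

The two scorers, mutual information and an unsupervised variance score, are illustrative stand-ins because the abstract does not name the feature selection methods used. Redundant boundary features are then removed with the approximate Markov blanket as defined for the fast correlation-based filter (FCBF), which uses symmetric uncertainty (SU). Applying this check only inside the boundary region is also an assumption.

The information-difference step and the learnable weighted concatenation are not shown. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Column = Sequence[int]                      # one discrete (e.g. binary) feature column
Scorer = Callable[[Column, Column], float]  # (feature, labels) -> relevance score


def entropy(xs: Sequence) -> float:
    """Shannon entropy H(X), in bits, of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(xs: Column, ys: Column) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), clamped at 0 against rounding error."""
    return max(0.0, entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys))))


def variance_score(xs: Column, ys: Column) -> float:
    """Illustrative unsupervised scorer: population variance (labels ignored)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sym_uncertainty(xs: Sequence, ys: Sequence) -> float:
    """SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), normalised to [0, 1]."""
    denom = entropy(xs) + entropy(ys)
    return 0.0 if denom == 0 else 2.0 * mutual_info(xs, ys) / denom


def three_way_partition(features: Dict[str, Column], labels: Column,
                        scorers: List[Scorer], k: int
                        ) -> Tuple[List[str], List[str], List[str]]:
    """Split features into (positive, negative, boundary) regions.

    Hypothetical voting rule (an assumption, not taken from the paper):
    each scorer votes for its top-k features; chosen by all scorers ->
    positive, chosen by none -> negative, otherwise -> boundary.
    """
    votes: Counter = Counter()
    for score in scorers:
        ranked = sorted(features, key=lambda f: score(features[f], labels),
                        reverse=True)
        votes.update(ranked[:k])
    positive = [f for f in features if votes[f] == len(scorers)]
    negative = [f for f in features if votes[f] == 0]
    boundary = [f for f in features if 0 < votes[f] < len(scorers)]
    return positive, negative, boundary


def prune_markov_blanket(candidates: List[str], features: Dict[str, Column],
                         labels: Column) -> List[str]:
    """Remove redundant features via the FCBF approximate Markov blanket.

    Fi is an approximate Markov blanket of Fj when SU(Fi,C) >= SU(Fj,C)
    and SU(Fi,Fj) >= SU(Fj,C); Fj is then dropped as redundant.
    """
    relevance = {f: sym_uncertainty(features[f], labels) for f in candidates}
    ranked = sorted(candidates, key=lambda f: relevance[f], reverse=True)
    kept: List[str] = []
    for fj in ranked:
        # Every fi already kept has relevance >= relevance[fj] (sorted order).
        covered = any(sym_uncertainty(features[fi], features[fj]) >= relevance[fj]
                      for fi in kept)
        if not covered:
            kept.append(fj)
    return kept
```

For example, take labels `y = [1, 1, 1, 1, 0, 0, 0, 0]` and binary features `b = [1, 1, 1, 0, 0, 0, 0, 0]`, `c` (an exact copy of `b`), and `e = [0, 0, 0, 1, 0, 0, 0, 0]`. In this case `prune_markov_blanket(["b", "c", "e"], ...)` keeps `["b", "e"]`. SU(b, c) = 1 exceeds SU(c, C), so `c` is dropped, whereas SU(b, e) ≈ 0.12 stays below SU(e, C) ≈ 0.18, so `e` is kept. Binary columns are assumed here because Android features such as requested permissions or API calls are often encoded as presence indicators.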
