Obfuscation-Resistant Android Malware Detection Based on Random Forest

Journal of Information Security Research (信息安全研究), 2021, Vol. 7, Issue 2: 126-135.

Wang Kelin¹, Yang Ke², Zhao Ruizhe³, Xin Liling³, Wang Qiuyun³

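1. University of Chinese Academy of Sciences
2. State Grid E-Commerce Co., Ltd. (State Grid Xiong'an Financial Technology Group Co., Ltd.)
3. Institute of Information Engineering, Chinese Academy of Sciences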

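Received: 2021-02-09 Online: 2021-02-05 Published: 2021-02-09
Corresponding author: Wang Kelin
About the authors:
Wang Kelin, master's student. Main research interest: software security. wangkelin20@mails.ucas.ac.cn
Yang Ke, PhD, engineer. Main research interest: network security. yangke@sgec.sgcc.com.cn
Zhao Ruizhe, master's student. Main research interest: malicious code analysis. zhaoruizhe@iie.ac.cn
Xin Liling, master's degree, engineer. Main research interest: network threat discovery. xinliling@iie.ac.cn
Wang Qiuyun, master's degree, engineer. Main research interest: cyberspace security. wangqiuyun@iie.ac.cn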

Obfuscated Android Malware Detection Based on Random Forest

Abstract

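Abstract (translated from the Chinese): The rapid growth of malicious Android applications poses serious security risks, and many behavioral features are easily affected by code obfuscation, so malicious behavior cannot be detected effectively. This paper proposes an Android malware detection model based on random forest. The model uses features such as dangerous permissions, sensitive API calls, Service, Activity, Intent, and SMS sending frequency; of these, dangerous permissions and Android components such as Service are unaffected by code obfuscation. Machine learning methods including random forest, decision tree, SVM, and convolutional neural network are trained using ten-fold cross-validation. Experiments show that the method achieves a classification accuracy of 95.77% on the unobfuscated dataset and 91.01% on the obfuscated dataset.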
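The manifest-level features named in the abstract (dangerous permissions, Service, Activity, Intent) can be illustrated with a minimal sketch. It assumes the APK's `AndroidManifest.xml` has already been decoded to plain XML text; the permission list, the feature names, the `manifest_features` helper, and the sample manifest are all hypothetical and are not the authors' feature set or tooling.

```python
import xml.etree.ElementTree as ET

# Android attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Illustrative subset of dangerous permissions (an assumption, not the paper's list).
DANGEROUS_PERMISSIONS = [
    "android.permission.SEND_SMS",
    "android.permission.READ_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.RECORD_AUDIO",
]


def manifest_features(manifest_xml: str) -> dict:
    """Extract manifest-level features from decoded AndroidManifest.xml text."""
    root = ET.fromstring(manifest_xml)
    requested = {
        elem.get(ANDROID_NS + "name") for elem in root.iter("uses-permission")
    }
    # One binary feature per dangerous permission in the illustrative list.
    features = {perm: int(perm in requested) for perm in DANGEROUS_PERMISSIONS}
    # Counts of declared components.
    features["n_services"] = len(list(root.iter("service")))
    features["n_activities"] = len(list(root.iter("activity")))
    # Number of distinct intent actions declared in intent filters.
    actions = {elem.get(ANDROID_NS + "name") for elem in root.iter("action")}
    features["n_intent_actions"] = len(actions)
    return features


# Hypothetical manifest used only to exercise the function.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <application>
        <activity android:name=".MainActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN"/>
            </intent-filter>
        </activity>
        <service android:name=".SyncService"/>
    </application>
</manifest>
"""

features = manifest_features(SAMPLE_MANIFEST)
print(features)
```

Permissions and component declarations are read from the manifest rather than from the bytecode, which is consistent with the abstract's point that these features survive code obfuscation.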

Key words: Android application, dynamic and static analysis, feature selection, random forest, sensitive API calls

Abstract: The rapid growth of Android malware poses serious security risks. Many behavioral features are easily affected by code obfuscation, so malicious behavior can go undetected. This paper proposes an Android malware detection model based on Random Forest. The model uses features such as dangerous permissions, sensitive API calls, Service, Activity, Intent, and SMS sending frequency; of these, dangerous permissions and Android components such as Service are unaffected by code obfuscation. Random Forest, Decision Tree, SVM, and 1-NN classifiers were trained using ten-fold cross-validation. Experiments show that the method achieves a classification accuracy of 95.77% on the unobfuscated dataset and 91.01% on the obfuscated dataset.
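The ten-fold cross-validation procedure mentioned above can be sketched in plain Python. To keep the example self-contained, a 1-NN classifier with Hamming distance stands in for the classifiers compared in the paper; a Random Forest from a machine learning library would be dropped into the same loop. The helper names, the synthetic dataset, and all parameters are assumptions for illustration and do not reproduce the paper's data or results.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds (assumes n_samples >= k)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_X, train_y, x):
    """Return the label of the training vector closest to x (first one on ties)."""
    nearest = min(range(len(train_X)), key=lambda i: hamming(train_X[i], x))
    return train_y[nearest]


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds, each fold held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def make_toy_dataset(n_per_class=50, n_features=8, seed=1):
    """Synthetic binary vectors: label-1 samples tend to have more bits set."""
    rng = random.Random(seed)
    X, y = [], []
    for label, p in ((0, 0.1), (1, 0.9)):
        for _ in range(n_per_class):
            X.append([int(rng.random() < p) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_dataset()
accuracy = cross_val_accuracy(X, y, k=10)
print(f"mean 10-fold accuracy: {accuracy:.3f}")
```

Each sample is used for testing exactly once and for training in the other nine folds, so the reported figure is an average over ten held-out evaluations rather than a single train/test split.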

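Wang Kelin, Yang Ke, Zhao Ruizhe, Xin Liling, Wang Qiuyun. Obfuscated Android Malware Detection Based on Random Forest[J]. Journal of Information Security Research, 2021, 7(2): 126-135.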
