Multi-feature fusion based encrypted malicious traffic detection method for coal mine network
-
摘要: 针对煤矿网络面临由恶意软件所产生的安全传输层协议(TLS)加密恶意流量威胁和检测过程加密流量误报率高的问题,提出了一种基于多特征融合的煤矿网络TLS加密恶意流量检测方法。分析了TLS加密恶意流量特征多元异构的特点,提取出煤矿网络TLS加密恶意流在传输过程中的连接特征、元数据和TLS加密协议握手特征,利用流指纹方法构造煤矿网络TLS加密流量特征集,并对该特征集中的特征进行标准化、独热编码和规约处理,从而得到一个高效样本集。采用决策树(DT)、K近邻(KNN)、高斯朴素贝叶斯(GNB)、L2逻辑回归(LR)和随机梯度下降(SGD)分类器5个子模型对上述特征集进行检验。为提高检测模型的鲁棒性,结合投票法原理将5个分类器子模型结合,构建了多模型投票(MVC)检测模型:将5个分类器子模型作为投票器,每个分类器子模型单独训练样本集,按照少数服从多数原则进行投票,得到每个样本的最终预测值。实验验证结果表明:所构建的特征集降低了样本集维度,提高了TLS加密流量检测效率。DT分类器和KNN分类器在数据集上表现最好,达到了99%以上的准确率,但是它们存在过拟合风险;LR分类器和SGD分类器子模型虽然也达到了90%以上的识别准确率,但这2个子模型的误报率过高;GNB分类器子模型表现最差,准确率只有82%,但该子模型具有误报率低的优势。MVC检测模型在数据集上准确率和召回率达99%以上,误报率为0.13%,提高了加密恶意流量的检出率,加密流量检测误报率为0,其综合性能优于其他分类器子模型。Abstract: The coal mine network is faced with the threat of malicious traffic encrypted by the transport layer security protocol (TLS) generated by malicious software and the high false alarm rate of encrypted traffic during detection. In order to solve the above problems, a multi-feature fusion malicious traffic detection method for coal mine network TLS encryption is proposed. The characteristics of multiple and heterogeneous malicious traffic features of TLS encryption are analyzed. The connection features, metadata and TLS encrypted protocol handshake features of coal mine network TLS encrypted malicious traffic in the transmission process are extracted. A coal mine network TLS encrypted traffic characteristic set is constructed by using a flow fingerprint method. The features in the feature set are standardized, one-hot encoded and normalized, so as to obtain an efficient sample set. Five sub-models of decision tree (DT), K-nearest neighbor (KNN), Gaussian Naive Bayes (GNB), L2 logistic regression (LR) and stochastic gradient descent (SGD) classifiers were used to test the above feature sets. In order to improve the robustness of the detection model, combined with the principle of the voting method, five classifier sub-models are combined to construct a muti-model voting classifier (MVC) detection model. Five classifier sub-models are used as voters. Each classifier sub-model trains the sample set separately, and votes according to the principle of minority obeying majority to get the final prediction value of each sample. The experimental results show that the proposed feature set reduces the dimension of the sample set and improves the detection efficiency of TLS encrypted traffic. DT classifier and KNN classifier perform best on the data set, reaching more than 99% accuracy. But they have the risk of overfitting. Although the LR classifier and SGD classifier sub-models have also achieved recognition accuracy of more than 90%, the false positive rate of these two sub-models is too high. The GNB classifier sub-model performs the worst, with an accuracy of 82%. But it has the advantage of low false-positive rate. The accuracy and recall rate of that MVC detection model on a data set is more than 99%, the false alarm rate is 0.13%. The detection rate of encrypted malicious traffic is improved, and the false alarm rate of encrypted traffic detection is 0. And the comprehensive performance of the MVC detection model is better than that of other classifier sub-models.
-
表 1 εi≥0.01的前28个特征和特征重要性权重
Table 1. Top 28 features with εi≥0.01 and feature importance weights
特征 εi 特征 εi 后向数据包负载最大值 0.092 9 后向数据包头字节数值 0.016 5 conn_state
连接状态0.092 3 前向第1个数据包的窗口大小(字节) 0.016 2 前向数据包负载标准差 0.071 9 有1个有效载荷的后向数据包数量 0.015 5 后向数据包负载平均值 0.061 6 流中2个连续数据包之间到达时间的最小值 0.014 3 后向数据包负载最小值 0.058 6 前向数据包负载最大值 0.013 9 流数据包负载平均值 0.048 6 前向所有到达时间的标准差 0.013 4 目的端口号 0.042 6 流中2个连续数据包之间到达时间的最大值 0.012 8 流数据包负载最大值 0.034 7 1个TCP流中出现ACK标志的总数 0.012 8 后向数据包头字节数最大值 0.027 5 前向2个连续数据包之间到达时间的最大值 0.011 3 前向子流有效负载平均数量 0.023 6 前向子流数据包数量平均值 0.011 2 后向2个连续数据包之间到达时间的方差 0.023 1 后向数据包头字节综述 0.0110 后向与前向数据包数量之比 0.022 3 后向第1个数据包的窗口大小(字节) 0.010 5 后向2个连续数据包之间到达时间的最大值 0.021 8 后向2个连续数据包之间到达时间的平均值 0.010 2 后向子流数据包数量平均值 0.016 5 源端口号 0.010 0 表 2 流量数据集
Table 2. Traffic dataset
流量名称 类型 样本量/条 总量/条
恶意流量Yakes 209 611
657 198Conficker 80 751 Cridex 90 130 Dridex 41 622 Sality 146 366 Razy 68 047 TrickBot 20 671 良性流量 Normal 314 733 314 733 表 3 模型性能对比
Table 3. Comparison of the performance of models
模型 A/% R/% F1/% W/% DT 99.88 99.85 99.84 0.09 KNN 99.88 99.86 99.83 0.10 GNB 82.93 51.87 68.17 0.17 LR 97.69 99.21 96.81 3.13 SGD 94.35 98.73 92.49 8.00 MVC 99.66 99.28 99.52 0.13 -
[1] 刘雨燕,宋燕. 新一代信息技术助力智慧矿山建设[J]. 煤炭技术,2021,40(2):184-186.LIU Yuyan,SONG Yan. New-generation information technology helps construction of smart mines[J]. Coal Technology,2021,40(2):184-186. [2] 陈燕. 煤矿网络安全风险与防范标准研究[J]. 中国石油和化工标准与质量,2019,39(18):5-6. doi: 10.3969/j.issn.1673-4076.2019.18.002CHEN Yan. Study on safety risk and prevention standard of coal mine network[J]. China Petroleum and Chemical Standard and Quality,2019,39(18):5-6. doi: 10.3969/j.issn.1673-4076.2019.18.002 [3] 谭靓洁,李永飞,吴琼. 基于区块链的煤矿安监云数据安全访问模型研究[J]. 工矿自动化,2022,48(5):93-99. doi: 10.13272/j.issn.1671-251x.2022030023TAN Liangjie,LI Yongfei,WU Qiong. Research on security access model of coal mine safety supervision cloud data based on blockchain[J]. Journal of Mine Automation,2022,48(5):93-99. doi: 10.13272/j.issn.1671-251x.2022030023 [4] SEAN G. Nearly half of malware now use TLS to conceal communications[EB/OL]. [2022-03-21]. https://news.sophos.com/en-us/2021/04/21/nearly-half-of-malware-now-use-tls-to-conceal-communications/. [5] 袁钦献. 加密网络流量分析关键技术研究与开发[D]. 西安: 西安电子科技大学, 2019.YUAN Qinxian. Research and development of key technology for encrypted network traffic analysis[D]. Xi'an: Xidian University, 2019. [6] ANDERSON B, MCGREW D. Identifying encrypted malware traffic with contextual flow data[C]//Proceedings of the 2016 ACM workshop on artificial intelligence and security, Vienna, 2016: 35-46. [7] 翟明芳,张兴明,赵博. 基于深度学习的加密恶意流量检测研究[J]. 网络与信息安全学报,2020,6(3):66-77.ZHAI Mingfang,ZHANG Xingming,ZHAO Bo. Survey of encrypted malicious traffic detection based on deep learning[J]. Chinese Journal of Network and Information Security,2020,6(3):66-77. [8] TORROLEDO I, CAMACHO L D, BAHNSEN A C. Hunting malicious TLS certificates with deep neural networks[C]//Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, Toronto, 2018: 64-73. [9] YU Tangda, ZOU Futai, LI Linsen, et al. An encrypted malicious traffic detection system based on neural network[C]//2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery(CyberC), Guilin, 2019: 62-70. [10] REZAEI S,LIU X. Deep learning for encrypted traffic classification:an overview[J]. IEEE Communications Magazine,2019,57(5):76-81. doi: 10.1109/MCOM.2019.1800819 [11] ANDERSON B,PAUL S,MCGREW D. Deciphering malware's use of TLS (without decryption)[J]. Journal of Computer Virology and Hacking Techniques,2016,14(1):1-17. [12] ANDERSON B, MCGREW D. Machine learning for encrypted malware traffic classification: accounting for noisy labels and non-ntationarity[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, 2017: 1723-1732. [13] 骆子铭,许书彬,刘晓东. 基于机器学习的TLS恶意加密流量检测方案[J]. 网络与信息安全学报,2020,6(1):77-83.LUO Ziming,XU Shubin,LIU Xiaodong. Scheme for identifying malware traffic with TLS data based on machine learning[J]. Chinese Journal of Network and Information,2020,6(1):77-83. [14] BARUT O, ZHU R, LUO Y, et al. TLS encrypted application classification using machine learning with flow feature engineering[C]//The 10th International Conference on Communication and Network Security, Tokyo, 2020: 32-41. [15] 鲁刚,郭荣华,周颖,等. 恶意流量特征提取综述[J]. 信息网络安全,2018(9):1-7.LU Gang,GUO Ronghua,ZHOU Ying,et al. Review of malicious traffic feature extraction[J]. Netinfo Security,2018(9):1-7. [16] 康鹏, 杨文忠, 马红桥. TLS协议恶意加密流量识别研究综述[J/OL]. 计算机工程与应用: 1-21[2022-03-21]. http://kns.cnki.net/kcms/detail/11.2127.TP.20220308.0853.002.html.KANG Peng, YANG Wenzhong, MA Hongqiao. TLS malicious encrypted traffic identification research [J/OL]. Computer Engineering and Applications: 1-21[2022-03-21]. http://kns.cnki.net/kcms/detail/11.2127.TP.20220308.0853.002.html. [17] 王洋,陈紫儿,柳瑞春,等. 基于决策树算法的网络加密流量识别方法[J]. 长江信息通信,2021,34(11):15-17. doi: 10.3969/j.issn.1673-1131.2021.11.005WANG Yang,CHEN Zi'er,LIU Ruichun,et al. Network encryption traffic identification method based on decision tree algorithm[J]. Changjiang Information & Communications,2021,34(11):15-17. doi: 10.3969/j.issn.1673-1131.2021.11.005 [18] 张心语,张秉晟,孟泉润,等. 隐私保护的加密流量检测研究[J]. 网络与信息安全学报,2021,7(4):101-113.ZHANG Xinyu,ZHANG Bingsheng,MENG Quanrun,et al. Study on privacy preserving encrypted traffic detection[J]. Chinese Journal of Network and Information,2021,7(4):101-113. [19] PEDREGOSA F,VAROQUAUX G,GRAMFORT A,et al. Scikit-learn:machine learning in Python[J]. Machine Learning,2011,12:2825-2830. [20] GARCIA S,GRILL M,STIBOREK J,et al. An empirical comparison of botnet detection methods[J]. Computers & Security,2014,45:100-123.