Prediction method of gas emission in working face based on feature selection and BO-GBDT

MA Wenwei

Citation: MA Wenwei. Prediction method of gas emission in working face based on feature selection and BO-GBDT[J]. Journal of Mine Automation,2024,50(12):136-144. DOI: 10.13272/j.issn.1671-251x.2024070022

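Funding: National Science and Technology Major Project of China (2016ZX05045-004-001).
    Author biography:

    MA Wenwei (born 1985), male, from Datong, Shanxi; associate research fellow, master's degree. His research focuses on mine gas disaster prevention and control and on intelligent coal mining. E-mail: 120598723@qq.com

  • CLC number: TD712.5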

  • Abstract:

    Many features influence gas emission at the working face. Dimensionality reduction methods such as principal component analysis (PCA) save computing resources, but they alter the original feature structure of the dataset and lose some of the detail carried by the original features. To address this problem, a gradient boosting decision tree (GBDT) model for gas emission prediction was built. Five feature selection algorithms were used to filter the dataset, and the goodness of fit, computation time, and prediction results of each feature combination in the GBDT model were analyzed; the wrapper method was identified as the best feature selection algorithm. Taking field conditions into account, 8 features were then selected for gas emission prediction. The results showed that prediction accuracy and generalization do not increase in proportion to the number of features; redundant or irrelevant features instead reduce the model's prediction accuracy. To further improve model accuracy, five hyperparameter optimization algorithms were applied to the GBDT model, and the prediction performance of the GBDT model under each resulting hyperparameter combination was compared. The results showed that the choice of optimization algorithm had little effect on the accuracy and generalization of the GBDT model, but the hyperparameter combination found by the Bayesian optimization (BO) algorithm based on the tree-structured Parzen estimator (TPE) gave the highest accuracy with a relatively short optimization time, so it performed best and was used to build the BO-GBDT model. The dataset after feature selection was split into a training set and a test set, and the BO-GBDT model was used to predict working face gas emission and compared with random forest, support vector machine, and neural network models. The results showed that the BO-GBDT model had higher accuracy and generalization: its average relative error was 2.61%, which is 35.56%, 37.41%, and 32.03% lower in relative terms than that of the random forest, support vector machine, and neural network models, respectively. The model meets the needs of field engineering applications and provides theoretical guidance for safe mine production.
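    The abstract does not say how the wrapper method was implemented. One common wrapper-style technique is recursive feature elimination (RFE), which repeatedly fits the model and discards the least important feature. The sketch below assumes scikit-learn and uses synthetic data with the same shape as the dataset (73 samples, 13 features); the target function, the seed, and the 50-tree setting are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE

# Synthetic stand-in for a 73-sample, 13-feature dataset (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(73, 13))
# Hypothetical target that depends on three of the features, plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 8] + rng.normal(scale=0.1, size=73)

# Wrapper-style selection: fit the model, drop the least important feature,
# and repeat until 8 features remain.
selector = RFE(
    estimator=GradientBoostingRegressor(n_estimators=50, random_state=0),
    n_features_to_select=8,
    step=1,
)
selector.fit(X, y)

selected = [f"X{i + 1}" for i, keep in enumerate(selector.support_) if keep]
print(selected)
```

    Because the selector refits the model at every elimination step, a wrapper approach of this kind costs more than a simple filter but scores features by their effect on the model itself.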

  • Figure 1.   Heatmap of correlation between features and labels in the dataset

    Figure 2.   Hyperparameter optimization process based on the Bayesian optimization (BO) algorithm

    Figure 3.   Comparison of predicted and actual values of GBDT models under different hyperparameter combinations

    Figure 4.   Comparison of relative errors of GBDT models under different hyperparameter combinations

    Figure 5.   Comparison of predicted and actual values of GBDT models under different feature combinations

    Figure 6.   Comparison of relative errors of GBDT models under different feature combinations

    Figure 7.   Comparison of predicted and actual values of the four models

    Figure 8.   Comparison of relative errors of the four models

    Table 1   Gas emission sample data of mining working face

    No.  X1/(m³·t⁻¹)  X2/m  X3/m  X4/(°)  X5/m  X6/m  X7/m  X8/%  X9/t   X10/(m³·t⁻¹)  X11/m  X12/m  X13   Y/(m³·min⁻¹)
    1    3.90         499   4.3   15      4.3   10    280   0.93  17217  3.10          2.80   52     5.89  2.71
    2    3.16         502   2.7   8       2.7   10    290   0.93  11197  2.80          1.79   48     4.90  2.84
    3    3.40         522   3.4   12      3.4   8     280   0.95  10891  2.15          1.72   14     4.71  3.20
    4    2.96         540   2.8   10      2.8   8     290   0.95  9289   2.44          2.20   20     4.24  3.60
    5    3.68         513   3.5   12      3.5   9     285   0.94  12838  3.28          1.80   19     4.54  3.10
    …
    71   2.46         448   2.3   11      2.3   4.33  159   0.95  1998   2.01          1.69   17     4.65  4.07
    72   3.12         541   2.6   13      2.6   3.82  166   0.94  2207   2.3           1.81   14     4.72  4.92
    73   4.65         630   6.3   12      6.3   2.81  170   0.93  3457   3.34          1.62   19     4.65  8.05
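
    A correlation heatmap such as the one in Figure 1 is built from pairwise correlation coefficients. As a rough illustration, the sketch below computes Pearson coefficients from only the eight rows of Table 1 shown on this page, so the values will not match those of the full 73-sample dataset. In these eight rows the X3 and X5 columns are identical, which is the kind of redundancy feature selection is meant to remove. The helper name `pearson` is hypothetical.

```python
from math import sqrt

# The eight rows of Table 1 shown on this page, as (X1, X3, X5, Y).
rows = [
    (3.90, 4.3, 4.3, 2.71),
    (3.16, 2.7, 2.7, 2.84),
    (3.40, 3.4, 3.4, 3.20),
    (2.96, 2.8, 2.8, 3.60),
    (3.68, 3.5, 3.5, 3.10),
    (2.46, 2.3, 2.3, 4.07),
    (3.12, 2.6, 2.6, 4.92),
    (4.65, 6.3, 6.3, 8.05),
]
x1, x3, x5, y = (list(col) for col in zip(*rows))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


r_x3_x5 = pearson(x3, x5)  # identical columns, so this is 1 up to rounding
r_x1_y = pearson(x1, y)
r_x3_y = pearson(x3, y)
print(f"r(X3, X5) = {r_x3_x5:.4f}")
print(f"r(X1, Y)  = {r_x1_y:.4f}")
print(f"r(X3, Y)  = {r_x3_y:.4f}")
```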

    Table 2   Feature selection results of different feature selection algorithms

    Feature selection algorithm   Features (X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13)   R²       Computation time/s
    Variance filtering            × × × ×                                                 0.8460   0.13
    F-test                        × × × × × ×                                             0.8491   0.13
    Mutual information                                                                    0.8483   0.16
    Embedded method               × × × × × × ×                                           0.8515   0.43
    Wrapper method                × × × × × × ×                                           0.8515   0.14
    No feature selection                                                                  0.8483   0.16

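    Table 3   Gradient boosting decision tree (GBDT) model hyperparameters

    No.  Hyperparameter       Meaning
    1    n_estimators         Maximum number of weak learners
    2    learning_rate        Learning rate
    3    max_features         Number of features considered at each split
    4    subsample            Subsampling ratio
    5    loss                 Choice of loss function
    6    criterion            Metric for the quality of each decision tree node split
    7    max_depth            Depth of each subtree
    8    min_impurity_split   Minimum Gini impurity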

    Table 4   Performance comparison of hyperparameter optimization algorithms

    Hyperparameter         Grid search     Random search   Gaussian-process-based BO   TPE-based BO   Optuna-based BO
    max_features           log2            sqrt            log2                        sqrt           sqrt
    loss                   absolute_error  absolute_error  absolute_error              quantile       absolute_error
    criterion              friedman_mse    squared_error   squared_error               friedman_mse   squared_error
    n_estimators           790             783             208                         374            421
    learning_rate          0.01            0.01            0.1060                      0.2961         0.2148
    subsample              0.6             0.8             0.5627                      0.3112         0.4739
    max_depth              6               5               33                          2              42
    min_impurity_split     0               0.8889          0.0594                      2.5250         3.3501
    R²                     0.9034          0.9016          0.9266                      0.9272         0.9266
    Optimization time/min  108.9           4.42            8.63                        2.36           4.58
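
    The hyperparameter names in Tables 3 and 4 match those of scikit-learn's `GradientBoostingRegressor`, so the TPE-based BO column can plausibly be written as a model configuration. The sketch below rests on several assumptions: that scikit-learn is the library in question, that `alpha=0.5` is acceptable for the quantile loss (the page gives no value), and that synthetic data can stand in for the real dataset. `min_impurity_split` is omitted because recent scikit-learn releases no longer accept it.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Values from the TPE-based BO column of Table 4.
tpe_params = {
    "max_features": "sqrt",
    "loss": "quantile",
    "n_estimators": 374,
    "learning_rate": 0.2961,
    "subsample": 0.3112,
    "max_depth": 2,
}
# criterion is left at the library default ("friedman_mse" in scikit-learn 1.x),
# which matches Table 4. min_impurity_split (2.5250) is not passed.
# alpha=0.5 (median regression) is an assumption; the source does not state it.
model = GradientBoostingRegressor(alpha=0.5, random_state=0, **tpe_params)

# Synthetic data: 73 samples and 8 features, mirroring the dataset's shape only.
rng = np.random.default_rng(0)
X = rng.normal(size=(73, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=73)

# Hypothetical split: first 58 samples for training, last 15 for testing.
model.fit(X[:58], y[:58])
pred = model.predict(X[58:])
print(pred.shape)
```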

    Table 5   Relative error statistics of GBDT models under different hyperparameter combinations

    Hyperparameter optimization algorithm   Maximum relative error/%   Average relative error/%
    Grid search                             11.17                      3.15
    Random search                           11.78                      3.51
    Gaussian-process-based BO               11.79                      3.53
    TPE-based BO                            9.55                       2.70
    Optuna-based BO                         10.97                      3.13
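
    This page does not spell out how the maximum and average relative errors are computed. A minimal sketch of the usual definition, |predicted - actual| / |actual| expressed in percent, is given below. The actual values are the Y entries of rows 71 to 73 in Table 1; the predicted values are hypothetical and chosen only to exercise the function.

```python
def relative_errors(actual, predicted):
    """Per-sample relative error in percent: |pred - actual| / |actual| * 100."""
    return [abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


# Actual Y values from rows 71 to 73 of Table 1; predictions are hypothetical.
actual = [4.07, 4.92, 8.05]
predicted = [4.00, 5.10, 7.90]

errs = relative_errors(actual, predicted)
max_err = max(errs)
mean_err = sum(errs) / len(errs)
print(f"maximum relative error = {max_err:.2f}%")
print(f"average relative error = {mean_err:.2f}%")
```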

    Table 6   Relative error statistics of GBDT models under different feature combinations

    Feature selection method             Maximum relative error/%   Average relative error/%
    Variance filtering                   13.48                      3.04
    F-test                               8.19                       2.77
    Mutual information                   9.53                       4.30
    Embedded method                      9.55                       2.70
    Wrapper method                       11.18                      2.79
    Wrapper method + manual selection    7.18                       2.61

    Table 7   Relative error statistics of the four models

    Prediction model               Maximum relative error/%   Average relative error/%
    Random forest model            12.59                      4.05
    Support vector machine model   12.72                      4.17
    Neural network model           8.11                       3.84
    GBDT model                     7.18                       2.61
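
    The percentage reductions quoted in the abstract (35.56%, 37.41%, 32.03%) are relative reductions in average relative error, not differences in percentage points. The short check below recomputes them from the Table 7 values; the variable names are illustrative.

```python
# Average relative errors (%) from Table 7.
bo_gbdt = 2.61
baselines = {
    "random forest": 4.05,
    "support vector machine": 4.17,
    "neural network": 3.84,
}

# Relative reduction: (baseline - BO-GBDT) / baseline, in percent.
reductions = {
    name: (mre - bo_gbdt) / mre * 100.0 for name, mre in baselines.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}% lower")
```

    Rounded to two decimals, the three results agree with the figures in the abstract.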

Publication history
  • Received:  2024-07-06
  • Revised:  2024-12-21
  • Published online:  2024-12-05
  • Issue date:  2024-12-24
