Entity extraction integrating lexical information for coal mine safety accidents

LYU Huilin, DONG Jiayao, YUAN Lin, LI Li

Citation: LYU Huilin, DONG Jiayao, YUAN Lin, et al. Entity extraction integrating lexical information for coal mine safety accidents[J]. Journal of Mine Automation,2025,51(4):131-139. DOI: 10.13272/j.issn.1671-251x.2024090039


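Funding:

National Key Research and Development Program of China (2023YFC3009800); Scientific Research Program of the Shaanxi Provincial Department of Education (23JK0152); Natural Science Basic Research Program of Shaanxi Province (2024JC-YBQN-0726, 2023-JC-QN-0001); Shaanxi Qinchuangyuan "Scientist + Engineer" Team Building Project (2022KXJ-38).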

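Details
    Author biography:

    LYU Huilin (born 1982), male, from Lianyungang, Jiangsu, engineer; research interests: coal mine electromechanical and transportation technology. E-mail: 22449222@qq.com

    Corresponding author:

    LI Li (born 1991), male, from Zaozhuang, Shandong, lecturer, Ph.D.; research interests: artificial intelligence and multimodal information fusion. E-mail: lilxiansen@163.com

  • CLC number: TD67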


  • Abstract:

    Named entity recognition (NER) is a foundational task in building a knowledge graph for the coal mine safety accident domain. Chinese text, however, lacks explicit word boundaries, so existing entity extraction models make insufficient use of lexical information. To address this problem, a coal mine safety accident entity extraction model that integrates lexical information, the lexicon-enhanced RoBERTa-BiLSTM-CRF model, is proposed. First, a domain-specific lexicon for coal mine safety is constructed; RoBERTa produces character feature vectors, the Aho-Corasick (AC) automaton algorithm matches characters against the lexicon to find the candidate words for each character, and GloVe produces word feature vectors. Next, a self-attention mechanism assigns weights and fuses the RoBERTa-based character feature vectors with the GloVe-based word feature vectors, yielding fused vectors that carry lexical information. Finally, the fused vectors are fed into BiLSTM-CRF to obtain the optimal predicted label sequence, which completes the entity extraction. Experimental results show that: (1) the proposed model achieves an F1-score of 91.63% on the 12 entity types of the coal mine safety domain, 1.63 percentage points higher than the RoBERTa-BiLSTM-CRF model; (2) the proposed model outperforms the comparison models overall, both on the full extraction task and across the individual entity types, indicating that the architecture is broadly applicable to different entity types.
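    The character-to-word matching step described in the abstract can be illustrated with a small Aho-Corasick automaton. The sketch below is a minimal pure-Python version under stated assumptions: the four-word lexicon, the sample sentence, and the function names `build_automaton` and `match_words_per_char` are hypothetical and are not taken from the paper. It only shows how each character of a sentence can be mapped to the lexicon words that cover it.

```python
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton (trie plus failure links) for a lexicon."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # lexicon words recognised when a state is reached
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: set failure links and inherit outputs from them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def match_words_per_char(sentence, words):
    """Return, for every character position, the lexicon words that cover it."""
    goto, fail, out = build_automaton(words)
    covering = [[] for _ in sentence]
    state = 0
    for end, ch in enumerate(sentence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            start = end - len(word) + 1
            for pos in range(start, end + 1):
                covering[pos].append(word)
    return covering


# Hypothetical lexicon and sentence, for illustration only.
lexicon = ["掘进", "掘进工作面", "工作面", "水灾事故"]
sentence = "掘进工作面发生水灾事故"
covering = match_words_per_char(sentence, lexicon)
for ch, matched in zip(sentence, covering):
    print(ch, matched)
```

    Because the automaton scans the sentence once regardless of lexicon size, overlapping candidates such as 掘进, 掘进工作面 and 工作面 are all found in a single pass; the per-character word lists are what a later fusion step would consume.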

  • Figure 1. Ontology model of coal mine safety accidents

    Figure 2. Overall framework of the entity extraction model integrating lexical information

    Figure 3. Character-word vector matching process

    Figure 4. States of the AC automaton

    Figure 5. Character-word feature vector fusion process

    Figure 6. Input vectors of the RoBERTa model

    Figure 7. Spatial location entities (partial)

    Figure 8. F1-scores of different models during training

    Figure 9. Prediction results of different models
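    Figure 5 concerns the fusion of character and word feature vectors. One plausible reading of attention-weighted fusion is sketched below in plain Python. The details are illustrative assumptions rather than the paper's confirmed design: word vectors are assumed to be already projected to the character vector's dimension, attention scores are scaled dot products between the character vector and each candidate word vector, and the weighted word summary is added to the character vector. The function name `fuse_char_with_words` and the toy 3-dimensional vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_char_with_words(char_vec, word_vecs):
    """Fuse one character vector with the vectors of its matched words.

    Assumption: every word vector already has len(char_vec) dimensions.
    Returns (fused_vector, attention_weights).
    """
    if not word_vecs:
        # No lexicon word covers this character: keep the character vector.
        return list(char_vec), []
    scale = math.sqrt(len(char_vec))
    scores = [sum(c * w for c, w in zip(char_vec, vec)) / scale
              for vec in word_vecs]
    weights = softmax(scores)
    # Attention-weighted summary of the candidate word vectors.
    summary = [sum(wt * vec[i] for wt, vec in zip(weights, word_vecs))
               for i in range(len(char_vec))]
    fused = [c + s for c, s in zip(char_vec, summary)]
    return fused, weights


char_vec = [0.2, -0.1, 0.4]                      # toy character vector
word_vecs = [[0.3, 0.0, 0.5], [-0.2, 0.1, 0.0]]  # toy vectors of two matched words
fused, weights = fuse_char_with_words(char_vec, word_vecs)
print(weights, fused)
```

    In this toy case the first word vector points in roughly the same direction as the character vector, so it receives the larger weight; a character with no matched word passes through unchanged.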

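    Table 1. Named entity annotation scheme

    | No. | Entity | Label | Example | Remark |
    | --- | --- | --- | --- | --- |
    | 1 | Accidents and disasters | Accident | 水灾事故 (flood accident) | Research object |
    | 2 | Coal mining operations | Method | 掘进作业 (tunneling operation) | Personnel operation |
    | 3 | Prevention and control measures | Prevention | 顶板维护 (roof maintenance) | Personnel operation |
    | 4 | Rescue and aftermath | Rescue | 抢排水 (emergency drainage) | Personnel operation |
    | 5 | Personnel | Person | 采掘工 (mining and tunneling worker) | Personnel |
    | 6 | Electromechanical equipment | Facility | 掘进机 (roadheader) | Machine |
    | 7 | Spatial location | Place | 掘进工作面 (heading face) | Environment |
    | 8 | Atmospheric environment | Atmospheric | 瓦斯 (mine gas) | Environment |
    | 9 | Geological conditions | Geology | 煤层厚度 (coal seam thickness) | Environment |
    | 10 | Data parameters | Parameters | 每班, 每周 (per shift, per week) | Management |
    | 11 | Safety management | Management | 综合应急预案 (comprehensive emergency plan) | Management |
    | 12 | Organizations | Organization | 抢险救援指挥部 (emergency rescue headquarters) | Management |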

    Table 2. Parameter settings of the RoBERTa-BiLSTM-CRF entity extraction model

    Parameter  RoBERTa layer  Character-word fusion layer  BiLSTM layer  CRF layer
    Batch size:  32
    Maximum sentence length:  256
    Number of labels:  12  12
    Transition matrix dimension:  14×14
    Embedding dimension:  1024  1024  1024  1024
    Transformer layers:  12
    Hidden size:  768  768  128
    Attention heads:  12  12
    Word vector dimension:  100
    LSTM layers:  2
    Dropout:  0.1  0.1  0.5
    Learning rate:  3×10^-5  3×10^-5  1.5×10^-3
    Normalization parameter:  0.7
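    Table 2 lists a transition matrix for the CRF layer. As a rough illustration of how a CRF layer combines per-character label scores with such a matrix to select the best label sequence, the sketch below implements Viterbi decoding. It is a toy under stated assumptions: the 3-label set, the emission scores and the 3×3 transition values are hypothetical stand-ins (not the 14×14 matrix of Table 2), and the function name `viterbi_decode` is illustrative.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for ending at label j on this step.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-Place", "I-Place"]      # hypothetical toy label set
transitions = [[0.5, 0.2, -10.0],         # O -> I-Place is heavily penalised
               [0.0, -1.0, 1.0],
               [0.3, -0.5, 0.5]]
emissions = [[0.1, 2.0, 0.0],
             [0.5, 0.0, 1.5],
             [2.0, 0.0, 0.3]]
print([labels[i] for i in viterbi_decode(emissions, transitions)])
```

    The transition scores are what lets the decoder reject label sequences that look good position by position but are inconsistent as a whole, which a per-character Softmax classifier cannot do.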

    Table 3. Entity extraction results of different models (%)

    | Model | F1 | Precision | Recall |
    | --- | --- | --- | --- |
    | BiLSTM-CRF | 70.83 | 71.53 | 70.14 |
    | RoBERTa-Softmax | 84.91 | 85.64 | 84.19 |
    | RoBERTa-CRF | 86.52 | 87.46 | 85.60 |
    | RoBERTa-BiLSTM-CRF | 90.00 | 91.91 | 88.17 |
    | Proposed model | 91.63 | 92.38 | 90.89 |
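    The F1 values in Table 3 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short script below is a sanity check written for this purpose, not code from the paper; the dictionary name `table3` and the function name `f1_score` are illustrative, and the numbers are copied from Table 3.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# model: (precision %, recall %, reported F1 %), copied from Table 3
table3 = {
    "BiLSTM-CRF": (71.53, 70.14, 70.83),
    "RoBERTa-Softmax": (85.64, 84.19, 84.91),
    "RoBERTa-CRF": (87.46, 85.60, 86.52),
    "RoBERTa-BiLSTM-CRF": (91.91, 88.17, 90.00),
    "Proposed model": (92.38, 90.89, 91.63),
}

for model, (p, r, reported) in table3.items():
    print(f"{model}: computed F1 = {f1_score(p, r):.2f}, reported = {reported:.2f}")

# Gain of the proposed model over RoBERTa-BiLSTM-CRF, in percentage points.
gain = table3["Proposed model"][2] - table3["RoBERTa-BiLSTM-CRF"][2]
print(f"gain = {gain:.2f} percentage points")
```

    Each recomputed F1 agrees with the reported value to two decimal places, and the gain of the proposed model over RoBERTa-BiLSTM-CRF is the 1.63 percentage points quoted in the abstract.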

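    Table 4. F1-scores for the 12 entity types (F1 in %)

    | Entity type | Count | BiLSTM-CRF | RoBERTa-Softmax | RoBERTa-CRF | RoBERTa-BiLSTM-CRF | Proposed model |
    | --- | --- | --- | --- | --- | --- | --- |
    | Accidents and disasters | 524 | 64.50 | 78.82 | 80.53 | 84.16 | 85.69 |
    | Coal mining operations | 613 | 63.78 | 77.98 | 79.77 | 83.20 | 84.83 |
    | Prevention and control measures | 515 | 65.83 | 80.00 | 81.75 | 87.38 | 86.99 |
    | Rescue and aftermath | 209 | 70.33 | 87.56 | 86.60 | 90.43 | 91.87 |
    | Personnel | 185 | 76.22 | 85.41 | 92.43 | 98.38 | 97.84 |
    | Electromechanical equipment | 1721 | 76.06 | 84.72 | 93.72 | 94.19 | 95.41 |
    | Spatial location | 1127 | 72.94 | 90.68 | 88.73 | 92.28 | 93.88 |
    | Atmospheric environment | 158 | 67.09 | 81.65 | 83.54 | 87.34 | 89.24 |
    | Geological conditions | 253 | 69.96 | 84.58 | 86.56 | 92.09 | 91.70 |
    | Data parameters | 758 | 71.11 | 88.65 | 87.07 | 90.50 | 92.22 |
    | Safety management | 432 | 73.15 | 91.90 | 92.59 | 92.82 | 94.44 |
    | Organizations | 69 | 73.91 | 86.96 | 89.86 | 98.55 | 100.00 |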
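    To see where the proposed model gains or loses against RoBERTa-BiLSTM-CRF, the per-type F1 differences in Table 4 can be tabulated directly. The script below is an illustrative reading aid, not code from the paper; it keys each row by the corresponding label from Table 1 (rows in the same order), and the names `table4`, `deltas` and `lower` are hypothetical.

```python
# label: (entity count, F1 % of RoBERTa-BiLSTM-CRF, F1 % of the proposed model),
# copied from Table 4 and keyed by the Table 1 labels in row order
table4 = {
    "Accident": (524, 84.16, 85.69),
    "Method": (613, 83.20, 84.83),
    "Prevention": (515, 87.38, 86.99),
    "Rescue": (209, 90.43, 91.87),
    "Person": (185, 98.38, 97.84),
    "Facility": (1721, 94.19, 95.41),
    "Place": (1127, 92.28, 93.88),
    "Atmospheric": (158, 87.34, 89.24),
    "Geology": (253, 92.09, 91.70),
    "Parameters": (758, 90.50, 92.22),
    "Management": (432, 92.82, 94.44),
    "Organization": (69, 98.55, 100.00),
}

# F1 difference (proposed minus baseline) in percentage points.
deltas = {label: round(prop - base, 2) for label, (_, base, prop) in table4.items()}
lower = [label for label, d in deltas.items() if d < 0]
total_entities = sum(count for count, _, _ in table4.values())

for label, d in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<13}{d:+.2f}")
print("types where the proposed model is lower:", lower)
print("total annotated entities:", total_entities)
```

    On these figures the proposed model scores higher on 9 of the 12 types and slightly lower (by 0.39 to 0.54 percentage points) on Prevention, Person and Geology, which is consistent with the abstract's claim of better overall rather than uniformly better performance.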

Publication history
  • Received: 2024-09-10
  • Revised: 2025-04-06
  • Published online: 2025-03-26
  • Issue date: 2025-04-14
