Entity extraction integrating lexical information for coal mine safety accidents

LÜ Huilin; DONG Jiayao; YUAN Lin; LI Li

doi:10.13272/j.issn.1671-251x.2024090039

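LÜ Huilin¹, DONG Jiayao², YUAN Lin³, LI Li²

1. CCTEG Changzhou Research Institute, Changzhou 213015, Jiangsu, China
2. College of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, Shaanxi, China
3. Huoshaopu Coal Mine, Guizhou Panjiang Refined Coal Co., Ltd., Liupanshui 553000, Guizhou, China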

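Funding:

National Key Research and Development Program of China (2023YFC3009800); Scientific Research Program of the Shaanxi Provincial Department of Education (23JK0152); Natural Science Basic Research Program of Shaanxi Province (2024JC-YBQN-0726, 2023-JC-QN-0001); Shaanxi Qinchuangyuan "Scientist + Engineer" Team Building Project (2022KXJ-38).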

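Details

Author biography:
LÜ Huilin (1982-), male, from Lianyungang, Jiangsu; engineer; research interest: coal mine electromechanical and transportation technology. E-mail: 22449222@qq.com

Corresponding author:
LI Li (1991-), male, from Zaozhuang, Shandong; lecturer, PhD; research interests: artificial intelligence and multimodal information fusion. E-mail: lilxiansen@163.com

CLC number: TD67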
Publication history
- Received: 2024-09-10
- Revised: 2025-04-06
- Published online: 2025-03-26
- Issue date: 2025-04-14

Abstract:
Named entity recognition (NER) is a basic task in building a knowledge graph for the coal mine safety accident domain. However, Chinese text lacks explicit word boundaries, so existing entity extraction models make insufficient use of lexical information. To address this problem, an entity extraction model for coal mine safety accidents that integrates lexical information is proposed: the lexicon-enhanced RoBERTa-BiLSTM-CRF model. First, a domain-specific lexicon for coal mine safety is constructed. RoBERTa is used to obtain character feature vectors, the Aho-Corasick (AC) automaton algorithm matches characters against the lexicon to find the candidate words for each character, and GloVe is used to obtain word feature vectors. Then, a self-attention mechanism assigns weights to fuse the RoBERTa-based character feature vectors with the GloVe-based word feature vectors, yielding fused vectors that carry lexical information. Finally, the fused vectors are fed into BiLSTM-CRF to obtain the optimal predicted label sequence, which completes entity extraction for coal mine safety accidents. Experimental results show that: (1) the proposed model achieves an F₁ of 91.63% on the extraction of 12 entity types in the coal mine safety domain, 1.63 percentage points higher than the RoBERTa-BiLSTM-CRF model; (2) the proposed model shows better overall performance than the other models on both the overall entity extraction task and the extraction tasks for individual entity types, indicating that the architecture is broadly applicable to different entity types.
Keywords:
- coal mine safety accidents
- entity extraction
- lexical information
- ontology model
- entity annotation
- named entity recognition
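The character-word matching step named in the abstract can be sketched as follows. This is a minimal illustration of Aho-Corasick matching, not the authors' implementation: the five-word lexicon, the sample sentence, and the function names `build_automaton` and `match_words` are assumptions made for the example (three of the words are taken from the examples in Table 1).

```python
from collections import deque


def build_automaton(lexicon):
    """Build a trie with failure links (Aho-Corasick) over the lexicon words."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link of each state
    output = [[]]  # lexicon words recognised when a state is reached
    for word in lexicon:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)
    # Breadth-first pass: compute failure links and inherit their outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def match_words(sentence, automaton):
    """For every character, list the lexicon words that cover it."""
    goto, fail, output = automaton
    matches = [[] for _ in sentence]
    state = 0
    for end, ch in enumerate(sentence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            start = end - len(word) + 1
            for i in range(start, end + 1):
                matches[i].append(word)
    return matches


# Hypothetical mini-lexicon and sentence, for illustration only.
lexicon = ["掘进", "掘进机", "掘进工作面", "工作面", "瓦斯"]
automaton = build_automaton(lexicon)
sentence = "掘进工作面瓦斯超限"
matches = match_words(sentence, automaton)
for ch, words in zip(sentence, matches):
    print(ch, words)
```

In this toy run, each character of 掘进工作面 is paired with every lexicon word that covers it, including the overlapping matches 掘进 and 工作面, while 超 and 限 receive no candidate words. In the described model, such per-character word lists would be the input to the GloVe lookup and the attention-based fusion.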

Figure 1. Ontology model of coal mine safety accidents
Figure 2. Overall framework of the entity extraction model integrating lexical information
Figure 3. Process of character-word vector matching
Figure 4. States of the AC automaton
Figure 5. Process of character-word feature vector fusion
Figure 6. Input vectors for the RoBERTa model
Figure 7. Spatial location entities (partial)
Figure 8. F₁-scores of different models during training
Figure 9. Prediction results of different models

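Table 1 Scheme of named entity annotation

No.	Entity	Label	Example	Remarks
1	Accident or disaster	Accident	Flood accident (水灾事故)	Research object
2	Coal mining operation	Method	Tunneling operation (掘进作业)	Personnel operation
3	Prevention and control measure	Prevention	Roof maintenance (顶板维护)	Personnel operation
4	Rescue and aftermath	Rescue	Emergency drainage (抢排水)	Personnel operation
5	Personnel	Person	Mining and tunneling worker (采掘工)	Personnel
6	Electromechanical equipment	Facility	Roadheader (掘进机)	Machine
7	Spatial location	Place	Heading face (掘进工作面)	Environment
8	Atmospheric environment	Atmospheric	Gas (瓦斯)	Environment
9	Geological condition	Geology	Coal seam thickness (煤层厚度)	Environment
10	Data parameter	Parameters	Every shift, every week (每班，每周)	Management
11	Safety management	Management	Comprehensive emergency response plan (综合应急预案)	Management
12	Organization	Organization	Emergency rescue command headquarters (抢险救援指挥部)	Management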

Table 2 Parameters for RoBERTa-BiLSTM-CRF entity extraction model

Parameter	RoBERTa layer	Character-word fusion layer	BiLSTM layer	CRF layer
Batch size	32	-	-	-
Maximum sentence length	256	-	-	-
Number of labels	12	-	-	12
Transition matrix dimension	-	-	-	14×14
Embedding vector dimension	1024	1024	1024	1024
Transformer layers	12	-	-	-
Hidden layer size	768	768	128	-
Multi-head attention heads	12	12	-	-
Word vector dimension	-	100	-	-
LSTM layers	-	-	2	-
Dropout	0.1	0.1	0.5	-
Learning rate	3×10⁻⁵	3×10⁻⁵	1.5×10⁻³	-
Normalization parameter	-	0.7	-	-
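Table 2 lists a transition matrix for the CRF layer, and the abstract states that BiLSTM-CRF outputs the optimal predicted sequence. The page does not describe the decoding step, so the following is only a sketch of how a linear-chain CRF is commonly decoded, assuming the standard Viterbi algorithm. The three labels, all scores, and the names `viterbi_decode`, `emissions`, `transitions`, `start_scores`, and `end_scores` are illustrative assumptions, not values or identifiers from the paper.

```python
def viterbi_decode(emissions, transitions, start_scores, end_scores):
    """Return the highest-scoring label path and its score.

    emissions[t][j]   : score of label j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from label i to label j
    start_scores[j]   : score of starting the sequence with label j
    end_scores[j]     : score of ending the sequence with label j
    """
    num_labels = len(start_scores)
    score = [start_scores[j] + emissions[0][j] for j in range(num_labels)]
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            # Best previous label for arriving at label j.
            best_prev = max(range(num_labels),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    final = [score[j] + end_scores[j] for j in range(num_labels)]
    best_last = max(range(num_labels), key=lambda j: final[j])
    # Follow the backpointers from the last position to the first.
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[best_last]


# Toy example with made-up scores for a three-character input.
labels = ["O", "Facility", "Place"]
emissions = [
    [0.1, 2.0, 0.0],
    [1.0, 0.9, 0.0],
    [2.0, 0.0, 0.5],
]
transitions = [
    [0.5, 0.0, 0.0],
    [0.0, 1.0, -1.0],
    [0.0, -1.0, 1.0],
]
start_scores = [0.0, 0.0, 0.0]
end_scores = [0.0, 0.0, 0.0]

best_path, best_score = viterbi_decode(emissions, transitions,
                                       start_scores, end_scores)
print([labels[i] for i in best_path], best_score)
```

In this toy input, picking the best label per position independently would tag the second character as O, whereas the transition score for staying in Facility makes the decoded path Facility, Facility, O. That is the kind of label-consistency constraint a CRF layer adds on top of per-character scores.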

Table 3 Entity extraction performance of different models (%)

Model	F₁	Precision	Recall
BiLSTM-CRF	70.83	71.53	70.14
RoBERTa-Softmax	84.91	85.64	84.19
RoBERTa-CRF	86.52	87.46	85.60
RoBERTa-BiLSTM-CRF	90.00	91.91	88.17
Proposed model	91.63	92.38	90.89
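As a quick consistency check on Table 3, F₁ can be recomputed from the reported precision and recall as their harmonic mean. The sketch below assumes the table's F₁ column uses this standard definition; the names `f1_score`, `table3`, and `recomputed` are chosen for the example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 3.
table3 = {
    "BiLSTM-CRF": (71.53, 70.14, 70.83),
    "RoBERTa-Softmax": (85.64, 84.19, 84.91),
    "RoBERTa-CRF": (87.46, 85.60, 86.52),
    "RoBERTa-BiLSTM-CRF": (91.91, 88.17, 90.00),
    "Proposed model": (92.38, 90.89, 91.63),
}

recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in table3.items()
}
for name, (_, _, reported) in table3.items():
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.2f}")
```

For all five rows the recomputed value agrees with the reported F₁ to two decimal places, and the gap between the proposed model and RoBERTa-BiLSTM-CRF is 91.63 - 90.00 = 1.63 percentage points.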

Table 4 F₁-scores for 12 entity categories (%)

Concept class	Count	BiLSTM-CRF	RoBERTa-Softmax	RoBERTa-CRF	RoBERTa-BiLSTM-CRF	Proposed model
Accident or disaster	524	64.50	78.82	80.53	84.16	85.69
Coal mining operation	613	63.78	77.98	79.77	83.20	84.83
Prevention and control measure	515	65.83	80.00	81.75	87.38	86.99
Rescue and aftermath	209	70.33	87.56	86.60	90.43	91.87
Personnel	185	76.22	85.41	92.43	98.38	97.84
Electromechanical equipment	1721	76.06	84.72	93.72	94.19	95.41
Spatial location	1127	72.94	90.68	88.73	92.28	93.88
Atmospheric environment	158	67.09	81.65	83.54	87.34	89.24
Geological condition	253	69.96	84.58	86.56	92.09	91.70
Data parameter	758	71.11	88.65	87.07	90.50	92.22
Safety management	432	73.15	91.90	92.59	92.82	94.44
Organization	69	73.91	86.96	89.86	98.55	100.00
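To read Table 4 per category, the difference between the proposed model and the RoBERTa-BiLSTM-CRF baseline can be computed directly from the two rightmost columns. The sketch below copies those values and keys them by the label tags from Table 1; the names `table4`, `delta`, `higher`, and `lower` are chosen for the example.

```python
# (RoBERTa-BiLSTM-CRF F1, proposed-model F1) in percent, copied from Table 4.
table4 = {
    "Accident": (84.16, 85.69),
    "Method": (83.20, 84.83),
    "Prevention": (87.38, 86.99),
    "Rescue": (90.43, 91.87),
    "Person": (98.38, 97.84),
    "Facility": (94.19, 95.41),
    "Place": (92.28, 93.88),
    "Atmospheric": (87.34, 89.24),
    "Geology": (92.09, 91.70),
    "Parameters": (90.50, 92.22),
    "Management": (92.82, 94.44),
    "Organization": (98.55, 100.00),
}

# Difference in percentage points: positive means the proposed model is higher.
delta = {label: round(ours - base, 2) for label, (base, ours) in table4.items()}
higher = [label for label, d in delta.items() if d > 0]
lower = [label for label, d in delta.items() if d < 0]

for label, d in delta.items():
    print(f"{label}: {d:+.2f}")
print("higher:", higher)
print("lower:", lower)
```

By these figures the proposed model is higher on nine of the twelve categories and slightly lower on three (Prevention, Person, and Geology, each by less than 0.6 percentage points), which is consistent with the abstract's wording of better overall performance rather than a gain in every category.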
