Construction of a mine accident knowledge graph based on Large Language Models

ZHANG Pengyang, SHENG Long, WANG Wei, WEI Zhongcheng, ZHAO Jijun

Citation: ZHANG Pengyang, SHENG Long, WANG Wei, et al. Construction of a mine accident knowledge graph based on Large Language Models[J]. Journal of Mine Automation, 2025, 51(2): 76-83, 105. DOI: 10.13272/j.issn.1671-251x.2024080031


Funding: National Natural Science Foundation of China (61802107); Science and Technology Research Project of Colleges and Universities in Hebei Province (ZD2020171); Hebei Provincial Science and Technology Program (22567624H).
Details
    About the authors:

    ZHANG Pengyang (1998—), male, born in Handan, Hebei; master's degree candidate; research interests: natural language processing and knowledge graphs. E-mail: zhangpy996@163.com

    Corresponding author:

    SHENG Long (1982—), male, born in Handan, Hebei; associate professor, Ph.D.; research interests: natural language processing, artificial intelligence, and urban public safety. E-mail: shenglong@hebeu.edu.cn

  • CLC number: TD67


  • Abstract:

    Existing methods for constructing knowledge graphs in the mining domain require large amounts of manually labeled, high-quality supervised data during pre-training, which is labor-intensive and inefficient. Large Language Models (LLMs) can markedly improve the quality and efficiency of information extraction with only a small amount of manually labeled high-quality data; however, combining an LLM with plain prompts suffers from catastrophic forgetting. To address this problem, graph-structural information was embedded into the prompt template, yielding a Graph-Structured Prompt; by equipping the LLM with this prompt, high-quality construction of a mine accident knowledge graph based on the LLM was achieved. First, publicly available mine accident reports were collected from the Coal Mine Safety Production Network and preprocessed by correcting formatting and removing redundant information. Second, the LLM was used to mine the knowledge contained in the report texts, and K-means clustering was applied to the entities and inter-entity relations to build the mine accident ontology. Then, a small amount of data was labeled according to the ontology and used for LLM learning and fine-tuning. Finally, the LLM embedded with the Graph-Structured Prompt performed information extraction, instantiating entity-relation triples to construct the mine accident knowledge graph. Experimental results show that LLMs outperform the Universal Information Extraction (UIE) model on entity- and relation-extraction tasks, and that the LLM embedded with the Graph-Structured Prompt achieves higher precision, recall, and F1 than the same LLM without it.

  • Figure 1. Construction process of the mine accident knowledge graph

    Figure 2. Entity-relation mining process

    Figure 3. Entities and inter-entity relations

    Figure 4. Graph-structure information of the accident overview text

    Figure 5. Graph-structure information of the accident unit profile text

    Figure 6. Graph-structure information of the accident occurrence text

    Figure 7. Information extraction process and example

    Figure 8. Data preprocessing flow

    Figure 9. Knowledge graph of a roof accident
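Figures 4-6 show the graph-structure information embedded into the prompt, and Figure 7 the resulting extraction flow. As a rough, hypothetical reconstruction (the exact template wording is the one shown in the figures, not reproduced here), a graph-structured prompt can be assembled by serializing the ontology's (head)-[relation]->(tail) edges ahead of the extraction instruction, so the schema stays in the model's context window:

```python
# Hypothetical ontology edges; the real schema comes from the constructed ontology.
ONTOLOGY_EDGES = [
    ("Accident", "occurred_at", "Time"),
    ("Accident", "occurred_in", "Mine"),
    ("Accident", "accident_type", "Type"),
    ("Accident", "caused", "Casualties"),
]

def build_graph_prompt(report_text: str) -> str:
    """Serialize the schema as graph edges and prepend it to the instruction."""
    schema = "\n".join(f"({h})-[{r}]->({t})" for h, r, t in ONTOLOGY_EDGES)
    return (
        "Extract (head, relation, tail) triples from the mine accident report "
        "below. Use ONLY the relations in the graph schema and answer with a "
        "JSON list of triples.\n"
        f"Graph schema:\n{schema}\n"
        f"Report:\n{report_text}"
    )

print(build_graph_prompt("On 5 March 2021 a roof accident at Mine A killed two workers."))
```

Restating the schema in every request is what distinguishes this from a plain task prompt, and it is the mechanism by which the method counters the catastrophic forgetting noted in the abstract.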

    Table 1. Comparison of the Universal Information Extraction (UIE) model and LLMs on information extraction tasks

    Model          Entity extraction               Relation extraction
                   Precision  Recall   F1          Precision  Recall   F1
    UIE            0.894      0.827    0.859       0.713      0.627    0.667
    GPT-3.5        0.893      0.847    0.870       0.887      0.904    0.895
    GLM-4          0.956      0.850    0.901       0.910      0.885    0.898
    ERNIE-4.0      0.752      0.836    0.792       0.788      0.817    0.802
    Qwen-7B-chat   0.883      0.855    0.869       0.862      0.881    0.871
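The precision, recall, and F1 values reported in Tables 1 and 2 are standard set-overlap scores. Below is a minimal sketch of how they could be computed over predicted versus gold triples; exact-match scoring is an assumption here, since the paper does not state its matching criterion.

```python
def prf1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match precision/recall/F1 over sets of (head, relation, tail) triples."""
    tp = len(predicted & gold)                     # correctly extracted triples
    p = tp / len(predicted) if predicted else 0.0  # precision
    r = tp / len(gold) if gold else 0.0            # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("Accident-1", "occurred_in", "Mine-A"), ("Accident-1", "caused", "2 deaths")}
pred = {("Accident-1", "occurred_in", "Mine-A"), ("Accident-1", "occurred_at", "2021-03")}
print(prf1(pred, gold))  # (0.5, 0.5, 0.5)
```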

    Table 2. Comparison of LLMs on information extraction tasks with and without the Graph-Structured Prompt

    Model          Prompt                           Entity extraction            Relation extraction
                                                    Precision  Recall   F1       Precision  Recall   F1
    GPT-3.5        without Graph-Structured Prompt  0.775      0.835    0.804    0.803      0.791    0.797
                   with Graph-Structured Prompt     0.893      0.847    0.870    0.887      0.904    0.895
    GLM-4          without Graph-Structured Prompt  0.831      0.679    0.793    0.785      0.794    0.789
                   with Graph-Structured Prompt     0.956      0.850    0.901    0.910      0.885    0.898
    ERNIE-4.0      without Graph-Structured Prompt  0.673      0.731    0.701    0.695      0.683    0.689
                   with Graph-Structured Prompt     0.752      0.836    0.792    0.788      0.817    0.802
    Qwen-7B-chat   without Graph-Structured Prompt  0.761      0.748    0.754    0.792      0.731    0.760
                   with Graph-Structured Prompt     0.883      0.855    0.869    0.862      0.881    0.871
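The final step of the pipeline instantiates the extracted triples in a graph database, yielding graphs like the roof-accident example in Figure 9. The paper does not name its storage backend; the sketch below assumes Neo4j with the official Python driver, with connection details invented for illustration.

```python
from neo4j import GraphDatabase  # assumes the official neo4j Python driver

triples = [
    ("Accident-1", "occurred_in", "Mine-A"),
    ("Accident-1", "accident_type", "roof accident"),
]

# Hypothetical connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for head, rel, tail in triples:
        # MERGE keeps nodes and edges unique across re-runs; the relation name
        # is stored as a property so arbitrary ontology relations need no
        # per-relation Cypher.
        session.run(
            "MERGE (h:Entity {name: $head}) "
            "MERGE (t:Entity {name: $tail}) "
            "MERGE (h)-[:REL {type: $rel}]->(t)",
            head=head, tail=tail, rel=rel,
        )
driver.close()
```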
Article history
  • Received: 2024-08-13
  • Revised: 2025-02-27
  • Published online: 2025-02-27
  • Issue date: 2025-02-14
