Construction of a mine accident knowledge graph based on Large Language Models

ZHANG Pengyang, SHENG Long, WANG Wei, WEI Zhongcheng, ZHAO Jijun

Citation: ZHANG Pengyang, SHENG Long, WANG Wei, et al. Construction of a mine accident knowledge graph based on Large Language Models[J]. Journal of Mine Automation, 2025, 51(2): 76-83, 105. DOI: 10.13272/j.issn.1671-251x.2024080031


Funding: National Natural Science Foundation of China (61802107); Science and Technology Research Project of Colleges and Universities in Hebei Province (ZD2020171); Hebei Provincial Science and Technology Program (22567624H).
Details
    About the authors:

    ZHANG Pengyang (1998—), male, born in Handan, Hebei; master's degree candidate; research interests: natural language processing and knowledge graphs. E-mail: zhangpy996@163.com

    Corresponding author:

    SHENG Long (1982—), male, born in Handan, Hebei; associate professor, Ph.D.; research interests: natural language processing, artificial intelligence, and urban public safety. E-mail: shenglong@hebeu.edu.cn

  • CLC number: TD67


  • Abstract:

    Existing methods for constructing knowledge graphs in the mining domain require large amounts of manually labeled, high-quality supervised data during pre-training, which is labor-intensive and inefficient. Large Language Models (LLMs) can markedly improve the quality and efficiency of information extraction with only a small amount of manually labeled high-quality data; however, combining an LLM with plain prompts suffers from catastrophic forgetting. To address this problem, graph-structural information was embedded into the prompt template, yielding a Graph-Structured Prompt; by equipping the LLM with this prompt, high-quality construction of a mine accident knowledge graph based on the LLM was achieved. First, publicly available mine accident reports were collected from the Coal Mine Safety Production Network and preprocessed by correcting formatting and removing redundant information. Second, the LLM was used to mine the knowledge contained in the report texts, and K-means clustering was applied to the entities and inter-entity relations to build the mine accident ontology. Then, a small amount of data was labeled according to the ontology and used for LLM learning and fine-tuning. Finally, the LLM embedded with the Graph-Structured Prompt performed information extraction, instantiating entity-relation triples to construct the mine accident knowledge graph. Experimental results show that LLMs outperform the Universal Information Extraction (UIE) model on entity- and relation-extraction tasks, and that the LLM embedded with the Graph-Structured Prompt achieves higher precision, recall, and F1 than the same LLM without it.

  • Figure 1. Construction process of the mine accident knowledge graph

    Figure 2. Entity-relation mining process

    Figure 3. Entities and inter-entity relations

    Figure 4. Graph-structure information of the accident overview text

    Figure 5. Graph-structure information of the accident unit profile text

    Figure 6. Graph-structure information of the accident occurrence text

    Figure 7. Information extraction process and example

    Figure 8. Data preprocessing flow

    Figure 9. Knowledge graph of a roof accident
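Figures 4-6 show the graph-structure information embedded into the prompt, and Figure 7 the resulting extraction flow. As a rough, hypothetical reconstruction (the exact template wording is the one shown in the figures, not reproduced here), a graph-structured prompt can be assembled by serializing the ontology's (head)-[relation]->(tail) edges ahead of the extraction instruction, so the schema stays in the model's context window:

```python
# Hypothetical ontology edges; the real schema comes from the constructed ontology.
ONTOLOGY_EDGES = [
    ("Accident", "occurred_at", "Time"),
    ("Accident", "occurred_in", "Mine"),
    ("Accident", "accident_type", "Type"),
    ("Accident", "caused", "Casualties"),
]

def build_graph_prompt(report_text: str) -> str:
    """Serialize the schema as graph edges and prepend it to the instruction."""
    schema = "\n".join(f"({h})-[{r}]->({t})" for h, r, t in ONTOLOGY_EDGES)
    return (
        "Extract (head, relation, tail) triples from the mine accident report "
        "below. Use ONLY the relations in the graph schema and answer with a "
        "JSON list of triples.\n"
        f"Graph schema:\n{schema}\n"
        f"Report:\n{report_text}"
    )

print(build_graph_prompt("On 5 March 2021 a roof accident at Mine A killed two workers."))
```

Restating the schema in every request is what distinguishes this from a plain task prompt, and it is the mechanism by which the method counters the catastrophic forgetting noted in the abstract.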

    Table 1. Comparison of the Universal Information Extraction (UIE) model and LLMs on information extraction tasks

    Model          Entity extraction               Relation extraction
                   Precision  Recall   F1          Precision  Recall   F1
    UIE            0.894      0.827    0.859       0.713      0.627    0.667
    GPT-3.5        0.893      0.847    0.870       0.887      0.904    0.895
    GLM-4          0.956      0.850    0.901       0.910      0.885    0.898
    ERNIE-4.0      0.752      0.836    0.792       0.788      0.817    0.802
    Qwen-7B-chat   0.883      0.855    0.869       0.862      0.881    0.871
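The precision, recall, and F1 values reported in Tables 1 and 2 are standard set-overlap scores. Below is a minimal sketch of how they could be computed over predicted versus gold triples; exact-match scoring is an assumption here, since the paper does not state its matching criterion.

```python
def prf1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Exact-match precision/recall/F1 over sets of (head, relation, tail) triples."""
    tp = len(predicted & gold)                     # correctly extracted triples
    p = tp / len(predicted) if predicted else 0.0  # precision
    r = tp / len(gold) if gold else 0.0            # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("Accident-1", "occurred_in", "Mine-A"), ("Accident-1", "caused", "2 deaths")}
pred = {("Accident-1", "occurred_in", "Mine-A"), ("Accident-1", "occurred_at", "2021-03")}
print(prf1(pred, gold))  # (0.5, 0.5, 0.5)
```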

    Table 2. Comparison of LLMs on information extraction tasks with and without the Graph-Structured Prompt

    Model          Prompt                           Entity extraction            Relation extraction
                                                    Precision  Recall   F1       Precision  Recall   F1
    GPT-3.5        without Graph-Structured Prompt  0.775      0.835    0.804    0.803      0.791    0.797
                   with Graph-Structured Prompt     0.893      0.847    0.870    0.887      0.904    0.895
    GLM-4          without Graph-Structured Prompt  0.831      0.679    0.793    0.785      0.794    0.789
                   with Graph-Structured Prompt     0.956      0.850    0.901    0.910      0.885    0.898
    ERNIE-4.0      without Graph-Structured Prompt  0.673      0.731    0.701    0.695      0.683    0.689
                   with Graph-Structured Prompt     0.752      0.836    0.792    0.788      0.817    0.802
    Qwen-7B-chat   without Graph-Structured Prompt  0.761      0.748    0.754    0.792      0.731    0.760
                   with Graph-Structured Prompt     0.883      0.855    0.869    0.862      0.881    0.871
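The final step of the pipeline instantiates the extracted triples in a graph database, yielding graphs like the roof-accident example in Figure 9. The paper does not name its storage backend; the sketch below assumes Neo4j with the official Python driver, with connection details invented for illustration.

```python
from neo4j import GraphDatabase  # assumes the official neo4j Python driver

triples = [
    ("Accident-1", "occurred_in", "Mine-A"),
    ("Accident-1", "accident_type", "roof accident"),
]

# Hypothetical connection details.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for head, rel, tail in triples:
        # MERGE keeps nodes and edges unique across re-runs; the relation name
        # is stored as a property so arbitrary ontology relations need no
        # per-relation Cypher.
        session.run(
            "MERGE (h:Entity {name: $head}) "
            "MERGE (t:Entity {name: $tail}) "
            "MERGE (h)-[:REL {type: $rel}]->(t)",
            head=head, tail=tail, rel=rel,
        )
driver.close()
```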
Article history
  • Received: 2024-08-13
  • Revised: 2025-02-27
  • Published online: 2025-02-27
  • Issue date: 2025-02-14
