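Autonomous obstacle avoidance of underground coal mine transport robots based on intrinsic motivation reinforcement learning algorithm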

ZHAO Kebao; LI Lingfeng; CHEN Zhuo; HAN Jun; YIN Rui

doi:10.13272/j.issn.1671-251x.2025040020

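ZHAO Kebao^1, LI Lingfeng^1, CHEN Zhuo^1, HAN Jun^1, YIN Rui^{2,3}

1. Department of Mechanical and Electrical Engineering, Hebei Polytechnic of Building Materials, Qinhuangdao 066004, Hebei, China
2. China Coal Zhangjiakou Coal Mining Machinery Co., Ltd., Zhangjiakou 076250, Hebei, China
3. Hebei Province High-end Intelligent Mine Equipment Technology Innovation Center, Zhangjiakou 076250, Hebei, China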

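Funding: National Key Research and Development Program of China (2017YFF0210606); Scientific Research Program of Hebei Higher Education Institutions (ZD2022018, ZC2024136).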

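Author biography: ZHAO Kebao (born 1977), male, from Zhuozhou, Hebei; associate professor, master's degree. His main research interest is computer science. E-mail: 99283920@qq.com

Corresponding author: HAN Jun (born 1982), male, from Funing, Hebei; associate professor and senior engineer, master's degree. His research interest is intelligent control technology. E-mail: 384042235@qq.com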

CLC number: TD67
Publication history
- Received: 2025-04-08
- Revised: 2025-06-23
- Published online: 2025-06-26
- Issue date: 2025-06-14


Abstract

Existing robot obstacle avoidance methods mostly rely on preset rules or external reward signals, which makes them hard to adapt to the complex and changeable environment of underground coal mines. To achieve autonomous and efficient obstacle avoidance for underground coal mine transport robots, an autonomous obstacle avoidance method based on the intrinsic motivation reinforcement learning (IM-RL) algorithm is proposed. The transport robot perceives its surroundings through visual sensors. A curiosity-based intrinsic motivation orientation function computes an internal reward value that characterizes attributes of the external environment, and an external motivation reward function computes an external reward value for the attributes of the robot's actions. The two values are combined using the reward weights of the intrinsic motivation orientation function and the external motivation reward function to give a comprehensive reward value for the states before and after the robot performs an action, which forms the reward mechanism of the reinforcement learning algorithm. A deep belief network is trained on the robot's states, which encourages the robot to actively explore unknown environments, while the robot's own memory mechanism stores knowledge and experience, so that autonomous obstacle avoidance is achieved through continued learning and training. Obstacle avoidance experiments were conducted in a static environment, a dynamic environment, and an actual underground coal mine environment. The results show that the IM-RL-based method yields shorter obstacle avoidance paths and search times than the CNN, hybrid A*, and improved A*-DWA methods it was compared with, and that it has strong generalization and robustness.

Keywords:
- intrinsic motivation
- reinforcement learning
- transport robot
- autonomous obstacle avoidance
- path planning
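The reward mechanism described in the abstract, a weighted combination of an intrinsic (curiosity) reward and an extrinsic (action) reward, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: this page does not give the paper's formulas, so the intrinsic reward is modelled as the squared prediction error of a forward model (a common curiosity formulation), the extrinsic reward values are made up, and the names `intrinsic_reward`, `extrinsic_reward`, `combined_reward`, `w_int` and `w_ext` are hypothetical. The page also does not say which of the reported weights ξ and η applies to which term, so the sketch keeps neutral names.

```python
import math


def intrinsic_reward(predicted_next_state, actual_next_state):
    """Curiosity bonus, assumed here to be half the squared prediction error.

    A large error means the robot could not predict what it observed,
    so the state is treated as novel and worth exploring.
    """
    squared_error = sum(
        (actual - predicted) ** 2
        for predicted, actual in zip(predicted_next_state, actual_next_state)
    )
    return 0.5 * squared_error


def extrinsic_reward(collided, reached_goal, prev_dist, new_dist):
    """Hypothetical task reward: punish collisions, reward the goal and progress."""
    if collided:
        return -1.0
    if reached_goal:
        return 1.0
    # Small shaping term: positive when the robot moved closer to the goal.
    return 0.1 * (prev_dist - new_dist)


def combined_reward(r_int, r_ext, w_int, w_ext):
    """Weighted sum of both rewards; the weights are assumed to sum to 1."""
    if not math.isclose(w_int + w_ext, 1.0):
        raise ValueError("reward weights are expected to sum to 1")
    return w_int * r_int + w_ext * r_ext


if __name__ == "__main__":
    r_int = intrinsic_reward([0.0, 0.0], [1.0, 1.0])
    r_ext = extrinsic_reward(False, False, prev_dist=5.0, new_dist=4.0)
    print(combined_reward(r_int, r_ext, w_int=0.1, w_ext=0.9))
```

In a full agent this combined value would replace the plain task reward in the learning update. The weight pairs listed in Tables 1 and 3 (for example 0.90 and 0.10) sum to 1, which is the only constraint the sketch takes from this page.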

Figure 1. Autonomous obstacle avoidance process of the underground transport robot based on the IM-RL algorithm

Figure 2. Experimental environment map

Figure 3. Static obstacle avoidance paths of the robot under different reward weights

Figure 4. Dynamic obstacle avoidance paths of the robot under different reward weights

Figure 5. Dynamic obstacle avoidance paths of the robot under different algorithms

Figure 6. Obstacle avoidance paths of the robot under different algorithms in the underground coal mine environment

Table 1. Simulation data for static obstacle avoidance under different reward weights

Reward weights	Path length/m	Search time/s
ξ=0.95, η=0.05	76.59	-
ξ=0.85, η=0.15	73.53	6.56
ξ=0.90, η=0.10	56.70	5.68

Table 2. Simulation data for static obstacle avoidance under different algorithms

Algorithm	Path length/m	Search time/s
CNN	59.22	29.42
Hybrid A*	58.55	15.49
Improved A*-DWA	57.70	10.63
IM-RL	56.70	5.68

Table 3. Simulation data for dynamic obstacle avoidance under different reward weights

Reward weights	Path length/m	Search time/s
ξ=0.95, η=0.05	76.44	13.43
ξ=0.85, η=0.15	82.21	12.39
ξ=0.90, η=0.10	58.28	9.12

Table 4. Simulation data for dynamic obstacle avoidance under different algorithms

Algorithm	Path length/m	Search time/s
CNN	76.00	36.57
Hybrid A*	71.94	33.18
Improved A*-DWA	66.97	31.26
IM-RL	58.28	9.12

Table 5. Experimental data for obstacle avoidance under different algorithms in the underground coal mine environment

Algorithm	Path length/m	Search time/s
CNN	63.24	57.36
Hybrid A*	62.39	54.47
Improved A*-DWA	67.00	49.52
IM-RL	57.29	11.67
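To make the algorithm comparisons in Tables 2, 4 and 5 easier to read, the short script below computes how much shorter the IM-RL path and search time are relative to each baseline, as a percentage. The numbers are copied from those tables. The helper names `percent_reduction` and `reductions_vs_baselines` and the dictionary layout are illustrative choices, not part of the paper.

```python
# (path length in m, search time in s), copied from Tables 2, 4 and 5.
RESULTS = {
    "static simulation": {
        "CNN": (59.22, 29.42),
        "Hybrid A*": (58.55, 15.49),
        "Improved A*-DWA": (57.70, 10.63),
        "IM-RL": (56.70, 5.68),
    },
    "dynamic simulation": {
        "CNN": (76.00, 36.57),
        "Hybrid A*": (71.94, 33.18),
        "Improved A*-DWA": (66.97, 31.26),
        "IM-RL": (58.28, 9.12),
    },
    "underground coal mine": {
        "CNN": (63.24, 57.36),
        "Hybrid A*": (62.39, 54.47),
        "Improved A*-DWA": (67.00, 49.52),
        "IM-RL": (57.29, 11.67),
    },
}


def percent_reduction(baseline, value):
    """Relative reduction of `value` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def reductions_vs_baselines(table, method="IM-RL"):
    """Map each baseline to (path reduction %, search time reduction %)."""
    method_path, method_time = table[method]
    return {
        name: (
            percent_reduction(path, method_path),
            percent_reduction(time, method_time),
        )
        for name, (path, time) in table.items()
        if name != method
    }


if __name__ == "__main__":
    for scenario, table in RESULTS.items():
        print(scenario)
        for name, (d_path, d_time) in reductions_vs_baselines(table).items():
            print(f"  vs {name}: path -{d_path:.1f}%, search time -{d_time:.1f}%")
```

Against improved A*-DWA, the baseline with the shortest search time in each table, the search time reduction works out to roughly 47% in static simulation, 71% in dynamic simulation and 76% in the underground environment, while the path length reductions are much smaller in the static case (under 2%).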

