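Autonomous obstacle avoidance of underground coal mine transport robots based on an intrinsic motivation reinforcement learning algorithm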

赵克宝; 李灵锋; 陈茁; 韩骏; 尹瑞

doi:10.13272/j.issn.1671-251x.2025040020

Abstract

Existing robot obstacle avoidance methods mostly rely on preset rules or external reward signals, which makes them hard to adapt to the complex and changeable environment of underground coal mines. To achieve autonomous and efficient obstacle avoidance for underground coal mine transport robots, an autonomous obstacle avoidance method based on an intrinsic motivation reinforcement learning (IM-RL) algorithm is proposed. The transport robot perceives its surroundings through visual sensors. A curiosity-based intrinsic motivation orientation function computes an internal reward value that characterizes the attributes of the external environment, and an external motivation reward function computes an external reward value for the attributes of the robot's actions. The two reward values are combined using the reward weights of the two functions to obtain a comprehensive reward value for the states before and after the robot performs an action, which forms the reward mechanism of the reinforcement learning algorithm. A deep belief network is used to train on and learn the robot's states, encouraging the robot to actively explore unknown environments, while a memory mechanism stores the acquired knowledge and experience, so that autonomous obstacle avoidance is achieved through continued learning and training. Obstacle avoidance experiments were carried out in static environments, dynamic environments, and an actual underground coal mine environment. The results show that the IM-RL-based method yields relatively short obstacle avoidance paths and search times and exhibits strong generalization and robustness.
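The reward mechanism described in the abstract, a weighted combination of an intrinsic (curiosity) reward and an external reward, can be illustrated with a minimal sketch. The abstract does not give the functional forms, so everything below is an assumption made for illustration: curiosity is modeled as the squared error of a hypothetical forward-model prediction, the external reward as a collision penalty plus progress toward the goal, and the weights `w_int` and `w_ext` are placeholder values, not the paper's settings.

```python
from typing import Sequence


def curiosity_reward(predicted_next: Sequence[float],
                     observed_next: Sequence[float]) -> float:
    """Intrinsic reward (assumed form): squared prediction error.

    A larger mismatch between the predicted and observed next state
    means the state is more novel, so the reward is higher.
    """
    return sum((p - o) ** 2 for p, o in zip(predicted_next, observed_next))


def external_reward(collided: bool,
                    dist_before: float,
                    dist_after: float) -> float:
    """External reward (assumed form): penalize collisions,
    otherwise reward the reduction in distance to the goal."""
    if collided:
        return -1.0
    return dist_before - dist_after


def combined_reward(r_int: float, r_ext: float,
                    w_int: float = 0.3, w_ext: float = 0.7) -> float:
    """Comprehensive reward as a weighted sum of the two terms.

    The weights are hypothetical placeholders.
    """
    return w_int * r_int + w_ext * r_ext


if __name__ == "__main__":
    r_int = curiosity_reward([0.0, 1.0], [0.0, 3.0])
    r_ext = external_reward(False, dist_before=5.0, dist_after=4.0)
    print(combined_reward(r_int, r_ext, w_int=0.5, w_ext=0.5))
```

In a sketch like this, raising `w_int` favors exploration of unfamiliar states, while raising `w_ext` favors collision-free progress toward the goal; how the paper balances the two is not stated in the abstract.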
