Lightweight multi-scale object detection model for autonomous mining trucks in open-pit mines

  • Abstract: In open-pit mine autonomous mining truck scenarios, low illumination and heavy dust lead to poor detection accuracy when targets of multiple scales coexist, and the large parameter counts of existing models make it difficult to balance detection accuracy against lightweight deployment. To address these problems, a lightweight multi-scale object detection model for autonomous mining trucks in open-pit mines, referred to as the improved YOLOv11n model, is proposed. The model introduces a Mixed Token (MToken) mechanism into the shallow C3k2 modules of the backbone network, where parallel convolution branches with different dilation rates strengthen feature extraction for multi-scale targets; introduces Multiple Look-Up Tables (MuLUT) into the deep C3k2 modules of the backbone, where deep semantic feature modeling strengthens discrimination of multi-scale targets; replaces a C3k2 module with an Intensity Lighten Self-Attention (ILSA) module to improve the quality of feature representation in complex low-illumination environments; and introduces an adaptive Top-k selection strategy into a Pyramid Sparse Transformer (PST) module that replaces the C3k2 modules of the original neck feature pyramid, where cross-scale feature enhancement improves the capture of multi-scale targets. Experimental results show that: ① compared with the YOLOv11n baseline, the improved YOLOv11n model raises mAP@0.5 by 3.7% while reducing the number of parameters, computational cost, and model size by 26.7%, 30.2%, and 21.8%, respectively; ② compared with SSD, Faster R-CNN, YOLOv11n, YOLOv12n, and YOLOv13n, the improved model achieves the best mAP@0.5 with the smallest parameter count, computational cost, and model size, making it well suited to edge deployment; ③ deployed on an edge device, the improved model accurately detects vehicles and pedestrians at an inference speed of 27.6 frames/s with a model size of 2.673 MiB, demonstrating excellent real-time performance and deployment efficiency.
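The MToken design described above runs parallel convolution branches with different dilation rates, so one module observes several receptive-field sizes at once. The paper applies this to 2D feature maps inside the C3k2 blocks; as an illustration only, the NumPy sketch below shows the idea on a 1D signal with a hypothetical three-branch fusion by averaging (the branch count, dilation rates, and fusion rule here are assumptions, not the paper's exact design).

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """'Same'-padded 1D convolution with dilation rate d (odd-length kernel)."""
    k = len(w)
    pad = (k - 1) * d // 2  # keeps output length equal to input length
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * d] for j in range(k))
                     for i in range(len(x))])

def multi_dilation_branches(x, w, rates=(1, 2, 3)):
    """Hypothetical MToken-style block: parallel dilated branches, averaged."""
    return np.mean([dilated_conv1d(x, w, d) for d in rates], axis=0)

x = np.ones(8)
w = np.array([1.0, 1.0, 1.0])
y = multi_dilation_branches(x, w)  # interior positions see the full kernel on every branch
```

Larger dilation rates widen the receptive field without adding parameters, which is why such branches help when small and large targets coexist in one frame.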
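The adaptive Top-k selection in the PST neck keeps only the most relevant attention entries rather than attending densely. The abstract does not spell out the selection rule, so the sketch below uses an assumed criterion: keep the smallest k entries whose softmax mass reaches a coverage threshold tau, then renormalize (both the threshold and the criterion itself are illustrative assumptions).

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def adaptive_topk_attention(scores, values, tau=0.9):
    """Keep the smallest k entries whose softmax mass reaches tau (assumed rule)."""
    p = softmax(scores)
    order = np.argsort(p)[::-1]  # indices by descending weight
    k = int(np.searchsorted(np.cumsum(p[order]), tau)) + 1
    sparse = np.zeros_like(p)
    sparse[order[:k]] = p[order[:k]]
    sparse /= sparse.sum()       # renormalize the kept weights
    return sparse @ values, k

values = np.array([1.0, 2.0, 3.0, 4.0])
out, k = adaptive_topk_attention(np.array([10.0, 0.0, 0.0, 0.0]), values)
# one dominant score -> k == 1 and the output collapses to values[0]
```

Because k shrinks when attention is concentrated and grows when it is diffuse, this kind of rule adapts the sparsity per query instead of fixing k globally, which matches the "adaptive" behavior the model aims for.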

     
