Lightweight multi-scale object detection model for autonomous mining trucks in open-pit mines

  • Abstract: In open-pit mine autonomous mining truck scenarios, low illumination and heavy dust lead to poor detection accuracy when targets of multiple scales coexist, and the large parameter counts of existing models make it difficult to balance detection accuracy against lightweight deployment. To address these problems, a lightweight multi-scale object detection model for autonomous mining trucks in open-pit mines, referred to as the improved YOLOv11n model, is proposed. The model introduces a Mixed Token (MToken) mechanism into the shallow C3k2 modules of the backbone network, where parallel convolution branches with different dilation rates strengthen feature extraction for multi-scale targets; introduces Multiple Look-Up Tables (MuLUT) into the deep C3k2 modules of the backbone, where deep semantic feature modeling strengthens discrimination of multi-scale targets; replaces a C3k2 module with an Intensity Lighten Self-Attention (ILSA) module to improve the quality of feature representation in complex low-illumination environments; and introduces an adaptive Top-k selection strategy into a Pyramid Sparse Transformer (PST) module that replaces the C3k2 modules of the original neck feature pyramid, where cross-scale feature enhancement improves the capture of multi-scale targets. Experimental results show that: ① compared with the YOLOv11n baseline, the improved YOLOv11n model raises mAP@0.5 by 3.7% while reducing the number of parameters, computational cost, and model size by 26.7%, 30.2%, and 21.8%, respectively; ② compared with SSD, Faster R-CNN, YOLOv11n, YOLOv12n, and YOLOv13n, the improved model achieves the best mAP@0.5 with the smallest parameter count, computational cost, and model size, making it well suited to edge deployment; ③ deployed on an edge device, the improved model accurately detects vehicles and pedestrians at an inference speed of 27.6 frames/s with a model size of 2.673 MiB, demonstrating excellent real-time performance and deployment efficiency.
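The MToken design described above runs parallel convolution branches with different dilation rates, so one module observes several receptive-field sizes at once. The paper applies this to 2D feature maps inside the C3k2 blocks; as an illustration only, the NumPy sketch below shows the idea on a 1D signal with a hypothetical three-branch fusion by averaging (the branch count, dilation rates, and fusion rule here are assumptions, not the paper's exact design).

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """'Same'-padded 1D convolution with dilation rate d (odd-length kernel)."""
    k = len(w)
    pad = (k - 1) * d // 2  # keeps output length equal to input length
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * d] for j in range(k))
                     for i in range(len(x))])

def multi_dilation_branches(x, w, rates=(1, 2, 3)):
    """Hypothetical MToken-style block: parallel dilated branches, averaged."""
    return np.mean([dilated_conv1d(x, w, d) for d in rates], axis=0)

x = np.ones(8)
w = np.array([1.0, 1.0, 1.0])
y = multi_dilation_branches(x, w)  # interior positions see the full kernel on every branch
```

Larger dilation rates widen the receptive field without adding parameters, which is why such branches help when small and large targets coexist in one frame.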
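The adaptive Top-k selection in the PST neck keeps only the most relevant attention entries rather than attending densely. The abstract does not spell out the selection rule, so the sketch below uses an assumed criterion: keep the smallest k entries whose softmax mass reaches a coverage threshold tau, then renormalize (both the threshold and the criterion itself are illustrative assumptions).

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def adaptive_topk_attention(scores, values, tau=0.9):
    """Keep the smallest k entries whose softmax mass reaches tau (assumed rule)."""
    p = softmax(scores)
    order = np.argsort(p)[::-1]  # indices by descending weight
    k = int(np.searchsorted(np.cumsum(p[order]), tau)) + 1
    sparse = np.zeros_like(p)
    sparse[order[:k]] = p[order[:k]]
    sparse /= sparse.sum()       # renormalize the kept weights
    return sparse @ values, k

values = np.array([1.0, 2.0, 3.0, 4.0])
out, k = adaptive_topk_attention(np.array([10.0, 0.0, 0.0, 0.0]), values)
# one dominant score -> k == 1 and the output collapses to values[0]
```

Because k shrinks when attention is concentrated and grows when it is diffuse, this kind of rule adapts the sparsity per query instead of fixing k globally, which matches the "adaptive" behavior the model aims for.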

     
