Abstract:
For autonomous mining trucks in open-pit mines, low illumination and heavy dust degrade detection accuracy when multi-scale targets coexist, and the large parameter counts of existing models make it difficult to balance detection accuracy against lightweight deployment. To address these issues, a lightweight multi-scale object detection model for autonomous mining trucks in open-pit mines was proposed, referred to as the improved YOLOv11n model. In the shallow C3k2 modules of the backbone network, a Mixed Token (MToken) module was introduced, using parallel multi-dilation-rate convolution branches to strengthen feature extraction for multi-scale targets. In the deep C3k2 modules of the backbone network, a Multiple Look-Up Tables (MuLUT) module was introduced, using deep semantic feature modeling to strengthen discrimination of multi-scale targets. An Intensity Lighten Self-Attention (ILSA) module replaced the corresponding C3k2 modules, improving feature representation quality under complex low-illumination conditions. A Pyramid Sparse Transformer (PST) module with an adaptive Top-k selection strategy replaced the C3k2 modules in the original neck feature pyramid, using cross-scale feature enhancement to improve the capture of multi-scale targets. Experimental results showed that: ① Compared with the YOLOv11n model, the improved YOLOv11n model increased mAP@0.5 by 3.2% while reducing the number of parameters, computational cost, and model size by 26.7%, 30.2%, and 21.8%, respectively. ② Compared with SSD, Faster R-CNN, YOLOv11n, YOLOv12n, and YOLOv13n, the improved YOLOv11n model achieved the highest mAP@0.5 with the smallest number of parameters, computational cost, and model size, making it well suited to edge deployment.
③ When deployed on edge devices, the improved YOLOv11n model accurately detected vehicles and pedestrians at an inference speed of 27.6 frames/s with a model size of 2.673 MiB, demonstrating strong real-time performance and deployment efficiency.
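The adaptive Top-k selection in the PST module can be illustrated with a minimal sketch. The paper's exact selection criterion is not given in the abstract, so the rule below (keeping a fixed fraction of keys per query, with the fraction and function names as illustrative assumptions) is only a hedged approximation of sparse Top-k attention, not the authors' implementation:

```python
import numpy as np

def adaptive_topk_attention(q, k, v, ratio=0.5, min_k=1):
    """Sparse attention where each query attends only to its top-k keys.

    k is chosen adaptively as a fraction of the key-sequence length;
    this ratio rule is an illustrative assumption, not the paper's
    exact PST selection strategy.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    topk = max(min_k, int(round(ratio * n_k)))      # adaptive number of kept keys

    scores = q @ k.T / np.sqrt(d)                   # (n_q, n_k) similarity scores
    # Indices of the top-k keys per query (order within the k is irrelevant).
    idx = np.argpartition(-scores, topk - 1, axis=1)[:, :topk]

    out = np.zeros((n_q, v.shape[1]))
    for i in range(n_q):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                                # softmax over the kept keys only
        out[i] = w @ v[idx[i]]
    return out
```

With `ratio=1.0` the sketch reduces to ordinary softmax attention; smaller ratios discard low-scoring keys, which is the source of the computational savings that sparse Transformer necks exploit.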