Abstract:
At present, mine personnel target detection methods based on typical models such as YOLOv8n and YOLOv11n, and their improved variants, raise detection performance at the cost of larger parameter counts and higher computational cost to varying degrees. This makes it difficult to satisfy the dual requirements of detection accuracy and real-time performance. The YOLOv12n model introduces a region attention mechanism that effectively addresses challenges such as personnel target scale variation, occlusion, and complex scene interference. Building on YOLOv12n, a mine personnel target detection model named MP-SCW-YOLO was proposed. To address the difficulty of feature extraction caused by low illumination, occlusion, and complex backgrounds, a Space-to-Depth Convolution (SPDConv) module was introduced into the backbone. This enhanced the network's ability to extract features of low-resolution and small targets under weak underground illumination while maintaining computational efficiency. Personnel targets have low contrast against the background, and their key features are easily weakened by complex background interference. To counter this, a Channel and Position Attention Mechanism (CPAM) module was integrated into the neck to further strengthen the representation of key target features along both the channel and spatial-position dimensions. To address the detection difficulties caused by low illumination, personnel occlusion, and target scale variation, the WIoUv3 loss function was adopted. It reduces the bounding box regression deviations caused by occlusion and pose variation, thereby improving localization accuracy. The experimental results showed the following. ① On both the public dataset and the self-constructed mine dataset, MP-SCW-YOLO outperformed the baseline YOLOv12n in detection performance, generalization performance, and lightweight characteristics. ② The public dataset was used to evaluate overall detection performance and lightweight characteristics. Compared with the baseline model, F1, precision, recall, and mAP@0.5 improved by 2.6%, 4.3%, 1.0%, and 2.1%, respectively, while the number of model parameters and floating-point operations decreased by 13.15% and 6.90%, respectively. ③ The self-constructed dataset was used to verify scene generalization and algorithm robustness. Compared with the baseline model, F1, recall, and mAP@0.5 improved by 0.9%, 2.1%, and 1.1%, respectively, with the same 13.15% and 6.90% reductions in parameters and floating-point operations. Compared with other mainstream models, MP-SCW-YOLO achieves the best detection performance while remaining lightweight, and can better meet the practical requirements of underground mine personnel target detection.
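The space-to-depth idea behind SPDConv can be illustrated with a minimal pure-Python sketch. It assumes the commonly described design: a scale-2 space-to-depth rearrangement followed by a non-strided convolution. For brevity, the convolution here is a 1×1 channel mixing with hand-set weights rather than a learned kernel. The names space_to_depth and pointwise_conv are hypothetical and are not taken from the MP-SCW-YOLO code. The sketch shows that the spatial resolution is halved by moving values into the channel dimension, so no input pixel is dropped before filtering.

```python
def space_to_depth(x, scale=2):
    """Rearrange a C x H x W map (nested lists) into (scale**2 * C) x H/scale x W/scale.

    Sub-maps are taken at every (row, col) offset in [0, scale) and stacked along
    the channel axis, so every input value is kept. H and W must be divisible by scale.
    """
    out = []
    for col_off in range(scale):
        for row_off in range(scale):
            for ch in x:
                # Every scale-th row from row_off, every scale-th column from col_off.
                out.append([row[col_off::scale] for row in ch[row_off::scale]])
    return out


def pointwise_conv(x, weights):
    """Non-strided 1x1 convolution: output channel o = sum_c weights[o][c] * x[c]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_oc * x[c][r][k] for c, w_oc in enumerate(w_o)) for k in range(width)]
            for r in range(height)
        ]
        for w_o in weights
    ]


if __name__ == "__main__":
    # One 4x4 input channel holding the values 0..15.
    feat = [[[4 * r + k for k in range(4)] for r in range(4)]]
    spd = space_to_depth(feat)  # 4 channels, each 2x2
    print(len(spd), len(spd[0]), len(spd[0][0]))  # 4 2 2
    # Hand-set weights that average the four sub-maps; a real SPDConv learns them.
    print(pointwise_conv(spd, [[0.25] * 4]))  # [[[2.5, 4.5], [10.5, 12.5]]]
```

In a network the same rearrangement would run on tensors, and the following convolution would have learned weights and more output channels.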
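The WIoUv3 loss mentioned above can be sketched for a single box pair, following the Wise-IoU formulation as we understand it. WIoUv1 scales the IoU loss L_IoU = 1 - IoU by the factor exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)). That factor is built from the box centers and the smallest enclosing box of width W_g and height H_g. WIoUv3 then multiplies this by a non-monotonic focusing coefficient r = β / (δ α^(β - δ)). Here the outlier degree β compares the current L_IoU with a running mean. The defaults α = 1.9 and δ = 3, the momentum, the initial mean, and the (x1, y1, x2, y2) box format are all assumptions. A training implementation would work on tensors and detach the enclosing-box term and β from the gradient, which plain floats cannot show.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> tuple[float, float]:
    """Return (WIoUv1 loss, plain IoU loss) for one predicted/ground-truth pair."""
    l_iou = 1.0 - iou(pred, gt)
    # Offset between box centers.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Smallest enclosing box (treated as a constant, i.e. detached, in training).
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r_wiou * l_iou, l_iou


def focusing_coefficient(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)); equals 1 at beta == delta."""
    return beta / (delta * alpha ** (beta - delta))


class WIoUv3:
    """Stateful single-pair WIoUv3 loss keeping a running mean of the IoU loss."""

    def __init__(self, alpha: float = 1.9, delta: float = 3.0, momentum: float = 0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.mean_l_iou = 1.0  # assumed initial running mean

    def __call__(self, pred: Box, gt: Box) -> float:
        l_v1, l_iou = wiou_v1(pred, gt)
        # Outlier degree: this sample's IoU loss relative to the running mean.
        beta = l_iou / max(self.mean_l_iou, 1e-12)
        # Exponential moving average update of the mean IoU loss.
        self.mean_l_iou = (1 - self.momentum) * self.mean_l_iou + self.momentum * l_iou
        return focusing_coefficient(beta, self.alpha, self.delta) * l_v1


if __name__ == "__main__":
    loss_fn = WIoUv3()
    print(loss_fn((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because r first rises and then falls as β grows, boxes whose IoU loss is far above average receive a smaller gradient gain. This is the mechanism WIoUv3 uses to down-weight low-quality or occluded samples.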
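For reference, the metrics quoted above follow standard definitions. Precision is P = TP / (TP + FP), recall is R = TP / (TP + FN), and F1 = 2PR / (P + R). A relative cost reduction, such as the parameter and FLOP savings, is (baseline - new) / baseline × 100%. The sketch below only encodes these definitions. The function names, TP/FP/FN counts, and model sizes in it are hypothetical, not values from the experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics from true positive, false positive, and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage reduction of a cost (e.g. parameters or FLOPs) relative to the baseline."""
    return (baseline - new) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical counts and sizes, for illustration only.
    print(precision_recall_f1(tp=90, fp=10, fn=30))
    print(relative_reduction(baseline=2.0, new=1.5))  # 25.0
```

Because F1 combines P and R, a gain in F1 and recall together with an unreported precision figure (as in result ③) is consistent with precision changing only slightly.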