Algorithm for underground mine personnel target detection based on improved YOLOv12n

范伟强; 胡玉涵; 李晓宇; 张高敏; 萨日娜

doi:10.13272/j.issn.1671-251x.18276

Abstract: Existing mine personnel target detection methods based on typical models such as YOLOv8n and YOLOv11n, and on their improved variants, raise detection performance at the cost of larger parameter counts and higher computational cost to varying degrees, which makes it difficult to satisfy the dual requirements of detection accuracy and real-time performance. The YOLOv12n model introduces a region attention mechanism and can effectively handle challenges such as scale variation of personnel targets, occlusion, and interference from complex scenes. Based on YOLOv12n, a YOLO model for mine personnel target detection, named MP-SCW-YOLO, is proposed. To address the difficulty of feature extraction caused by low illumination, occlusion, and complex backgrounds, the model introduces a Space-to-Depth Convolution (SPDConv) module into the backbone, which strengthens the network's ability to extract features of low-resolution and small targets under weak mine illumination while keeping computation efficient. To address the low contrast between personnel targets and the background, where key features are easily weakened by complex background interference, a Channel and Position Attention Mechanism (CPAM) module is integrated into the neck to further strengthen the representation of key target features across channels and spatial positions. To address detection difficulties caused by low illumination, personnel occlusion, and target scale variation, the WIoUv3 loss function is adopted to reduce the bounding box regression deviation caused by occlusion and pose variation, which improves localization accuracy. The experimental results show that: ① On both the public and the self-built mine datasets, MP-SCW-YOLO outperforms the baseline YOLOv12n in detection performance, generalization ability, and model lightness. ② The public dataset was used to evaluate overall detection performance and the lightweight effect. Compared with the baseline, the F₁ score, precision, recall, and mAP@0.5 of the model improved by 2.6%, 4.3%, 1.0%, and 2.1%, respectively, while the parameter count and the number of floating-point operations decreased by 13.15% and 6.90%, respectively. ③ The self-built mine dataset was used to verify scene generalization and algorithm robustness. Compared with the baseline, the F₁ score, recall, and mAP@0.5 improved by 0.9%, 2.1%, and 1.1%, respectively, while the parameter count and the number of floating-point operations decreased by 13.15% and 6.90%, respectively. Compared with mainstream models of the same type, MP-SCW-YOLO achieves the best detection performance while remaining lightweight, and can better meet the practical requirements of underground mine personnel target detection.
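The space-to-depth rearrangement that gives SPDConv its name can be illustrated with a minimal sketch. It assumes the commonly described SPD-Conv design, in which a feature map is sliced into interleaved sub-maps that are stacked as channels and then passed to a non-strided convolution; whether the module in MP-SCW-YOLO matches this exactly is an assumption. The function name `space_to_depth`, the `scale` parameter, and the toy input are hypothetical, and the convolution step is omitted. The point of the sketch is that downsampling by rearrangement keeps every input value, which is why it suits small and low-resolution targets.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W single-channel map into scale*scale sub-maps.

    Each sub-map has size (H/scale) x (W/scale). No value is discarded,
    unlike strided convolution or pooling. Illustrative helper only.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    channels = []
    for dy in range(scale):
        for dx in range(scale):
            # Take every `scale`-th row and column, starting at (dy, dx).
            sub_map = [row[dx::scale] for row in feature_map[dy::scale]]
            channels.append(sub_map)
    return channels


# Hypothetical 4 x 4 input used only for illustration.
toy_map = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
sub_maps = space_to_depth(toy_map, scale=2)
```

With `scale=2`, the 4 x 4 map becomes four 2 x 2 sub-maps, so spatial resolution halves while the channel count quadruples. In practice this would operate on multi-channel tensors in a deep learning framework rather than on nested lists.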
