Drilling operation recognition and process classification in underground coal mines based on CAF-YOLO

瞿雨馨; 张富凯; 郭峰; 蔡琛昂; 李爱军; 董璐; 常文静; 孙一菲

doi:10.13272/j.issn.1671-251x.2026030038


Abstract

Underground coal mine drilling scenes suffer from combined imaging degradation: uneven illumination, cluttered backgrounds, and strong reflections. Targets such as drill rods and drill tails are slender, continuous structures whose orientation varies markedly, and they are frequently occluded or overlapped while the rod is advanced or withdrawn. Together with a wide range of target scales and strong background texture interference, these factors make conventional detection based only on horizontal bounding boxes inadequate for engineering use. Oriented bounding box (OBB) detection offers stronger angle representation and long-range dependency modeling, which makes target localization, category classification, and box-angle description more accurate. A method for drilling operation recognition and process classification in underground coal mines was therefore proposed, based on an oriented object detection model, CAF-YOLO. CAF-YOLO improves on YOLOv13n-OBB in three ways. In the backbone network, a RepLKNet large-kernel module was introduced; its dual-path modeling with a large-kernel convolution and a small-kernel convolution strengthens long-range contextual perception and improves the continuous feature representation of slender targets such as drill rods and drill tails under occlusion and complex backgrounds. In the neck network, an adaptive feature fusion module was introduced; it combines local attention and global attention to generate fusion weights, dynamically reweights and recombines cross-layer features, suppresses redundant background responses, and strengthens the discriminative features of small and occluded targets. In the classification branch, Focal Loss was adopted to assign higher weights to hard samples, which alleviates the imbalance between positive and negative samples and between easy and hard samples, thereby improving detection robustness and generalization under complex underground working conditions. On this basis, a temporal state classification method for drilling videos was constructed. It uses the spatial position and motion state of key targets, together with personnel proximity information, for video-level modeling in order to classify drilling processes accurately. Experimental results on the self-built UCMDO-OBB dataset showed that, compared with YOLOv13n-OBB, CAF-YOLO increased precision by 1.8%, recall by 4.3%, mAP@0.5 by 3.2%, and mAP@0.5:0.95 by 3.3%. In the process classification task, the accuracy reached 92.7%, the macro-average F₁ score reached 0.912, and the mean absolute percentage error in the advancing stage was 4.1%. The method enables video-level process analysis and provides technical support for intelligent monitoring and safety management of underground drilling operations.
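To illustrate how Focal Loss shifts weight toward hard samples, the following is a minimal sketch of the standard binary focal loss for a single prediction. It is not the authors' implementation: the function names are hypothetical, and the values `gamma=2.0` and `alpha=0.25` are commonly used defaults assumed here for illustration, since the abstract does not state the settings used in CAF-YOLO.

```python
import math


def binary_cross_entropy(prob, label):
    """Plain cross-entropy for one binary prediction (reference baseline)."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = prob if label == 1 else 1.0 - prob
    return -math.log(p_t)


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Standard binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are assumed illustrative defaults, not values
    reported for CAF-YOLO.
    """
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    # p_t is the probability the model assigns to the true class.
    p_t = prob if label == 1 else 1.0 - prob
    # alpha_t balances positive against negative samples.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)  # confident and correct
    hard = binary_focal_loss(0.1, 1)  # confident and wrong
    print(f"easy positive: {easy:.6f}")
    print(f"hard positive: {hard:.6f}")
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, so `gamma` alone controls how strongly easy samples are down-weighted relative to hard ones.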
