Abstract:
Images of underground coal mine drilling scenes suffer from several coexisting degradations: uneven illumination, cluttered backgrounds, and strong reflections. Meanwhile, targets such as drill rods and drill tails are slender, continuous structures whose orientation varies widely, and occlusion and overlap occur frequently during drilling-in and drilling-out. Combined with large variation in target scale and strong interference from background textures, these factors make it difficult for conventional detection based only on horizontal bounding boxes to meet engineering requirements. Oriented bounding box (OBB) detection offers stronger angle representation and long-range dependency modeling capability, which makes target localization, category classification, and bounding-box angle description more accurate. A method for drilling operation recognition and process classification in underground coal mines was therefore proposed, based on an oriented object detection model named CAF-YOLO. CAF-YOLO was built by improving YOLOv13n-OBB. In the backbone network, a RepLKNet large-kernel module was introduced; its dual-path collaborative modeling with large-kernel and small-kernel convolutions enhanced long-range contextual perception and improved the continuous feature representation of slender targets such as drill rods and drill tails under occlusion and complex backgrounds. In the neck network, an adaptive feature fusion module was introduced; it combined local and global attention to generate fusion weights, dynamically reweighted and reorganized cross-layer features, suppressed redundant background responses, and enhanced the discriminative features of small and occluded targets.
Focal Loss was adopted in the classification branch to assign higher weights to hard samples, which alleviated the imbalance between positive and negative samples and between easy and hard samples and thereby improved the robustness and generalization of the model under complex underground working conditions. On this basis, a temporal state classification method for drilling videos was constructed. The method used the spatial position, motion state, and personnel proximity information of key targets for video-level modeling to classify drilling processes accurately. Experimental results on the self-built UCMDO-OBB dataset showed that, compared with YOLOv13n-OBB, CAF-YOLO increased precision by 1.8%, recall by 4.3%, mAP@0.5 by 3.2%, and mAP@0.5:0.95 by 3.3%, an improvement on all four metrics. In the process classification task, the accuracy reached 92.7%, the macro-average F1 score reached 0.912, and the mean absolute percentage error of the advancing stage was 4.1%. The proposed method enabled video-level process analysis and provided technical support for intelligent monitoring and safety management of underground drilling operations.
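
To illustrate how Focal Loss assigns higher weights to hard samples in the classification branch, the following is a minimal sketch of the standard binary focal loss, not the CAF-YOLO implementation. The function names and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the abstract does not state the hyperparameters used.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Plain cross-entropy for one predicted probability p and label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p        # probability assigned to the true class
    return -math.log(min(max(p_t, eps), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults (assumptions), not values
    reported for CAF-YOLO.
    """
    p_t = p if y == 1 else 1.0 - p        # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)         # clamp to avoid log(0)
    # (1 - p_t)**gamma shrinks the loss of well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Fraction of the cross-entropy loss that survives focal reweighting
easy_ratio = binary_focal_loss(0.9, 1) / binary_cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.1, 1) / binary_cross_entropy(0.1, 1)
```

With these assumed defaults, a well-classified positive sample (p = 0.9) keeps about 0.25% of its cross-entropy loss, while a poorly classified one (p = 0.1) keeps about 20%, so hard samples dominate the gradient.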