轻量化姿态估计时空增强图卷积模型下的矿工行为识别

王建芳, 段思源, 潘红光, 景宁波

引用本文: 王建芳,段思源,潘红光,等. 轻量化姿态估计时空增强图卷积模型下的矿工行为识别[J]. 工矿自动化,2024,50(11):34-42. DOI: 10.13272/j.issn.1671-251x.2024090059

Citation: WANG Jianfang, DUAN Siyuan, PAN Hongguang, et al. Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition[J]. Journal of Mine Automation,2024,50(11):34-42. DOI: 10.13272/j.issn.1671-251x.2024090059


基金项目: 国家自然科学基金项目(51804249)。
    作者简介:

    王建芳(1975—),男,陕西渭南人,高级工程师,硕士,主要从事智能矿山等方面研究工作,E-mail:1194699440@qq.com

    通讯作者:

    景宁波(1984—),男,山西永济人,工程师,博士,主要从事智慧矿山、煤矿机器人方面的教学和科研工作,E-mail: nbjing@xust.edu.cn

  • 中图分类号: TD67

Lightweight pose estimation spatial-temporal enhanced graph convolutional model for miner behavior recognition

  • 摘要:

    基于骨架序列的行为识别模型具有速度快、算力要求低、模型简单等特点,图卷积神经网络在处理骨架序列数据时具有优势,而现有基于图卷积的矿工行为识别模型在高精度和低计算复杂度之间难以兼顾。针对该问题,提出了一种基于轻量化姿态估计网络(Lite−HRNet)和多维特征增强时空图卷积网络(MEST−GCN)的矿工行为识别模型。Lite−HRNet通过目标检测器进行人体检测,利用卷积神经网络提取图像特征,并通过区域提议网络生成锚框,对每个锚框进行分类以判断是否包含目标;区域提议网络对被判定为目标的锚框进行边界框回归,输出人体边界框,并通过非极大值抑制筛选出最优检测结果;将每个检测到的人体区域裁剪出来并输入到Lite−HRNet,生成人体关键点骨架序列。MEST−GCN在时空图卷积神经网络(ST−GCN)的基础上进行改进:去除ST−GCN中的冗余层以简化模型结构,减少模型参数量;引入多维特征融合注意力模块M2FA。生成的骨架序列经MEST−GCN的BN层批量标准化处理后,由多维特征增强图卷积模块提取矿工行为特征,经全局平均池化层和Softmax层得到行为的置信度,获得矿工行为预测结果。实验结果表明:① MEST−GCN的参数量降低至1.87 Mib;② 在以交叉主体和交叉视角为评价标准的公开数据集NTU60上,采用Lite−HRNet提取2D人体关键点坐标,基于Lite−HRNet和MEST−GCN的矿工行为识别模型的准确率分别达88.0%和92.6%;③ 在构建的矿工行为数据集上,基于Lite−HRNet和MEST−GCN的矿工行为识别模型的准确率达88.5%,视频处理速度达18.26 帧/s,可以准确且快速地识别矿工的动作类别。

    Abstract:

    Skeleton-sequence-based behavior recognition models are characterized by fast processing speeds, low computational requirements, and simple structures. Graph convolutional networks (GCNs) have advantages in processing skeleton sequence data. However, existing miner behavior recognition models based on graph convolution struggle to balance high accuracy and low computational complexity. To address this issue, this study proposed a miner behavior recognition model based on a lightweight pose estimation network (Lite-HRNet) and a multi-dimensional feature-enhanced spatial-temporal graph convolutional network (MEST-GCN). Lite-HRNet performed human detection using a target detector, extracted image features through a convolutional neural network (CNN), and generated anchor boxes via a region proposal network (RPN). These anchor boxes were classified to determine whether they contained a target. The RPN applied bounding box regression to the anchor boxes identified as containing targets and output human bounding boxes, with the optimal detection results selected via non-maximum suppression. Each detected human region was cropped and fed into Lite-HRNet to generate human keypoint skeleton sequences. MEST-GCN improved upon the spatial-temporal graph convolutional network (ST-GCN) by removing redundant layers to simplify the model structure and reduce the number of parameters, and by introducing a multi-dimensional feature fusion attention module (M2FA). The generated skeleton sequences were first batch-normalized by the BN layer; miner behavior features were then extracted by the multi-dimensional feature-enhanced graph convolution module and passed through a global average pooling layer and a Softmax layer to obtain behavior confidences, yielding the miner behavior prediction results. Experimental results showed that: ① The parameter count of MEST-GCN was reduced to 1.87 Mib. ② On the public NTU60 dataset, under the cross-subject and cross-view evaluation protocols and with Lite-HRNet extracting 2D human keypoint coordinates, the accuracy of the miner behavior recognition model based on Lite-HRNet and MEST-GCN reached 88.0% and 92.6%, respectively. ③ On the constructed miner behavior dataset, the model based on Lite-HRNet and MEST-GCN achieved an accuracy of 88.5% and a video processing speed of 18.26 frames per second, accurately and quickly identifying miner action categories.
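
    To make the classification stage described in the abstract concrete, the following is a minimal PyTorch sketch of a BN → graph convolution → global average pooling → Softmax head operating on 2D keypoint sequences. The block widths, the 17-joint (COCO-style) skeleton, the learnable adjacency, and the names GraphConvBlock and SkeletonActionHead are illustrative assumptions; the M2FA attention module and the authors' exact MEST-GCN layer configuration are not reproduced here.

```python
import torch
import torch.nn as nn


class GraphConvBlock(nn.Module):
    """One simplified spatial-temporal unit: joint aggregation, 1x1 conv, temporal conv."""

    def __init__(self, in_ch, out_ch, num_joints):
        super().__init__()
        # Learnable joint adjacency, initialized to identity (assumption: the paper
        # instead builds the graph from the human skeleton topology).
        self.A = nn.Parameter(torch.eye(num_joints))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1), padding=(4, 0))
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                               # x: (N, C, T, V)
        x = torch.einsum("nctv,vw->nctw", x, self.A)    # aggregate features over joints
        x = self.spatial(x)                             # mix channels per joint
        x = self.temporal(x)                            # mix information along the time axis
        return self.relu(self.bn(x))


class SkeletonActionHead(nn.Module):
    """BN -> stacked graph-conv blocks -> global average pooling -> Softmax."""

    def __init__(self, in_ch=2, num_joints=17, num_classes=6):
        super().__init__()
        self.data_bn = nn.BatchNorm1d(in_ch * num_joints)   # batch-normalize raw keypoints
        self.blocks = nn.Sequential(
            GraphConvBlock(in_ch, 64, num_joints),
            GraphConvBlock(64, 128, num_joints),
            GraphConvBlock(128, 256, num_joints),
        )
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):                                # x: (N, C, T, V) keypoint sequences
        n, c, t, v = x.shape
        x = x.permute(0, 1, 3, 2).reshape(n, c * v, t)   # (N, C*V, T) for BatchNorm1d
        x = self.data_bn(x)
        x = x.reshape(n, c, v, t).permute(0, 1, 3, 2)    # back to (N, C, T, V)
        x = self.blocks(x)                               # extract behavior features
        x = x.mean(dim=(2, 3))                           # global average pooling over time and joints
        return self.fc(x).softmax(dim=-1)                # per-class behavior confidences


if __name__ == "__main__":
    clip = torch.randn(2, 2, 300, 17)                    # 2 clips, (x, y) coords, 300 frames, 17 joints
    print(SkeletonActionHead()(clip).shape)              # torch.Size([2, 6])
```

    In the full model, the per-frame 2D keypoints produced by Lite-HRNet for each detected and cropped person would be stacked into such (C, T, V) tensors before being fed to a head of this kind.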

  • 图  1   基于Lite−HRNet和MEST−GCN的行为识别模型结构

    Figure  1.   Architecture of behavior recognition model based on Lite-HRNet and MEST-GCN

    图  2   Lite−HRNet提取的人体关键点

    Figure  2.   Human keypoints extracted by Lite-HRNet

    图  3   人体时空图

    Figure  3.   Human spatial-temporal graph

    图  4   MEST−GCN和ST−GCN结构

    Figure  4.   MEST-GCN and ST-GCN structures

    图  5   在时间维度上的平移操作

    Figure  5.   Shift operation in the temporal dimension

    图  6   精度和平均损失随迭代次数变化的曲线

    Figure  6.   Curves of accuracy and average loss versus the number of iterations

    图  7   Lite−HRNet对矿工动作关键点的提取效果

    Figure  7.   Keypoint extraction results of miner actions using Lite-HRNet

    图  8   不同网络提取矿工摔倒动作的关键点

    Figure  8.   Keypoints of a miner's falling action extracted by different networks

    图  9   不同模型在MBD数据集上的可视化识别效果

    Figure  9.   Visualization of recognition results from different models on the MBD dataset

    表  1   不同关键点数据对比实验

    Table  1   Comparison results of different keypoint data

    关键点数据结合图卷积模型    准确率/%(X−sub)    准确率/%(X−view)
    模型1                       80.3               89.6
    模型2                       86.9               92.5
    模型3                       86.5               91.8
    模型4                       88.0               92.6

    表  2   不同注意力模块对比实验结果

    Table  2   Comparative results of different attention modules

    模型             准确率/%(X−sub)    准确率/%(X−view)    参数量/Mib
    ST−GCN           87.3               92.4                3.12
    ST−GCN−6         86.5               91.8                1.30
    ST−GCN−6+SE      86.9               91.8                1.43
    ST−GCN−6+CBAM    87.2               92.0                2.14
    MEST−GCN         88.0               92.6                1.87

    表  3   不同模型在MBD数据集上的对比实验

    Table  3   Comparison of different models on the MBD dataset

    模型        准确率/%    参数量/Mib    帧率/(帧·s⁻¹)
    ST−GCN      88.0        3.12          12.77
    2s−AGCN     89.0        6.95          7.65
    CTR−GCN     89.3        2.60          7.59
    MS−G3D      87.3        6.42          3.36
    MEST−GCN    88.5        1.87          18.26


出版历程
  • 收稿日期:  2024-09-15
  • 修回日期:  2024-11-22
  • 网络出版日期:  2024-12-05
  • 刊出日期:  2024-11-24
