Coal mine surveillance video is generated in massive volumes and is conventionally transmitted over Ethernet to a cloud computing center for centralized processing. This approach suffers from high latency, high cost, and heavy network bandwidth consumption. To address these problems, a lightweight convolutional neural network (CNN) model is constructed with depthwise separable convolution at its core, and a residual structure is introduced to improve its image feature extraction ability. Because the complex lighting environment in coal mines produces low-contrast surveillance images that degrade recognition accuracy, the contrast limited adaptive histogram equalization (CLAHE) algorithm is used to enhance image brightness and contrast and thereby improve the model's recognition performance. The lightweight CNN model is compressed with STM32Cube.AI and deployed on an embedded platform. On this basis, a video surveillance terminal is designed that processes coal mine surveillance video locally and intelligently, enabling real-time identification of coal mine violations and real-time alarms. Experimental results show that, with the residual structure and CLAHE image enhancement, the model recognizes various coal mine violations with an accuracy of more than 95% and improves the real-time response to violations.
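To illustrate why depthwise separable convolution is a natural core for a lightweight model, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The function names and the layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical choices for illustration and are not taken from the model described here; bias terms are omitted.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(k, c_in, c_out):
    """Separable-to-standard weight ratio; algebraically equal to 1/c_out + 1/k**2."""
    return depthwise_separable_params(k, c_in, c_out) / standard_conv_params(k, c_in, c_out)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 64, 128
    print("standard conv weights: ", standard_conv_params(k, c_in, c_out))
    print("separable conv weights:", depthwise_separable_params(k, c_in, c_out))
    print("ratio: %.3f" % reduction_ratio(k, c_in, c_out))
```

For this assumed layer, the standard convolution needs 73,728 weights and the separable version needs 8,768, roughly 12% as many. Savings of this kind are what make a network small enough to be a candidate for deployment on a microcontroller-class embedded platform.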