Research on Human Action Detection and Recognition Methods Based on Interpretable Artificial Intelligence

Abstract

Deep learning methods can operate on skeleton data, but they typically consider only the feature vectors formed by joint coordinates and do not extract the spatio-temporal dependencies between skeleton joints. To detect and recognize the spatio-temporal relationships in human action sequences more comprehensively, this paper proposes a graph neural network-based human action detection and recognition method that combines YOLOv5, AlphaPose, and the spatio-temporal graph convolutional network (ST-GCN) from the perspective of interpretable artificial intelligence (XAI). First, an improved YOLOv5s object detection algorithm produces the human bounding box and thus the position of the human body. Next, the AlphaPose pose estimation algorithm obtains the coordinates of the skeleton joints. Finally, an improved ST-GCN algorithm constructs the spatio-temporal graph and extracts the spatio-temporal dependencies between the joints to complete action recognition. Experiments show that the method recognizes falling, running, kicking, and squatting actions on the experimental dataset with an accuracy of 92.04%. Compared with five baseline models, it achieves higher recognition accuracy, with every evaluation metric above 91%, and can provide technical support for human behavior recognition.

Keywords: interpretable artificial intelligence; graph neural network; object detection; action recognition; spatio-temporal graph
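
The spatio-temporal graph named in the abstract can be illustrated with a minimal sketch. It is not the improved ST-GCN proposed in this paper. The 5-joint toy skeleton, the helper names (`build_st_edges`, `normalized_adjacency`, `aggregate`), and the symmetric normalization are assumptions made for illustration only. For brevity, spatial and temporal edges are placed in a single adjacency matrix and no learned weights are applied, whereas ST-GCN implementations usually handle the temporal dimension with a separate convolution along the time axis.

```python
import math
from typing import List, Tuple

# Hypothetical 5-joint toy skeleton (NOT the AlphaPose keypoint layout):
# 0 = head, 1 = torso, 2 = left hand, 3 = right hand, 4 = feet
TOY_BONES: List[Tuple[int, int]] = [(0, 1), (1, 2), (1, 3), (1, 4)]


def node_id(frame: int, joint: int, num_joints: int) -> int:
    """Flatten a (frame, joint) pair into a single node index."""
    return frame * num_joints + joint


def build_st_edges(num_frames: int, num_joints: int,
                   bones: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Spatial edges follow the bones within each frame; temporal edges
    link the same joint in consecutive frames."""
    edges = []
    for t in range(num_frames):
        for a, b in bones:
            edges.append((node_id(t, a, num_joints), node_id(t, b, num_joints)))
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.append((node_id(t, j, num_joints),
                              node_id(t + 1, j, num_joints)))
    return edges


def normalized_adjacency(num_nodes: int,
                         edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Return D^(-1/2) (A + I) D^(-1/2) as a dense nested list."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def aggregate(adj: List[List[float]],
              features: List[List[float]]) -> List[List[float]]:
    """One propagation step: each node mixes its own feature vector with
    those of its spatial and temporal neighbours (no learned weights)."""
    n, dim = len(features), len(features[0])
    return [[sum(adj[i][k] * features[k][d] for k in range(n))
             for d in range(dim)] for i in range(n)]


# Toy sequence: 3 frames, 5 joints, 2-D (x, y) coordinates per joint.
num_frames, num_joints = 3, 5
edges = build_st_edges(num_frames, num_joints, TOY_BONES)  # 12 spatial + 10 temporal
adj = normalized_adjacency(num_frames * num_joints, edges)
coords = [[float(j), float(t)] for t in range(num_frames) for j in range(num_joints)]
mixed = aggregate(adj, coords)
print(len(edges), len(mixed), len(mixed[0]))
```

In a full pipeline, the per-frame joint coordinates would come from the pose estimator rather than the synthetic `coords` list used here, and the aggregated features would pass through learned layers before classification.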