Key frame extraction is an important research topic in the analysis and processing of human motion capture data. This paper therefore proposes a key frame extraction method for motion capture data based on the quantum particle swarm optimization algorithm, which can extract either a fixed number of key frames or a key frame sequence determined by the objective function. The spatio-temporal graph convolutional network is selected as the baseline network for tap dance action recognition, and recognition is realized by combining it with adaptive and attention mechanisms. A comprehensive index of tap dance is introduced as a constraint, and the golden section algorithm is used to optimize the training path of the dance actions to obtain an ergonomic training path. The experimental results show that the proposed key frame extraction method meets the need for real-time compression of motion capture data. On a purpose-built validation dataset, the accuracy improvement of the AAST-GAN algorithm and the effect of gesture extraction are compared and verified; the recognition accuracy exceeds 86% for each tap dance action. The dance movement training path proposed in this paper ensures the effectiveness and comfort of tap dance movements.
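As an illustration of the golden section step mentioned above, the following is a minimal sketch of a generic golden-section search over a single scalar parameter. The abstract does not state the actual objective or constraint used for the training path, so `toy_cost`, the search interval, and the tolerance are hypothetical stand-ins chosen only to show how the interval is narrowed; the sketch assumes the cost is unimodal on the interval.

```python
import math

# Inverse golden ratio, about 0.618; each iteration keeps this fraction of the interval.
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_minimize(f, a, b, tol=1e-6):
    """Return an approximate minimizer of a unimodal function f on [a, b]."""
    # Two interior probe points placed symmetrically in the interval.
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_cost(x):
    # Hypothetical stand-in for a training-path cost; not the paper's objective.
    return (x - 2.0) ** 2 + 1.0


best = golden_section_minimize(toy_cost, 0.0, 5.0)
```

Only one new function evaluation is needed per iteration because one of the two probe points is reused, which is the main reason to prefer this scheme over naive interval trisection when each cost evaluation is expensive.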