Research on optimizing vocal singing posture based on image recognition technology

Ting Huang1, Yurun Li2
1Music College, Sichuan University of Science & Engineering, Zigong, Sichuan, 643000, China
2Music and Dance School, West China Normal University, Nanchong, Sichuan, 637000, China

Abstract

Vocal singing is a core form of many stage arts and combines both acting and singing. This study first introduces the detection principle of the YOLOv5 target detection algorithm and then improves the original algorithm by reconstructing its backbone network with SENet and GhostNet. In a comparative test of the original and improved YOLOv5 algorithms on the target detection dataset, the improved algorithm reaches Precision, Recall and mAP values of 85.75%, 72.34% and 78.48% respectively, all higher than those of the original algorithm. Second, a high-resolution human posture estimation network incorporating multiple attention mechanisms, CDLNet, is proposed to further extract multi-scale and global feature information, and it is validated on publicly available datasets. CDLNet achieves an AP of 0.662 and an AR of 0.731 on the vocal singing posture estimation dataset and, compared with similar methods, the lowest MPJPE on Human3.6M, 44.6, which makes it suitable for posture estimation in vocal singing scenarios. Finally, the action recognition model designed in this paper, based on a multi-granularity spatio-temporal graph convolutional neural network (MGstgcn), is used to recognize singing gesture actions across singing action categories. Experiments show that the average recognition rate of MGstgcn reaches 86.5% on the HSiPu2 dataset, which meets the demands of vocal singing gesture action recognition analysis.

Keywords: target detection, posture estimation, spatio-temporal graph convolution, action recognition, attention mechanism