Research on optimizing vocal singing posture based on image recognition technology

Huang, Ting; Li, Yurun

doi:10.61091/jcmcc127a-230

Journal of Combinatorial Mathematics and Combinatorial Computing

In Press
Volume 127a
Pages: 4073–4093

Research article

Ting Huang¹, Yurun Li²

¹Music College, Sichuan University of Science & Engineering, Zigong, Sichuan, 643000, China

²Music and Dance School, West China Normal University, Nanchong, Sichuan, 637000, China

Received: 15/01/2024
Revised: 10/03/2024
Accepted: 30/11/2024
Published Online: 15/04/2025

Abstract

Vocal singing is a key art form among the stage arts, combining acting and singing. This study first introduces the detection principle of the YOLOv5 target detection algorithm and then improves the original algorithm by reconstructing its backbone network with SENet and GhostNet. In a comparison test on the target detection dataset, the improved YOLOv5 reaches Precision, Recall and mAP values of 85.75%, 72.34% and 78.48% respectively, all higher than those of the original algorithm. Second, a high-resolution human posture estimation network incorporating multiple attention mechanisms (CDLNet) is proposed to further extract multi-scale and global feature information, and is validated on publicly available datasets. CDLNet achieves an AP of 0.662 and an AR of 0.731 on the vocal singing posture estimation dataset, and its MPJPE of 44.6 on Human3.6M is the lowest among the similar methods compared, which makes it suitable for posture estimation in vocal singing scenarios. Finally, an action recognition model based on a multi-granularity spatio-temporal graph convolutional neural network (MGstgcn), designed in this paper, is used to recognize singing gesture actions by category. Experiments show that MGstgcn reaches an average recognition rate of 86.5% on the HSiPu2 dataset, which meets the needs of vocal singing gesture action recognition analysis.

Keywords: target detection, gesture estimation, spatio-temporal graph convolution, action recognition, attention mechanism
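
The abstract reports pose accuracy on Human3.6M as MPJPE (mean per-joint position error). As a minimal sketch of how that metric is usually defined, and not the authors' evaluation code, the following computes the average Euclidean distance between predicted and ground-truth 3D joints. The function name `mpjpe` and the three-joint poses are illustrative assumptions. The unit is whatever the coordinates use (commonly millimetres on Human3.6M).

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error between two poses.

    Each pose is a sequence of (x, y, z) joint coordinates, listed in
    the same joint order and expressed in the same unit.
    """
    if len(predicted) != len(target):
        raise ValueError("poses must have the same number of joints")
    if not predicted:
        raise ValueError("poses must contain at least one joint")
    # Euclidean distance per joint, then the mean over all joints.
    distances = [math.dist(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Hypothetical three-joint poses for illustration; not data from the paper.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
error = mpjpe(predicted, target)
```

Here only the second joint is displaced (by a distance of 5), so the error is 5 divided by the three joints. A lower MPJPE means the predicted skeleton lies closer to the ground truth on average.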
