Dynamic analysis of classroom engagement sentiment based on multilevel feature extraction and Transformer CNN-LSTM integrated model: personalized behavior prediction for non-English learners in a cross-modal adversarial learning framework

Axin Huang1,2, Mary Jane C. Samonte3
1School of Graduate Studies, Mapua University, 1002 Metro Manila, Philippines
2GongQing Institute of Science and Technology, Gongqingcheng, Jiangxi, 332020, China
3School of Information Technology, Mapua University, 1002 Metro Manila, Philippines

Abstract

This article proposes a novel cross-modal adversarial learning framework for analyzing the emotional dynamics of non-English learners during classroom engagement and predicting their individualized behaviors. The framework combines multilevel feature extraction with an integrated Transformer CNN-LSTM model to handle multimodal data more efficiently and to capture the complex relationship between emotions and behaviors. Low-level and high-level features are first extracted from the raw multimodal data. The Transformer then mines long-distance dependencies across the multimodal data, the CNN extracts local features, and the LSTM models dynamic changes in the time series. In addition, the framework introduces adversarial training to learn features shared across modalities. Within the first 50 rounds of training, the loss function, emotion recognition accuracy, and behavior prediction accuracy of the CL-Transformer model all converge, showing the fastest training speed and the best training results. The proposed algorithm achieves precision, recall, and F1 scores above 90% for both emotion recognition and behavior prediction, and its recognition accuracy for individual emotions reaches up to 0.96. In the fifth stage of the case study, the classroom emotion conversion rate and arousal reach up to 0.66, and the predicted probability of cell-phone-playing behavior is highest for learners in an angry mood, at 64.7%. Learners’ classroom emotional acceptance and behavioral integration both affect their classroom engagement.

Keywords: multilevel feature extraction, Transformer CNN-LSTM, cross-modal adversarial, behavioral prediction, emotion dynamic analysis