Virtual reality (VR) technology is increasingly used in teaching. This study uses VR to create cross-cultural teaching contexts and develops speech recognition models for language learning. An ecological model of VR-based language learning is constructed, and a cross-cultural contextual VR system is implemented and introduced into language education. In testing, the system achieves a speech recognition efficiency of 99.7% and a correctness rate of 99.5%. A comparison of pre- and post-test data shows that the experimental group significantly outperformed the control group in English proficiency (p < 0.05). Overall, the cross-cultural contextual VR system has a significant positive effect on language learning outcomes.