As artificial intelligence technology matures, it presents both a challenge and an opportunity for the teaching of spoken English. Traditional generative adversarial networks suffer from training problems that lead to poor-quality virtual English teaching resources. To address this, a dual generative adversarial network is used to improve generation, and the Pielou index is then used to select the virtual English teaching resources that meet the requirements. On this basis, the HTC VIVE suite, a high-performance computer system, the Unity 3D development engine, and joystick control are integrated to design the spoken English teaching scenario. The practical efficacy of the scenario is then analyzed using the research data and evaluation indicators. Across the four datasets, the proposed method outperforms the four comparison methods overall, indicating that it can generate high-quality virtual spoken English teaching resources. In practical application, the scenario also outperforms traditional multimedia teaching in test scores, learning effect, satisfaction, and the spoken English teaching environment, making it more conducive to the development of spoken English teaching.
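The abstract names the Pielou index as the selection criterion but does not define it. Pielou's evenness is conventionally J = H' / ln S, where H' = -Σ p_i ln p_i is the Shannon entropy over S categories. The sketch below is a minimal illustration under stated assumptions: each candidate batch of generated resources is represented as a list of category labels, and the cutoff of 0.8 and the helper `select_by_evenness` are hypothetical. The abstract does not say how the index is actually applied.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def pielou_evenness(labels: Sequence[Hashable]) -> float:
    """Pielou's evenness J = H' / ln(S) over the category labels in one batch."""
    counts = Counter(labels)
    total = sum(counts.values())
    s = len(counts)
    if s <= 1:
        # J is undefined for a single category; this sketch treats it as 0.
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(s)


def select_by_evenness(
    batches: List[Sequence[Hashable]], threshold: float = 0.8
) -> List[Sequence[Hashable]]:
    """Hypothetical filter: keep batches whose evenness meets the assumed threshold."""
    return [b for b in batches if pielou_evenness(b) >= threshold]


if __name__ == "__main__":
    balanced = ["dialogue", "vocabulary", "pronunciation", "listening"]
    skewed = ["dialogue"] * 9 + ["vocabulary"]
    print(pielou_evenness(balanced))  # close to 1.0: categories evenly covered
    print(pielou_evenness(skewed))    # about 0.47: one category dominates
    print(select_by_evenness([balanced, skewed]))
```

A higher J means the generated batch covers its categories more evenly, which is one plausible reading of "meeting the requirements"; the threshold would need to be tuned to the actual data.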