This paper constructs a multi-agent simulation model for studying juvenile delinquency and its prevention. Following reinforcement learning theory, a multi-agent reinforcement learning model is built to simulate the behavioral decision-making of minors in different social environments. The NashQ algorithm is introduced to simulate minors’ strategic choices when they face the temptation to commit crime. In the simulation experiments, the NashQ algorithm meets the model's convergence requirements and needs only 1/3 of the training iterations to bring the simulated environment to a stable state. Family, school and social factors all affect the stability of the prevention effect: a good family environment, high-quality teaching conditions and a healthy social atmosphere can effectively prevent juvenile delinquency.
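
For orientation, the NashQ update in its commonly cited general-sum form can be sketched as below. This is a reference formulation, not necessarily the exact variant used in this paper. All symbols are assumptions introduced here for illustration: $Q_i^t$ is agent $i$'s value estimate at step $t$, $s$ and $s'$ are the current and next states, $a^1,\dots,a^n$ are the joint actions of the $n$ agents, $r_i^t$ is agent $i$'s reward, $\alpha_t$ is the learning rate, $\beta$ is the discount factor, and $\pi^1(s'),\dots,\pi^n(s')$ is a Nash equilibrium strategy profile of the stage game defined by the agents' current Q-values at $s'$.

\[
Q_i^{t+1}(s, a^1, \dots, a^n) = (1 - \alpha_t)\, Q_i^{t}(s, a^1, \dots, a^n) + \alpha_t \left[ r_i^{t} + \beta\, \mathrm{NashQ}_i^{t}(s') \right]
\]

\[
\mathrm{NashQ}_i^{t}(s') = \pi^1(s') \cdots \pi^n(s') \cdot Q_i^{t}(s')
\]

Under this reading, each simulated minor updates its value estimate toward the payoff it would receive if all agents played an equilibrium of the next-state game, rather than toward its own greedy maximum as in single-agent Q-learning.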