Service-Oriented Architecture (SOA), as a distributed computing architecture, is widely used to build efficient, maintainable, and scalable information systems. This paper focuses on optimizing SOA design through reinforcement learning and cloud computing, using resource scheduling optimization to improve the quality of service of SOA applications. The asynchronous advantage actor-critic (A3C) algorithm, a policy-gradient method, serves as the decision core of the cloud resource scheduler, and a residual recurrent neural network (R2N2) is introduced to build the scheduler on the combined A3C-R2N2 algorithm. In the performance experiments on resource scheduling deployment strategies, the median and average latency of the proposed stochastic dynamic scheduling strategy based on policy-gradient learning are reduced to 9.99% and 56.25% of the direct-deployment values, respectively, and CPU utilization improves by 20.72% over direct deployment. The loss function and reward function of the A3C-R2N2 algorithm begin to converge after 10,000 training iterations and 300 training episodes, respectively. Compared with the random and nearby deployment strategies, the deployment strategy based on the A3C-R2N2 algorithm achieves the best average service response time, 9.3622 s.
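The policy-gradient idea at the heart of the A3C-based scheduler can be illustrated with a minimal, self-contained sketch. Everything below is an illustrative assumption, not the paper's implementation: the three candidate hosts and their mean latencies are invented, and the update is the single-worker *expected* policy-gradient step on softmax logits (A3C additionally runs asynchronous workers, a learned critic for the advantage estimate, and here an R2N2 network instead of plain logits).

```python
import math

# Hypothetical mean latencies (seconds) for three candidate hosts.
HOST_LATENCY = [0.9, 0.3, 0.6]
REWARD = [-lat for lat in HOST_LATENCY]  # lower latency => higher reward

def softmax(logits):
    """Convert scheduler logits into a probability over hosts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_pg_step(theta, lr=0.5):
    """One expected policy-gradient step on softmax logits:
    d(theta_i) = lr * p_i * (r_i - E[r]),
    i.e. REINFORCE taken in expectation (variance-free for illustration)."""
    p = softmax(theta)
    avg_r = sum(pi * ri for pi, ri in zip(p, REWARD))
    return [t + lr * pi * (ri - avg_r) for t, pi, ri in zip(theta, p, REWARD)]

# Train: the policy shifts probability mass toward the lowest-latency host.
theta = [0.0, 0.0, 0.0]
for _ in range(500):
    theta = expected_pg_step(theta)

probs = softmax(theta)
print("deployment probabilities:", [round(p, 4) for p in probs])
```

After training, the policy concentrates on the lowest-latency host (index 1), which is the same mechanism the paper's scheduler uses at scale: reward shaped by observed service latency drives the gradient that reweights deployment decisions.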