In enterprise operations, multi-objective optimization involves several conflicting objectives, such as controlling cost escalation, maintaining customer satisfaction, and improving production efficiency. Building on reinforcement learning, this article addresses the multi-objective optimization problem in enterprise operations through interactive learning between an agent and its environment, and designs a path for improving multi-objective operational efficiency based on Q-learning scheduling. Simulation data are used to generate the PDR tree structure, after which the agent learns the enterprise's multi-objective operations over several iterations. On this basis, the agent completes all actions and generates scheduling strategies that improve operational efficiency. The proposed model can predict changes in enterprise demand over a future time window and select the best decision for improving operational efficiency. Under this model, the mean pure technical efficiency and mean scale efficiency of the 10 firms in 2024 are 0.9 and 0.933, respectively, and both are predicted to continue growing in 2025. The model reduces the firms' average operating costs and administrative expenses, while employee compensation and fixed assets increase by 49.58% and 19.48%, respectively. Throughout the survey period, the total factor productivity (TFP) index of all 10 companies is greater than 1, which indicates that applying the model improves the companies' operational efficiency.
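To make the Q-learning scheduling idea more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the states, actions, or reward, so everything here is an assumption. The action set is a hypothetical list of dispatching rules (`SPT`, `EDD`, `FIFO`). The three objectives (cost, satisfaction, efficiency) are combined into one reward by a weighted sum. `toy_step` is a made-up environment. The update itself is the standard tabular Q-learning rule with epsilon-greedy exploration.

```python
import random

# Hypothetical action set: candidate dispatching rules the agent can choose from.
ACTIONS = ["SPT", "EDD", "FIFO"]


def scalarize(objectives, weights):
    """Combine several objective scores into one reward via a weighted sum (an assumption)."""
    return sum(w * o for w, o in zip(weights, objectives))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over action indices (ties go to the lowest index)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def toy_step(state, action, n_states):
    """Hypothetical environment returning (next_state, objective scores).
    Scores are (negative cost, satisfaction, efficiency); with the default
    weights in train(), action 0 ("SPT") yields the highest reward."""
    next_state = (state + 1) % n_states
    cost = [1.0, 2.0, 3.0][action]
    satisfaction = [0.9, 0.7, 0.5][action]
    efficiency = [0.8, 0.6, 0.4][action]
    return next_state, (-cost, satisfaction, efficiency)


def train(n_states=4, episodes=200, steps=10, weights=(0.4, 0.3, 0.3), seed=0):
    """Run epsilon-greedy Q-learning on the toy environment and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(steps):
            action = choose_action(q, state, epsilon=0.2, rng=rng)
            next_state, objectives = toy_step(state, action, n_states)
            q_update(q, state, action, scalarize(objectives, weights), next_state)
            state = next_state
    return q


def greedy_policy(q):
    """Map each state to the name of its highest-valued dispatching rule."""
    return [ACTIONS[max(range(len(row)), key=row.__getitem__)] for row in q]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setup the next state does not depend on the chosen rule. The learned greedy policy therefore selects `SPT` in every state, because that rule has the largest scalarized reward under the assumed weights. A weighted sum is only one way to handle several objectives. The abstract does not say how the paper trades the objectives off, so the weights here are purely illustrative.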