The service efficiency of intelligent customer service robots affects the operational efficiency of enterprises and plays an important role in retaining customers. This paper applies multimodal interaction technology to an intelligent customer service system built around the multimodal large language model Qwen-VL. It proposes a two-stage multimodal relation extraction framework based on the large language model, which performs multimodal relation extraction with the help of high-quality auxiliary knowledge, integrates dynamic semantic features and static structural features to predict multimodal sentiment polarity, and constructs a multimodal retrieval Q&A system to improve the performance of the intelligent robot. When the proposed system is applied in service practice, conversations between the intelligent customer service robot and the customer usually end in about 50 rounds, indicating relatively high service efficiency. For customer utterances labeled as happy, complaining, and angry, the recognition accuracy of multimodal sentiment analysis exceeds 99%. The “notification” and “confirmation” service behaviors account for the largest share of all behaviors, occurring 560,365 and 365,976 times, respectively, which is in line with the expected service behavior of intelligent customer service robots.