The rapid growth of multilingual information online has outpaced traditional translation methods, creating a need for intelligent language translation. This study uses a convolutional neural network to extract visual features from the images being translated and a region-selective attention mechanism to align text and image features. The fused features are then passed to a sequence model, yielding a computer vision-based translation algorithm. In the reported experiments, the algorithm achieves an omission rate (reported as the leakage rate) of 1.30%, a mistranslation rate of 2.64%, and an average response time of 67.28 ms. These results indicate improved translation quality and good generalization across multilingual translation tasks, suggesting the approach is suitable for real-world use.
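The alignment and fusion step described above can be illustrated with a minimal sketch. The abstract does not specify the attention formulation, so scaled dot-product attention is assumed here, and the names `region_attention` and `fuse` are hypothetical. Toy vectors stand in for the CNN region features and the text feature; the sequence model that would consume the fused vector is omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def region_attention(text_vec, region_feats):
    # Assumption: scaled dot-product scoring between the text vector
    # and each image-region feature vector of the same dimension.
    d = len(text_vec)
    scores = [
        sum(t * r for t, r in zip(text_vec, region)) / math.sqrt(d)
        for region in region_feats
    ]
    weights = softmax(scores)
    # Attention-weighted sum of region features (the visual context).
    context = [
        sum(w * region[i] for w, region in zip(weights, region_feats))
        for i in range(d)
    ]
    return weights, context

def fuse(text_vec, context):
    # Assumption: fusion by simple concatenation of text and visual context.
    return text_vec + context

# Toy inputs: one text feature and three image-region features.
text_vec = [1.0, 0.0]
region_feats = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]

weights, context = region_attention(text_vec, region_feats)
fused = fuse(text_vec, context)
```

In this toy case the first region points in the same direction as the text vector, so it receives the largest weight. A real system would use learned projections for the text and region features rather than raw dot products.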