Analyzing Semantic Alignment Mechanisms and Translation Accuracy in English-Chinese Translation Using Support Vector Machines

Abstract

Cross-language text classification makes text data in multilingual settings easier to localize and use by bridging the differences between languages. This paper first combines cross-language word vectors with adversarial training and uses support vector machines to improve the alignment of English-Chinese words and sentences in the feature space, yielding higher-quality English-Chinese cross-language text classification. A variational mechanism is then combined with multi-task learning to align the latent semantic space of multimodal data, keep the representations of different modalities domain-invariant, improve the model's generalization ability, and ensure that the training and prediction processes of variational machine translation are consistent. The two components are combined into a hybrid variational multimodal machine translation model based on semantic alignment. Experiments validate the text classification algorithm on datasets such as Multi30k and examine the quality of English-to-Chinese and Chinese-to-English translation. On the MSCOCO dataset, the model achieves BLEU scores of 61.26 for English-to-Chinese and 60.15 for Chinese-to-English translation, significantly better than the baseline model. The model also achieves the best results in all three practical translation tasks. In the ablation experiments, every variant with a component removed performs worse than the complete model, which verifies the effectiveness of both the model as a whole and its individual components. The proposed method effectively reduces the feature differences between languages and has practical value for cross-language text classification and machine translation.

Keywords: support vector machine; semantic feature alignment; mixed variational distribution; machine translation