A Study of Word Vector Computation and Multilayer Network Representation in English Corpus

Chunmei Qiao¹
¹The Public Course Teaching Department, Henan Vocational University of Science and Technology, Zhoukou, Henan, 466000, China

Abstract

Although research on English in China began later than in Western countries, researchers have drawn on China's basic national conditions and on how Chinese learners actually study English to make continuous progress in the construction and application of English corpora, with satisfactory results. In this paper, we first analyze the relevant contents of English corpora. Following the basic principles of corpus selection and the correlation between English corpora and semantics, we then construct an English corpus from phonological and semantic aspects. Two word vector similarity measures, Jaccard similarity and edit distance, are combined to form the final similarity calculation algorithm for English sentences. The MECNC model is constructed by integrating joint representation and co-representation learning methods and by using edge probabilities to abstract the connection between two nodes. Finally, we experimentally analyze the word vector similarity of the English corpus together with the results of English corpus recommendation based on multilayer network representation. The correlation scores of the Jaccard similarity measure on WS-SIM, WS-REL, MEN, MTurk-771, and SimVerb-3500 are 0.8069, 0.6668, 0.7389, 0.7125, and 0.2769, respectively, the best results among the measures compared, which indicates that Jaccard similarity captures more of the correlation between words. Link prediction experiments were conducted on five corpora using 3-, 5-, 8-, and 10-fold cross-validation. On the corpus CKM [245,1550], the MECNC model OM3 reaches a maximum AUC close to 0.94 under 8-fold cross-validation, showing that MECNC performs better when used as guiding information for intra-layer walks.
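The abstract does not state how Jaccard similarity and edit distance are combined into the final sentence similarity. The following is a minimal sketch under stated assumptions, not the paper's algorithm: Jaccard similarity is taken over the sets of lowercase word tokens, edit distance is the Levenshtein distance over word tokens normalized by the longer sentence length, and the two scores are averaged with a hypothetical weight `alpha` (0.5 here). All function names are illustrative.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance mapped to [0, 1] by normalizing with the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def sentence_similarity(s1, s2, alpha=0.5):
    """Hypothetical weighted combination of Jaccard and edit-distance similarity."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    return alpha * jaccard_similarity(t1, t2) + (1 - alpha) * edit_similarity(t1, t2)


if __name__ == "__main__":
    print(round(sentence_similarity("the cat sat on the mat",
                                    "the cat lay on the mat"), 4))  # → 0.75
```

In this example the two sentences share 4 of 6 distinct words (Jaccard 2/3) and differ by one substitution over six tokens (edit similarity 5/6), so the equal-weight average is 0.75. Using a weighted sum keeps the score in [0, 1]; a different weighting or character-level edit distance would be equally plausible readings of the abstract.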

Keywords: associative features; MECNC model; English corpus; word vector computation; multilayer network representation