Research on Thematic Clustering and Text Mining of Chinese Modern and Contemporary Literary Texts in the Network Era

Abstract

This paper addresses the problem that conventional literary research cannot cope with large volumes of textual data. The author proposes a method for thematic clustering and text mining of Chinese modern and contemporary literary texts in the network era, and studies how a keyword-based clustering ensemble method can improve the thematic clustering performance of literary texts. Two clustering ensemble methods (a K-means-based data ensemble and an incremental-clustering-based algorithm ensemble) and four keyword extraction methods (TF-ISF, CSI, ECC, and TextRank) are compared, and the effect of the different keyword sets on thematic clustering results is analysed. Experiments indicate that the clustering ensemble method substantially improves thematic clustering and is more stable when fewer keywords are used. The research provides new technical means for text mining and thematic clustering of contemporary Chinese literature and helps to promote the development of digital humanities research.
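To make the idea of a K-means-based clustering ensemble concrete, the following is a minimal sketch, not the author's implementation. It assumes each text has already been reduced to a keyword-weight vector (by TF-ISF, TextRank, or similar), and it uses one well-known ensemble scheme, evidence accumulation: K-means is run several times with different random initializations, a co-association matrix records how often each pair of texts lands in the same cluster, and the final themes are the connected components of pairs co-clustered in more than half of the runs. The paper's "data ensemble" may perturb the data rather than the random seed; that choice, the functions `kmeans`, `co_association` and `consensus`, the 0.5 threshold, and the toy vectors are all hypothetical.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(points, k, runs):
    """Fraction of K-means runs in which each pair of texts shares a cluster."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for seed in range(runs):  # each run uses a different random initialization
        labels = kmeans(points, k, seed)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def consensus(matrix, threshold=0.5):
    """Final partition: connected components of pairs co-clustered in > threshold of runs."""
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly linked pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


# Hypothetical keyword-weight vectors for six texts over four keywords
# (two made-up "themes": keywords 0-1 vs. keywords 2-3).
docs = [
    [25, 0, 0, 0], [24, 7, 0, 0], [20, 15, 0, 0],
    [0, 0, 25, 0], [0, 0, 24, 7], [0, 0, 20, 15],
]
co = co_association(docs, k=2, runs=10)
print(consensus(co))  # → [0, 0, 0, 1, 1, 1]
```

Voting over many runs is what gives an ensemble its stability: a single K-means run can depend on its initialization, whereas the co-association matrix keeps only the groupings that recur across runs.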

Keywords: Internet era; Text mining; Cluster analysis