Optimization Design of Massive Data Storage System Based on Distributed Computing Model

Xiang Li 1
1Image and Text Information Center, Jiangsu Province Nantong Industry & Trade Technician College, Nantong, Jiangsu, 226010, China

Abstract

With the arrival of the big data era, the demand for massive data storage keeps growing, and distributed storage systems have become a key technology for meeting it. The traditional HDFS system, which replicates data for fault tolerance, incurs a large storage overhead. To improve the storage efficiency of massive data, this paper introduces erasure coding based on Reed-Solomon (RS) codes, which significantly reduces storage cost while ensuring data reliability. In addition, to address the low encoding efficiency and high repair overhead of RS codes in practical applications, this paper further introduces local repair code (LRC) technology, which reduces the data repair overhead, and comparatively analyzes the application effect of the resulting optimized model (RS-LRC-HDFS). The experimental results show that after RS-LRC optimization, the time overhead of the HDFS storage system is reduced by 81.12% in the write process and 93.01% in the read process compared with the pre-optimization system, and the repair time for massive file data is reduced by 87.25%. The scheme thus provides an efficient and reliable solution for massive data storage.
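The storage-versus-reliability trade-off behind erasure coding can be sketched with a toy example. The snippet below (illustrative only, not the paper's implementation) uses a single XOR parity block, the simplest erasure code; the RS-LRC scheme described above generalizes this with full Reed-Solomon parities plus local parities for cheaper repair.

```python
# Illustrative sketch of the erasure-coding idea: split data into k blocks
# and store parity so a lost block can be rebuilt instead of being fully
# replicated. A single XOR parity is used here for simplicity; real RS
# codes tolerate multiple losses using Galois-field arithmetic.
from functools import reduce

def encode(blocks: list[bytes]) -> bytes:
    """Compute one XOR parity block over k equal-sized data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def repair(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from survivors plus parity."""
    return encode(surviving + [parity])

data = [b"hdfs", b"rslr", b"code"]   # k = 3 data blocks
p = encode(data)                     # 1 parity block -> (3+1)/3 = 1.33x storage
rebuilt = repair([data[0], data[2]], p)   # block 1 was "lost"
assert rebuilt == data[1]

# Contrast with replication: 3-way replication stores 3x the data, while an
# RS(k=6, m=3) layout stores only (6+3)/6 = 1.5x with comparable reliability.
```

The comment at the end shows the overhead arithmetic that motivates replacing HDFS's default replication with erasure coding.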

Keywords: HDFS system, erasure code, local repair code, RS-LRC-HDFS, data storage