A Compressed Text Index on Secondary Memory

Rodrigo Gonzalez1, Gonzalo Navarro1
1 Department of Computer Science, University of Chile. Av. Blanco Encalada 2120, 3rd floor, Santiago, Chile.

Abstract

We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides good I/O times for searching, which improve further when the text is compressible. In this respect our index is unique, as most compressed indexes are slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts. As side contributions, we introduce a compressed rank dictionary for secondary memory that operates in one I/O access, as well as a simple encoding of sequences that achieves high-order compression and provides constant-time random access, both in main and secondary memory.