A Failure Function for Multiple Two-dimensional Pattern Matching

Maxime Crochemore1, Costas S. Tiopoulos2,3, Maureen Korda2, James F. Reid2,4
1Institut Gaspard-Monge, Université de Marne-la-Vallée, Cité Descartes, 5 Bd Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée CEDEX 2, France.
2Dept. Computer Science, King’s College London, London WC2R 2LS, UK.
3School of Computing, Curtin University of Technology, GPO Box 1987 U, Western Australia.
4Dipt. di Elettronica e Informatica, Universita degli Studi di Padova, Via Gradenigo 6/A, Padova 35131, Italy.

Abstract

Given a two-dimensional text \(T\) and a set of patterns \(\mathcal{D} = \{P_1, \ldots, P_k\}\) (the dictionary), the two-dimensional \emph{dictionary matching} problem is to determine all the occurrences in \(T\) of the patterns \(P_i \in \mathcal{D}\). The two-dimensional \emph{dictionary prefix-matching} problem is to determine the longest prefix of any \(P_i \in \mathcal{D}\) that occurs at each position in \(T\). Given an alphabet \(\Sigma\), an \(n \times n\) text \(T\), and a dictionary \(\mathcal{D} = \{P_1, \ldots, P_k\}\), we present an algorithm for solving the two-dimensional dictionary prefix-matching problem. Our algorithm requires \(O(|T| + |\mathcal{D}|(log m + log |\Sigma|))\) units of time, where \(m \times m$ is the size of the largest \(P_i \in \mathcal{D}\). The algorithm presented here runs faster than the Amir and Farach [3] algorithm for the dictionary matching problem by an \(O(log k)\) factor. Furthermore, our algorithm improves the time bound that can be achieved using the Lsuffix tree of Giancarlo [6],[7] by an \(O(k)\) factor.