Efficient \((\delta, \gamma)\)-Pattern-Matching with Don’t Cares

Yoan José Pinzén Ardila1, Manolis Christodoulakis2, Costas S. Tliopoulos2, Manal Mohamed2
1National University of Colombia Department of System Engineering and Industrial Engineering Ciudad Universitaria – Bogoté D.C.- Colombia Avenida Carrera 30 No 45-03
2King’s College London, Department of Computer Science, London WC2R 2LS, UK

Abstract

Here we consider string matching problems that arise naturally in applications to music retrieval. The \( \delta \)-Matching problem calculates, for a given text \( T_{1..n} \) and a pattern \( P_{1..m} \) on an alphabet of integers, the list of all indices \( \mathcal{I}_\delta = \{1 \leq i \leq n-m+1 : \max_{j=1}^m \{|P_j – T_{i+j-1}| \leq \delta\}\} \). The \( \gamma \)-Matching problem computes, for given \( T \) and \( P \), the list of all indices \( \mathcal{I}_\gamma = \{1 \leq i \leq n-m+1 : \sum_{j=1}^m |P_j – T_{i+j-1}| \leq \gamma\} \). In this paper, we extend the current result on the different matching problems to handle the presence of “\emph{don’t care}” symbols. We present efficient algorithms that calculate \( \mathcal{I}_\delta \), \( \mathcal{I}_\gamma \), and \( \mathcal{I}_{(\delta,\gamma)} = \mathcal{I}_\delta \cap \mathcal{I}_\gamma \) for pattern \( P \) with occurrences of “don’t cares”.

Keywords: 5-matching; -y-matching; wildcard matching; music in- formation retrieval.