Algorithms for Two Versions of LCS Problem for Indeterminate Strings

Costas Iliopoulos1,2, M. Sohel Rahman3,4, Wojciech Rytter1,4
1Department of Mathematics and Informatics Copernicus University, Torun, Poland
2Algorithm Design group Department of Computer Science King’s College London Strand, London WO2R 2LS, England
3Department of Computer Science & Engineering Bangladesh University of Engineering & Technology Dhaka-1000, Bangladesh
4Institute of Informatics Warsaw University Warsaw, Poland

Abstract

We study the complexity of the longest common subsequence (LCS) problem from a new perspective. By an indeterminate string (i-string, for short) we mean a sequence \( \tilde{X} = \tilde{X}[1]\tilde{X}[2]\ldots \tilde{X}[n] \), where \( \tilde{X}[i] \subseteq \Sigma \) for each \( i \), and \( \Sigma \) is a given alphabet of potentially large size. A subsequence of \( \tilde{X} \) is any usual string over \( \Sigma \) which is an element of the finite (but usually of exponential size) language \( \tilde{X}[i_1]\tilde{X}[i_2]\ldots \tilde{X}[i_p] \), where \( 1 \leq i_1 < i_2 < i_3 \ldots < i_p \leq n \), \( p \geq 0 \). Similarly, we define a supersequence of \( x \). Our first version of the LCS problem is Problem ILCS: for given i-strings \( \tilde{X} \) and \( \tilde{Y} \), find their longest common subsequence. From the complexity point of view, new parameters of the input correspond to \( |\Sigma| \), and maximum size \( \ell \) of the subsets in \( \tilde{X} \) and \( \tilde{Y} \). There is also a third parameter \( \mathcal{R} \), which gives a measure of similarity between \( \tilde{X} \) and \( \tilde{Y} \). The smaller the \( \mathcal{R} \), the lesser is the time for solving Problem ILCS. Our second version of the LCS problem is Problem CILCS (constrained ILCS): for given i-strings \( \tilde{X} \) and \( \tilde{Y} \) and a plain string \( Z \), find the longest common subsequence of \( \tilde{X} \) and \( \tilde{Y} \) which is, at the same time, a supersequence of \( Z \). In this paper, we present several efficient algorithms to solve both ILCS and CILCS problems. The efficiency in our algorithms is obtained in particular by using an efficient data structure for special types of range maxima queries and fast multiplication of boolean matrices.