An \(O(n . m)\) Algorithm for Calculating the Closure of lca-Type Operators

Vincent Ranwez1, Stefan Janaqi2, Sylvie Ranwez2
1Institut des Sciences de I’Evolution de Montpellier (ISE-M), UMR 5554 CNRS, Université Montpellier II, place E. Bataillon, CC 064, 34 095 Montpellier cedex 05, France.
2LGI2P/EMA Research Centre, Site EERIE, Parc scientifique G. Besse, 30 035 Nimes cedex 1, France.

Abstract

The least common ancestor of two vertices, denoted \(\text{lca}(x, y)\), is a well-defined operation in a directed acyclic graph (dag) \(G\). We introduce \(U_\text{lca}(S)\), a natural extension of \(\text{lca}(x,y)\) for any set \(S\) of vertices. Given such a set \(S_0\), one can iterate \(S_{k+1} = U_\text{lca}(S_k)\) in order to obtain an increasing set sequence. \(G\) being finite, this sequence always has a limit which defines a closure operator. Two equivalent definitions of this operator are given and their relationships with abstract convexity are shown. The good properties of this operator permit to conceive an \(O(n.m)\) time complexity algorithm to calculate its closure. This performance is crucial in applications where dags of thousands of vertices are employed. Two examples are given in the domain of life-science: the first one concerns genes annotations’ understanding by restricting Gene Ontology, the second one deals with identifying taxonomic group of environmental \(DNA\) sequences.