Regular Expression Matching Algorithms using Dual Position Automata

Hiroaki Yamamoto1
1Department of Information Engineering, Shinshu University, 4-17-1 Wakasato, Nagano-shi, 380-8553 Japan.

Abstract

This paper introduces an automaton model called a dual position automaton (a dual PA), and then gives a bit-parallel algorithm for generating a dual PA from a regular expression (RE). For any RE \( r \) over an alphabet \( \Sigma \), our translation algorithm generates a dual PA consisting of \( \tilde{m}(\tilde{m} + 1) \) bits in \( O(\tilde{m}\lceil \tilde{m}/w \rceil) \) time and space, where \( w \) is the length of a computer word, \( \tilde{m} = \sum_{a \in \Sigma} m_a \), and \( m_a \) is the number of occurrences of an alphabet symbol \( a \) in \( r \). Furthermore, we give a method to construct a compact DFA representation from a dual PA. This DFA representation requires only \( (\tilde{m} + 1) \sum_{a \in \Sigma} 2^{m_a} \) bits. Finally, we show RE matching algorithms using such a DFA representation.