Canonical minimal automata for generalised regular expressions

Given a generalised regular expression (GRE) R, this program computes the
canonical minimal (incomplete) deterministic finite automaton (DFA) which
recognises the language described by R and hence solves the
equivalence problem for GREs.

The uniqueness up to isomorphism of the minimal DFA for every
regular language is given by the Myhill–Nerode theorem. The
states of the canonical minimal DFA are usually expressed as equivalence
classes of strings under the Nerode congruence; here they are given
as natural numbers in [0, ..., m-1], indexed in order of the
earliest string in shortlex order to reach each state.

It follows from (Stockmeyer, 1974) that this program runs in non-ELEMENTARY
time in the worst case; i.e. there is no k such that this program
runs in O(2^2^2^...^n) time for a power tower of height k. More
specifically, the height of the power tower is given by the number of
nested complements permitted within the expression.

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ GREs are given in postfix notation. Whitespace is ignored. ┃
┠─────────────────────────────┬──────────────────────────────┨
┃ $    nothing                │ R+   one or more             ┃
┃ %    anything               │ R!   complement              ┃
┃ ~    the empty string       │ RS,  concatenation           ┃
┃ .    any single character   │ RS|  union                   ┃
┃ a    the letter ‘a’         │ RS&  intersection            ┃
┃ R*   zero or more           │ RS\  difference              ┃
┃ R?   zero or one            │ RS^  symmetric difference    ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

How it works • Sub-expressions are successively computed via DFA operations and minimised. At the end, canonicalisation is performed. • For the base cases $, %, ~, . and a-to-z, DFAs are given directly. • For !, &, |, \ and ^, the product construction on DFAs is used. Only reachable states are constructed, and the argument DFAs are completed only when necessary (!, both arguments of | and ^, and the second argument of \). • For *, + and concat, epsilon-transitions are added and the powerset construction is used on the resulting NFA. Again only reachable states are constructed. (? is simple enough that it is just constructed directly.) • Minimisation is performed as follows: 1. Unreachable states are removed. 2. Dead states (states that cannot reach a final state) are removed, except for the start state. These can be found by graph-searching backwards from each final state. 3. States in the same Nerode congruence class are merged using Hopcroft’s algorithm. • Canonicalisation is performed a breadth-first search in alphabetic order (which naturally visits each state in order of the earliest string in shortlex order that reaches it). In particular there is no need to solve the graph isomorphism problem.