Information and Computation ••• (••••) ■


Information and Computation

www.elsevier.com/locate/yinco


Reachability, confluence, and termination analysis with state-compatible automata

Bertram Felgenhauer *, René Thiemann

Institute of Computer Science, University of Innsbruck, Innsbruck, Austria

ARTICLE INFO

Article history: Received 28 June 2014. Available online xxxx.

ABSTRACT

Regular tree languages are a popular device for reachability analysis over term rewrite systems, with many applications like analysis of cryptographic protocols, or confluence and termination analysis. At the heart of this approach lies tree automata completion, first introduced by Genet for left-linear rewrite systems. Korp and Middeldorp introduced so-called quasi-deterministic automata to extend the technique to non-left-linear systems. In this paper, we introduce the simpler notion of state-compatible automata, which are slightly more general than quasi-deterministic, compatible automata. This notion also allows us to decide whether a regular tree language is closed under rewriting, a problem which was not known to be decidable before.

The improved precision has a positive impact in applications which are based on reachability analysis, namely termination and confluence analysis.

Our results have been formalized in the theorem prover Isabelle/HOL. This makes it possible to certify automatically generated proofs that use tree automata techniques. © 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

In this paper we are largely concerned with over-approximations of the terms reachable from a regular tree language L0 by rewriting using a term rewrite system R, that is, we are interested in regular tree languages L such that R*(L0) ⊆ L. Such over-approximations have been used, among other things, in the analysis of cryptographic protocols [8], for termination analysis [9,13] and for establishing non-confluence of term rewrite systems [19]. These applications work by first translating a problem into a term rewrite system: states (of, say, a program or a protocol) are modeled as terms, and behaviors (the effect of program statements, or of actions by the protocol participants and attackers) are modeled by rules of a term rewrite system. In the context of term rewriting, the most basic use of over-approximating reachable terms is for non-reachability: Given a set of starting terms L0, a term rewrite system R, and a set of bad terms B, are there terms s ∈ L0 and b ∈ B such that s reaches b? If we can find an over-approximation L of R*(L0) such that L ∩ B is empty, then we can answer this question negatively. Note that it is beneficial to make the approximation L as small as possible.

Unfortunately, the question whether R*(L0) ⊆ L for regular languages L0 and L and a term rewrite system R is undecidable in general. Tree automata completion, conceived by Genet et al. [6,7], is based on the stronger requirements that L0 ⊆ L and L is itself closed under rewriting, i.e., R(L) ⊆ L.

* This research was supported by FWF projects P22467, P22767, P27528 and Y757.

* Corresponding author.

E-mail addresses: bertram.felgenhauer@uibk.ac.at (B. Felgenhauer), rene.thiemann@uibk.ac.at (R. Thiemann). http://dx.doi.org/10.1016/j.ic.2016.06.011

0890-5401/© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

In the present paper we concentrate on the second property, R(L) ⊆ L, since checking L0 ⊆ L for regular languages L0 and L is a well-studied problem. This research was born of involvement in the development of three tools for term rewriting: CeTA [18], a certifier for termination and confluence proofs generated by provers, CSI [19], an automated confluence prover, and TTT2 [11], an automated termination prover. The general idea is that the automated provers generate proofs of (non-)confluence or (non-)termination of a term rewrite system, which can then be verified by a certifier like CeTA. The distinguishing feature of a certifier is that it is highly trustworthy; in the case of CeTA, this is achieved by proving its code correct in the proof assistant Isabelle [17]. In contrast, CSI and TTT2 are ordinary programs that are far more likely to contain bugs. Both CSI and TTT2 use quasi-deterministic automata [13] to produce over-approximations of reachable terms, and CeTA could not certify the resulting proofs. Our main contributions are as follows:

State-compatible automata. We introduce the notion of state-compatible automata as a way of ensuring closure under rewriting. Our definition refines Genet's concept of compatibility [7], allowing better (i.e., smaller) approximations of reachable terms in some cases. At the same time, our notion is considerably simpler than quasi-deterministic automata, which is beneficial for the formalization in Isabelle. We show that state-compatibility not only ensures R(L) ⊆ L, but can also be utilized to obtain a decision procedure for the question whether a regular tree language is closed under rewriting; to the best of our knowledge, this decidability result is new. The theory of state-compatible automata is developed in Section 3.

Comparison to quasi-deterministic automata. We carefully analyze how state-compatible automata relate to quasi-deterministic automata, which were introduced by Korp and Middeldorp [13] to overcome problems with tree automata completion for non-left-linear term rewrite systems. We show that every quasi-deterministic automaton can easily be converted to a state-compatible automaton accepting the same language. This conversion is currently used in CSI (for non-confluence) and TTT2 (for termination) in order to produce certifiable proofs when tree automata are used. The relation to quasi-deterministic automata can be found in Section 4.1, and we relate state-compatible automata to quasi-models by Endrullis et al. [4] in Section 4.2.

Adaptation to match-bounds. The match-bounds technique by Geser et al. [9] is a technique for proving termination of term rewrite systems. The original technique is a direct application of tree automata completion, but is restricted to left-linear term rewrite systems. Korp and Middeldorp [13] extended match-bounds to non-left-linear systems, using so-called raise-consistent, quasi-deterministic automata. We show how to adapt raise-consistency for state-compatible automata in Section 6.

Formalization. Last but not least, we formalized state-compatible automata and the applications to non-confluence and termination in Isabelle, making the techniques available in the certifier CeTA, and allowing it to certify more non-confluence and termination proofs produced by CSI and TTT2. We also formalized the completeness result for checking R(L) ⊆ L, so that CeTA can, in principle, verify match-bounds and non-confluence proofs by tools that are unaware of state-compatible automata. See Section 7 for details.

Our formalization improves on the certifier for tree automata completion by Boyer et al. [2] in that we can handle non-left-linear systems.

Furthermore, state-compatible automata strictly refine both compatible and quasi-compatible automata, which matters for applications. To demonstrate this, we provide examples where the techniques of [12,13,19] are successfully applied when state-compatible automata are used, but where the techniques must fail if they are restricted to use the compatibility criteria of Genet or Korp and Middeldorp. These examples can be found in Sections 5 and 6. Finally, we conclude in Section 8.

Remark. A preliminary version of this paper was already published in [5]. The current paper extends this work in various directions. For example, whereas [5] only mentions the application areas in passing, we now provide many more details in Sections 5 and 6. These include the new Examples 28 and 43 which show that the improved precision of state-compatible automata carries over to the applications. Even more importantly, we now present all details on how to adapt state-compatible automata for match-bounds to non-left-linear term rewrite systems, a task which was only mentioned as future work in [5]. The necessary results of Sections 6.1 and 6.2 have also been formalized using Isabelle. We also provide some insight into the implementation of tree automata completion in TTT2, which is based on an unpublished extension of quasi-deterministic automata with ε-transitions by Korp. Finally, Section 7 on our formalization has been extended.

2. Preliminaries

We assume that the reader is familiar with first order term rewriting and tree automata. For introductions to these topics see [1] and [3].

Terms over a signature F and a set of variables V, denoted T(F, V) (or T(F) if V is empty), are inductively defined as either variables v ∈ V or of the form f(t1, ..., tn), where t1, ..., tn are terms and f ∈ F is a function symbol of arity n. We write Var(t) for the set of variables in t. A term t is linear if each variable occurs in t at most once. Contexts are terms over F ∪ {□} that contain exactly one occurrence of □. If C is a context and t a term, then C[t] denotes the term obtained by replacing the □ in C by t. There are also multi-hole contexts D where D⟨t1, ..., tn⟩ is the term that is obtained by filling all holes: we always assume that the number n of terms matches the number of holes in D. A substitution τ : V → T(F, V) maps variables to terms. We write tτ for the result of replacing each variable x in t by τ(x).

A term rewrite system (TRS) R is a set of rewrite rules l → r, where each rule's left-hand side l and right-hand side r are terms such that l ∉ V and Var(r) ⊆ Var(l). A TRS R defines a rewrite relation →_R, namely s →_R t whenever there are a context C, a rule l → r ∈ R, and a substitution τ such that s = C[lτ] and t = C[rτ]. We denote by lhs(R) the set of all left-hand sides of rules in R. A TRS is left-linear if all its left-hand sides are linear terms. A rule l → r is called collapsing if r is a variable. The inverse, the reflexive closure, transitive closure, and the reflexive, transitive closure of a binary relation → are denoted by ←, →=, →+, and →*, respectively. Given a set of terms L, R(L) (R*(L)) is the set of one-step (many-step) descendants of L: t' ∈ R(L) (t' ∈ R*(L)) if and only if t →_R t' (t →*_R t') for some t ∈ L. A language L is closed under rewriting by R if R(L) ⊆ L.
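To make the definitions of R(L) and R*(L) concrete, the following Python sketch computes one-step and many-step descendants of a finite set of ground terms. The term encoding (variables as strings, applications as tuples) and the names `match`, `substitute`, `step` and `descendants` are assumptions made for illustration only. The exhaustive search terminates only when R*(L) is finite, hence the explicit limit.

```python
# Illustrative encoding (an assumption, not taken from the paper):
# a variable is a str, a term f(t1, ..., tn) is the tuple ("f", t1, ..., tn).

def match(pattern, term, subst):
    """Return an extension of subst that maps pattern onto term, or None."""
    if isinstance(pattern, str):  # variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def substitute(term, subst):
    """Apply a substitution to a term."""
    if isinstance(term, str):
        return subst[term]
    return (term[0],) + tuple(substitute(t, subst) for t in term[1:])


def step(rules, term):
    """All t' with term ->_R t' (at the root or inside an argument)."""
    result = set()
    for lhs, rhs in rules:
        subst = match(lhs, term, {})
        if subst is not None:
            result.add(substitute(rhs, subst))
    if not isinstance(term, str):
        for i in range(1, len(term)):
            for u in step(rules, term[i]):
                result.add(term[:i] + (u,) + term[i + 1:])
    return result


def descendants(rules, start, limit=10_000):
    """R*(start) by exhaustive search; gives up after `limit` terms."""
    seen, todo = set(start), list(start)
    while todo:
        for u in step(rules, todo.pop()):
            if u not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("too many descendants")
                seen.add(u)
                todo.append(u)
    return seen


a, b = ("a",), ("b",)
rules = [(("f", "x", "x"), b), (b, a)]   # f(x, x) -> b,  b -> a
start = {("f", a, a)}                    # L0 = {f(a, a)}
one_step = set().union(*(step(rules, t) for t in start))   # R(L0)
many_step = descendants(rules, start)                      # R*(L0)
```

For this small system, `one_step` contains only b, while `many_step` consists of f(a, a), b and a.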

A (bottom-up) tree automaton A = (F, Q, Qf, Δ) over a signature F consists of a set of states Q disjoint from F, a set of final states Qf ⊆ Q, and a set of transitions Δ of shape f(q1, ..., qn) → q where the root f ∈ F has arity n and q, q1, ..., qn ∈ Q. (We forbid ε-transitions for the sake of simplicity.) We regard Δ as a TRS over the signature F ∪ Q, with the states as constants. A substitution σ is a state substitution if σ(x) ∈ Q for all x ∈ V. A term t is accepted in state q if t →*_Δ q; t is accepted by A if it is accepted in a final state. The language accepted by A is L(A) = {t | t →*_Δ q for some q ∈ Qf}. A tree automaton is finite if its set of transitions is finite. A language is regular if it is accepted by a finite tree automaton. We call A deterministic if no two rules in Δ have the same left-hand side. For convenience, we often write →_A for →_Δ. Following [13], we formulate Genet's result from [7] as follows:

Definition 1. A tree automaton A is compatible with a TRS R if for all state substitutions σ, rules l → r ∈ R and states q ∈ Q, lσ →*_A q implies rσ →*_A q.

Theorem 2. Let the tree automaton A be compatible with the TRS R. Then

1. if R is left-linear, then L(A) is closed under rewriting by R, and

2. if A is deterministic, then L(A) is closed under rewriting by R.

Finally, we recall that every tree automaton can be reduced to an equivalent automaton where all states are useful.

Definition 3. Let A = (F, Q, Qf, Δ) be a tree automaton. We say that a state q ∈ Q is reachable if t →*_A q for some term t ∈ T(F); q ∈ Q is productive if C[q] →*_A q' for some context C and state q' ∈ Qf. Finally, an automaton A is trim if all its states are both reachable and productive.

Proposition 4. For any tree automaton A there is an equivalent tree automaton A' that is trim. If A is deterministic, then A' is also deterministic.
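Proposition 4 can be realized by two fixpoint computations. The sketch below assumes that a transition f(q1, ..., qn) → q is encoded as the triple `(f, (q1, ..., qn), q)`; this encoding and the function names are choices made here for illustration. Since the construction only deletes states and transitions, a deterministic input stays deterministic.

```python
def reachable_states(transitions):
    """States q with t ->* q for some ground term t."""
    reach = set()
    changed = True
    while changed:
        changed = False
        for _f, args, q in transitions:
            if q not in reach and all(p in reach for p in args):
                reach.add(q)
                changed = True
    return reach


def productive_states(transitions, final):
    """States q with C[q] ->* q' for some final q'.

    Assumes every state occurring in `transitions` is reachable, so that
    the other holes of a transition can always be filled with ground terms.
    """
    prod = set(final)
    changed = True
    while changed:
        changed = False
        for _f, args, q in transitions:
            if q in prod:
                for p in args:
                    if p not in prod:
                        prod.add(p)
                        changed = True
    return prod


def trim(transitions, final):
    """Return (states, final states, transitions) of an equivalent trim automaton."""
    reach = reachable_states(transitions)
    live = {(f, args, q) for f, args, q in transitions
            if all(p in reach for p in args)}
    useful = productive_states(live, final & reach)
    kept = {(f, args, q) for f, args, q in live if q in useful}
    return useful, final & useful, kept


# Hypothetical automaton: state 3 is unreachable, state 4 is not productive.
transitions = {("a", (), 1), ("f", (1,), 2), ("g", (3,), 2), ("b", (), 4)}
states, finals, kept = trim(transitions, {2})
```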

3. State-compatible automata

In this section, we introduce state-compatible automata. But first we will review tree automata completion and the shortcomings of compatible and quasi-deterministic automata.

3.1. Background and motivation

In tree automata completion, closure under rewriting is ensured by constructing L as the language accepted by a tree automaton A that is compatible with R. However, for non-left-linear systems, compatibility is not sufficient for ensuring closure under rewriting.

Example 5. Let R = {f(x, x) → x} and A be the automaton with states 1, 2, 3, final state 3, and transitions a → 1, a → 2, f(1, 2) → 3.

So A is non-deterministic and R is non-left-linear. Even though A is compatible with R, L(A) = {f(a, a)} is not closed under rewriting by R, because f(a, a) can be rewritten to a which is not in L(A).
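The claims of Example 5 can be checked mechanically. The following sketch, under the assumption that terms are encoded as tuples, variables as strings and transitions as triples `(f, (q1, ..., qn), q)`, runs the automaton bottom-up and tests Definition 1 by enumerating all state substitutions. The helper names are hypothetical.

```python
from itertools import product


def variables(term):
    """Variables (encoded as str) occurring in a term."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(t) for t in term[1:]))


def states_of(term, delta, sigma):
    """All states q such that term*sigma ->*_A q."""
    if isinstance(term, str):          # variable, replaced by the state sigma(x)
        return {sigma[term]}
    arg_states = [states_of(t, delta, sigma) for t in term[1:]]
    return {q for f, qs, q in delta
            if f == term[0] and len(qs) == len(arg_states)
            and all(p in s for p, s in zip(qs, arg_states))}


def compatible(rules, states, delta):
    """Definition 1: l*sigma ->* q implies r*sigma ->* q."""
    for lhs, rhs in rules:
        xs = sorted(variables(lhs))
        for values in product(sorted(states), repeat=len(xs)):
            sigma = dict(zip(xs, values))
            if not states_of(lhs, delta, sigma) <= states_of(rhs, delta, sigma):
                return False
    return True


def accepts(term, delta, final):
    return bool(states_of(term, delta, {}) & final)


a = ("a",)
delta = {("a", (), 1), ("a", (), 2), ("f", (1, 2), 3)}   # Example 5
rules = [(("f", "x", "x"), "x")]                         # f(x, x) -> x
is_compatible = compatible(rules, {1, 2, 3}, delta)
accepts_faa = accepts(("f", a, a), delta, {3})
accepts_a = accepts(a, delta, {3})
```

Compatibility holds vacuously here, because no state instance f(q, q) of the left-hand side matches the only f-transition f(1, 2) → 3.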

The usual approach to deal with non-left-linear TRSs is to demand that the automaton is deterministic. However, this may result in bad approximations, as demonstrated by the next example.

Example 6. Let R = {f(x, x) → b, b → a} and L0 = {f(a, a)}. The set of terms reachable from L0, namely R*(L0) = {f(a, a), b, a}, is not accepted by any deterministic, compatible tree automaton. To see why, assume that such an automaton A exists, and let q be the state accepting f(a, a). There must be transitions a → q' (q' is unique because A is deterministic) and f(q', q') → q in A. By compatibility with the rules f(x, x) → b and b → a, we must have transitions b → q, and a → q. Since we already have the transition a → q', determinism implies q' = q. With the three transitions a → q, b → q, and f(q, q) → q, A accepts every term over the signature {f, a, b}, which is not a very useful approximation of R*(L0).

Fig. 1. Compatibility, state-compatibility, and state-coherence.

To overcome this problem, Korp and Middeldorp introduced quasi-deterministic automata [13] (cf. Definition 17). Indeed it is easy to find a quasi-deterministic automaton accepting R*(L0) = {f(a, a), b, a} that is compatible with R from the previous example.

Example 7. Let A be an automaton with states 1, 2, final state 2 and transitions a → 1*, a → 2, b → 2*, f(1, 1) → 2*.

(The stars can be ignored for the moment. They indicate the designated states for each left-hand side, cf. Definition 17.) Then A is quasi-deterministic, compatible with R and L(A) = {f(a, a), b, a}. Hence L(A) is closed under rewriting by R.

We extend compatible automata in a slightly different way to cover non-left-linear systems. Namely, we stick to deterministic automata, but relax the compatibility restriction to state-compatibility instead, which accomplishes a similar effect as quasi-deterministic automata. It turns out that as long as R has only non-collapsing rules, state-compatible automata and quasi-deterministic automata are equivalent, i.e., they can express the same regular languages that are closed under rewriting by R. In the presence of collapsing rules, state-compatible automata can capture more approximations than quasi-deterministic ones.

3.2. Definitions

Before we get down to definitions, let us briefly analyze the failure in Example 6. What happens there is that, by the compatibility requirement, all three terms in the rewrite sequence f(a, a) →_R b →_R a have to be accepted in the same state. In conjunction with the determinism requirement, this is fatal. Consequently, because our goal is to obtain a deterministic automaton, we must allow a and b to be accepted in separate states, qa and qb. To track their connection by rewriting, we introduce a relation ≫ on states, such that qb ≫ qa. In general, we require ≫ to be state-compatible and state-coherent, which are defined as follows (see also Fig. 1).

Definition 8. Let A = (F, Q, Qf, Δ) be a tree automaton, and ≫ ⊆ Q × Q be a relation on the states of A. We say that (A, ≫) is state-compatible with a TRS R if for all state substitutions σ, rules l → r ∈ R and states q ∈ Q, if lσ →*_A q then rσ →*_A q' for some q' ∈ Q with q ≫ q'. We say that (A, ≫) is state-coherent if {q' | q ∈ Qf, q ≫ q'} ⊆ Qf, and if for all f(q1, ..., qi, ..., qn) → q ∈ Δ and qi ≫ qi' there is some q' ∈ Q with f(q1, ..., qi', ..., qn) → q' ∈ Δ and q ≫ q'.

The purpose of state-coherence is to deal with contexts in rewrite steps, as we will see in the proof of Theorem 11 below.

Example 9. Let A be an automaton with states 1, 2 (both final), and transitions a → 1, b → 2, f(1, 1) → 2.

Furthermore, let 2 ≫ 2 and 2 ≫ 1. Then (A, ≫) is state-coherent and state-compatible with R = {f(x, x) → b, b → a} and L(A) = {f(a, a), b, a}. Note that this automaton was obtained from the quasi-deterministic automaton from Example 7 by keeping only the transitions to designated states. We will see in Section 4.1 that this construction works in general.

Remark 10. If (A, ≫) is state-coherent, then (A, ≫=) and (A, ≫*) are also state-coherent. The same holds for state-compatibility with R.
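Both conditions of Definition 8 are finite checks and can be tested directly. The sketch below does so for the automaton of Example 9; the encoding (terms as tuples, variables as strings, transitions as triples, the relation as a set of pairs) and all function names are assumptions made for this illustration.

```python
from itertools import product


def variables(term):
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(t) for t in term[1:]))


def states_of(term, delta, sigma):
    """All states q such that term*sigma ->*_A q."""
    if isinstance(term, str):
        return {sigma[term]}
    arg_states = [states_of(t, delta, sigma) for t in term[1:]]
    return {q for f, qs, q in delta
            if f == term[0] and len(qs) == len(arg_states)
            and all(p in s for p, s in zip(qs, arg_states))}


def state_compatible(rules, states, delta, rel):
    """l*sigma ->* q implies r*sigma ->* q' for some q' with (q, q') in rel."""
    for lhs, rhs in rules:
        xs = sorted(variables(lhs))
        for values in product(sorted(states), repeat=len(xs)):
            sigma = dict(zip(xs, values))
            targets = states_of(rhs, delta, sigma)
            for q in states_of(lhs, delta, sigma):
                if not any((q, q2) in rel for q2 in targets):
                    return False
    return True


def state_coherent(delta, final, rel):
    """Both conditions of state-coherence from Definition 8."""
    if any(q in final and q2 not in final for q, q2 in rel):
        return False
    for f, qs, q in delta:
        for i, qi in enumerate(qs):
            for p, p2 in rel:
                if p != qi:
                    continue
                new_lhs = qs[:i] + (p2,) + qs[i + 1:]
                if not any(g == f and ps == new_lhs and (q, q2) in rel
                           for g, ps, q2 in delta):
                    return False
    return True


a, b = ("a",), ("b",)
rules = [(("f", "x", "x"), b), (b, a)]
delta = {("a", (), 1), ("b", (), 2), ("f", (1, 1), 2)}   # Example 9
rel = {(2, 2), (2, 1)}
identity = {(1, 1), (2, 2)}

ok = state_compatible(rules, {1, 2}, delta, rel) and state_coherent(delta, {1, 2}, rel)
ok_identity = state_compatible(rules, {1, 2}, delta, identity)
```

With the identity relation the check fails on the rule b → a, which mirrors the observation that this deterministic automaton is not compatible with R in the sense of Definition 1.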


3.3. Soundness and completeness

Next we prove the analogue of Theorem 2 for state-coherent, state-compatible automata.

Theorem 11. Let A be a tree automaton such that (A, ≫) is state-coherent and state-compatible with the TRS R for some relation ≫. Then

1. if R is left-linear, then L(A) is closed under rewriting by R, and

2. if A is deterministic, then L(A) is closed under rewriting by R.

Proof. Let A = (F, Q, Qf, Δ). First we show that whenever lτ →*_A q for some substitution τ and rule l → r ∈ R, then there is a state q' ∈ Q with q ≫ q' and rτ →*_A q'. By the assumptions, we can extract from a sequence lτ →*_A q a state substitution σ such that lτ →*_A lσ →*_A q: For each x ∈ Var(l), we map x to the state reached from τ(x) in the given sequence. The state is unique either by left-linearity, or because the given automaton is deterministic. By state-compatibility, we obtain a state q' such that q ≫ q' and rτ →*_A rσ →*_A q'.

Using state-coherence we can show by structural induction on C that whenever C[q] →*_A q• and q ≫ q', then C[q'] →*_A q◦ for some state q◦ with q• ≫ q◦.

Finally, assume that t ∈ L(A) and t →_R t'. Then there exist a rule l → r ∈ R, a context C and a substitution τ such that t = C[lτ] and t' = C[rτ]. We have a derivation t = C[lτ] →*_A C[q] →*_A q• ∈ Qf. By the preceding observations we can find states q ≫ q' and q• ≫ q◦ such that t' = C[rτ] →*_A C[q'] →*_A q◦. Note that by state-coherence, q• ∈ Qf implies q◦ ∈ Qf, so that t' ∈ L(A). □

Note that Theorem 11 generalizes Theorem 2 (choose ≫ to be the identity relation on states, which is always state-coherent). Moreover, the converse of Theorem 11 holds for trim, deterministic automata. We will prove this in Theorem 13 below, which allows us to derive our main decidability result in Corollary 14. But first let us show by example that the converse fails for some trim, non-deterministic automaton and ground TRS R.

Example 12. Consider the TRS R = {a → b} and the automaton A with states 0, 1, 2, 3, final state 0, and transitions

a → 1    b → 2    f(1) → 0    g(1) → 0

b → 3    f(2) → 0    g(3) → 0

This automaton accepts L(A) = {f(a), f(b), g(a), g(b)}, which is closed under rewriting by R. Assume that (A, ≫) is state-coherent and state-compatible with R. By state-compatibility, a → b begets 1 ≫ 2 or 1 ≫ 3. If 1 ≫ 2, then state-coherence, considering the transition g(1) → 0, requires a transition with left-hand side g(2), which does not exist. Similarly, if 1 ≫ 3, then f(1) → 0 requires a transition with left-hand side f(3), which does not exist.

Theorem 13. Let A be a trim, deterministic tree automaton such that L(A) is closed under rewriting by the TRS R. Then there is a relation ≫ such that (A, ≫) is state-coherent and state-compatible with R.

Proof. Let A = (F, Q, Qf, Δ). We define ≫ as follows: q ≫ q' if and only if for some terms t, t' ∈ T(F), we have

q ←*_A t →_R t' →*_A q'    (1)

Note that by virtue of A being deterministic, t and t' determine q and q' uniquely. We show that (A, ≫) is state-coherent and state-compatible.

1. (state-coherence) If q ∈ Qf and q ≫ q', then there exist terms t, t' satisfying (1). In particular, q ∈ Qf implies t ∈ L(A), and t →_R t' implies t' ∈ L(A), because L(A) is closed under rewriting by R. Because A is deterministic, t' determines q' uniquely, and q' ∈ Qf follows.

2. (state-coherence) Assume that f(q1, ..., qn) → q ∈ Δ and qi ≫ qi' for some index i and state qi'. By (1) there are ti, ti' such that qi ←*_A ti →_R ti' →*_A qi'. Because all qj are reachable, we can fix terms tj with tj →*_A qj for j ≠ i. The state q is productive, so there is a context C such that C[q] →*_A q• ∈ Qf. Let t = f(t1, ..., ti, ..., tn) and t' = f(t1, ..., ti', ..., tn). Then C[t] ∈ L(A) and C[t] →_R C[t'], hence C[t'] ∈ L(A) as well. Consequently, there are states q', q◦ such that

C[q] ←*_A C[t] →_R C[t'] →*_A C[f(q1, ..., qi', ..., qn)] →_A C[q'] →*_A q◦ ∈ Qf

In particular, we have a transition f(q1, ..., qi', ..., qn) → q' ∈ Δ, and q ≫ q'.

3. (state-compatibility) Assume that lσ →*_A q for a state substitution σ. All states of A are reachable, so there is a substitution τ : V → T(F) with τ(x) →*_A σ(x) for all x ∈ V. Furthermore, q is productive, so that for some context C, C[q] →*_A q• ∈ Qf. We have C[lτ] ∈ L(A) and C[lτ] →_R C[rτ]. Consequently, C[rτ] ∈ L(A) and for some states q', q◦,

C[q] ←*_A C[lσ] ←*_A C[lτ] →_R C[rτ] →*_A C[q'] →*_A q◦ ∈ Qf

In particular, rτ →*_A q'. Recall that A is deterministic. Hence we can decompose this rewrite sequence as follows: rτ →*_A rσ →*_A q'. We conclude by noting that q ≫ q' by the definition of ≫. □

Corollary 14. The problem R(L(A)) ⊆ L(A) is decidable for finite tree automata A.

Proof. W.l.o.g. we may assume that A is deterministic. Using Proposition 4 we may also assume that A is trim. By Theorems 11 and 13 the problem reduces to whether there is some relation ≫ such that (A, ≫) is both state-compatible with R and state-coherent. But since there are only finitely many relations ≫ we can just test state-compatibility and state-coherence for each of them. □

Remark 15. As a consequence of Theorem 13, regular languages accepted by state-coherent automata that are state-compatible with a fixed TRS R are closed under intersection and union. This can also be shown directly by a product construction.

3.4. Deciding R(L(A)) ⊆ L(A)

In the remainder of this section we show that instead of testing all possible relations it suffices to construct a minimal one. We proceed as follows:

1. We assume that A = (F, Q, Qf, Δ) is finite, trim and deterministic. Note that given a non-deterministic automaton, we can compute an equivalent deterministic one in exponential time. Once we have a deterministic automaton, we can compute an equivalent trim one in polynomial time.

2. In the following steps we will find the smallest relation ≫ that makes (A, ≫) both state-compatible with R and state-coherent, if such a relation exists. Initially set ≫ = ∅.

3. Consider each rule l → r ∈ R, each state substitution σ, and each state q ∈ Q such that lσ →*_A q. If there is some state q' with rσ →*_A q' then add (q, q') to ≫. Otherwise, L(A) is not closed under rewriting by R, and the procedure terminates.

At this point it is ensured that (any extension of) ≫ is state-compatible with R.

4. In order to ensure state-coherence, we repeat the following process until ≫ is not increased any further. Whenever q ≫ q' and f(q1, ..., qi = q, ..., qn) → q• ∈ Δ, then we look for a transition with left-hand side f(q1, ..., qi = q', ..., qn) in Δ. If no such transition exists, state-coherence fails, and the algorithm terminates. Otherwise, let q◦ ∈ Q be the corresponding right-hand side and add (q•, q◦) to ≫.

At this point ≫ is the smallest relation satisfying state-compatibility with R which additionally satisfies the second condition of state-coherence.

5. Assert for all q ≫ q' with final state q that also q' is final.

Step 3 identifies the applicable instances of the state-compatibility constraint. It consists of a polynomial number of NP queries. Steps 4 and 5 can be performed in polynomial time. The whole procedure is, therefore, in the Δ^P_2 (or P^NP) complexity class for deterministic automata as input.
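Steps 2 to 5 can be sketched in a few lines of Python. The code assumes that the input automaton is already finite, trim and deterministic (step 1 is not performed), and it replaces the NP queries of step 3 by a naive enumeration of all state substitutions. The encoding of terms and transitions and the function names are assumptions of this illustration.

```python
from itertools import product


def variables(term):
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(t) for t in term[1:]))


def closed_under_rewriting(rules, states, final, delta):
    """Return the smallest suitable relation, or None if L(A) is not closed.

    Assumes a finite, trim, deterministic automaton whose transitions are
    triples (f, (q1, ..., qn), q); terms are tuples, variables are str.
    """
    table = {(f, qs): q for f, qs, q in delta}   # deterministic: lhs -> state

    def run(term, sigma):
        """The unique state reached from term*sigma, or None."""
        if isinstance(term, str):
            return sigma[term]
        qs = tuple(run(t, sigma) for t in term[1:])
        if None in qs:
            return None
        return table.get((term[0], qs))

    rel = set()                                  # step 2: start with the empty relation
    for lhs, rhs in rules:                       # step 3: state-compatibility
        xs = sorted(variables(lhs))
        for values in product(sorted(states), repeat=len(xs)):
            sigma = dict(zip(xs, values))
            q = run(lhs, sigma)
            if q is None:
                continue
            q2 = run(rhs, sigma)
            if q2 is None:
                return None
            rel.add((q, q2))

    todo = list(rel)                             # step 4: close under state-coherence
    while todo:
        p, p2 = todo.pop()
        for (f, qs), q in table.items():
            for i, qi in enumerate(qs):
                if qi != p:
                    continue
                q2 = table.get((f, qs[:i] + (p2,) + qs[i + 1:]))
                if q2 is None:
                    return None
                if (q, q2) not in rel:
                    rel.add((q, q2))
                    todo.append((q, q2))

    if any(q in final and q2 not in final for q, q2 in rel):   # step 5
        return None
    return rel


a, b = ("a",), ("b",)
# Automaton of Example 9 with R = {f(x, x) -> b, b -> a}.
rel_ex9 = closed_under_rewriting(
    [(("f", "x", "x"), b), (b, a)], {1, 2}, {1, 2},
    {("a", (), 1), ("b", (), 2), ("f", (1, 1), 2)})
# A deterministic automaton for {f(a, a)} with R = {f(x, x) -> x}.
rel_faa = closed_under_rewriting(
    [(("f", "x", "x"), "x")], {1, 2}, {2},
    {("a", (), 1), ("f", (1, 1), 2)})
```

For the automaton of Example 9 the procedure returns exactly the relation given there; for the language {f(a, a)} it fails in step 5, because 2 ≫ 1 is forced while only state 2 is final.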

Remark 16. Using [3, Exercise 1.12.2], which shows that it is NP-hard to decide whether an instance of a term l is accepted by a tree automaton A, we can show that deciding whether the language accepted by a deterministic automaton is closed under rewriting by a given TRS is co-NP-hard. To wit, given a term l, a tree automaton A, a fresh unary function ⋆ and a fresh constant ◇, the language ⋆(L(A)) = {⋆(x) | x ∈ L(A)} is closed under rewriting by ⋆(l) → ◇ if and only if no instance of l is accepted by A.

4. Related work

4.1. Quasi-deterministic automata

In this section we relate deterministic state-coherent, state-compatible automata to quasi-deterministic automata by Korp and Middeldorp [13]. Our interest in quasi-deterministic automata is two-fold. First, they represent the state of the art in tree automata completion for non-left-linear TRSs. Secondly, the existing implementation of tree automata completion in TTT2 and CSI is based on quasi-deterministic automata, and our goal is to certify the resulting non-confluence and termination proofs (cf. Sections 5 and 6). We show that given a compatible, quasi-deterministic automaton, we can extract a state-compatible, deterministic automaton accepting the same language, while the opposite direction fails in the presence of collapsing rules. In our tools' implementation, due to Korp, this restriction is overcome by adding ε-transitions to the quasi-deterministic automata, and we will conclude the section with a description of this extension, which was hitherto unpublished.

First we recall the definitions of compatibility and quasi-determinism.

Definition 17 (Definition 18 of [13]). Let A = (F, Q, Qf, Δ) be a tree automaton. For a left-hand side l ∈ lhs(Δ) of a transition, we denote the set {q | l → q ∈ Δ} of possible right-hand sides by Q(l). We call A quasi-deterministic if for every l ∈ lhs(Δ) there exists a designated state p ∈ Q(l) such that for all transitions f(q1, ..., qn) → q ∈ Δ and i ∈ {1, ..., n} with qi ∈ Q(l), the transition f(q1, ..., qi−1, p, qi+1, ..., qn) → q belongs to Δ. Moreover, we require that p ∈ Qf whenever Q(l) contains a final state.

For each l ∈ lhs(Δ) we fix a designated state p_l satisfying the constraints of Definition 17. We denote the set of designated states by Qd and the set {l → p_l | l ∈ lhs(Δ)} by Δd. The notion of compatibility used for quasi-deterministic tree automata is refined slightly from the standard one, Definition 1.

Definition 18 (Definition 23 of [13]). Let R be a TRS and L a language. Let A = (F, Q, Qf, Δ) be a quasi-deterministic tree automaton. We say that A is compatible with R and L if L ⊆ L(A) and for each rewrite rule l → r ∈ R and state substitution σ : Var(l) → Qd such that lσ →*_Δd q it holds that rσ →*_Δ q.

To bring the notion of compatibility closer to our work, we say that A is compatible with R if A is compatible with R and ∅, making the condition L ⊆ L(A) vacuous. Example 7 exhibits a quasi-deterministic automaton that is compatible with R.

We will show that for each quasi-deterministic automaton that is compatible with a TRS R, there is a deterministic, state-coherent automaton that is state-compatible with R and accepts the same language. To this end, we need the following key lemma, a slight generalization of [13, Lemma 20], which shows that a quasi-deterministic automaton A is almost deterministic: all but the last step in a reduction can be performed using the deterministic Δd transitions.

Lemma 19. Let A = (F, Q, Qf, Δ) be a quasi-deterministic automaton. If t →+_Δ q then t →*_Δd · →_Δ q for all terms t ∈ T(F ∪ Q) and states q ∈ Q.

Proof. The proof is identical to the proof of [13, Lemma 20], except when ti in t = f (t1,..., tn) is a state. In that case, we let pk = qi = ti. □

Theorem 20. Let A = (F, Q, Qf, Δ) be a quasi-deterministic tree automaton that is compatible with R. Then A' = (F, Qd, Qf ∩ Qd, Δd) makes (A', ≫) state-coherent and state-compatible with R, where q ≫ q' if q = q' or, for some left-hand side l ∈ lhs(Δ), q ∈ Q(l) and q' = p_l. Furthermore, L(A') = L(A).

Proof. Note that →_A = →_Δ and →_A' = →_Δd.

1. (state-coherence) Assume that q is final in A', and q ≫ q'. If q = q' then q' is final, too. Otherwise, there is a left-hand side l such that q ∈ Q(l) and q' = p_l is the designated state of l. Since Q(l) contains a final state (namely, q), q' must be final as well by Definition 17.

2. (state-coherence) Let l = f(q1, ..., qi, ..., qn) and l' = f(q1, ..., qi', ..., qn), where qi ≫ qi'. Furthermore, let l → q ∈ Δd. If qi = qi' then l' → q ∈ Δd and q ≫ q. Otherwise, there is a left-hand side l'' such that qi ∈ Q(l'') and qi' = p_l'' is the designated state of l''. By Definition 17, there is a transition l' → q in Δ. Thus, l' is a left-hand side and q ∈ Q(l'). Furthermore, l' → p_l' ∈ Δd, and q ≫ p_l' follows.

3. (state-compatibility) Let σ be a state substitution and lσ →*_Δd q. By compatibility, we have rσ →*_Δ q. If r is a variable, we are done, noting that q ≫ q. Otherwise, using Lemma 19, there is a left-hand side l' ∈ lhs(Δ) such that rσ →*_Δd l' →_Δ q. Consequently, rσ →*_Δd · →_Δd p_l', and since q ∈ Q(l'), we have q ≫ p_l'.

4. (accepted language) L(A') ⊆ L(A) is obvious. To show L(A) ⊆ L(A'), assume that t ∈ L(A), i.e., t →*_Δ q ∈ Qf. By Lemma 19, there is a left-hand side l ∈ lhs(Δ) such that t →*_Δd l →_Δ q. As in the previous item we conclude that t →*_Δd p_l, and q ≫ p_l. The state p_l is final by state-coherence, so t ∈ L(A') follows. □
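The construction of Theorem 20 is easy to carry out mechanically. The sketch below checks the conditions of Definition 17 for a given choice of designated states and then builds A' and ≫. It is run on the transitions of Example 7, with both states taken as final; this is an assumption of the sketch, made because Definition 17 asks the designated state 1 of a to be final once Q(a) contains a final state. Restricting ≫ to designated states is likewise a choice made here, since ≫ is meant as a relation on the states of A'. All names and the encoding of transitions as triples are hypothetical.

```python
def is_quasi_deterministic(delta, final, designated):
    """Check Definition 17 for the designated states given as a dict lhs -> p_l."""
    targets = {}
    for f, qs, q in delta:
        targets.setdefault((f, qs), set()).add(q)       # Q(l) for each lhs l
    for lhs, q_l in targets.items():
        p = designated.get(lhs)
        if p not in q_l:
            return False
        if q_l & final and p not in final:
            return False
        for f, qs, q in delta:
            for i, qi in enumerate(qs):
                if qi in q_l and (f, qs[:i] + (p,) + qs[i + 1:], q) not in delta:
                    return False
    return True


def to_state_compatible(delta, final, designated):
    """Build (Q_d, final states, Delta_d, relation) as in Theorem 20."""
    q_d = set(designated.values())
    delta_d = {(f, qs, p) for (f, qs), p in designated.items()}
    rel = {(q, q) for q in q_d}
    # q >> p_l whenever q is a possible right-hand side of l
    # (restricted to designated states; an assumption of this sketch).
    rel |= {(q, designated[(f, qs)]) for f, qs, q in delta if q in q_d}
    return q_d, final & q_d, delta_d, rel


# Transitions of Example 7; starred transitions give the designated states.
delta = {("a", (), 1), ("a", (), 2), ("b", (), 2), ("f", (1, 1), 2)}
designated = {("a", ()): 1, ("b", ()): 2, ("f", (1, 1)): 2}
final = {1, 2}   # assumption, see the lead-in

quasi_det = is_quasi_deterministic(delta, final, designated)
q_d, final_d, delta_d, rel = to_state_compatible(delta, final, designated)
```

The result has the transitions a → 1, b → 2, f(1, 1) → 2 and relates 2 ≫ 2 and 2 ≫ 1 (plus the reflexive pair 1 ≫ 1), which agrees with the automaton of Example 9.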

In the opposite direction, we have a positive result for non-collapsing TRSs.

Theorem 21. Let A = (F, Q, Qf, Δ) be a deterministic automaton and the relation ≫ ⊆ Q × Q be such that (A, ≫) is state-coherent and state-compatible with R. Furthermore, assume that R contains no collapsing rules. Then the automaton A' = (F, Q, Qf, Δ') with Δ' = {l → q' | l → q ∈ Δ, q ≫* q'} is a quasi-deterministic automaton with designated states p_l = q for l → q ∈ Δ, such that A' is compatible with R and accepts the same language as A.

Proof. Verifying that the construction results in a quasi-deterministic automaton that is compatible with R is straightforward. Note that applying Theorem 20 to A' results in some (A'', ≫'') with L(A'') = L(A'), where A'' is A with states restricted to Qd, the right-hand sides of Δ. This restriction preserves the accepted language. Therefore, L(A') = L(A). □

If R contains collapsing rules, quasi-deterministic, compatible automata may be weaker than state-coherent, state-compatible ones, as the following example demonstrates.

Example 22. Let R = {f(x, x) → x}. The automaton A over {f, a} with states 1, 2, both final, and transitions a → 1 and f(1, 1) → 2 accepts L = {f(a, a), a}. Furthermore, (A, ≫) is state-coherent and state-compatible with R if we let 2 ≫ 1.

Now assume that A = ({f, a}, Q, Qf, Δ) is a quasi-deterministic automaton and compatible with R, and that f(a, a) ∈ L(A). We will show that A accepts all terms over {f, a}. Note that since f(a, a) is accepted, a must be a left-hand side of Δ. Let q be the designated state of a. By Lemma 19, we have a run f(a, a) →*_Δd f(q, q) →_Δ q' ∈ Qf. Let q• be the designated state of the left-hand side f(q, q). By quasi-determinism, q• is a final state. Compatibility requires that f(q, q) →_Δd q• ←*_Δ q, i.e., q• = q. So we have a final state q and two transitions a → q, f(q, q) → q, and A accepts all of T({f, a}).

Remark 23. In his thesis [12], Korp generalizes Definition 17 (cf. [12, Definition 3.10]) by incorporating an auxiliary relation that may be viewed as a precursor to our relation $\gg$. The modified definition permits smaller automata, which benefits implementations, but is more complicated than Definition 17. The modification also does not add expressive power. Indeed, if $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ satisfies [12, Definition 3.10] using such an auxiliary relation, then taking $\Delta'$ to be the set of all transitions $l \to q$ with $l \in \mathrm{lhs}(\Delta)$ such that $\phi_{\mathcal{A}}(l)$ is related to $q$ by the auxiliary relation, the automaton $\mathcal{A}' = (\mathcal{F}, Q, Q_f, \Delta')$ satisfies Definition 17, noting that $\phi_{\mathcal{A}}(l)$ is just another notation for the designated state $p_l$ of $l$. Furthermore, $\mathcal{L}(\mathcal{A}') = \mathcal{L}(\mathcal{A})$.

The actual implementation of quasi-deterministic automata in TTT2 and CSI by Korp is based on an extension with $\varepsilon$-transitions, which are transitions from states to states.

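Definition 24. Let $\mathcal{A}_\varepsilon = (\mathcal{F}, Q, Q_f, \Delta, \Delta_\varepsilon)$ be a tree automaton with $\varepsilon$-transitions $\Delta_\varepsilon \subseteq Q \times Q$. Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be the corresponding tree automaton with the $\varepsilon$-transitions removed. Then $\mathcal{A}_\varepsilon$ is extended quasi-deterministic if $\mathcal{A}$ is quasi-deterministic and $l \to q' \in \Delta$ whenever $l \to q \in \Delta$ and $q \to q' \in \Delta_\varepsilon$.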

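We let $\to_{\mathcal{A}_\varepsilon} = \to_{\Delta \cup \Delta_\varepsilon}$. The following lemma is key to justifying the extension.

Lemma 25. Let $\mathcal{A}_\varepsilon = (\mathcal{F}, Q, Q_f, \Delta, \Delta_\varepsilon)$ be an extended quasi-deterministic automaton with $\varepsilon$-transitions. Then for terms $t \in \mathcal{T}(\mathcal{F})$, we have $t \to^*_{\mathcal{A}_\varepsilon} q$ if and only if $t \to^*_{\Delta} q$.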

Proof. By structural induction on t.

As a consequence, the language $\mathcal{L}(\mathcal{A}_\varepsilon) = \{t \in \mathcal{T}(\mathcal{F}) \mid t \to^*_{\mathcal{A}_\varepsilon} q,\ q \in Q_f\}$ accepted by $\mathcal{A}_\varepsilon$ satisfies $\mathcal{L}(\mathcal{A}_\varepsilon) = \mathcal{L}(\mathcal{A})$. Furthermore, Lemmas 20 and 21 and Theorem 24 of [13] remain true for extended quasi-compatible automata, and therefore $\mathcal{L}(\mathcal{A}_\varepsilon)$ is closed under rewriting by $\mathcal{R}$ if $\mathcal{A}_\varepsilon$ is extended quasi-deterministic and compatible with $\mathcal{R}$. In order to obtain a deterministic, state-coherent, state-compatible automaton, we apply Theorem 20, and then extend $\gg$ as necessary to ensure compatibility with collapsing rules from $\mathcal{R}$. This is easy in our implementation, because $\varepsilon$-transitions correspond to state instances of collapsing rules. More precisely, $q' \to q$ is added only if $l \to x \in \mathcal{R}$, $l\sigma \to^* q$ and $\sigma(x) = q'$. A general approach is to observe that $(\mathcal{F}, Q, Q_f, \Delta_d)$ accepts the same language as $\mathcal{A}_\varepsilon$, and then compute $\gg$ by the algorithm from Section 3.4.

4.2. Partial quasi-models

In [4], Endrullis et al. introduce partial quasi-models for establishing local termination, i.e., termination on a restricted set of terms. To this end they consider partial $\mathcal{F}$-algebras, which consist of a carrier $A$ and a partial function $[f] : A^n \to A$ for each $f \in \mathcal{F}$ of arity $n$. Let $[t, \alpha]$ be the (partial) evaluation map $\mathcal{T}(\mathcal{F}, \mathcal{V}) \times A^{\mathcal{V}} \to A$.

Definition 26 ([4, Definition 7.2]). A partial quasi-model of a TRS $\mathcal{R}$ is a partial $\mathcal{F}$-algebra with carrier $A$ equipped with a partial order $\geq$ such that

1. for all $l \to r \in \mathcal{R}$ and $\alpha : \mathcal{V} \to A$, if $[l, \alpha]$ is defined, $[l, \alpha] \geq [r, \alpha]$; and

2. $[f]$ is closed and monotone with respect to $\geq$ for all $f \in \mathcal{F}$.


Partial quasi-models over finite carriers A are almost the same as deterministic, state-compatible, state-coherent automata.1 We briefly explain the connection between the two concepts.

Given a quasi-model for $\mathcal{R}$ with finite carrier $A$, we construct an equivalent state-coherent, state-compatible (with $\mathcal{R}$) automaton as follows. We take $Q = A$, $\gg\ =\ \geq$ and

$\Delta = \{f(q_1, \dots, q_n) \to [f](q_1, \dots, q_n) \mid [f](q_1, \dots, q_n) \text{ is defined}\}$

It is straightforward to prove that for ground terms $t$, $t \to^*_{\Delta} q$ if and only if $[t, \alpha]$ is defined and $[t, \alpha] = q$. Furthermore, the first condition of Definition 26 is equivalent to state-compatibility, while the second condition amounts to state-coherence, disregarding the restrictions on final states. This construction is invertible; given a deterministic automaton $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ and a partial order $\gg$ such that $(\mathcal{A}, \gg)$ is state-coherent and state-compatible with $\mathcal{R}$, we take $A = Q$, $\geq\ =\ \gg$ and let $[f](q_1, \dots, q_n) = q$ whenever $f(q_1, \dots, q_n) \to q \in \Delta$.
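The correspondence between a partial algebra and the transitions of a deterministic automaton can be illustrated by a small Python sketch. The carrier, the signature and the partial interpretation `interp` below are hypothetical and chosen only for illustration; undefinedness is modelled by `None`.

```python
from itertools import product

CARRIER = [0, 1]
ARITY = {"a": 0, "f": 1}


def interp(symbol, args):
    """Hypothetical partial interpretation [f]; None means undefined."""
    if symbol == "a":
        return 0
    if symbol == "f" and args == (0,):
        return 1
    return None


def transitions(carrier, arity, interp):
    """Delta = { f(q1..qn) -> [f](q1..qn) | [f](q1..qn) is defined }."""
    delta = {}
    for f, n in arity.items():
        for args in product(carrier, repeat=n):
            value = interp(f, args)
            if value is not None:
                delta[(f, args)] = value
    return delta


def evaluate(term, interp):
    """Partial evaluation of a ground term ("f", t1, ..., tn) in the algebra."""
    args = tuple(evaluate(t, interp) for t in term[1:])
    return None if None in args else interp(term[0], args)


def run(term, delta):
    """State reached by a ground term in the automaton, or None."""
    args = tuple(run(t, delta) for t in term[1:])
    return None if None in args else delta.get((term[0], args))
```

For every ground term, `evaluate` and `run` agree, which mirrors the statement that $t \to^*_{\Delta} q$ holds exactly when the evaluation of $t$ is defined and equals $q$.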

There are a few differences between deterministic state-coherent, state-compatible automata and quasi-models with finite carriers. First of all, tree automata come with a set of final states that is different from the set of states; this, crucially, allows them to accept languages that are not closed under the subterm relation. (This is not a problem in [4], where the languages of interest are terminating terms. These languages can be closed under subterms without losing the termination property.)

A second difference is that we do not require $\gg$ to be a partial order. However, this difference is not essential: First of all, by Remark 10, we may assume $\gg$ to be reflexive and transitive without loss of generality. Furthermore, if $q \gg q' \gg q$, then the state-coherence conditions force $q$ and $q'$ to be equivalent states. By mapping each state to its equivalence class with respect to $\equiv\ =\ \gg \cap \ll$ we obtain a new state-coherent, state-compatible (with $\mathcal{R}$) automaton equipped with the partial order $\gg/\!\equiv$.

Finally, note that we allow tree automata to be non-deterministic, and non-determinism does not map naturally to partial quasi-models (without effectively performing a powerset construction).

5. Confluence

Tree automata have an obvious application for disproving (local) confluence, as pointed out in [19]. Given some peak $s \mathrel{{}^{*}_{\mathcal{R}}{\leftarrow}} \cdot \to^{*}_{\mathcal{R}} t$ (or $s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \to_{\mathcal{R}} t$ for local confluence), one has to prove that $s$ and $t$ are not joinable. To this end, it suffices to find suitable tree automata over-approximating descendants of $s$ and $t$, respectively.

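Observation 27. Let $\mathcal{A}_s$ and $\mathcal{A}_t$ be tree automata. If $s \in \mathcal{L}(\mathcal{A}_s)$, $t \in \mathcal{L}(\mathcal{A}_t)$, $\mathcal{L}(\mathcal{A}_s) \cap \mathcal{L}(\mathcal{A}_t) = \varnothing$, and both automata are closed under rewriting with $\mathcal{R}$, then $s$ and $t$ are not joinable w.r.t. $\mathcal{R}$.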

Given the peak and both automata for a concrete TRS R, by the decision procedure for closure under rewriting it is easy to check the conditions of Observation 27.

Notice that due to the precision of our criterion, we also can strengthen the power of confluence tools which are based on Observation 27 as demonstrated in the upcoming example.

Example 28. Consider the TRS R consisting of the following rules.

$c \to f(a, b) \qquad f(x, x) \to x \qquad f(a, a) \to f(b, b)$

$c \to f(a, a) \qquad f(x, y) \to f(y, x) \qquad f(b, b) \to f(a, a)$

Disproving local confluence is equivalent to finding a non-joinable critical pair. Note that $\mathcal{R}$ contains only one non-joinable critical pair, arising from the peak $f(a, b) \mathrel{{}_{\mathcal{R}}{\leftarrow}} c \to_{\mathcal{R}} f(a, a)$.

Proving non-joinability of f(a, b) and f(a, a) is possible via Observation 27 and the following two automata: the first automaton has one final state 3, and consists of four transitions, and accepts the language {f(a, b), f(b, a)}.

$a \to 1 \qquad b \to 2 \qquad f(1, 2) \to 3 \qquad f(2, 1) \to 3$

The second automaton has three final states 1, 2, and 3, and also contains four transitions. Its language is {f(a, a), f(b, b), a, b}.

$a \to 1 \qquad b \to 2 \qquad f(1, 1) \to 3 \qquad f(2, 2) \to 3$

By Observation 27, R is not confluent. In the following we will argue that it is impossible to show non-confluence this way when using quasi-deterministic automata and compatibility for closure under rewriting.
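Two of the conditions of Observation 27, membership and disjointness, can be checked for the automata of Example 28 with the following Python sketch. It assumes deterministic automata encoded as dictionaries and decides disjointness by computing the reachable state pairs of the product automaton; closure under rewriting is a separate check that is not performed here. The encoding and the function names are illustrative assumptions.

```python
# Deterministic automata: delta maps (symbol, arg_states) to a state.
A_S = {"delta": {("a", ()): 1, ("b", ()): 2, ("f", (1, 2)): 3, ("f", (2, 1)): 3},
       "final": {3}}
A_T = {"delta": {("a", ()): 1, ("b", ()): 2, ("f", (1, 1)): 3, ("f", (2, 2)): 3},
       "final": {1, 2, 3}}


def run(term, delta):
    """State reached by a ground term ("f", t1, ..., tn), or None."""
    args = tuple(run(t, delta) for t in term[1:])
    if None in args:
        return None
    return delta.get((term[0], args))


def accepts(aut, term):
    return run(term, aut["delta"]) in aut["final"]


def reachable_pairs(d1, d2):
    """Pairs (p, q) such that some ground term reaches p in d1 and q in d2."""
    pairs, changed = set(), True
    while changed:
        changed = False
        for (f, args1), p in d1.items():
            for (g, args2), q in d2.items():
                if f != g or len(args1) != len(args2) or (p, q) in pairs:
                    continue
                if all(pq in pairs for pq in zip(args1, args2)):
                    pairs.add((p, q))
                    changed = True
    return pairs


def disjoint(a1, a2):
    """True if no ground term is accepted by both automata."""
    return not any(p in a1["final"] and q in a2["final"]
                   for p, q in reachable_pairs(a1["delta"], a2["delta"]))
```

For the two automata above, `accepts(A_S, f(a, b))`, `accepts(A_T, f(a, a))` and `disjoint(A_S, A_T)` all hold when the terms are encoded as nested tuples.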

Assume there is some quasi-deterministic automaton $\mathcal{A}$ with transitions $\Delta$ which accepts $f(a, a)$ and is compatible with $\mathcal{R}$. In the same way as in Example 22 one obtains transitions $a \to q$ and $f(q, q) \to q$, where $q$ is a final state and the designated state of both $a$ and $f(q, q)$. Since $\mathcal{L}(\mathcal{A})$ is closed under rewriting it must also contain $f(b, b)$ and thus, by the same

1 This connection was pointed out to the authors by J. Endrullis in personal communication.


reasoning there are transitions $b \to p$ and $f(p, p) \to p$ for some final state $p$ which is also the designated state of $b$ and $f(p, p)$.

Since $\mathcal{A}$ is compatible and $f(a, a) \to^*_{\Delta_d} q$ we deduce $f(b, b) \to^*_{\Delta} q$. By Lemma 19 we further conclude $f(b, b) \to^*_{\Delta_d} f(p, p) \to_{\Delta} q$. Hence, both $p$ and $q$ are contained in $Q(f(p, p))$ where $p$ is the designated state of $f(p, p)$. Thus, by the definition of quasi-deterministic automata, we can exchange $q$ by $p$ in any left-hand side of a transition. So, from $f(q, q) \to q \in \Delta$ we know that also $f(q, p) \to q \in \Delta$. Thus we obtain the derivation $f(a, b) \to^*_{\Delta} f(q, p) \to_{\Delta} q$ which shows $f(a, b) \in \mathcal{L}(\mathcal{A})$. Therefore, no automaton which accepts $f(a, b)$ is disjoint from $\mathcal{A}$.

6. Match-bounds

In this section we show how tree automata can be used to prove termination via match-bounds [9]. To this end, we first recapitulate the basic concepts and important results about match-bounds. Note that match-bounds require special treatment for non-left-linear TRSs (see Example 30). In [13], raise-consistent automata are used to ensure correctness. We present an adaptation of raise-consistency to our setting, leading to the new notion of state-raise-consistency. Finally, we explain how we treat quasi-compatibility, a weaker condition than compatibility that has been introduced specifically for match-bounds.

6.1. A short introduction to match-bounds

Match-bounds is a termination technique which is based on the following idea.

1. One considers an enriched signature where each original symbol of $\mathcal{F}$ is labeled by some natural number to yield a symbol of $\mathcal{F}' = \mathcal{F} \times \mathbb{N}$.

2. The rewrite rules of R over F are enriched by labels in a way that each rewrite step corresponds to an increase of labels.2 The result is an enriched TRS R' over F'. Possible enrichments are match and roof where the details are not relevant for this paper.

3. One tries to show boundedness, i.e., whenever one picks some initial term $t$ in which all labels are 0, then there must be some bound $b$ so that the labels never exceed $b$ when rewriting $t$ with $\mathcal{R}'$.

4. If boundedness is ensured, then termination follows as any infinite derivation would lead to an infinite increase in the labels by 2., which is impossible by 3.

Termination via match-bounds can easily be treated as a tree automata problem: the set of terms where every symbol is labeled by 0, $\mathrm{lift}_0(\mathcal{F})$, is accepted by a tree automaton, and moreover, tree automata completion of $\mathrm{lift}_0(\mathcal{F})$ under $\to_{\mathcal{R}'}$, if successful, yields a suitable bound $b$ for Step 3, namely the largest label in the transitions of the resulting automaton $\mathcal{A}$. In short, match-bounds can be summarized as follows.

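Theorem 29. If $\mathcal{R}'$ is a valid enrichment of $\mathcal{R}$, $\mathcal{R}$ is left-linear, $\mathcal{A}$ is a finite tree automaton, $\mathrm{lift}_0(\mathcal{F}) \subseteq \mathcal{L}(\mathcal{A})$, and $\mathcal{A}$ is closed under $\to_{\mathcal{R}'}$, then $\mathcal{R}$ is terminating.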

It is well known that the restriction to left-linearity is essential as otherwise not every rewrite step of R can be simulated by a corresponding step in R'.

Example 30. For $\mathcal{R} = \{f(x, x) \to f(a, x)\}$ and both possible enrichments we obtain $\mathcal{R}' = \{f_i(x, x) \to f_{i+1}(a_{i+1}, x) \mid i \in \mathbb{N}\}$. Now the derivation

$f(a, a) \to_{\mathcal{R}} f(a, a) \to_{\mathcal{R}} f(a, a) \to_{\mathcal{R}} \cdots$

cannot be simulated in the enriched system since after one step

$f_0(a_0, a_0) \to_{\mathcal{R}'} f_1(a_1, a_0)$

there is a mismatch of the labels which cannot be repaired by $\mathcal{R}'$, and thus, the evaluation gets stuck.

To overcome this problem, we follow [13] and consider a special rewrite relation which allows adjusting labels for matching non-left-linear rules. In the following definition, base is the function which removes all labels of a term, and $s \uparrow t$ takes two terms with $\mathrm{base}(s) = \mathrm{base}(t)$ as input and performs a component-wise maximum on all labels. For example, $f_1(a_1, a_3) \uparrow f_2(a_0, a_3) = f_2(a_1, a_3)$. Moreover, for non-empty sets $S = \{s_1, \dots, s_n\}$ we define $\mathrm{base}(S) = \{\mathrm{base}(s_1), \dots, \mathrm{base}(s_n)\}$ and ${\uparrow}S = s_1 \uparrow \dots \uparrow s_n$.
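The two operations on labeled terms can be sketched in Python as follows. Labeled ground terms are encoded, as an assumption of this sketch, as nested tuples whose first component is a pair of a symbol and its label; the names `base`, `join` and `join_all` are illustrative.

```python
from functools import reduce


def base(term):
    """Remove all labels: ((f, i), t1, ..., tn) becomes (f, base(t1), ...)."""
    (symbol, _label), *args = term
    return (symbol, *(base(t) for t in args))


def join(s, t):
    """Component-wise maximum of labels; requires base(s) == base(t)."""
    if base(s) != base(t):
        raise ValueError("terms differ in more than their labels")
    (symbol, i), *s_args = s
    (_symbol, j), *t_args = t
    return ((symbol, max(i, j)), *(join(a, b) for a, b in zip(s_args, t_args)))


def join_all(terms):
    """Join of a non-empty collection of terms with a common base."""
    return reduce(join, terms)


# f1(a1, a3) and f2(a0, a3) from the example in the text
s = (("f", 1), (("a", 1),), (("a", 3),))
t = (("f", 2), (("a", 0),), (("a", 3),))
```

Here `join(s, t)` yields the encoding of $f_2(a_1, a_3)$, in line with the example given in the text.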

2 There is an exception concerning collapsing rules as these do not increase the labels. This is explained in detail in [9].


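Definition 31. Let $\mathcal{R}'$ be some TRS over an enriched signature $\mathcal{F}'$. We define the relation $\rightharpoonup_{\mathcal{R}'}$ as $s \rightharpoonup_{\mathcal{R}'} t$ if there is some rule $l \to r \in \mathcal{R}'$, $l = D\langle x_1, \dots, x_n \rangle$ with all variables displayed, $s = C[D\langle s_1, \dots, s_n \rangle]$, $S_i = \{s_j \mid 1 \leq j \leq n,\ x_j = x_i\}$ and $|\mathrm{base}(S_i)| = 1$ for each $1 \leq i \leq n$, $\tau = \{x_1/{\uparrow}S_1, \dots, x_n/{\uparrow}S_n\}$, and $t = C[r\tau]$.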

Note that $\tau$ in the previous definition is well-defined since whenever $x_i = x_j$ then $S_i = S_j$. Moreover, if $\mathcal{R}'$ is left-linear, then $\rightharpoonup_{\mathcal{R}'}$ is identical to $\to_{\mathcal{R}'}$. The purpose of $\rightharpoonup_{\mathcal{R}'}$ is to simulate rewriting steps using non-left-linear rules on enriched terms, even when matching fails due to mismatches in the labels as in Example 30.

Example 32. Recall Example 30, where $f_1(a_1, a_0)$ could not be rewritten by the rule $f_1(x, x) \to f_2(a_2, x)$ because $x$ would have to equal both $a_1$ and $a_0$. With Definition 31 we have $f_1(a_1, a_0) \rightharpoonup_{\mathcal{R}'} f_2(a_2, a_1)$ using the rule $f_1(x, x) \to f_2(a_2, x)$ and the substitution given by $\tau(x) = a_1 \uparrow a_0 = a_1$.
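The step of Example 32 can be reproduced with a short Python sketch of Definition 31, restricted to steps at the root position (empty context). Labeled terms are nested tuples headed by a pair of symbol and label, variables are strings, and all names are assumptions made for this illustration.

```python
from functools import reduce


def base(term):
    (symbol, _label), *args = term
    return (symbol, *(base(t) for t in args))


def join(s, t):
    """Component-wise maximum of labels; assumes base(s) == base(t)."""
    (symbol, i), *s_args = s
    (_symbol, j), *t_args = t
    return ((symbol, max(i, j)), *(join(a, b) for a, b in zip(s_args, t_args)))


def collect(pattern, term, found):
    """Match pattern against term, recording every subterm seen per variable."""
    if isinstance(pattern, str):
        found.setdefault(pattern, []).append(term)
        return True
    if pattern[0] != term[0] or len(pattern) != len(term):
        return False
    return all(collect(p, t, found) for p, t in zip(pattern[1:], term[1:]))


def substitute(term, tau):
    if isinstance(term, str):
        return tau[term]
    return (term[0], *(substitute(t, tau) for t in term[1:]))


def raise_step_at_root(rule, term):
    """One step with the given labeled rule at the root, or None."""
    lhs, rhs = rule
    found = {}
    if not collect(lhs, term, found):
        return None
    # all occurrences of a variable must agree up to labels
    if any(len({base(s) for s in subterms}) != 1 for subterms in found.values()):
        return None
    tau = {x: reduce(join, subterms) for x, subterms in found.items()}
    return substitute(rhs, tau)


RULE = ((("f", 1), "x", "x"), (("f", 2), (("a", 2),), "x"))  # f1(x,x) -> f2(a2,x)
TERM = (("f", 1), (("a", 1),), (("a", 0),))                  # f1(a1, a0)
```

Applying `raise_step_at_root(RULE, TERM)` produces the encoding of $f_2(a_2, a_1)$, whereas plain matching of the non-left-linear left-hand side would fail on this term.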

Using this new relation it is possible to generalize Theorem 29 to arbitrary, possibly non-left-linear TRSs.

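Theorem 33. If $\mathcal{R}'$ is some valid enrichment of $\mathcal{R}$, $\mathcal{A}$ is some tree automaton, $\mathrm{lift}_0(\mathcal{F}) \subseteq \mathcal{L}(\mathcal{A})$, and $\mathcal{A}$ is closed under $\rightharpoonup_{\mathcal{R}'}$, then $\mathcal{R}$ is terminating.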

6.2. Adapting raise-consistency

The main difficulty when handling non-left-linear TRSs stems from the changed closure property where the standard rewrite relation $\to_{\mathcal{R}'}$ has been replaced by $\rightharpoonup_{\mathcal{R}'}$.

To handle this problem, the notion of raise-consistency was introduced in [13]. The basic idea is to ensure that whenever an automaton accepts terms $s_1, s_2$ with $\mathrm{base}(s_1) = \mathrm{base}(s_2)$, it also accepts $s_1 \uparrow s_2$ in a related state, cf. Lemma 35, thereby allowing us to perform a step $s \rightharpoonup_{\mathcal{R}'} t$ by first replacing $s$ by another term $s'$ accepted by the automaton (namely $s' = C[l\tau]$ in terms of Definition 31), followed by a plain rewrite step using a rule from $\mathcal{R}'$.

In the following we first adapt the notion of raise-consistency to our setting leading to the notion of state-raise-consistency, and prove that state-raise-consistency in combination with state-compatibility and state-coherence ensures closure under $\rightharpoonup_{\mathcal{R}'}$. Furthermore, we show that raise-consistent quasi-deterministic automata can easily be turned into state-raise-consistent deterministic automata.

Definition 34. $(\mathcal{A}, \gg)$ is state-raise-consistent if $q \gg q'$ for any transitions $f_i(q_1, \dots, q_n) \to q$ and $f_j(q_1, \dots, q_n) \to q'$ of $\mathcal{A}$ with $i < j$.

Until Corollary 37 we assume a fixed deterministic automaton $\mathcal{A}$, a TRS $\mathcal{R}'$, and a relation $\gg$ where $(\mathcal{A}, \gg)$ is state-raise-consistent, state-coherent, and state-compatible w.r.t. $\mathcal{R}'$. Following the basic idea outlined above, we show that $\mathcal{L}(\mathcal{A})$ is closed under rewriting by $\rightharpoonup_{\mathcal{R}'}$. Lemma 35 deals with combining the transitions $s_i \to^*_{\Delta} q_i$ to accept $s$, where $s$ is decomposed into $C[D\langle s_1, \dots, s_n \rangle]$ for a step $s \rightharpoonup_{\mathcal{R}'} t$ according to Definition 31.

Lemma 35. Let $S = \{s_1, \dots, s_n\}$ with $|\mathrm{base}(S)| = 1$. If $s_i \to^*_{\Delta} q_i$ for all $1 \leq i \leq n$ then there is some $q$ such that ${\uparrow}S \to^*_{\Delta} q$ and $q_i \gg^* q$ for all $1 \leq i \leq n$.

Proof. We only consider the case $n = 2$ here, which is then easily generalized to arbitrary $n$. So, let $\mathrm{base}(s_1) = \mathrm{base}(s_2)$, and $s_i \to^*_{\Delta} q_i$ for both $i = 1, 2$. We perform induction on $s_1$, so let $s_1 = f_a(t_1, \dots, t_k) \to^*_{\Delta} f_a(p_1, \dots, p_k) \to_{\Delta} q_1$ and $s_2 = f_b(t'_1, \dots, t'_k) \to^*_{\Delta} f_b(p'_1, \dots, p'_k) \to_{\Delta} q_2$ where $\mathrm{base}(t_i) = \mathrm{base}(t'_i)$ for all $1 \leq i \leq k$. Hence, by the induction hypothesis we obtain $q'_i$ such that $t_i \uparrow t'_i \to^*_{\Delta} q'_i$, $p_i \gg^* q'_i$, and $p'_i \gg^* q'_i$ for each $i$. Thus, $s_1 \uparrow s_2 = f_{\max(a,b)}(t_1 \uparrow t'_1, \dots, t_k \uparrow t'_k) \to^*_{\Delta} f_{\max(a,b)}(q'_1, \dots, q'_k)$. Using $f_a(p_1, \dots, p_k) \to_{\Delta} q_1$, $p_i \gg^* q'_i$ and state-coherence, there is some $q''_1$ such that $f_a(q'_1, \dots, q'_k) \to_{\Delta} q''_1$ and $q_1 \gg^* q''_1$. Similarly, we obtain $q''_2$ such that $f_b(q'_1, \dots, q'_k) \to_{\Delta} q''_2$ and $q_2 \gg^* q''_2$.

It remains to prove existence of some $q'$ such that $f_{\max(a,b)}(q'_1, \dots, q'_k) \to_{\Delta} q'$ and $q''_i \gg^* q'$ for both $i = 1, 2$: then we would be able to derive the desired result $s_1 \uparrow s_2 \to^*_{\Delta} f_{\max(a,b)}(q'_1, \dots, q'_k) \to_{\Delta} q'$ and $q_i \gg^* q''_i \gg^* q'$ for both $i$. To show the existence of $q'$ we distinguish cases depending on how $a$ and $b$ compare.

If $a = b$, then $f_a(q'_1, \dots, q'_k) \to_{\Delta} q''_1$ and $f_b(q'_1, \dots, q'_k) \to_{\Delta} q''_2$ imply $q''_1 = q''_2$ by determinism of $\mathcal{A}$. Hence, we can let $q' := q''_1 = q''_2$ and are done. If $a < b$, then by state-raise-consistency we conclude $q''_1 \gg q''_2$ and choose $q' := q''_2$ to derive the desired result: $f_{\max(a,b)}(q'_1, \dots, q'_k) = f_b(q'_1, \dots, q'_k) \to_{\Delta} q''_2 = q'$ and $q''_i \gg^* q''_2 = q'$ for both $i$. The final case, $a > b$, is symmetric to $a < b$.

We are now ready to deal with a full step $s \rightharpoonup_{\mathcal{R}'} t$.

Lemma 36. If $s \to^*_{\Delta} q$ and $s \rightharpoonup_{\mathcal{R}'} t$, then there exists some $q'$ with $t \to^*_{\Delta} q'$ and $q \gg^* q'$.

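Proof. Since $s \rightharpoonup_{\mathcal{R}'} t$ there are $l \to r \in \mathcal{R}'$ and $C$, $D$, $s_i$, $S_i$, and $\tau$ satisfying the conditions of Definition 31. Hence, we can decompose $s \to^*_{\Delta} q$ into $s = C[D\langle s_1, \dots, s_n \rangle] \to^*_{\Delta} C[D\langle q_1, \dots, q_n \rangle] \to^*_{\Delta} C[p] \to^*_{\Delta} q$, where $s_i \to^*_{\Delta} q_i$ for each $1 \leq i \leq n$ and $D\langle q_1, \dots, q_n \rangle \to^*_{\Delta} p$. Since $|\mathrm{base}(S_i)| = 1$ we use Lemma 35 to obtain for each $i$ some $q'_i$ such that ${\uparrow}S_i \to^*_{\Delta} q'_i$ and $q_i \gg^* q'_i$. Moreover, since $S_i = S_j$ whenever $x_i = x_j$ we can ensure that $q'_i = q'_j$ whenever $x_i = x_j$. This allows us to define $\sigma = \{x_1/q'_1, \dots, x_n/q'_n\}$ and we conclude both $D\langle q'_1, \dots, q'_n \rangle = l\sigma$ and $\tau(x_i) \to^*_{\Delta} \sigma(x_i)$ for all $1 \leq i \leq n$.

From $D\langle q_1, \dots, q_n \rangle \to^*_{\Delta} p$, $q_i \gg^* q'_i$ and state-coherence we obtain a $p_0$ satisfying $l\sigma = D\langle q'_1, \dots, q'_n \rangle \to^*_{\Delta} p_0$ and $p \gg^* p_0$. From $l\sigma \to^*_{\Delta} p_0$ and state-compatibility we conclude that there is some $p'$ such that $r\sigma \to^*_{\Delta} p'$ and $p_0 \gg p'$ and hence $p \gg^* p'$. In the same way, from $C[p] \to^*_{\Delta} q$ and $p \gg^* p'$ we obtain some $q'$ such that $C[p'] \to^*_{\Delta} q'$ and $q \gg^* q'$. This finally yields $t = C[r\tau] \to^*_{\Delta} C[r\sigma] \to^*_{\Delta} C[p'] \to^*_{\Delta} q'$ where $q \gg^* q'$. □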

Corollary 37. If $s \in \mathcal{L}(\mathcal{A})$ and $s \rightharpoonup_{\mathcal{R}'} t$, then $t \in \mathcal{L}(\mathcal{A})$.

Therefore, we have the following criterion for termination of R.

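Corollary 38. Let $\mathcal{R}$ be a TRS and $\mathcal{R}'$ a suitable enrichment for $\mathcal{R}$. If $\mathcal{A}$ is a finite deterministic tree automaton and $(\mathcal{A}, \gg)$ is state-raise-consistent, state-coherent, and state-compatible with $\mathcal{R}'$, and furthermore $\mathrm{lift}_0(\mathcal{F}) \subseteq \mathcal{L}(\mathcal{A})$, then $\mathcal{R}$ is terminating.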

Observe that Corollary 37 shows that state-coherence together with state-compatibility, state-raise-consistency, and determinism is a sufficient criterion for closure under rewriting with $\rightharpoonup_{\mathcal{R}'}$. However, it is not a necessary criterion, so there is no analogue of Theorem 13 where we replace $\to_{\mathcal{R}'}$ by $\rightharpoonup_{\mathcal{R}'}$ and add state-raise-consistency. This is demonstrated in the following example.

Example 39. Let $\mathcal{A}$ be the deterministic and trim automaton with transitions $a_0 \to 0$, $a_1 \to 1$, $f_0(0) \to 2$ and final states 1, 2.

It accepts the language $L = \{a_1, f_0(a_0)\}$. Let $\mathcal{R}' = \varnothing$. Then obviously $\mathcal{L}(\mathcal{A}) = L$ is closed under rewriting w.r.t. $\rightharpoonup_{\mathcal{R}'}$.

Now assume there is some relation $\gg$ such that $(\mathcal{A}, \gg)$ is both state-coherent and state-raise-consistent. From the latter and the transitions $a_0 \to 0$ and $a_1 \to 1$ we conclude $0 \gg 1$. In combination with the transition $f_0(0) \to 2$ and state-coherence there must be some state $q$ satisfying $f_0(1) \to q$ and $2 \gg q$. However, since there is no transition with left-hand side $f_0(1)$ we have derived a contradiction, and thus there is no such relation $\gg$.
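The argument of Example 39 can be replayed with the following Python sketch. It first collects the pairs that Definition 34 forces into the relation and then lists the left-hand sides that state-coherence would demand for these pairs but that are absent from the automaton. Since adding further pairs can only create more such demands, a non-empty result shows that no suitable relation exists. The encoding of labeled symbols as pairs and the function names are assumptions of this sketch.

```python
# Transitions of Example 39: keys are ((symbol, label), arg_states).
DELTA = {(("a", 0), ()): 0, (("a", 1), ()): 1, (("f", 0), (0,)): 2}


def raise_forced_pairs(delta):
    """Pairs (q, q') required by state-raise-consistency (Definition 34)."""
    return {(q, q2)
            for ((f, i), args), q in delta.items()
            for ((g, j), args2), q2 in delta.items()
            if f == g and args == args2 and i < j}


def missing_transitions(delta, gg):
    """Left-hand sides that state-coherence demands but delta lacks."""
    missing = set()
    for (f, args), _q in delta.items():
        for pos, qi in enumerate(args):
            for p, p2 in gg:
                lhs = (f, args[:pos] + (p2,) + args[pos + 1:])
                if p == qi and lhs not in delta:
                    missing.add(lhs)
    return missing
```

On the automaton above, the only forced pair is $0 \gg 1$, and the only missing left-hand side is $f_0(1)$, which is exactly the contradiction derived in the example.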

We present another example that shows that the match-bounds technique potentially suffers from this limitation.

Example 40. Consider the TRS $\mathcal{R} = \{f(g(g(x))) \to f(g(x))\}$, whose enrichment is $\mathcal{R}' = \{f_a(g_b(g_c(x))) \to f_d(g_d(x)) \mid d = \min(a, b, c) + 1\}$. Let the automaton $\mathcal{A}$ have states 1, 2, both final, and the following transitions:

$f_0(1) \to 1 \qquad g_0(1) \to 1 \qquad c_0 \to 1 \qquad f_1(2) \to 1 \qquad g_1(1) \to 2$

With $\gg$ as equality, $\mathcal{A}$ is deterministic, state-coherent, and state-compatible with $\mathcal{R}'$, and therefore $\mathcal{R}$ is match-bounded.

However, there is no $(\mathcal{A}, \gg)$ that is finite, deterministic, state-coherent, state-compatible with $\mathcal{R}'$, and also state-raise-consistent, and accepts $f_0(g_0^k(c_0))$ for all $k > 0$. To see why, let $q_i$ be the state accepting $g_0^i(c_0)$. There is a state $q$ with $f_0(g_0(g_0(q_i))) \to^*_{\Delta} q$, and by state-compatibility with $\mathcal{R}'$, there are states $q'$, $q''$ such that $f_1(g_1(q_i)) \to_{\Delta} f_1(q') \to_{\Delta} q''$, where $q \gg q''$. In particular, there are transitions $g_1(q_i) \to q'$ and $g_0(q_i) \to q_{i+1}$, which implies $q_{i+1} \gg q'$ by state-raise-consistency. By state-coherence this implies that whenever $\mathcal{A}$ accepts $C[g_0(q_i)]$, then it also accepts $C[g_1(q_i)]$. We have $f_1(g_1(g_0^k(c_0))) \in \mathcal{L}(\mathcal{A})$ by closure under rewriting, and by induction on $j$ we can now show that $f_1(g_1^{j+1}(g_0^{k-j}(c_0))) \in \mathcal{L}(\mathcal{A})$. In particular, $f_1(g_1^{k+1}(c_0))$ is accepted by $\mathcal{A}$ as well.

By iterating this construction (increasing all labels of $f$ and $g$ by 1 in each iteration), we can show that $\mathcal{A}$ accepts $f_a(g_a^k(c_0))$ for all $k > 0$ and $a \in \mathbb{N}$, which requires infinitely many transitions (at least one for each of the symbols $f_a$ and $g_a$), contradicting finiteness of $\mathcal{A}$.

Despite incompleteness, state-raise-consistency still subsumes the criterion of raise-consistency for quasi-deterministic automata.


Definition 41 ([13, Definition 27]). Let $\mathcal{A} = (\mathcal{F}', Q, Q_f, \Delta)$ be a tree automaton. We say that $\mathcal{A}$ is raise-consistent if for every pair of transitions $f_i(q_1, \dots, q_n) \to q$ and $f_j(q_1, \dots, q_n) \to q'$ in $\Delta$ with $i < j$, the transition $f_j(q_1, \dots, q_n) \to q$ belongs to $\Delta$.

Theorem 42. Let $\mathcal{A}$, $\mathcal{R}$, $\mathcal{A}'$, and $\gg$ be as in Theorem 20. If $\mathcal{A}$ is raise-consistent then $(\mathcal{A}', \gg)$ is state-raise-consistent.

Proof. Let $f_i(q_1, \dots, q_n) \to q \in \Delta_d$ and $f_j(q_1, \dots, q_n) \to q' \in \Delta_d$ be rules of $\mathcal{A}'$ with $i < j$. Hence, both rules are also present in $\Delta$ and by raise-consistency we conclude that $f_j(q_1, \dots, q_n) \to q \in \Delta$. By definition of $\Delta_d$ we know that $q'$ is the designated state of $f_j(q_1, \dots, q_n)$ and thus, $q \gg q'$ by the definition of $\gg$. □

Similarly to Example 22 we further prove that the inclusion is strict by providing a rewrite system where match-boundedness cannot be proven using quasi-deterministic automata.

Example 43. Let $\mathcal{R} = \{f(x, x) \to x,\ f(f(x, y), z) \to f(a, a)\}$ over signature $\mathcal{F} = \{a, f\}$. Then the enriched system is $\mathcal{R}' = \{f_i(x, x) \to x,\ f_i(f_j(x, y), z) \to f_k(a_k, a_k) \mid i, j, k \in \mathbb{N},\ k = 1 + \min(i, j)\}$. In particular, $\mathcal{R}'$ contains the rules $f_i(x, x) \to x$ and $f_i(f_i(x, y), z) \to f_{i+1}(a_{i+1}, a_{i+1})$ for all $i \in \mathbb{N}$.

The deterministic automaton $\mathcal{A}$ over $\{f, a\}$ with states 1, 2, both final, and transitions

$a_0 \to 2 \qquad f_0(p, q) \to 1 \text{ for all } p, q \in \{1, 2\} \qquad a_1 \to 2 \qquad f_1(2, 2) \to 1$

accepts all terms in $\mathrm{lift}_0(\mathcal{F})$ and is closed under rewriting with $\rightharpoonup_{\mathcal{R}'}$, as $(\mathcal{A}, \gg)$ is state-coherent and state-compatible with $\mathcal{R}'$ and state-raise-consistent if we let $1 \gg 1$ and $1 \gg 2$. Hence, by Theorem 33 and Corollary 37 termination of $\mathcal{R}$ is proved.

We further prove that a similar proof is impossible if we use compatible, quasi-deterministic automata. To this end, assume that $\mathcal{A} = (\mathcal{F} \times \mathbb{N}, Q, Q_f, \Delta)$ is a finite, quasi-deterministic automaton which is compatible3 with $\mathcal{R}'$ and accepts all terms in $\mathrm{lift}_0(\mathcal{F})$. These conditions imply that $\mathcal{L}(\mathcal{A})$ is closed under rewriting with $\mathcal{R}'$. Obviously, $f_0(f_0(a_0, a_0), a_0) \in \mathcal{L}(\mathcal{A})$ and since $f_0(f_0(a_0, a_0), a_0) \to_{\mathcal{R}'} f_1(a_1, a_1)$ and $\mathcal{L}(\mathcal{A})$ is closed under rewriting, we also have $f_1(a_1, a_1) \in \mathcal{L}(\mathcal{A})$.

Since $f_1(x, x) \to x \in \mathcal{R}'$, we can proceed in the same way as in Example 22 to show that $\mathcal{A}$ accepts all terms over $\{f_1, a_1\}$, i.e., $\mathrm{lift}_1(\mathcal{F})$.

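Now, again we have a derivation $f_1(f_1(a_1, a_1), a_1) \to_{\mathcal{R}'} f_2(a_2, a_2)$ and by closure under rewriting we conclude $f_2(a_2, a_2) \in \mathcal{L}(\mathcal{A})$ and afterwards derive $\mathrm{lift}_2(\mathcal{F}) \subseteq \mathcal{L}(\mathcal{A})$ as before. Iterating this reasoning yields $\bigcup_{i \in \mathbb{N}} \mathrm{lift}_i(\mathcal{F}) \subseteq \mathcal{L}(\mathcal{A})$. But this is impossible, since $\mathcal{A}$ is a finite automaton which can only have finitely many symbols in the transitions whereas $\bigcup_{i \in \mathbb{N}} \mathrm{lift}_i(\mathcal{F})$ contains infinitely many symbols.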

6.3. Quasi-compatibility

In [13, Section 5] the improvement of quasi-compatibility is introduced, which relaxes the compatibility criterion and therefore allows reducing the size of the automata. While it is possible to also integrate this refinement into state-compatibility, we omit the details here. The main reason is that every quasi-(state-)compatible automaton can easily be turned into an automaton which is also (state-)compatible by just adding more transitions, cf. the remark between Definitions 32 and 33 in [13]. Thus, in the same way as we transform quasi-deterministic automata into deterministic automata within Section 4.1, we can also always transform quasi-compatible automata into compatible ones without losing power.

7. Formalization

We have formalized all results from Section 3 and significant parts of Sections 5 and 6 within IsaFoR, our Isabelle Formalization of Rewriting, in combination with executable algorithms which check state-compatibility, state-coherence, and state-raise-consistency. These are used in CeTA [18], a certifier for several properties related to term rewriting.

On the analyzer side we have extended the termination tool TTT2 [11] and the confluence tool CSI [19] to produce state-coherent, state-compatible automata. Since both tools use quasi-deterministic automata in their completion process, we apply the construction of Theorem 20 as a post-processing step, resulting in a state-coherent, state-compatible automaton. CeTA can then be used to certify this output. In contrast to the earlier version of CeTA corresponding to [5], CeTA now supports match-bounds for non-left-linear TRSs as well.

All tools and the formalization are available at http://cl-informatik.uibk.ac.at/research/software/ (CeTA + IsaFoR version 2.23, TTT2 version 1.15, CSI version 0.5.1.)

3 We do not even require raise-consistency of A.

Table 1
Definitions and properties which are available in IsaFoR.

Definition or property   Content
-----------------------  ------------------------------------------------------------
Preliminaries            basic notions on term rewriting and tree automata
Algorithms               membership, intersection, emptiness of tree automata
Definition 3             reachable and productive states
Proposition 4            algorithm to compute trim automaton
Definition 8             state-compatibility and state-coherence
Remark 10                state-coherence of reflexive transitive closure
Theorem 11               soundness of state-coherence and state-compatibility
Theorem 13               completeness of state-coherence and state-compatibility
Corollary 14             decidability of closure under rewriting
Section 3.4              decision procedure of closure under rewriting
Observation 27           disproving joinability via tree automata
Theorem 29               soundness of match-bounds, left-linear version
Definition 31            rewrite relation $\rightharpoonup_{\mathcal{R}'}$
Theorem 33               soundness of match-bounds, non-left-linear version
Definition 34            state-raise-consistency
Lemmas 35 and 36         simulating $\rightharpoonup_{\mathcal{R}'}$-steps
Corollaries 37 and 38    sufficient criterion for match-bounds

In the following, we illustrate in more detail which parts of the paper are available in IsaFoR. Furthermore we motivate our design choices in the formalization, and elaborate on problems w.r.t. executability of the algorithms. We also compare our formalization to related work. The majority of the formalization is located in the files Tree_Automata.thy, Tree_Automata_Impl.thy, and Raise_Consistency.thy.

7.1. Coverage

Table 1 lists all the definitions and properties from the paper that have been formalized in IsaFoR.

Note that it contains explicit notions neither for compatibility nor for quasi-deterministic tree automata, and hence the various counterexamples from the paper are not contained in IsaFoR either. Still, compatibility is supported by CeTA as it is simulated by state-compatibility and state-coherence where $\gg$ is chosen as the identity relation. Then state-coherence is vacuously satisfied and state-compatibility coincides with compatibility. In the case of quasi-deterministic automata, the tools are required to apply the conversion (Theorems 20 and 42) into state-compatible and state-coherent deterministic automata in their output.

Concerning the formalization of the decision procedure for closure under rewriting, there is a restriction on the input: CeTA demands that the tree automaton is already deterministic. The major reason is that at the moment we only formalized a non-optimized version of the powerset-construction which turns each automaton into an equivalent deterministic one: it always converts an automaton with $|Q|$ states into an automaton with $2^{|Q|}$ states.

7.2. Design choices and deviations

The formalization of basic definitions on tree automata contains two major differences to this paper: it allows $\varepsilon$-transitions, and the set of reachable states $t \to^*_{\Delta} q$ is formalized directly as a function ta_res mapping terms to sets of states. Using a function instead of a relation has both positive and negative effects. For example, it eases proofs which are naturally performed by induction on terms, since in $f(t_1, \dots, t_n)$ one does not have to reduce all arguments $t_1$ to $t_n$ sequentially in a relation, but this is done in one step in ta_res. On the other hand, one cannot trace derivations $t \to^*_{\Delta} q$ explicitly as there is no notion of derivation. Hence, some obvious results have to be proven explicitly by induction, e.g., that removing transitions results in a smaller accepted language.
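The idea of computing reachable states by a function rather than by a step relation can be sketched in Python as follows. This is only an analogue of the approach described above and not the IsaFoR definition of ta_res; transitions are encoded as triples, $\varepsilon$-transitions as pairs, and the names are assumptions of this sketch.

```python
def eps_closure(states, eps):
    """All states reachable from the given ones via epsilon-transitions."""
    closure, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for p, p2 in eps:
            if p == q and p2 not in closure:
                closure.add(p2)
                todo.append(p2)
    return closure


def reachable_states(term, rules, eps):
    """Set of states of a ground term ("f", t1, ..., tn), one step per symbol."""
    arg_sets = [reachable_states(t, rules, eps) for t in term[1:]]
    direct = {q for (f, args, q) in rules
              if f == term[0] and len(args) == len(arg_sets)
              and all(a in s for a, s in zip(args, arg_sets))}
    return eps_closure(direct, eps)


# A small non-deterministic example with one epsilon-transition 2 -> 1.
RULES = {("a", (), 1), ("f", (1, 1), 2)}
EPS = {(2, 1)}
```

All arguments are processed in a single recursive call, so an induction on the term structure follows the definition directly, but no derivation sequence is ever constructed.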

Note an interesting subtlety in the following definition of the language of a tree automaton, ta_lang, namely the demand of the function adapt_vars. The definition of a language is nearly straight-forward:

ta_lang A = {adapt_vars t | ground t ∧ ta_res A t ∩ ta_final A ≠ ∅}

where the function adapt_vars does nothing except changing the type of variables of ground terms.

The necessity of adapt_vars is motivated as follows. In the formalization we encode mixed expressions like $f(q_1, a, q_2)$ consisting of states $q_1, q_2$ and function symbols $f, a$ simply as terms, where states are just represented as variables. The advantage of this encoding is the reusability of IsaFoR's library on terms. However, with this encoding, Isabelle's type system enforces that the type of variables of t in the definition of ta_lang is exactly the type of states, even if t is a ground term. Hence, if one removed adapt_vars from the definition of ta_lang, the type system of Isabelle would enforce that the type of states in A must correspond to the type of variables in ta_lang A. As a result, soundness theorems for the powerset-construction would not even type-check, since the type of states changes from $\alpha$ ($Q$) to $\alpha$ set ($2^Q$), and thus, the two languages would have terms with variables of these different types, while the comparison requires equal types. Thanks


to adapt_vars, the type of variables in the accepted language may be different from the type of states of the automata, resolving this issue. This comes at a price: one sometimes has to reason within proofs that adapt_vars can be treated like an identity.

Another difference between paper and formalization becomes visible in Corollary 14. The formalization only states that $\mathcal{L}(\mathcal{A})$ is closed under $\mathcal{R}$ if and only if for the determinized and trimmed automaton there exists a suitable relation $\gg$. Instead of formally proving decidability by an algorithm which enumerates all possible relations, we directly formalized the algorithm of Section 3.4 as a function to compute the least such relation.

7.3. Executability

Many of the algorithms have been defined in two steps: first they have been formalized on an abstract level, which eased the corresponding soundness proofs; and later on they have been refined to fully executable ones. In fact, for some algorithms we just relied on the automatic refinement provided in [16] which turns set operations into operations on trees. For other algorithms we performed manual data refinement. Although the latter approach is more tedious, it has the advantage for the user that we additionally integrated detailed error messages which are displayed if the certifier rejects a proof. For example, when using the decision procedure which is mainly implemented via automatic refinement, we only get one bit of information; in contrast, the algorithm for ensuring state-compatibility for a user-provided relation $\gg$ returns a detailed reason in case of a state-compatibility violation.

For some computationally expensive algorithms we performed additional manual refinement steps to increase the efficiency. For example, we group the transitions of an automaton by their root symbols and store these groups in ordered trees using Isabelle's collection framework [15]. Moreover, for each $f(q_1, \dots, q_n) \to q$, we precompute the closure of $q$ under $\varepsilon$-transitions. This speeds up the computation of ta_res while checking state-compatibility. In the end, we provide an executable algorithm which for given $\mathcal{A}$ and $\mathcal{R}$ checks whether $\mathcal{A}$ is closed under $\mathcal{R}$. Isabelle's code generator transforms this algorithm, and several others, into Haskell code, which in combination with a small hand-written file is the source code of our certifier CeTA.

In the following we provide more details on specific algorithms which are essential for this paper.

The decision procedure requires a trim automaton as input. Here, CeTA is able to compute a trim automaton on its own, since Proposition 4 has been proven constructively. To be more precise, it is first shown that restricting an automaton to the set of reachable or productive states does not change the languages,

ta_lang (ta_only_reach A) = ta_lang A

ta_lang (ta_only_prod A) = ta_lang A

where ta_only_prod A and ta_only_reach A are the automata that are obtained from A by only keeping productive and reachable states, respectively. Then one easily defines⁴

trim_ta A = ta_only_prod (ta_only_reach A)

which turns an automaton into an equivalent trim one. Whereas equivalence is trivial with the help of the previous two properties, showing that the result is trim requires a connection between the restriction on reachable and productive states: We formalized that if in A all states are reachable then also in ta_only_prod A all states are reachable.

However, to make trim_ta executable, we actually need executable versions of ta_only_prod and ta_only_reach. For the former, we proved that productivity can be expressed in terms of a reflexive transitive closure which is then executable. For the latter, we implemented a working list algorithm which iteratively collects the reachable states of the automaton. Note that in both cases, we did not define new recursive functions, but just proved equalities which are treated as function definitions by Isabelle's code generator [10].
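The construction of trim_ta can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the Isabelle formalization: the automaton has no ε-transitions, transitions are encoded as triples (f, args, q), and productivity is computed as a backward closure from the final states, which is only adequate here because it is applied after the restriction to reachable states. The helper names reachable_states, productive_states and restrict are hypothetical.

```python
def reachable_states(transitions):
    """Worklist computation of all states q such that some ground term
    evaluates to q. Assumes no epsilon-transitions."""
    reached, worklist, by_arg = set(), [], {}
    for tr in transitions:
        f, args, q = tr
        if not args and q not in reached:  # constants fire immediately
            reached.add(q)
            worklist.append(q)
        for a in set(args):
            by_arg.setdefault(a, []).append(tr)
    while worklist:
        p = worklist.pop()
        for f, args, q in by_arg.get(p, ()):
            if q not in reached and all(a in reached for a in args):
                reached.add(q)
                worklist.append(q)
    return reached


def productive_states(transitions, finals):
    """Backward closure from the final states along the relation
    'p occurs as an argument of a transition with target q'.
    Assumption: all states are reachable (we call this after trimming
    unreachable states), so every sibling argument has a ground witness."""
    preds = {}
    for f, args, q in transitions:
        preds.setdefault(q, set()).update(args)
    productive, stack = set(finals), list(finals)
    while stack:
        q = stack.pop()
        for p in preds.get(q, ()):
            if p not in productive:
                productive.add(p)
                stack.append(p)
    return productive


def restrict(transitions, finals, keep):
    """Keep only transitions and final states that mention states in `keep`."""
    trs = [(f, args, q) for f, args, q in transitions
           if q in keep and all(a in keep for a in args)]
    return trs, set(finals) & keep


def trim_ta(transitions, finals):
    """Mirror of trim_ta A = ta_only_prod (ta_only_reach A)."""
    trs, fin = restrict(transitions, finals, reachable_states(transitions))
    return restrict(trs, fin, productive_states(trs, fin))


# q2 is unreachable and q3 is reachable but not productive.
example = [("a", (), "q0"), ("f", ("q0",), "q1"),
           ("g", ("q2",), "q1"), ("b", (), "q3")]
trimmed, trimmed_finals = trim_ta(example, {"q1"})
```

In the example, only the transitions a → q0 and f(q0) → q1 survive, and q1 remains the single final state.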

The formalization of the algorithm of Section 3.4 (decide_coherent_compatible) is based on an inductive predicate that constitutes an abstract description of the decision procedure. Its soundness and completeness manifest in the following theorem.

ta_det A ⟹ finite (ta_states A) ⟹

decide_coherent_compatible A R ⟷

(∃≫. state_compatible A ≫ R ∧ state_coherent A ≫)

The abstract algorithm is later on refined to a fully executable one via implementing a working list algorithm to create a suitable relation ≫. It starts with all pairs of states that are enforced by state-compatibility and then iteratively adds new pairs of states which are demanded according to state-coherence.
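The propagation step of such a working list algorithm might look as follows in Python. The sketch rests on assumptions that are not restated in this section: the automaton is deterministic without ε-transitions and is given as a dictionary delta from (f, args) to the target state; state-coherence is read as requiring that a final state is only related to final states, and that for every transition f(..., p, ...) → q and pair p ≫ p' there is a transition f(..., p', ...) → q' with q ≫ q'. The initial pairs enforced by state-compatibility are taken as input rather than computed, and the name coherent_closure is hypothetical.

```python
def coherent_closure(delta, finals, initial_pairs):
    """Return the least relation containing `initial_pairs` that satisfies
    the (assumed) state-coherence conditions, or None if no such relation
    exists for this deterministic automaton."""
    rel = set()
    worklist = list(initial_pairs)
    while worklist:
        p, p2 = worklist.pop()
        if (p, p2) in rel:
            continue
        # Condition on final states: p final implies p2 final.
        if p in finals and p2 not in finals:
            return None
        rel.add((p, p2))
        # Condition on transitions: replace one occurrence of p by p2.
        for (f, args), q in delta.items():
            for i, a in enumerate(args):
                if a != p:
                    continue
                args2 = args[:i] + (p2,) + args[i + 1:]
                q2 = delta.get((f, args2))
                if q2 is None:
                    return None  # demanded transition is missing
                worklist.append((q, q2))
    return rel


# Automaton for {f(a), f(b)}; the pair (qa, qb) would stem from a rule a -> b.
delta = {("a", ()): "qa", ("b", ()): "qb",
         ("f", ("qa",)): "qfa", ("f", ("qb",)): "qfb"}
relation = coherent_closure(delta, {"qfa", "qfb"}, {("qa", "qb")})
```

Starting from the single pair (qa, qb), the loop adds (qfa, qfb) and then terminates; if qfb were not final, or if the transition f(qb) → qfb were missing, the function would report failure by returning None.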

⁴ This definition was simplified a lot in comparison to [5] where trim_ta was defined as a recursive algorithm.

Combining trim_ta with decide_coherent_compatible yields the decision procedure closed_under_rewriting, an executable algorithm with the desired soundness property:

ta_det A ⟹ finite (ta_states A) ⟹

closed_under_rewriting A R ⟷ (→_R (ta_lang A) ⊆ ta_lang A)

In addition to the decision procedure, we also provide Theorem 11 to demonstrate closure under rewriting when ≫ is supplied. The advantage of the latter is its improved runtime and its broader applicability: one does not have to iteratively construct the relation, and for left-linear TRSs, also non-deterministic automata with ε-transitions are supported. Here, for checking state-compatibility, we use a tree automaton matching algorithm that restricts the set of state substitutions σ that have to be considered for compatibility w.r.t. Definition 8.
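To illustrate how matching restricts the state substitutions, the following Python sketch enumerates, for a left-hand side l and a state q, only those σ on the variables of l for which lσ evaluates to q, instead of trying every assignment of states to variables. It is a simplified illustration and may differ from the algorithm used in CeTA: ε-transitions are ignored, variables are encoded as strings, function applications as pairs (f, subterms), and the names merge and match are hypothetical.

```python
def merge(s1, s2):
    """Combine two state substitutions; None if they disagree on a variable
    (this is what handles non-left-linear patterns)."""
    for x, q in s2.items():
        if s1.get(x, q) != q:
            return None
    return {**s1, **s2}


def match(term, q, transitions):
    """All substitutions sigma (dicts from variables to states) such that
    term*sigma reaches state q. Variables are strings, applications are
    pairs (f, subterms); transitions are triples (f, args, target)."""
    if isinstance(term, str):
        return [{term: q}]
    f, subterms = term
    results = []
    for g, args, target in transitions:
        if g != f or target != q or len(args) != len(subterms):
            continue
        partial = [{}]
        for t, a in zip(subterms, args):
            partial = [m
                       for s in partial
                       for s2 in match(t, a, transitions)
                       for m in [merge(s, s2)]
                       if m is not None]
        results.extend(partial)
    return results


transitions = [("a", (), "q0"),
               ("f", ("q0", "q0"), "q1"),
               ("f", ("q0", "q1"), "q1")]
nonlinear = match(("f", ("x", "x")), "q1", transitions)
linear = match(("f", ("x", "y")), "q1", transitions)
```

For the non-linear pattern f(x, x) only the substitution x ↦ q0 is produced, since the second transition would require x to be mapped to both q0 and q1; for f(x, y) both transitions contribute one substitution each.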

7.4. Related formalization

Note that there already exists another Isabelle library on tree automata [14,15] by Peter Lammich. We briefly relate the two libraries. Lammich's formalization is more complete in terms of pure tree automata operations: our library only contains those algorithms which have been essential for implementing the certifier, whereas Lammich also formalized algorithms for union and complement and thereby proved that regular tree languages are closed under Boolean operations. However, Lammich's library supports neither ε-transitions nor algorithms which are related to reachability analysis for TRSs.

8. Conclusion

We have introduced the class of deterministic, state-coherent automata that are state-compatible with a TRS R. We have shown that these automata capture precisely those regular tree languages that are closed under rewriting by R, leading to a decision procedure for checking whether a regular language is closed under rewriting. Their simple definition allowed us to formalize most of our results on state-coherent, state-compatible automata within Isabelle/HOL. Also criteria for match-bounds, namely raise-consistency to ensure closure under →_R, could easily be adapted to the corresponding notion state-raise-consistency. We further demonstrated via examples that the gain in precision carries over to the applications: our notions allow more powerful confluence and termination analysis tools. However, it remains as future work to integrate our notions into those search algorithms of the tools which synthesize tree automata.

Further future work consists in the expansion of the formalization w.r.t. match-bounds, e.g., by integrating results on match-bounds for dependency pairs or using match-bounds of right-hand sides of forward closures. Moreover, we plan to generate witnesses of the decision procedure for closure under rewriting in the negative case. Another open question is whether the state-raise-consistency condition can be relaxed to cover more systems.

Acknowledgments

We would like to thank Aart Middeldorp for fruitful discussions on the topic of tree automata and helpful feedback. We are also grateful to the reviewers of the preliminary versions of this paper for their constructive feedback.

References

[1] F. Baader, T. Nipkow, Term Rewriting and All That, Cambridge University Press, 1998.

[2] B. Boyer, T. Genet, T.P. Jensen, Certifying a tree automata completion checker, in: Proc. 4th International Joint Conference on Automated Reasoning, in: Lect. Notes Comput. Sci., vol. 5195, 2008, pp. 523-538.

[3] H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, C. Löding, S. Tison, M. Tommasi, Tree automata techniques and applications, available at http://tata.gforge.inria.fr, 2007.

[4] J. Endrullis, R.C. de Vrijer, J. Waldmann, Local termination, in: Proc. 20th International Conference on Rewriting Techniques and Applications, in: Lect. Notes Comput. Sci., vol. 5595, 2009, pp. 270-284.

[5] B. Felgenhauer, R. Thiemann, Reachability analysis with state-compatible automata, in: Proc. 8th International Conference on Language and Automata Theory and Applications, in: Lect. Notes Comput. Sci., vol. 8370, 2014, pp. 347-359.

[6] G. Feuillade, T. Genet, V.V.T. Tong, Reachability analysis over term rewriting systems, J. Autom. Reason. 33 (2004) 341-383.

[7] T. Genet, Decidable approximations of sets of descendants and sets of normal forms, in: Proc. 9th International Conference on Rewriting Techniques and Applications, in: Lect. Notes Comput. Sci., vol. 1379, 1998, pp. 151-165.

[8] T. Genet, Y.-M. Tang-Talpin, V.V.T. Tong, Verification of copy-protection cryptographic protocol using approximations of term rewriting systems, in: Proc. WITS'03 (Workshop on Issues in the Theory of Security), 2003.

[9] A. Geser, D. Hofbauer, J. Waldmann, H. Zantema, On tree automata that certify termination of left-linear term rewriting systems, Inf. Comput. 205 (4) (2007) 512-534.

[10] F. Haftmann, T. Nipkow, Code generation via higher-order rewrite systems, in: Proc. 10th International Symposium on Functional and Logic Programming, in: Lect. Notes Comput. Sci., vol. 6009, 2010, pp. 103-117.

[11] N. Hirokawa, A. Middeldorp, Tyrolean termination tool, in: Proc. 16th International Conference on Rewriting Techniques and Applications, in: Lect. Notes Comput. Sci., vol. 3467, 2005, pp. 175-184.

[12] M. Korp, Termination analysis by tree automata completion, Ph.D. thesis, University of Innsbruck, 2010.

[13] M. Korp, A. Middeldorp, Match-bounds revisited, Inf. Comput. 207 (11) (2009) 1259-1283.

[14] P. Lammich, Tree automata, Archive of Formal Proofs, http://afp.sf.net/entries/Tree-Automata.shtml, Formal proof development, Nov. 2009.

[15] P. Lammich, A. Lochbihler, The Isabelle collections framework, in: Proc. 1st International Conference on Interactive Theorem Proving, in: Lect. Notes Comput. Sci., vol. 6172, 2010, pp. 339-354.

[16] A. Lochbihler, Light-weight containers for Isabelle: efficient, extensible, nestable, in: Proc. 4th International Conference on Interactive Theorem Proving, in: Lect. Notes Comput. Sci., vol. 7998, 2013, pp. 116-132.

[17] T. Nipkow, L. Paulson, M. Wenzel, Isabelle/HOL - a Proof Assistant for Higher-Order Logic, Lect. Notes Comput. Sci., vol. 2283, Springer, 2002.

[18] R. Thiemann, C. Sternagel, Certification of termination proofs using CeTA, in: Proc. 22nd International Conference on Theorem Proving in Higher Order Logics, in: Lect. Notes Comput. Sci., vol. 5674, 2009, pp. 452-468.

[19] H. Zankl, B. Felgenhauer, A. Middeldorp, CSI - a confluence tool, in: Proc. 23rd International Conference on Automated Deduction, in: Lect. Notes Artif. Intell., vol. 6803, 2011, pp. 499-505.