Electronic Notes in Theoretical Computer Science 57 (2001)

URL: http://www.elsevier.nl/locate/entcs/volume57.html 22 pages

Compact Normalisation Trace via Lazy Rewriting

Quang-Huy Nguyen 1

LORIA & INRIA BP 101, 54-602 Villers-les-Nancy Cedex, France

Abstract

Innermost strategies are usually used in compiling term rewriting systems (TRSs) since they allow one to build result terms efficiently in a bottom-up fashion. However, innermost strategies do not always give the shortest normalising derivation. In many cases, by using an appropriate laziness annotation on the arguments of function symbols, we evaluate lazy arguments only when necessary and hence get a shorter derivation to normal forms while avoiding non-terminating reductions. In this work we provide a transformation of annotated TRSs that allows one to compute normal forms using an innermost strategy and to extract lazy derivations in the original TRS from normalising derivations in the transformed TRS. We apply our result to improve the efficiency of equational reasoning in the Coq proof assistant, using ELAN as an external rewriting engine.

1 Introduction

Proof assistants like PVS [4], KIV [17] or Coq [13] advocate the use of equational reasoning for improving efficiency and reducing user interactions. In Coq, proof objects are stored at each deduction step. The correctness of proofs is justified by type-checking these objects. This mechanism improves reliability and allows one to extract a certified program from the proof of its specification. However, an equality proof requires a lot of user interaction, and the generated proof object is huge since it contains the contexts of the rewrite steps. In [1], we propose an approach to these problems that uses ELAN [19] as a fast oracle: Coq delegates the term normalisation process to ELAN and then replays the normalisation traces provided by ELAN, which are lists of pairs (rule label, position of contracted redex), to obtain normal forms (NFs). Trace replaying consists of syntactic pattern matching between the redex and the left hand side (LHS) of the rule, followed by the replacement of the redex by the instantiated right hand side (RHS). Since the rule and the redex are already given by ELAN, syntactic pattern matching in Coq is, in the worst case, linear in the size of the redex. Meanwhile, the cost of finding a redex depends on the size of the terms, which can be very large. Thus, ELAN performs the proof search and Coq checks the proof afterwards. Coq and ELAN must work on the same canonical (confluent and terminating) TRS. In this context, ELAN should return traces to Coq that are as compact as possible, in order to minimise the time needed for replaying. This time depends not only on the number of rewrite steps but also on the positions of the contracted redices, since contracting inner redices creates bigger proof objects.

1 Email: Quang-Huy.Nguyen@loria.fr

©2001 Published by Elsevier Science B. V.

Fokkink, Kamperman and Walters propose in [8] lazy rewriting with laziness annotations: every argument of a function symbol is annotated lazy or eager. Only the eager arguments are eagerly reduced. A lazy argument is reduced only if this reduction creates new redexes among the active subterms which contain it. We give a formal definition of active subterms in section 3, but one can see them as the subterms which are allowed to be eagerly reduced, the root being always active by default. For short, in the rest of this paper, we call lazy rewriting with laziness annotations simply lazy rewriting. In many cases, lazy rewriting gives a shorter derivation to the NF than innermost rewriting, since lazy arguments are evaluated by need. Furthermore, lazy rewriting makes it possible to deal with infinite structures by avoiding reductions on non-terminating branches. This property is important when working with non-terminating TRSs.

Due to laziness annotations, some subterms of a term are not rewritten during lazy rewriting. These subterms are called lazy. Lazy rewriting normalises a term to its lazy normal forms, in which all active subterms are in head normal form (HNF). Some lazy subterms may be reducible, but their reduction may not be finite if the TRS is not terminating. Otherwise, all lazy subterms can recursively be reduced to HNF, and in this case lazy rewriting provides a means to obtain NFs.

Also in [8], the authors show how to correctly simulate lazy rewriting by innermost rewriting with respect to (w.r.t.) a new TRS obtained by transforming the original TRS. This transformation process is called thunkification. A simulation is correct if it is complete, sound and termination preserving [12] [7]. In other words, correctness guarantees that no information on NFs in the original TRS is lost. In addition, we are strongly concerned with the relation between normalising derivations, so that lazy normalisation traces remain useful for the proof assistant.

In this paper, we show the correspondence between normalisation traces in the original and transformed TRSs and propose a normalisation procedure based on lazy rewriting. This procedure yields a NF of the input term if the TRS is terminating, and hence its unique NF if the TRS is canonical. Moreover, all normalising tasks in this procedure use the leftmost-innermost strategy, which can be performed efficiently in ELAN. Our normalisation procedure is intended to replace leftmost-innermost normalisation in the cases where relevant laziness annotations yield shorter normalising derivations. Moreover, since subterms are sequentially reduced to HNF in a top-down fashion, outer redices are usually contracted first in this procedure.

Thunkification only works with TRSs in which no non-variable term occurs as a lazy argument of a function symbol in the left hand sides of rewrite rules. In [8], the authors deal with this problem by transforming the original TRS into a minimal TRS (i.e. each LHS contains no more than two function symbols) [7]. Hence, this transformation generates a fairly large number of new but simple rules and of new function symbols. The minimal TRS given by their transformation is optimal for the abstract rewriting machine (ARM) [7] but not for ELAN, whose compiler uses an improved version of the many-to-one pattern matching algorithm presented in [10]. Moreover, this transformation flattens the LHSs by introducing new function symbols with new arities. This changes the positions of redices and hence makes the correspondence between normalising derivations more difficult to establish. Therefore, we propose in this paper another transformation (the preliminary transformation) that overcomes this limitation of thunkification for left-linear constructor-based TRSs while keeping a good correspondence between normalising derivations.

Since TRSs are allowed to be overlapping in this work, an order between rewrite rules must be given explicitly. Like most functional languages, ELAN uses textual ordering, and we decided to keep it instead of using specificity ordering as in [8]. On the other hand, we only consider reductions (rewriting, lazy rewriting) on terms without variables (ground terms). Furthermore, all rewrite rules are required to be left-linear, since completeness of thunkification does not hold if the TRS is not left-linear. Some extensions are envisaged, for example checking equality between the original forms (in the original signature) of the terms that instantiate the same variable. However, if the transformations become too complicated, the gain in performance will be less clear.

This paper is organized as follows. In section 2, we briefly review standard definitions of term rewriting. Section 3 gives a rule-based definition of lazy rewriting. Thunkification is described in section 4, where we show the correspondence between normalisation traces. In section 5 we present the normalisation procedure based on lazy rewriting. The preliminary transformation is described in section 6. In section 7, a complete example illustrates the combination of the two transformations. We close the paper by discussing some related work. All omitted proofs can be found in the complete version available at http://www.loria.fr/~nguyenqh/publication.

2 Term Rewriting

We mostly use the notations introduced in [5]. In particular, a signature $\Sigma$ consists of a set $\mathcal{V}$ of variables and a set $\mathcal{F}$ of function symbols. The arity of a function symbol $f$ in $\mathcal{F}$ is denoted by $ar(f)$.

The set of terms over $\Sigma$ is denoted by $\mathcal{T}_\Sigma$, while the set of ground terms over $\Sigma$ is written $\mathcal{G}_\Sigma$. The function symbol heading a term $t$ is denoted by $\mathcal{H}ead(t)$. A term is linear if no variable occurs more than once in it. A position within a term is represented by a sequence of natural numbers describing the path from the root of the term to the head of the subterm at that position. The position of the root of a term is the empty sequence and is denoted by $\epsilon$. The set of non-variable positions in a term $t$ is denoted by $\mathcal{FP}os(t)$. The subterm rooted at position $p$ of a term $t$ is denoted by $t|_p$. By $t[s]_p$ we denote the term $t$ whose subterm at position $p$ is replaced by the term $s$. The subterm $t|_{p_1}$ is a context of the subterm $t|_{p_2}$ if $p_1$ is a prefix of $p_2$.

A substitution is a mapping from the variables of $\mathcal{V}$ to terms. If $\sigma$ is a substitution, then $t\sigma$ denotes the result of applying $\sigma$ to $t$. We write $t\{x \mapsto s\}$ for the term $t$ in which each occurrence of the variable $x$ is replaced by the term $s$. A term $s$ overlaps a term $t$ if there exist a non-variable subterm $t|_p$ and a substitution $\sigma$ such that $s\sigma = t|_p\sigma$. Notice that the variables of $s$ and $t$ are renamed beforehand, if necessary, so that they are disjoint. By this definition, a term $t$ always overlaps itself at the root position; this case is trivial and is not considered as an overlap. Two terms $s$ and $t$ are overlapping if $s$ overlaps $t$ or vice versa.
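As a concrete reading of these definitions, the following Python sketch implements positions, $\mathcal{FP}os(t)$, $t|_p$, $t[s]_p$ and $t\sigma$. The encoding is an assumption made only for illustration: a variable is a Python string, an application $f(t_1, \ldots, t_n)$ is the pair `(f, (t1, ..., tn))`, a constant has an empty argument tuple, and a position is a tuple of 1-based argument indices, with `()` standing for $\epsilon$.

```python
from typing import Dict, List, Tuple, Union

# Assumed encoding: a variable is a str; an application f(t1..tn) is (f, (t1, ..., tn)).
Term = Union[str, Tuple[str, tuple]]
Position = Tuple[int, ...]  # 1-based argument indices; () is the root epsilon


def is_var(t: Term) -> bool:
    return isinstance(t, str)


def positions(t: Term) -> List[Position]:
    """All positions of t: the root first, then each argument from left to right."""
    if is_var(t):
        return [()]
    result: List[Position] = [()]
    for i, arg in enumerate(t[1], start=1):
        result.extend((i,) + q for q in positions(arg))
    return result


def fpos(t: Term) -> List[Position]:
    """FPos(t): the non-variable positions of t."""
    return [p for p in positions(t) if not is_var(subterm(t, p))]


def subterm(t: Term, p: Position) -> Term:
    """t|_p: follow the path p from the root of t."""
    for i in p:
        t = t[1][i - 1]
    return t


def replace(t: Term, p: Position, s: Term) -> Term:
    """t[s]_p: the term t whose subterm at position p is replaced by s."""
    if not p:
        return s
    head, args = t
    i = p[0]
    return (head, args[:i - 1] + (replace(args[i - 1], p[1:], s),) + args[i:])


def apply_subst(t: Term, sigma: Dict[str, Term]) -> Term:
    """t sigma: replace every variable bound by sigma; unbound variables are kept."""
    if is_var(t):
        return sigma.get(t, t)
    head, args = t
    return (head, tuple(apply_subst(a, sigma) for a in args))
```

For instance, for $t = f(x, g(a))$ the positions are $\epsilon$, $1$, $2$ and $2.1$, and $\mathcal{FP}os(t)$ omits the variable position $1$.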

A rewrite rule over $\mathcal{T}_\Sigma$ is an ordered pair $(l, r)$ of terms and is denoted by $l \to r$. We call $l$ and $r$ respectively the left hand side and the right hand side of the rule. Rewrite rules are often restricted by two conditions: the LHS is not a variable, and all variables occurring in the RHS must also occur in the LHS. A rewrite rule is called left-linear (resp. right-linear) if its LHS (resp. RHS) is linear.

A set of rewrite rules $\mathcal{R}$ over $\mathcal{T}_\Sigma$ is called a term rewriting system (TRS). In order to identify rewrite rules in TRSs, in this paper a rewrite rule is often denoted by $[\ell]\ l \to r$, where $\ell$ is the label of the rule. A TRS $\mathcal{R}$ is called left-linear if all its rules are. A TRS is overlapping if the LHSs of two (not necessarily distinct) rules are. A symbol in $\mathcal{F}$ is called a defined symbol of a TRS $\mathcal{R}$ if it is the head symbol of the LHS of some rule in $\mathcal{R}$. Function symbols which are not defined symbols are called constructor symbols of $\mathcal{R}$. $\mathcal{R}$ is called constructor-based if no defined symbol can appear inside a LHS. In constructor-based TRSs, overlapping is only possible at the roots of LHSs. Let $\mathcal{R}$ be a TRS. A term $s$ in $\mathcal{T}_\Sigma$ rewrites to a term $t$ in $\mathcal{T}_\Sigma$ in one rewrite step if there exist a rule $[\ell]\ l \to r$ in $\mathcal{R}$, a position $p$ in $s$ and a substitution $\sigma$ such that $s|_p = l\sigma$ and $t = s[r\sigma]_p$.

We denote this rewrite step by $s \xrightarrow{\ell,p}_{\mathcal{R}} t$ or $s \to_{\mathcal{R}} t$, and the reflexive-transitive closure of the relation $\to_{\mathcal{R}}$ by $\to^*_{\mathcal{R}}$. A derivation in $\mathcal{R}$ is any (finite or infinite) sequence of rewrite steps. From an operational point of view, a rewrite step consists of two phases: the pattern matching between $s|_p$ and $l$, which gives a substitution $\sigma$, and the replacement of the redex $s|_p$ in $s$ by $r\sigma$. Since syntactic pattern matching yields at most one solution, the position $p$ and the label $\ell$ suffice to memorise the rewrite step on a given term $s$. The pair $(\ell, p)$ is called the trace of this rewrite step. The subterm $s|_p$ is also called a redex since it is an instance of the LHS of some rule in $\mathcal{R}$. A term is said to be in normal form w.r.t. $\mathcal{R}$ if it contains no redex. A derivation from a term to one of its NFs is called a normalising derivation of this term.
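The following Python sketch illustrates why the pair $(\ell, p)$ is enough to replay a step, which is essentially what a checker such as Coq has to do with an ELAN trace. It is not the interface of either tool: terms are encoded as nested `(symbol, args)` tuples with variables as plain strings, and rules are stored in a dictionary keyed by their label (both assumptions for illustration). `match` computes the unique matching substitution, if any.

```python
from typing import Dict, Optional, Tuple, Union

Term = Union[str, Tuple[str, tuple]]  # assumed encoding: variable str, or (symbol, args)
Position = Tuple[int, ...]            # 1-based argument indices; () is the root
Rules = Dict[str, Tuple[Term, Term]]  # label -> (lhs, rhs)


def match(pattern: Term, t: Term, sigma: Optional[Dict[str, Term]] = None) -> Optional[Dict[str, Term]]:
    """Syntactic matching: the unique sigma with pattern sigma = t, or None."""
    sigma = {} if sigma is None else sigma
    if isinstance(pattern, str):
        if pattern in sigma and sigma[pattern] != t:  # only possible for non-linear patterns
            return None
        sigma[pattern] = t
        return sigma
    if isinstance(t, str) or pattern[0] != t[0] or len(pattern[1]) != len(t[1]):
        return None
    for p_arg, t_arg in zip(pattern[1], t[1]):
        if match(p_arg, t_arg, sigma) is None:
            return None
    return sigma


def subterm(t: Term, p: Position) -> Term:
    for i in p:
        t = t[1][i - 1]
    return t


def replace(t: Term, p: Position, s: Term) -> Term:
    if not p:
        return s
    head, args = t
    i = p[0]
    return (head, args[:i - 1] + (replace(args[i - 1], p[1:], s),) + args[i:])


def apply_subst(t: Term, sigma: Dict[str, Term]) -> Term:
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0], tuple(apply_subst(a, sigma) for a in t[1]))


def rewrite_step(s: Term, rules: Rules, label: str, p: Position) -> Term:
    """Replay the step with trace (label, p): find sigma with s|_p = l sigma, return s[r sigma]_p."""
    lhs, rhs = rules[label]
    sigma = match(lhs, subterm(s, p))
    if sigma is None:
        raise ValueError(f"rule {label} does not match at position {p}")
    return replace(s, p, apply_subst(rhs, sigma))
```

With the rule $[r_2]\ inf(x) \to cons(x, inf(s(x)))$ of example 3.4, replaying the trace $(r_2, 1)$ on $2nd(inf(0))$ yields $2nd(cons(0, inf(s(0))))$; a pair whose rule does not match at the given position is rejected with a `ValueError`.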

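Definition 2.1 (Normalisation trace) If $t = t_0 \xrightarrow{\ell_1,p_1}_{\mathcal{R}} t_1 \xrightarrow{\ell_2,p_2}_{\mathcal{R}} \cdots \xrightarrow{\ell_n,p_n}_{\mathcal{R}} t_n$ is a normalising derivation of the term $t$ w.r.t. $\mathcal{R}$, then the list

$$T^{\mathcal{R}}_t = [(\ell_1, p_1), \ldots, (\ell_n, p_n)]$$

is the corresponding normalisation trace of $t$.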
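To illustrate Definition 2.1 together with the leftmost-innermost strategy and the textual rule ordering discussed in the introduction, here is a small Python normaliser that records the trace as it goes. It is a sketch under assumptions of our own: terms are nested `(symbol, args)` tuples with variables as strings, the order of the rule list plays the role of textual ordering, and `max_steps` is an arbitrary guard added so that non-terminating TRSs do not loop forever.

```python
from typing import Dict, List, Optional, Tuple, Union

# Assumed encoding: a variable is a str; an application f(t1..tn) is (f, (t1, ..., tn)).
Term = Union[str, Tuple[str, tuple]]
Position = Tuple[int, ...]
Rule = Tuple[str, Term, Term]       # (label, lhs, rhs); list order = textual order
Trace = List[Tuple[str, Position]]  # normalisation trace [(label, position), ...]


def match(pattern: Term, t: Term, sigma: Optional[Dict[str, Term]] = None) -> Optional[Dict[str, Term]]:
    """Syntactic matching: the unique sigma with pattern sigma = t, or None."""
    sigma = {} if sigma is None else sigma
    if isinstance(pattern, str):
        if pattern in sigma and sigma[pattern] != t:
            return None
        sigma[pattern] = t
        return sigma
    if isinstance(t, str) or pattern[0] != t[0] or len(pattern[1]) != len(t[1]):
        return None
    for p_arg, t_arg in zip(pattern[1], t[1]):
        if match(p_arg, t_arg, sigma) is None:
            return None
    return sigma


def subterm(t: Term, p: Position) -> Term:
    for i in p:
        t = t[1][i - 1]
    return t


def replace(t: Term, p: Position, s: Term) -> Term:
    if not p:
        return s
    head, args = t
    i = p[0]
    return (head, args[:i - 1] + (replace(args[i - 1], p[1:], s),) + args[i:])


def apply_subst(t: Term, sigma: Dict[str, Term]) -> Term:
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0], tuple(apply_subst(a, sigma) for a in t[1]))


def find_redex(t: Term, rules: List[Rule]) -> Optional[Tuple[str, Position]]:
    """Leftmost-innermost redex: arguments are searched, left to right, before the root."""
    if isinstance(t, str):
        return None
    for i, arg in enumerate(t[1], start=1):
        found = find_redex(arg, rules)
        if found is not None:
            return found[0], (i,) + found[1]
    for label, lhs, _ in rules:  # textual ordering: the first matching rule wins
        if match(lhs, t) is not None:
            return label, ()
    return None


def normalise(t: Term, rules: List[Rule], max_steps: int = 10_000) -> Tuple[Term, Trace]:
    """Leftmost-innermost normalisation; returns the normal form and its trace."""
    by_label = {label: (lhs, rhs) for label, lhs, rhs in rules}
    trace: Trace = []
    while True:
        found = find_redex(t, rules)
        if found is None:
            return t, trace
        if len(trace) >= max_steps:
            raise RuntimeError("no normal form reached within max_steps")
        label, p = found
        lhs, rhs = by_label[label]
        t = replace(t, p, apply_subst(rhs, match(lhs, subterm(t, p))))
        trace.append((label, p))
```

With the rules $[r_1]\ plus(0, y) \to y$ and $[r_2]\ plus(s(x), y) \to s(plus(x, y))$, normalising $plus(s(0), s(0))$ gives $s(s(0))$ with the trace $[(r_2, \epsilon), (r_1, 1)]$. On $2nd(inf(0))$ with the TRS of example 3.4, the sketch exhausts `max_steps`, because the innermost redex is always the most recently created $inf$ subterm; this is the situation that motivates lazy rewriting.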

A term $t$ is in head normal form (HNF) if there is no redex $s$ such that $t \to^*_{\mathcal{R}} s$. If a term is in HNF, then its head symbol cannot be modified in any derivation issued from it. Hence, if a term $t$ and all its subterms are in HNF, then $t$ is in NF.

In this paper, we use the symbol $\mapsto$ to describe the evaluation rules in the definitions of new operators.

3 Lazy Term Rewriting

The signature is first given a laziness annotation that marks each argument of its function symbols as lazy or eager.

Definition 3.1 (Laziness annotation) Let $\Sigma = (\mathcal{V}, \mathcal{F})$ be a signature. The laziness annotation $\mathcal{L}$ of $\Sigma$ is a mapping from $\mathcal{F}$ to $\{e, l\}^*$ such that:

for all $f \in \mathcal{F}$, $\mathcal{L}(f)$ is an $ar(f)$-tuple $\pi = (x_1, \ldots, x_{ar(f)})$, where $x_i = l$ means that the $i$th argument of $f$ is lazy and $x_i = e$ means that this argument is eager.

By $\pi^f_i$ we denote the $i$th element of $\mathcal{L}(f)$. In the sequel, when speaking about lazy rewriting, a signature always includes its laziness annotation. This laziness annotation divides the set of positions in a term into two subsets, the active positions and the lazy positions, which we now define.

Definition 3.2 (Active and lazy positions) Let $t$ be a term in $\mathcal{G}_\Sigma$. We define:

- the root occurrence $\epsilon$ is always an active position;

- for any position $p$ of $t$ such that $\mathcal{H}ead(t|_p) = f$, and for all $i = 1, \ldots, ar(f)$: $p.i$ is active if and only if $p$ is active and $\pi^f_i = e$; otherwise, $p.i$ is called a lazy position.

The set of active positions in a term $t$ is denoted by $\mathcal{AP}os(t)$. The subterms rooted at active positions are called active; the other subterms of the term are lazy. Thus, a subterm of $t$ is active if and only if the path from its head symbol to the root of $t$ contains no edge that connects a function symbol to one of its lazy arguments.
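The following Python sketch computes $\mathcal{AP}os(t)$ from a laziness annotation, for an undecorated ground term before any activation has taken place. Encoding terms as `(symbol, args)` pairs and $\mathcal{L}$ as a dictionary from symbols to tuples of `'e'`/`'l'` is our own assumption for illustration.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, tuple]                 # ground term: (symbol, args)
Annotation = Dict[str, Tuple[str, ...]]  # L(f): one 'e' or 'l' per argument
Position = Tuple[int, ...]               # 1-based argument indices; () is the root


def active_positions(t: Term, ann: Annotation, p: Position = ()) -> List[Position]:
    """APos(t) following Definition 3.2.

    The root is active; p.i is active iff p is active and the i-th argument of
    Head(t|_p) is eager. Lazy arguments are not explored, because every
    position below a lazy position is lazy as well.
    """
    head, args = t
    result = [p]
    for i, (arg, mode) in enumerate(zip(args, ann[head]), start=1):
        if mode == 'e':
            result.extend(active_positions(arg, ann, p + (i,)))
    return result
```

With $\mathcal{L}(cons) = (e, l)$ as in example 3.4, the active positions of $2nd(cons(0, inf(s(0))))$ are $\epsilon$, $1$ and $1.1$; the whole subterm $inf(s(0))$ is lazy.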

Lazy rewriting is a restricted case of (normal) rewriting. Lazy rewriting only applies to active subterms, and a crucial behaviour of lazy rewriting is that it can change the laziness property of subterms from lazy to active (subterm activation).

In order to apply lazy rewriting to a term $t$, we first decorate it. That is, we annotate every subterm $u$ of $t$ as $u^x_p$, where $p$ is the position of $u$ in $t$, and $x = a$ means that $u$ is active while $x = l$ means that $u$ is lazy. All subterms of a lazy subterm are also lazy. The following operator $\Phi$ decorates a subterm $s$ which is rooted at position $p$ and occurs as an argument of the symbol heading an active subterm of $t$: $\Phi(s, p, e) \mapsto s^a_p$ and $\Phi(s, p, l) \mapsto s^l_p$.

Let $t$ be a term in $\mathcal{G}_\Sigma$. We associate to $t$ the decorated term $\mathcal{DC}(t^a_\epsilon)$, where $\mathcal{DC}$ is defined by the rules in figure 1.

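Symbol: For any $f \in \mathcal{F}$:

$\mathcal{DC}(f^a_p(t_1, \ldots, t_n)) \mapsto f^a_p(\mathcal{DC}(\Phi(t_1, p.1, \pi^f_1)), \ldots, \mathcal{DC}(\Phi(t_n, p.n, \pi^f_n)))$

$\mathcal{DC}(f^l_p(t_1, \ldots, t_n)) \mapsto f^l_p(\mathcal{DC}((t_1)^l_{p.1}), \ldots, \mathcal{DC}((t_n)^l_{p.n}))$

Constant: For any constant $c$:

$\mathcal{DC}(c^a_p) \mapsto c^a_p$ and $\mathcal{DC}(c^l_p) \mapsto c^l_p$

Fig. 1. Evaluation rules for term decoration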

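Let $\mathcal{G}^{init}_\Sigma$ be the set of decorated terms generated by applying $\mathcal{DC}$ to terms in $\mathcal{G}_\Sigma$: $\mathcal{G}^{init}_\Sigma = \{t \mid \exists s \in \mathcal{G}_\Sigma : t = \mathcal{DC}(s^a_\epsilon)\}$. On the other hand, let us denote by $\mathcal{G}^{term}_\Sigma$ the set of all possible decorated terms generated by decorating terms in $\mathcal{G}_\Sigma$ ($\mathcal{G}^{init}_\Sigma \subseteq \mathcal{G}^{term}_\Sigma$). The mapping $\mathcal{UD} : \mathcal{G}^{term}_\Sigma \to \mathcal{G}_\Sigma$ removes all decorations and returns the initial term.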

Lazy rewriting at the root of a decorated term $t$ by a rule $l \to r$ is denoted by $[l \to r](t)$ and is described by the rules in figure 2. These rules transform a 4-tuple: the first component is the term to be reduced; the second component is the set $ES$ of positions of essential subterms, i.e. the lazy subterms of $t$ which correspond to a non-variable subterm of $l$; the third component is of the form $(l_1, \ldots, l_n \to r)$, where $l_1, \ldots, l_n$ are subterms of $l$; the fourth component is a list of decorated terms to be matched with $l_1, \ldots, l_n$ respectively.

These rules model both pattern matching and lazy rewriting in a single process, as is done in [3] for term rewriting. The rule SymbolClash returns the initial term in case of a conflict caused by an active subterm of $t$ during the pattern matching phase. Lazy subterms never cause conflicts. This distinguishes pattern matching in lazy rewriting, which is called pattern matching modulo laziness, from (normal) pattern matching. If a subterm of $t$ is lazy and the corresponding subterm of $l$ is not a variable, then this lazy subterm is called essential, and EssentialSubterm inserts its position into $ES$. Decomposition is applied if a symbol which roots an active subterm of $t$ matches the corresponding symbol in $l$. Instantiation instantiates a variable of the RHS with the corresponding subterm of $t$, stripped of its decoration. Replacement replaces the term by the (decorated) instantiated RHS if $ES$ is empty. In this case, no essential subterm has been revealed and pattern matching modulo laziness coincides with pattern matching; moreover, the accumulated instantiation is the substitution $\sigma$ returned by pattern matching modulo laziness. If $ES$ is not empty, then Activation is applied to activate one essential subterm $s$ of $t$, and hence all active subterms of $s$. One can choose $s$ from $ES$ using different strategies (leftmost, rightmost, ...). However, the results presented in this paper are independent of the strategy used. If Activation or Replacement is applied, then a lazy rewrite step is carried out and $t$ is called a (lazy) redex, since it matches modulo laziness with $l$. Formally, a (decorated) term $t$ matches modulo laziness with a linear pattern $l$ if and only if the symbols which root the active subterms of $t$ (read off its decoration) match the corresponding symbols of $l$:

$$\forall p \in \mathcal{AP}os(t) \cap \mathcal{FP}os(l) : \mathcal{H}ead(\mathcal{UD}(t)|_p) = \mathcal{H}ead(l|_p)$$
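The following Python sketch mirrors the rules of figure 2 up to the point where Replacement or Activation would fire. It is an illustration under assumptions of our own: terms are nested `(symbol, args)` tuples with pattern variables as strings, the decoration of $t$ is passed separately as the set `active` of its active positions (so that activated subterms can be represented), and the LHS is assumed to be linear.

```python
from typing import Dict, List, Set, Tuple, Union

Term = Union[str, Tuple[str, tuple]]  # pattern variables are plain strings
Position = Tuple[int, ...]            # 1-based argument indices; () is the root


def match_modulo_laziness(l: Term, t: Term, active: Set[Position]):
    """Run Decomposition / SymbolClash / EssentialSubterm / Instantiation on (l, t).

    `t` is a ground term whose decoration is given by `active`, the set of its
    active positions. Returns ('clash', None), ('essential', ES) or ('match', sigma).
    """
    sigma: Dict[str, Term] = {}
    essential: List[Position] = []

    def walk(pat: Term, sub: Term, pos: Position) -> bool:
        if isinstance(pat, str):        # Instantiation: bind the variable to UD(sub)
            sigma[pat] = sub
            return True
        if pos not in active:           # EssentialSubterm: lazy subterm under a non-variable
            essential.append(pos)
            return True
        if pat[0] != sub[0]:            # SymbolClash on an active subterm
            return False
        return all(walk(pa, sa, pos + (i,))  # Decomposition
                   for i, (pa, sa) in enumerate(zip(pat[1], sub[1]), start=1))

    if not walk(l, t, ()):
        return 'clash', None            # t is returned unchanged
    if essential:
        return 'essential', essential   # Activation would activate one of these
    return 'match', sigma               # Replacement applies with this sigma
```

On the terms of example 3.4, matching $r_1$ against $2nd(cons(0, inf(s(0))))$ reports the essential position $1.2$. Once the subterm at $1.2$ has been activated and rewritten by $r_2$, the same pattern matches with $y \mapsto s(0)$.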

Figure 3 describes the operator $\mathcal{CR}$ that performs lazy rewriting inside a decorated term $t$: $\mathcal{CR}$ replaces a subterm by the result of applying lazy rewriting to it. Moreover, the decoration of this result has to be adapted to its position in $t$ by the shifting operator $\mathcal{SH} : \mathcal{G}^{term}_\Sigma \times \mathbb{N}^* \to \mathcal{G}^{term}_\Sigma$, where $\mathcal{SH}(s, p)$ adds the prefix $p$ to the positions in the decorations of $s$ and of all its subterms. We denote the lazy rewriting relation w.r.t. $\mathcal{R}$ and its reflexive-transitive closure by $\rightsquigarrow_{\mathcal{R}}$ and $\rightsquigarrow^*_{\mathcal{R}}$ respectively. A lazy rewrite step by a rule labelled $\ell$ at position $p$ of a term is denoted by $\overset{\ell,p}{\rightsquigarrow}_{\mathcal{R}}$.

Definition 3.3 (Lazy normal form) A decorated term $t$ is said to be in lazy normal form (LNF) w.r.t. $\mathcal{R}$ if there exists no decorated term $t'$ such that $t \rightsquigarrow_{\mathcal{R}} t'$.

Example 3.4 ([15]) Consider the following TRS (infinite list):

$[r_1]\ 2nd(cons(x, cons(y, z))) \to y$

$[r_2]\ inf(x) \to cons(x, inf(s(x)))$

where $\mathcal{L}(2nd) = (e)$; $\mathcal{L}(inf) = (e)$; $\mathcal{L}(cons) = (e, l)$.

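The term $t = 2nd^a_\epsilon(inf^a_1(0^a_{1.1}))$ is derived to its LNF as follows:

$$
\begin{aligned}
t &\overset{r_2,\,1}{\rightsquigarrow} 2nd^a_\epsilon(cons^a_1(0^a_{1.1}, inf^l_{1.2}(s^l_{1.2.1}(0^l_{1.2.1.1}))))\\
&\overset{r_1,\,\epsilon}{\rightsquigarrow} 2nd^a_\epsilon(cons^a_1(0^a_{1.1}, inf^a_{1.2}(s^a_{1.2.1}(0^a_{1.2.1.1}))))\\
&\overset{r_2,\,1.2}{\rightsquigarrow} 2nd^a_\epsilon(cons^a_1(0^a_{1.1}, cons^a_{1.2}(s^a_{1.2.1}(0^a_{1.2.1.1}), inf^l_{1.2.2}(s^l_{1.2.2.1}(s^l_{1.2.2.1.1}(0^l_{1.2.2.1.1.1}))))))\\
&\overset{r_1,\,\epsilon}{\rightsquigarrow} s^a_\epsilon(0^a_1)
\end{aligned}
$$

In the second step, the essential subterm $inf^l_{1.2}(s^l_{1.2.1}(0^l_{1.2.1.1}))$ is activated.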

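Initialisation: $[l \to r](t) \mapsto [t][\emptyset](l \to r)(t)$

Decomposition: For any $f \in \mathcal{F}$:
$[t][ES](\ldots, f(t_1, \ldots, t_n), \ldots \to r)(\ldots, f^a_p(s_1, \ldots, s_n), \ldots) \mapsto [t][ES](\ldots, t_1, \ldots, t_n, \ldots \to r)(\ldots, s_1, \ldots, s_n, \ldots)$

SymbolClash: For any $f, g \in \mathcal{F}$ with $f \neq g$:
$[t][ES](\ldots, f(t_1, \ldots, t_n), \ldots \to r)(\ldots, g^a_p(s_1, \ldots, s_m), \ldots) \mapsto t$

EssentialSubterm: For any $f \in \mathcal{F}$ and any subterm $s$ decorated with $l$:
$[t][ES](\ldots, f(t_1, \ldots, t_n), \ldots \to r)(\ldots, s^l_p, \ldots) \mapsto [t][ES \cup \{p\}](\ldots, \ldots \to r)(\ldots, \ldots)$

Instantiation: For any $x \in \mathcal{V}$ and any decorated subterm $s$:
$[t][ES](\ldots, x, \ldots \to r)(\ldots, s, \ldots) \mapsto [t][ES](\ldots, \ldots \to r\{x \mapsto \mathcal{UD}(s)\})(\ldots, \ldots)$

Replacement: $[t][\emptyset](\to r)() \mapsto \mathcal{DC}(r^a_\epsilon)$

Activation: $[t][ES \cup \{p\}](\to r)() \mapsto t[\mathcal{DC}(\mathcal{UD}(t|_p)^a_p)]_p$

Fig. 2. Evaluation rules for lazy rewriting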

Application: For any decorated term $t$, any position $p$ in $\mathcal{UD}(t)$ and any rule $l \to r \in \mathcal{R}$:

$\mathcal{CR}(t, p, l \to r) \mapsto t[\mathcal{SH}([l \to r](t|_p), p)]_p$ if $t|_p$ is decorated with $a$

$\mathcal{CR}(t, p, l \to r) \mapsto t$ if $t|_p$ is decorated with $l$

Fig. 3. Evaluation rules for lazy rewriting inside a term

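Remark 3.5 Let $t$ and $t'$ be two decorated terms. If $t \rightsquigarrow_{\mathcal{R}} t'$ by applying the Replacement rule, then $\mathcal{UD}(t) \to_{\mathcal{R}} \mathcal{UD}(t')$. Otherwise, if $t \rightsquigarrow_{\mathcal{R}} t'$ by applying the Activation rule, then $\mathcal{UD}(t) = \mathcal{UD}(t')$.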

The next propositions show the relation between lazy rewriting and rewriting in the same TRS.

Proposition 3.6 If $t$ is in LNF w.r.t. $\mathcal{R}$, then $\mathcal{UD}(t)$ is in HNF w.r.t. $\mathcal{R}$.

Proof. By induction on the size of t.

If the size of $t$ is 1, then $t$ is a constant or a variable: $t$ is active and has no lazy subterm. By the definition of LNF, $\mathcal{UD}(t)$ is in HNF. Suppose now that the proposition holds for all terms of size strictly smaller than $n$, and let $t$ be of size $n$. The subterms of $t$ have size at most $n - 1$. Suppose that $\mathcal{UD}(t)$ is not in HNF, that is, there exist a term $s \in \mathcal{G}_\Sigma$ and a rule $l \to r \in \mathcal{R}$ such that $\mathcal{UD}(t) \to^*_{\mathcal{R}} s$ and $s$ matches with $l$ (*). Taking the first such $s$ along the derivation, the derivation from $\mathcal{UD}(t)$ to $s$ only contracts redices below the root. Since $t$ is in LNF, all its active subterms are also in LNF. By the induction hypothesis, these subterms, once their decorations are removed, are in HNF, so their head symbols cannot be changed by any derivation issued from $\mathcal{UD}(t)$ (**).

(*) and (**) imply that the symbols which root the active subterms of $t$ match the corresponding symbols of $l$. In other words, $t$ matches modulo laziness with $l$, so $t$ is not in LNF, which contradicts the hypothesis and completes the proof. □

Since the active subterms of a LNF are also in LNF, all active subterms (without decoration) of a LNF are in HNF.

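Proposition 3.7 If there exists an infinite derivation $t_0 \rightsquigarrow_{\mathcal{R}} t_1 \rightsquigarrow_{\mathcal{R}} \cdots$, then there exists $k \in \mathbb{N}$ such that $\mathcal{UD}(t_0) \to_{\mathcal{R}} \mathcal{UD}(t_k)$.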

Proof. Lazy rewrite steps that end by applying Activation strictly decrease the number of lazy subterms. Hence, a derivation contains no infinite sequence of such steps. That is, there exists a smallest $k \geq 1$ such that $t_{k-1} \rightsquigarrow_{\mathcal{R}} t_k$ by applying Replacement. Due to remark 3.5, we have $\mathcal{UD}(t_0) = \cdots = \mathcal{UD}(t_{k-1}) \to_{\mathcal{R}} \mathcal{UD}(t_k)$. □

A direct corollary of this proposition is that if rewriting w.r.t. $\mathcal{R}$ is terminating, then so is lazy rewriting w.r.t. $\mathcal{R}$, regardless of the laziness annotation.

4 Thunkification

Thunkification has been described in [8] for lazy graph rewriting. We consider lazy term rewriting and do not require the LHSs of the original TRS to be minimal [7]. This requires a small generalisation of the proofs. Our thunkification works on left-linear but possibly overlapping TRSs in which all lazy subterms of the LHSs must be variables. In this case, no subterm activation is possible in a lazy rewrite step, since lazy subterms always correspond to variables of the pattern. In other words, lazy rewrite steps always end by applying the Replacement rule, and hence lazy rewriting derivations only include terms in $\mathcal{G}^{init}_\Sigma$.

4.1 Thunkification Description

Thunkification extends the signature and generates a new TRS with which innermost rewriting simulates lazy rewriting in the original TRS.

The new signature is built from the original signature $\Sigma = (\mathcal{V}, \mathcal{F})$ by adding the new function symbols introduced during thunkification: $\Theta$, $\tau_f$, $vec_f$, $vec_t$, $\lambda_t$ and $inst$, for every $f \in \mathcal{F}$ and for some subterms $t$ of the RHSs of rewrite rules in the original TRS. The introduction of new function symbols allows one to mask lazy subterms. A lazy $f$-rooted subterm $s$ is masked (or thunked) by a subterm of the form $\Theta(\tau_f, vec_f(\ldots))$ and hence cannot be eagerly rewritten. The structure of $s$ is stored in this $\Theta$-rooted subterm so that it can be recovered later.

The thunkification of terms is a mapping $\varphi : \mathcal{G}^{init}_\Sigma \to \mathcal{G}_{\Sigma'}$, where $\Sigma'$ is the extended signature, defined by the rules in figure 4. We now describe the new TRS generated by thunkification.

Definition 4.1 (Lazy argument position and subterm) Let $t$ be a term in $\mathcal{T}_\Sigma$. If there exist $p \in \mathcal{FP}os(t)$ and $i \in \mathbb{N}$ such that $\mathcal{H}ead(t|_p) = f$ and $\pi^f_i = l$, then $p.i$ is called a lazy argument position in $t$, while $t|_{p.i}$ is called a lazy argument subterm of $t$.

Definition 4.2 (Migrant variable [8]) A variable that appears at a lazy argument position in the LHS of a rewrite rule and at an active position in a subterm $t$ of the RHS is called migrant in $t$.

The laziness property of the subterms which instantiate migrant variables changes from lazy to active after the lazy rewrite step. Thus, we need to activate lazy rewriting on these subterms later.

Definition 4.3 (Set of rules) Let $\mathcal{R}$ be a TRS. The set of rewrite rules $\mathcal{S}$ generated by applying thunkification to $\mathcal{R}$ is the union of four subsets $\mathcal{S}_0$, $\mathcal{S}_1$, $\mathcal{S}_2$ and $\mathcal{S}_3$, which are defined as follows:

(i) $\mathcal{S}_0$ contains the rule $l \to r'$ if and only if $l \to r \in \mathcal{R}$ and $r'$ is built from $r$ as follows:

• In a bottom-up fashion, replace any lazy argument subterm $t$ of the RHS $r$ by $\Theta(\lambda_t, vec_t(x_1, \ldots, x_{n_t}))$, where $x_1, \ldots, x_{n_t}$ are all the variables of $t$.

• Replace any migrant variable $x$ of the RHS $r$ by $inst(x)$.

(ii) $\mathcal{S}_1 = \{inst(\Theta(\tau_f, vec_f(x_1, \ldots, x_{ar(f)}))) \to f(t_1, \ldots, t_{ar(f)}) \mid f \in \mathcal{F}\}$, where $t_i = inst(x_i)$ if $\pi^f_i = e$, and $t_i = x_i$ otherwise.

(iii) $\mathcal{S}_2 = \{inst(\Theta(\lambda_t, vec_t(x_1, \ldots, x_{n_t}))) \to t' \mid t \text{ has been replaced in (i) and } t' = t\{x_i \mapsto inst(x_i)\} \text{ for every migrant variable } x_i \text{ of } t\}$.

(iv) $\mathcal{S}_3 = \{inst(x) \to x\}$.

In fact, $\mathcal{S}_0$ contains all rewrite rules of $\mathcal{R}$ with their RHSs changed (or thunked): every lazy argument subterm $t$ is thunked by a subterm of the form $\Theta(\lambda_t, vec_t(\ldots))$ and hence cannot be eagerly rewritten. A corresponding rule is then inserted into $\mathcal{S}_2$ in order to recover $t$ later. The insertion of the symbol $inst$ allows later rewriting of the subterms which have instantiated migrant variables. The unique rule of $\mathcal{S}_3$ deals with the direct arguments of the symbol $inst$ that are not thunked. This rule has the lowest priority and hence is the last rule of $\mathcal{S}$ to be tried, since we use textual ordering.
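As an illustration of items (ii) and (iv) of Definition 4.3, the Python sketch below generates $\mathcal{S}_1$ and the rule of $\mathcal{S}_3$ from a laziness annotation. The strings `theta`, `tau_f`, `vec_f` and `inst` and the variable names `x1`, `x2`, ... are a hypothetical encoding of the new symbols, with terms as nested `(symbol, args)` tuples and variables as strings; $\mathcal{S}_0$ and $\mathcal{S}_2$ are omitted because they require analysing the RHSs.

```python
from typing import Dict, List, Tuple, Union

Term = Union[str, Tuple[str, tuple]]     # variables are plain strings
Annotation = Dict[str, Tuple[str, ...]]  # L(f): 'e' or 'l' for each argument


def s1_rules(ann: Annotation) -> List[Tuple[Term, Term]]:
    """S1: inst(theta(tau_f, vec_f(x1..xn))) -> f(t1..tn), t_i = inst(x_i) iff pi^f_i = e."""
    rules = []
    for f, pi in ann.items():
        xs = tuple(f"x{i}" for i in range(1, len(pi) + 1))
        lhs = ("inst", (("theta", ((f"tau_{f}", ()), (f"vec_{f}", xs))),))
        rhs = (f, tuple(("inst", (x,)) if mode == "e" else x for x, mode in zip(xs, pi)))
        rules.append((lhs, rhs))
    return rules


# S3: the catch-all rule inst(x) -> x, to be placed last so that it has the lowest priority.
S3_RULE: Tuple[Term, Term] = (("inst", ("x",)), "x")
```

For $\mathcal{L}(cons) = (e, l)$ the sketch produces $inst(\Theta(\tau_{cons}, vec_{cons}(x_1, x_2))) \to cons(inst(x_1), x_2)$: only the eager argument is wrapped in $inst$, so only that argument is unthunked when the thunk is opened.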

In [8], only non-variable lazy argument subterms of RHSs are thunked. Since an innermost strategy is used for rewriting with $\mathcal{S}$, the subterms which instantiate variables of RHSs are always in NF before any rule is applied. In other words, thunking lazy argument subterms which are variables is unnecessary. However, in this work we also thunk these subterms in order to ensure the correctness of lemma 5.1 in section 5.

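(i) $\varphi(f^a_p(t_1, \ldots, t_n)) \mapsto f(\varphi(\Phi(t_1, p.1, \pi^f_1)), \ldots, \varphi(\Phi(t_n, p.n, \pi^f_n)))$

(ii) $\varphi(f^l_p(t_1, \ldots, t_n)) \mapsto \Theta(\tau_f, vec_f(\varphi((t_1)^l_{p.1}), \ldots, \varphi((t_n)^l_{p.n})))$

(iii) $\varphi(c^a_p) \mapsto c$ if $c$ is a constant.

(iv) $\varphi(c^l_p) \mapsto \Theta(\tau_c, vec_c)$ if $c$ is a constant.

Fig. 4. Evaluation rules for $\varphi$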
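A possible Python rendering of figure 4 computes $\varphi(\mathcal{DC}(t^a_\epsilon))$ directly from an undecorated ground term and its laziness annotation; the decoration is implicit in the `active` flag. The encoding of $\Theta$, $\tau_f$ and $vec_f$ as the strings `theta`, `tau_f` and `vec_f`, and of terms as `(symbol, args)` tuples, is a hypothetical choice for illustration.

```python
from typing import Dict, Tuple

Term = Tuple[str, tuple]                 # ground term: (symbol, args)
Annotation = Dict[str, Tuple[str, ...]]  # L(f): 'e' or 'l' for each argument


def thunk(t: Term, ann: Annotation, active: bool = True) -> Term:
    """phi(DC(t^a_eps)): every lazy subterm f(...) becomes theta(tau_f, vec_f(...))."""
    head, args = t
    if active:
        # rules (i)/(iii): keep the head; argument i is active iff pi^f_i = 'e'
        return (head, tuple(thunk(a, ann, mode == "e") for a, mode in zip(args, ann[head])))
    # rules (ii)/(iv): mask the lazy subterm; all of its arguments are lazy as well
    masked_args = tuple(thunk(a, ann, False) for a in args)
    return ("theta", ((f"tau_{head}", ()), (f"vec_{head}", masked_args)))
```

For $cons(0, inf(s(0)))$ with $\mathcal{L}(cons) = (e, l)$, the eager argument $0$ is kept, while the lazy argument becomes $\Theta(\tau_{inf}, vec_{inf}(\Theta(\tau_s, vec_s(\Theta(\tau_0, vec_0)))))$.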

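The set of terms $\mathcal{B}$ is defined as follows:

$$\mathcal{B} = \{g \in \mathcal{G}_{\Sigma'} \mid \exists g_0 \in \mathcal{G}^{init}_\Sigma : \varphi(g_0) \to^*_{\mathcal{S}} g\}$$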

This definition of $\mathcal{B}$ is slightly different from [8], where $g_0$ is not thunked (by $\varphi$). Thunking $g_0$ helps to obtain NFs w.r.t. $\mathcal{S}$ more quickly. This fact is used in our normalisation procedure in section 5.

The mapping $\phi : \mathcal{B} \to \mathcal{G}^{init}_\Sigma$ relates terms in $\mathcal{B}$ and terms in $\mathcal{G}^{init}_\Sigma$ and is defined by the rules in figure 5. Actually, $\phi$ recovers lazy subterms using the information stored in their corresponding $\Theta$-rooted subterms.

4.2 Correctness of Thunkification

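Lazy rewriting on terms in $\mathcal{G}^{init}_\Sigma$ w.r.t. $\mathcal{R}$ can be correctly simulated by innermost rewriting on terms in the subset $\mathcal{B}$ of $\mathcal{G}_{\Sigma'}$ w.r.t. $\mathcal{S}$ via $\phi$, according to the criteria given in [12]. That is, $\phi$ is surjective, sound, complete and termination preserving. The mapping $\phi$ is surjective since for every term $g$ in $\mathcal{G}^{init}_\Sigma$: $\phi(\varphi(g)) = g$. In the following, $\to_{\mathcal{S}}$ denotes the innermost rewriting relation w.r.t. $\mathcal{S}$.

(i) $\phi(g) \mapsto T(g, \epsilon, e)$

(ii) $T(inst(t), p, e) \mapsto T(t, p, e)$

(iii) $T(inst(t), p, l) \mapsto T(t, p, l)$

(iv) $T(\Theta(\tau_f, vec_f(t_1, \ldots, t_n)), p, e) \mapsto f^a_p(T(t_1, p.1, \pi^f_1), \ldots, T(t_n, p.n, \pi^f_n))$

(v) $T(\Theta(\tau_f, vec_f(t_1, \ldots, t_n)), p, l) \mapsto f^l_p(T(t_1, p.1, l), \ldots, T(t_n, p.n, l))$

(vi) $T(\Theta(\tau_c, vec_c), p, e) \mapsto c^a_p$ if $c$ is a constant.

(vii) $T(\Theta(\tau_c, vec_c), p, l) \mapsto c^l_p$ if $c$ is a constant.

(viii) $T(\Theta(\lambda_t, vec_t(t_1, \ldots, t_{n_t})), p, e) \mapsto T(t\{x_1 \mapsto t_1\} \ldots \{x_{n_t} \mapsto t_{n_t}\}, p, e)$

(ix) $T(\Theta(\lambda_t, vec_t(t_1, \ldots, t_{n_t})), p, l) \mapsto T(t\{x_1 \mapsto t_1\} \ldots \{x_{n_t} \mapsto t_{n_t}\}, p, l)$

(x) $T(f(t_1, \ldots, t_n), p, e) \mapsto f^a_p(T(t_1, p.1, \pi^f_1), \ldots, T(t_n, p.n, \pi^f_n))$

(xi) $T(f(t_1, \ldots, t_n), p, l) \mapsto f^l_p(T(t_1, p.1, l), \ldots, T(t_n, p.n, l))$

(xii) $T(c, p, e) \mapsto c^a_p$ if $c$ is a constant.

(xiii) $T(c, p, l) \mapsto c^l_p$ if $c$ is a constant.

Fig. 5. Evaluation rules for $\phi$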

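Theorem 4.4 (Soundness [8]) Let $g$ be a term in $\mathcal{B}$. If $g \to_{\mathcal{S}} g'$, then $\phi(g) \rightsquigarrow^*_{\mathcal{R}} \phi(g')$. More precisely: if $g \to_{\mathcal{S}_0} g'$, then $\phi(g) \rightsquigarrow_{\mathcal{R}} \phi(g')$, and if $g \to_{\mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3} g'$, then $\phi(g) = \phi(g')$.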

Lemma 4.5 ([8]) If $g \in \mathcal{B}$ contains no symbol $inst$, then every active subterm of $\phi(g)$ inherits the head symbol of its corresponding subterm of $g$.

Theorem 4.6 (Completeness [8]) If $g \in \mathcal{B}$ is in NF w.r.t. $\mathcal{S}$, then $\phi(g)$ is in LNF w.r.t. $\mathcal{R}$.

Theorem 4.7 (Termination preservation [8]) If there exists an infinite derivation $g_0 \to_{\mathcal{S}} g_1 \to_{\mathcal{S}} \cdots$, then there exists $k \in \mathbb{N}$ such that $\phi(g_0) \rightsquigarrow_{\mathcal{R}} \phi(g_k)$.

Corollary 4.8 If lazy rewriting w.r.t. $\mathcal{R}$ is terminating, then so is innermost rewriting w.r.t. $\mathcal{S}$.

4.3 Correspondence of Trace

We show in this section that (lazy) normalisation traces of $\phi(g)$ w.r.t. $\mathcal{R}$ can be extracted from normalisation traces of $g$ w.r.t. $\mathcal{S}$.

Assuming that each rule in $\mathcal{S}_0$ inherits the label of its corresponding rule in $\mathcal{R}$, we have:

Theorem 4.9 (Correspondence of trace) Assume that $T^{\mathcal{S}}_g$ is a normalisation trace of a term $g \in \mathcal{B}$ w.r.t. $\mathcal{S}$ under an innermost reduction strategy.

Extracting from $T^{\mathcal{S}}_g$ the traces of the rewrite steps performed by rewrite rules in $\mathcal{S}_0$ yields a (lazy) normalisation trace $T^{\mathcal{R}}_{\phi(g)}$ of $\phi(g)$ w.r.t. $\mathcal{R}$.
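Under this theorem, trace extraction is a simple order-preserving filter. The Python sketch below assumes traces are lists of `(label, position)` pairs and that the labels of the rules of $\mathcal{S}_0$ are known; following remark 5.3, the pairs are kept as they are.

```python
from typing import Iterable, List, Set, Tuple

Step = Tuple[str, Tuple[int, ...]]  # (rule label, position)


def extract_lazy_trace(trace_s: Iterable[Step], s0_labels: Set[str]) -> List[Step]:
    """Keep, in order, the steps of a trace w.r.t. S that were performed by rules of S0.

    Rules of S0 carry the labels of their source rules in R, so the kept pairs
    are read as a lazy normalisation trace w.r.t. R (theorem 4.9).
    """
    return [(label, pos) for label, pos in trace_s if label in s0_labels]
```

For example, from the $\mathcal{S}$-trace `[('s3', (1, 2)), ('r2', (1,)), ('s1_cons', (1,)), ('r1', ())]`, where `s3` and `s1_cons` are hypothetical labels of auxiliary rules, only `('r2', (1,))` and `('r1', ())` are kept.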

5 Normalisation Procedure

A term can be normalised by sequentially reducing all its subterms to HNF. Suppose that we need to normalise a term $t$ with a left-linear and terminating TRS $\mathcal{R}$. The thunkification process is first applied to $\mathcal{R}$ to obtain the TRS $\mathcal{S}$. Next, $t$ is thunked and normalised w.r.t. $\mathcal{S}$, yielding a NF $g$. Due to the rule in $\mathcal{S}_3$, $g$ contains no symbol $inst$. Completeness implies that $\phi(g)$ is in LNF w.r.t. $\mathcal{R}$. In other words, all active subterms of $\phi(g)$ are in HNF w.r.t. $\mathcal{R}$ and inherit their head symbols from the corresponding subterms of $g$ (lemma 4.5). Furthermore, in $\phi(g)$, active subterms are never subterms of lazy subterms. Hence $\phi(g)$ can be divided into two parts: the upper part contains the active subterms, while the lower part contains the lazy subterms. Accordingly, the upper part of $g$ contains the subterms which correspond to active subterms of $\phi(g)$ and which are in HNF w.r.t. $\mathcal{R}$, and the lower part of $g$ corresponds to the lazy subterms of $\phi(g)$. The frontier between these two parts is composed of $\Theta$ symbols (lemma 5.1).

Thus, we can unthunk (activate) the $\Theta$-rooted subterms and reduce them to NF w.r.t. $\mathcal{S}$. Through this reduction, more subterms of $g$ come into HNF w.r.t. $\mathcal{R}$. Notice that if a $\Theta$-rooted subterm is activated, then its "active" subterms are also unthunked. The activation of $\Theta$-rooted subterms is performed by the operator $\phi^*$, described below. The process is applied recursively until all subterms of $g$ are in HNF w.r.t. $\mathcal{R}$, at which point $g$ is a NF of $t$.

Lemma 5.1 Let $g$ be a term in $\mathcal{B}$ that contains no symbol $inst$. Then $g$ is divided into two parts. The upper part contains the subterms which correspond to active subterms of $\phi(g)$, while the lower part contains the subterms which correspond to lazy subterms of $\phi(g)$. The frontier between these two parts is composed of $\Theta$ symbols.

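Let $g$ be a term in $\mathcal{G}_{\Sigma'}$. We define the set of disjoint $\Theta$-ancestor positions of $g$ as follows:

$$\mathcal{P}os_{\Theta a}(g) = \{p \mid p \in \mathcal{FP}os(g),\ \mathcal{H}ead(g|_p) = \Theta \text{ and } \mathcal{H}ead(g|_{p_1}) \neq \Theta \text{ for any proper prefix } p_1 \text{ of } p\}$$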

$\mathcal{P}os_{\Theta a}(g)$ can be computed by the rules in figure 6. Intuitively, $\mathcal{P}os_{\Theta a}(g)$ contains the frontier between the two parts of $g$. The activating operator $\phi^*$ is a mapping from $\mathcal{G}_{\Sigma'}$ to $\mathcal{G}_{\Sigma'}$ and is defined by the rules in figure 7: $\phi^*$ activates (unthunks) a $\Theta$-rooted term $g$ and every $\Theta$-rooted subterm $s$ of $g$ such that $\phi(s)$ is an active subterm of $\phi(g)$. Figure 8 describes the normalisation procedure based on lazy rewriting ($lazynorm(t, \mathcal{R})$).

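Initialisation: $\mathcal{P}os_{\Theta a}(g) \mapsto \mathcal{L}_a(g, \epsilon)$

Symbol: $\mathcal{L}_a(f(t_1, \ldots, t_n), p) \mapsto \mathcal{L}_a(t_1, p.1) \cup \ldots \cup \mathcal{L}_a(t_n, p.n)$ if $f \in \mathcal{F}$

Constant: $\mathcal{L}_a(c, p) \mapsto \emptyset$ if $c$ is a constant

Discovery: $\mathcal{L}_a(\Theta(t_1, t_2), p) \mapsto \{p\}$

Fig. 6. Evaluation rules for $\mathcal{P}os_{\Theta a}(g)$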
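A direct Python transcription of figure 6 follows. It assumes that terms are `(symbol, args)` tuples, that $\Theta$ is encoded by the hypothetical string `theta`, and that $g$ contains no $inst$ symbol, as is the case for NFs w.r.t. $\mathcal{S}$.

```python
from typing import List, Tuple

Term = Tuple[str, tuple]    # ground term over the extended signature, without inst
Position = Tuple[int, ...]  # 1-based argument indices; () is the root


def theta_ancestor_positions(g: Term, p: Position = ()) -> List[Position]:
    """Pos_Theta_a(g) following figure 6: the topmost positions headed by Theta."""
    head, args = g
    if head == "theta":                      # Discovery: do not look below the first Theta
        return [p]
    result: List[Position] = []              # Constant: no arguments, empty result
    for i, arg in enumerate(args, start=1):  # Symbol: union over the arguments
        result.extend(theta_ancestor_positions(arg, p + (i,)))
    return result
```

The search stops at the first $\Theta$ on each branch, so thunks nested below a frontier position are not reported; they are reached later, when the enclosing thunk is opened by $\phi^*$.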

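(i) $\phi^*(\Theta(\tau_f, vec_f(t_1, \ldots, t_n))) \mapsto f(\phi^*(t_1, \pi^f_1), \ldots, \phi^*(t_n, \pi^f_n))$

(ii) $\phi^*(\Theta(\lambda_t, vec_t(t_1, \ldots, t_{n_t}))) \mapsto t\{x_1 \mapsto t_1\} \ldots \{x_{n_t} \mapsto t_{n_t}\}$

(iii) $\phi^*(\Theta(\tau_c, vec_c)) \mapsto c$ if $c$ is a constant

(iv) $\phi^*(t, e) \mapsto \phi^*(t)$

(v) $\phi^*(t, l) \mapsto t$

Fig. 7. Evaluation rules for $\phi^*$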

Theorem 5.2 If $\mathcal{R}$ is terminating and fulfils all necessary conditions for thunkification, then $lazynorm(t, \mathcal{R})$ is also terminating and yields a NF of $t$ w.r.t. $\mathcal{R}$.

Remark 5.3 The normalisation of a term $t$ by the procedure $lazynorm(t, \mathcal{R})$ generates a trace $T_t$ containing the traces of all performed (leftmost-innermost) rewrite steps. Let us extract from $T_t$ the pairs whose first element is the label of some rule in $\mathcal{S}_0$. Due to theorem 4.9, this process yields a normalisation trace $T^{\mathcal{R}}_t$ of $t$ in $\mathcal{R}$ (in the sense of normal rewriting).

6 Preliminary Transformation

In this section, we present a transformation that eliminates all non-variable lazy argument subterms, and hence all non-variable lazy subterms, of LHSs. The transformation works on (left-linear) constructor-based TRSs. It is proved to be correct and to preserve a close correspondence between normalisation traces in the original and transformed TRSs.

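procedure normalise($g \in \mathcal{G}_{\Sigma'}$, $\mathcal{S}$)
    (i) $g$ is normalised with the leftmost-innermost strategy w.r.t. $\mathcal{S}$ to get $g_{nf}$
    (ii) if $g_{nf}$ contains no symbol $\Theta$ then
        return $g_{nf}$
    for all $p \in \mathcal{P}_\Theta(g_{nf})$ do
        $s := \phi^*(g_{nf}|_p)$
        $g_{nf}|_p := normalise(s, \mathcal{S})$
    end for

procedure lazynorm($t \in \mathcal{G}_\Sigma$, $\mathcal{R}$)
    (i) build $\mathcal{S} = \mathcal{S}_0 \cup \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3$ from $\mathcal{R}$
    (ii) $g := \varphi(VC(t))$
    (iii) $t_{nf} := normalise(g, \mathcal{S})$
    (iv) return $t_{nf}$

Fig. 8. Normalisation procedure (lazynorm) based on lazy rewriting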

6.1 Transformation Description

Let $\mathcal{R}$ be a left-linear constructor-based TRS. Suppose that $p.i$ is a non-variable lazy argument position in the LHS of a rule $l_s \rightarrow r \in \mathcal{R}$ and $\mathcal{H}ead(l_s|_p) = f$. We activate this position by adding a new function symbol $f^e_i$ of arity $ar(f)$, where $\pi^{f^e_i}_j = \pi^f_j$ for all $j \neq i$ while $\pi^{f^e_i}_i = e$, and by transforming $l_s \rightarrow r$, which is called the source rule, as follows:

- Replace it by the rule $l_t \rightarrow r$, where $l_t$ is $l_s$ except that $\mathcal{H}ead(l_t|_p) = f^e_i$. This rule is called the transformed rule.

- Add to $\mathcal{R}$ a new rule $l_s[x]_{p.i} \rightarrow l_t[x]_{p.i}$, where $x$ is a fresh variable, such that this rule has the lowest priority in case of overlapping. This rule is called the added rule.

All other rules of $\mathcal{R}$ are unchanged. This process is called a transformation step: it eliminates one non-variable lazy argument subterm of the LHS of a rule in $\mathcal{R}$.
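One transformation step can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. Variables are encoded as strings and applications as tuples `(symbol, arg1, ..., argn)`. A laziness table maps each symbol to a tuple of `"e"`/`"l"` annotations, and the new symbol $f^e_i$ is spelled `f"{f}_{i}^e"`. All of these encodings and function names are hypothetical. The sketch does not model the lowest-priority requirement on the added rule, which is left to the caller.

```python
from typing import Union

# A variable is a str; an application is (symbol, arg1, ..., argn), e.g. ("0",).
Term = Union[str, tuple]


def subterm(t: Term, pos: tuple) -> Term:
    """t|_pos with 1-based argument indices (index 0 of a tuple holds the symbol)."""
    for i in pos:
        t = t[i]
    return t


def replace_at(t: Term, pos: tuple, new: Term) -> Term:
    """t[new]_pos: a copy of t whose subterm at pos is replaced by new."""
    if not pos:
        return new
    i, rest = pos[0], pos[1:]
    return t[:i] + (replace_at(t[i], rest, new),) + t[i + 1:]


def lazy_nonvar_positions(t: Term, laziness: dict, pos: tuple = ()) -> list:
    """Positions p.i whose argument is lazy w.r.t. Head(t|_p) and is not a variable."""
    if isinstance(t, str):
        return []
    found = []
    for i, arg in enumerate(t[1:], start=1):
        if laziness.get(t[0], ())[i - 1:i] == ("l",) and not isinstance(arg, str):
            found.append(pos + (i,))
        found += lazy_nonvar_positions(arg, laziness, pos + (i,))
    return found


def transformation_step(lhs: Term, rhs: Term, p: tuple, i: int, laziness: dict, fresh: str = "x'"):
    """Activate the lazy argument position p.i of the source rule lhs -> rhs.

    Returns (transformed_rule, added_rule, new_symbol, new_annotation).
    `fresh` is assumed not to occur in lhs.
    """
    node = subterm(lhs, p)
    f = node[0]
    assert laziness[f][i - 1] == "l", "p.i must be a lazy argument position"
    new_symbol = f"{f}_{i}^e"                 # hypothetical spelling of f_i^e
    annotation = list(laziness[f])
    annotation[i - 1] = "e"                   # only the i-th argument becomes eager
    lt = replace_at(lhs, p, (new_symbol,) + node[1:])    # Head(lt|_p) = f_i^e
    transformed = (lt, rhs)
    added = (replace_at(lhs, p + (i,), fresh), replace_at(lt, p + (i,), fresh))
    return transformed, added, new_symbol, tuple(annotation)
```

Applied to $r1$: $2nd(cons(x, cons(y, z))) \rightarrow y$ of example 3.4 with $p = 1$ and $i = 2$, the sketch yields the rules $[rt]$ and $[ra]$ of example 6.1. Neither of their LHSs has a non-variable lazy argument left.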

Example 6.1 Consider again the TRS in example 3.4. Applying the transformation to the rule $r1$ (source rule) yields the following TRS:

[rt] $2nd(cons^e_2(x, cons(y, z))) \rightarrow y$ (transformed rule)

[r2] $inf(x) \rightarrow cons(x, inf(s(x)))$

[ra] $2nd(cons(x, x')) \rightarrow 2nd(cons^e_2(x, x'))$ (added rule)

where $\mathcal{L}(2nd) = (e)$; $\mathcal{L}(inf) = (e)$; $\mathcal{L}(cons) = (e, l)$; $\mathcal{L}(cons^e_2) = (e, e)$.

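Denote by $\mathcal{S}$ the new TRS generated by one transformation step. Let $\Sigma'$ be the new signature ($\Sigma' = \Sigma \cup \{f^e_i\}$). The set of terms $\mathcal{B}$ is defined as follows:

$\mathcal{B} = \{\, g \in \mathcal{G}'^{term} \mid \exists g_0 \in \mathcal{G}^{term} :$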

The mapping $\phi' : \mathcal{B} \rightarrow \mathcal{G}^{term}$ relates terms in $\mathcal{B}$ with terms in $\mathcal{G}^{term}$: $\phi'(g)$ is built by replacing every symbol $f^e_i$ in $g$ by $f$. Furthermore, the laziness annotations of subterms of $g$ and $\phi'(g)$ are kept identical.

We say that lazy rewriting on terms in $\mathcal{B}$ w.r.t. $\mathcal{S}$ simulates, via $\phi'$, lazy rewriting on terms in $\mathcal{G}^{term}$ w.r.t. $\mathcal{R}$. Obviously, $\mathcal{S}$ is also constructor-based and left-linear, so the transformation can be repeated until the LHSs contain no non-variable lazy argument subterm. The transformation terminates, since each step strictly decreases the number of non-variable lazy argument subterms of LHSs.
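The mapping $\phi'$ is a symbol renaming applied at every node. Here is a minimal Python sketch, assuming the same tuple encoding of terms as above (variables as strings, applications as `(symbol, args...)`). It also assumes a hypothetical table `source_of` that maps each new symbol $f^e_i$ back to $f$. Laziness annotations are not part of this encoding, which matches the fact that $\phi'$ leaves them unchanged.

```python
def phi_prime(g, source_of):
    """phi'(g): replace every new symbol f_i^e in g by its original symbol f."""
    if isinstance(g, str):                       # a variable is left unchanged
        return g
    head, *args = g
    return (source_of.get(head, head), *(phi_prime(a, source_of) for a in args))


g = ("2nd", ("cons_2^e", ("0",), ("inf", ("s", ("0",)))))
print(phi_prime(g, {"cons_2^e": "cons"}))
# ('2nd', ('cons', ('0',), ('inf', ('s', ('0',)))))
```

On a term that contains no new symbol the function returns its input unchanged. This is the identity behaviour on $\mathcal{G}^{term}$ that makes $\phi'$ surjective.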

6.2 Correctness of Preliminary Transformation

The correctness of the preliminary transformation can be deduced from the correctness of one transformation step according to the criteria given in [12]. The mapping $\phi'$ is obviously surjective, since it is the identity mapping on the subset $\mathcal{G}^{term}$ of $\mathcal{B}$.

Theorem 6.2 (Soundness) Let $g$ be a term in $\mathcal{B}$. If $g \rightarrow_{\mathcal{S}} g'$ then $\phi'(g) \rightarrow_{\mathcal{R}} \phi'(g')$. More precisely: if $g \rightarrow_{\mathcal{S}} g'$ by applying the added rule or the transformed rule, then $\phi'(g) \rightarrow_{\mathcal{R}} \phi'(g')$ by applying the source rule at the same position. Otherwise, $\phi'(g) \rightarrow_{\mathcal{R}} \phi'(g')$ by applying the same rule at the same position.

Remark 6.3 If $g \rightarrow_{\mathcal{S}} g'$ by a rewrite step using the added rule, then $UV(\phi'(g)) = UV(\phi'(g'))$. Hence, if we are only interested in non-decorated terms, as in the case of normal rewriting, then this step is redundant.

Theorem 6.4 (Completeness) If $g \in \mathcal{B}$ is in LNF w.r.t. $\mathcal{S}$, then $\phi'(g)$ is in LNF w.r.t. $\mathcal{R}$.

Corollary 6.5 (Correspondence of traces) Let $T_g$ be a (lazy) normalisation trace of a term $g \in \mathcal{B}$ w.r.t. $\mathcal{S}$. Replacing the labels of the added rule and the transformed rule in $T_g$ by the label of the source rule yields a (lazy) normalisation trace of $\phi'(g)$ w.r.t. $\mathcal{R}$.

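Example 6.6 In example 6.1, the term $t = 2nd(inf(0))$ is normalised w.r.t. $\mathcal{S}$ as follows:

$$
\begin{aligned}
2nd(inf(0)) &\xrightarrow[1]{r2} 2nd(cons(0, inf(s(0)))) \xrightarrow[\varepsilon]{ra} 2nd(cons^e_2(0, inf(s(0))))\\
&\xrightarrow[1.2]{r2} 2nd(cons^e_2(0, cons(s(0), inf(s(s(0)))))) \xrightarrow[\varepsilon]{rt} s(0).
\end{aligned}
$$

In the generated trace $T^{\mathcal{S}}_t = \{(r2, 1); (ra, \varepsilon); (r2, 1.2); (rt, \varepsilon)\}$, replacing $rt$ and $ra$ by $r1$ yields a (lazy) normalisation trace of $t$ w.r.t. $\mathcal{R}$: $T^{\mathcal{R}}_t = \{(r2, 1); (r1, \varepsilon); (r2, 1.2); (r1, \varepsilon)\}$.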
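The relabelling of corollary 6.5 is a label substitution that keeps every step and its position. The following is a minimal Python sketch, assuming traces are lists of `(label, position)` pairs with `()` for $\varepsilon$. It also assumes a hypothetical dictionary `source_of` that maps the labels of added and transformed rules to their source rule.

```python
def relabel(trace, source_of):
    """Corollary 6.5: rename added/transformed rule labels to their source rule."""
    return [(source_of.get(label, label), pos) for (label, pos) in trace]


t_s = [("r2", (1,)), ("ra", ()), ("r2", (1, 2)), ("rt", ())]   # T_t^S of example 6.6
print(relabel(t_s, {"ra": "r1", "rt": "r1"}))
# [('r2', (1,)), ('r1', ()), ('r2', (1, 2)), ('r1', ())]
```

Keeping the $ra$ step gives a lazy trace: in $\mathcal{R}$ that step corresponds to applying $r1$ at the root, which only triggers the activation of the lazy argument.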

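Theorem 6.7 (Termination preservation) If there exists an infinite derivation $g_0 \rightarrow_{\mathcal{S}} g_1 \rightarrow_{\mathcal{S}} \cdots$, then there exists $k \in \mathbb{N}$ such that $\phi'(g_0) \rightarrow^{+}_{\mathcal{R}} \phi'(g_k)$.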

7 Combining Two Transformations

In this section, we describe how thunkification and the preliminary transformation described above are combined. If the LHSs of the considered TRS $\mathcal{R}$ contain no non-variable lazy subterms, then thunkification alone is sufficient, and a normalisation trace of a term $t$ is obtained with the normalisation procedure described in section 5. Otherwise, the preliminary transformation is used to eliminate the non-variable lazy subterms of the LHSs, and the new TRS $\mathcal{S}$ generated by this transformation is then transformed by thunkification. Suppose that the normalisation procedure yields a trace $T_t$. By remark 5.3, one can extract from $T_t$ the trace $T^{\mathcal{S}}_t$ of the corresponding (lazy) derivation by $\mathcal{S}$. Replacing the added rules and transformed rules of $\mathcal{S}$ by their source rules in $\mathcal{R}$, one gets $T^{\mathcal{R}}_t$, which is the trace of the corresponding (lazy) derivation by $\mathcal{R}$.

Nevertheless, by remark 6.3, rewrite steps by added rules are redundant, since our goal is a normalisation trace in the sense of normal rewriting. We therefore refine the trace by eliminating these redundant steps. This refinement is done on $T^{\mathcal{S}}_t$ before generating $T^{\mathcal{R}}_t$, which is then the normalisation trace of $t$ w.r.t. $\mathcal{R}$.
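The post-processing just described (extract, drop added-rule steps, rename transformed rules) can be sketched as one Python function. This is a minimal illustration only. The function and parameter names are hypothetical, traces are lists of `(label, position)` pairs with `()` for $\varepsilon$, and the caller must supply which labels belong to the zero part of the thunkified system ($\mathcal{U}_0$ in example 7.1).

```python
def normalisation_trace(t_trace, zero_labels, added_labels, source_of):
    """Derive a normal-rewriting trace in the original TRS from the trace T_t.

    1. keep steps by rules of the zero part (remark 5.3), giving T_t^S;
    2. drop steps by added rules, redundant for normal rewriting (remark 6.3);
    3. rename transformed rules to their source rules, giving T_t^R.
    """
    t_s = [(lab, pos) for (lab, pos) in t_trace if lab in zero_labels]
    refined = [(lab, pos) for (lab, pos) in t_s if lab not in added_labels]
    return [(source_of.get(lab, lab), pos) for (lab, pos) in refined]


# Steps of the derivation in example 7.1 (r21 is an inst rule, not in U_0).
t_trace = [("r2", (1,)), ("ra", ()), ("r21", (1, 2)), ("r2", (1, 2)), ("rt", ())]
print(normalisation_trace(t_trace, {"r2", "ra", "rt"}, {"ra"}, {"rt": "r1"}))
# [('r2', (1,)), ('r2', (1, 2)), ('r1', ())]
```

Dropping the added-rule steps before renaming matters: once renamed, an added-rule step would be indistinguishable from a genuine application of the source rule.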

Example 7.1 We illustrate our method on the TRS $\mathcal{R}$ of example 3.4. Thunkification cannot be applied directly to $\mathcal{R}$, since the LHS of $r1$ contains the non-variable lazy subterm $cons(y, z)$. Using the preliminary transformation, we get the TRS $\mathcal{S}$ of example 6.1. This TRS fulfils all necessary conditions for thunkification, which gives the following TRS $\mathcal{U}$:

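[rt] $2nd(cons^e_2(x, cons(y, z))) \rightarrow y$

[r2] $inf(x) \rightarrow cons(x, \Theta(\lambda_{inf(s(x))}, vec_{inf(s(x))}(x)))$

[ra] $2nd(cons(x, x')) \rightarrow 2nd(cons^e_2(x, inst(x')))$

[r11] $inst(\Theta(\lambda_{cons}, vec_{cons}(x, y))) \rightarrow cons(inst(x), y)$

[r12] $inst(\Theta(\lambda_{inf}, vec_{inf}(x))) \rightarrow inf(inst(x))$

[r13] $inst(\Theta(\lambda_{2nd}, vec_{2nd}(x))) \rightarrow 2nd(inst(x))$

[r14] $inst(\Theta(\lambda_{cons^e_2}, vec_{cons^e_2}(x, y))) \rightarrow cons^e_2(inst(x), inst(y))$

[r21] $inst(\Theta(\lambda_{inf(s(x))}, vec_{inf(s(x))}(x))) \rightarrow inf(s(x))$

[r31] $inst(x) \rightarrow x$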

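Consider the term $t = 2nd(inf(0))$. We normalise $\varphi(t) = 2nd(inf(0))$ w.r.t. $\mathcal{U}$ by the following leftmost-innermost derivation:

$$
\begin{aligned}
2nd(inf(0)) &\xrightarrow[1]{r2} 2nd(cons(0, \Theta(\lambda_{inf(s(x))}, vec_{inf(s(x))}(0))))\\
&\xrightarrow[\varepsilon]{ra} 2nd(cons^e_2(0, inst(\Theta(\lambda_{inf(s(x))}, vec_{inf(s(x))}(0)))))\\
&\xrightarrow[1.2]{r21} 2nd(cons^e_2(0, inf(s(0))))\\
&\xrightarrow[1.2]{r2} 2nd(cons^e_2(0, cons(s(0), \Theta(\lambda_{inf(s(x))}, vec_{inf(s(x))}(s(0))))))\\
&\xrightarrow[\varepsilon]{rt} s(0).
\end{aligned}
$$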

Since $s(0)$ contains no symbol $\Theta$, the normalisation procedure terminates and returns this term as a NF of $t$ w.r.t. $\mathcal{S}$. By the soundness of the preliminary transformation, $s(0)$ is also a NF of $t$ w.r.t. $\mathcal{R}$. By theorem 4.9, one can extract from the normalising derivation above a normalisation trace of $t$ w.r.t. $\mathcal{S}$: $T^{\mathcal{S}}_t = \{(r2, 1); (ra, \varepsilon); (r2, 1.2); (rt, \varepsilon)\}$ (only the rewrite steps performed by rules in $\mathcal{U}_0$ appear in $T^{\mathcal{S}}_t$). Finally, we eliminate the rewrite steps by added rules ($ra$) and replace transformed rules ($rt$) by their source rules ($r1$) to get a normalisation trace of $t$ w.r.t. $\mathcal{R}$ (in the sense of normal rewriting): $T^{\mathcal{R}}_t = \{(r2, 1); (r2, 1.2); (r1, \varepsilon)\}$. Notice that applying an innermost strategy to $t$ with the rules of $\mathcal{R}$ leads to infinite reductions.

8 Related Work

Lazy rewriting can be obtained in OBJ [9] and CafeOBJ [6] using the operator evaluation strategy (E-strategy), where each operator (function symbol) has its own evaluation order.

There are two suggested ways to simulate lazy rewriting by E-strategies:

(i) omit lazy arguments from the local strategy of their function symbol

(ii) use negative integers for these arguments

The first method does not behave well if there is some non-variable lazy subterm in the LHS of a rule, as in example 3.4, where the second argument is omitted from the local strategy of $cons$. Such a strategy reduces $2nd(inf(0))$ to $2nd(cons(0, inf(s(0))))$ instead of $s(0)$, since the subterm $inf(s(0))$ is not allowed to be reduced and $r1$ cannot be applied.

The second method is implemented in CafeOBJ using the on-demand flag [18]. A negative integer $-i$ in the local strategy of a function symbol $f$ means that the $i$th subterm of $f$ is forced to be rewritten if and only if it causes a conflict during pattern matching. In example 3.4, the local strategy of $cons$ is $(1\ {-2}\ 0)$ and $2nd(inf(0))$ is derived as follows:

$$2nd(inf(0)) \xrightarrow[1]{r2} 2nd(cons(0, inf(s(0)))) \xrightarrow[1.2]{r2} 2nd(cons(0, cons(s(0), inf(s(s(0)))))) \xrightarrow[\varepsilon]{r1} s(0).$$

In the second rewrite step, $r1$ is tried on the term $2nd(cons(0, inf(s(0))))$. The subterm $inf(s(0))$ causes a conflict and hence is forced to be rewritten. The E-strategies that can reduce terms to their HNF are characterised in [15] for left-linear and constructor-based TRSs. The on-demand flag is very similar to the notion of essential node, and thunkification shares the same limitation as the first method described above. The preliminary transformation allows us to overcome this limitation for left-linear and constructor-based TRSs.

Context-sensitive rewriting [14] can be seen as a restricted case of lazy rewriting where subterm activation is not allowed. In order to correctly simulate rewriting by context-sensitive rewriting, one needs canonical replacement maps, which actually require all lazy subterms of the LHSs to be variables. In other words, context-sensitive rewriting shares the same restriction as the first method described above.

9 Conclusion

In this paper, we described lazy rewriting and the mechanism of thunkification in a rule-based form. We showed the relation between normalising derivations in TRSs before and after thunkification and proposed a normalisation procedure based on lazy rewriting. We also presented a preliminary transformation that extends the application scope of thunkification while preserving a close correspondence between normalisation traces.

Finding optimal derivations is undecidable in general [16] [11], and even when it is decidable, the decision procedures are often difficult to implement. In practice, most of the interesting results only involve orthogonal constructor-based TRSs [20] [2] [21]. We think that our normalisation procedure is useful: it is reasonably efficient in ELAN, thanks to correct simulations, while the generated traces are more compact and still usable by Coq, thanks to the close correspondence between normalising derivations before and after each transformation. Moreover, TRSs are allowed to be overlapping.

A natural question arises: which arguments of each function symbol should be marked lazy? There is no general answer yet, but intuitively, the variables that appear in the LHS but not in the RHS of the same rule should be lazy. Thus, in an if-then-else construction like

$$\{\, if(true, x, y) \rightarrow x;\ \ if(false, x, y) \rightarrow y \,\}$$

the last two arguments of $if$ should be lazy. Such TRSs form a class where lazy rewriting can provide more compact normalisation traces. If all variables in the LHS also appear in the RHS, then all redexes are necessary, and lazy or outermost strategies do not give a shorter derivation than innermost strategies. Furthermore, variables marked lazy should not appear more than once in the RHS, since this duplicates the reductions of the terms instantiating these variables. In such cases, lazy rewriting requires sharing. In our work, sharing is only helpful if it is implemented in both Coq and ELAN. This requires some extensions of the Coq replaying procedure and of the ELAN compiler, which we are investigating.
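The intuition above can be turned into a rough heuristic. For each rule $f(\ldots) \rightarrow r$, mark as lazy every argument position of $f$ whose LHS argument is a variable that does not occur in $r$, and take the union over all rules of $f$. The Python function below is a minimal sketch of that reading and nothing more. It only inspects arguments directly below the root, ignores the duplication caveat about variables occurring several times in the RHS, and uses a hypothetical name and the same tuple encoding of terms as above.

```python
def suggest_lazy_arguments(rules):
    """Map each root symbol f to the argument positions suggested to be lazy."""
    def variables(t):
        if isinstance(t, str):                 # a variable
            return {t}
        return set().union(*(variables(a) for a in t[1:]))

    lazy = {}
    for lhs, rhs in rules:
        f, *args = lhs
        rhs_vars = variables(rhs)
        for i, arg in enumerate(args, start=1):
            if isinstance(arg, str) and arg not in rhs_vars:
                lazy.setdefault(f, set()).add(i)
    return lazy


if_rules = [(("if", ("true",), "x", "y"), "x"),
            (("if", ("false",), "x", "y"), "y")]
print(suggest_lazy_arguments(if_rules))   # {'if': {2, 3}}
```

For the if-then-else rules, position 3 is suggested by the first rule and position 2 by the second, which agrees with marking the last two arguments of $if$ lazy.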

10 Acknowledgements

I sincerely thank Claude Kirchner, Hélène Kirchner and the anonymous referees for their constructive comments on earlier versions of this paper. I am also grateful to Mark van den Brand for pointing out [8] to me.

References

[1] C. Alvarado and Q-H. Nguyen. ELAN for equational reasoning in Coq. In J. Despeyroux, editor, Proc. of 2nd Workshop on Logical Frameworks and Metalanguages. Institut National de Recherche en Informatique et en Automatique, ISBN 2-7261-1166-1, June 2000.

[2] S. Antoy. Definitional trees. In H. Kirchner and G. Levi, editors, Proc. of the 3rd International Conference on Algebraic and Logic Programming, volume 632 of Lecture Notes in Computer Science, pages 143-157. Springer-Verlag, September 1992.

[3] P. Borovansky, C. Kirchner, H. Kirchner, and C. Ringeissen. Rewriting with strategies in ELAN: a functional semantics. International Journal of Foundations of Computer Science, 2001.

[4] CSL/SRI. The PVS homepage, http://pvs.csl.sri.com.

[5] N. Dershowitz and J-P. Jouannaud. Handbook of Theoretical Computer Science, volume B, chapter 6: Rewrite Systems, pages 244-320. Elsevier Science Publishers B. V. (North-Holland), 1990.

[6] R. Diaconescu and K. Futatsugi. An overview of CafeOBJ. In C. Kirchner and H. Kirchner, editors, Proc. of 2nd International Workshop on Rewriting Logic and its Applications, volume 15 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers B. V. (North-Holland), 2000. Available at http://www.elsevier.nl/locate/volume15.html.

[7] W. Fokkink, J. Kamperman, and P. Walters. Within ARM's reach: Compilation of left-linear rewrite systems via minimal rewrite systems. ACM Transactions on Programming Languages and Systems, 20(3):679-706, May 1998.

[8] W. Fokkink, J. Kamperman, and P. Walters. Lazy rewriting on eager machinery. ACM Transactions on Programming Languages and Systems, 22(1):45-86, January 2000.

[9] J. A. Goguen, J.M. Winkler, J. Meseguer, K. Futatsugi, and J-P. Jouannaud. An introduction to OBJ. In J. A. Goguen and G. Malcolm, editors, Software engineering with OBJ, Advances in Formal Methods. Kluwer Academic Publishers, 2000.

[10] A. Graf. Left-to-right tree pattern matching. In Ronald V. Book, editor, Proc. 4th Int. Conf. RTA, volume 488 of Lecture Notes in Computer Science, pages 323-334. Springer-Verlag, 1991.

[11] G. Huet and J-J. Levy. Computations in orthogonal rewriting systems, Part I + II. In J-L. Lassez and G. D. Plotkin, editors, Computational Logic - Essays in Honor of Alan Robinson, pages 395-443, 1991.

[12] J. Kamperman and P. Walters. Minimal term rewriting systems. In M. Haveraaen, O. Owe, and O. J. Dahl, editors, Recent Trends in Data Type Specification, volume 1130 of Lecture Notes in Computer Science, pages 274-290. Springer-Verlag, 1995.

[13] LogiCal/INRIA. The Coq homepage, http://coq.inria.fr.

[14] S. Lucas. Context-sensitive computations in functional and functional logic programs. Journal of Functional and Logic Programming, 1998(1), January 1998.

[15] M. Nakamura and K. Ogata. The evaluation strategy for head normal form with and without on-demand flag. In K. Futatsugi, editor, Proc. of 3rd International Workshop on Rewriting Logic and its Applications, volume 36 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers B. V. (North-Holland), 2000. Available at http://www.elsevier.nl/locate/volume36.html.

[16] M.J. O'Donnell. Equational logic programming. In D. Gabbay, editor, Handbook of Logic in Artificial Intelligence and Logic Programming, volume 5, Logic Programming, chapter 2. Oxford University Press, 1995. Preprint.

[17] University of Karlsruhe. The KIV homepage, http://illwww.ira.uka.de/~kiv/KIV-KA.html.

[18] K. Ogata and K. Futatsugi. Operational semantics of rewriting with the on-demand evaluation strategy. In Proc. of ACM Symposium on Applied Computing, pages 756-763, 2000.

[19] PROTHEO/LORIA. The ELAN homepage, http://elan.loria.fr.

[20] R. I. Strandh. Classes of equational programs that compile into efficient machine code. In N. Dershowitz, editor, Proc. of the 3rd Int. Conf. RTA, volume 355 of Lecture Notes in Computer Science, pages 449-461. Springer-Verlag, April 1989.

[21] S. Thatte. A refinement of strong sequentiality for term rewriting with constructors. Information and Computation, 72(1):46-65, January 1987.