Continuum Mech. Thermodyn. DOI 10.1007/s00161-015-0470-1


Giovanni A. Bonaschi • Mark A. Peletier

Quadratic and rate-independent limits for a large-deviations functional

Received: 8 January 2015 / Accepted: 23 July 2015

© The Author(s) 2015. This article is published with open access at Springerlink.com

Abstract We construct a stochastic model showing the relationship between noise, gradient flows and rate-independent systems. The model consists of a one-dimensional birth-death process on a lattice, with rates derived from Kramers' law as an approximation of a Brownian motion on a wiggly energy landscape. Taking various limits, we show how to obtain a whole family of generalized gradient flows, ranging from quadratic to rate-independent ones, connected via 'L log L' gradient flows. This is achieved via Mosco-convergence of the renormalized large-deviations rate functional of the stochastic process.

Keywords Large deviations • Gamma convergence • Gradient flows • Markov chains • Rate-independent systems

1 Introduction

1.1 Variational evolution

Two of the most studied types of variational evolution, 'gradient-flow evolution' and 'rate-independent evolution', differ in quite a few aspects. Although both are driven by the variation in space and time of an energy, gradient flows are in fact driven by energy gradients, while in practice rate-independent systems are driven by changes in the external loading (represented by the time variation of the energy). As a result, gradient-flow systems have an intrinsic timescale, while rate-independent systems (as the name signals) do not, and the mathematical definitions of solutions of the two are rather different [5,27].

Despite this, they share a common structure. Both can be written, at least formally, as

$$0 \in \partial\psi(\dot x(t)) + D_x E(x(t), t). \tag{1}$$

Here $E$ is the energy that drives the system, and the convex function $\psi$ is a dissipation potential, with subdifferential $\partial\psi$. For gradient flows, $\psi$ typically is quadratic, and $\partial\psi$ single-valued and linear; for rate-independent systems, $\psi$ is 1-homogeneous, and $\partial\psi$ is a degenerate monotone graph.

Communicated by Andreas Öchsner.

G. A. Bonaschi (B) • M. A. Peletier

Department of Mathematics and Computer Science, Institute for Complex Molecular Systems, Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands E-mail: g.a.bonaschi@tue.nl

M. A. Peletier

E-mail: m.a.peletier@tue.nl

G. A. Bonaschi

Dipartimento di Matematica, Università di Pavia, 27100 Pavia, Italy

Published online: 23 August 2015

Rate-independent systems have some unusual properties. Solutions are expected to be discontinuous, and therefore, the concept of smooth solutions is meaningless. Currently, two rigorous definitions of weak solutions are used, which we refer to as 'energetic solutions' [26] and 'BV solutions' [32]. Heuristically, the first corresponds to the principle 'jump whenever it lowers the energy', while the second can be characterized as 'don't jump until you become unstable'. For time-dependent convex energies, the two definitions coincide, but in the non-convex case they need not.

Various rigorous justifications of rate-independent evolutions have been constructed, which underpin the rate-independent nature by obtaining it via upscaling from a 'microscopic' underlying system (e.g. [1,13,14, 21,28,40,43]), with a perturbative approach (e.g. [11]), or from a chain model of highly nonlinear viscous springs [38]. The general approach in these results is to choose a microscopic model with a component of gradient-flow type (quadratic dissipation, deemed more 'natural') and then take a limit which induces the vanishing of the quadratic behaviour and the appearance of the rate-independent behaviour.

While these results give a convincing explanation of the rate-independent nature, most are based on deterministic microscopic models (exceptions are [38,43,44]). Other arguments suggest that the rate independence may arise through the interplay between thermal noise and a rough energy landscape. A well-studied example of this is the non-trivial temperature dependence of the yield stress in metals, which shows that the process is thermally driven (e.g. [7]), together with many classical non-rigorous derivations of rate-independent behaviour [8,25,39].

Recently, stochasticity has also been shown to play a role in understanding the origin of various gradient-flow systems, such as those with Wasserstein-type metrics [2,3,15,30,41]. In this paper, we ask the question whether these different roles of noise can be related:

What is the relationship between noise, gradient flows, and rate-independent systems?

We will provide a partial answer to this question by studying a simple stochastic model below. By taking various limits in this model, we obtain a full continuum of behaviours, among which rate-independent and quadratic gradient flow can be considered extreme cases. In this sense, both rate-independent and quadratic dissipation arise naturally from the same stochastic model in different limits.

1.2 The model

The model of this paper is a continuous-time Markov jump process $t \mapsto X^n_t$ on a one-dimensional lattice, as shown in Fig. 1. Denoting by $1/n$ the lattice spacing, we will be interested in the continuum limit as $n \to \infty$.

The evolution of the process can be described as follows. Assume that a smooth function $(x, t) \mapsto E(x, t)$ is given and fix the origin as initial point. If the process is at the position $x$ at time $t$, then it jumps in continuous time to its neighbours $(x - 1/n)$ and $(x + 1/n)$ with rates $n r^-$ and $n r^+$, where $r^\pm(x) = \alpha \exp(\mp\beta \nabla E(x, t))$ (throughout we use $\nabla E(x, t)$ for the derivative with respect to $x$).

The jump process of Fig. 1 has a bias in the direction $-\nabla E$ of magnitude

$$r^+ - r^- = -2\alpha \sinh(\beta \nabla E), \tag{2}$$

and we will see this expression return as a drift term in the limit problem. The parameter $\alpha$ characterizes the rate of jumps and thus fixes the global timescale of the process; the parameter $\beta$ should be thought of as the inverse of temperature and characterizes the size of the noise.

In Sect. 1.6, we show the physics behind the choice of this particular model.
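The jump mechanism described above can be illustrated with a short simulation. The following is only a minimal sketch under stated assumptions: the function names `jump_rates` and `simulate` and the callback `grad_E_fn` are hypothetical, and for a time-dependent energy the rates are frozen at the time of the last jump, which is an approximation that is reasonable only when $n$ is large.

```python
import math
import random

def jump_rates(grad_E, alpha, beta):
    """Return (r_plus, r_minus) = alpha * exp(-/+ beta * grad_E)."""
    return alpha * math.exp(-beta * grad_E), alpha * math.exp(beta * grad_E)

def simulate(grad_E_fn, n, alpha, beta, T, seed=0):
    """Gillespie-type simulation of the lattice process started at the origin.

    grad_E_fn(x, t) is a user-supplied derivative of the energy (hypothetical).
    Assumption: rates are frozen between jumps, evaluated at the last jump time.
    Returns the list of (time, position) pairs visited up to time T.
    """
    rng = random.Random(seed)
    k, t = 0, 0.0                      # the position is k / n
    path = [(t, 0.0)]
    while True:
        r_plus, r_minus = jump_rates(grad_E_fn(k / n, t), alpha, beta)
        total = n * (r_plus + r_minus)          # total jump rate out of x
        t += rng.expovariate(total)             # exponential waiting time
        if t > T:
            return path
        # jump right with probability r_plus / (r_plus + r_minus)
        k += 1 if rng.random() < r_plus / (r_plus + r_minus) else -1
        path.append((t, k / n))
```

For instance, `simulate(lambda x, t: x, n=50, alpha=1.0, beta=1.0, T=1.0)` produces a path on the lattice with spacing $1/50$ for the energy $E(x) = x^2/2$; the difference of the two rates returned by `jump_rates` reproduces the bias (2).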

Fig. 1 One-dimensional lattice $x \in \frac{1}{n}\mathbb{Z}$ with spacing $1/n$. The jump rates $r^+$ and $r^-$ depend on two parameters $\alpha$ and $\beta$ and on the derivative of the function $E$

Fig. 2 Middle graph shows the function $\xi \mapsto 2\alpha \sinh(\beta\xi)$ for moderate values of $\alpha$ and $\beta$. The left graph shows the limit for $\beta \to 0$, similar to zooming in to the region close to the origin; this limit is linear. The figure on the right shows the limiting behaviour when $\beta \to \infty$ for a specific scaling of $\alpha$. This second limit does not exist as a function, but only as a graph (a subset of the plane) defined in (5)

1.3 Heuristics

We now give a heuristic view of the dependence of this stochastic process on the parameters $n$, $\alpha$, and $\beta$, and in doing so we look ahead at the rigorous results that we prove below.

First, as $n \to \infty$, the process $X^n$ becomes deterministic, as might be expected, and its limit $x$ satisfies the differential equation suggested by (2):

$$\dot x(t) = -2\alpha \sinh(\beta \nabla E(x(t), t)). \tag{3}$$

Equation (3) is of the form (1) with $(\partial\psi)^{-1} = 2\alpha \sinh(\beta\,\cdot)$. From the viewpoint of the gradient-flow-versus-rate-independence discussion above, the salient feature of the function $\xi \mapsto 2\alpha \sinh(\beta\xi)$ is that it embodies both quadratic and rate-independent behaviour in one and the same function, in the form of limiting behaviours according to the values of the parameters $\alpha$, $\beta$. This is illustrated in Fig. 2 as follows. On the one hand, if we construct a limit by zooming in to the origin, corresponding to $\beta \to 0$, $\alpha \sim \omega/\beta$ (the left-hand figure), then we find a limit that is linear; on the other hand, if we zoom out and rescale with $\beta \to \infty$ and $\alpha \sim e^{-\beta A}$, then the exponential growth causes the limit to be the monotone graph in the right-hand side; see $m_A$ in (5).

These two limiting cases give rise to a gradient-flow and a rate-independent behaviour, respectively. In formulas, as $\alpha \to \infty$ and $\beta \to 0$ with $\alpha\beta \to \omega$ for fixed $\omega > 0$, then Eq. (3) converges to

$$\dot x(t) = -2\omega \nabla E(x(t), t), \tag{4}$$

which is a gradient flow of $E$. The limit $\alpha \to \infty$ corresponds to large rate of jumps in the underlying stochastic process, while $\beta \to 0$ corresponds to a weak influence of the energy gradient.

In the other case, as $\alpha \to 0$ and $\beta \to \infty$ with $\alpha \sim \exp(-\beta A)$, the rate of jumps is low, but the influence of the energy becomes large. Formally, we find the limiting equation

$$\dot x(t) \in m_A(-\nabla E(x(t), t)), \quad \text{with} \quad m_A(\xi) = \begin{cases} \emptyset & \text{if } \xi < -A, \\ [-\infty, 0] & \text{if } \xi = -A, \\ \{0\} & \text{if } -A < \xi < A, \\ [0, \infty] & \text{if } \xi = A, \\ \emptyset & \text{if } \xi > A. \end{cases} \tag{5}$$

Again formally, in this limit the system can only move while $\nabla E = \pm A$; whenever the force $|\nabla E|$ is less than $A$, the system is frozen, while values of $|\nabla E|$ larger than $A$ should never appear. In Sect. 4, we obtain a rigorous version of this evolution as the limit system.
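The two scaling regimes can be checked numerically. The sketch below evaluates the map $\xi \mapsto 2\alpha\sinh(\beta\xi)$ under the two scalings discussed above; the function names are hypothetical and the particular parameter values are chosen only for illustration.

```python
import math

def mobility(xi, alpha, beta):
    """The map xi -> 2*alpha*sinh(beta*xi) appearing in Eq. (3)."""
    return 2.0 * alpha * math.sinh(beta * xi)

def quadratic_regime(xi, omega, beta):
    """Scaling alpha = omega / beta; expected to approach 2*omega*xi as beta -> 0."""
    return mobility(xi, omega / beta, beta)

def rate_independent_regime(xi, A, beta):
    """Scaling alpha = exp(-beta*A); tiny for |xi| < A, huge for |xi| > A."""
    return mobility(xi, math.exp(-beta * A), beta)
```

With `omega = 1.5` and `beta = 1e-4`, `quadratic_regime(0.7, ...)` is close to $2\omega\xi = 2.1$. With `A = 1.0` and `beta = 50`, `rate_independent_regime` is of order $e^{-25}$ at $\xi = 0.5$ and of order $e^{25}$ at $\xi = 1.5$, which illustrates how the graph $m_A$ emerges.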

1.4 Large deviations, gradient flows, and variational formulations

Before we describe the results of this paper, we comment on the methods that we use. We previously introduced the concept of gradient flows, and now we introduce the one of large deviations (both are defined precisely in Sect. 2).

In the context of stochastic systems, the theory of large deviations provides a characterization of the probability of rare events, as some parameter—in our case n—tends to infinity. In the case of stochastic processes, this leads to a large-deviations rate function J that is defined on a suitable space of curves. It is now known that many gradient flows and large-deviations principles are strongly connected [2,3,16,30]. In abstract terms, the rate function J of the large-deviations principle simultaneously figures as the defining quantity of the gradient flow, in the sense that

$$J \ge 0; \qquad t \mapsto z(t) \text{ is a solution of the gradient flow} \iff J(z(\cdot)) = 0.$$

The components of the gradient flow (the energy $E$ and dissipation potential $\psi$) can be recognized in $J$. In [30], it was shown how jump processes may generate large-deviations rate functions with non-quadratic dissipation, leading to the concept of generalized gradient flows (see also [17,31,33]).

The central tool in this paper is this functional J that characterizes both the large deviations of the stochastic process and the generalized gradient-flow structure of the limit. Our convergence proofs will be stated and proved using only this functional, giving a high level of coherence to the results.

1.5 Results

In this section, we give a non-rigorous description of the results of this paper, with pointers towards the rigorous theorems later in the paper (Fig. 3).

Fix an energy $E \in C^1(\mathbb{R} \times [0, T])$. We start with the large-deviations result for the jump process $X^n$ due to Wentzell [47,48].

Statement 1 (Large deviations) Let $\alpha$ and $\beta$ be constant. Then $X^n$ satisfies a large-deviations principle for $n \to \infty$, with rate function $J_{\alpha,\beta}$ given in (15). Moreover, the minimizer of $J_{\alpha,\beta}$ satisfies the generalized gradient-flow equation (3).

This result is stated in Theorem 1 with a sketch of the proof as it is presented in the introduction of [20]. In accordance with the discussion above, $J_{\alpha,\beta}(x) \ge 0$ for all curves $x : [0, T] \to \mathbb{R}$, and $J_{\alpha,\beta}(x) = 0$ if and only if $x$ is a solution of (3).

Next we prove that the functional $J_{\alpha,\beta}$ converges to two functionals $J_Q$ and $J_{RI}$ in the sense of Mosco-convergence, defined in (20) (in the spirit of pE convergence of [29]), when $\alpha$ and $\beta$ have the limiting behaviour of Fig. 2. The limiting functionals drive, respectively, a quadratic gradient-flow and a rate-independent evolution.

Statement 2 (Connection)

A1. Let $\alpha \to \infty$ and $\beta \to 0$, such that $\alpha\beta \to \omega$, for some $\omega > 0$ fixed. Then, we have that, after rescaling, $J_{\alpha,\beta} \to J_Q$, where $J_Q$ is defined in (21); moreover $J_Q(x) = 0$ if and only if $x$ solves (4).

A2. Let $\beta \to \infty$ and choose $\alpha = e^{-\beta A}$, for some $A > 0$ fixed. Then, after an appropriate rescaling, $J_{\alpha,\beta} \to J_{RI}$, where $J_{RI}$ is given in (30); moreover $J_{RI}(x) = 0$ if and only if $x$ is an appropriately defined solution of (5).

The Mosco-convergence stated above also implies that minimizers $x_{\alpha,\beta}$ of $J_{\alpha,\beta}$ converge to the minimizers of $J_Q$ and $J_{RI}$ in the corresponding cases, i.e. that the solutions $x_{\alpha,\beta}$ of (3) converge to the solutions of (4) and (5). Point A1 is proven in Theorem 2, and point A2 in Theorem 5.

Together, Statements 1 and 2 describe a sequential limit process: first, we let $n \to \infty$, and then, we take limits in $\alpha$ and $\beta$. For the quadratic case (A1), we can also combine the limits:

Statement 3 (Combining the limits) Let $n \to \infty$, and take $\beta = \beta_n \sim n^{-\delta}$ for some $0 \le \delta \le 1$; let $\alpha = \alpha_n$ be such that $\alpha_n \beta_n \to \omega$, for some fixed $\omega > 0$.

Fig. 3 Schematic representation of this paper. In the top centre, there is the generator of the Markov process. The arrows starting from $X^n$ represent the limiting behaviour for $n \to \infty$ in different regimes. The centre downwards arrow represents the limit with $\alpha$ and $\beta$ fixed and ends at the rate functional (Statement 1). Statement 3 is represented by the right side of the figure. The limits with $\beta = \beta(n)$ show that the limiting behaviour may be either a Brownian motion with drift (B2a) or a gradient flow characterized by the rate functional $J_Q$ (B1). In the bottom part, there are the two Mosco-limits, representing Statement 2 where (A1) is the right arrow and (A2) the left one. Dashed lines are known results, the thick lines are our contribution, and the dotted line is an open problem

B1. First, let $0 < \delta < 1$. Then, $X^n$ satisfies a large-deviations principle as $n \to \infty$, with rate function $J_Q$; the Markov process $X^n$ has a deterministic limit (4), and this limit minimizes $J_Q$ (as we already mentioned).

B2a. In the case $\delta = 1$, let $\alpha_n \beta_n \to \omega$, $n\beta_n \to 1/h$, for some $\omega, h > 0$; then $X^n$ converges to the process $Y^h$ described by the SDE

$$dY^h_t = -2\omega \nabla E(Y^h_t, t)\,dt + \sqrt{2\omega h}\,dW_t. \tag{6}$$

B2b. The process $Y^h$ in (6) satisfies a large-deviations principle for $h \to 0$ with rate function $J_Q$.

B3. The case $\delta = 0$ corresponds to point A1 of Statement 2, where first $n \to \infty$ and then $\beta \to 0$ and $\alpha \to \infty$.

Point B1 of Statement 3 is given in Theorem 3. Point B2a is given in Theorem 4; point B2b is the well-known result of Freidlin and Wentzell [23, Ch. 4-Th. 1.1], and it is included in Theorem 3.

Remark 1 In this paper, we consider only the one-dimensional case. We make this choice because the main goal of the paper is to show the connection and the interplay between large deviations and gradient flows, and the one-dimensionality allows us to avoid various technical complications. However, the generalization to higher dimension is in some cases just a change in the notation and does not require any relevant modification in some of the proofs. The rate-independent limit in higher dimensions is non-trivial, and it is the object of work in progress.

1.6 Modelling

The choice of this stochastic process is inspired by the noisy evolution of a particle in a wiggly energy landscape. An example could be that of a Brownian particle in an energy landscape of the form $E_n(x, t) = E(x, t) + n^{-1} e(nx)$, where $E$ is the smooth energy introduced above, and $e$ is a fixed periodic function that can be thought of as being of the type present in [1,28]. If the noise is small with respect to the variation of $e$ ($\max e - \min e$), then this Brownian particle will spend most of its time near the wells of $E_n$, which are close to the wells of $e(n\,\cdot)$. Kramers' formula [9,24] provides an estimate of the rate at which the particle jumps from one well to the next; below we show how some approximations lead to the jump rates $r^\pm$ of Fig. 1.

In the wiggly energy $E_n(x, t) = E(x, t) + n^{-1} e(nx)$, assume that $E$ varies slowly in both $x$ and $t$, and that $e$ has period 2 and let $n$ be large. Then, a small patch of the energy landscape of $E_n$ looks like Fig. 4. The height of the energy barriers to the left and right of a well equals $\Delta e := (\max e - \min e)$, plus a perturbation from the smooth energy $E$, which to leading order has size $n^{-1} \nabla E$.

Fig. 4 Global component $E$ perturbs the height of the energy barriers, leading to the formula for $r^\pm$ in Fig. 1

We now assume that the position $Z_t$ of the system solves the SDE

$$dZ_t = -\nabla E_n(Z_t, t)\,dt + \sqrt{\frac{2}{\beta n}}\,dW_t.$$

Here $\beta$ characterizes the noise and can be interpreted as $1/kT$ as usual, although in this case there is an additional scaling factor $n$. For sufficiently large $\beta$, the rates of escape from a well, to the left and to the right, are given by Kramers' law to be approximately

$$a \exp[\beta(-\Delta e \pm \nabla E)], \tag{7}$$

where the minus sign applies to the rate of leaving to the right. Here $a$ is a constant depending on the form of $e$ [9,24]. Writing

$$\alpha = a \exp[-\beta \Delta e], \tag{8}$$

we find the rates $r^\pm$ of Fig. 1.

In this formula for $\alpha$, it appears that $\alpha$ and $\beta$ are coupled. From a modelling point of view, this is true: if one varies the temperature while keeping all other parameters fixed, then both $\beta$ and $\alpha$ will change. Note that the scaling regime $\alpha \sim e^{-\beta A}$ is exactly this case, with $A = \Delta e$. On the other hand, the parameter $a$ in (8) is still free, and this allows us to consider $\alpha$ and $\beta$ as independent parameters when necessary.

Note that by this derivation a natural alternative would be to define the jump rates $r^\pm$ in terms of the energy differences rather than the gradient, i.e.

$$r^\pm(x) = \alpha \exp[-\beta n (E(x \pm 1/n, t) - E(x, t))] \quad \text{(compare to Fig. 1)}.$$

This choice would also render the process time-reversible. However, for the purposes of this paper the difference is mathematically negligible, and we make the choice of Fig. 1 since it leads to simpler and easier-to-read formulas.
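The two variants of the rates can be compared numerically. The sketch below is an illustration under assumptions that are not in the text: the energy is taken time-independent, the function names are hypothetical, and the weight $\pi(x) = \exp(-2\beta n E(x))$ used in the detailed-balance check is a candidate reversible weight derived for this example only.

```python
import math

def rate_gradient(grad_E, alpha, beta, sign):
    """r^{+/-} of Fig. 1: alpha * exp(-/+ beta * grad_E); sign is +1 or -1."""
    return alpha * math.exp(-sign * beta * grad_E)

def rate_difference(E, x, n, alpha, beta, sign):
    """Alternative rate built from the energy difference to the neighbour x + sign/n."""
    return alpha * math.exp(-beta * n * (E(x + sign / n) - E(x)))

def detailed_balance_gap(E, x, n, alpha, beta):
    """Relative mismatch in pi(x) r^+(x) = pi(x + 1/n) r^-(x + 1/n),
    for the assumed weight pi(x) = exp(-2 * beta * n * E(x))."""
    y = x + 1.0 / n
    lhs = math.exp(-2 * beta * n * E(x)) * rate_difference(E, x, n, alpha, beta, +1)
    rhs = math.exp(-2 * beta * n * E(y)) * rate_difference(E, y, n, alpha, beta, -1)
    return abs(lhs - rhs) / max(lhs, rhs)
```

For $E(x) = x^2$ and large $n$, the two rate definitions differ only by a relative error of order $1/n$, while the difference-based rates satisfy the detailed-balance relation up to rounding.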

1.7 Outline

The paper is organized as follows. In Sect. 2, we introduce the concepts of (generalized) gradient flows and of large deviations and we show the connection between the two concepts. In Sect. 3, we prove point A1 of Statement 2 and the whole of Statement 3. Then, in Sect. 4, we introduce the space of functions of bounded variation and rate-independent systems, and we prove point A2 of Statement 2. We end the paper in Sect. 5 with a final discussion.

2 Gradient flows and large deviations

In the introduction, we mentioned that the methods of this paper make use of a certain unity between gradient flows and large-deviations principles: the same functional J that defines the gradient flow also appears as the rate function of a large-deviations principle. We now describe gradient flows, large deviations, and this functional J.

2.1 Gradient flows

Given a $C^1$ energy $E : \mathbb{R} \to \mathbb{R}$, we call a gradient flow of $E$ the flow generated by the equation

$$\dot x = -\nabla E(x).$$

The energy $E$ decreases along a solution, since

$$\frac{d}{dt} E(x(t)) = \nabla E \cdot \dot x = -|\nabla E|^2 = -|\dot x|^2.$$

Adopting the notation $\psi(\xi) = \xi^2/2$ and $\psi^*(\eta) = \eta^2/2$, this identity can be integrated in time to find

$$\int_0^T \big(\psi(\dot x) + \psi^*(-\nabla E)\big)\,dt + E(x(T)) - E(x(0)) = 0.$$
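As a sanity check of the integrated identity, one can evaluate it on an explicit example. The sketch below assumes the specific energy $E(x) = x^2/2$, for which the gradient flow is $x(t) = x_0 e^{-t}$; the function name and the midpoint-rule discretization are choices made here for illustration.

```python
import math

def energy_identity_residual(x0=1.0, T=2.0, steps=20000):
    """Residual of the integrated identity for E(x) = x**2 / 2.

    The exact gradient-flow solution is x(t) = x0 * exp(-t); psi and psi*
    are both the quadratic v -> v**2 / 2. The time integral is approximated
    with the midpoint rule.
    """
    E = lambda x: 0.5 * x * x
    grad_E = lambda x: x
    psi = lambda v: 0.5 * v * v          # psi and psi* coincide here
    x = lambda t: x0 * math.exp(-t)
    x_dot = lambda t: -x0 * math.exp(-t)
    dt = T / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        integral += (psi(x_dot(t)) + psi(-grad_E(x(t)))) * dt
    return integral + E(x(T)) - E(x(0.0))
```

The returned residual vanishes up to the quadrature error, in agreement with the identity.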

In this paper, we study a generalized concept of gradient flow, considering the energy equality for a broader class of couples $\psi$, $\psi^*$. We will allow the energy to be also dependent on time. We recall the definition of the Legendre transform: given $\psi : \mathbb{R} \to \mathbb{R}$, we define the transform $\psi^*$ as

$$\psi^*(w) := \sup_{v} \{v \cdot w - \psi(v)\}.$$

In the following, apart from the rate-independent case, we will assume that $\psi \in C^1(\mathbb{R})$ is symmetric, superlinear, and strictly convex, so that its Legendre transform $\psi^*$ will share the same properties as well.

A curve $x : [0, T] \to \mathbb{R}$ is an absolutely continuous curve, i.e. $x \in AC(0, T)$, if for every $\varepsilon > 0$, there exists a $\delta > 0$ such that every finite sequence of pairwise disjoint intervals $(t_j, \tau_j) \subset [0, T]$ satisfying $\sum_j |t_j - \tau_j| < \delta$ also satisfies $\sum_j |x(t_j) - x(\tau_j)| < \varepsilon$. The space $AC(0, T)$ coincides with the Sobolev space $W^{1,1}(0, T)$ [10, Ch. 8].

Definition 1 (Generalized gradient flow) Given an energy $E \in C^1(\mathbb{R} \times [0, T])$, a convex dissipation potential $\psi \in C^1(\mathbb{R})$ with $\psi(v) = \psi(-v)$, let $\psi^*$ be its Legendre transform. Then, a curve $x \in AC(0, T)$ is a (generalized) gradient flow of $E$ with dissipation potential $\psi$ in a given time interval $[0, T]$, if it satisfies the energy identity

$$\int_0^T \big(\psi(\dot x(t)) + \psi^*(-\nabla E(x(t), t))\big)\,dt + E(x(T), T) - E(x(0), 0) - \int_0^T \partial_t E(x(t), t)\,dt = 0. \tag{9}$$

Note that the left-hand side of (9) is non-negative for any function $x$, since

$$\frac{d}{dt} E(x(t), t) = \nabla E(x(t), t)\,\dot x(t) + \partial_t E(x(t), t) \ge -\psi(\dot x(t)) - \psi^*(-\nabla E(x(t), t)) + \partial_t E(x(t), t).$$

From this inequality, one deduces that equality in (9), as required by Definition 1, implies that for almost all $t \in [0, T]$

$$\dot x(t) = \partial\psi^*(-\nabla E(x(t), t)), \tag{10}$$

where $\partial\psi^*$ is the subdifferential of $\psi^*$. We will not use this form of Eq. (10); the arguments of this paper are based on Definition 1 instead.

Existence and uniqueness of classical gradient-flow solutions in R (i.e. with quadratic f) follows from classical ODE theory; in recent years, the theory has been extended to metric spaces and spaces of probability measures [5].

In our case, we will require the energy to satisfy the following conditions

$$E \in C^1(\mathbb{R} \times [0, T]), \quad E \ge 0, \qquad |\nabla E| \le R < \infty, \qquad \nabla E \text{ is uniformly Lipschitz continuous in } t. \tag{11}$$

Remark 2 In this paper, we analyse generalized gradient flows with $\psi$ of Definition 1 chosen to be strictly convex and superlinear at infinity. The rate-independent evolution (5) is formally the case of a non-strictly convex, 1-homogeneous dissipation potential $\psi$, and for this case, there are several natural ways to define a rigorous solution concept. In Sect. 4, we show how generalized gradient flows for finite $\alpha$ and $\beta$, with strictly convex $\psi$, converge to a specific rigorous rate-independent solution concept, the so-called BV solutions [31,32]. We define this concept in Definition 3.

Returning to the unity between gradient flows and large deviations, for generalized gradient flows the functional J mentioned before is the left-hand side of (9).

2.2 Large deviations

'Large deviations' of a random variable are rare events, and large-deviations theory characterizes the rarity of certain rare events for a sequence of random variables. Let $\{X^n\}$ be such a sequence of random variables with values in some metric space $S$.

Definition 2 ([46]) $\{X^n\}$ satisfies a large-deviations principle (LDP) with speed $a_n \to \infty$, if there exists a lower semicontinuous function $J : S \to [0, \infty]$ with compact sublevel sets such that for each open set $O$,

$$\liminf_{n \to \infty} \frac{1}{a_n} \log P(X^n \in O) \ge -\inf_{x \in O} J(x),$$

and for each closed set $C$

$$\limsup_{n \to \infty} \frac{1}{a_n} \log P(X^n \in C) \le -\inf_{x \in C} J(x).$$

The function $J$ is called the rate function for the large-deviations principle.

Intuitively, the two inequalities above state that

$$\mathrm{Prob}(X^n \approx x) \sim e^{-a_n J(x)},$$

where we purposefully use the vague notations $\approx$ and $\sim$; the rigorous versions of these symbols are exactly given by Definition 2.

Remark 3 Typically, the rate function for Markov processes contains a term $I_0$ characterizing the large deviations of the initial state $X^n(0)$. In the following, we will always assume that the starting point will be fixed, or at least that $X^n(0) \to x_0$, so that $I_0(x)$ equals 0 if $x = x_0$, and $+\infty$ otherwise, so we will disregard $I_0$.

2.3 The Feng-Kurtz method

Feng and Kurtz created a general method to prove large-deviations principles for Markov processes [20]. The method provides both a formal method to calculate the rate functional and a rigorous framework to prove the large-deviations principle. Here we present only the formal calculation.

Consider a sequence of Markov processes $\{X^n\}$ in $\mathbb{R}$, which we take time-invariant for the moment, and consider the corresponding evolution semigroups $\{S_n(t)\}$ defined by [see also (13)]

$$S_n(t) f(x) = \mathbb{E}[f(X^n(t)) \mid X^n(0) = x], \quad f \in C_b(\mathbb{R}),$$

satisfying

$$\frac{d}{dt} S_n(t) f = \Omega_n S_n(t) f, \quad S_n(0) f = f,$$

where $\Omega_n$ is the generator of $X^n$. For any time interval $[0, T]$, where $T$ may be infinite, $X^n(\cdot)$ is an element of the Skorokhod space $D([0, T])$, the space of cadlag functions (right continuous and with limit from left). To obtain the rate functional, we define the nonlinear generator

$$(H_n f)(x) := \frac{1}{a_n} e^{-a_n f(x)} \big(\Omega_n e^{a_n f}\big)(x).$$

If $H_n \to H$ in some sense, and if $Hf$ depends locally on $\nabla f$, we then define the Hamiltonian $\mathcal{H}(x, p)$ through

$$Hf(x) =: \mathcal{H}(x, \nabla f(x)).$$

By computing the Legendre transform of $\mathcal{H}(x, p)$, we obtain the Lagrangian

$$L(x, \dot x) = \sup_{p} \{\dot x \cdot p - \mathcal{H}(x, p)\}.$$

The Feng-Kurtz method then states, formally, that $\{X^n\}$ satisfies a large-deviations principle in $D([0, T])$ with speed $a_n$, with a rate function

$$J(x) = \begin{cases} \int_0^T L(x, \dot x)\,dt & \text{if } x \in AC(0, T), \\ +\infty & \text{otherwise.} \end{cases}$$

In the book [20], a general method is described to make this algorithm rigorous.

2.4 Large deviations of $X^n$

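We now apply this method to the process $X^n$ described in the introduction. It is a continuous-time Markov chain, defined by its generator

$$\Omega_n f(x) := n\alpha e^{-\beta \nabla E(x,t)} \Big(f\big(x + \tfrac{1}{n}\big) - f(x)\Big) + n\alpha e^{\beta \nabla E(x,t)} \Big(f\big(x - \tfrac{1}{n}\big) - f(x)\Big). \tag{12}$$

For $f \in C_b(\mathbb{R})$, the expected value $\mathbb{E}$ is defined as

$$\mathbb{E}(f(X^n_t) \mid X^n_0) = \int_{\mathbb{R}} f(z)\,d\mu^n_t(z), \tag{13}$$

with, denoting by $\Omega_n^T$ the adjoint of $\Omega_n$,

$$\partial_t \mu^n_t = \Omega_n^T \mu^n_t, \qquad \mu^n_0 = \delta_{X^n_0}, \tag{14}$$

where $\mu^n_t$ is the law at time $t$ of the process $X^n$ started at the position $X^n_0$ at time $t = 0$. Under the condition (11), the martingale problem (14) is well posed, since the operator $\Omega_n$ is bounded in the uniform topology. The rigorous proof of Statement 1 consists of the following theorem.

Theorem 1 (Large-deviations principle for $\Omega_n$) Let $E : \mathbb{R} \times [0, T] \to \mathbb{R}$ satisfy condition (11). Consider the sequence of Markov processes $\{X^n\}$ with generator $\Omega_n$ defined in (12) and with $X^n(0)$ converging to $x_0$ for $n \to \infty$. Then, the sequence $X^n$ satisfies a large-deviations principle in $D([0, T])$ with speed $n$ and rate function

$$J_{\alpha,\beta}(x) := \begin{cases} \beta \int_0^T \big(\psi_{\alpha,\beta}(\dot x) + \psi^*_{\alpha,\beta}(\nabla E) + \dot x \nabla E\big)\,dt & \text{for } x \in AC(0, T), \\ +\infty & \text{otherwise,} \end{cases} \tag{15}$$

where

$$\psi_{\alpha,\beta}(v) = \frac{v}{\beta} \log\Big(\frac{v + \sqrt{v^2 + 4\alpha^2}}{2\alpha}\Big) - \frac{1}{\beta}\sqrt{v^2 + 4\alpha^2} + \frac{2\alpha}{\beta},$$

$$\psi^*_{\alpha,\beta}(w) = \frac{2\alpha}{\beta}\big(\cosh(\beta w) - 1\big).$$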

The proof can be found in [23, Ch. 5-Th. 2.1] when the energy E is independent of time. In the general case of a time-dependent energy, the proof follows considering a space-time process, as shown in the proof of Theorem 3.

Note that solutions of $J_{\alpha,\beta}(x) = 0$ satisfy the gradient-flow Eq. (10), which in this case indeed is Eq. (3),

$$\dot x = -2\alpha \sinh(\beta \nabla E(x, t)).$$
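The claim that the zero set of the rate function coincides with the solutions of Eq. (3) can be probed pointwise. The sketch below evaluates the integrand $\psi_{\alpha,\beta}(v) + \psi^*_{\alpha,\beta}(-g) + v g$ for a given velocity $v$ and gradient value $g$; the function names and the sample parameter values are assumptions made for illustration.

```python
import math

def psi(v, alpha, beta):
    """Dissipation potential psi_{alpha,beta} from Theorem 1."""
    root = math.sqrt(v * v + 4.0 * alpha * alpha)
    return (v / beta) * math.log((v + root) / (2.0 * alpha)) - root / beta + 2.0 * alpha / beta

def psi_star(w, alpha, beta):
    """Dual potential psi*_{alpha,beta}(w) = (2 alpha / beta)(cosh(beta w) - 1)."""
    return (2.0 * alpha / beta) * (math.cosh(beta * w) - 1.0)

def integrand(v, g, alpha, beta):
    """psi(v) + psi*(-g) + v*g; non-negative by the Young inequality."""
    return psi(v, alpha, beta) + psi_star(-g, alpha, beta) + v * g

def flow_velocity(g, alpha, beta):
    """Right-hand side of Eq. (3) for a gradient value g."""
    return -2.0 * alpha * math.sinh(beta * g)
```

For any gradient value `g`, `integrand(flow_velocity(g, alpha, beta), g, alpha, beta)` vanishes up to rounding, while other velocities give a strictly positive value.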

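In the remainder of this paper, we will consider sequences in $\alpha$ and $\beta$; to reduce notation, we will drop the double index, writing $\psi_\beta$ and $\psi^*_\beta$ for $\psi_{\alpha,\beta}$ and $\psi^*_{\alpha,\beta}$; similarly, we define the rescaled functional $J_\beta$,

$$J_\beta(x) := \frac{1}{\beta} J_{\alpha,\beta}(x) = \int_0^T \big(\psi_\beta(\dot x(t)) + \psi^*_\beta(-\nabla E(x(t), t)) + \dot x(t) \nabla E(x(t), t)\big)\,dt, \tag{17}$$

for $x \in AC(0, T)$ and $J_\beta(x) = +\infty$ otherwise.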

Remark 4 The theorem above shows how the rate function $J_{\alpha,\beta}$ can be interpreted as defining a generalized gradient flow. This illustrates the structure of the fairly widespread connection between gradient flows and large deviations: in many systems, the rate function not only defines the gradient-flow evolution, through its zero set, but the components of the gradient flow ($E$ and $\psi$) can be recognized in the rate function. This connection is explored more generally in [30] as we describe in the following. Define a so-called L-function $\mathcal{L}(z, s)$, positive, convex in $s$ for all $z$ and inducing an evolution equation. The authors of [30, Lemma 2.1 and Prop. 2.2] show that if $D_s \mathcal{L}(z, 0)$ is an exact differential, say $DS(z)$, then it is possible to write $\mathcal{L}$ as

$$\mathcal{L}(z, s) = \Psi(z, s) + \Psi^*(z, -DS(z)) + \langle DS(z), s \rangle, \tag{18}$$

where $\Psi^*$ can be expressed in terms of the Legendre transform $\mathcal{H}(z, \xi)$ of $\mathcal{L}$ as

$$\Psi^*(z, \xi) := \mathcal{H}(z, DS(z) + \xi) - \mathcal{H}(z, DS(z)).$$

Applying the same procedure to our case, with $\mathcal{L} = L$ defined in (19), we obtain after some calculations that

$$DS(z) = \frac{1}{2} \log \frac{r^-(z)}{r^+(z)}.$$

Substituting in $DS(z)$ our choice for $r^+$ and $r^-$, namely

$$r^+ = \alpha e^{-\beta \nabla E(z)}, \qquad r^- = \alpha e^{\beta \nabla E(z)},$$

it follows that $S(z) = \beta E(z)$.

2.5 Calculating the large-deviations rate functional for (12)

We conclude this section by calculating the rate function for the simpler situation when the jump rates $r^\pm$ are constant in space and time, as it is shown in the introduction of [20]. This formally proves Theorem 1, substituting in the end the expression of $r^\pm$ from (12). With constant jump rates, the generator reduces to

$$\Omega_n f(x) = n r^+ \Big(f\big(x + \tfrac{1}{n}\big) - f(x)\Big) + n r^- \Big(f\big(x - \tfrac{1}{n}\big) - f(x)\Big),$$

and for $n \to \infty$ it converges to $\Omega f(x) = (r^+ - r^-) \nabla f(x)$. As we said in the introduction, the process $X^n$ has a deterministic limit, i.e. $X^n \to x$ a.s., with $\dot x = r^+ - r^-$.

In order to calculate the rate functional, we compute the nonlinear generator and the limiting Hamiltonian and Lagrangian. We have

$$H_n f(x) = r^+ \big(e^{n(f(x + 1/n) - f(x))} - 1\big) + r^- \big(e^{n(f(x - 1/n) - f(x))} - 1\big),$$

so that

$$\lim_{n \to \infty} H_n f(x) = \mathcal{H}(x, p) = r^+ \big(e^{p} - 1\big) + r^- \big(e^{-p} - 1\big), \qquad p = \nabla f(x).$$

We then obtain by an explicit calculation the Lagrangian

$$L(x, \dot x) = \dot x \log\Big(\frac{\dot x + \sqrt{\dot x^2 + 4 r^+ r^-}}{2 r^+}\Big) - \sqrt{\dot x^2 + 4 r^+ r^-} + r^+ + r^-, \tag{19}$$

and substituting $r^+$ and $r^-$ with the corresponding ones from (12) we get

$$L(x, \dot x) = \beta \big(\psi_{\alpha,\beta}(\dot x) + \psi^*_{\alpha,\beta}(\nabla E) + \dot x \nabla E\big),$$

and we formally have proved Theorem 1. Since $\psi^*_{\alpha,\beta}$ is even, $\psi^*_{\alpha,\beta}(\nabla E) = \psi^*_{\alpha,\beta}(-\nabla E)$, in agreement with (17).
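The explicit Lagrangian can be cross-checked against a brute-force Legendre transform of the Hamiltonian. The sketch below does this for constant rates; the function names, the grid bounds, and the grid resolution are assumptions of this example.

```python
import math

def hamiltonian(p, r_plus, r_minus):
    """H(p) = r+ (e^p - 1) + r- (e^{-p} - 1) for constant jump rates."""
    return r_plus * (math.exp(p) - 1.0) + r_minus * (math.exp(-p) - 1.0)

def lagrangian(v, r_plus, r_minus):
    """Closed-form Legendre transform of H, as in Eq. (19)."""
    root = math.sqrt(v * v + 4.0 * r_plus * r_minus)
    return v * math.log((v + root) / (2.0 * r_plus)) - root + r_plus + r_minus

def lagrangian_numeric(v, r_plus, r_minus, p_max=5.0, steps=10001):
    """Brute-force sup of v*p - H(p) over a uniform grid of p in [-p_max, p_max]."""
    best = -math.inf
    for i in range(steps):
        p = -p_max + 2.0 * p_max * i / (steps - 1)
        best = max(best, v * p - hamiltonian(p, r_plus, r_minus))
    return best
```

The closed form agrees with the grid search whenever the maximizing $p$ lies inside the grid, and it vanishes at the deterministic velocity $v = r^+ - r^-$.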

3 The quadratic limit

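In this section, we precisely state and prove point A1 of Statement 2 and the whole of Statement 3. We are in the regime where $\beta \to 0$, $\alpha \to \infty$, with $\alpha\beta \to \omega$.

First, we show heuristically why the functional $J_\beta$ defined in (17) is expected to converge to $J_Q$ defined in (21). Looking at the equation that minimizes the functional $J_\beta$, and doing a Taylor expansion for $\beta \ll 1$,

$$\dot x = -2\alpha \sinh(\beta \nabla E) \approx -2\alpha\beta \nabla E \to -2\omega \nabla E.$$

Considering the functional $J_\beta$ for $\beta \ll 1$, it can be seen that

$$\psi^*_\beta(w) = \frac{2\alpha}{\beta}\big(\cosh(\beta w) - 1\big) \approx \alpha\beta w^2 \to \omega w^2,$$

$$\psi_\beta(v) = \frac{v}{\beta} \log\Big(\frac{v + \sqrt{v^2 + 4\alpha^2}}{2\alpha}\Big) - \frac{1}{\beta}\sqrt{v^2 + 4\alpha^2} + \frac{2\alpha}{\beta} \approx \frac{v^2}{4\alpha\beta},$$

implying that

$$\psi^*_\beta(w) \to \omega w^2, \qquad \psi_\beta(v) \to \frac{v^2}{4\omega}.$$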
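The heuristic limits of the two potentials can be observed numerically. The following sketch fixes the scaling $\alpha = \omega/\beta$ (an assumption of this example; the text only requires $\alpha\beta \to \omega$) and measures the distance to the quadratic limits; the function names are hypothetical.

```python
import math

def psi(v, alpha, beta):
    """psi_beta(v) as in Theorem 1."""
    root = math.sqrt(v * v + 4.0 * alpha * alpha)
    return (v / beta) * math.log((v + root) / (2.0 * alpha)) - root / beta + 2.0 * alpha / beta

def psi_star(w, alpha, beta):
    """psi*_beta(w) = (2 alpha / beta)(cosh(beta w) - 1)."""
    return (2.0 * alpha / beta) * (math.cosh(beta * w) - 1.0)

def quadratic_limit_errors(v, w, omega, beta):
    """Distances to the quadratic limits under the scaling alpha = omega / beta.

    Returns (|psi_beta(v) - v^2/(4 omega)|, |psi*_beta(w) - omega w^2|).
    """
    alpha = omega / beta
    err_psi = abs(psi(v, alpha, beta) - v * v / (4.0 * omega))
    err_psi_star = abs(psi_star(w, alpha, beta) - omega * w * w)
    return err_psi, err_psi_star
```

Both errors shrink as `beta` decreases, for example when going from `beta = 1e-1` to `beta = 1e-3` with `omega = 1.0`.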

We now turn to the rigorous proof of the convergence to the quadratic gradient flow and therefore point A1 of Statement 2. For this, we need the concept of Mosco-convergence. Given a sequence of functionals $\Phi_n$ and $\Phi$ defined on a space $X$ with weak and strong topology, $\Phi_n$ is said to Mosco-converge to $\Phi$ ($\Phi_n \xrightarrow{M} \Phi$) in the weak-strong topology of $X$ if

$$\begin{aligned} &\forall\, x_n \rightharpoonup x \text{ weakly}, && \liminf_{n \to \infty} \Phi_n(x_n) \ge \Phi(x), \\ &\forall\, x\ \exists\, x_n \to x \text{ strongly such that} && \limsup_{n \to \infty} \Phi_n(x_n) \le \Phi(x). \end{aligned} \tag{20}$$

The gradient-flow Definition 1 is based on the function space $X := AC(0, T)$, and we define weak and strong topologies on the space $AC(0, T)$ by using the equivalence with $W^{1,1}(0, T)$. Let $x, x_n \in AC(0, T)$. We say that $x_n$ converges weakly to $x$ ($x_n \rightharpoonup x$) if $x_n \to x$ strongly in $L^1(0, T)$ and $\dot x_n \rightharpoonup \dot x$ weakly in $L^1(0, T)$, i.e. in $\sigma(L^1, L^\infty)$; we say that $x_n$ converges strongly to $x$ ($x_n \to x$) if in addition $\dot x_n \to \dot x$ strongly in $L^1(0, T)$.

Theorem 2 (Convergence to the quadratic limit) Given $E : \mathbb{R} \times [0, T] \to \mathbb{R}$ satisfying condition (11), for $x \in AC(0, T)$ consider the functional $J_\beta$

$$J_\beta(x) = \int_0^T \big(\psi_\beta(\dot x(t)) + \psi^*_\beta(-\nabla E(x(t), t)) + \dot x(t) \nabla E(x(t), t)\big)\,dt,$$

then, for $\alpha \to \infty$, $\beta \to 0$, and $\alpha\beta \to \omega > 0$, $J_\beta \xrightarrow{M} J_Q$ in the weak-strong topology of $AC(0, T)$, with

$$J_Q(x) := \int_0^T \Big(\frac{\dot x^2(t)}{4\omega} + \omega (\nabla E)^2(x(t), t) + \dot x(t) \nabla E(x(t), t)\Big)\,dt. \tag{21}$$

Moreover, if a sequence $\{x_\beta\}$ is such that $J_\beta(x_\beta)$ is bounded, and

$$E(x_\beta(0), 0) + \int_0^T \partial_t E(x_\beta(s), s)\,ds \le C \quad \forall \beta,$$

then the sequence $\{\dot x_\beta\}$ is relatively compact in the topology $\sigma(L^1, L^\infty)$.

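Proof First, we prove the Mosco-convergence. The limsup condition follows because, for $\beta \to 0$

$$\psi_\beta(\eta) \to \frac{\eta^2}{4\omega}, \qquad \psi^*_\beta(\xi) \to \omega \xi^2, \qquad \text{locally uniformly.}$$

By the local uniform convergence, we can choose the recovery sequence to be the trivial one.

Now we prove the liminf inequality. The uniform convergence of $x_\beta$ to $x$ implies that we can pass to the limit in the terms $E(x_\beta(\cdot), \cdot)$ and $\int \partial_t E(x_\beta, t)\,dt$. Then, applying Fatou's lemma, we find

$$\liminf_{\beta \to 0} \int_0^T \frac{2\alpha}{\beta}\big(\cosh(\beta \nabla E(x_\beta, t)) - 1\big)\,dt \ge \liminf_{\beta \to 0} \int_0^T \alpha\beta (\nabla E)^2(x_\beta, t)\,dt \ge \int_0^T \omega (\nabla E)^2(x, t)\,dt,$$

where we used the inequality $2\cosh(\theta) \ge 2 + \theta^2$.

Some algebraic manipulations are needed to estimate the lim inf for $\psi_\beta$. Let $v \ge 0$

$$\begin{aligned} \psi_\beta(v) &= \frac{v}{\beta} \log\Big(\frac{v + \sqrt{v^2 + 4\alpha^2}}{2\alpha}\Big) - \frac{1}{\beta}\sqrt{v^2 + 4\alpha^2} + \frac{2\alpha}{\beta} \\ &= \frac{v}{\beta} \log\Big(\frac{v}{2\alpha} + \sqrt{\frac{v^2}{4\alpha^2} + 1}\Big) - \frac{2\alpha}{\beta}\sqrt{\frac{v^2}{4\alpha^2} + 1} + \frac{2\alpha}{\beta} \\ &\ge \frac{v}{\beta}\Big(\frac{v}{2\alpha} - \frac{v^3}{48\alpha^3}\Big) - \frac{2\alpha}{\beta}\Big(1 + \frac{v^2}{8\alpha^2}\Big) + \frac{2\alpha}{\beta} = \frac{v^2}{4\alpha\beta} - \frac{v^4}{48\alpha^3\beta}, \end{aligned}$$

where we used that

$$\log\big(v + \sqrt{v^2 + 1}\big) \ge v - \frac{v^3}{6}, \quad \text{and} \quad \sqrt{1 + v} \le 1 + \frac{v}{2}.$$

Then, considering that $\alpha^3\beta \to \infty$, we get

$$\liminf_{\beta \to 0} \int_0^T \psi_\beta(\dot x_\beta)\,dt \ge \liminf_{\beta \to 0} \int_0^T \Big(\frac{\dot x_\beta^2}{4\alpha\beta} - \frac{\dot x_\beta^4}{48\alpha^3\beta}\Big)\,dt \ge \int_0^T \frac{\dot x^2}{4\omega}\,dt,$$

and we conclude.

Now we prove the compactness.

Let us suppose that $J_\beta(x_\beta)$ and $E(x_\beta(0), 0) + \int_0^T \partial_t E(x_\beta, t)\,dt$ are bounded. Then, by the positivity of $\psi^*_\beta$,

$$\int_0^T \psi_\beta(\dot x_\beta)\,dt \le C < \infty, \quad \forall \beta.$$

With the choice $\beta = 1$, we have

$$\int_0^T \psi_1(\dot x_\beta)\,dt \le C \quad \forall \beta < 1,$$

and the compactness of $\{\dot x_\beta\}$ in $\sigma(L^1, L^\infty)$ follows from the Dunford-Pettis theorem (e.g. [10, Th. 4.30]). □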

Note that this result can also be obtained by the abstract method of Mielke [29, Th. 3.3]. The result can also be formulated in the weak-strict convergence of BV (see the definition after Eq. (25)): for the lower semicontinuity, this follows since weak convergence in BV with bounded $J_\beta$ implies weak convergence in AC, and for the recovery sequence, it follows from our choice of the trivial sequence.

We end this section with two theorems completing the proof of Statement 3, pictured in the right-hand side of Fig. 3.

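First, we define for each $h > 0$ the SDE

\[ dY_t^h = -2\omega\nabla E(Y_t^h,t)\,dt + \sqrt{2\omega h}\,dW_t, \tag{22} \]

where $W_t$ is the Brownian motion on $\mathbb{R}$, $Y_0^h$ has law $\delta_{x_0}$, and its generator $\Omega$ is defined as

\[ \Omega f(x) = -2\omega\nabla E(x,t)\nabla f(x) + \omega h\,\Delta f(x). \tag{23} \]

For $f \in C_b(\mathbb{R})$, the expected value $\mathbb{E}$ is defined as

\[ \mathbb{E}\big(f(Y_t^h)\,|\,x_0\big) = \int f(z)\,d\mu_t(z), \]

with, denoting by $\Omega^T$ the adjoint of $\Omega$,

\[ \partial_t\mu_t = \Omega^T\mu_t = 2\omega\nabla\cdot(\mu_t\nabla E) + \omega h\,\Delta\mu_t, \qquad \mu_0 = \delta_{x_0}, \]

where $\mu_t$ is the law at time $t$ of the process $Y^h$ started at the position $x_0$ at time $t = 0$. Then, the following theorems hold.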
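To make the roles of the drift $-2\omega\nabla E$ and of the noise intensity $\sqrt{2\omega h}$ in (22) concrete, here is a minimal Euler-Maruyama sketch. It is an illustration under stated assumptions, not part of the argument: the function name, the step count and the example energy $E(x,t) = x^2/2$ are hypothetical choices.

```python
import math
import random

def euler_maruyama(grad_E, x0, omega, h, T, n_steps, rng=None):
    """Simulate dY = -2*omega*grad_E(Y, t) dt + sqrt(2*omega*h) dW on [0, T].

    grad_E(y, t) is the spatial derivative of the energy (user supplied).
    Returns the list of n_steps + 1 path values, starting at x0.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    dt = T / n_steps
    noise_scale = math.sqrt(2.0 * omega * h * dt)  # std of one noise increment
    y, t = x0, 0.0
    path = [y]
    for _ in range(n_steps):
        y += -2.0 * omega * grad_E(y, t) * dt + noise_scale * rng.gauss(0.0, 1.0)
        t += dt
        path.append(y)
    return path

if __name__ == "__main__":
    # Hypothetical energy E(x, t) = x^2 / 2, so grad_E(x, t) = x.
    for h in (1e-1, 1e-2, 0.0):
        path = euler_maruyama(lambda y, t: y, 1.0, 0.5, h, 1.0, 10_000)
        print(h, path[-1])
```

For $h = 0$ the scheme reduces to the explicit Euler method for the gradient flow $\dot x = -2\omega\nabla E(x,t)$, which is the path around which the law of $Y^h$ concentrates as $h \to 0$.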

Theorem 3 (Large deviations for the processes $X^n$ and $Y^h$) Given an energy $E$ satisfying condition (11), fix $0 < \delta < 1$ and consider the sequence of processes $\{X^n\}$ with generator $\Omega_n$ defined in (12) with $\beta = \beta_n = n^{-\delta}$ and $\alpha\beta = \alpha_n\beta_n \to \omega$ for $n \to \infty$. Then, if $X^n(0) \to x_0$, the process $X^n$ satisfies a large-deviations principle in $D([0,T])$ with speed $n^{1-\delta}$ and with rate function the extension of $J_Q$ in (21) to BV:

\[ J_Q(x) := \int_0^T \Big( \frac{\dot x^2(t)}{4\omega} + \omega(\nabla E)^2(x(t),t) + \dot x(t)\,\nabla E(x(t),t) \Big)\,dt \]

when $x \in AC(0,T)$, and $J_Q(x) = +\infty$ otherwise.

Moreover, as $h \to 0$ the process $Y^h$ defined in (22) satisfies a large-deviations principle in $D([0,T])$ with speed $h^{-1}$ and also with rate function $J_Q$.

Proof The proof of the large-deviations principle for Xn relies on the fulfilment of three conditions, namely convergence of the operators Hn, exponential tightness for the sequence of processes Xn, and the comparison principle for the limiting operator H, following the steps of [20, Sec. 10.3].

We restrict ourselves, for the sake of simplicity, to the case $\alpha = \omega/\beta$, and let $m = n^{1-\delta}$. To treat the time dependence, we use the standard procedure of converting a time-dependent process into a time-independent process by adding the time to the state variable (see e.g. [19, Sec. 4.7]): consider the variable $u = (x,t) \in \mathbb{R} \times [0,T]$, $f \in C_c^2(\mathbb{R} \times [0,T])$, and given $\Omega_n$ defined in (12), we define $Q_n$ as

\[ Q_n f(u) = \Omega_n f(x,t) + \partial_t f(x,t). \tag{24} \]

With $m = n^{1-\delta}$, we have

\[ H_n f(u) = \frac{1}{m} e^{-mf(u)}\, Q_n e^{mf}(u) = \frac{1}{m} e^{-mf(u)}\, \Omega_n e^{mf}(u) + \partial_t f(u). \]

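Now, with the convention $u + 1/n = (x + 1/n, t)$,

\[
\begin{aligned}
H_n f(u) &= \omega n^{2\delta}\Big[ e^{-n^{-\delta}\nabla E(u)}\big(e^{n^{1-\delta}(f(u+1/n)-f(u))} - 1\big) + e^{n^{-\delta}\nabla E(u)}\big(e^{n^{1-\delta}(f(u-1/n)-f(u))} - 1\big) \Big] + \partial_t f \\
&= \omega n^{2\delta}\Big[ \big(1 - n^{-\delta}\nabla E + o(n^{-\delta})\big)\big(n^{-\delta}\nabla f(u) + n^{-2\delta}\tfrac12(\nabla f)^2(u) + o(n^{-2\delta})\big) \\
&\qquad\quad + \big(1 + n^{-\delta}\nabla E + o(n^{-\delta})\big)\big(-n^{-\delta}\nabla f(u) + n^{-2\delta}\tfrac12(\nabla f)^2(u) + o(n^{-2\delta})\big) \Big] + \partial_t f \\
&= -2\omega\nabla E(u)\nabla f(u) + \omega(\nabla f)^2(u) + \partial_t f(u) + o(1),
\end{aligned}
\]

implying convergence in the uniform topology,

\[ \lim_{n\to\infty} \|H_n f - Hf\|_\infty = 0, \]

\[ Hf(u) = -2\omega\nabla E(u)\nabla f(u) + \omega(\nabla f)^2(u) + \partial_t f(u). \]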
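The convergence of the nonlinear generators can be observed numerically for the spatial part (dropping the $\partial_t f$ term, which is common to both sides). The following sketch assumes the rescaled jump rates $\omega n^{\delta} e^{\mp n^{-\delta}\nabla E}$ implicit in the expansion above; the function names and the test functions are hypothetical.

```python
import math

def H_n(f, grad_E, x, n, delta, omega):
    """Spatial part of the nonlinear generator H_n f at the point x."""
    a = n ** (-delta)        # beta_n = n^(-delta)
    m = n ** (1.0 - delta)   # large-deviations speed
    g = grad_E(x)
    # expm1 keeps accuracy for the small exponents m * (f(x +- 1/n) - f(x))
    up = math.expm1(m * (f(x + 1.0 / n) - f(x)))
    down = math.expm1(m * (f(x - 1.0 / n) - f(x)))
    return omega * n ** (2.0 * delta) * (math.exp(-a * g) * up + math.exp(a * g) * down)

def H_limit(df, grad_E, x, omega):
    """Limit Hamiltonian -2*omega*grad_E*f' + omega*(f')^2 at the point x."""
    p = df(x)
    return -2.0 * omega * grad_E(x) * p + omega * p * p

if __name__ == "__main__":
    grad_E = lambda x: x  # hypothetical energy E(x) = x^2 / 2
    for n in (10**2, 10**4, 10**6):
        approx = H_n(math.sin, grad_E, 0.3, n, 0.5, 1.0)
        print(n, approx, H_limit(math.cos, grad_E, 0.3, 1.0))
```

With $\delta = 1/2$ the discrepancy decays roughly like $n^{-1/2}$, in line with the $o(1)$ remainder in the expansion.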

The exponential tightness holds by [20, Cor. 4.17]. The comparison principle can be proved as in [20, Example 6.11], modifying the definition of the auxiliary function used in the cited example with an additional time-dependent term,

\[ \Phi_n(x,y,t,\tau) = f(x,t) - f(y,\tau) - n\,\frac{(x-y)^2}{1+(x-y)^2} - n(t-\tau)^2, \]

and then the proof, mutatis mutandis, follows similarly.

Then, the large-deviations principle holds in $D_{\mathbb{R}\times[0,T]}([0,T])$ with rate functional

\[ \tilde J_Q(u) := \int_0^T L(x,\dot x,t,\dot t)\,ds, \]

where $L$ is the Legendre transform of $H$ with respect to the variables $(\nabla f, \partial_t f)$. It is just a calculation to check that

\[ L(x,\dot x,t,\dot t) = \frac{\dot x^2(s)}{4\omega} + \nabla E(x(s),t(s))\,\dot x(s) + \omega(\nabla E)^2(x(s),t(s)) + I_1(\dot t(s)), \]

where $I_1$ is the indicator function of the set $\{1\}$, i.e.

\[ I_1(\dot t(s)) = \begin{cases} 0 & \dot t(s) = 1, \\ +\infty & \text{otherwise.} \end{cases} \]

It is then clear that $\tilde J_Q(u) = J_Q(x)$.
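For completeness, the Legendre transform can be carried out explicitly. Writing $p$ for $\nabla f$ and $q$ for $\partial_t f$ (notation introduced only for this computation), the limiting Hamiltonian reads $H(p,q) = -2\omega\nabla E\,p + \omega p^2 + q$, and the supremum separates:

\[
\begin{aligned}
L(x,\dot x,t,\dot t) &= \sup_{p,q}\big\{ p\,\dot x + q\,\dot t - H(p,q) \big\} \\
&= \sup_{p}\big\{ p(\dot x + 2\omega\nabla E) - \omega p^2 \big\} + \sup_{q}\, q(\dot t - 1) \\
&= \frac{(\dot x + 2\omega\nabla E)^2}{4\omega} + I_1(\dot t)
 = \frac{\dot x^2}{4\omega} + \dot x\,\nabla E + \omega(\nabla E)^2 + I_1(\dot t).
\end{aligned}
\]

The first supremum is attained at $p = (\dot x + 2\omega\nabla E)/(2\omega)$; the second is $0$ when $\dot t = 1$ and $+\infty$ otherwise, which is exactly the indicator term.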

The large-deviations result for Yh can be found in [23, Th. 1.1 of Ch. 4] in the case of a time-independent energy. The time-dependent case follows by the same modification as above. □

Theorem 4 (Convergence to Brownian motion with gradient drift) Let an energy $E$ be given satisfying condition (11), with $\nabla E$ uniformly continuous, let $\alpha_n\beta_n \to \omega$, $n\beta_n \to 1/h$, and let $\mu^n$ be the law of the process $\{X^n(t)\}$ defined in (12) with $\mu^n_0 = \delta_{x^n_0}$, $x^n_0 \to x_0$. Then $\mu^n$ converges weakly to $\mu$ (in the duality with $C_b(\mathbb{R})$), where $\mu$ is the law of the Brownian motion with gradient drift (22) with $\mu_0 = \delta_{x_0}$.

Proof This is a result of standard type, and we give a brief sketch of the proof for the case of time-independent $E$, using the semigroup convergence theorem of Trotter [45, Th. 5.2]. The assumptions of this theorem are satisfied by the existence of a single dense set on which $Q$ and $Q_n$ are defined, pointwise convergence of $Q_n$ to $Q$ on that dense set, and a dense range of $\lambda - Q$ for sufficiently large $\lambda$. The assertion of Trotter's theorem is pointwise convergence of the corresponding semigroups at each fixed $t$, which implies convergence of the dual semigroups in the dual topology, which is the statement of Theorem 4.

We set the system up as follows. Define the state space $Y := C_b(\overline{\mathbb{R}})$ with the uniform norm, where $\overline{\mathbb{R}}$ is the one-point compactification of $\mathbb{R}$; define the core $D := \{f \in C^2(\mathbb{R}) \cap C(\overline{\mathbb{R}}) : \Delta f \text{ uniformly continuous}\}$, which is dense in $Y$ for the uniform topology, and which will serve as the dense set of definition mentioned above for both $Q_n$ and $Q$. For each $f \in D$, $Q_n f \to Qf$ in the uniform topology.

The density of the range of $\lambda - Q$ is the solvability in $D$ of the equation

\[ -\omega h\,\Delta f + 2\omega\nabla E\,\nabla f + \lambda f = g, \quad \text{in } \mathbb{R}, \]

for all $g$ in a dense subset of $Y$; we choose $g \in C_c(\mathbb{R}) + \mathbb{R}$. This is a standard result from PDE theory, which can be proved for instance as follows. First note that we can assume $g \in C_c(\mathbb{R})$, by adding a constant to both $g$ and $f$. Secondly, for sufficiently large $\lambda > 0$ the left-hand side generates a coercive bilinear form in $H^1(\mathbb{R})$ in the sense of the Lax-Milgram lemma, and therefore, there exists a unique solution $f \in H^1(\mathbb{R})$. By bootstrap arguments, using the continuity and boundedness of $\nabla E$, we find $f \in C^2(\mathbb{R})$, and since $f \in H^1(\mathbb{R}) \cap C^2(\mathbb{R})$, $f(x)$ tends to zero at $\pm\infty$, implying that $f \in C^2(\mathbb{R}) \cap C(\overline{\mathbb{R}})$. Finally, since $\nabla E$ is uniformly continuous, the same holds for $\Delta f$. This concludes the proof. □

4 Rate-independent limit

In this section, we prove point A2 of Statement 2 and the whole of Statement 3. We will prove point A2 with a theorem that holds in greater generality, without assuming the explicit form of the pair $(\psi_\beta, \psi_\beta^*)$, but only a few 'reasonable' assumptions and the limiting behaviour.

We are therefore in the regime where $\beta \to \infty$, $\log\alpha = -\beta A$ for some $A > 0$.

4.1 Functions of bounded variation and rate-independent systems

We now briefly recall the definition of the BV space of functions with bounded variation, following the notation of [33]. A full description of this space and its properties can be found in [4]. Given a function $x : [0,T] \to \mathbb{R}$, the total variation of $x$ in the interval $[0,T]$ is defined by

\[ \mathrm{Var}(x,[0,T]) := \sup\Big\{ \sum_{j=1}^n |x(t_j) - x(t_{j-1})| : 0 = t_0 < \dots < t_n = T \Big\}. \]

We say that $x \in BV([0,T])$ if $\mathrm{Var}(x,[0,T]) < \infty$. The function $x$ then admits left and right limits $x(t-)$ and $x(t+)$ in every point $t \in [0,T]$, and we define the jump set of $x$ as

\[ J_x := \{ t \in [0,T] : x(t-) \ne x(t) \text{ or } x(t) \ne x(t+) \}, \]

and the pointwise variation in the jump set as

\[ \mathrm{Jmp}(x,[0,T]) := \sum_{t \in J_x} \big( |x(t-) - x(t)| + |x(t) - x(t+)| \big). \tag{25} \]

The total variation admits the representation

\[ \mathrm{Var}(x,[0,T]) = \int_0^T |\dot x(t)|\,dt + \int_0^T d|Cx| + \mathrm{Jmp}(x,[0,T]), \]

where $|\dot x|$ is the modulus of the absolutely continuous (a.c.) part of the distributional derivative of $x$; the measure $|Cx|$ is the Cantor part and Jmp represents the contribution of the (at most countable) jumps.

Given a sequence $\{x_n\} \subset BV([0,T])$, we again define two notions of convergence. We say that $x_n$ weakly converges to $x$ ($x_n \rightharpoonup x$) if $x_n(t)$ converges to $x(t)$ for every $t \in [0,T]$ and the variation is uniformly bounded, i.e. $\sup_n \mathrm{Var}(x_n,[0,T]) < \infty$. We say that $x_n$ strictly converges to $x$ ($x_n \to x$) if $x_n \rightharpoonup x$ and in addition $\mathrm{Var}(x_n,[0,T])$ converges to $\mathrm{Var}(x,[0,T])$ as $n \to \infty$.

According to the general set-up of [32,33], we define the notion of rate-independent system based on an energy balance similar to equation (9), where now the dissipation $\psi$ has linear growth, i.e. $\psi(\eta) = \psi_{RI}(\eta) = A|\eta|$ with $A > 0$.

We first define $\mathrm{Jmp}_E$, which can be viewed as an energy-weighted jump term, as

\[ \mathrm{Jmp}_E(x,[0,T]) = \sum_{t \in J_x} \big[ \Delta(x(t-),x(t)) + \Delta(x(t),x(t+)) \big], \tag{26} \]

where, using the notation $a \vee b := \max\{a,b\}$,

\[ \Delta(x_0,x_1) := \inf_{\theta \in AC(0,1)} \Big\{ \int_0^1 \big(|\nabla E(\theta(\tau),t)| \vee A\big)\,|\dot\theta(\tau)|\,d\tau : \theta(i) = x_i,\ i = 0,1 \Big\}. \]

The relation between the definitions (26) and (25) becomes clear in the following inequality

\[ \mathrm{Jmp}_E(x,[0,T]) \ge A\,\mathrm{Jmp}(x,[0,T]), \]

where equality can be achieved depending on the behaviour of $E$, e.g. trivially when $|\nabla E(x,t)| \le A$ for every $x, t$. Then, we can interpret $\mathrm{Jmp}_E$ as a modified jump term, with an $E$-dependent weight.
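In one space dimension the jump cost can be evaluated directly. The observation used here is an addition of this sketch rather than a statement taken from the text: since the weight $|\nabla E| \vee A$ is positive and every admissible curve must sweep the interval between $x_0$ and $x_1$, a monotone curve is optimal, so the infimum equals $\big|\int_{x_0}^{x_1} (|\nabla E(y,t)| \vee A)\,dy\big|$. The code approximates this integral with the midpoint rule; the function name and the sample energies are hypothetical.

```python
def jump_cost(grad_E, t, x0, x1, A, n_cells=1000):
    """Midpoint-rule value of | int_{x0}^{x1} max(|grad_E(y, t)|, A) dy |."""
    if x0 == x1:
        return 0.0
    lo, hi = min(x0, x1), max(x0, x1)
    dy = (hi - lo) / n_cells
    total = 0.0
    for k in range(n_cells):
        y = lo + (k + 0.5) * dy  # midpoint of the k-th cell
        total += max(abs(grad_E(y, t)), A) * dy
    return total

if __name__ == "__main__":
    A = 1.0
    # Hypothetical energy E(y, t) = y^2: the weight exceeds A for y > 1/2.
    print(jump_cost(lambda y, t: 2.0 * y, 0.0, 0.0, 1.0, A))
    # Hypothetical energy with |grad_E| <= A everywhere: cost is A * |x1 - x0|.
    print(jump_cost(lambda y, t: 0.5, 0.0, 0.0, 2.0, A))
```

The second call illustrates the equality case of the inequality above, while the first gives a cost strictly larger than $A|x_1 - x_0|$.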

In analogy with the (generalized) gradient flow Definition 1, we define rate-independent systems. There is no unique way to define solutions for a rate-independent system. The so-called energetic solutions have been introduced and analysed in [35-37] and are based on the combination of a pointwise global minimality property and an energy balance. Here we concentrate on BV solutions, as defined in [32,34]. Our limiting system will be of this type.

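Fix $A > 0$; the rate-independent dissipation $\psi_{RI}$ and its Legendre transform $\psi_{RI}^*$ are

\[ \psi_{RI}(v) = A|v|, \qquad \psi_{RI}^*(w) = \begin{cases} 0 & w \in [-A,A], \\ +\infty & \text{otherwise.} \end{cases} \tag{27} \]

Definition 3 (Rate-independent evolution, in the BV sense [32,34]) Given an energy $E : \mathbb{R} \times [0,T] \to \mathbb{R}$, continuously differentiable, and $A > 0$, a curve $x \in BV([0,T])$ is a rate-independent evolution of $E$ in $[0,T]$ if it satisfies the energy balance

\[
\begin{aligned}
&\int_0^T \big( \psi_{RI}(\dot x(t)) + \psi_{RI}^*(-\nabla E(x,t)) \big)\,dt + A\int_0^T d|Cx| + \mathrm{Jmp}_E(x,[0,T]) \\
&\qquad + E(x(T),T) - E(x(0),0) - \int_0^T \partial_t E(x,t)\,dt = 0,
\end{aligned}
\tag{28}
\]

where $\psi_{RI}$ and $\psi_{RI}^*$ are defined in (27).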

4.2 Assumptions and the main result

In the rest of this section, we prove that the generalized gradient-flow evolution converges to the rate-independent one. This is point A2 of Statement 2, formulated in Theorem 5 showing Mosco-convergence of $J_\beta$ to $J_{RI}$, which is the left-hand side of (28). There are three main reasons why the convergence to a rate-independent system should be expected.

First, from a heuristic mathematical point of view, our choices of $\alpha$ and $\beta$ yield pointwise convergence of $\psi_\beta$ and $\psi_\beta^*$ to a one-homogeneous function and to its dual, the indicator function; this suggests a rate-independent limit. However, this argument does not explain which of the several rate-independent interpretations the limit should satisfy, nor does it explain the additional jump term.

Secondly, from a physical point of view, the underlying stochastic model mimics a rate-independent system. This can be recognized by keeping the lattice size finite but letting $\beta \to \infty$; then, the rates either explode or converge to zero, depending on the value of $\nabla E$. We can interpret this in the sense that when a rate is infinite, with probability one a jump will occur to the nearest lattice point with zero jump rate.

Thirdly, considering the evolution, in the case $1 \ll \beta < \infty$ the generalized gradient flow will present fast transitions when $|\nabla E| > A$. By slowing down time during these fast transitions, we can capture what is happening at the small timescale of these fast transitions, which become jumps in the limit. This is exactly how we construct the recovery sequence in Theorem 5.

The convergence will be proven in greater generality; more precisely, we do not use the explicit formulas, but we require that $\psi_\beta$ and $\psi_\beta^*$ satisfy the following conditions:

A. $\psi_\beta$ and $\psi_\beta^*$ are both symmetric, convex and $C^1$.

B. $\psi_\beta^*$ converge pointwise to

\[ \psi_{RI}^*(w) = \begin{cases} +\infty & \text{for } |w| > A, \\ 0 & \text{for } |w| \le A. \end{cases} \]

C. $\forall M > 0$ $\exists\, \delta_\beta \to 0$ such that as $\beta \to \infty$

\[ \kappa_\beta^{-1} := \partial\psi_\beta^*(A + \delta_\beta) \to \infty, \]

\[ \sup_{|w| \le K} \frac{\partial\psi_\beta^*(w + M\kappa_\beta)}{\partial\psi_\beta^*(w \vee (A + \delta_\beta))}\,\kappa_\beta \to 0, \quad \text{and} \]

\[ \partial\psi_\beta^*(A + M\kappa_\beta)\,\kappa_\beta \to 0. \]

D. For each $a > 1$ and for each $|w| \le R$, there exists $\eta_\beta(w,a) > 0$ such that

\[ \partial\psi_\beta^*(w + \eta_\beta(w,a)) = a\,\partial\psi_\beta^*(w), \]

and $\eta_\beta$ is bounded uniformly in $a$, $\beta$, and $|w| \le R$.

It is important to underline that conditions C-D are needed in Theorem 5 only for the $\Gamma$-limsup; they are not necessary for the $\Gamma$-liminf.

Conditions C-D are very technical and not intuitive, but they are satisfied by a large family of pairs $(\psi_\beta, \psi_\beta^*)$. In the following we show, with two examples, that our specific case and the vanishing-viscosity approach, respectively, are covered by the assumptions A-D. In the two examples, we implicitly underline the novelty of our approach and the difference between our model (with L log L-type dissipation) and the vanishing viscosity (see [34]).

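Dissipation (16): $\psi_\beta^*(w) = \beta^{-1} e^{-\beta A}\cosh(\beta w)$

Conditions A and B are trivially satisfied. Then, considering only $w > A$ for simplicity, we get

\[ \partial\psi_\beta^*(w) = e^{-\beta A}\sinh(\beta w) \sim e^{\beta(w-A)}. \]

With the choice $\delta_\beta = \beta^{-1}\log(\beta)$, it holds that

\[ \kappa_\beta \sim \frac{1}{\beta}, \qquad \partial\psi_\beta^*(A + M\kappa_\beta)\,\kappa_\beta \le e^{\beta M\kappa_\beta}\kappa_\beta \to 0. \]

Then, condition C is satisfied with

\[ \frac{\partial\psi_\beta^*(w + \kappa_\beta)}{\partial\psi_\beta^*(w)}\,\kappa_\beta \le \frac{\exp(\beta(w + \kappa_\beta - A)) + 1}{\exp(-\beta(A - w))}\,\kappa_\beta \le \big(\exp(\beta\kappa_\beta) + 1\big)\kappa_\beta \to 0. \]

Condition D is satisfied because for $w \gg 1$ we have that $\sinh(\beta w) \sim \tfrac12 e^{\beta w}$. Then, condition D approximately reads as

\[ e^{\beta(w+\eta)} = a\,e^{\beta w}, \]

which is satisfied for $\eta = \beta^{-1}\log a$.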
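The quality of the approximation in condition D can be checked directly, since for this dissipation the equation $\partial\psi_\beta^*(w+\eta) = a\,\partial\psi_\beta^*(w)$ reduces to $\sinh(\beta(w+\eta)) = a\sinh(\beta w)$, which has the closed-form solution $\eta = \beta^{-1}\operatorname{arcsinh}(a\sinh(\beta w)) - w$. The sketch below compares it with $\beta^{-1}\log a$; the function names are hypothetical, and $\beta w$ must stay below roughly 700 to avoid floating-point overflow in `sinh`.

```python
import math

def eta_exact(w, a, beta):
    """Solve sinh(beta*(w + eta)) = a*sinh(beta*w) for eta (w > 0, a > 1)."""
    return math.asinh(a * math.sinh(beta * w)) / beta - w

def eta_asymptotic(a, beta):
    """Large-beta*w approximation eta = log(a) / beta."""
    return math.log(a) / beta

if __name__ == "__main__":
    w, a = 1.0, 2.0
    for beta in (0.1, 1.0, 10.0, 50.0):
        print(beta, eta_exact(w, a, beta), eta_asymptotic(a, beta))
```

For small $\beta w$ the two values differ noticeably, whereas for large $\beta w$ they agree to machine precision, which is the regime relevant for $\beta \to \infty$ at fixed $w > 0$.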

Viscoplasticity with vanishing viscosity: $\psi_\beta^*(w) = \beta(|w| - A)_+^2$

Also here, conditions A and B are immediately satisfied. Then, again considering $w > A$, we verify condition C by choosing $\delta_\beta = \beta^{-1/3}$ and $k = 1$, so that

\[ \partial\psi_\beta^*(w) = 2\beta(w - A), \qquad \kappa_\beta^{-1} = 2\beta\delta_\beta. \]

Then, it is just a calculation to check that condition C is satisfied in this case. Now condition D requires that

\[ 2\beta(w + \eta - A) = 2a\beta(w - A), \]

and so $\eta = (a - 1)(w - A)$ satisfies the condition.

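Theorem 5 (Convergence to the rate-independent evolution) Given an energy $E$ satisfying condition (11) and a sequence of pairs $(\psi_\beta, \psi_\beta^*)$ satisfying conditions A-D, for $x \in BV([0,T])$ consider the functional $J_\beta$

\[ J_\beta(x) = \int_0^T \big[ \psi_\beta(\dot x(t)) + \psi_\beta^*(-\nabla E(x(t),t)) + \dot x(t)\,\nabla E(x(t),t) \big]\,dt \tag{29} \]

when $x \in AC(0,T)$, and $J_\beta(x) = +\infty$ otherwise. Then, as $\beta \to \infty$, $J_\beta \xrightarrow{M} J_{RI}$ with respect to the weak-strict topology of BV, where $J_{RI}$ is given by

\[
\begin{aligned}
J_{RI}(x) := {} &\int_0^T \big( \psi_{RI}(\dot x(t)) + \psi_{RI}^*(-\nabla E(x(t),t)) \big)\,dt + A\int_0^T d|Cx| \\
&+ \mathrm{Jmp}_E(x,[0,T]) + E(x(T),T) - E(x(0),0) - \int_0^T \partial_t E(x(t),t)\,dt.
\end{aligned}
\tag{30}
\]

Moreover, if the sequence $\{x_\beta\}$ is such that $J_\beta(x_\beta)$ is bounded, $\{x_\beta(0)\}$ is bounded, and

\[ \int_0^t \partial_t E(x_\beta(s),s)\,ds \le C \quad \forall \beta,\ t \in [0,T], \tag{31} \]

then $\{x_\beta\}$ is weakly compact in $BV([0,T])$.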

The proof is divided into three main steps. We first prove the compactness and the liminf inequality; these follow as in [33, Th. 4.1, 4.2]. We report them for completeness, and we adapt their proofs because we can avoid some technicalities. To finish the proof, we need to construct a recovery sequence. When minimizers with $J_{RI} = 0$ are considered, the recovery sequence is easy to construct; we just need to take a sequence $x_\beta$ such that $J_\beta(x_\beta) = 0$ for every $\beta$. But for the full Mosco-convergence, we need to find a way to construct a recovery sequence also for non-minimizers of $J_{RI}$. This is the last part of the proof, and it will be achieved using a parametrized-solution technique.

4.3 Proof of compactness and the lower bound

Proof (Proof of compactness) Recall that weak convergence in BV is equivalent to pointwise convergence supplemented with a global bound on the total variation (e.g. [4, Prop. 3.13]).

First, we show that $|x_\beta(t) - x_\beta(0)|$ is bounded uniformly in $t$ and $\beta$. We observe that

\[ \psi_\beta(v) + \psi_\beta^*(A) \ge Av, \quad \text{for every } v, A \in \mathbb{R}, \tag{32} \]

and so we obtain

\[ A|x_\beta(t) - x_\beta(0)| \le A\int_0^t |\dot x_\beta|\,ds \le \int_0^t \psi_\beta(\dot x_\beta)\,ds + t\,\sup_\beta \psi_\beta^*(A) \le C < +\infty, \]

where the constant $C$ may change from line to line. Then,

\[ |x_\beta(t) - x_\beta(0)| \le C \quad \text{for every } \beta \text{ and every } t \in [0,T]. \]

The inequality above and the boundedness of $x_\beta(0)$ imply that the whole sequence is bounded for every $t \in [0,T]$.

Next we show the existence of a converging subsequence. For every $0 \le t_0 < t_1 \le T$, we recall the bound

\[ A|x_\beta(t_1) - x_\beta(t_0)| \le \int_{t_0}^{t_1} A|\dot x_\beta|\,dt \le \int_{t_0}^{t_1} \big( \psi_\beta(\dot x_\beta) + \psi_\beta^*(A) \big)\,dt. \]

Defining the non-negative finite measures on $[0,T]$

\[ \nu_{\beta,A} := \big( \psi_\beta(\dot x_\beta) + \psi_\beta^*(A) \big)\,\mathcal{L}^1, \]

up to extracting a suitable subsequence, we can suppose that they weakly converge to a finite measure $\nu_A$, so that

\[ A\limsup_{\beta\to\infty} |x_\beta(t_0) - x_\beta(t_1)| \le \limsup_{\beta\to\infty} \nu_{\beta,A}([t_0,t_1]) \le \nu_A([t_0,t_1]). \]

Defining the jump set $J := \{t \in [0,T] : \nu_A(\{t\}) > 0\}$ and considering a countable set $I \supset J$ that is dense in $[0,T]$, we can find a subsequence $\beta_h$ such that $x_{\beta_h}(t) \to x(t)$ for every $t \in I$ as $\beta_h \to \infty$. From now on, for simplicity, we will number the subsequence with the same index as the main sequence. Then,

\[ A|x(t_0) - x(t_1)| \le \nu_A([t_0,t_1]), \quad \text{for every } t_0, t_1 \in I. \tag{33} \]

The curve $I \ni t \mapsto x(t)$ can be uniquely extended to a continuous curve in $[0,T] \setminus J$, which we will still denote by $x$. Arguing by contradiction, we show that the whole $x_\beta(t)$ converges pointwise to $x(t)$. Fix a point $t \in [0,T] \setminus I$; if the pointwise convergence does not hold, we can find a subsequence that we still denote $x_\beta(t)$ such that $x_\beta(t) \to \tilde x \ne x(t)$. From the pointwise convergence for $t \in I$ and the continuity of $x$ in $[0,T] \setminus J$, we can find a further subsequence $I \ni t_{\beta_n} \to t$ such that $x_{\beta_n}(t_{\beta_n}) \to x(t)$, but this is in contradiction to the previous inequality: assuming for simplicity $t_{\beta_n} < t$,

\[ A|x(t) - \tilde x| \le \liminf_{n\to\infty} A|x_{\beta_n}(t_{\beta_n}) - x_{\beta_n}(t)| \le \limsup_{n\to\infty} \nu_{\beta_n,A}([t_{\beta_n},t]) = \nu_A(\{t\}) = 0. \]

We have thus proven the pointwise convergence of $x_\beta$ to $x$; the inequality (33) then gives a uniform bound on the BV norm of $x_\beta$ and so we conclude. □

Proof (Proof of the liminf inequality) Let $\{x_\beta\} \subset AC(0,T)$ be a sequence such that $J_\beta(x_\beta)$ is bounded, which converges weakly to $x \in BV([0,T])$. By the arguments above, $x_\beta$ is bounded uniformly in $t$ and $\beta$, and every term in the functional is itself bounded by a constant independent of $\beta$. The following limits follow from the pointwise convergence and Lebesgue's dominated convergence theorem:

\[ E(x_\beta(\cdot),\cdot) \to E(x(\cdot),\cdot), \qquad \int_0^T \partial_t E(x_\beta(t),t)\,dt \to \int_0^T \partial_t E(x(t),t)\,dt. \]

As we said, the integral $\int_0^T \psi_\beta^*(-\nabla E(x_\beta(t),t))\,dt$ is bounded for every $\beta$ by a constant that we still denote by $C$. Because of the monotonicity of $\psi_\beta^*$, this bound implies that

\[ \psi_\beta^*(a)\,\mathcal{L}^1\{t \in (0,T) : |\nabla E(x_\beta(t),t)| > a\} \le C \quad \forall a > 0, \]

and since $\psi_\beta^*(w) \to \infty$ for $w > A$, we obtain

\[ \lim_{\beta\to\infty} \mathcal{L}^1\{t \in (0,T) : |\nabla E(x_\beta(t),t)| > a\} = 0 \quad \forall a > A. \]

This proves that $|\nabla E(x(t),t)| \le A$ a.e. and therefore $\int_0^T \psi_{RI}^*(\nabla E(x(t),t))\,dt = 0$; it trivially follows that

\[ \liminf_{\beta\to\infty} \int_0^T \psi_\beta^*(\nabla E(x_\beta(t),t))\,dt \ge \int_0^T \psi_{RI}^*(\nabla E(x(t),t))\,dt. \]

We now prove the second part of the inequality,

\[ \liminf_{\beta\to\infty} \int_0^T \big( \psi_\beta(\dot x_\beta) + \psi_\beta^*(\nabla E) \big)\,dt \ge \int_0^T A|\dot x|\,dt + A\int_0^T d|Cx| + \mathrm{Jmp}_E(x,[0,T]). \]

As in the proof of the compactness, we consider the non-negative finite measure on $[0,T]$

\[ \nu_\beta := \big( \psi_\beta(\dot x_\beta) + \psi_\beta^*(\nabla E(x_\beta,\cdot)) \big)\,\mathcal{L}^1; \]

up to extracting a subsequence, we can suppose that they weakly converge to a finite measure

\[ \nu_0 + \psi_{RI}^*(\nabla E(x,\cdot))\,\mathcal{L}^1. \]

Because $|\nabla E(x,\cdot)| \le A$ $\mathcal{L}^1$-a.e. we obtain that, as in the proof of the compactness,

\[ \nu_0 \ge A\big(|\dot x| + |Cx| + |Jx|\big) = \psi_{RI}(\dot x) + A\big(|Cx| + |Jx|\big). \]

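This inequality is slightly too weak for us. The Cantor and the Lebesgue measurable parts are fine, but we need a stronger characterization of the jump part: for all $t \in J_x$,

\[ \nu_0(\{t\}) \ge \Delta(x(t-),x(t)) + \Delta(x(t),x(t+)). \]

To prove this, fix $t \in J_x$ and take two sequences $h_\beta^- < t < h_\beta^+$ converging monotonically to $t$ such that

\[ x_\beta(h_\beta^-) \to x(t-), \qquad x_\beta(h_\beta^+) \to x(t+), \]

and define

\[ s_\beta(h) := h + \int_t^h \big( \psi_\beta(\dot x_\beta(r)) + \psi_\beta^*(\nabla E(x_\beta(r),r)) \big)\,dr; \qquad s_\beta^\pm := s_\beta(h_\beta^\pm). \]

Because of the convergence of $\nu_\beta$, we have

\[ \limsup_{\beta\to\infty}(s_\beta^+ - s_\beta^-) \le \limsup_{\beta\to\infty} \nu_\beta([s_\beta^-,s_\beta^+]) \le \nu_0(\{t\}), \]

and up to extracting a subsequence we can assume that $s_\beta^\pm \to s^\pm$. Denote by $h_\beta := s_\beta^{-1}$ the inverse map of $s_\beta$; we observe that $h_\beta$ is 1-Lipschitz and monotone, and it maps $[s_\beta^-,s_\beta^+]$ onto $[h_\beta^-,h_\beta^+]$. We can then define the following Lipschitz functions

\[ \theta_\beta(s) := \begin{cases} x_\beta(h_\beta(s)) & \text{if } s \in [s_\beta^-,s_\beta^+], \\ x_\beta(h_\beta^+) & \text{if } s \ge s_\beta^+, \\ x_\beta(h_\beta^-) & \text{if } s \le s_\beta^-. \end{cases} \]

The functions $\theta_\beta$ are uniformly Lipschitz, since (writing $r = h_\beta(s)$)

\[ |\dot\theta_\beta(s)| = |\dot x_\beta(h_\beta(s))|\,|\dot h_\beta(s)| \le \frac{|\dot x_\beta(r)|}{1 + \psi_\beta(\dot x_\beta(r)) + \psi_\beta^*(\nabla E(x_\beta(r),r))} \overset{(32)}{\le} \max\{c,1\}, \]

and they take the special values

\[ \theta_\beta(s_\beta^\pm) = x_\beta(h_\beta^\pm), \qquad \theta_\beta(t) = x_\beta(t). \]

Therefore, denoting by $I$ a compact interval containing the intervals $[s_\beta^-,s_\beta^+]$ for all $\beta$, then up to a subsequence, we have that

\[ \theta_\beta(s) \to \theta(s), \qquad |\dot\theta_\beta| \rightharpoonup^* m \ \text{in } L^\infty(I) \ \text{with } m \ge |\dot\theta|. \]

Moreover, $\theta(s^\pm) = x(t\pm)$ and $\theta(t) = x(t)$. Then, using the inequality

\[ \psi_\beta(v) + \psi_\beta^*(w) \ge (|w| \vee A)\,v - \psi_\beta^*(A), \]

we obtain

\[
\begin{aligned}
\nu_0(\{t\}) &\ge \limsup_{\beta\to\infty} \int_{h_\beta^-}^{h_\beta^+} \big( \psi_\beta(\dot x_\beta(\tau)) + \psi_\beta^*(\nabla E(x_\beta(\tau),\tau)) \big)\,d\tau \\
&\ge \liminf_{\beta\to\infty} \int_{h_\beta^-}^{h_\beta^+} \big( (|\nabla E(x_\beta(\tau),\tau)| \vee A)\,|\dot x_\beta|(\tau) - \psi_\beta^*(A) \big)\,d\tau \\
&\ge \liminf_{\beta\to\infty} \int_{s_\beta^-}^{s_\beta^+} (|\nabla E(\theta_\beta(s),h_\beta(s))| \vee A)\,|\dot\theta_\beta|(s)\,ds - (h_\beta^+ - h_\beta^-)\,\psi_\beta^*(A).
\end{aligned}
\]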

Fig. 5 Schematic representation of the time parametrization procedure. The curve $\mathsf{x}$ is such that $\mathsf{x}(s(t)) = x(t)$

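The last term $(h_\beta^+ - h_\beta^-)\,\psi_\beta^*(A)$ tends to zero as $\beta \to \infty$. Therefore,

\[
\begin{aligned}
\nu_0(\{t\}) &\ge \liminf_{\beta\to\infty} \int_I (|\nabla E(\theta_\beta(s),h_\beta(s))| \vee A)\,|\dot\theta_\beta|(s)\,ds \\
&\overset{(*)}{\ge} \int_{s^-}^{s^+} (|\nabla E(\theta(s),s)| \vee A)\,m(s)\,ds \ge \int_{s^-}^{s^+} (|\nabla E(\theta(s),s)| \vee A)\,|\dot\theta|(s)\,ds \\
&\ge \Delta(x(t-),x(t)) + \Delta(x(t),x(t+)).
\end{aligned}
\]

The inequality (*) follows from the technical Lemma [33, Lemma 4.3], and so we conclude.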

4.4 Proof of the limsup inequality

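Proof We assume that we are given $x \in BV([0,T])$; we will construct a sequence $x_\beta$ such that $J_\beta(x_\beta) \to J_{RI}(x)$.

Reparametrization. A central tool in this construction is a reparametrization of the curve $x$ (as in Fig. 5), in terms of a new time-like parameter $s$ on a domain $[0,S]$. The aim is to expand the jumps in $x$ into smooth connections.

As in [32, Prop. 6.10], we define

\[ s(t) := t + \int_0^t \big( \psi_{RI}(\dot x) + \psi_{RI}^*(\nabla E(x,\tau)) \big)\,d\tau + A\int_0^t d|Cx| + \mathrm{Jmp}_E(x,[0,t]); \tag{34} \]

then there exists a Lipschitz parametrization $(\mathsf{t},\mathsf{x}) : [0,S] \to [0,T] \times \mathbb{R}$ such that $\mathsf{t}$ is non-decreasing,

\[ \mathsf{t}(s(t)) = t, \quad \text{and} \quad \mathsf{x}(s(t)) = x(t) \quad \text{for every } t \in [0,T], \]

and such that

\[ \int_0^S L(\mathsf{x},\mathsf{t},\dot{\mathsf{x}},\dot{\mathsf{t}})\,ds = \int_0^T \big( \psi_{RI}(\dot x) + \psi_{RI}^*(\nabla E(x,\tau)) \big)\,d\tau + A\int_0^T d|Cx| + \mathrm{Jmp}_E(x,[0,T]), \tag{36} \]

where, recalling that $\dot{\mathsf{t}} = 0$ in the jumps,

\[ L(\mathsf{x},\mathsf{t},\dot{\mathsf{x}},\dot{\mathsf{t}}) = \begin{cases} A|\dot{\mathsf{x}}| + \psi_{RI}^*(|\nabla E(\mathsf{x},\mathsf{t})|) & \text{if } \dot{\mathsf{t}} > 0, \\ |\dot{\mathsf{x}}|\,(A \vee |\nabla E(\mathsf{x},\mathsf{t})|) & \text{if } \dot{\mathsf{t}} = 0. \end{cases} \]

Moreover, it also holds that

\[ \mathrm{Var}(\mathsf{x},[0,S]) = \mathrm{Var}(x,[0,T]). \]

Note that $L(\mathsf{x},\mathsf{t},\dot{\mathsf{x}},\dot{\mathsf{t}}) \ge |\dot{\mathsf{x}}|\,(A \vee |\nabla E(\mathsf{x},\mathsf{t})|)$, since $\psi_{RI}^*(w)$ is only finite when $|w| \le A$.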

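Preliminary remarks. The third term in $J_\beta(x_\beta)$ (see (29)) is equal to

\[ E(x_\beta(T),T) - E(x_\beta(0),0) - \int_0^T \partial_t E(x_\beta(t),t)\,dt, \]

and these three terms pass to the limit under the strict convergence $x_\beta \to x$ that we prove below. We therefore focus on the other terms in $J_\beta$ and $J_{RI}$. By (36), it is sufficient to prove that

\[ \limsup_{\beta\to\infty} \int_0^T \big[ \psi_\beta(\dot x_\beta(t)) + \psi_\beta^*(\nabla E(x_\beta(t),t)) \big]\,dt \le \int_0^S L(\mathsf{x}(s),\mathsf{t}(s),\dot{\mathsf{x}}(s),\dot{\mathsf{t}}(s))\,ds. \tag{38} \]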

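From condition (11), we have that $\nabla E(x,t)$ is uniformly Lipschitz continuous in $t$; let $L$ be the Lipschitz constant. In order to define a time rescaling, we introduce an auxiliary function. We fix

\[ 2M := L\int_0^S |\dot{\mathsf{x}}(s)|\,ds = L\,\mathrm{Var}(x,[0,T]) = L\,\mathrm{Var}(\mathsf{x},[0,S]), \]

and use Hypothesis C to obtain sequences $\delta_\beta, \kappa_\beta \to 0$ for this value of $M$. We now define

\[ p_\beta(v,w) := \inf_{\varepsilon > 0} \Big\{ \varepsilon\,\psi_\beta\Big(\frac{v}{\varepsilon}\Big) + \varepsilon\,\psi_\beta^*\big(|w| \vee (A + \delta_\beta)\big) \Big\} = |v|\,\big(|w| \vee (A + \delta_\beta)\big), \tag{39} \]

and the infimum is achieved by

\[ \varepsilon_\beta(v,w) := \frac{|v|}{\partial\psi_\beta^*\big(|w| \vee (A + \delta_\beta)\big)}. \tag{40} \]

The function $\varepsilon_\beta$ can be interpreted as an optimal time rescaling of a given speed $v$ and a given force $|w| \vee (A + \delta_\beta)$.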
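The identities (39) and (40) can be verified by a short first-order computation. The following sketch assumes $v > 0$, abbreviates $W := |w| \vee (A + \delta_\beta)$, and assumes that $\psi_\beta^*$ is strictly increasing near $W$ (which holds in both examples above); these abbreviations are introduced only here. Using the Fenchel equality $\psi_\beta(u) - u\,\psi_\beta'(u) = -\psi_\beta^*(\psi_\beta'(u))$,

\[
\frac{d}{d\varepsilon}\Big[ \varepsilon\,\psi_\beta\Big(\frac{v}{\varepsilon}\Big) + \varepsilon\,\psi_\beta^*(W) \Big]
= \psi_\beta\Big(\frac{v}{\varepsilon}\Big) - \frac{v}{\varepsilon}\,\psi_\beta'\Big(\frac{v}{\varepsilon}\Big) + \psi_\beta^*(W)
= \psi_\beta^*(W) - \psi_\beta^*\Big(\psi_\beta'\Big(\frac{v}{\varepsilon}\Big)\Big),
\]

which vanishes when $\psi_\beta'(v/\varepsilon) = W$, i.e. when $v/\varepsilon = \partial\psi_\beta^*(W)$. Since $\varepsilon \mapsto \varepsilon\,\psi_\beta(v/\varepsilon)$ is convex, this critical point is the minimizer, and the minimal value is

\[
\varepsilon_\beta\,\big[ \psi_\beta(\partial\psi_\beta^*(W)) + \psi_\beta^*(W) \big] = \varepsilon_\beta\,W\,\partial\psi_\beta^*(W) = v\,W.
\]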

Definition of the new time $\mathsf{t}_\beta$ and the recovery sequence $x_\beta$. For the sake of simplicity, in the following we construct a recovery sequence only for a curve $x$ with jumps at $0$ and $T$. Later in the proof we show that a recovery sequence can be constructed in a similar way for a curve $x$ with countably many jumps, with only minor changes in the proof.

We construct the recovery sequence by first perturbing the time variable $\mathsf{t}$. We define $\mathsf{t}_\beta : [0,S] \to [0,T_\beta]$ as the solution of the differential equation

\[ \dot{\mathsf{t}}_\beta(s) = \dot{\mathsf{t}}(s) \vee \varepsilon_\beta\big(\dot{\mathsf{x}}(s), \nabla E(\mathsf{x}(s),\mathsf{t}(s))\big), \qquad \mathsf{t}_\beta(0) = 0. \tag{41} \]

We can assume that $|\dot{\mathsf{x}}(s)| \ne 0$ for $s \in [0,s(0)]$ and $s \in [s(T-),S]$ to guarantee the positivity of $\varepsilon_\beta$. Then, for $s \in [s(0),s(T-)]$, we have

\[ \dot{\mathsf{t}}(s) = \frac{1}{\dot s(t)}\Big|_{t = \mathsf{t}(s)} > 0, \]

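so that $\dot{\mathsf{t}}_\beta(s) > 0$ for all $s \in [0,S]$. The range of $\mathsf{t}_\beta$ is $[0,T_\beta]$, with $T_\beta \ge T$; since the recovery sequence $x_\beta$ is to be defined on the interval $[0,T]$, we rescale $\mathsf{t}_\beta$ by

\[ \lambda_\beta := \frac{T_\beta}{T} \ge 1, \]

and define our recovery sequence as follows:

\[ x_\beta(t) := \mathsf{x}\big(\mathsf{t}_\beta^{-1}(t\lambda_\beta)\big), \quad \text{so that} \quad \dot x_\beta(t) = \frac{\dot{\mathsf{x}}}{\dot{\mathsf{t}}_\beta}\big(\mathsf{t}_\beta^{-1}(t\lambda_\beta)\big)\,\lambda_\beta. \]

We now have that

\[
\begin{aligned}
\int_0^T \big[ \psi_\beta(\dot x_\beta(t)) + \psi_\beta^*(\nabla E(x_\beta(t),t)) \big]\,dt
&= \int_0^T \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}}{\dot{\mathsf{t}}_\beta}\big(\mathsf{t}_\beta^{-1}(t\lambda_\beta)\big)\lambda_\beta \Big) + \psi_\beta^*\big( \nabla E(\mathsf{x}(\mathsf{t}_\beta^{-1}(t\lambda_\beta)),t) \big) \Big]\,dt \\
&= \int_0^S \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\dot{\mathsf{t}}_\beta(s)}\lambda_\beta \Big) + \psi_\beta^*\big( \nabla E(\mathsf{x}(s),\mathsf{t}_\beta(s)\lambda_\beta^{-1}) \big) \Big]\,\frac{\dot{\mathsf{t}}_\beta(s)}{\lambda_\beta}\,ds.
\end{aligned}
\]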

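Estimates. The inequality (38) now follows from the following three estimates:

Lemma 1 Write $\varepsilon_\beta(s) := \varepsilon_\beta\big(\dot{\mathsf{x}}(s), \nabla E(\mathsf{x}(s),\mathsf{t}(s))\big)$. Then, there exists $C_\beta \to 0$ for $\beta \to \infty$ such that

\[ \int_0^S \Big[ \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}_\beta(s)\lambda_\beta^{-1})|\big)\,\frac{\dot{\mathsf{t}}_\beta(s)}{\lambda_\beta} - \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big)\,\varepsilon_\beta(s) \Big]\,ds \le C_\beta S; \tag{42} \]

\[ \int_0^S \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)\lambda_\beta}{\dot{\mathsf{t}}_\beta(s)} \Big)\,\frac{\dot{\mathsf{t}}_\beta(s)}{\lambda_\beta} - \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\varepsilon_\beta(s)} \Big)\,\varepsilon_\beta(s) \Big]\,ds \le C_\beta S; \tag{43} \]

\[ \int_0^S \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\varepsilon_\beta(s)} \Big) + \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big) \Big]\,\varepsilon_\beta(s)\,ds \le \int_0^S |\dot{\mathsf{x}}(s)|\,\big(A \vee |\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big)\,ds + C_\beta S. \tag{44} \]

We prove this lemma below.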

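Convergence and conclusion. Strict convergence of $x_\beta \to x$ follows if we prove the pointwise convergence $x_\beta(t) \to x(t)$ for all $t \in [0,T]$ and the convergence of the variation. Recall the definition $\dot{\mathsf{t}}_\beta(s) = \dot{\mathsf{t}}(s) \vee \varepsilon_\beta\big(\dot{\mathsf{x}}(s), \nabla E(\mathsf{x}(s),\mathsf{t}(s))\big)$; then

\[ \limsup_{\beta\to\infty} \sup_s \varepsilon_\beta(s) = \limsup_{\beta\to\infty} \sup_s \frac{|\dot{\mathsf{x}}(s)|}{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)} \le \lim_{\beta\to\infty} \frac{\sup_s |\dot{\mathsf{x}}(s)|}{\partial\psi_\beta^*(A + \delta_\beta)} = 0 \]

implies $\dot{\mathsf{t}}_\beta(s) \to \dot{\mathsf{t}}(s)$, and so it also holds that

\[ \mathsf{t}_\beta(s) \to \mathsf{t}(s), \qquad \mathsf{t}_\beta^{-1}(t\lambda_\beta) \to s(t) \quad \forall t \in (0,T). \]

Moreover, $\dot{\mathsf{t}}_\beta(s) > 0$ implies that $\mathsf{t}_\beta^{-1}(0) = 0$ and $\mathsf{t}_\beta^{-1}(T_\beta) = S$, and so we have that

\[ x_\beta(t) = \mathsf{x}\big(\mathsf{t}_\beta^{-1}(t\lambda_\beta)\big) \to \mathsf{x}(s(t)) = x(t) \quad \forall t \in [0,T]. \]

The convergence of the variation is automatic, since by definition of $x_\beta$

\[ \int_0^T |\dot x_\beta(t)|\,dt = \int_0^S |\dot{\mathsf{x}}(s)|\,ds = \mathrm{Var}(\mathsf{x},[0,S]) = \mathrm{Var}(x,[0,T]). \]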

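Recovery sequence for a general curve $x$. Now we show how to construct a recovery sequence for a curve with countably many jumps. Given the jump set $J_x$, fix $\epsilon > 0$, consider a countable set $\{t^i\} \subset J_x \cup \{0,T\}$ (with $t^i < t^{i+1}$) such that

\[ \mathrm{Jmp}_E(x, J_x \setminus \{t^i\}) < \epsilon, \tag{45} \]

and such that the interval $[0,T]$ can be written as the union of disjoint subintervals

\[ [0,T] = \bigcup_i S^i \quad \text{where } S^i = [t^i,t^{i+1}]. \]

It is important to underline that we would like to take the set $\{t^i\} \supset J_x$, but then the decomposition of the interval $[0,T]$ as a finite union of intervals would not be true in general. Due to this complication, we must take a subset, and in the end, we show how to conclude with a diagonal argument. Then, letting $\mathsf{t}_\beta^i = \mathsf{t}_\beta(s(t^i))$, with $\mathsf{t}_\beta$ from (41) and $s$ from (34), we define

\[ \lambda_\beta^i := \frac{\mathsf{t}_\beta^{i+1} - \mathsf{t}_\beta^i}{t^{i+1} - t^i}, \]

and the recovery sequence is

\[ x_\beta(t) := \mathsf{x}\big(\mathsf{t}_\beta^{-1}(\lambda_\beta^i(t - t^i) + \mathsf{t}_\beta^i)\big) \quad \text{for } t \in S^i, \tag{46} \]

so that

\[ \dot x_\beta(t) = \frac{\dot{\mathsf{x}}}{\dot{\mathsf{t}}_\beta}\big(\mathsf{t}_\beta^{-1}(\lambda_\beta^i(t - t^i) + \mathsf{t}_\beta^i)\big)\,\lambda_\beta^i \quad \text{for } t \in (S^i)^\circ. \]

We have now that

\[
\begin{aligned}
&\int_0^T \big[ \psi_\beta(\dot x_\beta(t)) + \psi_\beta^*(\nabla E(x_\beta(t),t)) \big]\,dt \\
&\qquad = \sum_i \int_{s(t^i)}^{s(t^{i+1})} \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\dot{\mathsf{t}}_\beta(s)}\lambda_\beta^i \Big) + \psi_\beta^*\big( \nabla E(\mathsf{x}(s), (\lambda_\beta^i)^{-1}(\mathsf{t}_\beta(s) - \mathsf{t}_\beta^i) + t^i) \big) \Big]\,\frac{\dot{\mathsf{t}}_\beta(s)}{\lambda_\beta^i}\,ds.
\end{aligned}
\]

Applying Lemma 1 in every subinterval $[s(t^i),s(t^{i+1})]$, we obtain the same bounds (42-44) with $C_\beta S$ substituted by $C_\beta|s(t^{i+1}) - s(t^i)|$, and it is important to underline that $C_\beta$ is independent of $\epsilon$. Then, inequality (38) follows because

\[ \sum_i C_\beta|s(t^{i+1}) - s(t^i)| = C_\beta S. \]

The convergence of the variation follows again by definition of $x_\beta$.

The pointwise convergence of $x_\beta(t) \to x(t)$ for $t \in [0,T] \setminus J_x$ is again trivial. The following calculations show that, by construction, the convergence holds also in the points $\{t^i\} \subset J_x$:

\[ x_\beta(t^i) \overset{(46)}{=} \mathsf{x}\big(\mathsf{t}_\beta^{-1}(\mathsf{t}_\beta^i)\big) = \mathsf{x}\big(\mathsf{t}_\beta^{-1}(\mathsf{t}_\beta(s(t^i)))\big) = \mathsf{x}(s(t^i)) = x(t^i), \]

while from (45) and the convergence of the variation we have that

\[ \lim_{\beta\to\infty} |x_\beta(t) - x(t)| < \epsilon, \quad \forall t \in J_x \setminus \{t^i\}. \]

In fact the recovery sequence $x_\beta$ has a hidden dependence on $\epsilon$; taking $\epsilon = \beta^{-1}$ we define a new recovery sequence, which we keep labelling $x_\beta$, and sending $\beta \to \infty$ ($\epsilon$ to zero) we conclude. □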

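Proof (Proof of Lemma 1) First note that for any $s' \in [0,S]$,

\[ 0 \le \mathsf{t}_\beta(s') - \mathsf{t}(s') = \int_0^{s'} \big(\dot{\mathsf{t}}_\beta(s) - \dot{\mathsf{t}}(s)\big)\,ds \le \int_0^S \varepsilon_\beta(s)\,ds \le \int_0^S \frac{|\dot{\mathsf{x}}(s)|}{\partial\psi_\beta^*(A + \delta_\beta)}\,ds = \mathrm{Var}(\mathsf{x},[0,S])\,\kappa_\beta \le \frac{2M}{L}\,\kappa_\beta. \]

Consequently,

\[ 0 \le T_\beta - T = T(\lambda_\beta - 1) \le \frac{2M}{L}\,\kappa_\beta \to 0 \quad \text{as } \beta \to \infty. \tag{47} \]

Using the Lipschitz continuity of $\nabla E$ in time, we also have

\[
\begin{aligned}
|\nabla E(\mathsf{x}(s),\mathsf{t}_\beta(s)\lambda_\beta^{-1})| - |\nabla E(\mathsf{x}(s),\mathsf{t}(s))| &\le L\,|\mathsf{t}_\beta(s)\lambda_\beta^{-1} - \mathsf{t}(s)| \\
&\le L\,|\mathsf{t}_\beta(s)\lambda_\beta^{-1} - \mathsf{t}_\beta(s)| + L\,|\mathsf{t}_\beta(s) - \mathsf{t}(s)| \\
&\le M\kappa_\beta.
\end{aligned}
\tag{48}
\]

We now prove (42). Using the convexity of $\psi_\beta^*$,

\[
\begin{aligned}
\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}_\beta(s)\lambda_\beta^{-1})|\big) &\overset{(48)}{\le} \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big) \\
&\le \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big) + M\kappa_\beta\,\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big).
\end{aligned}
\tag{49}
\]

Setting $S_\beta := \{s \in [0,S] : \dot{\mathsf{t}}_\beta(s) = \dot{\mathsf{t}}(s)\}$, we split the domain into $S_\beta$ and $S_\beta^c$. On $S_\beta$, since $\dot{\mathsf{t}}(s) > 0$, the finiteness of $L$ implies that $|\nabla E| \le A$; on $S_\beta^c$, $\dot{\mathsf{t}}_\beta(s) = \varepsilon_\beta(s)$. Therefore,

\[
\begin{aligned}
\int_0^S \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big)\,\big[\dot{\mathsf{t}}_\beta(s) - \varepsilon_\beta(s)\big]\,ds
&= \int_{S_\beta} \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big)\,\big[\dot{\mathsf{t}}_\beta(s) - \varepsilon_\beta(s)\big]\,ds \\
&\le \psi_\beta^*(A)\int_{S_\beta} \dot{\mathsf{t}}_\beta(s)\,ds \le C_\beta S,
\end{aligned}
\tag{50}
\]

with $C_\beta := \psi_\beta^*(A)$.

We also integrate the second term in (49) first on $S_\beta$ and then on $S_\beta^c$. On $S_\beta$, again since $|\nabla E| \le A$,

\[ \int_{S_\beta} M\kappa_\beta\,\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big)\,\dot{\mathsf{t}}_\beta(s)\,ds \le M\kappa_\beta\,\partial\psi_\beta^*(A + M\kappa_\beta)\int_{S_\beta} \dot{\mathsf{t}}_\beta(s)\,ds \le C_\beta S, \]

with $C_\beta := M\kappa_\beta\,\partial\psi_\beta^*(A + M\kappa_\beta)$. On the other hand, on $S_\beta^c$, using

\[ \dot{\mathsf{t}}_\beta(s) = \varepsilon_\beta(s) = \frac{|\dot{\mathsf{x}}(s)|}{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)}, \]

we get

\[
\begin{aligned}
&\int_{S_\beta^c} M\kappa_\beta\,\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big)\,\dot{\mathsf{t}}_\beta(s)\,ds \\
&\qquad \le M\int_{S_\beta^c} \kappa_\beta\,\frac{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big)}{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)}\,|\dot{\mathsf{x}}(s)|\,ds \le C_\beta S,
\end{aligned}
\tag{51}
\]

where the last inequality holds because $\mathrm{Var}(\mathsf{x},[0,S]) \le S$, with

\[ C_\beta := \sup_{s \in [0,S]} M\kappa_\beta\,\frac{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| + M\kappa_\beta\big)}{\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)}. \]

Together, (50-51) prove (42) with vanishing $C_\beta$ thanks to Hypotheses B-C.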

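To prove (43), we use the fact that $\dot{\mathsf{t}}_\beta(s) \ge \varepsilon_\beta(s)$ for all $s$, and that the mapping $\tau \mapsto \tau\,\psi_\beta(v/\tau)$ is non-increasing. Therefore,

\[
\begin{aligned}
\int_0^S \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)\lambda_\beta}{\dot{\mathsf{t}}_\beta(s)} \Big)\,\frac{\dot{\mathsf{t}}_\beta(s)}{\lambda_\beta}\,ds
&\le \int_0^S \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)\lambda_\beta}{\varepsilon_\beta(s)} \Big)\,\varepsilon_\beta(s)\,ds \\
&\overset{(*)}{\le} \int_0^S \Big\{ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\varepsilon_\beta(s)} \Big) + C\,\partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)\,(\lambda_\beta - 1) \Big\}\,\varepsilon_\beta(s)\,ds,
\end{aligned}
\]

where $C$ is a constant and we prove the inequality marked (*) below. Continuing with the argument, we again apply the definition (40) of $\varepsilon_\beta$ to find

\[ (\lambda_\beta - 1)\int_0^S \partial\psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)\,\varepsilon_\beta(s)\,ds \le (\lambda_\beta - 1)\int_0^S |\dot{\mathsf{x}}(s)|\,ds \le C_\beta S, \]

where $C_\beta := (\lambda_\beta - 1)$ converges to zero by (47). We next prove (44), with $C_\beta := \delta_\beta$. We calculate

\[
\begin{aligned}
&\int_0^S \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\varepsilon_\beta(s)} \Big) + \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))|\big) \Big]\,\varepsilon_\beta(s)\,ds \\
&\qquad \le \int_0^S \Big[ \psi_\beta\Big( \frac{\dot{\mathsf{x}}(s)}{\varepsilon_\beta(s)} \Big) + \psi_\beta^*\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big) \Big]\,\varepsilon_\beta(s)\,ds \\
&\qquad = \int_0^S |\dot{\mathsf{x}}(s)|\,\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee (A + \delta_\beta)\big)\,ds \qquad \text{by (39)} \\
&\qquad \le \int_0^S |\dot{\mathsf{x}}(s)|\,\big(|\nabla E(\mathsf{x}(s),\mathsf{t}(s))| \vee A\big)\,ds + C_\beta S.
\end{aligned}
\]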

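We finally prove the inequality (*) above, as the following separate result: for each $R > 0$, there exists $C > 0$ such that

\[ \forall a > 1,\ \forall |z| \le R,\ \forall \beta: \quad \psi_\beta(a\,\partial\psi_\beta^*(z)) \le \psi_\beta(\partial\psi_\beta^*(z)) + C(a - 1)\,\partial\psi_\beta^*(z). \]

To show this, note that by Hypothesis D for each $a > 1$ and for each $|z| \le R$ there exists $\eta_\beta(z,a)$ such that

\[ \partial\psi_\beta^*(z + \eta_\beta(z,a)) = a\,\partial\psi_\beta^*(z), \]

and $\eta_\beta$ is bounded uniformly in $a$, $\beta$, and $|z| \le R$. The following three statements follow from convexity and convex duality:

\[
\begin{aligned}
&\psi_\beta(a\,\partial\psi_\beta^*(z)) + \psi_\beta^*(z + \eta_\beta) - (z + \eta_\beta)\,a\,\partial\psi_\beta^*(z) = 0; \\
&\psi_\beta(\partial\psi_\beta^*(z)) + \psi_\beta^*(z) - z\,\partial\psi_\beta^*(z) = 0; \\
&\psi_\beta^*(z + \eta_\beta) - \psi_\beta^*(z) - \eta_\beta\,\partial\psi_\beta^*(z) \ge 0.
\end{aligned}
\]

Upon subtracting the second and third line from the first, we find

\[ \psi_\beta(a\,\partial\psi_\beta^*(z)) \le \psi_\beta(\partial\psi_\beta^*(z)) + (z + \eta_\beta)(a - 1)\,\partial\psi_\beta^*(z), \]

which implies the result.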
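As a sanity check, the final inequality can be tested on the viscoplastic example, where everything is explicit. The sketch below assumes $\psi_\beta^*(w) = \beta(|w| - A)_+^2$, for which a direct computation of the Legendre transform gives $\psi_\beta(v) = A|v| + v^2/(4\beta)$ (this closed form is derived for the sketch and is not quoted from the text), together with $\eta_\beta = (a-1)(z-A)$ from the example above. The function names are hypothetical.

```python
def dpsi_star(z, A, beta):
    """Derivative of psi*_beta(w) = beta * max(|w| - A, 0)^2, for z >= 0."""
    return 2.0 * beta * max(z - A, 0.0)

def psi(v, A, beta):
    """Legendre dual of psi*_beta: A*|v| + v^2 / (4*beta)."""
    return A * abs(v) + v * v / (4.0 * beta)

def gap(z, a, A, beta):
    """Right-hand side minus left-hand side of the final inequality (z > A, a > 1)."""
    d = dpsi_star(z, A, beta)
    eta = (a - 1.0) * (z - A)  # solves dpsi_star(z + eta) = a * dpsi_star(z)
    return psi(d, A, beta) + (z + eta) * (a - 1.0) * d - psi(a * d, A, beta)

if __name__ == "__main__":
    A = 1.0
    for beta in (1.0, 10.0, 100.0):
        for z in (1.1, 2.0, 5.0):
            for a in (1.5, 3.0):
                print(beta, z, a, gap(z, a, A, beta))
```

In this example the gap equals $\beta (z-A)^2 (a-1)^2$, so it is non-negative, as the inequality requires.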

5 Discussion

In the introduction, we posed the question whether we could understand the distinction and the relationship between gradient-flow and rate-independent systems from the point of view of stochastic processes. The simple model in one dimension of this paper gives a very clear answer that we summarize in our words as follows:

- The continuum limit is a generalized gradient flow, with non-quadratic, non-1-homogeneous dissipation, and the large-deviations rate functional 'is' the corresponding generalized gradient-flow structure, in the sense of Sect. 1.4;

- Taking further limits recovers both cases: quadratic gradient-flow and rate-independent;

- At least some of the limits are robust against exchanging the order of the limits, and we conjecture that this robustness goes much further.

Therefore, the quadratic and rate-independent cases appear as natural completion in the scale of systems characterized by $\alpha$ and $\beta$.

In addition, the details of the proofs show how the formulation in terms of J of (a) large deviations, (b) generalized gradient flows including rate-independent systems, and (c) convergence results for these systems gives a unified view on the field and a coherent set of tools for the analysis and manipulation of the systems.

Related issues have been investigated in the case of stochastic differential equations. The two limiting processes, $n \to \infty$ and $\beta \to \{0, \infty\}$, can be interpreted as differently scaled combinations of two limiting processes: (a) the small-noise limit, (b) the limit of vanishing microstructure. In the case of SDEs [6,18,22], three regimes have been identified, corresponding to 'microstructure smaller than noise', 'noise smaller than microstructure', and the critical case. In the first of these, 'microstructure smaller than noise', a behaviour arises that resembles the quadratic limit of this paper, in which the microstructure is effectively swamped by the noise. The critical case resembles our original large-deviations result (Theorem 1) in that both give non-quadratic, non-one-homogeneous rate functionals. Finally, when the noise is asymptotically smaller than the microstructure, a limit similar to the rate-independent limit is obtained in [18,22], but because the authors consider time-invariant energies and a different scaling, the behaviour of the limiting system is rather different.

The one-dimensionality of the current set-up may appear to be a significant restriction, but we believe (and in some cases we know) that the structure can be generalized to a wide class of other systems. For instance,

- The initial large-deviations result (Theorem 1) also holds in higher dimensions; other proofs of this and similar results are given in [12,42].

- The joint large-deviations-quadratic limit (Theorem 3) can be generalized to higher dimensions with only notational changes in the proof.

- Of the proof of the convergence to a rate-independent system (Theorem 5), one part (the liminf-inequality) has been done in the generality of a metric space, with a specific functional form of the dissipation potential, in [33]. The other part, the construction of a recovery sequence, is the subject of current work; here the characterization of the limiting jump term depends on the particular form of the approximating f p - f J, in a way that is not yet clear.

More generally, the results of [30] show that the connection between large-deviations principles and generalized gradient flows is robust and arises for all reversible stochastic processes and quite a few more (such as the GENERIC system in [17]).

In Fig. 3, the question mark represents an open problem: the combined large-deviation-rate-independent limit. We conjecture that, as in the combined large-deviation-quadratic limit (Theorem 3), a large-deviations principle holds in this limit, with rate functional $J_{\mathrm{RI}}$. Unfortunately, the framework provided by [20] does not seem to apply as-is, and the form of this functional will require a radical change in the strategy of the proof.

Acknowledgments We sincerely thank Jin Feng for his valuable comments and suggestions. Throughout the preparation of this work, Giuseppe Savaré provided many useful suggestions and critical remarks, and we are very grateful for his contribution. G.A.B. and M.A.P. gratefully acknowledge support from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) VICI Grant 639.033.008.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Abeyaratne, R., Chu, C., James, R.D.: Kinetics of materials with wiggly energies: theory and application to the evolution of twinning microstructures in a Cu-Al-Ni shape memory alloy. Philos. Mag. A 73(2), 457-497 (1996)

2. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Commun. Math. Phys. 307, 791-815 (2011)

3. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: Large deviations and gradient flows. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 371(2005), 20120341 (2013)

4. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems, vol. 254. Clarendon Press Oxford, Oxford (2000)

5. Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures, 2nd ed. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel (2008)

6. Baldi, P.: Large deviations for diffusion processes with homogenization and applications. Ann. Probab. 19(2), 509-524 (1991)

7. Basinski, Z.S.: Thermally activated glide in face-centred cubic metals and its application to the theory of strain hardening. Philos. Mag. 4(40), 393-432 (1959). doi:10.1080/14786435908233412

8. Becker, R.: Über die Plasticität amorpher und kristalliner fester Körper. Phys. Z. 26, 919-925 (1925)

9. Berglund, N.: Kramers' Law: Validity, Derivations and Generalisations. Arxiv preprint arXiv:1106.5799 (2011)

10. Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)

11. Cagnetti, F.: A vanishing viscosity approach to fracture growth in a cohesive zone model with prescribed crack path. Math. Models Methods Appl. Sci. 18(07), 1027-1071 (2008)

12. Chen, X.: Global asymptotic limit of solutions of the Cahn-Hilliard equation. J. Differ. Geom. 44, 262-311 (1996)

13. Dal Maso, G., DeSimone, A., Mora, M.G.: Quasistatic evolution problems for linearly elastic-perfectly plastic materials. Arch. Ration. Mech. Anal. 180(2), 237-291 (2006)

14. Dal Maso, G., DeSimone, A., Mora, M.G., Morini, M.: A vanishing viscosity approach to quasistatic evolution in plasticity with softening. Arch. Ration. Mech. Anal. 189(3), 469-544 (2008)

15. Dirr, N., Laschos, V., Zimmer, J.: Upscaling from particle models to entropic gradient flows. J. Math. Phys. 53(6), 063704 (2012)

16. Duong, M.H., Laschos, V., Renger, D.R.M.: Wasserstein gradient flows from large deviations of many-particle limits. ESAIM: Control Optim. Calc. Var. E-first (2013)

17. Duong, M.H., Peletier, M.A., Zimmer, J.: GENERIC formalism of a Vlasov-Fokker-Planck equation and connection to large-deviation principles. Nonlinearity 26, 2951-2971 (2013)

18. Dupuis, P., Spiliopoulos, K.: Large deviations for multiscale diffusion via weak convergence methods. Stoch. Process. Appl. 122(4), 1947-1987 (2012)

19. Ethier, S.N., Kurtz, T.G.: Markov Processes: Characterization and Convergence, vol. 282. Wiley, Hoboken (2009)

20. Feng, J., Kurtz, T.G.: Large Deviations for Stochastic Processes, vol. 131. American Mathematical Society, Providence (2006)

21. Fiaschi, A.: A vanishing viscosity approach to a quasistatic evolution problem with nonconvex energy. Annales de l'Institut Henri Poincaré (C) Non Linear Anal. 26(4), 1055-1080 (2009)

22. Freidlin, M.I., Sowers, R.B.: A comparison of homogenization and large deviations, with applications to wavefront propagation. Stoch. Process. Appl. 82(1), 23-52 (1999)

23. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, vol. 260. Springer, New York (2012)

24. Kramers, H.A.: Brownian motion in a field of force and the diffusion model of chemical reactions. Physica 7(4), 284-304 (1940)

25. Krausz, A.S., Eyring, H.: Deformation Kinetics. Wiley, New York (1975)

26. Mainik, A., Mielke, A.: Existence results for energetic models for rate-independent systems. Calc. Var. Partial Differ. Equ. 22(1), 73-99 (2005)

27. Mielke, A.: Handbook of Differential Equations: Evolutionary Differential Equations, chap. Evolution in Rate-Independent Systems, pp. 461-559. North-Holland, Amsterdam (2005)

28. Mielke, A.: Emergence of rate-independent dissipation from viscous systems with wiggly energies. Contin. Mech. Thermodyn. 24(4-6), 591-606 (2012)

29. Mielke, A.: On Evolutionary Gamma-Convergence for Gradient Systems. Tech. Rep. 1915, WIAS, Berlin (2014)

30. Mielke, A., Peletier, M.A., Renger, D.R.M.: On the Relation Between Gradient Flows and the Large-Deviation Principle, with Applications to Markov Chains and Diffusion. arXiv preprint arXiv:1312.7591 (2013)

31. Mielke, A., Rossi, R., Savaré, G.: Modeling solutions with jumps for rate-independent systems on metric spaces. Discrete Contin. Dyn. Syst. A 25(2), 585-615 (2009)

32. Mielke, A., Rossi, R., Savaré, G.: BV solutions and viscosity approximations of rate-independent systems. ESAIM Control Optim. Calc. Var. 18(01), 36-80 (2012)

33. Mielke, A., Rossi, R., Savaré, G.: Variational convergence of gradient flows and rate-independent evolutions in metric spaces. Milan J. Math. 80(2), 381-410 (2012)

34. Mielke, A., Rossi, R., Savaré, G.: Balanced-Viscosity (BV) Solutions to Infinite Dimensional Rate-Independent Systems. arXiv preprint arXiv:1309.6291 (2013)

35. Mielke, A., Theil, F.: A mathematical model for rate-independent phase transformations with hysteresis. In: Proceedings of the Workshop on "Models of Continuum Mechanics in Analysis and Engineering", pp. 117-129 (1999)

36. Mielke, A., Theil, F.: On rate-independent hysteresis models. Nonlinear Differ. Equ. Appl. 11(2), 151-189 (2004)

37. Mielke, A., Theil, F., Levitas, V.I.: A variational formulation of rate-independent phase transformations using an extremum principle. Arch. Ration. Mech. Anal. 162(2), 137-177 (2002)

38. Mielke, A., Truskinovsky, L.: From discrete visco-elasticity to continuum rate-independent plasticity: rigorous results. Arch. Ration. Mech. Anal. 203(2), 577-619 (2012)

39. Orowan, E.: Problems of plastic gliding. Proc. Phys. Soc. 52, 8-22 (1940)

40. Puglisi, G., Truskinovsky, L.: Thermodynamics of rate-independent plasticity. J. Mech. Phys. Solids 53(3), 655-679 (2005). doi:10.1016/j.jmps.2004.08.004. http://www.sciencedirect.com/science/article/pii/S0022509604001425

41. Renger, D.R.M.: Microscopic Interpretation of Wasserstein Gradient Flows. Ph.D. thesis, Technische Universiteit Eindhoven (2013). http://alexandria.tue.nl/extra2/749143.pdf

42. Shwartz, A., Weiss, A.: Large Deviations for Performance Analysis: Queues, Communications, and Computing. Chapman & Hall/CRC, London (1995)

43. Sullivan, T.J.: Analysis of Gradient Descents in Random Energies and Heat Baths. Ph.D. thesis, University of Warwick (2009)

44. Sullivan, T.J., Koslowski, M., Theil, F., Ortiz, M.: On the behavior of dissipative systems in contact with a heat bath: application to Andrade creep. J. Mech. Phys. Solids 57(7), 1058-1077 (2009)

45. Trotter, H.F.: Approximation of semi-groups of operators. Pac. J. Math. 8(4), 887-919 (1958)

46. Varadhan, S.R.S.: Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19(3), 261-286 (1966)

47. Wentzell, A.D.: Rough limit theorems on large deviations for Markov stochastic processes I. Theory Probab. Appl. 21(2), 227-242 (1977)

48. Wentzell, A.D.: Limit Theorems on Large Deviations for Markov Stochastic Processes, vol. 38. Springer, New York (1990)