ESAIM: PROCEEDINGS, January 2014, Vol. 44, p. 177-196

SMAI Groupe MAS - Journées MAS 2012 - Session thématique

LIMIT THEOREMS AND INEQUALITIES VIA MARTINGALE METHODS

Jean-René Chazottes1, Christophe Cuny2, Jérôme Dedecker3, Xiequan Fan4 and Sarah Lemler5

Abstract. In these notes, we first give a brief overview of martingale methods, from Paul Lévy (1935) until now, to explain why these methods have become a central tool in probability, statistics and ergodic theory. Next, we present some recent results for, or based on, martingales: exponential bounds for super-martingales, concentration inequalities for Lipschitz functionals of dynamical systems, oracle inequalities for the Cox model in a high-dimensional setting, and invariance principles for stationary sequences.

Introduction

In this introduction, our goal is twofold. We first recall the early developments of the theory of martingales in the field of limit theorems and inequalities. Next, we briefly explain how these results can be extended to more general sequences. We do not attempt to give all the references on the subject, which would be too long an exercise, but we try to give some essential references, from the beginning until now, to explain how martingale methods have become a central tool in probability, statistics and ergodic theory.

In order to write the main results without introducing too much notation, we shall consider in this introduction the simple case of stationary sequences. Let $(\Omega, \mathcal{A}, P)$ be a probability space, let $T$ be a bijective bi-measurable transformation preserving the probability $P$, and let $\mathcal{I}$ be the $\sigma$-algebra of $T$-invariant sets. Let $X_0$ be a centered and square integrable random variable with variance $\sigma^2$, and define the stationary sequence $(X_i)_{i \in \mathbb{Z}}$ by $X_i = X_0 \circ T^i$. Given a $\sigma$-algebra $\mathcal{F}_0$ such that $\mathcal{F}_0 \subset T^{-1}(\mathcal{F}_0)$, we introduce the non-decreasing filtration $\mathcal{F}_i = T^{-i}(\mathcal{F}_0)$, and the tail $\sigma$-algebras $\mathcal{F}_{-\infty} = \bigcap_{k \in \mathbb{Z}} \mathcal{F}_k$ and $\mathcal{F}_{\infty} = \bigvee_{k \in \mathbb{Z}} \mathcal{F}_k$. Let $S_n$ denote the partial sum

$$S_n = X_1 + \cdots + X_n.$$

1 CPHT, CNRS UMR 7644, École Polytechnique, 91128 Palaiseau cedex, France.

2 Laboratoire MAS, École Centrale de Paris, Grande Voie des Vignes, 92295 Chatenay-Malabry cedex, France.

3 Laboratoire MAP5, CNRS UMR 8145, Université Paris-Descartes, Sorbonne Paris Cité, 45 rue des Saints Pères, 75270 Paris cedex 06, France.

4 LMBA, UMR CNRS 6205, Université de Bretagne Sud, Campus de Tohannic, 56017 Vannes, France

5 Laboratoire Statistique et Génome, CNRS UMR 8071- USC INRA, Université d'Évry Val d'Essonne, 23 bvd de France 91 037 Évry, France

© EDP Sciences, SMAI 2013

Article published online by EDP Sciences and available at http://www.esaim-proc.org or http://dx.doi.org/10.1051/proc/201444012

0.1. The martingale case

Let us first recall some important results when $(X_i)_{i \in \mathbb{Z}}$ is a sequence of martingale differences adapted to the filtration $\mathcal{F}_i$, that is when $X_0$ is $\mathcal{F}_0$-measurable and $E(X_0|\mathcal{F}_{-1}) = 0$ almost surely.

Lévy (1935) provided the first generalization of the central limit theorem (CLT) for sequences of independent and identically distributed (iid) random variables to the martingale case. His result reads as follows: if $E(X_0^2|\mathcal{F}_{-1}) = \sigma^2$ almost surely, then $n^{-1/2} S_n$ converges in distribution to $\mathcal{N}(0, \sigma^2)$.

The strong assumption $E(X_0^2|\mathcal{F}_{-1}) = \sigma^2$ almost surely was removed independently by Billingsley (1961) and Ibragimov (1963), who proved that

$$n^{-1/2} S_n \text{ converges in distribution to } \sqrt{\eta}\, N, \tag{0.1}$$

where $\eta = E(X_0^2|\mathcal{I})$ and $N$ is a standard Gaussian random variable independent of $\mathcal{I}$. Actually, both proofs were done in the case where $\mathcal{I}$ is $P$-trivial (the ergodic case) but they remain unchanged in the general situation. As we shall see later on, this result is not only a slight generalization of the iid situation: in many interesting cases, the partial sum of a stationary sequence can be approximated by a martingale with stationary differences, to which the result of Billingsley-Ibragimov applies. The CLT for martingales was next extended to the non-stationary case by Brown (1971), who also proved the weak invariance principle for the process $\{S_{[nt]}, t \in [0,1]\}$. A version of Billingsley-Ibragimov's CLT for random variables with values in 2-smooth Banach spaces is given in Woyczinsky (1975).

At the same time, different authors have obtained moment and exponential bounds for martingales. For instance Burkholder (1966, 1973) has proved that, if $E(|X_0|^p) < \infty$ for some $p \in ]1, \infty[$, then there exist two positive constants $c_p$ and $C_p$ depending on $p$ such that

$$c_p \Big\| \Big( \sum_{k=1}^n X_k^2 \Big)^{1/2} \Big\|_p \leq \Big\| \max_{1 \leq k \leq n} |S_k| \Big\|_p \leq C_p \Big\| \Big( \sum_{k=1}^n X_k^2 \Big)^{1/2} \Big\|_p. \tag{0.2}$$

The extension of this famous inequality to the case of continuous martingales is known as the Burkholder-Davis-Gundy inequality (Burkholder, Davis and Gundy (1972)). Burkholder's inequality (0.2) is also true in separable Hilbert spaces (using the square of the norm of $X_i$ instead of $X_i^2$), and the upper bound remains valid in 2-smooth Banach spaces (see Pinelis (1994)).

In the sixties, Hoeffding (1963) and Azuma (1967) proved an exponential bound for the deviation of $S_n$. In the stationary case, a version of this result reads as follows: if $X_0 \in [Y_0, Y_0 + \ell]$ almost surely, where $Y_0$ is an $\mathcal{F}_{-1}$-measurable random variable and $\ell$ is some positive constant, then, for any positive $x$,

$$P\Big( \max_{1 \leq k \leq n} |S_k| > x \Big) \leq 2 \exp\Big( - \frac{2x^2}{n \ell^2} \Big). \tag{0.3}$$

This inequality implies in particular that, if $\|X_0\|_\infty < \infty$, then, for any positive $x$,

$$P\Big( \max_{1 \leq k \leq n} |S_k| > x \Big) \leq 2 \exp\Big( - \frac{x^2}{2n \|X_0\|_\infty^2} \Big). \tag{0.4}$$

An extension of Inequality (0.4) to 2-smooth Banach spaces is given in Pinelis (1992). Other moments or exponential bounds may be found in the papers by Burkholder (1973), Freedman (1975), Pinelis (1994) or Liu and Watbled (2009).
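As a numerical illustration of Inequality (0.4), the following minimal sketch compares the bound $2\exp(-x^2/(2n\|X_0\|_\infty^2))$ with a Monte Carlo estimate of $P(\max_{1 \leq k \leq n} |S_k| > x)$. The choice of independent $\pm 1$ steps (a particular martingale difference sequence with $\|X_0\|_\infty = 1$), the function names and the parameter values are assumptions made for this example only.

```python
import math
import random


def azuma_hoeffding_bound(n, x, sup_norm):
    """Right-hand side of (0.4): 2 exp(-x^2 / (2 n ||X_0||_inf^2))."""
    return 2.0 * math.exp(-x * x / (2.0 * n * sup_norm ** 2))


def max_abs_partial_sum(n, rng):
    """Return max_{1<=k<=n} |S_k| for a walk with independent +/-1 steps."""
    s, running_max = 0, 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        running_max = max(running_max, abs(s))
    return running_max


def empirical_tail(n, x, trials, seed=0):
    """Monte Carlo estimate of P(max_k |S_k| > x) for the +/-1 walk."""
    rng = random.Random(seed)
    hits = sum(max_abs_partial_sum(n, rng) > x for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n, x = 100, 25
    print("empirical:", empirical_tail(n, x, trials=2000))
    print("bound    :", azuma_hoeffding_bound(n, x, sup_norm=1.0))
```

The bound is not expected to be tight here; the point is only that the empirical frequency stays below it.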

McDiarmid (1989) has given an interesting application of (0.3) to Lipschitz functions of independent sequences. In the iid case, his inequality reads as follows: let $(\xi_i)_{1 \leq i \leq n}$ be a sequence of iid random variables with values in $\mathcal{X}$, and let $d$ be some distance on $\mathcal{X}$. Let $f$ be a function from $\mathcal{X}^n$ to $\mathbb{R}$ such that

$$|f(x_1, \ldots, x_n) - f(y_1, \ldots, y_n)| \leq \sum_{i=1}^n d(x_i, y_i).$$

Let $M = \|d(\xi_1, \xi_2)\|_\infty$. Then, for any positive $x$,

$$P\big( f(\xi_1, \ldots, \xi_n) - E(f(\xi_1, \ldots, \xi_n)) > x \big) \leq \exp\Big( - \frac{2x^2}{n M^2} \Big). \tag{0.5}$$

McDiarmid (1998) has pointed out a number of applications of Inequality (0.5). This inequality is also an important tool in classification problems (see for instance Freund, Mansour and Schapire (2004)).
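To illustrate how Inequality (0.5) applies to a non-additive functional, here is a small sketch under assumptions chosen for the example: $n$ balls are thrown independently and uniformly into `n_bins` bins, $f$ is the number of empty bins, and $d$ is the discrete distance $d(x,y) = \mathbf{1}_{x \neq y}$, so that the Lipschitz condition holds and $M = 1$. The function names are hypothetical.

```python
import math
import random


def mcdiarmid_bound(n, x, m_const=1.0):
    """Right-hand side of (0.5): exp(-2 x^2 / (n M^2))."""
    return math.exp(-2.0 * x * x / (n * m_const ** 2))


def empty_bins(balls, n_bins):
    """Number of bins in range(n_bins) that receive no ball."""
    return n_bins - len(set(balls))


def empirical_upper_tail(n_balls, n_bins, x, trials, seed=0):
    """Monte Carlo estimate of P(f - E f > x), f = number of empty bins."""
    rng = random.Random(seed)
    # Exact expectation of the number of empty bins.
    mean = n_bins * (1.0 - 1.0 / n_bins) ** n_balls
    hits = 0
    for _ in range(trials):
        balls = [rng.randrange(n_bins) for _ in range(n_balls)]
        if empty_bins(balls, n_bins) - mean > x:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("empirical:", empirical_upper_tail(100, 100, 10, trials=2000))
    print("bound    :", mcdiarmid_bound(100, 10))
```

Moving one ball changes the number of empty bins by at most one, which is exactly the bounded-difference property used in (0.5).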

Rio (2000) has extended McDiarmid's inequality to a large class of dependent sequences. Collet, Martinez and Schmitt (2002) have proved that McDiarmid's inequality holds when $\xi_i = T^i$ is the $i$-th iterate of a uniformly expanding map $T$ of $[0,1]$, and $d(x, y) = |x - y|$.

0.2. The general case

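In this section, we shall no longer assume that $E(X_0|\mathcal{F}_{-1}) = 0$ almost surely, and see how the results of the preceding section can be extended to more general sequences. The main step in this direction is due to Gordin (1969). Gordin noticed that, if

$$\sum_{k=1}^{\infty} \|E(X_k|\mathcal{F}_0)\|_2 < \infty \quad \text{and} \quad \sum_{k=-\infty}^{0} \|X_k - E(X_k|\mathcal{F}_0)\|_2 < \infty, \tag{0.6}$$

then $X_0 = D_0 + Z - Z \circ T$, where

$$D_0 = \sum_{k \in \mathbb{Z}} \big( E(X_k|\mathcal{F}_0) - E(X_k|\mathcal{F}_{-1}) \big) \quad \text{and} \quad Z = \sum_{k=-\infty}^{0} \big( X_k - E(X_k|\mathcal{F}_0) \big) - \sum_{k=1}^{\infty} E(X_k|\mathcal{F}_0).$$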

Notice that both $D_0$ and $Z$ are square integrable, that $D_0$ is $\mathcal{F}_0$-measurable and such that $E(D_0|\mathcal{F}_{-1}) = 0$. Moreover, setting $D_i = D_0 \circ T^i$ and $M_n = D_1 + \cdots + D_n$, one has $S_n = M_n + Z \circ T - Z \circ T^{n+1}$. Since $n^{-1/2} Z \circ T^{n+1}$ converges in probability (and even almost surely) to zero, it follows from Billingsley-Ibragimov's CLT that (0.1) holds with $\eta = E(D_0^2|\mathcal{I})$. Gordin's CLT is a major result, which can be applied to many dependent sequences, including mixing sequences, stationary Markov chains (see Gordin and Lifsiz (1978)), and certain dynamical systems (see the paper by Liverani (1996)). As a striking application, let us mention the paper by Le Borgne (2002) who applied Gordin's CLT to the iterates of ergodic automorphisms of the $d$-dimensional torus. Heyde (1975) noticed that Gordin's proof remains valid by assuming only that

$$\text{the sequences } E(S_n|\mathcal{F}_0) \text{ and } S_n - E(S_n|\mathcal{F}_n) \text{ converge in } L^2, \tag{0.7}$$

and that, under this condition, the process $\{S_{[nt]}, t \in [0,1]\}$ satisfies both the weak and the strong invariance principle.

Alternatively, it follows from the papers by Hannan (1973) and Heyde (1974) that, if

$$E(X_0|\mathcal{F}_{\infty}) = X_0 \text{ a.s.}, \quad E(X_0|\mathcal{F}_{-\infty}) = 0 \text{ a.s.} \quad \text{and} \quad \sum_{k \in \mathbb{Z}} \|E(X_k|\mathcal{F}_0) - E(X_k|\mathcal{F}_{-1})\|_2 < \infty, \tag{0.8}$$

then Gordin's martingale difference $D_0$ is well defined and square integrable, and $\|S_n - M_n\|_2 = o(\sqrt{n})$, so that the CLT (0.1) holds with $\eta = E(D_0^2|\mathcal{I})$. Clearly, this condition is weaker than (0.6). Moreover, one can prove that if (0.8) holds then $\{S_{[nt]}, t \in [0,1]\}$ satisfies the weak invariance principle (see Hannan (1979) for the adapted case, i.e. when $X_0$ is $\mathcal{F}_0$-measurable, and Dedecker, Merlevede and Volny (2007) for the non-adapted case and extensions). In a recent paper, Cuny (2012a) has proved that the strong invariance principle also holds under (0.8) for random variables with values in a 2-smooth Banach space.

Another major improvement of Gordin's CLT is due to Maxwell and Woodroofe (2000). These authors have proved that, if the condition (0.7) is weakened to

$$\sum_{k=1}^{\infty} \frac{\|E(S_k|\mathcal{F}_0)\|_2}{k^{3/2}} < \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \frac{\|S_k - E(S_k|\mathcal{F}_k)\|_2}{k^{3/2}} < \infty, \tag{0.9}$$

then there exists a square integrable martingale difference $D_0$ such that $\|S_n - M_n\|_2 = o(\sqrt{n})$, so that the CLT (0.1) holds with $\eta = E(D_0^2|\mathcal{I})$ (in fact Maxwell and Woodroofe have proved the result in the adapted case, and the non-adapted case (0.9) is due to Volny (2007)). Peligrad and Utev (2005) proved that if (0.9) holds, then $\{S_{[nt]}, t \in [0,1]\}$ satisfies the weak invariance principle (again, the non-adapted case is due to Volny (2007)).

Note that necessary and sufficient conditions for the martingale approximation $\|S_n - M_n\|_2 = o(\sqrt{n})$ are given in the papers by Dedecker, Merlevede and Volny (2007) and Zhao and Woodroofe (2008). Further refinements can be found in the paper by Gordin and Peligrad (2011). Finally, Wu (2007) and Dedecker, Doukhan and Merlevede (2012) have obtained rates of convergence in the strong invariance principle, by reinforcing the condition (0.8).

Concerning moment and exponential bounds, let us mention the paper by Peligrad, Utev and Wu (2007). These authors have proved that, for $p \in [2, \infty[$, there exist a positive constant $K_p$ depending on $p$ and a positive constant $C$ such that

$$\Big\| \max_{1 \leq k \leq n} |S_k| \Big\|_p \leq K_p \sqrt{n} \Big( \|X_0\|_p + C \sum_{k=1}^n \frac{\|E(S_k|\mathcal{F}_0)\|_p}{k^{3/2}} + C \sum_{k=1}^n \frac{\|S_k - E(S_k|\mathcal{F}_k)\|_p}{k^{3/2}} \Big)$$

(the non-adapted case is due to Volny (2007)), and that

$$P\Big( \max_{1 \leq k \leq n} |S_k| > x \Big) \leq 4\sqrt{e}\, \exp\Big( - \frac{x^2}{2n (\|X_0\|_\infty + C_n)^2} \Big), \quad \text{where} \quad C_n = D \Big( \sum_{k=1}^n \frac{\|E(S_k|\mathcal{F}_0)\|_\infty}{k^{3/2}} + \sum_{k=1}^n \frac{\|S_k - E(S_k|\mathcal{F}_k)\|_\infty}{k^{3/2}} \Big)$$

for some positive constant $D$ (the non-adapted case is due to Dede (2009)). Rosenthal type inequalities can be found in the paper by Merlevede and Peligrad (2013).

0.3. Organization of the paper

In Section 1, X. Fan presents an extension of Hoeffding's inequality to the case of supermartingales with a new method (see Fan, Grama and Liu (2012) for more details). The approach is based on the technique of conjugate distribution and is different from Hoeffding's method (1963). The results improve on several known inequalities of Bennett (1962), Freedman (1975), Nagaev (1979), Haeusler (1984) and Courbot (1999).

In Section 2, J.-R. Chazottes presents some recent concentration inequalities for a large class of nonuniformly hyperbolic dynamical systems modeled by Young towers (see Chazottes and Gouezel (2013) for more details). In this context, he shows how to make use of some classical martingale inequalities, namely Azuma-Hoeffding and Rosenthal-Burkholder inequalities.

In Section 3, S. Lemler shows how an appropriate exponential bound for martingales with jumps makes it possible to choose the weights in a Lasso procedure, for the Cox model in a high-dimensional setting. As a consequence, she obtains non-asymptotic oracle inequalities for the conditional hazard rate function (see Lemler (2012) for more details).

In Section 4, C. Cuny presents the recent developments of the method of approximation by a martingale initiated by Gordin (1969), which has reached a fairly precise form thanks to the characterizations obtained recently by Zhao and Woodroofe (2008) and Gordin and Peligrad (2011). He also shows how the introduction of the operator $QZ = E(Z \circ T|\mathcal{F}_0)$ makes it possible to use classical tools from ergodic theory. Following this approach, C. Cuny has recently proved (see Cuny (2012a) and Cuny (2012b)) that the almost sure invariance principle holds under the condition (0.8) or under the condition (0.9).

1. Exponential inequalities for super-martingales

Let $(\xi_i)_{i=1,\ldots,n}$ be a sequence of centered random variables such that $\sigma_i^2 = E(\xi_i^2) < \infty$. Write $S_n = \sum_{i=1}^n \xi_i$ and $\sigma^2 = \sum_{i=1}^n \sigma_i^2$. Assume that $(\xi_i)_{i=1,\ldots,n}$ are independent and satisfy $|\xi_i| \leq 1$ for all $i$. Bennett (1962) proved that, for all $x > 0$,

$$P(S_n > x) \leq B(x, \sigma^2) := \Big( \frac{\sigma^2}{x + \sigma^2} \Big)^{x + \sigma^2} e^x. \tag{1.1}$$

Bennett's inequality can be improved by a bound depending on $n$. In fact, Hoeffding (1963) improved Bennett's inequality and showed that if $(\xi_i)_{i=1,\ldots,n}$ are independent and satisfy $\xi_i \leq 1$ for all $i$, then, for all $x > 0$,

$$P(S_n > x) \leq H_n(x, \sigma^2) := \Big\{ \Big( \frac{\sigma^2}{x + \sigma^2} \Big)^{x + \sigma^2} \Big( \frac{n}{n - x} \Big)^{n - x} \Big\}^{\frac{n}{n + \sigma^2}} \mathbf{1}_{\{x \leq n\}} \tag{1.2}$$

and

$$H_n(x, \sigma^2) \leq B(x, \sigma^2), \tag{1.3}$$

where by convention $\infty^0 = 1$ when $x = n$. Hoeffding's bound $H_n(x, \sigma^2)$ is the best that can be obtained from the classical Bernstein inequality $P(S_n > x) \leq \inf_{\lambda > 0} E e^{\lambda (S_n - x)}$.
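The two bounds can be compared numerically. The sketch below evaluates $B(x, \sigma^2)$ from (1.1) and $H_n(x, \sigma^2)$ from (1.2) in log-scale and checks the ordering (1.3) on a few values; the function names and the sample values of $(n, x, \sigma^2)$ are choices made for this illustration.

```python
import math


def bennett_bound(x, sigma2):
    """B(x, sigma^2) = (sigma^2 / (x + sigma^2))^(x + sigma^2) * e^x, see (1.1)."""
    return math.exp((x + sigma2) * math.log(sigma2 / (x + sigma2)) + x)


def hoeffding_bound(n, x, sigma2):
    """H_n(x, sigma^2) from (1.2); equals 0 for x > n."""
    if x > n:
        return 0.0
    log_h = (x + sigma2) * math.log(sigma2 / (x + sigma2))
    if x < n:
        # For x == n this factor is (n/0)^0, which is 1 by convention.
        log_h += (n - x) * math.log(n / (n - x))
    return math.exp(n / (n + sigma2) * log_h)


if __name__ == "__main__":
    for n, x, sigma2 in [(100, 10.0, 25.0), (100, 50.0, 25.0), (5, 4.9, 0.1)]:
        print(n, x, sigma2, hoeffding_bound(n, x, sigma2), bennett_bound(x, sigma2))
```

Both functions assume $x > 0$ and $\sigma^2 > 0$; working with logarithms avoids overflow in $e^x$ for moderately large $x$.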

Freedman (1975) extended the inequality of Bennett to the case of supermartingales. Let $(\xi_i, \mathcal{F}_i)_{i=1,\ldots,n}$ be a sequence of supermartingale differences, i.e. $E(\xi_i|\mathcal{F}_{i-1}) \leq 0$. Denote by $\langle S \rangle_k = \sum_{i=1}^k E(\xi_i^2|\mathcal{F}_{i-1})$. Assume that the supermartingale differences $(\xi_i, \mathcal{F}_i)_{i=1,\ldots,n}$ satisfy $\xi_i \leq 1$ for all $i$. Freedman's inequality states that, for all $x, v > 0$,

$$P(S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]) \leq B(x, v^2). \tag{1.4}$$

However, one cannot obtain the Hoeffding inequality by Freedman's method. In this note, we give the Hoeffding inequality for supermartingales with a new method. The main result of this section is the following theorem.

Theorem 1.1. Assume that $(\xi_i, \mathcal{F}_i)_{i=1,\ldots,n}$ is a sequence of supermartingale differences satisfying $\xi_i \leq 1$ for all $i = 1, \ldots, n$. Then, for all $x, v > 0$,

$$P(S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]) \leq H_n(x, v^2) \tag{1.5}$$

$$\leq B(x, v^2). \tag{1.6}$$

It is obvious that if $(\xi_i)_{i=1,\ldots,n}$ are independent then our inequality (1.5) implies Hoeffding's inequality (1.2) with $v^2 = \sigma^2$. Hence, we extend the Hoeffding inequality to the case of supermartingales. Note that (1.6) is the Freedman inequality. Thus we improve on Freedman's inequality.

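Sketch of the proof of Theorem 1.1. For $0 < x \leq n$ and $v > 0$, we define the stopping time $T(x) = \min\{k \in [1,n] : S_k > x \text{ and } \langle S \rangle_k \leq v^2\}$. For $\lambda > 0$, we introduce the martingale $(Z_{T \wedge k}(\lambda), \mathcal{F}_k)_{k=0,\ldots,n}$, where

$$Z_{T \wedge k}(\lambda) = \prod_{i=1}^{T \wedge k} \frac{e^{\lambda \xi_i}}{E(e^{\lambda \xi_i}|\mathcal{F}_{i-1})}, \quad Z_0(\lambda) = 1, \tag{1.7}$$

and the conjugate measure

$$dP_\lambda = Z_{T \wedge n}(\lambda)\, dP. \tag{1.8}$$

Then $\mathbf{1}_{\{S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]\}} = \sum_{k=1}^n \mathbf{1}_{\{T(x) = k\}}$. Using the conjugate measure (1.8) we have

$$P(S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]) = E_\lambda \big[ Z_{T \wedge n}(\lambda)^{-1} \mathbf{1}_{\{S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]\}} \big] = \sum_{k=1}^n E_\lambda \big[ \exp\{-\lambda S_k + \Psi_k(\lambda)\} \mathbf{1}_{\{T(x) = k\}} \big], \tag{1.9}$$

where $\Psi_k(\lambda) = \sum_{i=1}^k \log E(e^{\lambda \xi_i}|\mathcal{F}_{i-1})$. For $T = k$, we can prove that

$$\Psi_k(\lambda) \leq k f\Big( \lambda, \frac{\langle S \rangle_k}{k} \Big) \leq k f\Big( \lambda, \frac{v^2}{k} \Big) \leq n f\Big( \lambda, \frac{v^2}{n} \Big),$$

where $f(\lambda, t) = \log\big( \frac{1}{1+t} e^{-\lambda t} + \frac{t}{1+t} e^{\lambda} \big)$ is increasing and concave in $t \geq 0$. Hence we get

$$P(S_k > x \text{ and } \langle S \rangle_k \leq v^2 \text{ for some } k \in [1,n]) \leq \exp\Big\{ -\lambda x + n f\Big( \lambda, \frac{v^2}{n} \Big) \Big\}. \tag{1.10}$$

Optimizing inequality (1.10) in $\lambda > 0$, we obtain the desired inequality (1.5). Inequality (1.6) follows from (1.3).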
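The final optimization step can be checked numerically. The sketch below assumes the function $f(\lambda, t) = \log\big((e^{-\lambda t} + t e^{\lambda})/(1+t)\big)$ and uses the stationary point $\lambda^* = \frac{1}{1+t} \log \frac{s+t}{t(1-s)}$, with $t = v^2/n$ and $s = x/n$, obtained by setting the derivative of the exponent in (1.10) to zero. This closed form is our own computation for the illustration, not a formula quoted from the text. It then compares the optimized bound with $H_n(x, v^2)$ for $0 < x < n$.

```python
import math


def f(lam, t):
    """f(lambda, t) = log((exp(-lambda t) + t exp(lambda)) / (1 + t))."""
    return math.log((math.exp(-lam * t) + t * math.exp(lam)) / (1.0 + t))


def chernoff_exponent(lam, n, x, v2):
    """Exponent -lambda x + n f(lambda, v^2 / n) appearing in (1.10)."""
    return -lam * x + n * f(lam, v2 / n)


def optimal_lambda(n, x, v2):
    """Stationary point of the exponent, valid for 0 < x < n."""
    t, s = v2 / n, x / n
    return math.log((s + t) / (t * (1.0 - s))) / (1.0 + t)


def hoeffding_bound(n, x, v2):
    """H_n(x, v^2) for 0 < x < n, as in (1.2)."""
    log_h = (x + v2) * math.log(v2 / (x + v2)) + (n - x) * math.log(n / (n - x))
    return math.exp(n / (n + v2) * log_h)


if __name__ == "__main__":
    n, x, v2 = 100, 10.0, 25.0
    lam = optimal_lambda(n, x, v2)
    print(math.exp(chernoff_exponent(lam, n, x, v2)), hoeffding_bound(n, x, v2))
```

The two printed values should agree up to rounding, which is consistent with (1.5) being the optimized form of (1.10).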

By a truncation argument, Theorem 1.1 implies the following result for non bounded supermartingale differences.

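Corollary 1.2. Assume that $(\xi_i, \mathcal{F}_i)_{i=1,\ldots,n}$ is a sequence of supermartingale differences. For $y > 0$, define

$$V_k^2(y) = \sum_{i=1}^k E(\xi_i^2 \mathbf{1}_{\{\xi_i \leq y\}}|\mathcal{F}_{i-1}), \quad k = 1, \ldots, n.$$

Then, for all $x > 0$, $y > 0$ and $v > 0$,

$$P(S_k > x \text{ and } V_k^2(y) \leq v^2 \text{ for some } k \in [1,n]) \leq H_n\Big( \frac{x}{y}, \frac{v^2}{y^2} \Big) + P\Big( \max_{1 \leq i \leq n} \xi_i > y \Big). \tag{1.11}$$

Since $P(V_n^2(y) > v^2) \leq P(\langle S \rangle_n > v^2)$, inequality (1.11) improves the corresponding inequalities of Nagaev (1979), Haeusler (1984) and Courbot (1999) in the sense that $B(x/y, v^2/y^2)$ is replaced by $H_n(x/y, v^2/y^2)$.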

2. Martingale and concentration inequalities for dynamical systems

2.1. Generalities

2.1.1. Concentration inequalities

Let $\Omega$ be a metric space. A function $K$ on $\Omega^n$ is separately Lipschitz if, for all $i$, there exists a constant $\mathrm{Lip}_i(K)$ with

$$|K(x_0, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_{n-1}) - K(x_0, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_{n-1})| \leq \mathrm{Lip}_i(K)\, d(x_i, x_i'),$$

for all points $x_0, \ldots, x_{n-1}, x_i'$ in $\Omega$. Consider a stationary process $(Z_0, Z_1, \ldots)$ taking values in $\Omega$. We say that this process satisfies an exponential concentration inequality if there exists a constant $C$ such that, for any separately Lipschitz function $K(x_0, \ldots, x_{n-1})$, one has

$$E\big( e^{K(Z_0, \ldots, Z_{n-1}) - E(K(Z_0, \ldots, Z_{n-1}))} \big) \leq e^{C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2}. \tag{2.1}$$

Let us stress that this inequality is valid for all $n$, i.e. the constant $C$ does not depend on the number of variables one is considering. An important consequence of such an inequality is a control on the deviation probabilities: for all $t > 0$,

$$P(|K(Z_0, \ldots, Z_{n-1}) - E(K(Z_0, \ldots, Z_{n-1}))| > t) \leq 2 e^{-\frac{t^2}{4C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2}}. \tag{2.2}$$

This inequality follows from the inequality $P(Y > t) \leq e^{-\lambda t} E(e^{\lambda Y})$ ($\lambda > 0$) with $Y = K(Z_0, \ldots, Z_{n-1}) - E(K(Z_0, \ldots, Z_{n-1}))$: one uses inequality (2.1) and optimizes over $\lambda$ by taking $\lambda = t/(2C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2)$. If $(Z_0, Z_1, \ldots)$ is a sequence of bounded i.i.d. random variables, then (2.1) holds (see e.g. McDiarmid (1989)). One can check that, if $K(Z_0, \ldots, Z_{n-1}) = Z_0 + \cdots + Z_{n-1}$, then (2.2) gives the right scales with respect to the central limit theorem and large deviations.
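For completeness, the passage from (2.1) to (2.2) can be written out as follows. This is a sketch of the standard Chernoff argument; the shorthand $L^2$ for the sum of squared Lipschitz constants is introduced here only for readability.

```latex
% Shorthand (ours): Y = K(Z_0,...,Z_{n-1}) - E K(Z_0,...,Z_{n-1}),
%                   L^2 = \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2.
\begin{align*}
P(Y > t)
  &\le e^{-\lambda t}\, E\big(e^{\lambda Y}\big)
   \le \exp\big(-\lambda t + C \lambda^2 L^2\big)
  && \text{by (2.1) applied to } \lambda K,\ \mathrm{Lip}_j(\lambda K) = \lambda\, \mathrm{Lip}_j(K), \\
\lambda^\ast
  &= \frac{t}{2 C L^2}
  && \text{minimizes } \lambda \mapsto -\lambda t + C \lambda^2 L^2, \\
P(Y > t)
  &\le \exp\Big(-\frac{t^2}{4 C L^2}\Big).
\end{align*}
% Applying the same bound to -K controls P(Y < -t); adding the two tails
% gives the factor 2 in (2.2).
```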

In some cases, it is not reasonable to hope for such an exponential inequality. One says that $(Z_0, Z_1, \ldots)$ satisfies a polynomial concentration inequality with moment $Q \geq 2$ if there exists a constant $C$ such that, for any separately Lipschitz function $K(x_0, \ldots, x_{n-1})$, one has

$$E|K(Z_0, \ldots, Z_{n-1}) - E(K(Z_0, \ldots, Z_{n-1}))|^Q \leq C \Big( \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2 \Big)^{Q/2}. \tag{2.3}$$

An important consequence of such an inequality is a control on the deviation probabilities: for all $t > 0$,

$$P(|K(Z_0, \ldots, Z_{n-1}) - E(K(Z_0, \ldots, Z_{n-1}))| > t) \leq C t^{-Q} \Big( \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2 \Big)^{Q/2}. \tag{2.4}$$

The inequality (2.4) readily follows from (2.3) and the Markov inequality.

Notice that if $(Z_0, Z_1, \ldots)$ is a sequence of i.i.d. random variables with $Z_i \in L^Q$, then (2.3) holds (see Boucheron et al. (2005)).

Concentration inequalities are a tool to study in a unified and systematic way the fluctuations of a wide class of functions of random variables K(Zo,..., Zn—1), since the only required condition is that K is separately Lipschitz.

2.1.2. Dynamical systems as stochastic processes

We are interested in processes coming from dynamical systems: we consider a map $T$ on a metric space $\Omega$ (the "phase space"), and a probability measure $\mu$¹ left invariant by $T$, i.e. $\mu \circ T^{-1} = \mu$². The process $(x, Tx, T^2x, \ldots)$, where $x$ is distributed according to $\mu$, has finite-dimensional marginals given by the measures $\mu_n$ on $\Omega^n$ defined by

$$d\mu_n(x_0, \ldots, x_{n-1}) = d\mu(x_0)\, \delta_{x_1 = Tx_0} \cdots \delta_{x_{n-1} = Tx_{n-2}}.$$

This is not a product measure but, if the map $T$ is 'sufficiently mixing', one may expect that $T^k x$ is more or less independent of $x$ if $k$ is large, making the process $(x, Tx, \ldots)$ look like an independent process to some extent³.

A natural way of studying the probabilistic properties of such dynamical systems is to look at Birkhoff sums of an observable $f : \Omega \to \mathbb{R}$, namely $f(x) + f(Tx) + \cdots + f(T^n x)$, that is, partial sums of the process $(f(T^k x))_{k \geq 0}$. For a class of nice observables, typically Lipschitz functions, one can prove convergence in law after appropriate scaling, large deviations, etc. We refer the interested reader to the recent survey by Chazottes (2013).

A general observable based on the observation up to time $n$ is of the form $K(x, Tx, \ldots, T^{n-1}x)$. The basic example is of course the Birkhoff sum of some observable $f$. But many interesting observables do not have such a simple (additive) structure. It is precisely the scope of concentration inequalities to deal with very general observables in a systematic way by using only the fact that they are separately Lipschitz.

2.2. Concentration inequalities for a class of nonuniformly hyperbolic systems

In Chazottes and Gouezel (2013) we obtained concentration inequalities for dynamical systems modeled by the so-called Young towers. We also derived fluctuation bounds for various observables that we shall not present here due to the lack of space.

2.2.1. Set-up

In a nutshell, the set-up is the following. We consider a map $T : \Omega \to \Omega$ which is a nonuniformly hyperbolic system in the sense of L.-S. Young (1998, 1999): the map is modeled by a Young tower constructed over a hyperbolic base $Y \subset \Omega$. The degree of nonuniformity is measured by the return-time function $R : Y \to \mathbb{Z}_+$ to the base, whose tail decays either exponentially or polynomially. Such systems are known to have an SRB⁴ measure $\mu$ absolutely continuous with respect to the Lebesgue measure $m^u$.

2.2.2. Exponential concentration inequality

Theorem 2.1 (Chazottes and Gouëzel (2013)). Let $(\Omega, T, \mu)$ be a dynamical system modeled by a Young tower with exponential tails, i.e., $\int \exp(c_0 R)\, dm^u < \infty$ for some $c_0 > 0$. Then it satisfies an exponential concentration inequality: there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any separately Lipschitz function $K(x_0, \ldots, x_{n-1})$,

$$\int e^{K(x, Tx, \ldots, T^{n-1}x) - \int K(y, Ty, \ldots, T^{n-1}y)\, d\mu(y)}\, d\mu(x) \leq e^{C \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2}. \tag{2.5}$$

Basic examples to which this theorem applies are subshifts of finite type equipped with a Gibbs measure for a Hölder continuous potential, and Axiom A attractors. These are canonical examples of uniformly hyperbolic dynamical systems. More sophisticated systems encompassed by this theorem are for instance the unimodal map on $\Omega = [-1,1]$ defined as $x \mapsto 1 - ax^2$, where $a \in [1, 2]$, or Hénon-like attractors. These are examples of nonuniformly hyperbolic dynamical systems. The Hénon attractor results from the iterations of the map $(x, y) \mapsto (1 - ax^2 + y, bx)$, defined on $\Omega = \mathbb{R}^2$, where $a, b$ are positive parameters. For each $b$ sufficiently small, there exists a subset $P_b \subset (2 - \varepsilon, 2)$ with positive Lebesgue measure such that for every $a \in P_b$, the map admits a unique SRB measure (see Benedicks and Young (2000)).

¹ This probability measure is of course defined on a $\sigma$-algebra. Think of the Borel $\sigma$-algebra for concreteness.

² That is, $\mu(A) = \mu(T^{-1}A)$ for any measurable set $A$ ($T$ need not be invertible).

³ Let us stress that the usual mixing coefficients for stochastic processes are generally not suited for our purposes.

⁴ 'SRB' stands for Sinai-Ruelle-Bowen. See for instance Young (2002).

2.2.3. Polynomial concentration inequality

Theorem 2.2 (Chazottes and Gouëzel (2013)). Let $(\Omega, T, \mu)$ be a dynamical system modeled by a Young tower with a polynomial tail, meaning that, for some $q > 2$, $\int R^q\, dm^u < \infty$. Then it satisfies a polynomial concentration inequality with moment $2q - 2$, i.e., there exists a constant $C > 0$ such that, for any $n \in \mathbb{N}$, for any separately Lipschitz function $K(x_0, \ldots, x_{n-1})$,

$$\int \Big| K(x, Tx, \ldots, T^{n-1}x) - \int K(y, Ty, \ldots, T^{n-1}y)\, d\mu(y) \Big|^{2q-2} d\mu(x) \leq C \Big( \sum_{j=0}^{n-1} \mathrm{Lip}_j(K)^2 \Big)^{q-1}. \tag{2.6}$$

The most important example is certainly the so-called Manneville-Pomeau map, a canonical example of a map on $\Omega = [0,1]$ which is expanding except at $x = 0$. More precisely, let

$$T(x) = \begin{cases} x(1 + 2^\alpha x^\alpha) & \text{if } x \in [0, 1/2), \\ 2x - 1 & \text{if } x \in (1/2, 1], \end{cases}$$

where $\alpha \in (0,1)$. In this case, the base of the Young tower is $Y = (1/2, 1]$, $m(R = n) \sim c/n^{\frac{1}{\alpha} + 1}$ and $d\mu = h\, dm$ where the density $h(x) \sim x^{-\alpha}$ as $x \to 0$. One checks easily that (2.6) holds for all $q < \frac{1}{\alpha}$ provided that $\alpha \in (0, 1/2)$.
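To make the example concrete, here is a minimal sketch of the Manneville-Pomeau map together with a Birkhoff sum along a numerically computed orbit. The function names are ours, the point $x = 1/2$ is sent to the second branch by convention, and the sum is taken over the $n$ points $x, Tx, \ldots, T^{n-1}x$. Floating-point orbits are only pseudo-orbits, so this is an illustration rather than a rigorous computation.

```python
def pomeau_manneville(x, alpha):
    """One step of the map: x(1 + (2x)^alpha) on [0, 1/2), 2x - 1 otherwise."""
    if x < 0.5:
        return x * (1.0 + (2.0 * x) ** alpha)
    return 2.0 * x - 1.0


def orbit(x0, n, alpha):
    """Return the list [x0, T x0, ..., T^(n-1) x0]."""
    points = [x0]
    for _ in range(n - 1):
        points.append(pomeau_manneville(points[-1], alpha))
    return points


def birkhoff_sum(f, x0, n, alpha):
    """Sum of f over the first n points of the orbit of x0."""
    return sum(f(x) for x in orbit(x0, n, alpha))


if __name__ == "__main__":
    # Fraction of time spent in the base Y = (1/2, 1] along one orbit.
    n = 10000
    visits = birkhoff_sum(lambda x: 1.0 if x > 0.5 else 0.0, 0.3141, n, alpha=0.3)
    print(visits / n)
```

Note that $x(1 + 2^\alpha x^\alpha) = x(1 + (2x)^\alpha)$, which is the form used in the code.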

These results are so far the culminating point in the study of concentration inequalities for dynamical systems. They extend and improve all previous results obtained in Collet, Martinez and Schmitt (2000), Chazottes, Collet and Schmitt (2005a), Chazottes, Collet and Schmitt (2005b) and Chazottes et al. (2009). There is still a lot of work to be done (see Chazottes (2013), Section 6).

2.2.4. Examples of observables

Let us briefly mention some observables to which we can successfully apply the previous inequalities. More details can be found in Collet, Martinez and Schmitt (2000), Chazottes, Collet and Schmitt (2005b) and Chazottes and Gouëzel (2013). For instance, we can study the speed at which the distance between the empirical measure and the SRB measure $\mu$ goes to 0. We can also look at the kernel density estimator for maps having absolutely continuous invariant measures. Among other observables we can deal with, let us mention the empirical covariance and the integrated periodogram. Concentration inequalities can be used to obtain an almost sure central limit theorem from the usual central limit theorem (see Chazottes, Collet and Schmitt (2005b)). This illustrates that even if concentration inequalities are mainly intended to obtain fluctuation bounds, they can also be used to get some limit theorems.

Here we detail one example. The basic problem can be formulated as follows. Let $A \subset \Omega$ be a set of initial conditions and $x$ an initial condition not in $A$: how well can one approximate the orbit of $x$ by an orbit from an initial condition of $A$? One can measure the 'average quality of tracing' by defining

$$S_A(x, n) = \frac{1}{n} \inf_{y \in A} \sum_{j=0}^{n-1} d(T^j x, T^j y).$$

($d$ is the distance on $\Omega$.) Assume for simplicity that $\mathrm{diam}(\Omega) = 1$.
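The quantity $S_A(x, n)$ can be approximated numerically by replacing the infimum over $A$ with a minimum over a finite list of candidate points of $A$, which gives an upper bound on $S_A(x, n)$. The sketch below does this for a generic map; the function name, the use of a finite candidate list and the logistic map in the usage example are assumptions made for the illustration.

```python
def tracing_quality(T, d, x, candidates, n):
    """(1/n) * min over y in candidates of sum_{j<n} d(T^j x, T^j y).

    With candidates drawn from A, this is an upper bound on S_A(x, n).
    """
    def orbit(z):
        points = []
        for _ in range(n):
            points.append(z)
            z = T(z)
        return points

    orbit_x = orbit(x)
    best = min(
        sum(d(a, b) for a, b in zip(orbit_x, orbit(y)))
        for y in candidates
    )
    return best / n


if __name__ == "__main__":
    logistic = lambda z: 4.0 * z * (1.0 - z)   # a map of [0, 1], diam = 1
    dist = lambda a, b: abs(a - b)
    grid = [0.5 + 0.005 * k for k in range(100)]  # sample of A = [0.5, 1)
    print(tracing_quality(logistic, dist, 0.3, grid, n=20))
```

Refining the candidate list can only decrease the returned value, since the minimum is taken over a larger set.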

Theorem 2.3. Let $T : \Omega \to \Omega$ be a dynamical system modeled by a Young tower with exponential tails. There exists a constant $c > 0$ such that, for any measurable subset $A \subset \Omega$ with strictly positive $\mu$-measure, for any $n \in \mathbb{N}$ and for any $t > 0$,

$$\mu\Big\{ x \in \Omega : S_A(x, n) > \frac{c \sqrt{\log n}}{\mu(A) \sqrt{n}} + \frac{t}{\sqrt{n}} \Big\} \leq e^{-t^2/4C}.$$

Proof. The function of $n$ variables

$$K(x_0, \ldots, x_{n-1}) = \frac{1}{n} \inf_{y \in A} \sum_{j=0}^{n-1} d(x_j, T^j y)$$

is separately Lipschitz with $\max_{i=0,\ldots,n-1} \mathrm{Lip}_i(K) \leq 1/n$. The process $(x, Tx, T^2x, \ldots)$ satisfies (2.5). Hence it satisfies (2.2) with $Z_j(x) = T^j(x)$:

$$\mu\Big\{ x : S_A(x, n) > \int S_A(y, n)\, d\mu(y) + \frac{t}{\sqrt{n}} \Big\} \leq e^{-t^2/4C}. \tag{2.7}$$

We now estimate $\int S_A(y, n)\, d\mu(y)$ from above. Fix $s > 0$ and define the set

$$B_s = \Big\{ x : S_A(x, n) > \int S_A(y, n)\, d\mu(y) + \frac{s}{\sqrt{n}} \Big\}.$$

One has the identity

$$\int S_A(y, n)\, d\mu(y) = \int_A S_A(y, n)\, d\mu(y) + \int_{A^c \cap B_s^c} S_A(y, n)\, d\mu(y) + \int_{A^c \cap B_s} S_A(y, n)\, d\mu(y).$$

The first integral is equal to 0 by definition of $S_A$. The second one is less than or equal to $\big( \int S_A(y, n)\, d\mu(y) + \frac{s}{\sqrt{n}} \big) \mu(A^c)$. The third one is bounded above by $\mu(B_s)$ because $S_A(y, n) \leq 1$ and $\int_{A^c \cap B_s} d\mu \leq \mu(B_s)$. By (2.7) one has

$$\mu(B_s) \leq e^{-s^2/4C}.$$

Hence

$$\int S_A(y, n)\, d\mu(y) \leq \Big( \int S_A(y, n)\, d\mu(y) + \frac{s}{\sqrt{n}} \Big) \mu(A^c) + e^{-s^2/4C},$$

that is,

$$\int S_A(y, n)\, d\mu(y) \leq \mu(A)^{-1} \Big( \frac{s}{\sqrt{n}} + e^{-s^2/4C} \Big).$$

To finish the proof, it remains to optimize over $s > 0$. □
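The last step can be made explicit with one admissible (not necessarily optimal) choice of $s$. The sketch below starts from the bound $\int S_A(y,n)\, d\mu(y) \leq \mu(A)^{-1}\big(s/\sqrt{n} + e^{-s^2/4C}\big)$ obtained in the proof; the particular value of $s$ and the resulting constant $c$ are our own choices for illustration.

```latex
\begin{align*}
s = \sqrt{2C \log n}
  \quad &\Longrightarrow \quad e^{-s^2/4C} = n^{-1/2}, \\
\int S_A(y,n)\, d\mu(y)
  &\le \frac{1}{\mu(A)} \cdot \frac{\sqrt{2C \log n} + 1}{\sqrt{n}}
   \le \frac{c \sqrt{\log n}}{\mu(A) \sqrt{n}}
  \qquad \text{for } n \ge 2, \text{ with e.g. } c = \sqrt{2C} + \frac{1}{\sqrt{\log 2}}.
\end{align*}
% Inserting this bound on the mean into (2.7) gives the stated deviation
% inequality for S_A(x, n).
```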

2.3. Strategy of proofs

Full details can be found in Chazottes and Gouezel (2013). Our aim here is to give a rough sketch of proofs and highlight the use of martingale inequalities.

The starting point is that one can work in an auxiliary system, the Young tower, instead of the original system (and pull back the obtained results later). For the sake of notational simplicity, we still denote by $(\Omega, T, \mu)$ this auxiliary dynamical system. If we start with an invertible map (e.g. the Hénon map), we can reduce to a non-invertible Young tower for which one can define the so-called transfer operator (see more details below). Notice that one has to put an appropriate metric on the Young tower with respect to which Lipschitz functions are defined. The projection map going from the Young tower to the original dynamical system is contracting and projects Lipschitz functions on the tower to Lipschitz functions on the original phase space.

2.3.1. Martingale differences

Fix a separately Lipschitz function $K(x_0, \ldots, x_{n-1})$. We consider it as a function on $\Omega^{\mathbb{N}}$ depending only on the first $n$ coordinates (therefore, we set $\mathrm{Lip}_i(K) = 0$ for $i \geq n$). We endow $\Omega^{\mathbb{N}}$ with the measure $\mu_\infty$, limit of the $\mu_N$ (see Chazottes and Gouëzel (2013)) when $N \to \infty$. On $\Omega^{\mathbb{N}}$, let $\mathcal{F}_p$ be the $\sigma$-algebra of events depending only on the coordinates $(x_j)_{j \geq p}$ (this is a decreasing sequence of $\sigma$-fields).

The very first step is classical: we want to write the function $K$ as a sum of reverse martingale differences with respect to this sequence. Therefore, let $K_p = E(K|\mathcal{F}_p)$. More precisely,

$$K_p(x_p, x_{p+1}, \ldots) = E(K|\mathcal{F}_p)(x_p, x_{p+1}, \ldots) = \sum_{T^p(y) = x_p} g^{(p)}(y)\, K(y, \ldots, T^{p-1}y, x_p, \ldots),$$

where $g^{(p)}$ is the inverse of the jacobian of $T^p$.

Now let $D_p = K_p - K_{p+1}$. The function $D_p$ is $\mathcal{F}_p$-measurable and $E(D_p|\mathcal{F}_{p+1}) = 0$. Moreover,

$$K - E(K) = \sum_{p \geq 0} D_p. \tag{2.8}$$

The basic strategy is to look for good estimates on $D_p$, then to apply a suitable martingale inequality.

2.3.2. Exponential case

The key estimate is the following: there exist $C > 0$ and $0 < \rho < 1$ such that, for any $p$, one has

$$|D_p| \leq C \sum_{j=0}^{p} \rho^{p-j}\, \mathrm{Lip}_j(K). \tag{2.9}$$

Next, the Hoeffding-Azuma inequality (see e.g. Milman and Schechtman (1986), page 33, or Ledoux (2001)) yields, for any $P \geq 1$,

$$E\Big( e^{\sum_{p=0}^{P-1} D_p} \Big) \leq e^{\sum_{p=0}^{P-1} \sup |D_p|^2}.$$

Using the Cauchy-Schwarz inequality one easily gets $\sum_{p \geq 0} \sup |D_p|^2 \leq C' \sum_j \mathrm{Lip}_j(K)^2$ for some constant $C'$. In view of (2.8), we obtain inequality (2.5).
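The Cauchy-Schwarz step can be spelled out as follows. This is a sketch that reads (2.9) as a sum over $0 \leq j \leq p$ with weights $\rho^{p-j}$; the explicit constant $C^2/(1-\rho)^2$ is our own computation under that reading.

```latex
\begin{align*}
|D_p|^2
  &\le C^2 \Big( \sum_{j \le p} \rho^{p-j} \Big)
           \Big( \sum_{j \le p} \rho^{p-j}\, \mathrm{Lip}_j(K)^2 \Big)
   \le \frac{C^2}{1-\rho} \sum_{j \le p} \rho^{p-j}\, \mathrm{Lip}_j(K)^2, \\
\sum_{p \ge 0} \sup |D_p|^2
  &\le \frac{C^2}{1-\rho} \sum_{j \ge 0} \mathrm{Lip}_j(K)^2 \sum_{p \ge j} \rho^{p-j}
   = \frac{C^2}{(1-\rho)^2} \sum_{j \ge 0} \mathrm{Lip}_j(K)^2 .
\end{align*}
% The first line is Cauchy-Schwarz with the weights rho^{p-j} split as
% rho^{(p-j)/2} * rho^{(p-j)/2}; the second exchanges the order of summation.
```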

Let us say a few words about (2.9). This estimate is a consequence of the existence of a spectral gap for the transfer operator $\mathcal{L}$ when it acts on a suitable Banach space. More precisely, one has $\mathcal{L}u(x) = \sum_{Ty = x} g(y) u(y)$ where $g$ denotes the inverse of the jacobian of $T$. Let $\mathcal{C}$ be the space of Lipschitz functions on $\Omega$ with its canonical norm $\|f\|_{\mathcal{C}} = \sup |f| + \mathrm{Lip}(f)$. One can prove that for a Young tower with exponential tails there exist $C > 0$ and $0 < \rho < 1$ such that $\|\mathcal{L}^k f - \int f\, d\mu\|_{\mathcal{C}} \leq C \rho^k \|f\|_{\mathcal{C}}$. The point is to write $K_p(x_p, x_{p+1}, \ldots)$ as a sum of functions of one variable, by an appropriate telescoping procedure, and then to use the contraction properties of the transfer operator.

2.3.3. Polynomial case

For Young towers with polynomial tails, there is no spectral gap for the transfer operator, hence the analysis becomes much more complicated. To control $\mathcal{L}^n f$ one has to rely on Banach algebra techniques to study some renewal sequences of operators entering the decomposition of $\mathcal{L}^n$.

We do not give further details and content ourselves with pointing out that the relevant martingale inequality is the following Rosenthal-Burkholder martingale inequality (see Burkholder (1973), Theorem 21.1 and Inequality (21.5)): for all $Q \geq 2$,

$$\Big\| \sum_p D_p \Big\|_{L^Q}^Q \leq C\, E\Big[ \Big( \sum_p E(D_p^2|\mathcal{F}_{p+1}) \Big)^{Q/2} \Big] + C \sum_p E(|D_p|^Q).$$

3. Empirical Bernstein inequality and applications to high-dimensional survival analysis

3.1. Framework

In this section, we state an empirical Bernstein inequality for martingales with jumps, and provide a statistical application of this inequality in a high-dimensional setting.

Let us first introduce the notation and the framework. For $i = 1, \ldots, n$, let $N_i$ be a marked counting process, $Y_i$ a predictable random process in $[0,1]$ and $Z_i = (Z_{i,1}, \ldots, Z_{i,p})$ a vector of covariates in $\mathbb{R}^p$. Let $(\mathcal{F}_t)_{t \geq 0}$ be the natural filtration defined by $\mathcal{F}_t = \sigma\{N_i(s), Y_i(s), 0 \leq s \leq t, Z_i, i = 1, \ldots, n\}$, and let $\Lambda_i(t)$ be the compensator of the process $N_i(t)$ with respect to $(\mathcal{F}_t)_{t \geq 0}$, so that $M_i(t) = N_i(t) - \Lambda_i(t)$ is a martingale adapted to $(\mathcal{F}_t)_{t \geq 0}$.

Assumption 3.1. $N_i$ satisfies the Aalen multiplicative intensity model: for all $t \geq 0$, $\Lambda_i(t) = \int_0^t \lambda_0(s, Z_i) Y_i(s)\, ds$, where the intensity $\lambda_0$ is an unknown nonnegative function.

For any function $f : \mathbb{R}^p \to \mathbb{R}$, let $\|f\|_{n,\infty} = \max_{1 \leq i \leq n} |f(Z_i)|$, and let

$$\eta_{n,t}(f) = \frac{1}{n} \sum_{i=1}^n \int_0^t f(Z_i)\, dM_i(s). \tag{3.1}$$

Our goal is to provide a Bernstein inequality for martingales of the form (3.1), where the predictable variation Vn,t(f) is replaced by the observable optional variation. Let us be more precise and start by recalling a standard version of Bernstein's inequality for martingales with jumps.

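Theorem 3.2 (Shorack and Wellner (1986), van de Geer (1995)). Let $\{M_t\}_{t \ge 0}$ be a locally square integrable martingale with respect to $\{\mathcal{F}_t\}$, $V_t = \langle M \rangle_t$ and $\Delta M_t = M_t - M_{t-}$. Suppose that $|\Delta M_t| \le K$ for all $t > 0$ and some $0 \le K < \infty$. Then for each $a > 0$, $b > 0$,

$$\mathbb{P}(M_t \ge a \text{ and } V_t \le b^2 \text{ for some } t) \le \exp\Big[-\frac{a^2}{2(aK + b^2)}\Big]. \qquad (3.2)$$

In our case, the predictable variation $V_{n,t}(f)$ satisfies

$$V_{n,t}(f) = n \langle \eta_n(f) \rangle_t = \frac{1}{n} \sum_{i=1}^n \int_0^t (f(Z_i))^2 \lambda_0(s, Z_i) Y_i(s)\, ds.$$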
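As a small numerical companion to Theorem 3.2, the following Python sketch evaluates the right-hand side of the Bernstein bound and inverts it: for a target level $e^{-x}$ it returns the deviation $a$ solving $a^2 = 2x(aK + b^2)$. The function names and the numerical values (with the illustrative choice $K = 1/n$ and $b^2 = w/n$) are assumptions made for this example and are not taken from the text.

```python
import math


def bernstein_tail(a: float, b2: float, K: float) -> float:
    """Right-hand side of the Bernstein bound: exp(-a^2 / (2 (a K + b^2)))."""
    return math.exp(-a * a / (2.0 * (a * K + b2)))


def bernstein_deviation(x: float, b2: float, K: float) -> float:
    """Deviation a >= 0 such that bernstein_tail(a, b2, K) equals exp(-x).

    Solving a^2 = 2 x (a K + b^2) for a gives a = x K + sqrt(x^2 K^2 + 2 x b^2).
    """
    return x * K + math.sqrt(x * x * K * K + 2.0 * x * b2)


if __name__ == "__main__":
    # Illustrative values only: jumps bounded by 1/n, variation bounded by w/n.
    n, x, w = 500, 3.0, 0.8
    K, b2 = 1.0 / n, w / n
    a = bernstein_deviation(x, b2, K)
    print("deviation a:", a)
    print("bound at a:", bernstein_tail(a, b2, K), "target:", math.exp(-x))
```

The closed form also shows that $a \le \sqrt{2 x b^2} + 2xK$, which is the usual "Gaussian term plus Poisson term" reading of a Bernstein bound.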

We want to state a different version of the Bernstein inequality, where the predictable variation $V_{n,t}(f)$ is replaced by the observable optional variation of $\eta_{n,t}(f)$, defined by

$$\hat V_{n,t}(f) = n [\eta_n(f)]_t = \frac{1}{n} \sum_{i=1}^n \int_0^t (f(Z_i))^2\, dN_i(s).$$

3.2. Empirical Bernstein inequality for martingales with jumps

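Theorem 3.3. Let $\eta_{n,t}(f)$ be defined by (3.1). For any $x > 0$, and for some positive constants $c_1$, $c_2$, $c_3$, we have

$$\mathbb{P}\Big[ |\eta_{n,t}(f)| \ge c_1 \sqrt{\frac{x + \hat\ell_{n,x}(f)}{n}\, \hat V_{n,t}(f)} + c_2\, \frac{x + 1 + \hat\ell_{n,x}(f)}{n}\, \|f\|_{n,\infty} \Big] \le c_3 e^{-x}, \qquad (3.3)$$

where

$$\hat\ell_{n,x}(f) = 2 \log\log\Big( \frac{6 e n \hat V_{n,t}(f) + 56 e x \|f\|_{n,\infty}^2}{24 \|f\|_{n,\infty}^2} \vee e \Big).$$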
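To make the data-driven nature of Theorem 3.3 concrete, here is a minimal Python sketch that evaluates the deviation threshold appearing inside the probability in (3.3) from observed quantities only. It assumes that $f(Z_i)$ does not depend on time, so that the optional variation reduces to $\frac{1}{n}\sum_i f(Z_i)^2 N_i(t)$. The function names are hypothetical, and the constants `c1` and `c2` default to 1.0 purely as placeholders, since their values are not specified here.

```python
import math
from typing import Sequence


def log_log_term(v_hat: float, f_sup: float, n: int, x: float) -> float:
    """2 log log( max( (6 e n V_hat + 56 e x ||f||^2) / (24 ||f||^2), e ) )."""
    ratio = (6.0 * math.e * n * v_hat + 56.0 * math.e * x * f_sup ** 2) / (24.0 * f_sup ** 2)
    return 2.0 * math.log(math.log(max(ratio, math.e)))


def empirical_bernstein_threshold(
    f_values: Sequence[float],
    jump_counts: Sequence[int],
    x: float,
    c1: float = 1.0,  # placeholder constant (assumption)
    c2: float = 1.0,  # placeholder constant (assumption)
) -> float:
    """Threshold c1 sqrt((x + l)/n * V_hat) + c2 (x + 1 + l)/n * ||f||_{n,inf}.

    f_values[i] stands for f(Z_i) and jump_counts[i] for N_i(t).
    """
    if not f_values:
        raise ValueError("need at least one observation")
    n = len(f_values)
    f_sup = max(abs(v) for v in f_values)
    if f_sup == 0.0:
        raise ValueError("f must not vanish on all covariates")
    # Observable optional variation: (1/n) sum_i f(Z_i)^2 N_i(t).
    v_hat = sum(v * v * k for v, k in zip(f_values, jump_counts)) / n
    ell = log_log_term(v_hat, f_sup, n, x)
    return c1 * math.sqrt((x + ell) / n * v_hat) + c2 * (x + 1.0 + ell) / n * f_sup


if __name__ == "__main__":
    print(empirical_bernstein_threshold([0.5, -1.0, 2.0, 0.25], [1, 0, 2, 1], x=2.0))
```

Note that the threshold is homogeneous of degree one in $f$: the ratio inside the iterated logarithm is unchanged when $f$ is rescaled.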

Sketch of the proof of Theorem 3.3. We only give the main steps to prove Inequality (3.3). Let us introduce the processes $U_{n,t}(f)$ and $H_i(f)$, defined by

$$U_{n,t}(f) = \frac{1}{n} \sum_{i=1}^n \int_0^t H_i(f)\, dM_i(s) \quad \text{and} \quad H_i(f) = \frac{f(Z_i)}{\max_{1 \le k \le n} |f(Z_k)|}.$$

Since $H_i(f)$ is a bounded predictable process with respect to $(\mathcal{F}_t)_{t \ge 0}$, $U_{n,t}(f)$ is a square integrable martingale. Its predictable variation and its optional variation are respectively given by

$$\vartheta_{n,t}(f) = n \langle U_n(f) \rangle_t = \frac{1}{n} \sum_{i=1}^n \int_0^t (H_i(f))^2\, d\Lambda_i(s) \quad \text{and} \quad \hat\vartheta_{n,t}(f) = n [U_n(f)]_t = \frac{1}{n} \sum_{i=1}^n \int_0^t (H_i(f))^2\, dN_i(s).$$

The proof is done in three steps.

Step 1: We first prove a Bernstein-type deviation inequality for $U_{n,t}(f)$ on the event $\{v < \vartheta_{n,t}(f) \le w\}$.

Step 2: We replace the predictable variation $\vartheta_{n,t}(f)$ by the observable $\hat\vartheta_{n,t}(f)$ in Step 1. This yields a deviation inequality for $U_{n,t}(f)$ on the event $\{v < \hat\vartheta_{n,t}(f) \le w\}$, in which the deviation event has probability at most $3e^{-x}$. (3.4)

Step 3: Finally, we remove the event $\{v < \hat\vartheta_{n,t}(f) \le w\}$ from Inequality (3.4). □

3.3. Application: oracle inequality for the Lasso estimator of the intensity

In this part we show how the Bernstein Inequality (3.3) can be applied in a statistical context. We want to obtain a prognosis on the survival time adjusted for the covariates in a high-dimensional setting, and more precisely to estimate the unknown intensity $\lambda_0$ using a Lasso procedure. The properties of the Lasso estimator are stated in terms of non-asymptotic oracle inequalities. Such an oracle inequality is a consequence of an appropriate Bernstein inequality. A Lasso procedure based on a direct application of the Bernstein inequality (3.2) would provide estimators involving the unknown predictable variation, whereas Inequality (3.3) provides completely data-driven estimators. This type of procedure has already been considered, in particular by Hansen, Reynaud-Bouret and Rivoirard (2012) and Gaïffas and Guilloux (2012), but for different statistical models.

We consider the specific Cox proportional hazards model, with $\lambda_0(t, Z_i) = \alpha_0(t) \exp(f_0(Z_i))$, where $Z_i = (Z_{i,1}, \dots, Z_{i,p})^T$ is the vector of covariates of individual $i$ ($i = 1, \dots, n$), and $f_0$ is the unknown regression function. For the sake of simplicity, the baseline hazard function $\alpha_0$ is assumed here to be known, so that the estimation of $\lambda_0$ reduces to the estimation of $f_0$; we refer to Lemler (2012) for the general case.

Let $\mathbb{F}_M = \{f_1, \dots, f_M\}$, where $f_j : \mathbb{R}^p \to \mathbb{R}$ for $j = 1, \dots, M$, be a finite set of functions, called a dictionary, where $M$ is large (typically $M \gg n$). We assume that the unknown $\lambda_0$ can be well approximated by a function defined for all $\beta$ in $\mathbb{R}^M$ by $\lambda_\beta(t, Z_i) = \alpha_0(t) e^{f_\beta(Z_i)}$, where $f_\beta = \sum_{j=1}^M \beta_j f_j$.

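To estimate the unknown parameter $\beta$ in $\mathbb{R}^M$, we consider the log-likelihood criterion defined by

$$C_n(\lambda_\beta) = -\frac{1}{n} \sum_{i=1}^n \Big\{ \int_0^\tau \log(\lambda_\beta(t, Z_i))\, dN_i(t) - \int_0^\tau \lambda_\beta(t, Z_i) Y_i(t)\, dt \Big\}. \qquad (3.5)$$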

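Associated with this estimation criterion, we consider the empirical Kullback divergence defined for all $\beta$ in $\Gamma(\mu)$ by

$$\tilde K_n(\lambda_0, \lambda_\beta) = \frac{1}{n} \sum_{i=1}^n \int_0^\tau \big(\log(\lambda_0(t, Z_i)) - \log(\lambda_\beta(t, Z_i))\big) \lambda_0(t, Z_i) Y_i(t)\, dt - \frac{1}{n} \sum_{i=1}^n \int_0^\tau \big(\lambda_0(t, Z_i) - \lambda_\beta(t, Z_i)\big) Y_i(t)\, dt. \qquad (3.6)$$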

In this high-dimensional setting, the function $f_0$ is estimated using a weighted Lasso procedure. The Lasso estimator of $\beta$ is defined as the minimizer of the $\ell_1$-penalized empirical likelihood in the following way:

$$\hat\beta_L = \arg\min_{\beta \in \mathbb{R}^M} \{ C_n(\lambda_\beta) + \mathrm{pen}(\beta) \}, \quad \text{with } \mathrm{pen}(\beta) = \sum_{j=1}^M \omega_j |\beta_j|. \qquad (3.7)$$

The weights $\omega_j$ are positive data-driven weights, suitably chosen thanks to the empirical Bernstein inequality (3.3) (see Theorem 3.3).
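The penalized criterion in (3.7) can be sketched in Python in a simplified setting. The sketch below assumes right-censored survival data, with $N_i(t) = 1\{X_i \le t, \delta_i = 1\}$ and $Y_i(t) = 1\{t \le X_i\}$, observation times $X_i$ all smaller than the time horizon, and a constant known baseline hazard. These modelling choices, as well as all names (`penalized_criterion`, `dictionary`, `times`, `events`, `baseline`), are assumptions made for illustration only; the weights are passed in as given numbers rather than computed from the data.

```python
import math
from typing import Sequence


def f_beta(beta: Sequence[float], row: Sequence[float]) -> float:
    """f_beta(Z_i) = sum_j beta_j f_j(Z_i), where row[j] stands for f_j(Z_i)."""
    return sum(b * v for b, v in zip(beta, row))


def penalized_criterion(
    beta: Sequence[float],
    weights: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    times: Sequence[float],
    events: Sequence[int],
    baseline: float = 1.0,  # constant baseline hazard (assumption)
) -> float:
    """Log-likelihood criterion plus weighted l1 penalty, under the assumptions above."""
    n = len(times)
    total = 0.0
    for row, x_i, d_i in zip(dictionary, times, events):
        fb = f_beta(beta, row)
        # Integral of log(lambda_beta) dN_i: contributes only if the event is observed.
        log_part = d_i * (math.log(baseline) + fb)
        # Integral of lambda_beta * Y_i dt over [0, X_i] with a constant baseline.
        compensator_part = baseline * math.exp(fb) * x_i
        total += log_part - compensator_part
    criterion = -total / n
    penalty = sum(w * abs(b) for w, b in zip(weights, beta))
    return criterion + penalty


if __name__ == "__main__":
    dictionary = [[1.0], [-1.0]]  # one dictionary function, two individuals
    times, events = [2.0, 1.0], [1, 0]
    print(penalized_criterion([0.0], [0.1], dictionary, times, events))
    print(penalized_criterion([math.log(2.0)], [0.1], dictionary, times, events))
```

Minimizing this function over `beta` (for instance with a proximal gradient method, which is not discussed in the text) would give a numerical version of the weighted Lasso estimator in this simplified setting.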

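Main steps leading to the non-asymptotic oracle inequality. By definition of the weighted Lasso estimator, we have for all $\beta \in \mathbb{R}^M$

$$C_n(\lambda_{\hat\beta_L}) + \mathrm{pen}(\hat\beta_L) \le C_n(\lambda_\beta) + \mathrm{pen}(\beta). \qquad (3.8)$$

Using the Doob-Meyer decomposition $N_i = M_i + \Lambda_i$, we can easily show that for all $\beta \in \mathbb{R}^M$

$$C_n(\lambda_{\hat\beta_L}) - C_n(\lambda_\beta) = \tilde K_n(\lambda_0, \lambda_{\hat\beta_L}) - \tilde K_n(\lambda_0, \lambda_\beta) + \sum_{j=1}^M (\beta - \hat\beta_L)_j\, \eta_{n,\tau}(f_j), \qquad (3.9)$$

where $\eta_{n,\tau}(f_j) = \frac{1}{n} \sum_{i=1}^n \int_0^\tau f_j(Z_i)\, dM_i(t)$.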
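For the reader's convenience, the computation behind (3.9) can be sketched as follows. It only uses the Doob-Meyer decomposition together with Assumption 3.1, that is $dN_i(t) = dM_i(t) + \lambda_0(t, Z_i) Y_i(t)\, dt$; the remainder $R_n$ is a name introduced here for the terms that do not depend on $\beta$.

```latex
% Sketch of the derivation of (3.9); R_n is notation introduced for this sketch.
\begin{align*}
C_n(\lambda_\beta)
  &= -\frac{1}{n}\sum_{i=1}^n \int_0^\tau \log\lambda_\beta(t,Z_i)\, dM_i(t)
     -\frac{1}{n}\sum_{i=1}^n \int_0^\tau \log\lambda_\beta(t,Z_i)\,\lambda_0(t,Z_i) Y_i(t)\, dt
     +\frac{1}{n}\sum_{i=1}^n \int_0^\tau \lambda_\beta(t,Z_i) Y_i(t)\, dt \\
  &= \tilde K_n(\lambda_0,\lambda_\beta)
     -\frac{1}{n}\sum_{i=1}^n \int_0^\tau \log\lambda_\beta(t,Z_i)\, dM_i(t) + R_n, \\
R_n &= -\frac{1}{n}\sum_{i=1}^n \int_0^\tau \log\lambda_0(t,Z_i)\,\lambda_0(t,Z_i) Y_i(t)\, dt
       +\frac{1}{n}\sum_{i=1}^n \int_0^\tau \lambda_0(t,Z_i) Y_i(t)\, dt .
\end{align*}
% Since \log\lambda_{\hat\beta_L} - \log\lambda_\beta = \sum_j (\hat\beta_L - \beta)_j f_j,
% subtracting the two expansions cancels R_n and gives
\begin{align*}
C_n(\lambda_{\hat\beta_L}) - C_n(\lambda_\beta)
  &= \tilde K_n(\lambda_0,\lambda_{\hat\beta_L}) - \tilde K_n(\lambda_0,\lambda_\beta)
     - \sum_{j=1}^M (\hat\beta_L - \beta)_j\,
       \frac{1}{n}\sum_{i=1}^n \int_0^\tau f_j(Z_i)\, dM_i(t) \\
  &= \tilde K_n(\lambda_0,\lambda_{\hat\beta_L}) - \tilde K_n(\lambda_0,\lambda_\beta)
     + \sum_{j=1}^M (\beta - \hat\beta_L)_j\, \eta_{n,\tau}(f_j).
\end{align*}
```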

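To obtain non-asymptotic oracle inequalities, we have to control the centered empirical process $\eta_{n,\tau}(f_j)$. We apply Theorem 3.3 to the process $\eta_{n,t}(f_j)$, whose observable optional variation is defined by

$$\hat V_{n,t}(f_j) = n [\eta_n(f_j)]_t = \frac{1}{n} \sum_{i=1}^n \int_0^t (f_j(Z_i))^2\, dN_i(s).$$

From Theorem 3.3, we obtain

$$\mathbb{P}\Big[ |\eta_{n,t}(f_j)| \ge c_1 \sqrt{\frac{x + \hat\ell_{n,x}(f_j)}{n}\, \hat V_{n,t}(f_j)} + c_2\, \frac{x + 1 + \hat\ell_{n,x}(f_j)}{n}\, \|f_j\|_{n,\infty} \Big] \le c_3 e^{-x},$$

where $\hat\ell_{n,x}(f_j) = 2 \log\log\Big( \frac{6 e n \hat V_{n,t}(f_j) + 56 e x \|f_j\|_{n,\infty}^2}{24 \|f_j\|_{n,\infty}^2} \vee e \Big)$.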

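Choice of the weights: We choose the data-driven weights for $j = 1, \dots, M$ as

$$\omega_j = c_1 \sqrt{\frac{x + \log M + \hat\ell_{n,x+\log M}(f_j)}{n}\, \hat V_{n,\tau}(f_j)} + c_2\, \frac{x + 1 + \log M + \hat\ell_{n,x+\log M}(f_j)}{n}\, \|f_j\|_{n,\infty}. \qquad (3.10)$$

For this choice of weights, we introduce the set $\mathcal{A} = \bigcap_{j=1}^M \{ |\eta_{n,\tau}(f_j)| \le \omega_j \}$. On $\mathcal{A}$,

$$\Big| \sum_{j=1}^M (\hat\beta_L - \beta)_j\, \eta_{n,\tau}(f_j) \Big| \le \sum_{j=1}^M \omega_j |(\hat\beta_L - \beta)_j| \quad \text{and} \quad \mathbb{P}(\mathcal{A}^c) \le \sum_{j=1}^M \mathbb{P}(|\eta_{n,\tau}(f_j)| > \omega_j) \le c_3 e^{-x}.$$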

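From (3.8) and (3.9), we deduce that

$$\tilde K_n(\lambda_0, \lambda_{\hat\beta_L}) \le \tilde K_n(\lambda_0, \lambda_\beta) + \sum_{j=1}^M (\hat\beta_L - \beta)_j\, \eta_{n,\tau}(f_j) + \sum_{j=1}^M \omega_j |\beta_j| - \sum_{j=1}^M \omega_j |(\hat\beta_L)_j|. \qquad (3.11)$$

We finally obtain the following non-asymptotic oracle inequality with a slow rate of convergence of order $\sqrt{\log M / n}$.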

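Theorem 3.4. Let $A > 0$ be some positive numerical constant and let $x > 0$ be fixed. Then, with probability larger than $1 - A e^{-x}$,

$$\tilde K_n(\lambda_0, \lambda_{\hat\beta_L}) \le \inf_{\beta \in \Gamma(\mu)} \{ \tilde K_n(\lambda_0, \lambda_\beta) + 2\, \mathrm{pen}(\beta) \}, \qquad (3.12)$$

with $\mathrm{pen}(\beta)$ defined by (3.7) and (3.10).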

Under the classical restricted eigenvalue condition RE and some other assumptions, we also obtain some non-asymptotic oracle inequalities with a fast rate of convergence of order log M/n and some results in variable selection (see Lemler (2012) for more details).

4. Approximation by a martingale

Since its introduction by Gordin (1969), the martingale method has attracted a number of probabilists. It was originally designed for the Central Limit Theorem (CLT), but it has been successfully used for almost any limit theorem (the Weak Invariance Principle (WIP) and its quenched version, the Law of the Iterated Logarithm and its functional versions, the Marcinkiewicz-Zygmund strong law of large numbers...).

When applied to the CLT or WIP problem, the martingale method has reached a fairly precise form thanks to the characterizations obtained recently by Zhao and Woodroofe (2008) (CLT case) and Gordin and Peligrad (2011) (CLT and WIP cases). These characterizations are of theoretical interest, but they have also proved useful in applications (see for instance Gordin and Peligrad (2011)).

The martingale method applies to "different" situations: stationary processes (adapted or not), non-invertible dynamical systems, functionals of Markov chains. In this note, for the sake of clarity, we shall only be concerned with adapted stationary processes. As we recall below, this case is actually equivalent to considering functionals of Markov chains. The case of non-adapted processes may be treated similarly (see for instance Volny (2007) or Cuny (2012a)). The adaptation of the results mentioned below to the setting of non-invertible dynamical systems needs more care, since what we really obtain in that case is a "reverse" martingale approximation. We shall consider only real-valued processes, but some results extend to Hilbert space-valued processes.

We use the same notation as in the introduction. We want to study the process $(X_n = X_0 \circ T^n)_{n \in \mathbb{Z}}$, which is adapted to the non-decreasing filtration $(\mathcal{F}_n = T^{-n}(\mathcal{F}_0))_{n \in \mathbb{Z}}$, i.e. $X_0$ is $\mathcal{F}_0$-measurable. We assume that $X_0 \in L^p(\Omega, \mathcal{F}_0, \mathbb{P})$ for some $p \ge 1$. For simplicity, we assume $T$ to be ergodic.

Let us define an operator $Q$ on $L^1(\Omega, \mathcal{F}_0, \mathbb{P})$ by $QZ = \mathbb{E}(Z \circ T \mid \mathcal{F}_0)$. The operator $Q$ is a positive contraction of every $L^r(\Omega, \mathcal{F}_0, \mathbb{P})$, $r \ge 1$, hence it is a Markov operator. It turns out that this operator allows us to view our process $(X_n)_{n \in \mathbb{Z}}$ as a functional of a Markov chain, see Cuny and Volny (2012).

Another advantage of this operator is that it allows us to translate projective conditions in terms of $Q$, and hence to make use of classical facts from the ergodic theory of operators. For instance, the martingale-coboundary decomposition of Gordin is easily characterized as follows, as observed by Volny (1993) (the non-adapted case is also considered there). Volny worked under the regularity condition $\mathbb{E}(X_0 \mid \mathcal{F}_{-\infty}) = 0$ a.s., but it is actually not needed.

Proposition 4.1. Let $p \ge 1$ and let $X_0 \in L^p(\Omega, \mathcal{F}_0, \mathbb{P})$. The following are equivalent:

(i) $X_0 = D_0 + Z - Z \circ T^{-1}$ with $D_0, Z \in L^p(\Omega, \mathcal{F}_0, \mathbb{P})$ and $\mathbb{E}(D_0 \mid \mathcal{F}_{-1}) = 0$ almost surely;

(ii) $\sup_{n \ge 1} \|\mathbb{E}(S_n \mid \mathcal{F}_0)\|_p < \infty$.

Proof. (i) $\Rightarrow$ (ii) is obvious. Now, (ii) reads: $\sup_{n \ge 1} \|Q X_0 + \cdots + Q^n X_0\|_p < \infty$. Hence, by a result of Browder (1958, Lemma 5) when $1 < p < \infty$, and by Theorem 7 of Lin and Sine (1983) when $p = 1$, there exists $Y \in L^p(\Omega, \mathcal{F}_0, \mathbb{P})$ such that $X_0 = (I - Q)Y$, and (i) follows by taking $D_0 = Y - \mathbb{E}(Y \mid \mathcal{F}_{-1}) = Y - (QY) \circ T^{-1}$ and $Z = -QY$. □
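The construction used in the proof can be illustrated on a finite-state Markov chain $(\xi_n)$, where $X_n = f(\xi_n)$ and $Q$ acts on functions of the state as the transition matrix. The Python sketch below solves $f = (I - P)Y$ by the series $Y = \sum_{k \ge 0} P^k f$, forms the martingale differences $Y(\xi_k) - (PY)(\xi_{k-1})$, and checks along a simulated path that $S_n - M_n$ equals the coboundary term $(PY)(\xi_0) - (PY)(\xi_n)$. The transition matrix, the function `g` and all names are hypothetical choices made for this example.

```python
import random
from typing import List

# Hypothetical 3-state transition matrix (rows sum to 1, all entries positive).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]


def apply_kernel(P: List[List[float]], g: List[float]) -> List[float]:
    """(Pg)(i) = sum_j P[i][j] g(j); this plays the role of the operator Q."""
    return [sum(p * v for p, v in zip(row, g)) for row in P]


def stationary_law(P: List[List[float]], iters: int = 200) -> List[float]:
    """Approximate the invariant probability by power iteration."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return pi


def solve_poisson(P: List[List[float]], f: List[float], iters: int = 200) -> List[float]:
    """Truncated series Y = sum_k P^k f, so that (I - P)Y = f for a centered f."""
    Y = [0.0] * len(f)
    term = list(f)
    for _ in range(iters):
        Y = [y + t for y, t in zip(Y, term)]
        term = apply_kernel(P, term)
    return Y


pi = stationary_law(P)
g = [1.0, -2.0, 0.5]
mean_g = sum(p * v for p, v in zip(pi, g))
f = [v - mean_g for v in g]  # centered under the invariant law
Y = solve_poisson(P, f)
PY = apply_kernel(P, Y)


def coboundary_gap(n: int, seed: int = 0) -> float:
    """|S_n - M_n - ((PY)(xi_0) - (PY)(xi_n))| along one simulated path."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n):
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    S = sum(f[path[k]] for k in range(1, n + 1))
    M = sum(Y[path[k]] - PY[path[k - 1]] for k in range(1, n + 1))
    return abs((S - M) - (PY[path[0]] - PY[path[n]]))


if __name__ == "__main__":
    print("residual of the Poisson equation:", max(abs(Y[i] - PY[i] - f[i]) for i in range(3)))
    print("coboundary gap:", coboundary_gap(1000))
```

In particular $|S_n - M_n|$ stays bounded in this example, which is the strongest form of martingale approximation.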

Our goal now will be to explain that, in several situations, the proof of a limit theorem by means of a martingale approximation may be split into two steps: the first step is to prove the desired result when one has a martingale-coboundary decomposition as above. The second step consists of obtaining a maximal inequality adapted to the limit theorem under consideration. This approach has been used explicitly in Cuny (2012a) and Cuny (2012b) and implicitly, for instance, in Jiang and Wu (2003), Cuny and Volny (2012) or Cuny and Merlevede (2012).

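Assume from now on that $X_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P})$. We say that $X_0$ or $(X_n)_{n \in \mathbb{Z}}$ admits a martingale approximation of type (CLT), (WIP) or (ASIP) if there exists $D_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P})$ with $\mathbb{E}(D_0 \mid \mathcal{F}_{-1}) = 0$ a.s. such that, writing

$$M_n = D_0 \circ T + \cdots + D_0 \circ T^n,$$

$$\mathbb{E}((S_n - M_n)^2) = o(n) \qquad \text{(CLT)}$$

$$\mathbb{E}\big( \max_{1 \le k \le n} (S_k - M_k)^2 \big) = o(n) \qquad \text{(WIP)}$$

$$|S_n - M_n| = o(\sqrt{n \log\log n}) \quad \mathbb{P}\text{-a.s.} \qquad \text{(ASIP)}$$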

The notation (ASIP) stands for the almost sure invariance principle. The above martingale approximations are unique. If Xo admits a martingale approximation of one of the above types then it satisfies the corresponding limit theorem.

An important fact is that the set of $X_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P})$ admitting a martingale approximation (of any of the above types) is a vector space, containing $(I - Q) L^2(\Omega, \mathcal{F}_0, \mathbb{P})$, hence stable by $Q$.

We would like to study the validity of some limit theorems under the Hannan condition (0.8) and/or the Maxwell-Woodroofe condition (0.9) mentioned in the introduction. In terms of the operator $Q$, those conditions read

$$\|X_0\|_{H_2} := \sum_{n \ge 0} \|Q^n X_0 - (Q^{n+1} X_0) \circ T^{-1}\|_2 < \infty; \qquad (4.1)$$

$$\|X_0\|_{MW_2} := \|X_0\|_2 + \sum_{n \ge 1} \frac{\|Q X_0 + \cdots + Q^n X_0\|_2}{n^{3/2}} < \infty. \qquad (4.2)$$

Let us consider the spaces $H_2 := \{X_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P}) : \mathbb{E}_{-\infty}(X_0) = 0 \text{ and } \|X_0\|_{H_2} < \infty\}$ and $MW_2 := \{X_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P}) : \|X_0\|_{MW_2} < \infty\}$. It is not hard to prove that those spaces are Banach spaces and that $Q$ induces a contraction of $H_2$ and of $MW_2$.

Moreover, for every $X_0 \in H_2$, $\|Q^n X_0\|_{H_2} \to 0$ as $n \to +\infty$. With little effort one can also prove that, for every $X_0 \in MW_2$, $\|Q X_0 + \cdots + Q^n X_0\|_{MW_2} / n \to 0$ as $n \to +\infty$.

By the mean ergodic theorem (see e.g. Theorems 1.2 and 1.3 p. 73 of Krengel (1985)), we have, noticing that $Q$ has no nontrivial fixed points either on $H_2$ or on $MW_2$,

$$H_2 = \overline{(I - Q) H_2}^{H_2} \quad \text{and} \quad MW_2 = \overline{(I - Q) MW_2}^{MW_2}. \qquad (4.3)$$

4.1. The approximating martingale.

We first have to find $D_0$. According to Gordin and Peligrad (2011), if there exists a martingale approximation of type (CLT), then necessarily

$$D_0 = \lim_{n \to +\infty} \frac{1}{n} \sum_{k=1}^n \sum_{i=0}^{k-1} \big( Q^i X_0 - (Q^{i+1} X_0) \circ T^{-1} \big), \qquad (4.4)$$

where the limit holds in $L^2(\Omega, \mathbb{P})$. Set $D(X_0) := D_0$ whenever the above limit exists. Then $D$ is an unbounded operator on $L^2(\Omega, \mathbb{P})$.

As one may expect (from the proof of Proposition 4.1), for every $Y \in L^2(\Omega, \mathcal{F}_0, \mathbb{P})$, $D((I - Q)Y) = Y - (QY) \circ T^{-1}$.

The operator $D$ is well-defined (i.e. bounded) on $H_2$ (this is easily verified) and on $MW_2$, by the results of Gordin and Peligrad (2011). For some problems, it may be useful to have another form of $D$, better adapted to the conditions (4.1) or (4.2).

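For every $X_0 \in H_2$ we have, with convergence in $L^2(\Omega, \mathbb{P})$ (and $\mathbb{P}$-a.s.),

$$D X_0 = \sum_{n \ge 0} \big( Q^n X_0 - (Q^{n+1} X_0) \circ T^{-1} \big) \quad \text{and} \quad \|D X_0\|_2 \le \|X_0\|_{H_2}. \qquad (4.5)$$

For every $X_0 \in MW_2$, we have (see Cuny and Merlevède (2012)), with convergence in $L^2(\Omega, \mathbb{P})$ (and $\mathbb{P}$-a.s.),

$$D X_0 = \sum_{n \ge 0} \sum_{k \ge n} \frac{Q^k X_0 - (Q^{k+1} X_0) \circ T^{-1}}{k + 1} \quad \text{and} \quad \|D X_0\|_2 \le C \|X_0\|_{MW_2}, \qquad (4.6)$$

for a universal constant $C > 0$. It is not hard to see that the inner sum above converges as soon as $X_0 \in L^2(\Omega, \mathbb{P})$. One can prove that if the representation (4.5) holds, then (4.6) holds as well, and that (4.6) implies (4.4).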

4.2. Some maximal inequalities.

Before proving the martingale approximation properties, we shall need the corresponding estimate with a big "O" instead of a little "o". For this we need some maximal inequalities. We start with the martingale case. We have

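Proposition 4.2. Let $D_0 \in L^2(\Omega, \mathcal{F}_0, \mathbb{P})$ with $\mathbb{E}(D_0 \mid \mathcal{F}_{-1}) = 0$ $\mathbb{P}$-a.s. Write $M_n := \sum_{k=1}^n D_0 \circ T^k$. We have

$$\sup_{n \ge 1} \frac{\| \max_{1 \le k \le n} |M_k| \|_2}{\sqrt{n}} \le 2 \|D_0\|_2,$$

$$\Big\| \sup_{n \ge 1} \frac{|M_n|}{\sqrt{n L(L(n))}} \Big\|_{2,\infty} \le C \|D_0\|_2, \qquad (4.7)$$

where $L(n) = \max(1, \log n)$ and $C > 0$ is a universal constant.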

The first estimate is nothing but Doob's maximal inequality. The second one has been proven in Cuny (2012a) (the maximal function actually lies in any $L^p(\Omega, \mathbb{P})$, $1 \le p < 2$). Both inequalities hold without ergodicity. The maximal inequality (4.7) seems to be new in the martingale setting. In the iid case, it was proved by Pisier (1976). To emphasize the usefulness of such an inequality, we mention that, thanks to (4.7), in order to prove the law of the iterated logarithm for martingales with stationary ergodic increments in $L^2$, it suffices to prove it for martingales with bounded increments (this follows from a Banach principle argument).
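The first estimate of Proposition 4.2 can be checked numerically in the simplest case. The Python sketch below takes iid increments equal to $\pm 1$ with probability $1/2$ (so that $\|D_0\|_2 = 1$) and estimates $\|\max_{1 \le k \le n} |M_k|\|_2 / \sqrt{n}$ by Monte Carlo; Doob's inequality predicts a value at most 2. The choice of increments, the function name and the sample sizes are assumptions made for this illustration, and the output is only a simulation estimate.

```python
import math
import random


def doob_ratio(n: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of || max_{k<=n} |M_k| ||_2 / sqrt(n) for a +-1 random walk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        m = 0
        running_max = 0
        for _ in range(n):
            m += 1 if rng.random() < 0.5 else -1
            running_max = max(running_max, abs(m))
        total += running_max ** 2
    # Empirical second moment of the maximum, normalized by n, then square root.
    return math.sqrt(total / reps / n)


if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, doob_ratio(n, reps=2000, seed=1))
```

Since $\max_k |M_k| \ge |M_n|$ and $\mathbb{E}(M_n^2) = n$ here, the exact ratio also lies above 1, so the estimates are expected to fall between 1 and 2.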

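In the next proposition, $\mathcal{X}$ stands either for $H_2$ or for $MW_2$. For $X_0 \in \mathcal{X}$, we denote $D_0 = D X_0$ and $M_n = \sum_{k=1}^n D_0 \circ T^k$.

Proposition 4.3. Assume that $X_0 \in \mathcal{X}$. We have

$$\sup_{n \ge 1} \frac{\| \max_{1 \le k \le n} |S_k - M_k| \|_2}{\sqrt{n}} \le C \|X_0\|_{\mathcal{X}}, \qquad (4.8)$$

$$\Big\| \sup_{n \ge 1} \frac{|S_n - M_n|}{\sqrt{n L(L(n))}} \Big\|_{2,\infty} \le C \|X_0\|_{\mathcal{X}}, \qquad (4.9)$$

where $C > 0$ is a universal constant.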

The estimate (4.9) is proved in Cuny (2012b), under the Maxwell-Woodroofe condition. For the other estimates, notice first that to prove (4.8) or (4.9), it suffices to treat $(S_n)_{n \ge 1}$ and $(M_n)_{n \ge 1}$ separately. For the martingale part, both estimates follow from Proposition 4.2 combined with (4.5) or (4.6). For $(S_n)_{n \ge 1}$ itself, (4.8) follows from Theorem 1 (iii) of Wu (2007) under the Hannan condition and from Proposition 2.3 of Peligrad and Utev (2005) under the Maxwell-Woodroofe condition. The estimate (4.9) for $(S_n)_{n \ge 1}$ is proved in Cuny (2012a) under the Hannan condition.

4.3. The conclusion.

It follows from (4.8) and a Banach principle argument that the set of $X_0 \in \mathcal{X}$ for which (WIP) holds is closed in $\mathcal{X}$. But (WIP) holds on $(I - Q)\mathcal{X}$, hence, by (4.3), it holds on $\mathcal{X}$ too.

Similarly, by (4.9), the set of $X_0 \in \mathcal{X}$ for which (ASIP) holds is closed in $\mathcal{X}$, and we conclude as above.

References

[1] Azuma, K. Weighted sums of certain dependent random variables, Tôhoku Math. J. 19 (1967) 357-367.

[2] Benedicks, M. and Young L.-S. Markov extensions and decay of correlations for certain Hénon maps, Géométrie complexe et systèmes dynamiques (Orsay, 1995). Astérisque 261 (2000) 13-56.

[3] Bennett, G. Probability inequalities for the sum of independent random variables, J. Amer. Statist. Assoc. 57 (1962) 33-45.

[4] Billingsley, P. The Lindeberg-Lévy theorem for martingales, Proc. Amer. Math. Soc. 12 (1961) 788-792.

[5] Boucheron, S., Bousquet, O., Lugosi, G. and Massart, P. Moment inequalities for functions of independent random variables, Ann. Probab. 33 (2005) 514-560.

[6] Browder, F. E. On the iteration of transformations in noncompact minimal dynamical systems, Proc. Amer. Math. Soc. 9 (1958) 773-780.

[7] Brown, B. M. Martingale central limit theorems, Ann. Math. Statist. 42 (1971) 59-66.

[8] Burkholder, D. L. Martingale transforms, Ann. Math. Statist. 37 (1966) 1494-1504.

[9] Burkholder, D. L. Distribution function inequalities for martingales, Ann. Probab. 1 (1973) 19-42.

[10] Burkholder, D. L., Davis, B. J. and Gundy, R. F. Integral inequalities for convex functions of operators on martingales. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pp. 223-240. Univ. California Press, Berkeley, Calif., (1972).

[11] Chazottes, J.-R. Fluctuations of observables in dynamical systems: from limit theorems to concentration inequalities, to appear in a volume in honor of V. Afraimovich (2013), http://arxiv.org/abs/1201.3833.

[12] Chazottes, J.-R., Collet, P., Redig, F. and Verbitskiy E. A concentration inequality for interval maps with an indifferent fixed point, Ergod. Th. Dynam. Systems 29 (2009) 1097-1117.

[13] Chazottes, J.-R. and Gouëzel, S. Optimal concentration inequalities for dynamical systems, Comm. Math. Phys. 316 (2012) 843-889.

[14] Chazottes, J.-R., Collet, P. and Schmitt, B. Devroye inequality for a class of non-uniformly hyperbolic dynamical systems, Nonlinearity 18 (2005a) 2323-2340.

[15] Chazottes, J.-R., Collet, P. and Schmitt, B. Statistical consequences of the Devroye inequality for processes. Applications to a class of non-uniformly hyperbolic dynamical systems, Nonlinearity 18 (2005b) 2341-2364.

[16] Collet, P. Variance and exponential estimates via coupling, Bull. Braz. Math. Soc. (N.S.) 37 (2006) 461-475.

[17] Collet, P., Martinez, S. and Schmitt, B. Exponential inequalities for dynamical measures of expanding maps of the interval, Probab. Theory Related Fields 123 (2002) 301-322.

[18] Courbot, B. Rates of convergence in the functional CLT for martingales, C. R. Acad. Sci. Paris 328 (1999) 509-513.

[19] Cuny, C. ASIP for martingales in 2-smooth Banach spaces. Application to stationary sequences, (2012a) http://hal.archives-ouvertes.fr/hal-00745651

[20] Cuny, C. An almost sure invariance principle under the Maxwell-Woodroofe condition, submitted, (2012b).

[21] Cuny, C. and Merlevède, F. On martingale approximations and the quenched weak invariance principle, accepted for publication in Ann. Probab. (2012). http://arXiv.org/abs/1202.2964

[22] Cuny, C. and Volny, D. A quenched invariance principle for stationary processes, ALEA, Lat. Am. J. Probab. Math. Stat. 10 (2013) 107-115.

[23] Dede, S. Moderate deviations for stationary sequences of Hilbert-valued bounded random variables, J. Math. Anal. Appl. 349 (2009) 374-394.

[24] Dedecker, J., Doukhan, P. and Merlevède, F. Rates of convergence in the strong invariance principle under projective criteria, Electron. J. Probab. 17 (2012) 1-31.

[25] Dedecker, J., Merlevède, F. and Volny, D. On the weak invariance principle for non adapted sequences under projective criteria, J. Theoret. Probab. 20 (2007) 971-1004.

[26] Fan, X., Grama, I. and Liu, Q. Hoeffding's inequality for supermartingales, Stochastic Process. Appl. 122 (2012) 3545-3559.

[27] Freedman, D. A. On tail probabilities for martingales, Ann. Probability 3 (1975) 100-118.

[28] Freund, Y., Mansour, Y. and Schapire, R. E. Generalization bounds for averaged classifiers, Ann. Statist. 32 (2004) 1698-1722.

[29] Gaïffas, S. and Guilloux, A. High-dimensional additive hazard models and the Lasso, Electron. J. Stat. 6 (2012) 522-546.

[30] Gordin, M. I. The central limit theorem for stationary processes, Dokl. Akad. Nauk SSSR 188 (1969) 739-741.

[31] Gordin, M. I. and Lifsic, B. A. Central limit theorem for stationary Markov processes, (Russian) Dokl. Akad. Nauk SSSR 239 (1978) 766-767.

[32] Gordin, M. I. and Peligrad, M. On the functional central limit theorem via martingale approximation, Bernoulli 17 (2011) 424-440.

[33] Haeusler, E. An exact rate of convergence in the functional central limit theorem for special martingale difference arrays, Probab. Theory Relat. Fields 65 (1984) 523-534.

[34] Hannan, E. J. Central limit theorems for time series regression, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 26 (1973) 157-170.

[35] Hannan, E. J. The central limit theorem for time series regression, Stochastic Process. Appl. 9 (1979) 281-289.

[36] Hansen, N. R., Reynaud-Bouret, P. and Rivoirard, V. Lasso and probabilistic inequalities for multivariate point processes, Work in progress, personal communication, (2012).

[37] Heyde, C. C. On the central limit theorem for stationary processes, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 30 (1974) 315-320.

[38] Heyde, C. C. On the central limit theorem and iterated logarithm law for stationary processes, Bull. Austral. Math. Soc. 12 (1975) 1-8.

[39] Hoeffding, W. Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963) 13-30.

[40] Le Borgne, S. Limit theorems for non-hyperbolic automorphisms of the torus, Israel J. Math. 109 (1999) 61-73.

[41] Ibragimov, I. A. A central limit theorem for a class of dependent random variables, Theory Probab. Appl. 8 (1963) 83-89.

[42] Jiang, Y. and Wu, L. Hilbertian invariance principle for empirical process associated with a Markov process, Chinese Ann. Math. Ser. B 24 (2003) 1-16.

[43] Krengel, U. Ergodic theorems, de Gruyter Studies in Mathematics, 6. Walter de Gruyter & Co., Berlin, (1985).

[44] Ledoux, M. The concentration of measure phenomenon, Mathematical Surveys and Monographs 89. American Mathematical Society, (2001). Second printing (2005).

[45] Lemler, S. Oracle inequalities for the Lasso for the conditional hazard rate in a high-dimensional setting, (2012) http://arxiv.org/abs/1206.5628v3

[46] Lévy, P. Propriétés asymptotiques des sommes de variables aléatoires indépendantes ou enchaînées, J. Math. Pures Appl. Ser. 8 (1935) 347-402.

[47] Lin, M. and Sine, R. Ergodic theory and the functional equation (I - T)x = y, J. Operator Theory 10 (1983) 153-166.

[48] Liu, Q. and Watbled, F. Exponential inequalities for martingales and asymptotic properties of the free energy of directed polymers in a random environment, Stochastic Process. Appl. 119 (2009) 3101-3132.

[49] Liverani, C. Central limit theorem for deterministic systems, International Conference on Dynamical Systems, 56-75, Pitman Res. Notes Math. Ser., 362, Longman, Harlow, (1996).

[50] Maxwell, M. and Woodroofe, M. Central limit theorems for additive functionals of Markov chains, Ann. Probab. 28 (2000) 713-724.

[51] McDiarmid, C. On the method of bounded differences, Surveys in combinatorics, 148-188, London Math. Soc. Lecture Note Ser., 141, Cambridge Univ. Press, Cambridge, (1989).

[52] McDiarmid, C. Concentration, Probabilistic methods for algorithmic discrete mathematics, 195-248, Algorithms Combin., 16, Springer, Berlin, (1998).

[53] Nagaev, S. V. Large deviations of sums of independent random variables, Ann. Probab. 7 (1979) No. 5, 745-789.

[54] Merlevède, F. and Peligrad, M. Rosenthal inequalities for martingales and stationary sequences and examples, Ann. Probab. 41 (2013), No 2, 914-960.

[55] Milman, V. and Schechtman, G. Asymptotic theory of finite-dimensional normed spaces, Lect. Notes in Math. 1200 Springer (1986).

[56] Peligrad, M. and Utev, S. A new maximal inequality and invariance principle for stationary sequences, Ann. Probab. 33 (2005) 798-815.

[57] Peligrad, M., Utev, S. and Wu, W. B. A maximal Lp-inequality for stationary sequences and its applications, Proc. Amer. Math. Soc. 135 (2007) 541-550.

[58] Pinelis, I. An approach to inequalities for the distributions of infinite-dimensional martingales. Probability in Banach spaces, 8, 128-134, Progr. Probab., 30, Birkhäuser Boston, Boston, MA, (1992).

[59] Pinelis, I. Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 22 (1994) 1679-1706.

[60] Pisier, G. Sur la loi du logarithme itéré dans les espaces de Banach, (French) Probability in Banach spaces (Proc. First Internat. Conf., Oberwolfach, 1975), pp. 203-210. Lecture Notes in Math., Vol. 526, Springer, Berlin, (1976).

[61] Rio, E. Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, (French) C. R. Acad. Sci. Paris Sér. I Math. 330 (2000) 905-908.

[62] Shorack, G. R. and Wellner, J. A. Empirical processes with applications to statistics. Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc. New York, (1986).

[63] van de Geer, S. Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23 (1995) 1779-1801.

[64] Volny, D. Approximating martingales and the central limit theorem for strictly stationary processes. Stochastic Process. Appl. 44 (1993) 41-74.

[65] Volny, D. A nonadapted version of the invariance principle of Peligrad and Utev, C. R. Math. Acad. Sci. Paris 345 (2007) 167-169.

[66] Woyczynski, W. A. A central limit theorem for martingales in Banach spaces, Bull. Acad. Polon. Sci. Sér. Sci. Math. Astronom. Phys. 23 (1975) 917-920.

[67] Wu, W. B. Strong invariance principles for dependent random variables, Ann. Probab. 35 (2007) 2294-2320.

[68] Young, L.-S. Statistical properties of dynamical systems with some hyperbolicity, Ann. of Math. (2) 147 (1998) 585-650.

[69] Young, L.-S. Recurrence times and rates of mixing, Israel J. Math. 110 (1999) 153-188.

[70] Young, L.-S. What are SRB measures, and which dynamical systems have them?, J. Stat. Phys. 108 (2002) 733-754.

[71] Zhao, O. and Woodroofe, M. On martingale approximations, Ann. Appl. Probab. 18 (2008) 1831-1847.