

Exercising Control When Confronted by a (Brownian) Spider

Philip Ernst^{1,*}

Abstract

We consider the Brownian "spider," a construct introduced in [4] and in [1]. In this note, the author proves the "spider" bounds by using the dynamic programming strategy of guessing the optimal reward function and subsequently establishing its optimality by proving its excessiveness.

Keywords: Dynamic programming, martingales, stochastic optimization, Brownian motion

2010 MSC: Primary: 90C39, Secondary: 90C15

In memory of my mentor, Professor Larry Shepp (1936-2013)

1. Introduction

In this note, we consider the Brownian "spider," a process also known as the "Walsh" Brownian motion, due to [4] and [1]. The Brownian spider is constructed as a set of $n \ge 1$ half-lines, or "ribs," meeting at a common point, $O$. A Brownian motion on a spider starting at zero may be constructed from a standard reflecting Brownian motion $(|W_t|, t \ge 0)$ by assigning an integer $i \in \{1, \ldots, n\}$ uniformly and independently to each excursion, which is then transferred to an excursion on rib $i$ (here, $i$ should be interpreted as the index of the rib on which the next excursion occurs). It is helpful to think of the Brownian spider as a bivariate process; the first coordinate of the process is reflecting Brownian motion and the second coordinate is the rib index. Formally, we define the Brownian spider process $Z_t$ as

$$Z_t = (|W_t|, R_t), \quad t \ge 0, \qquad (1)$$

where $|W_t|$ is reflecting Brownian motion and $R_t$ is the rib on which the process is located at time $t$. $|W_t|$ can be decomposed into excursions away from $0$ with endpoints $t_k$ such that $|W_{t_k}| = 0$. $R_t$ is constant on each interval $(t_k, t_{k+1})$, and $R_t = i$ means that the excursion occurs on rib $i$. We define the supremum of reflecting Brownian motion on each rib as

$$S_i(t) = \sup_{\{u \le t \,:\, R_u = i\}} |W_u|, \quad t \ge 0, \ i = 1, \ldots, n.$$

Submitted August 17, 2015.

*Corresponding author. E-mail: philip.ernst@rice.edu

1 Department of Statistics, Rice University, Houston, TX 77005, USA


Below is a sample path realization of the Brownian spider for $n = 3$. We use $W_i(t)$ to denote the process on rib $i$:

$$W_i(t) = \begin{cases} |W_t| & \text{if } R_t = i, \\ 0 & \text{if } R_t \ne i. \end{cases} \qquad (2)$$

Figure 1: A sample path realization of the Brownian spider for n = 3.
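Although no code accompanies the original note, the excursion construction above is straightforward to simulate. The following is a minimal sketch (all function and parameter names are our own, not from the paper): $|W_t|$ is approximated by a symmetric random walk reflected at the node, and each excursion is carried by an independently, uniformly chosen rib.

import random

def simulate_spider(n_ribs=3, n_steps=2500, seed=None):
    """Random-walk approximation of the Brownian spider run up to t = 1.

    The space step is n_steps**(-1/2), so n_steps steps correspond to one
    unit of Brownian time. Each excursion away from the node O is carried
    by an independently, uniformly chosen rib. Returns the rib suprema
    S_1(1), ..., S_n(1) in Brownian scaling.
    """
    rng = random.Random(seed)
    dx = n_steps ** -0.5           # space step for one unit of Brownian time
    k = 0                          # integer distance from the node, in steps
    rib = rng.randrange(n_ribs)    # rib carrying the current excursion
    sup = [0] * n_ribs             # running suprema S_i, in steps
    for _ in range(n_steps):
        if k == 0:                 # at the node: a fresh excursion begins
            rib = rng.randrange(n_ribs)
            k = 1                  # reflection at the node
        else:
            k += 1 if rng.random() < 0.5 else -1
        sup[rib] = max(sup[rib], k)
    return [m * dx for m in sup]   # back to Brownian scaling

# Monte Carlo estimate of E[S_1(1) + ... + S_n(1)] for n = 3, with the
# deterministic stopping time tau = 1 (far from optimal in general).
est = sum(sum(simulate_spider(seed=k)) for k in range(1000)) / 1000
print(est)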

In an attempt to understand the unboundedness of Brownian motion on the spider up to time $t$, a natural question to ask is: what is $E\left[\sum_{i=1}^n S_i(t)\right]$? However, Lester Dubins (personal communication with Larry Shepp) asked a different question. Dubins wished to design a stopping time to maximize the coverage of Brownian motion on the spider for a given expected time. That is, he wished to find

$$C_n := \sup_{\{\tau \,:\, E[\tau] = 1\}} E\left[S_1(\tau) + \cdots + S_n(\tau)\right], \qquad (3)$$

where the supremum is calculated over all stopping times $\tau$ of mean one. Equivalently, Dubins wished to calculate the smallest $C = C_n$ such that for every stopping time $\tau$ the following inequality holds:

$$E\left[S_1(\tau) + S_2(\tau) + \cdots + S_n(\tau)\right] \le C_n \sqrt{E[\tau]}. \qquad (4)$$

(Note that for any stopping time $\tau$, $E[S_i(\tau)]$ scales with $\sqrt{E[\tau]}$.) The left side of equation (4) is the mean total measure of the space visited on the spider up to time $\tau$.
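To spell out the equivalence of (3) and (4) (a standard scaling step, written out here for completeness): for $c > 0$, $(c^{-1} W_{c^2 t})_{t \ge 0}$ is again a Brownian motion, and the excursion construction commutes with this scaling. Given a stopping time $\tau$ with $0 < E[\tau] < \infty$, rescaling by $c^2 = E[\tau]$ produces a stopping time $\tilde\tau := \tau / E[\tau]$ of mean one for the rescaled spider, whose rib suprema we denote $\tilde S_i$, and
$$E\left[\sum_{i=1}^n S_i(\tau)\right] = \sqrt{E[\tau]}\; E\left[\sum_{i=1}^n \tilde S_i(\tilde\tau)\right] \le C_n \sqrt{E[\tau]}.$$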

For $n = 0$ we, somewhat inconsistently, define $C_0$ in a similar way for ordinary Brownian motion without a reflecting barrier at zero. We seek the smallest constant $C_0$ for which the one-sided maximum satisfies

$$E\left[\max_{0 \le t \le \tau} W(t)\right] \le C_0 \sqrt{E[\tau]}. \qquad (5)$$

In this note, we will prove that the optimal bounds are $C_n = \sqrt{n+1}$ for $n = 0, 1, 2$. Without further delay, the author notes that the solution of the optimal bounds for $n = 0, 1, 2$ is not new. The cases $n = 0, 1$ were solved by [4], and the case $n = 1$ was also independently solved by [9] by a different method. The $n = 2$ case was recently resolved by [3]. What is new, however, is the dynamic programming strategy the author employs to find the bounds $C_n$ for $n = 0, 1, 2$, which he believes to be the most tractable approach for solving for $C_n$ for all $n$ (despite much effort by many researchers, this problem remains open). The behavior of $C_n$ for large $n$ is interesting because when $n = \infty$, the total measure of space visited on the spider up to time $t > 0$ is also infinite: it equals the total variation of a Brownian motion on $[0, t]$, because at each return to the node a fresh rib is chosen.

Larry Shepp saw dynamic programming as the root of all optimal control problems. In general, there are two strategies that can be used to solve a dynamic programming problem.

(A) Guess a candidate for an optimal strategy, calculate the reward function for the strategy, then prove its excessiveness.

(B) Guess the optimal reward function and establish its optimality by proving its excessiveness.

Unlike [3], which employs strategy (A), our approach is that of (B), and to the best of our knowledge, we are the first to do so. In stochastic optimization, strategy (B) reduces to "guessing" the right optimal control function. If one can guess the right function, the supermartingale becomes a martingale, and Itô calculus takes care of the rest. This approach appears prominently throughout Shepp's most seminal works, specifically on p. 634 of [12], p. 207 of [10], p. 1528 of [13], p. 335 of [7], and, most recently, p. 422 of [8].
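For the present problem, the computation behind this remark is a one-line Itô sketch (with $V$ a putative smooth reward function, in the notation of Sections 2 and 3): away from the node and away from the boundaries $x = s_r$, the maxima do not move, so, schematically,
$$d\big(V(Z_t, S(t), C) - Ct\big) = \frac{\partial V}{\partial x}\, dW_t + \left(\frac{1}{2} \frac{\partial^2 V}{\partial x^2} - C\right) dt.$$
The drift is nonpositive precisely when $\frac{1}{2} V_{xx} \le C$, giving a supermartingale, and vanishes where equality holds, giving a martingale; these conditions reappear as properties (d) and (f) below.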

The organization of this note is as follows: In Section 2, we formalize our guess for the optimal reward function. In Section 3, we establish the optimality of this function by proving its excessiveness, albeit only in the cases n = 0, 1, 2. We conclude by arguing the viability of our strategy towards a solution of the general problem.

2. Our guess of the optimal reward function

Let $r = R_0$ be the index of the starting rib, $x$ a fixed distance along rib $r$, and $C$ and $M$ finite constants. In order to obtain the least upper bound $C_n$, we must solve a more general optimal stopping problem. Let $s_1, s_2, \ldots, s_n \ge 0$ be the distances that have already been covered on each of the respective ribs at time $0$. For every value of $C > 0$, and every choice of $r$ and $x$ such that $x \le s_r$, and $s_1, \ldots, s_n$, we must find the value of

$$V(x, r; s_1, \ldots, s_n; C) := \sup_{\{\tau \,:\, E[\tau] \le M\}} E_{\{x, r, s_1, \ldots, s_n\}}\left[S_1(\tau) + \cdots + S_n(\tau) - C\tau\right]. \qquad (6)$$

The subscript of the expectation, $\{x, r, s_1, \ldots, s_n\}$, denotes that the process is at distance $x$ on rib $r$ at time $0$. By abuse of notation, $S_i(\tau)$ denotes the furthest point covered on rib $i$ up to time $\tau$. Note that we must find $V(x, r; s_1, \ldots, s_n; C)$ not only for $x = 0$ and $s_1 = \cdots = s_n = 0$, but for every point $x$ of the spider on every rib $r$ as initial point, and every starting position for $s_i$, $i = 1, \ldots, n$.

In (6), the supremum is taken over bounded stopping times $\tau$. Even though we only need the case when the initial point is $O$ and when $s_i = 0$, $i = 1, \ldots, n$, standard martingale methods for solving optimal stopping problems do not work unless we can find the formula for $V$ for every starting position (see, for example, [2], [6], [5], [11], and [15]).

We "guess" that F(x, r, s1,..., sn, C) should have the following properties:

(a) $\hat V(0, r, s_1, \ldots, s_n, C)$ does not depend on $r$ (if $x = 0$, $r$ becomes irrelevant).

(b) $\sum_{r=1}^{n} \frac{\partial}{\partial x} \hat V(x, r, s_1, \ldots, s_n, C) = 0$ at $x = 0$.

(c) $\frac{\partial}{\partial s_r} \hat V(x, r, s_1, \ldots, s_n, C) = 0$ at $x = s_r$ for all $r$.

(d) $\frac{1}{2} \frac{\partial^2}{\partial x^2} \hat V(x, r, s_1, \ldots, s_n, C) \le C$ for $0 \le x \le s_r$, $r = 1, \ldots, n$.

(e) $\hat V(x, r, s_1, \ldots, s_n, C) \ge s_1 + \cdots + s_n$ for all $0 \le x \le s_r$ and $r = 1, \ldots, n$.

(f) Wherever strict inequality holds in property (e), equality holds in property (d): $\frac{1}{2} \frac{\partial^2}{\partial x^2} \hat V(x, r, s_1, \ldots, s_n, C) = C$, $0 \le x \le s_r$.

Intuitively, at a stopping place we are far from any boundary point where an $s_i$ would increase, and thus we are willing to accept the reward $\hat V(x, r, s_1, \ldots, s_n, C) = s_1 + \cdots + s_n$.

3. Establishing the optimality of the reward function

Theorem 3.1. If we have a function $\hat V$ satisfying properties (a)-(f) in Section 2, then

$$V(x, r, s_1, \ldots, s_n; C) = \hat V(x, r, s_1, \ldots, s_n, C). \qquad (7)$$

Proof. Consider the process

$$Y(t) = \hat V(Z_t, S(t), C) - Ct, \quad t \ge 0, \qquad (8)$$

where $S(t) = (S_1(t), \ldots, S_n(t))$. $Y(t)$ is a continuous local supermartingale: at $x = 0$ by properties (a) and (b), at $x = s_r$ by property (c), and at any other $x$ by property (d). For any bounded stopping time $\tau$, it follows from the optional sampling theorem that $E[Y(\tau)] \le Y(0)$. Property (e) gives us that for any bounded $\tau$,

$$E_{\{x, r, s_1, \ldots, s_n\}}\left[S_1(\tau) + \cdots + S_n(\tau) - C\tau\right] \le \hat V(x, r, s_1, \ldots, s_n, C). \qquad (9)$$

From the definition of $V$ in (6), we must have $V \le \hat V$.

We now consider the reverse inequality $V \ge \hat V$. By property (f), equality holds in the last argument for the "right $\tau$." Although this "right $\tau$" does not always exist in such problems, it does for our problem; the "right $\tau$" is the first entry time of the underlying Markov process $(Z, S)$ into the set where equality holds in (e). Further, this "right $\tau$" is a particular stopping time that is "approximable by uniformly bounded ones." Larry Shepp used this phrase to denote that we can take the "right $\tau$" at which the equality is attained, approximate it by "right $\tau$" $\wedge\, k$ for $k \ge 1$, and then pass to the limit in $k$. This is valid in our setting since the "right $\tau$" has finite expectation. When property (f) holds, and when equality holds in (d), $Y$ is a local martingale up to the first entry of the underlying Markov process $(Z, S)$ into the set where equality holds in (e). Since the "right $\tau$" has finite expectation, we may invoke the standard form of Doob's stopping theorem for bounded stopping times, as in [14]. Thus,

$$E_{\{x, r, s_1, \ldots, s_n\}}\left[S_1(\tau) + \cdots + S_n(\tau) - C\tau\right] \ge \hat V(x, r, s_1, \ldots, s_n, C), \qquad (10)$$

and one can optimize over $\tau$ on both sides. The reverse inequality $V \ge \hat V$ thus holds, and therefore $V = \hat V$, completing the proof. □
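For completeness, the truncation step can be written out. With $\tau^*$ denoting the "right $\tau$" and $k$ the truncation level, the martingale property up to $\tau^*$ gives $E[Y(\tau^* \wedge k)] = Y(0) = \hat V(x, r, s_1, \ldots, s_n, C)$ for each $k$, and
$$E\left[\sum_{i=1}^{n} S_i(\tau^* \wedge k)\right] - C\,E[\tau^* \wedge k] \;\xrightarrow{\ k \to \infty\ }\; E\left[\sum_{i=1}^{n} S_i(\tau^*)\right] - C\,E[\tau^*]$$
by monotone convergence applied to each term (the second term using $E[\tau^*] < \infty$).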

If we can find the right $\hat V$ satisfying properties (a)-(f), we then know that

$$A_n(C) := V(O, r, 0, \ldots, 0; C) = \frac{\theta_n}{C}, \qquad (11)$$

where $\theta_n$ is a number independent of $C$. $A_n$ must be of the form $\theta_n / C$ because a scaling argument allows us to reduce the problem to any one value of $C$; indeed, we will show that

$$V(x, r, s_1, \ldots, s_n; C) = \frac{1}{C}\, V(Cx, r, Cs_1, \ldots, Cs_n; 1). \qquad (12)$$

Note that if we start at $x = O$ and $s_1 = \cdots = s_n = 0$, then the above form for $A_n(C)$ is obtained. Let

$$S(\tau) := S_1(\tau) + \cdots + S_n(\tau).$$

For any $C$ and any $\tau$, $E[S(\tau)] \le A_n(C) + C E[\tau]$. If we set $m = E[\tau]$ for any fixed stopping time $\tau$, then we obtain the best upper bound by minimizing over $C$, which is

$$E[S(\tau)] \le \inf_{C > 0}\left(\frac{\theta_n}{C} + Cm\right).$$

The infimum is attained at $C = \sqrt{\theta_n / m}$ and gives the bound $C_n = 2\sqrt{\theta_n}$ for any $n$. Thus we need only find $V(O; C)$ for any one value of $C$.
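The minimization over $C$ is elementary; by the AM-GM inequality,
$$\frac{\theta_n}{C} + Cm \ge 2\sqrt{\frac{\theta_n}{C} \cdot Cm} = 2\sqrt{\theta_n m}, \qquad \text{with equality iff } \frac{\theta_n}{C} = Cm, \text{ i.e., } C = \sqrt{\theta_n / m},$$
so that $E[S(\tau)] \le 2\sqrt{\theta_n}\sqrt{E[\tau]}$, which is the claimed $C_n = 2\sqrt{\theta_n}$.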

3.1. Solution for n = 0, 1, 2

Corollary 3.2. $C_0 = 1$.

Proof. For n = 0, consider the function

$$\hat V(x, s, C) = C\left(x - s + \frac{1}{2C}\right)^2 + s.$$

We note that properties (a)-(f) hold, and so for $x = s = 0$ and any $C > 0$,

$$E[S(\tau)] \le C E[\tau] + \frac{1}{4C}. \qquad (13)$$

Minimizing over $C$, i.e., taking $C = \frac{1}{2\sqrt{E[\tau]}}$ as above for any $\tau$, we obtain the inequality

$$E[S(\tau)] \le \sqrt{E[\tau]} \qquad (14)$$

for all $\tau$, i.e., $C_0 = 1$. □
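The properties of this candidate, and the optimization over $C$, can also be checked symbolically. Below is a minimal sketch using sympy; this is our own verification, not part of the original argument.

import sympy as sp

x, s, C, m = sp.symbols('x s C m', positive=True)
V = C*(x - s + 1/(2*C))**2 + s        # candidate reward function for n = 0

# Property (c): smooth fit, dV/ds = 0 on the boundary x = s.
assert sp.simplify(sp.diff(V, s).subs(x, s)) == 0
# Property (d) with equality: (1/2) d^2V/dx^2 = C everywhere.
assert sp.simplify(sp.diff(V, x, 2) / 2 - C) == 0
# A_0(C) = V(0, 0) = 1/(4C), i.e., theta_0 = 1/4.
assert sp.simplify(V.subs({x: 0, s: 0}) - 1/(4*C)) == 0

# Minimizing C*m + 1/(4C) over C recovers E[S(tau)] <= sqrt(m), i.e., C_0 = 1.
bound = C*m + 1/(4*C)
c_star = [r for r in sp.solve(sp.diff(bound, C), C) if r.is_positive][0]
assert sp.simplify(bound.subs(C, c_star) - sp.sqrt(m)) == 0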

Corollary 3.3. $C_1 = \sqrt{2}$.

Proof. For $n = 1$, the right $\hat V$ is given by:

$$\hat V(x, s, C) = Cx^2 + \frac{1}{2C}, \quad 0 \le x \le s \le \frac{1}{2C}; \qquad (15)$$
$$\hat V(x, s, C) = C\left(x - s + \frac{1}{2C}\right)^2 + s, \quad 0 \le x \le s, \ s > \frac{1}{2C}.$$

We use the above argument to see that $A_1(C) = \frac{1}{2C}$ and $\theta_1 = \frac{1}{2}$, and so $C_1 = \sqrt{2}$. □

Corollary 3.4. $C_2 = \sqrt{3}$.

Proof. For $n = 2$, $\hat V$ is, for $i \ne j$ (with $r = i$) and $s_1 + s_2 \le \frac{1}{C}$,

$$\hat V(x, i, s_1, s_2, C) = Cx^2 - Cx(s_i - s_j) + \frac{C(s_1^2 + s_2^2)}{2} + \frac{3}{4C}, \quad 0 \le x \le s_i. \qquad (16)$$

On the complementary region, $\hat V$ is given by:

$$\hat V(x, i, s_1, s_2, C) = C\left(\left(\left(x - s_i + \frac{1}{2C}\right)^{+}\right)^2 + \left(\left(-x - s_j + \frac{1}{2C}\right)^{+}\right)^2\right) + s_1 + s_2, \qquad (17)$$

where $0 \le x \le s_i$ and $s_1 + s_2 > \frac{1}{C}$. We can use the above argument to see that, with $V(O; C) = \frac{3}{4C}$, we arrive at $C_2 = \sqrt{3}$. □
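As with Corollary 3.2, the smooth-fit, node, and origin conditions for the first branch (16) can be verified symbolically. Below is a minimal sympy sketch; again, this check is our own, not part of the original argument.

import sympy as sp

x, s1, s2, C = sp.symbols('x s1 s2 C', positive=True)

def V16(si, sj):
    # Branch (16), viewed from a rib whose record is si; the other rib's is sj.
    return C*x**2 - C*x*(si - sj) + C*(s1**2 + s2**2)/2 + 3/(4*C)

V_rib1, V_rib2 = V16(s1, s2), V16(s2, s1)

# Property (c): smooth fit in the current rib's record at x = s1.
assert sp.simplify(sp.diff(V_rib1, s1).subs(x, s1)) == 0
# Property (b): the slopes on the two ribs cancel at the node x = 0.
assert sp.simplify(sp.diff(V_rib1, x).subs(x, 0) + sp.diff(V_rib2, x).subs(x, 0)) == 0
# Property (d) with equality: (1/2) d^2V/dx^2 = C.
assert sp.simplify(sp.diff(V_rib1, x, 2) / 2 - C) == 0
# V(O; C) = 3/(4C), i.e., theta_2 = 3/4 and C_2 = 2*sqrt(3/4) = sqrt(3).
assert sp.simplify(V_rib1.subs({x: 0, s1: 0, s2: 0}) - 3/(4*C)) == 0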

4. n = 3 and beyond

At present, we possess a non-trivial but ultimately incomplete strategy for addressing the case $n = 3$. Our strategy is to develop the "correct" nonlinear Fredholm equation so that we may reduce the problem to a nonlinear integral recurrence. Based on simulation approaches, we conjecture the following about the constant:

Conjecture 4.1. The $\sqrt{n+1}$ pattern for the spider constant does not hold for $n = 3$. Further, it is likely that the spider constant for $n = 3$ is not an elementary number.

5. Final Remarks

We are hopeful of a solution to the general-$n$ case for the Dubins spider and maintain that our proposed dynamic programming approach constitutes the most tractable direction for solving the problem, for the following reasons: 1) The use of linear programming would be infeasible because the approximate linear program would be large and unwieldy, making accurate numerics impossible. 2) Bellman's dynamic programming method seems intractable for the same reason as linear programming. 3) The more standard method of dynamic programming, namely that of guessing a candidate for an optimal strategy, calculating the reward function of the strategy, and proving its excessiveness (as most recently done by [3]), was unsuccessful in obtaining the general solution.

Acknowledgments

First and foremost, I am indebted to my mentor, Professor Larry Shepp, for his extraordinary support, for introducing me to this literature, and for his enormously insightful conversations about this problem. I am also indebted to my colleague Professor Goran Peskir for his excellent inspiration and insight, particularly regarding the proof of Theorem 3.1. I am grateful to Quan Zhou for his invaluable help in producing the figure in this note as well as for his careful reading of the manuscript. I thank Professor David Gilat and Professor Isaac Meilijson for their detailed input. I thank Professor Ton Dieker for his helpful comments. Finally, I am tremendously grateful to an anonymous referee whose very helpful comments enormously improved the quality of this work.

References

[1] Barlow, M., Pitman, J., & Yor, M. (1989). On Walsh's Brownian motions. In Séminaire de Probabilités de Strasbourg, volume 1372 (pp. 275-293).

[2] Brumelle, S. (1971). Some inequalities for parallel-server queues. Operations Research, 19, 402-413.

[3] Dubins, L., Gilat, D., & Meilijson, I. (2009). On the expected diameter of an L2 bounded martingale. The Annals of Probability, 37, 393-402.

[4] Dubins, L., & Schwarz, G. (1988). A sharp inequality for submartingales and stopping times. Astérisque, 157, 129-145.

[5] Dubins, L., Shepp, L., & Shiryaev, A. (1994). On optimal stopping rules and maximal inequalities for Bessel processes. Theory of Probability and Applications, 38, 226-261.

[6] Dynkin, E., & Yushkevich, A. (1979). Controlled Markov Processes. Springer.

[7] Ernst, P., Foster, D., & Shepp, L. (2014). On optimal retirement. Journal of Applied Probability, 51, 333-345.

[8] Ernst, P., & Shepp, L. (2015). Revisiting a theorem of L.A. Shepp on optimal stopping. Communications on Stochastic Analysis, 9, 419-423.

[9] Gilat, D. (1988). On the ratio of the expected maximum of a martingale and the Lp norm of its last term. Israel Journal of Mathematics, 63, 270-280.

[10] Grigorescu, I., Chen, R., & Shepp, L. (2007). Optimal strategy for the Vardi casino with interest payments. Journal of Applied Probability, 44, 199-211.

[11] Rhee, W., & Talagrand, M. (1987). Martingale inequalities and NP-complete problems. Mathematics of Operations Research, 12, 177-181.

[12] Shepp, L., & Shiryaev, A. (1993). The Russian option: reduced regret. Annals of Applied Probability, 3, 631-640.

[13] Shepp, L., & Shiryaev, A. (1996). Hiring and firing optimally in a large corporation. Journal of Economic Dynamics and Control, 20, 1523-1539.

[14] Shiryaev, A. (1978). Optimal Stopping Rules. Springer-Verlag.

[15] Walrand, J., & Varaiya, P. (1981). Flows in queueing networks: a martingale approach. Mathematics of Operations Research, 6, 387-404.