Samuel Prentice Nicholas Roy

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA {prentice, nickroy}@mit.edu

The Belief Roadmap: Efficient Planning in Belief Space by

Factoring the

Covariance

Abstract

When a mobile agent does not know its position perfectly, incorporating the predicted uncertainty of future position estimates into the planning process can lead to substantially better motion performance. However, planning in the space ofprobabilistic position estimates, or beliefspace, can incur a substantial computational cost. In this paper, we show that planning in belief space can be performed efficiently for linear Gaussian systems by using a factored form of the covariance matrix. This factored form allows several prediction and measurement steps to be combined into a single linear transfer function, leading to very efficient posterior belief prediction during planning. We give a belief-space variant ofthe probabilistic roadmap algorithm called the belief roadmap (BRM) and show that the BRM can compute plans substantially faster than conventional beliefspace planning. We conclude with performance results for an agent using ultra-wide bandwidth radio beacons to localize and show that we can efficiently generate plans that avoid failures due to loss of accurate position estimation.

KEY WORDS—planning under uncertainty, motion planning, probabilistic, state estimation

1. Introduction

Sequential decision making with incomplete state information is an essential ability for most real-world autonomous systems. For example, robots without perfect state information can use probabilistic inference to compute a distribution over possible

The International Journal of Robotics Research

Vol. 28, No. 11-12, November/December 2009, pp. 1448-1465

DOI: 10.1177/0278364909341659

© The Author(s), 2009. Reprints and permissions:

http://www.sagepub.co.uk/journalsPermissions.nav

Figures 1-8 appear in color online: http://ijr.sagepub.com

states from sensor measurements, leading to robust state estimation. Incorporating knowledge of the uncertainty of this state distribution, or belief, into the planning process can similarly lead to increased robustness and improved performance of the autonomous system; the most general formulation of this problem is known as the partially observable Markov decision process (POMDP) (Kaelbling et al. 1998). Unfortunately, despite the recent development of efficient exact and approximate algorithms for solving POMDPs, planning in belief space has had limited success in addressing large real-world problems. The existing planning algorithms almost always rely on discrete representations of the agent state, dynamics and perception and finding a plan usually requires optimizing a policy across the entire belief space, inevitably leading to problems with scalability.

In contrast, the motion planning community has realized considerable success in using stochastic search to find paths through high-dimensional configuration spaces with algorithms such as the probabilistic roadmap (PRM) (Kavraki et al. 1996) or rapidly exploring randomized trees (RRTs) (LaValle and Kuffner 2001). Some approaches have extended these techniques to allow uncertainty over the effects of actions by modeling the planning problem as a Markov decision process (MDP). MDP-based planning through a PRM optimizes over a stochastic action space (Alterovitz et al. 2006, 2007), but still assumes perfect knowledge of the state. More recently, Pepy et al. (2008) used a RRT-based approach to plan over a set representation of uncertain states; however, only the effects of action uncertainty were considered in state prediction, resulting in the monotonic growth of belief uncertainty without exteroceptive observations. Incorporating the full probability distribution provided by state estimation into planning algorithms such as the PRM or RRT has generally not been feasible. Computing the reachable part of belief space can be expensive; predicting the full evolution of the agent's belief over time, incorporating both stochastic actions and noisy observa-

tions, involves costly non-linear operations such as matrix inversions. Furthermore, the reachable belief space depends on the initial conditions of the robot and must be re-computed when the robot's state estimate changes. Therefore, any work done in predicting the effect of a sequence of actions through belief space must be completely reproduced for a query from a new start position.

We present a formulation for planning in belief space which allows us to compute the reachable belief space and find minimum expected cost paths efficiently. Our formulation is inspired by the PRM, and we show how a graph representation of the reachable belief space can be constructed for an initial query and then re-used for future queries. We develop this formulation using the Kalman filter (Kalman 1960), a common form of linear Gaussian state estimation. We first provide results from linear filtering theory and optimal control (Vaughan 1970) showing that the covariance of the Kalman filter can be factored, leading to a linear update step in the belief representation. This technique has been well-established in the optimal control and filtering literature; however, its use for prediction in planning is both novel and powerful. Using this result, the mean and covariance resulting from a sequence of actions and observations can be combined into a single prediction step for planning. The factored form not only allows the graph of reachable belief space to be computed efficiently, but also updated online for additional queries based on new initial conditions. Optimal paths with respect to the roadmap can therefore be found in time linear with the size of the graph, leading to greatly accelerated planning times compared with existing techniques.

The specific problem we address is an agent navigating in a Global Positioning System (GPS)-denied environment. The GPS provides position estimates of a user's location on Earth to within a few meters, enabling geolocation in areas with a clear line-of-sight (LOS) to GPS satellites, but in indoor and dense urban environments, GPS becomes unreliable or altogether unavailable. One approach to mitigating the loss of GPS involves placing anchor radio beacons around the target environment, enabling localization by computing range measurements from the radio-frequency (RF) signals. However, even with spread-spectrum technologies such as ultra-wide bandwidth (UWB) radio, the quality of positional information available to a mobile agent varies across the environment. When a signal is transmitted, the channel properties of the environment may result in multiple signal path reflections; interference between these path reflections can result in signal degradation, or even cancellation, which inhibits the receiver's ability to successfully resolve the original signal for time-based ranging. Obstacles in the environment may occlude the radio transceivers, blocking signal transmission or adding substantial material propagation delays, both of which are problematic for ranging applications. We show that integrating predictions of position uncertainty into the planning algorithm enables the robot to choose plans that maximize the

probability of reaching the goal, avoiding trajectories through regions where the density of UWB signals is low that lead to very uncertain estimates of the agent position. We give experimental results demonstrating this algorithm for motion planning of a mobile robot, in which the robot must use UWB range estimates for tracking its position. We conclude with results that show planning using a simulated UWB ranging model to navigate across Massachusetts Institute of Technology (MIT) campus.

2. Trajectory Planning and Search

Given a map, model of robot kinematics, and the start and goal positions, the objective of trajectory planning is to find the minimum-cost collision-free path from start to goal. We restrict the discussion in this paper to kinematic motion planning; we plan to extend this work to kinodynamic planning in future work. Here C denotes the configuration space (Lozano-Perez 1983), the space of all robot poses, Cfree is the set of all collision-free poses (based on the map of obstacle positions) and Cobst is the set of poses resulting in collision with obstacles, so that C = Cfree U Cobst. When the state is fully observable, the PRM algorithm (Kavraki et al. 1996; Bohlin and Kavraki 2000) can be used to find a path through Cfree by generating a discrete graph approximation of Cfree. The PRM provides a general framework for efficiently solving fully observable motion planning problems in two stages, as follows.

1. Pre-processing phase. The PRM first constructs a graph, or roadmap, that is a simplified representation of Cfree. Robot poses are sampled from C according to a suitable probabilistic measure and tested to determine whether each pose lies in Cfree or Cobst. Poses within Cfree (i.e. that do not collide with obstacles) are retained and added as nodes to the graph. Edges in the graph are placed between nodes where a straight-line path between the nodes also lies entirely in Cfree. Typically, edges are limited to the k nearest-neighbor nodes or to the neighbors that are closer than some bounding distance.

2. Query phase. Given a start and goal pose, a graph search algorithm is used to find a path through the graph that connects the corresponding start and goal nodes. As a result, the PRM returns a path consisting of a series of straight-line trajectories between waypoints. If the start and goal poses are not already nodes in the graph, additional nodes are added to the graph and edges to any reachable graph nodes are added as in the pre-processing phase.

The power of the PRM resides in the fact that even if Cfree cannot be computed tractably, it is relatively efficient to de-

termine whether an arbitrary node or edge lies in Cfree. As a result, when planning in high-dimensional spaces, Cfree can be approximated as a discrete graph by sampling poses from C, retaining the collision-free samples and straight-line trajectories. Note that there is an implicit assumption in using the PRM, specifically, that some controller exists that can be used to follow a straight-line trajectory from waypoint to way-point without collisions when the edge is known to lie entirely in Cfree. It should also be noted that any sampling-based roadmap approach inherently sacrifices guaranteed global op-timality for the computational tractability of solutions within the discrete graph approximation. "Optimal" plans within the discrete roadmap are approximations of optimal trajectories in the continuous state space, which converge as the size of the roadmap grows.

3. Belief Estimation in Linear Gaussian Systems

When the agent does not have access to near-perfect information about its state in the world using some external positioning system such as GPS, the agent can infer its position from a history of sensor measurements and control actions. The sensor measurements constitute observations of the environment, and can be matched against a prior model of the world, such as a map. Typically, these observations are noisy and imperfect; as a result, statistical inference can be used to maintain a distribution over possible positions, essentially averaging out errors in the observations over time. While the inference results in a probability distribution over the agent's state, a common assumption is that the maximum likelihood state under the distribution can be used in place of the true state when executing a plan.

Let us denote the (unknown) state of the agent at time t as st. If the agent takes an action according to some control ut, then at time t + 1 the agent has moved to some new state st+1 that is drawn stochastically according to some transition probability distribution p(st+1 |st, ut). After each motion, the agent receives an observation zt that is drawn stochastically according to some observation probability distribution p(zt |st). With knowledge of the transition and observation distributions, the agent can estimate the probability of its current state bt = p(st |u1:t, z1:t) after a sequence of controls and observations.

A common assumption is that the posterior state st after some control input ut depends only on the prior state st_1 such that p(st |St_1, ui:t, Zi:t—1) = p(st |St-i, ut). Similarly, the likelihood of an observation depends only on the current state, p(zt |st, u1:t, z1:t—1) = p(zt|st). These assumptions allow the posterior belief bt to be computed recursively as

p(st |ui:t, zi:t) = Zp(zt S)J p(St ^t, St—l)p{St—l)dSt—1, (1)

where Z is a normalization factor. Equation (1) is the standard Bayes' filter equation, and provides a recursive form of updating the state distribution.

Implementing the Bayes' filter requires committing to a specific representation of the state distribution p(st), with consequences on how the transition p(st |st_1, ut) and observation p(zt |st) functions are represented, and on the tractability of performing the integration in Equation (1). One of the most common representations is the Kalman filter (Kalman 1960), in which the state distribution is assumed to be Gaussian and the transition and observation functions are linear with Gaussian noise. If the true system transition and observation functions are non-linear, the extended Kalman filter (EKF) (Smith et al. 1990) linearizes the transition and observation functions at each step. A full derivation of the EKF is outside the scope of this paper, but briefly, the assumption is that

St = g(st-i, Ut, Wt), Wt ~ N(0, Wt),

zt = h(st, qt), qt ~ N(0, Qt), (3)

where wt and qt are random, unobservable noise variables. In the presence of this unobserved noise, the EKF estimates the state at time t from the estimate at time t — 1 in two separate steps: a process step based only on the control input ut leading to an estimate p(st) = N, ht), and a measurement step to complete the estimate of p(st). The process step follows as

Ht = g(Ht-u ut) % = Gt Zt-i GT + VtWtVj,

where Gt is the Jacobian of g with respect to s and Vt is the Jacobian of g with respect to w. For convenience, we denote Rt = VtW,Vj. Similarly, the measurement step updates the belief as follows:

Ht = Ht + Kt(h(Ht) - Zt)

%t = %t — KtHt %t,

(6) (7)

where Ht is the Jacobian of h with respect to s and Kt is known as the Kalman gain,

Kt = %tHj (Ht %tHj + Qt

If the measurement function is conditionally independent of a large number of state variables, the information form of the EKF may be more computationally efficient. The distribution p(st\u\:t, zi:t) can be represented by the information vector and the information matrix Qt = S"1 (Julier et al. 1995). The information matrix updates in the process and measurement steps of the extended information filter (EIF) can be written as

Qt = = (Gt Zt-G + Rt )

Qt = Qt

Hj Q-1 Ht.

(9) (10)

For convenience, we use Mt = HjQ~l Ht to denote the information matrix corresponding to a given measurement, such that Q = Q + Mt.

4. Belief Space Planning

The assumption of most planning and control algorithms is that knowledge of the mean of the belief distribution is sufficient for good performance. However, if the planner uses both the mean and the covariance of the belief in choosing actions, different plans can be computed depending on whether the position estimate is very certain (for example, a norm of the covariance is small), or if the position estimate is uncertain (a norm of the covariance is large). The robot can then balance shorter paths against those with greater potential to reduce belief uncertainty through sensor information gain, choosing longer but more conservative motion plans when appropriate. Note that by incorporating additional statistics from the belief such as the covariance, we are doing nothing more than increasing the state space of the agent. Instead of planning in configuration space, the agent is now planning in belief space B, or information space, but the basic problem is essentially the same: the planner must find a sequence of actions {u0, ...,ut} such that the resulting beliefs {b0,...,bt} maximize the objective function J of the robot. Conventional motion planners generally search for collision-free paths that minimize the cost of moving to the goal location, such that

J{s,) = min C{s, - sgoal) + У] c{s,, u,),

where C is the cost of the distance to the goal location and c is the cost of executing control ut from state st.

Planning in a Gaussian belief space requires a different objective function since every belief has some non-zero probability that the robot is at the goal state (although this probability may be extremely small for beliefs where the mean is far from the goal). A more appropriate objective function is therefore to minimize the expected cost of the path, ESo.T [ J(st)]. In practice, the effect of uncertainty along the trajectory is dominated by the effect of uncertainty at the goal, allowing us to approximate the cost with a more efficient cost function:

J {b, ) « m in Ebgoal| b, ,u0:T [C {s, - sgoal)] + 2^ c{b,, u, ), (12)

where the expectation is taken with respect to the belief state at the goal.

A trivial extension of the PRM for solving this optimization problem would first generate a graph by sampling belief nodes randomly and creating edges between nodes where an action exists to move the robot from one belief to another.

Graph search would then find a trajectory to the belief with highest probability of being at the goal.

The difficulty with this approach is that the control problem is under-actuated and, thus, only a specific subset of beliefs B* is actually realizable. Even if the robot has full control of the mean of its belief, the covariance evolves as a complicated, non-linear function of both the robot controls and environmental observations. If the robot has full control over its л-dimensional mean, then the reachable part of the belief space is an л-dimensional manifold embedded in the n3-dimensional belief space, and therefore a subset of measure zero. It is van-ishingly unlikely that any belief b' e B* would ever be sampled such that there exists a control u to reach it.

A different approach must therefore be used for building the belief graph. Since the robot does have control authority over its mean, it is possible to sample mean components of the belief, then predict the corresponding covariance components. This process is shown in Figure 1. Let us sample a set of mean poses {u¡} from Cfree as the nodes in the PRM graph (Figure 1(a)). We add an edge e¡j between pairs (u¡, uj) if a sequence of controls u¡j = {uti, ■■■,utj} exists to move along the straight line between u¡ and u j without collision (Figure 1(b)). We then simulate a sequence of controls and measurements along each edge; for each step along the edge e¡j, the Gt, Rt and Mt matrices are computed using the appropriate models. Finally, we perform search in the graph to find a sequence of controls starting from the initial belief b0 such that the posterior covariance at the end of the sequence is minimized, computing this posterior covariance using Equations (5) and (7). Figure 1(c) shows the result of the search process and the minimum covariance achieved at each graph node. Figure 1(d) demonstrates the advantage of planning in belief space. The belief space planner detours from the shortest path, finding a trajectory that is rich with sensor information to enable superior localization.

5. Linearizing Belief Space Planning

A major computational bottleneck with the planning algorithm described above is that standard search optimization techniques cannot be used, such as re-using portions of previous solutions. While the EKF is an efficient model for tracking the probability distribution of both linear and well-behaved nonlinear systems, the update steps in the filtering process itself are non-linear. In particular, any trajectory optimization that depends on the covariance requires solving the following Ric-cati equation (from Equations (5) and (7)):

Zt = (Gt Zt-1 GT + Rt) - (Gt Zt-^T + Rt)Hj x (Ht (Gt Zt-1 GT + Rt) Hj + Qt)-1 Ht

x {G,Et-1G? + R,)■

Fig 1. A basic example of building a belief-space roadmap in an environment with ranging beacons. (a) Distribution means are sampled, and the means in Cfree are kept. (b) Edges between distributions that lie in Cfree are added to the graph. (c) Once the graph is built, an initial belief (lower right, labeled START) can be propagated through the graph by simulating the agent's motion and sensor measurements, and performing the appropriate filter update steps in sequence along edges. The posterior distribution at each node is drawn with 1 — a uncertainty ellipses, and results from a single-source, minimum uncertainty search path to that node. In this example, we artificially increased the noise in the robot motion to make the positional uncertainty clearly visible throughout the environment. (d) Reiteration of the benefit of incorporating the full belief distribution in planning. The belief space planner detours from the shortest path through a sensing-rich portion of the environment to remain well localized.

(c) Resulting belief space graph (d) Advantage of belief space planning

As a result, if the initial conditions or the trajectory itself are modified in any way, the covariance must be recomputed from scratch. If the planner computes a predicted posterior state (ut, 2t) from an initial distribution (u0, 20) and a predicted sequence of actions and observations using a set of t EKF updates, the non-linearity prevents us from computing a new posterior state (ut, 2t) from a different set of initial conditions (u0, 20), except by repeating the entire sequence of t EKF updates. This is not the case for the mean ut for most real-world systems; for a sequence of controls {u0,...,ut}, under some reasonably mild assumptions, once Ut is calculated from u0, a new ut can be calculated in a single step from a different u0. The EKF update of the mean becomes linear during predictive planning when the measurement zt is assumed to be the

maximum likelihood observation zt = h(ut), which simplifies Equation (6) to ut = Ut.

For a trajectory formed from a sequence of k graph edges each of length l, O(kl) EKF process and measurement updates are required. The asymptotic complexity of the overall problem is O(lbd) for a search depth of d edges in the graph with a branching factor of b; the computational cost of the specific EKF updates along each edge may seem negligible as a constant multiplier of the exponential growth, but this term has a significant effect on the overall time to plan. However, if the covariance is factored appropriately, we can show that the EKF update equations for each factor separately are, in fact, linear. Along with other benefits, the linearity will allow us to combine multiple EKF updates into a single transfer function f -

Fig. 2. Belief prediction with one-step covariance transfer function. Nodes i and j are connected by an edge in Cfree; a beacon (diamond) provides range measurements. (Top) Standard EKF updates are used in succession to propagate the initial covariance Z0 along edge eij in multiple filtering steps. (Bottom) The one-step transfer function f ij encodes the effects of multiple EKF process and measurement covariance updates, enabling the posterior covariance ZT to be computed in one efficient step given any novel initial covariance Z0.

associated with each edge e¡j to efficiently predict the posterior filter state from a sequence of controls and measurements in a single step. Although initial construction of the graph and transfer functions requires a cost of 0(l) per edge, this construction cost can be amortized, leading to a planning complexity of 0(bd), equivalent to the fully observable case.

The one-step covariance transfer function f ¡j is demonstrated in Figure 2. In the top, standard EKF updates are used to predict the evolution of an initial belief (u0, Z0) along edge e¡j. The belief is updated in multiple filter steps across the edge; the mean u propagates linearly (with the assumptions stated above), while the covariance requires the computation of a corresponding series of non-linear updates according to Equation (13). The posterior belief (uT, ZT) resulting from the initial belief (u0, Z0) is recovered after T update steps, and must be fully re-computed for a change in initial conditions. The one-step transfer function is shown in the bottom of Figure 2. An initial belief at node i is propagated through f ¡j in one efficient step to recover the posterior covariance at node j . The following section derives this transfer function by showing that EKF covariance updates can be composed.

5.1. Linear Covariance Updates

To show the linearization of the EKF covariance update, we rely on previous results from linear filtering theory and optimal

control (Vaughan 1970) to make use of the following matrix inversion lemma.

Lemma 1. We have

(A + BC-1)-1 = (ACC-1 + BC-1)-1, = C (AC + B )-1.

Theorem 1. The covariance can be factored as Z = BC-1, where the combined EKF process and measurement update gives Bt and Ct as linear functions of Bt-i and Ct-1.

Proof. We proceed by proof by induction.

Base case. We can show the theorem to be trivially true, as

S0 = B0C0 1 = S01 1 ■

Induction step. Given

St-l = Bt-lC--11,

from Equation (5),

S, = G,B,-lC--11GT + R, Z, = {GtB,-l){GfCt-1)-1 + R, ■ From Lemma 1,

S = ({G;TCt-1) {GtB,-1 + Rt {GfCt-l))'1)

St = D,E,

S, = E,D

where Dt = Gf"TCt_i and Et = GtBt + Rt (GfC, As a result, we can see that the process update preserves the factored form of Z. Similarly, if we start with the information form for the covariance update, from Equation (10),

St = {S-1 + Hj Q- Hj)-\ Substituting in Mt and Equation (20),

St = (DtE;1 + m, )-1. Again from Lemma 1,

St = Et {Dt + M,Et )-1,

S, = BtC- \

(21) (22)

where Bt = Et and Ct = Dt + MtEt. If we collect terms, we see that

Bt = Et = GtBt-i + Rt (G_TCt_i) (25)

Ct = Dt + MtEt (26)

= G_TCt_i + Mt(GtBt_i + Rt(GfCt_i)). (27)

In both cases, Bt and Ct are linear functions of Bt_1 and Ct_1.

Collecting terms again, we can re-write Equations (25) and (26), such that

B W X B

C t Y Z t C

0 I 0 G _T B

I M t G RG _T t C

where Yt is the stacked block matrix the covariance factors and f t =

W X Y Z

transfer function for the covariance factors.

consisting of

is the one-step

5.2. Initial Conditions

In order to use this factored form of the covariance, we need to ensure that this factorization applies across all possible initial conditions. To verify that the factorization and update are independent of the initial conditions, we show that our assumed initial condition So = X^^1 = SoI_ is an achievable result of performing a boundary update from each of two possible boundary conditions, which we denote with a minus subscript as = X_/Y_ or, equivalently, Q_ = Y_/X_, with a slight abuse of notation. The first boundary condition we consider is that of infinite uncertainty or, equivalently, zero information. The second is that of zero uncertainty or, equivalently, infinite information. The linear system corresponding to the boundary update is given as

We consider each boundary condition in turn, by solving the system in Equation (30) and imposing the constraint So =

X oYo-1.

X G RG _T X

Y 0 MG MRG_T + G_T 0 Y

5.2.1. Boundary Case: Infinite Uncertainty, Zero Information

The boundary condition of infinite uncertainty £_ = œ>, or zero information Q_ = 0, corresponds to the case where Y_ = 0, which is shown as follows:

= œ>,

X_ X_ = o.

Using Equation (3o), the covariance factors are written as

X o = Go X_+ o = G o X _, (34)

Yo = Mo GoX_ + o = MoGoX_. (35)

Solving for the initial condition using Equations (34) and (35), we obtain

X0Y0-1 = G 0 X- ■ X-1 G-1 M-1 = M-1

which implies the following constraint:

£0 = M,

-1 0 •

By denoting A = Go X_ and applying the constraint in Equation (36) to Equations (34) and (35), we obtain the following solution set:

Y 0 _£-1A_

A = 0^

Note that our trivial initial condition of Xo — So and Yo = I is valid with A = So.

The result shown above is intuitive when considering EKF/EIF control and measurement updates from the boundary condition of zero information Q_ = o and, equivalently, infinite uncertainty = to. The EKF control update is irrelevant, since adding any process noise to infinite covariance results in infinity:

£0 = G0E-Gn + R0 = G0^Gn + R0 =

The measurement update, however, shows that an update from this boundary condition corresponds to receiving a measurement Mo = Qo:

Q0 = Q- + M0 = 0 + M0 = M0.

Thus, this boundary update is equivalent to beginning in a state with zero information and increasing certainty by incorporating a measurement of value M0 = Q0 =

5.2.2. Boundary Case: Zero Uncertainty, Infinite Information where

The boundary condition of zero uncertainty = 0, or infinite information Q_ = to, corresponds to the case where X_ = 0, which is shown as follows:

= T = T_ =

Q_ = — = —

Y_ = 0.

As before, the covariance factors are written as

Xo = Go ■ 0 + Ro G0TY_ = Ro G0TY_ Yo = M0G0 ■ 0 + (M0R0G_T + G_T)Y_ = ( Mo Ro + I )G _TY_.

Computing the initial covariance corresponding to Equations (43) and (45) gives us

X0Y0—1 = R0 G—TY— ■ Y—1 GT0(M0 R0 + I)—1

= Ro( Mo Ro + I )_1, implying the following constraint on R0 and M0:

So = Ro( Mo Ro + I )_1.

We make the substitution B = G— Y— for the free variables, and apply the constraint in Equation (46) to Equations (43) and (45), yielding the solution set

(Mo Ro + I )B

B = 0.

This result is also intuitive, although not as straightforward as the previous boundary update in which the control update had no effect. Owing to the ordering of control and measurement updates, the initial covariance can result from a combination of both adding uncertainty R0 to the boundary state of perfect information, and then subsequently adding measurement information M0. It is for this reason that the constraint set is a function of both R0 and M0. However, our assumed initial condition of X0 = S0 and Y0 = I is the trivial result of adding only process noise R0 = S0 and zero measurement information M0 = 0, with B = I.

6. Redheffer Star Product

The factored covariance representation and matrix form of the update given in Equation (29) represents a non-recursive solution to the Riccati Equation (13) in the symplectic form. The 2n x 2n matrix Y is by definition symplectic if

Y J YT = J, (48)

0 - In

and In is the n x n identity matrix. The eigenvalues of symplectic matrices such as Y occur in reciprocal pairs such that if X is an eigenvalue of Y, then so is X—l. Unfortunately, as a result, composition of symplectic matrices is known to become numerically unstable as round-off errors in computation can result in loss of the symplectic eigenvalue structure (Fassbender 2000). An alternate form that provides greater numerical stability through inherent structure preservation is the Hamiltonian form, with the corresponding composition operator known as the Redheffer "star" product (denoted with a "*"); see Redheffer (1962). A 2n x 2n block matrix

A B C D

is called Hamiltonian if

J S = ( J S)T = _ST J,

noting that JT = J _1 = _ J. We show that there is a Hamil-tonian representation and composition operator equivalent to the symplectic form that does not share the same numerical instability.

6.1. Derivation of Star Product

We can formally derive the Hamiltonian method of composition by starting with descriptor matrices of a Hamiltonian system at two adjacent timesteps as follows:

For clarity it should be stated that these two system matrices have the same block structure, where each block actually corresponds to a time-varying quantity. Our goal is to determine the Hamiltonian matrix corresponding to the aggregate system that represents both Equation (52) and Equation (53). We show that the resulting system can be computed using the star product in the following manner:

A B Xl

_yi. C D _y2_

X3 V x" X2

_y2_ Y Z _y3_

X3 A B •k V X Xl

_yi. C D Y Z y3.

Our explicit goal is to derive equations for the variables x3 and yi in terms of xi and y3 to determine the star product operation in Equation (54). We begin by writing the given equations from the systems in Equations (52) and (53), as follows:

X2 = Axi + By2 yi = Cxi + Dy2 x3 = Wx2 + Xy3 У2 = Yx2 + Zy3.

Substituting y2 from Equation (58) into Equation (55) we solve for x2 as follows:

X2 = Axi + B(YX2 + Zy3)

(I _ BY)X2 = Axi + BZy3

X2 = (I _ BY)_iAxi + (I _ BY)_iBZy3 (59)

We also substitute x2 from Equation (55) into Equation (58) to solve for y2 as follows:

y2 = Y (Axi + By2) + Zy3

(I _ YB)y2 = YAxi + Zy3

y2 = (I _ YB)_iYAxi + (I _ YB)_iZy3. (60)

Now with x2 and y2 both written in terms of xi and y3, it is possible to similarly solve for x3 and yi. To solve for x3, we substitute x2 from Equation (59) into Equation (57):

x3 = Wx2 + Xy3

= ((I _ BY)_iAxi + (I _ BY)_iBZy3) + Xy3,

which is simplified to give the desired result

x3 = W(I _ BY)_i Axi + (X + W(I _ BY)_iBZ)y3. (6i)

We also substitute y2 from Equation (60) into Equation (56) to solve for yi as follows:

yi = Cxi + Dy2

= Cxi + D(I _ YB)_iYAxi + D(I _ YB)_iZy3,

which simplifies to become

yi = (C + D(I _ YB)_iYA)xi + D(I _ YB)_i Zy3. (62)

Now, with Equations (6i) and (62), our solution is obtained as the aggregate system in Equation (54), which can now be written in terms of one matrix as

W (I - BY )-1 A X + W ( I - BY )-1 BZ C + D(I - YB)-lYA D(I - YB)-1 Z

where we have now derived the star product as a set of matrix block operators, given as

A B W X

Si * So = *

C D Y Z

W ( I - BY )-1 A X + W (I C + D(I

-BY)-1BZ D(I - YB)-1 Z

-YB)-1YA 6.2. Scattering Theory Parallel

The Hamiltonian method of composition can be demonstrated most intuitively with an analogy in scattering theory stemming from Redheffer's original work and developed in the context of optimal filtering by Kailath et al. (1976).

The key to this analogy lies in the forward-backward Hamiltonian system associated with the discrete Riccati difference equation: in filtering, this corresponds to a system which simultaneously produces filtered and smoothed estimates; in scattering theory it is interpreted as waves traveling through a medium in opposite directions with forward and backward transmission and reflection operators, whose interactions are determined by the state space parameters {G, R, M}. Given these parameters for a set of consecutive scattering medium layers or, equivalently, a set of consecutive Kalman filter updates, the descriptor matrices for each update can be combined using the star product to produce one descriptor matrix. This resulting descriptor matrix represents the aggregate scattering medium or, equivalently, the aggregate filter update.

The Riccati difference equation is represented as follows:

Zt = Rt + Gt Zt-i(I - Mt Zt-i)-1 GT,

where at filtering time t (or scattering layer t) Gt = state transition matrix

(or scattering transmission matrix) Rt = error covariance (or right reflection coefficient) Mt = measurement information (or left reflection coefficient)

and the associated Hamiltonian matrix is called the scattering matrix and has the form

G R - M GT

Composition of multiple layers can be performed using the star product as

S,:t = S, * S,+i * ■■■ * ST, (68)

where S,:T is the aggregate scattering matrix capturing the effects of layers, through T.

It can be shown that individual control and measurement steps have corresponding scattering matrices by noting the direct correspondence between EKF/EIF updates and the Riccati Equation (66). The control update yields the control update scattering matrix

G R 0 GT

Similarly, the measurement update gives the measurement update scattering matrix

Thus, multiple filter updates can be composed in the Hamil-tonian form as in Equation (68) by star-producing the corresponding scattering matrices in succession:

s, = sc * sm ■

6.2.1. Initial Conditions

The initial conditions in this formulation are handled in a similar fashion to our discussion in Section 5.2. The insight in applying the initial covariance is to create a boundary layer, which is a scattering layer that is attached to the null scattering layer.

It is straightforward to see that the initial covariance I0 could be represented as a null layer with process noise R = I0, or alternatively with additional measurement information M = S-1 = Q0. These are identical to the cases derived in Section 5.2. The first two cases would be boundary layers described as

I I 0 I

or 0 =

where this boundary layer can be attached to a scattering medium to impose an initial condition.

6.3. Computing the Hamiltonian One-step Update

The key to applying the star product for composition lies in the associative property of the star product operation (Redhef-fer 1962), which ensures that the star product of any scattering matrices yields another scattering matrix of the same block form. To compose the filter updates for T timesteps, we begin by computing the aggregate scattering matrix,

Sl:T =

Gl:T Rl:T - Mi:T GTt

= Si * S2 * ■ ■■ * St ■

We can then apply a novel initial condition I0 and use Equation (66) to solve for the posterior covariance,

"■ It" I I0"

(The matrix elements ■ are irrelevant to the final solution for the covariance.) As mentioned previously, the advantage to this formulation is that the aggregation of Hamiltonian matrices is numerically more stable than the symplectic form of Equation (29).

7. The Belief Roadmap Algorithm

The belief space planning algorithm can now be shown as a two-stage process. First, mean positions of the robot are sampled, as in the PRM algorithm, and edges between visible graph nodes are added. The corresponding process and measurement Jacobians are calculated at steps along each edge and assembled via matrix multiplication into a one-step transfer function for the covariance, f ij, according to Equation (29).

In the second stage, a standard search algorithm is used to compute the sequence of edges through the graph, starting at b0, that maximizes the probability of being at the goal (or, equivalently, results in minimal belief covariance at the goal). During search, each f ¡j now allows us to compute the posterior covariance 2 j that results at node j by starting at node i with covariance 2, moving in one efficient step along edge eij. We call this algorithm the belief roadmap (BRM) planner. The build and search phases of the BRM planner are shown in Algorithms 1 and 2, respectively.

There are several points of discussion that we address in turn. First, note that this must be a forward search process; the terminal node of this path cannot be determined a priori, for while the goal location ^goal is known, the covariance (and, hence, bt (sgoal)) depends on the specific path.

Second, the BRM search process in Algorithm 2 assumes a queue function that orders the expansion of 2) nodes. Breadth-first search sorts the nodes in a first-in, first-out order.

Algorithm 1 The BRM build process. Require: Map C over mean robot poses i: Sample mean poses {pi} from Cfree using a standard PRM sampling strategy to build belief graph node set {ni} such that

Hi [m] = Mi

2: Create edge set {eij} between nodes (ni, Hj) if the straight-line path between (ni [m], Hj [m]) is collision-free 3: Build one-step transfer functions {f j V eij e {e,j} 4: return Belief graph Q = {{}, {e,-j}, {f,,}}

Algorithm 2 The BRM search process.

Require: Start belief (p0, 20), goal location ^goal and belief graph Q Ensure: Path p from /г0 to ^goal with minimum goal covariance 2goal.

Append G with nodes {n0, ngoal}, edges {{e0,j}, {eijgoal}}, and one-step transfer functions {{f 0, j}, {fi,goal}} Augment node structure with best path p = 0 and covariance Z = 0, such that ni = {ju, Z, p} Create search queue with initial position and covariance Q n0 = {/z0,Z0, 0} while Q is not empty do Pop n Q if n = ngoal then Continue

end if

for all n' such that 3e„y and not n' э n[p] do

Compute one-step update T' = fn n, ■ T, where T = [n[f] ] t; ! ■ T21-1

if tr(Z') < tr(n'[S]) then n' «-{nVL Z, n[p] U{n'}} Push n' ^ Q end if end for end while return ngoal[p]

Note that cycles in the path are disallowed in line 9 of Algorithm 2, where a neighboring node n' is only considered if it is not already in the search state path n[p]. This guarantees that upon termination of the breadth-first search process, the minimum covariance path stored at the goal node is optimal with respect to the roadmap. More intelligent search processes rely on an A* heuristic to find the goal state faster; however, common admissible heuristics do not apply as the evolution of the covariance through the roadmap is non-monotonic (owing to the expansion and contraction of uncertainty in the underlying state estimation process). Developing suitable A* heuristics for planning in belief space are a topic for future work. Similarly, the applicability of dynamic search processes, such as the D* family of algorithms (Stentz 1995; Koenig and Likhachev 2002) and anytime methods (Van den Berg et al. 2006), is a direction for future research. For the results given in this paper, we used exclusively breadth-first search for both shortest-path (PRM) and belief-space (BRM) planning problems.

In addition, note that in lines 12-13 of Algorithm 2 that we only expand nodes where the search process has not already

found a posterior covariance n'[£] such that some measure of uncertainty such as the trace or determinant is less than the measure of the new posterior covariance 2'. It is also assumed that a node n' replaces any current queue member n' when pushed onto the queue in line 14.

Further, considerable work has been devoted to finding good sampling strategies in fully observable motion planing problems (we refer the reader to Hsu et al. (2006) and Mis-siuro and Roy (2006)). Such strategies can bias samples towards different topological features and areas of interest to improve both the quality and efficiency of the roadmap. For the results in this paper, we used a medial-axis sampling strategy for both the shortest-path (PRM) and belief-space (BRM) planning problems. However, it is likely that better belief-space planning would result from sampling strategies that are aware of the sensor model. Similarly, a sampling strategy that incorporates the cost function would also lead to improved planning, especially for cost functions that are not solely a function of the distribution over the goal state. By iteratively computing expected costs and re-sampling the roadmap, an

Algorithm 3 The minmax-BRM algorithm.

Require: Start belief (p0, So), goal location pgoal and belief graph Q Ensure: Path p from p0 to pgoal with minimum maximum covariance. 1: Q = {{}, {e.j}, {j ^ Build_BRM_Graph (Bp)

2: Append Q with nodes {n0, ngoai}, edges {{e0,j}, {eljgoal}}, and one-step descriptors {{S0,j}, {5ijgoai}}

3: Augment node structure with best path p = 0 and maximum covariance 2mpax = to along path p, such that ni =

p s, p, smj

Create search queue with initial position and covariance Q n0 = {p0, 20, 0, to} while Q is not empty do Pop n Q if n = ngoal then Continue

end if

for all n' such that 3eny and n' 3 n[p] do

Compute one-step update Y' = f n n, ■ Y, where Y = [n[/2] ]

2' ^ y;! ■ Y2!

if max(tr(2'), tr(n[Smx])) < tr(n'[2mj) then n; №], 2', {n[p], n'}, max(2', n[2mj)} Push n' ^ Q end if end for end while return ngoal[p]

upper-bound on the expected cost of the entire computed plan can be achieved. The exact algorithm for iteratively planning-resampling is outside the scope of this paper.

7.1. Modified BRM for MinMax Path Uncertainty

In the BRM formulation shown in Algorithm 2, the search process finds the series of trajectories that results in minimal uncertainty at the goal location; however, it may be desirable to instead limit the maximum uncertainty encountered along an entire path. One approach could be to impose bounds on the maximum allowable uncertainty during the BRM search tr(2) < trmax to discard undesirable goal paths. In a more principled approach, one could modify the BRM search process to optimize an alternative objective function that minimizes the maximum predicted uncertainty along the entire path. Within the context of the BRM graph, this approach would consider the posterior covariance predicted at each intermediate belief node in a series of trajectories. The goal would be to minimize the objective function J, which is given as

c(bt[p], ut)

max D(bt [2])

where )(■■■) is the cost of a path, c is the cost of executing control ut from belief pose bt[p], and D is the cost associated with the maximum uncertainty of all discrete belief nodes along the path.

The BRM search process is adapted to minimize the objective function J in Algorithm 3, which we have named the minmax-BRM algorithm. There are two key changes from the standard BRM. First, the augmented search node structure in line 3 stores the best path p to the given node and the maximum covariance 2mpax along p. The best path ni [ p] to node ni corresponds to the series of nodes beginning at the start node n0 that collectively has the minimum-maximum (minmax) co-variance of all such paths considered to node ni. The maximum covariance ni [2^] along this best path ni [p] is also stored in the search state for computing the associated cost D in the objective function J (Equation (75)) and for decision-making during search. Note that the covariance ni[2] stored at node ni is no longer the minimum achievable covariance, but rather the posterior covariance resulting from the best path ni[ p].

Second, the primary decision-making step in line 13 is modified for the new objective function. In this case, the path being considered in the current search state {n[p], n'} is deemed better than the existing path n'[p] to node n' if its maximum uncertainty is less than that of the existing path. Note that the maximum uncertainty of the current search path {n[p], n'} is computed by taking the max function of the associated uncertainty of each portion of this path, which is tr(n[2mpax]) for n[p] and tr(2') for n'. If the current search path is better than the existing path, then the node n' is updated accordingly in line 14 and placed on the queue in line 15.

A key consideration of the minmax-BRM algorithm is that it can only guarantee an optimal solution within the roadmap

for a specific resolution of the uncertainty evolution along a path. In Algorithm 3, we only consider the covariance at node locations along the path. While lines ii-i2 exactly compute the posterior covariance of multiple EKF updates along a trajectory, the underlying multi-step process is not monotonic. This means that it is possible for the covariance at an intermediate point on an edge between two graph nodes to be larger than both the prior covariance and posterior covariance for the full trajectory. It is possible to revert to some form of multistep approach to this problem, but, without further assumptions, the guarantee of minmax covariance will always be limited to the resolution of discretization. We leave the analysis of this problem for future work, and place our focus on the standard BRM for experiments.

It is important to note the generality of the BRM formulation, which was demonstrated in this section by modifying the search process to optimize an alternative objective function. The BRM technique presents a general approach to planning in belief space that can be adapted to solve a broad class of planning problems.

8. UWB Localization

In this work, we applied the belief roadmap algorithm to the problem of navigating in a GPS-denied environment using UWB radio beacons for localization. UWB is a nascent technology that is amenable to ranging applications in dense indoor and urban environments, overcoming weaknesses of traditional narrowband counterparts with its fine delay resolution and large bandwidth that provide immunity to multipath fading and the ability to penetrate building materials (Win and Scholtz i998; Cassioli et al. 2002). Ranges can be computed between a pair of UWB radio beacons by measuring the round-trip time-of-flight of UWB pulse exchanges. While a complete characterization of the UWB channel is still an active area of research and beyond the scope of this paper, here we briefly develop the general ranging model used in this work. A supplementary discussion of the details of this UWB ranging scheme was presented by Prentice (2007).

8.1. UWB Measurement Model

The general UWB sensor model can be written as

rt = dt + bt + Ht, (76)

where rt is the range, dt is the distance between UWB sensors, bt is the range bias, and Ht is additive noise. The round-trip time calculation is approximate in nature, leading to uncertainty in the range calculation which can be modeled as a stochastic process. In a testing campaign, we developed a Gaussian model to describe ranging uncertainty in LOS scenarios. We characterized the Gaussian process by gathering

True Distance vs. Range Bias Error in LOS Scenario

12 r T

_6I-1-1-1-1-1-1-1

0 2 4 6 8 10 12 14

True Distance (m)

Fig. 3. The UWB ranging model in LOS. The x-axis is the true range between sensors and the y-axis is the measured systematic error (unbiased measurements are at zero), and the error bars give the standard deviation of the random noise.

range data between a pair of sensors at various distances with LOS visibility. The distance was increased in 0.25-meter increments from i to i4 meters of separation and at each point, i,000 range samples were gathered. The resulting data is plotted in Figure 3, showing the mean bias Mbias and standard deviation a bias errorbar at each distance.

This data suggests that the LOS range bias can be reasonably modeled as distance-varying Gaussian noise, with mean Mbias(dt) and standard deviation abias(dt). Computing a linear regression yields

Mbias (dt) = Mhiasdt + Mbias,

a bias (dt) = a biasdt + a Mas' (78)

The range function in Equation (76) then becomes

rt = dt + Mbias (dt) + M(0, abias (dt )2), (79)

where the bias bt is now a linear function of the distance Mbias (dt), and the noise term nt is zero-mean Gaussian noise with variance abias(dt)2. When used in filtering problems, the range function in Equation (79) corresponds to the observation function Zt = h(xt) + Vt, with zt = rt, Vt = M(0, abias(dt)2) and h(xt) is given as

h(xt) = dt + Mbias (dt) (80)

= Mbias + (i + Mbias)

X \J(x _ xbeacon)2 + (y _ ybeacon)2, (8i)

where xt is assumed to be the robot pose (x, y, 0)t at time t, and (xbeacon, ybeacon) is the ranging beacon location.

9. Experimental Results

In order to evaluate the BRM algorithm, we performed a series of evaluations on a small planning domain in simulation. The testing consisted of two objectives: (1) to evaluate the quality of plans produced by the BRM algorithm in terms of uncertainty reduction; and (2) to assess the computational advantage of employing the linearized EKF covariance update during the search process.

The experimental setup consisted of small-sized maps with randomly placed ranging beacons using the stochastic range sensor model from Section 8.1. The environment was assumed to be free of obstacles to avoid experimental bias resulting from artifacts in sensor measurements and random trajectory graph generation in environments with varying contours.

We begin by presenting the motion and sensor models used in our experiments, which are linearized versions of the motion and sensor models for use in our EKF-based formulation. Note that, for readability, we omit time index subscripts in the next two sections; however, all matrices derived are time-varying quantities.

9.1. Linearized Motion Model

a = —D sin ( fig + — ) - C sin ( fig

b = D cos ( fg + f j + C cos Í fig +

The linearized process noise in state space is computed as R = VWVT where W is the process noise covariance in control space

a D 0 0

W = 0 a C 0

0 0 2 a -

and V is the mapping from control to state space, computed as the Jacobian of the motion model with respect to the control space components

'Ö gx 8gx 8gx'

8 D 8C 8T

¿ gy 8gy 8gy

8 D 8C 8T

8 gg -SD 8 gg 8c 8 gg ~ST-

Computing these partial derivatives leads to

We use the following non-linear probabilistic motion model,

gx = x + d cos (g + -2) + c cos fg +

= y + d sin (g + t J + c sin í g + T++

gg = 0 + T mod 2k ,

where gx, gy and g0 are the components of g corresponding to each state variable, and the control variable ut is given by ut = [ d c t ]T with down-range D, cross-range C and turn T components.

In the EKF, the state transition matrix G is the Jacobian of the motion model with respect to the state, and is computed by linearizing the state transition function g about the mean state fi as follows:

'8gx 8gx 8gx'

8x 8 y 8g

8gy 8gy 8gy

8x An 8 y An 8g An

8 gg 8x 8gg 8 y 8gg 8g.

1 0 a 0 1 b 0 0 1

— 2 (D sin (g + f)

+ c sin (g + ^))

(g + 2 (D cos (0+ I)

+ c cos (g + ^))

where the components of V are evaluated at the mean state f and the mean control f[u] (D, C, and T take on the mean values of the respective normal distributions).

9.2. Linearized LOS Sensor Model

Similarly, the UWB sensor model developed in Section 8.1 can be linearized for use in the EKF. For convenience we restate the distance-varying Gaussian noise model of the bias N(fb(d),ab(d)) and the observation function z. The noise model is given by

fb (d) = fmd + Mb' a b (d) = ff md + ff b

and the observation function z = h(x) + v is determined by

h(x) = 4 + (i+)V/(x-x)7M7-y)2,

v = M (0,ab (d )2),

where (x, y) is the robot pose, (xb, yb) is the beacon location, d is the Euclidean distance and the parameters of the Gaussian bias distribution are linear functions of the distance.

The linearized transformation from measurement space to state space is computed as the measurement Jacobian H, computed as the partial derivatives of the measurement function with respect to each component of the state:

ôh ôh ôh Sx ôy së_

which becomes

(1+vf )(x -xb)

(1+/f )(y-yb )

.^(x-Xb)2 + (y-yb )2 V (x -Xb)2+(y-yb )2

Note that (x - xb) = d cos 9m and (y - yb) = d sin 9m, where 9m = atan2(y - yb, x-xb) is the angle of measurement relative to the robot pose. Thus, Equation (82) becomes

(1 + /uf ) cos 9 m (1 + /f ) sin 9 m 0

As can be seen, the range measurements yield no information on bearing, and thus only affect the estimation of the x and y components of the robot state. The measurement noise covariance Q = cov(q, q) for a given beacon is the 1 x 1 matrix

(P fd + b)2

Recall that the information matrix update in the measurement step is additive as Qt = Qt + Mt. Thus, the range measurements z[l ] to each of N visible beacons i at time t can be incorporated by computing the aggregate measurement information as

Mt = y Mt[i ],

where Mt[i] = Ht[i]' Q?'H[

=0 [i ]

9.3. Localization Performance

In the first set of analyses, we compared the quality of paths produced using the BRM algorithm with those resulting from a shortest path search in a conventional PRM, which does not incorporate uncertainty. In each test iteration, sensor locations were sampled along randomized trajectories between a start and goal location in an obstacle-free environment with 100 m sides, as shown in Figure 4. This experimental setup tests the ability of the BRM to detour from shorter paths for

Fig. 4. Experimental setup. Range sensor locations (shown as small circles) were sampled along randomized trajectories between a start and goal location. The solid and dashed line show the plans generated by the BRM and shortest path algorithms, respectively, with ellipses indicating the covariances Z along each trajectory. This experimental design tests the ability of the BRM to find paths with greater potential information gain to stay well localized during execution.

those with lower expected uncertainty. We tested the quality of paths computed by the BRM and PRM shortest path planning algorithms by evaluating the average position error obtained at the goal location after executing the prescribed path.

We performed two analyses to demonstrate that the BRM provided more accurate localization; we artificially varied the random noise of the range beacons, and we artificially limited the range of the beacons by discarding measurements beyond a maximum range. In Figure 5(a), we see the performance of the two planning algorithms under conditions of varying noise. As the sensor noise increases, both algorithms have worsened positional accuracy at the goal, but the shortest path algorithm degrades more quickly. The BRM planner contends with increased sensor noise by finding trajectories with higher quality measurements. In Figure 5(b), we see that with a small maximum sensor range, the BRM is able to find trajectories in close proximity to sensors, yielding a reasonable level of positional accuracy. As the maximum sensor range increases, trajectories farther from sensors provide sufficient information for localization and the positional errors in both planners converge to similar values. Intuitively, as the information space becomes artificially saturated with abundant state information, the agent can remain well-localized regardless of its trajectory. Conversely, when the information space has greater disparity, the BRM excels at finding higher-quality paths that provide greater state information.

Sensor Model Uncertainty vs. Positional Error at Goal Location

(a) Varying sensor noise

Maximum Sensor Range vs. Positional Error at Goal Location

Maximum Sensor Range (m) (b) Varying sensor range

Fig. 5. Characterization of the positional accuracy of the robot for both the PRM shortest path and BRM planners as a function of the sensor noise and sensor range. (a) Accuracy versus sensor noise. The positional accuracy of the PRM shortest path algorithm suffered with increased noise. The positional accuracy of the BRM increased slightly but not substantially with noise. (b) Accuracy versus range. The positional accuracy of the PRM shortest path algorithm increased with sensor range as the agent had more ranges for localization. Even with very short range, the BRM algorithm was able to find paths that maintained relatively good localization.

9.4. Algorithmic Performance

Second, we assessed the speed improvement of utilizing the linearized EKF covariance update during planning. We com-

pared the time required by the planning search process when using the one-step linearized EKF covariance update (f j to that of the standard EKF covariance updates. Note once again that the one-step covariance transfer function produces the same resulting covariance as performing each of multiple standard EKF updates in succession; there is no tradeoff in accuracy. This experiment evaluates the effect of using the one-step transfer function on planning speed. Planning experiments were performed using randomized sensor locations in maps of varying size (30-100 m per side). To reduce variability in the speed comparison results, the number of sensors was held constant throughout the experiments and the number of nodes was sampled randomly in proportion to the area of the environment to maintain consistent trajectory lengths.

Figure 6(a) shows the relative search times with respect to the depth of the search tree in the corresponding trajectory graph. The BRM maintains a consistent improvement by over two orders of magnitude, with search costs scaling logarithmically with increasing tree depth. Similar results are obtained when comparing the search times with respect to the length of the resulting path, shown in Figure 6(b), reiterating the significant scalable improvement of the one-step update. The one-step covariance update presents a consistent improvement in planning speed and scales with the size of the trajectory graph, making planning in belief space with the BRM computationally tractable.

Note that to construct update matrices for each trajectory in the graph, the one-step linearized search incurs a one-time build cost comparable to the cost of one path search using the standard covariance model. However, this cost is amortized; the BRM reuses this graph to enable efficient search in replanning.

9.5. Example trajectories

Finally, example trajectories are shown in Figures 7-8. In Figure 7, a mobile robot navigates through a small-sized indoor environment (~70 m in length) providing ranging beacons for localization. The BRM planner detours from the direct route chosen by the shortest path planner for sensor-rich regions of the environment. Whereas the shortest path planner accumulates positional error by relying on dead-reckoning, the BRM path incorporates ranging information to maintain lower uncertainty.

Figure 8 shows example trajectories for a very large planning problem. The robot must navigate across the MIT campus from the bottom right corner to the top left corner. Scattered throughout the environment are ranging beacons with known position, shown as the small (blue) circles. The robot can localize itself according to the ranges, but the ranging accuracy varies across campus according to the proximity and

Fig. 6. Algorithmic performance. (a) Time to plan versus tree depth (b) Time to plan versus path length. Note that these graphs are semi-log graphs, indicating two orders of magnitude increase in speed.

density of the beacons. The robot is also constrained to the outside paths (and cannot short-cut through buildings, the light-grey blocks). The shortest path planner shown in Figure 8(a) finds a direct route (the solid (blue) line) but the positional uncertainty grows quite large, shown by the (green) uncertainty ellipses. In contrast, the BRM algorithm finds a path that stays well localized by finding areas with a high sensor density. The uncertainty ellipses are too small to be seen for this trajectory.

Fig. 7. Example trajectories for a mobile robot in an indoor environment (~70 m across) with ranging beacons. The robot navigates from START (lower left) to GOAL (top). The BRM finds a path in close proximity to the ranging beacons, balancing the shorter route computed by the PRM in configuration space against a lower cost path in information space. The positional uncertainty over the two paths is shown as the bold covariance ellipses.

(b) BRM: lowest expected uncertainty path

Fig. 8. Example paths for a mobile robot navigating across MIT campus. The solid line in each case is the robot path, the small dots are the range beacons being used for localization, and the dark ellipses are the covariances Z of the robot position estimate along its trajectory. Note that the shortest path trajectory grows very uncertain, whereas the lowest expected uncertainty path always stays well localized at the cost of being slightly longer.

10. Conclusion

In this paper, we have addressed the problem of planning in belief space for linear Gaussian systems, where the belief is tracked using Kalman filter style estimators. We have shown that the computational cost of EKF predictions during planning can be reduced by factoring the covariance matrix and combining multiple EKF update steps into a single, one-step process. We have presented a variant of the PRM algorithm, called the belief roadmap planner, and shown that it substantially improves planning performance and positional accuracy. We have demonstrated our planning algorithm on a large-scale environment and showed that we could plan efficiently in this large space. This kind of trajectory has been reported elsewhere (Roy and Thrun 1999) but in limited scales of environments. We believe that our demonstration of belief-space planning in the MIT campus environment is considerably larger than existing results.

References

Alterovitz, R., Branicky, M. and Goldberg, K. (2006). Constant-curvature motion planning under uncertainty with applications in image-guided medical needle steering. Workshop on the Algorithmic Foundations of Robotics. Berlin, Springer.

Alterovitz, R., Simeon, T. and Goldberg, K. (2007). The stochastic motion roadmap: a sampling framework for planning with Markov motion uncertainty. Proceedings of Robotics: Science and Systems.

Bohlin, R. and Kavraki, L. (2000). Path planning using lazy PRM. Proceedings of the IEEE International Conference on Robotics and Automation.

Cassioli, D., Win, M. Z. and Molisch, A. F. (2002). The ultra-wide bandwidth indoor channel: from statistical model to simulations. IEEE Journal on Selected Areas in Communications, 20(6): 1247-1257.

Fassbender, H. (2000). Symplectic Methods for the Symplec-tic Eigenvalue Problem. Dordrecht, Kluwer Academic/New York, Plenum.

Hsu, D., Latombe, J. and Kurniawati, H. (2006). On the probabilistic foundations of probabilistic roadmap planning. The

International Journal of Robotics Research, 25(7): 627643.

Julier, S., Uhlmann, J. and Durrant-Whyte, H. (1995). A new approach for filtering nonlinear systems. Proceedings of the American Control Conference.

Kaelbling, L., Littman, M. and Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101: 99-134.

Kailath, T., Friedlander, B. and Ljung, L. (1976). Scattering theory and linear least squares estimation—II discrete-

time problems. Journal of the Franklin Institute, 301: 7782.

Kalman, E. R. (1960). A new approach to linear filtering and prediction problems. Transactions of the ASME (Series D)—Journal of Basic Engineering, 82: 35-45.

Kavraki, L., Svestka, P., Latombe, J.-C. and Overmars, M. (1996). Probabilistic roadmaps for path planning in high dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 12(4): 566-580.

Koenig, S. and Likhachev, M. (2002). Improved fast replanning for robot navigation in unknown terrain. Proceedings 2002 IEEE International Conference on Robotics and Automation (ICRA'02), Vol. 1.

LaValle, S. and Kuffner, J. (2001). Randomized kinodynamic planning. The International Journal of Robotics Research, 20(5): 378-400.

Lozano-Perez, T. (1983). Spatial planning: a configuration space approach. IEEE Transactions on Computers, 32(2): 108-120.

Missiuro, P. and Roy, N. (2006). Adapting probabilistic roadmaps to handle uncertain maps. Proceedings of the IEEE International Conference on Robotics and Automation.

Pepy, R., Kieffer, M. and Walter, E. (2008). Reliable robust path planner. Proceedings 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pp. 1655-1660.

Prentice, S. (2007). Robust range-based localization and motion planning under uncertainty using ultra-wideband radio. Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA.

Redheffer, R. (1962). On the relation of transmission-line theory to scattering and transfer. Journal of Mathematical Physics, 41: 1-41.

Roy, N. and Thrun, S. (1999). Coastal navigation with mobile robots. Advances in Neural Processing Systems, 12: 10431049.

Smith, R., Self, M. and Cheeseman, P. (1990). Estimating uncertain spatial relationships in robotics. Autonomous Robotic Vehicles, I. J. Cox and G. T. Wilfong (eds). Orlando, FL, Springer.

Stentz, A. (1995). The focussed D* algorithm for real-time replanning. The International Joint Conference on Artificial Intelligence, Vol. 14, pp. 1652-1659.

Van den Berg, J., Ferguson, D. and Kuffner, J. (2006). Anytime path planning and replanning in dynamic environments. Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), pp. 23662371.

Vaughan, D. (1970). A nonrecursive algebraic solution for the discrete Riccati equation. IEEE Transactions on Automatic Control, 15(5): 597-599.

Win, M. Z. and Scholtz, R. A. (1998). Impulse radio: how it works. IEEE Communications Letters, 2(2): 36-38.