Contents lists available at ScienceDirect

journal of Computational Physics

www.elsevier.com/locate/jcp

An efficient exponential time integration method for the numerical solution of the shallow water equations on the sphere

Stéphane Gaudreault *, Janusz A. Pudykiewicz

Recherche en Prévision Numérique Atmosphérique, Science and Technology Branch, Environment Canada, 2121 Route Transcanadienne, Dorval, Québec, H9P 1J3, Canada

I CrossMark

A R T I C L E I N F 0

Article history:

Received 16 October 2015

Received in revised form 11 July 2016

Accepted 12 July 2016

Available online 18 July 2016

Keywords:

Shallow water equations Exponential time integration methods Numerical Weather Prediction Time integration Exponential methods

A B S T R A C T

The exponential propagation methods were applied in the past for accurate integration of the shallow water equations on the sphere. Despite obvious advantages related to the exact solution of the linear part of the system, their use for the solution of practical problems in geophysics has been limited because efficiency of the traditional algorithm for evaluating the exponential of Jacobian matrix is inadequate. In order to circumvent this limitation, we modify the existing scheme by using the Incomplete Orthogonalization Method instead of the Arnoldi iteration. We also propose a simple strategy to determine the initial size of the Krylov space using information from previous time instants. This strategy is ideally suited for the integration of fluid equations where the structure of the system Jacobian does not change rapidly between the subsequent time steps. A series of standard numerical tests performed with the shallow water model on a geodesic icosahedral grid shows that the new scheme achieves efficiency comparable to the semi-implicit methods. This fact, combined with the accuracy and the mass conservation of the exponential propagation scheme, makes the presented method a good candidate for solving many practical problems, including numerical weather prediction.

Crown Copyright © 2016 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

The nonlinear partial differential equations, governing physical and chemical processes in continuous systems, are often solved using the method of lines [43] which leads to a large set of Ordinary Differential Equations (ODEs) of the form

— = F(u, t), u(0) = U0, (1)

where u e R" is the state vector, n indicates the number of degrees of freedom, and F is the function describing all forcings; F : Rn+1 R".

The system described by Eq. (1) is typically stiff because it governs processes with different time scales. Consequently, the selection of an appropriate time integration scheme is of the foremost importance. In all situations when the stiffness comes from the linear part of F, the problem could be cast in a simpler form as

* Corresponding author.

E-mail addresses: Stephane.Gaudreault2@canada.ca (S. Gaudreault), Janusz.Pudykiewicz@canada.ca (J.A. Pudykiewicz). http://dx.doi.org/10.1016/jjcp.2016.07.012

0021-9991/Crown Copyright © 2016 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

^ = Lu + N(u, t), (2)

where L and N are linear and nonlinear part respectively.

There are abundant examples of partial differential equations which lead to a semi-discrete form of Eq. (2). They include (see Minchev [23] for review): Allen-Cahn, Burgers, Cahn-Hilliard, Kuramoto-Sivashinsky, Navier-Stokes, shallow water, Swift-Hohenberg, nonlinear Schrodinger, convection-diffusion and convection-reaction-diffusion equations. The numerical solution of the set of semi-linear system described by Eq. (2) is often obtained using semi-implicit schemes where the linear term Lu is solved with the help of a method appropriate for the stiff problem (usually an implicit method) whereas the nonlinear part N(u, t) is solved explicitly. This methodology offers a rational compromise between requirements of accuracy and efficiency. The alternative strategy that has attracted increased attention in a number of diverse fields in recent years, is the exponential time integration method. It requires the evaluation of functions related to the exponential of the Jacobian matrix [14]. Methods belonging to this class offer very high accuracy and stability without a severe time step restriction of the explicit numerical schemes. Since their introduction in the late 1950s, several numerical software packages providing various implementations have been proposed [14]. For a recent review, see [17] and the references therein.

The exponential integration schemes based on the static splitting given by Eq. (2) were discussed by Beylkin et al. [4] and Cox and Matthews [9]. The basic idea of these methods is quite simple. We begin from the multiplication of Eq. (2) by factor eLt and the integration over time step At to obtain

un+1 = eLAtun + J eL(At-T)N(u(nAt + r))dr, (3)

where un, un+1 are solutions at times n and n + 1, At is the time step and t is the time. The exponential term is then evaluated using the methods discussed in [36] and the integral of the nonlinear term is approximated with an appropriate quadrature. It was shown that the methods of this class offer a very good accuracy and a realistic representation of the high frequencies in contrast to the traditional semi-implicit schemes [9].

Considering the results obtained with exponential integration methods in various areas of science, it is justified to investigate them in the context of numerical weather prediction where the selection of a time integration scheme is a key element. Traditionally, the meteorological equations are solved using the well-established Semi-Implicit Semi-Lagrangian (SISL) integration schemes, first introduced in the atmospheric community by André Robert [32] and [33]. The main advantage of these methods is that they are not limited by a stability-based CFL condition. Hence the time step size can be chosen solely on the basis of a desired accuracy.

The efficiency and robustness of the SISL methods for integrating meteorological equations led to their common use in many of the meteorological centers. Semi-Lagrangian schemes were further advanced by the development of the accurate parallel algebraic solvers [6,27]. The fully conservative algorithms were also implemented. Most of these advances have been motivated by the need to use parallel computing architectures in the optimum manner. The same consideration is the driving force behind the search of alternative techniques.

Different versions of the exponential time integrators have been studied in the meteorological context by numerous authors. Archibald et al. [2] used the scheme of Beylkin et al. [4], based on the assumption of static splitting described by Eq. (2), to solve the shallow water equations on the cubed sphere. Clancy and Pudykiewicz [8] applied the exponential propagation methods based on the dynamic linearization [40] to the shallow water system on an icosahedral geodesic grid. The basic principle of this method is outlined briefly as follows. After expanding Eq. (1) in a Taylor series around state u(tn) at tn we obtain

— (t) = Fn + Jn ■ (u(t) - un) + R(u(t)), (4)

where un = u(tn), Fn = F (un ), Jn = fâ (un ) and

R(u(t)) = F(u(t)) - Fn - Jn ■ (u(t) - un). (5)

The use of integrating factor e-Jn t on Eq. (4) yields

d(e-Jntu(t)) = e-Jnt(Fn - Jnun) + e-Jnt R(u(t)). (6)

Integrating over [tn, tn + Atn] and multiplying by eJn (tn+Atn~> leads to the integral form

tn+Atn

u(tn + Atn) = un + (eJnAtn - I)J-Fn + f eJn(tn+Atn-t)R(u(t))dt, (7)

where Atn indicates At at the time step number n and I is the identity matrix. The methodology of linearization applied in this paper is discussed in [15,16,18].

It was shown in [8] that the exponential schemes based on the dynamic linearization in the form of Eq. (4) are accurate and stable for very long time steps. One of the most important facts observed was that the gravity waves are propagated by the scheme without distortion of their phase speed. In the past this fact was not considered to be of major importance because gravity waves were considered to be "noise" that needed either explicit or implicit filtering. Considering the role of gravity waves in maintaining the geostrophic equilibrium, we can conjecture that realistic representation of the phase properties of gravity waves can contribute to realistic meteorological forecast.

A critical issue that was identified in [8] is that the efficiency of the exponential integrators applied to the shallow water equations is not as good as that of the traditional semi-implicit technique. Although recent progress in the computational linear algebra has led to more efficient algorithms, like the method of Niesen and Wright [28], it was found that semi-implicit methods are still more efficient (see [8] and [11] for the discussion).

The aim of this paper is to address the efficiency issue of the exponential time integrators. We propose a modification of the algorithm described in [28] by applying the incomplete orthogonalization method instead of the Arnoldi iteration. Theoretical considerations are presented using a general formalism of the linear algebra and they are illustrated by integrating the icosahedral shallow water model developed by Pudykiewicz [29,30]. The model is used consistently as the testing platform for all the proposed modifications of the time integration scheme.

The incomplete orthogonalization method was originally proposed as an eigenvalue algorithm for general non-symmetric matrices by Saad [34]. The method was also applied to solve linear systems [35], and more recently, for the time integration of an advection-diffusion equation [20]. The modifications of the algorithm described in [28] are not limited to the replacement of the Arnoldi iteration by the incomplete orthogonalization method. We also introduce a simple strategy to determine the initial size of the Krylov space relying on the information from previous time instants which is ideally suited for the integration of the fluid equations where the structure of the Jacobian of the system is not changing rapidly between subsequent time steps.

The paper is organized as follows. The first section provides the rationale for the selection of the shallow water equations in the study of the efficient time integration methods. Following this, we present a concise description of the linear operators obtained after the spatial discretization of the governing equations on icosahedral grid. The general formulation of the time integration algorithm with an exponential propagation method is covered in Sections 4 and 5. We also include the algorithmic formulation of the scheme developed in the course of this study. In the last part of the paper, we compare our results with those reported in [8] where the original exponential propagation algorithm was employed. The list of mathematical symbols used in the paper is collected in Appendix A.

2. The shallow water equations

We will perform our investigation with the shallow water equations because they provide an adequate and, at the same time, relatively simple model of a continuum. For this reason, they are commonly used in many diverse areas of science. One of the most relevant applications of the shallow water equations is in geophysical fluid dynamics, where they provide an approximation for rotating stratified fluids. The important property of the shallow water equations is that they can be obtained as the amplitude equations for the vertical normal modes of the continuously stratified fluids. This fact is very useful in the analysis of large scale atmospheric circulation. Besides being an essential tool in theoretical studies, the shallow water equations are also very useful in the process of designing Numerical Weather Prediction models by creating an ideal test model for the evaluation of time integration methods as well as for the investigation of various aspects of space discretization.

Herein we will analyze the flow of fluid on the sphere described in the Cartesian coordinates by the following parametric equations

x = a cos 0 cos 0

y = a cos 0 sin 0

z = a sin 0

where 0 is the latitude (—n/2 < 0 < n/2), 0 is the longitude (0 < 0 < 2n), and a is the radius of the sphere. The corresponding shallow water equations can be written in the following form

^ = — W (u) — grad(£), (8)

— = —div(hu), (9)

where u is a smooth velocity field on sphere S with values in tangent space TS, W (u) = n x u, n = (Z + f n) is the absolute vorticity, Z = Curln (u) is the relative vorticity, h is the smooth scalar field on S describing thickness of the fluid layer, E = (|u|2/2 + $), $ = g(h + hs), hs is the height of the surface level, g is the gravity acceleration, f = 2Q sin0 is the Coriolis parameter, Q is the angular velocity of the rotation, x denotes the vector product in R3, n is the surface normal vector.

Denoting 6 = £i, 0 = £2 and using the general expression for the metric tensor _ dx dx d y d y dz dz

gii = Wi + Wi + Wi Wj '

We obtain the following expression for the metric tensor for the spherical surface " h2 0 '

where hi and h2 are scale factors: hh\ = a2 and h2 = a2 cos2 6.

The metric tensor is diagonal and the gradient, divergence and Curln operators described by Eqns. (8)-(9) could be expressed by a general formula for the orthogonal curvilinear coordinates (Morse and Feshbach [26], page 115)

■A 1 d^

grad(^} = ^ = £ anh_ _,

div(V) = V-V = -h- £ ddr (hih2 ^

hih2 n= d^n\ hn

n / d d

Curln(V) = T-(h2V2) - — (hi V1) , hih2\ dfi

where ^ is the scalar field on the sphere, Vn is the nth component of the vector V, and a1, a2 are tangent basis vectors defined as

— sin 9 cos 0 — sin 0

a1 = — sin 9 sin 0 a2 = cos 0

cos 9 0

3. Shallow water model used in the experiments

The experiments are performed using the shallow water model on the sphere as described in [30]. Space discretization is based on the finite volume technique on an icosahedral geodesic grid [12] obtained by iterative division of the icosahedron inscribed in the sphere. The spherical surface in the model is represented by a simplicial complex of triangles. The sample geodesic grid obtained at the third iterative division of the icosahedron is shown in the upper part of Fig. 1. Control volume , centered at the given vertex i, is defined by a closed contour composed of geodesic arches connecting spherical radial projections of the centers of triangles and the midpoint of edges coincident at the vertex (lower part of Fig. 1).

The differential operators intrinsic to the spherical manifold are approximated using Stokes theorem on a simplicial complex embedded in R3 as discussed in [30]. This leads to the following expressions

grad(^)

div(V)

= £ Vj • bj

Qi j(i)

(Curln (V) • n)

= £ Vij • di

Qi j(i)

where is the control volume, Y2j(i) denotes the summation over elements constituting the boundary of Q, V is a smooth vector field on sphere S with values in the tangent space TS, N^ is the discretization of the normal component of the gradient, subscript ij denotes the values at the interface point eij.

Vectors bij e T S and d^ e T S are

bj = K ij 4+ n2 ij Sfy/Si

dij = (t ¿j4 + T ij4)/Si,

where njj ij e T S and t kj e T S are, respectively, the unit vectors normal and tangent to the geodesic arch connecting points

e j and mk (k = 1, 2); Slk is the length of this arch and Si is the area of (see the lower panel of Fig. 1 and [30] for the

ij ij explanation).

Fig. 1. Geodesic grid on the sphere, tangent space and the normal vector (upper panel). The structure of an element used in the construction of the operators (lower panel).

Similar expressions for discretization of operators are used in the shallow water model described by Tomita et al. [41]. The main differences are in the methodology used to derive the governing equations and in the definition of the control volumes. There are also slightly different formulae used to approximate the interface values ifaj and Vjj in Eqns. (16)-(18). Both approaches, however, share a common methodology of discretizing the operators on the differentiable manifold approximated by a triangular simplices embedded in R3. This technique is established very well in the discrete differential geometry. The essence of this approach comprises a representation of the curved manifold by a discrete mesh embedded in Cartesian space rather than by a metric tensor in curvilinear coordinates. The basis for this methodology was introduced in the framework of the Regge Calculus [31]. An excellent discussion from the modern point of view, of the mathematical foundation of the discretization used in icosahedral models described in [41] and [30], can be found in the introduction to the book on discrete differential geometry by Bobenko and Suris [5].

After expressing the shallow water Eqns. (8)-(9) in terms of the Cartesian coordinates (see section 2.3 in [42]) followed by integration over the control volumes , and applying Eqns. (16)-(18) for approximation of the differential operators, we obtain the system of ODE's of the form

dux dt

= -Wx-GSx-£

duz ~dt

-Wy - GSy • £

= -Wz - GSz • £

— = -Dx • (uxh) - Dy • (uy h) - Dz • (uzh)

where ux, uy, uz, £ = ((u2 + u2y + u2)/2 + g(h + hs)), and h are column arrays with Cartesian components of the velocity u, total energy and the height field h (all quantities are the volume averages over ), GSx, GSy, GSz are the sparse matrices

used to evaluate the Cartesian components of the gradient on the sphere, while Dx, Dy, and Dz are the sparse matrices used to evaluate the divergence div(u) = Dx-ux + Dy ■ Uy + Dz ■ Uz. Further, the column arrays Wx, Wy, and Wz are defined as

Wx = n Px

Wy = n Py (22)

Wz = n Pz,

where n is the column array containing the control volume average values of the absolute vorticity; the components of this array are

ni = Zi + fi = V'JuX + vyj Uy + VzuJz + fi, (23)

where i e [1, n], j e [1, n], and n denotes the number of nodes.

Sparse arrays VX, Vy, and Vz are used to calculate the vertical component of the vorticity. For further details consult [30]. Arrays Px, Py, and Pz contain Cartesian components of the vector product of the velocity and the surface normal and have the following form

Pi x = n'yu'z - n'zu'y

Pi y = niux - nixui

Pi = nixuy - niyuix

where nx, ny, and nz are the column arrays with x, y, and z components of the normal vector n evaluated at the centers of the control volumes.

We may consider adding a dissipation terms to the system in the form D f ■ ^, for each prognostic variable ^ in the same manner as in [8]. The dissipation operator takes the following form

Df = -v L2, (25)

where L is the sparse matrix representing the Laplace operator. Alternatively, we could apply the dissipation scheme discussed in [30]. However, based on the experiments reported in [8], we assume that the simple formula (25) is sufficient for the purposes of this study.

The dissipation coefficient v in Eqn. (25) is given, after [8], as follows

AxnY f x

v = Yh^—, (26)

where Yh is the coefficient of proportionality (Yh e [0.04 x 10-2, 1.25 x 10-2]), nY = 4, Ate is the reference time step,

Ate = 240 s, and Ax is the average separation of the node points estimated as Ax = ^4na2/Ng, with a indicating the earth

radius and Ng = 10 x 221 + 2 being the number of nodes in the geodesic mesh (l is the grid number).

The system of equations (21) could be rewritten in a compact form of the autonomous system defined by Eq. (1) with state vector u containing the components of the velocity and the height field as

u = (uTx uTy u[ hT)T. (27)

The dimension of the state vector u is thus 4 Ng where Ng is the number of control volumes considered in the discretization. The right-hand side vector of Eq. (21) is written compactly as

F = (FuT Fuyy Fuj Fhy)T. (28)

The components of arrays Fux, Fuy, Fuz are defined by the following relations

F< = - (( E Vx u}x + Vy uJy + vH + f^ (niy uz - nz uiy) -

'1 ( i 2 . i 2 . j 2,

J^GSj 2(uJx + uy + uJz ) + g(hj + hi)), (29)

Fuy = - ((E Vx u}x + Vy uJy + vz ui^j + f ^ (nz uix - nix uz) -

Y, GSiy (2 (uJx2 + uJy2 + ui2) + g(hj + hi)), (30)

FuZ = ^ Ç VX uX + vy Uy + VZ uf) + f^ (nX uy - n'y uX) -

J2gsz[2(uX + u'y + uZ ) + g(hj + hi)).

The components of array Fh are

Fhi — -^ ( OX (hj u{) + Diy (hj uJy) + DZ (hj uZ) ). j

3.1. Evaluation of the Jacobian

The Jacobian J of the autonomous system described by Eq. (21) is a sparse matrix of size (4Ng) x (4Ng) given by the formula

After applying the sparse matrix representation introduced in Eqns. (29)-(32) and elementary rules of differentiation for vector functions, we obtain a compact analytical expression for the Jacobian

J — Jv + Jr + Jm

where the block array Jv represents relative vorticity and the block array Jr accounts for rotation and dissipation. The third block array Jm in Eq. (34) represents coupling between the mass field and the velocity field. The block array, Jv is defined as follows

Jv — -

{ P'X vj {P'x Vj} {PX VZ'} {0}

{py Vj {py Vj {py vh {0}

{ P'Z Vj {P'Z Vj { pZ vh {0}

{0} {0} {0} {0}

The wave brackets in Eq. (35) denote matrices (for example: the expression {P'x VlJ} represents the matrix with ij-th element equal to P'x V'xj). The second block array, Jr accounting for the effects of rotation is cast as

Jr — +

{Dij} {ni nZ Sij} -{rf n}y

{ni nZ Sij} {Dj -w nX

{ni nJy Sij} -M nX Sij} {Dj

{0} {0} {0}

The last block array, Jm in Jacobian is written as

'{GSX uX} {GSX uy} {GSX uZ} {gGSX}

{GSy uX} {GSy uy} {GS? uZ} {gGSj

Jm —

{GSZ uX} {GS'Z uy} {GS'Z UZ} {gGsZZ}

_ {DX hj} {Dy hj} {D'Z hj} { dx uX + D? uy + DZUZ }

The structure of the Jacobian for the shallow water equations on icosahedral grid number 6 having 40962 nodes and average global resolution of the order of 100 km is depicted in Fig. 2. The large sparse matrix has a multi-band structure

nz = 51731 nz = 1439

Fig. 2. The sparsity pattern of the Jacobian matrix for the shallow water equations discretized on icosahedral grid number 6.

which is shown by subsequent panels zooming into finest structures of the operator. The matrix depicted in Fig. 2 is not normal which means that J ■ JT = JT ■ J.

The operator J given by Eq. (34) represents an explicit form of the tangent linear model for the shallow water system. The methodology of time discretization expressed by Eq. (4) could be also applied to the full three-dimensional system of atmospheric equations. Then, J would include the parameterization of subgrid-scale processes and chemical reactions in both gas and aerosol phase. Thus, such an approach would provide the most realistic coupling of various processes which are considered in meteorological models.

The model Jacobian could have following form

Jmodel = J + Jchem + ^ ' Ji i

where J and Jchem are known from the exact analytical expressions and the J representing various parameterizations would be evaluated approximately.

4. Exponential time integration methods

The semidiscrete system given by Eq. (21) obtained in the previous section, is an example of a general problem arising in the "method of lines", in which the space derivatives are discretized first, leading to a large system of ODEs. The time integration in the icosahedral model was originally accomplished using the fourth order Runge-Kutta scheme [30]. The explicit time integration schemes are not practical due to the stiffness of the Jacobi matrix reflecting the presence of multiple time scales in the system of equations.

The model was later expanded by including a class of semi-implicit predictor corrector time integration schemes [7] and the exponential integration methods [8]. The general idea behind the latter is relatively simple and it can be outlined as follows. Considering the fact that (21) is cast in the form of a general autonomous system given by Eq. (1) with the state

vector as in Eq. (27) and the right-hand side defined by Eq. (28), we can write an analytical solution in integral form given by Eq. (7). After some changes in notation related to the introduction of y function we obtain:

tn+Atn

u(tn + Atn) = Un + pi(AtnJn)AtnFn + J eJn(tn+Atn-t) R(u(t))dt, (38)

where Jn is the Jacobian matrix at time tn, R(u(t)) is defined by (5) and yi is one of the so-called y functions defined for complex matrix argument Z by the following recursion

yo( Z) = eZ,

yk+i(Z) = Z-^yk(Z) - i ^, for k > 1.

Depending on the quadrature used to approximate the integral term in Eq. (38), we will obtain different versions of the exponential integration scheme.

Niesen and Wright [28] noted that every stage in an exponential integration algorithm can be described by a linear combination of the y functions acting on certain vectors:

yo (A)bo + yi (A)bi + y2 (A)b2 + ... + yp (A)bp, (40)

where b0, b1,..., bp e Rn.

A number of efficient exponential integration schemes is evaluated by Tokman [40]. For the tests with the shallow water equations, Clancy and Pudykiewicz [8] considered two integration schemes assuming the fixed time step Atn = At; this assumption is typical in most of the meteorological applications. The first of the schemes is denoted as EP12

Un+i = Un + yi(JnAt)AtFn. (41)

The second order EP12 scheme is obtained by ignoring the integral term in Eq. (38). This scheme is considered most often because its perceived efficiency.

The second method which was selected for our tests is the third order scheme denoted by Tokman [40] as EP13

Un+1 = Un + yi (JnAt)AtFn + 3 y2(Jn At)AtRn-i, (42)

Rn-1 = F(Un-i) - F(Un) - Jn(Un-1 - Un). To summarize: the two schemes tested are characterized by the following b-vectors: EP12: b0 = 0, bi = AtFn,

EP13: b0 = 0, bi = AtFn, b2 = 3AtRn-i.

5. Description of the algorithm

In this section, we first describe an efficient technique to evaluate yp (A)v terms for a general n x n nonsymetric matrix A over the real numbers and for the vector v e Rn. Then, we use this method to evaluate a linear combination of y functions of the form given by Eq. (40) using the method described in [37] and [28]. The discussion is completed by the evaluation of the optimum selection of parameters of the algorithm.

5.1. Krylov approximation of the yp (A)v terms

Finding reliable and accurate methods to compute the matrix exponential and related functions is difficult, and this is still the topic of a considerable research [24,25,28]. In a typical application, the Jacobian matrix is large and sparse, like the operator with the pattern depicted in Fig. 2. Computing y functions directly is then intractable.

An efficient method to solve the problem consists in approximating yp(A)v e Rn in the m-dimensional Krylov subspace Km (m < n) defined as

Km(A, v) = span{v, Av,..., Am-i v}. (43)

Unfortunately, vectors Aiv do not form a good basis because they are too close to each other to be linearly independent (they almost point in the same direction as the dominant eigenvectors of A). Therefore, the modified Gram-Schmidt process is generally used as part of the Arnoldi iteration to obtain an orthonormal basis [28,37]. Analyzing the results from [8], it was found that performing the Arnoldi iteration is the primary computational cost of exponential time integrators.

To reduce the calculation time, we replaced the Arnoldi iteration by the Incomplete Orthogonalization Method [34], with an orthogonalization length of 2 (hereafter IOM2). The only difference with the classical Arnoldi iteration procedure is that at each step the current vector is orthogonalized only against the 2 previous ones instead of all of them. The time complexity of the IOM2 procedure is expressed as O(nm + Nm) instead of O(nm2 + Nm) for the Arnoldi iteration, where n denotes dimension of the problem (typically of the order 105 to 106), m is the dimension of the Krylov space (typical values in the shallow water flow on the sphere for icosahedral number 6 are of the order 30 to 50) and N is the number of nonzero elements in the Jacobian matrix. The value of N depends mainly on the order of the approximation and number of processes included in the simulation. Typical values of N are in the range 24n to 56n. The IOM2 is described in Algorithm 1:

Algorithm 1 Incomplete orthogonalization method: computation of Vm+1 and Hm. 1: Input: A, v1 = v 2: for j = 1, m do

3: s = Avj

4: for i=max(1, j — 2), j do 5: hj j = vi • s 6: s = s — hj jVj 7: end for

8: if |s| < tolerance then 9: break 10: end if

11: hi+1, j = |s| 12: Vj+1 = s/|s| 13: end for

14: Define Vm+1 = [V1, • •• , Vm+1 ], H m = {hi, j }i<i<m+i,i<j<m

After performing IOM2 procedure we obtain a n x m matrix

whose columns are the basis vectors. The sequence [v1, v2,..., vm] obtained after application of Algorithm 1 is a basis of Km(A, v1) and satisfy the incomplete orthogonality property

Vi • Vj = Sij for \i — j\<2.

In general, Vm Vm is not an identity matrix but a square matrix with ones on the main diagonal and zeros elsewhere, except for the right upper and left lower corners containing nonzero elements. Both the stability and the convergence of the IOM2 method in the context of the exponential time integrators are discussed in [20].

Similarly to the Arnoldi iteration, the IOM2 procedure also generates an m x m Hessenberg matrix Hm. Matrix Hm can be seen as an oblique projection of the action of the matrix A on the Krylov subspace and it satisfies the relation

AVm = Vm+1 Hm+1 = Vm Hm + hm+1,mVm+1 eh, (45)

where em = (0, ...0, 1)T e Rm is the last canonical basis vector in Rm.

The sparse matrix Hm generated by the Arnoldi iteration is an upper Hessenberg matrix whilst the equivalent matrix produced by the IOM2 procedure is an upper quadridiagonal matrix.

The following lemma from [20] motivates the Krylov approximation we use for the action of the vp (A) on a given vector v.

Lemma 1 (Koskela [20]). Let A e Cnxn and let Vm and Hm be the result of m steps of IOM2 on Jacobian matrix A with starting vector v. Then for any polynomial pm—1 of degree up to (m — 1) the following approximation holds:

Vm = [v 1, v2, ..., Vm],

Pm-l(A)v ^ fi Vm Pm-l(Hm)ei, where e1 = (1, 0, ...0)T and fi = ||v||.

The action of çp (A) on v is then approximated as

Çp (A)v ^ fi Vm Vp (Hm)e1.

This approximation can be improved by using the error estimate

em = \\P hm+\,meli ^p+1(Hm)ej vm+\W = fi ^+1,!^ [Pp+\(Hm)]m,\,

proposed in [35] as a correction, to finally obtain

Vp (A)v « fi Vm Vp (Hm)e1 + fi hm+1,mem Vp+\( Hm)e1 vm+1.

Computing this approximation is expected to be considerably less expensive than evaluating the pp (A)v directly. Although evaluation of (49) requires both the pp+1(Hm)e1 and the pp (Hm)e1, resorting to the recursive definition of p functions is avoided. The augmented matrix technique, first proposed in [36] for the special cases of p0 and pi and generalized by Sidje [37] for any integer p > 0, allows to calculate both functions in a single Krylov projection. This technique is based on the following theorem.

Theorem 1 (Sidje [37] EXPOKIT) ' Hm c 0

Hm+p —

Let c e Cm and

. C(m+p)x(m+p)

exp (t H m+p ) —

exp (T Hm ) Ttyi(T Hm)c T 2ty2(T Hm)c ... T p typ (t Hm)c'

t p-1 (p-1)!

There are several methods that can be used to compute the exponential of the augmented matrix Hm+p [24,25]. For simplicity, we use the 13th diagonal Pade approximation combined with scaling and squaring algorithm implemented in the MATLAB function expm [13].

The approximation based on the above Krylov projection method could be used directly to evaluate the pp (A)v. The approximation error for the exponential of a negative semidefinite matrix is bounded as in the following equation [20]

I\eAv - VmeHme1ß\ <

ea( A)

' + ea(Hm) yHm

where a(A) = max{Re(Xj)} denotes the spectral abscissa of A (i.e the supremum among the real part of the eigenvalues of A), m is the size of the Krylov subspace and ||A|| is the matrix 2-norm. When we approximate the exponential matrix function or the related ( functions, this a priori estimate suggests that a rapid convergence could be obtained if we choose a large value of m when | A| is large.

An adaptative method will tend to increase m if the estimated error is too large. In practical applications, however, there is a limit for the value of m due to storage constraints. The method of solving this problem was proposed originally in the phipm algorithm described in [28] and it will be outlined in the next subsection.

5.2. Evaluation of a linear combination of the p functions

The basic algorithm described in subsection 5.1 can be used to evaluate the linear combination of the form of Eq. (40) using the method described in [37] and [28]. Skaflestad and Wright [39] observed that the function

u(t ) — cpo(t A)bo +1 pi(t A )bi + t2p2(tA)b2 + ... + tp typ (tA)bp, is a solution of the following initial value problem

u'(t) — Au(t ) + bi + tb2 + ... + ■

-bp, u(0) — bo.

(P - 1)!

The linear combination described by Eq. (40) is thus given by Eq. (53) with t = 1. If the time interval [0, 1] is split into subintervals, the solution u(tk+1) at the end of the (k + 1)-th subinterval can be expressed in term of the previous solution, u(tk), at the k-th subinterval, as

P P-i tj

u(tk+i) = y>o(TkA)u(tk) + Y, 4 V (xkA)Y, jbi+j, (55)

i=l j=0 j

where Tk = tk+i - tk.

Using the recurrence relation Vq( A) = Vq+1( A) A + £ I, Eq. (55) can be simplified to obtain

P-1 Tj T

u(tk+i) = TpVp(TkA)Wp + J2 ~kwj, (56)

Tk vp(Tkn)mp + j j=o j

where vectors Wj are computed recursively in the following manner

p- j tl

w o = u(tk) and Wj = Awj-i +^ -pbj+i, j = 1,...,p. (57)

Eq. (56) provides an algorithm to compute the linear combination of V functions of type (40) by stepping the solution u(t) from t = 0 to t = 1. For each u(tk+1), only one Krylov projection must be computed with the 1OM2 method for the Tp Vp (TkA) Wp term. In other words, a total of K Krylov projections must be performed to complete K steps. Since 0 < Tk < 1, it is expected that computing vp(TkA)wp will require fewer Krylov vectors than computing vp(A)wp with an unscaled matrix. Hence, the more steps taken, the less expensive each projection will be. The key is then to find the best tradeoff between the number of subintervals K and the number of Krylov vectors needed per projection.

We note that (56) becomes increasingly sensitive to round-off errors as p increases. Using the techniques presented by Al-Mohy and Higham [1] several explicit multiplications by A in (57) can be avoided. In this study, we have only tested the method with time integration schemes that requires v functions up to p = 2 and no apparent instability was found. We intend to analyze this in detail in the future studies of the high order schemes.

5.3. Krylov adaptativity

In [28], Niesen and Wright developed an efficient way to partition the interval [0, 1] into K subintervals so that computing a sequence of K Krylov projections on subspaces with dimensions mk (k = 1, ... , K) is cheaper than computing one unscaled projection vp(A)wp. The approach used in this study is similar and is presented in Algorithm 2 (see [28] and references therein for the details):

Algorithm 2 Evaluate linear combination (40).

1: input : A, (b0, b1, ..., bp), Tol, m

2: t = 0, k = 0, uk = b0

3: t = 1, S = 1.2

5: Compute w0, ..., wp according to (57)

w0 = U(t) and Wj = AWj-1 +Ep=o Tbj+i, j = 1, .... p, do

Compute Hm, Vm, hm+1m and vm+1 using Algorithm 1 Compute Vp (A)v tt P Vm Vp (Hm)e1 + p hm+1,meTmVp+1 (Hm)e1 Vm+1 Compute dm = WP hm+1,meTmVp+1 (Hm)e1 vm+1 II = P \hm+1,m,m\ [Vp+1 (Hm)]m,1 Compute rn = ^Td W

Compute Tnew and mnew using Algorithm 3 Compute C(Tnew, m) and C(t, mnew) (See section 5.3.1) if C (Tnew, m) < C (t, mnew) then

t = min(max(Tnew, 1 t), 2t, 1 — t) else

m = min(max(mnew, L 3 m], 1), [ 4 mj) end if while m < S

_ 1 Tj

Compute uk+1 according to: U (tM) = Tp Vp (TkA)wp + Ep=0 j wj t = t + t , k = k + 1 while t = 1 return up

Based on the error estimate given by Eq. (48), we construct a cost function C(Tp, mp) which is used to determine whether it is more efficient to reduce Tp or to increase the size of the Krylov subspace mp. The new values Tp+1 and mp+1 are then selected so that the cost function is minimized and the error estimate is within a user defined tolerance.

Algorithm 3 Computing Tnew and mn

1: y = 0.8

2: if previous step was rejected and t was reduced then

3: -„_ log(T/to!d)

3: q = log(»fm«/|fm old)

4: else if previous step was rejected and q was computed in previous step then 5: Keep q constant

6: else

7: q = 4 m

8: endif

.. ,i/®+i)

9: Tnew = Tk[

10: if previous step was rejected and m was reduced then

v 1 /(mold -m)

11: k = |

/ \1 . I llfm II \

\l|fm old I I

12: else if previous step was rejected and K was computed in previous step then

13: Keep k constant

14: else

15: ic = 2

16: end if

17: mnew = m + M.

5.3.1. Cost function

The cost function is used to choose between two possibilities: either keeping m constant and changing t or keeping t constant and changing m. The option with the lowest computational cost is selected automatically. The cost function is an approximation of the FLOP (floating point operations) count required to evaluate Eq. (56). It is important to note that the expressions provided are based on very rough estimates of the FLOP count on a single processor-machine. Important details related to a specific computer architectures are not taken into account. Also, the interprocessor communication time should be taken into account in a parallel implementation.

Let N be the number of nonzero entries in a matrix A of size n x n. An evaluation of equation (56) is needed in order to advance from tk to tk+1. This evaluation requires the computation of the Wj vectors (57). Using the same sparse matrix technique as in the original phipm MATLAB code [28], this computation can be done in roughly 2(p — 1)(N + n) FLOP. Then, m steps of the IOM2 procedure are performed, for a cost of the order of (n + N)m FLOP. To evaluate (TkA)wp using Eq. (49), the exponential of a matrix of size (m + p + 1) must be computed with a cost of M(m + p + 1)3 FLOP. Assuming the expm algorithm implemented in MATLAB [13], the cost M(A) of computing the exponential function of A is given by

M (A) = — + 2

where fx] + denote the smallest non-negative integer larger than x.

The remaining scalar multiplications and vector additions in Eq. (56) require (2p + 1)n FLOP. The final cost function was obtained for the number of steps required to go from the current time, tk to t = 1 as

C (t, m) =

1 - tk

m(n + N) + 2(p - 1)(N + n) + M(m + p + 1)3 + (2p + 1)n.

5.3.2. Initial parameters

The procedure described above has two key parameters, the dimension of the Krylov space, m and the size of a time subinterval, t , for which initial values need to be determined at the beginning of the procedure. At the beginning of each model time-step, it is necessary to submit an initial estimate of m. Numerical experiments have indicated that this choice is crucial for a good performance of the method. The strategy used in this study is to use the final value of m from the previous time-step as a first guess for the next one. The heuristic arguments applied here are justified because the nature of the problem to be solved in a particular time-step resembles closely that from the previous time-step. This is only true when At is either fixed or does not vary significantly. The assumption of the constant time step is very well justified considering the fact that the meteorological models use consistently fixed time steps.

In [36], an initial step size is suggested. A similar, but more liberal estimate was used in [28]

10 /Tol((mqve + 1)/e)mave+^2n(mave + 1A Vmave ( )

T0 II A||TO\ 4|| AIUIM» ) '

where Tol is the prescribed tolerance and mave = (minput + mmax)/2, minput is the prescribed input value of m and mmax is the maximum size of m.

We do not use the above estimates because they require the evaluation of a matrix norm, which is computationally expensive and difficult to implement in a general solver where the action of the matrix-vector product might be given by a

Table 1

Average execution time (seconds) per simulated day for the cases of the mountain flow and Rossby-Haurwitz wave of Williamson et al. [44].

Method Mountain flow Rossby wave

RK4 At = 240 s 88.27 90.29

T-ABT At = 900 s 38.32 38.39

EP12/phipm/1OM2 At = 1 h 78.81 87.93

EP12/phipm/1OM2 At = 2h 41.16 49.92

EP13/phipm/1OM2 At = 1 h 79.13 87.04

EP13/phipm/1OM2 At = 2 h 41.13 48.40

EP12/phipm At = 1 h 105.67 119.62

EP12/phipm At = 2 h 60.33 75.84

EP13/phipm At = 1 h 103.81 117.83

EP13/phipm At = 2 h 59.06 75.66

Table 2

Average time taken, in seconds, for the calculation of the Jacobian matrix, the forcing terms and the execution of the phipm routine at each time step over the 14-d Rossby-Haurwitz wave simulation. The fourth column gives the average number of Krylov basis vectors per call to phipm/IOM2.

Method Jacobian matrix Forcing phipm/1OM2 Krylov vectors

EP12 At = 1 h 1.6519 0.0627 1.9491 40

EP12 At = 2h 1.4026 0.0540 2.7031 57

EP13 At = 1 h 1.6309 0.0618 1.9341 39

EP13 At = 2h 1.3233 0.0505 2.6595 57

"matvec" subroutine. Instead, we use an initial step size of t = 1. Since the only consequence of rejecting a time-step size is that the exponential of a small matrix is computed in vain, a rather optimistic estimate like t = 1 seems reasonable.

6. Numerical examples

1n order to evaluate the scheme described in Section 4, the standard set of tests introduced by Williamson et al. [44] was used. The experiments were performed on an icosahedral grid with the shallow water equations solved using EP12 and EP13 algorithms described in Section 4. The strategy of experiments was the same as that in [8]. The runs were performed on grid number 6 with N = 40,962 vertices. The state vector of the system described by Eq. (27) comprised of four predictive variables and has size of 163,848. The Jacobian of the system, Eq. (34) is represented by 163,848 x 163,848 sparse matrix with sparsity pattern depicted in Fig. 2. We performed the simulation with the explicit fourth order Runge-Kutta scheme (RK4), the semi-implicit predictor-corrector T-ABT scheme described by Clancy and Pudykiewicz [7] and finally, two exponential schemes EP12 and EP13 with both the Arnoldi and the incomplete orthogonalization methods as well as with the additional performance enhancing options discussed in Section 5. The model was implemented in MATLAB and was run on a desktop computer with Intel (R) Xeon (R) CPU X 5670 @ 2.93 GHz and 12 Gb of RAM.

The most important conclusion from the tests is that the solutions with EP12 and EP13 obtained using 1OM2 projection have the equivalent accuracy to that reported by Clancy and Pudykiewicz [8] while significantly reducing the time of calculations with respect to the standard version of phipm with the Arnoldi orthogonalization. The amounts of time used to perform the simulations for the cases of zonal flow perturbed by a mountain and for the Rossby wave are collected in Table 1. It is shown that for the mountain flow case the time spent to execute the model using the exponential scheme EP13 is almost identical to that consumed by the semi-implicit scheme AT-ABT described in [7].

Further insight into the problem of efficiency of exponential integration methods can be provided by analyzing the timing of the selected components of the algorithm. Table 2 contains the average time taken for the calculation of the Jacobian matrix, the forcing terms and the phipm routine (with 1OM2) for two algorithms tested with 3600 s and 7200 s time steps. The analogical information for the runs employing the phipm algorithm with the Arnoldi iteration and same tolerance as in [8] is summarized in Table 3. The fourth column in both tables gives the average number of Krylov basis vectors. Analysis of the data in Table 2 indicates clearly that if the phipm routine is executed in the mode without explicit knowledge of the Jacobian, the overall time per time step will be reduced further by saving time spent on the assembly of the matrix of J. The latter will make the numbers in Table 1 even more favorable for the EP13 scheme. 1n particular, time spent for EP13 scheme with 7200 s time-step for the Rossby wave case will be shorter than that for the AT-ABT scheme while offering both better accuracy and exact representation of the phase speed of gravity waves.

Sample results from the run for the Rossby wave number 4 are depicted in the upper panel of Fig. 3 which shows the height field after 14 days. The numerical solution obtained with the EP13 scheme using a very long time step of 7200 s and phipm/IOM2 routine reproduces quite well the solutions reported in the literature. In particular, the Rossby-Haurwitz wave remains stable during the entire simulation, propagating eastwards and the initial structure of the wave number four is

Table 3

Average time taken, in seconds, for the calculation of the Jacobian matrix, the forcing terms and the execution of the phipm routine at each time step over the 14-d Rossby-Haurwitz wave simulation. The fourth column gives the average number of Krylov basis vectors per call to phipm.

Method Jacobian matrix Forcing phipm Krylov vectors

EPI2 At = 1h 1.6350 0.0619 3.2874 42

EPI2 At = 2h 1.3392 0.0505 4.9305 60

EPI3 At = 1h 1.6352 0.0625 3.2117 40

EPI3 At = 2h 1.3199 0.0515 4.9338 63

Fig. 3. The normalized conservation errors for total mass, energy and potential enstrophy for Rossby-Haurwitz wave calculated using EPI3 exponential time integration scheme on icosahedral grid number 6 with the time step of 7200 s. The upper panel shows the height field after 14 days of simulation.

well maintained with only minimal vacillations in shape. The conservation errors for total mass, total energy and potential enstrophy:

Mass = h, Energy = (gh + 1 |u|2 ), Potential Enstrophy = -1 (t + f

V 2 / 2h V

are presented in Fig. 3. showing that mass is conserved, consistently with the finite volume formulation; the errors for energy and potential enstrophy are of the same order, with values around 0.5 x 10-3 after 15 days. It is evident that the conservation properties are the same as those reported in [30] as well as those obtained with other models in the literature executed with time steps which are 30-50 times shorter.

The comparison of the time dependence of the potential enstrophy error obtained herein and in [22] (Fig. 14c) shows that our results with about 4 x 104 degrees of freedom, are similar to those obtained in [22] with approximately 9 x 104 degrees of freedom. This observation shows that the exponential time integration scheme with the second order approximation of operators in (21) is competitive to the best currently used algorithms.

In order to demonstrate the performance of the method in the case involving a complex vorticity pattern, it is useful to present the simulation of the flow perturbed by an isolated mountain (case number 5 from the standard test suite for the shallow water models [44]).

The initial height field for this test is

h = h0 - g^a Q u0 + U-2 j sin2 0 - bs, (61)

where h0 = 5960, u0 = 20 and bs denotes the conical mountain with height of 2000 m and the base diameter a(n/9) centered at (Xc = n/2, 0c = n/6). The initial velocity field is

ul = uo cos0, u0 = 0. The Froude number for this case is

Fr = < 1.

Here we will show the vorticity field generated by the obstacle instead of the commonly displayed height field. The vorticity field simulated on icosahedral grid number 6 using EP13 method with 1OM2 and time step of 7200 s is depicted in Fig. 4 for the initial 10 days of simulation. The simulated vorticity fields show vortex shedding behind the obstacle. There are no unphysical vorticity patterns nor noise generated around the mountain. The observed pattern is also in quantitative agreement with the mechanism of vortex shedding discussed by Schär and Smith [38].

The most challenging test for the discussed method is the unstable jet [10]. A rapid generation of the vorticity filaments in this system makes numerical simulation difficult. The detailed analysis of the associated processes is presented in [30]. The gradients of the vorticity grow in time as exp(Xe t) and the corresponding spatial scale of the vorticity filaments decreases according to the following relation

5L(t) = SL0 exp (-ke t),

where SL0 denotes the initial space scale and Xe is Lyapunov exponent. Considering the fact that for a typical tropospheric flow Xe is of the order of 0.5 day-1, we can expect the appearance of very sharp structures of the vorticity field relatively quickly. The estimate for the case of an unstable jet shows that the initial scale measured by the width of a jet (of the order of 106 m) will be reduced down to about 1.8 x 104 m in just 8 days.

As soon as length scales generated in the process of barotropic instability exceed the limit of the resolution (typically few mesh intervals), one needs to introduce a mechanism eliminating fine scale structures appearing in the vorticity field. Otherwise the numerical algorithm develops an unrealistic solution representing mainly dispersion errors of the numerical method.

For a method with long time steps, resolving a highly curved flow field is difficult (tests with the unstable jet usually run with relatively short time steps). Sample results obtained on icosahedral grid number 6 using a time step of 7200 s are depicted in Fig. 5. The run was performed with small hyperdiffusion added in the same manner as in [8]. The obtained solutions compares well with the results produced by the spectral model with resolution T341, executed with the time step of 30 s (Fig. 4 presented by Galewsky et al. [10]). Therefore we conclude that the scheme discussed herein converges to the reference solution on a grid with a relatively small number of degrees of freedom and very long time steps in the similar manner as the original exponential propagation scheme discussed in [8]. The success of this simulation is largely attributed to the high accuracy with which the linear part is solved. 1n the case off large-scale atmospheric flow this is probably the decisive factor in favor of the exponential integration methods. The remarkable stability of the exponential integration scheme should be taken with some caution when coupling the dynamical core with nonlinear forcings in order to avoid performing the calculations which are stable but not accurate. This important consideration was already discussed in the well known paper by Bartello and Thomas [3] in the context of semi-Lagrangian schemes.

Initial condition Time=2 days

Fig. 4. The vorticity field (scaled by 105) generated in the zonal flow behind an isolated mountain. Calculations were performed using EPI3 exponential time integration scheme on icosahedral grid number 6 with the time step of 7200 s.

The results shown in Figs. 3-5 provide qualitative illustration of the stability of the method for several typical flows systems. In order to quantify the accuracy of the solver, we applied also well known Lauter test [21]. The plots of the l1, l2, and lTO norms in function of time for the first 10 days of simulation on grid number 6 for time steps of 3600 s and 7200 s are depicted in Fig. 6. The results shown were obtained for: u0 = 2na/12 m/day, k1 = 133681 m2/s2, k2 = 10 m2/s2, and a = n/4; see [21] for the discussion of the parameters.

It is shown that with both time steps considered the exponential integration scheme EPI3/IOM2 has norm in the range between 10—4 — 10—3 for At = 7200 s and 10—5 — 10—4 for At = 3600 s indicating quantitatively good accuracy anticipated from theory. The accuracy of the semi-implicit predictor-corrector scheme T-ABT for the Lauter test is discussed in [7]. Fig. 8 in [7] shows the lTO height error norms in function of time for several schemes. It is evident that with At = 1200 s the semi-implicit T-ABT scheme has lTO norm about 10—3 which is two orders of magnitude larger than the exponential integration schemes discussed herein.

The times of execution of EPI3/IOM2 with At = 3600 s and At = 7200 s are of the order of 80 s and 40 s respectively. For comparison the standard fourth order Runge-Kutta scheme produces the results with an average error of the order of 10—5 in time of about 90 s. The equivalent results for the semi-implicit scheme T-ABT are around 38 s but with an error

Time=4 days Time=5 days

Fig. 5. The vorticity held (scaled by 105) in the unstable jet. Calculations were performed using EP13 exponential time integration scheme on icosahedral grid number 6 with the time step of 7200 s.

about two orders of magnitude larger. The exponential scheme EP13/1OM2 with At = 7200 s produces thus very accurate results in time comparable to the relatively low accuracy T-ABT scheme executed with At = 1200 s.

The results indicating the efficiency of the exponential scheme for meteorological equations will very likely improve in the future as result of the further development of the method. The analysis of data in Table 2 shows that the significant part of the cost of exponential scheme is evaluation of the Jacobian. This task in fact is not even an internal part of the algorithm and we intend to optimize the calculations by the use of the so-called Jacobian-free methods [19] where the cost of assembly of the Jacobian is eliminated. The results of this work will be reported in the forthcoming paper.

7. Conclusions

1ntegration of the stiff partial differential equations with exponential integration methods based on dynamic linearization and Krylov projection methods offers both accuracy and stability with long time steps. The specific example of the shallow water equations discretized on an icosahedral geodesic grid was used in the past to provide evidence of the applicability of the technique to a relatively complex problem with the size of the Jacobian matrix of the order (105 )2. Herein we changed the algorithm described in [8] by replacing the standard Arnoldi iteration with the Incomplete Orthogonalization Method and by modifying the selection of crucial parameters of the integration.

Fig. 6. Error norms for the Lauter test, calculations were performed with the third order scheme with 1OM2 option, the norms are depicted in logarithmic scale. Upper and lower panels show results obtained with time steps of 7200 and 3600 s respectively.

The basic tests indicate clearly that the proposed improvements are sufficient to make the exponential integration scheme for shallow water equations described in [8] worth considering in the future research of numerical methods for NWP. In the process of optimization, the original high accuracy of the method is fully preserved. The most notable result is the ability of the scheme to perform an accurate simulation with long time steps for flows with complex vortical structures generated by the orographic barrier as well as for the unstable zonal jet.

The new algorithm creates an unique opportunity to perform a meteorological simulation with an accuracy significantly higher than those obtained with S1SL scheme. Furthermore, the method conserves mass down to machine accuracy. Besides the obvious advantage of the high accuracy, the new method also eliminates the need for an arbitrary splitting between the linear and nonlinear parts in the shallow water equations. This task is performed automatically by the evaluation of the Jacobian operator at each time step using the analytical formulae. For this reason, the proposed algorithm allows for better coupling between the dynamics and the unresolved processes by including them in the Jacobian operator in the same manner as it is done for diffusion in the scheme reported herein.

The future work will include experiments with the higher order exponential integration schemes which can lead to a further increase of accuracy and, possibly, to even better efficiency of the method. The MATLAB code of the phipm/IOM2 routine used in all simulations is available from the authors upon request.

Acknowledgements

We would like to thank the two anonymous reviewers for their detailed comments that helped to improve the manuscript. The personal reviews by Drs. C. Subich and C. Girard from RPN were invigorating. We also appreciate the review of the text as well as the program for phipm/IOM2 solver performed by Mr. Valentin Dallerit (Polytech Lyon).

Appendix A List of symbols

Symbol Description

a Earth radius

— sin 6 cos $ — sin $

ai, ai unit tangent vector; a1 = — sin 6 sin $ a2 = cos $

cos 6 0

bo, bi, b2,..., t bij C

Curl„ (V)

div(V)

Dx, Dy, Dz

ei, em eij £ f F

Fux, Fuy, Fuz

grad(f) GSx, GSy, GSz h hi hi hs Hm H m+p

Jn Jv Jr

Km(A, V) L L m

mkj (k = 1,2)

M( A) n

nx, ny, nz

nY n nk

nbij N

N (u) Ng Nq, P

Px, Py, Pz

Si S T S u

general non-symmetric n x n matrix

vectors e Rn in the exponential integration scheme

weighted normal vector

cost function

curl of V; Curln(V) e AhV2) - 4(hi V,)

weighted tangent vector

divergence of V div(V) = £2=1 afc(hih2

• ux + Dy • uy + Dz

operators used to evaluate the divergence divu = Dx • dissipation operator; Df = — v L2 first and last canonical vectors in Rm midpoint of geodesic arch connecting point i with point j

total energy; E = (|u|2/2 + <t) (in discretized equations column array with total energy) Coriolis parameter; f = 2Q sin 6

function describing all forcings; F : Rn —> Rn (in discretized equations column array with forcings) column arrays with Cartesian components of forcings column array with right-hand side of continuity equation gravity acceleration

dx 3* 1 dy dy _i_ &L dZL g — l~h2 0

metric tensor; g« = ■

: gij = & s"fj + aij + as, a$j1 gradient of f; grad (f) e £2=i an £ jf operators to evaluate the Cartesian components of the gradient

scalar field on S describing thickness of the fluid layer (in discrete equations column array with height field)

scale factor for S1; hi = a2

scale factor for %2; h2 = a2 cos2 6

height of the surface level

projection of matrix A

augmented H matrix

identity matrix

Jacobian; J = j dFr

Jacobian at time instant n; Jn = du (un) part of the Jacobian which represents relative vorticity part of the Jacobian which accounts for rotation and dissipation part of the Jacobian representing coupling between the mass and the velocity fields part of the Jacobian representing physical parameterizations number of Krylov projections Krylov space

sparse matrix representing the Laplace operator operator describing linear part of forcings dimension of the Krylov space

mass centers of two triangles sharing the common edge ij cost of computing exponential function of matrix A number of degrees of freedom and the n-th time step column arrays with Cartesian components of n coefficient in expression for Yh'; ny = 4 surface normal vector

normal to the geodesic arch connecting points eyand mk (k = 1, 2) number of the nonzero elements in the Jacobian matrix nonlinear part of forcings

number of nodes in the geodesic mesh Ng = 10 x 22' + 2 (l is the grid number)

normal component of the gradient; Nq( = fiJ2j(i) by

number of p-functions different than <p0

polynomial of degree m

arrays with Cartesian components of u x n

residue at time step n

area of

denotes surface of the sphere plane tangent to the sphere state vector u e Rn

Symbol Description

un state vector at time level n

ux, uy, uz column arrays with Cartesian components of the velocity

u smooth velocity field on S with values in TS

v vector; v e Rn

v, Av,..., Amv vectors spanning Krylov space

v1, v2,...,vm Krylov basis vectors after orthogonalization procedure

||v|| norm of a vector v

Vyinterface values of vector field V

Vm n x m matrix with columns composed by v1, v2,..., vm

VT transpose of V

Vx, Vy, and Vz operators used to calculate the vorticity

V vector field on S with values in TS

W (u) n x u

Wx, Wy, Wz column arrays with Cartesian components of W (u)

x, y, z Cartesian coordinates

Z complex matrix

summation over elements constituting the boundary of Qi denotes the multiplication of arrays in component-wise manner

x denotes the vector product in R3

[x] + the smallest non-negative integer larger than x

a(A) spectral abscissa of A

p norm of vector v; p = || v ||

At time step

Atn time step at time level n

Slk length of the arch connecting points eij and mk (k = 1, 2)

Ate is the reference time step; Ate = 240 s

Ax is the average separation of the node points

em error estimate

n absolute vorticity; n = (Z + fn) (column array with absolute vorticity in discretized equations)

Yh coefficient of proportionality

Xe is the Lyapunov exponent

v dissipation coefficient

Q is the angular velocity of the rotation

Qi control volume defined on the sphere

4 is the longitude (0 <4 < 2n)

Pi functions with matrix argument

$ geopotential; $ = g(h + hs)

Vyinterface value of V

t time

Tk tk+1 - tk

t k tangent to the geodesic arch connecting points eyand m^ (k = 1, 2)

0 is the latitude (-n/2 < 0 < n/2)

fi, %2 curvilinear coordinates; fj = 0, %2 = 4

Z relative vorticity; Z = Curln (u)

References

[1] A.H. Al-Mohy, N.J. Higham, Computing the action of the matrix exponential with the application to exponential integration, SIAM J. Sci. Comput. 33 (2) (2011) 488-511.

[2] R. Archibald, K.J. Evans, J. Drake, J.B. White III, Multiwavelet discontinuous Galerkin-accelerated Exact Linear Part (ELP) method for the shallow-water equations on the cubed sphere, Mon. Weather Rev. 139 (2011) 457-473.

[3] P. Bartello, S.J. Thomas, The cost-effectiveness of semi-Lagrangian advection, Mon. Weather Rev. 124 (12) (1996) 2883-2897.

[4] G. Beylkin, J.M. Keiser, L. Vozovoi, A new class of time discretization schemes for the solution of nonlinear PDEs, J. Comput. Phys. 147 (1998) 362-387.

[5] A.I. Bobenko, Y.B. Suris, Discrete differential geometry: integrable structure, in: Graduate Studies in Mathematics, American Mathematical Society, 2008, 404 pp.

[6] L.E. Carr III, C.F. Borges, F.X. Giraldo, An element-based spectrally optimized approximate inverse preconditioner for the Euler equations, SIAM J. Sci. Comput. 34 (4) (2012) 392-420.

[7] C. Clancy, J. Pudykiewicz, A class of semi-implicit predictor-corrector schemes for the time integration of atmospheric models, J. Comput. Phys. 250 (2013) 665-684.

[8] C. Clancy, J. Pudykiewicz, On the use of exponential time integration methods in atmospheric models, Tellus A 65 (2013).

[9] S.M. Cox, P.C. Matthews, Exponential time differencing for stiff systems, J. Comput. Phys. 176 (2002) 430-455.

[10] J. Galewsky, R.K. Scott, L.M. Polvani, An initial-value problem for testing numerical models of the global shallow-water equations, Tellus 56A (2004) 429-440.

[11] F. Garcia, L. Bonaventura, M. Net, J. Sánchez, Exponential versus IMEX high-order time integrators for thermal convection in rotating spherical shells, J. Comput. Phys. 264 (2014) 41-54.

[12] F.X. Giraldo, Lagrange-Galerkin methods on spherical geodesic grids, J. Comput. Phys. 136 (1997) 197-213.

[13] N.J. Higham, The scaling and squaring method for the matrix exponential revisited, SIAM Rev. 51 (4) (2009) 747-764.

[14] Nicholas J. Higham, Edvin Deadman, A catalogue of software for matrix functions. Version 1.0, MIMS, Preprint, 2014.

[15] M. Hochbruck, C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal. 34 (5) (1997) 1911-1925.

[16] M. Hochbruck, C. Lubich, H. Selhofer, Exponential integrators for large systems of differential equations, SIAM J. Sci. Comput. 19 (5) (1998) 1552-1574.

[17] M. Hochbruck, A. Ostermann, Exponential integrators, Acta Numer. 209 (2010) 209-286.

[18] M. Hochbruck, A. Ostermann, J. Schweitzer, Exponential Rosenbrock-type methods, SIAM J. Numer. Anal. 47 (1) (2009) 786-803.

[19] D.A. Knoll, D.E. Keyes, Jacobian-free Newton-Krylov methods: a survey of approaches and applications, J. Comput. Phys. 193 (2004) 357-397.

[20] A. Koskela, Approximating the matrix exponential of an advection-diffusion operator using the incomplete orthogonalization method, in: Numerical Mathematics and Advanced Applications, ENUMATH 2013, Springer, 2015, pp. 345-353.

[21] M. Läuter, D. Handorf, K. Dethloff, Unsteady analytical solutions of the spherical shallow water equations, J. Comput. Phys. 210 (2005) 535-553.

[22] S. Li, F. Xiao, A global shallow water model using high order multi-moment constrained finite volume method on icosahedral grid, J. Comput. Phys. 229 (2010) 1774-1796.

[23] B.V. Minchev, Exponential integrators for semi-linear problems, Ph.D. thesis, University of Bergen, ISBN 82-92610-01-4, 2004.

[24] C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, SIAM Rev. 20 (4) (1978) 801-836.

[25] C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev. 45 (1) (2003) 3-49.

[26] P.M. Morse, H. Feshbach, Methods of Theoretical Physics, vol. I, Feshbach Publishing, 1981.

[27] E.H. Müller, R. Scheichl, Massively parallel solvers for elliptic partial differential equations in numerical weather and climate prediction, Q. J. R. Meteo-rol. Soc. 140 (685) (2014) 2608-2624.

[28] J. Niesen, W.M. Wright, Algorithm 919: a Krylov subspace algorithm for evaluating the ^-functions appearing in exponential integrators, ACM Trans. Math. Softw. 38 (3) (2012).

[29] J. Pudykiewicz, Numerical solution of the reaction-advection-diffusion equation on the sphere, J. Comput. Phys. 213 (1) (2006) 358-390.

[30] J. Pudykiewicz, On numerical solution of the shallow water equations with chemical reactions on icosahedral geodesic grid, J. Comput. Phys. 230 (5) (2011) 1956-1991.

[31] T. Regge, General relativity without coordinates, Nuovo Cimento XIX (1961) 558-571.

[32] A. Robert, A stable numerical integration scheme for the primitive meteorological equations, Atmos.-Ocean 19 (1) (1981) 35-46.

[33] A. Robert, A semi-Lagrangian and semi-implicit numerical integration scheme for the primitive meteorological equations, J. Meteorol. Soc. Jpn. 60 (1) (1982) 319-325.

[34] Y. Saad, Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices, Linear Algebra Appl. 34 (1980) 269-295.

[35] Y. Saad, The Lanczos biorthogonalization algorithm and other oblique projection methods for solving large unsymmetric systems, SIAM J. Numer. Anal.

19 (3) (1982) 485-506.

[36] Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal. 29 (1) (1992) 209-228.

[37] R.B. Sidje, Expokit: a software package for computing matrix exponentials, ACM Trans. Math. Softw. 24 (1) (1998).

[38] C. Schär, R.B. Smith, Shallow-water flow past isolated topography. Part II: transition to vortex shedding, J. Atmos. Sci. 50 (10) (1993) 1401-1412.

[39] Bârd Skaflestad, W.M. Wright, The scaling and modified squaring method for matrix functions related to the exponential, Appl. Numer. Math. 59 (3) (2009) 783-799.

[40] M. Tokman, Efficient integration of large stiff systems of ODEs with exponential propagation iterative (EPI) methods, J. Comput. Phys. 213 (2) (2006) 748-776.

[41] H. Tomita, M. Tsugawa, M. Satoh, K. Goto, Shallow water model on a modified icosahedral geodesic grid by using spring dynamics, J. Comput. Phys. 174 (2001) 579-613.

[42] A. Qaddouri, J. Pudykiewicz, M. Tanguay, C. Girard, J. Côté, Experiments with different discretizations for the shallow-water equations on a sphere, Q. J. R. Meteorol. Soc. 138 (2012) 989-1003.

[43] R. Vichnevetsky, J.B. Bowles, Fourier analysis of numerical approximations of hyperbolic equations, in: SIAM Sudies in Applied Mathematics, SIAM, Philadelphia, 1982, 140 pp.

[44] D.L. Williamson, J.B. Drake, J.J. Hack, R. Jakob, P.N. Swarztrauber, A standard test set for numerical approximations to the shallow water equations in spherical geometry, J. Comput. Phys. 102 (1) (1992) 211-224.