
Procedia Engineering 89 (2014) 1192 - 1199


16th Conference on Water Distribution System Analysis, WDSA 2014

Identifying the High-Level Flow Model of Water Distribution Networks Using Graph Theory

M. Fortini (a,*), C. Bragalli (a), S. Artina (a)

(a) DICAM - University of Bologna, V.le del Risorgimento 3, Bologna, Italy

Abstract

Identifying the main connections between water production, processing and distribution sites can give a clearer understanding of their structure, importance and criticality. We present a graph-theory based approach that dramatically reduces the complexity of a network to allow a better understanding of its main flow model. As a starting point, some nodes in the network are marked as primary. A skeletonization procedure then reduces the graph by excluding all the pipes which are not essential to connect the primary nodes. The network is further analysed to define a single path between each pair of primary nodes. An efficient implementation (in Python+IGraph) is discussed, along with performance improvements. The results can be employed to understand the flow model of a previously unknown network or as a first step to determine its most vulnerable or important elements.

© 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

Peer-review under responsibility of the Organizing Committee of WDSA 2014

Keywords: WDN; Graph Theory; Primary network; Simplification; Algorithms; IGraph

1. Introduction

Water Distribution Networks (WDNs) in densely populated areas are complex graphs whose development took place in successive stages. The growth of WDNs often occurred in a disorderly way, without an overall vision, chasing urban sprawl and the need to find new water resources. To guide the planning, design and management decisions needed to bridge the infrastructure gap in water distribution networks, it is necessary to identify the main connections between water production, processing and distribution sites, which gives a clearer understanding of their structure, importance and criticality. When processing large WDN databases, graph theory provides the conceptual basis for the decomposition of the network graph [1]. Graph theory metrics have also been applied to network flow models, with the aim of assessing whether these metrics can identify the importance of components for a given network architecture, and thus the vulnerable areas within infrastructure systems [2] [3] [4]. In the following, a graph-theory based algorithm is presented which is able to reduce the complexity of a network according to the concept of primary network and to allow a better comprehension of its main flow model.

* Corresponding author. E-mail address: m.fortini@unibo.it

ISSN 1877-7058; doi: 10.1016/j.proeng.2014.11.249

2. Primary Network

The concept of primary network is connected with the goal of making the operation of water distribution systems more "readable", in particular by improving the ability to highlight the paths of transfer of water resources from the network entry points to the most relevant areas of water consumption, namely to water demand centers. A WDS is viewed as a connected, undirected graph, in which the primary network is the subset of links needed to connect the primary nodes (wells, springs, derivations, treatment plants, tanks, pumps, valves, flow meters, water demand centers) characterized by the maximum water transport capacity. Water demand centers are represented by urban areas which correspond to the most densely populated zones in which water consumption is assumed to be concentrated or to particularly water-demanding individual users. The concept of primary network therefore differs from that of the adduction network, which identifies a system of conduits essentially used for the transport of water from the sources to the entry into the distribution network. It should be noted, however, that despite the "simplicity" of the definition of adduction network, its detection within a given layout is often ambiguous, especially in areas in which water resources are widespread and next to the water demand centers.

A natural approach to determine the primary network could be to delete the pipes with a diameter smaller than a threshold. This method is very quick to implement, but has some serious drawbacks:

• a threshold high enough to reduce the number of links as much as the algorithm described below yields a very fragmented final graph;

• the diameter threshold for which the resulting graph remains fully connected must be calibrated for each instance, and it may turn out that staying connected requires keeping all the pipes.

These problems arise because the diameter of a pipe is not an absolute measure of its importance but a relative one: in less populated, rural areas, the important pipes have smaller diameters. Similarly, design choices, or operating conditions that have changed since the original design, can lead to discrepancies between diameter and flow rate.
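The fragmentation problem can be illustrated on a toy network. The sketch below is a minimal illustration, not the authors' implementation: the pipe list, the node names and the 200 mm threshold are hypothetical values chosen only for this example.

```python
def count_components(edges):
    """Count connected components among the nodes touched by `edges`."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first flood fill of one component
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def filter_by_diameter(pipes, min_dn):
    """Keep only the pipes whose nominal diameter is at least `min_dn`."""
    return [(u, v) for u, v, dn in pipes if dn >= min_dn]


# Hypothetical pipes: (start node, end node, nominal diameter in mm).
PIPES = [
    ("source", "a", 300),
    ("a", "b", 300),
    ("b", "village", 100),  # small rural main, yet the only link to the village
    ("village", "v1", 250),
    ("v1", "v2", 250),
]

all_edges = [(u, v) for u, v, _ in PIPES]
kept_edges = filter_by_diameter(PIPES, min_dn=200)
print(count_components(all_edges), count_components(kept_edges))
```

In this toy case the threshold removes the small pipe that is the only connection to the village, so the filtered graph falls apart even though only one pipe was dropped.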

In this paper, we define a primary network as the union of all the paths of maximum water transport capacity which connect the primary nodes of a WDS.

3. Algorithm for High-Level Flow Model

The algorithm developed to determine the High-Level Flow model of a WDS is based on the concept of primary network; an overview of the process is given in Algorithm 1.

begin find the external primary network
    substitute all demand centers with one or more points;
    skeletonize the network;
    mark the external primary network;
begin add the internal primary network
    mark the demand centers' entry and exit points selected in the preceding step as primary;
    skeletonize the network;
    mark the internal primary network;

Algorithm 1: Algorithm for High-Level Flow Model

3.1. Water transport capacity coefficient

The weight of each link is assigned purely on the basis of geometric parameters. This has the advantage of identifying the primary network from the infrastructure point of view, regardless of how the network is operated. For other purposes, if a calibrated model of the complete network is available, the weight could instead be the flow. We assume that the primary pipelines, which connect water resources with water demand centers, generally follow the shortest paths from the energy point of view; we also take into account that the main pipelines are often oversized. This leads to the following expression of the water transport capacity coefficient for a pipe e:

Ce = (1)

where l is the length of the segment and DN is its nominal diameter.

The shortest paths sought when identifying the primary network are then those that minimize the sum of the weights Ce. Note that all the weights are strictly positive.
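To make the role of the weights concrete, the following sketch runs Dijkstra's algorithm on a three-pipe toy network. It is an illustration under loud assumptions: the function `capacity_weight` and its exponent are hypothetical placeholders that merely grow with the length l and shrink with the nominal diameter DN, and they are not claimed to be the coefficient of Eq. (1). The pipe data are invented for the example.

```python
import heapq


def capacity_weight(length, dn, exponent=2.0):
    """Hypothetical stand-in for Ce: longer and narrower pipes cost more."""
    return length / dn ** exponent


def build_adjacency(pipes, exponent=2.0):
    """Build an undirected weighted adjacency list from (u, v, length, dn)."""
    adjacency = {}
    for u, v, length, dn in pipes:
        weight = capacity_weight(length, dn, exponent)
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    return adjacency


def shortest_path(adjacency, source, target):
    """Dijkstra's algorithm; returns (path, cost) or (None, inf)."""
    distance = {source: 0.0}
    previous = {}
    settled = set()
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            break
        for neighbour, weight in adjacency.get(node, []):
            candidate = d + weight
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in distance:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, distance[target]


# Hypothetical pipes: (start, end, length in m, nominal diameter in mm).
PIPES = [
    ("S", "T", 100.0, 50.0),   # short but narrow direct pipe
    ("S", "M", 80.0, 200.0),   # longer route made of large pipes
    ("M", "T", 80.0, 200.0),
]

path, cost = shortest_path(build_adjacency(PIPES), "S", "T")
print(path, cost)
```

With this placeholder weight the longer route through the large pipes is cheaper than the short narrow pipe, which is the qualitative behaviour the coefficient is meant to capture.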

3.2. External primary network

We represent a graph as $G(V, E)$, where

$V(G) = \{v_0, v_1, \ldots, v_n\}$ is the set of all vertices or nodes of $G$;

$E(G) = \{e_0, e_1, \ldots, e_m\}$, $e_i = (u, v)$, $u, v \in V(G)$, is the set of all edges of $G$.

A water demand center is spatially identified by a perimeter. This makes it possible to distinguish the part of the network inside the perimeters from the part outside. We first determine the primary network outside the demand centers. To do this, for each demand center D, we replace the subgraph GD formed by the edges completely contained in it with its centroid c; we then create "virtual" edges connecting c with each edge crossing the demand center's perimeter, to restore connectivity (algorithm 2). We call G' the modified graph obtained by applying algorithm 2 to graph G. An example of such a transformation is shown in figure 1.

foreach demand center D do
    G_D ← subgraph(e ∈ E(G), e ∩ D = e);
    c ← centroid(G_D);
    V(G) ← V(G) ∪ {c, type = 'centroid'};
    E(G) ← E(G) ∪ {e = ((c, u), type = 'centroid', weight = ε), (u, v) ∈ E(G), u ∈ D ∧ v ∉ D};

Algorithm 2: Substitute internal networks with single points

This pre-processing step is useful to determine the entry and exit edges for each demand center, and has the side benefit of reducing the complexity of finding the primary network, since the densest networks are usually those in highly populated areas, such as demand centers.
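The substitution of one demand center can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes that the set of nodes lying inside the perimeter (`inside`) has already been computed by some point-in-polygon test, it takes the centroid as the mean of the coordinates of the internal nodes, and the names `substitute_demand_center`, `EPSILON` and the toy coordinates are hypothetical.

```python
EPSILON = 1e-9  # assumed small, strictly positive weight for virtual edges


def substitute_demand_center(coords, edges, inside, centroid_id):
    """Replace the network inside one demand center with a centroid node.

    coords: {node: (x, y)}; edges: list of (u, v, weight);
    inside: set of nodes within the perimeter (assumed precomputed).
    Returns (new_coords, new_edges).
    """
    internal_nodes = set()  # endpoints of edges lying completely inside
    entry_nodes = set()     # inner endpoints of perimeter-crossing edges
    kept_edges = []
    for u, v, weight in edges:
        u_in, v_in = u in inside, v in inside
        if u_in and v_in:
            internal_nodes.update((u, v))  # internal edge: dropped
            continue
        kept_edges.append((u, v, weight))
        if u_in != v_in:                   # edge crosses the perimeter
            entry_nodes.add(u if u_in else v)

    members = internal_nodes or entry_nodes
    if not members:
        return dict(coords), list(edges)   # nothing to replace

    cx = sum(coords[n][0] for n in members) / len(members)
    cy = sum(coords[n][1] for n in members) / len(members)

    # One virtual edge per distinct entry node restores connectivity.
    for node in sorted(entry_nodes):
        kept_edges.append((centroid_id, node, EPSILON))

    used = {n for u, v, _ in kept_edges for n in (u, v)}
    new_coords = {n: xy for n, xy in coords.items() if n in used}
    new_coords[centroid_id] = (cx, cy)
    return new_coords, kept_edges


# Toy network: nodes b, c, e lie inside the demand center, a and d outside.
COORDS = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0),
          "d": (3.0, 0.0), "e": (1.5, 1.0)}
EDGES = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0),
         ("b", "e", 1.0), ("e", "c", 1.0)]

new_coords, new_edges = substitute_demand_center(
    COORDS, EDGES, inside={"b", "c", "e"}, centroid_id="D1")
print(new_edges)
```

The three internal edges disappear, the two crossing edges survive, and the centroid is linked to the inner endpoint of each crossing edge, so the outer network stays connected through the demand center.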

3.3. Network skeletonization

We are interested in all the shortest paths connecting all primary nodes, using the weights defined in section 3.1 as the lengths of the edges. Since the complexity of the shortest paths algorithm is $O(|V(G)|^2)$ calculated on all the nodes [5], it is convenient to find all ways to remove all the unnecessary nodes, i.e. the ones which we can prove won't be part of any such path.

Fig. 1: Example of the substitution of a demand center's internal network with its centroid. (a) A demand center with its original network. (b) The same demand center with its centroid (in red) and the added edges (in green).

Given that the weights are strictly positive, any leaf node of the graph which is not a primary node can be deleted along with its adjacent edge, since no shortest path connecting primary nodes will pass through it. This process is repeated until no non-primary leaf nodes are left. A stronger observation is that any biconnected component of the graph which does not contain any primary node can be deleted for the same reason. To find and delete all such components, we can apply algorithm 3, which starts by finding all cut vertices in the graph.

P(G) = {v, v ∈ V(G), v is a primary node};
repeat
    C = {k, k is a cut vertex for G, k P(G)};
    k ← random(k ∈ C);
    C ← C \ {k};
    V(G) ← V(G) \ {k};
    B ← {G_k, G_k ∈ connComp(G), V(G_k) ∩ P(G) = ∅};
    G ← G \ ⋃_{G_k ∈ B} G_k;
until C = ∅;

Algorithm 3: Unoptimized network skeletonization algorithm
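The idea behind this unoptimized skeletonization can be sketched in plain Python. The block below is an illustrative variant under stated assumptions, not the authors' implementation: it tests every vertex as a candidate separation point by brute force instead of enumerating cut vertices first, it keeps the tested vertex in the graph, and the function names and the toy network are hypothetical.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def components_without(adjacency, removed):
    """Connected components of the graph with `removed` temporarily taken out."""
    seen = {removed}
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def skeletonize(adjacency, primary):
    """Delete every part hanging off a single vertex that holds no primary node."""
    adjacency = {n: set(nb) for n, nb in adjacency.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for vertex in list(adjacency):
            if vertex not in adjacency:
                continue  # deleted earlier in this sweep
            for component in components_without(adjacency, vertex):
                if component & primary:
                    continue  # holds a primary node: must be kept
                for node in component:
                    for neighbour in adjacency.pop(node):
                        if neighbour in adjacency:
                            adjacency[neighbour].discard(node)
                changed = True
    return adjacency


# Toy network: P1 and P2 are primary; the cycle b-c-d and the leaf e are not needed.
EDGES = [("P1", "a"), ("a", "P2"), ("a", "b"), ("b", "c"),
         ("c", "d"), ("d", "b"), ("P2", "e")]
skeleton = skeletonize(build_adjacency(EDGES), primary={"P1", "P2"})
print(sorted(skeleton))
```

Each sweep explores the whole graph once per vertex, which is exactly the cost that makes this approach impractical on very large networks.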

To find all cut vertices in a graph, a common choice is Tarjan's algorithm [6], which requires a complete exploration of the graph to enumerate all cut vertices. For each cut vertex, algorithm 3 then requires another exploration of the graph to list all the connected components. For this reason, finding cut vertices and decomposing the network is a very time consuming step on very large graphs. We therefore modified Tarjan's algorithm to find which subtrees to delete while exploring the graph, by keeping track of the visited primary nodes. Our implementation adds an extra decision step whenever Tarjan's algorithm finds a bridge edge (algorithm 4).

Apply Tarjan's algorithm exploring the graph in depth-first order;
Keep track of N^p_above(e): number of primary nodes visited above edge e;
Keep track of N^p_below(e): number of primary nodes visited below edge e;
if e is a bridge edge then
    if N^p_below(e) = 0 then
        mark(e) ← 'to delete'
    else if N^p_below(e) ≠ 0 and N^p_above(e) = 0 then
        mark(e) ← 'unsure'
    else
        mark(e) ← 'to keep'
end
Verify all edges with mark(e) = 'unsure' and either mark them as 'to keep' or 'to delete';
Delete all bridges marked as 'to delete' and keep the biconnected component containing the primary nodes

Algorithm 4: Optimized network skeletonization algorithm
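A single-pass bridge classification in the spirit of this optimized algorithm might look like the sketch below. It is a simplified illustration, not the authors' implementation: it assumes a connected graph and that the total number of primary nodes is known before the search starts, so the count above a bridge is obtained by subtraction and no 'unsure' mark is needed. It uses recursion, which suits small examples only, and all names and the toy network are hypothetical.

```python
from collections import defaultdict


def classify_bridges(edges, primary, root):
    """Depth-first low-link search that marks each bridge 'to keep' or 'to delete'."""
    adjacency = defaultdict(list)
    for index, (u, v) in enumerate(edges):
        adjacency[u].append((v, index))
        adjacency[v].append((u, index))

    total_primary = len(primary)
    discovery, low, below, marks = {}, {}, {}, {}

    def visit(node, parent_edge):
        discovery[node] = low[node] = len(discovery)
        below[node] = 1 if node in primary else 0  # primary nodes in the subtree
        for neighbour, index in adjacency[node]:
            if index == parent_edge:
                continue
            if neighbour in discovery:              # back edge
                low[node] = min(low[node], discovery[neighbour])
                continue
            visit(neighbour, index)
            low[node] = min(low[node], low[neighbour])
            below[node] += below[neighbour]
            if low[neighbour] > discovery[node]:    # edge `index` is a bridge
                n_below = below[neighbour]
                n_above = total_primary - n_below
                marks[index] = "to keep" if n_below and n_above else "to delete"

    visit(root, None)
    return marks


def prune(edges, primary, marks):
    """Drop 'to delete' bridges, then keep only what is reachable from primary nodes."""
    kept = [e for i, e in enumerate(edges) if marks.get(i) != "to delete"]
    adjacency = defaultdict(set)
    for u, v in kept:
        adjacency[u].add(v)
        adjacency[v].add(u)
    reachable = set(primary)
    stack = list(primary)
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in reachable:
                reachable.add(neighbour)
                stack.append(neighbour)
    return [(u, v) for u, v in kept if u in reachable]


# Toy network: A and B are primary; x-y-z is a dangling chain; B-c-d is a cycle.
EDGES = [("A", "x"), ("x", "B"), ("x", "y"), ("y", "z"),
         ("B", "c"), ("c", "d"), ("d", "B")]
PRIMARY = {"A", "B"}
marks = classify_bridges(EDGES, PRIMARY, root="A")
pruned = prune(EDGES, PRIMARY, marks)
print(marks, pruned)
```

Note that this bridge-only rule removes the dangling chain but leaves the primary-free cycle attached to B, because that cycle is joined through a cut vertex rather than a bridge; removing it needs the additional biconnected-component step.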

3.4. Marking the external primary network

To mark the external primary network, we calculate the shortest paths for all possible combinations of primary nodes in each connected component $G_c$ of $G$. The number of such paths is

$$\sum_{G_c \in \mathrm{connComp}(G)} |V(G_c)| \cdot (|V(G_c)| - 1) \qquad (2)$$
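The baseline marking step, before any optimization, amounts to taking the union of the shortest-path edges over all pairs of primary nodes. The sketch below is a minimal illustration with hypothetical names and integer toy weights; it marks one shortest path per pair and visits each unordered pair once, since the graph is undirected.

```python
import heapq
from itertools import combinations


def build_adjacency(weighted_edges):
    """Undirected adjacency lists from (u, v, weight) triples."""
    adjacency = {}
    for u, v, weight in weighted_edges:
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    return adjacency


def shortest_path_edges(adjacency, source, target):
    """Dijkstra's algorithm; returns the edges (as frozensets) of one shortest path."""
    distance = {source: 0}
    previous = {}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node == target:
            break
        for neighbour, weight in adjacency[node]:
            candidate = d + weight
            if neighbour not in distance or candidate < distance[neighbour]:
                distance[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    path_edges = set()
    node = target
    while node != source:
        if node not in previous:
            return set()  # target not reachable from source
        path_edges.add(frozenset((node, previous[node])))
        node = previous[node]
    return path_edges


def mark_primary_edges(adjacency, primary):
    """Union of shortest-path edges over all pairs of primary nodes."""
    marked = set()
    for p, q in combinations(sorted(primary), 2):
        marked |= shortest_path_edges(adjacency, p, q)
    return marked


# Toy network with three primary nodes and hypothetical integer weights.
EDGES = [("P1", "a", 1), ("a", "P2", 1), ("P1", "P2", 5),
         ("a", "b", 1), ("b", "P3", 1), ("P2", "P3", 4)]
marked = mark_primary_edges(build_adjacency(EDGES), {"P1", "P2", "P3"})
print(sorted(sorted(edge) for edge in marked))
```

The two expensive direct pipes P1-P2 and P2-P3 are never on a shortest path and therefore stay unmarked.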

Since we just need the shortest paths between primary nodes, we can enhance the process further for large networks by using algorithm 5, which is based on this observation:

Observation 3.4.1. For each cut vertex $v_c$ in a connected graph, the shortest path between any two nodes $u$, $w$ in two different biconnected components induced by $v_c$ is composed of the shortest path between $u$ and $v_c$, plus the shortest path between $v_c$ and $w$.

C ← connComp(G);
while C ≠ ∅ do
    G_C ← largest G_C ∈ C;
    C ← C \ {G_C};
    P_C ← primaryNodes(G_C) ∩ cutVertices(G_C);
    if P_C ≠ ∅ then
        p ← {p, p ∈ P_C, p has the highest degree};
        mark all shortest paths' edges between p and q ∈ (primaryNodes(G_C) \ {p}) as primary external;
        C ← C ∪ connComp(G_C \ {p});
    else
        foreach (p, q), p, q ∈ primaryNodes(G_C) do
            mark all shortest paths' edges between p and q as primary external;

Algorithm 5: Progressive subdivision shortest path marking algorithm

In the worst case, where no primary node is also a cut vertex, this modified algorithm has the same complexity as finding the shortest paths between all the pairs of primary nodes; it improves exponentially for each primary node that is also a cut vertex.
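The observation that shortest paths across a cut vertex decompose into two legs can be checked numerically on a small graph. The sketch below is only an illustration: the graph, its integer weights and the node names are hypothetical, with `vc` as the single vertex joining two triangles.

```python
import heapq


def build_adjacency(weighted_edges):
    """Undirected adjacency lists from (u, v, weight) triples."""
    adjacency = {}
    for u, v, weight in weighted_edges:
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    return adjacency


def distances_from(adjacency, source):
    """Dijkstra's algorithm returning the distance to every reachable node."""
    distance = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > distance[node]:
            continue  # stale heap entry
        for neighbour, weight in adjacency[node]:
            candidate = d + weight
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distance


# Two triangles sharing only the cut vertex "vc".
EDGES = [("u", "a", 1), ("a", "vc", 1), ("u", "vc", 3),
         ("vc", "b", 2), ("b", "w", 2), ("vc", "w", 5)]
adjacency = build_adjacency(EDGES)

from_u = distances_from(adjacency, "u")
from_vc = distances_from(adjacency, "vc")

# The u-w distance equals the sum of the two legs through the cut vertex.
print(from_u["w"], from_vc["u"] + from_vc["w"])
```

A single search started at the cut vertex yields both legs at once, which is why splitting the graph at primary cut vertices reduces the number of pairwise searches.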

3.5. Complete primary network: demand centers' entry and exit nodes

Among the edges we added inside the demand centers in algorithm 2, there will be some which have been marked as primary in algorithm 5. We use this information to determine which nodes have to be considered the entry and the exit points for each demand center.

We mark these nodes as primary in algorithm 6, so that they will be connected by the shortest path marking algorithm 5.

foreach demand center D do
    EP(G) ← {e ∈ E(G), e is primary external, type(e) = 'centroid', e ∩ D ≠ ∅};
    P(G) ← P(G) ∪ {u, (c, u) ∈ EP(G), type(c) = 'centroid'};

Algorithm 6: Marking the demand centers' entry and exit points as primary
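The promotion of entry and exit points to primary nodes can be sketched as a simple filter over the marked edges. This is an illustrative sketch with a hypothetical edge representation (dictionaries with the keys `u`, `v`, `type` and `mark`); it processes all demand centers in one pass instead of looping over them one by one.

```python
def entry_exit_nodes(edges, centroids):
    """Real endpoints of centroid edges that were marked as primary external."""
    nodes = set()
    for edge in edges:
        if edge["type"] != "centroid" or edge["mark"] != "primary external":
            continue
        for end in (edge["u"], edge["v"]):
            if end not in centroids:  # skip the centroid itself
                nodes.add(end)
    return nodes


# Hypothetical edges after the external marking step.
EDGES = [
    {"u": "D1", "v": "b", "type": "centroid", "mark": "primary external"},
    {"u": "D1", "v": "c", "type": "centroid", "mark": None},
    {"u": "a", "v": "b", "type": "pipe", "mark": "primary external"},
    {"u": "g", "v": "D2", "type": "centroid", "mark": "primary external"},
]

primary = {"well", "tank"}
primary |= entry_exit_nodes(EDGES, centroids={"D1", "D2"})
print(sorted(primary))
```

Only the nodes reached through centroid edges that lie on a marked path become primary; node c stays non-primary because its virtual edge was not marked.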

3.6. Marking the complete primary network

Based on the complete graph with the new set of primary nodes from subsection 3.5, we repeat the skeletonization algorithm of subsection 3.3 on graph G, then we use algorithm 5 to mark the shortest paths' edges as primary internal. The primary network is the subgraph of all edges marked as primary internal or primary external:

$$E(G_{\mathrm{primary}}) = \{e,\ e \in E(G),\ \mathrm{mark}(e) \in \{\text{'primary internal'}, \text{'primary external'}\}\}$$

4. Implementation

Rather than writing a procedure to parse EPANET2 .inp files, we extracted the graph directly from EPANET2 [7] using pyepanet2 [8], a Python object interface to the EPANET2 toolkit, obtaining an igraph [9] Graph object. The rest of the computations were implemented in Python using igraph's functions. The implementation and the optimizations described above allowed us to process networks with millions of nodes and edges on a personal computer (Intel® Core™ i7-2630QM with 8 GB RAM).

5. Experiments and results

One of our test cases was the Langhirano network [10]. Importing the .inp file resulted in a graph with 8917 nodes and 9061 edges, with a total length of about 190 km. There are 88 primary nodes in the network.

The network connects 21 demand centers: we applied algorithm 2 to each one and added 354 centroid edges to calculate the external network.

The search for the external primary network started from a graph of 4641 nodes and 4872 edges, which the skeletonization algorithm 4 reduced to a graph with 2815 nodes and 3010 edges (Table 1). Algorithm 5 marked 1835 edges of the external network as primary external, and 55 nodes were marked as the demand centers' entry and exit nodes.

The complete network was reduced by the skeletonization algorithm to a graph with 5080 nodes and 5189 edges. The final primary network contained 3087 nodes and 3138 edges, with a total length of about 113 km (Fig. 2).

For comparison, in Figure 3 we show the primary network selected by a trivial extraction by diameter. The minimum diameter is chosen either so that the relative number of remaining edges is comparable to the reduction to 35% of the original obtained by the algorithm described in this paper for the same instance (Fig. 3a), or by setting a threshold equal to the most common diameter in the test instance (Fig. 3b).

Fig. 2: Primary network. (a) External primary network. (b) Primary network. In black the original network, in green the centroid edges; red dots are the centroids and black squares the primary nodes. Demand centers are greyed areas with a blue perimeter.

Table 1: Size of the original, external, skeletonized and primary networks for the Langhirano instance.

        Original   External   External Skeleton        Complete Skeleton        Primary
Nodes   8917       4641       2815 (60% of External)   5080 (56% of Original)   3087 (34% of Original)
Edges   9061       4872       3010 (61% of External)   5189 (57% of Original)   3138 (35% of Original)

It is easy to see that the resulting network contains many redundant pipes and fails to connect key areas and elements. In the first case, for instance, the originally connected graph is split into 60 connected components, while our algorithm retains the connectivity.

The same process was applied to a larger graph, with 2 154 290 nodes and 2 231 976 edges, which was reduced to a primary graph of 605 308 nodes and 606 955 edges, with a similar relative reduction as the one we achieved for the Langhirano instance.

6. Conclusions

Water Distribution Networks in densely populated areas are complex graphs. We presented a graph-theory based algorithm that reduces the complexity of a network according to the concept of primary network, allowing a better comprehension of its main flow model. A WDN is viewed as a connected, undirected graph, in which the primary network is the subset of links needed to connect the primary nodes (wells, springs, derivations, treatment plants, tanks, pumps, valves, flow meters, water demand centers) characterized by the maximum water transport capacity. On the Langhirano instance, the algorithm selected a primary network containing about 35% of the original nodes and edges. The same process was applied to a larger graph with a similar relative reduction. The results can be employed to understand the flow model of a previously unknown network or as a first step to determine its most vulnerable or important elements.

Fig. 3: Example of the output of a trivial extraction by diameter. (a) Naive extraction with a threshold that gives a reduction similar to the one reached by the proposed algorithm on the test instance. (b) Naive extraction with a threshold equal to the most common diameter in the test instance. In black the original network, in red the selected edges. Demand centers are greyed areas with a blue perimeter.

References

[1] Jochen W. Deuerlein - Decomposition Model of a General Water Supply Network Graph - J. Hydraul. Eng. 2008.134:822-832

[2] Sarah Dunn and Sean M. Wilkinson - Identifying Critical Components in Infrastructure Networks Using Network Topology - J. Infrastruct. Syst. 2013.19:157-165.

[3] A. Yazdani and P. Jeffrey - Applying Network Theory to Quantify the Redundancy and Structural Robustness of Water Distribution Systems - J. Water Resour. Plann. Manage. 2012.138:153-161.

[4] R. Kinney, P. Crucitti, R. Albert, and V. Latora - Modeling cascading failures in the North American power grid - Eur. Phys. J. B 46, 101-107 (2005)

[5] E.W. Dijkstra, A note on two problems in connexion with graphs. Numer. Math. 1 (1959), 269-271.

[6] R. Tarjan, Depth-first search and linear graph algorithms - SIAM journal on computing, 1972 - SIAM

[7] EPA-United States Environment Protection Agency, EPANET2, http://www.epa.gov/nrmrl/wswrd/dw/epanet.html

[8] M. Fortini, pyepanet2, a Python object to access the EPANET2 toolkit, https://github.com/mfortini/pyepanet2

[9] G. Csardi, T. Nepusz, The igraph software package for complex network research, InterJournal, Complex Systems, 2006, p. 1695, http://igraph.org

[10] Langhirano network instance, http://www.waterdistribution.systems