
ELSEVIER

Contents lists available at ScienceDirect

Engineering Science and Technology, an International Journal

journal homepage: www.elsevier.com/locate/jestch


Full Length Article

Optimizing VM allocation and data placement for data-intensive applications in cloud using ACO metaheuristic algorithm

T.P. Shabeera *, S.D. Madhu Kumar, Sameera M. Salam, K. Murali Krishnan

Department of Computer Science and Engineering, National Institute of Technology Calicut, Kerala 673601, India

ARTICLE INFO

Article history: Available online xxxx

Keywords:

MapReduce

Cloud computing

Virtual Machines

Virtual Machine placement

Data placement

ABSTRACT

Nowadays, data-intensive applications for processing big data are being hosted in the cloud. Since the cloud environment provides virtualized resources for computation, and data-intensive applications require communication between the computing nodes, the placement of Virtual Machines (VMs) and the location of data affect the overall computation time. Most of the research reported in the current literature treats the selection of physical nodes for placing data and VMs as independent problems. This paper proposes an approach which considers VM placement and data placement hand in hand. The primary objective is to reduce cross-network traffic and bandwidth usage by placing the required number of VMs and the data in Physical Machines (PMs) which are physically closer. The VM and data placement problem (referred to as the MinDistVMDataPlacement problem) is defined in this paper and has been proved to be NP-Hard. This paper presents and evaluates a metaheuristic algorithm based on Ant Colony Optimization (ACO), which selects a set of adjacent PMs for placing data and VMs. Data is distributed in the physical storage devices of the selected PMs. According to the processing capacity of each PM, a set of VMs is placed on these PMs to process the data stored in them. We use simulation to evaluate our algorithm. The results show that the proposed algorithm selects PMs in close proximity and that the jobs executed in the VMs allocated by the proposed scheme outperform those under other allocation schemes.

© 2016 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Cloud computing provides highly scalable, elastic services on demand on a pay-per-use basis. Nowadays, the acceptability of cloud computing is very high and it is increasing day by day. The impact of cloud computing on our daily lives is significant, varying from social networking to sensor networks. Extensive use of smart devices has increased the use of the cloud model. The number and size of cloud data centers are increasing rapidly. Infrastructure and storage costs are decreasing dramatically, but bandwidth is one of the scarcest resources in today's cloud.

Cisco Global Cloud Index [1] predicts that by 2017 more than two thirds of data center traffic will be between devices within cloud data centers, compared to data traveling in and out of data centers. Cisco Global Cloud Index [2] predicts that by 2018, more than three quarters (78 percent) of workloads will be processed by cloud data centers; only 22 percent will be processed by traditional data centers. Cisco Global Cloud Index [1] also predicts that server virtualization (multiple virtual servers running on a physical server) will have a large impact on cloud data center networks. In the virtualized environment, the bandwidth available in the cloud data center can be managed effectively by efficiently placing Virtual Machines (VMs). The main objective of cloud data centers is to maximize profit by increasing performance while minimizing cost [3]. By efficiently managing the bandwidth in the data center, cloud providers can improve performance and hence maximize profit.

* Corresponding author. E-mail addresses: shabeera@gmail.com (T.P. Shabeera), madhu@nitc.ac.in (S.D. Madhu Kumar), shemi.nazir@gmail.com (S.M. Salam), kmurali@nitc.ac.in (K. Murali Krishnan).

Setting up and managing a big data management infrastructure is costlier than hosting the same in the cloud. MapReduce [4], proposed by Google, and its open source implementation Hadoop [5,6] are the most popular big data management frameworks available today. By moving big data and its processing to the cloud, individuals and businesses can concentrate on other profit-making ideas. In the currently available commercial clouds, data is stored in storage clouds and computation is done with compute clouds. An example is Amazon Elastic MapReduce (Amazon EMR) [7]. In Amazon EMR, data is stored in Amazon Web Services (AWS) data stores such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. The computation is done by Amazon Elastic Compute Cloud (Amazon EC2) instances. The problem with this scenario is that, before execution starts, the data needs to be copied from the location where it is stored to the instance on which the computation is to be started. This takes time in proportion to the size of the data transmitted. If remote access is allowed instead of copying, multiple requests to the same data can lead to bottlenecks at the storage nodes.

http://dx.doi.org/10.1016/j.jestch.2016.11.006

2215-0986/© 2016 Karabuk University. Publishing services by Elsevier B.V.

In MapReduce clusters, data moves between nodes in the cluster during execution. In a MapReduce cloud, the clusters are set up with VMs, and data transfer occurs between these VMs during execution. MapReduce consists of Map and Reduce. The VMs executing Map may be independent, but data from these VMs should be moved to the VMs on which the Reduce tasks are started. In the case of Join-like tasks, VMs may need to transfer data among themselves. In all these cases, data transfer delay increases with distance. Distance can be defined in terms of network latency or hop count. The hop count is the number of intermediate devices through which data passes between source and destination [8]. Since each hop adds store-and-forward and other latencies, an increase in the number of hops between source and destination implies more data transfer delay. So, a MapReduce cluster should consist of VMs that are close to each other, so that the data transfer latency is minimized and hence the job completion time is reduced.

The main drawback of cloud resource allocation is over-provisioning. If the VMs are not placed in Physical Machines (PMs) optimally, resources are wasted and more bandwidth is consumed by these VMs. Most of the research in the current literature focuses on energy efficiency and server utilization. But in the case of MapReduce-like applications, if the VMs are hosted in distant PMs, the data transfer time will be longer and the bandwidth usage will be high.

This paper proposes a resource allocation algorithm for data-intensive applications in the cloud, considering VM placement and data placement together. This algorithm adapts the popular Ant Colony Optimization (ACO) metaheuristic. Given the number of VMs and the data as input, the proposed algorithm selects a set of PMs that minimizes data transfer delay. Based on the number of VMs a PM can host, the blocks of data to be copied to that particular server are decided. The data is copied to the physical storage of the corresponding server and the given number of VMs are started on that server.

The rest of the paper is organized as follows: Section 2 gives the literature review in detail. Section 3 outlines the overview of the system architecture. Section 4 describes the problem (MinDistVMDataPlacement Problem) and shows that this problem is NP-Hard. Section 5 presents the algorithm and the experimental evaluation is given in Section 6. Section 7 concludes the paper.

2. Literature review

MapReduce [4] is a popular programming framework for processing large amounts of distributed data. It consists mainly of two functions, namely Map and Reduce. The computation is expressed in terms of key/value pairs. The Map phase takes key/value pairs as input and produces intermediate key/value pairs. The Reduce phase takes the intermediate key/value pairs and produces the final key/value pairs. There is a shuffle phase between the Map and Reduce phases.
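The three phases can be illustrated with a minimal single-process sketch in Python. It is not how Hadoop or Google's MapReduce is implemented; the function names and the word-count job are illustrative assumptions, chosen only to show how key/value pairs flow through Map, shuffle and Reduce.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every (key, value) record and collect intermediate pairs."""
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    return intermediate


def shuffle_phase(intermediate):
    """Group intermediate values by key, as the shuffle does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Apply reduce_fn to each key and its list of values to get the final pairs."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical word-count job used only for illustration.
def word_count_map(_doc_id, text):
    return [(word, 1) for word in text.lower().split()]


def word_count_reduce(_word, counts):
    return sum(counts)


def run_word_count(records):
    intermediate = map_phase(records, word_count_map)
    return reduce_phase(shuffle_phase(intermediate), word_count_reduce)


print(run_word_count([(1, "cloud data cloud"), (2, "data center")]))
```

In a real cluster the shuffle step is where intermediate data crosses the network between the nodes running Map and those running Reduce, which is why the distance between those nodes matters.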

Hadoop [5] is an open source implementation of MapReduce with a distributed file system for storage of data. The distributed file system is known as the Hadoop Distributed File System (HDFS) [9]. Hadoop is widely accepted for processing big data. There are two types of nodes in a Hadoop cluster: Data Nodes for storing data and Compute Nodes for performing computation. Since the data size is huge and the computation is comparatively small, the current trend is to move computation to the location of the data. In HDFS, large data is split into fixed-size blocks and distributed across the Data Nodes. The computing slots can be started on the same Data Nodes or on different nodes. The scheduler tries to schedule Map tasks on the Data Nodes if possible. For processing big data, end users need to create Hadoop clusters of the required size. Some users may need a cluster for a short period of time and some for a long time, and the required cluster size may vary. In this case, people usually create a cluster of the maximum needed size, so resources are wasted most of the time. Managing the cluster is also a tedious task for the end users.

Cloud computing provides infrastructure, platform, software etc. as services to the user in a pay-as-you-go model [10]. The adoption of cloud computing has resulted in significant gains in productivity and cost savings in various fields, including railway technology [11]. In mobile cloud computing, offloading is a popular method in which the required computation takes place remotely inside the cloud. However, deciding when to offload, energy-efficient job scheduling, resource management inside the cloud, selection of an appropriate cloudlet to offload an application, and selection of an application-specific cloudlet with respect to low power and low latency are open challenges of mobile cloud computing [12].

The resources in the cloud are virtually infinite and scalable on demand. A MapReduce cloud provides MapReduce clusters on demand. There are different options for providing MapReduce as a cloud service [7,13-16]. The first is allocating VMs with storage for the cluster. Data is uploaded into these VMs and processed by MapReduce, and the result can be returned to the user or stored in a storage cloud. The second option is to store data in a storage cloud and allocate a set of VMs for the MapReduce cluster. The data from the storage cloud is copied to the VMs before processing starts and the results are stored back to the storage cloud. In the third option, a set of PMs is selected for the MapReduce cluster. The data is stored in the physical storage of the servers and a set of VMs is started on these PMs based on their capacity. The data is processed by the VMs and the results are stored back to the physical storage media. Compared to the first two, the last option has a number of advantages. In the first option, everything is lost if the VMs are removed. In the second option, there is a delay in copying data to the VMs, which depends on the size of the data.

The majority of the research found in the literature considers data placement and VM placement separately. VM placement (placing VMs on PMs) is a widely studied topic. The main objectives of most of these studies are consolidating the VMs on servers for energy efficiency and server utilization [17-23]. Ahmad et al. [24] analysed VM migration and the different VM consolidation frameworks for cloud data centers. But in the case of MapReduce-like applications, if the VMs are hosted in distant PMs, the data transfer time will be longer and the bandwidth usage will be high [25]. In their survey of resource management in IaaS clouds, Manvi and Shyam [26] observed that performance metrics like delay, bandwidth overhead, computation overhead, reliability, security and Quality of Experience have to be taken into consideration while designing a resource management scheme.

The network-aware VM placement algorithms proposed in [27-31] were not developed specifically for data-intensive applications, and hence they do not consider the location of the data being processed. Refs. [14,32-37] consider network awareness and the distance between VMs, but the authors assume that the data is already distributed in a storage cloud, and these works optimize VM placement with respect to the location of the data. Refs. [13,16,38] consider both data and VM placement.

Cura [15] allocates preclustered VMs for the MapReduce cluster. But here the problem is over-provisioning of the resources. Clusters with exactly the required number of VMs may not be available all the time. This over-provisioning may lead to starvation of some other requests.

Alicherry and Lakshman [32] proposed algorithms for resource allocation in a distributed cloud. Here also the authors consider the allocation of VMs only. They propose an approximation algorithm for placing VMs in closer data centers. Alicherry and Lakshman [14] proposed algorithms for VM placement in a cloud environment, optimizing data access latencies. The location of the data is already known and they try to minimize the inter-VM distances and the VM-to-data-node distances. But optimizing only VM placement, without optimizing data placement, may not give a good result.

Tziritas et al. [36] address energy efficiency and network load minimization. The authors propose algorithms for application-aware workload consolidation, considering energy efficiency and network load minimization both separately and together. But in these algorithms the authors consider a cloud environment with an initial placement and try to optimize the VM placement by migration to achieve energy efficiency and network load minimization. The authors consider interdependent VMs and try to consolidate them into a PM that minimizes energy utilization. But in a cloud environment that provides data-intensive applications as a service, VM migration causes additional overhead. Moreover, the number of VMs required to create a MapReduce cluster may be too large to consolidate in a single physical server.

He et al. [39] address VM consolidation to save energy by treating VMs as moldable. Moldable VMs can change their resource capacities during consolidation without jeopardizing QoS. Moldable VMs consolidate to fewer physical nodes than rigid VMs. Inter-VM communication and data transfer are not considered here.

Shabeera and Madhu Kumar [40] proposed algorithms for VM allocation in MapReduce cloud. They proposed Greedy, Random and PAM based algorithms for VM allocation. In all these cases they are not considering the data placement.

Di Martino et al. [41] survey the most recent developments on cloud computing in support for big data. These authors also highlight the challenges faced and the early results related to the development of data-intensive applications distributed across multiple cloud-based data centers.

Remedy [37] is a VM migration based approach for network-aware VM management in data centers. Intelligent VM migrations with network-awareness avoid network hotspots without degrading network performance of other flows in the network. The target hosts for migrating VMs are ranked based on the cost of migration modeled in terms of additional network traffic generated during migration, available bandwidth for migration and the resultant bandwidth after migration.

Purlieus [13] improves data locality in MapReduce cloud by coupling data placement with VM placement. They store the data on physical storage of the MapReduce clusters and the computing VMs are started on the same or nearby physical servers. Here the initial selection of the physical nodes is not mentioned. Purlieus mainly concentrates on the Map and Reduce scheduling part and not in the initial selection of the nodes. They assume that the PMs are connected to each other by a local area network. But in cloud environment, since it is highly distributed, optimally selecting the nodes itself is an NP-Hard problem.

CAM [16] uses a min-cost flow model for data and VM placement by considering storage utilization, changing CPU load and network link capacities. This approach considers both VM migration and delay scheduling. But these two techniques add additional overhead to the system.

Coupled Placement Advisor (CPA) [38] is a framework for coupled placement of application storage and computation in data centers. In this approach the data and computations are placed based on the proximity and affinity relationships of compute and storage nodes.

The input to big data analytics consists of terabytes or petabytes of data. Processing it requires many VMs. So, to improve data locality, the input data needs to be stored on more than one PM, depending on the number of VMs each PM can hold. The execution time of data-intensive applications depends on the data transfer delay between the VMs in the cluster. So, to improve the job completion time, the resources should be allocated so as to minimize the distances (network latency or hop count) between the VMs and the data they are processing.

3. System architecture

A distributed cloud consists of multiple data centers distributed across the world. Each data center consists of racks of servers of varying type. These servers are virtualized for improving the resource utilization. Processing is done by the VMs and the type and number of VMs are decided according to the nature of job and the data size. In addition to this, the data storage requirement is normally very high for the data-intensive applications. Users of this cloud request service from the service provider by submitting data and job to be executed on the data. Unlike traditional data-intensive systems that use separate storage cloud and compute cloud, the architecture we have considered consists of a set of physical servers that work as both storage as well as compute cloud. The servers are virtualized and these VMs constitute the clusters for compute cloud. The data is distributed across the physical storage devices of the servers based on the servers' VM allocation capacities. Fig. 1 illustrates the cloud scenario being discussed in this paper.

In this architecture, the first phase is the profiling phase. When a job is submitted, it goes through the profiling phase [42]. This phase analyses the job and data, and decides the type of VMs and the cluster size (number of VMs) to be allocated to process the job on the input data. Since our system considers homogeneous VMs at present, the profiler outputs only the required number of VMs. Starfish [43] is an open source tool that can be used to create the profiler. Based on the cluster size, the VMs have to be placed in PMs.

The next phase is the selection of PMs from the available resource pool. This phase is the resource allocation phase. It selects the PMs that can accommodate the required number of VMs. The available resource pool is then updated. Next, the data is copied to the storage of the selected PMs based on their VM allocation capacity, and the jobs are started on the VMs. After job completion, the results are sent back to the user.
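The data-copy step can be sketched as follows. The text only states that blocks are distributed according to each PM's VM allocation capacity; the proportional split with largest-remainder rounding used here, and the names `assign_blocks` and `vms_per_pm`, are assumptions made for illustration.

```python
def assign_blocks(num_blocks, vms_per_pm):
    """Split num_blocks across PMs in proportion to the VMs each PM hosts.

    vms_per_pm maps a PM name to the number of VMs placed on it.
    Proportional splitting is an assumed policy, not one stated in the text.
    """
    total_vms = sum(vms_per_pm.values())
    if total_vms <= 0:
        raise ValueError("at least one VM must be placed")
    # Integer share of each PM (rounded down).
    shares = {pm: num_blocks * vms // total_vms for pm, vms in vms_per_pm.items()}
    leftover = num_blocks - sum(shares.values())
    # Hand the remaining blocks to the PMs with the largest fractional remainder.
    by_remainder = sorted(
        vms_per_pm,
        key=lambda pm: (num_blocks * vms_per_pm[pm]) % total_vms,
        reverse=True,
    )
    for pm in by_remainder[:leftover]:
        shares[pm] += 1
    return shares


print(assign_blocks(10, {"pm1": 4, "pm2": 2, "pm3": 2}))
```

With this policy every block is assigned exactly once and a PM hosting twice as many VMs stores roughly twice as many blocks, so each VM finds a similar amount of data on its local disk.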

In a cloud environment consisting of multiple data centers, it is possible that while allocating VMs, the allotment may be spanned over PMs on different data centers, which are far apart. So, at the time of resource allocation, the PMs have to be selected such that the sum of VM allocation capacities is at least the required demand of VMs and they are physically closer. This paper formulates this problem of finding adjacent PMs that can accommodate the required number of VMs and proposes an algorithm for the same and also to improve the job completion time.

4. Problem description and proof of hardness

Fig. 1. System architecture.

Consider a distributed cloud environment in which the cloud data centers consist of racks of PMs with enough storage capacity and a predefined number of VMs. The distance between PMs denotes the access latency or hop count, and is assumed to be known in advance. The distance between two PMs is calculated from the number of networking devices between them. We take the distance between two PMs to be the processing delay of a switch multiplied by the number of switches between them. For example, if two PMs are in the same rack, there is a ToR switch connecting the PMs, so the distance is the processing delay of this ToR switch. If the PMs are in two different racks, there will be two ToR switches and one or more higher-level switches, depending on the underlying cloud network architecture.
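As a rough illustration of this distance measure, the sketch below builds a distance matrix from PM locations. The switch counts for the cross-rack and cross-data-center cases (3 and 5) assume a simple three-tier topology and are not taken from the paper; `PM`, `switches_between` and `switch_delay` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PM:
    name: str
    datacenter: str
    rack: str


def switches_between(a: PM, b: PM) -> int:
    """Number of switches on the path between two PMs (assumed topology)."""
    if a == b:
        return 0
    if a.datacenter == b.datacenter and a.rack == b.rack:
        return 1  # one shared ToR switch
    if a.datacenter == b.datacenter:
        return 3  # assumption: two ToR switches plus one aggregation switch
    return 5      # assumption: two ToR, two aggregation and one core switch


def distance_matrix(pms, switch_delay=1.0):
    """d[i][j] = processing delay of a switch * number of switches between i and j."""
    return [[switch_delay * switches_between(a, b) for b in pms] for a in pms]


pms = [
    PM("p1", "dc1", "r1"),
    PM("p2", "dc1", "r1"),
    PM("p3", "dc1", "r2"),
    PM("p4", "dc2", "r1"),
]
for row in distance_matrix(pms, switch_delay=2.0):
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, which matches the assumption $d_{ii} = 0$ used in the problem definition.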

The resource allocation algorithms used by the cloud provider have great impact on the performance of applications as well as cloud provider's profit and resource utilization. In the MapReduce-as-a-Service cloud architecture we have considered, the resource allocation phase selects a set of PMs based on the VM hosting capacity. Input of this phase is the required number of VMs. After selecting the PMs, the input data is stored in the physical storage devices of the PMs. These data are processed by the VMs placed on the corresponding PMs. These VMs may interact with each other while processing data. Hence the resource allocator should select PMs that reduce cross network traffic and access delay.

The problem of selecting PMs for placing data and VMs (MinDistVMDataPlacement problem) is formally defined and its hardness is proved in the following subsections.

4.1. MinDistVMDataPlacement Problem

Given $(P, N, d, k)$, where $P = \{1, 2, 3, \ldots, n\}$ is the set of PMs, $N(i) = n_i$ represents the number of VMs the $i$th PM can hold, $d_{ij} \ge 0$ defines the distance between PM $i$ and PM $j$ (assume $d_{ii} = 0$ for all $i$), and $k \ge 1$ is the demand, the MinDistVMDataPlacement problem is to pick a subset $P' \subseteq P$ such that $\sum_{i \in P'} n_i \ge k$ and $\sum_{\{i,j\} \subseteq P'} d_{ij}$ is minimum.

This problem can be restated as follows. Given a positive integer $k$ and a weighted complete graph $G(V, E, N, d)$, where $V = \{1, 2, \ldots, n\}$ such that vertex $i$ represents PM $i$, every $i \in V$ is associated with a value $N(i)$ which denotes the available number of VMs in PM $i$, and each edge $(i, j)$ is associated with a weight $d_{ij}$, i.e. $d : V \times V \to \mathbb{R}^+ \cup \{0\}$, the MinDistVMDataPlacement problem is to find $U \subseteq V$ such that $\sum_{i \in U} n_i \ge k$ and $\sum_{\{i,j\} \subseteq U} d_{ij}$ is minimum.

The formulation of the MinDistVMDataPlacement problem is:

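$$\text{Minimize} \quad \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_i x_j$$

$$\text{Subject to} \quad \sum_{i=1}^{n} n_i x_i \ge k, \qquad x_i \in \{0, 1\} \ (1 \le i \le n) \tag{1}$$

Given an optimal solution $x^* \in \{0,1\}^n$ to this problem, $\mathrm{Opt}(G) = \{i : x_i^* = 1\}$ denotes the set of nodes picked, and we refer to this set as the optimal solution.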
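For very small instances the problem can be solved by exhaustive enumeration, which makes the definition concrete. The sketch below is a reference brute-force solver, not the algorithm proposed in this paper; it minimizes the pairwise sum $\sum_{\{i,j\} \subseteq U} d_{ij}$ from the problem definition, uses 0-based PM indices, and its running time is exponential in the number of PMs. The function names and the sample instance are illustrative assumptions.

```python
from itertools import combinations


def placement_cost(subset, dist):
    """Sum of d_ij over all unordered pairs {i, j} of selected PMs."""
    return sum(dist[i][j] for i, j in combinations(sorted(subset), 2))


def brute_force_placement(capacity, dist, demand):
    """Enumerate every subset of PMs and return (best_subset, best_cost).

    capacity[i] is n_i, dist[i][j] is d_ij, demand is k.
    Returns (None, inf) when no subset can host `demand` VMs.
    """
    n = len(capacity)
    best_subset, best_cost = None, float("inf")
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if sum(capacity[i] for i in subset) < demand:
                continue  # infeasible: not enough VM slots
            cost = placement_cost(subset, dist)
            if cost < best_cost:
                best_subset, best_cost = set(subset), cost
    return best_subset, best_cost


# Hypothetical instance: PMs 0 and 1 share a rack, PM 2 is in another rack,
# PM 3 is in another data center.
capacity = [2, 2, 3, 4]
dist = [
    [0, 1, 3, 5],
    [1, 0, 3, 5],
    [3, 3, 0, 5],
    [5, 5, 5, 0],
]
print(brute_force_placement(capacity, dist, demand=5))
```

For a demand of 5 VMs no single PM suffices and the two PMs in the same rack offer only 4 slots, so the cheapest feasible choice pairs PM 2 with one of the PMs in the neighbouring rack.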

Note that this problem differs from the weighted clique problem and its variations [44]. The next section proves the NP-Hardness of the problem by reduction from the Minimum Knapsack problem.

4.2. Reduction from Minimum Knapsack problem to MinDistVMDataPlacement problem

This section shows that the MinKnapsack [45] is polynomial time reducible to MinDistVMDataPlacement. It is known that the Knapsack problem is NP-Complete [44]. Consequently, the MinDistVMDataPlacement problem is NP-hard [46].

Definition 1. A Minimum Knapsack instance $I = (n, N, S, k)$ consists of $n$ items $\{1, 2, \ldots, n\}$, with $N(i) = n_i$ an integer value $\ge 0$ and size $S(i) = s_i \ge 0$ for $1 \le i \le n$, and a demand $k$. The problem is to find $T \subseteq \{1, 2, \ldots, n\}$ that minimizes the sum of the sizes of the items selected in $T$, subject to $\sum_{j \in T} n_j \ge k$ [47].

The formulation of the MinKnapsack problem is:

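$$\text{Minimize} \quad \sum_{i=1}^{n} s_i x_i$$

$$\text{Subject to} \quad \sum_{i=1}^{n} n_i x_i \ge k, \qquad x_i \in \{0, 1\} \ (1 \le i \le n) \tag{2}$$

For a given optimal solution $y^* \in \{0,1\}^n$, the notation $\mathrm{Opt}(I) = \{i : y_i^* = 1\}$ denotes the set of items picked, and this set is referred to as the optimal solution.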

4.2.1. Reduction

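Given an instance of the MinKnapsack problem, $I = (n, N, S, k)$, where

$N : \{1, \ldots, n\} \to \mathbb{N} \cup \{0\}$ such that $N(i) = n_i$,

$S : \{1, \ldots, n\} \to \mathbb{R}^+ \cup \{0\}$ such that $S(i) = s_i$, and

$k$ is a positive integer,

an instance of MinDistVMDataPlacement is defined as follows.

$G = (V, N, d, nM + k)$, where $V = X \cup Y$ such that $X = \{1, 2, \ldots, n\}$ and $Y = \{n+1, n+2, \ldots, 2n\}$, where $i$ and $n+i$ are two distinct nodes in $G$ corresponding to each item $i \in I$ $(1 \le i \le n)$.

Let $M = \sum_{i=1}^{n} n_i + 1$. $N : V \to \mathbb{N} \cup \{0\}$ is defined as $N(i) = n_i$ and $N(n+i) = M$ for $1 \le i \le n$.

$d : V \times V \to \mathbb{R}^+ \cup \{0\}$ is defined as follows:

$$d(i, j) = \begin{cases} s_i & \text{if } j = n + i \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } 1 \le i \le n$$

That is, the edge between node $i$ and node $n+i$ has weight $s_i$ and every other edge has weight 0.

The demand of $G$ is set as $nM + k$.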

The following Lemma is an immediate consequence of the choice of the demand value, $nM + k$.

Lemma 1. The demand $nM + k$ in $G$ cannot be satisfied unless all nodes $\{n+1, n+2, \ldots, 2n\}$ are picked.

The following Theorem establishes the exact correspondence between $\mathrm{Opt}(I)$ and $\mathrm{Opt}(G)$.

Theorem 1. $\mathrm{Opt}(G) = \mathrm{Opt}(I) \cup \{n+1, n+2, \ldots, 2n\}$, where the union is disjoint.

Proof 1. By Lemma 1, $\mathrm{Opt}(G)$ should contain $\{n+1, n+2, \ldots, 2n\}$. These elements supply $nM$. Then, to satisfy the demand $nM + k$, $\mathrm{Opt}(G)$ should contain $X' \subseteq X$ such that $\sum_{u \in X'} n_u \ge k$ and $\sum_{u \in X'} s_u$ is minimum.

Let $\mathrm{Opt}(I)$ be the optimal solution for $I$. If $\mathrm{Opt}(G) \setminus \{n+1, n+2, \ldots, 2n\} \ne \mathrm{Opt}(I)$, there exists another set $\mathrm{Opt}'(I) = \mathrm{Opt}(G) \setminus \{n+1, n+2, \ldots, 2n\}$ such that $\sum_{i \in \mathrm{Opt}'(I)} n_i \ge k$, $\sum_{j \in \mathrm{Opt}(I)} n_j \ge k$ and $\sum_{i \in \mathrm{Opt}'(I)} s_i \ne \sum_{j \in \mathrm{Opt}(I)} s_j$.

To complete the proof, consider the following cases:

Case 1: $\sum_{i \in \mathrm{Opt}'(I)} s_i < \sum_{j \in \mathrm{Opt}(I)} s_j$. Then $\mathrm{Opt}'(I)$ would be the optimal solution for $I$, which contradicts the assumption that $\mathrm{Opt}(I)$ is an optimal solution for $I$.

Case 2: $\sum_{i \in \mathrm{Opt}'(I)} s_i > \sum_{j \in \mathrm{Opt}(I)} s_j$. Then

$$\sum_{i \in \mathrm{Opt}'(I)} \ \sum_{j \in \{n+1, n+2, \ldots, 2n\}} d_{ij} \; > \; \sum_{i \in \mathrm{Opt}(I)} \ \sum_{j \in \{n+1, n+2, \ldots, 2n\}} d_{ij}.$$

This contradicts the assumption that $\mathrm{Opt}(G)$ is the optimal solution for $G$. Hence $\mathrm{Opt}(G) = \mathrm{Opt}(I) \cup \{n+1, n+2, \ldots, 2n\}$.

Table 1
MinKnapsack instance.

Item    Size    Value
1       30      7
2       10      8
3       20      4
4       50      6
5       20      5
6       40      5

Corollary 1.1. $S \subseteq \{1, 2, \ldots, n\}$ is an optimal solution for the MinKnapsack problem iff $S \cup \{n+1, n+2, \ldots, 2n\}$ is an optimal solution for the MinDistVMDataPlacement problem.

Corollary 1.2. MinDistVMDataPlacement problem is NP-Hard.

Proof 2. MinKnapsack problem is polynomial time reducible to MinDistVMDataPlacement problem. It is known that MinKnapsack problem is NP-Complete. Hence MinDistVMDataPlacement problem is NP-Hard.

4.2.2. Example

Consider a MinKnapsack instance with items 1, 2, 3, 4, 5, 6. Their values and sizes are shown in Table 1. Let the demand be 20. A feasible solution for the given instance is $\{1, 2, 5\}$. Construct a complete graph with vertices $\{X_1, X_2, X_3, X_4, X_5, X_6, Y_1, Y_2, Y_3, Y_4, Y_5, Y_6\}$. Assign the value of item $i$ to vertex $X_i$ and $M = 36$ to $Y_i$ $(1 \le i \le n)$. The demand $(nM + k)$ is 236. The resultant graph is shown in Fig. 2. Apply MinDistVMDataPlacement on this graph. The optimal solution is $\{X_1, X_2, X_5, Y_1, Y_2, Y_3, Y_4, Y_5, Y_6\}$ with cost = 60. The corresponding set of items in MinKnapsack is $\{1, 2, 5\}$, which is the optimal solution for MinKnapsack.

Fig. 2. Reducing MinKnapsack to MinDistVMDataPlacement.
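The reduction and the example can be checked mechanically on the data of Table 1. The sketch below is only a verification aid under stated assumptions: both problems are solved by brute force, indices are 0-based (item 1 of Table 1 is index 0, and node $n+i$ is index $n + i$ in 0-based terms), and all function names are hypothetical.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {0, ..., n-1} as sorted tuples."""
    for r in range(n + 1):
        yield from combinations(range(n), r)


def min_knapsack(values, sizes, demand):
    """Brute force: smallest total size whose total value meets the demand."""
    feasible = [s for s in subsets(len(values))
                if sum(values[i] for i in s) >= demand]
    best = min(feasible, key=lambda s: sum(sizes[i] for i in s))
    return set(best), sum(sizes[i] for i in best)


def reduce_to_placement(values, sizes, demand):
    """Build (capacity, dist, demand) of the MinDistVMDataPlacement instance."""
    n = len(values)
    big_m = sum(values) + 1                      # M = sum of n_i, plus 1
    capacity = list(values) + [big_m] * n        # X nodes, then Y nodes
    dist = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        dist[i][n + i] = dist[n + i][i] = sizes[i]   # edge (i, n+i) weighs s_i
    return capacity, dist, n * big_m + demand


def min_dist_placement(capacity, dist, demand):
    """Brute force over all node subsets; fine for the 12-node example."""
    def cost(s):
        return sum(dist[i][j] for i, j in combinations(s, 2))
    feasible = [s for s in subsets(len(capacity))
                if sum(capacity[i] for i in s) >= demand]
    best = min(feasible, key=cost)
    return set(best), cost(best)


values = [7, 8, 4, 6, 5, 5]        # Table 1, "Value" column
sizes = [30, 10, 20, 50, 20, 40]   # Table 1, "Size" column
items, total_size = min_knapsack(values, sizes, 20)
capacity, dist, demand = reduce_to_placement(values, sizes, 20)
nodes, total_cost = min_dist_placement(capacity, dist, demand)
print(sorted(items), total_size, demand, sorted(nodes), total_cost)
```

Removing the six $Y$ nodes from the placement solution leaves exactly the knapsack solution, and the two optimal costs coincide, as Theorem 1 states.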

4.3. How to solve MinDistVMDataPlacement problem?

We have proved that MinDistVMDataPlacement is NP-Hard. MinDistVMDataPlacement is a subset selection problem, and solving it exactly is computationally infeasible for large data centers. A subset selection problem is the problem of finding a feasible subset of objects from an initial set of objects. Heuristic algorithms are an option for such problems. Heuristic approaches find "rather good" solutions, which may be optimal. The Ant Colony Optimization (ACO) metaheuristic has been successfully used to solve subset selection problems like Maximum Clique, Multidimensional Knapsack, Maximum Boolean Satisfiability, Maximum Constraint Satisfaction, Minimum Vertex Cover, Edge-Weighted k-Cardinality Tree and so on [48]. Brugger et al. [49] demonstrated that ACO performs better than genetic algorithms for large problem instances. According to Solnon et al. [48,50], ACO outperforms genetic algorithms, tabu search and simulated annealing for subset selection problems. In the next section, we present the ACO-based algorithm for selecting a subset of PMs for placing VMs and data, considering the sum of distances between the PMs.

5. ACO Metaheuristic Algorithm for Solving MinDistVMDataPlacement Problem

As mentioned in Section 2, most of the works in literature consider VM placement and data placement as separate problems. Actually in data-intensive applications, both the locations of data and VMs affect the data processing time. The VMs assigned to process the data should be close to the data. Here we have considered the cloud environments with racks of physical servers distributed across multiple data centers. These servers act as both data nodes and compute nodes. These servers are assumed to have enough storage capacity and fixed VM placement capacity. The data is stored in the physical storage space of the server and the VMs on the server perform computation on the corresponding data.

Given the number of VMs and the data, a set of PMs that are physically close to each other is to be selected for the cluster. The cloud environment is mapped to a weighted complete graph, where the vertices represent PMs and the weights on the vertices represent the VM placement capacity of the PMs. The weights on the edges represent the distance between the corresponding PMs. A subset of PMs has to be selected such that the sum of the weights of the selected vertices is at least equal to the required demand and the sum of the edge weights between the selected vertices is minimum. The previous section proved that this problem is NP-Hard. This section proposes an algorithm to select a set of adjacent PMs whose VM placement capacities together cover the required number of VMs. The Ant Colony Optimization (ACO) metaheuristic algorithm [51,52] is adapted for solving the VM and data placement problem in the cloud environment. After the selection of physical servers, the data is copied onto the servers and VMs are started to perform computation on this data. Algorithm 1 gives the pseudocode of the Ant Colony Optimization (ACO) algorithm.

In the MinDistVMDataPlacement problem, the desirabilities of the vertices are not independent. The selection of a vertex depends on the subset of already selected vertices. Therefore Algorithm 1 follows the clique pheromone strategy [48]. In the clique pheromone strategy, pheromone values are associated with every pair of vertices. The quantity of pheromone on the edge $(v_i, v_j)$ represents the learned desirability of selecting both the vertices $v_i$ and $v_j$ within the same solution.

Algorithm 1: ACO Metaheuristic Algorithm for MinDistVMDataPlacement

Input: Set of PMs P, demand of VMs vmdmnd, distance matrix D,
       a set of parameters {α, ρ, τ_min, τ_max, nbAnts, no_of_Cycles}
Output: Feasible-Best-Solution

Initialize parameters: Feasible-Best-Solution = ∅; pheromone trail associated with each edge (i, j) as (w_i + w_j) / d_ij
repeat
    for each ant k ∈ [1..nbAnts] do
        PMSet_k = ∅; Candidates_k = P
    end
    for each ant k ∈ [1..nbAnts] do
        Randomly choose first PM p_i ∈ P
        if (p_i.VMval == vmdmnd) then
            Feasible-Best-Solution = p_i
            return Feasible-Best-Solution
        PMSet_k = p_i
        while Candidates_k ≠ ∅ and (PMSet_k.VMval < vmdmnd) do
            choose a PM p_i with probability
                p(p_i, PMSet_k) = [τ_factor(p_i, PMSet_k)]^α / Σ_{p_j ∈ Candidates_k} [τ_factor(p_j, PMSet_k)]^α
            if (p_i.VMval == vmdmnd) then
                Feasible-Best-Solution = p_i
                return Feasible-Best-Solution
            PMSet_k = PMSet_k ∪ p_i
            Remove p_i from Candidates_k
    Optionally, apply local search to one or more solutions of PMSet_1, ..., PMSet_nbAnts
    for each ant k ∈ [1..nbAnts] do
        if PMSet_k.cost < Feasible-Best-Solution.cost then
            Feasible-Best-Solution = PMSet_k
            Feasible-Best-Solution.cost = PMSet_k.cost
    for each p_i ∈ P do
        for each p_j ∈ P do
            Update the pheromone trail τ(p_i, p_j) as follows:
                τ(p_i, p_j) = ρ · τ(p_i, p_j) + Δτ(p_i, p_j, {PMSet_1, ..., PMSet_nbAnts})
            if τ(p_i, p_j) < τ_min then
                τ(p_i, p_j) = τ_min
            if τ(p_i, p_j) > τ_max then
                τ(p_i, p_j) = τ_max
until maximum number of cycles reached or acceptable solution found
return the best solution found since the beginning: Feasible-Best-Solution
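The overall loop of Algorithm 1 can be sketched in Python as below. This is a simplified illustration under several assumptions, not the authors' implementation: the deposit rule `1 / (1 + cycle_cost - best_cost)` applied to the pairs of the best ant of each cycle is borrowed from common ACO practice because the update term is not fully specified here; zero distances are mapped to `tau_max` when initializing; the default parameter values are arbitrary; the optional local search and the single-PM early return are omitted; and the distance matrix is assumed symmetric.

```python
import random
from itertools import combinations


def pair_cost(pm_set, dist):
    """Sum of distances over all unordered pairs of selected PMs."""
    return sum(dist[i][j] for i, j in combinations(pm_set, 2))


def clamp(value, low, high):
    return max(low, min(high, value))


def build_solution(capacity, demand, tau, alpha, rng):
    """One ant: random first PM, then probabilistic picks until demand is met."""
    n = len(capacity)
    first = rng.randrange(n)
    pm_set = [first]
    candidates = [j for j in range(n) if j != first]
    while candidates and sum(capacity[i] for i in pm_set) < demand:
        # Pheromone factor of each candidate w.r.t. the PMs already chosen.
        weights = [sum(tau[i][j] for i in pm_set) ** alpha for j in candidates]
        chosen = rng.choices(candidates, weights=weights, k=1)[0]
        pm_set.append(chosen)
        candidates.remove(chosen)
    return pm_set


def aco_placement(capacity, dist, demand, alpha=1.0, rho=0.9,
                  tau_min=0.01, tau_max=6.0, nb_ants=10, cycles=50, seed=0):
    rng = random.Random(seed)
    n = len(capacity)
    # Initial pheromone (w_i + w_j) / d_ij, clamped to [tau_min, tau_max].
    tau = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        raw = (capacity[i] + capacity[j]) / dist[i][j] if dist[i][j] > 0 else tau_max
        tau[i][j] = tau[j][i] = clamp(raw, tau_min, tau_max)

    best, best_cost = None, float("inf")
    for _ in range(cycles):
        solutions = [build_solution(capacity, demand, tau, alpha, rng)
                     for _ in range(nb_ants)]
        feasible = [s for s in solutions
                    if sum(capacity[i] for i in s) >= demand]
        if not feasible:
            break  # the total capacity of all PMs is below the demand
        cycle_best = min(feasible, key=lambda s: pair_cost(s, dist))
        cycle_cost = pair_cost(cycle_best, dist)
        if cycle_cost < best_cost:
            best, best_cost = set(cycle_best), cycle_cost
        # Assumed update: evaporate everywhere, reward pairs of the cycle's best ant.
        deposit = 1.0 / (1.0 + cycle_cost - best_cost)
        rewarded = {frozenset(p) for p in combinations(cycle_best, 2)}
        for i, j in combinations(range(n), 2):
            new = rho * tau[i][j]
            if frozenset((i, j)) in rewarded:
                new += deposit
            tau[i][j] = tau[j][i] = clamp(new, tau_min, tau_max)
    return best, best_cost


capacity = [2, 2, 3, 4]
dist = [[0, 1, 3, 5], [1, 0, 3, 5], [3, 3, 0, 5], [5, 5, 5, 0]]
best, cost = aco_placement(capacity, dist, demand=5)
print(sorted(best), cost)
```

Because the ants sample PMs at random, the returned set is guaranteed to be feasible whenever any feasible set exists, but it is not guaranteed to be optimal; on small instances it can be compared against exhaustive search.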

The pheromone components are stored in an n × n matrix, where n is the number of vertices. A set of ants tries to build solutions based on these pheromone values. Paths with higher pheromone values have a greater probability of selection. After each solution-building phase, the pheromone values are updated. The best solution built by the ants is kept globally. This procedure is repeated a fixed number of times. The inputs to the algorithm are

• Set of PMs ($P$)

• Demand of VMs ($vmdmnd$)

• Distance matrix ($D$)

and a set of ACO-specific parameters

• $\alpha$: the parameter that determines the importance of the pheromone factor ($\tau_{factor}$)

• $\rho$: pheromone evaporation rate

• $\tau_{min}$: minimum pheromone trail

• $\tau_{max}$: maximum pheromone trail

• nbAnts: number of ants

• no_of_Cycles: number of cycles

Initially, the pheromone trail ($\tau$) associated with each edge $(i, j)$ is $(w_i + w_j)/d_{ij}$, where $w_i$ and $w_j$ are the weights of vertices $i$ and $j$ respectively and $d_{ij}$ is the distance between vertices $i$ and $j$. The best feasible solution and the solutions of each ant are initialized as empty.
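As a small illustration of this initialization, the sketch below fills the pheromone matrix under the assumption that the initial trail on edge $(i, j)$ is $(w_i + w_j)/d_{ij}$. The function name `init_pheromone` and the three-PM example values are hypothetical and are not taken from the experiments.

```python
def init_pheromone(weights, dist):
    """Build the n x n pheromone matrix with tau[i][j] = (w_i + w_j) / d_ij.

    Assumes dist[i][j] > 0 for every pair of distinct PMs; the diagonal is
    left at 0.0 because a PM is never paired with itself.
    """
    n = len(weights)
    tau = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                tau[i][j] = (weights[i] + weights[j]) / dist[i][j]
    return tau


# Hypothetical example: capacities of three PMs and their hop-count distances.
weights = [30, 20, 20]
dist = [[0, 2, 4],
        [2, 0, 4],
        [4, 4, 0]]
tau = init_pheromone(weights, dist)
```

Pairs that are close together and have high combined capacity start with a stronger trail, so they are favoured from the first cycle onward.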

$PMSet_k$ holds the set of PMs selected by ant $k$. It is initially empty, and vertices are added to it gradually based on a probabilistic transition rule. The first vertex in the set is selected randomly. Each new vertex $v_j$ is selected from the candidate set based on the probabilistic transition rule. This rule contains a pheromone factor ($\tau_{factor}$), which depends on the sum of the pheromone on every pair of vertices $(v_i, v_j)$ such that $v_i$ is a vertex already in $PMSet_k$. The $\tau_{factor}$ of a particular node $v_j$ is calculated as:

$$\tau_{factor}(v_j, PMSet_k) = \sum_{v_i \in PMSet_k} \tau(v_i, v_j)$$

The vertices from the candidate set with the highest probability are added to $PMSet_k$ until the sum of their weights is at least the required demand ($vmdmnd$).

The probability function for selecting a vertex $p_i$ into the set of ant $k$, denoted by $PMSet_k$, is

$$p(p_i, PMSet_k) = \frac{[\tau_{factor}(p_i, PMSet_k)]^{\alpha}}{\sum_{p_j \in Candidates_k} [\tau_{factor}(p_j, PMSet_k)]^{\alpha}} \tag{5}$$

where $\alpha \ge 0$ is an ACO-specific parameter that controls the influence of $\tau$. $\alpha$ determines the diversification: decreasing the value of $\alpha$ emphasizes diversification. So the value of $\alpha$ has to be selected depending on the time available for solving. Eq. (5) gives the probability of every vertex in the candidate set, and the one with the highest probability is selected for inclusion in the set. Every ant builds a solution in this way. After this phase, if a solution better than the Global Best Solution is found, the Global Best Solution is updated with this new set.
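The transition rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the three-PM pheromone matrix are hypothetical, and the sketch draws the next PM at random in proportion to its probability (the usual roulette-wheel reading of an ACO transition rule). A deterministic variant that always takes the most probable candidate would use `max(probs, key=probs.get)` instead.

```python
import random


def tau_factor(tau, candidate, pm_set):
    """Sum of the pheromone between `candidate` and every PM already selected."""
    return sum(tau[i][candidate] for i in pm_set)


def selection_probabilities(tau, candidates, pm_set, alpha):
    """Probability of each candidate, proportional to tau_factor ** alpha."""
    scores = {j: tau_factor(tau, j, pm_set) ** alpha for j in candidates}
    total = sum(scores.values())  # positive as long as every trail is > 0
    return {j: score / total for j, score in scores.items()}


def choose_pm(tau, candidates, pm_set, alpha, rng=random):
    """Draw one candidate at random according to its selection probability."""
    probs = selection_probabilities(tau, candidates, pm_set, alpha)
    pms = list(probs)
    return rng.choices(pms, weights=[probs[j] for j in pms], k=1)[0]


# Hypothetical example: PM 0 is already selected, PMs 1 and 2 are candidates.
tau = [[0.0, 4.0, 1.0],
       [4.0, 0.0, 2.0],
       [1.0, 2.0, 0.0]]
probs = selection_probabilities(tau, candidates=[1, 2], pm_set={0}, alpha=0.5)
picked = choose_pm(tau, candidates=[1, 2], pm_set={0}, alpha=0.5)
```

With $\alpha = 0.5$ the pheromone factors 4 and 1 become 2 and 1, so PM 1 is chosen with probability 2/3. A smaller $\alpha$ flattens these probabilities, which is the diversification effect described above.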

The next phase in Algorithm 1 is the pheromone update. The pheromone values are updated for subsequent iterations according to the equation:

$$\tau_{p_i,p_j} = \tau_{p_i,p_j} \cdot (1 - \rho) + \delta_\tau(\tau_{ij}, \{PMSet_1, \ldots, PMSet_{nbAnts}\})$$

where $\rho$ is the pheromone evaporation rate.

Pheromone evaporation is applied to decrease pheromone values. Its aim is to avoid an unlimited increase of pheromone values and to allow the ant colony to forget poor choices made previously [53]. Decreasing the value of $\rho$ also emphasizes diversification by making pheromone evaporation slower. So, just like $\alpha$, the value of $\rho$ has to be selected based on the time available to solve the optimization problem. Pheromone trails are bounded within $[\tau_{min}, \tau_{max}]$ to avoid a situation where all ants construct the same solution again and again without finding any better solution.
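A minimal sketch of one update step is given below. The text does not spell out how the deposited amount is computed, so the sketch takes it as an input: `deposit` is added to every pair of PMs inside `best_set`. Both names, and the rule of rewarding only one solution, are assumptions made for illustration.

```python
def update_pheromone(tau, rho, tau_min, tau_max, best_set, deposit):
    """Evaporate every trail, reward pairs inside best_set, then clamp."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no trail between a PM and itself
            value = tau[i][j] * (1.0 - rho)           # evaporation
            if i in best_set and j in best_set:
                value += deposit                      # assumed deposit rule
            tau[i][j] = min(tau_max, max(tau_min, value))  # bound the trail
    return tau


# Hypothetical 3-PM example with the bounds [0.001, 10].
tau = [[0.0, 8.0, 1.0],
       [8.0, 0.0, 0.001],
       [1.0, 0.001, 0.0]]
update_pheromone(tau, rho=0.5, tau_min=0.001, tau_max=10.0,
                 best_set={0, 1}, deposit=7.0)
```

In this example the rewarded pair (0, 1) would reach 11 and is capped at the upper bound, while the pair (1, 2) would fall to 0.0005 and is raised back to the lower bound, so no pair ever becomes impossible to select.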

In the next iteration, the ants build solutions based on the new pheromone values. This process is iterated for the number of cycles given as input. In each iteration, whenever a vertex is selected from the candidate set, its weight is compared with the demand, $vmdmnd$. If the weight is at least $vmdmnd$, the single vertex is returned as the solution.
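Putting the steps together, the whole procedure can be sketched end to end as below. This is an illustrative sketch under stated assumptions rather than the authors' CloudSim implementation: the initial trail is taken as $(w_i + w_j)/d_{ij}$, the cost of a solution is the sum of pairwise distances between its PMs, the next PM is drawn at random in proportion to its probability, the optional local search is omitted, and the deposit rule (rewarding the pairs of the best solution of each cycle) is an assumption because the text does not define it. All function and variable names are hypothetical.

```python
import random
from itertools import combinations


def aco_select_pms(cap, dist, demand, alpha=0.5, rho=0.5, tau_min=0.001,
                   tau_max=10.0, nb_ants=8, cycles=5, seed=0):
    """Return a set of PM indices whose capacities sum to at least `demand`."""
    rng = random.Random(seed)
    n = len(cap)

    def cost(pm_set):
        # Objective: sum of pairwise distances between the selected PMs.
        return sum(dist[a][b] for a, b in combinations(sorted(pm_set), 2))

    # Assumed initial trail; requires dist[i][j] > 0 for distinct PMs.
    tau = [[(cap[i] + cap[j]) / dist[i][j] if i != j else 0.0
            for j in range(n)] for i in range(n)]
    best = None
    for _ in range(cycles):
        solutions = []
        for _ in range(nb_ants):
            first = rng.randrange(n)
            if cap[first] >= demand:          # a single PM satisfies the demand
                return {first}
            chosen, candidates = {first}, set(range(n)) - {first}
            while candidates and sum(cap[i] for i in chosen) < demand:
                pool = sorted(candidates)
                weights = [sum(tau[i][j] for i in chosen) ** alpha for j in pool]
                nxt = rng.choices(pool, weights=weights, k=1)[0]
                if cap[nxt] >= demand:
                    return {nxt}
                chosen.add(nxt)
                candidates.remove(nxt)
            if sum(cap[i] for i in chosen) >= demand:
                solutions.append(chosen)
        if not solutions:                     # total capacity is below the demand
            return None
        cycle_best = min(solutions, key=cost)
        if best is None or cost(cycle_best) < cost(best):
            best = cycle_best
        # Assumed deposit rule: reward the pairs of the cycle's best solution.
        deposit = 1.0 / (1.0 + cost(cycle_best) - cost(best))
        for i in range(n):
            for j in range(n):
                if i != j:
                    reward = deposit if i in cycle_best and j in cycle_best else 0.0
                    value = tau[i][j] * (1.0 - rho) + reward
                    tau[i][j] = min(tau_max, max(tau_min, value))
    return best


# Hypothetical example: PMs 0-1 share one rack, PMs 2-3 share another.
cap = [30, 20, 20, 30]
dist = [[0, 2, 4, 4],
        [2, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0]]
selected = aco_select_pms(cap, dist, demand=50)
```

The default parameter values mirror those reported in the evaluation section of the paper; the returned set is always feasible when the total capacity covers the demand, but, as with any metaheuristic, it is not guaranteed to be the minimum-distance set.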

The heuristic factor $\eta$ evaluates the promise of an object based on the solution an ant has built so far. This is useful for a pheromone strategy in which an object's desirability is independent of the other selected objects. We, however, follow the clique pheromone strategy, in which pheromone values are associated with every pair of objects rather than with each object. The heuristic factor is not considered in this algorithm because it has been found that, for the clique pheromone strategy, it is better to use no heuristic factor [48,54]. The computational cost analysis of the algorithm follows.

5.1. Computational cost

The number of ants in the algorithm is nbAnts, and each ant builds a solution. If the number of hosts is denoted by $|P|$, the size of the distance matrix is $|P| \times |P|$. In the solution-building process, a pheromone trail matrix is created, which also has size $|P| \times |P|$. Lines 3-39 are repeated for a predefined number of cycles, no_of_Cycles. Lines 3-5 and 23-27 of Algorithm 1 execute nbAnts times. The candidate set for each ant is the set of PMs, so the while loop in lines 12-20 executes at most $|P|$ times. This while loop is repeated for each ant, so the computational cost of lines 6-21 is $O(nbAnts \cdot |P|)$. Lines 28-39 constitute the pheromone update phase, which takes $O(|P| \cdot |P|)$ time. Hence the computational cost of the algorithm is $O(no\_of\_Cycles \cdot (nbAnts + nbAnts \cdot |P| + nbAnts + |P| \cdot |P|))$. Since $|P| \gg nbAnts$, this is $O(no\_of\_Cycles \cdot |P| \cdot |P|)$, which is $O(|P|^2)$ for a fixed number of cycles. The time complexity of the algorithm is therefore $O(|P|^2)$, and the computational cost mainly depends on the number of hosts in the cloud.

6. Experimental evaluation and discussion

In a MapReduce cluster, data transfer occurs between the different phases of job execution. We carried out experiments to study the effect of the distance between the allotted VMs on job completion time. The distance between two VMs is a function of the underlying network latency and bandwidth between the PMs hosting them: it depends on the number of networking devices, their processing delay, and the bandwidth of the links between the PMs. If two VMs are on the same PM, their distance is zero. If there are n VMs, the distances between every pair of VMs are added to obtain the total distance. A higher total distance means that communication between the VMs involves more higher-level switches, which results in congestion at those switches and delays job completion. To study the effect of the distance between VMs on job completion time, three Hadoop clusters were created, each with four VMs. The VMs were created on IBM System 3100 M4 servers with an Intel Xeon E3-1220 processor, 16 GB RAM, and a 1 TB hard disk; each VM has 2 GB RAM, 2 cores, and a 20 GB disk.
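The total distance of a cluster can be computed as sketched below, assuming the PM-to-PM distances are available as a matrix (for example, hop counts). The function name `total_vm_distance` and the example matrix are hypothetical.

```python
from itertools import combinations


def total_vm_distance(vm_hosts, pm_dist):
    """Sum the distances over every pair of VMs.

    vm_hosts[k] is the index of the PM hosting VM k; two VMs on the same
    PM contribute a distance of zero.
    """
    return sum(0 if a == b else pm_dist[a][b]
               for a, b in combinations(vm_hosts, 2))


# Hypothetical example: PMs 0 and 1 share a rack, PM 2 is in another rack.
pm_dist = [[0, 2, 4],
           [2, 0, 4],
           [4, 4, 0]]
same_pm = total_vm_distance([0, 0, 0, 0], pm_dist)   # all four VMs on PM 0
spread = total_vm_distance([0, 0, 1, 2], pm_dist)    # VMs spread over three PMs
```

Here the cluster packed onto one PM has a total distance of 0, while the spread-out cluster accumulates 16 over its six VM pairs.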

If the VMs are on PMs that are close together, network delay and data transfer cost can be minimized. If the VMs are on the same PM, the delay between them is negligible and the distance between them can be taken as zero. A request for a small number of VMs may be accommodated on a single PM. The number of VMs that can be placed on a PM depends on its configuration and on the configuration of the VMs to be placed. For example, if a server has 32 cores, 512 GB RAM, and a 16 TB disk, and each VM requires 2 cores, 2 GB RAM, and a 128 GB disk, then 16 VMs can be placed on this server, with the core count as the limiting resource. If a request for 10 VMs arrives, a cluster of 10 VMs can be created on this single server, and in this case the data transfer delay is negligible. As a second example, consider a PM with 16 cores, 16 GB RAM, and a 2 TB hard disk, and VMs with 2 cores, 4 GB RAM, and a 100 GB disk. Although the PM has 16 cores, it can host at most 4 VMs, since the RAM size is 16 GB. If the client demands more VMs than the VM placement capacity of a single PM, the request needs more than one PM, and the PMs have to be selected such that they are close to each other. Data transfer between VMs takes time that depends on the number of switches and links between them and on their delays.
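The VM placement capacity of a PM is bounded by whichever resource runs out first. A minimal sketch, assuming homogeneous VMs and taking 16 TB as 16000 GB; the dictionary keys and the function name `vm_capacity` are hypothetical:

```python
def vm_capacity(pm, vm):
    """Number of identical VMs that fit on a PM: the tightest resource wins."""
    return min(pm[resource] // vm[resource] for resource in vm)


# First example from the text: 32 cores, 512 GB RAM, 16 TB disk.
server = {"cores": 32, "ram_gb": 512, "disk_gb": 16000}
vm = {"cores": 2, "ram_gb": 2, "disk_gb": 128}
capacity = vm_capacity(server, vm)  # cores allow 16, RAM 256, disk 125

# Hypothetical RAM-bound case: cores would allow 4 VMs, RAM only 2.
small_pm = {"cores": 8, "ram_gb": 8, "disk_gb": 1000}
big_vm = {"cores": 2, "ram_gb": 4, "disk_gb": 100}
ram_bound = vm_capacity(small_pm, big_vm)
```

For the first server the result is 16, matching the figure given in the text. This per-PM capacity is the vertex weight used by the placement algorithms.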

To study the impact of the closeness of VMs on job completion time, we created three clusters. In Cluster 1, all VMs are on a single PM. In Clusters 2 and 3, only one VM is deployed per PM, although each PM can accommodate more. In Cluster 2, the four VMs are distributed across four PMs in one rack. In Cluster 3, four PMs residing on four different racks are selected to place the VMs. Two benchmark MapReduce programs, TeraSort and WordCount, are executed on these three clusters. Fig. 3a shows the execution time of WordCount on these three clusters with 1 GB and 10 GB of data, and Fig. 3b shows the execution time of TeraSort with the same data sizes. From these results it can be concluded that job completion time is reduced when the VMs are on the same or nearby PMs.

We formulated the problem of VM and data placement, referred to as MinDistVMDataPlacement, and proposed an algorithm based on the ACO metaheuristic. We used simulation-based evaluation to compare the performance of the proposed algorithm with two existing algorithms from the literature, namely FFD and Distance-aware FFD. These algorithms were simulated in the cloud simulation platform CloudSim [55] and evaluated on the topological data given by Benson et al. [56]. We considered the data center topology shown in Fig. 4. The host-to-switch links are 1 GigE and the links between switches are 10 GigE. We assume uniform latency for all the switches.

We extended CloudSim [55] by adding the concept of racks. Instead of adding hosts directly to the data centers, multiple racks are added to the data centers and hosts are assigned to the racks, so each host has both a rack id and a data center id. Both FFD and Distance-aware FFD are greedy approaches. These two algorithms sort the hosts in decreasing order of VM allocation capacity. FFD selects hosts from the sorted list until the sum of their weights is at least the required number of VMs; it is not aware of the topology. Distance-aware FFD is a modified, topology-aware version of FFD: it selects the first host from the sorted list, and the remaining hosts are added based on their distance from the already selected hosts until the required demand is satisfied.
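The two greedy baselines can be sketched as follows. The text does not say exactly how "distance from the already selected hosts" is measured, so this sketch assumes the sum of distances to all selected hosts, with ties broken in favour of the larger capacity; the function names and the four-host example are hypothetical, and a non-empty host list is assumed.

```python
def ffd(capacities, demand):
    """Topology-unaware FFD: take hosts in decreasing order of capacity."""
    order = sorted(range(len(capacities)),
                   key=lambda i: capacities[i], reverse=True)
    chosen, total = [], 0
    for host in order:
        if total >= demand:
            break
        chosen.append(host)
        total += capacities[host]
    return chosen


def distance_aware_ffd(capacities, dist, demand):
    """Start from the largest host, then repeatedly add the nearest host."""
    remaining = list(range(len(capacities)))
    first = max(remaining, key=lambda i: capacities[i])
    chosen, total = [first], capacities[first]
    remaining.remove(first)
    while total < demand and remaining:
        # Assumed rule: smallest summed distance to the hosts chosen so far,
        # ties broken by larger capacity.
        nxt = min(remaining,
                  key=lambda i: (sum(dist[i][j] for j in chosen), -capacities[i]))
        chosen.append(nxt)
        total += capacities[nxt]
        remaining.remove(nxt)
    return chosen


# Hypothetical example: hosts 0-1 share one rack, hosts 2-3 share another.
capacities = [30, 20, 20, 30]
dist = [[0, 2, 4, 4],
        [2, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0]]
ffd_hosts = ffd(capacities, demand=50)
da_hosts = distance_aware_ffd(capacities, dist, demand=50)
```

For a demand of 50 VMs, FFD picks the two largest hosts (0 and 3), which sit in different racks, whereas the distance-aware variant picks hosts 0 and 1 in the same rack. Both use two PMs, but the second choice halves the distance between them.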

Three input data sets are given to these algorithms. The first is UnivCloud, based on the University data center topology (Univ1). UnivCloud consists of 18 racks, each with 40 servers, for 720 servers in total. The second data set, PrvtCloud, is based on the Private data center topology (Prvt1). PrvtCloud consists of two DCs, each with 25 racks of 40 servers, so there are 2000 servers in PrvtCloud. The third data set is MultiDCCloud, consisting of 10 DCs, each with 25 racks. Some racks contain 20 servers and some contain 30, for more than 6000 servers in total. All these input data sets follow the respective topologies given in [56]. All the servers are virtualized. Some servers have 60 cores and some have 40 cores. The VMs are assumed to have 2 cores and 4 GB RAM, so some PMs can host 30 VMs and others at most 20.

MinDistVMDataPlacement is an optimization problem. When solving it, the ACO algorithm has to intensify the search around the most promising racks and, at the same time, explore new and potentially better regions of the search space. In each cycle, every ant builds a solution, and the one with the minimum objective function value is kept as the global best solution. This is repeated a fixed number of times (no_of_Cycles) to obtain the best value found. The ACO-specific parameters can be adapted either online or offline. The constraint when choosing parameter values is the running time of the algorithm: the values must be chosen such that the running time does not affect the Quality of Service (QoS) of the cloud. When a client requests resources, a long response time adversely affects the QoS. The values of the ACO-specific parameters $\alpha$, $\rho$, no_of_ants (nbAnts in Algorithm 1), and no_of_Cycles therefore have to be selected depending on the affordable waiting time of the job. The parameter values used in Algorithm 1 were fixed through rigorous trial-and-error offline tuning: $\alpha$ and $\rho$ were varied between 0 and 1, $\tau_{min}$ from 0 to 1, $\tau_{max}$ from 1 to 20, no_of_ants between 1 and 20, and no_of_Cycles between 1 and 10. The algorithm was tested with various combinations of these parameters, and the values were selected in accordance with the waiting time: $\alpha = 0.5$, $\rho = 0.5$, $\tau_{min} = 0.001$, $\tau_{max} = 10$, no_of_ants = 8, and no_of_Cycles = 5. If the waiting time of the job is not a constraint, the solution quality can be further improved by decreasing the values of $\alpha$ and $\rho$ and increasing the values of no_of_ants and no_of_Cycles.

Fig. 3. Comparison of execution time on Clusters 1, 2, and 3: (a) WordCount, (b) TeraSort. x-axis: data size.

Fig. 4. Data center topology [57].

The execution times of the ACO algorithm for these three data sets are shown in Fig. 5. All the experiments are repeated for three cases. The first case is with no load in the system, that is, when all the VMs are available for allocation. The second case is with 50% load, i.e., when 50% of the VMs are already allocated. The third case is with 90% load, in which only 10% of the VMs are available for allocation.

We compared the number of PMs allocated by our algorithm with the greedy approaches, First Fit Decreasing (FFD) and Distance-aware FFD. Fig. 6 compares the number of PMs selected for UnivCloud by FFD, Distance-aware FFD, and ACO. Fig. 6a shows the number of PMs allocated by the three algorithms when there is no load in the system; in this case the numbers of PMs are almost the same. Fig. 6b shows the number of PMs allocated when the current load in the system is around 50%; in this case, as the number of requested VMs increases, ACO and Distance-aware FFD allocate more PMs than FFD. Fig. 6c shows the number of PMs allocated when the load is around 90%.

Figs. 7 and 8 compare the number of PMs selected by FFD, Distance-aware FFD, and ACO in PrvtCloud and MultiDCCloud, respectively, for the three cases mentioned above. Multiple requests for VMs are given to the three algorithms. The number of VMs requested is varied from 10 to 200, and results are plotted for 10, 50, 100, 150, and 200 VMs.

Since FFD searches for the PMs with the maximum VM placement capacity, the number of PMs selected by FFD is no greater than that selected by ACO. Distance-aware FFD selects the first PM from the sorted list just like FFD, but each subsequent PM is selected based on its distance from the already selected host(s). The number of PMs selected by Distance-aware FFD is therefore always greater than or equal to that of FFD, and it may be less than, greater than, or equal to that of ACO. For example, if one node in a rack has a high VM placement capacity and all the others in that rack have low capacities, Distance-aware FFD chooses a larger number of PMs.

We compared the sum of distances between the allocated VMs for the three algorithms, taking the hop count as the distance. Fig. 9 compares the sum of distances between the VMs being allocated on the PMs selected in UnivCloud by these algorithms. Figs. 9a, 9b, and 9c show this comparison when the current load is 0%, 50%, and 90%, respectively. Figs. 10 and 11 compare the sum of distances between VMs on the selected PMs in PrvtCloud and MultiDCCloud, respectively.

Fig. 5. Comparison of execution time: (a) UnivCloud, (b) PrvtCloud, (c) MultiDCCloud. x-axis: requested number of VMs; series: load before allocation 0%, 50%, and 90%.

Fig. 6. Number of PMs allocated in UnivCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

Fig. 7. Number of PMs allocated in PrvtCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

Fig. 8. Number of PMs allocated in MultiDCCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

Fig. 9. Sum of distances between VMs in UnivCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

The results shown in the above mentioned figures are the average of 50 runs. To measure the amount of variability relative to this average value, we calculated the coefficient of variation in each experiment. The results are shown in Figs. 12a, 12b, and 12c.
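The coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; the text does not state whether the population or the sample standard deviation was used, so the population form is an assumption here, and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation of the runs divided by their mean."""
    return pstdev(samples) / mean(samples)


# Hypothetical results of four runs of one experiment.
runs = [10.0, 12.0, 8.0, 10.0]
cv = coefficient_of_variation(runs)
```

A value close to zero indicates that repeated runs of the randomized algorithm give nearly the same result, so the reported averages are representative.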

We can see that the number of PMs selected by FFD is always less than or equal to the number selected by ACO and Distance-aware FFD. However, since FFD does not consider the distance between PMs, the selected PMs may be in different racks or even in different data centers, which increases the sum of distances between the allocated VMs. Although the number of PMs selected by Distance-aware FFD is sometimes less than that of ACO, its sum of distances is not always optimized, because the selection depends heavily on the first host selected.

Since these VMs host data-intensive applications, data are transferred between VMs while execution is in progress. Data transfer between VMs takes time that depends on the distance between the VMs and the size of the data transferred. Fig. 9 shows that the sum of distances between the VMs on the selected PMs in UnivCloud is much smaller with ACO than with FFD. This also holds in the other cases. So ACO gives a subset of PMs with the required VM allocation capacity that minimizes the sum of distances between VMs.

We used simulation-based evaluation to compare the job completion time on clusters created by our proposed metaheuristic algorithm with that on clusters created by the existing algorithms, namely FFD and Distance-aware FFD. We simulated MapReduce job execution on clusters of 200 VMs created using ACO, FFD, and Distance-aware FFD. The experiment is repeated for UnivCloud, PrvtCloud, and MultiDCCloud. Input data of size 200 GB is assigned to each cluster, and the job completion time is measured in each case. Fig. 13 shows the job completion time on these clusters. These results show that the data transfer delay plays a significant role in the job completion time. Since the input size, VM configuration, and cluster size are the same, the difference in job completion time is due to the delay in data transfer between the computing VMs. The data transfer delay depends on the number of networking devices and links between the computing VMs, which is defined in this paper as the distance between the VMs. So we can infer that the job completion time increases with the distance between the VMs that execute the job.

Fig. 10. Sum of distances between VMs in PrvtCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

Fig. 11. Sum of distances between VMs in MultiDCCloud: (a) load before allocation 0%, (b) load before allocation 50%, (c) load before allocation 90%. x-axis: requested number of VMs; series: FFD, Distance-aware FFD, ACO.

Fig. 12. Coefficient of variation: (a) UnivCloud, (b) PrvtCloud, (c) MultiDCCloud. x-axis: number of VMs; series: load before allocation 0%, 50%, and 90%.

Fig. 13. Comparison of job completion time on the allocated VMs for UnivCloud, PrvtCloud, and MultiDCCloud. Series: FFD, Distance-aware FFD, ACO.

7. Conclusion

Data-intensive applications hosted in the cloud help end users with on-demand processing of huge amounts of data. In data-intensive applications, the computing nodes transfer data among themselves during execution. Since the cloud environment is virtualized, the data and VMs should be placed in an optimized way to improve application performance. The primary focus of this work is to select a subset of PMs such that their total VM allocation capacity is at least the required demand of VMs, while minimizing the data transfer delay between them. The Ant Colony Optimization metaheuristic algorithm is used to select a subset of PMs that satisfies this objective. After the PMs are selected, the data are copied to their storage devices and the required number of VMs are started on them based on their VM allocation capacities. Simulation results show that this selection decreases the sum of distances between VMs and hence reduces the job completion time.

This paper considers only homogeneous VMs. Future work includes allocating heterogeneous VMs according to users' requests, so that resource utilization can be maximized without degrading the job completion time on the allocated VMs. In this paper we assume a replication factor of one for the data blocks. Our future work will consider a replication factor 'R' and partition the nodes on the cloud provider's side into 'R' partitions such that the distances between partitions are maximized. Handling multiple requests at the same time (batch requests) is not currently considered. We assume that the scheduler has a queue and that, according to the scheduling policy, only one request at a time reaches the profiler phase; the allocation is performed, the cloud resource pool is updated, and the next request is fetched from the queue. Including batch request processing is another possible extension of this work.

References

[1] C.G.C. Index, Forecast and Methodology, 2012-2017, white paper, Cisco Systems.

[2] C.G.C. Index, Forecast and Methodology, 2013-2018, white paper, Cisco Systems.

[3] G. Lee, Cloud Networking: Understanding Cloud-Based Data Center Networks, Morgan Kaufmann, 2014.

[4] J. Dean, S. Ghemawat, Mapreduce: simplified data processing on large clusters, Commun. ACM 51 (1) (2008) 107-113.

[5] Apache hadoop, Available from: http://hadoop.apache.org [15 May 2015]

[6] T. White, Hadoop: the definitive guide, O'Reilly Media Inc, 2009.

[7] A. Emr, Amazon elastic mapreduce, Available from: http://aws.amazon.com/elasticmapreduce/ [05 June 2015]

[8] Hop count, Available from: https://en.wikipedia.org/wiki/Hop%28networking%29 [15 July 2016]

[9] D. Borthakur, Hdfs architecture guide, Hadoop Apache Project http://hadoop.apache.org/common/docs/current/hdfsdesign.pdf.

[10] P. Mell, T. Grance, The nist definition of cloud computing, 2011.

[11] S.L. Murray, Generational development in railway informaton systems, Int. J. Eng. Sci. Technol. 16 (2) (2013).

[12] A. Mukherjee, D. De, Low power offloading strategy for femto-cloud mobile network, Eng. Sci. Technol. Int. J. 19 (1) (2016) 260-270.

[13] B. Palanisamy, A. Singh, L. Liu, B. Jain, Purlieus: locality-aware resource allocation for mapreduce in a cloud, in: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2011, p. 58.

[14] M. Alicherry, T. Lakshman, Optimizing data access latencies in cloud systems by intelligent virtual machine placement, INFOCOM, 2013 Proceedings IEEE, IEEE, 2013, pp. 647-655.

[15] B. Palanisamy, A. Singh, B. Langston, Cura: a cost-optimized model for mapreduce in a cloud, IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), IEEE, 2013, pp. 1275-1286.

[16] M. Li, D. Subhraveti, A.R. Butt, A. Khasymski, P. Sarkar, Cam: a topology aware minimum cost flow based resource manager for mapreduce applications in the cloud, Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, ACM, 2012, pp. 211-222.

[17] A. Beloglazov, J. Abawajy, R. Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Fut. Gen. Comput. Syst. 28 (5) (2012) 755-768.

[18] Y. Gao, H. Guan, Z. Qi, Y. Hou, L. Liu, A multi-objective ant colony system algorithm for virtual machine placement in cloud computing, J. Comput. Syst. Sci. 79 (8) (2013) 1230-1242.

[19] X. Li, Z. Qian, S. Lu, J. Wu, Energy efficient virtual machine placement algorithm with balanced and improved resource utilization in a data center, Math. Comput. Model. 58 (5) (2013) 1222-1235.

[20] H. Khani, A. Latifi, N. Yazdani, S. Mohammadi, Distributed consolidation of virtual machines for power efficiency in heterogeneous cloud data centers, Comput. Electr. Eng. 47 (2015) 173-185.

[21] S.E. Dashti, A.M. Rahmani, Dynamic vms placement for energy efficiency by pso in cloud computing, J. Exp. Theoret. Artific. Intell. (2015) 1-16.

[22] Z. Xiao, J. Jiang, Y. Zhu, Z. Ming, S. Zhong, S. Cai, A solution of dynamic vms placement problem for energy consumption optimization based on evolutionary game theory, J. Syst. Softw. 101 (2015) 260-272.

[23] G. Wu, M. Tang, Y.-C. Tian, W. Li, Energy-efficient virtual machine placement in data centers by genetic algorithm, Neural Information Processing, Springer, 2012, pp. 315-323.

[24] R.W. Ahmad, A. Gani, S.H.A. Hamid, M. Shiraz, A. Yousafzai, F. Xia, A survey on virtual machine migration and server consolidation frameworks for cloud data centers, J. Netw. Comput. Appl. 52 (2015) 11-25.

[25] M.H. Ferdaus, M. Murshed, R.N. Calheiros, R. Buyya, Virtual machine consolidation in cloud data centers using aco metaheuristic, Euro-Par 2014 Parallel Processing, Springer, 2014, pp. 306-317.

[26] S.S. Manvi, G.K. Shyam, Resource management for infrastructure as a service (iaas) in cloud computing: a survey, J. Netw. Comp. Appl. 41 (2014) 424-440.

[27] J.-W. Lin, C.-H. Chen, Interference-aware virtual machine placement in cloud computing systems, International Conference on Computer & Information Science (ICCIS), 2, IEEE, 2012, pp. 598-603.

[28] S. Georgiou, K. Tsakalozos, A. Delis, Exploiting network-topology awareness for vm placement in iaas clouds, Third International Conference on Cloud and Green Computing (CGC), IEEE, 2013, pp. 151-158.

[29] J.T. Piao, J. Yan, A network-aware virtual machine placement and migration approach in cloud computing, 9th International Conference on Grid and Cooperative Computing (GCC), IEEE, 2010, pp. 87-92.

[30] D. Kliazovich, P. Bouvry, S.U. Khan, Dens: data center energy-efficient network-aware scheduling, Cluster Comput. 16 (1) (2013) 65-75.

[31] D.S. Dias, L.H.M. Costa, Online traffic-aware virtual machine placement in data center networks, Global Information Infrastructure and Networking Symposium (GIIS), 2012, IEEE, 2012, pp. 1-8.

[32] M. Alicherry, T. Lakshman, Network aware resource allocation in distributed clouds, INFOCOM, IEEE, 2012, pp. 963-971.

[33] Z. Huang, D.H. Tsang, J. She, A virtual machine consolidation framework for mapreduce enabled computing clouds, Proceedings of the 24th International Teletraffic Congress, International Teletraffic Congress, 2012, p. 26.

[34] I. Takouna, R. Rojas-Cessa, K. Sachs, C. Meinel, Communication-aware and energy-efficient scheduling for parallel applications in virtualized data centers, Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, IEEE Computer Society, 2013, pp. 251-255.

[35] X. Meng, V. Pappas, L. Zhang, Improving the scalability of data center networks with traffic-aware virtual machine placement, INFOCOM, 2010 Proceedings IEEE, IEEE, 2010, pp. 1-9.

[36] N. Tziritas, C.-Z. Xu, T. Loukopoulos, S.U. Khan, Z. Yu, Application-aware workload consolidation to minimize both energy consumption and network load in cloud environments, 42nd International Conference on Parallel Processing (ICPP), 2013, IEEE, 2013, pp. 449-457.

[37] V. Mann, A. Gupta, P. Dutta, A. Vishnoi, P. Bhattacharya, R. Poddar, A. Iyer, Remedy: Network-aware steady state vm management for data centers, NETWORKING 2012, Springer, 2012, pp. 190-204.

[38] M. Korupolu, A. Singh, B. Bamba, Coupled placement in modern data centers, Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, IEEE, 2009, pp. 1-12.

[39] L. He, D. Zou, Z. Zhang, C. Chen, H. Jin, S.A. Jarvis, Developing resource consolidation frameworks for moldable virtual machines in clouds, Fut. Gen. Comput. Syst. 32 (2014) 69-81.

[40] T. Shabeera, S. Madhu Kumar, Optimising virtual machine allocation in mapreduce cloud for improved data locality, Int. J. Big Data Intell. 2 (1) (2015) 2-8.

[41] B. Di Martino, R. Aversa, G. Cretella, A. Esposito, J. Kolodziej, Big data (lost) in the cloud, Int. J. Big Data Intell. 1 (1) (2014) 3-17.

[42] H. Herodotou, S. Babu, Profiling, what-if analysis, and cost-based optimization of mapreduce programs, Proc. VLDB Endowment 4 (11) (2011) 1111-1122.

[43] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F.B. Cetin, S. Babu, Starfish: A self-tuning system for big data analytics, in: CIDR, Vol. 11, 2011, pp. 261-272.

[44] R.M. Karp, Reducibility among combinatorial problems, Springer, 1972.

[45] D. Pisinger, Where are the hard knapsack problems?, Comput. Operat. Res. 32 (9) (2005) 2271-2284.

[46] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, WH Freeman & Co., San Francisco, 1979.

[47] S. Martello, P. Toth, Knapsack problems: algorithms and computer implementations, John Wiley & Sons Inc, 1990.

[48] C. Solnon, D. Bridge, An ant colony optimization meta-heuristic for subset selection problems, in: System Engineering using Particle Swarm Optimization, Nova Science, 2006, pp. 7-29.

[49] B. Brugger, K.F. Doerner, R.F. Hartl, M. Reimann, Antpacking - an ant colony optimization approach for the one-dimensional bin packing problem, in: Evolutionary Computation in Combinatorial Optimization, Springer, 2004, pp. 41-50.

[50] C. Solnon, S. Fenet, A study of aco capabilities for solving the maximum clique problem, J. Heurist. 12 (3) (2006) 155-180.

[51] M. Dorigo, M. Birattari, Ant colony optimization, in: Encyclopedia of Machine Learning, Springer, 2010, pp. 36-39.

[52] M. Dorigo, M. Birattari, T. Stutzle, Ant colony optimization, Computational Intelligence Magazine, IEEE 1 (4) (2006) 28-39.

[53] T. Stutzle, M. Lopez-Ibanez, P. Pellegrini, M. Maur, M.M. De Oca, M. Birattari, M. Dorigo, Parameter adaptation in ant colony optimization, in: Autonomous Search, Springer, 2011, pp. 191-215.

[54] S. Fenet, C. Solnon, Searching for maximum cliques with ant colony optimization, Applications of Evolutionary Computing, Springer, 2003, pp. 236-245.

[55] R.N. Calheiros, R Ranjan, A. Beloglazov, C.A. De Rose, R. Buyya, Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software: Practice and Experience 41 (1) (2011) 23-50.

[56] T. Benson, A. Akella, D.A. Maltz, Network traffic characteristics of data centers in the wild, Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, ACM, 2010, pp. 267-280.

[57] M. Al-Fares, A. Loukissas, A. Vahdat, A scalable, commodity data center network architecture, ACM SIGCOMM Computer Communication Review 38 (4) (2008) 63-74.