
Resource Allocation in a Network-Based Cloud Computing Environment
(Research Proposal)


Table of Contents

I. Chapter 1: Introduction
II. Chapter 2: Research Problem 1: Resource Allocation in a Network-Based Cloud Computing Environment
    2.1 Related Work
        2.1.1 Efforts with a Focus on Data Center Processing Resources
        2.1.2 Efforts with a Focus on Data Center Network Resources
    2.2 Motivation
        2.2.1 A Comprehensive Solution for Network/Processing Resource Allocation
    2.3 Network-Aware Resource Allocation: Methodology and Design Challenges
        2.3.1 Research Strategy/Methodology
        2.3.2 Methodology for External Challenges
        2.3.3 Methodology for Internal Challenges
    2.4 Objective
        2.4.1 Static Case
        2.4.2 Dynamic Case
    2.5 Preliminary Results
        2.5.1 Detailed Formulation
        2.5.2 Client Request Types
III. Chapter 3: Research Problem 2: Energy Efficient Network-Based Resource Allocation
    3.1 Related Work
    3.2 Motivation
        3.2.1 A Comprehensive Solution for Energy Efficient Network-Based Resource Allocation
    3.3 Energy Efficient Network-Based Resource Allocation: Methodology and Design Challenges
        3.3.1 Research Strategy/Methodology
        3.3.2 Common Solutions and Common Trade-offs
        3.3.3 Energy Consumption vs. Optimal Performance: Hardware Contradictions
        3.3.4 Idle State as a Major Source of Wasted Power
    3.4 Objective
        3.4.1 Static Case
        3.4.2 Dynamic Case
IV. Bibliography

Chapter 1

Introduction
Cloud computing is an increasingly popular computing paradigm, now proving a necessity for
utility computing services. Several providers offer cloud computing (CC) solutions in which a
pool of virtualized, dynamically scalable computing power, storage, platforms, and services is
delivered on demand to clients over the Internet in a pay-as-you-go manner. This is
implemented using large data centers (DCs) in which thousands of servers reside. Clients can
choose between private clouds, which are data centers specialized for the internal needs of a
single business organization, and public clouds, which are open to the public over the Internet
on a pay-as-you-go or pay-per-demand basis. A practice gaining momentum is "surge
computing", in which clients use their private clouds primarily and outsource tasks to public
clouds only when the private clouds are saturated [32][1]. The plethora of clients moving to the
cloud partially or fully has attracted leading IT providers with solid computation technology and
software bases, such as Google, Microsoft, and Amazon, to implement diverse solutions.
Services are offered under several deployment models:


Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS),
and Network as a Service (NaaS), which is sometimes referred to as Software Defined
Networking (SDN) [18]. Each provider offers a unique service portfolio with a range of options
that include virtual machine (VM) instance configurations, the nature of network services, the
degree of control over the rented machine, supporting software/hardware security services,
additional storage, etc. More recently, the emphasis has shifted to more comprehensive
solutions. To move completely to the cloud, clients demand guarantees with regard to achieving
the required improvements in scale, cost control, and reliability of operations. Despite its
importance, providing computation power alone is not a sufficient competitive advantage. Other
factors have gained more weight recently, such as networking solution offerings. Network
performance and resource availability can be the tightest bottleneck for any cloud. As shown in
[1], an under-performing network can delay applications with heavy data requirements to the
degree that sending data using a mail courier becomes the more viable option. This is seen as
an opportunity for network service providers, many of whom are planning and building their own
clouds using a distributed cloud (D-Cloud) architecture [31].

Here we see the need for a comprehensive resource allocation and scheduling system for
cloud computing data center networks. This system would handle all the resources in the cloud
provider's data center network; it would manage client requests, dictate resource allocation,
ensure network QoS conditions, and eliminate performance hiccups. It would execute these
tasks while minimizing the service provider's cost and controlling the level of consumed energy.

The diversity of instance types across multiple geographically distributed data centers makes
resource management an even more complicated matter. Nevertheless, managing the data
centers' server and network resources, while scheduling and serving tens of thousands of client
requests on virtual machines residing on data center servers, is a critical success factor. First, it
is a main revenue source for the service provider, as excess resources translate directly to
revenue. Second, it is a key point that will make or break potential clients' decisions to move
fully to the cloud.

Chapter 2

Research Problem I:
Resource Allocation in a Network-Based
Cloud Computing Environment
In the first problem, we propose a model for network-based cloud computing environments.
This consists of a mixed network of public and private clouds where clients have the freedom to
use public cloud resources on demand. Clients have the option to reserve virtual machines of
multiple types, where the types are based on the functionality or primary application of the VMs
(high-memory VMs, high-CPU VMs, etc.). Clients also have the ability to request data
connections to move data between their private clouds, from a private cloud to a public cloud,
or in the other direction. For connection requests, clients define the connection requirements,
such as source, destination, requested start time, duration, and performance or QoS
constraints. This can be done in an advance reservation manner (i.e., the requested connection
start time is in the future) or an immediate reservation manner (i.e., the requested connection
start time is equal to the request arrival time, or as soon as the network controller can schedule
it).
In our work, we aim to couple resource allocation with the concepts of software defined
networking (SDN). SDN is a networking paradigm in which the forwarding behaviour of a
network element is determined by a software control plane decoupled from the data plane [34].
SDN leads to many benefits, such as increasing network and service customizability,
supporting improved operations, and increasing performance [35][36]. The software control
plane can be implemented using a central network controller. We propose that the central
controller handle the task of resource allocation in the data center network. This can be done
by directing all client requests to the SDN controller, which executes the resource allocation
algorithms and then sends the allocation commands across the network.
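
To make this request path concrete, the following is a minimal sketch of the proposed flow, not
an existing controller API: the class names, the pluggable allocator, and the network handle are
placeholders we introduce for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    kind: str          # "vm", "client-to-VM", or "VM-to-VM" (see Section 2.5.2)
    source: str
    destination: str
    start_time: float  # requested start time (advance or immediate reservation)
    duration: float

class CentralController:
    """Central SDN controller: runs allocation, then pushes commands."""

    def __init__(self, allocator, network):
        self.allocator = allocator  # pluggable resource allocation algorithm
        self.network = network      # handle used to push forwarding commands

    def handle(self, request: Request) -> str:
        # all client requests are directed here (the decoupled control plane)
        allocation = self.allocator.allocate(request)
        if allocation is None:
            return "blocked"
        # send the allocation commands across the network (data plane update)
        for command in allocation:
            self.network.push(command)
        return "scheduled"
```
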

Figure 2.1: A sample network of private and public clouds connected through the Internet or VPNs

2.1 Related Work

2.1.1 Efforts with a Focus on Data Center Processing Resources:

The problem of resource allocation in a CC environment has been discussed before. Multiple
models have been proposed in which resources are scheduled based on user requests. In [3], a
queuing model is proposed where a client requests virtual machines for a predefined duration.
VM pre-emption, and both single and multiple job queues, are considered. Network resources
are not considered at all: jobs are assumed not to communicate with each other or transmit or
receive data, and no preference is expressed as to where the VMs are to be scheduled. In [7],
an algorithm is proposed to optimally distribute VMs in order to minimize the distance between
user VMs in a data center grid. The only network constraint used is the Euclidean distance
between data centers; no specific connection requests or user differentiation is used. In the
same paper, an algorithm is proposed to schedule VMs on racks, blades, and processors within
one data center to minimize communication cost. In [6], three consecutive phase queues are
used to schedule prioritized job categories. No network topology is used; rather, only the
monetary cost of transmitting data is considered for network requests. The proposed heuristic's
results are compared to the equal allocation method, where resources are divided equally
between computation, scheduling, and transmission tasks.

2.1.2 Efforts with a Focus on Data Center Network Resources:

In [4], the authors tackle the problem where a client may have multiple jobs being processed at
the same time, but not necessarily on the same server. These requests are abstracted as a
virtual network (VN), where every request (placed on a separate VM) is considered a node and
the path between two nodes is considered a link (edge) in the VN. The problem then becomes
one of provisioning a virtual network, and a revenue maximization objective is introduced. The
problem is formulated as a mixed integer linear program, and a genetic algorithm is proposed
as a heuristic method to solve it. The time factor is not provisioned for, since no reservation
start time or duration is introduced. Also, users are assumed to request computation and
network resources in one step; the scenario where a user requests more connectivity for an
already reserved VM is not considered. In [8], the authors tackle the problem of proposing the
best virtual network over an IP over wavelength division multiplexing (WDM) network. Only
network demands are considered, with three different network demand profiles evaluated;
constraints are based on propagation delay, flow conversion, and capacity. User demand for
processing resources is not considered. The authors target minimizing the network power
consumption by minimizing the number of network components that are switched on.

2.2 Motivation
2.2.1 A Comprehensive Solution for Network/Processing Resource Allocation
Provisioning for cloud services in a comprehensive way is of crucial importance to any resource
allocation model. Any model should consider both computational and network resources to
accurately represent practical needs. First, excluding the processing (computational) resources
while designing the RA model deprives the model of the main and most important cloud
service. Cloud data centers are built first and foremost as a way to outsource computational
tasks. Any model that optimizes data center resources should answer questions like: How are
VMs allocated? How are processing resources modeled? What is the resource portfolio being
promoted to clients? How are the data center resources distributed physically? [10]. The other
side of the coin is networking services. Network services are the backbone of cloud
computation services. As clients ask for tasks to be processed in the data center, they need
networking service with adequate QoS standards to send and receive their application data.
Network services seem to be getting less attention than necessary, and "bandwidth bottlenecks
loom large in the cloud" [40]. In the report prepared by Jim Frey [41], only 54% of the IT
professionals surveyed about their use of cloud services indicated that they involve network
engineering/operations personnel, down from 62% in 2009 [41]. This directly affects the
implementation of network best practices and the attention paid to the health of overall traffic
delivery. Frey mentions that "only 28% of survey respondents believe collecting packet traces
between virtual machines for monitoring and troubleshooting is absolutely required. And only
32% felt that collecting data about traffic from virtual switches for monitoring and
troubleshooting is absolutely required". There is a clear lack of insight into how the network is
performing.
This oversight does not only affect performance; bandwidth costs deeply affect cloud clients'
financial structure too. A study performed by the authors of [11] shows that a client who
downloads a relatively small amount of 10 GB per day (1.83 Mb/s, assuming all clients are in
North America) would be charged $30/month when using MS Azure. This works out to roughly
$16 per Mb/s of sustained throughput, while the market price is around $8 per Mb/s [11]. This
margin more than covers the provider's operational network costs. Therefore, optimizing
bandwidth cost represents a profit opportunity for providers and a saving opportunity for clients.
The importance of network resources in the cloud market has pushed network service
providers [33] to start building their own distributed data centers, with a vision of entering the
cloud computing market. Their idea is based on replacing a large data center with multiple
smaller data centers, with the aim of being closer to clients. The report explains: "cloud is about
delivering multi-tenant IT services. [Network] Service providers already know how to sell
multi-tenant communications services. Important advantages will come from service providers'
ability to deliver services that tie the network together with compute and storage. The network
infrastructure effectively becomes a distributed cloud that helps to reduce costs and increase
service differentiation. Service providers can offer compute and storage service options as well
as network capabilities that ensure application performance in the cloud" [33]. The report
summarizes a group of very promising savings (on both the CAPEX and OPEX sides) for
network service providers when they build their own distributed clouds and use them to operate
network and cloud services for their clients.

CAPEX savings in the cloud:
1- Investments in network hardware that can be virtualized can be reduced by 25% to 80%.
2- Virtualization of customer premises equipment can deliver 30% savings.
3- Base station virtualization can reduce civil works costs by 60%.
4- Incremental capital investment for adding a subscriber can be reduced by 70%.

OPEX savings in the cloud:
1- Data center operations costs can be reduced by 40%.
2- Services operations costs can be reduced by 25%.
3- Network planning and engineering expenses can be reduced by 20%.
4- 100% savings in energy and real estate expenses are achieved for the network elements
eliminated from the network through hardware reduction; a 90% reduction in maintenance
charges and a 45% reduction in network operations are also realized for these elements.
5- Base station virtualization leads to 60% savings in site rental costs and a 50% reduction in
power consumption expenses.

The Alcatel-Lucent researchers also found a connection between internal transformation and
incremental revenue opportunities in the cloud.
As mentioned earlier, a cloud service provider has to cater network services to clients to
support one of three functions:

A- Connecting a client's private cloud to the VMs the client has reserved in the data centers;
this could be done over the Internet or VPNs, as shown in Figure 2.1.
B- Connecting VMs on different public clouds together, to facilitate data exchange between two
VMs reserved by the same client.
C- Connecting VMs on the same public cloud together.

It is of no use to clients if their application produces the required results on time but those
results cannot be delivered to the client base through a stable network connection. In [1], data
transfer bottlenecks are identified as one of the main obstacles facing cloud client base growth.
The authors show that when moving large amounts of data in a distributed data center
environment, network service performance is critical to the whole process. In the example
given, the authors reach the conclusion that for some configurations, data transmission
tardiness would cause the client to prefer options like sending data disks with a courier (FedEx,
for example).
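
A rough back-of-the-envelope calculation illustrates the point made in [1]; the data size and
sustained rate below are placeholder values of our own, not figures from that paper.

```python
def wan_transfer_days(data_tb: float, rate_mbps: float) -> float:
    """Days needed to move data_tb terabytes at a sustained rate in Mb/s."""
    bits = data_tb * 1e12 * 8
    return bits / (rate_mbps * 1e6) / 86400

# 10 TB over a sustained 20 Mb/s WAN path takes about 46 days,
# while a courier delivers disks of any size in a day or two.
print(f"{wan_transfer_days(10, 20):.0f} days")
```
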

2.3 Network-Aware Resource Allocation: Methodology and Design Challenges

Targeting a network-aware resource allocation system brings to the fore multiple challenges
that have faced the cloud computing community. Addressing these issues is of utmost
importance in forming a complete solution. We can classify these design challenges into two
categories:
External challenges: enforced by factors outside the resource allocation process
1- Regulative and Geographical challenges
2- Charging model issues

Internal challenges:
1- Data locality: an opportunity and a challenge (combining compute and data management)
2- Reliability of network resources inside a data center
3- SDN design challenges inside the data centers

Before we discuss these challenges one by one, we briefly outline the general research
strategy that we will use.

2.3.1 Research Strategy/Methodology

Our first goal is to focus on the design and validation of a cloud resource allocation system that
tackles these challenges. We will execute a well-defined set of tasks that leverage optimization
techniques, graph theory, stochastic analysis, and data center network simulation. We will
generally follow these steps, which may differ slightly from phase to phase throughout the
project:

1- In our work, we will not assume any specific networking substrate technology. Instead, we
will build a solution for a generalized network.
2- Also, no specific restrictions on the data center architecture or the build/type of servers will
be imposed.
3- As a preliminary step, we will simulate the data center environment in order to get the key
research components up and running.
4- Next, for each of these challenges, we will adjust the environment parameters in order to
isolate the dominant factor that causes the challenge.
5- Then, exploratory solutions will be tested and their effectiveness recorded.
6- As we tackle these challenges one by one, we will gain a better understanding of the data
center's moment-to-moment dynamics. This will help in pursuing and constructing a complete
solution that gets the best combined result for all the issues, as per the needed combination
function.

2.3.2 Methodology for External Challenges

1- Regulative and geographical challenges
In the virtualization model used in cloud offerings, the client does not manage the physical
location of data. There is no guarantee given by the provider as to the data's physical location
at a given moment [1]. In fact, it is common practice to distribute client data over multiple
geographically distant data centers, with the Internet as the communication medium more often
than not. Splitting the data enhances fault tolerance, but it presents regulative and security
challenges. An example is the regulative obligation of complying with the U.S. Health Insurance
Portability and Accountability Act (HIPAA) (the Health Information Protection Act (HIPA) is
Canada's version). Both were enacted to ensure security and confidentiality of personal health
information in order to protect patient privacy [37].

Although HIPAA does not apply directly to third-party service providers, it is imperative that
health care organizations require third-party providers to sign contracts obliging them to handle
all patient data in adherence with HIPAA standards.

Complying with HIPAA raises some constraints on handling and storing data:
A- Geographical constraints: HIPAA requires that patient data does not leave US soil [38]. This
constraint limits the choice of data centers to which a VM can be allocated, and restricts data
movement manoeuvres while trying to optimize performance. Additionally, "When data is stored
in the cloud, you need to make sure there is a way for you to know exactly where the data is
physically stored, how many copies of the data have been made, whether or not the data has
been changed, or if the data has been completely deleted when requested." [39]
B- Client actions: To get more assurance about data security, clients may require guarantees
like instant wiping (writing over the data byte by byte) instead of deletion. They might also
require storing encrypted data on the cloud [39]. This poses extra pressure on performance
and makes it harder to comply with QoS requirements.
C- Under HIPAA, patients have a right to access any information stored about them [38]. A
careful study of the locations of the patients and their usage distribution is therefore crucial for
the resource allocation system: considering this factor when placing the data would minimize
the distance patient data travels in the network. Here, deciding where the data is to be located
has a direct effect on minimizing cost.
2- Charging model issues
The resource management system should incorporate the client's charging model. For
example, when using Amazon EC2, a client has the choice to pay for instances completely on
demand, to reserve an instance under a term contract (1 year or 3 years), or to choose spot
instances, which enable bidding for unused Amazon EC2 capacity. The spot instance price is
set by Amazon EC2 and fluctuates periodically depending on the supply of, and demand for,
spot instance capacity [2] (a sketch contrasting these charging models follows this list). We
would like to investigate issues like:
A- Finding the service portfolio design/offering that maximizes the revenue weight of excess
resources in the data center. Examining the options available in the market, and from the
example above, we observe that cost is not calculated based on static consumption.
B- Finding the best way to integrate virtual network usage into the cost analysis. Challenges
arise because a virtual link's length/distance (and in turn cost) varies from link to link. A virtual
link could even change to use another physical path on the substrate network, depending on
the methodology used. More detailed examples are found in [4].
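
As a starting point for this investigation, the sketch below contrasts the three EC2-style
charging models named above; all prices are placeholder values introduced for illustration, not
actual Amazon rates.

```python
def on_demand_cost(hours: float, hourly_rate: float) -> float:
    """Pay only for the hours used, at the full on-demand rate."""
    return hours * hourly_rate

def reserved_cost(hours: float, upfront: float, discounted_rate: float) -> float:
    """Pay an upfront term fee plus a discounted hourly rate."""
    return upfront + hours * discounted_rate

def spot_cost(hourly_prices: list, bid: float) -> float:
    """Spot instances run (and are billed) only while the fluctuating
    market price stays at or below the client's bid."""
    return sum(p for p in hourly_prices if p <= bid)

month = 720  # hours
print(on_demand_cost(month, 0.10))                    # 72.0
print(reserved_cost(month, 30.0, 0.06))               # 73.2
print(spot_cost([0.03, 0.12, 0.05] * 240, bid=0.06))  # 19.2
```
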

2.3.3 Methodology for Internal Challenges

1- Data locality: an opportunity and a challenge (combining compute and data management)
There is a pressing need for systems to implement data locality features "the right way". By
that, we mean combining the management of compute (processing) and data (network)
resources, using data locality features, to minimize the amount of data movement and in turn
improve application performance and scalability, while meeting end users' performance and
security concerns. It is very important to schedule computational tasks close to the data, and to
understand the cost of moving the work as opposed to moving the data [10] (a sketch of this
comparison follows the list below).

To have a full view of how to use data locality, we need to investigate the following:
A- A data-aware scheduler is critical to achieving good scalability and performance. A more
specific perspective needs to be reached; questions to investigate include: How much would
the scheduler know at a given moment? What are the policies and decision criteria for moving
data? What data integration policies should be enforced?
B- Analyzing the behaviour of data intensive applications is a very good starting point for
understanding data locality and data movement patterns.
C- Another idea to investigate is moving the application itself to servers in the data center
where the needed data resides. This raises questions about the availability of servers in the
other data center, policy/algorithm specifications on when to move (considering that future
demand might need data stored in the original location), and decision criteria on whether to
migrate the whole VM or just move the concerned application.
D- As discussed previously, regulative and security issues should be considered while moving
data to increase data locality.
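
The following sketch makes the trade-off in items B and C concrete: under placeholder sizes,
rates, and penalties of our own choosing, moving a small VM image to the data can beat
moving a large data set to the VM.

```python
def transfer_seconds(size_gb: float, link_rate_gbps: float) -> float:
    """Time to push size_gb gigabytes over a link of link_rate_gbps."""
    return size_gb * 8 / link_rate_gbps

# moving 500 GB of input data to the compute node...
move_data_s = transfer_seconds(500, 1.0)
# ...vs. migrating an 8 GB VM image to where the data already is,
# plus an assumed one-minute penalty for re-provisioning at the new site
move_work_s = transfer_seconds(8, 1.0) + 60
print(f"move data: {move_data_s:.0f} s, move work: {move_work_s:.0f} s")
```
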
2- Reliability of network resources inside a data center
While discussions of the network from data center to data center are abundant, discussions of
the data center's internal network are often neglected. It is vital for consumers to obtain
guarantees on service delivery inside a data center. We aim to investigate:
A- The design of a resilient virtual network inside a data center.
B- Ways to design a data center network that provides different types of service level
agreements (SLAs) that can be negotiated between providers and consumers. These SLAs
must be enforced by the resource manager. The network controller inside the data centre
should always have updated and precise network and cloud resource usage information, and
the resource manager should employ fast and effective optimization algorithms.
C- A critical subject to be analyzed here is the trade-off between complexity and efficiency.
3- SDN design challenges inside the data centers
As discussed earlier, using the new concepts of software defined networking would enhance
network performance. But since SDN is a relatively new idea, the community has yet to tackle
these issues in depth:
A- Reliability: As a centralized method, using SDN controllers affects reliability if the controller
fails. Although solutions like standby controllers or using multiple controllers for the network
have been suggested, practical investigation is needed to reveal the problems, find the decision
criteria, and analyze the trade-offs of such solutions.
B- Scalability: [21] determines that when the network scales up in the number of switches and
end hosts, the SDN controller can become a key bottleneck. For example, [22] estimates that a
large data center consisting of 2 million virtual machines may generate 20 million flows per
second, but current controllers can support about 10^5 flows per second in the optimal case
[18]. Scaling beyond this point results in losing visibility of the network traffic, making
troubleshooting nearly impossible.
C- Visibility: Prior to SDN, the network team could quickly spot, for example, that a backup was
slowing the network; the solution would then be to simply reschedule it to after hours.
Unfortunately, with SDN only a tunnel source and a tunnel endpoint with UDP traffic are visible;
one cannot see who is using the tunnel. There is no way to determine whether the problem is
the replication process, the email system, or something else. The true top talker is shielded
from view by the UDP tunnels, which means that when traffic slows and users complain,
pinpointing the problem area in the network is not possible. With this loss of visibility,
troubleshooting is hindered, scalability is decreased, and a delay in resolution could be quite
detrimental to the business [19][23].

D- Controller placement: The controller placement problem influences every aspect of a
decoupled control plane, from state distribution options to fault tolerance to performance
metrics. The problem covers both where to place controllers with respect to the available
network topology and how many controllers are needed. Placement is evaluated against
metrics defined by the user, which can range from latency to the number of nodes covered, etc.
According to [20], random placement for a small number k of medians results in an average
latency between 1.4x and 1.7x larger than that of the optimal placement.
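
A brute-force evaluation of the k-median view of the problem in [20] can be sketched as
follows; the toy topology and latencies are placeholders.

```python
import itertools

def avg_latency(dist, controllers):
    """Average over nodes of the latency to their nearest controller."""
    return sum(min(dist[n][c] for c in controllers) for n in dist) / len(dist)

def best_placement(dist, k):
    """Optimal k-controller placement by exhaustive search (small nets only)."""
    return min(itertools.combinations(dist, k),
               key=lambda cs: avg_latency(dist, cs))

# dist[a][b]: propagation latency between nodes a and b (placeholder values)
dist = {"a": {"a": 0, "b": 1, "c": 4},
        "b": {"a": 1, "b": 0, "c": 2},
        "c": {"a": 4, "b": 2, "c": 0}}
print(best_placement(dist, 1))  # ('b',) minimizes average latency here
```
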
As a result, cloud clients will see network service specifications as a decisive factor in their
choice of whether to move to the cloud and of which cloud provider to use. Factors like
bandwidth options, port speed, number of IP addresses, load balancing options, and availability
of VPN access, among others, should be considered by any comprehensive model.

The proposed model tackles the resource allocation challenges faced when provisioning
computational resources (CPU, memory), storage, and network resources. Advance or
immediate reservation requests for resources are sent by clients. A central controller manages
these requests with the objective of minimizing average tardiness and request blocking. The
solution aims at solving the provider's cost challenges and the cloud applications' performance
issues. To the best of our knowledge, this proposal is the first to address the resource
allocation problem in cloud computing data centers in this form.

The proposed model schedules network connection requests for the network between data
centers and client nodes. A future step will be to tackle scheduling for internal data center
networks.

2.4 Objective
We have two cases in which we can tackle this problem:

2.4.1 Static Case

Here, the system configuration, data center design parameters, and the network topology are
known. All client requests are also known in advance for a certain period, and the objective
becomes to maximize the capacity results of the data center network. The problem will be
modeled using optimization theory. The main objective is to find the maximum performance
point. This could take the shape of the maximum number of client requests served, the
minimum blocking percentage, the minimum average request service tardiness (latency), or the
maximum revenue generated for a certain period. Another important objective is to understand
the dynamics of the system and find the bottlenecks in the network and computational
resources; for this, a detailed tracking of the data center network resources inside and outside
the data center will be used.

2.4.2 Dynamic Case

In this case, client requests are not known in advance. The objective is to get the best possible
result as measured against the optimal case. We aim at finding heuristic algorithms that
achieve good levels of performance in dynamic scenarios. So far, we have suggested five
heuristics that can be used in the process of network-based resource allocation and
scheduling. Detailed steps and comparison results for each are in the attached paper. We aim
at enhancing the functionality and improving the performance of these algorithms, along with
suggesting more heuristic algorithms that serve the ideas discussed earlier.

2.5 Preliminary Results

As a first step, we modeled the optimization problem of minimizing the average tardiness of all
advance reservation connection requests. In this case, we try to reach the least average
tardiness possible regardless of which path is used, while satisfying the requirements of the
different clients' virtual connection requests.

2.5.1 Detailed Formulation

First, the underlying physical network of data centers and client nodes can be described as
N(Q, L, M), where:
Q is the set of servers in all the data centers. L is the set of physical links; we assume each link
is divided with a granularity set before the experiment, and all links are bidirectional. M is the
server resource matrix. The server resources we consider here are memory size, number of
CPUs, and storage. Therefore, M_q1 is the amount of memory units on server q, M_q2 is the
number of CPUs on server q, and M_q3 is the amount of storage units on server q.

2.5.2 Client Request Types

As discussed earlier, client requests are divided into three categories: 1- a request to reserve
or create a VM, with the VM type defined in the request; 2- a request to establish a connection
between a client node and a VM; 3- a request to establish a connection between a VM and
another VM. To describe VM reservation requests, we use the following notation: V is the set of
vertices, representing the requested VMs. K is the set that describes the amount of resources
needed for every requested virtual machine; K_vm is the amount of resource m requested by
VM v. For example, the memory required by VM v is K_v1 = 7 GB.

We also use the set P, which represents the set of all paths that can be used in the network.
The connection requests come as part of the problem. Every connection request i specifies a
source s_i, a destination d_i, a requested start time r_i, and a connection duration (time
needed) t_i. F_i represents our scheduled starting time for connection request i.

Other variables required for the problem formulation include: a_lp, which is 1 if link l is on path
p and 0 otherwise; b_qcp, which is 1 if path p is an alternate path from server q to server c and
0 otherwise; and TARD, which represents the allowed tardiness level, measured in time units,
for the experiment.
This is modeled as a MILP as follows (where $h$ is a sufficiently large constant, e.g., the
scheduling horizon):

$$\min \sum_{i} (F_i - r_i) \qquad (1)$$

Subject to:

$$\forall v \in V:\ \sum_{q \in Q} X_{vq} = 1 \qquad (2)$$

$$\forall i:\ \sum_{p \in P} Y_{ip} = 1 \qquad (3)$$

$$\forall q \in Q,\ \forall m:\ \sum_{v \in V} X_{vq} K_{vm} \le M_{qm} \qquad (4)$$

$$\forall i,\ \forall q, c \in Q,\ \forall p \in P:\ 3Y_{ip} - \left( X_{s_i q} + X_{d_i c} + b_{qcp} \right) \le 0 \qquad (5)$$

$$\forall i \ne j,\ \forall l \in L:\ \sum_{p \in P} \left( t_i\, a_{lp} Y_{ip} + h\, a_{lp} Y_{ip} + h\, a_{lp} Y_{jp} \right) + F_i - F_j + h\, W_{ij} \le 3h \qquad (6)$$

$$\forall i \ne j:\ W_{ij} + W_{ji} = 1 \qquad (7)$$

$$X_{vq},\ Y_{ip},\ W_{ij} \in \{0, 1\} \qquad (8)$$

$$\forall i:\ F_i - r_i \ge 0 \qquad (9)$$

$$\forall i:\ F_i - r_i \le TARD \qquad (10)$$

$$\forall i:\ F_i,\ r_i \ge 0 \qquad (11)$$

for all constraints: $v \in V$, $p \in P$, $q, c \in Q$.

In (2), we ensure that a VM is assigned to exactly one server. In (3), we ensure that a
connection request is assigned exactly one physical path. In (4), we guarantee that VM
resource requirements do not exceed those of the servers they reside on. In (5), we ensure
that a connection is established only on one of the alternate legitimate paths from one VM to
another. In (6) and (7), we ensure that at most one request can be scheduled on a given link at
a time, and that no other request is scheduled on that link until the duration is finished. In (9),
we ensure that no request is scheduled before its requested start time, and in (10) that the
scheduled time for a request falls within the tardiness window allowed in this experiment.
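
As an illustration of how this formulation translates into code, the following is a minimal sketch
of the VM-placement core of the model, constraints (2), (4), and (8) with a dummy objective,
using the open-source PuLP solver (our choice of tool, not part of the proposal); the scheduling
constraints (3), (5)-(7) and the tardiness objective are omitted for brevity, and all capacities and
demands are placeholder values.

```python
import pulp

Q = ["q1", "q2"]        # servers
V = ["v1", "v2", "v3"]  # requested VMs
RES = [1, 2, 3]         # m = 1: memory, 2: CPUs, 3: storage

# M[q][m]: capacity of server q; K[v][m]: demand of VM v (placeholder values)
M = {"q1": {1: 16, 2: 8, 3: 500}, "q2": {1: 32, 2: 16, 3: 1000}}
K = {"v1": {1: 7, 2: 2, 3: 100}, "v2": {1: 8, 2: 4, 3: 200},
     "v3": {1: 16, 2: 8, 3: 300}}

prob = pulp.LpProblem("vm_placement", pulp.LpMinimize)

# X[v][q] = 1 if VM v is placed on server q -- binary, as in (8)
X = pulp.LpVariable.dicts("X", (V, Q), cat="Binary")

prob += 0  # dummy objective; the full model minimizes sum_i (F_i - r_i)

for v in V:  # constraint (2): each VM goes to exactly one server
    prob += pulp.lpSum(X[v][q] for q in Q) == 1

for q in Q:  # constraint (4): no server resource is over-committed
    for m in RES:
        prob += pulp.lpSum(X[v][q] * K[v][m] for v in V) <= M[q][m]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for v in V:
    for q in Q:
        if pulp.value(X[v][q]) > 0.5:
            print(f"{v} -> {q}")
```
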

Chapter 3

Research Problem II:
Energy Efficient Network-Based Resource
Allocation
As the number and average size of data centers expand, so does their power consumption.
Electricity used by servers doubled between 2000 and 2005, from 12 billion to 23 billion
kilowatt-hours [9]. This is not only due to the increasing number of servers per DC; the energy
consumption of individual servers has increased too. Before the year 2000, servers drew on
average about 50 watts of electricity; by 2008, they were averaging up to 250 watts [9]. The
increase in energy consumption is of major concern to data center owners because of its effect
on operational cost. It is also a major concern of governments because of the increase in data
centers' carbon footprint. As cloud service providers aim for and expect, the cloud client base is
expanding by the day, and this demand will lead to building new data centers and to developing
current data centers to include more servers and to upgrade existing servers with more
functionality and higher power draw. Power-related costs are estimated to represent
approximately 50% of data center operational cost, and they are growing faster than other
costs like server or network hardware costs [24]. Thus, energy consumption is proving to be a
major obstacle that would limit providers' ability to expand. Recently, the response to this fact
can be seen in the practical landscape, as major players in the cloud market take more serious
steps. Companies as large as Microsoft and Google aim to deploy new data centers near cheap
power sources to mitigate energy costs [24]. Leading computing service providers have formed
a global consortium known as The Green Grid, which aims at tackling this challenge by
advancing energy efficiency in data centers [13]. This is also pushed by governments in an
attempt to decrease carbon footprints and the effect on climate change. For example, the
Japan Data Center Council was established by the Japanese government to mitigate the
soaring energy consumption of data centers in Japan [17].

We intend to investigate available data center energy consumption optimization models.
Moreover, we will provide a model of our own to tackle the problem of energy consumption in
distributed clouds. This will be integrated with our solution to the first problem. We aim at
providing an algorithm to distribute cloud resources in a way that minimizes the energy they
consume. The resources involved will include not just data center computational resources
(processors, memory, disks, etc.) but also the power consumption of the cloud network
resources: power needed to establish connections between virtual machines in different data
centers, or in the same data center, will be provisioned for. There is a lack of a comprehensive
solution for energy efficient network-based resource allocation, as most existing solutions
concentrate on the architecture and power usage of the computational resources.

3.1 Related Work

Multiple solutions have been proposed with the aim of reaching an energy efficient resource
allocation scheme. A common concept is the idea used in [15], where an algorithm is proposed
to consolidate different applications on cloud computing data center servers. The idea is to
consolidate tasks or VMs on the smallest number of servers and then switch the unused
servers off or change them to the idle state. The problem is modeled as a bin-packing problem,
with the assumption that the servers are the bins and that they are full when their resources
reach a predefined optimal utilization level. This utilization level is calculated and set
beforehand. The resources used are processor and disk space. A heuristic is used to allocate
workloads to servers (bins); this heuristic tries to maximize the Euclidean distance between the
current allocations of the servers and the optimal point of each server. There were no
comparisons to the optimal solution, and power consumption by network components is not
considered. Another issue is that it is debatable whether an optimal point for each server can
be based on utilization alone, without considering the type of application.
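
To make the idea concrete, here is a minimal sketch of the heuristic as summarized above (the
exact rule in [15] may differ); the optimal point and utilization numbers are placeholders.

```python
import math

OPTIMAL = (0.7, 0.5)  # assumed optimal (cpu, disk) utilization point per server

def place(workload, servers):
    """Pick the feasible server maximizing the Euclidean distance between
    its post-placement utilization and the optimal point, per the text."""
    best, best_dist = None, -1.0
    for s in servers:
        cpu = s["cpu"] + workload[0]
        disk = s["disk"] + workload[1]
        if cpu > 1.0 or disk > 1.0:        # this "bin" would overflow
            continue
        d = math.dist((cpu, disk), OPTIMAL)
        if d > best_dist:
            best, best_dist = s, d
    return best                             # None => no server can host it

servers = [{"name": "s1", "cpu": 0.6, "disk": 0.4},
           {"name": "s2", "cpu": 0.1, "disk": 0.1}]
chosen = place((0.2, 0.1), servers)
print(chosen["name"] if chosen else "blocked")
```
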
Other works have taken a hardware planning approach to the problem: instead of trying to
reach the highest possible performance, the aim is to execute a certain workload with as little
energy as possible. The concept used in [27], for example, was to build a cluster of embedded
devices that use little energy. The results were acceptable for tasks with low computational
content, but this would not suit cloud clients' needs, as the architecture cannot support
applications with high computational demands. In [24], the authors try to produce a hybrid
design that mixes low power platforms with high performance platforms. They have shown an
improvement over the low power platforms, tested on different categories of tasks (compute
intensive and non compute intensive). The breakdown of power usage across system
components was not discussed, and the energy consumption of network components was not
considered.
An economic approach to managing shared resources and minimizing energy consumption in
hosting centers is described in [28]. The authors present a solution that dynamically resizes the
set of active servers and responds to thermal or power supply events by downgrading the
service within the bounds of the SLA. In practical scenarios, this approach alone would not be
sufficient: with tens of thousands of requests arriving every time unit, and with the scheduling
component already allocating requests at the lower limit of their SLAs in order to free up
resources, it will not be easy to find active requests that can tolerate having their service
downgraded. In [25], the authors consider heterogeneous clusters where a number of servers
of different types are deployed in each cluster. They aim at allocating resources while
considering the consumed energy. The operating cost of a server is modeled as a constant
cost plus a cost factor linearly related to the server's utilization in the processing domain. The
same calculation method is used in [26].
In [16], the authors suggest heuristics for dynamic adaptation of VM allocation at run-time
according to the current utilization of resources, applying live migration and switching idle
nodes to sleep mode in order to minimize energy consumption. The approach can handle a
heterogeneous infrastructure and heterogeneous VMs, and "the algorithms do not depend on a
particular type of workload and do not require any knowledge about applications running in
VMs" [16]. However, this approach considers only the CPU among the energy consuming parts
of the system, and it does not consider the energy consumed by network components.

3.2 Motivation
3.2.1 A Comprehensive Solution for Energy Efficient Network-Based Resource Allocation
Provisioning for cloud services in a comprehensive way is of crucial importance to any resource
allocation model. Any model that aims at allocating resources while minimizing energy
consumption in a distributed cloud should consider all sources of energy consumption. The
model should include analysis of the power used by the CPU, memory, hard disks, and power
supply unit, which are the main power consuming components in a server. An illustration of the
power consumption of these components is shown in Figure 3.1 (based on the results of
studies performed by the authors of [9]).

Figure 3.1: Server Power Consumption (Source: Intel Labs, 2008)

The model should also investigate the power consumed by network components to transmit
data both inside the data center and outside it (connecting data centers together). Any energy
gain obtained from any of these components is an important achievement, since a single data
center's operational cost and environmental impact are both very high: an average data center
is estimated to consume as much energy as 25,000 households [29].

3.3 Energy Efficient Network-Based Resource Allocation: Methodology and Design Challenges

3.3.1 Research Strategy/Methodology
Our goal for this problem is to focus on the design and validation of a cloud resource allocation
system that is energy efficient. We will generally follow the same strategy and apply the same
methodologies used for the first problem. We summarize the main adjustments in our approach
as follows:
1- No specific restrictions on the data center architecture or the build/type of hardware will be
imposed.
2- For each of the energy efficiency challenges discussed hereafter, we will adjust the
environment parameters in order to isolate the dominant factors that cause the challenge. The
aim here is to find the main points of energy leakage in a data center; this could involve design
problems or specific situations/circumstances that maximize power consumption.
3- Then, exploratory solutions will be tested and their effectiveness recorded. As we tackle
these challenges one by one, we will gain a better understanding of the data center's
moment-to-moment dynamics, which will help in pursuing and constructing a complete solution
that gets the best combined result for the issues, as per the needed combination function.
4- Our final step will be to integrate this solution with the solution to the first problem, to arrive
at a full network-based, energy-aware resource allocation component that can be deployed as
part of any distributed cloud computing management system.

3.3.2 Common Solutions and Common Trade-offs

1- A solution with many variations in the literature is the consolidation of applications on fewer
servers. This concept, despite its simplicity, has the potential to impact performance negatively.
There are three main issues here:
A- Consolidation can quickly cause I/O bottlenecks. Concentrating VMs increases competition
for physical server resources, which causes performance to suffer, with a high probability of I/O
bottlenecks. This threatens the performance level, and it can cause more power consumption
because of the latency in task completion.
B- Network bottlenecks: Connection blocking would increase visibly as connections from and to
all the consolidated VMs compete for the links available to the physical node where the server
is. This is most evident for applications with heavy data transactions, as a higher blocking
percentage would be found around the servers carrying the consolidated VMs. This would
cause even more latency and would consume more network-related power.
C- The method used to hibernate or shut down the unused servers should be considered: there
is the latency caused by the time needed for hibernating and waking the system up, and there
is also the power this consumes. If used, consolidation should be part of a more elaborate
solution that takes these issues into consideration along with client priorities.
2- What about VM migration? This is the core of the consolidation process. The methodology
might differ based on VM size and configuration variations. Nevertheless, trade-offs have to be
weighed between the power gained by moving the VM and hibernating the machine it was on,
and the total losses caused by this migration.

These losses include:

A- Time lost moving the VM through the network.
B- Power consumed by network components during the move.
C- Latency in task completion caused by the changed node position in the network and the
need to provision new network resources.
D- Cost of bandwidth in the case of large data sets.

Clear decision criteria are needed to determine when it is beneficial to migrate a VM,
considering not only short-term gains but also the long-term situation.
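
As a starting point, one could fold the gain and the four loss terms above into a single
net-benefit estimate, as in the sketch below; every parameter value and the electricity price are
placeholder assumptions of our own, not values from the literature.

```python
def migration_net_benefit(host_idle_power_w, remaining_runtime_s,
                          vm_size_gb, link_rate_gbps, network_power_w,
                          bandwidth_cost_per_gb, latency_penalty_cost):
    """Estimated net benefit in dollars; positive suggests migrating."""
    transfer_s = vm_size_gb * 8 / link_rate_gbps        # loss (A): move time
    saved_j = host_idle_power_w * remaining_runtime_s   # gain: hibernated host
    network_j = network_power_w * transfer_s            # loss (B): network power
    price_per_j = 0.10 / 3.6e6                          # assumed $0.10 per kWh
    move_cost = (bandwidth_cost_per_gb * vm_size_gb     # loss (D): bandwidth
                 + latency_penalty_cost)                # loss (C): added latency
    return (saved_j - network_j) * price_per_j - move_cost

# an 8 GB VM, a 10 Gb/s link, and one hour of avoided 100 W idle draw:
# here the bandwidth and latency costs outweigh the energy saved, so don't move
print(migration_net_benefit(100, 3600, 8, 10, 50, 0.01, 0.02))
```
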

3.3.3 Energy Consumption vs. Optimal Performance: Hardware Contradictions

With the way processors currently work, higher performance (execution speed) is achieved by
maximizing the use of the processor cache memory and minimizing the use of main memory
and disks, and the number and capacity of cache memory modules is going to increase in the
future. In addition, mechanisms like out-of-order execution, high speed buses, and support for
a large number of pending memory requests increase transistor counts, which leads to more
wasted power. Thus, the question arises: where is the optimal point between performance and
power consumption in cases like this?

3.3.4 Idle State as a Major Source of Wasted Power

Buying or adding new resources will not solve the problem; it rather complicates it. As
calculated by the authors of [15], the power consumption of the main server components (CPU,
memory, etc.) starts at a constant level and then increases linearly as utilization goes up. In
[30], the authors explain the energy waste that idle servers cause: "Even at a very low load,
such as 10% CPU utilization, the power consumed is over 50% of the peak power." We should
be especially aware of this when there is a bottleneck, since all the other, idle resources are
wasting power; we should therefore pay special attention to optimizing all the resources in the
DC. Energy consumption in this case is not really additive: adding more resources (servers,
etc.) might support performance, but generally it will not help minimize the power consumption
of distributed clouds. On the contrary, unused servers consume significant power while idle.
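
The linear model described above can be written down directly; the 50% idle fraction mirrors
the observation quoted from [30], while the 250 W peak is a placeholder (echoing the
per-server average cited in [9]).

```python
def server_power_w(utilization: float, peak_w: float = 250.0,
                   idle_fraction: float = 0.5) -> float:
    """Constant idle floor plus a linear rise with utilization in [0, 1]."""
    idle_w = peak_w * idle_fraction
    return idle_w + (peak_w - idle_w) * utilization

# even at 10% load the server draws well over half its peak power:
print(server_power_w(0.10))  # 137.5 (W) out of a 250 W peak
```
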

3.4 Objective
We have two cases in which we can tackle this problem:

3.4.1 Static Case

Here, the system configuration, data center design parameters, and the network topology are
known. All client requests are also known in advance for a certain period, and the objective
becomes one of two:
A- To maximize the capacity results of the data center network while minimizing the total
consumed energy.
B- To minimize the consumed energy while maintaining a certain level of request service rate;
for example, minimizing the total energy used to allocate and serve N requests such that the
blocking rate does not exceed 5% of the total requests arriving (a compact statement of this
objective follows below).
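
Objective B can be stated compactly as follows, where $E_{\text{total}}$ is the total energy
used to serve the $N$ arriving requests and $B$ is the number of blocked requests (notation
we introduce here for illustration):

$$\min E_{\text{total}} \quad \text{subject to} \quad \frac{B}{N} \le 0.05$$
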
Another important objective is to understand the dynamics of the system and find the causes of
power consumption bottlenecks in the network. A detailed tracking of the network and
computational resources would be crucial.

3.4.2 Dynamic Case

In this case, client requests are not known in advance. The objective is to get the best possible
result as measured against the optimal case. We aim at finding heuristic algorithms that
achieve good levels of performance in dynamic scenarios.

Bibliography
[1] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson,
A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing,"
Tech. Rep. UCB/EECS-2009-28, EECS Department, U.C. Berkeley, Feb. 2009.
[2] "Amazon Elastic Compute Cloud (Amazon EC2)," available online: http://aws.amazon.com/ec2/.
[3] S. Maguluri, R. Srikant, and L. Ying, "Stochastic Models of Load Balancing and Scheduling
in Cloud Computing Clusters," IEEE INFOCOM 2012 Proceedings, pp. 702-710, 25-30 Mar. 2012.
[4] G. Sun, V. Anand, H. Yu, D. Liao, and L. M. Li, "Optimal Provisioning for Elastic Service
Oriented Virtual Network Request in Cloud Computing," IEEE GLOBECOM 2012.
[5] T. D. Wallace, A. Shami, and C. Assi, "Scheduling advance reservation requests for
wavelength division multiplexed networks with static traffic demands," IET Communications,
2008, Vol. 2, No. 8, pp. 1023-1033.
[6] X. Nan, Y. He, and L. Guan, "Optimal resource allocation for multimedia cloud in priority
service scheme," IEEE International Symposium on Circuits and Systems (ISCAS), 2012,
pp. 1111-1114.
[7] M. Alicherry and T. V. Lakshman, "Network aware resource allocation in distributed clouds,"
IEEE INFOCOM 2012 Proceedings, pp. 963-971, 25-30 Mar. 2012.
[8] B. Kantarci and H. T. Mouftah, "Scheduling advance reservation requests for wavelength
division multiplexed networks with static traffic demands," IEEE Symposium on Computers and
Communications (ISCC), 2012, pp. 806-811, 1-4 Jul. 2012.
[9] L. Minas and B. Ellison, "The Problem of Power Consumption in Servers," prepared at Intel
Labs, Dr. Dobb's Journal, Mar. 2009.
[10] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud computing and grid computing 360-degree
compared," in Grid Computing Environments Workshop (GCE '08), pp. 1-10, IEEE, 2008.
[11] A. Leinwand, "The Hidden Cost of the Cloud: Bandwidth Charges,"
http://gigaom.com/2009/07/17/the-hidden-cost-of-the-cloud-bandwidth-charges/, 2009.
[12] T. Dillon, C. Wu, and E. Chang, "Cloud computing: Issues and challenges," in 24th IEEE
International Conference on Advanced Information Networking and Applications (AINA),
pp. 27-33, IEEE, 2010.
[13] The Green Grid consortium, 2011. URL: http://www.thegreengrid.org.
[14] I. Raicu, Y. Zhao, I. T. Foster, and A. Szalay, "Accelerating large-scale data exploration
through data diffusion," in Proceedings of the 2008 International Workshop on Data-Aware
Distributed Computing, pp. 9-18, ACM, 2008.
[15] S. Srikantaiah, A. Kansal, and F. Zhao, "Energy aware consolidation for cloud computing,"
in Proceedings of the 2008 Conference on Power Aware Computing and Systems, pp. 10-10,
USENIX Association, 2008.
[16] A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for
efficient management of data centers for cloud computing," Future Generation Computer
Systems, vol. 28, no. 5, pp. 755-768, 2012.
[17] Ministry of Economy, Trade and Industry, "Establishment of the Japan Data Center
Council," press release.
[18] G. Ferro, "OpenFlow and Software Defined Networking," SDN and OpenFlow webinar by
Big Switch Networks, Dec. 2011.
[19] V. Yazici, O. Sunay, and A. O. Ercan, "Controlling a Software-Defined Network via
Distributed Controllers," NEM Summit, Istanbul, Turkey, Oct. 2012.
[20] B. Heller, R. Sherwood, and N. McKeown, "The Controller Placement Problem," ACM, 2012.
[21] A. Voellmy and J. Wang, "Scalable Software Defined Network Controllers," SIGCOMM '12,
Aug. 2012, Helsinki, Finland.
[22] A. Tavakoli, M. Casado, et al., "Applying NOX to the Datacenter," Hot Topics in Networks
Workshop, 2009.
[23] H. Bae, "SDN promises revolutionary benefits, but watch out for the traffic visibility
challenge," http://www.networkworld.com/, Jan. 2013.
[24] B. G. Chun, G. Iannaccone, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini, "An energy
case for hybrid datacenters," ACM SIGOPS Operating Systems Review, vol. 44, no. 1, Jan. 2010.
[25] H. Goudarzi and M. Pedram, "Maximizing profit in the cloud computing system via resource
allocation," Intl. Workshop on Data Center Performance, Minneapolis, MN, Jun. 2011.
[26] H. Goudarzi and M. Pedram, "Multi-dimensional SLA-Based Resource Allocation for
Multi-tier Cloud Computing Systems," IEEE International Conference on Cloud Computing
(CLOUD), 2011, pp. 324-331, 4-9 Jul. 2011.
[27] D. G. Andersen et al., "FAWN: A fast array of wimpy nodes," in SOSP, 2009.
[28] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle, "Managing
energy and server resources in hosting centers," presented at the 18th ACM Symposium on
Operating Systems Principles (SOSP '01), Oct. 21, 2001.
[29] J. Kaplan, W. Forrest, and N. Kindler, "Revolutionizing Data Center Energy Efficiency,"
McKinsey & Company, Tech. Rep.
[30] G. Chen et al., "Energy-aware server provisioning and load dispatching for
connection-intensive internet services," in NSDI, 2008.
[31] SCOPE Alliance, "Telecom grade cloud computing," www.scopealliance.org, 2011.
[32] R. Van den Bossche, K. Vanmechelen, and J. Broeckhove, "Cost-Optimal Scheduling in
Hybrid IaaS Clouds for Deadline Constrained Workloads," IEEE 3rd International Conference
on Cloud Computing (CLOUD), 2010, pp. 228-235, 5-10 Jul. 2010.
[33] "The carrier cloud: driving internal transformation and new cloud revenue," strategic white
paper, Alcatel-Lucent, 2011.
[34] I. Monga, "Software-Defined Network: A view from Summer Joint Techs focus-day,"
Energy Sciences Network, Aug. 14, 2012 [ONF].
[35] G. Ferro, "OpenFlow and Software Defined Networking," SDN and OpenFlow webinar by
Big Switch Networks, Dec. 2011.
[36] A. Tootoonchian, S. Gorbunov, et al., "On Controller Performance in Software-Defined
Networks," USENIX Association, Berkeley, CA, USA, 2012.
[37] Ontario Ministry of Health and Long-Term Care, Health Information Protection Act, 2004.
[38] E. Moyle, "Why Cloud Computing Changes the Game for HIPAA Security,"
http://www.technewsworld.com/story/72291.html.
[39] P. Rudo, "How Cloud Computing Affects HIPAA Compliance,"
http://enterprisefeatures.com/2011/08/how-cloud-computing-affects-hipaa-compliance/,
published 28/08/11.
[40] S. Gittlen, "Bandwidth bottlenecks loom large in the cloud," http://www.computerworld.com,
Jan. 4, 2012.
[41] J. Frey, "Network Management and the Responsible, Virtualized Cloud," research report,
Feb. 2011.
