
KAPFUNDE ARNOLD TATENDA C108805V

MATANDAUDHLE TINASHE ROBERT C1010069T


KATEKWE NELSON C1010514C
MAGUMA LIVINGSTONE C1010314K

Chinhoyi University of Technology

Department of ICT and Electronics

Final Year Project:


Resource scheduling (prioritisation, allocation, sharing) algorithm for efficiency and
effectiveness on cloud platforms.

This project is submitted in partial fulfilment of the requirements for the Bachelor of Science Honours
Degree in Information Technology.

Dedication

We dedicate this research to cloud computing service providers and to any scholars who wish
to undertake a project on cloud computing resource scheduling algorithms or a similar topic.

Abstract

Cloud computing is growing rapidly and also changing shape in the process. While its adoption
may have considerable impact on the world across many dimensions of business and technology,
these changes are difficult to predict for many reasons. One of them is that we cannot foresee
how cloud computing itself will evolve. Nonetheless, there are trends that have already begun
or appear to be imminent, which may be closely related to the future evolution of cloud
computing. As such it is worth taking a closer look at them and monitoring them, and their
impact, over the coming years.

Acknowledgements

We are greatly indebted to the authors who published the journal articles on Google Scholar
from which we drew most of our information about cloud computing resource scheduling
algorithms. The support of our supervisor, Mr Muchabaiwa, is greatly acknowledged. It is not
possible to list all the people who contributed to the successful completion of our study. We
thank our parents, friends and all those who helped us in carrying out this work. We are also
indebted to the library resource centre and internet services that enabled us to ponder over the
vast subject of resource scheduling algorithms.

Topic

Resource scheduling (prioritisation, allocation, sharing) algorithm for efficiency and


effectiveness on cloud platforms.

Table of Contents
Chapter 1..................................................................................................................................................... 8
Introduction............................................................................................................................................. 8
Background of Study .............................................................................................................................. 9
Statement of the Problem...................................................................................................................... 10
Objectives ............................................................................................................................................. 11
Significance of the Study ...................................................................................................................... 12
Research Questions ............................................................................................................................... 13
Limitations ............................................................................................................................................ 14
Delimitations......................................................................................................................................... 15
Operational Definition of Terms........................................................................................................... 16
Chapter 2................................................................................................................................................... 20
Literature Review ................................................................................................................................. 20
Cloud Architecture............................................................................................................................ 20
Resources scheduling problems ........................................................................................................ 21
Resource Scheduling Algorithms ..................................................................................................... 22
Chapter 3................................................................................................................................................... 62
Methodology ......................................................................................................................................... 62
Introduction....................................................................................................................................... 62
Research Design ............................................................................................................................... 62
Research Instruments ........................................................................................................................ 62
Population ......................................................................................................................................... 63
Sampling procedure and Sample size ............................................................................................... 63
Data collection procedures................................................................................................................ 63
Data Analysis .................................................................................................................................... 64
Summary ........................................................................................................................................... 64
Chapter 4................................................................................................................................................... 65
Data Presentation .................................................................................................................................. 65
Chapter 5................................................................................................................................................... 69
Proposed Scheduling Model ................................................................................................................. 69
Hybrid PAS Algorithm (Prioritisation Sharing Allocation) ................................................................. 70
Summary and Conclusion ..................................................................................................................... 71
Recommendations................................................................................................................................. 72
Reference .................................................................................................................................................. 74

Chapter 1
Introduction
Cloud computing can be described as a collection of integrated and networked hardware,
software and Internet infrastructure (called a platform). The platform provides on-demand
services that are always on, anywhere, anytime and from any place. Cloud computing technology
virtualizes resources and offers many services across the network. It mainly aims at scalability,
availability, throughput and resource utilization. What is resource scheduling? Resource
scheduling is a way of determining the schedule on which activities should be performed in
response to the demand for resources. The resource scheduling strategy is a key technology in
cloud computing, and because cloud computing is growing rapidly there is a need to manage
resources carefully.

Background of Study
Cloud computing is the use of computing resources (hardware and software) that are delivered
as a service over a network (typically the internet). It supplies high performance computing
based on protocols which allow shared computation and storage over long distances.
Shailesh Sawant states that the cloud computing platform guarantees subscribers that it sticks
to the service level agreement (SLA) by providing resources as a service and according to need.
However, day by day subscribers' needs for computing resources are increasing, and those needs
exhibit dynamic heterogeneity and platform irrelevance. In the cloud computing environment,
resources are shared but they are not always properly distributed, resulting in resource wastage.
Therefore, the main problems to be solved are how to meet the needs of the subscribers and
how to manage the resources dynamically and efficiently.
In addition, in cloud computing there are many tasks that need to be executed by the available
resources in order to achieve the best performance, minimal total completion time, shortest
response time and good utilization of resources. Because of these different objectives, we need
to design, develop and propose a scheduling algorithm that produces an appropriate mapping of
tasks onto resources.

Statement of the Problem
What is the most efficient and effective resource scheduling algorithm that can be used on
cloud platforms?

Objectives
a) To assess the problems faced by cloud service providers in relation to resource scheduling.
b) To evaluate the resource scheduling algorithms that are currently available on cloud platforms.
c) To develop an improved resource scheduling algorithm for cloud data services.
d) To simulate the algorithms.

Significance of the Study
Our study will help cloud service providers to predict the dynamic nature of users, user demands
and application demands. Service providers can also use this study to manage resources for
individual clients. It also seeks to integrate cloud provider activities for utilising and allocating
scarce resources within the limits of the cloud environment so as to meet the needs of cloud
applications and clients.

In addition, the study is important because it seeks to address the problems faced by clients when
accessing cloud platforms. The study will come up with solutions to avoid resource contention,
that is, two applications trying to access the same resource at the same time. Because of the
denial of service that people experience on cloud platforms even when they have the right of
access, the study will devise ways in which resources can be effectively scheduled.

Research Questions
a) What are the resources scheduling problems being faced by cloud service providers?
b) What are the different scheduling algorithms that can be used to enhance prioritisation,
resource sharing and resource allocation?
c) What is the most effective and efficient algorithm that can be used in cloud platforms to
enhance prioritisation, resource allocation and sharing?

Limitations
In conducting our research we faced difficulties which imposed the following limitations on our
study:

a) The number of companies that offer cloud services in Zimbabwe is small.
b) Most of the available algorithms focus mainly on load balancing of cloud VMs.
c) Most of the companies are not willing to disclose the resource scheduling algorithms that they use.
d) Respondents take a long time to reply, which is a problem considering that our research had to
be completed within a fixed timeline.

Delimitations
a) The study will examine resource scheduling algorithms for cloud data services.
b) The study will be confined to clients that use cloud data services.
c) The study will focus on the resources that are used in cloud data services.

Operational Definition of Terms

Algorithm
From Algorithm-Wikipedia, the free encyclopaedia, an algorithm is a step-by-step procedure
for calculations.

Back End
According to Vijayalakshmi A. Lepakshi and Dr. Prashanth C S R, the back end of the cloud
computing architecture is the cloud itself, comprising various computers, servers and data
storage devices.

Cloud
1. According to Judith Hurwitz, Robin Bloor, Marcia Kaufman and Fern Halper, the “cloud”
in cloud computing can be defined as the set of hardware, networks, storage, services, and
interfaces that combine to deliver aspects of computing as a service.
2. According to R. Buyya and S.Venugopal, a cloud is a type of parallel and distributed system
consisting of a collection of inter-connected and virtualized computers that are dynamically
provisioned and presented as one or more unified computing resources based on service-
level agreements established through negotiation between the service provider and
consumers.

Cloud Computing
a) It is the storing and accessing of applications and computer data often through a Web
browser rather than running installed software on your personal computer or office server.
b) It is an internet-based computing whereby information, IT resources, and software
applications are provided to computers and mobile devices on-demand.
c) Using the internet to access web-based applications, web services and IT infrastructure as
a service.

Cloud Platform
From Adrian Otto’s Blog a cloud platform is a system where software applications may be run
in an environment composed of utility cloud services in a logically abstract environment.

Cloud Provider
It is a company that provides components of cloud computing, typically IaaS, NaaS, PaaS and
SaaS, to subscribers or customers.

Community Cloud

Dr Mark I Williams states that Community clouds are used by distinct groups (or
‘communities’) of organizations that have shared concerns such as compliance or security
considerations, and the computing infrastructures may be provided by internal or third-party
suppliers.

CPU time is the amount of time for which a central processing unit was used for processing
instructions of a computer program or operating system.

Data Centre
According to Ratan Mishra, Anant Jaiswal, a datacentre is nothing but a collection of servers
hosting different applications. An end user connects to the datacentre to subscribe different
applications.

Front End
Vijayalakshmi A. Lepakshi and Dr. Prashanth C S R state that the front end is the part seen by
the client, i.e. the computer user. This includes the client's network (or computer) and the
applications used to access the cloud via a user interface such as a web browser.

Hard disk space is the amount of permanent storage of data measured in bytes and this storage
is maintained whether the computer is on or off.

Hybrid Cloud
Cloud computing - Wikipedia, the free encyclopaedia, states that a hybrid cloud is a
composition of two or more clouds (private, community or public) that remain unique entities
but are bound together, offering the benefits of multiple deployment models.

Infrastructure as a Service (IaaS)


From Cloud computing - Wikipedia, the free encyclopaedia, IaaS is achieved when providers of
IaaS offer computers, be they physical or (more often) virtual machines, and other resources.
In an IaaS agreement, the subscriber completely outsources the storage and resources, such as
hardware and software, that they need.

Network as a Service (Naas)


Cloud computing - Wikipedia, the free encyclopaedia states that it is a category of cloud
services where the capability provided to the cloud service user is to use network/transport
connectivity services and/or inter-cloud network connectivity services. NaaS involves the
optimization of resource allocations by considering network and computing resources as a
unified whole.

Network throughput is the average rate of successful message delivery over a communication
channel.

Platform as a Service (PaaS)

PaaS is a set of services aimed at developers that helps them develop and test apps without
having to worry about the underlying infrastructure. Developers don't want to have to worry
about provisioning the servers, storage and backup associated with developing and launching an
app.

According to Cloud computing - Wikipedia, the free encyclopaedia, this is where cloud
providers deliver a computing platform, typically including operating system, programming
language execution environment, database, and web server. Application developers can develop
and run their software solutions on a cloud platform without the cost and complexity of buying
and managing the underlying hardware and software layers. With PaaS offerings, the underlying
computer and storage resources scale automatically to match application demand so that the
cloud user does not have to allocate resources manually.

Public Cloud
According to Cloud computing - Wikipedia, the free encyclopaedia, a cloud is called a 'Public
cloud' when the services are rendered over a network that is open for public use.

Private Cloud
Cloud computing - Wikipedia, the free encyclopaedia states that Private cloud is cloud
infrastructure operated solely for a single organization, whether managed internally or by a
third-party and hosted internally or externally.

Resources or System Resources


According to Resource (computing) - Wikipedia, the free encyclopaedia, a resource, or system
resource, is any physical or virtual component of limited availability within a computer system.
Every device connected to a computer system is a resource. Every internal system component is
a resource.

Server
According to Bradley Mitchell, a server is a computer designed to process requests and deliver
data to other (client) computers over a local network or the internet.

Software as a Service (SaaS)


Cloud computing - Wikipedia, the free encyclopaedia states that SaaS is when a SaaS provider
gives subscribers access to both resources and applications. SaaS makes it unnecessary for you
to have a physical copy of software to install on your devices. SaaS also makes it easier to have
the same software on all of your devices at once by accessing it on the cloud. In a SaaS
agreement, you have the least control over the cloud.

Throughput

The number of processes that complete per unit of time.

Random Access Memory is the most common computer memory, which can be used by
programs to perform necessary tasks while the computer is on.

Virtual Memory is memory that appears to exist as main storage although most of it is
supported by data held in secondary storage, transfers between the two being made automatically
as required.

Virtual Machine is a software based or fictive computer which may be based on specifications
of a hypothetical computer or emulate the computer architecture and functions of a real world
computer.

Chapter 2
Literature Review
Cloud computing is internet-based computing whereby shared resources, software and
information are provided to computers and other devices on demand, like a public utility. Cloud
computing is a technology that uses the internet and central remote servers to maintain data and
applications. This technology allows consumers and businesses to use applications without
installation and to access their personal files from any computer with internet access. An essential
requirement in cloud data services is scheduling the current jobs to be executed under the given
constraints. The scheduler should order the jobs in a way that balances improving the quality of
service with maintaining efficiency and fairness among the jobs. Thus, evaluating the
performance of scheduling algorithms is crucial towards realizing large-scale distributed systems.
In spite of the various scheduling algorithms proposed for cloud data services, there is no
comprehensive performance study that provides a unified platform for comparing such
algorithms.

Cloud Architecture
The two most significant components of cloud computing architecture are known as the front
end and the back end.

Cloud Computing Services


There are four types of services that you can subscribe to in cloud computing. These are as
follows:
a) Software as a Service (SaaS)
b) Platform as a Service (PaaS)
c) Infrastructure as a Service (IaaS)
d) Network as a Service (Naas)

Types of clouds
There are different types of clouds that you can subscribe to depending on your needs. As a
home user or small business owner, you will most likely use public cloud services.
a) Public Cloud
b) Private Cloud
c) Community Cloud
d) Hybrid Cloud

Cloud computing resources


Resources found in cloud computing are:
a) CPU time [Mark C. Chu-Carroll(April 2011)]
b) Random access memory and virtual memory
c) Network throughput(outgoing and incoming bandwidth) [Mark C. Chu-Carroll(April
2011)]
d) Cache Memory
e) Virtual Machine
f) Data Storage [Mark C. Chu-Carroll(April 2011)]

Resources scheduling problems


The resource scheduling problems being faced by cloud platform service providers are:
1. Prioritisation of users and jobs.
2. Load balancing on VMs.
3. Security and Risk
The security and risk concerns are probably the best known and the most challenging to
address. A common obstacle to the adoption of cloud computing is the fact that the service
provider hosts sensitive data, potentially in a multi-tenant environment. The customer
must consider the host trustworthy enough not to, intentionally or inadvertently,
compromise the information. The fact that there is only limited standardization of cloud
functionality leads to interoperability barriers, which lock the customer into a given
vendor's service. This presents a risk if the vendor faces insolvency or if the customer
subsequently chooses to switch vendors. A general governance problem can also give many
customers cause for concern: they have only limited recourse if the system performs
unreliably or does not scale to their required capacity.

Need of a Resource Scheduling Algorithm:
a) Minimize the variation in resource demand.
b) Improve efficiency.
c) Reflect reality:
Modify activities within time, in other words modify the resource loading for each unit of
time.
d) Technical aspects:
Not every technology is absolutely new, but each is enhanced to realize a specific feature,
directly or as a pre-condition. Virtualization is an essential characteristic of cloud
computing. Virtualization in clouds refers to multi-layer hardware platforms, operating
systems, storage devices, network resources, etc. The first prominent feature of
virtualization is the ability to hide technical complexity from users, so it can improve the
independence of cloud services. Secondly, physical resources can be efficiently configured
and utilized, considering that multiple applications are run on the same machine. Thirdly,
quick recovery and fault tolerance are permitted: a virtual environment can be easily
backed up and migrated with no interruption in service.

e) Resource management:
From the cloud platform service provider's point of view, a large number of virtual
machines needs to be allocated to thousands of distributed users dynamically, fairly and,
most importantly, profitably. From the consumer's point of view, users are economy-driven
entities when they make the decision to use a cloud service.

Resource Scheduling Algorithms


According to Isam Azawi Mohialdeen, comparing these scheduling algorithms from different
perspectives is an aspect that needs to be addressed. This project aims at carrying out a practical
comparison study among common resource scheduling algorithms in cloud computing. These
algorithms are the Bee Algorithm (Bee Life Algorithm), the Improved Genetic Algorithm, the
Ant Colony Algorithm, Particle Swarm Optimization (PSO) and Round Robin (RR). These
algorithms have been evaluated in terms of their ability to provide quality of service for the tasks
and to guarantee fairness amongst the jobs served. The three metrics for evaluating these job
scheduling algorithms are throughput, makespan and the total execution cost.

Bee Algorithm (Bee Life Algorithm)
Bees in Nature
According to Bitam S., Batouche M. and Talbi E.G., as a social and domestic insect, the bee is
native to Europe and Africa. Bees feed on nectar as a source of energy and use pollen as a source
of protein when rearing larvae. A bee colony generally contains a single breeding female called
the Queen, a few thousand males known as Drones, several thousand sterile females called
Workers, and many young bee larvae called Broods. The bees share a communication language
of extreme precision based on two kinds of dances: the round dance, performed when food is
very close, and the waggle dance, performed when it is farther away; these dances are carried out
when bees search for food. The bees' reproduction is guaranteed by the queen, who mates with
several males in full flight until her spermatheca is full. An unfertilized egg gives rise to a drone,
while a fertilized egg gives rise to a worker or a queen depending on the quality of the food.

According to Pradeep R. and Kavinya R., this is a nature-inspired algorithm which mimics the
activities of bees as they search for food. First a scout bee is selected to search a wide range of
areas; if the scout bee finds a potential food source it returns to its hive and performs a waggle
dance which tells the other bees the direction and the distance of the potential food source. A set
of selected bees then goes to the food source and starts bringing in the honey, while other scout
bees do the same work and sets of bees are sent to different locations to bring in the food. After
every identification of a food source the scout bee informs the others and sets its course for new
sites near the potential food source. Using these activities we define the following terms: number
of scout bees (n), number of sites selected out of the n visited sites (m), number of best sites out
of the whole set (e), number of bees recruited for the best e sites (nep), and number of bees
recruited for the other sites (m - e).
The Bees Algorithm is as stated below:
1. Initialize population with random solutions.
2. Evaluate fitness of the population.
3. While (stopping criterion not met)
4. Select sites for neighbourhood search.
5. Recruit bees for selected sites (more bees for best e sites) and evaluate fitness.
6. Select the fittest bee from each patch.
7. Assign remaining bees to search randomly and evaluate their fitness.
8. End While.
In the first step, the bees algorithm starts with the n scout bees being placed randomly in the
search space. In step 2, the fitness of the sites visited by the scout bees is evaluated. In step 4,
bees that have the highest fitness are chosen as "selected bees" and the sites visited by them are
chosen for neighbourhood search. Then, in steps 5 and 6, the algorithm conducts searches in the
neighbourhood of the selected sites, assigning more bees to search near the best e sites. The bees
can be chosen directly according to the fitness associated with the sites they are visiting.
Alternatively, the fitness values are used to determine the probability of the bees being selected.
Searches in the neighbourhood of the best e sites, which represent more promising solutions, are
made more detailed by recruiting more bees to follow them than the other selected bees. Together
with scouting, this differential recruitment is a key operation of the Bees Algorithm. However, in
step 6, for each patch only the bee with the highest fitness is selected to form the next bee
population. In nature there is no such restriction; it is introduced here to reduce the number of
points to be explored. In step 7, the remaining bees in the population are assigned randomly
around the search space, scouting for new potential solutions. These steps are repeated until a
stopping criterion is met. At the end of each iteration, the colony will have two parts to its new
population: those that were the fittest representatives from a patch and those that have been sent
out randomly.
Scheduling Based on Bee Algorithm
The proposed algorithm for resource allocation using the above explained bee algorithm is
justified. In this, meta-scheduler is used, which sends independent jobs to various clusters
present. In the beginning, the jobs will be submitted to meta-scheduler.
Select
The meta-scheduler using a select function will find a job which has lowest memory
requirement, lowest I/O requirement and lowest processor required to complete their job which
will act as a scout bees to find a site.
f(n) = min{Uni=1 j(i)}
where j(i) denotes the jth job and function f(n) determines the minimum resource requirement
job.
Fitness
The job with the minimum resource requirement, which acts as the scout bee, is identified and
sent to the cluster, where it identifies the instances present. A scout job identifies a site by using a
fitness function which runs that job on a particular instance and, if progress is made, determines
the specification of that instance, i.e. whether it is memory oriented or processor oriented.
Conceptually, fitness refers to how much progress each job is making with its assigned resources
compared to the same job running on the entire cluster. Therefore, it lies between 0 (no progress
at all) and 1 (full progress). The computing rate (CR) is used to calculate the fitness of an instance
for the job. Specifically, given that CR(s, j) is the computing rate of slot s for job j, the progress
share of job j is
F(j) = Σ_{s in Sj} CR(s, j) / Σ_{s' in S} CR(s', j)
where Sj is the set of slots running tasks of job j and S is the set of all slots.
Waggle

By identifying the site's resources, the scout returns to the meta-scheduler and performs the
waggle function. The waggle function segregates the jobs present in the meta-scheduler based on
the scout's information, such as whether the instance is memory or processor oriented, together
with its distance and the cost to travel. The grouping takes place in such a way that the memory
scout job passes on the information to the memory-oriented jobs present, and the selected set
goes to the cluster to be executed. Care should be taken to select only jobs within the instance's
capacity. If a job exceeds the instance capacity then the job has to wait until the scout jobs find
another available resource adjacent to the site just found.
W(n) = { ∪_{i=1}^{m} s(j) · s }
where s is a job whose resources are an integral multiple of the scout job's resources s(j).
After the waggle function, the subsets of jobs are dispatched to the desired sites by the
meta-scheduler, and the scout jobs set course for exploration of further sites. Hence the fitness
function has to be run only for the scout jobs in the algorithm, and the scheduling time is reduced
drastically. Thus the resources are efficiently allocated and the time is reduced.
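A minimal sketch of these select, fitness and waggle steps is given below in Python. The Job and Slot structures, the way the overall requirement is combined into a single number, and all numeric values are illustrative assumptions rather than part of the cited proposal.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    memory: float      # memory requirement
    io: float          # I/O requirement
    cpu: float         # processor requirement

@dataclass
class Slot:
    job: str
    rate: float        # computing rate CR(s, j) of this slot for its job

def select_scout(jobs):
    """Select function f(n): the job with the smallest overall requirement."""
    return min(jobs, key=lambda j: j.memory + j.io + j.cpu)

def progress_share(job_name, slots):
    """Fitness F(j): rate of job j's slots divided by the rate of all slots."""
    total = sum(s.rate for s in slots)
    own = sum(s.rate for s in slots if s.job == job_name)
    return own / total if total else 0.0

def waggle(jobs, instance_is_memory_oriented, capacity):
    """Group jobs matching the scouted instance type, within its capacity."""
    key = (lambda j: j.memory) if instance_is_memory_oriented else (lambda j: j.cpu)
    selected, used = [], 0.0
    for job in sorted(jobs, key=key):
        if used + key(job) <= capacity:
            selected.append(job)
            used += key(job)
    return selected

jobs = [Job("j1", 4, 1, 2), Job("j2", 2, 1, 1), Job("j3", 8, 2, 4)]
scout = select_scout(jobs)                        # j2 acts as the scout bee
print(scout.name, waggle(jobs, True, capacity=10))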
Modified Bees Life algorithm (BLA)
Tasquia Mizan, Shah Murtaza Rashid Al Masud and Rohaya Latip further modified the Bee
Algorithm and called it the Bees Life Algorithm (BLA). They chose BLA as an optimization
algorithm for its simplicity of operation and its effectiveness. Each cycle of a bee population's
life consists of two behaviours: reproduction and food foraging. In the reproduction behaviour,
the queen mates in the space during the mating flight with the drones, modelled using mutation
and crossover operators. Their idea is the adaptation of the BLA operator values (selection,
mutation, crossover) during the run of the BLA; in this stage the tasks are scheduled. In the food
foraging part of BLA, they propose a greedy method which finds the nearest cloud storage centre
(CSC) using a shortest path algorithm, so the scheduler assigns each task to the nearest cloud
storage centre.
In this algorithm, N is the size of the bee colony population, D is the drone population and W is
the worker population. Pseudo code for the proposed job scheduling algorithm using BLA and
the greedy method is as follows:
i. Get the new jobs to be scheduled. The jobs to be scheduled include uncompleted tasks
and new jobs; they enter the global queue and the queue size is treated as open.
ii. Generate the task properties using GIS.
iii. Get the current state of the system.
iv. Go through the BLA to obtain an optimised task schedule.
a. Initialise a population of N bees.
b. Evaluate the fitness of the population.
c. While the stopping criterion is not satisfied, form a new population. /*reproduction behaviour*/
d. Generate N broods by mutation and crossover.
e. Evaluate the fitness of the broods.
f. If the fittest brood is fitter than the queen, replace the queen for the next generation.
g. Choose the D best bees among the D fittest following broods and the drones of the
current population to form the next generation of drones.
h. Choose the W best bees among the W fittest remaining broods and the workers of the
current population to ensure food foraging. /*food foraging behaviour*/
v. Use the greedy method to find a neighbourhood.
a. Initially, the first neighbourhood (on a priority basis, via the queue) is reached.
b. Use a priority queue to find its successors.
c. Repeat step v.b until the neighbourhood is reached.
d. Evaluate the fitness of the population: the fittest bee is the queen, the D fittest following
bees are drones, and the W fittest remaining bees are workers.
e. End while.
vi. Finally, obtain the optimal solution.

In BLA, the algorithm first chooses a set of tasks randomly. In the next step, fitness, the
makespan of that set of tasks on a particular CSC is calculated. The algorithm then checks the
stopping criterion; if not all jobs have been scheduled, a new set of tasks is generated. In the
reproduction stage the algorithm determines, by mutation and crossover, which set of tasks will
be forwarded to which CSC. In step 'e' the fitness step compares the priority of the set of tasks
chosen in step 'd' for a particular CSC. If the set of tasks chosen in step 'd' has a higher priority,
it is scheduled first by replacing the queen for the next generation in step 'f'. Steps 'g' and 'h'
find the next sets of tasks on a priority basis and choose the tasks that will go to the CSC
concurrently. In the food foraging behaviour, step 'v', the greedy method starts by initially
reaching the first CSC using the shortest path algorithm, then finds its successors and repeats the
process until the next CSC is found (steps v.a, v.b and v.c). When the sets of tasks with the
smallest makespan have been allocated to the nearest CSC in the hybrid cloud, the other sets of
tasks are selected on a priority basis for scheduling. At the end the optimal solution is obtained.
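The following is a minimal Python sketch of this BLA-plus-greedy scheduling idea under simplifying assumptions: tasks and CSCs are reduced to execution times and distances, reproduction is plain single-point crossover and mutation, and the greedy food-foraging step is reduced to moving a task to the nearest CSC. None of the numbers come from the cited paper.

import random

# Illustrative data (assumed): execution time of each task on each CSC,
# and distances from a dispatcher node to the CSCs.
EXEC_TIME = [[20, 25, 30],   # task 0 on CSC 0, 1, 2
             [15, 35, 25],
             [40, 20, 30],
             [10, 15, 20]]
CSC_DISTANCE = [5, 2, 7]     # used by the greedy "nearest CSC" step
N_TASKS, N_CSC = len(EXEC_TIME), len(CSC_DISTANCE)

def makespan(schedule):
    """Schedule = CSC index of each task; fitness is the resulting makespan."""
    load = [0] * N_CSC
    for task, csc in enumerate(schedule):
        load[csc] += EXEC_TIME[task][csc]
    return max(load)

def crossover(a, b):
    cut = random.randrange(1, N_TASKS)
    return a[:cut] + b[cut:]

def mutate(schedule):
    s = schedule[:]
    # greedy food-foraging flavour: move one random task to the nearest CSC
    s[random.randrange(N_TASKS)] = min(range(N_CSC), key=lambda c: CSC_DISTANCE[c])
    return s

population = [[random.randrange(N_CSC) for _ in range(N_TASKS)] for _ in range(10)]
for _ in range(50):                                    # stopping criterion
    population.sort(key=makespan)                      # queen = fittest schedule
    broods = [mutate(crossover(*random.sample(population[:5], 2))) for _ in range(10)]
    population = sorted(population + broods, key=makespan)[:10]

print("best schedule:", population[0], "makespan:", makespan(population[0]))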

The Flow Chart of the Bee Algorithm:

[Flow chart summary: multiple user requests enter the global queue and task properties are
generated; if jobs need to be scheduled, the system status is checked and BLA initialises the
population and checks fitness. Until the stopping criterion is met, reproduction (select drones,
mutation, crossover) and food foraging (greedy method: find the first neighbour, then its
successors) are carried out and the next population is selected; the output is the optimal
schedule.]
Efficiency and Performance Analysis
A number of different sets of tasks with the same number of instructions, and assumed to have
the same execution time, were used for examining the efficiency of the scheduling methods. The
common and significant evaluation metrics are makespan and flowtime. Adil Yousif states that
makespan is the time at which the system completes the latest task, and flowtime is the total of
the execution times of all tasks submitted to the cloud. In order to evaluate the performance and
effectiveness of the BLA scheduler, it is assumed that 3 jobs are to be executed on 3 CSCs and
their resources. Each job can be divided into tasks according to the task properties and scheduled
to a CSC. Each task is defined by its execution time. A simple simulation was conducted to
measure the performance of the proposed scheduling algorithm. The results, shown in Figure 4,
illustrate that the proposed method has a smaller makespan than other nature-inspired algorithms
such as the firefly algorithm (FA) or even the genetic algorithm (GA).
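As a small illustration of these two metrics, the snippet below computes the makespan and flowtime for an assumed assignment of tasks to CSCs; the execution times are placeholders, not results from the cited simulation.

# Hypothetical per-task execution times (seconds), grouped by the CSC each
# task was assigned to; the values are illustrative only.
assignment = {"CSC1": [20, 15], "CSC2": [30], "CSC3": [25, 10]}

# Makespan: the finishing time of the CSC that finishes last.
makespan = max(sum(times) for times in assignment.values())

# Flowtime (as defined above): total execution time of all submitted tasks.
flowtime = sum(sum(times) for times in assignment.values())

print(f"makespan = {makespan}s, flowtime = {flowtime}s")   # makespan = 35s, flowtime = 100s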

Advantages of the Bee Algorithm

• Simplicity, flexibility and robustness.
• Relatively few, flexible control parameters.
• Ease of implementation with basic mathematical and logical operations.
• The algorithm has both local and global search ability.
• It has been applied to several optimisation problems.

Disadvantages

• Random initialisation.
• The algorithm has several parameters.
• The parameters need to be tuned.

Improved Genetic Algorithm

Before we look into the Improved Genetic Algorithm we have to consider two algorithms
described by Pardeep Kumar and Amandeep Verma, namely Min-Min and Max-Min. The
Min-Min and Max-Min algorithms were used to improve the Genetic Algorithm. These two
algorithms are explained below:
1. Min-Min Algorithm
Min-Min begins with a set of tasks which are all unassigned. First, it computes the minimum
completion time for all tasks on all resources. Then, among these minimum times, the
smallest value is selected; this is the minimum time among all the tasks on any resource.
That task is then scheduled on the resource on which it takes the minimum time, and the
available time of that resource is updated for all the other tasks. It is updated in this
manner: suppose a task is assigned to a machine and it takes 20 seconds on the assigned
machine, then the completion times of all the other tasks on this assigned machine will be
increased by 20 seconds. After this, the assigned task is no longer considered, and the same
process is repeated until all the tasks have been assigned resources.

2. Max-Min Algorithm
Max-Min is almost the same as the Min-Min algorithm except for the following: after
computing the completion times, the minimum execution time is found for each task. Then,
among these minimum times, the maximum value is selected; this is the maximum among
the minimum times of all the tasks on any resource. That task is then scheduled on the
resource on which it takes the minimum time, and the available time of that resource is
updated for all the other tasks. The updating is done in the same manner as for Min-Min.
All the tasks are assigned resources by this procedure.
Pardeep Kumar and Amandeep Verma implemented the logic of the Min-Min and Max-Min
algorithms on the execution time values given in Table 1 below, assuming four machines and six
tasks; a small sketch of both heuristics using these values follows the table.

Table 1 Execution Time


M0 M1 M2 M3
T0 200 250 220 300
T1 150 170 190 160
T2 300 320 180 360
T3 400 380 350 310
T4 100 120 140 160
T5 220 250 280 200
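The sketch below is one possible Python reading of the Min-Min and Max-Min heuristics described above, applied to the execution times in Table 1. It is illustrative only; the exact makespans depend on tie-breaking and on how the resource times are updated, so they may differ from the 630 and 590 reported in Figures 1 and 2.

# Execution times from Table 1: rows = tasks T0..T5, columns = machines M0..M3.
EXEC = [
    [200, 250, 220, 300],  # T0
    [150, 170, 190, 160],  # T1
    [300, 320, 180, 360],  # T2
    [400, 380, 350, 310],  # T3
    [100, 120, 140, 160],  # T4
    [220, 250, 280, 200],  # T5
]

def schedule(exec_times, use_max_min=False):
    """Min-Min (default) or Max-Min scheduling; returns the machine ready times."""
    unassigned = set(range(len(exec_times)))
    ready = [0] * len(exec_times[0])            # current load of each machine
    while unassigned:
        # For each task, its best (minimum) completion time and the machine giving it.
        best = {t: min((ready[m] + exec_times[t][m], m)
                       for m in range(len(ready)))
                for t in unassigned}
        # Min-Min picks the task with the smallest such time, Max-Min the largest.
        chooser = max if use_max_min else min
        task = chooser(best, key=lambda t: best[t][0])
        finish, machine = best[task]
        ready[machine] = finish                 # update the machine's ready time
        unassigned.remove(task)
    return ready

print("Min-Min makespan:", max(schedule(EXEC)))
print("Max-Min makespan:", max(schedule(EXEC, use_max_min=True)))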

Figure 1 Task assignment by Min-Min algorithm

Figure 2 Task Assignment By Max-Min Algorithm

There is a term "makespan" in the Min-Min and Max-Min scheduling techniques, which is the
maximum execution time on any machine among the machines on which the tasks are scheduled.
For example, in Figure 1, 630 is the makespan because it is the maximum execution time among
the four machines. In Figure 1 and Figure 2, the x-axis represents the different machines and the
y-axis represents the execution times. According to Pardeep Kumar and Amandeep Verma, the
two techniques produced the following makespans:

Method used    Makespan
Min-Min        630
Max-Min        590

Depending on the execution times of the tasks on the resources, one technique can outperform
the other, and the assignment of resources to tasks can change; that is, a task assigned to one
machine when one technique is used may be assigned to a different machine when the other
technique is used.

Genetic algorithm is a method of scheduling in which the tasks are assigned resources according
to individual solutions (called schedules in the context of scheduling); a schedule tells which
resource is to be assigned to which task. The Genetic Algorithm is based on the biological
concept of population generation. The main terms used in genetic algorithms are:

A. Initial Population
The initial population is the set of all the individuals that are used in the genetic algorithm to
find the optimal solution. Every solution in the population is called an individual, and every
individual is represented as a chromosome to make it suitable for the genetic operations.
From the initial population, individuals are selected and some operations are applied to them
to form the next generation. The mating chromosomes are selected based on some specific
criteria.
B. Fitness Function
A fitness function is used to measure the quality of the individuals in the population
according to the given optimization objective. The fitness function can differ from case to
case: in some cases it can be based on a deadline, while in other cases it can be based on
budget constraints.
C. Selection
The proportional selection operator is used to determine the probability of individuals
passing their genes to the next generation of the population. With the proportional selection
operator, the probability that an individual is selected and passed on to the next generation
is proportional to the size of the individual's fitness.
D. Crossover
A single-point crossover operator is used. Single-point crossover means that only one
crossover point is set up in the individual's code; at that point, part of the pair of individual
chromosomes is exchanged.
E. Mutation
Mutation means that the values of some gene loci in the chromosome coding series are
replaced by other gene values in order to generate a new individual. For binary-coded
individuals, mutation negates the value at the mutation points.

The Genetic Algorithm that was improved works in the following manner:

1. Begin
2. Initialize population with random candidate solutions.
3. Evaluate each candidate.
4. Repeat until (termination condition is satisfied)
a. Select parents.
b. Recombine pairs of parents.
c. Mutate the resulting offspring.
d. Evaluate the new candidates.
e. Select individuals for the next generation.
5. End

In the Genetic Algorithm, the initial population is generated randomly, so the different schedules
are not very fit; when these schedules are then mated with each other, there is much less chance
that they will produce children better than themselves. The authors therefore provide an idea for
generating the initial population for Genetic Algorithms using the Min-Min and Max-Min
techniques. As discussed for the Genetic Algorithm, solutions that are fit give better subsequent
generations when genetic operators are applied to them; hence, if Min-Min and Max-Min are
used for generating individuals, we obtain a better initial population, and in turn better solutions,
than in the standard Genetic Algorithm in which the initial population is chosen randomly.

The algorithm performed fairly well on a wide variety of problems. Genetic algorithms operate
on a population of solutions rather than a single solution and employ heuristics such as selection,
crossover and mutation to evolve better solutions.

The new Improved Genetic Algorithm is as follows:

1. Begin.
2. Find an initial solution using Min-Min and Max-Min.
3. Initialize the population with the result of Step 2.
4. Evaluate each candidate.
5. Repeat until (termination condition occurs)
a. Select parents.
b. Recombine pairs of parents.
c. Mutate the resulting offspring.
d. Evaluate the new candidates.
e. Select individuals for the next generation.
6. End.
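A minimal Python sketch of this improved GA is shown below. It treats a schedule as a list mapping each task to a machine, seeds the population with Min-Min/Max-Min style solutions as described above, and reuses the execution times of Table 1; the fitness measure (makespan), operator rates and population size are illustrative assumptions.

import random

# Execution times from Table 1 (tasks T0..T5 on machines M0..M3).
EXEC = [[200, 250, 220, 300], [150, 170, 190, 160], [300, 320, 180, 360],
        [400, 380, 350, 310], [100, 120, 140, 160], [220, 250, 280, 200]]
N_TASKS, N_MACHINES = len(EXEC), len(EXEC[0])

def makespan(schedule):
    load = [0] * N_MACHINES
    for t, m in enumerate(schedule):
        load[m] += EXEC[t][m]
    return max(load)

def greedy_seed(largest_first):
    """Min-Min-like (False) or Max-Min-like (True) seed schedule."""
    order = sorted(range(N_TASKS), key=lambda t: min(EXEC[t]), reverse=largest_first)
    load, schedule = [0] * N_MACHINES, [0] * N_TASKS
    for t in order:
        m = min(range(N_MACHINES), key=lambda m: load[m] + EXEC[t][m])
        schedule[t] = m
        load[m] += EXEC[t][m]
    return schedule

def crossover(a, b):
    cut = random.randrange(1, N_TASKS)
    return a[:cut] + b[cut:]

def mutate(s):
    s = s[:]
    s[random.randrange(N_TASKS)] = random.randrange(N_MACHINES)
    return s

# Steps 2-3: seed the population with Min-Min/Max-Min style solutions plus random ones.
population = [greedy_seed(False), greedy_seed(True)]
population += [[random.randrange(N_MACHINES) for _ in range(N_TASKS)] for _ in range(8)]

for _ in range(100):                                   # step 5: evolve
    parents = sorted(population, key=makespan)[:4]     # selection
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]
    population = sorted(parents + children, key=makespan)[:10]

print("best makespan:", makespan(min(population, key=makespan)))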

Simulations and Results


Pardeep Kumar and Amandeep Verma used CloudSim as a simulator for checking the
performance of our improved algorithm and the standard Genetic Algorithm. CloudSim is an
extensible simulation toolkit that enables modelling and simulation of Cloud computing
systems and application provisioning environments. The CloudSim toolkit supports both system
and behaviour modelling of Cloud system components such as data centres, virtual machines
(VMs) and resource provisioning policies. It implements generic application provisioning
techniques that can be extended with ease and limited efforts. They also considered Virtual
Machines as resource and Cloudlets as tasks/jobs. According to Pardeep Kumar and Amandeep
Verma, they checked the performance of the algorithms in two cases: in first case, they had
fixed the number of virtual machines and varied the number of cloudlets; in second case they
had fixed the number of cloudlets and varied the number of virtual machines. The makespans
that the algorithms produce are shown in tables and the graphs corresponding to these tables
have been shown. In first case, they had fixed the number of virtual machines as 10 and they are
varying the number of cloudlets from 10 to 40 with a difference of 10. They ran each algorithm
10 times and the average of these 10 runs is noted down in Table II shown below:

Table 2 Makespans for Fixed VMs and Varying Cloudlets
(VMs fixed at 10; cloudlets varying)

Cloudlets            10      20      30      40
Improved Genetic     8       26.1    60.9    113.5
Standard Genetic     12.4    44.7    86.5    146.8

The performance for the first case, according to the recorded values, is shown in the graph of
Figure 3, in which the x-axis shows the number of cloudlets and the y-axis shows the makespans;
the number of virtual machines is fixed at 10.
Figure 3 Graph For Makespans For Fixed VMs And Varying Cloudlets

In the second case, they fixed the number of cloudlets at 40 and varied the number of virtual
machines from 10 to 40 in steps of 10. They ran each algorithm 10 times and the average of these
10 runs is recorded in Table 3 below:

Table 3 Makespans for Fixed Cloudlets and Varying VMs
(Cloudlets fixed at 40; VMs varying)

VMs                  10      20      30      40
Improved Genetic     8       26.1    60.9    113.5
Standard Genetic     12.4    44.7    86.5    146.8
The performance for the second case, according to the recorded values, is shown in the graph of
Figure 4, in which the x-axis shows the number of virtual machines and the y-axis shows the
makespans; the number of cloudlets is fixed at 40.
Figure 4 Graph for makespans for fixed Cloudlets and varying VMs

From both graphs it can be observed that the makespan of the Improved Genetic Algorithm is
smaller than that of the standard Genetic Algorithm. The new improved Genetic Algorithm
therefore helped to reduce the overall execution time of the tasks and to improve the utilization
of resources.

Advantages

• Used to generate useful solutions to optimization and search problems.
• It can solve problems with multiple solutions.
• Improved genetic algorithms are easily transferred to existing simulations and models.
• It can solve multidimensional, non-differential, non-continuous and non-parametrical problems.
• Easy to understand and practically does not demand knowledge of mathematics.

Disadvantages

• The genetic algorithm performed well on some problems that were very difficult for branch
and bound techniques (i.e. problems for which the branch and bound method took a long time
to reach the optimal solution), but it did not perform well on problems in which the resources
were tightly constrained. This comes as little surprise, since the representation forces the
genetic algorithm to search for resource feasibility, and tightly constrained resources mean
fewer resource-feasible solutions. The genetic algorithm also did not perform well on the job
shop problem.
• The genetic algorithm cannot assure constant optimisation response times.

Ant Colony Algorithm
Basic principles of Ant trail laying:
According to Ratan Mishra and Anant Jaiswal, depending on the species, ants may lay
pheromone trails when travelling from the nest to food, or from food to the nest, or when
travelling in either direction. They also follow these trails with a fidelity which is a function of
the trail strength, among other variables. Ants drop pheromones as they walk by stopping briefly
and touching their gaster, which carries the pheromone-secreting gland, on the ground. The
strength of the trail they lay is a function of the rate at which they make deposits and the amount
per deposit. Since pheromones evaporate and diffuse away, the strength of the trail when it is
encountered by another ant is a function of the original strength and the time since the trail was
laid. Most trails consist of several superimposed trails from many different ants, which may have
been laid at different times; it is the composite trail strength which is sensed by the ants.
Shilpa Damor states that the Ant Colony algorithm aims to search for an optimal path in a graph,
based on the behaviour of ants seeking a path between their colony and a source of food.

Steps of the Ant Colony algorithm according to Shilpa Damor:

1. The first ant finds the food source (F) via any path (a), then returns to the nest (N), leaving
behind a pheromone trail (b).
2. Ants indiscriminately follow the four possible paths, but the reinforcement of the trail makes
the shortest route more attractive.
3. Ants take the shortest route; long portions of the other paths lose their pheromone trails.

Ant Colony algorithm:

(Based on chapter 5 of the book Design of Ant Colony Optimization Algorithm for Task
Scheduling)

Step 1: Collect all details about jobs and resources.
Step 2: Initialise the parameter values.
Step 3: For each ant, repeat Step 4 and Step 5.
Step 4: a) Select the task (Ti) and resource (Rj).
b) Assign (Ti, Rj, availability[j], availability[j] + PTij) to the output list.
Step 5: Repeat the following until all tasks are executed:
a) Calculate the heuristic information (ηij).
b) Calculate the current pheromone trail value Δτij.
c) Update the pheromone trail matrix.
d) Calculate the probability matrix.
e) Select the task with the highest probability over i and j as the next task Ti to be executed
on the resource Rj.
f) Remove the task Ti from the unscheduled list.
g) Modify the resource free time: availability[j] = availability[j] + PTij.
Step 6: Find the best feasible solution by analysing all the ants' scheduling lists.
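As an illustration of these steps, the following is a minimal Python sketch of an ACO-style task scheduler. It simplifies the design described here (task dependencies and inter-data-centre communication costs are ignored, and makespan alone plays the role of the cost fk); the processing times and parameter values are assumed for demonstration only.

import random

# Illustrative problem data (assumed): processing time PT[i][j] of task i on resource j.
PT = [[20, 30, 25], [15, 25, 20], [40, 20, 35], [10, 15, 30]]
N_TASKS, N_RES = len(PT), len(PT[0])
RHO, ALPHA, BETA, N_ANTS, ITER = 0.5, 1.0, 1.0, 10, 50

tau = [[0.01] * N_RES for _ in range(N_TASKS)]      # pheromone on (task, resource) pairs
eta = [[1.0 / PT[i][j] for j in range(N_RES)] for i in range(N_TASKS)]   # heuristic info

def build_schedule():
    """One ant assigns every task to a resource using the random proportional rule."""
    availability = [0] * N_RES
    schedule = []
    for i in range(N_TASKS):
        weights = [(tau[i][j] ** ALPHA) * (eta[i][j] ** BETA) for j in range(N_RES)]
        j = random.choices(range(N_RES), weights=weights)[0]
        schedule.append(j)
        availability[j] += PT[i][j]                 # modify the resource free time
    return schedule, max(availability)              # makespan used as the cost fk

best = None
for _ in range(ITER):
    for schedule, cost in [build_schedule() for _ in range(N_ANTS)]:
        if best is None or cost < best[1]:
            best = (schedule, cost)
        # pheromone update: evaporation plus a deposit inversely related to the cost
        for i, j in enumerate(schedule):
            tau[i][j] = RHO * tau[i][j] + (1 - RHO) / cost

print("best assignment (task -> resource):", best[0], "makespan:", best[1])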

Design of ACO for task assignment:

The Task Resource Scheduling (TRS) problem is that of assigning T tasks to R resources so that
the assignment cost is minimized, where the cost is defined by a cost function. TRS is considered
one of the hardest combinatorial optimization (CO) problems and can be solved to optimality
only for small instances.

Collect all details about jobs and resources:

Before the algorithm starts, we identify the number of tasks (T) and the number of available
resources (R) to which the tasks can be assigned. The cloud is a collection of data centres (DCs)
represented as a weighted graph, as shown in Figure 5. In the graph, each node represents a data
centre or resource. The cost of communication between data centres is represented using the
variable 'dc'; for example, dcij is the cost of communication between data centre i and data
centre j. The cost of communication between resources of the same data centre is zero.
In a heterogeneous cloud environment, tasks may need to communicate with other tasks which
are hosted on other resources. The task dependencies are represented as a Task Dependence
Graph (TDG), as depicted in Figure 6. Each node of the graph represents a task. An edge between
two nodes represents a dependency between the corresponding tasks, and the edge cost represents
the weight of communication between them.
Figure 5 Resource Dependencies with Communication Cost Graph

For example, if there is a dependency between tasks i and j then the edge weight between them is
defined as CCij. If there is no edge between task i and task j then CCij is defined as zero. The
edge weight between a task and itself is also defined as zero, i.e. CCii = 0.

Figure 6 Task Dependence-Communication Cost Graph (TDG)

The Tij matrix consists of M×M entries indicating the communication cost between the tasks,
where M is the number of tasks. The Rij matrix consists of N×N entries indicating the
communication cost between resources, where N is the number of resources available in the
cloud environment.

Initialisation of the parameter values:

Pheromone evaporation value (ρ) = 0.5
Initial pheromone deposit value (τ0) = 0.01
Importance of pheromone (α) = 1
Importance of resource innate attributes (β) = 1
Availability of resources: availability[1...N]

Calculation of the heuristic information (ηij):

A heuristic value is denoted ηi when associated with components and ηij when associated with
the connection between resource i and resource j. In many cases η is the cost, or an estimate of
the cost, of adding the component or connection to the solution under construction. The heuristic
information ηij is typically inversely proportional to the distance between nodes i and j, a
straightforward choice being
ηij = 1 / CCij
Calculation of the pheromone trail value (Δτij):
The pheromone trail value can be used to learn an appropriate order for task assignments. While
ants build a scheduling solution, they visit edges and change their pheromone level, which is
used immediately in the local update rule. The local update rule reduces convergence because
ants choose a new resource based on a high pheromone level; that resource therefore becomes
less desirable for the following ants if its pheromone trail is reduced.
This is achieved as shown by the equations below:
τij = ρ·τij + Δτij, where 0 < ρ < 1 and 1 ≤ i, j ≤ N
Δτij = (1 − ρ) / fk, where 0 < ρ < 1
At each iteration, the ants calculate the minimised cost function fk for the k-th ant.

Calculation of the probability matrix:


Using the random proportional rule, the current ant at resource i chooses to go to resource j with
probability Pij.
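The random proportional rule itself is not written out in the text; assuming the standard ACO formulation, the probability would take the form

Pij = (τij^α · ηij^β) / Σl (τil^α · ηil^β)

where the sum runs over the resources l still available to the ant. With α = 1 and the uniform heuristic values used in the worked example below, this expression reproduces the probability matrix given in Table 11.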

Implementation of ACO:
The Ant Colony Optimization algorithm is implemented to solve the task scheduling problem.
Tasks arrive in the system randomly and are then grouped into batches. The batch of tasks
generated is given as input to the algorithm. The ACO schedules these tasks onto the available
resources. ACO generates a new solution and tests it by evaluating the cost function. Each
solution is scored and either accepted or rejected before being considered for the next iteration.
The makespan of the cloud environment can be calculated using the total cost
F(x) = TM + TD + TO + TE + TC. Makespan is used to measure the throughput of the cloud
system. The minimum F(x) value is taken as the makespan of the cloud system. The proposed
algorithm should minimise the makespan value of the cloud system.

Experimental Results and Analysis of the Ant Colony Algorithm:

An example based on the collected details about the jobs and resources is given in the tables
below:
Table 4 Available Resources in 4 data centres
Data Centre # 1 2 3 4
Available processing capacity 60 70 55 65
Available Memory 30 25 40 50
Available network bandwidth 8 4 10 15

Table 5 Resource Requirements for 6 Tasks


Task# 1 2 3 4 5 6
Process Time 20 25 30 15 35 40
Memory Requirement 10 15 20 25 10 30

The task dependencies are represented as a Task Dependence Graph (TDG), as depicted in
Figure 7.

Figure 7 Task Dependence-Communication Cost Graph (TDG)

Table 6 Task Dependency Cost Matrix representation
Task # 1 2 3 4 5 6
1 0 0 0 1 0 0
2 0 0 1 3 4 0
3 0 1 0 0 3 0
4 1 3 0 0 0 5
5 0 4 3 0 0 2
6 0 0 0 5 2 0
The cloud is a collection of data centres represented as a weighted graph. In the graph, each node
represents a data centre or resource. The cost of communication between data centres is
represented using the variable 'dc'; for example, dcij is the cost of communication between data
centres i and j.
Figure 8 Resource Dependency with Communication Cost Graph
Table 7 Resource Dependency Communication Cost Matrix Representation
Data Centre # 1 2 3 4
1 0 1 2 4
2 1 0 3 1
3 2 3 0 2
4 4 1 2 0

Initialisation of the parameter values:

Pheromone evaporation value (ρ) = 0.5
Initial pheromone deposit value (τ0) = 0.01
Importance of pheromone (α) = 1
Importance of resource innate attributes (β) = 2
The available resources are represented in a one-dimensional matrix availability[1...N], where
availability(j) defines the free memory/processing time of machine j.
Table 8 Resource Availability
Data Centre # 1 2 3 4
Available processing capacity 60 70 55 65
Available Memory 40 60 30 35
Available network bandwidth 8 4 10 15

Calculation of the heuristic information (ηij):


Table 9 Heuristic Information of Resources
R1 R2 R3 R4
R1 0 1 1 1
R2 1 0 1 1
R3 1 1 0 1
R4 1 1 1 0

Calculation of the current pheromone trail value (Δτij):


Table 10 Current Pheromone Trail Value of Resources
R1 R2 R3 R4
R1 0 1 2 4
R2 1 0 3 1
R3 2 3 0 2
R4 4 1 2 0

Calculation of the probability matrix

Table 11 Probability Matrix for Resources
Pij R1 R2 R3 R4
R1 0.0 0.142857 0.285714 0.571428
R2 0.2 0.0 0.6 0.2
R3 0.285714 0.428571 0.0 0.285714
R4 0.571428 0.142857 0.285714 0.0
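As a cross-check, the short Python sketch below recomputes this probability matrix from the pheromone values of Table 10 and the heuristic values of Table 9, using the random proportional rule assumed earlier; it reproduces the values of Table 11 up to rounding.

# Pheromone trail values between resources (Table 10) and heuristic
# information (Table 9); alpha = 1 and beta = 2 as in the initialisation above.
TAU = [[0, 1, 2, 4], [1, 0, 3, 1], [2, 3, 0, 2], [4, 1, 2, 0]]
ETA = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
ALPHA, BETA = 1, 2

def probability_row(i):
    weights = [(TAU[i][j] ** ALPHA) * (ETA[i][j] ** BETA) for j in range(4)]
    total = sum(weights)
    return [round(w / total, 6) for w in weights]

for i in range(4):
    print(f"R{i + 1}:", probability_row(i))
# R1: [0.0, 0.142857, 0.285714, 0.571429] and so on, matching Table 11.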

After calculating the probability matrix Pij, all the resources are arranged in increasing order of
probability value. Based on the above calculation of Pij, a number of possible paths are
generated from each resource; these are called the possible ant movements used to find the
optimal solution.

Starts from Resource R1:


Path 1: R1 R2 R3 R4
Starts from Resource R2:
Path 2: R2 R1 R4 R3 Path 3: R2 R4 R1 R3
Starts from Resource R3:
Path 4: R3 R4 R1 R2 Path 5: R3 R1 R4 R2
Starts from Resource R4:
Path 6: R4 R2 R3 R1

Communication Cost Calculation (Tc):


Path 1: R1 R2 R3 R4
Task allocation of resources by using path 1

Availability of Resources after allocating each task in path 1


DC#1(30) DC#2(25) DC#3(40) DC#4(50)
T1(10) 1(20) 0 0 0
T2(15) 1(5) 0 0 0
T3(20) 0 1(5) 0 0
T4(25) 0 0 1(15) 0
T5(10) 0 0 1(5) 0
T6(30) 0 0 0 1(20)

Availability of Resources after completion of all tasks in path 1


Resources DC #1 DC #2 DC #3 DC #4
Availability 5 5 5 20

Total Communication Cost (Tc) for completion of all tasks in path1


Data Centres Communication cost (Tc)
DC1 (cc14 * dc13) + (cc24*dc13+cc23*dc12+cc25*dc13) 17
DC2 cc32*dc21+cc35*dc23 10
DC3 cc42*dc31+cc46*dc34+cc41*dc31 18
DC4 cc64*dc43+cc65*dc43 14
Total Communication Cost (Tc) 59
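The per-data-centre sums above multiply each task-dependency cost (cc, Table 6) by the communication cost between the hosting data centres (dc, Table 7). The sketch below is a simplified interpretation that counts each cross-data-centre dependency once; the worked tables enumerate some dependencies from both endpoints, so the totals are not expected to match exactly. The example allocation mirrors path 1 (tasks 1-2 on DC1, task 3 on DC2, tasks 4-5 on DC3, task 6 on DC4).

cc = [  # task dependency cost matrix (Table 6), tasks 1..6
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 3, 4, 0],
    [0, 1, 0, 0, 3, 0],
    [1, 3, 0, 0, 0, 5],
    [0, 4, 3, 0, 0, 2],
    [0, 0, 0, 5, 2, 0],
]
dc = [  # data centre communication cost matrix (Table 7), DCs 1..4
    [0, 1, 2, 4],
    [1, 0, 3, 1],
    [2, 3, 0, 2],
    [4, 1, 2, 0],
]

def total_communication_cost(assignment):
    """assignment[t] = data centre index (0-based) hosting task t."""
    tc = 0
    n = len(assignment)
    for t1 in range(n):
        for t2 in range(t1 + 1, n):          # each dependency counted once
            if cc[t1][t2] > 0 and assignment[t1] != assignment[t2]:
                tc += cc[t1][t2] * dc[assignment[t1]][assignment[t2]]
    return tc

print(total_communication_cost([0, 0, 1, 2, 2, 3]))
# prints 40 under this count-once convention (the worked table gives 59
# because it enumerates several dependencies from both data centres)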

Path 2: R2 R1 R4 R3

Task allocation of resources by using path 2

Availability of Resources after allocating each task in path 2
DC#2(25) DC#1(30) DC#4(50) DC#3(40)
T1(10) 1(15) 0 0 0
T2(15) 1(0) 0 0 0
T3(20) 0 1(10) 0 0
T4(25) 0 0 1(25) 0
T5(10) 0 1(0) 0 0
T6(30) 0 0 0 1(10)

Availability of Resources after completion of all tasks in path 2


Resources DC #1 DC #2 DC #3 DC #4
Availability 0 0 10 25

Total Communication Cost (Tc) for completion of all tasks in path2


Data Centres Communication cost (Tc)
DC1 cc32*dc12+cc35*dc11+cc52*dc12+cc53*dc11+cc56*dc14 13
DC2 cc14*dc23+cc24*dc23+cc25*dc21+cc23*dc21 17
DC3 cc41*dc32+cc42*dc32+cc46*dc34 22
DC4 cc64*dc34+cc65*dc41 18
Total Communication Cost (Tc) 70

Path 3: R2 R4 R1 R3
Task allocation of resources by using path 3

Availability of Resources after allocating each task in path 3


DC#2(25) DC#4(50) DC#1(30) DC#3(40)
T1(10) 1(15) 0 0 0
T2(15) 1(0) 0 0 0
T3(20) 0 1(30) 0 0
T4(25) 0 1(5) 0 0
T5(10) 0 0 1(20) 0
T6(30) 0 0 0 1(10)

Availability of Resources after completion of all tasks in path 3


Resources DC #1 DC #2 DC #3 DC #4
Availability 20 0 10 5

Total Communication Cost (Tc) for completion of all tasks in path 3


Data Centres Communication cost (Tc)
DC1 cc52*dc12+cc53*dc14+cc56*dc13 20
DC2 cc14*dc24+cc24*dc24+cc25*dc21+cc23*dc24 9
DC3 cc64*dc34+cc65*dc31 14
DC4 cc32*dc42+cc35*dc41 13
Total Communication Cost (Tc) 56

Path 4: R3 R4 R1 R2
Task allocation of resources by using path 4

Availability of Resources after allocating each task in path 4


DC#3(40) DC#1(30) DC#4(50) DC#2(25)
T1(10) 1(30) 0 0 0
T2(15) 1(15) 0 0 1(0)
T3(20) 0 0 1(30) 0
T4(25) 0 0 1(5) 0
T5(10) 1(5) 0 0 0
T6(30) 0 1(0) 0 0

Availability of Resources after completion of all tasks in path 4


Resources DC #1 DC #2 DC #3 DC #4
Availability 0 25 5 5

Total Communication Cost (Tc) for completion of all tasks in path 4
Data Centres Communication cost (Tc)
DC1 cc64*dc14+cc65*dc13 24
DC2 ----------------- 0
DC3 cc14*dc34+cc24*dc34+cc25*dc33+cc23*dc34+cc52*dc33+cc53*dc34+cc56*dc31 20
DC4 cc32*dc43+cc35*dc43+cc41*dc43+cc42*dc43+cc46*dc41 36
Total Communication Cost (Tc) 80

Path 5: R3 R1 R4 R2
Task allocation of resources by using path 5

Availability of Resources after allocating each task in path 5


DC#3(40) DC#4(50) DC#1(30) DC#2(25)
T1(10) 1(30) 0 0 0
T2(15) 1(15) 0 0 1(0)
T3(20) 0 0 1(10) 0
T4(25) 0 1(25) 0 0
T5(10) 1(5) 0 0 0
T6(30) 0 0 0 0

Availability of Resources after completion of all tasks in path 5


Resources DC #1 DC #2 DC #3 DC #4
Availability 10 25 5 25

Total Communication Cost (Tc) for completion of all tasks in path 5


Data Centres Communication cost (Tc)
DC1 cc32*dc13+cc35*dc13 8
DC2 ----------------- 0
DC3 c14*dc34+cc24*dc34+cc25*dc33+cc23*dc31+cc52*dc33+cc53*dc31 16
DC4 cc41*dc43+cc42*dc43 8
Total Communication Cost (Tc) 32

Path 6: R4 R2 R3 R1
Task allocation of resources by using path 6

Availability of Resources after allocating each task in path 6


DC#4(50) DC#2(25) DC#3(40) DC#1(30)
T1(10) 1(40) 0 0 0
T2(15) 1(25) 0 0 0
T3(20) 1(5) 0 0 0
T4(25) 0 1(0) 0 0
T5(10) 0 0 1(30) 0
T6(30) 0 0 1(0) 0

Availability of Resources after completion of all tasks in path 6


Resources DC #1 DC #2 DC #3 DC #4
Availability 30 0 0 5

Total Communication Cost (Tc) for completion of all tasks in path 6


Data Centres Communication cost (Tc)
DC1 ----------------- 0
DC2 cc41*dc24+cc42*dc24+cc46*dc23 19
DC3 cc52*dc34+cc53*dc34+cc56*dc33+cc64*dc32+cc65*dc33 29
DC4 cc14*dc42+cc24*dc42+cc25*dc43+cc23*dc44+cc32*dc44+cc35*dc43 18
Total Communication Cost (Tc) 66

As a measure of performance, the total cost function was used. The total cost was computed
using two heuristics: Ant Colony Optimisation (ACO) based cost optimisation, and a Genetic
Algorithm (GA) selecting a resource based on minimum cost.
The total number of tasks was set from 10 to 100, the processing times of tasks were uniformly
distributed in [5, 10] and the memory requirements were uniformly distributed in [50, 100].
The interactive data between tasks varied from 1 to 10, and the communication costs
between data centres were varied by uniform distribution from 1 to 10. The simulation results
demonstrate that more iterations or a larger number of particles obtain a better solution, since more
solutions are generated.
Figure 9 Experimental Observations of GA and ACO

Advantages
 For a small number of nodes, problems can be solved by exhaustive search
 The algorithm has strength in both local and global searches
 Implemented in several optimization problems

Disadvantages
 Given large number of nodes it is very difficult to carry out computations.
 Theoretical analysis is difficult, due to sequences of random decisions
 Random initialization
 Probabilistic approach in the local search

Particle Swarm Optimization (PSO)

Particle Swarm Optimization (PSO) is a self-adaptive, global-search-based optimisation technique
introduced by Kennedy and Eberhart. According to Suraj Pandey, Linlin Wu, Siddeswara Mayura
Guru and Rajkumar Buyya, the algorithm is similar to other population-based algorithms such as
genetic algorithms, but there is no direct recombination of individuals of the population. Instead, it
relies on the social behaviour of the particles. In every generation, each particle adjusts its trajectory
based on its best position (local best) and the position of the best particle (global best) of the entire
population. This concept increases the stochastic nature of the particle and converges quickly to a
global minimum with a reasonably good solution. PSO has become popular due to its simplicity and its
effectiveness in a wide range of applications with low computational cost. Some of the applications that
have used PSO are: the reactive voltage control problem, data mining, chemical engineering, pattern
recognition and environmental engineering. PSO has also been applied to solve NP-hard
problems such as scheduling and task allocation.

Task-Resource Scheduling Problem Formulation:


The mapping of tasks of an application workflow to distributed resources can have several
objectives. Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar Buyya focused
on minimizing the total cost of computation of an application workflow.
They denote an application workflow as a Directed Acyclic Graph (DAG) represented by G=
(V,E), where V ={T1, ..., Tn} is the set of tasks, and E represents the data dependencies
between these tasks, that is, fj,k = (Tj, Tk) ∈ E is the data produced by Tj and consumed by Tk.
Assume we have a set of storage sites S = {1, ..., i}, a set of computer sites PC = {1, ..., j},
and a set of tasks T = {1, ..., k}. Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and
Rajkumar Buyya assumed the ‘average’ computation time of a task Tk on a computer
resource PCj for a certain size of input is known. Then, the cost of computation of a task on a
compute host is inversely proportional to the time it takes for computation on that resource.
Also assume the cost of unit data access di,j from a resource i to a resource j is known. The
access cost is fixed by the service provider (e.g. Amazon CloudFront). The transfer cost can
be calculated according to the bandwidth between the sites.
However, we have used the cost for transferring unit data between sites, per second. We
assume that these costs are non-negative, symmetric, and satisfy the triangle inequality: that
is, di,j = dj,i for all i, j ∈ N, and di,j + dj,k ≥ di,k for all i, j, k ∈ N.

Figure 10 An Example Workflow, Computer Nodes (PC) & Storage

Figure 10 depicts a workflow structure with five tasks, which are represented as nodes. The
dependencies between tasks are represented as arrows. This workflow is similar in structure
to our version of the Evolutionary Multi-objective Optimization (EMO) application [20]. The
root task may have an input file (e.g. f.in) and the last task produces the output file (e.g.
f.out). Each task generates output data after it has completed (f12, f13, ..., f45). These data are
used by the task’s children, if any. The numeric value for these data is the edge weight
(ek1,k2) between two tasks k1 ∈ T and k2 ∈ T. The figure also depicts three compute resources
(PC1, PC2, PC3) interconnected with varying bandwidth, each having its own storage unit (S1,
S2, S3). The goal is to assign the workflow tasks to the compute resources such that the total
cost of computation is minimized.
The problem can be stated as: “Find a task-resource mapping instance M, such that when
estimating the total cost incurred using each compute resource PCj , the highest cost among
all the compute resources is minimized.”
Let Cexe(M)j be the total cost of all the tasks assigned to a compute resource PCj (Eq. 1). This
value is computed by adding all the node weights (the cost of execution of a task k on
compute resource j) of all tasks assigned to each resource in the mapping M. Let Ctx(M)j be
the total access cost (including transfer cost) between tasks assigned to a compute resource
PCj and those that are not assigned to that resource in the mapping M (Eq. 2). This value is
the product of the output file size (given by the edge weight ek1,k2) from a task k1 ∈ k to task
k2 ∈ k and the cost of communication from the resource where k1 is mapped (M(k1)) to
another resource where k2 is mapped (M(k2)).
The average cost of communication of unit data between two resources is given by
dM(k1),M(k2). The cost of communication is applicable only when two tasks have file
dependency between them, that is when ek1,k2 > 0. For two or more tasks executing on the
same resource, the communication cost is zero.

Cexe(M)j = Σk wkj,  ∀ M(k) = j                                        (1)

Ctx(M)j = Σk1∈T Σk2∈T dM(k1),M(k2) · ek1,k2,  ∀ M(k1) = j and M(k2) ≠ j   (2)

Ctotal(M)j = Cexe(M)j + Ctx(M)j                                       (3)

Cost(M) = max(Ctotal(M)j),  ∀ j ∈ PC                                  (4)

Minimize(Cost(M)),  ∀ M                                               (5)

Equation 4 ensures that all the tasks are not mapped to a single compute resource: taking the
maximum cost per resource penalises mappings that pile work onto one resource, and the
subsequent minimisation of the overall cost (Equation 5) ensures that the total cost is minimal
even after this distribution.
For a given assignment M, the total cost Ctotal(M)j for a compute resource PCj is the sum of the
execution cost and the access cost (Eq. 3).
When estimating the total cost over all the resources, the largest cost among the resources is
minimised (Eq. 5). This indirectly ensures that the tasks are not all mapped to a single resource
and that there is a distribution of cost among the resources.
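To make the cost model concrete, the following sketch evaluates Equations 1-4 for a given mapping M. The node weights, edge weights and unit data-access costs used here are hypothetical example values, not figures from the paper.

w = {  # w[k][j]: execution cost of task k on compute resource j (hypothetical)
    1: {1: 1.2, 2: 1.6, 3: 1.4},
    2: {1: 1.5, 2: 1.1, 3: 1.3},
    3: {1: 1.1, 2: 1.4, 3: 1.2},
}
e = {(1, 2): 10, (1, 3): 20, (2, 3): 5}     # e[(k1, k2)]: output data size between dependent tasks
d = {(i, j): (0 if i == j else 0.5)         # unit data-access cost between resources
     for i in (1, 2, 3) for j in (1, 2, 3)}

def cost(M):
    """M[k] = resource assigned to task k. Returns Cost(M) = max_j Ctotal(M)_j."""
    totals = {}
    for j in set(M.values()):
        c_exe = sum(w[k][j] for k in M if M[k] == j)                  # Eq. 1
        c_tx = sum(d[(M[k1], M[k2])] * e[(k1, k2)]                    # Eq. 2
                   for (k1, k2) in e if M[k1] == j and M[k2] != j)
        totals[j] = c_exe + c_tx                                      # Eq. 3
    return max(totals.values())                                       # Eq. 4

print(cost({1: 1, 2: 2, 3: 2}))   # e.g. 16.2 for this illustrative mapping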

Scheduling Based On Particle Swarm Optimization:

In this section, we present a scheduling heuristic for dynamically scheduling workflow
applications. The heuristic optimizes the cost of task-resource mapping based on the solution
given by particle swarm optimization technique.
The optimization process uses two components:
a) the scheduling heuristic as listed in Algorithm 1, and
b) the PSO steps for task-resource mapping optimization as listed in Algorithm 2. First,
we will give a brief description of PSO algorithm.

vi^(k+1) = ω·vi^k + c1·rand1 × (pbesti − xi^k) + c2·rand2 × (gbest − xi^k),   (6)

xi^(k+1) = xi^k + vi^(k+1),                                                   (7)

where:
vi^k       velocity of particle i at iteration k
vi^(k+1)   velocity of particle i at iteration k + 1
ω          inertia weight
cj         acceleration coefficients; j = 1, 2
randi      random number between 0 and 1; i = 1, 2
xi^k       current position of particle i at iteration k
pbesti     best position of particle i
gbest      position of the best particle in the population
xi^(k+1)   position of particle i at iteration k + 1.

Particle Swarm Optimisation (PSO) is a swarm-based intelligence algorithm [8] influenced


by the social behaviour of animals such as a flock of birds finding a food source or a school
of fish protecting themselves from a predator. A particle in PSO is analogous to a bird or fish
flying through a search (problem) space. The movement of each particle is co-ordinated by a
velocity which has both magnitude and direction. Each particle position at any instance of
time is influenced by its best position and the position of the best particle in a problem space.
The performance of a particle is measured by a fitness value, which is problem specific.
The PSO algorithm is similar to other evolutionary algorithms. In PSO, the population is the
number of particles in a problem space. Particles are initialised randomly. Each particle has
a fitness value, which is evaluated by a fitness function to be optimised in each
generation. Each particle knows its best position pbest and the best position so far among the
entire group of particles, gbest. The pbest of a particle is the best result (fitness value) so far
reached by that particle, whereas gbest is the best particle in terms of fitness in the entire population.
In each generation the velocity and the position of each particle are updated as in Eq. 6 and Eq. 7,
respectively. The PSO algorithm provides a mapping of all the tasks to a given set of resources.
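A minimal sketch of the velocity and position updates of Eq. 6 and Eq. 7 for a single particle is given below; the inertia weight and acceleration coefficient values are illustrative assumptions.

import random

OMEGA, C1, C2 = 0.7, 2.0, 2.0   # inertia weight and acceleration coefficients (assumed values)

def update_particle(x, v, pbest, gbest):
    """Apply Eq. 6 and Eq. 7 to one particle's position x and velocity v."""
    new_v, new_x = [], []
    for i in range(len(x)):
        vi = (OMEGA * v[i]
              + C1 * random.random() * (pbest[i] - x[i])    # cognitive term
              + C2 * random.random() * (gbest[i] - x[i]))   # social term (Eq. 6)
        new_v.append(vi)
        new_x.append(x[i] + vi)                             # Eq. 7
    return new_x, new_v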

Algorithm 1 Scheduling Heuristic:


1. Calculate average computation cost of all tasks in all computer resources
2. Calculate average cost of (communication/size of data) between resources
3. Set task node weight wkj as average computation cost
4. Set edge weight ek1,k2 as size of file transferred between tasks
5. Compute PSO({ti}) /* a set of all tasks i ∈ k*/
6. repeat
7. for all “ready” tasks {ti} ∈ T do
8. Assign tasks {ti} to resources {pj} according to the solution provided by PSO
9. end for
10. Dispatch all the mapped tasks
11. Wait for polling time
12. Update the ready task list
13. Update the average cost of communication between resources according to the current
network load
14. Compute PSO({ti})
15. until all tasks in the workflow are scheduled

Scheduling Heuristic: Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar
Buyya state that the average computation cost (assigned as node weight in
Figure 10) of all tasks on all the compute resources is calculated first. This cost can be calculated for any
application by executing each task of the application on a series of known resources. It is
represented as the TP matrix in Table 1 of their paper. As the computation cost is inversely proportional to the
computation time, the cost is higher for those resources that complete the task quicker.
Similarly, the average communication cost between resources per unit data is stored,
represented by the PP matrix in Table 1 of their paper. The cost of communication
is inversely proportional to the time taken. The sizes of the input and
output data of each task are also assumed to be known (assigned as edge weight ek1,k2 in Figure 10). In addition,
this cost is considered to be for the transfer per second (unlike Amazon CloudFront, which does not
specify time for transferring). The initial step is to compute the mapping of all tasks in the
workflow, irrespective of their dependencies (Compute PSO({ti})). This mapping optimises the
overall cost of computing the workflow application. To validate the dependencies between
the tasks, the algorithm assigns the “ready” tasks to resources according to the mapping given
by PSO. “Ready” tasks are those tasks whose parents have completed execution
and have provided the files necessary for the tasks’ execution. After dispatching the tasks to
resources for execution, the scheduler waits for the polling time. This time is for acquiring the
status of tasks, and is middleware dependent. Depending on the number of tasks
completed, the ready list is updated, and will now contain the tasks whose parents have
completed execution. The average values for communication between resources are then updated
according to the current network load. As the communication costs would have changed, the
PSO mappings are recomputed. Also, when remote resource management systems are not
able to assign tasks to resources according to the mappings due to resource unavailability, the
recomputation of PSO makes the heuristic dynamically rebalance other tasks’ mappings
(online scheduling). Based on the recomputed PSO mappings, the ready tasks are assigned to the
compute resources. These steps are repeated until all the tasks in the workflow are scheduled.

Algorithm 2 PSO algorithm:


1. Set particle dimension as equal to the size of ready tasks in {ti} ∈ T
2. Initialize particles position randomly from PC = 1, ..., j and velocity vi randomly.
3. For each particle, calculate its fitness value as in Equation 4.
4. If the fitness value is better than the previous best pbest, set the current fitness value as
the new pbest.
5. After Steps 3 and 4 for all particles, select the best particle as ɡbest.
6. For all particles, calculate velocity using Equation 6 and update their positions using
Equation 7.
7. If the stopping criteria or maximum iteration is not satisfied, repeat from Step 3.
NB: The algorithm is dynamic (online) as it updates the communication costs (based on
average communication time between resources) in every scheduling loop. It also recomputes
the task-resource mapping so that it optimizes the cost of computation, based on the current
network and resource conditions.
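As an illustration of Algorithm 2, the compact sketch below encodes each particle as a vector with one dimension per ready task (the value being a resource index), applies the updates of Eq. 6 and Eq. 7, and uses a stand-in fitness function in place of the Cost(M) of Eq. 4. It is a simplified reconstruction under assumed parameter values, not the authors' implementation.

import random

N_TASKS, N_RESOURCES = 5, 3
N_PARTICLES, MAX_ITER = 20, 100
OMEGA, C1, C2 = 0.7, 2.0, 2.0                # assumed coefficient values

def fitness(mapping):
    """Stand-in for Cost(M): here, the load of the busiest resource."""
    load = [0.0] * N_RESOURCES
    for task, res in enumerate(mapping):
        load[res] += 1.0 + 0.1 * task        # hypothetical per-task cost
    return max(load)

def decode(position):
    """Round a continuous position into valid resource indices."""
    return [min(N_RESOURCES - 1, max(0, int(round(p)))) for p in position]

def pso():
    pos = [[random.uniform(0, N_RESOURCES - 1) for _ in range(N_TASKS)]
           for _ in range(N_PARTICLES)]                       # Step 2: random positions
    vel = [[random.uniform(-1, 1) for _ in range(N_TASKS)] for _ in range(N_PARTICLES)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]             # Steps 3-4
    g = min(range(N_PARTICLES), key=lambda i: pbest_fit[i])   # Step 5
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(MAX_ITER):                                 # Step 7: repeat until stopping criterion
        for i in range(N_PARTICLES):
            for d in range(N_TASKS):                          # Step 6: Eq. 6 and Eq. 7
                vel[i][d] = (OMEGA * vel[i][d]
                             + C1 * random.random() * (pbest[i][d] - pos[i][d])
                             + C2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(decode(pos[i]))                       # Step 3
            if f < pbest_fit[i]:                              # Step 4
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f           # Step 5
    return decode(gbest), gbest_fit

print(pso())   # prints a near-optimal task-to-resource mapping and its fitness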

PSO: The steps in the PSO algorithm are listed in Algorithm 2. The algorithm starts with
random initialisation of each particle’s position and velocity. In this problem, the particles are the
tasks to be assigned, and the dimension of a particle is the number of tasks in the workflow.
The value assigned to each dimension of a particle is a computing resource index. Thus
a particle represents a mapping of resources to tasks. In the workflow depicted in Figure
10, each particle is 5-dimensional because there are 5 tasks, and the content of each dimension
is the compute resource assigned to that task. For example, a sample particle could be
represented as depicted in Figure 11.
Each particle is evaluated by the fitness function given in Eq. 4. The
particles calculate their velocity using Eq. 6 and update their position according to Eq. 7.
The evaluation is carried out until the specified number of iterations is reached (the user-specified
stopping criterion).

Figure 11 A Sample of Particle for the Workflow Shown In Figure 10

Performance Metric:
As a measure of performance, the cost of complete execution of the application was used as a metric.
Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar Buyya computed the total
cost of execution of a workflow using two heuristics: PSO based cost optimisation
(Algorithm 1), and Best Resource Selection (BRS, based on minimum completion time, by selecting a
resource with maximum cost).
They evaluated the scheduling heuristic using the workflow depicted in Figure 10. Each task
in the workflow has input and output files of varying sizes. Also, the execution cost of each
task varies among all the compute resources used (in this case PC1 − PC3). The authors analysed the
performance of the heuristic by varying each of these in turn. The graphs were plotted by averaging
the results obtained from 30 independent executions. In every execution, the x-axis parameters such
as total data size (e.g. 1024 MB) and range of computation cost (e.g. 1.1-1.3 $/hour) remain
unchanged, while the particles’ velocities and positions change.

The graphs also depict the value of the plotted points together with the confidence interval (CI,
represented as a “+/-” value).
Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar Buyya varied the size of the
total data processed by the workflow in the range 64-1024 MB. By varying the data size, they
compared the variance in total cost of execution and the distribution of workload on
resources, for the two algorithms, as depicted in Figure 12 and Figure 13, respectively. The
compute resource cost was fixed in the range 1.1−1.3 $/hr.

Figure 12 Comparison Of Total Cost Between PSO Based Resource Selection And Best
Resource Selection Algorithms When Varying Total Data Size Of A Workflow.

Total Cost of Execution: Figure 12 plots the total cost of computation of the workflow (in the
log scale) with the increase in the total data processed by the workflow. The graph also plots
95% Confidence Interval (CI) for each data point. The cost obtained by PSO based task-
resource mapping increases much slower than the BRS algorithm. PSO achieves at least three
times lower cost for 1024MB of total data processed than the BRS algorithm. Also, the value
of CI in cost given by PSO algorithm is +/- 8.24, which is much lower as compared to the
BRS algorithm (+/- 253.04), for 1024 MB of data processed by the workflow. The main
reason for PSO to perform better than the ‘best resource’ selection is the way it takes into
account communication costs of all the tasks, including dependencies between them. When
calculating the cost of execution of a child task on a resource, it adds the data transfer cost for
transferring the output from its parent tasks’ execution node to that node. This calculation is
done for all the tasks in the workflow to find the near optimal scheduling of task to resources.
However, the BRS algorithm calculates the cost for a single task at a time, which does not
take into account the mapping of other tasks in the workflow. This results in PSO based
algorithm giving lower cost of execution as compared to BRS based algorithm.

Figure 13 Distribution Of Workflow Tasks On Available Processors.

Distribution of Load: Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru and Rajkumar
Buyya calculated the distribution of workflow tasks onto the available resources for various sizes
of total data processed, depicted in Figure 13. This evaluation is necessary because algorithms may
choose to submit all the tasks to a few resources to avoid communication between resources as
the size of data increases, thus reducing the communication cost to zero. In their formulation,
Equation 4 restricts all tasks from being mapped to the same resource, so that tasks can execute in
parallel for increased time-efficiency. In Figure 13, the X-axis represents the total size of
data processed by the workflow and the Y-axis the average number of tasks (expressed as a
percentage) executed by a compute resource for various sizes of data. The figure shows that
PSO distributes tasks to resources according to the size of data. When the total size of data is
small (for 64-126 MB), PSO distributed tasks proportionally to all the resources (PC1 −
PC3). However, when the size of data increased to (and over) 256 MB, more tasks were
allocated to PC1 and PC3. As the cost of compute resources was fixed for this part of the
experiment, the BRS algorithm does not vary the task-resource mapping. It is also indifferent to
the size of data.
Hence, BRS’s load distribution is a straight line, as depicted in Figure 13, with PC1, PC2 and
PC3 receiving 20%, 40% and 40% of the total tasks, respectively. The distribution of tasks to
all the available resources in proportion to their usage costs ensured that hotspots (resource
overloading) were avoided. The heuristic could thus minimise the total cost of execution and
balance the load on the available resources.

Figure 14 Comparison Of Total Cost Between PSO Based Resource Selection And Best
Resource Selection Algorithms When Varying Computation Cost Of All The Resources
(For 128MB Of Data).

The reason for PSO’s improvement over BRS is PSO’s ability to find near-optimal
solutions for mapping all tasks in the workflow to the given set of compute resources. The
linear increase in PSO’s cost also suggests that it takes both computation and communication
cost into account. In contrast, BRS simply maps a task to the resource that has the minimum
completion time (a resource with higher frequency and lower load, and thus a higher cost).
As the resource costs increase, the use of BRS leads to higher costs due to its affinity towards
the better resource, irrespective of the size of data, whereas PSO minimises the maximum total
cost of assigning all tasks to resources.

Advantages of Particle Swarm Optimization (PSO)


 Easy to perform
 Few parameters to adjust
 The global search of the algorithm is efficient
 The dependence on the initial solution is smaller
 It is a fast algorithm
Disadvantages
 Slow convergence in the refined search stage
 Weak local search ability
 It may get trapped in local minima for hard optimisation problems

Round Robin Algorithm
Shilpa Damor states that load balancing aims to achieve optimal resource utilisation, maximise
throughput, minimise response time, and avoid overload. Using multiple components with load
balancing, instead of a single component, may increase reliability through redundancy. Hence
we chose the Round Robin Algorithm as our existing load balancing algorithm for cloud
computing. According to Dr. Hemant S. Mahalle, Prof. Parag R. Kaveri and Dr. Vinay
Chavan (January 2013), the Round Robin algorithm uses a time-slicing mechanism. The name
of the algorithm suggests that it works in a round manner where each node is allotted a
time slice and has to wait for its turn. The time is divided and an interval is allotted to each
node; each node is allotted a time slice in which it has to perform its task. The
complexity of this algorithm is less than that of the other two algorithms. The algorithm was
simulated using open-source software known as CloudAnalyst, where it is the
default algorithm used in the simulation. The algorithm simply allots the jobs in round-robin
fashion and does not consider the load on the different machines.

Tejinder Sharma and Vijay Kumar Banga state that the Round Robin Algorithm is the simplest
algorithm, using the concept of a time quantum or slice. Here the time is divided into
multiple slices and each node is given a particular time quantum or time interval, and in this
quantum the node performs its operations. The resources of the service provider are
provided to the client on the basis of this time quantum. In Round Robin scheduling the time
quantum plays a very important role. If the time quantum is extremely
small then Round Robin scheduling becomes the Processor Sharing algorithm and the number of
context switches is very high. It selects the load on a random basis, which leads to situations
where some nodes are heavily loaded and some are lightly loaded. According to Saroj
Hiranwal and Dr. K.C. Roy, although the Round Robin Algorithm is very simple, there is an
additional load on the scheduler to decide the size of the quantum, and it has a longer average
waiting time, more context switches, a higher turnaround time and low throughput.

The Round Robin (RR) job scheduling algorithm considered in this study distributes the
selected jobs over the available VMs in a round order where each job is treated equally. The
idea of the RR algorithm is that it attempts to send the selected jobs to the available VMs in
a round (circular) form.

Figure 15 The Process Of Round Robin Algorithm

According to Isam Azawi Mohialdeen, Figure 15 depicts the mechanism of the Round
Robin (RR) job scheduling algorithm. The algorithm does not require any pre-processing,
overhead or scanning of the VMs to nominate the job’s executor.

Round Robin Algorithm:

Input: cloudletlist: The list of cloudlets(i.e jobs), VML: The list of available VMs

Output: Map each cloudlet to a VM.

Steps:

1. Nocl cloudletlist.size();
2. NoVMVML.size();
3. Index0;
4. For j0 to Nocl do
5. Clcloudletlist.get(j);
6. Index(index+1) mod NoVM;
7. VVML.get(index);
8. stageinTransferTime(cl,v,in);
9. stageoutTransferTime(cl,v,out);
10. execExecuteTime(cl,v);
11. if (cl.AT+stagein+exec+stageout+v.RT≤cl.DL) then
12. sendjob(cl,v);
13. update(v)
14. else
15. drop(cl);
16. FailedJobs;
17. End

The index of the selected VM for the current job is computed in a round-robin fashion
using the equation below:
index ← (index + 1) mod NoVM, where index = the index of the selected VM

NoVM = The total number of available VMs
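A minimal runnable sketch of this round-robin mapping step is shown below; the deadline check and the stage-in/stage-out transfer-time estimates of the pseudocode above are omitted for brevity, and the job and VM names are illustrative.

def round_robin_schedule(cloudlets, vms):
    """Return a list of (cloudlet, vm) assignments in round-robin (circular) order."""
    assignments = []
    index = 0
    for cl in cloudlets:
        index = (index + 1) % len(vms)    # select the next VM in the ring
        assignments.append((cl, vms[index]))
    return assignments

jobs = [f"job{i}" for i in range(1, 8)]
vms = ["VM1", "VM2", "VM3"]
for cl, vm in round_robin_schedule(jobs, vms):
    print(cl, "->", vm)
# job1 -> VM2, job2 -> VM3, job3 -> VM1, job4 -> VM2, ...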

Figure 16 The Total Cost

In this experiment the aim was to study the impact of the number of jobs on the total cost
when VMs execute their assigned jobs. Figure 16 illustrates the experimental results
obtained for the total cost consumed by each set of jobs fed to the four scheduling
algorithms. It is clear that the total cost is highly influenced by the number of assigned
jobs for every scheduling algorithm. Notice that minimum completion time produces
the highest cost in all cases compared to the other scheduling algorithms. This is mainly
because minimum completion time accomplishes the largest number of received jobs;
thus, its total cost is greater than that of the other algorithms. The opportunistic
load balancing scheduling algorithm incurred a higher cost compared with the Random and
Round Robin algorithms. This is because opportunistic load balancing has the
capability to run more jobs at the same time, since the jobs are
dispatched over the available VMs while taking the VM load into account. Thus, many
jobs can be run, which leads to an increase in cost. The Round Robin algorithm
produced less cost compared to the minimum completion time and opportunistic load
balancing algorithms. The Random algorithm is superior in all cases in terms of
total cost compared with the other algorithms. Nevertheless, the Random algorithm has
the same cost as the Round Robin algorithm when the number of jobs is 500, 600
and 700.

Advantages
 Starvation is never a problem
 The algorithm ensures that all processes in the job (process) queue share a time slice
on the processor.
 Excellent for parallel computing as it is great for load balancing if the tasks are
around the same length

Disadvantages
 The algorithm does not consider priority
 If the time slice value is too small the context switching time will be large in
relation to actual work done on the CPU

Chapter 3
Methodology
Introduction
This chapter gives a description of the research design adopted, the target population and the
sampling procedures. In addition, it explains the research instruments used.

Research Design
Our research is qualitative research. According to Family Health International, Qualitative
Research is a type of scientific research. This is justified since our scientific research consists
of an investigation that:
a) seeks answers to a question
b) systematically uses a predefined set of procedures to answer the question
c) collects evidence
d) produces findings that were not determined in advance
e) produces findings that are applicable beyond the immediate boundaries of the study
More so, Creswell (2007) describes qualitative research as the process of research which involves
emerging questions and procedures, data typically collected in the participant’s setting, data
analysis inductively building from particulars to general themes, and the researcher making
interpretations of the meaning of the data. The final written report has a flexible structure,
meaning that our final report can be further modified by those who seek to improve our
algorithm.
Qualitative research is essential since our research involves the collection of algorithms,
which are predefined sets of procedures that will be used to produce an optimal solution. The data
will be collected on cloud platforms, which are also our participants’ setting.

Research Instruments
In this section, our main research instrument was documentary analysis. That is, we used
journal publications, articles as well as textbooks that we obtained from the internet. However,
there were also attempts to use other research instruments such as focus group discussions,
questionnaires and interviews.

Focus Group Discussions


On the 13th of March we went to the e-Tech Africa Expo 2014. There we held focus group
discussions with companies providing cloud computing services such as Twenty Third
Century Systems and Zarnet.

The first company we visited was Twenty Third Century Systems. They offer a
product called Cumulus which provides clients with a cloud environment where they can
place business applications such as accounting packages. We asked the representative about
the resource scheduling algorithm they use, and he said that most systems on the
market, such as those running on Intel quad-core hardware, can allocate resources automatically;
more so, they will allocate adequate resources for the client at no extra cost, meaning there will
not be a situation where the resources are not enough. On the simulation part he suggested that we
use a testing environment, since in the industry they do not use simulations but rather testing
environments. When it came to resources, he spoke of space and processing power as
the major resources. In addition, the Twenty Third Century Systems representative mentioned
security as being part of the cloud environment, so as to protect the data of the user.

Secondly we went to Zarnet. They offer cloud computing services, but only to the
government, and are still yet to roll the services out. The Zarnet representative advised us to
look into the amount of space that a user would want, as well as the priority of users.

Documentary Analysis
This was our main research instrument. We used journal publications that we found on
Google Scholar. In addition, we took most of the algorithms from the journals and other
supporting documents from the internet. For the algorithms, we chose those that were scientifically
tested. Some of the data on cloud computing algorithms came from textbooks that we
downloaded from various websites. In our research we used current documents on cloud
computing resource scheduling that have been released.

Interviews
As for interviews, we did a telephone interview with Marco Fortino, Business Development
Manager, Google Enterprise EMEA. However, the interview was not successful since he was
not able to disclose information regarding the resource scheduling algorithm they use on the
Google Cloud Platform.

On the other hand, Google offers cloud platform services, so we had a chance to explore the
cloud environment and see how resources are shared. Google shares resources over the
network, which helped us in simulating our algorithm over the network.

Population
Our target population was service providers offering cloud computing services, mainly those
that offer cloud platforms. We had to reach our target population using means of
communication such as telephone and email, since distance was a major factor. More so, there
are only a few companies that offer cloud computing services.

Sampling procedure and Sample size


During our research we used stratified random sampling. When employing this technique,
we divided our population into strata on the basis of which cloud platform services the
companies offer, and from each of these smaller homogeneous groups (strata) we drew at random a
predetermined number of companies.

Our sample size was small since the number of companies offering cloud platform services
is quite small, though on the verge of rising.

Data collection procedures


We used data collected from the focus group discussions to develop our algorithm. We also took
note of the key areas that the companies advised us to consider. For example, Twenty Third
Century Systems advised us to come up with a testing environment so as to see if our project
is feasible.

During data collection we, as researchers, explained to the participants that the research was for
academic study purposes and that all the information was confidential unless the
subjects waived such confidentiality. Participants were informed that the information would
be useful as a starting point for resolving issues or problems being faced by cloud platform
service providers in sharing resources. Participants were asked to answer all questions frankly
and truthfully.

Data Analysis
Data collected through the use of questionnaires was coded and analysed to examine the
patterns. Tables, pie charts and graphs were used to present the data. This made the data
easy to interpret and gave a clear picture of the resource scheduling algorithms
currently available. Extensive descriptions of the patterns observed were used to answer some of
the research questions.

Summary
From the research we did, we concluded by coming up with an algorithm that focuses on
prioritisation, allocation and sharing of resources on cloud platforms. We also decided on
simulating the algorithm by creating a testing environment as well as developing a web
simulation application.

Chapter 4
Data Presentation
From the resource scheduling algorithms that are currently available on cloud platforms, we
managed to simulate the Bee Algorithm and the Round Robin Algorithm. We
simulated these algorithms using the following tools:

a) Microsoft SQL Server 2008 Database.


b) ASP.NET 2010
c) Windows Server 2008
d) Oracle VM Virtual Box

Cloud Algorithm Performance Tests


Our main aim in the simulation was to enforce prioritisation, allocation and sharing of resources.
After simulation we ran the following tests on the algorithms. For each algorithm we
carried out three tests, that is, requesting a small object, requesting a large object and
performing CPU-intensive tasks. For each test we did five runs and then calculated the
average response time.

Testing Efficiency of Algorithms


1. Requesting a small object (An image of size 14.3kB).
2. Requesting for a large object (An image of size 17.5MB).
3. Performing CPU intensive Tasks.

1. Requesting a small object (An image of size 14.3kB):


Total
Number Of Virtual Machines 10
Important and Urgent Tasks: 10
Important and Not Urgent Tasks: 10
Not Important and Urgent Tasks: 10
Not Important and Not Urgent Tasks: 10
Miscellaneous Tasks: 10

Test Results
Number of Iterations 1 2 3 4 5 Average
Bee Algorithm 0.022 0.025 0.026 0.026 0.029 0.0256
(seconds)
Round Robin Algorithm 0.027 0.028 0.033 0.027 0.031 0.0292
(seconds)

[Chart: Time Taken To Request A Small Object — time in seconds (y-axis) against iteration number 1-5 (x-axis) for the Bee Algorithm and the Round Robin Algorithm]

From the chart above we can see that when requesting a small object the Bee Algorithm is
faster than Round Robin. Using the average response time, the Bee Algorithm was faster by
0.0036 seconds.

2. Requesting a large object (An image of size 17.5MB):


Total
Number Of Virtual Machines 10
Important and Urgent Tasks: 10
Important and Not Urgent Tasks: 10
Not Important and Urgent Tasks: 10
Not Important and Not Urgent Tasks: 10
Miscellaneous Tasks: 10

Test Results
Number of Iterations 1 2 3 4 5 Average
Bee Algorithm 0.033 0.027 0.033 0.028 0.031 0.0304
(seconds)
Round Robin Algorithm 0.026 0.038 0.033 0.029 0.034 0.0320
(seconds)

[Chart: Time Taken To Request A Large Object — time in seconds (y-axis) against iteration number 1-5 (x-axis) for the Bee Algorithm and the Round Robin Algorithm]

From the results obtained after requesting an image of 17.5 MB, the two algorithms
performed fairly equally. However, using the average response time, the Bee Algorithm was faster
by 0.0016 seconds.

3. Performing CPU intensive tasks:


Total
Number Of Virtual Machines 10 000
Important and Urgent Tasks: 100 000
Important and Not Urgent Tasks: 100 000
Not Important and Urgent Tasks: 100 000
Not Important and Not Urgent Tasks: 100 000
Miscellaneous Tasks: 100 000

Test Results
Number of Iterations 1 2 3 4 5 Average
Bee Algorithm 0.304 0.32 0.279 0.307 0.327 0.3074
(seconds)
Round Robin Algorithm 0.356 0.35 0.323 0.347 0.385 0.3522
(seconds)

[Chart: Time Taken To Perform CPU Intensive Tasks — time in seconds (y-axis) against iteration number 1-5 (x-axis) for the Bee Algorithm and the Round Robin Algorithm]

Measured by the time taken to perform CPU-intensive tasks, the Bee Algorithm executed
much faster than the Round Robin algorithm. Using the average response time, the Bee
Algorithm was faster by 0.0448 seconds.

From the results above we chose the Bee Algorithm, since in all three tests it performed
faster than the Round Robin Algorithm.

Chapter 5
From the results obtained in the tests carried out, we came up with the Hybrid PAS Algorithm
(Prioritisation, Allocation and Sharing). Below is the proposed model for our algorithm.

Proposed Scheduling Model

[Diagram: multiple users connect through the Authenticating Centre (Portal Server) to the Priority Server (HPASA), which places jobs into Quadrant Queues 1-4 and a Miscellaneous Queue (MQ) via GIS; the Allocation and Sharing Server (MBA) then dispatches the queued jobs to virtual machines VM1, VM2, ..., VMn in the Hybrid Cloud / Cloud Computing Resource Centre.]

69
Hybrid PAS Algorithm (Prioritisation, Allocation and Sharing)
This algorithm seeks to cater for the weaknesses of the above algorithms. It is a hybrid,
since it takes into account steps from the above algorithms. However, our proposed
algorithm is more focused on resource scheduling for efficiency and effectiveness on
cloud platforms. Its core objective is to achieve prioritisation, allocation and
sharing of resources.

To prioritise jobs on a cloud platform, the algorithm uses the Priority Matrix. The priority of the jobs is
set by the cloud platform administrator according to the client’s prioritisation of jobs. The
prioritisation of jobs can be changed at any time.

Priority Matrix

It was developed by Stephen Covey. It is a method of arranging tasks by urgency and
importance using a 2x2 matrix. The Priority Matrix is a quadrant-based task prioritisation.
The quadrants organise tasks based on importance and urgency. The quadrants’ default labels
are:
a) Critical and immediate
b) Critical but not immediate
c) Not critical but immediate
d) Uncategorised

                 Urgent                            Not Urgent
Important        1  Critical and immediate         2  Critical but not immediate
Not Important    3  Not critical but immediate     4  Uncategorised

The jobs will be ranked as follows:


a) Important and Urgent (Quadrant Queue 1)
b) Important and Not Urgent (Quadrant Queue 2)
c) Not Important And Urgent (Quadrant Queue 3)
d) Not Important And Not Urgent (Quadrant Queue 4)

Hybrid PAS Algorithm

1. User Login
2. If Important and Urgent Then
Add job(s) to Quadrant Queue 1
3. Else If Important and Not Urgent Then
Add job(s) to Quadrant Queue 2
4. Else If Not Important and Urgent Then
Add job(s) to Quadrant Queue 3
5. Else If Not Important and Not Urgent Then
Add job(s) to Quadrant Queue 4
6. Else
Add job(s) to Miscellaneous Queue 5
7. End If
8. While (Quadrant Queue 1 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
9. End While
10. While (Quadrant Queue 2 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
11. End While
12. While (Quadrant Queue 3 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
13. End While
14. While (Quadrant Queue 4 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
15. End While
16. While (Miscellaneous Queue 5 has tasks)
Allocate tasks using Modified Bee Algorithm (See Modified Bee Algorithm Below)
17. End While
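To illustrate the prioritisation stage of the Hybrid PAS Algorithm, the sketch below classifies incoming jobs into the four quadrant queues plus the miscellaneous queue and drains them in priority order. The allocate_with_modified_bee() function is a placeholder standing in for the Modified Bee Algorithm described next; the job fields and names are illustrative assumptions.

from collections import deque

QUEUES = {q: deque() for q in range(1, 6)}    # quadrant queues 1-4 plus miscellaneous queue 5

def classify(job):
    """Map (important, urgent) flags to a quadrant queue number."""
    important, urgent = job.get("important"), job.get("urgent")
    if important is None or urgent is None:
        return 5                              # miscellaneous queue
    if important and urgent:
        return 1
    if important and not urgent:
        return 2
    if not important and urgent:
        return 3
    return 4

def submit(job):
    QUEUES[classify(job)].append(job)

def allocate_with_modified_bee(job):
    # placeholder for allocation via the Modified Bee Algorithm
    print("allocating", job["name"], "via Modified Bee Algorithm")

def dispatch():
    for q in range(1, 6):                     # drain queues in priority order
        while QUEUES[q]:
            allocate_with_modified_bee(QUEUES[q].popleft())

submit({"name": "payroll run", "important": True, "urgent": True})
submit({"name": "log archive", "important": False, "urgent": False})
dispatch()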

Modified Bee Algorithm

1. Generate task properties using GIS.


2. Get the current state of the system.
3. Initialise population with random solutions.
4. Evaluate fitness of the population.
5. While (required minimum requirements are not found)
a) Select sites for neighbourhood search.
b) Recruit bees for selected sites and evaluate fitness.
c) Select the fittest bee from each path.
6. End While
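A minimal sketch of these steps, assuming a generic fitness function over candidate task-to-VM assignments, is given below; the population size, number of recruits and fitness definition are illustrative assumptions rather than values prescribed by the algorithm.

import random

N_TASKS, N_VMS = 6, 4
POPULATION, BEST_SITES, RECRUITS, MAX_ITER = 20, 5, 10, 50

def fitness(assignment):
    """Lower is better: here, the load of the busiest VM (a stand-in measure)."""
    load = [0] * N_VMS
    for vm in assignment:
        load[vm] += 1
    return max(load)

def random_solution():
    return [random.randrange(N_VMS) for _ in range(N_TASKS)]

def neighbour(assignment):
    """Recruit a bee: move one task to a different VM (neighbourhood search)."""
    new = assignment[:]
    new[random.randrange(N_TASKS)] = random.randrange(N_VMS)
    return new

population = [random_solution() for _ in range(POPULATION)]       # step 3
for _ in range(MAX_ITER):                                         # step 5
    population.sort(key=fitness)                                  # step 4
    best_sites = population[:BEST_SITES]                          # step 5a
    new_population = []
    for site in best_sites:                                       # step 5b
        bees = [neighbour(site) for _ in range(RECRUITS)] + [site]
        new_population.append(min(bees, key=fitness))             # step 5c
    while len(new_population) < POPULATION:                       # refill with random scouts
        new_population.append(random_solution())
    population = new_population

print(min(population, key=fitness))   # best task-to-VM assignment found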

Grid Information System (GIS)


The Grid Information System (GIS) is responsible for allocating the task properties.

Summary and Conclusion


Since one of the objectives of this project was to evaluate the resource scheduling algorithms
that are currently available on cloud platforms, we discovered that most of the algorithms do
not take into consideration the prioritisation of jobs. They mainly focus on
sharing and allocating the resources to the available jobs, which results in some important and
urgent jobs being starved of resources. The management and scheduling of resources in a
cloud environment is complex, and therefore demands sophisticated tools for analysing the
algorithms before applying them to the real system. Virtualization is an essential characteristic
of cloud computing. Virtualization in clouds refers to multi-layer hardware platforms,
operating systems, storage devices, network resources, etc. We learnt that the first prominent
feature of virtualization is the ability to hide the technical complexity from users, so it can
improve the independence of cloud services. Furthermore, physical resources can be efficiently
configured and utilized, considering that multiple applications run on the same machine.
In addition, quick recovery and fault tolerance are permitted: a virtual environment can be
easily backed up and migrated with no interruption in service or resource management. From
the provider’s point of view, a large number of virtual machines needs to be allocated to
thousands of distributed users dynamically, fairly and, most importantly, profitably. From the
consumer’s point of view, users are economy-driven entities when they make the decision to
use a cloud service.
Based on the objective of assessing the problems being faced by cloud service providers in
relation to resource scheduling, we found out that most of the service providers’ personnel
do not have enough knowledge of how the algorithms really work. We found out that some
of the providers do not use algorithms to share, allocate and prioritise jobs, since they
claimed that the computer automatically schedules the resources.

Some of the providers indicated that they scheduled the resources needed by the client based
on the client’s specifications.

Another issue of concern was security: the safety of the client’s information from both the
service providers and external threats. The available algorithms do not consider the factor of
security, hence it is both a challenge and a drawback for people who would want to use cloud
services.

Comparing the algorithms mentioned in this document, we learnt of the Bee Algorithm’s
simplicity, flexibility and robustness, its use of fewer control parameters and its
ease of implementation with basic mathematical and logical operations. The Bees Algorithm
mimics the foraging strategy of honey bees to look for the best solution to an optimisation
problem. This makes it better than the other algorithms mentioned, despite its lack of
prioritisation.

The algorithm which we developed encompasses all the factors to be considered (sharing,
allocation and prioritisation). It dynamically optimises the usage of resources on the
cloud, which will guarantee better results and service delivery.

Recommendations
a) To the service providers:
Adopt an algorithm that shares, allocates and prioritises cloud resources. However, they
should also focus on the security part of the algorithm in relation to keeping the client’s
information or data secure.
Conduct seminars with other companies that provide the same services so that they can
have a clear view of the algorithms that are applicable and best to implement on different
platforms.
Train and educate personnel (staff) so that they have a good insight into
cloud services and how they work.
b) To the Clients (Users):

Clients are encouraged to use cloud services, since they are a more affordable way of using expensive
resources at a much lower price, as the main goal is to obtain results regardless of the
amount of money spent.

Reference

AdilYousif, Intelligent Task Scheduling for Computational Grid, University Teknologi


Malaysia. Kassala University, Sudan.

Adnan Mehedi, Md. Habibur Rahman, A Survey of Cloud Simulation Tools, Course:
Simulation and Modeling Techniques.

Adnan Mehedi, Md. Habibur Rahman, GreenCloud: A Tutorial, Course: Simulation and
Modeling Techniques.

Adrian Otto’s Blog, What is a Cloud Platform?[o], http://adrianotto.com/2011/2/cloud-


platform. Date Accessed 12/02/2014.

Alberto Núñez, Jose L. Vázquez-Poletti, Agustin C. Caminero · Gabriel G. Castañé, Jesus


Carretero and Ignacio M. Llorente (2012), iCanCloud: A Flexible and Scalable Cloud
Infrastructure Simulator, Springer Science+Business Media B.V.
Algorithm-Wikipedia, the free encyclopaedia[o], http://en.m.wikipedia.org/wiki/Algorithm.
Date accessed 09/10/2013.

Bitam S., Batouche M., Talbi E.G., A survey on bee colony algorithms, 24th IEEE
International Parallel and Distributed Processing Symposium, NIDISC Workshop, Atlanta,
Georgia, USA, pp. 1-8, 2010.

Bradley Mitchell, Server[o],


http://compnetworking.about.com/od/basicnetworkingconcepts/q/network_servers.htm. Date
Accessed 12/02/2014.

Cloud Computing Defined[o], http://www.cloudcomputingdefined.com/, Date accessed


09/10/2013.

Cloud computing - Wikipedia, the free encyclopaedia[o],


http://en.wikipedia.org/wiki/Cloud_computing, Date accessed 09/10/2013.

Cloud Provider[o], http://searchcloudprovider.techtarget.com/definition/cloud-provider. Date


Accessed 12/02/2014.

Creswell (2007), The Selection of a Research Design, 01-Creswell (RD)-45593:01-Creswell


(RD)-45593.qxd

DESIGN OF ANT COLONY OPTIMIZATION ALGORITHM FOR TASK


SCHEDULING[o],
http://www.google.co.zw/url?sa=t&rct=j&q=&esrc=s&source=web&cd=10&cad=rja&ved
=0CIkBEBYwCQ&url=http%3A%2F%2Fshodhganga.inflibnet.ac.in%2Fbitstream%2F1060
3%2F8273%2F26%2F15_chapter%25205.pdf&ei=91lrUprHK4-
VhQfJkoHIAg&usg=AFQjCNFMSaa2pa4YRa2AgI2P05nwrxhTtg&sig2=SXjatuUtDwqQT9q
llJgsvA&bvm=bv.55123115,d.ZG4. Date Accessed 26/10/2013.

Dr. Hemant S. Mahalle, Prof. Parag R. Kaveri and Dr.Vinay Chavan (January 2013), Load
Balancing On Cloud Data Centres, Parag et al., International Journal of Advanced Research
in Computer Science and Software Engineering (1).

Greencloud - The green cloud simulator[o], http://greencloud.gforge.uni.lu/install.html. Date


Accessed 6/11/2013.

Family Health International, Qualitative Research Methods: A Data Collector’s Field Guide,
Module 1: Qualitative Research Methods Overview

Isam Azawi Mohialdeen, COMPARATIVE STUDY OF SCHEDULING AL-GORITHMS IN


CLOUD COMPUTING ENVIRONMENT, Journal of Computer Science, 9 (2): 252-263, 2013
Science Publications.

Judith Hurwitz, Robin Bloor, Marcia Kaufman and Fern Halper, What Is Cloud Computing?
[o], http://m.dummies.com/how-to/content/what-is-cloud-computing.html. Date Accessed
12/02/2014.

Kennedy J., R. Eberhart, Particle swarm optimization, In IEEE International Conference on


Neural Networks, volume 4, pages 1942–1948, 1995

Mark C. Chu-Carroll(April 2011), Code In The Cloud, Programming Google App Engine,
The Pragmatic Bookshelf, Raleigh, North Carolina Dallas, Texas.

Pradeep R., Kavinya R. (2012), Resource Scheduling In Cloud Using Bee Algorithm For
Heterogeneous Environment, Computer Science, Anna University, India.

Priority Matrix - Wikipedia, the free encyclopaedia[o],


http://en.wikipedia.org/wiki/Priority_Matrix, Date Accessed 05/04/2014.

Ratan Mishra, Anant Jaiswal (2012), Ant colony Optimization: A Solution of Load balancing
in Cloud, International Journal of Web & Semantic Technology (IJWesT) Vol.3, No.2

Resource (computing) - Wikipedia, the free encyclopaedia[o],


http://en.wikipedia.org/wiki/Resource_(computing), Date Accessed 09/10/2013.

Resource Scheduling Algorithm,


http://www.slideshare.net/shilpadamor9/resource-scheduling-algorithm, Date Accessed
3/3/2014.

R. Buyya, C. S. Yeo, and S. Venugopal, “Market-Oriented Cloud Computing: Vision, Hype,


and Reality for Delivering IT Services as Computing Utilities”, Proceedings of the 10th IEEE
International Conference on High Performance Computing and Communications (HPCC
2008, IEEE CS Press, Los Alamitos, CA, USA), Sept. 25-27, 2008, Dalian, China.

Saroj Hiranwal , Dr. K.C. Roy, Adaptive Round Robin Scheduling Using Shortest Burst
Approach Based On Smart Time Slice, International Journal Of Computer Science And
Communication July-December 2011 ,Vol. 2, No. 2 , Pp. 319-323

Shailesh Sawant (2011), A Genetic Algorithm Scheduling Approach for Virtual Machine
Resources in a Cloud Computing Environment, Master's Projects, San Jose State University
SJSU ScholarWorks.

Seung-Hwan Lim, Bikash Sharma, Gunwoo Nam, Eun Kyoung Kim, and Chita R. Das, MDCSim: A
Multi-tier Data Center Simulation Platform, Department of Computer Science and
Engineering, The Pennsylvania State University University Park, PA 16802, USA, Technical
Report CSE 09-007.

Shilpa Damor (April 2013), RESOURCE SCHEDULING ALGORITHM, Department Of


Computer Science and Engineering Ahmedabad-382481, Nirma University.

Suraj Pandey, LinlinWu, Siddeswara, Mayura Guru, Rajkumar Buyya, A Particle Swarm
Optimization-based Heuristic for Scheduling Workflow Applications in Cloud Computing
Environments, Cloud Computing and Distributed Systems Laboratory 2CSIRO Tasmanian
ICT Centre Department of Computer Science and Software Engineering Hobart, Australia
The University of Melbourne, Australia

Tasquia Mizan, Shah Murtaza Rashid A, Masud, Rohaya Latip (June 2012), Modified Bees
Life Algorithm for Job Scheduling in Hybrid Cloud, International Journal of Engineering and
Technology Volume 2 No. 6.

Tejinder Sharma, Vijay Kumar Banga (March 2013), Efficient and Enhanced Algorithm in
Cloud Computing, International Journal of Soft Computing and Engineering (IJSCE), ISSN:
2231-2307, Volume-3, Issue-1

Vijayalakshmi A. Lepakshi, Dr.Prashanth C S R, A Study on Task Scheduling Algorithms in


Cloud Computing, International Journal of Engineering and Innovative Technology (IJEIT)
Volume 2, Issue 11, May 2013.

Workload Classification & Software Energy Measurement for Efficient Scheduling on


Private Cloud Platforms[o], http://arxiv.org/abs/1105.2584, Date Accessed 5 April 2014.

Yoshida H., Kawata K., Fukuyama Y., Nakanishi Y, A particle swarm optimization for
reactive power and voltage control considering voltage stability, In the International
Conference on Intelligent System Application to Power System, pages 117–121, 1999.
