
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCC.2015.2513404, IEEE Transactions on Cloud Computing

A Novel Statistical Cost Model and an Algorithm for Efficient Application Offloading to Clouds
Jose Barrameda, Student Member, IEEE, and Nancy Samaan, Member, IEEE

The authors are with the School of Electrical Engineering and Computer Science, University of Ottawa, Canada, 161 Louis Pasteur St., Ottawa, ON, Canada K1N 6N5, {jgonz045, nsamaan}@site.uottawa.ca

Abstract: This work presents a novel statistical cost model for applications that can be offloaded to cloud computing environments. The model constructs a tree structure, referred to as the execution dependency tree (EDT), to accurately represent the various execution relations, or dependencies (e.g., sequential, parallel and conditional branching), among the application modules along the application's different execution paths. Contrary to existing models that assume fixed average offloading costs, each module's cost is modelled as a random variable described by its Cumulative Distribution Function (CDF), which is statistically estimated through application profiling. Using this model, we generalize the offloading cost optimization functions to ones that use more user-tailored statistical measures, such as cost percentiles. We employ these functions to propose an efficient offloading algorithm based on a dynamic programming formulation. We also show that developers can use the proposed model as an efficient application analysis tool to gain insights into the application's statistical performance under varying network conditions and users' behaviours. Performance evaluation results show that the mean absolute percentage error between the model-based estimated cost and the measured cost of the application execution time can be as small as 5% for applications with sequential and branching module dependencies.

Index Terms: Cloud computing; application modeling; offloading; statistical application cost model; execution path; cumulative distribution functions.

I. INTRODUCTION

The use of virtually unlimited cloud computing resources to host mobile users' applications can significantly enhance the performance of these applications. It can also overcome the resource limitations of mobile devices. Furthermore, it gives users access to computationally intensive applications, such as speech recognition, image and language processing tools, and gaming applications, that could not otherwise be executed entirely on mobile devices given their limited capabilities [1].

Application offloading refers to the process of executing the entire mobile application, or parts of it, over the shared resources of cloud data centers [2]. These applications run in complete isolation over virtual machines (VMs) that are seamlessly launched and terminated on the cloud data centers. In turn, users pay only for the resources they use, on a pay-per-use basis.

Before an application can be offloaded, there must be an efficient means to assess the benefits and costs of offloading. Profiling is the process of gathering the data needed by the offloading algorithm. In general, this data can be divided into three main categories. The first includes the computational and communication costs of executing the application components. The second includes data pertaining to the status of the communication network between the user and the clouds, such as the available rate, the expected delay and the price. The last includes mobile-device-specific characteristics such as energy consumption rates and CPU speed [2].

Profiling data is then used by the offloading algorithm to quantify the gains and costs of offloading the application components to the cloud. Execution cost functions are often used by the algorithm to express a given user's objective of application offloading. Examples of these objectives are minimizing the device's consumed energy and reducing the application response time. The offloading decision algorithm then decides which components, or modules¹, should be offloaded such that the overall cost is minimized. Users may continuously modify their objectives according to their context. For example, when the mobile device's battery level drops below a certain threshold, a user may opt to minimize the application's energy consumption, whereas, if the device is connected to a permanent power supply, response time minimization may be the user's main objective.

¹ The words module and component will be used interchangeably to refer to any given partition of an application.

Typically, offloading solutions that offload one or more components of the application need to first create an application model that describes the relationships and communication dependencies among these components. Weighted graphs, such as the call graph model [3], are the most commonly used models in the literature. In these models, the nodes of the graph represent the application modules, while an edge connecting two nodes expresses calls, or invocations, from one module to the other [4], [5]. The weights of the nodes and the edges usually express the computation and communication costs, respectively, during application execution. These execution costs are defined according to the offloading objective and may represent, for example, the module execution time, its needed CPU cycles or the energy consumed during execution.

The main limitation of the graph models stems from the fact that the estimated costs are static and are set using the average execution cost values of the modules as obtained during application profiling. These costs may not be sufficiently accurate for modules that exhibit large variations in their execution behaviours. Examples of these modules are those that are called by different conditional branching operations or those

2168-7161 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

whose execution depends significantly on their caller modules. Consider, for example, a face-detection module that executes as part of an image processing application, whereby it can be called either directly or after a call to an image thumbnail resizing operation. Clearly, the cost of the face-detection operation in the second scenario is much smaller. Due to this limitation, the resulting offloading decision may be far from satisfying the desired offloading objective.

In this article, we address the aforementioned limitation by building upon our previously developed mobile application model, the Execution Dependency Tree (EDT) model, proposed in [6]. In this model, each application module is mapped into one or more nodes in the tree such that a directed edge from a parent to a child node defines a call sequence. The call sequence of the modules from the root of the tree to one of its leaves defines a possible execution path for the application. In other words, an execution path refers to a certain call sequence of module executions from the start of the application until it terminates. Nodes in the tree are associated with costs that define the execution and communication costs for both local execution and remote execution on the cloud. Consequently, the execution cost over an execution path reflects the total cost of its execution. Hence, making each module's cost depend on its execution path makes the adopted model much more accurate in minimizing the offloading cost of applications. This becomes significant for applications where one or more components exhibit a high degree of cost variance across their executions.

We further enhance the precision of this model by proposing a novel statistical cost model for each module using Cumulative Distribution Functions (CDFs) [7]. These functions are efficiently estimated through application profiling to accurately reflect the distribution of a given module's execution cost. We show that this cost model provides users with more accurate means to define their offloading objectives through the use of statistical measures such as cost percentiles, variances and confidence intervals of the median cost. Furthermore, through a set of use cases, we demonstrate the usability of this model as an efficient tool that developers can use for the statistical analysis of the application's performance under given environment parameters, including different users' behaviours and network conditions.

We also employ this model to develop a new offloading algorithm based on a dynamic programming procedure, applicable whenever the costs of executing the application for different offloading decisions follow a total ordering. The accuracy of the model is evaluated for applications with different users' behaviours, devices and network conditions. We show that the mean absolute percentage error between the modelled and the measured application execution cost can be as low as 0.14% and 0.23% for sequential and probabilistic (i.e., branching) module dependencies, respectively. Furthermore, simulation results for offloading a face-detection application to a server hosted on the DigitalOcean cloud [8] under different network conditions showed an error of no more than 7.23% between the means of the modelled and the measured values of the application execution time, including the time taken to execute the offloading algorithm.

The rest of this article is organized as follows. Section II reviews the related work. Section III proposes our novel statistical cost model and describes a new formulation of the offloading problem using that model. Section IV describes an efficient method to statistically estimate the model CDFs and then provides some use case scenarios in which application developers use the cost model. We evaluate the performance of the proposed statistical cost model and offloading algorithm in Section V. Finally, Section VI concludes the article with a summary and an outlook on future research directions.

II. RELATED WORK

Offloading algorithms generally differ with regard to three main aspects [1], namely, the granularity of the offloadable application components, the adopted application model and the objective of the offloading process. With respect to the first aspect, algorithms can be classified into three main categories, namely, augmented execution-based, elastically partitioned/modularized application-based and mobile application-based techniques [2]. Algorithms in the first category create VMs hosted on the cloud data center to replicate, or clone, the mobile device's execution environment, but with more resources than those available on the device [9]. For the algorithms in this category, offloading reduces to the binary decision of whether or not to offload the entire application. However, these algorithms cannot accommodate applications where part of the execution involves user interactions or the use of the device interfaces. This limitation is overcome by algorithms in the second category, which can offload only a subset of the application modules after the application is partitioned. In this case, offloading relies on client-server communication between the device and the cloud, realized through technologies such as remote procedure calls (RPC) [10] and REST calls [11]. Finally, offloading solutions in the last category can initiate the execution of the application on the mobile device, later suspend its execution, and migrate the application source code to be executed on the cloud [12]. However, for some applications this can be very costly. In what follows, we review some of the existing approaches in these categories.

Ou et al. [4] model object-oriented applications using weighted undirected graphs. The nodes in the graph are program classes, and their weights represent their computational and communication costs (e.g., memory, CPU and bandwidth). An edge exists between two nodes if their classes are dependent. MAUI [3] is another offloading framework that models the application using a directed graph. The graph nodes represent the application's methods, and their weights measure the average required CPU cycles, the energy cost and the amount of data for method call serializations. The edge weights store the energy cost of sending the data over the network. In MAUI, the offloading decision problem is formulated and solved as a binary integer linear program. Clonecloud [9] is another


offloading solution that employs VMs to host application threads. When a user launches an application, a pre-computed offloading decision is selected based on the status of the network connecting the user to the cloud. Kosta et al. [13] propose Thinkair, which, similar to Clonecloud, employs VMs to host offloaded application methods. However, Thinkair's cost model considers only the cost of self-contained methods, i.e., methods that do not call other methods. In Scavenger [12], the programming code of the self-contained functions is uploaded to the cloud. The model developed in Scavenger assumes that application developers can easily specify the offloading costs of the functions in terms of their parameters. Cidon et al. [14] perform offloading in a scheduling-like procedure: a priority queue holds the self-contained RPC calls to the application methods, ordered by their offloading cost. A scheduler then offloads the RPC call with the highest priority, or cost, whenever a server becomes available. Meanwhile, the RPC call with the lowest priority is executed on the mobile device whenever sufficient resources become available on the device.

Huang et al. [5] also use a weighted directed graph and propose a dynamic offloading algorithm based on Lyapunov optimization. Yang et al. [15] assume that the network bandwidth and the device workload can be predicted. The authors model the applications using what they refer to as Call Trees (CTs). Similar to graphs, the nodes in a CT represent methods of the application, and parent-child edges depict method-invokes-method relations. CT models are more limited in that they fail to model applications where two methods invoke each other or a common method. We note that, although our model also employs a tree structure, the tree is used to model the multiple execution paths of the application and not the static module relations.

Kemp et al. [16] present Cuckoo, an offloading solution based on the concept of Android's services that supports self-contained modules and a single fixed offloading decision per module. In [17], Li et al. present TEES (Traffic and Energy saving Encrypted Search), an architecture that leverages cloud resources for search over encrypted cloud-hosted documents while minimizing the bandwidth and energy required by the mobile device. In [18], Elgazzar et al. propose a mobility-aware offloading solution where data required by offloaded operations may be hosted by a third-party provider. The decision algorithm chooses the offloading plan that provides the optimal performance among a set of plans that meet user constraints. The application execution unit is the method, and dependencies are considered when profiling and estimating the total cost of every operation.

Kamienski et al. [19], [20] use the main concepts of service-oriented architectures to model applications using cloud services. If an application module does not call other modules, a corresponding atomic cloud service that does not invoke other services is created. On the other hand, for a module that calls other modules, a composite cloud service is built using sequential calls to other atomic and composite services [21]. The framework deploys these cloud services via plugins in a variety of cloud providers such as Amazon EC2. Automatic service placement is carried out to achieve load balancing and elasticity. The framework can also be hosted on cloudlets [22], i.e., local clouds hosted within, or close to, base stations and access points, or on peer-to-peer (P2P) clouds formed by a group of mobile devices. The framework can be extended to solve the offloading problem by adding an offloading component that decides where service requests should run. Shi et al. [23] address the connectivity problem of P2P clouds in CirrusCloud. The authors describe an exhaustive-search-based offloading algorithm while assuming that future computation requests and network status are known.

Huerta et al. [24] propose an offloading solution based on the Hadoop software framework. Offloading is achieved by replacing Hadoop requests that are sent to the cloud with RPC calls to the P2P cloud formed by the mobile users. Xia et al. [25] study a two-tiered cloud architecture where mobile devices access a local cloudlet via multiple access points as well as distant clouds via a cellular network. Offloading is decided by a location-aware algorithm: a mobile device offloads its tasks, and the framework evaluates the cost of each task considering the required and remaining device energy. In [26], Yue et al. propose scheduling algorithms to manage the offloading requests of multiple mobile devices accessing a single cloud server via a common base station. To schedule the application transfers to and from the base station, a time-division resource allocation scheme is used to minimize the energy used by the mobile devices. Similarly, Chen et al. [27] consider the case of multiple mobile devices sharing the wireless medium to offload computations to a single cloud. In [28], Qiu proposes a model for application fault-tolerance analysis using scale-free graphs. Each node represents a module and has a probability of fault, which is assigned randomly from a distribution taken from observations of real services.

Our work also relates to various efforts that analyze the impact of network and device configurations on the offloading decision. For example, in [29], Ahmed et al. showed that the application migration time to the cloud can be significantly affected by the traffic load of the user's access point, the number of hops in the path to the cloud and the user's movement speed. The authors recommend minimizing the size of the migrated application states in order to reduce the impact of the network. Similarly, Abolfazli et al. [30] demonstrated that the number of hops to the cloud has a higher impact on the latency of the offloaded application than the actual physical distance to the cloud. However, Taleb et al. [31] argue that such network delays may be reduced by future cellular network technologies such as LTE's Evolved Packet Core (EPC). In [32], Barbera et al. demonstrated that the use of short data synchronization intervals between the application state on the device and on the cloud may lead to higher communication, time and energy costs.

Complementary to our work are the various research efforts that analyze the relationships and tradeoffs among different offloading objectives. For example, Wu et al. [33] showed that minimizing the execution time may result in a higher energy cost, and vice versa. In another work [34], the authors developed an offloading algorithm that can optimize a weighted sum of the mean energy and the mean time costs of offloading the jobs. Similarly, Song et al. [35] analyzed the energy


consumption and traffic tradeoff by reusing results from previous offloading operations. Liu et al. [36] also examined the tradeoff between energy savings and user privacy when using image steganographic techniques. Our proposed model is generic and can accommodate any measurable cost parameter for the application. However, for clarity of presentation, we mostly derive our cost formulations using the execution time as the adopted cost measure, but show how these formulations can be extended to other measures.

The cost models adopted by the aforementioned schemes have several limitations. First, they mostly rely on rough estimates of the application execution cost. These estimates may be misleading for applications that exhibit a great variance in their performance, such as image processing operations whose cost depends on the image size. Second, they neglect the effects of the user's behaviour on the application cost. Consider a user executing a search operation on a set of files whose indices he/she always maintains, while another user searches files that are rarely indexed. Clearly, the search operation in the latter case will take significantly longer. Another limitation is the rigid assumptions made about the granularity of the offloadable application components (e.g., a method versus a thread). Finally, the majority of the adopted algorithms are restricted to optimizing the average costs of the application and cannot accommodate specific user objectives, such as minimizing the cost in the worst-case scenario. We will demonstrate in the next section how our model overcomes these limitations.

III. AN OFFLOADING COST MODEL AND PROBLEM FORMULATION

In this section, we first describe our mobile application model and then employ the model to formulate the offloading decision-making problem.

A. Application Model

We model a given mobile application using a tree structure that we refer to as the execution dependency tree (EDT). The model aims at overcoming the problem of imprecise estimation of the modules' execution costs. This problem is more critical for applications that contain modules exhibiting significant variance in their execution cost as they are called within different execution paths of an application. This issue is alleviated by incorporating two additional pieces of information that were not considered in previous approaches: the first is the module cost as a function of its execution path, relying on the use of module dependencies; the second is the probability distribution of the module cost.

More precisely, our proposed tree structure allows for the identification of three different execution relations, or dependencies, among two or more modules. The first is the series, or sequential, execution dependency, which defines a specific order of sequential execution among the modules. The second relation describes modules related through conditional branching statements. In this case, only one module may be executed, according to the satisfied condition. We refer to this relationship as a probabilistic execution relation. Finally, the relationship among modules that can be executed in parallel, if sufficient resources are available, is referred to as a parallel execution dependency.

The application modules and their above dependencies are represented using an EDT structure $T = (V, E)$. The set of nodes $V = V_M \cup V_D$, $|V| = n$, is the union of two subsets. $V_M$ is the set of module nodes representing the application modules. We make no assumptions with respect to the granularity of these modules; they can represent functions, methods, threads or components chosen by the user. On the other hand, $V_D$ is the set of dependency nodes used to describe one of three possible module execution relations, namely, series, probabilistic and parallel relations. A directed edge $e = (v_j, v_i) \in E$, $v_j, v_i \in V$, exists between a parent node $v_j$ and a child node $v_i$ if $v_j$ is the caller of $v_i$.

A dependency node must be the parent of module or dependency nodes. To keep the presentation clear, we allow a module node to be the caller, i.e., the parent, of at most a single dependency node. If a module node is the caller of two or more modules, then a dependency node is needed first to specify the relation between these modules. If the module node is self-contained, i.e., it does not call any other modules, it becomes a leaf node in the tree. Finally, a module that calls a single module is represented by a call first to a series dependency node that in turn calls that module. This generalized approach simplifies the derivation of the rules needed to compute the overall cost of the tree. We use the term execution path of a module to refer to the sequence of nodes visited when the tree is traversed from the root to that module.

Fig. 1 depicts the pseudo-code of a generic application and its corresponding EDT. The application is partitioned into modules a-f corresponding to its methods a-f. In the example, the conditional branching due to the if statement is translated into a probabilistic dependency. Similarly, d and e can be executed in parallel, while the series relation mandates that f must be executed before the conditional statement. We also note that c is called twice and therefore appears as two nodes, c1 and c2, one within each execution path leading to it. Hence, the EDT allows us to accurately profile the cost of module c while taking into consideration the application's execution path. As will be shown later, this can dramatically reduce the imprecision in estimating the module costs.

    public void a()
        f();
        if condition then
            b();
            c();
        else
            fork(d());
            fork(e());
        end
    public void b()
        c();

    a
    └── series
        ├── f
        └── prob
            ├── series
            │   ├── b
            │   │   └── series
            │   │       └── c1
            │   └── c2
            └── parallel
                ├── d
                └── e

Fig. 1: An example of an application and its EDT.
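The structural rules of the application model can be made concrete with a small data structure. The following is a minimal Python sketch, not the authors' implementation: the `Node` class, the kind labels, the dependency-node names (`s_a`, `s_then`, `s_b`, `prob`, `par`) and the branch probability of 0.5 are all hypothetical, since Fig. 1 gives no branch probabilities. The sketch encodes the EDT of Fig. 1, checks the constraints stated above (a module node is either a leaf or the parent of a single dependency node, a dependency node has children, and the branch probabilities of a probabilistic node sum to one) and lists the execution path of every module node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Node kinds: module nodes (V_M) and the three dependency kinds (V_D).
MODULE, SERIES, PROB, PARALLEL = "module", "series", "prob", "parallel"


@dataclass
class Node:
    name: str
    kind: str
    children: List["Node"] = field(default_factory=list)
    prob: Optional[float] = None  # P(v_k); set only on children of a PROB node

    def add(self, *kids: "Node") -> "Node":
        self.children.extend(kids)
        return self


def check(node: Node) -> None:
    """Raise AssertionError if the subtree violates the EDT rules above."""
    if node.kind == MODULE:
        # Leaf (self-contained module) or caller of exactly one dependency node.
        assert len(node.children) <= 1, f"{node.name}: too many children"
        if node.children:
            assert node.children[0].kind != MODULE, f"{node.name}: needs a dependency node"
    else:
        assert node.children, f"{node.name}: dependency node without children"
    if node.kind == PROB:
        total = sum(c.prob for c in node.children)
        assert abs(total - 1.0) < 1e-9, f"{node.name}: probabilities sum to {total}"
    for child in node.children:
        check(child)


def execution_paths(node: Node, prefix: Tuple[str, ...] = ()) -> Dict[str, Tuple[str, ...]]:
    """Map each module node to the sequence of nodes from the root down to it."""
    path = prefix + (node.name,)
    paths = {node.name: path} if node.kind == MODULE else {}
    for child in node.children:
        paths.update(execution_paths(child, path))
    return paths


def fig1_edt(p_then: float = 0.5) -> Node:
    """EDT of Fig. 1; p_then is a hypothetical probability of the if-branch."""
    b = Node("b", MODULE).add(Node("s_b", SERIES).add(Node("c1", MODULE)))
    then_branch = Node("s_then", SERIES, prob=p_then).add(b, Node("c2", MODULE))
    else_branch = Node("par", PARALLEL, prob=1 - p_then).add(
        Node("d", MODULE), Node("e", MODULE))
    branch = Node("prob", PROB).add(then_branch, else_branch)
    return Node("a", MODULE).add(Node("s_a", SERIES).add(Node("f", MODULE), branch))


tree = fig1_edt()
check(tree)
for module, path in execution_paths(tree).items():
    print(f"{module}: {' -> '.join(path)}")
```

Running the sketch prints one path per module node. The two calls to c yield distinct paths, one ending in `s_b -> c1` (through b) and one ending in `s_then -> c2`, which is what allows the cost of each occurrence of c to be profiled separately.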


B. A Novel Cost Model

The offloading problem for a given application is to decide, for each module, whether it should be executed locally or offloaded to the cloud. Formally, the offloading problem is to find a location vector $\mathbf{l} = (l_1, l_2, \dots, l_n)$, where each $l_i \in \{M, C\}$ represents the decided location of module $v_i \in V$. Here, setting $l_i = M$ or $l_i = C$ means that the module $v_i$ will be executed on the mobile device or offloaded to the cloud, respectively. In other words, the location vector is the final outcome of the offloading algorithm and determines the execution location, mobile or cloud, of every node in the EDT. This vector must be selected such that it satisfies a given offloading cost minimization objective, as will be discussed below.

We first associate with each module node in the set $V_M$ one or more random variables that express the cost of executing the module. The choice of these random variables is tied to the overall objective of the offloading procedure. For example, these variables can measure the time it takes to execute the module, the needed CPU cycles, the network communication overhead, or the energy consumed during its execution, if the user is interested in minimizing the application response time, saving CPU cycles, reducing the communication cost or reducing the energy consumed by the application, respectively. More than one variable can be used if the user wants to adopt a multi-objective function [33], [36].

Formally, we associate with each module node $v_i$ two random variables, $X_i^M$ and $X_i^C$, that describe the execution cost, related to a given objective, when the module is executed locally on the device or offloaded to the cloud, respectively. Since transmitting data also consumes the mobile device's resources (e.g., transmission time and the energy needed for transmission and processing), we also associate with $v_i$ another variable, $X_i^T$, to model the communication cost needed to perform offloading operations. This cost is incurred, for example, when sending the input and output data of the offloaded module to and from the cloud.

Clearly, the cost of executing $v_i$ must depend on its execution location as well as on the location of its calling parent. Let $X_i|\mathbf{l}$ be another random variable that measures the module execution cost of $v_i$; then

$$
X_i|\mathbf{l} =
\begin{cases}
X_i^M, & l_i = l_j = M,\\
X_i^C, & l_i = l_j = C,\\
X_i^T + X_i^C, & l_i = C,\ l_j = M,\\
X_i^T + X_i^M, & l_i = M,\ l_j = C,
\end{cases}
\tag{1}
$$

where module node $v_i \in V_M$ has a parent node $v_j \in V$, and $l_i$ and $l_j$ are their respective locations. The first two cases in the above equation require no communication cost between the parent and child modules, as they are executed at the same location; hence, the execution cost of $v_i$ is that of executing it on either the mobile device or the cloud. In the last two cases, i.e., when $l_j \neq l_i$, the cost combines the communication cost between the parent and the child nodes with the execution cost of the module $v_i$.

Having calculated the execution cost of a single module, $X_i$, we next consider the cost, $Y_i$, of executing all the modules in the subtree with $v_i$ as its root. We refer to this cost as the subtree cost of $v_i$. It includes the node's own execution cost $X_i$ as well as the costs of all its descendants. Let $Ch(v_i) \subset V$ be the set of children of node $v_i$. The subtree costs can then be calculated as follows.

If $v_i$ is a leaf node in the tree, then $Y_i$ is simply the module cost of the node, that is,

$$
Y_i|\mathbf{l} = X_i|\mathbf{l}, \quad v_i \in V_M \text{ s.t. } Ch(v_i) = \emptyset. \tag{2}
$$

On the other hand, if $v_i$ is a module node with a single child $v_k$, the subtree cost $Y_k|\mathbf{l}$ of the child node $v_k$ must also be included in the subtree cost of the parent node $v_i$. In this case, $Y_i$ is computed as follows:

$$
Y_i|\mathbf{l} = X_i|\mathbf{l} + Y_k|\mathbf{l}, \quad v_i \in V_M \text{ s.t. } Ch(v_i) = \{v_k\}. \tag{3}
$$

The subtree cost of a dependency node is computed according to the dependency type. If $v_i$ is a series node, then its subtree cost is the sum of the costs of its subtrees, that is:

$$
Y_i|\mathbf{l} = \sum_{v_k \in Ch(v_i)} Y_k|\mathbf{l}, \quad v_i \in V_D. \tag{4}
$$

If $v_i$ is a probabilistic node, define $P(v_k) \in [0, 1]$ to be the probability of executing $v_k \in Ch(v_i)$, such that $\sum_{v_k \in Ch(v_i)} P(v_k) = 1$. Then, $Y_i$ is a linear combination of the random variables $Y_k$ with weights $P(v_k)$, i.e.:

$$
Y_i|\mathbf{l} = \sum_{v_k \in Ch(v_i)} P(v_k)\, Y_k|\mathbf{l}, \quad v_i \in V_D. \tag{5}
$$

If $v_i$ is a parallel node, then the subtree cost depends on the type of the measured cost. For example, if the measured cost is the execution time, then it is equal to the maximum, or largest order statistic, of the subtree costs of its children. In other words, when the modules are executed in parallel, the completion time equals the longest time taken by any of the child subtrees:

$$
Y_i|\mathbf{l} = \max_{v_k \in Ch(v_i)} Y_k|\mathbf{l}, \quad v_i \in V_D. \tag{6}
$$

On the other hand, if energy is the measured cost, then the formulation is similar to that of (4).

Now, define the function $F_X(x) \in [0, 1]$ to be the Cumulative Distribution Function (CDF) of a given random variable $X$. Here, $F_X(x)$ is the probability that $X$ takes a value less than or equal to $x \geq 0$, i.e., $F_X(x) = P(X \leq x)$. Using this notation, define $F_{X_i^M}(x)$, $F_{X_i^C}(x)$ and $F_{X_i^T}(x)$ to be the CDFs of the variables $X_i^M$, $X_i^C$ and $X_i^T$, respectively. Also define $F_{X_i}(x|\mathbf{l})$ and $F_{Y_i}(x|\mathbf{l})$ to be the CDFs of $X_i$ and $Y_i$, respectively, given a location vector $\mathbf{l}$. It is worth noting that this work does not impose any restrictions on the specific functional forms of the CDFs.

The above equations can be generalized by replacing the random variables with their CDFs [37]. It can easily be shown that

$$
F_{X_i}(x|\mathbf{l}) =
\begin{cases}
F_{X_i^M}(x), & l_i = l_j = M,\\
F_{X_i^C}(x), & l_i = l_j = C,\\
(F_{X_i^T} \otimes F_{X_i^C})(x), & l_i = C,\ l_j = M,\\
(F_{X_i^T} \otimes F_{X_i^M})(x), & l_i = M,\ l_j = C,
\end{cases}
\tag{7}
$$


where $\ast$ is the convolution operator [7]. It is defined between two functions $F_j$ and $F_k$ over a finite range $[0, x]$ such that

$$
F_j \ast F_k(x) = \int_0^x F_k(x - y)\, dF_j(y). \qquad (8)
$$

For the last two cases in Eq. (7), the cost $X_i$ is the sum of two independent random variables (Eq. (1)). Hence, the resulting CDF is the convolution of their CDFs [7].

The CDF of the subtree cost of any node $v_i$ is then computed as follows. If $v_i$ is a leaf module node, then

$$
F_{Y_i}(x|\mathbf{l}) = F_{X_i}(x|\mathbf{l}). \qquad (9)
$$

If $v_i$ is a module node with a child node $v_k$, its cost is given by Eq. (3), and its CDF is computed as follows:

$$
F_{Y_i}(x|\mathbf{l}) = F_{X_i}(x|\mathbf{l}) \ast F_{Y_k}(x|\mathbf{l}). \qquad (10)
$$

If $v_i$ is a series node, $Y_i$ is defined by Eq. (4), and its CDF is the convolution of the CDFs of all $v_k \in \mathrm{Ch}(v_i)$, i.e.:

$$
F_{Y_i}(x|\mathbf{l}) = \mathop{\ast}_{v_k \in \mathrm{Ch}(v_i)} F_{Y_k}(x|\mathbf{l}). \qquad (11)
$$

Fig. 2 depicts an example of the CDFs of random variables corresponding to the time cost of three modules, $v_1$, $v_2$ and $v_3$, with costs $Y_1$, $Y_2$ and $Y_3$, respectively. The first two modules could represent image processing operations (e.g., resizing and gray scaling). The random variable $Y_3$ represents the cost of applying operations $v_1$ and $v_2$ in sequence, i.e., $Y_3 = Y_1 + Y_2$. In the example, the probability that either $v_1$ or $v_2$ takes less than 50 s is very low. Consequently, the probability that $Y_3$ is smaller than 100 s is also very low.

[Figure: CDF versus time (seconds) for $F_{Y_1}(x)$, $F_{Y_2}(x)$ and $F_{Y_3}(x) = F_{Y_1+Y_2}(x)$; node v3 (series) with children v1 and v2.]
Fig. 2: Cost CDFs of a series node v3 and its children v1 and v2.

If $v_i$ is a probabilistic node, the CDF of $Y_i$ follows from Eq. (5), i.e.:

$$
F_{Y_i}(x|\mathbf{l}) = \sum_{v_k \in \mathrm{Ch}(v_i)} P(v_k)\, F_{Y_k}(x|\mathbf{l}). \qquad (12)
$$

In the example, suppose that operation $v_3$ runs $v_1$ with probability 0.5 and $v_2$ otherwise, i.e., $Y_3 = 0.5Y_1 + 0.5Y_2$ (Fig. 3). The probability that $v_3$ takes $x$ seconds or less is then the average of the probabilities that $v_1$ and $v_2$ each take $x$ seconds or less, i.e., $F_{Y_3} = 0.5F_{Y_1} + 0.5F_{Y_2}$.

[Figure: CDF versus time (seconds) for $F_{Y_1}(x)$, $F_{Y_2}(x)$ and $F_{Y_3}(x) = F_{\frac{1}{2}Y_1 + \frac{1}{2}Y_2}(x)$; node v3 (probabilistic) with children v1 and v2.]
Fig. 3: Cost CDFs of a probabilistic node v3 and its children v1 and v2 (with equal probabilities).

If $v_i$ is a parallel node, the CDF of $Y_i$ follows from Eq. (6). Assuming, as in the series case, that the children's costs are independent, the CDF of their maximum is the product of their CDFs:

$$
F_{Y_i}(x|\mathbf{l}) = \prod_{v_k \in \mathrm{Ch}(v_i)} F_{Y_k}(x|\mathbf{l}). \qquad (13)
$$

In the example, suppose that operation $v_3$ applies operations $v_1$ and $v_2$ to the original image and creates a new image with both results side by side. Operations $v_1$ and $v_2$ run in parallel, and thus $Y_3 = \max(Y_1, Y_2)$ (Fig. 4). The probability that $v_3$ finishes before 50 s is almost zero, because neither $v_1$ nor $v_2$ is likely to finish before 50 s. Beyond about 80 s, $v_1$ has almost certainly completed. Therefore, for $x > 80$ s, the probability that $v_3$ finishes before $x$ equals that of $v_2$.

[Figure: CDF versus time (seconds) for $F_{Y_1}(x)$, $F_{Y_2}(x)$ and $F_{Y_3}(x) = F_{\max(Y_1, Y_2)}(x)$; node v3 (parallel) with children v1 and v2.]
Fig. 4: Cost CDFs of a parallel node v3 and its children v1 and v2.

To derive a generic formulation for any subtree cost, note that Eqs. (9)-(13) all express the subtree cost of a node as the module cost of its root plus a function of the costs of its subtrees. We can hence write these equations in a general form:

$$
F_{Y_i}(x|\mathbf{l}) = F_{X_i}(x|\mathbf{l}) \ast \Phi_i(x|\mathbf{l}), \qquad (14)
$$


where

$$
\Phi_i(x|\mathbf{l}) =
\begin{cases}
\mathbb{1}[x \ge 0] & v_i \in V_M \text{ and } \mathrm{Ch}(v_i) = \emptyset \\
F_{Y_k}(x|\mathbf{l}) & v_i \in V_M \text{ and } \mathrm{Ch}(v_i) = \{v_k\} \\
\mathop{\ast}_{v_k \in \mathrm{Ch}(v_i)} F_{Y_k}(x|\mathbf{l}) & v_i \in V_D \text{ series} \\
\sum_{v_k \in \mathrm{Ch}(v_i)} P(v_k)\, F_{Y_k}(x|\mathbf{l}) & v_i \in V_D \text{ probabilistic} \\
\prod_{v_k \in \mathrm{Ch}(v_i)} F_{Y_k}(x|\mathbf{l}) & v_i \in V_D \text{ parallel}
\end{cases}
\qquad (15)
$$

In the first case, $\Phi_i$ is the CDF of a zero cost (a unit step at $x = 0$), so Eq. (14) reduces to Eq. (9) for leaf modules.

Finally, the following definition shows how the CDF of an application's execution cost can be calculated from its EDT.

Definition 1. Given the EDT $T = (V, E)$ of an application and a location vector $\mathbf{l}$, the cost distribution function for $T$ given $\mathbf{l}$, $F_T(x|\mathbf{l})$, is equal to that of the root node of $T$.

The calculated CDF for an application is interpreted according to the measured cost. For example, when the modelled cost is the execution time, $F_T(x|\mathbf{l})$ is the probability that the application finishes by time $x$. The mean of this distribution is the expected completion time of the application when its modules are offloaded according to the location vector $\mathbf{l}$. Similarly, when energy is the measured cost, the CDF gives the probability that the application consumes at most $x$ energy units.

C. Problem Formulation Using Cost CDFs

As stated before, the offloading problem is to find a location decision vector $\mathbf{l}$ that minimizes the cost of running the application. In this section, we reformulate the problem in terms of the cost CDFs obtained above for different location vectors. These CDFs are compared using a binary relation defined over their set. More formally, let $\preceq_D$ be a binary relation defined over $D$, the set of CDFs of an EDT.

The relation $\preceq_D$ can be defined using any statistical measurement of the CDFs. For example, the traditional offloading problem aims at minimizing the average cost of the application. This is equivalent to defining $\preceq_D$ as $F_i \preceq_D F_j \iff E[X_i] \le E[X_j]$, where $E[X]$ is the expected value of $X$. Similarly, $\preceq_D$ can be defined as a function of, for example, the percentiles of the CDFs or the confidence intervals of the median.

Let $D_T \subseteq D$ be the set of feasible CDFs for EDT $T$, i.e., the set of CDFs that can be obtained for $T$ over all possible location decision vectors. The offloading problem is defined as follows.

Definition 2. Given an EDT $T$ of an application and a total binary relation $\preceq_D$ defined on $D$, the offloading problem is to find a location vector $\mathbf{l}$ such that $F_T(x|\mathbf{l}) \in D_T$ is the infimum of the ordered set $(D_T, \preceq_D)$.

Each module in the EDT can, in general, be executed locally or offloaded. The number of candidate solutions therefore grows exponentially with the number of EDT nodes. In some scenarios, however, the tree is relatively small, because the developers define only a small number of nodes to be considered [3], [13]. In these situations, an exhaustive search can find the optimal offloading location vector.

D. Proposed Offloading Algorithm

In this section, we focus on applications whose cost CDFs for different location vectors are totally ordered under $\preceq_D$. We show that, for these applications, the problem can be formulated as a set of subproblems with a similar structure and can therefore be solved efficiently using a simple dynamic program.

The key idea behind dynamic programming as an optimization tool is to formulate the problem as a Bellman equation [38]. In this form, the optimal solution is expressed in terms of the optimal solutions of a set of subproblems. The optimal solutions of the subproblems must always lead to that of the original problem. Problems with this property are said to have an optimal substructure [39]. Furthermore, in this formulation, solving the subproblems requires only solving one or more common subproblems. This property is referred to as having overlapping subproblems [39]. As a result, each subproblem is solved only once and its solution is reused.

Indeed, our problem formulation, Eq. (14), is a Bellman equation if it satisfies the optimal substructure property discussed above. This property requires that, if the cost $F_{Y_i}(x|\mathbf{l})$ is optimal, then the costs of the subproblems, $F_{Y_k}(x|\mathbf{l})$, must also be optimal for all $v_k \in \mathrm{Ch}(v_i)$. The property must hold for each alternative form of the term $\Phi_i(x|\mathbf{l})$ in Eq. (15), i.e., for each node type.

If $v_i$ is a series node, the optimal substructure property holds if and only if a location vector that gives a better or equal cost for a child node $v_k$ also gives a better or equal cost distribution for $v_i$. For clarity, we verify this property for a node with two children, but the result holds for any number of children.

Formally, let $v_i \in V_D$ be a series node with two children $v_k, v_m \in \mathrm{Ch}(v_i)$, and let $\mathbf{l}'$ and $\mathbf{l}''$ be two alternative location vectors for the EDT rooted at $v_k$. Also, fix $\mathbf{l}$ as the already selected location vector for the tree rooted at $v_m$. Then the optimal substructure property holds whenever

$$
F_{Y_k}(x|\mathbf{l}') \preceq_D F_{Y_k}(x|\mathbf{l}'') \implies
\left[F_{Y_k}(x|\mathbf{l}') \ast F_{Y_m}(x|\mathbf{l})\right] \preceq_D \left[F_{Y_k}(x|\mathbf{l}'') \ast F_{Y_m}(x|\mathbf{l})\right]. \qquad (16)
$$

Similar reasoning applies to probabilistic nodes. Formally, let $v_i \in V_D$ be a probabilistic node with two children $v_k, v_m \in \mathrm{Ch}(v_i)$, with probabilities $P_k$ and $P_m$, respectively. Then the optimal substructure property holds whenever

$$
F_{Y_k}(x|\mathbf{l}') \preceq_D F_{Y_k}(x|\mathbf{l}'') \implies
\left[P_k F_{Y_k}(x|\mathbf{l}') + P_m F_{Y_m}(x|\mathbf{l})\right] \preceq_D \left[P_k F_{Y_k}(x|\mathbf{l}'') + P_m F_{Y_m}(x|\mathbf{l})\right]. \qquad (17)
$$
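The Python sketch below makes Eqs. (11)-(13) and the conditions in Eqs. (16) and (17) concrete. It is a minimal illustration under stated assumptions, not part of the model. Each cost CDF is stored as a list on a uniform grid. This discretization is an assumption, since the model itself places no restriction on the form of the CDFs. Sibling costs are assumed independent, as in the convolution and product rules, and the expected value plays the role of $\preceq_D$. The child distributions are hypothetical values chosen for illustration, not profiled data.

```python
from itertools import accumulate

DX = 1.0  # assumed grid step: index i of a CDF list stands for a cost of i * DX


def pmf(cdf):
    """Per-bin probabilities recovered from a discretized CDF."""
    return [b - a for a, b in zip([0.0] + cdf[:-1], cdf)]


def pad(cdf, n):
    """Extend a CDF to n bins by repeating its last value."""
    return cdf + [cdf[-1]] * (n - len(cdf))


def series(*cdfs):
    """Eq. (11): CDF of a sum of independent costs, by repeated convolution."""
    out = pmf(cdfs[0])
    for cdf in cdfs[1:]:
        q = pmf(cdf)
        nxt = [0.0] * (len(out) + len(q) - 1)
        for i, a in enumerate(out):
            for j, b in enumerate(q):
                nxt[i + j] += a * b
        out = nxt
    return list(accumulate(out))


def probabilistic(cdfs, probs):
    """Eq. (12): mixture of the children's CDFs weighted by P(v_k)."""
    n = max(len(c) for c in cdfs)
    return [sum(p * f for p, f in zip(probs, col))
            for col in zip(*(pad(c, n) for c in cdfs))]


def parallel(*cdfs):
    """Eq. (13): CDF of the maximum of independent costs (product of CDFs)."""
    n = max(len(c) for c in cdfs)
    out = [1.0] * n
    for c in cdfs:
        out = [o * f for o, f in zip(out, pad(c, n))]
    return out


def mean(cdf):
    """Expected cost, the statistic behind the mean-based relation."""
    return sum(i * DX * p for i, p in enumerate(pmf(cdf)))


# Hypothetical child costs: Y_k under two location vectors l' and l'',
# and Y_m under the already fixed location vector l.
yk_l1 = list(accumulate([0.0, 0.2, 0.5, 0.3]))       # mean 2.1
yk_l2 = list(accumulate([0.0, 0.0, 0.3, 0.4, 0.3]))  # mean 3.0
ym = list(accumulate([0.1, 0.6, 0.3]))               # mean 1.2

# Eq. (16): the series parent keeps the order of the means (about 3.3 vs 4.2).
print(mean(series(yk_l1, ym)), mean(series(yk_l2, ym)))
# Eq. (17): so does a probabilistic parent with P_k = 0.4, P_m = 0.6
# (about 1.56 vs 1.92).
print(mean(probabilistic([yk_l1, ym], [0.4, 0.6])),
      mean(probabilistic([yk_l2, ym], [0.4, 0.6])))
```

The mean is preserved here because the expectation of a sum of independent costs is the sum of their expectations, and the expectation of a mixture is the weighted average of the component expectations.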


For parallel nodes, whose cost is given by Eq. (13), the property holds when

$$
F_{Y_k}(x|\mathbf{l}') \preceq_D F_{Y_k}(x|\mathbf{l}'') \implies
\left[F_{Y_k}(x|\mathbf{l}') \cdot F_{Y_m}(x|\mathbf{l})\right] \preceq_D \left[F_{Y_k}(x|\mathbf{l}'') \cdot F_{Y_m}(x|\mathbf{l})\right]. \qquad (18)
$$

We can now state the following lemma.

Lemma 1. Let $T$ be the EDT of an application, and let $(D, \preceq_D)$ be a totally ordered set of its cost CDFs. Then the offloading problem can be solved efficiently by dynamic programming whenever Eqs. (16)-(18) hold.

We give an informal proof of the lemma, which can easily be formalized. As stated before, Cormen et al. [39] formally showed that an optimization problem formulated as a Bellman equation that satisfies the optimal substructure property can be solved by dynamic programming. Hence, to prove the lemma, we only need to show that the offloading problem of Def. 2 has an optimal substructure whenever Eqs. (16)-(18) hold. This follows directly from the cases above. They show that the optimal location vector for an EDT, with any node types, must include the optimal location vectors of its subtrees.

An example of $\preceq_D$ that satisfies Eqs. (16) and (17) is one based on the expected value of the distribution functions, i.e., $F_{Y_i}(x|\mathbf{l}') \preceq_D F_{Y_i}(x|\mathbf{l}'') \iff E[Y_i|\mathbf{l}'] \le E[Y_i|\mathbf{l}'']$. It works because the mean of a sum of independent costs is the sum of their means, and the mean of a mixture is the weighted average of the component means. For parallel nodes, however, $E[\max(Y_k, Y_m)]$ depends on the full distributions rather than on the means alone. When this relation is used, Eq. (18) should therefore be checked for the cost distributions at hand.

We can now present the dynamic programming based offloading scheme described in Algorithms 1-3. Algorithm 1 is the main algorithm and calls the other two. Algorithm 2 visits the nodes of $T$ in bottom-up order. For each visited node, it finds the optimal location for each possible location of the node's parent. The parent's optimal location is then decided for each of its own parent's locations, and so on up to the root node. For each node, the optimal location is found by comparing the CDFs that result from placing the parent and child nodes on the mobile device and on the cloud. This information is stored for each node. Algorithm 3 then traverses $T$ from the root and visits all the nodes. At each node, it selects the optimal location based on the already known location of the parent.

Algorithm 1: Solution algorithm
Data: T: EDT
Result: Location l = [l0, l1, ..., ln]
/* Globals: l: optimal decision; decision[r][lc]: optimal decision for vr given its parent location lc; cdf[r][lc]: optimal CDF for vr given its parent location lc; solved[r]: marks whether node vr has been solved/visited. */
1  begin
2      Solve(T);
3      BuildSolution(T, 0);
4      return l;
5  end

Algorithm 2: Solve
Data: T: EDT
1  begin
2      r ← T.root;
3      if solved[r] then return;
4      solved[r] ← true;
5      foreach childTree ∈ r.children do
6          Solve(childTree);
7      end
8      foreach lc ∈ {0, 1} do
           /* lc ranges over the possible locations of vr's parent */
9          cdf ← null;
10         lr ← null;
11         foreach lr′ ∈ {0, 1} do
12             cdf′ ← GetCDF(r, lc, lr′);
13             if (cdf = null) or (cdf′ ⪯D cdf) then
14                 cdf ← cdf′;
15                 lr ← lr′;
16             end
17         end
18         decision[r][lc] ← lr;
19         cdf[r][lc] ← cdf;
20     end
       /* GetCDF(r, lc, lr) computes the CDF of node vr according to Eqs. (9) to (13), depending on the type of vr, i.e., leaf, series, probabilistic or parallel. The CDFs of vr's children were already computed in line 6. */
21 end

Algorithm 3: BuildSolution
Data: T: EDT
Data: lc: caller's location
1  begin
2      r ← T.root;
3      lr ← decision[r][lc];
4      l[r] ← lr;
5      foreach childTree ∈ Children(r) do
6          BuildSolution(childTree, lr);
7      end
8  end

The next section shows how the application CDFs can be obtained.

IV. ESTIMATION OF THE COST CDFS

To estimate the cost of each module node, we use a statistical method known as the bootstrap. In this method, the profiled costs are randomly sampled with replacement. This step provides the required observations of the random variables $X_i^M$, $X_i^C$ and $X_i^T$. The module cost, $X_i$, is then computed using Eq. (1). Next, samples of the cost of every dependency node are computed bottom-up, up to the root, according to Eqs. (4)-(6). The values obtained at the root are the samples used to construct the cost of the application. This process is repeated many times to obtain a statistically representative estimate of the cost CDF.

Depending on the offloading objective, this method can provide several statistical measurements of the cost, such as the mean, confidence intervals, percentiles and standard deviation. It allows an efficient and accurate estimation of the running costs of the application under different scenarios.

Next, we list scenarios where this estimation method is particularly useful for making accurate offloading decisions that account for the characteristics of the device and of the network connecting the user to the cloud.
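Returning to Algorithms 1-3, the following is a minimal Python sketch of the bottom-up and top-down passes. It rests on several assumptions that go beyond the text. Costs are discretized CDFs on a unit grid. Locations are encoded as 0 (mobile device) and 1 (cloud). The expected value plays the role of $\preceq_D$. Sibling costs are independent. Dependency nodes have no cost of their own and simply inherit their parent's location. The names `Node`, `phi`, `get_cdf`, `solve`, `build_solution` and `offload` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import accumulate

M, C = 0, 1   # assumed location encoding: 0 = mobile device, 1 = cloud
STEP = [1.0]  # CDF of a zero cost (unit step), used for leaves in Eq. (15)

def pmf(cdf):
    return [b - a for a, b in zip([0.0] + cdf[:-1], cdf)]

def conv(f, g):
    """CDF of the sum of two independent costs (Eq. (8)) on a unit grid."""
    pf, pg = pmf(f), pmf(g)
    out = [0.0] * (len(pf) + len(pg) - 1)
    for i, a in enumerate(pf):
        for j, b in enumerate(pg):
            out[i + j] += a * b
    return list(accumulate(out))

def mean(cdf):
    # Expected cost; this sketch uses it as the relation <=_D.
    return sum(i * p for i, p in enumerate(pmf(cdf)))

@dataclass
class Node:
    name: str  # must be unique within the tree
    kind: str  # "module", "series", "probabilistic" or "parallel"
    children: list = field(default_factory=list)
    probs: list = field(default_factory=list)  # probabilistic nodes only
    cost: dict = field(default_factory=dict)   # module nodes: CDFs "M", "C", "T"

def phi(node, lr, best):
    """Eq. (15): combine the children's optimal CDFs, given this node at lr."""
    subs = [best[(ch.name, lr)] for ch in node.children]
    if not subs:
        return STEP
    if node.kind in ("series", "module"):  # convolution of the children
        out = STEP
        for s in subs:
            out = conv(out, s)
        return out
    n = max(len(s) for s in subs)
    subs = [s + [s[-1]] * (n - len(s)) for s in subs]  # pad CDFs to n bins
    if node.kind == "probabilistic":
        return [sum(p * f for p, f in zip(node.probs, col)) for col in zip(*subs)]
    out = [1.0] * n  # parallel node: product of CDFs
    for s in subs:
        out = [o * f for o, f in zip(out, s)]
    return out

def get_cdf(node, lc, lr, best):
    """GetCDF: Eq. (14) with the module cost of Eq. (7)."""
    own = STEP  # assumption: dependency nodes have no cost of their own
    if node.kind == "module":
        own = node.cost["M" if lr == M else "C"]
        if lr != lc:  # parent elsewhere: add the transfer cost
            own = conv(node.cost["T"], own)
    return conv(own, phi(node, lr, best))

def solve(node, decision, best, solved):
    """Algorithm 2: bottom-up, best location of node for each parent location."""
    if node.name in solved:
        return
    solved.add(node.name)
    for ch in node.children:
        solve(ch, decision, best, solved)
    for lc in (M, C):
        # Assumption: only module nodes are placed; dependency nodes inherit lc.
        options = (M, C) if node.kind == "module" else (lc,)
        cdf, choice = None, None
        for lr in options:
            cand = get_cdf(node, lc, lr, best)
            if cdf is None or mean(cand) <= mean(cdf):
                cdf, choice = cand, lr
        decision[(node.name, lc)] = choice
        best[(node.name, lc)] = cdf

def build_solution(node, lc, decision, loc):
    """Algorithm 3: top-down, read off each node's location."""
    lr = decision[(node.name, lc)]
    loc[node.name] = lr
    for ch in node.children:
        build_solution(ch, lr, decision, loc)

def offload(root):
    """Algorithm 1: the root's caller is assumed to run on the device."""
    decision, best, loc = {}, {}, {}
    solve(root, decision, best, set())
    build_solution(root, M, decision, loc)
    return loc, best[(root.name, M)]

def det(k):
    """Hypothetical deterministic cost of k grid units, as a CDF."""
    return [0.0] * k + [1.0]

a = Node("a", "module", cost={"M": det(5), "C": det(1), "T": det(2)})
b = Node("b", "module", cost={"M": det(1), "C": det(1), "T": det(3)})
loc, cdf = offload(Node("app", "series", children=[a, b]))
print(loc, mean(cdf))  # {'app': 0, 'a': 1, 'b': 0} 4.0
```

In this toy series application, offloading `a` costs 2 + 1 = 3 units instead of 5 locally. Offloading `b` would cost 3 + 1 = 4 units instead of 1. The sketch therefore places `a` on the cloud and keeps `b` on the device, for an expected total of 4 units. As in line 13 of Algorithm 2, ties are resolved in favour of the later candidate.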


Application analysis for different network conditions. The impact of the network on the performance of the application can easily be analyzed. Intuitively, when higher data rates are available, offloading the application modules becomes faster and may add little to the application's response time. Below a certain rate threshold, however, executing the application on the device might be faster. This threshold differs significantly from one application to another and can be estimated statistically using the developed cost model.

Application analysis for different user behaviours. Different users interact differently with the same application. For example, some users may repeatedly use only specific features, such as a given set of image processing operations on the full-resolution image. Other users may prefer a preview of multiple operations on a reduced-resolution image. Adjusting the branching probabilities in the EDT to match the user's behaviour improves the precision of the estimated application costs.

Analysis of the application using different devices. This step makes it possible to predict the performance of the application on multiple devices. The application can even be profiled on the same device under different configurations (e.g., energy-saving and fully powered modes). Widely used applications can also be profiled on newly developed devices or after operating system upgrades. This may provide additional insights for application developers and device manufacturers.

Analysis of the application under different application configurations. Applications usually have many configuration parameters, such as the graphics options in video games, including texture resolution, the use of shadows and the number of polygons. Clearly, these settings affect the costs of running the application's modules and thus the execution cost of the application. Such applications may easily have a large number of possible configurations, but not all of them necessarily affect the costs of the application modules. The proposed model can accommodate changes in the costs of individual modules without re-profiling the entire application. This makes it well suited to profiling this genre of applications.

Analysis of the application with different design choices. During application design, developers may face multiple design choices. The cost of running the application can be estimated for each candidate design and used to guide these choices.

In general, these use-case scenarios can rely on various statistical measurements (e.g., mean, standard deviation, quartiles and percentiles) and on diverse optimization objectives (e.g., energy and time minimization).

V. PERFORMANCE EVALUATION

A. Model and Cost Estimation Accuracy

We first evaluate the accuracy of the proposed cost model with respect to the types of dependency nodes in the application and the number of its modules.

1) Effects of the application size: Fig. 5 shows the generic EDT used in the experiments, in which we vary the type of the dependency node v0 and the number of its child modules. Each child module node performs several iterations of random number generation, mathematical operations, and memory reads and writes. Because we are interested in the accuracy of the model, we first assume that the offloading decision is to execute the application on the device. We run the experiments on an emulated device hosted on a desktop computer with a 4-core CPU and 12 GB of RAM. In all the experiments, the application completion time is the cost to be minimized. We executed each experiment 20 times while varying the number of children from 1 to 20. We profiled the applications to obtain observations of the module cost of each node. These observations are then used to estimate the application cost with the proposed statistical cost model. Fig. 6 depicts the average measured execution time together with the average cost estimated by the model from the profiled individual modules using the bootstrap method. We measure the accuracy of the model with the mean absolute percentage error (MAPE) [40]. It is the mean of the percentage errors between each measured value and the corresponding estimated value:

$$
\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{M_t - E_t}{M_t} \right| \times 100\%,
$$

where $M_t$ and $E_t$ are the measured and estimated values, respectively, and $N$ is the number of experiment runs.

For series and probabilistic dependency nodes, the MAPE values were 0.14% and 0.23%, respectively. Increasing the number of nodes did not affect the error. For the parallel root node, on the other hand, the MAPE started to increase once the number of nodes exceeded 4. We attribute this increase to a hardware limitation: the hosting device has only 4 CPU cores. Because the device can run at most 4 modules concurrently, true parallel execution is impossible for more than 4 modules of the application. The actual execution time therefore exceeds the one estimated by the model. We plan to address this problem in future work. We envision that the effect of the number of cores can be accounted for by adding probabilistic dependency and series nodes to the EDT.

[Figure: root node v0 with child module nodes v1, v2, ..., vn.]
Fig. 5: Application structure.

2) Effects of the standard deviation of the module costs: In this experiment, we kept the settings of the previous experiments. We modified various operations within the child modules of the application in Fig. 5 to increase the variance of their costs. More precisely, we modified the modules to obtain an estimated cost with a mean of 7850 ms and a standard deviation ranging from 0 to 2000 ms. We then evaluated the accuracy of the model over this range of standard deviations. Fig. 7 depicts both the measured and the model-estimated costs.
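The estimated costs in these experiments come from the bootstrap procedure of Section IV. It can be sketched in Python as follows. This minimal illustration is not the authors' profiler. The profiled observations, module names and location vector are hypothetical. Dependency nodes are assumed to pass their parent's location on to their children. The probabilistic node of Eq. (5) is sampled as a mixture, consistent with Eq. (12). Each call to `draw` resamples the profiled module costs with replacement, applies Eq. (1), and combines the children bottom-up with Eqs. (3)-(6). Repeated calls yield root-cost samples from which statistics such as those in Table I can be computed.

```python
import random
import statistics

# Hypothetical profiler observations in seconds (not the paper's data):
# "M"/"C" hold execution times on the device/cloud, "T" transfer times.
PROFILE = {
    "fd0": {"M": [70.0, 74.0, 76.0], "C": [15.0, 16.0, 17.0], "T": [3.0, 4.0]},
    "rz": {"M": [0.3, 0.4, 0.5], "C": [0.9, 1.1], "T": [3.0, 4.0]},
    "fd1": {"M": [3.5, 3.7, 3.8], "C": [1.0, 1.2], "T": [0.2, 0.3]},
}


def module(name, *children):
    return {"kind": "module", "name": name, "children": list(children)}


def draw(node, parent_loc, loc, rng):
    """One bootstrap draw of the subtree cost Y_i, evaluated bottom-up."""
    kids = node["children"]
    if node["kind"] == "module":
        obs, li = PROFILE[node["name"]], loc[node["name"]]
        x = rng.choice(obs[li])          # resample X_i^M or X_i^C with replacement
        if li != parent_loc:             # Eq. (1): add X_i^T across locations
            x += rng.choice(obs["T"])
        return x + sum(draw(ch, li, loc, rng) for ch in kids)  # Eq. (3)
    # Assumption: dependency nodes pass their parent's location to their children.
    if node["kind"] == "series":         # Eq. (4): sum of child subtree costs
        return sum(draw(ch, parent_loc, loc, rng) for ch in kids)
    if node["kind"] == "parallel":       # Eq. (6): slowest child subtree
        return max(draw(ch, parent_loc, loc, rng) for ch in kids)
    # Probabilistic node, Eq. (5), sampled as a mixture (cf. Eq. (12)).
    ch = rng.choices(kids, weights=node["probs"])[0]
    return draw(ch, parent_loc, loc, rng)


def summarize(samples):
    """Quartiles, mean, 95th percentile and standard deviation of the samples."""
    q1, med, q3 = statistics.quantiles(samples, n=4)
    return {"q1": q1, "median": med, "mean": statistics.fmean(samples), "q3": q3,
            "p95": statistics.quantiles(samples, n=20)[18],
            "sd": statistics.stdev(samples)}


def app(p_th):
    """Hypothetical EDT: run fd0 directly, or rz then fd1 with probability p_th."""
    thumb = {"kind": "series", "children": [module("rz"), module("fd1")]}
    return {"kind": "probabilistic", "probs": [1 - p_th, p_th],
            "children": [module("fd0"), thumb]}


rng = random.Random(42)
loc = {"fd0": "C", "rz": "M", "fd1": "M"}  # hypothetical location vector
for p_th in (0.0, 0.5, 1.0):
    samples = [draw(app(p_th), "M", loc, rng) for _ in range(5000)]
    print(p_th, {k: round(v, 2) for k, v in summarize(samples).items()})
```

Each root sample here comes from one independent resample of the module observations. The exact resampling scheme used in the paper's experiments may differ.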

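The MAPE used throughout this section translates directly into code. The following minimal Python sketch uses hypothetical values, not the measurements behind Figs. 6 and 7. It assumes every measured value $M_t$ is non-zero, since each error is taken relative to $M_t$.

```python
def mape(measured, estimated):
    """Mean absolute percentage error (in percent) over N paired runs."""
    if not measured or len(measured) != len(estimated):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs((m - e) / m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)


# Hypothetical measured (M_t) and estimated (E_t) completion times in ms.
measured = [7800.0, 8100.0, 7900.0]
estimated = [7850.0, 8000.0, 7900.0]
print(round(mape(measured, estimated), 3))  # prints 0.625
```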

[Figure: measured and estimated time (ms) versus number of modules (5 to 20) for series, probabilistic and parallel root nodes.]
Fig. 6: Measured and estimated execution cost versus the number of application modules.

For the applications with series, probabilistic and parallel root nodes, the MAPE values were 2.08%, 3.54% and 7.23%, respectively. As expected, the parallel node again had the highest value, as in the previous experiment. However, increasing the standard deviation did not significantly affect the results for any of the three applications.

[Figure: average measured and estimated time (ms) versus standard deviation (0 to 2000 ms) for series, probabilistic and parallel root nodes.]
Fig. 7: Average measured and estimated cost varying the standard deviation of module costs.

B. Accuracy of the offloading algorithm

In this experiment, we evaluate the efficiency of the offloading algorithm using a face-detection application whose EDT is depicted in Fig. 8. With this application, the user either runs the face detection method (fd0) directly or runs a thumbnail face detection operation (method th). The latter first calls the image resizing operation (method rz) and then the face detection operation (fd1). We developed a profiler as part of the offloading framework. The profiler first builds the EDT automatically by observing the execution paths while the application is profiled. At this stage, it also calculates from these runs the probability of running each child of the probabilistic dependency node. Next, the profiler samples the cost of each module from several runs of the application using the bootstrap method. Fig. 8 also shows the sampled average cost values of the modules. When conditions or the state of the environment change, for example with a different mobile device or network bandwidth, the EDT must be updated. The offloading algorithm must then be rerun to obtain the new optimal offloading decision. This is similar to other works such as MAUI [3].

[Figure: EDT of the face-detection application. The root (main) has a probabilistic node that chooses between th and fd0, and th has a series node that calls rz and then fd1. Modules are annotated with their sampled average costs (m, c) and data sizes. Legend: fd: Face Detection, th: Thumbnail Face Detection, rz: Resize Image.]
Fig. 8: The EDT of the face-detection application.

The mobile application runs on a Microsoft Windows Phone simulator. The offloading server is a .NET application running on Ubuntu 14.04 with the Mono Framework [41] on a DigitalOcean droplet [8]. For the simulations, we control the data rate of the communication between the mobile application and the offloading server on the server side.

As described in Section III-D, the proposed dynamic programming algorithm calculates, as it traverses the tree, the cost of different offloading choices for every module and eventually for all the location vectors. These costs let us estimate the cost of various location vectors without running the application. To measure how accurately the algorithm calculates these costs, Fig. 9 plots the estimated (marked as simulated on the plot) and measured costs while the available data rate of the underlying network varies. It does so for three offloading decision vectors: no offloading, offloading the entire application, and the best location vector obtained by the developed algorithm. All three scenarios use the simulated device connected to the cloud.

Clearly, the offloading decision obtained by the proposed algorithm yields the lowest costs across the network data rates. The MAPE values were 7.3%, 4.7% and 4.63% for no offloading, full offloading and the developed algorithm, respectively. These values are slightly higher than those of the previous experiments because the cost model omits some overhead incurred in actual execution environments. Our investigations showed that this overhead comes from running the offloading framework itself and from TCP session establishment, resource allocation and memory access on the device. Note that, in this experiment, we did not use the user device to profile the application.


This result suggests that further work is needed to include these effects in the cost model and so improve its precision.

[Figure: time (s) versus data rate (MBps, 0.05 to 0.8) for the measured and simulated no offloading, full offloading and optimal decisions.]
Fig. 9: A comparison of the execution time cost of the proposed offloading algorithm versus full and no offloading decisions while varying the network data rate.

C. Cost model sensitivity to varying contexts

Next, we illustrate how the developed statistical cost model can accurately measure application performance under different scenarios: diverse user preferences, different device configurations and various network technologies. Mobile applications are now widely and frequently updated remotely. We therefore envision that the new model can serve as an important analysis tool for application developers. When modelling these scenarios, the developed tool can produce histograms, density functions and CDFs that accurately reflect the varying costs of the application. It can also compute statistical measurements such as quartiles, expected values and standard deviations to help developers improve their applications' performance. The following sections demonstrate some of these scenarios.

1) User Behaviour: Clearly, users interact with their applications in different ways, and they may follow repetitive patterns with certain applications. For example, one user of a face-detection application may usually run it directly on full-sized images. Another user may almost always prefer to resize all images to thumbnails before running face detection. Other users may fall between these two behaviours. Fig. 10 plots histograms of the cost, in terms of execution time, for different users of the face-detection application running it on a sample of 20 images. Each histogram corresponds to a different user behaviour, identified by the parameter P(th). This parameter is the probability that the user runs face detection on a thumbnail of the image instead of the full image. The top left histogram, with P(th) = 0, shows the cost for a user who never uses the thumbnail preview. In this case, the user almost always experiences a long execution time, between 400 s and 600 s. For users with a higher P(th), the histograms show progressively smaller average wait times. For the user who almost always runs face detection on thumbnails, the wait is minimal and falls within 60-100 s. Figs. 11 and 12 plot, respectively, the density functions and the CDFs of the total cost of the face-detection application corresponding to these histograms. Additionally, Table I lists some common statistics of the application cost that can be useful to application developers. The table shows that the lowest mean cost occurs when the user always uses thumbnails, i.e., P(th) = 1. In this case, the variance also remains small, with a standard deviation of 13.61 s. The second-to-last column gives the 95th cost percentile. It indicates that 95% of this user's executions will be faster than 94.26 s.

[Figure: six histograms of frequency versus time (seconds) for P(th) = 0, 0.2, 0.4, 0.6, 0.8 and 1.]
Fig. 10: Histograms of the estimated execution time of the face-detection application for different users behaviours.

TABLE I: A Sample of statistical measurements for the face-detection application for different users behaviours.

| P(th) | 1st Q | Median | Mean | 3rd Q | 95% | SD |
|---|---|---|---|---|---|---|
| 0 | 428.1 | 468.7 | 467.5 | 510.9 | 550.35 | 58.74 |
| 0.25 | 111.80 | 443.90 | 370.00 | 500.60 | 543.95 | 177.95 |
| 0.5 | 75.10 | 111.80 | 269.80 | 468.40 | 536.85 | 199.42 |
| 0.75 | 68.72 | 83.10 | 173.40 | 111.90 | 525.27 | 172.98 |
| 1 | 65.60 | 75.18 | 75.37 | 86.50 | 94.26 | 13.61 |

All values are in seconds.

2) Cost sensitivity to different device and network configurations: Next, we show how the model can be used when a developer tests the behaviour of an application running on several mobile devices that use different networks. More precisely, we consider two devices, one of which is one and a half times faster than the other. We assign the first device a speed-up factor of 1 and the second a speed-up factor of 1.5. Here, the speed-up factor is a simple measure of the CPU speed of the mobile device that allows comparing the

relative speed of different devices. Each device experiences different network conditions with varying data rates. We profile the same face-detection application as it runs on these two devices under three different offloading decisions, namely, executing the application entirely on the device, entirely on the cloud, or according to the decision produced by our offloading algorithm, which offloads only the execution of the face-detection operation on full-size images (marked by the module fd0). These three decisions are referred to as No-offload, Full-offload and fd0-offload, respectively, in the generated figures.

Fig. 13 depicts the expected costs, in terms of execution time, for the two devices as the available data rate of the underlying network varies. Increasing the device's computational power has a considerable impact when the application runs on the mobile device, whereas for both the partial and full offloading solutions the benefit is minimal. Such an analysis may help the developer quantify the real impact of the mobile device characteristics on application performance when considering various offloading solutions for users on different networking technologies.

Fig. 11: Density of the estimated execution cost of the face-detection application for various user behaviours. (Curves for P(th)=0, 0.25, 0.5, 0.75 and 1; axes: density versus time (s).)

Fig. 12: Estimated cost CDF for the face-detection application for various user behaviours. (Curves for P(th)=0, 0.25, 0.5, 0.75 and 1; axes: cumulative distribution function versus time (s).)

Fig. 13: Expected execution time of the face-detection application for different devices, offloading decisions and network conditions. (Curves for No-offload, Full-offload and fd0-offload, each with device speed up factors 1 and 1.5; axes: time (s) versus data rate (MBps).)

3) Cost sensitivity to user behaviours and network conditions: Another useful scenario for the developed tool arises when the developer wants to investigate the dependencies among several parameters for a preselected offloading decision vector of the application. This can help the developer identify, for instance, the scenarios in which a certain offloading decision is most beneficial. Fig. 14 shows such an example: it plots the estimated average cost as the network data rate and the user behaviour (defined by the probability P(th) of selecting a thumbnail image) vary, for a location decision vector that uses the cloud only when a user chooses to run face detection on full-size images. As shown in the figure, at low network speeds the user behaviour has a significant impact on the application's running time. As the data rate increases, the application runs faster, even for users who choose full-size images. A similar test is shown in Fig. 15, which plots the average application execution time for different data rates and device speed up factors when face detection on full-size images is performed in the cloud. For this offloading decision, the impact of the device speed up is negligible. Similarly, Fig. 16 shows the cost of the face-detection application under different user behaviours and device speed up factors with no offloading. For users who perform face detection on the full image, the speed of the device has a significant impact, and the running time can be reduced by about 50% with a speed up factor of 2. The more frequently users run face detection on thumbnails, as reflected by a larger value of P(th), the smaller the benefit of a faster device.


Fig. 14: Execution cost versus the network data rate and user behaviours when only fd0 is offloaded. (Axes: data rate (MBps), P(th), time (s).)

Fig. 16: Execution cost versus device speed ups and user behaviours when no modules are offloaded. (Axes: P(th), device speed up factor (1.0 to 2.0), time (s).)
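The surfaces in Figs. 14 to 16 vary two parameters at a time under a fixed location decision vector. A minimal Python sketch of such a sweep is shown below. It assumes a deliberately simplified cost model of our own, not the EDT-based statistical model: local compute time is divided by the device speed up factor, a module placed in the cloud pays an upload time of (input size / data rate) whenever its predecessor ran on the device, result downloads are ignored, and the expected cost is the P(th)-weighted sum of the thumbnail and full-size branch costs. All timings and data sizes, and the module names resize and fd_thumb, are hypothetical; only fd0 and the three decision names come from the text.

```python
# Toy expected-time model for the No-offload, Full-offload and fd0-offload
# decisions. Every number is HYPOTHETICAL; "resize" and "fd_thumb" are
# invented module names, only "fd0" comes from the paper.

# Seconds of computation per run on the reference device (speed up factor 1)
LOCAL_S = {"resize": 5.0, "fd_thumb": 70.0, "fd0": 460.0}
# Seconds of computation per run on the cloud server
CLOUD_S = {"resize": 1.0, "fd_thumb": 7.0, "fd0": 46.0}
# Megabytes uploaded when a module runs in the cloud but its predecessor
# ran on the device (result downloads are ignored)
INPUT_MB = {"resize": 40.0, "fd_thumb": 2.0, "fd0": 40.0}

# Conditional branching: thumbnail path with probability P(th), else full-size
BRANCHES = {"thumb": ["resize", "fd_thumb"], "full": ["fd0"]}

DECISIONS = {
    "No-offload": {},
    "Full-offload": {m: "cloud" for m in LOCAL_S},
    "fd0-offload": {"fd0": "cloud"},
}


def branch_time(modules, decision, speedup, rate_mbps):
    """Execution time of one branch under a location decision."""
    total, prev = 0.0, "device"  # input images start on the device
    for m in modules:
        where = decision.get(m, "device")
        if where == "cloud":
            if prev == "device":
                total += INPUT_MB[m] / rate_mbps  # upload time
            total += CLOUD_S[m]
        else:
            total += LOCAL_S[m] / speedup  # local time scales with CPU speed
        prev = where
    return total


def expected_time(decision, p_th, speedup, rate_mbps):
    """E[T] = P(th) * T_thumb + (1 - P(th)) * T_full."""
    t_thumb = branch_time(BRANCHES["thumb"], decision, speedup, rate_mbps)
    t_full = branch_time(BRANCHES["full"], decision, speedup, rate_mbps)
    return p_th * t_thumb + (1.0 - p_th) * t_full


if __name__ == "__main__":
    rates = (0.1, 0.5, 1.0)  # MBps
    for name, decision in DECISIONS.items():
        for speedup in (1.0, 1.5):
            times = [expected_time(decision, 0.5, speedup, r) for r in rates]
            print(f"{name:12s} speedup={speedup}: "
                  + "  ".join(f"{t:6.1f}s" for t in times))
```

With these made-up numbers, No-offload depends on the speed up factor but not on the data rate, Full-offload depends on the data rate but not on the speed up factor, and fd0-offload depends on the speed up factor only through the locally executed thumbnail branch. These trends are qualitative only and do not reproduce the measured curves.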
Fig. 15: Execution cost versus the network data rate and device speed ups when only fd0 is offloaded. (Axes: data rate (MBps), device speed up factor (1.0 to 2.0), time (s).)

VI. CONCLUSION AND FUTURE WORK

This article described a novel application model and a new algorithm to solve the application offloading decision-making problem. The model employs tree structures, referred to as Execution Dependency Trees (EDTs), to accurately represent the different execution paths of the application. In contrast to existing models, this structure allows an offloading algorithm to create multiple offloading decisions per module, or application component, based on its current execution path. The model's accuracy is further enhanced by a novel cost model that employs probability distribution functions rather than the commonly used fixed cost averages. These functions can be estimated for various factors that may affect the offloading decision, including the users' behaviours while interacting with the application and the underlying network conditions. Using these functions, a novel offloading algorithm was also introduced to optimally select which components of the application to offload to the cloud in order to optimize a given objective. We further showed that, when these functions meet certain properties, namely, when they form an ordered set, the offloading problem can be efficiently solved via a simple dynamic programming procedure. Performance evaluation results demonstrated that the mean absolute percentage error between the measured cost and the cost estimated by the model can be as small as 4% for applications that include sequential and probabilistic (conditional branching) components. Finally, we described several use case scenarios in which developers can use the model as an efficient application profiling tool to continuously enhance the performance of the application under different network and device configurations and user behaviours.

In the future, we will investigate the behaviour of applications composed of parallel modules, taking into account additional factors such as the number of available CPU cores, in order to further enhance the model's accuracy when the degree of needed parallelism exceeds the number of available CPU cores. We will also analyze the effects of using different statistical measurements as optimization goals. Examples of such goals include minimizing the upper end of the 95% confidence interval of the mean and the 95th percentile.

REFERENCES

[1] A. Khan, M. Othman, S. Madani, and S. Khan, "A survey of mobile cloud computing application models," Communications Surveys & Tutorials, IEEE, vol. 16, no. 1, pp. 393–413, 2014.
[2] N. Fernando, S. W. Loke, and W. Rahayu, "Mobile cloud computing: A survey," Future Generation Computer Systems, vol. 29, no. 1, pp. 84–106, 2013.
[3] E. Cuervo, A. Balasubramanian, D. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl, "MAUI: making smartphones last longer with code offload," in ACM MobiSys '10, 2010, pp. 49–62.


[4] S. Ou, K. Yang, and A. Liotta, "An adaptive multi-constraint partitioning algorithm for offloading in pervasive systems," in IEEE PerCom, March 2006.
[5] D. Huang, P. Wang, and D. Niyato, "A dynamic offloading algorithm for mobile computing," Wireless Communications, IEEE Transactions on, vol. 11, no. 6, pp. 1991–1995, 2012.
[6] J. Barrameda and N. Samaan, "A novel application model and an offloading mechanism for efficient mobile computing," in IEEE WiMOB, 2014, pp. 81–84.
[7] D. P. Bertsekas and J. N. Tsitsiklis, Introduction to probability, Athena Scientific press, 2000.
[8] DigitalOcean cloud, https://cloud.digitalocean.com/, accessed June 2015.
[9] B. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, "Clonecloud: elastic execution between mobile device and cloud," in Proceedings of the sixth conference on Computer systems, 2011, pp. 301–314.
[10] J. Flinn, S. Park, and M. Satyanarayanan, "Balancing performance, energy, and quality in pervasive computing," in 22nd IEEE ICDCS, July 2-5, 2002, Vienna, Austria. IEEE, 2002, pp. 217–226.
[11] P. Bahl, R. Han, L. Li, and M. Satyanarayanan, "Advancing the state of mobile cloud computing," in Proceedings of the third ACM workshop on Mobile cloud computing and services. ACM, 2012, pp. 21–28.
[12] M. Kristensen, "Scavenger: Transparent development of efficient cyber foraging applications," in IEEE PerCom, 2010, pp. 217–226.
[13] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, "Thinkair: Dynamic resource allocation and parallel execution in the cloud for mobile code offloading," in IEEE INFOCOM, 2012, pp. 945–953.
[14] A. Cidon, T. London, S. Katti, C. Kozyrakis, and M. Rosenblum, "MARS: adaptive remote execution for multi-threaded mobile devices," in Proceedings of the 3rd ACM MobiHeld, 2011.
[15] L. Yang, J. Cao, S. Tang, D. Han, and N. Suri, "Run time application repartitioning in dynamic mobile cloud environments," Cloud Computing, IEEE Transactions on, To Appear, 2015.
[16] R. Kemp, N. Palmer, T. Kielmann, and H. Bal, "Cuckoo: a computation offloading framework for smartphones," Mobile Computing, Applications, and Services, pp. 59–79, 2012.
[17] J. Li, R. Ma, and H. Guan, "Tees: An efficient search scheme over encrypted data on mobile cloud," Cloud Computing, IEEE Transactions on, To Appear, 2015.
[18] K. Elgazzar, P. Martin, and H. Hassanein, "Cloud-assisted computation offloading to support mobile services," Cloud Computing, IEEE Transactions on, To Appear, 2015.
[19] C. Kamienski, R. Simoes, E. Azevedo, R. Dantas, C. Dias, D. Sadok, and S. Fernandes, "An integrated composition model for collaboration in the cloud," in 1st IEEE LatinCloud, Porto Alegre, Brazil, 2012, pp. 19–24.
[20] C. Kamienski, R. Simoes, E. Azevedo, R. Dantas, C. Dias, D. Sadok, and S. Fernandes, "E2ecloud: Composition and execution of end-to-end services in the cloud," in Computers and Communication, IEEE Symposium on, 2014.
[21] G. Alves, E. Cavalcante, F. Lopes, E. Azevedo, R. Dantas, T. Batista, S. Fernandes, and C. A. Kamienski, "Tasks meet flows: Merging two paradigms in a cloud applications development platform," in 2nd IEEE LatinCloud, Maceió, Brazil, 2013, pp. 35–40.
[22] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, "The case for vm-based cloudlets in mobile computing," Pervasive Computing, IEEE, vol. 8, no. 4, pp. 14–23, 2009.
[23] C. Shi, M. Ammar, E. Zegura, and M. Naik, "Computing in cirrus clouds: the challenge of intermittent connectivity," in Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012, pp. 23–28.
[24] G. Huerta-Canepa and D. Lee, "A virtual cloud computing provider for mobile devices," in ACM 1st MCS, San Francisco, USA. ACM, 2010, p. 6.
[25] Q. Xia, W. Liang, Z. Xu, and B. Zhou, "Online algorithms for location-aware task offloading in two-tiered mobile cloud environments," in Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing. IEEE Computer Society, 2014, pp. 109–116.
[26] J. Yue, D. Zhao, and T. D. Todd, "Cloud server job selection and scheduling in mobile computation offloading," in IEEE GLOBECOM, 2014, pp. 4990–4995.
[27] X. Chen, "Decentralized computation offloading game for mobile cloud computing," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 4, pp. 974–983, 2015.
[28] W. Qiu, Z. Zheng, X. Wang, X. Yang, and M. Lyu, "Reliability-based design optimization for cloud migration," Services Computing, IEEE Transactions on, vol. 7, no. 2, pp. 223–236, 2014.
[29] E. Ahmed, A. Akhunzada, M. Whaiduzzaman, A. Gani, S. H. Ab Hamid, and R. Buyya, "Network-centric performance analysis of runtime application migration in mobile cloud computing," Simulation Modelling Practice and Theory, vol. 50, pp. 42–56, 2015.
[30] S. Abolfazli, Z. Sanaei, M. Alizadeh, A. Gani, and F. Xia, "An experimental analysis on cloud-based mobile augmentation in mobile cloud computing," Consumer Electronics, IEEE Transactions on, vol. 60, no. 1, pp. 146–154, 2014.
[31] T. Taleb, M. Corici, C. Parada, A. Jamakovic, S. Ruffino, G. Karagiannis, and T. Magedanz, "Ease: Epc as a service to ease mobile core network deployment over cloud," Network, IEEE, vol. 29, no. 2, pp. 78–88, 2015.
[32] M. Barbera, S. Kosta, A. Mei, and J. Stefa, "To offload or not to offload? the bandwidth and energy costs of mobile cloud computing," in IEEE INFOCOM, 2013, pp. 1285–1293.
[33] H. Wu, Q. Wang, and K. Wolter, "Tradeoff between performance improvement and energy saving in mobile cloud offloading systems," in IEEE ICC, 2013, pp. 728–732.
[34] H. Wu and K. Wolter, "Tradeoff analysis for mobile cloud offloading based on an additive energy-performance metric," in Proceedings of the 8th International Conference on Performance Evaluation Methodologies and Tools, ICST, 2014, pp. 90–97.
[35] J. Song, Y. Cui, M. Li, J. Qiu, and R. Buyya, "Energy-traffic tradeoff cooperative offloading for mobile cloud computing," in 22nd IEEE IWQoS, 2014, pp. 284–289.
[36] J. Liu, K. Kumar, and Y.-H. Lu, "Tradeoff between energy savings and privacy protection in computation offloading," in 16th ACM/IEEE international symposium on Low power electronics and design, 2010, pp. 213–218.
[37] R. A. Sahner and K. S. Trivedi, "Performance and reliability analysis using directed acyclic graphs," Software Engineering, IEEE Transactions on, vol. 1, no. 10, pp. 1105–1114, 1987.
[38] R. E. Bellman and S. E. Dreyfus, Applied dynamic programming. Rand Corporation, 1962.
[39] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, 3rd Edition. The MIT press, 2009.
[40] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International journal of forecasting, vol. 22, no. 4, pp. 679–688, 2006.
[41] Mono Project. [Online]. Available: http://www.mono-project.com/, accessed 18-June-2015.

Jose Barrameda is currently a Ph.D. student in Computer Science at the School of Electrical Engineering and Computer Science (EECS), University of Ottawa. He received his M.Sc. in Computer Science from EECS in 2011. His current research interests focus on mobile cloud computing, distributed computing, and algorithm analysis and design.

Dr. Nancy Samaan received the BSc and MSc degrees from the Department of Computer Science, Alexandria University, Egypt, and the PhD degree in computer science from the University of Ottawa, Canada, in 2007. She is currently an associate professor with the School of Electrical Engineering and Computer Science, University of Ottawa. Her current research interests include network resource management, wireless communications, quality-of-service issues, and autonomic communications. In 2008, she received the Natural Sciences and Engineering Research Council of Canada University Faculty Award. She is a member of the IEEE.
