
Simulating the Performance of Data Distribution and Retrieval Algorithms
with Hardware Varying Network Simulator
Master’s Project

Alexander G. Maskovyak

Rochester Institute of Technology


Golisano College of Computing & Information Sciences
Department of Computer Science
102 Lomb Memorial Drive
Rochester, NY 14623‐5608

alexander.maskovyak@gmail.com
May 2010

Chair – Hans-Peter Bischof

Reader – Axel Schreiner

Observer – TBA

Contents
Abstract .................................................................................................................................................. 7
1. Introduction .................................................................................................................................... 8
1.1. Motivation ............................................................................................................................... 8
1.2. Distributed Data ....................................................................................................................... 8
1.3. Simulation................................................................................................................................ 8
1.4. Current Simulators ................................................................................................................... 9
2. Project Description ........................................................................................................................ 10
2.1. Terminology ........................................................................................................................... 10
2.2. Architecture ........................................................................................................................... 11
2.3. Report Structure .................................................................................................................... 12
3. Simulation ..................................................................................................................................... 14
3.1. Static versus Dynamic Simulation ........................................................................................... 14
3.2. Deterministic versus Stochastic Simulation............................................................................. 14
3.3. Continuous versus Discrete Simulation ................................................................................... 14
3.4. Analytical versus Agent-Based Simulation .............................................................................. 15
4. Simulation Implementation ........................................................................................................... 16
4.1. Simulator ............................................................................................................................... 16
4.2. Simulatables........................................................................................................................... 18
4.3. Operation Bound Simulatables ............................................................................................... 20
5. Network Model ............................................................................................................................. 23
5.1. Node ...................................................................................................................................... 23
5.2. Connection Adaptor ............................................................................................................... 24
5.3. Connection Medium............................................................................................................... 25
5.4. Protocol Stack ........................................................................................................................ 26
6. Hardware Model............................................................................................................................ 28
6.1. Hardware Computer Node ..................................................................................................... 29
6.2. Harddrive ............................................................................................................................... 29
6.3. Cache ..................................................................................................................................... 29
6.4. Connection Adaptor ............................................................................................................... 29
7. Distribution and Retrieval Algorithms ............................................................................................ 31
7.1. Operation Model .................................................................................................................... 31
7.2. Implementation ..................................................................................................... 32
7.3. Client-Managed Distribution Algorithm .................................................................................. 33
7.4. Server-Managed Distribution Algorithm ................................................................................. 37
8. Configuration................................................................................................................................. 42
8.1. Java API .................................................................................................................................. 43
8.2. HVNSLanguage ....................................................................................................................... 46
8.3. HVNSL Example Configuration ................................................................................................ 47
8.4. Configuration Directory Structure .......................................................................................... 51
8.5. HVNSL Grammar .................................................................................................................... 53
9. Algorithm Benchmarking ............................................................................................................... 55
9.1. Metrics................................................................................................................................... 55
9.2. Logging Implementation ........................................................................................................ 56
9.3. Log File Format ...................................................................................................................... 56
9.4. Generating Logs ..................................................................................................................... 57
10. Simulation Expectations ............................................................................................................. 58
10.1. Varying Adaptor Speed ....................................................................................................... 58
10.2. Varying Cache Size.............................................................................................................. 58
10.3. Varying Cache Speed .......................................................................................................... 59
10.4. Varying Server Quantity ..................................................................................................... 59
10.5. Varying Redundancy ........................................................................................................... 59
11. Simulation Results ..................................................................................................................... 60
11.1. Varying Adaptor Speed ....................................................................................................... 61
11.2. Varying Cache Size.............................................................................................................. 62
11.3. Varying Cache Speed .......................................................................................................... 64
11.4. Varying Server Quantity ..................................................................................................... 65
11.5. Varying Redundancy ........................................................................................................... 66
12. Simulator Comparison................................................................................................................ 67
12.1. ns-2 .................................................................................................................................... 67
12.2. JiST ..................................................................................................................................... 69
12.3. OMNeT++........................................................................................................................... 70
13. Conclusions................................................................................................................................ 71
14. Future Work .............................................................................................................................. 73
14.1. Architecture ....................................................................................................... 73
14.2. HVNSL ................................................................................................................................ 73
14.3. Algorithm Design................................................................................................................ 74
14.4. Benchmarks ....................................................................................................................... 74
References ............................................................................................................................................ 75

Computer Science is a science of abstraction—creating the right model for a problem and
devising the appropriate mechanizable techniques to solve it.
- A. Aho and J. Ullman

Abstract

Software‐based network simulators are often an essential resource in networking research. Network
simulation affords researchers the ability to test communication protocols and topological design where
this would otherwise be economically or physically infeasible. A heretofore unexplored avenue for
simulation is the interplay between the computer hardware of devices on a network and the impact this
has on the performance of data distribution algorithms. This line of inquiry is becoming increasingly
important as the amount of data produced begins to exceed the
capacity of local storage and must instead be stored on remote servers and later retrieved.

Hardware Varying Network Simulator (HVNS) was designed to fulfill this niche. HVNS models how
hardware attributes like disk-read speed, communication link bandwidth, and cache size affect the
performance of one data distribution and retrieval algorithm over another. HVNS can demonstrate
when adding functionality like cache utilization or a faster connection adaptor results in no performance
gain because of hardware properties of the network. HVNS can be used to test purely theoretical
hardware and its effects. This can be used to discriminate between algorithms which may not currently
be viable but may become viable should certain hardware capabilities become available in the future.

A cursory glance over network-related primary literature demonstrates that Network Simulator version
2 (ns-2) [1] is the de-facto standard in discrete event-driven network simulation. It is widely used in
transport, network, and multicast protocol testing for both wired and wireless networks. However, it
and similar simulators model the network at an inappropriate level of abstraction for this domain.
These simulators are comparatively slow, hard to learn, and difficult to extend. HVNS was designed to be
easy to use, extensible, and easily configurable, and to perform well for this domain. This project examines
the design of HVNS and its configuration language, and provides insight into the effects that hardware can
have on data distribution and retrieval.

Data expands to fill the space available for storage.


- Parkinson’s Law of Data

1. Introduction

Data storage and access requirements in the present and near‐term are going to necessitate the
deployment of distributed hardware solutions and the use of data distribution and retrieval algorithms
for a wide variety of fields, most notably scientific research. There is active, ongoing research into the
development of these algorithms. Network simulators exist to test network protocols but are not
specifically designed to test the interaction between I/O algorithms and the computer hardware of the
devices in that network topology.

1.1. Motivation

The amount of data generated world‐wide is increasing at an exponential rate [2]. Scientific research
using high performance computers produces a considerable amount of data. The experiments run on
CERN's Large Hadron Collider are expected to generate 2 gigabytes of data every second, with a total
yearly production of 10 to 20 petabytes of information [3][4]. The Wide Field Infrared Camera (WFCAM)
of Cambridge's Astronomical Survey Unit captures approximately 2,200 images of the night sky, which
take up 230 gigabytes of space [5]. It is physically and economically infeasible with current technology
to store and back up this data on a single localized storage medium [6]. Instead, these data are stored
on multiple networked machines and retrieved in intelligent ways that mitigate the costs associated
with non-local access.

1.2. Distributed Data

Distributed file systems allow these data to be stored and accessed from across multiple file servers as
though they were local resources. Distributing data incurs performance penalties: additional time is spent
on network I/O, first to send the data for storage and later to retrieve it for an application's or client's
use. A variety of techniques for increasing the performance of file distribution and retrieval algorithms
are being explored, including caching, data duplication, and data compression [7].

1.3. Simulation

Software‐based network simulators are often an essential resource in networking research. Network
simulation affords researchers the ability to test communication protocols and topological design where
this would otherwise be economically or physically infeasible. Simulation can similarly be applied to the
study of data distribution and access algorithms where hardware properties can greatly impact the
performance of one algorithm over another. Simulation also provides an additional benefit aside from
circumventing resource limitations: theoretical hardware models can be constructed to test the impact
of technology that is on the horizon or in‐development [8].

1.4. Current Simulators

A wide variety of simulators have been deployed in professional and academic settings. Some popular
simulators include ns-2, JiST, and OMNeT++. Network Simulator version 2 is a discrete-event network
simulation environment and the de‐facto standard used in the field of network research to test, design,
and benchmark network protocols in wired environments. ns-2 offers support for simulating a full
network protocol stack, multi‐cast protocols, and routing. The ns‐2 configuration file requires
knowledge of OTcl, a scripting language designed to be embedded in applications [7]. Java in Simulation
Time (JiST) is a prototype for virtual machine-based simulation where a Java application’s code is
intercepted and transformed into bytecode which contains simulation time-semantics [9]. It is efficient
and allows a developer to avoid the use of a domain-specific simulation language to define a simulation.
OMNeT++ is a discrete-event network simulator [10]. It is an open-source project which provides model
frameworks that can be used to develop domain-specific simulations, and it is designed for extensibility.

The hardest part of the software task is arriving at a complete and consistent specification, and
much of the essence of building a program is in fact the debugging of the specification.
- F. Brooks

2. Project Description

This project examines the interplay between remote storage-and-retrieval algorithms and how they are
affected by aspects of the hardware and network topology on which they are deployed. Specifically, the
question is under what circumstances remote read requests can outperform local read requests. One
way to answer this question is via simulation. The approach is to design a custom simulator, model the
applicable network and computational domains, implement several distribution algorithms, and
measure their performance on a variety of network/computational configurations.

2.1. Terminology

This project deals with several computer science domains: simulation, networking, and hardware.

Computer simulation explores how an abstract model’s state changes over the course of time. The
model is a representation of an object or grouping of objects as well as their behavior in response to a
variety of events. A simulator is responsible for performing a simulation and is generally responsible for
managing the flow of time and monitoring system state changes. A discrete scheduled event simulator
manages time as a series of events in an event queue [11]. The simulator handles events scheduled to
occur at some point in time which can trigger the scheduling of additional events further into the future.
Events are created by the model as a part of its behavior and cause the model to undergo state changes.
An agent-based simulation explicitly models individual entities as opposed to an analytical model which
operates with underlying mathematical equations or otherwise abstracts out entities as numeric
properties. Agents respond to events based upon a set of internal rules or logic [12]. These responses
can include an alteration to their state and/or a scheduling of future events within the simulator. The
behavior of a state-based agent can be modeled by a Finite State Automaton. Finite State Automata (also
known as Finite State Machines) are graphical models which describe the states of an entity, the events
the entity receives, the entity's behavior in response to those events, and how the entity's state changes
when it occupies any particular state [13].
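As a toy illustration of a state-based agent, the sketch below implements a small transition-table FSM in Java. The class name, states, and events are invented for illustration; this is not part of HVNS's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal finite state machine: a transition table maps
// (current state, event) pairs to the next state. If no transition
// is defined for an event, the machine stays in its current state.
class Fsm {
    private String state;
    private final Map<String, String> transitions = new HashMap<>();

    Fsm(String initialState) { this.state = initialState; }

    // Register a transition: while in 'from', event 'event' moves us to 'to'.
    void on(String from, String event, String to) {
        transitions.put(from + "/" + event, to);
    }

    // Deliver an event and return the (possibly unchanged) new state.
    String fire(String event) {
        state = transitions.getOrDefault(state + "/" + event, state);
        return state;
    }
}
```

A server agent, for instance, might move from an idle state to a busy state on a request event and back again when the work completes.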

A simulation run is the result of a simulator performing these functions on a model with some initial
state conditions. Typically, the simulation run will produce some form of diagnostic data regarding the
behavior of the system. An interested party will run several simulations with varying initial conditions
and model parameters in order to observe patterns, trends, or to benchmark “performance” against
some variety of metrics.

Computer networking deals with computational devices which are connected via media to facilitate
communication and the sharing of computational resources like hardware, software, or data. Networks
range in size from the small local area network (LAN) of a household, with a handful of devices
connected to a wireless access point, to the internet, which consists of hundreds of thousands of
networks of computers connected across the globe.

Software applications typically take one of two roles on a network: client or server. A client role indicates
that a piece of software is performing a request on a piece of software in the server role, which responds
to and services the request. Software communicates across the network via discrete messages which are
known as datagrams, packets, segments, or messages depending upon the level of abstraction being
used. A protocol is an agreed-upon dialogue between two pieces of software that governs their behavior
and interactions in response to messages. Here too, an FSM
can be used to describe this interaction. The protocol stack is a layered series of modular protocols
which implements a suite of computer networking protocols. Each layer of the stack handles one aspect
of communication so that the layer above it does not have to—they are said to provide a “service” to
the layer above [14]. The protocol stack has three basic layers (though there are commonly more, and
groupings can differ) with one protocol each handling media access, message transportation, and
applications. The media access layer handles transformation of information into a form that can be
transported by the underlying medium to which a device is connected. The transportation layer
determines how messages find their way to their intended destination as they pass through a series of
devices. The application layer is where most end-user centric, desktop applications reside and is the
layer of the distribution algorithms developed for this project. These distribution algorithms make
use of virtual “hardware” to store data and send messages to one another.
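The layered "service" relationship can be sketched as a chain of handlers, where sending wraps a payload top to bottom and receiving unwraps it bottom to top. The interfaces and layer names below are illustrative assumptions, not HVNS's actual protocol stack API.

```java
import java.util.ArrayList;
import java.util.List;

// Each layer frames the payload on the way down and strips its
// framing on the way up, providing a "service" to the layer above.
interface ProtocolLayer {
    String send(String payload);   // wrap the payload in this layer's framing
    String receive(String frame);  // strip this layer's framing
}

class TaggingLayer implements ProtocolLayer {
    private final String tag;
    TaggingLayer(String tag) { this.tag = tag; }
    public String send(String payload) { return tag + "[" + payload + "]"; }
    public String receive(String frame) {
        return frame.substring(tag.length() + 1, frame.length() - 1);
    }
}

class ProtocolStack {
    // index 0 = application layer, last = media access layer
    private final List<ProtocolLayer> layers = new ArrayList<>();
    void add(ProtocolLayer layer) { layers.add(layer); }

    // Sending: each layer hands its framed output to the layer below.
    String sendDown(String message) {
        for (ProtocolLayer l : layers) message = l.send(message);
        return message;
    }

    // Receiving: layers unwrap in reverse order, bottom to top.
    String receiveUp(String frame) {
        for (int i = layers.size() - 1; i >= 0; i--) frame = layers.get(i).receive(frame);
        return frame;
    }
}
```

A stack built from application, transport, and media layers would frame `"hello"` as `media[transport[app[hello]]]` on the wire and recover `"hello"` at the receiver.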

The hardware model includes both data storage and communication hardware. A harddrive has large,
long-term storage but generally has slow access times. A cache is much smaller, short-term storage but
has comparatively much faster access times. Caches are used to speed up data retrieval: they store
frequently accessed information so that the speed of the cache can be leveraged and the slow
access of the harddrive avoided. When requests are uniform or predictable, the cache can further be
used to store information that has not yet been requested but which may be requested at some point in
the future. Due to the small size of a cache, it is important to make intelligent decisions about what is
stored there; otherwise no speedup over harddrive access can be realized. A connection adaptor is the
gateway to the medium connecting computational devices to one another. Network infrastructure
speeds are rapidly increasing, and in some cases, are much faster than the access time for local storage
[15].
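The cache-over-harddrive relationship can be made concrete with a small sketch: a least-recently-used cache placed in front of a slow backing store, counting how often the "harddrive" must be touched. This is an illustrative assumption, not the HVNS hardware model; the class and field names are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small LRU cache fronting a slow backing store. Repeated reads of
// frequently accessed keys hit the cache and avoid the "harddrive";
// a cache that is too small evicts entries and loses the speedup.
class CachedStore {
    private final Map<String, String> disk;            // slow, large
    private final LinkedHashMap<String, String> cache; // fast, small
    long diskReads = 0;

    CachedStore(Map<String, String> disk, int cacheSize) {
        this.disk = disk;
        // access-order LinkedHashMap evicts the least-recently-used entry
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize;
            }
        };
    }

    String read(String key) {
        String value = cache.get(key);
        if (value == null) {        // cache miss: pay the harddrive cost
            diskReads++;
            value = disk.get(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

With a one-entry cache, alternating reads of two keys never hit the cache, which illustrates why intelligent decisions about what to keep cached matter.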

The last observation is the impetus behind this project.

2.2. Architecture

Hardware Varying Network Simulator (HVNS) is the result of the approach described above.
HVNS is an agent-based, discrete-event simulator. The simulator runs in its own thread with a thread-safe
event priority-queue. Simulatables are entities which schedule events with the simulator and
handle messages contained within an event from a sender.

Events are containers that hold a message, intended receiver, and a time of delivery. Events are held in
the simulator queue first in time-order, and second by a priority order flag. The simulator pops events
from the top of its queue, updates its time to match the event’s time, and then delivers the message to
the intended simulatable recipient. The simulatable then proceeds to perform some operations in
response to this event which may result in the recipient scheduling additional events with the simulator.
Events can be scheduled by any simulatable for any simulatable. Messages from a simulatable to itself
are given the highest priority and are considered “control” messages that have no transit cost. They are
intended to allow a simulatable to schedule the alteration of its state for a future time and “bootstrap”
itself for future operations/work.
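The ordering rule above, time first and then the priority flag, can be sketched with a comparator over a priority queue, so that a zero-cost "control" message a simulatable sends itself is delivered before ordinary messages carrying the same timestamp. The `Event` fields and class names here are illustrative assumptions, not HVNS's actual types.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// An event holds a message, an intended receiver, a delivery time,
// and a priority flag (higher = delivered first at equal times).
record Event(double time, int priority, String receiver, String message) {}

class EventQueue {
    // lower delivery time first; at equal times, higher priority first
    private final PriorityQueue<Event> queue = new PriorityQueue<>(
        Comparator.comparingDouble(Event::time)
                  .thenComparing(Comparator.comparingInt(Event::priority).reversed()));

    void schedule(Event e) { queue.add(e); }
    Event next() { return queue.poll(); }
}
```

Popping events from this queue yields them in simulation-time order, with self-addressed control messages jumping ahead of same-time traffic.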

The network model consists of nodes directly connected with some set of neighbors via Ethernet-like
connection medium. Nodes exchange messages as packets over this medium using next-hop routing. The
computational model views nodes as computers, each of which contains a number of components
which can store and retrieve varying amounts of data at varying rates.

Distribution algorithms are the primary agents in this model and make use of the computational
facilities to request data to be distributed, data to be stored, and data to be retrieved. One algorithm on
one node is designated as the client. This client is told to execute, which causes it to schedule a
“bootstrapping” event for itself in the queue. The client will then proceed to request a specified amount
of data from its harddrive and send this information out to select devices on the network for storage.
The interval of time between the first harddrive request and the last harddrive response is considered
the baseline local read-time. Server nodes will begin to store, replicate, and propagate this information
across algorithmically selected devices. Once distributed storage is complete, a message is sent back to
the client node. The client node can then begin requesting the data back from the server nodes. The
interval of time between the first remote request and the reception of the last requested piece of data
is considered the experimental remote read-time. Smaller time intervals indicate better performance.
Ideally, the confluence of the algorithm's design and the attributes of the available hardware will allow
the algorithm's remote read-time to outperform the local read-time.

The simulator environment and the model are configured through a domain-specific language called
HVNSLanguage (HVNSL). This configuration language has relatively few keywords and has a fairly
predictable, uniform syntax across the entire configuration space. The configuration language is
domain-specific which allows it to focus on simplifying the configuration of hardware-components as
well as the creation of network topologies.

2.3. Report Structure

This report begins with an examination of simulation theory with an emphasis placed on the techniques
employed by HVNS. It explores the architecture and interfaces of HVNS, the design decisions involved,
and the benefits and drawbacks of this approach.

The models under simulation are then explored in the same way. The design and abstractions employed
for the network, its protocol stack, and routing infrastructure is discussed. This is followed by an
examination of the architecture and abstraction behind the computer model, the hardware employed,
and the hooks included for the distribution algorithms. The report next explores the distribution
algorithms themselves, their operation, and the design intent.

HVNSLanguage (HVNSL), the simulator and model configuration language, is then discussed, including its
syntax and semantics.

The benchmarking apparatus, measurements, and expectations are detailed. The results of the
measurements are then described. The report analyzes the results in comparison to expectations.
Further analysis is given comparing the actual simulator architecture itself to competing simulators
employed in the field. This includes a discussion on the benefits and drawbacks of the various
approaches described in relation to HVNS.

The report concludes with an evaluation of the project as a whole. Here the success of HVNS, the
algorithms, and benchmarking are discussed. Suggestions for future work are next discussed which
includes potential improvements to the simulator, model, configuration language, and testing
apparatus. The lessons learned from the approach taken to this project’s development are also included
as guidance for future work.

To dissimulate is to feign not to have what one has. To simulate is to feign to have what one
hasn’t. One implies a presence, the other an absence.

- Jean Baudrillard

3. Simulation

In the broadest sense, simulation is the fabrication or imitation of something which is “real.” An
imitation reproduces some subset of the defining characteristics of the real item. These characteristics
may be behavioral, physical, or intangible. In Computer Science, the preceding abstraction is known as a
model and the imitated properties are considered the model’s state. Computer simulation is generally
concerned with utilizing inputs in tandem with this model to effect change in the model's state over
time (the change set may contain as few as one state). How this occurs depends upon the
classification and specific implementation of the model.

Computer simulations can be classified into several types based upon aspects of the simulator or of the
model upon which they are being performed.

3.1. Static versus Dynamic Simulation

Simulators possess either static or dynamic models. Static models produce a single solution for a
simulation run. Dynamic systems have models which can assume several states over the course of a
simulation run. Static models are useful for analyzing the relationships between sets of input with
output variables. Dynamic systems are useful for analyzing how a system gets from some arbitrary start
condition to an arbitrary ending condition inside the span of time simulated [16].

3.2. Deterministic versus Stochastic Simulation

Simulators are either deterministic or stochastic. Deterministic simulators will evolve systems in an
identical way across all simulation runs when given the same input conditions. Stochastic simulators will
evolve systems with some degree of variance across all simulation runs when given the same input
conditions. Stochastic simulators rely upon pseudo-random number generators to introduce
randomness into a run [17].

3.3. Continuous versus Discrete Simulation

Simulators are either continuous or discrete. Continuous simulators have models with explicit state
variables whose values are governed by differential-algebraic equations or differential equations.
Periodically (i.e. at some fixed time interval), the simulator will alter its state by solving the equations to
produce values for state-assignment. Discrete simulators can alter their variables at only a fixed number
of points in time. Discrete event simulators are an important subset of discrete simulators. Discrete
event simulators operate on a succession of events whose occurrence moves time forward [17].

3.4. Analytical versus Agent-Based Simulation

Simulators have either analytical or agent-based models. Analytical models rely upon a collection of
rules, equations, or functions to determine how the state of the system as a whole is advanced. In
contrast, an agent-based system involves some number of autonomous rule-based entities whose
actions and interactions affect the system. These agents respond to events that occur in the system,
which may alter their internal state and cause them to schedule additional events. The state of the
system is comprised of the aggregation of the states of all agents and any state variables external to
them [17].

A good simulation, be it a religious myth or scientific theory, gives us a sense of mastery over
experience. To present something symbolically, as we do when we speak or write, is somehow to
capture it, thus making it one’s own. But with this appropriation comes the realization that we
have denied the immediacy of reality and that in creating a substitute we have but spun another
thread in the web of our grand illusion.

- Rudolph Heinz Pagel

4. Simulation Implementation

HVNS runs on the JVM and uses Java as its implementation language for the simulator and all
simulatable constructs [18][19]. Java has strong object-oriented properties and an expansive standard
library of classes which implement many design patterns (e.g. interfaces, observers, threads) that HVNS
leverages. HVNS makes extensive use of design patterns and object-oriented principles to remain
extensible and modular. The architecture is designed around interfaces, abstract base classes,
instantiation performed outside of constructors, and factory patterns.

4.1. Simulator

HVNS is a dynamic, deterministic, discrete event-scheduling, agent-based simulator. HVNS is a dynamic
simulator with a model that experiences a range of states before the simulation concludes. HVNS is a
deterministic simulator in that it does not introduce randomness in any aspect of its event scheduling or
delivery. This means that the simulator itself neither creates random events nor alters the scheduling of
events by simulatables. HVNS is a discrete event-scheduling simulator which maintains an event queue
and execution threads. HVNS is an agent-based simulator as it relies upon the goal-based interactions of
its registered simulatables to cause the evolution of its system state.

The simulator and simulation environment are distinct from the network/computational model
employed. This means that the operation of HVNS can be made to emulate aspects of the alternative
simulation classes described in Chapter 3 through alterations of the model (HVNS itself can also be
subclassed to produce these effects). As an example, a discrete simulator can be modeled as a single
simulatable which schedules a “do work” message for itself at a fixed time interval. An analytical
approach can be created by having the aforementioned simulatable maintain the equations and state
variables which it recalculates with every “do work” message. As another example, non-determinism
can be introduced into the simulation via the model being simulated. Simulatables can employ random
number generators in a way that affects if and when they are going to schedule an event, or the way in
which they determine the recipient of their event’s message. Alternatively, simulatables could be
wrapped inside a parent simulatable which can be made to intercept events, modify when and/if they
are to be delivered, and schedule a series of events to occur at random points in the future.
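The wrapper approach can be sketched as a small decorator that perturbs delivery times with a seeded random number generator. This is an illustrative sketch only; the `Simulatable` interface and all names below are assumptions for the example, not HVNS classes.

```java
import java.util.Random;

/** Minimal stand-in for a simulatable; not the HVNS interface. */
interface Simulatable {
    void handle(String message, double time);
}

/** Wraps a child simulatable and perturbs delivery times with a seeded RNG. */
class RandomizingWrapper implements Simulatable {
    private final Simulatable child;
    private final Random rng;        // seeded so that runs remain reproducible
    private final double maxJitter;

    RandomizingWrapper(Simulatable child, long seed, double maxJitter) {
        this.child = child;
        this.rng = new Random(seed);
        this.maxJitter = maxJitter;
    }

    /** Delivers the intercepted message at a randomly delayed time. */
    @Override
    public void handle(String message, double time) {
        double delayed = time + rng.nextDouble() * maxJitter;
        child.handle(message, delayed);
    }
}
```

Because the randomness lives in the model (the wrapper), the simulator itself remains deterministic, and seeding the generator keeps individual runs reproducible.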

The simulator runs in its own thread and possesses a thread-safe priority-queue for events. There are
three major portions of the simulator that are discussed here: simulator startup, scheduling events, and
the main event loop.

The startup procedure for HVNS operation and for discrete event simulators in general is depicted in
Figure 4-1:

Simulator Startup

    Set ENDING_CONDITION to FALSE
    Initialize event queue
    Initialize state variables / register agents
    Start event thread
    Schedule bootstrap event(s)

FIGURE 4-1. PSEUDO-CODE FOR THE SIMULATOR START-UP SEQUENCE.

Simulator startup consists of a few tasks. The ending condition is set to false for the simulator. This is
an important step; otherwise the main event loop would never execute. The ending condition is
implementation specific and largely depends upon the model under consideration. The ending
condition may be to stop at some specific time, when a particular state variable or a derivative of that
variable reaches a value, or when a certain number of events have been processed. The ending
condition must eventually become true, otherwise simulation may proceed indefinitely. The ending
condition for HVNS is dependent upon a “stopped” variable. HVNS depends upon the model’s
configuration to tell it when to stop, as otherwise it will wait indefinitely for new events to be
introduced into the queue. The event queue is next initialized so that it is ready to handle the
scheduling of events. The model’s state is set, which can include the initialization of state values and/or
the creation and registration of agents (i.e. simulatables). The main event thread is then initialized
which begins the main event loop where event execution and message delivery occurs. Bootstrap
events are then scheduled, allowing the model to evolve.

Events and their execution are the main concern of the simulator. Events move the simulation’s time
forward. The chronological sequence of events represents the evolution of the model’s state over a
simulation run. A simulator without scheduled events cannot evolve because the agents it contains are
passive entities that can only act upon the execution of an event/reception of a message. The bootstrap
events kick off the simulatables to begin reacting to events.

Events are the primary form of communication between a simulatable and the simulator. Discrete
event simulators use events to cause alterations to the model at a specific time. Agent-based simulators
like HVNS use scheduled events to pass messages between agents. The basic scheduling algorithm for
events is shown in Figure 4-2.

Event Scheduling Algorithm

    if( event time >= simulator time )
        Add event to queue
        Signal scheduled event

FIGURE 4-2. PSEUDO-CODE FOR EVENT SCHEDULING.

Simulatables call the scheduling function with their event. The simulator will add the event to the queue
so long as the event occurs in the future and not the past, otherwise non-causal events can occur (i.e.
events which depend not just upon past events but also future events). The event queue is a priority
queue which sorts events based upon an event comparator. Generally, events are sorted first in
ascending time-order, and second in descending priority-order. This ensures that events occur in a
chronologically ascending sequence allowing causality to be maintained. The priority-order allows
simulatables to send control messages to themselves that will always be received prior to external
messages. The main event thread is then signaled if it is waiting on an empty queue.
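The queue ordering described above can be sketched with Java's `PriorityQueue` and a composed `Comparator`. This is a minimal illustration of the scheduling rules, not the HVNS implementation; the `Event` fields and method names are assumptions for the example.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Minimal event record for illustration; field names are assumptions, not HVNS's. */
class Event {
    final double time;
    final int priority;   // higher value = delivered earlier at the same time
    final String message;
    Event(double time, int priority, String message) {
        this.time = time;
        this.priority = priority;
        this.message = message;
    }
}

/** Queue ordered ascending by time, then descending by priority. */
class EventQueue {
    private final PriorityQueue<Event> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Event e) -> e.time)
                      .thenComparing(Comparator.comparingInt((Event e) -> e.priority)
                                               .reversed()));
    private double simulatorTime = 0.0;

    /** Rejects events scheduled in the past so causality is preserved. */
    boolean schedule(Event e) {
        if (e.time < simulatorTime) return false;
        queue.add(e);
        return true;   // a real simulator would also signal the waiting event thread
    }

    /** Removes the head event and advances simulator time to its time. */
    Event next() {
        Event e = queue.poll();
        if (e != null) simulatorTime = e.time;
        return e;
    }
}
```

With this comparator, two events at the same time are ordered by descending priority, which is what lets a simulatable's internal control messages jump ahead of external ones.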

Main Event Loop

    while( ENDING_CONDITION is FALSE )
        while( queue is empty ) { wait for scheduled event }
        Remove event from queue
        Update simulator time to event time
        Execute event (deliver message to recipient)

FIGURE 4-3. PSEUDO-CODE FOR THE SIMULATOR'S MAIN EVENT HANDLING LOOP.

The main event loop handles the execution of scheduled events and the delivery of messages. It is
depicted in Figure 4-3. It runs until some ending condition is met as discussed above. The event loop
avoids busy waiting by waiting on a condition variable when the queue is empty. The event loop wakes
upon being signaled by the scheduling function that a new event was added to the queue.

An event communicates three important pieces of information to HVNS: the time of delivery, the
message, and the intended recipient of the message. HVNS removes the head element from the queue,
updates its time to match the event’s time, and then “executes” the event by delivering the event’s
message to the specified simulatable recipient.
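The loop can be sketched in Java as follows, with `wait()`/`notifyAll()` providing the signaling described above and a `step()` method standing in for event execution. Class and method names are assumptions for the example, not the HVNS API.

```java
import java.util.PriorityQueue;

/** Sketch of the simulator's blocking event loop; message delivery is elided. */
class EventLoop {
    private final PriorityQueue<Double> queue = new PriorityQueue<>();
    private boolean stopped = false;
    private double time = 0.0;

    /** Schedules an event time and wakes the loop if it waits on an empty queue. */
    synchronized void schedule(double eventTime) {
        queue.add(eventTime);
        notifyAll();
    }

    synchronized void stop() { stopped = true; notifyAll(); }

    /** Processes one event if available: advance simulator time, then deliver. */
    synchronized boolean step() {
        Double eventTime = queue.poll();
        if (eventTime == null) return false;
        time = eventTime;   // update simulator time to match the event's time
        // ... deliver the event's message to its recipient here ...
        return true;
    }

    /** Runs until the ending condition is met; wait() avoids busy-waiting. */
    synchronized void run() throws InterruptedException {
        while (!stopped) {
            while (queue.isEmpty() && !stopped) wait();
            if (!stopped) step();
        }
    }

    synchronized double currentTime() { return time; }
}
```

The inner `while` around `wait()` guards against spurious wakeups, and `stop()` also signals so a waiting loop can observe the ending condition and exit.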

4.2. Simulatables

Simulatables, in aggregate, represent the simulator’s model. They are stateful entities (even if only
possessing a single state) which are capable of handling messages delivered to them as the result of an
event occurring. Simulatables are also agents which implement entity-specific logic which dictates how
they respond to messages. This response can include a state-change, the scheduling of additional
events (and messages to other simulatables), the generation and registration of additional simulatables,
etcetera.

Simulatables are passive entities that live inside the main event loop’s thread. They can only ever act if
an event with a message for them is scheduled and subsequently executed. Simulatables use events to
communicate with the simulator. Events are containers for messages. A message would otherwise be a
method call between one simulatable and another. Events allow such a call to be affected by
simulation-imposed limits on operations and the passage of time.

Simulatables receive messages which are labeled as implementing the IMessage interface. The content
of a message is completely dependent upon the implementation and expectations of the particular
simulatable subclass which is receiving the message. Messages are expected to contain information that
allows a simulatable to act in an event-appropriate fashion.

It is useful to use the abstraction that messages are time-delayed method calls. As such, the message
type flag or the message class itself can be used to indicate the method which is to be called. The fields
and/or message accessors provide the required parameters for this method call. If a message contains a
type flag, then it is generally heavyweight as it must provide parameters (even if null or empty) for
every method type. If the class of the message itself indicates the method that is to be called then the
message need only provide the parameters appropriate for that single method. This comes at the cost
of an expensive class type check. Systems with heterogeneous simulatables may implement different
interfaces. To keep messages compact and easy to understand, it is recommended that different
message types be employed for each of these interfaces.
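The class-per-method style can be sketched as below, where each message class names the operation and carries only that operation's parameters, and a hypothetical storage simulatable dispatches on the message's class. All names here are illustrative assumptions, not HVNS types (only the IMessage interface is named in the text).

```java
/** Marker for all simulation messages (stands in for HVNS's IMessage). */
interface IMessage {}

/** Each message class names the "method" being called and carries its parameters. */
class StoreRequest implements IMessage {
    final int index;
    final String data;
    StoreRequest(int index, String data) { this.index = index; this.data = data; }
}

class FetchRequest implements IMessage {
    final int index;
    FetchRequest(int index) { this.index = index; }
}

/** Hypothetical storage simulatable dispatching on the message's class. */
class StorageSimulatable {
    final String[] slots = new String[8];

    void handle(IMessage message) {
        if (message instanceof StoreRequest) {   // class check replaces a type flag
            StoreRequest m = (StoreRequest) message;
            slots[m.index] = m.data;
        } else if (message instanceof FetchRequest) {
            FetchRequest m = (FetchRequest) message;
            String data = slots[m.index];
            // ... schedule a response event carrying `data` back to the requester ...
        }
    }
}
```

Each message class stays compact because it carries only its own parameters; the trade-off, as noted above, is the runtime class type check in the dispatch chain.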

Simulatables, as passive entities, must receive messages in order to act. The bootstrap events
scheduled at the start of a simulation run allow one or more simulatables to perform operations at
simulation onset. Typically, but ultimately dependent on the model, this will start a cascade of event
scheduling which moves the model between states and forward in time. Simulatables, too, can
bootstrap themselves or other simulatables. A simulatable accomplishes this by scheduling a “control”
message to itself to be received at some point in the future. The event containing this message has its
priority field set to INTERNAL, which has a higher priority than the default EXTERNAL priority and
ensures that it is delivered prior to any external events.

Figure 4-4 depicts an example of this mechanism with a simulatable that wishes to accomplish some w
amount of work in work-state[n] followed by some w` amount of work in work-state[n+1]. The control
messages allow this simulatable to do work in a state until all work is exhausted; it can then
schedule a state change and schedule additional work messages to be completed in the next state.
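A minimal sketch of this self-bootstrapping pattern follows, with a deque standing in for the simulator's event queue. The names are illustrative, not HVNS code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of a simulatable driving its own state changes via control messages. */
class Worker {
    int state = 1;
    int remainingWork;                                  // work left in the current state
    final Deque<String> scheduled = new ArrayDeque<>(); // stands in for the event queue

    Worker(int workPerState) { this.remainingWork = workPerState; }

    /** On each doWork message: do work if any remains, else advance to the next state. */
    void onDoWork(int workPerNextState) {
        if (remainingWork > 0) {
            remainingWork--;                            // do one unit of work
            scheduled.add("doWork[" + state + "]");     // INTERNAL-priority control message
        } else {
            state++;                                    // all work exhausted: change state
            remainingWork = workPerNextState;
            scheduled.add("doWork[" + state + "]");
        }
    }
}
```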

Bootstrapping Future Work

[Figure: an FSM of work states Work[1] through Work[n]. A bootstrap message schedules doWork[1]. In
each state Work[i], receiving doWork[i] while work remains (hasWork[i]) performs the work and re-sends
doWork[i]; receiving doWork[i] when no work remains (!hasWork[i]) sends doWork[i+1], moving the
simulatable into state Work[i+1].]

FIGURE 4-4. FSM OF A SIMULATABLE BOOTSTRAPPING ITS WORK AND STATE CHANGES.

4.3. Operation Bound Simulatables

An operation bound or performance restricted simulatable is a stateful entity with limitations placed on
its ability to respond to or schedule events during an interval of time. An Operation Bound
Simulatable’s operation is represented by the FSM depicted in Figure 4-5.

Operation Bound Simulatable FSM

[Figure: an FSM with states Fully-Awake, Partially-Awake, and Blocked. In Fully-Awake, a request
decrements ops, schedules a Renewal message for the next interval, and moves to Partially-Awake. In
Partially-Awake, requests decrement ops while ops > 0; when ops reaches 0 the simulatable moves to
Blocked, where incoming requests are re-sent to itself for the next interval. A Refresh resets ops to
max_ops and returns the simulatable to Fully-Awake from any state.]

FIGURE 4-5. OPERATION BOUND SIMULATABLE FSM DEMONSTRATES A NODE THAT HAS A LIMITED CAPACITY TO ACT AND SO PUTS OFF
ADDITIONAL WORK FOR A FUTURE TIME WHEN IT WILL HAVE THE CAPACITY TO ACT AGAIN.

Operation bound simulatables possess three basic states which govern their ability to act. These states
include:

 Fully-Awake – the simulatable is able to perform up to its maximum number of operations during a
time interval.
 Partially-Awake – the simulatable is able to perform some reduced number of operations during
the current time interval.
 Blocked – the simulatable has exhausted its ability to perform any additional operations during
the current time interval.

Operation bound simulatables begin in the Fully-Awake state. Fielding a request and/or sending a
response in this state causes the simulatable’s allowed operations to be decreased by some amount, a
refresh message to be scheduled, and a transition to the Partially-Awake state.

The simulatable may continue to perform operations while in the Partially-Awake state, with each
operation performed resulting in additional decreases to the simulatable’s ability to field further
requests. Once all operational ability has been exhausted, the simulatable goes into a blocked state.

The Blocked state indicates that no operations can be performed. Any requests received in this state
are rescheduled by the simulatable to reoccur during the start of the next time interval. This ensures
that the events can be handled once the simulatable is again able to do so.

The refresh message is contained in an event which is scheduled by and for the simulatable. It is
scheduled to occur at the beginning of the next operation interval which is governed by the
simulatable's refresh_time variable. The refresh message is sent with the highest priority possible
ensuring that it is received before any other events can be received by the simulatable. Reception of the
refresh message indicates the start of the next operation interval and indicates that the simulatable may
return to the Fully-Awake state with the ability to perform the maximum number of allowed operations.
If the simulatable has been in the Blocked state and has rescheduled events to occur for this time
interval, these events will be redelivered by the simulator and can be handled normally.
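The three states and the refresh cycle can be sketched as a small state machine. The field names (`ops`, `maxOps`, `deferred`) and the simplified request/refresh methods are assumptions for the example, not HVNS code.

```java
/** Sketch of the three-state operation budget; names are assumptions, not HVNS code. */
class OperationBoundSimulatable {
    enum State { FULLY_AWAKE, PARTIALLY_AWAKE, BLOCKED }

    final int maxOps;
    int ops;
    int deferred = 0;          // requests pushed to the start of the next interval
    State state = State.FULLY_AWAKE;

    OperationBoundSimulatable(int maxOps) {
        this.maxOps = maxOps;
        this.ops = maxOps;
    }

    /** A request consumes one operation; once the budget is gone, requests defer. */
    void onRequest() {
        if (state == State.BLOCKED) {
            deferred++;        // reschedule the event for the next time interval
            return;
        }
        // The first request of an interval would also schedule the refresh message.
        ops--;
        state = (ops > 0) ? State.PARTIALLY_AWAKE : State.BLOCKED;
    }

    /** The highest-priority refresh message restores the full budget. */
    void onRefresh() {
        ops = maxOps;
        state = State.FULLY_AWAKE;
        deferred = 0;          // deferred events are redelivered by the simulator
    }
}
```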

The operation bound simulatable abstracts out performance restriction for subclasses by implementing
a function hook which is called during the Fully-Awake and Partially-Awake states. Subclasses override
this hook method so that they can be the delegate of model-specific functionality. The parent class
takes care of determining if and when delegation can occur. An FSM representing this is depicted in
Figure 4-6.

Delegated FSM

[Figure: the same Fully-Awake / Partially-Awake / Blocked FSM as in Figure 4-5, except that request
handling is delegated to a subclass hook method while the parent class manages the operation count,
refresh messages, and state transitions.]

FIGURE 4-6. SUBCLASSED OPERATION BOUND SIMULATABLE OVERRIDING HOOK METHOD.
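The hook delegation is essentially the template method pattern, which can be sketched as follows. The class and method names are illustrative assumptions, not the HVNS classes.

```java
/** Parent class owns the operation budget; subclasses supply behavior via a hook. */
abstract class BoundedEntity {
    private int ops;

    BoundedEntity(int maxOps) { this.ops = maxOps; }

    /** Template method: delegates only while operations remain this interval. */
    final boolean receive(Object message) {
        if (ops <= 0) return false;   // Blocked: the event would be rescheduled
        ops--;
        handleDelegated(message);     // Fully/Partially-Awake: run subclass logic
        return true;
    }

    /** Hook overridden by subclasses with model-specific functionality. */
    protected abstract void handleDelegated(Object message);
}

/** Example subclass that simply counts delegated messages. */
class CountingEntity extends BoundedEntity {
    int handled = 0;
    CountingEntity(int maxOps) { super(maxOps); }
    @Override protected void handleDelegated(Object message) { handled++; }
}
```

Because `receive` is final, subclasses cannot bypass the budget accounting; they only define what a delegated operation does.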

An alternative approach that was explored involved the use of a separate message queue on all
simulatables. Simulatables would control their ability to poll the message queue by sending a refresh
message at the start of their polling operations. A simulatable would return to a blocked state once
it handled as many messages as allowed by its configuration. The simulatable would subsequently
wake up upon reception of the refresh message and start the entire process over again. This approach
was abandoned since it requires the duplication of existing functionality—the simulator already
possesses a priority queue for events.

An idea is always a generalization, and generalization is a property of thinking. To generalize
means to think.

- Georg Hegel

5. Network Model

The network model represents communication devices which can generate, receive, and propagate
collections of data in a container known as a packet. It is the model on top of which the hardware
model resides. A network is composed of three basic entities: Nodes, Connection Media, and
Connection Adaptors. All network entities use operation bound simulatables as their base which allows
their performance to be altered.

The network model uses a simplified single addressing scheme whereby a node has a single address
which is shared by all of its connection adaptors. This is as opposed to having separate MAC and
network addresses for each adaptor as would be the case in a real-world network.

It is important to note that the network model entities are all simulatables, specifically operation bound
simulatables. This means that these entities do not directly communicate via method calls during a
simulation. Instead, all communication between network entities occurs through the use of events
containing messages. The message passed to a network entity during an event is roughly equivalent to a
method call on its interface. As an example, when a connection medium is said to propagate a packet to
the connection adaptors to which it is attached, it is not actually calling each adaptor’s receive method
and providing it with the packet. Instead, the connection medium is actually scheduling an event for
each of these connection adaptors. This event will contain a message of the type
ConnectionAdaptorReceiveMessage and will contain the packet which the connection adaptor is to
receive and inspect. This ensures that all events are affected by the temporal mechanics of the
simulation environment.
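This event-based indirection can be sketched as below, where a list stands in for the simulator's event queue. Only `ConnectionAdaptorReceiveMessage` is named in the text; the other names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in packet type; field names are assumptions for the example. */
class Packet {
    final String destination;
    Packet(String destination) { this.destination = destination; }
}

/** Plays the role of an event carrying a ConnectionAdaptorReceiveMessage. */
class ReceiveEvent {
    final String recipientAdaptor;
    final double time;
    final Packet packet;
    ReceiveEvent(String recipientAdaptor, double time, Packet packet) {
        this.recipientAdaptor = recipientAdaptor;
        this.time = time;
        this.packet = packet;
    }
}

/** A medium never calls adaptors directly; it schedules one receive event per adaptor. */
class ConnectionMedium {
    final List<String> attachedAdaptors = new ArrayList<>();
    final List<ReceiveEvent> eventQueue = new ArrayList<>(); // stands in for the simulator

    void propagate(Packet packet, double now, double transitTime) {
        for (String adaptor : attachedAdaptors) {
            eventQueue.add(new ReceiveEvent(adaptor, now + transitTime, packet));
        }
    }
}
```

Scheduling one event per attached adaptor is what gives the medium its broadcast behavior while keeping every delivery subject to simulated time.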

5.1. Node

A node represents a packet generating/receiving device. Nodes are roughly equivalent to everyday
systems such as personal computers, file servers, phones, etcetera. Nodes are logically connected to
one another through a connection medium. Internally, a node may have several connection adaptors
which “physically” connect it to several connection mediums. A node sends messages by making use of
its protocol stack.

Two Nodes Connected via Medium

[Figure: two Nodes, each containing a Connection Adaptor, joined by a Connection Medium between the
adaptors.]

FIGURE 5-1. DEPICTION OF THE PHYSICAL CONNECTION BETWEEN TWO NODES AND THE RELATIONSHIP OF THE NETWORK ENTITIES TO ONE
ANOTHER.

Nodes in the network model can be logically connected to several other nodes. They also possess a
protocol handler, known as the NetworkProtocolHandler, which can perform routing services to
determine the next hop destination for a packet. These characteristics allow nodes to also emulate the
functionality provided by switches or routers. As such, the network model has been simplified to
exclude explicit implementation of these network objects.

Nodes also provide the transport API which allows algorithms (or any application layer item) to make
use of the protocol stack. They also fulfill the unreliable transport layer role by wrapping application
layer messages into datagrams which are then provided to the network layer for routing.

5.2. Connection Adaptor

A connection adaptor is a packet propagator and sender. It represents any physical layer device which
interfaces with the medium itself. Examples of such devices include network interface cards and
wireless antennas. One or more connection adaptors can be contained within a node.

A connection adaptor must deal with two events, packets that are outgoing and packets that are
incoming (both of which are in respect to the attached node). A connection adaptor that receives a
packet from its own node (specifically from a protocol handler above it in the stack) sends that packet
out across the connection medium. A connection adaptor that receives a packet from a connection
medium must inspect that packet to determine its intended destination. If the packet’s destination
address does not match the connection adaptor’s address, then the connection adaptor was not the
intended recipient and the packet is dropped. However, if the addresses match then this connection
adaptor is considered the next hop and the packet must be handled by a protocol handler further up the
protocol stack.
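The address check on incoming frames can be sketched as follows. The `Frame` type and method names are assumptions for illustration, not the HVNS API.

```java
/** Minimal frame with a destination address; names are illustrative, not HVNS's. */
class Frame {
    final String destination;
    final Object payload;
    Frame(String destination, Object payload) {
        this.destination = destination;
        this.payload = payload;
    }
}

class ConnectionAdaptor {
    final String address;
    Object deliveredUp;   // what was handed to the protocol handler above

    ConnectionAdaptor(String address) { this.address = address; }

    /** A frame from the medium is dropped unless addressed to this adaptor. */
    boolean receiveFromMedium(Frame frame) {
        if (!frame.destination.equals(address)) {
            return false;                 // not the next hop: drop the frame
        }
        deliveredUp = frame.payload;      // pass the payload up the protocol stack
        return true;
    }
}
```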

NetworkProtocolHandler is the protocol handler one level further up the stack. It is a network layer protocol
handler that provides routing services. It is responsible for determining the address of the next-hop
node along the path of nodes to the destination.

Network of Three Nodes

[Figure: three Nodes (A, B, C) connected in series by two Connection Media. Each node contains a
Network Protocol Handler; node B, attached to both media, contains two Connection Adaptors.]

FIGURE 5-2. THIS FIGURE DEPICTS THE INTERNAL MAKEUP OF A NETWORK OF THREE NODES CONNECTED IN
SERIES. A NODE POSSESSES A CONNECTION ADAPTOR FOR EVERY MEDIUM TO WHICH IT IS CONNECTED. A SINGLE NETWORK
PROTOCOL HANDLER HANDLES ROUTING RESPONSIBILITIES FOR ALL CONNECTION ADAPTORS AND ALGORITHMS INSTALLED ON A NODE.

When NetworkProtocolHandler receives a datagram from a higher level protocol, it interrogates its
routing table to determine the address of the next closest node on the path from the current node to
the destination node. The datagram is encapsulated into a packet and delivered to the connection
adaptor to send out. In some cases, this next hop node may be identical to the destination node. In
other cases, there may be several intermediate nodes that must also receive and route the packet.

NetworkProtocolHandler also handles datagrams received as the payload of a packet from the lower
level protocol. The destination address of the datagram is inspected. If the address matches the
address of the node, then the payload data is removed from the datagram and delivered to the protocol
handler that matches the protocol of the datagram (i.e. the TransportProtocolHandler itself). If the
address does not match the address of the node, then the NetworkProtocolHandler proceeds to act as if the
datagram had been received from the higher level protocol. It determines the next hop address via its
routing table, packages the datagram into a packet with this next hop address, and gives the packet to
the connection adaptor to send across the medium.

Network of Three Nodes Routing a Packet

[Figure: the three-node network of Figure 5-2 with a packet traveling from node A through node B to
node C.]

FIGURE 5-3. NODE A IS SENDING A PACKET WITH NODE C AS ITS DESTINATION. THE NEXT HOP NODE ALONG THE PATH IS B. WHEN B
RECEIVES THIS PACKET, IT INSPECTS THE DESTINATION ADDRESS OF THE DATAGRAM, DETERMINES THE NEXT HOP ADDRESS, REPACKAGES
THE DATAGRAM INTO A NEW PACKET ADDRESSED TO C, AND FINALLY SENDS IT TO NODE C.
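The routing decision can be sketched as a table lookup. The class and method names below are illustrative assumptions, not the HVNS API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative datagram; class and field names are assumptions. */
class Datagram {
    final String destination;
    Datagram(String destination) { this.destination = destination; }
}

class RoutingSketch {
    final String nodeAddress;
    final Map<String, String> routingTable = new HashMap<>(); // destination -> next hop

    RoutingSketch(String nodeAddress) { this.nodeAddress = nodeAddress; }

    /** Local datagrams are delivered up the stack; others go to the next hop. */
    String route(Datagram datagram) {
        if (datagram.destination.equals(nodeAddress)) {
            return "deliver up";   // unwrap the payload for the transport handler
        }
        // Repackage into a packet addressed to the next hop and hand to the adaptor.
        return routingTable.get(datagram.destination);
    }
}
```

In the Figure 5-3 scenario, node B's table maps destination C to next hop C, so the datagram is repackaged and forwarded rather than delivered locally.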

5.3. Connection Medium

A connection medium is a packet duplicating/propagating device. It represents the physical layer /
medium of an actual network, which may be a coaxial cable, twisted pair Ethernet, airwaves, or
otherwise. Depending upon implementation, a connection medium can logically connect multiple nodes
to one another to allow packets to be propagated in a broadcast fashion to devices connected to the
medium. A connection medium is logically connected to a node but physically connected to the node’s
connection adaptor. It receives and sends packets to connection adaptors. All connection mediums are
currently reliable and will never drop packets or introduce errors into sent packets. The addition of
unreliable media would necessitate the implementation of a reliable transport protocol.

5.4. Protocol Stack

The protocol stack is an abstraction that describes the collection of protocols installed on a node. It is a
layered series of protocols where the lower level protocol (i.e. the protocol below) provides a service to
the higher level protocol (i.e. the protocol above). Figure 5-4 provides a visual depiction of the protocol
stack in the internet protocol suite called TCP/IP as well as the protocol stack employed by a node.

Comparison of Protocol Stacks

TCP/IP → Node’s Protocol Stack:

 Application Layer → Algorithm
 Transport API → Transport API
 Transport Layer → TransportProtocolHandler
 Network Layer → NetworkProtocolHandler
 Data Link Layer / Physical Layer → ConnectionAdaptor

FIGURE 5-4. MAPPING BETWEEN TCP/IP'S PROTOCOL STACK AND NODE'S PROTOCOL STACK.

The traditional TCP/IP protocol suite has five layers: Application, Transport, Network, Data Link, and
Physical, plus a transport layer API that serves as the glue between an application and the services
provided by the rest of the stack. Node’s protocol stack combines several of these layers. The
connection adaptor provides general media access service by handling both the physical and link layer
services. A NetworkProtocolHandler provides networking services by handling routing operations.
TransportProtocolHandler provides unreliable transport services and also provides the transport API to
algorithms. The distribution algorithms sit at the top of the protocol stack and access the services of
the stack via the Transport API (i.e. the IProtocolHandler interface).

A protocol in this model implements the IProtocolHandler interface. IProtocolHandlers handle the
following operations:

 Associate a protocol handler with a protocol name
 Handle packets from the higher level protocol (implementation dependent)
 Handle packets from the lower level protocol (implementation dependent)

All packets implement the IPacket interface. They have a protocol type, source, destination, and
payload. The term used to refer to a packet is dependent upon the layer of the protocol stack. The
transport layer deals with datagrams. The network layer deals with packets. The physical layer deals
with frames. Each layer is responsible for dealing with one type of packet which contains information
pertinent to that layer. The generic term packet will be used when referring to a non-specific level of
the protocol stack.

Each layer of the protocol stack encapsulates the packet received from the higher level protocol before
contacting the lower level protocol. Encapsulation adds header (and/or footer) information that is
pertinent to the lower level protocol. The original packet from the higher level protocol is also added as
a payload to the new lower level packet. This is important since it allows the original information to be
retrieved by the corresponding protocol handler on the protocol stack of the destination device.

The typical operation for a protocol handler is as follows. The protocol handler processes a packet. If
the packet was received from a higher protocol, then the next step of the chain is a lower protocol: the
handler encapsulates the original packet as payload inside of a new packet and sends it to the lower
protocol. If the packet was received from a lower protocol, then the next step of the chain is a higher
protocol: the handler removes the payload from the packet and hands this off to the higher level protocol.

Algorithm Sending a Message to another Algorithm

[Figure: a message travels down Node A’s stack (Transport, Network, Connection Adaptor), across the
Medium, and up Node B’s stack in reverse order.]

FIGURE 5-5. THE ENCAPSULATION PROCESS OF A PACKET AS IT MOVES DOWN THE PROTOCOL STACK ON NODE A AND UP THE PROTOCOL
STACK ON NODE B.
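The encapsulation and decapsulation steps can be sketched as nested payloads. The types below are illustrative, not the HVNS IPacket implementation.

```java
/** Packets carry a protocol name and a payload, which may itself be a packet. */
class SimplePacket {
    final String protocol;
    final Object payload;
    SimplePacket(String protocol, Object payload) {
        this.protocol = protocol;
        this.payload = payload;
    }
}

class EncapsulationSketch {
    /** Moving down the stack: wrap the higher-level packet as this layer's payload. */
    static SimplePacket encapsulate(String lowerProtocol, SimplePacket fromAbove) {
        return new SimplePacket(lowerProtocol, fromAbove);
    }

    /** Moving up the stack: unwrap the payload for the protocol handler above. */
    static SimplePacket decapsulate(SimplePacket fromBelow) {
        return (SimplePacket) fromBelow.payload;
    }
}
```

Because each layer only wraps or unwraps one level, the original application payload survives the full trip down node A's stack and back up node B's.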

Protocol handlers handle one or more protocol types. Protocol handlers maintain references to the next
higher and next lower protocol handlers in the stack. These references are set up during the
initialization of the protocol handlers employed by a node.

Generalization is necessary to the advancement of knowledge; but particularity is indispensable
to the creations of the imagination. In proportion as men know more and think more they look
less at individuals and more at classes. They therefore make better theories and worse poems.

- Thomas B. Macaulay

6. Hardware Model

The hardware model is an abstraction of a computer, its operating system, and several of the
components that impact its performance. This model intersects the network model as it shares a view
of the node and use of the connection adaptor. Distribution algorithms interface with the hardware
model’s API to retrieve information, store information, and to make use of the network for
communication. Hardware components, like network components, use operation bound simulatables
as their base which allows their performance to be altered.

All hardware objects share the following performance-altering properties:

 Transit time
 Maximum allowed operations
 Refresh interval

Some of these were discussed in the context of the operation bound simulatable and how they affect its
state-changing behavior. The discussion here focuses on their meaning and use in a hardware context.

Transit time is the delay associated with sending information from one hardware component to
another. It is used to represent both the time it takes to process a request and the time delay
associated with sending a response along the channel between the hardware source and the intended
recipient. Implementation-wise, transit time’s value is used when scheduling an event to be received by
another device. The current simulator time plus the transit time is the soonest possible time that a
simulatable may use for scheduling a new event.

Maximum allowed operations affects how many “operations” can be performed by a piece of hardware
within a given activity interval. Individual hardware components are responsible for implementing a
coherent and reasonable view on what activities qualify as operations (i.e. quantify the value in
operations of every activity/method). Operations in the context of an adaptor or harddrive correspond
most closely to bandwidth. A harddrive that can perform 10 operations per one-unit refresh interval,
where each operation transfers two units of data, can send or receive 20 data per unit time. If each data
unit is assumed to be four megabytes and unit time a second, this corresponds to a bandwidth of 80
megabytes per second. Hardware components keep track of how
many operations they have performed during a time interval and subtract operations as activities are
performed.
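The relationship between these properties and effective bandwidth is simple arithmetic, sketched below. The parameter names and the example values are illustrative assumptions, not HVNS configuration.

```java
/** Effective bandwidth of an operation bound component (illustrative arithmetic). */
class BandwidthSketch {
    /**
     * maxOps: operations allowed per interval; dataPerOp: data units moved per
     * operation; refreshInterval: length of the interval in time units.
     */
    static double bandwidth(int maxOps, double dataPerOp, double refreshInterval) {
        return maxOps * dataPerOp / refreshInterval;
    }
}
```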

The refresh interval value can be thought of as a hardware component’s internal clock: the frequency at
which a hardware component can operate. The time between refreshes is a hardware component’s
activity interval. The underlying operation bound simulatable uses the refresh interval to reset a
hardware component’s operation count, signifying the beginning of a new activity interval during which
the hardware component may perform operations once more.

6.1. Hardware Computer Node

The Computer interface provides for the installation of algorithms, harddrives, and caches. It is the
gateway that an algorithm uses to obtain access to hardware services performed by harddrives, caches,
and adaptors.

6.2. Harddrive

Harddrives are large, long-term storage devices. They will generally have slow access times relative to
caches and even connection adaptors. Harddrives store data for distribution algorithms. Harddrive size
will generally be homogeneous across all computer nodes in the network. The important exception is
that the client node must have a harddrive large enough to store the entirety of the data that is to be
distributed to servers on the network so that local-disk read time can be measured.

Harddrives store IData objects which are associated with indices. Harddrives can fetch and store data
from specified indices for a computer node.

6.3. Cache

Caches are small, short-term storage devices. Caches are much smaller than harddrives, but also much
faster. They are optionally installed upon a computer node. Caches represent the use of main memory
to store data to speed up data retrieval. A cache may be used by an algorithm to store information that
it predicts it will need access to in the future. This allows an algorithm to pre-pay the cost of a slow
harddrive access for a piece of data on the behalf of a future requester of that data. The algorithm can
retrieve the data from the cache when a request for it is finally made allowing the requester to
experience the fast cache access time and not the slow harddrive access time. Ideally, every request for
data could be met with a cache hit where the prediction for cache storage/data usage was correct and
the cache has the data necessary to field the request. A cache miss occurs when the prediction is
incorrect and the cache does not have the data requested. Cache misses force the requester to pay
the cost of the cache access time as well as the harddrive access which must be made.

Caches, like harddrives, store IData objects which are associated with indices. The index associated with
a piece of data is the same whether that data is stored in the cache or on the harddrive.
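The hit/miss cost asymmetry can be sketched as a cache-then-harddrive lookup with illustrative access costs. All names and numbers here are assumptions for the example, not HVNS values.

```java
import java.util.HashMap;
import java.util.Map;

/** Cache-then-harddrive lookup; access costs are illustrative time units. */
class StorageSketch {
    static final double CACHE_TIME = 1.0;
    static final double HARDDRIVE_TIME = 10.0;

    final Map<Integer, String> cache = new HashMap<>();
    final Map<Integer, String> harddrive = new HashMap<>();
    double lastAccessCost;   // cost paid by the most recent fetch

    /** A hit costs only the cache access; a miss pays for cache and harddrive. */
    String fetch(int index) {
        if (cache.containsKey(index)) {
            lastAccessCost = CACHE_TIME;               // cache hit
            return cache.get(index);
        }
        lastAccessCost = CACHE_TIME + HARDDRIVE_TIME;  // cache miss penalty
        return harddrive.get(index);
    }
}
```

Note that the shared index space means an algorithm can pre-load `cache` from `harddrive` without translating keys, which is what makes predictive caching straightforward.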

6.4. Connection Adaptor

Connection adaptors represent network interface cards and are used for communication across the
network. Connection adaptors operate at the packet level, as does the medium to which they are
connected. They transfer full objects across connection media without transformation to a separate
physical representation (i.e. bits). Connection adaptor speeds are approaching, and in some cases
exceeding, the transfer rates of local storage like harddrives [15].

A distributed system is one in which the failure of a computer you didn’t even know existed can
render your own computer unusable.

- Leslie Lamport

7. Distribution and Retrieval Algorithms

A distribution and retrieval algorithm (DRA) deals with the logistics of distributing data from a local
client node to a series of nodes remote to it on the network for storage, as well as retrieving this data at
some future point in time. A DRA has the following high-level functions:

 Retrieve data from local storage.
 Set up and manage a storage network of available nodes on the network.
 Retrieve data from remote storage nodes.

Distribution algorithms perform different roles as part of the hardware model, network model, and
simulation environment.

DRAs implement the IAlgorithm interface and access the hardware model through installation onto an
entity which implements the IComputer interface like HardwareComputerNode objects. This interface
grants a DRA the ability to reference storage devices like the computer’s harddrive and cache so that
IData objects can be stored and retrieved locally. In order to use a storage device, DRAs must
understand the interfaces presented by storage devices as made available through the storage device’s
supported messages.

DRAs sit on top of the network protocol stack in the network model. DRAs implement the
IProtocolHandler interface which connects them to the transport layer protocol handler. This
connection provides DRAs with the ability to transmit and receive messages which contain control and
data values stored as a packet’s payload to and from DRAs remote to them in the network.

DRAs are operation-bound simulatables in the simulation. As simulation agents they can schedule
events containing messages for other simulatables like harddrives, caches, and the transport protocol
handler.

7.1. Operation Model

DRAs use a client-server approach to communication. One DRA is selected to be the client in the
configuration file which builds the simulation. The other DRAs remain passive until they volunteer
and are subsequently selected to perform server duties.

The simulation begins when the simulator schedules a SET_CLIENT event to the client node which
indicates it has been selected to perform the client role. The client then generates a specified quantity
of data which will be used for both the local and remote read tests. Once this is complete, the client
sends itself a series of bootstrap events to allow it to continue processing and moving through its
behavioral states.

The client’s general operation is roughly as follows:

1. Select volunteer(s) and acknowledge their role as server(s).
2. Read data from local storage.
3. Distribute data to server(s).
4. Await the server(s)’ ready signal.
5. Read data from server(s).

The manner in which this is accomplished is implementation specific. Overviews of the two algorithms
designed for this project are detailed in Sections 7.3 and 7.4.

7.2. Implementation

DRAs are implemented using the state design pattern [20]. The state design pattern abstracts out state
values and the unique behaviors associated with those values into the form of a state object. The
DRA itself is a state-context or state-holder object. The DRA holds a state object to which it
delegates method invocations. The DRA’s behavior thus depends upon the behavior/implementation
of the state object it contains, and can be altered by replacing the state object it holds.
State transitions occur when the state object performs this replacement on its state-holder.
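A minimal Java sketch of this arrangement follows. The interface and state names mirror Figure 7-1; the Event class and state bodies are illustrative placeholders, and an extra holder parameter is assumed on the state's delegateEvent so that states can trigger transitions:

```java
// Simplified event carrying only a name; the simulator's events are richer.
class Event {
    final String name;
    Event(String name) { this.name = name; }
}

// State interface: each concrete state handles events in its own way.
interface IState {
    void delegateEvent(Event e, IStateHolder holder);
}

// State-holder interface: delegates events to its current state.
interface IStateHolder {
    void delegateEvent(Event e);
    void setState(IState s);
}

// The DRA acts as the state-holder; its behavior changes as _state is replaced.
class Algorithm implements IStateHolder {
    private IState _state = new Init();
    public void delegateEvent(Event e) { _state.delegateEvent(e, this); }
    public void setState(IState s) { _state = s; }
    String stateName() { return _state.getClass().getSimpleName(); }
}

// Illustrative state: on a SET_CLIENT event, transition the holder to Distribute.
class Init implements IState {
    public void delegateEvent(Event e, IStateHolder holder) {
        if (e.name.equals("SET_CLIENT")) holder.setState(new Distribute());
    }
}

// Illustrative state: ignores events in this sketch.
class Distribute implements IState {
    public void delegateEvent(Event e, IStateHolder holder) { }
}
```

Delivering a SET_CLIENT event to an Algorithm thus swaps its Init state for Distribute, and every subsequent event is handled with Distribute's behavior.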

[Class diagram: the IStateHolder interface (field IState _state; methods delegateEvent(Event e), setState(IState s)) and the IState interface (delegateEvent(Event e)). Algorithm implements IStateHolder and delegates to concrete states Init, Distribute, and Read, each implementing delegateEvent(Event e).]
FIGURE 7-1. UML OF STATE DESIGN PATTERN.



The state design pattern simplifies the creation of distribution algorithms which may have several roles
and several states per role to fulfill. It successfully isolates responsibilities, which creates simpler and
more easily tested code.

7.3. Client-Managed Distribution Algorithm

The Client-Managed Distribution and Retrieval Algorithm (CMDRA) is a client-server approach that
requires a client which is active in the selection of servers and the fair distribution of data, and which
maintains an index table mapping data indices to server addresses. CMDRA features the use of a cache on
servers to speed up retrieval operations. This section examines the client’s operation followed by an
examination of the server(s)’ operation. FSMs are provided for both the client operation and the server
operation. The discussion text shares terminology with these diagrams and discusses the highlights of
each state of the FSM.

The operation of CMDRA’s client is depicted as an FSM that is split into two parts, beginning in Figure
7-2 and ending in Figure 7-3.

[Diagram: client FSM with states NullRole, AwaitVolunteers, and Distribute. NullRole transitions to AwaitVolunteers on SetClient, broadcasting a VolunteerRequest; AwaitVolunteers accepts volunteers (storing their addresses and sending VolunteerAccepted) or rejects them (ClientRejectsVolunteer), moving to Distribute once enough ServerAcknowledges arrive; in Distribute, DoWork and HD Response events drive round-robin data distribution while ServerAcknowledges replies are counted.]

FIGURE 7-2. CMDRA CLIENT FSM (PART 1). FEATURES INCLUDE VOLUNTEER SELECTION AND THE DATA DISTRIBUTION PROCESSES.

Potential clients begin in the NullRole state. They transition into client status once they receive the
SetClient message from the simulation. A client first attempts to locate some arbitrary number of
servers which will each store some fractional slice of the total data to be stored. The client broadcasts a
VolunteerRequest message to obtain these servers. The request message contains a count of the total
amount of data that a server is expected to store. The volunteer request message is disseminated
across all nodes in the network. Servers which are willing and able to store the specified amount of data
send back a ServerVolunteers message. The client accepts a user-defined number of these servers and
rejects the rest.

Once all volunteers have been acknowledged the client begins data distribution. The client sends data
requests to its harddrive for all indices which are to be sent. The harddrive in turn responds with the
data stored in the index for each request it receives. The client sends the data from each of these
responses to a server which is selected in a round-robin fashion. The client keeps track of index
ownership for future retrieval by maintaining a mapping of servers to the indices they hold. Every piece
of data sent is acknowledged by the server as it is successfully stored. Once all data has been sent and all
server acknowledgements have been received, the client proceeds to the server confirmation state called
ConfirmServerReady.
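The round-robin assignment and index-ownership table might be sketched as follows. The IndexTable class and its method names are hypothetical illustrations, not the simulator's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Tracks which server address owns each data index, assigning
// indices to servers in round-robin order as the client distributes data.
class IndexTable {
    private final String[] servers;
    private final Map<Integer, String> owner = new HashMap<>();
    private int next = 0;

    IndexTable(String... servers) { this.servers = servers; }

    // Assign the given index to the next server in rotation and record it.
    String assign(int index) {
        String server = servers[next];
        next = (next + 1) % servers.length;
        owner.put(index, server);
        return server;
    }

    // Look up the server holding an index, for the later read phase.
    String ownerOf(int index) { return owner.get(index); }
}
```

During the Read state the client consults this mapping to direct each data request to the server that stores the index.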

[Diagram: client FSM with states ConfirmServerReady, Read, and Done. In ConfirmServerReady, DoWork events send DataStoreComplete to each server and ServerReady replies are counted; once all servers are ready the client enters Read, where DoWork events send ServerDataRequest messages and valid DataResponse messages are counted; when no more responses are needed the client sends SimulationComplete and enters Done. Late ServerVolunteers messages are rejected in both states.]

FIGURE 7-3. CMDRA CLIENT FSM (PART 2). FEATURES INCLUDE CONFIRMATION OF SERVER READINESS AND THE REMOTE READING
PROCESS.

Inside ConfirmServerReady, the client indicates to each of the servers that it has finished sending data.
The client then waits for the servers to confirm that they are ready to respond to read requests for that
data. Servers respond as they finish storing and processing data. The client enters the Read state once
this has occurred.

Inside Read, the client proceeds to send data requests to each server for the indices it knows that server
possesses. The data received from these requests is compared against the data stored locally to ensure
that it is correct. Once all valid data have been received the client ends the simulation.

The operation of CMDRA’s server is depicted as an FSM in Figure 7-4 and Figure 7-5. Like the client, the
server begins in a passive NullRole state without a specifically defined client or server role.

[Diagram: server FSM with states NullRole, Volunteered, and AwaitStorage. On a VolunteerRequest with sufficient space, the server marks the request id, sends ServerVolunteers, rebroadcasts the VolunteerRequest, and enters Volunteered; a ClientRejectsVolunteer returns it to NullRole, while a ClientAcceptsVolunteer prompts a ServerAcknowledges reply and entry into AwaitStorage. There, each DataStorage message is answered with ServerAcknowledges, and a DataStoreComplete message triggers ServerReady and DoWork.]

FIGURE 7-4. CMDRA SERVER FSM (PART 1). FEATURES INCLUDE THE VOLUNTEER AND DATA STORAGE PROCESSES.

This changes when a would-be server receives a volunteer request from the client. Servers which have
sufficient capacity and are willing to perform a storage role send a ServerVolunteers message to the
client and proceed into the Volunteered state. They also rebroadcast the volunteer message to other
nodes on the network (up until the time to live limit is reached) so that additional servers not directly
connected to the client may also receive the message. Volunteered CMDRAs can either be accepted or
rejected as a volunteer. Would-be servers that are rejected return to NullRole. Would-be servers that
are accepted enter a storage request acceptance state called AwaitStorage and acknowledge this
transition with the client.
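The TTL-limited rebroadcast can be sketched as a flood over an adjacency list. This is a hypothetical structure for illustration; the simulator performs the dissemination through connection adaptors and media, and real floods typically deduplicate by request id:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy flood: a VolunteerRequest is rebroadcast to neighbors,
// decrementing a time-to-live so propagation eventually stops.
class Flood {
    final Map<String, List<String>> neighbors = new HashMap<>();
    final Set<String> reached = new HashSet<>();

    // Record an undirected link between two nodes.
    void link(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Deliver the request to 'node', then rebroadcast while ttl remains.
    // A node that has already seen the request does not forward it again.
    void broadcast(String node, int ttl) {
        if (!reached.add(node)) return;   // already received this request
        if (ttl <= 0) return;             // time-to-live limit reached
        for (String n : neighbors.getOrDefault(node, Collections.emptyList())) {
            broadcast(n, ttl - 1);
        }
    }
}
```

On a chain a–b–c–d, broadcasting from a with a TTL of 2 reaches a, b, and c but never d, which is how the volunteer request's reach is bounded.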

Inside AwaitStorage, servers field storage requests from the client and
place the data received into long-term storage on their harddrives. Servers acknowledge every piece of data
received from the client. Eventually, the client will indicate that all data has been sent. The server then
has time to process the data in some way (e.g. place data into cache, etc.). Once operations of this
nature complete, the server proceeds to the Service state and confirms that it is now ready to field read
requests.

[Diagram: server FSM Service state. A ClientDataRequest(index) triggers a CacheRequest(index); a CacheResponse with data sends a DataResponse to the client plus a DoWork(1) to refill the cache, while a null CacheResponse sends an HDRequest(index), whose HDResponse is forwarded as a DataResponse. DoWork(cacheFreespace) events move not-yet-requested data from the harddrive into the cache, decrementing the free-space count.]

FIGURE 7-5. CMDRA SERVER FSM (PART 2). FEATURES INCLUDE THE SERVICE PROCESS WHICH DEMONSTRATES CACHE AND HD STORAGE
AND RETRIEVAL AND THE SERVICING OF CLIENT DATA REQUESTS.

Inside Service, the server is responsible for fielding read requests from the client. The server receives
requests for the data stored at an index. The server first attempts to retrieve this information from the
cache. The cache can respond with data or a null response. Cache data responses are shipped off to the
client. A null response forces the server to request the data from the harddrive. The harddrive’s data
response always contains data and can never be null. This data is similarly shipped off to the client.
During this time the server attempts to keep the cache filled with data. Cache hits result in new
requests by the server to the harddrive indicating that the cache needs to be filled with more data (i.e.
data which has not yet been requested).

7.4. Server-Managed Distribution Algorithm

The Server-Managed Distribution and Retrieval Algorithm (SMDRA) is a client-server approach which
offloads most of the server selection, data distribution, and data mapping to a single primary server which
is responsible for a collection of secondary storage servers. SMDRA features the use of cache as well as
the use of data redundancy to speed up data requests. There are three roles present in this algorithm:
client, primary server, and secondary server(s). This section examines the operation of each in turn.
FSMs are provided depicting the operations of the client and both types of servers. The discussion text
shares terminology with these diagrams and discusses the highlights of each state of the FSM.

The operation of SMDRA’s client is depicted in Figure 7-6 and Figure 7-7.

[Diagram: client FSM with states NullRole, AwaitFirstVolunteer, AwaitVolunteers, and Distribute. SetClient prompts a broadcast VolunteerRequest and entry into AwaitFirstVolunteer; the first ServerVolunteers reply is sent AcceptedAsPrimary and the client enters AwaitVolunteers, relaying subsequent ServerVolunteers messages to the primary server; a ServerReady from the primary moves the client into Distribute, where DoWork and HD Response events drive data transmission to the primary server until all acknowledgements arrive.]

FIGURE 7-6. SMDRA CLIENT FSM (PART 1). FEATURES INCLUDE PRIMARY SERVER SELECTION, VOLUNTEER RELAYING, AND THE DATA
DISTRIBUTION PROCESSES.

Here again, the would-be client begins in the NullRole state until it receives a SetClient message from the
simulation. The client broadcasts a VolunteerRequest message and enters AwaitFirstVolunteer. The
volunteer request message is disseminated across all nodes in the network. Servers which are willing
and able to store the amount of data in a slice send back a ServerVolunteers message.

Inside AwaitFirstVolunteer, the client awaits the first of these messages. The sender of this first
message is selected by the client to be the primary server and is sent an acknowledgement of this role.
The acknowledgement includes information pertinent to the selection of additional volunteers, including
the number of base servers and the amount of data redundancy required.

Inside AwaitVolunteers, the client relays all subsequent server volunteers to the primary server. It does
this until the primary server indicates that it is ready to receive storage requests.

Inside Distribute, the client sends all data to the primary server. It continues sending data until it
exhausts its supply and receives an acknowledgement from the primary server that the data has been
received. The client then proceeds to ConfirmServerReady.

[Diagram: client FSM with states ConfirmServerReady, Read, and Done. The client sends DataStoreComplete and waits in ConfirmServerReady for the primary server's ServerReady; it then enters Read, where DoWork events send ServerDataRequest messages and valid DataResponse messages are counted; once all responses arrive the client sends SimulationComplete and enters Done. Late ServerVolunteers messages are rejected in both states.]

FIGURE 7-7. SMDRA CLIENT FSM (PART 2). FEATURES INCLUDE CONFIRMATION OF SERVER READINESS AND THE REMOTE READING
PROCESS.

The client remains in the ConfirmServerReady state until the primary server indicates that it has completed its
dissemination of the data. The client can then enter the Read state.

Inside Read, the client sends a data request for every piece of data sent to the primary server. Data
responses will be received from the server (primary or secondary) that has the data and was selected by
the primary server to field the request. The data received is compared against the data stored locally to
ensure that it is correct. Once all valid data have been received the client ends the simulation.

The operation of SMDRA’s primary server is depicted in Figure 7-8 and Figure 7-9.

[Diagram: primary server FSM with states NullRole, Volunteered, and AwaitVolunteers. A VolunteerRequest with sufficient space prompts ServerVolunteers, a rebroadcast of the request, and entry into Volunteered; AcceptedAsPrimary prompts ServerAcknowledges and entry into AwaitVolunteers, where each relayed ServerVolunteers is stored and sent AcceptedAsSecondary while more volunteers are needed, and ServerAcknowledges replies are counted until ServerReady can be sent. ClientRejectsVolunteer returns the server to NullRole.]

FIGURE 7-8. SMDRA PRIMARY SERVER FSM (PART 1). FEATURES INCLUDE THE VOLUNTEER STAGE AND ACQUISITION OF SECONDARY
SERVERS.

The would-be server begins in a NullRole without a specifically defined client or server role. This
changes when the would-be server receives a volunteer request from the client. Would-be servers
which have sufficient capacity and are willing to perform a storage role send a ServerVolunteers
message to the client and proceed into the Volunteered state. They also rebroadcast the
VolunteerRequest message to other nodes on the network (up until the time to live limit is reached) so
that additional servers not directly connected to the client may also receive the message.

Volunteered SMDRAs can be rejected, accepted as the primary server, or accepted as secondary servers.
Would-be servers that are rejected return to NullRole. The would-be server that is accepted as the primary
server enters the AwaitVolunteers state and acknowledges this with the client. Would-be servers that are
accepted as secondary servers are discussed later.

The primary server awaits further volunteers inside AwaitVolunteers. It waits for as
many volunteers as are required to fulfill the base server quantity and the redundancy quantity specified
by the client. The base server quantity indicates how the data is to be divided across multiple servers.
N base servers indicates that the data will be divided into N portions/slices of data, with 1 slice of data
being stored on each server. R redundancy indicates how many times the data is to be duplicated; how
many servers will service a data slice. The total number of servers needed is N * R. Servers are grouped
according to the slice of data (i.e. range of indices) they will be responsible for storing. In the case
where there is a redundancy of one, there is only one server in each server group. If redundancy is set at
two, there will be two servers in each server group. The primary server is always a part of the first
group. The primary server sends a ServerReady message to the client when all required volunteers have
been acquired and proceeds to PrimaryAwaitStorage.
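One plausible mapping of indices to server groups under this scheme can be sketched as follows. The report does not specify the exact slicing arithmetic, so the contiguous-slice assumption and the ServerGroups class are illustrative:

```java
// With N base servers (N slices) and redundancy R, N * R servers are
// needed in total; servers are grouped by the slice of indices they store.
class ServerGroups {
    final int baseServers;   // N: number of data slices
    final int redundancy;    // R: copies of each slice
    final int totalData;     // total number of data indices

    ServerGroups(int baseServers, int redundancy, int totalData) {
        this.baseServers = baseServers;
        this.redundancy = redundancy;
        this.totalData = totalData;
    }

    // Total server count required by the primary server: N * R.
    int totalServersNeeded() { return baseServers * redundancy; }

    // Group responsible for an index, assuming equal contiguous slices.
    int groupOf(int index) {
        int sliceSize = (totalData + baseServers - 1) / baseServers; // ceiling
        return index / sliceSize;
    }
}
```

With N = 4 base servers, R = 2, and 100 indices, 8 servers are needed, indices 0 through 24 fall to group 0, and indices 75 through 99 fall to group 3, each group holding two replicas of its slice.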

[Diagram: primary server FSM with states PrimaryAwaitStorage and PrimaryService. In PrimaryAwaitStorage, each DataStorage message is relayed to every server in the server group responsible for the index (and stored locally if the primary belongs to that group), and ServerReadReady replies are counted until ServerReady is sent to the client. In PrimaryService, a ClientDataRequest(index) is relayed to a randomly selected secondary storing that index, or delegated to an internal SecondaryService state if the primary itself stores it. Late ServerVolunteers messages are rejected in both states.]

FIGURE 7-9. SMDRA PRIMARY SERVER FSM (PART 2). FEATURES INCLUDE STORAGE OF DATA ON SELF OR SECONDARY SERVERS AND THE
SERVICING OF REQUESTS VIA SELF OR SECONDARY.

Inside PrimaryAwaitStorage, the primary server handles storage requests. It determines which server is
to receive data based upon the index of the data. Every server in a server group that is to handle that
index receives a copy of the data for storage (including the primary server if it is a part of that group).
This continues until all data have been received and stored by all servers. Secondary servers
acknowledge receipt of all data in the range they are to store with a ServerReadReady message. Once
all servers indicate they are read-ready, the primary server indicates to the client that the storage network
is read-ready. The primary server then enters PrimaryService.

SMDRA’s PrimaryService state is a composite state since the primary server needs to handle not only the
relaying of data requests to the secondary servers, but must also handle such requests itself. Data
requests are received and fielded in this state. The index of a request is matched against the server
group that is storing that index. A server is randomly selected from that group and the data request is
relayed to the server. If the server selected happens to be the primary server, then the request is
delegated to an internal SecondaryServiceState. This secondary service state is depicted in Figure 7-10
as part of the Secondary Server FSM.

The operation of SMDRA’s secondary server(s) is depicted in Figure 7-10.

[Diagram: secondary server FSM with states Volunteered, SecondaryAwaitStorage, and SecondaryService. AcceptedAsSecondary prompts ServerAcknowledges and entry into SecondaryAwaitStorage, where DataStorage messages are stored to the harddrive; once all data in the assigned range has arrived, the primary server is notified and the server enters SecondaryService, which fields ClientDataRequest messages via cache-then-harddrive lookups and keeps the cache filled via DoWork events, as in CMDRA's Service state.]

FIGURE 7-10. SMDRA SECONDARY SERVER FSM. FEATURES INCLUDE STORAGE OF DATA AND THE SERVICING OF REQUESTS.

An SMDRA becomes a secondary server when, during the aforementioned Volunteered state, it receives a
message indicating it is to take on that role, after which it proceeds to the SecondaryAwaitStorage state.
The acknowledgement message it receives indicates the quantity of data it is to receive while in
SecondaryAwaitStorage.

Inside SecondaryAwaitStorage, a server receives storage requests for data which is stored to the
harddrive. A secondary server stays in this state until it receives all data within the range of indices
it is entrusted with storing. Once this occurs, it acknowledges receipt of the data by sending a
ServerReadReady message to the primary server. It then proceeds to the SecondaryService state,
which is identical in function to CMDRA’s Service state and is depicted in Figure 7-5.

A Programming Language is a tool that has profound influence on our thinking habits.

- E. Dijkstra

8. Configuration

A simulation can be set up explicitly by constructing objects and setting parameters through Java.
Alternatively, a configuration file can be written using HVNSLanguage to set up the simulation
environment; this file can then be fed into the HVNS driver program. The Java API involves the
creation of a simulator object and the use of factory and helper methods to create and connect groups
of nodes.

Configuration files are used to customize the operation of a computer program without modification
of the code itself or real-time user intervention through a console. They provide
initial settings in some structured and end-user-centric format, much like command-line parameters.
Simulators typically employ configuration files because the sheer size of the configuration data
makes the aforementioned alternatives impractical. The configuration file approach requires the
use of HVNSLanguage (HVNSL), a domain-specific language that attempts to ease the customization of
hardware attributes and the creation of topologies.

The general procedure in either case is largely the same and follows this basic sequence detailed in
Figure 8-1.

Steps to Running a Computer Network Simulation

1. Instantiate a ComputerNetworkSimulator or subclass.
2. Set up base computer network types.
3. Create nodes patterned off the base types.
4. Connect sets of nodes, creating a topology.
5. Set up the client.
6. Install reporters.
7. Run the simulator.
8. Start the client’s distribution, reading, and reporting.

FIGURE 8-1. SEQUENCE OF STEPS TYPICALLY USED TO PERFORM A SIMULATION RUN VIA THE JAVA API.

The following sections will provide a walk-through of this procedure, first through the use of the Java
classes themselves, and then through the use of a configuration file. The Java API walk-through assumes
a working knowledge of basic Java classes and methods. The HVNSL example guides the reader through
the HVNSL syntax and semantics through the use of an example configuration file which produces a
simulation run identical to that of the Java API. The section afterward reproduces the full HVNSL
grammar as implemented in ANTLR’s EBNF-like meta-language as a reference for the example and for
the reader’s configuration file development.

8.1. Java API

The simulation, network, and computer models are all built in Java. A developer can make use of the
Java API to create and run simulations. Further, a developer can easily extend the base classes to add
features and functionality as required for an experiment.

The following code snippets walk through the steps of running a computer network simulation to
produce experimental output. This is a simple, non-exhaustive look into the operation and use of the
simulation classes and network/computer models that are a part of HVNS. Detailed information
regarding the classes, methods, and fields can be found in the project’s javadoc.

The first step required is to instantiate a ComputerNetworkSimulator or a subclass thereof.

// create the simulator
ComputerNetworkSimulator sim = new ComputerNetworkSimulator();

FIGURE 8-2. CREATING A SIMULATOR.

The base simulator comes with default settings for all network level and hardware level entities. A
subclass can easily override these base objects by extending ComputerNetworkSimulator and overriding
its initialization method.

// create network level items
HardwareComputerNode node = new HardwareComputerNode();
ConnectionAdaptor adaptor = new ConnectionAdaptor();
adaptor.setMaxAllowedOperations( 1000 );
adaptor.setRefreshInterval( 1.0 );
ConnectionMedium medium = new ConnectionMedium();
medium.setTransitTime( 2.0 );
medium.setMaxAllowedOperations( 2000 );

FIGURE 8-3. CREATING NEW BASE TYPES TO BE USED BY THE SIMULATOR'S FACTORY METHODS.

An alternative way to override is built into the basic simulator itself. Substitute types can be built with
custom performance settings as shown in Figure 8-3.

// set network level entities
sim.setBaseNode( node );
sim.setBaseAdaptor( adaptor );
sim.setBaseMedium( medium );
FIGURE 8-4. OVERRIDING THE SIMULATOR'S BUILT-IN BASE TYPES AND THEIR SETTINGS.

These custom types can be set by the simulator’s methods. The base types and their attributes will be
used by the simulator’s factory methods when the developer requests their creation.

// create node level entities
Harddrive harddrive = new Harddrive();
harddrive.setMaxAllowedOperations( 100 );
harddrive.setRefreshInterval( 1.0 );
harddrive.setCapacity( 100000 );
harddrive.setTransitTime( 10.0 );

Cache cache = new Cache();
cache.setMaxAllowedOperations( 1000 );
cache.setRefreshInterval( 1.0 );
cache.setCapacity( 10 );
cache.setTransitTime( 1.0 );

AbstractAlgorithm algorithm = new DummyAlgorithm();
algorithm.setServerCount( 10 );
algorithm.setDataAmount( 1000000 );
FIGURE 8-5. CREATING NEW BASE TYPES TO BE USED BY FUTURE NODES THAT WILL BE CONSTRUCTED BY THE SIMULATOR.

The node base type has several important hardware properties, like the harddrive, cache, and algorithm
which can also be customized in the same way. This is shown in Figure 8-5. First the new base objects
are built, and then their attributes are set, just like the network level entities. This is also where
attributes for the algorithm are defined, such as the number of server nodes to use and the amount of
data to distribute.

// set node entities
node.setHarddrive( harddrive );
node.setCache( cache );
node.install( (IAlgorithm)algorithm );

FIGURE 8-6. OVERRIDING A NODE'S BUILT-IN BASE TYPES AND THEIR SETTINGS.

These custom types can be set on the node. Like the network entities, these items and their properties
will be duplicated for new nodes created by the simulator’s factory methods.

// create nodes, connect nodes
INode[] nodes = sim.createNodes( 10 );
sim.connectRandomly( 100, nodes );

FIGURE 8-7. DEMONSTRATION OF THE USE OF A FACTORY METHOD TO CREATE 10 NODES AND THEN CONNECT THEM RANDOMLY WITH
EACH OTHER.

Node creation is simplified greatly by the use of base types and factory methods. The basic factory
methods allow for the creation and simulator registration of individual network elements like nodes,
connection adaptors, and connection media. Larger quantities of nodes can also be created as per the
first line of code in Figure 8-7. The simulator also provides myriad methods to connect groups of nodes
to one another in various topological arrangements without the need to explicitly create adaptors or
media. The expedited topology connection methods supported are: random, mesh, bus, sequence, and
ring.

// setup client
HardwareComputerNode client = (HardwareComputerNode)nodes[ 0 ];
Harddrive clientHarddrive = (Harddrive)harddrive.clone();
clientHarddrive.setCapacity( 10000000 );
client.setHarddrive( clientHarddrive );

FIGURE 8-8. DESIGNATING AND CUSTOMIZING THE CLIENT NODE.

The client node should next be customized. Typically, the client node will be identical to all other nodes
except that it will be provided with a significantly larger harddrive since it must be able to store the total
amount of data that is to be sent to nodes in the network.

// setup reporters
sim.setOutputPath( "configSet1/config1/run1/" );
sim.addAlgorithmListeners();

FIGURE 8-9. SETTING UP REPORTING/DIAGNOSTIC OPTIONS.

The purpose of an HVNS simulation run is to measure the performance of an algorithm’s remote-reading
operation versus its local-reading operation. In order to do this, diagnostic output from the algorithm
must be captured for analysis. The output path is first specified to the simulator. Then
algorithm listeners can be installed on all of the registered nodes. These listeners will report pertinent
information regarding the algorithm’s operation as it occurs to a log file under the output path directory.
Every node in the network will produce a log file with some form of its address as part of the name.
Details regarding the log file are discussed in Section 9.3.

// start simulation
(new Thread((Runnable)sim)).start();
sim.start();

// start client
client.start();
FIGURE 8-10. INITIALIZATION OF THE SIMULATOR'S EVENT LOOP AND THE EXECUTION OF THE CLIENT'S ALGORITHM.

At this point the network topology has been created with the performance properties specified and the
reporters have begun monitoring their nodes. All that must be done is to start the simulator in its own
thread to begin the event loop. Then the client’s start method can be called. This method informs the
client’s algorithm that it may bootstrap itself into the simulation environment. The algorithm will
schedule a bootstrap event and proceed through its distribution and read runs.

8.2. HVNSLanguage

HVNSLanguage is the domain-specific language employed in a HVNS configuration file. HVNSL was
designed to possess the following high-level attributes:

 Succinct: file contents should be viewable on a single 8 ½” x 11” sheet of paper.
 Simple: as few keywords as possible.
 Intuitive: as much orthogonality as possible.
 Readable: the structure of a topology should be obvious.

It also has the following functionality requirements:

 Define the simulator class to be used.
 Interface with Java objects/methods.
 Allow network entity types to be customized and then instantiated/duplicated.
 Allow hardware entity types to be customized and then instantiated/duplicated.
 Allow one-off instances to be easily created from a base type.
 Ease the creation of topologies of nodes.
 Customize the algorithm employed and define a client node which performs the local-read, distribution, remote-read procedure.

HVNSL was created with the use of Another Tool for Language Recognition, ANTLR [21]. ANTLR is an LL(*)
parser generator (i.e., a compiler-generator) with unbounded lookahead. ANTLR can generate
lexers, parsers, and tree-parsers as output. ANTLR is a well-known, actively developed tool. The
primary creator is Terence Parr, who is well-regarded and responsible for advancing parser theory and
tools [22]. ANTLR has many official and user-developed tutorials [23]. The website hosts dozens of
grammars written in ANTLR including those which recognize Java, C#, and Python languages
[24][25][26].

Symbols, grammars, and tree grammars are defined in the ANTLR meta-language and then processed by
its supporting libraries into recognizers in a supported target language. Actions written in the target
language can be included in the grammar rules and thereby into the generated recognizer. This allows
interpreters and compilers to be created for a grammar.

8.3. HVNSL Example Configuration

This section instructs the reader in the use of HVNSL by way of an example, in a fashion similar
to the Java API section.

config Configuration1
begin
// assign the simulator
simulator = java simulation.simulator.ComputerNetworkSimulator;
end

FIGURE 8-11. THE VERY BEGINNING AND END OF A CONFIGURATION FILE. THE SIMULATOR ASSIGNMENT IS THE FIRST AND ONLY REQUIRED
STATEMENT OF THE PROGRAM BODY. THIS IS ALSO AN EXAMPLE OF INSTANTIATING A JAVA OBJECT.

Configuration files begin with the ‘config’ keyword followed by the name of the configuration file and
the keyword ‘begin’. They end with the ‘end’ keyword. The body of the program allows for the use of
three basic statements: assignment, connection, and value statements. The very first statement must
be an assignment statement for the simulator variable. This statement makes it explicit what simulator
is performing the simulation, be it the base ComputerNetworkSimulator or a subclass. The simulator
assignment should occur only once. The simulator created here is used for all factory construction,
implements the functionality indicated by the connection operators, possesses references to all
registered simulatable entities, and is the resultant simulator returned when a configuration file is
processed by the tree-parser.

Assignments like the simulator assignment have the form of

IDENTIFIER ASSIGNMENT_OPERATOR value SEMI_COLON

An identifier begins with a letter followed by any number of letters and numbers thereafter. The
assignment operator is the ‘=’ sign. A value can be the result of any expression. An expression can be
any numeric value, the result of numeric operations (e.g. addition, subtraction, multiplication, etc.), or a
Java object.

The ‘java’ keyword in Figure 8-12 indicates that the following value is a fully-qualified Java class name
and that this object is to be instantiated. Currently, only objects with parameter-less constructors may
be instantiated in this fashion.

// create network level items
baseServerNode = java computation.HardwareComputerNode;
adaptor = java network.entities.ConnectionAdaptor {
    setMaxAllowedOperations : 1000;
    setRefreshInterval : 1.0; };
connectionMedium = java network.entities.ConnectionMedium {
    setTransitTime : 2.0;
    setMaxAllowedOperations : 2000; };

FIGURE 8-12. ASSIGNMENT AND CUSTOMIZATION OF THE BASE TYPES. JAVA METHODS ARE ACCESSED VIA THE CURLY BRACE CONSTRUCT.

The base types are next created and customized. The ‘java’ keyword is again used to indicate that a Java
object is to be instantiated and assigned to the variable on the left side. The adaptor and
connectionMedium assignments demonstrate the use of curly braces to access a java object’s methods.
Multiple methods may be called within the curly braces. Methods inside the curly braces are applied
after instantiation occurs but before assignment. The general form of a method call inside of curly
braces is:

IDENTIFIER COLON expression SEMI_COLON

// set network level entities
simulator {
    setBaseNode : baseServerNode;
    setBaseAdaptor : adaptor;
    setBaseMedium : connectionMedium; };

FIGURE 8-13. SETTING THE BASE TYPES FOR THE SIMULATOR.

The simulator’s base types are next set. These assignments occur through the use of the curly brace
construct to access the simulator’s ‘setBase’ methods. The base types are duplicated by the simulator
whenever a construction request occurs.

// setup node level entities
harddrive = java computation.hardware.Harddrive {
    setMaxAllowedOperations : 100;
    setRefreshInterval : 1.0;
    setCapacity : 100000;
    setTransitTime : 10.0; };
cache = java computation.hardware.Cache {
    setMaxAllowedOperations : 1000;
    setRefreshInterval : 1.0;
    setCapacity : 10;
    setTransitTime : 1.0; };
alg = java computation.algorithms.DummyAlgorithm {
    setServerCount : 10;
    setDataAmount : 1000000; };

FIGURE 8-14. CREATION AND CUSTOMIZATION OF ENTITIES INTERNAL TO NODE.

Creation and customization of entities internal to a node occurs identically to that of the network entity
base types, as shown in Figure 8-14. Duplication of a node entity will result in a duplication of the node’s
internal values, much like what occurs when the simulator creates any other network entity.

// set node level entities
baseServerNode {
    setHarddrive : harddrive;
    setCache : cache;
    install : alg; };

FIGURE 8-15. SETTING THE BASE TYPES FOR NODE.

Setting the base types of a node’s internal entities occurs in the same fashion as setting the network
entities on the simulator.

// create nodes
c1, n1, n2, n3, n4, n5, n6, n7, n8, n9 = make Node;

FIGURE 8-16. THE CREATION OF MULTIPLE NODES. HERE A MULTIPLE-ASSIGNMENT STATEMENT IS USED.

Next, the nodes to be simulated are created. This is a multi-assignment statement. The left-hand side
specifies a comma-separated list of identifiers. Each of these will be assigned a unique instance of the
value on the right side of the assignment operator.

The value for this assignment statement uses the ‘make’ and ‘Node’ keywords. This instructs the
simulator to create a new instance of the node base type. Since this is a multi-assignment, 10 new
instances will be created in this fashion. ‘make’ can also be combined with the ‘Adaptor’ and ‘Medium’
keywords to construct adaptors and media fashioned after the base adaptor and base medium, respectively.

// setup client
c1 { setHarddrive : clone harddrive { setCapacity : 1000000; }; };

FIGURE 8-17. EXAMPLE OF THE USE OF THE CLONE KEYWORD AND MULTIPLE EMBEDDED CURLY BRACES.

The client node is now customized in Figure 8-17. It was created earlier using the make statement.
Here its harddrive is set to a larger capacity. The object’s ‘setHarddrive’ method is called with the value
produced by the ‘clone’ keyword. ‘clone’ takes an identifier for a variable that was previously created.
The Java object returned by this operation has its ‘setCapacity’ method called.

// connect nodes
(& c1 n1 n2 n3 n4 n5 n6 n7 n8 n9);

FIGURE 8-18. A CONNECTION STATEMENT. 10 NODES ARE CONNECTED RANDOMLY WITH ONE ANOTHER. ALL NODES ARE GUARANTEED TO
BE REACHABLE.

A topology is created through the use of connect prefix operators. Use of the random connection
operator is shown in Figure 8-18. A connect statement occurs inside of parentheses and takes a list of
nodes. The nodes will be connected dependent upon the operator employed. Any Adaptors and
connection media required for the connection will be created with the simulator’s factory methods.

The operators are as follows:

 +
o Name: Series connector .
o Description: Nodes are connected in series with the first node connected to the
second, the second to the third, etc.
 *
o Name: Bus connector.
o Description: Nodes are connected to each other through a single connection
medium.
 #
o Name: Mesh connector.
o Description: All nodes are directly connected to each other. A medium exists for
each connected pair.
 &
o Name: Random connector.
o Description: Every node is connected randomly to one other node in the set. The
topology created guarantees that any node is reachable from any other.
 @
o Name: Ring connector.
o Description: Nodes are connected in a ring, with the first node connected to the
second, the second to the third, etc. The first and last nodes are also connected.

simulator { setClient : c1; };

FIGURE 8-19. SETTING THE CLIENT NODE.

The client node is now specified, which allows the driver program that accepts the configuration file to
execute the algorithm on the client node. This concludes the HVNSL configuration example.

8.4. Configuration Directory Structure

Configuration files and the log files produced from their simulations are organized into groupings in the
file system for mass processing and analysis. These groupings are primarily designed to aid in the
creation of graphs for visual inspection of a DRA’s local versus remote read performance. The directory
structure is as follows:

configurations/
    config_set[s1-sn]/
        client_set_averages.log
        config_[x1-xn].cfg
        config_[x1-xn]/
            avg/
                client.log
            run_[r1-rn]/
                node_[a1-an].log

This hierarchy will be described in order, from larger organizational structures to smaller ones.

configurations is the root of all configuration sets. It holds a collection of graph-producing structures.
Its name is unrestricted.

config_set[s1-sn] is a configuration set directory. It stores all configuration files and all config
directories. Multiple config directories are needed when isolating the manner in which varying a single
property of a configuration file affects performance. Each config_set directory produces several x values
for a graph, demonstrating how the variance of a single attribute affects local and remote read-time
performance. It “produces” a single graph. Its name is unrestricted.

config_[x1-xn].cfg is a configuration file. Configuration files contain code configuring a simulator. There
is a one to one correspondence between configuration files and configuration directories. Its name is
unrestricted.

config_[x1-xn] is a configuration directory. This directory contains all run directories produced from
multiple simulation runs of simulators built from a single configuration file. Multiple run directories
should be produced when evaluating configurations that have stochastic properties (e.g. random
topologies). It also stores an avg directory. Its name is unrestricted.

avg is an average directory which contains the client.log file. The client.log file contains a single line
containing the base local read-time and remote read-time as averaged from all client nodes found
across all runs in the config directory. It is generated and/or refreshed after multiple runs occur. The
values in this file correspond to a single x value on a graph.

run_[r1-rn] is a run directory. A run is produced from the execution of a single simulation compiled from
a single configuration file. Run directories are numbered from 1 to some arbitrary value. This directory
contains all node log files. This is the default naming convention that can be overridden via Java.

node_[a1-an].log is a log file. It contains pertinent algorithm events associated with a single node on the
network during one run of a configuration. This is the default naming convention that can be overridden
via Java.
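The selection of log files implied by this layout can be sketched in Java. The helper below is illustrative only (HVNS’s own managers perform the real traversal); it simply picks out the per-node log files from the entries of a run directory:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative helper: pick out the node log files of a run directory.
 *  HVNS's own managers do the real traversal; this only mirrors the layout. */
public class RunDirectory {

    /** Given the entry names of a run_[r1-rn] directory, returns the
     *  node log file names in sorted order. */
    public static List<String> nodeLogs(List<String> entryNames) {
        return entryNames.stream()
                .filter(name -> name.endsWith(".log"))
                .sorted()
                .collect(Collectors.toList());
    }
}
```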

8.5. HVNSL Grammar

/// PARSER

/** Start-rule. */
script : 'config'! NAME! 'begin'! simulatorAssignment ( statement )* 'end'! EOF! ;

/** Special assignment for the simulator to use. */
simulatorAssignment
    : NAME ASSIGN ( expression ) SEMI { "simulator".equals( $NAME.text ) }?
    -> ^( ASSIGN NAME["simulator"] expression ) ;

/** Statements. */
statement : assign | connectStatement | value SEMI -> ^( value ) ;

/** Assignment of values to variables. */
assign : NAME (COMMA NAME)* ASSIGN ( expression ) SEMI -> ^( ASSIGN NAME expression )+ ;

/** Calls to setter methods and object fields. */
innerAssigns : LEFT_BRACE ( NAME COLON expression SEMI )+ RIGHT_BRACE -> ( NAME expression )+ ;

/** Handles value creation, including instantiation. */
value : MAKE 'Node' -> ^( SIMULATOR_CREATE NODE )
    | MAKE 'Medium' -> ^( SIMULATOR_CREATE MEDIUM )
    | MAKE 'Adaptor' -> ^( SIMULATOR_CREATE ADAPTOR )
    | JAVA NAME -> ^( JAVA_INSTANTIATE NAME )
    | JAVA NAME in=innerAssigns -> ^( JAVA_INVOKE ^( JAVA_INSTANTIATE NAME ) $in )
    | CLONE NAME -> ^( CLONE NAME )
    | CLONE NAME in=innerAssigns -> ^( JAVA_INVOKE ^( CLONE NAME ) $in )
    | NAME
    | NAME in=innerAssigns -> ^( JAVA_INVOKE NAME $in )
    | LEFT_PAREN! expression RIGHT_PAREN!
    | FLOAT
    | INTEGER ;

/** Addition and subtraction operations. */
expression : mult ( ('+'^ | '-'^ ) mult )* ;

/** Multiplication operations. */
mult : unary ( ( '*'^ | '/'^ ) unary )* ;

/** Handles unary positive and negative operators for numerics. */
unary : ('+'! | negation^)* value ;

/** Convert unary '-' to a NEGATION token. */
negation : '-' -> NEGATION ;

/** Handles the base of a connect statement. */
connectStatement : LEFT_PAREN! ( nodeConnectOperator^ NAME+ ) RIGHT_PAREN! SEMI! ;

/** Converts the connect operator into the proper token that the tree evaluator
 * understands and which is unique from the math operators. */
nodeConnectOperator
    : '+' -> SERIES_CONNECT_OP
    | '*' -> BUS_CONNECT_OP
    | '#' -> MESH_CONNECT_OP
    | '&' -> RANDOM_CONNECT_OP
    | '@' -> RING_CONNECT_OP ;

FIGURE 8-20. HVNSL GRAMMAR USED BY ANTLR TO CREATE THE PARSER. THE GRAMMAR HERE PRODUCES THE ABSTRACT SYNTAX TREE
AGAINST WHICH THE TREE PARSER EVALUATES CAUSING PROGRAM EXECUTION. THE RESULT OF THIS EXECUTION IS A CONFIGURED
SIMULATOR.

// LEXER
// control characters
ASSIGN: '=';
LEFT_PAREN: '(';
RIGHT_PAREN: ')';
LEFT_BRACE: '{';
RIGHT_BRACE: '}';
COLON: ':';
COMMA: ',';
SEMI: ';';

// keywords
CLONE: 'clone';
MAKE: 'make';
JAVA: 'java';

// names and strings
NAME: LETTER ( LETTER | DIGIT | '_' | '.' )*;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';

// numerics
INTEGER: NON_ZERO_DIGIT DIGIT*;
FLOAT: INTEGER '.' DIGIT+;
fragment DIGIT: '0'..'9';
fragment NON_ZERO_DIGIT: '1'..'9';

// spacing
WHITESPACE: ( SPACE | NEWLINE )+ { $channel = HIDDEN; };
fragment NEWLINE: ('\r'? '\n')+;
fragment SPACE: ' ' | '\t';

// comments
SINGLE_COMMENT: '//' ~('\r' | '\n')* NEWLINE { skip(); };
MULTI_COMMENT options { greedy = false; }
    : '/*' .* '*/' NEWLINE? { skip(); };

FIGURE 8-21. ADDITIONAL GRAMMAR TOKENS. THESE ARE TOKEN RULES USED BY THE GENERATED LEXER.

To measure is to know.

If you can not measure it, you can not improve it.

- Lord Kelvin

9. Algorithm Benchmarking

Benchmarking is the process of running a piece of software or an algorithm through a number of trials
under varying conditions and measuring its relative performance. The Hardware Varying Network
Simulator and the Distribution and Retrieval Algorithms (DRAs) were designed to examine the hardware
conditions under which it is time-advantageous to use network-based storage instead of local storage.
This chapter discusses these measurements, how they are generated, and how they are analyzed. The
specific hardware conditions to be varied as well as the expected results of this variance are discussed in
Chapter 10.

9.1. Metrics

The baseline measurement is local read-time. Local read-time represents the case where a user/client
of data only has access to local storage for data. A majority of personal computer users fall within this
case.

Local read-time is the period beginning with the DRA’s first request for data from the
harddrive. It encompasses all subsequent requests to the harddrive for data as well as the harddrive’s
responses to those requests. It ends with the DRA’s reception of the harddrive’s response for the last
outstanding piece of data requested. The numeric value of the local read-time is the delta between the
time of the period’s end event and that of its beginning event.

The experimental measurement is remote read-time. Remote read-time represents the case where a
user/client has access to network-attached storage or cloud-based storage for data. This is a common case
for individuals in corporate or research environments with version control or content-sharing systems.
It is also a rapidly increasing case for home users as Web 2.0 applications increase their cloud-based
storage offerings and consumer electronics manufacturers offer more Network Attached Storage devices
[27][28][29].

Remote read-time is the period beginning with the DRA’s first request for data from a
remote server. This first request will take the form of a packet containing the request message being
given to the transport protocol handler and subsequently being sent down the stack and out into the
network, where it is routed to its destination. It encompasses all subsequent requests to remote servers
as well as the data responses those servers send across the network to the client. It may include server-
to-server control messages relaying requests and related management overhead. The period ends with
the DRA’s reception of the response for the last outstanding piece of data requested. The numeric
value of the remote read-time is the delta between the time of the period’s end event and that of its
beginning event.
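Both metrics reduce to the same computation: the difference between the times of a period’s last and first events. A minimal sketch of that computation follows (the class and method names are illustrative, not part of HVNS):

```java
import java.util.List;

/** Illustrative computation of a read-time metric: the delta between the
 *  last and first event times of a period, in simulator time units. */
public class ReadTime {

    /** eventTimes must be in order of occurrence; an empty or single-event
     *  period has a read-time of zero. */
    public static double periodDuration(List<Double> eventTimes) {
        if (eventTimes.size() < 2) {
            return 0.0;
        }
        return eventTimes.get(eventTimes.size() - 1) - eventTimes.get(0);
    }
}
```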

9.2. Logging Implementation

The values associated with the metrics previously described are affected by the operations of many of
the DRAs and nodes of the network. However, only the client’s context requires inspection since it is
from its frame of reference that the question regarding local and remote read-times is posited, and its
values will be affected by the operation of the aforementioned entities.

DRAs are implemented as observables which notify observers of events [20]. These AlgorithmEvents
provide information regarding a DRA’s data and control messages which contain information that can be
used to determine local and remote read-times. These events are not to be confused with the
scheduled events of the simulator and simulatables, though scheduled simulator events will often
provoke the creation of an AlgorithmEvent and observer notification. IAlgorithmListener implementers
are the observers that understand AlgorithmEvents. AlgorithmListener is an IAlgorithmListener which
records these events into the aforementioned log file as they occur during the simulation.
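This observer relationship can be sketched with simplified stand-ins; the real AlgorithmEvent and IAlgorithmListener types carry more information than the hypothetical ones shown here:

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-ins for the observer relationship described above.
 *  The real AlgorithmEvent/IAlgorithmListener interfaces differ in detail. */
public class ObserverSketch {

    /** Pared-down event: just a period label and a time. */
    static final class Event {
        final String period;
        final double time;
        Event(String period, double time) { this.period = period; this.time = time; }
    }

    /** Hypothetical listener interface. */
    interface Listener { void update(Event e); }

    /** A DRA-like observable that notifies listeners as events occur. */
    static class Observable {
        private final List<Listener> listeners = new ArrayList<>();
        void addListener(Listener l) { listeners.add(l); }
        void notifyListeners(Event e) { for (Listener l : listeners) l.update(e); }
    }

    /** Records events, as AlgorithmListener records them to a log file. */
    static class RecordingListener implements Listener {
        final List<Event> recorded = new ArrayList<>();
        public void update(Event e) { recorded.add(e); }
    }

    /** Wires a listener to an observable and fires two LOCAL events. */
    public static int demo() {
        Observable dra = new Observable();
        RecordingListener log = new RecordingListener();
        dra.addListener(log);
        dra.notifyListeners(new Event("LOCAL", 0.0));
        dra.notifyListeners(new Event("LOCAL", 4.0));
        return log.recorded.size();
    }
}
```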

AlgorithmEvents which occur during the local read-time period are designated with a LOCAL value in the
log file. AlgorithmEvents which occur during the remote read-time period are designated with a
REMOTE value in the log file. The delta of the time for the last occurring LOCAL event with that of the
time for the first occurring LOCAL event is the baseline local read-time. The delta of the time for the last
occurring REMOTE event with that of the time for the first occurring REMOTE event is the remote read-
time.

9.3. Log File Format

AlgorithmListeners create log files for every algorithm attached to every node on the network during a
run. Every AlgorithmEvent is recorded on its own line as a tab delimited list of attributes describing that
event. A simulation run may produce very many of these lines, especially for configurations involving
large amounts of data or a large number of servers. The format of a line is as follows:

period time dataSent dataReceived controlSent controlReceived dataStored dataRetrieved

 period – has two different meanings. Typically this is the name of the DRA’s state during which
the event occurred. It can also take on the values LOCAL and REMOTE. It is LOCAL for events
which occur during states where local reads are occurring against the client’s harddrive,
beginning with the first request to local storage and ending with the last response. It is
REMOTE when remote reads are occurring against the storage network, beginning when the
first data request is sent out in a packet across the network and ending when the last piece of
data is received from the network. The total amount of time covered by LOCAL provides the
baseline local-read time to beat, while the total amount of time covered by REMOTE provides
the remote-read time.
 time – discrete point in time when the event occurred.

 dataSent – total amount of data-containing messages this algorithm sent to other entities.
 dataReceived – total amount of data-containing messages this algorithm received from other
entities.
 controlSent – total amount of control/non-data messages sent to other entities. This includes
messages to other DRAs and storage devices for data requests, server management, etcetera.
 controlReceived – total amount of control/non-data messages received.
 dataStored – total amount of data that was sent to the harddrive for storage.
 dataRetrieved – total amount of data that was retrieved from the harddrive to field some
request.
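Assuming the tab-delimited, eight-field layout above, one line can be parsed as follows. This is a sketch; the real log format may differ in detail:

```java
/** Sketch of parsing one tab-delimited AlgorithmListener log line, assuming
 *  the eight-field layout described above. */
public class LogLine {
    public final String period;
    public final double time;
    public final long dataSent, dataReceived, controlSent, controlReceived,
                      dataStored, dataRetrieved;

    public LogLine(String line) {
        String[] f = line.split("\t");
        period          = f[0];
        time            = Double.parseDouble(f[1]);
        dataSent        = Long.parseLong(f[2]);
        dataReceived    = Long.parseLong(f[3]);
        controlSent     = Long.parseLong(f[4]);
        controlReceived = Long.parseLong(f[5]);
        dataStored      = Long.parseLong(f[6]);
        dataRetrieved   = Long.parseLong(f[7]);
    }
}
```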

9.4. Generating Logs

The HVNSDriver is the entry program for configuration use and simulation execution. It takes two
parameters: the path to a configuration set collection and the number of simulations to run per
configuration.

Execution of the HVNSDriver creates a collection of ConfigurationSetManagers, which in turn create
ConfigurationManagers for all configuration files found. ConfigurationManagers create configuration
directories and run directories as needed. They also average client run information to produce the
averaged values in the client.log file discussed earlier. The client.log files, in turn, are used by the
ConfigurationSetManager to produce client_set_averages.log, which contains tab-delimited lists of the
local and remote read times from all client.log files occupying a set.
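The averaging step can be sketched as follows; the class and method names are illustrative, and the actual ConfigurationManager logic may differ:

```java
import java.util.List;

/** Illustrative averaging of client read-times across runs, as performed
 *  when producing the avg/client.log values. */
public class RunAverager {

    /** Averages one read-time column (local or remote) across runs. */
    public static double average(List<Double> readTimes) {
        return readTimes.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }
}
```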

A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A
hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective.

- Edward Teller

10. Simulation Expectations

A variety of hardware and algorithm attributes will be altered and their influence on DRA performance
measured. There are three hardware characteristics and two algorithm characteristics which are
hypothesized to affect remote read-times. The hardware attributes are: connection adaptor speed,
cache size, and cache speed. The algorithm attributes are: server quantity and data redundancy. This
chapter hypothesizes how varying these attributes will affect the performance of the remote read-times
relative to the baseline local read-times.

Performance is inversely proportional to read-time duration. Performance increases as read-times
decrease and decreases as read-times increase. Performance increases may be indicated by
language such as “better”, “strengthened”, or “improved.” Performance decreases may be indicated by
language such as “worse”, “weakened”, or “degraded.”

10.1. Varying Adaptor Speed

Decreasing adaptor speed should decrease performance since the client will have to wait longer to
receive all data responses. Increasing adaptor speed should increase performance as data arrives at its
destination more quickly. This increase in performance cannot occur forever. The servers themselves
have to store the information which is being requested. They store this in their harddrive primarily, but
also in their fast caches. Past a certain point of adaptor speed, the cache will be drained so
quickly that requests will be forced to be serviced by the slower harddrive. This is mitigated to
some degree by the quantity of servers. While one server may have to access the data from the hard
disk, another one may be able to quickly reply from its cache. Adaptor speed can be increased up until
the limit of the connection medium, after which point no further increases can be experienced.

10.2. Varying Cache Size

Decreasing cache size should decrease performance since DRAs will more quickly exhaust the cache and
be forced to access the harddrive more often to field requests. Increasing cache size should increase
performance up to the point where the adaptor’s speed to transmit information becomes the
bottleneck. Beyond that point, the additional cache is wasted since the cache is waiting on the adaptor.

10.3. Varying Cache Speed

Decreasing cache speed should decrease performance until the point where it meets the harddrive's
speed, at which point no further decreases will be seen. Increasing cache speed will increase
performance until the wall imposed by the client adaptor's speed and/or the server's adaptor's speed.

10.4. Varying Server Quantity

Decreasing the number of servers to the point where the total data stored is less than the total original
data available increases read-times to infinity. This cannot be modeled with the current DRAs since the
server quantity determines the number of slices into which the data is divided. This means that setting
the server quantity to one simply forces that single server to host all data.

Decreasing server quantity should decrease performance since there are fewer effective resources
available for requests. A single server is limited by the operations and speed that its adaptor (and other
hardware) can support. Increasing server quantity adds additional effective adaptor bandwidth and
cache, which should increase performance. Speeds will increase until the client’s adaptor(s) are
saturated or synchronization/server-messaging overhead saturates the servers/connection
medium. There is also the concern that adding additional servers and more cache, even with free
coordination, won’t increase performance if the time it takes to respond to a message is slowed
significantly by network distance to the requesting node. If the servers are farther away, then the time
it takes for them to respond to a message may be greater than the time it takes for a closer server to
respond, even considering that a closer server may have to wait for a harddrive request or for its
connection adaptor to be free again. At that point it is a matter of being limited to the faster of either
harddrive speed or adaptor/network speed of servers. It is anticipated that the greater concern is the
speed of the client’s adaptor and that communication distance will cancel much of the performance
increase.

10.5. Varying Redundancy

The SMDRA supports data redundancy such that some number n of copies of the total data will be
replicated by servers. Increasing redundancy allows multiple servers which have additional non-
saturated resources to field requests, thus increasing performance. This redundancy should suffer from
the same restrictions/limitations as increasing server quantity. Speeds should increase until the client’s
adaptors are saturated. If these servers are farther, cost-wise, from the client, this may have no effect
or may even decrease performance, as was discussed in Section 10.4.

The strongest arguments prove nothing so long as the conclusions are not verified by experience.
Experimental science is the queen of sciences and the goal of all speculation.

- Roger Bacon

11. Simulation Results

There are two collections of configuration sets, one for CMDRA and one for SMDRA. Each of these
collections contains configuration sets for simulation benchmarking. These configuration sets are
identical between the two algorithms, differing only by the algorithm employed against the hardware
configuration. There are four configuration sets for CMDRA: adaptor speed, cache size, cache
speed, and server quantity. There are five configuration sets for SMDRA, which has an additional
configuration set for server redundancy since it supports that feature while CMDRA does not.

Each configuration was run three times to control for stochastic properties of the simulation that were
likely to skew the results significantly (e.g. very lucky or unlucky cache usage). Certain configurations
were run more frequently when it became apparent that three runs would be insufficient to balance out
this randomness.

The topology employed has a client node directly connected via several connection adaptors to three
nodes, and has at least six additional nodes that are at most one hop away. The rest of the topology is
randomly generated. All topologies contain a total of 51 nodes, client node inclusive.

Time values are in simulator time units. A whole time unit is considered to be a second. All items have a
refresh rate of one second. All operations are in the form of data per second. Data is considered to be a
single byte in size.

Of importance was the use of reasonable hardware values that exist today as “jumping-off”
points. Simulation runs will explore the manipulation of these attributes, increasing them from basic to
state-of-the-art and beyond. State-of-the-art hardware values are displayed below.

Hardware          Attribute  Base Value                  Normalization to Bytes  Scaled Down
Harddrive         Capacity   1 TB (1.0 X 10^12 B)        1.0 X 10^12 B           100000 data
Harddrive         Rate       70 MB/s (7.0 X 10^7 B/s)    7.0 X 10^7 B/s          7 data / sec
Cache             Capacity   4 GB* (4.0 X 10^9 B)        4.0 X 10^9 B            400 data
Cache             Rate       12 GB/s* (1.2 X 10^10 B/s)  1.2 X 10^10 B/s         1200 data / sec
Adaptor           Rate       10 Gb/s (1.0 X 10^10 b/s)   1.25 X 10^9 B/s         125 data / sec
Connection Media  Rate       10 Gb/s (1.0 X 10^10 b/s)   1.25 X 10^9 B/s         125 data / sec

*Taken from main memory values.
TABLE 11-1. STATE-OF-THE-ART HARDWARE VALUES AS SEEN IN REAL-WORLD USE AND THE SCALED VERSIONS USED IN HVNS SIMULATIONS.

10000 data will be distributed and retrieved by these algorithms, which represents 100 GB of data after
scaling.
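This follows from Table 11-1: 1 TB corresponds to 100000 data units, so one data unit stands for 10^7 bytes, and 10000 data units therefore represent 10^11 B = 100 GB. A small sketch of that conversion (names are illustrative):

```java
/** Illustrative conversion behind Table 11-1's scaling: one "data" unit
 *  stands for 10^7 bytes (1 TB / 100000 data). */
public class Scaling {
    public static final double BYTES_PER_DATA = 1.0e12 / 100000.0;

    /** Converts a quantity of data units to gigabytes (10^9 B). */
    public static double dataToGigabytes(double dataUnits) {
        return dataUnits * BYTES_PER_DATA / 1.0e9;
    }
}
```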

Each experiment will vary a single attribute value and hold the rest of these values fixed. The fixed
values will change from experiment to experiment to better demonstrate the principle being
benchmarked (e.g., when the effects of cache speed are being considered, adaptor speed is increased
beyond what the distribution algorithm needs to beat the baseline, so that the cache’s effect can be
seen without meeting the adaptor bottleneck). This will account for some deviations in remote
read-times. The hardware values used will be described in each section.

11.1. Varying Adaptor Speed

[Figure: “Varying Adaptor Speed” — time (seconds) vs. adaptor speed ((data X 10^7) / second), plotting Local Read, CMDRA Remote Read, and SMDRA Remote Read.]

FIGURE 11-1. VARYING ADAPTOR SPEED DEMONSTRATES IMPROVED PERFORMANCE IN LINE WITH THE HYPOTHESIS.

Adaptor speed data confirms the hypothesis that adaptor speed increases result in better remote read
performance. Remote read times are better when there is fast cache available and adaptor speed
suitably eclipses harddrive speed. CMDRA requires an adaptor that is at least twice as fast as the
harddrive. SMDRA requires an adaptor that is three times as fast as the harddrive due to the increased
delay created from having to relay requests to secondary servers. This performance tops out after
meeting the performance of the connection media. CMDRA performs much better here since the client
directly communicates with the server containing its data. In SMDRA the client sends requests to a
primary server which relays the request to a secondary server containing the data. The secondary
server then responds directly to the client. This relaying of requests delays the client's reception of
the response tremendously.
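The saturation behavior described above can be sketched as a simple bottleneck model: the effective remote read rate is capped by the slowest stage in the path from server cache to client, which is why performance tops out at the connection media's rate. This is an illustrative model, not HVNS code; the class and method names are hypothetical, and the sample values come from Table 11-1.

```java
// A minimal bottleneck sketch (illustrative only, not the HVNS implementation).
public class BottleneckModel {
    /** Effective data/sec for a direct (CMDRA-style) remote read:
     *  capped by the slowest stage in the path. */
    static int directRate(int cacheRate, int adaptorRate, int mediaRate) {
        return Math.min(cacheRate, Math.min(adaptorRate, mediaRate));
    }

    /** Seconds to read 'data' units remotely; relayDelay models the extra
     *  primary-to-secondary hop that SMDRA pays (0 for CMDRA). */
    static double readTime(int data, int rate, double relayDelay) {
        return (double) data / rate + relayDelay;
    }

    public static void main(String[] args) {
        // Table 11-1 values: cache 1200, adaptor 125, media 125 data/sec.
        int rate = directRate(1200, 125, 125);
        System.out.println(rate);                       // adaptor/media bound: 125
        System.out.println(readTime(10000, rate, 0.0)); // CMDRA-style read: 80.0 s
    }
}
```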

11.2. Varying Cache Size

[Line chart: "Varying Cache Size with constant Cache Replenishment"; x-axis: cache size (data); y-axis: time (seconds); series: Local Read, CMDRA Remote Read, SMDRA Remote Read.]

FIGURE 11-2. VARYING CACHE SIZE DECREASES PERFORMANCE, CONTRARY TO THE HYPOTHESIS BECAUSE OF THE CACHE REPLENISHMENT
EMPLOYED.

The first benchmarks maintained the naïve cache replenishment algorithm. Figure 11-2 demonstrates
that increasing cache size actually decreases performance unless it is increased to the point where it
stores all of the requestable data. This is completely contrary to the hypothesis. It occurs because of
the manner in which CMDRA and SMDRA handle their caches. They initially fill their caches with all
data that they are sent. As they retrieve information from the cache for the client, they fetch
additional information from the harddrive to stay at capacity. These refill requests are not free: they
consume both the cache's and the harddrive's available operations. As the cache grows, the algorithm
therefore gets slower because it takes an increasing number of operations to keep it full. This
slowdown continues until the cache is large enough to contain all available data, at which point the
client is served all data from the cache immediately and completes all read operations before it can be
bogged down by the server's incessant filling of the cache.
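A minimal sketch of this contention (illustrative only, not the HVNS implementation) shows why constant replenishment hurts: refill writes share the cache's per-tick operation budget with client reads, so an always-replenishing cache serves the client at roughly half rate.

```java
// Toy model: each tick grants the cache a fixed operation budget; serving
// the client costs one op, and (when enabled) each serve triggers a refill
// write that costs another op from the same budget.
public class RefillContention {
    /** Ticks needed to serve 'total' client reads given 'ops' cache
     *  operations per tick, with or without naive replenishment. */
    static int ticksToServe(int total, int ops, boolean refill) {
        int served = 0, ticks = 0;
        while (served < total) {
            int budget = ops;
            while (budget > 0 && served < total) {
                served++;                            // one op: serve the client
                budget--;
                if (refill && budget > 0) budget--;  // one op: refill write
            }
            ticks++;
        }
        return ticks;
    }

    public static void main(String[] args) {
        System.out.println(ticksToServe(1000, 100, false)); // 10 ticks
        System.out.println(ticksToServe(1000, 100, true));  // 20 ticks
    }
}
```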

[Line chart: "Varying Cache Size with no Cache Replenishment"; x-axis: cache size (data); y-axis: time (seconds); series: Local Read, CMDRA Remote Read, SMDRA Remote Read.]

FIGURE 11-3. VARYING CACHE SIZE WITHOUT CACHE REPLENISHMENT DEMONSTRATES IMPROVED PERFORMANCE IN AGREEMENT WITH
THE HYPOTHESIS.

Figure 11-3 demonstrates what happens when this cache refill policy is disabled. Here the read times
correspond to expectations. The take-away is that use of a cache always requires paying the cost of
slow harddrive access. The question is whether the client will pay for that access or whether the server
has enough lead time to pay it for the client. Both CMDRA and SMDRA pay this access up front during
the initial data store event, when they store all data on the harddrive and a portion of that data in the
cache as well. With cache refill disabled, the client gains the fast cache access without having to pay the
harddrive access. When cache refill is enabled, the client is forced to pay for the access as the servers
continue to fill the cache needlessly which exhausts the cache’s and the harddrive’s ability to do useful
work.

11.3. Varying Cache Speed

[Line chart: "Varying Cache Speed"; x-axis: cache speed (data/second); y-axis: time (seconds); series: Local Read, CMDRA Remote Read, SMDRA Remote Read.]

FIGURE 11-4. INCREASING CACHE SPEED IMPROVES PERFORMANCE IN AGREEMENT WITH THE HYPOTHESIS.

Cache speed data also confirms the hypothesis that faster caches result in improved remote read times.
In this case, increasing the cache speed improves performance until reaching the limits of the
connection adaptor. Of note is the performance of SMDRA here. The adaptor speed for this
experiment allows CMDRA to achieve remote read performance that is better than baseline. SMDRA
requires a significantly faster adaptor in order to realize the same performance. However, this
experiment still demonstrates that cache speed increases improve the performance of both distribution
algorithms.

11.4. Varying Server Quantity

[Line chart: "Varying Server Quantity (Single Client Adaptor)"; x-axis: server quantity; y-axis: time (seconds); series: Local Read, CMDRA Remote Read, SMDRA Remote Read.]

FIGURE 11-5. INCREASING SERVER QUANTITY WHEN THE CLIENT POSSESSES ONLY A SINGLE ADAPTOR OF COMPARABLE PERFORMANCE TO
THE SERVERS’ ADAPTORS RESULTS IN NO PERFORMANCE GAIN, IN AGREEMENT WITH THE HYPOTHESIS.

Varying server quantity with only a single adaptor / single direct connection to the client greatly reduces
the performance of the DRAs. The adaptor becomes saturated and congested, slowing down the
retrieval of data.

[Line chart: "Varying Server Quantity" (multiple client adaptors); x-axis: server quantity; y-axis: time (seconds); series: Local Read, CMDRA Remote Read, SMDRA Remote Read.]

FIGURE 11-6. INCREASING SERVER QUANTITY WHEN MULTIPLE ADAPTORS ARE PRESENT ON THE CLIENT IMPROVES PERFORMANCE IN
AGREEMENT WITH THE HYPOTHESIS.

Varying server quantity with multiple adaptors / direct connections to the client confirms the hypothesis
that having more servers improves performance of CMDRA until these adaptors become saturated and
performance degrades. SMDRA does not show any improvement at all because all requests are
channeled through the primary server. The primary server's connection adaptor is saturated from the
onset and shows no improvement over the single-client-adaptor model.

11.5. Varying Redundancy

[Line chart: "Varying Server Redundancy"; x-axis: redundancy; y-axis: time (seconds); series: Local Read, SMDRA Remote Read.]

FIGURE 11-7. INCREASING DATA REDUNDANCY RESULTS IN NO PERFORMANCE GAIN FOR SMDRA SINCE THE PRIMARY SERVER’S ADAPTOR IS
THE BOTTLENECK.

Server redundancy does not improve SMDRA’s performance since the system is still limited to the speed
of the primary server’s adaptor. See Section 11.4 for further information.

Those who think ‘Science is Measurement’ should search Darwin’s works for numbers and
equations.

- David Hunter Hubel

12. Simulator Comparison

The degree of difference between simulators makes it difficult to directly compare the running-time
performance of one simulator against another, even when simulating “identical” network topologies
with nodes running identical protocols. This is due to differences in implementation languages and
variations in the implementations themselves. Succinctly, if it isn’t the same code running on the same
machine under the same conditions, it isn’t identical. Additionally, since no other simulators model this
problem domain, the results obtained from HVNS cannot be compared against theirs. This does not
mean that no comparisons can be made.

The features of HVNS can be qualitatively assessed against the features of other simulators currently
being used, including protocol support and the complexity of the network devices being modeled.
Configuration files can be compared for ease‐of‐use and expressiveness. The quantity of nodes and
the complexity of topologies supported may also be compared. Other aspects of simulator architecture
and design can be compared to demonstrate how extensible they are for future use and reuse. The
following discussion is designed to give a high-level overview of these simulators to
provide the reader with a flavor for their implementation and capabilities. Injected into this discussion
will be the occasional comparison between the discussed simulator’s functionality and the functionality
of HVNS or HVNSL.

The simulators chosen for comparison are Network Simulator version 2 (ns-2), Java in Simulation Time
(JiST), and OMNeT++.

12.1. ns-2

ns-2 is a discrete event network simulator. It is the de facto standard used by the network protocol
research community to simulate a wide range of network protocols for the application, transport,
routing, and link layers, including HTTP, UDP, TCP, RTP, SRM, CSMA/CD, etcetera [30]. It was
originally developed at UC Berkeley and is now maintained by a multitude of researchers across several
institutions. It is implemented in C++ and MIT Object Tool Command Language (OTcl) [31][32].

The blend of C++ and OTcl is a major source of ns-2’s power and complexity. OTcl itself is a scripting
language that was designed for application configuration. It is a syntactic front-end for C with all the
functionality that entails [33]. The simulation itself is coded in C++ but has OTcl objects that are linked
to these C++ objects. This gives the OTcl-based configuration language powerful control to modify
aspects of the simulation environment and attributes of the network objects from the scripting language
without the need for recompilation. This power comes at a price. Modifications and extensions to ns-2
require the modification of C++, OTcl, or both [5]. OTcl itself is often cited as a major hurdle for
configuring simulations and implementing protocols [5].

In contrast, HVNS is implemented entirely in Java, which has a much larger user base than OTcl [34].
Extensions to HVNS require extending the network simulator base classes and implementing the
necessary network, computational, and simulation interfaces. HVNS is configured with HVNSL, a
domain-specific language designed specifically to support the creation of topologies of network objects
and the modification of their attributes. HVNSL only provides support for Java objects and the
invocation of a subset of their methods. It does not support looping, conditionals, native class
definitions, or functions. It was designed with few keywords so that it would be easily learned.

Architecturally, both ns-2 and HVNS are single-processor programs with a single event loop which
executes events sequentially. This limits scalability and affects performance. Ns-2 programs simulating
200 seconds of clock-time on networks of hundreds of nodes can take hours to complete [35].
Comparatively, HVNS can simulate networks of this size with the distribution of hundreds of thousands
of pieces of data representing hundreds of thousands of time-steps in an hour of clock time. HVNS
simplifies the protocol stack and abstracts out its performance through the operation-bound
simulatables. More complicated protocols may have greater transit time or be able to perform a smaller
number of operations per time interval. This simplification allows HVNS to model topologies and data
sets that are much larger than ns-2.

Ns-2’s simulation relies upon a model of nodes and links. Agents implement protocols which are
installed as event handlers on these nodes. Entities respond to events which they schedule for
themselves to handle at some point in the future. HVNS is similar as it has nodes and connection media.
Nodes are connected to media via adaptors. ComputerHardwareNodes implement an interface that
allows DRAs (agents) to be installed upon them to handle messages. Operation bound simulatables
provide a hook to allow any event handler that is implemented to be affected by performance
boundaries. HVNS allows any entity to schedule events to any other entity (including itself) in contrast
to ns-2 which only allows self-scheduling [36]. HVNS’s any-scheduling forces entities to communicate
via events and message passing which brings simulation time semantics to all interactions and not just
agent-level entities as in ns-2.
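The any-scheduling model described above can be sketched as a single event loop over a priority queue ordered by timestamp, where any handler may schedule further events for any entity, including itself. The names are illustrative, not the HVNS API.

```java
import java.util.PriorityQueue;

// Minimal single-threaded discrete event loop sketch (illustrative, not HVNS).
public class AnyScheduler {
    static final class Event {
        final double time;
        final Runnable handler;
        Event(double time, Runnable handler) { this.time = time; this.handler = handler; }
    }

    private final PriorityQueue<Event> queue =
        new PriorityQueue<>((a, b) -> Double.compare(a.time, b.time));
    double now = 0;  // current simulation time

    /** Any entity may call this to schedule work on any entity. */
    void schedule(double delay, Runnable handler) {
        queue.add(new Event(now + delay, handler));
    }

    /** Execute events in timestamp order; handlers may schedule more. */
    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time;   // advance simulation time to the event
            e.handler.run();
        }
    }
}
```

A handler fired at t=1 can schedule work five time units later, and the loop still delivers everything in temporal order.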

The network model and protocol stack of ns-2 is rich and has been extended to a number of networking
standards. The TCP/IP and CSMA protocols were at one point nearly identical to those included in Linux
distributions. The network can have nodes dynamically join and leave. Connections can have degraded
performance or be severed entirely. Data is converted to bit-streams during transmission across the
physical layer. Transmissions can be interrupted, pre-empted, and corrupted.

HVNS has a very simple network model and protocol stack that does not support an unreliable network
or congestion control. Network nodes and connections maintain un-degraded operation throughout.
HVNS abstracts out much of the network protocol stack to focus specifically on how hardware affects
performance of data distribution. HVNS’s physical layer transmits packets of data that are not
converted into bits. These transmissions always reach their intended target and are never corrupted.
HVNS can simulate the time-delay effects of these network properties by increasing transmission times.

Ns-2 provides comprehensive tracing and analysis tools for examining events that occur during
simulation runs. It also allows a user to peek into and monitor events in an actively running simulation
environment. HVNS provides support for logging events on a node or simulator level. HVNS includes
aggregation tools which support the analysis of very specific local and remote read times for the client
node, but this is not customizable.

12.2. JiST

JiST is a discrete event simulation framework designed to provide transparent support for simulation
time semantics in Java and to be scalable to network sizes beyond what ns-2 or GloMoSim can
reasonably support [9].

JiST simulations are written in Java and compiled with the standard Java compiler. The simulation is
then run through JiST which rewrites the bytecode to include simulation time semantics. The modified
code is run on the simulation kernel of the virtual machine which has the embedded simulation event
loop [37].

This process avoids the need to learn a domain-specific language intermediary like OTcl, HVNSL, or
NED. It also avoids the need to represent method invocation through explicitly created events as in
HVNS. Objects are packaged within entities which maintain their own simulation time which is
dependent upon program progress. Entity references to one another are replaced by separators which
act as proxies to the referenced entity. Entity method invocations are events. Method invocation
occurs in simulation time meaning that the proper temporal ordering of execution is maintained.
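The separator idea can be illustrated with a toy Java dynamic proxy. This is not JiST's actual implementation (JiST works by bytecode rewriting), but it shows the core trick: entity references are replaced by proxies that record method invocations as queued events rather than executing them immediately.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Toy separator sketch (illustrative only, not JiST itself).
public class SeparatorDemo {
    interface Entity { void receive(String msg); }

    static class EventQueue {
        final List<Runnable> events = new ArrayList<>();

        /** Wrap 'target' so each method call is queued as an event. */
        Entity separatorFor(Entity target) {
            return (Entity) Proxy.newProxyInstance(
                Entity.class.getClassLoader(),
                new Class<?>[] { Entity.class },
                (proxy, method, args) -> {
                    // Record the invocation instead of performing it now.
                    events.add(() -> {
                        try { method.invoke(target, args); }
                        catch (Exception e) { throw new RuntimeException(e); }
                    });
                    return null;
                });
        }

        /** Deliver queued invocations (a real kernel orders them by time). */
        void drain() {
            while (!events.isEmpty()) events.remove(0).run();
        }
    }
}
```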

JiST exploits the parallelism opportunity present for events scheduled to occur at the same simulation
time. Such events are executed in parallel across synchronized threads. This parallelism can be
extended to occur across multiple networked hosts. Further, JiST implements heuristics to migrate
entities across hosts to maintain load balance. Entities which communicate regularly with one another
are migrated to be on the same host or on “adjacent” hosts in order to minimize remote communication
overhead.

JiST provides further speedup by supporting “optimistic” execution of events. Scheduled events cannot
normally be executed until simulation time catches up with the scheduled time of the event. This
restriction is in place because intermediate events may be scheduled within the intervening period of
time which may change the state of an entity. JiST optimistically executes some of these events before
it is safe to do so, but maintains a snapshot of entity state before execution. This allows JiST to undo
the execution if an intermediate event is scheduled. However, if no such event occurs, JiST has realized
performance gain.
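A toy sketch of this snapshot-and-rollback idea (illustrative only, not JiST's implementation): an event is executed early against saved state, and a straggler event with an earlier timestamp forces the speculative change to be undone before the straggler is applied.

```java
// Minimal optimistic-execution sketch for a single entity (illustrative).
public class Optimistic {
    int state;                        // the entity's observable state
    private int snapshot;             // saved state for possible rollback
    private double executedAt = -1;   // time of the speculatively run event

    Optimistic(int state) { this.state = state; }

    /** Run an event early, saving the prior state in case of rollback. */
    void executeOptimistically(double time, int newState) {
        snapshot = state;
        executedAt = time;
        state = newState;
    }

    /** Deliver an event; a straggler earlier than the speculative event
     *  rolls the entity back, applies the straggler, and returns true
     *  (the caller must then re-run the later event). */
    boolean deliver(double time, int newState) {
        if (executedAt >= 0 && time < executedAt) {
            state = snapshot;   // undo the speculative execution
            state = newState;   // apply the straggler instead
            return true;
        }
        return false;
    }
}
```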

JiST represents a completely different paradigm of discrete event simulation which is compelling and
worthy of discussion, but it is not directly comparable to HVNS or the other simulators described in
this chapter.

12.3. OMNeT++

OMNeT++ is a parallel discrete event simulator. It is written in C++ though it has extensions to support
other programming languages like Java [38]. It provides the ability to partition the simulation
environment into 32k+ partitions for parallel operation unlike the single threaded models of ns-2 and
HVNS [39]. It is primarily used to simulate network communications. However, it provides a framework
for creating models out of hierarchies of custom modules for any problem domain.

In OMNeT++’s framework, modules communicate via channels connected either to a module directly or
to a gate/port on the module. This is similar to HVNS, where nodes are connected via adaptors to
connection media. These
channels can be configured with propagation, bit error rate, and data rate values. HVNS’s connection
media can have transit times and operations per time but not bit error rate.

OMNeT++’s modules are configured through parameters assigned through NED (Network Description)
or configuration files [40]. NED handles the creation of network topologies as well as modules.
Modules and algorithms are programmed through C++ classes. Messages that modules can receive are
defined in message definition files. NED allows the user to create simple modules from C++ bases or
other NED modules. These simple modules can be further combined to form compound modules and
hierarchies. HVNSL provides the ability to instantiate network entities with a Java base and to customize
them through method invocation. However, HVNSL does not operate at the module level and does not
provide the ability to create compound objects of primitives/Java objects.

OMNeT++ includes an Eclipse-based IDE that allows for visual setup and monitoring of the simulation
[41]. It provides the ability to analyze and display the results of running a simulation, built-in logging
facilities for every module, and libraries for statistical calculations to aid
in result analysis.

A conclusion is the place where you got tired of thinking.

- Martin H. Fischer

13. Conclusions

This project describes the architecture and implementation of Hardware Varying Network Simulator
(HVNS), a discrete event simulator. The impetus behind its creation stems from the observation of two
phenomena. The first is the move toward distributed data storage due to increasing data production
rates. The second is the rapidly increasing speeds of network communication devices. This presented a
question: at what point does distributed storage become not just necessary due to capacity constraints
but also advantageous due to superior transmission speeds despite network communications overhead?
In other words: under what circumstances can remotely reading a range of data beat locally reading
that same data?

Simulation presents an avenue for answering that question. Hardware and network components can be
modeled as agents that can send and receive messages via events scheduled with the simulation.
These agents need to be operation bound to mimic the limitations of their real-world counterparts.
These agents use event rescheduling and refresh messages to ensure that their operations are bound
and that they react to events appropriately. Distribution and Retrieval Algorithms can be modeled by
FSA and implemented using the state design pattern to aid in development and debugging.

ANTLR provides an effective framework to implement the configuration language HVNSL. HVNSL allows
simulations to be set up outside of Java code and development. HVNSL is a simple frontend for network
entity creation that relies upon an underlying factory class and methods. Node creation and connection
is simple since address assignment and connection creation are handled behind the scenes, allowing the
end-user to concentrate on topology creation and hardware attribute modification. It does so at the
cost of complex constructs like conditionals, looping, and native HVNSL objects.

Importantly, the benchmarks demonstrate that HVNS can be used to provide insight into how hardware
affects the distribution and retrieval of data. As expected, greater adaptor bandwidth provides
improved remote performance, with adaptor bandwidth approximating 2 Giga-data/time being the
point where remote read-times beat local read-times. Greater cache speeds can result in improved
remote read times. Cache sizes do not increase performance unless an intelligent caching algorithm is
used that properly pays the harddrive access up front for the client; the implementation in place
constantly fills the cache, which slows down the client’s reading, and disabling this refill increases
performance. Increasing server count and data redundancy can also improve performance so long as
the client’s adaptor does not reach a saturation point. All of this means that remote read-times can
beat local read-times for large amounts of data when using advanced networking hardware at speeds
that are currently unavailable.

There are a variety of discrete event simulators employed in research which each have their strengths
and weaknesses. HVNS compares favorably for this problem. It is able to support the simulation of
thousands of nodes and the distribution of hundreds of thousands of pieces of data. HVNS’s strength is
that it was designed for this problem domain, uses a modern language and design patterns, and has a
simple configuration language. Its weaknesses are that it is fixed in this problem domain, lacks
visualization, and has a configuration language that lacks some expressiveness. A prudent course might
be to examine OMNeT++ more closely, since it has extensive support for these features and lacks the
harsh learning curve of ns-2. Additionally, JiST provides an interesting (if unproven) paradigm that
would greatly simplify the construction of any simulation and is worth looking into as well.

To raise new questions, new possibilities, to regard old problems from a new angle, requires
creative imagination and marks real advance in science.

- Albert Einstein

14. Future Work

HVNS has demonstrated that it is a capable simulator which can simulate complex topologies, support
an arbitrary number of simulatable agents, and which produces results in line with expectations.
However, there are a number of improvements that could be made to increase its versatility and
performance.

14.1. Architecture

The simulator architecture only supports operation on a single processor. However, any of the events
that occur in the system at the same point in time can safely be simulated in parallel which would result
in a boost to performance. It would be advantageous for the simulator to be multithreaded and support
the concurrent operation of multiple agents as modern PC architecture moves away from clock
frequency in favor of increasing processor core counts. Ideally, the abstraction provided for this
functionality would allow for the use of any arbitrary processor on a distributed network of machines.
The speedup would allow for larger data sets to be distributed and retrieved which would allow for
more accurate simulation results.

The network model in place greatly simplifies many aspects of network communication. Packets are not
lost, they are not replayed, and they are never corrupted. Network connections never experience
degraded performance or disrupted operation. These aspects can be approximated in the current
simulation by adjusting transit time or bandwidth higher or lower depending upon the degree of
performance loss expected. These approximations may be difficult to determine and the degree of
performance loss may itself need to be simulated. Future versions of HVNS should model these features.
Adding these features would also necessitate the implementation of an enhanced transport protocol
handler as well as an improved network protocol handler.

14.2. HVNSL

The configuration files inside of a configuration set share a great deal of commonality since each differs
only by a single value for one hardware/DRA attribute. HVNSL could simplify this by implementing an
include statement that would allow a master file of common values to be shared across multiple
configuration files. Alternatively, HVNSL could be expanded to provide support for multiple simulation
runs with a looping construct that could specify the range of values being tested.

14.3. Algorithm Design

The DRAs designed are relatively simple and can be improved in a number of ways to better model
facets of data distribution that will be encountered in real-world settings.

CMDRA and SMDRA assume a static network that runs without hardware failure and with guaranteed
uptime. Network infrastructure and computer hardware often fail unpredictably, either through
diminished performance or total loss of operation. Connections become severed and harddrives
crash. New DRAs should be designed to take this into account by monitoring the availability of data on
network servers and redistributing data to alternate devices to maintain availability.

CMDRA and SMDRA have naïve cache prediction capabilities. The only criterion used to select data for
cache storage is whether the data has yet been requested from the client. If the data hasn’t yet been
requested, then the DRA knows that it will be in the future and as such copies it from the harddrive to
the cache. It does this without considering whether there is a pattern to the requests. This works due
to the nature of the experiments being run on HVNS where it is known that the client is going to request
all data that a server is storing at some point in the future and that it is only going to request any
individual piece of data once. It is inappropriate for alternative use cases which may be found in
residential or corporate environments where arbitrary data is requested at arbitrary intervals of time.
Additionally, the cache replenishment algorithm itself needs to be improved so that it only fills the cache
in the lulls between data requests, ensuring that the client does not have to pay the cost of the
harddrive access.
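The suggested improvement could be sketched as a guard that permits refills only when no client request is pending, so replenishment never competes with a read. The class and method names below are hypothetical, not HVNS code.

```java
// Sketch of lull-only cache replenishment (hypothetical, not HVNS code).
public class LullRefillCache {
    private int pendingRequests = 0;  // outstanding client reads
    int refills = 0;                  // refills performed during lulls
    int blockedRefills = 0;           // refills deferred because a read waits

    void requestArrived() { pendingRequests++; }
    void requestServed()  { pendingRequests--; }

    /** Attempt one refill operation; succeeds only during a lull so the
     *  client never pays the harddrive access. */
    boolean tryRefill() {
        if (pendingRequests == 0) {
            refills++;
            return true;
        }
        blockedRefills++;
        return false;
    }
}
```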

14.4. Benchmarks

There are other metrics that could be considered when judging a DRA’s performance that are not
considered in this project. However, data is currently recorded which can be analyzed to produce these
metrics. These metrics include: network load-balancing, server load-balancing, etc. Network load-
balancing is a measure of the amount of network traffic being handled by nodes in the network and can
demonstrate where network links are being overloaded. Server load-balancing is a measure of the
amount of work servers handle as a part of DRA operation and can indicate shortcomings in a DRA’s
distribution of data requests/responses.
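As one possible way to compute such a metric from the recorded data (an assumption, not a metric the project defines), the coefficient of variation of per-server work counts yields 0 for a perfectly balanced DRA and grows with imbalance.

```java
// Hypothetical load-balance metric over recorded per-server work counts.
public class LoadBalance {
    /** Coefficient of variation: standard deviation divided by the mean.
     *  0 means every server handled identical work. */
    static double coefficientOfVariation(double[] work) {
        double mean = 0;
        for (double w : work) mean += w;
        mean /= work.length;
        double var = 0;
        for (double w : work) var += (w - mean) * (w - mean);
        var /= work.length;
        return Math.sqrt(var) / mean;
    }
}
```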

References
1. The Network Simulator ‐ ns‐2. [Online] [Cited: 09 02, 2009.]
http://nsnam.isi.edu/nsnam/index.php/User_Information.

2. Gross, Grant. Amount of data stored doubles in three years. Network World. [Online] 10 23, 2003.
http://www.networkworld.com/news/2003/1028amounofda.html.

3. Schreier, Paul. HPC Applications: CERN. hpc High-Performance Computing Projects. [Online] 08 2008.
[Cited: 08 25, 2009.] http://www.hpcprojects.com/features/feature.php?feature_id=207.

4. WLCG ‐ Service Challenges. Worldwide LHC Computing Grid Project. [Online] [Cited: 09 02, 2009.]
http://lcg.web.cern.ch/LCG/documents/ServiceChallenges.pdf.

5. Yet Another Network Simulator. Lacage, Mathieu and Henderson, Thomas R. 2006, Workshop on
Ns‐2 (WNS2).

6. Murphy, David. Western Digital Launches World-First 2TB Hard Drive. PC World. [Online] PC World
Communications, Inc., 1 27, 2009. [Cited: 05 11, 2010.]
http://www.pcworld.com/article/158374/Western_Digital_Launches_WorldFirst_2TB_Hard_Drive.html
?tk=rss_news.

7. Ns3‐Project Goals. Henderson, Thomas R. Pisa : s.n., 2006, Workshop on ns‐2: The IP Network
Simulator.

8. Strogatz, Steven. The End of Insight. [book auth.] John Brockman. What is Your Dangerous Idea?
Today's Leading Thinkers on the Unthinkable. New York : Harper Perennial, 2007.

9. Barr, Rimon. An Efficient, Unifying Approach to Simulation Using Virtual Machines. PhD Dissertation.
s.l. : Cornell University, 2004.

10. CERN openlab. CERN openlab. [Online] 10 09, 2009. [Cited: 10 15, 2009.]
http://proj‐openlabdatagrid‐.

11. Parallel Discrete Event Simulation. Fujimoto, Richard M. 0001-0782, New York, NY : ACM, 1990, Vol.
33.

12. Tutorial on Agent-Based Modeling and Simulation. Macal, Charles M and North, Michael J. Orlando,
Florida : s.n., 2005.

13. Linz, Peter. Formal Languages and Automata (4th ed.). Sudbury, MA : Jones and Bartlett, 2006.

14. OSI Reference Model -- The ISO Model of Architecture for Open Systems Interconnection.
Zimmermann, Hubert. 4, 1980, Vol. 28.

15. Ahmed, S and Raja, M. Y. A. Optical Communications, Internet2 and Global Reliance on Informatics.
High Capacity Optical Networks and Enabling Technologies, 2008. HONET 2008 International Symposium
on. 2008.

16. Gosavi, A. Simulation Modeling and Analysis. New York : McGraw-Hill, 2007.

17. Spall, James C. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and
Control. [Online] 9 2007. [Cited: 04 22, 2010.] http://www.jhuapl.edu/ISSO/PDF-txt/SMC_Intro_000.ppt.

18. Gosling, James, et al. The Java Language Specification (Third Edition). Santa Clara, California :
Addison Wesley, 2005.

19. Lindholm, Tim and Yellin, Frank. Java(TM) Virtual Machine Specification, The (2nd Edition). Palo
Alto, California : Prentice Hall, 1999.

20. Gamma, Erich, et al. Design Patterns: Elements of Reusable Object-Oriented Software. s.l. : Addison-
Wesley, 1995.

21. Parr, Terence. About the ANTLR Parser Generator. ANTLR v3. [Online] ANTLR Project. [Cited: April
15, 2010.] http://www.antlr.org/about.html.

22. —. Innovations and Contributions to Computer Science. ANTLR v3. [Online] ANTLR Project, February
1999. [Cited: April 15, 2010.] http://www.antlr.org/contributions.html.

23. —. Article List. ANTLR v3. [Online] ANTLR Project, March 5, 2010. [Cited: April 15, 2010.]
http://www.antlr.org/article/list.

24. —. Grammar List. ANTLR v3. [Online] ANTLR Project, March 26, 2010. [Cited: April 15, 2010.]
http://www.antlr.org/grammar/list.

25. ECMA. ECMA-334 4th Edition. C# Language Specification. Geneva : ECMA Interational, 2006.

26. Python Software Foundation. Python Language Reference, The. Python. [Online] Python Software
Foundation, May 2010. [Cited: May 11, 2010.] http://docs.python.org/reference/index.html.

27. Dropbox. Dropbox. Dropbox. [Online] 2010. [Cited: May 12, 2010.] https://www.dropbox.com/.

28. Google Inc. Google docs. Google docs. [Online] Google inc., 2010. [Cited: May 12, 2010.]

29. Hewlett-Packard Development Company, L.P. HP's MediaSmart Server for your home. HP Laptops,
Desktops, Servers, and more. [Online] Hewlett-Packard Development Company, L.P., 2010. [Cited: May
12, 2010.] http://www.hp.com/united-states/campaigns/mediasmart-server/.

30. The Network Simulator ns-2: Validation Tests. The Network Simulator ns-2. [Online] [Cited: May 12,
2010.] http://www.isi.edu/nsnam/ns/ns-tests.html.

31. Stroustrup, Bjarne. The C++ Programming Language (Third ed.). s.l. : Addison-Wesley Professional,
1997.

32. OTcl. OTcl. [Online] [Cited: May 12, 2010.] http://otcl-tclcl.sourceforge.net/otcl/.

33. Extending Tcl for Dynamic Object-Oriented Programming. Wetherall, D and Lindblad, C.J. Toronto,
Ontario : USENIX Tcl\Tk Conference, 1995.

34. DedaSys LLC. Programming Language Popularity. Programming Language Popularity. [Online] April
22, 2010. [Cited: May 9, 2010.] http://langpop.com/.

35. Barr, Rimon and Haas, Zygmunt J. JiST / SWANS. JiST / SWANS. [Online] [Cited: May 10, 2010.]
http://jist.ece.cornell.edu/.

36. Chung, Jae and Claypool, Mark. NS by Example. [Online] [Cited: May 8, 2010.]
http://nile.wpi.edu/NS/.

37. Barr, Rimon. JiST / SWANS. docs. [Online] September 24, 2003. [Cited: May 9, 2010.]
http://www.cs.cornell.edu/barr/repository/jist/.

38. Varga, András. OMNeT++ Community Site. OMNeT++. [Online] April 20, 2010. [Cited: May 7, 2010.]
http://www.omnetpp.org/.

39. —. OMNeT++ API Reference 4.0. Parallel simulation extension (doxygen). OMNeT++. [Online]
February 26, 2009. [Cited: May 8, 2010.]
http://www.omnetpp.org/doc/omnetpp40/api/group__ParsimBrief.html.

40. —. OMNeT++ User Manual (OMNeT++ version 4.0). OMNeT++. [Online] [Cited: May 9, 2010.]
http://www.omnetpp.org/doc/omnetpp40/manual/usman.html.

41. —. What is OMNeT++. OMNeT++. [Online] [Cited: May 8, 2010.]
http://www.omnetpp.org/home/what-is-omnet.

42. WNS2 2008: The Second International Workshop on NS-2. [Online] [Cited: 10 15, 2009.]
http://www.wns2.org.
