
Abstract Initiative

Performance Analysis & Tuning / Capacity Management / Cluster (HA) & Grid Computing / LDAP

www.AbstractInitiative.com

Viral Grid Computing


Contents
Summary
The Workflow
Differences
Philosophy
Scalability
More Complex Architectures
Workloads
Permission, Trademarks, and Copyrights

Abstract Initiative, LLC All Rights Reserved 2|Page


Summary
This computing model has been around for a long time. I know this because the
principal architects here at Abstract Initiative, LLC have been doing it for a long
time - around 10 years in some form or another.

You've seen it on news websites, possibly on television. It's cheap, powerful,
efficient, and - most importantly - incredibly effective: thousands of desktop
computers infected with some sort of computer virus, receiving instructions from
an IRC chat room and acting on a task or tasks en masse.

The Workflow
The basic workflow is this:

1) A job source fills a queue with atomic tasks. (The queue may be well satisfied
by a database and an application to insert/update/delete records - a simple
queue.)
2) Preconfigured worker nodes pull tasks from the queue and update their status
(running/run time/output/errors/completion/etc.).
3) A "queue manager" process checks for errors, failed jobs, etc. and, depending
on the desired queue policies, can resubmit jobs that do not complete correctly,
or can stall/isolate failed jobs if necessary.
4) An application reports on queue status, performance, errors, and job completion.

This scenario works very well for repetitive, high volume batch-type work - but the
model can easily be adjusted for online/transactional type work as well. (The type of
work commonly performed by MPI environments - with a few changes to the scheduling
model.)
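The four steps of the workflow can be sketched in a few lines of Python. SQLite stands in for the queue database here, and the table layout, column names, and the stand-in "work" are illustrative assumptions, not a prescribed schema:

```python
import sqlite3

# Minimal sketch of the workflow: a database table acts as the queue.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tasks (
    id INTEGER PRIMARY KEY, payload TEXT,
    status TEXT DEFAULT 'queued', attempts INTEGER DEFAULT 0)""")

def submit(payload):
    # 1) A job source fills the queue with atomic tasks.
    db.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))

def pull_and_work():
    # 2) A worker node pulls a task and updates its status.
    row = db.execute("SELECT id, payload FROM tasks "
                     "WHERE status = 'queued' LIMIT 1").fetchone()
    if row is None:
        return None
    task_id, payload = row
    db.execute("UPDATE tasks SET status = 'running', attempts = attempts + 1 "
               "WHERE id = ?", (task_id,))
    try:
        result = payload.upper()          # stand-in for the real unit of work
        db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        return result
    except Exception:
        db.execute("UPDATE tasks SET status = 'failed' WHERE id = ?", (task_id,))

def requeue_failed(max_attempts=3):
    # 3) The queue manager resubmits jobs that did not complete correctly.
    db.execute("UPDATE tasks SET status = 'queued' "
               "WHERE status = 'failed' AND attempts < ?", (max_attempts,))

def report():
    # 4) An application reports on queue status.
    return dict(db.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status"))

submit("unit of work")
pull_and_work()
print(report())  # {'done': 1}
```

In a real deployment with many concurrent workers, the "pull" step would need an atomic claim (a transaction, or the RDBMS's row-locking facilities) so two nodes can't grab the same task.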

Differences
Historically, in "HPC" or "grid" computing there are one or more "manager" or "head"
nodes from which jobs are initiated. The jobs are then initialized and sent to the
compute slices. Our model isn't much different, except that the traditional approach
relies on message-passing middleware and possibly some sort of distributed
shared-memory mechanism. These mechanisms usually require non-standard
programming and architectural knowledge on the part of the application developers
and can restrict the possible programming languages to those supported by the MPI
or NUMA/shared-memory middleware.

Because the tasks previously managed by an MPI suite (MPI_send/MPI_recv/etc.)
become simple communications between the compute slices and the "manager"
node(s), a significant layer of complexity is removed. Because this model does not
provide a shared-memory mechanism, attention must be paid at the workload
architecture level to the atomicity of units of work, and some facility must be
created to reassemble the "pieces" into the final product.

There is no specific operating system, programming language, high-level network
transfer protocol, queue software, hardware platform, or vendor that one needs
to use to implement this sort of computing platform. This decision can be made for
the specific workload, or for your infrastructure. We are effectively simplifying what
was previously a complex computing operation.

A common infrastructure in HPC/grid computing is to interconnect all of the nodes
with each other. This helps with shared-memory access and utilization, data
migration, and in some cases fencing in the event of node failure. Because we are
breaking our workload into smaller, discrete, bite-sized pieces, we shouldn't need
this additional infrastructure. The atomicity of the units of work should effectively
render this functionality unnecessary.

In the images below, the inter-node interconnects are red arrows. In real life, with a
large number of compute slices, these red arrows become expensive and heavily used
critical infrastructure.

[Figures: the base architecture compared with a fully interconnected grid; the
inter-node interconnects appear as red arrows.]
In addition to the hardware infrastructure simplicity, there are more benefits to Viral Grid
Computing with regards to batch work. Another common piece of the puzzle is some
sort of job scheduling software. There are many out there with broad ranges in price
and functionality, but this type of architecture is capable of negating the need for this
added complexity.

Philosophy
The source of this philosophy arguably comes from two sources:

1) The UNIX(R) philosophy of "do one thing, do it well," as evidenced by the large
number of small, single-function commands that specialize in specific tasks; and
2) The "virus" methodology of feeding work to an army of single- or limited-function
"muscle" nodes (sometimes known as zombies).

Scalability
The Queue: Depending on how the work queue(s) are implemented, you may be able
to scale the "manager" node(s) significantly in a horizontal fashion. RDBMS
technologies like Oracle RAC(R), MySQL NDB, or other "cluster-able" database
architectures can scale to the physical or logical limits of the RDBMS software. This
can also be leveraged to increase the availability of the "manager" nodes or to
increase their concurrent processing power.

Worker Nodes: Since the work is pulled, worked, and returned by the worker nodes,
in the simplest examples of this architecture your logical limits are:

1) Physical resource limitations (rack space, network/storage bandwidth,
network/storage ports, electricity, etc.)
2) The amount of available work and the amount of time/resources required to
complete that work.
3) The capabilities of the code written to complete each unit of work, and the ability
to reassemble the completed pieces (if necessary).

Reporting/Management Application: This is merely a user-queue interoperability
mechanism. Its scalability depends on the complexity of the workload(s) and the
capabilities and functionality of this application.

More Complex Architectures


Because this is nothing more than a multi-server queue (M/M/c in Kendall notation),
it can also be expanded into a multi-queue/multi-server queuing network (open or
closed). If you think of the base architecture as people lining up at a bank to get to
a teller, the more complex architectures can be seen as:

1) Going to the bank.
2) Then going to the DMV.
3) Then a short shopping spree at your local shopping mall.

The high-level description of this queuing network could simply be named "errands".
There is no rule that says you can't have more than one queue, or that your worker
nodes (compute slices) can't be multi-talented. You simply have to allow for this in
your workload design.



The same thing can be done with MPI, but for the most part this is fairly uncommon
because of the complexity of MPI architectures and the development time and effort
required for implementation. To implement this, the management application simply
needs to include the logic to migrate a job/task to another queue upon completion
(or assembly of completed jobs) from the previous queue.
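That migration logic can be sketched in Python. In-memory deques stand in for the real queues here, and the queue names simply mirror the bank/DMV/mall analogy - both are illustrative assumptions:

```python
from collections import deque

# A multi-queue "errands" network: when a task finishes in one queue,
# the management logic migrates it to the next stage.
PIPELINE = ["bank", "dmv", "mall"]
queues = {name: deque() for name in PIPELINE}
completed = []                        # errands that finished every stage

def submit(task):
    # New work always enters the first queue in the network.
    queues[PIPELINE[0]].append(task)

def step(queue_name, worker):
    """Work one task from a queue, then migrate it to the next queue."""
    if not queues[queue_name]:
        return
    task = queues[queue_name].popleft()
    result = worker(task)             # the compute slice does its one job
    nxt = PIPELINE.index(queue_name) + 1
    if nxt < len(PIPELINE):
        queues[PIPELINE[nxt]].append(result)   # migrate to the next stage
    else:
        completed.append(result)               # end of the errand

submit("task-1")
step("bank", lambda t: t + " banked")
step("dmv",  lambda t: t + " registered")
step("mall", lambda t: t + " shopped")
print(completed)  # ['task-1 banked registered shopped']
```

Note that the workers here are just callables: a "multi-talented" compute slice is simply a node that can service more than one queue.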

Workloads
Almost every IT shop has a workload or two that would benefit greatly from this sort
of architecture. An easy way of picking the right workload is to find one that has a
large number of discrete units of work, or a number of stages.

One type of application that comes to mind is a payroll system: in this case,
paycheck amounts, accrued vacation & sick time, and items of that sort. Traditionally
a payroll system will work through these calculations in a serial manner. To make
this more efficient, simply take the information needed to make these calculations,
split it up (by job band, by business unit, or whatever), and put it in a queue for a
compute node to process. Instead of processing each payroll instance one at a time,
you can send them out in blocks and reassemble them upon completion. This can be
done with MPI as well, but vendor support for this may vary. You may not even need
to make many changes to an existing payroll system to do it this way (licensing
costs may be another concern) - simply have a compute node per business unit (or
whatever your distinction may be) and use the database that stores this information
as an informal "queue".
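A sketch of that partitioning idea in Python - the record fields and the pay calculation are invented for illustration and stand in for whatever your payroll system actually computes:

```python
from itertools import groupby

# Illustrative payroll records; in practice these would come from the
# database that doubles as the informal "queue".
records = [
    {"employee": "a", "unit": "sales", "hours": 80, "rate": 20.0},
    {"employee": "b", "unit": "sales", "hours": 72, "rate": 25.0},
    {"employee": "c", "unit": "ops",   "hours": 80, "rate": 30.0},
]

def partition_by_unit(rows):
    # Split the work into blocks - here by business unit - so each block
    # can be handed to a separate compute node.
    rows = sorted(rows, key=lambda r: r["unit"])
    return {unit: list(group)
            for unit, group in groupby(rows, key=lambda r: r["unit"])}

def process_block(block):
    # What one compute node would run: a stand-in pay calculation.
    return {r["employee"]: r["hours"] * r["rate"] for r in block}

blocks = partition_by_unit(records)        # one queue entry per business unit
paychecks = {}
for unit, block in blocks.items():         # each iteration = one worker node
    paychecks.update(process_block(block)) # reassemble the completed pieces
```

The serial loop at the end is where the parallelism would go: each `process_block` call is an atomic unit of work that any worker node could pick up independently, with the final `update` acting as the reassembly facility.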

Permission, Trademarks, and Copyrights


The contents of this document were assembled from documents originally authored,
except for the “Summary” section and mention of specific vendor’s RDBMS products, by
Brian Horan between August 1998 and November of 2008, used with written permission
and copyright 2002-2009 Brian Horan.

UNIX is a registered trademark of The Open Group.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names
may be trademarks of their respective owners.

MySQL is a trademark of Sun Microsystems, Inc. in the United States, the European
Union and other countries. Abstract Initiative, LLC is independent of Sun Microsystems,
Inc.

