www.AbstractInitiative.com
You've seen it on news websites, possibly on television. It's cheap, powerful, efficient, and, most importantly, incredibly effective: thousands of desktop computers infected with some sort of computer virus, receiving instructions from some IRC chat room and acting on a task or tasks en masse.
The Workflow
The basic workflow is this:
1) A job source fills a queue with atomic tasks. (The queue may be implemented with nothing more than a database and an application to insert/update/delete records - a simple queue.)
2) Preconfigured worker nodes pull tasks from the queue and update their status (running/run time/output/errors/completion/etc.).
3) A "queue manager" process checks for errors/failed jobs/etc. and depending on
the desired queue policies can resubmit jobs that do not complete correctly, or
can stall/isolate failed jobs if necessary.
4) An application reports on queue status, performance, errors, job completion.
This scenario works very well for repetitive, high-volume, batch-type work - but the model can easily be adjusted for online/transactional work as well. (This is the type of work commonly performed by MPI environments, with a few changes to the scheduling model.)
Differences
Historically, in "HPC" or "grid" computing there are one or more "manager" or "head" nodes from which jobs are initiated; the jobs are then initialized and sent to the compute slices. That model isn't much different from the one described here, except for its reliance on message-passing middleware and possibly some sort of distributed shared-memory mechanism.
These mechanisms usually require non-standard programming and architectural
knowledge on the part of the application developers and can sometimes restrict the
possible programming languages to those supported by the MPI or NUMA/Shared
Memory middleware.
In the images below, the inter-node interconnects are red arrows. In real life, with a
large number of compute slices, these red arrows become expensive and heavily used
critical infrastructure.
Philosophy
This philosophy arguably comes from two sources:
1) The UNIX(R) philosophy of "Do one thing, do it well" as evidenced by the large
number of small, single-function commands that specialize in specific tasks. As
well as:
Scalability
The Queue: Depending on how the work queue(s) is/are implemented, you may be able to scale the "manager" node(s) significantly in a horizontal fashion. RDBMS
technologies like Oracle RAC(R), MySQL NDB, or other "cluster-able" database
architectures can scale to the physical or logical limits of the RDBMS software. This can
also be leveraged to increase the availability of the "manager" nodes or to increase the
concurrent processing power thereof.
Worker Nodes: Since the work is pulled, worked, and returned by the worker nodes independently, in the simplest examples of this architecture your logical limits are set by how fast the queue can serve tasks: worker nodes can simply be added until the queue saturates.
The high-level description of this queuing network could simply be named "errands".
There is no rule that says you can't have more than one queue, or that your worker nodes (compute slices) can't be multi-talented. You simply have to allow for this in your workload design.
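The multi-queue, multi-talented-worker idea can be sketched briefly. The queue names, task names, and skill list below are hypothetical, with in-memory deques standing in for the real queues:

```python
from collections import deque

# Two named queues; the names and tasks are illustrative assumptions.
queues = {
    "payroll": deque(["calc-taxes", "calc-benefits"]),
    "reports": deque(["monthly-summary"]),
}

def pull_task(skills):
    """A worker pulls from any queue it is configured ('talented') to serve."""
    for name in skills:
        q = queues.get(name)
        if q:
            return name, q.popleft()
    return None

# A multi-talented worker drains every queue it knows how to serve.
worker_skills = ["payroll", "reports"]
done = []
while (task := pull_task(worker_skills)) is not None:
    done.append(task)

print(done)
```

A single-talent worker would simply be configured with a one-entry skill list; the queues themselves never need to know which workers serve them.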
Workloads
Almost every IT shop has a workload or two that would benefit greatly from this sort of
architecture. An easy way of picking the right workload is to find one that has a large number of discrete units of work, or a number of stages.
One type of application that comes to mind is a payroll system. In this case the units of work are paycheck amounts, accrued vacation and sick time, and items of that sort. Traditionally a payroll system will work through these calculations in a serial manner. To make this more efficient, simply take the information needed to make these calculations, split it up (by job band, by business unit, or whatever), and put it in a queue for compute nodes to process. Instead of processing each payroll instance one at a time, you can send the instances out in blocks and reassemble them upon completion. This can be done with
MPI as well, but vendor support for it may vary. You may not even need to make many changes to an existing payroll system to do it this way (licensing costs may be another concern): simply have a compute node per business unit (or whatever your distinction may be) and use the database that stores this information as an informal "queue".
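The split-process-reassemble idea can be sketched as follows. The record fields, pay formula, and the choice of business unit as the partition key are all hypothetical; each block stands in for what one compute node would pull from the informal queue:

```python
from itertools import groupby

# Hypothetical payroll records; the field names are illustrative.
employees = [
    {"id": 1, "unit": "sales", "hours": 160, "rate": 20.0},
    {"id": 2, "unit": "sales", "hours": 150, "rate": 22.0},
    {"id": 3, "unit": "eng",   "hours": 160, "rate": 35.0},
]

def split_by_unit(records):
    """Split the payroll run into per-business-unit blocks for the queue."""
    key = lambda r: r["unit"]
    return {unit: list(rs) for unit, rs in groupby(sorted(records, key=key), key)}

def process_block(block):
    """What one compute node does with its block: compute the paychecks."""
    return [{"id": r["id"], "pay": r["hours"] * r["rate"]} for r in block]

# Split, let the "nodes" process their blocks, then reassemble the results.
results = []
for unit, block in split_by_unit(employees).items():
    results.extend(process_block(block))
results.sort(key=lambda r: r["id"])

print(results)
```

The reassembly step is just a sort here; in a real run each block would be processed on a different node and the results written back to the same database that fed the queue.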
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names
may be trademarks of their respective owners.
MySQL is a trademark of Sun Microsystems, Inc. in the United States, the European
Union and other countries. Abstract Initiative, LLC is independent of Sun Microsystems,
Inc.