
Beyond Controllers and Capacitors

UNDERSTANDING SYSTEM REDUNDANCY

BY JOSH NELAND AND FRANKLIN FLINT

Fast Forward
Evaluate redundancy opportunities within system and subsystem software.
Eliminate single points of failure in hardware infrastructures.
Configure redundancy in servers, power, cooling, and networking.

Building High Availability


START AT A HIGH LEVEL WITH THE SOFTWARE ENVIRONMENT
DETERMINE A STRATEGY FOR BUILDING THE REQUIRED AVAILABILITY LEVELS INTO THE VARIOUS SUBSYSTEMS

Clustering software enables automatic failover from a faulty node to a different node in a cluster. Virtualization allows different types of software stacks to be managed in a uniform way. Match the responsiveness of a chosen solution to the requirements during the design phase. Nodes should minimize state and fail gracefully.
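The automatic failover described above can be sketched as a heartbeat check. This is a minimal illustration, not any particular clustering product: the `Node` and `Cluster` names, the shared clock, and the 3-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time of the last heartbeat, in seconds


@dataclass
class Cluster:
    nodes: list
    active: str
    timeout: float = 3.0  # assumed heartbeat timeout, in seconds

    def is_healthy(self, node, now):
        # A node is healthy if its last heartbeat is recent enough.
        return now - node.last_heartbeat <= self.timeout

    def check(self, now):
        """Return the active node's name, failing over if it went silent."""
        by_name = {n.name: n for n in self.nodes}
        if self.is_healthy(by_name[self.active], now):
            return self.active
        # The active node missed its heartbeats: promote the first healthy peer.
        for node in self.nodes:
            if node.name != self.active and self.is_healthy(node, now):
                self.active = node.name
                return self.active
        raise RuntimeError("no healthy node available")
```

The timeout is the knob that matches responsiveness to the requirements: a shorter timeout fails over faster but is more likely to react to a brief network hiccup.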

Active/Active and Active/Passive Configurations


Active/Active
The primary and reserve nodes are running the database at any given time
Low failover latency
Easy expansion
Load balancing
Requires nodes to be designed so they can run concurrently

Active/Passive
Provides a fully redundant node for each operational node in the system
The redundant node is brought online only if the active node fails
Simpler to implement
Cost of providing a fully redundant set of nodes (and hardware) can be high
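One way to see the difference between the two configurations is in how a request is routed. The sketch below is illustrative only; the `pick_node` function, the round-robin rule, and the convention that the first node in the list is the primary are assumptions, not part of any specific product.

```python
def pick_node(mode, nodes, healthy, request_id):
    """Choose the node that should serve a request.

    nodes:   ordered list of node names (first entry is the assumed primary)
    healthy: set of node names currently passing health checks
    """
    live = [n for n in nodes if n in healthy]
    if not live:
        raise RuntimeError("no healthy node available")
    if mode == "active/active":
        # Every healthy node serves traffic, so requests are spread out.
        return live[request_id % len(live)]
    if mode == "active/passive":
        # Only the first healthy node serves traffic; the standby is
        # used only when the primary has dropped out of the healthy set.
        return live[0]
    raise ValueError(f"unknown mode: {mode}")
```

In the active/active case the surviving nodes simply absorb the load after a failure, which is why failover latency is low. In the active/passive case the standby does no work until the primary fails.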

The difference between Dedicated and Multi-purposed Nodes


With dedicated nodes, you must decide how many redundant nodes to provide for each interface. With multi-purposed nodes, a single shared pool of extra nodes can provide the redundancy for any of the interfaces.
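The trade-off can be made concrete with a small coverage check. This is a sketch under simplifying assumptions: the interface names and spare counts are hypothetical, and a pooled spare is assumed to be able to take over any interface.

```python
def dedicated_covers(spares, failures):
    """Dedicated spares: each interface can only use its own spares."""
    return all(count <= spares.get(name, 0) for name, count in failures.items())


def pooled_covers(pool_size, failures):
    """Shared pool: any spare can replace a failed node of any interface."""
    return sum(failures.values()) <= pool_size


# Three extra nodes in both designs (hypothetical interface names).
dedicated = {"web": 1, "db": 1, "messaging": 1}
pool_size = 3

# Two web nodes fail at once: the single dedicated web spare is not
# enough, but the shared pool still has capacity.
double_failure = {"web": 2}
```

With the same number of extra nodes, the shared pool tolerates a cluster of failures on one interface that the dedicated layout cannot, at the cost of requiring every spare to be able to run every interface's software.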

Evaluating Required Interface Behavior


Make external interfaces highly available by implementing a retry policy internally across redundant nodes of any subsystem. Design the system to achieve availability at the underlying storage level. All the subsystems should access shared, centralized storage. Replicate databases and regularly update the copies.
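A retry policy across redundant nodes might look like the following sketch. The `call_with_retry` helper and the `send` callback are hypothetical names for this example, and the sketch assumes a failed node surfaces as a `ConnectionError`.

```python
def call_with_retry(nodes, request, send, attempts_per_node=1):
    """Try each redundant node in turn until one answers.

    send(node, request) is a caller-supplied function that either returns
    a response or raises ConnectionError when the node is unreachable.
    """
    last_error = None
    for node in nodes:
        for _ in range(attempts_per_node):
            try:
                return send(node, request)
            except ConnectionError as exc:
                last_error = exc  # remember the failure and keep going
    raise RuntimeError("all redundant nodes failed") from last_error
```

Because the retry happens inside the subsystem, the caller of the external interface sees a single successful response even when one node is down. Requests retried this way should be safe to repeat.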

Configuring Hardware for Redundancy


Hardware redundancy starts with the selection of server technology. Blade servers are more space-efficient and energy-efficient than equivalent stand-alone rack servers, and blades are easy to slide into a chassis to quickly add processing capacity or swap out a faulty unit.

Redundant hardware components can include:

Redundant and hot-pluggable chassis cooling fans
Redundant chassis management modules
Standard N-1 redundant hot-plug power supplies
Optional redundant RAM
Redundant storage with hot-plug hard drives
Battery backed-up RAID cache
Redundant network interfaces
Redundant hot-plug switch modules

Blade Servers

Server Configuration Options


REVIEW THE AVAILABLE CONFIGURATION OPTIONS FOR VARIOUS FUNCTIONS

Setting up a redundant configuration can simply require installing modules and powering up the system, or it may require using a management interface to configure the redundancy options.

AMONG THE MOST COMMON POINTS OF FAILURE IN ANY SERVER ARE THE HARD DRIVES

Providing Backup for Power, Cooling


Organizations wanting a true high-availability solution should consider purchasing power from two different utility companies.

Each blade server enclosure contains multiple chassis power supplies. Redundancy allows uninterrupted system operation if one or more of these chassis power supplies fail. Administrators can also configure the management interface to turn off server blades based on the organization's priorities. The blade chassis cooling system, which consists of multiple fan modules, is an important factor. A well-designed blade chassis will provide hot-pluggable fans with more cooling potential than is necessary for the platform.
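The priority-based shutdown mentioned above can be illustrated with a simple power-budget calculation. This is not the logic of any real chassis management interface: the function name, the wattage figures, and the rule of shedding the lowest-priority blades first are assumptions made for the sketch.

```python
def blades_to_power_off(blades, supplies_ok, watts_per_supply):
    """Pick blades to shut down when surviving supplies cannot carry the load.

    blades: list of (name, watts, priority) tuples; a higher priority
            number means the blade is more important to keep running.
    """
    budget = supplies_ok * watts_per_supply
    load = sum(watts for _, watts, _ in blades)
    shed = []
    # Walk the blades from least to most important until the load fits.
    for name, watts, _ in sorted(blades, key=lambda b: b[2]):
        if load <= budget:
            break
        shed.append(name)
        load -= watts
    return shed


# Hypothetical chassis: three blades drawing 1000 W in total.
chassis = [("db", 400, 3), ("web", 300, 2), ("batch", 300, 1)]
```

With enough working supplies the function returns an empty list, which corresponds to the uninterrupted operation that redundancy is meant to provide. Blades are shed only once the remaining supplies can no longer cover the load.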

Ensuring Redundancy
Well-designed blade servers ensure network connection redundancy. Chassis designs also allow installation of dual Ethernet switch modules that can enable additional connectivity or network redundancy and fault tolerance. A distinction between a blade server and other servers is that the connection between the NIC or LOM and the internal ports of the switch module is hardwired through the midplane.

Deciding Where to Invest


As infrastructure grows increasingly integrated and intelligent, companies have an opportunity to change the architecture of their systems.

This way, availability does not depend only on every single processor and capacitor, but is also built into databases, interfaces, and applications.

Be selective about where to invest in redundancy. Start with the overall system, assess which subsystems are required to be highly available, and consider the software environment as well as the hardware domain.
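A rough availability estimate can help decide which subsystem deserves the investment. The sketch below uses the standard formulas for components in series and for redundant components in parallel, and it assumes failures are independent. The subsystem names and availability figures are hypothetical.

```python
import math


def series_availability(parts):
    """Every subsystem must be up, so the availabilities multiply."""
    return math.prod(parts)


def redundant_availability(a, n):
    """At least one of n identical, independent nodes must be up."""
    return 1.0 - (1.0 - a) ** n


# Hypothetical system: a web tier at 99.9% and a database at 99%.
baseline = series_availability([0.999, 0.99])

# Adding one redundant database node lifts that subsystem to 99.99%.
with_db_pair = series_availability([0.999, redundant_availability(0.99, 2)])
```

In this example the baseline is about 98.9% and the version with a redundant database is about 99.89%. Doubling the already stronger web tier instead would change the total far less, which is the sense in which redundancy spending should be selective.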

ABOUT THE AUTHORS

Josh Neland and Franklin Flint are Technology Evangelists for the OEM Solutions group at Dell. You can follow their musings on Twitter @joshneland and @franklinAtDell, and read more of their work at http://blog.delloem.com/

Resources

http://blog.delloem.com/2011/08/article-on-redundancy-written-by-ourdell-oem-technology-strategists-on-intech/
http://content.dell.com/us/en/enterprise/oem-industry-solutions-oemappliances-program.aspx
