
High Availability Overview

Reliability
Performance
Schedule of Operations
Availability
High Availability
Continuous Operations
Continuous Availability
Fault Tolerance

80% or more of unplanned downtime is the result of people and processes, NOT hardware or O/S failures:
Application failures
Software failures, configuration errors
Scheduling errors
Operator errors
Out-of-space conditions
Batch work that prevented OLTP from being available on time
Data corruption
Unexpected or unplanned volumes
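Checking whether the 80% figure holds in your own shop requires classifying each outage by cause. The sketch below assumes a hypothetical outage log of (cause, minutes down) pairs and a hypothetical cause-to-group mapping that follows the list above; the labels are illustrative, not a standard taxonomy.

```python
from collections import defaultdict

# Hypothetical cause labels, mapped to the two broad groups named above.
CAUSE_GROUP = {
    "hardware": "technology",
    "os_failure": "technology",
    "application_failure": "people_process",
    "software_failure": "people_process",
    "config_error": "people_process",
    "scheduling_error": "people_process",
    "operator_error": "people_process",
    "out_of_space": "people_process",
    "batch_overrun": "people_process",
    "data_corruption": "people_process",
    "unplanned_volume": "people_process",
}


def downtime_share(incidents):
    """Return each group's fraction of total unplanned downtime.

    incidents: iterable of (cause, minutes_down) pairs from an outage log.
    Causes missing from CAUSE_GROUP are reported as "unclassified".
    """
    totals = defaultdict(float)
    for cause, minutes in incidents:
        totals[CAUSE_GROUP.get(cause, "unclassified")] += minutes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: minutes / grand_total for group, minutes in totals.items()}


# Illustrative log entries, not real data.
print(downtime_share([("operator_error", 120), ("hardware", 30), ("config_error", 50)]))
```

Keeping the "unclassified" bucket visible is deliberate: a large unclassified share means the log itself needs better problem management before the numbers can be trusted.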

To address the 80%, invest money and time in:
Staffing, training
Change management
Problem management
Job scheduling, restart procedures
Intelligent event management, tuning
Application architecture
Function, regression, integration, and load testing
Testing and timing recovery scenarios
Production readiness reviews, standards
Application planning, capacity planning

The Road to High Availability

Some technology measures. Minimize SPOFs (single points of failure):
Environmental, facilities, network
Web load balancers, redundant dispatchers
RAID (levels 5/0/1), mirroring, striping
ECC data protection
On-site spares, hot-swappable parts
HA solutions, clustering, automatic failover
Database replication, cloning
Oracle Parallel Server (OPS)

Understand the application architecture and constraints.
Understand all application dependencies and interrelationships with the components they need.
Reduce batch interference.
Confront the backup problem: hot backup strategies, cloning, SANs.
Manage other planned changes.
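Why removing a SPOF pays off can be shown with the standard series/parallel availability formulas: components that must all be up multiply their availabilities, while redundant components fail only if every copy fails. The sketch assumes independent failures and instant, always-successful failover, which real clusters only approximate; the availability figures are hypothetical.

```python
from math import prod


def series(*availabilities):
    """Every component must be up, so availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities):
    """At least one redundant component must be up.

    Assumes independent failures and instant, always-successful failover.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical availability figures for illustration only.
network = 0.999
server = 0.99

single_server = series(network, server)                  # the server is a SPOF
clustered = series(network, parallel(server, server))    # two-node cluster
print(f"single server: {single_server:.4%}, clustered: {clustered:.4%}")
```

Note that the network is still a single point of failure in both configurations, which caps the clustered result just below the network's own 99.9%.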

Manage the planned downtime:
Infrastructure and facility work
Hardware changes and upgrades
Operating system level changes
Database changes and releases
Application changes and releases (release tolerance is a key item)

Expect an increased need for infrastructure test environments; for some organizations this is new.
Common maintenance windows: expect increased coordination and staff overhead.
Application availability depends on design: transaction queuing, batch processing, release tolerance, recovery.
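Transaction queuing is one design technique that lets an application stay available while its back end is down for planned work. The class below is a hypothetical store-and-forward sketch, not any particular product's API: it keeps accepting transactions during a maintenance window and replays them in arrival order afterwards.

```python
from collections import deque


class QueuedSubmitter:
    """Hypothetical front end that accepts transactions during maintenance."""

    def __init__(self, backend):
        self.backend = backend        # callable that processes one transaction
        self.backend_up = True
        self.pending = deque()        # transactions held during an outage

    def submit(self, txn):
        # Send directly only if the back end is up AND nothing is waiting,
        # so new work never jumps ahead of queued work.
        if self.backend_up and not self.pending:
            self.backend(txn)
        else:
            self.pending.append(txn)

    def start_maintenance(self):
        self.backend_up = False

    def end_maintenance(self):
        self.backend_up = True
        while self.pending:           # replay in original arrival order
            self.backend(self.pending.popleft())


processed = []
submitter = QueuedSubmitter(processed.append)
submitter.submit("t1")
submitter.start_maintenance()
submitter.submit("t2")
submitter.submit("t3")
submitter.end_maintenance()
print(processed)
```

A production version would also need the queue to be durable (surviving a front-end restart) and the back end to tolerate replays, which is where release tolerance and recovery design come in.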

Set schedule and availability expectations early. Keep some functions up 24x7, not all of them.

Continuous availability costs about 3.5 times as much as a standard application (GartnerGroup).

Applications are more interrelated and integrated with one another than ever, and shared infrastructure elements are more common. Managing a separate maintenance window for each application can be exceedingly complex. A common maintenance window for infrastructure activity can be beneficial: it saves negotiating time and sets expectations.
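Finding a common maintenance window amounts to intersecting the idle hours of every application that shares the infrastructure. The sketch below models a week as 168 hour slots; the function names are hypothetical, the Business Support Systems hours come from the schedule given under Step 2, and the e-commerce schedule is an invented example.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
HOURS_PER_WEEK = 7 * 24


def up_hours(day_ranges):
    """Turn {day: (start_hour, end_hour)} into a set of hour-of-week indices.

    Hour 0 is Monday 00:00; end_hour is exclusive.
    """
    hours = set()
    for day, (start, end) in day_ranges.items():
        base = DAYS.index(day) * 24
        hours.update(range(base + start, base + end))
    return hours


def common_window(*schedules):
    """Hours of the week when none of the given applications is scheduled up."""
    busy = set().union(*schedules)
    return sorted(set(range(HOURS_PER_WEEK)) - busy)


def to_runs(hours):
    """Collapse sorted hour indices into half-open (start, end) runs."""
    runs = []
    for h in hours:
        if runs and runs[-1][1] == h:
            runs[-1][1] = h + 1
        else:
            runs.append([h, h + 1])
    return [tuple(run) for run in runs]


# Business Support Systems schedule (EST), as listed under Step 2.
bss = up_hours({"Mon": (6, 22), "Tue": (6, 22), "Wed": (6, 22),
                "Thu": (6, 22), "Fri": (6, 22), "Sat": (6, 18)})
# Hypothetical e-commerce application that is down 02:00-04:00 every night.
ecommerce = up_hours({d: (0, 2) for d in DAYS}) | up_hours({d: (4, 24) for d in DAYS})

for start, end in to_runs(common_window(bss, ecommerce)):
    print(f"{DAYS[start // 24]} {start % 24:02d}:00-{end % 24:02d}:00")
```

Every application added to the shared infrastructure can only shrink this window, which is one concrete reason the coordination overhead grows.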

Step 1: Define the Problem
A problem well defined is a problem 80% solved. For each application area, determine the problem/goal with the correct user representative(s). Determine the schedule goal. Separately, determine the availability goal.

Schedule and availability should be determined and designed in up front, just like any other functional requirement of the application. It's more costly to retrofit them.
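Keeping the schedule goal separate from the availability goal matters because availability is normally measured only over scheduled hours. The sketch below, with a hypothetical helper name and a hypothetical 99.5% target, shows how the same percentage permits different amounts of unplanned outage on different schedules.

```python
def allowed_unplanned_downtime(scheduled_hours, availability_goal):
    """Unplanned outage hours an availability goal permits.

    Availability is measured only over scheduled hours, so the schedule goal
    and the availability goal are separate inputs.
    """
    if not 0.0 <= availability_goal <= 1.0:
        raise ValueError("availability_goal must be between 0 and 1")
    return scheduled_hours * (1.0 - availability_goal)


# Hypothetical goals: the same 99.5% target on two different schedules.
weekday_app = allowed_unplanned_downtime(92, 0.995)     # 92 scheduled hours/week
always_on_app = allowed_unplanned_downtime(168, 0.995)  # 24x7
print(f"{weekday_app * 60:.0f} vs {always_on_app * 60:.0f} minutes/week")
```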

Step 2: Categorize
Categorize the applications into groups, for example:
Business Support Systems
Operational Support Systems
Self Service/E-Commerce
Management Support Systems

Business Support Systems
Mon-Fri: 6:00 a.m. to 10:00 p.m. EST
Sat: 6:00 a.m. to 6:00 p.m. EST
Sun: normal maintenance window
Batch updates, data refreshes

Operational Support Systems
Round-the-clock operations, such as physical plant, security, hospitals
Near 24x7 schedule
Occasional Sunday morning maintenance
Monthly cold backups
Batch and backups non-disruptive to users
Accessible about 8,700 hours/year
The most extended schedule

Self Service/E-Commerce
Near 24x7 schedule
Can tolerate 1-2 hours down per night
Accessible from 148 to 156 hours per week
Batch and backups during 1-2 hours per day

Management Support Systems
Systems used by management for activities such as reporting and queries
Same schedule as Business Support Systems
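Turning each category's schedule into hours makes the differences between groups concrete. The sketch below uses the schedule figures listed for each category; note that these percentages describe how much of the calendar each group is scheduled to be up, not its measured availability. Helper names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24      # 168
HOURS_PER_YEAR = 365 * 24    # 8,760 (ignores leap years)


def weekly_hours(spans):
    """Sum scheduled hours from (number_of_days, start_hour, end_hour) spans."""
    return sum(days * (end - start) for days, start, end in spans)


def percent_of(hours, period_hours):
    return 100.0 * hours / period_hours


# Business Support Systems: Mon-Fri 06:00-22:00, Sat 06:00-18:00 EST.
bss_week = weekly_hours([(5, 6, 22), (1, 6, 18)])

print(f"Business Support: {bss_week} h/week = {percent_of(bss_week, HOURS_PER_WEEK):.1f}%")
print(f"Operational Support: 8700 h/year = {percent_of(8700, HOURS_PER_YEAR):.1f}%")
print(f"Self Service: {percent_of(148, HOURS_PER_WEEK):.1f}%"
      f" to {percent_of(156, HOURS_PER_WEEK):.1f}% of the week")
```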

Step 3: Know the Applications
Understand each application's architecture, constraints, release tolerance, and flexibility to change, in-house vs. purchased.
Know each application's dependencies on other applications and components. Architecture diagrams and data flows are key.

Step 4: Know the Baseline
What is your current SOP with respect to technology? Procedures? Testing?
What is your current availability? What can you expect with the existing budget?
If you haven't already, at least start measuring something.
Identify root causes of unplanned downtime.
What are the infrastructure constraints on expanding the schedule?

Step 5: Know the Costs
What improvements can you make from the existing budget? Training, testing, Q/A, etc.
Invest in the right areas for you to expand schedule and availability.
Know the costs to expand the schedule beyond the baseline to meet goals.
Know the costs to increase availability beyond the baseline to meet goals.

Step 6: The Business Case
Develop a consistent approach to weigh the business benefits against the cost.
Maintain focus on the business problem/goal.
The steering committee or business owner(s) of the applications need to determine the business need.
It's difficult to cost and plan for applications individually; categorizing may help.
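One way to make the cost/benefit comparison consistent is to score every proposal with the same formula. The sketch below is a deliberately simple model, assuming a single hypothetical cost per outage hour and hypothetical options and figures; a real business case would add factors such as schedule expansion, risk, and one-time costs.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    downtime_hours_avoided: float  # per year, versus the measured baseline
    annual_cost: float             # incremental cost per year


def rank_options(options, cost_per_outage_hour):
    """Score every option with the same formula and sort best first.

    net benefit = hours avoided * cost of one outage hour - incremental cost
    """
    scored = [
        (opt.name, opt.downtime_hours_avoided * cost_per_outage_hour - opt.annual_cost)
        for opt in options
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# All figures are hypothetical placeholders.
options = [
    Option("clustering with automatic failover", 30, 200_000),
    Option("operator training and runbooks", 20, 40_000),
]
for name, net in rank_options(options, cost_per_outage_hour=5_000):
    print(f"{name}: {net:+,}")
```

With these invented numbers the people/process investment ranks first, which mirrors the point about the 80% but should be confirmed with your own baseline data.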



Differentiate between like-to-have and true business need. Who pays? There may not be any quick fix.

Step 7: Execute the Plan
Have commitment:
Senior management commitment
Front-line management commitment

Define the resources: people, budget, etc. Define ownership.
Develop and document a typical plan with goals, activities, responsibilities, dates, etc. Make it part of existing project plans.

Manage and adjust. Measure actual vs. goal.
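Measuring actual vs. goal can start very simply: total the outage hours that fell inside the schedule and compare the result with the availability goal. The helper names, the one-month period, and the outage figures below are hypothetical.

```python
def measured_availability(scheduled_hours, outage_hours):
    """Fraction of scheduled hours the application was actually usable.

    outage_hours should include only outages that fell inside scheduled time.
    """
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 1.0 - sum(outage_hours) / scheduled_hours


def compare_to_goal(scheduled_hours, outage_hours, goal):
    actual = measured_availability(scheduled_hours, outage_hours)
    return {"actual": actual, "goal": goal, "met": actual >= goal}


# Hypothetical month: 4 weeks of a 92-hour/week schedule, two outages.
report = compare_to_goal(4 * 92, [1.5, 0.5], goal=0.995)
print(f"actual {report['actual']:.3%} vs goal {report['goal']:.1%}: "
      f"{'met' if report['met'] else 'missed'}")
```

Tracking this per category each period gives the trend needed to decide where to adjust the plan.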
