
White Paper

ARCHITECTING YOUR NETWORK TO SURVIVE A DISASTER

Six Steps to Ensure Application Performance, Network Resiliency, Data Integrity, and User Access Security

Copyright 2012, Juniper Networks, Inc.

White Paper - Architecting Your Network to Survive a Disaster

Table of Contents
Executive Summary
Introduction
Juniper's Six-Step Approach to Architecting the Network
Analyze Application Workflow
Simplify and Centralize the Network
Improve Data Synchronization
Monitor Network Performance
Enhance Network Resiliency
Enable User Redirection
Conclusion
About Juniper Networks

List of Figures
Figure 1: Disaster recovery heat map
Figure 2: Centralized security policy
Figure 3: Local compute cluster and geo cluster
Figure 4: Using MAG Series Junos Pulse Gateways to redirect users securely


Executive Summary
If your IT organization has been thinking about the need to update its Business Continuity and Disaster Recovery (BCDR) plan, you are not alone. According to recent research by 451 Research¹, disaster recovery planning is top of mind for enterprises, and data replication ranks as a top-two storage initiative for IT organizations.

It is no wonder that BCDR planning is receiving more attention. The proof is in the outages and financial losses caused by recent disasters, from floods, tornadoes, hurricanes, and snowstorms to Japan's tsunami. Statistics provide a warning: Seventy-five percent of businesses that do not have continuity plans fail within three years of a disaster, and 43% of that 75% never reopen.² In addition, government regulations have increased disaster recovery and compliance requirements significantly. These situations have raised awareness of the need to maintain productivity within a company, sustain value chain relationships, and deliver continued services to customers and partners, all of which can be difficult when forced to migrate applications and user connections to a new data center location in real time.

The goal of a BCDR plan is often focused on how to continuously access applications and protect data. Data replication and active/active data center planning are often at the heart of BCDR planning. However, a constructive BCDR plan also should consider user connectivity, network availability, and security; a worthwhile plan must extend further than data replication and active/active data centers.

This paper explores the key building blocks of a comprehensive and robust BCDR solution: how to protect application resources, how to ensure secure user access, how to protect data, and how to keep applications accessible 24/7. In addition, it is important to understand how to maintain availability of applications, how to ensure that users reach those applications, and how to simplify and tune the network to ensure application performance.

Introduction
Today, most organizations realize that they must pay attention to BCDR. However, organizations find themselves facing a number of BCDR challenges, ranging from infrastructure sprawl resulting from poor service-level agreement (SLA) definition to infrastructure built without clearly identifying application requirements. Many customers have deployed infrastructure in an ad hoc manner without consistent management or security policies. The result of these practices has been the creation of multiple failure points, difficulty managing and provisioning the network, and poor utilization of links, many of which are frequently idle.

In addition, many organizations have a distributed authentication, authorization, and enforcement infrastructure, leading to complex firewall policies that prevent user-specific enforcement and to deployments based on local data center IT policies rather than on global policies. These inconsistent policies for user and application access result in security gaps.

Because some organizations do not have automated backup systems, they are forced to rely on manual backup and configuration synchronization. This results in inconsistent states, which affect the user experience, because policies are out of sync due to the time delay in restoring them. Legacy applications also are often impacted because their hard-coded IP addresses mean that they cannot always be replicated and established in new locations. Finally, data flows from different locations can vary greatly, causing congestion during link failure, and traffic may not be prioritized based on application relevance, so lower-priority applications can degrade the performance of critical applications.

Juniper's Six-Step Approach to Architecting the Network


Juniper Networks has designed a systematic approach to developing an industry-leading BCDR practice as a result of years of BCDR solution deployments. Part of this approach is the establishment of a six-step program to architect data center networks for business continuity and disaster recovery:
1. Analyze application workflows to ensure proper prioritization of application availability requirements.
2. Simplify and centralize the network architecture to minimize the number of failure points and to ensure consistent policies.
3. Improve data synchronization to ensure that applications are available for active/active or active/passive scenarios.
4. Continuously monitor network performance to enable an active/active or active/passive data center.
5. Enhance network resiliency so that when a failure or problem is detected, the architecture can fail over rapidly and minimize data loss.
6. Enable user redirection rapidly and securely to the new destination and ensure that users can connect to their applications and services.
In the following sections, we explore each of these steps in detail.
¹ 451 Research
² Bruce T. Blythe, A Manager's Guide to Catastrophic Incidents in the Workplace, August 2002.


Analyze Application Workflow


When considering backup and recovery options, IT should consider the criticality of each application and the ability to migrate it. Some data-based applications must be replicated based on an approved scheme. Some legacy applications cannot be virtualized and cannot be moved due to hard-coded IP addresses. If the online ordering or billing system is down, business may come to a halt. Therefore, it is crucial to rate applications and prioritize their support for BCDR accordingly.

An output of the application workflow analysis is the application heat map. The intent of this heat map is to identify the different applications and their importance across specific attributes such as customer experience and revenue impact. An application might have a bigger impact on the customer experience than on revenue, or vice versa. By analyzing applications in this way, IT organizations can prioritize them and develop a robust BCDR solution. Figure 1 depicts a typical heat map related to disaster recovery.

[Figure 1 heat map. Columns: Application Name, Customer Experience, Revenue Impact, Restrictions. Rows: AD/LDAP, Legacy Application (restriction: hardcoded IP addresses), VoIP, Billing Application. Legend: Critical, Medium.]


Figure 1: Disaster recovery heat map

In addition, applications have dependencies that must be identified. Multitiered applications are complex and may have multiple application dependencies. They also often have many users with varying privileges, and multiple points of entry to the workflow may exist. These findings are used in creating a BCDR solution plan. For instance, a distributed application requires that all dependent segments be migrated to the recovery data center before the application can resume. For an active/active data center, administrators must consider the multiuser privileges for the applications and the multiple entry points in the workflow.

Therefore, analyzing application workflow must be the first step in a BCDR plan. At this stage, Juniper recommends that IT identify the recovery time objective (RTO) and recovery point objective (RPO) metrics for the different applications. Three important benefits to the organization are as follows:
• Categorization: Determining application priorities (and the metrics for RTO and RPO) so that the applications can be categorized for active/active deployments
• Dependencies: Understanding application dependencies and ensuring that all of them are available at the backup data center
• Privileges: Identifying application user access privileges to ensure that these privileges are retained for application access at the backup data center
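The categorization and dependency analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not a Juniper tool: the scoring scale, the 15-minute RTO threshold, the application names, and all RTO/RPO values are hypothetical and would come from the organization's own workflow analysis.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical scoring scale for heat-map cells (not taken from the paper).
SCORE = {"critical": 3, "medium": 2, "low": 1}

@dataclass
class App:
    name: str
    customer_experience: str
    revenue_impact: str
    rto_minutes: int  # recovery time objective
    rpo_minutes: int  # recovery point objective
    depends_on: list[str] = field(default_factory=list)

def priority(app: App) -> int:
    """Higher score means the application matters more to the business."""
    return SCORE[app.customer_experience] + SCORE[app.revenue_impact]

def deployment_mode(app: App, rto_threshold: int = 15) -> str:
    """Treat apps with a tight RTO as candidates for active/active."""
    if app.rto_minutes <= rto_threshold:
        return "active/active"
    return "active/passive"

def migration_order(apps: list[App]) -> list[str]:
    """Order apps so every dependency is restored before its dependents."""
    graph = {a.name: set(a.depends_on) for a in apps}
    return list(TopologicalSorter(graph).static_order())

# Illustrative inventory; every value here is an assumption.
apps = [
    App("billing", "critical", "critical", 15, 5, ["ad_ldap"]),
    App("voip", "critical", "medium", 60, 60, ["ad_ldap"]),
    App("ad_ldap", "medium", "medium", 10, 5),
]
ranked = sorted(apps, key=priority, reverse=True)
order = migration_order(apps)
```

In this sketch the directory service is restored first because both other applications depend on it, even though its heat-map score is the lowest of the three. Priority and dependency order are separate outputs of the analysis.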

Simplify and Centralize the Network


Removing complexity from the network is key when attempting to transition to the backup data center. Security sprawl and complex routing architectures add complexity, create a considerable administrative burden, and degrade user and application performance. Simplifying and centralizing the network architecture minimizes the number of failure points and enables consistent policies.

The first step is to eliminate security sprawl and simplify routing. This involves ensuring that the data center edge router is sufficiently powerful to handle the WAN traffic and the redirection of traffic to the firewalls, and that it has the necessary features to handle data center interconnect requirements. This is an area where Juniper can provide professional consulting services, if desired.


Customers can then begin consolidating security to shared firewall solutions, such as Juniper Networks SRX Series Services Gateways, attached to the data center edge router. This connection provides the flexibility to use virtual contexts on the firewall to handle multiple security policies and traffic types on the same equipment. Figure 2 depicts a centralized security policy whose enforcement is distributed to provide consistency across data centers. As a result, network administrators can move the network connections behind existing standalone firewalls to the new, shared firewall and eliminate the tiers of firewall appliances.

[Figure 2 diagram. Labels: AAA, Authentication, Authorization, Policies Synchronized, MAG Series Cluster (one in each data center), Firewall Policies (one set in each data center), Primary Data Center, Backup Data Center.]

Figure 2: Centralized security policy

The next step is to eliminate unnecessary router tiers and consolidate routing on the high-performance Juniper Networks MX Series 3D Universal Edge Routers. This design is then fully normalized across all infrastructure pods, connecting them to the core router.

The final step is to connect data centers and to separate traffic and security by application. IT organizations can satisfy the traffic requirements and save costs by deploying fewer links, using MPLS virtualization technology over shared links. MPLS running on Juniper routers has been proven in the most demanding service provider networks and is available from most major service providers.

To decouple from data center-specific IT policies and to ensure consistency, IT should migrate to a centralized policy administration system such as the Juniper Networks MAG Series Junos Pulse Gateways. Distributed enforcement points are suitable for distributing load and improving the resiliency of the authentication and authorization system. The keys to this approach are centralizing policy enforcement, eliminating site-specific IT policies, and deploying simplified firewall policies that are centrally managed, thereby enabling dynamic, consistent policy enforcement.

The key benefits of the simplification process are:
• Reduction in the number of devices, which reduces the number of potential points of failure
• Simplification, which reduces the number of provisioned devices while ensuring that centralized control and consistent policies are administered across data centers
• Improved security through centralized and virtualized security, which enables easy and consistent policy administration
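One way to check that distributed enforcement points remain consistent with a centrally administered policy is to compare policy fingerprints across sites. The following Python sketch illustrates the idea only. The policy structure, role names, and site names are hypothetical, and it does not use any Juniper API.

```python
import hashlib
import json

def policy_fingerprint(policy: dict) -> str:
    """Hash a policy in canonical form so that key order does not matter."""
    canonical = json.dumps(policy, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def find_drift(central: dict, sites: dict[str, dict]) -> list[str]:
    """Return the sites whose enforced policy differs from the central one."""
    expected = policy_fingerprint(central)
    return sorted(
        name for name, policy in sites.items()
        if policy_fingerprint(policy) != expected
    )

# Hypothetical role-to-application policy and per-site copies.
central_policy = {"role:finance": ["billing"], "role:staff": ["voip", "email"]}
sites = {
    "primary": {"role:staff": ["voip", "email"], "role:finance": ["billing"]},
    "backup": {"role:staff": ["voip"], "role:finance": ["billing"]},
}
drifted = find_drift(central_policy, sites)
```

Here the backup site is reported as drifted because its staff role is missing an application, which is the kind of inconsistency that produces security gaps after a failover.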


Improve Data Synchronization


Maintaining business continuity means ensuring that applications can be moved to the backup data center and are available as they transition. Because many applications operate in virtual machine environments, it is critical to move virtual machines without causing delays in operations. Keeping systems up to date is vital to ensuring secure operations in the event of network failures or network attacks. In many cases, secure WAN connectivity is also essential to secure application performance.

To migrate virtual machines between data centers while maintaining sessions (known as long-distance vMotion in the VMware environment), a Layer 2 stretch must exist so that connection information, such as media access control (MAC) and IP addresses, extends across the WAN and between data centers. This scheme provides the best resiliency and redundancy while ensuring performance. Juniper recommends a solution using VPLS on the MX Series routers. Juniper's solution enables massive scale using proven, standards-based technology that is interoperable with other network equipment.

Another part of managing the user experience is to keep systems up to date so that they have the latest security patches. For this to occur, a centralized mechanism is required to distribute software updates to hundreds or thousands of clients. By using a highly scalable point-to-multipoint (P2MP) solution based on virtual private LAN service (VPLS), administrators can achieve data replication that is completely transparent to the underlying WAN infrastructure.

If the network does not have an MPLS core and a secure method is required to move traffic over the IP network, administrators can implement connections using generic routing encapsulation (GRE) tunnels over IP or over IPsec. Administrators can implement this transport on the same high-performance MX Series routers that support MPLS and VPLS.

The benefits of a Juniper solution for data migration are:
• Deploy network links using standards-based MPLS technology to realize cost savings from link sharing and interoperability with existing network equipment
• Scale rapidly from a few data centers to several data centers and support thousands of endpoints
• Experience resiliency that is only possible using proven, carrier-class technology
For additional information, please refer to the MPLS Data Center Interconnection for Disaster Recovery white paper at www.juniper.net/us/en/solutions/enterprise/data-center/simplify/#literature.

Monitor Network Performance


Monitoring network performance is critical to ensuring application performance, meeting recovery time and recovery point objectives, and ensuring connectivity for users and partners. Tools are required that monitor network conditions and report faults and performance changes. To enable network monitoring, IT organizations can deploy Juniper's comprehensive set of network monitoring tools that enable precise network visibility (L3 to L7 views) for a variety of traffic. Juniper's tools include J-Flow, real-time performance monitoring (RPM), and quality-of-service (QoS) statistics. J-Flow can be used to monitor IP metrics, while RPM measures network round-trip delay, jitter, and standard deviation values for each configured RPM test. Network monitoring tools collect QoS statistics for various network parameters, which are aggregated and presented to provide visibility into traffic flows based on IPv4, IPv6, MPLS, and other parameters. Collectively, these elements enable a complete monitoring solution.

For performance monitoring to be successful, the correct tools are required to aggregate information and analyze it. This means that network administrators should be able to easily feed network traffic flows to third-party applications, which requires router integration. Juniper's routers enable third-party application integration using our software developer kit (SDK). A number of developers have created network analysis tools that integrate with our routers using the Juniper Networks Junos SDK. As a result, IT organizations can select the right set of tools to perform network visibility, user, usage, and traffic analyses. For examples, please refer to https://developer.juniper.net/content/jdn/en/marketplace/discover/application-gallery/application-list.html.

Network monitoring tools are not only for application performance monitoring. They also enable the creation of a robust security policy by examining user and usage behavior. This allows administrators to take proactive steps to mitigate risks and improve overall network performance. For instance, users of bandwidth-intensive, noncritical applications can be penalized, while business-critical applications are given priority.
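To make the round-trip delay, jitter, and standard deviation metrics mentioned above concrete, the following Python sketch summarizes a set of probe samples. It is an illustration under stated assumptions rather than a description of how RPM computes its values: jitter is taken here as the mean absolute difference between consecutive samples, and the sample values are invented.

```python
import statistics

def rtt_summary(rtts_ms: list[float]) -> dict[str, float]:
    """Summarize round-trip-time samples (in milliseconds) from one probe test."""
    # Assumed jitter definition: mean absolute change between consecutive samples.
    deltas = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "avg_rtt_ms": statistics.mean(rtts_ms),
        "jitter_ms": statistics.mean(deltas) if deltas else 0.0,
        "stddev_ms": statistics.pstdev(rtts_ms),  # population standard deviation
    }

# Hypothetical samples from four probes across the data center interconnect.
summary = rtt_summary([20.0, 22.0, 21.0, 25.0])
```

A monitoring pipeline could compare such a summary against per-application thresholds derived from the RTO and RPO analysis to decide when a link no longer meets its objectives.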


The benefits of network monitoring solutions are:
• Juniper's router-integrated solution eliminates single-purpose devices, thereby reducing CapEx and OpEx.
• A wide variety of third-party tools can be integrated using the Junos SDK.
• Proactive monitoring helps prioritize business-critical applications.
• Performance monitoring provides visibility that allows administrators to estimate network usage and proactively provision for network growth.

Enhance Network Resiliency


High availability (HA) in the face of transitioning workloads, and the ability to add capacity to accommodate increased traffic, are essential to a resilient network. This means that networks must recover rapidly from link failures and that traffic must be routed around failures when they occur.

Compute clusters enhance server resiliency. There are two forms of HA compute clusters. The first is a local compute cluster, where the compute resources reside in the same site and use shared disks. The second is a geo cluster, where the compute resources reside in two different sites, with disks one and two located in separate sites. Note that the geo cluster provides more resiliency but is proportionally more expensive to deploy. As Figure 3 depicts, a geo cluster supports compute resiliency but is dependent on the network.

[Figure 3 diagram. Local compute cluster: a compute cluster at Site 1 with a shared disk, attached to a public network and a private network. Geo cluster: compute clusters at Site 1 and Site 2 with Disk 1 and Disk 2, attached to a public network and private networks.]

Figure 3: Local compute cluster and geo cluster

Cluster networking requires network reliability and resiliency to ensure that heartbeat signals, data synchronization, and other communication are reliably delivered within the cluster. In addition, cluster networking requires deterministic latency, where the upper bounds of latency are known and fixed, so that delays in heartbeat communication are not perceived as a failure. Real-time state replication also requires deterministic latency to ensure state synchronization.

Administrators should consider several points when building comprehensive resiliency from server to WAN:
• Administrators must configure the switches for rapid link recovery if a device fails. Juniper's Virtual Chassis configuration (where several physical devices are combined into one logical switch) achieves this result.
• Access layer links connect to the core using several 10GbE connections through mesh connectivity that is enabled using link aggregation group (LAG) technology.
• The MX Series routers in the core and WAN have multichassis link aggregation group (MC-LAG) enabled for resiliency.
• The SRX Series firewall has cluster mode enabled for improved resiliency.
• The MPLS cloud has link-level resiliency by using the MPLS fast reroute capability. In this configuration, WAN links are fully redundant and connectivity is ensured.
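The reason cluster heartbeats need a known upper bound on latency can be shown with a small failure detector. In this Python sketch, which is illustrative and not part of any cluster product, the timeout is derived from the heartbeat interval, the number of heartbeats that may be lost, and the assumed worst-case latency. All names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class HeartbeatMonitor:
    interval_s: float        # how often the peer sends a heartbeat
    max_latency_s: float     # known upper bound on network latency
    missed_allowed: int = 2  # heartbeats that may be lost before failing over
    last_seen_s: float = 0.0

    @property
    def timeout_s(self) -> float:
        # The peer is declared failed only after the allowed number of
        # intervals plus the worst-case latency has elapsed.
        return self.interval_s * (self.missed_allowed + 1) + self.max_latency_s

    def record_heartbeat(self, now_s: float) -> None:
        self.last_seen_s = now_s

    def peer_failed(self, now_s: float) -> bool:
        return (now_s - self.last_seen_s) > self.timeout_s

monitor = HeartbeatMonitor(interval_s=1.0, max_latency_s=0.2)
monitor.record_heartbeat(now_s=10.0)
```

If the real latency can exceed `max_latency_s`, a healthy peer may be declared failed and trigger an unnecessary failover, which is why an unbounded or highly variable WAN delay undermines a geo cluster.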


To enable resiliency in the WAN that provides the data center interconnect, Juniper recommends deploying a topology with traffic-engineered paths and QoS guarantees. Fast reroute protects the critical paths and enables a 50 ms recovery time, which is comparable to the highest standards set by the telecommunications industry's SONET deployments. Juniper's MPLS-enabled routers achieve this level of performance. In addition, the MPLS cloud enables privacy between different application traffic using logical separation. This allows network traffic segmentation and isolation, especially when different organizations or applications share the links.

The benefits of enhancing network resiliency are:
• This comprehensive resiliency model minimizes data loss that would result from internal and external data center failures.
• Traffic between data centers is routed optimally through the least congested paths to ensure minimal delay.
• Any link failures result in rapid convergence using MPLS. As a result, application access is impacted minimally.

Enable User Redirection


Once data has been moved and the applications are running in the failover data center, it is critical to connect users to the applications quickly and securely. Ensuring that users are connected to applications depends on several factors. Server load-balancing technology is used to detect unresponsive applications and redirect users. SSL technology is used to ensure secure connections; however, it must be able to handle the increased load. Moving legacy applications, with their hard-coded IP addresses, presents an additional challenge, as does the need to enable collaboration between users who are adjusting to their rapidly changing environment.

In a data center with server and application tiers, user access to these servers must be distributed so that servers are not overloaded. When servers experience performance issues, traffic must be redirected to another server. To achieve this, administrators should deploy server load-balancing devices. These can be standalone appliances or virtual instances. Juniper provides services cards for the MX Series routers, known as the Multiservices Dense Port Concentrator (MS-DPC), that can run software-based load balancers. This model replaces the dedicated load balancers and reduces the number of managed network devices. The load balancer supports L4 to L7 health monitoring of servers and load balancing of user traffic. This service is essential to ensure that applications run at optimal performance levels.

Figure 4 shows how MAG Series Junos Pulse Gateways redirect users in a secure fashion to the new data center.

[Figure 4 chart: Junos Pulse Secure Access on the MAG Series. Axes: Number of Remote Users versus Time. Labels: Normal Demand, Unplanned Event, Emergency Demand.]

Figure 4: Using MAG Series Junos Pulse Gateways to redirect users securely
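The health-monitored server load balancing described in this section can be sketched as a round-robin selector that skips servers failing their health check. This is a simplified Python illustration, not the MS-DPC implementation. The class name, the server names, and the health-check callback are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterator

class HealthAwareBalancer:
    """Round-robin load balancer that skips unhealthy servers."""

    def __init__(self, servers: list[str], is_healthy: Callable[[str], bool]):
        self._servers = servers
        self._is_healthy = is_healthy
        self._ring: Iterator[str] = cycle(servers)

    def pick(self) -> str:
        """Return the next healthy server in round-robin order."""
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy servers available")

# Hypothetical health state; a real check would probe the application (L4 to L7).
status = {"app1": True, "app2": False, "app3": True}
lb = HealthAwareBalancer(list(status), lambda server: status[server])
picks = [lb.pick() for _ in range(4)]
```

Requests alternate between the two healthy servers while `app2` is skipped. Once its health check passes again, it rejoins the rotation without any configuration change.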


One of the biggest challenges for organizations is the host of legacy applications that use hard-coded IP addresses to communicate between their components. These applications are configured so that they cannot use Domain Name System (DNS) services. To solve this issue, Juniper has developed a solution using BGP Anycast addressing and route health injection (RHI) enabled by an integrated application delivery controller (ADC). In this solution, both the primary and backup data centers, after evaluating L4 to L7 application health, can advertise the virtual IP instances on the ADC that front the legacy applications. The gateway checks whether the endpoint is healthy before advertising the host address or virtual IP (VIP) address. Then, traffic is directed to the nearest data center based on the advertised BGP routing metric. If a failure occurs, the data center that experienced the failure stops advertising the routes of the failed servers, and as a result, traffic is redirected to the alternate data center. The Anycast address enables a client endpoint to connect to the nearest router. This enables clients to establish persistence with a given router and to reach the destination without a DNS lookup.

A challenge to application access is that in the event of failure and redirection, a significant number of users are directed to the data recovery site. What if the number of remote access users suddenly increased 5 or 10 times during a disaster? Consider also that local users require access to the data center authentication infrastructure, and a considerable spike in utilization follows immediately. To provide the needed scalability, Juniper's secure access solution, In Case of Emergency (ICE) licensing, provides the capability to continuously deliver authentication services in the event of a user rollover during a disaster. ICE utilizes Juniper's proven SSL VPN technology to provide remote access capabilities for sudden peak loads in connection requests from remote employees, partners, and customers.

To connect mobile users, the IT organization requires a solution that not only enables secure connectivity but also enables collaboration with employees and partners. Such a solution should not rely on a dedicated meeting server based in the data center. The Juniper Networks Junos Pulse collaboration tool can enable combined secure access and collaboration in a single platform. Junos Pulse also integrates with Microsoft Outlook for improved convenience in sharing applications.

The technological benefits of redirecting users are:
• Anycast and RHI ensure that legacy software can be moved.
• Router-integrated solutions deliver faster route convergence and lower total cost of ownership (TCO).
• Remote users and partners can connect securely with minimal delays even during peak loads (ICE solution).
• The Junos Pulse platform enables mobile connectivity to the data center, enabling collaboration even when primary meeting resources are not available.
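The Anycast and RHI behavior described in this section can be modeled in a few lines of Python: each data center advertises the shared VIP only while its application health check passes, and traffic follows the advertisement with the lowest metric. This is a conceptual sketch, not BGP itself. The class, the function names, and the metric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataCenter:
    name: str
    metric: int    # hypothetical routing cost seen by the client; lower is nearer
    healthy: bool  # outcome of the L4 to L7 application health check

def advertised_routes(sites: list[DataCenter]) -> dict[str, int]:
    """Route health injection: only healthy sites advertise the shared VIP."""
    return {dc.name: dc.metric for dc in sites if dc.healthy}

def select_site(sites: list[DataCenter]) -> Optional[str]:
    """Anycast: traffic follows the advertisement with the lowest metric."""
    routes = advertised_routes(sites)
    return min(routes, key=routes.get) if routes else None

primary = DataCenter("primary", metric=10, healthy=True)
backup = DataCenter("backup", metric=50, healthy=True)
before_failure = select_site([primary, backup])

# The primary site fails its health check and withdraws its route.
primary.healthy = False
after_failure = select_site([primary, backup])
```

Because the client keeps using the same hard-coded address throughout, the redirection requires no DNS change and no change to the legacy application.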

Conclusion
Juniper Networks can summarize its recipe for a successful Business Continuity and Disaster Recovery solution in three words: simplicity, security, and agility. Simplicity means eliminating redundant architecture, improving utilization, and consolidating services onto fewer links. Organizations can conveniently centralize control and benefit from consistent policy administration as well as fewer points of failure. Security ensures that all layers of the network are protected. Security must extend from the traditional security perimeters to the extended boundary of the network. Using a combination of device virtualization and end-to-end security, from the mobile device to the hypervisor, Juniper uniquely enables a more secure network. The result of effectively combining simplicity and security is improved agility, which means change without disruption. Juniper uniquely enables an infrastructure that supports change with control.


About Juniper Networks


Juniper Networks is in the business of network innovation. From devices to data centers, from consumers to cloud providers, Juniper Networks delivers the software, silicon and systems that transform the experience and economics of networking. The company serves customers and partners worldwide. Additional information can be found at www.juniper.net.

Corporate and Sales Headquarters
Juniper Networks, Inc.
1194 North Mathilda Avenue
Sunnyvale, CA 94089 USA
Phone: 888.JUNIPER (888.586.4737) or 408.745.2000
Fax: 408.745.2100
www.juniper.net

APAC Headquarters
Juniper Networks (Hong Kong)
26/F, Cityplaza One
1111 King's Road
Taikoo Shing, Hong Kong
Phone: 852.2332.3636
Fax: 852.2574.7803

EMEA Headquarters
Juniper Networks Ireland
Airside Business Park
Swords, County Dublin, Ireland
Phone: 35.31.8903.600
EMEA Sales: 00800.4586.4737
Fax: 35.31.8903.601

To purchase Juniper Networks solutions, please contact your Juniper Networks representative at 1-866-298-6428 or authorized reseller.

Copyright 2012 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. All other trademarks, service marks, registered marks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.

2000496-001-EN

Oct 2012
