
Solution brief

Enabling software
deduplication
across the
enterprise with
HP Data Protector

Table of contents

The challenges of deduplication
Deduplication explained
HP Data Protector federated deduplication
Federated deduplication use cases
Conclusion

The challenges of deduplication


As data volumes double every 12 to 18 months, it's no surprise that
managing data growth is still a top priority for IT departments.1 Data
deduplication is one of the most important and fastest growing
storage optimization techniques to appear in recent years.2
Using deduplication, organizations can reduce stored data by up to
95 percent in just a few months, significantly reducing their
storage footprint as well as backup and restore times.
Despite these benefits, first-generation deduplication
technologies also have significant drawbacks. Most deduplication
technologies are isolated, inefficient, and difficult to scale.

Incompatible solutions

Many vendors use different point solutions for source, software,
and target deduplication. Because these solutions aren't
compatible, data must be rehydrated to move it from one system
to another (figure 1). This can lead to longer restore times and more
labor-intensive data protection efforts.

Figure 1. In first-generation deduplication solutions, data must be rehydrated as it is transferred from one system to another

The impact of inefficiencies

The deduplication process can place a tremendous processing
load on the hardware on which it is hosted. Many deduplication
technologies use an inefficient process that reads each entire
data chunk on disk to determine whether a new chunk is a match. This
laborious process taxes the CPU and slows down hardware and other
applications (figure 2). Depending on where deduplication is executed,
it can degrade the performance of backup or application servers to the
point where deduplication makes them virtually unusable, or prevents
them from scaling to back up large volumes of data.

Figure 2. Processor-intensive deduplication approaches slow performance and can't be deployed at a large scale

Deduplication appliances running legacy deduplication technologies
also have challenges. First-generation deduplication
appliances were designed for fast ingest rates (the speed at which they
could write data to the backup media) so they could meet backup
windows. As a result, they emphasized methods that could process
redundancy checks, index updates, and disk writes faster.

However, these techniques also create a tax during restores,
which can exceed 50 percent of the ingest rate (figure 3).
Reconstituting highly fragmented data chunks can dramatically slow
the process of reading and rehydrating data from a deduplicated
backup. Recovery performance is a critical criterion for businesses
trying to quantify their recovery time objectives in the event of a
system or site failure.3

Figure 3. The deduplication tax makes restores take longer

Multiple deduplication agents

Organizations often need to deploy different agents for
source-based deduplication on backup and application servers.
Many deduplication engines segment data into large block sizes.
This coarse-grained chunking can be sufficient for backup servers,
but does not achieve useful deduplication ratios on application
servers or clients.
To alleviate these issues, vendors developed separate deduplication
agents for server and client backups. Client deduplication agents
needed to be application aware to achieve better deduplication
ratios on live systems. As a result, applying source deduplication
across the enterprise meant buying, deploying, and managing
different deduplication agents, putting strain on IT budgets and
making the IT environment more complex and harder to manage.

1. Source: IT Spending Intentions Survey, ESG Research, January 2011.
2. HP StoreOnce video.
3. Source: HP StoreOnce: The Next Wave of Data Deduplication, Enterprise Strategy Group, November 2011.

Deduplication explained
Data deduplication compares chunks of information to
detect duplicates and stores each unique data segment
only once. For that to happen, a deduplication engine
assigns a unique identifier to each chunk of data using
mathematical hash functions. Once it has identified two
chunks of data as identical, the system will replace the
duplicate with a link to the original chunk.
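To make the hash-and-link mechanism concrete, here is a minimal sketch in Python (an illustration only, not HP's implementation; the names are invented):

```python
import hashlib

store = {}    # deduplication store: hash -> unique segment
backup = []   # a backup becomes an ordered list of references

def write_chunk(chunk: bytes) -> None:
    """Identify a chunk by its hash; store it once, link to it thereafter."""
    cid = hashlib.sha256(chunk).hexdigest()  # unique identifier for the chunk
    if cid not in store:
        store[cid] = chunk                   # unique segment: stored only once
    backup.append(cid)                       # duplicates cost only a reference
```

Writing the same segment a thousand times would then consume the space of one stored copy plus a thousand small references.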
There are two architectural approaches to chunking. A
fixed deduplication algorithm breaks data into blocks of a
fixed size. Variable chunking groups the data into blocks
based on patterns in the data itself. The advantage of
variable chunking is that it can recognize duplicates when
small changes have occurred and merely shifted the
data from one backup to the next. Variable chunking, the
technique most commonly used today, leads on average
to 20:1 deduplication ratios, or higher.
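The practical difference between the two approaches can be sketched as follows (a toy Python example; the rolling value stands in for the production-grade fingerprinting real engines use):

```python
def fixed_chunks(data: bytes, size: int = 4096):
    """Fixed chunking: cut every `size` bytes. An insertion near the start
    shifts every later boundary, so downstream chunks no longer match."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, mask: int = 0x0FFF):
    """Variable chunking: cut where a rolling value over the bytes hits a
    pattern, so boundaries move with the data itself."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # toy rolling value, not a real fingerprint
        if (h & mask) == mask:           # boundary hit: ~4 KB average for a 12-bit mask
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks
```

With fixed chunking, a single inserted byte early in a file shifts every later boundary, so almost nothing matches the previous backup; with content-defined boundaries, the cut points move with the data and the unchanged chunks still match.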
Deduplication involves a combination of three elements:
the deduplication engine, the deduplication store, and
backup agents.
The deduplication engine is where the majority of
processing takes place. It manages the logic and
processing of the backup stream by calculating segments
and hash values, identifying unique and repeated
segments, and maintaining the hash lookup table.
The deduplication store is the disk storage location
managed by the deduplication engine. It stores the unique
(deduplicated) segments, and is often physically coupled
with the deduplication engine.
Deduplication-enabled backup agents (for example, media
agents, disk agents, and application agents) manage some
of the deduplication processes. Agents can be deployed
separately from the deduplication engine to offload some
of the performance impact. Agents can perform tasks
such as segmenting the data, calculating the hash value
of segments, and sending new data to the engine and the
store. The deduplication agent talks to the deduplication
engine to calculate which segments are unique.
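A minimal sketch of this division of labor, with the agent hashing locally and the engine answering from its hash lookup table (hypothetical names, not Data Protector's actual API):

```python
import hashlib

class DedupEngine:
    """Toy deduplication engine: owns the hash lookup table and the store."""
    def __init__(self):
        self.store = {}                      # hash -> unique segment

    def filter_new(self, hashes):
        """Report which of the offered hashes have never been seen."""
        return {h for h in hashes if h not in self.store}

    def ingest(self, segments):
        self.store.update(segments)          # persist only the unique segments

def agent_backup(chunks, engine):
    """Agent side: segment and hash locally, then send only unseen data."""
    hashed = [(hashlib.sha256(c).hexdigest(), c) for c in chunks]
    new = engine.filter_new(h for h, _ in hashed)
    engine.ingest({h: c for h, c in hashed if h in new})
    return [h for h, _ in hashed]            # the backup records references only
```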

Deduplication can take place at the application source, at the
backup server, or at the target device.
Application source deduplication removes redundant
data before it is transmitted to the backup target.
Source deduplication reduces storage and bandwidth
requirements. However, it can be slower than target
deduplication and increase the workload on servers.
Backup server deduplication shifts the deduplication
execution onto a separate dedicated server to maximize
the performance of the target device and minimize the
impact on the application source where the application is
running.
Target deduplication removes redundant data from a
backup after it has been transmitted to a hardware device.
This method can use any backup application the device
supports, and the deduplication process is transparent
to the backup application. Backup applications can also
deploy and manage target deduplication onto a variety of
hardware targets such as disk arrays, tape libraries, and
network-attached storage devices. Target deduplication
reduces the volume of storage required for the backup,
but does not reduce the amount of data that must be sent
across a LAN or WAN during the backup.
With target deduplication, backup agents are not
aware of the deduplication process. In backup server
or application source deduplication, backup agents
will have deduplication technology built-in, and will be
deployed onto the backup server or application server as
appropriate.
Backed-up data can be transferred in a variety of
ways. In a traditional transfer, all backup data is sent.
In a deduplicated backup, the backup data stream only
contains the unique segments and references to duplicate
segments. This reduces the network bandwidth required.
With replicated deduplication, the unique backup data
is sent to a replication target, which enables efficient
replication over low-bandwidth links.
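The bandwidth saving comes from the shape of the deduplicated stream itself; roughly (an illustrative Python generator, not an actual wire format):

```python
import hashlib

def deduplicated_stream(chunks, already_stored):
    """Emit full data only for unseen segments; repeats become tiny references."""
    seen = set(already_stored)
    for chunk in chunks:
        h = hashlib.sha256(chunk).hexdigest()
        if h in seen:
            yield ("ref", h)             # duplicate: a reference crosses the wire
        else:
            seen.add(h)
            yield ("data", h, chunk)     # unique: the segment travels once
```

Replication over a low-bandwidth link benefits the same way: only the records carrying new data need to cross the link, because the replication target already holds everything the references point to.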

Figure 4. The HP StoreOnce deduplication engine can be deployed at an application source, a backup server, or a target appliance such as the B6200 appliance or the HP Data Protector software store

HP Data Protector federated deduplication
HP Data Protector software, powered by HP StoreOnce, solves the
challenges of traditional deduplication by offering a federated
deduplication approach. The foundation of federated deduplication
is HP StoreOnce technology, which provides a single, flexible, and
highly efficient deduplication engine. A federated approach supports
the notion that deduplication should be performed only once,
anywhere, with efficient data movement, all managed through a
single pane of glass.
Data Protector federated deduplication supports deduplication
at any location (application source, backup server, or target
appliance), wherever it makes the best business sense.

Flexible anywhere deployment


The HP StoreOnce federated deduplication capability provides a
common architecture that can be deployed across a wide range of
hardware, on both physical and virtual devices, from the edge of
an enterprise to the data center. Today, only HP StoreOnce backup
solutions (HP Data Protector plus HP StoreOnce appliances) can deliver
deduplication anywhere (application source, backup server, and
target device, whether software store or dedupe appliance), because the
StoreOnce technology is not tied to a hardware platform or operating system.
HP Data Protector leverages the flexible architecture of the
HP StoreOnce deduplication engine to provide software store
deduplication, allowing customers to use existing hardware to
deploy deduplicated backups in small and/or remote offices. A
single software deduplication store can be shared among multiple
clients across a LAN or SAN. Software deduplication can be remotely
deployed and managed for use in offices with little to no IT staff.

Highly efficient and intelligent deduplication technology


Data Protector software deduplication is powered by the HP
StoreOnce engine. This patented technology, developed by HP Labs,
is portable, flexible, and scalable. Unlike competing solutions, the
HP StoreOnce engine can be integrated into hardware or software
to deliver greater flexibility and faster backup and recovery
performance.
HP StoreOnce's thin, efficient footprint minimizes the load on CPU
processing and maximizes application availability. It uses as little as
a tenth of the memory of other available solutions, which means it
can be deployed on application or backup servers, and even virtual
machines, without crippling performance.4
The StoreOnce deduplication engine uses an extremely efficient
Adaptive Micro-Chunking technique to segment data into very
small blocks ranging from one kilobyte to 10 kilobytes in size, with
an average of four kilobytes. These four-kilobyte chunks are up to
one sixteenth the size of the blocks used by other solutions. This
increases HP Data Protector's ability to find commonality in the data
stream during deduplication and, thus, store less data on disk.
StoreOnce also uses algorithms called Sparse Indexing and
Contained Matching to reduce the number of times the deduplication
engine has to read the data to determine if chunks match. Instead
of reading an entire data chunk, these algorithms preview parts
of it and compare them to a table of existing chunks stored
in memory. This greatly improves throughput and reduces
processing requirements.
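The general idea behind such sampling-based matching can be illustrated as follows (a highly simplified sketch inspired by the description above, not the patented StoreOnce algorithms; all names are invented):

```python
def sample_hooks(chunk_hashes, mask=0x3F):
    """Preview a segment by sampling a sparse subset of its chunk hashes."""
    return {h for h in chunk_hashes if int(h[:8], 16) & mask == 0}

def best_container(chunk_hashes, sparse_index):
    """Vote for the stored container sharing the most sampled hashes, so a
    small in-memory table is consulted instead of re-reading whole chunks."""
    votes = {}
    for hook in sample_hooks(chunk_hashes):
        for container in sparse_index.get(hook, ()):
            votes[container] = votes.get(container, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

Because only about one hash in 64 is sampled for the mask shown, the in-memory index stays small even when the on-disk store is very large.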
4. Benchmarking and testing done with equivalent deduplication solutions in a controlled environment in the HP R&D lab, Boeblingen, Germany, November 2011.

Efficient data movement

HP StoreOnce supports the HP Catalyst interface, which enables
backup applications to control the entire backup operation, including
data movement all around the organization without the need for
rehydration, and to set asymmetric retention policies on the backup
data at different sites based on business needs.

Centralized management and control

HP Data Protector software manages and controls the entire backup
and recovery process from edge to data center through a single pane
of glass. Centralized management enables IT to deploy, manage, and
monitor backup agents in remote and branch office locations,
eliminating the need for specialized IT staff at these locations.
With StoreOnce Catalyst integration, HP Data Protector manages
and controls deduplication-enabled multi-site replication between
sites, for locally or geographically distributed environments.
Geographically distributed organizations can take control of the
data at their furthest outposts and bring it to the data center in a
cost-effective way.

Figure 5. Single pane of glass management for backup and recovery, deduplication, and replication across the enterprise

Federated deduplication use cases

Today, much of an organization's critical information is created and
consumed at remote office/branch office (ROBO) locations. As
there is often little to no IT expertise at these small remote locations,
they are exposed to data loss (and the subsequent business
fallout) because they are not adequately protected. Additionally, the
traditional tape-based approach for a remote office is cumbersome,
expensive, and labor intensive.
HP Data Protector's federated deduplication capability can be
deployed across a range of different scenarios, particularly in a
global remote office environment, which can include hundreds of
small, medium, and large remote offices with differing backup and
recovery needs.

Standalone small remote office protection with no local recovery needs

Very small remote offices tend to lack backup and recovery
infrastructure. These standalone offices generally have a small
number of applications and servers (fewer than five) that need
to be protected without a local recovery option. Application source
deduplication doesn't require additional hardware, physical
or virtual, to support remote office backup and recovery. With
one-click configuration, HP Data Protector application source
deduplication can be deployed on the application servers remotely
from the central data center. With this deployment strategy, the
HP Data Protector media agent executes deduplication on the
application server for any form of data and backs up remotely to a
target device (for example, HP StoreOnce B6200) in the central data
center. Unique to HP Data Protector, StoreOnce application source
deduplication is delivered through a single media agent.

The extremely thin and efficient HP StoreOnce engine allows
applications and software deduplication to coexist on the same server
without crippling performance. The HP StoreOnce algorithm delivers higher
deduplication ratios through the use of smaller chunk sizes, improving
overall storage efficiency. HP Data Protector deduplication works
for all applications and doesn't require any customization.
Data Protector application source deduplication makes efficient
use of network bandwidth by sending only deduplicated data from
the remote office to the central data center. This reduces backup
windows, especially in high-latency networks. With application
source deduplication, you get easy management, high efficiency, and
complete coverage of your entire environment.
Figure 6. Application source deduplication reduces network bandwidth and eliminates the need for storage and a server at the remote site, providing a cost-effective dedupe solution for small environments

Figure 7. Backup server deduplication minimizes impact on application performance and maximizes performance of the target device

Medium size remote office protection with local recovery requirements
To support more complex configurationsincluding a wider
variety of operating systemsHP Data Protector supports backup
server deduplication. A backup server is essentially a backup client
with an HP Data Protector media agent installed and running the
deduplication task and other standard media management tasks,
such as mirroring using object copy. Running deduplication tasks on
a dedicated server minimizes impact on application performance
and maximizes performance of the target device. The backup server
runs Windows or Linux, but the source servers can run any HP Data
Protector supported application or operating system. A server-side
deduplication strategy is very useful in medium size (5-15 servers)
remote offices that have local recovery requirements.
The data can be backed up to the backup server, allowing staff in a
remote or branch office to run local restores as required. For disaster
recovery, data can be transferred to a target device in a central
data center using HP Data Protector object copy. This approach also
provides efficient use of network bandwidth and reduced
backup windows, especially in high-latency networks.

Large remote office protection with disaster recovery needs


Large remote office, branch office, or regional data center environments
generally have some IT expertise and backup and recovery
infrastructure to protect data. In these environments, an HP StoreOnce
backup appliance or an HP Data Protector software store can be used
locally to store backup data, with the option to replicate data to a
central data center for disaster recovery. It is important to note that
in this case the deduplication processing occurs on the target device.
As illustrated in figure 8, this strategy delivers rapid recovery
and maximum storage efficiency in large remote or branch
office environments.

Figure 8. Deduplication at the target device is easy to deploy and provides maximum storage optimization


Virtual environment protection


As virtualization technologies become more mature and reliable,
IT organizations are increasingly deploying mission-critical
applications in virtual environments that require protection. A large
virtual environment can have thousands of virtual machines running
the same operating system (for example, Microsoft Windows or Linux).
The duplication of information within virtualized data stores drives
enormous consumption of backup storage resources and the
associated capital expenditures.
HP Data Protector provides many advanced options to protect
virtual environments. HP Data Protector's policy-based protection
for applications and virtual environments automates and simplifies
virtual environment protection and frees up IT staff for high-priority
projects that drive business growth. HP Data Protector's
deduplication capabilities offer significant cost savings through
storage efficiency by eliminating redundant operating system
information across backup images and guest profiles, while providing
fast recovery of any data within the backup image. HP Data Protector
provides application-aware, array-based snapshots for virtual
environments across a wide variety of storage arrays and applications,
ensuring business continuity for 24x7 global operations.
With a single pane of glass, HP Data Protector can manage the
entire backup and recovery process across any hypervisor, including
snapshots and replication in VMware, Microsoft Hyper-V, and Citrix
Xen environments.

Conclusion
HP Data Protector is the first and only solution in the market that
supports deduplication at any location (application source, backup
server, and target device) across an enterprise. Powered by the
industry's most advanced deduplication engine, HP StoreOnce,
HP Data Protector's federated deduplication capability provides
the unique ability to deduplicate data only once across multiple
locations. HP StoreOnce technology provides a common architecture
across software and hardware, at remote sites and in the data center,
enabling deduplicated data movement from edge to core without
having to rehydrate at multiple points along the way. HP Data Protector
software central management combined with StoreOnce allows
organizations to maximize their critical storage resources through
the most efficient deduplication available, while meeting stringent
business SLAs and minimizing backup infrastructure costs.
To solve the challenges of traditional deduplication with HP Data
Protector software, visit hp.com/go/dataprotector.


© Copyright 2011-2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and
services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors
or omissions contained herein.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. Oracle is a registered trademark of Oracle and/or its affiliates.
UNIX is a registered trademark of The Open Group.
4AA3-8728ENW, Created December 2011; Updated June 2012, Rev. 3
