Abstract
This white paper discusses the capacity optimization
technologies delivered in the new EMC VNX series of storage
platforms. Deduplication and compression capabilities for file
and block storage are delivered standard with the VNX
Operating Environment.
September 2014
Table of Contents
Executive Summary ................................................................................................. 5
Audience ............................................................................................................................ 5
Conclusion ............................................................................................................ 37
References ............................................................................................................ 38
Executive Summary
Capacity-optimization technologies play a critical role in today's environment, where
companies need to do more with less. The new EMC VNX series of storage arrays
is well equipped to meet users' needs in this regard. By utilizing deduplication and
compression, users can significantly increase storage utilization for file and block
data. The intelligent and automated deduplication and compression features on both
Block and File are provided in the VNX Operating Environment at no additional cost.
Management is simple and convenient. Once the capacity-optimization technologies
are turned on, the system intelligently manages capacity-optimization processes as
new data is written. With Unisphere, users can manage block and file data from a
single, user friendly interface.
This white paper discusses the capacity-optimization capabilities of the new VNX
series systems, how they are best deployed, and how they fit in with other
capacity-optimization technologies in the storage environment.
Audience
This white paper is intended for anyone interested in understanding the
deduplication and compression functionality included with the new VNX series
storage systems.
The product release notes provide the most up-to-date information on product
features and enhancements.
During a migration of a pool LUN into the deduplication container (when deduplication
is enabled) or out of the deduplication container (when deduplication is disabled on
a LUN), the pool requires free space greater than 110% of the consumed capacity of
the original LUN. Creating a LUN with deduplication already enabled is suggested, as
this eliminates the migration and the overhead that comes with it.
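The free-space requirement above is a simple threshold check. A minimal sketch follows; the function name and units are illustrative, not an EMC API:

```python
def can_migrate(consumed_gb: float, pool_free_gb: float) -> bool:
    """Pool free space must exceed 110% of the LUN's consumed capacity."""
    return pool_free_gb > 1.10 * consumed_gb
```

For example, migrating a LUN with 100 GB of consumed capacity requires more than 110 GB of free space in the pool.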
The deduplication container is private space within a pool in which all data for
deduplication-enabled LUNs resides. This container is created when the first
deduplication-enabled LUN is created within a pool, or deduplication is enabled on a
LUN within a pool. The deduplication container is destroyed when deduplication is
disabled on the last deduplicated LUN or when the LUN is deleted. There will only be
one deduplication container per storage pool. The total space used for the container
depends directly on the amount of space used by the deduplicated LUNs. This space
is not reserved when a pool is created, but rather is used when deduplicated LUNs
exist. Non-deduplicated and deduplicated LUNs can coexist within the same pool.
When a deduplication container is created, the SP owning the container needs to be
determined. This is determined by the Allocation Owner of the first LUN deduplication
is enabled for in the pool. If the Allocation Owner of the LUN is SPA, then the
deduplication container will reside on SPA. Enabling Block Deduplication on
subsequent LUNs will cause the Allocation Owner of the LUNs to be on the SP that
owns the deduplication container. For existing LUNs, the Default and Current Owners
will be retained when deduplication is enabled even if the Allocation Owner changes.
A LUN's Allocation Owner can be viewed on the General tab within the LUN Properties
window in Unisphere. This is also where you can determine which SP owns the pool's
deduplication container. To determine which SP owns the deduplication container for
a pool, view the Allocation Owner of a deduplication-enabled LUN in the pool.
EMC recommends that all LUNs with Block Deduplication enabled within the same
pool should be owned by the same SP. After enabling deduplication on an existing
LUN, verify that the Default and Current Owner match the Allocation Owner of the LUN.
If the Default Owner does not match the Allocation Owner, the Default Owner should
be changed via the Properties page of the LUN in Unisphere. If the LUNs Current
Owner does not match the Allocation Owner, the LUN should be trespassed by rightclicking the LUN in Unisphere and clicking Trespass.
When creating new LUNs with deduplication enabled, first determine the SP owner of
the deduplication container by reviewing the Allocation Owner of a LUN with
deduplication enabled within the pool. During the creation of the new LUNs with
deduplication enabled in this pool, ensure that you select the Default Owner to match
the SP owning the deduplication container for the pool in the Advanced tab within the
Create LUN dialog window.
EMC also recommends balancing the deduplication containers between the SPs. If
deduplication is enabled on one pool, and the deduplication container is owned by
SPA, the next container should be created on SPB. In this example, if the user was
enabling deduplication for the first time on a second pool, the user should enable
deduplication on a LUN with an Allocation Owner of SPB. This will help to balance the
deduplication load across SPs. The owner of a pool's deduplication container cannot
be changed without disabling deduplication on all LUNs within the pool.
VNX Block deduplication runs in the background on the contents of a deduplication
container post process, 12 hours after the previous deduplication run completed on
the pool. Each pool in the system has an independent deduplication container and
process unique to the pool. The contents in a deduplication container are compared
against each other only.
When the 12-hour timer expires and the deduplication process is set to run, the
number of running deduplication processes on the current SP is checked. A maximum
of 3 deduplication processes per SP may run at a time, and if an open slot is not
found, the deduplication process is queued until one becomes available. The SP the
process runs on is based on the SP the first deduplicated LUN was bound on.
Once a deduplication process is allowed to run, the metadata for the deduplicated
LUNs is checked for a total of 64 GB of new and updated data. If the pool does not
contain 64 GB of new or updated data, deduplication does not run and the
twelve-hour timer is reset. If 64 GB of new data exists, the new data is first segmented into
8 KB blocks. An algorithm is then run on each new 8 KB block of data on the
deduplication container to create a unique digest (hash) which represents each piece
of data. These digests are stored and later sorted on private space on the pool. The
size of the private space depends directly on the number of digests created to
represent the data inside the deduplication container. The deduplication process will
run to completion, or until deduplication is paused either by the user or by the system
if the maximum continuous run time of 4 hours is reached.
The digests are then compared to identify duplicates. Once duplicate digests are
found, the blocks of data that the digests represent are bit-by-bit compared to verify
the data is exactly the same. If the blocks of data are the same, the index for each of
the LUNs which contain the duplicate data will be updated to point to a single
instance of the data. Once the indexes are updated, unused duplicate 8 KB blocks are
removed. A 256 MB slice may be freed back to the pool for use when the slice no
longer contains any data.
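The deduplication pass described above (segment new data into 8 KB blocks, hash each block to a digest, compare digests, then verify candidate duplicates bit for bit before pointing all references at a single instance) can be sketched as follows. This is an illustrative model only: the 8 KB block size comes from the text, but the hash choice and in-memory structures are assumptions, not the array's implementation.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # deduplication granularity described in the text

def deduplicate(blocks):
    """Map each 8 KB block to a single stored instance; return (index, store)."""
    store = {}   # digest -> the one retained copy of the block
    index = []   # per-block reference into `store`
    for data in blocks:
        digest = hashlib.sha256(data).digest()
        existing = store.get(digest)
        if existing is not None:
            assert existing == data      # bit-for-bit verification of the match
        else:
            store[digest] = data         # first instance: keep it
        index.append(digest)
    return index, store

def savings_bytes(blocks, store):
    """Space reclaimed: every block beyond the retained instances."""
    return (len(blocks) - len(store)) * BLOCK_SIZE
```

With four identical blocks and one unique block, the store retains two instances and three blocks' worth of space is reclaimed.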
If a deduplication process has run on a pool for 4 hours straight, the run is paused
and the system checks for queued deduplication processes on that SP. If a
deduplication process is queued to run, it is allowed to run and the process that was
paused will wait to continue. Queued processes are run on a first-come,
first-served basis. Once a free slot opens, the paused process continues where it
left off. If there are no queued processes, the process continues to run.
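The per-SP scheduling described above (at most three concurrent processes, with excess requests queued first come, first served, and a slot handed to the next queued process when one is released) can be modeled with a small sketch. The class and method names are ours, purely for illustration:

```python
from collections import deque

MAX_ACTIVE = 3  # maximum deduplication processes per SP

class SpDedupScheduler:
    """Illustrative model of per-SP deduplication process slots."""

    def __init__(self):
        self.running = set()
        self.queue = deque()

    def request(self, pool):
        # Run immediately if a slot is free; otherwise queue FIFO.
        if len(self.running) < MAX_ACTIVE:
            self.running.add(pool)
        else:
            self.queue.append(pool)

    def release(self, pool):
        # Called when a process completes or pauses at the 4-hour run limit.
        self.running.discard(pool)
        if self.queue:
            self.running.add(self.queue.popleft())
```

Requesting a fourth process queues it; releasing any running process promotes the queued one.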
At any point in time, you may pause block deduplication at the array level or at the
pool level. Pausing deduplication at either level does not disable block
deduplication, but rather pauses or prevents any deduplication process from running
on the array. Pausing deduplication at the system level will also pause any migrations
on the system caused by enabling or disabling deduplication on a LUN. Pausing
deduplication at the Pool level will cause deduplication migrations within the Pool to
pause. Once deduplication is resumed at the system level, the deduplication
processes for each pool will check to see if a deduplication run on the pool is needed,
and any deduplication migrations continue. If a deduplication process needs to run,
the process on those pools will start shortly after the user resumes deduplication.
Resuming deduplication at the Pool level will cause the Pool to check if a
deduplication process needs to run. Paused migrations on the Pool will also resume.
Management
You can easily manage VNX Block Deduplication by using either the Unisphere
software or NaviSecCLI at the System, Pool, and LUN levels. Once enabled, the system
automatically manages the processing of eliminating duplicate data based on the
LUNs in the deduplication container.
Figure 5 shows the Create LUN dialog box. When creating a Pool LUN, the user is given
the option of enabling VNX Block Deduplication at the time of creation. Notice that
the Thin checkbox is also enabled, since Deduplicated LUNs are Thin LUNs by
definition.
Figure 6 shows the Advanced tab of the Create LUN dialog box. When creating a
Deduplicated LUN, the Default Owner is automatically selected to match the owner of
the Deduplication Container within the Pool, if one exists. This is done to ensure
proper LUN ownership within the deduplication container. The Auto choice is not
selectable when a Deduplication Container exists in the Pool. The Default Owner will
be Auto if no Deduplicated LUNs exist in the Pool.
For Pool LUNs that are already created, controls for VNX Block Deduplication are
available in the properties dialog box of the LUN. Figure 8 shows the Deduplication
tab within the LUN properties window. The user is able to check the Turn on
Deduplication check box to enable VNX Block Deduplication on the LUN.
A Deduplication column can also be added to the LUNs table to display the
deduplication state of each LUN. This is done by clicking the wrench icon in the top
right of the LUNs window, selecting Choose Columns, and checking the Deduplication
check box.
The tiering policy shown on the Deduplication tab applies to all LUNs in the
deduplication container. As with other pool-based LUNs, the default setting is Start
High then Auto-Tier.
Next on the Deduplication tab is the Deduplication rate. The Deduplication rate is the
rate at which the deduplication process runs, not the level of deduplication
performed. The default for this setting is Medium. The Deduplication rate also
controls the rate at which deduplication migrations run. At a rate of medium,
deduplication migrations caused by enabling or disabling deduplication on a LUN
within the Pool will run at medium.
The Deduplicated LUNs Shared Capacity is also displayed on the Deduplication tab.
This value, in GB, is the total amount of space currently used by deduplication on
the pool. This value includes space used by LUNs and Snapshots.
Deduplication and Snapshot Savings is also provided. This is the total amount of
space, in GB, the system is saving on this pool by using deduplication. Assume that
the same 10 GB of application data is copied to four LUNs on the same pool. When
these LUNs are deduplicated, only one copy of this data is needed, and the
duplicates are deleted. The Deduplication and Snapshot Savings in this example will
be 30 GB, as 10 GB is kept for referencing and the other three instances of the
10 GB of data are deleted. The savings may increase further if other duplicate data
within the deduplication container can be identified.
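The savings arithmetic in this example is simply (copies - 1) x size, since one instance is always retained. A trivial sketch, with an illustrative function name:

```python
def dedup_savings_gb(copies: int, size_gb: float) -> float:
    """Space saved when `copies` identical datasets collapse to one instance."""
    return (copies - 1) * size_gb

# Four LUNs each holding the same 10 GB of data: 30 GB saved.
print(dedup_savings_gb(4, 10))  # -> 30
```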
The data for VNX Snapshots taken on deduplicated LUNs is also stored within the
Deduplication Container. This allows the data to be deduplicated, and the savings
attained are counted towards the pool savings.
Last on the Deduplication tab in Figure 11 is the Refresh button. The Refresh button at
the bottom of the screen updates the information on the Deduplication tab when
clicked.
Figure 12 is the Deduplication Summary window, which displays deduplication
information for the array. In this window, the user is able to view the current System
Deduplication State, which can either be shown as On, Off, or Paused. Next to the
System Deduplication State are the Pause and Resume buttons. The user may Pause
or Resume deduplication at the system level here.
In the Pools portion of the window, each pool that has deduplication enabled LUNs is
displayed. Information for each pool is displayed, including the Pool Name, the
Deduplication Status (Pending, Running, or Paused), the selected Rate (High,
Medium, or Low), the Total Capacity and Free Capacity for the pool, the Shared
Capacity for the deduplicated LUNs, and Savings information for deduplication.
Block Deduplication with Multicore FAST Cache, FAST VP, and Multicore Cache
The basic purpose of VNX Block Deduplication is to eliminate data that is common between
deduplication enabled LUNs within a pool. While saving space in itself is a good reason to
enable deduplication, EMC suggests using deduplication with Multicore FAST Cache and FAST
VP to achieve better efficiency of these features.
After duplicate data has been removed within a deduplication container, FAST VP operations
become more efficient as there is less data to track and promote. Also, with a consolidated
storage footprint, FAST VP can better utilize space on the pool. Consider multiple LUNs on the
same pool that have been deduplicated. Before deduplication, due to application activity on
these LUNs, the slices for each of these LUNs would be contending with each other for
promotion. After deduplication, exact copies of data are removed and contention between
common data is eliminated. Now, only unique data within the deduplication container is
contending with the data in the pool for promotion.
Multicore FAST Cache also is more efficient when used with deduplication. With the removal of
exact copies of active data on the underlying pool, Multicore FAST Cache can also free this
space for other promotions. Data not previously considered for promotion may also benefit due
to the consolidation of I/O the deduplication causes. Consider multiple copies of an 8 KB block,
each with minimal activity. After deduplication runs and removes all but one 8 KB instance of
this data, all IO to each of the previous blocks is directed to the single 8 KB block. This block
becomes more utilized which may cause a promotion and allow the data to benefit from
residing on Flash based storage.
Multicore Cache also benefits when hot 8 KB blocks are deduplicated. Deduplication of these
hot blocks frees memory in Multicore Cache for other hot blocks to utilize. If deduplication
causes a block to be highly accessed, the block may stay within Multicore Cache.
Interoperability
Table 1 below outlines the interoperability of VNX Block Deduplication with other array
features.

LUN Migration: Fully supported into or out of the deduplicated pool, but not while a
LUN is in the enabling or disabling deduplication states (those states use a LUN
migration internally, and a migration cannot be initiated on a LUN that is already
migrating). Similarly, a LUN that is currently migrating cannot have its
deduplication-enabled state changed.

VNX Snapshots: Fully supported. A snapshot cannot be taken while the LUN is being
migrated into or out of the pool (when the LUN is in the enabling or disabling
states). Snapshots that exist prior to enabling or disabling deduplication are
destroyed when the operation completes (with a user warning before proceeding).

Compression: Not supported. The current implementation of compression is not
compatible with the deduplication container, so the two efficiency features would
counteract each other.

Controller Based Encryption: Fully supported.

FAST VP: Fully supported, but a single set of attributes on the pool applies to all
deduplication-enabled LUNs in that pool. (Non-deduplicated LUNs still maintain a
per-LUN FAST VP policy.)

SnapView Snapshots: Fully supported.

SnapView Clones: Fully supported; however, best practices dictate that the clone
should not reside in the same pool as the source LUN. With deduplication, a clone in
the same deduplication pool will share duplicate blocks, defeating the purpose of
creating a clone of the source on separate spindles. Unisphere issues a warning if
the user creates a clone target in the same pool as the source LUN.

MirrorView (Sync & Async): Fully supported.

SAN Copy: Fully supported.

Reserved LUN Pool: A deduplicated LUN cannot be used in the Reserved LUN Pool.

Clone Private LUN: A deduplicated LUN cannot be used as a Clone Private LUN.

Table 1. VNX Block Deduplication interoperability
Reviewing the content of LUNs is suggested when considering LUNs for Block Deduplication. In
instances where portions of data on a single LUN are a good fit for Block Deduplication while
other portions are not, consider placing the data on different LUNs when possible.
Disabling compression on a LUN will cause all compressed data for the LUN to decompress.
This will increase the size of the LUN on disk, so space considerations should be taken into
account. Decompression operations are explained later in this section.
Compression Rate
Moving datasets from Classic LUNs or Thick LUNs to Thin LUNs provides the benefits
of recapturing and repurposing unutilized capacity, along with a pay-as-you-go
capacity consumption model. Users make this tradeoff when they do not have
stringent IOPS and response-time requirements. Compression takes this a step
further, providing even greater capacity savings at the cost of additional system
resources and LUN performance. The user may adjust the compression rate in the LUN
Properties window at any time. The choices are Low, Medium, or High, with Medium
being the default rate. Compression is best suited for data that is largely inactive
but requires five-nines availability.
EMC recommends that you change the compression rate to Low at the system level
when response-time critical applications are running on the storage system. You can
also Pause compression at the system level during spikes in activity if you wish. The
process of returning capacity to the pool that has been freed by compression can
contribute to CPU and write cache usage as data is consolidated onto as few slices as
possible. This process can also continue after the compression process completes.
Pausing the compression feature will ensure any background compression or
associated space reclamation operations do not impact server I/O.
The initial compression of Classic LUNs closely tracks the behavior of standard LUN
Migration to a Thin LUN target. Compression is initially performed inline with the
migration.
The difference in bandwidth with different compression rate settings for the new VNX
series is shown in Figure 13. There is a larger difference between the low and medium
rates than between medium and high. This behavior is consistent across all models.
[Figure 13. Compression Operations: relative bandwidth (MB/s) at the High, Medium,
and Low compression rate settings]
Management
An online LUN migration of Classic LUNs to Thin LUNs is performed when compression
is enabled on a Classic LUN. All aspects of the migration are handled for the user with
no interruption to server I/O. Since Classic LUNs do not already reside in a pool, the
user is prompted to select a destination pool for the compressed LUN. Figure 14
shows the Turn On Compression dialog box that allows the user to select an existing
pool or launch the Create Pool dialog box to create a new destination pool. Users can
also set the rate of the migration in this dialog box. Keep in mind this rate applies
to the migration and not to the in-place conversion that takes place when LUNs are
already in the pool.
VNX Block Compression is enabled by checking the Turn on Compression checkbox and
clicking Apply, as shown in Figure 16. After the LUN reaches a Compressed state, it
consumes only 63 GB of space in the pool, for a savings of just over 193 GB.
To view the LUN compression states, see Unisphere Help for a list and more
information about each state. The table located in the Help will show the
compression state, a description of that state, and what LUN types this state is valid
for.
The Compressed LUNs Summary dialog box shown in Figure 19 provides a consolidated view of
block compression activity for all LUNs. It also provides system-level compression control by
using the Pause Feature button and Resume Feature button at the bottom of the dialog box.
Available Limits
Table 2 details the limits for the VNX Block Data Compression feature.
VNX Model                          5200   5400   5600   5800   7600   8000
Total Compressed LUNs              1000   1000   1000   2000   3000   4000
Concurrent compressions per SP       20     20     20     20     32     40
Concurrent migrations per array      16     16     16     16     24     24

Table 2. Limits for the VNX Block Data Compression Feature
Notes:
Interoperability
Table 3 below outlines the interoperability of VNX Block Compression with other array
features.

LUN Migration: VNX Block Compression cannot be enabled on a LUN while the LUN is
being migrated.

VNX Snapshots: Cannot be compressed with VNX Block Compression.

Deduplicated LUNs: Not supported with VNX Block Compression.

Controller Based Encryption: Fully supported.

FAST VP: Fully supported.

SnapView Snapshots: Not supported with VNX Block Compression.

SnapView Clones: A VNX Block Compression enabled LUN can be a clone source.

MirrorView (Sync & Async): A VNX Block Compression enabled LUN cannot be replicated
to an array running pre-Release 29 FLARE code.

SAN Copy: A VNX Block Compression enabled LUN can be the destination of a SAN Copy
session. All data is treated as new, so a decompression and re-compression operation
may occur for the updates.

Reserved LUN Pool: A VNX Block Compression enabled LUN cannot be used in the
Reserved LUN Pool.

Clone Private LUN: A VNX Block Compression enabled LUN cannot be used as a Clone
Private LUN.

Write Intent Log: A VNX Block Compression enabled LUN cannot be used as the Write
Intent Log.

Private LUNs: Not supported with VNX Block Compression.

Table 3. VNX Block Compression Interoperability
Block Compression with VNX File
EMC highly recommends that you create File Systems using thick, non-compressed
LUNs. In VNX OE for Block Release 33, the default LUN type for a new LUN is thin, so
this can be accomplished by unchecking the Thin checkbox when creating the LUN.
Block Compression will be disabled by default. Compression optimization can be
achieved using the thin and File deduplication (which includes compression)
attributes at the File System level.
Setting                        Definition                                        Default value
Access Time                    Length of time in days that the file has not      15 days
                               been accessed
Case Sensitive                 -                                                 Off
Modification Time              Length of time in days that the file has not      15 days
                               been modified
Minimum Size                   Minimum size for a file to be deduplicated        24 KB
Maximum Size                   Maximum size for a file to be deduplicated        8 TB
File Extension Exclude List    File extensions to skip during the scan           None
Minimum Scan Interval          Minimum number of days between deduplication      7 days
                               scans of a file system
SavVol High Water Mark         SavVol usage above which deduplication will       90%
                               not run
Pathname Exclude List          Pathnames to skip during the scan                 None

Table 4. VNX File Deduplication and Compression File System/Data Mover Settings
Before a deduplication and compression scan begins, the date of the last deduplication scan is
checked against the Minimum Scan Interval. By default, a deduplication scan will not start if
the previous scan ran within the last 7 days. If 7 days have passed, the usage of the SavVol is
then checked to verify it is below the SavVol High Water Mark. If SavVol usage is above the
SavVol High Water Mark before deduplication runs, deduplication does not start; if the mark is
crossed while deduplication is running, deduplication is paused. The SavVol size is checked
because deduplication may cause large changes on a File System for which checkpoints are created.
Next, the File System begins checking for files that do not reside in excluded paths. The user is
able to specify which pathnames they wish to be excluded during the scan. Next, a file found is
checked to see if the file extension is on the excluded list. The user can update the file
extension exclude list to specify different file extensions (such as .txt or .zip) they wish for
deduplication to pass over.
If the file is not in an excluded path and does not have an excluded file extension, the file's
last access and modification times are checked. These times are compared to the current
settings; if the file's last access and modification dates are older than the values currently
set, the remaining policies are checked for compliance. For example, if the user wishes to skip
files accessed or modified within the last 30 days, these settings can be configured
accordingly and such files will be skipped.
The size of the file is also checked against the policy criteria. If the file is larger than
the Minimum Size setting and smaller than the Maximum Size setting, the policy check moves
on to the other settings. By default, the minimum file size is set to 24 KB and the maximum
file size is set to 8 TB. If the file is too small or too large, it is not deduplicated.
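The per-file policy walk described above (path exclusion, extension exclusion, age, then size) can be sketched as a simple filter. The defaults mirror Table 4; the function and parameter names are illustrative, not VNX API names:

```python
import os

# Assumed defaults, per Table 4
MIN_SIZE = 24 * 1024          # 24 KB
MAX_SIZE = 8 * 2**40          # 8 TB
AGE_DAYS = 15                 # access/modification age threshold, in days

def eligible_for_dedup(path, size_bytes, days_since_access, days_since_modify,
                       excluded_paths=(), excluded_exts=()):
    """Return True only if a file passes every policy check described above."""
    if any(path.startswith(p) for p in excluded_paths):
        return False                      # resides in an excluded path
    if os.path.splitext(path)[1] in excluded_exts:
        return False                      # has an excluded file extension
    if days_since_access < AGE_DAYS or days_since_modify < AGE_DAYS:
        return False                      # accessed or modified too recently
    if not (MIN_SIZE <= size_bytes <= MAX_SIZE):
        return False                      # too small or too large
    return True
```

A file only reaches the size check after surviving the exclusion and age checks, matching the order described in the text.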
Table 5 shows other settings associated with deduplication and compression. Both of these
values are user-customizable.
Setting                  Definition                                                  Default Value
CPU % High Water Mark    If the CPU reaches this level, deduplication will           75%
                         throttle down (Data Mover level setting)
CPU % Low Water Mark     If deduplication is throttled down and the CPU level        40%
                         returns to this level, deduplication will throttle up
                         (Data Mover level setting)

Table 5. Other deduplication and compression settings
The last setting is the Backup Data High Water Mark, which is shown in Table 6. The user can
change this value and customize it for their environment.

Setting                       Definition                                                  Default Value
Backup Data High Water Mark   Percentage of the logical size of the file that the         90%
                              space-reduced size should be for NDMP to back up the
                              file in its space-reduced format

Table 6. NDMP backup related setting
Duplicate Detection Method
When enabling Deduplication on VNX File, you have the choice of which Duplicate Detection
Method to utilize. Figure 21 below shows the available choices for the Duplicate Detection
Method. The choices are sha1, byte, or off.
Choosing sha1 causes the system to compute a SHA-1 hash of each file's contents; when files
are deduplicated, the SHA-1 hashes are compared and matching files are identified. After
matches are found, the data is deduplicated. The likelihood that two files with different
contents are given the same SHA-1 hash is very low, with the odds being less than 1 in 2^60.
Choosing byte will force the system to utilize a byte-by-byte comparison method when
confirming file contents are exact before they are deduplicated. The byte-by-byte comparison
occurs only after SHA-1 is used to identify potential matches, which helps to reduce the amount
of overhead using byte causes. A considerable amount of overhead can be expected when
using byte, especially for large files. Using a byte-by-byte comparison method may be preferred
if you need to guarantee the files being deduplicated are exactly the same.
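The two detection methods can be modeled as follows: both start from a hash comparison, and byte adds a full content comparison after a hash match. This is an illustrative sketch with names of our choosing, not the Data Mover's actual code path:

```python
import hashlib

def is_duplicate(file_a: bytes, file_b: bytes, method: str = "sha1") -> bool:
    """Decide whether two files' contents are duplicates using the chosen method."""
    if hashlib.sha1(file_a).digest() != hashlib.sha1(file_b).digest():
        return False                  # hashes differ: definitely not duplicates
    if method == "byte":
        return file_a == file_b       # byte-by-byte confirmation after hash match
    return True                       # sha1 method trusts the matching hashes
```

The byte path only pays its comparison cost on hash matches, which is why the overhead it adds is bounded by the number of candidate duplicates rather than by every file pair.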
Choosing off will cause Deduplication to be disabled for the File System. If the global
Deduplication setting for the File System is enabled, and the Duplicate Detection Method is off,
Compression will be the only method of capacity optimization that the File System utilizes.
Changing the Duplicate Detection Method
If you choose to change the Duplicate Detection Method, the new setting takes effect from
that point forward. All files that were previously deduplicated using a different method
remain unchanged. Changing the Duplicate Detection Method after enabling deduplication on a
File System is not suggested.
Compression Method
VNX File Compression has two methods by which space savings can be achieved: Fast or Deep
compression. Fast is the default setting, which optimizes for both space savings and speed.
Deep is optimized for space savings rather than speed. These options are available in the
Deduplication Settings tab of the File System's properties window.
Choosing a Compression Method
There are several key considerations when choosing between the Fast or Deep compression
methods. The first consideration is performance. The Deep compression method takes a longer
time on each file, compared to Fast, to achieve the highest level of space savings during the
compression phase of the operation. Decompression speed is another consideration. As files
that are compressed are accessed, they must be decompressed into memory before the I/O is
completed. Accessing a file that has been compressed using the Deep method may take more
time than if it was compressed using the Fast method. The amount of time depends on the
amount of data to rehydrate and the amount of data being accessed.
Another consideration is space savings. Using the Deep method of compression will achieve up
to 30% more space savings than if Fast is used. For example, if the Fast method is run on a file
and the space savings is 50%, then if the Deep method is used, the space savings on the same
file would be near 65%. If achieving the highest space savings is the goal, then the Deep
compression method should be used.
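The speed-versus-savings tradeoff is characteristic of compressors in general, and can be demonstrated with zlib's compression levels. zlib here is purely an analogy for illustration; it is not the codec VNX uses, and the input data is made up:

```python
import zlib

data = b"sample application data block " * 4000  # highly repetitive input

fast = zlib.compress(data, 1)   # "fast"-like: quicker, lower savings
deep = zlib.compress(data, 9)   # "deep"-like: slower, higher savings

# The deeper setting should not produce a larger result on the same input.
print(len(deep) <= len(fast))  # -> True
```

The same logic applies in reverse on access: data squeezed harder generally costs more to rehydrate.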
The last consideration is compatibility. Not all replication software supports the Deep
compression setting. When replication software that does not support the Deep method attempts
a read operation on a file compressed using the Deep method, the typical error seen is an I/O
error. Check whether the replication software supports Deep compression before using it.
CIFS Compression Enabled
Table 7 shows an option for controlling whether or not compression is enabled on CIFS. By
default, CIFS compression is enabled. CIFS compression will only occur if Deduplication is
turned on or suspended at the File System level. If Deduplication is disabled at the File System
level or is in the process of disabling, even if the CIFS Compression Enabled box is checked,
CIFS compression will not run.
Setting                     Definition                  Default Value
CIFS Compression Enabled    Enables CIFS compression    On

Table 7. Compression related settings
Management
VNX File Deduplication and Compression is enabled on a file system basis. Disabled by default,
it can be enabled at the time the file system is created or afterward, if the user chooses. A
single setting, called Deduplication, enables both File Deduplication and Compression, which
work together to remove duplicate data and save space.
Figure 22 below shows the Deduplication Enabled setting, which can be enabled at the time the
File System is created.
Files scanned: Total number of files that the deduplication policy engine looked at
when it last scanned the file system.

Original data size: Space required to store the data in the file system if it is not
deduplicated. This number might exceed the capacity of the file system, in which case
the file system is said to be overprovisioned. This is shown by the ratio of the
original data size to the file system capacity, which is also displayed.

After the first scan, statistics are reported as static values based on the last
successful scan. Figure 27 below is an example of the Deduplication Status and
Statistics for a file system.
Conclusion
VNX storage systems provide powerful capacity-efficiency features that can improve
effective capacity utilization. These capacity-optimization features are included with
the VNX Operating Environment at no additional cost. The deduplication and
compression features for file and block storage offer complementary capacity
efficiency opportunities for all data types in the primary storage systems.
The new VNX Block Deduplication is another way space savings can be achieved.
Data contained on deduplicated LUNs within a pool is compared at the 8 KB block
level and duplicates are eliminated to save space. Unlike VNX Block Compression,
deduplication is suggested for use on active datasets where a large amount of
duplicate data exists. EMC suggests using deduplication with Multicore FAST Cache
and FAST VP to achieve better efficiency of these features.
VNX Block Compression runs at the sub-LUN level in 64 KB increments. If at least 8 KB
can be saved on the 64 KB chunk, the data is compressed. EMC suggests using VNX
Block Compression on static datasets like archives. The best use cases are those
where data needs to be stored most efficiently, with a high degree of availability, and
where performance overhead can be tolerated.
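The block-compression rule summarized here (compress each 64 KB increment and keep the compressed form only if at least 8 KB is saved) can be sketched as below. zlib stands in for the array's actual codec, which this paper does not specify, and the function name is ours:

```python
import os
import zlib

CHUNK_SIZE = 64 * 1024       # compression increment from the text
MIN_SAVINGS = 8 * 1024       # keep compressed form only if >= 8 KB is saved

def compress_chunk(chunk: bytes) -> bytes:
    """Return the compressed chunk if it meets the savings floor, else the original."""
    compressed = zlib.compress(chunk)
    if len(chunk) - len(compressed) >= MIN_SAVINGS:
        return compressed
    return chunk

# A chunk of zeros compresses far beyond the floor; random data does not.
zeros = bytes(CHUNK_SIZE)
random_chunk = os.urandom(CHUNK_SIZE)
print(len(compress_chunk(zeros)) < CHUNK_SIZE)          # -> True
print(len(compress_chunk(random_chunk)) == CHUNK_SIZE)  # -> True
```

The floor explains why compression favors static, compressible datasets: incompressible chunks are simply stored as-is.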
VNX File Deduplication and Compression work together to provide space savings on
VNX File data. While the user has the choice of running compression with or without
deduplication enabled, running them together will achieve the largest amount of
space savings. Deduplication first single-instances duplicate files on the File
System, and then compression reduces the size of the remaining files. The savings
achieved with VNX File Deduplication and Compression will vary, as the savings
depend entirely on the files stored on the file system.
References
The following white papers are available on EMC Online Support:
EMC VNX Unified Best Practices for Performance - Applied Best Practices Guide
                         Initial Analysis of 2TB            After Migration to VNX
                         Windows server (FsUtil output)
File System              2,471 GB                           1,590 GB
Deduplication Results    Not enabled                        3,461,285 files deduplicated
Savings                  0
Customer B
Customer has an older NAS that needs to be migrated to a VNX:
Highly visible

                         Initial Analysis of Older NAS      After Migration to VNX
                         (FsUtil output)
File System              760 GB                             760 GB
Deduplication Results    Not enabled                        44,862 files deduplicated
Savings                  0                                  ???

                         Initial Analysis of Older NAS      After Migration to VNX
                         (FsUtil output)
File System              760 GB                             760 GB
Deduplication Results    Not enabled                        845,293 files deduplicated
Savings                  0                                  276 GB
Summary: Do not turn on checkpoints before doing migration with deduplication and
compression enabled.
If checkpoints are turned on before a migration with deduplication enabled, the system
deduplicates files to clear space, freeing up blocks for reuse. As new writes begin to fill
those freed blocks, the checkpoint SavVol must track each of them as a block needing to be
saved. This can cause the SavVol to grow very large and keep expanding to track all the new
changes.