
Hitachi Universal Storage Platform V

Architecture and Concepts


A White Paper

By Alan Benway (Performance Measurement Group, Technical Operations)

Confidential Hitachi Data Systems Internal Use Only

June 2009

Notices and Disclaimer


Copyright 2009 Hitachi Data Systems Corporation. All rights reserved.
The performance data contained herein was obtained in a controlled isolated environment. Actual
results that may be obtained in other operating environments may vary significantly. While Hitachi
Data Systems Corporation has reviewed each item for accuracy in a specific situation, there is no
guarantee that the same or similar results can be obtained elsewhere.
All designs, specifications, statements, information and recommendations (collectively, "designs") in
this manual are presented "AS IS," with all faults. Hitachi Data Systems Corporation and its suppliers
disclaim all warranties, including without limitation, the warranty of merchantability, fitness for a
particular purpose and non-infringement or arising from a course of dealing, usage or trade practice.
In no event shall Hitachi Data Systems Corporation or its suppliers be liable for any indirect, special,
consequential or incidental damages, including without limitation, lost profit or loss or damage to
data arising out of the use or inability to use the designs, even if Hitachi Data Systems Corporation
or its suppliers have been advised of the possibility of such damages.
Universal Storage Platform is a registered trademark of Hitachi Data Systems, Inc. in the United
States, other countries, or both.
Other company, product or service names may be trademarks or service marks of others.
This document has been reviewed for accuracy as of the date of initial publication. Hitachi Data
Systems Corporation may make improvements and/or changes in product and/or programs at any
time without notice.
No part of this document may be reproduced or transmitted without written approval from Hitachi
Data Systems Corporation.

WARNING: This document is HDS internal documentation for informational purposes only. It is not to be
disclosed to customers or discussed without a proper non-disclosure agreement (NDA).

Document Revision Level


Revision   Date            Description
1.0        December 2007   Initial Release
1.1        April 2008      Updates
2.0        February 2009   Major revision - concepts additions, updates
2.1        June 2009       Changes to MP Workload Sharing, eLUNs, ePorts, Cache Mode, and
                           Concatenated Array Groups discussions; additional tables

Reference
Hitachi Universal Storage Platform V Performance Summary 08212008

Contributors
The information included in this document represents the expertise, feedback, and suggestions of a
number of skilled practitioners. The author would like to recognize and thank the following reviewers
of this document:
Gil Rangel, Director of Performance Measurement Group - Technical Operations
Dan Hood, Director, Product Management, Enterprise Arrays
Larry Korbus, Director, Product Management, HDP, UVM, VPM features
Ian Vogelesang, Performance Measurement Group - Technical Operations

Table of Contents
Introduction .......................................................................................................................................................................... 7

Glossary ........................................................................................................................................ 7
Overview of Changes ......................................................................................................................................................... 10

Software Overview ...................................................................................................................... 10


Hardware Overview .................................................................................................................... 11
Processor Upgrade ................................................................................................................................................ 12
Cache and Shared Memory ................................................................................................................................... 12
Features and PCBs ................................................................................................................................................ 13
Architecture Details............................................................................................................................................................ 14

Summary of Installable Hardware Features ................................................................................ 15


Logic Box Details ........................................................................................................................ 15
Memory Systems Details ............................................................................................................ 16
Shared Memory (SMA) .......................................................................................................................................... 17
Cache Switches (CSW) and Data Cache (CMA).................................................................................................... 18
Data Cache Operations Overview .......................................................................................................................... 20
Random I/O Cache Operations .............................................................................................................................. 20
Sequential I/O Cache Operations and Sequential Detect ...................................................................................... 22
BED and FED Local RAM ...................................................................................................................................... 23

Front-End Director Concepts ...................................................................................................... 23


FED FC-8 port, ESCON, and FICON: Summary of Features ................................................................................. 25
FED FC-16 port Feature......................................................................................................................................... 25
I/O Request Limits and Queue Depths (Open Fibre) ............................................................................................. 26
MP Distributed I/O (Open Fibre)............................................................................................................................. 27
External Storage Mode I/O (Open Fibre)................................................................................................................ 27

Front-End Director Board Details ................................................................................................ 28


Open Fibre 8-Port Feature ..................................................................................................................................... 28
Open Fibre 16-Port Feature ................................................................................................................................... 29
ESCON 8-port Feature ........................................................................................................................................... 30
FICON 8-port Feature ............................................................................................................................................ 30

Back-end Director Concepts ....................................................................................................... 31


BED Feature Summary .......................................................................................................................................... 31

Back-End-Director Board Details ................................................................................................ 32


BED Details ............................................................................................................................................................ 32

Back End RAID level Organization ......................................................................................................................... 32

Universal Storage Platform V: HDU and BED Associations by Frame ....................................... 33


Disk Details ................................................................................................................................. 35
SATA DISKS .......................................................................................................................................................... 36

HDU Switched Loop Details ........................................................................................................ 38


Universal Storage Platform V Configuration Overviews ................................................................................................. 39

Small Configuration (2 FEDs, 2 BEDs) ....................................................................................... 39


Midsize Configuration (4FEDs, 4 BEDs) ..................................................................................... 40
Large Configuration (8 FEDs, 8 BEDs) ....................................................................................... 41
Provisioning Managing Storage Volumes ..................................................................................................................... 42

Traditional Host-based Volume Management ............................................................................. 42


Traditional Universal Storage Platform Storage-based Volume Management............................ 43
Dynamic Provisioning Volume Management .............................................................................. 45
Usage Overview ..................................................................................................................................................... 46

Hitachi Dynamic Provisioning Pools............................................................................................ 46


V-VOL Groups and DP Volumes ................................................................................................ 48
Usage Mechanisms ................................................................................................................................................ 50
DPVOL Features and Restrictions ......................................................................................................................... 51
Pool Page Details ................................................................................................................................................... 51
Pool Expansion ...................................................................................................................................................... 53
Miscellaneous Hitachi Dynamic Provisioning Details ............................................................................................. 53
Hitachi Dynamic Provisioning and Program Products Compatibility....................................................................... 54
Universal Storage Platform V Volume Flexibility Example ..................................................................................... 55
Storage Concepts ............................................................................................................................................................... 56

Understand Your Customer's Environment ................................................................................ 56


Disk Types .................................................................................................................................. 56
RAID Levels ................................................................................................................................ 57
Parity Groups and Array Groups ................................................................................................. 59
RAID Chunks and Stripes ........................................................................................................... 59
LUNS (host volumes) .................................................................................................................. 60
Number of LUNs per Parity Group .............................................................................................. 60
Port I/O Request Limits, LUN Queue Depths, and Transfer sizes .............................................. 60
Port I/O Request Limits .......................................................................................................................................... 60

Port I/O Request Maximum Transfer Size .............................................................................................................. 61


LUN Queue Depth .................................................................................................................................................. 61

Mixing Data on the Physical Disks .............................................................................................. 62


Workload Characteristics ............................................................................................................ 62
Selecting the Proper Disk Drive Form Factor.............................................................................. 63
Mixing I/O Profiles on the Physical Disks .................................................................................... 63
Front-end Port Performance and Usage Considerations ............................................................ 63
Host Fan-in and Fan-out ........................................................................................................................................ 63
Mixing I/O Profiles on a Port................................................................................................................................... 64
Summary ............................................................................................................................................................................. 64
Appendix 1. Universal Storage Platform V (Frames, HDUs, and Array Groups). ......................................................... 65
Appendix 2. Open Systems RAID Mechanisms ............................................................................................................... 66
Appendix 3. Mainframe 3390x and Open-x RAID Mechanisms ...................................................................................... 68
Appendix 4. BED: Loop Maps with Array Group Names ................................................................................................. 71
Appendix 5. FED: 8-Port Fibre Channel Maps with Processors .................................................................................... 72
Appendix 6. FED: 16-Port Fibre Channel Maps with Processors .................................................................................. 73
Appendix 7. FED: 8-Port FICON Maps with Processors ................................................................................................. 74
Appendix 8. FED: 8-Port ESCON Maps with Processors ............................................................................................... 75
Appendix 9. Veritas Volume Manager Example .......................................................................................................... 76
Appendix 10. Pool Details ................................................................................................................................................. 77
Appendix 11. Pool Example .............................................................................................................................................. 78
Appendix 12. Disks - Physical IOPS Details .................................................................................................................... 79
Appendix 13. File Systems and Hitachi Dynamic Provisioning Thin Provisioning ..................................................... 82
Appendix 14. Putting SATA Drive Performance into Perspective ................................................................................. 83
Appendix 15. LDEVs, LUNs, VDEVs and More ................................................................................................................. 87
Internal VDEV ........................................................................................................................................................ 92
External VDEV ....................................................................................................................................................... 92
CoW VDEV ............................................................................................................................................................ 93
Dynamic Provisioning VDEV .................................................................................................................................. 93
Layout of internal VDEVs on the parity group ........................................................................................................ 94
Appendix 16. Concatenated Parity Groups ...................................................................................................................... 98
Advantages of concatenated Parity Groups ......................................................................................................... 100
Disadvantages of concatenated Parity Groups: ................................................................................................... 101
Using LDEVs on concatenated Parity Groups as Dynamic Provisioning pool volumes ....................................... 101
Summary of the characteristics of concatenated Parity Groups (VDEV interleave): ............................................ 102

Hitachi Universal Storage Platform V


Architecture and Concepts
A White Paper
By Alan Benway (Performance Measurement Group, Technical Operations)

Introduction
This document covers the architecture and concepts of the Hitachi Universal Storage Platform V. The
concepts and physical details regarding each hardware feature are covered in the following sections.
However, this document is not intended to cover any aspects of program products, databases, customer
specific environments, or new features available by the second general release. Areas not covered by
this document include:

- TrueCopy / ShadowImage / Universal Replicator disaster recovery solutions
- Host Logical Volume Management general guidelines
- Hitachi Dynamic Link Manager (HDLM) general guidelines
- Universal Volume Management (UVM) virtualization of external storage for Data Lifecycle Management (DLM)
- Virtual Partition Management (VPM) general guidelines for workload management
- Oracle general guidelines for storage configuration
- Microsoft Exchange general guidelines for storage configuration

This document is intended to familiarize Hitachi Data Systems sales personnel, technical support staff,
customers, and value-added resellers with the features and concepts of the Universal Storage Platform
V. The users that will benefit most from this document are those who already possess an in-depth knowledge
of the Hitachi TagmaStore Universal Storage Platform architecture.

Glossary
Throughout this paper the terminology used by Hitachi Data Systems (not Hitachi) will normally be
used. As some storage terminology is used differently in Hitachi documentation or by users in the
field, here are some definitions as used in this paper:

Array Group - The term used to describe a set of four physical disk drives installed as a group in
the subsystem. When a set of one or two Array Groups (four or eight disk drives) is formatted
using a RAID level, the resulting formatted entity is called a Parity Group. Although technically
the term Array Group refers to a group of bare physical drives, and the term Parity Group refers
to something that has been formatted as a RAID level and therefore actually has parity data
(here we consider a RAID-10 mirror copy as parity data), be aware that this technical distinction
is often lost and you will see the terms Parity Group and Array Group used interchangeably.


BED - Back-end Director feature; the pair of Disk Adapter (DKA) PCBs providing 4 pairs of
back-end Fibre Channel loops. Pairs of features (4 PCBs, or 8 pairs of loops) are always installed.

CHA - Hitachi's name for the FED.

CMA - Cache Memory Adapter board (8 x 1064 MB/s ports, 4 banks of RAM using 16 DDR2
DIMM slots).

Concatenated Parity Group - A configuration where the VDEVs corresponding to a pair of RAID-10
(2D+2D) or RAID-5 (7D+1P) Parity Groups, or a quad of RAID-5 (7D+1P) Parity Groups, are
interleaved on a RAID-stripe-by-RAID-stripe, round-robin basis on their underlying disk drives.
This has the effect of dispersing I/O activity over twice or four times the number of drives, but it
does not change the number, names, or size of VDEVs, and hence it doesn't make it possible to
assign larger LDEVs. Note that we often refer to RAID-10 (4D+4D), but this is actually two RAID-10
(2D+2D) Parity Groups interleaved together. For a more comprehensive explanation see the
Concatenated Parity Group section in the appendix.

CSW - Cache Switch board (16 x 1064 MB/s ports); a 2D version of a Hitachi supercomputer 3D
crossbar switch with nanosecond latencies (port to port).

DKC - Hitachi's name for the base control frame (or rack) where the Logic Box and up to 128
disks are located.

DKA - Hitachi's name for the BED.

DPVOL - A Dynamic Provisioning (DP) Volume; the virtual volume from an HDP Pool. Some
documents refer to this as a V-VOL. It is a member of a V-VOL Group. Each DPVOL has a user-specified
size between 8GB and 4TB.

Feature - An installable hardware option (such as FED, BED, CSW, CMA, SMA) that includes two
PCBs (one per power domain). Note that two BED features must be installed at a time.

FED - Front-end Director feature; the pair of Channel Adapter (CHA) PCBs used to attach hosts
to the storage system, providing Open FC, FICON, or ESCON attachment.

HDU (Hard Disk Unit) - The 64-disk container in a frame that has 32 disk slots on the front side
and 32 more on the back. The HDU is further split into left and right halves, each on separate
power domains. There are up to four HDUs per expansion frame, and two in the control frame.

LDEV (Logical Device) - A logical volume internal to the subsystem that can be used to contain
customer data. LDEVs are uniquely identified within the subsystem using a six-hex-digit
identifier in the form LDKC:CU:LDEV. LDEVs are carved from a VDEV (see VDEV), and thus there
are four types of LDEVs: internal LDEVs, external LDEVs, CoW V-VOLs, and DPVOLs. LDEVs may
be mapped to a host as a LUN, either as a single LDEV, or as a set of up to 36 LDEVs combined in
the form of a LUSE. Note: what is called an LDEV in HDS enterprise subsystems like the
Universal Storage Platform V is called an LU or LUN in HDS modular subsystems like the AMS
family, and what is called a LUN in HDS enterprise subsystems is called an H-LUN in HDS modular
subsystems.

LUN (Logical Unit Number) - The host-visible identifier assigned by the user to an LDEV to make
it visible on a host port. An internal LUN has a nominal Queue Depth limit of 32, and an external
(virtualized) LUN has a Queue Depth limit of 2-128 (adjustable).


eLUN (external LUN) - An external LUN is one which is located in another storage array that is
attached via two or more ePorts on a Universal Storage Platform V and accessed by the host
through the Universal Storage Platform V.

ePort (external Port) - An external array connects via two or more Fibre Channel FED ports on
the Universal Storage Platform V instead of to a host. The FED ports used in this manner are
changed from a host target into an initiator port (or external Port) by use of the Universal
Volume Manager software product. The Universal Storage Platform V will discover any exported
LUNs on each ePort, and these will be configured as eLUNs used by hosts attached to the
Universal Storage Platform V.

LUSE (Logical Unit Size Expansion) - A concatenation (spill and fill) of 2 to 36 LDEVs (up to a
60TB limit) that is then presented to a host as a single LUN. A LUSE will normally perform at the
level of just one of these LDEVs.

MP (microprocessor) - The CPU used on the FEDs and BEDs. Also called CHP (FED) or DKP (BED).

Parity Group - A set of one or two Array Groups (a set of 4 or 8 disk drives) formatted as a RAID
level, either as RAID-10 (often referred to as RAID-1 in HDS documentation), RAID-5, or RAID-6.
The Universal Storage Platform V's Parity Group types are RAID-10 (2D+2D), RAID-5 (3D+1P),
RAID-5 (7D+1P), and RAID-6 (6D+2P). Internal LDEVs are carved from the VDEV(s) corresponding
to the formatted space in a Parity Group, and thus the maximum size of an internal LDEV is
determined by the size of the VDEV it is carved from. The maximum size of an internal VDEV is
approximately 2.99TB. If the formatted space in a Parity Group is bigger than 2.99TB, then as
many maximum-size VDEVs as possible are assigned, and then the remaining space is assigned
as the last, smaller VDEV. Note that there actually is no 4+4 Parity Group type - see
Concatenated Parity Group.

PCB - Printed circuit board; an installable board (adapter). There are two PCBs per Feature.

PDEV (Physical DEVice) - A physical internal disk drive.

SMA - Shared Memory Adapter board (64 x 150 MB/s ports, 2 banks of RAM using 8 DDR2
DIMM slots, up to 8GB).

SVP - Service Processor (a PC running Windows XP) installed in the control rack.

VDEV - The logical container from which LDEVs are carved. There are four types of VDEVs:
o Internal VDEV (2.99TB max): maps to the formatted space within a Parity Group that is
available to store user data. LDEVs carved from a Parity Group VDEV are called internal
LDEVs.
o External storage VDEV (2.99TB max): maps to a LUN on an external (virtualized)
subsystem. LDEVs carved from external VDEVs are called external LDEVs.
o Copy-on-Write (CoW) VDEV (2.99TB max): called a "V-VOL group", and LDEVs carved
from a CoW V-VOL group are called CoW V-VOLs.
o Dynamic Provisioning (DP) VDEV (4TB max): called a "V-VOL group", and LDEVs carved
from a DP V-VOL group are called DPVOLs (Dynamic Provisioning Volumes).

V-VOL Group - The organizational container of either a Dynamic Provisioning VDEV or a Copy-on-Write
VDEV. With Dynamic Provisioning, it is used to hold one or more DPVOLs.


Overview of Changes
Expanding on the proven and superior Hitachi TagmaStore Universal Storage Platform technology, the
Universal Storage Platform V offers a new level of enterprise storage, capable of meeting the most
demanding workloads while maintaining great flexibility. The Universal Storage Platform V offers
much higher performance, higher reliability, and greater flexibility than any competitive offering today.
These are the new features that distinguish the completely revamped Universal Storage Platform V from
the previous Universal Storage Platform models:

Software
o Hitachi Dynamic Provisioning (HDP) volume management feature.
Hardware
o Enhanced Shared Memory system (up to 24GB and 256 paths @ 150 MB/s).
o Faster, newer generation 800MHz NEC RISC processors on FEDs (Channel Adapters)
and BEDs (Disk Adapters).
o FED MPs have an MP Workload Sharing function (per PCB).
o BEDs have 4Gbit/s back-end disk loops.
o Switched FC-AL loop interface between BEDs and disks.
o Half-sized PCBs are used now (except for Shared Memory PCBs), allowing for more
flexible FED configuration combinations.
o Support for internal 1TB 7200 RPM SATA II disks.
o Support for Flash Drives.

Software Overview
The Universal Storage Platform V software includes Hitachi Dynamic Provisioning, a major new Open
Systems volume management feature that will allow storage managers and system administrators to
more efficiently plan and allocate storage to users or applications. This new feature provides two new
capabilities: thin provisioning and enhanced volume performance. Hitachi Dynamic Provisioning
provides for the creation of one or more Hitachi Dynamic Provisioning Pools of physical space (each Pool
assigned multiple LDEVs from multiple Parity Groups of the same disk types and RAID level), and for the
establishment of DP volumes (DPVOLs, or virtual volumes) that are connected to a single Hitachi
Dynamic Provisioning Pool.

Thin provisioning comes from the creation of DPVOLs of a user-specified logical size without any
corresponding allocation of physical space. Actual physical space (allocated as 42MB Pool pages) is only
assigned to a DPVOL from the connected Hitachi Dynamic Provisioning Pool as that DPVOL's logical
space is written to by the host over time. A DPVOL does not have any Pool pages assigned to it when it is
first created. Technically, it never does: the pages are loaned out from its connected Pool to that
DPVOL until the volume is deleted from the Pool. At that point, all of that DPVOL's assigned pages are
returned to the Pool's Free Page List. Certain individual pages can be freed or reclaimed from a DPVOL
using facilities of the Universal Storage Platform V.

The volume performance feature is an automatic result of the manner in which the individual Hitachi
Dynamic Provisioning Pools are created. A Pool is created using 2-1024 LDEVs (Pool Volumes) that
provide the physical space, and the Pool allocates 42MB Pool pages on demand to any of the DPVOLs
connected to that Pool. Each individual 42MB Pool page is consecutively laid down on a whole number
of RAID stripes from one Pool Volume. Other pages assigned over time to that DPVOL will randomly
originate from the next free 42MB page from one of the other Pool Volumes in that Pool.
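A minimal sketch of this allocation behavior may help make it concrete. The 42MB page size, the loan/return life cycle, and the spreading of pages across Pool Volumes follow the description above; the class and method names are invented for illustration, and the choice of the next Pool Volume is modeled here as a simple rotation (the text only states that subsequent pages come from one of the other Pool Volumes).

```python
# Illustrative sketch of HDP page allocation (not HDS microcode).
PAGE_MB = 42  # HDP Pool page size

class PoolVolume:
    def __init__(self, name, size_mb):
        self.name = name
        self.free_pages = [f"{name}:page{i}" for i in range(size_mb // PAGE_MB)]

class HDPPool:
    def __init__(self, pool_volumes):
        self.pool_volumes = pool_volumes
        self.next_vol = 0  # rotate across Pool Volumes (simplified selection)

    def allocate_page(self):
        """Hand out the next free 42MB page, rotating across Pool Volumes."""
        for _ in range(len(self.pool_volumes)):
            vol = self.pool_volumes[self.next_vol]
            self.next_vol = (self.next_vol + 1) % len(self.pool_volumes)
            if vol.free_pages:
                return vol.free_pages.pop(0)
        raise RuntimeError("Pool is full")

class DPVOL:
    """Thin volume: logical size is fixed, pages are loaned on first write."""
    def __init__(self, pool, logical_gb):
        self.pool = pool
        self.logical_gb = logical_gb
        self.page_map = {}  # logical 42MB region index -> loaned pool page

    def write(self, logical_offset_mb):
        idx = logical_offset_mb // PAGE_MB
        if idx not in self.page_map:           # first write to this 42MB region
            self.page_map[idx] = self.pool.allocate_page()
        return self.page_map[idx]

    def delete(self):
        """Return all loaned pages to the Pool's free page list."""
        for page in self.page_map.values():
            vol_name = page.split(":")[0]
            next(v for v in self.pool.pool_volumes if v.name == vol_name).free_pages.append(page)
        self.page_map.clear()

# Example: a Pool built from 12 Pool Volume LDEVs (4200MB each keeps the sketch small).
pool = HDPPool([PoolVolume(f"LDEV{i:02d}", 4_200) for i in range(12)])
vol = DPVOL(pool, logical_gb=100)
# Writes to the same 42MB region share one page; a new region draws a page
# from another Pool Volume.
print(vol.write(0), vol.write(10), vol.write(84))
```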

As an example, assume that there are 12 LDEVs from twelve RAID-10 (2D+2D) Parity Groups assigned to
a Hitachi Dynamic Provisioning Pool. All 48 disks in that Pool will contribute their IOPS and throughput
power to all of the DPVOLs connected to that Pool. If more random read IOPS horsepower were desired
for that Pool, then it could have been, for example, created with 16 LDEVs from 16 RAID-5 (7D+1P)
Parity Groups, thus providing 128 disks of IOPS power to that Pool.

As up to 1024 LDEVs may be assigned as Pool Volumes to a single Pool, this would provide a
considerable amount of I/O power to that Pool's DPVOLs. This type of aggregation of disks was previously
only possible through the use of often expensive and somewhat complex host-based volume managers
(such as VERITAS VxVM) on each of the attached servers. On the Universal Storage Platform (and the
Universal Storage Platform V), the only alternative would be to build striped Parity Groups
(Concatenated Parity Groups - see Appendix 16) using two or four RAID-5 (7D+1P) Parity Groups. This
would provide either 16 or 32 disks under the volumes (VDEVs) created there. There is also the LUSE
option, but that is merely a simple concatenation of 2-36 LDEVs, a spill-and-fill capacity (not
performance) configuration.

A common question is "How does the performance of DPVOLs differ from the use of Standard Volumes
when using the same number of disks?" Consider this example. Say that you are using 32 disks in 8
Parity Groups as RAID-10 (2D+2D) with 8 LDEVs, all used as Standard Volumes over 4 host paths.
Compare this to a Hitachi Dynamic Provisioning Pool with those same 8 LDEVs as Pool Volumes, with 8
DPVOLs configured against that Pool and used over 4 host paths. If the hosts are applying a heavy,
uniform, concurrent workload to all 8 Standard Volumes, then the server will see about the same
aggregate IOPS capacity as would be available from the 8 DPVOLs with the same workload. However, if
the workloads per volume (Standard or DPVOL) are not uniform, or see intermittent workloads over
time, the DPVOLs will always deliver a constant and much higher IOPS capacity than will the individual
Standard Volumes. If only four volumes (Standard or DPVOL) were simultaneously active, then the four
Standard Volumes would only have a 16-disk IOPS capacity while the four DPVOLs would always have
the full 32-disk IOPS capacity. Another way to see this is that the use of Standard Volumes can easily
lead to hot spots, whereas the use of DPVOLs will mostly avoid them.
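A back-of-the-envelope version of this comparison, assuming a nominal 180 random read IOPS per disk (an illustrative figure, not a measured value):

```python
# Worked example for the 32-disk comparison above.
IOPS_PER_DISK = 180          # assumed per-disk random read IOPS (illustrative only)

disks_total = 8 * 4          # 8 x RAID-10 (2D+2D) Parity Groups = 32 disks
active_volumes = 4           # only half of the 8 volumes are busy

# Standard Volumes: each LDEV sits on one 4-disk Parity Group, so only the
# Parity Groups behind the active volumes contribute.
standard_iops = active_volumes * 4 * IOPS_PER_DISK

# DPVOLs: every volume's pages are spread across all Pool Volumes, so all
# 32 disks contribute regardless of which volumes happen to be active.
dpvol_iops = disks_total * IOPS_PER_DISK

print(f"Standard Volumes (4 active): ~{standard_iops} IOPS from 16 disks")
print(f"DPVOLs (4 active):           ~{dpvol_iops} IOPS from 32 disks")
```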

Hardware Overview
Figure 1. Frame and HDU Map

(The figure shows the five frames, left to right: the L2 Frame (DKU-L2), L1 Frame (DKU-L1), Control
Frame (DKC), R1 Frame (DKU-R1), and R2 Frame (DKU-R2). Each expansion frame contains four HDUs
(HDU-1L/1R through HDU-4L/4R); the control frame contains the Logic Box plus two HDUs (HDU-1L/1R
and HDU-2L/2R). Each HDU half holds 16 HDDs on its front side and 16 HDDs on its rear.)

While there are many physically apparent changes to the Universal Storage Platform V chassis and PCB
cards from the previous Universal Storage Platform model, there are also a number of not-so-evident
internal configuration changes that an SE must be aware of when laying out a system.

The Universal Storage Platform V is a collection of frames, HDUs, PCB cards in a Logic Box, FC-AL
switches and disks (see Figure 1). The frames include the control frame (DKC) and the disk expansion
frames (DKUs). Disks are added to the 64-disk HDU containers (up to 18 such HDUs) in sets of four (the
Array Group). Array Groups are installed (following a certain upgrade order) into specific HDU disk slots
on either the left or right half (such as HDU-1R or HDU-1L) and front and rear of an HDU box. Sets of
HDUs are controlled by one of the BEDs.

Processor Upgrade
The processor (MP) used on the FEDs (the MPs are also called CHPs) and BEDs (the MPs are also called
DKPs) has been improved and its clock speed has been doubled. The quantities of processors
represented in Table 1 are values per PCB by feature. A feature is defined as a pair of PCBs where each
board is located on a separate power boundary. As there are twice as many FED/BED features in the
Universal Storage Platform V as compared to the Universal Storage Platform, the overall processor count
for FEDs and BEDs remains the same but the available processing power has been doubled.
Table 1. Universal Storage Platform V Processor Enhancements

Feature               MPs per PCB (USP)   MPs per PCB (USP V)
FED ESCON 8-port      -                   2 x 400MHz
FED ESCON 16-port     4 x 400MHz          -
FED FICON 8-port      -                   4 x 800MHz
FED FICON 16-port     8 x 400MHz          4 x 800MHz
FED FC 8-port         -                   4 x 800MHz
FED FC 16-port        4 x 400MHz          4 x 800MHz
FED FC 32-port        8 x 400MHz          -
BED FC-AL 8-port      -                   4 x 800MHz
BED FC-AL 16-port     8 x 400MHz          -
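As a rough sanity check of the "same processor count, double the power" statement, compare 32 Fibre Channel ports built from the Table 1 values (aggregate clock rate is used here as a crude proxy for processing power):

```python
# USP: one FC 32-port feature = 2 PCBs, 8 MPs per PCB at 400 MHz
usp_mps  = 2 * 8
usp_ghz  = usp_mps * 0.4

# USP V: two FC 16-port features = 4 PCBs, 4 MPs per PCB at 800 MHz
uspv_mps = 4 * 4
uspv_ghz = uspv_mps * 0.8

print(f"USP   : {usp_mps} MPs, {usp_ghz:.1f} GHz aggregate")
print(f"USP V : {uspv_mps} MPs, {uspv_ghz:.1f} GHz aggregate")  # same MP count, twice the clock
```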

Cache and Shared Memory


The Data Cache system carries over the same path speeds (1064 MB/s) and counts (up to 64 paths) from
the Universal Storage Platform, with a peak wire-speed bandwidth of 68 GB/s. The Shared Memory (or
Control Memory) system has been significantly upgraded over the Universal Storage Platform, with 256
paths (up from 192) operating at 150 MB/s (up from 83 MB/s), with a peak bandwidth of 38.4 GB/s (up
from 15.9 GB/s). On the Universal Storage Platform, the FED PCBs had 8 Shared Memory paths, and the
BEDs had 16. Now all of the PCBs have 16 Shared Memory paths.

Note that the Data Cache system contains only the actual user data blocks. The Shared Memory system
holds all of the metadata about the internal Parity Groups, LDEVs, external LDEVs, and runtime tables
for various software products. There can be up to 512GB of Data Cache and 24GB of Shared Memory.

Features and PCBs


Logic boards (PCBs) are installed in the front and rear slots in the Logic Box in the Control Frame. The
logic board types for the Universal Storage Platform V include the following (as features, each a pair of
PCBs):

- CSW - Cache Switch: 1 to 4 features (2 to 8 PCBs)
- CMA - Cache Memory: 1 to 4 features (2 to 8 PCBs)
- SMA - Shared Memory: 1 or 2 features (2 to 4 PCBs)
- FED - Front-end Director (or Channel Adapter): 1 to 14 features (2 to 28 PCBs)
- BED - Back-end Director (or Disk Adapter): 2, 4, 6, or 8 features (4, 8, 12, or 16 PCBs) of 4Gb/s
  Fibre Channel loops

The Universal Storage Platform V's new half-sized PCBs (now installed in upper and lower, front and
rear slots in the Logic Box, described later) allow for a less costly, more incremental expansion of a
system. The BED slots will also support the use of additional FED PCBs (but 2 BED features must be
installed at a minimum). For example, there could be 1-4 FED features (2-8 PCBs) installed in a Universal
Storage Platform, and they could be any mixture of Open Fibre, ESCON, FICON, and iSCSI. However, this
gave you a large number of ports of a single type that you may not need, with a substantial reduction of
other port types that you may need to maximize. With the new half-sized cards, you can have 1-8 FED
features (or up to 14 FED features if the number of BED features is reduced to just 2), using any mixture
(per feature) of the interface types as before. Now, as there are half as many ports per board, smaller
numbers of lesser-used port types may be installed. Features are still installed as pairs of PCB cards just
as with the Universal Storage Platform.
Table 2. Summary of Limits, Universal Storage Platform to Universal Storage Platform V

Limits                              USP           USP V
Data Cache (GB)                     128           512
Raw Cache Bandwidth                 68 GB/sec     68 GB/sec
Shared Memory (GB)                  12            24
Shared Memory Paths (max)           192           256
Raw Shared Memory Bandwidth         15.9 GB/sec   38.4 GB/sec
Fibre Channel Disks                 1152          1152
SATA Disks                          -             1152
Logical Volumes                     16k           64k
Max Internal Volume Size            2TB           2.99TB
Max CoW Volume Size                 2TB           4TB
Max External Volume Size            2TB           4TB
I/O Request Limit per FC FED MP     2048          4096
Nominal Queue Depth per LUN         16            32
HDP Pools                           -             128
Max Pool Capacity                   -             1.1PB
Max Capacity of All Pools           -             1.1PB
LDEVs per Pool                      -             1024
DP Volumes per Pool                 -             8192
DP Volume Size Range                -             46MB - 4TB


Other physical Universal Storage Platform V changes include:

- The associations between some BEDs and HDUs have changed.
- The associations between FEDs, BEDs, and the CSWs are different.
- The locations of 50% of the Array Group names have shifted around.
- It takes two BED features (4 PCBs) to support the 8-disk Parity Groups (just as was the case on
  the Lightning models). Hence, BEDs must be installed as pairs of features (4 PCBs).

Architecture Details
At the core of the Universal Storage Platform V is a fourth-generation, non-blocking, high-speed
switched architecture. The Universal Star Network is a fully fault-tolerant, high-performance, true
non-blocking switched architecture. These state-of-the-art, Hitachi supercomputer-derived crossbar switches
are described in more detail below. There are no specific models as there were with the Universal
Storage Platform products. There is now a single base model onto which upgrade features are applied to
meet the needs of the customer.
Figure 2. Universal Storage Platform V System Components and Bandwidths Overview

(The figure depicts the USP V Universal Star Network: two Shared Memory features (base and option 1,
8-12GB each), each connected by 128 control ports x 150 MB/s (19.2 GB/s); four Cache Memory features
(base plus options 1-3, 8-128GB each); up to eight 16-port crossbar Cache Switches (CSW 1-8) providing
64 cache paths at 1064 MB/s (68 GB/s); up to eight BED features and eight FED features (base plus
options 1-7) on 64 concurrent FED/BED paths at 1064 MB/s, each group with 4-32 cache paths (34 GB/s);
8-64 4Gbit/s disk loops serving up to 1152 disks; and 8-224 4Gbit/s FC host ports connecting to hosts and
external storage.)

For quick reference, Figure 2 above depicts the fully optioned Universal Storage Platform V architecture.
Other Universal Storage Platform V configurations are detailed further down. This architecture includes
64 x 1064 MB/s data paths representing 68 GB/s of data bandwidth and 256 x 150 MB/s control
paths representing 38.4 GB/s of metadata and control bandwidth. When comparing the Universal
Storage Platform V to other vendors' monolithic cache systems (such as EMC DMX-4), the aggregate of
the Universal Storage Platform V's Data + Control bandwidths must be used for an apples-to-apples
comparison. The throughput aggregate for the fully optioned system is a fully usable 106.4 GB/s. This
includes any overhead for block mirroring operations that occur in both the Data Cache and Shared
Memory systems (discussed further down).
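These bandwidth figures follow directly from the path counts and per-path speeds; a quick check (decimal GB/s, as used in this paper):

```python
data_paths    = 64           # cache paths at 1064 MB/s
control_paths = 256          # Shared Memory paths at 150 MB/s

data_bw    = data_paths * 1064 / 1000     # ~68 GB/s (the paper rounds to 68)
control_bw = control_paths * 150 / 1000   # 38.4 GB/s

print(f"Data cache bandwidth   : {data_bw:.1f} GB/s")
print(f"Shared Memory bandwidth: {control_bw:.1f} GB/s")
print(f"Aggregate              : {data_bw + control_bw:.1f} GB/s")  # the paper rounds to 106.4 GB/s
```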

Summary of Installable Hardware Features


The tables below show overviews of the available features for both the Universal Storage Platform and
the Universal Storage Platform V. Except for the Shared Memory PCBs, all of the other PCBs are now
half-sized. The lower two tables show the trade-off between installed Universal Storage Platform V BED
features and available Universal Storage Platform V FED features. If someone needs a large number of
ports (and reduced amounts of internal disks and performance) for virtualization, additional FED
features may be installed in the place of up to 6 BED features.
Table 3. Comparison: Universal Storage Platform and Universal Storage Platform V Features

Features           USP      USP V
FEDs               1-7      1-14
BEDs               1-4      2, 4, 6, or 8
Cache              2        4
Shared Memory      2        2
Cache Switches              1-4

FED 16-Port Fibre Channel Kits Installed

BED features installed             Max FED features   Max FC ports
1&2 (2 features)                   14                 224
1&2, 3&4 (4 features)              12                 192
1&2, 3&4, 5&6 (6 features)         10                 160
1&2, 3&4, 5&6, 7&8 (8 features)    8                  128

(Each installed 16-port Fibre Channel FED feature adds 16 ports, so the total port count for FED #1
through FED #14 runs 16, 32, 48, and so on, up to the maximum shown for each BED configuration.)

FED 8-Port FICON or ESCON Channel Kits Installed

BED features installed             Max FED features   Max ports
1&2 (2 features)                   14                 112
1&2, 3&4 (4 features)              12                 96
1&2, 3&4, 5&6 (6 features)         10                 80
1&2, 3&4, 5&6, 7&8 (8 features)    8                  64

(Each installed 8-port FICON or ESCON FED feature adds 8 ports, so the total port count for FED #1
through FED #14 runs 8, 16, 24, and so on, up to the maximum shown for each BED configuration.)

Logic Box Details


The Logic Box chassis (main DKC frame) is where all of the different types of PCBs for the features are
installed in specific slots (see Figure 3). This layout is very different from the previous Universal Storage
Platform model. All of the PCBs are now half-sized, and there are upper and lower slots in addition to
the front and back layout. The associations among BEDs, FEDs and the CSWs (cache switches) are also
different. The factory names of the PCB types and their option numbers are shown as well. Note that
FED upgrade features may consume up to six pairs of unused BED slots.
Figure 3. Logic Box Slots (BED slots 3-8 can be used for FEDs if desired)

(The figure maps the Logic Box slot layout. The front side (Cluster-1) and the rear side (Cluster-2) each
contain upper and lower rows of half-sized slots. Each slot ID (for example 1AU, 1DL, 2PU, 2SD) is shown
with the feature PCB assigned to it - BED-1 through BED-8, FED-1 through FED-8, CSW-0 through CSW-3,
CMA-1 through CMA-4, and SMA-1/SMA-2 - together with the install option (Basic, Opt 1 through Opt 8)
that populates it.)

Memory Systems Details


The Universal Storage Platform V memory subsystems are unique among all enterprise arrays on the
market. All other designs use a single global cache space that controls the operation of the array. All
accesses for data, control tables, metadata, replication software tables and such go against the same
common cache system. All of these activities compete with one another over the same internal paths
for cache bandwidth. As such, cache access is a primary choke point on competing designs.

The Universal Storage Platform V has four parallel high-performance memory systems that isolate access
to data, metadata, control tables and array software. These include:

- Shared Memory (Control Memory) system for metadata and control tables
- Data Cache (data blocks only)
- Local RAM pool on each FED and BED PCB (up to 32 such PCBs in an array) for use by the four
  MP processors on each board (128 processors in a full array). This RAM is used for the
  workspace for each MP, local data block caching and local LUN management tables.
- NVRAM region for each MP that holds the microcode and optional software packages.


Shared Memory (SMA)


The SMA subsystem is critical to achieving the Universal Storage Platform V's very high array
performance, as all control and metadata information is contained within this system. Access to and
management of all data blocks in the Data Cache is managed by the SMA system. The SMA subsystem is
unique among all enterprise arrays on the market. All other designs use a single global cache for all data,
metadata, control tables, and software. These other designs create a serialization of access into cache
for all of these competing operations. The SMA design removes all of the non-data access to Data Cache,
allowing data blocks to be moved unimpeded at very high rates.

The SMA subsystem is sized according to the array configuration (hardware and software). A Universal
Storage Platform V can have up to 24GB of SMA installed across two features (4 PCBs). A section of each
board is mirrored onto the other board of the feature for redundancy.

The Logic Box chassis has slots for four SMA PCBs. One SMA feature is installed in the base system and
one more feature (2 PCBs) is an upgrade option. Due to the way the Shared Memory subsystem
functions, the optional SMA feature should always be installed in a system. As described below, each
FED and BED PCB has 8 paths into the Shared Memory subsystem. It is important to know that these
paths are hard-wired to specific ports on each SMA PCB. Each FED or BED PCB directs 2 SMA paths to
each installed SMA PCB. With all 4 SMA PCBs, this means that all 8 SMA paths per FED/BED PCB are
connected.
Figure 4. Shared Memory PCB

Each SMA PCB has 8 DDR2-333 DIMM slots, organized as two separate banks of RAM (4 slots each). A
single board can support up to 8GB of RAM when using either 1GB or 4GB DIMMs. The pair of boards for
each SMA feature is installed in different power domains in the Logic Box (Cluster 1 and Cluster 2). Each
PCB has 64 x 150 MB/s SMA (control) paths. This provides for 9.6 GB/s of bandwidth (wire speed) per
board. Each bank of RAM has a peak bandwidth of 5.19 GB/s, or a concurrent total of 10.38 GB/s per
board. Due to the very high speed of the RAM compared to the individual SMA ports, with port buffering
and port interleaving, the RAM allows the 64 SMA paths to operate at full speed.


For each SMA feature, there is a small region that is mirrored onto the other board in that pair for
redundancy. (Note: The region used for Hitachi Dynamic Provisioning control is backed up onto a private
disk region on some Pool Volumes.) The Cluster 1 SMA boards are associated with the Cache-A region of
the Data Cache (explained below), while the Cluster 2 boards are associated with the Cache-B region.

The basic configuration information for the array is located in the lower memory address range of
Shared Memory. This is also backed up on the NVRAM located on each FED and BED PCB in the system.
The metadata and tables for optional software products (such as Hitachi Dynamic Provisioning) in the
other (higher) memory address regions in Shared Memory are backed up to the internal disk in the
Service Processor (SVP PC) at array power-down.

Cache Switches (CSW) and Data Cache (CMA)


Each Universal Storage Platform V has up to 8 Data Cache boards (CMA) and 8 Cache Switch boards
(CSW). The cache switches connect the individual cache boards to the cache ports on each FED and BED
board. The Logic Box chassis has slots for 8 Cache Memory (CMA) and 8 Cache Switch (CSW) PCBs. One
CMA feature and one CSW feature are installed in the base system, and three more features (6 PCBs) of
each type may be added as upgrades. The eight 1064 MB/s cache ports on each CMA board are
attached to half of the ports on the "cache side" of the central Cache Switches. The "processor side" ports
on the Cache Switches are connected to FED and BED cache ports. Due to the way the Data Cache
subsystem is interconnected through the Cache Switch system, any CMA port on any FED or BED board
can address any cache board. As described below, each FED and BED PCB has 2 paths into the Cache
Switch subsystem.

CSW - Cache Switch


Each Cache Switch board (CSW) has 16 bidirectional ports (8 to cache ports, 8 to FED/BED ports), each
port operating at 1064 MB/s. These cache switches are very high performance crossbar switches, as used
in high-performance supercomputers and servers. In fact, these CSW boards are a 2D, scaled-down
version of a 3D switch used in Hitachi supercomputers. The CSW was designed by the Hitachi
Supercomputer and Hitachi Networks Divisions. The internal port-to-port latency of this switch is in the
nanosecond range.

The number of CSW features needed depends purely on the number of CHA/DKA features installed.

- Adding more CSW features to an existing configuration only permits more FED/BED features to
  be installed.
- Adding more CSW features does not affect the performance of existing FED/BED features.

Each CSW board can sustain 8 x 1064 MB/s transfers between the 8 FED/BED paths and the 8 cache ports
for an aggregate of 8.5 GB/s per board. As there may be up to four CSW features installed per system (8
CSW PCBs), there can be 64 cache-side ports switching among 64 FED/BED processor ports. This
provides for a total of 68 GB/s per array of non-blocking, data-only bandwidth to cache from the
FED/BED boards at extremely small (nanosecond) latency rates. No other vendor has a non-blocking
design (despite some unsupportable claims to the contrary), nor are they able to provide sustained data-only
bandwidth at anywhere near this rate. This bandwidth is not constrained by the type of I/O (read
versus write) since all cache paths are full speed and bidirectional.
Figure 5 illustrates a Universal Storage Platform V array with only two CSW and two CMA features
installed. If all four CSW and CMA features (8 boards each) were installed, then each cache-side path
on a CSW PCB would go to one of the eight CMA PCBs. Every CSW has at least one path to every CMA.
Farther down in this document are maps of associations of FEDs and BEDs to the CSWs. There is a
certain relationship from the CSW processor side to the FED and BED CMA ports.
Figure 5. Universal Storage Platform V with Two Cache Features and Two Cache Switch Features Installed.

(The figure shows four CMA cache boards connected on their cache-side ports to four CSW boards,
whose processor-side ports connect to the FED/BED boards.)

CMA - Data Cache


Each CMA cache board has 16 DDR2-400 DIMM slots, organized as four banks of RAM (thus 32
independent banks in all four features). Each CMA board can support 8-64GB of RAM using a single size
of DIMM across all installed CMA boards. The same amount of RAM must be installed on each CMA
board. The pair of boards for a CMA feature is installed in different power domains in the Logic Box
(Cluster 1 and Cluster 2).

Each CMA has 8 x 1064 MB/s bidirectional data paths. This provides for 8.5 GB/s of read/write bandwidth
(wire speed) per board. Each bank of RAM has a peak bandwidth of 6.25 GB/s, or 25 GB/s per board. Due
to the very high speed of the RAM compared to the CMA ports, with port buffering and port
interleaving, the RAM allows all 8 CMA cache paths per board to operate at full speed.
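The headroom between the CMA port bandwidth and its RAM bandwidth can be checked from the figures above:

```python
# Per-CMA-board check: why the RAM banks can keep all 8 cache ports busy.
ports_per_board = 8
port_mb_s       = 1064
banks_per_board = 4
bank_gb_s       = 6.25

port_bw = ports_per_board * port_mb_s / 1000   # ~8.5 GB/s of port bandwidth
ram_bw  = banks_per_board * bank_gb_s          # 25 GB/s of RAM bandwidth

print(f"CMA port bandwidth: {port_bw:.1f} GB/s")
print(f"CMA RAM bandwidth : {ram_bw:.1f} GB/s")   # roughly 3x headroom over the ports
```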
Figure 6. Data Cache Board (one of a feature pair)


Data Cache Operations Overview


The Data Cache is organized into two contiguous address ranges named Cache-A and Cache-B. The RAM
located across the Cluster 1 CMA boards is a single address range called Cache-A, while that of the
Cluster 2 boards is called Cache-B. The address range of Cache-B immediately follows the end of Cache-A
(illustrated below). This entire space is addressable as units of 2KB cache blocks. A data block for an I/O
read request can be placed in either cache range by the BED MP (or FED MP for external storage I/O
operations). Write operations are simultaneously duplexed into both Cache-A and Cache-B by the FED
MP processing the request. Note that only the individual write blocks are mirrored, not the entire cache
space.

Figure 7. Cache Address Space - Cache-A and Cache-B

The data cache is logically divided into logical 256KB cache slots across the entire linear cache space
(the concatenation of Cache-A and Cache-B). It is a single uniform address space rather than a set of
isolated, partitioned spaces where only certain devices are mapped to individual regions on independent
cache boards.

When created, each Universal Storage Platform V LDEV (a partition from a Parity Group) is allocated an
initial set of 16 cache slots for random I/O operations. A separate set of 24 cache slots is allocated
when a FED MP detects sequential I/O operations occurring on that LDEV. The initial set of 16 cache
slots (per LDEV) used for processing random I/O requests can be temporarily expanded by one or
more additional sets of 16 random I/O 256KB slots as needed (if available from the associated cache
partition's free space).

Since individual LDEVs can dynamically grow to very large sizes in terms of their cache footprint based
on the workloads, Cache Partitioning (using the Virtual Partition Management package) can be used to
fence off the cache space usable by selected Parity Groups (with all of their LDEVs) to manage their
maximum allocated random slot sets.

Random I/O Cache Operations


Each logical 256KB cache slot contains four logical 64KB cache segments. For each cache slot, two of
these 64KB segments are only used for read operations, and two are only used for write operations
(more about this distinction below). Furthermore, each of the 64KB segments is further subdivided into
four logical 16KB sub-segments. Each of these is then divided into eight physical 2KB cache blocks.
Therefore, each 256KB cache slot contains sixty-four 2KB cache blocks for random reads and another
sixty-four 2KB blocks for random writes. Any cache block in Cache-A and Cache-B can be used to build
the logical 16KB sub-segments (managed by tables in Shared Memory).


Initial Random Cache Allocation Per LDEV

Element                    Count
256KB Cache Slots          16
64KB Read Segments         32
64KB Write Segments        32
16KB Read Sub-segments     128
16KB Write Sub-segments    128
2KB Read Cache Blocks      512
2KB Write Cache Blocks     512

The table above shows the overall breakdown of elements for a set of 16 random I/O cache slots for one
LDEV. Figures 8-10 below show the relationship among these elements.
Figure 8. Random I/O: Logical Cache Slot and Segments Layout
(A 256KB random cache slot is made up of two 64KB Read segments and two 64KB Write segments.)

Figure 9. Random I/O: Logical Segment and Sub-segment Layout
(Each 64KB segment is made up of four 16KB sub-segments.)

Figure 10. Random I/O: Logical Sub-segments and Physical Cache Blocks Layout
(Each 16KB sub-segment is made up of eight 2KB cache blocks.)
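A small sketch that recomputes the slot / segment / sub-segment / block breakdown shown in Figures 8-10 and the per-LDEV initial allocation from the sizes given in the text (variable names are illustrative only):

```python
SLOT_KB, SEGMENT_KB, SUBSEG_KB, BLOCK_KB = 256, 64, 16, 2

segments_per_slot   = SLOT_KB // SEGMENT_KB        # 4 (2 read + 2 write)
subsegs_per_segment = SEGMENT_KB // SUBSEG_KB      # 4
blocks_per_subseg   = SUBSEG_KB // BLOCK_KB        # 8
blocks_per_slot     = segments_per_slot * subsegs_per_segment * blocks_per_subseg  # 128 (64 read + 64 write)

# Initial random-I/O allocation for one LDEV: 16 cache slots
initial_slots     = 16
read_segments     = initial_slots * 2                          # 32
write_segments    = initial_slots * 2                          # 32
read_subsegments  = read_segments * subsegs_per_segment        # 128
write_subsegments = write_segments * subsegs_per_segment       # 128

print(f"Per 256KB slot: {segments_per_slot} segments, "
      f"{blocks_per_slot} x 2KB blocks (64 read + 64 write)")
print(f"Per LDEV (16 slots): {read_segments} read / {write_segments} write segments, "
      f"{read_subsegments} read / {write_subsegments} write sub-segments")
```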

It is because of these small 2KB cache block allocation units that the Universal Storage Platform V does
so well with very small block sizes and random workloads. In general, random workloads are usually
associated with application block sizes of 2KB to 32KB. Other vendors use a single fixed cache allocation
size (such as 32KB and 64KB for EMC DMX) for individual I/O operations. Hence, a 2KB host I/O will
waste 30KB or 62KB of each EMC DMX cache slot. On the IBM DS8000 series, the cache allocation unit is
4KB, thereby avoiding wasted cache space. But the DS8000 arrays are actually general-purpose servers
running the AIX operating system using the local shared RAM on individual processor books. On both
EMC and IBM, their global cache contains the operating system space, all storage control and metadata,
all software, and all data blocks. On the Universal Storage Platform V, the CMA cache system is only
used for user data blocks.

Write Operations
In the case of host writes, the individual data blocks are mirrored (duplexed) to another CMA board on a
different power boundary (i.e. Cache-A and Cache-B). The mirroring is only for the actual user data
blocks, unlike the static mirroring of 50% of cache to the other 50% of cache as is used by EMC for DMX-3
and DMX-4.

For example, if a host application writes an 8KB block of data, that 8KB block will be written to two CMA
boards, using four 2KB cache blocks from a 64KB Write segment in Cache-A and another set of four
cache blocks from Cache-B. The FED MP that owns the port on which the host request arrived is the
processor that performs this duplexing task. After the write has been processed and destaged to disk,
the 8KB of mirrored blocks will be deleted, and the other 8KB will remain in cache (after being
remapped to a Read sub-segment - see below) for a possible future read hit on that data.

Note that a read cache hit on data not yet destaged to disk from a Write Segment is not possible. In fact,
no host reads are directed against Write Segments. A read request for a block which still has a write
pending condition will force a disk destage first, and all pending writes in that same 64KB segment will
be included in this forced destage. Similarly, a write cache hit is not possible at any time, if by a write hit
one means overwriting a pending write block. On the Universal Storage Platform V, all blocks written to
cache must be destaged to disk in the order they arrived. There is no overwriting of "dirty" blocks not
yet written to disk, as is the case with a midrange array. On the Universal Storage Platform V, a write hit
actually means that there is empty space in one of that LDEV's existing Write Segments already available
for a new I/O request. A write miss would then mean that no such existing space is available, and either
another set of 16 random I/O cache slots will be allocated for use by that LDEV, or (if cache slots are
tight) all of the pending writes in that LDEV's allocated 64KB Write Segments will be destaged, freeing
up all of its random write space.

Write Pending Limit


There is a write pending limit trigger of 30% and 70% per cache partition (the base partition if others have not been created using the Virtual Partition Manager product). When the 30% limit is hit, the array begins adjusting the internal priority of write operations higher than those for reads. When the 70% limit is hit, the array goes into an urgent level of data destage (writes take precedence over reads) to disk from those 64KB random segments used for writes. If using VPM and cache partitions (and a global mode setting of 454=ON for the array), in most cases only the 64KB Write Segments within a partition see this urgent destage (the mode switch creates an averaging mechanism to prevent all partitions from being similarly affected). Other partitions are generally isolated with their own write pending triggers. In most cases a 70% write pending limit is an indication of too few disks absorbing the write operations.
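A minimal sketch of what the two triggers imply (the function and threshold names are hypothetical, not actual microcode):

def destage_priority(write_pending_pct: float) -> str:
    """Illustrative mapping of a partition's write-pending percentage to destage behavior."""
    if write_pending_pct >= 70.0:
        return "urgent"    # writes take precedence over reads; drain the 64KB write segments
    if write_pending_pct >= 30.0:
        return "elevated"  # internal priority of writes raised above that of reads
    return "normal"        # destage proceeds at background priority

print(destage_priority(25))   # normal
print(destage_priority(45))   # elevated
print(destage_priority(72))   # urgent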

Read versus Write Segments


There is an interesting event that happens after a Write Segment is processed by a destage operation. All cache slots, as well as their Read and Write 64KB segments, subsegments, and 2KB cache blocks, are actually dispersed across the entire cache space, managed at the 2KB cache block level. These 2KB cache blocks are dynamically remapped to various cache segments. A Write segment cannot be used for reads. So how does a fresh write that has been destaged become readable as a cache hit for follow-on read requests?
Suppose 8KB of data was written to eight 2KB cache blocks (four in one Write Segment, with four more in a Write Segment mirror). After the disk destage operation occurs, four of those blocks are marked as free (from the mirrored Write Segment) and four are remapped into a Read Segment (thus becoming readable for a cache hit). These four surrendered blocks from that Write Segment's address space are replaced by four unused 2KB cache blocks. The user data was not copied, but the logical location of those blocks did change. All of this mapping is managed in the cache map in the Shared Memory system and occurs at RAM speeds. In fact, nearly everything in the Universal Storage Platform V (internal LDEVs, external LDEVs, Hitachi Dynamic Provisioning pages, etc.) is a mapped entity with a high degree of flexibility in how it is managed.
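A minimal sketch of this behavior, assuming a toy dictionary-based cache map (class and method names are hypothetical; the point is that only the mapping changes after destage, not the data):

class CacheMap:
    """Toy model of 2KB cache blocks being remapped between segments after destage."""
    def __init__(self):
        self.write_segment = {}   # block_id -> data (primary copy, Cache-A)
        self.write_mirror = {}    # block_id -> data (duplexed copy, Cache-B)
        self.read_segment = {}    # block_id -> data (read-hit capable)

    def host_write(self, block_id: int, data: bytes) -> None:
        # An 8KB host write lands as four 2KB blocks on each of two boards (duplexed).
        self.write_segment[block_id] = data
        self.write_mirror[block_id] = data

    def destage(self, block_id: int) -> None:
        # After the blocks reach disk: free the mirror and remap the primary copy into
        # a Read segment. Only the mapping changes; the data bytes are not copied.
        del self.write_mirror[block_id]
        self.read_segment[block_id] = self.write_segment.pop(block_id)

cache = CacheMap()
for blk in range(4):                       # 8KB write = four 2KB blocks
    cache.host_write(blk, b"\x00" * 2048)
for blk in range(4):
    cache.destage(blk)
print(len(cache.read_segment), "blocks now readable as cache hits")   # 4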
Sequential I/O Cache Operations and Sequential Detect
A separate set of 24 cache slots is allocated to an individual LDEV when a sequential I/O pattern is detected against that LDEV on a port owned by a FED MP (known as sequential detect). This set of 24 slots is released when a sequential I/O pattern is no longer detected. This is one reason a proper sequential detect state is important on the Universal Storage Platform V. If sequential I/O is broken up

across several ports on different FEDs (for example), it will look like random I/O to the several individual MPs controlling those host ports. As an example, the use of a host-based Logical Volume Manager that creates a large RAID-0 volume by striping across several LDEVs on several different host ports can defeat the sequential detection mechanism. Sequential detect is managed for each LDEV controlled by an MP on a FED for the one or two FED ports that it owns.
Unlike the cache slots assigned for random I/O, each 256KB cache slot is used as a single segment of 256KB for sequential I/O. This matches the RAID chunk size of 256KB on the disks. In sequential mode, an entire 256KB chunk is read from or written to each disk. Sequential prefetch will read several such chunks into cache slots without being instructed to do so by the controlling MP on the FED involved. Sequential writes will be performed as full-stripe writes to a Parity Group, thus minimizing the write penalty for RAID levels 5 and 6. The data in these sequential cache slots can be reusable for a subsequent cache hit. In general, once the data has been processed, these dynamically allocated cache slots are released for the next sequential I/O operation against that LDEV, or are returned to the cache pool once sequential detect is no longer present (for some period of time) for that LDEV. However, certain lab tests do show a high cache hit rate where a small number of LUNs under test have high sequential read ratios and use large block sizes (such as 1MB).
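The actual sequential-detect algorithm is not documented here; the sketch below (hypothetical names and thresholds, illustrative logic only) shows why splitting one stream across several MPs defeats detection: each MP sees gaps in the LBA sequence and treats its share of the stream as random.

class SequentialDetector:
    """Toy per-LDEV sequential-detect state as seen by one FED MP."""
    def __init__(self, run_length_needed: int = 4):
        self.last_lba = None
        self.run = 0
        self.run_length_needed = run_length_needed
        self.sequential_slots_allocated = False

    def on_io(self, start_lba: int, blocks: int) -> None:
        if self.last_lba is not None and start_lba == self.last_lba:
            self.run += 1                           # contiguous with the previous request
        else:
            self.run = 0                            # gap seen: looks random to this MP
            self.sequential_slots_allocated = False
        if self.run >= self.run_length_needed:
            self.sequential_slots_allocated = True  # allocate the 24 sequential slots
        self.last_lba = start_lba + blocks

det = SequentialDetector()
for i in range(8):                                  # a strictly ascending stream on one port
    det.on_io(start_lba=i * 512, blocks=512)
print(det.sequential_slots_allocated)               # True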
Figure 11. Sequential I/O Read Slot
[Diagram: a 256KB sequential cache slot used as a single 256KB Read segment.]

Figure 12. Sequential I/O Write Slot


[Diagram: a 256KB sequential cache slot used as a single 256KB Write segment.]

BED and FED Local RAM


There is a pool of RAM installed on each FED or BED PCB. This RAM is shared by the four MP processors per PCB. This is where the LUN management tables (such as MP Workload Sharing), sequential detect histograms, and limited local caching for highly reused data blocks are located. There is also NVRAM associated with each MP for holding the system microcode and optional software packages. Note that other designs locate both of these systems in a monolithic global cache system, and every access for these elements interferes with user data block movement. On the Universal Storage Platform V the degree of memory parallelism removes this burden from the data cache system.
When new firmware is to be flashed into the NVRAM owned by each MP, the new software is first copied into the local RAM (over the internal MP-to-SVP network), and then, one by one, each MP suspends its activities, copies the new microcode into its NVRAM, and then reboots. The next MP on that FED or BED board will then follow the same steps until all four MPs are updated.

Front-End Director Concepts


The Front-end Directors manage the access and scheduling of I/O requests to and from the attached servers. The FEDs contain the host ports that interface with the servers or (when using Open Fibre FEDs) attach to external storage. There are four types of FED features available: 8-port Open Fibre, 16-port Open Fibre, 8-port FICON, and 8-port ESCON. The various types of interface options can be supported simultaneously by mixing FED features within the Universal Storage Platform V.

Table 4. Comparison: Universal Storage Platform V FED Features

    FED Option     Features (USP V)    Total Ports (USP V)
    Open Fibre     0-14                0-224
    ESCON          0-8                 0-64
    FICON          0-8                 0-64

Refer to the Figure below for the following board components discussion.
- The FED processors (CHPs, more commonly called MPs) manage the host I/O requests, the Shared Memory and Cache areas, and execute the microcode and the optional software features such as Dynamic Provisioning, Universal Replicator, Volume Migrator, and TrueCopy.
- The DX4 chips are Fibre Channel encoder-decoder chips that manage the Fibre Channel protocol on the cables to the host ports.
- The Data Adapter (DTA) chip is a special ASIC for communication with the CSWs.
- The Microprocessor Adapter (MPA) is an ASIC for communication with the SMA PCBs.
- All I/O, whether Reads or Writes, internal or external storage, must pass through the Cache and Shared Memory systems.

Figure 13. Example of a Universal Storage Platform V FED Board (16-port Open Fibre Feature)


FED Microcode Updates


The Universal Storage Platform V microcode is updated by flashing each MP's NVRAM in the FED PCBs. A copy of this microcode is saved in the local RAM on each FED PCB, and each MP on the PCB (four MPs) will perform a hot upgrade in a rolling fashion. While host ports are still live, each MP will take itself offline, flash the new microcode from the local FED RAM, and then reboot itself. It then returns to an online state and resumes processing I/O requests and executing the optional installed software. The typical time for this process is 30 seconds per MP. This happens independently on each FED PCB and in parallel.
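A minimal sketch of the rolling, one-MP-at-a-time sequence (hypothetical names; not the actual update tool):

def rolling_microcode_update(mps) -> None:
    """Update the four MPs on one PCB one at a time so the board stays in service."""
    for mp in mps:                  # only one MP is offline at any moment (roughly 30s each)
        mp["online"] = False        # the MP suspends its host I/O handling
        mp["firmware"] = "new"      # flash the new microcode from the local FED RAM copy
        mp["online"] = True         # the MP reboots and resumes processing I/O

board = [{"id": i, "firmware": "old", "online": True} for i in range(4)]
rolling_microcode_update(board)
print(all(mp["firmware"] == "new" and mp["online"] for mp in board))   # True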

FED FC-8 port, ESCON, and FICON: Summary of Features


Table 5 shows the Universal Storage Platform V options for the 8-port Fibre Channel, 8-port FICON, and 8-port ESCON features. The front-end port counts, the FED (CHA) PCB names, and the associated CSWs are also indicated.
Table 5. 8-port FED Features

FC-8 port, FICON, ESCON Features

    Feature     Pair     FED Boards        Ports   CSW
    Feature 1   FED-1    FED00 / FED08     8       0
    Feature 2   FED-2    FED02 / FED0A     16      0
    Feature 3   FED-3    FED01 / FED09     24      1
    Feature 4   FED-4    FED03 / FED0B     32      1
    Feature 5   FED-5    FED04 / FED0C     40      2
    Feature 6   FED-6    FED06 / FED0E     48      2
    Feature 7   FED-7    FED05 / FED0D     56      3
    Feature 8   FED-8    FED07 / FED0F     64      3

FED FC-16 port Feature


Table 6 shows the Universal Storage Platform V options for the 16-port Fibre Channel features. The front-end port counts, the FED PCB names, and the associated CSWs are also indicated. This port count can be increased up to 224 (using up to 14 FED features) if the number of BED features is minimal (two) in order to allow for more FED features (by taking 12 BED slots). This FED expansion would be used for adding a large amount of external storage to the Universal Storage Platform V, where fewer than 256 internal disks would be configured.
Table 6. Universal Storage Platform V FED FC-16 Feature

FC-16 port Feature

    Feature      Pair      FED Boards        Ports   CSW
    Feature 1    FED-1     FED00 / FED08     16      0
    Feature 2    FED-2     FED02 / FED0A     32      0
    Feature 3    FED-3     FED01 / FED09     48      1
    Feature 4    FED-4     FED03 / FED0B     64      1
    Feature 5    FED-5     FED04 / FED0C     80      2
    Feature 6    FED-6     FED06 / FED0E     96      2
    Feature 7    FED-7     FED05 / FED0D     112     3
    Feature 8    FED-8     FED07 / FED0F     128     3
    Feature 9    FED-9     (BED3)            144     1
    Feature 10   FED-10    (BED4)            160     1
    Feature 11   FED-11    (BED5)            176     2
    Feature 12   FED-12    (BED6)            192     2
    Feature 13   FED-13    (BED7)            208     3
    Feature 14   FED-14    (BED8)            224     3

I/O Request Limits and Queue Depths (Open Fibre)


Every fibre channel path between the host and the storage array has a specific maximum capacity, known as the Maximum I/O Request Limit. This is the limit to the aggregate number of requests being directed against the individual LUNs. For the Universal Storage Platform V, this limit is 4096, and is associated with each FED MP. On the 16-port feature, where there are two ports managed by each MP, 4096 is the aggregate limit across those two ports. [This is handled differently for FICON, where the MP's limit is 480 for Open Exchange messaging.] However, when a port (actually the associated MP) is placed in external storage mode (becomes an initiator), the port I/O Request Limit drops to 256.
Queue Depth is the maximum number of outstanding I/O requests per LUN (internal or external). This is separate from the MP's port I/O Request Limit. Note that there is no concept of Queue Depth for ESCON or FICON volumes.
On the Universal Storage Platform V, the per-LUN nominal queue depth is 32 but can be much higher (up to 4096) in the absence of other workloads on that MP's ports for internal LUNs. In the case of external (virtualized) LUNs, the queue depth can be set to 2 to 128 in the Universal Volume Manager GUI as shown below. Increasing the default queue depth value for external LUNs from 8 up to 32 can often have a very positive effect (especially on Response Time) for OLTP-like workloads. The actual limit will depend on how many other concurrently active external LUNs there are on that port.
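As a back-of-the-envelope aid, the sketch below (a hypothetical helper, not a product tool) checks whether a set of per-LUN queue depths can all be driven concurrently without exceeding the MP's I/O Request Limit described above.

def fits_request_limit(per_lun_queue_depths, external_port: bool = False) -> bool:
    """True if the summed outstanding I/Os stay within the MP's I/O Request Limit."""
    mp_limit = 256 if external_port else 4096
    return sum(per_lun_queue_depths) <= mp_limit

# 20 external LUNs on one initiator MP, each with queue depth raised to 32:
print(fits_request_limit([32] * 20, external_port=True))   # False: 640 > 256
print(fits_request_limit([8] * 20, external_port=True))    # True: 160 <= 256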
Figure 14. Storage Navigator Screen Showing External Port Queue Depth Control


MP Distributed I/O (Open Fibre)


The Open Fibre FEDs (not available on FICON or ESCON FEDs) on the Universal Storage Platform V have a new feature known as MP Distributed I/O. Some of the work pertaining to an individual I/O on a single port may be processed by another MP on the same PCB rather than the MP that normally controls that port. [See Appendix 8 for the port-MP mappings.] A heavily loaded MP can hand off some tasks pertaining to data, metadata, software (SI, TC, HUR), and Hitachi Dynamic Provisioning Pool administrative tasks to another MP that is less busy. MPs can be configured to be in one of four modes: target (host), HUR-A (send), HUR-B (receive), and initiator (external) mode. MPs must be in the same mode in order to participate in Distributed I/O. For example, two MPs (on the same board) whose ports are configured in initiator (external) mode will work together.
The Distributed I/O feature is limited to the four MPs on an individual PCB. Some of this technique was actually begun in the Universal Storage Platform with code levels V08 and higher, but most of it is unique to the Universal Storage Platform V with its much higher powered MPs and faster Shared Memory system. The degree to which Distributed I/O is available is dependent on the pattern of I/O requests from the host as well as the internal status of work already in the system, back-end processing of pending requests, and so forth.
Distributed I/O on an individual MP begins when it reaches a sustained average 50% busy rate, and is managed on a round-robin basis with the other available MPs on the same FED board, taking into account their current busy state. A table is kept in local memory to assist in this workload dispatching. An MP that receives an I/O command over a host path and is currently more than 50% busy will look for another MP on that board to assist with the operation. This offload does not represent load balancing among host ports, just a degree of parallel processing of I/O commands to the back-end based on available cycles among the MPs on a board. [Note: When two or more MPs on a FED board are operated in external mode, we have observed that in some cases there is no minimum percent busy threshold required to trigger the Distributed I/O mode.]
If an MP is doing processing for other MPs and then receives its own host I/O command, this workload could be distributed among the other MPs on the board. As the processors all get busy, the degree of sharing rapidly drops off. MPs that currently have no host I/O of their own can work up to a 100% busy rate on Distributed I/O loads. In the presence of host I/O on a port associated with an MP, the sum of that MP's host I/O + distributed I/O must be less than 50%. Once host I/O on an MP exceeds the 50% busy rate, there is no further acceptance of Distributed I/O by that MP.
When Fibre Channel FED ports are configured for external storage, the associated owning MP no longer participates in the host-facing Distributed I/O mode. That MP, and the one or two ports it manages, is placed into initiator mode. (Note: Every port on an MP is operated in the same mode.) If two or more MPs on a PCB are used for external attachment, then they will engage the Distributed I/O function. At least two MPs must be in external mode (thus all of their ports) in order to get Distributed I/O for external loads on a FED board.
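A minimal sketch of the dispatch decision implied above (all names are hypothetical; the real selection is a busy-state-aware round-robin in microcode, simplified here to picking the first eligible peer):

def pick_helper_mp(owner: dict, peers: list) -> dict:
    """Offload only when the owning MP is more than 50% busy; pick a same-mode,
    under-50%-busy peer on the same board, otherwise keep the work on the owner."""
    if owner["busy_pct"] <= 50:
        return owner                      # below threshold: no Distributed I/O
    candidates = [p for p in peers
                  if p["mode"] == owner["mode"] and p["busy_pct"] < 50]
    return candidates[0] if candidates else owner

owner = {"id": 0, "mode": "target", "busy_pct": 72}
peers = [{"id": 1, "mode": "target", "busy_pct": 20},
         {"id": 2, "mode": "initiator", "busy_pct": 5},
         {"id": 3, "mode": "target", "busy_pct": 95}]
print(pick_helper_mp(owner, peers)["id"])   # 1 (same mode, under 50% busy)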

External Storage Mode I/O (Open Fibre)


When using the Open Fibre FEDs, each MP (and the one or two ports it owns) on the PCB may be configured in one of four modes: as a target (accepting host requests), as an initiator (driving storage requests), as TC/HUR-A (Copy products send), or as TC/HUR-B (Copy products receive). The initiator mode is how other storage arrays are attached to FED ports on a Universal Storage Platform V. Front-end ports from the external (secondary) array are attached to some number of Universal Storage Platform V FED ports as if the Universal Storage Platform V was a server. If using the 16-port feature, then both ports managed by an MP are placed into this mode. In essence, that FED MP (and the one or two ports it owns) is operated as though it was a BED MP. LUNs visible on the external array's ports are then remapped by the Universal Storage Platform V out to FED ports attached to hosts.
As I/O requests arrive over other host-attached FED ports on the Universal Storage Platform V for those LUNs, the normal I/O operations within the FED occur. The request is managed by Shared Memory tables and the data blocks go into the data cache. But the request is then routed to the FED MP (not a BED MP) that controls the external paths where that LUN is located. The external array processes the request as though it were talking to a server instead of the Universal Storage Platform V.

Cache Mode Settings


The external port (and all LUNs present on it) may be assigned either Cache Mode = ON or OFF when it is configured in the Universal Storage Platform V. The ON setting processes I/O and cache behavior identically to internal LDEVs. The OFF setting directs the Universal Storage Platform V to wait for Write I/O completion until the data is accepted by the external device. The OFF setting does not change other cache handling behavior such as reads to the external LUNs; however, this change makes a significant difference when writing to slower external storage that presents a risk for high write pending cases to develop. The use of Cache Mode = ON will report I/O completion to the host for writes once the blocks are written to Universal Storage Platform V cache. Cache Mode = ON should normally not be used when high write rates are expected. In general, the rule of thumb is to use Cache Mode = OFF.

Other External Mode Effects


Recall from above that the Queue Depth for an external LUN can be modified to be between 2 and 128 in the Universal Volume Manager GUI as shown above. Increasing the default value from 8 up to 32 (probably the best choice overall), 64, or perhaps even 128 can often have a very positive effect (especially on Response Time) for OLTP-like workloads. Along with this, recall that the maximum I/O Request Limit for an external port is 256 (not 4096).
When ports on a PCB are configured for use as external attachment, their owning MP is operated in initiator mode; there will be no Distributed I/O mode until at least one other MP is so configured.
The external port represents itself as a Windows server. Therefore, when configuring the host type on the ports of the virtualized storage array, they must be set to Windows mode.

Front-End Director Board Details


This section discusses the details of the four types of FED features available on the Universal Storage Platform V.
Open Fibre 8-Port Feature
The 8-port Open Fibre feature consists of two PCBs, each with four 4Gb/sec Open Fibre ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and two Tachyon DX4 dual-ported chips. The Shared Memory (MPA) port count is now eight paths per PCB (double the Universal Storage Platform PCB). See Appendix 5 for ports, names, and MP associations.


Figure 15. Universal Storage Platform V 8-port Fibre Channel Feature (showing both PCBs)

Open Fibre 16-Port Feature


The 16-port Open Fibre feature consists of two PCBs, each with eight 4Gb/sec Open Fibre ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and four Tachyon DX4 dual-ported chips. There are eight Shared Memory (MPA) paths per PCB (double the Universal Storage Platform PCB). See Appendix 6 for ports, names, and MP associations.
Figure 16. Universal Storage Platform V 16-port Fibre Channel Feature (showing both PCBs)


ESCON 8-port Feature


The 8-port ESCON feature consists of two PCBs, each with four 17MB/s ESCON fibre ports. Each PCB has two 800MHz Channel Host Interface Processors (CHP) and one ESA0 interface. There are two Cache paths and eight Shared Memory (MPA) paths per PCB. See Appendix 7 for ports, names, and MP associations.
Figure 17. Universal Storage Platform V 8-port ESCON Feature (both PCBs shown)

[Diagram: the two ESCON PCBs, each with ESA0 interfaces, CHPs, DTA and MPA chips, 2 x 1064 MB/sec data-only paths, and 8 x 150 MB/sec metadata-only paths.]

FICON 8-port Feature


The 8-port FICON feature consists of two PCBs, each with four 4Gb/sec FICON ports. Each PCB has four 800MHz Channel Host Interface Processors (CHP) and two HTP interface chips. There are two Cache Switch paths and eight Shared Memory (MPA) paths per PCB. See Appendix 8 for ports, names, and MP associations.
Figure 18. Universal Storage Platform V 8-port FICON Feature (both PCBs shown)


Back-end Director Concepts


The Back-end Directors manage access and scheduling of requests to and from the physical disks. The BEDs also monitor utilization of the loops, Parity Groups, processors, and status of the PCBs in a pair. The Universal Storage Platform V BED feature (2 PCBs) has four pairs of 4Gb/sec loops supporting up to 128 disks. Each PCB has four 800MHz DKP processors and four DRR RAID processors.
The BED features control the Fibre Channel loops that interface with the internal disks (but not to virtualized external storage). Every I/O operation to the disks will pass through the Cache and Shared Memory subsystems. Table 7 lists the BED features, loop counts, and disk capacities for the Universal Storage Platform V system. Note that BEDs must be installed as pairs of Options (four PCBs) in order to provide 8 pairs of loops to support the 8-disk Parity Groups, these being RAID-5 (7D+1P) and RAID-6 (6D+2P).
Note that, while the first pair of BED features can actually support 384 disks (the 128 disks in the Control Frame are attached to BED-1 and BED-2; see Figure 21 below), HDS has limited the field configuration to support 256 disks until the second pair of BED features has been installed.
Table 7. Universal Storage Platform V BED Features

    BED Feature Count (2 PCBs each)    Back-end Loop Pairs Available    Max Disks Configurable
    1-2 BED                            8                                256 (PM limit)
    3-4 BED                            16                               640
    5-6 BED                            24                               896
    7-8 BED                            32                               1152

BED Microcode Updates


The Universal Storage Platform V microcode is updated by flashing the MPs in the BED PCBs. A copy of this microcode is saved in the local RAM on each BED PCB, and each MP on the PCB (four MPs) will perform a hot upgrade in a rolling fashion. While the disk loops are still live, each MP will take itself offline, flash the new microcode from the local BED RAM, and then reboot itself. It then returns to an online state and resumes processing I/O requests to the disks on its loops. The typical time for this process is 30 seconds per MP. This happens independently on each BED PCB and in parallel.

BED Feature Summary


Table 8 shows the Universal Storage Platform V options for the BED features. The total back-end loop counts, the names of the BED PCBs, and the associated CSWs are also indicated.


Table 8. BED Features

    BED Feature Pair    BED Boards                                     Loop Pairs    CSW Used
    Basic               BED-1 (BED10, BED18), BED-2 (BED12, BED1A)     8             0
    Feature 2           BED-3 (BED11, BED19), BED-4 (BED13, BED1B)     16            1
    Feature 3           BED-5 (BED14, BED1C), BED-6 (BED16, BED1E)     24            2
    Feature 4           BED-7 (BED15, BED1D), BED-8 (BED17, BED1F)     32            3

Back-End-Director Board Details


BED Details
Figure 19 is a high-level diagram of the pair of PCBs for the Universal Storage Platform V BED feature.
Figure 19. Universal Storage Platform V 8-port BED Feature (both PCBs shown)

Back End RAID level Organization


Figure 20 is a view of the Universal Storage Platform V's BED-to-disk organization. This is also how the Lightning 9900V was organized. These two BED features (4 PCBs) can support 256 disks, with the exception being BED-1 and BED-2, which also control the 128 disks in the DKC control frame (hence 384 disks overall). It takes two BED options to support all of the Parity Group types due to the smaller 4-loop PCBs. While the second loop from the other controller would take over the load from the failed partner loop, the subsequent failure of that alternate loop would take down all of the RAID-10 (4D+4D) or RAID-5 (7D+1P) Parity Groups. Because of this, pairs of BEDs (4 PCBs) must always be installed.

Figure 20. Universal Storage Platform V Back-end Disk Layout. (2 BED features)

Universal Storage Platform V: HDU and BED Associations by Frame


Figures 21 and 22 illustrate the layouts of the 64-disk containers (HDU), the Frames, and the associated BED ownerships of the Universal Storage Platform V. The names of the ranges of Array Groups are also shown. There are two views presented: a regular frontal view and a view of the back as seen from the back.
For example, looking at Figure 21, the bottom of Frame DKU-R1 shows the front of the HDU whose disks are controlled by the eight 4Gbit loops on BED-1 (pg3, yellow) and BED-2 (pg4, orange). The HDU is split in half by power domains, where the 32 disks on the left half (16 on the front and 16 on the back of the Frame) are attached to BED-2 and the 32 on the right half go to BED-1. Note the diagonal offsets of the HDU halves from front to rear of the two associated 16-disk groups by HDU power domains.
Figure 21. Front View of the Universal Storage Platform V Frames, HDUs, with BED Ownership
[Diagram: front view of frames DKU-L2, DKU-L1, DKU-R0, DKU-R1, and DKU-R2, showing each HDU's 16-HDD groups, their BED (bed-1 through bed-8) ownership, parity group ranges (pg1 through pg18), and power domains.]

Figure 22. Rear View of the Universal Storage Platform V Frames, HDUs, with BED Ownership
[Diagram: rear-facing view of the same frames, showing the mirrored HDU halves with their BED ownership, parity group ranges, and power domains.]
The yellow section contains sixteen 4-disk Array Groups whose names (used in Parity Group configurations) are 3-1 through 3-16 (or pg3 as indicated). The official name of these 4-disk groups (2 disks on the front, 2 on the back of the HDU) is Array Group, even though they could be assigned RAID level 10 (no parity involved). See Appendix 1 for maps of the Universal Storage Platform V Frames showing the names of the Array Groups and where they are located. Take note of how the locations of the Array Groups have been shifted between the Universal Storage Platform and the Universal Storage Platform V models. This can have important consequences when configuring a system if it is not understood.
Figure 23 below shows the details of a single frame, with its four HDUs (showing front and rear elements as viewed through the front). Figure 24 is a zoomed-in look at the front of just one HDU and the details of its switches and disks.
Figure 23. Details of a Frame, with the HDUs and FSWs.
[Diagram: one USP-V R2 frame with its four HDUs (top, upper middle, lower middle, lower), the upper and lower FSW switch PCBs for each HDU, and the 8-HDD groups attached to BED-1 through BED-4.]

Figure 24. Closer Look at One HDU and Its Two Switches.

[Diagram: the lower front HDU with its FSW-R20 upper and lower switches connecting the 8-HDD groups to BED-1 and BED-2.]

Disk Details
The Universal Storage Platform V supports a mixture (in Array Group upgrade units of four disks) of up to 1152 Fibre Channel and/or SATA disks, or up to 128 Flash disks (with 8 BED features) in the place of 128 Fibre Channel or SATA disks. Table 9 illustrates the advertised size (base 10), the apparent size in the tools (still base 10), and the typical usable (after Parity Group formatting) size of each type of disk. Also shown is the rule-of-thumb average maximum random IOPS rate for each type of disk when using most of the surface. The number and size of the LDEVs created per Parity Group determine how much of its disks' surfaces are in active use.
As more of the disk surface is allocated to LDEVs for a Parity Group, the farther the disk heads must seek. This creates higher average seek times. For example (using a 15k RPM disk), when only using the outer 25% of the surface the average read seek time can be around 1.8ms (providing about 267 IOPS), whereas the 100% surface average seek time will be about 3.8ms (about 182 IOPS). For a SATA disk at 7200 RPM, these values are about 4.3ms (119 IOPS) and 8.5ms (79 IOPS). All write IOPS rates for all disks are about 5% lower due to the higher seek precision required. See Appendix 14 for more details on theoretical estimates of disk IOPS rates.
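The rule-of-thumb IOPS figures above follow from a very simple service-time model. The sketch below (illustrative only; it ignores transfer time and command overhead) uses average seek plus half a rotation and comes close to the quoted numbers.

def random_read_iops(avg_seek_ms: float, rpm: int) -> float:
    """Simplified single-disk estimate: 1 / (average seek + average rotational latency)."""
    rotational_latency_ms = 0.5 * 60000.0 / rpm   # half a revolution, in milliseconds
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

print(round(random_read_iops(1.8, 15000)))   # ~263  (short-stroked 15k RPM disk)
print(round(random_read_iops(3.8, 15000)))   # ~172  (full-surface 15k RPM disk)
print(round(random_read_iops(4.3, 7200)))    # ~118  (short-stroked 7.2k RPM SATA)
print(round(random_read_iops(8.5, 7200)))    # ~79   (full-surface 7.2k RPM SATA)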
Normally, with the use of disks with different interface rates (such as 4Gbit/s and 2Gbit/s), the back-end loop performance for all disks will drop to the 2Gbit rate on each BED feature where these disks are installed. This is not true of the 3Gbit SATA disks, due to the 4Gbit/s speed conversion, the SATA-to-FCAL protocol conversion, and data buffering that takes place within each SATA disk canister's logic board.
Table 9. Table of Disk Types and Rule-of-Thumb IOPS Rates

    HDD Type              Port Speed   Advertised Size (GB)   Apparent Size (GB)   Usable Size (GB)   Nominal Random IOPS
    1TB SATA 7.2k RPM     3Gb/sec      1000                   976                  927.2              80
    750GB SATA 7.2k RPM   3Gb/sec      750                    738.62               701.7              80
    450GB FC 15k RPM      4Gb/sec      450                    432                  410.4              180
    400GB FC 10k RPM      4Gb/sec      400                    384                  364.8              130
    300GB FC 15k RPM      4Gb/sec      300                    288.2                273.8              180
    146GB Flash           4Gb/sec      146                    143.76               136.6              5000 read / 1500 write
    146GB FC 15k RPM      4Gb/sec      146                    143.76               136.6              180
    72GB Flash            4Gb/sec      72                     71.5                 67.9               5000 read / 1500 write
    72GB FC 15k RPM       4Gb/sec      72                     71.5                 67.9               180


The type and quantity of disks and the RAID levels chosen for the Universal Storage Platform V will vary according to analysis of the user workloads, performance, and capacity requirements. The following sections provide information to be considered when selecting which disks to install. The table below illustrates usable capacities by RAID level and disk type for various disk counts.
Table 10. Disk Types and Usable Capacities by RAID Level
                               256 HDDs                 512 HDDs                 1024 HDDs
    Disk / RAID Level      AGs   Usable%   TB       AGs   Usable%   TB       AGs   Usable%   TB
    1TB SATA
      RAID5 7D+1P           32   87.5%    202.8      64   87.5%    405.7     128   87.5%    811.3
      RAID6 6D+2P           32   75%      173.9      64   75%      347.7     128   75%      695.4
    450GB FC
      RAID5 3D+1P           64   75%       77.0     128   75%      153.9     256   75%      307.8
      RAID5 7D+1P           32   87.5%     89.8      64   87.5%    179.6     128   87.5%    359.1
      RAID10 2D+2D          64   50%       51.3     128   50%      102.6     256   50%      205.2
      RAID10 4D+4D          32   50%       51.3      64   50%      102.6     128   50%      205.2
      RAID6 6D+2P           32   75%       77.0      64   75%      153.9     128   75%      307.8
    400GB FC
      RAID5 3D+1P           64   75%       68.4     128   75%      136.8     256   75%      273.6
      RAID5 7D+1P           32   87.5%     79.8      64   87.5%    159.6     128   87.5%    319.2
      RAID10 2D+2D          64   50%       45.6     128   50%       91.2     256   50%      182.4
      RAID10 4D+4D          32   50%       45.6      64   50%       91.2     128   50%      182.4
      RAID6 6D+2P           32   75%       68.4      64   75%      136.8     128   75%      273.6
    300GB FC
      RAID5 3D+1P           64   75%       51.3     128   75%      102.7     256   75%      205.3
      RAID5 7D+1P           32   87.5%     59.9      64   87.5%    119.8     128   87.5%    239.6
      RAID10 2D+2D          64   50%       34.2     128   50%       68.4     256   50%      136.9
      RAID10 4D+4D          32   50%       34.2      64   50%       68.4     128   50%      136.9
      RAID6 6D+2P           32   75%       51.3      64   75%      102.7     128   75%      205.3
    146GB FC or Flash
      RAID5 3D+1P           64   75%       25.6     128   75%       51.2     256   75%      102.4
      RAID5 7D+1P           32   87.5%     29.9      64   87.5%     59.8     128   87.5%    119.5
      RAID10 2D+2D          64   50%       17.1     128   50%       34.1     256   50%       68.3
      RAID10 4D+4D          32   50%       17.1      64   50%       34.1     128   50%       68.3
      RAID6 6D+2P           32   75%       25.6      64   75%       51.2     128   75%      102.4
    72GB FC or Flash
      RAID5 3D+1P           64   75%       12.7     128   75%       25.5     256   75%       50.9
      RAID5 7D+1P           32   87.5%     14.9      64   87.5%     29.7     128   87.5%     59.4
      RAID10 2D+2D          64   50%        8.5     128   50%       17.0     256   50%       34.0
      RAID10 4D+4D          32   50%        8.5      64   50%       17.0     128   50%       34.0

SATA DISKS
LDEVs that are created from Parity Groups that use SATA disks can be configured as OPEN-V or 3390-M emulation; therefore these SATA LDEVs are currently usable for mainframe volumes.
SATA disks are constructed with a far lower level of technological sophistication as compared to enterprise-grade Fibre Channel disks. In general, the differences between the SATA and Fibre Channel disks include:
- the amount of extreme technology and precision involved with the materials used for Fibre Channel disks (disk platter materials and magnetic layer sandwiching, head technology, and head fly-height control)
- the degree of error detection and correction (Fibre Channel disks are about double that of SATA) and processing power (Hitachi Fibre Channel disks are dual-processor)
- the expected life span with the duty cycles experienced in enterprise-class storage
- the overall performance available


When making performance estimates for SATA Parity Groups that use RAID-6 (6D+2P), one may use as a rule of thumb that reads and writes will be 50% and 75% lower than with similar RAID-6 (6D+2P) 146GB 15k RPM Parity Groups. (Subject to change in summer 2009 with microcode changes.)
The back-end disk loops on the Universal Storage Platform V are operated at 4Gbit/s, with all disks connected to these loops via switches. All Hitachi SATA drives operate at 3Gbit/s. SATA drives also only have one interface connection, whereas Fibre Channel disks are dual-ported (they may transfer in and out of cache on both loops at the same time, with one actual transfer possible through the read/write head).
The SATA disk canisters fit into the same Disk Chassis slots as do Fibre Channel disks because each SATA disk canister has a built-in controller module. One side of the controller presents two 4Gbit Fibre Channel paths (mimicking a standard Fibre Channel disk), while the inner side operates the 3Gbit SATA-II protocol. There is buffering in this controller that provides the speed matching between the different protocol rates. [Note: This buffering keeps the BED loops from slowing down to a rate below 4Gbit, as happens with the 2Gbit 300GB 10k Fibre Channel disks.]
The SATA canister interface also performs a read-after-write check on every block (whether data or parity/mirror) written to the disk. After a block is written to the disk, it is read from the surface and compared to the buffer in the canister interface. While this reduces the write rate by some amount depending on the RAID level, this read from the disk does guarantee that the data was accurately written.
Table 11. Physical Disk IOPS Consumed by RAID Level and Disk Type

    Disk Type               RAID Level    HDD IOPS per Host Read    HDD IOPS per Host Write
    Fibre Channel / Flash   RAID10        -                         -
                            RAID5         -                         -
                            RAID6         -                         -
    SATA                    RAID10        -                         -
                            RAID5         -                         -
                            RAID6         -                         -

Table 11 above shows the actual physical disk IOPS required for read or write operations as a function of disk type and RAID level. Although RAID-6 does protect against a double disk failure, the write costs are high (especially for SATA disks, with the extra read-compare-after-write by the local disk canister logic), so this must be considered when designing a storage solution. Note that in the case where full-stripe writes occur, the write penalty across the entire Parity Group will be much lower since the old parity block(s) are not read.
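As a sizing aid for the point above, the sketch below applies the standard small-block random write penalties (2 back-end I/Os per host write for RAID-10, 4 for RAID-5, 6 for RAID-6; reads cost 1). These are generic RAID arithmetic, not values taken from Table 11, and the SATA read-after-write overhead is not modeled.

# Standard small-block random write penalties (back-end disk I/Os per host write).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_disk_iops(host_iops: float, read_ratio: float, raid_level: str) -> float:
    """Back-end physical disk IOPS needed for a random host workload."""
    reads = host_iops * read_ratio                   # each host read -> 1 disk read
    writes = host_iops * (1.0 - read_ratio)
    return reads + writes * WRITE_PENALTY[raid_level]

# 5000 host IOPS at a 70/30 read/write mix:
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, backend_disk_iops(5000, 0.70, level))
# RAID10 6500.0, RAID5 9500.0, RAID6 12500.0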
Other considerations on the use of SATA versus Fibre Channel when using RAID-6 include:
- the no-load format times are about 18X longer than with Fibre Channel disks
- the no-load corrective copy times are about 4.4X longer than with Fibre Channel disks
- the no-load disk copy time is about 10X longer than with Fibre Channel disks
- there should be 1 spare SATA disk for every 16 disks in use


Hitachi's recommendation on suitable uses for SATA volumes is to use them for Nearline storage with a low frequency of access but online availability, such as:
- Email archives, backups, 2nd and 3rd copies
- Historical data with limited access (medical records, etc.)
- Archived documents for regulatory purposes
- Sound or video files
- Tape replacement

The decision to use up valuable internal disk slots with SATA disks rather than the much higher performing Fibre Channel disks should be carefully considered. In many cases, the use of SATA disks within virtualized storage on a midrange product might make better sense. However, there will be individual cases where the use of some internal slots for SATA drives will solve a specific customer business problem in a straightforward manner. It is not expected that a Universal Storage Platform V would normally be configured with a high percentage of SATA disks.

HDU Switched Loop Details


On the previous Universal Storage Platform model, all of the disks attached to a BED PCB were on eight native FC-AL Loops. A transfer to the last disk on a loop (could be 32 or 48 disks per loop) would have to pass through bypass logic on all of the disks in front of it. The Universal Storage Platform V introduces a switched-loop back-end, where all of the drives are still on Arbitrated Loops (all Fibre Channel disks use FC-AL, not FC-SW fabrics), but the switch logic bypasses all targets on that loop except for the one disk being addressed. This reduces the propagation time by some microseconds (not noticeable), but the primary effect of this change is improved fault detection and correction.
Each HDU has four such FSW switches, two on the front side (32 disks) and two more on the back (32 disks). Each pair of switches is either an upper or a lower switch as seen in Figure 25. Figure 26 is a zoomed view showing just the left half of the switch.
Figure 25. Full View of the Two Switches on the Front or Rear Half of an HDU
[Diagram: the upper and lower FSW PCBs serving one half of an HDU; each FSW contains port bypass control circuits and loop switches that connect the first and second BED pairs to the 32 HDDs, with links onward to the next HDU's FSWs.]

Figure 26. Closer View of the Left Half of the Upper and Lower Switches on an HDU

Universal Storage Platform V Configuration Overviews


The Universal Storage Platform V is not being offered in specific models as is the Universal Storage Platform, but as a basic model that is a starting point for configurations that meet a variety of customer requirements.

Small Configuration (2 FEDs, 2 BEDs)


The Universal Storage Platform V depicted here is comprised of 16 x 1064MB/sec Data paths, representing 17GB/sec of data bandwidth, and 64 x 150MB/sec Control or Metadata paths, representing 9.6GB/sec of control bandwidth.
Figure 27. Universal Storage Platform V, Small Configuration


This system is shown with:
- Up to 128 physical disks in the controller frame.
- Two Back-end Director features (BED-1, BED-2; 4 PCBs), consisting of 16 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 6.4GB/sec.
- One Cache Card feature (CMA-1; 2 PCBs) supporting up to 64GB of cache with a bandwidth of 17GB/sec.
- One Cache Switch feature (CSW-0; 2 PCBs) with a total internal data bandwidth of 17GB/sec.
- One Shared Memory feature (SMA-1; 2 PCBs) with a total metadata bandwidth of 9.6GB/sec.
- Two Front-End-Director features (FED-1, FED-2; 4 PCBs) supporting either 16 ESCON, 16 4Gb/sec FICON, 16 4Gb/sec Fibre Channel, or 32 4Gb/sec Fibre Channel ports.

Midsize Configuration (4FEDs, 4 BEDs)


The Universal Storage Platform V depicted here is comprised of 32 x 1064MB/sec Data paths, representing 34GB/sec of data bandwidth, and 128 x 150MB/sec Control or Metadata paths, representing 19.2GB/sec of control bandwidth.
Figure 28. Universal Storage Platform V, Enhanced Configuration


This system is shown with:
- Up to 640 physical disks, with 256 in the R1 frame, 256 in the R2 expansion frame, and 128 in the Control Frame.
- Four Back-end Director features (BED-1, BED-2, BED-3, BED-4; 8 PCBs), consisting of 32 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 12.8GB/sec.
- Two Cache Card features (CMA-1, CMA-2; 4 PCBs) supporting up to 128GB of cache with a bandwidth of 34GB/sec.
- Two Cache Switch features (CSW-1, CSW-2; 4 PCBs) with a total internal data bandwidth of 34GB/sec.
- One Shared Memory feature (SMA-1; 2 PCBs) with a total metadata bandwidth of 19.2GB/sec.
- Four Front-End-Director features (FED-1, FED-2, FED-3, FED-4; 8 PCBs) supporting either 32 ESCON, 32 4Gb/sec FICON, 32 4Gb/sec Fibre Channel, or 64 4Gb/sec Fibre Channel ports (more if fewer BED features are installed).

Large Configuration (8 FEDs, 8 BEDs)


The Universal Storage Platform V depicted here is comprised of 64 x 1064MB/sec Data paths, representing 68GB/sec of data bandwidth, and 256 x 150MB/sec Control or Metadata paths, representing 38.4GB/sec of control bandwidth.
Figure 29. Universal Storage Platform V, Maximum Configuration


This system is shown with:
- Up to 1152 physical disks, 128 in the control frame and 256 in each of the R1, R2, L1, and L2 expansion frames.
- Eight Back-end Director features (16 PCBs), consisting of 64 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 25.6GB/sec.
- Four Cache Card features (8 PCBs) supporting up to 256GB of cache with a bandwidth of 68GB/sec.
- Four Cache Switch features (8 PCBs) with a total internal data bandwidth of 68GB/sec.
- Two Shared Memory features (4 PCBs) with a total metadata bandwidth of 38.4GB/sec.
- Eight Front-End-Director features (16 PCBs) supporting either 64 ESCON, 64 4Gb/sec FICON, 64 4Gb/sec Fibre Channel, or 128 4Gb/sec Fibre Channel ports (up to 224 in 14 features if only 2 BED features are installed).

Provisioning - Managing Storage Volumes


This section will begin with a review of the techniques typically used to manage storage volumes. Some of these methods are host-based, while others use existing HDS Enterprise storage system capabilities such as LUSE or Concatenated Parity Groups. The end of this section will examine the overall details of the new Hitachi Dynamic Provisioning volume management feature and explain how it works.

Traditional Host-based Volume Management


There are two methods typically used to organize storage space on a server today. One method would be the direct use of physical volumes as devices, either as raw space or as file systems (local or clustered). These are fixed-size volumes with a small number of disks; as such, each volume has a certain inherent physical random IOPS or sequential throughput (MB/s) capacity. A system administrator must carefully manage the aggregate server workloads against these individual volumes. This is shown in Figure 30. As workloads exceed a volume's available space or its IOPS capacity, the user contents must be manually moved onto a larger or faster (more spindles) volume if possible.
Figure 30. Use of Individual Volumes on a Host

The alternative method to the use of individual volumes is to use a host-based Logical Volume Manager (LVM) when the planned workloads require either more space or IOPS capacity than the individual physical volumes can provide. The LVM can create a single logical (virtual) volume from two or more independent physical volumes. This is shown in Figure 31.


Figure 31. Use of LUNs in Host-managed Logical Volumes.

When such a logical volume runs out of space or IOPS capacity, one could replace it with one that was created with even more physical volumes and then copy all of the user data. In some cases it is best to create another logical volume and then manually relocate just part of the existing data in order to redistribute the overall workload across two such logical volumes. These two logical volumes would probably be mapped to the server using separate host paths as well. However, this manual intervention could become a costly and tedious exercise as workloads grow over time or as new ones are added to the mix. The problem becomes more complex as new servers are added to the scenario. This requires a careful and accurate ongoing documentation effort of all of these elements and their usage by the team of system administrators.

Traditional Universal Storage Platform Storage-based Volume Management

In order to create volumes on a Hitachi enterprise storage system, an administrator will first create Parity Groups, each from one or two of the 4-disk Array Groups. These Parity Groups could be any mix of five types of RAID levels, to include: RAID-10 (2D+2D), RAID-5 (3D+1P, 7D+1P), and RAID-6 (6D+2P). The next step would be to create volumes (LDEVs) of the desired size from these individual Parity Groups, using OPEN-V (or OPEN-x, but this isn't recommended) emulation on Open Systems and 3390-x emulation for a Mainframe. These Standard LDEVs will then be individually mapped to one or more host ports as a LUN (an LDEV on a Mainframe). This standard method is shown below in Figure 32.
Figure 32. Standard Volumes
[Diagram: four independent Array Groups, each providing one LDEV that is mapped to a host port as a LUN.]

In order to change the size of a volume (LDEV) already in use, one must first create a new one (if possible) that is larger than the old one, and then move the contents of the current volume onto the new one. The new volume would then be remapped on the server to take the mount point of the old one, which is then retired. This could be accomplished either manually or, in a more automated fashion on the Universal Storage Platform and the Universal Storage Platform V, with the use of the optional Hitachi Tiered Storage Manager software product.

On the previous Hitachi Lightning product, the volume sizes were fixed, and limited to a small number of OPEN-x choices. If you needed something bigger than the largest emulation size (OPEN-L, at 33GB), the storage system-based solution was to use the LUSE (Logical Unit Size Expansion) LDEV concatenation feature (Figure 33). Note that as LUSE is a concatenation of 2 to 36 LDEVs (up to a 60TB maximum size), it often operates with just the IOPS power of those disks under a single LDEV (either 4 or 8 disks). On the other hand, a host-based RAID-0 striped LVM volume will use the power of all of the disks of the logical volume all of the time.
Figure 33. LUSE Volume

On a Hitachi Universal Storage Platform product, with its OPEN-V emulation, a single LDEV could be as large as the entire Parity Group (up to 2.99TB for internal storage and 4TB for external storage). However, a single Parity Group might not have sufficient IOPS power (disks) to support the IOPS needed by the workload on a single volume. You could build a concatenated LUSE volume as with the Lightning, but the use of the Universal Storage Platform's large-volume facility known as Concatenated Parity Groups is preferred. This feature allows you to create a logical volume that is striped across 2 or 4 RAID-5 (7D+1P) Parity Groups.
The individual RAID stripes from those member Parity Groups are interleaved rather than the LDEVs being concatenated as the name implies. Thus, one could have 2x or 4x the amount of IOPS (not capacity) available per volume as compared to a standard volume. Note that a host-based LVM uses block striping (usually 64KB) whereas a Concatenated Parity Group uses RAID-5 (7D+1P) stripe interleaving of 3,584MB. See Appendix 3 for a detailed discussion of RAID levels, stripe sizes (chunks), and stripe widths. See Appendix 16 for a more detailed discussion of Concatenated Parity Groups.
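The contrast between fine-grained LVM striping and the coarse interleave described above can be pictured with a small sketch (illustrative only; the 64KB stripe and the 3,584MB interleave figures are the ones cited in the text, and the helper name is hypothetical):

def member_for_offset(offset_bytes: int, members: int, interleave_bytes: int) -> int:
    """Which member volume or Parity Group a logical byte offset lands on,
    for a simple round-robin interleave of the given width."""
    return (offset_bytes // interleave_bytes) % members

KB, MB = 1024, 1024 * 1024
# A host-based LVM with a 64KB stripe rotates members every 64KB:
print([member_for_offset(i * 64 * KB, 4, 64 * KB) for i in range(6)])   # [0, 1, 2, 3, 0, 1]
# A much coarser interleave keeps long address runs on one Parity Group:
print(member_for_offset(10 * MB, 4, 3584 * MB))                          # 0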
Figure 34. Concatenated Parity Group Striped Volume

All of these techniques can be awkward and time-consuming to configure. As such, there is a lot of potential for error in a busy IT shop. Now there is a new and simpler alternative to the older methods. The Hitachi Dynamic Provisioning feature (HDP) allows for new volume management techniques that provide a greater degree of freedom of storage management for administrators.


Dynamic Provisioning Volume Management


The new Hitachi Dynamic Provisioning feature allows for the creation of one or more Hitachi Dynamic Provisioning Pools of physical space (multiple LDEVs from multiple Parity Groups of any RAID level) to support the creation of virtual volumes (DP-VOLs). There are three distinct and unique capabilities that become available to the user with this feature. These are:
- Thin provisioning volumes - a feature where physical space is only allocated to a DP-VOL as it is written to over time. The DP-VOL may be specified to be any size up to 4TB when created, and this will be the size the host sees when scanning the port for this new volume. The DP-VOL is allocated physical space in 42MB pages from a Pool as the host writes to this volume. This can be a good method to manage lots of small user accounts while avoiding a dedicated large, fixed pool of space for each account up front.
- High performance volumes - the ability to create Pools with the RAID level desired and the number of disks needed to support the desired target IOPS rates of all of the DP-VOLs connected to that Pool. With the use of the Concatenated Parity Group feature, one could create volumes that spanned either 16 or 32 disks, but they had to be RAID-5 (7D+1P). Now, a wide variety of choices exists for establishing the physical space under DP-VOLs. One might create a Log Pool of 4 LDEVs taken from four RAID-10 (2D+2D) Parity Groups (one large LDEV each) and use this Pool for DP-VOLs assigned to database logs or temp space. One might also then create four Database Pools, each with 8 LDEVs from eight RAID-5 (7D+1P) Parity Groups, and use these for the database files that contain user data. The Log Pool was configured to accept high levels of large-block sequential write workloads, whereas the Database Pools are configured to support high levels of random small-block OLTP workloads.
- Management simplification - though related to and embedded within both of the above topics, this requires some special mention as a feature in and of itself. Because of this new ability to create large and/or high performance volumes, a considerable amount of often complicated host-based storage management can be avoided. Rather than use a Logical Volume Manager on each server to organize volume space (using 2 or more LDEVs as previously described), in many cases one can now rely on just the Hitachi Dynamic Provisioning feature for this function. None of the servers would require the extra management of using an LVM.
One example of simplification would be the Log Pool example from above. Another example would be the ability to manage users and space quite easily in a Microsoft Exchange Server environment. Since only four Storage Groups are available in Exchange 2003 (i.e. four volumes), it is necessary to carefully balance the user types (regular or Blackberry) among these four volumes because of their significant workload differences and the IOPS limitations of a Standard volume from a 4-disk or 8-disk Parity Group. In addition to the use of one Standard LDEV of RAID-10 for the logs, there could be four DP-VOLs created from one or two Pools for these four Storage Groups. A storage administrator is now in control of designing the desired physical IOPS rates available per Storage Group. As such, Exchange users may be added to any Storage Group in a simple round-robin method without being concerned with what type of user they may be, how many IOPS they may consume, and how much space they may need in the future.


The Figure below is an overview of a single Pool environment used for discussion purposes in the following sections. See Appendices 9-11 for additional detailed views.
Figure 35. An Hitachi Dynamic Provisioning Pool Environment

Usage Overview
In order to use Dynamic Provisioning volumes, there are a few extra steps to take when creating them. One still configures various Parity Groups to the desired RAID level (see D), and then creates one or more OPEN-V volumes (LDEVs) on each of them. The first new step in setting up an Hitachi Dynamic Provisioning environment would be to create one or more Hitachi Dynamic Provisioning Pools (see C) that are each a collection of selected LDEVs (Hitachi Dynamic Provisioning Pool Volumes). The second step would be to create one or more V-VOL Groups (see B), which are administrative containers for virtual volumes (used as pseudo Parity Groups). The third step is to create one or more Hitachi Dynamic Provisioning Volumes (DP-VOLs), each assigned to one V-VOL Group, and each connected to a single Pool. The DP-VOL is then mapped to a host port as a LUN (see A) with a host server emulation mode and used just like any Standard LDEV.

Hitachi Dynamic Provisioning Pools


Each Hitachi Dynamic Provisioning Pool (up to 128 Pools per Universal Storage Platform V system) provides the physical space for all of the DP-VOLs which are connected to that Pool. Each Hitachi Dynamic Provisioning Pool is given a name which is used when connecting newly defined DP-VOLs. The LDEVs assigned to a Pool must be of type OPEN-V. The size of each LDEV must be at least 8GB, but no more than 4TB. The size used for LDEVs should be a multiple of 42MB, as Pool space is allocated and managed in that size. Once assigned to a Pool, these LDEVs are called Pool Volumes. A Pool's size must fall between 8GB (1 Pool Volume, minimum size) and 1.1PB in size, although the aggregate size of all

Pools (and all non-Hitachi Dynamic Provisioning volumes) on a Universal Storage Platform V must be less than 1.1PB.
Figure 36. Hitachi Dynamic Provisioning Pool and DPVOL Limits

    HDP Limits
    Maximum number of Pools                128
    Maximum Pool capacity                  1.1PB
    Maximum capacity of all Pools          1.1PB
    Pool Allocation Warning Thresholds
        fixed                              80%
        user adjustable                    5-95%
    Emulation                              OPEN-V
    RAID Level                             any
    Number of LDEVs per Pool               2-1024
    Maximum LDEV size                      2.99TB internal / 4TB external
    Minimum LDEV size                      8GB
    Maximum DP Volumes per system          65,536
    Maximum DP Volumes per Pool            8,192
    DP Volume size range                   46MB-4TB

The Pool Volume sizes are important in that they control the striping rate in the Free Page List (at the Optimize step in Pool setup). If all Pool Volumes are the same size, and if they come from the same type of Parity Groups (i.e. 2D+2D or 7D+1P), then the Pool will present an even distribution of I/O across the AGs in that Pool to all of its connected DP-VOLs. The actual size (if uniform for all of the LDEVs) does not matter except that it will determine how soon the Pool runs out of initial physical space. You must scale your usage expectations against how big the Pool needs to be when it first starts out. Additional Pool Volumes may be added to a Pool when its space begins to run low (explained in Pool Expansion below), with an overall limit of 1024 LDEVs per Pool.
You should use the same disk type and RAID level for every Pool Volume within a Pool. Additionally, all of the LDEVs from a Parity Group should be assigned to the same Pool. It is recommended that there be 32 or more disks per Pool in order to have an adequate disk IOPS pool. This could be four 8-disk Parity Groups (such as 4D+4D, with four LDEVs each) or eight 4-disk Parity Groups (such as RAID-5 3D+1P).
There are various restrictions on the LDEVs used for Pools. These restrictions include:
- There can be a maximum of 1024 Pool Volumes assigned to a single Hitachi Dynamic Provisioning Pool.
- Pool Volumes (LDEVs) must come from internal disks or external storage on the same Universal Storage Platform V (you can't create Pools across systems).
- Pool Volumes must be of the OPEN-V emulation type.
- A Pool Volume is an LDEV, but it is a normal LDEV created from (or allocated from) an internal RAID Group or external storage, so its LDKC:CU:LDEV address is assigned before the volume is defined as an Hitachi Dynamic Provisioning pool volume.
- A LUSE volume cannot be used as a Pool Volume.
- If used, all Parity Groups that are members of a Pool must be assigned to the same cache partition (CLPR). Hence, all associated DP-VOLs will also be assigned to the same VPM cache partition. However, the Parity Groups for other Pools may be assigned to different cache partitions.
- Once an LDEV becomes a Pool Volume (assigned to a given Pool), there are a wide variety of restrictions that are invoked in terms of what you can and cannot do with it, particularly in terms of replication functions. However, the restrictions that apply to Pool Volumes do not encompass the DP-VOLs, which have very few extra restrictions over traditional LDEVs.
- You cannot remove a Pool Volume from an Hitachi Dynamic Provisioning Pool without deleting the entire Pool (first disconnecting all of its DP-VOLs) from the Universal Storage Platform V system.
- An LDEV may not be uninstalled from the system if it is a Pool Volume.
- Each LDEV must be RAID formatted (scrubbed with zeros) before being assigned to a Pool (you can't format a Pool).

V-VOL Groups and DP Volumes


There are both V-VOL Groups (virtual volume containers) and their DP-VOLs (virtual volumes, also known as V-VOLs). The DP-VOLs are seen as real volumes by the hosts. There may be up to 8,192 DP-VOLs created per Pool. DP-VOLs share the Universal Storage Platform V system-wide addressability limit of 65,536 volumes with external volumes, ShadowImage volumes, Universal Replicator volumes, and Pool Volumes. The size of each DP-VOL may be between 46MB and 4TB, depending on the size of its V-VOL container.
Once a Pool has been created, one or more V-VOL Groups are then configured. A V-VOL Group is a virtual substitution for a Parity Group. When a V-VOL Group is created and connected to an Hitachi Dynamic Provisioning Pool (a V-VOL Group cannot span Pools), a user-selectable Group size between 46MB and 4TB is specified. The next step is to create one virtual DP volume within the V-VOL Group container. The DP-VOL is then mapped to a host port as a LUN (just like a standard LDEV). The host will then detect the LUN's size as the virtual size specified for that V-VOL Group. Figure 37 shows two DP volumes, two V-VOL Group containers, the associated Pool with 1152GB of capacity, and the four Parity Group LDEVs assigned as Pool Volumes (288GB each) to that Pool.


Figure 37. Hitachi Dynamic Provisioning: Association of a DPVOL to Parity Groups

[Diagram: a 500GB DP-VOL in a 500GB V-VOL Group and a 1000GB DP-VOL in a 1TB V-VOL Group, both drawing pages from one HDP physical Pool built from four 288GB Pool Volume LDEVs, one from each of four 2D+2D RAID Groups.]

The physical space assigned as needed from the Pool to a V-VOL Group container is evenly distributed across the Pool Volumes by the mechanism of the Pool's Free Page List. As a server writes to a DP-VOL, and new pages are required, these are assigned to the V-VOL Group for use by that DP-VOL. Note that this assignment is from the next available page in the Free Page List, not from just anywhere in the Free Page List. It is random in the sense that you do not know from which Pool Volume the next page for an individual DP-VOL will be mapped.
However, the server sees each DP-VOL as having the complete SCSI Logical Block Address (LBA) range of 512-byte physical blocks that would match the logical size that was specified when that DP-VOL was created. This is what thin provisioning is all about: space allocated as needed. A DP-VOL is assigned 42MB pages of physical space from the Pool over time as the server writes to blocks in that volume.
Once assigned to a V-VOL Group, a page is mapped into the associated DP volume Page Table. This is a table of pointers matching up the host's virtual LBA range and physical 512-byte blocks with that DP volume's assigned pages. The Page Table presents a contiguous space of virtual and assigned 42MB segments to that volume, starting at the beginning of that DP-VOL's address space (byte 0). A DP-VOL whose size is not a whole number of 42MB pages can acquire a page that will be only partially used at the highest address range for that DP-VOL. The remainder of that page is unused (but this is a very small amount of space).
If a DPVOL was specified to be a 100GB volume, that would represent 209,715,200 such SCSI blocks
sittingon2,43942MBPoolpages.EachassignedPoolpagewillcontainaspecific(predetermined)range
ofconsecutive86,016512byteblocks.Asillustratedbelow,onepageinaDPVOLwillcontainLBA0to
LBA86015,andsomeotherpagewillcontainLBA86016toLBA172031.Whetheraphysicalpagehas
beenassignedisdependentonwhetheranyLBAwithinthatfixedrangehasbeenwrittentobythehost.
For example, if the server wrote 8KB file system blocks to a new DPVOL at LBA locations 0, 86016,
RESTRICTEDCONFIDENTIAL

Page49

172032,258048,and344064thenthatDPVOLwouldhavefivenewPoolpagesassigned.Iftheserver
next wrote to LBA locations 16, 86032, 172048, 258064, and 344080 (or such) then those same five
pageswouldbeused.
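To make the page arithmetic concrete, the short sketch below recomputes these figures (42MB pages of
86,016 512-byte blocks each). The function names are illustrative only and are not part of any HDS tool.

```python
# Recomputing the Pool page arithmetic described above (illustrative sketch only).
PAGE_BYTES = 42 * 1024 * 1024
BLOCK_BYTES = 512
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES          # 86,016 blocks per 42MB Pool page

def pages_for_dpvol(size_gb):
    """42MB Pool pages needed to fully back a DP-VOL of size_gb (binary GB)."""
    blocks = size_gb * 1024**3 // BLOCK_BYTES
    return -(-blocks // BLOCKS_PER_PAGE)             # ceiling division

def page_index_for_lba(lba):
    """Which Pool page (0-based) covers a given 512-byte LBA of the DP-VOL."""
    return lba // BLOCKS_PER_PAGE

print(pages_for_dpvol(100))       # 2439 pages for a 100GB DP-VOL
print(page_index_for_lba(0))      # 0 -> LBA 0..86015 live on the first page
print(page_index_for_lba(86016))  # 1 -> LBA 86016..172031 live on the second page
```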
Figure 38. Illustration of Pool Page to a Volume's SCSI LBA Range

The number of Pool pages initially assigned to a DP-VOL by file system formatting will depend upon the
operating system, the file system, and the size of the volume. Once the server formats a volume, the
Pool will have assigned some number of physical pages to it. The amount of the physical space initially
required by this formatting operation is determined by the amount of metadata the file system initially
writes to that volume, along with the distribution (stride) of these blocks of metadata across the entire
volume. Hence, the number of pages initially assigned to a volume by the formatting operation will vary.
See Appendix 13 for a table of typical file system usage.

If, for example, a DP-VOL is formatted by a file system, and the file system's block size is 8KB, there will
be 5,376 file system blocks per page (each 8KB block consuming 16 LBA blocks). The first time one of
these blocks from a DP-VOL virtual page is written to, the following sequence occurs:
1. The next available free Pool page will be assigned from the Pool's Free Page List.
2. That page will be mapped into that DP-VOL's Page Table and associated with a certain LBA range.
3. The new block(s) will be written to that page.
Usage Mechanisms
If a host Read is posted to an area of the DP-VOL that does not yet have physical space (no blocks have
been written there yet), the Hitachi Dynamic Provisioning Dynamic Mapping Table will direct the read to
a special Null Page. New Pool pages to cover that region will not be allocated from the Free Page List
since it was a Read.

If a Write is posted to an area of the DP-VOL which does not yet have physical space, the DP-VOL requests
a new page and assigns it to that LBA range in the Page Table.

If a Write is posted to an area of the DP-VOL which does not yet have physical space, and the Pool has
completely run out of space, the host will get a Write Protect error. During this out-of-space
condition, all Read operations specifying an LBA region of a DP-VOL without an assigned page will now
return an Illegal Request error rather than the Null Page. These error conditions will be cleared once
additional physical Pool Volume space has been added to that Pool or some DP-VOLs are formatted and
disconnected from the Pool, thereby freeing up all of their previously allocated pages.
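As a minimal sketch of the behavior just described (the class, method, and exception names are invented
for illustration and are not the actual microcode structures), the allocate-on-write and out-of-space
paths can be modeled as follows:

```python
# Toy model of DP-VOL read/write handling against a shared Free Page List.
class PoolOutOfSpace(Exception):
    pass

class DpVol:
    def __init__(self, free_page_list):
        self.free_page_list = free_page_list   # shared by all DP-VOLs in the Pool
        self.page_table = {}                   # virtual page index -> Pool page id

    def read(self, page_index):
        if page_index in self.page_table:
            return self.page_table[page_index]
        if not self.free_page_list:
            raise PoolOutOfSpace("Illegal Request")  # unmapped read while the Pool is full
        return "NULL_PAGE"                           # unmapped read: Null Page, nothing allocated

    def write(self, page_index):
        if page_index not in self.page_table:
            if not self.free_page_list:
                raise PoolOutOfSpace("Write Protect")               # unmapped write, Pool exhausted
            self.page_table[page_index] = self.free_page_list.pop(0)  # take the next free page
        return self.page_table[page_index]

pool_free_pages = list(range(10))   # a tiny Pool with ten free pages
vol = DpVol(pool_free_pages)
vol.write(5)                        # first write to virtual page 5 allocates Pool page 0
print(vol.read(5), vol.read(6))     # 0 NULL_PAGE
```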


There will normally be several DP-VOLs connected to a single Hitachi Dynamic Provisioning Pool. As each
DP-VOL begins receiving writes from the servers, the number of Pool pages assigned to that DP-VOL will
grow. As this group of DP-VOLs consumes Pool pages over time, the Pool's Free Space Threshold can be
reached, and the storage administrator will receive an alert telling him to add additional Pool Volumes to
that Pool. The amount of physical space a DP-VOL can acquire over time will be limited by the DP-VOL size
(up to 4TB) specified when it was defined. The sum of all the virtual sizes of those DP-VOLs connected to a
Pool, rounded up to an integral number of pages, will be the maximum physical Pool size required over time.

DPVOL Features and Restrictions


There are various restrictions regarding DP-VOLs. These include:

- A DP-VOL may only be connected to one Pool.

- Pages are released from a DP-VOL when the association to the Hitachi Dynamic Provisioning Pool
  is deleted. Since these Pages were merely mapped from the Pool to the DP-VOL's local Page
  Table, these N freed Pages return to that Pool's Free Page List and they will be reassigned to
  DP-VOLs by the next N requests for Pages.

- Pages comprised of all binary zeros can be freed from a DP-VOL. A condition where some pages
  are all zeros is normal when a traditional volume is migrated as-is to a DP-VOL.

- There can be up to 8,192 DP-VOLs assigned to an Hitachi Dynamic Provisioning Pool.

- A DP-VOL can range from 46MB to 4TB in size.

- A DP-VOL can be increased in capacity (up to 4TB).

In terms of usage, a DP-VOL is pretty much like a standard LDEV, with a few restrictions:
1. A DP-VOL has a normal LDKC:CU:LDEV number, but this is in a special type of Parity Group
   (the V-VOL Group).
2. Using the usual HDS tools, you can use DRU to protect a DP-VOL.
3. ShadowImage pairs, TrueCopy pairs, HUR pairs, or CoW P-VOLs can be created from
   DP-VOLs.
4. A DP-VOL cannot use cache residency.
5. A DP-VOL also differs from a normal LDEV in that you can't resize or subdivide it using VLVI
   or glue them together with LUSE.

Pool Page Details


The physical Pool space is assigned over time as needed to the DP-VOLs that are connected to that Pool
through their V-VOL Group. Space is assigned from the Pool in 42MB allocation units called Pool pages.
All Pool pages are managed by the Hitachi Dynamic Provisioning software in a Dynamic Mapping Table
(DMT) in Shared Memory. All metadata for the Hitachi Dynamic Provisioning Pools and their DP-VOLs is
maintained in the DMT, which resides in the 12GB-14GB address space of Shared Memory. One type of
element in the DMT is a Free Page List that is maintained for each Pool for all of their 42MB pages.

The Pool's Free Page List is a table of pointers to all of the pages located on its set of Pool Volumes, as
well as some status information. When a DP-VOL needs a new physical page, it is assigned the next
available page from this Free Page List. The structure of the Free Page List consists of a matrix of 42MB
pages from all of the Pool Volumes, using intelligent algorithms that take all of the Pool factors into
consideration. The Free Page List is generated (or regenerated) when the Optimize button in Storage
Navigator is selected after adding LDEVs to a Pool (either initially or subsequently). The concept of
thin-provisioned capacity involves the Pool page capacity actually assigned to a DP-VOL, which grows over
time, and not the static, reported size of the volume from the host's perspective.

Note: If one does not use the Optimize button when appropriate, a new Free Page List will be
dynamically created over time. However, this operation is much faster when done purposefully with the
Optimize button.
Each DP-VOL has a Page Table for keeping track of pointers to its assigned Pool pages. This is a list of all
assigned Pool pages and where to find them on the Pool Volumes. Each Page Table is also a lookup table
of assigned physical pages to the server's virtual address space (the LBA range of all the 512-byte blocks
in that volume). Note that once a Pool page is assigned to a DP-VOL, it remains there regardless of the
state of the data written to blocks on that page. If all files that happen to reside on that page are deleted
by the host file system, leaving the page empty with a lot of unneeded ghost data, it doesn't matter.
As described previously, each page, once assigned to the volume, becomes a piece of the physical
space for a specific LBA range for that volume. Once allocated and containing any non-binary-zero data,
the page assignment is permanent until that DP-VOL is deleted.

Note: When using Tiered Storage Manager, once a DP-VOL is moved from one Pool into
another, the original Pool pages are automatically reformatted and released back to the Pool.
The Optimize button has no effect except after adding new Pool Volumes or deleting DP-VOLs.

There are some housekeeping pages from each Pool that are not available for assignment to DP-VOLs.
There is a small 84MB system region (2 pages) per Pool for each assigned Pool Volume. There is also a
small 4GB (98-page) region in every Hitachi Dynamic Provisioning Pool that is reserved by the system to
use for a mirror copy of the DMT in Shared Memory. For Universal Storage Platform V release G01, with
the setting of a flag in the SVP, the DMT is copied to the SVP's internal disk on a system power down
(PS-OFF). For G02, the DMT is mirrored to this reserved 4GB region in the first Hitachi Dynamic
Provisioning Pool currently defined in the system.

There is a simple physical structure for Pool pages. An individual 42MB page is laid down on a whole
number of consecutive RAID stripes from a single Pool Volume. The table below illustrates the
relationship among different RAID levels, their various RAID stripe sizes, and the assignment of
consecutive stripes to a single page.
Table 12. Number of RAID Stripes per 42MB Pool Page

    RAID Type   RAID Level   Data Disks   RAID Stripe Size (KB)   RAID Stripes per HDP Page
    RAID-10     2+2          2            1024                    42
    RAID-5      3+1          3            1536                    28
    RAID-5      7+1          7            3584                    12
    RAID-6      6+2          6            3072                    14

Take, for example, the case of an Hitachi Dynamic Provisioning Pool that is composed of several RAID-5
(7D+1P) Pool Volumes (LDEVs). Each LDEV has a formatted RAID stripe width of 3,584KB. This comes
from two 256KB consecutive chunks per disk, with (effectively) seven data disks in the stripe
((256 + 256) x 7). [See Appendix 2 for the various Hitachi RAID level details.] Therefore, twelve such RAID
stripes from a single RAID-5 (7D+1P) LDEV will be used to create a single 42MB Pool page. In the case of a
Pool consisting of RAID-10 (2D+2D) Parity Groups, each LDEV will provide 42 consecutive stripes for each
42MB Pool page.
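The stripes-per-page relationship in Table 12 can be rechecked with a few lines of arithmetic; this is just a
reproduction of the table, assuming the 512KB effective OPEN-V chunk size described in Appendix 2.

```python
# Recomputing Table 12: RAID stripes per 42MB Pool page.
PAGE_KB = 42 * 1024            # 43,008KB per Pool page
EFFECTIVE_CHUNK_KB = 512       # two adjacent 256KB OPEN-V chunks per disk

for raid, data_disks in [("RAID-10 2D+2D", 2), ("RAID-5 3D+1P", 3),
                         ("RAID-5 7D+1P", 7), ("RAID-6 6D+2P", 6)]:
    stripe_kb = data_disks * EFFECTIVE_CHUNK_KB
    print(f"{raid}: {stripe_kb}KB stripe, {PAGE_KB // stripe_kb} stripes per page")
# RAID-10 2D+2D: 1024KB stripe, 42 stripes per page
# RAID-5 3D+1P:  1536KB stripe, 28 stripes per page
# RAID-5 7D+1P:  3584KB stripe, 12 stripes per page
# RAID-6 6D+2P:  3072KB stripe, 14 stripes per page
```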
Pool Expansion
As pages in the Pool are assigned to DP-VOLs over time and the number of free pages drops below a
certain threshold of the initial Pool capacity, more LDEVs will have to be added to that Pool. Once new
LDEV(s) are added to the Pool and the Optimize button in Storage Navigator is used, the Free Page List
will once again be optimized. The existing Free Page List will be extended using the new pages available
from the additional Pool Volumes. When new Pool Volumes are added to a Pool once the remaining
free page count is low, the HDS recommendation is to add another set of LDEVs that match the initial
set (same RAID level, same disk RPM). This will ensure that the DP-VOL performance remains on par with
what was previously experienced for that Pool.

For example, a Pool that started with six LDEVs will have a certain performance due to the number of
disks and the RAID level used (write penalty). If six RAID-10 (2D+2D) Parity Groups were used for these
six LDEVs, that Pool would have started out with 24 disks of IOPS power. Each individual page would
have the IOPS power of the underlying LDEV (such as 2D+2D), but on average the set of pages mapped
to a DP-VOL would include pages from all Pool Volumes, thus delivering the full 24-disk IOPS power.

Suppose that the initial set of free pages (from the six-LDEV Pool) becomes depleted (below 20% of the
Pool) and two new LDEVs are then added to the Pool (each from different RAID-10 2D+2D Parity
Groups). Once the last 20% of original Pool pages are assigned from the Pool's Free Page List, the new
pages will exhibit the overall IOPS power of only 8 disks. There will be a noticeable drop from the IOPS
rates once available from the 24-disk Pool, as I/O to new pages is now hitting an 8-disk Pool.
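A rough sketch of the dilution effect, using the nominal 180 IOPS per 15k RPM disk rule of thumb from
Table 13 (illustrative, cache-miss, read-oriented figures only):

```python
# Back-of-the-envelope IOPS available to pages, before and after Pool expansion.
IOPS_PER_DISK = 180            # nominal 15k RPM rule of thumb (Table 13)
DISKS_PER_2D2D_GROUP = 4

def pool_page_iops(parity_groups):
    return parity_groups * DISKS_PER_2D2D_GROUP * IOPS_PER_DISK

print(pool_page_iops(6))   # 4320 -> pages spread across the original six 2D+2D groups
print(pool_page_iops(2))   # 1440 -> new pages that land only on the two added groups
```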
The following may be a way to describe this concept in a 10-second "techie spot":

- A new Pool is created from a set of LDEVs taken from like Parity Groups. The more Parity Groups
  in use, the higher the IOPS performance that is available to that Pool. This space is then
  "Optimized", where the Free Pool Page List (using 42MB pages) is created in the Hitachi Dynamic
  Provisioning region of Shared Memory. This list is built from 42MB pages across the Pool LDEVs.
  These free pages are then assigned on a first-come, first-served basis as needed to DP-VOLs
  connected to that Pool.

- When new LDEVs are added to the Pool once the space becomes depleted, the same process is
  followed. A complementary number of LDEVs from enough Parity Groups should be added to
  maintain the IOPS performance level of the Pool.
Miscellaneous Hitachi Dynamic Provisioning Details
The configuration of Hitachi Dynamic Provisioning Pools and DP Volumes is done via Storage
Navigator (or HiCommand Device Manager - HDvM). This is also how the physical space usage
per Pool is monitored. There are two Pool free space thresholds: a user-specified threshold
between 5%-95% page utilization and a fixed threshold at 80% utilization. Both of these
thresholds are used as the triggers for the low Pool space notifications.

If more than 4TB of space is required for a single host volume, then multiple DP-VOLs may be
used in the ordinary fashion by a logical volume manager on the server to create a single large
striped Logical Volume. Note that if these DP-VOLs come from different Pools, then the overall
IOPS capacity will increase. If all of the DP-VOLs used by a host LVM are from the same Pool, then
only the total usable capacity will increase.

The use of Hitachi Dynamic Provisioning will cause an increase in the port processor (FED MP)
utilization for some writes. There is a very small increase in utilization for individual reads or
writes to the DP-VOL area when physical pages are already allocated from the Pool for that
address space. When a write to a DP-VOL causes a physical page allocation from the Pool, there
is a brief utilization increase (in microseconds) on the associated FED MP for that operation.

The association of all of the array elements that must be configured is as follows:
Array Group > Parity Group > LDEVs (assign LDKC:CU:LDEV) > HDP Pool Volume > V-VOL
Group > DP-VOL (assign LDKC:CU:LDEV) > LUN.

This lengthy set of steps is similar to what is required when using a host-based volume manager
such as Veritas VxVM and an array that does not offer Hitachi Dynamic Provisioning. For
instance, if using an EMC DMX-4, one would have to follow these steps, which would have to be
repeated on every server attached to that array:

Disks > Hypervolumes > Metavolumes with a software RAID > LUNs (assign port:address) >
VM Disk (private region) > VM Disk (public region) > PLEX > Volume (LUN).

Hitachi Dynamic Provisioning and Program Products Compatibility


The two tables below show the application of Pool LDEVs and DP-VOLs to the various Hitachi Program
Products.

Program Product Compatibility                     Support
                                                  Pool Vol   DP-Vol
Cache Residency Manager                           n/a        no
Cache Residency Manager for z/OS                  n/a        n/a
Compatible Mirroring for IBM FlashCopy            n/a        n/a
Compatible PAV for z/OS                           n/a        n/a
Compatible Replication for IBM XRC                n/a        n/a
Copy-on-Write Snapshot - P-VOL                    n/a        yes
Copy-on-Write Snapshot - POOL                     n/a        n/a
Data Retention Utility                            n/a        yes
LUN Manager                                       n/a        yes
LUN Security                                      n/a        yes
Open Volume Management - CVS                      yes        yes
Open Volume Management - LUSE                     n/a        no
Performance Manager                               yes        yes
Server Priority Manager                           n/a        yes
ShadowImage                                       n/a        yes
ShadowImage for z/OS                              n/a        n/a
Storage Navigator                                 yes        yes



Program Product Compatibility                     Support
                                                  Pool Vol   DP-Vol
TrueCopy                                          n/a        yes
TrueCopy for z/OS                                 n/a        n/a
TrueCopy Async                                    n/a        no
TrueCopy Async for z/OS                           n/a        n/a
Universal Replicator - P-VOL                      n/a        yes
Universal Replicator - S-VOL                      n/a        yes
Universal Replicator - Journal VOL                n/a        n/a
Universal Replicator for z/OS                     n/a        n/a
Universal Volume Manager                          yes        n/a
Virtual LVI                                       n/a        yes
Virtual LVI z/OS                                  n/a        n/a
Virtual Partition Manager                         yes        yes
Volume Security                                   n/a        yes
Volume Security z/OS                              n/a        n/a
Volume Shredder                                   n/a        yes
Volume Migration                                  no         yes
Volume Retention Manager                          n/a        yes
Volume Retention Manager z/OS                     n/a        n/a

Universal Storage Platform V Volume Flexibility Example


If you have a customer using Exchange Server 2003, they are limited to the use of four Storage Groups
(LUNs). A host volume manager cannot be used with Exchange. Due to the way that the underlying Jet
database operates, one must normally use RAID-10 volumes to avoid the database mailbox insert
response time issues that are a well-known characteristic of Jet. Here are some alternative methods on a
Universal Storage Platform V to provide these four Storage Group LUNs to an Exchange server:

- STANDARD VOLUMES: If using RAID-10 (2D+2D), this method would provide 16 disks when
  using four standard LDEVs from four Parity Groups.

- LUSE VOLUMES: If using four LUSE volumes where each of them has four LDEVs from four
  RAID-10 (2D+2D) Parity Groups (thus 16 Parity Groups overall), this would provide 64 disks
  overall. However, you will normally only see the I/O power of one LDEV per LUSE volume (4 disks
  each, or 16 in parallel) due to the simple concatenation nature of LUSE (fill one LDEV, then start
  on the next one).

- STRIPED (CONCATENATED) PARITY GROUPS: If using this configuration with four VDEV
  volumes, where each Storage Group was located on one VDEV created across four RAID-5
  (7D+1P) Parity Groups, there would be 128 disks overall. However, due to the write latency
  sensitivity of Jet, one would normally not want to use a RAID-5 solution. Note that this storage
  system based interleaved (not concatenated) RAID stripe solution is not available from other
  vendors.

- HDP #1: If using Dynamic Provisioning for the Exchange Storage Groups, there could be 12
  LDEVs from 12 RAID-10 (2D+2D) Parity Groups put into a Pool (or 48 disks in all), and then create
  four DP-VOLs connected to this Pool. Each DP-VOL would be used as an Exchange Storage Group
  over a dedicated 4Gbit host path. This very large "native" IOPS solution is not available from any
  other vendor.

- HDP #2: If using Dynamic Provisioning, there could be 32 LDEVs from 32 RAID-10 (2D+2D) Parity
  Groups put into four Pools (128 disks in all), with four DP-VOLs created per Pool. Each DP-VOL
  would be used as an Exchange Storage Group, and four such Exchange Servers could be
  supported by this configuration. This huge "native" IOPS solution is not available from any other
  vendor. However, note that 4 paths from each of the four servers would likely be needed to
  effectively use this huge reservoir of random IOPS capacity.

Storage Concepts
Understand Your Customer's Environment
Before recommending a storage design for customers, it is important to know how the product will
address the customer's specific business needs. The factors that must be taken into consideration
include capacity, performance, reliability, features, and cost with respect to the storage infrastructure
component. The more data you have, the easier it is to architect a solution as opposed to just selling
another storage unit. The types of storage configured, including the disk types, the number of RAID
Groups and their RAID levels, as well as the number and type of host paths, are important to the solution.

Disk Types
Table 13 below lists the disk types available in the Universal Storage Platform V storage arrays. Note
that the table lists the true sizes typically seen by a host after formatting. The additional formatting
applied by host-based Logical Volume Managers and file systems will consume additional space.
Table 13. List of Disk Types and Characteristics, Along with Nominal Rule-of-Thumb IOPS Limits

    HDD Type              Port Speed   Advertised   Apparent    Usable      Nominal Random
                                       Size (GB)    Size (GB)   Size (GB)   IOPS
    1TB SATA 7.2k RPM     3Gb/sec      1000         976         927.2       80
    750GB SATA 7.2k RPM   3Gb/sec      750          738.62      701.7       80
    450GB FC 15k RPM      4Gb/sec      450          432         410.4       180
    400GB FC 10k RPM      4Gb/sec      400          384         364.8       130
    300GB FC 15k RPM      4Gb/sec      300          288.2       273.8       180
    146GB Flash           4Gb/sec      146          143.76      136.6       5000 read / 1500 write
    146GB FC 15k RPM      4Gb/sec      146          143.76      136.6       180
    72GB Flash            4Gb/sec      72           71.5        67.9        5000 read / 1500 write
    72GB FC 15k RPM       4Gb/sec      72           71.5        67.9        180

The table above illustrates the advertised size (base 10), the apparent size in the tools (still base 10),
and the typical usable (after Parity Group formatting) size of each type of disk. Also shown is the
rule-of-thumb average maximum random 8KB block IOPS rate for each type of disk when using most of the
disk surface. Also shown are the nominal rates expected for Flash Drives when used as members of a
RAID-5 (7D+1P) group across two BED pairs (four boards, 8 FC-AL loop pairs).

Every disk has a maximum random small-block IOPS rate determined by seek times (head movement
across the disk) and rotational delays (disk RPM). The number and size of the LUNs created per RAID
Group determine how much of a disk's surface is in active use. When using the full disk surface, any
expectations for higher sustained levels of IOPS per disk must be met by sustained cache hit rates on the
storage array (avoiding physical reads from disk). Note that 10-20% is a typical cache hit rate on Open
Systems servers.

As more of the disk surface is allocated to LUNs for a RAID Group, the farther the disk heads must seek.
This creates higher average seek times. For example (using a 15k RPM disk), when only using the outer
25% of the surface the average read seek time can be around 1.8ms (providing about 267 IOPS),
whereas the 100% surface average seek time will be about 3.8ms (about 182 IOPS). For a SATA disk at
7200 RPM, these values are about 4.3ms (119 IOPS) when using 25% of the disk surface and 8.5ms (79
IOPS) when using the full disk surface. All write rates are about 5% lower than the read rates due to the
higher seek precision required. See Appendix 14 for more details on theoretical estimates of disk IOPS
rates.
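These rule-of-thumb figures come straight from access-time arithmetic: IOPS is roughly 1000ms divided by
the sum of the average seek time and the average rotational delay (half a revolution). A quick check,
where the small differences from the quoted numbers are rounding in the source tables:

```python
# IOPS ~= 1000ms / (average seek + average rotational delay).
def disk_iops(avg_seek_ms, rpm):
    rotational_delay_ms = (60_000 / rpm) / 2   # half a revolution on average
    return 1000 / (avg_seek_ms + rotational_delay_ms)

print(round(disk_iops(1.8, 15_000)))  # ~263 (15k RPM, outer 25% of surface; text quotes ~267)
print(round(disk_iops(3.5, 15_000)))  # ~182 (15k RPM, full surface)
print(round(disk_iops(4.3, 7_200)))   # ~118 (7.2k SATA, outer 25%; text quotes ~119)
print(round(disk_iops(8.5, 7_200)))   # ~79  (7.2k SATA, full surface)
```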

RAID Levels
The Universal Storage Platform V array currently supports RAID levels 5, 6, and 10. RAID-5 is the most
space-efficient of these three RAID levels.

- RAID-5 is a group of disks (RAID Group or Parity Group) with the space of one disk used for the
  rotating parity chunk per RAID stripe (row of chunks across the set of disks). If using a 7D+1P
  configuration (7 data disks, 1 parity disk), then you get 87.5% capacity utilization for user data
  blocks out of that RAID Group.

- RAID-6 is RAID-5 with a second parity disk for a second unique parity block. The second parity
  block includes all of the data chunks plus the first parity chunk for that row. This would be
  indicated as a 6D+2P construction (75% capacity utilization) if using 8 disks.

- RAID-10 is a mirroring and striping mechanism. First, individual pairs of disks are placed into a
  mirror state. Then two of these pairs are used in a simple RAID-0 stripe. As there are four disks
  in the RAID Group, this would be represented as RAID-10 (2D+2D) and have 50% capacity
  utilization. RAID-10 is not the same as RAID-0+1, although sloppy usage by many would lead one
  to think this is the case. RAID-0+1 is a RAID-0 stripe of N disks mirrored to another RAID-0 stripe
  of N disks. This would also be shown as 2D+2D for a 4-disk construction. However, if one disk
  fails, that RAID-0 stripe also fails, and the mirror then fails, leaving a user with a single
  unprotected RAID-0 group. In the case of real RAID-10, one disk of each mirror pair would have
  to fail before getting into this same unprotected state.

The factors in determining which RAID level to use are cost, reliability, and performance. Table 14 shows
the major benefits and disadvantages of each RAID type. Each type provides its own unique set of
benefits, so a clear understanding of your customer's requirements is crucial in this decision.


Table 14. RAID Levels Comparison.

                      RAID-10                              RAID-5                            RAID-6
    Description       Data striping and mirroring          Data striping with                Data striping with two
                                                           distributed parity                distributed parities
    Number of Disks   4 / 8                                4 / 8                             8
    Benefit           Highest performance with data        The best balance of cost,         Balance of cost, with
                      redundancy; higher write IOPS per    reliability, and performance.     extreme emphasis on
                      Parity Group than with similar                                         reliability.
                      RAID-5.
    Disadvantages     Higher cost per number of            Performance penalty for a high    Performance penalty for
                      physical disks.                      percentage of random writes.      all writes.

Another characteristic of RAID is the idea of write penalty. Each type of RAID has a different back-end
physical disk I/O cost, determined by the mechanism of that RAID level. The table below illustrates the
trade-offs between the various RAID levels for write operations. There are additional physical disk reads
and writes for every application write due to the use of mirrors or XOR parity. Note that SATA disks are
usually deployed with RAID-6 to protect against a second drive failure within the Parity Group during the
lengthy disk rebuild of a failed drive. Also note that you must deploy many more SATA disks than FC
disks to be able to meet the same level of IOPS performance.
Table 15. RAID Write Penalties

                        HDD IOPS per   HDD IOPS per
                        Host Read      Host Write
    FC      RAID-10     1              2
    FC      RAID-5      1              4
    FC      RAID-6      1              6
    SATA    RAID-10     1              4
    SATA    RAID-5      1              6
    SATA    RAID-6      1              9

When using Fibre Channel disks, a write-verify operation is performed internally after each write to the
disk (for data or parity). This BED-issued command simply, by algorithmic means, verifies that it wrote
the data where it was told to write the data. This same command is not available for SATA HDDs. To
protect data, in the case of SATA disks, there are additional physical I/O operations per write in order to
compare what was just written to disk with the data and parity blocks held in cache. With RAID-6, there
are three such blocks: data, parity 1, and parity 2.


Table 16. Break Down of RAID Level Write Costs (Fibre Channel Disks)

               IOPS Consumed per Host I/O Request
    RAID-10    1 data write, 1 mirrored data write
    RAID-5     2 reads (1 data, 1 parity), 2 writes (1 data, 1 parity)
    RAID-6     3 reads (1 data, 2 parity), 3 writes (1 data, 2 parity)
    SATA writes: also read and compare the data and parity writes (mirror)
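As a quick way to apply Tables 15 and 16 when sizing back-end disk load, the sketch below converts a host
IOPS figure and read/write mix into physical disk IOPS. The per-write costs are taken from the tables
above; the SATA RAID-6 write cost of 9 follows from the three additional verify I/Os on SATA described
above, and the function itself is only an illustrative model (cache misses assumed).

```python
# Back-end (physical) disk IOPS implied by a host workload, per Tables 15 and 16.
WRITE_COST = {                 # physical disk I/Os per host write
    ("FC", "RAID-10"): 2, ("FC", "RAID-5"): 4, ("FC", "RAID-6"): 6,
    ("SATA", "RAID-10"): 4, ("SATA", "RAID-5"): 6, ("SATA", "RAID-6"): 9,
}

def backend_iops(host_iops, read_fraction, disk_type, raid_level):
    reads = host_iops * read_fraction                  # one disk I/O per host read (cache miss)
    writes = host_iops * (1 - read_fraction) * WRITE_COST[(disk_type, raid_level)]
    return reads + writes

# 1,000 host IOPS at 70% reads on FC RAID-5: 700 + 300 x 4 = 1,900 back-end disk IOPS
print(backend_iops(1000, 0.7, "FC", "RAID-5"))
```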

Parity Groups and Array Groups


On Hitachi enterprise products, the terms Parity Group and Array Group are used when configuring
storage. The Array Group is the set of four associated disks, while the Parity Group is the RAID level
applied to either 1 or 2 sets of disks. However, in general use in the field, the terms Array Group and
Parity Group are often used interchangeably.

RAID Chunks and Stripes


A Parity Group is a logical mechanism that has two basic elements: a virtual block size from each disk (a
chunk) and a row of chunks across the group (the RAID stripe). The chunk size is 256KB on the Universal
Storage Platform V, but two such chunks per disk are used at a time, so the effective chunk size is really
512KB. The stripe size is the sum of the chunk sizes across a RAID Group. This only counts the data
chunks and not any mirror or parity space. Therefore, on a RAID-6 group created as 6D+2P (eight disks),
the stripe size would be 3072KB (512KB x 6).

Note that some industry usage replaces chunk with stripe size, stripe depth, or interleave factor,
and stripe size with stripe width, row width, or row size.

Note that on all current RAID systems, the chunk is primarily a unit of protection management: either
the parity or mirror mechanism. I/O is not performed on a chunk basis, as is commonly thought. On
Open Systems, the entire space presented by a LUN is a contiguous span of 512-byte blocks, known as
the Logical Block Address (LBA) range. The host application makes I/O requests using some native
request size (such as a file system block size), and this is passed down to the storage as a unique I/O
request. The request has the starting address (of a 512-byte block) and a length (such as the file system
8KB block size). The storage array will locate that address within that LUN to a particular disk address,
and read or write only that amount of data, not that entire chunk. Also note that this request could
require two disks to satisfy if 2KB of the block lies on one chunk and 6KB on the next one in the stripe.

Because of the variations of file system formatting and such, there is no way to determine where a
particular block may lie on the raw space presented by a volume. A file system will create a variety of
metadata in a quantity and distribution pattern that is related to the size of that volume. Most file
systems also typically scatter writes around within the LBA range, an outdated holdover from long ago
when file systems wanted to avoid a common problem of the appearance of bad sectors or tracks on
disks. What this means is that attempting to align application block sizes with RAID chunk sizes is a
pointless exercise.
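A highly simplified sketch of the point about requests straddling chunks follows. It ignores parity rotation
and the array's real internal layout; it only shows how a small request whose address crosses an effective
512KB chunk boundary must touch two chunks, and therefore two disks.

```python
# Which effective 512KB chunks (and stripe positions) a host request touches.
CHUNK_BYTES = 512 * 1024
BLOCK_BYTES = 512

def chunks_touched(start_lba, length_bytes, data_disks):
    start = start_lba * BLOCK_BYTES
    end = start + length_bytes - 1
    first, last = start // CHUNK_BYTES, end // CHUNK_BYTES
    # (chunk number, data-disk position within the stripe) for each chunk touched
    return [(c, c % data_disks) for c in range(first, last + 1)]

# An 8KB request starting 2KB before a chunk boundary spans two chunks/disks,
# matching the 2KB + 6KB split described above.
print(chunks_touched(start_lba=1020, length_bytes=8192, data_disks=7))
```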
The one alignment issue that should be noted is in the case of host-based Logical Volume Managers.
These also have a native stripe size that is selectable when creating a logical volume from several
physical storage LUNs. In this case, the LVM stripe size should be a multiple of the RAID chunk size due
to various interactions between the LVM and the LUNs. One such example is the case of large-block
sequential I/O. If the LVM stripe size is equal to the RAID chunk size, then a series of requests will be
issued to different LUNs for that same I/O, making the request appear to be several random I/O
operations to the storage array. This can defeat the array's sequential detect mechanisms and turn off
sequential prefetch, slowing down these types of operations.

LUNs (host volumes)


On a Universal Storage Platform V, when space is carved out of a Parity Group and made into a volume,
it is then known as a Logical Device (LDEV). Once the LDEV is mapped to a host port for use by a server,
it is known as a LUN (Logical Unit Number) and is assigned a certain World Wide Name if using Fibre
Channel interfaces on the array. On an iSCSI configuration, the LUN gets a name that is associated with
an NFS mount point. Note that general usage has turned the term LU into LUN.

Number of LUNs per Parity Group


When configuring a Universal Storage Platform V, one or more LDEVs may be created per Parity Group,
but the goal should be to clearly understand what percentage of that group's overall capacity will
contain active data. In the case where multiple hosts attempt to simultaneously use LUNs that share the
same physical disks in an attempt to fully utilize capacity, seek and rotational latency may be a
performance-limiting factor. In attempting to maximize utilization, RAID Groups should contain both
active and less frequently used LUNs. This is true of all physical disks regardless of size, RAID level, and
physical characteristics.

It is also true that, if many small LUNs are carved out of a single Parity Group, their simultaneous use will
create maximum seek times on each disk, reducing the maximum sustainable small-block random IOPS
rate to the disk's minimum.

Port I/O Request Limits, LUN Queue Depths, and Transfer sizes
There are three aspects of I/O request handling per storage path that need to be understood. At the
port level, there are the mechanisms of an I/O request limit and a maximum transfer size. At the LUN
level, there is the mechanism of the maximum queue depth.
Port I/O Request Limits
Each server and storage array has a particular port I/O request limit, this being the number of total
outstanding I/O requests that may be issued to a port (not a LUN) at any point in time. This limit will
vary according to the server's Fibre Channel HBA and its driver, as well as the limit supported by the
target storage port. As switches serve to aggregate many HBAs to the same storage system port, the
operating limit will be the lower of these two points. In the case of a SAN switch being used to attach
multiple server ports to a single storage port (fan-in), the limit will most likely be the limit supported by
the storage port.

On the Universal Storage Platform V, the I/O request limit per FED MP (controlling either 1 or 2 ports) is
4096. This means that, at any one time, there can be up to 4096 active host I/O commands queued up
for the various LUNs visible on that MP's ports.

Most environments don't require that all LUNs be active at the same time. As I/O requests are
application driven, this must be taken into consideration. Understanding the customer's requirements
for performance will dictate how many LUNs should be assigned to a single port. Environments that
have low I/O demands overall will allow a large number of LUNs to be assigned per port without
routinely exceeding the queue limit for that port. On the other hand, environments with highly active
devices and multiple I/O requests should be designed with fewer LUNs per port, where additional ports
will be required in order to satisfy the I/O demand.

Port I/O Request Maximum Transfer Size


There is also a host-tunable parameter at the driver level that controls the maximum transfer size of a
single I/O operation. This is usually a per-port setting, but it also might be a per-host setting. The
defaults for this can range from fairly small (like 32KB) to something mid-sized (128KB). The maximum is
usually 1MB-16MB. It is probably a best practice to set this to the largest setting that is supported on the
host. The maximum transfer size that a Universal Storage Platform V will accept is 1024KB. This value
controls how application I/O requests get processed. If one used a large application block size (say
256KB) and the port/LUN default was just 32KB, then each request would be broken down (fragmented)
into 8 x 32KB requests. This creates additional overhead in managing the application request. The use of
a large maximum transfer size such as 1024KB will often be readily apparent in the performance of the
system.
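The fragmentation effect is simple arithmetic; a minimal sketch of the 256KB example above:

```python
# How many I/Os a single application request becomes at a given maximum transfer size.
import math

def fragments(app_request_kb, max_transfer_kb):
    return math.ceil(app_request_kb / max_transfer_kb)

print(fragments(256, 32))     # 8 separate 32KB transfers with a 32KB limit
print(fragments(256, 1024))   # 1 transfer when the 1024KB USP V maximum is configured
```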

LUN Queue Depth


At the LUN level, there is the mechanism of the maximum queue depth. This is the limit of how many
outstanding I/O requests may be scheduled by a host for each LUN. On a host HBA, the device queue
depth limit is usually configurable as a driver tunable, usually on a per-port basis rather than a global
setting. On a storage system, this limit is imposed by the controller and is not configurable. The usable
queue depth per LUN when there are multiple simultaneously active LUNs from the same RAID Group
will be determined by the physical I/O capacity of that group of disks.

On the Universal Storage Platform V, the rule of thumb for the LUN queue depth limit is 32. The actual
limit can be 4096, but that is far beyond the range that any LUN could accommodate. As the 4096 limit
is actually the MP limit, this will be reduced by the number of LUNs managed by that MP for its port(s).

It is important to note that the effective queue depth limit per LUN is not directly related to the host
I/Os as such, but more to the actual physical disk operations required for that RAID level (see Table 7
above), and to the alignment of the data blocks requested. If the data being written crosses chunk
boundaries, then the parity read/write operations required will be doubled. Recall that for parity RAID
(RAID-5 or 6) the parity chunk is associated with a specific data chunk. Note that the effective RAID
chunk size is either 512KB (OPEN-V) for Open Systems or 464KB (3390-x) for mainframes.
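A sketch of the rule-of-thumb queue budgeting described above (illustrative only): the 4096
outstanding-request MP limit is shared by every LUN behind that MP's port(s), so the per-LUN budget
shrinks as the LUN count grows.

```python
# Effective per-LUN queue depth behind one FED MP (rule-of-thumb sketch).
MP_REQUEST_LIMIT = 4096
RULE_OF_THUMB_QD = 32

def effective_qd_per_lun(luns_behind_mp):
    return min(RULE_OF_THUMB_QD, MP_REQUEST_LIMIT // luns_behind_mp)

print(effective_qd_per_lun(64))    # 32 -> the per-LUN rule of thumb is the limit
print(effective_qd_per_lun(256))   # 16 -> the shared MP limit becomes the constraint
```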


Figure 39. Table of Maximum LUN Queue Depths and Port Limits

    Maximum Queue Depth
    Array Type          QD Per LUN   QD Per External Port LUN
    USP                 16           8
    USP V               32           2-128
    AMS FC/SAS Disk     32
    AMS SATA Disk

    Port I/O Request Limit for Hosts
    Array Type          Per MP I/O Req. Limit   Per Port I/O Req. Limit
    USP                 2048                    1024
    USP V               4096                    2048

    Port I/O Request Limit for External Storage
    Array Type          Per MP I/O Req. Limit   Per Port I/O Req. Limit
    USP                 256                     128
    USP V               512                     256

Mixing Data on the Physical Disks


Physical placement of data by Parity Groups (not just the LDEVs from the same Parity Group) is
extremely important when the data access patterns differ. Mixing highly write-intensive data with
highly read-intensive data will cause both to suffer performance degradation. This performance
degradation will be much greater when using RAID-5 or RAID-6 due to the increase in back-end disk
operations required for writes. When using Fibre Channel disks, there are two physical I/Os required for
each random write for RAID-10, four for RAID-5, and six for RAID-6. When using SATA disks and RAID-6,
there are three additional physical I/Os required due to the read verification after writing (data and
parity/parity).

Workload Characteristics
Most applications, from a disk subsystem point of view, can be categorized into one of seven types of
application profiles (though there are extreme cases that are not covered here):

- Online Transaction Processing: Random Read/Write to data files, with Sequential Read/Write
  to Log functions
- Decision Support Systems: Sequential or Random Read, depending on data access method,
  post-ETL
- General Purpose File systems: Random Read/Write with potentially some Sequential I/O
- Messaging systems: Random Read/Write
- Web Server systems: Mixed Random and Sequential with various block sizes
- Content-Rich Media Streaming and/or Video Streaming: Sequential Read/Write
- High-Performance Computing: Sequential Read/Write

The applications that do not directly fit into one of these seven categories will most likely be a
combination of two or more of these categories.


Selecting the Proper Disk Drive Form Factor


In all cases, distributing a workload across a higher number of small-capacity, high-RPM drives will
provide better performance in terms of full random access. Even better results can be achieved by
distributing the workload over a higher number of small LUNs where those LUNs are the only active LUNs
in the Parity Group. When cached data locality is low, multiple small-capacity, high-RPM drives should
be used.

One must also take into consideration the case where a system that is only partially populated with 15k
RPM disks would be able to provide a much higher aggregate level of host IOPS if the same budget were
applied to lower-cost 10k RPM disks. If there is a 50% increase in the cost of the 15k disks, then one
could install 50% more disks of the 10k variety. The individual I/O will see some increase in response
time when using 10k disks, but the total amount of IOPS available will be much higher.

Mixing I/O Profiles on the Physical Disks


Mixing large-block sequential I/O with small-block random I/O on the same physical disks can result in
poor performance. This applies to both read and write I/O patterns. This problem can occur at the LUN
level or RAID Group level. In the first case, with a single large LUN, files with different I/O profiles will
result in poor performance due to lack of sequential detect for the large-block sequential I/O requests.
In the second case, where multiple LUNs share a Parity Group, files having different I/O profiles will
result in sequential I/O dominating the disk access time, due to prefetching, thereby creating high
response times and low IOPS for the small-block random I/O. The resolution to these two issues is to use
different LUNs and different Array Groups for different data types and I/O profiles.

Front-end Port Performance and Usage Considerations


The flexibility of the front-end ports is such that several types of connections are possible. A couple of
port usage points are considered:

- Port fan-in and fan-out
- I/O profile mixes on a port
Host Fan-in and Fan-out
With a SAN, the ability to share a port or ports among several host nodes is possible, as is configuring a
single host node to multiple storage ports. Fan-in is one of the great promises of Fibre Channel based
SAN technologies: the sharing of costly storage resources. Several host nodes are fanned in to a single
storage port, or a host node is fanned out to several storage ports.

Fan-in
Host fan-in refers to the consolidation of many host nodes into one or just a few storage ports
(many-to-one). Fan-in has the potential for performance issues by creating a bottleneck at the front-end
storage port. Having multiple hosts connected to the same storage port does work for environments
that have minimal performance requirements. In designing this type of solution, it is important to
understand the performance requirements of each host. If each host has either a high IOPS or
throughput requirement, it is highly probable that a single 2-gigabit or 4-gigabit FC-AL port will not
satisfy their aggregate performance requirements.

Fan-out
Fan-out allows a host node to take advantage of several storage ports (and possibly additional port
processors) from a single host port (one-to-many). Fan-out has a potential performance benefit for
small-block random I/O workloads. This allows multiple storage ports (and their queues) to service a
smaller number of host ports. Fan-out typically does not benefit environments with high throughput
(MB/sec) requirements due to the transfer limits of the host bus adapters (HBAs).

Mixing I/O Profiles on a Port


Mixing large-block sequential I/O with small-block random I/O can result in poor performance. This
applies to both read and write I/O patterns. This problem can occur at the LUN level or Parity Group
level. In the first case, with a single large LUN, files with different I/O profiles will result in poor
performance due to lack of sequential detect for the large-block sequential I/O requests. In the second
case, where multiple LUNs share a RAID Group, files having different I/O profiles will result in sequential
I/O dominating the disk access time, due to prefetching, thereby creating high response times and low
IOPS for the small-block random I/O. The resolution to these two issues is to use different LUNs and
different Array Groups for different data types and I/O profiles.

Summary
The Universal Storage Platform V offers great flexibility, and it should perform superbly in any
environment. At the hardware level, it offers considerably more IOPS power than does the equivalent
Universal Storage Platform line of products. Even though the Universal Storage Platform already
delivered best-of-class sequential performance (about 3x higher than any competition, with >9GB/s
Read, >4GB/s Write), the Universal Storage Platform V significantly outperforms the Universal Storage
Platform.

It is important for users to understand the two configuration methods now available with the new
Dynamic Provisioning feature. The Hitachi Dynamic Provisioning feature can be used for configuring thin
provisioning and for the creation of high-performance volumes. While only certain applications will
benefit from the thin provisioning aspect, most can take advantage of the high performance per volume
feature.

It is vital, however, that Hitachi Data Systems sales personnel, technical support staff, value-added
resellers, and others who are responsible for delivery of solutions, invest the time required to design the
best possible solution to meet each customer's unique requirements, whether it be capacity, reliability,
performance, or cost. To assist in delivering the highest quality solution, Hitachi Data Systems Global
Solution Services should be engaged.


Appendix 1. Universal Storage Platform V (Frames, HDUs, and Array Groups)

[Diagram: physical layout of the Universal Storage Platform V frames (Center frame with 128 HDDs; L1,
L2, R1, and R2 frames with 256 HDDs each). For every frame, the original diagram shows each HDU with
its Array Group names (for example 18-16), Parity Group addresses (for example 12:0F), 4-HDD
groupings, and the DKA (BED) pair that services it.]

Appendix 2. Open Systems RAID Mechanisms

This section describes how the OPEN-V Open Systems groups function on both the Universal Storage
Platform and the Universal Storage Platform V. The track size (chunk) used for OPEN-V is 256KB. The
older 9900 Lightning style OPEN-x emulation is also available, with its track (chunk) size of 48KB. One
should avoid mixing OPEN-x and the standard OPEN-V emulations on the same storage system.

When using generic RAID-5, one would normally calculate the stripe width (row size) by multiplying the
RAID chunk size (or stripe size) by 7 (if 7D+1P). In the case of Open Systems (OPEN-V volumes), the
Universal Storage Platform V uses a native chunk size of 256KB. Hence you would expect a row size of
1,792KB. However, the Universal Storage Platform V manages RAID-5 somewhat differently than any
other vendor. For both Universal Storage Platform V RAID-5+ types, there are two adjacent 256KB RAID
chunks taken from the same disk before switching to the next disk in the row, so the effective chunk size
is 512KB. This is illustrated in the two figures below. Thus, there are twice as many chunks per row as
normal. So the Universal Storage Platform V's RAID-5+ 7D+1P stripe width is actually 3,584KB (7 x
512KB). The Universal Storage Platform V's RAID-5+ (3D+1P) stripe width is actually 1,536KB (3 x 512KB).
Figure 40. RAID-5+ (7D+1P) Layout (3,584KB stripe width).

[Figure: a 7D+1P OPEN-V group across Disks 1-8 (primary and parity), showing Stripe Set 1 and Stripe
Set 2 with two adjacent 256KB chunks taken from each disk.]
Figure 41. RAID-5+ (3D+1P) Layout (1,536KB stripe width).


When using generic RAID-10, one would normally calculate the stripe width (row size) by multiplying the
RAID chunk size (or stripe size) by 2 (if 2D+2D). In the case of Open Systems (OPEN-V volumes), the
Universal Storage Platform V uses a native chunk size of 256KB. Hence you would expect a row size of
512KB.

However, the Universal Storage Platform V manages RAID-10+ in the same manner as RAID-5+, where
there are two adjacent RAID chunks taken from the same disk before switching to the next disk in the
row. This is illustrated in the next figure. Thus, there are twice as many chunks per row as normal. So
the Universal Storage Platform V's RAID-10+ (2D+2D) stripe width is 1,024KB (2 x 512KB).
Figure 42. RAID-10+ (2D+2D) Layout (stripe width 1024KB).

The Universal Storage Platform V manages RAID-6+ in the same manner as RAID-5+, where there are
two adjacent RAID chunks taken from the same disk before switching to the next disk in the row. This is
illustrated in the figure below. Thus, there are twice as many chunks per row as normal. So the
Universal Storage Platform V's RAID-6 6D+2P stripe width is 3,072KB (6 x 512KB).

Figure 43. RAID-6+ (6D+2P) Layout (stripe width 3,072KB)

[Figure: a 6D+2P OPEN-V group across Disks 1-8 (primary and parity), laid out in the same
two-adjacent-chunks-per-disk pattern.]


Appendix 3. Mainframe 3390-x and Open-x RAID Mechanisms

This section describes how the 3390-x (mainframe) and OPEN-x (9900V Lightning style Open Systems
volumes) Parity Groups function on both the Universal Storage Platform and the Universal Storage
Platform V. The track size (chunk) used for 3390-x is 58KB, and the OPEN-x track (chunk) size is 48KB.
One should avoid mixing OPEN-x and the standard OPEN-V on the same storage system.

RAID-5+
When using generic RAID-5, one would normally calculate the stripe width (row size) by multiplying the
RAID chunk size (or stripe size) by 7 (if 7D+1P). In the case of mainframe emulation (3390-x volumes),
the Universal Storage Platform V uses a native track size of 58KB. Hence you would expect a row size of
406KB (7 x 58KB). For the OPEN-x emulation, the track size is 48KB, with an expected row size of 336KB
(7 x 48KB).

However, the Universal Storage Platform V manages RAID-5 somewhat differently than any other
vendor. For both Universal Storage Platform V RAID-5+ types, there are eight adjacent tracks taken from
the same disk before switching to the next disk in the row. This is illustrated in the two figures below.
The Universal Storage Platform V's RAID-5+ 7D+1P stripe width for 3390-x is 3,248KB (7 x 8 x 58KB), and
for OPEN-x it is 2,688KB (7 x 8 x 48KB). The Universal Storage Platform V's RAID-5+ (3D+1P) stripe width
for 3390-x is 1,392KB (3 x 8 x 58KB), and for OPEN-x it is 1,152KB (3 x 8 x 48KB).
Figure 44. RAID-5+ (7D+1P) Layout, 3390-x and Open-x

[Figure: a 7D+1P 3390-x / OPEN-x group across Disks 1-8 (primary and parity), showing Stripe Set 1 and
Stripe Set 2 with eight adjacent tracks taken from each disk.]

Figure 45. RAID-5+ (3D+1P) Layout, 3390-x and Open-x

[Figure: a 3D+1P 3390-x / OPEN-x group across Disks 1-4 (primary and parity), showing Stripe Set 1 and
Stripe Set 2.]

RAID-10+
When using generic RAID-10, one would normally calculate the stripe width (row size) by multiplying the
RAID chunk size (or track size) by 2 (if 2D+2D). In the case of mainframe emulation (3390-x volumes),
the Universal Storage Platform V uses a native track size of 58KB. Hence you would expect a row size of
116KB (2 x 58KB). For the OPEN-x emulation, the track size is 48KB, with an expected row size of 96KB
(2 x 48KB).

However, the Universal Storage Platform V manages RAID-10+ in the same manner as RAID-5+, where
there are eight adjacent tracks taken from the same disk before switching to the next disk in the row.
This is illustrated in the figure below. So the Universal Storage Platform V's RAID-10+ (2D+2D) stripe
width for 3390-x is 928KB (2 x 8 x 58KB), and for OPEN-x it is 768KB (2 x 8 x 48KB).


Figure 46. RAID-10+ (2D+2D) Layout, 3390-x and Open-x.

RAID-6+
The Universal Storage Platform V manages RAID-6+ in the same manner as RAID-5+, where there are
eight adjacent tracks taken from the same disk before switching to the next disk in the row. This is
illustrated in the figure below. Thus, there are eight times as many tracks per row as normal. So, for
3390-x, the Universal Storage Platform V's RAID-6 6D+2P stripe width is actually 2,784KB (6 x 8 x 58KB),
and for OPEN-x it is 2,304KB (6 x 8 x 48KB).

Figure 47. RAID-6+ (6D+2P) Layout, 3390-x and Open-x.


Appendix 4. BED: Loop Maps with Array Group Names

USP-V (Front View) Cluster-1 - each BED's four FC-AL loops (0-3) serve the same Parity Groups:

    BED     Board   Slot   Parity Groups (loops 0-3)
    BED-1   BED10   1AU    1, 3, 11
    BED-2   BED12   1BU    2, 4, 12
    BED-3   BED11   1AL    5, 13
    BED-4   BED13   1BL    6, 14
    BED-5   BED14   1LU    7, 15
    BED-6   BED16   1KU    8, 16
    BED-7   BED15   1LL    9, 17
    BED-8   BED17   1KL    10, 18

USP-V (Rear View) Cluster-2:

    BED     Board   Slot   Parity Groups (loops 0-3)
    BED-1   BED18   2MU    1, 3, 11
    BED-2   BED1A   2NU    2, 4, 12
    BED-3   BED19   2ML    5, 13
    BED-4   BED1B   2NL    6, 14
    BED-5   BED1C   2XU    7, 15
    BED-6   BED1E   2WU    8, 16
    BED-7   BED1D   2XL    9, 17
    BED-8   BED1F   2WL    10, 18

Appendix 5. FED: 8-Port Fibre Channel Maps with Processors

USP-V (Front View) OPEN Fibre 8-port, Cluster-1 - ports and their MPs per FED board:

    FED     Board   Slot   mp1   mp2   mp3   mp4
    FED-1   FED00   1EU    1A    3A    1B    3B
    FED-2   FED02   1FU    1E    3E    1F    3F
    FED-3   FED01   1EL    1C    3C    1D    3D
    FED-4   FED03   1FL    1G    3G    1H    3H
    FED-5   FED04   1GU    1J    3J    1K    3K
    FED-6   FED06   1HU    1N    3N    1P    3P
    FED-7   FED05   1GL    1L    3L    1M    3M
    FED-8   FED07   1HL    1Q    3Q    1R    3R

USP-V (Rear View) OPEN Fibre 8-port, Cluster-2:

    FED     Board   Slot   mp1   mp2   mp3   mp4
    FED-1   FED08   2QU    2A    4A    2B    4B
    FED-2   FED0A   2RU    2E    4E    2F    4F
    FED-3   FED09   2QL    2C    4C    2D    4D
    FED-4   FED0B   2RL    2G    4G    2H    4H
    FED-5   FED0C   2TU    2J    4J    2K    4K
    FED-6   FED0E   2UU    2N    4N    2P    4P
    FED-7   FED0D   2TL    2L    4L    2M    4M
    FED-8   FED0F   2UL    2Q    4Q    2R    4R

Appendix 6. FED: 16-Port Fibre Channel Maps with Processors

USP-V (Front View) OPEN Fibre 16-port, Cluster-1 - each FED board has eight ports, with two ports sharing
each MP:

    FED     Board   Slot   mp1      mp2      mp3      mp4
    FED-1   FED00   1EU    1A, 5A   3A, 7A   1B, 5B   3B, 7B
    FED-2   FED02   1FU    1E, 5E   3E, 7E   1F, 5F   3F, 7F
    FED-3   FED01   1EL    1C, 5C   3C, 7C   1D, 5D   3D, 7D
    FED-4   FED03   1FL    1G, 5G   3G, 7G   1H, 5H   3H, 7H
    FED-5   FED04   1GU    1J, 5J   3J, 7J   1K, 5K   3K, 7K
    FED-6   FED06   1HU    1N, 5N   3N, 7N   1P, 5P   3P, 7P
    FED-7   FED05   1GL    1L, 5L   3L, 7L   1M, 5M   3M, 7M
    FED-8   FED07   1HL    1Q, 5Q   3Q, 7Q   1R, 5R   3R, 7R

USP-V (Rear View) OPEN Fibre 16-port, Cluster-2:

    FED     Board   Slot   mp1      mp2      mp3      mp4
    FED-1   FED08   2QU    2A, 6A   4A, 8A   2B, 6B   4B, 8B
    FED-2   FED0A   2RU    2E, 6E   4E, 8E   2F, 6F   4F, 8F
    FED-3   FED09   2QL    2C, 6C   4C, 8C   2D, 6D   4D, 8D
    FED-4   FED0B   2RL    2G, 6G   4G, 8G   2H, 6H   4H, 8H
    FED-5   FED0C   2TU    2J, 6J   4J, 8J   2K, 6K   4K, 8K
    FED-6   FED0E   2UU    2N, 6N   4N, 8N   2P, 6P   4P, 8P
    FED-7   FED0D   2TL    2L, 6L   4L, 8L   2M, 6M   4M, 8M
    FED-8   FED0F   2UL    2Q, 6Q   4Q, 8Q   2R, 6R   4R, 8R

Appendix 7. FED: 8-Port FICON Maps with Processors

USP-V (Front View) FICON, Cluster-1 - ports and the MP pair (1,2 or 3,4) that serves them:

    FED     Board   Slot   MPs 1,2   MPs 3,4
    FED-1   FED00   1EU    1A, 3A    5A, 7A
    FED-2   FED02   1FU    1E, 3E    5E, 7E
    FED-3   FED01   1EL    1C, 3C    5C, 7C
    FED-4   FED03   1FL    1G, 3G    5G, 7G
    FED-5   FED04   1GU    1J, 3J    5J, 7J
    FED-6   FED06   1HU    1N, 3N    5N, 7N
    FED-7   FED05   1GL    1L, 3L    5L, 7L
    FED-8   FED07   1HL    1Q, 3Q    5Q, 7Q

USP-V (Rear View) FICON, Cluster-2:

    FED     Board   Slot   MPs 1,2   MPs 3,4
    FED-1   FED08   2QU    2A, 4A    6A, 8A
    FED-2   FED0A   2RU    2E, 4E    6E, 8E
    FED-3   FED09   2QL    2C, 4C    6C, 8C
    FED-4   FED0B   2RL    2G, 4G    6G, 8G
    FED-5   FED0C   2TU    2J, 4J    6J, 8J
    FED-6   FED0E   2UU    2N, 4N    6N, 8N
    FED-7   FED0D   2TL    2L, 4L    6L, 8L
    FED-8   FED0F   2UL    2Q, 4Q    6Q, 8Q

Appendix 8. FED: 8-Port ESCON Maps with Processors

USP-V (Front View) ESCON, Cluster-1 - all ports on a board are served by MPs 1 and 2:

    FED     Board   Slot   Ports (MPs 1,2)
    FED-1   FED00   1EU    1A, 3A, 5A, 7A
    FED-2   FED02   1FU    1E, 3E, 5E, 7E
    FED-3   FED01   1EL    1B, 3B, 5B, 7B
    FED-4   FED03   1FL    1F, 3F, 5F, 7F
    FED-5   FED04   1GU    1J, 3J, 5J, 7J
    FED-6   FED06   1HU    1N, 3N, 5N, 7N
    FED-7   FED05   1GL    1K, 3K, 5K, 7K
    FED-8   FED07   1HL    1P, 3P, 5P, 7P

USP-V (Rear View) ESCON, Cluster-2:

    FED     Board   Slot   Ports (MPs 1,2)
    FED-1   FED08   2QU    2A, 4A, 6A, 8A
    FED-2   FED0A   2RU    2E, 4E, 6E, 8E
    FED-3   FED09   2QL    2B, 4B, 6B, 8B
    FED-4   FED0B   2RL    2F, 4F, 6F, 8F
    FED-5   FED0C   2TU    2J, 4J, 6J, 8J
    FED-6   FED0E   2UU    2N, 4N, 6N, 8N
    FED-7   FED0D   2TL    2K, 4K, 6K, 8K
    FED-8   FED0F   2UL    2P, 4P, 6P, 8P

Appendix 9. Veritas Volume Manager Example

ThisdiagramillustratesatypicalVeritashostLVMconfigurationandnamingconventions.
SERVERS

STORAGE

Vol 01

Plex #1
data

Volume #1

Plex #2
log

Vol 03

Plex #3

Volume #2

VM10

Veritas VM Disk Pool


VM09

VM11

VM12

VM13

Vol 03

Plex #4

VM15

Volume #3

VM14

VM16

LUN

VM08

LUN

LUN

VM07

LUN

LUN

VM06

LUN

LUN

VM05

LUN

LUN

VM04

LUN

LUN

VM03

LUN

LUN

VM02

LUN

LUN

VM01

LUN

The volumes are used for raw space or file systems.

Volume: a set of 1 or more plexes.

Plex: a set of subdisks with either a striping (usually RAID-0) or concatenation association among them.

VM disks (each a LUN), each with a Private and Public Region. Public Regions are cut into one or more subdisks, which are used as plex building blocks.

LUNs: from RAID Groups.


Appendix 10. Pool Details

This section illustrates the Hitachi Dynamic Provisioning environment with naming conventions. Note the similarities in concepts to the previous Veritas example.


Appendix 11. Pool Example


This illustrates an example of a Pool and DPVOLs.


Appendix 12. Disks - Physical IOPS Details

These are examples of maximum expected physical disk Read and Write IOPS rates for three types of disks. The Cylinders columns show the percentage of surface space in use as disk partitions. As more disk surface is brought into use, the heads must seek farther inwards, which increases the average seek times. Note that the Write table shows reduced maximum IOPS due to the higher track centering precision required for writes. Please note in the tables below that the three drive types use a representative capacity. For instance, almost all 15K RPM HDDs would be very similar to the 300GB Seagate 15K used in the first entry.
Table 1. Calculated Maximum Disk Read IOPS Rates by Type and Active Surface Usage

Reads at 100% Busy                         Cylinders:   25%       50%       75%       100%

300GB Seagate Fibre Channel 15k RPM
  Disk RPM                                              15,000    15,000    15,000    15,000
  Average Read Seek Time (ms)                           1.8       2.5       2.8       3.5
  Average Rotational Delay (ms)                         2.0       2.0       2.0       2.0
  Average Access Time (ms)                              3.8       4.5       4.8       5.5
  Random 8KB IOPS Rate                                  267       225       208       182

146GB Hitachi Fibre Channel 10k RPM
  Disk RPM                                              10,000    10,000    10,000    10,000
  Average Read Seek Time (ms)                           2.4       3.3       3.8       4.7
  Average Rotational Delay (ms)                         3.0       3.0       3.0       3.0
  Average Access Time (ms)                              5.4       6.3       6.8       7.7
  Random 8KB IOPS Rate                                  187       159       148       130

750GB Seagate SATA 7.2k RPM
  Disk RPM                                              7,200     7,200     7,200     7,200
  Average Read Seek Time (ms)                           4.3       6.0       6.8       8.5
  Average Rotational Delay (ms)                         4.2       4.2       4.2       4.2
  Average Access Time (ms)                              8.4       10.1      11.0      12.7
  Random 8KB IOPS Rate                                  119       99        91        79


Table 2. Calculated Maximum Disk Write IOPS Rates by Type and Active Surface Usage

Writes at 100% Busy                        Cylinders:   25%       50%       75%       100%

300GB Seagate Fibre Channel 15k RPM
  Disk RPM                                              15,000    15,000    15,000    15,000
  Average Write Seek Time (ms)                          2.0       2.8       3.2       4.0
  Average Rotational Delay (ms)                         2.0       2.0       2.0       2.0
  Average Access Time (ms)                              4.0       4.8       5.2       6.0
  Random 8KB IOPS Rate                                  250       208       192       167

146GB Hitachi Fibre Channel 10k RPM
  Disk RPM                                              10,000    10,000    10,000    10,000
  Average Write Seek Time (ms)                          2.6       3.6       4.1       5.1
  Average Rotational Delay (ms)                         3.0       3.0       3.0       3.0
  Average Access Time (ms)                              5.6       6.6       7.1       8.1
  Random 8KB IOPS Rate                                  180       152       141       123

750GB Seagate SATA 7.2k RPM
  Disk RPM                                              7,200     7,200     7,200     7,200
  Average Write Seek Time (ms)                          5.0       7.0       8.0       10.0
  Average Rotational Delay (ms)                         4.2       4.2       4.2       4.2
  Average Access Time (ms)                              9.2       11.2      12.2      14.2
  Random 8KB IOPS Rate                                  109       90        82        71
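The IOPS figures in Tables 1 and 2 follow directly from the access times: the rate is roughly one second divided by the average access time, since the 8KB data transfer itself is a negligible part of a small random I/O. A minimal sketch of that arithmetic (Python, illustrative only; the small differences from the printed values come from rounding of the underlying seek figures):

    # IOPS is roughly 1000 ms divided by the average access time
    # (seek + rotational delay); the 8KB transfer time is ignored.
    def random_iops(avg_seek_ms, avg_rotational_delay_ms):
        access_ms = avg_seek_ms + avg_rotational_delay_ms
        return 1000.0 / access_ms

    print(round(random_iops(3.5, 2.0)))   # 15K drive, reads, 100% cylinders -> 182
    print(round(random_iops(7.0, 4.2)))   # SATA drive, writes, 50% cylinders -> 89 (table: 90)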

The two tables below indicate what IOPS rates (small block random) might be expected from RAID Groups of different types when using 15K RPM SAS disks or 7200 RPM SATA-II disks. The disk IOPS reference values were taken from Tables 1 and 2 above, from the 50% surface usage columns. The left side of each table is the expected maximum rate where the disks are operating at 100% busy (not recommended). The right side of each table shows a more realistic 30% busy rate. For example, when using 15K RPM SAS drives with RAID-5 (7D+1P) at a 30% busy rate, you should plan for about 540 IOPS for 100% Reads, or 125 IOPS for 100% Writes. Take note these are all cache-miss values.
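Tables 3 and 4 can be reproduced from those per-disk figures with a short calculation. The sketch below (Python) is illustrative only; it assumes the usual small-block random-write penalties (2 back-end I/Os per host write for RAID-1+0, 4 for RAID-5, 6 for RAID-6) plus one read-verify per physical write for SATA (see Appendix 14), which is consistent with the table values.

    # Illustrative derivation of the RAID Group IOPS tables from per-disk IOPS.
    def raid_group_iops(n_drives, disk_read_iops, disk_write_iops,
                        back_end_reads, back_end_writes,
                        busy_fraction=1.0, sata_read_verify=False):
        write_penalty = back_end_reads + back_end_writes
        if sata_read_verify:
            write_penalty += back_end_writes     # one verify read per physical write
        reads = n_drives * disk_read_iops * busy_fraction
        writes = n_drives * disk_write_iops * busy_fraction / write_penalty
        return round(reads), round(writes)

    print(raid_group_iops(8, 225, 208, 2, 2))                       # 15K RAID-5 7+1, 100% busy -> (1800, 416)
    print(raid_group_iops(8, 225, 208, 2, 2, busy_fraction=0.3))    # 15K RAID-5 7+1, 30% busy -> (540, 125)
    print(raid_group_iops(8, 99, 90, 3, 3, sata_read_verify=True))  # SATA RAID-6 6+2, 100% busy -> (792, 80)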
Table 3. Calculated RAID Group IOPS Rates (SAS) by RAID Level and Percent Busy Rates

Total Disk IOPS by RAID Group (100% busy), 15k RPM disks, using IOPS = 225 (Read) and 208 (Write):

  RAID Type        IOPS Available for Random Read    IOPS Available for Random Write
  RAID-1+0 2+2     900                               416
  RAID-1+0 4+4     1,800                             832
  RAID-5 3+1       900                               208
  RAID-5 7+1       1,800                             416
  RAID-6 6+2       1,800                             277

Total Disk IOPS by RAID Group (30% busy), 15k RPM disks, using IOPS = 225 (Read) and 208 (Write):

  RAID Type        IOPS Available for Random Read    IOPS Available for Random Write
  RAID-1+0 2+2     270                               125
  RAID-1+0 4+4     540                               250
  RAID-5 3+1       270                               62
  RAID-5 7+1       540                               125
  RAID-6 6+2       540                               83

Table 4. Calculated RAID Group IOPS Rates (SATA-II) by RAID Level and Percent Busy Rates

Total Disk IOPS by RAID Group (100% busy), 7.2k RPM SATA, using IOPS = 99 (Read) and 90 (Write):

  RAID Type        IOPS Available for Random Read    IOPS Available for Random Write
  RAID-1+0 2+2     396                               90
  RAID-1+0 4+4     792                               180
  RAID-5 3+1       396                               60
  RAID-5 7+1       792                               120
  RAID-6 6+2       792                               80

Total Disk IOPS by RAID Group (30% busy), 7.2k RPM SATA, using IOPS = 99 (Read) and 90 (Write):

  RAID Type        IOPS Available for Random Read    IOPS Available for Random Write
  RAID-1+0 2+2     119                               27
  RAID-1+0 4+4     238                               54
  RAID-5 3+1       119                               18
  RAID-5 7+1       238                               36
  RAID-6 6+2       238                               24

Appendix 13. File Systems and Hitachi Dynamic Provisioning (Thin Provisioning)

The table below shows various guidelines for different file systems and their formatting behavior on a DPVOL.


Appendix 14. Putting SATA Drive Performance into Perspective

By Ian Vogelesang
The fact that we already worry about hot spots (busy parity groups) means that Fibre Channel drives are already quite busy. In other words, we generally adjust workloads on Fibre Channel drives to stay within their IOPS handling capability.

For example, the performance benefit of Dynamic Provisioning is that the I/O activity associated with a large number of host DPVOLs can be combined and distributed over many parity groups. This evens out the activity over those many parity groups and thus eliminates hot spots.

Thus, by eliminating hot spots, Dynamic Provisioning lets you raise average parity group utilization (% busy) somewhat. But what does "raise parity group utilization" mean? It means put more data on each drive.

If we combine the data from two separate hosts or two applications together and store them on one parity group, we are also combining the associated activity (IOPS) together.

Thus if we put twice as much data of the same type on one disk drive, we will also double the activity (IOPS) on the drive. Notice that there is the concept of the "type" of data in terms of its intensity of access. Some types of data are infrequently accessed, and others are heavily accessed. This is a fundamental characteristic of the data, no matter what type of drive is used to store the data. The activity is associated with the data, and if you have twice the data, you will have twice the associated activity.

We call this intrinsic measure of the activity associated with a given amount of data the Access Density, and it is measured in IOPS per GB. For example, if you have 100 IOPS associated with 200 GB of data, the access density for that data is 100 IOPS per 200 GB of data, or 0.5 IOPS per GB.

Compared to Fibre Channel drives:
- SATA drives have a much lower IOPS capability on a per-drive basis.
- SATA drives have a much higher storage capacity on a per-drive basis.

Thus to figure out if it's cost-effective to use SATA drives, you need to figure out not only how many SATA drives it will take to hold the data, but also how many SATA drives it will take to handle the IOPS that come with the data.

In general we can't put most "normal" types of data on SATA drives, because SATA drives have a much lower IOPS limit than that of Fibre Channel drives.

Let's look at some numbers:

Each I/O operation begins with mechanical positioning, which starts with a seek operation (to move the access arm into position) followed by latency (waiting for the drive to turn until the starting point of the target data comes under the head).


The average seek time for a disk drive is generally longer than the average latency (1/2 turn), so in the time it takes to do the mechanical positioning, the drive makes more than one full rotation. However, a random I/O typically processes a 4KB or 8KB block, which is on the order of one percent of the storage capacity of one disk drive track. The capacity of one track varies a lot between drive types and the inner and outer zones of the drive, but in general the transfer time for a random I/O block is on the order of less than 1% of the total I/O service time, so random I/O is 99% mechanical positioning.

Nominal one-at-a-time IOPS capability of sample drive types:

                                              1TB SATA drive   400GB 10,000 RPM        300GB 15,000 RPM
                                                               Fibre Channel drive     Fibre Channel drive
  Average read seek time                      8.5 ms           4.9 ms                  3.5 ms
  Average latency (1/2 of a rotation)         4.2 ms           3.0 ms                  2.0 ms
  Total mechanical positioning time (read)    12.7 ms          7.9 ms                  5.5 ms
  Read IOPS capability (the number of read
  mechanical positioning operations that can
  be done in one second, one at a time)       79 IOPS          127 IOPS                182 IOPS

When we factor in the amount of data that is stored on the drive, we can calculate the base one-at-a-time read access density capability of each type of drive:

                                              1TB SATA drive    400GB 10,000 RPM        300GB 15,000 RPM
                                                                Fibre Channel drive     Fibre Channel drive
  Read IOPS capability (one at a time)        79 IOPS           127 IOPS                182 IOPS
  Storage capacity, GB                        1,000 GB          400 GB                  300 GB
  Nominal read access density capability      .08 IOPS per GB   .32 IOPS per GB         .61 IOPS per GB
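Those access density figures are simply the one-at-a-time read IOPS capability divided by the drive capacity; a small illustrative sketch (Python):

    # Nominal one-at-a-time read access density = read IOPS capability / capacity.
    for name, read_iops, capacity_gb in [("1TB SATA", 79, 1000),
                                         ("400GB 10K FC", 127, 400),
                                         ("300GB 15K FC", 182, 300)]:
        print(name, read_iops / capacity_gb, "IOPS per GB")
    # -> 0.079, 0.3175, 0.6067: the .08, .32, and .61 figures above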


[Chart: Random Read Access Density, IOPS per GB (one read at a time), for the 1TB SATA, 400GB 10K RPM, and 300GB 15K RPM drives.]

What we can see here is that if we fill each drive with data, 10K RPM drives are capable of handling data that is 4x as active as the data we can put on a SATA drive, and 15K RPM drives are capable of handling data that is 8x as active.

It gets worse.

I/O operations are not issued one at a time. Multiple I/O operations (each identified by a tag) are issued at once to each disk drive, in order to let the disk drive decide what order to perform the I/O operations in, so as to minimize the amount of seeking and rotational delay necessary. This capability to reduce average mechanical positioning time is called Tagged Command Queuing, or TCQ.

Fibre Channel drives have substantially more microprocessor horsepower than SATA drives, and as a result, they get substantially higher boosts in throughput through the use of TCQ. The amount of throughput boost from TCQ varies by workload, but for the purposes of illustration, it's reasonable to assume a 25% boost in IOPS capability using TCQ for SATA, and a 50% boost for Fibre Channel.
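A minimal sketch of that adjustment (Python), using the illustrative 25% and 50% boosts assumed above:

    # Access density with the assumed TCQ boosts (illustrative numbers only).
    sata  = 79  * 1.25 / 1000    # ~0.10 IOPS per GB
    fc10k = 127 * 1.50 / 400     # ~0.48 IOPS per GB
    fc15k = 182 * 1.50 / 300     # ~0.91 IOPS per GB
    print(round(fc10k / sata, 1), round(fc15k / sata, 1))   # -> 4.8 and 9.2: roughly 5x and 10x SATA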
Now a more realistic comparison for read Access Density capability is shown:

[Chart: Random Read Access Density, IOPS per GB (with Tagged Command Queuing), for the 1TB SATA, 400GB 10K RPM, and 300GB 15K RPM drives.]

Now the Access Density capability of 10K RPM drives is roughly 5x that of SATA, and 15K RPM drives roughly 10x that of SATA.

It gets even worse.

The use of SATA drives in subsystem applications is relatively recent, and protecting the integrity of customer data is paramount. Therefore, Hitachi subsystems perform a read-verify operation after every write with SATA drives. This adds a random read positioning time after every write operation. Note that this is not one extra rotation, but truly a full random read positioning, as in general multiple I/O operations are queued in the drive at once, and therefore the read-verify operation goes in the queue after the write completes, and other I/O operations in general will intervene between the write and subsequent read-verify.

[Chart: Random Write Access Density, IOPS per GB (with Tagged Command Queuing and SATA read-verify), for the 1TB SATA, 400GB 10K RPM, and 300GB 15K RPM drives.]

For random writes, the 10K RPM drive is capable of handling about 10x the access density of the SATA drive, and the 15K RPM drive about 18x that of the SATA drive.

These numbers are only rough estimates, but what you can clearly see here is that in order to realize the cost effectiveness of SATA drives, that is, in order to take advantage of the storage capacity of SATA drives, they need to be filled with data. The only type of data that can fill a SATA drive is data that is approximately an order of magnitude (10 times) less active than data that would normally be put on Fibre Channel drives.

Failure to understand this 10x difference in access density capability can result in serious SATA drive deployment problems. If normal Fibre Channel drive type data is placed on SATA drives, the result will usually be that the SATA drives cannot keep up with the IOPS demand, and subsystem cache will fill up with data waiting to be written to the SATA drives. In order to treat the symptom (excessive write pending), SATA drive workloads can be placed in their own Cache Logical Partition (CLPR). Even though this may isolate other workloads from the problem, the SATA drives will still be a bottleneck constraining the throughput of the application that is using them.

The summary on SATA drives is that if they are used for traditional application data, there are only two scenarios:

1) Cost was reduced by putting over 100GB of normal data on a 1TB SATA drive. Excessive write pending forces the SATA drives to be isolated in a CLPR. Cost was reduced, but throughput is drastically reduced.

2) Less than 100GB of data was placed on each SATA drive. The extra money (over and above what it would have cost for Fibre Channel drives) was spent in order to provide enough SATA drives to handle the IOPS. For this price premium the result is slower response time and higher electrical power costs.

Thus only if you have a type of workload that has at most 1/10th the access density of normal workloads will SATA drives be cost-effective.

Appendix 15. LDEVs, LUNs, VDEVs and More

An LDEV is a logical volume internal to the subsystem, which can hold host data. LDEVs are carved from a logical container called a VDEV. In the diagram below, an internal VDEV is shown, which maps to the formatted space (net of parity) in a Parity Group.
Figure 1. Overview of Logical Devices.

[Figure: A host runs multi-path software device drivers on top of two HBA device drivers; each HBA connects through a switch to a subsystem physical port. Within each physical port, a Host Storage Domain (HSD, or logical port) presents an LU. Both LUs map to the same HDEV, which maps to an LDEV (possibly via a LUSE) carved from a VDEV, which in turn maps to a parity group of physical disk drives.]


Figure 2. Alternate Overview of Logical Devices.

[Figure: A host group LUN maps to an HDEV. An HDEV maps either to a single LDEV or to a LUSE of multiple LDEVs. LDEVs are carved from one of four kinds of VDEV: an internal VDEV (up to 2.99 TB, backed by a parity group), an external VDEV (up to 4 TB, backed by a LUN on an external subsystem), a CoW V-VOL group VDEV (4 TB, whose CoW V-VOLs draw space from a CoW pool of pool-volume LDEVs), or an HDP V-VOL group VDEV (4 TB, whose DPVols draw space from an HDP pool of pool-volume LDEVs).]

LDEVs are uniquely identified across the subsystem by a six-hexadecimal-digit identifier usually written with a colon between each pair of hex digits, for example 00:00:01. This format is called the LDKC:CU:LDEV format, and thus the subfields within LDEV names are called the LDKC (Logical DKC) number, the CU (Control Unit) number, and the LDEV number (within the CU).

The maximum number of LDEVs that may be created within a Universal Storage Platform V or Universal Storage Platform VM subsystem is 64K. The LDKC field, the first 2 hex digits of an LDEV identifier, was added with the introduction of the Universal Storage Platform V and Universal Storage Platform VM in order to provide for the ability at a later date to possibly increase the number of LDEVs per subsystem above 64K.
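As a small illustration of the format (this is just a sketch, not an HDS utility), an identifier such as 00:3F:FF breaks down into LDKC 0, CU 0x3F, and LDEV 0xFF:

    # Split an LDEV identifier written in LDKC:CU:LDEV form into its three fields.
    def parse_ldev_id(ldev_id):
        ldkc, cu, ldev = (int(field, 16) for field in ldev_id.split(":"))
        return ldkc, cu, ldev

    print(parse_ldev_id("00:00:01"))   # -> (0, 0, 1)
    print(parse_ldev_id("00:3F:FF"))   # -> (0, 63, 255)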
Note that the term LDEV as used with HDS enterprise subsystems (Universal Storage Platform family) to describe an internal logical volume, such as can be carved from a parity group, is somewhat confusingly called an LU or a LUN for HDS modular subsystems (AMS family). What is called an LU or LUN in HDS enterprise subsystems is called an HLUN in modular subsystems. Sorry for the confusion.
Note that a LUN, the way that a host sees an LDEV, emulates a physical Fibre Channel disk drive. As such, accessing a LUN consists of reading or writing disk sectors comprised of 512 bytes. Thus the smallest amount of data that a host can read or write is one whole 512-byte sector. The size of a LUN or an LDEV is therefore a multiple of a 512-byte block (sector).

When creating an LDEV, if you specify the size in blocks (512-byte sectors) you will get an LDEV of that exact size. If you specify the size you want using any other unit, such as MB, there will be a rounding operation that will take the size you asked for and will round up to the next higher multiple of an internal allocation unit (logical cylinders of 15 logical tracks). Suffice it to say that if you want to make an LDEV of an exact size, specify the size in blocks (sectors).
Note in the diagram above that LDEVs are not directly presented to hosts as a Logical Unit (LU) or LUN, but rather there is an intermediate construct called an HDEV (Head LDEV). An HDEV may either map directly to a single LDEV, or it may be mapped to what is called a LUSE (Logical Unit Size Expansion). A LUSE is a combination of from 2 to 36 LDEVs:
Figure 3. Overview of LUSE Devices.

[Figure: A physical port's Host Storage Domain (HSD) presents an LU mapped to an HDEV; the HDEV maps to a LUSE, which combines from 2 to 36 LDEVs, each carved from a VDEV.]

The only reason that one should make a LUSE is to create a LUN that is larger than the maximum available LDEV size. For internal LDEVs, the size of an LDEV is limited by the amount of formatted space in one parity group, or about 2.99 TB, whichever is smaller (more on this below when we talk about VDEVs). For other types of LDEV, the maximum size is 4 TB.

The maximum size of a LUSE, on the other hand, is 60 TB.

Note: The LDEVs within a LUSE are not interleaved or striped. LBAs (Logical Block Addresses, or sector numbers) for the LUSE start at the beginning of the first, or head, LDEV, and then proceed to the end of each LDEV before starting on the next one. Thus when reading or writing sequentially, the performance will be limited by the performance of a single LDEV at a time.
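A minimal sketch of that address mapping (Python; the constituent LDEV sizes here are hypothetical, in 512-byte blocks, head LDEV first):

    # Map a LUSE LBA to (constituent LDEV index, LBA within that LDEV).
    # LBAs run through each LDEV in its entirety before moving to the next.
    def luse_locate(lba, ldev_sizes_in_blocks):
        for index, size in enumerate(ldev_sizes_in_blocks):
            if lba < size:
                return index, lba
            lba -= size
        raise ValueError("LBA beyond the end of the LUSE")

    sizes = [1_000_000, 1_000_000, 500_000]     # three hypothetical LDEVs
    print(luse_locate(1_200_000, sizes))        # -> (1, 200000): in the second LDEV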

Later on, we will talk about what are confusingly called "concatenated parity groups", which do not give the ability to make larger LDEVs, but which do interleave the RAID stripes of an LDEV across multiple parity groups.
Figure 4. Overview of LUSE Devices.

[Figure: The same diagram as Figure 1: a host with multi-path software and two HBAs, connected through switches to two subsystem physical ports, each with a Host Storage Domain presenting an LU; both LUs map to one HDEV backed by an LDEV (or LUSE) carved from a VDEV on a parity group of physical disk drives.]

Going back to our original diagram for a moment, we can see that a single LDEV (subsystem internal logical volume) may be exposed as multiple different LUNs to hosts. The diagram above shows a single host with two paths to an HDEV.

Granting a host access to an HDEV (LDEV or LUSE) is done by creating a Logical Unit (LU) within a Host Storage Domain. This mapping from a host to an HDEV is sometimes called a path. A path consists of three parts: the physical port (such as port 1A), the logical port, or Host Storage Domain, within that physical port, and then the Logical Unit Number, or LUN, within that HSD. Host Storage Domains may be identified by name, or by relative number within the physical port. Note that it is possible to map a particular LDEV as one LU number in one HSD and a different LU number in a different HSD.

From the host's perspective, the host doesn't see the two-level structure of the physical port and HSD (logical port). When the HSD is configured, you tell the subsystem which host(s) is to be mapped to each HSD. (A host can only be mapped to one HSD on a given port.) Thus the host sees a series of logical units (LUs) behind the port, where each LU is identified by its LU number.

The diagram above also illustrates how multiple paths from a host to an HDEV are implemented. The key is the use of multipath software in the host, such as Hitachi's HDLM (Hitachi Dynamic Link Manager). The multipath software is a device driver which presents a single logical device for applications to use. But when I/Os are directed to that logical device, the multipath software device driver receives those I/O operations and reissues them to one or the other of the separate LUNs presented by each HBA's device driver. The HBA device drivers' LUNs are hidden from applications, and are only used by the multipath driver software. It is only the multipath software that knows that the (hidden) LUNs from the two HBAs really lead to the same HDEV inside the subsystem.

Correspondingly, at the subsystem end, when a host has multipath software running, the subsystem is not aware of that fact. The subsystem does not see I/O operations coming down separate paths as being related in any way. For this reason, the "extended round robin" host multipathing algorithm will produce better results for sequential I/O for active/active configurations. Extended round robin issues a series of I/O operations for a given multipath LUN over one of the paths before issuing the next series of operations over the other path. In this way, the subsystem's sequential detection algorithm, which is not aware of multipathing, will still see sequential host I/O as sequential at the individual subsystem port.
Recall that we said that LDEVs are carved from a logical container called a VDEV. There are four types of VDEV: internal, external, CoW, and Dynamic Provisioning (DP) VDEVs:

[Figure: The four types of VDEV. A Copy-on-Write Snapshot V-VOL group and a Dynamic Provisioning V-VOL group each map, by Pool ID, to a pool backed by pool volumes (LDEVs). An internal VDEV maps to a parity group. An external VDEV maps through a subsystem port to an LU on an external (virtualized) subsystem.]

Internal VDEV
The original type was the internal VDEV, which is a logical container that maps to the formatted space within a parity group, net of parity or mirror copies. Since LDEVs are carved from VDEVs, the size of the largest internal LDEV that can be created is one that entirely fills the VDEV that it is carved from.

There is a maximum size for internal VDEVs, which is about 2.999 TB. If there is more formatted space than this in the parity group, then as many maximum-size VDEVs as will fit are first allocated, then any remaining space is allocated to one more VDEV.

You can see this if your parity group has more than 3 TB of space in it. In the following example, we have a RAID-5 7+1 parity group consisting of 750GB drives.

The effective capacity of each drive is not quite 750 GB, but just as a rough estimation the total formatted capacity of the parity group is ~0.7 TB per drive * 7 drives' worth of formatted space = ~4.9 TB. This is bigger than the max internal VDEV size, so you can see above that two VDEVs were created to represent the formatted space on this parity group. Here you can see that the parity group name is 1-1, and the VDEV names on this parity group are 1-1(1) and 1-1(2). That's what those numbers in parentheses are at the end of a parity group name. They are the number of the VDEV within that parity group. Note that the LDEVs are within the VDEV, not the parity group.

While we are at it, parity groups have names like 1-1, 1-2, 3-1, and so on. These names have physical meaning, as they indicate the BED pair and physical drive location.

Thus the names for internal VDEVs are purely determined by the physical layout of the subsystem.

External VDEV
Each External VDEV maps to a LUN on an external (virtualized) subsystem. Thus each LUN on the external subsystem corresponds to the equivalent of a parity group. It is possible to create an LDEV on an External VDEV that is the exact size in sectors of the external LUN. This makes it handy to do replication or migration.

The names for external VDEVs look like either "E 1-1" or "1-1 #". Both forms of the name are equivalent, thus "E 1-1" and "1-1 #" both refer to the same external VDEV. In some places one form of the name is used, and in other places the other form is used.

Note that there is a separate namespace for each of the four types of VDEV, so that even if you have an internal parity group called 1-1, you can create an external VDEV called "E 1-1", and these are not the same.

The maximum size of an external VDEV is 4 TB.

CoW VDEV
VDEVs for Copy-on-Write Snapshot (CoW) are usually called V-VOL groups.

Each CoW VDEV (CoW V-VOL group) is fixed in size at 4 TB.

CoW V-VOL groups have names that look like "V 1-1" or "1-1 V". These two forms are equivalent, with one form being used in some places and the other form the rest of the time.

A CoW V-VOL group does not map to any storage. It is purely a logical container from which CoW V-VOLs (CoW LDEVs) may be carved. It is these CoW V-VOLs, each individually, that are mapped to CoW pools.

V-VOL stands for Virtual VOLume. This is because physical space is not provided for CoW V-VOLs when they are created. Instead, the LBA range representing the extent of the V-VOL is divided up into 42 MB pages, where the pages are tracked in a page table. A physical location for a 42 MB page within the assigned CoW pool is only assigned when that page in the V-VOL is written to. Thus pages that have never been written to have no storage assigned. This comes in handy as CoW V-VOLs are point-in-time snapshot ShadowImage secondary volumes, and thus you only need space for whatever data has changed on the primary volume.

The underlying space in a CoW pool is provided by pool volumes, which are normal internal or external LDEVs exclusively assigned for duty as pool volumes. The space on the pool volume LDEVs is broken into 42 MB pages, each starting at a RAID stripe boundary. Thus one page consists of a series of multiple RAID stripes.
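A conceptual sketch of that allocate-on-first-write mechanism (Python; the class and method names are hypothetical, purely for illustration):

    # A V-VOL tracks its 42 MB pages in a page table; a physical page in the
    # pool is assigned only the first time a given page is written.
    PAGE_MB = 42

    class Pool:
        def __init__(self):
            self.next_free_page = 0
        def allocate_page(self):
            page = self.next_free_page
            self.next_free_page += 1
            return page

    class VVol:
        def __init__(self, pool):
            self.pool = pool
            self.page_table = {}                 # V-VOL page index -> pool page

        def write(self, offset_mb):
            page = offset_mb // PAGE_MB
            if page not in self.page_table:      # first write to this page
                self.page_table[page] = self.pool.allocate_page()
            return self.page_table[page]         # physical pool page receiving the write

    vvol = VVol(Pool())
    print(vvol.write(100))   # 100 MB into the V-VOL -> page 2 -> pool page 0
    print(vvol.write(10))    # 10 MB into the V-VOL  -> page 0 -> pool page 1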
Odd note on pool volumes: when an LDEV is assigned as a CoW or Hitachi Dynamic Provisioning pool volume, the suffix "P" is shown on the name of the parity group that the pool volume LDEV is on. For example, if a normal LDEV 00:12:34 which is on parity group "1-1" is assigned as a CoW or Hitachi Dynamic Provisioning pool volume, from that point on, LDEV 00:12:34 will show as being on parity group "1-1 P". In this case, the "P" is really a marker to indicate special treatment as a pool volume, not that there is a parity group whose name is "1-1 P".

You can see this using the SVP configuration printout tool.

Dynamic Provisioning VDEV


DP VDEVs are also called V-VOL groups, although their names are written in the form "X 1-1" or "1-1 X".

Dynamic Provisioning is exactly like the case where a CoW secondary V-VOL has been mounted to a host for host access, and where the CoW primary is all zeros.

In fact, although any one pool must be either a CoW pool or a DP pool, there is only one namespace for CoW/DP pools. The maximum number of pools within a subsystem is 128.

Dynamic Provisioning V-VOLs are LDEVs carved from a DP V-VOL group (DP VDEV). These DP V-VOLs are usually called DPVOLs for short.

The mechanism is the same as for CoW. When a host writes to a 42MB page on a DPVOL that has never been written to before, a page is allocated from the pool, and from then on whenever the host reads or writes from that page, the access is directed to the assigned physical page in the pool.

DPVOL groups (VDEVs) are fixed in size at 4 TB.

When you look at the performance statistics for Dynamic Provisioning, what you see is that the DPVOLs (the virtual volumes) are the ones that the hosts access, so they have front-end I/O statistics only. On the other hand, pool volumes only have back-end I/O statistics, as they are the ones to which the actual back-end I/O is performed.

The maximum number of LDEVs within a Universal Storage Platform V or a Universal Storage Platform VM is 64K. This is also the maximum number of VDEVs in a subsystem, and therefore it is possible (and it is now required since microcode version 060400/00) to place each DPVOL by itself at the top of its own V-VOL group (DP VDEV).

In general, since LDEVs are carved from VDEVs, if there is contiguous unused space immediately following the LDEV within the VDEV, it is possible to expand the size of an existing LDEV into that free space, as shown in the following diagram:

[Figure: A VDEV containing several LDEVs followed by free space. An LDEV may be grown in place by extending into immediately following (contiguous) free space within the VDEV.]

During this expansion, any existing data in the LDEV stays in place, and all that happens is that the LDEV gets more space at the end. Note that appropriate operating system support is required for the host to notice that the volume got bigger, and adjust the partition table on the volume accordingly. Then an existing partition may be dynamically expanded, again if the operating system and file system support that.

This is the reason why each DPVOL is required to be placed within its own VDEV (V-VOL group). Since the maximum number of VDEVs is the same as the maximum number of LDEVs, by putting each DPVOL by itself at the top of its own V-VOL group (VDEV), the DPVOL can be dynamically expanded later if necessary.

Of course now we have the anomaly of a Dynamic Provisioning VDEV being called a "V-VOL group" even though it can only contain one V-VOL.

Layout of internal VDEVs on the parity group


Recall that the VDEV is the logical container from which LDEVs are carved, and that internal VDEVs represent the formatted space within a parity group.

This abstraction of a VDEV as a logical container allows operations that deal with LDEVs to be implemented largely independently of the type of VDEV, and thus LDEVs work the same way whether they are internal, external, CoW V-VOL, or DPVOL.


Thus it's in the internal type of VDEV where the mapping happens to the formatted space within a parity group.

In the next two diagrams, we see the VDEV layout on first RAID-1 (2D+2D) and then RAID-5 (3D+1P) parity group types.

Thus the VDEV is mapped to chunks on the disk drives. In the Universal Storage Platform family, the chunk size varies only with the emulation type, and not by the RAID level or number of drives in the parity group.

Note that the chunks are relatively big. About a hundred 4KB blocks fit within one chunk.

For random I/O, only the data requested by the host is read or written. If the host reads a 4KB block, only 4KB is read from the drive, not an entire chunk.

The time it takes to do a random I/O is composed of over 99% mechanical positioning and less than 1% data transfer. This is because a block size in the 4KB to 8KB range represents on the order of only 1% of a disk drive physical track size, and mechanical positioning (seek + latency) takes on average more than one full disk drive rotation.

That's why for random I/O workloads, all we care about is the disk drive seek time and RPM.

The chunk size is chosen so that the amount of time it takes to transfer a whole chunk (the size of the chunk divided by the read/write head sustained data transfer rate) is of the same order of magnitude as the mechanical positioning time, that is, seek and latency time.

It's designed this way because sequential I/O operations read or write an entire chunk at a time. With such big chunks, very roughly one half of the disk drive busy time when reading or writing sequentially is mechanical positioning and one half is data transfer.
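For a rough sense of scale, the arithmetic below (Python) compares chunk transfer time with mechanical positioning time. The 512 KB OPEN-V chunk size comes from the text; the sustained media transfer rate is an assumed round number, not a quoted drive specification.

    # Illustrative comparison of chunk transfer time vs. positioning time.
    chunk_kb = 512                        # OPEN-V chunk size (from the text)
    assumed_transfer_mb_per_s = 100       # assumption for a 15K RPM drive
    positioning_ms = 5.5                  # 15K read seek + latency (Table 1, 100% column)

    transfer_ms = chunk_kb / 1024 / assumed_transfer_mb_per_s * 1000
    print(f"transfer {transfer_ms:.1f} ms vs positioning {positioning_ms} ms")
    # -> ~5 ms vs 5.5 ms: a sequential chunk I/O is roughly half positioning,
    #    half data transfer.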

This makes sequential transfer relatively productive.

It also makes the I/O service time to read or write a chunk about twice as long in time as a small block random I/O. The mechanical positioning is the same for both, but for large block sequential I/O, after the mechanical positioning is complete it takes roughly one disk drive rotation's worth of time to transfer the chunk.

You might ask why not make the chunks even bigger? If transferring a 512KB chunk represents almost half of the total I/O service time, then why not make the chunks 1MB? That way you would get a 50% throughput increase for sequential I/O, because then almost 2/3 of the drive busy time during a sequential I/O would be data transfer.

The reason is that if a random read miss occurs while a sequential I/O (prefetch stage or write destage) is in progress, the host has to wait. Sequential I/O operations already take about twice as long as random I/Os, and if we made the chunks even bigger the host would have to wait even longer if a random read miss occurs during a sequential I/O. Thus there's a trade-off, as a larger chunk size would increase sequential throughput at the expense of adversely impacting random read performance.

Note also in the RAID-1 VDEV chunk layout diagram that the primary chunks (shown colored blue) and the mirror copy chunks (colored red) are in alternating positions from row to row. This is done because the mirror copies are normally write-only, as a backup. All read operations are directed to the primary copy unless the drive containing the primary copy has failed.

Thus the effect of the alternating primary and mirror chunks in the layout from row to row in RAID-1 tends to balance the number of read operations directed to each drive within the parity group.

Let's take a look at the chunk layout for a RAID-5 (3D+1P) VDEV in the Universal Storage Platform family:

You might ask "Why does the 2nd row have the chunks numbered in the order 4, 5, 3? This is confusing." The reason why is that, in the Universal Storage Platform family, sequential reads will read a chunk from each drive in the parity group in parallel. The layout above enables chunks 1, 2, 3, 4 to be read at once. In other words, sequential reads do not read one parity group stripe at a time. However, sequential writes are performed one stripe at a time (in this case 3 data blocks and one parity block), again performing one I/O to each drive in the parity group at once.
Note: The HDS AMS or modular family of subsystems uses a different RAID-5 chunk layout scheme, which is more what you might have expected. The analogous diagram for 3+1 for a modular subsystem has, in row 2 from left to right, chunks 3, 4, the parity for chunks 3-5, and chunk 5.
We said that the abstraction of a VDEV as a logical container allows operations that deal with LDEVs to be implemented largely independently of the type of VDEV. One small dependency on the type of VDEV is that LDEVs always start on a RAID stripe boundary. That is, the first chunk of an internal LDEV is always the first data chunk of a RAID stripe, or parity group row.

This is done to make it easy to calculate the physical location of where a particular LBA is stored. Each LDEV has an attribute which is the LBA address of where that LDEV starts on every drive in the parity group. Thus you can divide an LDEV LBA by the RAID stripe size (data-only stripe size, exclusive of parity or mirror copies), and the quotient gives you the relative stripe offset in chunks on the drive, relative to where the LDEV starts on each drive, and the remainder tells you where you are within the stripe and therefore which drive contains that LDEV LBA. (The number of the stripe on the drive also tells you which chunks on that stripe are the data chunks and which are the mirror copy/parity chunk(s).)
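A minimal sketch of that calculation (Python), assuming OPEN-V 512 KB chunks on a RAID-5 (3D+1P) parity group and an LDEV that starts on a stripe boundary. Mapping the data chunk index to a physical drive additionally depends on the rotating parity layout, which is not modeled here.

    # Locate an LDEV LBA within the data-only RAID stripes (sizes in 512-byte blocks).
    CHUNK_BLOCKS = 512 * 1024 // 512            # 1,024 blocks per 512 KB chunk
    DATA_CHUNKS_PER_STRIPE = 3                  # RAID-5 (3D+1P): data-only stripe of 3 chunks
    STRIPE_BLOCKS = CHUNK_BLOCKS * DATA_CHUNKS_PER_STRIPE

    def locate(ldev_lba):
        stripe_offset, within_stripe = divmod(ldev_lba, STRIPE_BLOCKS)
        data_chunk_index, block_in_chunk = divmod(within_stripe, CHUNK_BLOCKS)
        return stripe_offset, data_chunk_index, block_in_chunk

    print(locate(5_000))   # -> (1, 1, 904): second stripe, second data chunk, block 904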


Appendix 16. Concatenated Parity Groups

The Universal Storage Platform V family has exactly the following 4 base Parity Group types:
- RAID-1 (2D+2D)
- RAID-5 (3D+1P)
- RAID-5 (7D+1P)
- RAID-6 (6D+2P)

Note that there is no 4+4 as a base Parity Group type.

The following conceptual diagram, showing two Parity Groups that are not concatenated, will be used as the starting point to explain two-way concatenation:

Concatenated Parity Groups are derived from the base Parity Group types by interleaving their VDEVs. The following diagram shows a two-way concatenation, or interleave:


Two-way parity group concatenation (VDEV interleave)


NOTE: Concatenation does not create one big VDEV for both parity
groups. Each VDEV has its first stripe on the parity group that it is
named after. Thereafter, RAID stripes for a given VDEV alternate
between the parity groups in the concatenated set.

[Figure: VDEV 1-1-(1) and VDEV 3-1-(1) interleaved across parity groups 1-1 and 3-1.]

There are two types of two-way interleaved Parity Groups.

- The two-way interleave of RAID-1 (2D+2D) is what we usually call 4+4.
  o Note that there is a VDEV for each Parity Group that is the same size as the VDEV would have been without concatenation. Thus the biggest LDEV you can make on a 4+4 is the same size as the biggest LDEV you can make on a 2+2.
- The two-way interleave of RAID-5 (7D+1P) doesn't have a shorthand name, but if you write 2x(7+1) this is clear. Sometimes people write 14+2; if they are writing about the Universal Storage Platform V family you know that a 14+2 could only be 2x(7+1).


There is one type of four-way interleaved Parity Group.

- The four-way interleave of RAID-5 7+1 may be clearly written as 4x(7+1). If you see someone write 28+4, this is what they must be talking about.

Yes, the name is confusing.

You will sometimes see the terms "VDEV disperse" or "VDEV interleave". These are equivalent to the official name "concatenated Parity Groups". Sometimes people will talk about "LDEV striping", and here it is possible that they are talking about concatenated Parity Groups, because the LDEVs that are carved from a VDEV that is part of a concatenated Parity Group set are indeed striped across the Parity Groups that have been concatenated together.

The term "concatenated" is unfortunate, as many native English speakers would assume that this refers to something more like a LUSE. A colloquial English expression often used to describe LUSE is "spill and fill", because in a LUSE, reading or writing sequentially through the LUSE, one proceeds through each constituent LDEV in its entirety before moving to the next. Thus LUSE does not improve sequential throughput.

Advantages of concatenated Parity Groups


- Individual LDEVs are striped across the set of concatenated Parity Groups. This means that sequential performance is increased, as when staging or destaging sequential data, an I/O to read or write a chunk may be issued to all the drives in the concatenated Parity Group set at the same time.

- Random I/O operations are effectively dispersed across the set of concatenated Parity Groups. If small block random I/O operations are clustered to within up to a hundred blocks or so, they will likely go to the single disk drive that contains the chunk (512KB for OPEN-V) that contains those small blocks. This is true for any Parity Group type. However, the advantage of concatenated Parity Groups is that if random I/O is clustered within tens or hundreds of MB, these I/Os will be dispersed across the whole set of concatenated Parity Groups. In the case of a LUSE volume, random I/O clustered within 10s or 100s of megabytes will likely be concentrated on a single LDEV that is on a single Parity Group.


- Concatenated Parity Groups reduce to some degree individual Parity Group hot spots. The total activity across all the LDEVs on a set of concatenated Parity Groups is distributed more or less evenly across that set of Parity Groups. Thus this averages out Parity Group utilization across the concatenated set, and thus reduces average Parity Group busy. Reduced Parity Group busy yields better overall average response and offers the opportunity to increase overall throughput.

Note that Dynamic Provisioning offers the benefits of I/O activity dispersal across much larger numbers of Parity Groups.

Disadvantages of concatenated Parity Groups:


- Since I/O activity is dispersed over a concatenated set of Parity Groups, all LDEVs carved from VDEVs in the set are sharing the same performance characteristics. Thus there is less performance isolation. Normally dispersing activity is a good thing, but it should be pointed out that any of the LDEVs in the combined set have access to the full bandwidth of the disk drives in the set, and thus high-activity LDEVs will make the entire concatenated set of drives busy, and thus reduce the performance of the other LDEVs sharing those drives.

- Should a disk drive fail, performance could be affected for a larger set of LDEVs.

- Most people expect that when you concatenate Parity Groups together, you will be able to make bigger LDEVs. As we saw, concatenation does not alter the number or size of the VDEVs that would otherwise be assigned to the same Parity Groups if they were not concatenated together. To fix this problem, if you want the performance characteristics of LDEVs that are striped across a concatenated set of Parity Groups, but you also want a LUN that is as big as the combined formatted size of the Parity Groups in the set, then carve a single maximum-size LDEV out of each VDEV belonging to the set of concatenated Parity Groups, and LUSE these LDEVs together to make one LUN.

Using LDEVs on concatenated Parity Groups as Dynamic Provisioning pool volumes


Recall that DPVOLs are divided up into 42 MB pages, and that each individual page, when it has been written to, is assigned a location in the pool.

42 MB is fairly big, in the sense that the biggest RAID stripe is an OPEN-V RAID-5 (7D+1P) stripe, which has 7 data chunks of 512KB each for a total of 3.5MB of data.

What we thus see is that each 42 MB DP/CoW page is comprised, at the Parity Group level, of a series of RAID stripes. Each page starts on a RAID stripe boundary. The 42 MB size of a page was picked so as not to waste any space in any of the base Parity Group types. One 42 MB page is comprised of the following (see the sketch after this list):

- 42 stripes of a 2+2, where the stripe size is 2 x 512KB = 1MB.
- 28 stripes of a 3+1, where the stripe size is 3 x 512KB = 1.5MB.
- 12 stripes of a 7+1, where the stripe size is 7 x 512KB = 3.5MB.
- 14 stripes of a 6+2, where the stripe size is 6 x 512KB = 3MB.
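A one-line check of those counts (Python), dividing 42 MB by the data-only stripe size of each base Parity Group type:

    # 42 MB divided by the data-only stripe size gives whole stripes per page.
    for raid_type, stripe_mb in [("2+2", 1.0), ("3+1", 1.5), ("7+1", 3.5), ("6+2", 3.0)]:
        print(raid_type, 42 / stripe_mb)   # -> 42.0, 28.0, 12.0, 14.0 stripes per page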

We have not specifically tested the performance of LDEVs on concatenated Parity Groups used as DP/CoW pool volumes. However, there is a theoretical argument that says this might improve sequential performance. Since sequential performance is higher for normal LDEVs on concatenated Parity Groups due to the characteristic of reading/writing a chunk to all the drives in the concatenated set at once (reading/writing one round of interleaved RAID stripes at once), it would be worth trying to see if this also applies to sequential processing of DP/CoW pages. This may or may not provide a benefit in the context of DP/CoW sequential processing. This document may be revised in future to clarify this point, but for the time being this remains an open question.

Summary of the characteristics of concatenated Parity Groups (VDEV interleave):


- The size of the VDEVs is exactly the same as the size of VDEVs on similar Parity Group types that are not interleaved.

- No change to the maximum size of an LDEV.

- It is not possible to allocate a single LDEV that consumes all the space in all the Parity Groups being concatenated (interleaved) together. Instead, create a maximum-size LDEV on each VDEV associated with the set of concatenated Parity Groups and LUSE these LDEVs together.

- The I/O activity of each LDEV carved out of a concatenated Parity Group VDEV maps to the underlying RAID stripes comprising the VDEV, which means that the I/O activity spreads across all the drives in the concatenation set.

- Sequential I/O throughput is improved because you can read or write one chunk to each of the drives in the concatenated set in parallel.

- Random I/O operations benefit from having an LDEV's I/O operations distributed across a larger number of disk drives.

- RAID group performance isolation happens at the level of the concatenated set, as all the LDEVs in all the VDEVs in the concatenation set share the same drives.

- In addition to the base Parity Group types 2+2, 3+1, 7+1, and 6+2, concatenated Parity Groups yield the types 2x(2+2), which is commonly called 4+4, as well as 2x(7+1) and 4x(7+1).


Corporate Headquarters 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Asia Pacific and Americas 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Europe Headquarters Sefton Park, Stoke Poges, Buckinghamshire SL2 4HD United Kingdom
Contact Information: + 44 (0) 1753 618000 www.hds.com / info.uk@hds.com
Hitachi is a registered trademark of Hitachi, Ltd., and/or its affiliates in the United States and other countries. Hitachi Data Systems is a registered trademark and
service mark of Hitachi, Ltd. In the United States and other countries.
Microsoft is a registered trademark of Microsoft Corporation.
Hitachi Data Systems has achieved Microsoft Competency in Advanced Infrastructure Solutions.
All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, express or limited, concerning any equipment or service offered or
to be offered by Hitachi Data Systems. This document describes some capabilities that are conditioned on a maintenance contract with Hitachi Data Systems
being in effect, and that may be configuration-dependent, and features that may not be currently available. Contact your local Hitachi Data Systems sales office for
information on feature and product availability.
Hitachi Data Systems sells and licenses its products subject to certain terms and conditions, including limited warranties. To see a copy of these terms and
conditions prior to purchase or license, please go to http://www.hds.com/corporate/legal/index.html or call your local sales representatives to obtain a printed copy.
If you purchase or license the product, you are deemed to have accepted the terms and conditions.
Hitachi Data Systems Corporation 2009. All Rights Reserved
WHP-###-## September 2009

