June 2009
WARNING: This document is HDS internal documentation, provided for informational purposes only. It is not to be disclosed to or discussed with customers without a proper non-disclosure agreement (NDA).
Version | Date | Description
1.0 | December 2007 | Initial Release
1.1 | April 2008 | Updates
2.0 | February 2009 |
2.1 | June 2009 |
Reference
Hitachi Universal Storage Platform V Performance Summary 08212008
Contributors
The information included in this document represents the expertise, feedback, and suggestions of a
number of skilled practitioners. The author would like to recognize and thank the following reviewers
of this document:
Gil Rangel, Director of Performance Measurement Group - Technical Operations
Dan Hood, Director, Product Management, Enterprise Arrays
Larry Korbus, Director, Product Management, HDP, UVM, VPM features
Ian Vogelesang, Performance Measurement Group - Technical Operations
Table of Contents
Introduction
Glossary
Overview of Changes
Introduction
This document covers the architecture and concepts of the Hitachi Universal Storage Platform V. The concepts and physical details regarding each hardware feature are covered in the following sections. However, this document is not intended to cover any aspects of program products, databases, customer-specific environments, or new features available with the second general release. Areas not covered by this document include:

- TrueCopy/ShadowImage/Universal Replicator Disaster Recovery Solutions
- Host Logical Volume Management general guidelines
- Hitachi Dynamic Link Manager (HDLM) general guidelines
- Universal Volume Management (UVM) virtualization of external storage for Data Lifecycle Management (DLM)
- Virtual Partition Management (VPM) general guidelines for workload management
- Oracle general guidelines for storage configuration
- Microsoft Exchange general guidelines for storage configuration

This document is intended to familiarize Hitachi Data Systems sales personnel, technical support staff, customers, and value-added resellers with the features and concepts of the Universal Storage Platform V. The users who will benefit most from this document are those who already possess an in-depth knowledge of the Hitachi TagmaStore Universal Storage Platform architecture.
Glossary
Throughout this paper, the terminology used by Hitachi Data Systems (rather than Hitachi) will normally be used. As some storage terminology is used differently in Hitachi documentation or by users in the field, here are some definitions as used in this paper:

Array Group: The term used to describe a set of four physical disk drives installed as a group in the subsystem. When a set of one or two Array Groups (four or eight disk drives) is formatted using a RAID level, the resulting formatted entity is called a Parity Group. Although technically the term Array Group refers to a group of bare physical drives, and the term Parity Group refers to something that has been formatted as a RAID level and therefore actually has parity data (here we consider a RAID10 mirror copy as parity data), be aware that this technical distinction is often lost and you will see the terms Parity Group and Array Group used interchangeably.
BED: Back-end Director feature; the pair of Disk Adapter (DKA) PCBs providing 4 pairs of back-end Fibre Channel loops. Pairs of features (4 PCBs, or 8 pairs of loops) are always installed.

CHA: Hitachi's name for the FED.

CMA: Cache Memory Adapter board (8 x 1064 MB/sec ports, 4 banks of RAM using 16 DDR2 DIMM slots).

Concatenated Parity Group: A configuration where the VDEVs corresponding to a pair of RAID10 (2D+2D) or RAID5 (7D+1P) Parity Groups, or a quad of RAID5 (7D+1P) Parity Groups, are interleaved on a RAID-stripe-by-RAID-stripe, round-robin basis on their underlying disk drives. This has the effect of dispersing I/O activity over twice or four times the number of drives, but it does not change the number, names, or size of VDEVs, and hence it does not make it possible to assign larger LDEVs. Note that we often refer to RAID10 (4D+4D), but this is actually two RAID10 (2D+2D) Parity Groups interleaved together. For a more comprehensive explanation see the Concatenated Parity Group section in the appendix.

CSW: Cache Switch board (16 x 1064 MB/sec ports), a 2D version of a Hitachi supercomputer 3D crossbar switch with nanosecond latencies (port to port).

DKA: Hitachi's name for the BED.

DPVOL: A Dynamic Provisioning (DP) Volume; the Virtual Volume from an HDP Pool. Some documents refer to this as a VVOL. It is a member of a VVOL Group. Each DPVOL has a user-specified size between 8GB and 4TB.

Feature: An installable hardware option (such as FED, BED, CSW, CMA, SMA) that includes two PCBs (one per power domain). Note that two BED features must be installed at a time.

FED: Front-end Director feature; the pair of Channel Adapter (CHA) PCBs used to attach hosts to the storage system, providing Open FC, FICON, or ESCON attachment.

HDU (Hard Disk Unit): The 64-disk container in a frame that has 32 disk slots on the front side and 32 more on the back. The HDU is further split into left and right halves, each on separate power domains. There are up to four HDUs per expansion frame, and two in the control frame.

LDEV (Logical Device): A logical volume internal to the subsystem that can be used to contain customer data. LDEVs are uniquely identified within the subsystem using a six-hex-digit identifier in the form LDKC:CU:LDEV. LDEVs are carved from a VDEV (see VDEV), and thus there are four types of LDEVs: internal LDEVs, external LDEVs, COW VVOLs, and DPVOLs. LDEVs may be mapped to a host as a LUN, either as a single LDEV, or as a set of up to 36 LDEVs combined in the form of a LUSE. Note: what is called an LDEV in HDS enterprise subsystems like the Universal Storage Platform V is called an LU or LUN in HDS modular subsystems like the AMS family, and what is called a LUN in HDS enterprise subsystems is called an H-LUN in HDS modular subsystems.

LUN (Logical Unit Number): The host-visible identifier assigned by the user to an LDEV to make it visible on a host port. An internal LUN has a nominal Queue Depth limit of 32, and an external (virtualized) LUN has a Queue Depth limit of 2-128 (adjustable).
eLUN (external LUN): An external LUN is one which is located in another storage array that is attached via two or more ePorts on a Universal Storage Platform V and accessed by the host through the Universal Storage Platform V.

ePort (external Port): An external array connects via two or more Fibre Channel FED ports on the Universal Storage Platform V instead of to a host. The FED ports used in this manner are changed from a host target into an initiator port (or external Port) by use of the Universal Volume Manager software product. The Universal Storage Platform V will discover any exported LUNs on each ePort, and these will be configured as eLUNs used by hosts attached to the Universal Storage Platform V.

LUSE (Logical Unit Size Expansion): A concatenation (spill and fill) of 2 to 36 LDEVs (up to a 60TB limit) that is then presented to a host as a single LUN. A LUSE will normally perform at the level of just one of these LDEVs.

MP (microprocessor): The CPU used on the FEDs and BEDs. Also called CHP (FED) or DKP (BED).

Parity Group: A set of one or two Array Groups (a set of 4 or 8 disk drives) formatted as a RAID level, either as RAID10 (often referred to as RAID1 in HDS documentation), RAID5, or RAID6. The Universal Storage Platform V's Parity Group types are RAID10 (2D+2D), RAID5 (3D+1P), RAID5 (7D+1P), and RAID6 (6D+2P). Internal LDEVs are carved from the VDEV(s) corresponding to the formatted space in a Parity Group, and thus the maximum size of an internal LDEV is determined by the size of the VDEV it is carved from. The maximum size of an internal VDEV is approximately 2.99TB. If the formatted space in a Parity Group is bigger than 2.99TB, then as many maximum-size VDEVs as possible are assigned, and then the remaining space is assigned as the last, smaller VDEV. Note that there actually is no 4+4 Parity Group type; see Concatenated Parity Group.

PCB: Printed circuit board; an installable board (adapter). There are two PCBs per Feature.

PDEV (Physical Device): A physical internal disk drive.

SMA: Shared Memory Adapter board (64 x 150MB/sec ports, 2 banks of RAM using 8 DDR2 DIMM slots, up to 8GB).

SVP: Service Processor (a PC running Windows XP) installed in the control rack.

VDEV: The logical container from which LDEVs are carved. There are four types of VDEVs:

- Internal VDEV (2.99TB max): maps to the formatted space within a Parity Group that is available to store user data. LDEVs carved from a Parity Group VDEV are called internal LDEVs.
- External storage VDEV (2.99TB max): maps to a LUN on an external (virtualized) subsystem. LDEVs carved from external VDEVs are called external LDEVs.
- Copy-on-Write (CoW) VDEV (2.99TB max): called a "VVOL group", and LDEVs carved from a CoW VVOL group are called CoW VVOLs.
- Dynamic Provisioning (DP) VDEV (4TB max): called a "VVOL group", and LDEVs carved from a DP VVOL group are called DPVOLs (Dynamic Provisioning Volumes).

VVOL Group: The organizational container of either a Dynamic Provisioning VDEV or a Copy-on-Write VDEV. With Dynamic Provisioning, it is used to hold one or more DPVOLs.
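As a small illustration of the Parity Group and VDEV definitions above, the Python sketch below (an illustration only; the helper name is hypothetical, and the 2.99TB internal VDEV ceiling is the approximate figure from this glossary) splits a Parity Group's formatted capacity into maximum-size VDEVs plus one smaller remainder VDEV.

```python
# Sketch only: split a Parity Group's formatted capacity into VDEVs,
# using the approximate 2.99TB internal VDEV ceiling described above.
MAX_INTERNAL_VDEV_GB = 2990   # assumption: ~2.99TB expressed in GB for illustration

def carve_vdevs(formatted_capacity_gb):
    """Return the VDEV sizes (in GB) carved from one Parity Group."""
    vdevs = []
    remaining = formatted_capacity_gb
    while remaining > MAX_INTERNAL_VDEV_GB:
        vdevs.append(MAX_INTERNAL_VDEV_GB)   # as many maximum-size VDEVs as fit...
        remaining -= MAX_INTERNAL_VDEV_GB
    if remaining > 0:
        vdevs.append(remaining)              # ...then the remaining space as the last, smaller VDEV
    return vdevs

# Example: a RAID5 (7D+1P) Parity Group of 450GB drives (about 7 x 410.4GB of formatted space).
print(carve_vdevs(7 * 410.4))   # -> [2872.8] : one VDEV, since it is under the ceiling
# A 7D+1P group of 1TB SATA drives (about 7 x 927.2GB) exceeds the ceiling:
print(carve_vdevs(7 * 927.2))   # -> two maximum-size VDEVs plus a ~510GB remainder VDEV
```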
Overview of Changes
Expanding on the proven and superior Hitachi TagmaStore Universal Storage Platform technology, the Universal Storage Platform V offers a new level of Enterprise Storage, capable of meeting the most demanding of workloads while maintaining great flexibility. The Universal Storage Platform V offers much higher performance, higher reliability, and greater flexibility than any competitive offering today. These are the new features that distinguish the completely revamped Universal Storage Platform V from the previous Universal Storage Platform models:
Software
- Hitachi Dynamic Provisioning (HDP) volume management feature.

Hardware
- Enhanced Shared Memory system (up to 24GB and 256 paths @ 150MB/s).
- Faster, newer-generation 800MHz NEC RISC processors on FEDs (Channel Adapters) and BEDs (Disk Adapters).
- FED MPs have an MP Workload Sharing function (per PCB).
- BEDs have 4Gbit/s back-end disk loops.
- Switched FC-AL loop interface between BEDs and disks.
- Half-sized PCBs are now used (except for Shared Memory PCBs), allowing for more flexible FED configuration combinations.
- Support for internal 1TB 7200 RPM SATA II disks.
- Support for Flash Drives.
Software Overview
The Universal Storage Platform V software includes Hitachi Dynamic Provisioning, a major new Open Systems volume management feature that allows storage managers and system administrators to more efficiently plan and allocate storage to users or applications. This new feature provides two new capabilities: thin provisioning and enhanced volume performance. Hitachi Dynamic Provisioning provides for the creation of one or more Hitachi Dynamic Provisioning Pools of physical space (each Pool assigned multiple LDEVs from multiple Parity Groups of the same disk types and RAID level), and for the establishment of DP volumes (DPVOLs, or virtual volumes) that are connected to a single Hitachi Dynamic Provisioning Pool.

Thin provisioning comes from the creation of DPVOLs of a user-specified logical size without any corresponding allocation of physical space. Actual physical space (allocated as 42MB Pool pages) is only assigned to a DPVOL from the connected Hitachi Dynamic Provisioning Pool as that DPVOL's logical space is written to by the host over time. A DPVOL does not have any Pool pages assigned to it when it is first created. Technically, it never does; the pages are loaned out from its connected Pool to that DPVOL until the volume is deleted from the Pool. At that point, all of that DPVOL's assigned pages are returned to the Pool's Free Page List. Certain individual pages can be freed or reclaimed from a DPVOL using facilities of the Universal Storage Platform V.
The volume performance feature is an automatic result of the manner in which the individual Hitachi Dynamic Provisioning Pools are created. A Pool is created using 2 to 1024 LDEVs (Pool Volumes) that provide the physical space, and the Pool allocates 42MB Pool pages on demand to any of the DPVOLs connected to that Pool. Each individual 42MB Pool page is consecutively laid down on a whole number of RAID stripes from one Pool Volume. Other pages assigned over time to that DPVOL will randomly originate from the next free 42MB page from one of the other Pool Volumes in that Pool.
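The round-robin page allocation described above can be pictured with the short sketch below. It is an illustrative model only: the 42MB page size comes from the text, while the class, method names, and free-page-list construction are hypothetical.

```python
from collections import deque

PAGE_MB = 42  # HDP Pool page size from the text

class HdpPool:
    """Toy model of an HDP Pool: pages are handed out from a free-page list
    that interleaves pages from all of the Pool Volumes."""
    def __init__(self, pool_volume_gb, volumes):
        pages_per_vol = (pool_volume_gb * 1024) // PAGE_MB
        # Build the free-page list so consecutive entries come from different Pool Volumes.
        self.free_pages = deque(
            (vol, page) for page in range(pages_per_vol) for vol in volumes
        )

    def allocate_page(self, dpvol):
        vol, page = self.free_pages.popleft()   # next free page, whichever Pool Volume it is on
        return {"dpvol": dpvol, "pool_volume": vol, "page_index": page}

pool = HdpPool(pool_volume_gb=288, volumes=["PV-1", "PV-2", "PV-3", "PV-4"])
# The first host writes to a DPVOL land on different Pool Volumes in turn:
for _ in range(4):
    print(pool.allocate_page("DPVOL-00"))
```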
As an example, assume that there are 12 LDEVs from twelve RAID10 (2D+2D) Parity Groups assigned to a Hitachi Dynamic Provisioning Pool. All 48 disks in that Pool will contribute their IOPS and throughput power to all of the DPVOLs connected to that Pool. If more random read IOPS horsepower were desired for that Pool, then it could have been, for example, created with 16 LDEVs from 16 RAID5 (7D+1P) Parity Groups, thus providing 128 disks of IOPS power to that Pool.

As up to 1024 LDEVs may be assigned as Pool Volumes to a single Pool, this would provide a considerable amount of I/O power to that Pool's DPVOLs. This type of aggregation of disks was previously only possible through the use of often expensive and somewhat complex host-based volume managers (such as VERITAS VxVM) on each of the attached servers. On the Universal Storage Platform (and the Universal Storage Platform V), the only alternative would be to build Striped Parity Groups (Concatenated Parity Groups; see Appendix 16) using two or four RAID5 (7D+1P) Parity Groups. This would provide either 16 or 32 disks under the volumes (VDEVs) created there. There is also the LUSE option, but that is merely a simple concatenation of 2 to 36 LDEVs, a spill-and-fill capacity (not performance) configuration.
A common question is: "How does the performance of DPVOLs differ from the use of Standard Volumes when using the same number of disks?" Consider this example. Say that you are using 32 disks in 8 Parity Groups as RAID10 (2D+2D) with 8 LDEVs, all used as Standard Volumes over 4 host paths. Compare this to a Hitachi Dynamic Provisioning Pool with those same 8 LDEVs as Pool Volumes, with 8 DPVOLs configured against that Pool and used over 4 host paths. If the hosts are applying a heavy, uniform, concurrent workload to all 8 Standard Volumes, then the server will see about the same aggregate IOPS capacity as would be available from the 8 DPVOLs with the same workload. However, if the workloads per volume (Standard or DPVOL) are not uniform, or see intermittent workloads over time, the DPVOLs will always deliver a constant and much higher IOPS capacity than will the individual Standard Volumes. If only four volumes (Standard or DPVOL) were simultaneously active, then the four Standard Volumes would only have a 16-disk IOPS capacity, while the four DPVOLs would always have the full 32-disk IOPS capacity. Another way to see this is that the use of Standard Volumes can easily lead to hot spots, whereas the use of DPVOLs will mostly avoid them.
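A back-of-the-envelope version of the 32-disk example above is sketched below; the per-disk IOPS figure is an assumption (the 15k RPM rule-of-thumb value quoted later in this document) used only to make the arithmetic concrete.

```python
# Illustrative arithmetic for the 32-disk example above.
DISKS_TOTAL = 32          # 8 x RAID10 (2D+2D) Parity Groups
DISKS_PER_GROUP = 4
IOPS_PER_DISK = 180       # assumption: rule-of-thumb 15k RPM figure from Table 9 later in this document

active_volumes = 4        # only half of the 8 volumes are busy at this moment

# Standard Volumes: each active volume can only draw on its own Parity Group's disks.
standard_iops = active_volumes * DISKS_PER_GROUP * IOPS_PER_DISK   # 4 x 4 x 180 = 2880

# DPVOLs: every active volume draws pages, and therefore IOPS, from all 32 disks in the Pool.
dpvol_iops = DISKS_TOTAL * IOPS_PER_DISK                            # 32 x 180 = 5760

print(standard_iops, dpvol_iops)
```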
Hardware Overview
Figure 1. Frame and HDU Map

[Figure 1 maps the five frames: the control frame (DKC) in the centre with its Logic Box and two HDUs, flanked by expansion frames DKU-L2, DKU-L1, DKU-R1, and DKU-R2 with up to four HDUs each. Each HDU is split into left and right halves (for example HDU-1L and HDU-1R), each half holding 16 HDDs on the front and 16 HDDs on the rear.]
While there are many physically apparent changes to the Universal Storage Platform V chassis and PCB cards from the previous Universal Storage Platform model, there are also a number of not-so-evident internal configuration changes that an SE must be aware of when laying out a system.

The Universal Storage Platform V is a collection of frames, HDUs, PCB cards in a Logic Box, FC-AL switches, and disks (see Figure 1). The frames include the control frame (DKC) and the disk expansion frames (DKUs). Disks are added to the 64-disk HDU containers (up to 18 such HDUs) in sets of four (the Array Group). Array Groups are installed (following a certain upgrade order) into specific HDU disk slots on either the left or right half (such as HDU-1R or HDU-1L) and front and rear of an HDU box. Sets of HDUs are controlled by one of the BEDs.
Processor Upgrade
The processor (MP) used on the FEDs (the MPs are also called CHPs) and BEDs (the MPs are also called DKPs) has been improved and its clock speed has been doubled. The quantities of processors represented in Table 1 are values per PCB by feature. A feature is defined as a pair of PCBs where each board is located on a separate power boundary. As there are twice as many FED/BED features in the Universal Storage Platform V as compared to the Universal Storage Platform, the overall processor count for FEDs and BEDs remains the same but the available processing power has been doubled.
Table 1. Universal Storage Platform V Processor Enhancements (MPs per PCB)

Feature | USP | USP V
FED ESCON 8-port | - | -
FED ESCON 16-port | 2 x 400MHz | 4 x 400MHz
FED FICON 8-port | - | 4 x 800MHz
FED FICON 16-port | 8 x 400MHz | 4 x 800MHz
FED FC 8-port | - | 4 x 800MHz
FED FC 16-port | 4 x 400MHz | 4 x 800MHz
FED FC 32-port | 8 x 400MHz | -
BED FCAL 8-port | - | 4 x 800MHz
BED FCAL 16-port | 8 x 400MHz | -
- CSW: Cache Switch, 1 to 4 features (2 to 8 PCBs)
- CMA: Cache Memory, 1 to 4 features (2 to 8 PCBs)
- SMA: Shared Memory, 1 or 2 features (2 to 4 PCBs)
- FED: Front-end Director (or Channel Adapter), 1 to 14 features (2 to 28 PCBs)
- BED: Back-end Director (or Disk Adapter), 2, 4, 6, or 8 features (4, 8, 12, or 16 PCBs) of 4Gb/sec Fibre Channel loops
The Universal Storage Platform V's new half-sized PCBs (now installed in upper and lower, front and rear slots in the Logic Box, described later) allow for a less costly, more incremental expansion of a system. Also, the BED slots will support the use of additional FED PCBs (but 2 BED features must be installed at a minimum). For example, there could be 1 to 4 FED features (2 to 8 PCBs) installed in a Universal Storage Platform, and they could be any mixture of Open Fibre, ESCON, FICON, and iSCSI. However, this gave you a large number of ports of a single type that you may not need, with a substantial reduction of other port types that you may need to maximize. With the new half-sized cards, you can have 1 to 8 FED features (or up to 14 FED features if the number of BED features is reduced to just 2), using any mixture (per feature) of the interface types as before. Now, as there are half as many ports per board, smaller numbers of lesser-used port types may be installed. Features are still installed as pairs of PCB cards just as with the Universal Storage Platform.
Table 2. Summary of Limits, Universal Storage Platform to Universal Storage Platform V

Limits | USP | USP V
Data Cache (GB) | 128 | 512
Raw Cache Bandwidth | 68 GB/sec | 68 GB/sec
Shared Memory (GB) | 12 | 24
Shared Memory Paths (max) | 192 | 256
Raw Shared Memory Bandwidth | 15.9 GB/sec | 38.4 GB/sec
Fibre Channel Disks | 1152 | 1152
SATA Disks | - | 1152
Logical Volumes | 16k | 64k
Max Internal Volume Size | 2TB | 2.99TB
Max CoW Volume Size | 2TB | 4TB
Max External Volume Size | 2TB | 4TB
IO Request Limit per FC FED MP | 2048 | 4096
Nominal Queue Depth per LUN | 16 | 32
HDP Pools | - | 128
Max Pool Capacity | - | 1.1PB
Max Capacity of All Pools | - | 1.1PB
LDEVs per Pool | - | 1024
DP Volumes per Pool | - | 8192
DP Volume Size Range | - | 46MB to 4TB
Other physical Universal Storage Platform V changes include:

- The associations between some BEDs and HDUs have changed.
- The associations between FEDs, BEDs, and the CSWs are different.
- The locations of 50% of the Array Group names have shifted around.
- It takes two BED features (4 PCBs) to support the 8-disk Parity Groups (just as was the case on the Lightning models). Hence, BEDs must be installed as pairs of features (4 PCBs).
Architecture Details
At the core of the Universal Storage Platform V is a fourth-generation, non-blocking, high-speed switched architecture. The Universal Star Network is a fully fault-tolerant, high-performance, true non-blocking switched architecture. These state-of-the-art Hitachi supercomputer-derived crossbar switches are described in more detail below. There are no specific models as there were with the Universal Storage Platform products. There is now a single base model onto which upgrade features are applied to meet the needs of the customer.
Figure 2. Universal Storage Platform V System Components and Bandwidths Overview

[Figure 2 shows the fully optioned USP V Universal Star Network: Shared Memory (base plus option 1, 8-12GB each) connected by 128 control ports x 150 MB/s (19.2 GB/s); four Cache Memory features (base plus options 1-3, 8-128GB each) reached over 64 concurrent cache paths at 1064 MB/s (68 GB/s); eight 16-port crossbar Cache Switches (CSW 1-8, in base/option pairs) with 4-32 cache paths (34 GB/s) on each side; eight BED features and eight FED features (base plus options 1-7); 8-64 4Gbit/s disk loops serving up to 1152 disks; and 8-224 4Gbit/s FC host ports.]
For quick reference, Figure 2 above depicts the fully optioned Universal Storage Platform V architecture. Other Universal Storage Platform V configurations are detailed further down. This architecture includes 64 x 1064 MB/sec data paths, representing 68 GB/sec of data bandwidth, and 256 x 150 MB/sec control paths, representing 38.4 GB/sec of metadata and control bandwidth. When comparing the Universal Storage Platform V to other vendors' monolithic cache systems (such as the EMC DMX-4), the aggregate of the Universal Storage Platform V's Data + Control bandwidths must be used for an apples-to-apples comparison. The throughput aggregate for the fully optioned system is a fully usable 106.4 GB/s. This includes any overhead for block mirroring operations that occur in both the Data Cache and Shared Memory systems (discussed further down).
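The quoted bandwidth figures reduce to simple multiplication; the sketch below just reproduces them.

```python
# Reproducing the quoted fully-optioned bandwidth figures.
data_paths, data_path_mb_s = 64, 1064      # cache (data) paths
ctrl_paths, ctrl_path_mb_s = 256, 150      # shared memory (control) paths

data_bw_gb_s = data_paths * data_path_mb_s / 1000    # ~68 GB/s of data bandwidth
ctrl_bw_gb_s = ctrl_paths * ctrl_path_mb_s / 1000    # 38.4 GB/s of control bandwidth
print(data_bw_gb_s, ctrl_bw_gb_s, data_bw_gb_s + ctrl_bw_gb_s)   # ~68 + 38.4 = ~106.4 GB/s aggregate
```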
Features | USP | USP V
FEDs | 1-7 | 1-14
BEDs | 1-4 | 2, 4, 6, 8
Cache | 2 | 4
Shared Memory | 2 | 2
Cache Switches | - | 1-4
[The accompanying table shows the cumulative host port count as FED features 1 through 14 are added, and the CSW pairs (1&2, 3&4, 5&6, 7&8) they attach to. With 16-port FED features the total grows from 16 ports (one feature) to 224 ports (fourteen features); with 8-port FED features it grows from 8 ports to 112 ports.]
the front and back layout. The associations among BEDs, FEDs, and the CSWs (cache switches) are also different. The factory names of the PCB types and their option numbers are shown as well. Note that FED upgrade features may consume up to six pairs of unused BED slots.
Figure 3. Logic Box Slots (BED slots 3-8 can be used for FEDs if desired)
[Figure 3 maps the Logic Box slots, upper and lower, for Cluster-1 (front) and Cluster-2 (rear). Each slot has a factory name (such as 1AU, 1DU, 2QL, 2SD) and an option number, and is assigned to a BED (BED-1 through BED-8), FED (FED-1 through FED-8), CSW (CSW-0 through CSW-3), CMA (CMA-1 through CMA-4), or SMA (SMA-1, SMA-2) board position.]
- Shared Memory (Control Memory) system for metadata and control tables
- Data cache (data blocks only)
- Local RAM pool on each FED and BED PCB (up to 32 such PCBs in an array) for use by the four MP processors on each board (128 processors in a full array). This RAM is used as the workspace for each MP, for local data block caching, and for local LUN management tables.
- NVRAM region for each MP that holds the microcode and optional software packages.
Each SMA PCB has 8 DDR2-333 DIMM slots, organized as two separate banks of RAM (4 slots each). A single board can support up to 8GB of RAM when using either 1GB or 4GB DIMMs. The pair of boards for each SMA feature is installed in different power domains in the Logic Box (Cluster 1 and Cluster 2). Each PCB has 64 x 150MB/s SMA (control) paths. This provides for 9.6GB/s of bandwidth (wire speed) per board. Each bank of RAM has a peak bandwidth of 5.19GB/s, or a concurrent total of 10.38GB/s per board. Due to the very high speed of the RAM compared to the individual SMA ports, with port buffering and port interleaving, the RAM allows the 64 SMA paths to operate at full speed.
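The per-board SMA figures quoted above follow from the port and RAM-bank counts; the sketch below reproduces that arithmetic.

```python
# Reproducing the per-board SMA figures quoted above.
sma_ports, port_mb_s = 64, 150
port_bw_gb_s = sma_ports * port_mb_s / 1000   # 9.6 GB/s of wire-speed port bandwidth per board

ram_banks, bank_gb_s = 2, 5.19
ram_bw_gb_s = ram_banks * bank_gb_s           # 10.38 GB/s of concurrent RAM bandwidth per board

# RAM bandwidth exceeds the aggregate port bandwidth, so the 64 paths can run at full speed.
print(port_bw_gb_s, ram_bw_gb_s)
```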
For each SMA feature, there is a small region that is mirrored onto the other board in that pair for redundancy. (Note: The region used for Hitachi Dynamic Provisioning control is backed up onto a private disk region on some Pool Volumes.) The Cluster 1 SMA boards are associated with the Cache-A region of the Data Cache (explained below), while the Cluster 2 boards are associated with the Cache-B region.

The basic configuration information for the array is located in the lower memory address range of Shared Memory. This is also backed up in the NVRAM located on each FED and BED PCB in the system. The metadata and tables for optional software products (such as Hitachi Dynamic Provisioning) in the other (higher) memory address regions in Shared Memory are backed up to the internal disk in the Service Processor (SVP PC) at array power-down.
Adding more CSW features does not affect the performance of existing FED/BED features.
Each CSW board can sustain 8 x 1064 MB/s transfers between the 8 FED/BED paths and the 8 cache ports, for an aggregate of 8.5 GB/s per board. As there may be up to four CSW features installed per system (8 CSW PCBs), there can be 64 cache-side ports switching among 64 FED/BED processor ports. This provides for a total of 68 GB/s per array of non-blocking, data-only bandwidth to cache from the FED/BED boards at extremely small (nanosecond) latency rates. No other vendor has a non-blocking design (despite some unsupportable claims to the contrary), nor are they able to provide sustained data-only bandwidth at anywhere near this rate. This bandwidth is not constrained by the type of I/O (read versus write) since all cache paths are full speed and bidirectional.
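The CSW figures quoted above likewise follow from the path counts; the sketch below reproduces the arithmetic.

```python
# Reproducing the CSW figures quoted above.
paths_per_board, path_mb_s = 8, 1064
board_gb_s = paths_per_board * path_mb_s / 1000     # ~8.5 GB/s per CSW board

boards = 8                                           # 4 CSW features x 2 PCBs
total_cache_ports = boards * paths_per_board         # 64 cache-side ports
total_gb_s = boards * board_gb_s                     # ~68 GB/s of data-only bandwidth to cache
print(board_gb_s, total_cache_ports, total_gb_s)
```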
Figure 5 illustrates a Universal Storage Platform V array with only two CSW and two CMA features installed. If all four CSW and CMA features (8 boards each) were installed, then each cache-side path on a CSW PCB would go to one of the eight CMA PCBs. Every CSW has at least one path to every CMA. Farther down in this document are maps of the associations of FEDs and BEDs to the CSWs. There is a certain relationship from the CSW processor side to the FED and BED CMA ports.
Figure 5. Universal Storage Platform V with Two Cache Features and Two Cache Switch Features Installed.

[The figure shows four Cache boards on the cache side connected through four CSW boards to the FED/BED processor side.]
The data cache is logically divided up into logical 256KB cache slots across the entire linear cache space (the concatenation of Cache-A and Cache-B). It is a single uniform address space rather than a set of isolated partitioned spaces where only certain devices are mapped to individual regions on independent cache boards.

When created, each Universal Storage Platform V LDEV (a partition from a Parity Group) is allocated an initial set of 16 cache slots for random I/O operations. A separate set of 24 cache slots is allocated when a FED MP detects sequential I/O operations occurring on that LDEV. The initial set of 16 cache slots (per LDEV) used for processing random I/O requests can be temporarily expanded by one or more additional sets of 16 random-I/O 256KB slots as needed (if available from the associated cache partition's free space).

Since individual LDEVs can dynamically grow to very large sizes in terms of their cache footprint based on the workloads, Cache Partitioning (using the Virtual Partition Management package) can be used to fence off the cache space usable by selected Parity Groups (with all of their LDEVs) to manage their maximum allocated random slot sets.
Initial Random Cache Allocation per LDEV

Element | Count
256KB Cache Slots | 16
64KB Read Segments | 32
64KB Write Segments | 32
16KB Read Sub-segments | 128
16KB Write Sub-segments | 128
2KB Read Cache Blocks | 512
2KB Write Cache Blocks | 512
The table above shows the overall breakdown of elements for a set of 16 random-I/O cache slots for one LDEV. Figures 8-10 below show the relationship among these elements.
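The segment and sub-segment counts in the table follow mechanically from the slot geometry shown in Figure 8; the sketch below derives them from the 16-slot allocation (a sketch of the geometry, not firmware logic).

```python
# Deriving the segment and sub-segment rows of the table above from the slot geometry.
slots = 16                       # initial random-I/O cache slots per LDEV
slot_kb = 256
read_seg_kb = write_seg_kb = 64
subseg_kb = 16

segs_per_slot = slot_kb // (read_seg_kb + write_seg_kb)             # 2 read + 2 write segments per slot
read_segments = write_segments = slots * segs_per_slot              # 32 of each, 64KB apiece
read_subsegments = read_segments * (read_seg_kb // subseg_kb)       # 128 x 16KB read sub-segments
write_subsegments = write_segments * (write_seg_kb // subseg_kb)    # 128 x 16KB write sub-segments
print(read_segments, write_segments, read_subsegments, write_subsegments)   # 32 32 128 128
```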
Figure 8. Random I/O: Logical Cache Slot and Segments Layout

[Figure 8 shows a 256KB random cache slot divided into 64KB read segments and 64KB write segments, each made up of 16KB sub-segments.]

Figure 10. Random I/O: Logical Sub-segments and Physical Cache Blocks Layout

[Figure 10 shows a 16KB sub-segment composed of 2KB physical cache blocks.]
It is because of these small 2KB cache block allocation units that the Universal Storage Platform V does so well with very small block sizes and random workloads. In general, random workloads are usually associated with application block sizes of 2KB to 32KB. Other vendors use a single fixed cache allocation size (such as 32KB or 64KB for EMC DMX) for individual I/O operations. Hence, a 2KB host I/O will waste 30KB or 62KB of each EMC DMX cache slot. On the IBM DS8000 series, the cache allocation unit is 4KB, thereby avoiding wasting cache space. But the DS8000 arrays are actually general-purpose servers running the AIX operating system, using the local shared RAM on individual processor books. On both EMC and IBM, their global cache contains the operating system space, all storage control and metadata, all software, and all data blocks. On the Universal Storage Platform V, the CMA cache system is only used for user data blocks.
Write Operations
In the case of host writes, the individual data blocks are mirrored (duplexed) to another CMA board on a different power boundary (i.e., Cache-A and Cache-B). The mirroring is only for the actual user data blocks, unlike the static mirroring of 50% of cache to the other 50% of cache as is used by EMC for the DMX-3 and DMX-4.

For example, if a host application writes an 8KB block of data, that 8KB block will be written to two CMA boards, using four 2KB cache blocks from a 64KB Write segment in Cache-A and another set of four cache blocks from Cache-B. The FED MP that owns the port on which the host request arrived is the processor that performs this duplexing task. After the write has been processed and destaged to disk, the 8KB of mirrored blocks will be deleted, and the other 8KB will remain in cache (after being remapped to a Read sub-segment; see below) for a possible future read hit on that data.
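A concrete rendering of the 8KB example above is sketched below; the names are illustrative, not microcode structures.

```python
# Sketch of the duplexed-write example above.
import math

CACHE_BLOCK_KB = 2
host_write_kb = 8

blocks = math.ceil(host_write_kb / CACHE_BLOCK_KB)   # four 2KB cache blocks
placement = {
    "Cache-A": blocks,   # placed in a 64KB Write segment on one power boundary
    "Cache-B": blocks,   # mirrored to the other power boundary by the owning FED MP
}
print(placement)
# After destage, one copy is dropped and the other is re-mapped to a Read
# sub-segment for a possible future read hit.
```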
Note that a read cache hit on data not yet destaged to disk from a Write Segment is not possible. In fact, no host reads are directed against Write Segments. A read request for a block which still has a write-pending condition will force a disk destage first, and all pending writes in that same 64KB segment will be included in this forced destage. Similarly, a write cache hit is not possible at any time, if a write hit is meant as overwriting a pending write block. On the Universal Storage Platform V, all blocks written to cache must be destaged to disk in the order they arrived. There is no overwriting of "dirty" blocks not yet written to disk, as is the case with a midrange array. On the Universal Storage Platform V, a write hit actually means that there is empty space in one of that LDEV's existing write segments already available for a new I/O request. A write miss would then mean that no such existing space is available, and either another set of 16 random-I/O cache slots will be allocated for use by that LDEV, or (if cache slots are tight) all of the pending writes in that LDEV's allocated 64KB Write Segments will be destaged, freeing up all of its random write space.
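The write-hit and write-miss behaviour described above can be summarised with the toy model below. It is a reading of the text, not actual microcode; all names, and all sizes other than the 16-slot set and its two 64KB Write segments per 256KB slot, are illustrative.

```python
# Toy model of the write-hit / write-miss rules described above.
class LdevWriteCache:
    SLOT_SET = 16                 # random-I/O slots added per allocation
    WRITE_KB_PER_SLOT = 128       # two 64KB Write segments per 256KB slot

    def __init__(self):
        self.free_write_kb = self.SLOT_SET * self.WRITE_KB_PER_SLOT
        self.pending_kb = 0

    def write(self, size_kb, cache_has_free_slots=True):
        if size_kb <= self.free_write_kb:
            outcome = "write hit"                                  # space already exists
        elif cache_has_free_slots:
            self.free_write_kb += self.SLOT_SET * self.WRITE_KB_PER_SLOT
            outcome = "write miss: allocated another set of 16 slots"
        else:
            self.free_write_kb += self.pending_kb                  # forced destage frees all write space
            self.pending_kb = 0
            outcome = "write miss: destaged all pending writes"
        self.free_write_kb -= size_kb
        self.pending_kb += size_kb                                 # destaged later, in arrival order
        return outcome

ldev = LdevWriteCache()
print(ldev.write(8))   # -> write hit
```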
across several ports on different FEDs (for example), it will look like random I/O to the several individual MPs controlling those host ports. As an example, the use of a host-based Logical Volume Manager that creates a large RAID0 volume by striping across several LDEVs on several different host ports can defeat the sequential detection mechanism. Sequential detect is managed for each LDEV controlled by an MP on a FED, for the one or two FED ports that it owns.
Unlike the cache slots assigned for random I/O, each 256KB cache slot is used as a single segment of 256KB for sequential I/O. This matches the RAID chunk size of 256KB on the disks. In sequential mode, an entire 256KB chunk is read from or written to each disk. Sequential prefetch will read several such chunks into cache slots without being instructed to do so by the controlling MP on the FED involved. Sequential writes will be performed as full-stripe writes to a Parity Group, thus minimizing the write penalty for RAID levels 5 and 6. The data in these sequential cache slots can be reusable for a subsequent cache hit. In general, once the data has been processed, these dynamically allocated cache slots are released for the next sequential I/O operation against that LDEV, or are returned to the cache pool once sequential detect is no longer present (for some period of time) for that LDEV. However, certain lab tests do show a high cache hit rate where a small number of LUNs under test have high sequential read ratios and use large block sizes (such as 1MB).
Figure 11. Sequential I/O Read Slot

[Figure 11 shows a 256KB sequential cache slot used as a single 256KB read segment.]
FED Options

Interface | Features (USP V) | Total Ports (USP V)
Open Fibre | 0-14 | 0-224
or ESCON | 0-8 | 0-64
or FICON | 0-8 | 0-64
Refer to the figure below for the following board components discussion.

- The FED processors (CHPs, more commonly called MPs) manage the host I/O requests and the Shared Memory and Cache areas, and execute the microcode and the optional software features such as Dynamic Provisioning, Universal Replicator, Volume Migrator, and TrueCopy.
- The DX4 chips are Fibre Channel encoder-decoder chips that manage the Fibre Channel protocol on the cables to the host ports.
- The Data Adapter (DTA) chip is a special ASIC for communication with the CSWs.
- The Microprocessor Adapter (MPA) is an ASIC for communication with the SMA PCBs.
- All I/O, whether reads or writes, internal or external storage, must pass through the Cache and Shared Memory systems.
Figure 13. Example of a Universal Storage Platform V FED Board (16-port Open Fibre Feature)
[The accompanying tables map FED features to their board positions, CSWs, and cumulative port counts. FED features 1 through 8 occupy the FED slots (boards FED00 through FED0F) and are spread across CSW 0 through CSW 3; with 8-port features the cumulative total reaches 64 ports at eight features, and with 16-port features it reaches 128 ports. FED features 9 through 14 (FED9 through FED14) occupy the BED-3 through BED-8 slot positions, raising the 16-port cumulative total from 144 up to 224 ports.]
Platform V FED ports as if the Universal Storage Platform V were a server. If using the 16-port feature, then both ports managed by an MP are placed into this mode. In essence, that FED MP (and the one or two ports it owns) is operated as though it were a BED MP. LUNs visible on the external array's ports are then remapped by the Universal Storage Platform V out to FED ports attached to hosts.

As I/O requests arrive over other host-attached FED ports on the Universal Storage Platform V for those LUNs, the normal I/O operations within the FED occur. The request is managed by Shared Memory tables and the data blocks go into the data cache. But the request is then routed to the FED MP (not a BED MP) that controls the external paths where that LUN is located. The external array processes the request as though it were talking to a server instead of the Universal Storage Platform V.
Figure 15. Universal Storage Platform V 8-port Fibre Channel Feature (showing both PCBs)

[Figure 15 shows the two PCBs of the feature. Each PCB carries four CHP microprocessors, DTA chips, and MPA ASICs, with 2 x 1064 MB/sec data-only paths toward the CSWs and 8 x 150 MB/sec metadata-only paths toward Shared Memory.]
BED Feature Count (2 PCBs each) | Back-end Loops Available | Max Disks Configurable
2 BED | 8 | 256 (PM limit)
4 BED | 16 | 640
6 BED | 24 | 896
8 BED | 32 | 1152
[The accompanying table lists the BED feature pairs (Basic plus Features 2 through 4), the BED boards in each pair (BED1/BED2 through BED7/BED8), the loop-pair identifiers assigned to each board (BED10/BED18 and BED12/BED1A for the Basic pair, through BED15/BED1D and BED17/BED1F for Feature 4), and the CSWs they use; cumulative counts of 16, 24, and 32 are reached as Features 2, 3, and 4 are added.]
Figure 20. Universal Storage Platform V Back-end Disk Layout (2 BED features)

[Figure 20 is a front view of frames DKU-L1, DKU-R0, DKU-R1, and DKU-R2, showing each HDU's 16-HDD rows, the Parity Group numbers (pg1 through pg18) they hold, the BED (bed1 through bed8) that owns each row, and the power boundaries.]
Figure 22. Rear View of the Universal Storage Platform V Frames, HDUs, with BED Ownership

[Figure 22 is the rear-facing view of frames DKU-L2, DKU-L1, DKU-R0, DKU-R1, and DKU-R2, again showing the 16-HDD rows per HDU, their Parity Group numbers (pg1 through pg18), the owning BEDs (bed1 through bed8), and the power boundaries.]
The yellow section contains sixteen 4-disk Array Groups whose names (used in Parity Group configurations) are 3-1 through 3-16 (or "pg3" as indicated). The official name of these 4-disk groups (2 disks on the front, 2 on the back of the HDU) is Array Group, even though they could be assigned RAID level 10 (no parity involved). See Appendix 1 for maps of the Universal Storage Platform V frames showing the names of the Array Groups and where they are located. Take note of how the locations of the Array Groups have been shifted between the Universal Storage Platform and the Universal Storage Platform V models. If this is not understood, it can have important consequences when configuring a system.

Figure 23 below shows the details of a single frame, with its four HDUs (showing front and rear elements as viewed through the front). Figure 24 is a zoomed-in look at the front of just one HDU and the details of its switches and disks.
Figure 23. Details of a Frame, with the HDUs and FSWs.

[Figure 23 shows a USP-V R2 frame with its four HDUs. Each HDU half holds rows of 8 HDDs fronted by upper and lower FC-AL switches (FSW-R20 through FSW-R2B, upper and lower), with BED-1 through BED-4 ownership marked for each switch and disk row.]
Figure 24. Closer Look at One HDU and Its Two Switches.

[Figure 24 zooms in on one HDU half: rows of 8 HDDs attached through the upper and lower FSW-R20 switches, owned by BED-1 and BED-2.]
Disk Details
The Universal Storage Platform V supports a mixture (in Array Group upgrade units of four disks) of up to 1152 Fibre Channel and/or SATA disks, or up to 128 Flash disks (with 8 BED features) in the place of 128 Fibre Channel or SATA disks. Table 9 illustrates the advertised size (base 10), the apparent size in the tools (still base 10), and the typical usable (after Parity Group formatting) size of each type of disk. Also shown is the rule-of-thumb average maximum random IOPS rate for each type of disk when using most of the surface. The number and size of the LDEVs created per Parity Group determine how much of its disks' surfaces are in active use.

As more of the disk surface is allocated to LDEVs for a Parity Group, the farther the disk heads must seek. This creates higher average seek times. For example (using a 15k RPM disk), when only using the outer 25% of the surface the average read seek time can be around 1.8ms (providing about 267 IOPS), whereas the 100% surface average seek time will be about 3.8ms (about 182 IOPS). For a SATA disk at 7200 RPM, these values are about 4.3ms (119 IOPS) and 8.5ms (79 IOPS). All write IOPS rates for all disks are about 5% lower due to the higher seek precision required. See Appendix 14 for more details on theoretical estimates of disk IOPS rates.
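The IOPS figures above are roughly what a simple seek-plus-rotation service-time model predicts. The sketch below is that approximation only (it ignores transfer time and queuing) and is not necessarily the method used to produce the document's numbers.

```python
# Rough random-read IOPS model: one I/O takes (average seek + half a rotation).
def approx_iops(avg_seek_ms: float, rpm: int) -> float:
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + half_rotation_ms)

print(round(approx_iops(1.8, 15000)))   # ~263 IOPS (text: ~267) - 15k RPM, outer 25% of surface
print(round(approx_iops(3.8, 15000)))   # ~172 IOPS (text: ~182) - 15k RPM, full surface
print(round(approx_iops(4.3, 7200)))    # ~118 IOPS (text: ~119) - 7200 RPM SATA, outer 25%
print(round(approx_iops(8.5, 7200)))    # ~79 IOPS  (text: ~79)  - 7200 RPM SATA, full surface
```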
Normally, with the use of disks with different interface rates (such as 4Gbit/s and 2Gbit/s), the back-end loop performance for all disks will drop to the 2Gbit rate on each BED feature where these disks are installed. This is not true of the 3Gbit SATA disks, due to the 4Gbit/s speed conversion, the SATA-to-FC-AL protocol conversion, and the data buffering that takes place within each SATA disk canister's logic board.
Table 9. Table of Disk Types and Rule-of-Thumb IOPS Rates

HDD Type | Port Speed | Advertised Size (GB) | Apparent Size (GB) | Usable Size (GB) | Nominal Random IOPS
1TB SATA 7.2k RPM | 3Gb/sec | 1000 | 976 | 927.2 | 80
750GB SATA 7.2k RPM | 3Gb/sec | 750 | 738.62 | 701.7 | 80
450GB FC 15k RPM | 4Gb/sec | 450 | 432 | 410.4 | 180
400GB FC 10k RPM | 4Gb/sec | 400 | 384 | 364.8 | 130
300GB FC 15k RPM | 4Gb/sec | 300 | 288.2 | 273.8 | 180
146GB Flash | 4Gb/sec | 146 | 143.76 | 136.6 | 5000 read / 1500 write
146GB FC 15k RPM | 4Gb/sec | 146 | 143.76 | 136.6 | 180
72GB Flash | 4Gb/sec | 72 | 71.5 | 67.9 | 5000 read / 1500 write
72GB FC 15k RPM | 4Gb/sec | 72 | 71.5 | 67.9 | 180
The type and quantity of disks and the RAID levels chosen for the Universal Storage Platform V will vary according to analysis of the user workloads, performance, and capacity requirements. The following sections provide information to be considered when selecting which disks to install. The table below illustrates usable capacities by RAID level and disk type for various disk quantities.
Table 10. Disk Types and Usable Capacities by RAID Level

Disk Type | RAID Level | 256 HDDs: Array Groups / Usable % / Usable TB | 512 HDDs: Array Groups / Usable % / Usable TB | 1024 HDDs: Array Groups / Usable % / Usable TB
1TB SATA | RAID5 7D+1P | 32 / 87.5% / 202.8 | 64 / 87.5% / 405.7 | 128 / 87.5% / 811.3
1TB SATA | RAID6 6D+2P | 32 / 75% / 173.9 | 64 / 75% / 347.7 | 128 / 75% / 695.4
450GB FC | RAID5 3D+1P | 64 / 75% / 77.0 | 128 / 75% / 153.9 | 256 / 75% / 307.8
450GB FC | RAID5 7D+1P | 32 / 87.5% / 89.8 | 64 / 87.5% / 179.6 | 128 / 87.5% / 359.1
450GB FC | RAID10 2D+2D | 64 / 50% / 51.3 | 128 / 50% / 102.6 | 256 / 50% / 205.2
450GB FC | RAID10 4D+4D | 32 / 50% / 51.3 | 64 / 50% / 102.6 | 128 / 50% / 205.2
450GB FC | RAID6 6D+2P | 32 / 75% / 77.0 | 64 / 75% / 153.9 | 128 / 75% / 307.8
400GB FC | RAID5 3D+1P | 64 / 75% / 68.4 | 128 / 75% / 136.8 | 256 / 75% / 273.6
400GB FC | RAID5 7D+1P | 32 / 87.5% / 79.8 | 64 / 87.5% / 159.6 | 128 / 87.5% / 319.2
400GB FC | RAID10 2D+2D | 64 / 50% / 45.6 | 128 / 50% / 91.2 | 256 / 50% / 182.4
400GB FC | RAID10 4D+4D | 32 / 50% / 45.6 | 64 / 50% / 91.2 | 128 / 50% / 182.4
400GB FC | RAID6 6D+2P | 32 / 75% / 68.4 | 64 / 75% / 136.8 | 128 / 75% / 273.6
300GB FC | RAID5 3D+1P | 64 / 75% / 51.3 | 128 / 75% / 102.7 | 256 / 75% / 205.3
300GB FC | RAID5 7D+1P | 32 / 87.5% / 59.9 | 64 / 87.5% / 119.8 | 128 / 87.5% / 239.6
300GB FC | RAID10 2D+2D | 64 / 50% / 34.2 | 128 / 50% / 68.4 | 256 / 50% / 136.9
300GB FC | RAID10 4D+4D | 32 / 50% / 34.2 | 64 / 50% / 68.4 | 128 / 50% / 136.9
300GB FC | RAID6 6D+2P | 32 / 75% / 51.3 | 64 / 75% / 102.7 | 128 / 75% / 205.3
146GB FC or Flash | RAID5 3D+1P | 64 / 75% / 25.6 | 128 / 75% / 51.2 | 256 / 75% / 102.4
146GB FC or Flash | RAID5 7D+1P | 32 / 87.5% / 29.9 | 64 / 87.5% / 59.8 | 128 / 87.5% / 119.5
146GB FC or Flash | RAID10 2D+2D | 64 / 50% / 17.1 | 128 / 50% / 34.1 | 256 / 50% / 68.3
146GB FC or Flash | RAID10 4D+4D | 32 / 50% / 17.1 | 64 / 50% / 34.1 | 128 / 50% / 68.3
146GB FC or Flash | RAID6 6D+2P | 32 / 75% / 25.6 | 64 / 75% / 51.2 | 128 / 75% / 102.4
72GB FC or Flash | RAID5 3D+1P | 64 / 75% / 12.7 | 128 / 75% / 25.5 | 256 / 75% / 50.9
72GB FC or Flash | RAID5 7D+1P | 32 / 87.5% / 14.9 | 64 / 87.5% / 29.7 | 128 / 87.5% / 59.4
72GB FC or Flash | RAID10 2D+2D | 64 / 50% / 8.5 | 128 / 50% / 17.0 | 256 / 50% / 34.0
72GB FC or Flash | RAID10 4D+4D | 32 / 50% / 8.5 | 64 / 50% / 17.0 | 128 / 50% / 34.0
SATA DISKS
LDEVs that are created from Parity Groups that use SATA disks can be configured as OPEN-V or 3390-M emulation; therefore these SATA LDEVs are currently usable for mainframe volumes.

SATA disks are constructed with a far lower level of technological sophistication as compared to enterprise-grade Fibre Channel disks. In general, the differences between the SATA and Fibre Channel disks include:

- the amount of extreme technology and precision involved with the materials used for Fibre Channel disks (disk platter materials and magnetic layer sandwiching, head technology, and head fly-height control)
- the degree of error detection and correction (Fibre Channel disks are about double that of SATA) and processing power (Hitachi Fibre Channel disks are dual-processor)
- the expected lifespan with the duty cycles experienced in enterprise-class storage
- the overall performance available
When making performance estimates for SATA Parity Groups that use RAID6 (6D+2P), one may use as a rule of thumb that reads and writes will be 50% and 75% lower, respectively, than with similar RAID6 (6D+2P) 146GB 15k RPM Parity Groups. (Subject to change in summer 2009 with microcode changes.)
The back-end disk loops on the Universal Storage Platform V are operated at 4Gbit/s, with all disks connected to these loops via switches. All Hitachi SATA drives operate at 3Gbit/s. SATA drives also only have one interface connection, whereas Fibre Channel disks are dual-ported (they may transfer in and out of cache on both loops at the same time, with one actual transfer possible through the read/write head).

The SATA disk canisters fit into the same Disk Chassis slots as do Fibre Channel disks because each SATA disk canister has a built-in controller module. One side of the controller presents two 4Gbit Fibre Channel paths (mimicking a standard Fibre Channel disk), while the inner side operates the 3Gbit SATA II protocol. There is buffering in this controller that provides the speed-level matching between the different protocol rates. [Note: This buffering keeps the BED loops from slowing down to a rate below 4Gbit, as happens with the 2Gbit 300GB 10k Fibre Channel disks.]

The SATA canister interface also performs a read-after-write check on every block (whether data or parity/mirror) written to the disk. After a block is written to the disk, it is read from the surface and compared to the buffer in the canister interface. While this reduces the write rate by some amount depending on the RAID level, this read from the disk does guarantee that the data was accurately written.
Table 11. Physical Disk IOPS Consumed by RAID Level and Disk Type

[Table 11 lists, for Fibre Channel/Flash and SATA disks at RAID10, RAID5, and RAID6, the physical HDD IOPS consumed per host read and per host write.]
Table 11 above shows the actual physical disk IOPS required for read or write operations as a function of disk type and RAID level. Although RAID6 does protect against a double disk failure, the write costs are high (especially for SATA disks, with the extra read-compare-after-write by the local disk canister logic), so this must be considered when designing a storage solution. Note that in the case where full-stripe writes occur, the write penalty across the entire Parity Group will be much lower, since the old parity block(s) are not read.
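As a reference point for the discussion above, the sketch below converts host IOPS into back-end disk IOPS using the standard small-block RAID write penalties (two disk operations per random write for RAID10, four for RAID5, six for RAID6). These are textbook values offered as an assumption, not necessarily the exact values in Table 11, and the SATA read-after-write check described earlier adds further back-end operations on writes.

```python
# Back-end disk operations per host I/O using standard small-block RAID penalties
# (textbook values, offered as an assumption).
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}   # disk ops per random host write
READ_COST = 1                                            # disk ops per random host read

def backend_iops(host_reads: int, host_writes: int, raid_level: str) -> int:
    return host_reads * READ_COST + host_writes * WRITE_PENALTY[raid_level]

# Example: 1000 host IOPS at a 70/30 read/write ratio.
for level in WRITE_PENALTY:
    print(level, backend_iops(700, 300, level))   # RAID10: 1300, RAID5: 1900, RAID6: 2500
```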
Other considerations on the use of SATA versus Fibre Channel when using RAID6 include:

- the no-load format times are about 18X longer than with Fibre Channel disks
- the no-load corrective copy times are about 4.4X longer than with Fibre Channel disks
- the no-load disk copy time is about 10X longer than with Fibre Channel disks
- there should be 1 spare SATA disk for every 16 disks in use
Hitachi's recommendation on suitable uses for SATA volumes is to use them for nearline storage with a low frequency of access but online availability, such as:

- Email archives, backups, 2nd and 3rd copies
- Historical data with limited access (medical records, etc.)
- Archived documents for regulatory purposes
- Sound or video files
- Tape replacement

The decision to use up valuable internal disk slots with SATA disks rather than the much higher-performing Fibre Channel disks should be carefully considered. In many cases, the use of SATA disks within virtualized storage on a midrange product might make better sense. However, there will be individual cases where the use of some internal slots for SATA drives will solve a specific customer business problem in a straightforward manner. It is not expected that a Universal Storage Platform V would normally be configured with a high percentage of SATA disks.
[The accompanying figure shows how the first and second BED feature pairs connect to the upper and lower FSW boards of an HDU. Each FSW contains loop switches and port bypass control circuits, with per-disk switch ports connecting the 32 HDDs on that HDU half to the Fibre Channel loops, and onward connections to the next HDU/FSW in the chain.]
Figure 26. Closer View of the Left Half of the Upper and Lower Switches on an HDU
This system is shown with:

- Up to 128 physical disks in the controller frame.
- Two Back-end Director features (BED-1, BED-2; 4 PCBs), consisting of 16 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 6.4GB/sec.
- One Cache Card feature (CMA-1; 2 PCBs) supporting up to 64GB of cache with a bandwidth of 17GB/sec.
- One Cache Switch feature (CSW-0; 2 PCBs) with a total internal data bandwidth of 17GB/sec.
- One Shared Memory feature (SMA-1; 2 PCBs) with a total metadata bandwidth of 9.6GB/sec.
- Two Front-end Director features (FED-1, FED-2; 4 PCBs) supporting either 16 ESCON, 16 4Gb/sec FICON, 16 4Gb/sec Fibre Channel, or 32 4Gb/sec Fibre Channel ports.
This system is shown with:

- Up to 640 physical disks, with 256 in the R1 frame, 256 in the R2 expansion frame, and 128 in the Control Frame.
- Four Back-end Director features (BED-1 through BED-4; 8 PCBs), consisting of 32 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 12.8GB/sec.
- Two Cache Card features (CMA-1, CMA-2; 4 PCBs) supporting up to 128GB of cache with a bandwidth of 34GB/sec.
- Two Cache Switch features (CSW-1, CSW-2; 4 PCBs) with a total internal data bandwidth of 34GB/sec.
- One Shared Memory feature (SMA-1; 2 PCBs) with a total metadata bandwidth of 19.2GB/sec.
- Four Front-end Director features (FED-1 through FED-4; 8 PCBs) supporting either 32 ESCON, 32 4Gb/sec FICON, 32 4Gb/sec Fibre Channel, or 64 4Gb/sec Fibre Channel ports (more if fewer BED features are installed).
This system is shown with:

- Up to 1152 physical disks, 128 in the control frame and 256 in each of the R1, R2, L1, and L2 expansion frames.
- Eight Back-end Director features (16 PCBs), consisting of 64 4Gb/sec Fibre Channel loops, for a total back-end bandwidth of 25.6GB/sec.
- Four Cache Card features (8 PCBs) supporting up to 256GB of cache with a bandwidth of 68GB/sec.
- Four Cache Switch features (8 PCBs) with a total internal data bandwidth of 68GB/sec.
- Two Shared Memory features (4 PCBs) with a total metadata bandwidth of 38.4GB/sec.
- Eight Front-end Director features (16 PCBs) supporting either 64 ESCON, 64 4Gb/sec FICON, 64 4Gb/sec Fibre Channel, or 128 4Gb/sec Fibre Channel ports (up to 224 in 14 features if only 2 BED features are installed).
The alternative method to the use of individual volumes is to use a host-based Logical Volume Manager (LVM) when the planned workloads require either more space or more IOPS capacity than the individual physical volumes can provide. The LVM can create a single logical (virtual) volume from two or more independent physical volumes. This is shown in Figure 31.
When such a logical volume runs out of space or IOPS capacity, one could replace it with one that was created with even more physical volumes and then copy all of the user data. In some cases it is best to create another logical volume and then manually relocate just part of the existing data in order to redistribute the overall workload across two such logical volumes. These two logical volumes would probably be mapped to the server using separate host paths as well. However, this manual intervention could become a costly and tedious exercise as workloads grow over time or as new ones are added to the mix. The problem becomes more complex as new servers are added to the scenario. This requires a careful and accurate ongoing documentation effort of all of these elements and their usage by the team of system administrators.
[Figure: a host LVM logical volume built from multiple LDEVs, each presented to the host as a LUN.]
In order to change the size of a volume (LDEV) already in use, one must first create a new one (if possible) that is larger than the old one, and then move the contents of the current volume onto the new one. The new volume would then be remapped on the server to take the mount point of the old one, which is then retired. This could be accomplished either manually or in a more automated fashion on the Universal Storage Platform and the Universal Storage Platform V with the use of the optional Hitachi Tiered Storage Manager software product.
On the previous Hitachi Lightning product, the volume sizes were fixed and limited to a small number of Open-X choices. If you needed something bigger than the largest emulation size (Open-L, at 33GB), the storage-system-based solution was to use the LUSE (Logical Unit Size Expansion) LDEV concatenation feature (Figure 33). Note that as LUSE is a concatenation of 2 to 36 LDEVs (up to a 60TB maximum size), it often operates with just the IOPS power of the disks under a single LDEV (either 4 or 8 disks). On the other hand, a host-based RAID0 striped LVM volume will use the power of all of the disks of the logical volume all of the time.
Figure 33. LUSE Volume
On a Hitachi Universal Storage Platform product, with its Open-V emulation, a single LDEV could be as large as the entire Parity Group (up to 2.99TB for internal storage and 4TB for external storage). However, a single Parity Group might not have sufficient IOPS power (disks) to support the IOPS needed by the workload on a single volume. You could build a concatenated LUSE volume as with the Lightning, but the use of the Universal Storage Platform's large-volume facility known as Concatenated Parity Groups is preferred. This feature allows you to create a logical volume that is striped across 2 or 4 RAID5 (7D+1P) Parity Groups.

The individual RAID stripes from those member Parity Groups are interleaved, rather than the LDEVs being concatenated as the name implies. Thus, one could have 2x or 4x the amount of IOPS (not capacity) available per volume as compared to a standard volume. Note that a host-based LVM uses block striping (usually 64KB), whereas a Concatenated Parity Group uses RAID5 (7D+1P) stripe interleaving of 3,584MB. See Appendix 3 for a detailed discussion of RAID levels, stripe sizes (chunks), and stripe widths. See Appendix 16 for a more detailed discussion of Concatenated Parity Groups.
Figure 34. Concatenated Parity Group Striped Volume
All of these techniques can be awkward and time-consuming to configure. As such, there is a lot of potential for error in a busy IT shop. Now there is a new and simpler alternative to the older methods. The Hitachi Dynamic Provisioning feature (HDP) allows for new volume management techniques that provide a greater degree of freedom of storage management for administrators.
The figure below is an overview of a single Pool environment, used for discussion purposes in the following sections. See Appendices 9-11 for additional detailed views.
Figure 35. An Hitachi Dynamic Provisioning Pool Environment
Usage Overview
In order to use Dynamic Provisioning volumes, there are a few extra steps to take when creating them. One still configures various Parity Groups to the desired RAID level (see D), and then creates one or more Open-V volumes (LDEVs) on each of them. The first new step in setting up a Hitachi Dynamic Provisioning environment would be to create one or more Hitachi Dynamic Provisioning Pools (see C), each of which is a collection of selected LDEVs (Hitachi Dynamic Provisioning Pool Volumes). The second step would be to create one or more VVOL Groups (see B), which are administrative containers for virtual volumes (used as pseudo Parity Groups). The third step is to create one or more Hitachi Dynamic Provisioning Volumes (DPVOLs), each assigned to one VVOL Group, and each connected to a single Pool. The DPVOL is then mapped to a host port as a LUN (see A) with a host server emulation mode and used just like any Standard LDEV.
Pools (and all non-Hitachi Dynamic Provisioning volumes) on a Universal Storage Platform V must be less than 1.1PB.
Figure 36. Hitachi Dynamic Provisioning Pool and DPVOL Limits

HDP Limit | Value
Maximum number of Pools | 128
Maximum Pool capacity | 1.1PB
Maximum capacity of all Pools | 1.1PB
Pool Allocation Warning Thresholds | fixed: 80%; user adjustable: 5-95%
Emulation | Open-V
RAID Level | any
Number of LDEVs per Pool | 2-1024
Maximum LDEV size | 2.99TB internal / 4TB external
Minimum LDEV size | 8GB
Maximum DP Volumes per system | 65,536
Maximum DP Volumes per Pool | 8192
DP Volume size range | 46MB to 4TB
The Pool Volume sizes are important in that they control the striping rate in the Free Page List (at the Optimize step in Pool setup). If all Pool Volumes are the same size, and if they come from the same type of Parity Groups (i.e., 2D+2D or 7D+1P), then the Pool will present an even distribution of I/O across the Array Groups in that Pool to all of its connected DPVOLs. The actual size (if uniform for all of the LDEVs) does not matter, except that it will determine how soon the Pool runs out of initial physical space. You must scale your usage expectations against how big the Pool needs to be when it first starts out. Additional Pool Volumes may be added to a Pool when its space begins to run low (explained in Pool Expansion below), with an overall limit of 1024 LDEVs per Pool.

You should use the same disk type and RAID level for every Pool Volume within a Pool. Additionally, all of the LDEVs from a Parity Group should be assigned to the same Pool. It is recommended that there be 32 or more disks per Pool in order to have an adequate disk IOPS pool. This could be four 8-disk Parity Groups (such as 4D+4D, with four LDEVs each) or eight 4-disk Parity Groups (such as RAID5 3D+1P).
There are various restrictions on the LDEVs used for Pools. These restrictions include:
- There can be a maximum of 1024 Pool Volumes assigned to a single Hitachi Dynamic Provisioning Pool.
- Pool Volumes (LDEVs) must come from internal disks or external storage on the same Universal Storage Platform V (you can't create Pools across systems).
- Pool Volumes must be of the OPEN-V emulation type.
- A Pool Volume is an LDEV, but it is a normal LDEV created from (or allocated from) an internal RAID Group or external storage, so its LDKC:CU:LDEV address is assigned before the volume is defined as a Hitachi Dynamic Provisioning Pool Volume.
- A LUSE volume cannot be used as a Pool Volume.
- If used, all Parity Groups that are members of a Pool must be assigned to the same cache partition (CLPR). Hence, all associated DP-VOLs will also be assigned to the same VPM cache partition. However, the Parity Groups for other Pools may be assigned to different cache partitions.
- Once an LDEV becomes a Pool Volume (assigned to a given Pool), there are a wide variety of restrictions that are invoked in terms of what you can and cannot do with it, particularly in terms of replication functions. However, restrictions that apply to Pool Volumes do not encompass the DP-VOLs, which have very few extra restrictions over traditional LDEVs.
- You cannot remove a Pool Volume from a Hitachi Dynamic Provisioning Pool without deleting the entire Pool (first disconnecting all of its DP-VOLs) from the Universal Storage Platform V system.
- An LDEV may not be uninstalled from the system if it is a Pool Volume.
- Each LDEV must be RAID formatted (scrubbed with zeros) before being assigned to a Pool (you can't format a Pool).
[Figure: a single HDP Pool example: a 500GB DP-VOL in a 500GB V-VOL Group and a 1000GB DP-VOL in a 1TB V-VOL Group, both connected to one HDP physical Pool built from four 288GB Pool Volume LDEVs taken from four RAID-10 (2D+2D) RAID Groups.]
The physical space assigned as needed from the Pool to a V-VOL Group container is evenly distributed across the Pool Volumes by the mechanism of the Pool's Free Page List. As a server writes to a DP-VOL, and new pages are required, these are assigned to the V-VOL Group for use by that DP-VOL. Note that this assignment is from the next available page in the Free Page List, not from just anywhere in the Free Page List. It is random in the sense that you do not know from which Pool Volume the next page for an individual DP-VOL will be mapped.
However, the server sees each DP-VOL as having the complete SCSI Logical Block Address (LBA) range of 512 byte physical blocks that would match the logical size that was specified when that DP-VOL was created. This is what thin provisioning is all about: space allocated as needed. A DP-VOL is assigned 42MB pages of physical space from the Pool over time as the server writes to blocks in that volume.
Once assigned to a V-VOL Group, a page is mapped into the associated DP volume's Page Table. This is a table of pointers matching up the host's virtual LBA range and physical 512 byte blocks with that DP volume's assigned pages. The Page Table presents a contiguous space of virtual and assigned 42MB segments to that volume starting at the beginning of that DP-VOL's address space (byte 0). A DP-VOL whose size is not a whole number of 42MB pages can acquire a page that will be only partially used at the highest address range for that DP-VOL. The remainder of that page is unused (but this is a very small amount of space).
If a DP-VOL was specified to be a 100GB volume, that would represent 209,715,200 such SCSI blocks sitting on 2,439 42MB Pool pages. Each assigned Pool page will contain a specific (predetermined) range of 86,016 consecutive 512 byte blocks. As illustrated below, one page in a DP-VOL will contain LBA 0 to LBA 86015, and some other page will contain LBA 86016 to LBA 172031. Whether a physical page has been assigned depends on whether any LBA within that fixed range has been written to by the host. For example, if the server wrote 8KB file system blocks to a new DP-VOL at LBA locations 0, 86016, 172032, 258048, and 344064 then that DP-VOL would have five new Pool pages assigned. If the server next wrote to LBA locations 16, 86032, 172048, 258064, and 344080 (or such) then those same five pages would be used.
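The arithmetic behind this example can be checked with a few lines of Python; the block size, page size, and LBA figures are those quoted above.

    BLOCK = 512                             # SCSI logical block size in bytes
    PAGE = 42 * 1024 * 1024                 # HDP page size: 42MB

    blocks_per_page = PAGE // BLOCK         # 86,016 blocks per page
    vol_blocks = 100 * 1024**3 // BLOCK     # a 100GB DP-VOL = 209,715,200 blocks
    pages = -(-vol_blocks // blocks_per_page)   # round up: 2,439 pages

    def page_index(lba: int) -> int:
        """Which 42MB page covers a given LBA (each page owns a fixed LBA range)."""
        return lba // blocks_per_page

    assert page_index(0) == page_index(16) == 0     # LBA 0 and 16 share the first page
    assert page_index(86016) == 1                   # LBA 86016 starts the second page
    print(blocks_per_page, vol_blocks, pages)       # 86016 209715200 2439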
Figure 38. Illustration of Pool Page to a Volume's SCSI LBA Range
The number of Pool pages initially assigned to a DP-VOL by file system formatting will depend upon the operating system, the file system, and the size of the volume. Once the server formats a volume, the Pool will have assigned some number of physical pages to it. The amount of the physical space initially required by this formatting operation is determined by the amount of metadata the file system initially writes to that volume, along with the distribution (stride) of these blocks of metadata across the entire volume. Hence, the number of pages initially assigned to a volume by the formatting operation will vary. See Appendix 13 for a table of typical file system usage.
If, for example, a DP-VOL is formatted by a file system, and the file system's block size is 8KB, there will be 5,376 file system blocks per page (each 8KB block consuming 16 LBA blocks). The first time one of these blocks from a DP-VOL virtual page is written to, the following sequence occurs:
- The next available free Pool page will be assigned from the Pool's Free Page List.
- That page will be mapped into that DP-VOL's Page Table and associated with a certain LBA range.
- The new block(s) will be written to that page.
Usage Mechanisms
If a host Read is posted to an area of the DP-VOL that does not yet have physical space (no blocks have been written there yet), the Hitachi Dynamic Provisioning Dynamic Mapping Table will direct the read to a special Null Page. New Pool pages to cover that region will not be allocated from the Free Page List since it was a Read.
If a Write is posted to an area of the DP-VOL which does not yet have physical space, the DP-VOL requests a new page and assigns it to that LBA range in the Page Table.
If a Write is posted to an area of the DP-VOL which does not yet have physical space, and the Pool has completely run out of space, the host will get a Write Protect error. During this out of space condition, all Read operations specifying an LBA region of a DP-VOL without an assigned page will now return an Illegal Request error rather than the Null Page. These error conditions will be cleared once additional physical Pool Volume space has been added to that Pool or some DP-VOLs are formatted and disconnected from the Pool, thereby freeing up all of their previously allocated pages.
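A minimal sketch of this behavior, assuming a hypothetical in-memory model (these are not array data structures or commands): reads of unmapped regions come back from the Null Page, a first write to a region takes the next page from the Free Page List, and a write with no free pages left raises the equivalent of the Write Protect error.

    class OutOfPoolSpace(Exception):
        """Stands in for the Write Protect error the host receives."""

    class DPVolSim:
        def __init__(self, pool_free_pages: int):
            self.free_pages = pool_free_pages
            self.page_table = set()            # indices of 42MB pages already assigned

        def read(self, page_idx: int) -> str:
            # Reads of never-written regions are answered from the Null Page;
            # nothing is taken from the Free Page List.
            return "data" if page_idx in self.page_table else "null-page"

        def write(self, page_idx: int) -> None:
            if page_idx not in self.page_table:
                if self.free_pages == 0:
                    raise OutOfPoolSpace("Write Protect")   # Pool completely out of space
                self.free_pages -= 1             # take the next page off the Free Page List
                self.page_table.add(page_idx)    # map its LBA range in the Page Table
            # the new block(s) are then written to the assigned page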
There will normally be several DP-VOLs connected to a single Hitachi Dynamic Provisioning Pool. As each DP-VOL begins receiving writes from the servers, the number of Pool pages assigned to that DP-VOL will grow. As this group of DP-VOLs consumes Pool pages over time, the Pool's Free Space Threshold can be reached and the storage administrator will receive an alert telling him to add additional Pool Volumes to that Pool. The amount of physical space a DP-VOL can acquire over time will be limited by the DP-VOL size (up to 4TB) specified when it was defined. The sum of all the virtual sizes of those DP-VOLs connected to a Pool, rounded up to an integral number of pages, will be the maximum physical Pool size required over time. A DP-VOL may only be connected to one Pool.
- Pages are released from a DP-VOL when the association to the Hitachi Dynamic Provisioning Pool is deleted. Since these pages were merely mapped from the Pool to the DP-VOL's local Page Table, these N freed pages return to that Pool's Free Page List and they will be reassigned to DP-VOLs by the next N requests for pages.
- Pages comprised of all binary zeros can be freed from a DP-VOL. A condition where some pages are all zero is normal when a traditional volume is migrated as-is to a DP-VOL.
- There can be up to 8,192 DP-VOLs assigned to a Hitachi Dynamic Provisioning Pool.
- A DP-VOL can range from 46MB to 4TB in size.
- A DP-VOL can be increased in capacity (up to 4TB).
In terms of usage, a DP-VOL is pretty much like a standard LDEV with a few restrictions.
1. A DP-VOL has a normal LDKC:CU:LDEV number but this is in a special type of Parity Group (the V-VOL Group).
2. Using the usual HDS tools you can use DRU to protect a DP-VOL.
3. ShadowImage pairs, TrueCopy pairs, HUR pairs, or CoW P-VOLs can be created from DP-VOLs.
4. A DP-VOL cannot use cache residency.
5. A DP-VOL also differs from a normal LDEV in that you can't resize or subdivide it using VLVI or glue them together with LUSE.
The Free Page List is built from pages from all of the Pool Volumes using intelligent algorithms that take all of the Pool factors into consideration. The Free Page List is generated (or regenerated) when the Optimize button in Storage Navigator is selected after adding LDEVs to a Pool (either initially or subsequently). The concept of thin provisioned capacity involves the Pool page capacity actually assigned to a DP-VOL, which grows over time, and not the static, reported size of the volume from the host's perspective.
Note: If one does not use the Optimize button when appropriate, a new Free Page List will be dynamically created over time. However, this operation is much faster when done purposefully with the Optimize button.
Each DP-VOL has a Page Table for keeping track of pointers to its assigned Pool pages. This is a list of all assigned Pool pages and where to find them on the Pool Volumes. Each Page Table is also a lookup table of assigned physical pages to the server's virtual address space (the LBA range of all the 512 byte blocks in that volume). Note that once a Pool page is assigned to a DP-VOL, it remains there regardless of the state of the data written to blocks on that page. If all files that happen to reside on that page are deleted by the host file system, leaving the page empty with a lot of unneeded ghost data, it doesn't matter.
As described previously, each page once assigned to the volume becomes a piece of the physical space for a specific LBA range for that volume. Once allocated and containing any non-binary-zero data, the page assignment is permanent until that DP-VOL is deleted.
Note: When using Tiered Storage Manager, once a DP-VOL is moved from one Pool into another, the original Pool pages are automatically reformatted and released back to the Pool. The Optimize button has no effect except after adding new Pool Volumes or deleting DP-VOLs.
There are some housekeeping pages from each Pool that are not available for assignment to DP-VOLs. There is a small 84MB system region (2 pages) per Pool for each assigned Pool Volume. There is also a small 4GB (98 page) region in every Hitachi Dynamic Provisioning Pool that is reserved by the system to use for a mirror copy of the DMT in Shared Memory. For Universal Storage Platform V release G01, with the setting of a flag in the SVP, the DMT is copied to the SVP's internal disk on a system power down (PS-OFF). For G02, the DMT is mirrored to this reserved 4GB region in the first Hitachi Dynamic Provisioning Pool currently defined in the system.
There is a simple physical structure for Pool pages. An individual 42MB page is laid down on a whole number of consecutive RAID stripes from a single Pool Volume. The Table below illustrates the relationship among different RAID levels, their various RAID stripe sizes, and the assignment of consecutive stripes to a single page.
Table 12. Number of RAID Stripes per 42MB Pool Page
RAID Type   RAID Level   Data Disks   RAID Stripe Size (KB)   RAID Stripes per HDP Page
RAID-10     2+2          2            1024                    42
RAID-5      3+1          3            1536                    28
RAID-5      7+1          7            3584                    12
RAID-6      6+2          6            3072                    14
Take, for example, the case of a Hitachi Dynamic Provisioning Pool that is composed of several RAID-5 (7D+1P) Pool Volumes (LDEVs). Each LDEV has a formatted RAID stripe width of 3,584KB. This comes from two 256KB consecutive chunks per disk, with (effectively) seven data disks in the stripe ((256 + 256) x 7). [See Appendix 2 for the various Hitachi RAID level details.] Therefore, twelve such RAID stripes from a single RAID-5 (7D+1P) LDEV will be used to create a single 42MB Pool Page. In the case of a Pool consisting of RAID-10 (2D+2D) Parity Groups, each LDEV will provide 42 consecutive stripes for each 42MB Pool page.
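The Table 12 values follow directly from dividing the 42MB page by each stripe width (data disks times the 512KB effective chunk); a short check in Python:

    PAGE_KB = 42 * 1024                       # 43,008KB per HDP page

    for raid, data_disks in [("RAID-10 2+2", 2), ("RAID-5 3+1", 3),
                             ("RAID-5 7+1", 7), ("RAID-6 6+2", 6)]:
        stripe_kb = data_disks * 512          # two adjacent 256KB chunks per data disk
        print(raid, stripe_kb, PAGE_KB // stripe_kb)
    # RAID-10 2+2: 1024KB, 42 stripes; RAID-5 3+1: 1536KB, 28;
    # RAID-5 7+1: 3584KB, 12; RAID-6 6+2: 3072KB, 14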
Pool Expansion
As pages in the Pool are assigned to DP-VOLs over time and the number of free pages drops below a certain threshold of the initial Pool capacity, more LDEVs will have to be added to that Pool. Once new LDEV(s) are added to the Pool and the Optimize button in Storage Navigator is used, the Free Page List will once again be optimized. The existing Free Page List will be extended using the new pages available from the additional Pool Volumes. When new Pool Volumes are added to a Pool once the remaining free page count is low, the HDS recommendation is to add another set of LDEVs that match the initial set (same RAID level, same disk RPM). This will ensure that the DP-VOL performance remains on par with what was previously experienced for that Pool.
For example, a Pool that started with six LDEVs will have a certain performance due to the number of disks and the RAID level used (write penalty). If six RAID-10 (2D+2D) Parity Groups were used for these six LDEVs, that Pool would have started out with 24 disks of IOPS power. Each individual page would have the IOPS power of the underlying LDEV (such as 2D+2D), but on average the set of pages mapped to a DP-VOL would include pages from all Pool Volumes, thus delivering the full 24-disk IOPS power.
Suppose that the initial set of free pages (from the six-LDEV Pool) becomes depleted (below 20% of the Pool) and two new LDEVs are then added to the Pool (each from different RAID-10 2D+2D Parity Groups). Once the last 20% of original Pool pages are assigned from a Pool's Free Page List, the new pages will now exhibit the overall IOPS power of only 8 disks. There will be a noticeable drop from the IOPS rates once available from the 24-disk Pool as I/O to new pages is now hitting an 8-disk Pool.
The following may be a way to describe this concept in a 10-second "techie spot":
- A new Pool is created from a set of LDEVs taken from like Parity Groups. The more Parity Groups in use, the higher the IOPS performance that is available to that Pool. This space is then "Optimized", where the Free Pool Page List (using 42MB pages) is created in the Hitachi Dynamic Provisioning region of Shared Memory. This list is built from 42MB pages across the Pool LDEVs. These free pages are then assigned on a first-come first-served basis as needed to DP-VOLs connected to that Pool.
- When new LDEVs are added to the Pool once the space becomes depleted, the same process is followed. A complementary number of LDEVs from enough Parity Groups should be added to maintain the IOPS performance level of the Pool.
Miscellaneous Hitachi Dynamic Provisioning Details
- The configuration of Hitachi Dynamic Provisioning Pools and DP Volumes is done via Storage Navigator (or HiCommand Device Manager, HDvM). This is also how the physical space usage per Pool is monitored. There are two Pool Free Space thresholds: a user specified threshold between 5%-95% page utilization and a fixed threshold at 80% utilization. Both of these thresholds are used as the triggers for the low Pool space notifications.
- If more than 4TB of space is required for a single host volume, then multiple DP-VOLs may be used in the ordinary fashion by a logical volume manager on the server to create a single large striped Logical Volume. Note that if these DP-VOLs come from different Pools, then the overall IOPS capacity will increase. If all of the DP-VOLs used by a host LVM are from the same Pool, then only the total usable capacity will increase.
- The use of Hitachi Dynamic Provisioning will cause an increase in the port processor (FED MP) utilization for some writes. There is a very small increase in utilization for individual reads or writes to the DP-VOL area when physical pages are already allocated from the Pool for that address space. When a write to a DP-VOL causes a physical page allocation from the Pool, there is a brief utilization increase (in microseconds) on the associated FED MP for that operation.
- The association of all of the array elements that must be configured is as follows:
  Array Group > Parity Group > LDEVs (assign LDKC:CU:LDEV) > HDP Pool Volume > V-VOL Group > DP-VOL (assign LDKC:CU:LDEV) > LUN.
  This lengthy set of steps is similar to what is required when using a host-based volume manager such as Veritas VxVM and an array that does not offer Hitachi Dynamic Provisioning. For instance, if using an EMC DMX-4, one would have to follow these steps, which would have to be repeated on every server attached to that array:
  Disks > Hypervolumes > Metavolumes with a software RAID > LUNs (assign port:address) > VM Disk (private region) > VM Disk (public region) > PLEX > Volume (LUN).
Program Product Compatibility                    Support
Program Product                                  Pool Vol   DP-VOL
Cache Residency Manager                          n/a        no
Cache Residency Manager for z/OS                 n/a        n/a
Compatible Mirroring for IBM FlashCopy           n/a        n/a
Compatible PAV for z/OS                          n/a        n/a
Compatible Replication for IBM XRC               n/a        n/a
Copy-on-Write Snapshot - P-VOL                   n/a        yes
Copy-on-Write Snapshot - POOL                    n/a        n/a
Data Retention Utility                           n/a        yes
LUN Manager                                      n/a        yes
LUN Security                                     n/a        yes
Open Volume Management - CVS                     yes        yes
Open Volume Management - LUSE                    n/a        no
Performance Manager                              yes        yes
Server Priority Manager                          n/a        yes
ShadowImage                                      n/a        yes
ShadowImage for z/OS                             n/a        n/a
Storage Navigator                                yes        yes
TrueCopy                                         n/a        yes
TrueCopy for z/OS                                n/a        n/a
TrueCopy Async                                   n/a        no
TrueCopy Async for z/OS                          n/a        n/a
Universal Replicator - P-VOL                     n/a        yes
Universal Replicator - S-VOL                     n/a        yes
Universal Replicator - Journal VOL               n/a        n/a
Universal Replicator for z/OS                    n/a        n/a
Universal Volume Manager                         yes        n/a
Virtual LVI                                      n/a        yes
Virtual LVI z/OS                                 n/a        n/a
Virtual Partition Manager                        yes        yes
Volume Security                                  n/a        yes
Volume Security z/OS                             n/a        n/a
Volume Shredder                                  n/a        yes
Volume Migration                                 no         yes
Volume Retention Manager                         n/a        yes
Volume Retention Manager z/OS                    n/a        n/a
over a dedicated 4Gbit host path. This very large "native" IOPS solution is not available from any other vendor.
HDP#2: If using Dynamic Provisioning, there could be 32 LDEVs from 32 RAID-10 (2D+2D) Parity Groups put into four Pools (128 disks in all), with four DP-VOLs created per Pool. Each DP-VOL would be used as an Exchange Storage Group, and four such Exchange Servers could be supported by this configuration. This huge "native" IOPS solution is not available from any other vendor. However, note that 4 paths from each of the four servers would likely be needed to effectively use this huge reservoir of random IOPS capacity.
Storage Concepts
Understand Your Customer's Environment
Before recommending a storage design for customers, it is important to know how the product will address the customer's specific business needs. The factors that must be taken into consideration include: capacity, performance, reliability, features, and cost with respect to the storage infrastructure component. The more data you have, the easier it is to architect a solution as opposed to just selling another storage unit. The types of storage configured, including the disk types, the number of RAID Groups and their RAID levels, as well as the number and type of host paths, is important to the solution.
Disk Types
Table 13 below lists the disk types available in the Universal Storage Platform V storage arrays. Note that the table lists the true sizes typically seen by a host after formatting. The additional formatting applied by host-based Logical Volume Managers and file systems will consume additional space.
Table 13. List of Disk Types and Characteristics, Along with Nominal Rule-of-Thumb IOPS Limits
HDD Type             Port Speed   Advertised Size (GB)   Apparent Size (GB)   Usable Size (GB)   Nominal Random IOPS
1TB SATA 7.2k RPM    3Gb/sec      1000                   976                  927.2              80
750GB SATA 7.2k RPM  3Gb/sec      750                    738.62               701.7              80
450GB FC 15k RPM     4Gb/sec      450                    432                  410.4              180
400GB FC 10k RPM     4Gb/sec      400                    384                  364.8              130
300GB FC 15k RPM     4Gb/sec      300                    288.2                273.8              180
146GB Flash          4Gb/sec      146                    143.76               136.6              5000 read / 1500 write
146GB FC 15k RPM     4Gb/sec      146                    143.76               136.6              180
72GB Flash           4Gb/sec      72                     71.5                 67.9               5000 read / 1500 write
72GB FC 15k RPM      4Gb/sec      72                     71.5                 67.9               180
The table above illustrates the advertised size (base 10), the apparent size in the tools (still base 10), and the typical usable (after Parity Group formatting) size of each type of disk. Also shown is the rule-of-thumb average maximum random 8KB block IOPS rate for each type of disk when using most of the disk surface. Also shown are the nominal rates expected for Flash Drives when used as members of a RAID-5 (7D+1P) group across two BED pairs (four boards, 8 FC-AL loop pairs).
Every disk has a maximum random small block IOPS rate determined by seek times (head movement across the disk) and rotational delays (disk RPM). The number and size of the LUNs created per RAID Group determine how much of a disk's surface is in active use. When using the full disk surface, any expectations for higher sustained levels of IOPS per disk must be met by sustained cache hit rates on the storage array (avoiding physical reads from disk). Note that 10-20% is a typical cache hit rate on Open Systems servers.
As more of the disk surface is allocated to LUNs for a RAID Group, the farther the disk heads must seek. This creates higher average seek times. For example (using a 15k RPM disk), when only using the outer 25% of the surface the average read seek time can be around 1.8ms (providing about 267 IOPS), whereas the 100% surface average seek time will be about 3.8ms (about 182 IOPS). For a SATA disk at 7200 RPM, these values are about 4.3ms (119 IOPS) for using 25% of the disk surface and 8.5ms (79 IOPS) when using the full disk surface. All write rates are about 5% lower than the read rates due to the higher seek precision required. See Appendix 14 for more details on theoretical estimates of disk IOPS rates.
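As a rough rule of thumb, the per-disk random IOPS figures above can be approximated as 1000 divided by the average access time in milliseconds (average seek plus half a rotation). A small Python sketch, using the seek figures quoted here and in Appendix 14; small differences from the quoted values come from rounded seek times.

    def disk_iops(avg_seek_ms: float, rpm: int) -> float:
        half_rotation_ms = 0.5 * 60_000 / rpm      # average rotational delay = half a turn
        return 1000 / (avg_seek_ms + half_rotation_ms)

    print(round(disk_iops(1.8, 15_000)))   # ~263; the text quotes about 267 for the outer 25%
    print(round(disk_iops(3.5, 15_000)))   # ~182, the 100%-surface figure in Appendix 14
    print(round(disk_iops(8.5, 7_200)))    # ~79 for a 7.2K SATA disk using the full surface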
RAID Levels
The Universal Storage Platform V array currently supports RAID levels 5, 6, and 10. RAID-5 is the most space efficient of these three RAID levels.
- RAID-5 is a group of disks (RAID Group or Parity Group) with the space of one disk used for the rotating parity chunk per RAID stripe (row of chunks across the set of disks). If using a 7D+1P configuration (7 data disks, 1 parity disk), then you get 87.5% capacity utilization for user data blocks out of that RAID Group.
- RAID-6 is RAID-5 with a second parity disk for a second unique parity block. The second parity block includes all of the data chunks plus the first parity chunk for that row. This would be indicated as a 6D+2P construction (75% capacity utilization) if using 8 disks.
- RAID-10 is a mirroring and striping mechanism. First, individual pairs of disks are placed into a mirror state. Then two of these pairs are used in a simple RAID-0 stripe. As there are four disks in the RAID Group, this would be represented as RAID-10 (2D+2D) and have 50% capacity utilization. RAID-10 is not the same as RAID 0+1, although sloppy usage by many would lead one to think this is the case. RAID 0+1 is a RAID-0 stripe of N disks mirrored to another RAID-0 stripe of N disks. This would also be shown as 2D+2D for a 4-disk construction. However, if one disk fails, that RAID-0 stripe also fails, and the mirror then fails, leaving a user with a single unprotected RAID-0 group. In the case of real RAID-10, one disk of each mirror pair would have to fail before getting into this same unprotected state.
The factors in determining which RAID level to use are cost, reliability, and performance. Table 14 shows the major benefits and disadvantages of each RAID type. Each type provides its own unique set of benefits, so a clear understanding of your customer's requirements is crucial in this decision.
Table 14. RAID Level Benefits and Disadvantages
RAID-10
  Description: Data striping and mirroring
  Number of disks: 4/8
  Benefit: Highest performance with data redundancy; higher write IOPS per Parity Group than with similar RAID-5.
  Disadvantages: Higher cost per number of physical disks.
RAID-5
  Description: Data striping with distributed parity
  Number of disks: 4/8
  Benefit: The best balance of cost, reliability, and performance.
  Disadvantages: Performance penalty for a high percentage of random writes.
RAID-6
  Description: Data striping with two distributed parities
  Number of disks: 8
  Benefit: Balance of cost, with extreme emphasis on reliability.
  Disadvantages: Performance penalty for all writes.
Another characteristic of RAID is the idea of write penalty. Each type of RAID has a different back-end physical disk I/O cost, determined by the mechanism of that RAID level. The table below illustrates the tradeoffs between the various RAID levels for write operations. There are additional physical disk reads and writes for every application write due to the use of mirrors or XOR parity. Note that SATA disks are usually deployed with RAID-6 to protect against a second drive failure within the Parity Group during the lengthy disk rebuild of a failed drive. Also note that you must deploy many more SATA disks than FC disks to be able to meet the same level of IOPS performance.
Table 15. RAID Write Penalties
Disk Type   RAID Level   HDD IOPS per Host Read   HDD IOPS per Host Write
FC          RAID-10      1                        2
FC          RAID-5       1                        4
FC          RAID-6       1                        6
SATA        RAID-10      1                        4
SATA        RAID-5       1                        6
SATA        RAID-6       1                        9
When using Fibre Channel disks, a write verify operation is performed internally after each write to the disk (for data or parity). This BED-issued command simply, by algorithmic means, verifies that it wrote the data where it was told to write the data. This same command is not available for SATA HDDs. To protect data, in the case of SATA disks, there are additional physical I/O operations per write in order to compare what was just written to disk with the data and parity blocks held in cache. With RAID-6, there are three such blocks: data, parity 1, and parity 2.
Table 16. Break Down of RAID Level Write Costs (Fibre Channel Disks)
RAID Level   IOPS Consumed per Host I/O Request
RAID-10      1 data write, 1 mirrored data write
RAID-5       2 reads (1 data, 1 parity), 2 writes (1 data, 1 parity)
RAID-6       3 reads (1 data, 2 parity), 3 writes (1 data, 2 parity)
SATA writes: also read back and compare the data and parity (mirror) writes.
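Given these penalties, the back-end disk IOPS implied by a host workload can be estimated from the read/write mix. The sketch below uses the Fibre Channel penalties from Tables 15 and 16; the function name and the 70/30 example mix are illustrative only.

    WRITE_PENALTY_FC = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}   # disk I/Os per host write

    def backend_iops(host_iops: float, read_fraction: float, raid: str) -> float:
        reads = host_iops * read_fraction                  # one disk I/O per host read
        writes = host_iops * (1 - read_fraction)
        return reads + writes * WRITE_PENALTY_FC[raid]

    # 1,000 host IOPS at a 70/30 read/write mix on RAID-5: 700 + 300 x 4 = 1,900 disk IOPS
    print(backend_iops(1000, 0.7, "RAID-5"))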
to various interactions between the LVM and the LUNs. One such example is the case of large block sequential I/O. If the LVM stripe size is equal to the RAID chunk size, then a series of requests will be issued to different LUNs for that same I/O, making the request appear to be several random I/O operations to the storage array. This can defeat the array's sequential detect mechanisms, and turn off sequential prefetch, slowing down these types of operations.
Port I/O Request Limits, LUN Queue Depths, and Transfer sizes
There are three aspects of I/O request handling per storage path that need to be understood. At the port level, there are the mechanisms of an I/O request limit and a maximum transfer size. At the LUN level, there is the mechanism of the maximum queue depth.
Port I/O Request Limits
Each server and storage array has a particular port I/O request limit, this being the number of total outstanding I/O requests that may be issued to a port (not a LUN) at any point in time. This limit will vary according to the server's Fibre Channel HBA and its driver, as well as the limit supported by the target storage port. As switches serve to aggregate many HBAs to the same storage system port, the operating limit will be the lower of these two points. In the case of a SAN switch being used to attach multiple server ports to a single storage port (fan-in), the limit will most likely be the limit supported by the storage port.
On the Universal Storage Platform V, the I/O request limit per FED MP (controlling either 1 or 2 ports) is 4096. This means that, at any one time, there can be up to 4096 active host I/O commands queued up for the various LUNs visible on that MP's ports.
Most environments don't require that all LUNs be active at the same time. As I/O requests are application driven, this must be taken into consideration. Understanding the customer's requirements for performance will dictate how many LUNs should be assigned to a single port. Environments that have low I/O demands overall will allow a large number of LUNs to be assigned per port without routinely exceeding the queue limit for that port. On the other hand, environments with highly active devices and multiple I/O requests should be designed with fewer LUNs per port, where additional ports will be required in order to satisfy the I/O demand.
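One simple way to reason about this is to compare the product of concurrently active LUNs and per-LUN queue depth against the port's I/O request limit. The sketch below assumes, for illustration, a per-LUN queue depth of 32 and a 2,048-per-port limit (half of the 4,096-per-MP figure quoted above); adjust the figures for the actual environment.

    def fits_port(active_luns: int, qdepth_per_lun: int, port_limit: int = 2048) -> bool:
        # port_limit: outstanding I/O requests a single port will accept (assumed value)
        return active_luns * qdepth_per_lun <= port_limit

    print(fits_port(64, 32))    # True: 2,048 outstanding commands, right at the limit
    print(fits_port(128, 32))   # False: 4,096 commands would exceed a single port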
Figure 39. Table of Maximum LUN Queue Depths and Port Limits
Maximum Queue Depth
  Array Type          QD per LUN   QD per External Port LUN
  USP                 16           8
  USP V               32           2-128
  AMS FC/SAS Disk     32
  AMS SATA Disk

Port I/O Request Limit for Hosts
  Array Type   Per MP I/O Req. Limit   Per Port I/O Req. Limit
  USP          2048                    1024
  USP V        4096                    2048

Port I/O Request Limit for External Storage
  Array Type   Per MP I/O Req. Limit   Per Port I/O Req. Limit
  USP          256                     128
  USP V        512                     256
Workload Characteristics
Most applications, from a disk subsystem point of view, can be categorized into one of seven types of application profiles (though there are extreme cases that are not covered here):
- Online Transactional Processing: Random Read/Write to data files, with Sequential Read/Write to Log functions
- Decision Support Systems: Sequential or Random Read, depending on data access method, post-ETL
- General Purpose File systems: Random Read/Write with potentially some Sequential I/O
- Messaging systems: Random Read/Write
- Web Server systems: Mixed Random and Sequential with various block sizes
- Content Rich Media Streaming and/or Video Streaming: Sequential Read/Write
- High Performance Computing: Sequential Read/Write
The applications that do not directly fit into one of these seven categories will most likely be a combination of two or more of these categories.
Fanout
Fan-out allows a host node to take advantage of several storage ports (and possibly additional port processors) from a single host port (one-to-many). Fan-out has a potential performance benefit for small block random I/O workloads. This allows multiple storage ports (and their queues) to service a smaller number of host ports. Fan-out typically does not benefit environments with high throughput (MB/sec) requirements due to the transfer limits of the host bus adapters (HBAs).
Summary
The Universal Storage Platform V offers great flexibility and it should perform superbly in any environment. At the hardware level, it offers considerably more IOPS power than does the equivalent Universal Storage Platform line of products. Even though the Universal Storage Platform already delivered best-of-class sequential performance (about 3x higher than any competition, with >9GB/s Read, >4GB/s Write), the Universal Storage Platform V significantly outperforms the Universal Storage Platform.
It is important for users to understand the two configuration methods now available with the new Dynamic Provisioning feature. The Hitachi Dynamic Provisioning feature can be used for configuring Thin Provisioning and for the creation of High Performance volumes. While only certain applications will benefit from the Thin Provisioning aspect, most can take advantage of the high performance per volume feature.
It is vital, however, that Hitachi Data Systems sales personnel, technical support staff, value-added resellers, and others who are responsible for delivery of solutions, invest the time required to design the best possible solution to meet each customer's unique requirements, whether it be capacity, reliability, performance or cost. To assist in delivering the highest quality solution, Hitachi Data Systems Global Solution Services should be engaged.
[Appendix diagram: physical layout of the HDU disk boxes, showing each Parity Group (1-1 through 18-16) with its hexadecimal address (01:00 through 12:0F), its four HDDs, and the DKA (BED) pair (DKA 1 through DKA 8) that services it.]
This section describes how the OPEN-V Open Systems Groups function on both the Universal Storage Platform and the Universal Storage Platform V. The track size (chunk) used for OPEN-V is 256KB. The 9900 Lightning style OPEN-x emulation is also available, with its track (chunk) size of 48KB. One should avoid mixing OPEN-x and the standard OPEN-V emulations on the same storage system.
When using generic RAID-5, one would normally calculate the stripe width (row size) by multiplying the RAID chunk size (or stripe size) by 7 (if 7D+1P). In the case of Open Systems (OPEN-V volumes), the Universal Storage Platform V uses a native chunk size of 256KB. Hence you would expect a row size of 1,792KB. However, the Universal Storage Platform V manages RAID-5 somewhat differently than any other vendor. For both Universal Storage Platform V RAID-5+ types, there are two adjacent 256KB RAID chunks taken from the same disk before switching to the next disk in the row, so the effective chunk size is 512KB. This is illustrated in the two Figures below. Thus, there are twice as many chunks per row as normal. So the Universal Storage Platform V's RAID-5+ 7D+1P stripe width is actually 3,584KB (7 x 512KB). The Universal Storage Platform V's RAID-5+ (3D+1P) stripe width is actually 1,536KB (3 x 512KB).
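The stripe-width arithmetic described here is just the data-disk count multiplied by the 512KB effective chunk (two adjacent 256KB chunks per disk); a short Python check:

    def openv_stripe_width_kb(data_disks: int, chunk_kb: int = 256, chunks_per_disk: int = 2) -> int:
        # Effective chunk = chunks_per_disk x chunk_kb (512KB for OPEN-V)
        return data_disks * chunk_kb * chunks_per_disk

    print(openv_stripe_width_kb(7))   # RAID-5+ 7D+1P: 3,584KB
    print(openv_stripe_width_kb(3))   # RAID-5+ 3D+1P: 1,536KB
    print(openv_stripe_width_kb(2))   # RAID-10+ 2D+2D: 1,024KB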
Figure 40. RAID-5+ (7D+1P) Layout (3,584KB stripe width).
When using generic RAID-10, one would normally calculate the stripe width (row size) by multiplying the RAID chunk size (or stripe size) by 2 (if 2D+2D). In the case of Open Systems (OPEN-V volumes), the Universal Storage Platform V uses a native chunk size of 256KB. Hence you would expect a row size of 512KB.
However, the Universal Storage Platform V manages RAID-10+ in the same manner as RAID-5+, where there are two adjacent RAID chunks taken from the same disk before switching to the next disk in the row. This is illustrated in the next Figure. Thus, there are twice as many chunks per row as normal. So the Universal Storage Platform V's RAID-10+ (2D+2D) stripe width is 1,024KB (2 x 512KB).
Figure 42. RAID-10+ (2D+2D) Layout (stripe width 1024KB).
The Universal Storage Platform V manages RAID-6+ in the same manner as RAID-5+, where there are two adjacent RAID chunks taken from the same disk before switching to the next disk in the row. This is illustrated in the Figure below. Thus, there are twice as many chunks per row as normal. So the Universal Storage Platform V's RAID-6+ 6D+2P stripe width is 3,072KB (6 x 512KB).
Figure 43. RAID-6+ (6D+2P) Layout (stripe width 3,072KB)
This section describes how the 3390-x (mainframe) and OPEN-x (9900V Lightning style Open Systems volumes) Parity Groups function on both the Universal Storage Platform and the Universal Storage Platform V. The track size (chunk) used for 3390-x is 58KB, and the OPEN-x track (chunk) size is 48KB. One should avoid mixing OPEN-x and the standard OPEN-V on the same storage system.
RAID-5+
When using generic RAID-5, one would normally calculate the stripe width (row size) by multiplying the RAID chunk size (or stripe size) by 7 (if 7D+1P). In the case of mainframe emulation (3390-x volumes), the Universal Storage Platform V uses a native track size of 58KB. Hence you would expect a row size of 406KB (7 x 58KB). For the OPEN-x emulation, the track size is 48KB, with an expected row size of 336KB (7 x 48KB).
However, the Universal Storage Platform V manages RAID-5 somewhat differently than any other vendor. For both Universal Storage Platform V RAID-5+ types, there are eight adjacent tracks taken from the same disk before switching to the next disk in the row. This is illustrated in the two Figures below. The Universal Storage Platform V's RAID-5+ 7D+1P stripe width for 3390-x is 3,248KB (7 x 8 x 58KB), and for OPEN-x it is 2,688KB (7 x 8 x 48KB). The Universal Storage Platform V's RAID-5+ (3D+1P) stripe width for 3390-x is 1,392KB (3 x 8 x 58KB), and for OPEN-x it is 1,152KB (3 x 8 x 48KB).
Figure 44. RAID-5+ (7D+1P) Layout, 3390-X and Open-X
RAID-10+
When using generic RAID-10, one would normally calculate the stripe width (row size) by multiplying the RAID chunk size (or track size) by 2 (if 2D+2D). In the case of mainframe emulation (3390-x volumes), the Universal Storage Platform V uses a native track size of 58KB. Hence you would expect a row size of 116KB (2 x 58KB). For the OPEN-x emulation, the track size is 48KB, with an expected row size of 96KB (2 x 48KB).
However, the Universal Storage Platform V manages RAID-10+ in the same manner as RAID-5+, where there are eight adjacent tracks taken from the same disk before switching to the next disk in the row. This is illustrated in the two Figures below. So the Universal Storage Platform V's RAID-10+ (2D+2D) stripe width for 3390-x is 928KB (2 x 8 x 58KB). The Universal Storage Platform V's RAID-10+ stripe width for OPEN-x is 768KB (2 x 8 x 48KB).
RAID-6+
The Universal Storage Platform V manages RAID-6+ in the same manner as RAID-5+, where there are eight adjacent tracks taken from the same disk before switching to the next disk in the row. This is illustrated in the Figure below. Thus, there are twice as many chunks per row as normal. So, for 3390-x, the Universal Storage Platform V's RAID-6+ 6D+2P stripe width is actually 2,784KB (6 x 8 x 58KB), and for OPEN-x it is 2,304KB (6 x 8 x 48KB).
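The same arithmetic, with eight adjacent tracks per disk and the 58KB (3390-x) or 48KB (OPEN-x) track sizes, reproduces the emulated stripe widths quoted in this section:

    def emulated_stripe_width_kb(data_disks: int, track_kb: int, tracks_per_disk: int = 8) -> int:
        return data_disks * track_kb * tracks_per_disk

    print(emulated_stripe_width_kb(7, 58))   # RAID-5+ 7D+1P, 3390-x: 3,248KB
    print(emulated_stripe_width_kb(7, 48))   # RAID-5+ 7D+1P, OPEN-x: 2,688KB
    print(emulated_stripe_width_kb(2, 58))   # RAID-10+ 2D+2D, 3390-x: 928KB
    print(emulated_stripe_width_kb(6, 48))   # RAID-6+ 6D+2P, OPEN-x: 2,304KB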
Figure 47. RAID-6+ (6D+2P) Layout, 3390-X and Open-X.
[Appendix diagram: BED (DKA) feature and slot assignments (BED-1 through BED-8, BED10 through BED1F, slots such as 1AU, 1BL, 2NU, 2WL), listing the four FC-AL loops (0-3) on each board and the Parity Groups (1 through 18) serviced on each loop.]
[Appendix diagram: FED feature and slot assignments (FED-1 through FED-8, FED00 through FED0F, slots such as 1EU, 1GL, 2RU, 2UL), showing the mapping of each port (for example 1A, 3A, 1B, 3B) to its front-end microprocessor (mp1-mp4).]
[Appendix diagram: the same FED feature and slot layout for boards with eight ports per board (for example 1A, 3A, 5A, 7A, 1B, 3B, 5B, 7B), showing how pairs of ports share each microprocessor (mp1-mp4).]
[Appendix diagram: FED feature and slot layout showing, for each port, which pair of microprocessors (MPs 1, 2 or MPs 3, 4) services it.]
[Appendix diagram: FED feature and slot layout in which every port is serviced by microprocessor pair 1, 2.]
This diagram illustrates a typical Veritas host LVM configuration and naming conventions.
[Diagram: server-side Veritas VxVM Volumes (#1-#3) built from Plexes (#1-#4) on VM disks (VM01-VM16), each VM disk mapping to a storage LUN.]
This section illustrates the Hitachi Dynamic Provisioning environment with naming conventions. Note the similarities in concepts to the previous Veritas example.
These are examples of maximum expected physical disk Read and Write IOPS rates for three types of disks. The Cylinders columns show the percentage of surface space in use as disk partitions. As more disk surface is brought into use, the heads must seek farther inwards, which increases the average seek times. Note that the Write table shows reduced maximum IOPS due to the higher track centering precision required for writes. Please note in the tables below that the three drive types use a representative capacity. For instance, almost all 15K RPM HDDs would be very similar to the 300GB Seagate 15K used in the first entry.
Table 1. Calculated Maximum Disk Read IOPS Rates by Type and Active Surface Usage

300GB Seagate Fibre Channel 15k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        15,000   15,000   15,000   15,000
  Average Read Seek Time (ms)     1.8      2.5      2.8      3.5
  Average Rotational Delay (ms)   2.0      2.0      2.0      2.0
  Average Access Time (ms)        3.8      4.5      4.8      5.5
  Random 8KB IOPS Rate            267      225      208      182

146GB Hitachi Fibre Channel 10k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        10,000   10,000   10,000   10,000
  Average Read Seek Time (ms)     2.4      3.3      3.8      4.7
  Average Rotational Delay (ms)   3.0      3.0      3.0      3.0
  Average Access Time (ms)        5.4      6.3      6.8      7.7
  Random 8KB IOPS Rate            187      159      148      130

750GB Seagate SATA 7.2k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        7,200    7,200    7,200    7,200
  Average Read Seek Time (ms)     4.3      6.0      6.8      8.5
  Average Rotational Delay (ms)   4.2      4.2      4.2      4.2
  Average Access Time (ms)        8.4      10.1     11.0     12.7
  Random 8KB IOPS Rate            119      99       91       79
Table 2. Calculated Maximum Disk Write IOPS Rates by Type and Active Surface Usage (Writes at 100% Busy)

300GB Seagate Fibre Channel 15k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        15,000   15,000   15,000   15,000
  Average Write Seek Time (ms)    2.0      2.8      3.2      4.0
  Average Rotational Delay (ms)   2.0      2.0      2.0      2.0
  Average Access Time (ms)        4.0      4.8      5.2      6.0
  Random 8KB IOPS Rate            250      208      192      167

146GB Hitachi Fibre Channel 10k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        10,000   10,000   10,000   10,000
  Average Write Seek Time (ms)    2.6      3.6      4.1      5.1
  Average Rotational Delay (ms)   3.0      3.0      3.0      3.0
  Average Access Time (ms)        5.6      6.6      7.1      8.1
  Random 8KB IOPS Rate            180      152      141      123

750GB Seagate SATA 7.2k RPM
  Cylinders                       25%      50%      75%      100%
  Disk RPM                        7,200    7,200    7,200    7,200
  Average Write Seek Time (ms)    5.0      7.0      8.0      10.0
  Average Rotational Delay (ms)   4.2      4.2      4.2      4.2
  Average Access Time (ms)        9.2      11.2     12.2     14.2
  Random 8KB IOPS Rate            109      90       82       71
The two tables below indicate what IOPS rates (small block random) might be expected from RAID Groups of different types when using 15K RPM SAS disks or 7200 RPM SATA-II disks. The disk IOPS reference values were taken from Tables 1 and 2 above, from the 50% surface usage columns. The left side of each table is the expected maximum rate where the disks are operating at 100% busy (not recommended). The right side of each table shows a more realistic 30% busy rate. For example, when using 15K RPM SAS drives with RAID-5 (7D+1P) at a 30% busy rate, you should plan for about 540 IOPS for 100% Reads, or 125 IOPS for 100% Writes. Take note these are all cache-miss values.
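The table values can be reproduced from the per-disk IOPS, the number of disks in the RAID Group, the busy-rate ceiling, and the write penalty from Table 15. A small Python sketch (function name illustrative):

    WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}   # FC/SAS disks, from Table 15

    def rg_iops(disks: int, per_disk_iops: float, busy: float, raid: str, writes: bool) -> int:
        penalty = WRITE_PENALTY[raid] if writes else 1
        return int(disks * per_disk_iops * busy / penalty)

    # RAID-5 (7D+1P) on 15K disks at 30% busy, 50%-surface figures (225 read / 208 write):
    print(rg_iops(8, 225, 0.30, "RAID-5", writes=False))   # ~540 IOPS for 100% reads
    print(rg_iops(8, 208, 0.30, "RAID-5", writes=True))    # ~124 (the table rounds to 125)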
Table 3. Calculated RAID Group IOPS Rates (SAS) by RAID Level and Percent Busy Rates
15k RPM Disks, using IOPS = 225 (read) / 208 (write) per disk

RAID Type       100% busy Read   100% busy Write   30% busy Read   30% busy Write
RAID-1+0 2+2    900              416               270             125
RAID-1+0 4+4    1,800            832               540             250
RAID-5 3+1      900              208               270             62
RAID-5 7+1      1,800            416               540             125
RAID-6 6+2      1,800            277               540             83
Table 4. Calculated RAID Group IOPS Rates (SATA-II) by RAID Level and Percent Busy Rates
7.2k RPM SATA Disks, using IOPS = 99 (read) / 90 (write) per disk

RAID Type       100% busy Read   100% busy Write   30% busy Read   30% busy Write
RAID-1+0 2+2    396              90                119             27
RAID-1+0 4+4    792              180               238             54
RAID-5 3+1      396              60                119             18
RAID-5 7+1      792              120               238             36
RAID-6 6+2      792              80                238             24
The table below shows various guidelines for different file systems and their formatting behavior on a DP-VOL.
By Ian Vogelesang
The fact that we already worry about hot spots (busy parity groups) means that Fibre Channel drives are already quite busy. In other words, we generally adjust workloads on Fibre Channel drives to stay within their IOPS handling capability.
For example, the performance benefit of Dynamic Provisioning is that the I/O activity associated with a large number of host DP-VOLs can be combined and distributed over many parity groups. This evens out the activity over those many parity groups and thus eliminates hot spots.
Thus, by eliminating hot spots, Dynamic Provisioning lets you raise average parity group utilization (% busy) somewhat. But what does raise parity group utilization mean? It means put more data on each drive.
If we combine the data from two separate hosts or two applications together and store them on one parity group, we are also combining the associated activity (IOPS) together.
Thus if we put twice as much data of the same type on one disk drive, we will also double the activity (IOPS) on the drive. Notice that there is the concept of the type of data in terms of its intensity of access. Some types of data are infrequently accessed, and others are heavily accessed. This is a fundamental characteristic of the data, no matter what type of drive is used to store the data. The activity is associated with the data, and if you have twice the data, you will have twice the associated activity.
We call this intrinsic measure of the activity associated with a given amount of data the Access Density, and it is measured in IOPS per GB. For example, if you have 100 IOPS associated with 200GB of data, the access density for that data is 100 IOPS per 200GB of data, or 0.5 IOPS per GB.
Compared to Fibre Channel drives:
- SATA drives have a much lower IOPS capability on a per-drive basis.
- SATA drives have a much higher storage capacity on a per-drive basis.
Thus to figure out if it's cost-effective to use SATA drives, you need to figure out not only how many SATA drives it will take to hold the data, but also how many SATA drives it will take to handle the IOPS that come with the data.
In general we can't put most normal types of data on SATA drives, because SATA drives have a much lower IOPS limit than that of Fibre Channel drives.
Let's look at some numbers:
Each I/O operation begins with mechanical positioning, which starts with a seek operation (to move the access arm into position) followed by latency (waiting for the drive to turn until the starting point of the target data comes under the head).
The average seek time for a disk drive is generally longer than the average latency (1/2 turn), so in the time it takes to do the mechanical positioning, the drive makes more than one full rotation. However, a random I/O typically processes a 4KB or 8KB block, which is on the order of one percent of the storage capacity of one disk drive track. The capacity of one track varies a lot between drive types and the inner and outer zones of the drive, but in general the transfer time for a random I/O block is on the order of less than 1% of the total I/O service time, so random I/O is 99% mechanical positioning.
Nominal one-at-a-time IOPS capability of sample drive types:

                                             1TB SATA drive   400GB 10,000 RPM FC drive   300GB 15,000 RPM FC drive
Average read seek time                       8.5ms            4.9ms                       3.5ms
Average latency (1/2 of a rotation)          4.2ms            3.0ms                       2.0ms
Total mechanical positioning time (read)     12.7ms           7.9ms                       5.5ms
Read IOPS capability (the number of read
mechanical positioning operations that can
be done in one second, one at a time)        79 IOPS          127 IOPS                    182 IOPS
When we factor in the amount of data that is stored on the drive, we can calculate the base one-at-a-time read access density capability of each type of drive:

                                             1TB SATA drive    400GB 10,000 RPM FC drive   300GB 15,000 RPM FC drive
Read IOPS capability (one at a time)         79 IOPS           127 IOPS                    182 IOPS
Storage capacity, GB                         1,000GB           400GB                       300GB
Nominal read access density capability       .08 IOPS per GB   .32 IOPS per GB             .61 IOPS per GB
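The access density capability figures follow directly from dividing the one-at-a-time read IOPS capability by the drive capacity; a quick check in Python using the numbers above:

    drives = {"1TB SATA 7.2K": (79, 1000), "400GB FC 10K": (127, 400), "300GB FC 15K": (182, 300)}
    for name, (read_iops, capacity_gb) in drives.items():
        print(name, round(read_iops / capacity_gb, 2), "IOPS per GB")
    # roughly 0.08, 0.32, and 0.61 IOPS per GB respectively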
[Chart: Random Read Access Density, IOPS per GB (one read at a time), for the 1TB SATA, 10K 400GB, and 15K 300GB drives.]
What we can see here is that if we fill each drive with data, 10K RPM drives are capable of handling data that is 4x as active as the data we can put on a SATA drive, and 15K RPM drives are capable of handling data that is 8x as active.
It gets worse.
I/O operations are not issued one at a time. Multiple I/O operations (each identified by a tag) are issued at once to each disk drive, in order to let the disk drive decide what order to perform the I/O operations in, so as to minimize the amount of seeking and rotational delay necessary. This capability to reduce average mechanical positioning time is called Tagged Command Queuing, or TCQ.
Fibre Channel drives have substantially more microprocessor horsepower than SATA drives, and as a result, they get substantially higher boosts in throughput through the use of TCQ. The amount of throughput boost from TCQ varies by workload, but for the purposes of illustration, it's reasonable to show a 25% boost in IOPS capability using TCQ for SATA, and a 50% boost for Fibre Channel.
Now a more realistic comparison for read Access Density capability is shown:
[Chart: Random Read Access Density, IOPS per GB (with Tagged Command Queuing), for the 1TB SATA, 10K 400GB, and 15K 300GB drives.]
Now the Access Density capability of 10K RPM drives is roughly 5x that of SATA, and 15K RPM drives roughly 10x that of SATA.
It gets even worse.
The use of SATA drives in subsystem applications is relatively recent, and protecting the integrity of customer data is paramount. Therefore, Hitachi subsystems perform a read verify operation after every write with SATA drives. This adds a random read positioning time after every write operation. Note that this is not one extra rotation, but truly a full random read positioning, as in general multiple I/O operations are queued in the drive at once, and therefore the read verify operation goes in the queue after the write completes, and other I/O operations in general will intervene between the write and the subsequent read verify.
[Chart: Random Write Access Density, IOPS per GB (with Tagged Command Queuing and SATA read-verify), for the 1TB SATA, 10K 400GB, and 15K 300GB drives.]
For random writes, the 10K RPM drive is capable of handling about 10x the access density of the SATA drive, and the 15K RPM drive about 18x that of the SATA drive.
These numbers are only rough estimates, but what you can clearly see here is that in order to realize the cost effectiveness of SATA drives, that is, in order to take advantage of the storage capacity of SATA drives, they need to be filled with data. The only type of data that can fill a SATA drive is data that is approximately an order of magnitude (10 times) less active than data that would normally be put on Fibre Channel drives.
Failure to understand this 10x difference in access density capability can result in serious SATA drive deployment problems. If normal Fibre Channel drive type data is placed on SATA drives, the result will usually be that the SATA drives cannot keep up with the IOPS demand, and subsystem cache will fill up with data waiting to be written to the SATA drives. In order to treat the symptom (excessive write pending), SATA drive workloads can be placed in their own Cache Logical Partition (CLPR). Even though this may isolate other workloads from the problem, the SATA drives will still be a bottleneck constraining the throughput of the application that is using them.
The summary on SATA drives is that if they are used for traditional application data, there are only two scenarios:
1) Cost was reduced by putting over 100GB of normal data on a 1TB SATA drive. Excessive write pending forces the SATA drives to be isolated in a CLPR. Cost was reduced, but throughput is drastically reduced.
2) Less than 100GB of data was placed on each SATA drive. The extra money (over and above what it would have cost for Fibre Channel drives) was spent in order to provide enough SATA drives to handle the IOPS. For this price premium the result is slower response time and higher electrical power costs.
Thus only if you have a type of workload that has at most 1/10th the access density of normal workloads will SATA drives be cost effective.
An LDEV is a logical volume internal to the subsystem, which can hold host data. LDEVs are carved from a logical container called a VDEV. In the diagram below, an internal VDEV is shown, which maps to the formatted space (net of parity) in a Parity Group.
Figure 1. Overview of Logical Devices.
[Figure 2. Mapping of the logical constructs: a Host Group LUN maps to an HDEV; an HDEV maps to a single LDEV, a LUSE of several LDEVs, a CoW V-VOL, or a DP-VOL; LDEVs are carved from VDEVs (internal Parity Group VDEVs of less than 2.99TB, concatenated parity group VDEVs, external VDEVs of less than 4TB, and CoW or HDP V-VOL Group VDEVs of up to 4TB), which are in turn backed by Parity Groups, external LUNs, CoW Pools, or HDP Pools.]
LDEVs are uniquely identified across the subsystem by a six-hexadecimal-digit identifier, usually written with a colon between each pair of hex digits, for example 00:00:01. This format is called the LDKC:CU:LDEV format, and thus the subfields within an LDEV name are called the LDKC (Logical DKC) number, the CU (Control Unit) number, and the LDEV number (within the CU).
The maximum number of LDEVs that may be created within a Universal Storage Platform V or Universal Storage Platform VM subsystem is 64K. The LDKC field, the first 2 hex digits of an LDEV identifier, was added with the introduction of the Universal Storage Platform V and Universal Storage Platform VM in order to provide for the ability, at a later date, to possibly increase the number of LDEVs per subsystem above 64K.
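As a minimal illustration of this naming convention (the helper names below are our own, not part of any Hitachi tool):

    # Sketch: split an LDKC:CU:LDEV identifier such as "00:12:34" into its subfields.

    def parse_ldev_id(ldev_id: str):
        ldkc, cu, ldev = (int(part, 16) for part in ldev_id.split(":"))
        return ldkc, cu, ldev

    def format_ldev_id(ldkc: int, cu: int, ldev: int) -> str:
        return f"{ldkc:02X}:{cu:02X}:{ldev:02X}"

    print(parse_ldev_id("00:12:34"))   # (0, 18, 52)
    print(format_ldev_id(0, 0, 1))     # 00:00:01
    # With LDKC fixed at 00, CU (256 values) x LDEV (256 values) = 64K LDEVs per subsystem.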
Note that the term LDEV as used with HDS enterprise subsystems (Universal Storage Platform family) to describe an internal logical volume, such as can be carved from a parity group, is somewhat confusingly called an LU or a LUN for HDS modular subsystems (AMS family). What is called an LU or LUN in HDS enterprise subsystems is called an HLUN in modular subsystems. Sorry for the confusion.
Note that a LUN, the way that a host sees an LDEV, emulates a physical Fibre Channel disk drive. As such, accessing a LUN consists of reading or writing 512-byte disk sectors. Thus the
smallest amount of data that a host can read or write is one whole 512-byte sector. The size of a LUN or an LDEV is therefore a multiple of a 512-byte block (sector).
When creating an LDEV, if you specify the size in blocks (512-byte sectors) you will get an LDEV of that exact size. If you specify the size you want using any other unit, such as MB, there will be a rounding operation that will take the size you asked for and round it up to the next higher multiple of an internal allocation unit (logical cylinders of 15 logical tracks). Suffice it to say that if you want to make an LDEV of an exact size, specify the size in blocks (sectors).
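A minimal sketch of that round-up behavior follows. The sectors-per-logical-track value used here is a placeholder assumption (the actual value depends on the emulation type), so only the rounding logic itself is the point.

    # Sketch: round a requested LDEV size up to a whole number of logical cylinders,
    # where one cylinder = 15 logical tracks. SECTORS_PER_TRACK is a placeholder;
    # the real value depends on the emulation type.

    SECTOR_BYTES = 512
    SECTORS_PER_TRACK = 96             # assumption for illustration only
    CYLINDER_SECTORS = 15 * SECTORS_PER_TRACK

    def ldev_size_in_sectors(requested_mb: int) -> int:
        requested_sectors = (requested_mb * 1024 * 1024) // SECTOR_BYTES
        cylinders = -(-requested_sectors // CYLINDER_SECTORS)   # ceiling division
        return cylinders * CYLINDER_SECTORS

    print(ldev_size_in_sectors(100))   # slightly more than 100 MB worth of sectors
    # Specifying the size directly in blocks (sectors) avoids this round-up entirely.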
Note in the diagram above that LDEVs are not directly presented to hosts as a Logical Unit (LU) or LUN, but rather there is an intermediate construct called an HDEV (Head LDEV). An HDEV may either map directly to a single LDEV, or it may be mapped to what is called a LUSE (Logical Unit Size Expansion). A LUSE is a combination of from 2 to 36 LDEVs:
Figure 3. Overview of LUSE Devices.
[Diagram: an LU presented through a physical port and Host Storage Domain (logical port) maps to an HDEV; the HDEV maps either directly to a single LDEV on a VDEV, or to a LUSE in which from 2 to 36 LDEVs are combined.]
The only reason that one should make a LUSE is to create a LUN that is larger than the maximum available LDEV size. For internal LDEVs, the size of an LDEV is limited by the amount of formatted space in one parity group, or about 2.99 TB, whichever is smaller (more on this below when we talk about VDEVs). For other types of LDEV, the maximum size is 4 TB.
The maximum size of a LUSE, on the other hand, is 60 TB.
Note: The LDEVs within a LUSE are not interleaved or striped. LBAs (Logical Block Addresses, or sector numbers) for the LUSE start at the beginning of the first, or head, LDEV, and then proceed to the end of each LDEV before starting on the next one. Thus when reading or writing sequentially, the performance will be limited by the performance of a single LDEV at a time.
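A small sketch of that spill-and-fill addressing, assuming for illustration a LUSE built from LDEVs whose sizes are given in blocks (all names and sizes here are made up):

    # Sketch: map a LUSE LBA to (constituent LDEV index, LBA within that LDEV).
    # LDEVs in a LUSE are concatenated, not striped, so LBAs run to the end of
    # each LDEV before the next one begins.

    def luse_lba_to_ldev(lba: int, ldev_sizes_in_blocks):
        offset = lba
        for index, size in enumerate(ldev_sizes_in_blocks):
            if offset < size:
                return index, offset
            offset -= size
        raise ValueError("LBA beyond the end of the LUSE")

    sizes = [2_000_000, 2_000_000, 2_000_000]        # three LDEVs (illustrative sizes)
    print(luse_lba_to_ldev(0, sizes))                # (0, 0): start of the head LDEV
    print(luse_lba_to_ldev(3_500_000, sizes))        # (1, 1500000): inside the second LDEV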
Later on, we will talk about what are confusingly called concatenated parity groups, which do not give the ability to make larger LDEVs, but which do interleave the RAID stripes of an LDEV across multiple parity groups.
Figure 4. Overview of LUSE Devices.
[Diagram: a single host with two HBAs, each connected through a switch to a different physical port; each port's Host Storage Domain (logical port) presents an LU that maps through the same HDEV (LDEV or LUSE) and VDEV to one Parity Group of physical disk drives.]
Going back to our original diagram for a moment, we can see that a single LDEV (subsystem-internal logical volume) may be exposed as multiple different LUNs to hosts. The diagram above shows a single host with two paths to an HDEV.
Granting a host access to an HDEV (LDEV or LUSE) is done by creating a Logical Unit (LU) within a Host Storage Domain. This mapping from a host to an HDEV is sometimes called a path. A path consists of three parts: the physical port (such as port 1A), the logical port (or Host Storage Domain) within that physical port, and then the Logical Unit Number, or LUN, within that HSD. Host Storage Domains may be identified by name, or by relative number within the physical port. Note that it is possible to map a particular LDEV as one LU number in one HSD and a different LU number in a different HSD.
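That three-part path structure can be pictured with a simple mapping; this is purely illustrative, and the port, HSD, and LDEV names below are made up:

    # Sketch: a "path" = (physical port, Host Storage Domain, LU number) -> HDEV/LDEV.
    # The same LDEV may appear as different LU numbers in different HSDs.

    paths = {
        ("1A", "HSD_prod", 0): "00:01:00",
        ("2A", "HSD_prod", 0): "00:01:00",   # second path to the same LDEV
        ("1A", "HSD_test", 5): "00:01:00",   # same LDEV exposed as a different LU number
    }

    for (port, hsd, lun), ldev in paths.items():
        print(f"port {port}, HSD {hsd}, LUN {lun} -> LDEV {ldev}")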
From the host's perspective, the host doesn't see the two-level structure of the physical port and HSD (logical port). When the HSD is configured, you tell the subsystem which host(s) are to be mapped to each HSD. (A host can only be mapped to one HSD on a given port.) Thus the host sees a series of logical units (LUs) behind the port, where each LU is identified by its LU number.
The diagram above also illustrates how multiple paths from a host to an HDEV are implemented. The key is the use of multipath software in the host, such as Hitachi's HDLM (Hitachi Dynamic Link Manager). The multipath software is a device driver which presents a single logical device for applications to use. But when I/Os are directed to that logical device, the multipath software device driver receives those I/O operations and reissues them to one or the other of the separate LUNs presented by each HBA's device driver. The HBA device drivers' LUNs are hidden from applications, and are only used by the multipath driver software. It is only the multipath software that knows that the (hidden) LUNs from the two HBAs really lead to the same HDEV inside the subsystem.
Correspondingly, at the subsystem end, when a host has multipath software running, the subsystem is not aware of that fact. The subsystem does not see I/O operations coming down separate paths as being related in any way. For this reason, the extended round-robin host multipathing algorithm will produce better results for sequential I/O in active/active configurations. Extended round-robin issues a series of I/O operations for a given multipath LUN over one of the paths before issuing the next series of operations over the other path. In this way, the subsystem's sequential detection algorithm, which is not aware of multipathing, will still see sequential host I/O as sequential at the individual subsystem port.
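A toy sketch contrasting plain round-robin with the extended round-robin behavior just described; the batch size and path names are arbitrary assumptions, not HDLM parameters.

    # Sketch: extended round-robin keeps a run of consecutive I/Os on one path
    # before switching, so the subsystem port still sees a sequential stream.

    from itertools import cycle

    def round_robin(n_ios, paths):
        pick = cycle(paths)
        return [next(pick) for _ in range(n_ios)]

    def extended_round_robin(n_ios, paths, batch=4):
        pick, out = cycle(paths), []
        while len(out) < n_ios:
            path = next(pick)
            out.extend([path] * min(batch, n_ios - len(out)))
        return out

    print(round_robin(8, ["path_A", "path_B"]))           # A B A B A B A B
    print(extended_round_robin(8, ["path_A", "path_B"]))  # A A A A B B B B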
Recall that we said that LDEVs are carved from a logical container called a VDEV. There are four types of VDEV: internal, external, CoW, and Dynamic Provisioning (DP) VDEVs:
[Diagram: the four VDEV types. An internal VDEV maps to a Parity Group; an external VDEV maps through ports to an LU on an external (virtualized) subsystem; a CoW V-VOL group and a Dynamic Provisioning V-VOL group map their LDEVs, via a pool ID, to pool volumes.]
Internal VDEV
The original type was the internal VDEV, which is a logical container that maps to the formatted space within a parity group, net of parity or mirror copies. Since LDEVs are carved from VDEVs, the size of the largest internal LDEV that can be created is one that entirely fills the VDEV that it is carved from.
There is a maximum size for internal VDEVs, which is about 2.999 TB. If there is more formatted space than this in the parity group, then as many maximum-size VDEVs as will fit are first allocated, then any remaining space is allocated to one more VDEV.
You can see this if your parity group has more than 3 TB of space in it. In the following example, we have a RAID5 7+1 parity group consisting of 750 GB drives.
The effective capacity of each drive is not quite 750 GB, but as a rough estimate the total formatted capacity of the parity group is ~0.7 TB per drive * 7 data drives' worth of formatted space = ~4.9 TB. This is bigger than the maximum internal VDEV size, so two VDEVs are created to represent the formatted space on this parity group. Here the parity group name is 1-1, and the VDEV names on this parity group are 1-1(1) and 1-1(2). That's what those numbers in parentheses are at the end of a parity group name: they are the number of the VDEV within that parity group. Note that the LDEVs are within the VDEV, not the parity group.
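A small sketch of that carving rule, using the same rough numbers as the 7D+1P / 750 GB example above (the usable-capacity figure is only an approximation):

    # Sketch: split a parity group's formatted capacity into internal VDEVs,
    # each at most ~2.99 TB, with any remainder going into one more VDEV.

    MAX_INTERNAL_VDEV_TB = 2.99

    def carve_vdevs(formatted_tb: float):
        vdevs = []
        remaining = formatted_tb
        while remaining > MAX_INTERNAL_VDEV_TB:
            vdevs.append(MAX_INTERNAL_VDEV_TB)
            remaining -= MAX_INTERNAL_VDEV_TB
        if remaining > 0:
            vdevs.append(round(remaining, 2))
        return vdevs

    # RAID5 7D+1P with 750 GB drives: ~0.7 TB usable per data drive x 7 = ~4.9 TB formatted.
    print(carve_vdevs(0.7 * 7))   # -> [2.99, 1.91]: two VDEVs, e.g. 1-1(1) and 1-1(2)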
While we are at it, parity groups have names like 1-1, 1-2, 3-1, and so on. These names have physical meaning, as they indicate the BED pair and physical drive location.
Thus the names for internal VDEVs are purely determined by the physical layout of the subsystem.
External VDEV
Each external VDEV maps to a LUN on an external (virtualized) subsystem. Thus each LUN on the external subsystem corresponds to the equivalent of a parity group. It is possible to create an LDEV on an external VDEV that is the exact size in sectors of the external LUN. This makes it handy to do replication or migration.
The names for external VDEVs look like either E1-1 or 1-1#. Both forms of the name are equivalent; thus E1-1 and 1-1# both refer to the same external VDEV. In some places one form of the name is used, and in other places the other form is used.
Note that there is a separate namespace for each of the four types of VDEV, so that even if you have an internal parity group called 1-1, you can create an external VDEV called E1-1, and these are not the same.
The maximum size of an external VDEV is 4 TB.
CoW VDEV
VDEVs for Copy-on-Write Snapshot (CoW) are usually called V-VOL groups.
Each CoW VDEV (CoW V-VOL group) is fixed in size at 4 TB.
CoW V-VOL groups have names that look like V1-1 or 1-1 V. These two forms are equivalent, with one form being used in some places and the other form the rest of the time.
A CoW V-VOL group does not map to any storage. It is purely a logical container from which CoW V-VOLs (CoW LDEVs) may be carved. It is these CoW V-VOLs, each individually, that are mapped to CoW pools.
V-VOL stands for Virtual VOLume. This is because physical space is not provided for CoW V-VOLs when they are created. Instead, the LBA range representing the extent of the V-VOL is divided up into 42 MB pages, where the pages are tracked in a page table. A physical location for a 42 MB page within the assigned CoW pool is only assigned when that page in the V-VOL is written to. Thus pages that have never been written to have no storage assigned. This comes in handy, as CoW V-VOLs are point-in-time snapshot ShadowImage secondary volumes, and thus you only need space for whatever data has changed on the primary volume.
The underlying space in a CoW pool is provided by pool volumes, which are normal internal or external LDEVs exclusively assigned for duty as pool volumes. The space on the pool volume LDEVs is broken into 42 MB pages, each starting at a RAID stripe boundary. Thus one page consists of a series of multiple RAID stripes.
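A highly simplified sketch of the allocate-on-first-write page-table behavior described above; the class names, pool layout, and the example volume size are illustrative only.

    # Sketch: a V-VOL page table that assigns a 42 MB pool page only when a
    # page is first written; unwritten pages consume no pool space.

    PAGE_MB = 42

    class Pool:
        def __init__(self):
            self.next_free = 0
        def allocate_page(self):
            loc = self.next_free          # e.g. an offset on some pool volume LDEV
            self.next_free += 1
            return loc

    class VVol:
        def __init__(self, size_mb, pool):
            self.pages = {}               # page index -> location in the pool
            self.n_pages = -(-size_mb // PAGE_MB)
            self.pool = pool
        def write(self, page_index):
            if page_index not in self.pages:
                self.pages[page_index] = self.pool.allocate_page()

    pool = Pool()
    vvol = VVol(size_mb=10_000, pool=pool)   # ~10 GB virtual volume
    vvol.write(0); vvol.write(7)
    print(len(vvol.pages), "of", vvol.n_pages, "pages have pool space assigned")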
Odd note on pool volumes: when an LDEV is assigned as a CoW or Hitachi Dynamic Provisioning pool volume, the suffix "P" is shown on the name of the parity group that the pool volume LDEV is on. For example, if a normal LDEV 00:12:34 which is on parity group "1-1" is assigned as a CoW or Hitachi Dynamic Provisioning pool volume, from that point on, LDEV 00:12:34 will show as being on parity group "1-1P". In this case, the "P" is really a marker to indicate special treatment as a pool volume, not that there is a parity group whose name is "1-1P".
You can see this using the SVP configuration printout tool.
When you look at the performance statistics for Dynamic Provisioning, what you see is that the DP-VOLs (the virtual volumes) are the ones that the hosts access, so they have front-end I/O statistics only. On the other hand, pool volumes only have back-end I/O statistics, as they are the ones to which the actual back-end I/O is performed.
The maximum number of LDEVs within a Universal Storage Platform V or a Universal Storage Platform VM is 64K. This is also the maximum number of VDEVs in a subsystem, and therefore it is possible (and it is now required since microcode version 60-04-00/00) to place each DP-VOL by itself at the top of its own V-VOL group (DP VDEV).
In general, since LDEVs are carved from VDEVs, if there is contiguous unused space immediately following the LDEV within the VDEV, it is possible to expand the size of an existing LDEV into that free space, as shown in the following diagram:
[Diagram: a VDEV containing three LDEVs followed by free space; the LDEV adjacent to the free space can be expanded into it.]
During this expansion, any existing data in the LDEV stays in place, and all that happens is that the LDEV gets more space at the end. Note that appropriate operating system support is required for the host to notice that the volume got bigger and to adjust the partition table on the volume accordingly. Then an existing partition may be dynamically expanded, again if the operating system and file system support that.
This is the reason why each DP-VOL is required to be placed within its own VDEV (V-VOL group). Since the maximum number of VDEVs is the same as the maximum number of LDEVs, by putting each DP-VOL by itself at the top of its own V-VOL group (VDEV), the DP-VOL can be dynamically expanded later if necessary.
Of course, now we have the anomaly of a Dynamic Provisioning VDEV being called a V-VOL group even though it can only contain one V-VOL.
Thus it is in the internal type of VDEV where the mapping happens to the formatted space within a parity group.
In the next two diagrams, we see the VDEV layout on first RAID1 (2D+2D) and then RAID5 (3D+1P) parity group types.
Thus the VDEV is mapped to chunks on the disk drives. In the Universal Storage Platform family, the chunk size varies only with the emulation type, and not by the RAID level or number of drives in the parity group.
Note that the chunks are relatively big. About a hundred 4 KB blocks fit within one chunk.
For random I/O, only the data requested by the host is read or written. If the host reads a 4 KB block, only 4 KB is read from the drive, not an entire chunk.
The time it takes to do a random I/O is composed of over 99% mechanical positioning and less than 1% data transfer. This is because a block size in the 4 KB to 8 KB range represents on the order of only 1% of a disk drive physical track size, and mechanical positioning (seek + latency) takes on average more than one full disk drive rotation.
That's why, for random I/O workloads, all we care about is the disk drive seek time and RPM.
The chunk size is chosen so that the amount of time it takes to transfer a whole chunk (the size of the chunk divided by the read/write head sustained data transfer rate) is of the same order of magnitude as the mechanical positioning time, that is, seek and latency time.
It's designed this way because sequential I/O operations read or write an entire chunk at a time. With such big chunks, very roughly one half of the disk drive busy time when reading or writing sequentially is mechanical positioning and one half is data transfer.
This makes sequential transfer relatively productive.
It also makes the I/O service time to read or write a chunk about twice as long as a small-block random I/O. The mechanical positioning is the same for both, but for large-block sequential I/O, after the mechanical positioning is complete it takes roughly one disk drive rotation's worth of time to transfer the chunk.
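To put rough numbers on this, here is a back-of-the-envelope sketch; the transfer rate and positioning time used are illustrative assumptions, not drive specifications.

    # Rough arithmetic: why transferring one 512 KB chunk takes about as long as
    # the mechanical positioning itself. All figures are illustrative assumptions.

    chunk_kb = 512
    sustained_transfer_mb_per_s = 80     # assumed sustained head data rate
    avg_positioning_ms = 7.0             # assumed seek + rotational latency

    transfer_ms = chunk_kb / 1024 / sustained_transfer_mb_per_s * 1000
    random_io_ms = avg_positioning_ms    # small-block transfer time is negligible
    sequential_chunk_io_ms = avg_positioning_ms + transfer_ms

    print(f"chunk transfer       ~{transfer_ms:.1f} ms")
    print(f"random 4 KB I/O      ~{random_io_ms:.1f} ms")
    print(f"sequential chunk I/O ~{sequential_chunk_io_ms:.1f} ms  (~2x the random I/O)")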
You might ask, why not make the chunks even bigger? If transferring a 512 KB chunk represents almost half of the total I/O service time, then why not make the chunks 1 MB? That way you would get a 50% throughput increase for sequential I/O, because then almost 2/3 of the drive busy time during a sequential I/O would be data transfer.
The reason is that if a random read miss occurs while a sequential I/O (prefetch stage or write destage) is in progress, the host has to wait. Sequential I/O operations already take about twice as long as random I/Os, and if we made the chunks even bigger the host would have to wait even longer if a random read miss occurs during a sequential I/O. Thus there is a tradeoff, as a larger chunk size would increase sequential throughput at the expense of adversely impacting random read performance.
Note also in the RAID1 VDEV chunk layout diagram that the primary chunks (shown colored blue) and the mirror copy chunks (colored red) are in alternating positions from row to row. This is done because the mirror copies are normally write-only, as a backup. All read operations are directed to the primary copy unless the drive containing the primary copy has failed.
Thus the effect of the alternating primary and mirror chunks in the layout from row to row in RAID1 tends to balance the number of read operations directed to each drive within the parity group.
Let's take a look at the chunk layout for a RAID5 (3D+1P) VDEV in the Universal Storage Platform family.
You might ask, why does the 2nd row have the chunks numbered in the order 4, 5, 3? This is confusing. The reason is that, in the Universal Storage Platform family, sequential reads will read a chunk from each drive in the parity group in parallel. The layout above enables chunks 1, 2, 3, 4 to be read at once. In other words, sequential reads do not read one parity group stripe at a time. However, for
sequential writes, one stripe is written at a time (in this case 3 data blocks and one parity block), again performing one I/O to each drive in the parity group at once.
Note: The HDS AMS or modular family of subsystems uses a different RAID5 chunk layout scheme which is more what you might have expected. The analogous diagram for 3+1 for a modular subsystem has, in row 2 from left to right, chunks 3, 4, parity (3-5), and chunk 5.
We said that the abstraction of a VDEV as a logical container allows operations that deal with LDEVs to be implemented largely independently of the type of VDEV. One small dependency on the type of VDEV is that LDEVs always start on a RAID stripe boundary. That is, the first chunk of an internal LDEV is always the first data chunk of a RAID stripe, or parity group row.
This is done to make it easy to calculate the physical location of where a particular LBA is stored. Each LDEV has an attribute which is the LBA address of where that LDEV starts on every drive in the parity group. Thus you can divide an LDEV LBA by the RAID stripe size (data-only stripe size, exclusive of parity or mirror copies), and the quotient gives you the relative stripe offset in chunks on the drive, relative to where the LDEV starts on each drive, and the remainder tells you where you are within the stripe and therefore which drive contains that LDEV LBA. (The number of the stripe on the drive also tells you which chunks on that stripe are the data chunks and which are the mirror copy/parity chunk(s).)
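A minimal sketch of that quotient/remainder calculation; the chunk and stripe sizes assume a RAID5 3D+1P OPEN-V example with the 512 KB chunk mentioned below, and the mapping of the remainder to a specific physical drive is simplified in that it ignores the rotating parity position.

    # Sketch: locate an LDEV LBA within a RAID stripe, using the divide/remainder
    # rule described above. Chunk = 512 KB (1024 sectors); data-only stripe for a
    # 3D+1P group = 3 chunks. Parity rotation is ignored for simplicity.

    SECTOR_BYTES = 512
    CHUNK_SECTORS = (512 * 1024) // SECTOR_BYTES       # 1024 sectors per chunk
    DATA_CHUNKS_PER_STRIPE = 3                         # RAID5 3D+1P
    STRIPE_SECTORS = CHUNK_SECTORS * DATA_CHUNKS_PER_STRIPE

    def locate(ldev_lba: int):
        stripe_index = ldev_lba // STRIPE_SECTORS           # quotient: stripe offset on each drive
        within_stripe = ldev_lba % STRIPE_SECTORS            # remainder: position within the stripe
        data_chunk_index = within_stripe // CHUNK_SECTORS    # which data chunk (hence which drive)
        sector_in_chunk = within_stripe % CHUNK_SECTORS
        return stripe_index, data_chunk_index, sector_in_chunk

    print(locate(0))       # (0, 0, 0)
    print(locate(5000))    # somewhere in stripe 1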
The Universal Storage Platform V family has exactly the following 4 base Parity Group types:
RAID1 (2D+2D)
RAID5 (3D+1P)
RAID5 (7D+1P)
RAID6 (6D+2P)
Note that there is no 4+4 as a base Parity Group type.
The following conceptual diagram, showing two Parity Groups that are not concatenated, will be used as the starting point to explain two-way concatenation.
Concatenated Parity Groups are derived from the base Parity Group types by interleaving their VDEVs. The following diagram shows a two-way concatenation or interleave:
[Diagram: two-way concatenation, interleaving VDEV 1-1-(1) and VDEV 3-1-(1) across the two Parity Groups.]
There are two types of two-way interleaved Parity Groups.
The two-way interleave of RAID1 (2D+2D) is what we usually call 4+4.
o Note that there is a VDEV for each Parity Group that is the same size as the VDEV would have been without concatenation. Thus the biggest LDEV you can make on a 4+4 is the same size as the biggest LDEV you can make on a 2+2.
The two-way interleave of RAID5 (7D+1P) doesn't have a shorthand name, but if you write 2x(7+1) this is clear. Sometimes people write 14+2; if they are writing about the Universal Storage Platform V family you know that a 14+2 could only be 2x(7+1).
There is one type of four-way interleaved Parity Group.
The four-way interleave of RAID5 7+1 may be clearly written as 4x(7+1). If you see someone write 28+4, this is what they must be talking about. Yes, the name is confusing.
You will sometimes see the terms VDEV disperse or VDEV interleave. These are equivalent to the official name, concatenated Parity Groups. Sometimes people will talk about LDEV striping, and here it is possible that they are talking about concatenated Parity Groups, because the LDEVs that are carved from a VDEV that is part of a concatenated Parity Group set are indeed striped across the Parity Groups that have been concatenated together.
The term concatenated is unfortunate, as many native English speakers would assume that this refers to something more like a LUSE. A colloquial English expression often used to describe LUSE is "spill and fill", because in a LUSE, reading or writing sequentially through the LUSE, one proceeds through each constituent LDEV in its entirety before moving to the next. Thus LUSE does not improve sequential throughput.
Random I/O operations are effectively dispersed across the set of concatenated Parity Groups. If small-block random I/O operations are clustered to within up to a hundred blocks or so, they will likely go to the single disk drive that contains the chunk (512 KB for OPEN-V) that contains those small blocks. This is true for any Parity Group type. However, the advantage of concatenated Parity Groups is that if random I/O is clustered within tens or hundreds of MB, these I/Os will be dispersed across the whole set of concatenated Parity Groups. In the case of a LUSE volume, random I/O clustered within 10s or 100s of megabytes will likely be concentrated on a single LDEV that is on a single Parity Group.
Concatenated Parity Groups reduce to some degree individual Parity Group hot spots. The total activity across all the LDEVs on a set of concatenated Parity Groups is distributed more or less evenly across that set of Parity Groups. This averages out Parity Group utilization across the concatenated set, and thus reduces average Parity Group busy. Reduced Parity Group busy yields better overall average response and offers the opportunity to increase overall throughput.
Note that Dynamic Provisioning offers the benefits of I/O activity dispersal across much larger numbers of Parity Groups.
Should a disk drive fail, performance could be affected for a larger set of LDEVs.
Most people expect that when you concatenate Parity Groups together, you will be able to make bigger LDEVs. As we saw, concatenation does not alter the number or size of the VDEVs that would otherwise be assigned to the same Parity Groups if they were not concatenated together. To fix this problem, if you want the performance characteristics of LDEVs that are striped across a concatenated set of Parity Groups, but you also want a LUN that is as big as the combined formatted size of the Parity Groups in the set, then carve a single maximum-size LDEV out of each VDEV belonging to the set of concatenated Parity Groups, and LUSE these LDEVs together to make one LUN.
Recall that a DP/CoW page is 42 MB and consists of whole RAID stripes; a 42 MB page therefore corresponds to:
42 stripes of a 2+2, where the stripe size is 2 x 512 KB = 1 MB.
28 stripes of a 3+1, where the stripe size is 3 x 512 KB = 1.5 MB.
12 stripes of a 7+1, where the stripe size is 7 x 512 KB = 3.5 MB.
14 stripes of a 6+2, where the stripe size is 6 x 512 KB = 3 MB.
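A quick check of those figures, assuming the 512 KB OPEN-V chunk size given earlier:

    # Sketch: a 42 MB page is a whole number of data-only RAID stripes for each
    # base parity group type, assuming a 512 KB chunk.

    CHUNK_MB = 0.5
    PAGE_MB = 42

    for name, data_drives in [("2+2", 2), ("3+1", 3), ("7+1", 7), ("6+2", 6)]:
        stripe_mb = data_drives * CHUNK_MB
        print(f"{name}: stripe = {stripe_mb} MB, {PAGE_MB / stripe_mb:.0f} stripes per 42 MB page")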
We have not specifically tested the performance of LDEVs on concatenated Parity Groups used as DP/CoW pool volumes. However, there is a theoretical argument that says this might improve
sequential performance. Since sequential performance is higher for normal LDEVs on concatenated Parity Groups due to the characteristic of reading/writing a chunk to all the drives in the concatenated set at once (reading/writing one round of interleaved RAID stripes at once), it would be worth trying to see if this also applies to sequential processing of DP/CoW pages. This may or may not provide a benefit in the context of DP/CoW sequential processing. This document may be revised in future to clarify this point, but for the time being this remains an open question.
To summarize concatenated Parity Groups:
No change to the maximum size of an LDEV. It is not possible to allocate a single LDEV that consumes all the space in all the Parity Groups being concatenated (interleaved) together. Instead, create a maximum-size LDEV on each VDEV associated with the set of concatenated Parity Groups and LUSE these LDEVs together.
The I/O activity of each LDEV carved out of a concatenated Parity Group VDEV maps to the underlying RAID stripes comprising the VDEV, which means that the I/O activity spreads across all the drives in the concatenation set.
Sequential I/O throughput is improved because you can read or write one chunk to each of the drives in the concatenated set in parallel.
Random I/O operations benefit from having an LDEV's I/O operations distributed across a larger number of disk drives.
RAID group performance isolation happens at the level of the concatenated set, as all the LDEVs in all the VDEVs in the concatenation set share the same drives.
In addition to the base Parity Group types 2+2, 3+1, 7+1, and 6+2, concatenated Parity Groups yield the types 2x(2+2), which is commonly called 4+4, as well as 2x(7+1) and 4x(7+1).
Corporate Headquarters 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Asia Pacific and Americas 750 Central Expressway, Santa Clara, California 95050-2627 USA
Contact Information: + 1 408 970 1000 www.hds.com / info@hds.com
Europe Headquarters Sefton Park, Stoke Poges, Buckinghamshire SL2 4HD United Kingdom
Contact Information: + 44 (0) 1753 618000 www.hds.com / info.uk@hds.com
Hitachi is a registered trademark of Hitachi, Ltd., and/or its affiliates in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of Hitachi, Ltd., in the United States and other countries.
Microsoft is a registered trademark of Microsoft Corporation.
Hitachi Data Systems has achieved Microsoft Competency in Advanced Infrastructure Solutions.
All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, express or limited, concerning any equipment or service offered or
to be offered by Hitachi Data Systems. This document describes some capabilities that are conditioned on a maintenance contract with Hitachi Data Systems
being in effect, and that may be configuration-dependent, and features that may not be currently available. Contact your local Hitachi Data Systems sales office for
information on feature and product availability.
Hitachi Data Systems sells and licenses its products subject to certain terms and conditions, including limited warranties. To see a copy of these terms and
conditions prior to purchase or license, please go to http://www.hds.com/corporate/legal/index.html or call your local sales representatives to obtain a printed copy.
If you purchase or license the product, you are deemed to have accepted the terms and conditions.
Hitachi Data Systems Corporation 2009. All Rights Reserved
WHP-###-## September 2009