John Hock (jrhock@us.ibm.com) and Dan Braden (dbraden@us.ibm.com), Power Systems Advanced Technical Skills
Materials may not be reproduced in whole or in part without the prior written permission of IBM.
Agenda
- Correctly Configuring Your Disks
  - Filesets
- MPIO
  - Commands
  - Path priorities
  - Failed path management
- SDD and SDDPCM
- Multi-path code choices for DS4000, DS5000 and DS3950
- XIV & Nseries
- SAN Boot
Disk configuration
Example vendor ODM definition download sites:
https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
- The disk vendor:
  - Dictates what multi-path code can be used
  - Supplies the filesets for the disks and the multi-path code
  - Supports the components that they supply
- A fileset is loaded to update the ODM to support the storage
  - AIX then recognizes and appropriately configures the disk
  - Without this, disks are configured using a generic ODM definition
  - Performance and error handling may suffer as a result
- # lsdev -Pc disk displays supported storage
- The multi-path code will be a different fileset, unless using the MPIO that's included with AIX
- Beware of the generic "Other" disk definition: no command queuing, and poor performance & error handling
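A quick way to check whether a disk picked up a vendor-specific ODM definition or the generic one is to compare the type string shown by lsdev; a sketch (hdisk names, location codes and descriptions are illustrative):
# lsdev -Pc disk                 list the disk types this AIX level supports
# lsdev -Cc disk
hdisk0 Available 00-08-00  SAS Disk Drive
hdisk9 Available 02-00-01  Other FC SCSI Disk Drive     <- generic definition; install the vendor's ODM fileset
Properly recognized SAN disks show a subsystem-specific description instead of "Other FC SCSI Disk Drive".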
[Diagram: Server - FC Switch - Storage. Zoning 4 host ports to 4 storage ports gives 4 x 4 = 16 paths per LUN; zoning 2 host ports to 2 storage ports on each of two fabrics gives 2x2 + 2x2 = 8 paths per LUN.]
- If the links aren't busy, there likely won't be much, if any, savings from use of sophisticated path selection algorithms vs. round robin
  - Generally, utilization of links is low
- Costs of path selection algorithms:
  - CPU cycles to choose the best path
  - Memory to keep track of in-flight IOs down each path, or memory to keep track of IO service times down each path
  - Latency added to the IO to choose the best path
- Two layers of multi-path code: VIOC and VIOS
- VSCSI disks always use AIX default MPIO with algorithm = fail_over only, so all IO for a LUN normally goes to one VIOS
- [Diagram: VIO Client running multi-path code, served by two VIO Servers, each running its own multi-path code]
- Potentially one vFC adapter for every real FC adapter in each VIOC; a maximum of 64 vFC adapters per real FC adapter is recommended
What is MPIO?
- MPIO is an architecture designed by AIX development (released in AIX V5.2)
- MPIO is also a commonly used acronym for Multi-Path IO
- In this presentation, MPIO refers explicitly to the architecture, not the acronym
- With the advent of SANs, each disk subsystem vendor wrote their own multi-path code
  - These multi-path code sets were usually incompatible
  - Mixing disk subsystems on the same system was usually not supported, and where it was, each usually required its own FC adapters
- MPIO integrates with AIX IO error handling and recovery
  - Several levels of IO timeouts: basic IO timeout, FC path timeout, etc.
Compliant code requires a Path Control Module (PCM) for each disk subsystem
- Default PCMs for SCSI and FC exist on AIX and are often used by the vendors
- Capabilities exist for different path selection algorithms
- Disk vendors have been moving towards MPIO compliant code
MPIO Common Interface
- Support for 32 K paths, though rarely are more than 16 paths necessary
- Default PCMs exist for FC and SCSI; disk subsystem vendors may write optional PCMs
- Hdisks can be Available, Defined or non-existent
- Paths can also be Available, Defined, Missing or non-existent
- Path status can be Enabled, Disabled or Failed if the path is Available (use the chpath command to change status)
- Add a path: e.g. after installing a new adapter and cable to the disk, run cfgmgr (or cfgmgr -l <adapter>)
- One must get the device layer correct before working with the path status layer
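For example, to review and manipulate the two layers (a sketch; hdisk and fscsi names are placeholders):
# lsdev -l hdisk9                        device layer: hdisk state (Available/Defined)
# lspath -l hdisk9                       path layer: one line per path with its status
# chpath -l hdisk9 -p fscsi1 -s disable  stop using the paths through fscsi1
# chpath -l hdisk9 -p fscsi1 -s enable   resume using them
# cfgmgr -l fscsi1                       discover and configure new paths through fscsi1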
MPIO support
Storage Subsystem Family | MPIO code | Multi-path algorithm
IBM ESS, DS6000, DS8000, DS3950, DS4000, DS5000, SVC, V7000 | IBM Subsystem Device Driver Path Control Module (SDDPCM) | fail_over, round_robin, load balance, load balance port
DS3/4/5000 in VIOS | Default FC PCM recommended | fail_over, round_robin
IBM XIV Storage System | Default FC PCM | fail_over, round_robin
IBM System Storage N series | Default FC PCM | fail_over, round_robin
EMC Symmetrix | Default FC PCM | fail_over, round_robin
HP & HDS (varies by model) | Hitachi Dynamic Link Manager (HDLM) or Default FC PCM | fail_over, round_robin, extended round_robin (HDLM); fail_over, round_robin (default PCM)
SCSI | Default SCSI PCM | fail_over, round_robin
VIO VSCSI | Default PCM | fail_over
Storage subsystem family | Multi-path code
IBM DS4000 | Redundant Disk Array Controller (RDAC)
EMC | PowerPath
HP | AutoPath
HDS | HDLM (older versions)
Veritas supported storage | Dynamic MultiPathing (DMP)
- The disk subsystem vendor supports their storage; the server vendor generally doesn't
- You can mix multi-path code that is compliant with MPIO, and even share adapters
  - There may be exceptions; contact the vendor for the latest updates
  - HP example: connection to a common server with different HBAs requires separate HBA zones for XP, VA, and EVA
- Generally, one non-MPIO compliant code set can exist with other MPIO compliant code sets
  - Note that SDD and RDAC can be mixed on the same LPAR
  - Except: you can't use SDDPCM for one DS8000 and SDD for another DS8000 on the same AIX instance
- Disks using MPIO compliant code sets can share adapter ports
- It's recommended that disk and tape use separate ports
  - Disk IO (typically small block, random) and tape IO (typically large block, sequential) are different, and stability issues have been seen at high IO rates
MPIO commands
- lspath: list paths and their status
- chpath: change path status, e.g. enable or disable paths
- rmpath: remove paths, or move a path into the Defined mode; putting a path into the Defined mode means it won't be used (from Available to Defined)
- chdev: change a device's attributes (not specific to MPIO)
- cfgmgr: add new paths to an hdisk or make Defined paths Available (not specific to MPIO)
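A few illustrative invocations (a sketch; device names and attribute values are placeholders):
# lspath -l hdisk9 -F "parent connection status path_status"
# chpath -l hdisk9 -p fscsi0 -s disable        stop sending IO down the fscsi0 paths
# rmpath -l hdisk9 -p fscsi0                   move the fscsi0 paths to Defined
# rmpath -dl hdisk9 -p fscsi0                  delete the fscsi0 paths entirely
# chdev -l hdisk9 -a algorithm=round_robin -a reserve_policy=no_reserve
# cfgmgr -l fscsi0                             rediscover paths through fscsi0
Note that chdev on an open disk fails; add -P to defer the attribute change until the next reboot.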
Path priorities
- A priority attribute for paths can be used to specify a preference for the paths used for IO
- How it works depends on whether the hdisk's algorithm attribute is set to fail_over or round_robin
- The value specified is inverse to the priority, i.e. 1 is the highest priority
- algorithm=fail_over
  - The path with the higher priority (lower priority value) handles all the IOs unless there's a path failure
  - The other path(s) will only be used when there is a path failure
  - Set the primary path by setting its priority value to 1, the next path's priority (in case of path failure) to 2, and so on
  - If the path priorities are the same and algorithm=fail_over, the primary path will be the first one listed for the hdisk in the CuPath ODM, as shown by # odmget CuPath
- algorithm=round_robin
  - If the priority attributes are the same, then IOs go down each path equally
  - In the case of two paths, if you set path A's priority to 1 and path B's to 255, then for every IO going down path A, there will be 255 IOs sent down path B
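Path priorities are changed per path with chpath; a sketch (the hdisk, parent and connection values are placeholders taken from lspath output):
# lspath -l hdisk9 -F "parent connection"      identify each path
# chpath -l hdisk9 -p fscsi0 -w "20160080e517b6ba,5000000000000" -a priority=1
# chpath -l hdisk9 -p fscsi1 -w "20170080e517b6ba,5000000000000" -a priority=2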
Path priorities
# lsattr -El hdisk9
PCM             PCM/friend/otherapdisk  Path Control Module     False
algorithm       fail_over               Algorithm               True
hcheck_interval 60                      Health Check Interval   True
hcheck_mode     nonactive               Health Check Mode       True
lun_id          0x5000000000000         Logical Unit Number ID  False
node_name       0x20060080e517b6ba      FC Node Name            False
queue_depth     10                      Queue DEPTH             True
reserve_policy  single_path             Reserve Policy          True
ww_name         0x20160080e517b6ba      FC World Wide Name      False

# lspath -l hdisk9 -F "parent connection status path_status"
fscsi1 20160080e517b6ba,5000000000000 Enabled Available
fscsi1 20170080e517b6ba,5000000000000 Enabled Available

# lspath -AEl hdisk9 -p fscsi1 -w "20160080e517b6ba,5000000000000"
scsi_id   0x10a00             SCSI ID       False
node_name 0x20060080e517b6ba  FC Node Name  False
priority  1                   Priority      True

Note: whether or not path priorities apply depends on the PCM. With SDDPCM, path priorities only apply when the algorithm used is fail_over (fo). Otherwise, they aren't used.
- algorithm=fail_over is the only option at the VIOC for VSCSI disks
- Set priorities so that half the LUNs use VIOSa/vscsi0 and half use VIOSb/vscsi1
  - This uses both VIOSs' CPU and virtual adapters
- With NSeries, set priorities so that the IOs go to the primary controller for the LUN
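For example (a sketch; vscsi0/vscsi1 are the virtual SCSI adapters served by VIOSa and VIOSb, and the hdisk split is arbitrary):
Half of the LUNs prefer the VIOSa path:
# chpath -l hdisk4 -p vscsi0 -a priority=1
# chpath -l hdisk4 -p vscsi1 -a priority=2
The other half prefer the VIOSb path:
# chpath -l hdisk5 -p vscsi0 -a priority=2
# chpath -l hdisk5 -p vscsi1 -a priority=1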
hcheck_interval
- Defines how often the health check is performed on the paths for a device
- The attribute supports a range from 0 to 3600 seconds
- When a value of 0 is selected (the default), health checking is disabled
- Preferably set to at least 2X the IO timeout value
hcheck_mode
Determines which paths should be checked when the health check capability is used:
- enabled: sends the healthcheck command down paths with a state of Enabled
- failed: sends the healthcheck command down paths with a state of Failed
- nonactive (default): sends the healthcheck command down paths that have no active I/O, including paths with a state of Failed
  - If the algorithm selected is fail_over, then the healthcheck command is also sent on each of the paths that have a state of Enabled but have no active IO
  - If the algorithm selected is round_robin, then the healthcheck command is only sent on paths with a state of Failed, because the round_robin algorithm keeps all Enabled paths active with IO
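These are regular hdisk attributes and are changed with chdev; a sketch (the values are illustrative, and the disk must be closed unless -P is used to defer the change to the next reboot):
# lsattr -El hdisk9 -a hcheck_interval -a hcheck_mode
# chdev -l hdisk9 -a hcheck_interval=60 -a hcheck_mode=nonactive
# chdev -l hdisk9 -a hcheck_interval=60 -P       deferred until reboot if the disk is in use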
Path Recovery
- MPIO will recover failed paths if path health checking is enabled with hcheck_mode=nonactive or failed, and the device has been opened
- Trade-offs exist:
  - Automatic recovery requires turning on path health checking for each LUN
  - Lots of time between health checks means paths will take longer to recover after repair
  - Health checking for a single LUN is often sufficient to monitor all the physical paths, but not to recover them
- SDD and SDDPCM also recover failed paths automatically
  - In addition, SDDPCM provides a health check daemon to provide an automated method of reclaiming failed paths to a closed device
- To manually enable a failed path after repair, or re-enable a disabled path:
  # chpath -l hdisk1 -p <parent> -s enable
- To disable paths using a specific FC port on the host:
  # chpath -l hdisk1 -p <parent> -s disable
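To disable every path through one FC port, loop over the hdisks that have paths through that port; a sketch (fscsi0 is a placeholder for the protocol device of the port being serviced):
# for d in $(lspath -p fscsi0 -F name | sort -u)
> do
>   chpath -l $d -p fscsi0 -s disable
> done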
- One should also set up error notification for path failures, so that someone knows about them and can correct them before something else fails
- This is accomplished by determining the error that shows up in the error log when a path fails (via testing), and then adding an entry to the errnotify ODM class for that error which calls a script (that you write) to notify someone that a path has failed
- Hint: you can use # odmget errnotify to see what the entries (or stanzas) look like, then create a stanza and use the odmadd command to add it to the errnotify class
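A minimal sketch of such a stanza; the error label, file name and notification script are placeholders (use the label you actually see in errpt during your path-failure test):
errnotify:
        en_name = "mpio_path_fail"
        en_persistenceflg = 1
        en_label = "SC_DISK_ERR7"
        en_method = "/usr/local/bin/notify_path_fail.sh $1"
Save the stanza to a file and add it with # odmadd /tmp/mpio_path_fail.add; the error log sequence number is passed to the method as $1.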
Adapter replacement
It's better to stop using a path before you know the path will disappear; a typical command sequence is sketched below.
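A typical adapter-replacement sequence with the MPIO default PCM, assuming fcs0/fscsi0 is the adapter being replaced (device names are placeholders):
# rmpath -l hdisk9 -p fscsi0          move this hdisk's paths through fscsi0 to Defined (repeat per hdisk, or script it with lspath -p fscsi0 -F name)
# rmdev -Rdl fcs0                     remove the adapter and its child devices from the ODM
  ... physically replace the adapter, e.g. via the diag hot-plug task ...
# cfgmgr                              configure the new adapter and rediscover the paths
# lspath                              verify all paths are back and Enabled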
- IOs for a LUN are only sent to the Active controller's port for disk subsystems with Active/Passive controllers, so a controller is active for some LUNs but passive for the others
- DS4000, DS5000, DS3950, Nseries and V7000 have active/passive controllers
  - The NSeries passive controller can accept IOs, but IO latency is affected
  - The passive controller takes over in the event the active controller, or all paths to it, fail
- MPIO recognizes Active/Passive disk subsystems and sends IOs only to the primary controller
  - Except under failure conditions, when the active/passive role switches for the affected LUNs
SDD: An Overview
- SDD = Subsystem Device Driver
- Pre-MPIO architecture: used with IBM ESS, DS6000, DS8000 and the SAN Volume Controller, but is not MPIO compliant
- A host attachment fileset (provides subsystem-specific support code & populates the ODM) and the SDD fileset are both installed
  - Host attachment: ibm2105.rte
  - SDD: devices.sdd.<sdd_version>.rte
- SDD supports multiple paths per LUN, but fewer paths are recommended with more than 600 LUNs
- One can exclude disks from SDD control using the excludesddcfg command
- Mirror rootvg across two separate LUNs on different adapters for availability
SDD
Load balancing algorithms
- fo: failover
- rr: round robin
- lb: load balancing (aka. df, the default); chooses the adapter with the fewest in-flight IOs
- lbs: load balancing sequential, optimized for sequential IO
- rrs: round robin sequential, optimized for sequential IO
- The datapath command is used to examine vpaths, adapters, paths, vpath statistics, path statistics and adapter statistics, dynamically change the load balancing algorithm, and perform other administrative tasks such as adapter replacement and disabling paths
- mkvg4vp is used instead of mkvg, and extendvg4vp is used instead of extendvg
- SDD automatically recovers failed paths that have been repaired, via the sddsrv daemon
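A few representative datapath invocations (a sketch; the device number is a placeholder):
# datapath query adapter              adapter states and IO counts
# datapath query device               vpaths, their paths and path states
# datapath query devstats             per-vpath IO statistics
# datapath set device 0 policy rr     switch vpath 0 to the round robin policy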
- These latencies are typically < 1% of typical IO service times
- Load balancing is more likely to be of benefit in SANs with heavy utilization, or with intermittent errors that slow IOs on some path
  - A round_robin algorithm is usually equivalent
- Conclusion: load balancing is unlikely to improve performance, especially when compared to other strategies like algorithm=round_robin or approaches that balance IO with algorithm=fail_over
The load_balancing algorithm must consume CPU and memory resources to determine the best path to use. It's possible to set up fail_over LUNs so that the loads are balanced across the available FC adapters. Let's use an example with 2 FC adapters. Assume we correctly lay out our data so that the IOs are balanced across the LUNs (this is usually a best practice). Then if we assign half the LUNs to FC adapterA and half to FC adapterB, the IOs are evenly balanced across the adapters. The question to ask is: if one adapter is handling more IO than another, will this have a significant impact on IO latency? Since the FC adapters are capable of handling more than 35,000 IOPS, we're unlikely to bottleneck at the adapter and add significant latency to the IO.
SDDPCM: An Overview
SDDPCM = Subsystem Device Driver Path Control Module
- SDDPCM is MPIO compliant and can be used with IBM ESS, DS6000, DS8000, DS4000 (most models), DS5000, DS3950, V7000 and the SVC
- A host attachment fileset (populates the ODM) and the SDDPCM fileset are both installed
  - Host attachment: devices.fcp.disk.ibm.mpio.rte
  - SDDPCM: devices.sddpcm.<version>.rte
- Provides a PCM per the MPIO architecture
- One installs SDDPCM or SDD, not both
Consideration | Default FC PCM | SDDPCM
Supported devices | |
OS integration considerations | Update levels are provided and are updated and migrated as a mainline part of the normal AIX and VIOS service strategy and upgrade/migration paths |
Dynamic algorithm selection | Disk access must be stopped in order to change the algorithm | Yes (the algorithm can be changed dynamically via pcmpath)
SAN boot, dump, paging support | Yes | Yes; restart required if SDDPCM is installed after the MPIO PCM and SDDPCM boot is desired
PowerHA & GPFS support | Yes | Yes
Utilities | Standard AIX performance monitoring tools such as iostat and fcstat | Enhanced utilities (pcmpath commands) to show mappings from adapters, paths and devices, as well as performance and error statistics
SDDPCM
Load balancing algorithms
- rr: round robin
- lb: load balancing, based on in-flight IOs per adapter
- fo: failover policy
- lbp: load balancing port (for ESS, DS6000, DS8000, V7000 and SVC only), based on in-flight IOs per adapter and per storage port
- The pcmpath command is used to examine hdisks, adapters, paths, hdisk statistics, path statistics and adapter statistics, dynamically change the load balancing algorithm, and perform other administrative tasks such as adapter replacement and disabling paths
- SDDPCM automatically recovers failed paths that have been repaired, via the pcmserv daemon
- MPIO health checking can also be used, and can be set dynamically via the pcmpath command; this is recommended. Set the hc_interval to a non-zero value for an appropriate number of LUNs to check the physical paths.
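Representative pcmpath invocations (a sketch; device numbers and values are placeholders):
# pcmpath query adapter                     adapter states and IO counts
# pcmpath query device                      hdisks, their paths and path states
# pcmpath query devstats                    per-hdisk IO statistics
# pcmpath set device 0 algorithm lb         switch device 0 to load balancing
# pcmpath set device 0 hc_interval 60       enable health checking on device 0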
- * indicates a path to the passive controller
- 2145 is an SVC, which has active/passive nodes for a LUN
- DS4000, DS5000, V7000 and DS3950 also have active/passive controllers
- IOs will be balanced across paths to the active controller
[Sample pcmpath query devstats output (abridged): Active Read 0, Active Write 0, Maximum 20 in-flight I/Os (5888 sectors); the transfer size distribution shows 67388759 I/Os of <= 4k]
- The Maximum value is useful for tuning hdisk queue depths: 20 is the maximum number of in-flight requests for the IOs shown
- Increase queue_depth until the queue is not filling up, or until IO service times suffer (the bottleneck is pushed to the subsystem)
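Changing the queue depth is a chdev operation on the hdisk; a sketch (the value 32 is illustrative, and the disk must be closed unless -P is used to defer the change to the next reboot):
# lsattr -El hdisk9 -a queue_depth
# chdev -l hdisk9 -a queue_depth=32 -P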
www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350#AIXSDDPCM
Migration from SDD to SDDPCM is fairly straightforward and doesn't require a lot of time. The procedure is documented in the manual:
1. Varyoff your SDD VGs
2. Stop the sddsrv daemon via stopsrc -s sddsrv
3. Remove the SDD devices (both vpaths and hdisks) via the instructions below
4. Remove the dpo device
5. Uninstall SDD and the host attachment fileset for SDD
6. Install the host attachment fileset for SDDPCM, and SDDPCM
7. Configure the new disks (if you rebooted it's done, else run cfgmgr and startsrc -s pcmserv)
8. Varyon your VGs - you're back in business
To remove the vpaths and hdisks, use:
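A sketch of one way to remove them (the grep pattern 2105 assumes ESS hdisks and is a placeholder; adjust for your device type, e.g. 1750 or 2107):
# lsdev -Cc disk | grep -i vpath | awk '{ print $1 }' | xargs -n1 rmdev -dl     remove the vpaths
# lsdev -Cc disk | grep 2105 | awk '{ print $1 }' | xargs -n1 rmdev -dl         remove the underlying SDD hdisks
# rmdev -dl dpo                                                                 remove the dpo device (step 4)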
- No exportvg/importvg is needed, because LVM keeps track of PVs via PVID
- Effective queue depths change (and changes to queue_depth will be lost):
  - SDD effective queue depth = # paths for a LUN x queue_depth
  - SDDPCM effective queue depth = queue_depth
Multi-path code choices for DS4000, DS5000 and DS3950
- Choices depend on model and AIX level
- MPIO is strategic
- SDDPCM uses MPIO and is recommended
- SDDPCM is not supported on VIOS yet for these disk subsystems, so use MPIO (the default PCM)
- RDAC requires fcsA to be connected to controllerA and fcsB to controllerB, with no cross connections
- IO for a LUN goes to its primary controller
  - Unless the paths to it fail, or the controller fails; then the other controller takes over the LUN
- The storage administrator assigns half the LUNs to each controller
- Supported options vary among models and AIX levels
- AIX_AAPCM: MPIO with active/active controllers
- AIX_APPCM: MPIO with active/passive controllers
- AIX_SDDAPPCM: SDDPCM
- AIX_fcparray: RDAC
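On recent AIX levels the manage_disk_drivers command lists and selects among these options; a sketch (the device family name DS4700 is a placeholder, and a reboot is needed for the change to take effect):
# manage_disk_drivers -l                         show supported families and their present/available drivers
# manage_disk_drivers -d DS4700 -o AIX_APPCM     switch the DS4700 family to the MPIO active/passive PCM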
XIV
Host Attachment Kit for AIX
http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000802
# lsdev -Pc disk | grep xiv
disk 2810xiv fcp N/A
- XIV support has moved from fileset support to support within AIX
  - ODM entries for XIV are included with AIX 5.3 TL10, AIX 6.1 TL3, VIOS 2.1.2.x and AIX 7
  - Disks are configured as 2810xiv devices
- Installing the Host Attachment Kit is still recommended
  - Provides diagnostic and other commands
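The Host Attachment Kit installs its own utilities; a couple of representative ones (a sketch - command availability varies by kit version):
# xiv_attach         interactive wizard that verifies and configures XIV connectivity
# xiv_devlist        lists the XIV LUNs mapped to this host and their paths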
Nseries/NetApp
- Nseries/NetApp has a preferred storage controller for each LUN
  - Not exactly an active/passive disk subsystem, as the non-preferred controller can accept IO requests
  - But those I/O requests have to be passed to the preferred controller, which impacts latency
- Install the SAN Toolkit: Ontap.mpio_attach_kit.* filesets
  - Provides the dotpaths utility and the sanlun command
  - dotpaths sets hdisk path priorities to favor the primary controller, so that for every IO going down a secondary path, 255 IOs are sent down the primary path
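For example (a sketch; options vary by SAN Toolkit version):
# dotpaths               set path priorities on the Nseries/NetApp hdisks to favor the primary controller
# sanlun lun show -p     show each LUN, its preferred controller and the path states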
- Boot Directly from SAN
  - Storage is zoned directly to the client HBAs used for boot and/or data access
  - Multi-path code for the storage runs in the client
- SAN Sourced Boot Disks
  - Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  - Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
- Boot from SVC via VIO Server
  - Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  - Multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
- System with FC boot capability
- Appropriate microcode (system, FC adapter, disk subsystem and FC switch)
- Disk subsystem supporting AIX FC boot
  - Some older systems don't support FC boot; if in doubt, check the sales manual
- Create the SAN LUNs and assign them to the system's FC adapters' WWPNs prior to installing the system
- AIX installation
  - Boot from the AIX install media or NIM
  - When you do the installation you'll get a list of disks that will be on the SAN for the system
  - Choose the disks for installing rootvg
  - Be aware of disk SCSI reservation policies
These criteria can be used to select the LUN from the AIX install program (shown in the following screen shots) or via a bosinst_data file for NIM
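For unattended NIM installs, the target SAN LUN can be preselected in the bosinst_data resource; a minimal sketch of the relevant stanza (the WWPN and LUN ID values are placeholders, and field support varies by AIX level):
target_disk_data:
        SAN_DISKID = 0x5005076801234567//0x0
SAN_DISKID is the storage port WWPN and the LUN ID, separated by //.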
[Install program screen shots: the SAN LUN is identified by its Storage WWN and LUN ID]
- Potential problems:
  - SAN problems can cause loss of access to rootvg; not really an issue, as application data is on the SAN anyway
  - Loss of system dump and diagnosis if loss of access to the SAN is caused by a kernel bug
  - Difficult to change multi-path IO code
    - Not an issue with dual VIOSs: you can take down one VIOS at a time and change the multi-path code
    - SAN boot through VIO with NPIV is like direct SAN boot
To change multi-path code when booting from SAN:
- Move rootvg to internal SAS disks, e.g. using extendvg, migratepv, reducevg, bosboot and bootlist, or use alt_disk_install (see the sketch after this list)
- Change the multi-path code
- Move rootvg back to the SAN
- Newer versions of AIX require a newer version of SDD or SDDPCM
  - Follow the procedures in the SDD and SDDPCM manual for upgrades of AIX and/or the multi-path code
- Not an issue when using VIO with dual VIOSs
- If one has many LPARs booting from SAN, one SAS adapter with a SAS disk or two can be used to migrate SDD to SDDPCM, one LPAR at a time
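A sketch of the rootvg move using LVM commands (hdisk0 is assumed to be the internal SAS disk and hdisk2 the SAN boot LUN; adjust names to your configuration):
# extendvg rootvg hdisk0            add the internal disk to rootvg
# migratepv hdisk2 hdisk0           move all logical volumes from the SAN LUN to the internal disk
# bosboot -ad /dev/hdisk0           create a boot image on the internal disk
# bootlist -m normal hdisk0         boot from the internal disk
# reducevg rootvg hdisk2            remove the SAN LUN from rootvg
Then change the multi-path code and reverse the steps to move rootvg back to the SAN.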
http://h18006.www1.hp.com/storage/aix.html
Session Evaluations