
Path Management and SAN Boot with MPIO on AIX

John Hock (jrhock@us.ibm.com), Dan Braden (dbraden@us.ibm.com)
Power Systems Advanced Technical Skills

Materials may not be reproduced in whole or in part without the prior written permission of IBM.


IBM Power Systems Technical Symposium 2011

Agenda
- Correctly configuring your disks
  - Filesets for disks and multi-path code
- Multi-path basics
- Multi-Path I/O (MPIO)
  - Useful MPIO commands
  - Path priorities
  - Failed path recovery and path health checking
  - MPIO path management
- SDD and SDDPCM
- Multi-path code choices for DS4000, DS5000 and DS3950
- XIV & N series
- SAN boot


Disk configuration

The disk vendor:
- Dictates what multi-path code can be used
- Supplies the filesets for the disks and the multi-path code
- Supports the components that they supply
Examples of vendor-supplied ODM/fileset downloads:
- https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates (HDS)
- ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/ (EMC)

A fileset is loaded to update the ODM to support the storage
- AIX then recognizes and appropriately configures the disks
- Without it, disks are configured using a generic ODM definition, and performance and error handling may suffer as a result
- # lsdev -Pc disk displays the supported storage types
The multi-path code will be a separate fileset, unless you are using the MPIO PCM that's included with AIX

Beware of the generic "Other" disk definition: no command queuing, and poor performance and error handling.
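A quick way to check whether a disk picked up a storage-specific ODM definition or the generic one; a sketch, where the hdisk numbers, location codes and exact descriptions are only illustrative:

# lsdev -Cc disk
hdisk0 Available 02-08-02 IBM MPIO FC 2107             (storage-specific definition: good)
hdisk4 Available 02-08-02 Other FC SCSI Disk Drive     (generic definition: install the vendor fileset)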


How many paths for a LUN?


Paths = (# of paths from server to switch) x (# of paths from storage to switch)
Here there are potentially 6 paths per LUN, but the number can be reduced via:
- LUN masking at the storage: assign LUNs to specific FC adapters at the host, and thru specific ports on the storage
- Zoning: WWPN or SAN switch port zoning
- Dual SAN fabrics, which divide the potential paths by two
4 paths per LUN are sufficient for availability and reduce the CPU overhead of choosing a path
- Path selection overhead is relatively low, usually negligible
MPIO has no practical limit on the number of paths; other products have path limits
- SDDPCM is limited to 32 paths per LUN

[Diagram: Server -> FC switch -> Storage, showing the potential paths per LUN]
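A rough way to count the paths each LUN actually ended up with, using only lspath (the default lspath output columns are status, hdisk name, parent adapter):

# lspath | awk '{print $2}' | sort | uniq -c     # number of paths per hdisk
# lspath -l hdisk4                               # paths for one disk (hdisk4 is just an example)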


How many paths for a LUN?, contd


Dual SAN fabrics reduce the potential paths

[Diagram: Server -> FC switches -> Storage]
- Single fabric:  4 x 4 = 16 potential paths per LUN
- Dual fabrics:   2 x 2 + 2 x 2 = 8 potential paths per LUN


Path selection benefits and costs


Path selection algorithms choose a path in order to minimize the latency added to an IO as it crosses the SAN to the storage
- The latency to send a 4 KB IO over an 8 Gbps SAN link is 4 KB / (8 Gb/s x 0.1 B/b x 1,048,576 KB/GB) = 0.0048 ms
- Multiple links may be involved, and IOs are round trip
- Compare this with the fastest IO service times of around 1 ms

If the links aren't busy, there likely won't be much, if any, savings from using sophisticated path selection algorithms vs. round robin
- Generally the utilization of links is low

Costs of path selection algorithms:
- CPU cycles to choose the best path
- Memory to keep track of in-flight IOs down each path, or
- Memory to keep track of IO service times down each path
- Latency added to the IO while choosing the best path

Multi-path IO with VIO and VSCSI LUNs

[Diagram: VIO client (MPIO) -> two VIO servers (multi-path code) -> disk subsystem]

- Two layers of multi-path code: VIO client and VIO server
- VSCSI disks always use the AIX default MPIO with algorithm=fail_over only, so all IO for a LUN normally goes to one VIOS
- The VIOS uses the multi-path code specified for the disk subsystem
- Set the path priorities for the VSCSI hdisks so that half use one VIOS and half use the other
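A minimal sketch of that last point, splitting VSCSI LUNs across two VIOSs by path priority (hdisk and vscsi names are examples; which vscsi adapter maps to which VIOS depends on your configuration):

# lspath -l hdisk0                             # identify the two vscsi parent adapters
# chpath -l hdisk0 -p vscsi0 -a priority=1     # hdisk0 prefers the VIOS behind vscsi0
# chpath -l hdisk0 -p vscsi1 -a priority=2
# chpath -l hdisk1 -p vscsi1 -a priority=1     # hdisk1 prefers the other VIOS
# chpath -l hdisk1 -p vscsi0 -a priority=2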


Multi-path IO with VIO and NPIV


[Diagram: VIO client with virtual FC adapters (vFC) -> two VIO servers -> disk subsystem]

- The VIO client has virtual FC adapters (vFC)
  - Potentially one vFC adapter for every real FC adapter in each VIO client
  - A maximum of 64 vFC adapters per real FC adapter is recommended
- The VIO client uses the multi-path code that the disk subsystem supports
- IOs for a LUN can go thru both VIOSs
- Only one layer of multi-path code
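To see how the vFC adapters are wired, roughly (the first command runs in the VIOS padmin shell, the second on the client; adapter names are examples):

$ lsmap -all -npiv                      # VIOS: virtual FC to physical FC port mappings
# lsdev -Cc adapter | grep fcs          # client: virtual FC adapters appear as fcs devices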


What is MPIO?
MPIO is an architecture designed by AIX development (released in AIX V5.2)
MPIO is also a commonly used acronym for Multi-Path IO
- In this presentation, MPIO refers explicitly to the architecture, not the acronym

Why was the MPIO architecture developed?
- With the advent of SANs, each disk subsystem vendor wrote their own multi-path code
- These multi-path code sets were usually incompatible
- Mixing disk subsystems on the same system was usually not supported, and when it was, each usually required its own FC adapters
- Integration with AIX IO error handling and recovery
  - Several levels of IO timeouts: basic IO timeout, FC path timeout, etc.

MPIO architecture details are available to disk subsystem vendors (the MPIO Common Interface)
- Compliant code requires a Path Control Module (PCM) for each disk subsystem
- Default PCMs for SCSI and FC exist in AIX and are often used by the vendors
- Capabilities exist for different path selection algorithms
- Disk vendors have been moving towards MPIO compliant code


Overview of MPIO Architecture


- LUNs show up as hdisks
- Architected for 32 K paths
  - No more than 16 paths are necessary
  - Tip: to keep paths <= 16, group sets of 4 host ports and 4 storage ports and balance LUNs across them
- PCM: Path Control Module
  - Default PCMs exist for FC and SCSI
  - Vendors may write optional PCMs
  - PCMs may provide commands to manage paths
  - Allows various algorithms to balance the use of paths
- Full support for multiple paths to rootvg
- Hdisks can be Available, Defined or non-existent
- Paths can also be Available, Defined, Missing or non-existent
- If a path is Available, its status can be enabled, disabled or failed (use the chpath command to change path status)
- Add a path, e.g. after installing a new adapter and cable to the disk, by running cfgmgr (or cfgmgr -l <adapter>)
- One must get the device layer correct before working with the path status layer

MPIO support
Storage subsystem family                            MPIO code                      Multi-path algorithms
IBM ESS, DS6000, DS8000, DS3950, DS4000,            SDDPCM                         fail_over, round_robin, load balance, load balance port
  DS5000, SVC, V7000
DS3/4/5000 in VIOS                                  Default FC PCM (recommended)   fail_over, round_robin
IBM XIV Storage System                              Default FC PCM                 fail_over, round_robin
IBM System Storage N series                         Default FC PCM                 fail_over, round_robin
EMC Symmetrix                                       Default FC PCM                 fail_over, round_robin
HP & HDS (varies by model)                          HDLM                           fail_over, round_robin, extended round robin
HP & HDS (varies by model)                          Default FC PCM                 fail_over, round_robin
SCSI                                                Default SCSI PCM               fail_over, round_robin
VIO VSCSI                                           Default SCSI PCM               fail_over

SDDPCM = IBM Subsystem Device Driver Path Control Module; HDLM = Hitachi Dynamic Link Manager


Non-MPIO multi-path code

Storage subsystem family        Multi-path code
IBM DS4000                      Redundant Disk Array Controller (RDAC)
EMC                             PowerPath
HP                              AutoPath
HDS                             HDLM (older versions)
Veritas supported storage       Dynamic Multi-Pathing (DMP)


Mixing multi-path code sets


The disk subsystem vendor specifies what multi-path code is supported for their storage
The

disk subsystem vendor supports their storage, the server vendor generally doesnt

You can mix multi-path code compliant with MPIO and even share adapters
There

may be exceptions. Contact vendor for latest updates. HP example: Connection to a common server with different HBAs requires separate HBA zones for XP, VA, and EVA that SDD and RDAC can be mixed on the same LPAR

Generally one non-MPIO compliant code set can exist with other MPIO compliant code sets
Except The

non-MPIO compliant code must be using its own adapters

Devices of a given type use only one multi-path code set


e.g.,

you cant used SDDPCM for one DS8000 and SDD for another DS8000 on the same AIX instance


Sharing Fibre Channel Adapter ports

- Disks using MPIO compliant code sets can share adapter ports
- It's recommended that disk and tape use separate ports
  - Disk IO (typically small block, random) and tape IO (typically large block, sequential) are different, and stability issues have been seen at high IO rates


MPIO Command Set


- lspath: list paths, path status and path attributes for a disk
- chpath: change path status or path attributes
  - Enable or disable paths
- rmpath: delete a path or change its state
  - Putting a path into the Defined state (from Available to Defined) means it won't be used
  - One cannot define/delete the last path of a device
- mkpath: add another path to a device, or make a Defined path Available
  - Generally cfgmgr is used to add new paths
- chdev: change a device's attributes (not specific to MPIO)
- cfgmgr: add new paths to an hdisk or make Defined paths Available (not specific to MPIO)
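A short sequence tying these commands together (hdisk4 and fcs0/fscsi0 are example device names):

# lspath -l hdisk4                           # show the paths and their status
# rmpath -l hdisk4 -p fscsi0                 # move the fscsi0 paths to Defined (stop using them)
# mkpath -l hdisk4 -p fscsi0                 # make the Defined paths Available again
# chpath -l hdisk4 -p fscsi0 -s disable      # or leave the paths Available but disabled
# chpath -l hdisk4 -p fscsi0 -s enable
# cfgmgr -l fcs0                             # discover any new paths thru this adapter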


Useful MPIO Commands


List the status of the paths and the parent device (or adapter):
# lspath -Hl <hdisk#>

List connection information for a path:
# lspath -l hdisk2 -F"status parent connection path_status path_id"
Enabled fscsi0 203900a0b8478dda,f000000000000 Available 0
Enabled fscsi0 201800a0b8478dda,f000000000000 Available 1
Enabled fscsi1 201900a0b8478dda,f000000000000 Available 2
Enabled fscsi1 203800a0b8478dda,f000000000000 Available 3

The connection field contains the storage port WWPN. In the case above, the fscsi0 paths go to two storage ports with WWPNs 203900a0b8478dda and 201800a0b8478dda.

List a specific path's attributes:
# lspath -AEl hdisk2 -p fscsi0 -w "203900a0b8478dda,f000000000000"
scsi_id   0x30400              SCSI ID       False
node_name 0x200800a0b8478dda   FC Node Name  False
priority  1                    Priority      True


Path priorities
A priority attribute for paths can be used to specify a preference for path IOs.
How it works depends on whether the hdisk's algorithm attribute is set to fail_over or round_robin.
The value specified is inverse to the priority, i.e. 1 is the highest priority.

algorithm=fail_over
- The path with the highest priority (lowest priority value) handles all the IOs unless there's a path failure
- The other path(s) will only be used when there is a path failure
- Set the primary path by setting its priority value to 1, and the next path's priority (in case of path failure) to 2, and so on
- If the path priorities are the same and algorithm=fail_over, the primary path will be the first one listed for the hdisk in the CuPath ODM, as shown by # odmget CuPath (example below)

algorithm=round_robin
- If the priority attributes are the same, then IOs go down each path equally
- In the case of two paths, if you set path A's priority to 1 and path B's to 255, then for every IO going down path A, there will be 255 IOs sent down path B

To change the path priority of an MPIO device on a VIO client:
# chpath -l hdisk0 -p vscsi1 -a priority=2
- Set path priorities for VSCSI disks to balance the use of the VIOSs
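To look at the ODM ordering referred to above, a sketch (the CuPath descriptor name used in the query is an assumption, so verify it against plain # odmget CuPath output on your system):

# odmget CuPath | more                    # all path stanzas, in ODM order
# odmget -q "name=hdisk0" CuPath          # path stanzas for one hdisk (hdisk0 is an example)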


Path priorities
# lsattr -El hdisk9
PCM              PCM/friend/otherapdisk   Path Control Module     False
algorithm        fail_over                Algorithm               True
hcheck_interval  60                       Health Check Interval   True
hcheck_mode      nonactive                Health Check Mode       True
lun_id           0x5000000000000          Logical Unit Number ID  False
node_name        0x20060080e517b6ba       FC Node Name            False
queue_depth      10                       Queue DEPTH             True
reserve_policy   single_path              Reserve Policy          True
ww_name          0x20160080e517b6ba       FC World Wide Name      False

# lspath -l hdisk9 -F"parent connection status path_status"
fscsi1 20160080e517b6ba,5000000000000 Enabled Available
fscsi1 20170080e517b6ba,5000000000000 Enabled Available

# lspath -AEl hdisk9 -p fscsi1 -w "20160080e517b6ba,5000000000000"
scsi_id   0x10a00              SCSI ID       False
node_name 0x20060080e517b6ba   FC Node Name  False
priority  1                    Priority      True

Note: whether or not path priorities apply depends on the PCM. With SDDPCM, path priorities only apply when the algorithm used is fail over (fo); otherwise they aren't used.

Path priorities: why change them?

- With VIO clients and VSCSI disks, send the IOs for half the LUNs to one VIOS and half to the other
  - Set priorities so half the LUNs use VIOSa/vscsi0 and half use VIOSb/vscsi1
  - This uses both VIOSs' CPU and virtual adapters
  - algorithm=fail_over is the only option at the VIO client for VSCSI disks
- With N series, have the IOs go to the primary controller for the LUN
  - Set via the dotpaths utility that comes with the N series filesets


Path Health Checking and Recovery


- Validate that a path is working
- Automate recovery of paths
- For SDDPCM and MPIO compliant disks, two hdisk attributes apply:
  # lsattr -El hdisk26
  hcheck_interval  0          Health Check Interval  True
  hcheck_mode      nonactive  Health Check Mode      True

hcheck_interval
- Defines how often the health check is performed on the paths for a device. The attribute supports a range from 0 to 3600 seconds. When a value of 0 is selected (the default), health checking is disabled.
- Preferably set it to at least 2X the IO timeout value

hcheck_mode
- Determines which paths should be checked when the health check capability is used:
  - enabled: sends the healthcheck command down paths with a state of enabled
  - failed: sends the healthcheck command down paths with a state of failed
  - nonactive (default): sends the healthcheck command down paths that have no active I/O, including paths with a state of failed. If the algorithm selected is fail_over, the healthcheck command is also sent on each of the paths that have a state of enabled but have no active IO. If the algorithm selected is round_robin, the healthcheck command is only sent on paths with a state of failed, because the round_robin algorithm keeps all enabled paths active with IO.

- Consider setting up error notification for path failures (later slide)
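A sketch of turning health checking on for one hdisk (the values are examples; -P stores the change in the ODM to take effect at the next reboot or device reconfiguration when the disk is open):

# chdev -l hdisk26 -a hcheck_interval=60 -a hcheck_mode=nonactive        # disk must be closed
# chdev -l hdisk26 -a hcheck_interval=60 -a hcheck_mode=nonactive -P     # defer the change if the disk is in use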



Path Recovery
MPIO will recover failed paths if path health checking is enabled with hcheck_mode=nonactive or failed, and the device has been opened

Trade-offs exist:
- Lots of path health checking can create a lot of SAN traffic
- Automatic recovery requires turning on path health checking for each LUN
- Lots of time between health checks means paths will take longer to recover after repair
- Health checking for a single LUN is often sufficient to monitor all the physical paths, but not to recover them

SDD and SDDPCM also recover failed paths automatically
- In addition, SDDPCM provides a health check daemon as an automated method of reclaiming failed paths to a closed device

To manually enable a failed path after repair, or re-enable a disabled path:
# chpath -l hdisk1 -p <parent> -w <connection> -s enable
To disable the paths thru a specific FC port on the host:
# chpath -l hdisk1 -p <parent> -s disable
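To take every path thru one host FC port out of service, you can loop over all disks; a rough sketch (fscsi0 is an example, and disks with no path thru that adapter will simply report an error):

# for d in $(lsdev -Cc disk -F name)
> do
>   chpath -l $d -p fscsi0 -s disable
> done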


Path Health Checking and Recovery Notification!

One should also set up error notification for path failures, so that someone knows about it and can correct it before something else fails.

This is accomplished by determining the error that shows up in the error log when a path fails (via testing), and then adding an entry to the errnotify ODM class for that error, which calls a script (that you write) to notify someone that a path has failed.

Hint: You can use # odmget errnotify to see what the entries (or stanzas) look like; then create a stanza and use the odmadd command to add it to the errnotify class.
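A minimal sketch of such a stanza; the error label is a placeholder to replace with whatever label your path-failure test shows in errpt, and the notify script path is hypothetical (the error log sequence number is passed to the method as $1):

errnotify:
        en_name = "path_fail_notify"
        en_persistenceflg = 1
        en_label = "YOUR_PATH_FAIL_LABEL"
        en_method = "/usr/local/bin/path_fail_notify.sh $1"

Save the stanza to a file (e.g. /tmp/path_notify.add) and add it with:
# odmadd /tmp/path_notify.add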


Path management with MPIO


- Includes examining, adding, removing, enabling and disabling paths, e.g. for:
  - Adapter failure/replacement or addition
  - VIOS upgrades (VIOS or multi-path code)
  - Cable failure and replacement
  - Storage controller/port failure and repair
- If an adapter has failed, its paths will not be in use and will be in the failed state

Adapter replacement (see the command sketch below):
1. Remove the paths with # rmpath -l <hdisk> -p <parent> -w <connection> [-d]
   - -d removes the path; without it, the path is changed to Defined
2. Remove the adapter with # rmdev -Rdl <fcs#>
3. Replace the adapter
4. Run cfgmgr
5. Check the paths with lspath

- Avoid timeouts, application delays, performance impacts and potential error recovery bugs
  - It's better to stop using a path before you know the path will disappear
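A rough command sequence for the adapter replacement steps above, assuming the failing adapter is fcs0 with protocol device fscsi0 (the names and the loop are illustrative):

# for d in $(lsdev -Cc disk -F name); do rmpath -l $d -p fscsi0 -d; done    # step 1: remove the paths
# rmdev -Rdl fcs0                                                           # step 2: remove the adapter and its children
  (step 3: physically replace the adapter)
# cfgmgr                                                                    # step 4: rediscover the adapter and paths
# lspath | grep -v Enabled                                                  # step 5: anything not Enabled needs attention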


Active/Active vs. Active/Passive Disk Subsystem Controllers


- With active/active controllers, IOs for a LUN can be sent to any storage port
- With active/passive disk subsystems, LUNs are balanced across controllers
  - So a controller is active for some LUNs, but passive for the others
  - IOs for a LUN are only sent to the active controller's ports
- ESS, DS6000, DS8000 and XIV have active/active controllers
- DS4000, DS5000, DS3950, N series and V7000 have active/passive controllers
  - The N series passive controller can accept IOs, but IO latency is affected
  - The passive controller takes over in the event that the active controller, or all paths to it, fail
- MPIO recognizes active/passive disk subsystems and sends IOs only to the primary controller
  - Except under failure conditions; then the active/passive role switches for the affected LUNs
- Terminology regarding active/active and active/passive varies considerably


Example: Active/Passive Paths


SDD: An Overview
SDD = Subsystem Device Driver
- Pre-MPIO architecture
- Used with IBM ESS, DS6000, DS8000 and the SAN Volume Controller, but is not MPIO compliant
- A host attachment fileset (provides subsystem-specific support code and populates the ODM) and the SDD fileset are both installed
  - Host attachment: ibm2105.rte
  - SDD: devices.sdd.<sdd_version>.rte
- LUNs show up as vpaths, with an hdisk device for each path
- 32 paths maximum per LUN, but fewer are recommended with more than 600 LUNs
- One installs SDDPCM or SDD, not both
- No support for rootvg, dump or paging devices
  - One can exclude disks from SDD control using the excludesddcfg command
  - Mirror rootvg across two separate LUNs on different adapters for availability


SDD
Load balancing algorithms:
- fo:  failover
- rr:  round robin
- lb:  load balancing (aka df, the default); chooses the adapter with the fewest in-flight IOs
- lbs: load balancing sequential, optimized for sequential IO
- rrs: round robin sequential, optimized for sequential IO

The datapath command is used to examine vpaths, adapters, paths, vpath statistics, path statistics and adapter statistics, to dynamically change the load balancing algorithm, and for other administrative tasks such as adapter replacement, disabling paths, etc.

mkvg4vp is used instead of mkvg, and extendvg4vp is used instead of extendvg
SDD automatically recovers failed paths that have been repaired, via the sddsrv daemon
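A few representative datapath invocations, as a sketch (the device, path and adapter numbers are examples):

# datapath query adapter                    # adapter states and statistics
# datapath query device                     # vpaths and the state of each underlying path
# datapath set device 0 path 1 offline      # take one path of device 0 offline
# datapath set adapter 1 offline            # stop using all paths thru adapter 1, e.g. before replacement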


Does Load Balancing Improve Performance?


- Load balancing tries to reduce latency by picking a less active path
  - But it adds latency to choose the best path
  - These latencies are typically < 1% of typical IO service times
- Load balancing is more likely to be of benefit in SANs with heavily utilized links, or with intermittent errors that slow IOs on some path
- A round_robin algorithm is usually equivalent

Conclusion: load balancing is unlikely to improve performance, especially when compared to other strategies such as algorithm=round_robin or approaches that balance IO with algorithm=fail_over


Balancing IOs with algorithm=fail_over


A fail_over algorithm can be efficiently used to balance IOs!
- Any load balancing algorithm must consume CPU and memory resources to determine the best path to use
- It's possible to set up fail_over LUNs so that the loads are balanced across the available FC adapters
- Let's use an example with 2 FC adapters. Assume we correctly lay out our data so that the IOs are balanced across the LUNs (this is usually a best practice). Then if we assign half the LUNs to FC adapter A and half to FC adapter B, the IOs are evenly balanced across the adapters (see the sketch below).
- The question to ask is: if one adapter is handling more IO than another, will this have a significant impact on IO latency? Since the FC adapters are capable of handling more than 35,000 IOPS, we're unlikely to bottleneck at the adapter and add significant latency to the IO.
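A sketch of that layout with the default PCM and algorithm=fail_over, alternating the preferred adapter by path priority. The adapter names and the even/odd split are illustrative; it assumes every disk has paths thru both fscsi0 and fscsi1, and busy disks may need the change made while closed:

# i=0
# for d in $(lsdev -Cc disk -F name | grep hdisk)
> do
>   if [ $((i % 2)) -eq 0 ]; then p1=fscsi0; p2=fscsi1; else p1=fscsi1; p2=fscsi0; fi
>   chpath -l $d -p $p1 -a priority=1     # preferred adapter for this LUN
>   chpath -l $d -p $p2 -a priority=2     # used only if the preferred paths fail
>   i=$((i + 1))
> done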


SDDPCM: An Overview
SDDPCM = Subsystem Device Driver Path Control Module
- SDDPCM is MPIO compliant and can be used with IBM ESS, DS6000, DS8000, DS4000 (most models), DS5000, DS3950, V7000 and the SVC
- A host attachment fileset (populates the ODM) and the SDDPCM fileset are both installed
  - Host attachment: devices.fcp.disk.ibm.mpio.rte
  - SDDPCM: devices.sddpcm.<version>.rte
- LUNs show up as hdisks; paths are shown with the pcmpath or lspath commands
- 16 paths per LUN supported
- Provides a PCM per the MPIO architecture
- One installs SDDPCM or SDD, not both
- SDDPCM is recommended and strategic


Comparing AIX Default MPIO PCMs & SDDPCM


How obtained
- MPIO PCMs: Provided as an integrated part of the base VIOS/PowerVM firmware and AIX operating system product distribution
- SDDPCM: Provided by most IBM storage products for subsequent installation on the various server OSs that the device supports

Supported devices
- MPIO PCMs: Supports most disk devices that the AIX operating system and VIOS/PowerVM firmware support, including selected third-party devices
- SDDPCM: Supports specific IBM devices, as referenced by the particular device support statement; the supported devices differ between AIX and the PowerVM VIOS

OS integration considerations
- MPIO PCMs: Update levels are provided, updated and migrated as a mainline part of the normal AIX and VIOS service strategy and upgrade/migration paths
- SDDPCM: An add-on software entity that has its own update strategy and process for obtaining fixes; the customer must manage coexistence levels across the mix of devices, operating system levels and VIOS levels; NOT a licensed program product

Path selection algorithms
- MPIO PCMs: Fail over (default), Round robin (excluding VSCSI disks)
- SDDPCM: Fail over, Round robin, Load balancing (default), Load balancing port

Dynamic algorithm selection
- MPIO PCMs: Disk access must be stopped in order to change the algorithm
- SDDPCM: The algorithm can be changed dynamically (via pcmpath)

SAN boot, dump and paging support
- MPIO PCMs: Yes
- SDDPCM: Yes; a restart is required if SDDPCM is installed after the MPIO PCM and SDDPCM boot is desired

PowerHA & GPFS support
- MPIO PCMs: Yes
- SDDPCM: Yes

Utilities
- MPIO PCMs: Standard AIX performance monitoring tools such as iostat and fcstat
- SDDPCM: Enhanced utilities (pcmpath commands) to show mappings from adapters, paths and devices, as well as performance and error statistics


SDDPCM
Load balancing algorithms:
- rr:  round robin
- lb:  load balancing, based on in-flight IOs per adapter
- fo:  failover policy
- lbp: load balancing port (for ESS, DS6000, DS8000, V7000 and SVC only), based on in-flight IOs per adapter and per storage port

The pcmpath command is used to examine hdisks, adapters, paths, hdisk statistics, path statistics and adapter statistics, to dynamically change the load balancing algorithm, and for other administrative tasks such as adapter replacement and disabling paths

SDDPCM automatically recovers failed paths that have been repaired, via the pcmserv daemon
- MPIO health checking can also be used, and can be set dynamically via the pcmpath command. This is recommended. Set the hc_interval to a non-zero value for an appropriate number of LUNs to check the physical paths.
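For example, roughly (the two leading numbers give a device range, and the values shown are only examples):

# pcmpath set device 0 10 algorithm lb        # change devices 0-10 to load balancing, dynamically
# pcmpath set device 0 10 hc_interval 60      # enable MPIO health checking on the same range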


Path management with SDDPCM and the pcmpath command


# pcmpath query adapter              List adapters and status
# pcmpath query device               List hdisks and paths
# pcmpath query port                 List DS8000/DS6000/SVC ports
# pcmpath query devstats             List hdisk/path IO statistics
# pcmpath query adaptstats           List adapter IO statistics
# pcmpath query portstats            List DS8000/DS6000/SVC port statistics
# pcmpath query essmap               List rank, LUN ID and more for each hdisk
# pcmpath set adapter                Disable/enable paths to an adapter
# pcmpath set device path            Disable/enable paths to an hdisk
# pcmpath set device algorithm       Dynamically change the path algorithm
# pcmpath set device hc_interval     Dynamically change the health check interval
# pcmpath disable/enable ports       Disable/enable paths to a disk port
# pcmpath query wwpn                 Display all FC adapter WWPNs
And more

SDD offers the similar datapath command


Path management with SDDPCM and the pcmpath command


# pcmpath query device

DEV#:   2  DEVICE NAME: hdisk2  TYPE: 2145  ALGORITHM: Load Balance
SERIAL: 600507680190013250000000000000F4
==========================================================================
Path#    Adapter/Path Name    State   Mode      Select     Errors
  0      fscsi0/path0         OPEN    NORMAL    40928736    0
  1*     fscsi0/path1         OPEN    NORMAL    16          0
  2      fscsi2/path4         OPEN    NORMAL    43927751    0
  3*     fscsi2/path5         OPEN    NORMAL    15          0
  4      fscsi1/path2         OPEN    NORMAL    44357912    0
  5*     fscsi1/path3         OPEN    NORMAL    14          0
  6      fscsi3/path6         OPEN    NORMAL    43050237    0
  7*     fscsi3/path7         OPEN    NORMAL    14          0

- * indicates a path to the passive controller
- 2145 is an SVC, which has active/passive nodes for a LUN
- DS4000, DS5000, V7000 and DS3950 also have active/passive controllers
- IOs will be balanced across the paths to the active controller


Path management with SDDPCM and the pcmpath command


# pcmpath query devstats

Total Dual Active and Active/Asymmetric Devices : 67

DEV#: 2  DEVICE NAME: hdisk2
===============================
            Total Read    Total Write    Active Read    Active Write    Maximum
I/O:         169415657        2849038              0               0         20
SECTOR:     2446703617      318507176              0               0       5888

Transfer Size:    <= 512       <= 4k      <= 16K      <= 64K       > 64K
                  183162    67388759    35609487    46379563    22703724

- The Maximum value is useful for tuning hdisk queue depths
  - 20 is the maximum number of in-flight requests for the IOs shown
- Increase queue_depth until the queue is not filling up, or until IO service times suffer (writes > 3 ms, reads > 15-20 ms), i.e. the bottleneck is pushed to the disk subsystem
- See the References for the queue depth tuning whitepaper


SDD & SDDPCM: Getting Disks configured correctly


Install the appropriate filesets:
- SDD or SDDPCM for the required disks (and the host attachment fileset)
- If you are using SDDPCM, install the MPIO fileset as well, which comes with AIX: devices.common.IBM.mpio.rte
- Host attachment scripts: http://www.ibm.com/support/dlsearch.wss?rs=540&q=host+scripts&tc=ST52G7&dc=D410
- Reboot, or start the sddsrv/pcmsrv daemon

smitty disk -> List All Supported Disk
- Displays the disk types for which software support has been installed
- Or # lsdev -Pc disk | grep MPIO
  disk mpioosdisk fcp  MPIO Other FC SCSI Disk Drive
  disk 1750       fcp  IBM MPIO FC 1750 DS6000
  disk 2105       fcp  IBM MPIO FC 2105 ESS
  disk 2107       fcp  IBM MPIO FC 2107 DS8000
  disk 2145       fcp  MPIO FC 2145 SVC
  disk DS3950     fcp  IBM MPIO DS3950 Array Disk
  disk DS4100     fcp  IBM MPIO DS4100 Array Disk
  disk DS4200     fcp  IBM MPIO DS4200 Array Disk
  disk DS4300     fcp  IBM MPIO DS4300 Array Disk
  disk DS4500     fcp  IBM MPIO DS4500 Array Disk
  disk DS4700     fcp  IBM MPIO DS4700 Array Disk
  disk DS4800     fcp  IBM MPIO DS4800 Array Disk
  disk DS5000     fcp  IBM MPIO DS5000 Array Disk
  disk DS5020     fcp  IBM MPIO DS5020 Array Disk


SDD and SDDPCM support matrix (screen shot): www-01.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350#AIXSDDPCM


Migration from SDD to SDDPCM

Migration from SDD to SDDPCM is fairly straightforward and doesn't require a lot of time. The procedure is documented in the manual:
1. Vary off your SDD VGs
2. Stop the sddsrv daemon via stopsrc -s sddsrv
3. Remove the SDD devices (both vpaths and hdisks) via the instructions below
4. Remove the dpo device
5. Uninstall SDD and the host attachment fileset for SDD
6. Install the host attachment fileset for SDDPCM, and SDDPCM
7. Configure the new disks (if you rebooted it's done, else run cfgmgr and startsrc -s pcmserv)
8. Vary on your VGs - you're back in business

To remove the vpaths and hdisks (and the dpo device), use:
# rmdev -Rdl dpo

No exportvg/importvg is needed because LVM keeps track of PVs via their PVIDs
Effective queue depths change (and changes to queue_depth will be lost):
- SDD effective queue depth    = # of paths for a LUN x queue_depth
- SDDPCM effective queue depth = queue_depth
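Roughly, in command form; the VG name and the SDD/SDDPCM fileset names below are examples only, since the exact filesets depend on the storage and the versions involved:

# varyoffvg datavg
# stopsrc -s sddsrv
# rmdev -Rdl dpo
# installp -u devices.sdd.61.rte ibm2105.rte                                   # uninstall SDD and its host attachment
# installp -agXd /dir devices.fcp.disk.ibm.mpio.rte devices.sddpcm.61.rte      # install SDDPCM and its host attachment
# cfgmgr ; startsrc -s pcmserv
# varyonvg datavg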


Multi-path code choices for DS4000/DS5000/DS3950


- These disk subsystems might use RDAC, MPIO or SDDPCM
  - Choices depend on the model and AIX level
  - MPIO is strategic
  - SDDPCM uses MPIO and is recommended
  - SDDPCM is not supported on the VIOS yet for these disk subsystems, so use MPIO there
- SAN cabling/zoning is more flexible with MPIO/SDDPCM than with RDAC
  - RDAC requires that fcsA be connected to controller A and fcsB to controller B, with no cross connections
- These disk subsystems have active/passive controllers
  - All IO for a LUN goes to its primary controller, unless the paths to it fail or the controller fails; then the other controller takes over the LUN
  - The storage administrator assigns half the LUNs to each controller
- The manage_disk_drivers command is used to choose the multi-path code
  - Choices vary among models and AIX levels
  - DS3950, DS5020, DS5100 and DS5300 use MPIO or SDDPCM


Multi-path code choices for DS3950, DS4000 and DS5000


# manage_disk_drivers -l
Device          Present Driver    Driver Options
2810XIV         AIX_AAPCM         AIX_AAPCM,AIX_non_MPIO
DS4100          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS4200          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS4300          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS4500          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS4700          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS4800          AIX_SDDAPPCM      AIX_APPCM,AIX_fcparray
DS3950          AIX_SDDAPPCM      AIX_APPCM
DS5020          AIX_SDDAPPCM      AIX_APPCM
DS5100/DS5300   AIX_SDDAPPCM      AIX_APPCM
DS3500          AIX_AAPCM         AIX_APPCM

To set the driver to use:
# manage_disk_drivers -d <device> -o <driver_option>

AIX_AAPCM    - MPIO with active/active controllers
AIX_APPCM    - MPIO with active/passive controllers
AIX_SDDAPPCM - SDDPCM
AIX_fcparray - RDAC

Other MPIO commands for DS3/4/5000


# mpio_get_config -Av
Frame id 0:
    Storage Subsystem worldwide name: 608e50017be8800004bbc4c7e
    Controller count: 2
    Partition count: 1
    Partition 0:
    Storage Subsystem Name = 'DS-5020'
        hdisk     LUN #   Ownership        User Label
        hdisk4      0     A (preferred)    Array1_LUN1
        hdisk5      1     B (preferred)    Array2_LUN1
        hdisk6      2     A (preferred)    Array3_LUN1
        hdisk7      3     B (preferred)    Array4_LUN1
        hdisk8      4     A (preferred)    Array5_LUN1
        hdisk9      5     B (preferred)    Array6_LUN1

# sddpcm_get_config -Av
- Output is the same as above


XIV
- Host Attachment Kit for AIX: http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000802
- XIV support has moved from fileset support to support within AIX
  - Installing the Host Attachment Kit is still recommended; it provides diagnostic and other commands
- Disks are configured as 2810xiv devices:
  # lsdev -Pc disk | grep xiv
  disk 2810xiv fcp N/A
- ODM entries for XIV are included with AIX 5.3 TL10, AIX 6.1 TL3, VIOS 2.1.2.x and AIX 7


Nseries/NetApp
- N series/NetApp has a preferred storage controller for each LUN
  - Not exactly an active/passive disk subsystem, as the non-preferred controller can accept IO requests
  - But those I/O requests have to be passed to the preferred controller, which impacts latency
- Install the SAN Toolkit: Ontap.mpio_attach_kit.*
  - Provides the dotpaths utility and the sanlun command
  - dotpaths sets hdisk path priorities to favor the primary controller
    - For every IO going down a secondary path, 255 IOs are sent down a primary path

Storage Area Network (SAN) Boot


- Boot from an SVC
  - Storage is zoned directly to the client HBAs used for boot and/or data access
  - SDDPCM runs in the client (to support boot)
- Boot directly from SAN
  - Storage is zoned directly to the client HBAs used for boot and/or data access
  - The multi-path code for the storage runs in the client
- SAN sourced boot disks
  - Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  - The multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS
- Boot from SVC via VIO server
  - Affected LUNs are zoned to the VIOS(s) and assigned to clients via VIOS definitions
  - The multi-path code in the client will be the MPIO default PCM for disks seen through the VIOS


Storage Area Network (SAN) Boot


Requirements for SAN booting:
- A system with FC boot capability
  - Some older systems don't support FC boot; if in doubt, check the sales manual
- Appropriate microcode (system, FC adapter, disk subsystem and FC switch)
- A disk subsystem supporting AIX FC boot

SAN disk configuration:
- Create the SAN LUNs and assign them to the system's FC adapter WWPNs prior to installing the system
- For non-MPIO configurations, assign one LUN to one WWPN to keep it simple

AIX installation:
- Boot from the installation CD or NIM; this runs the install program
- When you do the installation you'll get a list of disks that will be on the SAN for the system
- Choose the disks for installing rootvg
- Be aware of disk SCSI reservation policies
  - Avoid policies that limit access to a single path or adapter


How to assure you install to the right SAN disk


- Only assign the rootvg LUN to the host prior to install, and assign the data LUNs later, or
- Create a LUN for rootvg with a size different from the other LUNs, or
- Write down the LUN ID and storage WWN, or
- Use a disk with an existing PVID

These criteria can be used to select the LUN from the AIX install program (shown in the following screen shots) or via a bosinst_data file for NIM (sketched below)
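For an unattended NIM install, these criteria go into the target_disk_data stanza of the bosinst_data resource; a minimal sketch, assuming selection by PVID (all values are placeholders, and fields such as SIZE_MB, CONNECTION or LOCATION could be used instead):

control_flow:
    PROMPT = no
    EXISTING_SYSTEM_OVERWRITE = yes

target_disk_data:
    PVID = 00c8d1a2b3c4d5e6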


Choose via Location Code

1 hdisk2 U8234.EMA.06EF634-V5-C22-T1-W50050768012017C2-L1000000000000 2 hdisk3 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L2000000000000 3 hdisk5 U8234.EMA.06EF634-V5-C22-T1-W500507680120165C-L3000000000000

(In the location code, the W... field is the storage port WWN and the L... field is the LUN ID)


Choose via Size


Choose via PVID


Storage Area Network Booting: Pros & Cons


The main benefits of a SAN rootvg:
- Performance: < 2 ms writes and 5-10 ms reads due to cache, and higher IOPS
- Availability with built-in RAID protection
- Ability to easily redeploy disk
- Ability to FlashCopy/MetroMirror the rootvg for backup/DR
- Fewer hardware resources

SAN rootvg disadvantages:
- SAN problems can cause loss of access to rootvg
  - Not really an issue, as application data is on the SAN anyway
- Potential loss of system dump and diagnosis if loss of access to the SAN is caused by a kernel bug
- Difficult to change the multi-path IO code
  - Not an issue with dual VIOSs: you can take down one VIOS at a time and change the multi-path code
  - SAN boot thru VIO with NPIV is like direct SAN boot


Changing multi-path IO code for rootvg not so easy


- How do you change/update rootvg multi-path code when it's in use?
- Changing from SDD to SDDPCM (or vice versa) requires contacting support if booting from SAN, or:
  - Move rootvg to internal SAS disks, e.g. using extendvg, migratepv, reducevg, bosboot and bootlist, or use alt_disk_install
  - Change the multi-path code
  - Move rootvg back to the SAN
- Newer versions of AIX require a newer version of SDD or SDDPCM
  - Follow the procedures in the SDD and SDDPCM manual for upgrades of AIX and/or the multi-path code
- Not an issue when using VIO with dual VIOSs
- If one has many LPARs booting from SAN, one SAS adapter with a SAS disk or two can be used to migrate from SDD to SDDPCM, one LPAR at a time (see the sketch below)
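A rough sketch of the "move rootvg to an internal disk" step (hdisk names are examples: hdisk10 is the internal SAS disk, hdisk0 the SAN boot LUN):

# extendvg rootvg hdisk10           # add the internal disk to rootvg
# migratepv hdisk0 hdisk10          # move all LVs, including the boot LV, off the SAN LUN
# reducevg rootvg hdisk0            # remove the SAN LUN from rootvg
# bosboot -ad /dev/hdisk10          # rebuild the boot image on the internal disk
# bootlist -m normal hdisk10        # boot from the internal disk
  (change the multi-path code, then reverse the steps to move rootvg back to the SAN)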


Documentation & References


- Infocenter, Multiple Path I/O:
  http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/dm_mpio.htm
- SDD and SDDPCM support matrix:
  www.ibm.com/support/docview.wss?rs=540&uid=ssg1S7001350
- Downloads and documentation for SDD:
  www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000065&loc=en_US&cs=utf-8&lang=en
- Downloads and documentation for SDDPCM:
  www.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000201&loc=en_US&cs=utf-8&lang=en
- IBM System Storage Interoperation Center (SSIC):
  http://www-03.ibm.com/systems/support/storage/ssic/interoperability.wss
- Guide to selecting a multipathing path control module for AIX or VIOS:
  http://www.ibm.com/developerworks/aix/library/au-multipathing/index.html
- AIX disk queue depth tuning techdoc:
  http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105745

Documentation & References


- Hitachi MPIO support site: https://tuf.hds.com/gsc/bin/view/Main/AIXODMUpdates
- EMC MPIO support site: ftp://ftp.emc.com/pub/elab/aix/ODM_DEFINITIONS/
- HP support site: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c02619876&jumpid=reg_R1002_USEN
- HP StorageWorks for IBM AIX: http://h18006.www1.hp.com/storage/aix.html


Session Evaluations

Session Number: SE39
Session Name: Working with SAN Boot
Dates:
- Thursday, April 28, 14:30, Lake Down B
- Friday, April 29, 13:00, Lake Hart B