
IBM TotalStorage Enterprise Storage Server

Service Guide
2105 Models 750/800 and Expansion Enclosure
Volume 1
Chapters 1, 2 (START), and 3

SY27-7635-05


Note: Before using this information and the product it supports, be sure to read the general information under Notices on page xvii.

Sixth Edition (November 2005)

This edition replaces SY27-7635-04.

This edition applies to the first release of the IBM TotalStorage Enterprise Storage Server and to all following releases and changes until otherwise indicated in new editions.

Order publications through your IBM representative or the IBM branch office serving your locality. Publications are not stocked at the address given below.

IBM welcomes your comments. A form for readers' comments may be supplied at the back of this publication, or you may mail your comments to the following address:

International Business Machines Corporation
Information Development
Department 61C
9032 South Rita Road
Tucson, AZ 85775-5501
U.S.A.

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes suitable without incurring any obligation to you.

Copyright International Business Machines Corporation 2004, 2005. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Tables
Figures
Notices
Safety Notices
Laser Safety and Compliance
Translated Safety Notices
Environmental Notices
Product Recycling
Product Disposal
Electronic Emission Notices
Federal Communications Commission (FCC) Statement
Industry Canada Compliance Statement
European Community Compliance Statement
Japanese Voluntary Control Council for Interference (VCCI) Class A Statement
Korean Ministry of Information and Communication (MIC) Statement
Taiwan Class A Compliance Statement
Chinese Class A Electronic Emission Statement
Trademarks
Using This Service Guide
Where to Start
Limited Vocabulary
Publications
TotalStorage ESS Product Library
Ordering Publications
Web Sites
Other Related Publications

Chapter 1: Reference Information
2105 Models 750 and 800 Overview
2105 Model 750 Specifications
2105 Model 800 Specifications
Using the ESS operator panel
Switching the ESS power on and off (Local, Automatic or Remote)
2105 Models 750 and 800 Disk Storage
DDM Bay Indicators
DDM Bay Disk Drive Module Indicators
Internal Connections (DDM Bay)
External SSA Connections (DDM Bay)
Special Tools

Chapter 2: Entry for All Service Actions
Entry Table for All Service Actions
SIM Generation and Usage
Repair Using a SIM Console Message
Customer Receives Sense Data Without a SIM
Repair Using an EREP Report
EREP Reports
Decode a Refcode
Generating a Refcode from Sense Bytes
Copyright IBM Corp. 2004, 2005


Media SIM Maintenance Procedures
Customer Media Maintenance Procedure Examples

Chapter 3: Problem Isolation Procedures
Entry for Maintenance Analysis Procedures (MAPs)
MAP 1XXX: General Maintenance Analysis Procedures
MAP 2XXX: Power and Cooling Maintenance Analysis Procedures
MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures
MAP 4XXX: Cluster Maintenance Analysis Procedures
MAP 5XXX: Host Interface Maintenance Analysis Procedures
MAP 6XXX: Service Terminal Maintenance Analysis Procedures
MAPs 1XXX: General Isolation Procedures
MAP 1200: Prioritizing Visual Symptoms and Problems For Repair
MAP 1210: Displaying and Repairing a Problem
MAP 1300: Isolating Cluster to Modem Communication Problems
MAP 1301: Isolating Call Home / Remote Services Failure
MAP 1305: Isolating SNMP Notification Problems
MAP 1310: Isolating E-Mail Notification Problems
MAP 1320: Isolating Problems Using Visual Symptoms
MAP 1460: Isolating E-Mail Reported Errors
MAP 1480: Replacing a FRU, Without Using a Problem
MAP 1500: Ending a Service Action
MAP 1600: ESSNet Console Problem
MAP 1602: Repairing the ESSNet Console's Personal Computer
MAP 1604: Restoring the Personal Computer's Software
MAP 1605: Master Console Product Recovery Wizard
MAP 1606: Converting the Personal Computer to an ESSNet Console
MAP 1607: Changing the Network Configuration (IP address, host name, domain, subnet mask) for ESS and the TotalStorage ESS Master Console
MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console
MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console
MAP 1610: Connecting the Modem and Modem Expander for Remote Support
MAP 1620: Attaching The ESSNet to a Customer Network
MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs
MAPs 2XXX: Power and Cooling Isolation Procedures
MAP 2000: Model 100 Attachment Rack Reported
MAP 2020: Isolating Power Symptoms
MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent
MAP 2031: Repair Ground Continuity
MAP 20A0: Cluster Not Ready
MAP 2210: Host Bay Drawer Power Supply Problem
MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected
MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault
MAP 2320: Installed Unit or Feature Mismatch
MAP 2340: PPS Status Code 06
MAP 2350: Isolating PPS Status Indicator Codes
MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem
MAP 2365: UEPO Loop Problem
MAP 2370: Rack 1 Power On Problem, Automatic Mode
MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem
MAP 2390: Rack 1 Power On Problem, Remote Mode
MAP 23B0: 2105 Expansion Enclosure (Rack 2) Power Off Problem


VOLUME 1, TotalStorage ESS Service Guide

MAP 23C0: Power Event Threshold Exceeded
MAP 23D0: RPC-2 Card Reporting PPS Battery Set Present
MAP 23E0: Cluster Powered Off Unexpectedly
MAP 2400: 2105 Model 800 Local Power On Problems
MAP 2410: RPC Power Mode Switch Mismatch
MAP 2420: 2105 Expansion Enclosure Power On Problem
MAP 2430: One RPC Card Firmware Down Level
MAP 2440: Rack 1 Power Off Problem
MAP 2450: Crossed RPC Cables to Expansion Rack
MAP 2460: Battery Set Charge Low
MAP 2470: Battery Set Detection Problem
MAP 2490: PPS Input Phase Missing
MAP 24A0: PPS Power On Problem
MAP 24B0: 2105 Cannot Power Off, Pinned Data
MAP 24F0: Both RPC Cards Firmware Down Level
MAP 2520: PPS Output Circuit Breaker Tripped
MAP 2600: RPC Card Cannot Reset a Power Fault
MAP 2700: CEC Drawer Power On Problem
MAP 2800: CEC or I/O Drawer Visual Power Supply Problem
MAP 2810: Host Bay Drawer Visual Power Supply Problem
MAPs 3XXX: SSA DASD DDM Bay Isolation Procedures
Using the SSA DASD Maintenance Analysis Procedures (MAPs)
MAP 3000: Isolating an SSA Link Error Between Two DDMs
MAP 3010: Isolating a Degraded SSA Link Between Two DDMs
MAP 3050: Isolating an SSA Link Error Between a DDM and an SSA Device Card
MAP 3060: Isolating a Degraded SSA Link Between a DDM and an SSA Device Card
MAP 3077: Isolating an SSA Link Error Between a DDM and Two SSA Device Cards
MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards
MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay
MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay
MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card
MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card
MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays
MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays
MAP 3120: Isolating an SSA Link Error
MAP 3121: Isolating a Degraded SSA Link
MAP 3123: Array Repair Required
MAP 3124: Isolating Between DDM Hardware and Microcode Failures
MAP 3125: Isolating an Unexpected SSA SRN
MAP 3126: Isolating an Unexpected SSA Test Result
MAP 3127: Formatting of a DDM Has Not Completed
MAP 3128: Isolating an Unknown DDM Failure
MAP 3129: Isolating an Array Repair Required Failure
MAP 3131: Attempt to Format Array Member
MAP 3142: Isolating Multiple DDMs on an SSA Loop Cannot be Accessed
MAP 3149: Repairing Single or Multiple DDM Failures

MAP 3152: Replacing DDMs Called Out by Enhanced PFA
MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault
MAP 3180: Controller Card Failed
MAP 3190: Wrong Drawer Type Installed
MAP 3200: Uninstalled SSA DDMs Connected to Loop A
MAP 3210: Uninstalled SSA DDMs Connected to Loop B
MAP 3220: Isolating Too Few DDMs in a DDM Bay
MAP 3300: Repair Alternate Cluster to Run SSA Loop Test
MAP 3360: Ending a DASD Service Action
MAP 3375: Isolating a Storage Cage Fan/Power Sense Card Error
MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error
MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On
MAP 3381: Isolating a Storage Cage Fan/Power Sense Card Error
MAP 3384: Isolating a Storage Cage Fan Failure
MAP 3387: Isolating a Storage Cage Power Supply Failure
MAP 3391: Isolating a Storage Cage Power System Problem
MAP 3395: Isolating a DDM Bay Power Problem
MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem
MAP 3398: Isolating a DDM Bay Controller Card Communications Failure
MAP 3400: Replacing a DDM Bay Frame Assembly
MAP 3421: Storage Cage Fan/Power Sense Card R2 Cable Problem
MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems
MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error
MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error
MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error
MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error
MAP 3427: Isolating a Storage and DDM Bay Location Error
MAP 3428: Isolating a DDM Bay Location Error
MAP 3429: Isolating a DDM Location Error
MAP 3500: Verifying a DDM Bay Repair
MAP 3520: DDM Bay Verification for Possible Problems
MAP 3530: SSA Devices Certify Test Failure
MAP 3540: Web Initiated Format Incomplete, User to Restart
MAP 3550: Incomplete or Failed Format Process, User to Restart
MAP 3560: Unrelated Occurrence, Retry Verification Test
MAP 3570: Unrelated Event Caused Resume Fail
MAP 3580: DDM, or DDMs, Found in Formatting State During IML
MAP 3600: Multiple DDMs Isolated on an SSA Loop
MAP 3605: Isolating an Unexpected Result
MAP 3610: DDM Installation with New Rank Site Capacity
MAP 3612: DDM Installation with Mixed Capacity Rank Site
MAP 3614: DDM Installation Introduces Different RPM
MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop
MAP 3617: DDM Size is Not Supported
MAP 3618: Replacement DDM Has Slower RPM Than Called For
MAP 3619: This Repair Requires a Larger Capacity DDM
MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs
MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics


MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics
MAP 3627: Unable to Determine DDM Use
MAP 3640: Other Cluster Fenced - Unable to Verify SSA Loop
MAP 3650: Wrong, Missing, or Failing Bypass Card
MAP 3652: Wrong, Missing, or Failing Passthrough Card
MAP 3654: Bypass Card Jumpers Wrong
MAP 3656: 20 MB SSA Cable Installed Where 40 MB Cable Expected
MAP 3680: Isolating a Two DDMs Detect Over-Temperature Problem
MAP 3685: Isolating a Multiple DDM Detect Over-Temperature Problem
MAPs 4XXX: Cluster Isolation Procedures
MAP 4010: Cluster Hang During a Failback or Error Recovery
MAP 4020: Hard Disk Drive Build Process for Both Drives
MAP 4025: Hard Drive Build Process for Automatic LIC
MAP 4040: Entry MAP for CPI Problems
MAP 4055: Resolving a Bay Held Reset Condition
MAP 4060: Replacing I/O Drawer FRUs for CPI Problems
MAP 4070: Replacement of Host Bay FRUs for CPI Problems
MAP 4090: CPI Address Mismatch
MAP 40A0: Fence Network Isolation
MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode
MAP 40C0: Special SCSI Bus Problems
MAP 40D0: Special SRN Problems
MAP 40E0: Only One I/O Drawer Power Supply Detected
MAP 4100: Isolating a LIC Process Read/Display Problem
MAP 4110: Host Bay Drawer Fan Reporting Failure
MAP 4120: Handling Unexpected Resources
MAP 4130: Handling a Missing or Failing Resource
MAP 4140: Isolating a LIC Activation Process Failure
MAP 4150: PPS to RPC Interface Failure
MAP 4160: Isolating Memory Related Error Codes
MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers
MAP 4180: RPC to RPC Communication Failure
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure
MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure
MAP 41B0: CPI Interface NVS/IOA Card to Host Bay Failure
MAP 41C0: ESC 2770 or 2771, Missing CPI Detected
MAP 41D0: CPI Problem for Host Bay Slot Failure
MAP 41E0: CPI Failure Needing CPI Cable as FRU
MAP 41F0: A Temporary CPI Error was Detected
MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging
MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel
MAP 4350: Isolating Cluster Code Load Counter=2
MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel
MAP 4370: Error Displaying Problems Needing Repair
MAP 4380: Isolating a Customer LAN Connection Problem
MAP 4390: Isolating a Cluster to Cluster Ethernet Problem
MAP 43A0: Bootlist Management Using SMS
MAP 43A5: Bootlist Management Using SMS for Automatic LIC
MAP 43B0: Cluster Dual Hard Drive ESC 1xxx
MAP 43C0: Cluster IML from Second Hard Disk Drive
MAP 43D0: Duplicate TCP/IP Address Detected for this Cluster
MAP 43E0: Service Processor Reset

MAP 4400: Displaying Cluster SMS Error Logs
MAP 4410: Cluster to Cluster Ethernet Communication Test
MAP 4420: Display Cluster Ethernet Network Address
MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem
MAP 4450: ESS Cluster to Customer Network Problem
MAP 4460: Cluster NVS Problem
MAP 4470: ESC 2768, NVS/IOA Card Problem
MAP 4480: Cluster to RPC Cards Communication Problem
MAP 4510: Isolating a Cluster to Cluster CPI Communication Failure
MAP 4520: Pinned Data and/or Volume Status Unknown
MAP 4540: Cluster Minimum Configuration
MAP 4550: NVS FRU Replacement
MAP 4560: No Valid Subsystem Status Available
MAP 45A0: Pinned Data, Special Case
MAP 4600: Isolating a CD-ROM Test Failure
MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level
MAP 4620: Isolating a Diskette Drive Failure
MAP 4640: Cluster SP, SPCN, or System Firmware Reload
MAP 4670: Cluster Powered Off Unexpectedly
MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers)
MAP 4710: Isolating a DDM LIC Update Problem
MAP 4720: Host Bay Fails to Power Off
MAP 4730: Cluster Power Off Request Problem
MAP 4760: Recovering from Corrupted Files or Functions
MAP 4780: Isolating a Functional Code Not Running Problem
MAP 47A0: Cluster Fails to Power Off
MAP 4810: Unexpected Host Bay Power Off
MAP 4820: Isolating a SCSI Card Configuration Timeout
MAP 4840: CPI Diagnostic Communication Problem
MAP 4850: Repair the Host Bay Drawer
MAP 4870: Host Bay Power On Problem
MAP 4880: Cluster Power On Problem
MAP 4885: SPCN Load Fault Firmware Error Code
MAP 4890: Replacing a CEC or I/O Drawer Power Supply
MAP 4960: ESC 5500 Isolation
MAP 4970: Isolating a Software Problem
MAP 4980: Customer Copy Services Problems
MAP 4990: LIC Feature License Failure
MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA
MAP 4A00: Isolating an Automatic LIC Activation Failure
MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL)
MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL)
MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL)
MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL)


MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL)
MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL)
MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL)
MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL)
MAPs 5XXX: Host Interface Isolation Procedures
MAP 5000: ESS Specialist Cannot Access Cluster
MAP 5220: Isolating a SCSI Bus Error
MAP 5230: Isolating a Fixed Block Read Data Failure
MAP 5240: Isolating a Customer Data Check Failure
MAP 5250: Isolating a Meta Data Check Failure
MAP 5300: ESCON or Fibre Channel Link Fault
MAP 5305: ESCON or Fibre Channel Bit Error Rate Test Failure
MAP 5310: ESCON Bit Error Rate Validation
MAP 5320: ESCON Optical Power Measurement
MAP 5321: Fibre Channel Optical Power Measurement
MAP 5330: Display ESCON and Fibre Node Descriptors
MAP 5340: CKD Read Data Failure
MAP 5400: Fibre Channel Link Fault
MAP 5410: Fibre Channel Bit Error Rate Validation
MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs
MAP 5440: Fibre Host Card Reports a Loss of Light
MAPs 6XXX: Service Terminal Isolation Procedures
MAP 6060: Isolating a Service Terminal Login Failure

Appendix. Accessibility
Features
Navigating by keyboard
Accessing the publications

Index


Tables
1. Fibre Channel Host Card LED Indicators
2. CEC Drawer Power Indicators
3. I/O Drawer Power Indicators
4. Power control with Remote Power Control Feature installed
5. Power control without Remote Power Control Feature installed
6. Summary of Bypass Card Indicators
7. Entry for All Service Actions
8. 2105 Media Maintenance Procedures
9. MAP 1XXX: General Maintenance Analysis Procedures
10. MAP 2XXX: Power and Cooling Maintenance Analysis Procedures
11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures
12. MAP 4XXX: Cluster Maintenance Analysis Procedures
13. MAP 5XXX: Host Interface Maintenance Analysis Procedures
14. MAP 6XXX: Service Terminal Maintenance Analysis Procedures
15. Prioritizing Repairs
16. Call Home Return Codes
17. 2105 Model 800 Operator Panel Visual Symptoms
18. 2105 Model 800 PPS and RPC Card Visual Symptoms
19. 2105 Model 800 CEC, I/O, and Host Bay Visual Symptoms
20. 2105 Model 800 Storage Bay Visual Symptoms
21. DDM Bay, and DDMs Visual Symptoms
22. 2105 Model 800 Recommended ESSNet Hub Connection Sequence
23. 2105 Model 800 Power Symptoms
24. Cluster Power Supply Input Power Cable Plug Chart
25. PPS Status Display Codes
26. RPC Card and Local Switch Card Configuration Switch Settings
27. With Remote Power Feature Installed
28. Remote Power Feature Not Installed
29. CEC or I/O Drawer Visual Power Supply Problems
30. Host Bay Drawer Visual Power Supply Problems
31. 2105 Model 800 and Expansion Enclosure, Storage Cages 1 and 2 (upper)
32. Expansion Enclosure, Storage Cages 3 and 4 (lower)
33. Storage Cage Power Supply Installation Requirements
34. Original Repair MAP
35. CPI Diagnostics Overview
36. Failure Condition
37. Fenced or Quiesced Cluster or Host Bays
38. FRUs Not Yet Replaced
39. FRUs Not Yet Replaced
40. FRUs Not Yet Replaced
41. FRUs Not Yet Replaced
42. FRUs Not Yet Replaced
43. FRUs Not Yet Replaced
44. CPI Cable Connections
45. NVS Power Cards
46. Memory Quad DIMMs
47. PPS Cable Connectors
48. Host Bay Drawer Power Supply Communication Cable Connectors
49. Failing CPI Interface
50. CIP FRUs
51. Cluster I/O Drawer Slot Locations
52. Host Adapter Card FRU Names
53. CPI Cable FRUs


54. Cluster Boot or Down, Symptoms . . . 372
55. Cluster to Cluster Communication Problem, MAP Entry . . . 378
56. Cluster to Cluster Communication Failure . . . 379
57. Cluster to Cluster Communication Problem, TCP/IP Settings . . . 380
58. Cluster to Cluster Communication Problem, New ESSNet . . . 380
59. Cluster to Cluster Communication Problem, Existing ESSNet . . . 380
60. Cluster to Cluster Communication Problem, Customer Network . . . 381
61. Cluster to Cluster Communication Problem, Unknown Cause . . . 381
62. Boot Devices Found by Firmware on Power On . . . 390
63. Number of Harddisks Displayed . . . 393
64. MAP Repair Started in . . . 394
65. hdisk_ Repairs . . . 395
66. ESC Repair Actions . . . 398
67. Conditions for Fencing . . . 412
68. Minimum Configuration Error Codes . . . 421
69. Minimum Configuration Checkpoint . . . 421
70. Memory Quad DIMMs . . . 423
71. CEC Drawer FRU Replacements . . . 437
72. I/O Drawer FRU Replacements . . . 439
73. Host Bay LEDs . . . 454
74. ESC Repairs . . . 472
75. ESS Web Copy Services Problems . . . 475
76. ESC Actions . . . 478
77. Status Actions . . . 482
78. Problem Repair Sequence . . . 483
79. Problem Repair Sequence . . . 486
80. Problem Repair Sequence . . . 488
81. Problem Repair Sequence . . . 489
82. Problem Repair Sequence . . . 492
83. Problem Repair Sequence . . . 494
84. Problem Repair Sequence . . . 496
85. Problem Repair Sequence . . . 498
86. Problem Repair Sequence . . . 500
87. Problem Repair Sequence . . . 502
88. Problem Repair Sequence . . . 504
89. Problem Repair Sequence . . . 505
90. SCSI Read Data Failure ESC Repairs . . . 544
91. Customer Data Check Failure ESC Repairs . . . 545
92. Meta Data Check Failure ESC Repairs . . . 547
93. 2105 Port ID Field . . . 561
94. rsACExec.c Return Code Definitions . . . 567


Figures
1. Chinese EMI Statement (s009679) . . . xx
2. 2105 Model 800 Front and Rear Views (s009119) . . . 3
3. 2105 Expansion Enclosure Front and Rear Views (S007726m) . . . 4
4. Master Console Connections (s009220) . . . 9
5. Fibre Channel Host Card LED Indicator Locations (s009528) . . . 14
6. CEC Drawer Power Indicator Location (s009612) . . . 16
7. I/O Drawer Power Indicator Location (s009613) . . . 17
8. ESS Indicators (s009531) . . . 18
9. ESS Operator Panel Switches and Indicators for an Expansion Enclosure (s008026m) . . . 19
10. DDM Bay Indicators (S008108l) . . . 22
11. DDM Bay Drawer Disk Drive Module Indicators (t007660m) . . . 23
12. DDM Bay Internal SSA Connections (S008107l) . . . 24
13. DDM Bay Diagram Explanation (S008122l) . . . 24
14. One DDM Bay External SSA Connections (S008129m) . . . 25
15. Two DDM Bay Initial External SSA Connections (S008128m) . . . 25
16. Two DDM Bay Final External SSA Connections (S008127m) . . . 26
17. Three DDM Bay External SSA Connections (S008126m) . . . 26
18. Four DDM Bay External SSA Connections (S008125m) . . . 27
19. Five DDM Bay External SSA Connections (S008124m) . . . 27
20. Six DDM Bay External SSA Connections (S008123m) . . . 28
21. Service Information Messages Report (S009434) . . . 35
22. Event History Report (S009433) . . . 36
23. Decoding the Refcode (s008597m) . . . 36
24. Refcode in the 2105 SIM Sense Bytes (S008594n) . . . 37
25. Example of ICKDSF Analyze Drivetest Output . . . 39
26. Modem and Modem Expander Attachment Diagram (s009425) . . . 89
27. Modem Configuration Switch Settings (S007457l) . . . 89
28. Modem Expander Setup Switch Settings (S007455l) . . . 90
29. Modem Rear View (S008410l) . . . 90
30. Modem Expander Rear View (S008411l) . . . 91
31. Cluster Modem Connectors (s009133) . . . 92
32. Modem Front Panel Locations (S008412l) . . . 93
33. Modem Expander Switches and Indicators (S007486l) . . . 93
34. Cluster to Cluster Communication Cable Location (s009120) . . . 95
35. ESSNet Hub Port Connector Locations (S008603p) . . . 96
36. Line Cord Bracket Connectors (s009124) . . . 115
37. Ground Continuity Repair Diagram (s009406) . . . 116
38. Male Plug on the Mainline Power Cable (S008045l) . . . 116
39. Female Connector on the Mainline Power Cable (S008046l) . . . 117
40. 2105 Primary Power Supply Locations (s009048) . . . 132
41. 2105 Primary Power Supply Locations (s009048) . . . 134
42. Rack Operator Panel Locations (s009714) . . . 135
43. 2105 Primary Power Supply Locations (s009048) . . . 139
44. 2105 Primary Power Supply Locations (s009048) . . . 145
45. Rack Power Control Card Cable Locations (s009706) . . . 148
46. Rack Power Control Card Switch Locations (s009707) . . . 149
47. 2105 Model 800 RPC Local/Remote Switch Location (s009127) . . . 150
48. 2105 Primary Power Supply Locations (s009048) . . . 151
49. 2105 Model 800 Operator Panel Locations (s009422) . . . 152
50. 2105 Primary Power Supply Locations (s009048) . . . 155
51. 2105 Model 800 Operator Panel Locations (s009422) . . . 156
52. 2105 Primary Power Supply Locations (s009048) . . . 159
53. RPC Card Cables (s009705) . . . 161

54. 2105 Primary Power Supply Locations (s009048) . . . 166
55. SSA Link Failure, Two Adjoining DDMs (s009440) . . . 177
56. SSA Link Failure, Two Adjoining DDMs (s009440) . . . 178
57. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l) . . . 180
58. DDM bay SSA Connectors (S007693l) . . . 181
59. Cluster SSA Device Card Connector Locations (s009166) . . . 182
60. DDM bay DDM Indicator Locations (S008021l) . . . 183
61. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l) . . . 184
62. DDM bay SSA Connectors (S007693l) . . . 185
63. Cluster SSA Device Card Connector Locations (s009166) . . . 186
64. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008141l) . . . 188
65. DDM bay SSA Connector Locations (S007693l) . . . 189
66. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 190
67. DDM bay DDM Indicator Locations (S008021l) . . . 191
68. DDM bay Bypass Card Jumper Settings (s009436) . . . 192
69. SSA Link Failure, Passthrough and Bypass Card Link Between a DDM and SSA Device Card (S008141l) . . . 193
70. DDM bay SSA Connector Locations (S007693l) . . . 194
71. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 194
72. DDM bay Bypass Card Jumper Settings (s009436) . . . 195
73. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l) . . . 197
74. DDM bay SSA Connector Locations (S007693l) . . . 198
75. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 199
76. DDM bay Bypass Card Jumper Settings (s009436) . . . 200
77. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l) . . . 201
78. DDM bay SSA Connector Locations (S007693l) . . . 202
79. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 202
80. DDM bay Bypass Card Jumper Settings (s009436) . . . 203
81. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008140l) . . . 205
82. DDM Bay SSA Connector Locations (S007693l) . . . 206
83. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 207
84. DDM bay Bypass Card Jumper Settings (s009436) . . . 208
85. SSA Link Degraded, Two Passthrough and Bypass Card Link Between Two DDMs (S008384l) . . . 210
86. DDM bay SSA Connector Locations (S007693l) . . . 210
87. DDM bay Bypass Card Jumper Settings (s009436) . . . 211
88. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437) . . . 213
89. DDM Bay DDM Indicator Locations (S008021l) . . . 214
90. DDM Bay SSA Connectors (S007693l) . . . 215
91. DDM bay Bypass Card Jumper Settings (s009436) . . . 216
92. DDM bay Bypass Card Jumper Settings (s009436) . . . 216
93. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437) . . . 217
94. DDM Bay SSA Connectors (S007693l) . . . 218
95. DDM bay Bypass Card Jumper Settings (s009436) . . . 219
96. DDM bay Bypass Card Jumper Settings (s009436) . . . 219
97. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438) . . . 221
98. DDM bay DDM Indicator Locations (S008021l) . . . 222
99. DDM Bay Bypass Card Jumper Settings (s009436) . . . 223
100. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438) . . . 224


101. DDM bay SSA Connector Locations (S007693l) . . . 224
102. Cluster SSA Device Card SSA Connector Locations (s009166) . . . 225
103. DDM Bay Bypass Card Jumper Settings (s009436) . . . 226
104. Cluster SSA Device Card Locations (s009166) . . . 237
105. Cluster SSA Device Card Locations (s009166) . . . 238
106. Expected DDM Bay DDM Locations (S007657l) . . . 239
107. DDM bay Indicator Locations (S008018l) . . . 240
108. 2105 Model 800 DDM Bay Locations (s009136) . . . 244
109. 2105 Expansion Enclosure DDM Bay Locations (S007741s) . . . 245
110. Storage Cage Power Planar Fan Jumper Locations (s008352p) . . . 249
111. Storage Cage Power Supply Locations (s009536) . . . 251
112. Primary Power Supply CB and Connector Locations (S008496l) . . . 252
113. Storage Cage Power Supply Locations (S008495m) . . . 256
114. Storage Cage Power Supply Locations (S008495m) . . . 262
115. 2105 Primary Power Supply Connectors (5008774m) . . . 267
116. 2105 Primary Power Supply Connectors (5008774m) . . . 269
117. 2105 Primary Power Supply Connectors (5008774m) . . . 271
118. 2105 Primary Power Supply Connectors (5008774m) . . . 272
119. 2105 Primary Power Supply Connectors (5008774m) . . . 274
120. Fan Sense Card Jumper and Cable Locations (S008774m) . . . 275
121. Fan Sense Card Jumper and Cable Locations (S008774m) . . . 277
122. DDM Bay Front Power Cable Locations (S009430) . . . 281
123. DDM Bay Rear Power Cable Locations (S009431) . . . 282
124. DDM bay Indicator Locations (S008018l) . . . 290
125. DDM bay Bypass Card Jumper Settings (s009436) . . . 308
126. DDM bay Bypass Card Jumper Settings (s009436) . . . 309
127. DDM bay Bypass Card Jumper Settings (s009436) . . . 311
128. I/O Drawer Cluster ID Jumpers (s009459) . . . 345
129. 2105 Model 800 Memory Riser Card Memory DIMM Locations (s009638) . . . 357
130. Cluster to Cluster Communication Cable Location (s009120) . . . 382
131. Boot Sequence Display . . . 389
132. CEC Drawer Operator Panel Locations (s009652) . . . 402
133. CEC Drawer Bulkhead Connector Locations (s009527) . . . 418
134. I/O Drawer Bulkhead Connector Locations (s009526) . . . 419
135. CEC Drawer and I/O Drawer Communication (s009721) . . . 422
136. CEC Drawer, Memory Riser Card Memory DIMM Module Locations (s009241) . . . 423
137. Power Supply Connector Locations (s009710) . . . 434
138. Host Bay Planar LED Indicator Location (s009643) . . . 453
139. Host Drawer Power Supply HA LED Indicator Location (s009644) . . . 454
140. CEC Drawer Bulkhead Connector Locations (s009527) . . . 462
141. I/O Drawer Bulkhead Connector Locations (s009526) . . . 463
142. RPC Card J2 Connector Locations (s009583) . . . 463
143. Example of Problem Details Report (s009716) . . . 472
144. 2105 Model 800 ESD Discharge Pad Locations (s009141) . . . 543
145. Measuring Optical Transmit Power (S008185m) . . . 553
146. Measuring Optical Receive Power (s008186n) . . . 555
147. Measuring Fibre Channel Optical Transmit Power (s008840l) . . . 557
148. Measuring Fibre Channel Optical Receive Power (s008841m) . . . 559
149. 2105 Model 800 Host Bay Connector Locations (s009135) . . . 565
150. Service Terminal Connections to Controllers and Power (s009595) . . . 571


Notices
References in this manual to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Subject to IBM's valid intellectual property or other legally protectable rights, any functionally equivalent product, program, or service may be used instead of the IBM product, program, or service. The evaluation and verification of operation in conjunction with other products, except those expressly designated by IBM, are the responsibility of the user.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
USA

Safety Notices
Safety notices are printed throughout this manual. Danger notices warn you of conditions or procedures that can result in death or severe personal injury. Caution notices warn you of conditions or procedures that can cause personal injury that is neither lethal nor extremely hazardous. Attention notices warn you of conditions or procedures that can cause damage to machines, equipment, or programs.

Laser Safety and Compliance


These products contain components that comply with performance standards that are set by the U. S. Food and Drug Administration (Part 21CFR, 1040.10/11). This means that these products belong to a class of laser products that do not emit hazardous laser radiation. This classification was accomplished by providing the necessary protective housing and scanning safeguards to ensure that laser radiation is inaccessible during operation or is within Class I limits. External safety agencies have reviewed these products and have obtained approvals to the latest standards as they apply to this product type.

Translated Safety Notices


Several countries require that caution and danger safety notices be shown in their national languages. Translations of the caution and danger safety notices are provided in a separate document, IBM Storage Solution Safety Notices manual, form number GC26-7229.

Environmental Notices
This section contains information about:
• Product recycling for this product
• Environmental guidelines for this product


Product Recycling
This unit contains recyclable materials. These materials should be recycled where processing sites are available and according to local regulations. In some areas, IBM provides a product take-back program that ensures proper handling of the product. Contact your IBM representative for more information.

Product Disposal
This unit contains several types of batteries. Return all Pb-acid (lead-acid) batteries to IBM for proper recycling, according to the instructions received with the replacement batteries.

Electronic Emission Notices

Federal Communications Commission (FCC) Statement


Note: This equipment has been tested and found to comply with the limits for a Class A digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment. This equipment generates, uses, and can radiate radio frequency energy and, if not installed and used in accordance with the instruction manual, may cause harmful interference to radio communications. Operation of this equipment in a residential area is likely to cause harmful interference, in which case the user will be required to correct the interference at his own expense.

Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not responsible for any radio or television interference caused by using other than recommended cables and connectors or by unauthorized changes or modifications to this equipment. Unauthorized changes or modifications could void the user's authority to operate the equipment.

This device complies with Part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) this device may not cause harmful interference, and (2) this device must accept any interference received, including interference that may cause undesired operation.

Industry Canada Compliance Statement


This Class A digital apparatus complies with Canadian ICES-003.

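Avis de conformité à la réglementation d'Industrie Canada (Notice of Compliance with Industry Canada Regulations)

Cet appareil numérique de la classe A est conforme à la norme NMB-003 du Canada. (This Class A digital apparatus complies with Canadian standard NMB-003.)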

European Community Compliance Statement


This product is in conformity with the protection requirements of EC Council Directive 89/336/EEC on the approximation of the laws of the Member States relating to electromagnetic compatibility. IBM cannot accept responsibility for any failure to satisfy the protection requirements resulting from a non-recommended modification of the product, including the fitting of non-IBM option cards.


Conformity with the Council Directive 73/23/EEC on the approximation of the laws of the Member States relating to electrical equipment designed for use within certain voltage limits is based on compliance with the following harmonized standard: EN60950.

Germany Only
Certificate of approval according to the German Act on the Electromagnetic Compatibility of Equipment (EMVG) of 30 August 1995. This device is authorized to bear the EC conformity mark (CE) in accordance with the German EMVG. The issuer of the declaration of conformity is IBM Deutschland.

Information with regard to EMVG Paragraph 3, Section (2) 2: The device meets the protection requirements of EN 50082-1 and EN 55022 Class A.

EN 55022 Class A devices require the following notices:

According to the EMVG: Devices may be operated at locations for which they are not sufficiently interference-suppressed only with special approval from the Federal Ministry of Posts and Telecommunications or the Federal Office for Posts and Telecommunications. Approval is granted if no electromagnetic interference is to be expected. (Extract from the EMVG, Paragraph 3, Section 4.) Under Paragraph 9 of the EMVG, in conjunction with the applicable fee regulation (Official Gazette 14/93), this approval procedure is subject to a fee.

According to EN 55022: This is a Class A device. This device can cause radio interference in residential areas; in this case, the operator can be required to take appropriate measures and to bear the cost of them.

Note: To ensure compliance with the EMVG, install and operate the devices as described in the manuals.

Japanese Voluntary Control Council for Interference (VCCI) Class A Statement

Korean Ministry of Information and Communication (MIC) Statement


Please note that this device has been certified for business use with regard to electromagnetic interference. If you find this is not suitable for your use, you may exchange it for one of residential use.


Taiwan Class A Compliance Statement

Chinese Class A Electronic Emission Statement

Figure 1. Chinese EMI Statement (s009679)

Trademarks
The following terms are trademarks of the IBM Corporation in the United States or other countries or both: AIX, AS/400, DB2, DFSMS/MVS, DFSMS/VM, e (logo), Enterprise Storage Server, Enterprise Systems Architecture/390, ESCON, ES/9000, FICON, FlashCopy, IBM, MVS, MVS/ESA, Netfinity, NetVista, NUMA-Q, Operating System/400, OS/390, OS/400, RETAIN, RS/6000, S/390, Seascape, SP, System/360, System/370, System/390, TotalStorage, Versatile Storage Server, VM/ESA, VSE/ESA, xSeries, z/Architecture, z/OS, zSeries, z/VM.

Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.


Using This Service Guide


This guide is for service representatives who are trained to install and repair the IBM 2105 TotalStorage ESS. Internal components of this machine are designed and certified to be serviced by trained personnel only.

Where to Start
Start all service actions at Chapter 2: Entry for All Service Actions on page 29.

Note: 2105 Model 750 information
• The 2105 Model 750 is fully supported by the service information in this manual when you follow the guided procedures. However, the service information refers only to the 2105 Model 800.
• The 2105 Model 750 supports fewer configuration options than the 2105 Model 800. For further information, see the IBM TotalStorage ESS Introduction and Planning Guide (form number SC26-7246).

Attention: When performing any service action on the IBM 2105 TotalStorage ESS, follow the directions given in Chapter 2: Entry for All Service Actions on page 29 or from the service terminal. This ensures that you use the correct remove, replace, or repair procedure, including the correct power on/off procedure, for this machine. Failure to follow these instructions can damage the machine and might also cause an unexpected loss of access to customer data.

Limited Vocabulary
This manual uses a specific range of words so that the text can be understood by IBM service representatives in countries where English is not the primary language.


Publications
This section describes the TotalStorage ESS library and publications for related products. It also gives ordering information.

TotalStorage ESS Product Library


The 2105 Models 750 and 800 are IBM Enterprise architecture-based products. See the following publications for more information on the TotalStorage ESS:
• IBM TotalStorage ESS: Service Guide, 2105 Model 750/800 and Expansion Enclosure, Volume 1 manual, SY27-7635
• IBM TotalStorage ESS: Service Guide, 2105 Model 750/800 and Expansion Enclosure, Volume 2 manual, SY27-7636
• IBM TotalStorage ESS: Service Guide, 2105 Model 750/800 and Expansion Enclosure, Volume 3 manual, SY27-7637
• IBM TotalStorage ESS: Parts Catalog, 2105 Model 750/800 and Expansion Enclosure manual, S127-0978
• IBM TotalStorage ESS: Safety Notices manual, GC26-7229
  This manual provides translations of the Danger and Caution notices used in the TotalStorage ESS publications.
• Maintenance Information for S/390 Fiber Optic Links (ESCON, FICON, Coupling Links, and Open System Adapters) manual, form number SY27-2597
• Enterprise Systems Link Fault Isolation manual, form number SY22-9533 (available online in CORE)
• IBM TotalStorage ESS: Configuration Planner for Open Systems Hosts manual, SC26-7477 (see Note)
  This guide provides guidelines and work sheets for planning the logical configuration of an ESS that attaches to open-system hosts.
• IBM TotalStorage ESS: Configuration Planner for S/390 and IBM eServer zSeries Hosts manual, SC26-7476 (see Note)
  This guide provides guidelines and work sheets for planning the logical configuration of an ESS that attaches to IBM S/390 or IBM eServer zSeries host systems.
• IBM TotalStorage ESS: Host Systems Attachment Guide manual, SC26-7446
  This guide provides guidelines for attaching the ESS to your host system and for migrating to fibre-channel attachment from either a small computer system interface (SCSI) or the IBM SAN Data Gateway.
• IBM TotalStorage ESS: DFSMS Software Support Reference manual, SC26-7440
  This publication provides an overview of the ESS capabilities. It also describes Data Facility Storage Management Subsystem (DFSMS) software support for the ESS, including support for large volumes.
• IBM TotalStorage ESS: Introduction and Planning Guide manual, GC26-7444
  This guide introduces the ESS product and lists the features you can order. It also provides guidelines for planning the installation and configuration of the ESS.
• IBM TotalStorage ESS: Quick Configuration Guide manual, SC26-7354
  This manual provides flow charts for using the TotalStorage Enterprise Storage Server Specialist (ESS Specialist). The flow charts provide a high-level view of the tasks that the IBM service support representative performs during initial logical configuration. You can also use the flow charts for tasks that you might perform when you are modifying the logical configuration.
• IBM TotalStorage ESS: S/390 Command Reference manual, SC26-7298
  This publication describes the functions of the ESS and provides reference information, such as channel commands, sense bytes, and error recovery procedures for IBM S/390 and zSeries hosts.
• IBM TotalStorage ESS: SCSI Command Reference manual, SC26-7297
  This publication describes the functions of the ESS. It provides reference information, such as channel commands, sense bytes, and error recovery procedures for UNIX, IBM Application System/400 (AS/400), and IBM eServer iSeries 400 hosts.
• IBM TotalStorage ESS: Subsystem Device Driver manual, SC26-7478
  This publication describes how to use the IBM TotalStorage ESS Subsystem Device Driver (SDD) on open-systems hosts to enhance performance and availability on the ESS. SDD creates redundant paths for shared logical unit numbers. SDD permits applications to run without interruption when path errors occur. It balances the workload across paths, and it transparently integrates with applications. For information about the SDD, go to the following Web site: www.ibm.com/storage/support/techsup/swtechsup.nsf/support/sddupdates/
• IBM TotalStorage ESS: User's Guide manual, SC26-7445
  This guide provides instructions for setting up and operating the ESS and for analyzing problems.
• IBM TotalStorage ESS: Web Interface User's Guide manual, SC26-7448
  This guide provides instructions for using the two ESS Web interfaces, ESS Specialist and ESS Copy Services.

Note: No hardcopy manual is produced for this publication. However, a PDF file is available from the following Web site: www.storage.ibm.com/hardsoft/products/ess/refinfo.htm

Ordering Publications
All of the above publications are available on a CD-ROM that comes with the TotalStorage ESS. You can also order hard copies of some of the publications. For additional CD-ROMs, order:
- ESS Service Documents CD-ROM, SK2T-8822
- ESS Customer Documents CD-ROM, SK2T-8803

Web Sites
- IBM Storage home page: http://www.storage.ibm.com/
- IBM Enterprise Storage Server home page: http://www.ibm.com/storage/ess
- http://www.storage.ibm.com/hardsoft/product/refinfo.htm

Other Related Publications


The following is a list of other related manuals:
- IBM Input/Output Equipment, Installation Manual: Physical Planning, GC22-7064
- Electrical Safety for IBM Customer Engineers, S229-8124


Chapter 1: Reference Information


2105 Models 750 and 800 Overview . . . 1
2105 Model 750 Specifications . . . 2
2105 Model 800 Specifications . . . 3
Redundant Array of Independent Disks (RAID) . . . 4
Arrays Across Loops . . . 5
Host Systems that the IBM ESS Supports . . . 5
ESS Interfaces . . . 7
ESS Specialist . . . 9
Control Unit Initiated Reconfiguration (CUIR) . . . 11
Service Interface . . . 13
Fibre Channel Connection . . . 13
Fibre Channel Host Card Indicators . . . 14
Cluster Indicators . . . 15
Using the ESS operator panel . . . 17
Switching the ESS power on and off (Local, Automatic or Remote) . . . 19
Switching the ESS power on (Local Power Control Mode) . . . 19
Switching the ESS power on (Automatic Power Control Mode) . . . 19
Switching the ESS power on (Remote Power Control Mode) . . . 20
RPC Local and Remote or Local and Automatic switch settings . . . 20
Switching the ESS power off (Local Power Control Mode) . . . 20
Switching the ESS power off (Automatic Power Control Mode) . . . 21
Switching the ESS power off (Remote Power Control Mode) . . . 21
2105 Models 750 and 800 Disk Storage . . . 21
DDM Bay Indicators . . . 21
DDM Bay Disk Drive Module Indicators . . . 23
Internal Connections (DDM Bay) . . . 24
DDM Bay Internal Connections . . . 24
External SSA Connections (DDM Bay) . . . 24
Special Tools . . . 28

2105 Models 750 and 800 Overview


This section gives an overview of the 2105 Models 750 and 800 and their interfaces and components. It also describes the Expansion Enclosure, which is supported by the Model 800 but not by the Model 750. These products are also known as the IBM TotalStorage Enterprise Storage Server (ESS). The 2105 Models 750 and 800 are members of the Seascape family of storage servers. The ESS concurrently supports different host systems over different attachment protocols. Customers can allocate data storage among the attached host systems with the ESS Specialist, a Web-based interface. The ESS provides integrated caching and support for the attached disk drive modules (DDMs). The DDMs are attached through a serial storage architecture (SSA) interface. The ESS provides the following major features:
- RAID-5 and RAID-10 arrays
- Fast disk drives
- Fast reduced instruction set computer (RISC) processors
- Fault-tolerant system
- Disk capacity that can be assigned and reassigned among attached host systems
- Instant copy solutions with FlashCopy
- Disaster recovery solutions with Peer-to-Peer Remote Copy (PPRC)

2105 Model 750 Specifications


The following specifications are not all-inclusive. They cover only the items that can be compared with the information that this manual provides for the 2105 Model 800.
Disk Drive Modules (DDMs)
The 2105 Model 750 subsystem supports a maximum of 64 DDMs.
Note: The minimum configuration for all ESS models is 16 DDMs.
Redundant Array of Independent Disks (RAID)
The 2105 Model 750 supports RAID-5 and RAID-10 arrays. (The Model 750 does not support non-RAID disk groups.) For additional information about RAID, see Redundant Array of Independent Disks (RAID) on page 4.
Arrays Across Loops
Not supported.
Host Systems that the IBM ESS Supports
The ESS Model 750 supports a maximum of 6 host adapters. The 2105 Model 750 can be configured for any intermix of the following host adapter types and protocols:
- SCSI-FCP attached host systems
- Fibre-channel adapters, for support of fibre-channel protocol (FCP) and fibre connection (FICON) protocol
- Enterprise Systems Connection Architecture (ESCON) adapters
SCSI-FCP attached host systems
The ESS attaches to open-systems hosts with one-port fibre-channel adapters. The fibre-channel adapters can be configured to operate with the SCSI-to-FCP (SCSI-FCP) protocol. Longwave and shortwave adapters are available on the 2105 Model 750.
Fibre-channel adapters, for support of fibre-channel protocol (FCP) and fibre connection (FICON) protocol
The 2105 Model 750 can attach to S/390 and zSeries host systems with fibre-channel adapters that are configured to operate with the FICON upper layer protocol. A maximum of eight fibre-channel ports (one per adapter) can be installed. The 2105 Model 750 ships with 2 Gbps adapters. When operating at 2 Gbps, channel-link speed can be up to 200 MB per second in full-duplex mode. However, the effective sustained throughput of the adapters is less than these theoretical maximums.
Enterprise Systems Connection Architecture (ESCON) adapters
The ESS attaches to S/390 and zSeries host systems with two-port ESCON adapters or FICON bridge channels. The FICON bridge card in the ESCON Director 9032 Model 5 enables a FICON bridge channel to connect to ESCON host adapters in the ESS. The FICON bridge architecture supports up to 16 384 devices per channel.

For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
Physical differences
The storage cage in the upper right quadrant of the 2105 Model 750 has been removed. Therefore, the four plenum fans above the cage have also been removed, and the power cables to the fans are terminated at the fan end with the same jumpers that are used on the power planar board. The left quadrant of the 2105 Model 750 is the same as on the 2105 Model 800. The dimensions of the 2105 Model 750, both with and without packaging, are identical to those of the 2105 Model 800. However, the weight differs: the maximum weight is 1059 kg (2330 lbs) without packaging and 1173 kg (2580 lbs) with packaging.

2105 Model 800 Specifications


Figure 2 shows the 2105 Model 800. Figure 3 on page 4 shows the 2105 Expansion Enclosure. Both of these ESS racks have dual mainline power cables and redundant power. The redundant power system enables the ESS to continue normal operation when one of the mainline power cables is inactive. Redundancy also ensures continuous data availability.

Figure 2. 2105 Model 800 Front and Rear Views (s009119)


Figure 3. 2105 Expansion Enclosure Front and Rear Views (S007726m)

The 2105 Model 800 subsystem supports a maximum of 384 DDMs:
- 128 DDMs in a 2105 Model 800
- 256 DDMs in a 2105 Expansion Enclosure, which must be attached to a 2105 Model 800
- 384 DDMs in a 2105 Model 800 with an attached 2105 Expansion Enclosure
Note: The minimum configuration for all ESS models is 16 DDMs.
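The DDM limits for both models can be summarized as a simple range check. This is a minimal sketch, assuming only the figures stated in this chapter (16 DDMs minimum for all models, 64 maximum for the Model 750, 128 in a Model 800, and 256 more in an attached Expansion Enclosure). The function names are hypothetical and are not part of any ESS tool, and the check ignores how DDMs are physically installed.

```python
MIN_DDMS = 16  # minimum configuration for all ESS models


def max_ddms(model: str, with_expansion: bool = False) -> int:
    """Maximum DDM count for a hypothetical 2105 configuration."""
    if model == "750":
        if with_expansion:
            # The Expansion Enclosure is supported by the Model 800 only.
            raise ValueError("Model 750 does not support the Expansion Enclosure")
        return 64
    if model == "800":
        base = 128                                  # DDMs in the Model 800 itself
        expansion = 256 if with_expansion else 0    # DDMs in the Expansion Enclosure
        return base + expansion                     # 384 with an enclosure attached
    raise ValueError(f"unknown model: {model!r}")


def ddm_count_is_valid(model: str, count: int, with_expansion: bool = False) -> bool:
    """True if count lies between the documented minimum and maximum."""
    return MIN_DDMS <= count <= max_ddms(model, with_expansion)
```

For example, 200 DDMs exceed what a Model 800 holds on its own, but fit once an Expansion Enclosure is attached.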

Redundant Array of Independent Disks (RAID)


With its RAID implementation, the ESS offers fault-tolerant data storage. The ESS implements RAID on its device adapters. An ESS disk group consists of eight disk drives. The ESS supports groups of DDMs in a RAID-5 or RAID-10 array implementation. For a RAID-array implementation, a device adapter loop must have two spares. The Model 800 supports RAID-5 and RAID-10 arrays. (The Model 800 does not support non-RAID disk groups.) RAID 5 optimizes capacity; RAID 10 optimizes performance.
RAID-5 Implementation: RAID 5 is a method of spreading volume data across multiple disk drives. RAID 5 stripes data across a user-defined set of DDMs. Data protection is provided by parity information, which is stored on the same DDMs as the data. RAID-5 data striping increases performance by supporting concurrent access to the multiple DDMs within each logical volume.
RAID-10 Implementation: RAID 10 provides high availability by combining features of RAID 0 and RAID 1. RAID 0 optimizes performance by striping volume data across multiple disk drives. RAID 1 is disk mirroring: duplicating data between two disk drives. By combining the two, RAID 10 adds the fault tolerance of mirroring to the performance of striping. Data is striped across half of the disk drives in the RAID-10 array, and the other half of the array mirrors the first set of disk drives. Access to data is preserved as long as one disk in each mirrored pair remains available.
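The parity protection described for RAID 5 can be illustrated with byte-wise XOR, which is the usual way parity is computed in RAID-5 schemes. This is a generic sketch, not the ESS device adapter implementation: it assumes equal-length strips and a single parity strip per stripe, and it ignores how parity is distributed across the DDMs.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same, non-zero count and length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_strips: list[bytes]) -> bytes:
    """Parity strip for one stripe of data strips."""
    return xor_blocks(data_strips)


def rebuild_strip(surviving_strips: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data strip from the survivors and the parity."""
    return xor_blocks(surviving_strips + [parity])


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(stripe)
# Simulate losing the second strip (a failed DDM) and rebuilding it.
recovered = rebuild_strip([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, any one lost strip can be recomputed from the others; losing two strips in the same stripe cannot be recovered this way.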

Because the ESS requires that a loop have two spare disk drives, the first RAID-10 disk group on a loop must consist of six DDMs and two spares. The data on three DDMs is mirrored to the other three DDMs. This configuration satisfies the ESS requirement for two spares per loop. Later disk groups on the same loop can have eight DDMs, with the data on four DDMs mirrored to the other four DDMs. Because half of the DDMs in a group hold data and the other half hold mirrored data, RAID-10 arrays have less capacity than RAID-5 arrays.
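The RAID-10 group layout above can be expressed as a short calculation. This is a hypothetical sketch that assumes eight-drive disk groups, two spares taken from the first group on a loop, and full eight-DDM groups afterwards; `raid10_group` and its `ddm_capacity_gb` parameter are illustrative names, not ESS terminology.

```python
GROUP_SIZE = 8        # an ESS disk group consists of eight disk drives
SPARES_PER_LOOP = 2   # each device adapter loop must have two spares


def raid10_group(first_on_loop: bool, ddm_capacity_gb: float) -> dict:
    """Layout of one RAID-10 disk group as described in this section."""
    spares = SPARES_PER_LOOP if first_on_loop else 0
    members = GROUP_SIZE - spares     # 6 for the first group, 8 afterwards
    data = members // 2               # half of the members hold data...
    mirror = members - data           # ...and the other half mirror it
    return {
        "spares": spares,
        "data_ddms": data,
        "mirror_ddms": mirror,
        # Only the data half contributes usable capacity.
        "usable_gb": data * ddm_capacity_gb,
    }
```

With a hypothetical 72 GB DDM, the first group on a loop yields three data DDMs (216 GB usable) and each later group yields four (288 GB usable), which shows why RAID-10 capacity is half the raw capacity of its members.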

Arrays Across Loops


The ESS supports the arrays across loops (AAL) feature on open-systems, S/390, and zSeries hosts. This feature is available on the 2105 Model 800 only. Disk arrays are spread across the loops of an SSA device adapter to optimize single-array performance. For RAID 10, the arrays across loops feature provides mirroring across two loops, which prevents loss of the array if a loop fails. The pair of loops must have the same characteristics (for example, number of DDMs, DDM capacities, and rpm speed). Adding capacity to a loop pair takes more time than adding capacity to a single loop, because both loops in the pair must be upgraded together. Repair times are the same for an AAL loop and a non-AAL loop.

Host Systems that the IBM ESS Supports


The ESS Model 800 supports a maximum of 16 host adapters. The 2105 Model 800 can be configured for any intermix of the following host adapter types and protocols:
- Small computer system interface (SCSI) adapters
- Fibre-channel adapters, for support of fibre-channel protocol (FCP) and fibre connection (FICON) protocol
- Enterprise Systems Connection Architecture (ESCON) adapters
The following sections provide an overview of each of these host-system attachments:
- SCSI-attached open-systems hosts
- SCSI-FCP attached host systems
- ESCON-attached IBM S/390 host systems and zSeries host systems
- FICON-attached IBM S/390 host systems and zSeries host systems
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
SCSI-Attached Open Systems Hosts: The 2105 Model 800 attaches to open-systems hosts with two-port SCSI adapters. The SCSI adapters are 2-byte wide, differential, fast-20. With SCSI adapters, the ESS supports:
- A maximum of 32 SCSI ports (two ports per adapter)
- A maximum of 15 targets per SCSI adapter
- A maximum of 64 logical units per target, depending on the host system type
- A maximum of 512 SCSI-FCP host login IDs or SCSI-3 initiators per ESS
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
SCSI-FCP Attached Host Systems: The ESS attaches to open-systems hosts with one-port fibre-channel adapters. The fibre-channel adapters can be configured to operate with the SCSI-to-FCP (SCSI-FCP) protocol. Longwave and shortwave adapters are available on the 2105 Model 800. With fibre-channel adapters configured for SCSI-FCP protocol, the ESS supports:
- A maximum of 16 fibre-channel ports (one port per adapter)
- A maximum of 128 host login IDs per fibre-channel port
- A maximum of 512 SCSI-FCP host login IDs or SCSI-3 initiators per ESS
- Logical unit number (LUN) and port masking by target
- Fibre-channel arbitrated loop (FC-AL), fabric, or point-to-point topologies
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
ESCON Attached Host Systems: The ESS attaches to S/390 and zSeries host systems with two-port ESCON adapters or FICON bridge channels. The FICON bridge card in the ESCON Director 9032 Model 5 enables a FICON bridge channel to connect to ESCON host adapters in the ESS. The FICON bridge architecture supports up to 16 384 devices per channel. With ESCON adapters, the ESS supports:
- A maximum of 32 ESCON ports (two ports per adapter) per ESS
- A maximum of 64 logical paths per port
- A maximum of 2048 logical paths per ESS
- A maximum of 16 control-unit images per ESS
- A maximum of 256 logical paths per control-unit image
- Access to all 16 control-unit images and 2048 CKD devices over a single ESCON port on the ESS
Note: Certain LIC levels might limit the number of devices per ESCON channel to 1024.
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
FICON Attached Host Systems: The 2105 Model 800 can attach to S/390 and zSeries host systems with fibre-channel adapters that are configured to operate with the FICON upper layer protocol. A maximum of sixteen fibre-channel ports (one per adapter) can be installed.
ESS FICON adapters for the 2105 Model 800 support 1 Gbps or 2 Gbps operation. When operating at 2 Gbps, channel-link speed can be up to 200 MB per second in full-duplex mode. However, the effective sustained throughput of the adapters is less than these theoretical maximums. With fibre-channel adapters configured for FICON, the ESS supports:
- Fabric or point-to-point topologies
- A maximum of 127 channel login IDs per fibre-channel port
- A maximum of 16 FICON ports
- A maximum of 256 logical paths per FICON port
- A maximum of 4096 logical paths per ESS (256 logical paths x 16 ports = 4096)

Note: Certain FICON host channels might limit the number of logical paths to 2048.
- A maximum of 16 control-unit images per ESS
- A maximum of 256 logical paths to each control-unit image
- Access to all 16 control-unit images (4096 CKD devices) over each FICON port
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
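The per-ESS ESCON and FICON totals follow directly from the per-port and per-image maximums listed for the Model 800. A minimal arithmetic check, assuming only those listed maximums (the constant and function names are illustrative); it does not model the 2048-path limit that some FICON host channels impose.

```python
# Maximums quoted in this section for the 2105 Model 800.
ESCON_PORTS = 32
ESCON_PATHS_PER_PORT = 64
FICON_PORTS = 16
FICON_PATHS_PER_PORT = 256
CU_IMAGES = 16
PATHS_PER_CU_IMAGE = 256
FICON_CKD_DEVICES = 4096


def logical_paths_per_ess(ports: int, paths_per_port: int) -> int:
    """Upper bound on logical paths if every port carries its maximum."""
    return ports * paths_per_port


escon_paths = logical_paths_per_ess(ESCON_PORTS, ESCON_PATHS_PER_PORT)   # 2048
ficon_paths = logical_paths_per_ess(FICON_PORTS, FICON_PATHS_PER_PORT)   # 4096
ficon_paths_by_cu = CU_IMAGES * PATHS_PER_CU_IMAGE                       # 4096
devices_per_cu_image = FICON_CKD_DEVICES // CU_IMAGES                    # 256
```

The two FICON totals agree, so either the port count or the control-unit image count can be used to cross-check the 4096-path figure.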

ESS Interfaces
This section describes the following interfaces:
- ESS connection security
- IBM TotalStorage Enterprise Storage Server Network (ESSNet)
- IBM TotalStorage Enterprise Storage Server Specialist (ESS Specialist)
- A command-line interface (CLI)
- IBM TotalStorage Enterprise Storage Server Copy Services (ESS Copy Services)
- IBM TotalStorage Expert, an optional software product
- The ESS service interface
See the IBM TotalStorage ESS: Web Interface User's Guide manual for detailed descriptions of the Web interfaces and instructions on how to use them.
ESS Connection Security: The customer connects to the ESS administrative functions through the IBM TotalStorage Enterprise Storage Server Master Console (ESS Master Console). Access to the server functions associated with ESS Specialist and ESS Copy Services requires user IDs and passwords. The customer controls user access by assigning levels of access, such as configure or view. The levels of access limit users to the set of functions that they are authorized to perform.
IBM TotalStorage Enterprise Storage Server Network: The IBM TotalStorage Enterprise Storage Server Network (ESSNet) is a network that is established between a set of ESSs and various support functions. The customer needs an ESSNet facility for each set of ESSs in a locality. A local ESSNet is the network between the ESSNet facility and the ESSs. The local ESSNet supports installation and configuration functions on the associated ESSs through the ESS Specialist. IBM installs the ESSNet facility when it installs the ESS. The facility consists of the dedicated ESS Master Console and the networking components.
Note: Feature code (FC) 2717, the ESS Master Console, replaces the remote support facility, FC 2715. FC 2715 included the ESSNet console.
The ESS Master Console includes an application that provides links to the ESS user interface.
When one of these links is selected, it starts the Web interface to ESS Specialist and ESS Copy Services. The following service functions for local and remote service areas depend on facilities that the local ESSNet networking components provide:
- Simple Network Management Protocol (SNMP) traps
- Electronic mail (e-mail)
- Pagers
- Call home
The customer can extend the local ESSNet into their Ethernet network and between local ESSNets to create an expanded Copy-Services server domain. The local ESSNet can also enable other personal computers (PCs) in the network to interact with the ESSs through any of the following:
- ESS Specialist
- ESS Copy Services
- ESS Copy Services command-line interface (CLI)
- SNMP protocols

Connection to the ESSNet facility is through the ESS Master Console, or through an external Ethernet switch or hub that provides cable connections from the ESSNet to the ESS. The ESS Master Console also requires a telephone connection for the call home, remote service, and pager functions.
Note: The customer can attach the Ethernet LAN to the external hub. The hub speed is 10 or 100 megabits per second (Mbps), depending on the LAN. The customer provides any hardware that is needed for this connection.
ESS Master Console: IBM has replaced the ESSNet console with the ESS Master Console. The ESS Master Console uses a modem and a 16-port serial adapter to enable communication between the ESS and IBM. This communication offers the following enhancements to remote support over the ESSNet console:
- Monitoring of hardware and microcode operations
- A log viewer that displays console message files, formatted error files, log files, and trace files on demand. This function is available to IBM service representatives and other support representatives.
- Activation of microcode engineering changes (ECs) from the ESS Master Console
- Reduced or eliminated long-distance telephone costs for call-home service (the ESS Master Console uses the IBM Global Network to communicate with the Field Support Center)
- Improved data transmission rates and improved reliability for state saves and traces
- Simultaneous code load to multiple ESSs
- The ability for IBM or the service provider to copy a LIC package from a CD-ROM at the ESS Master Console to any or all of the attached ESSs. IBM or the service provider can then use the service panels on the ESS Master Console to perform a LIC activate.
Because of these added benefits, customers should convert their ESSNet console to an ESS Master Console. Customers can request this free service from their IBM marketing office. Figure 4 on page 9 shows the ESS Master Console connections and the remote support functions.


Figure 4. Master Console Connections (s009220)

Methods of accessing the ESS Specialist and ESS Copy Services Web interfaces: The customer can access the ESS Specialist and ESS Copy Services Web interfaces from the ESS Master Console, which includes browser software for this access. For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.

ESS Specialist
The 2105 includes the ESS Specialist, a Web-based interface that allows the customer to configure the 2105. From the Web interface, the customer can perform the following tasks:
- Monitor problems
- View and change the configuration, which includes the following subtasks:
  - Add or delete SCSI-attached host systems and fibre-channel-attached host systems
  - Configure SCSI host ports and fibre-channel host ports on the ESS
  - Define control-unit images for S/390 host systems and zSeries host systems
  - Define fixed-block (FB) and count-key-data (CKD) disk groups
  - Add FB and CKD logical devices (volumes)
  - Assign logical devices to be accessible to more than one host system
  - Change logical-device assignments
- Change and view communication resource settings, such as electronic mail (e-mail) addresses and telephone numbers
- Authorize user access
With ESS Specialist, the customer can view the following information:
- The external connection between a host system and an ESS port
- The allocation of storage space to fixed-block (FB) and count-key-data (CKD) volumes
IBM updates the ESS Specialist through licensed internal code (LIC) updates. For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
2105 Copy Services: ESS Copy Services operates over the ESSNet and involves a set of ESS storage servers that are associated in a Copy-Services server domain. Each Copy-Services server domain contains a primary and a backup ESS Copy Services server. Each ESS Copy Services server runs on one of the ESS clusters within the Copy-Services server domain. ESS Copy Services provides the following types of data-copy functions:
- Peer-to-Peer Remote Copy (PPRC): PPRC automatically copies changes that the customer makes to a source volume to the target volume until the customer suspends or terminates the PPRC relationship.
- FlashCopy: FlashCopy makes a single point-in-time copy.
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
IBM TotalStorage Expert: The TotalStorage Enterprise Storage Server Expert (ESS Expert) is an optional software product that the customer can purchase to use with the ESS. ESS Expert gathers performance, asset, and data capacity information from each ESS that it finds on a network. It stores this information in a database and generates reports that are based on it. ESS Expert displays these reports to administrators who sign on to ESS Expert with a Web browser. The customer must provide a LAN connection between ESS Expert and the ESS so that ESS Expert can gather the information from the ESS. ESS Expert provides the following functions:
- Asset management: ESS Expert collects and displays asset management data.
- Capacity management: ESS Expert collects and displays capacity management data.
- Performance management: ESS Expert collects and displays performance management data, for example:
  - Number of I/O requests
  - Number of bytes transferred
  - Read and write response times
  - Cache use statistics
- Volume data management: ESS Expert collects and displays volume data.
ESS Expert enables the customer to schedule the information collection. With this information, customers can make informed decisions about capacity planning and volume placement, and can isolate I/O performance bottlenecks.

For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
ESS Service Interface: The ESS provides service interface ports for external connection of a service terminal. IBM or the service provider can perform service on the ESS by using an IBM mobile service terminal (MoST) or an equivalent service interface. The ESS service interface also provides remote service support with call-home capability for directed maintenance by service personnel. The customer must provide an analog telephone line to enable this support. The ESS provides the following service functions:
- Continuous self-monitoring that initiates a call (call home) to service personnel if a failure occurs. Because the service personnel who respond to the call know about the failing component, they can reduce the repair time.
- Remote access to error and problem logs. Service personnel use the logs to analyze potential failures.
- Remote support that can correct many types of problems on the ESS. When the ESS reports a problem, service personnel can often correct the problem from a remote location.
- The call-home facility, which enables the use of step-ahead storage.
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.

Control Unit Initiated Reconfiguration (CUIR)


In large configurations, quiescing channel paths in preparation for upgrades or service actions is a complex, time-consuming, and potentially error-prone process. Control Unit Initiated Reconfiguration (CUIR) automates the process of quiescing and re-enabling channel paths. This reduces the time required for service actions, reduces the effort required of operations staff, and reduces the possibility of human error. The CUIR function is used with the Quiesce/Resume operation. It automates the manual reconfiguration actions necessary to put paths in the correct state for maintenance, and it makes the paths available for use again after a maintenance action. CUIR allows the IBM Service Support Representative (IBM SSR) to request that all attached system images set all paths associated with a particular service action offline. System images with the appropriate level of software support respond to such requests by varying off the affected paths and notifying the ESS subsystem that the paths are offline, or by notifying the ESS subsystem that they cannot take the paths offline. CUIR reduces manual operator intervention and the possibility of human error during maintenance actions, while also reducing the time required for the maintenance. This is particularly useful in environments where many systems are attached to an ESS. CUIR can be enabled or disabled selectively on each ESS subsystem. The IBM SSR enables or disables CUIR during the initial ESS install sequence, based on information that the customer provides on the Communication Resources worksheet.

The customer must enter the configuration information (Allow CUIR to Automatically Vary Paths OFF/ON) on the Communication Resources worksheets. These worksheets are in the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual, GC26-7444. The worksheet options are:
- Enable: allows the ESS to initiate the reconfiguration for service. The ESS requests the attached hosts to automatically vary paths offline for service, and back online after the service is complete.
- Disable (default): sets the paths to the ESS cluster offline for service. The system operator must manually vary the affected paths offline, and back online.
The service representative sets the configuration option during the initial install sequence, and can later change it by using the Change/Show Control Switches procedure in Chapter 6 of Volume 2.
The service representative requests a quiesce of the channel paths from the ESS subsystem. The ESS sends a reconfiguration request to the operating system. The operating system determines the necessary reconfiguration actions and performs them:
- For a CUIR quiesce request: The operating system determines the paths affected by the request. For each affected path, the operating system checks whether it is online.
  - If the channel path is already offline for some of the devices that use it, the operating system marks it as in use by CUIR. The operator then cannot vary the channel path online while it is being serviced.
  - If the channel path is already offline for all of the devices that use it, the operating system does not mark it as in use by CUIR.
  - If the operating system detects that the path is online, it issues the command to vary the path offline. This marks the path as in use by CUIR.
  CUIR does not take the last path from any processor image to an online device.
  If the request is unsuccessful, the quiesce process gives the service representative instructions to take to the system operator, and the system operator issues the host commands to vary the channel paths offline. If the request is successful, the service representative can perform the service action.
- After the service action, the service representative resumes the I/O components that were quiesced. The ESS sends the resume request to the operating system. The operating system then performs the appropriate reconfiguration action:
  Resume channel path request: The operating system determines which channel paths are affected and varies them online automatically. All paths that the operator varied off remain offline but are no longer marked as in use by CUIR.
- The operating system sends the results of the resume request to the ESS.
The automatic control that CUIR provides simplifies the actions that operations personnel must take to manage their service requirements.
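The quiesce and resume rules above can be modeled as a small state machine. This is a hypothetical sketch for a single processor image, not the operating system or ESS implementation: the `ChannelPath` class, its fields, and both functions are invented for illustration. It assumes a device counts as online while at least one path is online for it, and it treats a quiesce as all-or-nothing, so a refused request changes nothing (the case in which the service representative takes instructions to the system operator).

```python
from dataclasses import dataclass, field


@dataclass
class ChannelPath:
    """Hypothetical view of one channel path from a single processor image."""
    name: str
    devices: set[str]                                      # devices reachable on this path
    online_for: set[str] = field(default_factory=set)      # devices the path is online for
    in_use_by_cuir: bool = False                           # mark set by a CUIR quiesce
    cuir_offlined: set[str] = field(default_factory=set)   # devices CUIR varied offline


def cuir_quiesce(affected: list[ChannelPath], all_paths: list[ChannelPath]) -> bool:
    """Quiesce the affected paths; return False (changing nothing) if refused."""
    affected_names = {p.name for p in affected}
    # CUIR never takes the last path to an online device.
    for p in affected:
        for dev in p.online_for:
            others = [q for q in all_paths
                      if q.name not in affected_names and dev in q.online_for]
            if not others:
                return False
    for p in affected:
        if p.online_for:
            # Online for some or all devices: vary it offline and mark it.
            p.cuir_offlined = set(p.online_for)
            p.online_for.clear()
            p.in_use_by_cuir = True
        # A path already offline for all of its devices is left unmarked.
    return True


def cuir_resume(affected: list[ChannelPath]) -> None:
    """Vary back online only what CUIR took offline, then clear the mark."""
    for p in affected:
        if p.in_use_by_cuir:
            p.online_for |= p.cuir_offlined
        # Devices that the operator had varied off stay offline.
        p.cuir_offlined.clear()
        p.in_use_by_cuir = False


# Example: two paths to the same two devices.
chp_a = ChannelPath("A", {"D1", "D2"}, online_for={"D1", "D2"})
chp_b = ChannelPath("B", {"D1", "D2"}, online_for={"D1", "D2"})
paths = [chp_a, chp_b]
cuir_quiesce([chp_a], paths)   # succeeds: D1 and D2 remain reachable through B
cuir_quiesce([chp_b], paths)   # refused: B is now the last path to D1 and D2
cuir_resume([chp_a])
```

Keeping a separate record of what CUIR varied off is one way to honor the rule that paths the operator varied off stay offline after the resume.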


Service Interface
The 2105 Model 800 provides service interface ports for external connection of a service terminal. IBM or the customer's service provider can perform service on the 2105 using an IBM mobile service terminal (MoST) or equivalent.
Remote Services Support: The 2105 service interface also provides remote service support with call-home capability and directed maintenance for service support representatives. The customer provides an analog telephone line to enable this support. The service interface provides an RS-232 connection through a modem switch and modem to the analog telephone line. The customer must order a modem and modem switch; the first 2105 Model 800 ordered requires this equipment. The modem and modem switch support up to seven 2105 Model 800s. The cable from the 2105 Model 800 to the modem switch should be no longer than 50 feet (15 meters).
The 2105 Model 800 and Expansion Enclosure provide the following service functions:
- Continuous self-monitoring that initiates a call (call home) to service personnel if a failure has occurred. Because the service personnel who respond to the call know about the failing component, repair time is reduced.
- Problem logs that service personnel can access remotely to analyze potential failures.
- Remote support that allows the ESS to correct many types of problems. When the ESS reports a problem, service personnel can often create a correction that they can apply from the remote location.
The service support representative logically configures the ESS during installation. After the ESS is installed, the customer can perform additional configuration using the ESS Web interfaces, including modifying the remote service functions.

Fibre Channel Connection


Attention: A Class I laser assembly, in the optical transceiver, is mounted on the ESS fibre channel host card. This laser assembly is registered with the Department of Health and Human Services and complies with IEC 825.
The ESS provides fibre channel connection to the host systems that it supports. Fibre channel interconnection architecture provides a variety of communication protocols on the ESS. The interconnected units are referred to as nodes, and each node has one or more ports. An ESS is a node in a fibre channel network, and each port on an ESS fibre channel host adapter is a fibre channel port. A host is also a node in a fibre channel network, and each port on a host fibre channel adapter is a fibre channel port. Each port attaches to a serial-transmission medium that provides full-duplex communication with the node at the other end of the medium. The ESS architecture supports three basic interconnection topologies:
- Point-to-point: Allows direct interconnection of ports.
- Fabric (the underlying structure): To interconnect multiple nodes, a fabric that provides the necessary switching functions can be used to support communication between them. A fabric can be implemented using available vendor products.
- Arbitrated loop: A ring topology that enables the interconnection of a set of nodes. The maximum number of ports on a fibre channel arbitrated loop is 128.

Fibre Channel Host Card Indicators


The fibre channel host card has five LED indicators. Their functions follow; Figure 5 shows their locations.
- LED 1 (green): Card status. See Table 1 on page 15.
- LED 2 (yellow): Card status. See Table 1 on page 15.
- LED 3 (green): Engineering use only.
- LED 4 (green): Engineering use only.
- LED 5 (red): On, card-detected error; off, normal.

Figure 5. Fibre Channel Host Card LED Indicator Locations (s009528) [figure: front and side views of the card; LED 1 (green), LED 2 (yellow), LED 3 (green), LED 4 (green), and LED 5 (red) are viewed through holes in the card front]

Table 1. Fibre Channel Host Card LED Indicators
Green LED 1 | Yellow LED 2 | Indicated Condition
Off | Off | Wake-up failure (card failed)
Off | On | Power on Self Test failure (card failed)
Off | Blinking slowly (1 blink per second) | Wake-up failure
Off | Blinking rapidly (4 blinks per second) | Power on Self Test failure
Off | Unsteady blinking (no pattern) | Power on Self Test in progress
On | Off | Failure while operating
On | On | Failure while operating
On | Blinking slowly (1 blink per second) | Normal, inactive
On | Unsteady blinking (no pattern) | Normal, active
On | Blinking rapidly (4 blinks per second) | Normal, busy
Blinking slowly (1 blink per second) | Off | Normal, link down or not yet started (loss of light)
Blinking slowly (1 blink per second) | Blinking slowly (1 blink per second) | Off-line for download
Blinking slowly (1 blink per second) | Blinking rapidly (4 blinks per second) | Restricted off-line mode (waiting for restart)
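Table 1 can be encoded directly as a lookup keyed on the observed green and yellow LED states. This is a minimal sketch; the state labels and function names are hypothetical. Any combination that is not in Table 1 is reported as unlisted rather than guessed.

```python
from typing import Dict, Tuple

# LED states as named in Table 1.
OFF, ON = "off", "on"
SLOW = "blinking slowly"        # about 1 blink per second
RAPID = "blinking rapidly"      # about 4 blinks per second
UNSTEADY = "unsteady blinking"  # no pattern

# (green LED 1, yellow LED 2) -> indicated condition, transcribed from Table 1.
FC_HOST_CARD_LEDS: Dict[Tuple[str, str], str] = {
    (OFF, OFF): "Wake-up failure (card failed)",
    (OFF, ON): "Power on Self Test failure (card failed)",
    (OFF, SLOW): "Wake-up failure",
    (OFF, RAPID): "Power on Self Test failure",
    (OFF, UNSTEADY): "Power on Self Test in progress",
    (ON, OFF): "Failure while operating",
    (ON, ON): "Failure while operating",
    (ON, SLOW): "Normal, inactive",
    (ON, UNSTEADY): "Normal, active",
    (ON, RAPID): "Normal, busy",
    (SLOW, OFF): "Normal, link down or not yet started (loss of light)",
    (SLOW, SLOW): "Off-line for download",
    (SLOW, RAPID): "Restricted off-line mode (waiting for restart)",
}


def decode_fc_host_card(green: str, yellow: str) -> str:
    """Return the Table 1 condition for the observed LED 1 / LED 2 states."""
    return FC_HOST_CARD_LEDS.get((green, yellow), "Not listed in Table 1")


def is_normal(green: str, yellow: str) -> bool:
    """True for any of the 'Normal, ...' rows of Table 1."""
    return decode_fc_host_card(green, yellow).startswith("Normal")
```

For example, `decode_fc_host_card(ON, RAPID)` returns "Normal, busy". The red LED 5 is not part of Table 1 and is not modeled here.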

Cluster Indicators
Each cluster is made up of a CEC drawer and an I/O drawer. Each of these drawers has its own power indicator.
CEC Drawer Power Indicator: The CEC drawer power indicator (1 in Figure 6) is located at the lower left corner of the front of the CEC drawer.
Table 2. CEC Drawer Power Indicators
CEC Drawer Power Indicator State | Condition Indicated
Off | Power is off
Blinking slowly | Power off in progress
Blinking rapidly | Power on in progress
On steady (not blinking) | Power is on

Figure 6. CEC Drawer Power Indicator Location (s009612) [figure: front view of the CEC drawer; callout 1 marks the CEC drawer power indicator]

I/O Drawer Power Indicator: The I/O drawer power indicator (2 in Figure 7) is located on the front of the CEC drawer, at the top left corner of the CEC drawer operator panel.
Table 3. I/O Drawer Power Indicators
I/O Drawer Power Indicator State | Condition Indicated
Off | Power is off
Blinking slowly | Power off in progress
Blinking rapidly | Power on in progress
On steady (not blinking) | Power is on

Figure 7. I/O Drawer Power Indicator Location (s009613) [figure: front view of the CEC drawer showing the I/O drawer power indicator]

Using the ESS operator panel


This section describes the ESS operator panel switches and light-emitting diode (LED) indicators, and provides the procedures for switching the ESS power on and off. It also shows the ESS operator panel for the ESS expansion enclosure. Figure 8 on page 18 shows the ESS operator panel. The ESS operator panel has the following switches and LED indicators:
- Local Power switch: Used to power the ESS on and off when it is in Local Power Control Mode or Automatic Power Control Mode. If the ESS is in Remote Power Control Mode, the ESS power is controlled by the attached host systems and the ESS operator panel switch cannot be used. ESS expansion enclosure power is controlled by the ESS power system; the expansion enclosure cannot be powered on or off separately.
- Line Cord 1 and Line Cord 2 Power Complete LEDs:
  The LEDs are off when the ESS power is off. The LEDs flash rapidly (twice per second) while the ESS power-on or power-off sequence is in progress. The LEDs are on solid when the power-on sequence completes with no errors. The LEDs flash slowly (once per second) to indicate a power fault.
- Cluster 1 and Cluster 2 Message LEDs: The LED is on when a problem that requires a service action is created. The LEDs flash rapidly (twice per second) while a cluster power-on or power-off sequence is in progress.
- Cluster 1 and Cluster 2 Ready LEDs: The LED is off when the cluster is powering on, fenced, or being serviced. The LED is on when the cluster is ready for customer use by the host systems.
- Unit Emergency power switch: Causes an immediate ESS power off and may cause customer data loss. It works the same way whether the ESS is in Local, Automatic, or Remote Power Control Mode.
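The four Power Complete LED states described above map one-to-one to ESS power conditions, which can be summarized for both line cords at once. This is a minimal sketch; the enum and function names are hypothetical. Flash rates are taken from the text: rapid is twice per second and slow is once per second.

```python
from enum import Enum
from typing import Dict


class LedState(Enum):
    OFF = "off"
    SOLID = "on solid"
    FLASH_RAPID = "flashing rapidly (twice per second)"
    FLASH_SLOW = "flashing slowly (once per second)"


# Meaning of each Line Cord Power Complete LED state, from the operator panel text.
POWER_COMPLETE: Dict[LedState, str] = {
    LedState.OFF: "ESS power is off",
    LedState.FLASH_RAPID: "Power-on or power-off sequence in progress",
    LedState.SOLID: "Power-on sequence complete with no errors",
    LedState.FLASH_SLOW: "Power fault",
}


def power_complete_status(line_cord_1: LedState, line_cord_2: LedState) -> Dict[str, str]:
    """Describe both Power Complete LEDs."""
    return {
        "Line Cord 1": POWER_COMPLETE[line_cord_1],
        "Line Cord 2": POWER_COMPLETE[line_cord_2],
    }


def power_fault_reported(line_cord_1: LedState, line_cord_2: LedState) -> bool:
    """True if either Power Complete LED is flashing slowly."""
    return LedState.FLASH_SLOW in (line_cord_1, line_cord_2)
```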

Figure 8. ESS Indicators (s009531) [figure: front view of the 2105 Model 800 operator panel showing the Unit Emergency switch, Local Power switch, Cluster 1 and Cluster 2 Ready LEDs, Line Cord 1 and Line Cord 2 Power Complete LEDs, and Cluster 1 and Cluster 2 Message LEDs]


Figure 9. ESS Operator Panel Switches and Indicators for an Expansion Enclosure (s008026m) [figure: Unit Emergency switch and Line Cord 1 and Line Cord 2 Power Complete LEDs]

Switching the ESS power on and off (Local, Automatic or Remote)


This section provides the procedures for switching the ESS power on and off.

Switching the ESS power on (Local Power Control Mode)


Note: The Unit Emergency switch must be in the on position.
To switch the ESS power on:
1. Press the Local Power switch momentarily to the on position (see Figure 8 on page 18).
   Note: The power LEDs for Line Cord 1 and Line Cord 2 flash for 3 to 10 seconds.
2. Wait up to 30 minutes for the cluster Ready LEDs to come on. The clusters are then ready for host system activity.

Switching the ESS power on (Automatic Power Control Mode)


Note: The Unit Emergency switch must be in the on position.
To switch the ESS power on:
1. Press the Local Power switch momentarily to the on position (see Figure 8 on page 18).
   Note: The power LEDs for Line Cord 1 and Line Cord 2 flash for 3 to 10 seconds.
2. Wait up to 30 minutes for the cluster Ready LEDs to come on. The clusters are then ready for host system activity.


Switching the ESS power on (Remote Power Control Mode)


In Remote Power Control Mode, ESS power on and off is controlled only by the host systems that have remote power control cables connected. The requirements for remotely powering an ESS are:
- A minimum of one and a maximum of eight host systems with remote power control cables connected.
- The first host system to power up powers up the ESS.
- The ESS operator panel Local Power switch cannot be used.

RPC Local and Remote or Local and Automatic switch settings


The following tables describe the Local, Automatic, and Remote Power Control Modes, which depend on whether the Remote Power Control feature is installed.
Note: Only the service representative can change the Local and Remote switch settings.
Table 4. Power control with Remote Power Control Feature installed
Local and Remote Switch Setting | Power Control Mode
Local | Local Power Control Mode: The ESS power is controlled by the ESS operator panel Local Power switch.
Remote | Remote Power Control Mode: With the host S/370 power control interface cables connected, the ESS power is controlled remotely.

Table 5. Power control without Remote Power Control Feature installed
Local and Automatic Switch Setting | Power Control Mode
Local | Local Power Control Mode: The ESS power is controlled by the ESS operator panel Local Power switch. Note: If the ESS loses customer power to both line cords, it must be powered on manually when customer power is restored to one or both line cords.
Automatic | Automatic Power Control Mode: The ESS power is controlled by the ESS operator panel Local Power switch. Note: If the ESS loses customer power to both line cords, it powers on automatically when customer power is restored to one or both line cords.
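Tables 4 and 5 together determine the power control mode from two facts: whether the Remote Power Control feature is installed and how the switch is set. The mode then determines who controls power and whether the ESS restarts by itself after losing customer power on both line cords. This is a minimal sketch with hypothetical function names. Where the tables say nothing about automatic restart (both settings with the RPC feature installed), it returns `None` rather than guessing.

```python
from typing import Optional

# Valid switch settings and resulting modes, from Tables 4 and 5.
MODES = {
    True: {"local": "Local Power Control Mode", "remote": "Remote Power Control Mode"},
    False: {"local": "Local Power Control Mode", "automatic": "Automatic Power Control Mode"},
}


def power_control_mode(rpc_feature_installed: bool, switch_setting: str) -> str:
    """Return the power control mode, or raise if the setting is not valid."""
    settings = MODES[rpc_feature_installed]
    key = switch_setting.strip().lower()
    if key not in settings:
        raise ValueError(f"{switch_setting!r} is not a valid setting for this configuration")
    return settings[key]


def power_controlled_by(mode: str) -> str:
    if mode == "Remote Power Control Mode":
        return "host systems with remote power control cables"
    return "ESS operator panel Local Power switch"


def restarts_after_dual_line_cord_loss(rpc_feature_installed: bool,
                                       switch_setting: str) -> Optional[bool]:
    """True/False from Table 5; None where Table 4 does not say."""
    mode = power_control_mode(rpc_feature_installed, switch_setting)
    if rpc_feature_installed:
        return None
    return mode == "Automatic Power Control Mode"
```

For example, `restarts_after_dual_line_cord_loss(False, "Local")` returns `False`, which matches the Table 5 note that the ESS must then be powered on manually.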

Switching the ESS power off (Local Power Control Mode)


To switch the ESS power off:
1. Press the Local Power switch momentarily to the off position.
2. Wait up to five minutes for the Line Cord 1 and Line Cord 2 LEDs to go off.

Note: If the ESS will not power off, there may be a hardware problem or pinned data. Your service representative must repair this condition.
Attention: Do not force the power off using the Unit Emergency switch; customer data loss may occur.

Switching the ESS power off (Automatic Power Control Mode)


To switch the ESS power off:
1. Press the Local Power switch momentarily to the off position.
2. Wait up to five minutes for the Line Cord 1 and Line Cord 2 LEDs to go off.
Note: If the ESS will not power off, there may be a hardware problem or pinned data. Your service representative must repair this condition.
Attention: Do not force the power off using the Unit Emergency switch; customer data loss may occur.

Switching the ESS power off (Remote Power Control Mode)


All host systems with remote power control cable connections to the ESS must be switched off.
Note: The ESS operator panel Local Power switch is ignored. The last host system to be powered off powers off the ESS.
To switch the ESS power off:
1. Power off all the host systems that have remote power control cables connected.
2. Wait up to five minutes for the Line Cord 1 and Line Cord 2 LEDs to go off.
Note: If the ESS will not power off, there may be a hardware problem or pinned data. Your service representative must repair this condition.
Attention: Do not force the power off using the Unit Emergency switch; customer data loss may occur.

2105 Models 750 and 800 Disk Storage


Storage capacity is provided in the Enterprise Storage Server subsystem by DDM bays (referred to in host documentation as disk eight packs or eight packs). A 2105 Model 800 subsystem can contain up to 48 DDM bays: 16 DDM bays in the 2105 and an additional 32 DDM bays in the Expansion Enclosure (with FC 2100).
Note: A 2105 Model 750 subsystem can contain up to 8 DDM bays.
Each DDM bay is installed with eight disk drives (DDMs), all of the same capacity. DDM bays are always ordered and installed in pairs, and DDM bays of different capacities can be intermixed within a subsystem.
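The bay limits and pairing rule above lead to simple arithmetic. A fully populated Model 800 subsystem has 48 bays × 8 DDMs = 384 DDMs, and a full Model 750 has 8 × 8 = 64. This is a minimal sketch with a hypothetical function name. It checks only the limits stated in the text: bay counts per location, pairs, and eight DDMs per bay. It does not model capacity intermix or the order in which bays are filled.

```python
DDMS_PER_BAY = 8             # each DDM bay holds eight DDMs of one capacity
MAX_BASE_BAYS_800 = 16       # DDM bays in the 2105 Model 800 rack
MAX_EXPANSION_BAYS_800 = 32  # DDM bays in the Expansion Enclosure (FC 2100)
MAX_TOTAL_BAYS_750 = 8


def ddm_count(model: str, base_bays: int, expansion_bays: int = 0) -> int:
    """Validate a DDM bay layout against the stated limits; return the total DDMs."""
    if base_bays < 0 or expansion_bays < 0:
        raise ValueError("bay counts cannot be negative")
    total = base_bays + expansion_bays
    if model == "800":
        if base_bays > MAX_BASE_BAYS_800:
            raise ValueError(f"at most {MAX_BASE_BAYS_800} DDM bays fit in the 2105")
        if expansion_bays > MAX_EXPANSION_BAYS_800:
            raise ValueError(f"at most {MAX_EXPANSION_BAYS_800} DDM bays fit in the Expansion Enclosure")
    elif model == "750":
        if total > MAX_TOTAL_BAYS_750:
            raise ValueError(f"a 2105 Model 750 holds at most {MAX_TOTAL_BAYS_750} DDM bays")
    else:
        raise ValueError(f"unknown model: {model!r}")
    if total % 2 != 0:
        raise ValueError("DDM bays are ordered and installed in pairs")
    return total * DDMS_PER_BAY
```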

DDM Bay Indicators


The DDM bay has indicators that show the status of the DDM bay, and each DDM has indicators that show the status of that DDM.
3 [Figure 10] Controller Card Power Check Indicator: This green indicator is on when controller card power is present.


4 [Figure 10] DDM Check Indicator: This amber indicator is on when a DDM fails.
5 [Figure 10] Controller Card Indicator: This amber indicator is on when the controller card fails.
1 [Figure 10] Link Status (Ready) Indicator: This green indicator shows the status of the port (for example, port 1) through which the bypass card is connected to another device:
- Permanently on: The path through this port is operational.
- Flashing: The path through this port is not operational.
- Off: One of the following conditions exists:
  - The path through this port is not operational.
  - The card is switched into Bypass state (mode light is on amber).
  - The card is jumpered for Forced Inline mode (mode light is on green).
2 [Figure 10] Mode Indicator: This two-color indicator shows which mode the bypass card is operating in:
- Permanently on (amber): The bypass card is switched to Bypass state.
- Permanently on (green): The bypass card is jumpered for Forced Inline mode.
- Off: The bypass card is switched to Inline mode.
The following table summarizes the states of the three bypass card lights:
Table 6. Summary of Bypass Card Indicators
Operating Mode | Status | Link Status Light 1 | Mode Light | Link Status Light 2
Automatic | Inline | On | Off | On
Automatic | Bypass | Off | Amber | Off
Forced Inline | Inline | Off | Green | Off
Forced Bypass | Bypass | On | Amber | On
Forced Open | Open | Off | Off | Off
Jumpered Forced Inline | Inline | Off | Green | Off

Figure 10. DDM Bay Indicators (S008108l)
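Reading Table 6 in reverse, from observed lights to operating mode, shows that the lights do not always identify a single mode. Off / Green / Off matches both Forced Inline and Jumpered Forced Inline. This is a minimal sketch with a hypothetical function name. It returns every matching Table 6 row, and an empty list for combinations the table does not cover, such as a flashing link light.

```python
from typing import List, Tuple

# Rows of Table 6: (operating mode, status, link light 1, mode light, link light 2).
BYPASS_CARD_TABLE: List[Tuple[str, str, str, str, str]] = [
    ("Automatic", "Inline", "on", "off", "on"),
    ("Automatic", "Bypass", "off", "amber", "off"),
    ("Forced Inline", "Inline", "off", "green", "off"),
    ("Forced Bypass", "Bypass", "on", "amber", "on"),
    ("Forced Open", "Open", "off", "off", "off"),
    ("Jumpered Forced Inline", "Inline", "off", "green", "off"),
]


def decode_bypass_card(link1: str, mode: str, link2: str) -> List[Tuple[str, str]]:
    """Return every (operating mode, status) pair in Table 6 that matches the lights."""
    observed = (link1.lower(), mode.lower(), link2.lower())
    return [
        (op_mode, status)
        for op_mode, status, l1, m, l2 in BYPASS_CARD_TABLE
        if (l1, m, l2) == observed
    ]
```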


DDM Bay Disk Drive Module Indicators

Figure 11. DDM Bay Drawer Disk Drive Module Indicators (t007660m)

1 [Figure 11] Ready Indicator: This green indicator shows the following conditions:
- Indicator off: Both SSA links are inactive because one of the following conditions exists:
  - The DDMs, or the DDM and bypass card, that are logically on each side of and next to this DDM are not connected or are missing.
  - The DDMs, or the DDM and bypass card, that are logically on each side of and next to this DDM are inactive.
  - An SSA attachment that is in the loop is inactive.
  - A power-on self-test (POST) is running on this DDM.
- Indicator permanently on: Both SSA links are active, and the DDM is ready to accept commands from the using system. The Ready indicator does not show whether the DDM motor is spinning; the DDM might be waiting for a Motor Start command, or might have received a Motor Stop command.
- Indicator slowly blinks (two seconds on, two seconds off): Only one SSA link is active.
- Indicator blinks fast (five times per second): The DDM is active with a command in progress.
2 [Figure 11] Check Indicator: This amber indicator shows the following conditions:
- Indicator off: Normal operating condition.
- Indicator permanently on: One of the following conditions exists:
  - An unrecoverable error that prevents the normal operation of the SSA link has been detected.
  - The power-on self-tests (POSTs) are running or have failed. The indicator comes on as soon as the DDM is powered on, and goes off when the POSTs are complete. If the indicator remains on for longer than one minute after the DDM is powered on, the POSTs have failed.
  - Neither SSA link is active.


  - The DDM is in Service mode and can be removed from the SSA DASD DDM bay.
- Indicator blinking: The Check indicator has been set by a service aid to identify the position of a particular DDM.
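The Ready and Check indicator descriptions above can be combined into one interpreter. This includes the one-minute rule for deciding whether a Check indicator that is on means the POSTs are still running or have failed. This is a minimal sketch; the state labels ("off", "on", "slow", "fast", "blinking") and the function name are hypothetical. Because a Check indicator that is on has several possible causes, the sketch lists them instead of picking one.

```python
from typing import List

POST_LIMIT_S = 60  # Check on for more than one minute after power-on => POSTs failed

READY_MEANINGS = {
    "off": ("Both SSA links inactive: neighboring DDMs or bypass card missing or inactive, "
            "an SSA attachment in the loop inactive, or POST running on this DDM"),
    "on": "Both SSA links active; DDM ready for commands (motor may not be spinning)",
    "slow": "Only one SSA link is active",
    "fast": "DDM active with a command in progress",
}
OTHER_CHECK_CAUSES = ("unrecoverable SSA link error, neither SSA link active, "
                      "or DDM in Service mode")


def interpret_ddm_indicators(ready: str, check: str, seconds_since_power_on: float) -> List[str]:
    """Summarize the Ready and Check indicators of one DDM."""
    if ready not in READY_MEANINGS:
        raise ValueError(f"unknown Ready indicator state: {ready!r}")
    findings = [READY_MEANINGS[ready]]
    if check == "on":
        if seconds_since_power_on <= POST_LIMIT_S:
            findings.append("Check on: POSTs may still be running, or " + OTHER_CHECK_CAUSES)
        else:
            findings.append("Check on: POSTs failed, or " + OTHER_CHECK_CAUSES)
    elif check == "blinking":
        findings.append("Check set by a service aid to identify this DDM's position")
    elif check != "off":
        raise ValueError(f"unknown Check indicator state: {check!r}")
    return findings
```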

Internal Connections (DDM Bay)


Inside the DDM bay, the DDMs are connected in a string of eight DDMs. The string is connected to the external SSA connectors at the front of the DDM bay.

DDM Bay Internal Connections


The diagram below shows the relationship between the DDM string and the external SSA connectors.

Figure 12. DDM Bay Internal SSA Connections (S008107l)

External SSA Connections (DDM Bay)


From one to six DDM bays can be connected on two loops, each of which is connected to a different SSA device card. The following diagrams show the relationships between the SSA device card loops and one to six DDM bays.
Note: Figure 15 on page 25 and Figure 16 on page 26 show the two stages necessary to concurrently connect a second (E2) DDM bay.

Figure 13. DDM Bay Diagram Explanation (S008122l)


Figure 14. One DDM Bay External SSA Connections (S008129m)

Figure 15. Two DDM Bay Initial External SSA Connections (S008128m)


Figure 16. Two DDM Bay Final External SSA Connections (S008127m)

Figure 17. Three DDM Bay External SSA Connections (S008126m)


Figure 18. Four DDM Bay External SSA Connections (S008125m)

Figure 19. Five DDM Bay External SSA Connections (S008124m)


Figure 20. Six DDM Bay External SSA Connections (S008123m)

Special Tools
- SSA screwdriver tool (P/N 32H7059)
- ESCON wrap tool, large (P/N 5605670)
- ESCON wrap tool, small (P/N 05N6767)
- Fibre channel (SW2 and LW2) wrap tool (P/N 11P3847)


Chapter 2: Entry for All Service Actions


Start all service actions for the IBM 2105 subsystem, 2105 Model 800 rack, 2105 Expansion Enclosure, or SSA DASD DDM bay here. Select the type of action you want to perform from Table 7 below.
Note: 2105 Model 750 information
- The 2105 Model 750 is fully supported by the service information in this chapter when you follow the guided procedures. However, the service information refers only to the 2105 Model 800.
- The 2105 Model 750 supports fewer configuration options than the 2105 Model 800. For more information, see the IBM TotalStorage ESS Introduction and Planning Guide (form number SC26-7246).

Entry Table for All Service Actions


Table 7. Entry for All Service Actions
If you are here to: | Go to:

SERVICE TERMINAL
Connect the service terminal to the 2105 Model 800 rack | Service Terminal Setup and 2105 Configuration Verification in chapter 8 of Volume 3
Repair a service terminal connection problem (cannot display the Copyright and Login screen) | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Repair a service terminal connection problem (cannot display the Main Service Menu screen) | MAP 6060: Isolating a Service Terminal Login Failure on page 567

INSTALL
2105 Model 800 Subsystem | Installing and Testing the 2105 Model 800 Unit in chapter 5 of Volume 2
2105 Expansion Enclosure (physically attached to a 2105 Model 800) | Adding a separate Expansion Enclosure that was NOT shipped or tested with the existing 2105 Model 800 requires a separate MES.
Relocated 2105 Model 800 Subsystem (previously installed and relocated) | Installing a Relocated 2105 Model 800 and Expansion Enclosure Subsystem in chapter 5 of Volume 2
DDM Bay (8 Pack) | Adding a DDM bay to an existing 2105 subsystem requires a separate MES.
Host Card | Installing a Host Card in chapter 5 of Volume 2
Master Console | Begin Installation of the Master Console in chapter 5 of Volume 2
ESSNet1 console | MAP 1610: Connecting the Modem and Modem Expander for Remote Support on page 88
Connect the ESSNet to a customer network | Connecting an ESSNet1 or Master Console to a Customer Network in chapter 5 of Volume 2
Safety inspection | Safety Inspection in chapter 12 of Volume 3

REMOVE
2105 Subsystem | Discontinue a 2105 Model 800 Subsystem in chapter 5 of Volume 2
2105 Expansion Enclosure | Removing a 2105 Expansion Enclosure from an existing 2105 subsystem requires a separate RPQ.
DDM Bay (8 Pack) | Removing a DDM bay from an existing 2105 subsystem requires a separate RPQ.
Host Card | Removing a Host Card in chapter 5 of Volume 2
Relocate 2105 Subsystem | Relocating a 2105 Model 800 Subsystem in chapter 5 of Volume 2

LOGICAL CONFIGURATION / ESS SPECIALIST
Change logical subsystem configuration | If additional configuration needs to be completed, use the ESS Specialist from the ESSNet console.
Customer cannot access the 2105 Model 800 using the ESS Specialist | Go to the Analyze and Repair a Service Request section of this table.
Customer cannot access a SCSI LUN | Go to the Analyze and Repair a Service Request section of this table.
Customer requests a list of WWPNs of installed fibre channel cards | Go to Configuration Options Menu in chapter 8 of Volume 3. Look for WWPN under System Attachment Resource Menu.
Offload ESS Specialist User Files | Offload User Files in chapter 6 of Volume 2

CHANGE COMMUNICATIONS CONFIGURATION
TCP/IP LAN (use only after 2105 initial installation) | Changing TCP/IP Configuration in chapter 6 of Volume 2
Enable/Disable ESS Specialist | Configure ESS Specialist in chapter 6 of Volume 2
Regenerate the ESS Specialist Certificate | Regenerate ESS Specialist Certificate in chapter 6 of Volume 2
Repair Multiple DDM Failures | MAP 3149: Repairing Single or Multiple DDM Failures on page 232
E-mail | Configure E-mail in chapter 6 of Volume 2
Serial port / modem | Configure Call Home/Remote Services in chapter 6 of Volume 2
SNMP | Configure SNMP in chapter 6 of Volume 2
Call home/remote reporting options | Configure Call Home/Remote Services in chapter 6 of Volume 2
Configure Copy Services, with DNS | Configure Copy Services, with DNS in chapter 6 of Volume 2
Configure Copy Services, without DNS | Configure Copy Services, without DNS in chapter 6 of Volume 2
Managing Copy Services | Copy Services Server Menu in chapter 6 of Volume 2 (refer to the Copy Services Server Menu options there)
Enable/Disable Control Unit Initiated Reconfiguration (CUIR) | Change/Show Control Switches in chapter 6 of Volume 2

ANALYZE and REPAIR a SERVICE REQUEST
Prioritize symptoms for repair | MAP 1200: Prioritizing Visual Symptoms and Problems For Repair on page 50
Codes displayed by the CEC drawer operator panel | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
Cluster Ready indicator LED off | MAP 20A0: Cluster Not Ready on page 117
Display and repair a problem with the service terminal | MAP 1210: Displaying and Repairing a Problem on page 51
E-mail reported problem | MAP 1460: Isolating E-Mail Reported Errors on page 66
SCSI-Host system receives command rejects and check condition of internal target failure | MAP 4560: No Valid Subsystem Status Available on page 427
SCSI-Host system detected | MAP 5220: Isolating a SCSI Bus Error on page 541
ESCON-Host system receives FC status, pinned data | MAP 4560: No Valid Subsystem Status Available on page 427
ESCON-host or fibre-host system link error | MAP 5300: ESCON or Fibre Channel Link Fault on page 548
Display ESCON and Fibre Node Descriptors | MAP 5330: Display ESCON and Fibre Node Descriptors on page 560
Customer reports a loss of line cord input power via e-mail message | This should cause a visual symptom; see MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Power on or off problems | MAP 2020: Isolating Power Symptoms on page 112
Modem call home problems | MAP 1300: Isolating Cluster to Modem Communication Problems on page 52
SNMP Notification Problems | MAP 1305: Isolating SNMP Notification Problems on page 56
E-Mail Notification Problems | MAP 1310: Isolating E-Mail Notification Problems on page 58
Visual symptom | MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Power and cooling | MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Cluster boot or down problem | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
Customer LAN connection problem | MAP 4450: ESS Cluster to Customer Network Problem on page 407
Replace a FRU without using a problem | MAP 1480: Replacing a FRU, Without Using a Problem on page 66
Repair a service terminal connection problem to one cluster | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Repair a service terminal connection problem to both clusters | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Customer cannot access a SCSI LUN | This is normally due to a logical configuration problem or another customer-related problem with the SCSI-based host server. It could also result from an offline RAID array on the ESS, which can occur only if there are two problems on the same SSA loop, or two problems on each loop of an adapter pair in an AAL-configured machine. Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option. If related problems are not found, call the next level of support.
Customer cannot access a fibre channel LUN | MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs on page 564
Customer cannot access the 2105 Model 800 using the ESS Specialist | MAP 5000: ESS Specialist Cannot Access Cluster on page 540
ESSNet Console Hardware Problem | MAP 1600: ESSNet Console Problem on page 68
ESSNet Console Software Problem | MAP 1600: ESSNet Console Problem on page 68

ESSNet CONSOLE
ESSNet Console Hardware Problem | MAP 1600: ESSNet Console Problem on page 68
ESSNet Console Software Problem | MAP 1600: ESSNet Console Problem on page 68
Manage Master Console Entries | Master Console Queue Management in chapter 6 of Volume 2
Create Master Console PE Package | Master Console PE Package in chapter 6 of Volume 2
Offload ESS Specialist User Files | Offload User Files in chapter 6 of Volume 2
Test Master Console Configuration and Communication Status | Test Master Console Configuration and Communication Status in chapter 5 of Volume 2
Boot Sector Problem | MAP 1600: ESSNet Console Problem on page 68

SYSTEM/390 REPAIRS
SIM Generation and Usage | SIM Generation and Usage on page 33
Repair Using a Hardware SIM ID | The SIM ID is the same as the Problem Number in the 2105 Problem. Use this number to begin the repair; go to MAP 1210: Displaying and Repairing a Problem on page 51.
Repair Using an EREP Report | Repair Using an EREP Report on page 34
Repair Using a SIM Console Message | Repair Using a SIM Console Message on page 33
Media SIM Maintenance Procedures | Media SIM Maintenance Procedures on page 37
Decode a Refcode | Decode a Refcode on page 36
Change SIM Reporting Levels | Change SIM Reporting Options (System/390 Only) in chapter 6 of Volume 2

TEST a MACHINE FUNCTION
Cluster | Machine Test Menu in chapter 8 of Volume 3
Host Bay | Machine Test Menu in chapter 8 of Volume 3
Planners Interface Cards | Machine Test Menu in chapter 8 of Volume 3
External connections: SSA loop, LAN, cluster-to-cluster, and initialize modem expander | Machine Test Menu in chapter 8 of Volume 3
SSA Devices | Machine Test Menu in chapter 8 of Volume 3
Certify SSA Loops | Machine Test Menu in chapter 8 of Volume 3
Rack Power Control (RPC) Cards | Machine Test Menu in chapter 8 of Volume 3
CD-ROM Drive | Machine Test Menu in chapter 8 of Volume 3
Diskette Drive | Machine Test Menu in chapter 8 of Volume 3
Send Test Notification: E-mail, SNMP, pager, service | Machine Test Menu in chapter 8 of Volume 3
Show Problem | Machine Test Menu in chapter 8 of Volume 3
Safety inspection | Safety Inspection in chapter 12 of Volume 3

LICENSED INTERNAL CODE (Microcode EC)
Install/Activate LIC Feature | Activate LIC Feature in chapter 8 of Volume 3
LIC Feature Control Record Extraction | LIC Feature Control Record Extraction in chapter 5 of Volume 2
Display LIC Levels and Resource Requirements | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Display LIC Installation Instructions | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Copy a LIC Image to LIC Library | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Activate a LIC Image | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Copy and Activate a LIC Image | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3

INFORMATION
Machine overview | Chapter 1: Reference Information on page 1
Service interface | Service Interface on page 13
Locations and FRUs (2105 Model 800 only) | Locations in chapter 7 of Volume 3
Determine ESD procedures | Working with ESD-Sensitive Parts in chapter 4 of Volume 2
Determine standard tools needed | Standard Tools Needed in chapter 4 of Volume 2
CEC drawer operator panel status codes | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
DDM Bay and SSA DASD Drawer indicators and switch | 2105 Models 750 and 800 Disk Storage on page 21
2105 Model 800 maintenance agreement qualification | Safety Check in chapter 4 of Volume 2

SIM Generation and Usage


SIM generation by the ESS family of products is not intended to be the primary notification for service, as it was for the 3390, 3990, 9340, and 9390 product families. SIM generation for the ESS complements the existing problem notification process and supports previous system attachments to S/390 hosts. The SIM presentation strategy also differs from that of previous products. Instead of directing a SIM to the failing device and system, the ESS presents hardware SIMs to all S/390 hosts attached to the storage subsystem. Exception Class 0 and media SIMs are still off-loaded against the failing device and system. The SIM ID is the same as the Problem Number in the 2105 Problem and is used to repair the problem.
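The presentation rule above can be stated as a small routing function. Hardware SIMs fan out to every attached S/390 host, while Exception Class 0 and media SIMs go only to the failing system. This is a minimal sketch of that rule, not ESS microcode. The `Sim` class, its `kind` labels, and the host names used in testing are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sim:
    sim_id: int          # same as the Problem Number in the 2105 Problem
    kind: str            # hypothetical labels: "hardware", "exception_class_0", "media"
    failing_device: str  # device the SIM was raised against
    failing_system: str  # host system that saw the failing device


def sim_destinations(sim: Sim, attached_hosts: List[str]) -> List[str]:
    """Return the S/390 hosts a SIM is presented to."""
    if sim.kind == "hardware":
        # Hardware SIMs are presented to every attached S/390 host.
        return list(attached_hosts)
    if sim.kind in ("exception_class_0", "media"):
        # Exception Class 0 and media SIMs stay with the failing device and system.
        return [sim.failing_system]
    raise ValueError(f"unknown SIM kind: {sim.kind!r}")
```

Whichever host receives a SIM, the repair starts from `sim_id`, because it matches the 2105 problem number.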

Repair Using a SIM Console Message


The SIM ID is the same as the Problem Number in the 2105 Problem. When a SIM ID is available, start the repair by going to MAP 1210: Displaying and Repairing a Problem on page 51. The 2105 maintenance strategy does not rely on the analysis of data in environmental recording, editing, and printing (EREP) reports, or on sense bytes displayed on the console. Sense data records for some 2105 temporary errors and all permanent errors are sent from the 2105 to the system to provide the information needed to perform system error recovery procedures. The 2105 sense data is logged in the error-recording data set (ERDS) in the system, but is not used for 2105 problem determination. Whenever possible, start service actions with a SIM. If the customer receives sense data without a SIM, use the following procedure to evaluate the error.

Customer Receives Sense Data Without a SIM


If you do not see a SIM in EREP or on the console, and the customer continues to receive sense data or messages on the console:
1. Use the service terminal to display all active problems associated with the failing 2105.
2. If the service terminal does not find any problems related to the console message, run Machine Tests on the suspected failing machine function (see Machine Test Menu in chapter 8 of Volume 3). Repair any failure detected.
3. If the error continues, call your next level of support.

Repair Using an EREP Report


The SIM ID is the same as the Problem Number in the 2105 Problem. When a SIM ID is available, start the repair by going to MAP 1210: Displaying and Repairing a Problem on page 51. The 2105 maintenance strategy does not rely on the analysis of data in environmental recording, editing, and printing (EREP) reports. Sense data records for some 2105 temporary errors and all permanent errors are sent from the 2105 to the system to provide the information needed to perform system error recovery procedures. The 2105 sense data is logged in the error-recording data set (ERDS) in the system, but it is not used for 2105 problem determination. Start a service action with a SIM ID only.
All 2105 sense data, including the sense data sent to the system for error recovery, is processed by the 2105 support facility (SF), which generates SIMs whenever 2105 service is needed. The SIMs summarize the service information necessary to isolate and repair 2105 error conditions. SIMs are presented to the customer as console messages and are also logged in the ERDS.
Do not attempt to off-load device statistics when running EREP (SYSEXN) if devices or paths are failing. A device or path problem can prevent EREP from collecting statistics, and the EREP job will not complete successfully. To prevent off-loading statistics, make a working data set from the ERDS and then run EREP against the working data set. For more information on EREP, see EREP Reports.

EREP Reports
For detailed information about EREP reports, see the Environmental Recording, Editing, and Printing Program User's Guide.

System Exception Reports


The customer should normally run the system exception reports daily. The best report to use as a basis for servicing the 2105 is the Service Information Messages report (see Figure 21 on page 35). Other system exception reports might contain 2105 information, but use them as a basis for 2105 service only if there are no SIMs.


VOLUME 1, TotalStorage ESS Service Guide


SERVICE INFORMATION MESSAGES
REPORT DATE 024 99   PERIOD FROM 021 99 TO 022 99

COUNT   FIRST OCCURRENCE     LAST OCCURRENCE
****************************************************************************************************
1       021/99 17:44:27:78   021/99 17:44:27:78   MODERATE ALERT 2105-800 S/N 0113-10473 REFCODE C211-1060-A00A ID=03 DASD EXCEPTION ON SSID 0011 ADDITIONAL ANALYSIS REQUIRED TO DETERMINE REPAIR IMPACT. SEE PROBLEM NUMBER 03 FOR DETAILS

        021/99 19:24:19:56   021/99 19:24:19:56   SERVICE ALERT 2105-800 S/N 0113-30224 REFCODE 4320-0000-5284 ID=06 MEDIA EXCEPTION ON SSID 00D2, VOLSER 380050 DEV 0E12, 0D REFERENCE MEDIA MAINTENANCE PROCEDURE 2

        021/99 19:24:04:67   022/99 03:29:01:65   SERIOUS ALERT 2105-800 S/N 0113-10473 REFCODE C211-1060-A00A ID=09 DASD EXCEPTION ON SSID 00D2 ADDITIONAL ANALYSIS REQUIRED TO DETERMINE REPAIR IMPACT. SEE PROBLEM NUMBER 09 FOR DETAILS

Figure 21. Service Information Messages Report (S009434)

To run EREP for the system exception reports:
1. Make a working data set using the following parameters:
   PRINT=NO ACC=Y ZERO=N TYPE=O TABSIZE=999K
2. Run EREP against the working data set and print using the following parameters:
   SYSEXN=Y HIST ACC=N TABSIZE=999K DEV=(33xx)

Event History Report


Note: The best EREP report to use is the Service Information Messages report. See Figure 21.

The Event History report gives a one-line summary of each entry in the system error recording data set (ERDS). See Figure 22 on page 36. Selection parameters can be used to select records by device type, date, and time. When an Event History report (EVENT) is needed, instruct the customer to select the following parameters when running EREP against the working data set:

EVENT=Y HIST ACC=N TABSIZE=999K DEV=(2105) CUA=(xxx-xxx)

where xxx-xxx is the device address (CUA) range of the string. For details about the Event History report, see the Environmental Recording, Editing, and Printing Program User's Guide.

Entry for All Service Actions, CHAPTER 2
REPORT DATE 079/99 PERIOD FROM 052/99 PERIOD TO 076/99

[Report sample. Column headings: SPID, SSYS, ID, TIME, JOBNAME, RECTYP, CP, CUA, DNO, DEVT, CRW-CHP, REASON / SNID, PSW-MCH/PROG-EC, ESW, RCYRYXIT, COMP/MOD, CSECTID, ERROR-ID, VOLUME, SEEK, SD, CT / CMD, CSW, SENSE, SCSW. The sample, dated 052 99, lists four records of type ASYNCH for 2105-800 devices (RAS201).]

Figure 22. Event History Report (S009433)

To make a refcode from SIM sense bytes, see Generating a Refcode from Sense Bytes on page 37.

Decode a Refcode
The refcode is a 6-byte field that contains information you can use to locate and repair a 2105 error condition. This section explains how to decode the refcode and find the probable failing FRUs (see Figure 23).

KTGS-CCCC-IIPP

KTGS (refcode bytes 0 and 1): ESC
  K = Exception class
  T = Exception type
  G = General symptom
  S = MAP or SIM symptom
CCCC (refcode bytes 2 and 3): LIC level identifier
II (refcode byte 4): Problem ID (SIM ID)
PP (refcode byte 5): Repair procedure
  If PP=09, perform the procedure for the problem indicated in refcode byte 4.
  If PP=82, perform Media Maintenance Procedure 2.

Figure 23. Decoding the Refcode (s008597m)
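The field layout in Figure 23 can be expressed as a small parser. The sketch below is a hypothetical helper, not an IBM tool: the names parse_refcode and Refcode are illustrative, it assumes the refcode is written as 12 hexadecimal digits (hyphens optional), and it interprets only the two PP values that Figure 23 lists. The refcode used in the usage line is made up for illustration.

```python
from dataclasses import dataclass

HEX_DIGITS = set("0123456789ABCDEF")


@dataclass
class Refcode:
    esc: str         # KTGS: refcode bytes 0-1
    lic_level: str   # CCCC: refcode bytes 2-3, LIC level identifier
    problem_id: str  # II: refcode byte 4, problem ID (SIM ID)
    procedure: str   # PP: refcode byte 5, repair procedure

    def esc_fields(self) -> dict:
        # Each letter of KTGS is one hexadecimal digit of the ESC.
        k, t, g, s = self.esc
        return {
            "exception_class": k,
            "exception_type": t,
            "general_symptom": g,
            "map_or_sim_symptom": s,
        }

    def repair_action(self) -> str:
        # Only the PP values listed in Figure 23 are interpreted here.
        if self.procedure == "09":
            return f"Perform the procedure for problem {self.problem_id}"
        if self.procedure == "82":
            return "Perform Media Maintenance Procedure 2"
        return "PP value not listed in Figure 23"


def parse_refcode(text: str) -> Refcode:
    """Split 'KTGS-CCCC-IIPP' (hyphens and spaces optional) into its fields."""
    digits = text.replace("-", "").replace(" ", "").upper()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a 6-byte hexadecimal refcode: {text!r}")
    return Refcode(digits[0:4], digits[4:8], digits[8:10], digits[10:12])


if __name__ == "__main__":
    rc = parse_refcode("C211-1060-0309")  # hypothetical refcode
    print(rc.esc_fields()["exception_class"], rc.repair_action())
```

Treat any PP value outside 09 and 82 as "look it up in the service information" rather than guessing its meaning.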


Generating a Refcode from Sense Bytes


The refcode is a 6-byte field that contains information the service representative can use to locate and repair a 2105 error condition. The refcode is created from SIM sense byte data as shown in Figure 24 below. For details about the refcode, see Decode a Refcode on page 36.
2105 SIM Sense Byte Fields: DASD SIM
  Bytes 00-03: xxxxxxxx
  Bytes 04-07: xxxxxFYY
  Bytes 08-11: xxxxxxCC
  Bytes 12-15: CCIIPPxx
  Bytes 16-19: xxxxxxxx
  Bytes 20-23: xxxxKTGS
  Bytes 24-27: xxxxxxxx
  Bytes 28-31: FExxxxxx
  Byte 06 = xF: Needed for SIM sense bytes
  YY: SIM ID field
  Byte 28 = FE: 2105 DASD SIM
  Refcode: KTGS-CCCC-IIPP

2105 SIM Sense Byte Fields: MEDIA SIM
  Bytes 00-03: xxxxxxxx
  Bytes 04-07: xxxxxFYY
  Bytes 08-11: xxxxxx00
  Bytes 12-15: 00SSQMxx
  Bytes 16-19: xxxxxxxx
  Bytes 20-23: xxxxKTGS
  Bytes 24-27: xxxxxxxx
  Bytes 28-31: FEcccchh (cccc = failing cylinder, hh = failing head)
  Byte 06 = xF: Needed for SIM sense bytes
  YY: SIM ID field
  Byte 28 = FE: 2105 DASD SIM
  Refcode: KTGS-0000-SSQM

Figure 24. Refcode in the 2105 SIM Sense Bytes (S008594n)

Use the information in Figure 24 to determine the refcode if EREP or a similar function is not available. See EREP Reports on page 34 for more information. If the record type in the Event History report is ASYNCH, the record contains SIM sense bytes. If the record type is OBRxxx, the record is a unit check sense and does not contain SIM sense bytes.
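When sense data has to be decoded by hand, the byte positions in Figure 24 can be applied mechanically. The sketch below is a hypothetical helper (all function names are illustrative). It assumes the 32 sense bytes are available as hexadecimal text, checks the two SIM markers (byte 06 = xF, byte 28 = FE), and assembles the refcode from bytes 22-23 (KTGS), 11-12 (CCCC) and 13-14 (IIPP, or SSQM for a media SIM). Reading the media SIM word FEcccchh as cylinder in bytes 29-30 and head in byte 31 is an interpretation of the figure. The sense strings in the usage lines are made up for illustration.

```python
from __future__ import annotations


def sense_from_hex(words: str) -> bytes:
    """Convert sense data printed as hex words (for example, eight 4-byte words)."""
    data = bytes.fromhex(words.replace(" ", ""))
    if len(data) != 32:
        raise ValueError(f"expected 32 sense bytes, got {len(data)}")
    return data


def is_sim_record(rectyp: str) -> bool:
    """ASYNCH records carry SIM sense bytes; OBRxxx records are unit check sense."""
    return rectyp.strip().upper() == "ASYNCH"


def refcode_from_sim(sense: bytes) -> str:
    """Assemble KTGS-CCCC-IIPP (DASD SIM) or KTGS-0000-SSQM (media SIM)."""
    if (sense[6] & 0x0F) != 0x0F:
        raise ValueError("byte 06 is not xF, so this is not SIM sense data")
    if sense[28] != 0xFE:
        raise ValueError("byte 28 is not FE, so this is not a 2105 SIM")
    ktgs = sense[22:24].hex().upper()  # bytes 22-23
    cccc = sense[11:13].hex().upper()  # bytes 11-12 (0000 for a media SIM)
    tail = sense[13:15].hex().upper()  # bytes 13-14: IIPP, or SSQM for a media SIM
    return f"{ktgs}-{cccc}-{tail}"


def sim_id(sense: bytes) -> str:
    """YY: the SIM ID field in byte 07."""
    return f"{sense[7]:02X}"


def media_failing_track(sense: bytes) -> tuple[str, str]:
    """Media SIM only: cylinder cccc in bytes 29-30, head hh in byte 31."""
    return sense[29:31].hex().upper(), f"{sense[31]:02X}"


if __name__ == "__main__":
    # Hypothetical DASD SIM and media SIM sense data
    dasd = sense_from_hex("00000000 00000F03 00000010 60030900 "
                          "00000000 0000C211 00000000 FE000000")
    media = sense_from_hex("00000000 00000F06 00000000 00528400 "
                           "00000000 00004320 00000000 FE03A401")
    print(refcode_from_sim(dasd), sim_id(dasd))
    print(refcode_from_sim(media), media_failing_track(media))
```

Checking both marker bytes before assembling the refcode keeps an OBRxxx (unit check) record from being misread as a SIM.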

Media SIM Maintenance Procedures


Instruct the customer to perform the media maintenance procedure indicated in Table 8 on page 38. See also the examples in Customer Media Maintenance Procedure Examples on page 38.



Table 8. 2105 Media Maintenance Procedures

Procedure Number: 2

Description: The first part of this procedure finds all tracks with unrecoverable data and supplies information on the allocation of the user data (for example, data set names). The second part of this procedure returns the indicated track to a usable condition. Data on this track has been lost: all subsystem attempts at media maintenance, and all attempts to recover the data, have been unsuccessful.

ICKDSF Commands: Use ICKDSF Release 16 or higher, and enter the following commands:
  IODELAY SET MSEC(100)   (see Note 1 below)
  ANALYZE <UNIT() DDNAME()> NODRIVE SCAN   (see Note 2 below)
See Figure 25 on page 39 for the location of the ESC and the address of the failing track and head (cccchh) in the ANALYZE sense information. For each track that reports an ESC of 4xC0 or 0F0B, issue the following command (all on the same line):
  INSPECT <UNIT() DDNAME()> <VFY() NOVFY> ASSIGN NOCHECK NOPRESERVE TRACK(cccc,hh)   (see Note 3 below)
Note: This ICKDSF INSPECT command results in the loss of all customer data on that track.

Notes:
1. IODELAY adjusts ICKDSF to run concurrently with customer operations.
2. ANALYZE scans the volume for data that is not readable or not usable.
3. The NOPRESERVE parameter must be specified for the 2105; the PRESERVE parameter is not valid for the 2105. All previous attempts by the subsystem to recover the data have been unsuccessful. Although the track is returned to a usable state, all customer data on the specified track is lost when the INSPECT command is run.
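The selection rule in Table 8 (inspect only tracks whose ESC is 4xC0 or 0F0B) can be written as a filter. The sketch below is a hypothetical helper, not part of ICKDSF. It assumes "x" in 4xC0 means any hexadecimal digit, always uses the UNIT() and NOVFY alternatives from the Table 8 syntax, and copies the TRACK(cccc,hh) form exactly as Table 8 shows it. Confirm the final command text against the ICKDSF documentation (including how hexadecimal track addresses must be written) before giving it to the customer, because each INSPECT destroys the customer data on that track.

```python
from __future__ import annotations

import re

# ESC values that Table 8 says require INSPECT: 4xC0 (x read as any hex digit) or 0F0B.
_INSPECT_ESC = re.compile(r"4[0-9A-F]C0|0F0B")


def needs_inspect(esc: str) -> bool:
    return _INSPECT_ESC.fullmatch(esc.strip().upper()) is not None


def inspect_command(unit: str, cccc: str, hh: str) -> str:
    """Fill the Table 8 INSPECT template, choosing UNIT() and NOVFY.

    WARNING: running the resulting command loses all customer data on the track.
    """
    return (f"INSPECT UNIT({unit}) NOVFY ASSIGN NOCHECK NOPRESERVE "
            f"TRACK({cccc},{hh})")


def commands_for(unit: str, findings: list[tuple[str, str, str]]) -> list[str]:
    """findings holds one (esc, cccc, hh) tuple per track reported by ANALYZE."""
    return [inspect_command(unit, cccc, hh)
            for esc, cccc, hh in findings if needs_inspect(esc)]


if __name__ == "__main__":
    # Hypothetical ANALYZE findings for device 1290
    for cmd in commands_for("1290", [("0F0B", "03A4", "01"), ("4320", "0001", "02")]):
        print(cmd)
```

Tracks reporting any other ESC are skipped, which matches the "for each track that reports an ESC of 4xC0 or 0F0B" wording.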

Customer Media Maintenance Procedure Examples


Example of Procedure 2
The ICKDSF command sequence below locates all tracks with unrecoverable data and obtains information on the allocation of the user data. Each failing track is then restored to a usable condition with the INSPECT command shown in Table 8. ICKDSF must be at Release 16 or higher.



ENTER INPUT COMMAND:
analyze unit(1290) nodrive scan
ANALYZE UNIT(1290) NODRIVE SCAN
ICK00700I DEVICE INFORMATION FOR 1290 IS CURRENTLY AS FOLLOWS:
PHYSICAL DEVICE = 2105
STORAGE CONTROLLER = 2105
STORAGE CONTROL DESCRIPTOR = CC
DEVICE DESCRIPTOR = 06
ICK04000I DEVICE IS IN SIMPLEX STATE
ICK01400I 1290 ANALYZE STARTED
ICK01408I 1290 DATA VERIFICATION TEST STARTED
ICK21776I DATAVER TEST: ERROR DURING DATA VERIFICATION
CSW = D07C88 0200FFFF   CCW = DE000000 3000FFFF   FILEMASK = 1E
SENSE = 80000000 9000010B 00000034 80000004 02007667 FFB20F0B 000040E2 0003A401
ICK21401I 1290 SUSPECTED DRIVE PROBLEM
ICK01406I 1290 ANALYZE ENDED
ICK00001I FUNCTION COMPLETED, HIGHEST CONDITION CODE WAS 8
(In the SENSE line, callout 1 marks the ESC and callout 2 marks cccchh.)

Figure 25. Example of ICKDSF Analyze Drivetest Output

Sense Information Key Description:
1. ESC: 0F0B in this example
2. cccchh: failing track and head address
   - Failing track address (cccc = track 03A4 in this example)
   - Failing head address (hh = head 01 in this example)
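The two callouts can be located by byte offset. Matching the key values against the SENSE words in Figure 25 places the ESC (0F0B) in bytes 22-23 and cccchh (03A401) in bytes 29-31, the same positions that Figure 24 gives for KTGS and for the failing cylinder and head of a media SIM. The sketch below is a hypothetical helper that assumes those positions hold for every ANALYZE SENSE line of this kind; verify against the figure if the output looks different.

```python
from __future__ import annotations


def analyze_sense_fields(sense_line: str) -> tuple[str, str, str]:
    """Return (ESC, cccc, hh) from an ICKDSF 'SENSE = ...' line."""
    _, sep, words = sense_line.partition("=")
    if not sep:
        raise ValueError("no '=' found in sense line")
    data = bytes.fromhex(words.replace(" ", ""))
    if len(data) != 32:
        raise ValueError(f"expected 32 sense bytes, got {len(data)}")
    esc = data[22:24].hex().upper()   # callout 1: bytes 22-23
    cccc = data[29:31].hex().upper()  # callout 2: failing track, bytes 29-30
    hh = f"{data[31]:02X}"            # callout 2: failing head, byte 31
    return esc, cccc, hh


if __name__ == "__main__":
    line = ("SENSE = 80000000 9000010B 00000034 80000004 "
            "02007667 FFB20F0B 000040E2 0003A401")
    print(analyze_sense_fields(line))  # SENSE line copied from Figure 25
```

For the Figure 25 SENSE line this returns the same values as the key: ESC 0F0B, track 03A4, head 01.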


Chapter 3: Problem Isolation Procedures


Entry for Maintenance Analysis Procedures (MAPs)
Select the correct MAP entry table:
- MAP 1XXX, general isolation procedures: go to Table 9.
- MAP 2XXX, power and cooling isolation procedures: go to Table 10 on page 42.
- MAP 3XXX, SSA DASD DDM bay isolation procedures: go to Table 11 on page 43.
- MAP 4XXX, cluster isolation procedures: go to Table 12 on page 45.
- MAP 5XXX, host interface isolation procedures: go to Table 13 on page 48.
- MAP 6XXX, service terminal isolation procedures: go to Table 14 on page 49.

Note: 2105 Model 750 information
- The 2105 Model 750 is fully supported by the service information in this chapter when you follow the guided procedures. However, the service information references only the 2105 Model 800.
- The 2105 Model 750 supports fewer configuration options than the 2105 Model 800. For further information, see the IBM TotalStorage ESS Introduction and Planning Guide (form number SC267246).
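Because the first character of a MAP number selects its group, the entry-table list can be captured as a lookup. The sketch below is a hypothetical helper (entry_table and ENTRY_TABLES are illustrative names). It assumes MAP numbers are four hexadecimal characters, as in MAP 20A0 or MAP 4A20, and that only the leading character matters for choosing a table.

```python
from __future__ import annotations

# Entry tables from the list above, keyed by the first character of the MAP number.
ENTRY_TABLES = {
    "1": ("Table 9", "general isolation procedures"),
    "2": ("Table 10", "power and cooling isolation procedures"),
    "3": ("Table 11", "SSA DASD DDM bay isolation procedures"),
    "4": ("Table 12", "cluster isolation procedures"),
    "5": ("Table 13", "host interface isolation procedures"),
    "6": ("Table 14", "service terminal isolation procedures"),
}


def entry_table(map_id: str) -> tuple[str, str]:
    """Return (table, group description) for a MAP number such as 'MAP 4A20'."""
    code = map_id.strip().upper()
    if code.startswith("MAP"):
        code = code[3:].strip()
    if len(code) != 4 or any(c not in "0123456789ABCDEF" for c in code):
        raise ValueError(f"not a four-character MAP number: {map_id!r}")
    try:
        return ENTRY_TABLES[code[0]]
    except KeyError:
        raise ValueError(f"no entry table for MAP group {code[0]}XXX") from None


if __name__ == "__main__":
    print(entry_table("MAP 4A20"))
```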

MAP 1XXX: General Maintenance Analysis Procedures


Table 9. MAP 1XXX: General Maintenance Analysis Procedures
MAP 1200: Prioritizing Visual Symptoms and Problems For Repair, page 50
MAP 1210: Displaying and Repairing a Problem, page 51
MAP 1300: Isolating Cluster to Modem Communication Problems, page 52
MAP 1301: Isolating Call Home / Remote Services Failure, page 55
MAP 1305: Isolating SNMP Notification Problems, page 56
MAP 1310: Isolating E-Mail Notification Problems, page 58
MAP 1320: Isolating Problems Using Visual Symptoms, page 60
MAP 1460: Isolating E-Mail Reported Errors, page 66
MAP 1480: Replacing a FRU, Without Using a Problem, page 66
MAP 1500: Ending a Service Action, page 67
MAP 1600: ESSNet Console Problem, page 68
MAP 1602: Repairing the ESSNet Console's Personal Computer, page 69
MAP 1604: Restoring the Personal Computer's Software, page 69
MAP 1605: Master Console Product Recovery Wizard, page 73
MAP 1606: Converting the Personal Computer to an ESSNet Console, page 76
MAP 1607: Changing the Network Configuration (IP address, host name, domain, subnet mask) for ESS and the TotalStorage ESS Master Console, page 85
MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console, page 86

Copyright IBM Corp. 2004, 2005


Isolate
Table 9. MAP 1XXX: General Maintenance Analysis Procedures (continued)
MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console, page 87
MAP 1610: Connecting the Modem and Modem Expander for Remote Support, page 88
MAP 1620: Attaching The ESSNet to a Customer Network, page 107
MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs, page 111

MAP 2XXX: Power and Cooling Maintenance Analysis Procedures


Table 10. MAP 2XXX: Power and Cooling Maintenance Analysis Procedures
MAP 2000: Model 100 Attachment Rack Reported, page 112
MAP 2020: Isolating Power Symptoms, page 112
MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent, page 113
MAP 2031: Repair Ground Continuity, page 114
MAP 20A0: Cluster Not Ready, page 117
MAP 2210: Host Bay Drawer Power Supply Problem, page 119
MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected, page 120
MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault, page 122
MAP 2320: Installed Unit or Feature Mismatch, page 124
MAP 2340: PPS Status Code 06, page 125
MAP 2350: Isolating PPS Status Indicator Codes, page 127
MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem, page 131
MAP 2365: UEPO Loop Problem, page 133
MAP 2370: Rack 1 Power On Problem, Automatic Mode, page 136
MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem, page 138
MAP 2390: Rack 1 Power On Problem, Remote Mode, page 140
MAP 23B0: 2105 Expansion Enclosure (Rack 2) Power Off Problem, page 144
MAP 23C0: Power Event Threshold Exceeded, page 146
MAP 23D0: RPC-2 Card Reporting PPS Battery Set Present, page 147
MAP 23E0: Cluster Powered Off Unexpectedly, page 149
MAP 2400: 2105 Model 800 Local Power On Problems, page 149
MAP 2410: RPC Power Mode Switch Mismatch, page 153
MAP 2420: 2105 Expansion Enclosure Power On Problem, page 154
MAP 2430: One RPC Card Firmware Down Level, page 157
MAP 2440: Rack 1 Power Off Problem, page 157
MAP 2450: Crossed RPC Cables to Expansion Rack, page 160
MAP 2460: Battery Set Charge Low, page 162
MAP 2470: Battery Set Detection Problem, page 162
MAP 2490: PPS Input Phase Missing, page 164

Table 10. MAP 2XXX: Power and Cooling Maintenance Analysis Procedures (continued)
MAP 24A0: PPS Power On Problem, page 165
MAP 24B0: 2105 Cannot Power Off, Pinned Data, page 167
MAP 24F0: Both RPC Cards Firmware Down Level, page 168
MAP 2520: PPS Output Circuit Breaker Tripped, page 168
MAP 2600: RPC Card Cannot Reset a Power Fault, page 169
MAP 2700: CEC Drawer Power On Problem, page 170
MAP 2800: CEC or I/O Drawer Visual Power Supply Problem, page 171
MAP 2810: Host Bay Drawer Visual Power Supply Problem, page 174

MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures


Table 11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures
Using the SSA DASD Maintenance Analysis Procedures (MAPs), page 176
MAP 3000: Isolating an SSA Link Error Between Two DDMs, page 176
MAP 3010: Isolating a Degraded SSA Link between Two DDMs, page 178
MAP 3050: Isolating an SSA Link Error Between a DDM and an SSA Device Card, page 179
MAP 3060: Isolating a Degraded SSA Link Between a DDM and an SSA Device Card, page 184
MAP 3077: Isolating an SSA Link Error Between a DDM and Two SSA Device Cards, page 187
MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards, page 193
MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay, page 197
MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay, page 201
MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card, page 204
MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card, page 209
MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays, page 212
MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays, page 217
MAP 3120: Isolating an SSA Link Error, page 220
MAP 3121: Isolating a Degraded SSA Link, page 223
MAP 3123: Array Repair Required, page 226
MAP 3124: Isolating Between DDM Hardware and Microcode Failures, page 227
MAP 3125: Isolating an Unexpected SSA SRN, page 228
MAP 3126: Isolating an Unexpected SSA Test Result, page 228
MAP 3127: Formatting of a DDM Has Not Completed, page 229

Problem Isolation Procedures, CHAPTER 3

Table 11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures (continued)
MAP 3128: Isolating an Unknown DDM Failure, page 229
MAP 3129: Isolating an Array Repair Required Failure, page 230
MAP 3131: Attempt to Format Array Member, page 231
MAP 3142: Isolating Multiple DDMs on an SSA Loop Cannot be Accessed, page 231
MAP 3149: Repairing Single or Multiple DDM Failures, page 232
MAP 3152: Replacing DDMs Called Out by Enhanced PFA, page 233
MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault, page 234
MAP 3180: Controller Card Failed, page 235
MAP 3190: Wrong Drawer Type Installed, page 236
MAP 3200: Uninstalled SSA DDMs Connected to Loop A, page 237
MAP 3210: Uninstalled SSA DDMs Connected to Loop B, page 238
MAP 3220: Isolating too Few DDMs in a DDM Bay, page 239
MAP 3300: Repair Alternate Cluster to Run SSA Loop Test, page 240
MAP 3360: Ending a DASD Service Action, page 241
MAP 3375: Isolating a Storage Cage Fan/Power Sense Card Error, page 242
MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error, page 245
MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On, page 246
MAP 3381: Isolating a Storage Cage Fan/Power Sense Card Error, page 247
MAP 3384: Isolating a Storage Cage Fan Failure, page 248
MAP 3384: Isolating a Storage Cage Fan Failure, page 251
MAP 3391: Isolating a Storage Cage Power System Problem, page 255
MAP 3395: Isolating a DDM Bay Power Problem, page 261
MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem, page 263
MAP 3398: Isolating a DDM Bay Controller Card Communications Failure, page 264
MAP 3400: Replacing a DDM Bay Frame Assembly, page 266
MAP 3421: Storage Cage Fan/Power Sense Card R2 Cable Problem, page 266
MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems, page 268
MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error, page 270
MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error, page 272
MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error, page 273
MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error, page 275
MAP 3427: Isolating a Storage and DDM Bay Location Error, page 277
MAP 3428: Isolating a DDM Bay Location Error, page 279
MAP 3429: Isolating a DDM Location Error, page 282
MAP 3500: Verifying a DDM Bay Repair, page 283
MAP 3520: DDM Bay Verification for Possible Problems, page 284

Table 11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures (continued)
MAP 3530: SSA Devices Certify Test Failure, page 284
MAP 3540: Web Initiated Format Incomplete, User to Restart, page 285
MAP 3550: Incomplete or Failed Format Process, User to Restart, page 286
MAP 3560: Unrelated Occurrence, Retry Verification Test, page 287
MAP 3570: Unrelated Event Caused Resume Fail, page 288
MAP 3580: DDM, or DDMs, Found in Formatting State During IML, page 288
MAP 3600: Multiple DDMs Isolated on an SSA Loop, page 289
MAP 3605: Isolating an Unexpected Result, page 290
MAP 3610: DDM Installation with New Rank Site Capacity, page 290
MAP 3612: DDM Installation with Mixed Capacity Rank Site, page 293
MAP 3614: DDM Installation Introduces Different RPM, page 296
MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop, page 298
MAP 3617: DDM Size is Not Supported, page 298
MAP 3618: Replacement DDM Has Slower RPM Than Called For, page 299
MAP 3619: This Repair Requires a Larger Capacity DDM, page 301
MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs, page 301
MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics, page 302
MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics, page 303
MAP 3627: Unable to Determine DDM Use, page 304
MAP 3640: Other Cluster Fenced - Unable to Verify SSA Loop, page 305
MAP 3650: Wrong, Missing, or Failing Bypass Card, page 307
MAP 3652: Wrong, Missing, or Failing Passthrough Card, page 309
MAP 3654: Bypass Card Jumpers Wrong, page 311
MAP 3656: 20 MB SSA Cable Installed Where 40 MB Cable Expected, page 312
MAP 3680: Isolating a Two DDMs Detect Over-Temperature Problem, page 313
MAP 3685: Isolating a Multiple DDM Detect Over-Temperature Problem, page 316

MAP 4XXX: Cluster Maintenance Analysis Procedures


Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures
MAP 4010: Cluster Hang During a Failback or Error Recovery, page 319
MAP 4020: Hard Disk Drive Build Process for Both Drives, page 320
MAP 4025: Hard Drive Build Process for Automatic LIC, page 324
MAP 4040: Entry MAP for CPI Problems, page 326
MAP 4055: Resolving a Bay Held Reset Condition, page 339
MAP 4060: Replacing I/O Drawer FRUs for CPI Problems, page 341
MAP 4070: Replacement of Host Bay FRUs for CPI Problems, page 343
MAP 4090: CPI Address Mismatch, page 343

Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued)
MAP 40A0: Fence Network Isolation, page 344
MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode, page 346
MAP 40C0: Special SCSI Bus Problems, page 347
MAP 40D0: Special SRN Problems, page 348
MAP 40E0: Only One I/O Drawer Power Supply Detected, page 349
MAP 4100: Isolating a LIC Process Read/Display Problem, page 351
MAP 4110: Host Bay Drawer Fan Reporting Failure, page 351
MAP 4120: Handling Unexpected Resources, page 352
MAP 4130: Handling a Missing or Failing Resource, page 353
MAP 4140: Isolating a LIC Activation Process Failure, page 354
MAP 4150: PPS to RPC Interface Failure, page 355
MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers, page 357
MAP 4180: RPC to RPC Communication Failure, page 359
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure, page 360
MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure, page 361
MAP 41B0: CPI Interface NVS/IOA Card to Host Bay Failure, page 361
MAP 41C0: ESC 2770 or 2771, Missing CPI Detected, page 362
MAP 41D0: CPI Problem for Host Bay Slot Failure, page 364
MAP 41E0: CPI Failure Needing CPI Cable as FRU, page 365
MAP 41F0: A Temporary CPI Error was Detected, page 365
MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging, page 366
MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel, page 367
MAP 4350: Isolating Cluster Code Load Counter=2, page 370
MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel, page 371
MAP 4370: Error Displaying Problems Needing Repair, page 375
MAP 4380: Isolating a Customer LAN Connection Problem, page 376
MAP 4390: Isolating a Cluster to Cluster Ethernet Problem, page 377
MAP 43A0: Bootlist Management Using SMS, page 387
MAP 43A5: Bootlist Management Using SMS for Automatic LIC, page 392
MAP 43B0: Cluster Dual Hard Drive ESC 1xxx, page 398
MAP 43C0: Cluster IML from Second Hard Disk Drive, page 400
MAP 43D0: Duplicate TCP/IP Address Detected for this Cluster, page 401
MAP 43E0: Service Processor Reset, page 401
MAP 4400: Displaying Cluster SMS Error Logs, page 402
MAP 4410: Cluster to Cluster Ethernet Communication Test, page 403
MAP 4420: Display Cluster Ethernet Network Address, page 405
MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem, page 405
MAP 4450: ESS Cluster to Customer Network Problem, page 407

Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued)
MAP 4XXX Procedures:
MAP 4460: Cluster NVS Problem
MAP 4470: ESC 2768, NVS/IOA Card Problem
MAP 4480: Cluster to RPC Cards Communication Problem
MAP 4510: Isolating a Cluster to Cluster CPI Communication Failure
MAP 4520: Pinned Data and/or Volume Status Unknown
MAP 4540: Cluster Minimum Configuration
MAP 4550: NVS FRU Replacement
MAP 4560: No Valid Subsystem Status Available
MAP 45A0: Pinned Data, Special Case
MAP 4600: Isolating a CD-ROM Test Failure
MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level
MAP 4620: Isolating a Diskette Drive Failure
MAP 4640: Cluster SP, SPCN, or System Firmware Reload
MAP 4670: Cluster Powered Off Unexpectedly
MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers)
MAP 4710: Isolating a DDM LIC Update Problem
MAP 4720: Host Bay Fails to Power Off
MAP 4730: Cluster Power Off Request Problem
MAP 4760: Recovering from Corrupted Files or Functions 6
MAP 47A0: Cluster Fails to Power Off 3
MAP 4820: Isolating a SCSI Card Configuration Timeout
MAP 4840: CPI Diagnostic Communication Problem
MAP 4850: Repair the Host Bay Drawer
MAP 4870: Host Bay Power On Problem
MAP 4880: Cluster Power On Problem
MAP 4885: SPCN Load Fault Firmware Error Code
MAP 4890: Replacing a CEC or I/O Drawer Power Supply
MAP 4960: ESC 5500 Isolation
MAP 4970: Isolating a Software Problem
MAP 4980: Customer Copy Services Problems
MAP 4990: LIC Feature License Failure
MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA
MAP 4A00: Isolating an Automatic LIC Activation Failure
MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL)
MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
Go to: Page 410 Page 411 Page 411 Page 415 Page 417 Page 418 Page 426 Page 427 Page 428 Page 429 Page 430 Page 430 Page 431 Page 431 Page 432 Page 442 Page 443 Page 446 Page 446 Page 448 Page 449 Page 454 Page 456 Page 457 Page 458 Page 459 Page 461 Page 468 Page 471 Page 471 Page 472 Page 474 Page 476 Page 477 Page 482 Page 482 Page 485 Page 486

Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued)
MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL), page 488
MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL), page 491
MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL), page 493
MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL), page 495
MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL), page 497
MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL), page 499
MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL), page 501
MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL), page 503
MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL), page 504
MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL), page 506
MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL), page 509
MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL), page 511
MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL), page 514
MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL), page 517
MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL), page 520
MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL), page 523
MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL), page 526
MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL), page 529
MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL), page 532
MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL), page 534
MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL), page 537

MAP 5XXX: Host Interface Maintenance Analysis Procedures


Table 13. MAP 5XXX: Host Interface Maintenance Analysis Procedures
MAP 5000: ESS Specialist Cannot Access Cluster, page 540
MAP 5220: Isolating a SCSI Bus Error, page 541
MAP 5230: Isolating a Fixed Block Read Data Failure, page 543
MAP 5240: Isolating a Customer Data Check Failure, page 544
MAP 5250: Isolating a Meta Data Check Failure, page 547
MAP 5300: ESCON or Fibre Channel Link Fault, page 548
MAP 5305: ESCON or Fibre Channel Bit Error Rate Test Failure, page 550
MAP 5310: ESCON Bit Error Rate Validation, page 551
MAP 5320: ESCON Optical Power Measurement, page 552
MAP 5321: Fibre Channel Optical Power Measurement, page 556
MAP 5330: Display ESCON and Fibre Node Descriptors, page 560
MAP 5340: CKD Read Data Failure, page 561
MAP 5400: Fibre Channel Link Fault, page 562

Table 13. MAP 5XXX: Host Interface Maintenance Analysis Procedures (continued)
MAP 5410: Fibre Channel Bit Error Rate Validation, page 563
MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs, page 564
MAP 5440: Fibre Host Card Reports a Loss of Light, page 566

MAP 6XXX: Service Terminal Maintenance Analysis Procedures


Table 14. MAP 6XXX: Service Terminal Maintenance Analysis Procedures
MAP 6060: Isolating a Service Terminal Login Failure, page 567


MAPs 1XXX: General Isolation Procedures


The isolate procedures in the MAP 1XXX group in Chapter 3 cover general MAPs that deal with reported errors and error logs.

MAP 1200: Prioritizing Visual Symptoms and Problems For Repair


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Use this procedure if there is more than one visual symptom and/or problem needing repair.

Procedure
- Display the details of each problem, and then use the table below to prioritize their repair.

Table 15. Prioritizing Repairs

Condition: Visual symptoms
Description: Visual symptoms should create a related problem that can be displayed in the Repair Menu, Show / Repair Problems Needing Repair option. Repair related problems before using visual symptoms. If there are no related problems, go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60.

Condition: Problems found during Automatic LIC Activation
Description: The Automatic LIC Activation process stopped or was suspended because one or more problems needing repair are listed in the problem log. If a problem exists with ESC=14xx, you must use it first, because it will prioritize the repairs and recover the Automatic LIC Activation process.

Condition: Multiple problems for one fault
Description: A single fault may create more than one related problem. The successful repair of one problem automatically closes the other related problems for the same resource.

Condition: Power problems
Description: Power problems can normally be repaired after logic problems because of the fault-tolerant power system design.

Condition: Cluster problems
Description: Repair cluster problems before SSA loop or DDM problems, because both clusters must be fault free to verify the repair of an SSA loop or DDM problem.

Condition: SSA loop or DDM problems
Description: Both clusters must be fault free to verify the repair of an SSA loop or DDM problem.

Condition: An SSA loop has two or more problems
Description: Repair this SSA loop before repairing an SSA loop with only one problem.

Condition: CPI interface problems for a cluster and host bay
Description: All CPI interface problems needing isolation use the same isolation MAP, so either problem can be used.



Table 15. Prioritizing Repairs (continued)

Condition: Cluster hung with a code displayed in its operator panel; the other cluster Ready LED indicator is on
Description: Use the other cluster to show and repair any problems for the hung cluster. If there are none, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.

Condition: Each cluster has a problem; at least one cluster Ready LED indicator is on
Description: The service terminal must be connected to a cluster with the Ready LED indicator on. Then repair the problem for the other cluster first.

Condition: A cluster has more than one problem
Description: If one of the problems has an ESC of 5xxx (SRN-based repair), repair the other problem for the cluster first.

Condition: Both clusters are hung with a code in their operator panels
Description: Repair either cluster first, using the displayed code as a visual symptom: go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
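The ordering rules in Table 15 can be read as a sort key. The following Python sketch is a hypothetical, simplified model for illustration only: the `Problem` fields, the category names, and the `repair_order` function are assumptions, not part of the service terminal. On a real machine, the service terminal (and any ESC=14xx problem) sets the repair order. The sketch puts an ESC=14xx problem first, then cluster problems, then SSA loop or DDM problems (loops with more problems first), then other logic problems, and power problems last.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical record; the service terminal does not expose this structure.
    esc: str       # error symptom code as text, for example "1410"
    category: str  # assumed values: "cluster", "ssa_loop", "ddm", "power", "other"
    resource: str  # failing resource, for example an SSA loop name


def repair_order(problems):
    """Sort problems using a simplified reading of Table 15."""
    # Count problems per SSA loop so that loops with more problems sort first.
    loop_counts = Counter(p.resource for p in problems if p.category == "ssa_loop")

    def priority(p):
        if p.esc.startswith("14"):
            return (0, 0)  # ESC=14xx: start with it, it prioritizes the repairs
        if p.category == "cluster":
            return (1, 0)  # both clusters must be fault free for SSA/DDM repairs
        if p.category in ("ssa_loop", "ddm"):
            # Negative count: a loop with two or more problems sorts first.
            return (2, -loop_counts.get(p.resource, 0))
        if p.category == "power":
            return (4, 0)  # fault-tolerant power can normally wait
        return (3, 0)      # other logic problems before power problems

    # sorted() is stable, so ties keep their original problem-log order.
    return sorted(problems, key=priority)
```

Treat this only as a memory aid for the table; the judgment calls in Table 15 (for example, which CPI problem to use) are not modeled.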

MAP 1210: Displaying and Repairing a Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A problem was created by a cluster and stored in the problem log. A 2105 Model 800 operator panel Message indicator was turned on to show which cluster reported the problem. The problem may be in the indicated cluster, in the other cluster, or somewhere else in the 2105 Model 800. If the clusters can communicate with each other, the service terminal can display problems from both clusters while attached to either cluster. If the clusters cannot communicate, an error message tells you to connect the service terminal to the other cluster, where that cluster's problems can then be displayed. A failing cluster may be able to communicate with the service terminal even when it cannot communicate with the other cluster. The Message indicator turns off when the service terminal connects to that cluster. If e-mail is enabled, a copy of the problem is sent to the defined customer destinations. Use the service terminal to display the problem or problems needing repair; the problems show the FRUs and/or isolation procedures needed for the repair. The service terminal and this service guide work together to guide you through the repair process.

Procedure
Use the following steps to display and repair the problem:
1. Ensure that the 2105 Model 800 is powered on.
2. Observe the cluster Message indicators on the 2105 Model 800 operator panel:
   - If both cluster Message indicators are on, connect the service terminal to either cluster.
   - If only one cluster Message indicator is on, connect the service terminal to that cluster.
   - If both cluster Message indicators are off, connect the service terminal to cluster 1.
3. Look at the service terminal screen.
Problem Isolation Procedures, CHAPTER 3




   Is the service terminal displaying the copyright and login screen?
   - Yes, go to step 5.
   - No, continue with the next step.
4. Connect the service terminal to the other cluster and try again. Is the service terminal displaying the copyright and login screen?
   - Yes, go to step 5.
   - No, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
5. Display the problems. From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
   Note: Each cluster displays its own problems and the problems from the other cluster. If the cluster cannot communicate with the other cluster, an informational error message is displayed. In this case, display the available problems, then connect to the other cluster and display its problems. If this fails, a problem for the cluster-to-cluster communication failure should be available on the cluster that does display logs.
6. Display the details for each listed problem, such as the reporting cluster, failing cluster, FRUs, isolation procedures, timestamps of the first and last occurrence, and other information.
   - Select the problem summary line to display the problem details.
   - If more than one problem is listed, refer to MAP 1200: Prioritizing Visual Symptoms and Problems For Repair on page 50 to decide the repair order.
7. Follow the service terminal instructions to select and repair a problem.
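Step 2 of this procedure is a small decision rule. As a hypothetical Python sketch (the function name and the labels "cluster 1", "cluster 2", and "either" are illustrative assumptions), it can be written as follows. If the chosen cluster does not show the login screen, step 4 still applies and you move to the other cluster.

```python
def cluster_to_connect(msg1_on: bool, msg2_on: bool) -> str:
    """Choose where to connect the service terminal (MAP 1210, step 2)."""
    if msg1_on and msg2_on:
        return "either"     # both clusters reported problems
    if msg1_on:
        return "cluster 1"  # only cluster 1 reported a problem
    if msg2_on:
        return "cluster 2"  # only cluster 2 reported a problem
    return "cluster 1"      # both indicators off: start with cluster 1
```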

MAP 1300: Isolating Cluster to Modem Communication Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster is not able to communicate with the modem expander or the modem. This error can occur for the following reasons:
- The modem expander or modem is powered off.
- The modem expander needs to be reset. Powering the modem expander off and on does not reset it; the SET and CLEAR buttons must be used to reset the modem expander. The service terminal configuration screens are then used to reload the initialization strings. This can be done only through cluster 1 in the 2105 Model 800: modem expander port 1 is always cabled to cluster 1 in one of the attached 2105 Model 800s, and the other modem expander ports do not have authority to accept the initialization string.
- The modem is hung and needs to be reset. Powering the modem off and on should clear the hang. To ensure that the modem is set correctly, use the service terminal configuration screens to reload the initialization strings.
- The cable between the modem expander and the modem, or between the cluster and the modem expander, is disconnected or damaged.
- One or more of the modem configuration settings in the cluster is not configured correctly.
The possible FRUs are:





- Modem
- Modem expander to modem cable, packaged with the modem expander
- Modem expander
- Cluster to modem expander cable (null modem cable)
- Cluster I/O drawer planar assembly

The service terminal Change / Show Modem Configuration option has two uses:
1. It displays the modem configuration settings. These can be compared with the values listed on the Communications Resources Work Sheet provided by the customer.
2. When the Enter key is pressed, it attempts to initialize the modem expander and then the modem, even if none of the displayed values have been updated. This is a pass/fail test: if the test fails, no reason for the failure is indicated.
Note: Any problems that were created while the modem was unavailable remain queued to be sent to the call home destination. If e-mail notification is enabled, these problems are sent to the customer by e-mail.

Isolation
1. Ensure that the modem expander and modem are powered on by observing their ON indicators.
2. Determine if the cluster to modem communication error is still present. Use the following procedure as a cluster to modem communication test. Display the Change / Show Modem Configuration screen. From the service terminal Main Service Menu, select:
      Configurations Options Menu
      Configure Communications Resources Menu
      Configure Call Home / Remote Services Menu
      Change / Show Modem Configuration
   Pressing Enter attempts to initialize the modem expander and modem. If this is not successful, an error message is displayed. The error message does not isolate the type of failure; this is a pass/fail test. For an explanation of Call Home return codes, see Table 16 on page 55.
3. Determine if the test passed or failed:
   - If the test failed (stopped with an error), go to step 4.
   - If the test was successful (completed OK), check that the modem can call the defined remote telephone numbers. From the service terminal Main Service Menu, select:
        Machine Test Menu
        Send Test Notification Menu
        Service Notification (via modem)
     - If the modem call is successful, go to MAP 1500: Ending a Service Action on page 67.
     - If the modem call fails, go to MAP 1301: Isolating Call Home / Remote Services Failure on page 55. For an explanation of Call Home return codes, see Table 16 on page 55.
4. If a problem is found and corrected in any of the following steps, go to step 14 on page 54.





5. Get a copy of the Communication Resources Work Sheet that the customer provided when this 2105 Model 800 was installed. Refer to work sheet section 6, Modem Configuration fields. Use the service terminal to display and correct these fields as needed. From the service terminal Main Service Menu, select:
      Configurations Options Menu
      Configure Communications Resources Menu
      Configure Call Home / Remote Services Menu
      Change / Show Modem Configuration
   As required, update the modem configuration to match the work sheet.
6. Verify that the modem expander to modem cable is plugged into modem expander Port 16 and the modem serial port.
7. Verify that the cluster to modem expander cable has the proper connectors installed at each end. There must be a null modem connector (labeled null) on one end and a standard connector (not labeled) on the other end. The null modem connector crosses signals so that the serial ports in the expander and cluster can be connected directly together without a pair of modems in between.
8. Check that the cluster to modem expander cable is plugged into cluster serial port S3 and the proper port in the modem expander. Refer to the Communication Resources Work Sheet section 6, Modem Configuration fields.
9. Power the modem off and then on.
10. Power the modem expander off and then on.
11. Determine if the other cluster in this 2105 Model 800 is also failing. Connect the service terminal to the other cluster and run the cluster to modem communication test again.
    - If only one cluster fails, call the next level of support.
    - If both clusters fail, continue with the next step.
12. Read the note below, then reset the modem expander.
    Note: Resetting the modem expander loads the factory default settings, which do not work with the 2105 Model 800. The modem expander must be initialized through port 1 after the reset. Locate the 2105 Model 800 whose cluster 1 is cabled to modem expander port 1, and ensure that the customer will give you access to it. The modem expander can attach up to seven 2105 Model 800s.
    a. Press and hold both the SET and CLEAR buttons.
    b. Release only the CLEAR button.
    c. Release the SET button.
13. Initialize the modem expander. Connect the service terminal to the cluster 1 that is cabled to modem expander port 1. Use the cluster to modem communication test to test and initialize the modem expander.
    - If the test fails, call the next level of support.
    - If the test is successful, continue with the next step.
14. Connect the service terminal to the original cluster that was failing and repeat the cluster to modem communication test.
    - If the test is successful, go to MAP 1500: Ending a Service Action on page 67.
    - If the problem has not been fixed and the cluster to modem communication test still fails, call the next level of support.





Table 16. Call Home Return Codes

Return Code  Description and Information
00  INITIALIZE_SUCCESSFUL. Note: No errors detected.
48  INITIALIZE_PARM_ERROR. Note: This is essentially an MLE error.
49  CLUSTER/EXPANDER/MODEM_ODM_ERROR. Note: This is a failure to access Call Home or other RAS ODM.
50  MODEM_DIAL_ERROR. Note: The same as return code 52, for the second number; check the configuration.
51  TTY/EXPANDER/MODEM_CONNECT_TIMEOUT. Note: This is actually a failure to connect or lock the tty, not necessarily a hardware failure.
52  MODEM_FAILED_TO_CONNECT. Note: The phone being called was either busy or did not answer; check the configuration.
53  TTY_MODEM_EXPANDER_BUSY. Note: NOT an error condition; another cluster is using the expander/modem.
54  MODEM_EXPANDER_CONFIG_ERROR. Note: Call Home is not configured correctly.
55  MODEM_WRITE_ERROR. Note: Bad response from the Call Home Catcher; may be bad phone lines or a Catcher failure.
56  MODEM_EXPANDER_TTY_ERROR. Note: Failure to connect the tty to the modem, not necessarily a hardware failure.
57  MODEM_RESET_ERROR. Note: Bad return from resetting the modem; may be a hardware problem, but no 1220 ESC is issued.
58  MODEM_EXPANDER_INIT_ERROR. Note: Failure to initialize the modem; can result in a 1220 ESC.
59  MODEM_EXPANDER_RESPONSE_ERROR. Note: Failure to receive a response from the Call Home Catcher, or the response was invalid.
60  MODEM_EXPANDER_NO_REPONSE. Note: Also known as MODEM_HANG_ERROR; can result in a 1220 ESC.
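When reading a Call Home return code, Table 16 can be treated as a lookup table. The following Python sketch is only an illustration: the dictionary and helper names are hypothetical, and it assumes the return code is available to you as an integer. The names are copied exactly as printed in Table 16.

```python
# Return codes and names as listed in Table 16 (names kept exactly as printed).
CALL_HOME_RETURN_CODES = {
    0: "INITIALIZE_SUCCESSFUL",
    48: "INITIALIZE_PARM_ERROR",
    49: "CLUSTER/EXPANDER/MODEM_ODM_ERROR",
    50: "MODEM_DIAL_ERROR",
    51: "TTY/EXPANDER/MODEM_CONNECT_TIMEOUT",
    52: "MODEM_FAILED_TO_CONNECT",
    53: "TTY_MODEM_EXPANDER_BUSY",
    54: "MODEM_EXPANDER_CONFIG_ERROR",
    55: "MODEM_WRITE_ERROR",
    56: "MODEM_EXPANDER_TTY_ERROR",
    57: "MODEM_RESET_ERROR",
    58: "MODEM_EXPANDER_INIT_ERROR",
    59: "MODEM_EXPANDER_RESPONSE_ERROR",
    60: "MODEM_EXPANDER_NO_REPONSE",
}

# Table 16: 00 means no errors, and 53 is NOT an error condition.
NOT_AN_ERROR = {0, 53}
# Table 16: these codes can result in a 1220 ESC.
MAY_CREATE_ESC_1220 = {58, 60}


def describe(rc: int) -> str:
    """Format a return code with its Table 16 name, or UNKNOWN if unlisted."""
    return f"{rc:02d} {CALL_HOME_RETURN_CODES.get(rc, 'UNKNOWN')}"


def is_error(rc: int) -> bool:
    """True unless Table 16 says the code is not an error condition."""
    return rc not in NOT_AN_ERROR
```

For example, `describe(53)` yields the busy condition, which needs no repair; another cluster is simply using the expander or modem.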

MAP 1301: Isolating Call Home / Remote Services Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Call Home / Remote Services has been configured on the storage facility (cluster) but the cluster cannot communicate with IBM.

Description
- If the ESS is NOT configured to send call home records to the ESSNet, this failure can occur for the following reasons:
  - The customer's analog phone line is not functional.
  - The phone numbers and the protocols defined for those phone numbers do not match.
  - A cabling problem exists between the cluster and the customer's phone line.
  Repair the problem with the Isolation steps in this MAP.
- If the ESS IS configured to send call home records to the ESSNet, this failure can occur for the following reasons:
  - There are communication problems between the ESSNet console and the cluster.
  - The ESSNet console has operating problems.
  Go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405 to repair the problem.

Isolation
1. Verify that the phone number or numbers being used are valid and that the customer phone line is functional:
   a. Connect the customer's analog phone line to a telephone set.
   b. Call the phone number or numbers defined in the Configure Call Home / Remote Services Menu. If a modem answers, hang up and reconnect the customer's phone line to the modem. Continue with step 2.
2. Verify that the cabling between the cluster and the modem is functional.
   a. Review MAP 1300: Isolating Cluster to Modem Communication Problems on page 52.
   b. Repair any problems found. If no problem is found, go to step 3.
3. Determine if the protocol for a phone number is correct:
   a. Call the next level of support. Have them confirm that the required PE protocol or RETAIN protocol matches the phone number or numbers being used.
   b. If the problem is not resolved, call the next level of support again.

MAP 1305: Isolating SNMP Notification Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The ESS can send SNMP trap messages to the customer's LAN when it requires service or to notify the customer of certain information events. The ESS generates Simple Network Management Protocol (SNMP) traps and supports a read-only management information base (MIB) to allow monitoring by the customer's network.
Note: SNMP is required for PPRC status reporting for ESS Copy Services in Open Systems environments.
For SNMP to function, the 2105 must be installed on the customer LAN with TCP/IP addresses and Ethernet cables. The SNMP trap is sent to the TCP/IP addresses of the trap destinations that are configured by the CE through the service panels or by the customer through the web interface.
Note: Specify these addresses in dotted TCP/IP address form. You can use the dotted hostname form; however, if the name server is down, the SNMP trap will not be received.
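A trap destination entered as a hostname depends on the name server, while a dotted address does not. The following Python sketch, using the standard `ipaddress` module, flags worksheet entries that would need name resolution. The helper name and the sample entries are hypothetical; the ESS itself does not run this code.

```python
import ipaddress


def needs_name_server(destination: str) -> bool:
    """True if a trap destination is not an IP address literal.

    Such an entry is treated as a hostname, so delivery of the trap
    depends on the name server being available.
    """
    try:
        ipaddress.ip_address(destination)
    except ValueError:
        return True
    return False
```

For example, `needs_name_server("nms.example.com")` is true, so that destination stops receiving traps whenever the name server is down.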





The ESS supports only the basic function of sending an SNMP trap. Typically, the SNMP trap is seen only in the event log of the management system being used. If the customer wants further customization, they must use the customization functions of that management system.

Isolation
1. Determine if the problem is still occurring. From the service terminal Main Service Menu, select:
      Machine Test Menu
      Send Test Notification Menu
      Customer Notification (via SNMP)
   Did the customer receive the SNMP test notification?
   - Yes, exit this MAP. The problem is no longer occurring.
   - No, continue with the next step.
   Notes:
   a. This test procedure ONLY sends a test SNMP trap to all configured destinations.
   b. This test completes with "Customer Notification Test Results: Passed. No Problem Detected." This message means that the 2105 sent the SNMP trap messages, not that the customer SNMP trap destinations received them.
   c. Have the customer inspect the event log of the associated management system to see if they have received an SNMP trap from the ESS.
2. Verify that SNMP is correctly configured and enabled. Use your ESS Communication Worksheet and the SNMP Menu options. From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      SNMP Menu
   Note: The addresses should be in dotted TCP/IP address form. You can use the dotted hostname form; however, if the name server is down, the SNMP trap will not be received.
3. Use the ping test to verify connectivity to the SNMP trap destination or destinations. From the service terminal Main Service Menu, select:
      Machine Test Menu
      External Connections Menu
      LAN Test
   On the Ping Test for LAN panel, fill in the TCP/IP address or hostname of the trap destination. Press Enter to run the test. Was the ping test successful?
   - Yes, call the next level of support. The ping test works, but the SNMP test notification fails.
   - No, continue with the next step.
4. Use the ping test from step 3 to attempt to ping the other cluster in this 2105. Was the ping test successful?
   - Yes, the ping test works to the other cluster but fails to the customer network. Work with the customer to resolve the network problem.
   - No, go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377.

MAP 1310: Isolating E-Mail Notification Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The ESS can send e-mail messages to the customer's LAN when it requires service or to notify the customer of certain information events. For e-mail to function, the 2105 must be installed on the customer LAN with TCP/IP addresses and Ethernet cables. E-mail messages are sent to the addresses that are configured by the CE through the service panels or by the customer through the web interface.
The ESS generates e-mail messages in the following categories:
- Errors
- Information
Examples of Information messages are notifications that:
- A new level of Licensed Internal Code (LIC) has been installed.
- New hardware has been installed.
- The service provider has run the customer-notification diagnostic test. This test verifies that e-mail messages are being received by those who should receive them.
The ESS sends Error messages when it detects a situation that requires customer action. Note that the test e-mail notification is sent only to e-mail users who are configured to receive Information messages.
Three basic SMTP environments are supported for e-mail. The default choice is to use the Domain Nameserver, where you specify the domain name on the TCP/IP configuration panel. The other two choices, Smart Host Relay e-mail and Local e-mail, must be selected from the E-Mail Menu under the Configure Communications Resources Menu.
Microsoft Exchange and Lotus Notes are NOT SMTP environments. To send e-mail to these environments, the customer must have a Message Transfer Agent (MTA) installed and configured; without this MTA, e-mail cannot be delivered to these environments.

Isolation
1. Determine if the problem is still occurring. From the service terminal Main Service Menu, select:
      Machine Test Menu
      Send Test Notification Menu
      Customer Notification (via E-Mail)
   Notes:
   a. This test procedure ONLY sends a test message to the destinations that are configured for Information type messages. (For example, destinations configured for Information or All receive the test message; destinations configured for Error or None do not.)
   b. This test completes with "Customer Notification Test Results: Passed. No Problem Detected." This message means that the 2105 sent the e-mail messages, not that the customer e-mail destinations received them. This procedure helps to isolate an e-mail problem.
   Did the customer receive the e-mail test notification?
   - Yes, exit this MAP. The problem is no longer occurring.
   - No, continue with the next step.
2. Verify that e-mail is correctly configured and enabled. Use the ESS Communication Worksheet and the E-Mail Menu options. From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      E-Mail Menu
3. Verify that the e-mail address or destination is configured for Information or All types of e-mail. If the e-mail address is configured only for Error messages, it will not receive the test messages, because these are considered Information messages. From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      E-Mail Menu
      Change / Show E-Mail Destination Settings
4. Use the e-mail address to determine the hostname of the mail server.
   Note: For example, if the customer e-mail address is joe_customer@friendly.com, the userid is joe_customer and the hostname of the mail server is friendly.com.
5. Use the ping test to verify connectivity to the mail server hostname. From the service terminal Main Service Menu, select:
      Machine Test Menu
      External Connections Menu
      LAN Test
   On the Ping Test for LAN panel, fill in the hostname of the mail server. Press Enter to run the test.
6. Was the ping test successful?
   - Yes, go to step 10 on page 60.
   - No, continue with the next step.
7. The ping test fails using the hostname. Ask the customer for the TCP/IP address of the mail server and retry the ping test using the TCP/IP address.
8. Determine if the TCP/IP address ping test was successful.
   - If the TCP/IP address ping test was not successful, work with the customer's IT administrator to resolve the IP address problems.
   - If the TCP/IP address ping test was successful, you need to find out why the cluster cannot resolve the mail server hostname to an address.
   - Verify that the name server is correctly specified. From the service terminal Main Service Menu, select:
        Configuration Options Menu
        Configure Communications Resources Menu
        Change / Show TCP/IP Configuration
        Minimum Configuration & Startup
     Choose en0 or et0 for the interface that you are using. Verify the Nameserver and Gateway configuration. If the ping test with the hostname is still not successful, work with the IT administrator to resolve the DNS problem.
9. Determine if Local e-mail is configured.
   - If Local e-mail is not configured, go to step 10.
   - If Local e-mail is configured, verify or add the hostname and IP address in the /etc/hosts file. From the service terminal Main Service Menu, select:
        Configuration Options Menu
        Configure Communications Resources Menu
        Change / Show TCP/IP Configuration
        Further Configuration
        Name Resolution
        Hosts Table (/etc/hosts)
        Add a Host (/etc/hosts)
     Add the mail server hostname and TCP/IP address.
10. If the mail server hostname ping test is successful, collect the following information:
    a. Does the customer have other UNIX-based systems in their environment that send e-mail correctly? If so, collect the sendmail.cf file from one of those systems and call your next level of support. (This configuration file shows how the customer made another SMTP-based host system work in their environment.)
    b. What type of e-mail system does the customer have installed (for example, Microsoft Exchange or Lotus Notes)? Do they have a Message Transfer Agent (MTA) installed that translates SMTP mail from the ESS cluster to the requirements of their e-mail system? Do they have a firewall preventing the e-mail transfer?
    Make note of your answers from the above procedure, collect your ESS Communication Resources worksheets, and then call your next level of support.
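Step 4 of this procedure takes the mail server hostname from the part of the e-mail address after the "@". A minimal Python sketch of that rule follows; the function name is hypothetical. Be aware that, in general, a domain's mail can be handled by a different host published in DNS MX records, so confirm the actual mail server with the customer if the ping test results are unclear.

```python
def mail_host_from_address(address: str) -> str:
    """Return the host part of an e-mail address (MAP 1310, step 4)."""
    user, sep, host = address.strip().rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"not an e-mail address: {address!r}")
    return host.lower()
```

Applied to the example in step 4, `mail_host_from_address("joe_customer@friendly.com")` gives `friendly.com`, the name to enter on the Ping Test for LAN panel.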

MAP 1320: Isolating Problems Using Visual Symptoms


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Most visual symptoms create a related problem which should be used to start the problem repair. If a related problem was not created, the table below can be used to start the repair.




Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
- Locate your visual symptom in the following tables, then follow the description and actions:
  - 2105 Model 800 operator panel: use Table 17.
  - 2105 Model 800 rack, cluster, and storage bay: use Table 18 on page 63.
  - 2105 Model 800 CEC drawer, I/O drawer, and host bay: use Table 19 on page 64.
  - DDM bay and DDMs: use Table 21 on page 66.
Table 17. 2105 Model 800 Operator Panel Visual Symptoms

Visual Symptom: Operator panel cluster Message indicator is on.
Description: A problem has been logged in that cluster. The indicator goes off when a service terminal logs in to that cluster.
Action: Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to begin the repair.

Visual Symptom: 2105 Model 800 operator panel cluster Ready indicator is off.
Description: During cluster power on and code load, status codes are displayed on the CEC drawer operator panel. When the code load is complete, the CEC drawer operator panel Ready indicator LED is lit. The LED is set to off when a cluster is fenced and a problem is created.
Notes:
1. If you have switched off a PPS, the indicator is lit only by the LED controlled from the RPC card that is still powered on, so a failing operator panel indicator LED can make the Ready indicator appear off. If the Ready indicator switches on when the PPS is powered on again, there is no LED problem.
2. The code can switch off the cluster Ready indicator even when the cluster is still ready. In this case the cluster allows a service terminal login, the Repair Menu, End Of Call Status option shows no related problem, and the cluster is not fenced or quiesced. The Ready indicator returns to normal operation when the cluster code is loaded again.
3. If the Repair Menu, End Of Call Status option shows the cluster fenced, but the Repair Menu, Show / Repair Problems Needing Repair option shows no related problem, call the next level of support. This indicates a code problem: all fencing should create a problem that defines what needs to be repaired. Do not reset the fence condition without first repairing the cluster.
Description during normal operation:
- Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to repair any related cluster problems. If there are none, continue.
- Observe the CEC drawer operator panel. If it is displaying a code, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If it is not, continue.
- There is no single point of hardware failure that can cause the operator panel Ready indicator to fail. (Behind each cluster Ready indicator are two LEDs, each controlled by a single RPC card.) Call the next level of support.




Table 17. 2105 Model 800 Operator Panel Visual Symptoms (continued)

Visual Symptom: 2105 Model 800 operator panel cluster Ready or Message indicator is dimly lit.
Description: This is normal if one primary power supply and its RPC card have been powered off for service. Each Ready or Message indicator is lit by two LEDs behind the operator panel indicator lens, one controlled by each RPC card. Both RPC cards receive the same signal from a cluster, and each lights one LED behind the same lens. If one of the LEDs is not working, the operator panel indicator appears dimly lit.
Action:
- Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to repair any related cluster bay problems. If there are none, continue.
- To determine which LED is not lit, remove the four screws that fasten the label panel to the front of the operator panel. Observe the two LEDs that are side by side. The left LED is driven by the RPC-1 card; the right LED is driven by the RPC-2 card.
- The possible failing FRUs are the operator panel, the RPC card, or the RPC card to operator panel cable. Use the Main Service Menu, Repair Menu, Replace a FRU option.

Visual Symptom: Both operator panel Line Cord indicators are off.
Description: This is a normal condition when the 2105 Model 800 is powered off; both primary power supplies (PPS) then have some indicators on. It also occurs if both customer line cords lose power or both PPS input circuit breakers are in the off position; both PPS then have all indicators off.
Action: If both PPS have some indicators on, no action is needed. If both PPS have all indicators off, ensure that the PPS input circuit breakers are on and have the customer restore line cord power.

Visual Symptom: One operator panel Line Cord indicator is off; the other Line Cord indicator is on.
Description: Primary power supply (PPS) input power section problem.
Action:
1. Use the service terminal to display and repair any related power problems.
2. Observe the PPS front status display. If any codes are displayed, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
3. Observe the PPS front LED indicators. If the PPS Good LED (middle) indicator is on, the operator panel Line Cord indicator should also be on. Because it is off, the indicator circuit is either not active or broken, or the indicator itself is bad. One of the following FRUs is failing:
   - PPS
   - PPS to RPC cable (PPS connector J4) (2105 Model 800 only)
   - RPC card (for that PPS) (2105 Model 800 only)
   - RPC to Operator Panel cable (RPC connector J2) (2105 Model 800 only)
   - PPS to Operator Panel cable (PPS connector J2) (2105 Expansion Enclosure only)

Visual Symptom: Operator panel Line Cord indicator slow blinking.
Description: The indicator slow blinks if a problem has been detected. A code is displayed in the primary power supply (PPS) status display.
Action: Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.

Visual Symptom: Operator panel Line Cord indicator fast blinking.
Description: The indicator fast blinks while the cluster is powering on.
Action: None. Wait up to three minutes for the cluster power on to complete.
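The dimly lit indicator check in Table 17 maps the unlit LED to the RPC card that drives it. The following Python sketch is a hypothetical aid (the names are illustrative, not part of any service tool) that returns the possible failing FRUs for the side that is dark. It assumes that both PPS and both RPC cards are powered on, because with one PPS powered off for service a dim indicator is normal.

```python
from typing import List

# Which RPC card drives each LED behind an operator panel indicator lens.
LED_DRIVER = {"left": "RPC-1", "right": "RPC-2"}


def dim_indicator_suspects(left_lit: bool, right_lit: bool) -> List[str]:
    """Possible failing FRUs for a dimly lit Ready or Message indicator.

    Assumes both PPS and both RPC cards are powered on; with one PPS
    powered off for service, a dim indicator is normal.
    """
    unlit = [side for side, lit in (("left", left_lit), ("right", right_lit))
             if not lit]
    if len(unlit) != 1:
        return []  # both lit (normal) or both dark (not a dim indicator)
    rpc = LED_DRIVER[unlit[0]]
    return ["operator panel", f"{rpc} card", f"{rpc} card to operator panel cable"]
```

The FRU replacement itself is still done through the Main Service Menu, Repair Menu, Replace a FRU option.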





Table 18. 2105 Model 800 PPS and RPC Card Visual Symptoms

Visual Symptom: Both primary power supplies (PPS) have all indicators off.
Description: This occurs when both customer line cords lose power, or both PPS input circuit breakers are in the off position.
Action: Ensure that the PPS input circuit breakers are on and have the customer restore line cord power.

Visual Symptom: A code is displayed in the primary power supply (PPS) status display.
Description: The PPS has detected an error condition.
Action: Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.

Visual Symptom: Primary power supply (PPS) indicators.
Description: There are five PPS indicators, which can be as listed here:
1. UEPO PWR/STBY indicator is lit when customer line voltage input is available to the PPS. A code is displayed in the PPS status display.
2. UEPO Loop CMPLT indicator is lit when customer line voltage input is available to the PPS and the UEPO switch is in the normal position. A code is displayed in the PPS status display.
3. PPS Good indicator slow blinks in standby mode when the 2105 Model 800 is off. The indicator is on when the 2105 Model 800 is powered on.
4. PPS Fault indicator slow blinks when a fault has been detected. A code is displayed in the PPS status display.
5. On Batt indicator is lit only when customer power to both line cords has been lost. The 2105 Model 800 completes writing the customer data in cache to the DDMs and then powers off within 5 minutes.
Action: Use the other visual symptoms in this table to correct any problems.

Visual Symptom: Primary power supply (PPS) status display and indicators are off. The other PPS has a status display code of 06.
Description: The PPS has no customer line cord power and the PPS to PPS communication is failing.
Action: Read the Attention below before continuing. Ensure that the communication cable is connected to PPS connector J3 at both ends. If it is, replace the cable. The status code 06 resets automatically when communication is again successful. Go to MAP 2340: PPS Status Code 06 on page 125.
Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. Straightening the pins is not recommended, because they may easily bend again. Replace the PPS.

Visual Symptom: RPC card indicator is off.
Description: The RPC card indicator LED is off when the 2105 is powered off. The indicator is on when the 2105 is powered on, the RPC card is receiving power from the PPS, and there is no RPC error. If an RPC error is detected, the indicator is switched off, the RPC card is fenced (removed from use), and a problem is created.
Action: Use the service terminal to display and repair any related problems. If there are none, observe the primary power supply (PPS) status code display at the front of the rack. If a code is displayed, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. If a code is not displayed and the Repair Menu, End Of Call Status option shows that the RPC card is fenced, call the next level of support. This indicates a code problem: all fencing should create a problem that defines the needed repair. Do not reset the fence condition without first repairing the cluster.

Visual Symptom: Primary power supply (PPS) input circuit breaker is tripped.
Description: An over-current condition in the PPS has occurred. The PPS digital status display between the front PPS fans may display a code of 16.
Action: Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.

Problem Isolation Procedures, CHAPTER 3

63

MAP 1320: Visual Symptoms


Table 18. 2105 Model 800 PPS and RPC Card Visual Symptoms (continued)

Visual Symptom: The primary power supply (PPS) output circuit breaker is tripped.
Description: An over-current condition outside the PPS has occurred. The PPS digital status display between the front PPS fans may display a code of 13.
Action: Go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168.

Table 19. 2105 Model 800 CEC, I/O, and Host Bay Visual Symptoms

Visual Symptom: The CEC operator panel is blank, or is stopped with a progress code displayed.
Description: During cluster power on and code load, status codes are displayed, some for seconds and others for minutes. An error condition is occurring if a code is displayed for more than 10 minutes. The alternate cluster may have created a problem for the failing cluster. That problem may contain specific problem information, or may only report that there is no communication with the failing cluster.
Action: Connect the service terminal to the working cluster, then display and repair any related problems for the failing cluster. If there are no problems, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.

Visual Symptom: The CEC drawer power LED (front lower left) is off or blinking.
Action: This is normal if the cluster has been powered off for service. If the cluster is not being serviced, go to MAP 4880: Cluster Power On Problem on page 461.

Visual Symptom: The I/O drawer power LED on the CEC drawer operator panel (upper left) is off or blinking.
Action: This is normal if the cluster has been powered off for service. If the cluster is not being serviced, go to MAP 4880: Cluster Power On Problem on page 461.

Visual Symptom: CEC drawer or I/O drawer power supply:
• The input power indicator LEDs PWR 1 and PWR 2 are not both on (green), or
• The CHK/POWER GOOD indicator LED is off or on solid amber.
Description during normal operation:
• The PWR 1 and PWR 2 indicator LEDs are both on (green).
• The CHK/POWER GOOD indicator LED is on (green).
If the three LEDs are not all in their normal state (on, green), continue with the action.
Action: Go to MAP 2800: CEC or I/O Drawer Visual Power Supply Problem on page 171.

Visual Symptom: Host Bay drawer power supply:
• The input power indicator LED INPUT PRESENT is not on (green), or
• The POWER ON: HA1 and HA2 indicator LEDs are not both on (green).
Description during normal operation:
• The INPUT PRESENT indicator LED is on (green).
• The POWER ON: HA1 and HA2 indicator LEDs are both on (green).
If the three LEDs are not all in their normal state (on, green), continue with the action.
Action: Go to MAP 2810: Host Bay Drawer Visual Power Supply Problem on page 174.
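The PPS status-code routing in Table 18 can be summarized as a small lookup. The following Python sketch is a hypothetical aid only, not a tool supplied with the 2105; the function and constant names are assumptions. It covers only the codes that Table 18 names (06, 13, and 16) and, as the table does, sends any other displayed code to MAP 2350.

```python
# Hypothetical helper: mirrors the PPS status-code routing in Table 18.
# It is not part of the 2105 service code; names here are illustrative only.
from typing import Optional

MAP_2340 = "MAP 2340: PPS Status Code 06 (page 125)"
MAP_2350 = "MAP 2350: Isolating PPS Status Indicator Codes (page 127)"
MAP_2520 = "MAP 2520: PPS Output Circuit Breaker Tripped (page 168)"

# Codes that Table 18 names explicitly.
PPS_CODE_TO_MAP = {
    "06": MAP_2340,  # on the other PPS while this PPS is dark: PPS-to-PPS communication failing
    "13": MAP_2520,  # output circuit breaker tripped (over-current outside the PPS)
    "16": MAP_2350,  # input circuit breaker tripped (over-current in the PPS)
}


def map_for_pps_code(code: Optional[str]) -> Optional[str]:
    """Return the MAP for a PPS status-display code, or None if no code is shown."""
    if code is None or not code.strip():
        return None
    # Any other displayed code means the PPS detected an error condition.
    return PPS_CODE_TO_MAP.get(code.strip(), MAP_2350)
```

For example, map_for_pps_code("13") returns the MAP 2520 entry. For code 06, the table still asks you to check the J3 communication cable before following MAP 2340.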


VOLUME 1, TotalStorage ESS Service Guide



Table 20. 2105 Model 800 Storage Bay Visual Symptoms

Visual Symptom: Storage bay power supply indicators:
• The PWR J1 and J2 indicators are not both on (green).
Description during normal operation:
• The PWR J1 and J2 indicators are both on (green).
These two indicators monitor the DC input voltage to the power supply. They are green when DC input voltage from the primary power supply is present.
Action: Go to MAP 3387: Isolating a Storage Cage Power Supply Failure on page 251.

Visual Symptom: Storage bay power supply indicators:
• The CHK/POWER GOOD indicator is not on (green), or
• The CHK/POWER GOOD indicator is on (amber).
Description during normal operation:
• The CHK/POWER GOOD indicator is on (green). This indicator is green when power is on normally.
• If it is off, the power supply is not operating at all.
• If it is on (amber), the power supply has detected a fault and has partly or completely powered off.
Action: Go to MAP 3387: Isolating a Storage Cage Power Supply Failure on page 251.

Visual Symptom: Storage bay FAN/POWER SUPPLY CHECK summary indicator:
• The CHECK indicator is on (amber).
Description during normal operation:
• The CHECK indicator is normally off.
This indicator is off during normal operation. If it is on, the fan/power sense card has detected a storage bay or power supply failure.
Action: Go to MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On on page 246.

Visual Symptom: Storage bay FAN POWER SENSE CARD CHECK indicator:
• The CARD CHECK indicator is on (amber).
Description during normal operation:
• The CARD CHECK indicator is normally off.
This indicator is off during normal operation. If it is on, the fan/power sense card is failing.
Action: Go to MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error on page 245.

Visual Symptom: A cooling fan is not turning.
Description during normal operation:
• All cooling fans should be turning.
A problem should have been created for a fan that is not turning.
Action: Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to repair the fan.
Note: If no problem exists, there are two problems: the fan and the fan detection circuitry. Call the next level of support before replacing the fan.

Visual Symptom: The input power indicator is off for a CEC, I/O, or Host Bay drawer power supply.
Description: The power supply is not detecting input power from the input connector associated with that indicator.
Action: Use the service terminal to display and repair any related problems. If there are no problems, go to MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected on page 120.

Visual Symptom: The CHK/PWR Good amber indicator is on for a CEC or I/O drawer power supply.
Description: A power fault has been detected.
Action: Use the service terminal to display and repair any related problems. If there are no problems, go to MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault on page 122.
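The storage bay indicator rows of Table 20 can be read as a set of independent checks. This Python sketch is a hypothetical illustration, not a 2105 tool: the data class, field names, and the idea of recording each indicator as a boolean or color string are assumptions. Table 20 does not state a precedence between rows, so the sketch simply lists every matching MAP in table order.

```python
# Hypothetical helper: mirrors the storage bay indicator rows of Table 20.
from dataclasses import dataclass
from typing import List

MAP_3378 = "MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error (page 245)"
MAP_3379 = ("MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card "
            "Check Summary Indicator On (page 246)")
MAP_3387 = "MAP 3387: Isolating a Storage Cage Power Supply Failure (page 251)"


@dataclass
class StorageBayIndicators:
    pwr_j1_green: bool         # PWR J1: DC input voltage from the PPS present
    pwr_j2_green: bool         # PWR J2: DC input voltage from the PPS present
    chk_power_good: str        # "green" (normal), "amber" (fault), or "off" (not operating)
    check_summary_amber: bool  # FAN/POWER SUPPLY CHECK summary indicator
    card_check_amber: bool     # FAN POWER SENSE CARD CHECK indicator


def storage_bay_maps(ind: StorageBayIndicators) -> List[str]:
    """List the MAPs whose Table 20 symptoms match, in table order."""
    maps: List[str] = []
    supply_abnormal = (
        not (ind.pwr_j1_green and ind.pwr_j2_green) or ind.chk_power_good != "green"
    )
    if supply_abnormal:
        maps.append(MAP_3387)  # first two rows both route to MAP 3387
    if ind.check_summary_amber:
        maps.append(MAP_3379)
    if ind.card_check_amber:
        maps.append(MAP_3378)
    return maps
```

An empty list means every storage bay indicator is in its normal state.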



Table 21. DDM Bay and DDMs Visual Symptoms

Visual Symptom: Lights on disk drive modules, DDM bay:
• The green DDM ready indicator is off, or
• The amber DDM check indicator is on.
For indicator locations, see DDM Bay Disk Drive Module Indicators on page 23.
Description during normal operation:
• The green DDM ready indicator is on, and
• The amber DDM check indicator is off.
Action: Look at all of the above indicators on all of the DDMs in the DDM bay.
• If all of the indicators on all of the DDMs in the DDM bay are off, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
• If the DDM indicators are not as described above, go to MAP 3520: DDM Bay Verification for Possible Problems on page 284.

Visual Symptom: Controller card DDM Check indicator, DDM bay:
• The Check indicator is on (amber).
Description during normal operation:
• The Check indicator is normally off.
This indicator is off during normal operation. If it is on, the DDM bay controller card has detected a failure in the DDM bay.
Action: Go to MAP 3520: DDM Bay Verification for Possible Problems on page 284.

Visual Symptom: Controller card CHECK indicator, DDM bay:
• The Card Check indicator is on (amber).
Description during normal operation:
• The Card Check indicator is normally off.
This indicator is off during normal operation. If it is on, the DDM bay controller card is failing.
Action: Go to MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem on page 263.
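The DDM bay rows of Table 21 depend on looking at every DDM in the bay before choosing a MAP. The following Python sketch is a hypothetical illustration of that decision, not a 2105 tool; the class, function names, and boolean encoding of each light are assumptions. "All indicators on all DDMs off" is checked first, and any DDM that is not in the normal ready-on, check-off state routes to MAP 3520.

```python
# Hypothetical helper: mirrors the DDM bay rows of Table 21.
from dataclasses import dataclass
from typing import List, Optional

MAP_3395 = "MAP 3395: Isolating a DDM Bay Power Problem (page 261)"
MAP_3397 = "MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem (page 263)"
MAP_3520 = "MAP 3520: DDM Bay Verification for Possible Problems (page 284)"


@dataclass
class DdmLights:
    ready_green_on: bool  # green DDM ready indicator
    check_amber_on: bool  # amber DDM check indicator


def ddm_lights_map(ddms: List[DdmLights]) -> Optional[str]:
    """Apply the 'Lights on disk drive modules' row to all DDMs in one bay."""
    if not ddms:
        raise ValueError("look at the indicators on all DDMs in the bay")
    if all(not d.ready_green_on and not d.check_amber_on for d in ddms):
        return MAP_3395  # every indicator on every DDM is off
    if any(not d.ready_green_on or d.check_amber_on for d in ddms):
        return MAP_3520  # at least one DDM is not in the normal state
    return None  # all DDMs show ready on, check off


def controller_card_maps(check_on: bool, card_check_on: bool) -> List[str]:
    """Controller card Check and Card Check rows (each is amber when on)."""
    maps: List[str] = []
    if check_on:
        maps.append(MAP_3520)
    if card_check_on:
        maps.append(MAP_3397)
    return maps
```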

MAP 1460: Isolating E-Mail Reported Errors


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A problem was created by one of the 2105 clusters. The problem was stored, and an e-mail copy of it was sent to the e-mail destination(s). The 2105 operator panel Message indicator for the reporting cluster should be on steady (not blinking). The customer may have given you a copy of the e-mail, or may only have told you that an e-mail was sent. Use the service terminal to display and then repair the problem.

Procedure
Use the following to begin the problem repair. If you have a copy of the e-mail problem and this Service Guide, you may be able to plan the service action before arriving at the 2105 Model 800. The problem displays the FRUs and/or the isolation procedures used to determine the FRUs. Go to MAP 1210: Displaying and Repairing a Problem on page 51.

MAP 1480: Replacing a FRU Without Using a Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
Occasionally you may need to replace a FRU that is not failing and for which no problem has been logged. The following procedure uses the service terminal functions to replace such a FRU.

Procedure
1. Select a FRU for replacement. From the service terminal Main Service Menu, select:
Repair Menu
Replace a FRU
then one of the following FRU areas:
• Cluster FRUs
• Host Bay FRUs
• DDM Bay FRUs
• Rack Power Cooling FRUs
• Device Power Cooling FRUs
• Electronics Cage Power Cooling FRUs
Select the FRU area and press Enter. Select the FRU in the FRU area and press Enter.
2. Follow the service terminal instructions.

MAP 1500: Ending a Service Action


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Before leaving the customer account, the following actions are needed:
• Ensure that the problem just repaired has been closed. If it has not, use the menu option to close it.
Note: Closing or cancelling a problem attempts to return any fenced or quiesced resources to customer use.
• Ensure that any resources associated with the repair have been returned to customer use.
• Ensure that any other resources not available for customer use are associated with problems still needing repair. Plan to repair those problems.

Procedure
1. If the service terminal repair process did not automatically close the problem, close it now. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
Repair Menu
Close a Previously Repaired Problem
Note: Closing or cancelling a problem attempts to return any fenced or quiesced resources to customer use. If the problem was not fully repaired, the existing problem may be updated or a new problem created.
2. Use the service terminal option listed below to ensure that all resources for this repair have been returned to customer use; returned resources are not listed. Any listed resources are not available for customer use and are still quiesced or fenced. Those resources should have a related problem listed that still needs repair. If resources are listed but no problems are listed, call the next level of support. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
Repair Menu
End of Call Status
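The interpretation of the End of Call Status listing in step 2 reduces to three outcomes. This Python sketch is a hypothetical illustration only; the function name and the idea of passing the listed resources and problems as two lists are assumptions, not part of the service terminal.

```python
# Hypothetical helper: mirrors the End of Call Status check in MAP 1500, step 2.
from typing import Sequence


def end_of_call_verdict(listed_resources: Sequence[str],
                        listed_problems: Sequence[str]) -> str:
    """Interpret the Repair Menu, End of Call Status listing."""
    if not listed_resources:
        # Returned resources are not listed, so an empty list means all were returned.
        return "All resources have been returned to customer use."
    if not listed_problems:
        # Resources are still quiesced or fenced, but no problem explains them.
        return "Call the next level of support."
    # Listed resources should each have a related problem that still needs repair.
    return "Plan to repair the listed problems that still need repair."
```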

MAP 1600: ESSNet Console Problem


Description
The ESSNet Console platform has a software or hardware problem.

ESSNet Console Repair Process


The ESSNet Console is an off-the-shelf personal computer (PC) that has been converted into an ESSNet Console. The repair process for the ESSNet Console is:
1. Repair the personal computer.
2. Restore the personal computer's software.
3. Convert the personal computer to an ESSNet Console.

Isolation:
Note: If you are not trained on repairing personal computers, have the ESSNet PC repaired by a qualified technician.
1. A problem with the ESSNet Console is occurring. Find the description that applies:
• Hardware problem other than the hard drive: go to step 2.
• Hardware problem with the hard drive: go to step 3.
• Software problem with the ESSNet Console's operating system: go to step 4 on page 69.
• Software problem with the ESSNet Console application software: go to step 5 on page 69.
• For other problems, call your next level of support.
2. Repair any hardware problem with the keyboard, mouse, display, or server platform using the repair procedures in MAP 1602: Repairing the ESSNet Console's Personal Computer on page 69. After the repair is complete, ensure that the ESSNet Console applications function properly. If they do not, continue to step 4 on page 69 to reload the ESSNet Console's operating system.
3. This step replaces the ESSNet Console's hard drive. Read the note below before proceeding. If you have determined that the hard drive is failing, use the repair procedures in MAP 1602: Repairing the ESSNet Console's Personal Computer on page 69 to replace the hard drive. After the hard drive is replaced, continue with the next step.
Note: If you are here because you saw the message "Drive does not contain a valid boot sector", no operating system was found on the hard drive. Go to step 4 on page 69 and attempt to restore the PC operating system. If that is unsuccessful, return here and continue with this step to replace the hard drive.



4. The personal computer's operating system needs to be reloaded, because either the hard drive has been replaced or there was an operating system problem that could not be recovered. Perform the procedures in MAP 1604: Restoring the Personal Computer's Software. After the personal computer's software is installed, continue with the next step.
5. The ESSNet Console's application software must be installed and configured. Perform the procedures in MAP 1606: Converting the Personal Computer to an ESSNet Console on page 76.
6. END of MAP. Return to the MAP or procedure that sent you here.
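Step 1 of the MAP 1600 isolation procedure is a dispatch from the kind of problem to the step that handles it. The following Python sketch restates that dispatch; it is a hypothetical aid, and the enum and function names are assumptions. It also encodes the note in step 3: a "Drive does not contain a valid boot sector" message is first treated as an operating system restore (step 4) before the hard drive is replaced.

```python
# Hypothetical helper: mirrors step 1 of the MAP 1600 isolation procedure.
from enum import Enum


class ConsoleProblem(Enum):
    HARDWARE_OTHER = "hardware problem other than the hard drive"
    HARD_DRIVE = "hardware problem with the hard drive"
    OPERATING_SYSTEM = "software problem with the operating system"
    APPLICATION = "software problem with the ESSNet Console application software"
    OTHER = "any other problem"


NEXT_ACTION = {
    ConsoleProblem.HARDWARE_OTHER: "step 2",    # repair the PC using MAP 1602
    ConsoleProblem.HARD_DRIVE: "step 3",        # replace the hard drive using MAP 1602
    ConsoleProblem.OPERATING_SYSTEM: "step 4",  # restore the PC software using MAP 1604
    ConsoleProblem.APPLICATION: "step 5",       # convert the PC using MAP 1606
    ConsoleProblem.OTHER: "call your next level of support",
}


def next_action(problem: ConsoleProblem, no_valid_boot_sector: bool = False) -> str:
    """Pick the MAP 1600 step for a console problem."""
    if problem is ConsoleProblem.HARD_DRIVE and no_valid_boot_sector:
        # Step 3 note: try restoring the operating system (step 4) first.
        return NEXT_ACTION[ConsoleProblem.OPERATING_SYSTEM]
    return NEXT_ACTION[problem]
```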

MAP 1602: Repairing the ESSNet Console's Personal Computer


Description
The ESSNet Console's personal computer has a hardware problem.

Repairing the Personal Computer


Because the ESSNet Console is an off-the-shelf personal computer, it should be repaired by a person who is trained on repairing PCs. Several levels of repair assistance are available. The following list shows the preferred order of service:
1. A person trained on repairing personal computers.
2. The IBM Technical Support Line at 1-800-IBM-2472.
3. The IBM Personal Systems Help Center at 1-800-772-2227.
4. The personal computer Hardware Maintenance Manuals on the Service Document CD-ROM (SK2T-8771) shipped with each 2105 Model 800.
5. IBM Personal Computing Support on the Internet at http://www.ibm.com/pc/support.
6. In emergency situations, the IBM 2105 Field Support Center can authorize shipment of a replacement ESSNet Console.
7. If you are repairing a Master Console, get the PC Doctor CD-ROM or diskette from the ship group. Insert the CD-ROM or diskette into the ESSNet Console, boot the console, and then use PC Doctor to assist in problem isolation.
8. END of MAP. Return to the MAP or procedure that sent you here.

MAP 1604: Restoring the Personal Computer's Software


Description
The ESSNet Console's personal computer (PC) has a software problem.
Note: The IBM TotalStorage ESS Master Console is referred to as the Master Console in this document.

Restoring the Personal Computer's Software


This procedure is required only if the PC's hard drive has been replaced or its software has become damaged.
• ESSNet Console on IBM PC300 PCs: The personal computer's software can be restored to its off-the-shelf state using the procedure Restoring PC Software - ESSNet Console on IBM PC300 PCs on page 71.
• ESSNet Console on IBM NetVista PCs: The personal computer's software can be restored to its off-the-shelf state using the procedure Restoring PC Software - ESSNet Console on IBM NetVista PCs on page 70.
• Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs: The personal computer's software can be restored to its off-the-shelf state using the procedure Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs on page 72.

Restoring PC Software - ESSNet Console on IBM NetVista PCs


The NetVista's software is restored to its off-the-shelf state using the following procedure. If you experience any difficulties, refer to the NetVista Quick Reference manual, procedure Reinstalling the Operating System.
1. Power on the ESSNet Console by pressing and releasing the ESSNet Console ON/OFF switch.
2. As soon as "To Start the Product Recovery Program, Press F11" is displayed, immediately press F11.
If the "To Start the Product Recovery Program, Press F11" message is NOT displayed, the PC's hard drive has been replaced or its partitions have become damaged. In this case, you must restore the personal computer's software using the PC's Product Recovery CD-ROM. If you cannot locate the PC's Product Recovery CD-ROM, do one of the following:
• Use the Product Recovery CD-ROM from another ESSNet Console.
• Contact a CE who is trained on IBM Personal Computers; they have access to Product Recovery CD-ROMs.
• In the US, order a replacement CD-ROM from 1-800-722-2227. Provide the PC model, the PC type, the PC serial number, and an external mailing address.
• Information on obtaining a replacement Product Recovery CD-ROM is available on the IBM personal computing support page at http://www.ibm.com/pc/support.
• If there is difficulty obtaining a Product Recovery CD-ROM through any of the above methods, contact your next level of support for PE assistance.
3. When IBM Product Recovery Program Version 5.0 is displayed, select Windows NT 4.0 and press Enter.
4. At the EasyRestore screen, click Continue to start the recovery process. Press Yes if a warning message appears. Press OK and continue if an error message appears during the restore process.
5. When the recovery is complete, remove the IBM Product Recovery CD and reboot the PC.
6. When the PC reboots successfully, you should see the ESSNet Console II Product Recovery Wizard screen. If you do not see this screen, the ESSNet software has not been restored properly, and you must either repeat this procedure or go to MAP 1600: ESSNet Console Problem on page 68 to repair the PC and reinstall its software.
7. When "Your hard disk will be formatted, and all files will be deleted" is displayed, press the y key to continue.
8. Wait seven to ten minutes for the recovery to complete.
9. When "Recovery is Complete" is displayed, press Enter to restart the computer.
10. Wait while the system reboots several times as it goes through the IBM Windows NT Setup. Be patient; this takes 10 to 15 minutes to complete.
11. When the Windows NT Workstation Setup window displays Windows NT Setup, the personal computer's software has been restored.
12. END of MAP. Return to the MAP or procedure that sent you here.


Restoring PC Software - ESSNet Console on IBM PC300 PCs


The personal computer's software is restored to its off-the-shelf state using the IBM PC 300's Product Recovery procedure. This procedure is in the About Your Software Windows NT Workstation 4.0, Applications and Support Software pamphlet, which is shipped with the IBM PC 300. The procedure uses the IBM PC 300's Product Recovery CD-ROM to restore the hard drive to its off-the-shelf state. Use the following procedure to recover the factory-installed operating system and software:
1. Make backup copies of the configuration files and any files you created. Any files not backed up will be lost.
2. Insert the IBM Product Recovery CD into the CD-ROM drive. If you cannot locate the PC's Product Recovery CD-ROM, do one of the following:
• Use the Product Recovery CD-ROM from another ESSNet Console.
• Contact a CE who is trained on IBM Personal Computers; they have access to Product Recovery CD-ROMs.
• In the US, order a replacement CD-ROM from 1-800-722-2227. Provide the PC model, the PC type, the PC serial number, and an external mailing address.
• Information on obtaining a replacement Product Recovery CD-ROM is available on the IBM personal computing support page at http://www.ibm.com/pc/support.
If there is difficulty obtaining a Product Recovery CD-ROM through any of the above methods, contact your next level of support for PE assistance.

3. Restart the PC and follow the instructions on the screen. If the PC does not start from the CD on the first try, change the startup sequence in the Configuration/Setup Utility program:
__ a. Turn off the PC, wait a few seconds, then turn the power on again.
__ b. When the Configuration/Setup Utility program prompt appears in the lower left corner of the screen, quickly press F1.
Note: The Configuration/Setup Utility program prompt appears on the screen for only a few seconds. You must press F1 quickly.
__ c. Select Start Options from the Configuration/Setup Utility program menu.
__ d. Select Startup Sequence from the Start Options menu.
__ e. Write down the startup sequence shown on the screen. You will need this information later to restore the original startup sequence after the recovery process.
__ f. Change the First Startup Device to the CD-ROM drive.
__ g. Press Esc (escape) until you return to the Configuration/Setup Utility program menu.
__ h. Before you exit from the program, select Save Settings from the Configuration/Setup Utility program menu, then press Enter.
__ i. To exit the Configuration/Setup Utility program, press Esc and follow the instructions on the screen.
4. When asked for Recovery Type, choose Full Recovery.
5. When the recovery is complete, remove the IBM Product Recovery CD and restart the PC.
6. END of MAP. Return to the MAP or procedure that sent you here.

Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs
The personal computer's software is restored to its off-the-shelf state using the Master Console (IBM TotalStorage Master Console) Product Recovery procedure. This procedure uses the Master Console Product Recovery CD-ROM to restore the ESSNet software to the state it was in when it left 2105 Manufacturing.
Note: The Master Console Product Recovery CD-ROM is different for Xseries 206 PCs. If you have an Xseries 206 PC, ensure that you use part number 22R1500.
1. If the Master Console is bootable, make backup copies of the configuration files and any files you created. Any files not backed up will be lost. If you are unable to make backup copies of the configuration, continue with the next step.
Note: Refer to ESSNet Backup/Restore Configuration Data in Chapter 5 of Volume 2.
2. Insert the Master Console Product Recovery CD into the CD-ROM drive.
Note: If there are multiple copies of the Master Console Product Recovery CD available, use the newest one.
3. Restart the PC and wait for it to boot. If the PC does not start from the CD on the first try, change the startup sequence in the Configuration/Setup Utility program:
__ a. Turn off the PC, wait a few seconds, then turn the power on again.
__ b. When the Configuration/Setup Utility program prompt appears in the lower left corner of the screen, quickly press F1.
Note: The Configuration/Setup Utility program prompt appears on the screen for only a few seconds. You must press F1 quickly.
__ c. Select Start Options from the Configuration/Setup Utility program menu.
__ d. Select Startup Sequence from the Start Options menu.
__ e. Write down the startup sequence shown on the screen. You will need this information later to restore the original startup sequence after the recovery process.
__ f. Change the First Startup Device to the CD-ROM drive.
__ g. Press Esc (escape) until you return to the Configuration/Setup Utility program menu.
__ h. Before you exit from the program, select Save Settings from the Configuration/Setup Utility program menu, then press Enter.
__ i. To exit the Configuration/Setup Utility program, press Esc and follow the instructions on the screen.
4. At the EasyRestore screen, click Continue to start the recovery process.
5. When the recovery is complete, remove the IBM Product Recovery CD and restart the PC.
6. When the PC restarts, you should see the Master Console Product Recovery Wizard screen. If you do not see this screen, the ESSNet software has not been restored properly, and you must either repeat this procedure or go to MAP 1600: ESSNet Console Problem on page 68 to repair the PC and reinstall its software.
7. When the Master Console Product Recovery Wizard screen is displayed, continue with one of the following MAPs:
• IBM PC300 PCs, IBM NetVista PCs, or Xseries 205: go to MAP 1605: Master Console Product Recovery Wizard.
• Xseries 206: go to MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs on page 111.
8. END of MAP. Return to the MAP or procedure that sent you here.

MAP 1605: Master Console Product Recovery Wizard


Description
The ESSNet Console's personal computer (PC) has a problem.
Note: The IBM TotalStorage ESS Master Console is referred to as the Master Console in this document.

Run the Hardware Configuration


This MAP is required only if Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs on page 72 has been performed.
Note: Do not use this MAP for Xseries 206 PCs. For Xseries 206 PCs, use MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs on page 111.
1. Verify that the ESSNet PC is turned on and that the initial Master Console Product Recovery Wizard screen is displayed.
2. At the RESTORE CONFIGURATION screen (1 of 4):
a. To restore a previously backed up configuration, type Yes and press Enter:
1) Insert the diskette containing the backup into the drive and press Enter.
2) Press Enter again.
3) Remove the diskette from the drive and press Enter.
b. If there is no previously backed up configuration, type No and press Enter.
3. At the HARDWARE CONFIGURATION screen (2 of 4), press Enter to continue.
4. At the Welcome to Kudzu screen, press any key to start the hardware detection process (kudzu).
Note: Steps 5 to 6 on page 75 depend on the physical configuration of the ESSNet PC. Some steps might be missing, appear in a different order, or display slightly different messages.
5. The following steps build the hardware configuration files for the Linux operating system. The automated process discovers the installed hardware and prompts for configuration options. Use the following table for guidance on the required responses for each type of hardware that is discovered.
Notes:
a. The sequence in which the hardware is discovered, the actual hardware, and minor screen details differ between PC models and Master Console code levels.
b. Only a subset of the hardware in the table below is discovered for a particular PC type. This is normal and does not indicate a problem.
Screen Name (Hardware Type): Hardware Added (Network Adapter), for example Ethernet Pro 100
Response Actions:
1. Verify that the Configure button is selected, then press Enter.
2. At the Existing Configuration Detected screen, verify that the Yes button is selected, then press Enter.

Screen Name (Hardware Type): Hardware Added (Video/Graphics Adapter), for example ATI / Rage XL
Response Actions: Verify that the Configure button is selected, then press Enter.
Notes: If an error occurs during configuration of the video/graphics adapter, it is handled later in step 8 on page 75.

Screen Name (Hardware Type): Monitor Setup
Response Actions:
1. Select the installed monitor from the list.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.
Notes: This screen may appear in place of the Monitor Probe screen.

Screen Name (Hardware Type): Monitor Probe
Response Actions: Verify that the Yes button is selected, then press Enter.
Notes: This screen may appear in place of the Monitor Setup screen.

Screen Name (Hardware Type): Video Memory
Response Actions:
1. The detected size of video memory is preselected.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.

Screen Name (Hardware Type): Clockchip Configuration
Response Actions:
1. No Clockchip Setting (recommended) is preselected.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.

Screen Name (Hardware Type): Select Video Modes
Response Actions:
1. Press the Tab key twice to move the cursor to the column labeled 24 bit:.
2. Use the down arrow key to highlight the field [ ] 1024x768.
3. Press the space bar to select the 1024 x 768 resolution.
4. Use the Tab key to move the cursor to the OK button.
5. Press Enter to continue.

Screen Name (Hardware Type): Starting X
Response Actions:
1. Verify that the OK button is selected, then press Enter.
2. At the "Can You See This Message?" message, verify that the Yes button is selected, then press Enter.
3. At the "Xconfigurator can set up your computer to automatically start X upon booting. Would you like X to start when you reboot?" message, verify that the Yes button is selected, then press Enter.

Screen Name (Hardware Type): Hardware Added (Unknown device), for example 8086:24c2
Response Actions: Verify that Configure is selected, then press Enter.
Notes: This screen may be seen several times with different devices.

Screen Name (Hardware Type): Hardware Added (Audio)
Response Actions: Verify that Configure is selected, then press Enter.
Notes: This screen may not appear on Xseries 205.

Screen Name (Hardware Type): Hardware Added (Mouse)
Response Actions: Verify that Configure is selected, then press Enter.
Notes: This screen may not appear on Xseries 205.

Screen Name (Hardware Type): Configure Mouse
Response Actions:
1. The type of mouse is preselected.
2. Use the Tab key to move the cursor to the Emulate 3 Buttons? field.
3. Press the space bar to select emulating 3 buttons.
4. Use the Tab key to move the cursor to the OK button.
5. Press Enter to continue.
Notes: This screen may not appear on Xseries 205.

Screen Name (Hardware Type): Update X Configuration
Response Actions: Verify that the Yes button is selected, then press Enter.
Notes: This screen may not appear on Xseries 205.

Screen Name (Hardware Type): Hardware Added (CD-ROM)
Response Actions: Verify that Configure is selected, then press Enter.
Notes: This screen may not appear on Xseries 205.

Screen Name (Hardware Type): Hardware Added (Hard disk)
Response Actions: Verify that Configure is selected, then press Enter.
Notes: This screen may not appear on Xseries 205.

6. At the TIMEZONE CONFIGURATION screen (3 of 4), press Enter.
7. At the Configure Timezones screen:
   a. Use the Tab key to move the cursor from the [*] Hardware clock set to GMT field to the list of locations.
   b. Use the arrow keys to select your location, for example America/Los_Angeles.
   c. Use the Tab key to move the cursor to the OK button.
   d. Press Enter to continue.
8. At the CONFIGURATION COMPLETE screen (4 of 4):
   • If the video/graphics adapter was configured successfully in step 5 on page 73, type No and press the Enter key.
   • If an error occurred while configuring the video/graphics adapter, type YES and press the Enter key. Then perform MAP 1608 starting with step 4 on page 86.
9. The ESSNet Console PC now begins booting. Check for the following messages during the PC boot process:
   a. Equinox SST driver loaded: the MSA PCI card has been recognized by the system.
   b. Installed Memory: xxxMByte: the number xxx should be 192 or higher.
   c. The modem has been secured successfully: the modem is recognized by the system.
   These messages stay on the PC monitor screen for 30 seconds. Press any key to continue before the 30-second timeout expires.
10. The Master Console login screen (blue background with Console logos) indicates that the Master Console is ready for use.
Note: If you do NOT get the Master Console login screen, or if error messages are displayed, perform MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console on page 86. Examples of such screens or error messages:
   • The terminal-based login screen (black background) is displayed.
   • The error message According to /var/run/adm.pid, gdm was already running (810), but seems to have been murdered mysteriously is displayed.
11. END of PROCEDURE. Return to the MAP or procedure that sent you here.
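If the boot output from step 9 has been captured as text, the checks in that step can be expressed as a small script. The Python sketch below is hypothetical: the message texts and the 192 MB minimum come from step 9, but treating the messages as plain substrings of a captured log, and the helper names missing_messages and memory_ok, are assumptions for illustration only.

```python
import re
from typing import List

# Hypothetical helpers for reviewing the boot messages listed in step 9.
# The message texts are copied from the guide; treating them as plain
# substrings of a captured boot log is an assumption.
EXPECTED_MESSAGES = (
    "Equinox SST driver loaded",
    "Installed Memory:",
    "The modem has been secured successfully",
)
MIN_MEMORY_MB = 192
_MEMORY_RE = re.compile(r"Installed Memory:\s*(\d+)\s*MByte")


def missing_messages(boot_log: str) -> List[str]:
    """Return the expected boot messages that do not appear in the log."""
    return [msg for msg in EXPECTED_MESSAGES if msg not in boot_log]


def memory_ok(boot_log: str) -> bool:
    """True if the reported installed memory is at least 192 MB."""
    match = _MEMORY_RE.search(boot_log)
    if match is None:
        raise ValueError("no 'Installed Memory: xxxMByte' message found")
    return int(match.group(1)) >= MIN_MEMORY_MB
```

For example, a log containing only the driver and memory lines would report the modem message as missing, which corresponds to the modem not being recognized by the system.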

MAP 1606: Converting the Personal Computer to an ESSNet Console


Description
The ESSNet Console application software must be reloaded.

Converting the Personal Computer to an ESSNet Console


1. Check that the input power voltage setting is correct for the ESSNet Console. The voltage switch (115V or 230V) is above the power connector on the rear of the console.
2. Determine which type of ESSNet Console you have:
   • ESSNet Console on IBM PC300 PC: Convert the personal computer to an ESSNet Console using the procedure ESSNet Console Installation, Personal Computer 300PL Only on page 81.
   • ESSNet Console on IBM NetVista PC: Convert the personal computer to an ESSNet Console using the procedure ESSNet Console Installation for NetVista.
   • ESSNet Console II on IBM PC300 PC or NetVista PC: This procedure assumes that the PC is either a new ESSNet Console II PC or an ESSNet Console whose software has just been restored using MAP 1604 and MAP 1605. If neither is true, perform the procedure Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs on page 72 before continuing. Then convert the personal computer to an ESSNet Console II using the procedure Configuring the ESSNet Console II in chapter 5 of Volume 2.
3. Return to the MAP or procedure that sent you here.

ESSNet Console Installation for NetVista


1. Power on the ESSNet Console.
2. At the Windows NT Setup screen:
   a. Click Next.
   b. Click the I accept this agreement button, then click Next.
   c. Enter essnet1 as the name, leave the organization blank, then click Next.
   d. Enter ESSNET1 as the computer name, then click Next.
      Note: The computer name always appears in uppercase.
   e. Enter password as the password, enter password again to confirm it, then click Next.
      Note: Enter password in lowercase.
   f. Click Finish; the machine reboots.
3. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.

4. At the Logon Information window, enter Username: Administrator and password: password. Click OK.
5. Set up the ESSNet Console display:
   a. Right click on a clear area of the desktop (not over any icon).
   b. Select Properties.
   c. Select the Background tab, then select None.
   d. Select the Screen Saver tab.
   e. Open the drop-down list box and select 3D Text (Open GL).
   f. At Screen Saver, click Settings.
   g. Click the Text radio button, then enter IBM ESSNet next to the radio button.
   h. Click OK.
   i. Set the wait time to 15 minutes.
   j. At Display Properties, select the Settings tab.
   k. At Color Palette, select 65536 Colors from the drop-down menu.
   l. At Desktop Area, adjust the slider to 800 x 600.
   m. Click Test.
   n. At the Testing Mode window, click OK and wait to view the test screen.
   o. If the bitmap displayed correctly, click Yes.
   p. At the Display Properties window, click Apply.
   q. Click OK to close the Display Properties window.
   r. Continue with TCP/IP Setup of the ESSNet Console for NetVista.

TCP/IP Setup of the ESSNet Console for NetVista


1. Right click on the Network Neighborhood icon.
2. Select Properties.
3. At Network Configuration, click Yes to install NT Networking.
4. Check the Wired to the Network box, then click Next.
5. Click Start Search to find the Ethernet adapter.
6. Verify that the Ethernet adapter is checked, then click Next.
7. Verify that TCP/IP Protocol is checked, then click Next.
8. Verify that all four Network Services boxes are checked, then click Next.
9. Click Next to install the selected components.
10. Click Continue to install the drivers from drive c:...
11. Click Continue to copy some Windows NT files.
12. If a DHCP question is asked, click No.
13. Under the Adapter drop-down menu, verify that (1) Intel(R) PRO/100 VE Desktop Connection is selected.
14. Select the Specify an IP Address radio button.
15. Enter an IP Address of 172.31.1.250.
16. Enter a Subnet Mask of 255.255.255.0.
17. Leave the Default Gateway blank.
18. Select Apply.
19. Select OK.
20. Click Next.
21. Click Next again to start the network.
22. Verify that the Workgroup radio button is selected, then click Next.
23. Click Finish.
24. Click Yes to reboot the ESSNet Console.
25. Continue with ESSNet Setup for NetVista.
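The IP address and subnet mask entered above define the network 172.31.1.0/24. A default gateway is only needed to reach addresses outside that subnet, which is consistent with leaving it blank when the console and the equipment it manages share one private subnet (an assumption about the ESSNet layout, not stated here). The Python sketch below uses the standard ipaddress module to show this; the addresses 172.31.1.1 and 172.31.2.1 are hypothetical examples.

```python
import ipaddress

# Console settings from the TCP/IP setup steps above.
CONSOLE = ipaddress.ip_interface("172.31.1.250/255.255.255.0")


def reachable_without_gateway(address: str) -> bool:
    """True if `address` is on the console's own subnet (no gateway needed)."""
    return ipaddress.ip_address(address) in CONSOLE.network


if __name__ == "__main__":
    print(CONSOLE.network)                          # 172.31.1.0/24
    print(CONSOLE.ip.is_private)                    # True
    print(reachable_without_gateway("172.31.1.1"))  # True  (hypothetical address)
    print(reachable_without_gateway("172.31.2.1"))  # False (hypothetical address)
```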

ESSNet Setup for NetVista


1. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
2. At the Logon Information window, enter Username: Administrator and password: password. Click OK.
3. Insert the current ESSNet Console Installation Diskette into the floppy diskette drive.
4. On the desktop, click Start, then Run.
5. Enter a:setupenc.exe, then press Enter.
6. Follow the instructions on the screen, selecting Yes to continue.
7. At the InstallShield Self-extracting EXE window, click Yes, then Next, then Yes, then Next, then Finish.
8. After clicking Finish, remove the ESSNet Console Installation Diskette from the a: drive.
9. Close the ESS Network window.
10. On the ESSNet console desktop, double click the ESSNet Toolkit icon.
11. At the Missing Setup Files... dialog box, click OK.
12. Click the Install/Configure tab.
13. Click ESSNet Configuration.
14. Click the Subsystem tab.
15. Click Add ESS (2105).
16. Click Save.
17. Click OK.

18. Close the ESSNet Toolkit. 19. Continue with Web Browser Setup for NetVista.

Web Browser Setup for NetVista


1. Start the web browser by clicking the Windows Start menu, then choosing Programs, ESS Network, ESSNet Console....
2. From the pull-down menu, click Tools->Internet Options. The Internet Options panel is displayed.
3. Under the General tab, click the Use Current button.
4. Click the Security tab.
5. Click the Internet icon.
6. Click Custom Level. The Security Settings panel is displayed.
7. Scroll to the Java section and, under Java Permissions, click the Custom radio button.
8. At the bottom of the Security Settings panel, click Java Custom Settings. The Internet panel is displayed.
9. Click the Edit Permissions tab.
10. At Run Unsigned Content, click the Enable radio button.
11. At the Internet panel, click OK.
12. At the Security Settings panel, click OK.
13. At the Warning! screen, click Yes.
14. At the Internet Options panel, click OK.



15. Maximize the web browser window (Home-Microsoft Internet Explorer).
16. Click the ESS Specialist button.
17. After the window changes, under Select a cluster, click (ESS-1 cluster-1).
18. At the Internet Connection Wizard window, click Cancel.
19. At the next Internet Connection Wizard window, check Do not show the Internet Connection Wizard in the future, then click Yes.
20. Close the web browser (Specialist-Microsoft Internet Explorer).
21. Continue with Cleanup Desktop for NetVista.

Cleanup Desktop for NetVista


1. Move (drag and drop) all of the following icons to the right side of the desktop:
   • ESSNet Toolkit
   • Internet Explorer
   • My Computer
   • Network Neighborhood
   • Inbox
   • My Briefcase
   • Recycle bin
   Delete all of the remaining icons on the left side of the desktop.
   Note: To move an icon: (1) left click and hold on the icon, (2) move the mouse and icon, (3) release the mouse button to drop the icon. To delete an icon: (1) right click on the icon, (2) select Delete from the drop-down menu.
2. Reply Yes to the boxes that ask you to confirm deletion of the icons.
3. Right click on the desktop (not over any icon). From the drop-down menu, select Arrange Icons, then Auto Arrange.
4. Continue with Install Netscape Browser for NetVista.

Install Netscape Browser for NetVista


The Netscape browser can be installed in two different ways, depending on which installation CD-ROM was shipped with the system:
   • Installing Netscape Browser Using Software Selections CD-ROM
   • Installing Netscape Browser Using Ready-to-Configure Utility Program CD-ROM on page 80

Installing Netscape Browser Using Software Selections CD-ROM:
1. Insert the Software Selections CD-ROM that came with the PC system into the PC's CD-ROM drive. The Software Selections for IBM window takes about a minute to display.
2. At the Software Selections for IBM window:
   a. In the Install column, check Netscape Communicator. Remove the checks from all other boxes.
   b. Click Install, then click OK.
3. When the installation completes, click Exit at the Software Selections for IBM window.
4. Close the Netscape Communicator window, then remove the Software Selections CD-ROM from the CD-ROM drive.
5. Continue with Setup Netscape Browser for NetVista on page 80.

Installing Netscape Browser Using Ready-to-Configure Utility Program CD-ROM:
1. Insert the Ready-to-Configure Utility Program CD-ROM that came with the PC system into the PC's CD-ROM drive.
2. Double click the My Computer icon.
3. Double click the D: drive (Nirtcn019ww).
4. Double click the Common folder.
5. Double click the Netscape folder.
6. Double click the Setup icon (with the green monitor icon).
7. At the Netscape Communicator 4.5 Setup window, click Next.
8. At the Software license agreement, click Yes.
9. Click Next.
10. At the Question window, click Yes.
11. Click Next.
12. Click Install.
13. Click No to skip viewing the README file.
14. Click OK.
15. At the Restarting Windows window, click OK to restart my computer now.
16. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
17. At the Logon Information window, enter Username: Administrator and password: password. Click OK.
18. Close all of the windows.
19. Remove the Ready-to-Configure Utility Program CD-ROM from the CD-ROM drive.
20. Continue with Setup Netscape Browser for NetVista.

Setup Netscape Browser for NetVista


1. On the desktop, double click the Netscape Communicator icon.
2. As different windows are displayed, click Next five times.
3. At the Set up your Newsgroups Server window, click Finish.
4. At the Netscape Navigator window, click Yes.
5. At the Netscape warning window, click OK.
6. At the Netscape Browser, click Edit.
7. Click Preferences.
8. In the Home Page location field, enter c:\Program Files\ESSNet\www\index.htm
9. Click OK.
10. Close the Netscape Browser window.
11. Click the Start button.
12. Click Programs.
13. Move the mouse pointer over the Startup menu item (so that it is highlighted), click the right mouse button, and then click Explore.
    Note: If there are two Startup menu items, select the one that contains the ESSNet Console.
14. In the Exploring-Startup window, click once on the ESSNet Console icon in the right-hand panel, and then press the Delete key on the keyboard.

15. At the Confirm File Delete window, click Yes.
16. On the desktop, right click on the Netscape Communicator icon and drag it into the right-hand panel of the Exploring-Startup window.
17. At the pop-up menu, click Create Shortcut Here.
18. Close the Exploring-Startup window.
19. Click the Start button.
20. Click Programs.
21. Click ESS Network.
22. Move the mouse pointer over the ESSNet Console menu item (so that it is highlighted).
23. Click the right mouse button, then click Properties.
24. In the Target field, enter: c:\Program Files\Netscape\Communicator\Program\netscape.exe
25. At the Netscape Console Properties window, click Apply, and then click Close.
26. Shut the system down: click the Start button, select Shutdown, and then click OK.
27. Continue with Setup Verification for NetVista.

Setup Verification for NetVista


1. Power on the ESSNet console.
2. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
3. At the Logon Information window, enter Username: Administrator and password: password.
4. Maximize the window or scroll down until Welcome to IBM StorWatch Enterprise Storage Network is displayed. If it is not displayed, the ESSNet Console setup has failed; perform the ESSNet setup again by going to Restoring PC Software - ESSNet Console on IBM NetVista PCs on page 70.
5. Close the Welcome to IBM StorWatch... window.
6. On the desktop, double click the ESSNet Toolkit icon.
7. The ESSNet Toolkit window should be displayed. If a Missing Setup Files message is displayed, the ESSNet Console setup has failed; perform the ESSNet setup again by going to Restoring PC Software - ESSNet Console on IBM NetVista PCs on page 70.
8. Close the ESSNet Toolkit window.
9. END of PROCEDURE. Return to the MAP or procedure that sent you here.

ESSNet Console Installation, Personal Computer 300PL Only


1. Power on the ESSNet Console.
2. At the Windows NT Setup screen:
   a. Click Next.
   b. Select the I accept this agreement radio button and click Next.
   c. Enter essnet1 as the name, leave the organization blank, and click Next.
   d. Enter the 20-digit Product ID found on the Certificate of Authenticity and click Next.
   e. Enter ESSNET1 as the computer name.
   f. Set the password to password.
      Note: Enter password in lowercase.
   g. Click Finish, and reboot the machine.
3. Log in to the NT operating system with login = administrator, password = password.
4. Set up the ESSNet Console display:
   a. If this is the first boot of the workstation, you may have to close the Microsoft Internet Explorer window.
   b. Right click on the desktop (not over any icon).
   c. Select Properties.
   d. Select the Background tab, and select None.
   e. Select the Screen Saver tab.
   f. Open the drop-down list box and select 3D Text (Open GL).
   g. Click Settings.
   h. Click the Text radio button and type ESSNet next to the radio button.
   i. Click OK.
   j. Set Wait to 15 minutes.
   k. Select the Settings tab.
   l. Under Color Palette, select 65536 Colors from the drop-down menu.
   m. Under Desktop Area, adjust the slider to the desired setting:
      • 15 inch monitors: recommended setting 800 x 600
      • 17 inch monitors: recommended setting 1024 x 768
   n. Click Test.
   o. Click OK on the Testing Mode window and wait to view the test screen.
   p. If the bitmap displayed correctly, click Yes.
   q. Click Apply on the Display Properties window.
   r. Click OK to close the Display Properties window.
   s. Use the buttons at the bottom of the monitor to adjust the screen.
5. Connect the ESSNet console to the Ethernet hub using the remaining RJ45 cable. Connect it to hub port (8X) on the hub; do not connect it to hub port 16MDI-X or 16MDI. See Figure 35 on page 96.

TCP/IP Setup of the ESSNet Console, Personal Computer 300PL Only


1. Right click on the Network Neighborhood icon.
2. Select Properties.
   Note: Do steps 3 to 12 on page 83 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
   a. Click the Protocols tab.
   b. Highlight TCP/IP Protocol.
   c. Click the Properties tab.
   d. Continue with step 13 on page 83.
3. Under Network Configuration, click Yes to install NT Networking.
4. Check the box by Wired to the Network, and click Next.
5. Click Start Search to find the Ethernet adapter.

6. Ensure that the Ethernet adapter is checked and click Next.
7. Ensure that TCP/IP Protocol is checked and click Next.
8. Ensure that all Network Services are checked and click Next.
9. Click Next to install the selected components.
10. Click Continue to install the drivers from the c: drive.
11. Click Continue to copy some Windows NT files.
12. When the Ethernet Adapter properties are shown, click OK. If a question is asked about DHCP, click No.
13. Ensure that the Ethernet adapter is selected in the Adapter drop-down menu.
14. Select the Specify an IP address radio button.
15. Enter 172.31.1.250 for the IP Address.
16. Enter 255.255.255.0 for the Subnet Mask.
17. Leave Default Gateway blank.
18. Select Apply.
19. Select OK.
    Note: Do steps 20 to 24 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
    a. Click OK.
    b. Click the Start button.
    c. Click the Shutdown button and restart the ESSNet console.
    d. Go to step 25.
20. Click Next.
21. Click Next to start the network.
22. Ensure that the Workgroup radio button is selected, then click Next.
23. Click Finish.
24. Click Yes to reboot the ESSNet Console.
25. Press the Escape (Esc) key to cancel DHCP Load during bootup.

ESSNet Setup, Personal Computer 300PL Only


1. Log in as administrator.
2. Insert the ESSNet Console Installation Diskette into the floppy diskette drive.
3. On the desktop, click Start, then Run...
4. Enter a:setupenc.exe.
5. Follow the instructions on the screen.
6. Remove the ESSNet Console Installation Diskette from the a: drive.
7. On the ESSNet console desktop, double click the ESSNet Toolkit icon.
8. When the Missing Setup Files... dialog box is displayed, click OK. If this box is not displayed, see ESSTOOLKIT NOTES in README.TXT.
9. Click the Install/Configure tab.
10. Click ESSNet Configuration.
11. Click the Subsystem tab.
12. Click Add ESS (2105).
    Note: If the ESSNet is already connected to the customer's network, enter the information from the Communication Resources Worksheets before clicking Save.
13. Click Save.
14. Click OK.
15. Close the ESSNet Toolkit.
16. Close the ESS Network window.

Web Browser Setup, Personal Computer 300PL Only


1. Internet Explorer comes preloaded with Windows NT. If you choose to use a different browser, such as Netscape Communicator, install it now.
   Note: For approved web browsers, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide (GC26-7444) or the IBM TotalStorage Enterprise Storage Server Web Users Interface Guide (SC26-7346).
2. Start the web browser by clicking the Windows Start menu and choosing Programs, ESS Network, ESSNet Console.
3. If this is the first time the browser is started, you may have to follow the instructions to set the default profiles.
4. Set the current page as the home page:
   • If the browser is Netscape:
      a. Click Edit on the menu bar.
      b. Select Preferences.
      c. Highlight the Navigator category.
      d. Click the Use Current Page button.
      e. Click OK.
   • If the browser is Internet Explorer 4.0:
      a. Click View on the menu bar.
      b. Select Internet Options.
      c. Under the General tab, click the Use Current button.
      d. Click the Security tab.
      e. Select Internet Zone from the drop-down box next to Zone:.
      f. Click the Custom radio button, then click Settings.
      g. Scroll to the Java section and, under Java Permissions, click the Custom radio button.
      h. Click Java Custom Settings at the bottom of the Security Settings panel. The Internet Zone panel is displayed.
      i. Click the Edit Permissions tab.
      j. Under Run Unsigned Content, click the Enable radio button.
      k. Click OK on the Internet Zone panel.
      l. Click OK on the Security Settings panel.
      m. Click OK on the Internet Options panel.
   • If the browser is Internet Explorer 5.0:
      a. Click Tools->Internet Options from the pull-down menu. The Internet Options panel is displayed.
      b. Under the General tab, click the Use Current button.
      c. Click the Security tab.
      d. Click the Internet icon.
      e. Click Custom Level. The Security Settings panel is displayed.

      f. Scroll to the Java section and, under Java Permissions, click the Custom radio button.
      g. Click Java Custom Settings at the bottom of the Security Settings panel. The Internet panel is displayed.
      h. Click the Edit Permissions tab.
      i. Under Run Unsigned Content, click the Enable radio button.
      j. Click OK on the Internet panel.
      k. Click OK on the Security Settings panel.
      l. Click OK on the Internet Options panel.
5. Close the web browser.

Cleanup Desktop, Personal Computer 300PL Only


1. Delete all icons on the desktop except ESSNet Toolkit, Netscape, Internet Explorer, My Computer, Network Neighborhood, Inbox, Ethernet, My Briefcase, and Recycle bin.
2. Click Yes on the Confirm File Delete panels.
3. Restart the console.
4. END of PROCEDURE. Return to the MAP or procedure that sent you here.

MAP 1607: Changing the Network Configuration (IP address, host name, domain, subnet mask) for ESS and the TotalStorage ESS Master Console
Description
The network configuration is changing for the ESS, the Master Console, or both.

Dependencies: To communicate properly, each ESS cluster's IP address must be registered on the Master Console, and the IP address and host name of the attached Master Console must be registered on each ESS. If an ESS cluster IP address changes, the old IP address must be deleted from the Master Console and the new IP address registered. If the Master Console IP address or host name changes, it must also be changed on the ESS.

When changing the network configuration, follow the general sequence below. Depending on which settings change, some steps may be omitted:
1. Change the network settings on the ESS; see Changing TCP/IP Configuration in chapter 6 of Volume 2.
2. Change the network settings on the Master Console; see Master Console Configuration for Customer Network in chapter 5 of Volume 2.
3. Delete the old ESS cluster IP addresses on the Master Console; see Verify Cluster IP Address on the Master Console in chapter 5 of Volume 2.
4. Register (add) the new ESS cluster IP addresses on the Master Console; see Verify Cluster IP Address on the Master Console in chapter 5 of Volume 2.
   Note: In this step, the Master Console communicates with the ESS to retrieve additional configuration information, for example, the ESS cluster host name. Make sure that none of the values returned for the ESS clusters is N/A. If N/A is returned, no communication is possible with that ESS cluster; this is an error condition.

   The Refresh button on the Master Console's ESS Configurations panel can be used to query ESS cluster configuration information at any time. If only the subnet or the domain has changed and the IP address stayed the same, you must press the Refresh button. This verifies that the Master Console is communicating correctly with the ESS and retrieves the latest ESS cluster configuration information.
5. Change the Master Console IP address and host name on the ESS; see Configuring the 2105 Model 800 in chapter 5 of Volume 2.
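The guide says that some steps of the sequence may be omitted depending on what changes, without listing the combinations. The Python sketch below is one possible reading, offered only as an illustration: the NetworkChange fields, the rule mapping each change to steps, and the treatment of a subnet- or domain-only change as an ESS-side change followed by Refresh are all assumptions, not IBM's decision logic. The N/A check mirrors the note in step 4.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NetworkChange:
    """Which settings are changing (hypothetical model of MAP 1607)."""
    ess_cluster_ip: bool = False             # an ESS cluster IP address changes
    ess_subnet_or_domain_only: bool = False  # subnet/domain changes, IP unchanged
    console_ip_or_host: bool = False         # Master Console IP or host name changes


def plan_steps(change: NetworkChange) -> List[str]:
    """Return the MAP 1607 steps that apply, in the guide's order (assumed rules)."""
    steps = []
    if change.ess_cluster_ip or change.ess_subnet_or_domain_only:
        steps.append("1. Change network settings on ESS")
    if change.console_ip_or_host:
        steps.append("2. Change network settings on Master Console")
    if change.ess_cluster_ip:
        steps.append("3. Delete old ESS cluster IP addresses on Master Console")
        steps.append("4. Register new ESS cluster IP addresses on Master Console")
    elif change.ess_subnet_or_domain_only:
        steps.append("Press Refresh on the ESS Configurations panel")
    if change.console_ip_or_host:
        steps.append("5. Change Master Console IP address and host name on ESS")
    return steps


def unreachable_clusters(returned: Dict[str, str]) -> List[str]:
    """Clusters whose retrieved value came back as N/A (an error condition)."""
    return [cluster for cluster, value in returned.items() if value == "N/A"]
```

For example, plan_steps(NetworkChange(console_ip_or_host=True)) returns only steps 2 and 5, matching the rule that a Master Console change must also be made on the ESS.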

MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console
Description
The Master Console's personal computer (PC) has a problem.
Note: The IBM TotalStorage ESS Master Console is referred to as the Master Console in this document.

Manual Configuration of the Video/Graphics Adapter: Normally, the automatic hardware configuration performed in MAP 1605: Master Console Product Recovery Wizard on page 73 configures the video/graphics adapter. This MAP is only required when the automatic configuration fails.
1. Reboot the PC.
2. During the boot process, press the R key when the message Press R to enter the Console system Menu is displayed.
3. The boot process continues and displays the Product Recovery System Configuration Menu.
4. Choose selection 4) RUN XWINDOW CONFIGURATOR TOOL by typing 4, then press the Enter key.
5. At the WELCOME screen, verify that the OK button is selected, then press the Enter key.
6. At the Choose a card screen:
   a. Select your video/graphics adapter or, if it is not listed, select Unlisted Card at the bottom of the list.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
7. The Pick a Server screen may be displayed:
   • If the Pick a Server screen is displayed:
      a. Select XF86_SVGA.
      b. Use the Tab key to move the cursor to the OK button.
      c. Press the Enter key to continue.
   • If the Pick a Server screen is not displayed, continue with the next step.
8. At the Monitor Probe screen, verify that the Yes button is selected, then press the Enter key.
9. At the Screen Configuration screen, select Not to probe.
10. At the Video Memory screen:
   a. Select 4mb.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
11. At the Clockchip Configuration screen:

   a. No Clockchip Setting (recommended) will be preselected.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
12. The Probe for Clocks screen may be displayed:
   • If the Probe for Clocks screen is displayed, verify that the Probe button is selected, then press the Enter key.
   • If the Probe for Clocks screen is not displayed, continue with the next step.
13. The Clock Probe Failed screen may be displayed:
   • If the Clock Probe Failed screen is displayed, verify that the OK button is selected, then press the Enter key.
   • If the Clock Probe Failed screen is not displayed, continue with the next step.
14. At the Select Video Modes screen:
   a. Press the Tab key once to move the cursor to the column labeled 16 bit:.
   b. Use the down arrow key to highlight the field [ ] 1024x768.
   c. Press the space bar to select the 1024 x 768 resolution.
   d. Use the Tab key to move the cursor to the OK button.
   e. Press the Enter key to continue.
15. At the Starting X screen, verify that the OK button is selected, then press the Enter key.
16. At the Can You See This Message message screen, verify that the Yes button is selected, then press the Enter key.
17. At the Xconfigurator can set up your computer to automatically start X upon booting. Would you like X to start when you reboot? message, verify that the Yes button is selected, then press the Enter key.
18. At the Confirm screen, verify that the OK button is selected, then press the Enter key.
19. When prompted, press the Enter key to continue.
20. On the Product Recovery System Configuration Menu screen, choose selection 98) REBOOT SYSTEM by typing 98, then press the Enter key.
21. The ESSNet Console PC now begins rebooting.
22. If the problem still exists, repeat the procedure and choose a different server in step 7 on page 86.
23. END of PROCEDURE. Return to the MAP or procedure that sent you here.

MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console
Description
The Master Console's personal computer (PC) needs to be powered off or rebooted. To avoid the system damage that may occur if the PC power switch is used, perform the following steps to power off or reboot the PC.

Power Off the Master Console: Use this procedure to power off the Master Console.
1. Click the Foot icon at the bottom left of the screen.
2. Click Log Out.
3. In the next window, select Halt and then click the Yes button.
4. The screen switches to text mode and displays the power-down process status.
5. Wait until the message System halted appears at the bottom of the screen.
6. It is now safe to turn off the Master Console using the PC's main power switch.

Reboot the Master Console: Use this procedure to reboot the Master Console.
1. Click the Foot icon at the bottom left of the screen.
2. Click Log Out.
3. In the next window, select Reboot and then click the Yes button.
4. The screen switches to text mode and displays the power-down process status.
5. The Master Console PC begins to reboot automatically.

MAP 1610: Connecting the Modem and Modem Expander for Remote Support
Description
This procedure supports early level modem and ESSNet hardware. It does not support ESSNet II.

Procedure
Attention: The modem and modem expander are installed with the initial 2105 Model 800. Through the modem expander, they support the initial 2105 Model 800 and the next six 2105 Model 800s. If an eighth 2105 Model 800 is installed, a new modem and modem expander must be installed.

Attention: The 2105 and the cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this procedure. Follow the ESD procedures in Working With ESD Sensitive Parts in chapter 4 of Volume 2.
1. Verify that the customer has supplied the required analog telephone connection and cables, and that two AC service connections are available for the modem and modem expander.
   Note: This is an additional AC service requirement for the customer; do not connect the modem or modem expander to the AC cord required for the service terminal.
2. Locate the Modem Kit, which is ordered separately but shipped with the 2105 Model 800 Ship Group (Feature Code for US/Canada 2715). Place the modem 1 [Figure 26] and the modem expander (asynchronous port switch) 6 [Figure 26] in an area between the customer-supplied AC service and the 2105 Model 800 clusters.
   Note: Feature code 2715 contains remote support switch parts for the modem and ESSNet.


[Figure 26 shows the modem (1) and modem expander (6) connected to the customer AC service, the telephone line, and Clusters 1 and 2 of the 2105 Model 800 through modem expander ports 1, 2, and 16.]
Figure 26. Modem and Modem Expander Attachment Diagram (s009425)

3. Is the modem you are attaching a Microcom DeskPorte?
   • Yes: continue with the next step.
   • No: go to step 5.
4. Verify that the Modem Configuration Switches [Figure 27] are set correctly: all switches down.
   Note: To access the switches, use a thin-blade screwdriver to lift off the modem nameplate on the left side of the front panel. The two banks of switches are located behind the nameplate.
Figure 27. Modem Configuration Switch Settings (S007457l): front view of the Microcom DeskPorte modem (front-panel labels T/D, O/A, ON/OFF, TST, DCD, DTR, TXD, and RXD), showing the location of the modem configuration switches.

5. Verify that the Modem Expander Setup Switches [Figure 28] on the bottom of the expander are set correctly: switches 1 and 3 OFF (0), and all other switches ON (1).
   Note: Setting switches 1 and 3 to OFF (0) sets the modem baud rate to 38.4 kbps.

Problem Isolation Procedures, CHAPTER 3


Figure 28. Modem Expander Setup Switch Settings (S007455l): setup switches 1 through 8, each with an ON (1) and an OFF (0) position.
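The switch check in step 5 can be expressed as a small lookup. The sketch below is illustrative only: it assumes the eight-switch bank shown in Figure 28 and encodes ON as 1 and OFF as 0, as the guide does. The function name misset_switches is hypothetical; the physical switches and step 5 remain the authority.

```python
# Expected Modem Expander Setup Switch positions (step 5):
# switches 1 and 3 OFF (0); all other switches ON (1).
EXPECTED = {switch: (0 if switch in (1, 3) else 1) for switch in range(1, 9)}


def misset_switches(observed: dict[int, int]) -> list[int]:
    """Return the switches whose observed position differs from step 5.

    A switch missing from `observed` is reported as misset.
    """
    return [s for s in sorted(EXPECTED) if observed.get(s) != EXPECTED[s]]


if __name__ == "__main__":
    # Hypothetical reading in which switch 3 was left ON.
    reading = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}
    print(misset_switches(reading))  # [3]
```

An empty list means the bank matches step 5; any switch numbers returned need to be reset before the expander is powered on.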

6. Plug the 25-pin end of the data cable 3 [Figure 26] into the DTE connector 14 [Figure 29] on the back of the modem. Plug the 9-pin end of the data cable into Port 16 16 [Figure 30] on the back of the modem expander. Tighten the cable connector retention screws.
7. Plug the RJ-11 telephone cable 2 [Figure 26] into the LINE connector 15 [Figure 29] on the back of the modem. Plug the other end of the cable into the customer's telephone line connector.
8. Plug the modem power adapter 4 [Figure 26] into the POWER connector 13 [Figure 29] on the rear of the modem. Plug the other end of the cable into the customer's AC service outlet.
9. Determine whether the customer-supplied AC input voltage for the modem expander is in the 115 V ac range or the 230 V ac range. Set the voltage range switch 18 [Figure 30] on the rear of the modem expander to match the customer's AC voltage:
   - 115 V ac range: push the switch to the left.
   - 230 V ac range: push the switch to the right.
10. Plug the power cord 5 [Figure 26], supplied with the modem expander, into the power connector 19 [Figure 30] on the rear of the modem expander. Plug the other end of the cable into the customer's AC service outlet.

Figure 29. Modem Rear View (S008410l): rear views of the Microcom DeskPorte modem (Power, Parallel Port, DTE, Line, and Phone connectors) and the MultiTech MultiModem (PHONE, LINE, EIA RS232C, and VOLUME), with callouts 13 (power), 14 (DTE), and 15 (line).


Figure 30. Modem Expander Rear View (S008411l): ports 1 through 16 on the rear of the modem expander.

11. Locate the two null-modem cables (P/N 34L7144, 15 meters / 50 feet long) in the ship group.
12. Determine whether this is the first 2105 Model 800 being installed on the modem expander:
   - If this is the first 2105 Model 800 being installed on the modem expander, go to step 13.
   - If this is not the first 2105 Model 800 being installed on the modem expander, go to step 15.
13. Connect cluster 1 9 [Figure 26] to modem expander port 1: Plug the connector labeled CLUSTER S3 of the null-modem cable 10 [Figure 26] into the cluster 1 S3 connector 19 [Figure 31] on the front of cluster 1. Plug the other end of the cable, labeled MODEM EXPANDER, into Port 1 14 [Figure 30] on the rear of the modem expander.
   Attention: For correct modem expander initialization, cluster 1 of the first 2105 subsystem installed must be connected to port 1 of the modem expander. This connection is critical because the modem expander can be configured only through the cluster 1/port 1 connection.
14. Connect cluster 2 8 [Figure 26] to modem expander port 2: Connect the other null-modem cable 7 [Figure 26] to the S3 connector 20 [Figure 31] on the front of cluster 2. Plug the other end of the cable into Port 2 17 [Figure 30] on the rear of the modem expander. Go to step 17 on page 92.
   Attention: Both clusters must be connected to the modem expander for the 2105 service strategy to work.
   Note: After each null-modem cable is connected to the cluster, route the loose cable into the center cable bundle. Ensure that the ferrite cores on the cables are located inside the 2105 frame. This is needed to minimize RFI and to provide a loop that allows the I/O drawer to be moved to the service position. The additional cable length can be stored in the area between the AC input connectors.
15. Connect cluster 1 9 [Figure 26] to the next available modem expander port: Plug the 9-pin connector end of the null-modem cable 10 [Figure 26] into the S3 connector 19 [Figure 31] on the front of cluster 1. Plug the other end of the cable into the lowest-numbered available port on the rear of the modem expander [Figure 30].
16. Connect cluster 2 to the next available modem expander port: Connect the other null-modem cable to the S3 connector 20 [Figure 31] on the front of cluster 2. Plug the other end of the cable into the lowest-numbered available port on the rear of the modem expander [Figure 30]. Go to step 17.
   Attention: Both clusters must be connected to the modem expander for the 2105 service strategy to work.
17. Enter the cluster modem expander port information for clusters 1 and 2 on the Communications Worksheets.

Figure 31. Cluster Modem Connectors (s009133): front view of I/O drawer 1 and I/O drawer 2, with callouts 19 (cluster 1 S3 connector) and 20 (cluster 2 S3 connector).
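Steps 12 through 16 amount to a simple port-allocation rule. The Python sketch below models it under stated assumptions: a 16-port modem expander with the modem on Port 16 (step 6), the first 2105 Model 800 fixed to ports 1 and 2 (steps 13 and 14), and each later subsystem taking the lowest-numbered free ports (steps 15 and 16). The function assign_ports is hypothetical; it only illustrates why one expander serves the initial 2105 Model 800 plus six more.

```python
MODEM_PORT = 16  # step 6: the modem data cable uses Port 16
CLUSTER_PORTS = [p for p in range(1, 17) if p != MODEM_PORT]  # ports 1-15


def assign_ports(used: set[int]) -> tuple[int, int]:
    """Return (cluster 1 port, cluster 2 port) for the next 2105 Model 800.

    `used` holds the expander ports already cabled to clusters.
    """
    if not used:
        # Steps 13-14: the first subsystem must use port 1 (cluster 1)
        # and port 2 (cluster 2); the expander is configured via port 1.
        return 1, 2
    free = [p for p in CLUSTER_PORTS if p not in used]
    if len(free) < 2:
        # Both clusters must be cabled, so a single free port is not enough.
        raise RuntimeError("expander full: install a new modem and modem expander")
    # Steps 15-16: lowest-numbered available ports.
    return free[0], free[1]


if __name__ == "__main__":
    used: set[int] = set()
    for subsystem in range(1, 8):
        c1, c2 = assign_ports(used)
        used |= {c1, c2}
        print(f"2105 #{subsystem}: cluster 1 -> port {c1}, cluster 2 -> port {c2}")
```

Running it assigns ports 1/2 through 13/14 to seven subsystems; an eighth call raises, which matches the Attention at the start of this procedure.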

18. Power on the modem expander: at the rear of the expander, press the power switch 15 [Figure 30] to On (up).
19. Is the modem you are attaching a MultiTech MultiModem?
   - Yes, continue with the next step.
   - No, go to step 21.
20. Turn on the modem using the on/off switch located on the front panel. When you apply power, the modem performs a diagnostic self-test: the TM indicator lights for a few seconds, after which the LCD should light. If this does not happen, check that the power switch is on, that the power supply is connected securely, and that voltage is present at the AC outlet. If these checks do not solve the problem, see Chapter 8, "Solving Problems," of the User's Guide supplied with the modem. Go to step 22 on page 93.
21. Power on the modem: at the front of the modem, press the ON/OFF switch 22 [Figure 32]. The light in the switch comes on when the modem is powered on.
   Note: Pressing the ON/OFF switch is the same as unplugging the modem from its AC power source and plugging it back in. Each time you power the modem off and then on, it performs its power-up diagnostics. These tests take about 5 seconds, and the modem ignores all commands while they are running. If the TST light stays on steady (not blinking) for more than 5 seconds after the test, the modem has detected an error. Repair the problem using the troubleshooting section of the user's guide supplied with the modem.


Figure 32. Modem Front Panel Locations (S008412l): front views of the Microcom DeskPorte modem (front-panel labels T/D, O/A, ON/OFF, TST, DCD, DTR, TXD, and RXD) and the MultiTech MultiModem (Power switch).

22. Initialize the modem expander:
   a. Go to the front of the modem expander and locate the CLEAR 23 [Figure 33] and SET 24 switches.
   b. Press and hold the SET and CLEAR switches at the same time.
   c. Release the CLEAR switch, wait one second, and then release the SET switch.

Figure 33. Modem Expander Switches and Indicators (S007486l): front of the modem expander, showing the CLEAR (23) and SET (24) switches.

23. Is the modem being installed as part of the initial 2105 installation?
   - Yes, continue with "Completing the Installation of the 2105 Model 800 Unit" on page 94.
   - No, the 2105 was previously installed without a modem; continue with the next step.
24. Ensure that the Call Home/Remote Services and Modem Configuration fields of the Communications Resources Worksheet have been filled in. For the worksheets, refer to the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide (form number GC26-7444); for the procedure, refer to "Filling in fields on the Communications Resources Worksheet" in chapter 6 of Volume 2.


When both the customer and the service support representative have filled out the worksheets, go to the service terminal and perform "Configure Call Home / Remote Services" in chapter 6 of Volume 2.

Completing the Installation of the 2105 Model 800 Unit


This section describes the actions to perform after the 2105 Model 800 has been physically installed:
- Install the ESSNet.
- Configure the subsystem.
- Connect the Ethernet LAN cables required for customer e-mail reports and ESS Specialist functions.
- Connect the modem cables for remote support of the 2105 Model 800.
- Update the installation records.
- Organize the installation documents and forms.

Installation of the ESSNet and ESSNet Console


Do the following steps to install the ESSNet and console.

Connecting the ESSNet Hub:
1. Is this the first 2105 Model 800 to be installed on this ESSNet?
   - Yes, continue with the next step.
   - No, the ESSNet hub is already installed; go to "Connecting the 2105 Model 800 to the ESSNet Hub".
2. Locate the ESSNet hub, its power cord, and three RJ45 Ethernet cables in the ship group.
3. Place the ESSNet hub within 15 meters (50 feet) of the 2105 Model 800.
4. Verify that the power cord supplied matches your input power. You may have to obtain a power converter or adapter locally.
5. Connect the power cord to the customer power outlet and power on the hub.
6. Continue with "Connecting the 2105 Model 800 to the ESSNet Hub".

Connecting the 2105 Model 800 to the ESSNet Hub:
Attention: The 2105 and the cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this procedure. Follow the ESD procedures in "Working With ESD Sensitive Parts" in chapter 4 of Volume 2.
1. Disconnect the cluster-to-cluster communication Ethernet (RJ45) cable 1 [Figure 34] from both clusters.
   Note: The clusters communicate across the ESSNet after the ESSNet Ethernet cables are installed.


Figure 34. Cluster to Cluster Communication Cable Location (s009120): front view of I/O drawer 1 and I/O drawer 2.

2. Route one RJ45 cable (PN 18P1896) between the 10/100 Base T (RJ45) connector on each I/O drawer and the Ethernet hub. Ensure that the ferrite cores on the cables are located inside the 2105 frame. This is needed to minimize RFI and to provide a loop that allows the I/O drawer to be moved to the service position. The additional cable length can be stored in the area between the AC input connectors.
3. Connect an Ethernet RJ45 cable to the RJ45 connector on each cluster. Connect the other end of each cable to the recommended port (1X to 15X) on the hub [Figure 35]; do not connect to either hub port 16MDI-X or 16MDI. See Table 22 for the recommended 2105 Model 800 cluster hub port connection sequence. Label all RJ45 cables.
Table 22. 2105 Model 800 Recommended ESSNet Hub Connection Sequence

2105 Subsystem Being Installed   Cluster 1, Hub Connector   Cluster 2, Hub Connector
1                                1X                         9X
2                                2X                         10X
3                                3X                         11X
4                                4X                         12X
5                                5X                         13X
6                                6X                         14X
7                                7X                         15X
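Table 22 follows a fixed pattern: for the nth 2105 Model 800 on the ESSNet, cluster 1 goes to hub port nX and cluster 2 to port (n+8)X, which leaves 8X for the ESSNet console (Figure 35) and keeps the 16MDI-X and 16MDI ports free. A minimal Python sketch of that lookup follows; the function name hub_ports is hypothetical, and Table 22 itself remains authoritative.

```python
CONSOLE_PORT = "8X"  # ESSNet console port (Figure 35)
RESERVED_PORTS = {"16MDI-X", "16MDI"}  # never used for 2105 clusters


def hub_ports(subsystem: int) -> tuple[str, str]:
    """Recommended (cluster 1, cluster 2) hub ports for the nth 2105 (Table 22)."""
    if not 1 <= subsystem <= 7:
        raise ValueError("Table 22 covers 2105 subsystems 1 through 7")
    return f"{subsystem}X", f"{subsystem + 8}X"


if __name__ == "__main__":
    for n in range(1, 8):
        c1, c2 = hub_ports(n)
        # Sanity check: a cluster never lands on the console or customer ports.
        assert CONSOLE_PORT not in (c1, c2) and not RESERVED_PORTS & {c1, c2}
        print(f"2105 #{n}: cluster 1 -> {c1}, cluster 2 -> {c2}")
```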


Figure 35. ESSNet Hub Port Connector Locations (S008603p): front view of the Ethernet hub (switch), showing MDI-X ports 1X through 15X and the customer ports 16 MDI-X and 16 MDI; ESSNet Ethernet RJ45 cables run from the Ethernet (RJ45) connectors of 2105 subsystem #1 cluster 1 and cluster 2, and the master console RJ45 cable connects to port 8X. Note: See Table 22 for the recommended plugging of additional 2105 subsystem connections to the Ethernet hub.

4. Connect the service terminal to cluster 1. Use the Repair Menu option "Display / Repair Problems Needing Repair", which displays problems from both clusters. If cluster-to-cluster communication is not working, this option gives an error message for cluster 2. Is there an error message for cluster 2?
   - Yes, go to "MAP 4390: Isolating a Cluster to Cluster Ethernet Problem" on page 377.
   - No, continue with the next step.
5. Is this the first 2105 Model 800 being installed on this ESSNet?
   - Yes, continue with the next step.
   - No, go to "Configuring the 2105 Model 800" in chapter 5 of Volume 2.
6. Continue with "Installing and Connecting the ESSNet Console to the ESSNet Hub".

Installing and Connecting the ESSNet Console to the ESSNet Hub:



1. Check that the input power voltage setting is correct for the ESSNet console. The switch (115V or 230V) is above the power connector on the rear of the console.
2. The following installation steps may already have been performed by Manufacturing. There are two ways to determine whether the installation and setup have already been performed:
   - The ESSNet console shipping carton contains a note saying that setup has already been done by Manufacturing.
   - The ESSNet console does not display a Windows Setup screen when it is powered on.
3. Does either of the two conditions above exist?
   - Yes, the setup and installation procedure has already been performed; go to "Configuring the 2105 Model 800" in chapter 5 of Volume 2.
   - No, continue with the next step.
4. Determine which type of ESSNet console you have. The ESSNet console comes in two models:
   - Personal Computer Model 300PL
   - NetVista model
   Each console requires a different set of installation instructions. Look at the front of the console to determine which model you have. Is the ESSNet console a NetVista model?
   - Yes, the NetVista operating system must be converted from Windows 2000 to Windows NT 4.0; continue with "Converting Windows 2000 to Windows NT 4.0, NetVista".
   - No, continue with "ESSNet Console Installation, Personal Computer 300PL Only" on page 103.

Converting Windows 2000 to Windows NT 4.0, NetVista:
1. Install and power on the ESSNet console using the documentation that came with the NetVista hardware being used as the ESSNet console. The ESSNet console must be located within 15 meters (50 feet) of the ESSNet Ethernet hub.
   Note: Use the documentation that came with the NetVista hardware to analyze and repair any problems with the ESSNet console hardware.
2. Power on the ESSNet console by pressing and releasing the ESSNet console ON/OFF switch. Wait for the ESSNet console to power up and display the Windows 2000 Professional Setup window.
3. Power off the ESSNet console by pressing and holding the ON/OFF button for one minute.
   Note: If the console does not power off after one minute, unplug the console power cord. Leave the power cord disconnected for one minute, and then reconnect it.
4. Power on the ESSNet console by pressing and releasing the ESSNet console ON/OFF switch.
5. As soon as "To Start the Product Recovery Program, Press F11" is displayed, immediately press F11.
6. When "IBM Product Recovery Program Version 5.0" is displayed, select Windows NT 4.0 and press the Enter key.
7. At the Main Menu window, select Full Recovery and press the Enter key.



8. At the Terms and Conditions window, press the y key to accept these conditions.
9. At the "ATTENTION-READ THIS BEFORE YOU CONTINUE" window, press the y key to continue.
10. When "Your hard disk will be formatted, and all files will be deleted" is displayed, press the y key to continue.
11. Wait seven to ten minutes for the recovery to complete.
12. When "Recovery is Complete" is displayed, press the Enter key to restart the computer.
13. Wait for the system to reboot and go through the IBM Windows NT Setup several times. Be patient; this takes 10 to 15 minutes.
14. When the Windows NT Workstation Setup window displays "Windows NT Setup", continue with "ESSNet Console Installation for NetVista".

ESSNet Console Installation for NetVista:
1. At the Windows NT Setup screen:
   a. Click Next.
   b. Click the I accept this agreement button, then click Next.
      1) Enter essnet1 as the name, leave the organization blank, then click Next.
      2) Enter ESSNET1 as the computer name, then click Next.
         Note: The computer name always appears in uppercase.
      3) Enter password as the password, enter password again to confirm it, then click Next.
         Note: Enter password in lowercase.
      4) Click Finish. The machine reboots.
2. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
3. At the Logon Information window, enter Username: Administrator and Password: password. Click OK.
4. Set up the display of the ESSNet console:
   a. Right-click on a clear area of the desktop (not over any icon).
   b. Select Properties.
   c. Select the Background tab, then select None.
   d. Select the Screen Saver tab.
   e. Open the drop-down list box and select 3D Text (Open GL).
   f. At Screen Saver, click Settings.
   g. Click the Text button, then enter IBM ESSNet next to the radio button.
   h. Click OK.
   i. Set the wait time to 15 minutes.
   j. At Display Properties, select the Settings tab.
   k. At Color Palette, select 65536 Colors from the drop-down menu.
   l. At Desktop Area, adjust the slider to 800 x 600.
   m. Click Test.
   n. At the Testing Mode window, click OK and wait to view the test screen.
   o. If you saw the bitmap correctly, click Yes.
   p. At the Display Properties window, click Apply.
   q. Click OK to close the Display Properties window.



   r. Continue with "TCP/IP Setup of the ESSNet Console for NetVista".

TCP/IP Setup of the ESSNet Console for NetVista:
1. Right-click the Network Neighborhood icon.
2. Select Properties.
3. At Network Configuration, click Yes to install NT Networking.
4. Check the Wired to the Network box, then click Next.
5. Click Start Search to find the Ethernet adapter.
6. Verify that the Ethernet adapter is checked, then click Next.
7. Verify that TCP/IP Protocol is checked, then click Next.
8. Verify that all four Network Services boxes are checked, then click Next.
9. Click Next to install the selected components.
10. Click Continue to install the drivers from the c: drive.
11. Click Continue to copy some Windows NT files.
12. If a DHCP question is asked, click No.
13. Under the Adapter drop-down menu, verify that (1) Intel(R) PRO/100 VE Desktop Connection is selected.
14. Select the Specify an IP Address radio button.
15. Enter an IP Address of 172.31.1.250.
16. Enter a Subnet Mask of 255.255.255.0.
17. Leave the Default Gateway blank.
18. Select Apply.
19. Select OK.
20. Click Next.
21. Click Next again to start the network.
22. Verify that the Workgroup radio button is selected, then click Next.
23. Click Finish.
24. Click Yes to reboot the ESSNet console.
25. Continue with "ESSNet Setup for NetVista".
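Because the Default Gateway is left blank, the console can reach only hosts on its own directly attached subnet. The sketch below uses Python's standard ipaddress module to derive that subnet from the address and mask entered in steps 15 and 16 (the same values are used in the Personal Computer 300PL TCP/IP setup). The function on_console_subnet and the peer addresses are hypothetical examples, not addresses assigned by this guide.

```python
import ipaddress

# Static ESSNet console settings from the TCP/IP setup steps.
CONSOLE = ipaddress.IPv4Interface("172.31.1.250/255.255.255.0")
DEFAULT_GATEWAY = None  # left blank in the procedure


def on_console_subnet(peer: str) -> bool:
    """True if `peer` is on the console's subnet, so no gateway is needed."""
    return ipaddress.IPv4Address(peer) in CONSOLE.network


if __name__ == "__main__":
    print(CONSOLE.network)  # 172.31.1.0/24
    for peer in ("172.31.1.1", "172.31.2.1"):  # hypothetical peers
        print(peer, on_console_subnet(peer))
```

With a 255.255.255.0 mask the console's network is 172.31.1.0/24; an address outside that range would need a gateway, which this procedure does not configure.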

ESSNet Setup for NetVista:
1. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
2. At the Logon Information window, enter Username: Administrator and Password: password. Click OK.
3. Insert the current ESSNet Console Installation Diskette into the floppy diskette drive.
4. On the desktop, click Start, then Run.
5. Enter a:setupenc.exe, then press Enter.
6. Follow the instructions on the screen, selecting Yes to continue.
7. At the InstallShield Self-extracting EXE window, click Yes, then Next, then Yes, then Next, then Finish.
8. After clicking Finish, remove the ESSNet Console Installation Diskette from the a: drive.
9. Close the ESS Network window.
10. On the ESSNet console desktop, double-click the ESSNet Toolkit icon.
11. At the "Missing Setup Files..." dialog box, click OK.



12. Click the Install/Configure tab.
13. Click ESSNet Configuration.
14. Click the Subsystem tab.
15. Click Add ESS (2105).
16. Click Save.
17. Click OK.
18. Close the ESSNet Toolkit.

19. Continue with "Web Browser Setup for NetVista".

Web Browser Setup for NetVista:
1. Start the web browser: click the Windows Start menu, then choose Programs, ESS Network, ESSNet Console.
2. From the pull-down menu, click Tools->Internet Options. This brings up the Internet Options panel.
3. Under the General tab, click the Use Current button.
4. Click the Security tab.
5. Click the Internet icon.
6. Click Custom Level. This brings up the Security Settings panel.
7. Scroll through the Java section and, under Java Permissions, click the Custom radio button.
8. At the bottom of the Security Settings panel, click Java Custom Settings. This displays the Internet panel.
9. Click the Edit Permissions tab.
10. At Run Unsigned Content, click the Enable radio button.
11. At the Internet panel, click OK.
12. At the Security Settings panel, click OK.
13. At the Warning! screen, click Yes.
14. At the Internet Options panel, click OK.
15. Maximize the web browser window (Home-Microsoft Internet Explorer).
16. Click the ESS Specialist button.
17. After the window changes, under Select a cluster, click (ESS-1 cluster-1).
18. At the Internet Connection Wizard window, click Cancel.
19. At the second Internet Connection Wizard window, check "Do not show the Internet Connection Wizard in the future", then click Yes.
20. Close the web browser (Specialist-Microsoft Internet Explorer).
21. Continue with "Cleanup Desktop for NetVista".

Cleanup Desktop for NetVista:
1. Move (drag and drop) all of the following icons to the right side of the desktop:
   - ESSNet Toolkit
   - Internet Explorer
   - My Computer
   - Network Neighborhood
   - Inbox
   - My Briefcase
   - Recycle Bin
   Delete all of the remaining icons on the left side of the desktop.



   Note: To move an icon: (1) left-click and hold on the icon, (2) move the mouse and icon, (3) release the mouse button to drop the icon. To delete an icon: (1) right-click the icon, (2) select Delete from the drop-down menu.
2. Reply Yes to the boxes that ask you to confirm deletion of the icons.
3. Right-click on the desktop (not over any icon). From the drop-down menu, select Arrange Icons, then Auto Arrange.
4. Continue with "Install Netscape Browser for NetVista".

Install Netscape Browser for NetVista:
The Netscape browser can be installed in two different ways, depending on which installation CD-ROM was shipped with your system:
- Installing Netscape Browser Using Software Selections CD-ROM
- Installing Netscape Browser Using Ready-to-Configure Utility Program CD-ROM

Installing Netscape Browser Using Software Selections CD-ROM:
1. Insert the Software Selections CD-ROM that came with the PC system into the PC's CD-ROM drive. The Software Selections for IBM window takes about a minute to display.
2. At the Software Selections for IBM window:
   a. In the Install column, check Netscape Communicator. Remove the checks from all other boxes.
   b. Click Install, then click OK.
3. When the installation completes, click Exit at the Software Selections for IBM window.
4. Close the Netscape Communicator window, then remove the Software Selections CD-ROM from the CD-ROM drive.
5. Continue with "Setup Netscape Browser for NetVista" on page 102.

Installing Netscape Browser Using Ready-to-Configure Utility Program CD-ROM:
1. Insert the Ready-to-Configure Utility Program CD-ROM that came with the PC system into the PC's CD-ROM drive.
2. Double-click the My Computer icon.
3. Double-click the D: drive (Nirtcn019ww).
4. Double-click the Common folder.
5. Double-click the Netscape folder.
6. Double-click the Setup icon (with the green monitor icon).
7. At the Netscape Communicator 4.5 Setup window, click Next.
8. At the software license agreement, click Yes.
9. Click Next.
10. At the Question window, click Yes.
11. Click Next.
12. Click Install.
13. Click No to skip viewing the README file.
14. Click OK.
15. At the Restarting Windows window, click OK to restart the computer now.
16. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
17. At the Logon Information window, enter Username: Administrator and Password: password. Click OK.
18. Close all of the windows.


19. Remove the Ready-to-Configure Utility Program CD-ROM from the CD-ROM drive.
20. Continue with "Setup Netscape Browser for NetVista".

Setup Netscape Browser for NetVista:
1. On the desktop, double-click the Netscape Communicator icon.
2. As different windows are displayed, click Next five times.
3. At the Set up your Newsgroups Server window, click Finish.
4. At the Netscape Navigator window, click Yes.
5. At the Netscape warning window, click OK.
6. In the Netscape browser, click Edit.
7. Click Preferences.
8. In the Home Page location field, enter c:\Program Files\ESSNet\www\index.htm.
9. Click OK.
10. Close the Netscape browser window.
11. Click the Start button.
12. Click Programs.
13. Move the mouse pointer over the Startup menu item (so that it is highlighted), click the right mouse button, and then click Explore.
   Note: If there are two Startup menu items, select the one that contains the ESSNet Console.
14. In the Exploring-Startup window, click once on the ESSNet Console icon in the right-hand panel, and then press the Delete key on the keyboard.
15. At the Confirm File Delete window, click Yes.
16. On the desktop, right-click the Netscape Communicator icon and drag it into the right-hand panel of the Exploring-Startup window.
17. At the pop-up menu, click Create Shortcut Here.
18. Close the Exploring-Startup window.
19. Click the Start button.
20. Click Programs.
21. Click ESS Network.
22. Move the mouse pointer over the ESSNet Console menu item (so that it is highlighted).
23. Click the right mouse button, then click Properties.
24. In the Target field, enter c:\Program Files\Netscape\Communicator\Program\netscape.exe.
25. At the Netscape Console Properties window, click Apply, and then click Close.
26. Shut the system down: click the Start button, select Shutdown, and then click OK.
27. Continue with "Setup Verification for NetVista".

Setup Verification for NetVista:
1. Power on the ESSNet console.
2. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.



3. At the Logon Information window, enter Username: Administrator and Password: password.
4. Maximize the window or scroll down until "Welcome to IBM StorWatch Enterprise Storage Network" is displayed. If this is not displayed, the SharkNet setup has failed. Perform the ESSNet setup again by going to "Converting Windows 2000 to Windows NT 4.0, NetVista" on page 97.
5. Close the Welcome to IBM StorWatch... window.
6. On the desktop, double-click the ESSNet Toolkit icon.
7. The ESSNet Toolkit window should be displayed. If a Missing Setup Files message is displayed, the SharkNet setup has failed. Perform the ESSNet setup again by going to "Converting Windows 2000 to Windows NT 4.0, NetVista" on page 97.
8. Close the ESSNet Toolkit window.
9. Continue with "Configuring the 2105 Model 800" in chapter 5 of Volume 2.

ESSNet Console Installation, Personal Computer 300PL Only:
1. Install and power on the ESSNet console using the documentation that came with the hardware being used as the ESSNet console. The ESSNet console must be located within 15 meters (50 feet) of the ESSNet Ethernet hub. Use the documentation that came with the hardware to analyze and repair any problems with the ESSNet console hardware.
   Note: The ESSNet console may be set to boot from the network first. If "Press Escape (Esc) to Cancel the DHCP Network Load" is displayed, press Escape. If not, the ESSNet console should boot after one minute.
2. NT Setup (valid only during the initial power-on):
   a. Ensure that the ESSNet console is powered on.
   b. Click Next on Windows NT Setup.
   c. Select the I accept this agreement radio button and click Next.
   d. Enter essnet1 as the name, leave the organization blank, and click Next.
   e. Enter the 20-digit Product ID found on the Certificate of Authenticity and click Next.
   f. Enter ESSNET1 as the computer name.
      Note: The computer name appears only in uppercase, regardless of the case in which it is typed.
   g. Set the password to password.
      Note: Enter password in lowercase.
   h. Click Finish, and reboot the machine.
3. Log in to the NT operating system with login = administrator, password = password.
4. Set up the display of the ESSNet console:
   a. If this is the first boot of the workstation, you may have to close the Microsoft Internet Explorer window.
   b. Right-click on the desktop (not over any icon).
   c. Select Properties.
   d. Select the Background tab, and select None.
   e. Select the Screen Saver tab.


   f. Open the drop-down list box and select 3D Text (Open GL).
   g. Click Settings.
   h. Click the Text radio button and type ESSNet next to the radio button.
   i. Click OK.
   j. Set the Wait to 15 minutes.
   k. Select the Settings tab.
   l. Select 65536 Colors from the drop-down menu under Color Palette.

   m. Under Desktop Area, adjust the slider to the desired setting:
      - 15-inch monitors: recommended setting 800 x 600
      - 17-inch monitors: recommended setting 1024 x 768
   n. Click Test.
   o. Click OK on the Testing Mode window and wait to view the test screen.
   p. Click Yes if you saw the bitmap correctly.
   q. Click Apply on the Display Properties window.
   r. Click OK to close the Display Properties window.
   s. Use the buttons at the bottom of the monitor to adjust the screen.
5. Connect the ESSNet console to the Ethernet hub using the remaining RJ45 cable. Connect it to hub port 8X; do not connect to either hub port 16MDI-X or 16MDI. See Figure 35 on page 96.

TCP/IP Setup of the ESSNet Console, Personal Computer 300PL Only:
1. Right-click the Network Neighborhood icon.
2. Select Properties.
   Note: Do steps 3 to 12 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
   a. Click the Protocols tab.
   b. Highlight the TCP/IP Protocol.
   c. Click the Properties tab.
   d. Continue with step 13.
3. Click Yes under Network Configuration to install NT Networking.
4. Check the box by Wired to the Network, and click Next.
5. Click Start Search to find the Ethernet adapter.
6. Ensure that the Ethernet adapter is checked and click Next.
7. Ensure that TCP/IP Protocol is checked and click Next.
8. Ensure that all Network Services are checked and click Next.
9. Click Next to install the selected components.
10. Click Continue to install the drivers from the c: drive.
11. Click Continue to copy some Windows NT files.
12. When the Ethernet Adapter properties are shown, click OK. If you are asked about DHCP, click No.
13. Ensure that the Ethernet adapter is selected under the Adapter drop-down menu.
14. Select the Specify an IP address radio button.
15. Enter 172.31.1.250 for the IP Address.
16. Enter 255.255.255.0 for the Subnet Mask.
17. Leave the Default Gateway blank.
18. Select Apply.



19. Select OK.
   Note: Do steps 20 to 24 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
   a. Click OK.
   b. Click the Start button.
   c. Click the Shutdown button and restart the ESSNet console.
   d. Go to step 25.
20. Click Next.
21. Click Next to start the network.
22. Ensure that the Workgroup radio button is selected, then click Next.
23. Click Finish.
24. Click Yes to reboot the ESSNet console.
25. Press the Escape (Esc) key to cancel the DHCP load during bootup.

ESSNet Setup, Personal Computer 300PL Only:
1. Log in as administrator.
2. Insert the ESSNet Console Installation Diskette into the floppy diskette drive.
3. On the desktop, click Start, then Run...
4. Enter a:setupenc.exe.
5. Follow the instructions on the screen.
6. Remove the ESSNet Console Installation Diskette from the a: drive.
7. On the ESSNet console desktop, double-click the ESSNet Toolkit icon.
8. When the "Missing Setup Files..." dialog box appears, click OK. If this box does not appear, see ESSTOOLKIT NOTES in README.TXT.
9. Click the Install/Configure tab.
10. Click ESSNet Configuration.
11. Click the Subsystem tab.
12. Click Add ESS (2105).
   Note: If the ESSNet is already connected to the customer's network, enter the information from the Communication Resources Worksheets before clicking Save.
13. Click Save.
14. Click OK.
15. Close the ESSNet Toolkit.
16. Close the ESS Network window.

Web Browser Setup, Personal Computer 300PL Only:
1. Internet Explorer comes preloaded with Windows NT. If you choose to use a different browser, such as Netscape Communicator, install it now.
   Note: For approved web browsers, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide (GC26-7444) or the IBM TotalStorage Enterprise Storage Server Web Users Interface Guide (SC26-7346).
2. Start the web browser by clicking the Windows Start menu and choosing Programs, ESS Network, ESSNet Console.

3. If this is the first time the browser has been started, you may have to follow the instructions to set the default profiles.
4. Set the current page as the home page:
   • If the browser is Netscape:
      a. Click Edit on the menu bar.
      b. Select Preferences.
      c. Highlight the Navigator category.
      d. Click the Use Current Page button.
      e. Click OK.
   • If the browser is Internet Explorer 4.0:
      a. Click View on the menu bar.
      b. Select Internet Options.
      c. Click the Use Current button on the General tab.
      d. Click the Security tab.
      e. Select Internet Zone from the drop-down box next to Zone:.
      f. Click the Custom radio button, then click Settings.
      g. Scroll to the Java section and click the Custom radio button under Java Permissions.
      h. Click Java Custom Settings at the bottom of the Security Settings panel. This brings up the Internet Zone panel.
      i. Click the Edit Permissions tab.
      j. Click the Enable radio button under Run Unsigned Content.
      k. Click OK on the Internet Zone panel.
      l. Click OK on the Security Settings panel.
      m. Click OK on the Internet Options panel.
   • If the browser is Internet Explorer 5.0:
      a. Click Tools->Internet Options from the menu. This brings up the Internet Options panel.
      b. Click the Use Current button on the General tab.
      c. Click the Security tab.
      d. Click the Internet icon.
      e. Click Custom Level. This brings up the Security Settings panel.
      f. Scroll to the Java section and click the Custom radio button under Java Permissions.
      g. Click Java Custom Settings at the bottom of the Security Settings panel. This brings up the Internet panel.
      h. Click the Edit Permissions tab.
      i. Click the Enable radio button under Run Unsigned Content.
      j. Click OK on the Internet panel.
      k. Click OK on the Security Settings panel.
      l. Click OK on the Internet Options panel.
5. Close the web browser.

Cleanup Desktop, Personal Computer 300PL Only:
1. Delete all icons on the desktop except ESSNet Toolkit, Netscape, Internet Explorer, My Computer, Network Neighborhood, Inbox, Ethernet, My Briefcase, and Recycle Bin.

2. Click Yes on the Confirm File Delete panels.
3. Restart the console.
4. Continue with Configuring the 2105 Model 800 in Chapter 5 of Volume 2.

MAP 1620: Attaching The ESSNet to a Customer Network


Attention: To avoid LAN conflicts, the customer MUST provide TCP/IP information for ALL 2105 subsystems that are currently attached to the ESSNet. Before continuing, ensure that the customer TCP/IP information is available.
1. Verify that all ESS Specialist sessions are closed before continuing.
2. Change the ESSNet Console TCP/IP information in all 2105s on the ESSNet.
   a. Using the service terminal, log in to the 2105 Model 800 as service. From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      Change / Show TCP/IP Configuration
      Further Configuration
      Name Resolution
      Hosts Table (/etc/hosts)
      Add a Host
   b. Use the Communication Resources Worksheets to enter the customer-supplied INTERNET ADDRESS (dotted decimal) and HOST NAME of the ESSNet console connected to your ESSNet. (The default name and address are essnet1 and 172.31.1.250.)
   c. Press Enter.
   d. After OK is received, press F3 until the Change / Show TCP/IP Configuration menu is displayed.
      Note: Pressing F3 too many times causes the daemons to restart and adds unnecessary delay.
3. Changing the Network Information of the 2105s:
   a. Select Minimum Configuration & Startup.
   b. Select en0.
      Note: If the customer wants to use the et0 Ethernet interface, call your next level of support.
   c. Modify the following as supplied by the customer on the Communication Resources Worksheet, and then press Enter:
      1) Hostname
      2) Internet Address
      3) Subnet Mask
      4) Nameserver
      5) Domain Name
      6) Gateway Server
   d. Press Enter.
   e. After OK is received, press F3 until the Change / Show TCP/IP Configuration menu is displayed.

      Note: Pressing F3 too many times causes the daemons to restart and adds unnecessary delay.
4. Configuring the Alternate Cluster:
   a. Select Configure Alternate Cluster IP Address and Hostname.
   b. Enter the customer-supplied INTERNET ADDRESS (dotted decimal) and HOST NAME of the alternate cluster of the 2105 (the cluster the service terminal is NOT connected to).
   c. Press Enter.
   d. After OK is received, press F3 until the Change / Show TCP/IP Configuration menu is displayed.
5. Restarting Daemons: Press F3 until you receive the Restarting TCP/IP daemons... message. Type Y and press Enter, wait for the Press enter to continue message, then press Enter again.
6. The ESS Specialist certificate must be regenerated.
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      ESS Specialist Menu
      Disable the ESS Specialist
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      ESS Specialist Menu
      Create New Key Files/Certificate
   Follow the service terminal instructions.
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      ESS Specialist Menu
      Enable the ESS Specialist
   Continue with the next step.
7. Press F10 to exit SMIT.
8. This is a very important step; read the following Attention carefully:
   Attention: Repeat steps 1 (on page 107) through 7 for each cluster of each 2105 subsystem connected to the ESSNet.
9. Configuring the Network Information of the 2105s on the ESSNet Console:
   a. On the ESSNet console, double-click the ESSNet Toolkit icon on the desktop.
   b. Select the Install/Configure tab.
   c. Click the ESSNet Configuration button.
   d. Select the Primary ESSNet Console tab.
   e. Enter the following, as supplied by the customer in the Communication Resources Worksheets from any attached 2105:
      • Hostname of the ESSNet Console
      • IP Address of the ESSNet Console
      • Subnet Mask of the ESSNet Console

   f. Click the Subsystem tab.
   g. Select the HostName of the cluster of the 2105 that needs to be changed. (To do this, click the Device-Model of the row you wish to edit.)
   h. Click Edit Subsystem and, for each cluster of the 2105 Model 800, update the following as supplied by the customer in the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide, form number GC26-7444. (This information must match the IP information already updated on the clusters of the 2105.)
      • HostName
      • IP Address
      • Subnet Mask
   i. Click Save.
      Note: The window may not close if data is missing or incorrect.
   j. Repeat steps 9g to 9i for each 2105 on the ESSNet.
   k. Click OK.
   l. Close the ESSNet Toolkit window.
10. Configuring the ESSNet Console Network Information:
   a. Point the mouse at the Network Neighborhood icon.
   b. Press the right mouse button.
   c. Select Properties on the pop-up menu.
   d. Select the Identification tab.
   e. Click the Change button.
   f. Change the Computer Name to the customer-supplied Hostname.
   g. Click OK.
   h. Click OK on the window that shows that the computer name has been changed.
   i. Select the Protocols tab.
   j. Highlight TCP/IP Protocol.
   k. Click the Properties button.
   l. Select the IP Address tab and update the following for the ESSNet Console with the information provided by the customer:
      • The IP Address of the ESSNet Console
      • The Subnet Mask of the ESSNet Console
      • The Default Gateway of the ESSNet Console
   m. Select the DNS tab and update the following for the ESSNet Console with the information provided by the customer:
      • The Hostname of the ESSNet Console
      • The Domain of the ESSNet Console
      • The DNS Servers Gateway of the ESSNet Console
   n. Select Apply.
   o. Select OK.
   p. Select OK.
   q. Reboot the ESSNet Console.
11. Verifying the Network Connection:
   a. Ensure that the ESSNet console is powered on and the web browser is started.

   b. Click the browser's Home button. This displays the Enterprise Storage Server network home page.
   c. Click the ESS Specialist button on the left side of the panel.
   d. Click each 2105 Model 800 cluster (1 and 2) that you want to connect to. This starts a new browser window and starts the ESS Specialist on the selected cluster.
   e. Verify that the rack serial number is correct for each cluster.
   Did the web browser connect correctly to each cluster?
   • Yes, continue with the next step.
   • No, go to MAP 5000: TotalStorage ESS Specialist Cannot Access Cluster in Chapter 3 of Volume 1.
12. Plug the customer's RJ45 cable into the hub. If the customer's connector requires a non-crossover port, use port 16MDI. If it requires a crossover port, use port 16MDI-X. See Figure 35 on page 96. Never connect cables to both connectors 16MDI-X and 16MDI at the same time. For more information, refer to the documentation provided with the hub.
13. Have the customer verify the connection by bringing up the ESS Specialist on one of the attached 2105 clusters.
   Did the web browser connect correctly to the selected cluster?
   • Yes, continue with the next step.
   • No, go to MAP 5000: TotalStorage ESS Specialist Cannot Access Cluster in Chapter 3 of Volume 1.
14. Use the service terminal to test the network connection with the E-Mail test to a customer-defined E-mail recipient. Connect the service terminal interface cable to the S2 connector on the front of cluster 1 and log in to cluster 1. From the service terminal Main Service Menu, select:
      Machine Test Menu
      Send Test Notification Menu
      Customer Notification (via E-Mail)
   Follow the instructions on the service terminal to send E-mail to a customer-designated E-mail recipient.
15. Use the service terminal to test the network connection with the SNMP test to a customer-defined SNMP recipient. From the service terminal Main Service Menu, select:
      Machine Test Menu
      Send Test Notification Menu
      Customer Notification (via SNMP)
   Follow the instructions on the service terminal to send a customer notification via SNMP.
16. Verify that the customer received both the E-mail and the SNMP notification.
17. When the E-mail and SNMP messages are successful, move the service terminal interface cable to S2 on cluster 2.
18. Repeat the same tests, steps 14 to 16, on cluster 2.
19. Setup is complete. Inform the customer that the ESSNet is now attached to the customer network. Return to the procedure that sent you here.
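The Attention at the start of this MAP stresses avoiding LAN conflicts, so it can help to sanity-check the customer-supplied worksheet values before typing them into SMIT and the ESSNet Toolkit. The following Python sketch is an illustration only: the `TcpIpEntry` record and `check_worksheet` function are hypothetical helpers, not part of the ESS or ESSNet software. It uses the standard `ipaddress` module to flag malformed addresses or masks, host addresses that are actually a subnet's network or broadcast address, gateways outside the host's subnet, and duplicate addresses.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class TcpIpEntry:
    """Hypothetical record for one host on the Communication Resources Worksheet."""
    hostname: str
    address: str
    subnet_mask: str
    gateway: str = ""  # the worksheet may leave the gateway blank


def check_worksheet(entries: list[TcpIpEntry]) -> list[str]:
    """Return human-readable problems found in the entries (empty list if none)."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # address -> first hostname that used it
    for entry in entries:
        try:
            iface = ipaddress.IPv4Interface(f"{entry.address}/{entry.subnet_mask}")
        except ValueError as exc:
            problems.append(f"{entry.hostname}: invalid address or mask ({exc})")
            continue
        net = iface.network
        # On ordinary subnets the first and last addresses cannot be assigned to hosts.
        if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
            problems.append(f"{entry.hostname}: {iface.ip} is the network or broadcast address of {net}")
        if entry.gateway:
            try:
                gateway = ipaddress.IPv4Address(entry.gateway)
            except ValueError:
                problems.append(f"{entry.hostname}: invalid gateway {entry.gateway!r}")
            else:
                if gateway not in net:
                    problems.append(f"{entry.hostname}: gateway {gateway} is not on {net}")
        # Two hosts with the same address would cause a LAN conflict.
        key = str(iface.ip)
        if key in seen:
            problems.append(f"{entry.hostname}: duplicate address {key} (also used by {seen[key]})")
        else:
            seen[key] = entry.hostname
    return problems
```

For example, `check_worksheet([TcpIpEntry("essnet1", "192.0.2.10", "255.255.255.0", "192.0.2.1")])` returns an empty list. The addresses shown are documentation-range placeholders, not customer values. The checks are deliberately limited to what can be verified from the worksheet alone. Whether an address is already in use elsewhere on the customer LAN still has to be confirmed with the customer.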

MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs
Description
The ESSNet Console's personal computer (PC) has a problem.
Note: The IBM TotalStorage ESS Master Console is referred to as the Master Console in this document.

Run the Hardware Configuration


This MAP is required only if Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs on page 72 has been performed.
Note: Use this MAP only for Xseries 206 PCs. For IBM PC300 PCs, IBM NetVista PCs, or Xseries 205 PCs, use MAP 1605: Master Console Product Recovery Wizard on page 73.
1. Verify that the ESSNet PC is turned on and the initial Master Console Product Recovery Wizard screen is displayed.
2. At the RESTORE CONFIGURATION screen (1 of 3):
   a. To restore a previously backed-up configuration, type Yes and press Enter, then:
      1) Insert the diskette containing the backup into the drive and press Enter.
      2) Press Enter again.
      3) Remove the diskette from the drive and press Enter.
   b. If there is no previously backed-up configuration, type No and press Enter.
3. At the TIMEZONE CONFIGURATION screen (2 of 3), press Enter.
4. At the Configure Timezones screen:
   a. Use the Tab key to move the cursor from the [*] Hardware clock set to GMT field to the list of locations.
   b. Use the arrow keys to select your location, for example America/Los_Angeles.
   c. Use the Tab key to move the cursor to the OK button.
   d. Press Enter to continue.
5. At the CONFIGURATION COMPLETE screen (3 of 3), the configuration update is complete. Type No and press Enter.
6. The ESSNet Console PC begins booting. Check for the following messages during the boot process:
   a. Equinox SST driver loaded: the MSA PCI card has been recognized by the system.
   b. Installed Memory: xxxMByte: the number in xxx should be 192 or higher.
   c. The modem has been secured successfully: the modem is recognized by the system.
   These messages stay on the PC monitor for 30 seconds. Press any key to continue before the 30 seconds elapse.
7. The Master Console login screen (blue background with Console logos) indicates that the Master Console is ready for use.
   Note: If you do NOT get the Master Console login screen, or if error messages are displayed, verify that the correct CD was used for the Xseries 206. Otherwise, contact your next level of support.
8. END of PROCEDURE. Return to the MAP or procedure that sent you here.
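If the boot messages from step 6 are captured (for example, transcribed into a call record), the memory check in step 6b can be expressed as a small Python sketch. It assumes that the message text has exactly the form "Installed Memory: xxxMByte" shown above, with xxx a whole number of megabytes. Any other spacing or wording is an assumption that the regular expression tolerates only loosely, and the function name is hypothetical.

```python
import re

# Assumed format of the step 6b boot message; xxx is the size in megabytes.
MEMORY_PATTERN = re.compile(r"Installed Memory:\s*(\d+)\s*MByte")
MIN_MEMORY_MB = 192  # minimum from step 6b


def memory_ok(boot_line: str, minimum_mb: int = MIN_MEMORY_MB) -> bool:
    """Return True if the reported installed memory meets the minimum."""
    match = MEMORY_PATTERN.search(boot_line)
    if match is None:
        raise ValueError(f"no memory size found in: {boot_line!r}")
    return int(match.group(1)) >= minimum_mb
```

For instance, `memory_ok("Installed Memory: 256MByte")` is True and `memory_ok("Installed Memory: 128MByte")` is False. Raising an error on an unrecognized line, rather than returning False, keeps a missing message distinct from a low memory reading.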

MAPs 2XXX: Power and Cooling Isolation Procedures


Procedures in the MAP 2XXX group in Chapter 3 cover the power and cooling areas.

MAP 2000: Model 100 Attachment Rack Reported


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 RPC cards are reporting a Model 100 attachment unit. Attachment of a Model 100 to a 2105 Model 800 is not supported. The problem is caused either by incorrectly set RPC card DIP switches or by a connected Model 100.

Isolation
1. The 2105 Model 800 does not support connection of a Model 100 attachment rack. Verify that RPC card DIP switch 5 is set to Off on both RPC cards.
   Note: 2105 Models Exx/Fxx do support the Model 100 attachment.

MAP 2020: Isolating Power Symptoms


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Most power symptoms create a related problem, which should be used to start the repair. If no related problem was created, use the table below to start the repair.

Isolation
Use the table below to find and repair your power symptom:
Table 23. 2105 Model 800 Power Symptoms

Power Symptom: Visual power symptoms.
   Description: A problem is created for most power problems.
   Action: Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to begin the repair. If no related problems are found, go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60.

Power Symptom: 2105 Model 800 will not power on in local mode.
   Description: If the RPC card switches are set for local mode, the 2105 Model 800 Local power switch should be able to power it on.
   Action: Go to MAP 2400: 2105 Model 800 Local Power On Problems on page 149.

Power Symptom: 2105 Model 800 will not power off in local mode.
   Description: If the RPC card switches are set for local mode, the 2105 Model 800 Local power switch should be able to power it off. If a pinned data condition exists, a problem will have been created and the 2105 Model 800 will not power off until that condition is repaired.
   Action: Go to MAP 2440: Rack 1 Power Off Problem on page 157.

Power Symptom: 2105 Model 800 will not power on in remote mode.
   Description: If the RPC card switches are set for remote mode, a 2105 Model 800 remote system should be able to power it on.
   Action: Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.

Power Symptom: 2105 Model 800 will not power off in remote mode.
   Description: If the RPC card switches are set for remote mode, a 2105 Model 800 remote system should be able to power it off. If a pinned data condition exists, a problem will have been created and the 2105 Model 800 will not power off until that condition is repaired.
   Action: Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.

Power Symptom: 2105 Model 800 will not power on or off in automatic mode.
   Description: If the RPC card switches are set for automatic mode, the 2105 Model 800 should power on the first time line cord power returns after both line cords lost power.
   Action: Go to MAP 2370: Rack 1 Power On Problem, Automatic Mode on page 136.

Power Symptom: 2105 Model 800 UEPO problems.
   Description: The UEPO switch on the operator panel should prevent the 2105 Model 800 from powering on when in the off position and should allow it to power on when in the on position.
   Action: Go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131.

Power Symptom: CEC drawer will not power on.
   Action: Go to MAP 4880: Cluster Power On Problem on page 461.

Power Symptom: I/O drawer will not power on.
   Action: Go to MAP 4880: Cluster Power On Problem on page 461.

Power Symptom: Single host bay will not power on.
   Action: Go to MAP 4870: Host Bay Power On Problem on page 459.

MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The failing drawer power supplies are most likely creating an overcurrent condition on one of the output power busses that both power supplies share.

Isolation
1. Overcurrent affects both power supplies in the drawer because they share the power buses. Find the drawer whose power supplies are failing:
   • CEC drawer power supplies, continue with the next step.
   • I/O drawer power supplies, go to step 3.
   • Host bay drawer power supplies, go to step 4.
2. One of the FRUs in the CEC drawer is drawing too much current and needs to be replaced:
   • Quiesce and power off the cluster. Log in to the cluster not being repaired and use the Repair Menu, Alternate cluster repair options.
   • Unplug or replace one or more of the CEC drawer FRUs, including the power supplies, then power on and retest until the overcurrent condition no longer occurs.
3. One of the FRUs in the I/O drawer is drawing too much current and needs to be replaced:
   • Quiesce and power off the cluster. Log in to the cluster not being repaired and use the Repair Menu, Alternate cluster repair options.
   • Unplug or replace one or more of the I/O drawer FRUs, including the power supplies, then power on and retest until the overcurrent condition no longer occurs.
4. A FRU in the host bay drawer is drawing too much current and needs to be replaced. The failure may be in either host bay or in the host bay drawer power backplane.
   • Quiesce and power off a host bay using the Repair Menu, Replace a FRU option.
   • Unplug or replace one or more of the host bay FRUs, including the power supplies, then power on and retest until the overcurrent condition no longer occurs.
   • If the overcurrent still occurs, replace the host bay drawer backplane using MAP 4850: Repair the Host Bay Drawer on page 458.

MAP 2031: Repair Ground Continuity


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The previous procedure measured more than 0.1 ohm resistance between the ground pin of the mainline power cable and the primary power supply enclosure. Follow these steps to check and repair the ground continuity problem.

Isolation
1. Disconnect the failing mainline power cable (1 or 2 in Figure 36) from the line cord bracket.

Figure 36. Line Cord Bracket Connectors (s009124): front view showing the PPS-2 and PPS-1 connectors.

2. Prepare the multimeter to measure 0.1 ohm or less resistance. For connector information, refer to Figure 39 on page 117.
   • For the mainline power cable (plug in): Place one lead of the multimeter on the ground pin of the male plug on the mainline power cable.

Figure 37. Ground Continuity Repair Diagram (s009406): ohmmeter connected between the ground of the 2105 female connector (Do Not Connect to 2105) and the ground of the customer male connector, or the green/yellow wire of a wired cable, on the mainline power cable; plug ratings single-phase 50/60 A, three-phase 30/50 A, and three-phase 60 A; customer AC power circuit breaker OFF and tagged with Do Not Connect tag S229-0237. Figure legend: Good: 1.0 Ohm or less; Fail: More than 1.0 Ohm.

Figure 38. Male Plug on the Mainline Power Cable (S008045l): ground pin locations for single-phase 50/60 amp, three-phase 30/50 amp, and three-phase 60 amp plugs.

   • For the mainline power cable (wired): Place one lead of the multimeter on the green and yellow wire at the customer end of the mainline power cable.
3. Place the other lead on the ground pin of the female connector on the mainline power cable.

Figure 39. Female Connector on the Mainline Power Cable (S008046l): ground pin location.

   • If there is 0.1 ohm or less resistance, the mainline power cable is good but the primary power supply enclosure is not grounded. Perform steps 7 through 9.
   • If there is more than 0.1 ohm resistance, the mainline power cable ground lead is open or has resistance. Perform steps 4 through 6.
4. The ground lead of the mainline power cable to the primary power supply is open or has resistance. Replace the mainline power cable: go to Primary Power Supply Removal and Replacement in Chapter 4 of Volume 2, and then return here to continue.
5. Insert the female connector on the new mainline power cable into the inlet on the line cord bracket.
6. Return to Checking the Ground Continuity in Chapter 5 of Volume 2 to verify that ground continuity now measures 0.1 ohm or less resistance on the replaced cable.
7. The primary power supply enclosure is not grounded; replace the primary power supply. Go to Primary Power Supply Removal and Replacement in Chapter 4 of Volume 2, and then return here to continue.
8. Insert the female connector on the mainline power cable into the inlet on the line cord bracket.
9. Return to Checking the Ground Continuity in Chapter 5 of Volume 2 to verify that ground continuity now measures 0.1 ohm or less resistance.
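The branch taken after step 3 depends only on the measured resistance. The following Python sketch restates that decision. It is illustrative only, and the function name and return strings are hypothetical. The limit is a parameter because the steps use 0.1 ohm while the legend of Figure 37 reads 1.0 ohm. The default follows the 0.1 ohm used in the steps.

```python
GROUND_LIMIT_OHMS = 0.1  # limit used in the MAP 2031 steps


def ground_continuity_action(measured_ohms: float,
                             limit_ohms: float = GROUND_LIMIT_OHMS) -> str:
    """Map the step 3 reading (male ground pin to female ground pin) to the steps to perform."""
    if measured_ohms < 0:
        raise ValueError("resistance cannot be negative")
    if measured_ohms <= limit_ohms:
        # Cable ground lead is good, so the primary power supply enclosure is not grounded.
        return "steps 7-9: replace primary power supply"
    # Cable ground lead is open or resistive.
    return "steps 4-6: replace mainline power cable"
```

For example, a reading of 0.05 ohm returns the steps 7-9 branch and a reading of 0.5 ohm returns the steps 4-6 branch. An exact reading at the limit counts as good, matching the "0.1 ohm or less" wording.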

MAP 20A0: Cluster Not Ready


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 is powered on. The cluster should be powered on and the 2105 Model 800 operator panel cluster Ready indicator should be on.

Isolation
1. The operator panel cluster Ready indicator may be off for one of the following reasons:
   • You have quiesced the cluster as part of a service action. The Ready indicator lights when the cluster is successfully resumed. If the problem is resolved, return to the procedure that sent you here.
   • The cluster has been fenced by an error and a problem was created. Display and repair the related problem.
   • For all other conditions, continue with the next step.
2. Observe the 2105 Model 800 operator panel cluster Ready indicator. Is the cluster Ready indicator on?
   • Yes, the cluster is no longer failing. Go to MAP 1500: Ending a Service Action on page 67.
   • No, continue with the next step.
3. Observe the CEC drawer operator panel on the failing cluster. Is it displaying any codes, either changing codes or the same code?
   • Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   • No, continue with the next step.
4. Connect the service terminal to the failing cluster and attempt to log in. Did the service terminal log in and display the main menu?
   • Yes, continue with the next step.
   • No, go to step 6.
5. Connect the service terminal to the working cluster and use the Main Menu, Repair Menu, Show / Repair Problems Needing Repair option. Is there a related problem?
   • Yes, exit this MAP and repair the problem.
   • No, there is no single point of hardware failure that should cause this error. The microcode can switch off the Ready LED even if there is no hardware or microcode error; it operates the Ready LED normally again after the next cluster power on. If the customer is operating normally, the Ready LED being off is a false indication. If the customer is having problems with the cluster, call the next level of support.
6. Connect the service terminal to the working cluster and log in. Test cluster-to-cluster communication through the Ethernet cable. From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
   The problem status for the login cluster is displayed. The problem status for the failing cluster is also displayed. If the failing cluster cannot communicate, an error message is displayed for that cluster's problems.
   Was there a communication error message for the failing cluster?
   • Yes, go to step 8.
   • No, continue with the next step.
7. The failing cluster is able to communicate with the other cluster, but it is not accepting logins and its Ready indicator is off. Call the next level of support.
8. Determine whether the failing cluster is powered on. Observe the following LED indicators:
   • The CEC drawer power LED indicator (amber) on the front lower left of the drawer.
   • The I/O drawer power LED indicator (green) on the CEC drawer display panel, upper left corner.
   Are both indicators on solid?
   • Yes, the cluster is powered on but is not accepting logins and is not in Ready. A power cycle of the cluster may recover it to normal operation. Power the cluster off then on, then resume the failing cluster:

      Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select:
         Repair Menu
         Alternate Cluster Repair Menu
      If the cluster is still failing, call the next level of support.
   • No, go to MAP 2600: RPC Card Cannot Reset a Power Fault on page 169.

MAP 2210: Host Bay Drawer Power Supply Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A single host bay drawer power supply is not switching power on or off as expected by the 2105 functional code. There are two possible failures:
• The power supply may be failing so that it is stuck in one state, always on or always off.
• The power supply may not be receiving power on or off signals from the RPC card through the cable it shares with the other power supply in the same host bay drawer.

Isolation
1. Ensure that both RPC card to host bay drawer power supply cables are connected to both host bay drawer power supplies.
2. Repeat the power on or off procedure that sent you here. If the procedure still fails, return here and continue with the next step.
3. Observe the input power LED indicators on the failing host bay drawer power supply. Are both input indicators off?
   • Yes, continue with the next step.
   • No, go to step 5.
4. The power supply does not see either power input as present. Each power input cable is shared by the six power supplies (CEC drawer, I/O drawer, host bay drawer) in that half of the 2105. If both cables had no power, all the drawers would be powered off. Because only one power supply is failing, replace it.
5. The other power supply in the host bay drawer is working as expected by the 2105 functional code. Only one RPC to host bay drawer power supply cable needs to be working for the power supply to operate properly, and it is unlikely that both cables are failing. You can replace the failing power supply now, or do the following isolation test:
   a. Unplug one of the RPC to host bay drawer power supply cables from the failing power supply.
   b. Repeat the original operation that failed to power the host bay off or on:
      • If it works, replace the disconnected cable.
      • If it fails, reconnect the cable and unplug the other cable from the failing power supply. Repeat the test: if it works, replace the disconnected cable; if it fails, replace the power supply.
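The decision logic of steps 3 to 5 can be summarized in a small Python sketch. It is illustrative only and models the optional isolation test path of step 5, not the alternative of replacing the supply immediately. The labels "RPC cable A" and "RPC cable B" are hypothetical names for the two RPC to host bay drawer power supply cables. The callback stands in for the technician repeating the failed power operation with the named cable unplugged.

```python
from typing import Callable


def isolate_host_bay_supply(both_input_leds_off: bool,
                            works_with_unplugged: Callable[[str], bool]) -> str:
    """Return the FRU that MAP 2210 steps 3 to 5 point to.

    both_input_leds_off: the answer to step 3.
    works_with_unplugged: repeats the failed power on/off operation with only
        the named cable unplugged and returns True if the operation now works.
    """
    if both_input_leds_off:
        # Step 4: this supply sees neither input, yet the rest of the 2105 has power.
        return "power supply"
    # Step 5 isolation test: try each cable in turn (the first is reconnected before the second).
    for cable in ("RPC cable A", "RPC cable B"):
        if works_with_unplugged(cable):
            return cable
    return "power supply"
```

For example, if the operation works only with cable B unplugged, the sketch returns "RPC cable B". If it fails with either cable unplugged, the sketch returns the power supply, as in step 5b.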

MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each CEC, I/O, and host bay drawer power supply has two input power connectors and can operate normally with one or both input connectors receiving input power. In each cluster, input connector 1 on all six CEC, I/O, and host bay drawer power supplies shares a common input power cable from one PPS. Input connector 2 is supplied by a similar cable from the other PPS.
Notes:
1. Before replacing an input power cable to the CEC drawer, I/O drawer, or host bay power supply, verify that the power supplies are receiving power through the other input power cable. Observe the power supply input LED indicators on the affected power supply:
   • CEC drawer power supply input power indicators: PWR 1 and PWR 2.
   • I/O drawer power supply input power indicators: PWR 1 and PWR 2.
   • Host bay power supply input power indicators: INPUT J11 and INPUT J12.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

Isolation
1. Observe the six power supplies for the CEC, I/O, and host bay drawers of a cluster. How many power supplies have the same input power LED indicator off?
   • One, continue with the next step.
   • Two to five, go to step 5 on page 121.
   • Six, go to step 6 on page 121.
2. Observe the failing power supply. Is the input power cable plugged into the failing power supply input connector?
   • Yes, go to step 4.
   • No, continue with the next step.
3. Plug in the input power cable. Did the power supply input power LED indicator come on?
   • Yes, use the Repair Menu options, Close a Previously Repaired Problem and End Of Call Status, to complete the repair action.
   • No, continue with the next step.
4. Unplug the cable from the failing power supply input connector. Check the connector contacts on the cable and on the failing power supply.
   • If a problem is found, replace the damaged FRU.

   - If no problem is found, replace the following FRUs until the problem is repaired: power supply, input power cable. Use the Repair Menu, Replace a FRU menu option.
   Note: You may be able to test the power supply by installing it in the same position in the other cluster. Use the Repair Menu, Replace a FRU menu option.
5. Observe the failing power supplies. Are the input power cables plugged in?
   - Yes, replace the input power cable. Use the Repair Menu, Replace a FRU menu option.
   - No, go to step 7.
6. Refer to Table 24 on page 122 for cable plugging. Use the Cable To Location column to find the cluster and power supply input connector that is failing. Use the Cable From Location column to determine the PPS and connector. Observe the PPS. Is the power cable plugged in to the connector?
   - Yes, go to step 8.
   - No, continue with the next step.
7. Plug the cable in and observe the power input LED indicators. Did the power supply input LED indicators come on?
   - Yes, use the Repair Menu options Close a Previously Repaired Problem and End Of Call Status to complete the repair action.
   - No, continue with the next step.
8. Press the operator panel Local power switch momentarily to the on position. Did the power supply input LED indicators come on?
   - Yes, use the Repair Menu options Close a Previously Repaired Problem and End Of Call Status to complete the repair action.
   - No, continue with the next step.
9. The PPS output for the connector is not being switched on. Observe the digital display indicator on the front of the PPS. Are any codes displayed?
   - Yes, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127 to repair the PPS. If the original problem (input power LED indicator off) still occurs, return to the beginning of this MAP and repair that problem.
   - No, replace the PPS using the Repair Menu, Replace a FRU menu option.
   Note: The power cable is not a FRU because there is no single wire for voltage or ground that would cause all six power supplies to fail.
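The step 1 routing above depends only on how many of the six supplies in one cluster show the same input power LED off. As an illustration, it can be written as a small decision function. This is a hypothetical Python sketch; the function name `map2220_step1_next` is invented here and nothing like it exists on the service terminal.

```python
def map2220_step1_next(leds_off: int) -> str:
    """Return the next MAP 2220 step for the number of power supplies
    (out of the six in one cluster) whose input power LED is off.

    Hypothetical helper for illustration; not part of the service terminal.
    """
    if not 1 <= leds_off <= 6:
        raise ValueError("step 1 expects between 1 and 6 supplies with the LED off")
    if leds_off == 1:
        return "step 2"  # single supply: check its own cable and connector
    if leds_off == 6:
        return "step 6"  # whole cluster input: trace the cable back to the PPS
    return "step 5"      # two to five supplies: check their input power cables
```

For example, `map2220_step1_next(3)` returns `"step 5"`.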


Table 24. Cluster Power Supply Input Power Cable Plug Chart

Cable From Location         | Cable To Location
PPS-1 output connector J7-1 | Cluster 1 (T1), CEC, I/O, Host Bay drawer power supplies, input 1 connector
PPS-1 output connector J7-2 | Cluster 2 (T2), CEC, I/O, Host Bay drawer power supplies, input 2 connector
PPS-2 output connector J7-1 | Cluster 2 (T2), CEC, I/O, Host Bay drawer power supplies, input 1 connector
PPS-2 output connector J7-2 | Cluster 1 (T1), CEC, I/O, Host Bay drawer power supplies, input 2 connector
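The plug chart is a fixed mapping, so the lookup in step 6 (from a failing cluster input back to the PPS output connector that feeds it) can be expressed as a dictionary. This is an illustrative Python sketch only; `CABLE_PLUG_CHART` and `cable_source` are hypothetical names, and the entries are copied from Table 24.

```python
from typing import Dict, Tuple

# Hypothetical encoding of Table 24:
# (cluster number, power supply input connector) -> (PPS, PPS output connector)
CABLE_PLUG_CHART: Dict[Tuple[int, int], Tuple[str, str]] = {
    (1, 1): ("PPS-1", "J7-1"),
    (2, 2): ("PPS-1", "J7-2"),
    (2, 1): ("PPS-2", "J7-1"),
    (1, 2): ("PPS-2", "J7-2"),
}


def cable_source(cluster: int, input_connector: int) -> Tuple[str, str]:
    """Return the Cable From Location (PPS, connector) for one cluster input."""
    key = (cluster, input_connector)
    if key not in CABLE_PLUG_CHART:
        raise ValueError(f"no Table 24 entry for cluster {cluster}, input {input_connector}")
    return CABLE_PLUG_CHART[key]


# Each cluster is fed by both PPS: input 1 from one PPS, input 2 from the other.
for cluster in (1, 2):
    assert {cable_source(cluster, n)[0] for n in (1, 2)} == {"PPS-1", "PPS-2"}
```

For example, a failing cluster 2 input 1 connector traces back to `cable_source(2, 1)`, which returns `("PPS-2", "J7-1")`. The loop at the end checks the redundancy property described at the start of this MAP: no cluster depends on a single PPS.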

MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CEC and I/O drawer power supplies switch on the amber CHK/PWR-GOOD LED indicator for:
- Missing input power
- Over-current
- Over-voltage
- Under-voltage

The host bay power supplies switch off the HA1 or HA2 power LEDs under control of the RPC, or for the following error conditions:
- Missing input power on both inputs
- Over-current
- Over-voltage
- Under-voltage

Notes:
1. Before replacing an input power cable to the host bay drawer power supplies, verify that the power supplies are receiving power through the other input power cable. Observe the power supply INPUT PRESENT LED indicators.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

Isolation
1. Were you sent here for a problem with one of the host bay power supplies?
   - Yes, continue with the next step.
   - No, go to step 7 on page 124.
2. Refer to the ESC in the problem log. Use the following table to determine the failing FRU and action:
   ESC  | Failing FRU                  | Action
   8435 | Host bay power supply R1B1V1 | Go to step 4
   8436 | Host bay power supply R1B1V2 | Go to step 4
   8437 | Host bay power supply R1B3V1 | Go to step 4
   8438 | Host bay power supply R1B3V2 | Go to step 4
   849C | Host bay power supply R1B1V1 | Go to step 3
   849D | Host bay power supply R1B1V2 | Go to step 3
   849E | Host bay power supply R1B3V1 | Go to step 3
   849F | Host bay power supply R1B3V2 | Go to step 3

3. Replace the failing FRU listed in the previous step. Select the FRU for replacement using the problem log. If the FRU is not selectable in the problem log, use the Repair Menu, Replace a FRU option. (The host bay power supplies are listed under the rsrack1 container.)
   - If the FRU replacement is completed successfully, continue with step 12 on page 124.
   - If the FRU replacement is unsuccessful, replace any remaining FRUs listed in the problem log. Use the Repair Menu, Replace a FRU option.
4. Observe the INPUT J11 and INPUT J12 LEDs on the failing host bay power supply. Are one or both of the LEDs off?
   - Yes, go to MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected on page 120.
   - No, continue with the next step.
5. Observe the HA1 and HA2 LEDs on the failing power supply. Are one or both of the LEDs off?
   - Yes, continue with the next step.
   - No, the power supply does not appear to be failing. Perform a dummy replacement of the power supply using the Repair Menu, Replace a FRU option, and ensure that the power supply is physically removed and replaced. The host bay power supplies are listed under the rsrack1 container. If the dummy replacement is successful, continue with step 12 on page 124. If it is unsuccessful, replace the failing power supply or other FRUs listed in the problem log, using the Repair Menu, Replace a FRU option. Continue with step 12 on page 124 when the replacement is completed.
6. Observe the HA1 and HA2 LEDs on the companion host bay power supply in the same drawer. Is the same LED off on both power supplies?
   - Yes, go to MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent on page 113.
   - No, replace the failing power supply determined in step 2 on page 122. Use the Repair Menu, Replace a FRU option.
   Note: The host bay power supplies are listed under the rsrack1 container. When the FRU replacement is completed successfully, continue with step 12.
7. Observe the CEC or I/O power supply CHK/PWR-GOOD LED indicator. Is the LED lit amber?
   - Yes, continue with the next step.
   - No, the failure is no longer present. Return to the procedure that sent you here.
8. Observe the INPUT PRESENT LED indicators on the failing power supply. Is an LED indicator off?
   - Yes, go to MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected on page 120.
   - No, continue with the next step.
9. Observe the companion power supply in the same drawer. Is its LED lit amber?
   - Yes, go to MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent on page 113.
   - No, continue with the next step.
10. Use the service terminal Repair Menu, Replace a FRU menu options to replace the failing power supply. Was the repair successful?
   - Yes, go to step 12.
   - No, continue with the next step.
11. One of the power inputs to the failing power supply may be causing the error. The FRUs for each power input are the PPS and the cable from the PPS to the failing power supply. Use the service terminal Repair Menu, Replace a FRU menu options to replace the FRUs.
   Note: Call the next level of support before proceeding with the FRU replacement.
12. If you used the Replace a FRU menu option, close the problem that sent you here by using the Repair Menu, Close a Previously Repaired Problem option. Then use the Repair Menu, End of Call Status menu option to complete the service action.
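The ESC table in step 2 pairs each host bay ESC with a supply location and a next step: the 843x ESCs lead to the input LED checks (step 4), and the 849x ESCs lead straight to FRU replacement (step 3). The Python sketch below is a hypothetical lookup for illustration; `ESC_TABLE` and `esc_action` are invented names, and the entries are copied from the table.

```python
from typing import Dict, Tuple

# Hypothetical encoding of the MAP 2230 ESC table:
# ESC -> (host bay power supply location, step to go to)
ESC_TABLE: Dict[str, Tuple[str, int]] = {
    "8435": ("R1B1V1", 4), "8436": ("R1B1V2", 4),
    "8437": ("R1B3V1", 4), "8438": ("R1B3V2", 4),
    "849C": ("R1B1V1", 3), "849D": ("R1B1V2", 3),
    "849E": ("R1B3V1", 3), "849F": ("R1B3V2", 3),
}


def esc_action(esc: str) -> str:
    """Describe the failing FRU and next step for an ESC from the problem log.

    Raises KeyError for an ESC that is not a host bay power supply ESC.
    """
    location, step = ESC_TABLE[esc.strip().upper()]
    return f"Host bay power supply {location}: go to step {step}"
```

For example, `esc_action("849c")` returns `"Host bay power supply R1B1V1: go to step 3"`; normalizing to upper case is a convenience choice of this sketch.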

MAP 2320: Installed Unit or Feature Mismatch


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 code has detected an installed unit or feature that is not correct.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The 2105 Model 800 code has detected one of the following:

   - RPC card DIP switch 5 is set to on (Model 100 attachment rack). This setting is not supported; switch 5 must be set to off.
   - A Model 100 attachment rack is physically connected to the 2105 Model 800. The Model 100 must be removed.
   - The rack battery set connected to primary power supply 1 (PPS-1) is not being detected. Verify that the power cable and signal cable from the battery set are connected to the PPS.
   - Observe the digital status display on the front of the PPS. If codes are displayed, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. If no codes are displayed, go to MAP 2470: Battery Set Detection Problem on page 162.

MAP 2340: PPS Status Code 06


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
PPS status code 06 indicates a communication failure between PPS-1 and PPS-2. This communication failure can be caused by two different conditions:
1. A hardware communication fault between PPS-1 and PPS-2. Because PPS-1 and PPS-2 communicate in both directions, the failure could be in either PPS or in the communication cable.
2. A mismatch of the PPS identifications. A PPS installed in the PPS-2 position, which never has a battery signal cable connection, should show identification status code 92. A PPS installed in the PPS-1 position, which always has a battery signal cable connection, should show identification status code 91. If both PPS report the same identification status code, they display status code 06.
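The identification rule in condition 2 can be sketched in a few lines: the position's battery signal cable decides the expected ID, and two PPS reporting the same ID is the mismatch that surfaces as 06. This Python sketch is illustrative only; the function names are hypothetical, and it models only the ID-mismatch cause, not the hardware communication fault in condition 1.

```python
def expected_pps_id(has_battery_signal_cable: bool) -> int:
    """ID status code a PPS should report for its position.

    The PPS-1 position always has a battery signal cable connection (ID 91);
    the PPS-2 position never does (ID 92).
    """
    return 91 if has_battery_signal_cable else 92


def id_mismatch_causes_06(id_a: int, id_b: int) -> bool:
    """True when both PPS report the same ID, which is displayed as status code 06."""
    return id_a == id_b
```

For example, a PPS-1 that fails to detect its battery signal cable would report 92, matching PPS-2 and producing the 06 condition. This is why the isolation steps check the battery signal cable.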

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Verify that the PPS to PPS cable is properly plugged into the J3 connector on PPS-1 and PPS-2. Is the cable connected correctly?
   - Yes, continue with the next step.
   - No, read the Attention below before continuing. Connect the cable, and then press the 2105 Model 800 operator panel Local power on switch momentarily to on (up). If status code 06 is no longer displayed, go to MAP 1500: Ending a Service Action on page 67. If status code 06 is still displayed, continue with the next step.
   Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. Straightening the pins is not recommended because they can easily bend again. Replace the PPS.
2. Ensure that both PPS have the same code level.

   Display the code level: press the 2105 Model 800 operator panel Local power on switch momentarily to on (up), and observe the status code display on each PPS. It shows a sequence of 00, then xx (the code level number, 30-89), and then yy (either 91 or 92). Are the code levels the same?
   - Yes, continue with the next step.
   - No, call the next level of support to determine the proper code level. Replace the PPS that contains the improper code level. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options for the Primary Power Supply. After the PPS is replaced, ensure that both PPS have the same code level. See the description for status code 00-xx-yy.
   Note: Ensure that both PPS in the rack are the same type. The new type has an additional connector, J5C, that is not present on the old type. (The exception is while a rack is being upgraded concurrently from the old to the new type of PPS.) For further information, see Primary Power Supply, 2105 Model 800 and Expansion Enclosure in Chapter 4 of Volume 2.
3. Determine whether the communication failure is caused by a PPS identification mismatch. Display the PPS-2 identification: press the 2105 Model 800 operator panel Local power on switch momentarily to on (up), and observe the PPS-2 status code display. A sequence of 00, then xx (any number between 30 and 89), and then yy is repeated for about 10 seconds. Does yy = 92?
   - Yes, continue with the next step.
   - No, replace PPS-2. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options for the Primary Power Supply. After the PPS is replaced, ensure that both PPS have the same code level. See the description for status code 00-xx-yy.
4. Display the PPS-1 identification: press the 2105 Model 800 operator panel Local power on switch momentarily to on (up), and observe the PPS-1 status code display. A sequence of 00, then xx (any number between 30 and 89), and then yy is repeated for about 10 seconds. Does yy = 91?
   - Yes, go to step 9 on page 127.
   - No, continue with the next step.
5. Verify that the PPS-1 to Battery Signal cable is properly plugged into the PPS-1 J5B connector and the battery J1B connector. Is the cable connected correctly?
   - Yes, continue with the next step.
   - No, connect the cable correctly, and then return to the top of this MAP.
6. Switch the battery circuit breaker to the off position (down).
7. Unplug both ends of the PPS-1 to Battery Signal cable (PPS-1 J5B and battery J1B). Use a meter to measure the continuity of each of the four wires in the cable. Does the continuity check show any wire as an open circuit?
   - Yes, replace the PPS-1 to Battery Signal cable, switch the battery circuit breaker to the on position (up), and then return to the top of this MAP.
   - No, continue with the next step.
8. Measure the continuity between the upper two pins of the 390V battery J1B connector. Measure the continuity between the lower two pins of the 390V battery J1B connector. Do both pairs of pins indicate a closed circuit?

   - Yes, replace PPS-1. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options for the Primary Power Supply. After the PPS is replaced, ensure that both PPS have the same code level. See the description for status code 00-xx-yy.
   - No, replace the 390V Battery Set. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options.
9. A communication problem exists between the two PPS.
   Note: A bent pin in the J3 connector in either PPS can cause this failure.
   Do both PPS display status code 06?
   - Yes, read the Attention below before continuing. Replace each PPS and the PPS to PPS cable until the problem is fixed. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options for the Primary Power Supply.
     Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. Straightening the pins is not recommended because they can easily bend again. Replace the PPS.
   - No, read the Attention below before continuing. The PPS displaying status code 06 is receiving bad parity from the sending PPS. Replace the sending PPS, the PPS to PPS cable, and the receiving PPS, in that order, until the problem is fixed. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs options for the Primary Power Supply. If a PPS is replaced, ensure that both PPS have the same code level. See the description for status code 00-xx-yy.
     Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. Straightening the pins is not recommended because they can easily bend again. Replace the PPS.
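The two continuity checks in this MAP (the four wires of the PPS-1 to Battery Signal cable, then the upper and lower pin pairs of the 390V battery J1B connector) choose between three FRUs. The Python sketch below encodes that choice as an illustration only. The function name and its boolean inputs are hypothetical; the meter readings themselves are still taken by hand as the steps describe.

```python
from typing import Sequence


def map2340_battery_fru(wire_open: Sequence[bool],
                        j1b_upper_closed: bool,
                        j1b_lower_closed: bool) -> str:
    """Pick the FRU to replace from the battery signal continuity results.

    wire_open: one entry per wire of the four-wire PPS-1 to Battery Signal
    cable, True if that wire measured as an open circuit.
    """
    if len(wire_open) != 4:
        raise ValueError("the PPS-1 to Battery Signal cable has four wires")
    if any(wire_open):
        return "PPS-1 to Battery Signal cable"  # cable continuity check failed
    if j1b_upper_closed and j1b_lower_closed:
        return "PPS-1"                          # battery side looks good
    return "390V Battery Set"                   # a J1B pin pair is open
```

The cable check comes first because an open wire would make the J1B measurements irrelevant; the order mirrors the order of the steps.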

MAP 2350: Isolating PPS Status Indicator Codes


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The PPS status display is normally off. If a power fault is detected, the status display shows a two-digit code. If more than one fault is present, the first status code is displayed, followed by the next codes. If a status code is displayed, the operator panel Line Cord indicator for this PPS should be blinking slowly. Pressing the 2105 Model 800 operator panel Local power switch momentarily to the on position displays the PPS code level, the PPS I.D., and any active status codes.

Isolation
1. Observe the operator panel Line Cord LED indicator for the failing PPS. Find the condition that applies:
   - On solid: the failing condition is no longer present. Return to the procedure that sent you here or call the next level of support.
   - Blinking slowly: the failing condition is still present and a status code should be displayed. Continue with the next step.

   - Off: one of the following conditions exists: the PPS is powered off, the LED indicator is burned out, or the RPC card is failing to light the LED. Go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60 and use the visual symptom table entry "One operator panel Line Cord indicator off, the other Line Cord indicator on."
2. Observe the PPS digital status indicator display on the front of the PPS, between the two fans. Is a status code displayed?
   - Yes, look up the code in Table 25 and perform the action. After the fault is repaired, go to MAP 1500: Ending a Service Action on page 67.
   - No, continue with the next step.
3. Determine whether the PPS digital status display is failing. Press the operator panel white (Local Power) power switch momentarily to the on position (up). This issues a reset to the PPS. Observe the PPS status display. Is a sequence of status codes (00-xx-yy in Table 25) displayed for up to 30 seconds?
   - Yes, the status display is working, but the PPS is not displaying error status. If the Line Cord LED is still blinking slowly (indicating that error status is present), replace the PPS or call the next level of support. Use the Repair Menu, Replace a FRU menu options.
   - No, the status display is failing. Replace the PPS FRU. Use the Repair Menu, Replace a FRU menu options.
   Note: A UEPO condition on both PPS in the rack can result in no status code being displayed. See the "blank" entry in Table 25.
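Step 1 is a three-way branch on the Line Cord LED state. The following hypothetical Python sketch shows the branch as an enumeration plus a mapping function. The names `LineCordLed` and `map2350_step1` are invented for illustration, and the returned strings only summarize the actions listed in step 1.

```python
from enum import Enum


class LineCordLed(Enum):
    ON_SOLID = "on solid"
    BLINKING_SLOWLY = "blinking slowly"
    OFF = "off"


def map2350_step1(led: LineCordLed) -> str:
    """Map the Line Cord LED state for the failing PPS to the step 1 action."""
    if led is LineCordLed.ON_SOLID:
        return "fault cleared: return to the calling procedure or call next level of support"
    if led is LineCordLed.BLINKING_SLOWLY:
        return "fault present: read the PPS status code (step 2)"
    return "go to MAP 1320 (one Line Cord indicator off, the other on)"
```

Using an enumeration rather than free text keeps the three states explicit, which matches the step's instruction to find the one condition that applies.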
Table 25. PPS Status Display Codes

Status code: blank
   Description: If a problem log sent you to this MAP, the display should show a status code. If both PPS have been switched off by an EPO condition, the displays are blank. If only one PPS has been switched off by an EPO problem, its display should show a status code, because power from the working PPS through the PPS to PPS communication cable enables the PPS with the EPO to display status. (A blank display is normal if there is no error condition and more than 30 seconds have passed since a power on request to the PPS.)
   Action: At the failing PPS, observe the UEPO PWR LED indicator.
   - If it is off, the PPS is not receiving line cord power.
   - If it is on, observe the UEPO LOOP-STBY LED indicator:
     - If UEPO LOOP-STBY is off, a UEPO condition exists. Observe the red UEPO switch on the operator panel of the failing rack. If the switch is off, determine why before resetting it. If the switch is on, there is another UEPO problem; go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131 or MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem on page 138 as needed.
     - If UEPO LOOP-STBY is on, the original problem condition may no longer be present. Call the next level of support.

Status code: 00-xx-yy
   Description: PPS code level. 00 is displayed, followed by the PPS code level (xx, 3x-8x) and then the PPS I.D. (yy: 91 = PPS-1, 92 = PPS-2). This sequence repeats a few times when the PPS powers on and when the 2105 Model 800 operator panel Local power switch is momentarily pressed to on.
   Action: None.

Status code: 01
   Description: PPS Fan #1 fault. The fan rotation sensor reports that the fan is below minimum speed.
   Action: Replace PPS Fan #1 (left fan). The visual symptoms reset automatically when the FRU is replaced. Then go to MAP 1500: Ending a Service Action on page 67.

Status code: 02
   Description: PPS Fan #2 fault. The fan rotation sensor reports that the fan is below minimum speed.
   Action: Replace PPS Fan #2 (right fan). The visual symptoms reset automatically when the FRU is replaced. Then go to MAP 1500: Ending a Service Action on page 67.

Status code: 03
   Description: The 390 V battery has a low charge. When fully discharged, the battery can require up to 25 hours to become fully charged. Status code 03 is no longer displayed when the batteries are fully charged.
   - If status code 03 is still displayed after 25 hours, a permanent error is logged.
   - When status code 03 is no longer displayed, the 390 V battery has been fully charged. Go to MAP 1500: Ending a Service Action on page 67.

Status code: 04
   Description: The 390 V battery set is not detected correctly. This can be a false error if the correct cable connection sequence is not followed when PPS-1 or the 390 V battery is replaced.
   Action: Go to MAP 2470: Battery Set Detection Problem on page 162.

Status code: 05
   Description: 390 V battery fault. The system has lost customer mainline power input to both PPS and is now operating on battery set power. The 2105 Model 800 will complete writing customer data from cache to the DDMs and power off in five minutes.
   Action: Have the customer restore power to the 2105 Model 800. Find the condition that applies:
   - Remote power control feature not installed, RPC switch card set to local power control mode: use the rack operator panel to power on.
   - Remote power control feature not installed, RPC switch card set to remote power control mode: the rack powers on automatically when customer line cord power returns.
   - Remote power control feature installed, RPC switch card set to local power control mode: use the rack operator panel to power on.
   - Remote power control feature installed, RPC switch card set to remote power control mode: have the customer issue a power on request from the attached host systems.

Status code: 06
   Description: PPS communication fault to the other PPS in this rack, due to a hardware communication problem or to both PPS reporting as the same logical PPS (PPS-1 or PPS-2).
   Action: Go to MAP 2340: PPS Status Code 06 on page 125.

Status code: 07
   Description: A PPS AC input phase is missing.
   Action:
   - Use the power checks for this line cord listed in the service guide Install Chapter 5 for this rack. Use the service terminal Repair Menu, Replace a FRU option to prepare the PPS to be powered off for the checks.
   - If line cord power is not good, contact the customer.
   - If line cord power is good, replace the PPS. Use the service terminal Repair Menu, Replace a FRU option, Rack Power Cooling FRUs option.

Status code: 08
   Description: Input voltage not detected or UEPO loop open.
   Action:
   1. Observe the PPS UEPO PWR indicator (see Figure 40 on page 132). Is the PPS UEPO PWR indicator on?
      - Yes, the UEPO loop seems to be open. Go to MAP 2365: UEPO Loop Problem on page 133.
      - No, the PPS line input seems to be missing. Continue with the next step.
   2. Read the notes below, then continue with the next step.
      Note: There are three types of PPS:
      - One for the complete input voltage range
      - One for a low input voltage range
      - One for a high input voltage range
      Note: The high input voltage range PPS acts as if the line cord input is missing if the customer is providing power at the low input voltage range. For more information, refer to Primary Power Supply Removal and Replacement, 2105 Model 800 and Expansion Enclosure in Chapter 4 of Volume 2.
   3. Use the service terminal Repair Menu, Replace a FRU option to power off the PPS.
   4. Use the Service Guide procedure Checking the Customer Power in Chapter 5 of Volume 2 to verify the correct line input voltage for this PPS. Was the input power good?
      - Yes, replace the PPS. From the service terminal Main Service Menu, select Repair Menu, Replace a FRU, Rack Power Cooling FRUs.
      - No, work with the customer to identify the cause of the missing AC input.
      Note: If the customer circuit breaker was found to have tripped unexpectedly, this could have been caused by a faulty or incorrectly rated CB. Refer to the Introduction and Planning Guide, GC267444, to ensure that the CB is correctly rated. If no problem is identified with the circuit breaker, replace the PPS using the process described for the Yes answer above.

Status code: 09
   Description: PPS over-temperature condition.
   Action:
   - Check that no other fault codes are displayed, that the room air temperature is within limits, and that airflow is not blocked.
   - Replace the PPS. Use the service terminal Repair Menu, Replace a FRU option, Rack Power Cooling FRUs option.

Status code: 10
   Description: PPS over-current fault.
   Action:
   - If status code 14 is also displayed, repair it first.
   - If any output circuit breaker is tripped, go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168.
   - If no output circuit breaker is tripped, replace the PPS. Use the service terminal Repair Menu, FRU Replace Menu options, Rack Power Cooling FRUs option.

Status code: 11
   Description: PPS over-voltage fault.
   Action: Replace the PPS. Use the service terminal Repair Menu, FRU Replace Menu options, Rack Power Cooling FRUs option.

Status code: 12
   Description: PPS under-voltage fault.
   Action:
   Note: Status code 12 might appear in combination with other status codes, for example 10-12-14. If this happens, follow the actions for the other status codes first, as indicated in the following list. Correcting the other codes normally resolves status code 12.
   - If status code 14 is also displayed, repair it first.
   - If status code 10 is also displayed, repair it next.
   - If status code 12 is the only error indication, replace the PPS. Use the service terminal Repair Menu, FRU Replace Menu options, Rack Power Cooling FRUs option.

Status code: 13
   Description: PPS output CB tripped.
   Action: Go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168.

Status code: 14
   Description: PPS internal logic error or UEPO loop open error.
   Action: Go to MAP 2365: UEPO Loop Problem on page 133.

Status code: 15
   Description: Battery low early warning. The PPS reports that the battery charge has reached a threshold at which at least one minute of battery capacity remains to support the frame.
   Action: The 2105 Model 800 is on battery and the battery set has gone low. When the customer restores line cord power, the battery set is recharged automatically.

Status code: 16
   Description: Input CB tripped.
   Action:
   - If the input circuit breaker tripped and no output circuit breaker tripped, there is a problem inside the PPS. Do not reset the input circuit breaker (CB00) to the on position (up). Replace the PPS. Use the service terminal Repair Menu, Replace a FRU option, Rack Power Cooling FRUs option.
   - If the input circuit breaker was switched off intentionally with no problem, switch it back to the on position. After the CB is on, press the rack 1 operator panel Local Power switch to on, then release it. Wait for both PPS PWR GOOD indicators to come on solid.
   - If the input circuit breaker was switched off intentionally and a problem occurred, DO NOT switch on the input circuit breaker. Replace the failing PPS.

Status code: 3x-8x
   Description: The PPS code level. See the description for status code 00-xx-yy above.

Status code: 91
   Description: 91 is the ID status code for PPS-1. See the description for status code 00-xx-yy above.

Status code: 92
   Description: 92 is the ID status code for PPS-2. See the description for status code 00-xx-yy above.
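Two entries in Table 25 involve reading a sequence rather than a single code: the 00-xx-yy code level and I.D. sequence, and combined fault codes such as 10-12-14, which codes 10 and 12 say to repair in a fixed order. The Python sketch below illustrates both. It is hypothetical and not a service terminal function. Placing codes other than 10, 12, and 14 alongside 10 (before 12, in display order) is an assumption of this sketch; the table only says that they are handled before 12.

```python
from typing import List, Sequence, Tuple

PPS_POSITIONS = {"91": "PPS-1", "92": "PPS-2"}


def decode_code_level(sequence: Sequence[str]) -> Tuple[int, str]:
    """Decode the 00, xx, yy sequence into (code level, PPS position)."""
    if len(sequence) != 3 or sequence[0] != "00":
        raise ValueError("expected the sequence 00, xx, yy")
    level, ident = int(sequence[1]), sequence[2]
    if not 30 <= level <= 89:
        raise ValueError(f"code level {level} is outside the 30-89 range")
    if ident not in PPS_POSITIONS:
        raise ValueError(f"unexpected PPS ID code {ident}")
    return level, PPS_POSITIONS[ident]


# Codes 10 and 12: repair 14 first, then 10 (and, by assumption, other codes),
# and 12 last. sorted() is stable, so ties keep their display order.
_PRIORITY = {"14": 0, "12": 2}


def repair_order(display: str) -> List[str]:
    """Order a combined display such as '10-12-14' for repair."""
    return sorted(display.split("-"), key=lambda code: _PRIORITY.get(code, 1))
```

For example, `repair_order("10-12-14")` returns `["14", "10", "12"]`, and `decode_code_level(["00", "45", "92"])` returns `(45, "PPS-2")`.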

MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the PPS dc output. The logic voltages for the PPS internal logic, RPC card, and operator panel are not switched off. To switch off all logic voltages, the PPS input circuit breaker must be switched off.
Note: Each PPS supplies the other PPS with logic voltage, only for the PPS internal logic, through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.

The PPS UEPO PWR indicator is on when the PPS has customer input power, the input circuit breaker is on and the PPS internal logic is providing UEPO logic voltage.
Figure 40. 2105 Primary Power Supply Locations (s009048): rear view of the primary power supply showing the indicators (UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY); front view showing the two-digit PPS digital status display.

Isolation
The 2105 Model 800 will be powered off during this isolation. Ensure that it is not in use by the customer. This isolation performs a complete checkout of the UEPO functions.
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The 2105 Model 800 should be in local power control mode for this MAP. Verify that the local/remote switch is set to local (down) on both RPC switch cards (Local Switch Cards or Remote Switch Cards, depending on which feature is installed).
2. Power off the 2105 Model 800.
3. Ensure that the input circuit breaker for each PPS is set to on (up).
4. Ensure that the 2105 Model 800 operator panel Unit Emergency switch is set to on (up).

5. Ensure that the 2105 Model 800 operator panel Local/Remote switch, inside the front cover, is in the back position (partially covering the connector).
6. Ensure that the 2105 Model 800 operator panel Local Power rocker switch is not stuck in the down or up position. It is a momentary-contact rocker switch.
7. Is the PPS UEPO PWR indicator on?
   - Yes, continue with the next step.
   - No, go to the install Chapter 5 and perform the customer line cord power checks. If no problems are found, replace the PPS and then return here. Use the service terminal Repair Menu, Replace a FRU option.
8. Does either base frame PPS display status code 14 or 08, continuously or flashing in combination with other codes (for example 10-12-14)?
   - Yes, go to MAP 2365: UEPO Loop Problem. Return here and continue with the next step when the repair is complete.
   - No, continue with the next step.
9. Switch the operator panel UEPO switch to the off position (O, down). Is the PPS UEPO LOOP-STBY indicator off?
   - Yes, go to step 11.
   - No, the UEPO switch is not opening the UEPO loop circuit. Continue with the next step.
10. Unplug the J6 connector on each PPS. Did the UEPO LOOP-STBY indicator go off?
   - Yes, use a CE meter to measure the resistance between the pins on the end of each cable plug. You should measure a high resistance, for example greater than 1 Mohm. A low resistance indicates a problem. Replace the faulty UEPO panel or J6 to UEPO panel cable. Continue with the next step when the problem is resolved.
   - No, one of the PPSs is failing. Unplug the J3 connector on one of the PPSs. The PPS whose UEPO LOOP-STBY indicator is still on is failing. Replace that PPS. Continue with the next step when the problem is resolved.
11. The UEPO is working properly. Set the operator panel UEPO switch to the on position (up). Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.
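The resistance check in the UEPO cable branch is a pass/fail test against a high-resistance threshold: with the UEPO switch off, the panel and J6 cable should read as an open circuit. The Python sketch below illustrates that test. The 1 Mohm figure is the example value the step gives, not a published specification, and the function name is hypothetical.

```python
# Example threshold from the UEPO cable check ("for example greater than 1M ohm");
# an illustrative value, not a specification.
OPEN_CIRCUIT_OHMS = 1_000_000


def uepo_cable_reading_ok(measured_ohms: float) -> bool:
    """True if the reading is high enough to count as an open UEPO loop.

    A low reading means the UEPO panel or the J6 to UEPO panel cable is
    holding the loop closed even with the switch off.
    """
    if measured_ohms < 0:
        raise ValueError("resistance cannot be negative")
    return measured_ohms > OPEN_CIRCUIT_OHMS
```

A reading of a few ohms returns `False`, pointing to the UEPO panel or its cable as the FRU to replace.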

MAP 2365: UEPO Loop Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the primary power supply (PPS) 395 V dc output. The logic voltages for the PPS internal logic, RPC card, and operator panel are not switched off. To switch off all logic voltages, the PPS input circuit breaker must be switched to off.

Note: Each PPS supplies the other PPS with logic voltage, for the PPS internal logic only, through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.

Problem Isolation Procedures, CHAPTER 3

133

MAP 2365: UEPOLOOP


The PPS UEPO PWR indicator is on when the PPS has customer input power, the input circuit breaker is on and the PPS internal logic is providing UEPO logic voltage.
Figure 41. 2105 Primary Power Supply Locations (s009048). Rear view: indicators UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY. Front view: two-digit PPS digital status display.

Isolation
Attention: If you are performing the following steps concurrently with customer operation, take care to always remove the correct connectors. Failure to do so could result in a complete subsystem power drop.

1. The 2105 Model 800 should be in local power control mode for this MAP. Verify that the local/remote switch on each RPC card is set to local (down). If the switches are set to remote (up), set them to the down position. When the repair is complete, set them back to their original position.
2. Is status code 14 or 08 displayed on both PPSs in this rack?
   Note: The status code can appear continuously or flashing in combination with other codes, for example 10-12-14.
   • Yes, this condition is not expected. Call your next level of support.
   • No, continue with the next step.
3. Ensure that the UEPO cable is plugged into PPS connector J6 and into operator panel UEPO card connector J1 or J2. Did you find a problem?
   • Yes, go to Step 7 on page 136.



   • No, continue with the next step.
4. Perform the following actions:
   a. Unplug the J6 connector on the failing PPS.
   b. Use a CE meter to measure the resistance between the pins on the cable plug.
   Was the measured resistance less than 1 ohm?
   • Yes, replace the PPS. Use the Repair Menu, Replace a FRU option.
   • No, continue with the next step.
5. Is there a cable plugged into the UEPO Local/Remote connector (callout 2 in Figure 42)?
Figure 42. Rack Operator Panel Locations (s009714). Front and rear views of the UEPO LOCAL/REMOTE area, showing the Local position (callout 1), the Local/Remote connector (callout 2), and the clips.

   • Yes, the cause may be a problem in the Customer Remote UEPO circuit. Do not continue; contact your next level of support for advice.
   • No, ensure that the UEPO Local/Remote switch is in the Local position (callout 1). If you found a problem, go to Step 7 on page 136; if not, continue with the next step.
6. The PPS to UEPO card cable or the operator panel UEPO card is failing. Perform the following actions to determine the cause:
   a. Carefully trace the cable from the PPS to the UEPO panel to identify the connector (P1 or P2) into which the cable plugs.
   b. Unplug the connector from the UEPO panel.
   c. Use a CE meter to measure the resistance between the pins on the UEPO panel connector.
   Was the resistance measured in step c less than 1 ohm?
   • Yes, replace the cable from the PPS to the UEPO panel, then continue with the next step.
   • No, read the notes below, replace the UEPO panel, and then continue with the next step.
   Notes:
   a. If the UEPO panel needs to be changed concurrently, contact your next level of support for a procedure.


   b. To test the UEPO function, use the procedure Check the 2105 UEPO and Power Supplies in Chapter 5 of Volume 2. This test cannot be done concurrently with customer use.
7. Ensure that the J6 connector is plugged back into the PPS. Press the operator panel Power On switch toward the On (1) position and then release it. Wait 15 seconds and then observe the PPS status display. Is the display now blank?
   • Yes, continue with the next step.
   • No, return to MAP 2350: Isolating PPS Status Indicator Codes on page 127 to repair the problem indicated by the status code.
8. The problem is now repaired. Return the Local/Remote switches to the position noted in step 1 on page 134.
9. Close the problem that sent you here. Use the Repair Menu, Close a previously repaired problem option. Then use the Repair Menu, End of Call Status option to verify that good subsystem status is established.
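The CE meter readings in MAP 2365 (steps 4 and 6) and MAP 2360 (step 10) each reduce to a comparison against the thresholds stated in those steps. The following Python sketch encodes those comparisons. The function names are hypothetical and the readings are assumed to be entered in ohms. The sketch only mirrors the MAP decisions and does not replace them.

```python
"""Hypothetical decision helpers for UEPO loop resistance readings."""

ONE_OHM = 1.0            # closed-loop threshold used in MAP 2365 steps 4 and 6
ONE_MEGOHM = 1_000_000.0  # open-loop threshold used in MAP 2360 step 10


def map_2365_step_4(ohms_at_j6_cable_plug: float) -> str:
    """J6 unplugged from the failing PPS; reading across the cable plug."""
    if ohms_at_j6_cable_plug < ONE_OHM:
        # The MAP calls for replacing the PPS in this case.
        return "Replace the PPS"
    return "Continue with step 5"


def map_2365_step_6(ohms_at_uepo_panel_connector: float) -> str:
    """Cable unplugged at the UEPO panel; reading across the panel connector."""
    if ohms_at_uepo_panel_connector < ONE_OHM:
        return "Replace the PPS to UEPO panel cable"
    return "Replace the UEPO panel"


def map_2360_step_10_reading_ok(ohms_with_uepo_switch_off: float) -> bool:
    """With the UEPO switch off, the loop should read high (> 1 megohm)."""
    return ohms_with_uepo_switch_off > ONE_MEGOHM
```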

MAP 2370: Rack 1 Power On Problem, Automatic Mode


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 power can be controlled in three modes:
• Local Power Control Mode: This mode is available with or without the Remote power feature (RPC DIP switch 3 set to off (left) or on (right)). The operator panel Local power switch controls power on and power off. For local power control, the RPC remote switch card Local/Remote or Local/Automatic switch is set to the Local position (down).
• Automatic Power Mode: This mode is available only when the Remote power feature is not installed; RPC DIP switch 3 is set to off (left). Loss of power to both line cords causes a power off after the 2105 Model 800 has de-staged customer data using the batteries for up to five minutes. When one or both line cords have power again, a power on occurs automatically. The automatic power on occurs only once after each power loss to both line cords. The operator panel Local power switch can also control power on and off. For automatic power control, the RPC card Local/Automatic switch is set to the Automatic position (up) and RPC card DIP switch 3 is off (left).
• Remote Power Control Mode: This mode is available only when the Remote power feature is installed; RPC DIP switch 3 is set to on (right). With line cord power present, a remote power control cable from a host system controls power on and power off. The operator panel Local power switch cannot control a power off or on. If the operator panel needs to be used to control power, set the Local/Remote switch to the Local position (down). For remote power control, the RPC card Local/Remote switch is set to the Remote position (up) and RPC card DIP switch 3 is on (right). Only one host system is required to power on the 2105 Model 800, even if remote power control cables from other host systems that are powered off are connected. A single host system cannot power off the 2105 Model 800 unless all the host systems with remote power control cables attached are powered off.
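The "once after each power loss to both line cords" rule in Automatic Power Mode behaves like a one-shot flag that is armed by a dual line cord loss. The following Python sketch models that rule. It is a simplified, hypothetical model: the class and method names are invented for illustration, and the up-to-five-minute battery de-stage is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AutomaticPowerModeModel:
    """Simplified, hypothetical model of Automatic Power Mode (no timing)."""

    powered_on: bool = True
    restart_pending: bool = False  # armed by a loss of power on both line cords

    def line_cords(self, cord1_has_power: bool, cord2_has_power: bool) -> None:
        if not cord1_has_power and not cord2_has_power:
            # Both cords lost: power off after de-staging customer data.
            self.powered_on = False
            self.restart_pending = True
        elif self.restart_pending:
            # Power returned on one or both cords: exactly one automatic power on.
            self.powered_on = True
            self.restart_pending = False

    def local_power_switch(self, on: bool) -> None:
        # The operator panel Local power switch still works in this mode.
        self.powered_on = on
```

After a manual power off with `local_power_switch(False)`, restoring line cord power does not power the unit on. Only a new loss of power on both cords arms the next automatic power on.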


MAP 2370: Automatic Power On Problem


See Table 26.

Table 26. RPC Card and Local Switch Card Configuration Switch Settings

Power Mode  RPC Card  Local/Remote Switch  DIP 1  DIP 2  DIP 3  DIP 4  DIP 5  DIP 6
Automatic   RPC 1     Remote               On     Off    Off    Off    Off    Off
Automatic   RPC 2     Remote               Off    On     Off    Off    Off    Off
Remote      RPC 1     Remote               On     Off    On     Off    Off    Off
Remote      RPC 2     Remote               Off    On     On     Off    Off    Off
Local       RPC 1     Local                On     Off    Off    Off    Off    Off
Local       RPC 2     Local                Off    On     Off    Off    Off    Off
Local       RPC 1     Local                On     Off    On     Off    Off    Off
Local       RPC 2     Local                Off    On     On     Off    Off    Off

Note: DIP switch 3 is set to off if the Remote Power Control feature is not installed, and to on if the Remote Power Control feature is installed.
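Table 26 can be read as a rule: DIP switches 1 and 2 depend on which RPC card is being checked, DIP switch 3 depends on whether the Remote Power Control feature is installed, and DIP switches 4 to 6 are always off. The following Python sketch transcribes the table into a checker. The function names and the dictionary keys (for example "DIP1") are hypothetical labels chosen for illustration.

```python
"""Hypothetical checker for RPC card switch settings, transcribed from Table 26."""

SWITCHES = ("Local/Remote", "DIP1", "DIP2", "DIP3", "DIP4", "DIP5", "DIP6")


def expected_settings(mode: str, rpc: int, remote_feature: bool) -> dict[str, str]:
    """Expected switch positions for one RPC card in a given power mode."""
    if mode not in ("Automatic", "Remote", "Local"):
        raise ValueError(f"unknown mode: {mode}")
    if rpc not in (1, 2):
        raise ValueError("rpc must be 1 or 2")
    if mode == "Automatic" and remote_feature:
        raise ValueError("Automatic mode requires the Remote feature NOT installed")
    if mode == "Remote" and not remote_feature:
        raise ValueError("Remote mode requires the Remote feature installed")
    return {
        # Automatic mode also uses the up (Remote/Automatic) position.
        "Local/Remote": "Local" if mode == "Local" else "Remote",
        # DIP 1 and DIP 2 differ between RPC 1 and RPC 2 in Table 26.
        "DIP1": "On" if rpc == 1 else "Off",
        "DIP2": "Off" if rpc == 1 else "On",
        # DIP 3 follows the Remote Power Control feature.
        "DIP3": "On" if remote_feature else "Off",
        "DIP4": "Off",
        "DIP5": "Off",
        "DIP6": "Off",
    }


def mismatched_switches(observed: dict[str, str], mode: str, rpc: int,
                        remote_feature: bool) -> list[str]:
    """Names of switches whose observed position differs from Table 26."""
    expected = expected_settings(mode, rpc, remote_feature)
    return [name for name in SWITCHES if observed.get(name) != expected[name]]
```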

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. Use the service terminal Repair Menu, Display / Repair Problems Needing Repair option to repair any related power problems before continuing.
2. You must take the 2105 Model 800 away from the customer before continuing with this procedure.
3. Use the 2105 Model 800 operator panel Local power switch to power off.
4. Ensure that the RPC Interconnect Cable is connected.
5. Ensure that the RPC card to CEC, I/O, and host bay drawer cables are connected.
6. Ensure that the switches on each RPC card are set for automatic mode as shown in Table 26.
7. Set the input MAIN LINE circuit breaker (CB00) to off (down) on PPS 1 and PPS 2.
8. Set the PPS 1 input CB to on (up). Did the 2105 Model 800 power on?
   • Yes, do the following steps:
     a. Power off the 2105 Model 800.
     b. Set the PPS 1 input CB to off.
     c. Set the PPS 2 input CB to on.
     When the 2105 Model 800 powers on, return to the procedure that sent you here or go to MAP 1500: Ending a Service Action on page 67.
   • No, continue with the next step.
9. Set the PPS 1 input CB to off.
10. Set the PPS 2 input CB to on.


   Did the 2105 Model 800 power on?
   • Yes, do the following steps:
     a. Power off the 2105 Model 800.
     b. Set the PPS 2 input CB to off.
     c. Replace RPC1.
     d. Go to step 8 on page 137.
   • No, do the following steps:
     a. Set the PPS 2 input CB to off.
     b. Replace the following FRUs one at a time until the procedure works:
        - RPC1
        - RPC2
        - RPC Interconnect Cable
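Steps 8 to 10 turn on one PPS input circuit breaker at a time. The result of each attempt selects the next action. The following Python sketch restates that branching as a lookup. It is a hypothetical summary for illustration only, and the action strings paraphrase the steps above.

```python
def map_2370_next_action(on_with_pps1_cb_only: bool,
                         on_with_pps2_cb_only: bool) -> list[str]:
    """Hypothetical summary of MAP 2370 steps 8 to 10.

    on_with_pps2_cb_only is only consulted when the PPS 1 attempt failed,
    matching the order of the steps."""
    if on_with_pps1_cb_only:
        # Step 8 Yes: confirm power on with only the PPS 2 input CB on.
        return ["Power off", "PPS 1 CB off", "PPS 2 CB on",
                "When it powers on, return or go to MAP 1500"]
    if on_with_pps2_cb_only:
        # Step 10 Yes: the MAP replaces RPC1 and repeats step 8.
        return ["Power off", "PPS 2 CB off", "Replace RPC1", "Go to step 8"]
    # Step 10 No: neither single-CB attempt powers on.
    return ["PPS 2 CB off",
            "Replace one at a time: RPC1, RPC2, RPC Interconnect Cable"]
```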

MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Expansion Enclosure operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the PPS dc output in the 2105 Expansion Enclosure only. The logic voltages for the PPS internal logic, RPC card, and operator panel are not switched off. To switch off all logic voltages, the PPS input circuit breaker must be switched to off.

Note: Each PPS supplies the other PPS with logic voltage, for the PPS internal logic only, through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.

The 2105 Expansion Enclosure operator panel UEPO switch powers off only the 2105 Expansion Enclosure, not the 2105 Model 800. The 2105 Expansion Enclosure is powered on using the 2105 Model 800 operator panel local/remote power control switch. The PPS UEPO PWR indicator is on when the PPS has customer input power, the input circuit breaker is on, and the PPS internal logic is providing UEPO logic voltage.


MAP 2380: 2105 Expansion Enclosure UEPO


Figure 43. 2105 Primary Power Supply Locations (s009048). Rear view: indicators UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY. Front view: two-digit PPS digital status display.

Isolation
The 2105 Expansion Enclosure and 2105 Model 800 will be powered off during this isolation. Ensure that they are not in use by the customer. This isolation performs a complete checkout of the UEPO functions.

Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. The 2105 Model 800 should be in local power control mode for this MAP. Ensure that the local/remote switch is set to local (down) on both RPC switch cards (Local Switch Cards or Remote Switch Cards, depending on which feature is installed). If they are set to remote (up), set them to the down position. When the repair is complete, set them back to their original position.
2. Power off the 2105 Model 800, which also powers off the 2105 Expansion Enclosure.
3. Ensure that the input circuit breaker for each 2105 Expansion Enclosure PPS is set to on (up).
4. Ensure that the 2105 Expansion Enclosure operator panel Unit Emergency switch is set to on (up).



5. Ensure that the 2105 Expansion Enclosure operator panel Local/Remote switch, inside the front cover, is in the back position (partially covering the connector).
6. Is each 2105 Expansion Enclosure PPS UEPO PWR indicator on?
   • Yes, continue with the next step.
   • No, go to install Chapter 5 and perform the customer line cord power checks. If no problems are found, replace the PPS and then return here. Use the service terminal Repair Menu, Replace a FRU option.
7. Is each 2105 Expansion Enclosure PPS UEPO LOOP-STBY indicator on?
   • Yes, continue with the next step.
   • No, the UEPO loop is open. Ensure that the UEPO cable is plugged into PPS connector J6 and into operator panel UEPO card connector J1 or J2. If it still fails, replace the following FRUs one at a time until the UEPO LOOP-STBY indicator comes on: the PPS to UEPO card cable, the operator panel UEPO card, and the PPS. Then go to the next step.
8. Does either Expansion Frame PPS display a status code of 14 or 08, either continuously or flashing in combination with other codes (for example, 10-12-14)?
   • Yes, go to MAP 2365: UEPO Loop Problem on page 133. When the repair is complete, return here and continue with the next step.
   • No, continue with the next step.
9. Switch the 2105 Expansion Enclosure operator panel UEPO switch to the off position (O, down). Is each 2105 Expansion Enclosure PPS UEPO LOOP-STBY indicator off?
   • Yes, go to step 11.
   • No, the UEPO switch is not opening the UEPO loop circuit. Continue with the next step.
10. Unplug the J6 connector on each 2105 Expansion Enclosure PPS. Did the UEPO LOOP-STBY indicator go off?
   • Yes, use a CE meter to measure the resistance between the pins on the end of each cable plug. You should measure a high resistance, that is, greater than 1 megohm. A low resistance indicates a problem: replace the faulty UEPO panel or J6 to UEPO panel cable. Continue with the next step when the problem is resolved.
   • No, one of the PPSs is failing. Unplug the J3 connector on one of the PPSs. The PPS whose UEPO LOOP-STBY indicator is still on is failing. Replace that PPS. Continue with the next step when the problem is resolved.
11. The UEPO is working properly. Set the 2105 Expansion Enclosure operator panel UEPO switch to the on position (up). You may now power on the 2105 Model 800 if needed. Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.

MAP 2390: Rack 1 Power On Problem, Remote Mode


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The 2105 Model 800 power can be controlled in three modes:


MAP 2390: Remote Power On Problem


1. Local: With line cord power present, only the operator panel Local power switch controls power on and power off. The RPC remote switch card Local/Remote or Local/Automatic switch is in the Local position (down).
2. Automatic Mode: Loss of power to both line cords causes a power off after the 2105 Model 800 has de-staged customer data using the batteries for up to 5 minutes. When one or both line cords have power again, a power on occurs automatically. The automatic power on occurs only once after each power loss to both line cords. The operator panel Local power switch can also control power on and off. The RPC card Local/Automatic switch is in the Automatic position (up) and RPC card DIP switch 3 is off (left).
3. Remote Mode: With line cord power present, a remote power control cable from a host system controls power on and power off. The operator panel Local power switch cannot control a power off or on. If the operator panel needs to be used to control power, set the Local/Remote switch to the Local position (down). The RPC card Local/Remote switch is in the Remote position (up) and RPC card DIP switch 3 is on (right). Only one host system is required to power on the 2105 Model 800, even if remote power control cables from other host systems that are powered off are connected. A single host system cannot power off the 2105 Model 800 unless all the host systems with remote power control cables attached are powered off.

Each RPC card passes a +4.4 V signal to the remote power control card, which connects it to each remote power control host port connector. That signal goes to the host, which controls two return lines:
• The pick line return is pulsed momentarily to begin the 2105 Model 800 power on.
• The hold line return is held active to keep the 2105 Model 800 powered on. When the hold line drops, the 2105 Model 800 powers off if no hold lines from other hosts are active.
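The pick and hold lines work together as follows: a pick pulse from any one host powers the unit on, and the unit stays on while at least one host keeps its hold line active. The following Python sketch models that logic. It is a hypothetical model for illustration. It assumes that a host requesting power on both pulses pick and then asserts hold, and it ignores voltage levels and timing.

```python
from dataclasses import dataclass, field


@dataclass
class RemotePowerModel:
    """Hypothetical model of remote power control pick/hold behavior."""

    hold_active: dict[str, bool] = field(default_factory=dict)
    powered_on: bool = False

    def connect_host(self, host: str) -> None:
        # A connected host that is powered off leaves its hold line inactive.
        self.hold_active.setdefault(host, False)

    def host_requests_power_on(self, host: str) -> None:
        # The momentary pick pulse starts the power on; the host then holds.
        self.connect_host(host)
        self.powered_on = True
        self.hold_active[host] = True

    def host_powers_off(self, host: str) -> None:
        # Dropping one hold line powers off only if no other hold is active.
        self.hold_active[host] = False
        if not any(self.hold_active.values()):
            self.powered_on = False
```

In this model, a powered-off host that is connected through `connect_host` neither prevents power on nor keeps the unit powered. This matches the description of remote mode above.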

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. This procedure requires that the 2105 Model 800 be taken away from customer use so that it can be powered off and on. Verify that all customer activity is stopped before going to the next step.
2. Do the following checks:
   • Verify that RPC card DIP switch 3 is set to remote, on (right).
   • Verify that the RPC remote switch card Local/Remote switch is set to Remote (up).
   • Verify that the host system remote power control cables are properly connected to the remote power control card in the tailgate and also at each host system.
   • Verify that the cable from the remote power control card (in the tailgate) to the RPC remote switch cards is properly connected to remote power control card connector J1 and to both RPC remote switch cards.
   Are the cables and switches correct?
   • Yes, continue with the next step.
   • No, correct the problem and attempt to power on from the host systems. If it still fails, return to the beginning of this MAP.
3. Set the 2105 Model 800 to local power control mode.



   • Set the Local/Remote switch to Local (down) on both RPC remote switch cards. Power off using the operator panel Local power switch (white). Wait up to 5 minutes for the power off to complete and then continue with the next step. If it will not power off, go to MAP 2440: Rack 1 Power Off Problem on page 157.
4. Set the 2105 Model 800 to remote power control mode.
   • Set the Local/Remote switch to Remote (up) on both RPC remote switch cards.
5. Determine whether more than one host system is connected to the remote power control card in this tailgate. Is more than one host system remote power control cable connected?
   • Yes, choose one of the following:
     - If remote power DOES work from any of those host systems, go to step 6.
     - If remote power DOES NOT work from any of those host systems, go to step 7.
   • No, go to step 8.
6. This step isolates the problem to the 2105 Model 800 or to a host system.
   • Use step 3 on page 141 to power off the 2105 Model 800.
   • Use step 4 to change back to remote power control mode.
   • At the 2105 Model 800 remote power control card, unplug two remote power control cables: one from a host system that works and one from a host system that does not. Swap the two cables and plug them back in.
   • Attempt to power on from the host system that originally worked. Does the 2105 Model 800 power on?
     - Yes, the 2105 Model 800 remote power control port works with one host system remote power control cable plugged in and fails with the other. The problem is in the host system that fails or in its remote power control cable.
     - No, the 2105 Model 800 remote power control port fails with a host system remote power control cable that worked when connected to a different port. The problem is internal to the 2105 Model 800. Replace the following FRUs one at a time until the problem is fixed: the remote power control card and the remote power control card to RPC remote switch cards cable. (The 2105 Model 800 can power on with only one RPC card working, so the RPC cards are not included here.) When the problem is corrected, go to MAP 1500: Ending a Service Action on page 67.
7. None of the host systems can power on the 2105 Model 800. Do step 3 on page 141 to set the 2105 Model 800 to local power control mode. Attempt to power on using the operator panel Local power switch. Does the 2105 Model 800 power on?
   • Yes, it fails only in remote power mode. Replace the following FRUs one at a time until the problem is fixed: the remote power control card and the remote power control card to RPC remote switch cards cable. (The 2105 Model 800 can power on with only one RPC card working, so the RPC cards are not included here.) When the problem is corrected, go to MAP 1500: Ending a Service Action on page 67.
   • No, go to MAP 2400: 2105 Model 800 Local Power On Problems on page 149.
8. This step tests more than one remote power control connector on the remote power control card in the tailgate. Unplug the remote power control cable from the remote power control card connector and plug it into a different connector. Attempt to power on the 2105 Model 800 from the host system. Does it power on?
   • Yes, one or more host ports on the remote power control card are failing. Replace the following FRUs one at a time until the problem is fixed: the remote power control card and the remote power control card to RPC remote switch cards cable. Then go to MAP 1500: Ending a Service Action on page 67.
   • No, continue with the next step.
9. Isolate the failure to either the 2105 Model 800 (not sending the remote power control signal, or not receiving the returned signals) or the host system (not receiving the signal, or not returning the power control signals).
   • Use step 3 on page 141 to power off the 2105 Model 800.
   • Both RPC cards supply +4.4 V to each remote power control card host port, between pins 1 and 2. Use a voltmeter to measure the voltage present at a free connector. Do pins 1 and 2 have +4.4 V present?
     - Yes, the voltage is leaving the 2105 Model 800. Go to step 10.
     - No, the +4.4 V from both RPC cards is not reaching the remote power control card host port connectors. Replace the following FRUs one at a time until the problem is fixed: the remote power control card and the remote power control card to RPC remote switch cards cable. When the problem is corrected, go to MAP 1500: Ending a Service Action on page 67.
10. Verify that the remote power control cable is plugged into the remote power control card. Verify that the host system is powered up and has attempted to power on the attached devices. This should leave the hold line active at +5 V. Measure the voltage at pin 5 of the remote power control connector that the cable is plugged into. Is +5 V present?
   • Yes, go to step 11.
   • No, go to step 12.
11. Measure the pick line voltage at pin 5 of the remote power control connector that the cable is plugged into. The voltage pulses momentarily when the host system requests the attached devices to power on. You may need a second person at the host system to create the power on condition. Is +5 V momentarily present?
   • Yes, both needed signals are being returned to the 2105 Model 800. Replace the following FRUs one at a time until the problem is fixed: the remote power control card and the remote power control card to RPC cards cable. (Only one RPC card is needed to power on the 2105 Model 800, and because two are present, they are not part of the FRU group.) When the problem is corrected, go to MAP 1500: Ending a Service Action on page 67.
   • No, continue with the next step.
12. The 2105 Model 800 is sending the voltage but is not receiving one or both of the signals needed to power on. The problem is either in the remote power control cable or in the host system control of the signals. Use the host system documentation to verify that the host system is receiving the voltage and returning the control signals to the 2105 Model 800. If the host is returning the signals, the remote power control cable may have one or more open lines. Correct the problem and then go to MAP 1500: Ending a Service Action on page 67.
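Steps 9 to 12 form a short chain of voltmeter checks. The +4.4 V supply is checked first, then the hold line, then the pick line pulse. The following Python sketch restates that chain. It is a hypothetical summary: each argument is the yes/no answer to the corresponding measurement, and no voltage tolerances are assumed beyond what the MAP states.

```python
# Action shared by step 9 (No) and step 11 (Yes); wording paraphrases the MAP.
REPLACE_RPC_CONTROL_FRUS = ("Replace the remote power control card, then its "
                            "cable to the RPC remote switch cards")


def map_2390_signal_check(supply_4v4_at_pins_1_2: bool,
                          hold_line_5v: bool,
                          pick_line_pulsed_5v: bool) -> str:
    """Hypothetical summary of steps 9 to 12.

    pick_line_pulsed_5v only matters when the hold line is present,
    because step 10 (No) skips straight to step 12."""
    if not supply_4v4_at_pins_1_2:
        # Step 9 No: +4.4 V is not reaching the host port connectors.
        return REPLACE_RPC_CONTROL_FRUS
    if not hold_line_5v or not pick_line_pulsed_5v:
        # Step 10 No, or step 11 No: go to step 12.
        return "Step 12: check the remote power control cable and host system"
    # Step 11 Yes: both signals come back, so the MAP replaces the 2105-side FRUs.
    return REPLACE_RPC_CONTROL_FRUS
```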


MAP 23B0: 2105 Expansion Enclosure Power Off Problem

MAP 23B0: 2105 Expansion Enclosure (Rack 2) Power Off Problem


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
To power off the 2105 Expansion Enclosure, the following must occur:
1. The 2105 Model 800 (Rack 1) RPC cards must receive a power off request. This signal comes from the 2105 Model 800 operator panel if in Local mode, or from the remote power control card if in Remote mode.
2. Each RPC card sends a power signal to the primary power supply (PPS) in the expansion enclosure that it is directly cabled to.

The 2105 Expansion Enclosure does not prevent the 2105 Model 800 (Rack 1) from powering off. If the 2105 Model 800 will not power off, exit this MAP and begin at MAP 2440: Rack 1 Power Off Problem on page 157.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. You are here because the 2105 Expansion Enclosure will not power off. Does the 2105 Model 800 power off while the expansion enclosure remains powered on?
   • Yes, continue with the next step.
   • No, the 2105 Model 800 must be able to power off by itself. Exit this MAP and go to MAP 2440: Rack 1 Power Off Problem on page 157.
2. Verify that the control cables from the primary power supplies (PPSs) to the RPC cards are connected correctly. Check the J4 connector on each expansion enclosure PPS and the J2 slot 5 connector on each 2105 Model 800 RPC card. Refer to the 2105 Expansion Enclosure procedure in install Chapter 5 for more information and diagrams if needed.
   • If a problem is found and repaired, retry the operation that sent you here.
   • If the problem is not fixed, continue with the next step.
3. With the 2105 Model 800 powered on and ready, connect the service terminal. Use the Repair Menu, Show / Repair Problems Needing Repair option to repair any related power problems for the RPC cards in Rack 1 or the PPSs in Rack 2.
   • If a problem is found and repaired, retry the operation that sent you here.
   • If the problem is not fixed, continue with the next step.
4. Check the operation of the 2105 Expansion Enclosure PPS connections to the 2105 Model 800 RPC cards. Momentarily press the operator panel Local Power switch to on (up). Observe both PPS status displays in the expansion enclosure. Each should display its PPS code level as the repeated sequence 00-xx-yy (xx = code level, yy = PPS ID). Do both PPSs display the code level sequence?
   • Yes, each RPC card is correctly cabled to its PPS in the expansion enclosure. Continue with the next step.
   • No, go to step 6 on page 146.



Figure 44. 2105 Primary Power Supply Locations (s009048). Rear view: indicators UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY. Front view: two-digit PPS digital status display.

5. The 2105 Model 800 will only power off if both of its PPSs power off. The PWR GOOD indicator on the PPS should blink slowly when the PPS is powered off and in standby mode. In standby mode, the main output voltages are off, but the PPS internal logic voltages and line cord input voltages are still on. Momentarily press the 2105 Model 800 operator panel Local Power switch to off (down). Wait up to five minutes for the PPS PWR GOOD indicators to flash slowly (indicating powered off to standby mode). Find the condition below that you have:
   • Both PPS PWR GOOD indicators are blinking slowly. The 2105 Model 800 powered off successfully. Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.
   • Both PPS PWR GOOD indicators are on solid. Continue with the next step.
   • One PPS PWR GOOD indicator is on solid and the other is blinking slowly. One PPS powered off and the other did not. Do the following:
     a. Momentarily press the operator panel Local Power switch to on. This powers both PPSs on again. Wait until both PPS PWR GOOD indicators are on solid. This allows the working PPS power system to keep the 2105 Model 800 powered on while the possibly failing FRUs are replaced.
     b. Replace the following FRUs one at a time until both PPSs power off from the operator panel Local Power switch:
        - The PPS that failed to power off
        - The PPS to RPC card cable
        - The RPC card for that PPS
        Use the service terminal Repair Menu, Replace a FRU option. Once the problem has been repaired, return to the procedure that sent you here or go to MAP 1500: Ending a Service Action on page 67.
6. The power system that did not display the code level is failing. The possible FRUs are:
   • RPC card
   • PPS to RPC control cable
   • PPS
   A quick test of the RPC card is to unplug the PPS to RPC control cable from each RPC card (J2 slot 5 connector), swap the cables, and plug them back in. Repeat the test to display the PPS code level.
   • If the other PPS fails, replace the RPC card. Remember to unswap the control cables at both ends.
   • If the original PPS still fails, the RPC card is good. Continue with the next step.
7. With the control cables still swapped at the RPC card end, swap the PPS end of the control cables between both PPSs in the expansion enclosure. Repeat the test to display the PPS code level.
   • If the original PPS fails, replace the PPS. If the other PPS fails, replace the control cable that is connected to it. Remember to unswap the control cables at both ends.
   • When the problem has been repaired, return to the procedure that sent you here or go to MAP 1500: Ending a Service Action on page 67.
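The code level check in step 4 and the two swap tests in steps 6 and 7 follow the same pattern: swap one end of the control cables and see whether the failure follows the swap. The following Python sketch encodes that reasoning. It is illustrative only. The function names are hypothetical, and "original" and "other" refer to which PPS fails after the swap.

```python
from typing import Optional, Tuple


def parse_code_level(sequence: str) -> Optional[Tuple[str, str]]:
    """Parse the repeated '00-xx-yy' display (xx = code level, yy = PPS ID).

    Returns (code_level, pps_id), or None if the sequence is not in that form."""
    parts = [p.strip() for p in sequence.split("-")]
    if len(parts) == 3 and parts[0] == "00":
        return parts[1], parts[2]
    return None


def step_6_rpc_swap(failing_after_swap: str) -> str:
    """Step 6: control cables swapped at the RPC card end (J2 slot 5)."""
    if failing_after_swap == "other":
        # The failure followed the RPC card.
        return "Replace the RPC card, then unswap the cables at both ends"
    if failing_after_swap == "original":
        return "RPC card is good; go to step 7"
    raise ValueError("expected 'original' or 'other'")


def step_7_pps_swap(failing_after_swap: str) -> str:
    """Step 7: still swapped at the RPC end, now also swapped at the PPS end."""
    if failing_after_swap == "original":
        return "Replace the PPS, then unswap the cables at both ends"
    if failing_after_swap == "other":
        # The failure followed the cable now attached to the other PPS.
        return "Replace the control cable now connected to the other PPS, then unswap"
    raise ValueError("expected 'original' or 'other'")
```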

MAP 23C0: Power Event Threshold Exceeded


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
When the 2105 is powered on and Ready, the microcode monitors various power boundaries, such as the clusters and primary power supplies. If the microcode senses a power boundary powering off and on several times in a short period of time, it creates a problem for that boundary. The power cycles can be caused by the customer, the service representative, or a microcode recovery action. The problem is reported so that conditions the customer and service representative may not be aware of are brought to their attention.
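The MAP does not state how many power cycles count as "several" or how long a "short period" is. The general technique is a per-boundary sliding-window counter. The following Python sketch is a generic illustration of that technique, not the 2105 microcode. The class name, `max_cycles`, and `window_seconds` are hypothetical, and any values passed to them are arbitrary.

```python
from collections import deque


class PowerEventMonitor:
    """Illustrative sliding-window counter of power off/on cycles per boundary.

    max_cycles and window_seconds are hypothetical parameters; the MAP does
    not give the values that the 2105 microcode actually uses."""

    def __init__(self, max_cycles: int, window_seconds: float) -> None:
        self.max_cycles = max_cycles
        self.window_seconds = window_seconds
        self._cycles: dict[str, deque] = {}

    def record_cycle(self, boundary: str, timestamp: float) -> bool:
        """Record one off/on cycle; return True if a problem should be created."""
        times = self._cycles.setdefault(boundary, deque())
        times.append(timestamp)
        # Forget cycles that have fallen out of the window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.max_cycles
```

Each boundary, such as a cluster or a PPS, keeps its own window. Cycles on one boundary therefore never trigger a problem for another.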

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. The problem details list one or more FRUs. Under special microcode recovery conditions, these FRUs could cause a power boundary within the 2105 to power off and on repeatedly. The repeated power off and on could also have been caused by special service actions performed by the service representative, or by customer line cord power being switched off and on repeatedly.



   Has the customer or service representative been performing repeated power off and on actions, or multiple or repeated power service actions, on the 2105?
   • Yes, cancel this problem. Use the Main Service Menu, Utility Menu, Problem Log Menu, Change a Problem State option.
   • No, continue with the next step.
2. Look for and repair any related problems in the power system or clusters, including microcode problems that call for the next level of support. If there are none, call the next level of support.

MAP 23D0: RPC-2 Card Reporting PPS Battery Set Present


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The 390 V battery set is connected to primary power supply 1 (PPS-1), and PPS-1 is connected to RPC-1, so RPC-1 reports that the 390 V battery set is attached. Primary power supply 2 (PPS-2) is connected only to RPC-2, not to the 390 V battery set, and does not report battery attachment. In the problem that sent you to this MAP, RPC-2 is falsely reporting that a 390 V battery set is attached to PPS-2. Either of the following can cause this:
- The cable from PPS-1 is connected to RPC-2 (for example, the two PPS to RPC cables are cross-connected).
- The RPC card DIP switch address settings are not correct.

Isolation
Notes:
1. This problem is usually caused by the rack 1 or rack 2 PPS-1 being connected to RPC-2 (instead of RPC-1), or by incorrectly set RPC card DIP switches.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. Find the ESCs in the related problem or logs to determine the action:

   ESC | Action
   Single problem with ESC=8462, and an expansion rack installed within the last 4 weeks | Go to MAP 2450: Crossed RPC Cables to Expansion Rack on page 160.
   All other conditions | Continue with the next step.

2. Verify that the following 2105 (rack 1) cables are connected correctly:
   - RPC-1 card (top RPC card) connector J2 port 6 is cabled to rack 1 primary power supply 1 (PPS-1) connector J4.
   - RPC-2 card (bottom RPC card) connector J2 port 6 is cabled to rack 1 primary power supply 2 (PPS-2) connector J4.
   After the cables are verified, find the condition that applies:
   - If both cables are connected correctly, continue with the next step.
   - If the cables are not connected correctly, the 2105 must be powered off to correct the problem.
Problem Isolation Procedures, CHAPTER 3


Figure 45. Rack Power Control Card Cable Locations (s009706). [Figure: top and rear views of the RPC-1 and RPC-2 card J2 connectors on the 2105 Model 800 (ports J2-6 and J2-5), cabled to the PPS-1 and PPS-2 J4 connectors, including those in the 2105 Expansion Enclosure.]

3. If an expansion rack is installed, verify that the following cables are connected correctly:
   - RPC-1 card (top RPC card) connector J2 port 5 is cabled to rack 2 primary power supply 1 (PPS-1) connector J4.
   - RPC-2 card (bottom RPC card) connector J2 port 5 is cabled to rack 2 primary power supply 2 (PPS-2) connector J4.
   Find the condition that applies:
   - If both cables are connected correctly, continue with the next step.
   - If the cables are not connected correctly, go to MAP 2450: Crossed RPC Cables to Expansion Rack on page 160.
4. Verify that address switches 1 and 2 are set correctly on both RPC cards (cluster 1 RPC-1 and cluster 2 RPC-2) [Figure 46]:
   Switch 1:
   - RPC-1 = On (switch to right)
   - RPC-2 = Off (switch to left)
   Switch 2:
   - RPC-1 = Off (switch to left)
   - RPC-2 = On (switch to right)

Figure 46. Rack Power Control Card Switch Locations (s009707). [Figure: rear view of the RPC-1 and RPC-2 cards, showing address switches 1 through 6 on each card.]

   After the switches are verified, find the condition that applies:
   - If the switches on both RPC cards are set correctly, call the next level of support for engineering assistance.
   - If the switches on both RPC cards are set incorrectly, the 2105 must be powered off to correct the problem.
   - If the switches on only one RPC card are set incorrectly, use the RPC card FRU replacement procedure to power off that RPC card before correcting its switch settings. From the service login Main Service Menu, select Repair Menu, Replace a FRU (Container = rsrack1).
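The decision in step 4 can be summarized as a small lookup. The following Python sketch is illustrative only: the card names as dictionary keys, the True/False encoding of switch positions, and the function names are assumptions made for this example, not a 2105 service interface.

```python
# Expected settings from step 4: True = On (switch to right), False = Off (switch to left).
EXPECTED_SWITCHES = {
    "RPC-1": {1: True, 2: False},
    "RPC-2": {1: False, 2: True},
}


def incorrect_cards(observed):
    """Return the RPC cards whose address switches 1 and 2 do not match step 4."""
    return [
        card
        for card, expected in EXPECTED_SWITCHES.items()
        if any(observed[card][switch] != setting for switch, setting in expected.items())
    ]


def step4_action(observed):
    """Map the switch check to the action listed after Figure 46."""
    wrong = incorrect_cards(observed)
    if not wrong:
        return "Call the next level of support for engineering assistance."
    if len(wrong) == 2:
        return "Power off the 2105 to correct the switches."
    return f"Power off {wrong[0]} with the Replace a FRU procedure, then correct its switches."


# Example: RPC-2 has switch 1 set On by mistake.
observed = {"RPC-1": {1: True, 2: False}, "RPC-2": {1: True, 2: True}}
print(step4_action(observed))
```

The three outcomes correspond to the three conditions listed after Figure 46: no card wrong, both cards wrong, or only one card wrong.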

MAP 23E0: Cluster Powered Off Unexpectedly


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP directs you to its replacement MAP.

Isolation
1. This MAP has been replaced by a new MAP. Go to MAP 4670: Cluster Powered Off Unexpectedly on page 431.

MAP 2400: 2105 Model 800 Local Power On Problems


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.


Description
The 2105 Model 800 is not powering on properly. Only one of the two 2105 Model 800 power systems is needed to power on the 2105 Model 800; however, this MAP requires both power systems to be working.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. At the 2105 Model 800, are the Local/Remote power switches on both RPC switch cards [Figure 47] set to Local mode (down)?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Set the Local/Remote or Local/Automatic switch on the RPC switch cards to Local mode (down). (Remember to set the switches back to their original position when the repair is complete.) Attempt to power on the 2105 Model 800 using the 2105 Model 800 operator panel Local Power switch [Figure 49]. Does it power on?
   - Yes, the 2105 Model 800 fails only in remote power control mode. Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.
   - No, continue with the next step.

Figure 47. 2105 Model 800 RPC Local/Remote Switch Location (s009127). [Figure: rear view of the RPC-1 and RPC-2 cards (address switches 1 through 6) and the RPC-1 and RPC-2 switch cards, showing the RPC Power Select switch positions LOCAL and AUTO or REMOTE.]

3. Observe the primary power supply (PPS) to RPC control cables:
   - PPS-1 connector J4 to RPC-1 connector J2 slot 6.
   - PPS-2 connector J4 to RPC-2 connector J2 slot 6.
   Are both cables properly connected?
   - Yes, continue with the next step.
   - No, before reconnecting the cable, go to the PPS it should be connected to and set its input circuit breaker to off. The 2105 Model 800 RPC cards can stay powered on while the cable is connected. Connect the cable, set the input circuit breaker to on (up), and then attempt to power on the 2105 Model 800 again. If it still fails, continue with the next step. If it works, go to MAP 1500: Ending a Service Action on page 67.
4. Ensure each PPS input circuit breaker is set to on (up).
Figure 48. 2105 Primary Power Supply Locations (s009048). [Figure: rear and front views of the primary power supply, showing the UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY indicators and the two-digit PPS status display.]

5. Observe each PPS UEPO PWR indicator. Is the indicator on?
   - Yes, continue with the next step.
   - No, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127 for status code 8 and perform the actions listed.
6. Ensure the 2105 Model 800 operator panel UEPO switch is set to on (up).
7. Observe the 2105 Model 800 PPS UEPO LOOP-STBY indicator. Is it on?
   - Yes, continue with the next step.
   - No, go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131.
8. Observe the PWR GOOD indicator. Is it slow blinking?
   - Yes, the PPS is in standby mode, waiting for a power on request. Continue with the next step.
   - No, replace the PPS. If the 2105 Model 800 still fails to power on, return to the beginning of this MAP.
9. Observe the PWR UNIT FAULT indicator. Is it on?
   - Yes, use the PPS status code displayed to repair the problem. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - No, continue with the next step.
10. Observe the PPS status code display. Is a status code displayed?
   - Yes, use the PPS status code displayed to repair the problem. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. Return to the beginning of this MAP after the repair is complete.
   - No, continue with the next step.
11. Attempt to power on the 2105 Model 800. Press the 2105 Model 800 operator panel Local power control switch momentarily to the on position (up).
Figure 49. 2105 Model 800 Operator Panel Locations (s009422). [Figure: front and rear views of the 2105 Model 800 operator panel, showing the Unit Emergency switch, the Local Power switch, the LOCAL/REMOTE (L/R) switch, and indicators labeled Ready, Cluster 1, Cluster 2, Power Complete, Line Cord 1, Line Cord 2, and Messages.]

12. Observe each 2105 Model 800 PPS and find the condition that now exists:
   - The PWR GOOD indicator is on solid, which is normal operation. 390 V output is being supplied to the electronics cage and storage cage power supplies, and the 2105 Model 800 should be powering on. If it does not, reenter the service guide with the new symptoms.
   - A PPS status code is displayed. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - The PWR GOOD indicator is still slow blinking. Continue with the next step.
13. Replace the PPS. Does it still fail?
   - Yes, continue with the next step.
   - No, go to MAP 1500: Ending a Service Action on page 67.
14. Replace the RPC card for the PPS that is slow blinking (RPC-1 for PPS-1, RPC-2 for PPS-2), and then attempt to power on. If the clusters are in Ready, use the service terminal FRU Replace menu option to replace the RPC card. If it still fails, call the next level of support. If it no longer fails, go to MAP 1500: Ending a Service Action on page 67.


MAP 2410: RPC Power Mode Switch Mismatch


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The microcode detected a mismatch between the setting of the power mode switches on the two RPC cards that lasted more than 10 seconds.
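The "more than 10 seconds" rule means that a brief disagreement, such as while the switches are being changed one card at a time, is not reported. The following Python sketch shows that kind of persistence check; the sample format and the mode names are assumptions made for illustration, and only the 10-second limit comes from this MAP.

```python
MISMATCH_LIMIT_SECONDS = 10  # duration stated in the MAP 2410 description


def mismatch_detected(samples, limit=MISMATCH_LIMIT_SECONDS):
    """Return True if the two RPC power mode settings disagree for more than `limit` seconds.

    `samples` is a time-ordered list of (seconds, rpc1_mode, rpc2_mode) tuples;
    this sampling format is an assumption made for illustration.
    """
    mismatch_start = None  # time at which the current disagreement began
    for seconds, rpc1_mode, rpc2_mode in samples:
        if rpc1_mode == rpc2_mode:
            mismatch_start = None  # settings agree again; reset the timer
            continue
        if mismatch_start is None:
            mismatch_start = seconds
        if seconds - mismatch_start > limit:
            return True
    return False
```

For example, a mismatch that starts at 1 second and is still present at 12 seconds is reported, while a mismatch that clears after 5 seconds is not.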

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. Do the following visual checks:
   - Observe DIP switch position 3 on each RPC card.
   - Observe the Local control switch (large white switch) on each RPC switch card (below the RPC cards).
   Are the switches set the same?
   - Yes, the RPC code is detecting that one of the four switches appears to be failing. Go to step 4.
   - No, determine which RPC card, switch card, or both are set incorrectly. Refer to Checking the 2105 Model 800 Switch Settings in Chapter 5 of Volume 2, and read the section on setting the RPC card and switch card to match the power control feature the customer wants. Use the next step to change the switch settings.
2. To change the RPC card or switch card settings, you must power off the FRU using the Replace a FRU process. When the FRU is powered off, the switch settings can be changed. From the service terminal Main Service Menu, select Repair Menu, Replace a FRU.
3. After the Replace a FRU action is complete, determine whether the error has been repaired. Display the problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):
   - In the original problem, display the Last Occurrence date/timestamp field in the problem details. If it has not been updated, the problem has been fixed.
   - If there is no new related problem, the problem has been fixed.
   Has the problem been fixed?
   - Yes, verify that the original problem has been closed and that there are no other problems to be repaired. From the service terminal Main Service Menu, select Repair Menu, Close a Previously Repaired Problem, and then Repair Menu, End of Call Status.
   - No, one of the FRUs is failing even though the switches appear to be set correctly. Using steps 2 and 3, replace the RPC cards and RPC switch cards until the problem has been fixed.
4. There are two actions you can take:
   a. Call the next level of support. They should be able to determine which switch is failing by logging in remotely or by obtaining a PE package. If the failing FRU is not the one that is already fenced, they will provide a special procedure to replace it. Normal procedures cannot replace an RPC card when the other RPC card is fenced.
   b. Replace the RPC card called out in the problem, together with its RPC switch card.
      - If the Last Occurrence timestamp field in the problem details has not been updated, the problem has been fixed. Close the problem and use the Repair Menu, End of Call Status option.
      - If the Last Occurrence timestamp field has been updated, the problem has not been fixed. Call the next level of support. The remaining RPC card and RPC switch card must be replaced using a special procedure, because normal procedures cannot replace an RPC card when the other RPC card is fenced.

MAP 2420: 2105 Expansion Enclosure Power On Problem


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The 2105 Expansion Enclosure is not powering on properly from the 2105 Model 800. Only one of the two 2105 Expansion Enclosure power systems is needed to power on the 2105 Expansion Enclosure; however, this MAP requires both power systems to be working.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. Does the 2105 Model 800 that this 2105 Expansion Enclosure is attached to power on?
   - Yes, continue with the next step.
   - No, go to MAP 2400: 2105 Model 800 Local Power On Problems on page 149.
2. Observe the 2105 Expansion Enclosure primary power supply (PPS) to 2105 Model 800 RPC control cables:
   - 2105 Expansion Enclosure PPS-1 connector J4 to 2105 Model 800 RPC-1 card connector J2 (slot 5).
   - 2105 Expansion Enclosure PPS-2 connector J4 to 2105 Model 800 RPC-2 card connector J2 (slot 5).
   Are both cables properly connected?
   - Yes, continue with the next step.
   - No, before reconnecting the cable, go to the 2105 Expansion Enclosure PPS it should be connected to and set its input circuit breaker to off. The 2105 Model 800 RPC cards can stay powered on while the cable is connected. Connect the cable, set the input circuit breaker to on (up), and then attempt to power on the 2105 Expansion Enclosure again. If it still fails, continue with the next step.
3. Ensure each 2105 Expansion Enclosure PPS Main Line CB200 circuit breaker is set to on (up).
Figure 50. 2105 Primary Power Supply Locations (s009048). [Figure: rear and front views of the primary power supply, showing the UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY indicators and the two-digit PPS status display.]

4. Observe each 2105 Expansion Enclosure PPS UEPO PWR indicator. Is the indicator on?
   - Yes, continue with the next step.
   - No, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127 for status code 8 and perform the actions listed.
5. Ensure the 2105 Expansion Enclosure operator panel UEPO switch is set to on (up).
6. Observe the 2105 Expansion Enclosure PPS UEPO LOOP-STBY indicator. Is it on?
   - Yes, continue with the next step.
   - No, go to MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem on page 138.
7. Observe the PPS PWR GOOD indicator. Is it slow blinking?
   - Yes, the PPS is in standby mode, waiting for a power on request. Continue with the next step.
   - No, replace the PPS. If the 2105 Expansion Enclosure still fails to power on, return to the beginning of this MAP.
8. Observe the PPS PWR UNIT FAULT indicator. Is it on?
   - Yes, use the PPS status code displayed to repair the problem. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - No, continue with the next step.
9. Observe the PPS status code display. Is a status code displayed?
   - Yes, use the PPS status code displayed to repair the problem. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. Return to the beginning of this MAP after the repair is complete.
   - No, continue with the next step.
10. Attempt to power on the 2105 Expansion Enclosure. It is best to have the 2105 Model 800 in local power control mode: ensure the Power Select switch on each 2105 Model 800 RPC card is in the Local position (down). Press the 2105 Model 800 operator panel Local power control switch momentarily to the on position (up).
    Note: Remember to return these switches to their original position after the repair is complete.
Figure 51. 2105 Model 800 Operator Panel Locations (s009422). [Figure: front and rear views of the 2105 Model 800 operator panel, showing the Unit Emergency switch, the Local Power switch, the LOCAL/REMOTE (L/R) switch, and indicators labeled Ready, Cluster 1, Cluster 2, Power Complete, Line Cord 1, Line Cord 2, and Messages.]

11. Observe each 2105 Expansion Enclosure PPS and find the condition that now exists:
   - The PPS PWR GOOD indicator is on solid, which is normal operation. 390 V output is being supplied to the storage cage power supplies, and the 2105 Expansion Enclosure should be powering on. If it does not, reenter the service guide with the new symptoms.
   - A PPS status code is displayed. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - The PPS PWR GOOD indicator is still slow blinking. Continue with the next step.
12. Replace the PPS. Does it still fail?
   - Yes, continue with the next step.
   - No, go to MAP 1500: Ending a Service Action on page 67.
13. Replace the RPC card for the PPS that is slow blinking (RPC-1 for PPS-1, RPC-2 for PPS-2), and then attempt to power on. If the 2105 Model 800 clusters are in Ready, use the service terminal FRU Replace menu option to replace the RPC card. If it still fails, call the next level of support. If it no longer fails, go to MAP 1500: Ending a Service Action on page 67.

MAP 2430: One RPC Card Firmware Down Level


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The firmware code in one RPC card is not at the latest level available.

Isolation
1. The firmware installed on the RPC card is at a lower level than the latest level available in the 2105 Model 800 LIC code library. The FRU list of the problem that sent you here shows which RPC card is down level.
2. Return to the service terminal and follow the displayed instructions to load the RPC code.
   Note: Do not press F3 to exit the problem. Do not use the LIC Menu options to update the RPC card firmware.

MAP 2440: Rack 1 Power Off Problem


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
For the 2105 Model 800 to power off, the following must occur:
- Both RPC cards must receive a power off request. The request comes from the 2105 Model 800 operator panel in Local or Automatic mode, or from the remote power control card in Remote mode.
- Both RPC cards must agree that they have received a power off request. If one RPC card is fenced (quiesced), the other card can power off the 2105 Model 800 without agreement.
If a pinned data condition exists, the power off request is ignored. The power off request works after the pinned data condition is cleared.
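The power off conditions above can be expressed as a short decision function. This Python sketch is a simplified model under stated assumptions, not the RPC firmware: agreement is modeled as both unfenced cards having received the request, and the names `RpcCard` and `powers_off` are hypothetical. The case where both cards are fenced is not described in this MAP, so the sketch assumes no power off.

```python
from dataclasses import dataclass


@dataclass
class RpcCard:
    fenced: bool       # quiesced by a problem or from the service terminal
    off_request: bool  # card has received a power off request from its selected source


def powers_off(rpc1: RpcCard, rpc2: RpcCard, pinned_data: bool) -> bool:
    """Return True if the 2105 Model 800 would power off under the stated rules."""
    if pinned_data:
        return False  # the request is ignored until the pinned data condition is cleared
    active = [card for card in (rpc1, rpc2) if not card.fenced]
    # A fenced card does not need to agree. If both cards were fenced, this sketch
    # assumes no power off, because the guide does not describe that case.
    return bool(active) and all(card.off_request for card in active)


print(powers_off(RpcCard(False, True), RpcCard(False, False), pinned_data=False))  # False
print(powers_off(RpcCard(True, False), RpcCard(False, True), pinned_data=False))   # True
```

The second call shows why step 11 quiesces RPC-1: once one card is fenced, the other card alone can act on the power off request.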

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

1. Connect the service terminal to a cluster that will not power off. From the service terminal Main Service Menu, select Utilities Menu, Pinned Data Menu, Display Pinned Data. Are any volumes displayed with retryable, non-retryable, or FC status?
   - Yes, go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417.
   - No, continue with the next step.
2. This procedure powers off the 2105 Model 800. Ensure the customer is not using it.
3. Observe the 2105 Model 800 operator panel Line Cord and cluster Messages indicators. If the Line Cord and cluster Messages indicators are blinking rapidly, a power off is already in progress; wait for the power off to complete, which can take up to 5 minutes. If one or both Line Cord indicators are still on solid, the 2105 Model 800 cannot power off. Go to the next step.
4. The RPC switches control the source of the power off: local, from the operator panel, or remote, from a host system. Ensure the switches are set correctly for the power off being attempted. If the remote power control feature is installed (HDI card installed between the line cord connectors in the tailgate), use Table 27. If the remote power control feature is not installed, use Table 28.
   Note: Both RPC cards must have these two switches set the same.
Table 27. With Remote Power Feature Installed
RPC Card Local/Remote Switch Setting | RPC Card DIP Switch Position 3 Setting (remote power control) | Switches Set to Power Off From
Down (local power control) | On (to right) | Operator panel local power switch
Up (remote power control) | On (to right) | Remote from attached host system

Table 28. Remote Power Feature Not Installed
RPC Card Local/Remote Switch Setting | RPC Card DIP Switch Position 3 Setting (automatic power control) | Switches Set to Power Off From
Down (local power control) | Off (to left) | Operator panel local power switch
Up (automatic power control) | Off (to left) | Operator panel local power switch
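Because each combination of the two switch settings appears in only one row of Tables 27 and 28, the tables can be read as a single lookup keyed on those settings. The following Python sketch is an illustration under that reading; the string encoding of switch positions and the function name are assumptions, and the mismatch check reflects the note that both RPC cards must have these two switches set the same.

```python
# Rows of Tables 27 and 28, keyed by (RPC switch card Local/Remote switch,
# RPC card DIP switch position 3 on?).
POWER_OFF_SOURCE = {
    ("down", True): "Operator panel local power switch",   # Table 27, local power control
    ("up", True): "Remote from attached host system",       # Table 27, remote power control
    ("down", False): "Operator panel local power switch",  # Table 28, local power control
    ("up", False): "Operator panel local power switch",    # Table 28, automatic power control
}


def power_off_source(rpc1_setting, rpc2_setting):
    """Return the power off source; each argument is (switch_position, dip3_on) for one RPC card."""
    if rpc1_setting != rpc2_setting:
        raise ValueError("Both RPC cards must have these two switches set the same.")
    return POWER_OFF_SOURCE[rpc1_setting]


print(power_off_source(("up", True), ("up", True)))  # Remote from attached host system
```

Only the combination of Local/Remote switch up and DIP switch position 3 on makes the power off come from the attached host system; every other combination uses the operator panel.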

5. Are the RPC switches set to use the 2105 Model 800 operator panel Local Power switch?
   - Yes, continue with step 7.
   - No, continue with the next step.
6. Set RPC card DIP switch position 3 to off (to left) on both RPC cards. Set the RPC switch card switch to local (down) on both RPC switch cards. Attempt to power off using the operator panel Local Power switch. Does the 2105 Model 800 power off now?
   - Yes, go to step 13 on page 160.
   - No, power off fails in both remote and local modes. Leave the switches set for Local mode. (After the problem is fixed, remember to set the switches back to remote mode.) Continue with the next step.
7. Connect the service terminal and use the Repair Menu, Show / Repair Problems Needing Repair option to repair any related power problems (PPS, RPC, cluster). If a problem is found and repaired, retry the operation that sent you here. If no problems are found, go to the next step.
8. Check the operation of the operator panel Local Power switch. Momentarily press the Local Power switch to on (up), and observe both PPS status displays. Each should display the PPS code level as the repeated sequence 00-xx-yy (xx = code level, yy = PPS ID). Do both PPS display the code level sequence?
   - Yes, the Local Power switch cables are connected to both RPC cards. Go to the next step.
   - No, the PPS that did not display the code level should have created a new problem. Use the Repair Menu, Show / Repair Problems Needing Repair option to repair the problem. If no related problem is found, go to MAP 24A0: PPS Power On Problem on page 165.

Figure 52. 2105 Primary Power Supply Locations (s009048). [Figure: rear and front views of the primary power supply, showing the UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY indicators and the two-digit PPS status display.]

9. The 2105 Model 800 powers off only if both PPS power off. The PWR GOOD indicator on a PPS is slow blinking when the PPS is powered off to standby mode. In standby mode, the main output voltages are off, but the PPS internal logic voltages and line cord input voltages are still on. Press the 2105 Model 800 operator panel Local Power switch momentarily to off (down). Wait up to 5 minutes for the PPS PWR GOOD indicators to slow blink (indicating power off to standby mode). Find the condition that applies:
   - Both PPS PWR GOOD indicators are slow blinking. The 2105 Model 800 powered off successfully. Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.
   - Both PPS PWR GOOD indicators are on solid. Continue with the next step.
   - One PPS PWR GOOD indicator is on solid and the other is slow blinking: one PPS powered off and the other did not. Ensure the PPS to RPC card cable and all RPC card cables are properly connected. Then momentarily press the operator panel Local Power switch to on; this powers on both PPS again. Wait until both PPS PWR GOOD indicators are on solid, so that the working PPS power system keeps the 2105 Model 800 powered on while the possible failing FRUs are replaced. Using the service terminal Repair Menu, Replace a FRU option, replace the following FRUs until both PPS power off from the operator panel Local Power switch: the PPS that failed to power off, the PPS to RPC card cable, and the RPC card for that PPS. Once the problem has been repaired, return to the procedure that sent you here or go to MAP 1500: Ending a Service Action on page 67.
10. Both RPC cards must agree with each other to power off the 2105 Model 800, unless one RPC card is already fenced (quiesced), either by a problem or by using the service terminal Utility Menu options.
11. Use the service terminal Utility Menu, Resource Management Menu, Quiesce a Resource option to quiesce RPC-1.
12. Press the operator panel Local Power switch momentarily to off. Does the 2105 Model 800 power off?
   - Yes, one of the following FRUs is failing: the RPC-1 card, the 2105 Model 800 operator panel, or the RPC-1 to operator panel cable. Power on the 2105 Model 800, and then use the Repair Menu, Replace a FRU option to replace these FRUs until it powers off. Then go to MAP 1500: Ending a Service Action on page 67.
   - No, resume RPC-1 and repeat this procedure for RPC-2. If this does not repair the problem, call the next level of support.
13. If the customer wants to use local power control, the problem is no longer occurring. If the customer wants to use remote power control, go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140 to repair any remaining problem with remote power control.

MAP 2450: Crossed RPC Cables to Expansion Rack


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The RPC power control cables to the expansion rack have been detected as crossed. This normally occurs during installation of the expansion rack. The condition can be detected and reported up to 30 days after the expansion rack is installed.

Isolation
1. Observe the green online LED beneath the DIP switches on each RPC card. Are both indicators on?
   - Yes, continue with the next step.
   - No, the RPC card with the LED off is fenced, and there should be a related problem. Exit this MAP to repair that condition, and then return here to continue.
2. Does the cable from the expansion rack primary power supply 1 (PPS-1) connector J4 go to the rack 1 RPC-1 card (top RPC card) connector J2 port 5? (See Figure 53.)
   - Yes, the cables are not cross-connected. Call the next level of support to determine the cause of the false error.
   - No, continue with the next step.

Figure 53. RPC Card Cables (s009705). [Figure: top and rear views of the 2105 Model 800 RPC-1 and RPC-2 cards and their J2 connectors (J2-5 cabled to PPS-1 and PPS-2 in the 2105 Expansion Enclosure), the RPC-1 and RPC-2 switch cards, and the part numbers P/N 18P4495, P/N 34L3085, and P/N 34L3086.]

3. Correct the cable plugging so that it matches the description in step 2.
   Note: The expansion rack stays powered on while this is done.
4. A new problem is created for each RPC card that has its expansion rack cable disconnected:
   - ESC=854D for the RPC-1 card
   - ESC=854E for the RPC-2 card
   Close each problem. From the service terminal Main Service Menu, select Repair Menu, Close a Previously Repaired Problem.
5. Complete the service action. From the service terminal Main Service Menu, select Repair Menu, End of Call Status.

MAP 2460: Battery Set Charge Low


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 390 V battery set did not reach full charge in 30 hours. An uncharged battery set is charged at a high rate, with a switched 750 mA current, for up to 5 hours, and then at a low rate, with a constant 750 mA current, for up to 25 hours. It then begins a trickle charge.
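The charge timeline in the description can be summarized as a function of elapsed charge time. This Python sketch is illustrative only: it assumes each phase runs for its full "up to" duration and that elapsed time is measured from the start of charging, and the function names are hypothetical. The 5-hour and 25-hour figures and the 30-hour limit come from the description above.

```python
HIGH_RATE_HOURS = 5   # switched 750 mA charge
LOW_RATE_HOURS = 25   # constant 750 mA charge
FULL_CHARGE_LIMIT_HOURS = HIGH_RATE_HOURS + LOW_RATE_HOURS  # 30 hours


def charge_phase(hours):
    """Charge phase of a battery set that has not yet reached full charge.

    Assumes each phase runs for its full "up to" duration.
    """
    if hours < HIGH_RATE_HOURS:
        return "high rate (switched 750 mA)"
    if hours < FULL_CHARGE_LIMIT_HOURS:
        return "low rate (constant 750 mA)"
    return "trickle"


def charge_low_condition(hours, fully_charged):
    """True when the set has charged for 30 hours without reaching full charge."""
    return not fully_charged and hours >= FULL_CHARGE_LIMIT_HOURS


print(charge_phase(12), charge_low_condition(31, fully_charged=False))
```

The 30-hour limit is simply the sum of the two charge phases, which is why the isolation steps below allow up to 30 hours for the batteries to reach full charge.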

Isolation
1. Ensure the circuit breaker on the master battery (under PPS-1) is set to on.
2. Ensure the cable between the master and slave batteries is connected.
3. Ensure both cables between the master battery and PPS-1 are connected.
4. Code 03 automatically goes blank when the battery set reaches full charge, which takes no more than 30 hours. Whenever PPS-1 powers on, code 03 is always displayed for 5 minutes (PPS code level 20 or greater); the battery charge level is then checked.
5. Wait up to 30 hours for the batteries to reach full charge.

MAP 2470: Battery Set Detection Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The battery has a low charge, or PPS-1 has detected a battery fault condition. If code 03 is displayed, the battery is low and is charging. A discharged battery can require up to 25 hours to become fully charged. The system reports a permanent battery failure if the condition persists beyond the normal charge time. If code 04 is displayed, a battery failure is indicated; this can be a false condition generated during replacement of PPS-1 or the battery.

Notes:
1. If the battery set is the FRU, both halves of the battery set must be replaced at the same time.
2. A battery with a voltage of less than 275 V is not recognized by PPS-1 when PPS-1 is powered on. The PPS cannot charge a battery below 275 V, so a battery in this state cannot be used by the 2105.

Isolation
Notes:
1. If the 390 V battery FRU was replaced, the new initial charge date must be entered. The service login FRU replacement process should prompt you to enter the date. If you are not prompted, use the Main Service Menu, Utility Menu, Battery Menu, Update Battery Charge Date option.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

Procedural Steps
1. If the RPC-1 (R1G1) card has not been quiesced, quiesce it now. From the service terminal Main Service Menu, select:
   Utility Menu
   Resource Management Menu
   Quiesce a Resource
2. Do the following in the listed order to ensure that PPS 1 has been reset properly:
   a. Switch off the PPS 1 input circuit breaker (CB00).
   b. Switch off the battery set circuit breaker.
   c. Disconnect the PPS 1 to PPS 2 communication cable from the PPS 1 J3 connector.
   d. Verify that the cable between the two halves of the battery set is connected (battery 1 J2 to battery 2 J1 connectors).
   e. Verify that both cables between PPS 1 and the battery set are connected (PPS 1 J5A to battery 1 J1A, and PPS 1 J5B to battery 1 J1B connectors).
   f. Wait 10 seconds.
   g. Read the following Attention before continuing, then connect the PPS 1 to PPS 2 communication cable to the PPS 1 J3 connector.
      Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is plugged in, the other PPS may drop power. Attempting to straighten the pins is not recommended, because they can easily bend again. Replace the PPS.
   h. Switch on the battery set circuit breaker.

   i. Switch on the PPS 1 input circuit breaker.
3. Resume the RPC-1 (R1G1) card using the menu option in step 1.
4. Press the 2105 Model 800 operator panel Local power switch momentarily to the On position.
5. Is code 04 still displayed?
   - Yes, continue with the next step.
   - No, go to step 9 on page 164.
   One of the following FRUs is failing:
   - 390 V battery set (both batteries are replaced at the same time)
Problem Isolation Procedures, CHAPTER 3



   - PPS 1 power supply (connected to the battery)
   - Battery signal cable between PPS 1 connector J5B and the battery
   - Battery power cable between PPS 1 connector J5A and the battery
   - Battery-to-battery cable

   The 04 status display shows the current condition. It resets to blank as soon as the failing FRU is replaced.
6. The FRUs must be replaced using the service login. See 390 V Battery Set Removal and Replacement, 2105 Model 800 and Expansion Enclosure in Chapter 4 of Volume 2.
   - If the FRUs are listed in a problem, use the problem to select the FRUs and continue the repair.
   - If the FRUs are not listed in a problem, use the Main Service Menu, Repair Menu, Replace a FRU menu options to replace the FRUs.
7. If code 03 is displayed, the battery is charging. Wait up to 30 hours. If code 03 is still displayed after 30 hours, the battery is not being charged. The possible failing FRUs are the PPS, the battery set, and the battery cables. Use the service terminal Repair Menu, Replace a FRU, Power Cooling FRUs menu options.
8. When the repair is complete, go to MAP 1500: Ending a Service Action on page 67.
9. Was the ESC in the problem 8526, 8528, or 8531?
   - Yes, continue with the next step.
   - No, the failure is no longer occurring. Exit this MAP and return to the procedure that sent you here.
10. Test the PPS to verify that it recognizes the battery and will be able to charge and use it when needed. Switch off the battery set circuit breaker. Is code 04 displayed?
   - Yes, switch on the battery set circuit breaker, exit this MAP, and return to the procedure that sent you here.
   - No, the PPS does not recognize the battery. Continue with the next step.
11. Replace one or more of the following FRUs. After each FRU replacement, repeat step 10 until code 04 is displayed and you can answer Yes to the question. The possible failing FRUs are:
   - Battery set
   - PPS 1
   - Power and signal cables between PPS 1 and the battery set

MAP 2490: PPS Input Phase Missing


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The model number of this 2105 Model 800 requires that every PPS have three-phase input power, which allows maximum power output. If single-phase input power is used, only 60% of maximum power output is available.


Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The PPS powered up and detected single-phase input power when it should have three-phase input power. (If three-phase input power had dropped to single phase after power up, PPS status code 07 would be displayed instead.)
2. Use the Chapter 5 installation procedures in the service guide to check the customer input to the PPS line cords. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs option to prepare the PPS to be powered off for the power checks. The PPS line cord must be disconnected from the customer power source.
3. When the problem is repaired, go to MAP 1500: Ending a Service Action on page 67.

MAP 24A0: PPS Power On Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each time the 2105 Model 800 operator panel Local power switch is momentarily pressed to on (up), the primary power supply (PPS) status display should show a sequence of two-character status codes. If it does not, either the PPS is not providing power to its RPC card, the RPC card is not sending a power-on request to the PPS, or the PPS itself is failing.



Figure 54. 2105 Primary Power Supply Locations (s009048). Rear view: indicators UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, and ON BATTERY. Front view: PPS digital status display (two digits).

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Switch the failing PPS input circuit breaker to off. Unplug the PPS to PPS communication cable from the J3 connector. (This removes both power sources from the PPS logic.)
2. Read the following Attention before continuing, then plug the cable back into the J3 connector and switch the input circuit breaker to on.
   Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is plugged in, the other PPS may drop power. Attempting to straighten the pins is not recommended, because they can easily bend again. Replace the PPS.
3. Observe the PPS UEPO PWR indicator. Is the indicator on solid?
   - Yes, the PPS has customer line cord input power. Go to the next step.
   - No, either the customer line cord power is off or the PPS is failing. Use the instructions in Check the Customer's Circuit Breaker with the Power On in Chapter 5 of Volume 2 to measure the input voltages. If the input voltage is present, replace the PPS. Use the service terminal Repair Menu, Replace a FRU option.
4. Observe the PPS UEPO LOOP-STBY indicator. Is the indicator on solid?
   - Yes, the UEPO is working correctly. Go to the next step.
   - No, the UEPO is not working correctly. Go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131.
5. Observe the PPS PWR GOOD indicator and find the indicator condition you have:
   - On solid. The PPS powered on without a power-on request from the 2105 Model 800 operator panel Local power switch (while in local power mode). Replace the following FRUs, one at a time, until this no longer occurs: the failing PPS, the RPC card for this PPS, and the RPC card for the other PPS. Use the service terminal Repair Menu, Replace a FRU option. If it still fails, call the next level of support.
   - Slow flashing. The PPS is in the expected standby mode. Go to the next step.
   - Off. Replace the failing PPS. Use the service terminal Repair Menu, Replace a FRU option.
6. Press the 2105 Model 800 operator panel Local power control switch momentarily to the on position (up). A sequence of two-character status codes should be displayed, and then the PPS PWR GOOD indicator should be on solid. Find the indicator condition you have:
   - No status codes displayed, PWR GOOD indicator on. The PPS powered on properly but is not displaying progress codes. Replace the failing PPS. Use the service terminal Repair Menu, Replace a FRU option.
   - No status codes displayed, PWR GOOD indicator off. Continue with the next step.
   - Status code displayed, PWR GOOD indicator off. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. Look up each status code and repair the one that indicates a failure.
   - Status code displayed, PWR GOOD indicator on. The PPS powered on normally. Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.
7. Replace the following FRUs, one at a time, until status codes are displayed: the failing PPS, the RPC card, the operator panel local power control, the PPS to RPC cable, and the RPC to operator panel cable. Use the service terminal Repair Menu, Replace a FRU option.
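Step 6 is a two-input decision: whether status codes appear, and whether the PWR GOOD indicator is on. The following Python sketch encodes that decision as a lookup table. The dictionary, function name, and action strings are hypothetical paraphrases of the step text, not output from any service terminal.

```python
# Hypothetical encoding of the step 6 decision in MAP 24A0.
# Key: (status codes displayed?, PWR GOOD indicator on?)

STEP6_ACTIONS = {
    (False, True): "Replace the failing PPS (no progress codes shown).",
    (False, False): "Continue with step 7.",
    (True, False): "Go to MAP 2350 and look up each status code.",
    (True, True): "PPS powered on normally; return to the calling procedure "
                  "or go to MAP 1500.",
}


def step6_action(status_codes_displayed: bool, pwr_good_on: bool) -> str:
    """Return the step 6 action for the observed PPS state."""
    return STEP6_ACTIONS[(status_codes_displayed, pwr_good_on)]
```

Keying the table on both observations makes it easy to confirm that every combination is covered: two yes/no inputs give exactly four entries.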

MAP 24B0: 2105 Cannot Power Off, Pinned Data


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
When a pinned data condition occurs, a problem is created. The power control microcode does not allow the 2105 Model 800 to power off until the pinned data condition is repaired. The attempt to power off also disables all the host system interfaces. Because the functional code has been stopped, it is not possible to query and repair the condition that caused the pinned data. The 2105 power must be forced off using the operator panel red UEPO switch (this causes a firehose dump). The 2105 must then be powered back on using the operator panel white switch. This leaves the 2105 in a condition in which normal service procedures can be used to correct the pinned data condition.

Isolation
1. An attempt to power off the 2105 Model 800 failed because a pinned data condition already existed or was found during the power-off destage of data.
2. Force the 2105 to power off using the operator panel red UEPO switch.
3. Power on the 2105 using the operator panel white switch.
4. Correct the pinned data condition. Go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417.

MAP 24F0: Both RPC Cards Firmware Down Level


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
The firmware code in both RPC cards is not at the latest level available.

Isolation
1. The firmware installed on both RPC cards is down level from the latest level available in the 2105 Model 800 LIC code library. Close the problem that sent you here. LIC activation requires that no open problems need repair; all problems must be in the closed or cancelled state to continue. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Multiple LIC Activation (Concurrent option)
   Note: Do not use the Licensed Internal Code Maintenance Menu, Firmware LIC menu option.
2. Go to MAP 1500: Ending a Service Action on page 67.

MAP 2520: PPS Output Circuit Breaker Tripped


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A tripped PPS output circuit breaker normally causes status code 13 to be displayed, but in some cases status code 10 may appear instead. An overcurrent condition can cause the breaker to trip. The loads connected to this circuit breaker remain disconnected until the problem is isolated.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Ensure that the circuit breaker (CB) is still tripped.
2. Disconnect the power cable from the connector beneath the tripped CB.



3. Reset the CB to on (up). Does the CB trip?
   - Yes, replace the PPS. Use the service terminal Repair Menu, Replace a FRU menu options.
   - No, continue with the next step.
4. Disconnect the other ends of the power cable. Each power cable supplies the input for up to three power supplies. Manually trace the power cable from the PPS to the other power supplies. Before disconnecting the power cable, observe each power supply input indicator to ensure that its input power is already missing.
5. Reconnect the PPS power cable beneath the tripped CB. Reset the CB to on (up). Does the CB trip?
   - Yes, replace the power supply cable, and then repeat this step.
   - No, continue with the next step.
6. Reconnect the power cable to one power supply input, and then set the CB to on (up). Does the CB trip?
   - Yes, replace the power supply that was just connected. Use the service terminal Repair Menu, Replace a FRU menu options. If the CB still trips, the PPS CB itself might be failing under light load; replace the PPS.
   - No, repeat this step until all the power supplies are connected and the CB no longer trips. Then use the service terminal Repair Menu, End of Call Status menu option.
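Step 6 is a sequential isolation: loads are reconnected one at a time, and the first connection that trips the breaker identifies the suspect power supply. The following Python sketch models only that step. The callback `breaker_trips` is a hypothetical stand-in for the physical test, and names such as "PS-A" are placeholders.

```python
from typing import Callable, List, Optional, Sequence


def find_tripping_supply(
    supplies: Sequence[str],
    breaker_trips: Callable[[List[str]], bool],
) -> Optional[str]:
    """Reconnect supplies one at a time and return the first one whose
    connection trips the breaker, or None if all connect without a trip.

    breaker_trips receives the list of currently connected supplies and
    reports whether the CB tripped (a stand-in for the physical check).
    """
    connected: List[str] = []
    for supply in supplies:
        connected.append(supply)
        if breaker_trips(connected):
            return supply
    return None
```

Because each power cable feeds at most three power supplies, the loop runs at most three times per cable. The sketch stops at the first trip; in the procedure, you replace that supply and, if the CB still trips, suspect the PPS CB itself.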

MAP 2600: RPC Card Cannot Reset a Power Fault


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The RPC card in the problem is reporting a power fault or event that cannot be reset.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Use the problem, or the Repair Menu, Replace a FRU option, to do a dummy repair of the RPC card called out in the problem. A dummy repair requires you to remove and reinstall the original RPC card using the normal repair process. This resets the RPC card and may correct the problem. Was the dummy repair successful?
   - Yes, the problem has been fixed. Use the Repair Menu, End of Call Status option to complete the service action.
   - No, the problem is still occurring. Repeat the repair, replacing the failing RPC card with a new FRU. If the repair is not successful, call the next level of support. The problem could be a failure external to the RPC card or a microcode problem.

MAP 2700: CEC Drawer Power On Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster CEC drawer power-on is controlled from the I/O drawer through the system power control network (SPCN). The SPCN interface signals are carried in the VS/COMM and JTAG cables between the I/O and CEC drawers.

Isolation
1. Observe the I/O drawer power indicator LED on the upper left of the CEC drawer operator panel. Is the I/O drawer power indicator LED on solid?
   - Yes, continue with the next step.
   - No, go to MAP 4880: Cluster Power On Problem on page 461.
2. Observe the CEC drawer power indicator LED on the front lower left of the CEC drawer. Select the LED condition that applies:
   - Off solid, go to step 5.
   - Blinking slowly, the CEC drawer is in standby, waiting for a power-on signal through the SPCN interface. Continue with the next step.
   - On solid, the CEC drawer is powered on. Exit this MAP.
3. Verify that the VS/COMM and JTAG cables are connected correctly at the CEC drawer and the I/O drawer. Are both cables connected correctly?
   - Yes, continue with the next step.
   - No, power off the cluster, connect the cables correctly, and then power on the cluster. If the CEC drawer powers on, exit this MAP. If the CEC drawer does not power on, return to step 1.
4. Power off the cluster using the service terminal Alternate Cluster Repair Menu options; see Cluster Power On and Off Procedures, 2105 Model 800 in Chapter 4 of Volume 2. Put the CEC drawer in the service position with the cover opened; see CEC Drawer Service Position Procedure, 2105 Model 800 and CEC Drawer Top Service Access Procedure, 2105 Model 800 in Chapter 4 of Volume 2. Ensure that the flat ribbon cable from the fan controller card to the CEC drawer planar assembly is connected correctly. Verify that the flat ribbon cable from the power planar to the CEC planar on the CEC drawer planar assembly is connected correctly. Are both cables connected correctly?
   - Yes, continue with the next step.
   - No, connect the cables correctly, close the top cover, and then attempt to power on the cluster. If the CEC drawer powers on and the I/O drawer power indicator is on solid, exit this MAP. If the CEC drawer does not power on, return to step 1.
5. The CEC drawer is not able to create standby power. Observe the CEC drawer power supply input power indicator LEDs.



   Is at least one CEC drawer power supply input power LED on?
   - Yes, continue with the next step.
   - No, the CEC drawer power supplies are not receiving input power. Verify that the CEC drawer power supply input power cables are connected and that all of the cables are plugged into the PPS. Observe the two-digit PPS status display between the front cooling fans.
     If a code is displayed, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. Display and repair any related problem for the PPS.
     If a code is not displayed, momentarily press the operator panel Local power switch to the on position (up). Wait 45 seconds for the power-on status codes to stop. If a code is now set (not changing), go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. If a status code is not displayed, call the next level of support.
6. Before replacing any FRUs, determine whether one of the CEC drawer power supplies is preventing the other supply from powering on.
   a. Switch off one of the CEC drawer power supplies by disconnecting its input power cables.
   b. Unplug the power supply from the CEC drawer by sliding it out until it is free.
   c. Attempt to power on the drawer. Does the CEC drawer power on?
      - Yes, replace the failing CEC drawer power supply (the one you removed).
      - No, reinstall the CEC drawer power supply, and then connect the input power cables. Wait 2 minutes, and then repeat this step for the other power supply. If the drawer still fails to power on, continue with the next step.
7. Only one CEC drawer power supply is needed to power on the CEC drawer. The possible failing FRUs, not in order of probability, are:
   - CEC drawer planar assembly
   - V/S COMM cable between the CEC and I/O drawers
   - JTAG cable between the CEC and I/O drawers
   - I/O drawer planar assembly

MAP 2800: CEC or I/O Drawer Visual Power Supply Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each CEC drawer and I/O drawer power supply has two power inputs, one from each primary power supply (PPS). Either input can supply all of the input power needed for the drawer power supplies to operate. This allows both CEC and I/O drawer power supplies to operate when one PPS is powered off from a failure or service action.

Isolation
1. Verify that the 2105 is already powered on. Use the following table to find and repair your visual symptom:



Table 29. CEC or I/O Drawer Visual Power Supply Problems

PWR 1 LED: Off | PWR 2 LED: Off | CHK/PWR GOOD LED: Off
   Description: Normal when the 2105 is powered off. If the 2105 is powered on, do the following action.
   Action: Observe the power supply next to the failing power supply:
   - If the other power supply has the same visual condition, neither power supply is receiving input power from either primary power supply (PPS). Verify that the input power cables are connected. The input power cables supply power to the CEC drawer, I/O drawer, and host bay drawer power supplies for the cluster. If all the drawers are powered off, verify that the power cables are connected at the PPS output connectors.
   - If the other power supply has its power LEDs on, it should be supplying logic voltage to the failing power supply, and the failing power supply's amber check LED should be lit. Replace the failing power supply.

PWR 1 LED: On | PWR 2 LED: On | CHK/PWR GOOD LED: On green
   Description: Normal when the 2105 is powered on.
   Action: None.

PWR 1 LED: On | PWR 2 LED: On | CHK/PWR GOOD LED: Blinking green
   Description: Normal when the 2105 is powering on. Standby power is being supplied to the drawers. Once the service processor bring-up is successful, the RPC cards request the power supply to provide normal power, and the LED changes to on green.
   Action: If the CHK/PWR GOOD LED stays blinking green for more than 5 minutes, go to MAP 4880: Cluster Power On Problem on page 461.

PWR 1 LED: Off | PWR 2 LED: On | CHK/PWR GOOD LED: Blinking green
   Description: Two possible error conditions combined.
   Action: Read the description for On | On | Blinking green in this table. If the CHK/PWR GOOD LED stays blinking green for more than 5 minutes, go to MAP 4880: Cluster Power On Problem on page 461. If the CHK/PWR GOOD LED changes and does not stay blinking green, use this table to look up the new condition.

PWR 1 LED: On | PWR 2 LED: Off | CHK/PWR GOOD LED: Blinking green
   Description: Two possible error conditions combined.
   Action: Read the description for On | On | Blinking green in this table. If the CHK/PWR GOOD LED stays blinking green for more than 5 minutes, go to MAP 4880: Cluster Power On Problem on page 461. If the CHK/PWR GOOD LED changes and does not stay blinking green, use this table to look up the new condition.

PWR 1 LED: Off | PWR 2 LED: Off | CHK/PWR GOOD LED: On amber
   Description: J1 and J2 input voltage not detected. Voltage from the adjacent power supply is controlling the amber CHK/PWR GOOD indicator.
   Action: Verify that the 2105 is powered on.
   - Verify that each input power cable is properly plugged at both ends (drawer power supply and PPS).
   - Observe the primary power supply status display (front of PPS). If it displays a code, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - If the PPS status display is blank, use the service terminal login to display and repair any related power problems.
   - If there are no related problems, the possible FRUs are: CEC or I/O drawer power supply; PPS; CEC or I/O drawer power supply input power cable. Use the service terminal Repair Menu, Replace a FRU menu options.

PWR 1 LED: On | PWR 2 LED: Off | CHK/PWR GOOD LED: On amber
   Description: J1 input voltage not detected.
   Action:
   - Verify that the input power cable is plugged correctly at both ends (drawer power supply and PPS).
   - Observe the primary power supply status display (front of PPS). If it displays a code, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - If the PPS status display is blank, use the service terminal login to display and repair any related power problems.
   - If there are no related problems, the possible FRUs are: CEC or I/O drawer power supply; PPS; CEC or I/O drawer power supply input power cable. Use the service terminal Repair Menu, Replace a FRU menu options.

PWR 1 LED: Off | PWR 2 LED: On | CHK/PWR GOOD LED: On amber
   Description: J2 input voltage not detected.
   Action:
   - Verify that the input power cable is plugged correctly at both ends (drawer power supply and PPS).
   - Observe the primary power supply status display (front of PPS). If it displays a code, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - If the PPS status display is blank, use the service terminal login to display and repair any related power problems.
   - If there are no related problems, the possible FRUs are: CEC or I/O drawer power supply; PPS; CEC or I/O drawer power supply input power cable. Use the service terminal Repair Menu, Replace a FRU menu options.

PWR 1 LED: On | PWR 2 LED: On | CHK/PWR GOOD LED: On amber
   Description: Error condition.
   Action: Replace the failing CEC or I/O drawer power supply.

MAP 2810: Host Bay Drawer Visual Power Supply Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each host bay power supply has two power inputs, one from each primary power supply (PPS). Either input can supply all of the input power needed for the host bay power supplies to operate. This allows both host bay power supplies to operate when one PPS is powered off from a failure or service action.

Isolation
1. Verify that the 2105 is already powered on. Use the following table to find and repair your visual symptom:
Table 30. Host Bay Drawer Visual Power Supply Problems

PWR 1 LED: Off | PWR 2 LED: Off | HA 1 LED: Off | HA 2 LED: Off
   Description: Normal when the 2105 is powered off.
   Action: None.

PWR 1 LED: On | PWR 2 LED: On | HA 1 LED: On | HA 2 LED: On
   Description: Normal when the 2105 is powered on.
   Action: None.

PWR 1 LED: On | PWR 2 LED: On | HA 1 LED: Off | HA 2 LED: Off
   Description: Input voltage is detected, but output voltage is switched off. Output voltage is controlled by the power supply switch or by the service terminal power control for service actions.
   Action: Verify that the host bay power supply switch is set to ON.
   - Use the service terminal login to display and repair any related power problems.
   - If there are no related problems, replace the host bay power supply using the service terminal Repair Menu, Replace a FRU menu options.

PWR 1 LED: On | PWR 2 LED: On | HA 1 LED: On | HA 2 LED: Off, or
PWR 1 LED: On | PWR 2 LED: On | HA 1 LED: Off | HA 2 LED: On
   Description: Normal input voltage is detected, and one of the two output voltages is switched off. An output voltage is normally switched off when the host bay it powers is being serviced.
   Action: Verify that a service action on the host bay drawer powered by that output is not in progress.
   - Use the service terminal login to display and repair any related power problems.
   - If there are no related problems, replace the host bay power supply using the service terminal Repair Menu, Replace a FRU menu options.

PWR 1 LED: On | PWR 2 LED: Off | HA 1 LED: On | HA 2 LED: On, or
PWR 1 LED: Off | PWR 2 LED: On | HA 1 LED: On | HA 2 LED: On
   Description: J1 or J2 input voltage not detected.
   Action:
   - Verify that the input power cable is plugged correctly at both ends (host bay power supply and PPS).
   - Observe the primary power supply status display (front of PPS). If it displays a code, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - If the PPS status display is blank, use the service terminal login to display and repair any related power problems.
   - If there are no related problems, the possible FRUs are: host bay power supply; PPS; host bay power supply input power cable. Use the service terminal Repair Menu, Replace a FRU menu options.


MAPs 3XXX SSA DASD DDM Bay Isolation Procedures


The procedures in the MAP 3XXX group in Chapter 3 cover the SSA DASD in the 2105 Model 800 and the 2105 Expansion Enclosure.

Using the SSA DASD Maintenance Analysis Procedures (MAPs)


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

These maintenance analysis procedures (MAPs) describe how to analyze a continuous failure that has occurred in a DDM bay. Failing field-replaceable units (FRUs) of the DDM bay can be isolated with these MAPs.

Attention: With all FRU replacements, the Repair Menu options must be used to perform the replacement and verification.
- If the FRU is listed and is selectable in the problem, use that method first.
- If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.

To locate a DDM bay in a 2105, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate a FRU in a DDM bay in a 2105, see DDM Bay, Component Physical Location Codes in Chapter 7 of Volume 3.

To isolate the FRUs in the failing DDM bay, do the actions and answer the questions given in these MAPs. See DDM Bay Indicators on page 21 for locations and descriptions of the indicators and switches.

Attention: Do not power off the 2105 rack or DDM bays unless instructed to do so.

Attention: If all steps in these MAPs have been followed and verification of the repair is still unsuccessful, call the next level of support.

Attention: Disk drive modules are fragile. Handle them with care, and keep them well away from strong magnetic fields.

MAP 3000: Isolating an SSA Link Error Between Two DDMs


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs).

Description
The SSA link between two adjoining disk drive modules (DDMs) is failing. The failing link is between two adjoining DDMs on the same backplane, in the same left or right group of four DDMs. See Figure 55 for the relationship of the DDM and backplane FRUs involved with this failure.
- DDM locations in DDM bay: two adjoining DDMs in DDM bay positions 1 to 8


Figure 55. SSA Link Failure, Two Adjoining DDMs (s009440). The figure shows two adjoining DDMs connected through the DDM bay backplane.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
   - Yes, go to step 3.
   - No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDMs in the problem you are working on. Is the DDM in the open or pending problem the same as one of the DDMs in the problem you are working on?
   - Yes, repair the problem with the single DDM FRU first; it should fix the problem you are working on.
   - No, go to step 4.
4. Replace the first of the two DDMs displayed on the service terminal, and then verify the repair.
   Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first; see DDM Bay Disk Drive Module Indicators on page 23.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing. Go to step 5.
5. Replace the second DDM displayed on the service terminal with the DDM removed in step 4, and then verify the repair.
   Note: The service terminal determines whether the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing can take many hours; sparing time varies with system usage and the storage capacity of the DDM being spared.
Problem Isolation Procedures, CHAPTER 3

177

MAP 3000: SSA Link Error Between Two DDMs


   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing; go to step 6.
6. Replace the DDM bay frame assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing; call the next level of support.
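The escalation pattern in this MAP (replace one candidate FRU, verify, and move on to the next candidate only if verification fails) can be expressed as a small loop. This is a hypothetical illustration only: `isolate_by_replacement` and the `replace_and_verify` callback are not part of the 2105 service terminal, which performs the actual replacement and verification through the Repair Menu.

```python
from typing import Callable, Iterable, Optional


def isolate_by_replacement(
    frus: Iterable[str],
    replace_and_verify: Callable[[str], bool],
) -> Optional[str]:
    """Replace candidate FRUs one at a time, in the order the MAP lists them.

    replace_and_verify is a hypothetical stand-in for the Repair Menu
    replace-and-verify flow; it returns True when repair verification
    runs without error.

    Returns the FRU whose replacement resolved the problem, or None when
    every candidate has been replaced and the link still fails (in the MAP,
    that is the point where you call the next level of support).
    """
    for fru in frus:
        if replace_and_verify(fru):
            # Verification passed: Continue Repair Process and stop here.
            return fru
    return None
```

For MAP 3000 the candidate order would be the first DDM, the second DDM, and then the DDM bay frame assembly; stopping at the first successful verification avoids replacing FRUs that are not needed.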

MAP 3010: Isolating a Degraded SSA Link Between Two DDMs


Attention: Customer disruption can occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started this service activity in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The 40 MB per second SSA link between two adjoining disk drive modules (DDMs) on the same backplane is degraded and is running at 20 MB per second. See Figure 56 for the relationship of the DDM and backplane FRUs involved with this failure.
- DDM locations in the DDM bay: two adjoining DDMs in DDM bay positions 1 to 8

Figure 56. SSA Link Failure, Two Adjoining DDMs (s009440) [figure: two adjoining DDMs on the DDM bay backplane]

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. Use the Repair Menu options to perform the replacement and verification.
   - If the FRU is listed and selectable in the problem, use that method first.
   - If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.

178

VOLUME 1, TotalStorage ESS Service Guide



   Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first; see DDM Bay Disk Drive Module Indicators on page 23.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; go to step 3.
3. Replace the second DDM displayed on the service terminal with the DDM removed in step 2 on page 178, then verify the repair.
   Note: The service terminal determines whether the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing can take many hours; sparing time varies with system usage and the storage capacity of the DDM being spared.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; go to step 4.
4. Replace the DDM bay frame assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; call the next level of support.
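The two ordering rules in this MAP (replace a DDM whose amber check indicator is on first, and wait for sparing when both DDMs are in the same array) can be sketched as follows. This is an illustration under stated assumptions: the `Ddm` record and its `array` and `check_on` fields are hypothetical, and on the real machine the service terminal makes the same-array determination and tells you when to wait.

```python
from dataclasses import dataclass


@dataclass
class Ddm:
    location: str   # location as displayed on the service terminal (hypothetical value)
    array: str      # array the DDM belongs to (hypothetical field)
    check_on: bool  # True when the amber check indicator is lit


def replacement_order(first: Ddm, second: Ddm) -> list[Ddm]:
    """Return the two displayed DDMs in the order to replace them.

    Default: the first displayed DDM goes first. If only the second DDM has
    its amber check indicator on, that DDM is replaced first instead.
    """
    if second.check_on and not first.check_on:
        return [second, first]
    return [first, second]


def must_wait_for_sparing(a: Ddm, b: Ddm) -> bool:
    """True when both DDMs are in the same array, so sparing for the first
    replacement must complete before the second DDM is replaced."""
    return a.array == b.array
```

Keeping the check-indicator rule and the sparing rule as separate functions mirrors the MAP, where the first is a note on the first replacement step and the second is a note on the second.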

MAP 3050: Isolating an SSA Link Error Between a DDM and an SSA Device Card
Attention: Customer disruption can occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started this service activity in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
An SSA link between a DDM and the SSA device card has failed. The failing FRU is a center DDM, a passthrough or bypass card, an SSA device cable, or an SSA device card. See Figure 57 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved with this failure.


Figure 57. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l) [figure labels: two SSA device cables, bypass card, SSA device card, DDM Bay A and DDM Bay B backplanes, two passthrough cards, DDM]

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. Use the Repair Menu options to perform the replacement and verification.
   - If the FRU is listed and selectable in the problem, use that method first.
   - If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
   - Yes, go to step 3.
   - No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDM in the problem you are working on. Is the DDM in the open or pending problem the same as the DDM in the problem you are working on?
   - Yes, repair the open or pending problem with the single DDM FRU first; that repair should also fix the problem you are working on.
   - No, go to step 4.
4. Determine whether the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
   - Yes, go to step 5 to verify that the SSA cables are connected correctly.
   - No, go to step 7 on page 181.
5. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen and compare them with the cabling of the DDM bay. See Locating an SSA Cable. Are any of the cables connected incorrectly?
   - Yes, connect the cables to the correct connectors, then go to step 6.
   - No, go to step 7 on page 181.
6. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select the cable you just reconnected. Proceed through the repair, but do not replace any FRU or disconnect any cables; this simulates a repair and runs verification. Did verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, go to step 7.
7. Locate the SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, one of the SSA cables is connected between a DDM bay and an SSA device card. The other SSA cable is connected between the same DDM bay and another DDM bay. The service terminal identifies the DDM bay and its SSA connector, and the SSA device card and its SSA connector.
   Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 58. To locate an SSA device card cable connector, see Figure 59 on page 182.

Figure 58. DDM bay SSA Connectors (S007693l)

   Note: The SSA device card cable connector location is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
   Use the figure below to locate an SSA device card cable connector.


Figure 59. Cluster SSA Device Card Connector Locations (s009166) [figure: front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2); SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12; SSA device card connectors B2, B1, A2, A1]

   a. Disconnect one of the two SSA device cables shown in Figure 57 on page 180 and listed in the Problem FRU list.
      Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and ensure a good connection.
   c. Run the repair verification: select one cable from the Problem FRU list and follow the repair process and verification without actually replacing the cable.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, select one of the following:
     - If you have inspected only one cable, repeat steps 7a through 7c on the second cable.
     - If you have inspected both cables, go to step 8.
8. Locate DDM bay A; it may be in the front or rear of the 2105. Observe all of the DDM Ready and Check indicators in the DDM bay. See Figure 60 on page 183. Are any of the DDM indicators in the DDM bay on?
   - Yes, go to step 9.
   - No, there is a DDM bay power problem; go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
9. Locate DDM bay B; it may be in the front or rear of the 2105. Observe all of the DDM Ready and Check indicators in the DDM bay. Are any of the DDM indicators in the DDM bay on?
   - Yes, go to step 10 on page 183.
   - No, there is a DDM bay power problem; go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.

Figure 60. DDM bay DDM Indicator Locations (S008021l)

10. Replace the DDM displayed on the service terminal, then verify the repair. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to step 11.
11. Replace the SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to step 12.
12. Replace the passthrough cards displayed on the service terminal one at a time; see Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2. After each card is replaced, verify the repair. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No:
      - If all of the cards shown in Figure 57 on page 180 have been replaced, go to step 13.
      - If all of the cards shown in Figure 57 on page 180 have NOT been replaced, repeat this step until all of the cards have been replaced.
13. Replace one of the two SSA device cables displayed in the service terminal FRU list, then verify the repair. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing:
      - If both of the SSA device cables shown in Figure 57 on page 180 have been replaced, go to step 14 on page 184.
      - If both of the SSA device cables shown in Figure 57 on page 180 have NOT been replaced, repeat this step until both cables have been replaced.
14. Replace the DDM bay frame assemblies displayed on the service terminal one at a time, and verify the repair after each. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
    Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing:
      - If all of the backplanes shown in Figure 57 on page 180 have been replaced, call the next level of support.
      - If all of the backplanes shown in Figure 57 on page 180 have NOT been replaced, repeat this step until all of the backplanes have been replaced.
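The SSA device card connector location format used in step 7 (R1-Tx-P2-Ix/yy) can be split into its fields mechanically, which may help when matching a location from the FRU list against the drawer and connector labels. A minimal sketch: `parse_ssa_connector` and `SsaConnector` are hypothetical helpers, and the pattern accepts only the documented R1-Tx-P2-Ix/yy text format, not the Tx-u0.1-P1-Ix labels shown in Figure 59.

```python
import re
from typing import NamedTuple


class SsaConnector(NamedTuple):
    cluster: int    # Tx: cluster 1 or 2
    slot: int       # Ix: SSA device card location (slot)
    connector: str  # yy: A1, A2, B1, or B2


# R1-Tx-P2-Ix/yy, where P2 is the cluster planar (fixed in this format).
_LOCATION = re.compile(r"^R1-T([12])-P2-I(\d+)/(A1|A2|B1|B2)$")


def parse_ssa_connector(location: str) -> SsaConnector:
    """Split an SSA device card connector location into its fields."""
    m = _LOCATION.match(location)
    if m is None:
        raise ValueError(f"not an SSA device card connector location: {location!r}")
    return SsaConnector(int(m.group(1)), int(m.group(2)), m.group(3))
```

For example, `parse_ssa_connector("R1-T2-P2-I11/B1")` yields cluster 2, slot 11, connector B1. Rejecting anything outside the four documented connector names makes a mistyped location fail loudly instead of pointing at the wrong port.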

MAP 3060: Isolating a Degraded SSA Link Between a DDM and an SSA Device Card
Attention: Customer disruption can occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started this service activity in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A 40 MB per second SSA link between a DDM and the SSA device card is degraded and is running at 20 MB per second. The degraded FRU is a center DDM, a passthrough or bypass card, an SSA device cable, or an SSA device card. See Figure 61 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved with this failure. DDM bays involved:
- DDM bay A
- DDM bay B

Figure 61. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l) [figure labels: two SSA device cables, bypass card, SSA device card, DDM Bay A and DDM Bay B backplanes, two passthrough cards, DDM]


Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. Use the Repair Menu options to perform the replacement and verification.
   - If the FRU is listed and selectable in the problem, use that method first.
   - If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, one of the SSA cables is connected between a DDM bay and an SSA device card. The other SSA cable is connected between the same DDM bay and another DDM bay. The service terminal identifies the DDM bay and its SSA connector, and the SSA device card and its SSA connector.
   Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 62. To locate an SSA device card cable connector, see Figure 63 on page 186.

Figure 62. DDM bay SSA Connectors (S007693l)

   Note: The SSA device card cable connector location is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
   Use the figure below to locate an SSA device card cable connector.


Figure 63. Cluster SSA Device Card Connector Locations (s009166) [figure: front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2); SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12; SSA device card connectors B2, B1, A2, A1]

   a. Disconnect one of the two SSA device cables shown in Figure 61 on page 184 and listed in the Problem FRU list.
      Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. There should be six pins in each plug; if there are fewer than six pins, replace the cable. Reconnect both ends of the SSA device cable and ensure a good connection.
   c. Run the repair verification: select one cable from the Problem FRU list and follow the repair process and verification without actually replacing the cable.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded:
     - If you have inspected only one cable, repeat steps 2a through 2c on the second cable.
     - If you have inspected both cables, go to step 3.
3. Replace the passthrough and bypass cards displayed on the service terminal one at a time; see Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2. After each card is replaced, verify the repair. Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded:
     - If all of the cards shown in Figure 61 on page 184 have been replaced, go to step 4 on page 187.
     - If all of the cards shown in Figure 61 on page 184 have NOT been replaced, repeat this step until all of the cards have been replaced.
4. Replace one of the two SSA device cables displayed in the service terminal FRU list, then verify the repair. Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded:
     - If both of the SSA device cables shown in Figure 61 on page 184 have been replaced, go to step 5.
     - If both of the SSA device cables shown in Figure 61 on page 184 have NOT been replaced, repeat this step until both cables have been replaced.
5. Replace the DDM displayed on the service terminal, then verify the repair. Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; go to step 6.
6. Replace the SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; go to step 7.
7. Replace the DDM bay frame assemblies displayed on the service terminal one at a time, and verify the repair after each. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded:
     - If all of the backplanes shown in Figure 61 on page 184 have been replaced, call the next level of support.
     - If all of the backplanes shown in Figure 61 on page 184 have NOT been replaced, repeat this step until all of the backplanes have been replaced.

MAP 3077: Isolating an SSA Link Error Between a DDM and Two SSA Device Cards
Attention: Customer disruption can occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started this service activity in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
An SSA link between a DDM and two SSA device cards is failing. The failing link includes two SSA device cards, one bypass card, one passthrough card, three SSA cables, and the DDM bay backplane. See Figure 64 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to fail. Other failures can also cause the link to fail. For example, a hot reset line to the SSA device card can cause the connection between the two loop inputs to appear to be open.

Figure 64. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008141l)

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. Use the Repair Menu options to perform the replacement and verification.
   - If the FRU is listed and selectable in the problem, use that method first.
   - If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Write the following information on a piece of paper:
   a. The Problem ID of this problem.
   b. The number of the failing cluster, cluster 1 or 2.
   c. The number of the other cluster:
      - If cluster 1 is the failing cluster, record the other cluster as cluster 2.
      - If cluster 2 is the failing cluster, record the other cluster as cluster 1.
3. Press F3 on the service terminal to list other problems. Are there any other problems whose Failing Cluster is the other cluster written down in step 2c?
   - Yes, repair and verify them now. Repairing these problems may correct this problem. After repair verification, continue with the next step.
   - No, continue with step 5 on page 189.
4. Did the repair of the other problems resolve this problem (that is, is the Problem ID recorded in step 2a no longer displayed)?
   - Yes, this problem is resolved.
   - No, continue with the next step.

5. Return to the original problem. Select one of the SSA device cards from the Possible FRU to Replace list. Continue through the repair and verify process, but do not replace any FRU. Did the verification test run without error?
   - Yes, the problem is resolved. This problem was caused by a condition that has now been resolved.
   - No, continue with the next step.
6. Determine whether the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
   - Yes, continue with the next step to verify that the SSA cables are connected correctly.
   - No, continue with step 9 on page 190.
7. Verify that the SSA cables are connected correctly. Locate all three SSA cables displayed by the service terminal as possible FRUs. Each of these SSA cables is connected between a DDM bay and an SSA device card. The service terminal FRU Location identifies the DDM bay and SSA connector where each end of the SSA cable is connected.
   Note: To locate the DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 65. To locate an SSA device card cable connector, see Figure 66 on page 190.

Figure 65. DDM bay SSA Connector Locations (S007693l)

   Note: The SSA device card cable connector location is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2


Figure 66. Cluster SSA Device Card SSA Connector Locations (s009166) [figure: front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2); SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12; SSA device card connectors B2, B1, A2, A1]

   Are any of the cables connected incorrectly?
   - Yes, connect the cables to the correct connectors, then continue with the next step.
     Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
   - No, go to step 9.
8. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select the cable you just reconnected. Proceed through the repair, but do not replace any FRU or disconnect any cables; this simulates a repair and runs verification. Did verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, continue with the next step.
9. Locate the DDM bay; it may be in the front or rear of the 2105. Observe all of the DDM Ready and Check indicators in the DDM bay. Are any of the DDM indicators in the DDM bay on?
   - Yes, go to step 10 on page 191.
   - No, there is a DDM bay problem; go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.


Figure 67. DDM bay DDM Indicator Locations (S008021l)

10. Replace the DDM displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
11. Replace one of the SSA device cards displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in Chapter 4 of Volume 2. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
12. Replace the other SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
13. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
    Note: Before installing the new card, verify that the jumpers on the bypass card are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.


Figure 68. DDM bay Bypass Card Jumper Settings (s009436) [figure: DDM bay bypass card with pins 3, 2, 1 at the top and bottom jumper positions, each jumpered on pins 2 to 3]

    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
14. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
15. Replace the first SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 7 on page 189. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to the next step.
16. Replace the second SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 7 on page 189. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to the next step.
17. Replace the third SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 7 on page 189. Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to the next step.
18. Replace the backplane in the DDM bay, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.

    Note: For a DDM bay, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; call the next level of support.
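Steps 2 and 3 of MAP 3077 reduce to two small rules: the other cluster is whichever of clusters 1 and 2 is not failing, and the problems to repair first are those whose Failing Cluster is that other cluster. A sketch under stated assumptions: the `Problem` record and the problem IDs are hypothetical; on the real machine the data comes from the problem list that F3 displays on the service terminal.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    problem_id: str       # Problem ID as shown on the service terminal (hypothetical values)
    failing_cluster: int  # Failing Cluster: 1 or 2


def other_cluster(failing: int) -> int:
    """Step 2c: cluster 1 maps to cluster 2, and cluster 2 maps to cluster 1."""
    if failing not in (1, 2):
        raise ValueError("cluster must be 1 or 2")
    return 3 - failing


def problems_on_other_cluster(current: Problem, open_problems: list[Problem]) -> list[Problem]:
    """Step 3: the other problems whose Failing Cluster is the other cluster."""
    target = other_cluster(current.failing_cluster)
    return [
        p for p in open_problems
        if p.problem_id != current.problem_id and p.failing_cluster == target
    ]
```

If the returned list is empty, the MAP continues at step 5; otherwise those problems are repaired first, and step 4 checks whether the recorded Problem ID has disappeared.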

MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards
Attention: Customer disruption can occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started this service activity in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A 40 MB per second SSA link between a DDM and two SSA device cards is degraded and is running at 20 MB per second. The degraded link includes two SSA device cards, one bypass card, one passthrough card, three SSA cables, and the DDM bay backplane. See Figure 69 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to run at the slower speed.

Figure 69. SSA Link Failure, Passthrough and Bypass Card Link Between a DDM and SSA Device Card (S008141l)

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. Use the Repair Menu options to perform the replacement and verification.
   - If the FRU is listed and selectable in the problem, use that method first.
   - If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate all three SSA cables displayed by the service terminal as possible FRUs. Each of these SSA cables is connected between a DDM bay and an SSA device card. The service terminal FRU Location identifies the DDM bay and SSA connector where each end of the SSA cable is connected.

Problem Isolation Procedures, CHAPTER 3

193

MAP 3078: Degraded SSA Link Between a DDM and Two SSA Device Cards
Note: To locate the DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 70. To locate an SSA device card cable connector, see Figure 71.

Figure 70. DDM bay SSA Connector Locations (S007693l)

Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
• Tx is the cluster, 1 or 2
• P2 is the cluster planar
• Ix is the SSA device card location (slot)
• yy is the cable connector: A1, A2, B1, or B2

[Figure content: front view of I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12. Each SSA device card has connectors B2, B1, A2, and A1.]

Figure 71. Cluster SSA Device Card SSA Connector Locations (s009166)

Disconnect both ends of each of these SSA cables.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.


VOLUME 1, TotalStorage ESS Service Guide

Inspect the cable connectors for bent pins and correct any problems found. Each plug should have three pins; if a plug has fewer than three pins, replace the cable. Reconnect both ends of each SSA device cable and ensure a good connection. Continue with the next step.
3. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select any of the cables. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification.
Did verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, continue with the next step.
4. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new bypass card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure content: DDM bay bypass card; the top and bottom jumpers are each set across pins 2 and 3.]
Figure 72. DDM bay Bypass Card Jumper Settings (s009436)

Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
5. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
6. Replace the first SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 2 on page 193.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
7. Replace the second SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 2 on page 193.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
8. Replace the third SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 2 on page 193.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
9. Replace the DDM displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
10. Replace one of the SSA device cards displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
11. Replace the other SSA device card displayed on the service terminal, then verify the repair.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Continue with the next step.
12. Replace the backplane in the DDM bay, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: For a DDM bay, the backplanes are replaced by replacing the frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still degraded. Call the next level of support.
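MAP 3078 follows a fixed pattern: take one action, run repair verification, stop at the first clean result, and call the next level of support if every step fails. The following Python sketch captures only that control flow. The replace_and_verify callback is hypothetical and stands in for the Repair Menu replacement and verification on the service terminal; the action labels are paraphrases of steps 3 through 12, not service terminal FRU names.

```python
from typing import Callable, Optional, Sequence

# Actions in the order MAP 3078 applies them (steps 3 through 12).
# Step 3 is a simulated repair: nothing is replaced, verification just runs.
MAP_3078_ORDER = (
    "simulated repair after reseating cables",
    "bypass card",
    "passthrough card",
    "SSA device cable 1",
    "SSA device cable 2",
    "SSA device cable 3",
    "DDM",
    "SSA device card 1",
    "SSA device card 2",
    "DDM bay frame assembly (backplane)",
)


def isolate(order: Sequence[str],
            replace_and_verify: Callable[[str], bool]) -> Optional[str]:
    """Try each action in turn; return the first one whose verification passes.

    Returns None when every action fails, meaning: call the next level of support.
    """
    for action in order:
        if replace_and_verify(action):
            # On the real machine you would now select Continue Repair Process.
            return action
    return None


# Example: pretend the passthrough card was the bad FRU.
print(isolate(MAP_3078_ORDER, lambda action: action == "passthrough card"))
```

Because the loop stops at the first passing verification, the order matters: cheaper and more likely causes (cabling, cards) come before the DDM bay frame assembly.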

MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
An SSA link failed between two SSA device cards connected through a DDM bay. The failing FRU is one of the FRUs displayed in the FRU list. See Figure 73 for the relationship of these FRUs.

[Figure content: labeled components are two SSA device cards, two SSA device cables, a bypass card, a passthrough card, and the DDM bay backplane.]

Figure 73. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine whether the SSA cables to the failing DDM bay have just been changed or installed.
Have the SSA cables just been changed or installed?
• Yes, verify that the SSA cables are connected correctly. Go to step 3.
• No, go to step 5 on page 198.
3. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen and compare them with the cabling of the DDM bay. See Locating an SSA Cable.
Are any of the cables connected incorrectly?
• Yes, connect the cables to the correct connectors. Go to step 4.
• No, go to step 5 on page 198.
4. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select the cable you just connected correctly. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification.
Did verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, go to step 5.
5. Locate the two SSA cables that the service terminal displays as possible FRUs. For this isolation procedure, the SSA cables are connected between a DDM bay and the SSA device cards. The service terminal identifies the DDM bays and their SSA connectors, and the SSA device cards and their SSA connectors.
Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 74. To locate an SSA device card cable connector, see Figure 75 on page 199.

Figure 74. DDM bay SSA Connector Locations (S007693l)

Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
• Tx is the cluster, 1 or 2
• P2 is the cluster planar
• Ix is the SSA device card location (slot)
• yy is the cable connector: A1, A2, B1, or B2


[Figure content: front view of I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12. Each SSA device card has connectors B2, B1, A2, and A1.]

Figure 75. Cluster SSA Device Card SSA Connector Locations (s009166)

a. Disconnect the SSA device cables from the cluster SSA device cards and the DDM bay.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of each SSA device cable and ensure a good connection.
c. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select the cable. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, go to step 6.
6. Locate the DDM bay; it may be in the front or the rear of the 2105. Observe all of the DDM and card indicators on the DDM bay.
Are any of the DDM bay indicators on?
• Yes, go to step 7.
• No, there is a DDM bay problem. Go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
7. Replace the first SSA device card displayed on the service terminal, then verify the repair.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing. Go to step 8 on page 200.

8. Replace the other SSA device card displayed on the service terminal, then verify the repair.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing. Go to step 9.
9. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new bypass card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure content: DDM bay bypass card; the top and bottom jumpers are each set across pins 2 and 3.]
Figure 76. DDM bay Bypass Card Jumper Settings (s009436)

Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing. Go to step 10.
10. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing. Go to step 11.
11. Replace the SSA device cables displayed on the service terminal one at a time, and verify the repair after each replacement.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, if you have not replaced the other cable, replace it and verify the repair. If both cables have been replaced and the SSA link is still failing, go to step 12.
12. Replace the frame (DDM bay) assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing. Call the next level of support.
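The SSA device card connector location format described in the notes of this MAP, R1-Tx-P2-Ix/yy, can be decoded mechanically. The following Python sketch assumes the location appears as literal text such as R1-T2-P2-I11/B1 (a hypothetical example); parse_location and DeviceCardConnector are illustrative names, not part of any IBM tool. It handles only the documented R1-Tx-P2-Ix/yy form, not the Tx-u0.1-P1-Ix labels shown in the I/O drawer figure.

```python
import re
from dataclasses import dataclass

# Documented format: R1-Tx-P2-Ix/yy
#   Tx = cluster (1 or 2), P2 = cluster planar, Ix = card slot, yy = A1, A2, B1, or B2
_LOCATION = re.compile(
    r"^R1-T(?P<cluster>[12])-P2-I(?P<slot>\d+)/(?P<connector>[AB][12])$"
)


@dataclass(frozen=True)
class DeviceCardConnector:
    cluster: int
    slot: int
    connector: str


def parse_location(code: str) -> DeviceCardConnector:
    """Split a connector location string into cluster, slot, and connector."""
    match = _LOCATION.match(code.strip())
    if match is None:
        raise ValueError(f"not an SSA device card connector location: {code!r}")
    return DeviceCardConnector(
        cluster=int(match.group("cluster")),
        slot=int(match.group("slot")),
        connector=match.group("connector"),
    )


print(parse_location("R1-T2-P2-I11/B1"))
# DeviceCardConnector(cluster=2, slot=11, connector='B1')
```

Rejecting anything outside the documented values (cluster 3, connector C1) mirrors the note: there are only two clusters and four connectors per card.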

MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A 40 MB per second SSA link between two SSA device cards connected through a DDM bay is degraded and is running at 20 MB per second. The degraded FRU is one of the FRUs displayed in the FRU list. See Figure 77 for the relationship of these FRUs.

[Figure content: labeled components are two SSA device cards, two SSA device cables, a bypass card, a passthrough card, and the DDM bay backplane.]

Figure 77. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the two SSA cables that the service terminal displays as possible FRUs. For this isolation procedure, the SSA cables are connected between a DDM bay and the SSA device cards. The service terminal identifies the DDM bays and their SSA connectors, and the SSA device cards and their SSA connectors.


MAP 3086: Degraded SSA Link Between Two SSA Device Cards Through a DDM Bay
Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 78. To locate an SSA device card cable connector, see Figure 79.

Figure 78. DDM bay SSA Connector Locations (S007693l)

Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
• Tx is the cluster, 1 or 2
• P2 is the cluster planar
• Ix is the SSA device card location (slot)
• yy is the cable connector: A1, A2, B1, or B2

[Figure content: front view of I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12. Each SSA device card has connectors B2, B1, A2, and A1.]

Figure 79. Cluster SSA Device Card SSA Connector Locations (s009166)

a. Disconnect the SSA device cables from the cluster SSA device cards and the DDM bay.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.

b. Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins; if a connector has fewer than three pins, replace the cable. Reconnect both ends of each SSA device cable and ensure a good connection.
c. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select the cable. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9 on page 204.
• No, continue with the next step.
3. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new bypass card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure content: DDM bay bypass card; the top and bottom jumpers are each set across pins 2 and 3.]
Figure 80. DDM bay Bypass Card Jumper Settings (s009436)

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9 on page 204.
• No, the SSA link is still degraded. Continue with the next step.
4. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9 on page 204.
• No, the SSA link is still degraded. Continue with the next step.
5. Replace the SSA device cables displayed on the service terminal one at a time, and verify the repair after each replacement.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9 on page 204.
• No, if you have not replaced the other cable, replace it and verify the repair. If both cables have been replaced and the SSA link is still degraded, go to step 6.
6. Replace the first SSA device card displayed on the service terminal, then verify the repair.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9.
• No, the SSA link is still degraded. Continue with the next step.
7. Replace the other SSA device card displayed on the service terminal, then verify the repair.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9.
• No, the SSA link is still degraded. Continue with the next step.
8. Replace the frame (DDM bay) assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 9.
• No, the SSA link is still degraded. Call the next level of support.
9. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
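Two physical checks recur in these MAPs before a part goes back in: each SSA cable plug should have three pins, and a replacement bypass card should have both jumpers across pins 2 and 3. As a sketch only, assuming you have recorded the pin count for each plug and the pin pair each jumper bridges, the two rules reduce to the following Python functions. The function names are hypothetical, and bent pins, which you correct by hand, are not modeled.

```python
from typing import Sequence, Tuple

PINS_PER_PLUG = 3       # each SSA cable plug should have three pins
BYPASS_JUMPER = (2, 3)  # bypass card jumpers go across pins 2 and 3


def cable_needs_replacement(pins_per_plug: Sequence[int]) -> bool:
    """True if any plug on the cable has fewer than three pins."""
    return any(count < PINS_PER_PLUG for count in pins_per_plug)


def bypass_jumpers_ok(top: Tuple[int, int], bottom: Tuple[int, int]) -> bool:
    """True only if both the top and bottom jumpers bridge pins 2 and 3."""
    return all(tuple(sorted(jumper)) == BYPASS_JUMPER for jumper in (top, bottom))


print(cable_needs_replacement([3, 2]))    # True: one plug is missing a pin
print(bypass_jumpers_ok((2, 3), (1, 2)))  # False: bottom jumper is on pins 1 and 2
```

Both jumpers must pass; a correct top jumper does not make up for a misplaced bottom jumper.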

MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
An SSA link between two DDMs is failing. The DDMs are in separate DDM bays. The failing link goes through two passthrough cards, a bypass card, one or more SSA cables, and possibly an SSA device card. See Figure 81 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to fail. Other failures can also cause the link to fail; for example, a hot reset line to the SSA device card can make the connection between the two loop inputs appear to be open.


MAP 3095: SSA Link Error Between Two Disk Drive Modules in Separate DDM Bays and an SSA Device Card
[Figure content: labeled components are an SSA device card, SSA device cables, a bypass card, two passthrough cards, DDM bay A and DDM bay B (each with a DDM and a DDM bay backplane).]

Figure 81. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008140l)

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Write down the following information:
a. The Problem ID of this problem.
b. The number of the failing cluster, cluster 1 or 2.
c. The number of the other cluster:
• If cluster 1 is the failing cluster, record the other cluster as cluster 2.
• If cluster 2 is the failing cluster, record the other cluster as cluster 1.
3. Press F3 on the service terminal to list other problems.
Are there any other problems whose Failing Cluster is the other cluster you wrote down in step 2c?
• Yes, repair and verify them now. Repairing these problems may correct this problem. After repair verification, continue with the next step.
• No, go to step 6.
4. Did the repair of the other problems resolve the problem you recorded in step 2 (its Problem ID is no longer displayed)?
• Yes, this problem is resolved.
• No, continue with the next step.
5. Return to the original problem. Select the SSA device card from the Possible FRUs to Replace list. Continue through the repair and verify process, but do not replace any FRU.
Did the verification test run without error?
• Yes, the problem is resolved. It was caused by another problem that has now been resolved.
• No, continue with the next step.
6. Determine whether the SSA cables to the failing DDM bay have just been changed or installed.
Have the SSA cables just been changed or installed?
• Yes, verify that the SSA cables are connected correctly. Continue with the next step.
• No, continue with step 11 on page 207.
7. Locate the SSA cables that the service terminal displays as possible FRUs. One of these SSA cables is connected between two separate DDM bays. The service terminal identifies the DDM bay and SSA connector to which each end of that SSA cable is connected.
Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 82.
Is the SSA cable connected to the correct connectors?
• Yes, continue with the next step.
• No, connect the cable correctly. Continue with the next step.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
After the cable is connected correctly, go to step 10 on page 207.
8. Disconnect both ends of the SSA device cable.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and ensure a good connection. Continue with the next step.

Figure 82. DDM Bay SSA Connector Locations (S007693l)

9. Locate the two remaining SSA cables in the Possible FRUs to Replace list. These SSA cables are connected between a DDM bay and an SSA device card. The service terminal identifies the DDM bay and its SSA connector, and the SSA device card and its SSA connector. To locate the DDM bay end of each SSA cable, see the instructions in step 7.
Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
• Tx is the cluster, 1 or 2
• P2 is the cluster planar
• Ix is the SSA device card location (slot)
• yy is the cable connector: A1, A2, B1, or B2

To locate an SSA device card cable connector, see Figure 83.
Are the SSA cables connected to the correct connectors?
• Yes, go to step 11.
• No, connect the cables correctly. After the cables are connected correctly, go to step 10.

[Figure content: front view of I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12. Each SSA device card has connectors B2, B1, A2, and A1.]

Figure 83. Cluster SSA Device Card SSA Connector Locations (s009166)

10. Determine whether the problem is resolved. Return to the service terminal Detail Problem screen and select any cable in the Possible FRUs to Replace list. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification.
Did verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, continue with the next step.
11. Replace the SSA device card displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
12. Replace the first of the two DDMs displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in Chapter 4 of Volume 2.
Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first; see DDM Bay Disk Drive Module Indicators on page 23.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
13. Replace the second DDM displayed on the service terminal with the DDM removed in the last step, then verify the repair. See SSA Disk Drive Module, DDM Bay in Chapter 4 of Volume 2.
Note: It may take many hours before the second DDM can be replaced. The service terminal determines whether the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete; when sparing for the first DDM replacement completes, you can replace the second DDM. Sparing an 18 GB DDM can take up to 36 hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
14. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new bypass card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure content: DDM bay bypass card; the top and bottom jumpers are each set across pins 2 and 3.]
Figure 84. DDM bay Bypass Card Jumper Settings (s009436)

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
15. Replace the first passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
16. Replace the second passthrough card displayed on the service terminal, using the card removed in the last step, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22 on page 209.
• No, the SSA link is still failing. Continue with the next step.
17. Replace the SSA device cable that connects the two DDM bays, then verify the repair. This cable is displayed in the FRU list on the service terminal. To locate the cable, see step 7 on page 206.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22.
• No, the SSA link is still failing. Continue with the next step.
18. Replace the second SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 9 on page 206.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22.
• No, the SSA link is still failing. Continue with the next step.
19. Replace the third SSA device cable displayed in the FRU list on the service terminal, then verify the repair. To locate the cable, see step 9 on page 206.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22.
• No, the SSA link is still failing. Continue with the next step.
20. Replace the frame assembly (backplane) in DDM bay A, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22.
• No, the SSA link is still failing. Continue with the next step.
21. Replace the backplane in DDM bay B, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 22.
• No, the SSA link is still failing. Call the next level of support.
22. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
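Steps 2 through 5 of MAP 3095 ask you to record the failing cluster, work out the other cluster, and repair first any open problem whose Failing Cluster is that other cluster. The following Python sketch models only that filter. The Problem record, its field names, and the problem IDs in the example are hypothetical stand-ins for what the service terminal lists after you press F3.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    problem_id: str       # hypothetical field: the Problem ID shown on the terminal
    failing_cluster: int  # hypothetical field: the Failing Cluster, 1 or 2


def other_cluster(failing_cluster: int) -> int:
    """Step 2c: the other cluster is 2 if cluster 1 is failing, and vice versa."""
    if failing_cluster not in (1, 2):
        raise ValueError("cluster must be 1 or 2")
    return 2 if failing_cluster == 1 else 1


def repair_first(current: Problem, open_problems: List[Problem]) -> List[Problem]:
    """Step 3: open problems (other than this one) whose Failing Cluster is the other cluster."""
    target = other_cluster(current.failing_cluster)
    return [p for p in open_problems
            if p.problem_id != current.problem_id and p.failing_cluster == target]


current = Problem("P100", failing_cluster=1)
listed = [current, Problem("P101", 2), Problem("P102", 1)]
print([p.problem_id for p in repair_first(current, listed)])  # ['P101']
```

An empty result corresponds to the "No, go to step 6" branch of step 3.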

MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A 40 MB/s SSA link between two DDMs is degraded and is running at 20 MB/s. The DDMs are in separate DDM bays. The degraded link goes through two passthrough cards, a bypass card, and an SSA cable. See Figure 85 for the relationship of these FRUs. Degradation of any of these components can cause the link to run at the slower speed.

Problem Isolation Procedures, CHAPTER 3

Figure 85. SSA Link Degraded, Two Passthrough and Bypass Card Link Between Two DDMs (S008384l). Labeled parts: SSA device cable, bypass card, two passthrough cards, and a DDM and DDM bay backplane in each of DDM bay A and DDM bay B.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. This SSA cable connects two separate DDM bays. The service terminal identifies the DDM bay and SSA connector to which each end of the cable is connected. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate the SSA cable connectors on a DDM bay, see Figure 86. Continue with the next step.
3. Disconnect both ends of the SSA device cable.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins. If there are fewer than three pins, replace the cable. Reconnect both ends of the SSA device cable and make sure that both connections are secure. Continue with the next step.

Figure 86. DDM bay SSA Connector Locations (S007693l)

4. Determine whether the problem is resolved. Return to the Detail Problem screen on the service terminal and select any cable in the Possible FRUs to Replace list.


VOLUME 1, TotalStorage ESS Service Guide

Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did verification run without error?
• Yes, the problem is resolved. Go to step 13 on page 212.
• No, continue with the next step.
5. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 87. DDM bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13 on page 212.
• No, the SSA link is still degraded. Continue with the next step.
6. Replace the first passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13 on page 212.
• No, the SSA link is still degraded. Continue with the next step.
7. Replace the second passthrough card displayed on the service terminal, then verify the repair. Use the card removed in step 6. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13 on page 212.
• No, the SSA link is still degraded. Continue with the next step.
8. Replace the SSA device cable that connects the two DDM bays, then verify the repair. This cable is displayed in the FRU list on the service terminal. To locate the cable, see step 2 on page 210. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13 on page 212.
• No, the SSA link is still degraded. Continue with the next step.
9. Replace the first of the two DDMs displayed on the service terminal, then verify the repair. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13.
• No, the SSA link is still degraded. Continue with the next step.
10. Replace the second DDM displayed on the service terminal with the DDM removed in step 9, then verify the repair. See SSA Disk Drive Module, DDM Bay in Chapter 4 of Volume 2.
Note: It may be many hours before the second DDM can be replaced. The service terminal determines whether the second DDM is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. Sparing time for 18 GB DDMs can be up to 36 hours; it varies with system usage and the storage capacity of the DDM being spared.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13.
• No, the SSA link is still degraded. Continue with the next step.
11. Replace the frame assembly (backplane) in DDM bay A, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: For DDM bays, the backplane is replaced by replacing the frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13.
• No, the SSA link is still degraded. Continue with the next step.
12. Replace the backplane in DDM bay B, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: For DDM bays, the backplane is replaced by replacing the frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 13.
• No, the SSA link is still degraded. Call the next level of support.
13. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
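The Python sketch below shows the decision structure of MAP 3096 at a glance. It tries a simulated repair first, then replaces each FRU in step order and stops at the first clean verification. It is illustrative only. The FRU labels, `MAP_3096_ORDER`, and the `verify` callback are hypothetical stand-ins for the service terminal's repair and verification flow. The sketch does not model the sparing wait in step 10 or the reuse of removed parts in steps 7 and 10.

```python
from typing import Callable, Optional, Sequence

# Hypothetical labels for the FRUs replaced in MAP 3096 steps 5 through 12, in order.
MAP_3096_ORDER = (
    "bypass card",
    "passthrough card 1",
    "passthrough card 2",
    "SSA device cable",
    "DDM 1",
    "DDM 2",
    "DDM bay A frame assembly",
    "DDM bay B frame assembly",
)


def isolate(order: Sequence[str],
            verify: Callable[[Optional[str]], bool]) -> str:
    """Walk the MAP order and report which action cleared the problem.

    verify(None) stands for the simulated repair in step 4 (cable reseated,
    nothing replaced); verify(fru) stands for replacing `fru` and running
    repair verification. Both are hypothetical callbacks.
    """
    if verify(None):
        return "resolved without replacing a FRU"
    for fru in order:
        if verify(fru):  # repair verification ran without error
            return f"resolved by replacing {fru}"
    # Every listed FRU was replaced and the link is still degraded.
    return "call the next level of support"


# Example: simulate a link whose actual fault is the SSA device cable.
print(isolate(MAP_3096_ORDER, lambda fru: fru == "SSA device cable"))
```

Replaced parts stay replaced, so the loop can stop at the first successful verification, just as each "Yes" answer in the MAP jumps to step 13.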

MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The SSA link between two DDMs in different DDM bays is failing. The failing link runs through the two DDMs, two passthrough or bypass cards, and the SSA cable that connects the two DDM bays. See Figure 88 for the relationship of the DDM, passthrough or bypass card, and backplane FRUs involved in this failure.

Figure 88. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437). Labeled parts: SSA device cable, passthrough or bypass cards, and a DDM and DDM bay backplane in each of DDM bay-A and DDM bay-B. DDM locations in the DDM bays: DDM 1 or 8.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
• Yes, go to step 3.
• No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDMs in the problem you are working on. Is the DDM in the open or pending problem the same as one of the DDMs in the problem you are working on?
• Yes, repair the problem with the single DDM FRU first; that repair should also fix the problem you are working on. If the problem is resolved, go to step 17 on page 217. If the problem is not resolved, continue with the next step.
• No, go to step 4.
4. Determine whether the SSA cables to the failing DDM bays have just been changed or installed. Have the SSA cables just been changed or installed?
• Yes, go to step 5 to verify that the SSA cables are connected correctly.
• No, go to step 7 on page 214.
5. Verify that the SSA cables are connected correctly. Compare the cables displayed on the Detail Problem screen with the cabling of the DDM bays. See Locating an SSA Cable. Are any of the cables connected incorrectly?
• Yes, connect the cables to the correct connectors, then go to step 6.
• No, go to step 7.
6. Determine whether the problem is resolved. Return to the Detail Problem screen on the service terminal and select the cable that you just reconnected. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, go to step 7.
7. Locate DDM bay-A; it may be in the front or rear of the 2105. Observe all of the DDM indicators on the DDM bay (see Figure 89). Are any of the DDM bay indicators on?
• Yes, go to step 8.
• No, there is a DDM bay power problem. Go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.

Figure 89. DDM Bay DDM Indicator Locations (S008021l)

8. Locate DDM bay-B; it may be in the front or rear of the 2105. Observe all of the DDM indicators on the DDM bay (see Figure 89). Are any of the DDM bay indicators on?
• Yes, go to step 9.
• No, there is a DDM bay power problem. Go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
9. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable connects two separate DDM bays. The FRU Location on the service terminal identifies the DDM bay and SSA connector to which each end of the SSA cable is connected. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate the SSA cable connectors on a DDM bay, see Figure 90. Select the cable shown on the service terminal for repair.


Figure 90. DDM Bay SSA Connectors (S007693l)

a. Disconnect both ends of the SSA device cable between the two DDM bays.
Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and make sure that both connections are secure.
c. Run the repair verification to determine whether the problem is resolved. Return to the Detail Problem screen on the service terminal and select any FRU. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, go to step 10.
10. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first. See DDM Bay Disk Drive Module Indicators on page 23.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, the SSA link is still failing. Go to step 11.
11. Replace the second DDM displayed on the service terminal with the DDM removed in step 10, then verify the repair.
Note: The service terminal determines whether the second DDM is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing can take many hours; the time varies with system usage and the storage capacity of the DDM being spared.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, the SSA link is still failing. Go to step 12.
12. Replace the first of the two passthrough or bypass cards displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 91. DDM bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, the SSA link is still failing. Go to step 13.
13. Replace the second passthrough or bypass card displayed on the service terminal, then verify the repair. Use the card removed in step 12 on page 215. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 92. DDM bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, the SSA link is still failing. Go to step 14.
14. Replace the SSA device cable displayed on the service terminal, then verify the repair. See SSA Cables, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17 on page 217.
• No, the SSA link is still failing. Go to step 15 on page 217.
15. Replace the backplane in DDM bay-A, then verify the repair. See MAP 3400: Replacing a DDM Bay Frame Assembly on page 266.
Note: For DDM bays, the backplane is replaced by replacing the frame (DDM bay) assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17.
• No, the SSA link is still failing. Go to step 16.
16. Replace the backplane in DDM bay-B, then verify the repair. See MAP 3400: Replacing a DDM Bay Frame Assembly on page 266. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 17.
• No, the SSA link is still failing. Call the next level of support.
17. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
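Steps 2 and 3 of MAP 3100 amount to a filter over the other problems listed on the service terminal. The Python sketch below uses a hypothetical record layout (`Fru`, `Problem`, and the location strings in the example) that is not part of the service terminal. It only illustrates the matching rule: a pending or open problem whose only FRU is a DDM that also appears in the problem being worked on.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Fru:
    kind: str      # hypothetical FRU type, for example "DDM" or "SSA device cable"
    location: str  # hypothetical location text as shown on the service terminal


@dataclass
class Problem:
    problem_id: str
    state: str        # "pending", "open", or another state
    frus: List[Fru]


def single_ddm_matches(current_ddms: Set[Fru],
                       others: Sequence[Problem]) -> List[Problem]:
    """Return pending/open problems whose only FRU is one of the DDMs
    in the problem being worked on (MAP 3100, steps 2 and 3)."""
    return [
        p for p in others
        if p.state in ("pending", "open")
        and len(p.frus) == 1
        and p.frus[0].kind == "DDM"
        and p.frus[0] in current_ddms
    ]


# Example with hypothetical location strings.
ddm_a = Fru("DDM", "bay-A DDM 1")
ddm_b = Fru("DDM", "bay-B DDM 8")
others = [
    Problem("P1", "open", [ddm_a]),
    Problem("P2", "closed", [ddm_b]),
]
print([p.problem_id for p in single_ddm_matches({ddm_a, ddm_b}, others)])
```

If the list is not empty, step 3 directs you to repair that single-DDM problem first.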

MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The 40 MB/s SSA link between two DDMs is degraded and is running at 20 MB/s. The degraded link runs through the two DDMs, which are in different DDM bays, two passthrough or bypass cards, and the SSA cable that connects the bays. See Figure 93 for the relationship of the DDM, passthrough or bypass card, and backplane FRUs involved in this failure. DDM locations in the DDM bays: both are DDM 8.

Figure 93. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437). Labeled parts: SSA device cable, passthrough or bypass cards, and a DDM and DDM bay backplane in each of DDM bay-A and DDM bay-B.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable connects two separate DDM bays. The FRU Location on the service terminal identifies the DDM bay and SSA connector to which each end of the SSA cable is connected. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate the SSA cable connectors on a DDM bay, see Figure 94. Select the cable shown on the service terminal for repair.

Figure 94. DDM Bay SSA Connectors (S007693l)

a. Disconnect both ends of the SSA device cable between the two DDM bays.
Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
b. Inspect the cable connectors for bent pins and correct any problems found. There should be six pins in each plug. If there are fewer than six pins, replace the cable. Reconnect both ends of the SSA device cable and make sure that both connections are secure.
c. Run the repair verification to determine whether the problem is resolved. Return to the Detail Problem screen on the service terminal and select any FRU. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10 on page 220.
• No, continue with the next step.
3. Replace the first of the two passthrough or bypass cards displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 95. DDM bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10 on page 220.
• No, the SSA link is still degraded. Continue with the next step.
4. Replace the second passthrough or bypass card displayed on the service terminal, then verify the repair. Use the card removed in step 3 on page 218. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 96. DDM bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10 on page 220.
• No, the SSA link is still degraded. Continue with the next step.
5. Replace the SSA device cable displayed on the service terminal, then verify the repair. See SSA Cables, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10 on page 220.
• No, the SSA link is still degraded. Go to step 6 on page 220.
6. Replace the first of the two DDMs displayed on the service terminal, then verify the repair. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10.
• No, the SSA link is still degraded. Continue with the next step.
7. Replace the second DDM displayed on the service terminal with the DDM removed in step 6, then verify the repair.
Note: The service terminal determines whether the second DDM is in the same array as the first DDM. If both DDMs are in the same array, the service terminal instructs you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing can take many hours; the time varies with system usage and the storage capacity of the DDM being spared.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10.
• No, the SSA link is still degraded. Continue with the next step.
8. Replace the backplane in DDM bay-A, then verify the repair. See MAP 3400: Replacing a DDM Bay Frame Assembly on page 266.
Note: For DDM bays, the backplane is replaced by replacing the frame (DDM bay) assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10.
• No, the SSA link is still degraded. Continue with the next step.
9. Replace the backplane in DDM bay-B, then verify the repair. See MAP 3400: Replacing a DDM Bay Frame Assembly on page 266. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 10.
• No, the SSA link is still degraded. Call the next level of support.
10. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
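The service terminal decides whether the second DDM replacement must wait for sparing. This rule appears in step 7 of MAP 3101, step 11 of MAP 3100, and step 10 of MAP 3096. The minimal Python sketch below restates it. The `DdmReplacement` record, its fields, and the array identifiers are hypothetical and do not describe how the service terminal tracks sparing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DdmReplacement:
    array_id: str           # hypothetical identifier of the array holding the first DDM
    sparing_complete: bool  # True once sparing for this replacement has finished


def second_ddm_may_be_replaced(first: DdmReplacement, second_array_id: str) -> bool:
    """Return True if the second DDM can be replaced now.

    If both DDMs are in the same array, sparing for the first replacement
    must complete before the second DDM is replaced; otherwise no wait is needed.
    """
    if first.array_id != second_array_id:
        return True
    return first.sparing_complete


print(second_ddm_may_be_replaced(DdmReplacement("array 1", False), "array 1"))
```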

MAP 3120: Isolating an SSA Link Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
An SSA link between a DDM and the SSA device card has failed. See Figure 97 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved in this failure.


Figure 97. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438). Labeled parts: SSA device card, SSA device cable, bypass or passthrough card, DDM bay backplane, and DDM.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM or SSA device card as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM or SSA device card as the FRU?
• Yes, go to step 3.
• No, go to step 4.
3. Compare the single DDM or SSA device card FRU in the pending or open problem with the DDM and SSA device card in the problem you are working on. Is the FRU in the open or pending problem the same as a FRU in the problem you are working on?
• Yes, repair the open or pending problem with the single FRU first; that repair should also fix the problem you are working on.
• No, go to step 4.
4. Determine whether the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
• Yes, go to step 5 to verify that the SSA cables are connected correctly.
• No, go to step 7 on page 222.
5. Verify that the SSA cables are connected correctly. Compare the SSA cables displayed on the Detail Problem screen with the cabling of the DDM bay. See Locating an SSA Cable in Chapter 7 of Volume 3. Are any of the SSA cables connected incorrectly?
• Yes, connect the SSA cables to the correct connectors, then go to step 6.
• No, go to step 7.
6. Determine whether the problem is resolved. Return to the Detail Problem screen on the service terminal and select the cable that you just reconnected. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did verification run without error?
• Yes, the problem is resolved. Go to step 14 on page 223.
• No, go to step 7.
7. Locate the DDM bay; it may be in the front or rear of the 2105. Observe all of the DDM indicators on the DDM bay (see Figure 98). Are any of the DDM bay indicators on?
• Yes, go to step 8.
• No, there is a DDM bay power problem. Go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.

Figure 98. DDM bay DDM Indicator Locations (S008021l)

8. Intermittent SSA link errors can reopen problems that were corrected earlier by a successful FRU replacement. In steps 9 through 13 on page 223, skip any step that is the same as an earlier step that had a successful repair. Continue with the next step.
9. Replace the DDM displayed on the service terminal, then verify the repair. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 14 on page 223.
• No, the SSA link is still failing. Go to step 10.
10. Replace the SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 14 on page 223.
• No, the SSA link is still failing. Go to step 11.
11. Replace the passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in Chapter 4 of Volume 2.
Note: Before installing the new card, verify that its jumpers are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 99. DDM Bay Bypass Card Jumper Settings (s009436). The bypass card has a top and a bottom jumper position (pins 3, 2, 1); both jumpers connect pins 2 to 3.

Did repair verification run without error?
• Yes, the problem is resolved. Go to step 14.
• No, the SSA link is still failing. Go to step 12.
12. Replace the SSA device cable displayed in the probable FRU list on the service terminal, then verify the repair. Did repair verification run without error?
• Yes, the problem is resolved. Go to step 14.
• No, the SSA link is still failing. Go to step 13.
13. Replace the DDM bay frame assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
Did repair verification run without error?
• Yes, the problem is resolved. Go to step 14.
• No, the SSA link is still failing. Call the next level of support.
14. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
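Step 8 of MAP 3120 tells you to skip any replacement that already fixed an earlier occurrence of the problem. The minimal Python sketch below restates that rule. It assumes each step is identified by a simple FRU label (`MAP_3120_ORDER` is hypothetical). In practice the comparison would be against the specific FRU and location reported by the service terminal, which this sketch does not model.

```python
from typing import Iterable, List, Sequence

# Hypothetical labels for the FRUs replaced in MAP 3120 steps 9 through 13, in order.
MAP_3120_ORDER = (
    "DDM",
    "SSA device card",
    "passthrough or bypass card",
    "SSA device cable",
    "DDM bay frame assembly",
)


def steps_to_run(order: Sequence[str], earlier_successes: Iterable[str]) -> List[str]:
    """Drop any replacement that already produced a successful repair earlier,
    keeping the remaining steps in MAP order."""
    done = set(earlier_successes)
    return [fru for fru in order if fru not in done]


# Example: an earlier DDM replacement succeeded, so step 9 is skipped.
print(steps_to_run(MAP_3120_ORDER, ["DDM"]))
```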

MAP 3121: Isolating a Degraded SSA Link


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A 40 MB/s SSA link between a DDM and the SSA device card is degraded and is running at 20 MB/s. See Figure 100 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved in this degraded link.


Figure 100. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438). Labeled parts: SSA device card, SSA device cable, bypass or passthrough card, DDM bay backplane, and DDM.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. You must use the Repair Menu options to perform the replacement and verification.
• If the FRU is listed and selectable in the problem, use that method first.
• If the FRU is not listed or not selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable connects a DDM bay and an SSA device card. The FRU Location on the service terminal identifies the DDM bay and its SSA connector, and the SSA device card and its SSA connector.
Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. To locate the SSA cable connectors on a DDM bay, see Figure 101. To locate an SSA device card cable connector, see Figure 102 on page 225.

Figure 101. DDM bay SSA Connector Locations (S007693l)

Note: The SSA device card cable connector location is in the format R1-Tx-P2-Ix/yy, where:
• Tx is the cluster, 1 or 2
• P2 is the cluster planar
• Ix is the SSA device card location (slot)
• yy is the cable connector: A1, A2, B1, or B2

Figure 102. Cluster SSA Device Card SSA Connector Locations (s009166). Front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2), showing SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12. Each card has SSA connectors B2, B1, A2, and A1.

a. Disconnect the SSA device cable from the SSA device card and the DDM bay Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group. b. Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins. If there are less than three pins, replace the cable. Reconnect both ends of the SSA device cable, ensure good connection. c. Run the repair verification, go to the Problem Detail screen on the service terminal. Select any FRU for replacement, go through the repair and verification procedure but do not remove or replace any FRU. This will verify if the problem is resolved. Did repair verification run without error? v Yes, the problem is resolved. Go to step 8 on page 226. v No, continue with the next step. 3. Replace the passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2. Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.

Problem Isolation Procedures, CHAPTER 3

225


Figure 103. DDM Bay Bypass Card Jumper Settings (s009436). [DDM bay bypass card with pins 2 to 3 jumpered at both the top and bottom jumper positions.]

   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded. Continue with the next step.
4. Replace the DDM displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded. Continue with the next step.
5. Replace the SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded. Continue with the next step.
6. Replace the SSA device cable displayed in the service terminal probable FRU list, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded. Continue with the next step.
7. Replace the DDM bay frame assembly displayed on the service terminal. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 14 on page 223.
   • No, the SSA link is still failing. Call your next level of support.
8. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.

MAP 3123: Array Repair Required


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.


Description
This failure indicates that a DDM failure occurred during an array build. The array needs to be rebuilt.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Repair any other problems before continuing with this MAP.
3. Display the problem and record the information with the FRU Engineering Name. This information should be rank## or ssa##, where ## is a one- or two-digit number.
4. Record the SRN and the rank or SSA number, then call your next level of support. They will help you and the system operator through the array disband and rebuild.
5. This problem must be closed manually after the rebuild is started.

MAP 3124: Isolating Between DDM Hardware and Microcode Failures


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This failure indicates that either the hardware or the microcode of a DDM has failed. This MAP determines which has failed.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Display the problems. From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
3. Review the SRN portion of each one-line problem description. Does this same SRN appear in more than one problem?
   • Yes, this is a complex problem that the maintenance procedures are unable to resolve. Call your next level of support.
   • No, select the DDM in this problem for replacement. Follow the service terminal instructions for the replacement of the DDM.


MAP 3125: Isolating an Unexpected SSA SRN


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The cluster received an unexpected service request number (SRN) from the SSA.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether there are any other open DDM or SSA problems associated with the failing resource:
   • If there are no other problems to repair, go to step 4.
   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
3. After the problems are repaired:
   • If the unexpected SSA SRN problem is closed, the repair is complete.
   • If the unexpected SSA SRN problem is still open, continue with the next step.
4. The problem cannot be corrected with a service procedure.
5. Call your next level of support.

MAP 3126: Isolating an Unexpected SSA Test Result


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The cluster received unexpected results from the SSA.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.



2. Determine whether the SSA jumpers or SSA cables to the failing DDM bay have just been changed or installed. Have the SSA jumpers or cables just been changed or installed?
   • Yes, continue with step 3.
   • No, continue with step 4.
3. Look at the SSA cables displayed on the Detail Problem screen. Compare the SSA cables displayed with the cabling of the DDM bay. Are any of the SSA cables connected incorrectly?
   • Yes, connect the jumper cables to the correct connectors. To verify the repair, go to MAP 3500: Verifying a DDM Bay Repair on page 283.
   • No, continue with the next step.
4. Check whether there are any other open DDM or SSA problems associated with the failing resource:
   • If there are no other problems to repair, go to step 6.
   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
5. After the problems are repaired:
   • If the unexpected SSA SRN problem is closed, the repair is complete.
   • If the unexpected SSA SRN problem is still open, continue with the next step.
6. The problem cannot be corrected with a service procedure.
7. Call your next level of support.
   Note: An unassisted repair can disrupt customer operation and may result in loss of customer data.

MAP 3127: Formatting of a DDM Has Not Completed


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The disk drive module (DDM) is still formatting from a previous installation or repair.

Isolation
1. Wait for the formatting of the DDM to complete. Formatting is complete when the indicators on the DDM stop flickering.
2. Retry the verification test.

MAP 3128: Isolating an Unknown DDM Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.


Description
DDM failures have left one or more arrays with no spares.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for any other DDM or SSA problems. To display problems needing repair, press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
   • If there are other DDM or SSA problems, repair and test them.
   • If there are no other DDM or SSA problems, continue with the next step.
3. Call your next level of support.

MAP 3129: Isolating an Array Repair Required Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The array is not available for customer use. There may be multiple problems that can be repaired to restore access. If no problems are found, call your next level of support.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for any other DDM or SSA problems. To display problems needing repair, press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
   • If there are other DDM or SSA problems, repair and test them.
   • If there are no other DDM or SSA problems, continue with the next step.
3. Call your next level of support.


MAP 3131: Attempt to Format Array Member


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The DDMs you are attempting to format are members of an existing array that may contain customer data. Formatting these DDMs would destroy customer data. A possible cause of this condition is that previously configured DDMs or DDM bays were installed without having been properly discontinued when they were removed from their original rack.

Isolation
Call technical support. Do not attempt to resolve this problem without assistance from technical support; customer data may be lost.

MAP 3142: Isolating Multiple DDMs on an SSA Loop Cannot be Accessed


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Multiple DDMs on an SSA loop cannot be accessed.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether there are any other open problems:
   Note: Give priority to problems with the same ssaxx (SSA device card) or rsDDMxxxx as the Failing Resource.
   Note the problem ID of the problem you are working on. To find other problems, press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems
   • If there are no other problems that can be repaired, go to step 4 on page 232.



   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
3. If this problem is still open after repairing the other problems, continue with the next step.
4. The problem cannot be corrected with a service procedure.
5. Call your next level of support.
   Note: An unassisted repair can disrupt customer operation and may result in loss of customer data.

MAP 3149: Repairing Single or Multiple DDM Failures


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This procedure:
• Supports the repair and replacement of one or more DDMs at the same time
• Allows only one DDM per loop to be selected at one time, so the loop does not have two breaks simultaneously
• Automatically closes the original problem for each DDM that was replaced
• Formats and resumes all the replaced DDMs at the same time
• Automatically disconnects the service login once the format/resume is started
• Opens a new problem if the format and resume fails for a DDM
• Allows you to log back in to monitor the progress of the format and resume
• Prevents further service actions until the format and resume is complete

Repair
1. From the service terminal Main Service Menu, select:
      Repair Menu
      Multiple DDM Repair
2. Review all problems needing repair and note the location of DDMs that require replacement.
3. Exit to the Repair Menu and select Repair / Verify DDM(s) (Multiple DDM Repair on older LIC code levels).
4. Do all of the DDMs for repair appear on the list?
   • Yes, continue with the next step.
   • No, go to step 9 on page 233.
5. Are all of the DDMs for repair available for selection?
   Note: DDMs that have a pound sign (#) in front of the name cannot be selected at this time.
   • Yes, continue with the next step.
   • No, go to step 10 on page 233.
6. Select the DDMs for repair and follow the instructions on the terminal. Continue with the next step.



   Notes:
   a. You can select only one DDM per loop.
   b. A successful DDM repair automatically closes any open problems for the DDM.
7. Are there any SSA loops that had more than one DDM to be repaired?
   • Yes, repeat step 6 on page 232 for those DDMs.
   • No, all the DDMs have been repaired. Continue with the next step.
8. From the service terminal Main Service Menu, select:
      Repair Menu
      Format/Resume Previously Repaired DDM(s)
   Follow the instructions provided on the terminal. After the format and resume is started, you will be automatically logged off. This repair is now complete.
9. Some DDMs for repair are not listed. You must exit and use the Show / Repair Problems Needing Repair option to repair those DDMs. If there are additional DDMs to be repaired, return to this MAP and continue with step 5 on page 232.
10. Some DDMs for repair are not selectable. These DDMs are located in a part of the SSA loop that is not serviceable at this time. It is likely that another problem exists elsewhere on the loop. Removing these DDMs could result in loss of customer access to data. Has the customer reported loss of access to data?
   • Yes, contact your next level of support. Do not proceed without guidance.
   • No, continue with the next step.
11. Are there any loops on which none of the DDMs to be repaired is selectable?
   • Yes, you must exit and use the Show / Repair Problems Needing Repair option to repair any other problems on the loop. If there are additional DDMs to be repaired, return to this MAP and continue at step 5 on page 232.
   • No, select one DDM that is currently selectable and follow the instructions on the terminal. The other DDMs listed on the loop may become selectable after you repair the currently selectable DDM. When complete, return to this MAP and continue at step 5 on page 232.
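The loop rule in this MAP (select at most one DDM per SSA loop at a time, then repeat step 6 for any loop that still has DDMs waiting) amounts to splitting the pending DDM repairs into passes. The following Python sketch illustrates that grouping only. It is not part of the 2105 service code or the service terminal; the function name plan_repair_passes is hypothetical, and the loop and DDM names are placeholders. It does not model DDMs that are not selectable (marked with #) or the final Format/Resume step.

```python
from collections import defaultdict


def plan_repair_passes(pending):
    """Group (loop_id, ddm_name) pairs into passes with at most one DDM per loop.

    Hypothetical helper for illustration; the names are placeholders, not
    values reported by the service terminal.
    """
    # Collect the DDMs waiting for repair on each SSA loop, in the order given.
    by_loop = defaultdict(list)
    for loop_id, ddm_name in pending:
        by_loop[loop_id].append(ddm_name)

    passes = []
    pass_index = 0
    while True:
        # Take the next waiting DDM from every loop that still has one,
        # so no loop is broken in two places during the same pass.
        batch = [
            (loop_id, ddms[pass_index])
            for loop_id, ddms in by_loop.items()
            if pass_index < len(ddms)
        ]
        if not batch:
            break
        passes.append(batch)
        pass_index += 1
    return passes


if __name__ == "__main__":
    pending = [("loop-A", "ddm-1"), ("loop-A", "ddm-5"), ("loop-B", "ddm-9")]
    for number, batch in enumerate(plan_repair_passes(pending), start=1):
        print(f"Pass {number}: {batch}")
```

With two DDMs pending on loop-A and one on loop-B, the sketch produces two passes: the first repairs one DDM on each loop, and the second repairs the remaining loop-A DDM, which corresponds to repeating step 6 for that loop.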

MAP 3152: Replacing DDMs Called Out by Enhanced PFA


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A firmware update has used the enhanced PFA (Predictive Failure Analysis) test to identify one or more DDMs for replacement. These DDMs have not failed but should be replaced to prevent possible future failures. PFA has been enhanced in this level of code to be more sensitive in detecting conditions that could lead to future drive failures. For this reason, DDMs with no current functional problems may be called out for replacement, and the number of DDMs called out at this code level may be higher than in previous levels.


Isolation
1. Note the DDMs in the FRU list.
2. If there are other ESC 1216 problems, note the DDMs on those FRU lists.
3. Inform the customer that you would like to replace these DDMs, and explain enhanced PFA to them.
4. Replace the DDMs using the Replace FRU menu.
5. After all DDMs have been replaced, cancel all ESC 1216 problems.

MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A single DDM is reporting a different redundant power or cooling status than the other DDMs in the same DDM bay.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Display the problem that sent you here. Is there a controller card displayed in the FRU list?
   • Yes, select the controller card to replace. Follow the on-screen instructions, but do not replace the controller card; just reseat it. Continue with the next step.
   • No, use the Replace a FRU menu item and select the controller card (rs8pkctlrxx) in the same drawer as the DDM displayed in the problem. Follow the on-screen instructions, but do not replace the controller card; just reseat it. Continue with the next step.
3. Verify that reseating the controller card resolved the problem. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loop by DDM Bay (Drawer)
   Select the line that has the DDM bay containing the controller card that was just reseated. Press Enter on the next screen to run the loop test. Did the loop test run without error?
   • Yes, the problem is resolved. Cancel the problem now.
   • No, the failure is still present. Continue with the next step.

4. Display the problem that sent you here. Is there a controller card displayed in the list of FRUs?
   • Yes, use the problem to replace the controller card. See Controller Card, DDM Bay in Chapter 4 of Volume 2. Continue with the next step.
   • No, use the Replace a FRU menu item to replace the controller card (rs8pkctlrxx) in the same drawer as the DDM displayed in the problem. See Controller Card, DDM Bay in Chapter 4 of Volume 2. Continue with the next step.
5. Verify that replacing the controller card resolved the problem. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loop by DDM Bay (Drawer)
   Select the line that has the DDM bay containing the DDM displayed on the service terminal. Press Enter on the next screen to run the loop test. Did the loop test run without error?
   • Yes, the problem is resolved. Cancel the problem now.
   • No, the failure is still present. Continue with the next step.
6. Use the problem to replace the listed DDM. See SSA Disk Drive Model, 7133 Model 020/040 in Chapter 4 of Volume 2. Did repair verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, the failure is still present. Continue with the next step.
7. Use the problem to replace the DDM bay frame assembly displayed on the service terminal. See Frame Assembly, DDM Bay in Chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, the failure is still present. Call your next level of support.

MAP 3180: Controller Card Failed


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A controller card has failed in a DDM bay.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:



   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Using the service terminal, select the controller card listed under Possible FRUs to replace. See Controller Card, DDM Bay in Chapter 4 of Volume 2. Did repair verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. Do not perform any more steps in this MAP; follow the instructions on the service terminal to end the call.
   • No, the problem is not resolved. Call your next level of support.

MAP 3190: Wrong Drawer Type Installed


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A different drawer has been installed where a DDM bay was expected. All of the drawers on the SSA loop must be uninstalled and then reinstalled. If the customer has any data on the SSA loop, the customer must offload the data and reload it after the reinstallation.

Isolation
1. Use the service terminal to locate the drawer displayed as a Possible FRU to Replace. Copy down the FRU Location Description (Rr-Yxx or Rr-Ux-Wx).
2. Locate the improperly installed drawer, using the location code copied down in step 1. Use Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3 to locate the SSA DASD DDM bay and to determine which type of drawer is installed at that location. This drawer must be removed from the loop and then reinstalled using a DDM bay.
3. Use the service terminal to remove the drawer. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Remove Device Drawers
   Select the drawer line with the Resource Location that matches the location copied down in step 1. Continue through the instructions to remove the drawer.
4. Use the service terminal to install the DDM bay. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Install a Device Drawer
   Follow the install process, and be sure to enter the correct DDM bay information this time.


MAP 3200: Uninstalled SSA DDMs Connected to Loop A


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Installation of DDM bays on loop B failed because loop A on the same SSA device card has uninstalled DDMs. The SSA cables attached to loop A must be disconnected.

Isolation
1. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy down the FRU location.
2. Locate the cluster and the SSA device card using the information below and Figure 104.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
   • Ix is the SSA device card location (slot)
   • yy is the cable connector: A1, A2, B1, or B2

Figure 104. Cluster SSA Device Card Locations (s009166). [Front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12, each with connectors B2, B1, A2, and A1.]

3. Disconnect the SSA device cables from SSA device card connectors A1 and A2 on the indicated card. Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.


4. Locate the same SSA device card position in the other cluster. Disconnect the SSA device cables from connectors A1 and A2 on this card also. 5. Go to the service terminal and press F3 until the Main Service Menu is displayed. Restart the installation process.

MAP 3210: Uninstalled SSA DDMs Connected to Loop B


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Installation of DDM bays on loop A failed because loop B on the same SSA device card has uninstalled DDMs. The SSA cables attached to loop B must be disconnected.

Isolation
1. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy down the FRU location.
2. Locate the cluster and the SSA device card using the information below and Figure 105.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
   • Ix is the SSA device card location (slot)
   • yy is the cable connector: A1, A2, B1, or B2

Figure 105. Cluster SSA Device Card Locations (s009166). [Front view of I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2), showing the SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12, each with connectors B2, B1, A2, and A1.]



3. Disconnect the SSA device cables from SSA device card connectors B1 and B2 on the indicated card.
   Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
4. Locate the same SSA device card position in the other cluster. Disconnect the SSA device cables from connectors B1 and B2 on this card also.
5. Go to the service terminal and press F3 until the Main Service Menu is displayed. Restart the installation process.
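Steps 3 and 4 of MAP 3200 and MAP 3210 choose the connectors by loop (A1 and A2 for loop A, B1 and B2 for loop B) on the indicated card and on the same card position in the other cluster. The following Python sketch shows that mapping for location codes written in the R1-Tx-P2-Ix/yy format given in the Note in step 2. It is an illustration only: the function names are hypothetical, it assumes that "the same SSA device card position in the other cluster" means the same slot number Ix with the other Tx value, and it does not accept the Tx-u0.1-P1-Ix labels shown in the figure.

```python
import re

# R1-Tx-P2-Ix with an optional /yy connector suffix (format from the Note in step 2).
LOCATION_PATTERN = re.compile(r"^R1-T([12])-P2-I(\d+)(?:/([AB][12]))?$")

# Loop A is cabled to connectors A1 and A2; loop B to B1 and B2.
LOOP_CONNECTORS = {"A": ("A1", "A2"), "B": ("B1", "B2")}


def parse_ssa_location(code):
    """Split an R1-Tx-P2-Ix[/yy] location code into its fields (hypothetical helper)."""
    match = LOCATION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not in R1-Tx-P2-Ix/yy format: {code!r}")
    cluster, slot, connector = match.groups()
    return {
        "cluster": int(cluster),
        "slot": int(slot),
        "connector": connector,  # None when the code names only the card
        "loop": connector[0] if connector else None,
    }


def connectors_to_disconnect(card_location, loop):
    """List one loop's connectors on the indicated card slot in both clusters.

    Assumes the matching card in the other cluster uses the same slot number.
    """
    slot = parse_ssa_location(card_location)["slot"]
    return [
        f"R1-T{cluster}-P2-I{slot}/{connector}"
        for cluster in (1, 2)
        for connector in LOOP_CONNECTORS[loop]
    ]


if __name__ == "__main__":
    # MAP 3210 case: loop B has uninstalled DDMs, so its cables are disconnected.
    for location in connectors_to_disconnect("R1-T1-P2-I11", "B"):
        print(location)
```

For the MAP 3210 case above, the sketch lists B1 and B2 on slot I11 in cluster 1 and then in cluster 2; for MAP 3200, pass "A" as the loop instead.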

MAP 3220: Isolating too Few DDMs in a DDM Bay


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The wrong number of DDMs was found where eight were expected.
• Disk drive module (DDM) locations in the DDM bay: new DDM locations are 1, 2, 3, 4, 5, 6, 7, and 8.

Figure 106. Expected DDM Bay DDM Locations (S007657l). [Diagram of DDM positions 1 through 16 in the DDM bay, divided into rear DDMs and front DDMs; N = newly installed DDM.]

Isolation
1. Determine if the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
   - Yes, continue with step 2.
   - No, continue with step 3 on page 240.
2. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen and compare them with the cabling of the DDM bay. Are any of the cables connected incorrectly?
   - Yes, connect the cables to the correct connectors. Use the service terminal to verify that the problem is resolved: select the incorrectly connected cable from the cable list and continue through verification without replacing the cable.
   - No, go to step 3 on page 240.

Problem Isolation Procedures, CHAPTER 3



3. Check the DDM bay shown in the Additional Message area to see if the correct number of DDMs is installed. See Figure 106 on page 239.
   - All eight slots should contain DDMs. If too few new DDMs are installed, remove any dummy DDMs and replace them with new DDMs.
   Were any additional DDMs installed in the DDM bay?
   - Yes, to verify that the problem has been corrected, select any cable from the service terminal. Continue through verification without replacing the cable.
   - No, go to step 4.
4. Observe the indicators on the following FRUs at the front of the DDM bay:
   - DDMs (eight)
   - Bypass card
   - Controller card
   Are any of the indicators on?
   - Yes, call your next level of support.
   - No, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.

Figure 107. DDM bay Indicator Locations (S008018l)

MAP 3300: Repair Alternate Cluster to Run SSA Loop Test


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
During a repair or installation, the SSA Loop Verify test could not run from both clusters because one of the clusters is failing. To verify SSA loop operation, the SSA loop test must be run from both clusters. The failing cluster, or the cluster communications, must be repaired before the SSA loop repair or installation can be completed.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for open cluster or cluster communications problems.



   From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems Needing Repair
   Look for all cluster and cluster communications problems. Were any cluster or cluster communications problems found?
   - Yes, go to step 3.
   - No, unexpected results; call your next level of support.
3. Repair all cluster and cluster communications problems in the following order:
   a. Cluster (local) problems
   b. Cluster-to-cluster communications problems
   c. Cluster (alternate) problems
   When all cluster and cluster communications problems are resolved, check to see if the original problem is resolved. Go to MAP 3500: Verifying a DDM Bay Repair on page 283.

MAP 3360: Ending a DASD Service Action


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Before some DASD visual symptom service actions can be completed, this procedure must be done to ensure the status of the 2105 subsystem: Display any related problems shown as needing repair and change their status to closed.

Procedure
Use the description above and these procedures to complete the service action.
1. Display problems needing repair. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
      Select a Problem to View or Repair
   - Record the Problem ID of all problems with a Failing Resource of rsrpc.....
     Note: To find the Failing Resource, select the problem and display the Detail Problem Record. Scroll down the screen until Failing Resource... is displayed.
   - Press F3 on the service terminal to display the next problem. Record its Problem ID if its Failing Resource is rsrpc.....
   Repeat this step until the IDs of all related problems have been recorded.
2. Change the state of an open problem with a Failing Resource of rsrpc.... to Closed. Press F3 on the service terminal until the Main Service Menu is displayed, then select:


      Utility Menu
      Problem Log Menu
      Change A Problem State
   Select a problem whose ID was recorded in the last step. Press F4, select Closed, then press Enter.
   - If this was the only problem with a Failing Resource of rsrpc...., the repair is complete.
   - If you recorded other problems with a Failing Resource of rsrpc...., continue with the next step.
3. Close any other open problems recorded earlier. Press F3 on the service terminal twice to display the Problem Log Menu, then select:
      Change A Problem State
   Select a problem whose ID was recorded in step 1 on page 241. Press F4, select Closed, then press Enter. Repeat this step until all open problems recorded earlier are closed. When these problems are all closed, the repair is complete.

MAP 3375: Isolating a Storage Cage Fan/Power Sense Card Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Only one DDM bay has sensed a storage cage fan/power sense card failure. The other installed DDM bays that monitor the same card did not sense the failure. If the storage cage fan/power sense card were failing, all of the DDM bays should have reported the failure. This indicates that the storage cage fan/power sense card is OK and that the fault reporting path through the DDM bay that reported the failure is not working correctly.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which DDM bay reported the storage cage fan/power sense card failure and replace its DDM bay controller card. See Controller Card, DDM Bay in chapter 4 of Volume 2. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
3. Refer to the following figures to determine the physical location of the DDM bay in which you just replaced the controller card:
   - 2105 Model 800, see Figure 108 on page 244.
   - 2105 Expansion Enclosure, see Figure 109 on page 245.



   The DDM bay controller cards are fed from the storage cage fan/power sense card by several separate buses. Each bus goes to four DDM bays. The following lists the DDM bays that share a bus from the sense card:
   - U1-W1, W2, W5, W6
   - U1-W3, W4, W7, W8
   - U2-W1, W2, W5, W6
   - U2-W3, W4, W7, W8
   - U3-W1, W2, W5, W6
   - U3-W3, W4, W7, W8
   - U4-W1, W2, W5, W6
   - U4-W3, W4, W7, W8
   One of the controller cards sharing the bus may be blocking communications on the bus from the storage cage fan/power sense card, so the controller card reported as bad may not be the card causing the problem. Move the new DDM bay controller card through each of the other three DDM bays that share the same bus with the bay you have been working on. After each controller card move (replacement), verify the repair to see if the controller card you just replaced is the failing one. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
4. Replace the power planar to DDM bay planar cable to the DDM bay that reported the failure. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Verify the repair: return to the service terminal and select the DDM bay controller card for replacement. Proceed through the repair but do not replace the DDM bay controller card; this simulates a repair and runs verification. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
5. Replace the 8-pack frame assembly (backplane) in the DDM bay that reported the failure. See Frame Assembly, DDM Bay in chapter 4 of Volume 2. Verify the repair: return to the service terminal and select the DDM bay controller card for replacement. Proceed through the repair but do not replace the DDM bay controller card; this simulates a repair and runs verification.
   Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
6. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, call your next level of support.
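The bus-sharing list in step 3 follows a regular pattern within each storage cage: bays W1, W2, W5, and W6 share one bus, and bays W3, W4, W7, and W8 share the other. As an unofficial aid only, the sketch below encodes that list so the three bays to move the new controller card through can be derived from the reporting bay. It assumes location codes written as Rx-Uy-Wz (for example, R1-U2-W6); the function name bays_sharing_bus is hypothetical and is not part of any service terminal function.

```python
import re

# Bus groups from the list in MAP 3375, step 3: within each storage cage
# (U1 to U4), bays W1, W2, W5, W6 share one sense-card bus and
# bays W3, W4, W7, W8 share another.
_BUS_GROUPS = ((1, 2, 5, 6), (3, 4, 7, 8))

# Assumed location-code format for this sketch: Rx-Uy-Wz, e.g. "R1-U2-W6".
_BAY_CODE = re.compile(r"(R\d+)-(U[1-4])-W([1-8])")


def bays_sharing_bus(bay: str) -> list[str]:
    """Return the other three DDM bays on the same sense-card bus as `bay`."""
    match = _BAY_CODE.fullmatch(bay.strip())
    if match is None:
        raise ValueError(f"unrecognized DDM bay location code: {bay!r}")
    rack, cage, slot = match.group(1), match.group(2), int(match.group(3))
    group = next(g for g in _BUS_GROUPS if slot in g)
    return [f"{rack}-{cage}-W{w}" for w in group if w != slot]


if __name__ == "__main__":
    # Example reporting bay; prints ['R1-U2-W1', 'R1-U2-W2', 'R1-U2-W5']
    print(bays_sharing_bus("R1-U2-W6"))
```

The function returns the bays in W-number order; the MAP does not specify an order in which to try them.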



Figure 108. 2105 Model 800 DDM Bay Locations (s009136)
   Front view: Storage Cage 1 (R1-U1), DDM bays R1-U1-W1 through R1-U1-W4; Storage Cage 2 (R1-U2), DDM bays R1-U2-W1 through R1-U2-W4.
   Rear view: Storage Cage 2 (R1-U2), DDM bays R1-U2-W5 through R1-U2-W8; Storage Cage 1 (R1-U1), DDM bays R1-U1-W5 through R1-U1-W8.
   DDM positions 1 through 8 are shown within a DDM bay.



Figure 109. 2105 Expansion Enclosure DDM Bay Locations (S007741s)
   Front view: Storage Cage 1 (R2-U1), DDM bays R2-U1-W1 through R2-U1-W4; Storage Cage 3 (R2-U3), DDM bays R2-U3-W1 through R2-U3-W4; Storage Cage 2 (R2-U2), DDM bays R2-U2-W1 through R2-U2-W4; Storage Cage 4 (R2-U4), DDM bays R2-U4-W1 through R2-U4-W4.
   Rear view: Storage Cage 2 (R2-U2), DDM bays R2-U2-W5 through R2-U2-W8; Storage Cage 4 (R2-U4), DDM bays R2-U4-W5 through R2-U4-W8; Storage Cage 1 (R2-U1), DDM bays R2-U1-W5 through R2-U1-W8; Storage Cage 3 (R2-U3), DDM bays R2-U3-W5 through R2-U3-W8.
   DDM positions 1 through 8 are shown within a DDM bay.

MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
Multiple DDM bays have sensed a storage cage fan/power sense card failure. The storage cage fan/power sense card is the most likely FRU. There is a small chance that the storage cage power planar is failing.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 3.
3. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, call your next level of support.

MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A storage cage fan/power sense card Check Summary indicator is on. This indicator is on when the fan/power sense card detects a problem with one of the storage cage fans or power supplies that it monitors.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to check for open problems. From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems Needing Repair Menu

   - If there are any open storage cage fan or power supply faults, select and repair them.
   - If there are no open storage cage fan or power supply faults, go to the next step.
3. Run the machine test on all SSA loops. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loops by SSA Device Card
      All Loops
   Run the SSA loop test on all SSA loops attached to an SSA device card.
   - If Machine Test found any problems, repair them.
   - If Machine Test did not find any problems, replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2. Is the problem resolved?
     - Yes, end the call.
     - No, call your next level of support.

MAP 3381: Isolating a Storage Cage Fan/Power Sense Card Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Only one DDM bay sensed a storage cage fan/power sense card failure. No other DDM bays are installed in the half-rack being sensed by the storage cage fan/power sense card. The most likely FRUs are the storage cage fan/power sense card or the DDM bay controller card in the reporting DDM bay. The problem could also be a failure in the error reporting path.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 3.
3. Determine which DDM bay reported the storage cage fan/power sense card failure and replace its DDM bay controller card. See Controller Card, DDM Bay in chapter 4 of Volume 2. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 4 on page 248.


4. Replace the power planar to 8-pack planar cable to the DDM bay that reported the failure. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Verify the repair: return to the service terminal and select the DDM bay controller card for replacement. Proceed through the repair but do not replace the DDM bay controller card; this simulates a repair and runs verification. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 5.
5. Replace the 8-pack frame assembly (backplane) in the DDM bay that reported the failure. See Frame Assembly, DDM Bay in chapter 4 of Volume 2. Verify the repair: return to the service terminal and select the DDM bay controller card for replacement. Proceed through the repair but do not replace the DDM bay controller card; this simulates a repair and runs verification. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 6.
6. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage fan/power sense card problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, call your next level of support.

MAP 3384: Isolating a Storage Cage Fan Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A storage cage cooling fan failure has been reported. It could be one of the storage cage fans in the top of the 2105, or one of the two fans in the front of the 2105 Model 800 between the DDM bays. The most likely FRU is the failing fan. The fan fault reporting circuits could also be reporting a false fan error. Note: Every fan connector on the storage cage power planar must be plugged with a fan cable or a fan jumper. If any connector is empty, a false fan error will be created.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which storage cage fan reported the storage cage fan failure. To locate the failing fan in the 2105, see the following in chapter 7 of Volume 3:
   - 2105 Model 800 and Expansion Enclosure Storage Cage Fan (Top) Location Codes



   - 2105 Model 800 and Expansion Enclosure Storage Cage Fan (Center) Location Codes
   Is there a real fan, not a dummy fan, installed in the failing fan's location?
   - Yes, go to step 5 on page 250.
   - No, go to step 3.
3. You have verified that the location of the failing fan reported in the problem log contains a dummy fan. The failing FRU is either the storage cage power planar or a missing dummy fan jumper that should be plugged into the storage cage power planar connector. Use Figure 110 and the table below it to translate the failing fan location code to the storage cage power planar connector.
   Note: Every fan connector on the storage cage power planar must be plugged with a fan cable or a fan jumper. If a connector is empty, a false fan error will be created.
   Is the storage cage power planar fan jumper installed correctly for the failing fan?
   - Yes, go to step 6 on page 250.
   - No, install the storage cage power planar fan jumper. Continue with the next step.

Figure 110. Storage Cage Power Planar Fan Jumper Locations (s008352p). Front view of the storage bay power planar, showing connectors J11 through J18, J21 through J28, and fan connectors J31 through J44.

   Connector   Storage cage fan location code
   J31         Rx-U1 or U3-F1
   J32         Rx-U1 or U3-F2
   J33         Rx-U1 or U3-F3
   J34         Rx-U1 or U3-F4
   J35         Rx-U1 or U3-F5
   J36         Rx-U1 or U3-F6
   J37         Rx-U2 or U4-F4
   J38         Rx-U2 or U4-F5
   J39         Rx-U2 or U4-F6
   J40         Rx-U2 or U4-F1
   J41         Rx-U2 or U4-F2
   J42         Rx-U2 or U4-F3
   J43         Rx-Q1 or Q2-F1
   J44         Rx-Q1 or Q2-F2

4. Verify the repair. Return to the service terminal and select the storage cage fan for replacement. Proceed through the repair but do not replace the storage cage fan; this simulates a repair and runs verification. Is the storage cage fan problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 6.
5. Replace the failing storage cage fan. See Storage Cage Fan (Top) Removal and Replacement, 2105 Model 800 and Expansion Enclosure or Storage Cage Fan (Center), 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Is the storage cage fan problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 6.
6. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2. Is the storage cage fan problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
7. Replace the DDM bay controller card. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2. Is the storage cage fan problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
8. Disconnect the cable to the failing fan at the fan and at the storage cage power planar. Connect a storage cage fan FRU cable to the fan and the storage cage power planar. Is the storage cage fan problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, go to step 9.
9. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.



After the replacement verify the repair. Is the storage cage fan problem resolved? v Yes, use the service terminal to close the problem and end the call. v No, call your next level of support.
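Step 3 translates a failing fan location code to a storage cage power planar connector using the table that follows Figure 110. As an unofficial sketch, that table can be expressed as a lookup. It assumes fan location codes written as Rx-Uy-Fz or Rx-Qy-Fz (for example, R2-U3-F1); the function name planar_fan_connector is hypothetical. Each planar connector serves the same fan position in a pair of storage cages, which is why U1 and U3 (and U2 and U4, Q1 and Q2) map to the same key.

```python
import re

# Storage cage pairs that share a row in the table following Figure 110.
_CAGE_PAIR = {"U1": "U1/U3", "U3": "U1/U3", "U2": "U2/U4", "U4": "U2/U4",
              "Q1": "Q1/Q2", "Q2": "Q1/Q2"}

# (cage pair, fan number) -> storage cage power planar connector.
_FAN_CONNECTOR = {
    ("U1/U3", 1): "J31", ("U1/U3", 2): "J32", ("U1/U3", 3): "J33",
    ("U1/U3", 4): "J34", ("U1/U3", 5): "J35", ("U1/U3", 6): "J36",
    ("U2/U4", 4): "J37", ("U2/U4", 5): "J38", ("U2/U4", 6): "J39",
    ("U2/U4", 1): "J40", ("U2/U4", 2): "J41", ("U2/U4", 3): "J42",
    ("Q1/Q2", 1): "J43", ("Q1/Q2", 2): "J44",
}

# Assumed location-code format for this sketch: Rx-Uy-Fz or Rx-Qy-Fz.
_FAN_CODE = re.compile(r"R\d+-(U[1-4]|Q[12])-F(\d+)")


def planar_fan_connector(location: str) -> str:
    """Return the storage cage power planar connector for a fan location code."""
    match = _FAN_CODE.fullmatch(location.strip())
    if match is None:
        raise ValueError(f"unrecognized fan location code: {location!r}")
    key = (_CAGE_PAIR[match.group(1)], int(match.group(2)))
    if key not in _FAN_CONNECTOR:
        raise ValueError(f"no planar connector listed for {location!r}")
    return _FAN_CONNECTOR[key]


if __name__ == "__main__":
    print(planar_fan_connector("R2-U3-F1"))  # J31
```

Fan positions that the table does not list (for example, Q1-F3) raise an error rather than guessing a connector.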

MAP 3387: Isolating a Storage Cage Power Supply Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A storage cage power supply failure has been reported. The failure could be the storage cage power supply, its dc input voltage, or its error reporting path.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which storage cage power supply is failing. To locate the failing power supply, see Rack, 2105 Model 800 and Expansion Enclosure Storage Cage Power Supply Location Codes in chapter 7 of Volume 3. Is there a real power supply, not a dummy power supply, installed in the failing power supply location?
   - Yes, go to step 3.
   - No, go to step 15 on page 255.
3. Observe the power switch on the failing storage cage power supply. Is the storage cage power supply's power switch set to On (up)?
   - Yes, go to step 4 on page 252.
   - No, set the switch to On (up) and verify that the problem has been corrected. If the FRU is listed on the problem details screen, select it and do a pseudo repair (do not actually replace it) of the storage cage power FRU so that the FRU verification tests run and report the results.

Figure 111. Storage Cage Power Supply Locations (s009536). Shows the storage cage power supply with its input power indicators, power switch, and CHK/PWR Good indicator.



4. Observe the two green input power indicators (PWR-1 and PWR-2) on the failing storage cage power supply:
   - PWR-1: PPS-2 power
   - PWR-2: PPS-1 power
   Are both of the indicators on?
   - Yes, go to step 13 on page 254.
   - No, do one of the following:
     - If the PWR-1 and PWR-2 indicators are both off, go to step 5.
     - If only the PWR-1 indicator is off, go to step 7.
     - If only the PWR-2 indicator is off, go to step 6.
5. Replace the failing storage cage power supply, then verify the repair. Is the storage cage power supply problem corrected?
   - Yes, use the service terminal to close the problem and end the call.
   - No, call your next level of support.
6. Do the following steps only on PPS-1 and the failing storage cage power supply. Go to step 8.
7. Do the following steps only on PPS-2 and the failing storage cage power supply. Go to step 8.
8. Locate the primary power supply (PPS) circuit breaker (CB) that supplies power to the failing storage cage power supply:

Figure 112. Primary Power Supply CB and Connector Locations (S008496l). Rear view of the PPS, showing circuit breakers CB00 and CB1 through CB5, and connectors J1 through J4, J5A, J5B, J6, and J7-1 through J7-5.

   Failing storage cage   CB to check: 2105 Model 800 and       CB to check: Expansion Enclosure
   power supply (SCPS)    Expansion Enclosure storage cages     storage cages 3 and 4 (lower)
                          1 and 2 (upper)
   SCPS-1                 CB-3                                  CB-1
   SCPS-2                 CB-4                                  CB-2
   SCPS-3                 CB-3                                  CB-1
   SCPS-4                 CB-4                                  CB-2
   SCPS-5                 CB-3                                  CB-1
   SCPS-6                 CB-4                                  CB-2

Is the input power CB for the failing storage cage power supply tripped (down)? v Yes, go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168. v No, go to step 9 on page 253.



9. Check the indicators on the front of the PPS:
   - PPS Good indicator, On
   - PPS Fault indicator, Off
   Are the PPS indicators as shown?
   - Yes, go to step 10.
   - No, go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60 in chapter 3 of Volume 1.
10. Locate the primary power supply (PPS) to storage cage power supply (SCPS) cable for the failing PWR-1 or PWR-2 power indicator. Verify that the cable is connected at the storage cage power supply and at the PPS. Use the correct table below for the failing SCPS and the storage cages it is associated with (upper or lower):
   - If the failing SCPS is in a 2105 Model 800 rack, use Table 31.
   - If the failing SCPS is in a 2105 Expansion Enclosure, storage cages 1 and 2 (upper), use Table 31.
   - If the failing SCPS is in a 2105 Expansion Enclosure, storage cages 3 and 4 (lower), use Table 32 on page 254.
Table 31. 2105 Model 800 and Expansion Enclosure, Storage Cages 1 and 2 (upper)
   Failing SCPS   Failing SCPS PWR indicator   SCPS and PPS connectors to check
   SCPS-1         PWR-2                        SCPS-1, J2 and PPS-1, J7-3
   SCPS-1         PWR-1                        SCPS-1, J1 and PPS-2, J7-3
   SCPS-2         PWR-2                        SCPS-2, J2 and PPS-1, J7-4
   SCPS-2         PWR-1                        SCPS-2, J1 and PPS-2, J7-4
   SCPS-3         PWR-2                        SCPS-3, J2 and PPS-1, J7-3
   SCPS-3         PWR-1                        SCPS-3, J1 and PPS-2, J7-3
   SCPS-4         PWR-2                        SCPS-4, J2 and PPS-1, J7-4
   SCPS-4         PWR-1                        SCPS-4, J1 and PPS-2, J7-4
   SCPS-5         PWR-2                        SCPS-5, J2 and PPS-1, J7-3
   SCPS-5         PWR-1                        SCPS-5, J1 and PPS-2, J7-3
   SCPS-6         PWR-2                        SCPS-6, J2 and PPS-1, J7-4
   SCPS-6         PWR-1                        SCPS-6, J1 and PPS-2, J7-4



Table 32. Expansion Enclosure, Storage Cages 3 and 4 (lower)
   Failing SCPS   Failing SCPS PWR indicator   SCPS and PPS connectors to check
   SCPS-1         PWR-2                        SCPS-1, J2 and PPS-1, J7-1
   SCPS-1         PWR-1                        SCPS-1, J1 and PPS-2, J7-1
   SCPS-2         PWR-2                        SCPS-2, J2 and PPS-1, J7-2
   SCPS-2         PWR-1                        SCPS-2, J1 and PPS-2, J7-2
   SCPS-3         PWR-2                        SCPS-3, J2 and PPS-1, J7-1
   SCPS-3         PWR-1                        SCPS-3, J1 and PPS-2, J7-1
   SCPS-4         PWR-2                        SCPS-4, J2 and PPS-1, J7-2
   SCPS-4         PWR-1                        SCPS-4, J1 and PPS-2, J7-2
   SCPS-5         PWR-2                        SCPS-5, J2 and PPS-1, J7-1
   SCPS-5         PWR-1                        SCPS-5, J1 and PPS-2, J7-1
   SCPS-6         PWR-2                        SCPS-6, J2 and PPS-1, J7-2
   SCPS-6         PWR-1                        SCPS-6, J1 and PPS-2, J7-2

   Is the storage cage power supply cable connected correctly?
   - Yes, go to step 11.
   - No, reseat the cable as required. If the green PWR-1 or PWR-2 power indicator is now on, the problem is resolved; use the service terminal to verify the problem and close it. If the green PWR-1 or PWR-2 power indicator is still off, go to step 11.
11. Swap the two input power cables, J1 and J2, on the rear of the failing storage cage power supply. Observe the status of the PWR-1 and PWR-2 power indicators. Did the PWR-1 and PWR-2 power indicators swap states (the indicator that was on is now off, and the one that was off is now on)?
   - Yes, go to step 12.
   - No, replace the storage cage power supply. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. If the problem is not resolved, call your next level of support.
12. Swap the two input power cables, J1 and J2, back to their original positions. Replace the primary power supply to storage cage power supply cable associated with the PWR-1 or PWR-2 power indicator that is off. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. If the problem is not resolved, call your next level of support.
13. Replace the storage cage power supply. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Observe the CHK/PWR GOOD indicator. Is it on (green), and is the storage cage power supply problem resolved?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, continue with the next step.



14. Is the CHK/PWR GOOD indicator On (amber) on all installed storage cage power supplies?
   - Yes, go to MAP 3391: Isolating a Storage Cage Power System Problem.
   - No, go to step 15.
15. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2. Is the storage cage power supply problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
16. Replace the DDM bay controller card. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2. Is the storage cage power supply problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, continue with the next step.
17. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage power supply problem resolved?
   - Yes, use the service terminal to close the problem and end the call.
   - No, call your next level of support.
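Steps 4 through 10 of MAP 3387 combine three lookups: which PPS feeds the dark PWR indicator (step 4), which PPS circuit breaker to check (the table after Figure 112), and which cable connectors to check (Tables 31 and 32). As an unofficial sketch, the pattern in those tables (odd-numbered SCPS on one breaker and J7 connector, even-numbered SCPS on the other) can be written as one function. The names scps_check_points and CheckPoints are hypothetical; the lower_cages flag is an assumption meaning the failing SCPS is associated with Expansion Enclosure storage cages 3 and 4. The printed tables remain the authority if the two ever disagree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckPoints:
    pps: str             # primary power supply feeding the dark PWR indicator
    pps_breaker: str     # CB on that PPS to check (table after Figure 112)
    scps_connector: str  # input connector on the failing SCPS
    pps_connector: str   # output connector on the PPS (Tables 31 and 32)


def scps_check_points(scps: int, dark_indicator: str, lower_cages: bool) -> CheckPoints:
    """Return the PPS, breaker, and connectors to check for one dark PWR indicator."""
    if scps not in range(1, 7):
        raise ValueError(f"SCPS number must be 1 to 6, got {scps}")
    # Step 4: PWR-1 shows PPS-2 power; PWR-2 shows PPS-1 power.
    if dark_indicator == "PWR-1":
        pps, scps_connector = "PPS-2", "J1"
    elif dark_indicator == "PWR-2":
        pps, scps_connector = "PPS-1", "J2"
    else:
        raise ValueError(f"indicator must be 'PWR-1' or 'PWR-2', got {dark_indicator!r}")
    odd = scps % 2 == 1
    if lower_cages:  # Expansion Enclosure storage cages 3 and 4 (Table 32)
        breaker, pps_connector = ("CB-1", "J7-1") if odd else ("CB-2", "J7-2")
    else:  # storage cages 1 and 2, Model 800 or Expansion Enclosure (Table 31)
        breaker, pps_connector = ("CB-3", "J7-3") if odd else ("CB-4", "J7-4")
    return CheckPoints(pps, breaker, scps_connector, pps_connector)


if __name__ == "__main__":
    print(scps_check_points(4, "PWR-1", lower_cages=True))
```

For example, a dark PWR-1 indicator on SCPS-4 in the lower cages points to PPS-2, its CB-2 breaker, and the cable between SCPS-4 J1 and PPS-2 J7-2, matching Table 32.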

MAP 3391: Isolating a Storage Cage Power System Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
SSA DASD DDM bay power problem. A group of storage cage power supplies is failing. The storage cage power supplies shut down when they cannot maintain their output voltage. This can be caused by too few storage cage power supplies or by a short circuit on their output. All of the storage cage power supplies feed a common voltage bus, so a short on the bus affects all attached storage cage power supplies. With this failure, the CHK/POWER GOOD indicators on all associated storage cage power supplies are On (amber).
Note: The CHK/POWER GOOD indicator can be on in amber or green:
• Amber indicates CHK (check).
• Green indicates POWER GOOD.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.

Problem Isolation Procedures, CHAPTER 3

• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine whether the failing storage cage power supplies are associated with storage cages 1 and 2 or with storage cages 3 and 4. To locate the failing power supply and the storage cage it is mounted in, see Rack, 2105 Model 800 and Expansion Enclosure Storage Cage Power Supply Location Codes in chapter 7 of Volume 3.
Note: A storage cage is the enclosure with four DDM bays in the front and four DDM bays in the rear.
• 2105 Model 800: storage cage 1 and 2 power supplies
• 2105 Expansion Enclosure: storage cage 1 and 2 power supplies; storage cage 3 and 4 power supplies
Verify that the switches on the rear of all affected storage cage power supplies are set to On (up). Were all of the switches set to On (up)?
• Yes, go to step 3.
• No, set all of the switches to On (up), then go to step 3.

Figure 113. Storage Cage Power Supply Locations (S008495m). Callouts: storage cage power supply, input power indicators, power switch, CHK/PWR Good indicator.

3. Determine whether the correct number of storage cage power supplies is installed. Count the DDM bays and the storage cage power supplies installed in the storage cages associated with the failing power supplies (storage cages 1 and 2, or 3 and 4).
Table 33. Storage Cage Power Supply Installation Requirements
Number of DDM Bays Installed | Minimum Number of Storage Cage Power Supplies Required
1 to 8 | 4
1 to 8 and 9 to 16 | 6

Is the correct number of storage cage power supplies installed for the number of DDM bays installed?


Note: It is OK to have more storage cage power supplies installed than required.
• Yes, go to step 4.
• No, install the missing storage cage power supplies. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.
4. Go to the operator panel on the front of the 2105 Model 800 and use the Local Power switch to power the subsystem completely off, then on. Go to the rear of the failing 2105 and observe the CHK/PWR GOOD indicators on the failing storage cage power supplies. Are all of the failing storage cage power supply CHK/POWER GOOD indicators still On (amber)?
• Yes, there is an overcurrent on the output of the failing storage cage power supplies; go to step 5.
• No, go to step 21 on page 261.
5. Determine whether the overcurrent is caused by the storage cage fans or by the storage cage fan/power sense card:
a. Power the subsystem off.
b. Disconnect all of the storage cage fans from their storage cage planar.
c. Remove the storage cage fan/power sense card from the failing 2105.
d. Power the subsystem on.
Attention: Do not leave subsystem power on for more than five minutes with the cooling fans disconnected.
e. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators still On (amber)?
• Yes, the fan FRUs are not causing the overcurrent. Go to step 8 on page 258.
• No, one of the disconnected fan FRUs is causing the overcurrent; go to step 6.
6. Inspect all of the storage cage fans, the fan/power sense card, and their cables for obvious damage. Repair any problems found. Were any problems found and repaired?
• Yes, verify the repair. If the problem was resolved, go to step 21 on page 261. If the problem was not resolved, go to step 7.
• No, go to step 7.
7. Determine which of the disconnected fan FRUs is causing the overcurrent:
a. Reconnect one of the disconnected storage cage fans.
Attention: Do not leave subsystem power on for more than five minutes with the cooling fans disconnected.
b. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators On (amber)?
• Yes, the fan FRU you just reconnected is causing the overcurrent; replace it. See Storage Cage Fan (Center), 2105 Model 800 and Expansion Enclosure, Storage Cage Fan, 2105 Model 800 and Expansion Enclosure, or Storage Cage Fan/Power Sense Card, 2105 Model 800, all in chapter 4 of Volume 2. Go to step 21 on page 261.
• No, repeat the above steps on each fan FRU until all of the storage cage fans are reconnected and the storage cage fan/power sense card is installed.
Note: After all of the fans are reconnected, reinstall the storage cage fan/power sense card.
8. Reconnect any disconnected storage cage fan cables and reinstall the storage cage fan/power sense card, as required. Continue with the next step.
9. Determine whether the overcurrent is caused by the DDM bays associated with the failing storage cage power supplies:
a. Power the subsystem off.
b. Remove the four screws that hold each DDM bay in the storage cages associated with the failing storage cage power supplies.
c. Pull each DDM bay out about 5 cm (2 inches).
d. Power the subsystem on.
e. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators still On (amber)?
• Yes, the DDM bays are not causing the overcurrent. Go to step 12.
• No, one of the disconnected DDM bays is causing the overcurrent; go to step 10.
10. Determine which of the disconnected DDM bays is causing the overcurrent:
a. Power the subsystem off.
b. Reinstall one of the disconnected DDM bays.
c. Power the subsystem on.
d. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators On (amber)?
• Yes, the DDM bay you just reinstalled is causing the overcurrent; go to step 19 on page 260.
• No, repeat the above steps on each DDM bay until all of the DDM bays are reinstalled.
11. Power the subsystem off. Reinstall all of the DDM bays. Continue with the next step.
12. Determine whether the overcurrent is caused by one of the storage cage power supplies:
a. Power the subsystem off.
b. Remove the two mounting screws from all of the failing storage cage power supplies. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.
c. Pull all of the storage cage power supplies, except one, out about 5 cm (2 inches).
d. Power the subsystem on.


e. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
f. Record which storage cage power supply is installed and the state of its CHK/POWER GOOD indicator (amber or green).
g. Pull the storage cage power supply out about 5 cm (2 inches).
h. Repeat this test until each of the storage cage power supplies has been installed by itself and the state of its CHK/POWER GOOD indicator has been recorded.
After all storage cage power supplies have been tested, continue with the next step.
13. Review the recorded results of the last step:
• If the CHK/PWR GOOD indicators were On (amber) for all storage cage power supplies, go to step 14.
• If the CHK/PWR GOOD indicator was On (amber) for only one storage cage power supply, replace that supply. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Reinstall all of the storage cage power supplies, then go to step 21 on page 261.
14. Power the subsystem off. Verify that all storage cage power supplies are reinstalled correctly. Continue with the next step.
15. Determine whether the overcurrent is caused by the storage cage planar or by the power planar to DDM bay backplane cables associated with the failing storage cage power supplies:
a. Power the subsystem off.
b. Disconnect all of the power planar to DDM bay backplane cables from the storage cage planar associated with the failing power supplies. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.
c. Power the subsystem on.
d. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators still On (amber)?
• Yes, the power planar to DDM bay backplane cables are not causing the overcurrent. Go to step 18 on page 260.
• No, one of the disconnected power planar to DDM bay backplane cables is causing the overcurrent; go to step 16.
16. Determine which of the disconnected power planar to DDM bay backplane cables is causing the overcurrent:
a. Power the subsystem off.
b. Reconnect one of the disconnected power planar to DDM bay backplane cables.
c. Power the subsystem on.
d. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators On (amber)?
• Yes, the power planar to DDM bay backplane cable you just reconnected is causing the overcurrent; replace it. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. After the repair, go to step 21 on page 261.

• No, repeat the above steps on each power planar to DDM bay backplane cable until all of the cables are reconnected.
17. Power the subsystem off. Reinstall all of the power planar to DDM bay backplane cables. Continue with the next step.
18. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Reinstall all assemblies and FRUs removed as part of this procedure; see the Chapter Table of Contents in chapter 4 of Volume 2. After the replacement, verify the repair. Is the storage cage power problem resolved?
• Yes, end the call.
• No, call your next level of support.
19. Determine which of the DDM bay FRUs is causing the overcurrent. Do the following steps on the DDM bay that is causing the overcurrent:
a. Power the subsystem off.
b. Remove all of the FRUs from the failing DDM bay:
• Disk drive modules (DDMs); see SSA Disk Drive Model, DDM Bay in chapter 4 of Volume 2. Mark the DDMs for reinstallation in the same locations.
• DDM bay controller card; see Controller Card, DDM Bay in chapter 4 of Volume 2.
• DDM bay bypass and passthrough cards; see Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
c. Power the subsystem on.
d. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators still On (amber)?
• Yes, replace the DDM bay frame assembly (backplane). See Frame Assembly, DDM Bay in chapter 4 of Volume 2. After the repair, go to step 21 on page 261.
• No, go to step 20.
20. Determine which of the removed DDM bay FRUs is causing the overcurrent:
a. Power the subsystem off.
b. Reinstall one of the removed DDM bay FRUs.
c. Power the subsystem on.
d. Observe the CHK/POWER GOOD indicators on all of the failing storage cage power supplies.
Are all of the failing storage cage power supply CHK/POWER GOOD indicators On (amber)?
• Yes, the DDM bay FRU you just reinstalled is causing the overcurrent; replace it. See the Chapter Table of Contents in chapter 4 of Volume 2. After the repair, go to step 21 on page 261.
• No, repeat the above steps on each DDM bay FRU until all of the FRUs are reinstalled. If the problem is still present after all of the DDM bay FRUs are reinstalled, call your next level of support.


21. Reconnect all cables and reinstall all assemblies and FRUs removed as part of this procedure. See the Chapter Table of Contents in chapter 4 of Volume 2.
22. Change the state of the problems related to this failure to Closed, if not already closed. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
Utility Menu
Problem Log Menu
Change A Problem State
Select problems with the following resources to cancel:
• rs SSA xxxx
• rsDDMxxxx
• rsENCLOSURE
Press F4, select Cancel, then press Enter. After all related problems are canceled, continue with the next step.
23. Run the DDM bay power test on all DDM bays related to the failing storage cage power supplies. From the service terminal Main Service Menu, select:
Machine Test Menu
SSA Loops Menu
Select SSA Loop by SSA Device Card
All SSA Loops
• If the test runs without error, the problem is resolved.
• If the test fails, repair the new problems.
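Two pieces of logic in this MAP lend themselves to a short sketch: the Table 33 check in step 3, and the one-at-a-time isolation pattern shared by steps 7, 10, 16, and 20 (with every suspect part disconnected the overcurrent clears, so parts are reconnected one by one until the indicators return to On (amber)). The Python below is a sketch under stated assumptions: Table 33 is read as 1 to 8 DDM bays needing 4 supplies and 9 to 16 needing 6, and `overcurrent_after_reconnect` is a hypothetical callback standing in for the manual power-on and indicator check.

```python
from typing import Callable, Iterable, Optional


def min_storage_cage_supplies(ddm_bays_installed: int) -> int:
    """Minimum power supplies for one pair of storage cages, per Table 33."""
    # Two storage cages hold at most 16 DDM bays (8 per cage).
    if not 1 <= ddm_bays_installed <= 16:
        raise ValueError("expected 1 to 16 DDM bays in a storage cage pair")
    return 4 if ddm_bays_installed <= 8 else 6


def supply_count_ok(ddm_bays_installed: int, supplies_installed: int) -> bool:
    # More supplies than the minimum is acceptable (see the Note in step 3).
    return supplies_installed >= min_storage_cage_supplies(ddm_bays_installed)


def find_overcurrent_part(
    parts: Iterable[str],
    overcurrent_after_reconnect: Callable[[str], bool],
) -> Optional[str]:
    """One-at-a-time isolation as in steps 7, 10, 16, and 20.

    Parts are reconnected in order and stay connected. The callback stands in
    for powering on and checking whether all CHK/POWER GOOD indicators are
    On (amber). Returns the first part that brings the overcurrent back,
    or None if every part is reconnected without it returning.
    """
    for part in parts:
        if overcurrent_after_reconnect(part):
            return part
    return None
```

For example, `supply_count_ok(12, 4)` is False, because 12 DDM bays need at least 6 supplies.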

MAP 3395: Isolating a DDM Bay Power Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
DDM bay power problem. All indicators on a DDM bay are off, which indicates that input power to the DDM bay is missing.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Did you start this service action from a problem displayed on a service terminal?
• Yes, go to step 5 on page 262.

• No, continue with the next step.
3. Use the service terminal to look for any problems. Repair these problems first, then continue with the next step.
4. Are the symptoms that originally sent you to this MAP repaired?
• Yes, the problem is resolved; end the service call.
• No, continue with the next step.
5. Determine whether the failing storage cage power supplies are associated with storage cages 1 and 2 or with storage cages 3 and 4. To locate the failing power supply and the storage cage it is mounted in, see Rack, 2105 Model 800 and Expansion Enclosure Storage Cage Power Supply Location Codes in chapter 7 of Volume 3.
Note: A storage cage is the enclosure with four DDM bays in the front and four DDM bays in the rear.
• 2105 Model 800: storage cage 1 and 2 power supplies
• 2105 Expansion Enclosure: storage cage 1 and 2 power supplies; storage cage 3 and 4 power supplies
Is the failing DDM bay in storage cage 1 or 2?
• Yes, go to step 6.
• No, the failing DDM bay is in storage cage 3 or 4. Go to step 7.
6. Go to the rear of the 2105 Model 800 or 2105 Expansion Enclosure. Locate the storage cage power supplies mounted between storage cages 1 and 2. Observe the CHK/POWER GOOD indicators on all of the storage cage 1 and 2 power supplies.
Figure 114. Storage Cage Power Supply Locations (S008495m). Callouts: storage cage power supply, input power indicators, power switch, CHK/PWR Good indicator.

Are all of the storage cage 1 and 2 power supply CHK/POWER GOOD indicators On (amber)?
• Yes, go to MAP 3391: Isolating a Storage Cage Power System Problem on page 255.
• No, go to step 8 on page 263.
7. Go to the rear of the 2105 Expansion Enclosure. Locate the storage cage power supplies mounted between storage cages 3 and 4.


Observe the CHK/POWER GOOD indicators on all of the storage cage 3 and 4 power supplies. Are all of the storage cage 3 and 4 power supply CHK/POWER GOOD indicators On (amber)?
• Yes, go to MAP 3391: Isolating a Storage Cage Power System Problem on page 255.
• No, go to step 8.
8. Replace the power planar to 8-pack planar cable for the failing DDM bay. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Verify the repair: return to the service terminal and run the SSA Loop Test on the failing resource listed for this problem. Is the problem resolved?
• Yes, end the call.
• No, call your next level of support.
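Steps 5 through 8 reduce to a small decision: the storage cage that holds the failing DDM bay selects which group of supplies to inspect, and the state of their indicators decides between MAP 3391 and a cable replacement. The Python sketch below encodes that decision for illustration only; the function name, parameters, and returned strings are hypothetical.

```python
def map3395_next_action(failing_bay_cage: int, all_supplies_amber: bool) -> str:
    """Hypothetical model of MAP 3395 steps 5 through 8.

    failing_bay_cage: storage cage (1 to 4) that holds the failing DDM bay.
    all_supplies_amber: True if every CHK/POWER GOOD indicator on the
    supplies serving that cage pair is On (amber).
    """
    if failing_bay_cage not in (1, 2, 3, 4):
        raise ValueError("storage cage must be 1, 2, 3, or 4")
    # Steps 6 and 7: cages 1 and 2 share the supplies mounted between them;
    # cages 3 and 4 (2105 Expansion Enclosure only) share theirs.
    pair = "1 and 2" if failing_bay_cage in (1, 2) else "3 and 4"
    if all_supplies_amber:
        return f"MAP 3391 (storage cage {pair} power supplies)"
    # Step 8: not all supplies show a check, so suspect the bay's power cable.
    return "step 8: replace the power planar to 8-pack planar cable"
```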

MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
DDM bay controller card problem. The controller card failure indicator is on.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Did you start this service action from a problem displayed on a service terminal?
• Yes, go to step 6.
• No, continue with the next step.
3. Use the service terminal to look for any problems. Repair these problems first, then continue with the next step.
4. Are the symptoms that originally sent you to this MAP repaired?
• Yes, the problem is resolved; end the service call.
• No, continue with the next step.
5. Replace the controller card; see Controller Card, DDM Bay in chapter 4 of Volume 2.
6. Determine the location code of the DDM bay in which you just replaced the controller card. The DDM bay location code has the format Rx-Uy-Wz. Do you know the DDM bay's location code?

• Yes, continue with the next step.
• No, determine the location code of the DDM bay. See Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3.
7. Verify that the controller card replacement resolved the problem. From the service terminal Main Service Menu, select:
Machine Test Menu
SSA Loops Menu
SSA Loop by Storage Bay Drawer...
Select the line that has the DDM bay location code from the last step (Rx-Uy-Wz). Press Enter on the next screen to run the verification test.
• If verification is successful, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
• If verification is not successful, repair the problem logged by the test.
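Step 6 gives the DDM bay location code format as Rx-Uy-Wz, and step 7 asks you to pick the menu line that carries that code. As an illustration only, the Python sketch below checks a recorded code against that pattern and splits it into its three fields. It assumes x, y, and z are decimal numbers, which the MAP does not state; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class DdmBayLocation(NamedTuple):
    r: int  # the x in Rx
    u: int  # the y in Uy
    w: int  # the z in Wz


# Assumption: x, y, and z are decimal numbers; the MAP only gives the pattern.
_LOCATION_CODE = re.compile(r"R(\d+)-U(\d+)-W(\d+)")


def parse_ddm_bay_location(code: str) -> DdmBayLocation:
    """Split an Rx-Uy-Wz code into its fields, rejecting anything else."""
    match = _LOCATION_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not an Rx-Uy-Wz location code: {code!r}")
    return DdmBayLocation(*(int(part) for part in match.groups()))
```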

MAP 3398: Isolating a DDM Bay Controller Card Communications Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The DDM bay controller card has problems communicating with the bypass card or the passthrough cards in the DDM bay. The cause of the failure may be the controller card, the bypass card, one of the passthrough cards, or the DDM bay backplane.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the controller card in the FRU list. Select the controller card and replace it. After replacement, verify the repair:
• If the problem is resolved, end the call.
• If the problem is not resolved, continue with the next step.
3. Verify that the controller card check indicator is on (amber); see DDM Bay Indicators on page 21.
• If the check indicator is on, continue with the next step.
• If the check indicator is not on, call your next level of support.
4. Select the bypass card from the FRU list for replacement.
a. Do not disconnect the SSA cables from the bypass card.


b. Follow the service terminal instructions until you are told to remove the card.
c. Pull the card out only until it is unplugged from the backplane.
d. Continue with the next step.
5. Check whether the controller card check indicator is off with the bypass card out.
• If the check indicator is off, continue with the next step.
• If the check indicator is still on, plug the bypass card back in and go to step 7.
6. Replace the bypass card and run verification. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Was the verification successful?
• Yes, the problem is resolved; end the call.
• No, continue with the next step.
7. Select the first passthrough card from the FRU list for replacement.
a. Do not disconnect the SSA cables from the passthrough card.
b. Follow the service terminal instructions until you are told to remove the card.
c. Pull the card out only until it is unplugged from the backplane.
d. Continue with the next step.
8. Check whether the controller card check indicator is off with the passthrough card out.
• If the check indicator is off, continue with the next step.
• If the check indicator is still on, plug the passthrough card back in and go to step 10.
9. Replace the passthrough card and run verification. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Was the verification successful?
• Yes, the problem is resolved; end the call.
• No, continue with the next step.
10. Select the second passthrough card from the FRU list for replacement.
a. Do not disconnect the SSA cables from the passthrough card.
b. Follow the service terminal instructions until you are told to remove the card.
c. Pull the card out only until it is unplugged from the backplane.
d. Continue with the next step.
11. Check whether the controller card check indicator is off with the passthrough card out.
• If the check indicator is off, continue with the next step.
• If the check indicator is still on, plug the passthrough card back in and go to step 13.
12. Replace the passthrough card and run verification. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Was the verification successful?
• Yes, the problem is resolved; end the call.
• No, continue with the next step.
13. Select the DDM bay frame from the FRU list for replacement. See Frame Assembly, DDM Bay in chapter 4 of Volume 2. Replace the DDM bay backplane, then run verification.

Was the verification successful?
• Yes, the problem is resolved; end the call.
• No, call your next level of support.
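Steps 4 through 13 follow a fixed pattern: unplug one card at a time, watch the controller card check indicator, and replace only the cards whose removal turns the indicator off, falling back to the backplane. The Python sketch below models that ordering. The callback stands in for the physical unplug-and-observe check, and all names are hypothetical; the sketch does not model the service terminal dialogs.

```python
from typing import Callable, List, Sequence

# Order in which steps 4 through 12 unplug cards after the controller card
# replacement in step 2 has not fixed the problem.
CARD_ORDER = ("bypass card", "first passthrough card", "second passthrough card")
BACKPLANE = "DDM bay frame assembly (backplane)"


def replacement_candidates(
    check_clears_when_unplugged: Callable[[str], bool],
    order: Sequence[str] = CARD_ORDER,
) -> List[str]:
    """Hypothetical model of MAP 3398 steps 4 through 13.

    A card is a candidate only if the controller card check indicator goes
    off while that card is unplugged; otherwise it is plugged back in and
    skipped. The backplane (step 13) is always the last candidate.
    """
    candidates = [card for card in order if check_clears_when_unplugged(card)]
    candidates.append(BACKPLANE)
    return candidates
```

Because the MAP ends as soon as a verification succeeds, later candidates may never be reached; the list shows the replacement order, not a plan to replace every part.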

MAP 3400: Replacing a DDM Bay Frame Assembly


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, see Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This procedure is used for SSA failures when the service terminal repair process cannot call out the backplane for replacement.

Procedure
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Record the MAP and step number that sent you to this MAP.
3. Verify that you are at the SSA link repair screen that did not include the backplane as a FRU.
4. Record the number of the DDM bay you are repairing.
5. Press F3 on the service terminal until the Repair Menu is displayed, then select:
Replace a FRU
6. Move the cursor to the DDM bay location for the backplane or frame being replaced, front or back, and press Enter.
7. Replace the selected backplane or frame:
• DDM bay frame assembly (backplane). See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
8. After the DDM bay frame is replaced, follow the instructions displayed on the service terminal to verify the repair.
• If the repair verification runs without error, the problem is resolved.
• If the SSA link is still failing, look at the MAP and step that sent you to this MAP. If that step is the last step in the procedure, call your next level of support. If there are more steps in the procedure, continue with that MAP.

MAP 3421: Storage Cage Fan/Power Sense Card R2 Cable Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
The storage cage fan/power sense card in the bottom half of a 2105 Expansion Enclosure has reported that no cage sense card R2 cable is installed. This cable is needed for proper control of fan speeds in the 2105 Expansion Enclosure. The problem can be caused by one of the following:
• The cage sense card R2 cable is not connected correctly.
• The cage sense card R2 cable is failing.
• The lower fan/power sense card is reporting incorrectly.
• A DDM bay controller card is reporting incorrectly.

Figure 115. 2105 Primary Power Supply Connectors (5008774m). Labels: FAN/POWER SUPPLY CHECK; FAN POWER SENSE CARD CHECK; Rack 1 (Top, R1-Q1-C1); Rack 2 Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the cage sense card R2 cable that is connected to the upper and lower storage cage fan/power sense cards in the 2105 Expansion Enclosure. Verify that the R2 cable is connected correctly to both sense cards. Did you find and fix a problem with the R2 cable?
• Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card; this simulates a repair and runs verification. If verification is successful, close the problem. If verification fails, continue with the next step.
• No, continue with the next step.
3. Replace the cage sense card R2 cable, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card; this simulates a repair and runs verification.
• If the verification was successful, close the problem and end the call.
• If the verification was not successful, continue with the next step.
4. Replace the fan/power sense card shown as a FRU by the service terminal, then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2.
• If the verification was successful, close the problem and end the call.
• If the verification was not successful, continue with the next step.
5. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card, DDM Bay in chapter 4 of Volume 2.
• If the verification was successful, close the problem and end the call.
• If the verification was not successful, call your next level of support.

MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The storage cage fan/power sense card in the top of the 2105 Expansion Enclosure has reported one of the following:
• Missing cage sense card R2 jumper
• Missing cage sense card R2 cable


Figure 116. 2105 Primary Power Supply Connectors (5008774m). Labels: FAN/POWER SUPPLY CHECK; FAN POWER SENSE CARD CHECK; Rack 1 (Top, R1-Q1-C1); Rack 2 Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether there is a storage cage fan/power sense card in the bottom of the 2105 Expansion Enclosure. Is there a lower storage cage fan/power sense card in the 2105 Expansion Enclosure?
• Yes, go to step 7 on page 270.
• No, continue with the next step.
3. Inspect the upper storage cage fan/power sense card in the 2105 Expansion Enclosure. Verify that the cage sense card R2 jumper is present and installed correctly on the upper storage cage fan/power sense card. Did you find and correct a problem with the R2 jumper?
• Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card; this simulates a repair and runs verification. If verification is successful, close the problem. If verification fails, continue with the next step.
• No, continue with the next step.

Problem Isolation Procedures, CHAPTER 3



4. Replace the R2 jumper and verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
5. Replace the sense card and then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
6. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.
7. There are storage cage fan/power sense cards in both the top and the bottom of the 2105 Expansion Enclosure. The cage sense card R2 cable should run from the top sense card to the bottom sense card.
   • If the cable is missing or unplugged, install the cable.
   • If the cable is already installed, continue with the next step.
8. Replace the cage sense card R2 cable, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
9. Replace the top storage cage fan/power sense card and then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
10. Replace the bottom storage cage fan/power sense card and then verify the repair.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
11. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.
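The MAP 3422 isolation steps follow an escalation pattern that recurs throughout these MAPs: perform one corrective action, run verification, stop at the first success, and call your next level of support if every action fails. The Python sketch below models only the replacement steps of MAP 3422 (steps 4 to 6 when there is no lower sense card, steps 8 to 11 when there is), after the inspection steps have found nothing to correct. It is an illustration, not a service terminal interface: the function names and action labels are hypothetical, and the `verify` callback stands in for the simulated repair and verification that you run from the service terminal.

```python
from typing import Callable, List


def map_3422_actions(has_lower_sense_card: bool) -> List[str]:
    """Return the MAP 3422 replacement actions in escalation order (hypothetical labels)."""
    if has_lower_sense_card:
        # Steps 8 to 11: sense cards in both the top and the bottom of the enclosure.
        return [
            "Replace the R2 cable",
            "Replace the top sense card",
            "Replace the bottom sense card",
            "Replace the DDM bay controller card",
        ]
    # Steps 4 to 6: only the upper sense card is installed.
    return [
        "Replace the R2 jumper",
        "Replace the sense card",
        "Replace the DDM bay controller card",
    ]


def run_escalation(actions: List[str], verify: Callable[[str], bool]) -> str:
    """Try each action in order and stop at the first one whose verification passes."""
    for action in actions:
        # On the real machine, the action is done through the Repair Menu and
        # verified from the service terminal; here `verify` only simulates that.
        if verify(action):
            return f"Resolved by: {action}. Close the problem and end the call."
    return "Verification failed after every action. Call your next level of support."


if __name__ == "__main__":
    # Example: pretend verification first passes after the top sense card is replaced.
    print(run_escalation(map_3422_actions(True), lambda a: a == "Replace the top sense card"))
```

Keeping the action list separate from the loop reflects how the other MAPs in this chapter differ mainly in which FRUs they list, not in the replace-then-verify pattern.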

MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The storage cage fan/power sense card in the 2105 Model 800 has reported that the cage sense card R1 jumper is missing.



The problem is one of the following:
• The cage sense card R1 jumper is missing.
• The cage sense card R1 jumper is failing.
• The fan/power sense card is reporting incorrectly.
• A DDM bay controller card is reporting incorrectly.

Figure 117. 2105 Primary Power Supply Connectors (S008774m). Figure panels: Fan/Power Supply Check; Fan Power Sense Card Check; Rack 1 (Top, R1-Q1-C1); Rack 2, Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Inspect the upper storage cage fan/power sense card in the 2105 Model 800. Verify that the cage sense card R1 jumper is present and installed correctly on the storage cage fan/power sense card. Did you find and correct a problem with the R1 jumper?
   • Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If verification is successful, close the problem. If verification fails, go to step 3 on page 272.
   • No, replace the R1 jumper and verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If the verification was successful, close the problem and end the call. If the verification was not successful, continue with the next step.
3. Replace the sense card and then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
4. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.

MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The storage cage fan/power sense card in the 2105 Model 800 has reported a failure that is only possible in a 2105 Expansion Enclosure. This indicates that the cage sense card R1 jumper on the 2105 Model 800 is failing.

Figure 118. 2105 Primary Power Supply Connectors (S008774m). Figure panels: Fan/Power Supply Check; Fan Power Sense Card Check; Rack 1 (Top, R1-Q1-C1); Rack 2, Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the cage sense card R1 jumper, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
3. Replace the storage cage fan/power sense card shown as a FRU by the service terminal, then verify the repair.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
4. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.

MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
One of the storage cage fan/power sense cards in the 2105 Expansion Enclosure has reported an open line in the cage sense card R2 cable. This cable connects the upper and lower storage cage fan/power sense cards. The most likely cause of the problem is one of the following:
• The cage sense card R2 cable is failing.
• The storage cage fan/power sense card that reported the failure is failing.
• A DDM bay controller card is reporting incorrectly.


Figure 119. 2105 Primary Power Supply Connectors (S008774m). Figure panels: Fan/Power Supply Check; Fan Power Sense Card Check; Rack 1 (Top, R1-Q1-C1); Rack 2, Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the cage sense card R2 cable, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
3. Replace the storage cage fan/power sense card that was shown as a FRU by the service terminal, then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
4. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.


MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The machine hardware is reporting rack location information that differs from the information entered manually at the service terminal. The problem must be corrected. The possible causes of this condition are:
• A cage sense card R2 jumper has mistakenly been plugged onto the storage cage fan/power sense card in the 2105 Model 800.
• A cage sense card R1 jumper has mistakenly been plugged onto the storage cage fan/power sense card in the top half of the 2105 Expansion Enclosure.
• A cage sense card is reporting the location incorrectly.
• A DDM bay controller card is reporting the location incorrectly.
• The DDM bay location selected by the service support representative for a DDM bay was in the wrong 2105 and needs to be changed.

Figure 120. Fan Sense Card Jumper and Cable Locations (S008774m). Figure panels: Fan/Power Supply Check; Fan Power Sense Card Check; Rack 1 (Top, R1-Q1-C1); Rack 2, Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.


2. Inspect the storage cage fan/power sense card in the 2105 Model 800. If a 2105 Expansion Enclosure is present, also inspect the upper storage cage fan/power sense card in it. Verify that the correct cage sense card Rx jumper is present and installed correctly on the upper storage cage fan/power sense cards:
   • 2105 Model 800: cage sense card R1 jumper
   • 2105 Expansion Enclosure: cage sense card R2 jumper
   Did you find and correct a problem with the Rx jumper?
   • Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If verification is successful, close the problem. If verification fails, continue with the next step.
   • No, continue with the next step.
3. Review the DDM bay location selected by the service support representative. Look below the FRU list on the service terminal, at the line that starts with Additional Message.... Look for the word Reported, followed by the Rack-Bay-Drawer (DDM bay) location reported by the 2105. Then look for the word Entered, followed by the Rack-Bay-Drawer location that was entered by the service support representative.
   • If the Entered location is correct, go to step 5.
   • If the Entered location is not correct, the drawer or drawers just installed must be uninstalled and then reinstalled, using the correct location. Continue with the next step.
4. Do the following steps to uninstall the drawer or drawers that you just installed:
   a. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bay you just installed. Select the highest line for one of the DDM bays you just installed. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were doing a single DDM bay install, you must remove only that DDM bay. If you were doing a multiple DDM bay install, you must remove all of the new DDM bays that you were installing.
   c. Continue through the removal process. When complete, you may continue with any operation desired.
5. Replace the DDM bay controller card shown as a FRU by the service terminal. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2, then verify the repair.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
   After the installation, go to step 5 on page 281.



6. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in Chapter 4 of Volume 2. Is the storage cage power supply problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, call your next level of support.
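Step 2 of MAP 3426 depends on which cage sense card jumper each sense card should carry. The Python sketch below encodes that rule as a lookup table and compares it with the jumper actually found during inspection. The table keys, the `check_jumper` function, and the message wording are hypothetical; the sketch only restates the expected-jumper rule from step 2 and does not read anything from the machine.

```python
from typing import Optional

# Expected cage sense card jumper for each sense card, per MAP 3426 step 2.
# The key strings are illustrative labels, not service terminal identifiers.
EXPECTED_JUMPER = {
    "2105 Model 800": "R1",
    "2105 Expansion Enclosure, upper sense card": "R2",
}


def check_jumper(card: str, installed: Optional[str]) -> str:
    """Compare the jumper found on a sense card with the jumper MAP 3426 expects."""
    expected = EXPECTED_JUMPER[card]
    if installed is None:
        return f"{card}: no jumper found; install the {expected} jumper"
    if installed != expected:
        return f"{card}: {installed} jumper found; replace it with the {expected} jumper"
    return f"{card}: {expected} jumper is correct"


if __name__ == "__main__":
    print(check_jumper("2105 Model 800", "R2"))
```

For example, `check_jumper("2105 Model 800", "R2")` reports that the R2 jumper should be replaced with the R1 jumper, which corresponds to the first possible cause listed in the MAP 3426 description.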

MAP 3427: Isolating a Storage and DDM Bay Location Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The machine hardware is reporting DDM bay location information that differs from the information entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are:
• The cage sense card R2 cable has been plugged backwards: the end marked Fan Sense Card Top Power Stack has been plugged into the lower sense card, and the end marked Fan Sense Card Bottom Power Stack has been plugged into the upper sense card.
• The DDM bay location selected by the CE for a DDM bay was in the wrong bay and needs to be changed.
• A DDM bay controller card is reporting incorrectly.

Figure 121. Fan Sense Card Jumper and Cable Locations (S008774m). Figure panels: Fan/Power Supply Check; Fan Power Sense Card Check; Rack 1 (Top, R1-Q1-C1); Rack 2, Storage Cages 1 and 2 only (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1); Rack 2 with 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1).

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Inspect the 2105 Expansion Enclosure to determine whether there are storage bays in the top and bottom of the rack.
   • If there are storage bays in the top and bottom of the 2105 Expansion Enclosure, go to step 3.
   • If there are storage bays only in the top of the 2105 Expansion Enclosure, go to step 4.
3. Verify that the cage sense card R2 cable is installed correctly to the top and bottom sense cards.
   • If you find and fix a problem, return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If the verification was successful, close the problem and end the call. If the verification was not successful, continue with the next step.
   • If you did not find a problem, continue with the next step.
4. Review the DDM bay location selected by the service support representative. Look below the FRU list on the service terminal, at the line that starts with Additional Message.... Look for the word Reported, followed by the Rack-Bay-Drawer (DDM bay) location reported by the 2105. Then look for the word Entered:, followed by the Rack-Bay-Drawer location that was entered by the service support representative.
   Note: You can verify that the Reported location is correct by looking on the Additional Messages line, to the right of the Reported Rack-Bay-Drawer location. You may need to use the arrow keys on the keyboard to scroll to the right. Look for the word DDMSN, followed by the serial number of the DDM that was used to read the Reported location. Following the serial number, in parentheses, is the slot number in the DDM bay where the DDM is located. You should be able to find the DDM with this serial number in the DDM bay slot indicated by the Reported location. If this DDM is not in the DDM bay slot indicated, call your next level of support.
   • If the Entered location is wrong, continue with the next step.
   • If the Reported location is wrong, replace the DDM bay controller card shown as a FRU by the service terminal. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2, then verify the repair. If the verification was successful, close the problem and end the call. If the verification was not successful, call your next level of support.
5. Change the DDM bay location selected by the CE. Do the following steps to uninstall the DDM bay or DDM bays that you just installed:
   a. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bays you just installed. Select the highest line for one of the DDM bays you just installed. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were doing a single DDM bay install, you must remove only that DDM bay. If you were doing a multiple DDM bay install, you must remove all of the new DDM bays that you were installing.
   c. Continue through the removal process. When complete, you may reinstall the DDM bays. Be careful to select the correct locations.

MAP 3428: Isolating a DDM Bay Location Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The machine hardware is reporting DDM bay location information that differs from the information entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are:
• The power planar to DDM bay planar cable is plugged into the wrong connector position on the storage cage power planar. See Figure 122 on page 281 and Figure 123 on page 282.
• The connectors that the power planar to DDM bay planar cable plugs into may have bent or pushed-back pins.
• The DDM bay location selected by the service support representative for a DDM bay was in the wrong location and needs to be changed.
• A DDM bay controller card is reporting incorrectly.

Isolation
1. Review the DDM bay location entered by the service support representative. Look below the FRU list on the service terminal, at the line that starts with Additional Message.... Look for the word Reported, followed by the Rack-Bay-Drawer (DDM bay) location reported by the 2105. You can find the actual DDM that was used to read the Reported location: look on the Additional Messages line, to the right of the Reported Rack-Bay-Drawer location. You may need to use the arrow keys on the keyboard to scroll to the right. Look for the word DDMSN, followed by the serial number of the DDM that was used to read the Reported location. Following the serial number, in parentheses, is the slot number in the DDM bay where the DDM is located. You should be able to find the DDM with this serial number in the DDM bay slot indicated by the Reported location. Then look for the word Entered:, followed by the Rack-Bay-Drawer location that was entered by the service support representative. Carefully review the location that the service support representative entered to determine whether it is correct.
   • If the location entered by the service support representative is not correct, go to step 2.
   • If the location entered by the service support representative is correct, go to step 3.
2. Do the following steps to uninstall the DDM bay that you just installed:
   a. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bays you just installed. Select the highest line for one of the DDM bays you just installed. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were doing a single DDM bay install, you must remove only that DDM bay. If you were doing a multiple DDM bay install, you must remove all of the new DDM bays that you were installing.
   c. Continue through the removal process. When complete, you may reinstall the DDM bays. Be careful to select the correct locations. Complete the install process. If any problems are found, proceed as directed by the service panel and end this call. Do not proceed to the next step.
3. Replace the DDM bay controller card shown as a FRU by the service terminal. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2, then verify the repair.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
   After the installation, go to step 5 on page 281.
4. The power planar to DDM bay planar cable may be plugged into the wrong connector position on the storage cage power planar, or the connectors that the cable plugs into may have bent or pushed-back pins. Remove the DDM bay in question and verify that the power planar to DDM bay planar cable is plugged correctly:
   a. Remove the DDM bay from the 2105. See Frame Assembly Removal and Replacement, DDM Bay in Chapter 4 of Volume 2. Do only the steps necessary to remove and replace the DDM bay.
   b. Verify that the power planar to DDM bay planar cable is plugged correctly. The most likely problem is that the cables to a pair of front and rear DDM bays are swapped. See Figure 122 on page 281 and Figure 123 on page 282.
   c. If the cable is connected correctly, unplug the cable connectors. Examine the pins of the connectors that the cable plugs into. Check for bent or pushed-back pins. Repair or replace as required.
   Did you find and correct a problem with the power planar to DDM bay planar cable?
   • Yes, continue with the next step.
   • No, call your next level of support.



5. Verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If verification is successful, close the problem.
   • If verification fails, work on the resulting problem.
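In MAP 3428 step 1, the decision starts from a comparison of the Reported and Entered Rack-Bay-Drawer locations. The Python sketch below shows that comparison field by field, together with the step 1 branch. The `RackBayDrawer` type, its field names, the helper functions, and the sample values are all hypothetical; the sketch does not parse the Additional Message line, whose exact format is not described here, and deciding whether the Entered location is correct remains a judgment made by the service support representative.

```python
from typing import List, NamedTuple


class RackBayDrawer(NamedTuple):
    """Hypothetical model of a Rack-Bay-Drawer (DDM bay) location; field names are illustrative."""
    rack: str
    bay: str
    drawer: str


def differing_fields(reported: RackBayDrawer, entered: RackBayDrawer) -> List[str]:
    """List the fields in which the Reported and Entered locations disagree."""
    return [
        name
        for name, r, e in zip(RackBayDrawer._fields, reported, entered)
        if r != e
    ]


def map_3428_next_step(entered_is_correct: bool) -> int:
    """Step 1 branch: an incorrect Entered location leads to step 2, otherwise step 3."""
    return 3 if entered_is_correct else 2


if __name__ == "__main__":
    # Illustrative values only; real locations come from the service terminal.
    reported = RackBayDrawer("R1", "U1", "W1")
    entered = RackBayDrawer("R1", "U1", "W5")
    print(differing_fields(reported, entered))
    print(map_3428_next_step(entered_is_correct=False))
```

The field comparison only shows where the two locations disagree; it cannot tell you which of them is wrong, which is why the MAP asks you to confirm the DDM serial number and slot physically.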

Figure 122. DDM Bay Front Power Cable Locations (S009430). Front view of the 2105 Model 800 and Expansion Enclosure: storage cages U1 to U4; power planars Q1 and Q2, each with connectors J11 to J18 and J21 to J28; front DDM bays U1-W1 to U1-W4, U2-W1 to U2-W4, U3-W1 to U3-W4, and U4-W1 to U4-W4; positions labeled F1, F2, and F3.

Note: The two lower storage cages (U3 and U4) are not present in 2105 Model 800s.


Figure 123. DDM Bay Rear Power Cable Locations (S009431). Rear view of the 2105 Model 800 and Expansion Enclosure: storage cages U1 to U4; power planars Q1 and Q2, each with connectors J11 to J18 and J21 to J28; rear DDM bays U1-W5 to U1-W8, U2-W5 to U2-W8, U3-W5 to U3-W8, and U4-W5 to U4-W8; positions labeled F4 and F6.

Note: The two lower storage cages (U4 and U3) are not present in 2105 Model 800s.

MAP 3429: Isolating a DDM Location Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The machine hardware is reporting DDM location information that differs from the location information the system created internally from what was entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are:
• The SSA loop has been cabled incorrectly.
• The DDM bay controller card is reporting the DDM location incorrectly.


Isolation
1. Look at the SSA cables displayed on the Detail Problem screen. Compare the SSA cables displayed with the cabling of the DDM bay being installed or analyzed. Are any of the SSA cables connected wrong?
   • Yes, connect the jumper cables to the correct connectors, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If the verification was successful, close the problem and end the call. If the verification was not successful, continue with the next step.
   • No, continue with the next step.
2. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in Chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.

MAP 3500: Verifying a DDM Bay Repair


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This MAP helps you verify a repair to a DDM bay that generated a problem because it was powered off. This MAP will verify whether the problem is resolved.

Isolation
1. Determine whether the DDM bay with the problem was just installed into the 2105, or whether DDMs were just installed into it. Was the failing DDM bay or its DDMs just installed?
   • Yes, the DDM bay or its DDMs were just installed. At the service terminal, press F3 until the screen that allows the restart of installation is displayed. Restart the installation to verify the repair. If the repair is verified, the installation will resume at the point where the original error was detected.
   • No, the DDM bay or its DDMs were not just installed. Verify the repair using the service terminal. From the Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
     Select the DDM bay you just repaired. Identify the DDM bay by the location code. Did the SSA device test run without error? If yes, go to step 2 on page 284. If no, follow the instructions displayed on the service terminal to correct the problem.


2. Go to MAP 1500: Ending a Service Action on page 67.

MAP 3520: DDM Bay Verification for Possible Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This MAP verifies that a DDM bay is operating correctly when visual symptoms, or other reasons, indicate a possible problem.

Isolation
1. Did you start this service action from a problem displayed on a service terminal?
   • Yes, go to step 4.
   • No, continue with the next step.
2. Use the service terminal to look for any problems. Repair these problems first, then continue with the next step.
3. Are the symptoms that originally sent you to this MAP repaired?
   • Yes, the problem is resolved; end the service call.
   • No, continue with the next step.
4. Record the location of the DDM bay that you have just repaired.
5. At the service terminal, press F3 until the Main Service Menu is displayed, then select:
         Machine Test Menu
         SSA Loops Menu
   Find the line for the SSA device that has the DDM bay location you recorded.
6. Select a line with the recorded DDM bay location to run the SSA loop test. Select loop A or B for this test; it does not matter which. This test verifies correct operation of all of the DDM bays on both loops of that SSA device card.

MAP 3530: SSA Devices Certify Test Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The SSA Devices Certify Test detected a problem. The failure was due to one of the following:
• A media problem was detected on one or more DDMs.
• Some unrelated occurrence in the system caused the process to abort.

Isolation
1. Observe the code EC level displayed on the logon screen. Is the code level above 2.3.0.255?
   • Yes, continue with the next step.

284

VOLUME 1, TotalStorage ESS Service Guide



   • No, go to step 6.
2. Display the list of DDMs that failed the latest SSA Devices Certify Test. From the service terminal Main Service Menu, select:
         Utility Menu
         Show Status of DDM Format/Certify Processes Menu
         Show DDMs that failed the SSA Devices Certify Test
   Are any DDMs listed as having failed the latest SSA Devices Certify Test?
   Note: Failed DDMs may be listed under either or both clusters.
   • Yes, continue with the next step.
   • No, call your next level of support to determine whether any DDMs need to be reformatted or replaced, and the process to use.
3. Were more than 4 DDMs listed as having failed the Certify Test?
   • Yes, call your next level of support to determine which DDMs need to be reformatted or replaced, and the process to use.
   • No, note the DDM locations, then continue with the next step.
4. Perform a pseudo FRU replacement of the failing DDMs:
   a. From the service terminal Main Service Menu, select:
         Repair Menu
         Repair / Verify DDM(s)
   b. Attempt to select all the DDMs at the same time. If multiple DDMs are located on the same loop, you may have to perform the operation more than once, selecting a different DDM on the loop each time. When you are directed to remove and replace the DDM, leave the original DDM in position.
   c. When all the failing DDMs have been pseudo-replaced, format and resume the DDMs. From the service terminal Main Service Menu, select:
         Repair Menu
         Format/Resume DDM(s)
   Did any of the DDMs fail again during the pseudo replacement?
   • Yes, replace the failing DDM or DDMs using the menu options in the previous step.
   • No, this problem is now resolved. Use Close a Previously Repaired Problem to close the problem.
5. Continue with the Install process and resolve any remaining problems. If all problems are resolved, go to MAP 1500: Ending a Service Action on page 67.
6. Call your next level of support, and then do the following:
   a. Wait for the next level of support to inspect the Certify result files and DDM status to determine which DDM or DDMs failed the Certify Test.
   b. Continue as directed by the next level of support to reinitialize or replace the failing DDMs. They may direct you to use the Repair / Verify DDM(s) option or go directly to Format/Resume DDM(s).

MAP 3540: Web Initiated Format Incomplete, User to Restart


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.



Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The web-initiated DDM format operation did not complete because an unrelated occurrence in the system, probably a problem on the machine or an error recovery by the machine, caused the process to abort. Retrying the DDM format operation may allow the process to run to completion.

Isolation
1. Cancel the ESC 1247 problem generated by the DDM format operation.
2. Retry the DDM Format / Resume operation. From the service terminal Main Service Menu, select:
         Repair Menu
         Format / Resume DDM(s)
   Continue through the instructions to retry the DDM format operation. After the DDM format operation is started, you will be automatically logged off.
3. Log back on to the 2105 at any time to check the progress of the DDM format operation. When the DDM format operation has completed, from the service terminal Main Service Menu, select:
         Repair Menu
         Show Result of DDM Format / Resume Operation
   Did the operation complete successfully?
   • Yes, the problem is resolved. Ask the customer to continue their web operation.
     Notes:
     a. The customer may or may not have a web operation to continue.
     b. The customer may want to retry the previous failed web operation.
   • No, the machine is still failing. Fix any additional problems that occurred on the machine. Retry step 3 if possible. If the machine is still failing, call your next level of support.

MAP 3550: Incomplete or Failed Format Process, User to Restart


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The service-initiated DDM format operation did not complete because an unrelated occurrence in the system, probably a problem on the machine or an error recovery by the machine, caused the process to abort. Retrying the DDM format operation may allow the process to run to completion.


Isolation
1. Cancel the ESC 1246 problem generated by the DDM format operation.
2. Retry the DDM Format / Resume operation. From the service terminal Main Service Menu, select:
         Repair Menu
         Format / Resume DDM(s)
   Continue through the instructions to retry the DDM format operation. After the DDM format operation is started, you will be automatically logged off.
3. Log back on to the 2105 at any time to check the progress of the DDM format operation. When the DDM format operation has completed, from the service terminal Main Service Menu, select:
         Repair Menu
         Show Result of DDM Format / Resume Operation
   Did the operation complete successfully?
   • Yes, the problem is resolved.
   • No, the machine is still failing. Fix any additional problems that occurred on the machine. Retry step 3 if possible. If the machine is still failing, call your next level of support.

MAP 3560: Unrelated Occurrence, Retry Verification Test


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The verification test did not complete successfully because some unrelated occurrence in the system caused the test to abort. Retrying the verification test should allow it to run to completion. If there is a real problem, you will be directed to a different MAP.

Isolation
• If you are viewing the problem after selecting Show / Repair Detected Problems from the Verification Tests Has Detected Problems screen, rerun the verification test: press F3 once, then on the new screen select the Run Verification Tests Again option.
• If you are not viewing the problem from Show / Repair Detected Problems, select and repair the original problem and choose the original FRU. Do not replace the FRU when instructed to do so.
Did repair verification run without error?
• If the verification ran without error, the problem is resolved.
• If the verification failed, continue with any problem displayed by the verification process. If this same problem continues to occur, there may be another problem on the machine that prevents verification from running successfully. Resolve those problems, then retry this problem. If verification still fails, call your next level of support.


MAP 3570: Unrelated Event Caused Resume Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The verification test did not complete successfully because some unrelated occurrence in the system caused the test to abort. Retrying the verification test should allow it to run to completion. If there is a real problem, you will be directed to a different MAP. At the end of a repair process, a Resume process is performed that makes the resource available for customer use. During the Resume process, an unrelated event occurred that prevented the Resume from completing normally. You will need to go through a pseudo repair process to complete the repair.

Isolation
1. Select the DDM listed in the Possible FRUs to Replace portion of the problem.
2. Proceed through the repair process. When the process instructs you to replace the DDM, do not replace it; continue through the repair process as if you had replaced the DDM. If this repair process directs you to resolve other problems before completing this problem, do so, then return to this problem.

MAP 3580: DDM, or DDMs, Found in Formatting State During IML


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
One or more DDMs have been found in a formatting state during IML. A possible cause is that a format process was interrupted by some unrelated occurrence in the system.

Isolation
1. Check if there are any other open problems:
   • If there are no other problems to repair, go to step 2.
   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
2. Cancel the problem.
3. Use the service terminal to format the drawer or drawers that contain the DDM or DDMs found in a formatting state. From the service terminal Main Service Menu, select:
         Install/Remove Menu
         DDM Bay (Drawer) Menu
         Format DDM Bays (Drawers)
         Format All Drawers Listed
   Continue through the instructions to format the DDM or DDMs.


MAP 3600: Multiple DDMs Isolated on an SSA Loop


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Multiple DDMs cannot be accessed. The open links are on a DDM bay boundary.

Isolation
1. Determine if the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
   • Yes, go to step 2.
   • No, go to step 4.
2. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay. Are any of the cables connected incorrectly?
   • Yes, connect the cables to the correct connectors, then go to step 3.
   • No, go to step 4.
3. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any FRU in the Possible FRUs to Replace list or any cable in the cable list. Proceed through the repair, but do not replace any FRU or disconnect any cables. This simulates a repair and runs verification. Did verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, go to step 4.
4. Look at the Additional Message in the Detail Problem Record; it gives the name and location of one or more failing DDM bays. Find one of these failing DDM bays. See Locating a DDM Bay in a 2105 Rack in Chapter 7 of Volume 3. Continue with the next step.
5. Observe the following indicators on the front of the DDM bay:
   • DDMs (eight)
   • Bypass card
   • Controller card


Figure 124. DDM bay Indicator Locations (S008018l)

6. Go to the DDM bay and observe the indicators.
   Note: The front of the DDM bay can face either the front or the rear of the 2105.
   Are any of the indicators on?
   • Yes, call your next level of support.
   • No, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.

MAP 3605: Isolating an Unexpected Result


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
Unexpected results were reported by an SSA component.

Isolation
An unexpected condition was detected; call your next level of support.

MAP 3610: DDM Installation with New Rank Site Capacity


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This section describes the conditions that created this state. The full storage capacity of all DDMs (Disk Drive Modules) on an SSA loop (or on both loops of an adapter pair for an AAL configured machine) can be used only when all of the DDMs have the same storage capacity. There are times when it is correct to add DDMs of a different capacity to a loop. This can happen when a specific DDM is no longer manufactured and DDMs with a larger storage capacity must be used. There are also times when mixed-capacity DDM bays are needed on a single loop (or adapter pair for an AAL configured machine). You have been sent to this MAP because arrays of multiple capacities may be created on this loop (or adapter pair for an AAL configured machine), and additional DDMs may be required as spares.



If you understand the conditions that created this state, go directly to the Isolation section. If you need more information on allowing this new effective capacity, read the following Detailed Description section.

Detailed Description
This section describes the conditions that created this state in detail. The Isolation section that follows describes how to resolve the condition.
1. The capacity of all DDMs on an SSA loop (or on both loops of an adapter pair for an AAL configured machine) is most fully used when all DDMs have the same storage capacity. There are times when there is a need to add DDMs of a different capacity.
2. On each SSA loop, one spare is created for each of the first two arrays of each DDM capacity.
3. There are two possible options for resolving this condition:
   a. Give permission for the installation to continue with the DDMs intermixed as they currently are.
   b. Remove the DDM bay(s) that you have just installed.
4. The following items will help you determine the exact condition and what the options mean.
5. On each SSA loop (or on both loops of an adapter pair for an AAL configured machine), DDMs are grouped together as Potential and Configured Rank Sites. Each Rank Site consists of eight DDMs.
6. Arrays consist of seven or eight array member DDMs. All of the members of any array are found on the same rank site. When there are seven members in an array, the additional DDM in that rank site is always assigned as a spare.
7. A utility allows you to view the Rank Sites on an SSA loop and the capacities of the DDMs on those Rank Sites. The effective capacity of a Rank Site is determined by the smallest capacity of any DDM on that rank site.
8. Configured rank sites contain those DDMs that have already been assigned as array members or spare DDMs. Since these rank sites contain customer data, they will not be affected by this MAP. The effective capacity of these rank sites is the same as the capacity of the smallest DDM in the rank site.
   Note: There is a possible, but infrequent, situation where an array's effective capacity will be smaller than the smallest DDM. See the note with step 12.
9. All unassigned DDMs on a loop are considered to be Free and have been grouped into potential rank sites.
   Note: Some DDMs may have a status of Failed and may occur in either type of rank site.
10. Whenever new DDMs are installed on a loop, these DDMs become Free DDMs. Existing potential rank sites are dissolved, releasing their Free DDMs and any spare DDMs. Then all the Free DDMs, both new and previously existing, are grouped together into new potential rank sites.
11. These Free DDMs are placed in potential rank sites by capacity. The largest DDMs are placed into rank sites first. When there are not enough DDMs of the largest capacity to fill the next rank site, the next smaller capacity is used. This continues until all the Free DDMs are in potential rank sites.
12. The capacity of an array is determined by the smallest capacity of the member DDMs when the array is created. This will be the smallest DDM in the rank site. If one of the DDMs in a rank site is to become a spare, the largest capacity DDM is chosen for the spare. The rest of the DDMs will become members of the array. The difference in capacity between a large and a small capacity DDM in the same rank site will be unused.
   Note: If, after an array is created, all of the smaller drives fail and are replaced by larger spares, the array capacity will then be less than the smallest drive.
13. This condition occurs when one or more potential rank sites are found to have a different effective capacity than previously existing rank sites.
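The grouping rules in items 10 through 13 can be modeled in a few lines to show why a new effective capacity appears. The following Python sketch is illustrative only and is not the 2105 microcode. It assumes DDM capacities are plain integers (the GB values in the example are hypothetical), that the number of Free DDMs is a multiple of eight, and it uses hypothetical function names (`group_free_ddms`, `effective_capacity`, `new_effective_capacities`).

```python
from typing import List

RANK_SITE_SIZE = 8  # each rank site consists of eight DDMs


def group_free_ddms(free_capacities: List[int]) -> List[List[int]]:
    """Group Free DDMs into potential rank sites, largest capacity first.

    Sorting in descending order fills each rank site with the largest DDMs
    still available; when they run out, the next smaller capacity fills the
    remaining slots. Assumes the number of Free DDMs is a multiple of eight.
    """
    if len(free_capacities) % RANK_SITE_SIZE != 0:
        raise ValueError("expected a multiple of eight Free DDMs")
    ordered = sorted(free_capacities, reverse=True)
    return [ordered[i:i + RANK_SITE_SIZE]
            for i in range(0, len(ordered), RANK_SITE_SIZE)]


def effective_capacity(rank_site: List[int]) -> int:
    """A rank site's effective capacity is that of its smallest DDM."""
    return min(rank_site)


def new_effective_capacities(existing_sites: List[List[int]],
                             potential_sites: List[List[int]]) -> List[int]:
    """Effective capacities found on potential rank sites but on no existing one.

    A non-empty result corresponds to the condition that sends you to this MAP.
    """
    existing = {effective_capacity(site) for site in existing_sites}
    potential = {effective_capacity(site) for site in potential_sites}
    return sorted(potential - existing)


if __name__ == "__main__":
    # Hypothetical loop: one configured rank site of 36 GB DDMs, eight Free
    # 36 GB DDMs, and a newly installed bay of eight 72 GB DDMs.
    existing = [[36] * 8]
    sites = group_free_ddms([36] * 8 + [72] * 8)
    print([effective_capacity(s) for s in sites])     # [72, 36]
    print(new_effective_capacities(existing, sites))  # [72]
```

Sorting before slicing is what makes the largest DDMs land in rank sites first; a site such as six 72 GB DDMs plus two 36 GB DDMs would still have an effective capacity of 36.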

Isolation
1. Do you want to display the capacities and rank sites of the DDMs on this loop?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, go to step 8.
   • No, go to step 6.
3. To display the capacities of the DDMs on this loop, perform the following:
   a. Note the Loop Name (color) of the loop where the installation is being done.
   b. From the service terminal, select Exit Install to display the Main Service Menu, then select:
         Utility Menu
         Show Storage Facility Resources Menu
         List DDMs on an SSA Loop by Rank Site
      Select the line with the install Loop Name (color). Scroll up and down on the screen to view the Rank Sites and Capacities of the DDMs on this loop.
   c. Continue with the next step.
4. Now that you have viewed the DDM capacities, do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, complete the installation; continue with the next step.
   • No, go to step 7 to remove the DDM bay or DDM bays you just installed.
5. Return to the Install process on the service terminal. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
         Install/Remove Menu
         DDM Bay (Drawer) Menu
   Continue into the install process you performed before until the screen that directed you to this MAP appears. Go to step 8.
6. At the service terminal, select Exit Install to return to the Main Service Menu. Continue with the next step.
7. Do the following steps to uninstall the DDM bay or DDM bays that you just installed:
   a. From the service terminal Main Service Menu, select:
         Install/Remove Menu
         DDM Bay (Drawer) Menu
         Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bays you just installed. Select the highest line for one of the DDM bays you just installed. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were doing a single DDM bay install, you must remove only that DDM bay. If you were doing a multiple DDM bay install, you must remove all of the new DDM bays that you were installing.
   c. Continue through the removal process. When complete, you may continue with any operation desired.
8. Select Continue with Install. This continues through the install process to completion, and the new effective capacity will be accepted. Installation is complete.

MAP 3612: DDM Installation with Mixed Capacity Rank Site


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
This section describes the conditions that created this state. The full storage capacity of all DDMs (Disk Drive Modules) on an SSA loop can be used only when all of the DDMs have the same storage capacity. There are times when DDMs of a different capacity are added to a loop. This can happen when a specific DDM is no longer manufactured and a DDM with a larger storage capacity must be used as a replacement. There are also times when it is desirable to install DDM bays that contain intermixed capacity DDMs. You have been sent to this MAP to make sure that you intended to install different size DDMs on this loop. If you understand the conditions that created this state, go directly to the Isolation section. If you need more information to decide whether to allow mixed DDM capacities in a rank site, read the following Detailed Description section.

Detailed Description
This section describes the conditions that created this state. The Isolation section that follows describes how to resolve the condition.
1. The capacity of all DDMs on an SSA loop is most fully used when all DDMs have the same storage capacity. There are times when there is a need to add DDMs of a different capacity.
2. There are two possible options for resolving this condition:
   a. Give permission for the installation to continue with the DDMs intermixed as they currently are.
   b. Remove the DDM bay(s) that you have just installed. These may be reinstalled with different DDMs.
3. The following items will help you determine the exact condition and what the options mean.
4. On each SSA loop, DDMs are grouped together as Potential and Configured Rank Sites. Each Rank Site consists of eight DDMs.
5. Configured rank sites contain those DDMs that have already been assigned as array members or spare DDMs. Since these rank sites contain customer data, they will not be affected by this MAP.
6. Most unassigned DDMs on a loop are considered to be Free and have been grouped into potential rank sites. Some of these unassigned DDMs are configured as spares, if needed, to allow for the configuration of potential rank sites as arrays.
7. Arrays consist of seven or eight array member DDMs. All of the members of any array are found on the same rank site. When there are seven members in an array, the additional DDM in that rank site is always assigned as a spare.
8. A potential rank site will consist of either seven Free DDMs and one spare DDM, or eight Free DDMs.
9. Whenever new DDMs are installed on a loop, these DDMs become Free DDMs. Existing potential rank sites are dissolved. When a potential rank site is dissolved, any spare DDM in it is made Free, so that all of its DDMs are Free. All of the Free DDMs (both new and previously existing) are then grouped together into new potential rank sites, and any needed spares are created.
10. The DDMs are placed in rank sites by capacity. The largest DDMs are placed into rank sites first. When there are not enough DDMs of the largest capacity to fill the next rank site, the next smaller capacity DDMs are used until all the Free DDMs are in rank sites.
11. The capacity of an array is determined by the smallest capacity of the member DDMs when the array is created. This will be the smallest DDM in the rank site. If one of the DDMs in a rank site is to become a spare, the largest capacity DDM is chosen for the spare. The rest of the DDMs will become members of the array. The difference in capacity between a large and a small capacity DDM in the same rank site will be unused. When an array is made up entirely of DDMs and spares of the same capacity, the capacity of all of those DDMs will be fully used.
12. You are in this MAP because new DDMs of different capacities are being installed on a loop. When configured into an array, these DDMs will not allow the full capacity to be used. One or more of the potential rank sites has DDMs with different capacities.
   Note: Seldom will there be more than one such rank site.
13. There are two possible options for resolving this condition:
   a. Give permission for the installation to continue with the DDMs intermixed as they currently are.
   b. Remove the DDM bay(s) that you have just installed.
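The rules above for choosing a spare and sizing array members show where capacity is lost in a mixed-capacity rank site. The following Python sketch is a hypothetical model, not the 2105 microcode: capacities are plain integers (example GB values are made up), the function names (`form_array`, `unused_capacity`, `is_mixed`) are illustrative, and it computes only the per-member capacity, not the usable array capacity after RAID overhead.

```python
from typing import List, Optional, Tuple


def form_array(rank_site: List[int],
               with_spare: bool) -> Tuple[List[int], Optional[int], int]:
    """Split an eight-DDM rank site into array members and an optional spare.

    When the site supplies a spare, the largest-capacity DDM becomes the spare
    and the remaining seven become array members. Each member is used only up
    to the capacity of the smallest member.
    """
    if len(rank_site) != 8:
        raise ValueError("a rank site consists of eight DDMs")
    ordered = sorted(rank_site, reverse=True)
    spare = ordered[0] if with_spare else None
    members = ordered[1:] if with_spare else ordered
    return members, spare, min(members)


def unused_capacity(members: List[int]) -> int:
    """Capacity left unused because every member is limited to the smallest."""
    smallest = min(members)
    return sum(c - smallest for c in members)


def is_mixed(rank_site: List[int]) -> bool:
    """True when a rank site holds DDMs of more than one capacity."""
    return len(set(rank_site)) > 1


if __name__ == "__main__":
    site = [72, 72] + [36] * 6   # hypothetical mixed-capacity rank site
    members, spare, per_member = form_array(site, with_spare=True)
    print(is_mixed(site), spare, per_member, unused_capacity(members))
    # True 72 36 36
```

In the example, one 72 GB DDM becomes the spare, but the second 72 GB DDM is an array member limited to 36, so 36 of its capacity goes unused.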

Isolation
1. Do you want to display the capacities of the DDMs on this loop?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, go to step 8.
   • No, go to step 6.
3. To display the capacities of the DDMs on this loop, perform the following:
   a. Note the Loop Name (color) of the loop where the installation is being done.
   b. From the service terminal, select Exit Install to display the Main Service Menu, then select:
         Utility Menu
         Show Storage Facility Resources Menu
         List DDMs on an SSA Loop by Rank Site
      Select the line with the install Loop Name (color). Scroll up and down on the screen to view the Rank Sites and Capacities of the DDMs on this loop.
   c. Continue with the next step.
4. Now that you have viewed the DDM capacities, do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, complete the installation; continue with the next step.
   • No, go to step 7 to remove the DDM bay or DDM bays you just installed.
5. Return to the Install process on the service terminal. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
         Install/Remove Menu
         DDM Bay (Drawer) Menu
   Continue into the install process you performed before until the screen that directed you to this MAP appears. Go to step 8.
6. At the service terminal, select Exit Install to return to the Main Service Menu. Continue with the next step.
7. Do the following steps to uninstall the DDM bay or DDM bays that you just installed:
   a. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
         Install/Remove Menu
         DDM Bay (Drawer) Menu
         Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bays you just installed. Select the highest line for one of the DDM bays you just installed. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were doing a single DDM bay install, you must remove only that DDM bay. If you were doing a multiple DDM bay install, you must remove all of the new DDM bays that you were installing.
   c. Continue through the removal process. When complete, you may continue with any operation desired.
8. Select Continue with Install. This continues through the install process to completion, and the new effective capacity will be accepted. Installation is complete.

MAP 3614: DDM Installation Introduces Different RPM


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
During the installation of new DDM bay(s), a DDM was found that has a different RPM than the other DDMs previously on the loop. This is permitted, but not recommended. A DDM with a lower RPM will slow access to any array in which it is included. You may choose to leave this DDM on the loop. If you do, you will not be notified if any other DDMs with this RPM are included in this installation. On later installations, you will be notified only if a DDM with yet another RPM is found.
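The notification rule above amounts to a set difference: only RPM values not already present on the loop are reported, and each is reported once. The following Python sketch models that rule under stated assumptions. It is not the 2105 microcode, the function name `rpms_to_report` is hypothetical, and the RPM values in the example are made up for illustration.

```python
from typing import Iterable, List, Set


def rpms_to_report(loop_rpms: Set[int], installed_rpms: Iterable[int]) -> List[int]:
    """RPM values among newly installed DDMs that are not yet on the loop.

    Each new value is reported once, however many installed DDMs share it.
    """
    return sorted(set(installed_rpms) - loop_rpms)


if __name__ == "__main__":
    loop = {10000}                      # hypothetical: every DDM on the loop
    first_install = [10000] * 6 + [15000] * 2
    print(rpms_to_report(loop, first_install))    # [15000]
    loop |= set(first_install)          # the new RPM is accepted onto the loop
    second_install = [15000] * 8
    print(rpms_to_report(loop, second_install))   # []
```

Once a different RPM is accepted, it becomes part of the loop's set of known values, which is why a later installation with the same RPM produces no notification.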

Isolation
1. Do you want to display the RPMs of the DDMs on this loop?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Do you want to complete the installation with the DDMs that are currently on the loop?
   - Yes, go to step 10.
   - No, go to step 6.
3. To display the RPMs of the DDMs on this loop, do the following:
   a. Note the Loop Name (color) of the loop where the installation is being done.
   b. From the service terminal, select Exit Install to display the Main Service Menu, then select:
      Utility Menu > Show Storage Facility Resources Menu > List DDMs on an SSA Loop by Rank Site
      Select the line with the install Loop Name (color). Scroll up and down on the screen to view the Rank Sites and Capacities of the DDMs on this loop.
   c. Continue with the next step.
4. Now that you have viewed the DDM RPMs, do you want to complete the installation with the DDMs that are currently on the loop?
   - Yes, complete the installation. Continue with the next step.
   - No, go to step 7 to replace DDMs in, or remove, the DDM bay(s) you just installed.
5. Return to the install process on the service terminal. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
   Install/Remove Menu > DDM Bay (Drawer) Menu
   Continue into the install process you performed before until the screen that directed you to this MAP appears. Go to step 10.
6. At the service terminal, select Exit Install to return to the Main Service Menu. Continue with the next step.
7. Do you want to leave the DDM bay on the loop and replace only some of the DDMs that are currently in it?
   - Yes, continue with the next step.
   - No, go to step 9.
8. Replace the desired DDMs, and then return to Install to reverify the DDMs being installed. Replace DDMs only in the newly installed DDM bays; do not replace DDMs in any other DDM bay.
   - If you were installing a single DDM bay, you may replace any of the DDMs in that DDM bay.
   - If you were installing multiple DDM bays, you may replace any of the DDMs in the newly installed DDM bays.
   After you have replaced all the DDMs you want to replace, go to step 5 to verify that the loop is now correct.
9. Do the following to uninstall the DDM bay or DDM bays that you just installed:
   a. From the service terminal Main Service Menu, select:
      Install/Remove Menu > DDM Bay (Drawer) Menu > Remove Device Drawer
      Select and quiesce the cluster you are powering off.
      Attention: Select Continue to Remove Device Drawers.
   b. Find the lines with the Resource Locations of the DDM bays you just installed. Select the highest line for one of those DDM bays. That DDM bay, and all the DDM bays below it on the same loop, will be removed from the loop.
      Note: If you were installing a single DDM bay, you must remove only that DDM bay. If you were installing multiple DDM bays, you must remove all of the new DDM bays.
   c. Continue through the removal process. When it is complete, you may continue with any operation desired.
10. Select Continue with Install. The install process continues to completion, and the new effective capacity is accepted. Installation is complete.


MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
DDMs with the same storage capacity but different speeds (RPM) were found on the same SSA loop. DDMs with RPMs of 15K or higher are not allowed on the same SSA loop as DDMs of the same capacity but slower RPM.
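The loop rule in this description can be expressed as a simple check. The sketch below is a minimal illustration, assuming each DDM on a loop is represented as a hypothetical (capacity_gb, rpm) pair; `violates_rpm_mix_rule` is not a service-terminal function, and the capacities shown are illustrative only.

```python
from collections import defaultdict


def violates_rpm_mix_rule(ddms):
    """Return True if a DDM of 15K RPM or faster shares an SSA loop with a
    slower DDM of the same capacity.

    ddms: iterable of (capacity_gb, rpm) pairs for one loop. This data
    shape is a hypothetical illustration, not a service-terminal format.
    """
    rpms_by_capacity = defaultdict(set)
    for capacity_gb, rpm in ddms:
        rpms_by_capacity[capacity_gb].add(rpm)

    for rpms in rpms_by_capacity.values():
        fastest = max(rpms)
        # Within one capacity group, mixed speeds violate this rule only
        # when the fastest DDM in the group runs at 15K RPM or higher.
        if fastest >= 15000 and min(rpms) < fastest:
            return True
    return False


print(violates_rpm_mix_rule([(36, 15000), (36, 10000)]))  # True
print(violates_rpm_mix_rule([(36, 15000), (73, 10000)]))  # False
```

The second call passes because the slower DDM has a different capacity. A loop that mixes two speeds below 15K within one capacity is not covered by this rule; MAP 3614 describes that case as permitted but not recommended.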

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Was a DDM with a different RPM than the one called for in the FRU list used as a replacement FRU during a repair?
   - Yes, get a new DDM FRU with the RPM called for in the FRU list, then retry the repair. If the problem is still present, contact your next level of support.
   - No, continue with the next step.
3. You have attempted to install DDMs whose RPMs differ from those of other DDMs already on the loop. Determine whether you need to replace individual DDMs or an entire DDM bay. Do you need to remove an entire DDM bay?
   - Yes, do the following: Remove the DDM bays you just attempted to install, using the Remove Drawer option. Replace any DDMs whose RPMs differ from those of the other DDMs already on the loop. Retry the DDM installation. If it still fails, contact your next level of support.
   - No, replace any DDMs whose RPMs differ from those of the other DDMs already on the loop. Retry the DDM installation. If it still fails, contact your next level of support.

MAP 3617: DDM Size is Not Supported


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.


Description
One or more disk drive modules (DDMs) with an unsupported storage capacity have been detected. These DDMs are either not supported by this machine model or not supported by the level of Licensed Internal Code (LIC) on this machine.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine whether the unsupported DDM was installed as a replacement FRU during a repair or as a new DDM. Was the unsupported DDM installed as a replacement FRU during a repair?
   - Yes, obtain a new DDM FRU that is a compatible or supported replacement for the original DDM, then retry the repair. If the problem happens again with a compatible or supported replacement FRU, contact your next level of support.
   - No, you have attempted to install a DDM bay that contains one or more DDMs with an unsupported storage capacity. Do the following: Remove the DDM bays you just attempted to install, using the Remove Drawer option. Replace any DDMs of unsupported capacity with DDMs of a supported capacity. Retry the installation. If the problem happens again, contact your next level of support.

MAP 3618: Replacement DDM Has Slower RPM Than Called For
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
A DDM used as a replacement has a slower RPM than the one called for on the FRU list. A replacement DDM should have an RPM equal to or higher than the one called for on the FRU list. If a DDM with a lower RPM is spared into an array of higher-RPM DDMs, the performance of that array will be somewhat degraded. If speed of repair is more important than performance, a slower DDM can be used by activating the Allow Slower RPM Replacement switch. This switch setting is valid only for this repair.
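The acceptance rule in this description can be sketched as follows. This is a minimal model, assuming a hypothetical `DdmRepair` record for each repair; it is not a service-terminal API. Because a new record starts with the switch off, the sketch reflects that the switch setting does not carry over to later repairs.

```python
from dataclasses import dataclass


@dataclass
class DdmRepair:
    """Hypothetical model of a single DDM repair (not a service-terminal API).

    allow_slower_rpm_replacement mirrors the Allow Slower RPM Replacement
    switch; it defaults to False because the switch applies to one repair only.
    """

    required_rpm: int
    allow_slower_rpm_replacement: bool = False

    def accepts(self, replacement_rpm: int) -> bool:
        # An equal or faster DDM is always acceptable; a slower one is
        # accepted only when the switch is set for this repair.
        return replacement_rpm >= self.required_rpm or self.allow_slower_rpm_replacement


print(DdmRepair(15000).accepts(10000))  # False
print(DdmRepair(15000, allow_slower_rpm_replacement=True).accepts(10000))  # True
```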

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Decide whether you want to degrade subsystem performance by allowing a lower-RPM replacement DDM to be installed (see Description above). Do you want to install a lower-RPM DDM and degrade loop performance?
   - Yes, continue with the next step.
   - No, go to step 6.
3. You have chosen to degrade loop performance by allowing a slower replacement DDM than the one called for on the FRU list. This step sets Allow Slower RPM Replacement:
   a. Return to the service terminal and record the number of the problem you are working on.
   b. Press F3 until the Main Service Menu is displayed.
   c. From the service terminal Main Service Menu, select:
      Configuration Option Menu > Change/Show Control Switches
   d. Select Allow Slower RPM Replacement.
   e. Change the value to True.
   f. Continue with the next step.
4. Press F3 until the Main Service Menu is displayed.
   a. From the service terminal Main Service Menu, select:
      Repair Menu > Show/Repair Problems Needing Repair
   b. Select the problem with the number you recorded in step 3a.
   c. Select the DDM on the Possible FRUs to Replace list.
   d. Continue with the next step.
5. Continue through the repair process until the DDM replacement is called for. Do not replace the DDM; continue through the replace process as if you had replaced it. Did the repair process complete successfully?
   - Yes, this problem is resolved. Continue to the end of the repair process to see if there are any additional problems.
   - No, continue with the problem displayed on the service terminal.
6. Replace the DDM with a DDM of the correct RPM:
   a. Select the DDM on the Possible FRUs to Replace list.
   b. Continue with the next step.
7. Continue through the repair process until the DDM replacement is called for. Replace the DDM with a DDM of the correct RPM, then continue through the replace process. Did the repair process complete successfully?
   - Yes, this problem is resolved. Continue to the end of the repair process to see if there are any additional problems.
   - No, continue with the problem displayed on the service terminal.


MAP 3619: This Repair Requires a Larger Capacity DDM


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The storage capacity of the replacement DDM is smaller than required. A replacement DDM must have at least the storage capacity of the DDM shown on the FRU list. One of the following conditions exists:
- The storage capacity of the replacement DDM is smaller than that of the DDM listed in the FRU list of the original problem.
- The current conditions on the loop now require the replacement DDM to have a larger storage capacity than specified in the FRU list of the original problem. This occurs when a member of a good array is replaced during a service call and the only spare available has a larger capacity. The required storage capacity of the replacement DDM is then increased to the size of the DDM listed in the FRU list of the current problem.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Select the DDM listed in the Possible FRUs to Replace portion of the problem.
3. Proceed through the repair process to the DDM replacement. Replace the DDM with a DDM whose storage capacity is equal to or larger than that of the DDM requested in the Possible FRUs to Replace portion of the problem.

MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
One or more DDMs with a smaller storage capacity than the existing DDMs have been added to an SSA loop. All DDMs in an SSA loop must have the same storage capacity.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which of the DDMs added to the SSA loop have a smaller storage capacity than the original DDMs. Remove those new DDMs and replace them with DDMs that have the same or larger storage capacity than the existing DDMs.
3. Continue with the install or repair.

MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
DDMs whose characteristics differ from those of the existing DDMs, or from each other, have been added to SSA loop A. All DDMs in an SSA loop must have the same bus speed.
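The bus-speed comparison this MAP asks you to make by eye can be sketched as a small check. This is a minimal illustration, assuming hypothetical DDM labels and illustrative speed values that are not tied to any specific SSA link rate; `ddms_to_replace` is not a service-terminal function. Following the MAP's guidance, only the newly added DDMs are flagged for replacement.

```python
def ddms_to_replace(existing_speeds, new_ddms):
    """Return the labels of newly added DDMs whose bus speed does not match
    the DDMs already on the loop.

    existing_speeds: bus speeds of the DDMs already on the loop.
    new_ddms: (label, bus_speed) pairs for the DDMs just installed.
    Labels and speed values are hypothetical placeholders.
    """
    speeds = set(existing_speeds)
    if len(speeds) != 1:
        # With no existing DDMs, or an already-mixed loop, there is no single
        # reference speed to compare against.
        raise ValueError("need existing DDMs that share one bus speed")
    (loop_speed,) = speeds
    # Only the newly added DDMs are candidates for replacement.
    return [label for label, bus_speed in new_ddms if bus_speed != loop_speed]


print(ddms_to_replace([40, 40, 40], [("new-1", 40), ("new-2", 20)]))  # ['new-2']
```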

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Record its Resource Name (rsssaxx).
3. From the service terminal Main Service Menu, select:
   Utility Menu > Show Storage Facility Resources Menu > List DDMs on an SSA Loop
   Select loop A of the SSA device card whose Resource Name you recorded.
4. Observe the bus speed of each DDM on the loop. All DDMs on a loop must have the same characteristics. To correct the problem, you will have to replace either:
   - An entire DDM bay, or
   - Individual DDMs
   Notes:
   a. To correct the characteristics problem, replace only the DDMs or DDM bays that you just placed on the loop.
   b. The model of the DDMs on the loop is shown. This tells you at least one DDM model that can be used on the loop. Other DDM models with the same characteristics may also be usable on the same loop.
   Continue with the next step.
5. Determine whether you need to replace individual DDMs or an entire DDM bay. Do you need to remove an entire DDM bay?
   - Yes, go to step 7.
   - No, go to step 6.
6. Remove any DDMs with the wrong characteristics and replace them with the correct DDMs. Then, to determine whether there are any other problems, go to MAP 3500: Verifying a DDM Bay Repair on page 283.
7. Remove the entire DDM bay that was just installed. Press F3 until the Main Service Menu is displayed, then select:
   Install/Remove Menu > DDM Bay (Drawer) Menu > Remove Device Drawers
   Select the DDM bay you are removing and follow the instructions on the service terminal.

MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
DDMs whose characteristics differ from those of the existing DDMs have been added to SSA loop B. All DDMs in an SSA loop must have the same bus speed.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Record its Resource Name (rsssaxx).
3. From the service terminal Main Service Menu, select:
   Utility Menu > Show Storage Facility Resources Menu > List DDMs on an SSA Loop
   Select loop B of the SSA device card whose Resource Name you recorded.
4. Observe the bus speed of each DDM on the loop. All DDMs on a loop must have the same characteristics. To correct the problem, you will have to replace either:
   - An entire DDM bay, or
   - Individual DDMs
   Notes:
   a. To correct the characteristics problem, replace only the DDMs or DDM bays that you just placed on the loop.
   b. The model of the DDMs on the loop is shown. This tells you at least one DDM model that can be used on the loop. Other DDM models with the same characteristics may also be usable on the same loop.
   Continue with the next step.
5. Determine whether you need to replace individual DDMs or an entire DDM bay. Do you need to remove an entire DDM bay?
   - Yes, go to step 7.
   - No, go to step 6.
6. Remove any DDMs with the wrong characteristics and replace them with the correct DDMs. Then, to determine whether there are any other problems, go to MAP 3500: Verifying a DDM Bay Repair on page 283.
7. Remove the entire DDM bay that was just installed. Press F3 until the Main Service Menu is displayed, then select:
   Install/Remove Menu > DDM Bay (Drawer) Menu > Remove Device Drawers
   Select the DDM bay you are removing and follow the instructions on the service terminal.

MAP 3627: Unable to Determine DDM Use


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The SSA adapter failed to retrieve the state of the DDMs.

Isolation
1. Check the problem log for Open or Pending problems with ESC=E100. Use:
   Repair Menu > Show/Repair Problems Needing Repair
   Did you find any problems with ESC=E100?
   - Yes, repair the listed problems, and then close the problem that sent you to this MAP.
   - No, continue with the next step.
2. Check the DDM_State of all the DDMs attached to the SSA card listed in the problem that sent you here. From the Main Service Menu, select:
   Utility Menu > Display Physical and Logical Configuration Menu > Display DDMs Physical and Logical Information
   Check for any DDMs that show a DDM_State of fail. Are any DDMs listed as fail?
   - Yes, continue with the next step.
   - No, go to step 4.
3. Repair any DDMs listed as fail in the previous step, then close the problem that sent you to this MAP. To repair the DDM(s), use:
   Repair Menu > Repair / Verify DDM(s) and Format / Resume DDM(s)
4. Set a PE password and then call your next level of support. To set a PE password, use:
   Configuration Options Menu > Configure Communications Resources Menu > Call Home / Remote Services Menu > Enable Product Engineering Access
   Make a note of the password that is generated.

MAP 3640: Other Cluster Fenced - Unable to Verify SSA Loop


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
You are connected to one cluster and are attempting to verify a repair on an SSA loop. This repair verification requires a test to be run on both clusters. The verification failed because the alternate cluster was fenced. Either of two situations causes this:
1. There is a problem on the alternate cluster that must be resolved before an SSA repair can be verified.
2. The failure on the SSA loop caused the alternate cluster to fence. In this case, the alternate cluster must be powered off and then on to clear the fence.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Examine the other open problems to see whether any problems that are not SSA loop problems need to be repaired.
   a. Go to the list of other problems. From the service terminal Main Service Menu, select:
      Repair Menu > Show/Repair Problems Needing Repair
   b. Look for any problem whose ESC does NOT equal 12xx, Cxxx, Dxxx, or Exxx. Are there any problems with ESCs other than these?
      - Yes, the fence of the other cluster was probably caused by a problem other than the SSA loop problem you were repairing. Repair those problems first, then return to the SSA loop problems. Continue with the next step.
      - No, the fence of the other cluster was caused by a loop problem. Go to step 4 to reset the other cluster's fence before continuing to repair the SSA loop.
3. Repair non-SSA loop problems before returning to this SSA loop problem.
   a. Repair the problems whose ESC does NOT equal 12xx, Cxxx, Dxxx, or Exxx.
   b. When you have repaired all the non-SSA loop problems, return to the SSA loop problem you were repairing and follow the instructions for that problem.
4. This step quiesces and then powers off the alternate cluster; the next step powers it on again.
   a. Return to the service terminal and press F3 until the Main Service Menu is displayed. From the Main Service Menu, select:
      Repair Menu > Alternate Cluster Repair Menu > Quiesce the Alternate Cluster
      Wait for processing to complete. Select: Make resources not available for customer use. Wait for: Quiesce was successful.
   b. To power off the alternate cluster, press F3 once. From the service terminal Alternate Cluster Repair Menu, select:
      Power Off the Alternate Cluster > Power Off the cluster now
      Wait for: The cluster has been successfully powered off. Continue with the next step.
5. To power on the alternate cluster, press F3 once. From the service terminal Alternate Cluster Repair Menu, select:
   Power On the Alternate Cluster > Power On the cluster now
   Wait for: The alternate cluster has been powered on. Wait for the Ready light to come on when the IML is complete. Continue with the next step.
6. Return to the problem you were originally working on; you will now be able to complete it. Return to the service terminal and press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
   Repair Menu > Show/Repair Problems Needing Repair
   Select the original problem on which you were working.
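Step 2b of this MAP sorts open problems by ESC pattern. As a minimal sketch, assuming each ESC is a four-character hexadecimal string (as in E100), the filter can be written as a regular expression; `non_ssa_loop_problems` is a hypothetical helper, not a service-terminal function. A non-empty result corresponds to the "Yes" branch of step 2b.

```python
import re

# ESC patterns that step 2b treats as SSA loop problems: 12xx, Cxxx, Dxxx, Exxx.
# Assumes an ESC is four hexadecimal characters (as in E100).
SSA_LOOP_ESC = re.compile(r"(12[0-9A-F]{2}|[CDE][0-9A-F]{3})")


def non_ssa_loop_problems(escs):
    """Return the ESCs that do NOT match 12xx, Cxxx, Dxxx, or Exxx."""
    return [esc for esc in escs if not SSA_LOOP_ESC.fullmatch(esc.upper())]


print(non_ssa_loop_problems(["1234", "C0A1", "E100", "3100", "1300"]))  # ['3100', '1300']
```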

MAP 3650: Wrong, Missing, or Failing Bypass Card


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
In a DDM bay, where a bypass card should be plugged, one of the following conditions is present:
- A different kind of card is plugged in that location
- There is no card in that location
- The bypass card in that location is failing
- The controller card in that DDM bay is failing
- The DDM bay backplane is failing

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of Volume 3. Is there a card plugged into that location?
   - Yes, continue with the next step.
   - No, select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
     Note: Before installing the new card, verify that the jumpers on the bypass card are set correctly: jumper pins 2 to 3 at both the top and bottom jumper positions.
     Figure 125. DDM bay Bypass Card Jumper Settings (s009436): jumper pins 2 to 3 at both the top and bottom positions.
     If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with any problem displayed by the verification process.
3. Look at the card(s) plugged into the bypass card position. Is it a single card with two SSA connectors on it?
   - Yes, there is a bypass card in this position. Continue with the next step.
   - No, the card in this position is a passthrough card instead of a bypass card. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
     Note: Be sure that the two jumpers on the bypass card are in the correct positions. See the jumper figures in Bypass and Passthrough Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
     If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with the next step.
4. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
   Note: Be sure that the two jumpers on the bypass card are in the correct positions.
   Figure 126. DDM bay Bypass Card Jumper Settings (s009436): jumper pins 2 to 3 at both the top and bottom positions.
   - If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - If the verification failed, continue with the next step.
5. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
   - If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - If the verification failed, continue with the next step.
6. Select the frame from the Possible FRUs to Replace list on the service terminal. Install a new frame in that location and proceed through the verification process.
   - If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - If the verification failed, continue with any problem displayed by the verification process.

MAP 3652: Wrong, Missing, or Failing Passthrough Card


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
In a DDM bay, where a passthrough card should be plugged, one of the following conditions is present:
- A different kind of card is plugged in that location
- There is no card in that location
- The passthrough card in that location is failing
- The controller card in that DDM bay is failing

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the passthrough card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of Volume 3. Is there a card plugged into that location?
   - Yes, continue with the next step.
   - No, select the passthrough card from the Possible FRUs to Replace list on the service terminal. Install a passthrough card in that location and proceed through the verification process. If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with any problem displayed by the verification process.
3. Look at the card(s) plugged into the passthrough card position. Is it a single card with two SSA connectors on it?
   - Yes, the card in this position is a bypass card instead of a passthrough card. Select the passthrough card from the Possible FRUs to Replace list on the service terminal. Install a passthrough card in that location and proceed through the verification process. If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with any problem displayed by the verification process.
   - No, there is a passthrough card in this position. Continue with the next step.
4. The passthrough card is failing. Select the passthrough card from the Possible FRUs to Replace list on the service terminal. Install a passthrough card in that location and proceed through the verification process.
   - If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - If the verification failed, continue with the next step.
5. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
   - If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - If the verification failed, call your next level of support.

310

VOLUME 1, TotalStorage ESS Service Guide


MAP 3654: Bypass Card Jumpers Wrong


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
- A bypass card has one or both jumpers in the wrong position.
- A controller card in that DDM bay is failing.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
- If the FRU is listed and is selectable in the problem, use that method first.
- If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.

2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of Volume 3.

3. Select the bypass card from the Possible FRUs to Replace list on the service terminal.

4. Remove the bypass card. Verify that the jumpers on the bypass card are set correctly: pins 2 to 3 must be jumpered at both the top and bottom jumper positions.

Figure 127. DDM Bay Bypass Card Jumper Settings (s009436). [Figure: DDM bay bypass card showing the top and bottom jumper positions (pins 3, 2, 1), each jumpered on pins 2 to 3.]

Reinstall the bypass card and verify the repair:
- If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
- If the verification failed, continue with the next step.

5. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
- If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
- If the verification failed, continue with any problem displayed by the verification process.

MAP 3656: 20 MB SSA Cable Installed Where 40 MB Cable Expected


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
One of the following conditions exists:
- An SSA cable is unplugged.
- A 20 MB SSA cable is plugged in where a 40 MB SSA cable should be used. Note: 20 MB SSA cables are grey and 40 MB SSA cables are blue.
- The bypass card at that location has failed.
- The controller card in that DDM bay has failed.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
- If the FRU is listed and is selectable in the problem, use that method first.
- If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.

2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of Volume 3. Determine the color of the SSA cables connected to the bypass card. Are both of the cables blue?
- Yes, continue with the next step.
- No, the wrong type of SSA cable is installed. Select the bypass card from the Possible FRUs to Replace list on the service terminal, but do not replace the bypass card. Replace any grey SSA cables with blue SSA cables, then proceed through the verification process. If the verification ran without error, the problem is resolved; go to step 9 on page 313. If the verification failed, continue with any problem displayed by the verification process.

3. Are both of the SSA cables connected to the bypass card?
- Yes, continue with the next step.
- No, connect the cable that is not connected. Select the cable from the Possible FRUs to Replace list on the service terminal, but do not replace the cable. Proceed through the verification process. If the verification ran without error, the problem is resolved; go to step 9. If the verification failed, continue with any problem displayed by the verification process.

4. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Do not remove or replace the bypass card at this time.

5. Remove the two SSA cables from the bypass card and inspect the pins in each connector. Are there three pins in each connector?
- Yes, continue with the next step.
- No, replace any SSA cable with fewer than three pins. Connect the SSA cables and continue through the verification process without replacing any other FRUs. If the verification ran without error, the problem is resolved; go to step 9. If the verification failed, continue with any problem displayed by the verification process.

6. Inspect the SSA connectors for bent pins. Do any of the pins need to be straightened?
- Yes, straighten the pins and reconnect the cables. Go through the verification process without replacing any FRUs. If the verification ran without error, the problem is resolved; go to step 9. If the verification failed, continue with any problem displayed by the verification process.
- No, continue with the next step.

7. The bypass card may have a problem that causes it to report the wrong cable speed. Replace the bypass card, then proceed through the verification process.
- If the verification ran without error, the problem is resolved. Go to step 9.
- If the verification failed, continue with the next step.

8. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
- If the verification ran without error, the problem is resolved. Continue with the next step.
- If the verification failed, continue with any problem displayed by the verification process.

9. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
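The physical checks in steps 2, 3, and 5 of this MAP form an ordered checklist in which the first failing check determines the corrective action. The sketch below is hypothetical and is not part of the service terminal; the `SsaCable` record and its fields are assumptions for illustration, filled in from your own visual inspection. It encodes only the cable color (grey = 20 MB, blue = 40 MB), connection, and pin-count checks described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SsaCable:
    """One SSA cable at the bypass card, as recorded by visual inspection (hypothetical)."""
    color: str       # "blue" (40 MB) or "grey" (20 MB)
    connected: bool  # is the cable plugged into the bypass card?
    pins: int        # pins present in the connector; three are expected


def first_cable_action(cables: List[SsaCable]) -> Optional[str]:
    """Return the action for the first failing check, or None if all checks pass."""
    # Step 2: both cables must be blue (40 MB).
    if any(c.color != "blue" for c in cables):
        return "replace grey (20 MB) cables with blue (40 MB) cables"
    # Step 3: both cables must be connected.
    if any(not c.connected for c in cables):
        return "connect the cable that is not connected"
    # Step 5: each connector must have three pins.
    if any(c.pins < 3 for c in cables):
        return "replace the cable with fewer than three pins"
    return None  # all pass: continue with the bent-pin check in step 6


if __name__ == "__main__":
    cables = [SsaCable("blue", True, 3), SsaCable("grey", True, 3)]
    print(first_cable_action(cables))
```

Ordering the checks this way mirrors the MAP: a later check is only reached when every earlier one passes.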

MAP 3680: Isolating a Two DDMs Detect Over-Temperature Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.


Description
The 2105 requires that the temperature of the room air entering it not exceed 32C (89.6F). With a room temperature below 32C (89.6F), the base casting temperature of the DDMs should not exceed 50C (122F). You have been directed to this MAP because the base casting temperature on two DDMs has exceeded 50C (122F). This may be caused by:
- The air temperature surrounding the DDMs exceeding the maximum allowed temperature.
- Restricted air flow to the DDMs.
- Faulty temperature-sensing circuits on the DDMs.
- Faulty DDMs generating too much heat.
The repair strategy of this MAP is first to determine whether the air supply to the DDMs is too warm or restricted. An over-temperature condition is not reported until two or more DDMs have sensed an over-temperature. It is possible that one of the two DDMs has been failing for some time and that the second DDM has just failed. If the over-temperature condition cannot be corrected by examining the air supply, you will be directed to replace the DDMs one at a time.
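The limits above can be expressed as a small check. The following is a hypothetical helper, not part of the 2105 service terminal; it assumes you have readings in degrees Celsius and applies only the 32C inlet-air and 50C DDM base-casting limits stated in this description, together with the Celsius-to-Fahrenheit conversion used for the figures in parentheses.

```python
from typing import List

# Limits stated in the MAP 3680 description (constant names are hypothetical).
MAX_INLET_AIR_C = 32.0
MAX_DDM_CASTING_C = 50.0


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def over_temperature_causes(inlet_air_c: float, ddm_casting_c: float) -> List[str]:
    """Return which of the stated limits a pair of readings exceeds."""
    causes = []
    if inlet_air_c > MAX_INLET_AIR_C:
        causes.append("room air entering the 2105 is too warm")
    if ddm_casting_c > MAX_DDM_CASTING_C:
        causes.append("DDM base casting exceeds 50C")
    return causes


if __name__ == "__main__":
    for limit in (MAX_INLET_AIR_C, MAX_DDM_CASTING_C):
        print(f"{limit:.0f}C = {c_to_f(limit):.1f}F")
    print(over_temperature_causes(inlet_air_c=30.0, ddm_casting_c=52.0))
```

A hot DDM with normal inlet air, as in the usage example, points toward the air flow, sensor, or DDM causes listed above rather than the room temperature.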

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
- If the FRU is listed and is selectable in the problem, use that method first.
- If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.

2. Record the Problem ID of this problem. Look at the time stamp of the last occurrence. If it is more than 30 minutes old, the problem is resolved and can be closed. Was the last occurrence more than 30 minutes ago?
- Yes, go to step 14 on page 316.
- No, continue with the next step.

3. Determine the approximate temperature of the air at the front and rear of each 2105 Model 800 and Expansion Enclosure rack. Does the air exceed 32C (90F)?
- Yes, contact the customer and have the temperature of the room lowered, then go to step 11 on page 315.
- No, continue with the next step.

4. Look for other problems with Failing Resource = rsuplnrsnsxxx, rslplnrsnsxxx, or ssaxxx. Are there any such problems?
- Yes, repair all of these problems; this may lower the DDM temperatures. Then return to this MAP and go to step 11 on page 315.
- No, continue with the next step.

5. Locate the DDMs shown in the Possible FRUs to Replace section of the problem details or in your list from the temperature utility. Note the FRU Location for the FRUs and refer to Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3.


6. Open the rack cover adjacent to those drive locations and check whether anything is interfering with the air flow between the DDMs and the covers. Did you find anything interfering with the air flow to those drives?
- Yes, remove the obstruction, then go to step 11.
- No, continue with the next step.

7. Verify that the fans at the top of the rack are all turning. Note: You can hold a strip of paper over each fan to see whether it is turning. For the location of these fans, see 2105 Model 800 and Expansion Enclosure Storage Cage Fan (Top) Location Codes in chapter 7 of Volume 3. Are all the fans turning?
- Yes, continue with the next step.
- No, replace the fans that are not turning, then go to step 11.

8. Have you already replaced the first of the two DDMs displayed on the service terminal as Possible FRUs to Replace?
- Yes, go to step 10 and replace the second DDM displayed on the service terminal.
- No, go to the next step and replace the first DDM displayed on the service terminal.

9. Replace the first of the two DDMs displayed on the service terminal as Possible FRUs to Replace, then verify the repair. Did repair verification run without error?
- Yes, go to step 11 to determine whether the over-temperature problem is resolved.
- No, repair the problems reported by the repair verification.

10. Replace the other DDM displayed on the service terminal under Possible FRUs to Replace, then verify the repair. Note: The service terminal determines whether the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. Did repair verification run without error?
- Yes, go to the next step to determine whether the over-temperature problem is resolved.
- No, repair the problems reported by the repair verification.

11. Wait 15 minutes after the last action that may have decreased the DDM temperatures. At the end of this time, press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select: Utility Menu, Machine Test Menu, SSA Devices Temperature Test. At the top of the display there will be a Maximum Temperature = xxC (yyF). Is the Maximum Temperature greater than 40C?
- Yes, continue with the next step.
- No, the problem is resolved; go to step 14.

12. Look down the display and record the Locations of all the DDMs whose temperature is greater than 40C. Is there only one DDM Location on your list?
- Yes, continue with the next step.
- No, go back to step 5 on page 314 and use the FRU Location List.

13. Replace the DDM. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select: Repair Menu, Replace a FRU. Move the cursor to the desired item and press Enter:
- Select the DDM bay that contains the DDM.
- Select the DDM you wish to replace.
Follow the instructions to replace the DDM, then go to step 11 on page 315.

14. Close the problem you have just resolved, using the Problem ID recorded in step 2 on page 314. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem. Select the Problem ID you recorded earlier. Follow the service terminal instructions to see whether all problems are resolved.
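Steps 11 through 13 amount to a simple decision on the Temperature Test display. The following is a minimal sketch, assuming you have transcribed each DDM Location and its temperature in degrees Celsius by hand; the function name, the data layout, and the location strings are hypothetical and are not produced by the service terminal.

```python
from typing import Dict

THRESHOLD_C = 40  # threshold used by the Temperature Test steps in this MAP


def next_action(readings: Dict[str, int]) -> str:
    """Map transcribed {DDM location: temperature in C} readings to the next MAP step."""
    if not readings:
        raise ValueError("no temperature readings transcribed")
    # Step 11: a Maximum Temperature of 40C or less means the problem is resolved.
    if max(readings.values()) <= THRESHOLD_C:
        return "step 14: problem resolved, close it"
    # Step 12: list every DDM location above the threshold.
    hot = sorted(loc for loc, temp in readings.items() if temp > THRESHOLD_C)
    if len(hot) == 1:
        return f"step 13: replace the DDM at {hot[0]}"
    return "step 5: recheck the locations " + ", ".join(hot)


if __name__ == "__main__":
    sample = {"DDM-A": 38, "DDM-B": 44, "DDM-C": 39}  # hypothetical readings
    print(next_action(sample))
```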

MAP 3685: Isolating a Multiple DDM Detect Over-Temperature Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.

Description
The 2105 requires that the temperature of the room air entering it must not exceed 32C (89.6F). With a room temperature of less than 32C (89.6F), the base casting temperature of the DDMs should not exceed 60C (140F). You have been directed to this MAP because the base casting temperature on more than two DDMs has exceeded 60C (140F). This may be caused by the air temperature surrounding the 2105 exceeding the maximum allowed temperature or something restricting the air flow to the DDMs. The DDMs reporting the over-temperature conditions are in a DDM bay.

Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
- If the FRU is listed and is selectable in the problem, use that method first.
- If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.

2. Determine whether the problem is still occurring. Display the problem details for the problem that sent you here and look at the time stamp in the last occurrence field. Was the last occurrence more than 30 minutes ago?
- Yes, go to step 13 on page 318.
- No, continue with the next step.

3. Determine the approximate temperature of the air at the front and rear of each 2105 Model 800 and expansion enclosure. Does the air exceed 32C (90F)?
- Yes, continue with the next step.
- No, go to step 7.

4. The room temperature exceeds the operating limit of the 2105 subsystem. Contact the customer and have the temperature of the room lowered. Inform the customer that the room temperature problem must be corrected immediately to prevent permanent damage or loss of customer data. Can the customer quickly lower the room temperature?
- Yes, go to step 12 on page 318.
- No, continue with the next step.

5. If the room temperature problem cannot be corrected quickly, the 2105 subsystem should be powered off. Have the customer switch off the machine. Did the customer switch off the machine?
- Yes, continue with the next step.
- No. If the customer cannot switch off the 2105 to prevent permanent damage or loss of customer data, contact your next level of support immediately.

6. After the customer has corrected the room temperature problem, power on the machine and continue with step 12 on page 318.

7. Look for problems needing repair that contain any of the following FRUs: SSA Device Card, Fan Sense Card, Storage Cage Fan. Are there any problems containing these FRUs?
- Yes, repair all of these problems; this may lower the DDM temperatures. After the repair, return to this MAP and go to step 12 on page 318.
- No, continue with the next step.

8. Locate the DDMs shown in the Possible FRUs to Replace section of the problem details or in your list from the temperature utility run in step 12 on page 318. Note the FRU Location for the FRUs and refer to Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3.

9. Open the rack cover adjacent to those drive locations and check whether anything is interfering with the air flow between the DDMs and the covers. Did you find anything interfering with the air flow to those drives?
- Yes, remove the obstruction, then go to step 12 on page 318.
- No, continue with the next step.

10. Verify that the fans at the top of the rack are all turning. Note: You can hold a strip of paper over each fan to see whether it is turning. For the location of these fans, see 2105 Model 800 and Expansion Enclosure Storage Cage Fan (Top) Location Codes in chapter 7 of Volume 3. Are all the fans turning?
- Yes, continue with the next step.
- No, replace the fans that are not turning, then go to step 12.

11. This is a complex problem. Call your next level of support.

12. Wait 15 minutes after the last action was performed; this may reduce the temperature of the DDMs. After 15 minutes, continue with the next step.

13. From the service terminal Main Service Menu, select: Utility Menu, Machine Test Menu, SSA Devices Temperature Test. At the top of the display there will be a Maximum Temperature = xxC (yyF). Is the Maximum Temperature greater than 40C?
- Yes, continue with the next step.
- No, go to step 16.

14. Look down the display and record the Locations of all the DDMs whose temperature is greater than 40C. Is there only one DDM Location on your list?
- Yes, continue with the next step.
- No, go back to step 8 on page 317 using your new list of DDM locations.

15. Replace the DDM. From the service terminal Main Service Menu, select: Repair Menu, Replace a FRU. Move the cursor to the desired item and press Enter:
- Select the DDM bay that contains the DDM.
- Select the DDM you wish to replace.
Follow the instructions to replace the DDM, then go to step 12.

16. Close the problem you have just resolved. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem. Select the Problem ID you recorded earlier. Follow the service terminal instructions to see whether all problems are resolved.
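Both over-temperature MAPs begin by asking whether the last occurrence of the problem is more than 30 minutes old. As a minimal sketch, assuming you copy the last-occurrence time stamp from the problem details into a Python `datetime` (the function name and sample dates are hypothetical), the step 2 comparison for this MAP is:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=30)  # "more than 30 minutes ago"


def step_after_age_check(last_occurrence: datetime, now: datetime) -> int:
    """Return the MAP 3685 step to go to after the step 2 time-stamp check."""
    if now - last_occurrence > STALE_AFTER:
        return 13  # recheck DDM temperatures with the Temperature Test
    return 3       # problem is still current: check the room air temperature


if __name__ == "__main__":
    last = datetime(2005, 11, 1, 9, 0)
    print(step_after_age_check(last, datetime(2005, 11, 1, 9, 45)))
    print(step_after_age_check(last, datetime(2005, 11, 1, 9, 20)))
```

An occurrence exactly 30 minutes old is not "more than 30 minutes ago", so the sketch treats it as still current.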


MAPs 4XXX: Cluster Isolation Procedures


Procedures in the MAP 4XXX group in Chapter 3 cover the cluster area of the 2105 Model 800 unit.

MAP 4010: Cluster Hang During a Failback or Error Recovery


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
While one cluster reboots, the other cluster waits for it to complete. The waiting cluster needs the other cluster to finish rebooting before it can bring the rebooted cluster back online. In this case, the rebooting cluster never completed the reboot, so it could not communicate with the waiting cluster. The cluster may be hung during the power-on firmware process or while booting AIX. The total time for all of this to occur may add up to one additional hour beyond the normal time for a cluster to come ready. The cluster reboot could have been part of a manual cluster resume during a service action, or a reboot due to an automatic microcode error recovery process in which one cluster reboots the other. Information for Product Engineering: if the cluster takes a failure during the IML back or the failback to dual-cluster operation, a different problem is created.

Isolation
1. Verify that the cluster is powered on. Press the CD-ROM drive eject button. Did the CD-ROM tray open?
- Yes, continue with the next step.
- No, go to MAP 4880: Cluster Power On Problem on page 461.

2. Determine whether the cluster hung before the AIX IPL completed by observing the CEC drawer operator panel. Are any codes displayed?
- Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
- No, continue with the next step.

3. Connect the service terminal to the failing cluster and attempt to log in. Can you log in?
- Yes, the cluster is not hung; continue with the next step.
- No, go to step 5 on page 320.

4. Disconnect the service terminal from the failing cluster and connect it to the working cluster. Use the Repair Menu and the Alternate Cluster Repair Menu options to attempt to resume the failing cluster:
- If the cluster hangs with a code displayed, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
- If the cluster completes the resume and comes ready, close the problem and use the Repair Menu, End of Call Status option.
- If the cluster does not complete the resume, call the next level of support.

5. Disconnect the service terminal from the failing cluster and connect it to the working cluster. Use the Repair Menu, Alternate Cluster Repair options to power the failing cluster off and on.
- If the cluster hangs with a code displayed, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
- If the cluster completes the power-on, continue at the next step.

6. Use the Repair Menu, Alternate Cluster Repair Menu options to attempt to resume the failing cluster. This causes the cluster to reload the code one more time. Was the resume successful?
- Yes, close the problem and then use the Repair Menu, End of Call Status option.
- No, call the next level of support.

MAP 4020: Hard Disk Drive Build Process for Both Drives
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: The FRUs and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this isolation procedure. Follow the ESD procedures in Working with ESD-Sensitive Parts in chapter 4 of the Volume 2.

Description
This procedure is used to load AIX and the 2105 Model 800 code on both hard disk drives of one cluster. The code is first loaded on one of the hard disk drives and then automatically mirrored to the other hard disk drive.

Requirements
- 2105 O/S CD volumes 1 and 2
- 2105 O/S update (PTF) CD, if required
- 2105 LIC CD
- Blank diskettes to save customization and configuration

Procedure
1. Are you doing an Automatic LIC Code update?
- Yes, continue with the next step.
- No, go to step 3.

2. Is there a problem with ESC=14xx calling a 4Axx MAP?
- Yes, go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324.
- No, continue with the next step.

3. Were you sent here from another MAP to do the hard disk drive build process?
- Yes, continue with the next step.
- No, return to the procedure that sent you here. Replacing a single hard disk drive can be done concurrently with customer operation on the cluster, using the service login Cluster Dual Hard Disk Drive Repair Menu options.

4. Were both clusters running the same level of LIC code before you entered this MAP?
- Yes, continue with the next step.
- No, the clusters were in a LIC code update/activation. Call the next level of support. (This MAP is designed to end up with both clusters at the same LIC level. You must use LIC CDs that are at the same level as the working cluster.)

5. Verify that the service terminal is connected to the cluster not being repaired; see Service Terminal Setup in chapter 8 of Volume 3.

6. Did you quiesce the failing cluster before you started this MAP?
- Yes, continue with the next step.
- No, quiesce the failing cluster using the Alternate Cluster Repair menu options from the operating cluster, then continue with the next step. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair, Quiesce the Alternate Cluster.

7. Make configuration diskette(s) from the cluster not being repaired. (Multiple diskettes will be needed if the configuration is large.)
Note: Do not use previously created configuration diskettes; they may not have current information.
From the service terminal Main Service Menu, select: Configurations Option Menu, Import/Export Configuration Data Menu, Export Configuration Data via Diskette. Follow the service terminal prompts and insert the diskette when instructed.
Note: When the diskettes are removed, label each one with the date and as a configuration diskette. (If there are multiple configuration diskettes, number them in the order they were created.)

8. Make a customization diskette now from the cluster not being repaired.
Note: Do not use previously created customization diskettes; they may not have current information.
From the service terminal Main Service Menu, select: Utility Menu, Make A Customization Diskette. Follow the service terminal prompts and insert the diskette (new media for /dev/rfd0) when prompted.

9. Insert the customization diskette in the diskette drive of the cluster being repaired.

10. Insert the 2105 O/S VER. X.X.X. volume 1 CD in the CD-ROM drive of the failing cluster. Wait until the CD-ROM drive LED stops blinking, then go to the next step.

11. Power off the cluster using the Alternate Cluster Repair Menu options.

12. Power on the cluster using the Alternate Cluster Repair Menu options.

13. Connect the service terminal to port S1 of the cluster being repaired and attempt to logically connect to the cluster.

Note: The service terminal logical connection will be lost several times. Keep logically reconnecting the service terminal so that you do not miss the displayed information.

14. The cluster will begin loading code from the 2105 CD and customization diskette to build the hard disk drive. After 20-30 minutes you will be asked to remove the 2105 O/S Volume 1 CD and the customization diskette and to install the 2105 O/S Volume 2 CD. Follow the instructions to install the CD and continue.
Notes:
a. If the CEC drawer operator panel displays code 0c31 and the following message appears on the service terminal, the cluster was unable to read the customization diskette:
****** Please define the System Console. *******
Type a 1 and press Enter to use this terminal as the system console...
The likely causes and actions are:
1) The customization diskette was not inserted. Insert the diskette and then restart at step 9 on page 321.
2) The customization diskette is corrupted. Create another diskette (step 8 on page 321) and then restart at step 9 on page 321.
3) The diskette drive is failing. Replace the diskette drive and then restart at step 9 on page 321.
4) The bootlist is incorrect. Use MAP 43A0: Bootlist Management Using SMS on page 387 to repair it and then restart at step 9 on page 321.
b. If any of the following symptoms occur, the CD image is not being read:
- CEC drawer operator panel code 20EE000B (no bootable devices found).
- The cluster begins booting from one of the hard drives; for example, the Init CPI4 message appears on the CEC drawer operator panel.
- The symptoms that sent you to this MAP recur.
The likely causes and actions are:
1) Dirty CD. Clean the CD and then retry the failing operation from step 9 on page 321.
2) Failing CD or CD-ROM drive. Replace the FRU and then return to step 9 on page 321.
3) Incorrect bootlist. Use MAP 43A0: Bootlist Management Using SMS on page 387 to check and correct it, then restart at step 9 on page 321.

15. The cluster will reboot. Reconnect the service terminal. Ignore any error messages that may temporarily display as the status messages scroll by.

16. After 10-15 minutes, a message will appear asking you to do one of the following:
- Remove the 2105 O/S Volume 2 CD and install the 2105 O/S update CD. In that case, follow the instructions to install the CD and select the option to continue, then go to the next step.
- Remove the 2105 O/S Volume 2 CD and install the 2105 LIC CD and the first configuration diskette. In that case, follow the instructions to install the CD and configuration diskette and go to step 18 on page 323.

17. After a few minutes you will be asked to remove the 2105 O/S update CD and install the 2105 LIC CD and the first Configuration diskette. Please follow the instructions to install the CD plus configuration diskette and select the option to continue. 18. After a few minutes you will see a message to inform you that the configuration data is being read. You will then be asked to insert the additional configuration diskettes as required. Please follow the instructions to install the remaining configuration diskette(s) and select the option to continue. 19. After a few minutes you will be asked to remove the 2105 LIC CD and configuration diskette. Please follow the instructions to remove the CD and configuration diskette and select the option to continue. 20. After a few minutes you will see a message to indicate that the cluster hard disk drives are being mirrored. 21. After approximately 40 minutes a message will be displayed to indicate that the Hard Drive Rebuild has completed. When completed, do the following: a. Type 1 and press Enter to continue. b. Messages will display and then the Copyright screen will display. c. Login to the other cluster. (Use the normal S2 port.) d. Use the Alternate Cluster Repair Menu options to power off, and power on the cluster being repaired. 22. Wait up to 45 minutes for the cluster to come ready and then attempt to login with the service terminal. Was the service terminal able to login to the cluster being repaired? v Yes, continue with the next step. v No, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567. 23. With the service terminal still connected to the cluster being repaired, display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option): v If there are any new problems for the cluster (CEC drawer or I/O drawer), repair them before continuing. 
v If the last occurrence timestamp in an existing cluster (CEC drawer or I/O drawer) problem was updated, then the cluster is still failing and needs to be repaired before continuing. 24. Connect the service terminal to the cluster not being repaired. Use the Alternate Cluster Repair Menu option to resume the alternate cluster. Wait for the operator panel cluster Ready Indicator LED to come on and then go to the next step. 25. If the service terminal repair process did not automatically close the problem, then use this step to close it now. Press F3 on the service terminal until the Main Service Menu is displayed, then select: Repair Menu Close a Previously Repaired Problem. Note: Closing or cancelling a problem will attempt to return to customer use any fenced or quiesced resources. If the problem was not fully repaired, the existing problem may be updated or a new problem created. 26. Use the service terminal options listed below to ensure all resources for this repair have been returned to customer use (they will not be listed). Any listed
Problem Isolation Procedures, CHAPTER 3

323

resources are not available for customer use and will still be quiesced or fenced. Those resources should have a related problem listed that still needs repair. If resources are listed and there are no problems listed, call the next level of support. Press F3 on the service terminal until the Main Service Menu is displayed, then select: Repair Menu End of Call Status
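Step 23 relies on one comparison: a repair has not held if a new problem appears, or if the Last Occurrence timestamp of an existing problem has changed since it was recorded. The following Python sketch illustrates that comparison. It assumes the problem IDs and Last Occurrence values were transcribed by hand from the service terminal; the function name, the data layout, and the sample values are hypothetical and are not part of the 2105 service code.

```python
from datetime import datetime

# Hypothetical layout: problem ID -> Last Occurrence timestamp, transcribed
# by hand from the service terminal problem details.
Snapshot = dict[str, datetime]


def still_failing(before: Snapshot, after: Snapshot) -> list[str]:
    """Return IDs of problems that show the repair has not held.

    A problem is flagged if it is new (absent from the earlier record) or if
    its Last Occurrence timestamp differs from the recorded value.
    """
    return [
        problem_id
        for problem_id, last_seen in sorted(after.items())
        if before.get(problem_id) != last_seen
    ]


if __name__ == "__main__":
    recorded = {"0012": datetime(2005, 11, 3, 9, 15)}
    current = {
        "0012": datetime(2005, 11, 3, 10, 42),  # timestamp updated
        "0013": datetime(2005, 11, 3, 10, 44),  # new problem
    }
    print(still_failing(recorded, current))  # ['0012', '0013']
```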

MAP 4025: Hard Drive Build Process for Automatic LIC


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This procedure is used as follows: v During AutoLIC recovery only. v To reload AIX and the 2105 Model 800 code on the hard disk drives of one cluster.

Requirements
v 2105 O/S CD volumes 1 and 2
v 2105 O/S update (PTF) CD, if required
v 2105 LIC CD
v Customization and configuration diskettes created with the LIC Install Instructions

Isolation
1. Verify that the service terminal is connected to the cluster not being repaired; see Service Terminal Setup in Chapter 8 of Volume 3. 2. Use Table 34 to proceed.
Table 34. Original Repair MAP
The AutoLIC repair started with a problem log calling MAP | Action
MAP 4A10 | Go to step 3
MAP 4A40 | Go to step 3
Any other 4Axx MAP | Call the next level of support. See note.

Note: One of the following two conditions is present: v Multiple failures exist. v The cluster to be repaired may already have been booted on the new LIC code; this MAP supports only reloading the original code and then recovering the AutoLIC. 3. Did you quiesce the failing cluster before you started this MAP? v Yes, continue with the next step. v No, quiesce the failing cluster using the alternate cluster repair menu options from the cluster not being repaired. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair

Quiesce the Alternate Cluster Continue with the next step. 4. Insert the customization diskette in the diskette drive of the failing cluster. Note: This customization diskette was created previously under guidance of the LIC install instructions. 5. Insert the 2105 O/S VER. X.X.X. volume 1 CD in the failing cluster CD-ROM drive. Wait until the CD-ROM drive LED stops blinking, then go to the next step. 6. Power off the cluster using the Alternate Cluster Repair Menu options. 7. Power on the cluster using the Alternate Cluster Repair Menu options. 8. Connect the service terminal to port S1 of the cluster being repaired and attempt to logically connect to the cluster. Note: The service terminal logical connection will be lost several times. Keep logically reconnecting the service terminal so you do not miss the displayed information. 9. The cluster will begin loading code from the 2105 CD and customization diskette to build the hard disk drive. After 20 to 30 minutes you will be asked to remove the 2105 O/S Volume 1 CD and the customization diskette and install the 2105 O/S Volume 2 CD. Please follow the instructions to install the CD and continue. 10. The cluster will reboot. Please reconnect the service terminal. Ignore any error messages that may temporarily display as the status messages scroll by. 11. After 10 to 15 minutes, a message will appear that asks you to do one of the following: v Remove the 2105 O/S Volume 2 CD and install the 2105 O/S update CD. Follow the instructions to install the CD and select the option to continue. Then go to step 12. - or v Remove the 2105 O/S Volume 2 CD and install the 2105 LIC CD and the first Configuration diskette. In that case, please follow the instructions to install the CD plus Configuration diskette and go to step 13. Note: Use the configuration diskette or diskettes that were created previously under guidance of the LIC install instructions. Use the original 2105 LIC CD. 12. After a few minutes you will be asked to remove the 2105 O/S update CD and install the 2105 LIC CD and the first Configuration diskette. Please follow the instructions to install the CD plus configuration diskette and select the option to continue. Note: Use the configuration diskette or diskettes that were created previously under guidance of the LIC install instructions. Use the original 2105 LIC CD. 13. After a few minutes you will see a message informing you that the configuration data is being read. You will then be asked to insert the additional configuration diskettes as required. Please follow the instructions to install the remaining configuration diskette or diskettes and select the option to continue. 14. After a few minutes you will be asked to remove the 2105 LIC CD and configuration diskette. Please follow the instructions to remove the CD and configuration diskette and select the option to continue. 15. After a few minutes you will see a message indicating that the cluster hard disk drives are being mirrored.
16. After approximately 40 minutes, a message will be displayed to indicate that the Hard Drive Rebuild has completed. When it has completed, do the following: a. Type 1 and press Enter to continue. b. Messages will display and then the Copyright screen will display. c. Log in to the other cluster. (Use the normal S2 port.) d. Use the Alternate Cluster Repair Menu options to power off and then power on the cluster being repaired. 17. Wait up to 45 minutes for the cluster to come ready. Did the cluster come ready? v Yes, continue with the next step. v No, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567. 18. Did you quiesce the failing cluster in step 3 on page 324? v Yes, continue with the next step. v No, return to the MAP that sent you here. 19. Resume the cluster using the Alternate Cluster Repair menu options. When the resume is complete, return to the MAP that sent you here.
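Table 34 at the start of this MAP is a small routing rule keyed on the MAP named in the original problem log. A minimal Python sketch of that rule follows; the function name and the input string format (for example, "MAP 4A10") are assumptions for illustration only, and inputs that are not 4Axx MAPs are rejected because Table 34 does not cover them.

```python
def autolic_entry_action(original_map: str) -> str:
    """Route an AutoLIC repair according to Table 34 (MAP 4025, step 2)."""
    # Normalize inputs such as "MAP 4a10" or "4A10" to "4A10".
    code = original_map.strip().upper().removeprefix("MAP").strip()
    if code in ("4A10", "4A40"):
        return "Go to step 3"
    if len(code) == 4 and code.startswith("4A"):
        # Multiple failures, or the cluster may already run the new LIC code.
        return "Call the next level of support"
    raise ValueError(f"{original_map!r} is not a 4Axx MAP; Table 34 does not apply")


print(autolic_entry_action("MAP 4A40"))  # Go to step 3
print(autolic_entry_action("MAP 4A20"))  # Call the next level of support
```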

MAP 4040: Entry MAP for CPI Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A CPI error has generated a problem that is ready for repair. The error recovery code has fenced (removed from customer use) a host bay, a cluster, or both a host bay and a cluster. There are four CPI diagnostic tests: v IOA Test: tests the IOA/NVS card in the cluster. v IOA to Host Bay Planar Test: tests the interface between the IOA/NVS card in the cluster and the host bay planar in the host bay. v Host Bay Planar Test: tests the host bay planar. v Host Bay Planar PCI Bus Test: tests the PCI bus section of the host bay planar, which is used for cluster-to-cluster communication. This section is the logic common to the CPI interfaces of both clusters. The test first uses the cluster-to-cluster Ethernet communications to set up registers in both clusters before testing the cluster-to-cluster CPI communications. The CPI diagnostics are run under four conditions, listed in Table 35.

Table 35. CPI Diagnostics Overview
CPI Test | Two Cluster IML, 2105 Power On | Resume Cluster, Host Bay Available, Fenced or Quiesced | Resume Host Bay, Both Clusters Available | Resume Host Bay, One Cluster Fenced or Quiesced
IOA Test | Yes | Yes | Yes | No
IOA to Host Bay Planar Test | Yes | Yes | Yes | No
Host Bay Planar Test | Yes | Yes | Yes | No
Host Bay Planar PCI Bus Test | Yes | No | Yes | No

To load the host bay planar firmware, both clusters must be available (not quiesced or fenced).

Isolation
Attention: The Likely to Fix FRU percentages, shown in the problem details, cannot be used to determine the order of FRU replacement. To avoid customer impact, this MAP must be followed exactly. The repair sequence is determined by which resources are fenced and which FRUs require replacement. 1. Review all problems needing repair, looking for CPI interface problems with FRUs in the host bay, the cluster I/O drawer, or both. v The possible cluster I/O drawer FRUs are the NVS/IOA card and the I/O drawer planar assembly. v The possible host bay FRUs are the host bay planar assembly and host adapter cards. v The CPI cables are also possible FRUs, but they are not listed in a problem. 2. Write down the timestamp in the Last Occurrence field of each related problem. This field is updated with a new timestamp if the same error is detected again during the repair verification procedures. A new problem can also be created if the CPI diagnostics or functional code discover a related problem. 3. Select the condition below that applies:
Table 36. Failure Condition
Condition | Action
You just replaced a cluster or host bay FRU. | Go to step 5 on page 329
You were performing an AutoLIC or MultiLIC code upgrade. | Go to step 7 on page 330
Neither of the above. | Go to step 4

4. Determine whether any clusters or host bays are fenced or quiesced. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Show Fenced Resources Show Quiesced Resources

Find the combination of fenced or quiesced cluster or host bays in the first column of Table 37. Use the Repair Method referenced in the second column.
Table 37. Fenced or Quiesced Cluster or Host Bays
Resource Name (Resource Description) | Go to the Following Repair Method
No fenced cpcpix (host bay) or cpclusterx (Cluster) | MAP 4040 Section-1 on page 331
cpcpi4 (Host Bay 1) | MAP 4040 Section-2 on page 331
cpcpi5 (Host Bay 3) | MAP 4040 Section-2 on page 331
cpcpi6 (Host Bay 2) | MAP 4040 Section-2 on page 331
cpcpi7 (Host Bay 4) | MAP 4040 Section-2 on page 331
cpcpi4 (Host Bay 1), cpcpi6 (Host Bay 2) | MAP 4040 Section-3 on page 332
cpcpi4 (Host Bay 1), cpcpi7 (Host Bay 4) | MAP 4040 Section-3 on page 332
cpcpi5 (Host Bay 3), cpcpi6 (Host Bay 2) | MAP 4040 Section-3 on page 332
cpcpi5 (Host Bay 3), cpcpi7 (Host Bay 4) | MAP 4040 Section-3 on page 332
cpcluster0 (Cluster 1) | MAP 4040 Section-4 on page 333
cpcluster1 (Cluster 2) | MAP 4040 Section-4 on page 333
cpcluster0 (Cluster 1), cpcpi4 (Host Bay 1) | MAP 4040 Section-4 on page 333
cpcluster0 (Cluster 1), cpcpi5 (Host Bay 3) | MAP 4040 Section-4 on page 333
cpcluster0 (Cluster 1), cpcpi6 (Host Bay 2) | MAP 4040 Section-4 on page 333
cpcluster0 (Cluster 1), cpcpi7 (Host Bay 4) | MAP 4040 Section-4 on page 333
cpcluster1 (Cluster 2), cpcpi4 (Host Bay 1) | MAP 4040 Section-4 on page 333
cpcluster1 (Cluster 2), cpcpi5 (Host Bay 3) | MAP 4040 Section-4 on page 333
cpcluster1 (Cluster 2), cpcpi6 (Host Bay 2) | MAP 4040 Section-4 on page 333
cpcluster1 (Cluster 2), cpcpi7 (Host Bay 4) | MAP 4040 Section-4 on page 333
cpcluster0 (Cluster 1), cpcpi4 (Host Bay 1), cpcpi5 (Host Bay 3) | MAP 4040 Section-5 on page 334
cpcluster1 (Cluster 2), cpcpi4 (Host Bay 1), cpcpi5 (Host Bay 3) | MAP 4040 Section-7 on page 338
cpcluster0 (Cluster 1), cpcpi6 (Host Bay 2), cpcpi7 (Host Bay 4) | MAP 4040 Section-7 on page 338
cpcluster1 (Cluster 2), cpcpi6 (Host Bay 2), cpcpi7 (Host Bay 4) | MAP 4040 Section-6 on page 336
cpcluster0 (Cluster 1), cpcpi4 (Host Bay 1), cpcpi6 (Host Bay 2) | MAP 4040 Section-7 on page 338
cpcluster1 (Cluster 2), cpcpi5 (Host Bay 3), cpcpi7 (Host Bay 4) | MAP 4040 Section-7 on page 338
cpcluster0 (Cluster 1), cpcpi4 (Host Bay 1), cpcpi5 (Host Bay 3), cpcpi6 (Host Bay 2) | MAP 4040 Section-8 on page 339
cpcluster1 (Cluster 2), cpcpi4 (Host Bay 1), cpcpi5 (Host Bay 3), cpcpi7 (Host Bay 4) | MAP 4040 Section-8 on page 339
cpcluster0 (Cluster 1), cpcpi4 (Host Bay 1), cpcpi6 (Host Bay 2), cpcpi7 (Host Bay 4) | MAP 4040 Section-8 on page 339
cpcluster1 (Cluster 2), cpcpi5 (Host Bay 3), cpcpi6 (Host Bay 2), cpcpi7 (Host Bay 4) | MAP 4040 Section-8 on page 339

5. Verify that the CPI cables are connected correctly at the host bay planars and the I/O drawer NVS/IOA cards. Note: If the listed FRUs do not fix the problem, the attached CPI cable or cables should be replaced only if a host bay planar and NVS/IOA card were replaced. 6. Is the ESC listed in the problem one of the following:
ESC = 1101: host bay 1 (cpcpi4)
ESC = 1102: host bay 2 (cpcpi6)
ESC = 1103: host bay 3 (cpcpi5)
ESC = 1104: host bay 4 (cpcpi7)
v Yes, the two CPI cables are cross-connected at that host bay planar. The host bay must be quiesced and powered off to correct the problem. Go to step 4 on page 327 and handle the CPI cables as host bay FRUs. v No, go to step 4 on page 327. 7. Is the ESC listed in the problem one of the following:
ESC = 1340: CPI4 cluster 1 and host bay
ESC = 1341: CPI5 cluster 1 and host bay
ESC = 1342: CPI6 cluster 1 and host bay
ESC = 1343: CPI7 cluster 1 and host bay
ESC = 1344: CPI4 cluster 2 and host bay
ESC = 1345: CPI5 cluster 2 and host bay
ESC = 1346: CPI6 cluster 2 and host bay
ESC = 1347: CPI7 cluster 2 and host bay
ESC = 1348: CPI4 host bay
ESC = 1349: CPI5 host bay
ESC = 134A: CPI6 host bay
ESC = 134B: CPI7 host bay
ESC = 134C: CPI4 cluster 1
ESC = 134D: CPI5 cluster 1
ESC = 134E: CPI6 cluster 1
ESC = 134F: CPI7 cluster 1
ESC = 1350: CPI4 cluster 2
ESC = 1351: CPI5 cluster 2
ESC = 1352: CPI6 cluster 2
ESC = 1353: CPI7 cluster 2
v Yes, continue with the next step. v No, the availability and code level of each cluster must be determined before a CPI repair can be attempted. Call your next level of support. 8. Did this failure occur while performing a MultiLIC or AutoLIC update? v Yes, continue with the next step. v No, to continue with the analysis and FRU replacement, go to step 4 on page 327. Note: You may prefer to call the next level of support to have PFE attempt to restore the firmware without FRU replacement. 9. Did this failure occur while performing a MultiLIC update? v Yes, go to step 11 on page 331. v No (AutoLIC update), continue with the next step. 10. Is a cluster FRU listed in the problem (ESC 1340-1347 or 134C-1353)? v Yes, replace ONLY the cluster FRU listed. Do the following: Determine if the failing cluster is already quiesced. Use the Main Service Menu, Utility Menu, Resource Management Menu, Show Quiesced Resources.

Go to step 4 on page 327 to replace the cluster FRUs. Do not resume the cluster following FRU replacement if the cluster was already quiesced above. Notes: a. If replacing the cluster FRUs resolves the problem, return to the AutoLIC MAP that sent you here. b. If replacing the cluster FRUs does not resolve the problem, call the next level of support. Before replacing host bay FRUs, both clusters must be available and at the same code level. v No, call the next level of support. Before replacing host bay FRUs, both clusters must be available and at the same code level. 11. Connect the service terminal to the failing cluster and log on. Use the following menus to determine whether a norsStart* file exists. Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu List norsStart* Files Does a norsStart* or norsStartOnce file exist, or is a norsStartOnce diskette inserted? v Yes, do not remove the norsStart* file or diskette. Call your next level of support. The cluster state must be determined before continuing. v No, continue with the next step. 12. Is a cluster FRU listed for replacement (ESC 1340-1347 or 134C-1353)? v Yes, replace ONLY the cluster FRU or FRUs listed. Do not resume the cluster following FRU replacement. Do the following: Determine if the failing cluster is already quiesced. Use the Main Service Menu, Utility Menu, Resource Management Menu, Show Quiesced Resources. Go to step 4 on page 327 to replace the cluster FRUs. Do not resume the cluster following FRU replacement if the cluster was already quiesced above. Notes: a. If replacing the cluster FRUs resolves the problem, return to the code load instructions and restart the failed operation. b. If replacing the cluster FRUs does not resolve the problem, call the next level of support. Before replacing host bay FRUs, both clusters must be available and at the same code level. v No, call the next level of support. Before replacing host bay FRUs, both clusters must be available and at the same code level.

MAP 4040 Section-1: 1. Replace the listed FRUs. Use the following MAPs as needed: v MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341 v MAP 4070: Replacement of Host Bay FRUs for CPI Problems on page 343

MAP 4040 Section-2: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem?

v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 38:
Table 38. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
I/O drawer and host bay FRUs | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
I/O drawer FRUs only | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
Host bay FRUs only | Replace the host bay FRUs; go to MAP 4070: Replacement of Host Bay FRUs for CPI Problems on page 343

Note: A CPI fence can be caused by a host bay power failure. If there is a problem for this host bay or an RPC card, repair that problem first and then begin this repair again. If there is no related power problem, observe the host bay planar power LEDs as shown in Figure 138 on page 453. Use a working host bay to confirm where to look. If the LEDs are lit, there is no power problem. If the LEDs are not lit, go to the rear of the rack and observe the HA1 (for host bay 1 or 3) or HA2 (for host bay 2 or 4) LEDs on both host bay drawer power supplies. If one or both LEDs are lit, the possible failing FRUs are the host bay planar or the host bay drawer backplane. Use the Repair Menu, Replace a FRU option to replace the FRUs.

MAP 4040 Section-3: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 39:

Table 39. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
I/O drawer and host bay FRUs | Replace the host bay FRUs; go to MAP 4070: Replacement of Host Bay FRUs for CPI Problems on page 343
I/O drawer FRUs only | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
Host bay FRUs only | Replace the host bay FRUs; go to MAP 4070: Replacement of Host Bay FRUs for CPI Problems on page 343

MAP 4040 Section-4: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 40:
Table 40. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
I/O drawer and host bay FRUs | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
I/O drawer FRUs only | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
Host bay FRUs only | Go to step 3 (see note)
Note: Both clusters must be available when a host bay FRU is replaced so that the host bay FRU firmware can be loaded properly. Because there is no cluster FRU, the CPI failure is assumed to be in the CPI cable or host bay, and the cluster can be made available.

3. Quiesce the failing host bay. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource 4. Quiesce and then resume the unavailable cluster. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu
Quiesce a Resource Resume a Resource Wait for the operator panel cluster Ready LED to light before continuing with the next step. Note: If the resume is not successful, do not continue. Call the next level of support. 5. Replace the host bay FRU. Go to MAP 4070: Replacement of Host Bay FRUs for CPI Problems on page 343.

MAP 4040 Section-5: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 41:
Table 41. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
FRUs for host bays 1 and 3, and cluster 1 | Call the next level of support
FRUs for host bays 1 and 3 only | Call the next level of support
All other combinations of FRUs | Go to step 3

3. Log on to Cluster 2, then quiesce Cluster 1. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 4. Are there any cluster I/O drawer FRUs to be replaced? v Yes, continue with the next step. v No, go to step 6. 5. Replace the cluster I/O drawer FRUs, but do not attempt to resume the cluster. a. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. b. When MAP 4700 directs you to resume the cluster, do not resume it. c. Exit MAP 4700, return here, and continue with the next step. 6. Select the condition that applies: v Host bay FRUs are listed for host bay 1 only. Continue with the next step. v Host bay FRUs are listed for host bay 3 only. Go to step 8 on page 335. v No host bay FRUs. Go to step 10 on page 335. 7. Attempt to quiesce and then resume CPI 5 (Host Bay 3). From the service terminal Main Service Menu, select: Utilities Menu

Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, go to step 9. v No, call the next level of support. 8. Attempt to quiesce and then resume CPI 4 (Host Bay 1). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step. v No, call the next level of support. 9. Resume Cluster 1. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Resume a Resource Was it successful? v Yes, go to step 4 on page 327. Note: The cluster has been resumed. Any unavailable resources must be assessed before continuing the repair to prevent customer loss of access. v No, call the next level of support. 10. Attempt to quiesce and then resume CPI 4 (Host Bay 1). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step. v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support. 11. Attempt to quiesce and then resume CPI 5 (Host Bay 3). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step.
v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support. 12. Resume Cluster 1. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Resume a Resource Was it successful? v Yes, close the related problems and then use the Repair Menu, End of Call Status option. v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support.

MAP 4040 Section-6: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 42:
Table 42. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
FRUs for host bays 2 and 4, and cluster 2 | Call the next level of support
FRUs for host bays 2 and 4 only | Call the next level of support
All other combinations of FRUs | Go to step 3

3. Log on to Cluster 1, then quiesce Cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 4. Are there any cluster I/O drawer FRUs to be replaced? v Yes, continue with the next step. v No, go to step 6. 5. Replace the cluster I/O drawer FRUs, but do not attempt to resume the cluster. a. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. b. When MAP 4700 directs you to resume the cluster, do not resume it. c. Exit MAP 4700, return here, and continue with the next step. 6. Select the condition that applies: v Host bay FRUs are listed for host bay 2 only. Continue with the next step.

v Host bay FRUs are listed for host bay 4 only. Go to step 8. v No host bay FRUs. Go to step 10. 7. Attempt to quiesce and then resume CPI 7 (Host Bay 4). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, go to step 9. v No, call the next level of support. 8. Attempt to quiesce and then resume CPI 6 (Host Bay 2). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step. v No, call the next level of support. 9. Resume Cluster 2. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Resume a Resource Was it successful? v Yes, go to step 4 on page 327. Note: The cluster has been resumed. Any unavailable resources must be assessed before continuing the repair to prevent customer loss of access. v No, call the next level of support. 10. Attempt to quiesce and then resume CPI 6 (Host Bay 2). From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step. v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support. 11. Attempt to quiesce and then resume CPI 7 (Host Bay 4). From the service terminal Main Service Menu, select: Utilities Menu
Resource Management Menu Quiesce a Resource Resume a Resource Was it successful? v Yes, continue with the next step. v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support. 12. Resume Cluster 2. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Resume a Resource Was it successful? v Yes, close the related problems and then use the Repair Menu, End of Call Status option. v No, determine if there is a new problem, or if the problem details last occurrence timestamp has been updated for an existing problem. Then call the next level of support.

MAP 4040 Section-7: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. You can take the Velcro cable tie from elsewhere on the 2105; return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Take the action shown in Table 43:
Table 43. FRUs Not Yet Replaced
FRUs Not Yet Replaced | Action
I/O drawer and host bay FRUs | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
I/O drawer FRUs only | Replace the I/O drawer FRUs; go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341
Host bay FRUs only | Go to step 3 (see note)
Note: Both clusters must be available when a host bay FRU is replaced so that the host bay FRU firmware can be loaded properly. Because there is no cluster FRU, the CPI failure is assumed to be in the CPI cable or host bay, and the cluster can be made available.

3. Quiesce the first failing host bay.


VOLUME 1, TotalStorage ESS Service Guide



   From the service terminal Main Service Menu, select: Utilities Menu > Resource Management Menu > Quiesce a Resource
4. Quiesce the second failing host bay.
   From the service terminal Main Service Menu, select: Utilities Menu > Resource Management Menu > Quiesce a Resource
5. Quiesce and then resume the unavailable cluster.
   From the service terminal Main Service Menu, select: Utilities Menu > Resource Management Menu > Quiesce a Resource, then Resume a Resource
   Wait for the operator panel cluster Ready LED to light before continuing with the next step.
   Note: If the resume is not successful, do not continue. Call the next level of support.
6. Go to step 4 on page 327.
   Note: The cluster has been resumed. Assess any unavailable resources before continuing the repair to prevent customer loss of access.

MAP 4040 Section-8: Three of the four CPI interfaces have been fenced. Call the next level of support.
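The Table 43 selection in MAP 4040 Section-7 can be expressed as a small decision function. The following is a minimal Python sketch, not ESS service code; the function name `table43_action`, its two boolean inputs, and the returned text are hypothetical. It makes explicit that the I/O drawer FRUs are always replaced first whenever any remain.

```python
# Hypothetical sketch of the Table 43 decision (MAP 4040 Section-7, step 2).
# The function name, inputs, and returned text are illustrative only; they are
# not part of the ESS service code.


def table43_action(io_drawer_frus_left: bool, host_bay_frus_left: bool) -> str:
    """Return the Table 43 action for the FRUs that have not yet been replaced."""
    if io_drawer_frus_left:
        # Rows "I/O drawer and host bay FRUs" and "I/O drawer FRUs only":
        # the I/O drawer FRUs are replaced first in both cases.
        return "Replace the I/O drawer FRUs: go to MAP 4060"
    if host_bay_frus_left:
        # Row "Host bay FRUs only": both clusters must be available first.
        return "Go to step 3"
    raise ValueError("Table 43 has no row for this combination")
```

For example, `table43_action(False, True)` returns "Go to step 3", which corresponds to the Host bay FRUs only row.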

MAP 4055: Resolving a Bay Held Reset Condition


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A cluster has detected a condition in which a host bay is being permanently held in the reset state. This condition can be triggered during a machine or cluster recovery action. It is generally not a hardware failure, and it can be recovered by power cycling the affected bays.

Isolation
1. Is the ESS currently being used by the customer?
   • Yes, go to step 6 on page 340.
   • No, continue with the next step.
2. Cancel the problem or logs that sent you to this MAP.
   From the service terminal Main Service Menu, select: Utility Menu > Problem Log Menu > Cancel Problems by Selecting Problem Ids
3. Power off the ESS with the white switch on the operator panel. Wait for the power to drop on both clusters.
4. Power on the ESS with the white switch on the operator panel. Wait up to 40 minutes for both clusters to come Ready.
5. Check whether any new problems were logged that send you to this MAP.
   From the service terminal Main Service Menu, select: Repair Menu > Show / Repair Problems Needing Repair
   Were any new problems created that send you to this MAP?
   • Yes, call your next level of support.
   • No, repair any remaining problems. If no further problems exist, go to MAP 1500: Ending a Service Action on page 67.
6. Determine which host bay or bays are affected. View the ESC in the problem or logs and make a list of the host bays:

   ESC  | Host Bay           | Cluster Interface
   1100 | See Note           | See Note
   1134 | CPI 4 / Host Bay 1 | Cluster 1
   1135 | CPI 5 / Host Bay 3 | Cluster 1
   1136 | CPI 6 / Host Bay 2 | Cluster 1
   1137 | CPI 7 / Host Bay 4 | Cluster 1
   1144 | CPI 4 / Host Bay 1 | Cluster 2
   1145 | CPI 5 / Host Bay 3 | Cluster 2
   1146 | CPI 6 / Host Bay 2 | Cluster 2
   1147 | CPI 7 / Host Bay 4 | Cluster 2

   Note: ESC 1100 is only used on code levels below 2.3.0.0. The next level of support will need to determine the affected host bay from the AIX error log. Alternatively, you can proceed with this MAP and perform step 10 on page 341 against all host bays.
7. Determine whether one or both clusters are operational. Connect the service terminal to the working cluster.
   From the service terminal Main Service Menu, select: Utility Menu > Resource Management Menu > Show Fenced Resources, and Show Quiesced Resources
   Note: If the cluster you are logged on to is fenced, you may not be able to display the fenced or quiesced resources. In that case, log on to the working cluster.
8. Connect the service terminal to an operating cluster.
9. Cancel the problem or logs that sent you to this MAP. Connect the service terminal to the working cluster.
   From the service terminal Main Service Menu, select: Utility Menu > Problem Log Menu > Cancel Problems by Selecting Problem Ids
10. Refer to the list of failing host bays determined in step 6 on page 340. Perform actions a, b, and c in this step for each failing host bay. If CPI4 / Host Bay 1 was determined to be failing, repeat this step for ALL host bays to avoid further problems.
   Note: When performing actions a and c, you will need to work with the customer to remove and then restore host access to the affected host bays (that is, vary the host paths offline and then back online, or use CUIR). If you are not familiar with this procedure, contact your next level of support.
   a. Quiesce the host bay.
      From the service terminal Main Service Menu, select: Utility Menu > Resource Management Menu > Quiesce a Resource
   b. Power the host bay off and then on.
      From the service terminal Main Service Menu, select: Utility Menu > Host Bay Power Off/On Menu > Power Off a Host Bay, then Power On a Host Bay
   c. Resume the host bay.
      From the service terminal Main Service Menu, select: Utility Menu > Resource Management Menu > Resume a Resource
11. Check whether any new problems were logged that send you to this MAP.
   From the service terminal Main Service Menu, select: Repair Menu > Show / Repair Problems Needing Repair
   Were any new problems created that send you to this MAP?
   • Yes, call your next level of support.
   • No, repair any remaining problems. If no further problems exist, go to MAP 1500: Ending a Service Action on page 67.
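Steps 6 and 10 of MAP 4055 combine a table lookup with one special rule: a failing CPI4 / Host Bay 1 means every host bay is cycled. The sketch below encodes the ESC table in Python to show that logic. The names `ESC_TO_BAY`, `bays_to_power_cycle`, and `step10_plan` are hypothetical; for ESC 1100 the sketch takes the documented alternative of treating all host bays instead of waiting for an AIX error log analysis.

```python
# Hypothetical sketch of MAP 4055 steps 6 and 10. Names are illustrative,
# not part of the ESS service code.

# ESC -> (CPI interface, host bay, cluster interface), from the step 6 table.
ESC_TO_BAY = {
    "1134": ("CPI 4", 1, 1),
    "1135": ("CPI 5", 3, 1),
    "1136": ("CPI 6", 2, 1),
    "1137": ("CPI 7", 4, 1),
    "1144": ("CPI 4", 1, 2),
    "1145": ("CPI 5", 3, 2),
    "1146": ("CPI 6", 2, 2),
    "1147": ("CPI 7", 4, 2),
}
ALL_HOST_BAYS = [1, 2, 3, 4]

# Menu options used by step 10 actions a, b, and c, in order.
STEP_10_MENU_OPTIONS = (
    "Quiesce a Resource",    # action a
    "Power Off a Host Bay",  # action b
    "Power On a Host Bay",   # action b
    "Resume a Resource",     # action c
)


def bays_to_power_cycle(escs):
    """Return the sorted host bays on which step 10 must be performed."""
    bays = set()
    for esc in escs:
        if esc == "1100":
            # Code levels below 2.3.0.0: bay not identified, so treat all bays.
            return list(ALL_HOST_BAYS)
        if esc not in ESC_TO_BAY:
            raise ValueError(f"ESC {esc} is not handled by MAP 4055")
        bays.add(ESC_TO_BAY[esc][1])
    if 1 in bays:
        # Step 10: if CPI4 / Host Bay 1 is failing, repeat for ALL host bays.
        return list(ALL_HOST_BAYS)
    return sorted(bays)


def step10_plan(escs):
    """List (host bay, menu option) pairs in the order they are performed."""
    return [(bay, option)
            for bay in bays_to_power_cycle(escs)
            for option in STEP_10_MENU_OPTIONS]
```

For example, ESCs 1135 and 1147 select host bays 3 and 4 only, while any ESC that maps to Host Bay 1 expands the plan to all four bays.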

MAP 4060: Replacing I/O Drawer FRUs for CPI Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Attention: The FRUs and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this isolation procedure. Follow the ESD procedures in Working with ESD-Sensitive Parts in Chapter 4 of Volume 2.

Description
This MAP is used to replace cluster I/O drawer FRUs for CPI problems. These FRUs are the IOA/NVS card and I/O drawer planar assembly.



Only one cluster may be fenced or quiesced at a time. If a cluster is already fenced or quiesced, replace the FRUs in that cluster first. If no cluster is fenced or quiesced, you may quiesce either cluster and then replace its FRUs.

Isolation
Note: All CPI repairs begin at MAP 4040. If you were not sent here from MAP 4040: Entry MAP for CPI Problems on page 326, go there now.
1. Do one of the following:
   • If a cluster is fenced, replace the FRUs in that cluster I/O drawer first. Go to the next step.
   • If no cluster is fenced and FRUs are listed for both clusters, you may select either cluster in which to replace FRUs. Go to the next step.
2. To replace the cluster I/O drawer FRU or FRUs, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432.
   Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. A Velcro cable tie can be borrowed from elsewhere on the 2105; return it when finished.
   a. When MAP 4700 directs you to close the problem, do not close it until you have replaced all the listed FRUs.
   b. When MAP 4700 directs you to go to MAP 1500: Ending a Service Action on page 67, return here instead and continue with the next step.
3. Have all listed FRUs for both the I/O drawer and host bays been replaced?
   • Yes, continue with the next step.
   • No, replace the FRUs by using MAP 4040: Entry MAP for CPI Problems on page 326. (Use MAP 4040 because the unavailable resources may have changed.)
4. Are any host bays fenced?
   • Yes, continue with the next step.
   • No, go to step 6.
5. Quiesce the fenced host bay.
   From the service terminal Main Service Menu, select: Utility Menu > Resource Management Menu > Quiesce a Resource
6. Are any host bays quiesced?
   • Yes, continue with the next step.
   • No, go to step 8.
7. Resume the quiesced host bay.
   From the service terminal Main Service Menu, select: Utility Menu > Resource Management Menu > Resume a Resource
8. Close the related problems and then use the Repair Menu, End Of Call Status option.
   Note: If the listed FRUs did not fix the problem, and the host bay planar and NVS/IOA card were replaced, the CPI cable between them may be failing.

MAP 4070: Replacement of Host Bay FRUs for CPI Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Attention: The FRUs and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this isolation procedure. Follow the ESD procedures in Working with ESD-Sensitive Parts in Chapter 4 of Volume 2.

Description
This MAP is used to replace host bay FRUs. The CPI diagnostics are run when the host bay is resumed.

Isolation
Note: All CPI repairs begin at MAP 4040. If you were not sent here from MAP 4040: Entry MAP for CPI Problems on page 326, go there now.
1. Replace the host bay FRU or FRUs by using the Replace a FRU option.
   From the service terminal Main Service Menu, select: Repair Menu > Replace a FRU
   Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. A Velcro cable tie can be borrowed from elsewhere on the 2105; return it when finished.
2. Have all listed FRUs for both the I/O drawer and host bays been replaced?
   • Yes, continue with the next step.
   • No, replace the FRUs by using MAP 4040: Entry MAP for CPI Problems on page 326. (Use MAP 4040 because the unavailable resources may have changed.)
3. Close the related problems and then use the Repair Menu, End of Call Status option.
   Note: If the listed FRUs did not fix the problem, and the host bay planar and NVS/IOA card were replaced, the CPI cable between them may be failing.

MAP 4090: CPI Address Mismatch


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CPI diagnostics check that each cluster IOA card CPI interface is cabled to the proper host bay CPI interface. If the diagnostics detect only one CPI address mismatch, the cause is a CPI address logic failure. If they detect two, the most likely cause is two CPI cables that are cross-connected. The CPI cables and the adjacent sheet metal are marked with matching color labels to indicate the proper connection.

Isolation
1. Determine whether there are one or two problems related to CPI address mismatch. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):
   • There is only one related problem. Continue the repair using the problem and replace the listed FRU(s).
   • There are two or more related problems. Go to the next step.
2. Two or more CPI cables are cross-connected. Use the color labels on the CPI cables and the adjacent sheet metal to determine which cables are crossed, or use the following table to determine the proper connection for each CPI cable.
   Notes:
   a. The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. A Velcro cable tie can be borrowed from elsewhere on the 2105; return it when finished.
   b. See Locating a CPI Cable Using Colored Labels in Chapter 7 of Volume 3.
Table 44. CPI Cable Connections
CPI Interface | Cluster Location  | Host Bay Location | Color Code
CPI4 Local    | T1-U0.1-P1-I3/A1  | R1-B1-P1/JB       | Green
CPI4 Remote   | T2-U0.1-P1-I3/A1  | R1-B1-P1/JA       | Yellow
CPI5 Local    | T2-U0.1-P1-I4/A1  | R1-B3-P1/JB       | Red
CPI5 Remote   | T1-U0.1-P1-I4/A1  | R1-B3-P1/JA       | Violet
CPI6 Local    | T1-U0.1-P1-I5/A1  | R1-B2-P1/JB       | Gray
CPI6 Remote   | T2-U0.1-P1-I5/A1  | R1-B2-P1/JA       | Brown
CPI7 Local    | T2-U0.1-P1-I9/A1  | R1-B4-P1/JB       | Orange
CPI7 Remote   | T1-U0.1-P1-I9/A1  | R1-B4-P1/JA       | Blue

3. Determine both ends of each cross-connected cable. Before correcting the cable connections, use the service terminal Repair Menu, Replace a FRU option to quiesce and power off the FRUs that the cables are connected to:
   • For the host bay end of each CPI cable, use the host bay FRU option.
   • For the cluster end of each CPI cable, use the cluster FRU option.
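Table 44 can also be used as data to find crossed cables. This hypothetical Python sketch assumes you have recorded, for each cluster-end location, the host bay location that its cable actually reaches; `CPI_CABLES` and `find_crossed_cables` are illustrative names, not ESS functions.

```python
# Hypothetical sketch: Table 44 as data, used to spot cross-connected CPI cables.
# cluster-end location -> (CPI interface, expected host bay location, color label)
CPI_CABLES = {
    "T1-U0.1-P1-I3/A1": ("CPI4 Local", "R1-B1-P1/JB", "Green"),
    "T2-U0.1-P1-I3/A1": ("CPI4 Remote", "R1-B1-P1/JA", "Yellow"),
    "T2-U0.1-P1-I4/A1": ("CPI5 Local", "R1-B3-P1/JB", "Red"),
    "T1-U0.1-P1-I4/A1": ("CPI5 Remote", "R1-B3-P1/JA", "Violet"),
    "T1-U0.1-P1-I5/A1": ("CPI6 Local", "R1-B2-P1/JB", "Gray"),
    "T2-U0.1-P1-I5/A1": ("CPI6 Remote", "R1-B2-P1/JA", "Brown"),
    "T2-U0.1-P1-I9/A1": ("CPI7 Local", "R1-B4-P1/JB", "Orange"),
    "T1-U0.1-P1-I9/A1": ("CPI7 Remote", "R1-B4-P1/JA", "Blue"),
}


def find_crossed_cables(observed):
    """observed maps cluster-end location -> host bay location actually found.

    Returns (CPI interface, cluster location, found host location,
    expected host location, color label) for every mis-cabled connection.
    """
    crossed = []
    for cluster_loc, host_loc in sorted(observed.items()):
        cpi, expected_host_loc, color = CPI_CABLES[cluster_loc]
        if host_loc != expected_host_loc:
            crossed.append((cpi, cluster_loc, host_loc, expected_host_loc, color))
    return crossed
```

Per step 1, a single reported mismatch points to CPI address logic rather than cabling; two swapped cables show up here as two entries, one for each cable end that must be moved.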

MAP 40A0: Fence Network Isolation


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each cluster I/O drawer has a unique jumper that identifies it as cluster 1 (left) or cluster 2 (right). These jumpers are installed by manufacturing when the 2105 is built. When the I/O drawer planar assembly is replaced, the jumper must be moved from the old planar to the new planar. If a jumper is missing, or if both jumpers have the same ID, the functional code creates a problem that refers to this MAP.


Isolation
1. Display the Problems Needing Repair. Is ESC 2780 in the problem details?
   • Yes, go to step 6 on page 346.
   • No, continue with the next step.
2. Use the problem to determine which cluster is failing. A quick visual inspection can be done without quiescing and powering off the cluster: move the I/O drawer to the service position, then open the top cover just long enough to verify that the proper cluster ID jumper is installed:
   • Cluster 1 (left) jumper, with blue wires, is labeled T1-U0.1 P1/Q5 (P/N 18P3209)
   • Cluster 2 (right) jumper, with blue wires, is labeled T2-U0.1 P1/Q5 (P/N 18P3210)
   Is either jumper missing or incorrect?
   • Yes, continue with the next step.
   • No, go to step 5.

Figure 128. I/O Drawer Cluster ID Jumpers (s009459). Top view of the I/O drawer, front at the bottom, showing the cluster ID jumper locations: Cluster 1, T1-U0.1 P1/Q5; Cluster 2, T2-U0.1 P1/Q5.

3. The cluster ID jumper is missing or incorrect. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to install the correct jumper. Then continue with the next step.
4. Log in and display the original problem. Observe the Last Occurrence date/time field in the problem details display. Was the date/time field updated?
   • Yes, the problem has not been fixed. Continue with the next step.
   • No, the problem is fixed. Use the Repair Menu options Close a Previously Repaired Problem and End of Call Status to complete the repair action.
5. The possible failing FRUs are:
   • Cluster ID jumper
   • I/O drawer planar assembly
   • Cluster ID jumper cable (cable assembly, NVS bottom card to jumper bracket)
   Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. Use step 4 on page 345 to verify the fix. If the problem is not resolved after all listed FRUs have been replaced, continue with the next step.
6. There is a problem with the fence network that requires further analysis to determine an action plan. Generate a PE password and call the next level of support.
   From the service terminal Main Service Menu, select: Configuration Options Menu > Configure Communications Resources Menu > Call Home / Remote Services Menu > Enable Product Engineering Access (note the password)
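The expected jumpers in step 2 can be written as a lookup, so that a missing jumper and a duplicated ID are reported in the same pass. This is a minimal, hypothetical Python sketch; it assumes you record the part number seen on each cluster's jumper, or None if no jumper is present, and `jumper_problems` is an illustrative name.

```python
# Hypothetical sketch of the MAP 40A0 step 2 jumper check.
# cluster number -> (expected label, expected part number), from step 2.
EXPECTED_JUMPER = {
    1: ("T1-U0.1 P1/Q5", "18P3209"),  # cluster 1 (left)
    2: ("T2-U0.1 P1/Q5", "18P3210"),  # cluster 2 (right)
}


def jumper_problems(found):
    """found maps cluster number -> observed jumper P/N (None or absent if missing)."""
    problems = []
    for cluster, (label, part_number) in sorted(EXPECTED_JUMPER.items()):
        seen = found.get(cluster)
        if seen is None:
            problems.append(
                f"cluster {cluster}: jumper missing; install P/N {part_number} at {label}")
        elif seen != part_number:
            # Also catches two drawers carrying the same cluster ID jumper.
            problems.append(
                f"cluster {cluster}: found P/N {seen}; expected P/N {part_number} at {label}")
    return problems
```

An empty list corresponds to the "No, go to step 5" answer; any entry corresponds to "Yes, continue with the next step".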

MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is for special cases where additional problem determination is needed to try to generate an error code.

Isolation
1. Connect the service terminal to the cluster not being serviced.
2. Quiesce and power off the cluster being serviced.
   From the service terminal Main Service Menu, select: Repair Menu > Alternate Cluster Repair Menu > Quiesce Alternate Cluster, then Power Off the Alternate Cluster
3. Wait up to three minutes for the CEC drawer operator panel to display OK. Is OK displayed?
   • Yes, continue with the next step.
   • No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
4. When OK is displayed on the CEC drawer operator panel, connect the service terminal (CE MOST or laptop) to the S2 port on the cluster being serviced. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor (SP) Main Menu should be displayed.
   Note: The Master console cannot be used to access the SP menus.
5. Disable fast system boot. (Slow system boot runs more diagnostics, which might provide an error code to repair with.)
   From the SP Main Menu, select: System Power Control Menu > Enable/Disable Fast System Boot
   Note: Remember to reset to fast system boot when the repair is complete.
6. Power on the cluster using the Alternate Cluster Repair Menu options. Look for an error code that does not send you back to this MAP, and use it to repair the cluster. If there is none, go to the next step.
7. Display the SP error logs, looking for repair information. Power off the cluster.
   From the SP Main Menu, select: System Information Menu > Read Service Processor Error Logs
8. Display the SP progress indicators from the last system boot (cluster power on).
   From the SP Main Menu, select: System Information Menu > Read Progress Indicators from Last System Boot
   Read the last progress indicator to determine what might have occurred last before the cluster failed or hung.
9. Go to MAP 4540: Cluster Minimum Configuration on page 418.

MAP 40C0: Special SCSI Bus Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is for special cases where additional problem determination is needed for an error code.

Isolation
Note: Only use this MAP if directed here by a service guide procedure.
1. If a SCSI bus repair is in progress, check that the FRU(s) that were just replaced:
   a. Have the correct part number.
   b. Are correctly connected for signal and power inside the CEC drawer, and between the CEC drawer and the I/O drawer.
   c. Are correctly plugged for their SCSI ID (CD-ROM = 3, HDD1 = 0, HDD2 = 2).
2. Verify that the SCSI devices are receiving power. With the cluster powered on, press the CD-ROM drive eject button. Does the CD tray open?
   • Yes, go to step 4 on page 348.
   • No, verify that the SCSI power cable is connected to each hard disk drive and the CD-ROM drive in the CEC drawer. Ensure that the SCSI power cable between the CEC drawer and the I/O drawer is connected. If no problem is found, continue with the next step.
3. There may be an overcurrent condition that tripped the automatically resettable fuses on the I/O drawer planar assembly.
   a. Quiesce and power off the cluster. (Connect the service terminal to the working cluster and use the Alternate Cluster Repair Menu options.)
   b. Wait more than five minutes with the power off for the fuses to reset.
   c. Power on the cluster and observe the CD-ROM drive ready LED indicator. Does the LED blink during the power on?
      • Yes, the failure is not occurring at this time. If you suspect an intermittent problem, continue with the next step.
      • No, there may be an overcurrent condition. Unplug the power connector from each of the following FRUs in the CEC drawer, one at a time, and repeat the cluster power on after each: the hard disk drives and the CD-ROM drive. If the CD-ROM LED still does not blink, replace the I/O drawer planar assembly.
4. If the SRN is for a terminator power failure (xxx-226, xxx-240, xxx-800), quiesce and power off the cluster. Let it sit for five minutes and then power on the cluster. If it still fails, continue with the next step.
5. Disconnect the SCSI devices one at a time to determine whether one of them is holding down the SCSI interface during power on. Check for a damaged SCSI cable or connector at the CEC drawer or I/O drawer. If none is found, call the next level of support.
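The two fixed sets of values in this MAP, the expected SCSI IDs from step 1c and the terminator power SRN suffixes from step 4, are easy to check mechanically. This is a hypothetical Python sketch; the function names are illustrative, and "xxx" in an SRN stands for any prefix, as in the text.

```python
# Hypothetical sketch of the MAP 40C0 step 1c and step 4 checks.
EXPECTED_SCSI_IDS = {"CD-ROM": 3, "HDD1": 0, "HDD2": 2}  # step 1c
TERMINATOR_POWER_SUFFIXES = {"226", "240", "800"}        # step 4: xxx-226, xxx-240, xxx-800


def scsi_id_mismatches(found):
    """Return {device: (found ID, expected ID)} for every device not at its expected ID.

    A device absent from `found` is reported with a found ID of None.
    """
    return {
        device: (found.get(device), expected)
        for device, expected in EXPECTED_SCSI_IDS.items()
        if found.get(device) != expected
    }


def is_terminator_power_srn(srn):
    """True if the SRN has the form xxx-226, xxx-240, or xxx-800."""
    _, sep, suffix = srn.strip().partition("-")
    return sep == "-" and suffix in TERMINATOR_POWER_SUFFIXES
```

For example, swapping the two hard disk drives yields `{"HDD1": (2, 0), "HDD2": (0, 2)}`.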

MAP 40D0: Special SRN Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Use this MAP to resolve problems reported by SRNs A00-000 to A1F-FFF.

MAP 40D0 Section-1


1. Refer to the last character of the SRN. A value of 4, 5, 6, or 7 indicates a possible software or firmware problem. Does the last character indicate a possible software or firmware problem?
   • Yes, go to MAP 40D0 Section-2.
   • No, go to MAP 40D0 Section-4.

MAP 40D0 Section-2


1. Verify that all the firmware and software for this LIC code level have been installed. Check whether there are any known problems with this LIC code level. Are there any known problems with this LIC code level?
   • Yes, the new LIC code may have a problem. Call the next level of support.
   • No, go to MAP 40D0 Section-3.

MAP 40D0 Section-3


1. Were any FRUs reported with the SRN?
   • Yes, go to MAP 40D0 Section-7 on page 349.
   • No, go to MAP 40D0 Section-4.

MAP 40D0 Section-4


1. Were there any other SRNs in the range A00-xxx to A1F-xxx reported?
   • Yes, go to MAP 40D0 Section-1 and use the next SRN.
   • No, go to MAP 40D0 Section-5.

MAP 40D0 Section-5


Use the service processor menu to set the cluster in slow boot mode. Slow boot mode runs additional diagnostics that may detect this problem during cluster power on and code load.



1. Quiesce and power off the cluster being serviced.
   From the service terminal Main Service Menu, select: Repair Menu > Alternate Cluster Repair Menu > Quiesce Alternate Cluster, then Power Off the Alternate Cluster
2. Wait up to three minutes for the CEC drawer operator panel to display OK. Is OK displayed?
   • Yes, continue with the next step.
   • No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
3. When OK is displayed on the CEC drawer operator panel, connect the service terminal to the S2 serial port of the cluster being repaired. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor Main Menu should be displayed.
4. Set up to boot to the SMS menu.
   From the service processor Main Menu, select: System Power Control Menu > Boot Mode Menu (disable fast boot mode)
5. Go to MAP 40D0 Section-6.

MAP 40D0 Section-6


1. Power on the cluster and then find the condition that applies:
   • If the cluster hangs with a code displayed on the CEC drawer operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   • If the cluster loads code, log in and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). If there is a new related problem, exit this MAP and repair it now.
   • If no new error code or problem was created, go to MAP 40D0 Section-7.

MAP 40D0 Section-7


1. Refer to the last character of the SRN. A last character of 1, 3, 5, or 7 indicates that all listed FRUs should be replaced at the same time. Is the last character of the SRN a 1, 3, 5, or 7?
   • Yes, replace all the FRUs listed for the SRN at the same time.
   • No, replace the FRUs listed for the SRN one at a time.
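The two SRN rules in MAP 40D0 (Section-1 and Section-7) both depend only on the last character of the SRN, so they can be decoded together. A minimal Python sketch, assuming SRNs are written as in the Description (for example A00-000) with hexadecimal digits; `classify_srn` is a hypothetical name. Note that a last character of 5 or 7 satisfies both rules.

```python
import re

# Hypothetical sketch. SRNs handled by MAP 40D0: A00-000 through A1F-FFF,
# assumed to use hexadecimal digits.
SRN_RANGE = re.compile(r"A[01][0-9A-F]-[0-9A-F]{3}")


def classify_srn(srn: str) -> dict:
    """Decode the two MAP 40D0 rules that depend on the SRN's last character."""
    code = srn.strip().upper()
    if not SRN_RANGE.fullmatch(code):
        raise ValueError(f"{srn!r} is outside the range A00-000 to A1F-FFF")
    last = code[-1]
    return {
        # Section-1: 4, 5, 6, or 7 -> possible software or firmware problem.
        "software_or_firmware": last in "4567",
        # Section-7: 1, 3, 5, or 7 -> replace all listed FRUs at the same time.
        "replace_all_at_once": last in "1357",
    }
```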

MAP 40E0: Only One I/O Drawer Power Supply Detected


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Observe the FRU location code for the I/O drawer power supply in the problem details:
• If the FRU location code ends in U0.1-V, the cluster firmware detected only one of the two power supplies when the cluster powered on.
• If the FRU location code ends in U0.1-V1 or U0.1-V2, both power supplies were detected normally.
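The location-code rule above reduces to a suffix check. This hypothetical Python sketch returns how many I/O drawer power supplies the firmware detected; the function name and the example location codes in the note below are illustrative.

```python
# Hypothetical sketch of the MAP 40E0 location-code rule.


def power_supplies_detected(location_code: str) -> int:
    """Return 1 if the code ends in U0.1-V, or 2 if it ends in U0.1-V1 or U0.1-V2."""
    code = location_code.strip()
    if code.endswith(("U0.1-V1", "U0.1-V2")):
        return 2  # a specific supply is named, so both were detected
    if code.endswith("U0.1-V"):
        return 1  # generic supply location: only one supply was detected
    raise ValueError(f"{location_code!r} is not an I/O drawer power supply location code")
```

For example, `power_supplies_detected("T1-U0.1-V")` returns 1, and `power_supplies_detected("T2-U0.1-V2")` returns 2.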

Isolation
1. Observe the LED indicators on both power supplies for the failing drawer. Is an amber check indicator on?
   • Yes, go to MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault on page 122.
   • No, continue with the next step.
2. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Are there any related power problems for the same I/O drawer?
   • Yes, exit this MAP and repair them now. (If the other problem also sends you to this MAP, answer No to this step.)
   • No, continue with the next step.
3. One of the I/O drawer power supplies is failing without a visual symptom. Determining whether the failure is still occurring requires you to power the cluster off and then on. Do the following:
   a. Close the problem, or logs, that sent you here. Use the Repair Menu, Close a Previously Repaired Problem option.
   b. Use the Repair Menu, Alternate Cluster Repair Menu options to quiesce, power off, and power on the cluster. Do not resume the cluster.
   c. Wait up to 45 minutes for the cluster to come ready.
      Note: If the cluster does not come ready, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
   d. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and then do one of the following:
      • If there is a new problem for this error, one of the power supplies is failing. Continue with the next step.
      • If the Last Occurrence Timestamp field in the original problem details is updated, one of the power supplies is failing. Continue with the next step.
      • If there is no new problem and the Last Occurrence Timestamp field of the original problem is not updated, the error condition is not occurring. Close the original problem using the Repair Menu, Close a Previously Repaired Problem option. Use the Repair Menu, End of Call Status option to complete the service action.
4. Isolate the failing power supply:
   a. Close the problem, or logs, that sent you here. Use the Repair Menu, Close a Previously Repaired Problem option.
   b. Quiesce and power off the cluster.
   c. Replace one power supply.
   d. Wait up to 45 minutes for the cluster to come ready.
      Note: If the cluster does not come ready, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
   e. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and then do one of the following:
      • If there is a new problem, or if the timestamp field in the original problem details was updated, the power supply that was not replaced is failing. Replace the failing power supply. If replacing both power supplies does not fix the problem, replace the I/O drawer planar assembly.
      • If there is no new problem and the original problem was not updated, you replaced the failing power supply. Close the original problem using the Repair Menu, Close a Previously Repaired Problem option. Use the Repair Menu, End of Call Status option to complete the service action.
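Steps 3d and 4e use the same two indicators (a new problem, or an updated Last Occurrence Timestamp) to decide whether the error recurred, and step 4e then identifies the suspect supply by elimination. A minimal Python sketch, assuming the two supplies are identified by the location suffixes V1 and V2 from the Description; the function names are hypothetical.

```python
# Hypothetical sketch of MAP 40E0 steps 3d and 4e. Supply names "V1"/"V2"
# are assumed from the location-code suffixes in the Description.
SUPPLIES = ("V1", "V2")


def error_recurred(new_problem: bool, timestamp_updated: bool) -> bool:
    """Steps 3d and 4e: the error is still occurring if either indicator is seen."""
    return new_problem or timestamp_updated


def failing_supply(replaced: str, recurred: bool) -> str:
    """Step 4e: after replacing one supply, name the supply that is failing."""
    if replaced not in SUPPLIES:
        raise ValueError(f"unknown power supply {replaced!r}")
    if recurred:
        # The error came back, so the supply that was NOT replaced is suspect.
        return SUPPLIES[1 - SUPPLIES.index(replaced)]
    # No recurrence: the supply that was replaced was the failing one.
    return replaced
```

If `failing_supply` points at a supply that has already been replaced, both supplies have now been replaced and, per step 4e, the I/O drawer planar assembly is next.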

MAP 4100: Isolating a LIC Process Read/Display Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Isolation
Determine whether the LIC installation will be from CD-ROM or diskette:
• If using a CD-ROM as the LIC installation media, go to MAP 4600: Isolating a CD-ROM Test Failure on page 429.
• If using a diskette as the LIC installation media, go to MAP 4620: Isolating a Diskette Drive Failure on page 430.

MAP 4110: Host Bay Drawer Fan Reporting Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The two power supplies in the same host bay drawer are reporting different status for one or more of the drawer's cooling fans. First replace the power supply that is reporting the fan failure. If the conflicting status error is still reported, replace the other power supply. Once the conflicting status problem has been repaired, any remaining host bay drawer fan problem can be repaired.

Isolation
1. Read the description above before continuing. The most likely failing FRUs are:
   • Host bay drawer power supply 1 or 2
   • Host bay planar
   Note: Replacing a power supply or cable can be done concurrently. Replacing the host bay planar requires removing both host bays from customer use.
2. Record the Last Occurrence Timestamp field value from the problem details display for this problem. After the FRU has been replaced, display the same field to determine whether the error is still occurring (the timestamp was updated). Also look for any new related problems that may have been created.
3. Do one of the following:
   • To replace or reseat a host bay drawer power supply, continue with the next step.
   • To replace the host bay planar, go to step 5 on page 352.
4. If the FRU list in the problem details gives the location of only one host bay drawer power supply, replace it first. If both power supply locations are listed, replace either power supply first.
   From the service terminal Main Service Menu, select: Repair Menu > Replace a FRU (Host bay x)
5. Replace the host bay planar using MAP 4850: Repair the Host Bay Drawer on page 458.
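Step 2 compares the Last Occurrence Timestamp before and after the FRU replacement. The following Python sketch shows that comparison; the timestamp format string is an assumption, because the exact format shown in the problem details display is not given here, and `error_still_occurring` is a hypothetical name.

```python
from datetime import datetime

# ASSUMPTION: illustrative format only; adjust it to match what the problem
# details display actually shows.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def error_still_occurring(recorded: str, current: str, fmt: str = TIMESTAMP_FORMAT) -> bool:
    """True if the Last Occurrence Timestamp moved forward after the FRU was replaced."""
    before = datetime.strptime(recorded.strip(), fmt)
    after = datetime.strptime(current.strip(), fmt)
    return after > before
```

Parsing the values rather than comparing the raw strings avoids false results if the display pads or reorders fields; a new related problem must still be checked separately, as step 2 says.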

MAP 4120: Handling Unexpected Resources


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This failure indicates that a resource has been detected (ESC = 1202) that has not been properly installed in the 2105 Model 800.

Isolation
1. Is there another problem (ESC = 1201) indicating that a resource is missing?
   - Yes, a FRU has been placed in the wrong location and must be moved. Go to step 6 on page 353.
   - No, continue with the next step.
2. Look at the resource in the FRU list of the problem. The 2105 Model 800 has detected a resource that has not been properly installed. Should this resource be installed in this machine?
   - Yes, record the Problem ID number, then continue with the next step to install this resource.
   - No, go to step 7 on page 353.
3. Look at Install and Remove in Chapter 5 of Volume 2 to see if there is an installation procedure for this resource. Is there an installation procedure for this resource?
   - Yes, continue with the next step and perform the installation.
   - No, there is no installation process for this resource. Call the next level of support for assistance.
4. Perform the installation as described in the Service Guide. Were you able to complete the installation?
   - Yes, continue with the next step to close the original problem.
   - No, contact your next level of support.
5. The problem is now resolved; close the original problem. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
   Select the problem with the ID you recorded in step 2. Scroll to the bottom of the display and select the line that starts with:
      Close Problem .....
   The problem is now closed and this repair is complete.

352

VOLUME 1, TotalStorage ESS Service Guide



6. You are going to move the FRU to the correct location. Select the FRU in the FRU list of the other problem, which indicates the missing resource. When directed to replace the FRU, move the FRU to the correct location. Continue through verification. Does verification run without a problem?
   - Yes, the problem is resolved. Return to the service terminal and follow the directions to return the resource to the customer and close the problem.
   - No, resolve the problem created by verification.
7. You will remove the resource from the system.
   a. Select the FRU from the problem FRU list.
   b. When you are directed to replace the FRU, follow the Remove/Replace instructions to remove the FRU, but do not replace it. Follow any instructions for required reassembly.
   c. Go through the verification process. Does verification run without a problem?
      - Yes, the problem is resolved. Return to the service terminal and follow the directions to return the resource to the customer and close the problem.
      - No, resolve the problem created during verification.

MAP 4130: Handling a Missing or Failing Resource


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This failure indicates that a resource that should be in the 2105 Model 800 has not been detected (ESC = 1201). This may mean that the resource is not in the expected location, or that the resource is failing in such a way that it cannot be detected.

Isolation
1. Is there another problem (ESC = 1202) indicating that a resource is unexpected?
   - Yes, a FRU has been placed in the wrong location. Continue with the next step to move the FRU.
   - No, go to step 3.
2. You are going to move the FRU to the correct location. Select the FRU in the FRU list of either of the two problems. When directed to replace the FRU, move the FRU to the correct location. Continue through verification. Does verification run without a problem?
   - Yes, the problem is resolved. Return to the service terminal and follow the directions to return the resource to the customer and close the problem.
   - No, resolve the problem created by verification.
3. Is the listed FRU an NVS card?
   - Yes, go to step 6 on page 354.
   - No, continue with the next step.
4. You will add or replace the missing or failing resource.
   a. Select the FRU from the problem FRU list.
   b. When you are directed to replace the FRU, follow the remove/replace instructions to remove the FRU.
Problem Isolation Procedures, CHAPTER 3

353



      Is there a FRU in that location?
      - Yes, the FRU has failed. Remove the FRU and continue with the next step.
      - No, the FRU is missing. Add a FRU to that location and continue with the next step.
5. Place a FRU in the specified location and follow the replace instructions through verification. Does verification run without a problem?
   - Yes, the problem is resolved. Return to the service terminal and follow the directions to return the resources to the customer and close the problem.
   - No, resolve the problem created during verification.
6. Display all the problems needing repair. Is there a problem for a second NVS card?
   - Yes, continue with the next step.
   - No, use the original problem to replace the NVS/IOA card. If the failure still occurs, the remaining FRU is the I/O drawer planar assembly.
7. Use Table 45 to determine whether both NVS cards share the same NVS power card. Do they share the same NVS power card?
   - Yes, replace the NVS power card common to both NVS/IOA cards. If the failure still occurs, the remaining FRUs are the I/O planar and each NVS/IOA card.
   - No, go to step 5 to replace each NVS card.
Table 45. NVS Power Cards
NVS/IOA Cards in Locations         | Share NVS Power Card in Location
Tx-U0.1-P1-I3 and Tx-U0.1-P1-I5    | Tx-U0.1-P1-I6
Tx-U0.1-P1-I4 and Tx-U0.1-P1-I9    | Tx-U0.1-P1-I10
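Step 7 and Table 45 amount to a small lookup. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that both location codes come from the same I/O drawer (the same Tx-U0.1 prefix, as the table rows imply).

```python
from typing import Optional, Tuple

# Table 45: pairs of NVS/IOA card slots and the NVS power card slot they share.
SHARED_POWER_CARD = {
    frozenset({"P1-I3", "P1-I5"}): "P1-I6",
    frozenset({"P1-I4", "P1-I9"}): "P1-I10",
}


def split_location(location: str) -> Tuple[str, str]:
    """Split 'T1-U0.1-P1-I3' into the drawer prefix 'T1-U0.1-' and the slot 'P1-I3'."""
    idx = location.index("P1-")
    return location[:idx], location[idx:]


def shared_power_card(loc_a: str, loc_b: str) -> Optional[str]:
    """Return the location of the NVS power card both NVS/IOA cards share, or None."""
    drawer_a, slot_a = split_location(loc_a)
    drawer_b, slot_b = split_location(loc_b)
    if drawer_a != drawer_b:
        return None  # assumption: cards in different I/O drawers never share a power card
    slot = SHARED_POWER_CARD.get(frozenset({slot_a, slot_b}))
    return drawer_a + slot if slot else None
```

A result of None corresponds to the No branch of step 7.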

MAP 4140: Isolating a LIC Activation Process Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: Do not power off the 2105 Model 800 unless instructed to do so.

Description
A cluster hard disk drive is failing, or the data on it has been corrupted.

Isolation
1. Select the type of LIC update process that failed:
   - If Automatic LIC Activation failed, call the next level of support.
   - If Multiple LIC Activation failed, continue with the next step.
2. Connect the service terminal to the failing cluster and attempt to log in. Was the login successful?
   - Yes, continue with the next step.
   - No, go to step 5 on page 355.
3. Do the following, and then return here and continue:
   - Display and repair any problem for the failing cluster.

354




   - Display the cluster dual hard disk drive status and repair any failing hard disk drive. From the service terminal Main Service Menu, select:
        Repair Menu
        Cluster Dual Hard Disk Drive Menu
        Display Cluster Dual Hard Disk Drive Status (Identify/Replace a Failing Cluster Hard Disk Drive)
   - Go to MAP 43A0: Bootlist Management Using SMS on page 387 to isolate a failure-to-boot problem (hardware or software).
   Were any of the above actions used to successfully repair a problem?
   - Yes, continue with the next step.
   - No, call your next level of support.
4. Attempt the LIC Activation process again. Was it successful?
   - Yes, exit this MAP and return to the procedure that sent you here.
   - No, call the next level of support.
5. Is the failing cluster hung, displaying a code in the CEC drawer operator panel?
   - Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - No, call the next level of support.

MAP 4150: PPS to RPC Interface Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A failure at the interface between the primary power supply (PPS) and RPC card has been detected. The failure can be caused by the PPS, the RPC card or the cable between them.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Replace the FRUs listed in the problem. If the error still occurs, replace the cable between the RPC card and the PPS:
   - The PPS in rack 1 connects to the RPC card J2 connector, slot 6 (near the front of the rack).
   - The PPS in rack 2 connects to the RPC card J2 connector, slot 5 (near the rear of the rack).

MAP 4160: Isolating Memory Related Error Codes


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


355


Description
You have been directed here because you have an SRN or service processor error code that lists one or more memory FRUs (memory DIMM or memory riser card).
- Memory DIMMs work together in quads (four memory DIMMs with memory words spread across them). See the artwork and tables at the end of this MAP.
- A single memory DIMM may fail and not affect the operation of the other three DIMMs in its quad. Problem logs are created for the single failing DIMM.
- A memory DIMM may fail and affect one or more of the other memory DIMMs in its quad. Problem logs are created for each affected memory DIMM.
- All memory DIMMs in a quad must be of the same type and size.

Isolation
1. Read the Description section above before continuing.
2. Display problems needing repair. Is there more than one problem that lists memory FRUs for the failing cluster?
   - Yes. A single memory failure can create multiple problem logs. For example, if a memory DIMM is physically missing during cluster power on, one or two problems may be created for each of the three remaining DIMMs in the same quad, and no problem is created for the missing DIMM. Go to step 4.
   - No, continue with the next step.
3. Does the problem list a single memory DIMM with no other FRUs?
   - Yes, go to step 6.
   - No, continue with the next step.
4. If more than one memory DIMM is called out, go to the service processor Memory Configuration/Deconfiguration Menu to verify the memory DIMM state. To access the SP menus:
   - Connect the service terminal to the working cluster.
   - Use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to quiesce and power off the failing cluster.
   - When the failing cluster CEC drawer operator panel displays OK, connect the service terminal (CE MOST or laptop) to the I/O drawer S1 serial port connector and log in. Use the SP MAIN MENU, System Information Menu, Memory Configuration/Deconfiguration Menu.
5. From the Memory Configuration/Deconfiguration Menu, select the card or cards specified by the location codes of the failing memory DIMMs. If the first character of the error status of any memory DIMM is 1, 2, or 3 (but not 0 or 4), that memory DIMM is suspect. Record its location. For more information on the error status of the memory DIMMs, see System Information Menu, step 2 on page 419.
   - If only one memory DIMM was recorded, go to step 6.
   - If more than one memory DIMM was recorded, and the memory DIMMs reside in one quad, go to step 7 on page 357.
   - If more than one memory DIMM was recorded, and the memory DIMMs reside in more than one quad, go to step 8 on page 357.
6. Only one memory DIMM was recorded. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. The memory DIMM.
   b. The memory quad. See Table 46.

356




c. The memory riser card.
Table 46. Memory Quad DIMMs
Memory Quad | Memory Riser Card DIMM Slots
A           | 1, 2, 15, 16
B           | 3, 4, 13, 14
C           | 5, 6, 11, 12
D           | 7, 8, 9, 10

7. More than one memory DIMM was recorded, and the memory DIMMs reside in one quad. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. All of the failing memory DIMMs.
   b. The memory riser card.
8. More than one memory DIMM was recorded, and the memory DIMMs reside in more than one quad. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. The memory riser card.
   b. All of the failing memory DIMMs.

Figure 129. 2105 Model 800 Memory Riser Card Memory DIMM Locations (s009638)
[Figure: DIMM installation layout; each slot is labeled with its memory quad.
   Row 1: slot 15 (A), 13 (B), 11 (C), 9 (D), 7 (D), 5 (C), 3 (B), 1 (A)
   Row 2: slot 16 (A), 14 (B), 12 (C), 10 (D), 8 (D), 6 (C), 4 (B), 2 (A)]
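The quad assignments in Table 46 and the branching in step 5 can be expressed as a short Python sketch. It is illustrative only: the function names are hypothetical, and it assumes the recorded error status of each DIMM is available as a string keyed by riser card slot number; it does not read anything from the service processor menus.

```python
from typing import Dict, Optional

# Table 46: memory quad -> memory riser card DIMM slots.
QUAD_SLOTS = {
    "A": (1, 2, 15, 16),
    "B": (3, 4, 13, 14),
    "C": (5, 6, 11, 12),
    "D": (7, 8, 9, 10),
}
SLOT_TO_QUAD = {slot: quad for quad, slots in QUAD_SLOTS.items() for slot in slots}


def is_suspect(error_status: str) -> bool:
    """Step 5: a DIMM is suspect if its error status starts with 1, 2, or 3."""
    return error_status[:1] in ("1", "2", "3")


def next_step(status_by_slot: Dict[int, str]) -> Optional[int]:
    """Return the step (6, 7, or 8) that step 5 directs you to, or None."""
    suspects = [slot for slot, status in status_by_slot.items() if is_suspect(status)]
    if not suspects:
        return None  # no suspect DIMM recorded: step 5 does not cover this case
    if len(suspects) == 1:
        return 6
    quads = {SLOT_TO_QUAD[slot] for slot in suspects}
    return 7 if len(quads) == 1 else 8
```

For example, suspect DIMMs in slots 1 and 16 are both in quad A, so the sketch returns step 7; suspect DIMMs in slots 1 and 3 span quads A and B, so it returns step 8.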

MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A CEC drawer, I/O drawer, or host bay drawer power supply has lost input power to one of its two connectors. It is still powered up on the remaining good input power connector.


357


Isolation
1. Observe both input power LED indicators on the failing power supply. Find the condition below that applies:
   - One indicator is off: continue with the next step.
   - Both indicators are on: the input is no longer failing. Return to the problem that sent you here.
   - Both indicators are off: use the problem to repair the power supply that has no input power.
2. Observe the same indicator on the power supply alongside this one. Is the same indicator off on the other power supply?
   - Yes, continue with the next step.
   - No, verify that the power input cable is properly seated. Unplug the cable and inspect it and the power supply connector for damage, and replace them if damage is found. If no damage is found, replace the power supply.
3. Observe the PPS digital status display at the front of the rack. Is a two-digit status code displayed?
   - Yes, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - No, the display is blank. Continue with the next step.
4. Each PPS has two power cables, and each cable supplies six power supplies:
   - The power cable plugged into PPS connector J7-1 supplies input power to the six power supplies above it (CEC drawer, I/O drawer, and host bay drawer).
   - The power cable plugged into PPS connector J7-2 supplies input power to the six power supplies across from it (CEC drawer, I/O drawer, and host bay drawer).
   Use Table 47 to find the power supply whose LED indicator is off. In the same table row, find the PPS connector in the first column. Verify that the power cable is properly connected to the PPS connector and that there is no connector damage. The possible failing FRUs are the cable and the PPS.
   Note: With all seven cable connectors unplugged, you can use an ohmmeter to check the continuity of the cable before replacing it.
Table 47. PPS Cable Connectors
PPS Connector (Location Code) | Host Bay Power Supply Connector (Location) | I/O Drawer Power Supply Connector (Location) | CEC Drawer Power Supply Connector (Location)
J7-1 (R1-V1) PPS-1 | J11 (R1-B1-V1 and R1-B1-V2) | J1 (T1-U0.1-V1 and T1-U0.1-V2) | J1 (T1-U1.1-V1 and T1-U1.1-V2)
J7-2 (R1-V1) PPS-1 | J12 (R1-B3-V1 and R1-B3-V2) | J1 (T2-U0.1-V1 and T2-U0.1-V2) | J1 (T2-U1.1-V1 and T2-U1.1-V2)
J7-1 (R1-V2) PPS-2 | J11 (R1-B3-V1 and R1-B3-V2) | J1 (T2-U0.1-V1 and T2-U0.1-V2) | J1 (T2-U1.1-V1 and T2-U1.1-V2)
J7-2 (R1-V2) PPS-2 | J12 (R1-B1-V1 and R1-B1-V2) | J1 (T1-U0.1-V1 and T1-U0.1-V2) | J1 (T1-U1.1-V1 and T1-U1.1-V2)
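In Table 47, every power supply location appears in two rows, one for each PPS, which is what makes the input power redundant. The Python sketch below (hypothetical names, illustrative only) indexes the table that way. It does not say which of the two LED indicators corresponds to which PPS; use the physical indicator and the table row as described in step 4.

```python
from typing import List

# Table 47: each PPS connector and the six power supply locations its cable feeds.
PPS_FEEDS = {
    "J7-1 (R1-V1) PPS-1": ["R1-B1-V1", "R1-B1-V2", "T1-U0.1-V1", "T1-U0.1-V2", "T1-U1.1-V1", "T1-U1.1-V2"],
    "J7-2 (R1-V1) PPS-1": ["R1-B3-V1", "R1-B3-V2", "T2-U0.1-V1", "T2-U0.1-V2", "T2-U1.1-V1", "T2-U1.1-V2"],
    "J7-1 (R1-V2) PPS-2": ["R1-B3-V1", "R1-B3-V2", "T2-U0.1-V1", "T2-U0.1-V2", "T2-U1.1-V1", "T2-U1.1-V2"],
    "J7-2 (R1-V2) PPS-2": ["R1-B1-V1", "R1-B1-V2", "T1-U0.1-V1", "T1-U0.1-V2", "T1-U1.1-V1", "T1-U1.1-V2"],
}


def feeding_connectors(supply_location: str) -> List[str]:
    """Return every PPS connector whose cable feeds the given power supply location."""
    return [conn for conn, supplies in PPS_FEEDS.items() if supply_location in supplies]
```

For example, feeding_connectors("T1-U1.1-V2") lists one connector on PPS-1 and one on PPS-2.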

358


MAP 4180: RPC to RPC Communication Failure



Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An RPC card has been detected as failing to communicate with the RPC card reporting the error. The loss of communication could be caused by:
- RPC to RPC communication cable connectors that are loose or faulty
- The RPC card not receiving power from its PPS
- The RPC card not powering on correctly (the green indicator LED is off)
- RPC card DIP switch positions 1 and 2 being set the same (they should be set opposite each other)

Isolation
1. Observe the green LED indicator on the failing RPC card. Is the indicator on?
   - Yes, verify that the RPC to RPC communication cable is connected to the J2 slot 8 connector of each card. Continue at step 4.
   - No, continue with the next step.
2. Do the following:
   - Verify that the RPC to RPC communication cable is connected to the J2 slot 8 connector of each card.
   - Verify that the PPS to RPC power cable is connected to RPC card connector J2 slot 6 and PPS connector J4.
   - Verify that the RPC card DIP switch positions 1 and 2 are set opposite each other.
   - Continue with the next step.
3. Observe the PPS digital status display at the front of the rack. Is a two-digit status code displayed?
   - Yes, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   - No, continue with the next step.
4. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem for the failing RPC card?
   - Yes, repair it now.
   - No, continue with the next step.
5. Determine whether the problem still occurs after the RPC card is powered off, powered on, and unfenced. From the service terminal Main Service Menu, select:
      Repair Menu
      Replace a FRU
   Did the RPC card green indicator come on after power on?
   - Yes, the problem is no longer occurring. Close the problems, and then use the End of Call Status option. From the service terminal Main Service Menu, select:
        Repair Menu
        Close a Previously Repaired Problem

359



        End of Call Status
   - No, the possible failing FRUs are the RPC cards and the RPC to RPC communication cable. Use the Replace a FRU option.

MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
One or more host bay drawer power supplies have been detected as failing to communicate with the RPC card reporting the error. The loss of communication could be caused by:
- A loose or faulty RPC to host bay drawer power supply communication cable
- A failing host bay drawer power supply
- A failing RPC card communication interface

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Verify that the RPC to host bay drawer power supply communication cables are connected correctly. Each cable has three connectors: one for the RPC card and one for each of the two host bay drawer power supplies. Each row of Table 48 gives the connector locations of one of the four communication cables.
Table 48. Host Bay Drawer Power Supply Communication Cable Connectors
RPC Connector    | Host Bay Drawer Power Supply Connector            | Host Bay Drawer Power Supply Connector
RPC-1 J2 slot 9  | T1-U1.1-V1/J15 (right connector viewed from rear) | T1-U1.1-V2/J15 (right connector viewed from rear)
RPC-2 J2 slot 9  | T1-U1.1-V1/J14 (left connector viewed from rear)  | T1-U1.1-V2/J14 (left connector viewed from rear)
RPC-1 J2 slot 13 | T2-U1.1-V1/J15 (right connector viewed from rear) | T2-U1.1-V2/J15 (right connector viewed from rear)
RPC-2 J2 slot 13 | T2-U1.1-V1/J14 (left connector viewed from rear)  | T2-U1.1-V2/J14 (left connector viewed from rear)

   Are the cables connected correctly?
   - Yes, the possible failing FRUs are the host bay drawer power supply, the RPC card to host bay drawer power supply communication cable, and the RPC card. Use the Repair Menu, Replace a FRU option for the FRU. For the cable, use the RPC card as the FRU being replaced.
     Note: If the failure is to one power supply, the failing FRU is probably the power supply. If the failure is to both power supplies, the failing FRU is probably the communication cable or the RPC card.

360


   - No, to replug the cable, use the Repair Menu, Replace a FRU option for the FRU that the cable connects to.

MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The host bay drawer power supplies are reporting conflicting status for the host bay drawer power supply cooling fans.

Isolation
1. The most likely failing FRU is host bay drawer power supply 1 or 2.
2. Record the Last Occurrence Time stamp field value from the problem details display for this problem. After the FRU has been replaced, display the same field to determine whether the error is still occurring (the time stamp was updated). Look for any new related problems that may have been created.
3. Replace one host bay drawer power supply.
   - If it still fails, replace the other power supply.
   - If it still fails, call the next level of support.
   Note: If the power supply to be replaced is not listed in the problem, use the Repair Menu, Replace a FRU option instead.

MAP 41B0: CPI Interface NVS/IOA Card to Host Bay Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A CPI interface failure has been detected between an NVS/IOA card in a cluster and the connected host bay. There may be a problem with the CPI cable.

Isolation
Note: The CPI cable connector can easily be damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. A Velcro cable tie can be taken from elsewhere on the 2105; return the tie when finished.
1. Find the ESC from the problem in Table 49 to determine which CPI interface is failing. Then continue with the next step.
Table 49. Failing CPI Interface
ESC  | Cable From Host Bay         | Cable To NVS/IOA Card             | Resource
1111 | Host bay 1, left connector  | Cluster 1, I/O drawer card slot 3 | cpcpi4
1112 | Host bay 1, right connector | Cluster 2, I/O drawer card slot 3 | cpcpi4
1113 | Host bay 2, left connector  | Cluster 1, I/O drawer card slot 5 | cpcpi6


361



Table 49. Failing CPI Interface (continued)
ESC  | Cable From Host Bay         | Cable To NVS/IOA Card             | Resource
1114 | Host bay 2, right connector | Cluster 2, I/O drawer card slot 5 | cpcpi6
1115 | Host bay 3, left connector  | Cluster 1, I/O drawer card slot 4 | cpcpi5
1116 | Host bay 3, right connector | Cluster 2, I/O drawer card slot 4 | cpcpi5
1117 | Host bay 4, left connector  | Cluster 1, I/O drawer card slot 9 | cpcpi7
1118 | Host bay 4, right connector | Cluster 2, I/O drawer card slot 9 | cpcpi7

2. Are you installing the 2105, or did you just replace the host bay planar, the NVS/IOA card, or both for the failing CPI interface?
   - Yes, check or reseat the CPI cable at both ends for the failing CPI interface, and then retry the FRU verification. If it still fails, the possible failing FRUs are the CPI cable, the NVS/IOA card (in the I/O drawer), and the host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
   - No, continue with the next step.
3. Did you just replace the CPI cable for the failing CPI interface?
   - Yes, check or reseat the CPI cable at both ends for the failing CPI interface, and then retry the FRU verification. If it still fails, the possible failing FRUs are the CPI cable, the NVS/IOA card (in the I/O drawer), and the host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
   - No, the possible failing FRUs are the NVS/IOA card (in the I/O drawer) and the host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
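Table 49 follows a regular pattern: each host bay has one ESC per connector, odd ESCs map to cluster 1, and even ESCs map to cluster 2. The following Python sketch (hypothetical names, illustrative only) encodes the table as a lookup so that both ends of the cable for a given ESC can be printed.

```python
from typing import NamedTuple


class CpiPath(NamedTuple):
    host_bay: int
    connector: str  # host bay connector: "left" or "right"
    cluster: int
    io_slot: int    # I/O drawer card slot of the NVS/IOA card
    resource: str


# Table 49: ESC -> failing CPI interface.
FAILING_CPI = {
    1111: CpiPath(1, "left", 1, 3, "cpcpi4"),
    1112: CpiPath(1, "right", 2, 3, "cpcpi4"),
    1113: CpiPath(2, "left", 1, 5, "cpcpi6"),
    1114: CpiPath(2, "right", 2, 5, "cpcpi6"),
    1115: CpiPath(3, "left", 1, 4, "cpcpi5"),
    1116: CpiPath(3, "right", 2, 4, "cpcpi5"),
    1117: CpiPath(4, "left", 1, 9, "cpcpi7"),
    1118: CpiPath(4, "right", 2, 9, "cpcpi7"),
}


def describe(esc: int) -> str:
    """Describe both ends of the CPI cable for an ESC listed in Table 49."""
    p = FAILING_CPI[esc]
    return (f"Host bay {p.host_bay}, {p.connector} connector to "
            f"Cluster {p.cluster}, I/O drawer card slot {p.io_slot} ({p.resource})")
```

An ESC that is not in Table 49 raises KeyError, which is a reasonable signal that this MAP does not apply.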

MAP 41C0: ESC 2770 or 2771, Missing CPI Detected


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An NVS/IOA card CPI interface resource has been detected as logically missing. The ESC defines if one or both clusters are detecting this condition.

Isolation
1. Do one of the following:
   - If the ESC is 2770, continue with the next step.
   - If the ESC is 2771, go to step 5 on page 363.
2. Are FRUs listed in the problem?
   - Yes, continue with the next step.
   - No, go to step 4 on page 363.
3. Use Table 50 to find the combination of FRUs listed and the needed action.

362




Table 50. CPI FRUs
Engineering FRU names listed in problem: rsioa04 and rsioa06, with location codes for the same cluster
   Description and action:
   - The NVS Battery Charger card in slot P1-I6 should be added to the listed FRU group. It provides power to this pair of NVS/IOA cards.
   - Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the NVS/IOA cards. For other FRUs, use normal repair procedures.
Engineering FRU names listed in problem: rsioa05 and rsioa07, with location codes for the same cluster
   Description and action:
   - The NVS Battery Charger card in slot P1-I10 should be added to the listed FRU group. It provides power to this pair of NVS/IOA cards.
   - Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the NVS/IOA cards. For other FRUs, use normal repair procedures.
Engineering FRU names listed in problem: any other combination of FRUs and locations
   Description and action: Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the NVS/IOA cards. For other FRUs, use normal repair procedures.

4. Determine which cluster is detecting the problem. In the problem details, go to the Additional Engineering Information section for this problem to see the Failing Cluster field value:
   - 1 = cluster 1, left
   - 2 = cluster 2, right
5. Determine which CPI interface is failing by doing one of the following:
   - Go to step 6 to display the resource list using the service login.
   - Call the next level of support. They will need to access the detailed error log information by using a remote login or a PE package. (The PE database has a procedure for ESC = 2770 or 2771 that they can use.) Then go to step 7 on page 364.
6. Display the resources and determine which of the four CPI interfaces is not listed as Available. Repeat this procedure while logged in to each cluster. From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Show Storage Facility Resources Menu
      Show Storage Facility Resources
   The CPI interface resource names are listed in the first column. The four CPI resources are cpcpi4, cpcpi5, cpcpi6, and cpcpi7. A working resource has a status of Available in the second column. A failing resource has a status of Defined in the second column, or is not listed in the first column at all.
   Several hundred resources can be listed, so use the AIX find feature: type / to open the find feature, then type cpcpi and press Enter to search for the first occurrence. A CPI resource should be displayed. Repeat the find three more times.
   How many CPI interfaces are not listed as Available, or are not listed at all?
   - One, continue with the next step.

363



   - More than one, call the next level of support.
7. Determine the location of the NVS/IOA card to be replaced. Use step 4 on page 363 to determine the cluster I/O drawer. Use Table 51 to convert the CPI interface (from step 5 on page 363) to the I/O drawer slot location.

Table 51. Cluster I/O Drawer Slot Locations
CPI Interface | I/O Drawer Slot Location
cpcpi4        | P1-I3
cpcpi5        | P1-I4
cpcpi6        | P1-I5
cpcpi7        | P1-I9

8. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the NVS/IOA FRU or FRUs.
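Steps 6 and 7 amount to two operations: find the one CPI resource that is missing or not Available, then convert it with Table 51. The Python sketch below is illustrative only. It assumes that the resource list has been captured as text lines with the resource name in the first column and the status in the second, as step 6 describes. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Table 51: CPI interface resource -> I/O drawer slot of the NVS/IOA card.
CPI_TO_SLOT = {"cpcpi4": "P1-I3", "cpcpi5": "P1-I4", "cpcpi6": "P1-I5", "cpcpi7": "P1-I9"}


def failing_cpi(resource_lines: Iterable[str]) -> List[str]:
    """Return the CPI resources that are not listed, or not listed as Available."""
    status: Dict[str, str] = {}
    for line in resource_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] in CPI_TO_SLOT:
            status[fields[0]] = fields[1]
    return [name for name in sorted(CPI_TO_SLOT) if status.get(name) != "Available"]


def slot_to_replace(resource_lines: Iterable[str]) -> Optional[str]:
    """Return the NVS/IOA card slot if exactly one CPI interface is failing, else None."""
    failing = failing_cpi(resource_lines)
    if len(failing) != 1:
        return None  # more than one failing: the MAP says to call the next level of support
    return CPI_TO_SLOT[failing[0]]
```

For example, a list in which cpcpi5 shows Defined maps to slot P1-I4 in the I/O drawer of the cluster found in step 4.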

MAP 41D0: CPI Problem for Host Bay Slot Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A CPI problem is being detected for a host bay planar host adapter slot. The most likely cause is the host adapter installed in the slot. The failure was discovered at a time when the normal reporting information was not available. Special information in the FRU Name field of the problem details identifies the failing slot location.

Isolation
1. Determine which host adapter card is failing. At least one FRU listed in the problem details is a host adapter card. To determine which host adapter card to replace, display the FRU Name fields in the problem details, and use Table 52 to translate the FRU Name value to the location of the host adapter card to replace.
   Note: The FRU Name field syntax is Slot xy0, where x is the CPI interface and y is the host bay slot (0 through 3 for slots 1 through 4).
Table 52. Host Adapter Card FRU Names
Text in FRU Name Field | Host Adapter Card to Replace
Slot 400               | Host Bay 1, Slot 1
Slot 410               | Host Bay 1, Slot 2
Slot 420               | Host Bay 1, Slot 3
Slot 430               | Host Bay 1, Slot 4
Slot 500               | Host Bay 3, Slot 1
Slot 510               | Host Bay 3, Slot 2
Slot 520               | Host Bay 3, Slot 3
Slot 530               | Host Bay 3, Slot 4
Slot 600               | Host Bay 2, Slot 1
Slot 610               | Host Bay 2, Slot 2

364




Table 52. Host Adapter Card FRU Names (continued)
Text in FRU Name Field | Host Adapter Card to Replace
Slot 620               | Host Bay 2, Slot 3
Slot 630               | Host Bay 2, Slot 4
Slot 700               | Host Bay 4, Slot 1
Slot 710               | Host Bay 4, Slot 2
Slot 720               | Host Bay 4, Slot 3
Slot 730               | Host Bay 4, Slot 4

2. Replace the FRU or FRUs listed in the problem using the Repair Menu, Replace a FRU option. If the problem still occurs, call the next level of support.
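Table 52 can also be read as a decoding rule for the Slot xy0 syntax: x selects the host bay (4 for host bay 1, 5 for host bay 3, 6 for host bay 2, 7 for host bay 4), and y is the zero-based slot number. The following Python sketch is illustrative only; the function name is hypothetical, and the decoding rule is derived from Table 52 rather than from any documented format specification.

```python
import re

# Derived from Table 52: CPI interface digit x -> host bay number.
CPI_TO_HOST_BAY = {4: 1, 5: 3, 6: 2, 7: 4}


def host_adapter_location(fru_name: str) -> str:
    """Translate a FRU Name such as 'Slot 520' to a host adapter card location."""
    match = re.fullmatch(r"Slot ([4-7])([0-3])0", fru_name.strip())
    if match is None:
        raise ValueError(f"unexpected FRU Name value: {fru_name!r}")
    cpi, slot_index = int(match.group(1)), int(match.group(2))
    return f"Host Bay {CPI_TO_HOST_BAY[cpi]}, Slot {slot_index + 1}"
```

For example, host_adapter_location("Slot 520") returns "Host Bay 3, Slot 3", matching the table row.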

MAP 41E0: CPI Failure Needing CPI Cable as FRU


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A CPI problem has been detected which can be caused by a CPI cable or one of the FRUs listed in the problem.

Isolation
1. A CPI cable should be replaced along with the FRUs listed in the problem. Use Table 53 to determine which CPI cable to replace.
   Note: The CPI cable connector can easily be damaged during a FRU replacement. Temporarily use a Velcro cable tie to secure the loose cable end where it will not be damaged. A Velcro cable tie can be taken from elsewhere on the 2105; return the tie when finished.
Table 53. CPI Cable FRUs
Listed FRUs                      | CPI Cable to Add to FRU List
NVS/IOA card                     | CPI cable connected to the NVS/IOA card
Host bay planar                  | CPI cable between the host bay planar and the cluster given in the Failing Cluster field of the problem details
NVS/IOA card and host bay planar | CPI cable between the listed FRUs

2. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.

MAP 41F0: A Temporary CPI Error was Detected


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A host bay or cluster was temporarily unavailable for customer use because it was fenced due to a CPI error. CPI diagnostics were run automatically and did not fail.


365



The host bay or cluster was returned to customer use. The customer may have detected a temporary loss of performance or access.

Isolation
There is no repair action needed for this problem, and it should be closed.
1. Contact the customer to ensure that the recovery was successful.
2. Close the problem.
Note: If this CPI error recurs, a new problem will be created that will require FRUs to be replaced. The FRUs in this problem are listed as reference information for the next level of support.

MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: Do not power off the 2105 unless instructed to do so.

Description
The time to complete a cluster IML is extended when one or both NVS batteries, in the battery assembly FRU, are being charged. This can also happen when the charger card status is being rebuilt. These conditions can occur when an NVS battery FRU has been disconnected, replaced, or discharged. The NVS batteries must be fully charged to guarantee NVS data retention. Note: If one or both battery cables were left disconnected, a battery failure will be indicated. This will be handled as a battery failure, and a problem will be created during the cluster IML.

Isolation
1. The cluster LCD panel displays a status message that references this MAP. The status message also shows which NVS batteries are being charged.
2. Find your condition in the table below. You must wait up to the maximum amount of time for the cluster IML to complete. If the IML does not complete, call your next level of support.
For each condition, the table gives the total time for the cluster to complete IML from power on (see the note below) and the action to take.

Condition: I/O drawer NVS battery FRU was replaced
• Minimum 30 minutes (battery does not need to be recharged, but the battery charge profile must be rebuilt)
• Maximum 90 minutes (battery needs to be fully recharged and the battery charge profile must be rebuilt)
Action: Wait for IML to complete.

Condition: I/O drawer NVS charger card FRU was replaced
• Maximum 30 minutes (battery does not need to be recharged, but the battery charge profile must be rebuilt)
Action: Wait for IML to complete.

366

VOLUME 1, TotalStorage ESS Service Guide

Condition: I/O drawer FRU replaced that required disconnecting an NVS battery cable
• Maximum 30 minutes (battery does not need to be recharged, but the battery charge profile must be rebuilt)
Action: Wait for IML to complete.

Condition: NVS battery was drained when the 2105 lost customer input power unexpectedly (the 2105 was not powered down using the operator panel white switch)
• Minimum 30 minutes (battery needs a minimal recharge)
• Maximum 90 minutes (battery needs to be fully recharged)
Action: Wait for IML to complete.

Condition: 2105 has been powered off for several days
• Minimum 30 minutes (battery does not need to be recharged, but the battery charge profile must be rebuilt)
• Maximum 90 minutes (battery needs to be fully recharged)
Action: Wait for IML to complete.

Condition: 2105 is being installed
• The first IML during an install is not delayed for an NVS battery status check.
Action: Return to the install instructions.

Note: If one of the I/O drawer power supplies is failing, the times listed are doubled.
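The table and its note can be reduced to a small lookup. The following Python sketch is illustrative only. The condition keys and the function name are hypothetical labels. Only the minute values and the doubling rule come from the table above.

```python
"""Sketch: expected IML wait window for the MAP 4200 conditions.

The minute values are transcribed from the MAP 4200 table; the condition
keys and the function name are hypothetical labels for this example.
"""

# (minimum, maximum) minutes from power on; None means no minimum is listed.
IML_WAIT_MINUTES = {
    "nvs_battery_replaced": (30, 90),
    "nvs_charger_card_replaced": (None, 30),
    "battery_cable_disconnected": (None, 30),
    "unexpected_input_power_loss": (30, 90),
    "powered_off_several_days": (30, 90),
}


def iml_wait_window(condition: str, power_supply_failing: bool = False):
    """Return (minimum, maximum) wait in minutes for a MAP 4200 condition.

    Per the table note, the listed times double when one of the I/O drawer
    power supplies is failing. An install is not listed because the first
    IML during an install is not delayed for an NVS battery status check.
    """
    minimum, maximum = IML_WAIT_MINUTES[condition]
    factor = 2 if power_supply_failing else 1
    return (None if minimum is None else minimum * factor, maximum * factor)


if __name__ == "__main__":
    print(iml_wait_window("nvs_battery_replaced"))             # (30, 90)
    print(iml_wait_window("nvs_charger_card_replaced", True))  # (None, 60)
```

If the IML has not completed by the maximum time, step 2 still applies: call your next level of support.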

MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: Do not power off the 2105 unless instructed to do so.

Description
• A blinking 888 suggests that either a hardware or software problem has been detected and a diagnostic message is ready to be read. Call the next level of support, because they may have the additional information and access authority needed for problem isolation and resolution.

Isolation
1. Perform the following steps to record the information contained in the blinking 888 message, and then call your next level of support.
a. Wait until the blinking 888 is displayed.
b. Record, in sequence, each code that is displayed after the blinking 888 goes away. Stop recording when the blinking 888 reappears. Separate each recorded code with a blank space.
c. Go to step 2 on page 368.



2. Use the first code recorded to determine the next step:
• Type 102, go to step 3.
• Type 103, go to step 4 on page 369.
3. Use the following steps and information to determine the content of the Type 102 message. Crash and dump status codes are listed later in this step.
Notes:
a. A Type 102 message is generated when a software or hardware error occurs while the system is running an application.
b. There are no SRNs associated with message Type 102.
The Type 102 message has the form 102 RRR SSS, where:
• 102 = Message type
• RRR = Crash code, the three-digit code that immediately follows the 102. See Crash Codes on page 368.
• SSS = Dump status code, the three-digit code that immediately follows the crash code. See Dump Progress Indicators (Dump Status Codes) on page 369.
Record the crash code and the dump status from the message you recorded. Are there additional codes following the dump status?
• Yes, this message also has a Type 103 message included in it. To decipher the SRN and FRU information in the Type 103 message, go to step 4 on page 369.
• No, call your next level of support. The 2105 software on the cluster hard disk drive has most likely been corrupted. You may be asked to use MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
Crash Codes
The crash codes that follow are part of a Type 102 message. They are grouped into three categories:
• Category 1: Dump analysis is the appropriate first action in problem determination; call the next level of support.
• Category 2: Dump analysis most likely will not aid in problem determination; begin the problem determination process with hardware support.
• Category 3: Both software and hardware support may be needed in problem determination; call the next level of support.
Category 1, Crash Codes
300 Data storage interrupt from the processor.
32x Data storage interrupt because of an I/O exception from IOCC.
38x Data storage interrupt because of an I/O exception from SLA.
400 Instruction storage interrupt.
600 AIX 4.3.3.3 and above: Alignment interrupt. Before AIX 4.3.3.3: AIX has crashed because the Portability Assist Layer (PAL) for this machine type has detected a problem.
605 AIX has crashed because the Portability Assist Layer (PAL) for this machine type has detected a problem (AIX 4.3.3.3 and above).
700 Program interrupt.
Category 2, Crash Codes
200 Machine check because of a memory bus error.



201 Machine check because of a memory time-out.
202 Machine check because of a memory card failure.
203 Machine check because of an out-of-range address.
204 Machine check because of an attempt to write to ROS.
205 Machine check because of an uncorrectable address parity.
206 Machine check because of an uncorrectable ECC error.
207 Machine check because of an unidentified error.
208 Machine check due to an L2 uncorrectable ECC.
500 External interrupt because of a scrub memory bus error.
501 External interrupt because of an unidentified error.
51x External interrupt because of a DMA memory bus error.
52x External interrupt because of an IOCC channel check.
53x External interrupt from an IOCC bus timeout; x represents the IOCC number.
54x External interrupt because of an IOCC keyboard check.
800 Floating point is not available.
Category 3, Crash Codes
000 Unexpected system interrupt.
558 There is not enough memory to continue the IPL.
600 AIX 4.3.3.3 and above: Alignment interrupt. Before AIX 4.3.3.3: AIX has crashed because the Portability Assist Layer (PAL) for this machine type has detected a problem.
605 AIX has crashed because the Portability Assist Layer (PAL) for this machine type has detected a problem (AIX 4.3.3.3 and above).
Note: If you have 888 102 605 0C5, go to step 5 on page 370.
Dump Progress Indicators (Dump Status Codes)
The following dump progress indicators, or dump status codes, are part of a Type 102 message:
0c0 The dump completed successfully.
0c1 The dump failed due to an I/O error.
0c2 A dump, requested by the user, is started.
0c3 The dump is inhibited.
0c4 The dump device is not large enough.
0c5 The dump did not start, or the dump crashed.
0c6 Dumping to a secondary dump device.
0c7 Reserved.
0c8 The dump function is disabled.
0c9 A dump is in progress.
0cc Unknown dump failure.
4. Use the following steps and information to determine the content of the Type 103 message.
Note: A Type 103 message is generated when a hardware error is detected.
The Type 103 message has the form 103 XXX YYY, where:
• 103 = Message type
• XXX YYY = SRN (XXX is the three-digit code following the 103, and YYY is the three-digit code following XXX)
a. Record the SRN and FRU location codes from the recorded message. Note: If you have 888 103 605 0C5, go to step 5 on page 370.
b. Call the next level of support before continuing.


c. Find the SRN in the SRN listing and do the indicated action. See Bus SRN to FRU Reference Table in chapter 9 of Volume 3.
5. Resetting the cluster SPCN code may clear this error condition. Do the following to reset the SPCN code:
a. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to do the following major steps. Do not replace any FRUs.
b. Log in to the working cluster and use the Alternate Cluster Repair Menu options.
c. Quiesce the failing cluster.
d. Power off the failing cluster.
e. Unplug the input power cables from the CEC drawer and I/O drawer power supplies. Leave them unplugged for 30 seconds to fully drain the circuits.
f. Reconnect the input power cables.
g. Power on the failing cluster:
• If the cluster does not stop at blinking 888, continue in MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to check for problems needing repair, and then resume the cluster.
• If the cluster hangs at blinking 888, call the next level of support.
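The recorded message follows a fixed layout: a message type, then three-digit codes. Because of that, the decoding in steps 2 through 4 can be sketched in Python. This is a hypothetical helper, not an IBM tool. It assumes the codes are entered as one space-separated string. It also assumes that a Type 103 message embedded in a Type 102 message may or may not repeat its leading 103. The crash and dump status tables are transcribed from this MAP.

```python
"""Sketch: decode a blinking-888 sequence recorded in MAP 4240, step 1.

Assumptions (not stated in the guide): the codes arrive as one
space-separated string, and an embedded Type 103 message may or may not
repeat its leading "103". All function names are hypothetical.
"""

# Crash code categories from this MAP; 600 and 605 appear in two categories.
CRASH_CATEGORIES = {
    1: {"300", "32x", "38x", "400", "600", "605", "700"},
    2: {"200", "201", "202", "203", "204", "205", "206", "207", "208",
        "500", "501", "51x", "52x", "53x", "54x", "800"},
    3: {"000", "558", "600", "605"},
}

# Keys are uppercase because the input is uppercased (the guide prints 0c0).
DUMP_STATUS = {
    "0C0": "The dump completed successfully.",
    "0C1": "The dump failed due to an I/O error.",
    "0C2": "A dump, requested by the user, is started.",
    "0C3": "The dump is inhibited.",
    "0C4": "The dump device is not large enough.",
    "0C5": "The dump did not start, or the dump crashed.",
    "0C6": "Dumping to a secondary dump device.",
    "0C7": "Reserved.",
    "0C8": "The dump function is disabled.",
    "0C9": "A dump is in progress.",
    "0CC": "Unknown dump failure.",
}


def crash_categories(code: str) -> list[int]:
    """Return every category that lists the crash code."""
    wildcard = code[:2] + "x"  # e.g. "321" matches the "32x" entry
    return [cat for cat, codes in CRASH_CATEGORIES.items()
            if code in codes or wildcard in codes]


def decode_103(codes: list[str], embedded: bool = False) -> dict:
    """XXX YYY form the SRN; any remaining codes are kept as FRU locations."""
    if embedded and codes and codes[0] == "103":
        codes = codes[1:]  # assumed: the embedded message may repeat 103
    if len(codes) < 2:
        raise ValueError("a Type 103 message needs XXX and YYY codes")
    return {"srn": f"{codes[0]}-{codes[1]}", "fru_locations": codes[2:]}


def decode_888(recorded: str) -> dict:
    """Route a recorded sequence to the MAP step that handles it."""
    codes = recorded.upper().split()
    if codes and codes[0] == "888":  # tolerate a leading 888, as in the notes
        codes = codes[1:]
    if len(codes) < 3:
        raise ValueError("expected a message type followed by two codes")
    if codes[1:3] == ["605", "0C5"]:  # 888 102 605 0C5 or 888 103 605 0C5
        return {"type": codes[0], "next_step": 5}
    if codes[0] == "102":
        crash, dump, extra = codes[1], codes[2], codes[3:]
        result = {"type": "102", "crash_code": crash,
                  "categories": crash_categories(crash),
                  "dump_status": DUMP_STATUS.get(dump, "Not in the dump status list")}
        if extra:  # a Type 103 message is included: decode it in step 4
            result.update(decode_103(extra, embedded=True), next_step=4)
        else:      # no further codes: call the next level of support
            result["next_step"] = "call next level of support"
        return result
    if codes[0] == "103":
        return {"type": "103", **decode_103(codes[1:]), "next_step": 4}
    raise ValueError(f"unknown message type {codes[0]}")
```

The helper only suggests which step applies. The codes still need to be recorded and reported to the next level of support as the MAP directs.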

MAP 4350: Isolating Cluster Code Load Counter=2


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster attempted to IML two times and failed each time. A problem was created.

When a cluster powers on, it first loads the AIX operating system, then the functional code, and finally the RAS (maintenance package) code. The code load counter is initially set to 0 and is increased by 1 at the start of each code load. If the code load is successful, the counter is reset to 0. If it is unsuccessful, the counter is not reset.

If the load of the functional code is not successful, the failing cluster creates an AIX error log. A problem is not created because the functional code and RAS code have not been loaded yet. The other cluster reboots the failing cluster to attempt to get past the error. If the code load is then successful, the code load counter is reset to 0, and the AIX error log from the prior unsuccessful attempt does not create a problem because the error was temporary.

If the second reboot attempt fails, a final reboot occurs. The AIX code is loaded, the functional code load (which would fail) is bypassed, and the RAS code is loaded. This leaves the failing cluster unable to do customer operations but able to accept a service terminal login for service actions. The other cluster creates a problem with ESC=38F0 and uses this MAP for further isolation.

This problem does not give the error that caused the code load failures. The failing cluster should create a separate problem using the AIX error log from the prior unsuccessful attempt. That problem should contain the repair action for the error that caused the code load failures.
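The code load counter behavior described above can be modeled as a small loop. In this Python sketch the threshold of 2 comes from the MAP title. The function name and event strings are hypothetical and do not come from the 2105 microcode.

```python
"""Sketch of the code load counter behavior described for MAP 4350.

Assumptions: the threshold of 2 failed loads is taken from the MAP title
(Code Load Counter=2); the function name and event strings are
hypothetical and do not come from the 2105 microcode.
"""


def simulate_code_load(load_results: list[bool]) -> tuple[str, int, list[str]]:
    """Replay IML attempts; each bool says whether the functional code loaded.

    Returns (final state, code load counter, events in order).
    """
    counter = 0
    events = []
    for loaded in load_results:
        counter += 1  # increased by 1 at the start of every code load
        if loaded:
            counter = 0  # a successful load resets the counter
            events.append("code load successful")
            return "customer operations", counter, events
        # AIX is up, so the failure goes to the AIX error log; no problem yet.
        events.append("AIX error log created")
        if counter == 2:
            # Final reboot: AIX and RAS code load, functional code bypassed.
            events.append("final reboot, functional code bypassed")
            events.append("other cluster creates problem with ESC=38F0")
            return "service terminal login only", counter, events
        events.append("other cluster reboots the failing cluster")
    raise ValueError("need more attempts to reach a final state")


if __name__ == "__main__":
    print(simulate_code_load([False, True])[0])    # customer operations
    print(simulate_code_load([False, False])[:2])  # ('service terminal login only', 2)
```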


Isolation
1. Read the Description section above.
2. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Look for related problems that have cluster or power FRUs. (SSA or drawer problems are not related.) Were related problems found?
• Yes, repair them.
• No, call the next level of support.

MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CEC drawer operator panel displays various types of codes that indicate the status of the cluster power on and code load. Some are normal status and progress indications that change every few seconds. These same codes can indicate a problem if the cluster appears to hang with the code still displayed. Other codes indicate error conditions that will not prevent the code load from completing but will create a problem. Still other codes indicate conditions that prevent the cluster from completing its power on or code load. Note that a Ready for Login display normally means the cluster is powered on and all code is nearly loaded. The Ready for Login display can become blank shortly after first appearing; this is normal operation. However, the Cluster Ready indicator on the 2105 Model 800 operator panel will stay lit.

Isolation
1. Use the Repair Menu, End Of Call Status option to display and repair any related problems. If there are none, continue with the next step.
2. The table lists possible symptoms and the actions to repair them. If you have:
• Only one symptom, find it in the following table and do the listed actions. If that does not correct the problem, look for other listed symptoms you may have missed. If that does not repair the problem, call the next level of support.
• Multiple symptoms, find the last symptom you observed in the table and do the listed actions. If that does not repair the problem, use the earlier symptoms you observed to attempt to repair the problem. If that still does not correct the problem, call the next level of support.



Table 54. Cluster Boot or Down, Symptoms

Symptom: Blank during power on and code load.
Action: Shortly after cluster power on, the CEC drawer operator panel displays various status codes until the code load is complete and Ready for Login is displayed. If these codes are not displayed, determine whether the cluster powered on. Observe the CEC drawer power indicator LED (front lower left corner of the CEC drawer). Observe the I/O drawer power indicator (upper left corner of the CEC drawer operator panel). Both indicators should be on solid.
• If either indicator is not on solid, go to MAP 4880: Cluster Power On Problem on page 461.
• If both are on solid, test the CEC drawer operator panel by attempting to log in to this cluster. The login password should be displayed automatically:
  - If the password is displayed, the display is working; return to the procedure that sent you here.
  - If the password is not displayed, one of the following FRUs is failing: the CEC drawer operator panel, the I/O drawer planar assembly, or the cable between the operator panel and the I/O drawer. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the cluster FRUs.

Symptom: A bouncing or scrolling ball remains on the CEC drawer operator panel, or the panel is filled with dashes.
Action: Connect the service terminal to I/O drawer serial port S1.
1. If the service processor menu is displayed:
a. Replace the CEC drawer operator panel.
b. Replace the I/O drawer planar assembly.
2. If the service processor menu is not displayed, replace the I/O drawer planar assembly.

Symptom: A sequence of displays, each appearing for five seconds, repeats continuously as follows: a blank display, followed by an eight-digit error code, followed by up to nine full panels of dump data.
Action:
1. Record the eight-digit error code seen after the five-second blank display.
2. Reset the service processor, using MAP 43E0: Service Processor Reset on page 401.
3. Go to Error Messages, Diagnostic Codes, and Service Reports in chapter 9 of Volume 3. Locate the recorded code in the entry tables and do the action shown.

Symptom: Went blank after displaying Ready for Login.
Action:
1. This is a normal indication at the end of a successful cluster power on and code load. The cluster is ready for a service terminal login.
2. The Ready for Login display can be overwritten at any time by an AIX operating system or service terminal action that causes it to be blank.

Symptom: Ready for Login is displayed.
Action:
1. This is a normal indication at the end of a successful cluster power on and code load. The cluster is ready for a service terminal login but should not be resumed until the rack operator panel Cluster Ready LED is lit.
2. The Ready display can be overwritten at any time by an AIX operating system or service terminal action that causes it to be blank.
3. If the 2105 has been powered off for more than three days, the cluster power on may take up to two hours to display Ready for Login or light the Cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.



Table 54. Cluster Boot or Down, Symptoms (continued)

Symptom: NVSBATx Charging is displayed.
Action: This is a normal indication. The cluster can take up to 2 hours to display Ready for Login or light the rack operator panel Cluster Ready LED. Possible reasons:
1. An NVS battery is being charged so that it will have a minimum of 72 hours of capacity.
2. An NVS battery has been disconnected from its NVS battery charger card, and the charger card needs to rebuild its profile of the battery capacity.
3. The 2105 has been powered off for more than three days, causing the battery card to disconnect its NVS batteries. The NVS charger card needs this time to rebuild its profile of the battery capacity.

Symptom: NVS Battery Test is turned Off is displayed.
Action: This is normal during rack install, because the control switch Bypass NVS Battery Test at IML was automatically set to True to allow the install to continue even if the batteries were not yet at full charge. After the install is complete, the control switch should have been automatically set to False (the default). To display or set this switch, log in and display the Main Service Menu. Select Configuration Options Menu > Change / Show Control Switches > Bypass NVS Battery Test at IML.

Symptom: OK is displayed.
Action: The service processor (SP) is ready. The cluster is in standby power mode.
• This is normal if the cluster was powered off for service by using the service terminal Alternate Cluster Repair Menu options.
• This is not normal if the cluster was not powered off for service. Repair any related problems, using the Repair Menu, Display/Repair Problems Needing Repair option. If there are no related problems, note that the cluster can power itself off during a power on process if the RPC card remote/local switches are not set to the same positions on both cards. Ensure that the RPC power select switches at the top of each card are set the same. Ensure that the bottom two positions of the 4-position DIP switch at the bottom of each card are set the same.
• If the 2105 Model 800 is being powered on, OK should display for a few seconds, and then the cluster power on should begin. If the cluster hangs with OK displayed, go to MAP 4880: Cluster Power On Problem on page 461.

Symptom: STBY is displayed.
Action: The service processor (SP) is ready. The cluster was shut down by the cluster operating system, AIX. Read the SP error log for possible fault indications, and then call the next level of support. See Service Processor Operations in Appendix A of Volume 3.

Symptom: The service terminal cannot connect (display the Copyright and Login screen) or cannot log in (display the Main Service Menu).
Action: Check for these further symptoms:
• Connect problem to only one cluster. Go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
• Connect problem to both clusters. Go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.

Symptom: The cluster stops and a 3-digit number is displayed in the CEC drawer operator panel.
Action: Record SRN 101-xxx, where xxx is the 3-digit number displayed, then go to Service Request Number List in chapter 9 of Volume 3.

Symptom: 888 is displayed followed by additional error codes.
Action: Go to MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel on page 367.

Symptom: Cluster stops with 0005 displayed.
Action: The cluster unsuccessfully attempted to load code three times. The threshold counter was exceeded, and the cluster stopped with 0005 displayed. AIX and the RAS (maintenance package) code did load successfully. If the problem is due to hardware, a problem should have been created. Connect the service terminal to the failing cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Repair any related problems. A power on of the cluster automatically resets the threshold counter. If there are no related problems, the problem is due to a code problem; call your next level of support.



Table 54. Cluster Boot or Down, Symptoms (continued)

Symptom: Cluster stops with a 4-character code displayed for more than 10 minutes.
Note: If the CEC drawer operator panel displays 2 sets of numbers (one above the other), use the top set of numbers as the error code.
Action: Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of Volume 3.

Symptom: 4-character codes (0500-0900, 0Cxx) are displayed.
Action: Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of Volume 3.

Symptom: xxx-xxx, an SRN (Service Reference Number), is displayed.
Action: Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of Volume 3.

Symptom: 8-character codes are displayed.
Action: Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of Volume 3.

Symptom: 10-character codes are displayed.
Action: Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of Volume 3.

Symptom: The cluster appears to restart/reboot while displaying the E105 system firmware code.
Action: If the service terminal is kept logically connected, this normally happens after the cluster POST indicators are displayed. The term POST indicators refers to the resource names that are listed after the multiple lines of RS/6000 are displayed: memory, keyboard, network, SCSI, speaker. Go to MAP 43A0: Bootlist Management Using SMS on page 387.

Symptom: The cluster appears to restart/reboot more than four times while displaying the Exxx system firmware codes. The 0Cxx AIX progress codes are not displayed.
Action: There is a problem that prevents the cluster from reaching E105, which would begin the AIX boot from the hard disk drive. Connect the service terminal to the working cluster and use the Alternate Cluster Repair Menu options to power off the failing cluster. Then go to MAP 2700: CEC Drawer Power On Problem on page 170. When the problem is fixed, the cluster should be able to boot from the hard disk drive.

Symptom: The cluster appears to restart/reboot when displaying the 10-character codes. The cluster returns to the E1xx progress codes and begins the code load sequence again. This may occur up to three times.
Action: Certain error recovery sequences during code load, at the time the CPI interfaces are being initialized, cause up to 3 code loads to be attempted. A problem will be created, and the cluster message indicator on the 2105 Model 800 operator panel will be on. Connect the service terminal to the cluster with the message indicator on and use the Main Service Menu -> Start Repair -> Show/Repair Problems Needing Repair option.
• If a related problem is found, repair it.
• If no related problem is found, attempt to recreate the problem by power cycling the cluster again. Connect the service terminal to the working cluster and use the Repair Menu -> Alternate Cluster Repair Menu options to:
1. Quiesce the Alternate Cluster.
2. Power Off the Alternate Cluster.
3. Power On the Alternate Cluster.
Observe the CEC drawer operator panel during power on and code load. If it loads normally, use Resume Alternate Cluster to return the cluster to customer use. If the cluster fails with a problem created, repair it. If the cluster fails with no problem created, call the next level of support.



Table 54. Cluster Boot or Down, Symptoms (continued)

Symptom: The cluster stops and POST indicators are displayed on the service terminal session (if it had been kept logically connected since the cluster power on). The term POST indicators refers to the resource names that are listed after the multiple lines of RS/6000 are displayed: memory, keyboard, network, SCSI, speaker.
Action: Go to MAP 2700: CEC Drawer Power On Problem on page 170.

MAP 4370: Error Displaying Problems Needing Repair


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The process to display the problems first attempts to access the problem file on the cluster that the service terminal is connected to. If the file cannot be read, an error message will be included in the service terminal problem display screen for that cluster. The process then attempts to access the problem file on the other cluster, communicating through the cluster to cluster ethernet connection. If there is no response from the other cluster when trying to read its problem file, an error message will be included in the service terminal problem display screen for that cluster.

Isolation
1. Use the service terminal Show / Repair Problems Needing Repair option. Does the problem details screen display a status that problems from the alternate cluster are inaccessible?
• Yes, continue with the next step.
• No, go to step 6 on page 376.
2. There is a problem displaying the problems for the other cluster. (The other cluster is the cluster that the service terminal is not connected to; it is also the failing cluster.) Is the CEC drawer operator panel for the failing cluster hung displaying a code (other than Ready for Login) for more than five minutes?
• Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
• No, continue with the next step.
3. Connect the service terminal to the other cluster and attempt to log in. Is the Copyright and Login screen displayed?
• Yes, continue with the next step.
• No, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
4. Attempt to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):


Does it now fail for the cluster that the service terminal is connected to? (This is the same cluster that originally failed.)
• Yes, continue with the next step.
• No. If no error message is displayed for this cluster, the problem is with the cluster to cluster ethernet connection. Go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377.
5. IML the cluster code again, and then try the operation again by doing the following:
• Connect the service terminal to the other cluster.
• Go to the Alternate Cluster Repair Menu options.
• Quiesce the Alternate Cluster (failing cluster).
• Power Off the Alternate Cluster.
• Power On the Alternate Cluster.
• Resume the Alternate Cluster.
• Connect the service terminal back to the failing cluster.
• Display the problems needing repair.
Does it still fail to display the problems for the cluster that the service terminal is connected to?
• Yes, call the next level of support. (The cluster hard disk drive may need the rebuild process to reload its code.)
• No, go to MAP 1500: Ending a Service Action on page 67.
6. Use these steps to IML the code again for the failing cluster, and then try the operation again:
• Connect the service terminal to the other cluster (working cluster).
• Go to the Alternate Cluster Repair Menu options.
• Quiesce the Alternate Cluster (failing cluster).
• Power Off the Alternate Cluster.
• Power On the Alternate Cluster.
• Resume the Alternate Cluster.
• Connect the service terminal back to the failing cluster.
• Display the problems needing repair again.
Does it still fail?
• Yes, call the next level of support. (The cluster hard disk drive may need the rebuild process to reload its code.)
• No, go to MAP 1500: Ending a Service Action on page 67.
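Steps 1 through 4 of this MAP form a simple decision tree. The following Python sketch expresses that tree. The function and parameter names are hypothetical and chosen for illustration only. Each parameter holds the answer to one question in the MAP, and the return value names where the MAP sends you next.

```python
"""Sketch: the MAP 4370 isolation questions as a decision function.

Each parameter is the answer to one question in the MAP; the function and
parameter names are hypothetical. It returns where the MAP sends you next.
"""


def map4370_next(alternate_inaccessible: bool,
                 panel_hung_over_5_min: bool = False,
                 login_screen_on_other: bool = True,
                 now_fails_locally: bool = False) -> str:
    # Step 1: are problems from the alternate cluster reported inaccessible?
    if not alternate_inaccessible:
        return "step 6: re-IML the failing cluster and retry"
    # Step 2: is the failing cluster's CEC panel hung on a code
    # (other than Ready for Login) for more than five minutes?
    if panel_hung_over_5_min:
        return "MAP 4360"
    # Step 3: does the other cluster show the Copyright and Login screen?
    if not login_screen_on_other:
        return "MAP 6060"
    # Step 4: from the other cluster, does the display now fail for the
    # cluster the service terminal is connected to?
    if now_fails_locally:
        return "step 5: re-IML the failing cluster and retry"
    # No error message for this cluster: cluster to cluster ethernet problem.
    return "MAP 4390"
```

Steps 5 and 6 then end either with a call to the next level of support or at MAP 1500, depending on whether the display still fails after the re-IML.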

MAP 4380: Isolating a Customer LAN Connection Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is no longer used to isolate customer LAN problems. Use the table below to select the correct MAP for the problem that was detected.

Isolation
1. Use the following table to determine the next action based on the problem that sent you here. Each entry gives the problem, then the MAP to go to:



• ESS cluster to cluster communication problem: MAP 4390 Section-6 on page 381
• ESSNet1 or Master Console to ESS cluster communications problem: MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405
• ESS cluster to customer network problem: MAP 4450: ESS Cluster to Customer Network Problem on page 407
• Customer e-mail problem: MAP 1310: Isolating E-Mail Notification Problems on page 58
• SNMP problem: MAP 1305: Isolating SNMP Notification Problems on page 56
• ESS Specialist problem: MAP 5000: ESS Specialist Cannot Access Cluster on page 540

MAP 4390: Isolating a Cluster to Cluster Ethernet Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. If the customer is using the ESS Specialist, Web Copy Services, or the Command Line Interface, disruption may occur.

Description
The clusters communicate with each other through an ethernet connection for the RAS (maintenance package) operations. A new 2105 Model 800 comes from the factory with a short ethernet jumper cable directly connecting the RJ-45 connector on each cluster. This special jumper cable crosses signals within the cable, so an ethernet hub is not needed to connect the clusters directly to each other. All 2105 Model 800 units leave the factory set with the same pair of TCP/IP addresses, one for each cluster bay. Those TCP/IP addresses are changed when the 2105 is connected to the ESSNet console or customer ethernet. You have been directed to this MAP because the microcode detected a cluster to cluster ethernet communication problem. The following SRN information (displayed in the problem log) may provide additional information for product engineering.
SRN Description
128 Invalid parameter failure
129 Failure during authorization
130 Socket failure
131 Daemon initialization/setup failure
132 Loopback IP address invalid failure
133 Write to socket failure
134 Read to socket failure
136 Operations on a file failure
137 System subroutine failure
144 Client timeout failure
1232 No additional information provided in the SRN
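For quick reference while reading the problem log, the SRN table maps directly to a dictionary. The names in this Python sketch are hypothetical, and the descriptions are transcribed from the table. SRNs not in the table return a fallback string instead of raising an error.

```python
"""Sketch: look up the cluster to cluster ethernet SRNs listed in MAP 4390.

The descriptions are transcribed from the SRN table; the dictionary and
function names are hypothetical.
"""

CLUSTER_ETHERNET_SRNS = {
    128: "Invalid parameter failure",
    129: "Failure during authorization",
    130: "Socket failure",
    131: "Daemon initialization/setup failure",
    132: "Loopback IP address invalid failure",
    133: "Write to socket failure",
    134: "Read to socket failure",
    136: "Operations on a file failure",
    137: "System subroutine failure",
    144: "Client timeout failure",
    1232: "No additional information provided in the SRN",
}


def describe_srn(srn: int) -> str:
    """Return the table description, or a fallback for unlisted SRNs."""
    return CLUSTER_ETHERNET_SRNS.get(srn, "Not listed in MAP 4390")
```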


Isolation
MAP 4390 Section-1:
Note: Intermittent failures are not normally due to 2105 TCP/IP settings. They can be due to intermittent hardware failures and intermittent or marginal network problems. This includes the customer network if it is attached to the ESSNet.
1. Does the problem that sent you here have ESC = 13FF?
• Yes, continue with the next step.
• No, go to step 3.
2. Is the problem ESC = 13FF?
• Yes, the cluster to cluster communication problem was caused by the rsACExecd daemon on the cluster that reported the 13FF. The daemon was automatically recovered by the microcode:
  - Close the problem with ESC = 13FF.
  - If the other cluster has a prior problem with ESC = 1232, close that problem.
  - Use the Repair Menu, End of Call Status option to verify 2105 status before logging out.
• No, continue with the next step.
3. Find the condition that applies in the table below:
Table 55. Cluster to Cluster Communication Problem, MAP Entry

Condition: A 2105 is being installed and no TCP/IP settings were changed yet. The cluster to cluster communication failed with the cross cluster ethernet communication cable still connected.
Most likely cause of the problem:
  • New 2105 from IBM manufacturing. Most likely a hardware failure of either cluster I/O drawer planar assembly, or the cross cluster ethernet jumper cable.
  • Reinstall of a 2105. Most likely the cross cluster ethernet jumper cable is not installed. It also could be invalid TCP/IP settings from the prior installation, or the hardware FRUs listed for a new 2105 above.
Go to: MAP 4390 Section-2 on page 379

Condition: The cluster TCP/IP settings on the 2105 were just changed, and now it is failing.
Most likely cause of the problem: One or more TCP/IP settings on either cluster did not get updated correctly, is invalid, or created a duplicate TCP/IP condition on the network.
Go to: MAP 4390 Section-3 on page 379

Condition: The first 2105 was just installed on a new ESSNet.
Most likely cause of the problem: The TCP/IP settings were working prior to the 2105 being connected to the new ESSNet. The problem is with the ESSNet ethernet hub or ethernet cables.
Go to: MAP 4390 Section-4 on page 380


VOLUME 1, TotalStorage ESS Service Guide



Table 55. Cluster to Cluster Communication Problem, MAP Entry (continued)

Condition: An additional 2105 was just added to an existing ESSNet (with one or more 2105s already on it).
Most likely cause of the problem:
  • If the new 2105 is failing, the problem is probably the ESSNet ethernet hub, the ethernet cables, or a duplicate TCP/IP address. (It is assumed that the TCP/IP settings were working prior to the 2105 being connected to the new ESSNet.)
  • If an existing 2105 is failing, the new 2105 has probably created a duplicate TCP/IP address.
Go to: MAP 4390 Section-5 on page 380

Condition: Customer just made changes to their network.
Most likely cause of the problem: The customer network may have a network device that now has a duplicate TCP/IP setting. (It is assumed that no changes were made to the ESSNet network.)
Go to: MAP 4390 Section-6 on page 381

Condition: None of the above conditions are suspected; the cause of the problem is unknown.
Most likely cause of the problem: The problem could be failing hardware, TCP/IP settings in the cluster, or a duplicate TCP/IP address on the network.
Go to: MAP 4390 Section-7 on page 381

MAP 4390 Section-2: The 2105 is being installed new from IBM or reinstalled from a prior account. Do the following actions in the order listed until the problem is fixed:

Table 56. Cluster to Cluster Communication Failure
Action: Verify the cross cluster ethernet jumper cable is installed and connected to both clusters.
Go to: See Figure 130 on page 382.
Action: Verify the TCP/IP settings for both clusters are correct.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383
Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-13: 2105 Install Failure, Replace FRUs or Do Further Isolation on page 386

MAP 4390 Section-3: TCP/IP settings on the 2105 were just changed. The new settings were provided by the customer if the ESSNet is to be attached to the customer network, or by the service guide Install Chapter 5 if it is not to be attached to the customer network. There are two methods used to change the settings:
1. You logged in to one cluster and then used the dual cluster option to update all information on both clusters.
2. You logged in to each cluster and used the single cluster options to update each cluster.
This step assumes the problem is with the new settings. One of the following occurred:
1. One or more of the provided settings is not valid.
2. One or more of the settings were entered incorrectly.



3. If the dual cluster option was used, one or more of the settings may not have been updated correctly on both clusters, even if there was no error message.
Do the following actions in the order listed until the problem is fixed:

Table 57. Cluster to Cluster Communication Problem, TCP/IP Settings
Action: Display the cluster TCP/IP settings and compare them with the customer configuration worksheet or the service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383
Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7 on page 381

MAP 4390 Section-4: The first 2105 was just installed on a new ESSNet. This step assumes that the cluster to cluster communication was working before the 2105 was connected to the ESSNet ethernet hub. The problem is most likely with the ESSNet ethernet hub or ethernet cables.
Note: During the 2105 install, the clusters are directly connected to each other by the cross cluster ethernet communication cable. The cluster TCP/IP settings are updated for attachment to the ESSNet while the clusters are still directly connected together.
Do the following actions in the order listed until the problem is fixed:

Table 58. Cluster to Cluster Communication Problem, New ESSNet
Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382
Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7 on page 381

MAP 4390 Section-5: A 2105 was just installed on an existing ESSNet with at least one 2105 already attached. This step assumes that the new 2105 cluster to cluster communication was working after the cluster TCP/IP settings were updated and before the 2105 was connected to the ESSNet ethernet hub.
  • If the new 2105 is failing:
    1. There is a problem with the ethernet cables to the ethernet hub, or with the ethernet hub.
    2. There is a duplicate TCP/IP setting between the new and existing 2105.
  • If the existing 2105 is failing:
    1. There is a duplicate TCP/IP setting between the new and existing 2105.
Do the following actions in the order listed until the problem is fixed:

Table 59. Cluster to Cluster Communication Problem, Existing ESSNet
Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382
Action: Check cluster TCP/IP settings as defined by the customer configuration worksheet or the service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383



Table 59. Cluster to Cluster Communication Problem, Existing ESSNet (continued)
Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7

MAP 4390 Section-6: The customer has made changes to their network and now the cluster to cluster communication does not work. This step assumes the clusters are connected to an ESSNet which is also connected to the customer network. The most likely cause of this problem is a device on the customer network that duplicates the TCP/IP address of one of the clusters. Do the following actions in the order listed until the problem is fixed:

Table 60. Cluster to Cluster Communication Problem, Customer Network
Action: Use the ESSNet console to ping each cluster, looking for a failure or a duplicate TCP/IP address on the network.
Go to: MAP 4390 Section-12: Console Ping Test to Each Cluster (also Tests for Duplicate TCP/IP Address) on page 385
Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7

MAP 4390 Section-7: No known changes have been made to the clusters, ESSNet, or customer network, and yet the cluster to cluster communication stopped working. The following actions, in the order listed, should isolate the problem:

Table 61. Cluster to Cluster Communication Problem, Unknown Cause
Action: Test cluster to cluster communication by displaying problems needing repair from each cluster. Repair any related problems.
Go to: MAP 4390 Section-8: Test if the Communication Problem is still Occurring
Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382
Action: Check cluster TCP/IP settings as defined by the customer configuration worksheet or the service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383
Action: Test cluster to cluster communication using the cross cluster ethernet cable (clusters disconnected from the ESSNet).
Go to: MAP 4390 Section-11: Test Cluster to Cluster Communication with the Direct Connect Cable on page 384
Action: Do an ethernet ping test to each cluster from the ESSNet console. If it fails, disconnect the failing cluster and ping again, looking for a duplicate TCP/IP address.
Go to: MAP 4390 Section-12: Console Ping Test to Each Cluster (also Tests for Duplicate TCP/IP Address) on page 385
Action: Call the next level of support.

MAP 4390 Section-8: Test if the Communication Problem is still Occurring: Log in to each cluster and use the service login Display Problems Needing Repair option (Repair Menu, Show / Repair Problems Needing Repair) to test whether the cross cluster communication is working.


1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Does the problem details screen show that problems from the alternate cluster are inaccessible?
  • Yes, the problem is still occurring. Return to the MAP section that sent you here.
  • No, continue with the next step.
2. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Does the problem display screen show that the other cluster cannot be accessed?
  • Yes, the problem is still occurring. Return to the MAP section that sent you here.
  • No, the problem is no longer occurring. Go to MAP 1500: Ending a Service Action on page 67.
Note: If you believe the problem is intermittent, call the next level of support.

MAP 4390 Section-9: Check for Visual Symptoms: Verify that the cluster Ready LEDs are lit, the ethernet cables are connected, and the ESSNet ethernet hub power, error, and port LEDs are normal.
1. Verify that the 2105 Model 800 operator panel Ready indicator LED for each cluster is still lit. If an LED is off, exit this MAP and use MAP 20A0: Cluster Not Ready on page 117 to return the cluster to ready. If the cluster to cluster communication still fails, return to MAP 4390 Section-1 on page 378.
2. Does each cluster bay have an ethernet cable connected?
  • Yes, continue with the next step.
  • No, each cluster bay must have access to the other cluster bay through an ethernet connection. That connection can be through an ESSNet ethernet hub or through the cluster interconnect ethernet cable that goes directly between both cluster bays. Exit this MAP and use the ESSNet installation instructions in the Chapter 5 Install section in Volume 2 of this Service Guide to make the ethernet connections.

Figure 130. Cluster to Cluster Communication Cable Location (s009120): front view of I/O Drawer 1 and I/O Drawer 2.

3. Verify the following ESSNet ethernet hub indications are present:
  a. Power LED is on.



  b. Error indicator LEDs are off.
  Refer to the ethernet hub maintenance documentation. Are the hub indicators as listed above?
  • Yes, continue with the next step.
  • No, go to the ESSNet ethernet hub maintenance documentation to correct the problem, or ask the customer to arrange for repair of the customer's equipment. Then continue with the next step.
4. Observe the ESSNet ethernet hub port indicators for the ports connected to cluster bay 1 and cluster bay 2. The indicator is:
  • Off, if the hub port cannot detect the cluster.
  • On, if the hub port can detect the cluster. The cluster IP address may still be set incorrectly.
  • Blinking, if the hub port is passing data to or from the cluster.
  Find the condition you have:
  a. Cluster bay 1 hub port on or blinking, cluster bay 2 hub port on or blinking. Return to the MAP section step that sent you here.
  b. Cluster bay 1 hub port on or blinking, cluster bay 2 hub port off. Go to step 5.
  c. Cluster bay 1 hub port off, cluster bay 2 hub port on or blinking. Go to step 5.
  d. Cluster bay 1 hub port off, cluster bay 2 hub port off. Go to step 6.
5. One ESSNet hub port indicator is on and the other is off. Swap the cluster bay ends of the two ethernet cables so that each cable connects to the other cluster bay. (Do not move the ESSNet hub port ends of the cables.) Is the same hub port indicator off?
  • Yes, replace the ethernet cable connected to the hub port with the indicator off. Continue with the next step to verify the repair.
  • No, the cluster bay connected to the hub port with the indicator off is failing. One of the following FRUs is failing: the I/O drawer planar assembly or the ethernet cable in the cluster bay. Use the Repair Menu, Replace a FRU option. Continue with the next step to verify the repair.
6. Go to the ethernet hub maintenance manual with the symptom that more than one port indicator that should be on is off. The hub may need to be reset or replaced. Exit this MAP.
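As an illustration only, the hub port indicator decisions in steps 4 and 5 of Section-9 reduce to a small lookup. This is a minimal sketch, assuming the indicator states are read by eye and entered by hand; the names PortLed, step4_next_action, and step5_suspect are hypothetical and are not part of the 2105 microcode or the service terminal menus.

```python
from enum import Enum


class PortLed(Enum):
    """State of one ESSNet hub port indicator (hypothetical helper type)."""
    OFF = "off"            # hub port cannot detect the cluster
    ON = "on"              # hub port detects the cluster; IP address may still be wrong
    BLINKING = "blinking"  # hub port is passing data to or from the cluster


def step4_next_action(bay1: PortLed, bay2: PortLed) -> str:
    """Map the step 4 conditions a-d to the next action."""
    bay1_detected = bay1 is not PortLed.OFF
    bay2_detected = bay2 is not PortLed.OFF
    if bay1_detected and bay2_detected:
        return "return to the MAP section step that sent you here"  # condition a
    if bay1_detected or bay2_detected:
        return "go to step 5"  # conditions b and c: exactly one port is off
    return "go to step 6"      # condition d: both ports are off


def step5_suspect(same_port_still_off: bool) -> str:
    """Return the suspect part after the cable swap in step 5."""
    if same_port_still_off:
        return "ethernet cable connected to the hub port with the indicator off"
    return "I/O drawer planar assembly or ethernet cable in the cluster bay"
```

For example, step4_next_action(PortLed.OFF, PortLed.BLINKING) selects step 5, and only two dark ports lead to the hub maintenance manual in step 6.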
MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster:
Note: Displaying the TCP/IP settings for each cluster also restarts the TCP/IP daemons for each cluster. This can correct a cluster to cluster communication problem due to a corrupted daemon when there is no other problem.
1. Display the TCP/IP settings for the local cluster. Use both options shown:
  a. From the service terminal Main Service Menu, select:
       Configuration Options Menu
       Configure Communications Resources Menu
       Change / Show TCP/IP Configuration
       Minimum Configuration and Startup
  b. From the service terminal Main Service Menu, select:



       Configuration Options Menu
       Configure Alternate Cluster IP Address and Hostname
  c. Use the customer supplied communications worksheets to verify the correct entries for cluster hostnames, cluster TCP/IP addresses, network mask, and gateway.
2. Verify that only current definitions exist in the /etc/hosts file, using the Configuration Options Menu. From the service terminal Main Service Menu, select:
       Configuration Options Menu
       Communications Resources Menu
       Change/Show TCP/IP Configuration
       Further Configuration
       Name Resolution
       Host Tables (/etc/hosts)
       List All Hosts
  a. Check that the IP address and cluster (host) name for the local cluster are listed only once.
     Note: Other hosts may be listed if the customer uses Copy Services without DNS. You will also see a loopback address.
  b. If there are multiple entries for either cluster, remove them using Remove a Host.
  c. Press F3 until you return to the Name Resolution menu.
3. If the customer uses DNS, check that the correct DNS server or servers are defined. Use:
       Name Resolution
       Domain Nameserver (/etc/resolv.conf)
       List All Nameservers
  a. Check that only the required nameserver or nameservers are listed.
  b. Correct any entries using the Add and Remove options.
4. Log in to the other cluster and repeat step 1 on page 383 to step 3, then return to the MAP section step that sent you here.

MAP 4390 Section-11: Test Cluster to Cluster Communication with the Direct Connect Cable:
1. Will the customer allow you to temporarily disconnect both clusters from the ethernet hub or the customer network?
   Note: If the customer is using Web Copy Services, the customer MUST be consulted before continuing.
  • Yes, disconnect the ESSNet hub ethernet cables from each cluster and connect the direct ethernet connection cable between the clusters. (The direct connection cable is normally left in place after the initial install with the connectors unplugged.) Continue with the next step.
  • No, return to the MAP that sent you here.
2. Test the cluster to cluster communication. Display problems needing repair using the Repair Menu, Show / Repair Problems Needing Repair option. If the problem details screen displays status that problems from the alternate cluster are inaccessible, log in to the other cluster and repeat this test. Was there any error accessing the problem summary on the other cluster?



  • Yes, the TCP/IP settings could cause this, but they should already have been checked using MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383. The failure is in one of the clusters, or possibly in the direct connect cable if this failure occurred during an install. Return to the MAP step that sent you here.
  • No, the communication between clusters appears to be working when using the direct connection. The most likely cause is a hardware problem with the ESSNet ethernet hub, the ethernet cables, or the customer network (if connected). Reconnect the original ethernet cables. Return to the MAP and step that sent you here.

MAP 4390 Section-12: Console Ping Test to Each Cluster (also Tests for Duplicate TCP/IP Address):
Note: If you do not have an ESSNet1 or Master Console, you will need to work with the customer, and possibly the next level of support, to resolve this problem. The following steps can be used for general guidance only.
1. Verify that all ethernet cable connections to the clusters, the IBM ethernet hub, and the customer network (if used) are connected.
2. Do an ethernet ping test to each cluster from the ESSNet console. Use the IP network address for each cluster.
   Note: Refer to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405 for instructions.
   Note: Do not leave the ping test running; it slows down all communications through the hub. Press Ctrl/C to quit the ping test.
   The ping test has two parts that must both be successful:
  • Part 1 tests whether the target IP address can respond. If successful, a line of information is displayed each time a packet of data is received back from the cluster. For example:
      64 bytes from 9.113.24.123: icmp_seq=0 ttl=252 time=4 ms
    If the ping is not successful, the line of information is not displayed and the test appears to hang with no response.
  • Part 2 tests for intermittent packet loss. The results are displayed after you stop the ping test with Ctrl/C. For example:
      20 packets transmitted, 19 packets received, 5% packet loss
    Some intermittent problems may require the ping test to run a large number of packets before missing packets occur. Also, the default packet size for the ping command may be too small to detect the failure. You can use the -s parameter of the ping command to send a larger packet. For example, ping -s 200 9.113.24.1 sends 200 bytes of data instead of the default 56 bytes (displayed as 64 bytes). The displayed results add 8 bytes of header information to the number of bytes you specify, so this example displays 208 bytes.
   Find the condition you have:
  a. The ping test worked to both cluster bays:
    • If the cluster to cluster communications still fail, call the next level of support.
    • If the communications now work, close the problem, then go to the service terminal Repair Menu, End of Call Status option.
  b. The ping test failed to one cluster bay. Continue with the next step.



  c. The ping test failed to both cluster bays. The ESSNet console may not be able to communicate with the ethernet hub. Go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405.
3. Select the condition that applies:
  • You already successfully tested the direct cluster to cluster communications using the cross cluster ethernet cable, proving that the cluster hardware is working. The cluster still fails when connected to the ethernet hub, so the possible causes are:
      The ethernet hub port to the cluster is failing.
      The ethernet cable to the cluster is failing.
      A duplicate IP address exists on the network, either from another 2105 or, if the ESSNet is connected to the customer network, from a customer device.
    Do the following:
    a. Swap the ethernet cables between the ethernet hub port to the cluster that works and the hub port to the cluster that does not work. Repeat the ping test. If the same cluster still fails, go to step 3b. If the other cluster now fails, the problem is most likely the ethernet hub port. After the repair is complete, replug the ethernet cables to their original positions.
    b. Test for a duplicate IP address on the network. Unplug the ethernet cable from the ethernet hub port to the failing cluster and repeat the ping test. If the test is successful, a device exists on the network with a duplicate IP address. If the ping test fails, the ethernet cable to the failing cluster needs to be replaced.
  • You tested the cluster to cluster communications previously using the direct connection and found that it failed. The TCP/IP settings should already have been checked in a prior MAP section step. The possible causes are the I/O drawer planar assembly or the I/O drawer internal ethernet cable.
  • You were not able to test the cluster to cluster communications previously. The TCP/IP settings should already have been checked in a prior MAP and step. Possible causes are the ethernet hub, ethernet cable, I/O planar, or internal ethernet cable. Try the following: swap to a spare port on the hub, replace the ethernet cable, and check the TCP/IP settings using MAP 4390 Section-3 on page 379. If nothing is found, replace the I/O planar and internal ethernet cable.

MAP 4390 Section-13: 2105 Install Failure, Replace FRUs or Do Further Isolation:
1. The 2105 is being installed. The clusters are still directly connected with a cross cluster ethernet communication cable and the TCP/IP settings are correct. It is not possible to isolate which cluster is failing with this configuration. It is possible to isolate the failing cluster if an ESSNet or laptop computer can be used; see the note below. The failure is due to a hardware problem. The possible failing FRUs are:
  • Cross cluster ethernet communication cable
  • I/O drawer planar assembly for cluster 1 (left)
  • I/O drawer planar assembly for cluster 2 (right)



Note: The failing cluster can be determined if a third platform is used to ping each cluster separately. Call the next level of support before proceeding. There are two possible methods:
  • Use an ESSNet. If the 2105 was connected to the ESSNet, the ESSNet console could be used to issue ping commands to each cluster. Only one cluster should fail the ping command. The probable failing FRU is the I/O drawer planar assembly. The TCP/IP settings need to be updated before connecting to the ESSNet. Use the ESSNet installation instructions in the Chapter 5 Install section in Volume 2 of this Service Guide to configure and connect the ESSNet.
  • Use a laptop computer. Configure the laptop ethernet port with the same settings as an ESSNet console (TCP/IP address and network mask). Disconnect the cross cluster ethernet cable from cluster 1 (left) and connect it to the laptop. Issue a ping command to the cluster 2 (right) TCP/IP address. Reconnect the cable to cluster 1, disconnect the cable from cluster 2, and connect it to the laptop. Issue a ping command to the cluster 1 TCP/IP address. If the ping fails to both clusters, the cable is failing or the laptop is not set up correctly. If the ping fails to only one cluster, replace the I/O drawer planar assembly in that cluster.
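The pass/fail reading of the console ping test in MAP 4390 Section-12, and of the third-platform pings described in the note above, comes down to two numbers in the ping summary line and one piece of size arithmetic. The following is a minimal sketch only: the helper names are hypothetical, the pattern assumes the "N packets transmitted, M packets received" wording shown in the Section-12 example, the 56-byte default is the usual Unix ping default, and the sketch treats any lost packet as a Part 2 failure.

```python
import re

ICMP_HEADER_BYTES = 8    # ping adds this header size to the -s data size it reports
DEFAULT_DATA_BYTES = 56  # assumed Unix default; replies then show 56 + 8 = 64 bytes

# Assumed summary format, e.g. "20 packets transmitted, 19 packets received, 5% packet loss"
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def reported_reply_bytes(data_bytes: int = DEFAULT_DATA_BYTES) -> int:
    """Byte count shown on each reply line for `ping -s data_bytes`."""
    return data_bytes + ICMP_HEADER_BYTES


def parse_summary(line: str) -> tuple[int, int]:
    """Return (transmitted, received) from a ping summary line."""
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def packet_loss_percent(line: str) -> float:
    """Percentage of packets that never came back."""
    sent, received = parse_summary(line)
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


def ping_test_passed(line: str) -> bool:
    """Part 1: the cluster answered at all. Part 2: no packets were lost."""
    sent, received = parse_summary(line)
    return received > 0 and received == sent
```

With the Section-12 example summary line, packet_loss_percent returns 5.0 and ping_test_passed returns False, and reported_reply_bytes(200) gives the 208 bytes displayed for ping -s 200.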

MAP 43A0: Bootlist Management Using SMS


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster boot list is kept in the I/O Drawer Planar Assembly NVRAM. The normal bootlist sequence is fd0 (diskette drive), cd0 (CD-ROM drive), hdisk0 (hard disk drive), and hdisk1 (hard disk drive); more simply: fd0, cd0, then hdisk0 and hdisk1 in either order.
  • If the cluster is not able to boot up to AIX, System Management Services (SMS) is used to display and update the bootlist.
  • If the cluster is able to boot to AIX, service login options are used to display and update the bootlist.
This MAP is only for SMS bootlist maintenance.

Isolation
1. Are you doing an Automatic LIC Code update?
  • Yes, continue with the next step.
  • No, go to step 3.
2. Is there a problem with ESC=14xx calling a 4Axx MAP?
  • Yes, go to MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392.
  • No, continue with the next step.
3. Connect the service terminal to the cluster not being serviced. Repair any problems related to the dual cluster disk drives for the failing cluster. If there are none, continue at the next step. From the service terminal Main Service Menu, select:
       Repair Menu


       Show / Repair Problems Needing Repair
4. Attempt to log in to the failing cluster and display the cluster dual hard disk drive status.
  • If the login is successful, repair either hard drive that is NOT shown as in good condition, using the menu options shown below.
  • If the login fails, continue with the next step.
   From the service terminal Main Service Menu, select:
       Repair Menu
       Cluster Dual Hard Disk Drive Menu
       Display Cluster Dual Hard Disk Drive Status (Identify/Replace a Failing Cluster Hard Disk Drive)
5. Connect the service terminal to the cluster not being serviced.
6. Quiesce and power off the cluster being serviced. From the service terminal Main Service Menu, select:
       Repair Menu
       Alternate Cluster Repair Menu
       Quiesce Alternate Cluster
       Power Off the Alternate Cluster
7. Wait up to five minutes for the CEC drawer operator panel to display OK. Is OK displayed?
  • Yes, continue with the next step.
  • No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
8. When OK is displayed on the CEC drawer operator panel, connect the service terminal to the S1 serial port of the cluster being repaired. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor Main Menu should be displayed.
9. Set up to boot to the SMS menu. From the service processor Main Menu, select:
       System Power Control Menu
       Boot Mode Menu
       Boot to SMS Menu (set to enable)
   Note: The Boot to SMS setting is automatically reset to disable during the next cluster power off.
10. Power on the cluster (the service processor menus refer to the cluster as a system). Enter 98 to return to the System Power Control Menu, then select Power On System. During the cluster power on, keep the service terminal logically connected to the cluster being powered on. Do not press the 1 key when prompted to select the type of boot. Wait for the SMS main menu to display. It takes about four minutes from cluster power on to SMS being up.
11. Display the boot list using the text-based System Management Services. From the service processor Main Menu, select:
       Utilities Menu
       Multi Boot Menu
       Select Boot Devices
       Display Current Settings



This SMS option displays what was configured and also discovered by the system firmware during cluster power on. Hard disks have an additional requirement: they must also contain a boot record. If a boot device is not displayed, it may not be configured, it may be failing, or it may not have a boot record (hard disk only). The actual SCSI ID values read are displayed as the Ax values (for example, A0, A2, and A3) in the location codes shown below.

   Version M2P020312
   Copyright IBM Corp, 2000 All rights reserved.
   Current Boot Sequence
   1  Diskette
   2  SCSI CD-ROM (loc=U0.1-P1/Z1-A3,0)
   3  SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A0,0)
   4  SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A2,0)
   5  None

Figure 131. Boot Sequence Display

   Are the four boot devices shown above displayed on the cluster SMS screen? (The order of the hard disks in the list does not matter.)
  • Yes, go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
  • No, continue with the next step.
12. Do the following to see if the hard drives are detected on the SCSI bus.
  a. Press X repeatedly until you are back at the Utilities Menu.
  b. Enter 6 (MultiBoot).
  c. Watch carefully for one or both of the following messages. It is important to know exactly which of them appear. They appear only briefly, so you may need to exit and repeat this step several times. (To repeat: press X once to return to the Utilities Menu and then press 6.)
    • Message /pci@fff7f08000/scsi@c/sd@0,0 indicates that hdisk0 has been found.
    • Message /pci@fff7f08000/scsi@c/sd@2,0 indicates that hdisk1 has been found.
    Note: The messages will be preceded by several other similar messages, such as /pci@fff7f0a000/pci@c,2/ssa@1/disk@21013D1104B14CK. These messages refer to SSA devices detected on the PCI adapters and can be disregarded.
13. Use the table to determine the condition you have and the action to perform.
   Note: The hdisk0 or hdisk1 displayed in current settings can be on either hard disk drive (HDD1 or HDD2), depending on which hard disk drive the cluster booted from last. To translate the hdisk to an HDD, you must use the SCSI IDs displayed in the location codes, as explained in step 12c.


Table 62. Boot Devices Found by Firmware on Power On
cd0: Was cd0 listed in step 11 on page 388?
hdisk0, hdisk1: Was the drive listed in step 11 on page 388 or found in step 12 on page 389?

cd0   hdisk0  hdisk1  Go to
no    no      no      MAP 43A0 Section-1: Three SCSI boot devices not found.
yes   yes     yes     MAP 43A0 Section-2: Three boot devices found on page 391.
yes   no      no      MAP 43A0 Section-3: Both hard disk drives not found on page 391.
yes   no      yes     MAP 43A0 Section-4: One hard disk drive not found on page 391.
yes   yes     no      MAP 43A0 Section-4: One hard disk drive not found on page 391.
no    no      yes     MAP 43A0 Section-5: One hard disk drive and CD-ROM drive not found on page 391.
no    yes     no      MAP 43A0 Section-5: One hard disk drive and CD-ROM drive not found on page 391.
no    yes     yes     MAP 43A0 Section-6: CD-ROM drive not found on page 392.
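The Table 62 lookup, combined with the step 12c firmware messages, can be sketched as a dictionary keyed on the three yes/no answers. This is an illustration under stated assumptions, not a service tool: the function and constant names are hypothetical, the device paths are copied from step 12c and assumed to appear exactly as shown there, and, as the note in step 13 says, hdisk0 and hdisk1 do not by themselves identify HDD1 or HDD2.

```python
# Step 12c: firmware device-tree paths that show a hard disk was found.
# Any other path (for example the SSA ".../ssa@1/disk@..." messages) is ignored.
SCSI_HDISK_PATHS = {
    "/pci@fff7f08000/scsi@c/sd@0,0": "hdisk0",
    "/pci@fff7f08000/scsi@c/sd@2,0": "hdisk1",
}

# Table 62: (cd0 listed, hdisk0 listed or found, hdisk1 listed or found) -> section.
TABLE_62 = {
    (False, False, False): "Section-1: Three SCSI boot devices not found",
    (True, True, True): "Section-2: Three boot devices found",
    (True, False, False): "Section-3: Both hard disk drives not found",
    (True, False, True): "Section-4: One hard disk drive not found",
    (True, True, False): "Section-4: One hard disk drive not found",
    (False, False, True): "Section-5: One hard disk drive and CD-ROM drive not found",
    (False, True, False): "Section-5: One hard disk drive and CD-ROM drive not found",
    (False, True, True): "Section-6: CD-ROM drive not found",
}


def hdisks_from_messages(messages: list[str]) -> set[str]:
    """Return the hdisk names whose step 12c message appeared."""
    found = set()
    for message in messages:
        name = SCSI_HDISK_PATHS.get(message.strip())
        if name is not None:
            found.add(name)
    return found


def table_62_section(cd0_listed: bool, hdisks_listed: set[str],
                     messages: list[str]) -> str:
    """Merge step 11 (hdisks_listed) with step 12 (messages), then look up Table 62."""
    seen = set(hdisks_listed) | hdisks_from_messages(messages)
    key = (cd0_listed, "hdisk0" in seen, "hdisk1" in seen)
    return TABLE_62[key]
```

For example, if cd0 was listed in step 11, no hard disk was listed, and step 12 showed only the SSA message and the sd@2,0 message, the lookup selects Section-4. All eight combinations of the three answers are covered, so the dictionary lookup cannot miss.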

MAP 43A0 Section-1: Three SCSI boot devices not found: The problem is common to all three SCSI boot devices. Do the following:
1. Press the CD-ROM drive eject button:
  • If the CD tray opens, continue with the next step.
  • If the CD tray does not open, there is most likely a power problem to the SCSI devices. Verify the SCSI power cables between the CD-ROM drive and the I/O drawer planar assembly are plugged correctly. The possible failing FRUs are the I/O drawer planar assembly and the SCSI power cables. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
2. Verify the SCSI signal and power cables between the SCSI boot devices and the I/O drawer planar assembly are plugged correctly.



3. Isolate if one of the SCSI boot devices is corrupting the SCSI signal bus. Unplug the SCSI signal cable from one device at a time and attempt to reboot the cluster. v If the cluster still does not boot, continue with the next step. v If the cluster boots, verify the SCSI ID jumper is set correctly before replacing the device that is unplugged. Verify the hard disk drive jumpers were set correctly. (See CD-ROM, Hard Disk Drive, and Diskette Drive Removals and Replacements, Cluster in chapter 4 of the Volume 2 for jumper information.) Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. 4. The possible failing FRUs are: I/O drawer planar assembly, SCSI signal cables. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. MAP 43A0 Section-2: Three boot devices found: All three SCSI boot devices were found, the problem is most likely with the boot records or boot code on both hard drives. Go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive. MAP 43A0 Section-3: Both hard disk drives not found: The CD-ROM drive was found, so the SCSI power and signal cables are functioning. Both hard disk drives were not found. Do the following: 1. Verify the SCSI power and signal cables are correctly plugged to the hard disk drives. 2. Verify the SCSI ID is set correctly on each hard disk drive. A duplicate SCSI ID can cause this problem. (See Verify the hard disk drive jumpers were set correctly. (See CD-ROM, Hard Disk Drive, and Diskette Drive Removals and Replacements, Cluster in chapter 4 of the Volume 2 for jumper information.) 3. Isolate if one of the SCSI boot devices is corrupting the SCSI signal bus. Unplug the SCSI signal cable from one hard disk drive at a time and attempt to reboot the cluster. 
v If the cluster boots, replace the hard disk drive that is disconnected. Use the Main Service Menu, Cluster Dual Hard Disk Drive Repair Menu, Identify/Replace a Failing Cluster Hard Disk Drive option.
v If the cluster does not boot, replace both hard disk drives. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. After the FRUs are replaced, go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.

MAP 43A0 Section-4: One hard disk drive not found: If the cluster does not boot in this condition, there are two separate problems. Do the following:
1. Go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
2. At the completion of the hard drive build, use the Repair Menu, Cluster Dual Hard Disk Drive Repair Menu to repair the problem with the hard disk drive that was not found in step 11 on page 388 or step 12 on page 389.

MAP 43A0 Section-5: One hard disk drive and CD-ROM drive not found: The CD-ROM drive and one hard disk drive are not found. Repair the CD-ROM drive first. Do the following:
Problem Isolation Procedures, CHAPTER 3

391



1. Verify that the SCSI power and signal cables are correctly plugged to the CD-ROM drive.
2. Verify that the CD-ROM drive SCSI ID is set correctly. (See CD-ROM, Hard Disk Drive, and Diskette Drive Removals and Replacements, Cluster in chapter 4 of Volume 2 for jumper information.)
3. Replace the CD-ROM drive. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
4. Return to step 5 on page 388 of this MAP to continue the repair.

MAP 43A0 Section-6: CD-ROM drive not found: The CD-ROM drive is not found. This should not prevent the cluster from booting from the hard disk drives, but the CD-ROM drive is needed to reload the code on the hard disk drives. Do the following:
1. Verify that the SCSI power and signal cables are correctly plugged to the CD-ROM drive.
2. Verify that the CD-ROM drive SCSI ID is set correctly. (See CD-ROM, Hard Disk Drive, and Diskette Drive Removals and Replacements, Cluster in chapter 4 of Volume 2 for jumper information.)
3. Power off the cluster and replace the CD-ROM drive. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
4. Return to step 5 on page 388 of this MAP to continue the repair.

MAP 43A5: Bootlist Management Using SMS for Automatic LIC


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster boot list is kept in the I/O Drawer Planar Assembly NVRAM. The normal bootlist sequence is fd0 (diskette drive), cd0 (CD-ROM drive), hdisk0 (hard disk drive), and hdisk1 (hard disk drive), or alternatively fd0, cd0, hdisk1, hdisk0.
v If the cluster is not able to boot to AIX, System Management Services (SMS) is used to display and update the bootlist.
v If the cluster is able to boot to AIX, service login options are used to display and update the bootlist.
This MAP is only for SMS bootlist maintenance.
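To make the two normal orders concrete, the check below compares a bootlist, written as an ordered list of AIX device names, against both sequences. This is a minimal sketch for illustration; the function name and the idea of supplying the bootlist as a Python list are assumptions and are not part of the ESS service code.

```python
# Illustrative sketch only: the two normal bootlist orders named above.
NORMAL_BOOTLISTS = (
    ("fd0", "cd0", "hdisk0", "hdisk1"),
    ("fd0", "cd0", "hdisk1", "hdisk0"),
)


def is_normal_bootlist(devices):
    """Return True if `devices` matches one of the normal bootlist orders.

    Names are stripped and lower-cased first, so "CD0" and "cd0" compare equal.
    """
    normalized = tuple(name.strip().lower() for name in devices)
    return normalized in NORMAL_BOOTLISTS


print(is_normal_bootlist(["fd0", "cd0", "hdisk1", "hdisk0"]))  # True
print(is_normal_bootlist(["fd0", "cd0", "hdisk0"]))            # False (hdisk1 missing)
```

Either hard disk may come first, which is why both orders are accepted; a bootlist missing a device fails the check.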

Isolation
1. Connect the service terminal to the cluster not being serviced.
2. Quiesce and power off the cluster being serviced. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair Menu, Quiesce Alternate Cluster, Power Off the Alternate Cluster.
3. Wait up to five minutes for the CEC drawer operator panel to display OK. Is OK displayed?
v Yes, continue with the next step.
v No, go to MAP 4730: Cluster Power Off Request Problem on page 446.


VOLUME 1, TotalStorage ESS Service Guide



4. When OK is displayed on the CEC drawer operator panel, connect the service terminal to the S1 serial port of the cluster being repaired. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor Main Menu should be displayed.
5. Set up the cluster to boot to the SMS menu. From the service processor Main Menu (displayed on the service terminal), select: System Power Control Menu, Boot Mode Menu, Boot to SMS Menu (set to enable).
Note: The Boot to SMS setting will automatically be reset to disable during the next cluster power off.
6. Power on the cluster (the service processor menus refer to the cluster as a system). Enter 98 to return to the System Power Control Menu, then select Power On System. During the cluster power on, keep the service terminal logically connected to the cluster being powered on. Do not press the 1 key when prompted to select the type of boot. Wait for the SMS main menu to display. It takes about four minutes from cluster power on to SMS being up.
7. Display the boot list using the text-based System Management Services. From the SMS main menu, select: Utilities Menu, Multi Boot Menu, Select Boot Devices, Display Current Settings.
This SMS option displays what was configured and also what was discovered by the system firmware during cluster power on. An additional requirement for hard disks is that they must also contain a boot record. If a boot device is not displayed, it may not be configured, it may be failing, or (hard disk only) it may not have a boot record. The actual SCSI ID values read are shown by the Ax value at the end of each location code (A3, A0, and A2 in the example below).
  Version M2P020312
  Copyright IBM Corp, 2000 All rights reserved.
  Current Boot Sequence
  1 Diskette
  2 SCSI CD-ROM (loc=U0.1-P1/Z1-A3,0)
  3 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A0,0)
  4 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A2,0)
  5 None
8. Use the table to determine the condition you have.
Table 63. Number of Harddisks Displayed
Number of Harddisks Displayed | Action
2 | Go to step 9
1 | Go to step 13 on page 394
none | Go to step 17 on page 394
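As an illustration of steps 7 and 8, the sketch below parses a Current Boot Sequence listing like the example in step 7, counts the Harddisk entries, and looks up the Table 63 action. The line format it expects is an assumption based on that single example; actual SMS output may differ between firmware levels, and the function names are hypothetical.

```python
import re

# Assumed entry format, taken from the step 7 example:
#   "<position> <description> (loc=<location code>)"
# The "(loc=...)" part is optional (for example "1 Diskette").
ENTRY = re.compile(r"^\s*(\d+)\s+(.*?)\s*(?:\(loc=([^)]*)\))?\s*$")

# Table 63: number of Harddisks displayed -> action.
TABLE_63 = {2: "Go to step 9", 1: "Go to step 13", 0: "Go to step 17"}


def parse_boot_sequence(text):
    """Return (position, description, location) tuples, skipping "None" slots."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(2) != "None":
            position, description, location = match.groups()
            entries.append((int(position), description, location))
    return entries


def table_63_action(text):
    """Count the Harddisk entries and return the matching Table 63 action."""
    count = sum("Harddisk" in description
                for _, description, _ in parse_boot_sequence(text))
    if count not in TABLE_63:
        raise ValueError(f"{count} Harddisks listed; Table 63 covers only 0, 1, or 2")
    return TABLE_63[count]


SAMPLE = """Current Boot Sequence
1 Diskette
2 SCSI CD-ROM (loc=U0.1-P1/Z1-A3,0)
3 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A0,0)
4 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A2,0)
5 None"""

print(table_63_action(SAMPLE))  # Go to step 9
```

Header lines such as "Current Boot Sequence" do not start with a position number, so the pattern skips them.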

9. Both hard disk drives are listed. Reverse the order of the hard disk drives in the boot list. Do the following:
a. Enter x to return to the prior SMS menu, Select Boot Devices.
b. Select 5, Configure 3rd Boot Device.
c. On the Configure 3rd Boot Device screen, select the Harddisk that was earlier listed as the 4th boot device.
d. Enter x to return to the prior SMS menu, Select Boot Devices.
e. Select 6, Configure 4th Boot Device.
f. On the Configure 4th Boot Device screen, select the Harddisk that was earlier listed as the 3rd boot device.
10. Log in to the working cluster and power off and on the failing cluster. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair Menu.
11. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to light. Is the Cluster Ready LED lit?
v Yes, continue with the next step.
v No, both hard disk drives are corrupted; call the next level of support.
12. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there now a problem with ESC = 14Fx calling a MAP 4Bxx?
v Yes, return to MAP 4Axx and continue in the priority table.
v No, call the next level of support. (The AutoLIC process did not detect the boot from the unexpected hard disk drive.)
13. Only one hard disk drive is listed. Use the table to determine the condition you have.

Table 64. MAP Repair Started In
The AutoLIC repair started with a problem calling MAP | Action
MAP 4A10 | Go to step 14
MAP 4A40 | Go to step 15
Any other 4Axx MAP | Go to step 16

14. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, return to MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 and continue in the priority table.
15. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, return to MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488 and continue in the priority table.
16. This is an unexpected condition which probably indicates a double failure (one hard disk drive not detected and the other hard disk drive cannot boot). Call your next level of support.
17. No hard disks are shown in the SMS boot list. Do the following to see if the hard drives are detected on the SCSI bus.
a. Press X repeatedly until you are back at the Utilities Menu.
b. Enter 6 (MultiBoot).
c. Watch carefully for one or both of the following messages. It is important to know exactly which of them appear. They appear only briefly, so you may need to exit and repeat this step several times. (To repeat: press X once to return to the Utilities Menu and then press 6.)
Note: The messages will be preceded by several other similar messages such as /pci@fff7f0a000/pci@c,2/ssa@1/disk@21013D1104B14CK. These refer to SSA devices detected on the PCI adapters and can be ignored.
v /pci@fff7f08000/scsi@c/sd@0,0 means that hdisk0 has been found.
v /pci@fff7f08000/scsi@c/sd@2,0 means that hdisk1 has been found.
18. Use the table to determine the condition you have, and the action to perform.
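The rule in the note (ignore the SSA paths and watch for exactly two SCSI paths) can be written as a small lookup. This is a minimal sketch that assumes the messages have been copied down as text strings; nothing on the service terminal produces such a list automatically, and the function name is hypothetical.

```python
# Illustrative sketch: the two SCSI device paths named in step 17c and the
# hard disk each one indicates. Every other path, such as the SSA disk paths
# in the note, is ignored.
SCSI_HDISK_PATHS = {
    "/pci@fff7f08000/scsi@c/sd@0,0": "hdisk0",
    "/pci@fff7f08000/scsi@c/sd@2,0": "hdisk1",
}


def hdisks_found(messages):
    """Return the sorted list of hard disks whose path message was seen."""
    found = {SCSI_HDISK_PATHS[m.strip()] for m in messages
             if m.strip() in SCSI_HDISK_PATHS}
    return sorted(found)


seen = [
    "/pci@fff7f0a000/pci@c,2/ssa@1/disk@21013D1104B14CK",  # SSA device: ignored
    "/pci@fff7f08000/scsi@c/sd@2,0",
]
print(hdisks_found(seen))  # ['hdisk1']
```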
Table 65. hdisk Repairs
Is cd0 listed in SMS? | Is hdisk0 found? | Is hdisk1 found? | Action
No | No | No | Go to MAP Section 43A5-1, Three SCSI Boot Devices Not Found on page 396
Yes | Yes | Yes | Go to MAP Section 43A5-2, Three Devices Found on page 396
Yes | No | No | Go to MAP Section 43A5-3, Both Hard Disk Drives Not Found on page 396
Yes | No | Yes | Go to MAP Section 43A5-4, One Hard Disk Drive Not Found on page 397
Yes | Yes | No | Go to MAP Section 43A5-4, One Hard Disk Drive Not Found on page 397
No | No | Yes | Go to MAP Section 43A5-5, One Hard Disk Drive and CD-ROM Drive Not Found on page 397
No | Yes | No | Go to MAP Section 43A5-5, One Hard Disk Drive and CD-ROM Drive Not Found on page 397
No | Yes | Yes | Go to MAP Section 43A5-6, CD-ROM Drive Not Found on page 397
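Table 65 is a complete decision table over three Yes/No answers, so it can be sketched as a dictionary keyed on those answers. The names below are hypothetical; the section titles are copied from the table.

```python
# Table 65 keyed on (is cd0 listed in SMS?, is hdisk0 found?, is hdisk1 found?).
TABLE_65 = {
    (False, False, False): "43A5-1, Three SCSI Boot Devices Not Found",
    (True, True, True): "43A5-2, Three Devices Found",
    (True, False, False): "43A5-3, Both Hard Disk Drives Not Found",
    (True, False, True): "43A5-4, One Hard Disk Drive Not Found",
    (True, True, False): "43A5-4, One Hard Disk Drive Not Found",
    (False, False, True): "43A5-5, One Hard Disk Drive and CD-ROM Drive Not Found",
    (False, True, False): "43A5-5, One Hard Disk Drive and CD-ROM Drive Not Found",
    (False, True, True): "43A5-6, CD-ROM Drive Not Found",
}


def table_65_section(cd0_listed, hdisk0_found, hdisk1_found):
    """Return the MAP section for one combination of Yes/No answers."""
    key = (bool(cd0_listed), bool(hdisk0_found), bool(hdisk1_found))
    return "MAP Section " + TABLE_65[key]


print(table_65_section(True, False, True))
# MAP Section 43A5-4, One Hard Disk Drive Not Found
```

Because the table lists all eight combinations, every possible observation maps to exactly one section; a missing key would point to a transcription error in the table.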



MAP Section 43A5-1, Three SCSI Boot Devices Not Found: The problem is common to all three SCSI boot devices.
1. Check that power is present at the SCSI devices. Press the CD-ROM drive eject button:
v If the CD tray opens, continue with the next step.
v If the CD tray does not open, there is most likely a power problem to the SCSI devices. Verify that the SCSI power cables between the CD-ROM drive and the I/O drawer planar assembly are plugged correctly. The possible failing FRUs are the I/O drawer planar assembly and the SCSI power cables. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. After the power problem is corrected, go to MAP Section 43A5-7, Cluster Checkout on page 397.
2. Verify that the SCSI signal cables between the SCSI boot devices and the I/O drawer planar assembly are plugged correctly.
v If a signal cable problem is found and corrected, go to MAP Section 43A5-7, Cluster Checkout on page 397.
v If no problems are found, continue with the next step.
3. Determine whether one of the SCSI boot devices is corrupting the SCSI signal bus. Unplug the SCSI signal cable from one boot device at a time, do step 17 on page 394, and then return here and continue.
v If there are still no boot devices detected, continue with the next step.
v If one or more boot devices now appear, the problem is with the boot device that is unplugged. Verify that the SCSI ID jumper is set correctly before replacing the device that is unplugged. (See CD-ROM and Hard Disk Drive Removals and Replacements, 2105 Model 800 CEC Drawer in chapter 4 of Volume 2 for jumper information.) Then return here and go to MAP Section 43A5-7, Cluster Checkout on page 397.
4. The possible failing FRUs are the I/O drawer planar assembly and the SCSI signal cables. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs. Then return here and go to MAP Section 43A5-7, Cluster Checkout on page 397.
MAP Section 43A5-2, Three Devices Found: All three SCSI boot devices were found, so the problem is most likely with the boot records or boot code on both hard drives.
1. Were you sent here from MAP 4A10 or 4A40?
v Yes, continue with the next step.
v No, call your next level of support.
2. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, go to MAP Section 43A5-7, Cluster Checkout on page 397.

MAP Section 43A5-3, Both Hard Disk Drives Not Found: The CD-ROM drive was found, so the SCSI power and signal cables are functioning, but neither hard disk drive was found.
1. Verify that the SCSI power and signal cables are correctly plugged to both hard disk drives.
2. Verify that the SCSI ID is set correctly on each hard disk drive. A duplicate SCSI ID can cause this problem. (See CD-ROM and Hard Disk Drive Removals and Replacements, 2105 Model 800 CEC Drawer in chapter 4 of Volume 2 for jumper information.)



3. If a problem was found and corrected, go to MAP Section 43A5-7, Cluster Checkout. If no problem was found, call the next level of support.

MAP Section 43A5-4, One Hard Disk Drive Not Found:
1. Did the AutoLIC repair start with MAP 4A10 or 4A40?
v Yes, continue with the next step.
v No, call the next level of support. (One boot drive is not detected and the other is not bootable.)
2. The hard disk drive that is not detected is failing. Check that the hard disk drive power and SCSI signal cables are connected.
v If no problem is found, the possible failing FRUs are the hard disk drive, the power cable, or the SCSI signal cable. Note: These FRUs can be replaced with the cluster remaining powered on.
v If you need to replace the hard disk drive FRU but it is not readily available, you can delay the repair and continue with the next step.
3. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, go to MAP Section 43A5-7, Cluster Checkout.

MAP Section 43A5-5, One Hard Disk Drive and CD-ROM Drive Not Found: The CD-ROM drive and one hard disk drive are not found.
1. Verify that the SCSI power and signal cables are correctly plugged to the boot devices.
2. Verify that the boot drive SCSI IDs are set correctly. (See CD-ROM and Hard Disk Drive Removals and Replacements, 2105 Model 800 CEC Drawer in chapter 4 of Volume 2 for jumper information.)
3. Was a problem found and corrected?
v Yes, go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324.
v No, call the next level of support. (Two of the three boot devices are not detected.)

MAP Section 43A5-6, CD-ROM Drive Not Found: The CD-ROM drive is not found. This should not prevent the cluster from booting from the hard disk drives, but the CD-ROM drive is needed to reload the code on the hard disk drives.
1. Verify that the SCSI power and signal cables are correctly plugged to the CD-ROM drive.
2. Verify that the CD-ROM drive SCSI ID is set correctly. (See CD-ROM and Hard Disk Drive Removals and Replacements, 2105 Model 800 CEC Drawer in chapter 4 of Volume 2 for jumper information.)
3. Was a problem found and corrected?
v Yes, go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324.
v No, you can delay the repair of the CD-ROM drive until later.

MAP Section 43A5-7, Cluster Checkout:
1. Did you replace or reconnect a FRU that required you to power off the cluster?
v Yes, go to step 3 on page 398.
v No, continue with the next step.
2. Did you do a hard drive rebuild?



v Yes, continue with the next step.
v No, power the cluster off and on. Log in to the working cluster and use the Alternate Cluster Repair Menu options.
3. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to light. Is the Cluster Ready LED lit?
v Yes, return to the 4Axx MAP that was used to begin the AutoLIC repair.
v No, call the next level of support. (There may be a new problem that can be identified by beginning again at step 7 on page 393.)

MAP 43B0: Cluster Dual Hard Drive ESC 1xxx


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
ESC 1xxx codes are used to report problems with the cluster dual hard drives.

Isolation
Note: A single hard drive failure can normally be repaired without the cluster being quiesced or powered off. Do not power off the failing cluster during this repair unless directed by the maintenance package. Undirected use of cluster power off can lead to unpredictable results.
1. Determine the failing cluster and ESC. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair. Select the problem to be repaired and note the ESC and the failing cluster.
2. Use the table to determine the condition you have, and the action to perform:
Table 66. ESC Repair Actions
ESC 1050
Description: Cluster hard disk drives are not mirrored, and the automatic mirroring function has been disabled.
Action: Connect the service terminal to the failing cluster. Use Restore the Cluster to Automatic Mirroring Mode on the Cluster Dual Hard Disk Drive Repair Menu. Then select Restore Mirroring after a Cluster Hard Disk Drive Replacement.
ESC 1051
Description: A cluster dual hard drive has failed.
Action: Connect the service terminal to the failing cluster. Use the Identify/Replace a Failing Cluster Hard Disk Drive option on the Cluster Dual Hard Disk Drive Repair Menu to identify and replace the failing drive. Note: The cluster MUST NOT be powered down during replacement of the failing cluster hard disk drive.
ESC 1052
Description: The cluster dual hard drive Mirror operation failed.
Action: Go to step 3 on page 400.
ESC 1053
Description: The cluster dual hard drive Unmirror operation failed.
Action: Go to step 3 on page 400.



ESC 1054
Description: The cluster IML of updated LIC code failed.
Action: Call the next level of support.
ESC 1055
Description: The cluster IML of original LIC code failed.
Action: Call the next level of support.
ESC 1056
Description: The cluster dual hard drive Cloning operation failed.
Action: Go to step 3 on page 400.
ESC 1057
Description: The cluster dual hard drive Cloning cleanup operation failed.
Action: Go to step 3 on page 400.
ESC 1058
Description: The cluster dual hard drive Mirroring cleanup operation failed.
Action: Go to step 3 on page 400.
ESC 1059
Description: The cluster dual hard drive rsAltInst command failed.
Action: Go to step 3 on page 400.
ESC 105A
Description: An illegal cluster dual hard drive operation was attempted.
Action: Go to step 3 on page 400.
ESC 105B
Description: A cluster dual hard drive command failed.
Action: Go to step 3 on page 400.
ESC 105C
Description: The mirrored cluster hard drives cannot be synchronized.
Action: Attempt to resynchronize the dual hard disk drives. Connect the service terminal to the failing cluster. Use the Main Service Menu, Repair Menu, Cluster Dual Hard Disk Drive Repair Menu, Restore Mirroring after a Cluster Hard Disk Drive Replacement. (It is not necessary to have actually replaced a hard disk drive to use this option.)
v If it fails, attempt to quiesce and resume the cluster. Use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options.
v Display the dual hard disk drive status. Use the Main Service Menu, Repair Menu, Cluster Dual Hard Disk Drive Repair Menu, Display Cluster Dual Hard Disk Drive Status.
v If the status is not mirrored, call the next level of support.
ESC 105D
Description: A cluster dual hard drive bosboot command failed.
Action: Go to step 3 on page 400.
ESC 105E
Description: A cluster dual hard drive has a SCSI ID problem.
Action: Connect the service terminal to the failing cluster. Use the Identify/Replace a Failing Cluster Hard Disk Drive option on the Cluster Dual Hard Disk Drive Repair Menu to identify and correct the SCSI ID conflict.



ESC 105F
Description: One cluster dual hard disk drive is failing. The other is not mirrored, so it cannot be used. The AIX and Licensed Internal Code will have to be reloaded.
Action:
1. Connect the service terminal to the failing cluster.
2. Determine which hard disk drive needs to be replaced, HDD1 or HDD2. (The SCSI signal cable to each drive is labeled HDD1 or HDD2.) Use the Main Service Menu, Repair Menu, Cluster Dual Hard Disk Drive Repair Menu, and Display Cluster Dual Hard Drive Status option.
3. Quiesce and power off the cluster. Connect the service terminal to the working cluster. Use the Main Service Menu, Repair Menu, and Alternate Cluster Repair Menu options.
4. Replace the failing hard drive using CD-ROM and Hard Disk Drive Removals and Replacements, 2105 Model 800 CEC Drawer in chapter 4 of Volume 2. When the hard drive is replaced, return here and continue with the next step.
5. Go to MAP 4020: Hard Disk Drive Build Process for Both Drives to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
ESC 1060
Description: The cluster performed its IML from the second hard disk drive in the bootlist instead of the first.
Action: Go to MAP 43C0: Cluster IML from Second Hard Disk Drive.

3. Connect the service terminal to the failing cluster. Use Display Cluster Dual Hard Drive Status on the Cluster Dual Hard Disk Drive Repair Menu to determine the status of the cluster hard disk drives. Find the condition listed below and the action to perform:
v If the status of both hard disk drives is good, the problem may have already been repaired without the problem being closed. Close the problem. If the problem was not already repaired, call the next level of support.
v If one hard disk drive has good status and the other does not, use the Identify/Replace a Failing Cluster Hard Disk Drive option on the Cluster Dual Hard Disk Drive Repair Menu to repair the failing hard disk drive. If this option does not show one hard disk drive that can be repaired, call the next level of support.
v If both hard disk drives show bad status, call the next level of support before using MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. That MAP will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
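The routing in Table 66 and the status check in step 3 can be summarized as two small lookups. This is a minimal sketch with hypothetical names; the action strings are shortened paraphrases of the table entries, not menu text, and the drive status is reduced to a simple good/bad flag as an assumption.

```python
# Table 66, reduced to the routing decision for each ESC.
GO_TO_STEP_3 = {"1052", "1053", "1056", "1057", "1058", "1059",
                "105A", "105B", "105D"}
CALL_NEXT_LEVEL = {"1054", "1055"}
OWN_ACTION = {
    "1050": "Restore the Cluster to Automatic Mirroring Mode",
    "1051": "Identify/Replace a Failing Cluster Hard Disk Drive",
    "105C": "Restore Mirroring after a Cluster Hard Disk Drive Replacement",
    "105E": "Identify/Replace a Failing Cluster Hard Disk Drive (SCSI ID conflict)",
    "105F": "Replace the failing drive, then MAP 4020",
    "1060": "Go to MAP 43C0",
}


def table_66_action(esc):
    """Return the Table 66 action for an ESC such as "105a" or "1052"."""
    code = esc.strip().upper()
    if code in GO_TO_STEP_3:
        return "Go to step 3"
    if code in CALL_NEXT_LEVEL:
        return "Call the next level of support"
    if code in OWN_ACTION:
        return OWN_ACTION[code]
    raise ValueError(f"ESC {esc} is not listed in Table 66")


def step_3_action(first_drive_good, second_drive_good):
    """Step 3: choose an action from the two hard disk drive status values."""
    if first_drive_good and second_drive_good:
        return "Close the problem if already repaired; otherwise call the next level of support"
    if first_drive_good or second_drive_good:
        return "Use Identify/Replace a Failing Cluster Hard Disk Drive"
    return "Call the next level of support before using MAP 4020"


print(table_66_action("105a"))      # Go to step 3
print(step_3_action(True, False))   # Use Identify/Replace a Failing Cluster Hard Disk Drive
```

Keeping the "Go to step 3" codes in one set makes it easy to confirm that all 17 ESCs in the table are covered exactly once.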

MAP 43C0: Cluster IML from Second Hard Disk Drive


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
ESC 1060 is used to report a problem that caused the cluster to IML from the second hard disk drive in the SMS boot list instead of the first.


Isolation
1. There is a problem with the first hard disk drive in the cluster SMS boot list. The cluster did IML from the second hard disk drive listed in the SMS boot list. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Repair the related problem. If there is no related problem, call the next level of support.

MAP 43D0: Duplicate TCP/IP Address Detected for this Cluster


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster reporting this error has detected a duplicate of its TCP/IP address on the network (ESSNet or customer) connected to its ethernet port.

Isolation
1. The cluster reporting this error has detected a duplicate of its TCP/IP address on the attached network. Use standard LAN network problem isolation techniques.
Note: The following actions may assist you:
v Verify that the TCP/IP addresses are set correctly in this cluster.
v Determine the network topology: directly connected to the customer network; connected to an ESSNet network that is not connected to a customer network; or connected to an ESSNet network that is connected to a customer network.
v Disconnect the Ethernet cable from this cluster and use ping commands from the ESSNet console (or the customer console, if attached) to help identify the source of the duplicate address.
v Call the next level of support as needed.

MAP 43E0: Service Processor Reset


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A service processor reset is needed to attempt to clear an error condition.

Isolation
1. The service processor reset will cause the cluster to power off.
2. Log in to the cluster not being serviced.
3. Quiesce cluster x (the cluster being serviced). From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair Menu.
4. Reset the service processor. Use a small, straight, nonconducting object, such as an insulated, straightened paper clip, as a tool to activate the switch. The switch is very small in diameter; insert the tool straight into hole 1, keeping it at a right angle to the plastic bezel. Press the switch until it clicks. After the switch is activated, the service processor will reset and power off the cluster.
Note: Earlier 2105s have two reset switch holes on the operator panel; the upper hole (1) is the service processor reset switch. The lower reset hole is not present on later 2105s.

Figure 132. CEC Drawer Operator Panel Locations, Front View (s009652)

5. Power on the cluster. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair Menu.
6. If the cluster IMLs normally, resume the cluster. If the cluster hangs while displaying a progress or error code on the CEC drawer operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371, or call the next level of support.

MAP 4400: Displaying Cluster SMS Error Logs


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The SMS (System Management Services) includes an option to display SMS error logs. These logs may contain repair information for a failure that did not create a viewable problem.


Procedure
Note: This procedure requires the cluster to be taken away from customer use.
1. Quiesce and power off the failing cluster:
a. Connect the service terminal to the cluster not being repaired.
b. Use the Main Service Menu, Repair Menu, and then the Alternate Cluster Repair Menu options.
2. Power on the failing cluster and immediately go to the next step. Use the Alternate Cluster Repair Menu options.
3. Display the SMS entry menu:
a. Connect the service terminal to the failing cluster.
b. Keep logically connecting to the failing cluster until the word keyboard is displayed.
Note: The firmware boot may disconnect the service terminal one or more times.

RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 (RS/6000 banner repeated across the screen)
Memory ====> Keyboard

c. Then quickly press the 1 key to bring up SMS.
4. Select Utilities and then Display Error Logs.
5. If an error is logged, check the time stamp:
v If the error was logged during the current boot attempt, record it, then look up the error in Chapter 9: Error Messages, Diagnostic Codes, and Service Reports of Volume 3.
v If no recent error is logged in the error log, go to MAP 2700: CEC Drawer Power On Problem on page 170.

MAP 4410: Cluster to Cluster Ethernet Communication Test


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Each cluster must have the correct TCP/IP settings for both clusters. Each cluster must have a working ethernet hardware connection to the other cluster.

Isolation
1. Is the Code EC level above 2.3.0.0?
Note: The current Code EC level can be seen on the logon screen.
v Yes, continue with the next step.
v No, the Cluster to Cluster test can give unpredictable results on levels prior to 2.3.0.0. Go to step 3 on page 404 to verify cluster to cluster communications.
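The question in step 1 is a numeric comparison of dotted levels, which a plain string comparison gets wrong once a field reaches two digits. A minimal sketch, assuming the EC level is shown as dot-separated integers like "2.3.0.0"; the function names are hypothetical.

```python
# Assumed EC level format: dot-separated integers, as in "2.3.0.0".
MIN_EC_LEVEL = (2, 3, 0, 0)


def parse_ec_level(level):
    """Turn "2.3.0.0" into (2, 3, 0, 0) so levels compare numerically."""
    return tuple(int(part) for part in level.strip().split("."))


def cluster_test_allowed(level):
    """Step 1: True only if the level is strictly above 2.3.0.0.

    Following the question literally, exactly 2.3.0.0 takes the No path
    (step 3), which avoids running the Cluster to Cluster test.
    """
    return parse_ec_level(level) > MIN_EC_LEVEL


print(cluster_test_allowed("2.10.0.0"))  # True
print("2.10.0.0" > "2.3.0.0")            # False: plain string comparison gets this wrong
```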


2. Test the cluster to cluster Ethernet communication using the service terminal. From the service terminal Main Service Menu, select: Machine Test Menu, External Connections Menu, Cluster-Cluster Communications Test. Was the test successful?
v Yes, return to the procedure that sent you here.
v No, go to step 4.
3. Test each cluster individually. Use the Main Menu, Display / Repair Problems Needing Repair option from each cluster. This option uses the Ethernet connection to display problems from both clusters:
v If each cluster can display the problems from the other cluster, return to the procedure that sent you here.
v If either cluster fails, continue with the next step.
4. Verify that the TCP/IP settings for each cluster match the entries on the customer-supplied communications worksheets (for the local cluster).
Note: Use the customer-supplied communications worksheets to verify the correct entries for the cluster hostnames, cluster TCP/IP addresses, network mask, and gateway. If the ESSNet is not connected to the customer network, the TCP/IP settings may have been set using the values in the service guide Install Chapter 5.
a. From the service terminal Main Service Menu, select: Configuration Options Menu, Configure Communications Resources Menu, Change / Show TCP/IP Configuration, Minimum Configuration and Startup.
b. From the service terminal Main Service Menu, select: Configuration Options Menu, Configure Alternate Cluster IP Address and Hostname.
5. Verify that only current definitions exist in the /etc/hosts file. Use: Configuration Options Menu, Communications Resources Menu, Change/Show TCP/IP Configuration, Further Configuration, Name Resolution, Host Tables (/etc/hosts), List All Hosts.
a. Check that the IP address and cluster (host) name for each cluster are listed only once.
Note: There may be other hosts listed if the customer uses Copy Services without DNS. You will also see a loopback address.
b. If there are multiple entries for either cluster, remove the extra entries using Remove a Host.
c. Press F3 until you return to the Name Resolution menu.
6. If the customer uses DNS, check that the correct DNS server or servers are defined. Use:



Name Resolution, Domain Nameserver (/etc/resolv.conf), List All Nameservers.
a. Check that only the required Nameserver or Nameservers are listed.
b. Correct any entries using the Add and Remove options.
7. Repeat steps 4 and 6 on page 404 for the other cluster.
8. Run the Cluster-Cluster Communications Test again, using the instructions in steps 1 on page 403 to 3 on page 404. Was the test successful?
v Yes, return to the procedure that sent you here.
v No, go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377 to further isolate the problem.
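Steps 5 and 6 ask you to confirm that each cluster appears only once in /etc/hosts and that only the required Nameservers are listed. The minimal Python sketch below shows the same checks applied to copies of the files. It assumes standard hosts-file syntax (an address followed by one or more names, with `#` comments) and resolv.conf `nameserver` lines. The function names, host names, and addresses are hypothetical. The service terminal menus remain the supported way to list and remove entries.

```python
from collections import defaultdict

# Hypothetical helpers: they only read /etc/hosts- and /etc/resolv.conf-style
# text (for example, a copy of the files); they do not change anything.

def parse_hosts(text):
    """Return (address, [names]) pairs, ignoring comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            fields = line.split()
            entries.append((fields[0], fields[1:]))
    return entries

def duplicate_cluster_entries(text, cluster_names):
    """Return {name: [addresses]} for each cluster name listed more than once."""
    seen = defaultdict(list)
    for address, names in parse_hosts(text):
        for name in names:
            if name in cluster_names:
                seen[name].append(address)
    return {name: addrs for name, addrs in seen.items() if len(addrs) > 1}

def list_nameservers(text):
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Hypothetical example: cluster 1 has a stale second entry.
SAMPLE_HOSTS = """\
127.0.0.1     loopback localhost
9.113.24.121  esscluster1
9.113.24.122  esscluster2
9.113.24.131  esscluster1
"""
print(duplicate_cluster_entries(SAMPLE_HOSTS, {"esscluster1", "esscluster2"}))
# prints {'esscluster1': ['9.113.24.121', '9.113.24.131']}
```

A name that maps to two addresses is exactly the condition that step 5b removes with the Remove a Host option.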

MAP 4420: Display Cluster Ethernet Network Address


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The service login can display the UAA of the integrated ethernet adapter in the I/O Drawer Planar Assembly.

Procedure
1. Connect the service terminal to the cluster whose Ethernet network address will be displayed. From the service terminal Main Service Menu, select: Configuration Options Menu, Configure Communications Resources Menu, Change / Show TCP/IP Configuration, Display Ethernet Network Address.

MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
ESSNet1 or Master Console to cluster Ethernet problem.

Procedure
1. Verify that the following conditions are met before going to the next step:
a. The ESSNet ethernet hub is powered on. (Use the hub documentation to repair any problem.)
b. The ESSNet console is powered on and the ESSNet console software is active. (Use the console documentation to repair any problem.)
c. The 2105 cluster is powered on, and the cluster Ready indicator on the 2105 operator panel is on. (Use normal repair actions to identify and correct any problem.)
d. The ESSNet ethernet hub to 2105 cluster cable is correctly connected at both ends. (Use service guide Install Chapter 5 information if needed.)
Problem Isolation Procedures, CHAPTER 3

405



e. The ESSNet ethernet hub to ESSNet console PC cable is correctly connected at both ends. (Use service guide Install Chapter 5 information if needed.)
2. Observe the ESSNet ethernet hub port indicator for the cluster connection. Is it on?
v Yes, continue with the next step.
v No, verify that the ethernet cable is good. Unplug and inspect the connectors, and check the full length of the cable for any damage. If no damage is found, reconnect the cable. Then do one of the following until the hub indicator comes on:
  Plug the cable into a known working port on this hub:
    - If that port indicator lights, there is a bad port on this hub. Replace the hub.
    - If the indicator still does not light, return the cable to its original port and then continue.
  Try another ethernet cable. You may be able to temporarily use the ethernet cable from the other cluster on this 2105.
  The cluster ethernet connection may not be working. Replace the I/O drawer planar assembly FRU. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432.
3. Observe the ESSNet ethernet hub port indicators for the ESSNet console connection. Are the ESSNet ethernet hub port indicators on?
v Yes, continue with the next step.
v No, ensure that the ethernet cable is good. Unplug and inspect the connectors, and check the full length of the cable for any damage. If no damage is found, reconnect the cable. Then do one of the following until the hub indicator comes on:
  Plug the cable into a known working port on this hub:
    - If that port indicator lights, there is a bad port on this hub. Replace the hub.
    - If the indicator still does not light, return the cable to its original port and then continue.
  Try another ethernet cable.
  The ESSNet PC ethernet connection may not be working. Use the PC documentation to test or replace the FRU.
4. Determine what type of ESSNet console you have:
v Master Console (uses a Multiport Serial Adapter and Linux operating system), continue with the next step.
v ESSNet1 Console (uses a modem expander and Windows operating system), go to step 6 on page 407.
5. Use the Master Console to do an ethernet ping test.
a. Double-click the Console Launcher icon.
b. At the pop-up window, log in as service, with the password service.
c. In the Hardware Management Console window, in the Views pane, click Console Actions.
d. In the Console Actions pane, double-click the Network Diagnostic Information icon.



e. In the Network Diagnostic Information window, enter the TCP/IP address of the cluster bay and then click the Ping button. Was the ping test successful?
v Yes, the Master Console ethernet adapter is able to communicate with the cluster ethernet adapter, so the Master Console software should be able to communicate with the cluster software. Retry the original operation that failed. If it still fails, go to the No leg of this question.
v No, check all ESSNet TCP/IP settings in both the Master Console and the cluster. Use the customer worksheets and the service guide Install Chapter 5 section for the Master Console install and configuration.
6. Use the ESSNet1 console to do an ethernet ping test. Go to the ESSNet console and open a DOS window. At the command line, enter a ping command with the cluster TCP/IP address. This tests the communication from the ESSNet console to the cluster. The general format is: ping 9.172.31.1
v If the ping is successful, a line of information is displayed each time data is received back from the cluster. For example: 74 bytes from 9.113.24.123: icmp_seq=0 ttl=252
v If the ping is not successful, the line of information is not displayed and the test may appear to hang with no response.
Note: Do not leave the ping test running, because it slows down all communications through the hub. Press Ctrl+C to quit the ping test.
Was the ping test successful?
v Yes, the ESSNet1 console ethernet adapter is able to communicate with the cluster ethernet adapter, so the ESSNet1 console software should be able to communicate with the cluster software. Retry the original operation that failed. If it still fails, use the No leg of this question.
v No, check all ESSNet TCP/IP settings in both the ESSNet1 console and the cluster. Use the customer worksheets and the service guide Install Chapter 5 section for the Master Console install and configuration.
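The success test in step 6 is simple: one reply line appears for each response from the cluster, and no lines appear when the ping fails. The minimal Python sketch below counts those reply lines in captured ping output. The reply patterns are assumptions. The first matches the example line shown in step 6, and the second matches the `Reply from ...` line that a Windows ping prints. Other ping implementations may format replies differently, so adjust the pattern to match.

```python
import re

# Assumed reply formats: the Unix-style line shown in step 6, and the
# "Reply from ..." line printed by a Windows ping.
REPLY = re.compile(r"(?:\d+ bytes from|Reply from) ([\d.]+):")

def ping_replies(output, target):
    """Count reply lines in captured ping output that came from target."""
    return sum(1 for match in REPLY.finditer(output) if match.group(1) == target)

def ping_succeeded(output, target):
    """True if at least one reply came back from the cluster address."""
    return ping_replies(output, target) > 0

sample = ("74 bytes from 9.113.24.123: icmp_seq=0 ttl=252\n"
          "74 bytes from 9.113.24.123: icmp_seq=1 ttl=252\n")
print(ping_replies(sample, "9.113.24.123"))  # prints 2
```

The sketch matches on the source address so that a reply from some other device, such as a router reporting an unreachable host, is not counted as a reply from the cluster.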

MAP 4450: ESS Cluster to Customer Network Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 cluster ethernet connections to the customer LAN are made through the ESSNet ethernet hub or switch. All the TCP/IP settings across the network, including the ethernet protocol (en0 or et0), must be compatible. Note: In this MAP, the term ESSNet console refers to either the ESSNet1 console or the Master Console.

Isolation
1. Verify that the following ESSNet ethernet hub indications are present:
a. Power LED is on.
b. Error indicator LEDs are off.
Reference the ethernet hub documentation. Are the hub indicators as listed above?
v Yes, continue with the next step.
v No, use the ESSNet ethernet hub documentation to correct the problem.


2. Observe the ESSNet ethernet hub port indicators for the ports connected to cluster 1 and cluster 2. The indicator is:
v Off, if the hub port cannot detect the cluster.
v On, if the hub port can detect the cluster.
v Blinking, if the hub port is passing data to/from the cluster.
Is the hub port indicator for the cluster On/Blinking?
v Yes, continue with the next step.
v No, go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377.
3. Observe the ESSNet ethernet hub port indicator for the port connected to the ESSNet console. The indicator is:
v Off, if the hub port cannot detect the ESSNet console.
v On, if the hub port can detect the ESSNet console.
v Blinking, if the hub port is passing data to/from the ESSNet console.
Is the hub port indicator On/Blinking?
v Yes, continue with the next step.
v No, go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405.
4. Observe the ESSNet ethernet hub port indicator for the port connected to the customer LAN. The indicator is:
v Off, if the hub port cannot detect the customer LAN connection.
v On, if the hub port can detect the customer LAN connection.
v Blinking, if the hub port is passing data to/from the customer LAN connection.
Is the hub port indicator On/Blinking?
v Yes, continue with the next step.
v No, go to step 11 on page 410.
5. Verify that the TCP/IP minimum configuration and startup fields are set correctly by comparing them to the customer-provided worksheet and TCP/IP addresses. Verify that the correct TCP/IP protocol (network interface), en0 or et0, is selected. The entire network must use the same protocol. Use the following service terminal option while connected to the failing cluster: From the service terminal Main Service Menu, select: Configuration Options Menu, Configure Communications Resources Menu, Change / Show TCP/IP Configuration, Minimum Configuration & Startup. Are the fields set correctly?
v Yes, continue with the next step.
v No, correct the fields and retest the communications.
6. Use the following service terminal option while connected to the failing cluster:



From the service terminal Main Service Menu, select: Machine Test Menu, External Connections Menu, LAN Test. Enter the IP address of the ESSNet console. Was the ping test successful?
v Yes, continue with the next step.
v No, ping the address of the alternate cluster. If that ping is successful, go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405. If it is unsuccessful, go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377.
7. Is a customer gateway configured?
v Yes, enter a ping command to the customer gateway TCP/IP address. If the ping is successful, continue with the next step. If the ping is unsuccessful, work with the customer to resolve the problem (the gateway is unavailable or the TCP/IP address is incorrect).
v No, continue with the next step.
8. Are one or more customer Nameservers configured?
Notes:
a. If the customer uses the Copy Services Command Line Interface (CLI) feature, a Nameserver must be specified; alternatively, all CLI hosts must be defined in the cluster /etc/hosts file. See Configure Copy Services in Chapter 5 of Volume 2.
b. To check for multiple Nameservers, see Changing TCP/IP Configuration (Single Cluster Version) in Chapter 6 of Volume 2.
v Yes, enter a ping command to each of the configured Nameserver addresses. If the pings are successful, continue with the next step. If a ping is unsuccessful, work with the customer to resolve the problem (the Nameserver or Nameservers are unavailable or the TCP/IP address is incorrect).
v No, if the customer is using Copy Services CLI, see Note a above. Continue with the next step.
9. Enter a ping command to the TCP/IP address of the customer host that is experiencing problems. Was each ping test successful?
v Yes, continue with the next step.
v No, there appears to be a customer network problem external to the ESSNet.
10. All basic connectivity tests were successful. Retry the failing operation to verify whether the original problem still exists. Does the original problem still occur?
v Yes, take one of the following actions based on the failure symptoms:
  Email notification problems, go to MAP 1310: Isolating E-Mail Notification Problems on page 58.
  SNMP problems, go to MAP 1305: Isolating SNMP Notification Problems on page 56.



  ESS Specialist cannot access cluster, go to MAP 5000: ESS Specialist Cannot Access Cluster on page 540.
  Other symptoms, call your next level of support.
v No, go to the Repair Menu, End of Call Status.
11. Verify that the ethernet cable from the customer LAN to the ethernet hub is properly connected. Is the cable connected at both ends?
v Yes, continue with the next step.
v No, connect the cable and retry the test.
12. Have the customer verify that their ethernet hub is on and has no check conditions for the hub or for the port that is connected to the ESSNet ethernet hub. Have the customer reset the hub if possible. Is the customer ethernet hub on and error free?
v Yes, continue with the next step.
v No, have the customer correct the problem and then retest.
13. At the ESSNet ethernet hub, unplug the customer ethernet cable and plug it into a known good port. Is the hub port indicator On/Blinking?
v Yes, the original hub port was not working. Use the hub documentation to correct the problem. The hub may need to be reset or replaced.
v No, reconnect the cable to its original port. Go to the next step.
14. At the customer ethernet hub, have the customer unplug the customer ethernet cable and plug it into a known good port. Is the port indicator on both ethernet hubs for this cable On/Blinking?
v Yes, the original customer hub port was not working. Have the customer correct the problem. The hub may need to be reset or replaced.
v No, have the customer reconnect the cable to its original port. Continue with the next step.
15. Have the customer test or replace the ethernet cable. Verify that the cable is the proper type for the port speed and distance. Is the port indicator on both ethernet hubs for this cable On/Blinking?
v Yes, the connection is now working. Retest the communication.
v No, call the next level of support.
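Steps 2 through 10 check the path from the inside out. They start with the hub port indicators, then ping the ESSNet console, the gateway, the Nameservers, and finally the customer host, and they stop at the first failure. The minimal Python sketch below shows that ordering. The check results are plain booleans that you record while working the MAP, and the function and parameter names are hypothetical. The returned strings only paraphrase the branches above. Step 1 (hub power and error LEDs) is assumed to have passed already.

```python
from typing import Optional, Sequence

def next_action(cluster_ports_lit: bool,
                console_port_lit: bool,
                lan_port_lit: bool,
                tcpip_fields_ok: bool,
                console_ping_ok: bool,
                gateway_ping_ok: Optional[bool],
                nameserver_pings: Sequence[bool],
                host_ping_ok: bool) -> str:
    """Return the MAP 4450 branch taken, checking in step order (steps 2-10).

    "lit" means the hub port indicator is On or Blinking.
    gateway_ping_ok is None when no customer gateway is configured;
    nameserver_pings is empty when no Nameserver is configured.
    """
    if not cluster_ports_lit:              # step 2
        return "MAP 4390"
    if not console_port_lit:               # step 3
        return "MAP 4440"
    if not lan_port_lit:                   # step 4
        return "step 11"
    if not tcpip_fields_ok:                # step 5
        return "correct TCP/IP fields and retest"
    if not console_ping_ok:                # step 6
        return "ping alternate cluster"
    if gateway_ping_ok is False:           # step 7
        return "customer: gateway problem"
    if not all(nameserver_pings):          # step 8 (all([]) is True)
        return "customer: Nameserver problem"
    if not host_ping_ok:                   # step 9
        return "customer network problem outside ESSNet"
    return "step 10: retry the failing operation"
```

Ordering the checks this way means that a failure found later in the sequence is reached only after everything between the cluster and that point has already been shown to work.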

MAP 4460: Cluster NVS Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The FRUs listed in the problem details did not repair the problem, so additional NVS-related FRUs must be replaced.

Isolation
1. Observe the FRUs listed in the problem details display. Select the condition that applies:
v Only NVS/IOA cards are listed, go to step 2 on page 411.
v Only the NVS battery charger card and/or the NVS battery assembly are listed, go to step 3 on page 411.



v A combination of NVS/IOA cards, NVS battery charger card, and NVS battery assembly are listed, go to step 4.
2. The problem is not directly related to NVS power. If replacing both NVS/IOA cards does not repair the problem, replace the remaining FRUs in the order listed:
v I/O drawer planar assembly (connects the two NVS/IOA cards together through the rear planar connectors).
v NVS battery charger card.
Note: Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
3. If replacing the NVS battery charger card and NVS battery assembly does not repair the problem, replace the remaining FRUs in the order listed:
v I/O drawer planar assembly (connects the battery charger card to the NVS/IOA cards through the rear planar connectors).
v NVS/IOA cards (may be reporting a false NVS power error).
Note: Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
4. The problem could not be isolated to NVS power or logic, so both types of FRUs were listed in the problem details.
v If replacing the listed FRUs does not repair the problem, replace the I/O drawer planar assembly.
v If the problem is still not repaired, call the next level of support.
Note: Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
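In steps 1 through 4, the escalation order depends only on which FRU types appear in the problem details. The minimal Python sketch below expresses that lookup. The FRU labels are hypothetical strings, and the function returns the additional FRUs to try, in order, after the listed FRUs have failed to fix the problem. If these also fail, the MAP directs you to the next level of support.

```python
NVS_IOA = "NVS/IOA card"
CHARGER = "NVS battery charger card"
BATTERY = "NVS battery assembly"
PLANAR = "I/O drawer planar assembly"
POWER_FRUS = {CHARGER, BATTERY}

def next_nvs_frus(listed):
    """FRUs to replace, in order, after the listed FRUs did not fix the problem."""
    listed = set(listed)
    if listed == {NVS_IOA}:                        # step 2: NVS logic only
        return [PLANAR, CHARGER]
    if listed and listed <= POWER_FRUS:            # step 3: NVS power only
        return [PLANAR, NVS_IOA]
    if NVS_IOA in listed and listed & POWER_FRUS:  # step 4: both types listed
        return [PLANAR]
    return []   # not covered by this MAP: call the next level of support
```

In every case the I/O drawer planar comes first among the additional FRUs, because it is the part common to the NVS logic and NVS power paths.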

MAP 4470: ESC 2768, NVS/IOA Card Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An I/O slot failure has been detected.

Isolation
1. Replace the FRU listed in the problem details. If the FRU does not repair the problem, call the next level of support. Note: The next level of support will need to get the PE package and statesaves for engineering assistance.

MAP 4480: Cluster to RPC Cards Communication Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is used when the FRU list for a problem contains both RPC cards and the I/O drawer planar assembly. Each RPC card communicates with the service processor function on the I/O drawer planar assembly.
v Each RPC card has a separate status register for each cluster that can be read.


v The path from the cluster code to the RPC registers is:
  Cluster code
  Service processor on the I/O drawer planar assembly
  RJ45 card on the I/O drawer planar assembly
  Cable from I/O drawer planar assembly connector J14 to RJ45 card connector 1
  Cables from RJ45 card to the RPC 1 and RPC 2 cards
  RPC 1 card and RPC 2 card
v The clusters compare the status they receive from the RPC cards. If the status is not the same, the error recovery code creates a problem and fences (removes from use) a cluster or an RPC card. The fenced resource is the most likely cause of the problem.
v There are five basic types of error conditions, listed in Table 67, which also shows the fencing action for each type. The fenced resource normally contains the FRU with the highest probability of fixing the error condition. Replace it first.
Table 67. Conditions for Fencing
Condition | Fences a Cluster | Fences an RPC Card
Only one cluster reads bad status from both RPC cards. The other cluster reads good information. | Yes | No
Only one cluster reads bad status from one RPC card. The other cluster reads good information from the same card. | No | Yes
An RPC card presents invalid status to one or both clusters. | No | Yes
A cluster cannot read the status from one RPC card. | No | Yes
A cluster cannot read the status from both RPC cards. See Note 1. | No | Yes
Notes:
1. If a cluster cannot talk to both RPC cards, it fences the first one and creates a problem (either ESC=8314 or 8315). It cannot fence the remaining RPC card, so it reboots the cluster to try to recover. If it still cannot talk to the RPC card, that cluster is left fenced. The failing FRUs would be those common to both RPC cards: the I/O drawer planar in the failing cluster or the I/O drawer to RPC cards communication Y cable.

v When replacing a cluster FRU, the communication to both RPC cards is tested only if neither RPC card is fenced. If an RPC card is fenced, it must be quiesced and then resumed to test the communication from the cluster.
v When replacing an RPC card, the cluster to cluster comparison of the RPC status occurs only if neither cluster is fenced or quiesced. If a cluster is fenced or quiesced, it must be resumed to run the cluster to cluster RPC status comparison.
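Table 67 can be read as a lookup from the status that each cluster obtains from each RPC card to the resource that gets fenced. The minimal Python sketch below makes that lookup explicit. The status values ("good", "bad", "invalid", "unreadable") and the return convention are hypothetical simplifications of the table rows, and the sketch is not the ESS error recovery code. For the Note 1 case, it assumes that RPC 1 is the "first" card fenced. The follow-up reboot, and the cluster fence if the condition persists, are shown only as a comment.

```python
def fence_decision(status):
    """status[cluster][rpc] is 'good', 'bad', 'invalid' or 'unreadable'.

    Returns ('cluster', n), ('rpc', n) or None, following Table 67.
    """
    clusters, rpcs = (1, 2), (1, 2)
    # Row 5 / Note 1: a cluster cannot read either RPC card. The first card
    # is fenced (assumed RPC 1); the cluster then reboots and is left fenced
    # if it still cannot read the remaining card.
    for c in clusters:
        if all(status[c][r] == "unreadable" for r in rpcs):
            return ("rpc", 1)
    # Row 3: an RPC card presents invalid status to one or both clusters.
    for r in rpcs:
        if any(status[c][r] == "invalid" for c in clusters):
            return ("rpc", r)
    # Row 4: a cluster cannot read the status from one RPC card.
    for c in clusters:
        for r in rpcs:
            if status[c][r] == "unreadable":
                return ("rpc", r)
    # Rows 1 and 2: one cluster reads bad status, the other reads good.
    for c in clusters:
        other = 2 if c == 1 else 1
        bad = [r for r in rpcs if status[c][r] == "bad"]
        if len(bad) == 2 and all(status[other][r] == "good" for r in rpcs):
            return ("cluster", c)    # row 1: this cluster is the odd one out
        if len(bad) == 1 and status[other][bad[0]] == "good":
            return ("rpc", bad[0])   # row 2: fence that RPC card
    return None                      # statuses agree: nothing fenced
```

The logic reflects the table's reasoning. If one cluster disagrees with the other about both cards, the cluster is the common factor. If the disagreement involves only one card, that card is the common factor.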


Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Is the problem ESC=8314 or 8315?
v No, go to step 3.
v Yes, go to step 2.
2. See Note 1 in Table 67 on page 412. The most likely failing FRUs are the I/O drawer planar in the failing cluster and the RPC card to I/O drawer planar communication cables, which should be checked. Reference Volume 3, Chapter 10, 2105 Model 800 Cluster Power Control Diagram. The cable at I/O drawer connector R1 is common to both RPC cards. If one of these is the cause, replace one or both FRUs. If the FRUs cannot be selected in the problem, use the Repair Menu, Replace a FRU option. After the problem is repaired, close the problem and then use the Repair Menu, End of Call Status option.
3. If the FRUs listed in the problem do not fix the problem, use this list of all possible FRUs:
v I/O drawer planar assembly
v RJ45 card (on the front of the I/O drawer planar assembly)
v External cable from I/O drawer planar assembly connector J14 to RJ45 card connector 1
v Cables from RJ45 card to the RPC 1 and RPC 2 cards
v RPC 1 card and RPC 2 card
4. Display the problem details that sent you here and write down the timestamp value in the last occurrence field. After the FRU replacement, you will display this field again. If the value has been updated, the same failure is still occurring and additional FRUs will need to be replaced.
5. The FRU list contains both RPC cards and one or more cluster FRUs:
v To replace a cluster FRU, go to step 6.
v To replace an RPC card FRU, go to step 12 on page 414.
6. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the cluster FRU. Return here after the cluster FRU replacement is completed and the cluster has come ready.
7. Display the problems to determine if a problem is still occurring.
v If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs, or call the next level of support if all FRUs have been replaced.
v If a new related problem was created, repair that problem now. After that repair is complete, return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)
v If the original problem was not updated and there is no new related problem, continue with the next step.
8. Quiesce and then Resume RPC-1. This ensures that both clusters read the status register from the RPC-1 card. From the service terminal Main Service Menu, select:



Utility Menu, Resource Management Menu, Quiesce a Resource. Select the Rack Power Control Card to quiesce. Use the Resume a Resource option to resume that RPC card.
9. Display the problems to determine if a problem is still occurring.
v If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs. If all FRUs have been replaced, call the next level of support.
v If a new related problem was created, repair that problem now. After that repair is complete, return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)
v If the original problem was not updated and there is no new related problem, continue with the next step.
10. Quiesce and then Resume RPC-2. This ensures that both clusters read the status register from the RPC-2 card. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource. Select the Rack Power Control Card to quiesce. Use the Resume a Resource option to resume that RPC card.
11. Display the problems to determine if a problem is still occurring.
v If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs.
  Note: If all FRUs listed in the problem have been replaced, the additional FRUs listed in the description section of this MAP need to be replaced. To replace the cables, use the Replace a FRU option for the RPC card that the cable is connected to. If it still fails, call the next level of support.
v If a new related problem was created, repair that problem now. After that repair is complete, return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)
v If the original problem was not updated and there is no new related problem, go to MAP 1500: Ending a Service Action on page 67.
12. Replace the RPC card. Use the service terminal Replace A FRU option to replace the RPC card. Then return here and continue with the next step.
13. Display the problems to determine if a problem is still occurring.
v If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs. If all FRUs have been replaced, call the next level of support.
v If a new related problem was created, repair that problem now. After that repair is complete, return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)



v If the original problem was not updated and there is no new related problem, continue with the next step.
14. Determine if a cluster is fenced. From the service terminal Main Service Menu, select: Utilities Menu, Resource Management Menu, Show Fenced Resources.
v If a cluster is fenced, continue with the next step.
v If no cluster is fenced, go to MAP 1500: Ending a Service Action on page 67.
15. Quiesce the fenced cluster using the Alternate Cluster Repair menu options. Connect the service terminal to the cluster that is not fenced. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair, Quiesce the Alternate Cluster. Then resume the cluster using the Alternate Cluster Repair menu options. The resume causes the cluster to load code as if it were being powered on. It then does a fail-back of the resources from the other cluster.
16. Display the problems to determine if a problem is still occurring.
v If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs.
  Note: If all FRUs listed in the problem have been replaced, the additional FRUs listed in the description section of this MAP need to be replaced. To replace the cables, use the Replace a FRU option for the RPC card that the cable is connected to. If it still fails, call the next level of support.
v If a new related problem was created, repair that problem now. After that repair is complete, return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)
v If the original problem was not updated and there is no new related problem, go to MAP 1500: Ending a Service Action on page 67.
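Every "Display the problems" step in this MAP applies the same three-way test. It compares the last occurrence timestamp written down in step 4 with the value displayed now, then checks for a new related problem. The minimal Python sketch below captures that decision. The service terminal's timestamp format is not shown in this guide, so the sketch assumes both values have already been parsed into `datetime` objects. The function name is hypothetical.

```python
from datetime import datetime

def check_after_repair(recorded: datetime, displayed: datetime,
                       new_related_problem: bool) -> str:
    """Apply the 'display the problems' check used after each repair step.

    recorded: last occurrence timestamp written down in step 4.
    displayed: last occurrence timestamp shown now for the same problem.
    """
    if displayed != recorded:      # timestamp updated: the failure recurred
        return "still occurring: replace remaining FRUs or call next level"
    if new_related_problem:
        return "repair new related problem, then return if still occurring"
    return "continue"
```

The updated-timestamp test comes first because it is the only direct evidence that the original failure happened again after the repair.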

MAP 4510: Isolating a Cluster to Cluster CPI Communication Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is used for a cluster to cluster CPI communication timeout. After AIX is loaded, and while the functional code loads, the clusters communicate across the CPI interfaces (cluster 1 I/O Attachment Card to the Host Bay Planar Card to the cluster 2 I/O Attachment Card). There are four CPI interfaces that may be used. Once the cluster code is loaded, each cluster periodically sends a communication message (heartbeat) to the other cluster and sets a timer while waiting for the response. If the timer expires with no response, the error recovery process causes the non-responding cluster to fail over its resources to the originating cluster. The non-responding cluster is then fenced (which removes customer use of that cluster).
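The heartbeat described above is a send-then-wait-with-timeout exchange. The minimal Python sketch below models one heartbeat cycle, with a queue standing in for the other cluster's reply and the `get` timeout acting as the timer. The `send` callable and the queue are hypothetical stand-ins for the CPI messaging inside the cluster code. Failover and fencing appear only as the returned action.

```python
import queue
from typing import Callable

def heartbeat(send: Callable[[], None], replies: "queue.Queue[str]",
              timeout_s: float) -> str:
    """One heartbeat cycle, as seen from the originating cluster."""
    send()                                # heartbeat message to the other cluster
    try:
        replies.get(timeout=timeout_s)    # the timer: wait for a reply or expire
        return "peer responding"
    except queue.Empty:
        # Timer expired: the non-responding cluster's resources fail over
        # to this cluster, and the non-responding cluster is fenced.
        return "fail over and fence peer"

# Simulated peers: one answers at once, one never answers.
q = queue.Queue()
print(heartbeat(lambda: q.put("ack"), q, timeout_s=0.5))       # prints peer responding
print(heartbeat(lambda: None, queue.Queue(), timeout_s=0.05))  # prints fail over and fence peer
```

Because the decision rests only on whether a reply arrives within the timer, a slow cluster and a dead cluster look the same to the originating cluster. This is why a temporary condition can still leave a problem behind.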


The originating cluster attempts to power cycle the non-responding cluster to reload its code and recover it for customer use. A timer is set while waiting for the code load and failback to complete.
v If the non-responding cluster hangs loading the code, this becomes a cluster boot or cluster down problem. It causes the working cluster to have a communication timeout, and the working cluster creates a problem with MAP 4510 for isolation. The code load process normally leaves an error or progress code displayed on the CEC drawer operator panel.
v The 2105 Model 800 code begins cluster to cluster communication testing (heartbeats) during the code loading. It checks all four CPI paths. If any path fails, the cluster is power cycled up to two times to reload the code and attempt to clear the condition. If the communication timeout is still present, the failing CPI path is fenced. If all four CPI paths are failing, the cluster is fenced.
v If the cluster successfully loads the code, the error recovery process attempts to fail back the resources to their original cluster. If the failback is not successful, this creates a communication timeout, which creates a problem with MAP 4510 for isolation.
v If the failback is successful, the error recovery timer is reset and a communication timeout does not occur. However, the cluster that had the original communication problem may still have created a problem, even if the condition was temporary, the cluster recovered, and the cluster Ready indicator on the 2105 Model 800 operator panel is on.
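The CPI path testing in the second bullet follows a retry-then-fence rule. The cluster is power cycled up to two times, then each path that still fails is fenced, or the whole cluster is fenced if all four paths fail. The minimal Python sketch below illustrates that rule. It assumes a hypothetical `test_paths` callable that returns the set of failing path numbers after each code load, and a hypothetical `power_cycle` callable. It is not the 2105 microcode.

```python
from typing import Callable, Set, Tuple

CPI_PATHS = {1, 2, 3, 4}
MAX_POWER_CYCLES = 2

def cpi_recovery(test_paths: Callable[[], Set[int]],
                 power_cycle: Callable[[], None]) -> Tuple[str, Set[int]]:
    """Return ('ok', set()), ('fence paths', failing) or ('fence cluster', failing)."""
    failing = test_paths()
    cycles = 0
    while failing and cycles < MAX_POWER_CYCLES:
        power_cycle()             # reload the code and retest all paths
        cycles += 1
        failing = test_paths()
    if not failing:
        return ("ok", set())
    if failing == CPI_PATHS:      # no path left: the cluster itself is fenced
        return ("fence cluster", failing)
    return ("fence paths", failing)
```

Fencing individual paths lets the cluster keep running on its remaining CPI paths. Only the loss of all four paths takes the cluster away from the customer.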

Isolation
Use the following steps to continue this repair action.
1. Ensure that the problem is still displayed on the service terminal. Note the following:
v Failing Cluster should be the other cluster (not the one the service terminal is connected to).
v Reporting Cluster should be the cluster you are connected to.
v Ignore the information in the Failure Actions, Probable Cause, Failure Cause, and User Actions fields. This information is used only by engineering and the next level of support.
2. Observe the cluster Ready indicator LED for the failing cluster on the 2105 Model 800 operator panel. Is the Ready indicator LED on?
v Yes, the cluster has successfully completed the power on error recovery. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and repair any related problems. Then go to MAP 1500: Ending a Service Action on page 67.
v No, the cluster did not successfully complete the power on error recovery. Continue with the next step.
3. Observe the CEC drawer operator panel. Is the cluster hung displaying a code on the operator panel?
v Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371 and use the codes displayed on the CEC drawer operator panel.
v No, display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and repair any related problems. If there are none, call the next level of support.


VOLUME 1, TotalStorage ESS Service Guide


MAP 4520: Pinned Data and/or Volume Status Unknown


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
Pinned Data can exist for DASD Fast Write, High Bandwidth Sequential Fast Write, and Cache Fast Write data. Pinned Data is caused by failures that prevent data from being destaged to DASD. These are either DASD failures that make the array/volume unavailable, or failures that make cache and/or NVS data unavailable. Pinned Data can be freed (un-pinned) only by a successful retry of the destage operation, or by a request to discard the pinned data received from the host or the service interface.

Isolation
1. Use this step to collect the needed information, and then call the next level of support. Do not perform any repair unless directed by the next level of support. If repairs are performed in the wrong sequence, customer data loss can occur.
a. Determine all of the volumes with Pinned Data and/or Volume Status Unknown. From the service terminal Main Service Menu, select: Utilities Menu, Pinned Data Menu/Volume Status Unknown, Display Pinned Data
Note: The volumes displayed have retryable pinned data, non-retryable pinned data, or FC status (no global subsystem status). A volume can be listed with more than one pinned data status. Pinned data status can be caused by hardware problems, which also create problems needing repair. Retryable pinned data is normally caused by DASD or SSA interface problems. Non-retryable pinned data is normally caused by cluster problems. FC status can be caused by either of these problem types.
b. Display problems needing repair. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair
c. Continue with the next step.
2. Call your next level of support now. Have ready the information you gathered in the previous step. Your next level of support may need to log in remotely and perform additional problem analysis.
3. After reviewing all of the information, your next level of support may direct you to do the following steps. They may change the order of the repairs. Wait for them to guide you before continuing.
4. Are there any DASD or SSA interface related problems?
• Yes, repair the DASD or SSA interface problems. The repair may allow retryable pinned data to destage. (An SSA loop with only one DDM failure will not normally cause pinned data if the DDM is part of a RAID array.)
Problem Isolation Procedures, CHAPTER 3



• No, continue with the next step.
5. Are there any cluster related problems?
• Yes, repair the cluster problems. The repair may allow pinned data to destage so that the retryable pinned data status is reset. The repair process may require you to discard non-retryable pinned data before the FRUs are replaced. This causes customer data loss.
• No, continue with the next step.
6. After all related repairs have been completed, display the pinned data status. Do any volumes still have retryable or non-retryable pinned data?
• Yes, inform the next level of support.
• No, continue with the next step.
7. Do any volumes have FC status (no valid global subsystem status available)?
• Yes, go to MAP 4560: No Valid Subsystem Status Available on page 427.
• No, go to the Repair Menu, End of Call Status option.
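The decision in steps 6 and 7, together with the typical causes listed in the step 1 note, can be summarized as a lookup. This is only a sketch: the status strings, the `LIKELY_CAUSE` table, and the `next_action` function are hypothetical names chosen for illustration, and the real decision is always made with the next level of support.

```python
# Hypothetical sketch of the MAP 4520 end-of-procedure decision.
# Status strings and names are illustrative assumptions.
from typing import Dict, Set

# Typical causes, per the note in step 1.
LIKELY_CAUSE = {
    "retryable": "DASD or SSA interface problem",
    "non-retryable": "cluster problem",
    "FC": "DASD/SSA interface or cluster problem",
}


def next_action(volume_status: Dict[str, Set[str]]) -> str:
    """volume_status maps a volume ID to its set of pinned data statuses."""
    statuses = set().union(*volume_status.values())
    if statuses & {"retryable", "non-retryable"}:
        return "Inform the next level of support"        # step 6, Yes
    if "FC" in statuses:
        return "Go to MAP 4560: No Valid Subsystem Status Available"  # step 7, Yes
    return "Repair Menu, End of Call Status"             # step 7, No
```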

MAP 4540: Cluster Minimum Configuration


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is used to locate defective FRUs, not found by normal diagnostics, that hang the cluster and prevent it from loading code. The problem may be in the CEC drawer or the I/O drawer. Use the following figures to locate the CEC and I/O drawer bulkhead connectors:
[Figure: Front View; connectors Q1 (V/S Comm), Q2 (RIO-1), Q3 (RIO-0), Q4 (JTAG), Q7 and Q8 (SCSI Signal and SCSI Power); Fan 7 and Fan 8]

Figure 133. CEC Drawer Bulkhead Connector Locations (s009527)



[Figure: Media Power (CEC Drawer SCSI Devices); RIO 0; RIO 1; OP (CEC Drawer Operator Panel); P1, P2, P3; 10-100 (Ethernet); Q1 (V/S Comm); S1, S2, S3, S4; J11, J14, J15, J16; Q4; Q7 (CEC Drawer SCSI Signal); R1 (JTAG); No Use; Debug (not used)]

Figure 134. I/O Drawer Bulkhead Connector Locations (s009526)

MAP 4540 Section-1


This MAP section removes the cluster from customer use, displays the service processor error logs, and sets the cluster reboot value from 3 to 0.
1. Quiesce the failing cluster. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair Menu, Quiesce Alternate Cluster
2. Check the service processor (SP) error logs. The service processor may have recorded one or more symptoms in its error log.
Note: If the error condition does not allow this, continue with the next step. If the cluster will not power off, go to MAP 4730: Cluster Power Off Request Problem on page 446.
To access the service processor menus, connect the service terminal to the working cluster and power off the failing cluster using: Main Service Menu, Repair Menu, Alternate Cluster Repair Menu. When the failing cluster CEC drawer operator panel displays OK, connect the service terminal (CE most or laptop) to the failing cluster and log in. Press the Enter key to cause a keyboard interrupt and display the SP Main Service Menu. Use the SP: MAIN MENU, System Information Menu, Read Service Processor Error Logs.
3. Change the service processor reboot attempts setting from 3 to 0 using this step, then continue with MAP 4540 Section-2.
Notes:
a. If the error condition does not allow this, continue with MAP 4540 Section-2.
b. Remember to reset the reboot attempts back to 3 after the isolation is complete.
From the service processor MAIN MENU, select: System Power Control Menu, Reboot/Restart Power-On Menu, Number of reboot attempts

MAP 4540 Section-2


This MAP section sets the cluster to slow-mode boot and checks for a new error code.
1. Set the service processor to slow-mode boot. (A fast-mode boot skips much of the built-in diagnostic testing. A slow-mode boot may yield a new 8-character error code on the CEC drawer operator panel and new errors in the service processor error log.)
Note: To disable fast system boot, use the service processor: MAIN MENU, System Power Control Menu, Enable/Disable Fast System Boot.
2. Power on the cluster. (Connect the service terminal to the working cluster and use the Alternate Cluster Repair Menu options.)
3. Select the condition that applies:
• The original failing error code still occurs. Reset the cluster to fast-mode boot, and then go to MAP 4540 Section-3.
• A new error code occurs. If the new error code identifies the failing cluster FRU, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to repair the cluster. If the new error code does not identify the failing cluster FRU, go to MAP 4540 Section-3.
• The cluster powers on and IMLs with no error code. The problem is no longer occurring; exit this MAP and return to the procedure that sent you here.

MAP 4540 Section-3


This MAP section uses the error code to determine whether the problem is already isolated to the CEC drawer.
1. Use the following tables to locate, if possible, the error code that sent you to minimum configuration:



Table 68. Minimum Configuration Error Codes First Four Characters 4066 to 406F Second Four Characters 00A1 00A3 00A6 00A7 4506 to 450F 4606 to 460F 4B26 to 4B2F 244C 244D 244E 244F 24A1 24A2 24A3 24A4 25BD 25BE 25BF 25EA 25EB 25F2 263D 271A 271D 288D 2A00 B166 to B1FF 4601 4660 469E 469F

Locate, if possible, the checkpoint that sent you to this MAP in the following table:
Table 69. Minimum Configuration Checkpoint
Checkpoints: 91FF, 9380, 94B0, 94B1, 94B2, 94BB, 9501, 9502, 9503, 9504, 9505, 9506

Did you find the error code or checkpoint that sent you here in the above tables, or did the action that sent you to MAP 4540 direct you to run the CEC Drawer Minimum Configuration?
• Yes, go to MAP 4540 Section-4 on page 422.
• No, go to MAP 4540 Section-9 on page 424.
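The Section-3 decision is a set-membership check. The sketch below models only the checkpoints listed in Table 69; the error codes in Table 68 would be checked the same way. The names `MIN_CONFIG_CHECKPOINTS` and `section3_next` are hypothetical, chosen for illustration.

```python
# Minimal sketch of the MAP 4540 Section-3 decision using the Table 69
# checkpoints. Names are illustrative assumptions, not service code.
MIN_CONFIG_CHECKPOINTS = frozenset({
    "91FF", "9380", "94B0", "94B1", "94B2", "94BB",
    "9501", "9502", "9503", "9504", "9505", "9506",
})


def section3_next(checkpoint: str, directed_to_cec_min_config: bool = False) -> str:
    """Return the next MAP 4540 section for a checkpoint seen on the panel."""
    if checkpoint.upper() in MIN_CONFIG_CHECKPOINTS or directed_to_cec_min_config:
        return "Section-4"
    return "Section-9"
```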

MAP 4540 Section-4


This MAP section disconnects the RIO-0 cable, uses the RIO-1 cable in its place, and then tests the RIO-0 connections.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. At the CEC drawer, disconnect the RIO-0 cable at the Q3 port.
3. At the CEC drawer, disconnect the RIO-1 cable at the Q2 port and reconnect it to the Q3 port.
4. At the I/O drawer, disconnect the RIO-0 cable.
5. At the I/O drawer, disconnect the RIO-1 cable and reconnect it to the RIO-0 connector.

[Figure: CEC drawer and I/O drawer, each with V/S Comm, RIO 0, RIO 1, and JTAG connectors]

Figure 135. CEC Drawer and I/O Drawer Communication (s009721)

6. With the RIO-1 cable connected between the CEC drawer RIO-0 port and the I/O drawer RIO-0 port, power on the cluster. Does the same error code still occur?
• Yes, reconnect both RIO cables to their original connectors on both drawers, then go to MAP 4540 Section-5.
• No, the original RIO-0 cable you disconnected in steps 2 and 4 is defective. Replace the failing RIO cable. Reconnect the RIO-1 cable to its original connectors on both drawers, then go to MAP 4540 Section-12 on page 426.

MAP 4540 Section-5


This MAP section removes all but one quad of memory DIMMs on each memory riser card and then checks if the same error code still occurs.



1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. On the memory riser card in slot Tx-U1.1-P1-M1, record the DIMM locations and remove all the memory DIMMs except the ones in slots 1, 2, 15, and 16 (memory quad A).
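Table 70. Memory Quad DIMMs
Memory Quad A: memory riser card DIMM slots 1, 2, 15, 16
Memory Quad B: memory riser card DIMM slots 3, 4, 13, 14
Memory Quad C: memory riser card DIMM slots 5, 6, 11, 12
Memory Quad D: memory riser card DIMM slots 7, 8, 9, 10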

[Figure: DIMM INSTALLATION; memory riser card slots 1 to 16, each marked with its memory quad (A to D) as listed in Table 70]

Figure 136. CEC Drawer, Memory Riser Card Memory DIMM Module Locations (s009241)

4. On the memory riser card in slot Tx-U1.1-P1-M2, record the DIMM locations and remove all the memory DIMMs except the ones in slots 1, 2, 15, and 16 (memory quad A).
5. With the CEC drawer now configured with only the minimum required memory, connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, go to MAP 4540 Section-6.
• No, go to MAP 4540 Section-8 on page 424.

MAP 4540 Section-6


This MAP section replaces the memory riser cards in slots M1 and M2, one at a time, checking whether the same error code still occurs.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Replace the memory riser card (Location: Tx-U1.1-P1-M1) with a new card.
4. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, go to MAP 4540 Section-7 on page 424.
• No, the last item replaced was defective; go to MAP 4540 Section-12 on page 426.

MAP 4540 Section-7


This MAP section replaces memory DIMM quad A on the riser card in slot M1, the memory riser card in slot M2, the processor card, and the CEC drawer planar assembly, and checks whether the error code still occurs.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Replace the items in the following list, ONE at a time:
a. Memory DIMMs (quad A on the memory riser card in slot M1), with new or previously removed memory DIMMs
b. Memory riser card (Location: Tx-U1.1-P1-M2)
c. Processor card (Location: Tx-U1.1-P1-C1)
d. CEC drawer planar assembly (Location: Tx-U1.1-P1)
4. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, continue with the next FRU in the list until all of the FRUs have been replaced, then go to MAP 4540 Section-9.
• No, the last item replaced was defective; go to MAP 4540 Section-12 on page 426.

MAP 4540 Section-8


This MAP section reinstalls the memory DIMM quads to find which one is causing the error code.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Reinsert one or more of the memory DIMM quads to isolate which one is causing the problem.
Note: All four memory DIMMs of a quad must be installed in the slots that make up that quad; otherwise, memory errors due to incorrect plugging will occur.
4. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, one or more of the DIMMs you just reinserted is defective. Isolate the failing memory DIMM quad by temporarily replacing its DIMMs with new or previously removed DIMMs. If this does not isolate a failing quad, the memory riser card is bad. Replace the failing FRU, then go to MAP 4540 Section-12 on page 426.
• No, repeat this step until all the memory DIMMs have been reinstalled, then go to MAP 4540 Section-7.
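The plugging rule in the note above (a quad is all four of its slots or none) can be checked against Table 70. This is a minimal sketch: `QUAD_SLOTS` transcribes Table 70, and `incomplete_quads` is a hypothetical helper, not part of any service tool.

```python
# Sketch: check that the DIMM slots populated on one memory riser card
# form complete quads, per Table 70. Helper names are hypothetical.
from typing import Iterable, List

QUAD_SLOTS = {
    "A": frozenset({1, 2, 15, 16}),
    "B": frozenset({3, 4, 13, 14}),
    "C": frozenset({5, 6, 11, 12}),
    "D": frozenset({7, 8, 9, 10}),
}


def incomplete_quads(installed: Iterable[int]) -> List[str]:
    """Return the quads that are only partly populated (a plugging error)."""
    slots = set(installed)
    return [
        quad for quad, quad_slots in QUAD_SLOTS.items()
        if slots & quad_slots and not quad_slots <= slots
    ]
```

For example, the minimum configuration from Section-5 (slots 1, 2, 15, and 16 only) returns an empty list, while adding only slots 3 and 4 flags quad B as incomplete.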

MAP 4540 Section-9


This MAP section unplugs the NVS/IOA and NVS battery charger cards and cables from the I/O drawer, then checks whether the same error code occurs.
Note: Removing all the cards in the PCI slots at the same time may damage the I/O planar assembly on power up.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both I/O drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
4. Connect the service terminal to the I/O drawer S1 serial port of the failing cluster.
5. Leave the remaining external cables connected (RIO-0 port, V/S Comm, RIO-1 port, and JTAG).
6. Disconnect the signal cable from the diskette drive in the I/O drawer.
7. Disconnect the SCSI signal cable from the CD-ROM drive in the CEC drawer.
8. Remove the following cards and label them so they can be reinstalled in their original positions:
• Label and remove the NVS battery charger cards in PCI slots I6 and I10.
• Label and remove the NVS/IOA cards in PCI slots I3, I4, I5, and I9.
9. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, go to MAP 4540 Section-10.
• No, go to MAP 4540 Section-11.

MAP 4540 Section-10


This MAP section reinstalls the adapter cards previously removed. It then unplugs the SSA cards from the I/O drawer PCI slots and checks whether the same error code occurs.
Note: Removing all the cards in the PCI slots at the same time may damage the I/O planar assembly on power up.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both I/O drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Switch off both CEC drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
4. Reinstall the NVS/IOA cards and NVS battery charger cards in their original PCI slots.
5. Remove the following cards and label them so they can be reinstalled in their original positions: label and remove the SSA cards in PCI slots I1, I2, I11, and I12.
6. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, go to MAP 4540 Section-12 on page 426.
• No, go to MAP 4540 Section-11.

MAP 4540 Section-11


The problem is with one of the adapter cards, devices, or cables that were removed or disconnected from the I/O drawer. (The CD-ROM drive is in the CEC drawer but receives its power and signals from the I/O drawer.)
1. Power off the cluster.
2. Reinstall or reconnect one or more of the FRUs that were just removed or disconnected, in their original locations.
3. Power on the cluster. Does the same error code still occur?
• Yes, one of the FRUs just installed is causing the error code. Repeat this MAP section, plugging and unplugging FRUs until the failing FRU is isolated. Replace the failing FRU, then go to MAP 4540 Section-12 on page 426.
• No, reinstall the next adapters, devices, or cables and return to the beginning of this step. Continue repeating this process until an adapter or device causes the same error code to occur.
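The reinstall loop in Section-11 is an incremental search: add FRUs back and stop at the first one that brings the error code back. The sketch below assumes the simplest variant, one FRU per power-on cycle; `find_failing_fru` and `power_on_shows_error` are hypothetical names, and the callback stands in for the manual "power on and observe the panel" step.

```python
# Sketch of the MAP 4540 Section-11 loop, assuming one FRU is reinstalled
# per power-on. Function and callback names are hypothetical.
from typing import Callable, List, Optional


def find_failing_fru(removed: List[str],
                     power_on_shows_error: Callable[[List[str]], bool]) -> Optional[str]:
    installed: List[str] = []
    for fru in removed:
        installed.append(fru)               # reinstall or reconnect one FRU
        if power_on_shows_error(installed):
            return fru                      # this FRU brought the error code back
    return None                             # the error code did not recur
```

Reinstalling several FRUs per cycle, as the step also allows, saves power cycles but leaves a group of suspects to split further.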

MAP 4540 Section-12


This MAP section replaces the remaining I/O drawer FRUs.
1. Power off the cluster using the Alternate Cluster Repair Menu options.
2. Switch off both I/O drawer power supplies (rear of drawer). Disconnect both power input cables to each power supply.
3. Replace the FRUs below, one at a time, until the error code no longer occurs:
a. I/O drawer planar assembly
b. RIO card assembly
c. I/O drawer power supplies
d. Fan controller card
4. Connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
• Yes, replace the next FRU in the list until the failing error code no longer occurs. If you have replaced all the FRUs and still have the failing error code, call the next level of support.
• No, go to MAP 4540 Section-12.

MAP 4540 Section-12


This MAP section returns the cluster to customer use.
1. Reinstall and reconnect all the remaining FRUs and cables.
2. Reset the service processor reboot attempts value back to 3 (see step 3 on page 420).
3. Power on the cluster and ensure that it no longer gets the failing error code.
Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to check whether the replaced FRU needs any additional action, such as a firmware update or diagnostics, before resuming the cluster. Complete the service action from the service terminal Main Service Menu; select: Repair Menu, Close a Previously Repaired Problem, End of Call Status.

MAP 4550: NVS FRU Replacement


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
When a problem calls this MAP, the NVS FRU or FRUs must be replaced as described below.

Isolation
1. A problem with the NVS/IOA cards sent you here. If replacing the FRUs listed in the problem does not repair the NVS problem, replace the I/O drawer planar assembly. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432.
Note: The rear planar of the I/O drawer planar assembly connects the two NVS/IOA cards together through their rear bottom card connectors.

MAP 4560: No Valid Subsystem Status Available


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
Global subsystem status (GSS) exists for each logical subsystem (LSS). Two copies are kept, each on a separate array. If one copy becomes unavailable, a problem is created and, if possible, a new second copy is created on a different array. The copy stays in this new location even after the repair is complete. Normally, when a volume is unavailable, the array it is located on has a status of offline or unknown.

An LSS can operate with just one GSS copy. If both GSS copies are unavailable, the LSS returns FC status to all ESCON host system requests to its volumes, and returns command rejects and check conditions of internal target failure to all SCSI host system requests to its volumes.

There can be one or more problems for each GSS copy that is unavailable. It normally takes two or more failures to prevent the fault-tolerant RAID architecture from accessing a particular array (rank). If access to the GSS copies was lost but the data is still valid, the repair action should restore access; this automatically resets the No Valid Subsystem Status condition. If the actual GSS data was lost from both copies, the GSS for that LSS must be reset when the next level of support determines it is necessary. This can cause customer data loss.

No single problem identifies the various combinations of failures that created this condition. Each GSS copy has at least one problem needing repair, and there may also be other, unrelated problems needing repair; for example, a problem for a DDM replacement on an array and SSA loop that are not part of the affected LSS. The isolation procedure below therefore helps you determine the highest priority problem to repair first.

Isolation
1. Read the Description section above before proceeding with this isolation procedure.
2. Call your next level of support before going to the next step.
3. Display the pinned data status. From the service terminal Main Service Menu, select: Utilities Menu, Pinned Data Menu, Display Pinned Data
A volume is displayed only if it has pinned data status. The LUA/LSS and SSID are shown for each volume displayed. The display groups volumes having retryable pinned data, non-retryable pinned data, and FC status (no global subsystem status).
• If a volume has FC status, go to the next step.
• If a volume has retryable or non-retryable pinned data, go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417.
4. Display the status of all arrays (ranks). From the service terminal Main Service Menu, select: Utilities Menu, Display Physical and Logical Configuration, List all Ranks
An array with a status of offline or unknown may include one or both GSS volumes. Record any arrays with this status, then go to the next step.
5. Determine the SSA loops and DDM bay locations that the offline or unknown arrays are part of. From the service terminal Main Service Menu, select: Utilities Menu, Display Physical and Logical Configuration, List Physical Disks in a Rank
At the Select A Rank Name display, find the rank (array) noted in the prior step. Record the drawer and location fields for that rank. A rank can exist in more than one drawer and may appear more than once in the list. Determine the loop name (color) by observing the SSA cables connected to the DDM bay at the physical location noted.
6. Display problems needing repair. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair
Display the problem details for each problem. Note the physical location code and/or SSA loop identified. Record the problem ID for any problem related to an array that is offline or unknown. Go to the next step.
7. If an array has more than one problem related to it, use the following priorities:
a. First, repair a problem that includes an SSA card or SSA cable as a FRU, or an isolation procedure for these FRUs.
b. Next, repair a problem that has one of these SRNs:
46000 (more than one DDM not available)
48900 (more than one DDM failed)
48950 (array build failed)
c. Repair any remaining related problems.
8. After each repair is complete, display the pinned data status. Restoring just one of the two GSS copies clears the No Valid Subsystem Status Available condition.
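The repair priority in step 7 amounts to a stable sort of the problems related to one array. This is a hypothetical sketch: the `Problem` fields are assumptions about what you read from the problem details, the substring test for SSA FRUs is a simplification, and none of these names correspond to a real 2105 interface.

```python
# Hypothetical sketch of the MAP 4560 step 7 repair priority.
# Problem fields are illustrative assumptions, not a 2105 interface.
from dataclasses import dataclass
from typing import List

MULTI_DDM_SRNS = {"46000", "48900", "48950"}


@dataclass
class Problem:
    problem_id: str
    srn: str
    frus: List[str]  # FRU or isolation-procedure names from the problem details


def priority(p: Problem) -> int:
    if any("SSA card" in f or "SSA cable" in f for f in p.frus):
        return 0  # a. SSA card or SSA cable problems first
    if p.srn in MULTI_DDM_SRNS:
        return 1  # b. multiple DDMs unavailable or failed, or array build failed
    return 2      # c. any remaining related problems


def repair_order(problems: List[Problem]) -> List[str]:
    # sorted() is stable, so problems of equal priority keep their listed order.
    return [p.problem_id for p in sorted(problems, key=priority)]
```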

MAP 45A0: Pinned Data, Special Case


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
There is a pinned data condition that was detected by one of the following:
• A 2105 power off, in local or remote mode
• A cluster resume, as part of a service action
• A cluster recovery, in which the other cluster rebooted this cluster
• A failback to service or a failover to service (specific code actions)
The pinned data may be retryable or non-retryable. Any of the above conditions creates a problem that calls this MAP, with ESC = 38E7.

During a 2105 power off, this condition prevents the 2105 from powering off. The clusters are not available for customer use but should allow limited service login and PE login actions. No cluster fencing actions occur. The rack operator panel Line Cord indicator LEDs stop blinking, indicating that the power off process has been cancelled.

During a cluster resume, cluster recovery, failback to service, or failover to service, one cluster was controlling the other cluster while it was being rebooted. The controlling cluster has the pinned data and could not complete the return of the other cluster. The other cluster is left fenced, even though it may have no operational problem and no related problems.

Isolation
1. Read the description above.
2. Call your next level of support for specific guidance. Failure to do so may cause unnecessary customer data loss.
3. The next level of support may have you do the following:
• Create a product engineering login password so that they can make a remote connection and access information not available through the service login. From the service terminal Main Service Menu, select: Configuration Options Menu, Configure Communications Resources Menu, Call Home / Remote Services Menu, Enable Product Engineering Access.
• Use the red UEPO switch to power off the 2105. This causes an emergency dump of the data as part of the recovery.
• Power on the 2105.
• Go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417. This MAP also uses your next level of support to guide you in identifying the proper order of repair for the related problems that should have been created.

MAP 4600: Isolating a CD-ROM Test Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CD-ROM drive in one of the clusters is failing.

Isolation
Retry the failing operation with another CD-ROM disk of the same type.
Notes:
1. For test media, use the 2105 code/LIC CD-ROM instead of the CD-ROM test disk requested by the CD-ROM drive test.
2. Audio is not used by the 2105; do not run the audio test that uses the audio headset.
Is the CD-ROM drive still failing?
• Yes, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 and replace the CD-ROM drive.
• No, discard the failing CD-ROM disk and replace it with a new one of the same type.

MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster SP or system firmware is down-level. This can happen when a replacement I/O drawer planar assembly FRU has down-level firmware. On cluster power up, the down-level code is discovered and a problem is created. This occurs even before you have a chance to check and update the firmware per the FRU Replace table in MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. Before the firmware can be updated, all problems needing repair must be repaired or cancelled.

Isolation
1. Cancel the problem that sent you to this MAP. From the service terminal Main Service Menu, select: Utilities Menu, Problem Log Menu, Change a Problem State
2. Repair all problems needing repair before going to the next step. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair
3. Check and update to the latest level of LIC firmware for the I/O drawer planar assembly and SP. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Multiple LIC Menu, then:
• Concurrent or Nonconcurrent (select either a. Concurrent or b. Nonconcurrent)
• System Planar / Service Processor Menu

MAP 4620: Isolating a Diskette Drive Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The diskette drive in one of the clusters is failing.

Isolation
Retry the failing operation with a new diskette of the same type. Is the diskette drive still failing?
• Yes, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 and replace the diskette drive.
• No, discard the failing diskette and replace it with a new one of the same type.

MAP 4640: Cluster SP, SPCN, or System Firmware Reload


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster firmware needs to be reloaded; it may be corrupted.

Isolation
1. Select the condition that applies:
• The cluster hangs with an eight-digit system firmware error code displayed on the CEC drawer operator panel. Continue with the next step.
• The cluster comes Ready and there is a problem with a firmware error code that calls this MAP. Use MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level on page 430 to reload the firmware (even though the firmware may not be down-level).
2. The system firmware needs to be reloaded using the service processor (SP) menu options and the firmware diskettes. Do the following:
a. Connect the service terminal to the working cluster and use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to quiesce and power off the failing cluster.
b. Wait for the CEC drawer operator panel to display OK.
c. Connect the service terminal (CE most or laptop) to the S2 port of the cluster being serviced. Press the Enter key to cause a keyboard interrupt and display the SP Main Menu.
Note: The master console cannot be used to access the SP menus.
d. The current system firmware version is displayed. Locate the system firmware diskettes.
e. Select the Service Processor Setup Menu, Reprogram Flash EPROM Menu option. Follow the prompts. The following are loaded: System Power Control Network (SPCN), service processor, system firmware, and run-time abstraction services.
f. When the update completes, the service processor reboots to OK.
g. Connect the service terminal to the working cluster and use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to power on and resume the cluster.
h. When the repair is complete, go to MAP 1500: Ending a Service Action on page 67.

MAP 4670: Cluster Powered Off Unexpectedly


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster powered off unexpectedly.
Problem Isolation Procedures, CHAPTER 3


Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Observe the CEC drawer operator panel. Are any codes displayed?
   • Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   • No, continue with the next step.
2. Log in to the working cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Are there any problems related to the RPC cards, or to failing cluster (CEC or I/O drawer) power or cooling?
   • Yes, exit this MAP and repair the related problem.
   • No, continue with the next step.
3. Verify that the cluster is powered off. Observe the I/O drawer power LED indicator on the upper left of the CEC drawer operator panel. Find the condition that applies:
   • On solid, continue with the next step.
   • Blinking slowly, go to step 5.
   • Off, go to step 5.
4. Observe the CEC drawer power LED indicator on the front lower left of the drawer. Find the condition that applies:
   • On solid, both drawers of the cluster are powered on normally. This appears to be a false error condition. Exit this MAP and return to the procedure that sent you here. If there is a problem, cancel it or call the next level of support.
   • Blinking slowly, the CEC drawer did not power up. Go to MAP 4880: Cluster Power On Problem on page 461.
   • Off, the CEC drawer did not power up. Go to MAP 4880: Cluster Power On Problem on page 461.
5. Determine whether there is a cluster power on problem that may be related to the unexpected cluster power off. Log in to the working cluster and attempt to power on the failing cluster using the Alternate Cluster Repair Menu options. Does the cluster power on?
   • Yes, continue with the next step.
   • No, exit this MAP and go to MAP 4880: Cluster Power On Problem on page 461.
6. The failing FRU may be the I/O drawer planar assembly. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) to replace the FRU, or call the next level of support.
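The LED checks in steps 3 and 4 amount to a small decision table. As an illustration only, the following Python sketch models that table; the `Led` enum, the function name, and the returned action strings are hypothetical and are not part of any 2105 service tool.

```python
from enum import Enum


class Led(Enum):
    # Hypothetical: the three LED conditions that MAP 4670 distinguishes.
    ON_SOLID = "on solid"
    BLINKING_SLOWLY = "blinking slowly"
    OFF = "off"


def map4670_next_action(io_drawer_led: Led, cec_drawer_led: Led) -> str:
    """Return the next action for MAP 4670 steps 3 and 4."""
    # Step 3: only an I/O drawer LED that is on solid leads to step 4.
    if io_drawer_led is not Led.ON_SOLID:
        return "go to step 5"
    # Step 4: the CEC drawer power LED decides the outcome.
    if cec_drawer_led is Led.ON_SOLID:
        return "false error: exit this MAP"
    # Blinking slowly or off: the CEC drawer did not power up.
    return "go to MAP 4880"
```

Note that in step 3 the CEC drawer LED is never consulted when the I/O drawer LED is blinking or off; the sketch reflects that by returning before reading it.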

MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers)


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


VOLUME 1, TotalStorage ESS Service Guide


Description
A problem or MAP isolation procedure has identified one or more cluster FRUs for replacement. Following all steps in this MAP ensures that the FRU is replaced and verified properly.

Procedure
Note: If memory FRUs are being replaced, refer to MAP 4160: Isolating Memory Related Error Codes on page 355 before continuing.
1. Is there an existing problem for the cluster FRU or FRUs being replaced?
   • Yes, display the problem details for all related problems, then continue with the next step.
   • No, continue with the next step.
2. Are you here to replace a single hard drive?
   • Yes, go to MAP 43B0: Cluster Dual Hard Drive ESC 1xxx on page 398.
   • No, continue with the next step. (This includes replacement of both hard disk drives.)
3. Are you here to replace only a CEC or I/O drawer power supply (no other cluster FRUs to replace)?
   • Yes, go to MAP 4890: Replacing a CEC or I/O Drawer Power Supply on page 471.
   • No, continue with the next step.
4. Connect the service terminal to the cluster that is not being repaired. See Service Terminal Setup in Chapter 8 of Volume 3.
5. Quiesce the cluster being repaired using the service terminal Alternate Cluster Repair menu option.
   Note: If pinned data is detected during the quiesce, you will be sent to MAP 4520: Pinned Data or FC Status.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Alternate Cluster Repair
     Quiesce the Alternate Cluster
6. Was pinned data status detected during the cluster quiesce in the prior step?
   • Yes, verify that all the actions in MAP 4520: Pinned Data or FC Status were attempted. If you are still unable to quiesce the cluster normally, contact your next level of support. Get their approval to quiesce the cluster using the Unconditionally Quiesce the Alternate Cluster option instead of the Quiesce the Alternate Cluster option. This bypasses the check for pinned data.
     Note: This action may result in a loss of customer data.
     When the quiesce is complete, continue with the next step.
   • No, go to step 10 on page 434.
7. Was the original pinned data status non-retryable?
   • Yes, continue with the next step.
   • No, go to step 10 on page 434.
8. Is an NVS card FRU being replaced?
   • Yes, continue with the next step.
   • No, go to step 10 on page 434.


9. The cluster must be prepared for the NVS to be repaired. From the service terminal Main Service Menu, select:
     Utility Menu
     Pinned Data Menu
     Pinned Data NVS Repair
   Continue with the next step.
10. Power off the cluster being repaired using the service terminal Alternate Cluster Repair menu option. From the service terminal Main Service Menu, select:
     Repair Menu
     Alternate Cluster Repair Menu
     Power Off the Alternate Cluster
11. Locate the two CEC and two I/O drawer power supplies associated with the cluster you are servicing. Unplug both input power cables from each of these CEC and I/O drawer power supplies.
    Note: There are a total of eight input power cables for the four power supplies associated with each cluster. See Figure 137.

Figure 137. Power Supply Connector Locations (s009710)
Rear view. Two power cable connectors (J1, J2) per power supply. Cluster 1 (cpcluster0): CL-1 P/S 1, CL-1 P/S 2, I/O-1 P/S 1, I/O-1 P/S 2. Cluster 2 (cpcluster1): CL-2 P/S 1, CL-2 P/S 2, I/O-2 P/S 1, I/O-2 P/S 2. Power supply labels: ON, CHK/PWR-GOOD, OFF.

12. Slide the CEC or I/O drawer to be repaired into the service position. Open the CEC or I/O drawer top cover. Refer to the correct cluster drawer service position procedure in Chapter 4 of Volume 2:



   • 2105 Model 800, CEC Drawer Service Position Procedure, 2105 Model 800 CEC Drawer in Chapter 4 of Volume 2, or
   • 2105 Model 800, I/O Drawer Service Position Procedure, 2105 Model 800 in Chapter 4 of Volume 2
13. Replace the cluster CEC or I/O drawer FRU or FRUs. Use the table of contents at the front of FRU Removal and Replacement Procedures in Chapter 4 of Volume 2 to find the replacement procedures. After the repair, return here and continue with the next step.
    Note: If replacing more than one FRU, read and do all the actions for each FRU before completing this MAP.
14. Close the CEC or I/O drawer top cover, then slide the drawer into the operating position.
15. Connect the eight input power cables to the CEC and I/O drawer power supplies for the cluster you are servicing.
16. Did you replace the I/O drawer planar assembly or the I/O drawer planar assembly battery?
   • Yes. Replacing the I/O drawer planar assembly or I/O drawer planar assembly battery affects the NVRAM service terminal connection serial port settings.
     a. Power on the cluster being repaired using the Alternate Cluster Repair menu option. As soon as the cluster begins to power up, immediately continue with this procedure.
     b. Connect the service terminal cable to the S1 port on the cluster being repaired and then logically connect the service terminal. Each time the service terminal logical connection drops, you must quickly reconnect it.
     c. Respond to the message requesting you to enter a 1 to define this port as the unused system console. (The prompt from the system firmware may say CONSOLE, but for the 2105 Model 800 you use the service terminal instead.) The cluster code load then continues. You can now connect the service terminal to the S2 port.
        Note: When the I/O drawer planar assembly or I/O drawer planar assembly battery is replaced, the NVRAM memory is reset and the system console port is not set. Shortly after cluster power on, the S1 and S2 ports each attempt to display a prompt that allows that port to be defined as the system console port. (The system console port is not used. However, if the port is not defined, each power on code load takes one additional minute as it times out waiting for the port to be defined. The cluster code load completes successfully in either case.) The prompt is displayed only if the service terminal is already connected to the proper port. The service terminal must be connected to the S1 port; when the prompt displays, enter 1. After this, the NVRAM settings use only port S2 for the service terminal. If you do not respond quickly enough to define the port, you can repeat the cluster power on to get another chance.
     d. The boot list settings should already be at the correct default values. If they are not, the cluster power on stops with a firmware error code displayed. Find the displayed code in Chapter 9 for instructions on how to correct the settings.
     e. Go to step 19 on page 436.
   • No, continue with the next step.


17. Did you replace both hard disk drives?
   • Yes, go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 reloads all code on one hard disk drive, and automatic mirroring restores the other hard disk drive.
   • No, continue with the next step.
18. Power on the cluster being repaired using the Alternate Cluster Repair menu option. Continue with the next step.
19. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to be lit.
    Notes:
    a. Replacing an I/O drawer FRU that requires the NVS battery cables to be disconnected from the NVS battery charger card or cards can extend the time to display Ready for Login or light the cluster Ready LED to as long as two hours. The NVS charger card needs this time to rebuild its profile of the battery capacity.
    b. If there is still a problem, Ready may not display and the CEC drawer operator panel may hang displaying a progress code. If any automatic cluster firmware updates are needed, the time to come Ready for Login is extended.
    Connect the service terminal to the cluster being repaired and attempt to log in. Was the service terminal able to log in to the cluster being repaired?
   • Yes, continue with the next step.
   • No, wait for the cluster to come ready (see the notes above). If the cluster hangs displaying a code, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If the cluster still does not display Ready, connect the service terminal to the cluster not being repaired and show and repair any new related problems. If there are no new related problems, call the next level of support.
20. Determine whether the repair was successful. From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems Needing Repair
   - If there is a new related problem, repair it now. After the repair, return to this step. To confirm that a problem is new, view the First Occurrence timestamp field in the problem details and verify that the problem was just created.
   - If there is no new related problem, continue with the next step.
21. Some FRUs need additional tests to ensure that they work properly. For each FRU replaced, go to Table 71 on page 437 or Table 72 on page 439 and do any additional actions listed.



Table 71. CEC Drawer FRU Replacements

Cluster FRU: CEC drawer hard disk drive
  Description: No additional verification needed. The hard disk drive is normally replaced concurrently with customer operation on both clusters using the Cluster Dual Hard Disk Drive Menu options.
  Action:
  • If a single hard disk drive was not replaced using those menu options, manually restore the mirroring. From the service terminal Main Service Menu, select:
      Repair Menu
      Cluster Dual Hard Disk Drive Menu
      Restore Mirroring after a Cluster Hard Disk Drive Replacement
  • If both hard disk drives were replaced, go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 reloads all code on one hard disk drive, and automatic mirroring restores the other hard disk drive.

Cluster FRU: CEC drawer CD-ROM drive; CEC drawer SCSI signal cables
  Description: Verify the CD-ROM drive.
  Action:
  1. Run CD-ROM drive diagnostics. Connect the service terminal to the cluster being repaired. From the service terminal Main Service Menu, select:
       Machine Test Menu
       CD-ROM drive
     Notes:
     a. For test media, use the 2105 code/LIC CD-ROM instead of the CD-ROM test disk requested by the CD-ROM drive test.
     b. Audio is not used by the 2105; do not run the audio test that uses the audio headset.
  2. Go to step 22 on page 441.



Table 71. CEC Drawer FRU Replacements (continued)

Cluster FRU: CEC drawer planar assembly; CEC drawer processor card; CEC drawer memory card; CEC drawer memory DIMM
  Description: Verify processors and memory.
  Action:
  1. Display memory. Connect the service terminal to the cluster being repaired. From the service terminal Main Service Menu, select:
       Install/Remove Menu
       Cluster Memory Menu
       List Installed Cluster Memory
     Verify that both clusters list the same amount of Total Installed and Available Memory. If not, recheck the cluster for loose or missing memory cards or memory card modules.
  2. Display CPUs. Connect the service terminal to this cluster. From the service terminal Main Service Menu, select:
       Configurations Options Menu
       Show Storage Facility Resources Menu
       Show Storage Facility Resources
     Scroll down and verify that resources proc0, proc2, proc4, and proc6 are all shown as Available for a 4-way cluster. For a 6-way cluster, verify that proc8 and proc10 are also shown. (If you are unsure whether the cluster is 4-way or 6-way, log in to the working cluster and check it.) If not all the processors are displayed as available, recheck the cluster for a loose or missing CEC drawer processor card.
  3. Go to step 22 on page 441.

Cluster FRU: CEC drawer power supply
  Description: No additional verification needed.
  Action: The CEC drawer power supply can be replaced concurrently with customer activity on the cluster. When the problem has ESC=5300, it is combined with cluster FRUs that must be replaced using this MAP. Replace the power supply when the other FRUs are replaced. Then go to step 22 on page 441.

Cluster FRU: CEC drawer operator panel or EEPROM
  Description: No additional verification.
  Action: The EEPROM on the operator panel has unique vital product data (VPD) that includes the 2105 Model 800 serial number and cluster ID. Operator panels and EEPROMs cannot be swapped from cluster to cluster. Move the EEPROM from the old operator panel to the new operator panel FRU. If the new operator panel FRU still fails, the old EEPROM might have failed; reinstall the EEPROM that came on the new operator panel FRU. The new EEPROM does not have valid VPD, so you must call the next level of support for the procedure to enter the unique VPD for your cluster.
  1. If the old EEPROM module was swapped to the new CEC drawer operator panel, go to step 22 on page 441.
  2. If the old EEPROM module was not swapped to the new CEC drawer operator panel, call the next level of support for the procedure to update the vital product data. After the VPD has been loaded, go to step 22 on page 441.



Table 71. CEC Drawer FRU Replacements (continued)

Cluster FRU: CEC drawer to I/O drawer external cables (V/S comm., JTAG, RIO-0, RIO-1)
  Description: No additional verification needed.

Table 72. I/O Drawer FRU Replacements

Cluster FRU: I/O drawer diskette drive; I/O drawer diskette drive signal cable
  Description: Verify the diskette drive.
  Action:
  1. Run diskette drive diagnostics. Connect the service terminal to the cluster being repaired. From the service terminal Main Service Menu, select:
       Machine Test Menu
       Diskette Drive
     Note: A test diskette is part of the ship group and should be stored in the document enclosure.
  2. Go to step 22 on page 441.

Cluster FRU: I/O drawer planar assembly
  Description: The time of day is automatically restored by the cluster power on and code load when communication is established with the other cluster. However, additional verification tests are needed.
  Note: Replacing the I/O drawer planar assembly requires that the NVS battery cables be disconnected from the NVS battery charger cards. When these cables are reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action:
  1. Verify that the correct level of LIC firmware is on the I/O drawer planar assembly. Connect the service terminal to the working cluster. Use the service terminal Main Service Menu, Licensed Internal Code Maintenance Menu, Multiple LIC Activation, Concurrent, SVP Service Processor / System Planar Activation option to check and update the level if needed.
     Note: The service processor function is integrated in the 2105 Model 800 I/O drawer planar assembly.
  2. Verify customer e-mail notification. Use the procedure in the Action column of this table for the Ethernet 10Base-T cable.
  3. Verify the modem and expander connection (if installed). Use the procedure in the Action column of this table for the serial interface cable (S3 port).
  4. Go to step 22 on page 441.

Cluster FRU: I/O drawer planar assembly battery
  Description: The time of day is automatically restored by the cluster power on and code load when communication is established with the other cluster. No additional verification.
  Note: Replacing the I/O drawer planar assembly battery requires that the NVS battery cables be disconnected from the NVS battery charger cards. When these cables are reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action: Go to step 22 on page 441.



Table 72. I/O Drawer FRU Replacements (continued)

Cluster FRU: I/O drawer fan controller card
  Description: No additional verification.
  Action: Go to step 22 on page 441.

Cluster FRU: I/O drawer NVS battery charger card
  Description: No additional verification.
  Note: Replacing the NVS battery charger card requires that the NVS battery cables be disconnected from it. When these cables are reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action: Go to step 22 on page 441.

Cluster FRU: I/O drawer RIO card assembly
  Description: No additional verification.
  Note: Replacing the RIO card requires that the NVS battery cable be disconnected from the NVS battery charger card. When this cable is reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action: Go to step 22 on page 441.

Cluster FRU: I/O drawer SSA card
  Description: No additional verification.
  Note: Replacing the SSA device cards in slots 11 or 12 requires that the NVS battery cable be disconnected from the NVS battery charger card. When this cable is reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action: Go to step 22 on page 441.

Cluster FRU: I/O drawer NVS/IOA card
  Description: No additional verification.
  Note: Replacing the NVS/IOA card in slot 9 requires that the NVS battery cable be disconnected from the NVS battery charger card. When this cable is reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action: Go to step 22 on page 441.

Cluster FRU: I/O drawer NVS battery assembly
  Description: See the action below.
  Note: Replacing the NVS battery assembly requires that the NVS battery cables be disconnected from the NVS battery charger cards. When these cables are reconnected, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
  Action:
  1. Enter the NVS battery assembly installation date into the functional code any time a new NVS battery assembly is installed. Use the service terminal Main Menu, Utility Menu, Battery Menu, Update Battery Installation Date option. (The date is used to create future error logs so that these batteries are replaced before they exceed their expected life.)
  2. Go to step 22 on page 441.

Cluster FRU: I/O drawer serial interface cable (S1 and S2 ports)
  Description: No additional verification. The S2 port has been tested while using the service terminal connected to this cluster. The S1 port is not used.
  Action: Go to step 22 on page 441.



Table 72. I/O Drawer FRU Replacements (continued)

Cluster FRU: I/O drawer serial interface cable (S3 port)
  Description: Verify the connection to the modem and expander (if installed).
  Action:
  1. Verify the modem and expander connection. Connect the service terminal to the S2 port of this cluster. From the service terminal Main Service Menu, select:
       Machine Test Menu
       Send Test Notification Menu
       Service Notification (via modem)
  2. Go to step 22.

Cluster FRU: I/O drawer ethernet 10Base-T cable
  Description: Test the ethernet connection to the other cluster.
  Action:
  1. To test the ethernet connection to the other cluster, connect the service terminal to the S2 port of this cluster. From the service terminal Main Service Menu, select:
       Repair Menu
       Show / Repair Problems Needing Repair
     Verify that problem status is displayed for both clusters.
  2. Go to step 22.

Cluster FRU: I/O drawer ethernet AUI cable
  Description: The AUI connection is not used for the 2105 Model 800.
  Action: None.

Cluster FRU: I/O drawer power supply
  Description: No additional verification needed.
  Action: The I/O drawer power supply can be replaced concurrently with customer activity on the cluster. When the problem has ESC=5300, it is combined with cluster FRUs that must be replaced using this MAP. Replace the power supply when the other FRUs are replaced. Then go to step 22.

Cluster FRU: CEC drawer to I/O drawer external cables (V/S comm., JTAG, RIO-0, RIO-1)
  Description: No additional verification needed.

22. Verify that the cluster being repaired has come ready by connecting the service terminal to the cluster and attempting to log in. The time to come ready increases if any cluster firmware updates are needed; the updates occur automatically during the cluster IML. Was the service terminal able to log in to the cluster being repaired?
   • Yes, continue with the next step.
   • No, wait for the cluster to come ready. If the cluster hangs displaying a code, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If the cluster still does not come ready, display and repair any new related problems or call the next level of support.
23. Connect the service terminal back to the cluster not being serviced. Resume the cluster being repaired using the Alternate Cluster Repair Menu option.
    Note: Resuming a cluster that is not yet ready could corrupt an automatic firmware update that is in progress, causing a long service action.
24. Close the problem for the cluster FRU when the repair is complete. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
    Go to the next step.
25. If retryable pinned data was present during the original quiesce, display the pinned data status again. Is the retryable pinned data status still shown?
   • Yes, repair related problems still needing repair. If there are no related problems, call the next level of support.
   • No, continue with the next step.
26. Go to MAP 1500: Ending a Service Action on page 67.
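The Display CPUs check in Table 71 follows a simple naming pattern: a 4-way cluster should list proc0, proc2, proc4, and proc6 as Available, and a 6-way cluster should also list proc8 and proc10. A minimal Python sketch, assuming the Available resource names have been copied from the Show Storage Facility Resources display into a set; the function names are hypothetical and not part of the 2105 service code.

```python
def expected_processors(n_way: int) -> list[str]:
    """Resource names that should show as Available for a 4-way or 6-way cluster."""
    if n_way not in (4, 6):
        raise ValueError("Table 71 covers only 4-way and 6-way clusters")
    # Even-numbered names: proc0, proc2, proc4, proc6 (plus proc8, proc10 for 6-way).
    return [f"proc{2 * i}" for i in range(n_way)]


def missing_processors(n_way: int, available: set[str]) -> list[str]:
    """Expected names not listed as Available; any result means recheck processor cards."""
    return [name for name in expected_processors(n_way) if name not in available]


print(missing_processors(6, {"proc0", "proc2", "proc4", "proc6"}))  # ['proc8', 'proc10']
```

An empty result corresponds to the check passing; any returned name points to a loose or missing CEC drawer processor card.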

MAP 4710: Isolating a DDM LIC Update Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A failure was detected while new disk drive module (DDM) licensed internal code was being downloaded to the DDMs.
Note: The term download means the same as update.
One of the following error conditions could have been detected:
• SSA card is not in the proper state.
• Unable to check the array status.
• Arrays are not in the proper state.
• DDM diagnostic failed for pdiskXX.
• Download failed for pdiskXX.
• The download process took too long and timed out.
The DDM code download process includes the following:
• The new DDM code is included on the 2105 LIC Code update CD-ROM.
• The LIC update process copies the code from the CD-ROM to the cluster.
• The DDM download process is started using the service terminal Disk Drive Module (DDM) LIC Menu options. It automatically runs on one DDM at a time: it runs the DDM diagnostics, loads the new code, and then runs the DDM diagnostics again. If the diagnostics and code load are successful, the process is repeated on the next DDM until every DDM is complete.



• If a DDM diagnostic or DDM code update fails, a problem is created. The DDM that failed is also recorded in the DDM code update status. The remaining DDMs have not been downloaded yet.
• After the DDM is repaired, the DDM download process needs to be started again. The service terminal DDM Download Restart option causes the cluster to start with the first DDM and check each one until it finds the DDM that was repaired. If the diagnostics and download are successful this time, the process continues to download the remaining DDMs, one at a time.
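The download-and-restart behavior described above can be modeled as a loop that stops at the first failure. This Python sketch is illustrative only: the `Ddm` class, the `updated` flag, and the `run_diag`/`load_code` callables are hypothetical stand-ins, and it assumes (the guide does not say how) that the restart recognizes DDMs already downloaded and passes over them until it reaches the repaired DDM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ddm:
    name: str              # e.g. "pdisk0" (hypothetical)
    updated: bool = False  # True once the code load and both diagnostics passed


def download_ddm_code(
    ddms: List[Ddm],
    run_diag: Callable[[Ddm], bool],
    load_code: Callable[[Ddm], bool],
) -> Optional[str]:
    """Update DDMs one at a time; return the first failing DDM name, or None."""
    for ddm in ddms:
        if ddm.updated:
            continue  # assumption: a restart passes over DDMs already done
        # Diagnostics, code load, diagnostics again; `and` stops at the first failure.
        if not (run_diag(ddm) and load_code(ddm) and run_diag(ddm)):
            return ddm.name  # a problem is created; remaining DDMs are not downloaded
        ddm.updated = True
    return None  # every DDM is complete
```

Calling the function a second time after the failing DDM is repaired corresponds to DDM Download Restart: completed DDMs are skipped and the loop resumes at the repaired one.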

Isolation
1. Read the Description section above.
2. Use the service terminal to display problems needing repair. Look for related problems (SSA or drawer FRUs). From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
   • If there are no related problems, call the next level of support.
   • If there are related problems, fix them, then return here and continue with the next step.
3. Use the DDM Download Restart option to complete the DDM download process. From the service terminal Main Service Menu, select:
     Licensed Internal Code Maintenance Menu
     Disk Drive Module (DDM) LIC Menu
     DDM Download Restart

MAP 4720: Host Bay Fails to Power Off


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
Each host bay drawer power supply:
• Has two output boundaries, one for each host bay.
• Receives control and status request commands from each RPC card through an RS-485 interface.
• Must receive power off status from both RPC cards to switch off host bay power. (If one of the RS-485 interfaces is not operational, the power supply powers off when it receives a command from the operational interface.)
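The power-off rule above can be expressed as a small predicate. The Python sketch below is a hypothetical model, not power supply firmware; the function and parameter names are invented, and the case where neither RS-485 interface is operational is an assumption because the description does not cover it.

```python
def host_bay_powers_off(rpc1_link_ok: bool, rpc1_requests_off: bool,
                        rpc2_link_ok: bool, rpc2_requests_off: bool) -> bool:
    """Hypothetical model of the host bay drawer power supply power-off rule."""
    # Only requests arriving over an operational RS-485 interface are counted.
    requests = [off for link_ok, off in ((rpc1_link_ok, rpc1_requests_off),
                                         (rpc2_link_ok, rpc2_requests_off))
                if link_ok]
    # Every operational interface must request power off. Assumption: with no
    # operational interface, no request arrives and the host bay stays powered.
    return bool(requests) and all(requests)
```

The model also shows why steps 6 and 7 of the isolation below unplug one RPC card's power control cable: with that interface not operational, the other RPC card's command alone decides whether the host bay powers off.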

Isolation
1. Show and repair any problems with the RPC cards or Host Bay Drawer power supplies. If there are no related problems, continue with the next step. From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
2. Verify that the following cables are correctly connected to both Host Bay Drawer power supplies:
   • Power control cables are correctly plugged into both host bay power supply RJ45 connectors J14 and J15.
   • Power control cables are correctly plugged into both RPC card connector J2 port 9 or 13.
3. Verify that the failing host bay has been quiesced before continuing. From the service terminal Main Service Menu, select:
     Utility Menu
     Resource Management Menu
     Show Quiesced Resources (Quiesce a Resource)
4. Observe the HA1/HA2 power indicator LED for the failing host bay on each power supply.
   Note: LED HA1 is for host bay 1 or 3; LED HA2 is for host bay 2 or 4.
   Find the condition that applies:
   • LED On for both power supplies. Go to step 6.
   • LED On for one power supply and LED Off for the other power supply. Go to the next step.
   • LED Off for both power supplies. The host bay is powered off; this is a false error. Exit this MAP and return to the service action that sent you here.
5. Replace the host bay power supply with the LED stuck On for the failing host bay. Then return to the original service action and attempt to power off the host bay. From the service terminal Main Service Menu, select:
     Repair Menu
     Replace a FRU (Container Name - IBM ESS Model 800)
6. Isolate whether the RPC-1 card is preventing the host bay from powering off. Do the following steps:
   a. Quiesce the RPC-1 card. From the service terminal Main Service Menu, select:
        Utility Menu
        Resource Management Menu
        Quiesce a Resource (R1-G1 Rack Power Control Card)
   b. Find the host bay you are repairing:
      • Host bay 1 or 2, beneath cluster 1 (left): Follow the power control cable from the RPC-1 card connector J2 port 13 to both host bay drawer power supplies. Unplug the cable from both power supplies.
      • Host bay 3 or 4, beneath cluster 2 (right): Follow the power control cable from the RPC-1 card connector J2 port 9 to both host bay drawer power supplies. Unplug the cable from both power supplies.
   c. Attempt to power off the host bay. From the service terminal Main Service Menu, select:
        Utility Menu
        Cluster Power Off / On
        Power Off a Host Bay



      • If the host bay powers off, the possible failing FRUs are the RPC-1 card or the power control cable from RPC-1. From the service terminal Main Service Menu, select:
          Repair Menu
          Replace a FRU (IBM ESS Model 800) (R1-G1 Rack Power Control Card)
      • If the host bay does not power off, reconnect the power control cables and resume the RPC-1 card. Continue with the next step.
7. Isolate whether the RPC-2 card is preventing the host bay from powering off. Do the following steps:
   a. Quiesce the RPC-2 card. From the service terminal Main Service Menu, select:
        Utility Menu
        Resource Management Menu
        Quiesce a Resource (R1-G2 Rack Power Control Card)
   b. Find the host bay you are repairing:
      • Host bay 1 or 2, beneath cluster 1 (left): Follow the power control cable from the RPC-2 card connector J2 port 13 to both host bay drawer power supplies. Unplug the cable from both power supplies.
      • Host bay 3 or 4, beneath cluster 2 (right): Follow the power control cable from the RPC-2 card connector J2 port 9 to both host bay drawer power supplies. Unplug the cable from both power supplies.
   c. Attempt to power off the host bay. From the service terminal Main Service Menu, select:
        Utility Menu
        Cluster Power Off / On
        Power Off a Host Bay
      • If the host bay powers off, the possible failing FRUs are the RPC-2 card or the power control cable from RPC-2. From the service terminal Main Service Menu, select:
          Repair Menu
          Replace a FRU (IBM ESS Model 800) (R1-G2 Rack Power Control Card)
      • If the host bay does not power off, reconnect the power control cables and resume the RPC-2 card. Continue with the next step.
8. If you must power off the host bay to continue a service action, consider the following:
   • The host bay drawer power supply output to the failing host bay cannot be forced off. If you unplug the host bay with power on, the connector contacts may arc and be damaged.

Problem Isolation Procedures, CHAPTER 3



   - If the customer will give you the other host bay in the same host bay drawer, quiesce it so that both host bays are quiesced. Force the host bay drawer power off by unplugging the four power input cables to the host bay drawer power supplies.
   - Replace the needed FRUs, then plug in the power input cables.
   - Power on both host bays.
   - Attempt to power off the original failing host bay. If it still fails, call the next level of support. Resume one or both host bays to customer use if the power repair can or will be rescheduled.
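Step 4 of this MAP is a small decision table. The following is a minimal sketch that restates it in Python for clarity only; the function names are hypothetical and do not correspond to any service terminal function.

```python
# Hypothetical helpers restating step 4 of MAP 4730. Not an IBM tool.

def ha_led_for_host_bay(host_bay: int) -> str:
    """LED HA1 is for host bay 1 or 3; LED HA2 is for host bay 2 or 4."""
    if host_bay in (1, 3):
        return "HA1"
    if host_bay in (2, 4):
        return "HA2"
    raise ValueError(f"unknown host bay: {host_bay}")


def step4_next_action(led_ps1_on: bool, led_ps2_on: bool) -> str:
    """Map the failing host bay's LED state on the two power supplies to the next step."""
    if led_ps1_on and led_ps2_on:
        return "go to step 6"
    if led_ps1_on != led_ps2_on:
        # One supply's LED is stuck On: step 5 replaces that supply.
        return "go to step 5"
    # Both LEDs Off: the host bay is already powered off.
    return "false error: exit this MAP"
```

Only the mixed case points at a single power supply; when both LEDs are On, the RPC cards are examined next.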

MAP 4730: Cluster Power Off Request Problem


Attention: Perform this procedure only at the direction of the service terminal or other service guide procedures. Failure to follow this attention can cause customer operations to be disrupted.

Description
Cannot power off both clusters

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The service terminal Utility Menu options were used to attempt to power off a cluster, but the other cluster was already powered off. Only one cluster may be powered off at a time. Power on the other cluster. Then connect the service terminal to the other cluster and use the Alternate Cluster Repair menu option to power off this cluster.

MAP 4760: Recovering from Corrupted Files or Functions


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A cluster file (dataset) or function is corrupted. If this has affected customer operations, a separate problem should have been created. In many cases, customer operations are not affected; only processes or files used by the RAS (maintenance package) processes may be affected. There are three recommended actions:
- Quiesce the cluster, power it off and on, then resume it. This reloads the code into the cluster, which might clear a hung process. If the failure is still present, the next action is needed.
- Reload the code onto the cluster hard disk drives. An important part of this process is saving and restoring the configuration and customization files, which allows the cluster to restore access to the customer data after the process is complete. If the failure is still present, the next action is needed.
- Contact the next level of support. They can log in through the modem and perform functions similar to those of an AIX system administrator.
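The three recommended actions form an escalation ladder: each one is tried only if the failure is still present after the previous one. A minimal sketch of that order, for illustration only: `escalate` and the `failure_cleared_after` callback (standing in for the technician's check with Show / Repair Problems Needing Repair) are hypothetical names.

```python
# Hypothetical sketch of the MAP 4760 escalation order. Not an IBM tool.
from typing import Callable

RECOMMENDED_ACTIONS = [
    "quiesce, power off, power on, and resume the cluster",
    "reload the code onto the cluster hard disk drives",
    "contact the next level of support",
]


def escalate(failure_cleared_after: Callable[[str], bool]) -> list[str]:
    """Return the actions performed, stopping at the first one that clears the failure."""
    performed = []
    for action in RECOMMENDED_ACTIONS:
        performed.append(action)
        if failure_cleared_after(action):
            break
    return performed
```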

Isolation
1. Is the problem ESC 38F5?
   - Yes, call the next level of support. There is a mismatch of the code levels on the clusters during the multi-LIC update.
   - No, continue with the next step.
2. Read the Description section above.
3. Reload the cluster code by quiescing, powering off, powering on, and then resuming the cluster. Connect the service terminal to the cluster that does not have the problem. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair, Quiesce the Alternate Cluster.
   Note: If the resume fails, that failure may need to be repaired before you continue with this MAP. You may need to call the next level of support if this happens.
4. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). If the problem is still occurring, the original problem may have been updated: the time stamp in its Last Occurrence field will be later than the original Last Occurrence. A new related problem may also have been created.
   - If no error was detected during the power on and resume, the original condition may be gone. If you are not sure, go to the next step to rebuild the hard disk drive with new code. If you believe the problem is no longer occurring, go to MAP 1500: Ending a Service Action on page 67.
   - If an error was detected, continue with the next step.
5. Go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. That MAP reloads all code on one hard disk drive, and automatic mirroring then restores the other hard disk drive. Return here and continue when the build process is complete.
6. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). If the problem is still occurring, the original problem may have been updated: the time stamp in its Last Occurrence field will be later than the original Last Occurrence. A new related problem may also have been created.
   - If no error was detected after the hard disk drive rebuild, the original condition has probably been corrected. If you believe the problem is no longer occurring, go to MAP 1500: Ending a Service Action on page 67.
   - If a related error was detected, continue with the next step.
7. Call the next level of support.

MAP 4780: Isolating a Functional Code Not Running Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The cluster functional code was not loaded during the last cluster power on. Only the AIX operating system and the RAS (maintenance package) code were loaded. The service terminal can still log in to the failing cluster, because login requires only the RAS code.



This most commonly occurs when both clusters are powering on and loading code and one cluster has an unrecoverable error. The other cluster powers the failing cluster off and then on in an attempt to recover from the error. This recovery action is repeated up to two times; on the second attempt, the failing cluster is fenced with its functional code not loaded. This condition can also occur if a fenced cluster is rebooted or powered off and on without first being quiesced with the Alternate Cluster Repair Menu. If both clusters are in this condition, both RPC cards may be in an incorrect logical state. Resetting the RPC cards may clear this condition.

Isolation
1. Verify that there is no diskette in the failing cluster's diskette drive.
   - If there is no diskette, continue with the next step.
   - If there is a diskette, remove it and repeat the operation that failed.
   Note: The norsStartOnce diskette, used when directed by the next level of support, can create this condition.
2. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there any other related problem for the failing cluster?
   - Yes, exit this MAP and repair the related problem.
   - No, continue with the next step.
3. Do both clusters have a problem that calls this MAP?
   - Yes, continue with the next step.
   - No, go to step 6.
4. There may be a false error condition in the rack power control cards that can be reset.
   a. Power off the 2105 Model 800.
   b. Switch the System Power AC circuit breaker on both primary power supplies to Off (down).
   c. Wait until the green Power Control Good indicators on both rack power control cards are off. It takes up to 30 seconds for the logic voltage supplied to the rack power control cards to discharge.
   d. Switch the System Power AC circuit breaker on both primary power supplies to On (up).
   e. Power on the 2105 Model 800, then continue with the next step.
5. Wait longer than the normal amount of time for the customer operator panel Cluster 1 and Cluster 2 Ready indicators to come on solid. A failing cluster may attempt to load its code up to three times before it posts an error, and each code load attempt may take 10 to 20 minutes.
   - If both clusters come ready, go to MAP 1500: Ending a Service Action on page 67.
   - If a cluster hangs and displays a code on its operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - If a cluster does not come ready, attempt to log in, then display and repair any new related problems. If there are no new related problems, call the next level of support.
6. Only one cluster has a problem that sent you to this MAP. Verify that the Cluster Ready indicator for the other cluster on the rack operator panel is On. Is the Cluster Ready indicator for the other cluster On?
   - Yes, continue with the next step.
   - No, display and repair any problems for the other cluster first. If there are none, call the next level of support.
7. From the cluster that is ready, attempt to clear the failing cluster by quiescing, powering off, and powering on the failing cluster. Connect the service terminal to the cluster that is not failing. From the service terminal Main Service Menu, select: Repair Menu, Alternate Cluster Repair, then Quiesce the Alternate Cluster, Power Off the Alternate Cluster, and Power On the Alternate Cluster.
8. Wait for the customer operator panel Cluster Ready indicator to come on.
   Note: If the failure is still occurring, the cluster may attempt to load its code up to three times before it reaches the reboot threshold.
   Find the condition that applies:
   - If the cluster comes ready, the original problem was intermittent. Call the next level of support; PFE may want to offload engineering information before you continue. Then continue with step 9.
   - If the cluster hangs displaying a code on its operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - If the cluster does not come ready, attempt to log in, then display and repair any new related problems. If there are none, call the next level of support.
9. Do the following:
   a. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Repair any new related problems for the failing cluster. If none are found, continue.
   b. Resume the cluster using the Alternate Cluster Repair menu options from the working cluster.
   c. Use the Repair Menu, Close a Previously Repaired Problem option for the original problem.
   d. Go to MAP 1500: Ending a Service Action on page 67.
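The longer-than-normal wait in step 5 of MAP 4780 can be budgeted from the figures given there. As a rough worked example (the 10 to 20 minute range per attempt is the guide's estimate, not a guaranteed time), three code load attempts take between 30 and 60 minutes:

```python
# Worked arithmetic for step 5 of MAP 4780: up to three code load attempts,
# each estimated at 10 to 20 minutes, before a failing cluster posts an error.
MAX_LOAD_ATTEMPTS = 3
MIN_MINUTES_PER_ATTEMPT = 10
MAX_MINUTES_PER_ATTEMPT = 20


def load_wait_window(attempts: int = MAX_LOAD_ATTEMPTS) -> tuple[int, int]:
    """Return the (minimum, maximum) minutes spent in `attempts` code load attempts."""
    return (attempts * MIN_MINUTES_PER_ATTEMPT, attempts * MAX_MINUTES_PER_ATTEMPT)


print(load_wait_window())  # (30, 60)
```

A cluster that comes ready on its first attempt needs only the single-attempt window, `load_wait_window(1)`, which is 10 to 20 minutes.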

MAP 47A0: Cluster Fails to Power Off


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: Do not power off the 2105 Model 800 unless instructed to do so.

Description
A cluster can be powered off in three ways: with the service login options, with the rack operator panel Local Power switch, or with the RPC card push-buttons. The push-button and Local Power switch circuits are connected directly to the RPC cards. The service login communicates with the RPC cards from the login cluster, which is normally the cluster not being serviced. Both RPC cards must receive the cluster power off request and agree that they have received it. When they agree, they request the service processor in the cluster being serviced to begin the cluster power off sequence. First the cluster code is shut down; then the cluster SPCN (system power control network) requests the CEC drawer and I/O drawer power supplies to switch off.
If one RPC card is fenced or quiesced, the other RPC card can request a cluster power off by itself. This allows normal power off actions when one RPC card is failing and has not yet been replaced. The SPCN code detects power problems within the CEC and I/O drawers and reports them to the service processor and the RPC cards, which creates a problem.
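The RPC agreement rule described above can be modeled as follows. This is a hypothetical sketch, not RPC card microcode: it assumes that every RPC card that is not fenced or quiesced must have received the request, and it treats the case of no available RPC card (which the Description does not cover) as not allowed.

```python
# Hypothetical model of the MAP 47A0 cluster power off agreement rule.
from dataclasses import dataclass


@dataclass
class RpcCard:
    name: str
    available: bool          # False if the card is fenced or quiesced
    received_request: bool   # True if the card received the power off request


def cluster_power_off_allowed(cards: list[RpcCard]) -> bool:
    """Every available card must have received the request; at least one must be available."""
    active = [card for card in cards if card.available]
    return bool(active) and all(card.received_request for card in active)
```

The model also shows why the Isolation steps quiesce one RPC card at a time: with one card quiesced, the power off decision rests on the other card alone.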

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Show and repair any problems with the RPC cards, CEC drawer, or I/O drawer power FRUs, then retry the cluster power off. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair.
   Note: If there are no related problems, continue with the next step.
2. Verify that the following cables are correctly connected to both I/O drawers and both RPC cards:
   - I/O drawer power control cables are correctly plugged into the I/O drawer RJ45 connectors P2 and P3 (drawer front, middle right).
   - I/O drawer power control cables are correctly plugged into each RPC card, at RPC card connector J2-11 (for cluster 2) or J2-15 (for cluster 1).
   Note: When using the service terminal to do Alternate Cluster Repair and cluster power off, both clusters must be able to communicate successfully with both RPC cards.
3. Verify that the failing cluster has been quiesced before continuing. Connect the service terminal to the cluster not being serviced. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource (Cluster Bay 1 is for cluster 1; Cluster Bay 2 is for cluster 2).
4. Determine whether an I/O drawer power supply is stuck on. Observe the CHK/PWR GOOD power indicator LED on each I/O drawer power supply for the failing I/O drawer (rear of rack). Is one power supply LED blinking slowly and the other power supply LED on solid?
   - Yes, the I/O drawer power supply with the LED on solid is preventing the I/O drawer from powering off properly and is the possible failing FRU. Replace that power supply using the Repair Menu, FRU Replace Menu, IBM ESS Model 800 options. If the problem still occurs, call the next level of support before replacing the I/O Drawer Planar Assembly.
   - No, continue with the next step.



5. RPC card 1 or 2 may be preventing the cluster from powering off. Isolate whether the RPC-1 card is preventing the cluster power off. Do the following steps:
   a. Quiesce the RPC-1 card. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource (R1-G1 Rack Power Control Card).
   b. Attempt to power off the cluster. From the service terminal Main Service Menu, select: Utility Menu, Cluster Power Off / On, Power Off a Cluster Bay.
      - If the cluster does not power off, resume the RPC-1 card and continue with step 6.
      - If the cluster powers off, replace the quiesced RPC-1 card. (You can power on and resume the cluster to customer use before or after replacing the RPC-1 card.) From the service terminal Main Service Menu, select: Repair Menu, Replace a FRU (R1 IBM ESS Model 800) (R1-G1 Rack Power Control Card).
6. Isolate whether the RPC-2 card is preventing the cluster power off. Do the following steps:
   a. Quiesce the RPC-2 card. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource (R1-G2 Rack Power Control Card).
   b. Attempt to power off the cluster. From the service terminal Main Service Menu, select: Utility Menu, Cluster Power Off / On, Power Off a Cluster Bay.
      - If the cluster does not power off, resume the RPC-2 card and continue with step 7 on page 452.
      - If the cluster powers off, replace the quiesced RPC-2 card. (You can power on and resume the cluster to customer use before or after replacing the RPC-2 card.) From the service terminal Main Service Menu, select: Repair Menu, Replace a FRU (R1 IBM ESS Model 800) (R1-G2 Rack Power Control Card).



7. Attempt to power off the cluster using the RPC card push-buttons. Each RPC card has two push-buttons: the top one for Cluster 1 and the bottom one for Cluster 2. The cluster push-buttons on the two RPC cards must be pushed no more than 8 seconds apart. Verify that you power off the cluster that is quiesced. Did the cluster power off?
   - Yes, the cluster fails to power off only when using the service login from the other cluster. The cluster power off problem may also cause a cluster power on problem. Attempt to power on the cluster using the Utility Menu, Cluster Power Off / On options. If it does not power on, go to MAP 4880: Cluster Power On Problem on page 461. If it does power on, call the next level of support for help with the original power off problem.
   - No, continue with the next step.
8. Force the cluster power off. For the failing cluster, unplug both input power cables from each I/O drawer power supply and CEC drawer power supply. The cluster will power off!
9. Plug the input power cables back in. Attempt to power on the cluster using the Utility Menu, Cluster Power Off / On option:
   - If it does not power on, go to MAP 4880: Cluster Power On Problem on page 461.
   - If it does power on, call the next level of support for help with the original power off problem.
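The push-button rule in step 7 of MAP 47A0 reduces to a button choice and a timing check. In this sketch, the press timestamps in seconds are a hypothetical input used only for illustration; this guide does not describe any such interface on the RPC cards.

```python
# Sketch of the step 7 push-button rule in MAP 47A0. Hypothetical helpers only.
MAX_PRESS_GAP_SECONDS = 8.0


def button_for_cluster(cluster: int) -> str:
    """Each RPC card has two push-buttons: top for Cluster 1, bottom for Cluster 2."""
    return {1: "top", 2: "bottom"}[cluster]


def presses_within_window(press_rpc1: float, press_rpc2: float) -> bool:
    """True if the same cluster's button was pushed on both RPC cards at most 8 seconds apart."""
    return abs(press_rpc1 - press_rpc2) <= MAX_PRESS_GAP_SECONDS
```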

MAP 4810: Unexpected Host Bay Power Off


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A host bay can unexpectedly lose power in two ways:
- The host bay drawer power supplies are operating correctly, but the power is not reaching the host bay planar, or the host bay planar is failing. The power supply HA1 and HA2 LED indicators show only whether the power supply outputs are switched on; they do not show whether the host bay is receiving power.
- The host bay drawer power supplies are not operating correctly, and the HA1 and HA2 indicators may be off.
The 2105 code can detect that host bay power is off when the clusters cannot communicate with the host bay logic through the CPI interfaces. The host bay planar assembly receives bulk voltage from the host bay drawer power supplies and converts it into logic voltages.
Note: There are four LED indicators on the host bay planar. They are located at the front of the planar, to the right of slot 4, and can be seen by looking at an angle through the cooling air holes in the front sheet metal (see Figure 138 on page 453). The functions of the four LEDs are:
- First (front) LED: host bay planar power is on.
- Second LED: remote FPGA chip updated from flash properly during power on.
- Third LED: local FPGA chip updated from flash properly during power on.
- Fourth (rear) LED: PCI Arbiter chip updated from flash properly during power on.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Display and repair any related problems for the host bay drawer power supplies or RPC cards. Then return here and continue with the next step.
2. Observe the four LEDs on the failing host bay planar assembly (see the note in the Description section of this MAP). Use a working host bay to verify that you know where to look.

Figure 138. Host Bay Planar LED Indicator Location (s009643) (image callout: 4 Host Bay Planar LEDs)

   Observe the HA LED for the failing host bay on both host bay drawer power supplies:
   - HA1 LED is for host bay 1 or 3.
   - HA2 LED is for host bay 2 or 4.


Figure 139. Host Drawer Power Supply HA LED Indicator Location (s009644) (image callouts, rear view of rack R1: Host Bay Drawer Power Supply, HA LEDs, Switch)

Find the condition that applies:


Table 73. Host Bay LEDs
Host Bay Planar LEDs    HA LED on Each Power Supply (PS)    Action
All four LEDs off       HA LED off for both PS              Go to step 3
All four LEDs off       HA LED on for one or both PS        Go to step 4
One or more LEDs on     HA LED on for one or both PS        Go to step 5 on page 455

3. Neither power supply is supplying power to the host bay. Do the following until the problem is repaired:
   - Check for an overcurrent condition. Use step 7 on page 455.
   - Check for a failing power supply that prevents the other power supply from powering on. Use step 8 on page 456.
   - Replace one or more of the possible failing FRUs. For the host bay planar assembly, use the problem or the Repair Menu, Replace a FRU option. For the host bay drawer backplane, use step 9 on page 456.
   When the problem is repaired, go to step 10 on page 456.
4. The host bay is being sent power by at least one host bay drawer power supply. The host bay may not be receiving bulk voltage, or the host bay planar assembly may not be making logic voltages properly. Do the following until the problem is repaired:
   - Check for damaged auto-docking power connectors and proper host bay seating. Use step 6 on page 455.



   - Replace one or more of the possible failing FRUs. For the host bay planar assembly, use the problem or the Repair Menu, Replace a FRU option. For the host bay drawer backplane, use step 9 on page 456.
   When the problem is repaired, go to step 10 on page 456.
5. The host bay is able to power on properly, but it is not able to communicate with either cluster through the local and remote CPI interfaces. Replace the host bay planar assembly using the problem or the Repair Menu, Replace a FRU option. When the problem is repaired, go to step 10 on page 456.
6. Inspect and reseat the host bay auto-docking power connectors. Do the following:
   a. Quiesce the failing host bay. Use: Utility Menu, Resource Management Menu, Quiesce a Resource.
   b. Power off the failing host bay. Use: Utility Menu, Host Bay Power Off/On, Power Off a Host Bay.
   c. Slide the host bay out to the service position and inspect the auto-docking power connector contacts. Was any damage found?
      - Yes, go to step 4 on page 454 and continue with replacing the damaged FRU.
      - No, go to step 4 on page 454 and continue with the remaining actions.
7. An overcurrent condition can cause both power supplies to switch off their output power to the failing host bay. To check for an overcurrent condition:
   a. Quiesce the failing host bay. Use: Utility Menu, Resource Management Menu, Quiesce a Resource.
   b. Power off the failing host bay. Use: Utility Menu, Host Bay Power Off/On, Power Off a Host Bay.
   c. Slide the host bay out just far enough that it is unlatched from the auto-docking power connector. Leave it in this position.
   d. Power on the host bay. Use: Utility Menu, Host Bay Power Off/On, Power On a Host Bay.
   e. Observe the HA LED for the failing host bay on both host bay drawer power supplies. Are both HA LEDs still off?
      - Yes, there is no overcurrent condition caused by the host bay:
        1) Power off the host bay. Use: Utility Menu, Host Bay Power Off/On, Power Off a Host Bay.
        2) Push the host bay in fully.
        3) Power on the host bay. Use: Utility Menu, Host Bay Power Off/On, Power On a Host Bay.
        4) Return to step 3 on page 454 and continue with the remaining actions.
      - No, one or both HA LEDs are now on, so one of the FRUs listed below is creating an overcurrent condition:
        1) Power off the host bay. Use: Utility Menu, Host Bay Power Off/On, Power Off a Host Bay.
        2) Replace or temporarily remove the FRU or FRUs until the host bay powers on successfully. Use: Utility Menu, Host Bay Power Off/On.


           - Host bay planar assembly
           - Host bay drawer backplane
           - Host adapter card
           - Host bay drawer power supply
        3) Replace the failing FRU.
        4) When the problem is repaired, go to step 10.
8. One of the power supplies may have a problem with the output to the failing host bay that is preventing the other power supply from powering on. Do the following:
   a. Observe the HA LED on both host bay drawer power supplies for the working host bay.
      - If one of the HA LEDs is off, replace the host bay drawer power supply that has both HA LEDs off. Use Repair Menu, Replace a FRU, then return to the beginning of this MAP to fix the remaining problem.
      - If the HA LEDs for the working host bay are on for both host bay power supplies, continue with step 8b.
   b. Remove the input power cables from one of the two power supplies. Then slide that power supply part way out so it is no longer plugged into the host bay drawer backplane. The cooling fans on the other power supply will speed up. Attempt to power on the host bay. Use: Utility Menu, Host Bay Power Off/On, Power On a Host Bay.
      - If the other power supply HA LED comes on, the power supply that is removed is failing and must be replaced.
      - If the other power supply HA LED does not come on, reinstall the power supply and connect the input power cables. Wait one minute for the cooling fans to return to normal speed, then repeat this procedure for the other power supply. If it still fails, return to step 3 on page 454 and continue with the remaining actions.
9. Replace the host bay drawer backplane using MAP 4850: Repair the Host Bay Drawer on page 458. To replace all other FRUs, use the Repair Menu, Replace a FRU option. After the repair is complete, continue with the next step.
10. The problem is repaired. Do the following to verify that the host bay is available for customer use. If any problems are found, return to the start of this MAP.
   a. Verify that all the HA LEDs are on for both host bay power supplies.
   b. Verify that the failing host bay was resumed successfully. Use: Utility Menu, Resource Management Menu, Show Quiesced Resources, Show Fenced Resources. If required, resume the host bay. Use: Utility Menu, Resource Management Menu, Resume a Resource.
   c. Close all associated problems. Use: Repair Menu, Close a Previously Repaired Problem.
   d. Run the Repair Menu, End Of Call option.
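Table 73 in step 2 of MAP 4810 can be restated as a lookup function. This is a hypothetical sketch for clarity only. The combination of one or more planar LEDs on with both HA LEDs off does not appear in Table 73, so the sketch reports it rather than guessing an action.

```python
# Hypothetical encoding of Table 73 (MAP 4810, step 2). Inputs: how many of the
# four host bay planar LEDs are on, and whether the failing host bay's HA LED
# is on for each of the two host bay drawer power supplies.

def table73_action(planar_leds_on: int, ha_on_ps1: bool, ha_on_ps2: bool) -> str:
    if not 0 <= planar_leds_on <= 4:
        raise ValueError("the host bay planar has four LEDs")
    any_ha_on = ha_on_ps1 or ha_on_ps2
    if planar_leds_on == 0 and not any_ha_on:
        return "step 3"  # neither supply is powering the host bay
    if planar_leds_on == 0 and any_ha_on:
        return "step 4"  # power is sent but not reaching the planar logic
    if planar_leds_on > 0 and any_ha_on:
        return "step 5"  # powered on, but no CPI communication
    return "not in Table 73"
```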

MAP 4820: Isolating a SCSI Card Configuration Timeout


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
The first SCSI card firmware load attempt did not complete, which created the problem that sent you here. That failure should have caused a reset and a second firmware load attempt. If the card status is available, the second firmware load attempt was successful.

Isolation
1. Repair any other problems for this SCSI card. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair. Were any other problems for this SCSI card repaired?
   - Yes, retry the firmware update load process. If it still fails, call the next level of support.
   - No, continue with the next step.
2. Read the Description section above. Determine whether SCSI card status is available. From the service terminal Main Service Menu, select: Utility Menu, Show Storage Facility Resources Menu, Show Storage Facility Resources. Use the left column to find the Engineering FRU Name listed in the problem and determine its status. Is the status available?
   - Yes, continue with the next step.
   - No, call the next level of support.
3. Close the problem. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem.

MAP 4840: CPI Diagnostic Communication Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CPI diagnostics are run from both clusters to each host bay. The clusters communicate with each other through the cluster to cluster Ethernet connection.
Note: The problem may list the failing resource as a CPI interface. The CPI interface shown is the one that was being tested when the communication failure occurred; it is not the actual failing resource.

Isolation
1. Test the cluster to cluster communications through the Ethernet connection. From the service terminal Main Service Menu, select: Machine Test Menu, External Connections Menu, Cluster-Cluster Communications Test. Did the test pass?
   - Yes, continue with the next step.
   - No, go to MAP 4410: Cluster to Cluster Ethernet Communication Test on page 403.
2. The communication failure stopped the diagnostics before all of the CPI interfaces were tested.
3. Has the customer been using this 2105 Model 800 since the problem was logged?
   - Yes, show and repair any related CPI problems. If there are none, use the Repair Menu, Close a Previously Repaired Problem option for the problem that sent you here. Then exit this MAP and go to MAP 1500: Ending a Service Action on page 67.
   - No, continue with the next step.
4. You can run the CPI diagnostics in two ways:
   a. Power the 2105 Model 800 off and on again. This tests all four CPI interfaces.
   b. Quiesce and resume a cluster, and then each host bay. This tests each CPI interface one at a time. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource.
   Then display problems needing repair and look for new related problems. From the service terminal Main Service Menu, select: Repair Menu, Show / Repair Problems Needing Repair.

MAP 4850: Repair the Host Bay Drawer


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
To replace the host bay planar FRU, a special procedure must be followed. The service terminal Replace a FRU option does not offer this FRU.

Procedure
Attention: This procedure requires taking both host bays and one of the clusters away from customer use.
1. Determine whether a cluster is fenced. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Show Fenced Resources. Is the cluster fenced?
   - Yes, there should be a problem for it. Repair that problem first, then return here and continue with the next step.
   - No, continue with the next step.



2. The quiesces and resumes in this and the following steps must be done in the listed order. Quiesce the cluster that is above the host bay drawer to be repaired. From the service terminal Main Service Menu, select: Utility Menu, Resource Management Menu, Quiesce a Resource.
3. Quiesce both host bays in the host bay drawer being repaired. Use the same Quiesce a Resource option as for the cluster.
4. Quiesce both host bay drawer power supplies. Use the same Quiesce a Resource option as for the cluster.
5. Switch off each host bay drawer power supply (the switch is near the LED indicators).
6. Unplug both input power cables on each power supply.
7. Replace the host bay drawer FRU or FRUs using the procedure in FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. When complete, return here and continue.
8. Plug in the input power cables on each power supply and switch on each power supply.
9. Resume both host bays.
10. Resume the cluster.
11. Return to the procedure that sent you here.

MAP 4870: Host Bay Power On Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A host bay will not power on.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Do the following checks before continuing:
- Verify that the host bay drawer power supply power switch is set to on (up) on both power supplies.
- Display and repair any problems for the RPC cards or host bay drawer power.
- Verify that the host bay drawer power supply power input cables and power control cables are correctly plugged.
- Verify that the host bay drawer power control cables are correctly plugged into RPC card connector J2-9 (host bays 1 and 2) or J2-13 (host bays 3 and 4).
- Slide the host bay in and out a few times to verify that the connector contacts are clean.
- Attempt to power on the host bay. If it fails, continue with the next step.
2. Determine where the host bay failed to power on from:


- Failed to power on from the service terminal: continue with the next step.
- Failed to power on from the rack operator panel Local Power switch: go to step 7.
3. Attempt to power on again with the service terminal connected to the same cluster. Use the Utilities Menu, Host Bay Power Menu options to power on and off, so that you do not have to repeat the original repair action. Did the host bay power on?
- Yes, no longer failing. Continue with the original service action.
- No, continue with the next step.
4. Attempt to power on again with the service terminal connected to the other cluster. Did the host bay power on?
- Yes, continue with the next step.
- No, the host bay power on is failing from both clusters. The power problem may also prevent powering on with the operator panel Local Power switch. Go to step 7.
5. Connect the service terminal back to the original cluster and attempt to power off the host bay. Did the host bay power off?
- Yes, continue with the next step.
- No, the cluster is having problems communicating with both RPC cards. Use the service terminal Replace a FRU menu. The most probable failing FRU is the I/O drawer planar assembly.
6. Attempt to power on again with the service terminal connected to the same cluster. Did the host bay power on?
- Yes, no longer failing. Continue with the original service action.
- No, the cluster is able to communicate with the RPC cards but is not able to send the power on command. The most probable failing FRU is the I/O drawer planar assembly.
7. Attempt to power on the host bay. Press the rack operator panel Local Power switch to the on position (up), then release it. Observe the host bay drawer power supply HA output LED indicators.
Note: The HA1 LED indicator is for host bay 1 or 3; the HA2 LED indicator is for host bay 2 or 4.
Did the HA LED indicator come on solid?
- Yes, no longer failing. Continue with the original service action, or go to step 5 to test powering off and on from the service terminal.
- No, continue with the next step. Put the host bay into the service position to disconnect it from the power supplies. Attempt to power on the host bay. If it powers on, replace the host bay planar FRU. If it does not power on, continue with the next step.
8. Attempt to power on the host bay while observing the primary power supply digital status indicator displays (between the front cooling fans). Are codes displayed?
- Yes, the RPC cards are receiving a power on signal and sending it to the PPS. Continue with the next step.



- No, the RPC cards are not receiving a power on signal. Go to step 10.
9. Only one working RPC card is required to switch on the host bay drawer power supply output to a host bay. The output switches on even if the host bay is not plugged in to the host bay drawer. The most probable FRUs are:
- Host bay drawer power supplies (replace them one at a time)
- Signal cable, RPC to host bay drawer power supply
- RPC card
10. Replace the rack operator panel On/Off card. This can be done concurrently; there is no service boundary to quiesce or set in service mode for this FRU. When the FRU is replaced, power on the rack: press the rack operator panel Local Power switch to the on position (up), then release it.
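The connector and LED references used in this MAP can be collected into a small lookup, which may help when preparing checklists. The following Python sketch is illustrative only and is not part of the 2105 service code; the function name `host_bay_refs` is hypothetical, and the mapping is taken solely from steps 1 and 7 above (RPC connector J2-9 for host bays 1 and 2, J2-13 for host bays 3 and 4; HA1 for host bay 1 or 3, HA2 for host bay 2 or 4).

```python
def host_bay_refs(bay: int) -> tuple:
    """Return (RPC power control connector, HA output LED) for a host bay.

    Hypothetical helper; the mapping comes only from MAP 4870 steps 1 and 7.
    """
    if bay not in (1, 2, 3, 4):
        raise ValueError(f"host bay must be 1 to 4, got {bay}")
    # Host bays 1 and 2 share RPC connector J2-9; bays 3 and 4 share J2-13.
    connector = "J2-9" if bay in (1, 2) else "J2-13"
    # HA1 reports host bay 1 or 3; HA2 reports host bay 2 or 4.
    led = "HA1" if bay in (1, 3) else "HA2"
    return connector, led


if __name__ == "__main__":
    for bay in range(1, 5):
        print(bay, *host_bay_refs(bay))
```

Note that the connector groups bays by drawer position (1/2 versus 3/4), while the LED groups them by odd versus even bay number, so the two lookups are deliberately independent.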

MAP 4880: Cluster Power On Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The CEC drawer and I/O drawer have three power states:
- Powered off: This occurs only when the 2105 is powered off. The drawer power LED indicator is off.
- Standby power mode: This occurs when the cluster has been powered off for service. The CEC drawer and I/O drawer power LED indicators blink slowly. The CEC drawer operator panel displays OK.
- Powered on: This is the normal mode when the 2105 is powered on. The drawer power LED indicator is on solid.
When the 2105 is powered on, the drawers first receive standby power from the drawer power supplies, and the drawer power LED indicators blink slowly. The service processor in the I/O drawer and the System Power Control Network (SPCN), including the fan controller card in both drawers, are operational. Once the CEC drawer operator panel displays OK, standby power mode is complete. This state is signalled to both RPC cards through the cables connected to the I/O drawer RJ45 connectors 2 and 3 (drawer front right). The RPC cards automatically send back an I/O drawer power on signal. The I/O drawer then powers on completely and signals the CEC drawer to also power on completely.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

MAP 4880 Section-1


General inspection
1. Were you sent here from Chapter 9 by an SPCN firmware code of 1011 1C0x?
- Yes, go to MAP 4885: SPCN Load Fault Firmware Error Code on page 468.
- No, continue with the next step.
2. Verify that the 2105 is powered on.
3. Observe the CEC drawer operator panel.



Is the CEC drawer operator panel hung displaying a code?
- Yes, continue with the next step.
- No, go to step 5.
4. Observe the CEC drawer operator panel. Is the CEC drawer operator panel hung displaying OK?
- Yes, continue with the next step.
- No: if the repair information for the displayed code sent you to this MAP, continue at the next step. If you have not yet used the displayed code, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
5. Connect the service terminal to the working cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a related problem for the RPC cards, CEC drawer power, or I/O drawer power?
- Yes: if you used that problem to get to this MAP, continue with the next step. If you have not yet used that problem, you can exit this MAP and use that problem to perform the repair.
- No, continue with the next step.
6. Verify that the cables are correctly connected to:
Figure 140. CEC Drawer Bulkhead Connector Locations (s009527). Front view; connectors shown: Q1 (V/S Comm), Q2 (RIO-1), Q3 (RIO-0), Q4 (JTAG), Q7 and Q8 (SCSI signal and SCSI power); Fan 7 and Fan 8.



Figure 141. I/O Drawer Bulkhead Connector Locations (s009526). Connectors shown: Media Power (CEC drawer SCSI devices), RIO 0, RIO 1, OP (CEC drawer operator panel), P1, P2, P3, 10-100 (Ethernet), Q1 (V/S Comm), Q4, Q7 (CEC drawer SCSI signal), R1 (JTAG), S1 to S4, J11, J14, J15, J16, Debug (not used), and a position marked No Use.

- CEC drawer connectors for RIO-0, RIO-1, V/S COMM, and JTAG (four horizontal cables at bottom front of drawer)
- I/O drawer connectors for RIO-0, RIO-1, V/S COMM, JTAG, RJ45 card connectors 1/2/3, and connector J14 (drawer front lower right)
- I/O drawer power control cables are correctly plugged into the I/O drawer RJ45 connectors P2 and P3 (drawer front middle right).
- I/O drawer power control cables are correctly plugged into each RPC card: RPC card connector J2-11 (for cluster 2) or J2-15 (for cluster 1).

Figure 142. RPC Card J2 Connector Locations (s009583). 2105 Model 800, rear view; top view of J2 connectors 1 to 16 on RPC 1 and RPC 2.

Are the cables correctly connected?
- Yes, continue with the next step.


- No, exit this MAP. Verify that the cluster is powered off (CEC and I/O drawer power indicators off), reconnect the cable or cables, and power on the cluster. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432.
7. Observe the I/O drawer power green LED indicator at the upper left of the CEC drawer operator panel. Use the condition that applies:
- Off (no power): go to MAP 4880 Section-2.
- Blinking slowly (standby power): go to MAP 4880 Section-3 on page 465.
- On solid (powered on): continue with the next step.
8. Observe the CEC drawer power green LED indicator (drawer front lower left). Use the condition that applies:
- Off: go to MAP 4880 Section-4 on page 466.
- Blinking slowly: go to MAP 4880 Section-5 on page 467.
- On solid: both drawers of the cluster are powered on. Exit this MAP and return to the procedure that sent you here.
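The routing in steps 7 and 8 is a two-level decision on the I/O drawer and CEC drawer power LED indicators. The following Python sketch summarizes it for reference only; it is not part of the 2105 service code, and the function name `section1_route` and the LED state labels ("off", "blinking", "solid") are hypothetical.

```python
def section1_route(io_led: str, cec_led: str = "") -> str:
    """Return where MAP 4880 Section-1 steps 7 and 8 send you.

    io_led and cec_led use hypothetical labels: "off", "blinking", or "solid".
    """
    # Step 7: the I/O drawer LED decides first.
    io_routes = {"off": "MAP 4880 Section-2", "blinking": "MAP 4880 Section-3"}
    if io_led in io_routes:
        return io_routes[io_led]
    if io_led != "solid":
        raise ValueError(f"unknown I/O drawer LED state: {io_led!r}")
    # Step 8 applies only once the I/O drawer LED is on solid.
    cec_routes = {
        "off": "MAP 4880 Section-4",
        "blinking": "MAP 4880 Section-5",
        "solid": "Exit this MAP",
    }
    if cec_led not in cec_routes:
        raise ValueError(f"unknown CEC drawer LED state: {cec_led!r}")
    return cec_routes[cec_led]
```

The CEC drawer LED is only consulted after the I/O drawer reports powered on, which mirrors the power-on sequence in the Description: the I/O drawer powers on first and then signals the CEC drawer.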

MAP 4880 Section-2


The I/O drawer power indicator is off. The I/O drawer did not reach standby power mode.
1. Observe the I/O drawer power supply input power LED indicators (next to the input power connectors). Are any of the power supply input power LED indicators lit?
- Yes, continue with the next step.
- No, go to MAP 2800: CEC or I/O Drawer Visual Power Supply Problem on page 171.
2. Press the CD-ROM drive eject button (front of CEC drawer). Does the CD-ROM tray open?
- Yes, the I/O drawer is powered on. There is a problem with the LED indicator or the circuits that drive it. Verify that the CEC drawer operator panel cable is connected to the operator panel and to I/O drawer connector OP. See the figures in step 6 on page 462. Replace the following FRUs until the problem is fixed. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs:
  - CEC drawer operator panel
  - CEC drawer operator panel cable
  - I/O drawer planar assembly
- No, continue with the next step.
3. The I/O drawer has input power and should automatically be in standby power mode (I/O drawer power LED indicator blinking slowly). If the cluster had received a power on request, the drawer would power on (I/O drawer power LED indicator on solid). Do one of the following:
- Go to the next step to attempt to display an error on the CEC drawer operator panel. If that is not successful, you will return here and replace the listed FRUs.
- Replace the following FRUs until the problem is fixed. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs:



  - I/O drawer fan controller card
  - I/O drawer planar assembly
  - CEC drawer operator panel
  - CEC drawer operator panel cable
4. Reset the drawer power system and attempt to display an error code on the CEC drawer operator panel:
- On both CEC drawer power supplies, set the power switch to off (down).
- On both CEC drawer power supplies, unplug the input power cables (four total).
- On both I/O drawer power supplies, set the power switch to off (down).
- On both I/O drawer power supplies, unplug the input power cables (four total).
- Unplug the cables from the I/O drawer RJ-45 card connectors 2 and 3 (drawer front lower right). (These cables come from the RPC cards; when they are plugged in, they cause the I/O drawer to power on instead of stopping at standby power mode.)
- Reconnect the CEC drawer power supply input power cables (before reconnecting the I/O drawer power supply input power cables).
- Reconnect the I/O drawer power supply input power cables.
- Observe the CEC drawer operator panel. If progress codes are displayed and then stop with OK, the I/O drawer is in standby power mode (good condition). Reconnect the RJ45 cables to the correct connectors and go to MAP 4880 Section-3. If the display hangs with a progress or error code displayed, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If no codes are displayed, return to step 3 on page 464 and replace the listed FRUs until the problem is repaired.

MAP 4880 Section-3


I/O drawer power LED indicator is blinking slowly (standby power mode).
1. When the cluster first failed to power on, did you use the Repair Menu, Alternate Cluster Repair, Power On the Alternate Cluster menu option?
- Yes, go to step 4.
- No, continue at the next step.
2. Connect to the working cluster and use the Repair Menu, Alternate Cluster Repair, Power On the Alternate Cluster menu option to attempt to power on the cluster. Did the power on request complete with good status?
- Yes, go to step 4.
- No. The code in the working cluster should have been able to start the cluster power on process successfully, even if the cluster power on did not complete. This indicates that the code in the working cluster may not be in a normal state. To reduce the chance of an unexpected customer event, contact the next level of support. After the cause of the failed power on request is determined or corrected, continue at the next step.
3. Attempt to power on the I/O drawer. Press the rack operator panel local power control switch to on (up), then release it.
4. Observe the CEC drawer operator panel. Is it hung with a code displayed?



- Yes: if you have already been to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371 for the displayed code, continue at the next step. If you have not, go there now.
- No, continue with the next step.
5. Observe the I/O drawer power green LED indicator at the upper left of the CEC drawer operator panel. Is the LED indicator on solid?
- Yes, the I/O drawer has powered up. Continue with the next step.
- No, go to MAP 4880 Section-1 step 7 on page 464.
6. Observe the CEC drawer power LED indicator (drawer front lower left). Is the CEC drawer power LED indicator on solid?
- Yes, the CEC drawer has powered on. Both drawers are powered on; exit this MAP and return to the procedure that sent you here.
- No, go to MAP 4880 Section-1 on page 461.
7. Verify that the RPC cards are receiving the power on request from the rack operator panel. Press the rack operator panel local power control switch to on (up), then release it. Observe the primary power supply digital status indicators (between the front cooling fans). Are status codes displayed on at least one of the two primary power supplies?
- Yes, the RPC is working correctly (only one working RPC is needed to power on the cluster). Continue with the next step.
- No, the rack has a power on problem. Go to MAP 2020: Isolating Power Symptoms on page 112.
8. Replace the following FRUs until the problem is fixed. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs:
- I/O drawer planar assembly
- I/O drawer fan controller card
- RPC cards
- RPC to I/O drawer cables (connected to RJ45 connectors J2 or J3)

MAP 4880 Section-4


CEC drawer does not reach standby power mode.
1. Observe the CEC drawer power supply input power LED indicators (next to the input power connectors). Are any of the power supply input power LED indicators lit?
- Yes, continue with the next step.
- No, go to MAP 2800: CEC or I/O Drawer Visual Power Supply Problem on page 171.
2. Do the following to attempt to bring the I/O drawer back to standby power mode:
- On both I/O drawer power supplies, set the power switch to off (down).
- On both I/O drawer power supplies, unplug the input power cables (four total).
- Unplug the cables from the RJ-45 card connectors 2 and 3. (These cables come from the RPC cards and cause the I/O drawer to power on instead of stopping at standby power mode.)
- Reconnect the power supply input power cables.



- Observe the CEC drawer operator panel. If progress codes are displayed and then stop with OK, the I/O drawer is in standby power mode (good condition). Reconnect the RJ45 cables to the correct connectors and go to MAP 4880 Section-5. If the display hangs with a progress or error code displayed, reconnect the RJ45 cables and then go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If no codes are displayed, continue with the next step.
3. There may be an overcurrent condition caused by the processor or memory FRUs. Repeat MAP 4880 Section-4 on page 466, unplugging the processor card and both memory cards before reconnecting the power supply input power cables.
- If the CEC drawer power LED indicator blinks slowly, one of the removed FRUs is drawing an overcurrent and must be replaced. Reconnect the RJ45 card cables to connectors 2 and 3, and use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
- If the CEC drawer power LED indicator stays off, reconnect the RJ45 card cables to connectors 2 and 3. One of the following FRUs is failing. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRU or FRUs:
  - CEC drawer fan controller card
  - CEC drawer planar assembly
  - I/O drawer planar assembly
  - CEC drawer operator panel
  - CEC drawer operator panel cable

MAP 4880 Section-5


CEC drawer power LED indicator is blinking slowly (standby power mode). The CEC drawer receives a power on signal from the I/O drawer through the System Power Control Network (SPCN) cables (V/S Comm and JTAG cables).
1. When the cluster first failed to power on, did you use the Repair Menu, Alternate Cluster Repair, Power On the Alternate Cluster menu option?
- Yes, go to step 4.
- No, continue at the next step.
2. Connect to the working cluster and use the Repair Menu, Alternate Cluster Repair, Power On the Alternate Cluster menu option to attempt to power on the cluster. Did the power on request complete with good status?
- Yes, go to step 4.
- No. The code in the working cluster should have been able to start the cluster power on process successfully, even if the cluster power on did not complete. This indicates that the code in the working cluster may not be in a normal state. To reduce the chance of an unexpected customer event, contact the next level of support. After the cause of the failed power on request is determined or corrected, continue at the next step.
3. Attempt to power on the CEC drawer. Press the rack operator panel local power control switch to on (up), then release it.
4. Observe the CEC drawer operator panel. Is it hung with a code displayed?
- Yes: if you have already been to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371 for the displayed code, continue at the next step. If you have not, go there now.
- No, continue with the next step.
5. Observe the CEC drawer power LED indicator. Is the LED indicator on solid?
- Yes, the CEC drawer has powered up. Exit this MAP and return to the procedure that sent you here.
- No, one of the following FRUs is failing. Replace the following FRUs until the problem is fixed. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs:
  - CEC drawer fan controller card
  - CEC drawer planar assembly
  - I/O drawer planar assembly
  - V/S Comm cable
  - JTAG cable
  - CEC drawer power supply

MAP 4885: SPCN Load Fault Firmware Error Code


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An SPCN firmware code of 1011 1C0x is being reported, which indicates that one of the CEC drawer power supplies is reporting an overcurrent.

Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.

MAP 4885 Section-1


Remove cards
1. Quiesce and power off the failing cluster. Use the Repair Menu, Alternate Cluster Repair Menu options.
2. Unplug the power input cables from both CEC drawer power supplies (four cables total).
3. Remove the following FRUs:
- CEC processor card
- Memory riser card 1
- Memory riser card 2
4. Reconnect the power input cables to the CEC drawer power supplies (four cables total).
5. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?



- Yes, go to MAP 4885 Section-2.
- No, go to MAP 4885 Section-7 on page 470.

MAP 4885 Section-2


Test CEC drawer power supply 2.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Disconnect the power input cables from the CEC drawer power supplies.
3. Remove CEC drawer power supply 2 (Tx-U1.1-V2, the left power supply when viewed from the rear of the rack).
4. Reconnect the power input cables to CEC drawer power supply 1.
5. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, go to MAP 4885 Section-4.
- No, go to MAP 4885 Section-3.

MAP 4885 Section-3


Reinstall CEC drawer power supply 2.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Reinstall CEC drawer power supply 2.
3. Reconnect the power input cables to CEC drawer power supply 2.
4. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, replace CEC drawer power supply 2 and reinstall the processor and memory cards that were removed earlier.
- No, the symptom has changed or is no longer failing. Power off the cluster and reinstall the cards that were removed. Power on the cluster. If the cluster hangs at 1011 1C0x, return to the beginning of this MAP. If the cluster does not hang, exit this MAP and return to the service procedure that sent you here.

MAP 4885 Section-4


Test CEC drawer power supply 1.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Reinstall CEC drawer power supply 2.
3. Reconnect the power input cables to CEC drawer power supply 2.
4. Disconnect the power input cables from CEC drawer power supply 1 (Tx-U1.1-V1, the right power supply when viewed from the rear of the rack).
5. Remove CEC drawer power supply 1.
6. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, go to MAP 4885 Section-5.
- No, replace CEC drawer power supply 1 (location U1.1-V1) and reinstall the processor and memory cards that were removed earlier.

MAP 4885 Section-5


Test CEC drawer fan 7.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Reinstall CEC drawer power supply 1.
3. Reconnect the power input cables to CEC drawer power supply 1.


4. Unplug the CEC drawer fan 7 connector at the front of the CEC drawer. Disregard a CEC fan error code if it occurs during this step.
5. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, go to MAP 4885 Section-6.
- No, replace CEC drawer fan 7 (location U1.1-F7) and reinstall the processor and memory cards that were removed earlier.

MAP 4885 Section-6


Test CEC drawer fan 8.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Reconnect the CEC drawer fan 7 connector that was unplugged in MAP 4885 Section-5 on page 469.
3. Unplug the CEC drawer fan 8 connector at the front of the CEC drawer. Disregard a CEC fan error code if it occurs during this step.
4. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, reconnect the CEC drawer fan 8 connector and then replace the CEC drawer planar assembly (location U1.1-P1) using MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. (MAP 4700 has additional FRU manual verification checkouts.) At the same time, reinstall the processor and memory cards that were removed earlier.
- No, replace CEC drawer fan 8 (location U1.1-F8) and reinstall the processor and memory cards that were removed earlier.

MAP 4885 Section-7


Test removed cards.
1. Power off the cluster using the Alternate Cluster Repair Menu option.
2. Disconnect the power input cables from the CEC drawer.
3. To isolate the failing card, reinstall and test one card at a time in the sequence listed:
- Memory riser card 1
- Memory riser card 2
- CEC processor card
4. Reconnect the power input cables to the CEC drawer.
5. Power on the cluster using the Alternate Cluster Repair Menu option. Is the error code 1011 1C0x displayed?
- Yes, the last card you installed is defective. Replace it and reinstall the remaining cards that were removed.
- No, continue to the next step.
6. Have all the cards that were removed in MAP 4885 Section-1 on page 468 been reinstalled?
- Yes, replace the CEC drawer planar assembly using MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. (MAP 4700 has additional FRU manual verification checkouts.)
- No, install and test the next card in the list.
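Section-7 isolates the failing card by adding cards back one at a time and stopping at the first one that brings the 1011 1C0x error back. The following Python sketch models that loop for illustration only; `isolate_card` is a hypothetical name, and the `error_present` callback stands in for the physical steps "power on the cluster and check whether 1011 1C0x is displayed".

```python
from typing import Callable, Sequence

# Reinstall order from MAP 4885 Section-7, step 3.
REINSTALL_ORDER = ("Memory riser card 1", "Memory riser card 2", "CEC processor card")


def isolate_card(error_present: Callable[[Sequence[str]], bool]) -> str:
    """Reinstall cards one at a time and return the FRU to replace.

    error_present is a hypothetical callback: given the cards currently
    installed, it reports whether 1011 1C0x is displayed at power on.
    """
    installed = []
    for card in REINSTALL_ORDER:
        installed.append(card)
        if error_present(tuple(installed)):
            # Step 5, Yes: the last card installed is defective.
            return card
    # Step 6, Yes: all cards are back and the error did not return.
    return "CEC drawer planar assembly"
```

Because the cards are added cumulatively, a card that only fails in combination with an earlier one is still reported as the last card installed, which is the same conclusion step 5 draws.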


MAP 4890: Replacing a CEC or I/O Drawer Power Supply


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
See isolation below.

Isolation
1. THE CLUSTER WILL POWER OFF IF A POWER SUPPLY IS REMOVED FOR MORE THAN FOUR MINUTES. When a power supply is removed, a power supply (working or not) must be reinstalled within four minutes.
Note: The cluster firmware checks whether the power supply FRUs are physically installed. If it detects a missing power supply, it waits four minutes and then powers off the cluster. A missing power supply disrupts the cooling airflow, which can cause components to overheat. As long as the firmware detects an installed power supply, it will not power off the cluster, whether or not that power supply is working.
2. Note the power supply location code displayed in the problem. Use the Repair Menu, Replace a FRU option to replace the CEC or I/O drawer power supply.
Note: When using the Replace a FRU option, the CEC and I/O drawer power supplies are listed under the rsrack1 container.
3. If the repair was successful, the problem is closed automatically. Use the Repair Menu, End of Call Status option to complete the service action.
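The four-minute rule in step 1 can be modelled as a timer that starts when a supply is removed and is cleared when any supply, working or not, is reinstalled. The following Python sketch is a simplified illustration, not the cluster firmware: the function name `power_off_time`, the event format, the assumption that reinstallation resets the timer, and the choice to treat a reinstall at exactly four minutes as in time are all assumptions.

```python
from typing import Iterable, Optional, Tuple

MISSING_LIMIT_S = 4 * 60  # four minutes, per step 1 of MAP 4890


def power_off_time(events: Iterable[Tuple[float, bool]]) -> Optional[float]:
    """Model the missing-power-supply check (illustrative assumption only).

    events: (time in seconds, supply physically installed?) pairs in time order.
    Returns when the modelled firmware powers off the cluster, or None if the
    supply was always reinstalled in time. Only presence is modelled, not
    whether the supply works.
    """
    missing_since = None
    for t, installed in events:
        if missing_since is not None and t - missing_since > MISSING_LIMIT_S:
            # The four-minute window expired before this event occurred.
            return missing_since + MISSING_LIMIT_S
        if installed:
            missing_since = None  # assumed: any installed supply clears the timer
        elif missing_since is None:
            missing_since = t  # supply just went missing; start the timer
    if missing_since is not None:
        # Still missing at the end of the events: assume it stays missing.
        return missing_since + MISSING_LIMIT_S
    return None
```

For example, removing a supply at 0 s and reinstalling it at 200 s returns None, while reinstalling it at 300 s returns 240, the moment the modelled cluster powers off.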

MAP 4960: ESC 5500 Isolation


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An error was recorded against one or more FRUs with a location code that was not properly recognized by the maintenance package. Additional action is required to determine the correct 2105 FRU location.

Isolation
1. Display the problem details screen that sent you here.
2. Observe the FRU Location field in the Possible FRUs list. Is the location code U0.1-V?
- Yes, go to MAP 40E0: Only One I/O Drawer Power Supply Detected on page 349.
- No, continue with the next step.
3. Is there another FRU listed in the same problem with a valid 2105 FRU location?
Note: A FRU with a valid location will not have n/a in the Engineering FRU Name, FRU Name, and Likely to Fix fields.
- Yes, ignore the unrecognized FRU and go to step 5 on page 472.
- No, continue with the next step.
4. Determine the 2105 FRU location as follows:


a. Note the FRU Location and Failing Cluster from the problem details, as shown in the following example:
    # Engineering FRU Name: n/a
    # FRU Name: n/a
    # Likely to fix: n/a
    # FRU Location and/or FRU Error Code: U0.1-P1/I4
    # ESC 5500: One or more of the FRU entries listed have a FRU that cannot
    # be fully identified. Call your Next level of support for assistance to . . .
    # Failing Cluster ..........= 1

Figure 143. Example of Problem Details Report (s009716)

b. Determine the 2105 FRU location by prefixing the location code with the cluster location; for example, T1-U0.1-P1-I4.
Note: A description of the location codes is provided in Location Codes in Chapter 7 of Volume 3.
5. Determine whether additional isolation actions, information, or failing function codes are provided for the SRN and FRU Error Code, if listed. Look up the SRN and the FRU Error Code (if listed) in Error Messages, Diagnostic Codes, and Service Reports in Chapter 9 of Volume 3.
6. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace any cluster FRUs.
Note: If the FRU location indicates a PCI slot (the last FRU location characters are /Ix, where x is the PCI slot number), replace the card in that slot. If that does not resolve the problem, contact your next level of support before replacing the I/O planar assembly.
7. If the above actions do not indicate a failing FRU or resolve the problem, call your next level of support.
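The location-code handling in steps 4b and 6 can be sketched as two small string functions. This Python sketch is illustrative only: the function names are hypothetical, and the conversion rule (prefix "T" plus the Failing Cluster number, and replace "/" with "-") is inferred solely from the single example in this MAP (U0.1-P1/I4 with Failing Cluster 1 becomes T1-U0.1-P1-I4). Check Location Codes in Chapter 7 of Volume 3 before relying on it.

```python
import re
from typing import Optional

# A trailing "/Ix" marks a PCI slot, where x is the slot number (step 6 note).
PCI_SLOT = re.compile(r"/I(\d+)$")


def to_2105_location(fru_location: str, failing_cluster: int) -> str:
    """Build a 2105 FRU location from a problem-details FRU location.

    Assumed rule, inferred from one example: prefix "T<cluster>-" and
    replace "/" with "-".
    """
    return f"T{failing_cluster}-" + fru_location.replace("/", "-")


def pci_slot(fru_location: str) -> Optional[int]:
    """Return the PCI slot number if the location ends in /Ix, else None."""
    match = PCI_SLOT.search(fru_location)
    return int(match.group(1)) if match else None
```

With the Figure 143 values, `to_2105_location("U0.1-P1/I4", 1)` gives "T1-U0.1-P1-I4" and `pci_slot("U0.1-P1/I4")` gives 4, so the card in PCI slot 4 is the first candidate for replacement.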

MAP 4970: Isolating a Software Problem


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The 2105 Model 800 functional code detected a software problem that requires the next level of support to correct. Powering the cluster off and on or reloading the hard disk drive code will not fix it. The next level of support may ask you for the information displayed in one or more fields of the problem; this helps identify the specific problem and the actions needed to correct it. This MAP is also called if the 2105 code detects a LIC feature license failure; that problem is isolated by another MAP (see step 9).

Procedure
1. Use the table to find and repair the ESC listed in the problem.
Table 74. ESC Repairs
ESC              Go to:
1235             Step 2
1236             Step 3
1237             Step 4
1238             Step 5
1239             Step 6
2770, 2771       Step 7
380E, 380F       Step 8
3846 to 384F     Step 9 on page 474
All other ESCs   Call the next level of support.
Note: ESCs are defined in Chapter 9 of the service guide.

2. ESC 1235 - ODM out of synchronization
Call your next level of support and have them reference the following note:
Note: Only one resource with an out-of-sync ODM is listed in the Error/Problem, but there may be additional resources with ODM problems. The list of resources with out-of-sync ODM can be found in /var/adm/searas/tmp/rsodmcheck.log. See the following example:
    WARNING:The following ODM Errors were detected on Thu Apr 5 15:07:47 PDT 2001
    cpssvol37 DA20 SingleSide
    cpssvol37 rank0 25 12 164067840 10 0 0
    cpssvol38 DA20 SingleSide
    cpssvol38 rank0 25 13 177740160 10 0 0
    lss9 DA10 MisMatch
    lss9 FF25 25 0 0 00diff0-rcfg
    lss9 FF25 25 10 0 0 0 diff1-rcfg
    ********End of List ********
3. ESC 1236 - DDM background process to format, certify, or initialize failed
From the service terminal Main Service Menu, select: Repair Menu > Show Result of DDM Format / Resume Operation. Call your next level of support and have them reference the result. For problem determination, a PE package and SSA adapter dumps will be needed.
4. ESC 1237 - A resource was found fenced, but there is no related problem to repair it.
Call your next level of support.
5. ESC 1238 - Quiesced resources were found, and neither a service login nor an Automatic LIC is in progress.
The service representative needs to complete the service action that was started.
6. ESC 1239 - Excessive file system fragmentation on the cluster hard drive.
Call your next level of support.
7. ESC 2770 - One cluster has a defective NVS/IOA card
ESC 2771 - Both clusters have a defective NVS/IOA card on the same CPI interface
Go to MAP 41C0: ESC 2770 or 2771, Missing CPI Detected on page 362.
8. ESC 380E - IBM notification of warmstart failover
ESC 380F - IBM notification of warmstart
a. This problem is informational only and requires no repair.

473

MAP 4970: Software Problem


b. This problem was reported because the customer requested to be notified of warmstarts. (This reporting option can only be enabled/disabled by IBM Support.) c. IBM support will be notified of this event automatically if Call Home Reporting is enabled. If it is not enabled, you must call the next level of support to report this problem. d. Cancel this problem using the Utility Menu, Problem Log Menu, Cancel Problems by Selecting Problem IDs e. Display and repair any related problems for the cluster that warmstarted. 9. ESC 3846 - License failure, PPRC V2 disabled ESC 3847 - License failure, FlashCopy V2 disabled ESC 384A - License failure, Remote Flash Copy disabled ESC 384B - License failure, license out of sync on each cluster ESC 384C - License failure, PA disabled ESC 384D - License failure, XRC disabled ESC 384E - License failure, PPRC disabled ESC 384F - License failure, Flash Copy disabled Go to MAP 4990: LIC Feature License Failure on page 476.
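The routing in Table 74 is a plain lookup from ESC to step. As an illustration only, it can be written as the following Python sketch, assuming ESCs are handled as four-character hexadecimal strings such as 380E. The function name map_4970_step is hypothetical and is not part of the ESS service terminal or its LIC. Note that the range 3846 to 384F in Table 74 also covers 3848 and 3849, which step 9 does not list individually.

```python
# Hypothetical sketch of the Table 74 routing in MAP 4970; not an ESS tool.

# ESCs that Table 74 sends to a specific step.
ESC_TO_STEP = {
    "1235": 2,
    "1236": 3,
    "1237": 4,
    "1238": 5,
    "1239": 6,
    "2770": 7,
    "2771": 7,
    "380E": 8,
    "380F": 8,
}

# Table 74 sends the whole hexadecimal range 3846 to 384F to step 9.
LICENSE_ESC_RANGE = range(0x3846, 0x384F + 1)


def map_4970_step(esc: str) -> str:
    """Return the Table 74 action for an ESC such as "1236" or "384c"."""
    code = esc.strip().upper()
    if code in ESC_TO_STEP:
        return f"Go to step {ESC_TO_STEP[code]}"
    try:
        value = int(code, 16)
    except ValueError:
        value = -1  # not hexadecimal: falls through to "All other ESCs"
    if value in LICENSE_ESC_RANGE:
        return "Go to step 9"
    return "Call the next level of support"
```

For example, map_4970_step("384c") returns "Go to step 9", and any ESC not named in the table falls through to "Call the next level of support".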

MAP 4980: Customer Copy Services Problems


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The customer is experiencing problems or has asked for assistance with ESS Web Copy Services. One of the following conditions may be present:
• The customer is unfamiliar with managing Copy Services using the ESS Specialist.
• The customer wants help in managing Copy Services.
• ESS Web Copy Services is not properly configured.
• The customer has asked you to restart Copy Services.
• The customer is not seeing a complete LSS list at the host.

Procedure
Use the following table to help determine the action needed to resolve the customer's problem. Find the symptom in the table and then use the action to isolate and repair the problem.



Table 75. ESS Web Copy Services Problems

Symptom: The customer is unfamiliar with managing Copy Services with the ESS Specialist.
Action: Familiarize yourself with the use of the ESS Specialist Copy Services feature and do one of the following:
  • Instruct the customer on how to perform the necessary operations.
  • Use the ESS Specialist to manage Copy Services for the customer.
  • Instruct the customer to refer to the IBM TotalStorage Enterprise Storage Server Web Users Interface Guide, SC26-7346.

Symptom: The customer wants help managing Copy Services.
Action: Use the Copy Services SMIT screen option Copy Services Menu under the Configurations Options Menu in chapter 8 of Volume 3. Instruct the customer to refer to the IBM TotalStorage Enterprise Storage Server Web Users Interface Guide, SC26-7346.

Symptom: ESS Web Copy Services is not properly configured.
Action: Use "Configure Copy Services, with DNS" in chapter 6 of Volume 2, or "Configure Copy Services, without DNS" in chapter 6 of Volume 2.

Symptom: The customer has asked you to restart Copy Services.
Action: From the service terminal Main Service Menu, select: Configure Options Menu, Copy Services Menu, Copy Services Server Menu, Change Server Definitions. Then select one of the following:
  • Reset to Primary: restarts Copy Services with the primary server as the active server.
  • Reset to Backup: restarts Copy Services with the backup server as the active server.

Symptom: The customer is not seeing a complete LSS list at the host terminal.
Action: Do one of the following:
  1. If the customer has asked you to restart Copy Services, from the service terminal Main Service Menu, select: Configure Options Menu, Copy Services Menu, Copy Services Server Menu, Change Server Definitions. Then select one of the following:
     • Reset to Primary: restarts Copy Services with the primary server as the active server.
     • Reset to Backup: restarts Copy Services with the backup server as the active server.
  2. The network connecting the primary server to the backup server may be down. Ask the customer to check the network.
  3. The backup server may not be installed or configured. Has the backup server been installed?
     • Yes, the backup server is installed but may not be configured. Use "Configure Copy Services, with DNS" in chapter 6 of Volume 2, or "Configure Copy Services, without DNS" in chapter 6 of Volume 2.
     • No, the backup server needs to be installed. A new ESS subsystem needs to be installed, or a Copy Services MES needs to be ordered and installed on a currently installed backup server ESS subsystem.

This MAP has not been able to resolve your problem. Contact your next level of support.

MAP 4990: LIC Feature License Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Some LIC features require the customer to buy a license. The service representative enables such a feature by loading a customized diskette written for this 2105's serial number. If there is a mismatch, a problem is created whose ESC field identifies the feature that is disabled.


Procedure
1. Display the problem details screen and identify the ESC and the LIC feature that is disabled.
   • 384B - License Failure, license out of sync on each cluster: go to step 6.
   • 384C - License Failure, PAV disabled: go to step 2.
   • 384D - License Failure, XRC disabled: go to step 2.
   • 384E - License Failure, PPRC disabled: go to step 2.
   • 384F - License Failure, Flash Copy disabled: go to step 2.
2. Display the LIC feature status screen. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, LIC Feature Menu, Display Active LIC Features.
3. The LIC feature is disabled if the Configured Capacity exceeds the Feature Capacity Limit. If it does, do one of the following:
   • The configured capacity must be reduced.
   • The customer must purchase more LIC feature capacity. Then a customized diskette enabling the added capacity must be installed.
4. The LIC feature is disabled if the LIC Feature Control diskette has not been created and installed. For information on how to create the diskette, refer to "LIC Feature Control Record Extraction" in chapter 5 of Volume 2.
   Note: The LIC features are automatically reloaded as part of the hard disk drive rebuild process.
5. The LIC feature capacities should be the same on both clusters. If they are not, call the next level of support.
6. Was there a LIC feature already installed on this 2105?
   • Yes, a feature may have been removed, and the clusters need to be rebooted. Close the problem that sent you here. Use the Alternate Cluster Repair Menu options to quiesce and then resume first the failing cluster and then the other cluster. If the problem is reopened, close it, and then power the 2105 off and then on. If the problem is reopened again, call the next level of support.
   • No, call the next level of support.
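Steps 1 and 3 through 5 of this procedure amount to a short checklist. The following Python sketch restates it under stated assumptions: the FeatureStatus fields are hypothetical stand-ins for the values you read on the Display Active LIC Features screen of each cluster, and capacities are assumed to be in the same units on both clusters. It only summarizes the MAP; it does not query the 2105.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Step 1: ESC shown on the problem details screen -> next step of MAP 4990.
FIRST_STEP = {
    "384B": 6,  # license out of sync on each cluster
    "384C": 2,  # PAV disabled
    "384D": 2,  # XRC disabled
    "384E": 2,  # PPRC disabled
    "384F": 2,  # Flash Copy disabled
}


@dataclass
class FeatureStatus:
    """Hypothetical values transcribed for one LIC feature on one cluster."""
    configured_capacity: int
    feature_capacity_limit: int
    control_diskette_installed: bool


def first_step(esc: str) -> Optional[int]:
    """Return the step named in step 1, or None if step 1 does not list the ESC."""
    return FIRST_STEP.get(esc.strip().upper())


def license_findings(working: FeatureStatus, other: FeatureStatus) -> list[str]:
    """Apply steps 3 to 5 to readings from the working cluster and the other cluster."""
    findings = []
    # Step 3: configured capacity above the purchased limit disables the feature.
    if working.configured_capacity > working.feature_capacity_limit:
        findings.append("Step 3: reduce configured capacity or purchase more capacity")
    # Step 4: the feature stays disabled until the control diskette is installed.
    if not working.control_diskette_installed:
        findings.append("Step 4: create and install the LIC Feature Control diskette")
    # Step 5: feature capacities should match on both clusters.
    if working.feature_capacity_limit != other.feature_capacity_limit:
        findings.append("Step 5: call the next level of support")
    return findings
```

An empty list from license_findings means none of the conditions in steps 3 to 5 was found, which points you toward step 6.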

MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
The Background Certify and Build Logical Configuration from ISA process is used during installation to perform an automated DDM Certify and build of the logical configuration. This process can complete after the service representative has left the site. You have been sent to this isolation procedure because a problem was detected. If other problems are also present, this MAP provides additional recovery guidance.

Procedure
1. Use the Repair Menu, Show / Repair Problems needing Repair to display the problem details for the problem which sent you here. Identify the ESC which was recorded and take the action described in the following table.
Table 76. ESC Actions
ESC 1370 - Failure detected during background certify
   Action: Go to MAP Section 49A0-1.
ESC 1371 - ISA logical cfg build - failed to create logical subsystem
   Action: Go to MAP Section 49A0-2 on page 479.
ESC 1372 - ISA logical cfg build - failed to create rank
   Action: Go to MAP Section 49A0-2 on page 479.
ESC 1373 - ISA logical cfg build - failed to create custom volume
   Action: Go to MAP Section 49A0-2 on page 479.
ESC 1374 - ISA logical cfg build - failed to create PAV
   Action: Go to MAP Section 49A0-2 on page 479.
ESC 1375 - Failure detected by Fixed Block format monitor
   Action: Go to MAP Section 49A0-3 on page 479.
ESC 1376 - Unexpected (MLE) failure
   Action: Call your next level of support to determine if the process should be restarted.
ESC 1377 - Unexpected IML occurred
   Action: Go to MAP Section 49A0-4 on page 481.

MAP Section 49A0-1:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   • Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   • No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem.
3. Did all DDMs complete Certify during the repair action?
   • Yes, continue with the next step.
   • No, go to step 6 on page 479.
4. Restart the operation. From the service terminal Main Service Menu, select: Install/Remove Menu, Background Certify and Build Logical Configuration from ISA Menu, Background Certify and Build Logical Configuration from ISA.
   Make the required selections.
   Notes:
   a. Certify DDMs is not needed because it should have already been completed.
   b. If the Import and build logical configuration from ISA option was previously selected, it must be reselected.
5. Return to the Install section to complete any outstanding actions.
6. Restart the operation. From the service terminal Main Service Menu, select: Install/Remove Menu, Background Certify and Build Logical Configuration from ISA Menu, Background Certify and Build Logical Configuration from ISA. Make the required selections.
   Notes:
   a. Certify DDMs must be reselected.
   b. If the Import and build logical configuration from ISA option was previously selected, it must be reselected.
7. Return to the Install section to complete any outstanding actions.

MAP Section 49A0-2:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   • Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   • No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem.
3. Restart the operation. From the service terminal Main Service Menu, select: Install/Remove Menu, Background Certify and Build Logical Configuration from ISA Menu, Background Certify and Build Logical Configuration from ISA. Make the required selections, then return to the Install section to complete any outstanding actions.
   Notes:
   a. Certify DDMs is not needed because it should have already been completed.
   b. The Import and build logical configuration from ISA option must be reselected.

MAP Section 49A0-3:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   • Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   • No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem.
3. Determine whether Fixed Block format has now completed for all volumes.
   a. From the service terminal Main Service Menu, select: Utility Menu, Fixed Block Format Menu, Show Fixed Block Format Status.
   b. Display each LSS that shows a type of FB.
   c. For each LSS, check that all volumes show an LV FORMAT/HARDWARE STATUS of FORMATTED/READY, or that the LSS shows No Logical Volumes configured for this Logical Address.
   Did all LSSs appear as described?
   • Yes, go to step 8.
   • No, continue with the next step.
4. Did any volume show an LV FORMAT/HARDWARE STATUS of FORMAT_IN_PROGRESS?
   • Yes, continue with the next step.
   • No, go to step 6.
5. Note the FORMAT PERCENT for the volumes that show FORMAT_IN_PROGRESS. Wait 10 minutes and then display them again. Did the FORMAT PERCENT increase?
   • Yes, the Fixed Block formatting appears to be continuing. Wait until all volumes show FORMATTED/READY, then go to step 8.
   • No, wait a further 10 minutes and then check again. If the FORMAT PERCENT still has not increased, call your next level of support.
6. Did any volume show an LV FORMAT/HARDWARE STATUS of FAILED?
   • Yes, continue with the next step.
   • No, call your next level of support.
7. Attempt to recover the FAILED volumes. From the service terminal Main Service Menu, select: Utility Menu, Fixed Block Format Menu, Fixed Block Format Recovery. Was the recovery successful for all volumes?
   • Yes, wait until all volumes show FORMATTED/READY, then continue with the next step.
   • No, call your next level of support.
8. Was Import and build logical config from ISA originally selected?
   • Yes, continue with the next step.
   • No, go to the Install section to complete any outstanding actions.
9. Were Fibre Channel Open Systems hosts configured during this install?
   • Yes, continue with the next step.
   • No, go to the Install section to complete any outstanding actions.
10. Power the ESS off and then on using the white switch on the operator panel.
11. Return to the Install section to complete any outstanding actions.

MAP Section 49A0-4:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   • Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   • No, continue with the next step.
2. ESC 1377 is created during IML if the ESS was powered off or rebooted while the Background Certify and Build Logical Configuration from ISA process was still running. Has the cause been identified and resolved?
   • Yes, continue with the next step.
   • No, call your next level of support to assist in analysis and resolution of the problem.
3. Determine whether all Automated Install processes completed successfully. From the service terminal Main Service Menu, select: Install/Remove Menu, Enterprise Storage Server Menu, Background Certify and Build Logical Configuration from ISA Menu, Show Status of Certify / Build Process. Did all selected tasks complete successfully?
   • Yes, continue with the next step.
   • No, go to step 9 on page 482.
4. Close this problem. From the service terminal Main Service Menu, select: Repair Menu, Close a Previously Repaired Problem.
5. Was Import and build logical config from ISA originally selected?
   • Yes, continue with the next step.
   • No, go to the Install section to complete any outstanding actions.
6. Were Fibre Channel Open Systems hosts configured during this install?
   • Yes, continue with the next step.
   • No, go to the Install section to complete any outstanding actions.
7. Power the ESS off and then on using the white switch on the operator panel.
8. Return to the Install section to complete any outstanding actions.
9. Use the following table, in the sequence shown, to determine the next action.
Table 77. Status Actions
1. Certify DDMs shows Running, Failed, or Not yet started.
   Action: Go to step 2 on page 478 of MAP Section 49A0-1.
2. Logical configuration shows Running, Failed, or Not yet started.
   Action: Go to step 2 on page 479 of MAP Section 49A0-2.
3. Fixed Block Format shows Running, Failed, or Not yet started.
   Action: Go to step 2 on page 480 of MAP Section 49A0-3.
4. Call home shows Running, Failed, or Not yet started.
   Action: Go to step 4 on page 481 of MAP Section 49A0-4.
5. Reboot shows Running, Failed, or Not yet started.
   Action: Go to step 4 on page 481 of MAP Section 49A0-4.
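Two decision points in MAP 49A0 compare several status values at once: steps 3, 4, and 6 of MAP Section 49A0-3 (the LV FORMAT/HARDWARE STATUS check) and the ordered scan of Table 77 in step 9 of MAP Section 49A0-4. The Python sketch below restates both, assuming you have transcribed the screen values into plain strings. The function names and the task labels used as dictionary keys are hypothetical, and any status string other than Running, Failed, or Not yet started is treated as complete.

```python
from __future__ import annotations

from typing import Optional


# MAP Section 49A0-3, steps 3, 4, and 6: classify the LV FORMAT/HARDWARE STATUS
# values transcribed from every LSS of type FB.
def fb_format_next_step(volume_statuses: list[str]) -> str:
    """Return the next 49A0-3 step for the given volume status strings."""
    # Step 3: every volume is FORMATTED/READY. An LSS with no logical volumes
    # contributes no entries, so an empty list also passes.
    if all(status == "FORMATTED/READY" for status in volume_statuses):
        return "Go to step 8"
    # Step 4: formatting is still running on at least one volume.
    if "FORMAT_IN_PROGRESS" in volume_statuses:
        return "Go to step 5"
    # Step 6: at least one volume FAILED and can be recovered in step 7.
    if "FAILED" in volume_statuses:
        return "Go to step 7"
    return "Call your next level of support"


# Table 77, in the sequence shown: the first task that is not complete decides the action.
NOT_COMPLETE = {"Running", "Failed", "Not yet started"}
TABLE_77 = [
    ("Certify DDMs", "Go to step 2 on page 478 of MAP Section 49A0-1"),
    ("Logical configuration", "Go to step 2 on page 479 of MAP Section 49A0-2"),
    ("Fixed Block Format", "Go to step 2 on page 480 of MAP Section 49A0-3"),
    ("Call home", "Go to step 4 on page 481 of MAP Section 49A0-4"),
    ("Reboot", "Go to step 4 on page 481 of MAP Section 49A0-4"),
]


def table_77_action(task_status: dict[str, str]) -> Optional[str]:
    """Return the Table 77 action, or None if every listed task is complete or absent."""
    for task, action in TABLE_77:
        if task_status.get(task) in NOT_COMPLETE:
            return action
    return None
```

Scanning TABLE_77 as an ordered list rather than a dictionary is what enforces "in the sequence shown": if both Logical configuration and Fixed Block Format are incomplete, the Logical configuration row wins.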

MAP 4A00: Isolating an Automatic LIC Activation Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
An Automatic LIC Activation process has been suspended due to a logic or hardware error.

Isolation
1. Call the next level of support.

MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for AutoLIC Phase 000 only. This MAP provides guidance on the proper order to repair the problems and restart AutoLIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.

Isolation
1. Log in to a cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters is displayed. If the problems for the other cluster cannot be accessed, the status message indicates this error. Does the problem details screen show that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If a cluster appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If a cluster appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 8.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 9 on page 484.
   • No, continue with the next step.
4. Did the Automatic LIC process stop and display an error screen that also gave a recovery action to perform?
   • Yes, continue with the next step.
   • No, Automatic LIC appears to have failed with no other problem or visual symptom. Verify that a norsStartOnce diskette was not left in the diskette drive, then call the next level of support.
Note: AutoLIC stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or Utility Menu options to look for unexpected conditions (for example, fenced resources).
5. Did the recovery appear to work successfully so that Automatic LIC could continue?
   • Yes, close this problem (using the Repair Menu, Close a Previously Repaired Problem option) and then go to step 15 on page 484.
   • No, call the next level of support.
6. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
7. Did the problem have an ESC=1472?
   • Yes, go to step 9 on page 484.
   • No, call the next level of support.
8. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with step 9 on page 484.

Table 78. Problem Repair Sequence
Repair sequence 1 - Each cluster has a problem calling MAP 4A10.
   Action: This is most likely caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 on page 482 of this MAP.
Repair sequence 2 - Cluster problem with ESC = 14Fx that calls MAP 4B10.
   Action: Repair the problem and then return here and continue in this table.
Repair sequence 3 - Cluster problem with ESC not 14Fx.
   Action: Repair the problem and then return here and continue in this table.
   Note: If a hard drive rebuild is required, use the original code load LIC CDs.
Repair sequence 4 - Non-cluster problem.
   Action: Repair the problem and then return here and continue in this table.
All problems repaired:
   Action: Go to step 9.

9. Close the problem that sent you here and then continue with the next step.
10. Verify that the 2105 is fully operational. From the service terminal Main Service Menu, select: Repair Menu, End of Call Status.
11. Did you use MAP 4B10 during this repair?
   • Yes, go to step 13.
   • No, continue with the next step.
12. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Automatic LIC Activation, Automatic LIC Recovery and Utilities Menu, Terminate Automatic LIC Activation.
13. Was MAP 4025: Hard Disk Drive Build Process used to repair any of the problems?
   • Yes, the new Automatic LIC code needs to be recopied onto the cluster hard disk drive that was just repaired. Continue with the next step.
   • No, go to step 15.
14. Restart the Automatic LIC process with a recopy of the LIC code. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Automatic LIC Activation Menu, Initiate Automatic LIC Activation. On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select the source for the LIC code; do not select No Copy.
   The repair is complete and the automatic LIC activation process is in progress.
15. Restart the Automatic LIC process without a recopy of the LIC code. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Automatic LIC Activation Menu, Initiate Automatic LIC Activation. On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select No Copy.
   The repair is complete and the automatic LIC activation process is in progress.
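Steps 11 through 15 decide how Automatic LIC is restarted after the repairs. The sketch below is a hypothetical restatement in Python: the two booleans stand for your answers to steps 11 and 13, and the returned strings name the menu options the MAP tells you to use. Nothing here drives the service terminal; the function name is illustrative only.

```python
from __future__ import annotations


def map_4a10_restart_plan(used_map_4b10: bool, used_map_4025: bool) -> list[str]:
    """Return the ordered actions from steps 11 to 15 of MAP 4A10."""
    plan: list[str] = []
    # Steps 11 and 12: terminate Automatic LIC only if MAP 4B10 was not used.
    if not used_map_4b10:
        plan.append("Terminate Automatic LIC Activation")
    # Step 13: a hard disk drive rebuild (MAP 4025) means the LIC code must be recopied.
    if used_map_4025:
        # Step 14: pick the LIC code source in Copy LIC Image Source (not No Copy).
        plan.append("Initiate Automatic LIC Activation with a LIC code source")
    else:
        # Step 15: restart without recopying the LIC code.
        plan.append("Initiate Automatic LIC Activation with No Copy")
    return plan
```

For example, if MAP 4B10 was not used and no hard disk drive was rebuilt, the plan is to terminate Automatic LIC and then initiate it again with No Copy.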

MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.

Isolation
1. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters is displayed. If the problems for cluster 1 cannot be accessed, the status message indicates this error. Does the problem details screen show that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6 on page 486.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7 on page 486.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7 on page 486.
   • No, call the next level of support.

Note: AutoLIC stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 79. Problem Repair Sequence
Repair sequence 1 - Cluster 2 (right) has a problem calling MAP 4A30.
   Action: This is most likely caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.
Repair sequence 2 - Cluster problem with ESC = 14Fx that calls MAP 4B20.
   Action: Repair the problem and then return here and continue in this table.
Repair sequence 3 - Cluster problem with ESC not 14Fx.
   Action: Repair the problem and then return here and continue in this table.
   Note: If a hard drive rebuild is required, use the original code load LIC CDs.
Repair sequence 4 - Non-cluster problem.
   Action: Repair the problem and then return here and continue in this table.
All problems repaired:
   Action: Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Automatic LIC Activation Menu, Resume Suspended Automatic LIC Activation.
   The repair is complete and the automatic LIC activation process is in progress.
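Applying Table 79 amounts to sorting the outstanding problems by table row. A minimal Python sketch follows, assuming each problem has been summarized by hand into the hypothetical Problem record shown (problem ID, ESC, whether it is a cluster problem, and the MAP it calls) and that an ESC of 14Fx means any ESC beginning with 14F. The ranking follows the row order of Table 79, and Python's sorted() is stable, so problems that fall in the same row keep the order in which you listed them. The same approach would apply to Tables 78 and 80 with the MAP numbers changed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Problem:
    """Hypothetical summary of one entry from Show / Repair Problems Needing Repair."""
    problem_id: str
    esc: str
    is_cluster_problem: bool  # "cluster" refers to the CEC and I/O drawers
    calls_map: str = ""       # MAP named in the problem details, for example "4A30"


def table_79_rank(problem: Problem) -> int:
    """Return the Table 79 row a problem falls under; lower rows are repaired first."""
    if problem.calls_map == "4A30":
        return 1  # the other cluster's Automatic LIC problem
    if problem.is_cluster_problem and problem.esc.upper().startswith("14F"):
        return 2  # cluster problem with ESC = 14Fx (calls MAP 4B20)
    if problem.is_cluster_problem:
        return 3  # cluster problem with any other ESC
    return 4      # non-cluster problem


def repair_order(problems: list[Problem]) -> list[Problem]:
    """Sort problems into the Table 79 repair sequence; ties keep their input order."""
    return sorted(problems, key=table_79_rank)
```

Working through the sorted list one problem at a time, returning to the table after each repair, mirrors the "return here and continue in this table" instruction.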

MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.

Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters is displayed. If the problems for cluster 2 cannot be accessed, the status message indicates this error. Does the problem details screen show that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7 on page 488.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7 on page 488.
   • No, call the next level of support.

Note: AutoLIC stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.

Table 80. Problem Repair Sequence
Repair sequence 1 - Cluster 1 (left) has a problem calling MAP 4A20.
   Action: This is most likely caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.
Repair sequence 2 - Cluster problem with ESC = 14Fx that calls MAP 4B30.
   Action: Repair the problem and then return here and continue in this table.
Repair sequence 3 - Cluster problem with ESC not 14Fx.
   Action: Repair the problem and then return here and continue in this table.
   Note: If a hard drive rebuild is required, use the original code load LIC CDs.
Repair sequence 4 - Non-cluster problem.
   Action: Repair the problem and then return here and continue in this table.
All problems repaired:
   Action: Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Automatic LIC Activation Menu, Resume Suspended Automatic LIC Activation.
   The repair is complete and the automatic LIC activation process is in progress.

MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. Log in to cluster 2 and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.


VOLUME 1, TotalStorage ESS Service Guide

   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 1 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 490.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 490.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all cluster problems have been repaired or a decision has been made to defer the repair, continue with step 7 on page 490.
Table 81. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 1 (left) has a problem calling MAP 4A50. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.
Cluster 1 (left) has a problem calling MAP 4A40. | 2 | Cluster 1 (left) did an unexpected reboot. Are there additional problems? Yes, continue in this table. No, call the next level of support.
Cluster problem with ESC = 14Fx and calls MAP 4B40. | 3 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 4 | Repair the problem and then return here and continue in this table. Note: If a hard drive rebuild is required, use the original code load LIC CDs.
Non-Cluster problem | 5 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

7. Close the problem that sent you here and then continue with the next step.
8. Verify that the 2105 is fully operational. From the service terminal Main Service Menu, select:
   Repair Menu
   End of Call Status
9. Did you use MAP 4B40 during this repair?
   - Yes, go to step 11.
   - No, continue with the next step.
10. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Terminate Automatic LIC Activation
11. Was MAP 4025: Hard Disk Drive Build Process used to repair any of the problems?
   - Yes, the new Automatic LIC code must be recopied onto the cluster hard disk drives that were just repaired. Continue with the next step.
   - No, go to step 13 on page 491.
12. Restart the Automatic LIC process with a recopy of the LIC code. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Initiate Automatic LIC Activation
   (On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select the source for the LIC code. Do not select No Copy.)
   The repair is complete and the automatic LIC activation process is in progress.
13. Restart the Automatic LIC process without a recopy of the LIC code. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Initiate Automatic LIC Activation
   (On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select No Copy.)
   The repair is complete and the automatic LIC activation process is in progress.
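Steps 2 through 5 and steps 9 through 13 of MAP 4A40 are two small decision trees. The following Python sketch restates them as functions so that each branch can be traced; it is a hypothetical model only, the function and parameter names are assumptions, and the returned strings simply echo the steps and menu options named in this MAP.

```python
from enum import Enum
from typing import List


class Next(Enum):
    """Outcomes of MAP 4A40 steps 2 through 5 (labels paraphrase the MAP)."""
    REPAIR_OTHER_PROBLEMS = "go to step 6"
    CLOSE_AND_VERIFY = "go to step 7"
    CALL_NEXT_LEVEL = "call the next level of support"


def steps_2_to_5(other_problems: bool, already_repaired: bool,
                 first_time_through_map: bool, esc: str) -> Next:
    if other_problems:               # step 2: other problems exist
        return Next.REPAIR_OTHER_PROBLEMS
    if already_repaired:             # step 3: a problem from step 1 was repaired
        return Next.CLOSE_AND_VERIFY
    if not first_time_through_map:   # step 4: repeated pass through the MAP
        return Next.CALL_NEXT_LEVEL
    if esc == "1472":                # step 5: ESC=1472
        return Next.CLOSE_AND_VERIFY
    return Next.CALL_NEXT_LEVEL


def restart_actions(used_map_4b40: bool, used_map_4025: bool) -> List[str]:
    """Service terminal actions named in steps 9 through 13, in order."""
    actions = []
    if not used_map_4b40:            # steps 9 and 10: terminate first
        actions.append("Terminate Automatic LIC Activation")
    # Steps 11 through 13: both paths restart with Initiate Automatic LIC Activation.
    if used_map_4025:                # step 12: recopy the LIC code
        actions.append("Initiate Automatic LIC Activation (select the LIC image source)")
    else:                            # step 13: no recopy
        actions.append("Initiate Automatic LIC Activation (select No Copy)")
    return actions
```

For example, `steps_2_to_5(False, False, True, "1472")` follows step 5 to step 7, while the same call with `first_time_through_map=False` stops at step 4.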

MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC. Cluster 1 (left) remained operational and cluster 2 (right) had a failure that suspended the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. If this MAP determines that a repair of cluster 2 is needed:
   - DO NOT resume cluster 2, even if directed to by other MAPs.
   - Cluster 2 must be in the quiesced state before you resume the Automatic LIC process.
2. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 2 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. After the repair is complete, return here and continue.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
3. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 7 on page 492.
   - No, continue with the next step.
4. Did you already repair a problem found in step 2 on page 491 of this MAP?
   - Yes, go to step 9 on page 493.
   - No, continue with the next step.
5. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
6. Did the problem have an ESC=1472?
   - Yes, go to step 9 on page 493.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
7. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 82. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 2 (right) has a problem calling MAP 4A40. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 on page 491 of this MAP.
Cluster 2 (right) problem with ESC = 14Fx and calls MAP 4A50. | 2 | This should only occur after a successful repair of cluster 2. If cluster 2 still cannot communicate with cluster 1, return to step 1 on page 491 of this MAP to isolate the problem.
Cluster problem with ESC = 14Fx calling MAP 4B50 | 3 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 4 | Repair the problem (do not resume the cluster) and then return here and continue in this table. Attention: Review the guidance in MAP step 1 on page 491 before continuing with the repair.
Non-Cluster problem | 5 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
8. Verify that cluster 2 is in the quiesced state before continuing. From the service terminal Main Service Menu, select:
   Utility Menu Option
   Resource Management Menu
   Show Quiesced Resources (Cluster Bay 2)
9. Close the problem or problems that sent you here and then continue with the next step.
10. Did you use MAP 4B50 during this repair?
   - Yes, continue with the next step.
   - No, go to step 12.
11. Restart the Automatic LIC process. (MAP 4B50 already terminated the Automatic LIC process.) From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Initiate Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
12. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
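Steps 1 and 8 through 12 of MAP 4A50 combine a precondition (a repaired cluster 2 must remain quiesced) with a choice between two Automatic LIC Activation Menu options. A minimal Python sketch under those assumptions follows; the function name and its parameters are hypothetical and do not correspond to any service terminal interface.

```python
def map_4a50_restart(cluster2_repaired: bool, cluster2_quiesced: bool,
                     used_map_4b50: bool) -> str:
    """Return the Automatic LIC Activation Menu option chosen in steps 10 through 12."""
    # Step 1: if cluster 2 was repaired, it must still be quiesced
    # (do not resume it) before the Automatic LIC process continues.
    if cluster2_repaired and not cluster2_quiesced:
        raise RuntimeError(
            "Cluster 2 must be quiesced before continuing the Automatic LIC process")
    if used_map_4b50:
        # Step 11: MAP 4B50 already terminated the process, so initiate it again.
        return "Initiate Automatic LIC Activation"
    # Step 12: the process is only suspended, so resume it.
    return "Resume Suspended Automatic LIC Activation"
```

Raising an error for an unquiesced, repaired cluster mirrors the instruction in step 1 not to continue until step 8 confirms the quiesced state.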

MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 1 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 495.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 495.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 83. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 1 (left) has a problem calling MAP 4A70. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.
Cluster problem with ESC = 14Fx and calls MAP 4B60. | 2 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 3 | Repair the problem and then return here and continue in this table.
Non-Cluster problem | 4 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.

MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 2 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 84. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 2 (right) has a problem calling MAP 4A60. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.
Cluster problem with ESC = 14Fx and calls MAP 4B70. | 2 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 3 | Repair the problem and then return here and continue in this table.
Non-Cluster problem | 4 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.

MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC. Cluster 2 (right) remained operational and cluster 1 (left) had a failure that suspended the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. If this MAP determines that a repair of cluster 1 is needed:
   - DO NOT resume cluster 1, even if directed to by other MAPs.
   - Cluster 1 must be in the quiesced state before you resume the Automatic LIC process.
2. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 1 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. After the repair is complete, return here and continue.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
3. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 7 on page 498.
   - No, continue with the next step.
4. Did you already repair a problem found in step 2 of this MAP?
   - Yes, go to step 9 on page 499.
   - No, continue with the next step.
5. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
6. Did the problem have an ESC=1472?
   - Yes, go to step 9 on page 499.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
7. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 85. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 1 (left) has a problem calling MAP 4A90. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 on page 497 of this MAP.
Cluster problem with ESC = 14Fx and calls MAP 4B80. | 2 | Repair the problem (do not resume the cluster) and then return here and continue in this table. Attention: The cluster has already been quiesced by the Automatic LIC process. It must stay quiesced while you use the standard repair procedures. Do not resume the cluster when the repair is complete, because the code is not yet at the correct level.
Cluster problem with ESC not 14Fx | 3 | Repair the problem (do not resume the cluster) and then return here and continue in this table. Attention: The cluster has already been quiesced by the Automatic LIC process. It must stay quiesced while you use the standard repair procedures. Do not resume the cluster when the repair is complete, because the code is not yet at the correct level.
Non-Cluster problem | 4 | Repair the problem and then return here and continue in this table. Attention: Review the guidance in MAP step 1 on page 497 before continuing with the repair.
All problems repaired | | Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
8. Verify that cluster 1 is in the quiesced state before continuing. From the service terminal Main Service Menu, select:
   Utility Menu Option
   Resource Management Menu
   Show Quiesced Resources (Cluster 1)
9. Close the problem or problems that sent you here and then continue with the next step.
10. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.

MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 2 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all cluster problems have been repaired or a decision has been made to defer the repair, continue with step 7.
Table 86. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 1 (left) has a problem calling MAP 4A80. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.
Cluster problem with ESC = 14Fx and calls MAP 4B90. | 2 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 3 | Repair the problem and then return here and continue in this table.
Non-Cluster problem | 4 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

7. Close the problem or problems that sent you here and then continue with the next step.
8. Verify that the 2105 is fully operational. From the service terminal Main Service Menu, select:
   Repair Menu
   End of Call Status
9. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.

MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order in which to repair the problems and restart Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
1. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.
   Does the problem details screen show a status indicating that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If cluster 1 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6 on page 502.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 502.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7.
   - No, call the next level of support.
Note: AutoLic stopped because an error condition was detected. Although that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the Utility Menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 87. Problem Repair Sequence
Problem Type | Repair Sequence | Action
Cluster 1 (left) has a problem calling MAP 4AB0. | 1 | This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.
Cluster problem with ESC = 14Fx and calls MAP 4BA0. | 2 | Repair the problem and then return here and continue in this table.
Cluster problem with ESC not 14Fx | 3 | Repair the problem and then return here and continue in this table. Note: If a hard drive rebuild is required, use the original code load LIC CDs.
Non-Cluster problem | 4 | Repair the problem and then return here and continue in this table.
All problems repaired | | Continue with the next step.

Note: Cluster refers to the CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.


VOLUME 1, TotalStorage ESS Service Guide


MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally, there should be an additional problem that can be repaired. This MAP provides guidance on the order in which to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
1. Log in to cluster 1 (left) and display the problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters is displayed. If the problems for cluster 2 cannot be accessed, the status message indicates this error. Does the problem details screen show that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If cluster 2 appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung, displaying a status code other than Ready for Login on the CEC drawer operator panel for more than 5 minutes, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6 on page 504.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 504.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 504.
   - No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the utility menu options to look for unexpected conditions (for example, fenced resources).

6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 88. Problem Repair Sequence

Problem type: Cluster 2 (right) has a problem calling MAP 4AA0.
Repair sequence: 1
Action: This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem type: Cluster problem with ESC = 14Fx that calls MAP 4BB0.
Action: Repair using that problem, then return here and continue with this table.

Problem type: Cluster problem with ESC other than 14Fx.
Action: Repair using that problem, then return here and continue with this table.
Note: If a hard drive rebuild is required, use the original code load LIC CDs.

Problem type: Non-cluster problem.
Action: Repair using that problem, then return here and continue with this table.

Problem type: All problems repaired.
Action: Continue with the next step.

Note: "Cluster" refers to the CEC and I/O drawers.

7. Close the problem or problems that sent you here, then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the Automatic LIC activation process is in progress.

MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 400 only. Normally, there should be an additional problem that can be repaired. This MAP provides guidance on the order in which to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.


Isolation
1. Log in to a cluster and display the problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters is displayed. If the problems for the other cluster cannot be accessed, the status message indicates this error. Does the problem details screen show that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If a cluster appears to be IMLing (the CEC drawer operator panel is displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If a cluster appears to be hung, displaying a status code other than Ready for Login on the CEC drawer operator panel for more than 5 minutes, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 506.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 506.
   - No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or the utility menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 89. Problem Repair Sequence

Problem type: Cluster 1 (left) has a problem calling MAP 4AE0.
Repair sequence: 1
Action: This is most likely caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem type: Cluster problem with ESC = 14Fx that calls MAP 4BE0.
Repair sequence: 2
Action: Repair using that problem, then return here and continue with this table.

Problem type: Cluster problem with ESC other than 14Fx.
Action: Repair using that problem, then return here and continue with this table.

Problem type: Non-cluster problem.
Action: Repair using that problem, then return here and continue with this table.

Problem type: All problems repaired.
Action: Continue with the next step.

Note: "Cluster" refers to the CEC and I/O drawers.

7. Close the problem or problems that sent you here, then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Resume Suspended Automatic LIC Activation
   The repair is complete and the Automatic LIC activation process is in progress.
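Step 1 is the same decision in MAPs 4AA0, 4AB0, and 4AE0. As a hypothetical aid for reasoning about it, the branch can be sketched as a pure function. The parameter names, the enum, and the idea of measuring how long one panel code has been displayed are assumptions for illustration; the 5-minute limit and the MAP targets come from step 1.

```python
from enum import Enum, auto


class Step1Result(Enum):
    CONTINUE_WITH_STEP_2 = auto()
    WAIT_AND_REPEAT_STEP_1 = auto()
    GO_TO_MAP_4360 = auto()
    GO_TO_MAP_4370 = auto()


HUNG_LIMIT_MINUTES = 5  # "hung for more than 5 minutes" in step 1
READY = "Ready for Login"


def step_1(alternate_inaccessible: bool,
           panel_codes_changing: bool,
           panel_text: str,
           minutes_on_panel_text: float) -> Step1Result:
    """Hypothetical model of step 1 in MAP 4AA0, 4AB0, and 4AE0."""
    if not alternate_inaccessible:
        return Step1Result.CONTINUE_WITH_STEP_2
    if panel_codes_changing:
        # The other cluster is IMLing: wait for Ready for Login, then repeat step 1.
        return Step1Result.WAIT_AND_REPEAT_STEP_1
    if panel_text != READY and minutes_on_panel_text > HUNG_LIMIT_MINUTES:
        # Hung on one status code: isolate using the CEC drawer operator panel code.
        return Step1Result.GO_TO_MAP_4360
    # Any other case: follow the on-screen guidance.
    return Step1Result.GO_TO_MAP_4370
```

A panel that has shown the same code for exactly 5 minutes is not yet "more than 5 minutes", so the sketch routes it to MAP 4370; in practice, observe the panel a little longer before deciding.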

MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 000 only. Normally, a primary problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4A10?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A10?
   - Yes, return to that problem and begin with MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 7.
   - No, continue with the next step.
4. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Terminate Automatic LIC Activation
5. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324 and load the original code (the code that was installed before the Automatic LIC was started) on the failing cluster. When that is complete, return here and continue with the next step.
6. Return to the MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 that sent you here. (MAP 4A10 will have you complete any remaining repairs and then resume the Automatic LIC process.)
7. Verify that you are logged in to the cluster that is not being repaired.
8. Display the current boot list setting and note the order of the hdisks in the list. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).


9. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [One]
10. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Note: Record the hdisk that is listed. It should be the opposite of the hdisk listed first in step 8 on page 507. If it is not, call the next level of support.
11. Close the problem that calls MAP 4B10.
12. Power the failing cluster off and on (to IML the cluster). From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
13. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to be lit.
14. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B10?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Continue with the next step.
15. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Terminate Automatic LIC Activation
16. Return to the MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 that sent you here. (MAP 4A10 will have you complete any remaining repairs and then continue the Automatic LIC process.)
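The boot list check in steps 8 through 10 (and the matching steps in MAPs 4B20 through 4B40) comes down to one rule: after Change Boot List with [Alternate], the first hdisk shown must be the opposite of the first hdisk you recorded. The following hypothetical sketch encodes that rule. Representing the displayed boot list as a list of strings is an assumption, and so is checking that the number of hdisks shown matches the [One] or [Two] selection; the MAPs only say to note the number.

```python
OPPOSITE = {"hdisk0": "hdisk1", "hdisk1": "hdisk0"}


def boot_list_change_ok(before: list[str], after: list[str], drives_selected: int) -> bool:
    """Check a redisplayed boot list after Change Boot List with [Alternate].

    before: boot list recorded before the change, for example ["hdisk0", "hdisk1"]
    after: boot list displayed after the change
    drives_selected: 1 for [One], 2 for [Two]
    Returns False when the MAP says to call the next level of support.
    """
    if not before or not after or before[0] not in OPPOSITE:
        return False
    first_is_opposite = after[0] == OPPOSITE[before[0]]
    # Assumption: the displayed count follows the [One]/[Two] selection.
    count_matches = len(after) == drives_selected
    return first_is_opposite and count_matches
```

For example, if step 8 showed hdisk0 then hdisk1 and step 9 selected [One], step 10 should show only hdisk1.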


MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally, a primary problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4A20?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A20?
   - Yes, return to that problem and begin with MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 510.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 2 (right).
5. Run the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory: [Next]
6. Run the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activation
   Select cluster for LIC Activation: [Remote]
7. Run Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
10. Log in to cluster 1 (left).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
12. Return to MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485. (MAP 4A20 will have you complete any remaining repairs and then continue the Automatic LIC process.)

13. Verify that you are logged in to cluster 2 (right).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14 on page 510. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B20.
18. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B20?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Return to the MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485 that sent you here. (MAP 4A20 will have you complete any remaining repairs and then resume the Automatic LIC process.)
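Steps 1 through 3 of MAP 4B20 form a short routing decision that determines whether you only change the boot list or also rebuild and replace a cluster hard disk drive. The following hypothetical sketch restates that routing; the function and argument names and the returned strings are illustrative assumptions. MAP 4B30 follows the same routing with MAP 4A30 in place of MAP 4A20; the other 4Bxx MAPs use different step numbers.

```python
def map_4b20_entry(from_map_4a20: bool,
                   separate_4a20_problem: bool,
                   planar_replaced: bool) -> str:
    """Hypothetical model of steps 1-3 of MAP 4B20; returns where to go next."""
    if not from_map_4a20:
        # Step 2: look for a separate problem that calls MAP 4A20.
        if separate_4a20_problem:
            return "begin with MAP 4A20"
        return "call the next level of support"
    # Step 3: a new I/O drawer planar assembly may carry the wrong boot list.
    if planar_replaced:
        return "step 13"  # change the boot list only
    return "step 4"  # order a cluster hard disk drive FRU and continue from cluster 2
```

Keeping the routing separate from the menu actions makes it easy to check, for a given set of answers, which branch of the MAP applies.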

MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
This MAP is called for Automatic LIC Phase 100 only. Normally, a primary problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4A30?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A30?
   - Yes, return to that problem and begin with MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 513.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 1 (left).
5. Run the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory: [Next]
6. Run the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activation
   Select cluster for LIC Activation: [Remote]
7. Run Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
10. Log in to cluster 2 (right).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
12. Return to MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486. (MAP 4A30 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged in to cluster 1 (left).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14 on page 513. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B30.
18. Quiesce and resume cluster 2 (right). From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B30?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Return to the MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486 that sent you here. (MAP 4A30 will have you complete any remaining repairs and then resume the Automatic LIC process.)

MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally, a primary problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.


Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4A40?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A40?
   - Yes, return to that problem and begin with MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488.
   - No, call the next level of support.
3. Log in to cluster 2 (right).
4. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 11 on page 516.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
5. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Terminate Automatic LIC Activation
6. The original LIC code must be reloaded on the cluster 1 dual hard disk drives. Use MAP 4025: Hard Drive Build Process for Automatic LIC on page 324, then return here and continue.
7. Log in to cluster 1 (left).
8. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   - No, call the next level of support.
10. Return to MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488. (MAP 4A40 will have you complete any remaining repairs and then continue the Automatic LIC process.)
11. Verify that you are logged in to cluster 2 (right).
12. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
13. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [One]
14. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the hdisk that is listed. It should be the opposite of the hdisk listed first in step 12. If it is not, call the next level of support.
15. Close the problem that calls MAP 4B40.
16. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
17. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
18. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B40?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Continue with the next step.
19. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Terminate Automatic LIC Activation
20. Return to the MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488 that sent you here. (MAP 4A40 will have you complete any remaining repairs and then continue the Automatic LIC process.)

MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
Attention: Customer disruption may occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 100 only. Normally, a primary problem exists and should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair succeeds, the cluster boots as expected and you are directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.

Problem Isolation Procedures, CHAPTER 3

- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A50?
    - Yes, go to step 3.
    - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A50?
    - Yes, return to that problem and begin with MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491.
    - No, call the next level of support.
3. Log in to cluster 1 (left).
4. Have you replaced the I/O Drawer Planar Assembly FRU?
    - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 10.
    - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
5. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Terminate Automatic LIC Activation
6. Resume cluster 2. When the resume is complete, continue with the next step. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
7. Log in to cluster 2 (right).
8. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
9. Return to MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491. (MAP 4A50 will have you complete any remaining repairs and then continue the Automatic LIC process.)
10. Verify that you are logged in to cluster 1 (left).
11. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
12. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [Two]
13. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 11 on page 518. If it is not, call the next level of support.
14. Close the problem that calls MAP 4B50.
15. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
16. Wait up to 45 minutes for the Cluster 2 Ready LED on the rack operator panel to light.
17. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B50?
    - Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not caused by the replaced I/O drawer planar assembly.
    - No, the boot problem is fixed. Continue with the next step.
18. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Terminate Automatic LIC Activation

19. Return to MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491 that sent you here. (MAP 4A50 will have you complete any remaining repairs and then continue the Automatic LIC process.)
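The boot list check in steps 11 through 13 can be modeled as a small sketch. It assumes that [Alternate] puts the drive that is not currently first at the head of the list, and that with [Two] the previously first drive moves to second place (MAP 4B70 states that the two hdisks should then be reversed). The helper names `opposite`, `change_boot_list`, and `boot_list_changed_ok` are hypothetical and do not correspond to any service terminal command.

```python
from typing import List

# Hypothetical model: a cluster boot list is an ordered list of hdisk names.
DRIVES = ("hdisk0", "hdisk1")


def opposite(hdisk: str) -> str:
    """Return the other drive of the cluster's dual hard disk drive pair."""
    if hdisk not in DRIVES:
        raise ValueError(f"unexpected boot device: {hdisk}")
    return DRIVES[1] if hdisk == DRIVES[0] else DRIVES[0]


def change_boot_list(current: List[str], count: int) -> List[str]:
    """Model 'Select first drive bootlist: [Alternate]' with
    'Select number of hard drives in bootlist' = count (1 = One, 2 = Two)."""
    if not current:
        raise ValueError("the current boot list is empty")
    if count not in (1, 2):
        raise ValueError("count must be 1 (One) or 2 (Two)")
    first = opposite(current[0])
    return [first] if count == 1 else [first, opposite(first)]


def boot_list_changed_ok(before: List[str], after: List[str]) -> bool:
    """Redisplay check: the first hdisk listed now must be the opposite
    of the first hdisk recorded before the change."""
    return bool(after) and after[0] == opposite(before[0])
```

For example, if the first display showed `hdisk0 hdisk1`, `change_boot_list(["hdisk0", "hdisk1"], 2)` gives `["hdisk1", "hdisk0"]`, and `boot_list_changed_ok` returns `True`; a `False` result corresponds to "call the next level of support."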

MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL)
Attention: Customer disruption may occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally, a primary problem exists and should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair succeeds, the cluster boots as expected and you are directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A60?
    - Yes, go to step 3.
    - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A60?
    - Yes, return to that problem and begin with MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493.
    - No, call the next level of support.
3. Have you replaced the I/O drawer planar assembly FRU?
    - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 14 on page 522.
    - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 2 (right).

5. Quiesce cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
6. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Single Drive LIC Install Preparation
    Select cluster: [Remote]
    Select source directory [Next]
7. Do the Copy LIC Directory. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Copy LIC Directory
    Select cluster: [Remote]
    Select source directory [Previous]
    Select destination directory [Active]
8. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Copy Automatic LIC Control Files
    Select cluster to copy Automatic LIC files to: [Remote]
9. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
10. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493. (MAP 4A60 will have you complete any remaining repairs and then continue the Automatic LIC process.)
11. Log in to cluster 1 (left).
12. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
13. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493. (MAP 4A60 will have you complete any remaining repairs and then continue the Automatic LIC process.)
14. Verify that you are logged in to cluster 2 (right).
15. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
16. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [One]
17. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the hdisk that is listed. It should be the opposite of the hdisk listed first in step 15. If it is not, call the next level of support.
18. Close the problem that calls MAP 4B60.
19. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
20. Wait up to 45 minutes for the Cluster 1 Ready LED on the rack operator panel to light.

21. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B60?
    - Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not caused by the replaced I/O drawer planar assembly.
    - No, the boot problem is fixed. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493, which sent you here. (MAP 4A60 will have you complete any remaining repairs and then resume the Automatic LIC process.)
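The opening questions of each 4Bxx MAP (steps 1 through 3 here) follow the same routing. The sketch below models that decision, assuming all three answers are known up front; the enum and function names are hypothetical. In MAP 4B50 the planar question is step 4 rather than step 3, but the routing is the same.

```python
from enum import Enum


class NextAction(Enum):
    """Hypothetical labels for where the opening questions of a 4Bxx MAP send you."""
    RETURN_TO_4AXX = "return to the separate problem and begin with the matching 4Axx MAP"
    NEXT_LEVEL_SUPPORT = "call the next level of support"
    CHANGE_BOOT_LIST = "continue at the boot list steps"
    ORDER_DRIVE_AND_CONTINUE = "order a cluster hard disk drive FRU and continue with the next step"


def route_4bxx(directed_from_4axx: bool,
               separate_4axx_problem: bool,
               planar_replaced: bool) -> NextAction:
    # Step 1: if the matching 4Axx MAP did not send you here, check step 2.
    if not directed_from_4axx:
        # Step 2: a separate problem calling the 4Axx MAP is the primary repair.
        if separate_4axx_problem:
            return NextAction.RETURN_TO_4AXX
        return NextAction.NEXT_LEVEL_SUPPORT
    # Step 3: a new I/O drawer planar assembly may carry a boot list that is
    # not valid for this Automatic LIC phase, so only the boot list is changed.
    if planar_replaced:
        return NextAction.CHANGE_BOOT_LIST
    return NextAction.ORDER_DRIVE_AND_CONTINUE
```

Keeping the three questions as explicit inputs mirrors the order in which the MAP asks them: answering "Yes" to step 1 skips the step 2 question entirely, and the planar answer matters only on that path.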

MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
Attention: Customer disruption may occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally, a primary problem exists and should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair succeeds, the cluster boots as expected and you are directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A70?
    - Yes, go to step 3 on page 524.
    - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A70?
    - Yes, return to that problem and begin with MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495.
    - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
    - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 14 on page 525.
    - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 1 (left).
5. Quiesce cluster 2. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
6. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Single Drive LIC Install Preparation
    Select cluster: [Remote]
    Select source directory [Next]
7. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Single Drive LIC Activated
    Select cluster for LIC activation: [Remote]
8. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Copy Automatic LIC Control Files
    Select cluster to copy Automatic LIC files to: [Remote]
9. Resume cluster 2. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
10. Wait up to 45 minutes for the Cluster 2 Ready LED on the rack operator panel to light.
11. Log in to cluster 2 (right).
12. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
13. Return to MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495. (MAP 4A70 will have you complete any remaining repairs and then continue the Automatic LIC process.)
14. Log in to cluster 1 (left).
15. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
16. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [Two]
17. Display the current boot list setting again. The two hdisks should now be reversed in the list. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 15. If it is not, call the next level of support.
18. Close the problem that calls MAP 4B70.
19. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
20. Wait up to 45 minutes for the Cluster 1 Ready LED on the rack operator panel to light.
21. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).

    Is there a problem listed calling MAP 4B70?
    - Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not caused by the replaced I/O drawer planar assembly.
    - No, the boot problem is fixed. Return to MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495, which sent you here. (MAP 4A70 will have you complete any remaining repairs and then resume the Automatic LIC process.)

MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
Attention: Customer disruption may occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 200 only. Normally, a primary problem exists and should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair succeeds, the cluster boots as expected and you are directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A80?
    - Yes, go to step 3.
    - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A80?
    - Yes, return to that problem and begin with MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497.
    - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?

    - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 528.
    - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 2 (right).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Single Drive LIC Install Preparation
    Select cluster: [Remote]
    Select source directory [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Single Drive LIC Activated
    Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation
    Automatic LIC Recovery and Utilities Menu
    Copy Automatic LIC Control Files
    Select cluster to copy Automatic LIC files to: [Remote]
8. Power off and power on cluster 1. From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the Cluster 1 Ready LED on the rack operator panel to light.
10. Log in to cluster 1 (left).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
    Repair Menu
    Cluster Dual Hard Disk Drive Repair Menu
    Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
12. Exit this MAP and go to MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497. (MAP 4A80 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Log in to cluster 2 (right).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Change Boot List
    Select cluster: [Remote]
    Select first drive bootlist: [Alternate]
    Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
    Licensed Internal Code Maintenance Menu
    Automatic LIC Activation Menu
    Automatic LIC Recovery and Utilities Menu
    Display Current Boot List
    Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B80.
18. Power off and power on cluster 1 (left). From the service terminal Main Service Menu, select:
    Repair Menu
    Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the Cluster 1 Ready LED on the rack operator panel to light.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B80?
    - Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not caused by the replaced I/O drawer planar assembly.

    - No, the boot problem is fixed. Return to MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497, which sent you here. (MAP 4A80 will have you complete any remaining repairs and then resume the Automatic LIC process.)

MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
Attention: Customer disruption may occur if the microcode and power boundaries are not in the proper state for this service action. Verify that you started all service activities from Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 200 only. Normally, a primary problem exists and should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC process.
Note: CCL is Concurrent Code Load; NCCL is Nonconcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair succeeds, the cluster boots as expected and you are directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A90?
    - Yes, go to step 3.
    - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A90?
    - Yes, return to that problem and begin with MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499.
    - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
    - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 530.
    - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Login to cluster 1 (left). 5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Single Drive LIC Install Preparation Select cluster: [Remote] Select source directory: [Next] Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Single Drive LIC Activated Select cluster for LIC activation: [Remote] Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Copy Automatic LIC Control Files Select cluster to copy Automatic LIC files to: [Remote] Quiesce and resume cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit. Login to cluster 2 (right). Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select: Repair Menu Cluster Dual Hard Disk Drive Repair Menu Identify/Replace a Failing Cluster Hard Disk Drive Is there a failing cluster hard disk drive?

6.

7.

8.

9. 10. 11.

v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step. v No, call the next level of support. 12. Return to MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499. (MAP 4A90 will have you complete any remaining repairs and then continue the Automatic LIC process.) 13. Login to cluster 1 (left). 14. Display the current boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu

530

VOLUME 1, TotalStorage ESS Service Guide

MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14 on page 530. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B90.
18. Power off and power on cluster 2 (right). From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B90?
   • Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not due to the replaced I/O drawer planar assembly.
   • No, the boot problem is fixed. Return to MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499, which sent you here. (MAP 4A90 will have you complete any remaining repairs and then resume the Automatic LIC process.)
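The boot list check in steps 14 through 16 reduces to a simple comparison. The Python sketch below is illustrative only: it assumes you transcribe each Display Current Boot List result by hand as an ordered list of hdisk names, and the function names are hypothetical; nothing here talks to the service terminal.

```python
# The two drives of a cluster's dual hard disk drive pair.
VALID_HDISKS = ("hdisk0", "hdisk1")


def opposite(hdisk: str) -> str:
    """Return the other drive of the dual hard disk drive pair."""
    if hdisk not in VALID_HDISKS:
        raise ValueError(f"unexpected boot device: {hdisk}")
    return "hdisk1" if hdisk == "hdisk0" else "hdisk0"


def bootlist_change_ok(before: list[str], after: list[str]) -> bool:
    """Check the step 16 result against the list recorded in step 14.

    The first hdisk after the change must be the opposite of the first
    hdisk before it. The length check additionally confirms the [Two]
    selection made in step 15 (an extra check this sketch adds).
    """
    if not before or not after:
        return False
    return after[0] == opposite(before[0]) and len(after) == 2
```

For example, if step 14 showed only hdisk0 and step 16 shows hdisk1 followed by hdisk0, `bootlist_change_ok(["hdisk0"], ["hdisk1", "hdisk0"])` returns True; any other result means calling the next level of support.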

Problem Isolation Procedures, CHAPTER 3


MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 150 only. Normally, a separate primary problem exists, and that problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
• The target boot drive failed.
• You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
• An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4AA0?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AA0?
   • Yes, return to that problem and begin with MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501.
   • No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   • Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 533.
   • No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 2 (right).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu

   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory: [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activated
   Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
10. Log in to cluster 1 (left).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   • Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   • No, call the next level of support.
12. Return to MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501. (MAP 4AA0 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged in to cluster 2 (right).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14 on page 533. If it is not, call the next level of support.
17. Close the problem that calls MAP 4BA0.
18. Quiesce and resume cluster 1 (left). From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BA0?
   • Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not due to the replaced I/O drawer planar assembly.
   • No, the boot problem is fixed. Return to MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501, which sent you here. (MAP 4AA0 will have you complete any remaining repairs and then resume the Automatic LIC process.)
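Steps 1 through 3 of this MAP form a small decision tree; the other Phase 150/200 4Bxx MAPs follow the same shape with a different calling MAP, cluster, and page. A minimal Python sketch of the MAP 4BA0 routing follows. The function name and the returned wording are hypothetical, and you supply the answer to each question yourself.

```python
def map_4ba0_route(directed_from_4aa0: bool,
                   separate_4aa0_problem: bool = False,
                   planar_replaced: bool = False) -> str:
    """Return the next action for steps 1-3 of MAP 4BA0."""
    if not directed_from_4aa0:
        # Step 2: a separate problem calling MAP 4AA0 should drive the repair.
        if separate_4aa0_problem:
            return "begin that problem with MAP 4AA0"
        return "call the next level of support"
    # Step 3: a new planar may carry a boot list that is wrong for Phase 150.
    if planar_replaced:
        return "go to step 13 (change the boot list)"
    return "order a cluster hard disk drive FRU; continue with step 4"
```

For example, `map_4ba0_route(True, planar_replaced=True)` routes you to the boot list change in step 13, while `map_4ba0_route(False)` sends you to the next level of support.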

MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


Description
This MAP is called for Automatic LIC Phase 150 only. Normally, a separate primary problem exists, and that problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
• The target boot drive failed.
• You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
• An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4AB0?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AB0?
   • Yes, return to that problem and begin with MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503.
   • No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   • Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 536.
   • No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 1 (left).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory: [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activated
   Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
10. Log in to cluster 2 (right).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   • Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   • No, call the next level of support.
12. Return to MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503. (MAP 4AB0 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged in to cluster 1 (left).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:

   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 14 on page 536. If it is not, call the next level of support.
17. Close the problem that calls MAP 4BB0.
18. Quiesce and resume cluster 2 (right). From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BB0?
   • Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not due to the replaced I/O drawer planar assembly.
   • No, the boot problem is fixed. Return to MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503, which sent you here. (MAP 4AB0 will have you complete any remaining repairs and then resume the Automatic LIC process.)

MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
This MAP is called for Automatic LIC Phase 400 only. Normally, a separate primary problem exists, and that problem should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC.
Note: CCL is Concurrent Code Load; NCCL is NonConcurrent Code Load.

Isolation
Note: A problem that calls a 4Bxx MAP means that the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each holds a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
• The target boot drive failed.
• You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392, and it had you change the boot list for diagnostic purposes.
• An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4AE0?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AE0?
   • Yes, return to that problem and begin with MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) on page 504.
   • No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   • Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 5.
   • No, continue with the next step.
4. Call the next level of support. The following notes describe the situation and the actions needed to recover:
   • The hard disk drive that was loaded with the new LIC code has probably failed.
   • A hard drive rebuild is needed, but it can only be done to the original LIC code level.
     Note: The existing configuration diskettes contain data that is valid only for the original LIC code level. New-level diskettes cannot be created until both clusters have been operational at the new level for a minimum of 12 hours.
   • Once the original LIC code has been loaded, a manual process to replace the failed hard drive and to load and activate the new LIC code will be needed.
5. Log in to the working cluster.
6. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu

   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
7. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
8. Display the current boot list setting again. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk that is listed. It should be the opposite of the hdisk listed first in step 6 on page 538. If it is not, call the next level of support.
9. Close the problem that calls MAP 4BE0.
10. Power off and power on the failing cluster. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
11. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to be lit.
12. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BE0?
   • Yes, call the next level of support. The Automatic LIC process appears to be detecting a problem with both hard disk drives that is not due to the replaced I/O drawer planar assembly.
   • No, the boot problem is fixed. Return to MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) on page 504, which sent you here. (MAP 4AE0 will have you complete any remaining repairs and then resume the Automatic LIC process.)
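The 12-hour rule in the step 4 note is simple date arithmetic: because both clusters must have been operational at the new level, the window effectively starts when the later of the two clusters began running at that level. A minimal Python sketch, assuming each cluster has run continuously at the new level since the timestamp you supply (this MAP does not say whether a reboot restarts the 12 hours); all names are hypothetical.

```python
from datetime import datetime, timedelta

# Minimum time both clusters must be operational at the new LIC level
# before new-level configuration diskettes can be created (step 4 note).
MIN_OPERATIONAL = timedelta(hours=12)


def earliest_diskette_creation(cluster1_new_level_since: datetime,
                               cluster2_new_level_since: datetime) -> datetime:
    """Earliest time at which both clusters have 12 hours at the new level."""
    # The cluster that reached the new level last determines the start.
    later_start = max(cluster1_new_level_since, cluster2_new_level_since)
    return later_start + MIN_OPERATIONAL


def can_create_new_diskettes(cluster1_new_level_since: datetime,
                             cluster2_new_level_since: datetime,
                             now: datetime) -> bool:
    """True once the 12-hour window has elapsed for both clusters."""
    return now >= earliest_diskette_creation(cluster1_new_level_since,
                                             cluster2_new_level_since)
```

For example, if cluster 1 reached the new level at 08:00 and cluster 2 at 09:30 on the same day, new-level diskettes cannot be created before 21:30 that day.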


MAPs 5XXX: Host Interface Isolation Procedures


Procedures in the MAP 5XXX group in Chapter 3 cover the host interface attached to the 2105 Model 800 and the internal read/write data paths.

MAP 5000: ESS Specialist Cannot Access Cluster


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
ESS Specialist is accessed with a web browser from the ESSNet console or another customer console. The ESS Specialist software runs on each 2105 Model 800 cluster. Both the customer console and the ESSNet console access the cluster through the ESSNet Ethernet hub.

Isolation
1. Does ESS Specialist access work from the ESSNet console?
   • Yes, continue with the next step.
   • No, go to step 4.
2. Is access working from a customer console (if used)?
   • Yes, ensure that access works to both clusters before determining that the problem is no longer occurring.
   • No, continue with the next step.
3. ESS Specialist works from the ESSNet console but fails from the customer console. The customer network accesses the cluster through an Ethernet connection at the ESSNet console Ethernet hub. Check the following:
   • The customer is using the proper hostname for the cluster on an intranet.
   • The customer is using the proper hostname and domain name for the cluster on the internet.
   • Have the customer try the TCP/IP address.
   • Have the customer ping the TCP/IP address. If the ping is successful, there is a problem with the domain name server or another customer or internet problem.
   • Verify that the ESSNet Ethernet hub port indicator for the customer network attachment is on or blinking. This means that the port is able to communicate with the customer Ethernet hub or connection.
   The problem is either a failing port on the ESSNet Ethernet hub or, more likely, a customer network problem. Go to MAP 4450: ESS Cluster to Customer Network Problem on page 407.
4. Ensure that the cluster has ESS Specialist access enabled (the InfoServer status shows running). From the service terminal Main Service Menu, select:
   Configuration Options Menu
   Configure Communications Resources Menu
   ESS Specialist Menu
   Show ESS Specialist Status
   Continue with the next step.
5. Is the InfoServer running?
   • Yes, go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405.
   • No, use the Enable / Disable ESS Specialist option to enable it.
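The step 3 checks separate a naming problem from a connectivity problem. The Python sketch below is a hypothetical aid for the customer's network staff, not an ESS tool: it resolves the cluster hostname with the standard `socket` module, takes the result of the customer's own ping as a boolean (running ping itself is left to the customer's tools), and maps the observations to the conclusions this MAP draws. Splitting the "ping works" case by whether the name resolves to the expected address is an assumption added for illustration.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Return the IPv4 address the local resolver gives for hostname, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def step3_diagnosis(resolved_ip: Optional[str], cluster_ip: str,
                    ping_by_ip_ok: bool) -> str:
    """Map the step 3 observations to the MAP's conclusion."""
    if ping_by_ip_ok:
        # The IP path works, so the fault lies in naming or elsewhere
        # in the customer or internet setup.
        if resolved_ip != cluster_ip:
            return "name problem: check hostname, domain name, and domain name server"
        return "IP path works: other customer or internet problem"
    return "no IP path: check ESSNet hub port indicator, then go to MAP 4450"
```

For example, `step3_diagnosis(resolve("cluster1.example.com"), "10.0.0.1", True)` (hypothetical hostname and address) points at the domain name server if the name does not resolve to 10.0.0.1.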


MAP 5220: SCSI Bus

MAP 5220: Isolating a SCSI Bus Error


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Attention: To prevent electrostatic discharge, ensure that you discharge all SCSI host cables to the ESD discharge pad before you plug them into the 2105 Model 800. The ESD discharge pads are mounted on the front right and left corners of the 2105 Model 800 frame, next to each tailgate. See Figure 144 on page 543 for the location of the ESD discharge pads.

The SCSI bus has an error.

Description
SCSI bus errors can be detected by any SCSI card on the bus. Each card, whether the 2105 Model 800 SCSI host card or the customer host system SCSI card, most often detects errors in the signals it receives. SCSI cables seldom fail, but SCSI cable connections that are not properly seated can cause errors. Errors can also occur if either end of the SCSI cable is not terminated. The 2105 Model 800 SCSI Host Adapter has a terminator on the card itself.

Isolation
1. Display and repair any 2105 Model 800 reported SCSI adapter problems that may be related to the failure. If none are found, continue with the next step. From the service terminal Main Service Menu, select:
   Repair Menu
   Show / Repair Problems Needing Repair
2. Use the following checks to locate and repair the problem.
3. Check for a fenced condition:
   Note: If SCSI parts have been replaced and the customer still cannot access some volumes, the original SCSI error may have fenced a SCSI port.
   a. Verify that the SCSI ports are not fenced. Connect the service terminal to the cluster being serviced. From the service terminal Main Service Menu, select:
      Utility Menu
      Resource Management Menu
      Show Fenced Resources
   b. Reset any fenced SCSI ports. From the service terminal Main Service Menu, select:
      Utility Menu
      Resource Management Menu
      Reset Fence For a Resource
4. Check that the SCSI host cable is properly connected at each SCSI card.
5. Check that the 2105 Model 800 SCSI host card(s) are properly seated.
6. Check that the host system SCSI card(s) are properly seated.
7. Check the termination of the SCSI bus:
   • A SCSI bus interface cable connects two or more SCSI cards. The connectors at each end of the daisy chain must be terminated. The 2105 Model 800 SCSI host card must be at one end of the SCSI cable. If two 2105 Model 800 SCSI host cards are attached to a SCSI bus interface cable, they must be at opposite ends of the cable, with the customer host SCSI card(s) (one to four) in between. SCSI bus termination is internal to the 2105 Model 800 SCSI host card.
   • If two 2105 Model 800 SCSI host cards are connected to the SCSI bus, ensure that the host system SCSI card(s) are not configured to terminate the SCSI bus when the host system is powered off.
8. Check the SCSI ID settings. There must be no duplicate IDs among the ports connected to the same SCSI bus cable.
   • If two 2105 Model 800 SCSI ports are attached to the same SCSI cable, verify that their SCSI ID assignments do not conflict.
   • Verify that each host SCSI card attached to the SCSI bus is set to a unique SCSI ID.
   • Verify that the host SCSI card ID assignments are correctly registered in the 2105 Model 800 SCSI port configuration.
9. Check the SCSI bus slot parameter settings:
   Note: 2105 Model 800 SCSI bus parameters are set according to the host type configuration setting for each 2105 Model 800 SCSI port. These settings are recorded on the customer worksheets that were used to install the 2105 Model 800.
   a. Verify that the host type setting is correct for each 2105 Model 800 SCSI host card attached to the SCSI bus cable.
   b. Verify that the SCSI bus parameter settings configured into each attached host system SCSI card agree with the 2105 Model 800 SCSI bus parameter settings.
10. SCSI diagnostics:
   • The 2105 Model 800 has no SCSI diagnostics available to test the SCSI interface. The customer host system may have SCSI diagnostics that can test the SCSI interface, and those diagnostics may include procedures to recreate and isolate the problem. Use those diagnostics or procedures now.
   • If the problem is still not isolated, replace the 2105 Model 800 SCSI host card now. Connect the service terminal to the cluster being serviced. From the service terminal Main Service Menu, select:
      Repair Menu
      Replace a FRU
      Host Bay FRUs
     Follow the guided procedure.
11. Was a problem found and repaired?
   • Yes, after the problem is repaired, go to MAP 1500: Ending a Service Action on page 67.
   • No, if no problem is found and the failure still occurs, call the next level of support.
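The termination and SCSI ID rules in steps 7 and 8 can be expressed as a configuration check. The Python sketch below is a hypothetical planning aid, not an ESS function: it assumes you describe the bus yourself as a list of cards in cable order from one end to the other, and it checks only the rules stated in this MAP (no ID range limits or host-specific bus parameters). The far-end termination for a single 2105 port is modeled as a flag on the card at that end.

```python
from dataclasses import dataclass


@dataclass
class ScsiCard:
    name: str
    is_2105_port: bool  # True for a 2105 Model 800 SCSI host card port
    scsi_id: int
    # Host card set to terminate the bus; 2105 ports terminate on the card itself.
    terminates_bus: bool = False


def check_scsi_bus(chain: list[ScsiCard]) -> list[str]:
    """Check a daisy chain listed in cable order; return rule violations."""
    if len(chain) < 2:
        return ["a SCSI bus cable connects two or more SCSI cards"]
    problems: list[str] = []

    # Step 8: no duplicate SCSI IDs on the same bus cable.
    ids = [card.scsi_id for card in chain]
    for dup in sorted({i for i in ids if ids.count(i) > 1}):
        problems.append(f"duplicate SCSI ID {dup}")

    # Step 7: 2105 ports belong at the ends of the cable.
    ends = (0, len(chain) - 1)
    ports = [i for i, card in enumerate(chain) if card.is_2105_port]
    if not ports:
        problems.append("no 2105 SCSI port on this bus")
    for i in ports:
        if i not in ends:
            problems.append(f"{chain[i].name} is not at an end of the cable")

    if len(ports) == 1:
        # The opposite end of the daisy chain must also be terminated.
        far = chain[-1] if ports[0] == 0 else chain[0]
        if not far.terminates_bus:
            problems.append(f"{far.name} at the far end is not terminated")
    elif len(ports) == 2:
        hosts = chain[1:-1]
        if not 1 <= len(hosts) <= 4:
            problems.append("one to four host SCSI cards must sit between the 2105 ports")
        for card in hosts:
            if card.terminates_bus:
                problems.append(f"{card.name} must not terminate the bus")
    elif len(ports) > 2:
        problems.append("more than two 2105 SCSI ports on one bus")
    return problems
```

For example, a bus of a 2105 port (ID 7), two host cards (IDs 6 and 5), and a second 2105 port (ID 4) returns an empty list, while giving a host card ID 7 and enabling its terminator reports both problems.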


MAP 5230: Fixed Block Read Data

Figure 144. 2105 Model 800 ESD Discharge Pad Locations (s009141). [Illustration: front view and top view of the frame (Cluster 1 and Cluster 2), with the ESD discharge pads and the tailgate labeled.]

MAP 5230: Isolating a Fixed Block Read Data Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Attention: To prevent electrostatic discharge, ensure that you discharge all SCSI host cables to the ESD discharge pad before you plug them into the 2105 Model 800. The ESD discharge pads are mounted on the front right and left corners of the 2105 Model 800 frame, next to each tailgate. See Figure 144 for the location of the ESD discharge pads.

Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary; the required action may be to repair another problem in the log. This MAP isolates the following ESCs:
• ESC 3490: customer data sequence number validation error with data LRC.
• ESC 34A0: customer data sequence number validation error without data LRC.
• ESC 34AF: third or later repeat of a customer data sequence number validation error on the same target LBA (Logical Block Address), track, or volume.
• ESC 34B0: failure of a data transfer validation process initiated by a SCSI Send Diagnostic command.
• ESC 4960: second occurrence of a customer data sequence number validation error on the same target LBA (Logical Block Address), track, or volume.

Isolation
Refer to Table 90 on page 544 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action.



Table 90. SCSI Read Data Failure ESC Repairs

ESC 3490
   Description: Customer Data Sequence Number validation error. Data transferred from a DDM to cache memory is not from the expected Logical Block Address (LBA): the Sequence Number in the received LBA does not match the expected Sequence Number. Sequence number validation also detected an LRC error, indicating that the LBA data is defective.
   Recommended Action: LRC failures are the higher-priority symptom. If the problem contains a failure with ESC value 33XX (LRC failure), repair the ESC 33XX problem. If no ESC 33XX problem exists, the probable cause of this failure is a microcode logic error; contact your next level of support for fault isolation and repair assistance.

ESC 34A0, 34AF, or 4960
   Description: Customer Data Sequence Number validation error. Data transferred from a DDM to cache memory is not from the expected Logical Block Address (LBA): the Sequence Number in the received LBA does not match the expected Sequence Number. ESC 34AF indicates that additional Sequence Number error events have been logged for the same target LBA, track, or volume.
   Recommended Action: An error has occurred while reading or writing data on the track, volume, or array. Contact your next level of support for fault isolation and repair assistance. Customer repair action may be required to restore data after the hardware problem has been resolved.

ESC 34B0
   Description: A data transfer validation process initiated by a SCSI Send Diagnostic command failed. A write or read data transfer failure would be logged as a separate error event and ESC. If no other error has been logged, this failure indicates that the data read did not match the test pattern data written.
   Recommended Action: Use this problem to determine a repair action only if the problem does not contain any other records of a hardware failure associated with this diagnostic failure's SCSI port, data path, and target volume. If you are unable to identify another hardware repair action, contact your next level of support for fault isolation and repair assistance.
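The decision in Table 90 can be summarized as a lookup with one priority rule (LRC failures, ESC 33XX, come first). A minimal Python sketch, assuming you list the other ESCs found in the same problem yourself and decide separately whether another hardware repair action exists for a 34B0 failure; the function name and returned wording are hypothetical.

```python
from typing import Iterable


def table90_action(esc: str, problem_escs: Iterable[str] = (),
                   other_hw_repair_found: bool = False) -> str:
    """Sketch of the Table 90 recommended action for one ESC."""
    esc = esc.upper()
    escs = [e.upper() for e in problem_escs]
    if esc == "3490":
        # LRC failures (ESC 33XX) are the higher-priority symptom.
        if any(len(e) == 4 and e.startswith("33") for e in escs):
            return "repair the ESC 33XX problem"
        return "probable microcode logic error: contact next level of support"
    if esc in ("34A0", "34AF", "4960"):
        return "contact next level of support; customer may need to restore data"
    if esc == "34B0":
        # Use 34B0 only when no other hardware failure record applies.
        if other_hw_repair_found:
            return "use the other hardware repair action"
        return "contact next level of support"
    raise ValueError(f"ESC {esc} is not handled by MAP 5230")
```

For example, `table90_action("3490", ["3301"])` directs you to the ESC 33XX repair first.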

MAP 5240: Isolating a Customer Data Check Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary; the required action will be to repair another problem in the log. The failure has made customer data unreadable, so the customer must restore the data after the hardware or microcode repair action is complete. This MAP isolates the following ESCs:
• ESC 4910: customer data check, DDM medium error, single LBA.
• ESC 4920: customer data check, DDM medium error, multiple LBAs.
• ESC 4930: customer data check, data LRC, single LBA.
• ESC 4940: customer data check, data LRC, multiple LBAs.

Isolation
1. Refer to Table 91 on page 545 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action.


MAP 5240: Customer Data Check


   Note: Close the problem that sent you here when the data check failure repair and recovery are complete.
2. After the underlying hardware has been repaired, customer action is required to restore the track:
   Fixed Block: See the Additional Message in the problem for the failed volume and the first failing LBA on the track. Restore this data from backup.
   CKD: A Media SIM for Media Maintenance Procedure 2 has been sent to the host. Ask the customer to follow this procedure to return the track to usable condition, then restore the customer data from backup. Media Maintenance Procedure 2 is described in Analyzing a Media SIM.

If a hardware repair problem is not available for this failure, the failure may be intermittent. If the data failure continues, call your next level of support for assistance in isolating and repairing the problem.
Table 91. Customer Data Check Failure ESC Repairs

ESC 4910 or 4920
   Description: Customer Data Check affecting one or more Logical Block Addresses on the target volume (4910 indicates one LBA; 4920 indicates more than one LBA). The SSA device card reported a Medium Error during data transfer from the DDM to cache memory.
   Recommended Action: Locate and repair the problem with ESC CXXX, DXXX, or EXXX that contains a repair action for the DDM or SSA device card associated with this Data Check.

ESC 4930 or 4940
   Description: Customer Data Check affecting one or more Logical Block Addresses on the target volume (4930 indicates one LBA; 4940 indicates more than one LBA). An LRC check, sequence number check, or physical address check detected during data transfer could not be recovered. The data has been marked defective on the DDM, and subsequent attempts to read it will fail.
   Recommended Action: Locate and repair any problems with ESC 33XX or 34XX.

Analyzing a Media SIM


For information about correcting a failure that causes a media SIM, see Maintaining IBM Storage System Media (form number GC26-4495-05 or later).
Note: Before the customer does a media maintenance procedure, the customer may need to determine the address of the cylinder and head involved in the failure. Use the SIM portion of an EREP system execution report to obtain the address (cccchh).
2105 Model 800 Media SIM Maintenance: Instruct the customer to perform the media maintenance procedure in Media SIM Maintenance Procedure 2 on page 546. Also see the examples in Example of Media SIM Maintenance Procedure 2 on page 546.

Problem Isolation Procedures, CHAPTER 3



Media SIM Maintenance Procedure 2: The first part of this procedure finds all tracks with unrecoverable data and supplies information on the allocation of the user data (for example, data set names). The second part of this procedure returns the indicated track to a usable condition. Data on this track is no longer readable: all subsystem attempts at media maintenance and all attempts to recover the data have been unsuccessful.
1. Using ICKDSF Release 16 or higher, enter the following commands:
   IODELAY SET MSEC(100)
   ANALYZE <UNIT() | DDNAME()> NODRIVE SCAN

   IODELAY adjusts ICKDSF to run concurrently with customer operations. ANALYZE scans the volume for data that is not readable or usable.
2. See Example of Media SIM Maintenance Procedure 2 for the location of the ESC and the address of the failing track and head (cccchh) in the ANALYZE sense information.
3. For each track that reports an ESC of 49XX, issue the following command (all on the same line):
INSPECT <UNIT() | DDNAME()> <VFY()|NOVFY> ASSIGN NOCHECK NOPRESERVE TRACK(cccc,hh)

Warning: The above ICKDSF INSPECT command results in the loss of all customer data on that track. The NOPRESERVE parameter must be specified for the 2105 Model 800; the PRESERVE parameter is not valid for the 2105 Model 800. All previous attempts by the subsystem to recover the data have been unsuccessful. Although the track will be returned to a usable state, all customer data on the specified track will be lost when the INSPECT command is run.

Example of Media SIM Maintenance Procedure 2: Run the ICKDSF command sequence below to locate all tracks with unrecoverable data, obtain information on the allocation of user data, and restore such tracks to a usable condition. ICKDSF must be at level 16 or higher. The values of interest in the following example (the ESC and the cccchh address in the SENSE data) are explained in the note below.
ENTER INPUT COMMAND: analyze unit(1290) nodrive scan
ANALYZE UNIT(1290) NODRIVER SCAN
ICK00700I DEVICE INFORMATION FOR 1290 IS CURRENTLY AS FOLLOWS:
          PHYSICAL DEVICE = XXXX
          STORAGE CONTROLLER = XXXX
          STORAGE CONTROL DESCRIPTOR = CC
          DEVICE DESCRIPTOR = 06
ICK04000I DEVICE IS IN SIMPLEX STATE
ICK01400I 1290 ANALYZE STARTED
ICK01408I 1290 DATA VERIFICATION TEST STARTED
ICK21776I DATAVER TEST: ERROR DURING DATA VERIFICATION
          CSW = D07C88 0200FFFF  CCW = DE000000 3000FFFF  FILEMASK = 1E
          SENSE = 80000000 9000010B 00000034 80000004
                  02007667 FB200F0B 000040E2 0003A401
ICK21401I 1290 SUSPECTED DRIVE PROBLEM
ICK401I 1290 SUSPECTED DRIVE PROBLEMcchh
ICK01406I 1290 ANALYZE ENDED
ICK00001I FUNCTION COMPLETED, HIGHEST CONDITION CODE WAS 8

Note: In this example, the ESC is 0F0B and the failing track and head address (cccchh) is 03A401. The cccc is 03A4 and the hh is 01.

Common ICKDSF Messages:



ICK31054I - Device not supported for specific function
   Ensure that the parameters specified in the media maintenance procedure are correct and rerun the ICKDSF media maintenance procedure.
ICK12155I - Parameter ignored for device type (parameter)
   The parameter identified is not valid for the 2105 Model 800. This parameter is ignored and processing continues. No action is needed.
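The field positions described in the example note can be applied mechanically. The following Python sketch is a hypothetical helper, not an IBM tool. It assumes the SENSE data is laid out as in the example above (the ESC in the last two bytes of the sixth sense word, cccchh in the last three bytes of the eighth), extracts those values, and formats the step 3 INSPECT command using the NOVFY option. Always confirm the ESC and address against the actual ICKDSF output, and the command syntax against the ICKDSF documentation, before running anything: INSPECT with NOPRESERVE destroys all data on the track.

```python
import re

# Hypothetical helper: assumes the 8-word SENSE layout shown in the example above.
SENSE_RE = re.compile(r"SENSE\s*=\s*((?:[0-9A-F]{8}\s+){7}[0-9A-F]{8})")


def parse_sense(analyze_output: str) -> tuple[str, str, str]:
    """Return (esc, cccc, hh) from the first SENSE = ... entry in ANALYZE output."""
    match = SENSE_RE.search(analyze_output)
    if match is None:
        raise ValueError("no 8-word SENSE data found in ANALYZE output")
    words = match.group(1).split()
    esc = words[5][-4:]     # 6th word, e.g. FB200F0B -> ESC 0F0B
    cccchh = words[7][-6:]  # 8th word, e.g. 0003A401 -> cccchh 03A401
    return esc, cccchh[:4], cccchh[4:]


def needs_inspect(esc: str) -> bool:
    """Step 3 applies only to tracks that report an ESC of 49XX."""
    return esc.upper().startswith("49")


def inspect_command(unit: str, cccc: str, hh: str) -> str:
    """Format the step 3 INSPECT command for one track (NOVFY variant)."""
    return (f"INSPECT UNIT({unit}) NOVFY ASSIGN NOCHECK NOPRESERVE "
            f"TRACK({cccc},{hh})")


if __name__ == "__main__":
    sample = ("SENSE = 80000000 9000010B 00000034 80000004\n"
              "        02007667 FB200F0B 000040E2 0003A401")
    esc, cccc, hh = parse_sense(sample)
    print(esc, cccc, hh)        # 0F0B 03A4 01
    print(needs_inspect(esc))   # False: 0F0B is not a 49XX ESC
    print(inspect_command("1290", cccc, hh))
```

Note that the example's ESC (0F0B) is not a 49XX value, so in that case the sketch would not produce an INSPECT command for the track; it is shown only to demonstrate where the fields sit.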

MAP 5250: Isolating a Meta Data Check Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. A hardware or microcode repair action is necessary; the required action is to repair another problem in the log. This MAP isolates for the following ESCs:
• ESC 4980, Meta data check, DDM medium error, single LBA.
• ESC 4990, Meta data check, DDM medium error, multiple LBAs.
• ESC 49A0, Meta data check, data LRC, single LBA.
• ESC 49B0, Meta data check, data LRC, multiple LBAs.

Isolation
Refer to Table 92 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action. Data will be recovered by internal microcode. No data repair action is required. If a hardware repair problem is not available for this failure, the failure may be intermittent. If the data failure continues, call your next level of support for assistance in isolating and repairing the problem.
Table 92. Meta Data Check Failure ESC Repairs

ESC 4980 or 4990
   Description: Meta Data Check affecting one or more Logical Block Addresses on the target volume. 4980 indicates one LBA, 4990 indicates more than one LBA. The SSA device card reported a Medium Error during data transfer from DDM to cache memory.
   Recommended Action: Locate and repair the problem with ESC CXXX, DXXX or EXXX that contains a repair action for the DDM or SSA device card that is associated with this Data Check.

ESC 49A0 or 49B0
   Description: Meta Data Check affecting one or more Logical Block Addresses on the target volume. 49A0 indicates one LBA, 49B0 indicates more than one LBA. An LRC check detected during data transfer from DDM to cache memory could not be recovered.
   Recommended Action: Locate and repair the problem with ESC 33XX that contains a repair action for the DDM or SSA device card that is associated with this Data Check.
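Tables 91 and 92 amount to a lookup from the data check ESC to the ESC patterns of the underlying repair problem, where X stands for any hexadecimal digit. The Python sketch below encodes that lookup so that ESCs already in the problem log can be filtered. The table name, helper names, and sample log entries are hypothetical and only illustrate the matching; the tables themselves remain the authority.

```python
# Data check ESC -> (data type, ESC patterns of the problem to repair).
# Encodes Table 91 (customer data) and Table 92 (meta data); X = any hex digit.
DATA_CHECK_REPAIRS: dict[str, tuple[str, tuple[str, ...]]] = {
    "4910": ("customer", ("CXXX", "DXXX", "EXXX")),
    "4920": ("customer", ("CXXX", "DXXX", "EXXX")),
    "4930": ("customer", ("33XX", "34XX")),
    "4940": ("customer", ("33XX", "34XX")),
    "4980": ("meta", ("CXXX", "DXXX", "EXXX")),
    "4990": ("meta", ("CXXX", "DXXX", "EXXX")),
    "49A0": ("meta", ("33XX",)),
    "49B0": ("meta", ("33XX",)),
}


def esc_matches(pattern: str, esc: str) -> bool:
    """True if esc matches pattern, where each X matches any hex digit."""
    return len(pattern) == len(esc) and all(
        p == "X" or p == e for p, e in zip(pattern.upper(), esc.upper()))


def problems_to_repair(data_check_esc: str, logged_escs: list[str]) -> list[str]:
    """Return the logged ESCs whose problems the tables say to repair."""
    _, patterns = DATA_CHECK_REPAIRS[data_check_esc.upper()]
    return [esc for esc in logged_escs
            if any(esc_matches(p, esc) for p in patterns)]


def customer_restore_needed(data_check_esc: str) -> bool:
    """Customer data checks (Table 91) need a restore; meta data checks do not."""
    kind, _ = DATA_CHECK_REPAIRS[data_check_esc.upper()]
    return kind == "customer"


if __name__ == "__main__":
    log = ["3310", "C123", "3402", "1326"]  # hypothetical problem log
    print(problems_to_repair("4930", log))   # ['3310', '3402']
    print(problems_to_repair("49A0", log))   # ['3310']
    print(customer_restore_needed("4990"))   # False
```

The restore flag reflects the difference between the two MAPs: MAP 5240 requires the customer to restore data from backup, while for MAP 5250 the data is recovered by internal microcode.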


MAP 5300: Link Fault Isolation

MAP 5300: ESCON or Fibre Channel Link Fault


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Fiber Optic Cable Handling Precautions


CAUTION: Do not look into the end of a fiber optic cable or into a fiber optic receptacle. Eye injury can result. To verify the continuity of a fiber optic cable, use an optical light source and a power meter. Although shining light into one end and looking into the other end of a disconnected optical fiber to verify the continuity of optic fibers may not injure the eye, this procedure is potentially dangerous. Therefore, verifying the continuity of optical fibers by shining light into one end and looking into the other end is not recommended. (1061)
Note: This notice is translated into selected languages. See Translation of Cautions and Danger Notices in chapter 11 of Volume 3.
Attention: Fiber optic cables are easily damaged by fiber breakage, and the cable connectors must be clean to perform correctly. Observe the following precautions to prevent damage when you handle fiber optic cables:
1. Save all the plastic connector covers for later use. These covers can be used to protect the link cable connectors when you remove the 2105 host adapter card or when you store the cables.
2. Do not remove the protective cover plugs from the connector ends until you are ready to insert the connector into a card. You may have to remove the cover to feed the cable through the tailgate.
3. Before you insert the connector into a card, ensure that you clean the connector end faces. Use the fiber optic cleaning procedure specified in the fiber optic connector cleaning kit (New P/N 46G6844 or Old P/N 5453521).
4. Do not pull on the connector.
5. Do not bend the cable to a radius smaller than 12 mm (0.5 in.).

Description
Link incidents are problems that are not automatically detected, isolated, and reported by any single node on the optical link. They occur on an interface and may cause multiple nodes to detect different types of link incidents. Each node that detects a link incident reports its own link incident. Fault isolation of link incidents requires the combined use of product and system documentation:
• Enterprise Systems Link Fault Isolation book, form number SY22-9533
• Maintenance Information for S/390 Fiber Optic Links (ESCON, FICON, Coupling Links, and Open System Adapters) book, form number SY27-2597

Isolation
1. Were you sent here by a link incident that was detected by a unit or device external to this 2105?
   • Yes, continue with the next step.
   • No, go to step 3 on page 549.
2. Use link fault isolation procedures to determine the source of the problem. See Description above.
   Is the 2105 suspected to be causing this problem?
   • Yes, continue with the next step.
   • No, the problem is not in this 2105. Use maintenance procedures for the suspected failing unit, or call the next level of support.
3. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
   Is there a problem for the host adapter connected to the failing link?
   • Yes, continue with the next step.
   • No, go to step 5.
4. Is the ESC defined as a Bit Error Rate Failure?
   Note: Refer to chapter 9 in Volume 3. Examples of these ESCs are 1326, 1328, 1329, and 356A.
   • Yes, go to the MAP for the host adapter card that is reporting the problem:
     - Fibre Channel card: go to MAP 5410: Fibre Channel Bit Error Rate Validation on page 563. If the problem is not resolved, return here and go to step 7.
     - ESCON card: go to MAP 5310: ESCON Bit Error Rate Validation on page 551. If the problem is not resolved, return here and go to step 7.
   • No, use the repair process for the logged problem to replace the FRUs.
5. Is the link cable properly connected to the host adapter card?
   • Yes, continue with the next step.
   • No, connect the cable, exit this MAP, and have the customer attempt to use the link. If the problem still exists, continue with the next step.
6. Clean the fiber optic connectors on the host card and cables, then continue with the next step.
   Note: Clean the fiber optic connectors as described in the cleaning instructions in the fiber optic connector cleaning kit (New P/N 46G6844, Old P/N 5453521).
7. Run the Wrap test to determine whether the host adapter card or port is failing:
   • For ESCON, go to step 12a on page 550 and do procedure A.
   • For Fibre Channel (including FICON), go to step 12b on page 550 and do procedure B.
   Did the test run successfully?
   • Yes, continue with the next step.
   • No, use the repair process to replace the FRU.
8. Have the customer attempt to use the link.
   Was the customer able to use the link without further errors?
   • Yes, continue with the next step.
   • No, the problem may be intermittent; go to step 10.
9. Is this the first problem reported on this link?
   • Yes, the problem may be fixed. No further action is required unless the problem returns.
   • No, the problem may be intermittent; continue with the next step.
10. Check the optical transmitter output level:
   • For ESCON, use MAP 5320: ESCON Optical Power Measurement, Isolation Procedure 1: Optical Transmitter Measurement on page 553 to answer the question below.
   • For Fibre Channel, use MAP 5321: Fibre Channel Optical Power Measurement, Isolation Procedure 1: Optical Transmitter Measurement on page 557 to answer the question below.
   Was the optical transmitter output correct?
   • Yes, continue with the next step.
   • No, use the repair process to replace the FRU. From the service terminal Main Service Menu, select: Repair Menu > Replace a FRU > Host Bay FRUs
     a. Select the host bay containing the failing host card.
     b. Select the failing host card.
11. Check that the optical receiver is receiving a correct signal level:
   • For ESCON, use MAP 5320: ESCON Optical Power Measurement, Isolation Procedure 2: Optical Receiver Measurement on page 555 to answer the question below.
   • For Fibre Channel, use MAP 5321: Fibre Channel Optical Power Measurement, Isolation Procedure 2: Optical Receiver Measurement on page 559 to answer the question below.
   Was the optical receiver input level correct?
   • Yes, verify that all optical link cables are reconnected, cancel any outstanding problems logged for this link, resume any quiesced links, then call your next level of support.
   • No, the problem is within the link cabling or the transmitter in the remote card.
12. Is the problem cause identified?
   • Yes, exit this procedure.
   • No, call your next level of support.
   a. Procedure A: From the service terminal Main Service Menu, select: Machine Test Menu > Host Interface Cards Menu > ESCON Host Cards Menu > ESCON Port Optical Wrap Test
   b. Procedure B: From the service terminal Main Service Menu, select: Machine Test Menu > Host Interface Cards Menu > Fibre Channel Host Cards Menu > Fibre Channel Port Optical Wrap Tests
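The branching in steps 4 and 12 of MAP 5300 depends only on the ESC and the host card type. The Python sketch below encodes those two decisions as plain data. It is an illustration, not a service tool: the enum and helper names are hypothetical, and the ESC set contains only the four examples named in step 4, so a Bit Error Rate ESC missing from that list would be misrouted. Chapter 9 of Volume 3 remains the authoritative ESC definition.

```python
from enum import Enum


class HostCard(Enum):
    ESCON = "ESCON"
    FIBRE_CHANNEL = "Fibre Channel"  # includes FICON


# Step 4 examples only; chapter 9 of Volume 3 defines the complete set.
BER_EXAMPLE_ESCS = {"1326", "1328", "1329", "356A"}

BER_MAP = {
    HostCard.ESCON: "MAP 5310: ESCON Bit Error Rate Validation",
    HostCard.FIBRE_CHANNEL: "MAP 5410: Fibre Channel Bit Error Rate Validation",
}

# Step 12, procedures A and B: menu path from the Main Service Menu.
WRAP_TEST_PATH = {
    HostCard.ESCON: ("Machine Test Menu", "Host Interface Cards Menu",
                     "ESCON Host Cards Menu", "ESCON Port Optical Wrap Test"),
    HostCard.FIBRE_CHANNEL: ("Machine Test Menu", "Host Interface Cards Menu",
                             "Fibre Channel Host Cards Menu",
                             "Fibre Channel Port Optical Wrap Tests"),
}


def step4_action(esc: str, card: HostCard) -> str:
    """Where step 4 sends you for a host adapter problem with this ESC."""
    if esc.upper() in BER_EXAMPLE_ESCS:
        return BER_MAP[card]
    return "Use the problem to replace the FRUs"


if __name__ == "__main__":
    print(step4_action("356a", HostCard.ESCON))
    print(" > ".join(WRAP_TEST_PATH[HostCard.FIBRE_CHANNEL]))
```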

MAP 5305: ESCON or Fibre Channel Bit Error Rate Test Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.


MAP 5305: Bit Error Rate Test Failure

Description
The Bit Error Rate Test was run, but the ESCON or fibre channel link under test could not transmit frames. (The Bit Error Rate Test counts errors during frame transmission on a fibre link. When the link cannot transmit frames, no errors can be counted, so the test cannot be run.) The problem may be caused by the fibre link itself or by the adapter at either end of the link.

Isolation
This diagnostic requires that the port being tested is connected to an enabled source that is transmitting communication frames. Customer data transfer is not required; the normal idle process is enough. Normally the interface must be physically connected directly to a host system or through a switch (ESCON Director or SAN fabric). Do you have an enabled connection as described?
• Yes, return to the procedure that sent you here.
• No, the test was run without the required connection. Close the problem.

MAP 5310: ESCON Bit Error Rate Validation


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Bit Error Rate Threshold incidents are caused by specific conditions at an interface or along a line which can cause bits to be received or interpreted incorrectly. These bit errors are counted, and when a specific number is reached (threshold exceeded), the link is operating in a degraded mode. Bit errors are counted by each node attached on a link. You must determine which node(s) in a link have detected a threshold exceeded condition to identify the link or nodes causing the incident.

Isolation
1. Determine what type of error was reported by the customer. Was the customer-reported error a Bit Error Threshold Exceeded (BER) condition detected at the ATTACHED node?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Display problems using the following service panel options. From the service terminal Main Service Menu, select: Repair Menu > Show / Repair Problems Needing Repair
   Are there any bit error rate problems (ESC=356A) for the failing link?
   • Yes, continue with the next step.
   • No, additional link problem determination is needed. Ensure that all optical link cables are reconnected, then return to MAP 0120 in the Enterprise Systems Link Fault Isolation book, form number SY22-9533.
3. Test the bit error rate:
   • Reconnect the optical link cables to the subsystem, if previously disconnected.
   • Run the Bit Error Rate Test on the failing link. From the service terminal Main Service Menu, select: Machine Test Menu > Interface Cards Menu > ESCON Host Ports Menu > ESCON Port Optical Bit-Error-Rate Test
     Select the SA interface to be tested, and follow the instructions on the screen to run the test.
   Did the test run successfully?
   • Yes, cancel any outstanding Bit Error Rate problems logged for this link and resume any quiesced links. The call is complete.
   • No, continue with the next step.
4. Determine how many times the ESCON Port Optical Bit-Error-Rate Test has been run. Has this test been run only one time?
   • Yes, clean the fiber optic connectors and run this test again. Use the fiber optic cleaning procedure specified in the fiber optic connector cleaning kit (New P/N 46G6844 or Old P/N 5453521). Continue at step 3 on page 551.
   • No, return to the step in MAP 5300: ESCON or Fibre Channel Link Fault on page 548 that sent you here.
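Steps 3 and 4 of MAP 5310 form a bounded retry: run the test, clean the connectors and rerun once after a first failure, then escalate back to MAP 5300. The Python sketch below captures only that control flow. It assumes two hypothetical callables, `run_test` (returns True when the Bit Error Rate Test passes) and `clean_connectors`, which stand in for the manual actions at the service terminal.

```python
from typing import Callable


def validate_ber(run_test: Callable[[], bool],
                 clean_connectors: Callable[[], None]) -> str:
    """Run the BER test; after a first failure, clean and retry once, then escalate."""
    for run in (1, 2):
        if run_test():
            return ("Cancel outstanding Bit Error Rate problems for this link, "
                    "resume quiesced links; the call is complete")
        if run == 1:
            clean_connectors()  # only after the first failed run
    return "Return to the step in MAP 5300 that sent you here"


if __name__ == "__main__":
    results = iter([False, True])  # fails once, passes after cleaning
    cleanings = []
    print(validate_ber(lambda: next(results), lambda: cleanings.append("clean")))
    print(len(cleanings))  # 1
```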

MAP 5320: ESCON Optical Power Measurement


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: Ensure that you read the Fiber Optic Cable Handling Precautions on page 548 before you run this test.

Description
This MAP contains two procedures:
• Isolation Procedure 1: Optical Transmitter Measurement on page 553
• Isolation Procedure 2: Optical Receiver Measurement on page 555
The procedures should be performed sequentially. These procedures measure the optical power at the 2105 Model 800 ESCON card and the customer's ESCON port cable using the optical power meter (P/N 18F7005). The coupler and test cable are part of the fiber optic test support kit (P/N 18F6953). Isolation Procedure 1 runs the ESCON Port Optical Wrap Test on the selected 2105 ESCON card port. A successful wrap test not only ensures that the card is operating correctly but also conditions the port for a power measurement.
Note: Do not skip the wrap test, even if it was previously run.



[Figure 145. Measuring Optical Transmit Power (S008185m): the duplex connector of the test cable is attached to the 2105 host adapter card; the test cable's black biconic connector goes to the power meter (1300 nm).]

Isolation Procedure 1: Optical Transmitter Measurement: This procedure measures the optical power transmitted from the 2105 Model 800 ESCON card through a short test cable (P/N 18F6948).
Note: Clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.
1. Verify that the host bay containing the 2105 Model 800 ESCON card is powered on.
2. Run the optical wrap test on the desired ESCON card port. From the service terminal Main Service Menu, select: Machine Test Menu > Host Interface Cards Menu > ESCON Host Cards Menu > ESCON Port Optical Wrap Test
   Select the desired host card port to be tested and follow the screen instructions to run the test.
   Note: After the wrap test returns a Machine Test was successful message and displays Make Resource Available for Customer Use, STOP! Do not press Enter. Do not follow the screen instructions to reconnect the customer cables and resume the resources until instructed to do so at the end of this MAP (when all the power measurements are complete).
3. Was the wrap test successful?
   • Yes, the host card port is now ready for the power measurement; continue with the next step.
   • No, use the repair process to replace the FRU. From the service terminal Main Service Menu, select: Utility Menu > Show/Repair Problems Needing Repair
     Repair any ESCON host card problems shown, then repeat the ESCON Optical Power Measurement.
4. Connect the duplex connector of the optical power meter test cable to the 2105 Model 800 ESCON card duplex connector (see Figure 145 on page 553).
   If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set the meter using Optical Power Meter Setup on page 556. After the meter is set, insert the black biconic connector of the test cable, P/N 18F6948, into the receptacle on the top of the power meter.
5. Use the optical power meter to obtain a reading. The power reading should be at least -21 dBm (-20 dBm is more power than -21 dBm; a reading of -22 dBm, for example, indicates that the transmitter is failing). Record the actual measurement value for possible use during the link fault isolation procedures.
6. Disconnect the test cable from the 2105 Model 800 ESCON card.
7. Was the power measurement correct?
   • Yes, continue with the next step.
   • No, use the repair process to replace the ESCON host card FRU. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select: Repair Menu > Replace a FRU > Host Bay FRUs
     Replace the ESCON host card that was being tested, then repeat the Optical Transmitter Measurement.
     Note: The repair procedure will resume the required resources. When the repair is complete, return to the procedure that directed you here.
8. Do you still need to perform the Optical Receiver Measurement procedure?
   • Yes, perform the Optical Receiver Measurement (Isolation Procedure 2).
   • No, continue with the next step.
9. Return to the service terminal and follow the instructions on the screen to: Make Resource Available for Customer Use
10. Return to the procedure that sent you here.

[Figure 146. Measuring Optical Receive Power (s008186n): the customer's duplex connector (from the ESCON channel or ESCON Director) and the duplex connector of the test cable plug into the duplex-to-duplex test coupler; the test cable's black biconic connector goes to the power meter (1300 nm).]

Isolation Procedure 2: Optical Receiver Measurement: This procedure measures the power received at the end of the customer's ESCON link cable (the input into the 2105 host card optical receiver).
Note: Always clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.
1. Ensure that the device on the other end of the link is powered on.
2. Disconnect the fiber optic cable connector from the duplex connector on the 2105 Model 800 ESCON card, if not previously disconnected.
3. Connect the duplex connector of the customer's fiber optic cable (the duplex connector that was removed from the 2105 Model 800 ESCON card) into one side of the duplex-to-duplex test coupler, P/N 18F6952 (see Figure 146).
4. Connect the duplex connector of the optical power meter test cable into the other side of the duplex-to-duplex test coupler. If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set the meter using Optical Power Meter Setup on page 556. After the meter is set, insert the black biconic connector of the test cable, P/N 18F6948, into the receptacle on the top of the power meter.
5. Use the optical power meter to obtain a reading. The power reading should be at least -29.0 dBm (-28.0 dBm is more power than -29.0 dBm). Record the actual measurement value for possible use later during the link fault isolation procedures.
6. Disconnect the customer fiber optic channel cable from the coupler and reconnect the cable to the 2105 Model 800 ESCON card.
7. Return to the service terminal and follow the instructions on the screen to: Make Resource Available for Customer Use
8. Return to the procedure that sent you here.

Optical Power Meter Setup: Use this procedure only for the initial setup of the optical power meter (P/N 18F7005):
1. Power meter on.
2. Set the meter to 1300 nanometers (nm).
3. Zero the meter.
4. Set the meter to display the dBm scale.
Note: Do not hold down a push-button for more than one-half second. When held down for more than approximately three seconds, the push-button generates results different from those needed.
5. Ensure that the black cap is over the biconic receptacle at the top of the power meter.
6. Press Power On/Off. AUTO OFF will be displayed, and the meter will turn off if no push-button is pressed in ten minutes. Allow a two-minute warm-up period.
7. If the meter does not display 1300 nm, press the λ (lambda) push-button repeatedly until 1300 nm is displayed.
8. Press ZERO. Two displays will be seen:
   • A value between 0.30 and 0.70 nW (nanowatts).
   • ZERO will blink after a short time, indicating that the meter is properly set to zero.
   If the above indicators do not display and Hi or Lo is displayed after pressing ZERO, press ZERO again. Using a small screwdriver, adjust the trim pot next to the biconic receptacle at the top of the meter until a value between 0.30 and 0.70 nW is displayed. Set the value as close to 0.50 nW as possible. Press ZERO again to zero the meter.
9. The meter must also display dBm (decibels referenced to one milliwatt). If nW is displayed, press dBm/Watt.
Continue with one of the following:
• Isolation Procedure 1: Optical Transmitter Measurement on page 553
• Isolation Procedure 2: Optical Receiver Measurement on page 555
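Because dBm is a logarithmic scale referenced to 1 mW, a less negative reading means more power; for example, -20 dBm is 0.01 mW while -21 dBm is about 0.0079 mW. The Python sketch below converts dBm to milliwatts and checks readings against the ESCON minimums and the meter zeroing band stated in this MAP. The helper names are hypothetical, and the check treats a reading exactly at the minimum as acceptable, which is an assumption based on the wording "at least".

```python
# Minimum ESCON readings from Isolation Procedures 1 and 2 (more negative = less power).
ESCON_MIN_DBM = {"transmit": -21.0, "receive": -29.0}


def dbm_to_mw(dbm: float) -> float:
    """Convert dBm (decibels referenced to 1 mW) to milliwatts."""
    return 10 ** (dbm / 10)


def escon_reading_ok(kind: str, dbm: float) -> bool:
    """True if a transmit or receive reading meets the ESCON minimum."""
    return dbm >= ESCON_MIN_DBM[kind]


def zero_reading_ok(nanowatts: float) -> bool:
    """Meter setup: after ZERO is pressed, the display should read 0.30 to 0.70 nW."""
    return 0.30 <= nanowatts <= 0.70


if __name__ == "__main__":
    print(f"{dbm_to_mw(-20.0):.5f} mW vs {dbm_to_mw(-21.0):.5f} mW")
    print(escon_reading_ok("transmit", -22.0))  # False: transmitter failing
    print(escon_reading_ok("receive", -28.0))   # True
```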

MAP 5321: Fibre Channel Optical Power Measurement


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: Ensure that you read the Fiber Optic Cable Handling Precautions on page 548 before you run this test.

Description
This MAP contains two procedures:
• Isolation Procedure 1: Optical Transmitter Measurement on page 557
• Isolation Procedure 2: Optical Receiver Measurement on page 559
The procedures should be performed sequentially. These procedures measure the optical power at the 2105 Model 800 fibre channel host card and the customer's fibre channel cable using the optical power meter (P/N 18F7005). Isolation Procedure 1 runs the Fibre Channel Port Optical Wrap Test on the selected 2105 fibre channel card port. A successful wrap test not only ensures that the card is operating correctly but also conditions the port for a power measurement. Isolation Procedure 2 measures the power at the end of the customer's cable.
Note: Do not skip the wrap test, even if it was previously run.

Isolation Procedure 1: Optical Transmitter Measurement: This procedure measures the optical power transmitted from the 2105 Model 800 fibre channel host card through a short SC-to-SC test cable (P/N 54G3407), an SC-to-ST adapter (P/N 54G3424), and an ST-to-ST test cable (P/N 02G6159).
Note: Clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.
[Figure 147. Measuring Fibre Channel Optical Transmit Power (s008840l): the device connects through the SC-to-SC test cable (54G3407), the SC-to-ST adapter (54G3424), and the ST-to-ST test cable (02G6159) to the power meter.]

1. Verify that the host bay containing the 2105 Model 800 fibre channel card is powered on.
2. Run the optical wrap test on the desired fibre channel card port. From the service terminal Main Service Menu, select: Machine Test Menu > Host Interface Cards Menu > Fibre Channel Host Cards Menu > Fibre Port Optical Wrap Test
   Select the desired host card port to be tested and follow the screen instructions to run the test.
   Note: After the wrap test returns a Machine Test was successful message and displays Make Resource Available for Customer Use, STOP! Do not press Enter. Do not follow the screen instructions to reconnect the customer cables and resume the resources until instructed to do so at the end of this MAP (when all the power measurements are complete).
3. Was the wrap test successful?
   • Yes, the host card port is now ready for the power measurement; continue with the next step.
   • No, use the repair process to replace the FRU. From the service terminal Main Service Menu, select: Utility Menu > Show/Repair Problems Needing Repair
     Repair any fibre channel host card problems shown, then repeat the Fibre Channel Optical Power Measurement.
4. Connect one end of an SC-to-SC test cable to the SC-to-ST adapter, then connect the other end to the 2105 Model 800 fibre channel host card (see Figure 147 on page 557).
5. Connect the ST-to-ST test cable from the SC-to-ST adapter to the power meter.
   Note: If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set the meter using Optical Power Meter Setup on page 560. If the fibre channel connection uses long wavelength (LW2), set the meter to 1300 nm. If it uses short wavelength (SW2), set the meter to 780 nm.
6. Use the optical power meter to obtain a reading. The power reading should be between -9.0 dBm and -3.0 dBm (-8 dBm is more power than -9.0 dBm; a reading of -10 dBm, for example, indicates that the card transmitter is failing). Record the actual measurement value for possible use during the link fault isolation procedures.
7. Disconnect the test cable from the 2105 Model 800 fibre channel card.
8. Was the power measurement correct?
   • Yes, continue with the next step.
   • No, use the repair process to replace the fibre channel host card FRU. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select: Repair Menu > Replace a FRU > Host Bay FRUs
     Replace the fibre channel host card that was being tested, then repeat the Optical Transmitter Measurement.
     Note: The repair procedure will resume the required resources.
9. Do you still need to perform the Optical Receiver Measurement procedure?
   • Yes, perform the Optical Receiver Measurement (Isolation Procedure 2).
   • No, continue with the next step.
10. Return to the service terminal and follow the instructions on the screen to: Make Resource Available for Customer Use
11. Return to the procedure that sent you here.

Isolation Procedure 2: Optical Receiver Measurement: This procedure measures the power received at the end of the customer's fibre channel link cable (the input into the optical receiver) through an SC-to-ST adapter (P/N 54G3424) and an ST-to-ST test cable (P/N 02G6159).
Note: Always clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.
Figure 148. Measuring Fibre Channel Optical Receive Power (s008841m). Components shown: device, SC-to-ST adapter (54G3424), ST-to-ST test cable (02G6159), power meter, and connection to the fibre channel host.

1. Ensure that the device on the other end of the link is powered on.
2. Disconnect the fiber optic cable from the duplex connector on the 2105 Model 800 fibre channel card, if not previously disconnected.
3. Connect the customer's fiber optic cable (the cable that was removed from the 2105 Model 800 fibre channel host card) to the SC-to-ST adapter (see Figure 148).
4. Connect the ST-to-ST test cable from the SC-to-ST adapter to the power meter.
Note: If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set up the meter using MAP 5320: ESCON Optical Power Measurement on page 552. If the Fibre Channel connection uses long wavelength (LW2), set the meter to 1300nm. If it uses short wavelength (SW2), set the meter to 780nm.
5. Use the optical power meter to obtain a reading. The power reading should be between -3.0 dBm and -20.0 dBm (-19.0 dBm is more power than -20.0 dBm). Record the actual measurement value for possible use later during the link fault isolation procedures.
6. Disconnect the customer fiber optic channel cable from the coupler and reconnect the cable to the 2105 Model 800 fibre channel card.
7. Return to the service terminal and follow the instructions on the screen to:
  Make Resource Available for Customer Use
8. Return to the procedure that sent you here.
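The transmitter and receiver acceptance windows in this MAP can be expressed as a simple range check. The following Python sketch is illustrative only: the window values come from the transmitter (-3.0 to -9.0 dBm) and receiver (-3.0 to -20.0 dBm) steps, the constant and function names are hypothetical, and treating both endpoints as acceptable is an assumption.

```python
"""Check optical power readings against the MAP 5321 windows (sketch)."""
from typing import Tuple

# (lowest, highest) acceptable reading in dBm, taken from the procedure text.
# Names are hypothetical; endpoints are assumed to be inclusive.
TRANSMIT_WINDOW_DBM = (-9.0, -3.0)
RECEIVE_WINDOW_DBM = (-20.0, -3.0)


def classify_reading(reading_dbm: float, window: Tuple[float, float]) -> str:
    """Return 'in range', 'below range' (too little power) or 'above range'."""
    lowest, highest = window
    if reading_dbm < lowest:
        # A more negative dBm value means less optical power.
        return "below range"
    if reading_dbm > highest:
        return "above range"
    return "in range"


if __name__ == "__main__":
    # -10.0 dBm from the transmitter falls below the window; the procedure
    # cites this as an indication that the card transmitter is failing.
    print(classify_reading(-10.0, TRANSMIT_WINDOW_DBM))
    print(classify_reading(-8.0, TRANSMIT_WINDOW_DBM))
    print(classify_reading(-19.0, RECEIVE_WINDOW_DBM))
```

The check compares against the lower (more negative) bound first, because a reading below the window is the case the procedure treats as a failing transmitter or a weak received signal.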


Optical Power Meter Setup: Use this procedure only to do the initial setup of the optical power meter (P/N 18F7005):
1. Power meter on
2. Set the meter to 1300 nanometers (nm)
3. Zero the meter
4. Set the meter to display the dBm scale
Note: Do not hold down a push-button for more than one-half second. When held down for more than approximately three seconds, the push-button produces results different from those needed.
1. Ensure that the black cap is over the biconic receptacle at the top of the power meter.
2. Press Power On/Off. AUTO OFF is displayed, and the meter turns off if no push-button is pressed within ten minutes. Allow a two-minute warm-up period.
3. If the meter does not display 1300 nm, press the λ (lambda) push-button repeatedly until 1300 nm is displayed.
4. Press ZERO. Two indications should appear:
• A value between 0.30 and 0.70 nW (nanowatts).
• ZERO blinks after a short time, indicating that the meter is properly set to zero.
If these indications do not appear and Hi or Lo is displayed after pressing ZERO, press ZERO again. Using a small screwdriver, adjust the trim pot next to the biconic receptacle at the top of the meter until a value between 0.30 and 0.70 nW is displayed. Set the value as close to 0.50 nW as possible. Press ZERO again to zero the meter.
5. The meter must also display dBm (decibels relative to one milliwatt). If nW is displayed, press dBm/Watt.
Continue with one of the following:
• Isolation Procedure 1: Optical Transmitter Measurement on page 557
• Isolation Procedure 2: Optical Receiver Measurement on page 559
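Because the measurements in this MAP are read in dBm (decibels relative to one milliwatt), a more negative reading means less light. The Python sketch below shows the standard conversion, dBm = 10 * log10(P / 1 mW). It is not part of the meter or the 2105 code, and the function names are hypothetical.

```python
"""Convert between optical power in milliwatts and dBm (sketch)."""
import math


def mw_to_dbm(power_mw: float) -> float:
    """dBm is decibels relative to 1 mW: 10 * log10(P / 1 mW)."""
    if power_mw <= 0:
        raise ValueError("optical power must be positive to express in dBm")
    return 10.0 * math.log10(power_mw)


def dbm_to_mw(power_dbm: float) -> float:
    """Inverse conversion: P(mW) = 10 ** (dBm / 10)."""
    return 10.0 ** (power_dbm / 10.0)


def nw_to_dbm(power_nw: float) -> float:
    """Convert a nanowatt value (as shown on the meter's nW scale) to dBm."""
    return mw_to_dbm(power_nw * 1e-6)


if __name__ == "__main__":
    for dbm in (-3.0, -8.0, -9.0, -20.0):
        print(f"{dbm:6.1f} dBm = {dbm_to_mw(dbm):.4f} mW")
    print(f"0.50 nW = {nw_to_dbm(0.50):.1f} dBm")
```

Worked through, -3.0 dBm is about 0.50 mW, -9.0 dBm about 0.13 mW, and -20.0 dBm is 0.01 mW, which is why -8.0 dBm is more power than -9.0 dBm. The 0.50 nW zero reading corresponds to roughly -63 dBm, far below any acceptable measurement.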

MAP 5330: Display ESCON and Fibre Node Descriptors


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Node information stored in the 2105 can make it easier to isolate ESCON and Fibre link faults outside the 2105. The menu option used in this MAP is not provided by LIC levels prior to Code EC 2.3.0.0.

Isolation
1. To display the 2105 node information for ESCON and Fibre host adapters, use the service login Utility Menu, Display ESCON and Fibre Node Descriptors option.
Note: LIC levels prior to Code EC 2.3.0.0 do not provide this menu option.
2. An example of the displayed information is shown below.
• For a definition of the 2105 Port ID field, go to step 3 on page 561.
• The remaining fields displayed below are not defined here. They are provided for use by personnel trained in link fault isolation. Refer to the Enterprise Systems Link Fault Isolation manual, form number SY22-9533.
Note: The displayed line of information for a port may wrap to the next line, as shown.
Machine:IBM.2105-800.75-18302 WWNN=5005076300C02C6E
Port-Type PortID Src/Dst-ID D WWNN WWPN WWPN Attach Attached_Machine PortI
ESCON  0004 E9  Current IBM.9032-002.02-10242 00E9
ESCON  0005 ED  Current IBM.9032-002.02-10242 00ED
FC-1GB 0088 211C13 100008008840A89D 5005076300C62C6E Current MCD.5000-001.01-XXXXX 0018

3. Use the following table to convert the 2105 Port ID field in the second column to the host bay, host card, and port:
Table 93. 2105 Port ID Field
Host Adapter Port              Host Bay 1   Host Bay 2   Host Bay 3   Host Bay 4
                               Port IDs     Port IDs     Port IDs     Port IDs
Host Card 1, Port 0 (Top)      0000         0020         0080         00A0
Host Card 1, Port 1 (Bottom)   0001         0021         0081         00A1
Host Card 2, Port 0 (Top)      0004         0024         0084         00A4
Host Card 2, Port 1 (Bottom)   0005         0025         0085         00A5
Host Card 3, Port 0 (Top)      0008         0028         0088         00A8
Host Card 3, Port 1 (Bottom)   0009         0029         0089         00A9
Host Card 4, Port 0 (Top)      000C         002C         008C         00AC
Host Card 4, Port 1 (Bottom)   000D         002D         008D         00AD
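The values in Table 93 follow a regular pattern, so a Port ID can also be decoded arithmetically. In the Python sketch below (hypothetical names), the value masked with 0xF0 selects the host bay (0x00, 0x20, 0x80, 0xA0 for bays 1 to 4), bits 2 and 3 select the host card, and bit 0 selects the port. This encoding is inferred from the table values only, not from IBM documentation, so confirm any result against Table 93.

```python
"""Decode a 2105 Port ID (Table 93) into host bay, card and port (sketch)."""
from typing import Tuple

# Base Port ID of each host bay, inferred from Table 93.
BAY_BY_BASE = {0x00: 1, 0x20: 2, 0x80: 3, 0xA0: 4}


def decode_port_id(port_id: str) -> Tuple[int, int, int, str]:
    """Return (host bay, host card, port, position) for a hex ID such as '0088'."""
    value = int(port_id, 16)
    base = value & 0xF0    # selects the host bay
    offset = value & 0x0F  # selects the card (bits 2-3) and the port (bit 0)
    # Table 93 never uses values above 0xFF or offsets with bit 1 set.
    if value > 0xFF or base not in BAY_BY_BASE or offset & 0x02:
        raise ValueError(f"Port ID {port_id} is not listed in Table 93")
    bay = BAY_BY_BASE[base]
    card = (offset >> 2) + 1
    port = offset & 0x01
    position = "Top" if port == 0 else "Bottom"
    return bay, card, port, position


if __name__ == "__main__":
    # Port IDs taken from the sample node descriptor display.
    for pid in ("0004", "0005", "0088"):
        print(pid, decode_port_id(pid))
```

Applied to the sample display, 0004 and 0005 decode to host bay 1, host card 2, ports 0 (Top) and 1 (Bottom), and 0088 decodes to host bay 3, host card 3, port 0 (Top), matching the table rows.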

MAP 5340: CKD Read Data Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
You are here to resolve a Data Path failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary. The action may require the repair of another problem in the log. The failure may have caused customer data to be unreadable. If this occurs, the customer must restore the data after the hardware or microcode repair action is complete. This MAP isolates the following ESCs:
• ESC 334B, physical address validation error.
• ESC 334C, third or later repeat of a physical address validation error on the same physical address.
• ESC 4970, second occurrence of a physical address validation error on the same physical address.
These are customer data physical address validation errors: data transferred from a DDM to cache memory did not have the expected physical address. There are two possible causes of this failure:
• Data may have been read from the wrong track, volume, or array.
• The data that was read may originally have been written to the wrong track, volume, or array.

Isolation
The recommended action is to contact your next level of support for fault isolation and repair assistance. The most likely repair activities are:
1. Locate and repair any related problems.
2. Have the customer restore the data after the hardware problem has been resolved.

MAP 5400: Fibre Channel Link Fault


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: Ensure that you read "Fiber Optic Cable Handling Precautions" on page 548 before you run this test.

Description
Link incidents are problems that are not automatically detected, isolated, and reported by any single node on the Fibre Channel link. They occur on an interface and may cause multiple nodes to detect different types of link incidents. Each node that detects a link incident reports it independently. Link incidents detected by the storage facility may be displayed from the error log. Link incidents are isolated by using the product and system documentation together:
• Enterprise Systems Connection Link Fault Isolation book, form number SY22-9533.
• Maintenance Information for S/390 Fiber Optic Links (ESCON, Fibre, Coupling Links, and Open System Adapters) book, form number SY27-2597.
Ensure that both documents are available for problem determination.

Isolation
1. This MAP has been combined into MAP 5300. Go to MAP 5300: ESCON or Fibre Channel Link Fault on page 548.


MAP 5410: Fibre Channel Bit Error Rate Validation


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
Bit Error Rate Threshold incidents are caused by conditions at an interface or along a line that cause bits to be received or interpreted incorrectly. Each node attached to a link counts these bit errors; when a specific number is reached (threshold exceeded), the link is operating in a degraded mode. To identify the link or nodes causing the incident, you must determine which node(s) on the link have detected a threshold exceeded condition.

Isolation
1. Determine what type of error was reported by the customer. Was the customer-reported error a Bit Error Threshold Exceeded (BER) detected at the ATTACHED node?
• Yes, go to step 3.
• No, continue with the next step.
2. Display problems using the following service panel options:
  From the service terminal Main Service Menu, select:
  Repair Menu
  Show / Repair Problems Needing Repair
Are there any bit error rate problems (ESC=326A) for the failing link?
• Yes, continue with the next step.
• No, additional link problem determination is needed. Ensure that all optical link cables are reconnected, then call your next level of support.
3. Test the bit error rate:
• Reconnect the optical link cables to the subsystem, if previously disconnected.
• Run the Bit Error Rate Test on the failing link:
  From the service terminal Main Service Menu, select:
  Machine Test Menu
  Host Interface Cards Menu
  Fibre Channel Host Ports Menu
  Fibre Channel Port Bit-Error-Rate Test
  Select the SA interface to be tested, and follow the instructions on the screen to run the test.
Did the test run successfully?
• Yes, cancel any outstanding Bit Error Rate problems logged for this link and resume any quiesced links. The call is complete.
• No, continue with the next step.
4. Determine how many times the Bit-Error-Rate Test has been run. Has this test been run only one time?
• Yes, clean the fiber optic connectors and run this test again. Use the fiber optic cleaning procedure specified in the fiber optic connector cleaning kit (New P/N 46G6844, Old P/N 5453521). Go to step 3.
• No, return to the step in MAP 5300: ESCON or Fibre Channel Link Fault on page 548 that sent you here.

MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
You are here to resolve a problem in which a host fails to recognize LUNs configured on the ESS for Fibre Channel attachment.

Isolation
1. Use the service terminal to determine the current ESS Fibre Channel configuration and connections.
  From the service terminal Main Service Menu, select:
  Configuration Options Menu
  Systems Attachment Resources Menu
  List Host Cards and Ports
2. Using the configuration worksheets from the IBM Enterprise Storage Server Configuration Planner book, form number SC26-7450, verify that:
a. The ESS hardware configuration matches the configuration worksheet.
b. The Fibre Channel host cables are connected to the appropriate Fibre Channel host card and host bay (see Figure 149).
If mismatches are discovered, check with the customer to resolve any differences.


Figure 149. 2105 Model 800 Host Bay Connector Locations (s009135). Front view of host bays R1-B1, R1-B2, R1-B3, and R1-B4, with host card positions Card 1 (R1-Bx-H1) through Card 4 (R1-Bx-H4). Labels shown: Ultra SCSI host cards with SCSI connectors ZA and ZB; ESCON host cards with ESCON link connectors ZA/LINK 00 and ZB/LINK 01; Fibre Channel host cards with fibre link connector Link A, card type LW2 (long wave card) or SW2 (short wave card).

Has the problem been resolved?
• Yes, return to the procedure that sent you here.
• No, continue with the next step.
3. Verify the LUN access setting of the control switches:
  From the service terminal Main Service Menu, select:
  Configuration Options Menu
  Change / Show Control Switches
Is the Fibre Channel LUN Access Control set to Access_All?
• Yes, have the customer check with the system administrator to verify that the host fibre configuration is correct.
  Note: If the control switches are changed, the subsystem must be rebooted for the change to take effect.
• No, continue with the next step.
4. Has the problem been resolved?
• Yes, return to the procedure that sent you here.
• No, continue with the next step.
5. Using the ESS specialist, verify (or have the customer verify) that:
a. The proper hosts are defined as being attached to the ESS.
b. All host fibre port configuration is correct and matches the configuration worksheets.
Are the hosts defined correctly?
• Yes, have the customer check with the system administrator to verify that the host fibre configuration is correct. If there are fibre switches between the host and the ESS, have the customer verify that any zoning within the switch is properly configured to allow the desired host to access the ESS.
• No, the customer must define the host configuration using the ESS specialist.

MAP 5440: Fibre Host Card Reports a Loss of Light


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.

Description
A fibre host card in the 2105 has detected and reported a loss of light from an attached host system. When a 2105 fibre host card detects a loss of light, the problem is normally external to the 2105. The condition is reported to the host system as a status condition; a problem is not created for it. A separate problem will be created if the fibre card detects an internal operational error.

Isolation
1. Use information from the customer to determine which fibre host card in the 2105 has reported the loss of light.
Note: A problem is not created for this condition.
2. Use the service terminal Repair Menu and Show / Repair Problems Needing Repair options to repair any related problems for that fibre host card.
3. Observe the green and yellow LED indicators on that fibre host card. With a loss of light condition, the green LED should be blinking slowly (once per second) and the yellow LED should be off. A loss of light problem is normally external to the 2105, not caused by it. Use the standard fibre channel isolation procedures (not included in this service guide) to restore light to the fiber cable connected to this fibre host card.


MAPs 6XXX: Service Terminal Isolation Procedures


Procedures in the MAP 6XXX group in Chapter 3 cover the service terminal attached to the cluster of the 2105 Model 800 unit.

MAP 6060: Isolating a Service Terminal Login Failure


Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: The FRUs and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this isolation procedure. Follow the ESD procedures in "Working with ESD-Sensitive Parts" in Chapter 4 of Volume 2.

Description
The Copyright and Login screen is displayed when all of the following occur:
• The service terminal and cable are connected to the cluster S2 port.
• The service terminal's terminal emulator program is properly configured.
• The service terminal's terminal emulator program is logically connected.
• The Enter key is pressed to create a keyboard interrupt to the cluster.
The login Main Service Menu is displayed when all of the following occur:
• The service terminal Copyright and Login screen is displayed.
• The service login and password are entered.
• The cluster to cluster ethernet communication either succeeds or times out. If the communication hangs, the screen goes blank and stays blank.
The following rsACExec.c return code definitions are provided for product engineering use only:
Table 94. rsACExec.c Return Code Definitions
Name                 Value (hex = decimal)   Description
PROCESS_TIMEOUT      0x90 = 144              Client timeout failure
SUBROUTINE_FAIL      0x89 = 137              A system subroutine failure
FILE_FAILURE         0x88 = 136              Operations on a file failed
READ_FAIL            0x86 = 134              Read of socket failure
WRITE_FAIL           0x85 = 133              Write to socket failure
INVALID_IP_ADDR      0x84 = 132              Loopback IP address invalid failure
DAEMON_INIT_FAIL     0x83 = 131              Daemon Initialization/Setup fail
SOCKET_FAIL          0x82 = 130              Socket failure
AUTHORIZATION_FAIL   0x81 = 129              Failure during authorization
ERROR_INVALID        0x80 = 128              Invalid parameter failure
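If a return code from Table 94 is reported only as a number, a lookup such as the following Python sketch can name it. The dictionary simply transcribes Table 94; the dictionary name, the function name, and the output format are hypothetical, and the sketch does not call or reproduce rsACExec.c itself.

```python
"""Look up rsACExec.c return codes from Table 94 (sketch)."""

# Transcribed from Table 94: value -> (name, description).
RS_AC_EXEC_RETURN_CODES = {
    0x90: ("PROCESS_TIMEOUT", "Client timeout failure"),
    0x89: ("SUBROUTINE_FAIL", "A system subroutine failure"),
    0x88: ("FILE_FAILURE", "Operations on a file failed"),
    0x86: ("READ_FAIL", "Read of socket failure"),
    0x85: ("WRITE_FAIL", "Write to socket failure"),
    0x84: ("INVALID_IP_ADDR", "Loopback IP address invalid failure"),
    0x83: ("DAEMON_INIT_FAIL", "Daemon Initialization/Setup fail"),
    0x82: ("SOCKET_FAIL", "Socket failure"),
    0x81: ("AUTHORIZATION_FAIL", "Failure during authorization"),
    0x80: ("ERROR_INVALID", "Invalid parameter failure"),
}


def describe_return_code(code: int) -> str:
    """Format a numeric return code with its hex value, name and description."""
    name, meaning = RS_AC_EXEC_RETURN_CODES.get(
        code, ("UNKNOWN", "not listed in Table 94")
    )
    return f"{code} (0x{code:02X}) {name}: {meaning}"


if __name__ == "__main__":
    print(describe_return_code(136))
    # A hex string can be converted first; int() accepts the 0x prefix with base 16.
    print(describe_return_code(int("0x81", 16)))
```

For example, a decimal code of 136 is reported as "136 (0x88) FILE_FAILURE: Operations on a file failed".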

Isolation
Use the following steps to isolate the problem.
1. Check if the Copyright and Login screen is displayed. Connect the service terminal and cable to the cluster, then attempt to logically connect the service terminal's terminal emulator program. Press the Enter key to create a keyboard interrupt. Wait up to 3 minutes for the Copyright and Login screen to display.
Is the Copyright and Login screen displayed?
• Yes, go to step 7 on page 569.
• No, continue with the next step.
2. Check if the Copyright and Login screen is displayed from the other cluster. Disconnect the service terminal from the failing cluster and connect it to the other cluster. Then attempt to logically connect the service terminal's terminal emulator program to the cluster.
Is the Copyright and Login screen displayed?
• Yes, the service terminal is working. Continue with step 4.
• No, continue with the next step.
3. Determine if the service terminal is working. Use one or more of the following checks:
• Verify that the service terminal is connected to the S2 port of the cluster.
• Verify that the service terminal is configured correctly (see "Entry for Service Terminal Activities" in Chapter 8 of Volume 3).
• If a second 2105 is available, try connecting the service terminal to its cluster. If it can log in, the service terminal and cable are working. If it cannot log in, the service terminal and/or cable are not working. Use the documentation for the service terminal to test or repair it.
• If a second known good service terminal is available, use it to try to log in to the original failing cluster. If it can log in, there is a problem with the first service terminal. Use the documentation for the service terminal to test or repair it. If the second service terminal cannot log in, there is a problem with the cluster.
• If you cannot do any of the actions listed above, you may use the service terminal documentation to test the serial port being used.
Do you have a service terminal that works?
• Yes, neither cluster is able to connect to the service terminal. There may be a problem that can be cleared if you power the 2105 off and on and then retry the service terminal login. Call your next level of support.
• No, you must have a working service terminal to continue. Exit this MAP.
4. Determine if the cluster is powered on. Press the CD-ROM drive eject button.
Does the CD tray open?
• Yes, the cluster is powered on. Continue with the next step.
• No, the cluster must be powered on to log in. If the cluster is not in the middle of a repair action and should be powered on, go to MAP 4880: Cluster Power On Problem on page 461.
5. Determine if the cluster is hung. Observe the CEC drawer operator panel display. Has any one code been displayed for more than ten minutes?
• Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
• No, continue with the next step.
6. Determine if the cluster is still loading code after a power on. Observe the CEC drawer operator panel display. Are various codes being displayed?
• Yes, the cluster is still loading code or is in error recovery. Wait for the code load process to complete. The display will briefly show the Ready or Ready for Login message and then may go blank and stay blank. Allow at least 30 minutes from cluster power on (much longer if the cluster is being automatically rebooted for error recovery). Until then, the cluster will not accept a service terminal connection and the Copyright and Login screen will not be displayed. Retry the service terminal login. Exit this MAP if it is successful; repeat this step if it is not.
• No, go to step 9.
7. Determine if the service terminal can log in to the cluster. At the Copyright and Login screen, enter the service ID (and password). Wait up to 5 minutes for the Main Service Menu screen to display.
Is the Main Service Menu displayed?
• Yes, the service terminal login is successful. Use the Main Service Menu, Service Menu, End Of Call Status option to complete this service action.
• No, the screen went blank. Continue with the next step.
8. Determine if the cluster to cluster ethernet communication is hanging the service terminal login. Do the following steps:
a. Logically disconnect the service terminal from the cluster (leave the cable connected).
b. Disconnect the ethernet cable from the other cluster.
c. Logically connect the service terminal to the cluster.
d. Wait up to five minutes for the cluster to cluster ethernet connection software to time out and then allow the Main Service Menu to be displayed.
Is the Main Service Menu displayed?
• Yes, go to step 11 on page 570.
• No, disconnect the service terminal from this cluster and connect it to the other cluster. Attempt to log in to display and repair any related problems. If the login fails, you cannot log in to either cluster. Call your next level of support. They may have you power the 2105 off and on to attempt to clear the problem.
9. Determine if the other cluster has any related problem for the failing cluster. Connect the service terminal to the other cluster (not the failing cluster) and display problems needing repair.
Are there any related problems for the failing cluster?
• Yes, exit this MAP and repair the related problem.
• No, continue with the next step.
10. Determine if reloading the code on the failing cluster will correct the connect and/or login problem. The failing cluster will be quiesced, powered off, and powered on to reload the code. Connect the service terminal to the cluster not being serviced. Use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to quiesce, power off, and then power on the failing cluster. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to light, then attempt to use the service terminal to connect and log in.
Was the service terminal able to connect and log in?
• Yes, the problem is fixed. Connect the service terminal back to the other cluster and resume this cluster. Close any related problems and then use the Repair Menu, End of Call Status option to complete this service action.
• No, there may be a hardware problem with the S2 port function of the I/O drawer planar assembly FRU or with the I/O drawer planar assembly to S2 connector cable FRU. Call your next level of support.
11. Determine if either cluster has any related problem logged against either cluster. Connect the service terminal to each cluster and attempt to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
Is there a related cluster problem?
• Yes, exit this MAP and repair the related problem.
• No, continue with the next step.
12. Determine if reloading the code on one or both clusters will correct the login hang problem. One cluster will be quiesced, powered off, and powered on to reload the code. Connect the service terminal to the cluster not being serviced. Use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to quiesce, power off, and then power on the failing cluster. Reconnect the ethernet cable. Wait up to 40 minutes, then attempt to use the service terminal to connect and log in.
Was the service terminal able to connect and log in with the ethernet cable connected?
• Yes, the problem is fixed. Connect the service terminal back to the other cluster and resume this cluster. Close any related problems and then use the Repair Menu, End of Call Status option to complete this service action.
• No, disconnect the ethernet cable from the cluster. Connect the service terminal back to the other cluster and resume this cluster. Repeat this step for the remaining cluster. If both clusters have had their code reloaded and the login still fails, call your next level of support.

Service Terminal Connection Diagram


The following diagram shows the hardware and cables involved in connecting the service terminal to the 2105 rack and AC power.


Figure 150. Service Terminal Connections to Controllers and Power (s009595). Components shown: Cluster 1 and Cluster 2 S2 (RS/232) ports at I/O planar J41, service terminal interface cable, serial connector, power jack, service terminal, AC voltage adapter, and service cord.


Appendix. Accessibility
Accessibility features help a user who has a physical disability, such as restricted mobility or limited vision, to use software products successfully.

Features
These are the major accessibility features in the IBM TotalStorage Enterprise Storage Server information:
1. You can use screen-reader software and a digital speech synthesizer to hear what is displayed on the screen. IBM Home Page Reader version 3.0 has been tested.
2. You can operate features using the keyboard instead of the mouse.

Navigating by keyboard
You can use keys or key combinations to perform operations and initiate menu actions that can also be done through mouse actions. You can navigate the IBM TotalStorage Enterprise Storage Server information from the keyboard by using the shortcut keys for your browser or Home Page Reader. See your browser Help for a list of shortcut keys that it supports. See the following Web site for a list of shortcut keys supported by Home Page Reader:
http://www-306.ibm.com/able/solution_offerings/keyshort.html

Accessing the publications


You can find HTML versions of the IBM TotalStorage Enterprise Storage Server information at the following Web site:
http://www.ehone.ibm.com/public/applications/publications/cgibin/pbi.cgi

You can access the information using IBM Home Page Reader 3.0.

Copyright IBM Corp. 2004, 2005


Index

Numerics
20 mb where 40 mb SSA cable expected, MAP 3656 312
2105 cannot be power off, pinned data, MAP 24B0 167
2105 expansion enclosure (rack 2) power off problem, MAP 23B0 144
2105 Expansion Enclosure (rack 2) UEPO problem, MAP 2380 138
2105 expansion enclosure information 1
2105 expansion enclosure power on problem, MAP 2420 154
2105 Model 750 disk storage information 21
2105 model 750 information 1
2105 Model 800 (rack 1) UEPO problem, MAP 2360 131
2105 Model 800 disk storage information 21
2105 model 800 information 1
2105 Model 800 local power on problems, MAP 2400 149
automatic LIC activation problem, (CCL), MAP 4A60 493
automatic LIC activation problem, (NCCL), MAP 4AA0 501
automatic LIC activation problem, (NCCL), MAP 4BA0 532
automatic LIC activation problem, (CCL), MAP 4B60 520
automatic LIC activation problem, (CCL), MAP 4A80 497
automatic LIC activation problem, (CCL), MAP 4B80 526
automatic LIC activation problem, (CCL), MAP 4A50 491
automatic LIC activation problem, (CCL), MAP 4B50 517
automatic LIC activation problem, (NCCL), MAP 4A30 486
automatic LIC activation problem, (NCCL), MAP 4B30 511
automatic LIC activation problem, (CCL), MAP 4A70 495
automatic LIC activation problem, (CCL), MAP 4B70 523
automatic LIC activation problem, (NCCL), MAP 4AB0 503
automatic LIC activation problem, (NCCL), MAP 4BB0 534
automatic LIC activation problem, (CCL), MAP 4A90 499
automatic LIC activation problem, (CCL), MAP 4B90 529
automatic LIC activation problem, NCCL), MAP 4A10 482
automatic LIC activation problem, NCCL), MAP 4B10 506
automatic LIC activation problem, NCCL), MAP 4BE0 537
cluster 1, phase 150 cluster 1, phase 150 cluster 1, phase 150 cluster 1, phase 150, cluster 1, phase 200 cluster 1, phase 200 cluster 2, phase 100 cluster 2, phase 100 cluster 2, phase 100 cluster 2, phase 100 cluster 2, phase 150 cluster 2, phase 150 cluster 2, phase 150 cluster 2, phase 150 cluster 2, phase 200 cluster 2, phase 200 phase 000 (CCL & phase 000 (CCL & phase 400 (CCL &

A
a temporary CPI error was detected, MAP 41F0 365
accessibility 573
accessing copy services information 9
accessing ESS specialist information 9
all DDMs on loop B do not have the same characteristics, MAP 3626 303
all DDMs on SSA loop A do not have the same characteristics, MAP 3625 302
analyzing a storage cage fan/power sense card check summary indicator on, MAP 3379 246
array repair required, MAP 3123 226
arrays across loops information 5
attaching the ESSNet to a customer network, MAP 1620 107
attempt to format array member, MAP 3131 231
attention notices
  fragility of disk drive modules 176
automatic LIC activation cluster problem, phase 400 (CCL & NCCL), MAP 4AE0 504
automatic LIC activation failure, cluster 1 phase 100 (CCL), MAP 4A40 488
automatic LIC activation problem, cluster 1, phase 100 (CCL), MAP 4B40 514
automatic LIC activation problem, cluster 1, phase 100 (NCCL), MAP 4A20 485
automatic LIC activation problem, cluster 1, phase 100 (NCCL), MAP 4B20 509

B
battery set charge low, MAP 2460 162
battery set detection problem, MAP 2470 162
bay held reset condition 339
begin all service actions 29
bit error rate test failure, MAP 5305 550
bootlist management using SMS for automatic LIC, MAP 43A5 392
bootlist management using SMS, MAP 43A0 387
both RPC cards firmware down level, MAP 24F0 168
bypass card jumpers wrong, MAP 3654 311

C
call home / remote services failure, MAP 1301 55
Canadian compliance statement xviii
category 1, crash codes 368, 369


CD-ROM test failure, MAP 4600 429
CEC drawer power indicator information 15
CEC drawer power on problem, MAP 2700, 170
CEC or I/O drawer visual power supply problem, MAP 2800 171
CEC, I/O, or host bay drawer overcurrent, MAP 2030 113
CEC, I/O, or host bay drawer power fault, MAP 2230 122
changing network configuration for ESS and master console, MAP 1607 85
chapter 1, reference information 1
chapter 2, entry for all service actions 29
chapter 3, problem isolation procedures (MAPs) 41
Chinese EMI statement xx
CKD read data failure, MAP 5340 561
cluster
  MAP 4055, bay held reset condition 339
  MAP 45A0: pinned data, special case 428
cluster code load counter = 2, MAP 4350 370
cluster dual hard drive ESC 1xxx, MAP 43B0 398
cluster fails to power off, MAP 47A0 449
cluster FRU replacement (CEC and I/O drawers), MAP 4700 432
cluster hang during failback or error recovery, MAP 4010 319
cluster IML from second hard disk drive, MAP 43C0 400
cluster indicators information 15
cluster minimum configuration, MAP 4540 418
cluster not ready, MAP 20A0 117
cluster NVS problem, MAP 4460 410
cluster power off request problem, MAP 4730 446
cluster power on problem, MAP 4880 461
cluster powered off unexpectedly 431
cluster powered off unexpectedly, MAP 23E0 149
cluster SP, SPCN, or system firmware down-level, MAP 4610 430
cluster SP, SPCN, or system firmware reload 431
cluster to cluster ethernet communication test, MAP 4410 403
cluster to modem communication problem, MAP 1300 52
cluster to RPC cards communication problem, MAP 4480 411
codes
  category 1, crash codes 368, 369
communications statement xviii
compliance statement, radio frequency energy xviii
compliance statement, Taiwan xx
configuration
  MAP 1607, changing network configuration for ESS and master console 85
  MAP 1608, manually configuring the video/graphics adapter for the master console 86
configuring 94
configuring, 2105 Model 800 for installation 94

connecting ethernet LAN 94 connecting the modem and modem expander for remote support, MAP 1610 88 converting the personal computer to an ESSNet console, MAP 1606 76 copy services information 10 copy services, accessing information 9 CPI address mismatch, MAP 4090 343 CPI diagnostic communication problem, MAP 4840 457 CPI failure needing CPI cable as FRU, MAP 41E0 365 CPI interface NVS/IOA card to host bay failure, MAP 41B0 361 CPI problem or host bay slot failure, MAP 41D0 364 crossed RPC cables to expansion rack, MAP 2450 160 CUIR information 11 customer copy services problem, MAP 4980 474 customer media maintenance examples 38 start 38 customer receives sense data without a SIM 34 start 34

D
DDM bay controller card indicator 22 controller card power check indicator information 21 DDM check indicator 22 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 external SSA connections 24 indicator information 21 internal SSA connections 24 link status (ready) indicator 22 mode indicator 22 DDM bay controller card indicator 22 DDM bay controller card power check indicator information 21 DDM bay DDM check indicator 22 DDM bay disk drive module check indicator 23 DDM bay disk drive module indicators 23 DDM bay disk drive module ready indicator 23 DDM bay external SSA connections 24 DDM bay indicator information 21 DDM bay internal SSA connections 24 DDM bay link status (ready) indicator 22 DDM bay mode indicator 22 DDM bay verification for possible problems, MAP 3520 284 DDM bay, maintenance analysis procedures (MAPs) 176 DDM installation introduces different RPM, MAP 3614 296

DDM installation with mixed capacity rank site, MAP 3612 293 DDM installation with new rank site capacity, MAP 3610 290 DDM size is not supported, MAP 3617 298 DDM, or DDMs, found in formatting state during IML, MAP 3580 288 DDMs of same capacity but different rpms on the same SSA loop 298 decode a refcode 36 start 36 disability 573 disk drive module check indicator 23 indicators 23 ready indicator 23 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 display and repair a problem, MAP 1210 51 display cluster ethernet network address, MAP 4420 405 display ESCON and fibre node descriptors, MAP 5330 560 displaying cluster SMS error logs, MAP 4400 402 dump progress indicators 369 duplicate TCP/IP address detected for this cluster, MAP 43D0 401

E
electronic emission notices xviii EMI statement, Chinese xx end a DASD service action, MAP 3360 241 end service action, MAP 1500 67 entry for maintenance analysis procedures (MAPs) 41 entry MAP for CPI problems, MAP 4040 326 entry table for all service actions 29 start 29 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 4xxx: cluster MAPs 45 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 EREP EREP reports 34 repair using an EREP report 34 EREP reports 34 start 34 error displaying problems needing repair, MAP 4370 375 ESC 2768, NVS/IOA card problem, MAP 4470 411 ESC 2770 or 2771, missing CPI detected, MAP 41C0 362 ESC 5500 isolation, MAP 4960 471

ESCON information 6 link fault isolation 548 MAP 5305, bit error rate test failure 550 ESCON attached host systems information 6 ESCON bit error validation, MAP 5310 551 ESCON optical power measurement, MAP 5320 552 ESS cluster to customer network problem, MAP 4450 407 ESS connection security information 7 ESS interface information 7 ESS service interface information 11 ESS Specialist cannot access cluster, MAP 5000 540 ESS specialist information 9 ESS specialist, accessing information 9 ESSNet MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 ESSNet information 7 master console replacement 8 ESSNet console problem, MAP 1600 68 ESSNet information 7 ESSNet1 or master console to cluster ethernet problem, MAP 4440 405 European Community Compliance statement xviii event history report 35 start 35 extended cluster IML time from NVS battery charging, MAP 4200 366

F
failure detected during Background Certify and Build Logical Configuration from ISA 477 failure detected during Background Certify and Build Logical Configuration from ISA, MAP 49A0 477 FCC (see Federal Communications Commission) xviii Federal Communications Commission (FCC) statement xviii fence network isolation, MAP 40A0 344 fibre MAP 5305, bit error rate test failure 550 fibre channel connection information 13 host card indicator information 14 fibre channel (SCSI-FCP) information 6

fibre channel (SCSI-FCP) host system information 6 fibre channel bit error validation, MAP 5410 563 fibre channel connection information 13 fibre channel host card indicator information 14 fibre channel link fault, MAP 5400 562 fibre host card reports a loss of light, MAP 5440 566 fibre optical power measurement, MAP 5321 556 FICON information 6 link fault isolation 548 FICON attached host systems information 6 formatting of a DDM has not completed, MAP 3127 229

G
generating a refcode from sense bytes 37 start 37

H
handling a missing or failing resource, MAP 4130 353 hard disk drive build process for both drives, MAP 4020 320 hard drive build process for automatic LIC, MAP 4025 324 host bay drawer fan reporting failure, MAP 4110 351 host bay drawer power supply problem, MAP 2210 119 host bay drawer visual power supply problem, MAP 2810 174 host bay fails to power off, MAP 4720 443 host bay power on problem, MAP 4870 459 host fibre channel fails to recognize ESS LUNs, MAP 5430 564 host systems information 5

I
I/O drawer power indicator information 17 IBM patents xvii products xvii programs xvii services xvii trademarks xx incomplete or failed format process, MAP 3550 286 indicators dump progress 369 Industry Canada Compliance statement xviii information 2105 expansion enclosure 1 2105 model 750 1 2105 Model 750 disk storage 21 2105 model 800 1 2105 Model 800 disk storage 21 accessing copy services 9

information (continued) accessing ESS specialist 9 arrays across loops 5 CEC drawer power indicator 15 cluster indicators 15 copy services 10 CUIR 11 DDM bay controller card indicator 22 DDM bay controller card power check indicator information 21 DDM bay DDM check indicator 22 DDM bay disk drive module check indicator 23 DDM bay disk drive module indicators 23 DDM bay disk drive module ready indicator 23 DDM bay external SSA connections 24 DDM bay indicators 21 DDM bay internal SSA connections 24 DDM bay link status (ready) indicator 22 DDM bay mode indicator 22 differences, ESSNet and master consoles 8 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 ESCON attached host systems 6 ESS connection security 7 ESS interfaces 7 ESS master consoles 8 ESS service interface 11 ESS specialist 9 ESSNet 7 fibre channel connection 13 fibre channel host card indicators 14 FICON attached host systems 6 host systems 5 fibre channel (SCSI-FCP) 6 SCSI 5 I/O drawer power indicator 17 master console 7 RAID-10 4 RAID-5 4 redundant array of independent disks (RAID) 4 reference 1 remote service support 13 RPC local and automatic switch settings 20 RPC local and remote switch settings 20 service interface 13 special tools 28 switching ESS power off (automatic mode) 21 switching ESS power off (local mode) 20 switching ESS power off (remote mode) 21 switching ESS power on and off (all modes) 19 switching ESS power on and off (automatic mode) 19 switching ESS power on and off (local mode) 19 switching ESS power on and off (remote mode) 20 topics 1 TotalStorage expert 10 using the ESS operator panel 17 information topics 1 information, 2105 expansion enclosure 1 information, 2105 model 750 1

information, 2105 model 800 1 information, reference 1 installation, 2105 Model 800 completing connecting ethernet LAN 94 testing modem communications 94 installed unit or feature mismatch, MAP 2320 124 isolating a blinking 888 error on the CEC drawer operator panel, MAP 4240 367 isolating a cluster to cluster CPI communication failure, MAP 4510 415 isolating a cluster to cluster ethernet problem, MAP 4390 377 isolating a customer data check failure, MAP 5240 544 isolating a customer LAN connection problem, MAP 4380 376 isolating a DDM bay controller card communications problem, MAP 3398 264 isolating a DDM bay location error, MAP 3428 279 isolating a DDM bay power problem, MAP 3395 261 isolating a DDM LIC update problem, MAP 4710 442 isolating a DDM location problem, MAP 3429 282 isolating a degraded SSA link between a DDM and an SSA device card, MAP 3060 184 isolating a degraded SSA link between a DDM and two SSA device cards, MAP 3078 193 isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card, MAP 3096 209 isolating a degraded SSA link between two DDMs in separate DDM bays, MAP 3101 217 isolating a degraded SSA link between two DDMs, MAP 3010 178 isolating a degraded SSA link between two SSA device cards connected through a DDM bay, MAP 3086 201 isolating a degraded SSA link, MAP 3121 223 isolating a diskette drive failure, MAP 4620 430 isolating a fixed block read data failure, MAP 5230 543 isolating a functional code not running problem, MAP 4780 447 isolating a LIC activation process failure, MAP 4140 354 isolating a LIC process read/display problem, MAP 4100 351 isolating a meta data check failure, MAP 5250 547 isolating a Multiple DDM detect over temperature problem, MAP 3685 316 isolating a SCSI bus error, MAP 5220 541 isolating a SCSI card configuration timeout, MAP 4820 456 isolating a software problem, MAP 4970 472 isolating a storage and DDM bay location error, MAP 3427 277 isolating a storage cage fan failure, MAP 3384 248 isolating a storage cage fan/power sense card error, MAP 3375 242 isolating a storage cage fan/power sense card error, MAP 3378 245 isolating a storage cage fan/power sense card error, MAP 3381 247

isolating a storage cage fan/power sense card location error, MAP 3426 275 isolating a storage cage fan/power sense card R1 jumper missing error, MAP 3423 270 isolating a storage cage power supply failure, MAP 3387 251 isolating a storage cage power supply problem, MAP 3391 255 isolating a two DDMs detected over temperature problem, MAP 3680 313 isolating an array repair required failure, MAP 3129 230 isolating an automatic LIC activation failure, MAP 4A00 482 isolating an SSA DASD DDM bay controller card problem, MAP 3397 263 isolating an SSA link error between a DDM and an SSA device card, MAP 3050 179 isolating an SSA link error between a DDM and two SSA device cards, MAP 3077 187 isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card, MAP 3095 204 isolating an SSA link error between two DDMs in separate DDM bays, MAP 3100 212 isolating an SSA link error between two DDMs, MAP 3000 176 isolating an SSA link error two SSA device cards connected through a DDM bay, MAP 3085 197 isolating an unexpected result, MAP 3605 290 isolating an unexpected SSA SRN, MAP 3125 228 isolating an unexpected SSA test results, MAP 3126 228 isolating an unknown DDM failure, MAP 3128 229 isolating between DDM hardware and microcode failures, MAP 3124 227 isolating e-mail notification problems, MAP 1310 58 isolating memory related error codes, MAP 4160 355 isolating multiple DDMs on an SSA loop cannot be accessed, MAP 3142 231 isolating power symptoms, MAP 2020 112 isolating SNMP notification problems, MAP 1305 56 isolating too few DDMs in DDM bay, MAP 3220 239 isolation bay held reset condition 339 cluster fails to power off 449 cluster powered off unexpectedly 431 cluster SP, SPCN, or system firmware reload 431 DDMs of same capacity but different rpms on the same SSA loop 298 entry for MAPs 41 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 4xxx: 
cluster MAPs 45 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 link fault isolation, ESCON or FICON 548 MAP 1200, prioritizing visual symptoms and problems for repair 50

isolation (continued) MAP 1210, display and repair a problem 51 MAP 1300, cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1460, E-mail reported errors 66 MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet console's personal computer 69 MAP 1604, restoring the personal computer's software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for xSeries 206 PCs 111 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146

isolation (continued) MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154 MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be powered off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link error 220 MAP 3121, isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227

isolation (continued) MAP 3125, isolating an unexpected SSA SRN 228 MAP 3126, isolating an unexpected SSA test results 228 MAP 3127, formatting of a DDM has not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230 MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3149, repairing single or multiple DDM failures 232 MAP 3180, controller card failed 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame replacement 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a DDM bay location error 279

isolation (continued) MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520, DDM bay verification for possible problems 284 MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285 MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344

isolation (continued) MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a LIC process read/display problem 351 MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403

isolation (continued) MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417 MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4990, LIC feature license failure 476 MAP 49A0, failure detected during Background Certify and Build Logical Configuration from ISA 477 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & NCCL) 482 MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic 
LIC activation failure, cluster 1, phase 100 (CCL) 488

isolation (continued) MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506 MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 MAP 5305, bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556 MAP 5330, display ESCON and fibre node descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564

isolation (continued) MAP 5440, fibre host card reports a loss of light 566 MAP 6060, service terminal login failed to one cluster 567 MAPs 41 MAPs 1XXX, general isolation procedures 50 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 pinned data, special case 428 problem isolation using visual symptoms 60 replacing DDMs called out by enhanced PFA 233 RPC to RPC communication fault 359 SSA DASD DDM bay power problem 234 using the DDM bay maintenance analysis procedures (MAPs) 176 using the SSA DASD maintenance analysis procedures (MAPs) 176 isolation using codes displayed by the CEC drawer operator panel, MAP 4360 371

J
Japanese Voluntary Control Council for Interference (VCCI) class A statement xix

K
Korean Ministry of Information and Communication (MIC) statement xix

L
LIC feature license failure 476 LIC feature license failure, MAP 4990 476 loss of redundant input power to CEC, I/O or host bay drawers, MAP 4170 357

M
manually configuring the video/graphics adapter for the master console, MAP 1608 86 manuals, related xxv MAP entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 4xxx: cluster MAPs 45 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 MAP 1200, prioritizing visual symptoms and problems for repair 50 MAP 1210, display and repair a problem 51

MAP 1300, cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1320, problem isolation using visual symptoms 60 MAP 1460, E-mail reported errors 66 MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet console's personal computer 69 MAP 1604, restoring the personal computer's software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for xSeries 206 PCs 111 MAP 1XXX, general isolation procedures 50 MAP 1xxx: general MAPs, entry table 41 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146

MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154 MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be powered off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 2xxx: power and cooling MAPs, entry table 42 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link error 220 MAP 3121, isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227 MAP 3125, isolating an unexpected SSA SRN 228


VOLUME 1, TotalStorage ESS Service Guide

MAP 3126, isolating an unexpected SSA test results 228 MAP 3127, formatting of a DDM has not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230 MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3180, controller card failed 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame replacement 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a DDM bay location error 279 MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520, 
DDM bay verification for possible problems 284 MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285

MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 3xxx: SSA DASD DDM bay MAPs, entry table 43 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344 MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a 
LIC process read/display problem 351
Index


MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403 MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 
4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417

MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 47A0, cluster fails to power off 449 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & NCCL) 482 MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic LIC activation failure, cluster 1 phase 100 (CCL) 488 MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, 
phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506


MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150, (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 4xxx: cluster MAPs, entry table 45 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 MAP 5305, ESCON or fibre bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556 MAP 5330, display ESCON and fibre node descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564 MAP 5440, fibre host card reports a loss of light 566 MAP 5xxx: host interface MAPs, entry table 48 MAP 6060, service terminal login failed to one cluster 567 MAP 6xxx: service terminal MAPs, entry table 49 MAPs 41 entry for problem isolation 41 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 MAP 1200, prioritizing visual symptoms and problems for repair 50 MAP 1210, display and repair a problem 51 MAP 1300, 
cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1320, problem isolation using visual symptoms 60 MAP 1460, E-mail reported errors 66

MAPs (continued) MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet console's personal computer 69 MAP 1604, restoring the personal computer's software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for Xseries 206 PCs 111 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146 MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154

MAPs (continued) MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be power off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error between two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link error 220 MAP 3121, isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227 MAP 3125, isolating an unexpected SSA SRN 228 MAP 3126, isolating an unexpected SSA test results 228 MAP 3127, formatting of a DDM has 
not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230

MAPs (continued) MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3149, repairing single or multiple DDM failures 232 MAP 3152, replacing DDMs called out by enhanced PFA 233 MAP 3160, SSA DASD DDM bay single DDM power problem 234 MAP 3180, controller card failed 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame replacement 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a DDM bay location error 279 MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520, DDM bay verification for 
possible problems 284


MAPs (continued) MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285 MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3615, DDMs of same capacity but different rpms on the same SSA loop 298 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4055, bay held reset condition 339 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344

MAPs (continued) MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a LIC process read/display problem 351 MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4180, RPC to RPC communication fault 359 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403

MAPs (continued) MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417 MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 45A0: pinned data, special case 428 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4640, cluster SP, SPCN, or system firmware reload 431 MAP 4670, cluster powered off unexpectedly 431 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 47A0, cluster fails to power off 449 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4990, LIC feature license failure 476 MAP 49A0, failure detected during Background Certify and Build Logical Configuration from ISA 477 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & 
NCCL) 482

MAPs (continued) MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic LIC activation failure, cluster 1 phase 100 (CCL) 488 MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506 MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150, (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 
MAP 5300: link fault isolation 548 MAP 5305, bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556


MAPs (continued) MAP 5330, display ESCON and fibre node descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564 MAP 5440, fibre host card reports a loss of light 566 MAP 6060, service terminal login failed to one cluster 567 MAPs 1XXX, general isolation procedures 50 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 problem isolation 41 using the DDM bay maintenance analysis procedures (MAPs) 176 using the SSA DASD maintenance analysis procedures (MAPs) 176 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 master console 8 information 7 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 replaces ESSNet console 8 master console information 7 master console product recovery wizard for Xseries 206 PCs, MAP 1630 111 master console product recovery wizard, MAP 1605 73 media maintenance customer media maintenance examples 38 media SIM maintenance procedures 37 media SIM maintenance procedures 37 start 37 MOC (see Korean Government Ministry of Communication) xix model 100 attachment rack reported, MAP 2000 112 multiple DDMs isolated on an SSA loop, MAP 3600 289

N
new DDM storage capacity smaller than original DDMs, MAP 3621 301 notices laser safety xvii safety xvii notices, electronic emission xviii NVS FRU replacement, MAP 4550 426

O
one RPC card firmware down level, MAP 2430 157 only one I/O drawer power supply detected, MAP 40E0 349 ordering publications xxv other cluster fenced - unable to verify SSA loop, MAP 3640 305

P
patent licenses xvii pinned data MAP 45A0: pinned data, special case 428 pinned data and/or volume status unknown, MAP 4520 417 pinned data, special case 428 power event threshold exceeded, MAP 23C0 146 power off and reboot procedure for the TotalStorage ESS master console, MAP 1609 87 PPS input phase missing, MAP 2490 164 PPS output circuit breaker tripped, MAP 2520 168 PPS power on problem, MAP 24A0 165 PPS status code 06, MAP 2340 125 PPS status indicator codes, MAP 2350 127 PPS to RPC interface failure, MAP 4150 355 prioritizing visual symptoms and problems for repair, MAP 1200 50 problem isolation procedures (MAPs) 41 problem isolation using visual symptoms, MAP 1320 60 products xvii programs xvii publications, ordering xxv

R
rack 1 power off problem, MAP 2440 157 rack 1 power on problem, automatic mode, MAP 2370 136 rack 1 power on problem, remote mode, MAP 2390 140 radio-frequency energy compliance statement xviii RAID information 4 RAID-10 information 4 RAID-5 information 4 recovering from corrupted files or functions, MAP 4760 446 redundant array of independent disks (RAID) information 4 refcode decode a refcode 36 generating a refcode from sense bytes 37 reference information 1 related manuals xxv remote service support information 13


repair using a SIM console message 33 using an EREP report 34 repair alternate cluster to run SSA loop test, MAP 3300 240 repair ground continuity, MAP 2031 114 repair the host bay drawer, MAP 4850 458 repair using a SIM console message 33 start 33 repair using an EREP report 34 start 34 repairing single or multiple DDM failures, MAP 3149 232 repairing the ESSNet console's personal computer, MAP 1602 69 replacement DDM has slower RPM than called for, MAP 3618 299 replacement of host bay FRUs for CPI problems, MAP 4070 343 replacing a CEC or I/O drawer power supply, MAP 4890 471 replacing a DDM bay frame replacement, MAP 3400 266 replacing a FRU without using a problem, MAP 1480 66 replacing I/O drawer FRUs for CPI problems, MAP 4060 341 reports EREP reports 34 event history report 35 system exception reports 34 restoring the personal computer's software, MAP 1604 69 RPC card cannot reset a power fault, MAP 2600 169 RPC card host bay drawer fan reporting failure, MAP 41A0 361 RPC local and automatic switch settings information 20 RPC local and remote switch settings information 20 RPC power mode switch mismatch, MAP 2410 153 RPC to host bay drawer power communication failure, MAP 4190 360 RPC to RPC communication fault 359 RPC-2 card reporting PPS battery set present, MAP 23D0 147

S
safety notices attention xvii caution xvii danger xvii laser xvii notices xvii translations of xvii SCSI information 5

SCSI host system information 5 service actions analyze and repair a service request 29 change communications configuration 29 ESSNet console 29 Information 29 install 29 licensed internal code (microcode EC) 29 logical configuration / ESS specialist 29 remove 29 service terminal 29 start 29 system/390 repair 29 test a machine function 29 service interface information 13 service processor reset, MAP 43E0 401 service terminal login failed to one cluster, MAP 6060 567 services xvii sim generation and usage 33 SIM customer receives sense data without a SIM 34 repair using a SIM console message 33 sim generation and usage 33 start 33 isolating an SSA link error, MAP 3120 220 SPCN Load Fault Firmware Error Code, MAP 4885 468 special case, pinned data 428 special cluster problem determination using slow boot mode, MAP 40B0 346 special SCSI bus problem, MAP 40C0 347 special SRN problems, MAP 40D0 348 special tools 28 SSA DDM bay external SSA connections 24 DDM bay external SSA connections, five DDM bays 27 DDM bay external SSA connections, four DDM bays 27 DDM bay external SSA connections, one DDM bays 25 DDM bay external SSA connections, six DDM bays 27 DDM bay external SSA connections, three DDM bays 26 DDM bay external SSA connections, two DDM bays 26 DDM bay internal SSA connections 24 DDM bay internal SSA connections, two DDM bays 25 DDM bay SSA connections 25 SSA DASD DDM bay replacing DDMs called out by enhanced PFA 233 single DDM power problem 234 SSA DASD, maintenance analysis procedures (MAPs) 176 SSA devices certify test failure, MAP 3530 284


start analyze and repair a service request 29 change communications configuration 29 customer media maintenance examples 38 customer receives sense data without a SIM 34 decode a refcode 36 entry table for all service actions 29 EREP reports 34 ESSNet console 29 event history report 35 generating a refcode from sense bytes 37 Information 29 install 29 licensed internal code (microcode EC) 29 logical configuration / ESS specialist 29 media SIM maintenance procedures 37 remove 29 repair using a SIM console message 33 repair using an EREP report 34 service actions 29 service terminal 29 sim generation and usage 33 system exception reports 34 system/390 repair 29 test a machine function 29 start all service actions 29 start service actions 29 statement of compliance European Community Compliance xviii Federal Communications Commission xviii Industry Canada Compliance xviii Japanese Voluntary Control Council for Interference (VCCI) xix Korean Government Ministry of Communication (MOC) xix Taiwan xx statement of EMI Chinese xx storage cage fan/power sense card R1 jumper failing error, MAP 3424 272 storage cage fan/power sense card R2 cable error, MAP 3425 273 storage cage fan/power sense card R2 cable problem, MAP 3421 266 storage cage fan/power sense card R2 jumper and cable problems, MAP 3422 268 switching ESS power off (automatic mode) information 21 switching ESS power off (local mode) information 20 switching ESS power off (remote mode) information 21 switching ESS power on and off (all modes) information 19

switching ESS power on and off (automatic mode) information 19 switching ESS power on and off (local mode) information 19 switching ESS power on and off (remote mode) information 20 system exception reports 34 start 34

T
Taiwan compliance statement xx testing modem communications 94 this repair requires a larger capacity DDM, MAP 3619 301 topics, information 1 TotalStorage expert information 10 trademarks xx

U
UEPO loop problem, MAP 2365 133 unable to determine DDM use, MAP 3627 304 unexpected host bay power off, MAP 4810 452 uninstalled SSA DDMs connected to loop A, MAP 3200 237 uninstalled SSA DDMs connected to loop B, MAP 3210 238 unrelated event caused resume failure, MAP 3570 288 unrelated occurrence, retry verification test, MAP 3560 287 using the DDM bay maintenance analysis procedures (MAPs) 176 using the ESS operator panel information 17 using the SSA DASD maintenance analysis procedures (MAPs) 176

V
VCCI (see Japanese Voluntary Control Council for Interference) xix verify a DDM bay repair, MAP 3500 283

W
web initiated format incomplete, MAP 3540 285 where to start all service actions 29 wrong drawer type error, MAP 3190 236 wrong, missing, or failing bypass card, MAP 3650 307 wrong, missing, or failing passthrough card, MAP 3652 309

Readers' Comments: We'd Like to Hear from You


IBM TotalStorage Enterprise Storage Server Service Guide 2105 Models 750/800 and Expansion Enclosure Volume 1 Chapters 1, 2 (START), and 3

Publication No. SY27-7635-05

(Columns: Very Satisfied, Satisfied, Neutral, Dissatisfied, Very Dissatisfied)

Overall, how satisfied are you with the information in this book?

Overall satisfaction [ ] [ ] [ ] [ ] [ ]

How satisfied are you that the information in this book is:

Accurate [ ] [ ] [ ] [ ] [ ]
Complete [ ] [ ] [ ] [ ] [ ]
Easy to find [ ] [ ] [ ] [ ] [ ]
Easy to understand [ ] [ ] [ ] [ ] [ ]
Well organized [ ] [ ] [ ] [ ] [ ]
Applicable to your tasks [ ] [ ] [ ] [ ] [ ]

Please tell us how we can improve this book:

Thank you for your responses. May we contact you?

[ ] Yes

[ ] No

When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any way it believes appropriate without incurring any obligation to you.

Name Company or Organization Phone No.

Address

___________________________________________________________________________________________________



SY27-7635-05

Cut or Fold Along Line

Fold and Tape    Please do not staple    Fold and Tape

NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES

BUSINESS REPLY MAIL


FIRST-CLASS MAIL PERMIT NO. 40 ARMONK, NEW YORK POSTAGE WILL BE PAID BY ADDRESSEE

IBM Information Development Department 61C 9032 South Rita Road Tucson, Arizona U.S.A. 85775-4401

_________________________________________________________________________________________ Fold and Tape    Please do not staple    Fold and Tape


Part Number: 23R0913

Printed in U.S.A. (ss)


Spine information:

IBM TotalStorage Enterprise Storage Server

VOLUME 1, TotalStorage ESS Service Guide
