Service Guide 2105 Models 750/800 and Expansion Enclosure Volume 1 Chapters 1, 2 (START), and 3
SY27-7635-05
Note: Before using this information and the product it supports, be sure to read the general information under Notices on page xvii.
Sixth Edition (November 2005)

This edition replaces SY27-7635-04.

This edition applies to the first release of the IBM TotalStorage Enterprise Storage Server and to all following releases and changes until otherwise indicated in new editions.

Order publications through your IBM representative or the IBM branch office serving your locality. Publications are not stocked at the address given below.

IBM welcomes your comments. A form for readers' comments may be supplied at the back of this publication, or you may mail your comments to the following address:

International Business Machines Corporation
Information Development
Department 61C
9032 South Rita Road
Tucson, AZ 85775-5501
U.S.A.

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes suitable without incurring any obligation to you.

Copyright International Business Machines Corporation 2004, 2005. All rights reserved.
US Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Tables . . . xi
Figures . . . xiii
Notices . . . xvii
Safety Notices . . . xvii
Laser Safety and Compliance . . . xvii
Translated Safety Notices . . . xvii
Environmental Notices . . . xvii
Product Recycling . . . xviii
Product Disposal . . . xviii
Electronic Emission Notices . . . xviii
Federal Communications Commission (FCC) Statement . . . xviii
Industry Canada Compliance Statement . . . xviii
European Community Compliance Statement . . . xviii
Japanese Voluntary Control Council for Interference (VCCI) Class A Statement . . . xix
Korean Ministry of Information and Communication (MIC) Statement . . . xix
Taiwan Class A Compliance Statement . . . xx
Chinese Class A Electronic Emission Statement . . . xx
Trademarks . . . xx
Using This Service Guide . . . xxiii
Where to Start . . . xxiii
Limited Vocabulary . . . xxiii
Publications . . . xxiv
TotalStorage ESS Product Library . . . xxiv
Ordering Publications . . . xxv
Web Sites . . . xxv
Other Related Publications . . . xxv
Chapter 1: Reference Information . . . 1
2105 Models 750 and 800 Overview . . . 1
2105 Model 750 Specifications . . . 2
2105 Model 800 Specifications . . . 3
Using the ESS operator panel . . . 17
Switching the ESS power on and off (Local, Automatic or Remote) . . . 19
2105 Models 750 and 800 Disk Storage . . . 21
DDM Bay Indicators . . . 21
DDM Bay Disk Drive Module Indicators . . . 23
Internal Connections (DDM Bay) . . . 24
External SSA Connections (DDM Bay) . . . 24
Special Tools . . . 28
Chapter 2: Entry for All Service Actions . . . 29
Entry Table for All Service Actions . . . 29
SIM Generation and Usage . . . 33
Repair Using a SIM Console Message . . . 33
Customer Receives Sense Data Without a SIM . . . 34
Repair Using an EREP Report . . . 34
EREP Reports . . . 34
Decode a Refcode . . . 36
Generating a Refcode from Sense Bytes . . . 37
Media SIM Maintenance Procedures . . . 37
Customer Media Maintenance Procedure Examples . . . 38
Chapter 3: Problem Isolation Procedures . . . 41
Entry for Maintenance Analysis Procedures (MAPs) . . . 41
MAP 1XXX: General Maintenance Analysis Procedures . . . 41
MAP 2XXX: Power and Cooling Maintenance Analysis Procedures . . . 42
MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures . . . 43
MAP 4XXX: Cluster Maintenance Analysis Procedures . . . 45
MAP 5XXX: Host Interface Maintenance Analysis Procedures . . . 48
MAP 6XXX: Service Terminal Maintenance Analysis Procedures . . . 49
MAPs 1XXX: General Isolation Procedures . . . 50
MAP 1200: Prioritizing Visual Symptoms and Problems For Repair . . . 50
MAP 1210: Displaying and Repairing a Problem . . . 51
MAP 1300: Isolating Cluster to Modem Communication Problems . . . 52
MAP 1301: Isolating Call Home / Remote Services Failure . . . 55
MAP 1305: Isolating SNMP Notification Problems . . . 56
MAP 1310: Isolating E-Mail Notification Problems . . . 58
MAP 1320: Isolating Problems Using Visual Symptoms . . . 60
MAP 1460: Isolating E-Mail Reported Errors . . . 66
MAP 1480: Replacing a FRU, Without Using a Problem . . . 66
MAP 1500: Ending a Service Action . . . 67
MAP 1600: ESSNet Console Problem . . . 68
MAP 1602: Repairing the ESSNet Console's Personal Computer . . . 69
MAP 1604: Restoring the Personal Computer's Software . . . 69
MAP 1605: Master Console Product Recovery Wizard . . . 73
MAP 1606: Converting the Personal Computer to an ESSNet Console . . . 76
MAP 1607: Changing the Network Configuration (IP address, host name, domain, subnet mask) for ESS and the TotalStorage ESS Master Console . . . 85
MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console . . . 86
MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console . . . 87
MAP 1610: Connecting the Modem and Modem Expander for Remote Support . . . 88
MAP 1620: Attaching The ESSNet to a Customer Network . . . 107
MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs . . . 111
MAPs 2XXX: Power and Cooling Isolation Procedures . . . 112
MAP 2000: Model 100 Attachment Rack Reported . . . 112
MAP 2020: Isolating Power Symptoms . . . 112
MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent . . . 113
MAP 2031: Repair Ground Continuity . . . 114
MAP 20A0: Cluster Not Ready . . . 117
MAP 2210: Host Bay Drawer Power Supply Problem . . . 119
MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected . . . 120
MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault . . . 122
MAP 2320: Installed Unit or Feature Mismatch . . . 124
MAP 2340: PPS Status Code 06 . . . 125
MAP 2350: Isolating PPS Status Indicator Codes . . . 127
MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem . . . 131
MAP 2365: UEPO Loop Problem . . . 133
MAP 2370: Rack 1 Power On Problem, Automatic Mode . . . 136
MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem . . . 138
MAP 2390: Rack 1 Power On Problem, Remote Mode . . . 140
MAP 23B0: 2105 Expansion Enclosure (Rack 2) Power Off Problem . . . 144
MAP 23C0: Power Event Threshold Exceeded . . . 146
MAP 23D0: RPC-2 Card Reporting PPS Battery Set Present . . . 147
MAP 23E0: Cluster Powered Off Unexpectedly . . . 149
MAP 2400: 2105 Model 800 Local Power On Problems . . . 149
MAP 2410: RPC Power Mode Switch Mismatch . . . 153
MAP 2420: 2105 Expansion Enclosure Power On Problem . . . 154
MAP 2430: One RPC Card Firmware Down Level . . . 157
MAP 2440: Rack 1 Power Off Problem . . . 157
MAP 2450: Crossed RPC Cables to Expansion Rack . . . 160
MAP 2460: Battery Set Charge Low . . . 162
MAP 2470: Battery Set Detection Problem . . . 162
MAP 2490: PPS Input Phase Missing . . . 164
MAP 24A0: PPS Power On Problem . . . 165
MAP 24B0: 2105 Cannot Power Off, Pinned Data . . . 167
MAP 24F0: Both RPC Cards Firmware Down Level . . . 168
MAP 2520: PPS Output Circuit Breaker Tripped . . . 168
MAP 2600: RPC Card Cannot Reset a Power Fault . . . 169
MAP 2700: CEC Drawer Power On Problem . . . 170
MAP 2800: CEC or I/O Drawer Visual Power Supply Problem . . . 171
MAP 2810: Host Bay Drawer Visual Power Supply Problem . . . 174
MAPs 3XXX SSA DASD DDM Bay Isolation Procedures . . . 176
Using the SSA DASD Maintenance Analysis Procedures (MAPs) . . . 176
MAP 3000: Isolating an SSA Link Error Between Two DDMs . . . 176
MAP 3010: Isolating a Degraded SSA Link between Two DDMs . . . 178
MAP 3050: Isolating an SSA Link Error Between a DDM and an SSA Device Card . . . 179
MAP 3060: Isolating a Degraded SSA Link Between a DDM and an SSA Device Card . . . 184
MAP 3077: Isolating an SSA Link Error Between a DDM and two SSA Device Cards . . . 187
MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards . . . 193
MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay . . . 197
MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay . . . 201
MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card . . . 204
MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card . . . 209
MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays . . . 212
MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays . . . 217
MAP 3120: Isolating an SSA Link Error . . . 220
MAP 3121: Isolating a Degraded SSA Link . . . 223
MAP 3123: Array Repair Required . . . 226
MAP 3124: Isolating Between DDM Hardware and Microcode Failures . . . 227
MAP 3125: Isolating an Unexpected SSA SRN . . . 228
MAP 3126: Isolating an Unexpected SSA Test Result . . . 228
MAP 3127: Formatting of a DDM Has Not Completed . . . 229
MAP 3128: Isolating an Unknown DDM Failure . . . 229
MAP 3129: Isolating an Array Repair Required Failure . . . 230
MAP 3131: Attempt to Format Array Member . . . 231
MAP 3142: Isolating Multiple DDMs on an SSA Loop Cannot be Accessed . . . 231
MAP 3149: Repairing Single or Multiple DDM Failures . . . 232
MAP 3152: Replacing DDMs Called Out by Enhanced PFA . . . 233
MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault . . . 234
MAP 3180: Controller Card Failed . . . 235
MAP 3190: Wrong Drawer Type Installed . . . 236
MAP 3200: Uninstalled SSA DDMs Connected to Loop A . . . 237
MAP 3210: Uninstalled SSA DDMs Connected to Loop B . . . 238
MAP 3220: Isolating too Few DDMs in a DDM Bay . . . 239
MAP 3300: Repair Alternate Cluster to Run SSA Loop Test . . . 240
MAP 3360: Ending a DASD Service Action . . . 241
MAP 3375: Isolating a Storage Cage Fan/Power Sense Card Error . . . 242
MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error . . . 245
MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On . . . 246
MAP 3381: Isolating a Storage Cage Fan/Power Sense Card Error . . . 247
MAP 3384: Isolating a Storage Cage Fan Failure . . . 248
MAP 3387: Isolating a Storage Cage Power Supply Failure . . . 251
MAP 3391: Isolating a Storage Cage Power System Problem . . . 255
MAP 3395: Isolating a DDM Bay Power Problem . . . 261
MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem . . . 263
MAP 3398: Isolating a DDM Bay Controller Card Communications Failure . . . 264
MAP 3400: Replacing a DDM Bay Frame Assembly . . . 266
MAP 3421: Storage Cage Fan/Power Sense Card R2 Cable Problem . . . 266
MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems . . . 268
MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error . . . 270
MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error . . . 272
MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error . . . 273
MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error . . . 275
MAP 3427: Isolating a Storage and DDM Bay Location Error . . . 277
MAP 3428: Isolating a DDM Bay Location Error . . . 279
MAP 3429: Isolating a DDM Location Error . . . 282
MAP 3500: Verifying a DDM Bay Repair . . . 283
MAP 3520: DDM Bay Verification for Possible Problems . . . 284
MAP 3530: SSA Devices Certify Test Failure . . . 284
MAP 3540: Web Initiated Format Incomplete, User to Restart . . . 285
MAP 3550: Incomplete or Failed Format Process, User to Restart . . . 286
MAP 3560: Unrelated Occurrence, Retry Verification Test . . . 287
MAP 3570: Unrelated Event Caused Resume Fail . . . 288
MAP 3580: DDM, or DDMs, Found in Formatting State During IML . . . 288
MAP 3600: Multiple DDMs Isolated on an SSA Loop . . . 289
MAP 3605: Isolating an Unexpected Result . . . 290
MAP 3610: DDM Installation with New Rank Site Capacity . . . 290
MAP 3612: DDM Installation with Mixed Capacity Rank Site . . . 293
MAP 3614: DDM Installation Introduces Different RPM . . . 296
MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop . . . 298
MAP 3617: DDM Size is Not Supported . . . 298
MAP 3618: Replacement DDM Has Slower RPM Than Called For . . . 299
MAP 3619: This Repair Requires a Larger Capacity DDM . . . 301
MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs . . . 301
MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics . . . 302
MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics . . . 303
MAP 3627: Unable to Determine DDM Use . . . 304
MAP 3640: Other Cluster Fenced - Unable to Verify SSA Loop . . . 305
MAP 3650: Wrong, Missing, or Failing Bypass Card . . . 307
MAP 3652: Wrong, Missing, or Failing Passthrough Card . . . 309
MAP 3654: Bypass Card Jumpers Wrong . . . 311
MAP 3656: 20 MB SSA Cable Installed Where 40 MB Cable Expected . . . 312
MAP 3680: Isolating a Two DDMs Detect Over-Temperature Problem . . . 313
MAP 3685: Isolating a Multiple DDM Detect Over-Temperature Problem . . . 316
MAPs 4XXX: Cluster Isolation Procedures . . . 319
MAP 4010: Cluster Hang During a Failback or Error Recovery . . . 319
MAP 4020: Hard Disk Drive Build Process for Both Drives . . . 320
MAP 4025: Hard Drive Build Process for Automatic LIC . . . 324
MAP 4040: Entry MAP for CPI Problems . . . 326
MAP 4055: Resolving a Bay Held Reset Condition . . . 339
MAP 4060: Replacing I/O Drawer FRUs for CPI Problems . . . 341
MAP 4070: Replacement of Host Bay FRUs for CPI Problems . . . 343
MAP 4090: CPI Address Mismatch . . . 343
MAP 40A0: Fence Network Isolation . . . 344
MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode . . . 346
MAP 40C0: Special SCSI Bus Problems . . . 347
MAP 40D0: Special SRN Problems . . . 348
MAP 40E0: Only One I/O Drawer Power Supply Detected . . . 349
MAP 4100: Isolating a LIC Process Read/Display Problem . . . 351
MAP 4110: Host Bay Drawer Fan Reporting Failure . . . 351
MAP 4120: Handling Unexpected Resources . . . 352
MAP 4130: Handling a Missing or Failing Resource . . . 353
MAP 4140: Isolating a LIC Activation Process Failure . . . 354
MAP 4150: PPS to RPC Interface Failure . . . 355
MAP 4160: Isolating Memory Related Error Codes . . . 355
MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers . . . 357
MAP 4180: RPC to RPC Communication Failure . . . 359
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure . . . 360
MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure . . . 361
MAP 41B0: CPI Interface NVS/IOA Card to Host Bay Failure . . . 361
MAP 41C0: ESC 2770 or 2771, Missing CPI Detected . . . 362
MAP 41D0: CPI Problem for Host Bay Slot Failure . . . 364
MAP 41E0: CPI Failure Needing CPI Cable as FRU . . . 365
MAP 41F0: A Temporary CPI Error was Detected . . . 365
MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging . . . 366
MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel . . . 367
MAP 4350: Isolating Cluster Code Load Counter=2 . . . 370
MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel . . . 371
MAP 4370: Error Displaying Problems Needing Repair . . . 375
MAP 4380: Isolating a Customer LAN Connection Problem . . . 376
MAP 4390: Isolating a Cluster to Cluster Ethernet Problem . . . 377
MAP 43A0: Bootlist Management Using SMS . . . 387
MAP 43A5: Bootlist Management Using SMS for Automatic LIC . . . 392
MAP 43B0: Cluster Dual Hard Drive ESC 1xxx . . . 398
MAP 43C0: Cluster IML from Second Hard Disk Drive . . . 400
MAP 43D0: Duplicate TCP/IP Address Detected for this Cluster . . . 401
MAP 43E0: Service Processor Reset . . . 401
MAP 4400: Displaying Cluster SMS Error Logs . . . 402
MAP 4410: Cluster to Cluster Ethernet Communication Test . . . 403
MAP 4420: Display Cluster Ethernet Network Address . . . 405
MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem . . . 405
MAP 4450: ESS Cluster to Customer Network Problem . . . 407
MAP 4460: Cluster NVS Problem . . . 410
MAP 4470: ESC 2768, NVS/IOA Card Problem . . . 411
MAP 4480: Cluster to RPC Cards Communication Problem . . . 411
MAP 4510: Isolating a Cluster to Cluster CPI Communication Failure . . . 415
MAP 4520: Pinned Data and/or Volume Status Unknown . . . 417
MAP 4540: Cluster Minimum Configuration . . . 418
MAP 4550: NVS FRU Replacement . . . 426
MAP 4560: No Valid Subsystem Status Available . . . 427
MAP 45A0: Pinned Data, Special Case . . . 428
MAP 4600: Isolating a CD-ROM Test Failure . . . 429
MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level . . . 430
MAP 4620: Isolating a Diskette Drive Failure . . . 430
MAP 4640: Cluster SP, SPCN, or System Firmware Reload . . . 431
MAP 4670: Cluster Powered Off Unexpectedly . . . 431
MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) . . . 432
MAP 4710: Isolating a DDM LIC Update Problem . . . 442
MAP 4720: Host Bay Fails to Power Off . . . 443
MAP 4730: Cluster Power Off Request Problem . . . 446
MAP 4760: Recovering from Corrupted Files or Functions . . . 446
MAP 4780: Isolating a Functional Code Not Running Problem . . . 447
MAP 47A0: Cluster Fails to Power Off . . . 449
MAP 4810: Unexpected Host Bay Power Off . . . 452
MAP 4820: Isolating a SCSI Card Configuration Timeout . . . 456
MAP 4840: CPI Diagnostic Communication Problem . . . 457
MAP 4850: Repair the Host Bay Drawer . . . 458
MAP 4870: Host Bay Power On Problem . . . 459
MAP 4880: Cluster Power On Problem . . . 461
MAP 4885: SPCN Load Fault Firmware Error Code . . . 468
MAP 4890: Replacing a CEC or I/O Drawer Power Supply . . . 471
MAP 4960: ESC 5500 Isolation . . . 471
MAP 4970: Isolating a Software Problem . . . 472
MAP 4980: Customer Copy Services Problems . . . 474
MAP 4990: LIC Feature License Failure . . . 476
MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA . . . 477
MAP 4A00: Isolating an Automatic LIC Activation Failure . . . 482
MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) . . . 482
MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) . . . 485
MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) . . . 486
MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) . . . 488
MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) . . . 491
MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) . . . 493
MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) . . . 495
MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) . . . 497
MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) . . . 499
MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) . . . 501
MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) . . . 503
MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) . . . 504
MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL) . . . 506
MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) . . . 509
MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) . . . 511
MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL) . . . 514
MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) . . . 517
MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) . . . 520
MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) . . . 523
MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) . . . 526
MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) . . . 529
MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) . . . 532
MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) . . . 534
MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL) . . . 537
MAPs 5XXX: Host Interface Isolation Procedures . . . 540
MAP 5000: ESS Specialist Cannot Access Cluster . . . 540
MAP 5220: Isolating a SCSI Bus Error . . . 541
MAP 5230: Isolating a Fixed Block Read Data Failure . . . 543
MAP 5240: Isolating a Customer Data Check Failure . . . 544
MAP 5250: Isolating a Meta Data Check Failure . . . 547
MAP 5300: ESCON or Fibre Channel Link Fault . . . 548
MAP 5305: ESCON or Fibre Channel Bit Error Rate Test Failure . . . 550
MAP 5310: ESCON Bit Error Rate Validation . . . 551
MAP 5320: ESCON Optical Power Measurement . . . 552
MAP 5321: Fibre Channel Optical Power Measurement . . . 556
MAP 5330: Display ESCON and Fibre Node Descriptors . . . 560
MAP 5340: CKD Read Data Failure . . . 561
MAP 5400: Fibre Channel Link Fault . . . 562
MAP 5410: Fibre Channel Bit Error Rate Validation . . . 563
MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs . . . 564
MAP 5440: Fibre Host Card Reports a Loss of Light . . . 566
MAPs 6XXX: Service Terminal Isolation Procedures . . . 567
MAP 6060: Isolating a Service Terminal Login Failure . . . 567
Appendix. Accessibility . . . 573
Features . . . 573
Navigating by keyboard . . . 573
Accessing the publications . . . 573
Index . . . 575
Tables
1. Fibre Channel Host Card LED Indicators . . . 15
2. CEC Drawer Power Indicators . . . 15
3. I/O Drawer Power Indicators . . . 17
4. Power control with Remote Power Control Feature installed . . . 20
5. Power control without Remote Power Control Feature installed . . . 20
6. Summary of Bypass Card Indicators . . . 22
7. Entry for All Service Actions . . . 29
8. 2105 Media Maintenance Procedures . . . 38
9. MAP 1XXX: General Maintenance Analysis Procedures . . . 41
10. MAP 2XXX: Power and Cooling Maintenance Analysis Procedures . . . 42
11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures . . . 43
12. MAP 4XXX: Cluster Maintenance Analysis Procedures . . . 45
13. MAP 5XXX: Host Interface Maintenance Analysis Procedures . . . 48
14. MAP 6XXX: Service Terminal Maintenance Analysis Procedures . . . 49
15. Prioritizing Repairs . . . 50
16. Call Home Return Codes . . . 55
17. 2105 Model 800 Operator Panel Visual Symptoms . . . 61
18. 2105 Model 800 PPS and RPC Card Visual Symptoms . . . 63
19. 2105 Model 800 CEC, I/O, and Host Bay Visual Symptoms . . . 64
20. 2105 Model 800 Storage Bay Visual Symptoms . . . 65
21. DDM Bay, and DDMs Visual Symptoms . . . 66
22. 2105 Model 800 Recommended ESSNet Hub Connection Sequence . . . 95
23. 2105 Model 800 Power Symptoms . . . 112
24. Cluster Power Supply Input Power Cable Plug Chart . . . 122
25. PPS Status Display Codes . . . 128
26. RPC Card and Local Switch Card Configuration Switch Settings . . . 137
27. With Remote Power Feature Installed . . . 158
28. Remote Power Feature Not Installed . . . 158
29. CEC or I/O Drawer Visual Power Supply Problems . . . 172
30. Host Bay Drawer Visual Power Supply Problems . . . 175
31. 2105 Model 800 and Expansion Enclosure, Storage Cages 1 and 2 (upper) . . . 253
32. Expansion Enclosure, Storage Cages 3 and 4 (lower) . . . 254
33. Storage Cage Power Supply Installation Requirements . . . 256
34. Original Repair MAP . . . 324
35. CPI Diagnostics Overview . . . 327
36. Failure Condition . . . 327
37. Fenced or Quiesced Cluster or Host Bays . . . 328
38. FRUs Not Yet Replaced . . . 332
39. FRUs Not Yet Replaced . . . 333
40. FRUs Not Yet Replaced . . . 333
41. FRUs Not Yet Replaced . . . 334
42. FRUs Not Yet Replaced . . . 336
43. FRUs Not Yet Replaced . . . 338
44. CPI Cable Connections . . . 344
45. NVS Power Cards . . . 354
46. Memory Quad DIMMs . . . 357
47. PPS Cable Connectors . . . 358
48. Host Bay Drawer Power Supply Communication Cable Connectors . . . 360
49. Failing CPI Interface . . . 361
50. CIP FRUs . . . 363
51. Cluster I/O Drawer Slot Locations . . . 364
52. Host Adapter Card FRU Names . . . 364
53. CPI Cable FRUs . . . 365
54. Cluster Boot or Down, Symptoms
55. Cluster to Cluster Communication Problem, MAP Entry
56. Cluster to Cluster Communication Failure
57. Cluster to Cluster Communication Problem, TCP/IP Settings
58. Cluster to Cluster Communication Problem, New ESSNet
59. Cluster to Cluster Communication Problem, Existing ESSNet
60. Cluster to Cluster Communication Problem, Customer Network
61. Cluster to Cluster Communication Problem, Unknown Cause
62. Boot Devices Found by Firmware on Power On
63. Number of Harddisks Displayed
64. MAP Repair Started in
65. hdisk_ Repairs
66. ESC Repair Actions
67. Conditions for Fencing
68. Minimum Configuration Error Codes
69. Minimum Configuration Checkpoint
70. Memory Quad DIMMs
71. CEC Drawer FRU Replacements
72. I/O Drawer FRU Replacements
73. Host Bay LEDs
74. ESC Repairs
75. ESS Web Copy Services Problems
76. ESC Actions
77. Status Actions
78. Problem Repair Sequence
79. Problem Repair Sequence
80. Problem Repair Sequence
81. Problem Repair Sequence
82. Problem Repair Sequence
83. Problem Repair Sequence
84. Problem Repair Sequence
85. Problem Repair Sequence
86. Problem Repair Sequence
87. Problem Repair Sequence
88. Problem Repair Sequence
89. Problem Repair Sequence
90. SCSI Read Data Failure ESC Repairs
91. Customer Data Check Failure ESC Repairs
92. Meta Data Check Failure ESC Repairs
93. 2105 Port ID Field
94. rsACExec.c Return Code Definitions
Figures
1. Chinese EMI Statement (s009679)
2. 2105 Model 800 Front and Rear Views (s009119)
3. 2105 Expansion Enclosure Front and Rear Views (S007726m)
4. Master Console Connections (s009220)
5. Fibre Channel Host Card LED Indicator Locations (s009528)
6. CEC Drawer Power Indicator Location (s009612)
7. I/O Drawer Power Indicator Location (s009613)
8. ESS Indicators (s009531)
9. ESS Operator Panel Switches and Indicators for an Expansion Enclosure (s008026m)
10. DDM Bay Indicators (S008108l)
11. DDM Bay Drawer Disk Drive Module Indicators (t007660m)
12. DDM Bay Internal SSA Connections (S008107l)
13. DDM Bay Diagram Explanation (S008122l)
14. One DDM Bay External SSA Connections (S008129m)
15. Two DDM Bay Initial External SSA Connections (S008128m)
16. Two DDM Bay Final External SSA Connections (S008127m)
17. Three DDM Bay External SSA Connections (S008126m)
18. Four DDM Bay External SSA Connections (S008125m)
19. Five DDM Bay External SSA Connections (S008124m)
20. Six DDM Bay External SSA Connections (S008123m)
21. Service Information Messages Report (S009434)
22. Event History Report (S009433)
23. Decoding the Refcode (s008597m)
24. Refcode in the 2105 SIM Sense Bytes (S008594n)
25. Example of ICKDSF Analyze Drivetest Output
26. Modem and Modem Expander Attachment Diagram (s009425)
27. Modem Configuration Switch Settings (S007457l)
28. Modem Expander Setup Switch Settings (S007455l)
29. Modem Rear View (S008410l)
30. Modem Expander Rear View (S008411l)
31. Cluster Modem Connectors (s009133)
32. Modem Front Panel Locations (S008412l)
33. Modem Expander Switches and Indicators (S007486l)
34. Cluster to Cluster Communication Cable Location (s009120)
35. ESSNet Hub Port Connector Locations (S008603p)
36. Line Cord Bracket Connectors (s009124)
37. Ground Continuity Repair Diagram (s009406)
38. Male Plug on the Mainline Power Cable (S008045l)
39. Female Connector on the Mainline Power Cable (S008046l)
40. 2105 Primary Power Supply Locations (s009048)
41. 2105 Primary Power Supply Locations (s009048)
42. Rack Operator Panel Locations (s009714)
43. 2105 Primary Power Supply Locations (s009048)
44. 2105 Primary Power Supply Locations (s009048)
45. Rack Power Control Card Cable Locations (s009706)
46. Rack Power Control Card Switch Locations (s009707)
47. 2105 Model 800 RPC Local/Remote Switch Location (s009127)
48. 2105 Primary Power Supply Locations (s009048)
49. 2105 Model 800 Operator Panel Locations (s009422)
50. 2105 Primary Power Supply Locations (s009048)
51. 2105 Model 800 Operator Panel Locations (s009422)
52. 2105 Primary Power Supply Locations (s009048)
53. RPC Card Cables (s009705)
54. 2105 Primary Power Supply Locations (s009048)
55. SSA Link Failure, Two Adjoining DDMs (s009440)
56. SSA Link Failure, Two Adjoining DDMs (s009440)
57. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l)
58. DDM bay SSA Connectors (S007693l)
59. Cluster SSA Device Card Connector Locations (s009166)
60. DDM bay DDM Indicator Locations (S008021l)
61. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l)
62. DDM bay SSA Connectors (S007693l)
63. Cluster SSA Device Card Connector Locations (s009166)
64. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008141l)
65. DDM bay SSA Connector Locations (S007693l)
66. Cluster SSA Device Card SSA Connector Locations (s009166)
67. DDM bay DDM Indicator Locations (S008021l)
68. DDM bay Bypass Card Jumper Settings (s009436)
69. SSA Link Failure, Passthrough and Bypass Card Link Between a DDM and SSA Device Card (S008141l)
70. DDM bay SSA Connector Locations (S007693l)
71. Cluster SSA Device Card SSA Connector Locations (s009166)
72. DDM bay Bypass Card Jumper Settings (s009436)
73. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)
74. DDM bay SSA Connector Locations (S007693l)
75. Cluster SSA Device Card SSA Connector Locations (s009166)
76. DDM bay Bypass Card Jumper Settings (s009436)
77. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)
78. DDM bay SSA Connector Locations (S007693l)
79. Cluster SSA Device Card SSA Connector Locations (s009166)
80. DDM bay Bypass Card Jumper Settings (s009436)
81. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008140l)
82. DDM Bay SSA Connector Locations (S007693l)
83. Cluster SSA Device Card SSA Connector Locations (s009166)
84. DDM bay Bypass Card Jumper Settings (s009436)
85. SSA Link Degraded, Two Passthrough and Bypass Card Link Between Two DDMs (S008384l)
86. DDM bay SSA Connector Locations (S007693l)
87. DDM bay Bypass Card Jumper Settings (s009436)
88. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437)
89. DDM Bay DDM Indicator Locations (S008021l)
90. DDM Bay SSA Connectors (S007693l)
91. DDM bay Bypass Card Jumper Settings (s009436)
92. DDM bay Bypass Card Jumper Settings (s009436)
93. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437)
94. DDM Bay SSA Connectors (S007693l)
95. DDM bay Bypass Card Jumper Settings (s009436)
96. DDM bay Bypass Card Jumper Settings (s009436)
97. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438)
98. DDM bay DDM Indicator Locations (S008021l)
99. DDM Bay Bypass Card Jumper Settings (s009436)
100. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438)
101. DDM bay SSA Connector Locations (S007693l)
102. Cluster SSA Device Card SSA Connector Locations (s009166)
103. DDM Bay Bypass Card Jumper Settings (s009436)
104. Cluster SSA Device Card Locations (s009166)
105. Cluster SSA Device Card Locations (s009166)
106. Expected DDM Bay DDM Locations (S007657l)
107. DDM bay Indicator Locations (S008018l)
108. 2105 Model 800 DDM Bay Locations (s009136)
109. 2105 Expansion Enclosure DDM Bay Locations (S007741s)
110. Storage Cage Power Planar Fan Jumper Locations (s008352p)
111. Storage Cage Power Supply Locations (s009536)
112. Primary Power Supply CB and Connector Locations (S008496l)
113. Storage Cage Power Supply Locations (S008495m)
114. Storage Cage Power Supply Locations (S008495m)
115. 2105 Primary Power Supply Connectors (5008774m)
116. 2105 Primary Power Supply Connectors (5008774m)
117. 2105 Primary Power Supply Connectors (5008774m)
118. 2105 Primary Power Supply Connectors (5008774m)
119. 2105 Primary Power Supply Connectors (5008774m)
120. Fan Sense Card Jumper and Cable Locations (S008774m)
121. Fan Sense Card Jumper and Cable Locations (S008774m)
122. DDM Bay Front Power Cable Locations (S009430)
123. DDM Bay Rear Power Cable Locations (S009431)
124. DDM bay Indicator Locations (S008018l)
125. DDM bay Bypass Card Jumper Settings (s009436)
126. DDM bay Bypass Card Jumper Settings (s009436)
127. DDM bay Bypass Card Jumper Settings (s009436)
128. I/O Drawer Cluster ID Jumpers (s009459)
129. 2105 Model 800 Memory Riser Card Memory DIMM Locations (s009638)
130. Cluster to Cluster Communication Cable Location (s009120)
131. Boot Sequence Display
132. CEC Drawer Operator Panel Locations (s009652)
133. CEC Drawer Bulkhead Connector Locations (s009527)
134. I/O Drawer Bulkhead Connector Locations (s009526)
135. CEC Drawer and I/O Drawer Communication (s009721)
136. CEC Drawer, Memory Riser Card Memory DIMM Module Locations (s009241)
137. Power Supply Connector Locations (s009710)
138. Host Bay Planar LED Indicator Location (s009643)
139. Host Drawer Power Supply HA LED Indicator Location (s009644)
140. CEC Drawer Bulkhead Connector Locations (s009527)
141. I/O Drawer Bulkhead Connector Locations (s009526)
142. RPC Card J2 Connector Locations (s009583)
143. Example of Problem Details Report (s009716)
144. 2105 Model 800 ESD Discharge Pad Locations (s009141)
145. Measuring Optical Transmit Power (S008185m)
146. Measuring Optical Receive Power (s008186n)
147. Measuring Fibre Channel Optical Transmit Power (s008840l)
148. Measuring Fibre Channel Optical Receive Power (s008841m)
149. 2105 Model 800 Host Bay Connector Locations (s009135)
150. Service Terminal Connections to Controllers and Power (s009595)
Notices
References in this manual to IBM products, programs, or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Subject to IBM's valid intellectual property or other legally protected rights, any functionally equivalent product, program, or service may be used instead of the IBM product, program, or service. The evaluation and verification of operation in conjunction with other products, except those expressly designated by IBM, are the responsibility of the user.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
USA
Safety Notices
Safety notices are printed throughout this manual. Danger notices warn you of conditions or procedures that can result in death or severe personal injury. Caution notices warn you of conditions or procedures that can cause personal injury that is neither lethal nor extremely hazardous. Attention notices warn you of conditions or procedures that can cause damage to machines, equipment, or programs.
Environmental Notices
This section contains information about:
• Product recycling for this product
• Environmental guidelines for this product
Product Recycling
This unit contains recyclable materials. These materials should be recycled where processing sites are available and according to local regulations. In some areas, IBM provides a product take-back program that ensures proper handling of the product. Contact your IBM representative for more information.
Product Disposal
This unit contains several types of batteries. Return all Pb-acid (lead-acid) batteries to IBM for proper recycling, according to the instructions received with the replacement batteries.
Conformity with the Council Directive 73/23/EEC on the approximation of the laws of the Member States relating to electrical equipment designed for use within certain voltage limits is based on compliance with the following harmonized standard: EN60950.
Germany Only
Certificate of approval under the German Act on the Electromagnetic Compatibility of Devices (EMVG) of 30 August 1995. This device is authorized to bear the EC conformity mark (CE) in accordance with the German EMVG. The issuer of the declaration of conformity is IBM Deutschland.

Information regarding EMVG Paragraph 3, Section (2) 2: The device meets the protection requirements of EN 50082-1 and EN 55022 Class A.

EN 55022 Class A devices require the following notices:

Under the EMVG: Devices may be operated in locations for which they are not sufficiently interference-suppressed only with special approval from the Federal Ministry of Posts and Telecommunications or the Federal Office for Posts and Telecommunications. Approval is granted if no electromagnetic interference is expected. (Excerpt from the EMVG, Paragraph 3, Section 4.) Under Paragraph 9 of the EMVG, in conjunction with the corresponding fee ordinance (Official Gazette 14/93), this approval procedure is subject to a fee.

Under EN 55022: This is a Class A device. This device can cause radio interference in residential areas; in that case the operator may be required to take appropriate measures and to bear their cost.

Note: To ensure compliance with the EMVG, the devices must be installed and operated as described in the manuals.
Trademarks
The following terms are trademarks of the IBM Corporation in the United States or other countries or both: AIX AS/400 DB2 DFSMS/MVS DFSMS/VM e (logo) Enterprise Storage Server Enterprise Systems Architecture/390 ESCON ES/9000 FICON FlashCopy IBM MVS MVS/ESA Netfinity NetVista NUMA-Q Operating System/400 OS/390 OS/400 RETAIN RS/6000 S/390 Seascape SP System/360 System/370 System/390 TotalStorage Versatile Storage Server
VM/ESA VSE/ESA xSeries z/Architecture z/OS zSeries z/VM

Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.
Where to Start
Start all service actions at Chapter 2: Entry for All Service Actions on page 29.

Note: 2105 Model 750 information
• The 2105 Model 750 is fully supported by the service information in this manual when you follow guided procedures. However, the service information references only the 2105 Model 800.
• The 2105 Model 750 supports fewer configuration options than the 2105 Model 800. For further information, see the IBM TotalStorage ESS Introduction and Planning Guide (form number SC267246).

Attention: When performing any service action on the IBM 2105 TotalStorage ESS, follow the directions given in Chapter 2: Entry for All Service Actions on page 29 or from the service terminal. This ensures that you use the correct remove, replace, or repair procedure, including the correct power on/off procedure, for this machine. Failure to follow these instructions can damage the machine and might also cause an unexpected loss of access to customer data.
Limited Vocabulary
This manual uses a specific range of words so that the text can be understood by IBM service representatives in countries where English is not the primary language.
Publications
This section describes the TotalStorage ESS library and publications for related products. It also gives ordering information.
• IBM TotalStorage ESS: S/390 Command Reference manual, SC26-7298
  This publication describes the functions of the ESS and provides reference information, such as channel commands, sense bytes, and error recovery procedures for IBM S/390 and zSeries hosts.
• IBM TotalStorage ESS: SCSI Command Reference manual, SC26-7297
  This publication describes the functions of the ESS. It provides reference information, such as channel commands, sense bytes, and error recovery procedures for UNIX, IBM Application System/400 (AS/400), and IBM eServer iSeries 400 hosts.
• IBM TotalStorage ESS: Subsystem Device Driver manual, SC26-7478
  This publication describes how to use the IBM TotalStorage ESS Subsystem Device Driver (SDD) on open-systems hosts to enhance performance and availability on the ESS. SDD creates redundant paths for shared logical unit numbers. SDD permits applications to run without interruption when path errors occur. It balances the workload across paths, and it integrates transparently with applications. For information about the SDD, go to the following Web site: www.ibm.com/storage/support/techsup/swtechsup.nsf/support/sddupdates/
• IBM TotalStorage ESS: User's Guide manual, SC26-7445
  This guide provides instructions for setting up and operating the ESS and for analyzing problems.
• IBM TotalStorage ESS: Web Interface User's Guide manual, SC26-7448
  This guide provides instructions for using the two ESS Web interfaces, ESS Specialist and ESS Copy Services.
  Note: No hardcopy manual is produced for this publication. However, a PDF file is available from the following Web site: www.storage.ibm.com/hardsoft/products/ess/refinfo.htm
Ordering Publications
All of the above publications are available on a CD-ROM that comes with the TotalStorage ESS. You can also order a hard copy of some of the publications. For additional CD-ROMs, order:
• ESS Service Documents CD-ROM, SK2T-8822
• ESS Customer Documents CD-ROM, SK2T-8803
Web Sites
• IBM Storage home page: http://www.storage.ibm.com/
• IBM Enterprise Storage Server home page: http://www.ibm.com/storage/ess
  http://www.storage.ibm.com/hardsoft/product/refinfo.htm
Reference Information
• Disk capacity that can be assigned and reassigned among attached host systems
• Instant copy solutions with FlashCopy
• Disaster recovery solutions with Peer-to-Peer Remote Copy (PPRC)
For more details, see the IBM TotalStorage Enterprise Storage Introduction and Planning Guide manual.

Physical differences

The storage cage in the upper right quadrant of the 2105 Model 750 has been removed. Therefore, the four plenum fans above the cage have also been removed, and the power cables to the fans are terminated at the fan end with the same jumpers that are used on the power planar board. The left quadrant of the 2105 Model 750 is the same as on the 2105 Model 800.

The dimensions, with and without packaging, are identical to those of the 2105 Model 800. However, the weight differs: the maximum weight is 1059 kg (2330 lbs) without packaging and 1173 kg (2580 lbs) with packaging.
Figure 2. 2105 Model 800 Front and Rear Views (s009119)
The 2105 Model 800 subsystem supports a maximum of 384 DDMs:
• 128 DDMs in a 2105 Model 800
• 256 DDMs in a 2105 Expansion Enclosure, which must be attached to a 2105 Model 800
• 384 DDMs in a 2105 Model 800 with an attached 2105 Expansion Enclosure

Note: The minimum configuration for all ESS models is 16 DDMs.
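The DDM limits above reduce to simple arithmetic. The following Python sketch is an illustration only and is not part of any ESS tool; the constant and function names are hypothetical, and only the numeric limits (128, 256, and 16) come from the text.

```python
# Limits quoted in the text above. The names are hypothetical.
MAX_DDMS_MODEL_800 = 128
MAX_DDMS_EXPANSION_ENCLOSURE = 256
MIN_DDMS = 16


def max_ddms(has_expansion_enclosure: bool) -> int:
    """Return the maximum DDM count for a subsystem."""
    total = MAX_DDMS_MODEL_800
    if has_expansion_enclosure:
        # The expansion enclosure is always attached to a Model 800,
        # so its DDMs add to the base frame total.
        total += MAX_DDMS_EXPANSION_ENCLOSURE
    return total


def ddm_count_in_range(ddm_count: int, has_expansion_enclosure: bool) -> bool:
    """Check a DDM count against the minimum and maximum limits only."""
    return MIN_DDMS <= ddm_count <= max_ddms(has_expansion_enclosure)


if __name__ == "__main__":
    print(max_ddms(False))
    print(max_ddms(True))
```

The check covers only the minimum and maximum counts; it does not model any other installation rule.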
Because the ESS requires that a loop have two spare disk drives, the first RAID-10 disk group must consist of six DDMs and two spares. The data on three DDMs is mirrored to the other three DDMs. This configuration satisfies the ESS requirement for two spares per loop. Later disk groups on the same loop can have eight DDMs, with the data on four DDMs mirrored to the other four DDMs. Because half of the DDMs in the group hold data and the other half hold mirrored data, RAID-10 arrays have less capacity than RAID-5 arrays.
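The capacity difference described above can be sketched with a short calculation. This Python example is an illustration under stated assumptions: the function names are hypothetical, the RAID-10 layouts (six DDMs plus two spares, then eight DDMs) come from the text, and the RAID-5 figure uses only the general property that a RAID-5 group gives up one member's worth of capacity to parity.

```python
def raid10_group_layout(is_first_group_on_loop: bool) -> dict:
    """Return the DDM roles in an eight-DDM RAID-10 disk group."""
    if is_first_group_on_loop:
        # The first group on a loop supplies the two spares for the loop.
        members, spares = 6, 2
    else:
        members, spares = 8, 0
    # Half of the members hold data; the other half hold the mirror copy.
    return {"data": members // 2, "mirror": members // 2, "spares": spares}


def raid5_data_ddms(members: int) -> int:
    """General RAID-5 property: one member's worth of capacity is parity."""
    return members - 1


if __name__ == "__main__":
    print(raid10_group_layout(True))
    print(raid10_group_layout(False))
    print(raid5_data_ddms(8))
```

For an eight-member group, the sketch gives four data DDMs for RAID-10 and seven for RAID-5, which is the capacity trade-off that the paragraph describes.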
SCSI-FCP Attached Host Systems: The ESS attaches to open-systems hosts with one-port fibre-channel adapters. The fibre-channel adapters can be configured to operate with the SCSI-to-FCP (SCSI-FCP) protocol. Longwave adapters and shortwave adapters are available on the 2105 Model 800. With fibre-channel adapters configured for the SCSI-FCP protocol, the ESS supports:
• A maximum of 16 fibre-channel ports (one port per adapter)
• A maximum of 128 host login IDs per fibre-channel port
• A maximum of 512 SCSI-FCP host login IDs or SCSI-3 initiators per ESS
• Logical unit number (LUN) and port masking by target
• Fibre-channel arbitrated loop (FC-AL), fabric, or point-to-point topologies
For more details, see the IBM TotalStorage Enterprise Storage Introduction and Planning Guide manual.

ESCON Attached Host Systems: The ESS attaches to S/390 host systems and zSeries host systems with two-port ESCON adapters or FICON bridge channels. The FICON bridge card in ESCON Director 9032 Model 5 enables a FICON bridge channel to connect to ESCON host adapters in the ESS. The FICON bridge architecture supports up to 16,384 devices per channel. With ESCON adapters, the ESS supports:
• A maximum of 32 ESCON ports (two ports per adapter) per ESS
• A maximum of 64 logical paths per port
• A maximum of 2048 logical paths per ESS
• A maximum of 16 control-unit images per ESS
• A maximum of 256 logical paths per control-unit image
• Access to all 16 control-unit images and 2048 CKD devices over a single ESCON port on the ESS
  Note: Certain LIC levels might limit the number of devices per ESCON channel to 1024.
For more details, see the IBM TotalStorage Enterprise Storage Introduction and Planning Guide manual.

FICON Attached Host Systems: The 2105 Model 800 can attach to S/390 and zSeries host systems with fibre-channel adapters that are configured to operate with the FICON upper layer protocol. A maximum of sixteen fibre-channel ports (one per adapter) can be installed.
ESS FICON adapters for the 2105 Model 800 support 1 Gbps or 2 Gbps operation. When operating at 2 Gbps, channel-link speed can be up to 200 MB per second in full duplex mode. However, effective sustained throughput of the adapters will be less than these theoretical maximums. With fibre-channel adapters configured for FICON, the ESS supports:
• Either fabric or point-to-point topologies
• A maximum of 127 channel login IDs per fibre-channel port
• A maximum of 16 FICON ports
• A maximum of 256 logical paths per FICON port
• A maximum of 4096 logical paths per ESS (256 logical paths x 16 ports = 4096)
  Note: Certain FICON host channels might limit the number of logical paths to 2048.
• A maximum of 16 control-unit images per ESS
• A maximum of 256 logical paths to each control-unit image
• Access to all 16 control-unit images (4096 CKD devices) over each FICON port
For more details, see the IBM TotalStorage Enterprise Storage Introduction and Planning Guide manual.
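The per-ESS logical path maximums quoted for ESCON and FICON follow from the per-port limits. The following Python sketch only rechecks that arithmetic; the constant and function names are hypothetical, and the numbers are the ones given in the lists above.

```python
# Limits quoted in the ESCON and FICON lists above. The names are hypothetical.
ESCON_PORTS = 32
ESCON_PATHS_PER_PORT = 64
FICON_PORTS = 16
FICON_PATHS_PER_PORT = 256
CONTROL_UNIT_IMAGES = 16
PATHS_PER_CONTROL_UNIT_IMAGE = 256


def max_logical_paths(ports: int, paths_per_port: int) -> int:
    """Logical paths per ESS when every port carries its maximum."""
    return ports * paths_per_port


if __name__ == "__main__":
    escon_total = max_logical_paths(ESCON_PORTS, ESCON_PATHS_PER_PORT)
    ficon_total = max_logical_paths(FICON_PORTS, FICON_PATHS_PER_PORT)
    image_total = CONTROL_UNIT_IMAGES * PATHS_PER_CONTROL_UNIT_IMAGE
    print("ESCON logical paths per ESS:", escon_total)
    print("FICON logical paths per ESS:", ficon_total)
    print("Paths across all control-unit images:", image_total)
```

With these inputs, the ESCON product (32 x 64) matches the stated 2048 logical paths per ESS, and the FICON product (16 x 256) matches the stated 4096, which is also 16 control-unit images x 256 logical paths per image.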
ESS Interfaces
This section describes the following interfaces:
• ESS connection security
• IBM TotalStorage Enterprise Storage Server Network (ESSNet)
• IBM TotalStorage Enterprise Storage Server Specialist (ESS Specialist)
• A command-line interface (CLI)
• IBM TotalStorage Enterprise Storage Server Copy Services (ESS Copy Services)
• IBM TotalStorage Expert, an optional software product
• The ESS service interface
See the IBM TotalStorage Enterprise Storage Server Web Interface User's Guide manual for detailed descriptions of the Web interfaces and instructions about how to use them.

ESS Connection Security: The customer connects to the ESS administrative functions through the IBM TotalStorage Enterprise Storage Server Master Console (ESS Master Console). Access to the server functions associated with ESS Specialist and ESS Copy Services requires user IDs and passwords. The customer controls user access by assigning levels of access, such as configure or view. The levels of access limit users to the set of functions that they are authorized to perform.

IBM TotalStorage Enterprise Storage Server Network: The IBM TotalStorage Enterprise Storage Server Network (ESSNet) is a network that is established between a set of ESSs and various support functions. The customer needs an ESSNet facility for each set of ESSs in a locality. A local ESSNet is the network between the ESSNet facility and the ESSs. The local ESSNet supports installation functions and configuration functions on the associated ESSs through the ESS Specialist. IBM installs the ESSNet facility when it installs the ESS. The facility consists of the dedicated ESS Master Console and the networking components.

Note: Feature code (FC) 2717, the ESS Master Console, replaces the remote support facility, FC 2715. FC 2715 included the ESSNet console.

The ESS Master Console includes an application that provides links to the ESS user interface.
When one of these links is selected, it initiates the Web interface to ESS Specialist and ESS Copy Services.

The following service functions for local and remote service areas depend on facilities that the local ESSNet networking components provide:
• Simple Network Management Protocol (SNMP) traps
• Electronic mail (e-mail)
• Pagers
• Call home

The customer can extend the local ESSNet into their Ethernet network and between local ESSNets to create an expanded Copy-Services server domain. The local ESSNet can also enable other personal computers (PCs) in the network to interact with the ESSs through any of the following:
• ESS Specialist
• ESS Copy Services
• ESS Copy Services command-line interface (CLI)
• SNMP protocols
Interface into the ESSNet facility is through the ESS Master Console, or through an external Ethernet switch or hub that provides cable connections from the ESSNet to the ESS. The ESS Master Console also requires a telephone connection for operation of call home, remote service, and pager functions.

Note: The customer can attach the Ethernet LAN to the external hub. The hub speed is 10 or 100 megabits per second (Mbps), depending on the LAN. The customer provides any hardware that is needed for this connection.

ESS Master Console: IBM has replaced the ESSNet console with the ESS Master Console. The ESS Master Console uses a modem and a 16-port serial adapter that enables communication between the ESS and IBM. This communication offers the following enhancements to remote support over the ESSNet console:
• Monitoring of hardware and microcode operations
• A log viewer that displays console message files, formatted error files, log files, and trace files on demand. This function is available to IBM service representatives and other support representatives.
• Activation of microcode engineering changes (ECs) from the ESS Master Console
• Reduction or elimination of long-distance telephone costs for call-home service (The ESS Master Console uses the IBM Global Network to communicate with the Field Support Center.)
• Improved data transmission rates and improved reliability for state saves and traces
• Simultaneous code load to multiple ESSs
• The ability for IBM or the service provider to copy a LIC package from a CD-ROM at the ESS Master Console to any or all of the attached ESSs. IBM or the service provider can then use the service panels on the ESS Master Console to perform a LIC activate.

Because of the added benefits, customers should convert their ESSNet console to an ESS Master Console. Customers can contact their IBM marketing office to request this free service.

Figure 4 on page 9 shows the ESS Master Console connections and the remote support functions.
Reference Information
[Figure 4. ESS Master Console connections and remote support functions. Labeled elements: Ethernet cables, 15.3 m (50 ft); 16-port Ethernet switch; Master Console with MSA PCI card; modem cable, 1.2 m (4 ft); modem; catcher systems; call home; connection to customer network; FTP trace data; null-modem cables, 15.3 m (50 ft); remote service; serial port]
Methods of accessing the ESS Specialist and ESS Copy Services Web interfaces: The customer can access the ESS Specialist and ESS Copy Services Web interfaces from the ESS Master Console. The ESS Master Console includes browser software for this access. For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
ESS Specialist
The 2105 includes the ESS Specialist. The ESS Specialist is a Web-based interface that allows the customer to configure the 2105. From the Web interface the customer can perform the following tasks:
v Monitor problems
v View and change the configuration, which includes the following subtasks:
  - Add or delete SCSI-attached host systems and fibre-channel-attached host systems
  - Configure SCSI host ports and fibre-channel host ports on the ESS
  - Define control-unit images for S/390 host systems and zSeries host systems
  - Define fixed-block (FB) and count-key-data (CKD) disk groups
  - Add FB and CKD logical devices (volumes)
  - Assign logical devices to be accessible to more than one host system
  - Change logical-device assignments
v Change and view communication resource settings, such as electronic mail (e-mail) addresses and telephone numbers
v Authorize user access
Reference Information, CHAPTER 1
With ESS Specialist the customer can view the following information:
v The external connection between a host system and an ESS port
v The allocation of storage space to fixed-block (FB) and count-key-data (CKD) volumes

IBM updates the ESS Specialist through licensed internal code (LIC) updates. For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.

2105 Copy Services: ESS Copy Services operates over the ESSNet and involves a set of ESS storage servers that are associated in a Copy-Services server domain. Each Copy-Services server domain contains a primary and a backup ESS Copy Services server. The ESS Copy Services servers each run on one of the ESS clusters within the Copy-Services server domain. ESS Copy Services provides the following types of data-copy functions:
v Peer-to-Peer Remote Copy (PPRC): PPRC automatically copies changes that the customer makes to a source volume to the target volume until the customer suspends or terminates the PPRC relationship.
v FlashCopy: FlashCopy makes a single point-in-time copy.

For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.

IBM TotalStorage Expert: The TotalStorage Enterprise Storage Server Expert (ESS Expert) is an optional software product that the customer can purchase to use with the ESS. ESS Expert gathers performance, asset, and data capacity information from each ESS that it finds on a network. It stores this information in a database, and generates reports that are based on this information. ESS Expert displays these reports to administrators who sign on to the Expert using a Web browser. The customer must provide a LAN connection between ESS Expert and the ESS to enable ESS Expert to gather the information from the ESS.
v Asset management: ESS Expert collects and displays asset management data.
v Capacity management: ESS Expert collects and displays capacity management data.
v Performance management: ESS Expert collects and displays performance management data, for example:
  - Number of I/O requests
  - Number of bytes transferred
  - Read and write response time
  - Cache use statistics
v Manage Volume Data: ESS Expert collects and displays volume data.

ESS Expert enables the customer to schedule the information collection. With this information, the customer can make informed decisions about capacity planning and volume placement, and can also isolate I/O performance bottlenecks.
For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.

ESS Service Interface: The ESS provides service interface ports for external connection of a service terminal. IBM or the service provider can perform service on the ESS by using an IBM mobile service terminal (MoST) or an equivalent service interface. The ESS service interface also provides remote service support with call-home capability for directed maintenance by service personnel. The customer must provide an analog telephone line to enable this support. The ESS provides the following service functions:
v Continuous self-monitoring initiates a call (call home) to service personnel if a failure has occurred. Because service personnel who respond to the call know about the failing component, they can reduce the repair time.
v Service personnel can access error and problem logs remotely. Service personnel use the logs to analyze potential failures.
v Remote support can correct many types of problems on the ESS. When the ESS reports a problem, service personnel can often correct the problem from a remote location.
v The call-home facility enables the use of step-ahead storage.

For more details, see the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual.
The customer must enter the configuration information (to Allow CUIR to Automatically Vary Paths OFF/ON) into the Communication Resources Worksheets. These worksheets are found in the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide manual, GC26-7444. The worksheet options are:
v Enable: allows the ESS to initiate the reconfiguration for service. The ESS requests the attached hosts to automatically vary paths offline for service, and back online after the service is complete.
v Disable (default): sets the paths to the ESS cluster offline for service. The system operator has to manually vary the affected paths offline, and back online.

The service representative sets the configuration option during the initial install sequence, or can change the original option setting using the Change/Show Control Switches procedure in chapter 6 of Volume 2.

The service representative requests a quiesce of the channel paths from the ESS subsystem. The ESS sends a reconfiguration request to the operating system. The operating system determines the appropriate reconfiguration actions necessary and performs these actions:
v For a CUIR quiesce request: The operating system determines the paths affected by the request. For each affected path, the operating system checks to determine if it is online.
  - If the channel path is already offline for some of the devices using it, the operating system marks it as in use by CUIR. The operator then cannot vary the channel path online while it is being serviced.
  - If the channel path is already offline for all of the devices using it, the operating system does not mark it as in use by CUIR.
  - If the operating system detects that the path is online, it issues the command to vary the path offline. This marks the path as in use by CUIR.
  CUIR will not take the last path from any processor image to an online device.
If the request is unsuccessful, the quiesce process gives instructions for the service representative to take to the system operator. The system operator issues the host commands to vary the channel paths offline. If the request is successful, the service representative can perform the service action.
v After the service action, the service representative resumes the I/O components that were quiesced. The ESS sends the resume request to the operating system. The operating system then performs the appropriate reconfiguration action:
  - Resume channel path request: The operating system determines which channel paths are affected and varies them online automatically. All paths varied off by the operator will remain offline but will no longer be marked as in use by CUIR.
v The operating system sends the results of the resume request to the ESS.

The automatic control provided by CUIR simplifies the actions required by operations personnel in managing their service requirements.
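The per-path decisions that the operating system makes for a CUIR quiesce request can be illustrated with a short Python sketch. This is only one reading of the rules described above, not host operating system code; the names `QuiesceAction`, `quiesce_action`, `vary_online_on_resume`, and the device numbers in the usage note are hypothetical.

```python
from enum import Enum


class QuiesceAction(Enum):
    """Possible outcomes for one channel path (names are illustrative)."""
    VARY_OFFLINE_AND_MARK = "vary offline, mark in use by CUIR"
    MARK_ONLY = "mark in use by CUIR"
    LEAVE_UNMARKED = "leave unmarked"
    KEEP_LAST_PATH = "keep online (last path to an online device)"


def quiesce_action(online_by_device, is_last_path=False):
    """Choose the action for one affected path.

    online_by_device maps a device number to True when the path is
    online for that device. is_last_path is True when this is the last
    path from a processor image to an online device.
    """
    states = list(online_by_device.values())
    if not any(states):
        # Already offline for all devices: not marked as in use by CUIR.
        return QuiesceAction.LEAVE_UNMARKED
    if not all(states):
        # Already offline for some devices: marked as in use by CUIR.
        return QuiesceAction.MARK_ONLY
    if is_last_path:
        # CUIR does not take the last path to an online device.
        return QuiesceAction.KEEP_LAST_PATH
    return QuiesceAction.VARY_OFFLINE_AND_MARK


def vary_online_on_resume(varied_off_by_operator):
    """On resume, paths the operator varied off stay offline."""
    return not varied_off_by_operator
```

For example, `quiesce_action({"0E12": True, "0E13": False})` returns `QuiesceAction.MARK_ONLY`, because the path is already offline for one of the two devices that use it.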
Service Interface
The 2105 Model 800 provides service interface ports for external connection of a service terminal. IBM or the customer's service provider can perform service on the 2105 using an IBM mobile service terminal (MoST) or equivalent.

Remote Services Support: The 2105 service interface also provides remote service support with call-home capability and directed maintenance for service support representatives. The customer provides an analog telephone line to enable this support. The service interface provides an RS232 connection, through a modem switch and modem, to the analog telephone line. The customer must order a modem and modem switch. The first 2105 Model 800 ordered requires this equipment. The modem and modem switch support up to seven 2105 Model 800s. The cable length from the 2105 Model 800 to the modem switch should be a maximum of 50 feet (15 meters).

The 2105 Model 800 and Expansion Enclosure provide the following service functions:
v Continuous self-monitoring that initiates a call (call home) to service personnel if a failure has occurred. Because service personnel who respond to the call know about the failing component, repair time is reduced.
v Error and problem logs that service personnel can access remotely to analyze potential failures.
v Remote support that allows correction of many types of problems on the ESS. When the ESS reports a problem, service personnel can often create a correction that they can apply from the remote location.

The service support representative logically configures the ESS during installation. After the ESS is installed, the customer can perform additional configuration using the ESS Web interfaces. This includes modifying the remote service functions.
v Point-to-point: Allows direct interconnection of ports.
v Fabric (the underlying structure): To allow multiple nodes to be interconnected, a fabric that provides the necessary switching functions can be used to support communication between multiple nodes. A fabric can be implemented using available vendor products.
v Arbitrated loop: Arbitrated loop is a ring topology that enables the interconnection of a set of nodes. The maximum number of ports for a fibre-channel arbitrated loop is 127.
[Figure: Fibre channel host card LED locations, front view and side view. LED 1 (Green), LED 2 (Yellow), LED 3 (Green), LED 4 (Green), LED 5 (Red)]
Table 1. Fibre Channel Host Card LED Indicators
Green LED 1 Indicator | Yellow LED 2 Indicator | Indicated Condition
Off | Off | Wake-up failure (card failed)
Off | On | Power on Self Test failure (card failed)
Off | Blinking slowly (1 blink per second) | Wake-up failure
Off | Blinking rapidly (4 blinks per second) | Power on Self Test failure
Off | Unsteady blinking (no pattern) | Power on Self Test in progress
On | Off | Failure while operating
On | On | Failure while operating
On | Blinking slowly (1 blink per second) | Normal, inactive
On | Unsteady blinking (no pattern) | Normal, active
On | Blinking rapidly (4 blinks per second) | Normal, busy
Blinking slowly (1 blink per second) | Off | Normal, link down or not yet started (loss of light)
Blinking slowly (1 blink per second) | Blinking slowly (1 blink per second) | Off-line for download
Blinking slowly (1 blink per second) | Blinking rapidly (4 blinks per second) | Restricted off-line mode (waiting for restart)
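As a quick-reference aid, the Table 1 combinations can be written as a lookup, shown here as a minimal Python sketch. The row pairing assumes the three table columns are listed in the same order; the state keywords (`off`, `on`, `slow`, `rapid`, `unsteady`) and the function name `host_card_condition` are hypothetical.

```python
# Keys are (green LED 1 state, yellow LED 2 state).
# "slow" = 1 blink per second, "rapid" = 4 blinks per second,
# "unsteady" = blinking with no pattern.
HOST_CARD_LED_CONDITIONS = {
    ("off", "off"): "Wake-up failure (card failed)",
    ("off", "on"): "Power on Self Test failure (card failed)",
    ("off", "slow"): "Wake-up failure",
    ("off", "rapid"): "Power on Self Test failure",
    ("off", "unsteady"): "Power on Self Test in progress",
    ("on", "off"): "Failure while operating",
    ("on", "on"): "Failure while operating",
    ("on", "slow"): "Normal, inactive",
    ("on", "unsteady"): "Normal, active",
    ("on", "rapid"): "Normal, busy",
    ("slow", "off"): "Normal, link down or not yet started (loss of light)",
    ("slow", "slow"): "Off-line for download",
    ("slow", "rapid"): "Restricted off-line mode (waiting for restart)",
}


def host_card_condition(green, yellow):
    """Return the indicated condition, or None if the pair is not in Table 1."""
    return HOST_CARD_LED_CONDITIONS.get((green.lower(), yellow.lower()))
```

A pair that Table 1 does not list, such as green blinking slowly with yellow on, returns `None` rather than a guessed condition.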
Cluster Indicators
Each cluster is made up of a CEC drawer and an I/O drawer. Each of these drawers has its own power indicator.

CEC Drawer Power Indicator: The CEC drawer power indicator 1 is located on the lower left corner of the front of the CEC drawer.
Table 2. CEC Drawer Power Indicators
CEC Drawer Power Indicator State | Condition Indicated
Off | Power is off
Blinking Slowly | Power off in progress
Blinking Rapidly | Power on in progress
On Steady (not blinking) | Power is on
[Figure: CEC drawer, front view, showing the location of power indicator 1]
I/O Drawer Power Indicator: The I/O drawer power indicator 2 is located on the front of the CEC drawer on the top left corner of the CEC drawer operator panel.
Table 3. I/O Drawer Power Indicators
I/O Drawer Power Indicator State | Condition Indicated
Off | Power is off
Blinking Slowly | Power off in progress
Blinking Rapidly | Power on in progress
On Steady (not blinking) | Power is on
Figure 7. I/O Drawer Power Indicator Location (s009613)
  - LEDs are off when the ESS power is off.
  - LEDs flash rapidly (twice per second) to indicate that the ESS power-on or power-off sequence is in progress.
  - LEDs are on solid when the power-on sequence is complete with no errors.
  - LEDs flash slowly (once per second) to indicate a power fault.
v Cluster 1 and Cluster 2 message LEDs:
  - LED is on when a problem is created that requires a service action.
  - LEDs flash rapidly (twice per second) to indicate that a cluster power-on or power-off sequence is in progress.
v Cluster 1 and Cluster 2 Ready LEDs:
  - LED is off when the cluster is powering on, fenced, or being serviced.
  - LED is on when the cluster is ready for customer use by the host systems.
v Unit Emergency power switch:
  - Causes an immediate ESS power off and may cause customer data loss.
  - Works the same if the ESS is in Local Power Control Mode, Automatic Power Control Mode, or Remote Power Control Mode.
[Figure: ESS operator panel, front view. Labels: Local Power; Ready (Cluster 1, Cluster 2); Power Complete; Line Cord 1; Line Cord 2; Messages (Cluster 1, Cluster 2)]
Figure 9. ESS Operator Panel Switches and Indicators for an Expansion Enclosure (s008026m)
Table 5. Power control without Remote Power Control Feature installed
Local and Automatic Switch Setting | Power Control Modes
Local | Local Power Control Mode: The ESS power is controlled by the ESS operator panel local power switch. Note: If the ESS loses customer power to both line cords, it will need to be manually powered on when customer power is returned to one or both line cords.
Automatic | Automatic Power Control Mode: The ESS power is controlled by the ESS operator panel local power switch. Note: If the ESS loses customer power to both line cords, it will automatically power on when customer power is restored to one or both line cords.
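The only difference between the two switch settings in Table 5 is what happens after customer power is lost on both line cords and then restored. A minimal Python sketch of that rule follows; the function name and the lowercase setting names are hypothetical, and the sketch covers only machines without the Remote Power Control feature.

```python
def powers_on_when_power_returns(switch_setting):
    """Return True if the ESS powers itself on after a loss of both line cords.

    switch_setting is "local" or "automatic", matching the two rows of
    Table 5.
    """
    behavior = {
        "local": False,      # must be powered on manually
        "automatic": True,   # powers on when one or both line cords return
    }
    try:
        return behavior[switch_setting.lower()]
    except KeyError:
        raise ValueError(f"unknown switch setting: {switch_setting!r}")
```

In both settings, normal power on and power off still go through the operator panel local power switch.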
Note: If the ESS will not power off, there may be a hardware problem or pinned data. Your service representative must repair this condition.
Attention: Do not force the power off using the Unit Emergency switch; customer data loss may occur.
Figure 11. DDM Bay Drawer Disk Drive Module Indicators (t007660m)
1 [Figure 11] Ready Indicator
This green indicator shows the following conditions:
Indicator Off
  Both SSA links are inactive because one of the following conditions exists:
  - The DDMs or DDM and bypass card that are logically on each side of, and next to, this DDM are not connected or are missing.
  - The DDMs or DDM and bypass card that are logically on each side of, and next to, this DDM are inactive.
  - An SSA attachment that is in the loop is inactive.
  - A power-on self-test (POST) is running on this DDM.
Indicator Permanently On
  Both SSA links are active, and the DDM is ready to accept commands from the using system. The Ready indicator does not show that the motor of the DDM is spinning. The DDM might be waiting for a Motor Start command, or might have received a Motor Stop command.
Indicator Slowly Blinks (two seconds on, two seconds off)
  Only one SSA link is active.
Indicator Blinks Fast (five times per second)
  The DDM is active with a command in progress.

2 [Figure 11] Check Indicator
This amber indicator shows the following conditions:
Indicator Off
  Normal operating condition.
Indicator Permanently On
  One of the following conditions exists:
  - An unrecoverable error that prevents the normal operation of the SSA link has been detected.
  - The power-on self-tests (POSTs) are running or have failed. The indicator comes on as soon as the DDM is powered on, and goes off when the POSTs are complete. If the indicator remains on for longer than one minute after the DDM is powered on, the POSTs have failed.
  - Neither SSA link is active.
Figure 15. Two DDM Bay Initial External SSA Connections (S008128m)
Figure 16. Two DDM Bay Final External SSA Connections (S008127m)
Special Tools
v SSA screwdriver tool (P/N 32H7059)
v ESCON wrap tool (large), (P/N 5605670)
v ESCON wrap tool (small), (P/N 05N6767)
v Fibre channel (SW2 and LW2) wrap tool (P/N 11P3847)
Repair service terminal connection problem, cannot display the Copyright and Login screen | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Repair service terminal connection problem, cannot display the Main Service Menu screen | MAP 6060: Isolating a Service Terminal Login Failure on page 567

INSTALL
2105 Model 800 Subsystem | Installing and Testing the 2105 Model 800 Unit in chapter 5 of Volume 2
2105 Expansion Enclosure (Physically attached to a 2105 Model 800) | Adding a separate Expansion Enclosure, that was NOT shipped or tested with the existing 2105 Model 800, requires a separate MES.
Relocated 2105 Model 800 Subsystem (Previously installed and relocated) | Installing a Relocated 2105 Model 800 and Expansion Enclosure Subsystem in chapter 5 of Volume 2
DDM Bay (8 Pack) | Adding a DDM bay to an existing 2105 subsystem requires a separate MES.
Host Card | Installing a Host Card in chapter 5 of Volume 2
Master Console | Begin Installation of the Master Console in chapter 5 of Volume 2
ESSNet1 console | MAP 1610: Connecting the Modem and Modem Expander for Remote Support on page 88
Connect the ESSNet to a customer network | Connecting an ESSNet1 or Master Console to a Customer Network in chapter 5 of Volume 2
Safety inspection | Safety Inspection in chapter 12 of Volume 3
Start
Table 7. Entry for All Service Actions (continued)
If you are here to: | Go to:
2105 Expansion Enclosure | Removing a 2105 Expansion Enclosure from an existing 2105 subsystem requires a separate RPQ.
DDM Bay (8 Pack) | Removing a DDM bay from an existing 2105 subsystem requires a separate RPQ.
Host Card | Removing a Host Card in chapter 5 of Volume 2
Relocate 2105 Subsystem | Relocating a 2105 Model 800 Subsystem in chapter 5 of Volume 2
LOGICAL CONFIGURATION / ESS SPECIALIST
Change logical subsystem configuration | If additional configuration needs to be completed, use the ESS Specialist from the ESSNet console.
Customer cannot access the 2105 Model 800 using the ESS Specialist | Go to Analyze and Repair a Service Request section of this table.
Customer cannot access a SCSI LUN | Go to Analyze and Repair a Service Request section of this table.
Customer requests list of WWPNs of installed fibre channel cards | Go to Configuration Options Menu in chapter 8 of Volume 3. Look for WWPN under System Attachment Resource Menu.
Offload ESS Specialist User Files | Offload User Files in chapter 6 of Volume 2

CHANGE COMMUNICATIONS CONFIGURATION
TCP/IP LAN, use only after 2105 initial installation | Changing TCP/IP Configuration in chapter 6 of Volume 2
Enable/Disable ESS Specialist | Configure ESS Specialist in chapter 6 of Volume 2
Regenerate the ESS Specialist Certificate | Regenerate ESS Specialist Certificate in chapter 6 of Volume 2
Repair Multiple DDM Failures | MAP 3149: Repairing Single or Multiple DDM Failures on page 232
E-mail | Configure E-mail in chapter 6 of Volume 2
Serial port / modem | Configure Call Home/Remote Services in chapter 6 of Volume 2
SNMP | Configure SNMP in chapter 6 of Volume 2
Call home/remote reporting options | Configure Call Home/Remote Services in chapter 6 of Volume 2
Configure Copy Services, with DNS | Configure Copy Services, with DNS in chapter 6 of Volume 2
Configure Copy Services, without DNS | Configure Copy Services, without DNS in chapter 6 of Volume 2
Managing Copy Services | Copy Services Server Menu in chapter 6 of Volume 2, refer to the Copy Services Server Menu options there
Enable/Disable Control Unit Initiated Reconfiguration (CUIR) | Change/Show Control Switches in chapter 6 of Volume 2

ANALYZE and REPAIR a SERVICE REQUEST
Prioritize symptoms for repair | MAP 1200: Prioritizing Visual Symptoms and Problems For Repair on page 50
Codes displayed by the CEC drawer operator panel | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
Cluster Ready indicator LED Off | MAP 20A0: Cluster Not Ready on page 117
Display and repair a problem with the service terminal | MAP 1210: Displaying and Repairing a Problem on page 51
Table 7. Entry for All Service Actions (continued)
If you are here to: | Go to:
E-mail reported problem | MAP 1460: Isolating E-Mail Reported Errors on page 66
SCSI-Host system receives command rejects and check condition of internal target failure | MAP 4560: No Valid Subsystem Status Available on page 427
SCSI-Host system detected | MAP 5220: Isolating a SCSI Bus Error on page 541
ESCON-Host system receives FC status, pinned data | MAP 4560: No Valid Subsystem Status Available on page 427
ESCON-host or fiber-host, system link error | MAP 5300: ESCON or Fibre Channel Link Fault on page 548
Display ESCON and Fibre Node Descriptors | MAP 5330: Display ESCON and Fibre Node Descriptors on page 560
Customer reports a loss of line cord input power via e-mail message | This should cause a visual symptom, MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Power on or off problems | MAP 2020: Isolating Power Symptoms on page 112
Modem call home problems | MAP 1300: Isolating Cluster to Modem Communication Problems on page 52
SNMP Notification Problems | MAP 1305: Isolating SNMP Notification Problems on page 56
E-Mail Notification Problems | MAP 1310: Isolating E-Mail Notification Problems on page 58
Visual symptom | MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Power and cooling | MAP 1320: Isolating Problems Using Visual Symptoms on page 60
Cluster boot or down problem | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
Customer LAN connection problem | MAP 4450: ESS Cluster to Customer Network Problem on page 407
Replace a FRU without using a problem | MAP 1480: Replacing a FRU, Without Using a Problem on page 66
Repair a service terminal connection problem to one cluster | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Repair a service terminal connection problem to both clusters | MAP 6060: Isolating a Service Terminal Login Failure on page 567
Customer cannot access a SCSI LUN | Normally this is due to a logical configuration problem or other customer-related problem with the SCSI-based host server. This could be the result of an off-line RAID array on the ESS. This can only occur if there are two problems on the same SSA loop, or two problems on each loop of an adapter pair in an AAL configured machine. Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option. If related problems are not found, call the next level of support.
Customer cannot access a fibre channel LUN | MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs on page 564
Customer cannot access the 2105 Model 800 using the ESS Specialist | MAP 5000: ESS Specialist Cannot Access Cluster on page 540
ESSNet Console Hardware Problem | MAP 1600: ESSNet Console Problem on page 68
ESSNet Console Software Problem | MAP 1600: ESSNet Console Problem on page 68

ESSNet CONSOLE
ESSNet Console Hardware Problem | MAP 1600: ESSNet Console Problem on page 68
ESSNet Console Software Problem | MAP 1600: ESSNet Console Problem on page 68
Table 7. Entry for All Service Actions (continued)
If you are here to: | Go to:
Manage Master Console Entries | Master Console Queue Management in chapter 6 of Volume 2
Create Master Console PE Package | Master Console PE Package in chapter 6 of Volume 2
Offload ESS Specialist User Files | Offload User Files in chapter 6 of Volume 2
Test Master Console Configuration and Communication Status | Test Master Console Configuration and Communication Status in chapter 5 of Volume 2
Boot Sector Problem | MAP 1600: ESSNet Console Problem on page 68

SYSTEM/390 REPAIRS
SIM Generation and Usage | SIM Generation and Usage on page 33
Repair Using a Hardware SIM ID | The SIM ID is the same as the Problem Number in the 2105 Problem. Use this number to begin the repair, go to MAP 1210: Displaying and Repairing a Problem on page 51.
Repair Using an EREP Report | Repair Using an EREP Report on page 34
Repair Using a SIM Console Message | Repair Using a SIM Console Message on page 33
Media SIM Maintenance Procedures | Media SIM Maintenance Procedures on page 37
Decode a Refcode | Decode a Refcode on page 36
Change SIM Reporting Levels | Change SIM Reporting Options (System/390 Only) in chapter 6 of Volume 2

TEST a MACHINE FUNCTION
Cluster | Machine Test Menu in chapter 8 of Volume 3
Host Bay Planners | Machine Test Menu in chapter 8 of Volume 3
Interface Cards | Machine Test Menu in chapter 8 of Volume 3
External connections: SSA loop, LAN, cluster-to-cluster, and initialize modem expander | Machine Test Menu in chapter 8 of Volume 3
SSA Devices, certify | Machine Test Menu in chapter 8 of Volume 3
SSA Loops | Machine Test Menu in chapter 8 of Volume 3
Rack Power Control (RPC) Cards | Machine Test Menu in chapter 8 of Volume 3
CD-ROM Drive | Machine Test Menu in chapter 8 of Volume 3
Diskette Drive | Machine Test Menu in chapter 8 of Volume 3
Send Test Notification: E-mail, SNMP, pager, service | Machine Test Menu in chapter 8 of Volume 3
Show Problem | Machine Test Menu in chapter 8 of Volume 3
Safety inspection | Safety Inspection in chapter 12 of Volume 3

LICENSED INTERNAL CODE (Microcode EC)
Install/Activate LIC Feature | Activate LIC Feature in chapter 8 of Volume 3
LIC Feature Control Record Extraction | LIC Feature Control Record Extraction in chapter 5 of Volume 2
Display LIC Levels and Resource Requirements | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Display LIC Installation Instructions | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Table 7. Entry for All Service Actions (continued)
If you are here to: | Go to:
Copy a LIC Image to LIC Library | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Activate a LIC Image | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3
Copy and Activate a LIC Image | Licensed Internal Code Maintenance Menu in chapter 8 of Volume 3

INFORMATION
Machine overview | Chapter 1: Reference Information on page 1
Service interface | Service Interface on page 13
Locations and FRUs, 2105 Model 800, only | Locations in chapter 7 of Volume 3
Determine ESD procedures | Working with ESD-Sensitive Parts in chapter 4 of Volume 2
Determine standard tools needed | Standard Tools Needed in chapter 4 of Volume 2
CEC drawer operator panel, status codes | MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371
DDM Bay and SSA DASD Drawer indicators and switch | 2105 Models 750 and 800 Disk Storage on page 21
2105 Model 800 maintenance agreement qualification | Safety Check in chapter 4 of Volume 2
EREP Reports
For detailed information about EREP reports, see the Environmental Recording, Editing, and Printing Program User's Guide.
COUNT   FIRST OCCURRENCE      LAST OCCURRENCE
****************************************************************************************************
1       021/99 17:44:27:78    021/99 17:44:27:78
        MODERATE ALERT 2105-800 S/N 0113-10473 REFCODE C211-1060-A00A ID=03
        DASD EXCEPTION ON SSID 0011
        ADDITIONAL ANALYSIS REQUIRED TO DETERMINE REPAIR IMPACT.
        SEE PROBLEM NUMBER 03 FOR DETAILS

        021/99 19:24:19:56    021/99 19:24:19:56
        SERVICE ALERT 2105-800 S/N 0113-30224 REFCODE 4320-0000-5284 ID=06
        MEDIA EXCEPTION ON SSID 00D2, VOLSER 380050 DEV 0E12, 0D
        REFERENCE MEDIA MAINTENANCE PROCEDURE 2

        021/99 19:24:04:67    022/99 03:29:01:65
        SERIOUS ALERT 2105-800 S/N 0113-10473 REFCODE C211-1060-A00A ID=09
        DASD EXCEPTION ON SSID 00D2
        ADDITIONAL ANALYSIS REQUIRED TO DETERMINE REPAIR IMPACT.
        SEE PROBLEM NUMBER 09 FOR DETAILS
To run EREP for the system exception reports:
1. Make a working data set using the following parameters:
   PRINT=NO
   ACC=Y
   ZERO=N
   TYPE=O
   TABSIZE=999K
2. Run EREP against the working data set and print using the following parameters:
   SYSEXN=Y
   HIST
   ACC=N
   TABSIZE=999K
   DEV=(33xx)
SPID SSYS ID TIME JOBNAME RECTYP CP CUA * DNO DEVT CRW-CHP REASON
SNID PSW-MCH /PROG-EC 04 06 08 10 ESW RCYRYXIT 12 14 COMP/MOD CSECTID 16 18 20 22 ERROR-ID VOLUME SEEK SD CT
DATE 052 99 00 12 10 44
N/A
N/A N/A N/A
05104501 FE000100
05104601 FE000100 05104601 FE000100 05104601 FE000100
*****
00 19 22 92 00 27 44 10 00 28 41 75
***** *****
To make a refcode from SIM sense bytes, see Generating a Refcode from Sense Bytes on page 37.
Decode a Refcode
The refcode is a 6-byte field that contains information you can use to locate and repair a 2105 error condition. This section explains how to decode the refcode and find the probable failing FRUs; see Figure 23.

KTGS-CCCC-IIPP

KTGS: ESC (Refcode Bytes 0 and 1): Exception Class, Exception Type, General Symptom, MAP or SIM Symptom
CCCC: LIC Level Identifier (Refcode Bytes 2 and 3)
II: Problem ID (SIM ID) (Refcode Byte 4)
PP: Repair Procedure (Refcode Byte 5)
  - If PP=09, perform the procedure for the problem indicated in Refcode Byte 4.
  - If PP=82, perform Media Maintenance 2.
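The field layout above can be illustrated with a minimal Python sketch that splits a refcode into its four fields. It assumes the refcode is written as 12 hexadecimal digits, with or without hyphens and spaces, and that CCCC occupies refcode bytes 2 and 3. The names `Refcode`, `parse_refcode`, and `repair_hint`, and the sample refcode in the usage note, are hypothetical.

```python
from dataclasses import dataclass

HEX_DIGITS = set("0123456789ABCDEF")


@dataclass(frozen=True)
class Refcode:
    esc: str         # KTGS, refcode bytes 0 and 1
    lic_level: str   # CCCC, LIC level identifier
    problem_id: str  # II, refcode byte 4 (SIM ID)
    procedure: str   # PP, refcode byte 5


def parse_refcode(text):
    """Split a refcode such as 'KTGS-CCCC-IIPP' into its fields."""
    digits = text.replace("-", "").replace(" ", "").upper()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a 6-byte hexadecimal refcode: {text!r}")
    return Refcode(digits[0:4], digits[4:8], digits[8:10], digits[10:12])


def repair_hint(refcode):
    """Apply only the two PP rules that this section states."""
    if refcode.procedure == "09":
        return f"Perform the procedure for problem {refcode.problem_id}"
    if refcode.procedure == "82":
        return "Perform Media Maintenance 2"
    return "PP value not covered by this sketch"
```

With a made-up refcode, `repair_hint(parse_refcode("C211-1060-0309"))` returns `"Perform the procedure for problem 03"`. Any other PP value falls through to the last branch, because this section defines only the 09 and 82 cases.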
2105 SIM Sense Byte Fields: MEDIA SIM
Bytes 00-03 | xxxxxxxx |
Bytes 04-07 | xxxxxFYY | Byte 06 = xF: Needed for SIM sense bytes. YY: SIM ID field
Bytes 08-11 | xxxxxx00 |
Bytes 12-15 | 00SSQMxx |
Bytes 16-19 | xxxxxxxx |
Bytes 20-23 | xxxxKTGS |
Bytes 24-27 | xxxxxxxx |
Bytes 28-31 | FEc.ccchh | Byte 28 = FE: 2105 DASD SIM. Failing cylinder, failing head
refcode: KTGS-0000-SSQM
Use the information in Figure 24 to determine the refcode if the EREP or similar function is not available. See EREP Reports on page 34 for more information. If the record type in the Event History report is ASYNCH, that indicates this record contains SIM sense bytes. If the record type in the Event History report is OBRxxx, the record is a unit check sense and does not contain SIM sense bytes.
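As an illustration of how the Figure 24 layout is used, the following minimal Python sketch builds a media SIM refcode from raw sense bytes. It assumes 32 sense bytes numbered from 0, with SSQM in bytes 13 and 14 and KTGS in bytes 22 and 23, which is one reading of the layout. The function names and the sample bytes are hypothetical.

```python
def has_sim_sense(record_type):
    """ASYNCH records carry SIM sense bytes; OBRxxx records do not."""
    return record_type.upper() == "ASYNCH"


def media_sim_refcode(sense):
    """Build the KTGS-0000-SSQM refcode from 32 media SIM sense bytes."""
    if len(sense) < 32:
        raise ValueError("expected at least 32 sense bytes")
    if sense[6] & 0x0F != 0x0F:
        raise ValueError("byte 06 is not xF: not SIM sense bytes")
    if sense[28] != 0xFE:
        raise ValueError("byte 28 is not FE: not a 2105 DASD SIM")
    ktgs = bytes(sense[22:24]).hex().upper()
    ssqm = bytes(sense[13:15]).hex().upper()
    return f"{ktgs}-0000-{ssqm}"


# Hypothetical sense bytes; every unused byte is left as 00.
sample = bytearray(32)
sample[6] = 0x0F              # byte 06 = xF
sample[7] = 0x06              # YY, the SIM ID
sample[13:15] = b"\x52\x84"   # SSQM
sample[22:24] = b"\x43\x20"   # KTGS
sample[28] = 0xFE             # 2105 DASD SIM
```

Here `media_sim_refcode(sample)` returns `"4320-0000-5284"`. Call `has_sim_sense` first, so that a unit check record is not decoded as a SIM.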
Notes:
1. IODELAY adjusts ICKDSF to run concurrently with customer operations.
2. ANALYZE scans the volume for data that is not readable or not usable.
3. The NOPRESERVE parameter must be specified for the 2105. The PRESERVE parameter is not valid for the 2105. All previous attempts by the subsystem to recover the data have not been successful. Although the track will be returned to a usable state, all customer data on the specified track will be lost when the INSPECT command is run.
Sense Information Key Description:
1 ESC (ESC = 0F0B in this example)
2 Failing track and head address (cccchh)
  v Failing track address (cccc = track 03A4 in this example)
  v Failing head address (hh = head 01 in this example)
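The cccchh field in the key above splits into a four-digit hexadecimal track address and a two-digit hexadecimal head address. A minimal Python sketch of that split follows; the function name `decode_track_head` is hypothetical.

```python
def decode_track_head(cccchh):
    """Split a 6-digit hexadecimal cccchh value into (track, head) integers."""
    if len(cccchh) != 6:
        raise ValueError("expected 6 hexadecimal digits (cccchh)")
    track = int(cccchh[:4], 16)  # cccc
    head = int(cccchh[4:], 16)   # hh
    return track, head
```

For the values in this example, `decode_track_head("03A401")` returns `(932, 1)`, that is, track X'03A4' and head X'01'.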
Isolate
Table 9. MAP 1XXX: General Maintenance Analysis Procedures (continued)
MAP 1XXX Procedures: | Go to:
MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console | Page 87
MAP 1610: Connecting the Modem and Modem Expander for Remote Support | Page 88
MAP 1620: Attaching The ESSNet to a Customer Network | Page 107
MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs | Page 111
Table 10. MAP 2XXX: Power and Cooling Maintenance Analysis Procedures (continued)
MAP 2XXX Procedures: | Go to:
MAP 24A0: PPS Power On Problem | Page 165
MAP 24B0: 2105 Cannot Power Off, Pinned Data | Page 167
MAP 24F0: Both RPC Cards Firmware Down Level | Page 168
MAP 2520: PPS Output Circuit Breaker Tripped | Page 168
MAP 2600: RPC Card Cannot Reset a Power Fault | Page 169
MAP 2700: CEC Drawer Power On Problem | Page 170
MAP 2800: CEC or I/O Drawer Visual Power Supply Problem | Page 171
MAP 2810: Host Bay Drawer Visual Power Supply Problem | Page 174
MAP 3077: Isolating an SSA Link Error Between a DDM and two SSA Device Cards | Page 187
MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards | Page 193
MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay | Page 197
MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay | Page 201
MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card | Page 204
MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card | Page 209
MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays | Page 212
MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays | Page 217
MAP 3120: Isolating an SSA Link Error | Page 220
MAP 3121: Isolating a Degraded SSA Link | Page 223
MAP 3123: Array Repair Required | Page 226
MAP 3124: Isolating Between DDM Hardware and Microcode Failures | Page 227
MAP 3125: Isolating an Unexpected SSA SRN | Page 228
MAP 3126: Isolating an Unexpected SSA Test Result | Page 228
MAP 3127: Formatting of a DDM Has Not Completed | Page 229
Table 11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures (continued)
MAP 3XXX Procedures | Go to
MAP 3128: Isolating an Unknown DDM Failure | Page 229
MAP 3129: Isolating an Array Repair Required Failure | Page 230
MAP 3131: Attempt to Format Array Member | Page 231
MAP 3142: Isolating Multiple DDMs on an SSA Loop Cannot be Accessed | Page 231
MAP 3149: Repairing Single or Multiple DDM Failures | Page 232
MAP 3152: Replacing DDMs Called Out by Enhanced PFA | Page 233
MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault | Page 234
MAP 3180: Controller Card Failed | Page 235
MAP 3190: Wrong Drawer Type Installed | Page 236
MAP 3200: Uninstalled SSA DDMs Connected to Loop A | Page 237
MAP 3210: Uninstalled SSA DDMs Connected to Loop B | Page 238
MAP 3220: Isolating too Few DDMs in a DDM Bay | Page 239
MAP 3300: Repair Alternate Cluster to Run SSA Loop Test | Page 240
MAP 3360: Ending a DASD Service Action | Page 241
MAP 3375: Isolating a Storage Cage Fan/Power Sense Card Error | Page 242
MAP 3378: Isolating a Storage Cage Fan/Power Sense Card Error | Page 245
MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On | Page 246
MAP 3381: Isolating a Storage Cage Fan/Power Sense Card Error | Page 247
MAP 3384: Isolating a Storage Cage Fan Failure | Page 248
MAP 3384: Isolating a Storage Cage Fan Failure | Page 251
MAP 3391: Isolating a Storage Cage Power System Problem | Page 255
MAP 3395: Isolating a DDM Bay Power Problem | Page 261
MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem | Page 263
MAP 3398: Isolating a DDM Bay Controller Card Communications Failure | Page 264
MAP 3400: Replacing a DDM Bay Frame Assembly | Page 266
MAP 3421: Storage Cage Fan/Power Sense Card R2 Cable Problem | Page 266
MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems | Page 268
MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error | Page 270
MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error | Page 272
MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error | Page 273
MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error | Page 275
MAP 3427: Isolating a Storage and DDM Bay Location Error | Page 277
MAP 3428: Isolating a DDM Bay Location Error | Page 279
MAP 3429: Isolating a DDM Location Error | Page 282
MAP 3500: Verifying a DDM Bay Repair | Page 283
MAP 3520: DDM Bay Verification for Possible Problems | Page 284
Table 11. MAP 3XXX: SSA DASD DDM Bay Maintenance Analysis Procedures (continued)
MAP 3XXX Procedures | Go to
MAP 3530: SSA Devices Certify Test Failure | Page 284
MAP 3540: Web Initiated Format Incomplete, User to Restart | Page 285
MAP 3550: Incomplete or Failed Format Process, User to Restart | Page 286
MAP 3560: Unrelated Occurrence, Retry Verification Test | Page 287
MAP 3570: Unrelated Event Caused Resume Fail | Page 288
MAP 3580: DDM, or DDMs, Found in Formatting State During IML | Page 288
MAP 3600: Multiple DDMs Isolated on an SSA Loop | Page 289
MAP 3605: Isolating an Unexpected Result | Page 290
MAP 3610: DDM Installation with New Rank Site Capacity | Page 290
MAP 3612: DDM Installation with Mixed Capacity Rank Site | Page 293
MAP 3614: DDM Installation Introduces Different RPM | Page 296
MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop | Page 298
MAP 3617: DDM Size is Not Supported | Page 298
MAP 3618: Replacement DDM Has Slower RPM Than Called For | Page 299
MAP 3619: This Repair Requires a Larger Capacity DDM | Page 301
MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs | Page 301
MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics | Page 302
MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics | Page 303
MAP 3627: Unable to Determine DDM Use | Page 304
MAP 3640: Other Cluster Fenced - Unable to Verify SSA Loop | Page 305
MAP 3650: Wrong, Missing, or Failing Bypass Card | Page 307
MAP 3652: Wrong, Missing, or Failing Passthrough Card | Page 309
MAP 3654: Bypass Card Jumpers Wrong | Page 311
MAP 3656: 20 MB SSA Cable Installed Where 40 MB Cable Expected | Page 312
MAP 3680: Isolating a Two DDMs Detect Over-Temperature Problem | Page 313
MAP 3685: Isolating a Multiple DDM Detect Over-Temperature Problem | Page 316
Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued)
MAP 4XXX Procedures | Go to
MAP 40A0: Fence Network Isolation | Page 344
MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode | Page 346
MAP 40C0: Special SCSI Bus Problems | Page 347
MAP 40D0: Special SRN Problems | Page 348
MAP 40E0: Only One I/O Drawer Power Supply Detected | Page 349
MAP 4100: Isolating a LIC Process Read/Display Problem | Page 351
MAP 4110: Host Bay Drawer Fan Reporting Failure | Page 351
MAP 4120: Handling Unexpected Resources | Page 352
MAP 4130: Handling a Missing or Failing Resource | Page 353
MAP 4140: Isolating a LIC Activation Process Failure | Page 354
MAP 4150: PPS to RPC Interface Failure | Page 355
MAP 4150: PPS to RPC Interface Failure | Page 355
MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers | Page 357
MAP 4180: RPC to RPC Communication Failure | Page 359
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure | Page 360
MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure | Page 361
MAP 41B0: CPI Interface NVS/IOA Card to Host Bay Failure | Page 361
MAP 41C0: ESC 2770 or 2771, Missing CPI Detected | Page 362
MAP 41D0: CPI Problem for Host Bay Slot Failure | Page 364
MAP 41E0: CPI Failure Needing CPI Cable as FRU | Page 365
MAP 41F0: A Temporary CPI Error was Detected | Page 365
MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging | Page 366
MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel | Page 367
MAP 4350: Isolating Cluster Code Load Counter=2 | Page 370
MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel | Page 371
MAP 4370: Error Displaying Problems Needing Repair | Page 375
MAP 4380: Isolating a Customer LAN Connection Problem | Page 376
MAP 4390: Isolating a Cluster to Cluster Ethernet Problem | Page 377
MAP 43A0: Bootlist Management Using SMS | Page 387
MAP 43A5: Bootlist Management Using SMS for Automatic LIC | Page 392
MAP 43B0: Cluster Dual Hard Drive ESC 1xxx | Page 398
MAP 43C0: Cluster IML from Second Hard Disk Drive | Page 400
MAP 43D0: Duplicate TCP/IP Address Detected for this Cluster | Page 401
MAP 43E0: Service Processor Reset | Page 401
MAP 4400: Displaying Cluster SMS Error Logs | Page 402
MAP 4410: Cluster to Cluster Ethernet Communication Test | Page 403
MAP 4420: Display Cluster Ethernet Network Address | Page 405
MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem | Page 405
MAP 4450: ESS Cluster to Customer Network Problem | Page 407
Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued) MAP 4XXX Procedures: MAP 4460: Cluster NVS Problem MAP 4470: ESC 2768, NVS/IOA Card Problem MAP 4480: Cluster to RPC Cards Communication Problem MAP 4510: Isolating a Cluster to Cluster CPI Communication Failure MAP 4520: Pinned Data and/or Volume Status Unknown MAP 4540: Cluster Minimum Configuration MAP 4550: NVS FRU Replacement MAP 4560: No Valid Subsystem Status Available MAP 45A0: Pinned Data, Special Case MAP 4600: Isolating a CD-ROM Test Failure MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level MAP 4620: Isolating a Diskette Drive Failure MAP 4640: Cluster SP, SPCN, or System Firmware Reload MAP 4670: Cluster Powered Off Unexpectedly MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) MAP 4710: Isolating a DDM LIC Update Problem MAP 4720: Host Bay Fails to Power Off MAP 4730: Cluster Power Off Request Problem MAP 4760: Recovering from Corrupted Files or Functions 6 MAP 47A0: Cluster Fails to Power Off 3 MAP 4820: Isolating a SCSI Card Configuration Timeout MAP 4840: CPI Diagnostic Communication Problem MAP 4850: Repair the Host Bay Drawer MAP 4870: Host Bay Power On Problem MAP 4880: Cluster Power On Problem MAP 4885: SPCN Load Fault Firmware Error Code MAP 4890: Replacing a CEC or I/O Drawer Power Supply MAP 4960: ESC 5500 Isolation MAP 4970: Isolating a Software Problem MAP 4980: Customer Copy Services Problems MAP 4990: LIC Feature License Failure MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA MAP 4A00: Isolating an Automatic LIC Activation Failure MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) Go to: Page 410 Page 411 Page 411 Page 415 Page 417 Page 418 Page 426 Page 427 Page 428 Page 429 Page 430 Page 430 Page 431 Page 431 Page 432 Page 
442 Page 443 Page 446 Page 446 Page 448 Page 449 Page 454 Page 456 Page 457 Page 458 Page 459 Page 461 Page 468 Page 471 Page 471 Page 472 Page 474 Page 476 Page 477 Page 482 Page 482 Page 485 Page 486
Table 12. MAP 4XXX: Cluster Maintenance Analysis Procedures (continued)
MAP 4XXX Procedures | Go to
MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) | Page 488
MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) | Page 491
MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) | Page 493
MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) | Page 495
MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) | Page 497
MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) | Page 499
MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) | Page 501
MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) | Page 503
MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) | Page 504
MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL) | Page 506
MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) | Page 509
MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) | Page 511
MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL) | Page 514
MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) | Page 517
MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) | Page 520
MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) | Page 523
MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) | Page 526
MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) | Page 529
MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) | Page 532
MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) | Page 534
MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL) | Page 537
Table 13. MAP 5XXX: Host Interface Maintenance Analysis Procedures (continued)
MAP 5XXX Procedures | Go to
MAP 5410: Fibre Channel Bit Error Rate Validation | Page 563
MAP 5430: Host Fibre Channel Fails to Recognize ESS LUNs | Page 564
MAP 5440: Fibre Host Card Reports a Loss of Light | Page 566
Description
Use this procedure if there is more than one visual symptom and/or problem needing repair.
Procedure
- Display the details of each problem and then use the table below to prioritize their repair.
Table 15. Prioritizing Repairs

Condition: Visual Symptoms
Description: Visual symptoms should create a related problem that can be displayed in the Repair Menu, Show / Repair Problems Needing Repair option. Repair related problems before using visual symptoms. If there are no related problems, go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60.

Description: The Automatic LIC Activation process stopped or suspended because one or more problems needing repair are listed in the problem log. If a problem exists with ESC=14xx, you must use it first as it will prioritize the repairs and recover the Automatic LIC Activation process.

Description: A single fault may create more than one related problem. The successful repair of one problem will automatically close the other related problems for the same resource.

Condition: Power problems
Description: Power problems can normally be repaired after logic problems because of the fault tolerant power system design.

Condition: Cluster problems
Description: Cluster problems should be repaired before SSA loop or DDM problems. Both clusters must be fault free to verify the repair of an SSA loop or DDM problem.

Condition: SSA loop or DDM problems
Description: Both clusters must be fault free to verify the repair of an SSA loop or DDM problem.

Condition: An SSA loop has two or more problems.
Description: Repair this SSA loop before repairing an SSA loop with only one problem.

Condition: CPI interface problems for a cluster and host bay.
Description: All CPI interface problems needing isolation use the same isolation MAP, so either problem can be used.
Description
A problem was created by a cluster and stored in the problem log. A 2105 Model 800 operator panel Message indicator was turned on to show which cluster reported the problem. The problem may be in the cluster indicated, in the other cluster, or somewhere else in the 2105 Model 800.

If the clusters can communicate with each other, the service terminal can display problems from both clusters while attached to either cluster. If the clusters cannot communicate, error information is displayed that directs you to connect the service terminal to the other cluster. Problems from that cluster can then be displayed. A failing cluster may be able to communicate with the service terminal even when it cannot communicate with the other cluster.

The Message indicator turns off when the service terminal connects to that cluster. If e-mail is enabled, a copy of the problem is sent to the defined customer destinations.

Use the service terminal to display the problem or problems needing repair. Each problem shows the FRUs and/or isolation procedures needed to repair it. The service terminal and service guide work together to guide you through the repair process.
Procedure
Use the following steps to display and repair the problem:
1. Ensure the 2105 Model 800 is powered on.
2. Observe the 2105 Model 800 operator panel Message indicators:
   - If both cluster message indicators are on, connect the service terminal to either cluster.
   - If only one cluster message indicator is on, connect the service terminal to that cluster.
   - If both cluster message indicators are off, connect the service terminal to cluster 1.
3. Look at the service terminal screen.
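The indicator check in step 2 amounts to a small decision table. The sketch below restates it in Python; the function name and the returned strings are illustrative assumptions and do not correspond to any service terminal function.

```python
def cluster_to_connect(cluster1_msg_on: bool, cluster2_msg_on: bool) -> str:
    """Hypothetical helper: pick the cluster for the service terminal (step 2)."""
    if cluster1_msg_on and cluster2_msg_on:
        return "either cluster"   # both Message indicators are on
    if cluster1_msg_on:
        return "cluster 1"        # only cluster 1 reported a problem
    if cluster2_msg_on:
        return "cluster 2"        # only cluster 2 reported a problem
    return "cluster 1"            # both indicators off: default to cluster 1
```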
Problem Isolation Procedures, CHAPTER 3
Description
The cluster is not able to communicate with the modem expander or the modem. This error can occur for the following reasons:
- The modem expander or modem is powered off.
- The modem expander needs to be reset. Powering the modem expander off and on will not reset it. The SET and CLEAR buttons must be used to reset the modem expander. The service terminal configuration screens are used to reload the initialization strings. This can only be done through cluster 1 in the 2105 Model 800. Modem expander port 1 is always cabled to cluster 1 in one of the attached 2105 Model 800s. The other modem expander ports do not have authority to accept the initialization string.
- The modem is hung and needs to be reset. Powering the modem off and on should clear the hang. To ensure the modem is set correctly, use the service terminal configuration screens to reload the initialization strings.
- The cable between the modem expander and modem, or the cluster and modem expander, is disconnected or damaged.
- One or more of the modem configuration settings in the cluster is not configured correctly.
The possible FRUs are:
The service terminal Change / Show Modem Configuration option has two different uses: 1. It displays the modem configuration settings. These can be compared to the values listed on the Communications Resources Work Sheet provided by the customer. 2. It will attempt to initialize the modem expander and then the modem when the Enter key is pressed. This occurs even if none of the displayed values have been updated. This is a pass/fail test. If the test fails, no reason for the failure is indicated. Note: Any problems that were created while the modem was unavailable will still be queued to be sent to the call home destination. If e-mail notification is enabled, these problems will be sent to the customer by e-mail.
Isolation
1. Ensure the modem expander and modem are powered on by observing their ON indicators.
2. Determine if the cluster to modem communication error is still present. Use the following procedure as a cluster to modem communication test. Display the Change / Show Modem Configuration screen. From the service terminal Main Service Menu, select:
   Configurations Options Menu
   Configure Communications Resources Menu
   Configure Call Home / Remote Services Menu
   Change / Show Modem Configuration
   Pressing Enter attempts to initialize the modem expander and modem. If it is not successful, an error message is displayed. The error message does not isolate the type of failure; this is a pass/fail test. For an explanation of Call Home return codes, see Table 16 on page 55.
3. Determine if the test passed or failed:
   - If the test failed (stopped with an error), go to step 4.
   - If the test was successful (completed OK), check that the modem can call the defined remote telephone numbers. From the service terminal Main Service Menu, select:
     Machine Test Menu
     Send Test Notification Menu
     Service Notification (via modem)
     - If the modem call is successful, go to MAP 1500: Ending a Service Action on page 67.
     - If the modem call fails, go to MAP 1301: Isolating Call Home / Remote Services Failure on page 55. For an explanation of Call Home return codes, see Table 16 on page 55.
4. If a problem is found and corrected in any of the following steps, go to step 14 on page 54.
6. 7.
8.
9. 10. 11.
   - If both clusters fail, continue with the next step.
12. Read the note below, then reset the modem expander.
    Note: Resetting the modem expander loads factory default settings. These settings will not work with the 2105 Model 800. The modem expander must be initialized through port 1 after the reset. You must locate the 2105 Model 800 with the cluster 1 that is cabled to modem expander port 1. Ensure that the customer will let you have access to it. The modem expander can attach to up to seven 2105 Model 800s.
    a. Press and hold both the SET and CLEAR buttons.
    b. Release only the CLEAR button.
    c. Release the SET button.
13. Initialize the modem expander. Connect the service terminal to the cluster 1 that is cabled to modem expander port 1. Use the cluster to modem communication test to test and initialize the modem expander.
    - If the test fails, call the next level of support.
    - If the test is successful, continue with the next step.
14. Connect the service terminal to the original cluster that was failing and repeat the cluster to modem communication test.
    - If the test is successful, go to MAP 1500: Ending a Service Action on page 67.
    - If the problem has not been fixed and the cluster to modem communication test still fails, call the next level of support.
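The pass/fail branching described in step 3 of this isolation procedure can be summarized as follows. This is a sketch under stated assumptions: the function name, its parameters, and the returned strings are hypothetical, and only the branch targets (step 4, MAP 1500, MAP 1301) come from the procedure itself.

```python
from typing import Optional


def next_action(test_passed: bool, call_succeeded: Optional[bool] = None) -> str:
    """Hypothetical summary of step 3 of the cluster to modem isolation."""
    if not test_passed:
        # The communication test stopped with an error.
        return "continue isolation at step 4"
    if call_succeeded is None:
        # Test completed OK, but the modem call has not been tried yet.
        return "send Service Notification (via modem)"
    # Test completed OK and the modem call result is known.
    return "MAP 1500" if call_succeeded else "MAP 1301"
```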
Description
- If the ESS is NOT configured to send call home records to the ESSNet, this failure can occur for the following reasons:
  The customer's analog phone line is not functional.
Isolation
1. Verify that the phone number or phone numbers being used are valid and that the customer phone line is functional:
   a. Connect the customer's analog phone line to a phone receiver set.
   b. Call the phone number or phone numbers defined for use by the Configure Call Home / Remote Services Menu. If a modem answers, hang up and reconnect the customer's phone line to the modem. Continue with step 2.
2. Verify that the cabling between the cluster and the modem is functional.
   a. Review MAP 1300: Isolating Cluster to Modem Communication Problems on page 52.
   b. Repair any problems found. If no problem is found, go to step 3.
3. Determine if the protocol for a phone number is correct:
   a. Call the next level of support. Have them confirm that the required PE protocol or RETAIN protocol matches the phone number or phone numbers being used.
   b. If the problem is not resolved, call the next level of support again.
Description
The ESS can send SNMP Trap messages to the customer's LAN when it requires service or to notify the customer of certain information events. The ESS generates Simple Network Management Protocol (SNMP) traps and supports a read-only management information base (MIB) to allow monitoring by the customer's network.

Note: SNMP is required for PPRC status reporting for ESS Copy Services in Open Systems environments.

For SNMP to function, the 2105 must be installed on the Customer LAN with TCP/IP addresses and ethernet cables. The SNMP trap is sent to the TCP/IP addresses of the Trap Destinations that are configured by the CE through the service panels or by the customer through the web interface.

Note: These addresses should be in the dotted TCP/IP address form. You can use the hostname dotted form; however, if the Name Server is down, you will not receive the SNMP trap as desired.
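To illustrate the note about address forms, the following sketch separates dotted TCP/IP addresses from hostnames using Python's standard `ipaddress` module. It is not part of the ESS; the function name and the sample destinations are assumptions made for the example.

```python
import ipaddress


def is_dotted_address(destination: str) -> bool:
    """Return True if a trap destination is a dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(destination)
        return True
    except ValueError:
        # Not a dotted address, so delivery would depend on the Name Server.
        return False


# Hypothetical trap destinations, for illustration only.
for dest in ["192.0.2.10", "nms.example.com"]:
    form = "dotted address" if is_dotted_address(dest) else "hostname (needs Name Server)"
    print(f"{dest}: {form}")
```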
Isolation
1. Determine if the problem is still occurring. From the service terminal Main Service Menu, select:
   Machine Test Menu
   Send Test Notification Menu
   Customer Notification (via SNMP)
   Did the customer receive the SNMP test notification?
   - Yes, exit this MAP. The problem is no longer occurring.
   - No, continue with the next step.
   Notes:
   a. This test procedure will ONLY send a test SNMP trap to all destinations that are configured.
   b. This test will complete with Customer Notification Test Results: Passed. No Problem Detected. This message means that the 2105 sent the SNMP trap messages, not that the customer SNMP trap destinations received the messages.
   c. Have the customer inspect the event log of the associated management system to see if they have received an SNMP trap from the ESS.
2. Verify that SNMP is correctly configured and Enabled. Use your ESS Communication Worksheet and the SNMP Menu options. From the service terminal Main Service Menu, select:
   Configuration Options Menu
   Configure Communications Resources Menu
   SNMP Menu
   Note: The addresses should be in the dotted TCP/IP address form. You can use the hostname dotted form; however, if the Name Server is down, you will not receive the SNMP trap as desired.
3. Use the ping test to verify connectivity to the SNMP trap destination or destinations. From the service terminal Main Service Menu, select:
   Machine Test Menu
   External Connections Menu
   LAN Test
   On the Ping Test for LAN panel, fill in the TCP/IP address or hostname of the Trap Destination. Press Enter to run the test. Was the ping test successful?
   - Yes, call the next level of support. The ping test works, but the SNMP test notification fails.
   - No, continue with the next step.
4. Use the ping test from step 3 to attempt to ping the other cluster in this 2105. Was the ping test successful?
Description
The ESS can send e-mail messages to the customer's LAN when it requires service or to notify the customer of certain information events. For e-mail to function, the 2105 must be installed on the Customer LAN with TCP/IP addresses and ethernet cables. E-mail messages are sent to the addresses that are configured by the CE through the service panels or by the customer through the web interface.

The ESS generates e-mail messages in the following categories:
- Errors
- Information

Examples of Informational types of messages are notifications that:
- A new level of Licensed Internal Code (LIC) has been installed
- New hardware has been installed
- The service provider has run the customer-notification diagnostic test. This test verifies that e-mail messages are being received by those who should receive them.

The ESS sends Error type messages when it detects a situation that requires customer action. Note that the Test e-mail Notification is sent only to e-mail users that are configured for Informational types of messages.

Three basic SMTP environments are supported for e-mail. The default choice is to use the Domain Nameserver, where you specify the domain name on the TCP/IP configuration panel. The other two choices, 1. Smart Host Relay e-mail and 2. Local e-mail, must be selected from the e-mail Menu under Configure Communication Resources Menu.

Microsoft Exchange and Lotus Notes are NOT SMTP environments. To send e-mail to these environments, the customer must have a Message Transfer Agent (MTA) installed and configured. Without this MTA, you will not be able to send e-mail to these types of environments.
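The rule that the Test e-mail Notification reaches only users configured for Informational messages can be pictured as a simple filter. The data layout below (a mapping of address to configured message types), the type labels, and the function name are assumptions for illustration; they do not describe how the ESS stores its e-mail configuration.

```python
def recipients_for_test_email(users: dict[str, set[str]]) -> list[str]:
    """Return the addresses that would receive the test notification.

    Hypothetical model: each user maps to the set of message types
    ("errors", "information") that the user is configured to receive.
    """
    return sorted(addr for addr, types in users.items() if "information" in types)


# Hypothetical configuration, for illustration only.
configured = {
    "ops@example.com": {"errors"},
    "admin@example.com": {"errors", "information"},
}
print(recipients_for_test_email(configured))  # ['admin@example.com']
```

A user configured only for Errors, such as the first entry in this sample, would not see the test message even though e-mail delivery is working.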
Isolation
1. Determine if the problem is still occurring. From the service terminal Main Service Menu, select: Machine Test Menu Send Test Notification Menu Customer Notification (via E-Mail) Notes: a. This test procedure will ONLY send a test message to all destinations that are configured for Information type messages. (For example, if the
Description
Most visual symptoms create a related problem which should be used to start the problem repair. If a related problem was not created, the table below can be used to start the repair.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
- Locate your visual symptom in the following tables, then follow the description and actions.
  2105 Model 800 operator panel, use Table 17
  2105 Model 800 rack, cluster, and storage bay, use Table 18 on page 63
  2105 Model 800 CEC drawer, I/O drawer, and host bay, use Table 19 on page 64
  DDM bay and DDMs, use Table 21 on page 66
Table 17. 2105 Model 800 Operator Panel Visual Symptoms

Visual Symptom: Operator panel cluster Message indicator is on.
Description: A problem has been logged in that cluster. The indicator will go off when a service terminal login to that cluster occurs.
Action: Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to begin the repair.

Visual Symptom: 2105 Model 800 Operator panel cluster Ready indicator is off.
Description: During cluster power on and code load, status codes are displayed on the CEC drawer operator panel. When the code load is complete, the CEC drawer operator panel Ready indicator LED will be lit. The LED is set to off when a cluster is fenced and a problem is created.
Notes:
1. If you have switched off a PPS, there may be a failing operator panel indicator LED. This LED is controlled from the RPC card that is still powered on. If the Ready indicator switches on when the primary power supply is powered on, then there is no LED problem.
2. It is possible for the code to switch off the cluster Ready indicator, even when the cluster is still ready. The cluster will allow a service terminal login. The Repair Menu, End Of Call Status option will show no related problem and the cluster will not be fenced or quiesced. The Ready indicator will return to normal operation when the cluster code is loaded again.
3. If the Repair Menu, End Of Call Status option shows the cluster fenced, but the Repair Menu, Show / Repair Problems Needing Repair option shows no related problem, call the next level of support. There is a code problem; all fencing should create a problem that defines what needs to be repaired. Do not reset the fence condition without first repairing the cluster.
Description during normal operation:
- Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to repair any related cluster problems. If there are none, continue.
- Observe the CEC drawer operator panel. If it is displaying a code, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. If it is not, continue.
- There is no single point of hardware failure that can cause the operator panel Ready indicator to fail. (Behind each cluster Ready indicator are two LEDs, each controlled by a single RPC card.) Call the next level of support.
Visual Symptom: Both primary power supplies (PPS) have all indicators off.
Description: This occurs when both customer line cords lose power, or both PPS input circuit breakers are in the off position.
Action: Ensure PPS input circuit breakers are on and have customer restore line cord power.

Visual Symptom: A code is displayed in the primary power supply (PPS) status display.
Description: The PPS has detected an error condition.
Action: Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.

Visual Symptom: Primary power supply (PPS) indicators
Description: There are five PPS indicators which can be as listed here:
1. UEPO PWR/STBY indicator is lit when customer line voltage input is available to the PPS. A code is displayed in the PPS status display.
2. UEPO Loop CMPLT indicator is lit when customer line voltage input is available to the PPS and the UEPO Switch is in the normal position. A code is displayed in the PPS status display.
3. PPS Good indicator slow blinks in standby mode when the 2105 Model 800 is off. The indicator is on when the 2105 Model 800 is powered on.
4. PPS Fault indicator slow blinks when a fault has been detected. A code is displayed in the PPS status display.
5. On Batt indicator is only lit when customer power to both line cords has been lost. The 2105 Model 800 will complete writing the customer data in cache to DDMs and will then power off within 5 minutes.
Action: Use the other visual symptoms in this table to correct any problems.

Visual Symptom: Primary power supply (PPS) status display and indicators are off. The other PPS has a status display code of 06.
Description: The PPS has no customer line cord power and the PPS to PPS communication is failing.
Action: Read the Attention below before continuing. Ensure the communication cable is connected to PPS connector J3 at both ends. If it is, replace it. The status code 06 will automatically reset when communication is again successful. Go to MAP 2340: PPS Status Code 06 on page 125.
Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. It is not recommended to attempt to straighten the pins as they may easily bend again. Replace the PPS.

Visual Symptom: RPC card indicator is off.
Description: The RPC card indicator LED is off when the 2105 is powered off. The indicator is on when the 2105 is powered on, the RPC is receiving power from the PPS, and there is no RPC error. If an RPC error is detected, the indicator will be switched off, the RPC card will be fenced (removed from use) and a problem will be created.
Action: Use the service terminal to display and repair any related problems. If there are none, observe the primary power supply (PPS) status code display at the front of the rack. If a code is displayed, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127. If a code is not displayed, and the Repair Menu, End Of Call Status option shows the RPC card is fenced, call the next level of support. There is a code problem; all fencing should create a problem that defines the needed repair. Do not reset the fence condition without first repairing the cluster.

Visual Symptom: Primary power supply (PPS) input circuit breaker is tripped.
Description: An over-current condition in the PPS has occurred. The PPS digital status display between the front PPS fans may display a code of 16.
Action: Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
Primary power supply (PPS) output circuit breaker is tripped.
Description: An over-current condition outside the PPS has occurred. The PPS digital status display between the front PPS fans may display a code of 13.
Action: Go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168.

Table 19. 2105 Model 800 CEC, I/O, and Host Bay Visual Symptoms

Visual Symptom: CEC operator panel is blank or stopped with a progress code displayed.
Description: During cluster power on and code load, status codes are displayed, some codes for seconds, others for minutes. An error condition is occurring if a code is displayed for more than 10 minutes. The alternate cluster may have created a problem for the failing cluster. It may have specific problem information or may just report no communication with the failing cluster.
Action: Connect the service terminal to the working cluster, then display and repair any related problems for the failing cluster. If there are no problems, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.

Visual Symptom: CEC drawer power LED (front lower left) is off or blinking.
Description and Action: This is normal if the cluster has been powered off for service. If the cluster is not being serviced, go to MAP 4880: Cluster Power On Problem on page 461.

Visual Symptom: I/O drawer power LED on CEC drawer operator panel (upper left) is off or blinking.
Description and Action: This is normal if the cluster has been powered off for service. If the cluster is not being serviced, go to MAP 4880: Cluster Power On Problem on page 461.

Visual Symptom: CEC drawer or I/O drawer power supply:
- Input power indicator LEDs PWR 1 and 2 are both not on (green), or
- CHK/POWER GOOD indicator LED is off or on solid amber
Description during normal operation:
- PWR 1 and PWR 2 indicator LEDs are both on (green).
- CHK/POWER GOOD indicator LED is on (green).
If the three LEDs are not all normal (on green), continue with the action.
Action: Go to MAP 2800: CEC or I/O Drawer Visual Power Supply Problem on page 171.

Visual Symptom: Host Bay drawer power supply: Input power LED indicator INPUT PRESENT is not on (green), or POWER ON: HA1 and HA2 indicator LEDs are both not on (green)
Description during normal operation: INPUT PRESENT indicator LED is on (green). POWER ON: HA1 and HA2 indicator LEDs are both on (green). If the three LEDs are not all normal (on green), continue with the action.
Action: Go to MAP 2810: Host Bay Drawer Visual Power Supply Problem on page 174.
Visual Symptom:
- Green DDM ready indicator is off, or
- Amber DDM check indicator is on.
For indicator locations, see DDM Bay Disk Drive Module Indicators on page 23.
Description during normal operation:
- Green DDM ready indicator is on, and
- Amber DDM check indicator is off.
Action: Look at all of the above indicators on all of the DDMs in the DDM bay.
- If all of the indicators on all of the DDMs in the DDM bay are off, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
- If the DDM indicators are not as described above, go to MAP 3520: DDM Bay Verification for Possible Problems on page 284.

Visual Symptom: Controller card DDM Check indicator, DDM bay:
- Check indicator is on (amber)
Description during normal operation:
- Check indicator is normally off.
This indicator is off during normal operations. If it is on, the DDM bay controller card has detected a failure in the DDM bay.
Action: Go to MAP 3520: DDM Bay Verification for Possible Problems on page 284.

Visual Symptom: Controller Card CHECK indicator, DDM bay:
- Card Check indicator is on (amber)
Description during normal operation:
- Card Check indicator is normally off.
This indicator is off during normal operations. If it is on, the DDM bay controller card is failing.
Action: Go to MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem on page 263.
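The PPS rows of the visual symptom tables above each end in a MAP reference. As a memory aid only, the routing can be sketched as a small lookup. The symptom key names are hypothetical labels invented for this sketch; only the MAP numbers come from the table rows, and the table itself remains the authority.

```python
# Hypothetical symptom labels (not identifiers used by the 2105) mapped to the
# MAP named in the corresponding PPS row of the visual symptom table.
PPS_SYMPTOM_TO_MAP = {
    "status_code_displayed": "MAP 2350",           # any code in the PPS status display
    "display_off_other_pps_shows_06": "MAP 2340",  # this PPS dark, other PPS shows 06
    "input_breaker_tripped": "MAP 2350",           # display may show code 16
    "output_breaker_tripped": "MAP 2520",          # display may show code 13
}


def next_map_for_pps_symptom(symptom: str) -> str:
    """Return the MAP to go to for one of the PPS visual symptoms above."""
    try:
        return PPS_SYMPTOM_TO_MAP[symptom]
    except KeyError:
        raise ValueError(f"no PPS table row is sketched for symptom {symptom!r}")


print(next_map_for_pps_symptom("output_breaker_tripped"))
```

The symptoms with no MAP reference (for example, both PPS with all indicators off) are deliberately left out, because the table gives a direct action for them instead.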
Description
A problem was created by one of the 2105 clusters. It was stored in the problem log and an e-mail copy of it was sent to the e-mail destination(s). The 2105 operator panel Message indicator for the reporting cluster should be on steady (not blinking). The customer may have given you a copy of the e-mail or may just have told you that an e-mail was sent. Use the service terminal to display and then repair the problem.
Procedure
Use the following to begin the problem repair. If you have a copy of the e-mail problem and this Service Guide, you may be able to plan the service action before arriving at the 2105 Model 800. The problem displays the FRUs and/or the isolation procedures used to determine the FRUs. Go to MAP 1210: Displaying and Repairing a Problem on page 51.
Description
Occasionally you may need to replace a FRU that is not failing and has not generated a problem. The following procedure uses the service terminal functions to replace a FRU for which no problem has been logged.
Procedure
1. Select a FRU for replacement. From the service terminal Main Service Menu, select:
   Repair Menu
     Replace a FRU
       Cluster FRUs
       Host Bay FRUs
       DDM Bay FRUs
       Rack Power Cooling FRUs
       Device Power Cooling FRUs
       Electronics Cage Power Cooling FRUs
   Select the FRU area and press Enter. Select the FRU in the FRU area and press Enter.
2. Follow the service terminal instructions.
Description
Before leaving the customer account, the following actions are needed:
- Ensure that the problem just repaired had its problem closed. If not, use the menu option to close it.
  Note: Closing or cancelling a problem will attempt to return to customer use any fenced or quiesced resources.
- Ensure that any resources associated with the repair have been returned to customer use.
- Ensure that any other resources not available for customer use are associated with problem(s) still needing repair. Plan to repair those problems.
Procedure
1. If the service terminal repair process did not automatically close the problem, then use this step to close it now. Press F3 on the service terminal until the Main Service Menu is displayed, then select: Repair Menu Close a Previously Repaired Problem. Note: Closing or cancelling a problem will attempt to return to customer use any fenced or quiesced resources. If the problem was not fully repaired, the existing problem may be updated or a new problem created. 2. Use the service terminal options listed below to ensure all resources for this repair have been returned to customer use (they will not be listed). Any listed resources are not available for customer use and will still be quiesced or
Problem Isolation Procedures, CHAPTER 3
3. Restart the PC and follow the instructions on the screen. If the PC does not start from the CD on the first try, change the startup sequence in the Configuration/Setup Utility program:
__ a. Turn off the PC, wait a few seconds, then turn the power on again.
__ b. When the Configuration/Setup Utility program prompt appears in the lower left corner of the screen, quickly press F1.
      Note: The Configuration/Setup Utility program prompt will only appear on the screen for a few seconds. You must press F1 quickly.
__ c. Select Start Options from the Configuration/Setup Utility program menu.
__ d. Select Startup Sequence from the Start Options menu.
__ e. Write down the startup sequence shown on the screen. You will need this information later to restore the original startup sequence after the recovery process.
__ f. Change the First Startup Device to the CD-ROM drive.
__ g. Press Esc (escape) until you return to the Configuration/Setup Utility program menu.
__ h. Before you exit from the program, select Save Settings from the Configuration/Setup Utility program menu, then press Enter.
__ i. To exit the Configuration/Setup Utility program, press Esc and follow the instructions on the screen.
4. When asked for Recovery Type, choose Full Recovery.
5. When the recovery is complete, remove the IBM Product Recovery CD and restart the PC.
6. END of MAP. Return to the MAP or procedure that sent you here.
Restoring PC Software - Master Console on IBM PC300 PCs, IBM NetVista PCs, Xseries 205 or Xseries 206 PCs
The personal computer's software is restored to its off-the-shelf state using the Master Console (IBM TotalStorage Master Console) Product Recovery procedure. This procedure uses the Master Console Product Recovery CD-ROM to restore the ESSNet software to the state it was in when it left 2105 Manufacturing.
Note: The Master Console Product Recovery CD-ROM is different for Xseries 206 PCs. If you have an Xseries 206 PC, ensure that you use part number 22R1500.
1. If the Master Console is bootable, make backup copies of the configuration files and any files you created. Any files not backed up will be lost. If you are unable to make backup copies of the configuration, continue with the next step.
   Note: Refer to ESSNet Backup/Restore Configuration Data in chapter 5 of Volume 2.
2. Insert the Master Console Product Recovery CD into the CD-ROM drive.
   Note: If there are multiple copies of the Master Console Product Recovery CD available, use the newest one.
3. Restart the PC and wait for it to boot. If the PC does not start from the CD on the first try, change the startup sequence in the Configuration/Setup Utility program:
__ a. Turn off the PC, wait a few seconds, then turn the power on again.
__ b. When the Configuration/Setup Utility program prompt appears in the lower left corner of the screen, quickly press F1.
      Note: The Configuration/Setup Utility program prompt will only appear on the screen for a few seconds. You must press F1 quickly.
__ c. Select Start Options from the Configuration/Setup Utility program menu.
__ d. Select Startup Sequence from the Start Options menu.
__ e. Write down the startup sequence shown on the screen. You will need this information later to restore the original startup sequence after the recovery process.
__ f. Change the First Startup Device to the CD-ROM drive.
__ g. Press Esc (escape) until you return to the Configuration/Setup Utility program menu.
__ h. Before you exit from the program, select Save Settings from the Configuration/Setup Utility program menu, then press Enter.
__ i. To exit the Configuration/Setup Utility program, press Esc and follow the instructions on the screen.
4. At the EasyRestore screen, click continue to start the recovery process.
5. When the recovery is complete, remove the IBM Product Recovery CD and restart the PC.
6. When the PC restarts, you should see the Master Console Product Recovery Wizard screen. If you do not see this screen, the ESSNet software has not been restored properly and you will need to either repeat this procedure or go to MAP 1600: ESSNet Console Problem on page 68 to repair the PC and reinstall its software.
7. When the Master Console Product Recovery Wizard screen is displayed, continue with one of the following MAPs:
2) Press Enter again.
3) Remove the diskette from the drive and press Enter.
b. If there is no previously backed up configuration, type No and press Enter.
3. At the HARDWARE CONFIGURATION screen (2 of 4), press Enter to continue.
4. At the Welcome to Kudzu screen, press any key to start the hardware detection process (kudzu).
   Note: Steps 5 to 6 on page 75 depend on the physical configuration of the ESSNet PC. One or more steps might be missing, appear in a different order, or display slightly different messages.
5. The following steps build the hardware configuration files for the Linux operating system. The automated process discovers the installed hardware and prompts for configuration options. Use the following table for guidance on the required responses for the type of hardware that is discovered.
   Notes:
   a. The sequence in which the hardware is discovered, the actual hardware, and minor screen details will differ between PC models and Master Console code levels.
   b. Only a subset of the hardware in the table below will be discovered for a particular PC type. That is normal and does not indicate a problem.
Screen Name (Hardware Type): Hardware Added (Video/Graphics Adaptor), for example, ATI / Rage XL
Response Actions: Verify that the Configure button is selected, then press the Enter key.

Screen Name: Monitor Setup
Response Actions:
1. Select the installed monitor from the list.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.

Screen Name: Monitor Probe
Response Actions: Verify that the Yes button is selected, then press Enter.
Notes: This screen may appear in place of the Monitor Setup screen.

Screen Name: Video Memory
Response Actions:
1. The detected size of video memory will be preselected.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.

Screen Name: Clockchip Configuration
Response Actions:
1. No Clockchip Setting (recommended) will be preselected.
2. Use the Tab key to move the cursor to the OK button.
3. Press Enter to continue.

Screen Name: Select Video Modes
Response Actions:
1. Press the Tab key twice to move the cursor to the column labeled 24 bit:.
2. Use the down arrow key to highlight the field [ ] 1024x768.
3. Press the space bar to select the 1024 x 768 resolution.
4. Use the Tab key to move the cursor to the OK button.
5. Press Enter to continue.

Screen Name: Starting X Screen
Response Actions:
1. Verify that the OK button is selected, then press Enter.
2. At the Can You See This Message message, verify that the Yes button is selected, then press Enter.
3. At the Xconfigurator can set up your computer to automatically start X upon booting. Would you like X to start when you reboot? message, verify that the Yes button is selected, then press Enter.

Response Actions: Verify that Configure is selected, then press Enter.

Response Actions: Verify that Configure is selected, then press Enter.

Response Actions:
1. The type of mouse will be preselected.
2. Use the Tab key to move the cursor to the Emulate 3 Buttons? field.
3. Press the space bar to select emulating 3 buttons.
4. Use the Tab key to move the cursor to the OK button.
5. Press Enter to continue.

Response Actions: Verify that the Yes button is selected, then press Enter.
Notes: Screen may not appear on Xseries 205.

Response Actions: Verify that Configure is selected, then press the Enter key.
Notes: Screen may not appear on Xseries 205.

Response Actions: Verify that Configure is selected, then press the Enter key.
Notes: Screen may not appear on Xseries 205.
6. At the TIMEZONE CONFIGURATION screen (3 of 4), press Enter.
7. At the Configure Timezones screen:
   a. Use the Tab key to move the cursor from the [*] Hardware clock set to GMT field to the list displaying the locations.
   b. Use the arrow keys to select your location, for example America/Los_Angeles.
   c. Use the Tab key to move the cursor to the OK button.
   d. Press Enter to continue.
8. At the CONFIGURATION COMPLETE screen (4 of 4):
   - If the video/graphics adapter was configured successfully in step 5 on page 73, type No and press the Enter key.
   - If an error occurred while configuring the video/graphics adapter, type YES and press the Enter key. Then perform MAP 1608 starting with step 4 on page 86.
9. The ESSNet Console PC will now begin booting. Check for the following messages during the PC boot process:
   a. Equinox SST driver loaded: The MSA PCI card has been recognized by the system.
   b. Installed Memory: xxxMByte: The number in xxx should be 192 or higher.
   c. The modem has been secured successfully: The modem is recognized by the system.
   These messages will stay on the PC monitor screen for 30 seconds. Press any key to continue before the 30 seconds times out.
10. The Master Console login screen (blue background with Console logos) indicates the Master Console is ready for use.
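The three boot-message checks in step 9 can be sketched as a small script. This is an illustration only, assuming the boot text has somehow been captured into a string; the guide does not describe any such capture, and the function name and problem wording are hypothetical.

```python
import re

# Message texts and the 192 threshold are taken from step 9 above.
REQUIRED_MESSAGES = (
    "Equinox SST driver loaded",
    "The modem has been secured successfully",
)
MIN_MEMORY_MB = 192


def check_boot_messages(boot_text: str) -> list[str]:
    """Return a list of problems found in the captured boot text (empty if none)."""
    problems = []
    for message in REQUIRED_MESSAGES:
        if message not in boot_text:
            problems.append(f"missing message: {message}")
    # "Installed Memory: xxxMByte" where xxx should be 192 or higher.
    match = re.search(r"Installed Memory:\s*(\d+)\s*MByte", boot_text)
    if match is None:
        problems.append("missing message: Installed Memory")
    elif int(match.group(1)) < MIN_MEMORY_MB:
        problems.append(
            f"installed memory {match.group(1)} MByte is below {MIN_MEMORY_MB}"
        )
    return problems


sample = (
    "Equinox SST driver loaded\n"
    "Installed Memory: 256MByte\n"
    "The modem has been secured successfully\n"
)
print(check_boot_messages(sample))
```

An empty list means all three messages were seen and the memory figure meets the minimum; anything else points at the item to investigate.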
18. Close the ESSNet Toolkit. 19. Continue with Web Browser Setup for NetVista.
Delete all of the remaining icons on the left side of the desktop.
Note:
To move an icon:
   1. Left click and hold on the icon.
   2. Move the mouse/icon.
   3. Release the mouse button to drop the icon.
To delete an icon:
   1. Right click on the icon.
   2. Select Delete from the drop down menu.
2. Reply Yes to the boxes that ask you to confirm deletion of the icons.
3. Right click on the desktop (not over any icon on the desktop). From the drop down menu, select Arrange Icons then Auto Arrange.
4. Continue with Install Netscape Browser for NetVista.
16. At the Begin Logon screen, press the Ctrl + Alt + Delete keys. 17. At the Logon Information window, enter Username: Administrator, password = password. Click OK. 18. Close all of the windows. 19. Remove the Ready-to-Configure Utility Program CD-ROM from the CD-ROM drive. 20. Continue with Setup Netscape Browser for NetVista.
Note: If there are two Startup menu items, select the one that contains the ESSNet Console. 14. In the Exploring-Startup window, click once on the ESSNet Console icon in the right-hand panel, and then press the Delete key on the keyboard.
Note: Do steps 20 to 24 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
   a. Click OK.
   b. Click the Start button.
   c. Click the Shutdown button and restart the ESSNet console.
   d. Go to step 25.
20. Click Next.
21. Click Next to start the network.
22. Ensure the Workgroup radio button is selected, then click Next.
23. Click Finish.
24. Click Yes to reboot the ESSNet Console.
25. Press the Escape (Esc) key to cancel DHCP Load during bootup.
MAP 1607: Changing the Network Configuration (IP address, host name, domain, subnet mask) for ESS and the TotalStorage ESS Master Console
Description
The network configuration is changing for the ESS, the Master Console, or both.
Dependencies: To communicate properly, each ESS cluster's IP address must be registered on the Master Console. On each ESS, the IP address and host name of the attached Master Console must be registered. If an ESS cluster IP address changes, the old IP address must be deleted and the new IP address must be registered on the Master Console. If the Master Console IP address or host name changes, it must be changed on the ESS also.
When changing the network configuration, the general sequence is shown below. Depending on what settings are changed, some steps may be omitted:
1. Change network settings on the ESS, see Changing TCP/IP Configuration in chapter 6 of Volume 2.
2. Change network settings on the Master Console, see Master Console Configuration for Customer Network in chapter 5 of Volume 2.
3. Delete old ESS cluster IP addresses on the Master Console, see Verify Cluster IP Address on the Master Console in chapter 5 of Volume 2.
4. Register/Add new ESS cluster IP addresses on the Master Console, see Verify Cluster IP Address on the Master Console in chapter 5 of Volume 2.
   Note: With this step, the Master Console communicates with that ESS to retrieve additional configuration information, for example, the ESS cluster host name. Make sure that none of the ESS cluster return values is N/A. If N/A is returned, no communication is possible with that ESS cluster. This is an error condition.
   The Refresh button on the Master Console's ESS Configurations panel can be used to query ESS cluster configuration information at any time. If only the subnet or the domain has changed, and the IP address stayed the same, the Refresh button must be pressed. This verifies that the Master Console is communicating correctly with the ESS and also retrieves the latest ESS cluster configuration information.
5. Change the Master Console IP address and host name on the ESS, see Configuring the 2105 Model 800 in chapter 5 of Volume 2.
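The rule that some steps may be omitted depending on what changed can be sketched as a small planner. This is one reading of the dependencies described in MAP 1607, not a tool supplied with the Master Console; the function names, parameter names, and step wording are hypothetical.

```python
STEP_1 = "1. Change network settings on the ESS"
STEP_2 = "2. Change network settings on the Master Console"
STEP_3 = "3. Delete old ESS cluster IP addresses on the Master Console"
STEP_4 = "4. Register new ESS cluster IP addresses on the Master Console"
STEP_5 = "5. Change the Master Console IP address and host name on the ESS"
REFRESH = "Press Refresh on the Master Console ESS Configurations panel"


def plan_network_change(ess_ip_changed=False,
                        ess_subnet_or_domain_changed=False,
                        mc_ip_or_host_changed=False,
                        mc_other_changed=False):
    """Return the MAP 1607 steps that apply, in the documented order."""
    steps = []
    if ess_ip_changed or ess_subnet_or_domain_changed:
        steps.append(STEP_1)
    if mc_ip_or_host_changed or mc_other_changed:
        steps.append(STEP_2)
    if ess_ip_changed:
        # The old address must be deleted and the new one registered.
        steps.extend([STEP_3, STEP_4])
    elif ess_subnet_or_domain_changed:
        # The IP address stayed the same, so a Refresh is required instead.
        steps.append(REFRESH)
    if mc_ip_or_host_changed:
        steps.append(STEP_5)
    return steps


def clusters_without_communication(returned):
    """Return cluster names for which any retrieved value is N/A.

    `returned` is an assumed structure: cluster name -> dict of retrieved values.
    """
    return [name for name, values in returned.items()
            if "N/A" in values.values()]


print(plan_network_change(ess_ip_changed=True, mc_ip_or_host_changed=True))
```

Any cluster reported by `clusters_without_communication` corresponds to the error condition in the Note to step 4 and should be resolved before the change is considered complete.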
MAP 1608: Manually Configuring the Video/Graphics Adapter for the Master Console
Description
The Master Console's personal computer (PC) has a problem.
Note: The IBM TotalStorage ESS Master Console will be referred to as the Master Console in this document.
Manual Configuration of the Video/Graphics Adapter: Normally the automatic hardware configuration, performed in MAP 1605: Master Console Product Recovery Wizard on page 73, configures the video/graphics adapter. This MAP procedure is only required when the automatic configuration fails.
1. Reboot the PC.
2. During the boot process, press the R key when the message Press R to enter the Console system Menu is displayed.
3. The boot process will continue and display the Product Recovery System Configuration Menu.
4. Choose selection 4) RUN XWINDOW CONFIGURATOR TOOL by typing 4, then press the Enter key.
5. At the WELCOME screen, verify that the OK button is selected, then press the Enter key.
6. At the Choose a card screen:
   a. Select your video/graphics adapter, or if not found, select Unlisted Card at the bottom of the list.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
7. The Pick a Server screen may be displayed:
   - If the Pick a Server screen is displayed:
     a. Select XF86_SVGA.
     b. Use the Tab key to move the cursor to the OK button.
     c. Press the Enter key to continue.
   - If the Pick a Server screen is not displayed, continue with the next step.
8. At the Monitor Probe screen, verify that the Yes button is selected, then press the Enter key.
9. At the Screen Configuration screen, select Not to probe.
10. At the Video Memory screen:
   a. Select 4mb.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
11. At the Clockchip Configuration screen:
   a. No Clockchip Setting (recommended) will be preselected.
   b. Use the Tab key to move the cursor to the OK button.
   c. Press the Enter key to continue.
12. The Probe for Clocks screen may be displayed:
   - If the Probe for Clocks screen is displayed, verify that the Probe button is selected, then press the Enter key.
   - If the Probe for Clocks screen is not displayed, continue with the next step.
13. The Clock Probe Failed screen may be displayed:
   - If the Clock Probe Failed screen is displayed, verify that the OK button is selected, then press the Enter key.
   - If the Clock Probe Failed screen is not displayed, continue with the next step.
14. At the Select Video Modes screen:
   a. Press the Tab key once to move the cursor to the column labeled 16 bit:.
   b. Use the down arrow key to highlight the field [ ]1024x768.
   c. Press the space bar to select the 1024 x 768 resolution.
   d. Use the Tab key to move the cursor to the OK button.
   e. Press the Enter key to continue.
15. At the Starting X screen, verify that the OK button is selected, then press the Enter key.
16. At the Can You See This Message message screen, verify that the Yes button is selected, then press the Enter key.
17. At the Xconfigurator can set up your computer to automatically start X upon booting. Would you like X to start when you reboot? message, verify that the Yes button is selected, then press the Enter key.
18. At the Confirm screen, verify that the OK button is selected, then press the Enter key.
19. When prompted, press the Enter key to continue.
20. On the Product Recovery System Configuration Menu screen, choose selection 98) REBOOT SYSTEM by typing 98, then press the Enter key.
21. The ESSNet Console PC will now begin rebooting.
22. If the problem still exists, repeat the procedure and choose a different server in step 7 on page 86.
END of PROCEDURE. Return to the MAP or procedure that sent you here.
MAP 1609: Power Off and Reboot Procedure for the TotalStorage ESS Master Console
Description
The Master Console's personal computer (PC) needs to be powered off or rebooted. To avoid potential system damage that may occur if the PC power switch is used, perform the following steps to power off or reboot the PC.
Power Off the Master Console: Use this procedure to power off the Master Console.
1. Click on the Foot icon located at the bottom left of the screen.
2. Click on Log Out.
3. In the next window, select Halt and then click the Yes button.
4. The screen switches to text mode and displays the power-down process status.
5. Wait until you see the message System halted at the bottom of the screen.
6. It is now safe to turn off the Master Console using the PC's main power switch.
Reboot the Master Console: Use this procedure to reboot the Master Console.
1. Click on the Foot icon located at the bottom left of the screen.
2. Click on Log Out.
3. In the next window, select Reboot and then click the Yes button.
4. The screen switches to text mode and displays the power-down process status.
5. The Master Console PC will begin to automatically reboot.
MAP 1610: Connecting the Modem and Modem Expander for Remote Support
Description
This procedure supports early level modem and ESSNet hardware. It does not support ESSNet II.
Procedure
Attention: The modem and modem expander are installed with the initial 2105 Model 800. They support the initial 2105 Model 800 and the next six 2105 Model 800s via the modem expander. If an eighth 2105 Model 800 is installed, a new modem and modem expander must be installed.
Attention: The 2105 and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this procedure. Follow the ESD procedures in Working With ESD Sensitive Parts in chapter 4 of Volume 2.
1. Verify that the customer has supplied the required analog telephone connection and cables, and that there are two AC service connections available for the modem and modem expander.
   Note: This is an additional AC service requirement for the customer. Do not connect the modem or modem expander to the AC cord required for the service terminal.
2. Locate the Modem Kit, ordered separately but shipped with the 2105 Model 800 Ship Group (Feature Code for US/Canada 2715). Place the modem 1 [Figure 26] and the modem expander (asynchronous port switch) 6 [Figure 26] in an area between the customer supplied AC service and the 2105 Model 800 clusters.
   Note: Feature code 2715 contains remote support switch parts for the modem and ESSNet.
Figure 26. Modem and Modem Expander Attachment Diagram (s009425)
[Diagram labels: Customer AC Service, Modem (Line, DTE, Power), modem expander (Power, Port 2, Port 16), Cluster 1 and Cluster 2 (S3), Telephone Line.]
3. Is the modem you are attaching a Microcom DeskPorte?
   - Yes, continue with the next step.
   - No, go to step 5.
4. Verify that the Modem Configuration Switches [Figure 27] are set correctly, with all switches down.
   Note: To access the switches, use a thin blade screwdriver to lift off the modem nameplate on the left side of the front panel. The two banks of switches are located behind the nameplate.
[Figure 27: Microcom DeskPorte modem, front view (T/D, O/A, ON/OFF).]
5. Verify that the Modem Expander Setup Switches [Figure 28] on the bottom of the expander are set correctly. Switches 1 and 3 should be OFF (0) and all other switches should be ON (1). Note: Setting switches 1 and 3 to OFF (0) sets the modem baud rate at 38.4 kb.
6. Plug the 25-pin end of the data cable 3 [Figure 26] into the DTE connector 14 [Figure 29] on the back of the modem. Plug the 9-pin end of the data cable into Port 16 16 [Figure 30] on the back of the modem expander. Tighten the cable connector retention screws.
7. Plug the RJ-11 telephone cable 2 [Figure 26] into the LINE connector 15 [Figure 29] on the back of the modem. Plug the other end of the cable into the customer's telephone line connector.
8. Plug the modem power adapter 4 [Figure 26] into the POWER connector 13 [Figure 29] on the rear of the modem. Plug the other end of the cable into the customer's AC service outlet.
9. Determine whether the customer supplied AC input voltage for the modem expander is in the 115 V ac range or the 230 V ac range. Set the voltage range switch 18 [Figure 30] on the rear of the modem expander to match the customer's AC voltage:
   - 115 V ac range, push the switch to the left
   - 230 V ac range, push the switch to the right
10. Plug the power cord 5 [Figure 26], supplied with the modem expander, into the power connector 19 [Figure 30] on the rear of the modem expander. Plug the other end of the cable into the customer's AC service outlet.
Figure 29. Modem Rear View (S008410l)
[Diagram labels: Microcom DeskPorte modem rear view (Parallel Port, Power, DTE, Line, Phone); MultiTech MultiModem rear view (PHONE, LINE, EIA RS232C, VOLUME); callouts 13, 14, 15.]
Figure 30. Modem Expander Rear View (S008411l)
[Diagram labels: rear view, ports 1 through 16.]
11. Locate the two null-modem cables (P/N 34L7144, length 15 meters, 50 feet) in the ship group.
12. Determine if this is the first 2105 Model 800 being installed on the modem expander:
   - If this is the first 2105 Model 800 being installed on the modem expander, go to step 13.
   - If this is not the first 2105 Model 800 being installed on the modem expander, go to step 15.
13. Connect cluster 1 9 [Figure 26] to modem expander port 1: Plug the connector labeled CLUSTER S3 of the null-modem cable 10 [Figure 26] into the cluster 1 S3 connector 19 [Figure 31] on the front of cluster 1. Plug the other end of the cable, labeled MODEM EXPANDER, into Port 1 14 [Figure 30] on the rear of the modem expander.
   Attention: For correct modem expander initialization, cluster 1 of the first 2105 subsystem installed must be connected to port 1 of the modem expander. This connection is critical because the modem expander can only be configured through the cluster 1/port 1 connection.
14. Connect cluster 2 8 [Figure 26] to modem expander port 2: Connect the other null-modem cable 7 [Figure 26] to the S3 connector 20 [Figure 31] on the front of cluster 2. Plug the other end of the cable into Port 2 17 [Figure 30] on the rear of the modem expander, then go to step 17 on page 92.
   Attention: Both clusters must be connected to the modem expander for the 2105 service strategy to work.
   Note: After each null-modem cable is connected to the cluster, run each loose cable into the center cable bundle. Ensure that the ferrite cores on the cables are located inside the 2105 frame. This is needed to minimize RFI and to provide a loop to allow the I/O drawer to be moved to the service position. The additional cable length can be stored in the area between the AC input connectors.
15. Connect cluster 1 9 [Figure 26] to the next available modem expander port: Plug the 9-pin connector end of the null-modem cable 10 [Figure 26] into the S3 connector 20 [Figure 31] on the front of cluster 1. Plug the other end of the cable into the lowest numbered port available on the rear of the modem expander [Figure 30].
16. Connect cluster 2 to the next available modem expander port:
[Figure 31: I/O Drawer 1 and I/O Drawer 2, front view, S3 connector callouts 19 and 20.]
18. Power on the modem expander. At the rear of the expander, press the power switch 15 [Figure 30] to On (up).
19. Is the modem you are attaching a MultiTech MultiModem?
   - Yes, continue with the next step.
   - No, go to step 21.
20. Turn on the modem using the on/off switch located on the front panel. When you apply power, the modem performs a diagnostic self-test, indicated by the TM indicator lighting for a few seconds, after which the LCD should light. If this does not happen, check that the power switch is on, the power supply is securely and correctly connected, and the AC outlet voltage is present. If these checks do not work, see Chapter 8 of the Users Guide supplied with the modem, Solving Problems. Go to step 22 on page 93.
21. Power on the modem. At the front of the modem, press the ON/OFF switch 22 [Figure 32]. The light in the switch comes on when the modem is powered on.
   Note: Pressing the ON/OFF switch is the same as unplugging the modem from its AC power source and plugging it back in. Each time you power the modem off then on, it performs its power-up diagnostics. These tests take about 5 seconds and the modem ignores all commands while diagnostics are running. If the TST light is on steady (not blinking) for more than 5 seconds after the test, the modem has detected an error. Repair the problem using the troubleshooting section of the user guide supplied with the modem.
Figure 32. Modem Front Panel Locations (S008412l)
[Diagram labels: front views, including the MultiTech MultiModem (Power); callout 21.]
22. Initialize the modem expander:
   a. Go to the front of the modem expander and locate the CLEAR 23 [Figure 33] and SET 24 switches.
   b. Press and hold the SET and CLEAR switches at the same time.
   c. Release the CLEAR switch, wait one second, and then release the SET switch.
23. Is the modem being installed as part of the initial 2105 installation?
    • Yes, continue with Completing the Installation of the 2105 Model 800 Unit on page 94.
    • No, the 2105 was previously installed without a modem, continue with the next step.
24. Ensure that the Communications Resources Worksheet has been filled in for the Call Home/Remote Services and Modem Configuration fields. Refer to the IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide, form number GC26-7444, for the worksheets, and to Filling in fields on the Communications Resources Worksheet in chapter 6 of Volume 2 for the procedure.
Problem Isolation Procedures, CHAPTER 3
Connecting the 2105 Model 800 to the ESSNet Hub:
Attention: The 2105 and cable in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this procedure. Follow the ESD procedures in Working With ESD Sensitive Parts in chapter 4 of Volume 2.
1. Disconnect the cluster-to-cluster communication Ethernet (RJ45) cable 1 [Figure 34] from both clusters.
   Note: Clusters will communicate across the ESSNet after the ESSNet Ethernet cables are installed.
2. Route one RJ45 cable (PN 18P1896) between the 10/100 Base T (RJ45) connector on each I/O drawer and the Ethernet hub. Ensure that the ferrite cores on the cables are located inside the 2105 frame. This is needed to minimize RFI and to provide a loop that allows the I/O drawer to be moved to the service position. The additional cable length can be stored in the area between the AC input connectors.
3. Connect an Ethernet RJ45 cable to the RJ45 connector on each cluster. Connect the other end of each cable to the recommended port (1X to 15X) on the hub [Figure 35]. Do not connect to either hub port 16MDI-X or 16MDI. See Table 22 for the recommended 2105 Model 800 cluster hub port connection sequence. Label all RJ45 cables.
Table 22. 2105 Model 800 Recommended ESSNet Hub Connection Sequence

2105 Subsystem Being Installed   Cluster 1, Hub Connector   Cluster 2, Hub Connector
1                                1X                         9X
2                                2X                         10X
3                                3X                         11X
4                                4X                         12X
5                                5X                         13X
6                                6X                         14X
7                                7X                         15X
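The pattern in Table 22 is regular: cluster 1 of the Nth subsystem uses hub port NX, and cluster 2 uses the port eight positions higher. The Python sketch below only illustrates that pattern for labeling cables. The function name recommended_hub_ports is hypothetical and is not part of any ESSNet tool.

```python
def recommended_hub_ports(subsystem_number: int) -> tuple[str, str]:
    """Return (cluster 1 port, cluster 2 port) per the Table 22 pattern.

    Hypothetical helper for illustration only.
    """
    # Table 22 lists subsystems 1 through 7; port 8X is used by the
    # ESSNet console and the 16MDI-X / 16MDI ports must not be used.
    if not 1 <= subsystem_number <= 7:
        raise ValueError("Table 22 covers subsystems 1 through 7 only")
    return f"{subsystem_number}X", f"{subsystem_number + 8}X"
```

For example, recommended_hub_ports(3) gives the pair 3X and 11X, which matches the third row of the table.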
[Figure 35 shows the Ethernet hub ports 1X through 15X, 16 MDI-X, and 16 MDI, the Master Console connection, and the customer ports (crossover and non-crossover). Note: See table for recommended plugging of additional 2105 subsystem connections to the Ethernet hub.]
Figure 35. ESSNet Hub Port Connector Locations (S008603p)
4. Connect the service terminal to cluster 1. Use the Repair Menu, Display / Repair Problems Needing Repair option, which displays problems from both clusters. If the cluster-to-cluster communication is not working, it will give an error message for cluster 2. Is there an error message for cluster 2?
   • Yes, go to MAP 4390: Isolating a Cluster to Cluster Ethernet Problem on page 377.
   • No, continue with the next step.
5. Is this the first 2105 Model 800 being installed on this ESSNet?
   • Yes, continue with the next step.
   • No, go to Configuring the 2105 Model 800 in chapter 5 of Volume 2.
6. Continue with Installing and Connecting the ESSNet Console to the ESSNet Hub.

Installing and Connecting the ESSNet Console to the ESSNet Hub:
ESSNet Setup for NetVista:
1. At the Begin Logon screen, press the Ctrl + Alt + Delete keys.
2. At the Logon Information window, enter Username: Administrator, password = password. Click OK.
3. Insert the current ESSNet Console Installation Diskette into the floppy diskette drive.
4. On the desktop, click Start, then Run.
5. Enter a:setupenc.exe, then press Enter.
6. Follow the instructions on the screen by selecting Yes to continue.
7. At the Installshield Self-extracting EXE window, click Yes, then Next, then Yes, then Next, then Finish.
8. After clicking Finish, remove the ESSNet Console Installation Diskette from the a: drive.
9. Close the ESS Network window.
10. On the ESSNet console desktop, double-click the ESSNet Toolkit icon.
11. At the Missing Setup Files... dialog box, click OK.
19. Continue with Web Browser Setup for NetVista.

Web Browser Setup for NetVista:
1. Start the web browser by clicking the Windows Start menu, then choose Programs, ESS Network, ESSNet Console....
2. From the pull-down menu, click Tools->Internet Options. This will bring up the Internet Options panel.
3. Under the General tab, click the Use Current button.
4. Click the Security tab.
5. Click the Internet icon.
6. Click Custom Level. This will bring up the Security Settings panel.
7. Scroll through the Java section and, under Java Permissions, click the Custom radio button.
8. At the bottom of the Security Settings panel, click Java Custom Settings. This will display the Internet panel.
9. Click the Edit Permissions tab.
10. At Run Unsigned Content, click the Enable radio button.
11. At the Internet panel, click OK.
12. At the Security Settings panel, click OK.
13. At the Warning ! screen, click Yes.
14. At the Internet Options panel, click OK.
15. Maximize the web browser window (Home-Microsoft Internet Explorer).
16. Click the ESS Specialist button.
17. After the window changes, under Select a cluster, click (ESS-1 cluster-1).
18. At the Internet Connection Wizard window, click Cancel.
19. At another Internet Connection Wizard window, check Do not show the Internet Connection Wizard in the future, then click Yes.
20. Close the web browser (Specialist-Microsoft Internet Explorer).
21. Continue with Cleanup Desktop for NetVista.

Cleanup Desktop for NetVista:
1. Move (drag and drop) all of the following icons to the right side of the desktop:
   • ESSNet Toolkit
   • Internet Explorer
   • My Computer
   • Network Neighborhood
   • Inbox
   • My Briefcase
   • Recycle bin
   Delete all of the remaining icons on the left side of the desktop.
   m. Adjust the slider to the desired setting under Desktop Area:
      • 15 inch monitors, recommended setting 800 x 600
      • 17 inch monitors, recommended setting 1024 x 768
   n. Click Test.
   o. Click OK on the Testing Mode window and wait to view the test screen.
   p. Click Yes if you saw the bitmap correctly.
   q. Click Apply on the Display Properties window.
   r. Click OK to close the Display Properties window.
   s. Use the buttons at the bottom of the monitor to adjust the screen.
5. Connect the ESSNet console to the Ethernet hub using the remaining RJ45 cable. Connect to hub port (8X) on the hub. Do not connect to either hub port 16MDI-X or 16MDI. See Figure 35 on page 96.

TCP/IP Setup of the ESSNet Console, Personal Computer 300PL Only:
1. Right-click the Network Neighborhood icon.
2. Select Properties.
   Note: Do steps 3 to 12 only if this is the initial setup of TCP/IP on the workstation. If this is not the initial setup of TCP/IP, do the following:
   a. Click the Protocols tab.
   b. Highlight the TCP/IP Protocol.
   c. Click the Properties tab.
   d. Continue with step 13.
3. Click Yes under Network Configuration to install NT Networking.
4. Check the box by Wired to the Network, and click Next.
5. Click Start Search to find the Ethernet adapter.
6. Ensure that the Ethernet adapter is checked and click Next.
7. Ensure that TCP/IP Protocol is checked and click Next.
8. Ensure all Network Services are checked and click Next.
9. Click Next to install selected components.
10. Click Continue to install the drivers from the c: drive.
11. Click Continue to copy some Windows NT files.
12. Click OK when the Ethernet Adapter properties are shown. If a question is asked about DHCP, click NO.
13. Ensure the Ethernet adapter is selected under the Adapter drop-down menu.
14. Select the Specify an IP address radio button.
15. Enter 172.31.1.250 for the IP Address.
16. Enter 255.255.255.0 for the Subnet Mask.
17. Leave Default Gateway blank.
18. Select Apply.
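With the default gateway left blank, the console can only reach addresses on its own subnet. The following Python sketch, using the standard ipaddress module, shows which addresses fall inside the subnet defined by the IP address and subnet mask entered above. The helper name same_subnet and the sample addresses in the usage note are assumptions for illustration and do not come from this guide.

```python
import ipaddress

# Console address and mask as entered in the TCP/IP setup steps.
console = ipaddress.ip_interface("172.31.1.250/255.255.255.0")


def same_subnet(address: str) -> bool:
    """Return True if the address is on the console's subnet.

    Hypothetical helper for illustration only.
    """
    return ipaddress.ip_address(address) in console.network
```

The value of console.network is 172.31.1.0/24, so a hypothetical device at 172.31.1.1 would be reachable without a gateway, while one at 172.31.2.1 would not.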
ESSNet Setup, Personal Computer 300PL Only:
1. Log in as administrator.
2. Insert the ESSNet Console Installation Diskette into the floppy diskette drive.
3. On the desktop, click Start, then Run...
4. Enter a:setupenc.exe.
5. Follow the instructions on the screen.
6. Remove the ESSNet Console Installation Diskette from the a: drive.
7. On the ESSNet console desktop, double-click the ESSNet Toolkit icon.
8. When you get the Missing Setup Files... dialog box, click OK. If you do not get this box, see ESSTOOLKIT NOTES in README.TXT.
9. Click the Install/Configure tab.
10. Click ESSNet Configuration.
11. Click the Subsystem tab.
12. Click Add ESS (2105).
    Note: If the ESSNet is already connected to the customer's network, enter the information from the Communication Resources Worksheets before clicking Save.
13. Click Save.
14. Click OK.
15. Close the ESSNet Toolkit.
16. Close the ESSNetwork window.
Web Browser Setup, Personal Computer 300PL Only:
1. Internet Explorer comes preloaded with Windows NT. If you choose to use a different browser, such as Netscape Communicator, install it now.
   Note: For approved web browsers, see IBM TotalStorage Enterprise Storage Server Introduction and Planning Guide book (GC26-7444) or IBM TotalStorage Enterprise Storage Server Web Users Interface Guide book (SC26-7346).
2. Bring up the web browser by clicking the Windows Start menu and choosing Programs, ESS Network, ESSNet Console.
MAP 1630: Master Console Product Recovery Wizard for Xseries 206 PCs
Description
The ESSNet Console's personal computer (PC) has a problem.
Note: The IBM TotalStorage ESS Master Console will be referred to as the Master Console in this document.
Description
The 2105 Model 800 RPC cards are reporting a Model 100 attachment unit. Attachment of a Model 100 to a 2105 Model 800 is not supported. The problem is caused either by incorrectly set RPC card DIP switches or by a Model 100 that has been connected.
Isolation
1. The 2105 Model 800 does not support connection of a Model 100 attachment rack.
   Note: 2105 Models Exx/Fxx do support the Model 100 attachment.
   Verify that RPC card DIP switch 5 is set to Off for both RPC cards.
Description
Most power symptoms create a related problem which should be used to start the problem repair. If a related problem was not created, the table below can be used to start the repair.
Isolation
Use the table below to find and repair your power symptom:
Table 23. 2105 Model 800 Power Symptoms

Power Symptom: Visual power symptoms.
  Description: A problem is created for most power problems.
  Action: Use the service terminal Repair Menu, Show / Repair Problems Needing Repair option to begin the repair. If no related problems are found, go to MAP 1320: Isolating Problems Using Visual Symptoms on page 60.

Power Symptom: 2105 Model 800 will not power on in local mode.
  Description: If the RPC card switches are set for local mode, the 2105 Model 800 Local power switch should be able to power it on.
  Action: Go to MAP 2400: 2105 Model 800 Local Power On Problems on page 149.

Power Symptom: 2105 Model 800 will not power off in local mode.
  Description: If the RPC card switches are set for local mode, the 2105 Model 800 Local power switch should be able to power it off. If a pinned data condition exists, a problem will have been created and the 2105 Model 800 will not power off until that condition is repaired.
  Action: Go to MAP 2440: Rack 1 Power Off Problem on page 157.

Power Symptom: 2105 Model 800 will not power on in remote mode.
  Description: If the RPC card switches are set for remote mode, a 2105 Model 800 remote system should be able to power it on.
  Action: Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.

Power Symptom: 2105 Model 800 will not power off in remote mode.
  Description: If the RPC card switches are set for remote mode, a 2105 Model 800 remote system should be able to power it off. If a pinned data condition exists, a problem will have been created and the 2105 Model 800 will not power off until that condition is repaired.
  Action: Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.

Power Symptom: 2105 Model 800 will not power on or off in automatic mode.
  Description: If the RPC card switches are set for automatic mode, the 2105 Model 800 should power on the first time line cord power returns after both line cords lost power.
  Action: Go to MAP 2370: Rack 1 Power On Problem, Automatic Mode on page 136.

Power Symptom: 2105 Model 800 UEPO problems.
  Description: The UEPO switch on the operator panel should prevent the 2105 Model 800 power on when in the off position and should allow the 2105 Model 800 power on when in the on position.
  Action: Go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131.

Power Symptom: CEC Drawer will not power on.
  Action: Go to MAP 4880: Cluster Power On Problem on page 461.

Power Symptom: Single Host Bay will not power on.
  Action: Go to MAP 4870: Host Bay Power On Problem on page 459.
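Table 23 is effectively a lookup from symptom to starting MAP. The Python sketch below restates it as a dictionary, which can be handy when scripting a triage checklist. The short symptom keys and the function name starting_map are hypothetical. The MAP numbers are copied from the table as printed.

```python
# Symptom keys are hypothetical shorthand; MAP numbers are from Table 23.
POWER_SYMPTOM_MAPS = {
    "visual": 1320,            # only if no related problem was created
    "local_power_on": 2400,
    "local_power_off": 2440,
    "remote_power_on": 2390,
    "remote_power_off": 2390,  # the table lists MAP 2390 for this row too
    "automatic_mode": 2370,
    "uepo": 2360,
    "cec_drawer_power_on": 4880,
    "host_bay_power_on": 4870,
}


def starting_map(symptom: str) -> int:
    """Return the MAP number Table 23 gives for a power symptom."""
    try:
        return POWER_SYMPTOM_MAPS[symptom]
    except KeyError:
        raise ValueError(f"symptom not listed in Table 23: {symptom}") from None
```

A related problem in the problem log always takes priority over this lookup, as the Description above states.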
Description
The failing drawer power supplies are most likely creating an overcurrent condition on one of the output power busses that both power supplies share.
Isolation
1. Overcurrent affects both power supplies in the drawer due to the shared power busses. Find the drawer power supply that is failing:
   • CEC drawer power supplies, continue with the next step.
   • I/O drawer power supplies, go to step 3.
   • Host Bay drawer power supplies, go to step 4.
2. One of the FRUs in the CEC drawer is drawing too much current and needs to be replaced:
   • Quiesce and power off the cluster. Log in to the cluster not being repaired and use the Repair Menu, Alternate cluster repair options.
   • Unplug or replace one or more of the CEC drawer FRUs, including the power supplies, and then power on and retest until the overcurrent condition no longer occurs.
3. One of the FRUs in the I/O drawer is drawing too much current and needs to be replaced:
   • Quiesce and power off the cluster. Log in to the cluster not being repaired and use the Repair Menu, Alternate cluster repair options.
   • Unplug or replace one or more of the I/O drawer FRUs, including the power supplies, and then power on and retest until the overcurrent condition no longer occurs.
4. A FRU in the host bay drawer is drawing too much current and needs to be replaced. The failure may be in either host bay, or in the host bay drawer power backplane.
   • Quiesce and power off a host bay using the Repair Menu, Replace a FRU option.
   • Unplug or replace one or more of the host bay FRUs, including the power supplies, and then power on and retest until the overcurrent condition no longer occurs.
   • If the overcurrent still occurs, replace the host bay drawer backplane using MAP 4850: Repair the Host Bay Drawer on page 458.
Description
The previous procedure measured more than 0.1 ohm resistance between the ground pin of the mainline power cable and the primary power supply enclosure. Follow these steps to check and repair the ground continuity problem.
Isolation
1. Disconnect the problem mainline power cable 1 [Figure 36] or 2 from the line cord bracket.
2. Prepare the multimeter to measure 0.1 ohm or less resistance. For connector information, refer to Figure 39 on page 117.
   • For the mainline power cable (plug in): Place one lead of the multimeter on the ground pin of the male plug on the mainline power cable.
   • For the mainline power cable (wired): Place one lead of the multimeter on the green and yellow wire at the customer end of the mainline power cable.
3. Place the other lead on the ground pin of the female connector on the mainline power cable.
   • If there is 0.1 ohm or less resistance, the mainline power cable is good but the primary power supply enclosure is not grounded. Perform steps 7 through 9.
   • If there is more than 0.1 ohm resistance, the mainline power cable ground lead is open or has resistance. Perform steps 4 through 6.
4. The ground lead on the primary power supply is open or has resistance. Replace the mainline power cable. Return to Primary Power Supply Removal and Replacement in chapter 4 of Volume 2, and then return here to continue.
5. Insert the female connector on the new mainline power cable into the inlet on the line cord bracket.
6. Return to Checking the Ground Continuity in chapter 5 of Volume 2, to verify that ground continuity now measures 0.1 ohm or less resistance on the replaced cable.
7. The primary power supply enclosure is not grounded. Replace the primary power supply. Go to Primary Power Supply Removal and Replacement in chapter 4 of Volume 2, and then return here to continue.
8. Insert the female connector on the mainline power cable into the inlet on the line cord bracket.
9. Return to Checking the Ground Continuity in chapter 5 of Volume 2, to verify that ground continuity now measures 0.1 ohm or less resistance.
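The branch taken after the cable measurement depends on a single threshold of 0.1 ohm. As a minimal sketch of that decision, assuming the reading is taken across the mainline power cable ground path as described in steps 2 and 3, the logic can be written as follows. The function name and return strings are hypothetical.

```python
GROUND_LIMIT_OHMS = 0.1  # threshold stated in this procedure


def ground_continuity_action(cable_resistance_ohms: float) -> str:
    """Return which steps to perform for a measured cable ground resistance.

    Hypothetical helper for illustration only.
    """
    if cable_resistance_ohms < 0:
        raise ValueError("resistance cannot be negative")
    if cable_resistance_ohms <= GROUND_LIMIT_OHMS:
        # Cable is good, so the enclosure ground is suspect.
        return "steps 7-9: replace the primary power supply"
    # Cable ground lead is open or has resistance.
    return "steps 4-6: replace the mainline power cable"
```

A reading of exactly 0.1 ohm counts as a good cable, because the procedure specifies "0.1 ohm or less".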
Description
The 2105 Model 800 is powered on. The cluster should be powered on and the 2105 Model 800 operator panel cluster Ready indicator should be on.
Isolation
1. The operator panel cluster Ready indicator may be off for one of the following reasons:
   • You have quiesced the cluster as part of a service action. The Ready indicator will be lit when the cluster is successfully resumed. If the problem is resolved, return to the procedure that sent you here.
   • The cluster has been fenced by an error and a problem was created. Display and repair the related problem.
Description
A single host bay drawer power supply is not switching power on or off as expected by the 2105 functional code. There are two possible failures:
• The power supply may be failing so that it is always in the same state, always on or always off.
• The power supply may not be receiving power off or on signals from the RPC card through the cable shared with the other power supply in the same host bay drawer.
Isolation
1. Ensure both RPC card to host bay drawer power supply cables are connected to both host bay drawer power supplies.
2. Repeat the power on or off procedure that sent you here. If the procedure still fails, return here and continue with the next step.
3. Observe the failing host bay drawer power supply input power LED indicators. Are both input indicators off?
   • Yes, continue with the next step.
   • No, go to step 5.
4. The power supply does not see either power input as present. Each power input cable is shared by the six power supplies (CEC drawer, I/O drawer, host bay drawer) in that half of the 2105. If both cables had no power, all the drawers would be powered off. Only one power supply is failing, replace it.
5. The other power supply in the host bay drawer is working as expected by the 2105 functional code. Only one RPC to host bay drawer power supply cable needs to be working for the power supply to operate properly. It is unlikely that both cables are failing. You can replace the failing power supply now, or do the following isolation test:
   a. Unplug one of the RPC to host bay drawer power supply cables from the failing power supply.
   b. Repeat the original operation that failed to power the host bay off or on:
      • If it works, replace the cable that is disconnected.
      • If it fails, reconnect the cable and unplug the other cable from the failing power supply. Repeat the test:
        - If it works, replace the cable that is disconnected.
        - If it fails, replace the power supply.
MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
Each CEC, I/O, and Host Bay drawer power supply has two input power connectors. Each power supply can operate normally with one or both input connectors receiving input power. Input connector 1 on the six CEC, I/O, and host bay drawer power supplies in a cluster share a common input power cable from one PPS. Input connector 2 is supplied by a similar cable from the other PPS.
Notes:
1. Before replacing an input power cable to the CEC drawer, I/O drawer, or host bay power supply, verify that the power supplies are receiving power through the other input power cable. Observe the power supply input LED indicators on the affected power supply.
   • CEC drawer power supply input power indicators are: PWR 1 and PWR 2.
   • I/O drawer power supply input power indicators are: PWR 1 and PWR 2.
   • Host bay power supply input power indicators are: INPUT J11 and INPUT J12.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
Isolation
1. Observe the six power supplies for the CEC, I/O, and host bay drawers for a cluster. How many power supplies have the same power supply input power LED indicator off?
   • One, continue with the next step.
   • Two to Five, go to step 5 on page 121.
   • Six, go to step 6 on page 121.
2. Observe the failing power supply. Is the input power cable plugged into the failing power supply input connector?
   • Yes, go to step 4.
   • No, continue with the next step.
3. Plug the input power cable. Did the power supply input power LED indicator come on?
   • Yes, use the Repair Menu options, Close a Previously Repaired Problem and End Of Call Status, to complete the repair action.
   • No, continue with the next step.
4. Unplug the cable from the failing power supply input connector. Check the connector contacts on the cable and failing power supply.
   • If a problem is found, replace the damaged FRU.
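Step 1 of this isolation branches on how many of the six power supplies show the same input power LED off. The Python sketch below restates only that first branch. The function name is hypothetical, and the step numbers are the ones given in step 1.

```python
def map_2220_next_step(supplies_with_same_led_off: int) -> int:
    """Return the MAP 2220 step to go to after counting LEDs in step 1.

    Hypothetical helper for illustration only.
    """
    n = supplies_with_same_led_off
    if n == 1:
        return 2  # a single supply: continue with the next step
    if 2 <= n <= 5:
        return 5
    if n == 6:
        return 6
    raise ValueError("a cluster has six CEC, I/O, and host bay power supplies")
```

A count of zero raises an error here, because step 1 assumes that at least one input power LED is off.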
Cluster 1 (T1), CEC, I/O, Host Bay drawer power supplies, input 1 connector Cluster 2 (T2), CEC, I/O, Host Bay drawer power supplies, input 2 connector
Cluster 2 (T2), CEC, I/O, Host Bay drawer power supplies, input 1 connector Cluster 1 (T1), CEC, I/O, Host Bay drawer power supplies, input 2 connector
Description
The CEC and I/O drawer power supplies will switch on the amber CHK/PWR-GOOD LED indicator for:
• Missing input power
• Over-current
• Over-voltage
• Under-voltage
The Host Bay power supplies will switch off the HA1 or HA2 power LEDs under control of the RPC or for the following error conditions:
• Missing input power on both inputs
• Over-current
• Over-voltage
• Under-voltage
Notes:
1. Before replacing an input power cable to the host bay drawer power supplies, verify that the power supplies are receiving power through the other input power cable. Observe the power supply INPUT PRESENT LED indicators.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
Isolation
1. Were you sent here for a problem with one of the Host Bay power supplies?
   • Yes, continue with the next step.
   • No, continue to step 7 on page 124.
2. Reference the ESC in the problem log. Use the following table to determine the failing FRU and action:
ESC Failing FRU Action
3. Replace the failing FRU listed in the previous step. Select the FRU for replacement using the problem log. If the FRU is not selectable in the problem log, use: Repair Menu, Replace a FRU option. (The Host bay power supplies are listed under the rsrack1 container.)
   • If the FRU replacement is completed successfully, continue with step 12 on page 124.
   • If the FRU replacement is unsuccessful, replace any remaining FRUs listed in the problem. Use the Repair Menu, Replace a FRU option.
4. Observe the input J11 and J12 LEDs on the failing Host bay power supply. Is one or both of the LEDs off?
   • Yes, go to MAP 2220: Input Power to CEC, I/O, Host Bay Drawer Power Supply Not Detected on page 120.
   • No, continue with the next step.
5. Observe the HA1 and HA2 LEDs on the failing power supply. Is one or both of the LEDs off?
   • Yes, continue to the next step.
   • No, the power supply does not appear to be failing. Perform a dummy replacement of the power supply. Use: Repair Menu, Replace a FRU option. Ensure that the power supply is physically removed and replaced. The Host bay power supplies are listed under the rsrack1 container.
     - If the dummy replacement is successful, continue with step 12 on page 124.
     - If the dummy replacement is unsuccessful, replace the failing power supply or other FRUs listed in the problem. Use: Repair Menu, Replace a FRU option. Continue with step 12 on page 124 when the replacement is completed.
6. Observe the HA1 and HA2 LEDs on the companion Host bay power supply in the same drawer. Is the same LED off on both power supplies?
   • Yes, go to MAP 2030: CEC, I/O, or Host Bay Drawer Overcurrent on page 113.
   • No, replace the failing power supply determined in step 2 on page 122. Use: Repair Menu, Replace a FRU option.
    Note: Call the next level of support before proceeding with the FRU replacement.
12. If you used the Replace a FRU menu option, close the problem that sent you here. Use: Repair Menu, Close a Previously Repaired Problem. Then use: Repair Menu, End of Call Status menu option to complete the service action.
Description
The 2105 Model 800 code has detected an installed unit or feature that is not correct.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The 2105 Model 800 code has detected one of the following:
Description
The PPS status code of 06 is a communication failure between PPS-1 and PPS-2. This communication failure can be caused by two different conditions:
1. A hardware communication fault between PPS 1 and PPS 2. Because PPS 1 and PPS 2 communicate in both directions, the failure could be in either PPS or the communication cable.
2. A mismatch of the PPS identifications. When a PPS is installed in the PPS-2 position, which never has a battery signal cable connection, the PPS identification status code should be a 92. When a PPS is installed in the PPS-1 position, which always has a battery signal cable connection, the PPS identification status code should be a 91. If both PPS have the same identification status code, they will display an 06 status code.
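The identification rule in condition 2 can be summarized as a small check. This Python sketch assumes you have read the identification status code from each PPS display. The names EXPECTED_PPS_ID and pps_id_findings are hypothetical and are not 2105 code.

```python
# Expected identification status codes by position, from the Description.
EXPECTED_PPS_ID = {"PPS-1": 91, "PPS-2": 92}


def pps_id_findings(pps1_id: int, pps2_id: int) -> list[str]:
    """List identification problems that could explain status code 06.

    Hypothetical helper for illustration only.
    """
    findings = []
    if pps1_id == pps2_id:
        findings.append("both PPS report the same ID; status code 06 expected")
    if pps1_id != EXPECTED_PPS_ID["PPS-1"]:
        findings.append("PPS-1 ID is not 91")
    if pps2_id != EXPECTED_PPS_ID["PPS-2"]:
        findings.append("PPS-2 ID is not 92")
    return findings
```

An empty result means the identifications look correct, which points back to condition 1, a hardware fault in either PPS or the communication cable.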
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Verify that the PPS to PPS cable is properly plugged into the J3 connector on PPS 1 and PPS 2. Is the cable connected correctly?
   • Yes, continue with the next step.
   • No, read the Attention below before continuing. Connect the cable and then press the 2105 Model 800 operator panel Local power on switch momentarily to on (up).
     - If the status code 06 is no longer displayed, go to MAP 1500: Ending a Service Action on page 67.
     - If the status code 06 is still displayed, continue with the next step.
   Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. It is not recommended to attempt to straighten the pins as they may easily bend again. Replace the PPS.
2. Ensure both PPS have the same code level.
Description
The PPS Status display is normally off. If a power fault is detected, the status display will display a two digit code. If more than one fault is present, the first status code will display followed by the next codes. If a status code is displayed, the operator panel Line Cord indicator for this PPS should be blinking slowly. Pressing the 2105 Model 800 operator panel Local power switch momentarily to the on position will display the PPS code level, the PPS I.D. and any status codes that are active.
Isolation
1. Observe the operator panel Line Cord LED indicator for the failing PPS. Find the condition that applies:
   • On solid, the failing condition is no longer present. Return to the procedure that sent you here or call the next level of support.
   • Blinking slowly, the failing condition is still present and a status code should be displayed. Continue at the next step.
Description
The 2105 Model 800 operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the PPS dc output. The logic voltage for the PPS internal logic, RPC card and operator panel are not switched off. To switch off all logic voltage, the PPS input circuit breaker must be switched to off. Note: Each PPS supplies the other PPS with logic voltage only for the PPS internal logic through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.
Isolation
The 2105 Model 800 will be powered off during this isolation. Ensure it is not in use by the customer. This isolation does a complete checkout of the UEPO functions.
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The 2105 Model 800 should be in local power control mode for this MAP. Verify that the local/remote switch is set to local (down) on both RPC switch cards (Local Switch Cards or Remote Switch Cards, depending on which feature is installed).
2. Power off the 2105 Model 800.
3. Ensure the input circuit breaker for each PPS is set to on (up).
4. Ensure that the 2105 Model 800 operator panel Unit Emergency switch is set to on (up).
Description
The 2105 Model 800 operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the primary power supply (PPS) 395 V dc output. The logic voltage for the PPS internal logic, RPC card, and operator panel are not switched off. To switch off all logic voltage, the PPS input circuit breaker must be switched to off. Note: Each PPS supplies the other PPS with logic voltage only for the PPS internal logic through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.
Isolation
Attention: If you are performing the following steps concurrently with customer operation, you must take care to ensure that you always remove the correct connectors. Failure to do so could result in a complete subsystem power drop.
1. The 2105 Model 800 should be in local power control mode for this MAP. Verify that the RPC card local/remote switch for each RPC card is set to local (down). If they are set to remote (up), set them to the down position. When the repair is complete, set them back to their original position.
2. Is status code 14 or 08 displayed on both PPSs in this rack?
   Note: The status code can appear continuously or flashing in combination with other codes, for example 10-12-14.
   • Yes, this condition is not expected. Call your next level of support.
   • No, continue with the next step.
3. Ensure the UEPO cable is plugged into PPS connector J6 and operator panel UEPO card connectors J1 or J2. Did you find a problem?
   • Yes, go to Step 7 on page 136.
Figure 42. Rack Operator Panel Locations (s009714)
Rear View
   • Yes, the cause may be a problem on the Customer Remote UEPO circuit. Do not continue. Contact your next level of support for advice.
   • No, ensure that the UEPO Local/Remote switch is in the Local position (callout 1). If you found a problem, go to Step 7 on page 136. If not, continue with the next step.
6. The PPS to UEPO card cable or the operator panel UEPO card is failing. Perform the following actions to determine the cause.
   a. Carefully trace the cable back from the PPS to the UEPO panel to identify the connector (P1 or P2) into which the cable plugs.
   b. Unplug the connector from the UEPO panel.
   c. Use a CE meter to measure the resistance between the pins on the UEPO panel connector.
   Was the resistance measured in step c less than 1 ohm?
   • Yes, replace the cable from the PPS to the UEPO panel, then continue with the next step.
   • No, read the notes below, replace the UEPO panel, and then continue with the next step.
   Notes:
   a. If the UEPO panel needs to be changed concurrently, contact your next level of support for a procedure.
Problem Isolation Procedures, CHAPTER 3
Description
The 2105 Model 800 power can be controlled in three modes:
• Local Power Control Mode: This mode is available with or without the Remote power feature, that is, with RPC DIP switch 3 set to off (left) or on (right). The operator panel Local power switch controls power on and power off. For local power control, the RPC remote switch card Local/Remote or Local/Automatic switch is set to the Local position (down).
• Automatic Power Mode: This mode is only available when the Remote power feature is not installed. RPC DIP switch 3 will be set to off (left). Loss of power to both line cords causes a power off after the 2105 Model 800 has de-staged customer data using the batteries for up to five minutes. When one or both line cords have power again, a power on automatically occurs. The automatic power on will only occur once after each power loss to both line cords. The operator panel Local power switch can also control power on and off. For automatic power control, the RPC card Local/Automatic switch is set to the Automatic position (up) and RPC card switch 3 is Off (left).
• Remote Power Control Mode: This mode is only available when the Remote power feature is installed. RPC DIP switch 3 will be set to on (right). With line cord power present, a remote control power cable from a host system controls power on and power off. The operator panel Local power switch cannot control a power off or on. If the operator panel needs to be used to control power, switch the Local/Remote switch to the Local position (down). For remote power control, the RPC card Local/Remote switch is set to the Remote position (up) and RPC card switch 3 is On (right). Only one host system is required to power on the 2105 Model 800, even if remote power control cables from other host systems that are powered off are connected. A single system cannot power off the 2105 Model 800 unless all the host systems with remote power control cables attached are powered off.
Note: DIP switch 3 is set to off if Remote Power Control feature is not installed. DIP switch 3 is set to on if Remote Power Control feature is installed.
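The relationship between the two switch settings and the resulting power control mode can be summarized in a small lookup. The following Python sketch is illustrative only: the function and parameter names are hypothetical and are not part of any 2105 microcode or service tool. It assumes the settings described above (switch down means Local; switch up means Automatic when DIP switch 3 is off and Remote when DIP switch 3 is on).

```python
from enum import Enum


class PowerMode(Enum):
    LOCAL = "local"
    AUTOMATIC = "automatic"
    REMOTE = "remote"


def power_mode(switch_up: bool, dip3_on: bool) -> PowerMode:
    """Return the power control mode implied by the switch settings.

    switch_up: True if the Local/Remote (or Local/Automatic) switch is up.
    dip3_on:   True if RPC DIP switch 3 is on (right), which indicates
               that the Remote power feature is installed.
    Both names are hypothetical and used only for this illustration.
    """
    if not switch_up:
        # Switch down selects local control, with or without the feature.
        return PowerMode.LOCAL
    # Switch up: the feature setting decides between Remote and Automatic.
    return PowerMode.REMOTE if dip3_on else PowerMode.AUTOMATIC
```

For example, `power_mode(switch_up=True, dip3_on=False)` returns `PowerMode.AUTOMATIC`, which matches the Automatic Power Mode description.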
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Use the service terminal Repair Menu, Display / Repair Problems Needing Repair option to repair any related power problems before continuing.
2. You must take the 2105 Model 800 away from the customer before continuing with this procedure.
3. Use the 2105 Model 800 operator panel Local power switch to power off.
4. Ensure the RPC Interconnect Cable is connected.
5. Ensure the RPC card to CEC, I/O, and host bay drawer cables are connected.
6. Ensure the switches on each RPC card are set for automatic mode per the table above.
7. Set the input MAIN LINE circuit breaker (CB00) to off (down) on PPS 1 and PPS 2.
8. Set the PPS 1 input CB to on (up). Did the 2105 Model 800 power on?
   • Yes, do the following steps:
     a. Power the 2105 Model 800 off.
     b. Set the PPS 1 input CB to off.
     c. Set the PPS 2 input CB to on.
     When the 2105 Model 800 powers on, return to the procedure that sent you here or go to MAP 1500: Ending a Service Action on page 67.
   • No, continue with the next step.
9. Set the PPS 1 input CB to off.
10. Set the PPS 2 input CB to on.
Description
The 2105 Expansion Enclosure operator panel UEPO (Unit Emergency Power Off) switch is used to switch off the PPS dc output in the 2105 Expansion Enclosure only. The logic voltages for the PPS internal logic, RPC card, and operator panel are not switched off. To switch off all logic voltages, the PPS input circuit breaker must be switched to off.
Note: Each PPS supplies the other PPS with logic voltage, for the PPS internal logic only, through the PPS to PPS communication cable. This occurs if the PPS input circuit breaker is on and customer line cord power is present.
The 2105 Expansion Enclosure operator panel UEPO switch only powers off the 2105 Expansion Enclosure, not the 2105 Model 800. The 2105 Expansion Enclosure is powered on using the 2105 Model 800 operator panel local/remote power control switch. The PPS UEPO PWR indicator is on when the PPS has customer input power, the input circuit breaker is on, and the PPS internal logic is providing UEPO logic voltage.
Figure: PPS indicators (rear view): UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY
Isolation
The 2105 Expansion Enclosure and 2105 Model 800 will be powered off during this isolation. Ensure that neither is in use by the customer. This isolation does a complete checkout of the UEPO functions.
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The 2105 Model 800 should be in local power control mode for this MAP. Ensure the local/remote switch is set to local (down) on both RPC switch cards (Local Switch Cards or Remote Switch Cards, depending on which feature is installed). If the switches are set to remote (up), set them to the down position. When the repair is complete, set them back to their original position.
2. Power off the 2105 Model 800, which also powers off the 2105 Expansion Enclosure.
3. Ensure the input circuit breaker for each 2105 Expansion Enclosure PPS is set to on (up).
4. Ensure that the 2105 Expansion Enclosure operator panel Unit Emergency switch is set to on (up).
Description
The 2105 Model 800 power can be controlled in three modes:
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. This procedure requires that the 2105 Model 800 be taken away from customer use so it can be powered off and on. Verify that all customer activity is stopped before going to the next step.
2. Do the following checks:
   • Verify that the RPC card switch 3 is set to remote, On (right).
   • Verify that the RPC remote switch card Local/Remote switch is set to remote, On (up).
   • Verify that the host system(s) remote power control cables are properly connected to the remote power control card in the tailgate and also at each host system.
   • Verify that the cable from the remote power control card, in the tailgate, to the RPC remote switch cards is properly connected to remote power control card connector J1 and both RPC remote switch cards.
   Are the cables and switches correct?
   • Yes, continue with the next step.
   • No, correct the problem and attempt to power on from the host system(s). If it still fails, return to the beginning of this MAP.
3. Set the 2105 Model 800 to local power control mode.
Description
To power off the 2105 Expansion Enclosure, the following must occur:
1. The 2105 Model 800 (Rack 1) RPC cards must receive a power off request. This signal is from the 2105 Model 800 operator panel, if in Local mode, or from the remote power control card, if in Remote mode.
2. Each RPC card sends a power signal to the primary power supply (PPS) in the expansion enclosure that it is directly cabled to.
The 2105 Expansion Enclosure does not prevent the 2105 Model 800 (rack 1) from powering off. If the 2105 Model 800 will not power off, exit this MAP and begin at MAP 2440: Rack 1 Power Off Problem on page 157.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. You are here because the 2105 Expansion Enclosure will not power off. Does the 2105 Model 800 power off and the expansion enclosure remain powered on?
   • Yes, continue with the next step.
   • No, the 2105 Model 800 must be able to power off by itself. Exit this MAP and go to MAP 2440: Rack 1 Power Off Problem on page 157.
2. Verify that the control cables from the primary power supply (PPS) to the RPC are connected correctly. Check the J4 connector on the expansion enclosure primary power supply and the J2 slot 5 connector on the 2105 Model 800 RPC card. Reference the 2105 Expansion Enclosure procedure in the install Chapter 5 for more information and diagrams if needed.
   • If a problem is found and repaired, retry the operation that sent you here.
   • If the problem is not fixed, continue with the next step.
3. With the 2105 Model 800 powered on and ready, connect the service terminal. Use the Repair Menu, Show / Repair Problems Needing Repair option to repair any related power problems for the RPC in rack 1 or the PPS in rack 2.
   • If a problem is found and repaired, retry the operation that sent you here.
   • If the problem is not fixed, continue with the next step.
4. Check the operation of the 2105 Expansion Enclosure PPS connections to the 2105 Model 800 RPC cards. Momentarily press the operator panel Local Power switch to on (up). Observe both PPS status displays in the expansion enclosure. They should both display the PPS code level with the repeated sequence 00-xx-yy (xx = code level, yy = PPS I.D.). Do both PPSs display the code level sequence?
   • Yes, each RPC is correctly cabled to its PPS in the expansion enclosure. Continue with the next step.
   • No, go to step 6 on page 146.
Figure: PPS indicators (rear view): UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY
5. The 2105 Model 800 will only power off if both of its PPSs power off. The PWR GOOD indicator on the PPS should be blinking slowly when the PPS is powered off and in standby mode. Standby mode is when the main output voltages are off, but the PPS internal logic voltages and line cord input voltages are still on. Press the 2105 Model 800 operator panel Local Power switch momentarily to off (down). Wait up to five minutes for the PPS PWR GOOD indicators to blink slowly (this indicates powered off to standby mode). Find the condition below that you have:
   • Both PPS PWR GOOD indicators are blinking slowly. The 2105 Model 800 powered off successfully. Return to the procedure that sent you here, or go to MAP 1500: Ending a Service Action on page 67.
   • Both PPS PWR GOOD indicators are on solid. Continue with the next step.
   • One PPS PWR GOOD indicator is on solid and the other is blinking slowly. One PPS powered off and the other did not. Do the following:
     a. Momentarily press the operator panel Local Power switch to on. This will cause both PPSs to be powered on again. Wait until both PPS PWR GOOD indicators are on solid. This allows the working PPS power system to keep the 2105 Model 800 powered on while the possible failing FRUs are replaced.
     b. Replace the following FRUs until both PPSs power off from the operator panel Local Power switch.
Description
When the 2105 is powered on and Ready, the microcode monitors various power boundaries such as the clusters and primary power supplies. If the microcode senses that a power boundary has powered off and on several times in a short period of time, it creates a problem for that boundary. The power cycles can be caused by the customer, the service representative, or a microcode recovery action. The problem is reported for conditions that the customer and service representative may not be aware of.
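The kind of check described above can be pictured as a sliding-window counter. The following Python sketch is only an illustration of that idea and is not the 2105 microcode. The class name, the limit of 3 cycles, and the 600-second window are all hypothetical values chosen for the example; this guide does not state the actual thresholds.

```python
from collections import deque


class PowerCycleMonitor:
    """Count power cycles of one power boundary inside a sliding time window.

    max_cycles and window_seconds are hypothetical thresholds; the real
    values used by the microcode are not given in this guide.
    """

    def __init__(self, max_cycles: int = 3, window_seconds: float = 600.0):
        self.max_cycles = max_cycles
        self.window_seconds = window_seconds
        self._events = deque()

    def record_cycle(self, timestamp: float) -> bool:
        """Record one power off/on cycle; return True if a problem should be logged."""
        self._events.append(timestamp)
        # Drop cycles that are older than the window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.max_cycles
```

With these assumed thresholds, three cycles recorded within ten minutes would cause a problem to be reported, while three cycles spread over several hours would not.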
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. The problem details list one or more FRUs. These are FRUs that, with special microcode recovery conditions, could cause a power boundary within the 2105 to power off and on repeatedly. The power off and on could also have occurred due to special service actions being performed by the service representative or customer line cord power being powered off and on repeatedly.
Description
The 390 V battery set is connected to primary power supply 1 (PPS-1) and PPS-1 is connected to RPC-1. RPC-1 reports that the 390 V battery set is attached. Primary power supply 2 (PPS-2) is connected only to RPC-2, not to the 390 V battery set, and does not report battery attachment. With the problem calling this MAP, RPC-2 is falsely reporting that a 390 V battery set is attached to PPS-2. One of the following problems can cause this: v The cable from PPS-1 is connected to RPC-2 (for example, both PPS to RPC cables are cross connected). v The RPC card DIP switch address settings are not correct.
Isolation
Notes: 1. This problem is usually caused by the rack 1 or rack 2 PPS-1 being connected to RPC-2 (instead of RPC-1), or the RPC card DIP switches set incorrectly. 2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. Find the ESCs from the related problem or logs to determine the action:
   ESC                                          Action
   Single problem with ESC=8462 and expansion   Go to MAP 2450: Crossed RPC Cables to
   rack installed within the last 4 weeks.      Expansion Rack on page 160
   All other conditions                         Continue with the next step.
2. Verify that the following 2105 (rack 1) cables are connected correctly: v RPC-1 card (top RPC card) connector J2 port 6 is cabled to rack 1 primary power supply 1 (PPS-1) connector J4. v RPC-2 card (bottom RPC card) connector J2 port 6 is cabled to rack 1 primary power supply 2 (PPS-2) connector J4. After the cables are verified, find the condition that applies: v If both cables are connected correctly, continue with the next step. v If both cables are not connected correctly, the 2105 will have to be powered off to correct the problem.
Figure: RPC card connector J2 (ports J2-5 and J2-6) and PPS-1 and PPS-2 connector J4 locations (rear view)
3. If there is an expansion rack installed, verify that the following cables are connected correctly:
   • RPC-1 card (top RPC card) connector J2 port 5 is cabled to rack 2 primary power supply 1 (PPS-1) connector J4.
   • RPC-2 card (bottom RPC card) connector J2 port 5 is cabled to rack 2 primary power supply 2 (PPS-2) connector J4.
   Find the condition that applies:
   • If both cables are connected correctly, continue with the next step.
   • If both cables are not connected correctly, go to MAP 2450: Crossed RPC Cables to Expansion Rack on page 160.
4. Verify that address switches 1 and 2 are set correctly on both RPC cards:
   • Cluster 1, RPC-1 (callout 6) [Figure 46] or Cluster 2, RPC-2 (callout 7)
     Switch 1:
     - RPC 1 = On (switch to right)
     - RPC 2 = Off (switch to left)
     Switch 2:
     - RPC 1 = Off (switch to left)
Figure: RPC-1 card and RPC-2 card locations (rear view)
After the switches are verified, find the condition that applies: v If the switches on both RPC cards are set correctly, call the next level of support for engineering assistance. v If the switches on both RPC cards are not set correctly, the 2105 will have to be powered off to correct the problem. v If the switches on only one RPC card are not set correctly, use the RPC card FRU replacement procedure to power off the RPC card before correcting switch settings. Use the service login Main Service Menu, Repair Menu, Replace a FRU menu, (Container = rsrack1).
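The cable checks in this MAP amount to comparing the observed RPC-to-PPS connections with the expected ones. The following Python sketch records the expected cabling listed in the steps above (J2 port 6 to the rack 1 PPS, J2 port 5 to the rack 2 PPS) and reports any port that is cabled elsewhere. It is an illustration only: the data layout and the function name are hypothetical, and nothing here queries real hardware.

```python
# Expected cabling taken from the verification steps in this MAP:
# (RPC card, J2 port) -> (rack, PPS), always at PPS connector J4.
EXPECTED_CABLING = {
    ("RPC-1", 6): ("rack 1", "PPS-1"),
    ("RPC-2", 6): ("rack 1", "PPS-2"),
    ("RPC-1", 5): ("rack 2", "PPS-1"),
    ("RPC-2", 5): ("rack 2", "PPS-2"),
}


def miscabled_ports(observed: dict) -> list:
    """Return the (RPC card, J2 port) pairs whose observed target differs.

    observed uses the same key and value layout as EXPECTED_CABLING.
    Only ports present in observed are checked, so a configuration
    without an expansion rack can simply omit the port 5 entries.
    """
    return sorted(
        port for port, target in observed.items()
        if EXPECTED_CABLING.get(port) != target
    )
```

For example, if the two rack 1 cables are crossed, so that RPC-1 port 6 goes to PPS-2 and RPC-2 port 6 goes to PPS-1, the function returns both port 6 entries. This corresponds to the cross-connected condition that makes RPC-2 falsely report an attached battery set.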
Description
Directions to replacement MAP.
Isolation
1. This MAP has been replaced by a new MAP. Go to MAP 4670: Cluster Powered Off Unexpectedly on page 431.
Description
The 2105 Model 800 is not powering on properly. Only one of the two 2105 Model 800 power systems is needed to power on the 2105 Model 800. However, this MAP will require both power systems to be functioning.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. At the 2105 Model 800, are the Local/Remote power switches on both RPC switch cards [Figure 47] set to Local mode (down)?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Set the RPC switch card Local/Remote or Local/Automatic switch to Local mode (down). (Remember to set the switches back to their original position when the repair is complete.) Attempt to power on the 2105 Model 800 using the 2105 Model 800 operator panel Local Power switch [Figure 49]. Does it power on?
   • Yes, the 2105 Model 800 only fails in remote power control mode. Go to MAP 2390: Rack 1 Power On Problem, Remote Mode on page 140.
   • No, continue with the next step.
Figure 47. 2105 Model 800 RPC Local/Remote Switch Location (s009127)
3. Observe the primary power supply (PPS) to RPC control cables.
   • PPS-1 connector J4 to RPC-1 connector J2 slot 6.
   • PPS-2 connector J4 to RPC-2 connector J2 slot 6.
   Are both cables properly connected?
   • Yes, continue with the next step.
   • No, before reconnecting the cable, go to the PPS it should be connected to and set the input circuit breaker to the off position. The 2105 Model 800 RPC cards can stay powered on while the cable is connected. Connect the cable. Set the input circuit breaker to on (up), then attempt to power on the
Figure: PPS indicators (rear view): UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY
5. Observe each PPS UEPO PWR indicator. Is the indicator on?
   • Yes, continue with the next step.
   • No, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127 for status code 8 and perform the actions listed.
6. Ensure the 2105 Model 800 operator panel UEPO switch is set to on (up).
7. Observe the 2105 Model 800 PPS UEPO LOOP-STBY indicator. Is it on?
   • Yes, continue with the next step.
   • No, go to MAP 2360: 2105 Model 800 (Rack 1) UEPO Problem on page 131.
8. Observe the PWR GOOD indicator. Is it slow blinking?
   • Yes, the PPS is in standby mode, waiting for a power on request. Continue with the next step.
   • No, replace the PPS. If the 2105 Model 800 still fails to power on, return to the beginning of this MAP.
9. Observe the PWR UNIT FAULT indicator.
Figure: Operator panel (front view): Local Power switch; indicators Ready (Cluster 1, Cluster 2), Power Complete, Line Cord 1, Line Cord 2, Messages (Cluster 1, Cluster 2)
12. Observe each 2105 Model 800 PPS. Find the condition that now exists.
   • The PPS PWR GOOD indicator is on solid, which is normal operation. 390 V output is being supplied to the electronics cage and storage cage power supplies. The 2105 Model 800 should be powering on. If not, reenter the service guide with the new symptom(s).
   • A PPS status code is displayed. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   • The PPS PWR GOOD indicator is still slow blinking. Continue with the next step.
13. Replace the PPS. Does it still fail?
   • Yes, continue with the next step.
   • No, go to MAP 1500: Ending a Service Action on page 67.
14. Replace the RPC card for the PPS that is slow blinking and then attempt to power on (RPC-1 for PPS-1, RPC-2 for PPS-2). If the clusters are in Ready, use the service terminal FRU Replace menu option to replace the RPC card. If it still fails, call the next level of support. If it no longer fails, go to MAP 1500: Ending a Service Action on page 67.
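The three conditions in step 12 can be read as a simple decision table. The Python sketch below is a reading aid only and does not replace the MAP. The function name and the string values `"solid"` and `"slow_blink"` are hypothetical, and checking for a displayed status code before looking at the PWR GOOD indicator is an assumption of this sketch; the MAP lists the conditions as alternatives without an order.

```python
from typing import Optional


def next_action(pwr_good: str, status_code: Optional[str] = None) -> str:
    """Map an observed PPS state to the next action from step 12.

    pwr_good:    "solid" or "slow_blink" (hypothetical labels).
    status_code: the displayed PPS status code, or None if the display is blank.
    """
    if status_code is not None:
        # Assumption: a displayed status code takes priority.
        return "Go to MAP 2350: Isolating PPS Status Indicator Codes"
    if pwr_good == "solid":
        return "Normal operation; if power on still fails, reenter the service guide with the new symptom"
    if pwr_good == "slow_blink":
        return "Replace the PPS; if it still fails, replace the RPC card for that PPS"
    raise ValueError(f"unknown PWR GOOD state: {pwr_good!r}")
```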
Description
The microcode detected a mismatch between the setting of the power mode switches on the two RPC cards that lasted more than 10 seconds.
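A mismatch that must persist for more than 10 seconds before it is reported behaves like a debounced comparison. The following Python sketch shows one way such a check could work. It is not the actual microcode: the function name, the sample format, and the idea of polling the two RPC cards at discrete times are assumptions made for illustration. Only the 10-second limit comes from the description above.

```python
MISMATCH_LIMIT_SECONDS = 10.0  # from the description: "more than 10 seconds"


def mismatch_reported(samples) -> bool:
    """Return True if the two settings differ continuously for over 10 seconds.

    samples: iterable of (timestamp_seconds, rpc1_setting, rpc2_setting)
             tuples in time order (a hypothetical polling format).
    """
    mismatch_start = None
    for timestamp, rpc1_setting, rpc2_setting in samples:
        if rpc1_setting != rpc2_setting:
            if mismatch_start is None:
                # Remember when the current mismatch began.
                mismatch_start = timestamp
            if timestamp - mismatch_start > MISMATCH_LIMIT_SECONDS:
                return True
        else:
            # The settings agree again, so the timer restarts.
            mismatch_start = None
    return False
```

A brief mismatch, such as the moment between moving the switch on one card and then the other, would not be reported under this model as long as both cards agree again within the limit.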
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Do the following visual checks:
   • Observe DIP switch position 3 on each RPC card.
   • Observe the Local control switch (large white) on each RPC switch card (below the RPC cards).
   Are the switches set the same?
   • Yes, the RPC code is detecting that one of the four switches appears to be failing. Go to step 4.
   • No, determine which RPC card, switch card, or both are set incorrectly. Reference the instructions in Checking the 2105 Model 800 Switch Settings in chapter 5 of Volume 2. Read the section for setting the RPC and switch card settings to match the power control feature the customer wants. Use the next step to change the switch settings.
2. To change the RPC card or switch card settings, you must power the FRU off using the Replace a FRU process. When the FRU is powered off, the switch settings can be changed. From the service terminal Main Service Menu, select:
   Repair Menu
   Replace a FRU
3. After the replace a FRU action is complete, determine if the error has been repaired. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):
   • In the original problem, display the Last Occurrence date/timestamp field in the problem details. If it has not been updated, the problem has been fixed.
   • If there is no new related problem, the problem has been fixed.
   Has the problem been fixed?
   • Yes, verify that the original problem has been closed and that there are no other problems to be repaired. Use the following to do this. From the service terminal Main Service Menu, select:
     Repair Menu
     Close a Previously Repaired Problem
     End of Call Status
   • No, one of the FRUs is failing even though the switches appear to be set correctly. Replace the RPC cards and RPC switch cards until the problem has been fixed using steps 2 and 3.
4. There are two actions you can take:
Description
The 2105 Expansion Enclosure is not powering on properly from the 2105 Model 800. Only one of the two 2105 Expansion Enclosure power systems is needed to power on the 2105 Expansion Enclosure. However, this MAP will require both power systems to be functioning.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Does the 2105 Model 800 to which this 2105 Expansion Enclosure is attached power on?
   • Yes, continue with the next step.
   • No, go to MAP 2400: 2105 Model 800 Local Power On Problems on page 149.
2. Observe the 2105 Expansion Enclosure primary power supply (PPS) to 2105 Model 800 RPC control cables.
   • 2105 Expansion Enclosure PPS-1 connector J4 to 2105 Model 800 RPC-1 card connector J2 (Slot 5).
   • 2105 Expansion Enclosure PPS-2 connector J4 to 2105 Model 800 RPC-2 card connector J2 (Slot 5).
   Are both cables properly connected?
   • Yes, continue with the next step.
   • No, before reconnecting the cable, go to the 2105 Expansion Enclosure PPS it should be connected to and set the input circuit breaker to the off position. The 2105 Model 800 RPC cards can stay powered on while the cable is connected. Connect the cable. Set the input circuit breaker to on (up), then attempt to power the 2105 Expansion Enclosure on again. If it still fails, continue with the next step.
Figure: PPS indicators (rear view): UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY
4. Observe each 2105 Expansion Enclosure PPS UEPO PWR indicator. Is the indicator on?
   • Yes, continue with the next step.
   • No, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127 for status code 8 and perform the actions listed.
5. Ensure the 2105 Expansion Enclosure operator panel UEPO switch is set to on (up).
6. Observe the 2105 Expansion Enclosure PPS UEPO LOOP-STBY indicator. Is it on?
   • Yes, continue with the next step.
   • No, go to MAP 2380: 2105 Expansion Enclosure (Rack 2) UEPO Problem on page 138.
7. Observe the PPS PWR GOOD indicator. Is it slow blinking?
   • Yes, the PPS is in standby mode, waiting for a power on request. Continue with the next step.
   • No, replace the PPS. If the 2105 Expansion Enclosure still fails to power on, return to the beginning of this MAP.
Figure: Operator panel (front view): Local Power switch; indicators Ready (Cluster 1, Cluster 2), Power Complete, Line Cord 1, Line Cord 2, Messages (Cluster 1, Cluster 2)
11. Observe each 2105 Expansion Enclosure PPS. Find the condition that now exists.
   • The PPS PWR GOOD indicator is on solid, which is normal operation. 390 V output is being supplied to the storage cage power supplies. The 2105 Expansion Enclosure should be powering on. If not, reenter the service guide with the new symptom(s).
   • A PPS status code is displayed. Go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   • The PPS PWR GOOD indicator is still slow blinking. Continue with the next step.
12. Replace the PPS. Does it still fail?
   • Yes, continue with the next step.
Description
The firmware code in one RPC card is not at the latest level available.
Isolation
1. The firmware installed on the RPC card is down level from the latest available on the 2105 Model 800 LIC code library. The problem that sent you here displays the RPC card that is down level in the FRUs list. 2. Return to the service terminal and follow the displayed instructions to load the RPC code. Note: Do not press F3 to escape out of the problem. Do not use the LIC Menu options to update the RPC card firmware.
Description
The following must occur for the 2105 Model 800 to power off. Both RPC cards must receive a power off request. This is from the 2105 Model 800 operator panel if in Local or Automatic mode or from the remote power control card if in Remote mode. Both RPC cards must agree that they have received a power off request. If one RPC card is fenced (quiesced), the other card can power off the 2105 Model 800 without getting agreement. If a pinned data condition exists, the power off request will be ignored. The power off request will work after the pinned data condition is cleared.
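The power off conditions above combine three inputs: the request seen by each RPC card, whether a card is fenced, and whether pinned data exists. The Python sketch below restates them as a single predicate to make the interaction easier to follow. It is illustrative only: the function and parameter names are hypothetical, and the case where both RPC cards are fenced is not covered by the description, so returning False for it is an assumption of this sketch.

```python
def power_off_proceeds(
    rpc1_requested: bool,
    rpc2_requested: bool,
    rpc1_fenced: bool = False,
    rpc2_fenced: bool = False,
    pinned_data: bool = False,
) -> bool:
    """Return True if a power off request would be acted on.

    All parameter names are hypothetical labels for the conditions
    described in the text.
    """
    if pinned_data:
        # A pinned data condition causes the request to be ignored.
        return False
    if rpc1_fenced and rpc2_fenced:
        # Not described in the guide; treated as "no power off" here.
        return False
    if rpc1_fenced:
        # The remaining card can power off without agreement.
        return rpc2_requested
    if rpc2_fenced:
        return rpc1_requested
    # Normal case: both RPC cards must agree on the request.
    return rpc1_requested and rpc2_requested
```

This also shows why the isolation starts by checking for pinned data: with `pinned_data=True` the result is False regardless of the RPC card state.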
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Connect the service terminal to a cluster that will not power off. From the service terminal Main Service Menu, select:
   Utilities Menu
   Pinned Data Menu
   Display Pinned Data
   Are any volumes displayed with retryable, non-retryable or FC status?
Table 28. Remote Power Feature Not Installed RPC Card Local/Remote Switch Setting Down (local power control) RPC card DIP switch position 3 setting (automatic power control) Off (to left) Switches Set to Power Off From Operator panel local power switch Operator panel local power switch
5. Are the RPC switches set to use the 2105 Model 800 operator panel Local Power switch? v Yes, continue with step 7 v No, continue with the next step. 6. Set the RPC card DIP switch position 3 to off (to left) for both RPC cards. Set the RPC switch card switch to local (down) for both RPC switch cards. Attempt to power off using the operator panel Local Power switch. Does the 2105 Model 800 power off now? v Yes, go to step 13 on page 160. v No, power off fails in both remote and local modes. Leave the switches set for Local mode. (After the problem is fixed, remember to set the switches back to remote mode.) Continue with the next step. 7. Connect the service terminal and use the Repair Menu, Show / Repair Problems Needing Repair option to repair any related power problems (PPS,
Figure: PPS indicators (rear view): UEPO PWR, UEPO LOOP-STBY, PWR GOOD, PWR UNIT FAULT, ON BATTERY
9. The 2105 Model 800 will only power off if both PPSs power off. The PWR GOOD indicator on the PPS blinks slowly when the PPS is powered off to standby mode. Standby mode is when the main output voltages are off, but the PPS internal logic voltages and line cord input voltages are still on. Press the 2105 Model 800 operator panel Local Power switch momentarily to off (down). Wait up to 5 minutes for the PPS PWR GOOD indicators to blink slowly (this indicates powered off to standby mode). Find the condition that applies:
Description
The RPC power control cables to the expansion rack have been detected as being crossed. Normally this would occur during the install of the expansion rack. It can be detected and reported up to 30 days after the expansion rack is installed.
Isolation
1. Observe the RPC card online green LED indicator beneath the DIP switches on each RPC card.
Figure: RPC-2 card connector J2, port J2-5 (rear view); P/N 18P4495
3. Correct the cable plugging so that it matches the description in step 2. Note: The expansion rack will stay powered up while this is done.
Close each problem. From the service terminal Main Service Menu, select:
   Repair Menu
   Close a Previously Repaired Problem
5. Complete the service action. From the service terminal Main Service Menu, select:
   Repair Menu
   End of Call Status
Description
The 390V Battery Set did not reach full charge in 30 hours. An uncharged battery set is charged at a high rate for up to 5 hours with a switched 750 mA current, then at a low rate for up to 25 hours with a constant 750 mA current. It then begins a trickle charge.
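The charge schedule above (up to 5 hours at the high rate, then up to 25 hours at the low rate, for a total of 30 hours) can be expressed as a function of elapsed time. The Python sketch below is a reading aid, not PPS firmware. The function name and return labels are hypothetical, and the handling of the exact 5-hour and 30-hour boundaries, as well as moving to trickle charge as soon as full charge is reached, are assumptions of this sketch.

```python
HIGH_RATE_HOURS = 5.0   # high rate phase, from the description
LOW_RATE_HOURS = 25.0   # low rate phase, from the description
TOTAL_HOURS = HIGH_RATE_HOURS + LOW_RATE_HOURS  # 30 hours in total


def charge_phase(hours_elapsed: float, fully_charged: bool) -> str:
    """Return a label for the charging phase of an initially uncharged battery set."""
    if fully_charged:
        # Assumption: trickle charge starts once full charge is reached.
        return "trickle"
    if hours_elapsed < HIGH_RATE_HOURS:
        return "high rate"
    if hours_elapsed < TOTAL_HOURS:
        return "low rate"
    # Full charge was not reached within 30 hours: the condition
    # that this MAP is entered for.
    return "charge timeout"
```

For example, `charge_phase(10, False)` returns `"low rate"`, and `charge_phase(31, False)` returns `"charge timeout"`.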
Isolation
1. Ensure the circuit breaker on the master battery (under PPS-1) is set to on.
2. Ensure the cable between the master and slave battery is connected.
3. Ensure both cables between the master battery and PPS 1 are connected.
4. Status code 03 automatically goes blank when the battery set reaches full charge, which takes no more than 30 hours. Code 03 is always displayed for 5 minutes (PPS code level 20 or greater) when PPS 1 powers on. Then the battery charge level is checked.
5. Wait up to 30 hours for the batteries to reach full charge.
Description
The battery has a low charge or PPS 1 has detected a battery fault condition. If code 03 is displayed, the battery is low and is charging. A battery that is discharged can require up to 25 hours to become fully charged. The system will report a permanent battery failure if the condition persists beyond the normal charge time. If code 04 is displayed, a battery failure is indicated. This can be a false condition generated during replacement of PPS 1 or the battery. Notes: 1. If the battery set is the FRU, both halves of the battery must be replaced at the same time.
Isolation
Notes:
1. If the 390V Battery FRU was replaced, the new initial charge date must be entered. The service login FRU replacement process should prompt you to enter the date. If you are not prompted, use the Main Service Menu, Utility Menu, Battery Menu, Update Battery Charge Date option.
2. When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
Procedural Steps
1. If the RPC-1 (R1G1) card has not been quiesced, quiesce it now. From the service terminal Main Service Menu, select:
   Utility Menu
   Resource Management Menu
   Quiesce a Resource
2. Do the following in the listed order to ensure that PPS 1 has been reset properly:
   a. Switch off the PPS 1 input circuit breaker (CB00).
   b. Switch off the battery set circuit breaker.
   c. Disconnect the PPS 1 to PPS 2 communication cable from the PPS 1, J3 connector.
   d. Verify that the cable between the two halves of the battery set is connected (battery 1, J2 to battery 2, J1 connectors).
   e. Verify that both cables between PPS 1 and the battery set are connected (PPS 1, J5A to battery 1, J1A and PPS 1, J5B to battery 1, J1B connectors).
   f. Wait 10 seconds.
   g. Read the following Attention before continuing. Connect the PPS 1 to PPS 2 communication cable to the PPS 1, J3 connector.
      Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. It is not recommended to attempt to straighten the pins as they may easily bend again. Replace the PPS.
   h. Switch on the battery set circuit breaker.
   i. Switch on the PPS 1 input circuit breaker.
3. Resume the RPC-1 (R1G1) card using the menu option in Step 1.
4. Press the 2105 Model 800 operator panel Local power switch momentarily to the On position.
5. Is code 04 still displayed?
   • Yes, continue with the next step.
   • No, go to step 9 on page 164.
   One of the following FRUs is failing:
   • 390 V battery set (both batteries are replaced at the same time)
Problem Isolation Procedures, CHAPTER 3
The 04 status display shows the current condition. It will reset to blank as soon as the failing FRU is replaced. 6. The FRU(s) must be replaced using the service login. See 390 V Battery Set Removal and Replacement, 2105 Model 800 and Expansion Enclosure in chapter 4 of the Volume 2. v If the FRUs are listed in a problem, use the problem to select the FRUs and continue the repair. v If the FRUs are not listed in a problem, use the Main Service Menu, Repair Menu, Replace a FRU menu options to replace the FRUs. 7. If code 03 is displayed, the battery is charging. Wait up to 30 hours. If code 03 is still displayed, the battery is not being charged. The possible failing FRUs are the PPS, battery set, and battery cables. Use the service terminal Repair Menu, Replace a FRU, Power Cooling FRUs menu options. 8. When the repair is complete go to MAP 1500: Ending a Service Action on page 67. 9. Was the ESC in the problem an 8526, 8528, or 8531? v Yes, continue with the next step. v No, the failure is no longer occurring, exit this MAP and return to the procedure that sent you here. 10. Test the PPS to verify it has recognized the battery and will be able to charge and use it when needed. Switch off the battery set circuit breaker. Is code 04 displayed? v Yes, switch on the battery set circuit breaker, exit this MAP and return to the procedure that sent you here. v No, the PPS does not recognize the battery. Continue with the next step. 11. Replace one or more of the following FRUs. Repeat step 10 after each FRU replacement, until code 04 is displayed, and you can answer Yes to the question. The possible failing FRUs are: v Battery set v PPS-1 v Power and signal cables between PPS-1 and the battery set
Description
The model number of this 2105 Model 800 requires that all PPS have three phase input power. This allows for maximum power output. If single phase input power is used, only 60% of maximum power output is available.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. The PPS powered up and detected single phase input power when it should have three phase input power. If the three phase input power had dropped to single phase after power up, a PPS status code 07 would be displayed. 2. Use the service guide install Chapter 5 procedures to check the customer input to the PPS line cords. Use the service terminal Repair Menu, Replace a FRU, Rack Power Cooling FRUs option to prepare the PPS to be powered off for the power checks. The PPS line cord will need to be disconnected from the customer power source. 3. When the problem is repaired go to MAP 1500: Ending a Service Action on page 67.
Description
Each time the 2105 Model 800 operator panel Local power switch is momentarily pressed to on (up), the primary power supply (PPS) status display should display a sequence of 2-character codes. If it does not, either the PPS is not providing power to its RPC card, the RPC card is not sending a power on request to the PPS, or the PPS itself is failing.
Figure: PPS rear view. Indicator labels: UEPO PWR UEPO LOOP-STBY PWR GOOD PWR UNIT FAULT ON BATTERY
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. Switch the failing PPS input circuit breaker to off. Unplug the PPS to PPS communication cable from the J3 connector. (This removes both power sources from the PPS logic.) 2. Read the following Attention before continuing. Plug the cable back to the J3 connector. Switch the input circuit breaker to on. Attention: Logic voltages are present on the J3 cable from the other PPS. If the PPS J3 connector pins are bent and shorted when the J3 cable is being plugged, the other PPS may drop power. It is not recommended to attempt to straighten the pins as they may easily bend again. Replace the PPS. 3. Observe the PPS UEPO PWR indicator. Is the indicator on solid? v Yes, the PPS has customer line cord input power. Go to the next step. v No, either the customer line cord power is off or the PPS is failing. Use the instructions in Check the Customers Circuit Breaker with the Power On in
4.
5.
6.
7.
Description
When a pinned data condition occurs a problem is created. The power control microcode will not allow the 2105 Model 800 to power off until after the pinned data condition is repaired. The attempt to power off also disables all the host system interfaces. The functional code has been stopped so it is not possible to query and repair the condition that caused the pinned data. The 2105 power must be forced off using the operator panel red UEPO switch (causes a firehouse dump). Then it
Isolation
1. An attempt to power off the 2105 Model 800 failed because a pinned data condition already existed or was found during the power off destage of data.
2. Force the 2105 to power off. Use the operator panel red UEPO switch.
3. Power the 2105 on using the operator panel white switch.
4. Correct the pinned data condition. Go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417.
Description
The firmware code in both RPC cards is not at the latest level available.
Isolation
1. The firmware installed on both RPC cards is down level from the latest available on the 2105 Model 800 LIC code library. Close the problem that sent you here. LIC Activation requires that there be no open problems needing repair; all problems must be in the closed or cancelled state to continue. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu, Multiple LIC Activation (Concurrent option).
Note: Do not use the Licensed Internal Code Maintenance Menu, Firmware LIC menu option.
2. Go to MAP 1500: Ending a Service Action on page 67.
Description
A tripped PPS output circuit breaker will normally display a status code 13 but in some cases a status code of 10 may appear. An overcurrent condition can cause this. The loads connected to this circuit breaker will be disconnected until the problem is isolated.
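The PPS status codes named in the MAP descriptions of this section can be collected into a small lookup for quick reference. The following Python sketch is illustrative only: the names `PPS_STATUS_CODES` and `describe_pps_code` are hypothetical, the table holds only the codes mentioned in this section, and it is not part of the service terminal or any IBM tool.

```python
# Hypothetical quick-reference table. Only the codes described in this
# section are listed; the wording follows the MAP descriptions.
PPS_STATUS_CODES = {
    "03": "Battery is low and is charging",
    "04": "Battery failure indicated (can be false during PPS 1 or battery replacement)",
    "07": "Three phase input power dropped to single phase after power up",
    "10": "PPS output circuit breaker tripped (appears in some cases instead of 13)",
    "13": "PPS output circuit breaker tripped",
}


def describe_pps_code(code: str) -> str:
    """Return the description for a 2-character PPS status code."""
    # Accept "3" as well as "03" by padding to two characters.
    normalized = code.strip().zfill(2)
    return PPS_STATUS_CODES.get(normalized, "Code not described in this section")


if __name__ == "__main__":
    for shown in ("03", "04", "13"):
        print(shown, "-", describe_pps_code(shown))
```

A code that is not in the table returns a fixed message instead of raising an error, so the helper can be called with whatever the status display shows.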
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. Ensure that the circuit breaker (CB) is still tripped. 2. Disconnect the power cable from the connector beneath the tripped CB.
Description
The RPC card in the problem is reporting a power fault or event that cannot be reset.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence. 1. Use the problem or the Repair Menu, Replace a FRU option to do a dummy repair of the RPC card called out in the problem. This will reset the RPC card and may correct the problem. A dummy repair requires you to remove and reinstall the original RPC card using the normal repair process. Was the dummy repair successful? v Yes, the problem has been fixed, use the Repair Menu, End of Call Status option to complete the service action. v No, the problem is still occurring. Repeat the repair and replace the failing RPC card with a new FRU. If the repair is not successful, call the next level of support. The problem could be a failure external to the RPC card or a microcode problem.
Description
The cluster CEC drawer power on is controlled from the I/O drawer through the system power control network (SPCN). The SPCN interface signals are in the VS/COMM and the JTAG cables between I/O and CEC drawers.
Isolation
1. Observe the I/O drawer power indicator LED on the upper left of the CEC drawer operator panel. Is the I/O drawer power indicator LED on solid?
• Yes, continue with the next step.
• No, go to MAP 4880: Cluster Power On Problem on page 461.
2. Observe the CEC drawer power indicator LED on the front lower left of the CEC drawer. Select the LED condition that applies:
• Off solid, go to step 5.
• Blinking slowly, the CEC drawer is in standby waiting for a power on signal through the SPCN interface. Continue with the next step.
• On solid, the CEC drawer is powered on. Exit this MAP.
3. Verify that the VS/COMM and JTAG cables are connected correctly at the CEC drawer and the I/O drawer. Are both cables connected correctly?
• Yes, continue with the next step.
• No, power off the cluster, connect the cables correctly, then power on the cluster:
- If the CEC drawer powers on, exit this MAP.
- If the CEC drawer does not power on, return to step 1.
4. Power off the cluster using the service terminal Alternate Cluster Repair Menu options, see Cluster Power On and Off Procedures, 2105 Model 800 in chapter 4 of Volume 2. Put the CEC drawer in the service position with the cover opened, see CEC Drawer Service Position Procedure, 2105 Model 800 and CEC Drawer Top Service Access Procedure, 2105 Model 800 in chapter 4 of Volume 2. Ensure the flat ribbon cable from the fan controller card to the CEC drawer planar assembly is connected correctly. Verify that the flat ribbon cable from the power planar to the CEC planar on the CEC drawer planar assembly is connected correctly. Are both cables connected correctly?
• Yes, continue with the next step.
• No, connect the cables correctly, close the top cover, and then attempt to power on the cluster:
- If the CEC drawer powers on and the I/O drawer power indicator is on solid, exit this MAP.
- If the CEC drawer does not power on, return to step 1.
5. The CEC drawer is not able to create standby power. Observe the CEC drawer power supply input power indicator LEDs.
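The three-way branch on the CEC drawer power indicator LED in step 2 can be written as a small decision function. This is a sketch for study purposes only: the enum `CecPowerLed` and the function `next_step` are hypothetical names, and the returned strings paraphrase the step text.

```python
from enum import Enum


class CecPowerLed(Enum):
    """States of the CEC drawer power indicator LED named in step 2."""
    OFF_SOLID = "off solid"
    BLINKING_SLOWLY = "blinking slowly"
    ON_SOLID = "on solid"


def next_step(led: CecPowerLed) -> str:
    """Map the observed LED state to the action given in step 2."""
    if led is CecPowerLed.OFF_SOLID:
        # No standby power is being created.
        return "go to step 5"
    if led is CecPowerLed.BLINKING_SLOWLY:
        # Standby: waiting for a power on signal through the SPCN interface.
        return "continue with step 3"
    # On solid: the CEC drawer is powered on.
    return "exit this MAP"


if __name__ == "__main__":
    for state in CecPowerLed:
        print(f"{state.value}: {next_step(state)}")
```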
Description
Each CEC drawer and I/O drawer power supply has two power inputs, one from each primary power supply (PPS). Either input can supply all of the input power needed for the drawer power supplies to operate. This allows both CEC and I/O drawer power supplies to operate when one PPS is powered off from a failure or service action.
Isolation
1. Verify that the 2105 is already powered on. Use the following table to find and repair your visual symptom:
Description
Each host bay power supply has two power inputs, one from each primary power supply (PPS). Either input can supply all of the input power needed for the host bay power supplies to operate. This allows both host bay power supplies to operate when one PPS is powered off from a failure or service action.
Isolation
1. Verify that the 2105 is already powered on. Use the following table to find and repair your visual symptom:
Table 30. Host Bay Drawer Visual Power Supply Problems

PWR 1 LED   PWR 2 LED   HA 1 LED   HA 2 LED   Description and Action
Off         Off         Off        Off        Description: Normal when 2105 is powered off. Action: None
On          On          On         On         Description: Normal when 2105 is powered on. Action: None
Description
The SSA link between two adjoining disk drive modules (DDMs) is failing. The failing link is between two adjoining DDMs, on the same backplane, in the same left or right group of four DDMs. See Figure 55 for the relationship of the DDM and backplane FRUs involved with this failure. v DDM locations in DDM bay, two adjoining DDMs in DDM bay positions 1 to 8
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Review whether any other problems (pending or open) have a single DDM as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
• Yes, go to step 3.
• No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDMs in the problem you are working on. Is the DDM in the open or pending problem the same as one of the DDMs in the problem you are working on?
• Yes, repair the problem with the single DDM FRU first; it should fix the problem you are working on.
• No, go to step 4.
4. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first, see DDM Bay Disk Drive Module Indicators on page 23.
Did repair verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
• No, the SSA link is still failing, go to step 5.
5. Replace the second DDM displayed on the service terminal with the DDM removed in step 4, then verify the repair.
Note: The service terminal will determine if the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing time can be many hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
Description
The 40 MB per second SSA link between two adjoining disk drive modules (DDMs) is degraded and is running at 20 MB per second. The degraded link is between two adjoining DDMs, on the same backplane. See Figure 56 for the relationship of the DDM and backplane FRUs involved with this failure.
• DDM locations in DDM bay, two adjoining DDMs in DDM bay positions 1 to 8
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option. 2. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
MAP 3050: Isolating an SSA Link Error Between a DDM and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
An SSA link failed between a DDM and the SSA device card. The failing FRU is either a center DDM, a passthrough or bypass card, an SSA device cable, or an SSA device card. See Figure 57 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved with this failure.
MAP 3050: SSA Link Error Between a DDM and an SSA Device Card
Figure 57. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l). Labels shown: SSA device cables, bypass card, passthrough cards, SSA device card, DDM, DDM bay B.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
• If the FRU is listed and is selectable in the problem, use that method first.
• If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Review whether any other problems (pending or open) have a single DDM as the FRU. Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
• Yes, go to step 3.
• No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDM in the problem you are working on. Is the DDM in the open or pending problem the same as the DDM in the problem you are working on?
• Yes, repair the open or pending problem with the single DDM FRU first; it should fix the problem you are working on.
• No, go to step 4.
4. Determine if the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
• Yes, verify that the SSA cables are connected correctly, go to step 5.
• No, continue with step 7 on page 181.
5. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay. See Locating an SSA Cable. Are any of the cables connected incorrectly?
• Yes, connect the cables to the correct connectors, go to step 6.
• No, go to step 7 on page 181.
6. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable you just connected correctly. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification. Did verification run without error?
• Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
• No, go to step 7.
7. Locate the SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, one of the SSA cables is connected between a DDM bay and an SSA device card. The other SSA cable is connected between the same DDM bay and another DDM bay. The service terminal will identify the DDM bay and its SSA connector, and the SSA device card and its SSA connector.
Note: To locate a DDM bay see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 58. To locate an SSA device card cable connector, see Figure 59 on page 182.
Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where v Tx is the cluster, 1 or 2 v P2 is the cluster planar v Ix is the SSA device card location, slot v yy is the cable connector, A1, A2, B1, or B2 Use the figure below to locate an SSA device card cable connector.
Figure (front view): I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2), with SSA device card locations labeled Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, Tx-u0.1-P1-I12.
a. Disconnect one of the two SSA device cables shown in Figure 57 on page 180, and listed in the Problem FRU list. Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group. b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable, ensure good connection. c. Run the repair verification. Select one cable from the Problem FRU list and follow the repair process and verification without actually replacing the cable. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, select one of the following. If you have inspected only one cable, repeat the above steps on the second cable, If you have inspected both cables, go to step 8. 8. Locate DDM bay A, it may be in the front or rear of the 2105. Observe all of the DDM bay, DDM Ready and Check indicators. See Figure 60 on page 183. Are any of the DDM bay DDM indicators on? v Yes, go to step 9. v No, there is a DDM bay power problem, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261. 9. Locate DDM bay B, it may be in the front or rear of the 2105. Observe all of the DDM bay, DDM Ready and Check indicators. Are any of the DDM bay DDM indicators on? v Yes, go to step 10 on page 183.
v No, there is a DDM bay power problem, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
10. Replace the DDM displayed on the service terminal, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still failing, go to step 11. 11. Replace SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still failing, go to step 12. 12. Replace the passthrough cards displayed on the service terminal. Replace these cards one at a time, see Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2. After each card is replaced, verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, If all of the cards shown in Figure 57 on page 180, have been replaced, go to step 13. If all of the cards shown in Figure 57 on page 180, have NOT been replaced, repeat this step until all of the cards have been replaced. 13. Replace one of the two SSA device cables displayed on the service terminal FRU list, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still failing. If both of the SSA device cables shown in Figure 57 on page 180, have been replaced, go to step 14 on page 184.
If both of the SSA device cables shown in Figure 57 on page 180, have NOT been replaced, repeat this step until all of the cables have been replaced. 14. Replace the DDM bay frames displayed on the service terminal, one at a time: v DDM bay Frame assembly, see Frame Assembly, DDM Bay in chapter 4 of the Volume 2. Note: The DDM bay backplane is replaced by replacing the DDM Bay frame assembly. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still failing. If all of the backplanes shown in Figure 57 on page 180, have been replaced, the SSA link is still failing, call the next level of support. If all of the backplanes shown in Figure 57 on page 180, have NOT been replaced, repeat this step until all of the backplanes have been replaced.
MAP 3060: Isolating a Degraded SSA Link Between a DDM and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A 40 MB per second SSA link between a DDM and the SSA device card is degraded and is running at 20 MB per second. The degraded FRU is either a center DDM, a passthrough or bypass card, an SSA device cable, or an SSA device card. See Figure 61 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved with this failure.
• DDM bay A
• DDM bay B
Figure 61. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008041l). Labels shown: SSA device cables, bypass card, passthrough cards, SSA device card, DDM, DDM bay B.
MAP 3060: Degraded SSA Link Between a DDM and an SSA Device Card
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option. 2. Locate the SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, one of the SSA cables is connected between a DDM bay and an SSA device card. The other SSA cable is connected between the same DDM bay and another DDM bay. The service terminal will identify the DDM bay and its SSA connector, and the SSA device card and its SSA connector. Note: To locate a DDM bay see Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 62. To locate an SSA device card cable connector, see Figure 63 on page 186.
Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where v Tx is the cluster, 1 or 2 v P2 is the cluster planar v Ix is the SSA device card location, slot v yy is the cable connector, A1, A2, B1, or B2 Use the figure below to locate an SSA device card cable connector.
Figure (front view): I/O drawer 1 (R1-T1) and I/O drawer 2 (R1-T2), with SSA device card locations labeled Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, Tx-u0.1-P1-I12.
a. Disconnect one of the two SSA device cables shown in Figure 61 on page 184, and listed in the Problem FRU list. Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group. b. Inspect the cable connectors for bent pins and correct any problems found. There should be six pins in each plug. If there are less than six pins, replace the cable. Reconnect both ends of the SSA device cable, ensure good connection. c. Run the repair verification. Select one cable from the Problem FRU list and follow the repair process and verification without actually replacing the cable. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still degraded: If you have inspected only one cable, repeat the above steps on the second cable, If you have inspected both cables, go to step 3. 3. Replace the passthrough and bypass cards displayed on the service terminal. Replace these cards one at a time, see Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2. After each card is replaced, verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still degraded: If all of the cards shown in Figure 61 on page 184, have been replaced, go to step 4 on page 187.
If all of the cards shown in Figure 61 on page 184, have NOT been replaced, repeat this step until all of the cards have been replaced. 4. Replace one of the two SSA device cables displayed on the service terminal FRU list, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still degraded: If both of the SSA device cables shown in Figure 61 on page 184, have been replaced, go to step 5. If both of the SSA device cables shown in Figure 61 on page 184, have NOT been replaced, repeat this step until all of the cables have been replaced. 5. Replace the DDM displayed on the service terminal, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still degraded, go to step 6. 6. Replace SSA device card displayed on the service terminal, then verify the repair. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v No, the SSA link is still degraded, go to step 7. 7. Replace the DDM bay frames displayed on the service terminal, one at a time: v DDM bay Frame assembly, see Frame Assembly, DDM Bay in chapter 4 of the Volume 2. Note: The DDM bay backplane is replaced by replacing the DDM Bay frame assembly. Did repair verification run without error? v Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. 
v No, the SSA link is still degraded: If all of the backplanes shown in Figure 61 on page 184, have been replaced, the SSA link is still degraded, call the next level of support. If all of the backplanes shown in Figure 61 on page 184, have NOT been replaced, repeat this step until all of the backplanes have been replaced.
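The isolation steps of MAP 3060 follow a common pattern: replace one candidate FRU, run repair verification, and stop as soon as verification passes. The Python sketch below illustrates that pattern under stated assumptions. `MAP_3060_ORDER`, `isolate_degraded_link`, and the `replace_and_verify` callback are hypothetical names; on a real machine, replacement and verification are done through the service terminal Repair Menu, not through code.

```python
from typing import Callable, Iterable, Optional

# FRU groups in the order MAP 3060 steps 3 through 7 replace them.
# Groups with more than one member are replaced one member at a time.
MAP_3060_ORDER = [
    "passthrough and bypass cards",
    "SSA device cables",
    "DDM",
    "SSA device card",
    "DDM bay frame assemblies",
]


def isolate_degraded_link(
    frus: Iterable[str],
    replace_and_verify: Callable[[str], bool],
) -> Optional[str]:
    """Replace FRUs in order and return the first one whose repair verifies.

    replace_and_verify is a stand-in for the technician's action: it
    returns True when repair verification runs without error.
    Returns None when every FRU was tried and the link is still degraded,
    which corresponds to calling the next level of support.
    """
    for fru in frus:
        if replace_and_verify(fru):
            return fru
    return None


if __name__ == "__main__":
    # Simulated run in which replacing the DDM clears the degraded link.
    fixed_by = isolate_degraded_link(MAP_3060_ORDER, lambda fru: fru == "DDM")
    print("Resolved by replacing:", fixed_by)
```

The ordered list is the only part that differs between MAPs; MAP 3050 uses the same loop with a different FRU order.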
MAP 3077: Isolating an SSA Link Error Between a DDM and two SSA Device Cards
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
MAP 3077: SSA Link Error Between a DDM and two SSA Device Cards
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
An SSA link between a DDM and two SSA device cards is failing. The failing link includes two SSA device cards, one bypass card, one passthrough card, three SSA cables, and the DDM bay backplane. See Figure 64 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to fail. Other failures can also cause the link to fail. For example, a hot reset line to the SSA device card can cause the connection between the two loop inputs to appear to be open.
Figure 64. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008141l)
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option. 2. Write the following information on a piece of paper. a. The Problem ID of this problem. b. The number of the failing cluster, cluster 1 or 2. c. The number of the other cluster: v If cluster 1 is the failing cluster, record the other cluster as cluster 2. v If cluster 2 is the failing cluster, record the other cluster as cluster 1. 3. Press F3 on the service terminal to list other problems. Are there any other problems whose Failing Cluster is the other cluster written down in step 2c? v Yes, repair and verify them now. Repairing these problems may correct this problem. After repair verification, continue with the next step. v No, continue with step 5 on page 189 4. Did the repair of the other problems resolve the problem recorded in the last step (problem ID not displayed)? v Yes, this problem is resolved. v No, continue with the next step.
MAP 3077: SSA Link Error Between a DDM and two SSA Device Cards
5. Return to the original problem. Select one of the SSA device cards from the Possible FRU to Replace list. Continue through the repair and verify process but do not replace any FRU.
   Did the verification test run without error?
   - Yes, the problem is resolved. This problem was caused by a condition that has now been resolved.
   - No, continue with the next step.
6. Determine if the SSA cables to the failing DDM bay have just been changed or installed.
   Have the SSA cables just been changed or installed?
   - Yes, verify that the SSA cables are connected correctly; continue with the next step.
   - No, continue with step 9 on page 190.
7. Verify that the SSA cables are connected correctly. Locate all three SSA cables displayed by the service terminal as possible FRUs. Each of these SSA cables is connected between a DDM bay and an SSA device card. The service terminal FRU Location identifies the DDM bay and SSA connector where each end of the SSA cable is connected.
   Note: To locate the DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 65. To locate an SSA device card cable connector, see Figure 66 on page 190.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
[Figure labels: Front View; I/O Drawer 1 (R1-T1); I/O Drawer 2 (R1-T2); SSA Device Cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
Figure 66. Cluster SSA Device Card SSA Connector Locations (s009166)
   Are any of the cables connected wrong?
   - Yes, connect the cables to the correct connectors, then continue with the next step.
     Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
   - No, go to step 9.
8. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable you just connected correctly. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, continue with the next step.
9. Locate the DDM bay; it may be located in the front or rear of the 2105. Observe all of the DDM bay DDM Ready and Check indicators.
   Are any of the DDM bay DDM indicators on?
   - Yes, go to step 10 on page 191.
   - No, there is a DDM bay problem; go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
10. Replace the DDM displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in chapter 4 of Volume 2.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
11. Replace one of the SSA device cards displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in chapter 4 of Volume 2.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
12. Replace the other SSA device card displayed on the service terminal, then verify the repair.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; continue with the next step.
13. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
    Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure labels: jumper pins 2 to 3 at the top and bottom jumper positions; pins numbered 3, 2, 1]
Figure 68. DDM bay Bypass Card Jumper Settings (s009436)
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, continue with the next step.
14. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, continue with the next step.
15. Replace the first SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 7 on page 189.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, go to the next step.
16. Replace the second SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 7 on page 189.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, go to the next step.
17. Replace the third SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 7 on page 189.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, go to the next step.
18. Replace the backplane in the DDM bay, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of the Volume 2.
    Note: For a DDM bay, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing, call the next level of support.
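Steps 2 and 3 of this MAP ask you to record the other cluster and then look for problems whose Failing Cluster is that cluster. The following is a minimal Python sketch of that bookkeeping, not part of the service terminal software. The Problem record and both function names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Problem:
    """Hypothetical record for one entry in the service terminal problem list."""
    problem_id: str
    failing_cluster: int  # 1 or 2


def other_cluster(failing_cluster: int) -> int:
    """Step 2c: cluster 2 if cluster 1 is failing, cluster 1 if cluster 2 is failing."""
    if failing_cluster not in (1, 2):
        raise ValueError("cluster must be 1 or 2")
    return 2 if failing_cluster == 1 else 1


def problems_to_repair_first(current: Problem, listed: List[Problem]) -> List[Problem]:
    """Step 3: other problems whose failing cluster is the other cluster."""
    target = other_cluster(current.failing_cluster)
    return [
        p for p in listed
        if p.problem_id != current.problem_id and p.failing_cluster == target
    ]
```

An empty result corresponds to the "No" branch of step 3. A non-empty result lists the problems to repair and verify before returning to the original problem.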
MAP 3078: Isolating a Degraded SSA Link Between a DDM and Two SSA Device Cards
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A 40 MB per second SSA link between a DDM and two SSA device cards is degraded and is running at 20 MB per second. The degraded link includes two SSA device cards, one bypass card, one passthrough card, three SSA cables, and the DDM bay backplane. See Figure 69 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to run at a slower speed.
Figure 69. SSA Link Failure, Passthrough and Bypass Card Link Between a DDM and SSA Device Card (S008141l)
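The Description above separates a full-speed link from a degraded one by its running speed. The following is a minimal Python sketch of that distinction, assuming the speed is available as an integer number of megabytes per second. The constant names and the classify_link function are hypothetical, and the service terminal may report link speed differently.

```python
# Speeds named in the Description of this MAP (megabytes per second).
FULL_SPEED_MB_PER_S = 40
DEGRADED_MB_PER_S = 20


def classify_link(speed_mb_per_s: int) -> str:
    """Classify an SSA link by the speed it is running at (illustrative only)."""
    if speed_mb_per_s == FULL_SPEED_MB_PER_S:
        return "full speed"
    if speed_mb_per_s == DEGRADED_MB_PER_S:
        return "degraded"
    # Any other value is outside what this MAP describes.
    return "unknown"
```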
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate all three SSA cables displayed by the service terminal as possible FRUs. Each of these SSA cables is connected between a DDM bay and an SSA device card. The service terminal FRU Location identifies the DDM bay and SSA connector where each end of the SSA cable is connected.
Note: To locate the DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 70. To locate an SSA device card cable connector, see Figure 71.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
[Figure labels: Front View; I/O Drawer 1 (R1-T1); I/O Drawer 2 (R1-T2); SSA Device Cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
Figure 71. Cluster SSA Device Card SSA Connector Locations (s009166)
Disconnect both ends of each of these SSA cables. Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   Inspect the cable connectors for bent pins and correct any problems found. There should be three pins in each plug. If there are fewer than three pins, replace the cable. Reconnect both ends of the SSA device cables and ensure a good connection. Continue with the next step.
3. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any of the cables. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, continue with the next step.
4. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure labels: DDM Bay Bypass Card; jumper pins 2 to 3 at the top and bottom jumper positions; pins numbered 3, 2, 1]
Figure 72. DDM bay Bypass Card Jumper Settings (s009436)
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; continue with the next step.
5. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded; continue with the next step.
6. Replace the first SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 2 on page 193.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded, go to the next step.
7. Replace the second SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 2 on page 193.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded, go to the next step.
8. Replace the third SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 2 on page 193.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded, go to the next step.
9. Replace the DDM displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in chapter 4 of the Volume 2.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
   - No, the SSA link is still degraded, continue with the next step.
10. Replace one of the SSA device cards displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in chapter 4 of the Volume 2.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still degraded, continue with the next step.
11. Replace the other SSA device card displayed on the service terminal, then verify the repair.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still degraded, continue with the next step.
12. Replace the backplane in the DDM bay, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of the Volume 2.
    Note: For a DDM bay, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem.
    - No, the SSA link is still degraded, call the next level of support.
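Steps 4 through 12 of this MAP follow one pattern: replace one FRU, run repair verification, and stop as soon as verification passes. The following is a minimal Python sketch of that loop. The FRU names are taken from the steps above, but the list name, the isolate function, and the replace_and_verify callback are hypothetical stand-ins for the Repair Menu actions, not a real interface.

```python
from typing import Callable, Optional

# FRU replacement order for MAP 3078, one entry per step.
MAP_3078_FRU_ORDER = [
    "bypass card",                         # step 4
    "passthrough card",                    # step 5
    "first SSA device cable",              # step 6
    "second SSA device cable",             # step 7
    "third SSA device cable",              # step 8
    "DDM",                                 # step 9
    "one SSA device card",                 # step 10
    "other SSA device card",               # step 11
    "DDM bay frame assembly (backplane)",  # step 12
]


def isolate(replace_and_verify: Callable[[str], bool]) -> Optional[str]:
    """Replace FRUs in order until repair verification runs without error.

    replace_and_verify is a hypothetical callback that performs one
    replacement and returns True when verification passes.
    Returns the FRU that resolved the problem, or None when every FRU has
    been replaced and the link is still degraded (call the next level of
    support).
    """
    for fru in MAP_3078_FRU_ORDER:
        if replace_and_verify(fru):
            return fru
    return None
```

The order is kept as data because MAP 3077 walks a similar set of FRUs in a different order, so only the list would change.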
MAP 3085: Isolating an SSA Link Error Between Two SSA Device Cards Connected Through a DDM Bay
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
An SSA link failed between two SSA device cards connected through a DDM bay. The failing FRU is one of the FRUs displayed in the FRU list. See Figure 73 for the relationship of these FRUs.
Figure 73. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine if the SSA cables to the failing DDM bay have just been changed or installed.
   Have the SSA cables just been changed or installed?
   - Yes, verify that the SSA cables are connected correctly, go to step 3.
   - No, continue with step 5 on page 198.
3. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay. See Locating an SSA Cable.
   Are any of the cables connected wrong?
   - Yes, connect the cables to the correct connectors, go to step 4.
   - No, go to step 5 on page 198.
4. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable you just connected correctly. Proceed
   through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, go to step 5.
5. Locate the two SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, the SSA cables are connected between a DDM bay and SSA device cards. The service terminal identifies the DDM bays and their SSA connectors, and the SSA device cards and their SSA connectors.
   Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 74. To locate an SSA device card cable connector, see Figure 75 on page 199.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
[Figure labels: Front View; I/O Drawer 1 (R1-T1); I/O Drawer 2 (R1-T2); SSA Device Cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
Figure 75. Cluster SSA Device Card SSA Connector Locations (s009166)
   a. Disconnect the SSA device cable from the cluster SSA device card and the DDM bay.
      Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and ensure a good connection.
   c. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
      Did repair verification run without error?
      - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
      - No, go to step 6.
6. Locate the DDM bay; it may be located in the front or rear of the 2105. Observe all of the DDM bay DDM and card indicators.
   Are any of the DDM bay indicators on?
   - Yes, go to step 7.
   - No, there is a DDM bay problem; go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
7. Replace the first SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing; go to step 8 on page 200.
8. Replace the other SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing; go to step 9.
9. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure labels: DDM Bay Bypass Card; jumper pins 2 to 3 at the top and bottom jumper positions; pins numbered 3, 2, 1]
Figure 76. DDM bay Bypass Card Jumper Settings (s009436)
   Did repair verification run without error?
   - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   - No, the SSA link is still failing; go to step 10.
10. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; go to step 11.
11. Replace the SSA device cables displayed on the service terminal one at a time, then verify each repair.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, if you have not replaced the other cable, replace it and verify the repair. If both cables have been replaced and the SSA link is still failing, go to step 12.
12. Replace the frame (DDM bay) assembly displayed on the service terminal:
    - DDM bay frame assembly, see Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
    Did repair verification run without error?
    - Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
    - No, the SSA link is still failing; call the next level of support.
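The Note in step 5 of this MAP gives the SSA device card cable connector format as R1-Tx-P2-Ix/yy. The following is a minimal Python sketch that splits such a location string into its fields. It assumes the slot (the x in Ix) is written as a decimal number, which the Note does not state explicitly. The SsaConnector type and parse_connector function are hypothetical names for illustration.

```python
import re
from typing import NamedTuple


class SsaConnector(NamedTuple):
    """Fields of an SSA device card cable connector location (illustrative)."""
    cluster: int    # Tx: cluster 1 or 2
    slot: int       # Ix: SSA device card slot (assumed numeric)
    connector: str  # yy: A1, A2, B1, or B2


# R1 and P2 (cluster planar) are fixed parts of the format given in the Note.
_LOCATION = re.compile(r"^R1-T([12])-P2-I(\d+)/(A1|A2|B1|B2)$")


def parse_connector(location: str) -> SsaConnector:
    """Parse a string such as 'R1-T2-P2-I3/B1' into its fields."""
    match = _LOCATION.match(location.strip())
    if match is None:
        raise ValueError(f"not in R1-Tx-P2-Ix/yy format: {location!r}")
    return SsaConnector(
        cluster=int(match.group(1)),
        slot=int(match.group(2)),
        connector=match.group(3),
    )
```

A string that names a cluster other than 1 or 2, or a connector other than A1, A2, B1, or B2, raises ValueError rather than returning a partial result.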
MAP 3086: Isolating a Degraded SSA Link Between Two SSA Device Cards Connected Through a DDM Bay
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A 40 MB per second SSA link between two SSA device cards connected through a DDM bay is degraded and is running at 20 MB per second. The degraded FRU is one of the FRUs displayed in the FRU list. See Figure 77 for the relationship of these FRUs.
Figure 77. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S007649l)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the two SSA cables displayed on the service terminal as possible FRUs. For this isolation procedure, the SSA cables are connected between a DDM bay and SSA device cards. The service terminal identifies the DDM bays and their SSA connectors, and the SSA device cards and their SSA connectors.
Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 78. To locate an SSA device card cable connector, see Figure 79.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   - Tx is the cluster, 1 or 2
   - P2 is the cluster planar
   - Ix is the SSA device card location (slot)
   - yy is the cable connector: A1, A2, B1, or B2
[Figure labels: Front View; I/O Drawer 1 (R1-T1); I/O Drawer 2 (R1-T2); SSA Device Cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
Figure 79. Cluster SSA Device Card SSA Connector Locations (s009166)
a. Disconnect the SSA device cables from the cluster SSA device cards and the DDM bay. Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins. If there are fewer than three pins, replace the cable. Reconnect both ends of the SSA device cables and ensure a good connection.
   c. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
      Did repair verification run without error?
      - Yes, the problem is resolved. Go to step 9 on page 204.
      - No, continue with the next step.
3. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure labels: DDM Bay Bypass Card; jumper pins 2 to 3 at the top and bottom jumper positions; pins numbered 3, 2, 1]
Figure 80. DDM bay Bypass Card Jumper Settings (s009436)
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9 on page 204.
   - No, the SSA link is still degraded; continue with the next step.
4. Replace the passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9 on page 204.
   - No, the SSA link is still degraded; continue with the next step.
5. Replace the SSA device cables displayed on the service terminal one at a time, then verify each repair.
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9 on page 204.
   - No, if you have not replaced the other cable, replace it and verify the repair. If both cables have been replaced and the SSA link is still degraded, go to step 6.
6. Replace the first SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9.
   - No, the SSA link is still degraded; continue with the next step.
7. Replace the other SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9.
   - No, the SSA link is still degraded; continue with the next step.
8. Replace the frame (DDM bay) assembly displayed on the service terminal:
   - DDM bay frame assembly, see Frame Assembly, DDM Bay in chapter 4 of Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
   Did repair verification run without error?
   - Yes, the problem is resolved. Go to step 9.
   - No, the SSA link is still degraded; call the next level of support.
9. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
MAP 3095: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
An SSA link between two DDMs is failing. The DDMs are in separate DDM bays. The failing link goes through two passthrough cards, a bypass card, SSA cable(s), and possibly an SSA device adapter. See Figure 81 for the relationship of these FRUs. The failure or incorrect connection of any of these components can cause the link to fail. Other failures can also cause the link to fail. For example, a hot reset line to the SSA device card can cause the connection between the two loop inputs to appear to be open.
[Figure labels: SSA Device Cable; Bypass Card; DDM; Passthrough Card (two)]
Figure 81. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (S008140l)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   - If the FRU is listed and is selectable in the problem, use that method first.
   - If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Write the following information on a piece of paper.
   a. The Problem ID of this problem.
   b. The number of the failing cluster, cluster 1 or 2.
   c. The number of the other cluster:
      - If cluster 1 is the failing cluster, record the other cluster as cluster 2.
      - If cluster 2 is the failing cluster, record the other cluster as cluster 1.
3. Press F3 on the service terminal to list other problems.
   Are there any other problems whose Failing Cluster is the other cluster written down in step 2c?
   - Yes, repair and verify them now. Repairing these problems may correct this problem. After repair verification, continue with the next step.
   - No, go to step 6.
4. Did the repair of the other problems resolve the problem recorded in the last step (problem ID not displayed)?
   - Yes, this problem is resolved.
   - No, continue with the next step.
5. Return to the original problem. Select the SSA device card from the Possible FRU to Replace list. Continue through the repair and verify process but do not replace any FRU.
   Did the verification test run without error?
   - Yes, the problem is resolved. This problem was caused by another problem that has now been resolved.
   - No, continue with the next step.
6. Determine if the SSA cables to the failing DDM bay have just been changed or installed.
   Have the SSA cables just been changed or installed?
   - Yes, verify that the SSA cables are connected correctly; continue with the next step.
   - No, continue with step 11 on page 207.
7. Locate the SSA cables displayed on the service terminal as possible FRUs. One of these SSA cables is connected between two separate DDM bays. The service terminal identifies the DDM bay and SSA connector that each end of the SSA cable is connected to.
   Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 82.
   Is the SSA cable connected to the correct connectors?
   - Yes, continue with the next step.
   - No, connect the cable correctly. Continue with the next step.
     Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
     After the cable is connected correctly, go to step 10 on page 207.
8. Disconnect both ends of the SSA device cable.
   Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and ensure a good connection. Continue with the next step.
9. Locate the two remaining SSA cables in the Possible FRU list. These SSA cables will be connected between a DDM bay and an SSA device card. The service terminal will identify the DDM bay and its SSA connector, and the SSA device card and its SSA connector. To locate the DDM bay end of the SSA cable, see the instructions in step 7.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where:
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
   • Ix is the SSA device card location (slot)
   • yy is the cable connector, A1, A2, B1, or B2
   To locate an SSA device card cable connector, see Figure 83.
   Are the SSA cables connected to the correct connectors?
   • Yes, go to step 11.
   • No, connect the cable correctly. After the cable is connected correctly, go to step 10.
[Figure 83. Cluster SSA Device Card SSA Connector Locations (s009166). Front view of I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2); SSA device card locations are labeled Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12.]
10. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any cable in the Possible FRUs to Replace list. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
    Did verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, continue with the next step.
11. Replace the SSA device card displayed on the service terminal, then verify the repair. See SSA Service Card, Cluster in chapter 4 of Volume 2.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
12. Replace the first of the two DDMs displayed on the service terminal, then verify the repair. See SSA Disk Drive Module, DDM Bay in chapter 4 of Volume 2.
    Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first, see DDM Bay Disk Drive Module Indicators on page 23.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
13. Replace the second DDM displayed on the service terminal with the DDM removed in the last step, then verify the repair. See SSA Disk Drive Module, DDM Bay in chapter 4 of Volume 2.
Problem Isolation Procedures, CHAPTER 3
    Note: It may take many hours before the second DDM can be replaced. The service terminal will determine if the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing time for 18 GB DDMs can be up to 36 hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
14. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
    Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 84. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
15. Replace the first passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
16. Replace the second passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Use the card removed in the last step.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22 on page 209.
    • No, the SSA link is still failing, continue with the next step.
17. Replace the SSA device cable that connects the two DDM bays. This cable is displayed in the FRU list on the service terminal. To locate the cable, see step 7 on page 206.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22.
    • No, the SSA link is still failing, continue with the next step.
18. Replace the second SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 9 on page 206.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22.
    • No, the SSA link is still failing, continue with the next step.
19. Replace the third SSA device cable displayed on the FRU list on the service terminal. To locate the cable, see step 9 on page 206.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22.
    • No, the SSA link is still failing, continue with the next step.
20. Replace the frame assembly (backplane) in DDM bay A, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22.
    • No, the SSA link is still failing, continue with the next step.
21. Replace the frame assembly (backplane) in DDM bay B, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 22.
    • No, the SSA link is still failing, call the next level of support.
22. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
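The connector location format given in the note to step 9 (R1-Tx-P2-Ix/yy) can be split into its fields mechanically, which may help when cross-checking a location shown on the service terminal against the cabling. The following is a minimal sketch only, not part of the 2105 service tooling: the names SsaCardConnector and parse_connector are hypothetical, the example location string is made up, and the sketch assumes that the slot (Ix) is written as a plain decimal number.

```python
import re
from typing import NamedTuple


class SsaCardConnector(NamedTuple):
    """Fields of an SSA device card cable connector location (hypothetical type)."""
    cluster: int    # Tx: cluster 1 or 2
    slot: int       # Ix: SSA device card slot (assumed to be numeric)
    connector: str  # yy: A1, A2, B1, or B2


# Pattern for R1-Tx-P2-Ix/yy as described in the note to step 9.
# P2 is the cluster planar and is fixed in this format.
_LOCATION = re.compile(r"R1-T([12])-P2-I(\d+)/(A1|A2|B1|B2)")


def parse_connector(location: str) -> SsaCardConnector:
    """Split a location such as 'R1-T1-P2-I3/A1' into cluster, slot, connector."""
    match = _LOCATION.fullmatch(location.strip())
    if match is None:
        raise ValueError(
            f"not an SSA device card connector location: {location!r}"
        )
    cluster, slot, connector = match.groups()
    return SsaCardConnector(int(cluster), int(slot), connector)


if __name__ == "__main__":
    # Made-up example location, for illustration only.
    print(parse_connector("R1-T2-P2-I4/B1"))
```

Rejecting anything that does not match the full pattern, rather than guessing, keeps a mistyped location from being silently read as a valid connector.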
MAP 3096: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays and an SSA Device Card
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A 40 MB per second SSA link between two DDMs is degraded and is running at 20 MB per second. The DDMs are in separate DDM bays. The degraded link goes through two passthrough cards, a bypass card, and an SSA cable. See Figure 85 for the relationship of these FRUs. Degradation of any one of these components can cause the link to run slower.
[Figure 85. SSA Link Degraded, Two Passthrough and Bypass Card Link Between Two DDMs (S008384l). Labeled parts: SSA device cable, bypass card, DDM, and two passthrough cards.]
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. This SSA cable will be connected between two separate DDM bays. The service terminal will identify the DDM bay and SSA connector that each end of the SSA cable is connected to. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 86. Continue with the next step.
3. Disconnect both ends of the SSA device cable.
   Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins. If there are fewer than three pins, replace the cable. Reconnect both ends of the SSA device cable and ensure a good connection. Continue with the next step.
4. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any cable in the Possible FRUs to Replace list.
   Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did verification run without error?
   • Yes, the problem is resolved. Go to step 13 on page 212.
   • No, continue with the next step.
5. Replace the bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 87. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 13 on page 212.
   • No, the SSA link is still degraded, continue with the next step.
6. Replace the first passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 13 on page 212.
   • No, the SSA link is still degraded, continue with the next step.
7. Replace the second passthrough card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Use the card removed in the last step.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 13 on page 212.
   • No, the SSA link is still degraded, continue with the next step.
8. Replace the SSA device cable that connects the two DDM bays. This cable is displayed in the FRU list on the service terminal. To locate the cable, see step 2 on page 210.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 13 on page 212.
   • No, the SSA link is still degraded, continue with the next step.
9. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 13.
   • No, the SSA link is still degraded, continue with the next step.
10. Replace the second DDM displayed on the service terminal with the DDM removed in the last step, then verify the repair. See SSA Disk Drive Module, DDM Bay in chapter 4 of Volume 2.
    Note: It may take many hours before the second DDM can be replaced. The service terminal will determine if the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing time for 18 GB DDMs can be up to 36 hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 13.
    • No, the SSA link is still degraded, continue with the next step.
11. Replace the frame assembly (backplane) in DDM bay A, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 13.
    • No, the SSA link is still degraded, continue with the next step.
12. Replace the frame assembly (backplane) in DDM bay B, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: For DDM bays, the backplanes are replaced by replacing the frame assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 13.
    • No, the SSA link is still degraded, call the next level of support.
13. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
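Steps 5 through 12 of this MAP follow one pattern: replace the next FRU in a fixed order, run repair verification, and stop at the first replacement that verifies without error. The sketch below restates that control flow for illustration only. It is not service terminal code; the function name isolate_by_replacement and the replace_and_verify callback are hypothetical, and the actual replacement and verification are always performed through the Repair Menu options.

```python
from typing import Callable, Iterable, Optional


def isolate_by_replacement(
    frus: Iterable[str],
    replace_and_verify: Callable[[str], bool],
) -> Optional[str]:
    """Replace FRUs in order and return the first one whose verification passes.

    Returns None when every replacement still fails verification, which in
    the MAP corresponds to calling the next level of support.
    """
    for fru in frus:
        if replace_and_verify(fru):
            return fru
    return None


# FRU order taken from steps 5 through 12 of MAP 3096.
MAP_3096_ORDER = [
    "bypass card",
    "first passthrough card",
    "second passthrough card",
    "SSA device cable between the two DDM bays",
    "first DDM",
    "second DDM",
    "frame assembly, DDM bay A",
    "frame assembly, DDM bay B",
]

if __name__ == "__main__":
    # Hypothetical outcome: pretend that replacing the cable clears the problem.
    fixed_by = isolate_by_replacement(
        MAP_3096_ORDER,
        lambda fru: fru.startswith("SSA device cable"),
    )
    print(fixed_by or "call the next level of support")
```

The sketch leaves out details that matter in practice, such as waiting for sparing to complete before the second DDM is replaced and reusing a removed part in the next position.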
MAP 3100: Isolating an SSA Link Error Between Two DDMs in Separate DDM Bays
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
The SSA link between two DDMs is failing. The failing link runs between two DDMs in different DDM bays, through two passthrough and/or bypass cards and the SSA cable that links them. See Figure 88 for the relationship of the DDM, passthrough and/or bypass card, and backplane FRUs involved with this failure.
DDM locations in DDM bays:
   • DDM 1 or 8
[Figure 88. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437). Labeled parts: DDM Bay-A and DDM Bay-B, each with one DDM.]
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM as the FRU.
   Besides the problem you are working on, are there any other pending or open problems with a single DDM as the FRU?
   • Yes, go to step 3.
   • No, go to step 4.
3. Compare the single DDM FRU in the pending or open problem with the DDMs in the problem you are working on.
   Is the DDM in the open or pending problem the same as one of the DDMs in the problem you are working on?
   • Yes, repair the problem with the single DDM FRU first. It should fix the problem you are working on:
      • If the problem is resolved, go to step 17 on page 217.
      • If the problem is not resolved, continue with the next step.
   • No, go to step 4.
4. Determine if the SSA cables to the failing DDM bays have just been changed or installed.
   Have the SSA cables just been changed or installed?
   • Yes, verify that the SSA cables are connected correctly, go to step 5.
   • No, continue with step 7 on page 214.
5. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay. See Locating an SSA Cable.
   Are any of the cables connected incorrectly?
   • Yes, connect the cables to the correct connectors, go to step 6.
   • No, go to step 7.
6. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select the cable you just connected correctly. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did verification run without error?
   • Yes, the problem is resolved. Go to step 17 on page 217.
   • No, go to step 7.
7. Locate DDM bay-A. It may be located in the front or rear of the 2105. Observe all of the DDM bay DDM indicators, see Figure 89.
   Are any of the DDM bay indicators on?
   • Yes, go to step 8.
   • No, there is a DDM bay power problem, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
8. Locate DDM bay-B. It may be located in the front or rear of the 2105. Observe all of the DDM bay DDM indicators, see Figure 89.
   Are any of the DDM bay indicators on?
   • Yes, go to step 9.
   • No, there is a DDM bay power problem, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
9. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable will be connected between two separate DDM bays. The service terminal FRU Location will identify the DDM bay and SSA connector to which each end of the SSA cable is connected. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. Use the drawing below to locate SSA cable connectors on a DDM bay. Select the cable shown on the service terminal for repair.
   a. Disconnect the SSA device cable between the two DDM bays.
      Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. Reconnect both ends of the SSA device cable and ensure a good connection.
   c. Run the repair verification. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any FRU. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 17 on page 217.
   • No, go to step 10.
10. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
    Note: If the amber check indicator on one of the two DDMs is on, replace that DDM first, see DDM Bay Disk Drive Module Indicators on page 23.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17 on page 217.
    • No, the SSA link is still failing, go to step 11.
11. Replace the second DDM displayed on the service terminal with the DDM removed in step 10, then verify the repair.
    Note: The service terminal will determine if the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing time can be many hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17 on page 217.
    • No, the SSA link is still failing, go to step 12.
12. Replace the first of the two passthrough or bypass cards displayed on the service terminal, then verify the repair.
See Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2.
Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 91. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17 on page 217.
    • No, the SSA link is still failing, go to step 13.
13. Replace the second passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Use the card removed in step 12 on page 215.
    Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 92. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17 on page 217.
    • No, the SSA link is still failing, go to step 14.
14. Replace the SSA device cable displayed on the service terminal, see SSA Cables, DDM Bay in chapter 4 of Volume 2.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17 on page 217.
    • No, the SSA link is still failing, go to step 15 on page 217.
15. Replace the backplane in DDM bay-A, see MAP 3400: Replacing a DDM Bay Frame Assembly on page 266, then verify the repair.
    Note: For DDM bays, the backplanes are replaced by replacing the frame (DDM bay) assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17.
    • No, the SSA link is still failing, go to step 16.
16. Replace the backplane in DDM bay-B, see MAP 3400: Replacing a DDM Bay Frame Assembly on page 266, then verify the repair.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 17.
    • No, the SSA link is still failing, call the next level of support.
17. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
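Steps 2 and 3 of this MAP amount to a set comparison: among the other open or pending problems, find those whose only FRU is a single DDM that is also one of the DDMs in the problem being worked on, and repair those first. The sketch below illustrates that check under stated assumptions. The Problem record, its field names, and the DDM location strings used with it are hypothetical and do not reflect how the service terminal stores problems.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Problem:
    """Hypothetical record of a problem and the FRUs it lists."""
    problem_id: str                            # illustrative identifier
    ddms: FrozenSet[str]                       # DDM locations listed as FRUs
    other_frus: FrozenSet[str] = frozenset()   # cards, cables, backplanes

    def is_single_ddm(self) -> bool:
        """True when the only FRU in the problem is one DDM."""
        return len(self.ddms) == 1 and not self.other_frus


def repair_first(current: Problem, open_or_pending: List[Problem]) -> List[Problem]:
    """Return the single-DDM problems whose DDM also appears in `current`."""
    return [
        p
        for p in open_or_pending
        if p.problem_id != current.problem_id
        and p.is_single_ddm()
        and p.ddms <= current.ddms
    ]


if __name__ == "__main__":
    # Made-up problems and locations, for illustration only.
    link_error = Problem("P1", frozenset({"bayA-DDM8", "bayB-DDM1"}),
                         frozenset({"SSA cable"}))
    others = [
        Problem("P2", frozenset({"bayA-DDM8"})),   # same DDM: repair first
        Problem("P3", frozenset({"bayC-DDM3"})),   # unrelated DDM
    ]
    print([p.problem_id for p in repair_first(link_error, others)])
```

An empty result corresponds to the "No" branches of steps 2 and 3, in which case the procedure continues with step 4.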
MAP 3101: Isolating a Degraded SSA Link Between Two DDMs in Separate DDM Bays
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
The 40 MB per second SSA link between two DDMs is degraded and is running at 20 MB per second. The degraded link runs between two DDMs in different DDM bays, through two passthrough and/or bypass cards and the SSA cable that links them. See Figure 93 for the relationship of the DDM, passthrough and/or bypass card, and backplane FRUs involved with this failure.
DDM locations in DDM bays:
   • Both are DDM 8
[Figure 93. SSA Link Failure, Passthrough/Bypass Cards and Two DDMs (s009437). Labeled parts: DDM Bay-A and DDM Bay-B, each with one DDM.]
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable will be connected between two separate DDM bays. The service terminal FRU Location will identify the DDM bay and SSA connector to which each end of the SSA cable is connected. To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of Volume 3. Use the drawing below to locate SSA cable connectors on a DDM bay. Select the cable shown on the service terminal for repair.
   a. Disconnect both ends of the SSA device cable between the two DDM bays.
      Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. There should be six pins in each plug. If there are fewer than six pins, replace the cable.
   c. Reconnect both ends of the SSA device cable and ensure a good connection.
   d. Run the repair verification. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any FRU. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10 on page 220.
   • No, continue with the next step.
3. Replace the first of the two passthrough or bypass cards displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 95. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10 on page 220.
   • No, the SSA link is still degraded, continue with the next step.
4. Replace the second passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2. Use the card removed in step 3 on page 218.
   Note: Verify the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure 96. DDM bay Bypass Card Jumper Settings (s009436). Shows the DDM bay bypass card with a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10 on page 220.
   • No, the SSA link is still degraded, continue with the next step.
5. Replace the SSA device cable displayed on the service terminal, see SSA Cables, DDM Bay in chapter 4 of Volume 2.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10 on page 220.
   • No, the SSA link is still degraded, go to step 6 on page 220.
6. Replace the first of the two DDMs displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10.
   • No, the SSA link is still degraded, continue with the next step.
7. Replace the second DDM displayed on the service terminal with the DDM removed in step 6, then verify the repair.
   Note: The service terminal will determine if the second DDM being replaced is in the same array as the first DDM. If both DDMs are in the same array, the service terminal will instruct you to wait for sparing to complete. When sparing for the first DDM replacement completes, the second DDM can be replaced. DDM sparing time can be many hours. Sparing time varies with system usage and the storage capacity of the DDM being spared.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10.
   • No, the SSA link is still degraded, continue with the next step.
8. Replace the backplane in DDM bay-A, see MAP 3400: Replacing a DDM Bay Frame Assembly on page 266, then verify the repair.
   Note: For DDM bays, the backplanes are replaced by replacing the frame (DDM bay) assembly.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10.
   • No, the SSA link is still degraded, continue with the next step.
9. Replace the backplane in DDM bay-B, see MAP 3400: Replacing a DDM Bay Frame Assembly on page 266, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 10.
   • No, the SSA link is still degraded, call the next level of support.
10. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
Description
An SSA link failed between a DDM and the SSA device card. See Figure 97 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable and SSA device card FRUs involved with this failure.
[Figure 97. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438)]
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check whether any other problems (pending or open) have a single DDM or SSA device card as the FRU.
   Besides the problem you are working on, are there any other pending or open problems with a single DDM or SSA device card as the FRU?
   • Yes, go to step 3.
   • No, go to step 4.
3. Compare the single DDM or SSA device card FRU in the pending or open problem with the FRUs in the problem you are working on.
   Is the FRU in the open or pending problem the same as the FRU in the problem you are working on?
   • Yes, repair the open or pending problem with the single FRU first. It should fix the problem you are working on.
   • No, go to step 4.
4. Determine if the SSA cables to the failing DDM bay have just been changed or installed.
   Have the SSA cables just been changed or installed?
   • Yes, verify that the SSA cables are connected correctly, go to step 5.
   • No, continue with step 7 on page 222.
5. Verify that the SSA cables are connected correctly. Look at the SSA cables displayed on the Detail Problem screen. Compare the SSA cables displayed with the cabling of the DDM bay. See Locating an SSA Cable in chapter 7 of Volume 3.
   Are any of the SSA cables connected incorrectly?
8. Intermittent SSA link errors can reopen problems that were corrected earlier with a successful FRU replacement. In steps 9 through 13 on page 223, skip any step that is the same as an earlier step that had a successful repair. Continue with the next step.
9. Replace the DDM displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 14 on page 223.
   • No, the SSA link is still failing, go to step 10.
10. Replace the SSA device card displayed on the service terminal, then verify the repair.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 14 on page 223.
    • No, the SSA link is still failing, go to step 11.
11. Replace the passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of Volume 2.
[Figure 99. DDM Bay Bypass Card Jumper Settings (s009436). Shows a jumper on pins 2 to 3 at both the top and the bottom jumper position.]
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 14.
    • No, the SSA link is still failing, go to step 12.
12. Replace the SSA device cable displayed on the service terminal probable FRU list, then verify the repair.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 14.
    • No, the SSA link is still failing, go to step 13.
13. Replace the DDM bay frame assembly displayed on the service terminal, then verify the repair. See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
    Note: The DDM bay backplane is replaced by replacing the DDM bay frame assembly.
    Did repair verification run without error?
    • Yes, the problem is resolved. Go to step 14.
    • No, the SSA link is still failing, call the next level of support.
14. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
Description
A 40 MB per second SSA link between a DDM and the SSA device card is degraded and is running at 20 MB per second. See Figure 100 for the relationship of the DDM, passthrough or bypass card, backplane, SSA device cable, and SSA device card FRUs involved with this degraded link.
[Figure: link path from the DDM to the SSA device card]
Figure 100. SSA Link Failure, Passthrough or Bypass Card Link Between a DDM and SSA Device Card (s009438)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the SSA cable displayed on the service terminal as a possible FRU. For this isolation procedure, the SSA cable will be connected between a DDM bay and an SSA device card. The service terminal FRU Location will identify the DDM bay and its SSA connector, and the SSA device card and its SSA connector.
   Note: To locate a DDM bay, see Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3. To locate SSA cable connectors on a DDM bay, see Figure 101. To locate an SSA device card cable connector, see Figure 101 and Figure 102 on page 225.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
[Figure: front view showing I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), with SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
Figure 102. Cluster SSA Device Card SSA Connector Locations (s009166)
   a. Disconnect the SSA device cable from the SSA device card and the DDM bay.
      Note: To prevent damage to cables with plastic connector screws, always use the special screwdriver (SSA tool, P/N 32H7059). This screwdriver is in the 2105 ship group.
   b. Inspect the cable connectors for bent pins and correct any problems found. Each connector should have three pins. If there are fewer than three pins, replace the cable. Reconnect both ends of the SSA device cable and ensure a good connection.
   c. Run the repair verification: go to the Problem Detail screen on the service terminal, select any FRU for replacement, and go through the repair and verification procedure, but do not remove or replace any FRU. This verifies whether the problem is resolved.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8 on page 226.
   • No, continue with the next step.
3. Replace the passthrough or bypass card displayed on the service terminal, then verify the repair. See Bypass and Passthrough Cards, DDM Bay in chapter 4 of the Volume 2.
   Note: Verify that the jumpers on the bypass card are set correctly before installing the new card: jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure: bypass card with two jumper positions, each labeled Jumper Pins 2 to 3]
Figure 103. DDM Bay Bypass Card Jumper Settings (s009436)
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded, continue with the next step.
4. Replace the DDM displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded, continue with the next step.
5. Replace the SSA device card displayed on the service terminal, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded, continue with the next step.
6. Replace the SSA device cable displayed on the service terminal probable FRU list, then verify the repair.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded, continue with the next step.
7. Replace the DDM bay frame assembly displayed on the service terminal. See Frame Assembly, DDM Bay in chapter 4 of the Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM Bay frame assembly.
   Did repair verification run without error?
   • Yes, the problem is resolved. Go to step 8.
   • No, the SSA link is still degraded, call the next level of support.
8. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
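The replacement steps above share one pattern: replace the next FRU in a fixed order, run repair verification, and stop at the first success; if every replacement fails, call the next level of support. The following Python sketch illustrates that flow only. The function name, the `verify` callback, and the FRU labels are hypothetical stand-ins for actions the technician performs at the service terminal; this is not service-terminal code.

```python
from typing import Callable, Optional, Sequence

# FRU order taken from the replacement steps of this MAP (labels are illustrative).
DEGRADED_LINK_FRU_ORDER = (
    "passthrough or bypass card",
    "DDM",
    "SSA device card",
    "SSA device cable",
    "DDM bay frame assembly",
)


def isolate(fru_order: Sequence[str], verify: Callable[[str], bool]) -> Optional[str]:
    """Return the first FRU whose replacement passes verification, else None."""
    for fru in fru_order:
        # verify() stands in for "replace this FRU, then run repair verification".
        if verify(fru):
            return fru
    # No replacement cleared the fault: call the next level of support.
    return None


# Hypothetical run in which replacing the SSA device card clears the fault.
fixed_by = isolate(DEGRADED_LINK_FRU_ORDER, lambda fru: fru == "SSA device card")
print(fixed_by)
```

A return value of `None` corresponds to the final "No" branch of the last replacement step.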
Description
This failure indicates that a DDM failure occurred during an array build. The array needs to be rebuilt.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Repair any other problems before continuing with this MAP.
3. Display the problem and record the information with the FRU Engineering Name. This information should be rank## or ssa##, with ## being a one or two digit number.
4. Record the SRN and the rank or SSA number, then call your next level of support. They will help you and the system operator through the array disband and rebuild.
5. This problem will have to be manually closed after the rebuild is started.
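The FRU Engineering Name format described above (rank## or ssa##, with a one or two digit number) can be checked mechanically before it is reported to support. A minimal Python sketch follows; the function name is hypothetical, and it assumes the name appears in lowercase exactly as written in this MAP.

```python
import re
from typing import Optional, Tuple

# "rank" or "ssa" followed by a one or two digit number, per the step above.
# Lowercase spelling is an assumption based on how the names are printed here.
_ENGINEERING_NAME = re.compile(r"^(rank|ssa)(\d{1,2})$")


def parse_engineering_name(name: str) -> Optional[Tuple[str, int]]:
    """Split a name such as 'rank12' into ('rank', 12); return None if malformed."""
    match = _ENGINEERING_NAME.match(name.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

A `None` result suggests the recorded value was copied incorrectly and should be read from the problem display again.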
Description
This failure indicates that either the hardware or the microcode of a DDM has failed. This MAP will determine which has failed.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Display the problems. From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
3. Review the SRN portion of each one-line problem description.
   Does this same SRN appear in more than one problem?
   • Yes, this is a complex problem that the maintenance procedures are unable to resolve. Call your next level of support.
   • No, select the DDM in this problem for replacement. Follow the service terminal instructions for the replacement of the DDM.
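The decision in step 3 is a duplicate check on the SRN across the open problems. A small Python sketch of that check follows; the function name and the (problem ID, SRN) pairs are hypothetical, standing in for values read off the one-line problem descriptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def srn_in_multiple_problems(problems: Iterable[Tuple[str, str]], srn: str) -> bool:
    """Return True if the given SRN appears in more than one (problem_id, srn) pair."""
    counts = Counter(problem_srn for _problem_id, problem_srn in problems)
    return counts[srn] > 1


# Hypothetical problem list; the IDs and SRN strings are placeholders.
open_problems = [("P1", "SRN-A"), ("P2", "SRN-B"), ("P3", "SRN-A")]
print(srn_in_multiple_problems(open_problems, "SRN-A"))
```

A `True` result corresponds to the "Yes" branch (call the next level of support); `False` corresponds to replacing the DDM.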
Description
The cluster received an unexpected service request number (SRN) from the SSA.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check if there are any other DDM or SSA open problems associated with the failing resource:
   • If there are no other problems to repair, go to step 4.
   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
3. After the problems are repaired:
   • If the unexpected SSA SRN problem is closed, then the repair is complete.
   • If the unexpected SSA SRN problem is still open, then continue with the next step.
4. The problem cannot be corrected with a service procedure.
5. Call your next level of support.
Description
The cluster received unexpected results from the SSA.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
Description
Disk drive module (DDM) still formatting from previous installation or repair.
Isolation
1. Wait for the formatting of the DDM to complete. Formatting is complete when the indicators on the DDM stop flickering.
2. Retry the verification test.
Description
DDM Failure(s) have left array(s) with no spares.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for any other DDM or SSA problems. Display problems needing repair: press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
   • If there are other DDM or SSA problems, repair and test them.
   • If there are not any other DDM or SSA problems, continue with the next step.
3. Call your next level of support.
Description
The array is not available for customer use. There may be multiple problems that can be repaired to restore access. If no problems are found, call your next level of support.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for any other DDM or SSA problems. Display problems needing repair: press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
   • If there are other DDM or SSA problems, repair and test them.
   • If there are not any other DDM or SSA problems, continue with the next step.
3. Call your next level of support.
Description
The DDMs you are attempting to format are members of an existing array that may contain customer data. Formatting these DDMs would destroy that customer data. A possible cause of this condition is that previously configured DDMs or DDM bays were installed without having been properly discontinued when they were removed from their original rack.
Isolation
Call technical support. Do not attempt to resolve this problem without assistance from technical support; customer data may be lost.
Description
Multiple DDMs on an SSA loop cannot be accessed.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check if there are any other open problems:
   Note: Priority should be given to problems with the same ssaxx (SSA device card) or rsDDMxxxx as Failing Resource.
   Note the problem ID of the problem you are working on. To find other problems, press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems
   • If there are no other problems that can be repaired, go to step 4 on page 232.
Description
This procedure:
• Supports the repair and replacement of one or more DDMs at the same time
• Allows only one DDM per loop to be selected at one time so the loop does not have two breaks simultaneously
• Automatically closes the original problem for each DDM that was replaced
• Formats and resumes all the replaced DDMs at the same time
• Automatically disconnects the service login once the format/resume is started
• Opens a new problem if the format and resume fails for a DDM
• Allows you to log back in to monitor the progress of the format and resume
• Prevents further service actions until the format and resume is complete
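The one-DDM-per-loop rule above can be pictured as a grouping problem: DDMs on different loops may be selected together, but two DDMs on the same loop must wait for separate selections. The Python sketch below illustrates only that constraint. The function name, the DDM names, and the loop labels are hypothetical, and the guide does not describe how the service terminal itself sequences the selections.

```python
from typing import Dict, List, Tuple


def plan_selection_rounds(ddms: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (ddm, loop) pairs into rounds holding at most one DDM per loop."""
    rounds: List[Dict[str, str]] = []  # each round maps loop -> selected DDM
    for ddm, loop in ddms:
        for selected in rounds:
            if loop not in selected:
                # This round has no DDM on this loop yet, so the loop stays at one break.
                selected[loop] = ddm
                break
        else:
            # Every existing round already breaks this loop; start a new round.
            rounds.append({loop: ddm})
    return [list(selected.values()) for selected in rounds]


# Hypothetical example: two failing DDMs on loop A and one on loop B.
print(plan_selection_rounds([("ddm1", "A"), ("ddm2", "A"), ("ddm3", "B")]))
```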
Repair
1. From the service terminal Main Service Menu, select:
      Repair Menu
      Multiple DDM Repair
2. Review all problems needing repair and note the location of DDMs that require replacement.
3. Exit to the Repair menu and select:
      Repair / Verify DDM(s) (Multiple DDM Repair on older LIC code levels)
4. Do all of the DDM(s) for repair appear on the list?
   • Yes, continue with the next step.
   • No, go to step 9 on page 233.
5. Are all of the DDM(s) for repair available for selection?
   Note: DDMs which have a pound sign (#) in front of the name cannot be selected at this time.
   • Yes, continue with the next step.
   • No, go to step 10 on page 233.
6. Select the DDMs for repair and follow the instructions on the terminal. Continue with the next step.
Description
Firmware update has detected DDM or DDMs with the enhanced PFA (Predictive Failure Analysis) test. These DDMs have not failed but should be replaced to prevent possible future failures. PFA has been enhanced in this level of code to be more sensitive in detecting conditions that could lead to future drive failures. For this reason, DDMs with no current functional problems may be called out for replacement. At this code level, the number of DDMs called out for replacement may be higher than in previous levels.
Isolation
1. Note the DDMs in the FRU list.
2. If there are other ESC 1216 problems, note the DDMs on those FRU lists.
3. Inform the customer that you would like to replace these DDMs and explain enhanced PFA to them.
4. Replace the DDMs using the Replace FRU menu.
5. After all DDMs have been replaced, cancel all ESC 1216 problems.
MAP 3160: SSA DASD DDM Bay Isolating a Single DDM Redundant Power Fault
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A single DDM is reporting a different redundant power or cooling status than the other DDMs in the same DDM bay.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Display the problem which sent you here.
   Is there a controller card displayed in the FRU list?
   • Yes, select the controller card to replace. Follow the on-screen instructions but do not replace the controller card, just reseat it. Continue with the next step.
   • No, use the Replace a FRU menu item and select the controller card (rs8pkctlrxx) in the same drawer as the DDM displayed in the problem. Follow the on-screen instructions but do not replace the controller card, just reseat it. Continue with the next step.
3. Verify that reseating the controller card resolved the problem. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loop by DDM Bay (Drawer)
   Select the line that has the DDM bay containing the controller card which was just reseated. Press Enter on the next screen; the loop test will run.
   Did loop test run without error?
   • Yes, the problem is resolved. Cancel the problem now.
   • No, the failure is still present, continue with the next step.
4. Display the problem which sent you here.
   Is there a controller card displayed on the list of FRUs?
   • Yes, use the problem to replace the controller card. See Controller Card, DDM Bay in chapter 4 of the Volume 2. Continue with the next step.
   • No, use the Replace a FRU menu item to replace the controller card (rs8pkctlrxx) in the same drawer as the DDM displayed in the problem. See Controller Card, DDM Bay in chapter 4 of the Volume 2. Continue with the next step.
5. Verify that replacing the controller card resolved the problem. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loop by DDM Bay (Drawer)
   Select the line that has the DDM bay containing the DDM displayed on the service terminal. Press Enter on the next screen; the loop test will run.
   Did loop test run without error?
   • Yes, the problem is resolved. Cancel the problem now.
   • No, the failure is still present, continue with the next step.
6. Use the problem to replace the listed DDM. See SSA Disk Drive Model, 7133 Model 020/040 in chapter 4 of the Volume 2.
   Did repair verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, the failure is still present, continue with the next step.
7. Use the problem to replace the DDM Bay frame assembly displayed on the service terminal. See Frame Assembly, DDM Bay in chapter 4 of the Volume 2.
   Note: The DDM bay backplane is replaced by replacing the DDM Bay frame assembly.
   Did repair verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, the failure is still present, call the next level of support.
Description
A controller card has failed in a DDM bay.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
Description
A different drawer has been installed where a DDM bay was expected. All of the drawers on the SSA loop must be uninstalled then reinstalled. If the customer has any data on the SSA loop, they will need to off load the data then reload it after the reinstallation.
Isolation
1. Use the service terminal to locate the drawer displayed as a Possible FRU to Replace. Copy down the FRU Location Description (Rr-Yxx or Rr-Ux-Wx).
2. Locate the improperly installed drawer. Use the location code copied down in the last step. Use Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3 to locate the SSA DASD DDM bay, and to determine which type of drawer is installed at that location. This drawer will need to be removed from the loop and then reinstalled using a DDM bay.
3. Use the service terminal to remove the drawer. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Remove Device Drawers
   Select the drawer line with the Resource Location that matches the location copied down in step 1. Continue through the instructions to remove the drawer.
4. Use the service terminal to install the DDM bay. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Install a Device Drawer
   Follow the install process. Be sure to enter the correct DDM bay information this time.
Description
Installation of DDM bays on loop B failed, when loop A on the same SSA device card had uninstalled DDMs. The SSA cables attached to loop A must be disconnected.
Isolation
1. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy down the FRU location.
2. Locate the cluster and the SSA device card using the information below and in Figure 104.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
   • Ix is the SSA device card location, slot
   • yy is the cable connector, A1, A2, B1, or B2
[Figure: front view showing I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), with SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
3. Disconnect the SSA device cables from SSA device card connectors A1 and A2 on the indicated card. Note: To prevent damage to the SSA device cable connector screws, always use the special screwdriver (SSA tool, P/N 32H7059) to turn them. This screwdriver is in the 2105 ship group.
Problem Isolation Procedures, CHAPTER 3
Description
Installation of DDM bays on loop A failed, when loop B on the same SSA device card had uninstalled DDMs. The SSA cables attached to loop B must be disconnected.
Isolation
1. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy down the FRU location.
2. Locate the cluster and the SSA device card using the information below and in Figure 105.
   Note: The SSA device card cable connector is in the format R1-Tx-P2-Ix/yy, where
   • Tx is the cluster, 1 or 2
   • P2 is the cluster planar
   • Ix is the SSA device card location, slot
   • yy is the cable connector, A1, A2, B1, or B2
[Figure: front view showing I/O Drawer 1 (R1-T1) and I/O Drawer 2 (R1-T2), with SSA device cards at Tx-u0.1-P1-I1, Tx-u0.1-P1-I2, Tx-u0.1-P1-I11, and Tx-u0.1-P1-I12]
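The connector format R1-Tx-P2-Ix/yy given in the Note above can be split into its fields with a short parser, which may help when copying a FRU location from the service terminal. This Python sketch is illustrative only: the names are hypothetical, the one or two digit slot width is an assumption, and the loop letter is taken from the first character of the connector (A1 and A2 on loop A, B1 and B2 on loop B).

```python
import re
from typing import NamedTuple, Optional


class SsaConnector(NamedTuple):
    cluster: int     # Tx: cluster 1 or 2
    slot: int        # Ix: SSA device card slot
    connector: str   # yy: A1, A2, B1, or B2
    loop: str        # loop letter, derived from the connector name


# Slot width (one or two digits) is an assumption, not stated in the Note.
_CONNECTOR = re.compile(r"^R1-T([12])-P2-I(\d{1,2})/([AB][12])$")


def parse_ssa_connector(code: str) -> Optional[SsaConnector]:
    """Parse a code such as 'R1-T1-P2-I2/B1'; return None if it does not match."""
    match = _CONNECTOR.match(code.strip())
    if match is None:
        return None
    connector = match.group(3)
    return SsaConnector(int(match.group(1)), int(match.group(2)), connector, connector[0])


# Hypothetical location code used only to exercise the parser.
print(parse_ssa_connector("R1-T1-P2-I2/B1"))
```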
Description
The wrong number of DDMs were found where eight were expected.
• Disk drive module (DDM) locations in DDM bay: New DDM locations: 1, 2, 3, 4, 5, 6, 7, and 8
[Figure: DDM bay location diagram showing locations 9 through 16 and 1 through 8, each marked N]
Isolation
1. Determine if the SSA cables to the failing DDM bay have just been changed or installed.
   Have the SSA cables just been changed or installed?
   • Yes, continue with step 2.
   • No, continue with step 3 on page 240.
2. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay.
   Are any of the cables connected wrong?
   • Yes, connect the cables to the correct connectors. Use the service terminal to verify that the problem is resolved. Select the cable that was incorrectly connected from the cable list and continue through verification without replacing the cable.
   • No, go to step 3 on page 240.
Description
During a repair or installation, the SSA Loop Verify test could not run from both clusters because one of the clusters is failing. To verify SSA loop operation, the SSA Loop test must be run from both clusters. The other (failing) cluster or cluster communications must be repaired before the SSA loop repair or installation can be completed.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check for open cluster or cluster communications problems.
Description
Before some DASD visual symptom service actions can be completed, this procedure must be done to ensure the status of the 2105 subsystem: Display any related problems shown as needing repair and change their status to closed.
Procedure
Use the description above and these procedures to complete the service action.
1. Display problems needing repair. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
      Repair Menu
      Show / Repair Problems Needing Repair
      Select a Problem to View or Repair
   • Record the Problem ID of all problems with a Failing Resource of rsrpc.....
     Note: To find the Failing Resource, select the problem and display the Detail Problem Record. Scroll down the screen until Failing Resource... is displayed.
   • Press F3 on the service terminal to display the next problem. Record its Problem ID if its Failing Resource is rsrpc..... Repeat this step until all related problem IDs have been recorded.
2. Change the state of the open problem with a Failing Resource of rsrpc.... to Closed. Press F3 on the service terminal until the Main Service Menu is displayed, then select:
Description
Only one DDM bay has sensed a storage cage fan/power sense card failure. The other installed DDM bays, that monitor the same card, did not sense the failure. If the storage cage fan/power sense card was failing, all of the DDM bays should have reported the failure. This indicates that the storage cage fan/power sense card is OK. The fault reporting path, through the DDM bay that reported the failure, is not working correctly.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which DDM bay reported the storage cage fan/power sense card failure and replace its DDM bay controller card. See Controller Card, DDM Bay in chapter 4 of the Volume 2.
   Is the storage cage fan/power sense card problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, continue with the next step.
3. Refer to the following figures to determine the physical location of the DDM bay in which you just replaced the controller card:
   • 2105 Model 800, see Figure 108 on page 244.
   • 2105 Expansion Enclosure, see Figure 109 on page 245.
[Figure: front view and rear view]
Figure 108. 2105 Model 800 DDM Bay Locations (s009136)
[Figure: front view and rear view, with the following DDM bay location codes]
   Storage Cage 1 (-U1-): R2-U1-W5, R2-U1-W6, R2-U1-W7, R2-U1-W8
   Storage Cage 2 (-U2-): R2-U2-W5, R2-U2-W6, R2-U2-W7, R2-U2-W8
   Storage Cage 3 (-U3-): R2-U3-W5, R2-U3-W6, R2-U3-W7, R2-U3-W8
   Storage Cage 4 (-U4-): R2-U4-W5, R2-U4-W6, R2-U4-W7, R2-U4-W8
Figure 109. 2105 Expansion Enclosure DDM Bay Locations (S007741s)
Description
Multiple DDM bays have sensed a storage cage fan/power sense card failure. The storage cage fan/power sense card is the most likely FRU. There is a small chance that the storage cage power planar is failing.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of the Volume 2. After the replacement, verify the repair.
   Is the storage cage fan/power sense card problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 3.
3. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of the Volume 2. After the replacement, verify the repair.
   Is the storage cage fan/power sense card problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, call your next level of support.
MAP 3379: Analyzing a Storage Cage Fan/Power Sense Card Check Summary Indicator On
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
A storage cage fan/power sense card Check Summary indicator is on. This indicator is on when the fan/power sense card detects a problem with one of the storage cage fans or power supplies that it monitors.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to check for open problems. From the service terminal Main Service Menu, select:
      Repair Menu
      Show/Repair Problems Needing Repair Menu
   • If there are any open storage cage fan or power supply faults, select and repair them.
   • If there are not any open storage cage fan or power supply faults, go to the next step.
3. Run the machine test on All SSA Loops. From the service terminal Main Service Menu, select:
      Machine Test Menu
      SSA Loops Menu
      Select SSA Loops by SSA Device Card
      All Loops
   Run the SSA loop test on all SSA loops attached to an SSA device card.
   • If Machine Test found any problems, repair them.
   • If Machine Test did not find any problems, replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of the Volume 2.
     Is the problem resolved?
     - Yes, end call.
     - No, call your next level of support.
Description
Only one DDM bay sensed a storage cage fan/power sense card failure. No other DDM bays are installed in the half-rack being sensed by the storage cage fan/power sense card. The most likely FRUs are the storage cage fan/power sense card or the DDM bay controller card in the reporting DDM bay. The problem could be a failure in the error reporting path.
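The three storage cage fan/power sense card descriptions in this chapter differ only in how many DDM bays reported the failure and how many DDM bays monitor the same card. The Python sketch below restates that reasoning as a small lookup. It is an illustration under stated assumptions: the function name and the returned labels are hypothetical, and the FRU order follows the isolation steps of the corresponding MAPs.

```python
from typing import List


def likely_frus(reporting_bays: int, installed_bays: int) -> List[str]:
    """Suggest FRUs for a sense card failure, most likely first.

    reporting_bays: number of DDM bays that sensed the failure.
    installed_bays: number of installed DDM bays that monitor the same card.
    """
    if reporting_bays < 1 or reporting_bays > installed_bays:
        raise ValueError("reporting_bays must be between 1 and installed_bays")
    if reporting_bays > 1:
        # Multiple bays agree: the sense card is most likely, the power planar less so.
        return ["storage cage fan/power sense card", "storage cage power planar"]
    if installed_bays == 1:
        # Only one bay exists to report, so the card and the reporting path are both suspect.
        return ["storage cage fan/power sense card", "DDM bay controller card"]
    # One bay reports while the others do not: the fault is in that bay's reporting path.
    return ["DDM bay controller card"]


print(likely_frus(reporting_bays=1, installed_bays=4))
```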
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of the Volume 2.
   Is the storage cage fan/power sense card problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 3.
3. Determine which DDM bay reported the storage cage fan/power sense card failure and replace its DDM bay controller card. See Controller Card, DDM Bay in chapter 4 of the Volume 2.
   Is the storage cage fan/power sense card problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 4 on page 248.
Description
A storage cage cooling fan failure has been reported. It could be one of the storage cage fans in the top of the 2105, or one of the two fans in the front of the 2105 Model 800 between the DDM bays. The most likely FRU is the failing fan. The fan fault reporting circuits could also be reporting a false fan error. Note: Every fan connector on the storage cage power planar must be plugged with a fan cable or a fan jumper. If any connector is empty, a false fan error will be created.
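The Note above amounts to a simple check: a fan connector that holds neither a fan cable nor a fan jumper will raise a false fan error. A minimal Python sketch of that check follows. The function name and the state strings are hypothetical, and the connector names in the example are placeholders; this section does not list which planar connectors are fan connectors.

```python
from typing import Dict, List

# States that count as "plugged", per the Note above.
PLUGGED_STATES = {"fan cable", "fan jumper"}


def empty_fan_connectors(connectors: Dict[str, str]) -> List[str]:
    """Return the connectors that hold neither a fan cable nor a fan jumper."""
    return sorted(name for name, state in connectors.items() if state not in PLUGGED_STATES)


# Hypothetical inspection results; the connector names are placeholders.
inspected = {"J-a": "fan cable", "J-b": "empty", "J-c": "fan jumper"}
print(empty_fan_connectors(inspected))
```

Any connector the function returns should be fitted with a fan cable or fan jumper before a fan FRU is suspected.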
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which storage cage fan reported the storage cage fan failure. Locate the failing fan in the 2105. See the following in chapter 7 of the Volume 3:
   • 2105 Model 800 and Expansion Enclosure Storage Cage Fan (Top) Location Codes
[Figure: storage cage power planar showing connectors J12, J13, J14, J15, J18, J21, J22, J23, J24, J25, J28, J31, J33, J34, J35, J36, J37, J38, and J39]
Figure 110. Storage Cage Power Planar Fan Jumper Locations (s008352p)

Connector Number    Storage cage fan location code
J31                 Rx-U1 or U3-F1
249
4. Verify the repair. Return to the service terminal and select the storage cage fan for replacement. Proceed through the repair but do not replace the storage cage fan. This will simulate a repair and run verification.
   Is the storage cage fan problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 6.
5. Replace the failing storage cage fan. See Storage Cage Fan (Top) Removal and Replacement, 2105 Model 800 and Expansion Enclosure or Storage Cage Fan (Center), 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.
   Is the storage cage fan problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 6.
6. Replace the storage cage fan/power sense card. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2.
   Is the storage cage fan problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, continue with the next step.
7. Replace the DDM bay controller card. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
   Is the storage cage fan problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, continue with the next step.
8. Disconnect the cable to the failing fan at the fan and the storage cage power planar. Connect a storage cage fan FRU cable to the fan and the storage cage power planar.
   Is the storage cage fan problem resolved?
   • Yes, use the service terminal to close the problem and end the call.
   • No, go to step 9.
9. Replace the storage cage power planar. See Storage Cage Power Planar, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2.
250
Description
A storage cage power supply failure has been reported. The failure could be the storage cage power supply, its dc input voltage, or its error reporting path.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine which storage cage power supply is failing. To locate the failing power supply, see Rack, 2105 Model 800 and Expansion Enclosure Storage Cage Power Supply Location Codes in chapter 7 of Volume 3.
   Is there a real power supply, not a dummy power supply, installed in the failing power supply location?
   • Yes, go to step 3.
   • No, go to step 15 on page 255.
3. Observe the power switch on the failing storage cage power supply.
   Is the storage cage power supply's power switch set to On (up)?
   • Yes, go to step 4 on page 252.
   • No, set the switch to On (up). Verify the problem has been corrected. If the FRU is listed on the problem details screen, select it and do a pseudo repair (do not actually replace it) of the storage cage power FRU so that the FRU verification tests will run and report the results.
Power Switch
251
Figure 112. Primary Power Supply CB and Connector Locations (S008496l)
[Figure not reproduced. Rear view of the primary power supply; labeled items: CB00, CB1 to CB5, J1 to J4, J5A, J5B, J6, J7-1 to J7-5.]

Failing Storage Cage    CB Check for 2105 Model 800 and         CB Check for Expansion Enclosure
Power Supply (SCPS)     Expansion Enclosure Storage Cages       Storage Cages 3 and 4 (lower)
                        1 and 2 (upper)
SCPS-1                  CB-3                                    CB-1
SCPS-2                  CB-4                                    CB-2
SCPS-3                  CB-3                                    CB-1
SCPS-4                  CB-4                                    CB-2
SCPS-5                  CB-3                                    CB-1
SCPS-6                  CB-4                                    CB-2
Is the input power CB for the failing storage cage power supply tripped (down)?
   • Yes, go to MAP 2520: PPS Output Circuit Breaker Tripped on page 168.
   • No, go to step 9 on page 253.
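The circuit breaker table under Figure 112 follows a regular pattern: odd-numbered supplies map to one breaker and even-numbered supplies to the other, with a different pair for the upper and lower storage cages. The following is a minimal sketch of that lookup, not part of the service procedure. The function name `input_cb` and the `lower_cages` flag are hypothetical names chosen for illustration.

```python
def input_cb(scps: int, lower_cages: bool = False) -> str:
    """Return the input power CB to check for a failing SCPS (1 to 6).

    lower_cages=False: 2105 Model 800 and Expansion Enclosure
                       storage cages 1 and 2 (upper).
    lower_cages=True:  Expansion Enclosure storage cages 3 and 4 (lower).
    """
    if scps not in range(1, 7):
        raise ValueError("SCPS number must be 1 through 6")
    odd = scps % 2 == 1
    if lower_cages:
        return "CB-1" if odd else "CB-2"
    return "CB-3" if odd else "CB-4"


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"SCPS-{n}: upper {input_cb(n)}, lower {input_cb(n, True)}")
```

The lookup only tells you which breaker to inspect; the decision to go to MAP 2520 still depends on whether that breaker is tripped.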
252
SCPS and PPS Connectors to Check
SCPS-1, J2 and PPS-1, J7-3
SCPS-1, J1 and PPS-2, J7-3
SCPS-2, J2 and PPS-1, J7-4
SCPS-2, J1 and PPS-2, J7-4
SCPS-3, J2 and PPS-1, J7-3
SCPS-3, J1 and PPS-2, J7-3
SCPS-4, J2 and PPS-1, J7-4
SCPS-4, J1 and PPS-2, J7-4
SCPS-5, J2 and PPS-1, J7-3
SCPS-5, J1 and PPS-2, J7-3
SCPS-6, J2 and PPS-1, J7-4
SCPS-6, J1 and PPS-2, J7-4
253
SCPS and PPS Connectors to Check
SCPS-1, J2 and PPS-1, J7-1
SCPS-1, J1 and PPS-2, J7-1
SCPS-2, J2 and PPS-1, J7-2
SCPS-2, J1 and PPS-2, J7-2
SCPS-3, J2 and PPS-1, J7-1
SCPS-3, J1 and PPS-2, J7-1
SCPS-4, J2 and PPS-1, J7-2
SCPS-4, J1 and PPS-2, J7-2
SCPS-5, J2 and PPS-1, J7-1
SCPS-5, J1 and PPS-2, J7-1
SCPS-6, J2 and PPS-1, J7-2
SCPS-6, J1 and PPS-2, J7-2
Is the storage cage P.S. cable connected correctly?
   • Yes, go to step 11.
   • No, reseat the cable as required. If the green PWR-1 or -2 Power indicator is now on, the problem is resolved. Use the service terminal to verify the problem and close it. If the green PWR-1 or -2 Power indicator is still off, go to step 11.
11. Swap the two input power cables, J1 and J2, on the rear of the failing storage cage power supply. Observe the status of the PWR-1 and -2 Power indicators.
   Did the PWR-1 and -2 Power indicators swap states (On now Off and Off now On)?
   • Yes, go to step 12.
   • No, replace the storage cage power supply. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. If the problem is not resolved, call your next level of support.
12. Swap the two input power cables, J1 and J2, back to their original positions. Replace the primary P.S. to storage cage P.S. cable associated with the PWR-1 or -2 Power indicator that is Off. See Cables, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. If the problem is not resolved, call your next level of support.
13. Replace the storage cage power supply. See Storage Cage Power Supply, 2105 Model 800 and Expansion Enclosure in chapter 4 of Volume 2. Observe whether the CHK/PWR GOOD indicator is On (green).
   Is the storage cage power supply problem resolved?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, continue with the next step.
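The cable swap in steps 11 and 12 is a fault-following test: if the Off indicator moves when J1 and J2 are exchanged, the fault travels with the cable; if it stays put, the fault is in the storage cage power supply. The sketch below illustrates that reasoning only. The function name and the tuple encoding of the indicator states are assumptions made for this example.

```python
def interpret_swap(before, after):
    """Interpret the J1/J2 cable swap test.

    before, after: (PWR-1 is on, PWR-2 is on) readings taken before and
    after swapping the two input power cables.
    Returns the suspected FRU as a short string.
    """
    states_differ = before[0] != before[1]
    swapped = states_differ and after == (before[1], before[0])
    if swapped:
        # The fault followed the cable: suspect the primary P.S. to
        # storage cage P.S. cable feeding the indicator that is Off.
        return "cable"
    # The indicators did not swap states: suspect the power supply itself.
    return "power supply"


if __name__ == "__main__":
    print(interpret_swap((True, False), (False, True)))
    print(interpret_swap((True, False), (True, False)))
```

Remember that step 12 swaps the cables back to their original positions before the cable is replaced.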
254
Description
SSA DASD DDM bay power problem. A group of storage cage power supplies is failing. The storage cage power supplies shut down when they cannot maintain their output voltage. This can be caused by too few storage cage power supplies or by a short circuit on their output voltage. All of the storage cage power supplies feed a common voltage bus. A short on the bus will affect all attached storage cage power supplies. With this failure, the CHK/POWER GOOD indicators on all associated storage cage power supplies will be On (amber).
Note: The CHK/POWER GOOD indicator can be on with the color amber or green.
   • Amber is CHK (check)
   • Green is POWER GOOD
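The symptom described above can be stated as a simple condition: every supply on the shared bus shows amber at once. The following is an illustrative sketch only; the function name and the use of the strings "amber", "green", and "off" to record indicator states are assumptions for this example.

```python
def all_supplies_in_check(indicator_colors):
    """Return True when every CHK/POWER GOOD indicator in the group is amber.

    indicator_colors: one entry per storage cage power supply on the
    common voltage bus, each "amber" (CHK), "green" (POWER GOOD), or "off".
    An empty list returns False because nothing was observed.
    """
    colors = list(indicator_colors)
    return len(colors) > 0 and all(c == "amber" for c in colors)


if __name__ == "__main__":
    print(all_supplies_in_check(["amber"] * 4))
    print(all_supplies_in_check(["amber", "green", "amber", "amber"]))
```

A True result matches the group failure this MAP isolates; a mix of colors points toward a single supply instead.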
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first.
255
Power Switch
3. Determine if the correct number of storage cage power supplies are installed. Count the DDM bays and the storage cage power supplies installed in the storage cages associated with the failing power supplies (storage cages 1 and 2 or 3 and 4).
Table 33. Storage Cage Power Supply Installation Requirements

Number of DDM Bays Installed    Minimum Number of Storage Cage Power Supplies Required
1 to 8                          4
1 to 8 and 9 to 16              6

Is the correct number of storage cage power supplies installed for the number of DDM bays installed?
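The check against Table 33 can be sketched as below. This is a hedged illustration, not a service tool: it assumes the row "1 to 8 and 9 to 16" means that more than eight DDM bays are installed, and the function names are hypothetical.

```python
def min_supplies_required(ddm_bays: int) -> int:
    """Minimum storage cage power supplies per Table 33.

    Assumption: 1 to 8 bays need 4 supplies; 9 to 16 bays need 6.
    """
    if 1 <= ddm_bays <= 8:
        return 4
    if 9 <= ddm_bays <= 16:
        return 6
    raise ValueError("DDM bay count must be 1 through 16")


def supply_count_ok(ddm_bays: int, supplies_installed: int) -> bool:
    """True when enough storage cage power supplies are installed."""
    return supplies_installed >= min_supplies_required(ddm_bays)


if __name__ == "__main__":
    print(supply_count_ok(8, 4))
    print(supply_count_ok(12, 4))
```

Count only the DDM bays and supplies in the storage cage pair associated with the failing supplies (cages 1 and 2, or 3 and 4).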
256
257
258
259
260
Description
DDM bay power problem. All indicators on a DDM bay are off. This indicates that input power to the DDM bay is missing.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Did you start this service action from a problem displayed on a service terminal?
   • Yes, go to step 5 on page 262.
261
Power Switch
Are all of the storage cage 1 and 2 power supply CHK/POWER GOOD indicators On (amber)?
   • Yes, go to MAP 3391: Isolating a Storage Cage Power System Problem on page 255.
   • No, go to step 8 on page 263.
7. Go to the rear of the 2105 Expansion Enclosure. Locate the storage cage power supplies mounted between storage cages 3 and 4.
262
MAP 3397: Isolating an SSA DASD DDM Bay Controller Card Problem
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: If you are not familiar with SSA DASD service, please reference Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
DDM bay controller card problem. The controller card failure indicator is on.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Did you start this service action from a problem displayed on a service terminal?
   • Yes, go to step 6.
   • No, continue with the next step.
3. Use the service terminal to look for any problems. Repair these problems first, then continue with the next step.
4. Are the symptoms that originally sent you to this MAP repaired?
   • Yes, the problem is resolved. End the service call.
   • No, continue with the next step.
5. Replace the controller card. See Controller Card, DDM Bay in chapter 4 of Volume 2.
6. Determine the location code for the DDM bay in which you just replaced the controller card. The DDM bay location code is in the format: Rx-Uy-Wz.
   Do you know the DDM bay's location code?
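When recording a location code, it can help to check that it matches the Rx-Uy-Wz format before entering it. The sketch below is illustrative only. It assumes that x, y, and z are decimal numbers, which the text does not state, and the function name is hypothetical.

```python
import re

# Assumption: x, y, and z in Rx-Uy-Wz are one or more decimal digits.
_LOCATION = re.compile(r"^R(\d+)-U(\d+)-W(\d+)$")


def parse_ddm_bay_location(code: str):
    """Split a DDM bay location code into its (x, y, z) numbers.

    Raises ValueError when the code is not in the Rx-Uy-Wz format.
    """
    match = _LOCATION.match(code.strip())
    if match is None:
        raise ValueError(f"not in Rx-Uy-Wz format: {code!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(parse_ddm_bay_location("R1-U2-W3"))
```

A rejected code is a prompt to re-read the label or the service terminal display rather than proof that the hardware location is wrong.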
263
Description
The DDM bay controller card has problems communicating with the bypass card or the passthrough cards in the DDM bay. The cause of the failure may be the controller card, the bypass card, one of the passthrough cards, or the DDM bay backplane.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the controller card in the FRU list. Select the controller card and replace it. After replacement, verify the repair:
   • If the problem is resolved, end the call.
   • If the problem is not resolved, continue with the next step.
3. Verify that the controller card check indicator is on (amber). See DDM Bay Indicators on page 21.
   • If the check indicator is on, continue with the next step.
   • If the check indicator is not on, call your next level of support.
4. Select the bypass card from the FRU list for replacement.
   a. Do not disconnect the SSA cables from the bypass card.
264
265
Description
This procedure is used for SSA failures when the service terminal repair process cannot call out the backplane for replacement.
Procedure
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Record the MAP and step number that sent you to this MAP.
3. Verify you are at the SSA link repair screen that did not include the backplane as a FRU.
4. Record the DDM bay number you are repairing.
5. Press F3 on the service terminal until the Repair Menu is displayed, then select:
   Replace a FRU
6. Move the cursor to the DDM bay location for the backplane or frame being replaced, front or back, and press Enter.
7. Replace the selected backplane or frame:
   • DDM bay frame assembly (backplane). See Frame Assembly, DDM Bay in chapter 4 of Volume 2.
8. After the DDM bay frame is replaced, follow the instructions displayed on the service terminal to verify the repair process.
   • If the repair verification runs without error, the problem is resolved.
   • If the SSA link is still failing, look at the MAP and step that sent you to this MAP. If that step is the last step in the procedure, call the next level of support. If there are more steps in the procedure, continue with that MAP.
266
Description
The storage cage fan/power sense card in the bottom half of a 2105 Expansion Enclosure has reported that it has no cage sense card R2 cable installed. This cable is needed for proper control of fan speeds in the 2105 Expansion Enclosure box. The problem can be caused by one of the following:
   • The cage sense card R2 cable is not connected correctly.
   • The cage sense card R2 cable is failing.
   • The lower fan/power sense card is reporting incorrectly.
   • A DDM bay controller card is reporting incorrectly.
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the cage sense card R2 cable that is connected to the upper and lower storage cage fan/power sense cards in the 2105 Expansion Enclosure. Verify that the R2 cable is connected correctly to both sense cards.
   Did you find and fix a problem with the R2 cable?
   • Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If verification is successful, close the problem.
267
MAP 3422: Storage Cage Fan/Power Sense Card R2 Jumper and Cable Problems
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The storage cage fan/power sense card in the top of the 2105 Expansion Enclosure has reported one of the following:
   • Missing cage sense card R2 jumper
   • Missing cage sense card R2 cable
268
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Check if there is a storage cage fan/power sense card in the bottom of the 2105 Expansion Enclosure.
   Is there a lower storage cage fan/power sense card in the 2105 Expansion Enclosure?
   • Yes, go to step 7 on page 270.
   • No, continue with the next step.
3. Inspect the upper storage cage fan/power sense card in the 2105 Expansion Enclosure. Verify that the cage sense card R2 jumper is present and installed correctly on the upper storage cage fan/power sense card.
   Did you find and correct a problem with the R2 jumper?
   • Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If verification is successful, close the problem. If verification fails, continue with the next step.
   • No, continue with the next step.
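The branch in step 2 amounts to deciding which R2 part the upper sense card should carry. The sketch below encodes that decision under a stated assumption: when a lower sense card is present, the R2 cable joins the upper and lower cards, and when it is absent, the R2 jumper sits on the upper card. The steps that would confirm the cable case are not reproduced here, and the function name is hypothetical.

```python
def expected_r2_part(has_lower_sense_card: bool) -> str:
    """Return the R2 part expected on the upper fan/power sense card.

    Assumption: an Expansion Enclosure with a lower sense card uses the
    R2 cable between the two cards; one without uses the R2 jumper.
    """
    return "R2 cable" if has_lower_sense_card else "R2 jumper"


if __name__ == "__main__":
    print(expected_r2_part(False))
    print(expected_r2_part(True))
```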
269
MAP 3423: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Missing Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The storage cage fan/power sense card in 2105 Model 800 has reported that the cage sense card R1 jumper is missing.
270
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Inspect the upper storage cage fan/power sense card in the 2105 Model 800. Verify that the cage sense card R1 jumper is present and installed correctly on the storage cage fan/power sense card.
   Did you find and correct a problem with the R1 jumper?
   • Yes, verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If verification is successful, close the problem. If verification fails, go to step 3 on page 272.
   • No, replace the R1 jumper and verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
271
MAP 3424: Isolating a Storage Cage Fan/Power Sense Card R1 Jumper Failing Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The storage cage fan/power sense card in 2105 Model 800 has reported a failure that is only possible in 2105 Expansion Enclosure. This indicates that the 2105 Model 800 cage sense card R1 jumper is failing.
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
272
MAP 3425: Isolating a Storage Cage Fan/Power Sense Card R2 Cable Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
One of the storage cage fan/power sense cards in the 2105 Expansion Enclosure has reported a line open in the cage sense card R2 cable. This cable connects the upper and lower storage cage fan/power sense cards. The most likely cause of the problem is one of the following:
   • The cage sense card R2 cable is failing.
   • The storage cage fan/power sense card that reported the failure is failing.
   • A DDM bay controller card is reporting incorrectly.
273
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Replace the cage sense card R2 cable, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
3. Replace the storage cage fan/power sense card that was shown as a FRU by the service terminal, then verify the repair. See Storage Cage Fan/Power Sense Card, 2105 Model 800 in chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, continue with the next step.
4. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.
274
MAP 3426: Isolating a Storage Cage Fan/Power Sense Card Location Error
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The machine hardware is reporting different rack location information than was entered manually at the service terminal. The problem must be corrected. The possible causes of this condition are:
   • A cage sense card R2 jumper has mistakenly been plugged onto the storage cage fan/power sense card in the 2105 Model 800.
   • A cage sense card R1 jumper has mistakenly been plugged onto the storage cage fan/power sense card in the top half of the 2105 Expansion Enclosure.
   • A cage sense card is reporting the location incorrectly.
   • A DDM bay controller card is reporting the location incorrectly.
   • The DDM bay location selected by the service support representative for a DDM bay was in the wrong 2105, and needs to be changed.
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Figure 120. Fan Sense Card Jumper and Cable Locations (S008774m)
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
275
276
Description
The machine hardware is reporting different DDM bay location information than was entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are:
   • The cage sense card R2 cable has been plugged in backwards. The end marked Fan Sense Card Top Power Stack has been plugged into the lower sense card. The end marked Fan Sense Card Bottom Power Stack has been plugged into the upper sense card.
   • The DDM bay location selected by the CE for a DDM bay was in the wrong bay, and needs to be changed.
   • A DDM bay controller card is reporting incorrectly.
Rack 2 With 3 or 4 Storage Cages, Upper Sense Card Connector (Top, R2-Q1-C1)
Rack 2 With 3 or 4 Storage Cages, Lower Sense Card Connector (Bottom, R2-Q2-C1)
Figure 121. Fan Sense Card Jumper and Cable Locations (S008774m)
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
277
278
Description
The machine hardware is reporting different DDM bay location information than was entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are:
   • The power planar to DDM bay planar cable is plugged to the wrong connector position on the storage cage power planar. See Figure 122 on page 281 and Figure 123 on page 282.
   • The connectors that the power planar to DDM bay planar cable plugs into may have bent or pushed back pins.
   • The DDM bay location selected by the service support representative for a DDM bay was in the wrong location, and needs to be changed.
   • A DDM bay controller card is reporting incorrectly.
Isolation
1. Review the DDM bay location entered by the service support representative.
   Look below the FRU list on the service terminal, at the line that starts with Additional Message.... Look for the word Reported, followed by the Rack-Bay-Drawer (DDM bay) location reported by the 2105.
   You can find the actual DDM that was used to read the Reported location. Look on the Additional Messages line, to the right of the Reported Rack-Bay-Drawer location. You may need to use the arrow keys on the keyboard to scroll to the right. Look for the word DDMSN, followed by the serial number of the DDM that was used to read the Reported location. Following the serial number is the slot number in the DDM bay, in parentheses, where the DDM is located. You should be able to find the DDM with this serial number in the DDM bay slot indicated by the Reported location.
   Then look for the word Entered:, followed by the Rack-Bay-Drawer location that was entered by the service support representative. Carefully review the location that the service support representative entered to determine if it is correct.
279
280
Figure 122. DDM Bay Front Power Cable Locations (S009430)
[Figure not reproduced. Front view showing storage cages U2, U3, and U4 (labels F1 to F3) and power planar Q2 connectors J11 to J18 and J21 to J28.]
Note: The two lower storage cages (U3 and U4) are not present in 2105 Model 800s.
281
Figure 123. DDM Bay Rear Power Cable Locations (S009431)
[Figure not reproduced. Rear view showing storage cages U1, U3, and U4 (labels F4 and F6) and power planar Q2 connectors J11 to J18 and J21 to J28.]
Note: The two lower storage cages (U4 and U3) are not present in 2105 Model 800s.
Description
The machine hardware is reporting different DDM location information than was created internally based on what was entered manually at the service terminal. The problem must be corrected. The possible causes for this condition are: v The SSA loop has been cabled incorrectly. v The DDM bay controller card is reporting the DDM location incorrectly.
282
Isolation
1. Look at the SSA cables displayed on the Detail Problem screen. Compare the SSA cables displayed with the cabling of the DDM bay being Installed/Analyzed.
   Are any of the SSA cables connected wrong?
   • Yes, connect the jumper cables to the correct connectors, then verify the repair. Return to the service terminal and select the sense card for replacement. Proceed through the repair but do not replace the sense card. This will simulate a repair and run verification. If the verification was successful, close the problem and end the call. If the verification was not successful, continue with the next step.
   • No, continue with the next step.
2. Replace the DDM bay controller card shown as a FRU by the service terminal, then verify the repair. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of Volume 2.
   • If the verification was successful, close the problem and end the call.
   • If the verification was not successful, call your next level of support.
Description
This MAP helps you to verify a repair to a DDM bay that generated a problem because it was powered off. This MAP will verify if the problem is resolved.
Isolation
1. Determine if the DDM bay with the problem was just installed into the 2105 or if DDMs were just installed into it.
   Was the failing DDM bay or its DDMs just installed?
   • Yes, the DDM bay or its DDMs were just installed. At the service terminal, press F3 until the screen that allows the restart of installation is displayed. Restart the installation to verify the repair. If the repair is verified, the installation will resume at the point that the original error was detected.
   • No, the DDM bay or its DDMs were not just installed. Verify the repair using the service terminal. From the Main Service Menu, select:
     Machine Test Menu
     SSA Loops Menu
     Select the DDM bay you just repaired. Identify the DDM bay by the location code.
     Did the SSA device test run without error?
     - Yes, go to step 2 on page 284.
     - No, follow the instructions displayed on the service terminal to correct the problem.
283
Description
This MAP verifies that a DDM bay is operating correctly when visual symptoms, or other reasons, indicate a possible problem.
Isolation
1. Did you start this service action from a problem displayed on a service terminal?
   • Yes, go to step 4.
   • No, continue with the next step.
2. Use the service terminal to look for any problems. Repair these problems first, then continue with the next step.
3. Are the symptoms that originally sent you to this MAP repaired?
   • Yes, the problem is resolved. End the service call.
   • No, continue with the next step.
4. Record the location of the DDM bay that you have just repaired.
5. At the service terminal, press F3 until the Main Service Menu is displayed, then select:
   Machine Test Menu
   SSA Loops Menu
   Find the line that has the SSA Device DDM bay with the location you recorded.
6. Select a line with the recorded DDM bay location to run the SSA loop test. Select loop A or B for this test; it does not matter which you select. This test will verify correct operation of all of the DDM bays on both loops of that SSA device card.
Description
The SSA Devices Certify Test detected a problem. The failure was due to one of the following:
   • A media problem was detected with one or more DDMs.
   • Some unrelated occurrence in the system caused the process to abort.
Isolation
1. Observe the Code EC level displayed on the logon screen.
   Is the code level above 2.3.0.255?
   • Yes, continue with the next step.
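Comparing dotted code levels by eye is error prone, because each field must be compared as a number rather than as text (for example, 2.3.0.30 is not above 2.3.0.255). The sketch below shows one way to make the comparison. It assumes the Code EC level is always a dotted string of decimal fields like the threshold in step 1, and the function names are hypothetical.

```python
def parse_level(level: str):
    """Convert a dotted code level such as "2.3.0.255" to a tuple of ints."""
    return tuple(int(field) for field in level.strip().split("."))


def is_above(level: str, threshold: str = "2.3.0.255") -> bool:
    """True when level is strictly above threshold, field by field."""
    return parse_level(level) > parse_level(threshold)


if __name__ == "__main__":
    for candidate in ("2.3.0.255", "2.3.0.30", "2.3.1.0", "2.10.0.0"):
        print(candidate, is_above(candidate))
```

Tuple comparison in Python proceeds left to right, which matches how the fields of a code level are ordered from most to least significant.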
284
285
Description
The web-initiated DDM format operation fails to complete because some unrelated occurrence in the system caused the process to abort. Retrying the DDM format operation may allow the process to run to completion. The web-initiated DDM format operation probably failed because of a problem on the machine or an error recovery by the machine.
Isolation
1. Cancel the ESC 1247 problem generated by the DDM format operation.
2. Retry the DDM Format / Resume operation. From the service terminal Main Service Menu, select:
   Repair Menu
   Format / Resume DDM(s)
   Continue through the instructions to retry the DDM format operation. After the DDM format operation is started, you will be automatically logged off.
3. Log back on to the 2105 at any time to check the DDM format operation progress. When the DDM format operation has completed, from the service terminal Main Service Menu, select:
   Repair Menu
   Show Result of DDM Format / Resume Operation
   Did the operation complete successfully?
   • Yes, the problem is resolved. Ask the customer to continue their web operation.
     Notes:
     a. The customer may or may not have any web operation to continue.
     b. The customer may want to retry the previous failed web operation.
   • No, the machine is still failing. Fix any additional problems that occurred on the machine. Retry step 3 if possible. If the machine is still failing, call the next level of support.
Description
The service-initiated DDM format operation fails to complete because some unrelated occurrence in the system caused the process to abort. Retrying the DDM format operation may allow the process to run to completion. The service-initiated DDM format operation probably failed because of a problem on the machine or an error recovery by the machine.
Isolation
1. Cancel the ESC 1246 problem generated by the DDM format operation.
2. Retry the DDM Format / Resume operation. From the service terminal Main Service Menu, select:
      Repair Menu
      Format / Resume DDM(s)
   Continue through the instructions to retry the DDM format operation. After the DDM format operation is started, you will be automatically logged off.
3. Log back on to the 2105 at any time to check the progress of the DDM format operation. When the DDM format operation has completed, from the service terminal Main Service Menu, select:
      Repair Menu
      Show Result of DDM Format / Resume Operation
   Did the operation complete successfully?
   • Yes, the problem is resolved.
   • No, the machine is still failing. Fix any additional problems that occurred on the machine. Retry step 3 if possible. If the machine is still failing, call the next level of support.
Description
The verification test did not complete successfully because some unrelated occurrence in the system caused the test to abort. Retrying the verification test will allow the verification test to run to completion. If there is a real problem, you will be directed to a different MAP.
Isolation
• If you are viewing the problem after selecting Show / Repair Detected Problems from the Verification Tests Has Detected Problems screen, rerun the verification test. Press F3 once, then at the new screen select the Run Verification Tests Again option.
• If you are not viewing the problem from Show / Repair Detected Problems, select and repair the original problem and choose the original FRU. Do not replace the FRU when instructed to do so.
Did repair verification run without error?
• If the verification ran without error, the problem is resolved.
• If the verification failed, continue with any problem displayed by the verification process. If this same problem continues to occur, there may be another problem on the machine that prevents verification from running successfully. Resolve those problems, then retry this problem. If verification still fails, call your next level of support.
Description
The verification test did not complete successfully because some unrelated occurrence in the system caused the test to abort. Retrying the verification test will allow the verification test to run to completion. If there is a real problem, you will be directed to a different MAP. At the end of a repair process, a Resume process is performed that makes the resource available for customer use. During the Resume process, an unrelated event occurred that prevented the Resume from completing normally. You will need to go through a pseudo repair process to complete the repair.
Isolation
1. Select the DDM listed in the Possible FRUs to Replace portion of the problem.
2. Proceed through the repair process. When the process instructs you to replace the DDM, do not replace it. Continue through the repair process as if you had replaced the DDM. If this repair process directs you to resolve other problems before completing this problem, do so. Then return to this problem.
Description
One or more DDMs have been found in a formatting state during IML. A possible cause for this condition is: a format process was interrupted by some unrelated occurrence in the system.
Isolation
1. Check if there are any other open problems:
   • If there are no other problems to repair, go to step 2.
   • If there are other problems, repair them before continuing with this MAP, then continue with the next step.
2. Cancel the problem.
3. Use the service terminal to format the drawer, or drawers, that has the DDM, or DDMs, found in a formatting state. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
      Format DDM Bays (Drawers)
      Format All Drawers Listed
   Continue through the instructions to format the DDM, or DDMs.
Description
Multiple DDMs cannot be accessed. The open links are on a DDM bay boundary.
Isolation
1. Determine if the SSA cables to the failing DDM bay have just been changed or installed. Have the SSA cables just been changed or installed?
   • Yes, go to step 2.
   • No, go to step 4.
2. Verify that the SSA cables are connected correctly. Look at the cables displayed on the Detail Problem screen. Compare the cables displayed with the cabling of the DDM bay. Are any of the cables connected incorrectly?
   • Yes, connect the cables to the correct connectors, then go to step 3.
   • No, go to step 4.
3. Determine if the problem is resolved. Return to the service terminal Detail Problem screen. Select any FRU in the Possible FRUs to Replace list or any cable in the cable list. Proceed through the repair but do not replace any FRU or disconnect any cables. This will simulate a repair and run verification. Did verification run without error?
   • Yes, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem.
   • No, go to step 4.
4. Look at the Additional Message in the Detail Problem Record. It gives the name and location of one or more failing DDM bays. Find one of these failing DDM bays. See Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3 book. Continue with the next step.
5. Observe the following indicators on the front of the DDM bay:
   • DDMs (eight)
   • Bypass card
   • Controller card
6. Go to the DDM bay and observe the indicators.
   Note: The front of the DDM bay can be facing the front or rear of the 2105.
   Are any of the indicators on?
   • Yes, call your next level of support.
   • No, go to MAP 3395: Isolating a DDM Bay Power Problem on page 261.
Description
Unexpected results were reported by an SSA component.
Isolation
An unexpected condition was detected. Call your next level of support.
Description
This section describes the conditions that created this state. The full storage capacity of all DDMs (Disk Drive Modules) on an SSA loop (or on both loops of an adapter pair for an AAL configured machine) can be used only when all of the DDMs have the same storage capacity. There are times when it is correct to add DDMs of a different capacity to a loop. This can happen when a specific DDM is no longer manufactured and DDMs with a larger storage capacity must be used. There are also times when there is a need to have mixed-capacity DDM bays on a single loop (or adapter pair for an AAL configured machine). You have been sent to this MAP because arrays of multiple capacities may be created on this loop (or adapter pair for an AAL configured machine), and additional DDMs may be required as spares.
Detailed Description
This section describes the details of the conditions that created this state. The following Isolation section describes what to do to fix the condition.
1. The capacity of all DDMs on an SSA loop (or on both loops of an adapter pair for an AAL configured machine) is most fully used when all DDMs have the same storage capacity. There are times when there is a need to add DDMs of a different capacity.
2. On each SSA loop, one spare is created for each of the first two arrays of each DDM capacity.
3. There are two possible options for resolving this condition:
   a. Give permission for the installation to continue with DDMs intermixed as they currently are.
   b. Remove the DDM bay(s) that you have just installed.
4. The following items will help you determine the exact condition and what the options mean.
5. On each SSA loop (or on both loops of an adapter pair for an AAL configured machine), DDMs are grouped together as Potential and Configured Rank Sites. Each Rank Site consists of eight DDMs.
6. Arrays consist of seven or eight array member DDMs. All of the members of any array are found on the same rank site. When there are seven members in an array, the additional DDM in that rank site is always assigned as a spare.
7. There is a Utility that allows viewing the Rank Sites on an SSA Loop and the capacities of the DDMs on those Rank Sites. The effective capacity of a Rank Site is determined by the smallest capacity of any DDM on a rank site.
8. Configured rank sites contain those DDMs which have already been assigned as array members or spare DDMs. Since these rank sites contain customer data, they will not be affected by this MAP. The effective capacity of these rank sites is the same as the capacity of the smallest DDM in the rank site.
   Note: There is a possible, but infrequent, situation where an array's effective capacity will be smaller than the smallest DDM. See the note with Description step 12.
9. All unassigned DDMs on a loop are considered to be Free and have been grouped into potential rank sites.
   Note: Some DDMs may have a status of Failed and may occur in either rank site.
10. Whenever new DDMs are installed on a loop, these DDMs become Free DDMs. Existing potential rank sites are dissolved, releasing their Free DDMs and any spare DDMs. Then all the Free DDMs, both new and previously existing, are grouped together into new potential rank sites.
11. These Free DDMs are then placed in potential rank sites by capacity. The largest DDMs are placed into rank sites first. When there are not enough DDMs of the largest capacity to fill the next rank site, the next smaller capacity is used. This continues until all the Free DDMs are in potential rank sites.
12. The capacity of an array is determined by the smallest capacity of the member DDMs when the array is created. This will be the smallest DDM in the rank site. If one of the DDMs in a rank site is to become a spare, the largest capacity DDM is chosen for the spare. The rest of the DDMs will become
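The grouping rule described in steps 5, 7, 10, and 11 can be sketched in a few lines of Python. This is only an illustrative model, not the machine's Licensed Internal Code: the names `RankSite` and `group_free_ddms` are hypothetical, the capacity values are sample numbers, and the sketch assumes that Free DDMs which cannot fill a complete eight-DDM rank site are simply left ungrouped.

```python
from dataclasses import dataclass
from typing import List

RANK_SITE_SIZE = 8  # each rank site consists of eight DDMs (step 5)


@dataclass
class RankSite:
    """Hypothetical model of one potential rank site."""
    capacities_gb: List[float]

    @property
    def effective_capacity_gb(self) -> float:
        # Step 7: the smallest DDM on the rank site sets the effective capacity.
        return min(self.capacities_gb)


def group_free_ddms(free_capacities_gb: List[float]) -> List[RankSite]:
    """Group Free DDMs into potential rank sites, largest capacities first.

    Assumption: leftover DDMs that cannot fill a full rank site are ignored.
    """
    ordered = sorted(free_capacities_gb, reverse=True)
    sites = []
    for start in range(0, len(ordered) - RANK_SITE_SIZE + 1, RANK_SITE_SIZE):
        sites.append(RankSite(ordered[start:start + RANK_SITE_SIZE]))
    return sites


# Sample loop: six larger DDMs and ten smaller DDMs are Free.
sites = group_free_ddms([36.4] * 10 + [72.8] * 6)
for number, site in enumerate(sites, start=1):
    print(number, site.capacities_gb, site.effective_capacity_gb)
```

In this sample the first potential rank site holds all six larger DDMs plus two smaller ones, so its effective capacity drops to that of the smaller DDMs. This is the loss of usable capacity that the MAP asks you to accept or to avoid by removing the DDM bays.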
Problem Isolation Procedures, CHAPTER 3
Isolation
1. Do you want to display the capacities and rank sites of the DDMs on this loop?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, go to step 8 on page 293.
   • No, go to step 6.
3. To display the capacities of the DDMs on this loop, perform the following:
   a. Note the Loop Name (color) of the loop where the installation is being done.
   b. From the service terminal select Exit Install to display the Main Service Menu, then select:
         Utility Menu
         Show Storage Facility Resources Menu
         List DDMs on an SSA Loop by Rank Site
      Select the line with the install Loop Name (color). Scroll up and down on the screen to view the Rank Sites and Capacities of the DDMs on this loop.
   c. Continue with the next step.
4. Now that you have viewed the DDM capacities, do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, complete the installation. Continue with the next step.
   • No, go to step 7 to remove the DDM bay or DDM bays you just installed.
5. Return to the Install process on the service terminal. Press F3 until the Main Service Menu is displayed. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      DDM Bay (Drawer) Menu
   Continue into the install process you performed before until the screen that directed you to this MAP appears. Go to step 8 on page 293.
6. At the service terminal, select Exit Install and you will be at the Main Service Menu. Continue with the next step.
7. Do the following steps to uninstall the DDM bay or DDM bays that you just installed:
   a. From the service terminal Main Service Menu, select:
         Install/Remove Menu
Description
This section describes the conditions that created this state. The full storage capacity of all DDMs (Disk Drive Modules) on an SSA loop can be used only when all of the DDMs have the same storage capacity. There are times when DDMs of a different capacity are added to a loop. This can happen when a specific DDM is no longer manufactured and a DDM with a larger storage capacity must be used as a replacement. There are also times when it is desirable to install DDM bays that contain intermixed capacity DDMs. You have been sent to this MAP to make sure that you intended to install different size DDMs on this loop. If you understand the conditions that created this state, go directly to the Isolation section. If you need more information to determine whether to allow mixed DDM capacities in a rank site, read the following Detailed Description section.
Detailed Description
This section describes the conditions that created this state. The following Isolation section describes what to do to fix the condition.
1. The capacity of all DDMs on an SSA loop is most fully used when all DDMs have the same storage capacity. There are times when there is a need to add DDMs of a different capacity.
2. There are two possible options for resolving this condition:
   a. Give permission for the installation to continue with DDMs intermixed as they currently are.
Isolation
1. Do you want to display the capacities of the DDMs on this loop?
   • Yes, go to step 3 on page 295.
   • No, continue with the next step.
Description
During the installation of new DDM bay(s), a DDM was found that has a different RPM than other DDMs previously on the loop. This is permitted, but not recommended. A DDM with a lower RPM will slow the access to any array in which it is included. You may choose to leave this DDM in the loop. If you do, you will not be notified if any other DDMs with this RPM are included in this installation. On any new installations, you will be notified only if a DDM with yet another RPM is found.
Isolation
1. Do you want to display the RPMs of the DDMs on this loop?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, go to step 10 on page 297.
   • No, go to step 6 on page 297.
3. To display the RPMs of the DDMs on this loop, perform the following:
   a. Note the Loop Name (color) of the loop where the installation is being done.
   b. From the service terminal select Exit Install to display the Main Service Menu, then select:
         Utility Menu
         Show Storage Facility Resources Menu
         List DDMs on an SSA Loop by Rank Site
      Select the line with the install Loop Name (color). Scroll up and down on the screen to view the Rank Sites and Capacities of the DDMs on this loop.
   c. Continue with the next step.
4. Now that you have viewed the DDM RPMs, do you want to complete the installation with the DDMs that are currently on the loop?
   • Yes, complete the installation. Continue with the next step.
   • No, go to step 7 on page 297 to remove the DDM bay(s) you just installed.
5. Return to the Install process on the service terminal. Press F3 until the Main Service Menu is displayed.
MAP 3615: DDMs of Same Capacity but Different RPMs on the Same SSA Loop
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
DDMs with the same storage capacity but different speeds (RPM) were found on the same SSA loop. DDMs with RPMs of 15K or higher are not allowed on the same SSA loop as DDMs with the same capacity but a slower RPM.
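The rule in this Description can be expressed as a small check. The sketch below is an illustration only, not part of the 2105 code: the function name `find_rpm_conflicts` and the (capacity, RPM) tuple layout are hypothetical, and the capacities are sample values.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

FAST_RPM_THRESHOLD = 15000  # "15K or higher" in the Description


def find_rpm_conflicts(ddms: Iterable[Tuple[float, int]]) -> List[float]:
    """Return the capacities for which a 15K-or-faster DDM shares the loop
    with a slower DDM of the same capacity.

    Each DDM is given as a (capacity_gb, rpm) tuple (hypothetical layout).
    """
    rpms_by_capacity = defaultdict(set)
    for capacity_gb, rpm in ddms:
        rpms_by_capacity[capacity_gb].add(rpm)

    conflicts = []
    for capacity_gb, rpms in sorted(rpms_by_capacity.items()):
        fastest, slowest = max(rpms), min(rpms)
        # A conflict exists only when the fastest DDM is 15K or higher
        # and some DDM of the same capacity is slower than it.
        if fastest >= FAST_RPM_THRESHOLD and slowest < fastest:
            conflicts.append(capacity_gb)
    return conflicts


loop = [(36.4, 15000), (36.4, 10000), (72.8, 10000)]
print(find_rpm_conflicts(loop))
```

A loop that mixes RPMs only across different capacities produces an empty list, which matches the wording of the rule: the restriction applies to DDMs of the same capacity.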
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Was a DDM with a different RPM than called for in the FRU list used as a replacement FRU during a repair?
   • Yes, get a new DDM FRU of the same RPM as called for in the FRU list, then retry the repair. If the problem is still present, contact your next level of support.
   • No, continue with the next step.
3. You have attempted to install DDMs that have different RPMs than other DDMs already on the loop. Determine if you need to replace individual DDMs or a DDM bay. Do you need to remove an entire DDM bay?
   • Yes, do the following: Remove the DDM bays you just attempted to install using the Remove Drawer option. Replace any DDMs with different RPMs than other DDMs already on the loop. Retry the DDM installation. If it still fails, contact your next level of support.
   • No, replace any DDMs with different RPMs than other DDMs already on the loop. Retry the DDM installation. If it still fails, contact your next level of support.
Description
One or more disk drive modules (DDMs) have been detected with an unsupported storage capacity. These DDMs are either not supported by this machine model or not supported by the level of Licensed Internal Code (LIC) on this machine.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine if the unsupported DDM was installed as a replacement FRU during a repair or as a new DDM. Was the unsupported DDM installed as a replacement FRU during a repair?
   • Yes, obtain a new DDM FRU that is a compatible or supported replacement FRU for the original DDM, then retry the repair. If the problem happens again with a compatible or supported replacement FRU, contact your next level of support.
   • No, you have attempted to install a DDM bay that contains one or more DDMs with a storage capacity that is not supported. Do the following: Remove the DDM bays you just attempted to install using the Remove Drawer option. Replace any DDMs of an unsupported capacity with DDMs of a supported capacity. Retry the installation. If the problem happens again, contact your next level of support.
MAP 3618: Replacement DDM Has Slower RPM Than Called For
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
A DDM used for replacement has a slower RPM than was called for on the FRU list. It is recommended that a replacement DDM have an equal or higher RPM than called for on the FRU list. If a DDM with a lower RPM is spared into an array with higher RPM DDMs, the performance of that array will be somewhat degraded. If speed of repair is more important than performance, a slower speed DDM can be used by activating the Allow Slower RPM Replacement switch. This switch setting is valid only for this repair.
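The decision described above reduces to one comparison plus the switch. The following minimal sketch models it. The function name `replacement_rpm_accepted` and the parameter `allow_slower` are hypothetical stand-ins for the repair verification and the Allow Slower RPM Replacement switch; they are not actual 2105 interfaces.

```python
def replacement_rpm_accepted(required_rpm: int,
                             replacement_rpm: int,
                             allow_slower: bool = False) -> bool:
    """Return True if the replacement DDM's RPM is acceptable for this repair.

    allow_slower models the Allow Slower RPM Replacement switch, which the
    Description says is valid only for the current repair.
    """
    if replacement_rpm >= required_rpm:
        # Equal or higher RPM than the FRU list calls for is always acceptable.
        return True
    # A slower DDM is accepted only when the switch has been set to True.
    return allow_slower


print(replacement_rpm_accepted(15000, 10000))
print(replacement_rpm_accepted(15000, 10000, allow_slower=True))
```

Because the switch applies to a single repair, a caller modeling several repairs would pass `allow_slower` per repair rather than keep it as a global setting.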
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Determine if you want to degrade subsystem performance by allowing a lower RPM replacement DDM to be installed (see Description above). Do you want to install a lower RPM DDM and degrade loop performance?
   • Yes, continue with the next step.
   • No, go to step 6.
3. You have chosen to degrade loop performance by allowing a slower RPM replacement DDM than called for on the FRU list. This step sets the Allow Slower RPM Replacement switch:
   a. Return to the service terminal and record the number of the problem you are working on.
   b. Press F3 until the Main Service Menu is displayed.
   c. From the service terminal Main Service Menu, select:
         Configuration Option Menu
         Change/Show Control Switches
   d. Select Allow Slower RPM Replacement.
   e. Change the value to True.
   f. Continue with the next step.
4. Press F3 until the Main Service Menu is displayed.
   a. From the service terminal Main Service Menu, select:
         Repair Menu
         Show/Repair Problems Needing Repair
   b. Select the problem with the number you recorded in step 3a.
   c. Select the DDM on the Possible FRUs to Replace list.
   d. Continue with the next step.
5. Continue through the repair process until the DDM replacement is called. Do not replace the DDM. Continue through the replace process as if you had replaced the DDM. Did the Repair process complete successfully?
   • Yes, this problem is resolved. Continue to the end of the repair process to see if there are any additional problems.
   • No, continue with the problem displayed on the service terminal. Continue with the next step.
6. Replace the DDM with a correct RPM DDM.
   a. Select the DDM on the Possible FRUs to Replace list.
   b. Continue with the next step.
7. Continue through the repair process until the DDM replacement is called. Replace the DDM with another DDM with the correct RPM. Continue through the replace process. Did the Repair process complete successfully?
   • Yes, this problem is resolved. Continue to the end of the repair process to see if there are any additional problems.
   • No, continue with the problem displayed on the service terminal. Continue with the next step.
Description
The storage capacity of the replacement DDM is smaller than required. A replacement DDM must have the same or greater storage capacity as the DDM shown on the FRU list. One of the following conditions exists:
• The storage capacity of the replacement DDM is smaller than the DDM listed in the FRU list of the original problem.
• The current conditions on the loop now require that the replacement DDM have a storage capacity larger than specified in the FRU list of the original problem. This occurs when a member of a good array is replaced during a service call and the only spare available is of a larger capacity. The required replacement storage capacity of the DDM must now be increased to the size of the DDM listed in the FRU list of the current problem.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Select the DDM listed in the Possible FRUs to Replace portion of the problem.
3. Proceed through the repair process to the DDM replacement. Replace the DDM with a DDM that has the same or larger storage capacity than the DDM requested in the FRUs to Replace portion of the problem.
MAP 3621: New DDM Storage Capacity Smaller Than Original DDMs
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
One or more DDMs have been added to an SSA loop that have a smaller storage capacity than the existing DDMs. All DDMs in an SSA loop must have the same storage capacity.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
MAP 3625: All DDMs on SSA Loop A Do Not Have the Same Characteristics
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
DDMs have been added to SSA loop A that have different characteristics than the existing DDMs or each other. All DDMs in an SSA loop must have the same bus speed.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy that Resource Name (rsssaxx).
3. From the service terminal Main Service Menu, select:
      Utility Menu
      Show Storage Facility Resources Menu
      List DDMs on an SSA Loop
   Select the loop that uses the SSA device card resource name you copied and is loop A.
4. Observe the bus speed of each DDM on the loop. All DDMs on a loop must have the same characteristics. As required to correct the problem, you will have to replace:
   • Entire DDM bay, or
   • Individual DDMs
   Notes:
   a. To correct the characteristics problem, only the DDMs or DDM bays that you just placed on the loop should be replaced.
MAP 3626: All DDMs on SSA Loop B Do Not Have the Same Characteristics
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Attention: If you are not familiar with SSA DASD service, refer to Using the SSA DASD Maintenance Analysis Procedures (MAPs) on page 176.
Description
DDMs have been added to SSA loop B that have different characteristics than the existing DDMs. All DDMs in an SSA loop must have the same bus speed.
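The bus speed rule can be sketched as a filter over the newly added DDMs. This is a hedged illustration only: the function name `ddms_to_replace`, the DDM names, and the bus speed numbers are all hypothetical, and the sketch assumes the DDMs already on the loop share a single bus speed.

```python
from typing import Dict, List


def ddms_to_replace(existing_bus_speed: int,
                    new_ddms: Dict[str, int]) -> List[str]:
    """Return the names of newly added DDMs whose bus speed differs from
    the bus speed of the DDMs already on the loop.

    new_ddms maps a DDM name to its bus speed (hypothetical names and units).
    """
    return sorted(name for name, speed in new_ddms.items()
                  if speed != existing_bus_speed)


# Hypothetical example: the loop runs at speed 40 and three DDMs were added.
added = {"ddm-01": 40, "ddm-02": 20, "ddm-03": 20}
print(ddms_to_replace(40, added))
```

Only the newly added DDMs are examined, on the assumption that the DDMs previously on the loop were already consistent with each other.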
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Use the service terminal to locate the SSA device card displayed as a Possible FRU to Replace. Copy that Resource Name (rsssaxx).
3. From the service terminal Main Service Menu, select:
      Utility Menu
      Show Storage Facility Resources Menu
      List DDMs on an SSA Loop
   Select the loop that uses the SSA device card resource name you copied and is loop B.
Description
The SSA adapter failed to retrieve the state of DDMs.
Isolation
1. Check the Problem Log for Open or Pending problems with ESC=E100. Use:
      Repair Menu
      Show / Repair Problems Needing Repair
   Did you find any problems with ESC=E100?
   • Yes, repair the listed problems and then close the problem that sent you to this MAP.
   • No, continue with the next step.
2. Check the DDM_State for all of the DDMs attached to the SSA card listed in the problem that sent you here.
Description
You are connected to one cluster and are attempting to verify a repair on an SSA Loop. For this repair verification, a test must be run on both clusters. When verification was run, it failed because the alternate cluster was fenced. There are two situations that will cause this:
1. There is a problem on the alternate cluster that needs to be resolved before verifying an SSA repair.
2. The failure on the SSA loop caused the alternate cluster to fence. With this condition, the alternate cluster needs to be powered off and then on to clear the fence.
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Examine the other problems to see if any that are not SSA loop problems need to be repaired.
   a. Go to the list of other problems. From the service terminal Main Service Menu, select:
         Repair Menu
Description
In a DDM bay, where a bypass card should be plugged, one of the following conditions is present:
• A different kind of card is plugged
• There is no card in that location
• The bypass card in that location is failing
• The controller card in that DDM bay is failing
• The DDM bay backplane is failing
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
   Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification.
   • If the FRU is listed and is selectable in the problem, use that method first.
   • If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of the Volume 3 book. Is there a card plugged into that location?
   • Yes, continue with the next step.
   • No, select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
     Note: Verify that the jumpers on the bypass card are set correctly before installing the new card. Jumper pins 2 to 3 at both the top and bottom jumper positions.
[Figure: bypass card with jumper pins 2 to 3 connected at both the top and bottom jumper positions]
Figure 125. DDM bay Bypass Card Jumper Settings (s009436)
     If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with any problem displayed by the verification process.
3. Look at the card(s) plugged into the bypass card position. Is it a single card with two SSA connectors on it?
   • Yes, there is a bypass card in this position. Continue with the next step.
   • No, the card in this position is a passthrough card instead of a bypass card. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
     Note: Be sure that the two jumpers on the bypass card are in the correct positions. See the jumper figures in Bypass and Passthrough Card Removal and Replacement, DDM Bay in chapter 4 of the Volume 2 book.
     If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process to return the resources to the customer and cancel the problem. If the verification failed, continue with the next step.
4. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Install a bypass card in that location and proceed through the verification process.
   Note: Be sure that the two jumpers on the bypass card are in the correct positions.
308
Figure 126. DDM bay Bypass Card Jumper Settings (s009436). Labels: Jumper Pins 2 to 3 (top and bottom jumper positions).
v If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v If the verification failed, continue with the next step. 5. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of the Volume 2 book. v If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v If the verification failed, continue with the next step. 6. Select the frame from the Possible FRUs to Replace list on the service terminal. Install a new frame in that location and proceed through the verification process. v If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v If the verification failed, continue with any problem displayed by the verification process.
Description
In a DDM bay, where a passthrough card should be plugged, one of the following conditions is present: v A different kind of card is plugged v There is no card in that location v The passthrough card in that location is failing v The controller card in that DDM bay is failing
Isolation
1. Read this Attention before replacing any FRUs in this MAP:
309
2.
3.
4.
5.
310
Description
v A bypass card has one or both jumpers in the wrong position v A controller card in that DDM bay is failing
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option. 2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes, in chapter 7 of the Volume 3. 3. Select the bypass card from the Possible FRUs to Replace list on the service terminal. 4. Remove the bypass card. Verify the jumpers on the bypass card are set correctly. Jumper pins 2 to 3 at both the top and bottom jumper positions.
Figure 127. DDM bay Bypass Card Jumper Settings (s009436). Labels: DDM Bay Bypass Card, Jumper Pins 2 to 3 (top and bottom jumper positions).
Reinstall the bypass card and verify the repair: v If the verification ran without error, the problem is resolved. Return to the service terminal and select Continue Repair Process, to return the resources to the customer and cancel the problem. v If the verification failed, continue with the next step. 5. Select the controller card from the Possible FRUs to Replace list on the service terminal. Install a new controller card in that location and proceed through the verification process. See Controller Card Removal and Replacement, DDM Bay in chapter 4 of the Volume 2.
311
Description
One of the following conditions exists: v The SSA cable may be unplugged. v A 20 MB SSA cable is plugged where a 40 MB SSA cable should be used. Note: 20 MB SSA cables are grey and 40 MB SSA cables are blue. v The bypass card at that location has failed v The controller card in that DDM bay has failed
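The cable check described above reduces to a colour lookup. The following is a minimal Python sketch of that check, not part of the 2105 service code. The dictionary and the function name `cables_to_replace` are hypothetical and are introduced only for illustration. The colours and ratings come from the note above.

```python
# Hypothetical lookup: cable colours and ratings as stated in the note above.
SSA_CABLE_RATING_BY_COLOR = {"grey": "20 MB", "blue": "40 MB"}


def cables_to_replace(colors):
    """Return the 0-based positions of cables that are not blue 40 MB cables."""
    return [
        index
        for index, color in enumerate(colors)
        if SSA_CABLE_RATING_BY_COLOR.get(color.lower()) != "40 MB"
    ]


# One blue and one grey cable on the bypass card: the second one is flagged.
print(cables_to_replace(["blue", "grey"]))
```

An unrecognized colour is also flagged, because this sketch treats anything other than a known 40 MB cable as suspect.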
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option. 2. Locate the bypass card listed under Possible FRUs to Replace on the service terminal. See DDM Bay, Component Physical Location Codes in chapter 7 of the Volume 3 book. Determine the color of the SSA cables connected to the bypass card. Are both of the cables blue? v Yes, continue with the next step. v No, the wrong type of SSA cable or cables is installed. Select the bypass card from the Possible FRUs to Replace list on the service terminal. Do not replace the bypass card. Replace any grey SSA cables with blue SSA cables. Proceed through the verification process. If the verification ran without error, the problem is resolved. Go to step 9 on page 313. If the verification failed, continue with any problem displayed by the verification process. 3. Are both of the SSA cables connected to the bypass card? v Yes, continue with the next step. v No, connect the cable that is not connected. Select the cable from the Possible FRUs to Replace list on the service terminal. Do not replace the cable. Proceed through the verification process.
312
313
Description
The 2105 requires that the temperature of the room air entering it must not exceed 32°C (89.6°F). With a room temperature of less than 32°C (89.6°F), the base casting temperature of the DDMs should not exceed 50°C (122°F). You have been directed to this MAP because the base casting temperature on two DDMs has exceeded 50°C (122°F). This may be caused by: v The air temperature surrounding the DDMs exceeding the maximum allowed temperature. v The air flow to the DDMs being restricted. v The temperature sensing circuits on the DDMs being faulty. v The DDMs being faulty and generating too much heat. The repair strategy of this MAP is to first determine if the air supply to the DDMs is too warm or is restricted. An over-temperature condition is not reported until two or more DDMs have sensed an over-temperature. It is possible that one of the two drives has been failing for some time and that the second DDM has just failed. If the over-temperature condition cannot be corrected while examining the air supply, you will be directed to replace the DDMs one at a time.
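The reporting rule in the description above can be illustrated with a short Python sketch. It is an illustration only and does not represent the 2105 microcode. The constant and function names are hypothetical. The limits (32 C inlet air, 50 C base casting, two or more DDMs) are the values stated above.

```python
# Limits taken from the description above; all names are hypothetical.
MAX_INLET_C = 32.0          # room air entering the 2105
MAX_BASE_CASTING_C = 50.0   # DDM base casting limit for this MAP
MIN_DDMS_TO_REPORT = 2      # reported only when two or more DDMs sense it


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def over_temperature_reported(ddm_temps_c):
    """True when two or more DDM base casting temperatures exceed the limit."""
    hot = [temp for temp in ddm_temps_c if temp > MAX_BASE_CASTING_C]
    return len(hot) >= MIN_DDMS_TO_REPORT


def inlet_air_too_warm(inlet_temp_c):
    """True when the room air entering the rack exceeds the stated maximum."""
    return inlet_temp_c > MAX_INLET_C


# A single hot DDM does not raise the condition; a second one does.
print(over_temperature_reported([51.0, 49.0]))
print(over_temperature_reported([51.0, 49.0, 52.5]))
```

A temperature exactly at the limit is not counted, because the text says the limit must be exceeded.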
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
2. Record the Problem ID of this problem. Look at the time stamp of the last occurrence. If it is more than 30 minutes old the problem is resolved and can be closed. Was the last occurrence more than 30 minutes ago? v Yes, go to step 14 on page 316. v No, continue with the next step.
3. Determine the approximate temperature of the air at the front and rear of each 2105 Model 800 and Expansion racks. Does the air exceed 32°C (90°F)? v Yes, contact the customer and have the temperature of the room lowered, then go to step 11 on page 315. v No, continue with the next step.
4. Look for other problems with the Failing Resource = rsuplnrsnsxxx or rslplnrsnsxxx or ssaxxx. Are there any problems as described above? v Yes, repair all of these problems, this may lower the DDM temperatures. Then return to this map and go to step 11 on page 315. v No, continue with the next step.
5. Locate the DDMs shown in the Possible FRUs to Replace section of the problem detail or your list from the temperature utility. Note the FRU Location for the FRUs and refer to Locating a DDM Bay in a 2105 Rack in chapter 7 of the Volume 3.
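The time stamp check in this MAP (a last occurrence more than 30 minutes old means the problem is resolved) can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `problem_is_resolved` is hypothetical, and both time stamps are assumed to come from the same clock.

```python
from datetime import datetime, timedelta

# 30-minute window stated in this MAP; the names are hypothetical.
RESOLVED_AFTER = timedelta(minutes=30)


def problem_is_resolved(last_occurrence, now):
    """True when the last occurrence is more than 30 minutes before 'now'."""
    return now - last_occurrence > RESOLVED_AFTER


now = datetime(2005, 11, 1, 12, 0)
print(problem_is_resolved(datetime(2005, 11, 1, 11, 15), now))
```

A last occurrence exactly 30 minutes old is not treated as resolved, because the MAP asks for more than 30 minutes.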
314
315
Description
The 2105 requires that the temperature of the room air entering it must not exceed 32°C (89.6°F). With a room temperature of less than 32°C (89.6°F), the base casting temperature of the DDMs should not exceed 60°C (140°F). You have been directed to this MAP because the base casting temperature on more than two DDMs has exceeded 60°C (140°F). This may be caused by the air temperature surrounding the 2105 exceeding the maximum allowed temperature or something restricting the air flow to the DDMs. The DDMs reporting the over-temperature conditions are in a DDM bay.
Isolation
1. Read this Attention before replacing any FRUs in this MAP: Attention: In the following steps you may be asked to replace one or more FRUs. The Repair Menu options must be used to perform the replacement and verification. v If the FRU is listed and is selectable in the problem, use that method first. v If the FRU is not listed or selectable in the problem, use the Repair Menu/Replace a FRU option.
316
4.
5.
6. 7.
8.
9.
10.
317
14.
15. Replace the DDM. From the service terminal Main Service Menu, select: Repair Menu Replace a FRU Move cursor to desired item and press Enter: v Select DDM bay that contains the DDM. v Select the DDM you wish to replace. Follow the instructions to replace the DDM, then go to step 12. 16. Close the Problem that you have just resolved. From the service terminal Main Service Menu, select: Repair Menu Close a Previously Repaired Problem Select the Problem ID you recorded earlier. Follow the service terminal instructions to see if all problems are resolved.
318
Description
While one cluster reboots, the other cluster waits for it to complete. The waiting cluster needs the other cluster to complete its rebooting before the waiting cluster can bring the rebooted cluster back on line. The cluster never completed the reboot so that it could communicate with the waiting cluster. The cluster may be hung during the power on firmware process or booting of AIX. The total time for all this to occur may add up to one additional hour to the normal time for a cluster to come ready. The cluster reboot could have been part of a manual cluster resume during a service action or a reboot due to an automatic microcode error recovery process where one cluster reboots the other cluster. Information for Product Engineering - If the cluster takes a failure in the IML back or the failback to dual cluster, a different problem is created.
Isolation
1. Verify that the cluster is powered on. Press the CD-ROM drive eject button. Did the CD-ROM tray open? v Yes, continue with the next step. v No, go to MAP 4880: Cluster Power On Problem on page 461. 2. Determine if the cluster hung before the IPL of AIX completed by observing the CEC drawer operator panel. Are any codes displayed? v Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. v No, continue with the next step. 3. Connect the service terminal to the failing cluster and attempt to log in. Can you log in? v Yes, the cluster is not hung, continue with the next step. v No, go to step 5 on page 320. 4. Disconnect the service terminal from the failing cluster and connect it to the working cluster. Use the Repair Menu, and the Alternate Cluster Repair Menu options to attempt to resume the failing cluster: v If the cluster hangs with a code displayed, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. v If the cluster completes the resume and comes ready, close the problem and use the Repair Menu, End of Call Status option. v If the cluster does not complete the resume, call the next level of support.
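The first three questions of this isolation procedure form a simple decision chain. The Python sketch below restates that chain for illustration only. The function name `next_action` and its three boolean parameters are hypothetical. The returned text is taken from the steps above.

```python
def next_action(cd_tray_opened, codes_displayed, can_log_in):
    """Return the next action for steps 1 to 3 of the isolation procedure."""
    if not cd_tray_opened:
        # Step 1: the cluster does not appear to be powered on.
        return "Go to MAP 4880: Cluster Power On Problem"
    if codes_displayed:
        # Step 2: the CEC drawer operator panel shows a code.
        return "Go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel"
    if not can_log_in:
        # Step 3: the service terminal cannot log in to the failing cluster.
        return "Go to step 5"
    # Step 3, Yes branch: the cluster is not hung.
    return "Continue with step 4: attempt to resume the failing cluster"


print(next_action(cd_tray_opened=True, codes_displayed=False, can_log_in=True))
```

The order of the checks matters: a power problem is ruled out before the operator panel codes are considered, as in the procedure.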
319
MAP 4020: Hard Disk Drive Build Process for Both Drives
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: The FRUs and cables in this procedure are ESD-sensitive. Always wear an ESD wrist strap during this isolation procedure. Follow the ESD procedures in Working with ESD-Sensitive Parts in chapter 4 of the Volume 2.
Description
This procedure is used to load AIX and 2105 Model 800 code on both the hard disk drives of one cluster. The code is first loaded on one of the hard disk drives, then it is automatically mirrored to the other hard disk drive.
Requirements
v 2105 O/S CD volumes 1 and 2
v 2105 O/S update (PTF) CD if required
v 2105 LIC CD
v Blank diskettes to save customization and configuration
Procedure
1. Are you doing an Automatic LIC Code update? v Yes, continue with the next step. v No, go to step 3. 2. Is there a problem with ESC=14xx calling a 4Axx MAP? v Yes, go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. v No, continue with the next step. 3. Were you sent here from another MAP to do the hard disk drive build process? v Yes, continue with the next step. v No, please return to the procedure that sent you here. Replacing a single hard disk drive can be done concurrent with customer operation on the cluster. The service login Cluster Dual Hard Disk Drive Repair Menu options are used.
320
4. Were both clusters running on the same level of LIC code prior to entering this MAP? v Yes, continue with the next step. v No, the clusters were in a LIC code update/activation. Call the next level of support. (This MAP is designed to end up with both clusters at the same LIC level. You must use LIC CDs that are at the same level as the working cluster.)
5. Verify the service terminal is connected to the cluster not being repaired, see Service Terminal Setup in chapter 8 of the Volume 3.
6. Did you quiesce the failing cluster before you started this MAP? v Yes, continue with the next step. v No, quiesce the failing cluster using the alternate cluster repair menu options from the operating cluster. Then continue with the next step. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Quiesce the Alternate Cluster
7. Make configuration diskette(s) from the cluster not being repaired. (Multiple diskettes will be needed if the configuration is large.) Note: Do not use previously created configuration diskettes, they may not have current information. From the service terminal Main Service Menu, select: Configurations Option Menu Import/Export Configuration Data Menu Export Configuration Data via Diskette Follow the service terminal prompts, insert the diskette when instructed. Note: When the diskette(s) are removed, label them with a date and as a configuration diskette. (If there are multiple configuration diskettes, mark them in the order they were created.)
8. Make a customization diskette now from the cluster not being repaired. Note: Do not use previously created customization diskettes, they may not have current information. From the service terminal Main Service Menu, select: Utility Menu Make A Customization Diskette Follow the service terminal prompts, insert the diskette (new media for /dev/rfd0) when prompted.
9. Insert the customization diskette in the diskette drive of the cluster being repaired.
10. Insert the 2105 O/S VER. X.X.X. volume 1 CD in the failing cluster CD-ROM Drive. Wait until the CD-ROM Drive LED stops blinking, then go to the next step.
11. Power off the cluster, use the Alternate Cluster Repair Menu options.
12. Power on the cluster, use the Alternate Cluster Repair Menu options.
13. Connect the service terminal to port S1 of the cluster being repaired and attempt to logically connect to the cluster.
321
Note: The service terminal logical connection will be lost several times. Keep logically reconnecting the service terminal so you do not miss seeing the displayed information. 14. The cluster will begin loading code from the 2105 CD and customization diskette to build the hard disk drive. After 20-30 minutes you will be asked to remove the 2105 O/S Volume 1 CD and the customization diskette and install the 2105 O/S Volume 2 CD. Please follow the instructions to install the CD and continue. Notes: a. If the CEC drawer operator panel displays code 0c31 and the following message appears on the service terminal, this indicates that the cluster was unable to read the customization diskette. ****** Please define the System Console. ******* Type a 1 and press Enter to use this terminal as the system console... The likely causes and actions are : 1) The customization diskette was not inserted. Please insert the diskette and then restart at step 9 on page 321. 2) The customization diskette is corrupted. Please create another diskette (step 8 on page 321) and then restart at step 9 on page 321. 3) The diskette drive is failing. Please replace the diskette drive and then restart at step 9 on page 321. 4) The bootlist is incorrect. Please use MAP 43A0: Bootlist Management Using SMS on page 387 to repair this and then restart at step 9 on page 321. b. If any of the following symptoms occur, the CD image is not being read. v CEC drawer operator panel code 20EE000B - no bootable devices found. v Cluster begins booting from one of the hard-drives. For example, Init CPI4 message appears on the CEC drawer operator panel v The symptoms which sent you to this map are repeated The likely causes and actions are: 1) Dirty CD. Clean the CD and then retry the failing operation from step 9 on page 321. 2) Failing CD or CD ROM drive. Replace the FRU and then return to step 9 on page 321. 3) Incorrect bootlist. 
Use MAP 43A0: Bootlist Management Using SMS on page 387 to check and correct this, then restart at step 9 on page 321. 15. The cluster will reboot. Please reconnect the service terminal. Ignore any error messages that may temporarily display as the status messages scroll by. 16. After 10-15 minutes, a message will appear which will ask you to either: v Remove the 2105 O/S Volume 2 CD and install the 2105 O/S update CD. In that case follow the instructions to install the CD and select the option to continue. Then go to the next step. or v Remove the 2105 O/S Volume 2 CD and install the 2105 LIC CD and the first Configuration diskette. In that case please follow the instructions to install the CD plus Configuration diskette and go to step 18 on page 323.
322
17. After a few minutes you will be asked to remove the 2105 O/S update CD and install the 2105 LIC CD and the first Configuration diskette. Please follow the instructions to install the CD plus configuration diskette and select the option to continue. 18. After a few minutes you will see a message to inform you that the configuration data is being read. You will then be asked to insert the additional configuration diskettes as required. Please follow the instructions to install the remaining configuration diskette(s) and select the option to continue. 19. After a few minutes you will be asked to remove the 2105 LIC CD and configuration diskette. Please follow the instructions to remove the CD and configuration diskette and select the option to continue. 20. After a few minutes you will see a message to indicate that the cluster hard disk drives are being mirrored. 21. After approximately 40 minutes a message will be displayed to indicate that the Hard Drive Rebuild has completed. When completed, do the following: a. Type 1 and press Enter to continue. b. Messages will display and then the Copyright screen will display. c. Login to the other cluster. (Use the normal S2 port.) d. Use the Alternate Cluster Repair Menu options to power off, and power on the cluster being repaired. 22. Wait up to 45 minutes for the cluster to come ready and then attempt to login with the service terminal. Was the service terminal able to login to the cluster being repaired? v Yes, continue with the next step. v No, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567. 23. With the service terminal still connected to the cluster being repaired, display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option): v If there are any new problems for the cluster (CEC drawer or I/O drawer), repair them before continuing. 
v If the last occurrence timestamp in an existing cluster (CEC drawer or I/O drawer) problem was updated, then the cluster is still failing and needs to be repaired before continuing. 24. Connect the service terminal to the cluster not being repaired. Use the Alternate Cluster Repair Menu option to resume the alternate cluster. Wait for the operator panel cluster Ready Indicator LED to come on and then go to the next step. 25. If the service terminal repair process did not automatically close the problem, then use this step to close it now. Press F3 on the service terminal until the Main Service Menu is displayed, then select: Repair Menu Close a Previously Repaired Problem. Note: Closing or cancelling a problem will attempt to return to customer use any fenced or quiesced resources. If the problem was not fully repaired, the existing problem may be updated or a new problem created. 26. Use the service terminal options listed below to ensure all resources for this repair have been returned to customer use (they will not be listed). Any listed
Problem Isolation Procedures, CHAPTER 3
323
resources are not available for customer use and will still be quiesced or fenced. Those resources should have a related problem listed that still needs repair. If resources are listed and there are no problems listed, call the next level of support. Press F3 on the service terminal until the Main Service Menu is displayed, then select: Repair Menu End of Call Status
Description
This procedure is used as follows: v During AutoLIC recovery only. v To reload AIX and the 2105 Model 800 code on the hard disk drives of one cluster.
Requirements
v 2105 O/S CD volumes 1 and 2
v 2105 O/S update (PTF) CD if required
v 2105 LIC CD
v Customization and configuration diskettes created with the LIC Install Instructions
Isolation
1. Verify the service terminal is connected to the cluster not being repaired, see Service Terminal Setup in chapter 8 of the Volume 3. 2. Use the table to proceed.
Table 34. Original Repair MAP
The AutoLIC repair started with a problem log calling MAP    Action
MAP 4A10                                                     Go to step 3
MAP 4A40                                                     Go to step 3
Any other 4Axx MAP                                           Call the next level of support. See note.
Note: One of the following two conditions is present: v Multiple failures exist. v The cluster to be repaired may have already been booted on the new LIC code. This MAP only supports reloading the original code and then recovering the AutoLIC. 3. Did you quiesce the failing cluster before you started this MAP? v Yes, continue with the next step. v No, quiesce the failing cluster using the alternate cluster repair menu options from the cluster not being repaired. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair
324
325
Description
A CPI error has generated a problem that is ready for repair. The error recovery code has fenced (removed from customer use) a host bay, a cluster, or both a host bay and a cluster. There are four CPI diagnostic tests: v IOA Test, tests the IOA/NVS card in the cluster. v IOA to Host Bay Planar Test, tests the interface between the IOA/NVS card in the cluster and the host bay planar in the host bay. v Host Bay Planar Test, tests the host bay planar. v Host Bay Planar PCI Bus Test, tests the PCI bus section of the host bay planar which is used for cluster to cluster communication. It is the common logic between the CPI interface to each cluster. This test first uses the cluster to cluster Ethernet communications to set up registers in both clusters before testing the cluster to cluster CPI communications. The CPI diagnostics are run under four conditions. These are listed in the table below.
326
CPI Test IOA Test IOA to Host Bay Planar Test Host Bay Planar Test Host Bay Planar PCI Bus Test
To load the host bay planar firmware, both clusters must be available (not quiesced or fenced).
Isolation
Attention: The Likely to Fix FRU percentages, shown in the problem details, cannot be used to determine the order of FRU replacement. To avoid customer impact, this MAP must be followed exactly. The repair sequence is determined by which resources are fenced and which FRUs require replacement. 1. Review all problems needing repair looking for CPI interface problems with FRUs in the host bay, cluster I/O drawer, or both. v The possible cluster I/O drawer FRUs are the NVS/IOA card and the I/O drawer planar assembly. v The possible host bay FRUs are the host bay planar assembly and host adapter cards. v The CPI cables are also possible FRUs, but will not be listed in a problem. 2. Write down the time stamp in the Last Occurrence field of each related problem. This field is updated with a new time stamp if the same error is detected again during the repair verification procedures. It is also possible for a new problem to be created if the CPI diagnostics or functional code discover a related problem. 3. Select the condition below that applies:
Table 36. Failure Condition
Condition                                                  Action
You just replaced a cluster or host bay FRU.               Go to step 5 on page 329
You were performing an AutoLIC or MultiLIC code upgrade.   Go to step 7 on page 330
Neither of the above.                                      Go to step 4
4. Determine if any cluster or host bay or bays, are fenced or quiesced. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Show Fenced Resources Show Quiesced Resources
327
cpcluster0 (Cluster 1)
cpcluster1 (Cluster 2)
328
cpcluster0 (Cluster 1) cpcpi4 (Host Bay 1) cpcpi5 (Host Bay 3) cpcpi6 (Host Bay 2) cpcluster1 (Cluster 2) cpcpi4 (Host Bay 1) cpcpi5 (Host Bay 3) cpcpi7 (Host Bay 4) cpcluster0 (Cluster 1) cpcpi4 (Host Bay 1) cpcpi6 (Host Bay 2) cpcpi7 (Host Bay 4) cpcluster1 (Cluster 2) cpcpi5 (Host Bay 3) cpcpi6 (Host Bay 2) cpcpi7 (Host Bay 4)
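The resource names shown above do not follow the host bay numbering in order (cpcpi5 is Host Bay 3 and cpcpi6 is Host Bay 2), so a lookup table can help when reading the fenced or quiesced resource lists. The Python sketch below copies the name and label pairs from the list above. The dictionary name and the function `label_resources` are hypothetical and are not part of the service terminal.

```python
# Resource names and labels copied from the list above.
RESOURCE_LABELS = {
    "cpcluster0": "Cluster 1",
    "cpcluster1": "Cluster 2",
    "cpcpi4": "Host Bay 1",
    "cpcpi5": "Host Bay 3",
    "cpcpi6": "Host Bay 2",
    "cpcpi7": "Host Bay 4",
}


def label_resources(names):
    """Translate resource names into the labels used in this MAP."""
    return [RESOURCE_LABELS.get(name, "unknown resource: " + name) for name in names]


print(label_resources(["cpcluster0", "cpcpi5", "cpcpi6"]))
```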
5. Verify that the CPI cables are connected correctly at the host bay planars and the I/O drawer NVS/IOA cards. Note: If the listed FRUs do not fix the problem, the attached CPI cable or cables should be replaced only if a host bay planar and NVS/IOA card were replaced. 6. Is the ESC listed in the problem one of the following:
329
ESC = 1350: CPI4 cluster 2 ESC = 1351: CPI5 cluster 2 ESC = 1352: CPI6 cluster 2 ESC = 1353: CPI7 cluster 2 v Yes, continue with the next step. v No, the availability and code level of each cluster must be determined before a CPI repair can be attempted. Call your next level of support. 8. Did this failure occur while performing a MultiLIC or AutoLIC update? v Yes, continue with the next step. v No, to continue with the analysis and FRU replacement, go to step 4 on page 327. Note: You may prefer to call the next level of support to have PFE attempt to restore the firmware without FRU replacement. 9. Did this failure occur while performing a MultiLIC update? v Yes, go to step 11 on page 331. v No, (AutoLIC update) continue with the next step. 10. Is a cluster FRU listed in the problem (ESC 1340-1347 or 134C-1353)? v Yes, replace ONLY the cluster FRU listed. Do the following: Determine if the failing cluster is already quiesced. Use the Main Service Menu, Utility Menu, Resource Management Menu, Show Quiesced Resources.
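The ESC ranges named in step 10 (1340-1347 or 134C-1353) can be checked mechanically. The Python sketch below assumes that ESC values are hexadecimal, which the value 134C suggests but which this MAP does not state. The function name `is_cluster_fru_esc` is hypothetical.

```python
def is_cluster_fru_esc(esc):
    """True when the ESC falls in 1340-1347 or 134C-1353.

    Assumption: ESC values are hexadecimal strings such as "134C".
    """
    value = int(esc, 16)
    return 0x1340 <= value <= 0x1347 or 0x134C <= value <= 0x1353


# 1348 through 134B fall between the two ranges and are not matched.
for code in ("1340", "1348", "134C", "1353"):
    print(code, is_cluster_fru_esc(code))
```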
330
331
Note: A CPI fence can be caused by a host bay power failure. If there is a problem for this host bay or an RPC card, repair that problem first and then begin this repair over again. If there is not a related power problem, observe the host bay planar power LEDs as shown in Figure 138 on page 453. Use a working host bay to ensure you know where to look. If the LEDs are lit, there is no power problem. If the LEDs are not lit, at the rear of the rack, observe the HA1 (for host bay 1 or 3) or HA2 (for host bay 2 or 4) LEDs on both host bay drawer power supplies. If one or both LEDs are lit, the possible failing FRUs are the host bay planar or the host bay drawer backplane. Use the Repair Menu, Replace a FRU option to replace the FRUs. MAP 4040 Section-3: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Do the action shown in the table:
332
MAP 4040 Section-4: 1. Have you been directed to replace a CPI cable in addition to the FRUs listed in the problem? v Yes, when you replace a host bay or I/O drawer FRU, also replace the CPI cable. Continue with the next step. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished. v No, continue with the next step. 2. Find the combination of FRUs listed in the problem, or logs, that have not been replaced. Do the action shown in the table:
Table 40. FRUs Not Yet Replaced FRUs Not Yet Replaced I/O drawer and host bay FRUs Action Replace the I/O drawer FRUs, go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341 Replace the I/O drawer FRUs, go to MAP 4060: Replacing I/O Drawer FRUs for CPI Problems on page 341 Go to step 3 Note: Both clusters must be available when a host bay FRU is replaced so the host bay FRU firmware can be properly loaded. There is no cluster FRU, so it is assumed that the CPI failure is in the CPI cable or host bay and the cluster can be made available.
3. Quiesce the failing host bay. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu Quiesce a Resource 4. Quiesce and then Resume the unavailable cluster. From the service terminal Main Service Menu, select: Utilities Menu Resource Management Menu
333
3. Logon to Cluster 2 then Quiesce Cluster 1. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 4. Are there any cluster I/O drawer FRUs to be replaced? v Yes, continue with the next step. v No, go to step 6. 5. Replace the cluster I/O drawer FRUs but do not attempt to resume the cluster. a. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. b. When MAP 4700 directs you to resume the cluster, do not resume it. c. Exit MAP 4700, return here and continue with the next step. 6. Select the condition that applies: v Host bay FRUs are listed for host bay 1 only. Continue with the next step. v Host bay FRUs are listed for host bay 3 only. Go to step 8 on page 335. v No host bay FRUs. Go to step 10 on page 335. 7. Attempt to quiesce and then resume CPI 5 (Host Bay 3). From the service terminal Main Service Menu, select: Utilities Menu
334
335
3. Logon to Cluster 1 then Quiesce Cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 4. Are there any cluster I/O drawer FRUs to be replaced? v Yes, continue with the next step. v No, go to step 6. 5. Replace the cluster I/O drawer FRUs but do not attempt to resume the cluster. a. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. b. When MAP 4700 directs you to resume the cluster, do not resume it. c. Exit MAP 4700, return here and continue with the next step. 6. Select the condition that applies: v Host bay FRUs are listed for host bay 2 only. Continue with the next step.
336
337
338
Description
A cluster has detected a condition where a Host bay is being permanently held in reset state. This condition can be triggered during a machine or cluster recovery action. This is not generally a hardware failure and can be recovered by power cycling the affected bays.
Isolation
1. Is the ESS currently being used by the customer?
   • Yes, go to step 6 on page 340.
   • No, continue with the next step.
2. Cancel the problem or logs which sent you to this MAP.
   From the service terminal Main Service Menu, select:
      Utility Menu
      Problem Log Menu
Note: This ESC will only be used on code levels below 2.3.0.0. The next level of support will need to determine the affected host bay from the AIX error log. Alternatively, you can proceed with this MAP and perform step 10 on page 341 against all host bays.
7. Determine whether one or both clusters are operational. Connect the service terminal to the working cluster.
   From the service terminal Main Service Menu, select:
      Utility Menu
      Resource Management Menu
      Show Fenced Resources
      Show Quiesced Resources
   Note: If the cluster you are logged on to is fenced, you may not be able to display the Fenced or Quiesced Resources. In that case, log on to the working cluster.
8. Connect the service terminal to an operating cluster.
9. Cancel the problem or logs which sent you to this MAP. Connect the service terminal to the working cluster.
   From the service terminal Main Service Menu, select:
      Utility Menu
Description
This MAP is used to replace cluster I/O drawer FRUs for CPI problems. These FRUs are the IOA/NVS card and I/O drawer planar assembly.
Isolation
Note: All CPI repairs begin at MAP 4040. If you were not sent here from MAP 4040: Entry MAP for CPI Problems on page 326, go there now.
1. Do one of the following:
   • If a cluster is fenced, replace the FRUs in that cluster I/O drawer first. Go to the next step.
   • If a cluster is not fenced, and you have FRUs listed in both clusters, you may select either cluster to replace FRUs in. Go to the next step.
2. To replace the cluster I/O drawer FRU or FRUs, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432.
   Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished.
   a. When MAP 4700 directs you to close the problem, do not close it until you have replaced all the listed FRUs.
   b. When MAP 4700 directs you to go to MAP 1500: Ending a Service Action on page 67, return here instead and continue with the next step.
3. Have all listed FRUs for both the I/O drawer and host bays been replaced?
   • Yes, continue with the next step.
   • No, replace the FRUs by using MAP 4040: Entry MAP for CPI Problems on page 326. (Use MAP 4040 because the unavailable resources may have changed.)
4. Are any host bays fenced?
   • Yes, continue with the next step.
   • No, go to step 6.
5. Quiesce the fenced host bay.
   From the service terminal Main Service Menu, select:
      Utility Menu
      Resource Management Menu
      Quiesce a Resource
6. Are any host bays quiesced?
   • Yes, continue with the next step.
   • No, go to step 8.
7. Resume the quiesced host bay.
   From the service terminal Main Service Menu, select:
      Utility Menu
      Resource Management Menu
      Resume a Resource
8. Close the related problems and then use the Repair Menu, End Of Call Status option.
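The fenced and quiesced checks near the end of this procedure amount to a small progression for each host bay: a fenced bay is quiesced, and a quiesced bay is then resumed. The following is a minimal Python sketch of that ordering only. The function name, the state labels, and the input dictionary are assumptions made for illustration; they are not part of the service terminal or the 2105 code.

```python
def host_bay_actions(states):
    """Return the ordered menu actions for each host bay.

    states maps a host bay number to "fenced", "quiesced", or "available".
    These labels are hypothetical and chosen only for this sketch.
    """
    actions = []
    for bay in sorted(states):
        state = states[bay]
        if state == "fenced":
            # A fenced host bay is quiesced first.
            actions.append(("Quiesce a Resource", bay))
            state = "quiesced"
        if state == "quiesced":
            # A quiesced host bay is then resumed.
            actions.append(("Resume a Resource", bay))
    return actions


print(host_bay_actions({1: "fenced", 2: "available", 3: "quiesced"}))
```

A bay that is neither fenced nor quiesced produces no action, which corresponds to going directly to the step that closes the related problems.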
Description
This MAP is used to replace host bay FRUs. The CPI diagnostics are run when the host bay is resumed.
Isolation
Note: All CPI repairs begin at MAP 4040. If you were not sent here from MAP 4040: Entry MAP for CPI Problems on page 326, go there now.
1. Replace the host bay FRU or FRUs. Use the Replace a FRU option.
   Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Replace a FRU
2. Have all listed FRUs for both the I/O drawer and host bays been replaced?
   • Yes, continue with the next step.
   • No, replace the FRUs by using MAP 4040: Entry MAP for CPI Problems on page 326. (Use MAP 4040 because the unavailable resources may have changed.)
3. Close the related problems and then use the Repair Menu, End of Call Status option.
   Note: If the listed FRUs did not fix the problem, and the host bay planar and NVS/IOA card were replaced, the CPI cable between them may be failing.
Description
The CPI diagnostics check that each cluster IOA card CPI interface is cabled to the proper host bay CPI interface. If the diagnostics detect only one CPI address mismatch, it indicates a CPI address logic failure. If two errors are detected, the most likely cause is two CPI cables being cross connected. The CPI cables and adjacent sheet metal are marked with matching color labels to indicate proper connection.
Isolation
1. Determine if there are one or two problems related to CPI address mismatch. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):
   • There is only one related problem. Continue the repair using the problem and replace the listed FRU(s).
   • There are two or more related problems. Go to the next step.
2. Two or more CPI cables are cross connected. Use the color labels on the CPI cables and adjacent sheet metal to determine which cables are crossed, or use the following table to determine the proper connections for each CPI cable.
   Notes:
   a. The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished.
   b. Refer to Locating a CPI Cable Using Colored Labels in chapter 7 of the Volume 3.
Table 44. CPI Cable Connections
CPI Interface   Cluster Location    Host Bay Location   Color Code
CPI4 Local      T1-U0.1-P1-I3/A1    R1-B1-P1/JB         Green
CPI4 Remote     T2-U0.1-P1-I3/A1    R1-B1-P1/JA         Yellow
CPI5 Local      T2-U0.1-P1-I4/A1    R1-B3-P1/JB         Red
CPI5 Remote     T1-U0.1-P1-I4/A1    R1-B3-P1/JA         Violet
CPI6 Local      T1-U0.1-P1-I5/A1    R1-B2-P1/JB         Gray
CPI6 Remote     T2-U0.1-P1-I5/A1    R1-B2-P1/JA         Brown
CPI7 Local      T2-U0.1-P1-I9/A1    R1-B4-P1/JB         Orange
CPI7 Remote     T1-U0.1-P1-I9/A1    R1-B4-P1/JA         Blue
3. Determine the end of each cable that is cross connected. Use the service terminal Main Menu, Replace a FRU option to quiesce and power off the FRUs the cables are connected to before correcting the cable connections.
   • Use the host bay FRU option for that end of each CPI cable.
   • Use the cluster FRU option for that end of each CPI cable.
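The comparison against Table 44 can be sketched in Python as a lookup of the expected host bay connector for each cluster connector. The table values are transcribed from Table 44. The function names and the `observed` dictionary (cluster location mapped to the host bay location actually found on the cable) are hypothetical and exist only for this illustration; the service terminal does not provide such an interface.

```python
# Expected CPI cabling, transcribed from Table 44:
# interface -> (cluster location, host bay location, color code)
CPI_CABLES = {
    "CPI4 Local":  ("T1-U0.1-P1-I3/A1", "R1-B1-P1/JB", "Green"),
    "CPI4 Remote": ("T2-U0.1-P1-I3/A1", "R1-B1-P1/JA", "Yellow"),
    "CPI5 Local":  ("T2-U0.1-P1-I4/A1", "R1-B3-P1/JB", "Red"),
    "CPI5 Remote": ("T1-U0.1-P1-I4/A1", "R1-B3-P1/JA", "Violet"),
    "CPI6 Local":  ("T1-U0.1-P1-I5/A1", "R1-B2-P1/JB", "Gray"),
    "CPI6 Remote": ("T2-U0.1-P1-I5/A1", "R1-B2-P1/JA", "Brown"),
    "CPI7 Local":  ("T2-U0.1-P1-I9/A1", "R1-B4-P1/JB", "Orange"),
    "CPI7 Remote": ("T1-U0.1-P1-I9/A1", "R1-B4-P1/JA", "Blue"),
}


def find_mismatches(observed):
    """List cables whose host bay end differs from Table 44.

    observed is a hypothetical dict: cluster location -> host bay
    location found by inspection. Unlisted cables are not checked.
    """
    mismatches = []
    for interface, (cluster_loc, bay_loc, color) in CPI_CABLES.items():
        seen = observed.get(cluster_loc)
        if seen is not None and seen != bay_loc:
            mismatches.append((interface, color, bay_loc, seen))
    return mismatches


def likely_cause(mismatch_count):
    """Apply the rule from the Description: one error vs. two or more."""
    if mismatch_count == 0:
        return "no mismatch"
    if mismatch_count == 1:
        return "CPI address logic failure"
    return "CPI cables cross connected"


# Example: the host bay ends of CPI4 Local and CPI5 Local are swapped.
swapped = {
    "T1-U0.1-P1-I3/A1": "R1-B3-P1/JB",
    "T2-U0.1-P1-I4/A1": "R1-B1-P1/JB",
}
found = find_mismatches(swapped)
print([m[0] for m in found], likely_cause(len(found)))
```

The color code is carried along in each result because the color labels on the cables and sheet metal are the quickest way to confirm a mismatch at the machine.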
Description
Each cluster I/O drawer has a unique jumper that identifies it as cluster 1 (left) or cluster 2 (right). These jumpers are installed when the 2105 is built by manufacturing. When the I/O drawer planar assembly is replaced, the jumper must be moved from the old planar to the new planar. If a jumper is missing, or if both jumpers have the same ID, the functional code creates a problem that refers to this MAP.
Isolation
1. Display the Problems Needing Repair. Is ESC 2780 in the problem details?
   • Yes, go to step 6 on page 346.
   • No, continue with the next step.
2. Use the problem to determine which cluster is failing. A quick visual inspection can be done without quiescing and powering off the cluster. Move the I/O drawer to the service position, then open the top cover just long enough to verify the proper cluster ID jumper is installed:
   • Cluster 1 (left) jumper with blue wires is labeled T1-U0.1 P1/Q5 (P/N 18P3209)
   • Cluster 2 (right) jumper with blue wires is labeled T2-U0.1 P1/Q5 (P/N 18P3210)
   Is either jumper missing or incorrect?
   • Yes, continue with the next step.
   • No, go to step 5.
Figure 128. I/O Drawer Cluster ID Jumpers (s009459). Labels in figure: Top View, Front.
3. The cluster ID jumper is missing or incorrect. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to install the correct jumper. Then continue with the next step.
4. Log in and display the original problem. Observe the last occurrence date/time field in the problem details display. Was the date/time field updated?
   • Yes, the problem has not been fixed. Continue with the next step.
   • No, the problem is fixed. Use the Repair Menu options, Close a Previously Repaired Problem and End of Call Status, to complete the repair action.
5. The possible failing FRUs are:
MAP 40B0: Special Cluster Problem Determination Using Slow Boot Mode
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is for special cases where additional problem determination is needed to try to generate an error code.
Isolation
1. Connect the service terminal to the cluster not being serviced.
2. Quiesce and power off the cluster being serviced.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
      Quiesce Alternate Cluster
      Power Off the Alternate Cluster
3. Wait up to three minutes for the CEC drawer operator panel to display OK. Is OK displayed?
   • Yes, continue with the next step.
   • No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
4. When OK is displayed on the CEC drawer operator panel, connect the service terminal (CE MOST or laptop) to the S2 port on the cluster being serviced. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor (SP) Main Menu should be displayed.
   Note: The Master console cannot be used to access the SP menus.
5. Disable Fast System Boot. (Slow system boot runs more diagnostics, which might provide an error code to repair with.)
   From the service terminal Main Service Menu, select:
      System Power Control Menu
      Enable/Disable Fast System Boot
   Note: Remember to reset to fast system boot when the repair is complete.
6. Power on the cluster using the Alternate Cluster Repair Menu options. Look for an error code to repair the cluster with that does not send you back to this MAP. If there is none, go to the next step.
7. Display the SP error logs looking for repair information. Power off the cluster.
   From the service terminal Main Service Menu, select:
      System Information Menu
      Read Service Processor Error Logs
8. Display the SP progress indicators from the last system boot (cluster power on).
   From the service terminal Main Service Menu, select:
      System Information Menu
      Read Progress Indicators from Last System Boot
   Read the last progress indicator to determine what might have occurred last before the cluster failed or hung.
9. Go to MAP 4540: Cluster Minimum Configuration on page 418.
Description
This MAP is for special cases where additional problem determination is needed for an error code.
Isolation
Note: Only use this MAP if directed here by a service guide procedure.
1. If a SCSI bus repair is in progress, check that the FRU(s) that were just replaced are:
   a. The correct part number.
   b. Correctly connected for signal and power inside the CEC drawer and between the CEC drawer and the I/O drawer.
   c. Correctly plugged for the SCSI ID (CD-ROM = 3, HDD1 = 0, HDD2 = 2).
2. Verify that the SCSI devices are receiving power. With the cluster powered on, press the CD-ROM drive eject button. Does the CD tray open?
   • Yes, go to step 4 on page 348.
   • No, verify that the SCSI power cable is connected to each hard disk drive and CD-ROM drive in the CEC drawer. Ensure the SCSI power cable between the CEC drawer and I/O drawer is connected. If no problem is found, continue with the next step.
3. There may be an overcurrent condition that tripped the automatic resettable fuses on the I/O drawer planar assembly.
   a. Quiesce and power off the cluster. (Connect the service terminal to the working cluster and use the Alternate Cluster Repair Menu options.)
   b. Wait more than five minutes with power off for the fuses to reset.
Description
Use this MAP to resolve problems reported by SRNs A00-000 to A1F-FFF.
Description
Observe the FRU location code for the I/O drawer power supply in the problem details:
   • If the FRU location code ends in U0.1-V, the cluster firmware detected only one of the two power supplies when the cluster powered on.
Isolation
1. Observe the LED indicators on both power supplies for the failing drawer. Is an amber check indicator on?
   • Yes, go to MAP 2230: CEC, I/O, or Host Bay Drawer Power Fault on page 122.
   • No, continue with the next step.
2. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Are there any related power problems for the same I/O drawer?
   • Yes, exit this MAP and repair them now. (If the other problem also sends you to this MAP, answer No to this step.)
   • No, continue with the next step.
3. One of the I/O drawer power supplies is failing without a visual symptom. Determining whether the failure is still occurring requires you to power the cluster off and then on. Do the following:
   a. Close the problem, or logs, that sent you here. Use the Repair Menu, Close a Previously Repaired Problem option.
   b. Use the Repair Menu, Alternate Cluster Repair Menu options to quiesce, power off, and power on the cluster. Do not resume the cluster.
   c. Wait up to 45 minutes for the cluster to come ready.
      Note: If the cluster does not come ready, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
   d. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and then do one of the following:
      • If there is a new problem for this error, one of the power supplies is failing. Continue with the next step.
      • If the original problem details Last Occurrence Timestamp field is updated, one of the power supplies is failing. Continue with the next step.
      • If there is not a new problem and the original problem Last Occurrence Timestamp field is not updated, the error condition is no longer occurring. Close the original problem using the Repair Menu, Close a Previously Repaired Problem option. Use the Repair Menu, End of Call Status option to complete the service action.
4. Isolate the failing power supply:
   a. Close the problem, or logs, that sent you here. Use the Repair Menu, Close a Previously Repaired Problem option.
   b. Quiesce and power off the cluster.
   c. Replace one power supply.
   d. Wait up to 45 minutes for the cluster to come ready.
      Note: If the cluster does not come ready, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
   e. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and then do one of the following:
      • If there is a new problem, or if the original problem details timestamp field was updated, the power supply that was not replaced is failing. Replace
Isolation
Determine if the LIC installation will be from CD-ROM or diskette:
   • If using a CD-ROM as the LIC installation media, go to MAP 4600: Isolating a CD-ROM Test Failure on page 429.
   • If using a diskette as the LIC installation media, go to MAP 4620: Isolating a Diskette Drive Failure on page 430.
Description
Both host bay drawer power supplies in the same host bay drawer are reporting different status for one or more host bay drawer cooling fans. It is necessary to first replace the power supply that is reporting the fan failure. If the conflicting status error still is reported, then the other power supply must be replaced. Once the conflicting status problem has been repaired, then any remaining host bay drawer fan problem can be repaired.
Isolation
1. Read the description above before continuing. The most likely failing FRUs are:
   • Host bay drawer power supply 1 or 2
   • Host bay planar
   Note: Replacing a power supply or cable can be done concurrently. Replacing the host bay planar requires taking away both host bays from customer use.
2. Record the Last Occurrence Time stamp field value from the problem details display for this problem. After the FRU has been replaced, display the same field to determine if the error is still occurring (time stamp was updated). Look for any new related problems that may have been created.
3. Do one of the following:
   • To replace or reseat a host bay drawer power supply, continue at the next step.
   • To replace the host bay planar, go to step 5 on page 352.
Description
This failure indicates that a resource has been detected (ESC = 1202) that has not been properly installed in the 2105 Model 800.
Isolation
1. Is there another problem (ESC = 1201) indicating that a resource is missing?
   • Yes, a FRU has been placed in a wrong location and needs to be moved. Go to step 6 on page 353.
   • No, continue with the next step.
2. Look at the resource in the FRU list of the problem. The 2105 Model 800 has detected a resource that has not been properly installed. Should this resource be installed in this machine?
   • Yes, record the Problem ID number, then continue with the next step to install this resource.
   • No, go to step 7 on page 353.
3. Look at Install and Remove in chapter 5 of the Volume 2. See if there is an installation procedure for this resource. Is there an installation procedure for this resource?
   • Yes, continue with the next step and perform the installation.
   • No, there is no installation process for this resource. Call the next level of support for assistance.
4. Perform the installation as described in the Service Guide. Were you able to complete the installation?
   • Yes, continue with the next step to cancel the original problem.
   • No, contact your next level of support.
5. The problem is now resolved; cancel the original problem. Press F3 until the Main Service Menu is displayed.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
   Select the problem with the ID you recorded in step 2. Scroll to the bottom of the display and select the line that starts with: Close Problem .....
   The problem is now closed and this repair is complete.
Description
This failure indicates that a resource that should be in the 2105 Model 800 has not been detected (ESC = 1201). This may mean that the resource is not in the expected location or that the resource is failing in such a way that it cannot be detected.
Isolation
1. Is there another problem (ESC = 1202) indicating that a resource is unexpected?
   • Yes, a FRU has been placed in a wrong location. Continue with the next step to move the FRU.
   • No, go to step 3.
2. You are going to move the FRU to the correct location. Select the FRU in the FRU list with either of the two problems. When directed to replace the FRU, move the FRU to the correct location. Continue through verification. Does Verification run without a problem?
   • Yes, the problem is resolved. Return to the service terminal and follow directions to return the resource to the customer and close the problem.
   • No, resolve the problem created by verification.
3. Is the listed FRU an NVS card?
   • Yes, go to step 6 on page 354.
   • No, continue with the next step.
4. You will add or replace the missing/failing resource.
   a. Select the FRU from the problem FRU list.
   b. When you are directed to replace the FRU, follow the remove/replace instructions to remove the FRU.
Description
A Cluster hard disk drive is failing or data on it has been corrupted.
Isolation
1. Select the type of LIC update process that failed:
   • If Automatic LIC Activation failed, call the next level of support.
   • If Multiple LIC Activation failed, continue with the next step.
2. Connect the service terminal to the failing cluster and attempt to log in. Was the login successful?
   • Yes, continue with the next step.
   • No, go to step 5 on page 355.
3. Do the following and then return here and continue:
   • Display and repair any problem for the failing cluster.
Description
A failure at the interface between the primary power supply (PPS) and RPC card has been detected. The failure can be caused by the PPS, the RPC card or the cable between them.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Replace the FRUs listed in the problem. If the error still occurs, replace the cable between the RPC card and the PPS.
   • The PPS in rack 1 connects to RPC card, J2 connector slot 6 (near front of rack)
   • The PPS in rack 2 connects to RPC card, J2 connector slot 5 (near rear of rack)
Description
You have been directed here because you have an SRN or service processor error code that lists one or more memory FRUs (memory DIMM or memory riser card).
   • Memory DIMMs work together in quads (four memory DIMMs with memory words spread across them). See the artwork and tables at the end of this MAP.
   • A single memory DIMM may fail and not affect the operation of the other three DIMMs in the quad it is part of. Problem logs for the single failing DIMM will be created.
   • A memory DIMM may fail and affect one or more memory DIMMs in the quad it is part of. Problem logs for each affected memory DIMM will be created.
   • All memory DIMMs in a quad must be of the same type and size.
Isolation
1. Read the description section above before continuing.
2. Display problems needing repair. Is there more than one problem that lists memory FRUs for the failing cluster?
   • Yes. It is possible for a single memory failure to create multiple problem logs. For example, if a memory DIMM is physically missing during cluster power on, one or two problems may be created for each of the three remaining DIMMs in the same quad. There will be no problem for the missing DIMM. Go to step 4.
   • No, continue with the next step.
3. Does the problem list a single memory DIMM with no other FRUs?
   • Yes, go to step 6.
   • No, continue with the next step.
4. If more than one memory DIMM is called out, go to the service processor memory configuration/deconfiguration menu to verify the memory DIMM state. Access the SP menus:
   • Connect the service terminal to the working cluster.
   • Use the Main Service Menu, Repair Menu, Alternate Cluster Repair Menu options to quiesce and power off the failing cluster.
   • When the failing cluster CEC drawer operator panel displays OK, connect the service terminal (CE MOST or laptop) to the I/O drawer S1 serial port connector and log in. Use the SP MAIN MENU, System Information Menu, Memory Configuration/Deconfiguration Menu.
5. From the Memory Configuration/Deconfiguration Menu, select the card or cards specified by the location code or codes of the failing memory DIMM or DIMMs. If the first character of the error status of any memory DIMM is 1, 2, or 3 (but not 0 or 4), this is a suspect memory DIMM. Record its location. For more information on the error status of the memory DIMMs, see System Information Menu, step 2 on page 419.
   • If only one memory DIMM was recorded, go to step 6.
   • If more than one memory DIMM was recorded, and the memory DIMMs reside in one quad, go to step 7 on page 357.
   • If more than one memory DIMM was recorded, and the memory DIMMs reside in more than one quad, go to step 8 on page 357.
6. Only one memory DIMM was recorded. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. The memory DIMM.
   b. The memory quad. See table below.
7. More than one memory DIMM was recorded, and the memory DIMMs reside in one quad. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. All of the failing memory DIMMs.
   b. The memory riser card.
8. More than one memory DIMM was recorded, and the memory DIMMs reside in more than one quad. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the following FRUs in the order listed until the problem is fixed:
   a. The memory riser card.
   b. All of the failing memory DIMMs.
Figure 129. 2105 Model 800 Memory Riser Card Memory DIMM Locations (s009638)
DIMM installation, slot labels shown in the figure (slot number, letter):
   Odd slots:   15 A, 13 B, 11 C, 9 D, 7 D, 5 C, 3 B, 1 A
   Even slots:  16 A, 14 B, 12 C, 10 D, 8 D, 6 C, 4 B, 2 A
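The branching in step 5 of the isolation procedure (one suspect DIMM, several in one quad, several across quads) can be sketched in Python as follows. The slot letters are transcribed from Figure 129; treating each letter as the quad identifier is an assumption of this sketch, as are the function names and the representation of the error status as a string.

```python
# Slot number -> letter, transcribed from Figure 129.
# Assumption: the letter identifies the quad the slot belongs to.
QUAD_OF_SLOT = {
    1: "A", 2: "A", 15: "A", 16: "A",
    3: "B", 4: "B", 13: "B", 14: "B",
    5: "C", 6: "C", 11: "C", 12: "C",
    7: "D", 8: "D", 9: "D", 10: "D",
}


def is_suspect(error_status):
    """A DIMM is suspect if its error status starts with 1, 2, or 3."""
    return error_status[:1] in ("1", "2", "3")


def next_step(suspect_slots):
    """Return the step number (6, 7, or 8) that step 5 directs you to."""
    if not suspect_slots:
        raise ValueError("no suspect memory DIMM was recorded")
    if len(suspect_slots) == 1:
        return 6  # only one memory DIMM was recorded
    quads = {QUAD_OF_SLOT[slot] for slot in suspect_slots}
    if len(quads) == 1:
        return 7  # several DIMMs, all in one quad
    return 8      # several DIMMs, in more than one quad


print(next_step([3]), next_step([3, 13]), next_step([3, 5]))
```

The replacement order differs between the two multi-DIMM cases: the failing DIMMs are replaced first when they share a quad, and the memory riser card is replaced first when they do not.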
MAP 4170: Loss of Redundant Input Power to CEC, I/O, or Host Bay Drawers
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
A CEC drawer, I/O drawer, or host bay drawer power supply has lost input power to one of its two connectors. It is still powered up on the remaining good input power connector.
Isolation
1. Observe both input power LED indicators on the failing power supply. Find the condition below that applies:
   • One indicator is off. Continue with the next step.
   • Both indicators are on. The input is no longer failing; return to the problem that sent you here.
   • Both indicators are off. Use the problem to repair the power supply that has no input power.
2. Observe the same indicator on the power supply alongside this one. Is the same indicator off on the other power supply?
   • Yes, continue with the next step.
   • No, verify that the power input cable is properly seated. Unplug and inspect the cable and power supply connector for damage and replace them if damage is found. If no damage is found, replace the power supply.
3. Observe the PPS digital status display at the front of the rack. Is a two digit status code displayed?
   • Yes, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   • No, the display is blank. Continue with the next step.
4. Each PPS has two power cables that supply six power supplies each.
   • The power cable plugged into PPS connector J7-1 supplies input power to the six power supplies above it (CEC drawer, I/O drawer, and host bay drawer).
   • The power cable plugged into PPS connector J7-2 supplies input power to the six power supplies across from it (CEC drawer, I/O drawer, and host bay drawer).
   Use the table below to find the power supply LED indicator that is off. Using the same table row, go to the first column and find the PPS connector. Verify that the power cable is properly connected to the PPS connector and there is no connector damage. The possible failing FRUs are the cable and the PPS.
   Note: With all seven cable connectors unplugged, it is possible to use an ohmmeter to check the continuity of the cable before replacing it.
Table 47. PPS Cable Connectors
PPS Connector          Host Bay Power Supply          I/O Drawer Power Supply            CEC Drawer Power Supply
(Location Code)        Connector (Location)           Connector (Location)               Connector (Location)
J7-1 (R1-V1) PPS-1     J11 (R1-B1-V1 and R1-B1-V2)    J1 (T1-U0.1-V1 and T1-U0.1-V2)     J1 (T1-U1.1-V1 and T1-U1.1-V2)
J7-2 (R1-V1) PPS-1     J12 (R1-B3-V1 and R1-B3-V2)    J1 (T2-U0.1-V1 and T2-U0.1-V2)     J1 (T2-U1.1-V1 and T2-U1.1-V2)
J7-1 (R1-V2) PPS-2     J11 (R1-B3-V1 and R1-B3-V2)    J1 (T2-U0.1-V1 and T2-U0.1-V2)     J1 (T2-U1.1-V1 and T2-U1.1-V2)
J7-2 (R1-V2) PPS-2     J12 (R1-B1-V1 and R1-B1-V2)    J1 (T1-U0.1-V1 and T1-U0.1-V2)     J1 (T1-U1.1-V1 and T1-U1.1-V2)
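The table lookup described in step 4 (find the power supply, then read back to the PPS connector in the first column) can be sketched in Python. The values are transcribed from Table 47, in which each power supply location appears in two rows, one for each PPS. The data layout and the function name are assumptions made for this illustration.

```python
# PPS connector -> list of (power supply connector, location, location),
# transcribed from Table 47.
PPS_CABLES = {
    "J7-1 (R1-V1) PPS-1": [("J11", "R1-B1-V1", "R1-B1-V2"),
                           ("J1", "T1-U0.1-V1", "T1-U0.1-V2"),
                           ("J1", "T1-U1.1-V1", "T1-U1.1-V2")],
    "J7-2 (R1-V1) PPS-1": [("J12", "R1-B3-V1", "R1-B3-V2"),
                           ("J1", "T2-U0.1-V1", "T2-U0.1-V2"),
                           ("J1", "T2-U1.1-V1", "T2-U1.1-V2")],
    "J7-1 (R1-V2) PPS-2": [("J11", "R1-B3-V1", "R1-B3-V2"),
                           ("J1", "T2-U0.1-V1", "T2-U0.1-V2"),
                           ("J1", "T2-U1.1-V1", "T2-U1.1-V2")],
    "J7-2 (R1-V2) PPS-2": [("J12", "R1-B1-V1", "R1-B1-V2"),
                           ("J1", "T1-U0.1-V1", "T1-U0.1-V2"),
                           ("J1", "T1-U1.1-V1", "T1-U1.1-V2")],
}


def pps_connectors_for(location):
    """Return (PPS connector, supply connector) pairs feeding a supply."""
    hits = []
    for pps_connector, drops in PPS_CABLES.items():
        for supply_connector, *locations in drops:
            if location in locations:
                hits.append((pps_connector, supply_connector))
    return hits


print(pps_connectors_for("R1-B1-V1"))
```

For a host bay power supply the two results differ in the supply connector (J11 or J12), which identifies the cable that feeds the input whose LED indicator is off.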
Description
An RPC card has been detected as failing to communicate with the RPC reporting the error. The loss of communication could be caused by:
   • RPC to RPC communication cable connectors are loose or faulty.
   • RPC card is not receiving power from its PPS.
   • RPC card did not power on correctly; the green indicator LED is off.
   • RPC card DIP switch positions 1 and 2 are set the same; they should be set opposite of each other.
Isolation
1. Observe the green LED indicator on the failing RPC card. Is the indicator on?
   • Yes, verify that the RPC to RPC communication cable is connected to the J2 slot 8 connector of each card. Continue at step 4.
   • No, continue with the next step.
2. Do the following:
   • Verify that the RPC to RPC communication cable is connected to the J2 slot 8 connector of each card.
   • Verify that the PPS to RPC power cable is connected to RPC Card connector J2 slot 6 and PPS connector J4.
   • Verify that the RPC card DIP switch positions 1 and 2 are set opposite of each other.
   • Continue with the next step.
3. Observe the PPS digital status display at the front of the rack. Is a two digit status code displayed?
   • Yes, go to MAP 2350: Isolating PPS Status Indicator Codes on page 127.
   • No, continue with the next step.
4. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem for the failing RPC card?
   • Yes, repair it now.
   • No, continue with the next step.
5. Determine if the problem is still failing when the RPC card is powered off and on and unfenced.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Replace a FRU
   Did the RPC green indicator light after the power on?
   • Yes, the problem is no longer failing. Close the problems and then use the End Of Call Status option.
     From the service terminal Main Service Menu, select:
        Repair Menu
        Close a Previously Repaired Problem
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
One or more host bay drawer power supplies have been detected as failing to communicate with the RPC reporting the error. The loss of communication could be caused by:
   • The RPC to host bay drawer power supply communication cable is loose or faulty.
   • The host bay drawer power supply is failing.
   • The RPC card communication interface is failing.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Verify that the RPC to host bay drawer power supply communication cables are connected correctly. Each cable has three connectors, one for the RPC card and one for each of the two host bay drawer power supplies. Use each row of the table to determine the connector locations of each of the four communication cables:
Table 48. Host Bay Drawer Power Supply Communication Cable Connectors
RPC Connector       Host Bay Drawer Power Supply Connector              Host Bay Drawer Power Supply Connector
RPC-1 J2 slot 9     T1-U1.1-V1/J15 (right connector viewed from rear)   T1-U1.1-V2/J15 (right connector viewed from rear)
RPC-2 J2 slot 9     T1-U1.1-V1/J14 (left connector viewed from rear)    T1-U1.1-V2/J14 (left connector viewed from rear)
RPC-1 J2 slot 13    T2-U1.1-V1/J15 (right connector viewed from rear)   T2-U1.1-V2/J15 (right connector viewed from rear)
RPC-2 J2 slot 13    T2-U1.1-V1/J14 (left connector viewed from rear)    T2-U1.1-V2/J14 (left connector viewed from rear)
Are the cables connected correctly? v Yes, the possible failing FRUs are the host bay drawer power supply, or the RPC card to host bay drawer power supply communication cable, the RPC card. Use the Repair Menu, Replace a FRU option for the FRU. For the cable use the RPC card as the FRU being replaced. Note: If the failure is to one power supply, the failing FRU is probably the power supply. If the failure is to both power supplies, the failing FRU is probably the communication cable or the RPC Card.
360
MAP 4190: RPC to Host Bay Drawer Power Supply Communication Failure
- No, to replug the cable, use the Repair Menu, Replace a FRU option for the FRU the cable connects to.
MAP 41A0: RPC Card Host Bay Drawer Fan Reporting Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The host bay drawer power supplies are reporting conflicting status for the host bay drawer power supply cooling fans.
Isolation
1. The most likely failing FRU is host bay drawer power supply 1 or 2.
2. Record the Last Occurrence Time stamp field value from the problem details display for this problem. After the FRU has been replaced, display the same field to determine if the error is still occurring (time stamp was updated). Look for any new related problems that may have been created.
3. Replace one host bay drawer power supply.
   - If it still fails, replace the other power supply.
   - If it still fails, call the next level of support.
   Note: If the power supply to be replaced is not listed in the problem, use the Repair Menu, Replace a FRU option instead.
Description
A CPI interface failure has been detected between an NVS/IOA card in a cluster and the connected host bay. There may be a problem with the CPI cable.
Isolation
Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished.
1. Find the ESC from the problem in the table below to determine which CPI interface is failing. Then continue with the next step.

Table 49. Failing CPI Interface

ESC    Cable From Host Bay            Cable To NVS/IOA Card                Resource
1111   Host bay 1, left connector     Cluster 1, I/O drawer card slot 3    cpcpi4
1112   Host bay 1, right connector    Cluster 2, I/O drawer card slot 3    cpcpi4
1113   Host bay 2, left connector     Cluster 1, I/O drawer card slot 5    cpcpi6
2. Are you installing the 2105 or did you just replace the host bay planar and/or NVS/IOA card FRU or FRUs for the failing CPI interface?
   - Yes, check or reseat the CPI cable at both ends for the failing CPI interface and then retry the FRU verification. If it still fails, the possible failing FRUs are the CPI cable, NVS/IOA card (in I/O drawer), or host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
   - No, continue with the next step.
3. Did you just replace the CPI cable for the failing CPI interface?
   - Yes, check or reseat the CPI cable at both ends for the failing CPI interface and then retry the FRU verification. If it still fails, the possible failing FRUs are the CPI cable, NVS/IOA card (in I/O drawer), or host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
   - No, the possible failing FRUs are the NVS/IOA card (in I/O drawer) or host bay planar. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
Description
An NVS/IOA card CPI interface resource has been detected as logically missing. The ESC defines if one or both clusters are detecting this condition.
Isolation
1. Do one of the following:
   - If the ESC is 2770, continue with the next step.
   - If the ESC is 2771, go to step 5 on page 363.
2. Are FRUs listed in the problem?
   - Yes, continue with the next step.
   - No, go to step 4 on page 363.
3. Use the table below to find the combination of FRUs listed and the needed action.
4. Determine which cluster is detecting the problem. In the problem details, go to the Additional Engineering Information for this problem section to see the Failing Cluster field value:
   - 1 = cluster 1, left
   - 2 = cluster 2, right
5. Determine which CPI interface is failing by doing one of the following:
   - Go to step 6 to display the resource list using the service login.
   - Call the next level of support. They will need to access the detailed error log information by using a remote login or a PE package. (The PE database has a procedure for ESC = 2770 or 2771 that they can use.) Then go to step 7 on page 364.
6. Display the resources and determine which of the four CPI interfaces is not listed as Available. Repeat this procedure logged in to each cluster.
   From the service terminal Main Service Menu, select:
     Configuration Options Menu
     Show Storage Facility Resources Menu
     Show Storage Facility Resources
   The CPI interface resource names are listed in the first column. The four CPI resources are cpcpi4, cpcpi5, cpcpi6, and cpcpi7. A working resource will have a status of Available in the second column. A failing resource will have a status of Defined in the second column or will not be listed in column one.
   There can be several hundred resources listed. Use the AIX find feature. Type / to open the find feature. Type cpcpi and press Enter to search for the first occurrence. A CPI resource should be displayed. Repeat the find three more times.
   How many CPI interfaces are not listed as Available or are not listed:
   - One, continue with the next step.
7.
Table 51. Cluster I/O Drawer Slot Locations

CPI Interface    I/O Drawer slot location
cpcpi4           P1-I3
cpcpi5           P1-I4
cpcpi6           P1-I5
cpcpi7           P1-I9
8. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the NVS/IOA FRU or FRUs.
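The resource check in step 6 and the slot lookup in Table 51 can be sketched in Python for a listing that has been captured as text. This is only an illustration: the function name is hypothetical, and the input format (resource name in the first whitespace-separated column, status in the second) is an assumption about how the listing was captured, not something this guide specifies.

```python
# Slot locations taken from Table 51.
CPI_SLOT_LOCATIONS = {
    "cpcpi4": "P1-I3",
    "cpcpi5": "P1-I4",
    "cpcpi6": "P1-I5",
    "cpcpi7": "P1-I9",
}


def failing_cpi_resources(listing: str) -> dict[str, str]:
    """Return {resource: slot} for CPI resources that are not Available.

    Assumed input: one resource per line, name in column one and
    status in column two (hypothetical capture format).
    """
    status_by_name = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in CPI_SLOT_LOCATIONS:
            status_by_name[fields[0]] = fields[1]

    # A resource is failing if it is missing or has any status
    # other than Available (for example, Defined).
    return {
        name: slot
        for name, slot in CPI_SLOT_LOCATIONS.items()
        if status_by_name.get(name) != "Available"
    }
```

As in step 6, the check would be repeated against the listing from each cluster.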
Description
A CPI problem is being detected for a host bay planar host adapter slot. The most likely cause is the host adapter installed in the slot. The failure was discovered at a time when the normal reporting information was not available. Special information in the FRU Name field of the problem details identifies the failing slot location.
Isolation
1. Determine which host adapter card is failing. At least one FRU listed in the problem details is a host adapter card. To determine which host adapter card to replace, display the FRU Name fields in the problem details. Use the table below to translate the FRU Name value to the host adapter card location to replace.
   Note: The FRU Name field syntax is Slot xy0, where x is the CPI interface and y is the host bay slot.
Table 52. Host Adapter Card FRU Names

Text in FRU Name field    Host Adapter Card to Replace
Slot 400                  Host Bay 1, Slot 1
Slot 410                  Host Bay 1, Slot 2
Slot 420                  Host Bay 1, Slot 3
Slot 430                  Host Bay 1, Slot 4
Host Bay 3, Slot 1 Host Bay 3, Slot 2 Host Bay 3, Slot 3 Host Bay 3, Slot 4
Host Bay 4, Slot 1 Host Bay 4, Slot 2 Host Bay 4, Slot 3 Host Bay 4, Slot 4
2. Replace the FRU or FRUs listed in the problem using the Repair Menu, Replace a FRU option. If the problem still occurs, call the next level of support.
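The Slot xy0 syntax described in the note to step 1 can be decoded with a short Python sketch. The function name is hypothetical. The rule that the slot number is y + 1 is inferred only from the four Host Bay 1 rows shown in Table 52 (Slot 400 through Slot 430), so treat it as an assumption. Translating x to a host bay number is left to the table.

```python
import re


def decode_fru_name(fru_name: str) -> tuple[int, int]:
    """Split a 'Slot xy0' FRU Name into (x, slot number).

    x is the CPI interface digit. The slot number is y + 1, an
    assumption inferred from the Table 52 rows for Host Bay 1.
    """
    match = re.fullmatch(r"Slot (\d)(\d)0", fru_name.strip())
    if match is None:
        raise ValueError(f"not a 'Slot xy0' FRU name: {fru_name!r}")
    cpi_interface = int(match.group(1))
    slot_number = int(match.group(2)) + 1
    return cpi_interface, slot_number
```

For example, `decode_fru_name("Slot 420")` gives CPI interface 4 and slot 3, which matches the Host Bay 1, Slot 3 row of the table.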
Description
A CPI problem has been detected which can be caused by a CPI cable or one of the FRUs listed in the problem.
Isolation
1. A CPI cable should be replaced along with the FRUs listed in the problem. Use the following table to determine which CPI cable to replace. Note: The CPI cable connector can be easily damaged during a FRU replacement. Temporarily use a velcro cable tie to secure the loose cable end where it will not be damaged. The velcro cable tie can be removed from elsewhere on the 2105. Return the tie when finished.
Table 53. CPI Cable FRUs Listed FRUs NVS/IOA card Host Bay Planar CPI Cable to Add to FRU List CPI cable connected to the NVS/IOA card CPI cable between host bay planar and cluster in the Failing Cluster field in the problem details. CPI cable between the listed FRUs.
2. Go to MAP 4040: Entry MAP for CPI Problems on page 326 to replace the FRUs.
Description
A host bay or cluster was temporarily unavailable for customer use because it was fenced due to a CPI error. CPI diagnostics were run automatically and did not fail.
Isolation
There is no repair action needed for this problem and it should be closed.
1. Contact the customer to ensure that the recovery was successful.
2. Close the problem.
Note: If this CPI error reoccurs, a new problem will be created that will require FRUs to be replaced. The FRUs in this problem are listed as reference information for the next level of support.
MAP 4200: Extended Cluster IML Time Due to NVS Battery Charging
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: Do not power off the 2105 unless instructed to do so.
Description
The time to complete a cluster IML is extended when one or both NVS batteries, in the battery assembly FRU, are being charged. This can also happen when the charger card status is being rebuilt. These conditions can occur when an NVS battery FRU has been disconnected, replaced, or discharged. The NVS batteries must be fully charged to guarantee NVS data retention. Note: If one or both battery cables were left disconnected, a battery failure will be indicated. This will be handled as a battery failure, and a problem will be created during the cluster IML.
Isolation
1. The cluster LCD display panel is displaying a status message to reference this MAP. This status message also shows which NVS batteries are being charged.
2. Find the condition you have in the table below. You must wait up to the maximum amount of time for the cluster IML to complete. If the IML does not complete, call your next level of support.

Each table entry lists the Condition, the Total time for cluster to complete IML from power on (see note below), and the Action.

Total time:
- Minimum 30 minutes (battery does not need to be recharged but the battery charge profile must be rebuilt)
- Maximum 90 minutes (battery needs to be fully recharged and the battery charge profile must be rebuilt)
Action: Wait for IML to complete

Condition: I/O drawer NVS charger card FRU was replaced
Total time: Maximum 30 minutes (battery does not need to be recharged but the battery charge profile must be rebuilt)
Action: Wait for IML to complete

Condition: I/O drawer FRU replaced that required disconnecting an NVS battery cable.
Total time: Maximum 30 minutes (battery does not need to be recharged but the battery charge profile must be rebuilt)
Action: Wait for IML to complete

Condition: NVS battery was drained when the 2105 lost customer input power unexpectedly (2105 was not powered down using the operator panel white switch)
Total time:
- Minimum 30 minutes (battery needs a minimal recharge)
- Maximum 90 minutes (battery needs to be fully recharged)
Action: Wait for IML to complete

Condition: 2105 has been powered off for several days
Total time:
- Minimum 30 minutes (battery does not need to be recharged but the battery charge profile must be rebuilt)
- Maximum 90 minutes (battery needs to be fully recharged)
Action: Wait for IML to complete

Condition: First IML during an install is not delayed for an NVS battery status check
Action: Return to install instructions
Note: If one of the I/O drawer power supplies is failing, the time listed will be doubled.
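The maximum wait implied by the table and the doubling note can be worked out with a small Python sketch. The condition keys and the function name are hypothetical labels chosen for this illustration. The minute values are the maximums listed in the table for the conditions that are named there.

```python
# Maximum minutes from the table above, keyed by hypothetical labels.
MAX_IML_MINUTES = {
    "charger_card_replaced": 30,
    "battery_cable_disconnected": 30,
    "battery_drained_by_power_loss": 90,
    "powered_off_several_days": 90,
}


def max_iml_wait_minutes(condition: str,
                         io_drawer_power_supply_failing: bool = False) -> int:
    """Return the longest time to wait before calling support."""
    minutes = MAX_IML_MINUTES[condition]
    # Per the note, the listed time doubles when one of the
    # I/O drawer power supplies is failing.
    if io_drawer_power_supply_failing:
        minutes *= 2
    return minutes
```

For a 2105 that was powered off for several days and has a failing I/O drawer power supply, the sketch gives 180 minutes.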
MAP 4240: Isolating a Blinking 888 Error on the CEC Drawer Operator Panel
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1. Attention: Do not power off the 2105 unless instructed to do so.
Description
A blinking 888 number suggests that either a hardware or software problem has been detected and a diagnostic message is ready to be read. Call the next level of support, who may have the additional information and access authority needed for problem isolation and resolution.
Isolation
1. Perform the following steps to record the information contained in the blinking 888 message and then call your next level of support.
   a. Wait until the blinking 888 is displayed.
   b. Record in sequence each code that is displayed after the blinking 888 goes away. Stop recording when the blinking 888 reappears. Separate each code recorded with a blank space.
   c. Go to step 2 on page 368.
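The recording rule in steps a and b can be expressed as a short Python sketch, assuming the codes were noted down in the order they appeared on the panel. The function name is hypothetical and the codes in the usage example are made-up values, not real error codes.

```python
def record_888_message(displayed_codes: list[str]) -> str:
    """Return the codes shown between one blinking 888 and the next.

    Codes before the first 888 are ignored, recording stops when
    888 reappears, and the codes are separated by blank spaces.
    """
    try:
        start = displayed_codes.index("888")
    except ValueError:
        return ""  # no blinking 888 was observed

    recorded = []
    for code in displayed_codes[start + 1:]:
        if code == "888":
            break
        recorded.append(code)
    return " ".join(recorded)
```

For instance, a panel sequence of 888, 102, 300, 888 would be recorded as `102 300`.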
Description
The cluster attempted to IML two times and failed each time. A problem was created.

When a cluster powers on, it first loads the AIX operating system, then the functional code and finally the RAS (maintenance package) code. The code load counter is initially set to 0 and is increased by 1 at the start of the code load. If the code load is successful, the counter is reset to 0. If it is unsuccessful, the counter is not reset to 0.

If the load of the functional code is not successful, the failing cluster creates an AIX error log. A problem is not created as the functional code and RAS code were not able to be loaded yet. The other cluster reboots the failing cluster to attempt to get past the error. If the code load is successful, the code load counter is reset to 0. The AIX error log from the prior unsuccessful attempt will not create a problem as the error was temporary.

If the second reboot attempt fails, a final reboot occurs. The AIX code is loaded, the functional code load which would fail is bypassed, and the RAS code is loaded. This leaves the failing cluster unable to do customer operations, but able to accept a service terminal login for service actions. The other cluster creates a problem with an ESC=38F0 and uses this MAP for further isolation. The problem does not give the error that caused the code load failures. The failing cluster should create a problem using the AIX error log from the prior unsuccessful attempt. The problem should contain the repair action for the error that caused the code load failures.
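The counter behavior described above can be modeled with a small Python sketch. It is a simplified reading of the description, not the microcode logic: the function name and the result fields are hypothetical, and the threshold of two failed loads is taken from the statement that the cluster attempted to IML two times and failed each time.

```python
MAX_FAILED_LOADS = 2  # assumption based on "attempted to IML two times"


def simulate_code_loads(load_outcomes: list[bool]) -> dict:
    """Walk through functional code load attempts (True = success)."""
    counter = 0
    for succeeded in load_outcomes:
        counter += 1  # increased by 1 at the start of each code load
        if succeeded:
            # Successful load: counter reset, earlier errors were temporary.
            return {"counter": 0, "customer_operations": True,
                    "problem_esc": None}
        if counter >= MAX_FAILED_LOADS:
            # Final reboot: functional code load is bypassed, RAS code
            # is loaded, and the other cluster creates ESC=38F0.
            return {"counter": counter, "customer_operations": False,
                    "problem_esc": "38F0"}
        # Otherwise the other cluster reboots the failing cluster.
    raise ValueError("outcomes ended before the sequence resolved")
```

A first failure followed by a successful load leaves the counter at 0 with no problem created, while two failures in a row end with the cluster in service-only state and ESC=38F0.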
Isolation
1. Read the description section above.
2. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Look for related problems that have cluster or power FRUs. (SSA or drawer problems are not related.) Were related problems found?
   - Yes, repair them.
   - No, call the next level of support.
MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The CEC drawer operator panel displays various types of codes that indicate the status of the cluster power on and code load. Some are normal status and progress indications that change every few seconds. These same codes can indicate a problem if the cluster appears to hang with the code still displayed. Other codes indicate error conditions that will not prevent the code load from completing but will create a problem. Still other codes indicate conditions that prevent the cluster from completing its power on or code load. Notice that a Ready for Login display normally means the cluster is powered on and all code is nearly loaded. The Ready for Login display can become blank shortly after first appearing; this is normal operation. However, the Cluster Ready indicator on the 2105 Model 800 operator panel will stay lit.
Isolation
1. Use the Repair Menu, End Of Call Status option to display and repair any related problems. If there are none, continue with the next step.
2. The table lists possible symptoms and the actions to repair them. If you have:
   - Only one symptom, find it in the following table and do the listed Actions. If that does not correct the problem, look for other listed symptoms you may have missed. If that does not repair the problem, call the next level of support.
   - Multiple symptoms, find the last symptom you observed in the table and do the listed Actions. If that does not repair the problem, use the earlier symptoms you observed to attempt to repair the problem. If that still does not correct the problem, call the next level of support.
Symptom: Went blank after displaying Ready for Login
Action:
1. This is a normal indication at the end of a successful cluster power on and code load. The cluster is ready for a service terminal login.
2. The Ready for Login display can be overwritten at any time by an AIX operating system or service terminal action that will cause it to be blank.

Symptom: Ready for Login is displayed
Action:
1. This is a normal indication at the end of a successful cluster power on and code load. The cluster is ready for a service terminal login but should not be resumed until the rack operator panel Cluster Ready LED is lit.
2. The Ready display can be overwritten at any time by an AIX operating system or service terminal action that will cause it to be blank.
3. If the 2105 has been powered off for more than three days, the cluster power on may take up to two hours to display Ready for Login or light the cluster Ready LED. The NVS charger card needs this time to rebuild its profile of the battery capacity.
Service terminal cannot connect (display Copyright and Login screen) or cannot login (display Main Service Menu)
The cluster stops and a 3-digit number is displayed in the CEC drawer operator panel. 888 is displayed followed by additional error codes. Cluster stops with 0005 displayed.
Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of the Volume 3. Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of the Volume 3. Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of the Volume 3. Go to Entry Table for CEC Drawer Operator Panel Codes in chapter 9 of the Volume 3. If the service terminal is kept logically connected, this normally happens after the cluster POST indicators are displayed. The term POST indicators refer to the resource names that are listed after the multiple lines of RS/6000 are displayed. They are memory keyboard network SCSI speaker. Go to MAP 43A0: Bootlist Management Using SMS on page 387.
Symptom: The cluster appears to restart/reboot more than four times while displaying the Exxx system firmware codes. The 0Cxx AIX progress codes are not displayed.
Action: There is a problem that prevents the cluster from reaching E105 that would begin the AIX boot from the hard disk drive. Connect the service terminal to the working cluster and use the Alternate Cluster Repair Menu options to power off the failing cluster. Then go to MAP 2700: CEC Drawer Power On Problem on page 170. When the problem is fixed, the cluster should be able to boot from the hard disk drive.

Symptom: The cluster appears to restart/reboot when displaying the 10 character codes. The cluster returns to the E1xx progress codes and begins the code load sequence again. This may occur up to three times.
Action: There are certain error recovery sequences during code load at the time the CPI interfaces are being initialized that will cause up to 3 code loads to be attempted. A problem will be created and the cluster message indicator on the 2105 Model 800 operator panel will be on. Connect the service terminal to the cluster with the message indicator on and use the Main Service Menu -> Start Repair -> Show/Repair Problems Needing Repair option.
- If a related problem is found, repair it.
- If no related problem is found, then attempt to recreate the problem by power cycling the cluster again. Connect the service terminal to the working cluster and use the Repair Menu -> Alternate Cluster Repair Menu -> options to:
  Quiesce the Alternate Cluster
  Power Off the Alternate Cluster
  Power On the Alternate Cluster
  Observe the CEC drawer operator panel during power on and code load. If it loads normally, then use Resume Alternate Cluster to return the cluster to customer use. If the cluster fails with a problem created, repair it. If the cluster fails with no problem created, call the next level of support.
Symptom: The cluster stops and POST indicators are displayed on the service terminal session (if it had been kept logically connected since the cluster power on). The term POST indicators refers to the resource names that are listed after the multiple lines of RS/6000 are displayed. They are memory keyboard network SCSI speaker.
Action: Go to MAP 2700: CEC Drawer Power On Problem on page 170.
Description
The process to display the problems first attempts to access the problem file on the cluster that the service terminal is connected to. If the file cannot be read, an error message will be included in the service terminal problem display screen for that cluster. The process to display the problems then attempts to access the problem file on the other cluster. It attempts to communicate through the cluster to cluster ethernet connection. If there is no response from the other cluster when trying to read the problem, then an error message will be included in the service terminal problem display screen for that cluster.
Isolation
1. Use the service terminal Show / Repair Problems Needing Repair option. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, continue with the next step.
   - No, go to step 6 on page 376.
2. There is a problem displaying the problems for the other cluster. (The other cluster is the cluster that the service terminal is not connected to; it is also the failing cluster.) Is the CEC drawer operator panel for the failing cluster hung displaying a code (other than Ready for Login) for more than five minutes?
   - Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - No, continue with the next step.
3. Connect the service terminal to the other cluster and attempt to log in. Is the Copyright and Login screen displayed?
   - Yes, continue with the next step.
   - No, the Copyright and Login screen is not displayed, go to MAP 6060: Isolating a Service Terminal Login Failure on page 567.
4. Attempt to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option):
Description
This MAP is no longer used to isolate Customer LAN problems. Use the table below to select the correct MAP for the problem that was detected.
Isolation
1. Use the following table to determine the next action based on the problem that sent you here:
Problem Go to:
Description
The clusters communicate with each other through an ethernet connection for the RAS (maintenance package) operations. A new 2105 Model 800 comes from the factory with a short ethernet jumper cable directly connecting the RJ-45 connector on each cluster. This special jumper cable crosses signals within the cable so an ethernet hub is not needed to directly connect the clusters to each other. All 2105 Model 800 units leave the factory set with the same pair of TCP/IP addresses, one for each cluster bay. Those TCP/IP addresses are changed when connected to the ESSNet console or customer ethernet. You have been directed to this MAP because a cluster to cluster ethernet communication problem was detected by the microcode. The following SRN information (displayed in the problem log) may provide additional information for product engineering.
SRN     Description
128     Invalid parameter failure
129     Failure during authorization
130     Socket failure
131     Daemon Initialization/Setup fail
132     Loopback IP address invalid failure
133     Write to socket failure
134     Read to socket failure
136     Operations on a file failure
137     System subroutine failure
144     Client timeout failure
1232    No additional information provided in the SRN
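When SRN values are collected from the problem log for product engineering, the list above can be held as a simple lookup. The Python sketch below is illustrative only: the function name and the fallback text for values not in the list are hypothetical, while the descriptions are copied from the SRN list.

```python
# SRN descriptions copied from the list above.
SRN_DESCRIPTIONS = {
    128: "Invalid parameter failure",
    129: "Failure during authorization",
    130: "Socket failure",
    131: "Daemon Initialization/Setup fail",
    132: "Loopback IP address invalid failure",
    133: "Write to socket failure",
    134: "Read to socket failure",
    136: "Operations on a file failure",
    137: "System subroutine failure",
    144: "Client timeout failure",
    1232: "No additional information provided in the SRN",
}


def describe_srn(srn: int) -> str:
    """Return the listed description, or a placeholder if not listed."""
    return SRN_DESCRIPTIONS.get(srn, "SRN not in list")
```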
Isolation
MAP 4390 Section-1:
Note: Intermittent failures are not normally due to 2105 TCP/IP settings. They can be due to intermittent hardware failures and intermittent or marginal network problems. This includes the customer network if it is attached to the ESSNet.
1. Does the problem that sent you here have ESC = 13FF?
   - Yes, continue with the next step.
   - No, go to step 3.
2. Is the problem ESC = 13FF?
   - Yes, the cluster to cluster communication problem was caused by the rsACExecd daemon on the cluster that reported the 13FF. The daemon was automatically recovered by the microcode:
     - Close the problem with ESC = 13FF.
     - If the other cluster has a prior problem with ESC = 1232, close that problem.
     - Use the Repair Menu, End of Call Status option to verify 2105 status before logging out.
   - No, continue with the next step.
3. Find the condition that applies in the table below:
Table 55. Cluster to Cluster Communication Problem, MAP Entry

Condition: A 2105 is being installed and no TCP/IP settings were changed yet. The cluster to cluster communication failed with the cross cluster ethernet communication cable still connected.
Most Likely Cause of the Problem:
- New 2105 from IBM manufacturing. Most likely a hardware failure of either cluster I/O drawer planar assembly, or the cross cluster ethernet jumper cable.
- Reinstall of a 2105. Most likely the cross cluster ethernet jumper cable is not installed. It also could be invalid TCP/IP settings from the prior installation, or hardware FRUs listed for new 2105 above.
Go To: MAP 4390 Section-2 on page 379

Condition: The cluster TCP/IP settings on the 2105 were just changed, and now it is failing.
Most Likely Cause of the Problem: One or more TCP/IP settings on either cluster did not get updated correctly, is invalid, or created a duplicate TCP/IP condition on the network.

Condition: The first 2105 was just installed on a new ESSNet.
Most Likely Cause of the Problem: The TCP/IP settings were working prior to the 2105 being connected to the new ESSNet. The problem is with the ESSNet ethernet hub or ethernet cables.

Condition: An additional 2105 was just added to an existing ESSNet (with one or more 2105s already on it.)
Most Likely Cause of the Problem:
- If the new 2105 is failing, the problem is probably the ESSNet ethernet hub, the ethernet cables, or a duplicate TCP/IP address. (It is assumed that the TCP/IP settings were working prior to the 2105 being connected to the new ESSNet.)
- If an existing 2105 is failing, the new 2105 has probably created a duplicate TCP/IP address.

Condition: Customer just made changes to their network
Most Likely Cause of the Problem: The customer network may have a network device that now has a duplicate TCP/IP setting. (It is assumed that no changes were made to the ESSNet network.)

Condition: None of the above conditions are suspected, the cause of the problem is unknown.
Most Likely Cause of the Problem: The problem could be failing hardware, TCP/IP settings in cluster, duplicate TCP/IP address on the network.
MAP 4390 Section-2: The 2105 is being installed new from IBM or reinstalled from a prior account. Do the following actions in the order listed until the problem is fixed:
Table 56. Cluster to Cluster Communication Failure

Action: Verify the cross cluster ethernet jumper cable is installed and connected to both clusters.
Go to: See figure 382

Action: Verify the TCP/IP settings for both clusters are correct.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383

Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-13: 2105 Install Failure, Replace FRUs or Do Further Isolation on page 386
MAP 4390 Section-3: TCP/IP settings on the 2105 were just changed. The new settings were provided by the customer if the ESSNet is to be attached to the customer network. The settings were provided by the service guide Install Chapter 5 if it is not to be attached to the customer network.
There are two methods used to change the settings:
1. You logged in to one cluster and then used the Dual cluster option to update all information on both clusters.
2. You logged in to each cluster and used the single cluster options to update each cluster.
This step assumes the problem is with the new settings. One of the following occurred:
1. One or more of the provided settings is not valid.
2. One or more of the settings were entered incorrectly.
Action: Display cluster TCP/IP settings as defined by the customer configuration worksheet or service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383

Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7 on page 381
MAP 4390 Section-4: The first 2105 was just installed on a new ESSNet. This step assumes that the cluster to cluster communication was working before being connected to the ESSNet ethernet hub. The problem is most likely with the ESSNet ethernet hub or ethernet cables.
Note: During the 2105 install, the clusters are directly connected to each other by the cross cluster ethernet communication cable. The cluster TCP/IP settings are updated for attachment to the ESSNet while the clusters are still directly connected together.
Do the following actions in the order listed until the problem is fixed:
Table 58. Cluster to Cluster Communication Problem, New ESSNet

Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382

Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7 on page 381
MAP 4390 Section-5: A 2105 was just installed on an existing ESSNet with at least one 2105 already attached. This step assumes that the new 2105 cluster to cluster communications were working after the cluster TCP/IP settings were updated and before being connected to the ESSNet ethernet hub.
- If the new 2105 is failing:
  1. There is a problem with the ethernet cables to the ethernet hub, or the ethernet hub.
  2. There is a duplicate TCP/IP setting between the new and existing 2105.
- If the existing 2105 is failing:
  1. There is a duplicate TCP/IP setting between the new and existing 2105.
Do the following actions in the order listed until the problem is fixed:
Table 59. Cluster to Cluster Communication Problem, Existing ESSNet

Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382

Action: Check cluster TCP/IP settings as defined by customer configuration worksheet or service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383
MAP 4390 Section-6: The customer has made changes to their network and now the cluster to cluster communication does not work. This step assumes the clusters are connected to an ESSNet which is also connected to the customer network. The most likely cause of this problem is that the customer network has a duplicate TCP/IP address of one of the clusters. Do the following actions in the order listed until the problem is fixed:
Table 60. Cluster to Cluster Communication Problem, Customer Network

Action: Use the ESSNet console to ping each cluster looking for a failure or duplicate TCP/IP address on network.
Go to: MAP 4390 Section-12: Console Ping Test to Each Cluster (also Tests for Duplicate TCP/IP Address) on page 385

Action: The above most likely causes did not fix the problem. Do a complete checkout.
Go to: MAP 4390 Section-7
MAP 4390 Section-7: No known changes have been made to the clusters, ESSNet, or customer network and yet the cluster to cluster communication stopped working. The following actions in the order listed should isolate the problem:
Table 61. Cluster to Cluster Communication Problem, Unknown Cause

Action: Test cluster to cluster communication by displaying problems needing repair from each cluster. Repair any related problems.
Go to: MAP 4390 Section-8: Test if the Communication Problem is still Occurring

Action: Check visual symptoms of cluster Ready LEDs and ESSNet ethernet hub LEDs.
Go to: MAP 4390 Section-9: Check for Visual Symptoms on page 382

Action: Check cluster TCP/IP settings as defined by customer configuration worksheet or the service guide Install Chapter 5.
Go to: MAP 4390 Section-10: Check the TCP/IP Settings for Each Cluster on page 383

Action: Test cluster to cluster communication using the cross cluster ethernet cable (clusters disconnected from the ESSNet).
Go to: MAP 4390 Section-11: Test Cluster to Cluster Communication with the Direct Connect Cable on page 384

Action: Do an ethernet ping test to each cluster from the ESSNet console. If it fails, disconnect the failing cluster and ping again looking for a duplicate TCP/IP address.
Go to: MAP 4390 Section-12: Console Ping Test to Each Cluster (also Tests for Duplicate TCP/IP Address) on page 385

Action: Call the next level of support.
MAP 4390 Section-8: Test if the Communication Problem is still Occurring: Log in to each cluster and use the service login Display Problems Needing Repair option (Repair Menu, Show / Repair Problems Needing Repair option) to test whether the cross cluster communication is working.
Problem Isolation Procedures, CHAPTER 3
3. Verify the following ESSNet ethernet hub indications are present: a. Power LED is on.
Description
The cluster boot list is kept in the I/O Drawer Planar Assembly NVRAM. The normal bootlist sequence is fd0 (diskette drive), cd0 (CD-ROM drive), hdisk0 (hard disk drive), and hdisk1 (hard disk drive). The two hard disk drives may also appear in the reverse order: fd0, cd0, hdisk1, hdisk0.
- If the cluster is not able to boot up to AIX, System Management Services (SMS) is used to display and update the bootlist.
- If the cluster is able to boot to AIX, service login options are used to display and update the bootlist.
This MAP is only for SMS bootlist maintenance.
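The bootlist rule in this description can be sketched as a small check. This is an illustrative sketch only, not part of the 2105 service tooling: the function name `bootlist_is_normal` is hypothetical, and it assumes the boot list has already been transcribed by hand as a list of device names.

```python
# Hypothetical helper: checks a transcribed boot list against the normal
# sequence described above (diskette, CD-ROM, then the two hard disks
# in either order).
NORMAL_PREFIX = ["fd0", "cd0"]
HARD_DISKS = ["hdisk0", "hdisk1"]


def bootlist_is_normal(devices):
    """Return True if devices is fd0, cd0, then hdisk0/hdisk1 in either order."""
    names = [d.strip().lower() for d in devices]
    if len(names) != 4:
        return False
    # The diskette drive and CD-ROM drive must come first, in that order.
    if names[:2] != NORMAL_PREFIX:
        return False
    # The two hard disk drives may be listed in either order.
    return sorted(names[2:]) == HARD_DISKS
```

For example, `bootlist_is_normal(["fd0", "cd0", "hdisk1", "hdisk0"])` is accepted, while a list with a missing hard disk is not.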
Isolation
1. Are you doing an Automatic LIC update?
   - Yes, continue with the next step.
   - No, go to step 3.
2. Is there a problem with ESC=14xx calling a 4Axx MAP?
   - Yes, go to MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392.
   - No, continue with the next step.
3. Connect the service terminal to the cluster not being serviced. Repair any problems related to the dual cluster disk drives for the failing cluster. If there are none, continue at the next step. From the service terminal Main Service Menu, select:
   Repair Menu
4.
   Cluster Dual Hard Disk Drive Menu
   Display Cluster Dual Hard Disk Drive Status (Identify/Replace a Failing Cluster Hard Disk Drive)
5. Connect the service terminal to the cluster not being serviced.
6. Quiesce and power off the cluster being serviced. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
   Quiesce Alternate Cluster
   Power Off the Alternate Cluster
7. Wait up to five minutes for the CEC drawer operator panel to display OK. Is OK displayed?
   - Yes, continue with the next step.
   - No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
8. When OK is displayed on the CEC drawer operator panel, connect the service terminal to the S1 serial port of the cluster being repaired. Press the Enter key to cause a keyboard interrupt to the service processor. The service processor Main Menu should be displayed.
9. Set up to boot to the SMS menu. From the service processor Main Menu, select:
   System Power Control Menu
   Boot Mode Menu
   Boot to SMS Menu (set to enable)
   Note: The Boot to SMS setting will automatically be reset to disable during the next cluster power off.
10. Power on the cluster (the SP menu refers to the cluster as a system). Enter 98 to return to the System Power Control Menu, then select Power On System. During the cluster power on, keep the service terminal logically connected to the cluster being powered on. Do not press the 1 key when prompted to select type of boot. Wait for the SMS main menu to display. It takes about four minutes from cluster power on to SMS being up.
11. Display the boot list using the text-based System Management Services. From the service processor Main Menu, select:
   Utilities Menu
   Multi Boot Menu
   Select Boot Devices
   Display Current Settings
Version M2P020312
Copyright IBM Corp, 2000 All rights reserved.
Current Boot Sequence
1 Diskette
2 SCSI CD-ROM (loc=U0.1-P1/Z1-A3,0)
3 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A0,0)
4 SCSI 9100 MB Harddisk (loc=U0.1-P1/Z1-A2,0)
5 None
Are the four boot devices shown above displayed on the cluster SMS screen? (The order of the Harddisks in the list does not matter.)
   - Yes, go to MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. MAP 4020 will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
   - No, continue with the next step.
12. Do the following to see if the hard drives are detected on the SCSI bus.
   a. Press X repeatedly until you are back at the Utilities Menu.
   b. Enter 6 (MultiBoot).
   c. Watch carefully for one or both of the following messages. It is important to know exactly which of them appear. They appear only briefly, so you may need to exit and repeat this step several times. (To repeat: press X once to return to the Utilities menu and then press 6.)
      - Message /pci@fff7f08000/scsi@c/sd@0,0 indicates that hdisk0 has been found.
      - Message /pci@fff7f08000/scsi@c/sd@2,0 indicates that hdisk1 has been found.
      Note: The messages will be preceded by several other similar messages such as: /pci@fff7f0a000/pci@c,2/ssa@1/disk@21013D1104B14CK. These messages refer to SSA devices detected on the PCI adaptors and can be disregarded.
13. Use the table to determine the condition you have, and the action to perform.
   Note: The hdisk0 or hdisk1 displayed in current settings can be on either hard disk drive (HDD1 or HDD2). It depends on which hard disk drive the cluster booted from last. To translate the hdisk to an HDD, you must use the displayed SCSI IDs in the location codes as explained in step 12c.
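The message check in step 12c can be illustrated with a short sketch. It assumes the messages seen on the service terminal have been copied down as text lines; the function name `detected_hdisks` is hypothetical, and only the two device paths quoted in step 12c are recognized. All other lines, including the SSA messages, are ignored.

```python
import re

# SCSI IDs from the two device paths quoted in step 12c:
# sd@0,0 indicates hdisk0 and sd@2,0 indicates hdisk1.
SCSI_ID_TO_HDISK = {0: "hdisk0", 2: "hdisk1"}
BOOT_DISK_PATH = re.compile(r"^/pci@fff7f08000/scsi@c/sd@(\d+),0$")


def detected_hdisks(messages):
    """Return the set of hdisk names whose detection message was seen."""
    found = set()
    for line in messages:
        match = BOOT_DISK_PATH.match(line.strip())
        if match is None:
            continue  # SSA and other device messages are disregarded
        name = SCSI_ID_TO_HDISK.get(int(match.group(1)))
        if name is not None:
            found.add(name)
    return found
```

An empty result corresponds to neither message having appeared; a single name means only that hard disk drive was found on the SCSI bus.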
MAP 43A0 Section-1: Three SCSI boot devices not found.
MAP 43A0 Section-2: Three boot devices found on page 391.
MAP 43A0 Section-3: Both hard disk drives not found on page 391.
MAP 43A0 Section-4: One hard disk drive not found on page 391.
MAP 43A0 Section-4: One hard disk drive not found on page 391.
MAP 43A0 Section-5: One hard disk drive and CD-ROM drive not found on page 391.
MAP 43A0 Section-5: One hard disk drive and CD-ROM drive not found on page 391.
MAP 43A0 Section-6: CD-ROM drive not found on page 392.
yes
yes
yes
yes
no
no
yes
no
yes
yes
yes
no
no
no
yes
no
yes
no
no
yes
yes
MAP 43A0 Section-1: Three SCSI boot devices not found: Common problem to all three SCSI boot devices. Do the following:
1. Press the CD-ROM drive eject button:
   - If the CD tray opens, continue with the next step.
   - If the CD tray does not open, there is most likely a power problem to the SCSI devices. Verify the SCSI power cables between the CD-ROM drive and I/O drawer planar assembly are plugged correctly. The possible failing FRUs are the I/O drawer planar assembly and the SCSI power cables. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the FRUs.
2. Verify the SCSI signal and power cables between the SCSI boot devices and the I/O drawer planar assembly are plugged correctly.
Description
The cluster boot list is kept in the I/O Drawer Planar Assembly NVRAM. The normal bootlist sequence is fd0 (diskette drive), cd0 (CD-ROM drive), hdisk0 (hard disk drive), and hdisk1 (hard disk drive), or fd0, cd0, hdisk1, hdisk0.
- If the cluster is not able to boot up to AIX, System Management Services (SMS) is used to display and update the bootlist.
- If the cluster is able to boot to AIX, service login options are used to display and update the bootlist.
This MAP is only for SMS bootlist maintenance.
Isolation
1. Connect the service terminal to the cluster not being serviced.
2. Quiesce and power off the cluster being serviced. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
   Quiesce Alternate Cluster
   Power Off the Alternate Cluster
3. Wait up to five minutes for the CEC drawer operator panel to display OK. Is OK displayed?
   - Yes, continue with the next step.
   - No, go to MAP 4730: Cluster Power Off Request Problem on page 446.
9. Both hard disk drives are listed. Reverse the order of the hard disk drives in the boot list.
Do a. b. c.
10.
11.
12.
13.
14. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, return to MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 and continue in the priority table.
15. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324. When the build is complete, return to MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488 and continue in the priority table.
16. This is an unexpected condition which probably indicates a double failure (one hard disk drive not detected and the other hard disk drive cannot boot). Call your next level of support.
17. No hard disks are shown on the SMS boot list. Do the following to see if the hard drives are detected on the SCSI bus.
   a. Press X repeatedly until you are back at the Utilities Menu.
   b. Enter 6 (MultiBoot).
Yes
Yes
Yes
Yes
No
No
Yes
No
Yes
Yes
Yes
No
No
No
Yes
No
Yes
No
No
Yes
Yes
Description
ESC 1xxx codes are used to report problems with the cluster dual hard drives.
Isolation
Note: A single hard drive failure can normally be repaired without the cluster being Quiesced or Powered Off. Do not power off the failing cluster during this repair unless directed by the maintenance package. Undirected use of cluster power off can lead to unpredictable results.
1. Determine the failing cluster and ESC. From the service terminal Main Service Menu, select:
   Repair Menu
   Show / Repair Problems Needing Repair
   Select the problem to be repaired and note the ESC and the Failing Cluster.
2. Use the table to determine the condition you have, and the action to perform:
Table 66. ESC Repair Actions

ESC 1050
   Description: Cluster hard disk drives are not mirrored, and the automatic mirroring function has been disabled.
   Action: Connect the service terminal to the failing cluster. Use Restore the Cluster to Automatic Mirroring Mode on the Cluster Dual Hard Disk Drive Repair Menu. Then select Restore Mirroring after a Cluster Hard Disk Drive Replacement.

ESC 1051
   Description: A cluster dual hard drive has failed.
   Action: Connect the service terminal to the failing cluster. Use the Identify/Replace a Failing Cluster Hard Disk Drive option on the Cluster Dual Hard Disk Drive Repair Menu to identify and replace the failing drive.
   Note: The cluster MUST NOT be powered down during replacement of the failing cluster hard disk drive.

ESC 1052
   Description: The cluster dual hard drive Mirror operation failed.
   Action: Go to step 3 on page 400.

ESC 1053
   Description: The cluster dual hard drive Unmirror operation failed.
   Action: Go to step 3 on page 400.
3. Connect the service terminal to the failing cluster. Use Display Cluster Dual Hard Drive Status on the Cluster Dual Hard Disk Drive Repair Menu to determine the status of the cluster hard disk drives. Find the condition listed below and the action to perform:
   - If the status of both hard disk drives is good, the problem may have already been repaired but the problem was not closed. Close the problem. If the problem was not already repaired, call the next level of support.
   - If one hard disk drive has good status and the other does not, use the Identify/Replace a Failing Cluster Hard Disk Drive option on the Cluster Dual Hard Disk Drive Repair Menu to repair the failing hard disk drive. If this option does not show one hard disk drive that can be repaired, call the next level of support.
   - If both hard disk drives show bad status, call the next level of support before using MAP 4020: Hard Disk Drive Build Process for Both Drives on page 320 to reload all the cluster software. That MAP will reload all code on one hard disk drive, and automatic mirroring will restore the other hard disk drive.
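The three-way decision in step 3 can be summarized as a small sketch. The function name `dual_disk_action` and the returned summary strings are hypothetical shorthand for the actions described above; the real status is read from the Display Cluster Dual Hard Drive Status option, not from code.

```python
def dual_disk_action(first_drive_good: bool, second_drive_good: bool) -> str:
    """Map the status of the two cluster hard disk drives to the step 3 action."""
    good_count = [first_drive_good, second_drive_good].count(True)
    if good_count == 2:
        # Problem may already be repaired but was never closed.
        return "close the problem"
    if good_count == 1:
        # Use the Identify/Replace a Failing Cluster Hard Disk Drive option.
        return "identify and replace the failing drive"
    # Both drives bad: support must be called before any reload via MAP 4020.
    return "call the next level of support"
```

The sketch leaves out the fallback branches (for example, calling support when the replace option shows no repairable drive), which still apply as written in the step.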
Description
ESC 1060 is used to report a problem that caused the cluster to IML from the second hard disk drive, in the SMS boot list, instead of the first.
Isolation
1. There is a problem with the first hard disk drive in the cluster SMS boot list. The cluster did IML from the second hard disk drive listed in the SMS boot list. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Repair the related problem. If there is no related problem, call the next level of support.
Description
The cluster reporting this error has detected a duplicate of its TCP/IP address on the network (ESSNet or customer) connected to its ethernet port.
Isolation
1. The cluster reporting this error has detected a duplicate of its TCP/IP address on the attached network. Use standard LAN network problem isolation techniques.
   Note: The following actions may assist you:
   - Verify that the TCP/IP addresses are set correctly in this cluster.
   - Determine the network topology:
      Directly connected to a customer network.
      Connected to an ESSNet network that is not connected to a customer network.
      Connected to an ESSNet network that is connected to a customer network.
   - Disconnect the ethernet cable to this cluster and use ping commands from the ESSNet console (or customer console if attached) to help identify the source of the duplicate address.
   - Call the next level of support as needed.
Description
A service processor reset is needed to attempt to clear an error condition.
Isolation
1. The service processor reset will cause the cluster to power off.
2. Log in to the cluster not being serviced.
3. Quiesce the cluster x. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
4. Reset the service processor. Use a small straight nonconducting object, or an insulated paper clip (straightened), as a tool to activate the switch. The switch is very small in diameter; insert the tool straight into the hole 1 , keeping it at a
Figure 132. CEC Drawer Operator Panel Locations (s009652)
5. Power on the cluster. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
6. If the cluster IMLs normally, resume the cluster. If the cluster hangs displaying a progress or error code in the CEC drawer operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371, or call the next level of support.
Description
The SMS (System Management Services) includes an option to display SMS error logs. These logs may contain repair information for a failure that did not create a viewable problem.
Procedure
Note: This procedure requires the cluster to be taken away from customer use.
1. Quiesce and power off the failing cluster:
   a. Connect the service terminal to the cluster not being repaired.
   b. Use the Main Service Menu, Repair Menu, and then the Alternate Cluster Repair Menu options.
2. Power on the failing cluster and immediately go to the next step.
   - Use the Alternate Cluster Repair Menu options.
3. Display the SMS entry menu:
   a. Connect the service terminal to the failing cluster.
   b. Keep logically connecting to the failing cluster until the word keyboard is displayed.
      Note: The firmware boot may disconnect the service terminal one or more times.
RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 RS/6000 Memory ====> Keyboard
   c. Then quickly press the 1 key to bring up SMS.
4. Select Utilities and then Display Error Logs.
5. If an error is logged, check the time stamp:
   - If the error was logged during the current boot attempt, record it, then look up the error in Chapter 9: Error Messages, Diagnostic Codes, and Service Reports of Volume 3.
   - If no recent error is logged in the error log, go to MAP 2700: CEC Drawer Power On Problem on page 170.
Description
Each cluster must have the correct TCP/IP settings for both clusters. Each cluster must have a working ethernet hardware connection to the other cluster.
Isolation
1. Is the Code EC level above 2.3.0.0?
   Note: The current Code EC level can be seen on the logon screen.
   - Yes, continue with the next step.
   - No, the Cluster to Cluster test on levels prior to 2.3.0.0 can give unpredictable results. Go to step 3 on page 404 to verify cluster to cluster communications.
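The level comparison in step 1 is a field-by-field numeric comparison, not a text comparison. The following minimal sketch shows one way to express it; the function names are hypothetical, and it assumes the level is read from the logon screen as four dot-separated integers. It follows the word "above" literally, so a level of exactly 2.3.0.0 does not take the Yes branch; confirm the intended handling of that boundary with the next level of support.

```python
def parse_ec_level(text):
    """Turn a level such as '2.3.0.0' into a tuple of integers (2, 3, 0, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_above_2_3_0_0(level_text):
    """Return True if the Code EC level is strictly above 2.3.0.0."""
    # Tuples compare field by field, so 2.10.0.0 correctly ranks above 2.3.0.0.
    return parse_ec_level(level_text) > (2, 3, 0, 0)
```

Comparing the levels as plain strings would rank "2.10.0.0" below "2.3.0.0", which is why the sketch converts each field to an integer first.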
Description
The service login can display the UAA of the integrated ethernet adapter in the I/O Drawer Planar Assembly.
Procedure
1. Connect the service terminal to the cluster whose Ethernet network address will be displayed. From the service terminal Main Service Menu, select:
   Configuration Options Menu
   Configure Communications Resources Menu
   Change / Show TCP/IP Configuration
   Display Ethernet Network Address
Description
ESSNet1 or Master Console to cluster Ethernet problem.
Procedure
1. Verify that the following conditions are met before going to the next step:
   a. The ESSNet ethernet hub is powered on. (Use hub documentation to repair any problem.)
   b. The ESSNet console is powered on and ESSNet console software is active. (Use console documentation to repair any problem.)
   c. The 2105 cluster is powered on; the 2105 operator panel cluster ready indicator will be on. (Use normal repair actions to identify and correct any problem.)
   d. The ESSNet ethernet hub to 2105 cluster cable is correctly connected at both ends. (Use service guide Install Chapter 5 information if needed.)
2.
3.
4.
5.
Description
The 2105 Model 800 cluster ethernet connections to the customer LAN network are made through the ESSNet ethernet hub or switch. All the TCP/IP settings, including the ethernet protocol (en0 or et0), must be compatible across the network.
Note: In this MAP the term ESSNet console refers to either the ESSNet1 console or the Master Console.
Isolation
1. Verify the following ESSNet ethernet hub indications are present:
   a. Power LED is on.
   b. Error indicator LEDs are off. Reference the ethernet hub documentation.
   Are the hub indicators as listed above?
   - Yes, continue with the next step.
   - No, use the ESSNet ethernet hub documentation to correct the problem.
Description
The FRUs listed in the problem details did not repair the problem; additional NVS related FRUs must be replaced.
Isolation
1. Observe the FRUs listed in the problem details display. Select the condition that applies:
   - Only NVS/IOA cards are listed, go to step 2 on page 411.
   - Only the NVS battery charger card and/or the NVS battery assembly are listed, go to step 3 on page 411.
Description
An I/O slot failure has been detected.
Isolation
1. Replace the FRU listed in the problem details. If the FRU does not repair the problem, call the next level of support. Note: The next level of support will need to get the PE package and statesaves for engineering assistance.
Description
A problem with a FRU list that contains both RPC cards, and the I/O drawer planar assembly. Each RPC card communicates with the service processor function on the I/O drawer planar assembly. v Each RPC card has a separate status register for each cluster that can be read.
No No No
- When replacing a cluster FRU, the communication to both RPC Cards is only tested if both RPC Cards are not fenced. If an RPC Card is fenced, it must be quiesced and then resumed to test the communication from the cluster.
- When replacing an RPC Card, the cluster to cluster comparison of the RPC status occurs only if both clusters are not fenced or quiesced. If a cluster is fenced or quiesced, it must be resumed to run the cluster to cluster RPC status comparison.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Is the problem ESC=8314 or 8315?
   - No, go to step 3.
   - Yes, go to step 2.
2. See Note 1 in Table 67 on page 412 above. The most likely failing FRU is the I/O drawer planar in the failing cluster; the RPC card to I/O drawer planar communication cables should also be checked. Reference Volume 3, chapter 10, 2105 Model 800 Cluster Power Control Diagram. The cable at I/O drawer connector R1 is common to both RPC cards. If it is, then replace one or both FRUs. If the FRUs cannot be selected in the problem, then use the Repair Menu, Replace a FRU option. After the problem is repaired, close the problem and then use the Repair Menu, End of Call Status option.
3. If the FRUs listed in the problem do not fix the problem, use this list of all possible FRUs.
   - I/O drawer planar assembly
   - RJ45 card (on the front of the I/O drawer planar assembly)
   - External cable from I/O drawer planar assembly connector J14 to RJ45 Card connector 1
   - Cables from RJ45 card to the RPC 1 and RPC 2 cards
   - RPC 1 card and RPC 2 card
4. Display the problem details that sent you here and write down the timestamp value in the last occurrence field. After the FRU replacement you will display this field again. If the value has been updated, then the same failure is still occurring and additional FRUs will need to be replaced.
5. The FRU list contains both RPC cards and one or more cluster FRUs:
   - To replace a cluster FRU, go to step 6.
   - To replace an RPC card FRU, go to step 12 on page 414.
6. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace the cluster FRU. Return here after the cluster FRU replacement is completed and the cluster has come ready.
7. Display the problems to determine if a problem is still occurring.
   - If the original problem last occurrence timestamp value has been updated, the problem is still occurring. Return to the beginning of this MAP to replace the remaining FRUs or call the next level of support if all FRUs have been replaced.
   - If a new related problem was created, repair that problem now. After that repair is complete return to this MAP if the original problem is still occurring. (The last occurrence timestamp field value of the original problem was updated during the last cluster power on.)
   - If the original problem was not updated and there is no new related problem, continue with the next step.
8. Quiesce and then Resume RPC-1. This will ensure that both clusters read the status register from the RPC-1 card. From the service terminal Main Service Menu, select:
Description
This MAP is used for a cluster to cluster CPI communication timeout. After AIX is loaded, and while the functional code loads, cluster to cluster communication occurs across the CPI interfaces (cluster 1 I/O Attachment Card to the Host Bay Planar Card to the cluster 2 I/O Attachment Card). There are four CPI interfaces that may be used. Once the cluster code is loaded, each cluster periodically sends a communication message to the other cluster (heartbeat) and sets a timer waiting for the response. If the timer expires with no response, the error recovery process will cause the non-responding cluster to failover its resources to the originating cluster. The non-responding cluster is then fenced (which removes customer use of that cluster).
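The heartbeat and timer behavior described above follows a common pattern that can be sketched in a few lines. This is a generic illustration, not the 2105 cluster code: the class name `HeartbeatMonitor`, the timeout value, and the use of caller-supplied times in seconds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Tracks one outstanding heartbeat to the other cluster (illustrative only)."""
    timeout_s: float                   # assumed value; the real timer length is not documented here
    sent_at: Optional[float] = None    # time the unanswered heartbeat was sent
    peer_fenced: bool = False

    def send(self, now: float) -> None:
        """Record that a heartbeat message was sent and start the timer."""
        self.sent_at = now

    def on_response(self) -> None:
        """The other cluster answered, so the timer is cancelled."""
        self.sent_at = None

    def check(self, now: float) -> bool:
        """Fence the peer if the timer expired with no response; return fence state."""
        if self.sent_at is not None and now - self.sent_at >= self.timeout_s:
            # In the real system, error recovery fails over the non-responding
            # cluster's resources before that cluster is fenced.
            self.peer_fenced = True
            self.sent_at = None
        return self.peer_fenced
```

A response that arrives before the timer expires leaves the peer unfenced; a heartbeat that goes unanswered for the full timeout marks the peer as fenced, which is the condition that leads to this MAP.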
Isolation
Use the following steps to continue this repair action.
1. Ensure that the problem is still displayed on the service terminal. Note the following:
   - Failing Cluster should be the other cluster (not the one the service terminal is connected to).
   - Reporting Cluster should be the cluster you are connected to.
   - Ignore the information in the Failure Actions, Probable Cause, Failure Cause and User Actions fields. This information is only used by engineering and the next level of support.
2. Observe the cluster Ready indicator LED for the failing cluster on the 2105 Model 800 operator panel. Is the Ready indicator LED on?
   - Yes, the cluster has successfully completed the power on error recovery. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and repair any related problems. Then go to MAP 1500: Ending a Service Action on page 67.
   - No, the cluster did not successfully complete the power on error recovery. Continue with the next step.
3. Observe the CEC drawer operator panel. Is the cluster hung displaying a code on the operator panel?
   - Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371 and use the codes displayed on the CEC drawer operator panel.
   - No, display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option) and repair any related problems. If there are none, call the next level of support.
Description
Pinned Data can exist for DASD Fast Write, High Bandwidth Sequential Fast Write, and Cache Fast Write Data. Pinned Data is caused by failures that prevent data from being destaged to DASD. These are either DASD failures that make the array/volume unavailable or failures that make cache and/or NVS data unavailable. Pinned Data can only be freed (un-pinned) by a successful retry of the destage operation or by a request to discard the pinned data, received from the host or the service interface.
Isolation
1. Use this step to collect the needed information and then call the next level of support. Do not perform any repair unless directed by the next level of support. If repairs are performed in the wrong sequence, customer data loss can occur.
   a. Determine all of the volumes with Pinned Data and/or Volume Status Unknown. From the service terminal Main Service Menu, select:
      Utilities Menu
      Pinned Data Menu/Volume Status Unknown
      Display Pinned Data
      Note: Volumes displayed have retryable pinned data, non-retryable pinned data or FC (no global subsystem status). A volume can be listed with more than one pinned data status. Pinned data status can be caused by hardware problems which create problems. Retryable pinned data is normally caused by DASD or SSA interface problems. Non-retryable pinned data is normally caused by cluster problems. FC status can be caused by either of the above problem types.
   b. Display problems needing repair. From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problem Needing Repair
   c. Continue with the next step.
2. Call your next level of support now. Have ready the information you gathered in the last step. Your next level of support may need to login remotely and perform additional problem analysis.
3. Your next level of support may direct you to do the following steps after they have reviewed all of the information. They may change the order of the repairs. Wait for them to guide you before continuing.
4. Are there any DASD or SSA interface related problems?
   - Yes, repair the DASD or SSA interface problems. The repair may allow retryable pinned data to destage. (An SSA loop with only one DDM failure will not normally cause pinned data if the DDM is part of a RAID array.)
Description
This MAP is used to locate defective FRUs not found by normal diagnostics that hang the cluster preventing it from loading code. The problem may be in the CEC drawer or the I/O drawer. Use the following figures to locate CEC and I/O drawer bulkhead connectors:
[Figures: CEC drawer and I/O drawer bulkhead connector locations, front view. Labels shown: Q1 V/S Comm, Q2 RIO-1, Q3 RIO-0, Q4 JTAG, Q7 SCSI Signal, Q8 SCSI Power, Fan 7, Fan 8, RIO 0, RIO 1, S1, S2, S3, S4, J11, J14, J15, J16, R1 (JTAG)]
Locate, if possible, the checkpoint that sent you to this MAP in the following table:

Table 69. Minimum Configuration Checkpoint

Checkpoint   Checkpoint   Checkpoint
91FF         94B2         9503
9380         94BB         9504
94B0         9501         9505
94B1         9502         9506
Did you find the error code or checkpoint that sent you here in the above tables or did the action that sent you to MAP 4540 direct you to run the CEC Drawer Minimum Configuration? v Yes, go to MAP 4540 Section-4 on page 422.
[Figure: CEC Drawer connectors V/S Comm, RIO 0, RIO 1, and JTAG]
With the RIO-1 cable connected between the CEC drawer RIO-0 port and the I/O drawer RIO-0 port, power on the cluster. Does the same error code still occur?
   - Yes, reconnect both RIO cables to their original connectors on both drawers, then go to MAP 4540 Section-5.
   - No, the original RIO-0 cable you removed in step 3 is defective. Replace the failing RIO cable. Reconnect the RIO-1 cable back to its original connectors on both drawers, then go to MAP 4540 Section-12 on page 426.
DIMM installation (slot number and memory quad): 1 A, 2 A, 3 B, 4 B, 5 C, 6 C, 7 D, 8 D, 10 D, 12 C, 14 B, 15 A, 16 A
Figure 136. CEC Drawer, Memory Riser Card Memory DIMM Module Locations (s009241)
4. On the memory riser card in slot Tx-UI.1-P1-M2, record the DIMM locations and remove all the memory DIMMs except the ones in slots 1, 2, 15, and 16 (memory quad A).
5. With the CEC drawer now configured with only the minimum required memory, connect and switch on the power supplies, then power on the cluster. Does the same error code still occur?
   - Yes, go to MAP 4540 Section-6.
   - No, go to MAP 4540 Section-8 on page 424.
Description
When a problem calls this MAP, the NVS FRU or FRUs must be replaced as described below.
Isolation
1. A problem with NVS/IOA cards sent you here. If replacing the FRUs listed in the problem does not repair the NVS problem, replace the I/O drawer planar assembly. Go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432
Description
Global subsystem status (GSS) exists for each Logical Subsystem (LSS). Two copies are kept, each on a separate array. If one copy becomes unavailable, a problem is created and a new second copy is created on a different array if possible. It stays in this new location even after the repair is complete. Normally, when a volume is unavailable, the array it is located on has status of offline or unknown. An LSS can operate on just one GSS copy. If both GSS copies are unavailable, the LSS gives FC status to all ESCON host system requests to its volumes. The LSS gives command rejects and check conditions of internal target failure to all SCSI host system requests to its volumes. There can be one or more problems for each GSS copy that is unavailable. It normally takes two or more failures to prevent the fault tolerant RAID architecture from accessing a particular array (rank). If access to the GSS copies was lost, but the data is still valid, then the repair action should restore access. This will automatically reset the No Valid Subsystem Status condition. If both copies lost the actual GSS data, then the GSS status for that LSS will have to be reset when determined by the next level of support. This can cause customer data loss. There is no one problem that will identify the various combinations of failures that created the condition. Each GSS copy has at least one problem needing repair. There may be other non-related problems needing repair also. An example would be a problem for a DDM replacement on an array and SSA loop not part of the LSS with the condition. Therefore, the isolation procedure below helps you determine the highest priority problem to repair first.
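The relationship between available GSS copies and the response a host sees can be sketched as follows. The function name `lss_response` and the returned strings are hypothetical labels for the behavior described above; the sketch covers only the two host attachment types named in the description.

```python
def lss_response(gss_copies_available: int, host_type: str) -> str:
    """Summarize how an LSS answers host requests for a given number of GSS copies."""
    if gss_copies_available not in (0, 1, 2):
        raise ValueError("an LSS keeps two GSS copies")
    if gss_copies_available >= 1:
        # An LSS can operate on just one GSS copy.
        return "normal operation"
    # Both GSS copies are unavailable.
    if host_type == "ESCON":
        return "FC status"
    if host_type == "SCSI":
        return "command reject / check condition (internal target failure)"
    raise ValueError("host_type must be 'ESCON' or 'SCSI'")
```

The single-copy case returns normal operation even though a problem has been created for the lost copy, which matches the description that it normally takes two or more failures to reach the No Valid Subsystem Status condition.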
Isolation
1. It is important you read the description section above before proceeding with this isolation procedure.
2. Call your next level of support before going to the next step.
3. Display the pinned data status. From the service terminal Main Service Menu, select:
   Utilities Menu
   Pinned Data Menu
   Display Pinned Data
   A volume is only displayed if it has pinned data status. The LUA/LSS and SSID are shown for each volume displayed. The display groups volumes having retryable pinned data, non-retryable pinned data and FC (no global subsystem status).
   - If a volume has FC status, go to the next step.
   - If a volume has retryable or non-retryable pinned data, go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417.
Description
There is a pinned data condition that was detected by:
- A 2105 power off, local or remote mode
- A cluster resume, part of a service action
- A cluster recovery, other cluster rebooted this cluster
- A failback to service or a failover to service, specific code actions
The pinned data may be retryable or non-retryable. Any of the above conditions will create a problem, and call this MAP, with an ESC = 38E7.
Isolation
1. Read the description above.
2. Call your next level of support for specific guidance. Failure to do so may cause unnecessary customer data loss.
3. The next level of support may have you do the following:
   - Create a product engineering login password so they can do a remote connection and access information not available using the service login.
     From the service terminal Main Service Menu, select:
       Configuration Options Menu
       Configure Communications Resources Menu
       Call Home / Remote Services Menu
       Enable Product Engineering Access
   - Use the UEPO red switch to power off the 2105. This causes an emergency dump of the data as part of the recovery.
   - Power on the 2105.
   - Go to MAP 4520: Pinned Data and/or Volume Status Unknown on page 417. This MAP also uses your next level of support to guide you through identifying the proper order of repair of the related problems that should have been created.
Description
The CD-ROM drive in one of the clusters is failing.
Isolation
Retry the failing operation with another CD-ROM disk of the same type.
Notes:
1. For test media, use the 2105 code/LIC CD-ROM instead of the CD-ROM test disk requested by the CD-ROM drive test.
2. Audio is not used by the 2105. Do not run the audio test that uses the audio headset.
Is the CD-ROM still failing?
- Yes, go to MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 and replace the CD-ROM drive.
Description
The cluster SP or System Firmware is down-level. This can happen when the I/O drawer planar assembly FRU is replaced and has down-level firmware. On cluster power up, the down-level code is discovered and a problem is created. This occurs even before you have the chance to check and update the firmware per the FRU Replace table in MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432. Before firmware can be updated, all problems needing repair must be repaired or cancelled.
Isolation
1. Cancel the problem that sent you to this MAP.
   From the service terminal Main Service Menu, select:
     Utility Menu
     Problem Log Menu
     Change a Problem State
2. Repair all problems needing repair before going to the next step.
   From the service processor Main Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
3. Check and update to the latest level of LIC firmware for the I/O drawer planar assembly and SP.
   From the service terminal Main Service Menu, select:
     Licensed Internal Code Maintenance Menu
     Multiple LIC Menu
   Select one of the following:
   - Concurrent or Nonconcurrent
     Select one:
     a. Concurrent or
     b. Nonconcurrent.
   - System Planar / Service Processor Menu
Description
The diskette drive in one of the clusters is failing.
Isolation
Retry the failing operation with a new diskette of the same type. Is the diskette drive still failing?
Description
The cluster firmware needs to be reloaded; it may be corrupted.
Isolation
1. Find the condition that applies:
   - The cluster hangs with an eight digit system firmware error code displayed on the CEC drawer operator panel. Continue with the next step.
   - The cluster comes Ready and there is a problem with a firmware error code that calls this MAP. Use MAP 4610: Cluster SP, SPCN, or System Firmware Down-Level on page 430 to reload the firmware (even though the firmware may not be down-level).
2. The system firmware needs to be reloaded using the Service Processor (SP) menu options and firmware diskettes. Do the following:
   a. Connect the service terminal to the working cluster and use the Main Service Menu, Repair Menu, Alternate Cluster Repair menu options to quiesce and power off the failing cluster.
   b. Wait for the CEC drawer operator panel to display OK.
   c. Connect the service terminal (CE MoST or laptop) to the S2 port of the cluster being serviced. Press the Enter key to cause a keyboard interrupt to display the SP Main Menu.
      Note: The Master console cannot be used to access the SP menus.
      The current system firmware version is displayed.
   d. Locate the system firmware diskettes.
   e. Select the Service Processor Setup Menu, Reprogram Flash EPROM Menu option. Follow the prompts. The following will be loaded: System Power Control Network (SPCN), service processor, system firmware, run-time abstraction services.
   f. When the update completes, the service processor will reboot to OK.
   g. Connect the service terminal to the working cluster and use the Main Service Menu, Repair Menu, Alternate Cluster Repair menu options to power on and resume the cluster.
   h. When the repair is complete, go to MAP 1500: Ending a Service Action on page 67.
Description
The cluster powered off unexpectedly.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Observe the CEC drawer operator panel. Are any codes displayed?
   - Yes, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - No, continue with the next step.
2. Login to the working cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Are there any problems related to the RPC cards, or failing cluster (CEC or I/O drawer) power or cooling?
   - Yes, exit this MAP and repair the related problem.
   - No, continue with the next step.
3. Verify that the cluster is powered off. Observe the I/O drawer power LED indicator on the upper left of the CEC drawer operator panel. Find the condition that applies:
   - On solid, continue with the next step.
   - Blinking slowly, go to step 5.
   - Off, go to step 5.
4. Observe the CEC drawer power LED indicator on the front lower left of the drawer. Find the condition that applies:
   - On solid, both drawers of the cluster are powered on normally. Exit this MAP and return to the procedure that sent you here. This appears to be a false error condition. If there is a problem, cancel it or call the next level of support.
   - Blinking slowly, the CEC drawer did not power up, go to MAP 4880: Cluster Power On Problem on page 461.
   - Off, the CEC drawer did not power up, go to MAP 4880: Cluster Power On Problem on page 461.
5. Determine if there is a cluster power on problem that may be related to the unexpected cluster power off. Login to the working cluster and attempt to power on the failing cluster using the Alternate Cluster Repair Menu options. Does the cluster power on?
   - Yes, continue with the next step.
   - No, exit this MAP and go to MAP 4880: Cluster Power On Problem on page 461.
6. The failing FRU may be the I/O drawer planar assembly. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) to replace the FRU, or call the next level of support.
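The LED checks in steps 3 and 4 form a small decision table, sketched below. This is an illustration under stated assumptions, not part of the 2105 code: the function name and the LED state strings are hypothetical, and the sketch covers only the two LED observations, not the earlier checks in steps 1 and 2.

```python
LED_STATES = {"on solid", "blinking slowly", "off"}


def unexpected_power_off_next_action(io_drawer_led: str, cec_drawer_led: str) -> str:
    """Combine the step 3 (I/O drawer) and step 4 (CEC drawer) LED checks."""
    if io_drawer_led not in LED_STATES or cec_drawer_led not in LED_STATES:
        raise ValueError("LED state must be one of: " + ", ".join(sorted(LED_STATES)))
    if io_drawer_led != "on solid":
        # Step 3: blinking slowly or off both lead to step 5.
        return "go to step 5 (attempt to power on the failing cluster)"
    if cec_drawer_led == "on solid":
        # Step 4: both drawers are powered on normally.
        return "false error condition: exit this MAP"
    # Step 4: the CEC drawer did not power up.
    return "go to MAP 4880: Cluster Power On Problem"
```

Note that the CEC drawer LED is only consulted when the I/O drawer LED is on solid, which mirrors the order of the steps.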
Description
A problem or MAP isolation procedure has identified one or more cluster FRUs for replacement. Following all steps in this MAP will verify that the FRU is replaced and verified properly.
Procedure
Note: If memory FRUs are being replaced, reference MAP 4160: Isolating Memory Related Error Codes on page 355 before continuing.
1. Is there an existing problem for the cluster FRU or FRUs being replaced?
   - Yes, display the problem details for all related problems then continue with the next step.
   - No, continue with the next step.
2. Are you here to replace a single hard drive?
   - Yes, go to MAP 43B0: Cluster Dual Hard Drive ESC 1xxx on page 398.
   - No, continue with the next step. (This includes replacement of both hard disk drives.)
3. Are you here to replace only a CEC or I/O drawer power supply (no other cluster FRUs to replace)?
   - Yes, go to MAP 4890: Replacing a CEC or I/O Drawer Power Supply on page 471.
   - No, continue with the next step.
4. Connect the service terminal to the cluster that is not being repaired. See Service Terminal Setup in chapter 8 of the Volume 3.
5. Quiesce the cluster being repaired using the service terminal Alternate Cluster Repair menu option.
   Note: If pinned data is detected during the quiesce, you will be sent to MAP 4520: Pinned Data or FC Status.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Alternate Cluster Repair
     Quiesce the Alternate Cluster
6. Was pinned data status detected during the quiesce of the cluster in the prior step?
   - Yes, verify that all the actions in MAP 4520: Pinned Data or FC Status were attempted. If you are still unable to quiesce the cluster normally, contact your next level of support. Get their approval to quiesce the cluster using the Unconditionally Quiesce the Alternate Cluster option, instead of the Quiesce the Alternate Cluster option. This will bypass the check for pinned data.
     Note: This action may result in a loss of customer data.
     When the quiesce is complete, continue with the next step.
   - No, go to step 10 on page 434.
7. Was the original pinned data status non-retryable?
   - Yes, continue with the next step.
   - No, go to step 10 on page 434.
8. Is an NVS card FRU being replaced?
   - Yes, continue with the next step.
   - No, go to step 10 on page 434.
Figure 137. Power Supply Connector Locations (s009710). Rear view. Labels shown: (R1-), power cable connectors 1 and 2 (J1, J2; 2 per power supply), PWR, ON, OFF, CHK/PWR-GOOD, Cluster 1 (cpcluster0), Cluster 2 (cpcluster1).
12. Slide the CEC or I/O drawer to be repaired into the service position. Open the CEC or I/O drawer top cover. Reference the correct cluster drawer repair procedure, in chapter 4 volume 2 of this book, see:
- I/O drawer serial interface cable (S3 port)
  Description: Verify the connection to the modem and expander (if installed).
  Action:
  1. Verify modem and expander connection.
     - Connect the service terminal to the S2 port of this cluster.
       From the service terminal Main Service Menu, select:
         Machine Test Menu
         Send Test Notification Menu
         Service Notification (via modem)
  2. Go to step 22.
- I/O drawer ethernet 10Base-T cable
  Description: Test the ethernet connection to the other cluster.
  Action:
  1. To test the ethernet connection to the other cluster:
     - Connect the service terminal to the S2 port of this cluster.
       From the service terminal Main Service Menu, select:
         Repair Menu
         Show / Repair Problems Needing Repair
       Verify that the problem status is displayed for both clusters.
  2. Go to step 22.
- I/O drawer ethernet AUI cable
  Description: The AUI connection is not used for the 2105 Model 800.
  Action: None
- I/O drawer power supply
  Description: No additional verification needed.
  Action: The I/O drawer power supply can be replaced concurrent with customer activity on the cluster. When the problem has ESC=5300, it is combined with cluster FRUs that must be replaced using this MAP. Replace the power supply when the other FRUs are replaced. No additional verification. Then go to step 22.
- CEC drawer to I/O drawer external cables: V/S comm., JTAG, RIO-0, RIO-1
  Description: No additional verification needed.
22. Verify that the cluster being repaired has come ready by connecting the service terminal to the cluster and attempting to login. The time to come ready will be increased if any cluster firmware updates are needed. The updates occur automatically during the cluster IML. Was the service terminal able to login to the cluster being repaired?
    - Yes, continue with the next step.
    - No, wait for the cluster to come ready. If the cluster hangs displaying a code, go to MAP 4360: Isolation Using Codes Displayed by the CEC
Description
A failure was detected when new disk drive module (DDM) licensed internal code was being downloaded to the DDMs.
Note: The term download means the same as update.
One of the following error conditions could have been detected:
- SSA card is not in the proper state.
- Unable to check the array status.
- Arrays are not in the proper state.
- DDM diagnostic failed for pdiskXX.
- Download failed for pdiskXX.
- The download process took too long and timed out.
The DDM code download process includes the following:
- The new DDM code is included on the 2105 LIC Code update CD-ROM.
- The LIC update process copies the code from the CD-ROM to the cluster.
- The DDM download process is started using the service terminal Disk Drive Module (DDM) LIC Menu options. It automatically runs on one DDM at a time. It runs the DDM diagnostics, then loads the new code, then runs the DDM diagnostics again. If the diagnostics and code load are successful, the process is repeated on the next DDM, until every DDM is complete.
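The per-DDM sequence described above (diagnostics, code load, diagnostics again, one DDM at a time) can be sketched as a loop. This is a simplified illustration only: the function name and the two callables are hypothetical stand-ins, the error strings merely echo the conditions listed above, and the sketch assumes the process stops at the first failure and omits the SSA card, array state, and timeout checks.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def download_ddm_code(
    ddms: Iterable[str],
    run_diagnostics: Callable[[str], bool],
    load_code: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Update DDMs one at a time; return (completed DDMs, error or None)."""
    completed: List[str] = []
    for ddm in ddms:
        # Diagnostics run before the new code is loaded.
        if not run_diagnostics(ddm):
            return completed, f"DDM diagnostic failed for {ddm}"
        if not load_code(ddm):
            return completed, f"Download failed for {ddm}"
        # Diagnostics run again after the load.
        if not run_diagnostics(ddm):
            return completed, f"DDM diagnostic failed for {ddm}"
        completed.append(ddm)
    return completed, None
```

Returning the list of completed DDMs alongside the error reflects why a restart option is useful: the DDMs already updated do not need to be repeated.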
Isolation
1. Read the description section above.
2. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Look for a related problem (SSA or drawer FRUs).
   From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
   - If there are no related problems, call the next level of support.
   - If there are related problems, fix them and then return here and continue with the next step.
3. Use the DDM Download Restart option to complete the DDM download process.
   From the service terminal Main Service Menu, select:
     Licensed Internal Code Maintenance Menu
     Disk Drive Module (DDM) LIC Menu
     DDM Download Restart
Description
Each host bay drawer power supply:
- Has two output boundaries, one for each host bay.
- Receives control and status request commands from each RPC card through an RS-485 interface.
- Must receive power off status from both RPC cards to switch off host bay power (if one of the RS-485 interfaces is not operational, the power supply will power off if it receives a command from the operational interface).
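The power off rule in the last item can be expressed as a small predicate. This sketch is illustrative only and is not the power supply logic itself: the function and parameter names are hypothetical, and the case where neither RS-485 interface is operational is an assumption (no command can arrive, so the sketch returns False).

```python
def host_bay_power_switches_off(
    rpc1_requests_off: bool,
    rpc2_requests_off: bool,
    rpc1_link_ok: bool = True,
    rpc2_link_ok: bool = True,
) -> bool:
    """Decide whether the power supply switches off host bay power."""
    # Only requests arriving over an operational RS-485 interface count.
    votes = [
        requested
        for requested, link_ok in (
            (rpc1_requests_off, rpc1_link_ok),
            (rpc2_requests_off, rpc2_link_ok),
        )
        if link_ok
    ]
    if not votes:
        # Assumption: with no operational interface, no command is received.
        return False
    # Both RPC cards must agree, unless only one interface is operational.
    return all(votes)
```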
Isolation
1. Show and repair any problems with RPC card or Host Bay Drawer power supplies. If there are no related problems, continue at the next step.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
2. Verify the following cables are correctly connected to both Host Bay Drawer power supplies:
   - Power control cables are correctly plugged into both host bay power supply RJ45 connectors J14 and J15.
Description
Cannot power off both clusters
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. The service terminal utility menu options were used to attempt to power off a cluster, but the other cluster was already powered off. Only one cluster may be powered off at a time. Power on the other cluster. Then connect the service terminal to the other cluster and use the Alternate Cluster Repair menu option to power off this cluster.
Description
A cluster file (dataset) or function is corrupted. If this has affected customer operations, a separate problem should have been created. In many cases, customer operations will not be affected. Only processes and/or files used by the RAS (maintenance package) processes may be affected.
There are three recommended actions:
- The cluster can be quiesced, powered off and on, then resumed. This reloads the code into the cluster, which might clear a hung process. If the failure is still present, then the next action is needed.
- The code is reloaded onto the cluster hard disk drives. An important part of this process is the saving and restoring of the configuration and customization files. This allows the cluster to restore access to the customer data after the process is complete. If the failure is still present, then the next action is needed.
- The next level of support is contacted. They can login through the modem and do functions similar to those of an AIX system administrator.
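The three recommended actions form an escalation ladder: each is tried only if the failure is still present after the previous one. A minimal sketch follows; the names are hypothetical, and the callable stands in for the service representative's judgment of whether the failure persists.

```python
from typing import Callable

RECOVERY_LADDER = [
    "quiesce the cluster, power it off and on, then resume it",
    "reload the code onto the cluster hard disk drives",
    "contact the next level of support",
]


def resolving_action(failure_still_present_after: Callable[[str], bool]) -> str:
    """Walk the ladder and return the action at which escalation stops."""
    for action in RECOVERY_LADDER[:-1]:
        if not failure_still_present_after(action):
            # The failure cleared, so no further escalation is needed.
            return action
    # Both local actions left the failure in place.
    return RECOVERY_LADDER[-1]
```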
Isolation
1. Does the problem ESC = 38F5?
Description
The cluster functional code was not loaded during the last cluster power on. Only the AIX operating system and RAS (maintenance package) code was loaded. The service terminal can login to the failing cluster because it only requires the RAS code.
Isolation
1. Verify that no diskette is in the failing cluster's diskette drive.
   - If there is not a diskette, continue with the next step.
   - If there is a diskette, remove it and repeat the operation that failed.
     Note: The norsStartOnce diskette used when directed by the next level of support can create this condition.
2. Use the service terminal to display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there any other related problem for the failing cluster?
   - Yes, exit this MAP and repair the related problem.
   - No, continue with the next step.
3. Do both clusters have a problem that calls this MAP?
   - Yes, continue with the next step.
   - No, go to step 6.
4. There may be a false error condition in the rack power control cards that can be reset.
   a. Power Off the 2105 Model 800.
   b. Switch the System Power AC circuit breaker on both primary power supplies to Off (down).
   c. Wait until the green Power Control Good indicators on both rack power control cards are off. It takes up to 30 seconds for the logic voltage supplied to the rack power control cards to discharge.
   d. Switch the System Power AC circuit breaker on both primary power supplies to On (up).
   e. Power On the 2105 Model 800, then continue with the next step.
5. Wait more than the normal amount of time for the customer operator panel Cluster 1 and 2 Ready indicators to come on solid. A failing cluster may attempt to load its code up to three times before it posts an error. Each code load attempt may take 10 to 20 minutes.
   - If both clusters come ready, go to MAP 1500: Ending a Service Action on page 67.
   - If a cluster hangs and displays a code on its operator panel, go to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
   - If a cluster does not come ready, attempt to log in, display, and repair any new related problems. If there are no new related problems, call the next level of support.
6. Only one cluster has a problem that sent you to this MAP. Verify that the other Cluster Ready indicator on the rack operator panel is On.
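The waiting time implied by step 5 follows from simple arithmetic on the figures given there (up to three code load attempts, 10 to 20 minutes each). The sketch below only multiplies those figures; the constant and function names are hypothetical.

```python
MAX_CODE_LOAD_ATTEMPTS = 3           # "up to three times" in step 5
MINUTES_PER_ATTEMPT = (10, 20)       # "10 to 20 minutes" in step 5


def code_load_wait_window(attempts: int = MAX_CODE_LOAD_ATTEMPTS) -> tuple:
    """Return (shortest, longest) total minutes for the given attempt count."""
    if not 1 <= attempts <= MAX_CODE_LOAD_ATTEMPTS:
        raise ValueError("a cluster attempts the code load one to three times")
    low, high = MINUTES_PER_ATTEMPT
    return attempts * low, attempts * high
```

For three attempts this gives a window of 30 to 60 minutes before a failing cluster posts an error.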
Description
The cluster can be powered off in three ways: the service login options, the rack operator panel local power switch, and the RPC card push-buttons. The push-buttons and local power switch circuits are directly connected to the RPC cards. The service login communicates to the RPC cards from the login cluster, normally the cluster not being serviced. Both RPC cards must receive the cluster power off request and agree that they have received it. When they agree, they request the service processor in the cluster being serviced, to begin the cluster
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Show and repair any problems with RPC card, CEC drawer or I/O drawer power FRUs, then retry the cluster power off.
   Note: If there are no related problems, continue with the next step.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
2. Verify the following cables are correctly connected to both I/O drawers and both RPC cards:
   - I/O drawer power control cables are correctly plugged into the I/O drawer RJ45 connectors P2 and P3 (drawer front middle right).
   - I/O drawer power control cables are correctly plugged into each RPC card. RPC card connector J2-11 (for cluster 2) or J2-15 (for cluster 1).
   Note: When using the service terminal to do Alternate Cluster repair and cluster power off, both clusters must be able to communicate successfully with both RPC cards.
3. Verify that the failing cluster has been quiesced before continuing. Connect the service terminal to the cluster not being serviced.
   From the service terminal Main Service Menu, select:
     Utility Menu
     Resource Management Menu
     Quiesce a Resource (Cluster Bay 1 is for cluster 1, Cluster Bay 2 is for cluster 2)
4. Determine if an I/O drawer power supply is stuck on. Observe the CHK/PWR GOOD power indicator LED on each I/O drawer power supply for the failing I/O drawer (rear of rack). Is one power supply LED blinking slowly and the other power supply LED on solid?
   - Yes, the I/O drawer power supply with the LED on solid is preventing the I/O drawer from powering off properly. Replace that power supply. Use the Repair Menu, FRU Replace Menu, IBM ESS Model 800 options. If the problem still occurs, call the next level of support before replacing the I/O Drawer Planar Assembly.
   - No, continue with the next step.
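The stuck-on check in step 4 compares the two CHK/PWR GOOD LEDs of one I/O drawer. A minimal sketch follows, assuming the two power supplies are simply labeled "A" and "B" (hypothetical labels, not 2105 location codes) and using hypothetical LED state strings.

```python
from typing import Optional


def stuck_on_power_supply(led_a: str, led_b: str) -> Optional[str]:
    """Return the label of the supply to replace, or None if step 4 answers No."""
    if {led_a, led_b} == {"blinking slowly", "on solid"}:
        # The supply whose LED is on solid is keeping the drawer powered.
        return "A" if led_a == "on solid" else "B"
    # Any other combination does not match the step 4 condition.
    return None
```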
Description
A host bay can unexpectedly lose power in two ways:
- The host bay drawer power supplies are operating correctly, but the power is not reaching the host bay planar or the host bay planar is failing. The power supply HA1 and HA2 LED indicators only indicate if the power supply outputs are switched on. They do not indicate if the host bay is receiving power.
- The host bay drawer power supplies are not operating correctly and the HA1 and HA2 indicators may be off.
The 2105 code can detect the host bay power being off when the clusters cannot communicate with the host bay logic through the CPI interfaces. The host bay planar assembly receives bulk voltage from the host bay drawer power supplies. It then converts the bulk voltage into logic voltages.
Note: There are four LED indicators on the host bay planar. They are located at the front of the planar to the right of slot 4. They can be seen by looking at an angle through the front sheet metal cooling air holes, see Figure 138 on page 453.
The functions of the four LEDs are:
- First (front) LED, host bay planar power is on.
- Second LED, remote FPGA chip updated from flash properly during power on.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Display and repair any related problems for the host bay drawer power supplies or RPC cards. Then return here and continue with the next step.
2. Observe the four LEDs on the failing host bay planar assembly, see the note in the description section of this MAP. Use a working host bay to verify you know where to look.
   Observe the HA LED for the failing host bay on both host bay drawer power supplies:
   - HA1 LED is for host bay 1 or 3.
   - HA2 LED is for host bay 2 or 4.
Figure 139. Host Drawer Power Supply HA LED Indicator Location (s009644). Rear view. Labels shown: R1, HA LEDs, Switch.
3. Both power supplies are not supplying power to the host bay. Do the following until the problem is repaired:
   - Check for an overcurrent condition. Use step 7 on page 455.
   - Check for a failing power supply that prevents the other power supply from powering on. Use step 8 on page 456.
   - Replace one or more of the possible failing FRUs.
     - Host bay planar assembly, use the problem or Repair Menu, Replace a FRU option.
     - Host bay drawer backplane, use step 9 on page 456.
   When the problem is repaired, go to step 10 on page 456.
4. The host bay is being sent power by at least one host bay drawer power supply. The host bay may not be receiving bulk voltage, or the host bay planar assembly may not be making logic voltages properly. Do the following until the problem is repaired:
   - Check for damaged auto-docking power connectors and proper host bay seating. Use step 6 on page 455.
Description
The SCSI card firmware load process did not complete the first load attempt which created the problem that sent you here. That failure should have caused a reset that attempted a second firmware load attempt. If the card status is available, the second firmware load attempt was successful.
Isolation
1. Repair any other problems for this SCSI Card.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Show / Repair Problems Needing Repair
   Were any other problems for this SCSI Card repaired?
   - Yes, retry the firmware update load process. If it still fails, call the next level of support.
   - No, continue with the next step.
2. Read the description section above. Determine if SCSI card status is available.
   From the service terminal Main Service Menu, select:
     Utility Menu
     Show Storage Facility Resources Menu
     Show Storage Facility Resources
   Use the left column to find the Engineering FRU Name listed in the problem and determine the status. Is the status available?
   - Yes, continue with the next step.
   - No, call the next level of support.
3. Close the problem.
   From the service terminal Main Service Menu, select:
     Repair Menu
     Close a Previously Repair Problem
Description
The CPI diagnostics are run from both clusters to each host bay. The clusters communicate with each other through the cluster to cluster ethernet connection. Note: The problem may list the failing resource as a CPI interface. The CPI interface shown is the CPI interface that was being tested when the communication failure occurred. It is not the actual failing resource.
Isolation
1. Test the cluster to cluster communications through the ethernet connection.
   From the service terminal Main Service Menu, select:
     Machine Test Menu
     External Connections Menu
     Cluster-Cluster Communications Test
Description
To replace the host bay planar FRU, a special procedure must be followed. There is no option for this FRU in the service terminal Replace a FRU option.
Procedure
Attention: This procedure requires taking both host bays and one of the clusters away from customer use.
1. Determine if a cluster is fenced.
   From the service terminal Main Service Menu, select:
     Utility Menu
     Resource Management Menu
     Show Fenced Resources
   Is the cluster fenced?
   - Yes, there should be a problem for it. Repair that first then return here and continue with the next step.
   - No, continue with the next step.
Description
A host bay will not power on.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
1. Do the following checks before continuing:
   - Verify that the host bay drawer power supply power switch is set to on (up) on both power supplies.
   - Display and repair any problems for the RPC cards or host bay drawer power.
   - Verify that the host bay drawer power supply power input cables and power control cables are correctly plugged.
   - Verify that the host bay drawer power control cables are correctly plugged into the RPC cards connector J2-9 (host bay 1 and 2) or J2-13 (host bay 3 and 4).
   - Slide the host bay in and out a few times to verify that the connector contacts are clean.
   - Attempt to power on the host bay. If it fails, continue with the next step.
2. Determine where the host bay failed to power on from:
Description
The CEC drawer and I/O drawer have three power states:
- Powered off
  This only occurs when the 2105 is powered off. The drawer power LED indicator will be off.
- Standby power mode
  This occurs when the cluster has been powered off for service. The CEC drawer and I/O drawer power LED indicators will be blinking slowly. The CEC drawer operator panel will display OK.
- Powered on
  This is the normal mode when the 2105 is powered on. The drawer power LED indicator will be on solid.
When the 2105 is powered on, the drawers receive standby power from the drawer power supplies. The drawer power LED indicators are blinking slowly. The service processor in the I/O drawer and the System Power Control Network (SPCN), including the fan controller card in both drawers, are operational. Once the CEC drawer operator panel displays OK, standby power mode is complete. This state is signalled to both RPC cards through the cables connected to the I/O drawer RJ45 connectors 2 and 3 (drawer front right). The RPC cards automatically send back an I/O drawer power on signal. The I/O drawer then powers on completely and signals the CEC drawer to also power on completely.
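The relationship between the drawer power LED indicator and the three power states can be written as a lookup. This is an illustrative sketch only; the enum and function names are hypothetical and the mapping restates the description above.

```python
from enum import Enum


class DrawerLed(Enum):
    OFF = "off"
    BLINKING_SLOWLY = "blinking slowly"
    ON_SOLID = "on solid"


# Mapping taken from the three power states listed above.
POWER_STATE_BY_LED = {
    DrawerLed.OFF: "powered off",
    DrawerLed.BLINKING_SLOWLY: "standby power mode",
    DrawerLed.ON_SOLID: "powered on",
}


def drawer_power_state(led: DrawerLed) -> str:
    """Return the power state indicated by the drawer power LED."""
    return POWER_STATE_BY_LED[led]
```

During a normal power on the LED passes through blinking slowly (standby) before it goes on solid, so a drawer that stays in standby has not received or acted on the power on signal.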
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
[Figure: front view. Labels shown: Fan 7, Fan 8, Q1 V/S Comm, Q2 RIO-1, Q3 RIO-0, Q4 JTAG]
[Figure: connector locations. Labels shown: RIO 1, RIO 0, No Use, S1, S2, S3, S4, J11, J14, J15, J16, R1 (JTAG)]
- CEC drawer connectors for RIO-0, RIO-1, V/S COMM, and JTAG (four horizontal cables at bottom front of drawer)
- I/O drawer connectors for RIO-0, RIO-1, V/S COMM, JTAG, RJ45 card connectors 1/2/3 and connector J14 (drawer front lower right)
- I/O drawer power control cables are correctly plugged into the I/O drawer RJ45 connectors P2 and P3 (drawer front middle right).
- I/O drawer power control cables are correctly plugged into each RPC card. RPC card connector J2-11 (for cluster 2) or J2-15 (for cluster 1).
[Figure: rear view of RPC 2 connector J2, positions 1 to 16]
Are the cables correctly connected?
- Yes, continue with the next step.
Description
An SPCN firmware code of 1011 1C0x is occurring which indicates one of the CEC drawer power supplies is reporting an overcurrent.
Isolation
Note: When directed to replace an RPC card or primary power supply (PPS) FRU, always use the FRU Removal and Replacement Procedures in Chapter 4 of the Volume 2. The attached cables must be unplugged and replugged in the correct sequence.
Description
See isolation below.
Isolation
1. THE CLUSTER WILL POWER OFF IF A POWER SUPPLY IS REMOVED FOR MORE THAN FOUR MINUTES. When the power supply is removed, a power supply (working or not) must be reinstalled within 4 minutes.
   Note: The cluster firmware checks if the power supply FRUs are physically installed. If it detects a power supply as missing, it waits four minutes and then powers off the cluster. A power supply that is not installed creates a problem with the cooling air flow that can cause components to overheat. As long as the firmware detects the power supply, it does not matter if the power supply is working; the firmware will not power off the cluster.
2. Note the power supply location code displayed in the problem. Use the Repair Menu, Replace a FRU option to replace the CEC or I/O drawer power supply.
   Note: When using the Replace a FRU option, the CEC and I/O drawer power supplies are listed under the rsrack1 container.
3. If the repair was successful, the problem will be closed automatically. Use the Repair Menu, End of Call Status option to complete the service action.
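The four-minute limit in step 1 can be tracked with a simple elapsed-time check. The sketch below is an illustration only, not the cluster firmware logic: the names are hypothetical, times are plain seconds supplied by the caller, and treating exactly four minutes as the power off point is an assumption.

```python
REMOVAL_LIMIT_SECONDS = 4 * 60  # four minutes, from step 1


def seconds_remaining(removed_at: float, now: float) -> float:
    """Seconds left to reinstall a power supply (working or not)."""
    return max(0.0, REMOVAL_LIMIT_SECONDS - (now - removed_at))


def cluster_will_power_off(removed_at: float, now: float) -> bool:
    """True once the supply has been missing for four minutes or more."""
    # Assumption: the firmware acts when the four-minute wait has elapsed.
    return (now - removed_at) >= REMOVAL_LIMIT_SECONDS
```

For example, 90 seconds after removal there are 150 seconds left in which to reinstall a power supply.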
Description
An error was recorded against one or more FRUs with a location code that was not properly recognized by the maintenance package. Additional action is required to determine the correct 2105 FRU location.
Isolation
1. Display the problem details screen that sent you here.
2. Observe the FRU Location field in the Possible FRUs list. Is the location code U0.1-V?
   - Yes, go to MAP 40E0: Only One I/O Drawer Power Supply Detected on page 349.
   - No, continue with the next step.
3. Is there another FRU listed in the same problem with a valid 2105 FRU location?
   Note: A FRU with a valid location will not have n/a in the Engineering FRU name, FRU Name, and Likely to Fix fields.
   - Yes, ignore the unrecognized FRU and go to step 5 on page 472.
   - No, continue with the next step.
4. Determine the 2105 FRU location as follows:
      # ESC 5500: One or more of the FRU entries listed have a FRU that cannot
      # be fully identified. Call your Next level of support for assistance to . . .
      # Failing Cluster ..........= 1
   b. Determine the 2105 FRU location by prefixing the location code with the cluster location, for example T1-U0.1-P1-I4.
      Note: A description of the location codes is provided in Location Codes in chapter 7 of Volume 3.
5. Determine if additional isolation actions, information, or failing function codes are provided for the SRN and FRU Error Code, if listed. Look up the SRN and FRU Error Code (if listed) in Error Messages, Diagnostic Codes, and Service Reports in chapter 9 of Volume 3.
6. Use MAP 4700: Cluster FRU Replacement (CEC and I/O Drawers) on page 432 to replace any cluster FRUs.
   Note: If the FRU location indicates a PCI slot (the last FRU location characters being /Ix, where x is the PCI slot number), replace the card in that slot. If that does not resolve the problem, contact your next level of support before replacing the I/O planar assembly.
7. If the above actions do not indicate a failing FRU or resolve the problem, call your next level of support.
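The prefixing in step 4b and the PCI slot check in the step 6 note can be sketched as two small helpers. Both function names are hypothetical. The `T<cluster>` prefix form is inferred from the single example in the text (T1-U0.1-P1-I4) and is an assumption, and the slot pattern accepts either `-Ix` or `/Ix` because the example and the note write the separator differently.

```python
import re


# Hypothetical helpers; the "T<cluster>" prefix is inferred from the one
# example in the text (T1-U0.1-P1-I4) and is an assumption.
def full_fru_location(cluster, location_code):
    """Prefix a location code with the cluster location."""
    return f"T{cluster}-{location_code}"


def pci_slot(location):
    """Return the PCI slot number if the location ends in I<x>, else None."""
    match = re.search(r"[-/]I(\d+)$", location)
    return int(match.group(1)) if match else None
```

For example, `full_fru_location(1, "U0.1-P1-I4")` builds the location shown in step 4b, and `pci_slot` on that result identifies slot 4 as the card to replace first.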
Description
The 2105 Model 800 functional code detected a software problem that will require the next level of support to correct. Powering off and then on the cluster or reloading the hard disk drive code will not fix it. The next level of support may ask you to provide them with the information displayed in one or more fields of the problem. This will help identify the specific problem and the actions needed to correct it. This MAP is also called if a LIC feature license failure has been detected by the 2105 code. Another MAP isolates this problem.
Procedure
1. Use the table to find and repair the ESC listed in the problem.
Table 74. ESC Repairs ESC Go to:
ESC 1235 - ODM out of synchronization
   Call your next level of support and have them reference the following note:
   Note: Only one resource with out-of-sync ODM is listed in the Error/Problem; there may be additional resources with ODM problems. The list of resources with out-of-sync ODM can be found in /var/adm/searas/tmp/rsodmcheck.log. See the following example.
      WARNING:The following ODM Errors were detected on Thu Apr 5 15:07:47 PDT 2001
      cpssvol37 DA20 SingleSide
      cpssvol37 rank0 25 12 164067840 10 0 0
      cpssvol38 DA20 SingleSide
      cpssvol38 rank0 25 13 177740160 10 0 0
      lss9 DA10 MisMatch
      lss9 FF25 25 0 0 00diff0-rcfg
      lss9 FF25 25 10 0 0 0 diff1-rcfg
      ********End of List ********

ESC 1236 - DDM background process to format or certify or initialize failed
   From the service terminal Main Service Menu, select:
      Repair Menu
      Show Result of DDM Format / Resume Operation
   Call your next level of support and have them reference the result. For problem determination, a PE package and SSA adapter dumps will be needed.

ESC 1237 - A resource was found fenced but there is no related problem to repair it.
   Call your next level of support.

ESC 1238 - Quiesced resources were found and there is no service login or Automatic LIC is not in progress.
   The service representative needs to complete the service action that was started.

ESC 1239 - Excessive file system fragmentation on the cluster hard drive.
   Call your next level of support.

ESC 2770 - One cluster has a defective NVS/IOA card
ESC 2771 - Both clusters have a defective NVS/IOA card on the same CPI interface.
   Go to MAP 41C0: ESC 2770 or 2771, Missing CPI Detected on page 362.

ESC 380E - IBM notification of warmstart failover
ESC 380F - IBM notification of warmstart
   a. This problem is informational only and requires no repair.
Description
The customer is experiencing problems or has asked for assistance with ESS Web Copy Services. One of the following conditions may be present:
- The customer is unfamiliar with managing Copy Services using the ESS Specialist
- The customer wants help in managing Copy Services
- ESS Web Copy Services is not properly configured
- The customer has asked you to restart Copy Services
- The customer is not seeing a complete LSS list at the host
Procedure
Use the following table to help determine the action needed to resolve the customer's problem. Find the Symptom in the table and then use the Action to isolate and repair the problem.
Symptom: The customer has asked you to restart Copy Services
Action: From the service terminal Main Service Menu, select:
   Configure Options Menu
   Copy Services Menu
   Copy Services Server Menu
   Change Server Definitions
   Select one of the following:
   - Reset to Primary: Restarts Copy Services with Primary Server as active server
   - Reset to Backup: Restarts Copy Services with Backup Server as active server
Description
Some LIC features require the customer to buy a license. The service representative enables the feature by loading a customized diskette written for this 2105's serial number. If there is a mismatch, a problem will be created with an ESC field that identifies the feature that is disabled.
Procedure
1. Display the problem details screen and identify the ESC and LIC feature that is disabled.
   - 384B - License Failure, license out of sync on each cluster, go to step 6.
   - 384C - License Failure, PAV disabled, go to step 2.
   - 384D - License Failure, XRC disabled, go to step 2.
   - 384E - License Failure, PPRC disabled, go to step 2.
   - 384F - License Failure, Flash Copy disabled, go to step 2.
2. Display the LIC feature status screen. Connect the service terminal to the working cluster. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      LIC Feature Menu
      Display Active LIC Features
3. The LIC feature will be disabled if the Configured Capacity exceeds the Feature Capacity Limit. If it does, do one of the following:
   - The configured capacity must be reduced.
   - The customer must purchase more LIC feature capacity. Then a customized diskette enabling the added capacity must be installed.
4. The LIC feature will be disabled if the LIC Feature Control diskette has not been created and installed. For more information on how to create the diskette, see LIC Feature Control Record Extraction in chapter 5 of the Volume 2 book.
   Note: The LIC features are automatically reloaded as part of the hard disk drive rebuild process.
5. The LIC feature capacities should be the same on both clusters. If they are not, call the next level of support.
6. Was there a LIC feature already installed on this 2105?
   - Yes, a feature may have been removed, and the clusters need to be rebooted. Close the problem that sent you here. Use the Alternate Cluster Repair Menu options to quiesce and then resume first the failing cluster and then the other cluster. If the problem is reopened, close it and then power the 2105 off and then on. If the problem is reopened again, call the next level of support.
   - No, call the next level of support.
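The branching in step 1 and the capacity check in step 3 can be sketched as follows. The ESC values and feature names come from step 1 of the procedure. The function names and the plain numeric comparison are assumptions made for illustration; the procedure does not state the units used on the Display Active LIC Features screen.

```python
# ESC values and feature names as listed in step 1 of the procedure.
LICENSE_ESCS = {
    "384B": "license out of sync on each cluster",
    "384C": "PAV disabled",
    "384D": "XRC disabled",
    "384E": "PPRC disabled",
    "384F": "Flash Copy disabled",
}


def first_step_for(esc):
    """Return the procedure step that step 1 sends you to for an ESC."""
    if esc not in LICENSE_ESCS:
        raise KeyError(f"ESC {esc} is not handled by this procedure")
    # Only the out-of-sync case skips the capacity and diskette checks.
    return 6 if esc == "384B" else 2


def capacity_exceeded(configured_capacity, feature_capacity_limit):
    """Step 3: the feature is disabled when configured exceeds the limit."""
    return configured_capacity > feature_capacity_limit
```

A configured capacity equal to the limit is treated as within the license here, because step 3 speaks only of the configured capacity exceeding the limit.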
MAP 49A0: Failure Detected During Background Certify and Build Logical Configuration from ISA
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The Background Certify and Build Logical Configuration from ISA process is used during installation to perform an automated DDM Certify and build of the logical configuration. This can be completed after the service representative has left the site. You have been sent to this Isolation procedure because a problem was detected. If there is another problem, this map will provide additional recovery guidance.
Procedure
1. Use the Repair Menu, Show / Repair Problems needing Repair to display the problem details for the problem which sent you here. Identify the ESC which was recorded and take the action described in the following table.
Table 76. ESC Actions

ESC 1370
   Description: Failure detected during background certify
   Action: Go to MAP Section 49A0-1
ESC 1371
   Description: ISA logical cfg build - failed to create logical subsystem
   Action: Go to MAP Section 49A0-2 on page 479
ESC 1372
   Description: ISA logical cfg build - failed to create rank
   Action: Go to MAP Section 49A0-2 on page 479
ESC 1373
   Description: ISA logical cfg build - failed to create custom volume
   Action: Go to MAP Section 49A0-2 on page 479
ESC 1374
   Description: ISA logical cfg build - failed to create PAV
   Action: Go to MAP Section 49A0-2 on page 479
ESC 1375
   Description: Failure detected by Fixed Block format monitor
   Action: Go to MAP Section 49A0-3 on page 479
ESC 1376
   Description: Unexpected (MLE) failure
   Action: Call your next level of support to determine if the process should be restarted.
ESC 1377
   Action: Go to MAP Section 49A0-4 on page 481
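The ESC-to-action pairing in Table 76 amounts to a simple lookup. The sketch below holds it as a dictionary; the pairing follows the order in which the ESCs and actions appear in the table, and the name `action_for` is hypothetical.

```python
# ESC-to-action pairs as read from Table 76, in table order.
ESC_ACTIONS = {
    "1370": "MAP Section 49A0-1",
    "1371": "MAP Section 49A0-2",
    "1372": "MAP Section 49A0-2",
    "1373": "MAP Section 49A0-2",
    "1374": "MAP Section 49A0-2",
    "1375": "MAP Section 49A0-3",
    "1376": "Call your next level of support",
    "1377": "MAP Section 49A0-4",
}


def action_for(esc):
    """Return the Table 76 action for an ESC, or None if it is not listed."""
    return ESC_ACTIONS.get(esc)
```

Returning `None` for an unlisted ESC keeps the sketch from suggesting an action that the table does not give.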
MAP Section 49A0-1:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   - Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   - No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
3. Did all DDMs complete Certify during the repair action?
   - Yes, continue with the next step.
   - No, go to step 6 on page 479.
4. Restart the operation. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      Background Certify and Build Logical Configuration from ISA Menu
      Background Certify and Build Logical Configuration from ISA
   Make the required selections.
   Notes:
   a. Certify DDMs is not needed because it should have already been completed.
   b. If the Import and build logical configuration from ISA option was previously selected, it must be reselected.
5. Return to the Install section to complete any outstanding actions.
6. Restart the operation. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      Background Certify and Build Logical Configuration from ISA Menu
      Background Certify and Build Logical Configuration from ISA
   Make the required selections.
   Notes:
   a. Certify DDMs must be reselected.
   b. If the Import and build logical configuration from ISA option was previously selected, it must be reselected.
7. Return to the Install section to complete any outstanding actions.

MAP Section 49A0-2:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   - Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   - No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
3. Restart the operation. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      Background Certify and Build Logical Configuration from ISA Menu
      Background Certify and Build Logical Configuration from ISA
   Make the required selections, then return to the Install section to complete any outstanding actions.
   Notes:
   a. Certify DDMs is not needed because it should have already been completed.
   b. The Import and build logical configuration from ISA option must be reselected.

MAP Section 49A0-3:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   - Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   - No, call your next level of support. Analysis of traces will be required to determine the cause and the action plan.
2. Close the problem that sent you here. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
3. Determine if Fixed Block format has now completed for all volumes.
   a. From the service terminal Main Service Menu, select:
         Utility Menu
         Fixed Block Format Menu
         Show Fixed Block Format Status
   b. Display each LSS which shows a type of FB.
   c. Check for each LSS that all Volumes show an LV FORMAT/HARDWARE STATUS of FORMATTED/READY or that the LSS shows No Logical Volumes configured for this Logical Address.
   Did all LSSs appear as described?
   - Yes, go to step 8.
   - No, continue with the next step.
4. Did any Volume show an LV FORMAT/HARDWARE STATUS of FORMAT_IN_PROGRESS?
   - Yes, continue with the next step.
   - No, go to step 6.
5. Note the FORMAT PERCENT for the Volumes which show FORMAT_IN_PROGRESS. Wait 10 minutes and then display them again. Did the FORMAT PERCENT increase?
   - Yes, the Fixed Block formatting appears to be continuing. Wait until all Volumes show FORMATTED/READY, then go to step 8.
   - No, wait a further 10 minutes and then check again. If the FORMAT PERCENT has still not increased, call your next level of support.
6. Did any Volume show an LV FORMAT/HARDWARE STATUS of FAILED?
   - Yes, continue with the next step.
   - No, call your next level of support.
7. Attempt to recover the FAILED volumes. From the service terminal Main Service Menu, select:
      Utility Menu
      Fixed Block Format Menu
      Fixed Block Format Recovery
   Was the recovery successful for all volumes?
   - Yes, wait until all Volumes show FORMATTED/READY, then continue with the next step.
   - No, call your next level of support.
8. Was Import and build logical config from ISA originally selected?
   - Yes, continue with the next step.
   - No, go to the Install section to complete any outstanding actions.
9. Were Fiber Channel Open Systems Hosts configured during this install?
   - Yes, continue with the next step.
   - No, go to the Install section to complete any outstanding actions.
10. Power the ESS off and then on using the white switch on the operator panel.
11. Return to the Install section to complete any outstanding actions.

MAP Section 49A0-4:
1. Use the Repair Menu, Show / Repair Problems needing Repair. In addition to the problem that sent you here, are there any other problems?
   - Yes, use the problem or logs to repair the other problem or problems. When they are repaired, return here and continue with the next step.
   - No, continue with the next step.
2. ESC 1377 is created during IML if the ESS was powered off or rebooted while the Background Certify and Build Logical Configuration from ISA process was still running. Has the cause of that been identified and resolved?
   - Yes, continue with the next step.
   - No, call your next level of support to assist in analysis and resolution of the problem.
3. Determine if all Automated Install processes completed successfully. From the service terminal Main Service Menu, select:
      Install/Remove Menu
      Enterprise Storage Server Menu
      Background Certify and Build Logical Configuration from ISA Menu
      Show Status of Certify / Build Process
   Did all selected Tasks complete successfully?
   - Yes, continue with the next step.
   - No, go to step 9 on page 482.
4. Close this problem. From the service terminal Main Service Menu, select:
      Repair Menu
      Close a Previously Repaired Problem
5. Was Import and build logical config from ISA originally selected?
   - Yes, continue with the next step.
   - No, go to the Install section to complete any outstanding actions.
6. Were Fiber Channel Open Systems Hosts configured during this install?
   - Yes, continue with the next step.
   - No, go to the Install section to complete any outstanding actions.
7. Power the ESS off and then on using the white switch on the operator panel.
8. Return to the Install section to complete any outstanding actions.
9. Use the following table, in the sequence shown, to determine the next action.
Table 77. Status Actions

1. Current status: Certify DDMs shows: Running, Failed, or Not yet started
   Action: Go to step 2 on page 478 of MAP Section 49A0-1
2. Current status: Logical configuration shows: Running, Failed, or Not yet started
   Action: Go to step 2 on page 479 of MAP Section 49A0-2
3. Current status: Fixed Block Format shows: Running, Failed, or Not yet started
   Action: Go to step 2 on page 480 of MAP Section 49A0-3
4. Current status: Call home shows: Running, Failed, or Not yet started
   Action: Go to step 4 on page 481 of MAP Section 49A0-4
5. Current status: Reboot shows: Running, Failed, or Not yet started
   Action: Go to step 4 on page 481 of MAP Section 49A0-4
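Table 77 is read in sequence: the first task that is still Running, Failed, or Not yet started decides where to go next. A minimal sketch of that walk is shown below. The task names, statuses, and destinations come from the table; the function name `next_action` and the dictionary input are assumptions, as is the `"Complete"` label used in the usage note for a finished task.

```python
# Task order and destinations as given in Table 77.
TABLE_77 = [
    ("Certify DDMs", "step 2 of MAP Section 49A0-1"),
    ("Logical configuration", "step 2 of MAP Section 49A0-2"),
    ("Fixed Block Format", "step 2 of MAP Section 49A0-3"),
    ("Call home", "step 4 of MAP Section 49A0-4"),
    ("Reboot", "step 4 of MAP Section 49A0-4"),
]

# Statuses that the table treats as "not finished".
INCOMPLETE = {"Running", "Failed", "Not yet started"}


def next_action(status_by_task):
    """Return the action for the first unfinished task, in table order."""
    for task, action in TABLE_77:
        if status_by_task.get(task) in INCOMPLETE:
            return action
    return None  # no task matched any row of the table
```

For example, if Certify DDMs shows a finished status such as `"Complete"` and Logical configuration shows `"Failed"`, the walk stops at the second row and returns the MAP Section 49A0-2 entry, even if later tasks are also unfinished.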
Description
An Automatic LIC Activation process has been suspended due to a logic or hardware error.
Isolation
1. Call the next level of support.
MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for AutoLIC Phase 000 only. This MAP provides guidance on the proper order to repair the problems and restart AutoLIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to a cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for the other cluster cannot be accessed, the status message will indicate this error. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If a cluster appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If a cluster appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator
v 2. In v v
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 9 on page 484.
   - No, continue with the next step.
4. Did the Automatic LIC process stop and display an error screen that also gave a recovery action to do?
   - Yes, continue with the next step.
   - No, Automatic LIC appears to have failed with no other problem or visual symptom. Verify that a norsStartOnce diskette was not left in the diskette drive, then call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
5. Did the recovery appear to work successfully so that Automatic LIC could continue?
   - Yes, close this problem (using the Repair Menu, Close a Previously Repaired Problem option) and then go to step 15 on page 484.
   - No, call the next level of support.
6. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
7. Did the problem have an ESC=1472?
   - Yes, go to step 9 on page 484.
   - No, call the next level of support.
8. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with step 9 on page 484.
Table 78. Problem Repair Sequence

Problem Type: Each cluster has a problem calling MAP 4A10.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 on page 482 of this MAP.
Non-Cluster problem
9. Close the problem that sent you here and then continue with the next step.
10. Verify the 2105 is fully operational. From the service terminal Main Service Menu, select:
      Repair Menu
      End of Call Status
11. Did you use MAP 4B10 during this repair?
   - Yes, go to step 13.
   - No, continue with the next step.
12. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Terminate Automatic LIC Activation
13. Was MAP 4025: Hard Disk Drive Build Process used to repair any of the problems?
   - Yes, the new Automatic LIC code needs to be recopied onto the cluster hard disk drives that were just repaired. Continue with the next step.
   - No, go to step 15.
14. Restart the Automatic LIC process with a recopy of the LIC code. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Initiate Automatic LIC Activation
   (On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select the source for the LIC code; do not select No Copy.)
   The repair is complete and the automatic LIC activation process is in progress.
15. Restart the Automatic LIC process without a recopy of the LIC code. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6 on page 486.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 486.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 486.
   - No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 79. Problem Repair Sequence Problem Type Cluster 1 (left) has a problem calling MAP 4A30. Repair Sequence 1 Action This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP. Repair the problem and then return here and continue in this table. Repair the problem and then return here and continue in this table. Note: If a hard drive rebuild is required, use the original code load LIC CDs. Repair the problem and then return here and continue in this table. Continue with the next step.
Cluster problem with ESC = 14Fx and calls MAP 4B20. Cluster problem with ESC not 14Fx
Non-Cluster problem
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 488.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 488.
   - No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 80. Problem Repair Sequence Problem Type Cluster 1 (left) has a problem calling MAP 4A20. Repair Sequence 1 Action This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP. Repair the problem and then return here and continue in this table. Repair the problem and then return here and continue in this table. Note: If a hard drive rebuild is required, use the original code load LIC CDs. Repair the problem and then return here and continue in this table. Continue with the next step.
Cluster problem with ESC = 14Fx and calls MAP 4B30. Cluster problem with ESC not 14Fx
Non-Cluster problem
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process. From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 2 and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 490.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 490.
   - No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all cluster problems have been repaired or a decision has been made to defer the repair, continue with step 7 on page 490.
Table 81. Problem Repair Sequence Problem Type Repair Sequence Action This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.
Table 81. Problem Repair Sequence (continued) Problem Type Repair Sequence Action Cluster 1 (left) did an unexpected reboot. Are there additional problems? v Yes, continue in this table. v No, call the next level of support. Cluster problem with ESC = 14Fx and calls MAP 4B40. Cluster problem with ESC not 14Fx 3 Repair using the problem and then return here and continue in this table. Repair using the problem and then return here and continue in this table. Note: If a hard drive rebuild is required, use the original code load LIC CDs. Repair using the problem and then return here and continue in this table. Continue with the next step.
Non-Cluster problem
7. Close the problem that sent you here and then continue with the next step.
8. Verify the 2105 is fully operational.
   From the service terminal Main Service Menu, select:
      Repair Menu End of Call Status
9. Did you use MAP 4B40 during this repair?
   • Yes, go to step 11.
   • No, continue with the next step.
10. Terminate the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Terminate Automatic LIC Activation
11. Was MAP 4025: Hard Disk Drive Build Process used to repair any of the problems?
   • Yes, the new Automatic LIC code needs to be recopied onto the cluster hard disk drives that were just repaired. Continue with the next step.
   • No, go to step 13 on page 491.
12. Restart the Automatic LIC process with a recopy of the LIC code.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Initiate Automatic LIC Activation
   (On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select the source for the LIC code; do not select No Copy.)
   The repair is complete and the automatic LIC activation process is in progress.
13. Restart the Automatic LIC process without a recopy of the LIC code.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Initiate Automatic LIC Activation
   (On the Initiate Automatic LIC Activation screen, in the Copy LIC Image Source field, select No Copy.)
   The repair is complete and the automatic LIC activation process is in progress.
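The branching in steps 9 through 13 of MAP 4A40 can be summarized as a small decision function. This is a minimal sketch for illustration only: the function name, its parameters, and the returned strings are hypothetical labels for the menu actions named in those steps, not part of any 2105 service terminal interface.

```python
from typing import List


def restart_actions(used_map_4b40: bool, used_map_4025: bool) -> List[str]:
    """Return the ordered menu actions implied by steps 9-13 (illustrative only)."""
    actions = []
    # Steps 9-10: terminate the Automatic LIC process only when
    # MAP 4B40 was NOT used during this repair.
    if not used_map_4b40:
        actions.append("Terminate Automatic LIC Activation")
    # Steps 11-13: a hard disk drive rebuild (MAP 4025) means the LIC code
    # must be recopied; otherwise restart with No Copy.
    if used_map_4025:
        actions.append("Initiate Automatic LIC Activation (select a LIC image source)")
    else:
        actions.append("Initiate Automatic LIC Activation (No Copy)")
    return actions
```

For example, a repair that used neither MAP 4B40 nor MAP 4025 yields a terminate action followed by a restart with No Copy.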
MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Cluster 1 (left) remained operational and cluster 2 (right) had a failure that has suspended the Automatic LIC process. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. If this MAP determines that a repair of cluster 2 is needed:
   • DO NOT resume cluster 2, even if directed to by other MAPs.
   • Cluster 2 must be in the quiesced state prior to resuming the Automatic LIC process.
2. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. After the repair is complete, return here and continue.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
3. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 7 on page 492.
   • No, continue with the next step.
4. Did you already repair a problem found in step 2 on page 491 of this MAP?
   • Yes, go to step 9 on page 493.
   • No, continue with the next step.
5. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
6. Did the problem have an ESC=1472?
   • Yes, go to step 9 on page 493.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
7. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 82. Problem Repair Sequence

Problem Type: Cluster 2 (right) has a problem calling MAP 4A40.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 on page 491 of this MAP.

Problem Type: Cluster 2 (right) problem with ESC = 14Fx and calls MAP 4A50.
Repair Sequence: 2
Action: This should only occur after a successful repair of cluster 2. If cluster 2 still cannot communicate with cluster 1, return to step 1 on page 491 of this MAP to isolate the problem.

Problem Type: Cluster problem with ESC = 14Fx calling MAP 4B50
Repair Sequence: 3
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx
Repair Sequence: 4
Action: Repair using the problem (do not resume the cluster) and then return here and continue in this table.
Attention: Review the guidance in MAP step 1 on page 491 before continuing with the repair.

Problem Type: Non-Cluster problem
Repair Sequence: 5
Action: Repair using the problem and then return here and continue in this table.

Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
8. Verify that cluster 2 is in the quiesced state before continuing.
   From the service terminal Main Service Menu, select:
      Utility Menu Option Resource Management Menu Show Quiesced Resources (Cluster Bay 2)
9. Close the problem or problems that sent you here and then continue with the next step.
10. Did you use MAP 4B50 during this repair?
   • Yes, continue with the next step.
   • No, go to step 12.
11. Resume the Automatic LIC process. (MAP 4B50 already terminated the Automatic LIC process.)
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Initiate Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
12. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Login to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error. Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7 on page 495.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7 on page 495.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 83. Problem Repair Sequence

Problem Type: Cluster 1 (left) has a problem calling MAP 4A70.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem Type: Cluster problem with ESC = 14Fx and calls MAP 4B60.
Repair Sequence: 2
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx
Repair Sequence: 3
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Non-Cluster problem
Repair Sequence: 4
Action: Repair using the problem and then return here and continue in this table.

Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
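The yes/no questions in steps 2 through 5 of MAP 4A60 form a short decision chain. The sketch below restates that chain for illustration; the function name, parameter names, and returned strings are hypothetical and are not part of any service tool.

```python
def next_action(other_problems: bool, already_repaired: bool,
                first_pass: bool, esc: str) -> str:
    """Return the next step implied by steps 2-5 of MAP 4A60 (illustrative only)."""
    # Step 2: other problems exist, so prioritize them with the table in step 6.
    if other_problems:
        return "go to step 6"
    # Step 3: a problem found in step 1 was already repaired.
    if already_repaired:
        return "go to step 7"
    # Step 4: a second pass with nothing new to repair needs escalation.
    if not first_pass:
        return "call the next level of support"
    # Step 5: only ESC=1472 proceeds to closing the problem.
    if esc == "1472":
        return "go to step 7"
    return "call the next level of support"
```

The order of the checks matters: the first-pass and ESC questions are reached only when there are no other problems and nothing has been repaired yet.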
MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 84. Problem Repair Sequence

Problem Type: Cluster 2 (right) has a problem calling MAP 4A60.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem Type: Cluster problem with ESC = 14Fx and calls MAP 4B70.
Repair Sequence: 2
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx
Repair Sequence: 3
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Non-Cluster problem
Repair Sequence: 4
Action: Repair using the problem and then return here and continue in this table.

Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
The repair is complete and the automatic LIC activation process is in progress.
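The prioritization that step 6 of MAP 4A70 asks for can be pictured as sorting the open problems by the row they match in Table 84. The following is a rough sketch under stated assumptions: the `Problem` record, its fields, and both function names are hypothetical, and the ranks assume the table rows are listed in repair order.

```python
from typing import List, NamedTuple, Optional


class Problem(NamedTuple):
    """Hypothetical record for one entry on the Problems Needing Repair screen."""
    is_cluster: bool
    esc: str
    calls_map: Optional[str] = None


def repair_rank(p: Problem) -> int:
    """Map a problem to its row order in Table 84 (assumed to be repair order)."""
    if p.calls_map == "4A60":
        return 1  # cluster 2 problem calling MAP 4A60
    if p.is_cluster and p.esc.upper().startswith("14F"):
        return 2  # cluster problem with ESC = 14Fx
    if p.is_cluster:
        return 3  # cluster problem with ESC not 14Fx
    return 4      # non-cluster problem


def repair_order(problems: List[Problem]) -> List[Problem]:
    """Return problems sorted into repair order; ties keep their input order."""
    return sorted(problems, key=repair_rank)
```

Because Python's `sorted` is stable, problems that fall in the same table row stay in the order they were displayed.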
MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Cluster 2 (right) remained operational and cluster 1 (left) had a failure that suspended the Automatic LIC process. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. If this MAP determines that a repair of cluster 1 is needed:
   • DO NOT resume cluster 1, even if directed to by other MAPs.
   • Cluster 1 must be in the quiesced state prior to resuming the Automatic LIC process.
2. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371. After the repair is complete, return here and continue.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
3. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 7 on page 498.
   • No, continue with the next step.
4. Did you already repair a problem found in step 2 of this MAP?
   • Yes, go to step 9 on page 499.
   • No, continue with the next step.
5. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
6. Did the problem have an ESC=1472?
   • Yes, go to step 9 on page 499.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
7. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 85. Problem Repair Sequence Problem Type Cluster 1 (left) has a problem calling MAP 4A90. Repair Sequence 1 Action This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 on page 497 of this MAP. Repair using the problem (do not resume the cluster) and then return here and continue in this table. Attention: The cluster has already been quiesced by the Automatic LIC process. It must stay quiesced as you use the standard repair procedures. Do not resume the cluster when the repair is complete as the code is not at the correct level yet. Cluster problem with ESC not 14Fx 3 Repair using the problem (do not resume the cluster) and then return here and continue in this table. Attention: The cluster has already been quiesced by the Automatic LIC process. It must stay quiesced as you use the standard repair procedures. Do not resume the cluster when the repair is complete as the code is not at the correct level yet. Non-Cluster problem 4 Repair using the problem and then return here and continue in this table. Attention: Review the guidance in MAP step 1 on page 497 before continuing with the repair.
Table 85. Problem Repair Sequence (continued) Problem Type All problems repaired Repair Sequence Action Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
8. Verify that cluster 1 is in the quiesced state before continuing.
   From the service terminal Main Service Menu, select:
      Utility Menu Option Resource Management Menu Show Quiesced Resources (Cluster 1)
9. Close the problem or problems that sent you here and then continue with the next step.
10. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all cluster problems have been repaired or a decision has been made to defer the repair, continue with step 7.
Table 86. Problem Repair Sequence Problem Type Repair Sequence Action This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP. Repair using the problem and then return here and continue in this table. Repair using the problem and then return here and continue. Repair using the problem and then return here and continue in this table. Continue at the next step.
7. Close the problem (or problems) that sent you here and then continue with the next step.
8. Verify the 2105 is fully operational.
   From the service terminal Main Service Menu, select:
      Repair Menu End of Call Status
9. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
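When step 1 of MAP 4A90 reports that the alternate cluster's problems are inaccessible, the choice among its three sub-options depends on what the CEC drawer operator panel shows. The sketch below illustrates that triage; the function name, its parameters, and the returned strings are hypothetical, and how a service representative judges that codes are "changing" is left as an assumption.

```python
READY = "Ready for Login"


def inaccessible_cluster_action(panel_text: str,
                                codes_changing: bool,
                                minutes_on_same_code: float) -> str:
    """Pick the step 1 sub-option from the operator panel state (illustrative only)."""
    # Panel cycling through various codes: the cluster appears to be IMLing.
    if panel_text != READY and codes_changing:
        return "wait for Ready for Login, then repeat step 1"
    # Same status code (not Ready for Login) for more than 5 minutes: hung.
    if panel_text != READY and minutes_on_same_code > 5:
        return "go to MAP 4360"
    # Anything else: follow the screen and use the display-error MAP.
    return "follow the on-screen guidance and go to MAP 4370"
```

The 5-minute threshold and the Ready for Login text come from the step itself; everything else in the sketch is a naming convenience.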
MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 2 (right) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 1 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 1 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 1 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6 on page 502.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7 on page 502.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 87. Problem Repair Sequence

Problem Type: Cluster 1 (left) has a problem calling MAP 4AB0.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem Type: Cluster problem with ESC = 14Fx and calls MAP 4BA0.
Repair Sequence: 2
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx
Repair Sequence: 3
Action: Repair using the problem and then return here and continue in this table.
Note: If a hard drive rebuild is required, use the original code load LIC CDs.

Problem Type: Non-Cluster problem
Repair Sequence: 4
Action: Repair using the problem and then return here and continue in this table.

Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to cluster 1 (left) and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for cluster 2 cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   • Yes, do one of the following:
     - If cluster 2 appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
     - If cluster 2 appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
     - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   • No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   • Yes, go to step 6 on page 504.
   • No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   • Yes, go to step 7 on page 504.
   • No, continue with the next step.
4. Is this the first time through this MAP?
   • Yes, continue with the next step.
   • No, call the next level of support.
5. Did the problem have an ESC=1472?
   • Yes, go to step 7 on page 504.
   • No, call the next level of support.
   Note: AutoLic stopped because an error condition was detected. That error condition did not create an additional problem; the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option, or utility menu options, to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 88. Problem Repair Sequence

Problem Type: Cluster 2 (right) has a problem calling MAP 4AA0.
Repair Sequence: 1
Action: This most likely is caused by an ethernet communication problem between the clusters. A permanent ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem Type: Cluster problem with ESC = 14Fx and calls MAP 4BB0.
Repair Sequence: 2
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx
Repair Sequence: 3
Action: Repair using the problem and then return here and continue in this table.
Note: If a hard drive rebuild is required, use the original code load LIC CDs.

Problem Type: Non-Cluster problem
Repair Sequence: 4
Action: Repair using the problem and then return here and continue in this table.

Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
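The yes/no questions in steps 1 through 5 of this MAP form a small decision tree. The following Python sketch restates that tree for reference only; the `Answers` record, its field names, and the returned strings are hypothetical and are not part of any 2105 service tool.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Hypothetical record of the answers given in steps 1-5."""
    alternate_inaccessible: bool  # step 1: alternate cluster problems inaccessible?
    other_problems: bool          # step 2: any problems besides the one that sent you here?
    already_repaired: bool        # step 3: already repaired a problem found in step 1?
    first_pass: bool              # step 4: first time through this MAP?
    esc: str                      # step 5: ESC of the problem, for example "1472"


def next_action(a: Answers) -> str:
    """Return the next action, evaluating the questions in step order."""
    if a.alternate_inaccessible:
        return "resolve cluster access (wait for IML, MAP 4360, or MAP 4370)"
    if a.other_problems:
        return "go to step 6"
    if a.already_repaired:
        return "go to step 7"
    if not a.first_pass:
        return "call the next level of support"
    if a.esc == "1472":
        return "go to step 7"
    return "call the next level of support"
```

For example, a first pass with no other problems and ESC 1472 leads to step 7, while a second pass with nothing repaired leads to the next level of support.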
MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 400 only. Normally there should be an additional problem that can be repaired. This MAP provides guidance on the proper order to repair the problems and restart Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
1. Log in to a cluster and display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). The problem status for both clusters will be displayed. If the problems for the other cluster cannot be accessed, the status message will indicate this error.
   Does the problem details screen display status that problems from the alternate cluster are inaccessible?
   - Yes, do one of the following:
      - If a cluster appears to be IMLing (CEC drawer operator panel displaying various codes), wait for Ready for Login to be displayed and then repeat this step.
      - If a cluster appears to be hung for more than 5 minutes displaying a status code (other than Ready for Login) on the CEC drawer operator panel, go directly to MAP 4360: Isolation Using Codes Displayed by the CEC Drawer Operator Panel on page 371.
      - Otherwise, follow the on-screen guidance and go to MAP 4370: Error Displaying Problems Needing Repair on page 375.
   - No, continue with the next step.
2. In addition to the problem that sent you here, are there any other problems?
   - Yes, go to step 6.
   - No, continue with the next step.
3. Did you already repair a problem found in step 1 of this MAP?
   - Yes, go to step 7 on page 506.
   - No, continue with the next step.
4. Is this the first time through this MAP?
   - Yes, continue with the next step.
   - No, call the next level of support.
5. Did the problem have an ESC=1472?
   - Yes, go to step 7 on page 506.
   - No, call the next level of support.
     Note: Automatic LIC stopped because an error condition was detected. Because that error condition did not create an additional problem, the error must be identified and repaired before continuing. The next level of support may use the End of Call Status option or utility menu options to look for unexpected conditions (for example, fenced resources).
6. Display the details for the other problems needing repair. Use the following table to prioritize their repair sequence. When all problems have been repaired, continue with the next step.
Table 89. Problem Repair Sequence

Problem Type: Cluster 1 (left) has a problem calling MAP 4AE0.
Repair Sequence: 1
Action: This most likely is caused by an Ethernet communication problem between the clusters. A permanent Ethernet communication problem would be detected and repaired by step 1 of this MAP.

Problem Type: Cluster problem with ESC = 14Fx calling MAP 4BE0.
Repair Sequence: 2
Action: Repair using the problem and then return here and continue in this table.

Problem Type: Cluster problem with ESC not 14Fx.
Action: Repair using the problem and then return here and continue.

Problem Type: Non-Cluster problem.
Action: Repair using the problem and then return here and continue in this table. Continue with the next step.
Note: Cluster refers to CEC and I/O drawers.
7. Close the problem or problems that sent you here and then continue with the next step.
8. Resume the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Resume Suspended Automatic LIC Activation
   The repair is complete and the automatic LIC activation process is in progress.
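The ordering that Table 89 describes can be sketched as a sort key. This is an illustration only: the `Problem` record and function names are hypothetical, the first row is simplified to "a cluster problem that calls this same MAP", and the ranks 3 and 4 for the last two rows are an assumption based on row order, because only sequence numbers 1 and 2 are legible in the table. The ESC and MAP values in the usage lines are made up.

```python
import re
from typing import List, NamedTuple


class Problem(NamedTuple):
    """Hypothetical summary of one entry from Problems Needing Repair."""
    is_cluster: bool  # True for CEC and I/O drawer problems
    esc: str          # four-character ESC, for example "14F3"
    calls_map: str    # MAP the problem calls, for example "4BE0"


def table_rank(p: Problem, this_map: str = "4AE0") -> int:
    """Rank a problem by its row in Table 89 (ranks 3 and 4 are assumed)."""
    if p.is_cluster and p.calls_map.upper() == this_map:
        return 1
    if p.is_cluster and re.fullmatch(r"14F[0-9A-F]", p.esc.upper()):
        return 2
    if p.is_cluster:
        return 3
    return 4


def repair_order(problems: List[Problem]) -> List[Problem]:
    """Return the problems sorted into the table's repair sequence."""
    return sorted(problems, key=table_rank)


# Illustrative values only.
queue = [
    Problem(False, "2000", "3000"),
    Problem(True, "1234", "4100"),
    Problem(True, "14f3", "4BE0"),
    Problem(True, "1472", "4AE0"),
]
ordered = repair_order(queue)
```

Because `sorted` is stable, problems that fall in the same row keep the order in which they were listed.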
MAP 4B10: Automatic LIC Activation Problem, Phase 000 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 000 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP.
Booting from the wrong drive can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A10?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action.
   Was there a separate problem calling MAP 4A10?
   - Yes, return to that problem and begin with MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 7.
   - No, continue with the next step.
4. Terminate the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Terminate Automatic LIC Activation
5. Go to MAP 4025: Hard Drive Build Process for Automatic LIC on page 324 and load the original code (prior to starting the Automatic LIC) on the failing cluster. After that is complete, return here and continue with the next step.
6. Return to MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 that sent you here. (MAP 4A10 will have you complete any remaining repairs and then resume the Automatic LIC process.)
7. Verify that you are logged in to the cluster not being repaired.
8. Display the current boot list setting. Note the order of the hdisks in the list.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
9. Change the boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Change Boot List
      Select cluster: [Remote]
      Select first drive bootlist: [Alternate]
      Select number of hard drives in bootlist: [One]
10. Display the current boot list setting again.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
    Note: Record the hdisk that is listed. It should be the opposite hdisk to that listed first in step 8 on page 507. If it is not, call the next level of support.
11. Close the problem that calls MAP 4B10.
12. Power the failing cluster off and on (to IML the cluster).
    From the service terminal Main Service Menu, select:
       Repair Menu
       Alternate Cluster Repair Menu
13. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to be lit.
14. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
    Is there a problem listed calling MAP 4B10?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Continue with the next step.
15. Terminate the Automatic LIC process.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation
       Automatic LIC Recovery and Utilities Menu
       Terminate Automatic LIC Activation
16. Return to MAP 4A10: Automatic LIC Activation Process Detected a Problem During Phase 000 (CCL & NCCL) on page 482 that sent you here. (MAP 4A10 will have you complete any remaining repairs and then continue the Automatic LIC process.)
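The boot list comparison in this MAP (the hdisk displayed after the change should be the opposite of the one listed first before the change) can be written as a simple check. The sketch below is illustrative only; the function name and list representation are hypothetical, and the count check is an added assumption that mirrors the [One] selection made in the Change Boot List step.

```python
from typing import List

# The two cluster hard disk drives named in the boot list display.
OPPOSITE = {"hdisk0": "hdisk1", "hdisk1": "hdisk0"}


def boot_list_changed_correctly(before: List[str], after: List[str],
                                expected_count: int) -> bool:
    """Compare the boot list recorded before and after Change Boot List.

    before/after hold hdisk names in boot order as they were displayed.
    Returns True when the first hdisk has flipped and the list has the
    expected number of entries; False means call the next level of support.
    """
    if not before or not after:
        return False
    flipped = OPPOSITE.get(before[0]) == after[0]
    return flipped and len(after) == expected_count
```

With a list of `["hdisk0", "hdisk1"]` recorded before the change, a single-entry list of `["hdisk1"]` afterward passes the check, while `["hdisk0"]` does not.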
MAP 4B20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP.
Booting from the wrong drive can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A20?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action.
   Was there a separate problem calling MAP 4A20?
   - Yes, return to that problem and begin with MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 510.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 2 (right).
5. Do the Single Drive LIC Install Preparation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Install Preparation
      Select cluster: [Remote]
      Select source directory: [Next]
6. Do the Single Drive LIC Activation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Activation
      Select cluster for LIC Activation: [Remote]
7. Do the Copy Automatic LIC Control Files.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Copy Automatic LIC Control Files
      Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 1.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
10. Log in to cluster 1 (left).
11. Identify and replace the failing hard disk drive.
    From the service terminal Main Service Menu, select:
       Repair Menu
       Cluster Dual Hard Disk Drive Repair Menu
       Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
12. Return to MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485. (MAP 4A20 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged in to cluster 2 (right).
14. Display the current boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
       Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Change Boot List
       Select cluster: [Remote]
       Select first drive bootlist: [Alternate]
       Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
    Note: Record the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 14 on page 510. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B20.
18. Quiesce and resume cluster 1.
    From the service terminal Main Service Menu, select:
       Repair Menu
       Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
    Is there a problem listed calling MAP 4B20?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Return to MAP 4A20: Automatic LIC Activation Problem, Cluster 1, Phase 100 (NCCL) on page 485 that sent you here. (MAP 4A20 will have you complete any remaining repairs and then resume the Automatic LIC process.)
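The closing steps of this MAP follow a common pattern: start the quiesce and resume, wait a bounded time (up to 45 minutes) for the Cluster Ready LED, then check the problem list. In practice the technician watches the LED; the sketch below only models the bounded wait. The `is_ready` probe, the polling interval, and the injectable `clock` and `sleep` parameters are all hypothetical, since this guide describes no programmatic way to read the LED.

```python
import time
from typing import Callable


def wait_for_ready(is_ready: Callable[[], bool],
                   timeout_s: float = 45 * 60,
                   poll_s: float = 30.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_ready until it returns True or timeout_s has elapsed.

    Returns True as soon as the probe reports ready, and False once the
    deadline has passed without a ready indication.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Passing `clock` and `sleep` as parameters lets the wait be exercised with a simulated clock instead of a real 45-minute delay.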
MAP 4B30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP.
Booting from the wrong drive can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A30?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action.
   Was there a separate problem calling MAP 4A30?
   - Yes, return to that problem and begin with MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486.
   - No, call the next level of support.
3. Have you replaced the I/O drawer planar assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 513.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 1 (left).
5. Do the Single Drive LIC Install Preparation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Install Preparation
      Select cluster: [Remote]
      Select source directory: [Next]
6. Do the Single Drive LIC Activation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Activation
      Select cluster for LIC Activation: [Remote]
7. Do the Copy Automatic LIC Control Files.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Copy Automatic LIC Control Files
      Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 2.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
10. Log in to cluster 2 (right).
11. Identify and replace the failing hard disk drive.
    From the service terminal Main Service Menu, select:
       Repair Menu
       Cluster Dual Hard Disk Drive Repair Menu
       Identify/Replace a Failing Cluster Hard Disk Drive
    Is there a failing cluster hard disk drive?
    - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
    - No, call the next level of support.
12. Return to MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486. (MAP 4A30 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged in to cluster 1 (left).
14. Display the current boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
       Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Change Boot List
       Select cluster: [Remote]
       Select first drive bootlist: [Alternate]
       Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
    Note: Record the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 14 on page 513. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B30.
18. Quiesce and resume cluster 2 (right).
    From the service terminal Main Service Menu, select:
       Repair Menu
       Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
    Is there a problem listed calling MAP 4B30?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Return to MAP 4A30: Automatic LIC Activation Problem, Cluster 2, Phase 100 (NCCL) on page 486 that sent you here. (MAP 4A30 will have you complete any remaining repairs and then resume the Automatic LIC process.)
MAP 4B40: Automatic LIC Activation Problem, Cluster 1, Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP.
Booting from the wrong drive can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A40?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action.
   Was there a separate problem calling MAP 4A40?
   - Yes, return to that problem and begin with MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488.
   - No, call the next level of support.
3. Log in to cluster 2 (right).
4. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 11 on page 516.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
5. Terminate the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Terminate Automatic LIC Activation
6. The cluster 1 dual hard disk drives need to have the original LIC code reloaded. Use MAP 4025: Hard Drive Build Process for Automatic LIC on page 324 and then return here and continue.
7. Log in to cluster 1 (left).
8. Quiesce and resume cluster 2.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
9. Identify and replace the failing hard disk drive.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Cluster Dual Hard Disk Drive Repair Menu
      Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   - No, call the next level of support.
10. Return to MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488. (MAP 4A40 will have you complete any remaining repairs and then continue the Automatic LIC process.)
11. Verify that you are logged in to cluster 2 (right).
12. Display the current boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
       Select cluster to view bootlist: [Remote]
    Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
13. Change the boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Change Boot List
       Select cluster: [Remote]
       Select first drive bootlist: [Alternate]
       Select number of hard drives in bootlist: [One]
14. Display the current boot list setting again.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
       Automatic LIC Recovery and Utilities Menu
       Display Current Boot List
       Select cluster to view bootlist: [Remote]
    Note: Record the hdisk that is listed. It should be the opposite hdisk to that listed first in step 12. If it is not, call the next level of support.
15. Close the problem that calls MAP 4B40.
16. Quiesce and resume cluster 1.
    From the service terminal Main Service Menu, select:
       Repair Menu
       Alternate Cluster Repair Menu
17. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
18. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
    Is there a problem listed calling MAP 4B40?
    - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
    - No, the boot problem is fixed. Continue with the next step.
19. Terminate the Automatic LIC process.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation
       Automatic LIC Recovery and Utilities Menu
       Terminate Automatic LIC Activation
20. Return to MAP 4A40: Automatic LIC Activation Detected a Cluster 1 Problem During Phase 100 (CCL) on page 488 that sent you here. (MAP 4A40 will have you complete any remaining repairs and then continue the Automatic LIC process.)
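The Change Boot List selections differ between MAPs only in the number of hard drives placed in the boot list. As a quick reference, the selections given in MAPs 4B10, 4B20, 4B30, and 4B40 can be collected in a lookup table. The values below are copied from the Change Boot List steps of those MAPs; the `BootListChange` type and `selections_for` helper are hypothetical names used only for this sketch.

```python
from typing import Dict, NamedTuple


class BootListChange(NamedTuple):
    """Selections entered on the Change Boot List menu option."""
    cluster: str      # "Select cluster"
    first_drive: str  # "Select first drive bootlist"
    drive_count: str  # "Select number of hard drives in bootlist"


# Values as stated in the Change Boot List step of each MAP.
CHANGE_BOOT_LIST: Dict[str, BootListChange] = {
    "4B10": BootListChange("Remote", "Alternate", "One"),
    "4B20": BootListChange("Remote", "Alternate", "Two"),
    "4B30": BootListChange("Remote", "Alternate", "Two"),
    "4B40": BootListChange("Remote", "Alternate", "One"),
}


def selections_for(map_id: str) -> BootListChange:
    """Look up the selections for a MAP; raises KeyError for unlisted MAPs."""
    return CHANGE_BOOT_LIST[map_id.upper()]
```

A lookup such as `selections_for("4b40")` returns the Remote, Alternate, One combination used in step 13 of MAP 4B40.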
MAP 4B50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 100 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP.
Booting from the wrong drive can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4A50?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action.
   Was there a separate problem calling MAP 4A50?
   - Yes, return to that problem and begin with MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491.
   - No, call the next level of support.
3. Log in to cluster 1 (left).
4. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 10.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
5. Terminate the Automatic LIC process.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Terminate Automatic LIC Activation
6. Resume cluster 2. When the resume is complete, continue with the next step.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
7. Log in to cluster 2 (right).
8. Identify and replace the failing hard disk drive.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Cluster Dual Hard Disk Drive Repair Menu
      Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   - No, call the next level of support.
9. Return to MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491. (MAP 4A50 will have you complete any remaining repairs and then continue the Automatic LIC process.)
10. Verify that you are logged in to cluster 1 (left).
11. Display the current boot list setting.
    From the service terminal Main Service Menu, select:
       Licensed Internal Code Maintenance Menu
       Automatic LIC Activation Menu
Automatic LIC Recovery and Utilities Menu Display Current Boot List Select cluster to view bootlist: [Remote] Note: Note the first hdisk (hdisk0 or hdisk1) that is listed. Note the number of hdisks listed (1 or 2). 12. Change the boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu Change Boot List Select cluster: [Remote] Select first drive bootlist: [Alternate] Select number of hard drives in bootlist: [Two] 13. Display again the current boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu Display Current Boot List Select cluster to view bootlist: [Remote] Note: Note the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 11 on page 518. If it is not, call the next level of support. 14. Close the problem that calls MAP 4B50. 15. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 16. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit. 17. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B50? v Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced. v No, the boot problem is fixed. Continue with the next step. 18. Terminate the Automatic LIC process. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Terminate Automatic LIC Activation
19. Return to MAP 4A50: Automatic LIC Activation Problem, Cluster 2, Phase 100 (CCL) on page 491 that sent you here. (MAP 4A50 will have you complete any remaining repairs and then continue the Automatic LIC process.)
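The boot list check in steps 11 through 13 of this MAP reduces to a simple rule: after the change, the first hdisk listed should be the opposite of the one noted before the change. The following Python sketch is illustrative only and is not part of the 2105 service terminal. The function name and the list representation of the displayed boot list are assumptions made for this example.

```python
def boot_list_was_switched(before, after):
    """Return True if the first hdisk after the change differs from the first hdisk before it.

    `before` and `after` are hypothetical representations of what
    Display Current Boot List shows, for example ["hdisk0", "hdisk1"].
    """
    valid = {"hdisk0", "hdisk1"}
    if not before or not after:
        raise ValueError("boot list must contain at least one hdisk")
    if before[0] not in valid or after[0] not in valid:
        raise ValueError("expected hdisk0 or hdisk1 as the first entry")
    # Only the first entry matters: it is the drive the cluster tries to boot from first.
    return before[0] != after[0]
```

If a check like this fails, the MAP directs you to call the next level of support rather than continue.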
MAP 4B60: Automatic LIC Activation Problem, Cluster 1, Phase 150, (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded, (like the norsStartOnce diskette is in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP. This can occur if: v The target boot drive failed. v You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes v An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process). 1. Were you directed here from MAP 4A60? v Yes, go to step 3. v No, continue with the next step. 2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A60? v Yes, return to that problem and begin with MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493. v No, call the next level of support. 3. Have you replaced the I/O drawer planar assembly FRU? v Yes, the boot list in the new I/O drawer planar assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 14 on page 522. v No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step. 4. Login to cluster 2 (right).
5. Quiesce cluster 1. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 6. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Single Drive LIC Install Preparation Select cluster: [Remote] Select source directory [Next] 7. Do the Copy LIC Directory. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Copy LIC Directory Select cluster: [Remote] Select source directory [Previous] Select destination directory [Active] 8. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Copy Automatic LIC Control Files Select cluster to copy Automatic LIC files to: [Remote] 9. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select: Repair Menu Cluster Dual Hard Disk Drive Repair Menu Identify/Replace a Failing Cluster Hard Disk Drive Is there a failing cluster hard disk drive? v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step. v No, call the next level of support. 10. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493. (MAP 4A60 will have you complete any remaining repairs and then continue the Automatic LIC process.) 11. Login to cluster 1 (left). 12. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select: Repair Menu Cluster Dual Hard Disk Drive Repair Menu
Identify/Replace a Failing Cluster Hard Disk Drive Is there a failing cluster hard disk drive? v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step. v No, call the next level of support. 13. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493. (MAP 4A60 will have you complete any remaining repairs and then continue the Automatic LIC process.) 14. Verify that you are logged into the cluster 2 (right). 15. Display the current boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu Display Current Boot List Select cluster to view bootlist: [Remote] Note: Note the first hdisk (hdisk0 or hdisk1) that is listed. Note the number of hdisks listed (1 or 2). 16. Change the boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu Change Boot List Select cluster: [Remote] Select first drive bootlist: [Alternate] Select number of hard drives in bootlist: [One] 17. Display again the current boot list setting. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Menu Automatic LIC Recovery and Utilities Menu Display Current Boot List Select cluster to view bootlist: [Remote] Note: Note the hdisk that is listed. It should be the opposite hdisk to that listed first in step 15. If it is not, call the next level of support. 18. Close the problem that calls MAP 4B60. 19. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 20. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
21. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B60? v Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced. v No, the boot problem is fixed. Return to MAP 4A60: Automatic LIC Activation Problem, Cluster 1, Phase 150 (CCL) on page 493 that sent you here. (MAP 4A60 will have you complete any remaining repairs and then resume the Automatic LIC process.)
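The entry decisions in steps 1 through 3 of MAP 4B60 can be summarized as a small decision function. This Python sketch is only a reading aid under stated assumptions: the function name, the parameter names, and the returned strings are hypothetical and do not correspond to anything on the service terminal.

```python
def map_4b60_entry(directed_from_4a60, separate_4a60_problem, planar_replaced):
    """Return the next action for steps 1-3 of MAP 4B60 as a short string."""
    if not directed_from_4a60:
        # Step 2: look for a separate problem that calls MAP 4A60.
        if separate_4a60_problem:
            return "begin with MAP 4A60"
        return "call the next level of support"
    # Step 3: a replaced planar points to a boot list setting rather than a failed drive.
    if planar_replaced:
        return "continue with step 14"
    return "order a cluster hard disk drive FRU, then continue with step 4"
```

The two outcomes of step 3 select the two branches of the MAP: the boot list path (step 14 onward) or the hard disk drive path (step 4 onward).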
MAP 4B70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded, (like the norsStartOnce diskette is in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP. This can occur if: v The target boot drive failed. v You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes v An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process). 1. Were you directed here from MAP 4A70? v Yes, go to step 3 on page 524. v No, continue with the next step. 2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A70? v Yes, return to that problem and begin with MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495. v No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU? v Yes, the boot list in the new I/O drawer planar Assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 14 on page 525. v No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step. 4. Login to cluster 1 (left). 5. Quiesce cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 6. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Single Drive LIC Install Preparation Select cluster: [Remote] Select source directory [Next] 7. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Single Drive LIC Activated Select cluster for LIC activation: [Remote] 8. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select: Licensed Internal Code Maintenance Menu Automatic LIC Activation Automatic LIC Recovery and Utilities Menu Copy Automatic LIC Control Files Select cluster to copy Automatic LIC files to: [Remote] 9. Resume cluster 2. From the service terminal Main Service Menu, select: Repair Menu Alternate Cluster Repair Menu 10. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit. 11. Login to cluster 2 (right). 12. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select: Repair Menu Cluster Dual Hard Disk Drive Repair Menu Identify/Replace a Failing Cluster Hard Disk Drive Is there a failing cluster hard disk drive?
   v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   v No, call the next level of support.
13. Return to MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495. (MAP 4A70 will have you complete any remaining repairs and then continue the Automatic LIC process.)
14. Login to cluster 1 (left).
15. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Note the first hdisk (hdisk0 or hdisk1) that is listed. Note the number of hdisks listed (1 or 2).
16. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
   Note: Note the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 15. If it is not, call the next level of support.
17. Display again the current boot list setting. The two hdisks should now be reversed in the list. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
18. Close the problem that calls MAP 4B70.
19. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
20. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
21. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option).
Is there a problem listed calling MAP 4B70? v Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced. v No, the boot problem is fixed. Return to MAP 4A70: Automatic LIC Activation Problem, Cluster 2, Phase 150 (CCL) on page 495 that sent you here. (MAP 4A70 will have you complete any remaining repairs and then resume the Automatic LIC process.)
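The "wait up to 45 minutes" steps in this MAP describe a bounded wait: check for the ready condition repeatedly and give up when the time limit is reached. The Python sketch below illustrates that pattern only. The Ready LED is a physical indicator on the rack operator panel, so `is_ready` is a hypothetical callable standing in for however the condition is observed; the 30-second polling interval is also an assumption, not a value from this guide.

```python
import time


def wait_for_ready(is_ready, timeout_s=45 * 60, poll_s=30,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s seconds have elapsed.

    Returns True if the ready condition was seen, False on timeout.
    `clock` and `sleep` are injectable so the function can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A False result corresponds to the LED not being lit within the allowed time, at which point the later steps of the MAP (displaying problems needing repair) determine what to do next.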
MAP 4B80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded, (like the norsStartOnce diskette is in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP. This can occur if: v The target boot drive failed. v You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes v An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process). 1. Were you directed here from MAP 4A80? v Yes, go to step 3. v No, continue with the next step. 2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A80? v Yes, return to that problem and begin with MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497. v No, call the next level of support. 3. Have you replaced the I/O Drawer Planar Assembly FRU?
   v Yes, the boot list in the new I/O drawer planar Assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 528.
   v No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Login to cluster 2 (right).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activated
   Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Power off and power on cluster 1. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
10. Login to cluster 1 (left).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   v No, call the next level of support.
12. Exit this MAP and go to MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497. (MAP 4A80 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Login to cluster 2 (right).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Note the first hdisk (hdisk0 or hdisk1) that is listed. Note the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
16. Display again the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Note the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 14. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B80.
18. Power off and power on cluster 1 (left). From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B80?
   v Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
v No, the boot problem is fixed. Return to MAP 4A80: Automatic LIC Activation Problem, Cluster 1, Phase 200 (CCL) on page 497 that sent you here. (MAP 4A80 will have you complete any remaining repairs and then resume the Automatic LIC process.)
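The hard disk drive path of MAP 4B80 (steps 5 through 12) must be performed in order. As a reading aid, the Python sketch below models that path as an ordered checklist that rejects out-of-order completion. It is a hypothetical illustration, not a service tool: the class, the list name, and the step labels (shortened from the step text above) are assumptions made for this example.

```python
# Step labels shortened from MAP 4B80 steps 5-12 (hypothetical wording).
HDD_PATH_4B80 = [
    "Single Drive LIC Install Preparation",
    "Single Drive LIC Activated",
    "Copy Automatic LIC Control Files",
    "Power off and power on cluster 1",
    "Wait for Cluster 1 Ready LED",
    "Login to cluster 1",
    "Identify/Replace a Failing Cluster Hard Disk Drive",
    "Go to MAP 4A80",
]


class Checklist:
    """Track progress through a list of steps that must be done in order."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._done = 0

    @property
    def next_step(self):
        """Return the next step to perform, or None when all steps are complete."""
        if self._done < len(self._steps):
            return self._steps[self._done]
        return None

    def complete(self, step):
        """Mark `step` complete; raise ValueError if it is not the next expected step."""
        expected = self.next_step
        if expected is None:
            raise ValueError("all steps are already complete")
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self._done += 1
```

For example, `Checklist(HDD_PATH_4B80).complete("Copy Automatic LIC Control Files")` raises `ValueError` because the two preceding utilities have not been run yet.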
MAP 4B90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 200 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded, (like the norsStartOnce diskette is in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP. This can occur if: v The target boot drive failed. v You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes v An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process). 1. Were you directed here from MAP 4A90? v Yes, go to step 3. v No, continue with the next step. 2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4A90? v Yes, return to that problem and begin with MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499. v No, call the next level of support. 3. Have you replaced the I/O Drawer Planar Assembly FRU? v Yes, the boot list in the new I/O Planar Assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 530. v No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Login to cluster 1 (left).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory: [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activated
   Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 2. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
10. Login to cluster 2 (right).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   v No, call the next level of support.
12. Return to MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499. (MAP 4A90 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Login to cluster 1 (left).
14. Display the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Note the first hdisk (hdisk0 or hdisk1) that is listed. Note the number of hdisks listed (1 or 2).
15. Change the boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Change Boot List
   Select cluster: [Remote]
   Select first drive bootlist: [Alternate]
   Select number of hard drives in bootlist: [Two]
16. Display again the current boot list setting. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation Menu
   Automatic LIC Recovery and Utilities Menu
   Display Current Boot List
   Select cluster to view bootlist: [Remote]
   Note: Note the first hdisk that is listed. It should be the opposite hdisk to that listed first in step 14 on page 530. If it is not, call the next level of support.
17. Close the problem that calls MAP 4B90.
18. Power off and power on cluster 2 (right). From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4B90?
   v Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
   v No, the boot problem is fixed. Return to MAP 4A90: Automatic LIC Activation Problem, Cluster 2, Phase 200 (CCL) on page 499 that sent you here. (MAP 4A90 will have you complete any remaining repairs and then resume the Automatic LIC process.)
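The 4Bxx MAPs in this group follow a common pattern: each is tied to one cluster, one Automatic LIC phase, and one calling 4Axx MAP, and the boot list steps are performed while logged in to the other cluster with [Remote] selected. The Python sketch below collects the values stated in the MAP titles and steps of this group into a lookup table. The data structure and function names are hypothetical and serve only as a summary; they are not part of the service terminal.

```python
from typing import NamedTuple


class MapInfo(NamedTuple):
    cluster: int        # cluster with the boot problem
    phase: int          # Automatic LIC phase
    load_type: str      # "CCL" (concurrent) or "NCCL" (nonconcurrent)
    calling_map: str    # 4Axx MAP that sends you to this MAP


# Values taken from the MAP titles and step text in this group.
MAPS = {
    "4B50": MapInfo(2, 100, "CCL", "4A50"),
    "4B60": MapInfo(1, 150, "CCL", "4A60"),
    "4B70": MapInfo(2, 150, "CCL", "4A70"),
    "4B80": MapInfo(1, 200, "CCL", "4A80"),
    "4B90": MapInfo(2, 200, "CCL", "4A90"),
    "4BA0": MapInfo(1, 150, "NCCL", "4AA0"),
}


def service_login_cluster(map_id):
    """Return the cluster to log in to for the boot list steps: the one not being repaired."""
    return 1 if MAPS[map_id].cluster == 2 else 2
```

For example, `service_login_cluster("4B90")` returns 1, which matches step 13 of MAP 4B90 (Login to cluster 1 (left)).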
MAP 4BA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored and each will have a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded, (like the norsStartOnce diskette is in the cluster). Once this repair is successful, the cluster boot will be as expected and you will be directed to exit this MAP. This can occur if:
v The target boot drive failed.
v You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
v An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).
1. Were you directed here from MAP 4AA0?
   v Yes, go to step 3.
   v No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AA0?
   v Yes, return to that problem and begin with MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501.
   v No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   v Yes, the boot list in the new I/O drawer planar Assembly may be set incorrectly for this phase of the automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 533.
   v No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Login to cluster 2 (right).
5. Do the Single Drive LIC Install Preparation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Install Preparation
   Select cluster: [Remote]
   Select source directory [Next]
6. Do the Single Drive LIC Activation. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Single Drive LIC Activated
   Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files. From the service terminal Main Service Menu, select:
   Licensed Internal Code Maintenance Menu
   Automatic LIC Activation
   Automatic LIC Recovery and Utilities Menu
   Copy Automatic LIC Control Files
   Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 1. From the service terminal Main Service Menu, select:
   Repair Menu
   Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
10. Login to cluster 1 (left).
11. Identify and replace the failing hard disk drive. From the service terminal Main Service Menu, select:
   Repair Menu
   Cluster Dual Hard Disk Drive Repair Menu
   Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   v Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   v No, call the next level of support.
12. Return to MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501. (MAP 4AA0 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged into cluster 2 (right).
14. Display the current boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
Problem Isolation Procedures, CHAPTER 3
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Change Boot List
      Select cluster: [Remote]
      Select first drive bootlist: [Alternate]
      Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Check the first hdisk that is listed. It should be the opposite hdisk to the one listed first in step 14 on page 533. If it is not, call the next level of support.
17. Close the problem that calls MAP 4BA0.
18. Quiesce and resume cluster 1 (left).
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 1 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BA0?
   - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
   - No, the boot problem is fixed. Return to MAP 4AA0: Automatic LIC Activation Problem, Cluster 1, Phase 150 (NCCL) on page 501 that sent you here. (MAP 4AA0 will have you complete any remaining repairs and then resume the Automatic LIC process.)
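The boot list comparison made when the boot list is displayed, changed, and displayed again can be sketched as a small helper. This is an illustration only and is not part of the service terminal: the function name and the list-of-strings representation of the displayed boot list are assumptions.

```python
def boot_list_switched(before, after):
    """Return True if the first hdisk changed to the opposite drive.

    `before` and `after` hold the hdisk names in the order shown by
    Display Current Boot List, for example ["hdisk0", "hdisk1"].
    This representation is an assumption made for illustration.
    """
    valid = {"hdisk0", "hdisk1"}
    if not before or not after:
        return False
    if before[0] not in valid or after[0] not in valid:
        return False
    # After Change Boot List with [Alternate], the first hdisk should be
    # the opposite of the one recorded before the change.
    return after[0] != before[0]
```

If the helper returns False, the procedure calls for the next level of support.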
MAP 4BB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 150 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4AB0?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AB0?
   - Yes, return to that problem and begin with MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O Drawer Planar Assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 13 on page 536.
   - No, order a cluster hard disk drive FRU to be used in a later step. Continue with the next step.
4. Log in to cluster 1 (left).
5. Do the Single Drive LIC Install Preparation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Install Preparation
      Select cluster: [Remote]
      Select source directory [Next]
6. Do the Single Drive LIC Activation.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Single Drive LIC Activated
      Select cluster for LIC activation: [Remote]
7. Do the Copy Automatic LIC Control Files.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation
      Automatic LIC Recovery and Utilities Menu
      Copy Automatic LIC Control Files
      Select cluster to copy Automatic LIC files to: [Remote]
8. Quiesce and resume cluster 2.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
9. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
10. Log in to cluster 2 (right).
11. Identify and replace the failing hard disk drive.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Cluster Dual Hard Disk Drive Repair Menu
      Identify/Replace a Failing Cluster Hard Disk Drive
   Is there a failing cluster hard disk drive?
   - Yes, use this menu option to replace the failing hard disk drive, then continue with the next step.
   - No, call the next level of support.
12. Return to MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503. (MAP 4AB0 will have you complete any remaining repairs and then continue the Automatic LIC process.)
13. Verify that you are logged into cluster 1 (left).
14. Display the current boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
15. Change the boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Change Boot List
      Select cluster: [Remote]
      Select first drive bootlist: [Alternate]
      Select number of hard drives in bootlist: [Two]
16. Display the current boot list setting again.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Check the first hdisk that is listed. It should be the opposite hdisk to the one listed first in step 14 on page 536. If it is not, call the next level of support.
17. Close the problem that calls MAP 4BB0.
18. Quiesce and resume cluster 2 (right).
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
19. Wait up to 45 minutes for the rack operator panel Cluster 2 Ready LED to be lit.
20. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BB0?
   - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
   - No, the boot problem is fixed. Return to MAP 4AB0: Automatic LIC Activation Problem, Cluster 2, Phase 150 (NCCL) on page 503 that sent you here. (MAP 4AB0 will have you complete any remaining repairs and then resume the Automatic LIC process.)
MAP 4BE0: Automatic LIC Activation Problem, Phase 400 (CCL & NCCL)
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
This MAP is called for Automatic LIC Phase 400 only. Normally there should be a primary problem that should be used to start the repair. This MAP provides secondary repair information to rebuild the cluster hard drive and then restart the Automatic LIC. Note: CCL is Concurrent Code Load, NCCL is NonConcurrent Code Load.
Isolation
Note: A problem that calls a 4Bxx MAP means the cluster booted from the wrong cluster hard disk drive. During the Automatic LIC process, the cluster dual hard disk drives are no longer mirrored, and each has a different level of code. Because the cluster rebooted from the wrong drive, the 2105 functional code was not loaded (as if the norsStartOnce diskette were in the cluster). Once this repair is successful, the cluster will boot as expected and you will be directed to exit this MAP. This can occur if:
- The target boot drive failed.
- You used MAP 43A5: Bootlist Management Using SMS for Automatic LIC on page 392 and it had you change the boot list for diagnostic purposes.
- An I/O Drawer Planar Assembly FRU was replaced (there is a 50% chance that the default boot list is not valid for the Automatic LIC process).

1. Were you directed here from MAP 4AE0?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Verify that you have displayed all problems needing repair for this service action. Was there a separate problem calling MAP 4AE0?
   - Yes, return to that problem and begin with MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) on page 504.
   - No, call the next level of support.
3. Have you replaced the I/O Drawer Planar Assembly FRU?
   - Yes, the boot list in the new I/O Drawer Planar Assembly may be set incorrectly for this phase of the Automatic LIC process. Changing the boot list may correct this problem. Continue with step 5.
   - No, continue with the next step.
4. Call the next level of support. The following notes describe the situation and actions needed to recover:
   - The hard disk drive that was loaded with the new LIC code has probably failed.
   - A hard drive rebuild is needed, but can only be done to the original LIC code level.
     Note: The existing configuration diskettes contain data that is only valid for the original LIC code level. New level diskettes cannot be created until both clusters have been operational at the new level for a minimum of 12 hours.
   - Once the original LIC code has been loaded, a manual process to replace the failed hard drive and load/activate the new LIC code will be needed.
5. Log in to the working cluster.
6. Display the current boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Record the first hdisk (hdisk0 or hdisk1) that is listed and the number of hdisks listed (1 or 2).
7. Change the boot list setting.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Change Boot List
      Select cluster: [Remote]
      Select first drive bootlist: [Alternate]
      Select number of hard drives in bootlist: [Two]
8. Display the current boot list setting again.
   From the service terminal Main Service Menu, select:
      Licensed Internal Code Maintenance Menu
      Automatic LIC Activation Menu
      Automatic LIC Recovery and Utilities Menu
      Display Current Boot List
      Select cluster to view bootlist: [Remote]
   Note: Check the first hdisk that is listed. It should be the opposite hdisk to the one listed first in step 6 on page 538. If it is not, call the next level of support.
9. Close the problem that calls MAP 4BE0.
10. Power off and power on the failing cluster.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Alternate Cluster Repair Menu
11. Wait up to 45 minutes for the rack operator panel Cluster Ready LED to be lit.
12. Display problems needing repair (Repair Menu, Show / Repair Problems Needing Repair option). Is there a problem listed calling MAP 4BE0?
   - Yes, call the next level of support. It appears that the Automatic LIC process is detecting a problem with both hard disk drives that is not due to the I/O drawer planar assembly that was replaced.
   - No, the boot problem is fixed. Return to MAP 4AE0: Automatic LIC Activation Cluster Problem, Phase 400 (CCL & NCCL) on page 504 that sent you here. (MAP 4AE0 will have you complete any remaining repairs and then resume the Automatic LIC process.)
Description
ESS Specialist is accessed by using a web browser from the ESSNet console or another customer console. The ESS Specialist software runs on each 2105 Model 800 cluster. Both the customer console and the ESSNet console access the cluster through the ESSNet ethernet hub.
Isolation
1. Does ESS Specialist access work from the ESSNet console?
   - Yes, continue with the next step.
   - No, go to step 4.
2. Is access working from a customer console (if used)?
   - Yes, ensure access works to both clusters before determining that the problem is no longer occurring.
   - No, continue with the next step.
3. ESS Specialist works from the ESSNet console but fails from the customer console. The customer network accesses the cluster through an ethernet connection at the ESSNet console ethernet hub. Check the following:
   - The customer is using the proper hostname for the cluster on an intranet.
   - The customer is using the proper hostname and domain name for the cluster on the internet.
   - Have the customer try the TCP/IP address.
   - Have the customer ping the TCP/IP address. If the ping is successful, the problem is with the domain name server or elsewhere in the customer network or the internet.
   - Verify that the ESSNet ethernet hub port indicator for the customer network attachment is on or blinking. This means it is able to communicate with the customer ethernet hub/connection.
   The problem is either a failing port on the ESSNet ethernet hub or, more likely, a customer network problem. Go to MAP 4450: ESS Cluster to Customer Network Problem on page 407.
4. Ensure that the cluster has ESS Specialist access enabled. The InfoServer status should be running.
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Configure Communications Resources Menu
      ESS Specialist Menu
      Show ESS Specialist Status
   Continue with the next step.
5. Is the InfoServer running?
   - Yes, go to MAP 4440: ESSNet1 or Master Console to Cluster Ethernet Problem on page 405.
   - No, use the Enable / Disable ESS Specialist option to enable it.
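The branching in this isolation procedure can be summarized as a short decision function. This is a sketch for orientation only: the function name, parameter names, and return strings are hypothetical, and the manual steps above remain the authoritative procedure.

```python
def ess_specialist_next_action(essnet_ok, customer_ok, infoserver_running):
    """Map the answers from the isolation steps to the next action.

    essnet_ok          -- ESS Specialist access works from the ESSNet console
    customer_ok        -- access works from a customer console (if used)
    infoserver_running -- InfoServer status shown by Show ESS Specialist Status
    All names here are illustrative assumptions.
    """
    if essnet_ok:
        if customer_ok:
            # Step 2, Yes branch.
            return "Ensure access works to both clusters"
        # Step 3: works from ESSNet console, fails from customer console.
        return "Check hostname, TCP/IP address, ping, hub port; go to MAP 4450"
    # Steps 4 and 5: access fails from the ESSNet console.
    if infoserver_running:
        return "Go to MAP 4440"
    return "Use Enable / Disable ESS Specialist to enable it"
```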
Description
SCSI bus errors can be detected by any SCSI bus card on the interface. The 2105 Model 800 SCSI host card will most often detect errors in the signals it receives. Likewise, the customer host system SCSI card will most often detect errors in the signals it receives. The SCSI cables seldom fail, but the SCSI cable connections may cause errors if they are not properly seated. Errors can also occur if terminators are not present at each end of the SCSI cable. The 2105 Model 800 SCSI Host Adapter has a terminator on the card itself.
Isolation
1. Display and repair any 2105 Model 800 reported SCSI adapter problems that may be related to the failure. If none are found, continue with the next step.
   From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
2. Use the following checks to locate and repair the problem.
3. Check for a fenced condition:
   Note: If SCSI parts have been replaced and the customer still does not have access to some volumes, the original SCSI error could have fenced a SCSI port.
   a. Verify that the SCSI ports are not fenced: Connect the service terminal to the cluster being serviced.
      From the service terminal Main Service Menu, select:
         Utility Menu
         Resource Management Menu
         Show Fenced Resources
   b. Reset any fenced SCSI ports:
      From the service terminal Main Service Menu, select:
         Utility Menu
         Resource Management Menu
         Reset Fence For a Resource
4. Check that the SCSI host cable is properly connected at each SCSI card.
5. Check that the 2105 Model 800 SCSI host card(s) is properly seated.
6. Check that the host system(s) SCSI card(s) is properly seated.
7. Check the termination of the SCSI Bus:
   - A SCSI bus interface cable connects two or more SCSI cards. Connectors at each end of the daisy-chain must be terminated. The 2105 Model 800 SCSI host card must be at one end of the SCSI cable. If two 2105 Model
Figure 144. 2105 Model 800 ESD Discharge Pad Locations (s009141). Front view showing Cluster 1 and Cluster 2.
Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary; the action required may be to repair another problem in the log.

This MAP isolates for the following ESCs:
- ESC 3490, customer data sequence number validation error with data LRC.
- ESC 34A0, customer data sequence number validation error without data LRC.
- ESC 34AF, third or later repeat of customer data sequence number validation error on the same target LBA (Logical Block Address), track or volume.
- ESC 34B0, SCSI Send Diagnostic command initiated data transfer validation process failure.
- ESC 4960, second occurrence of customer data sequence number validation error on the same target LBA (Logical Block Address), track or volume.
Isolation
Refer to Table 90 on page 544 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action.
Customer Data Sequence Number validation error. Data transferred from a DDM to cache memory is not from the expected Logical Block Address (LBA). The Sequence Number in the received LBA does not match the expected Sequence Number. ESC 34AF indicates that additional Sequence Number error events have been logged for the same target LBA, track or volume.
ESC 34B0: A SCSI Send Diagnostic command initiated data transfer validation process failed. A write or read data transfer failure would be logged as another error event and ESC. If no other error has been logged then this failure indicates that the data read did not match the test pattern data written.
Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary. This required action will be to repair another problem in the log. The failure has caused customer data to be unreadable. The customer must restore the data after the hardware or microcode repair action is complete.

This MAP isolates for the following ESCs:
- ESC 4910, Customer data check, DDM medium error, single LBA.
- ESC 4920, Customer data check, DDM medium error, multiple LBAs.
- ESC 4930, Customer data check, data LRC, single LBA.
- ESC 4940, Customer data check, data LRC, multiple LBAs.
Isolation
1. Refer to Table 91 on page 545 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action.
If a hardware repair problem is not available for this failure, the failure may be intermittent. If the data failure continues, call your next level of support for assistance in isolating and repairing the problem.
Table 91. Customer Data Check Failure ESC Repairs

ESC 4910 or 4920
   Description: Customer Data Check affecting one or more Logical Block Addresses on the target volume. 4910 indicates one LBA, 4920 indicates more than one LBA. The SSA device card reported a Medium Error during data transfer from DDM to cache memory.
   Recommended Action: Locate and repair the problem with ESC CXXX, DXXX or EXXX that contains a repair action for the DDM or SSA device card that is associated with this Data Check.

ESC 4930 or 4940
   Description: Customer Data Check affecting one or more Logical Block Addresses on the target volume. 4930 indicates one LBA, 4940 indicates more than one LBA. An LRC check, sequence number check or physical address check detected during data transfer could not be recovered. Data has been marked defective on the DDM. Subsequent attempts to read this data will fail.
   Recommended Action: Locate and repair any problems with ESC 33XX or 34XX.
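As a quick reference, the contents of Table 91 can be expressed as a lookup keyed by ESC. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the values restate the table rather than add any new repair guidance.

```python
# ESC -> (cause, LBA scope, ESC patterns of the related problem to repair).
# Values restate Table 91; the names used here are illustrative only.
CUSTOMER_DATA_CHECK_ESCS = {
    "4910": ("DDM medium error", "single LBA", ("CXXX", "DXXX", "EXXX")),
    "4920": ("DDM medium error", "multiple LBAs", ("CXXX", "DXXX", "EXXX")),
    "4930": ("data LRC", "single LBA", ("33XX", "34XX")),
    "4940": ("data LRC", "multiple LBAs", ("33XX", "34XX")),
}


def related_problem_patterns(esc):
    """Return the ESC patterns to look for in the problem log.

    Raises KeyError for an ESC that this MAP does not isolate.
    """
    _cause, _scope, patterns = CUSTOMER_DATA_CHECK_ESCS[esc.upper()]
    return patterns
```

For example, `related_problem_patterns("4930")` points to problems logged with ESC 33XX or 34XX.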
IODELAY adjusts ICKDSF to run concurrently with customer operations. ANALYZE scans the volume for data that is not readable or usable.
2. See Example of Media Sim Maintenance Procedure 2 for the location of the ESC and addresses of the failing track and head (cccchh) in the Analyze sense information.
3. For each track that reports an ESC of 49XX, issue the following command (all on the same line):
INSPECT <UNIT() | DDNAME()> <VFY()|NOVFY> ASSIGN NOCHECK NOPRESERVE TRACK(cccc,hh)
Warning: The above ICKDSF inspect command will result in the loss of all customer data on that track. The NOPRESERVE parameter must be specified for the 2105 Model 800. The PRESERVE parameter is not valid for the 2105 Model 800. All previous attempts by the subsystem to recover the data have not been successful. Although the track will be returned to a usable state, all customer data on the specified track will be lost when the INSPECT command is run.

Example of Media Sim Maintenance Procedure 2:

To locate all tracks with unrecoverable data, obtain information on the allocation of user data. To restore such tracks to a usable condition, run the ICKDSF command sequence below. ICKDSF must be at level 16 or higher. The bold text in the following example is defined in the note below.
ENTER INPUT COMMAND:
 analyze unit(1290) nodrive scan
 ANALYZE UNIT(1290) NODRIVER SCAN
ICK00700I DEVICE INFORMATION FOR 1290 IS CURRENTLY AS FOLLOWS:
          PHYSICAL DEVICE = XXXX
          STORAGE CONTROLLER = XXXX
          STORAGE CONTROL DESCRIPTOR = CC
          DEVICE DESCRIPTOR = 06
ICK04000I DEVICE IS IN SIMPLEX STATE
ICK01400I 1290 ANALYZE STARTED
ICK01408I 1290 DATA VERIFICATION TEST STARTED
ICK21776I DATAVER TEST: ERROR DURING DATA VERIFICATION
          CSW = D07C88 0200FFFF CCW = DE000000 3000FFFF FILEMASK = 1E
          SENSE = 80000000 9000010B 00000034 80000004 02007667 FB200F0B 000040E2 0003A401
ICK21401I 1290 SUSPECTED DRIVE PROBLEM
ICK401I 1290 SUSPECTED DRIVE PROBLEMcchh
ICK01406I 1290 ANALYZE ENDED
ICK00001I FUNCTION COMPLETED, HIGHEST CONDITION CODE WAS 8
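In the sample output above, the ESC (0F0B) appears as the last four hex digits of the sixth SENSE word, and the track and head address (03A401) as the last six hex digits of the eighth word. The following sketch extracts those fields under the assumption that every ANALYZE sense line uses the same positions as this one example; that assumption is not confirmed by this guide, and the function name is hypothetical.

```python
def parse_analyze_sense(sense):
    """Extract (esc, cccc, hh) from the eight SENSE words of an ANALYZE report.

    Field positions are inferred from the single example in this section
    and are an assumption, not a documented format.
    """
    words = sense.split()
    if len(words) != 8 or any(len(w) != 8 for w in words):
        raise ValueError("expected eight 8-digit hex words")
    for w in words:
        int(w, 16)  # raises ValueError if a word is not hexadecimal
    esc = words[5][4:]     # last four digits of the sixth word
    cccchh = words[7][2:]  # last six digits of the eighth word
    return esc, cccchh[:4], cccchh[4:]


sample = "80000000 9000010B 00000034 80000004 02007667 FB200F0B 000040E2 0003A401"
print(parse_analyze_sense(sample))  # ('0F0B', '03A4', '01')
```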
Note: In this example, the ESC is 0F0B and the failing track and head address (cccchh) is 03A401. The cccc is 03A4 and the hh is 01.

Common ICKDSF Messages:
Description
You are here to resolve a Data Check failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary. This required action will be to repair another problem in the log.

This MAP isolates for the following ESCs:
- ESC 4980, Meta data check, DDM medium error, single LBA.
- ESC 4990, Meta data check, DDM medium error, multiple LBA.
- ESC 49A0, Meta data check, data LRC, single LBA.
- ESC 49B0, Meta data check, data LRC, multiple LBA.
Isolation
Refer to Table 92 for the ESC that requires problem resolution. Determine the necessary hardware or microcode repair action. Data will be recovered by internal microcode. No data repair action is required. If a hardware repair problem is not available for this failure, the failure may be intermittent. If the data failure continues, call your next level of support for assistance in isolating and repairing the problem.
Table 92. Meta Data Check Failure ESC Repairs

ESC 4980 or 4990
   Description: Meta Data Check affecting one or more Logical Block Addresses on the target volume. 4980 indicates one LBA, 4990 indicates more than one LBA. The SSA device card reported a Medium Error during data transfer from DDM to cache memory.
   Recommended Action: Locate and repair the problem with ESC CXXX, DXXX or EXXX that contains a repair action for the DDM or SSA device card that is associated with this Data Check.

ESC 49A0 or 49B0
   Description: Meta Data Check affecting one or more Logical Block Addresses on the target volume. 49A0 indicates one LBA, 49B0 indicates more than one LBA. An LRC check detected during data transfer from DDM to cache memory could not be recovered.
   Recommended Action: Locate and repair the problem with ESC 33XX that contains a repair action for the DDM or SSA device card that is associated with this data check.
Description
Link incidents are problems that are not automatically detected, isolated and reported by any one single node on the optical link. They occur on an interface and may cause multiple nodes to detect different types of link incidents. Each node detecting and reporting a link incident will generate its own link incident. Fault isolation of link incidents is solved by the combined use of product and system documentation: v Enterprise Systems Link Fault Isolation book, form number SY22-9533 v Maintenance Information for S/390 Fiber Optic Links (ESCON, FICON, Coupling Links, and Open System Adapters) book, form number SY27-2597.
Isolation
1. Were you sent here by a link incident which was detected by a unit or device external to this 2105?
   - Yes, continue with the next step.
   - No, go to step 3 on page 549.
2. Use link fault isolation procedures to determine the source of the problem. See Description above.
MAP 5305: ESCON or Fibre Channel Bit Error Rate Test Failure
Attention: Customer disruption may occur if microcode and power boundaries are not in the proper conditions for this service action. Verify that you start all service activities in Chapter 2: Entry for All Service Actions on page 29 in Volume 1.
Description
The Bit Error Rate Test was run and the ESCON or fibre channel link that the test was run on could not transmit fibre frames. (The Bit Error Rate Test counts errors during fibre frame transmission on a fibre link. When the link cannot transmit frames, no errors can occur, which means that the Bit Error Rate Test cannot be run.) The problem with the link may be caused by the fibre link itself or by the adapter at either end of the link.
Isolation
This diagnostic requires that the port being tested is connected to an enabled source transmitting communication frames. This does not require customer data transfer; the normal idle process is enough. Normally the interface must be physically connected directly to a host system or through a switch (ESCON director or SAN fabric). Do you have an enabled connection as described?
- Yes, return to the procedure that sent you here.
- No, the test was run without the needed connection, close the problem
Description
Bit Error Rate Threshold incidents are caused by specific conditions at an interface or along a line which can cause bits to be received or interpreted incorrectly. These bit errors are counted, and when a specific number is reached (threshold exceeded), the link is operating in a degraded mode. Bit errors are counted by each node attached on a link. You must determine which node(s) in a link have detected a threshold exceeded condition to identify the link or nodes causing the incident.
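The idea of each node counting bit errors against a threshold can be illustrated with a short sketch. The node names, counts, and threshold value below are hypothetical, since this guide does not state the actual threshold; the sketch only shows how the nodes that detected a threshold exceeded condition would be singled out.

```python
def nodes_over_threshold(error_counts, threshold):
    """Return the nodes whose bit error count has reached the threshold.

    error_counts -- mapping of node name to counted bit errors
    threshold    -- count at which the link is considered degraded
    Both the names and the values are illustrative assumptions.
    """
    return sorted(
        node for node, count in error_counts.items() if count >= threshold
    )


# Hypothetical counts for three nodes on one link.
counts = {"host channel": 2, "ESCON director port": 17, "2105 host card": 15}
print(nodes_over_threshold(counts, threshold=15))
```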
Isolation
1. Determine what type of error was reported by the customer. Was the customer-reported error a Bit Error Threshold Exceeded (BER) detected at the ATTACHED node?
   - Yes, go to step 3.
   - No, continue with the next step.
2. Display problems using the following service panel options:
   From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
   Are there any bit error rate problems (ESC=356A) for the failing link?
   - Yes, continue with the next step.
   - No, additional link problem determination is needed. Ensure that all optical link cables are reconnected, then return to MAP 0120 in the Enterprise Systems Link Fault Isolation book, form number SY22-9533.
3. Test the bit error rate:
   - Reconnect the optical link cables to the subsystem, if previously disconnected.
Description
This MAP contains two procedures:
- Isolation Procedure 1: Optical Transmitter Measurement on page 553
- Isolation Procedure 2: Optical Receiver Measurement on page 555

The procedures should be performed sequentially. These procedures measure the optical power at the 2105 Model 800 ESCON card and the customer's ESCON port cable using the optical power meter (P/N 18F7005). The coupler and test cable are part of the fiber optic test support kit (P/N 18F6953). Isolation Procedure 1 will run the ESCON Port Optical Wrap Test on the selected 2105 ESCON card port. A successful wrap test will not only ensure that the card is operating correctly, but will also condition the port for a power measurement.

Note: Do not skip the wrap test, even if it was previously run.
Figure 145. Measuring Optical Transmit Power (S008185m). Labels in the figure: host adapter card, duplex connector, test cable with white and black biconic connectors, power meter.
Isolation Procedure 1: Optical Transmitter Measurement

This procedure measures the optical power transmitted from the 2105 Model 800 ESCON card through a short test cable (P/N 18F6948).

Note: Clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.

1. Verify that the host bay containing the 2105 Model 800 ESCON card is powered on.
2. Run the optical wrap test on the desired ESCON card port:
   From the service terminal Main Service Menu, select:
      Machine Test Menu
      Host Interface Cards Menu
      ESCON Host Cards Menu
Note: The repair procedure will resume the required resources. When the repair is complete, return to the procedure that directed you here. 8. Do you still need to perform the Optical Receiver Measurement procedure? v Yes, perform the Optical Receiver Measurement (Isolation Procedure 2).
Isolation Procedure 2: Optical Receiver Measurement

This procedure measures the power received at the end of the customer's ESCON link cable (input into the 2105 host card optical receiver).

Note: Always clean the fiber optic connectors as described in the cleaning instructions in the fiber optic cleaning kit (New P/N 46G6844 or Old P/N 5453521) before connecting or reconnecting the fiber optic cables.

1. Ensure that the device on the other end of the link is powered on.
2. Disconnect the fiber optic cable connector from the duplex connector on the 2105 Model 800 ESCON card, if not previously disconnected.
3. Connect the duplex connector of the customer's fiber optic cable (the duplex connector that was removed from the 2105 Model 800 ESCON card) into one side of the duplex-to-duplex test coupler, P/N 18F6952 (see Figure 146).
4. Connect the duplex connector of the optical power meter test cable into the other side of the duplex-to-duplex test coupler. If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set the meter using Optical Power Meter Setup on page 556. After the meter is set, insert the black biconic connector of the test cable, P/N 18F6948, into the receptacle on the top of the power meter.
5. Use the optical power meter to obtain a reading. The power reading should be at least -29.0 dBm (-28.0 dBm is more power than -29.0 dBm).
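Because dBm values are negative here, it is easy to misread which reading is stronger. The sketch below converts dBm to milliwatts with the standard relation P(mW) = 10^(dBm/10) and checks a reading against the -29.0 dBm minimum stated in step 5. The function and constant names are illustrative assumptions.

```python
MIN_ESCON_RECEIVE_DBM = -29.0  # minimum reading stated in step 5


def dbm_to_milliwatts(dbm):
    """Convert optical power from dBm to milliwatts (0 dBm equals 1 mW)."""
    return 10 ** (dbm / 10)


def escon_receive_power_ok(reading_dbm):
    """Return True if the reading is at least the stated minimum."""
    return reading_dbm >= MIN_ESCON_RECEIVE_DBM


# -28.0 dBm carries more power than -29.0 dBm, as the step notes.
print(dbm_to_milliwatts(-28.0) > dbm_to_milliwatts(-29.0))
print(escon_receive_power_ok(-28.0), escon_receive_power_ok(-30.5))
```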
5. 6. 7. 8.
9.
Description
This MAP contains two procedures:
• Isolation Procedure 1: Optical Transmitter Measurement on page 557
Figure 147. Measuring Fibre Channel Optical Transmit Power (s008840l)
1. Verify that the host bay containing the 2105 Model 800 fibre channel card is powered on.
2. Run the optical wrap test on the desired fibre channel card port:
   From the service terminal Main Service Menu, select:
      Machine Test Menu
      Host Interface Cards Menu
      Fibre Channel Host Cards Menu
      Fibre Port Optical Wrap Test
   Select the desired host card port to be tested and follow the screen instructions to run the test.
1. Ensure that the device on the other end of the link is powered on.
2. Disconnect the fiber optic cable from the duplex connector on the 2105 Model 800 fibre channel card, if not previously disconnected.
3. Connect the customer's fiber optic cable (that was removed from the 2105 Model 800 fibre channel host card) to the SC-to-ST adapter (see Figure 148).
4. Connect the ST-to-ST test cable from the SC-to-ST adapter to the power meter.
   Note: If the optical power meter has not been previously turned on, zeroed, and set to the correct scale, set the meter using MAP 5320: ESCON Optical Power Measurement on page 552. If the Fibre Channel connection uses long wavelength (LW2), set the meter to 1300 nm. If it uses short wavelength (SW2), set the meter to 780 nm.
5. Use the optical power meter to obtain a reading. The power reading should be between -3.0 dBm and -20.0 dBm (-19.0 dBm is more power than -20.0 dBm). Record the actual measurement value for possible use later during the link fault isolation procedures.
6. Disconnect the customer fiber optic channel cable from the coupler and reconnect the cable to the 2105 Model 800 fibre channel card.
7. Return to the service terminal and follow the instructions on the screen to:
   Make Resource Available for Customer Use
8. Return to the procedure that sent you here.
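The meter setting and the acceptance check for the fibre channel receiver measurement can be sketched as follows. This is an illustration only: it reads the two limits in the procedure as a window from -20.0 dBm up to -3.0 dBm and treats both limits as inclusive, which are assumptions, and the names are hypothetical rather than part of any 2105 service tool.

```python
# Meter wavelength settings from the note in this procedure.
METER_WAVELENGTH_NM = {"LW2": 1300, "SW2": 780}

# Limits from the procedure, read as a window (assumption: inclusive).
FC_MAX_RECEIVE_DBM = -3.0
FC_MIN_RECEIVE_DBM = -20.0


def meter_wavelength_nm(card_type: str) -> int:
    """Return the meter wavelength setting for an LW2 or SW2 card."""
    try:
        return METER_WAVELENGTH_NM[card_type.upper()]
    except KeyError:
        raise ValueError(f"unknown fibre channel card type: {card_type!r}")


def fibre_receive_power_ok(reading_dbm: float) -> bool:
    """Return True when the reading lies within the -20.0 to -3.0 dBm window."""
    return FC_MIN_RECEIVE_DBM <= reading_dbm <= FC_MAX_RECEIVE_DBM
```

A reading of -19.0 dBm passes and a reading of -21.0 dBm fails; the recorded value should still be kept for the link fault isolation procedures.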
Description
Isolating ESCON and Fibre link faults outside the 2105 may be easier using node information stored in the 2105. LIC levels prior to Code EC 2.3.0.0 do not provide this menu option.
Isolation
1. To display the 2105 node information for ESCON and Fibre host adapters, use the service login Utility Menu, Display ESCON and Fibre Node Descriptors option.
   Note: LIC levels prior to Code EC 2.3.0.0 do not provide this menu option.
2. An example of the displayed information is shown below.
   • For a definition of the 2105 Port ID field, go to step 3 on page 561.
3. Use the following table to convert the 2105 Port ID field in the second column to the host bay, host card, and port:
Table 93. 2105 Port ID Field

HOST ADAPTOR PORT              Host Bay 1   Host Bay 2   Host Bay 3   Host Bay 4
                               Port IDs     Port IDs     Port IDs     Port IDs
Host Card 1 Port 0 (Top)       0000         0020         0080         00A0
Host Card 1 Port 1 (Bottom)    0001         0021         0081         00A1
Host Card 2 Port 0 (Top)       0004         0024         0084         00A4
Host Card 2 Port 1 (Bottom)    0005         0025         0085         00A5
Host Card 3 Port 0 (Top)       0008         0028         0088         00A8
Host Card 3 Port 1 (Bottom)    0009         0029         0089         00A9
Host Card 4 Port 0 (Top)       000C         002C         008C         00AC
Host Card 4 Port 1 (Bottom)    000D         002D         008D         00AD
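The conversion in Table 93 can also be expressed as a lookup. The sketch below rebuilds the 32 table entries from the pattern visible in the table (a base value per host bay, an offset of 4 per host card, and 0 or 1 for the port); the function names are illustrative and are not part of the service terminal software.

```python
# Base value per host bay and offset per host card, as read from Table 93.
BAY_BASE = {1: 0x00, 2: 0x20, 3: 0x80, 4: 0xA0}
CARD_OFFSET = {1: 0x0, 2: 0x4, 3: 0x8, 4: 0xC}


def build_port_id_table() -> dict:
    """Map each Port ID value to a (host bay, host card, port) tuple."""
    table = {}
    for bay, base in BAY_BASE.items():
        for card, offset in CARD_OFFSET.items():
            for port in (0, 1):  # 0 = top port, 1 = bottom port
                table[base + offset + port] = (bay, card, port)
    return table


PORT_ID_TABLE = build_port_id_table()


def decode_port_id(port_id_hex: str) -> tuple:
    """Decode a 2105 Port ID string such as '0084' into (bay, card, port)."""
    value = int(port_id_hex, 16)
    if value not in PORT_ID_TABLE:
        raise ValueError(f"Port ID {port_id_hex!r} is not listed in Table 93")
    return PORT_ID_TABLE[value]
```

For example, decode_port_id("0084") yields host bay 3, host card 2, port 0 (top), which matches the table row for Host Card 2 Port 0.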
Description
You are here to resolve a Data Path failure that has been logged with one of the ESC values listed below. An action to repair hardware or microcode is necessary. The action may require the repair of another problem in the log. The failure may have caused customer data to be unreadable. If this occurs the customer must restore the data after the hardware or microcode repair action is complete. This MAP isolates for the following ESCs:
Isolation
The recommended action is to contact your next level of support for fault isolation and repair assistance. The most likely repair activities are:
1. Locate and repair any related problems.
2. Have the customer restore the data after the hardware problem has been resolved.
Description
Link incidents are problems that are not automatically detected, isolated, and reported by any one single node on the Fibre Channel link. They occur on an interface and may cause multiple nodes to detect different types of link incidents. Each node detecting and reporting a link incident will generate its own link incident. Link incidents detected by the storage facility may be displayed from the error log.
Fault isolation of link incidents is solved by the combined use of product and system documentation:
• Enterprise Systems Connection Link Fault Isolation book, form number SY22-9533.
• Maintenance Information for S/390 Fiber Optic Links (ESCON, Fibre, Coupling Links, and Open System Adapters) book, form number SY27-2597.
Ensure that both documents are available for problem determination.
Isolation
1. This MAP has been combined into MAP 5300. Go to MAP 5300: ESCON or Fibre Channel Link Fault on page 548.
Description
Bit Error Rate Threshold incidents are caused by specific conditions at an interface or along a line which can cause bits to be received or interpreted incorrectly. These bits are counted, and when a specific number is reached (threshold exceeded), the link is operating in a degraded mode. Bit errors are counted by each node attached on a link. You must determine which node(s) in a link have detected a threshold exceeded condition to identify the link or nodes causing the incident.
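The counting behavior described above, where each node keeps its own bit error count and reports when a threshold is reached, can be illustrated with a short sketch. The threshold value, the node names, and the class itself are hypothetical; the guide does not state the actual threshold or how the nodes implement the count.

```python
from collections import Counter


class BitErrorMonitor:
    """Track bit errors per node and report nodes that reach a threshold."""

    def __init__(self, threshold: int):
        # The threshold is an assumed parameter; the real value is not given.
        self.threshold = threshold
        self.counts = Counter()

    def record(self, node: str, bit_errors: int = 1) -> None:
        """Add bit errors counted by one node attached to the link."""
        self.counts[node] += bit_errors

    def nodes_over_threshold(self) -> list:
        """Return the nodes whose count has reached the threshold."""
        return sorted(
            node for node, count in self.counts.items()
            if count >= self.threshold
        )
```

Comparing which nodes appear in nodes_over_threshold() mirrors the isolation idea in this MAP: the set of nodes that detected the condition points to the link or node causing the incident.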
Isolation
1. Determine what type of error was reported by the customer.
   Was the customer-reported error a Bit Error Threshold Exceeded (BER) detected at the ATTACHED node?
   • Yes, go to step 3.
   • No, continue with the next step.
2. Display problems using the following service panel options:
   From the service terminal Main Service Menu, select:
      Repair Menu
      Show / Repair Problems Needing Repair
   Are there any bit error rate problems (ESC=326A) for the failing link?
   • Yes, continue with the next step.
   • No, additional link problem determination is needed. Ensure that all optical link cables are reconnected, then call next level of support.
3. Test the bit error rate:
   • Reconnect the optical link cables to the subsystem, if previously disconnected.
   • Run the Bit Error Rate Test on the failing link:
     From the service terminal Main Service Menu, select:
        Machine Test Menu
        Host Interface Cards Menu
        Fibre Channel Host Ports Menu
        Fibre Channel Port Bit-Error-Rate Test
     Select the SA interface to be tested, and follow the instructions on the screen to run the test.
   Did the test run successfully?
   • Yes, cancel any outstanding Bit Error Rate problems logged for this link and resume any quiesced links. The call is complete.
   • No, continue with the next step.
4. Determine how many times the Bit-Error-Rate Test has been run.
   Has this test been run only one time?
   • Yes, clean the fiber optic connectors and run this test again. Use the fiber optic cleaning procedure specified in the fiber optic connector cleaning kit (New P/N 46G6844, Old P/N 5453521). Go to step 3.
Description
You are here to resolve a host failure to recognize LUNs configured on an ESS Fibre Channel.
Isolation
1. Use the service terminal to determine the current ESS Fibre Channel configuration and connections.
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Systems Attachment Resources Menu
      List Host Cards and Ports
2. Using the configuration worksheets from the IBM Enterprise Storage Server Configuration Planner book, form number SC26-7450, verify that:
   a. The ESS hardware configuration matches the configuration worksheet.
   b. The Fibre Channel host cables are connected to the appropriate Fibre Channel host card and host bay (see the following figure).
   If mismatches are discovered, check with the customer to resolve any differences.
Figure 149. 2105 Model 800 Host Bay Connector Locations (s009135)
(Front view of host bays R1-B1 through R1-B4. In each bay, host cards 1 through 4 are at locations R1-Bx-H1 through R1-Bx-H4. The figure shows Ultra SCSI host cards with their SCSI connectors, ESCON host cards with their ESCON link connectors, and Fibre Channel host cards, type LW2 (long wave) or SW2 (short wave), with their fibre link connectors.)
   Has the problem been resolved?
   • Yes, return to the procedure that sent you here.
   • No, continue with the next step.
3. Verify the LUN access setting of the Control Switches:
   From the service terminal Main Service Menu, select:
      Configuration Options Menu
      Change / Show Control Switches
   Is the Fibre Channel LUN Access Control set to Access_All?
   • Yes, have the customer check with the system administrator to verify that the host fibre configuration is correct.
     Note: If the control switches are changed, the subsystem must be rebooted for the change to take effect.
   • No, continue with the next step.
4. Has the problem been resolved?
   • Yes, return to the procedure that sent you here.
Description
A fibre host card in the 2105 has detected and reported a loss of light from an attached host system. When a 2105 fibre host card detects a loss of light, the problem is normally external to the 2105. This is reported to the host system as a status condition. A problem is not created for this condition. A separate problem will be created if the fibre host card detects an internal operational error.
Isolation
1. Use information from the customer to determine which fibre host card in the 2105 has reported the loss of light.
   Note: A problem is not created for this condition.
2. Use the service terminal Repair Menu and Show / Repair Problems Needing Repair options to repair any related problems for that fibre host card.
3. Observe the green and yellow LED indicators on that fibre host card. With a loss of light condition, the green LED should be blinking slowly (once per second) and the yellow LED should be off.
   A loss of light problem is normally not caused by the 2105. These problems are normally external to the 2105. Use the standard fibre channel isolation procedures (not included in this service guide) to restore light to the fiber cable connected to this fibre host card.
Description
The Copyright and Login screen is displayed when all of the following occur:
• The service terminal and cable are connected to the cluster S2 port.
• The service terminal's terminal emulator program is properly configured.
• The service terminal's terminal emulator program is logically connected.
• The Enter key is pressed to create a keyboard interrupt to the cluster.
The login Main Service Menu is displayed when all of the following occur:
• The service terminal Copyright and Login screen is displayed.
• The service login and password are entered.
• The cluster to cluster ethernet communication is either successful or times out. If the communication hangs, the screen will go blank and stay blank.
The following rsACExec.c return code definitions are provided for product engineering use only:
Table 94. rsACExec.c Return Code Definitions

PROCESS_TIMEOUT       0x90=144   Client timeout failure
SUBROUTINE_FAIL       0x89=137   A system subroutine failure
FILE_FAILURE          0x88=136   Operations on a file failed
READ_FAIL             0x86=134   Read of socket failure
WRITE_FAIL            0x85=133   Write to socket failure
INVALID_IP_ADDR       0x84=132   Loopback IP address invalid failure
DAEMON_INIT_FAIL      0x83=131   Daemon Initialization/Setup fail
SOCKET_FAIL           0x82=130   Socket failure
AUTHORIZATION_FAIL    0x81=129   Failure during authorization
ERROR_INVALID         0x80=128   Invalid parameter failure
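For engineering reference, the return codes in Table 94 can be held in a simple lookup. The sketch below pairs each name, value, and description in the order they are listed in the table, which is an assumption about how the three columns line up; the dictionary and function names are illustrative and do not come from rsACExec.c.

```python
# Return codes from Table 94: value -> (name, description).
RS_AC_EXEC_RETURN_CODES = {
    0x90: ("PROCESS_TIMEOUT", "Client timeout failure"),
    0x89: ("SUBROUTINE_FAIL", "A system subroutine failure"),
    0x88: ("FILE_FAILURE", "Operations on a file failed"),
    0x86: ("READ_FAIL", "Read of socket failure"),
    0x85: ("WRITE_FAIL", "Write to socket failure"),
    0x84: ("INVALID_IP_ADDR", "Loopback IP address invalid failure"),
    0x83: ("DAEMON_INIT_FAIL", "Daemon Initialization/Setup fail"),
    0x82: ("SOCKET_FAIL", "Socket failure"),
    0x81: ("AUTHORIZATION_FAIL", "Failure during authorization"),
    0x80: ("ERROR_INVALID", "Invalid parameter failure"),
}


def describe_return_code(code: int) -> str:
    """Return 'NAME (0xNN=DDD): description', or a note for unlisted codes."""
    entry = RS_AC_EXEC_RETURN_CODES.get(code)
    if entry is None:
        return f"0x{code:02X}={code}: not listed in Table 94"
    name, description = entry
    return f"{name} (0x{code:02X}={code}): {description}"
```

Note that 0x87 does not appear in the table, so the lookup reports it as unlisted rather than guessing a meaning.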
Isolation
Use the following steps to isolate the problem.
1. Check if the Copyright and Login screen is displayed.
   Connect the service terminal and cable to the cluster and then attempt to logically connect the service terminal's terminal emulator program. Press the Enter key to create a keyboard interrupt. Wait up to 3 minutes for the Copyright and Login screen to display.
(Figure: service terminal connection to the cluster 1 and cluster 2 S2 RS-232 ports, showing the service terminal, the service terminal interface cable, the power jack, and the serial connector.)
Appendix. Accessibility
Accessibility features help a user who has a physical disability, such as restricted mobility or limited vision, to use software products successfully.
Features
These are the major accessibility features in the IBM TotalStorage Enterprise Storage Server information:
1. You can use screen-reader software and a digital speech synthesizer to hear what is displayed on the screen. IBM Home Page Reader version 3.0 has been tested.
2. You can operate features using the keyboard instead of the mouse.
Navigating by keyboard
You can use keys or key combinations to perform operations and initiate menu actions that can also be done through mouse actions. You can navigate the IBM TotalStorage Enterprise Storage Server information from the keyboard by using the shortcut keys for your browser or Home Page Reader. See your browser Help for a list of shortcut keys that it supports. See the following Web site for a list of shortcut keys supported by Home Page Reader:
http://www-306.ibm.com/able/solution_offerings/keyshort.html
You can access the information using IBM Home Page Reader 3.0.
Index

Numerics
20 mb where 40 mb SSA cable expected, MAP 3656 312
2105 cannot be power off, pinned data, MAP 24B0 167
2105 expansion enclosure (rack 2) power off problem, MAP 23B0 144
2105 Expansion Enclosure (rack 2) UEPO problem, MAP 2380 138
2105 expansion enclosure information 1
2105 expansion enclosure power on problem, MAP 2420 154
2105 Model 750 disk storage information 21
2105 model 750 information 1
2105 Model 800 (rack 1) UEPO problem, MAP 2360 131
2105 Model 800 disk storage information 21
2105 model 800 information 1
2105 Model 800 local power on problems, MAP 2400 149

A
a temporary CPI error was detected, MAP 41F0 365
accessibility 573
accessing copy services information 9
accessing ESS specialist information 9
all DDMs on loop B do not have the same characteristics, MAP 3626 303
all DDMs on SSA loop A do not have the same characteristics, MAP 3625 302
analyzing a storage cage fan/power sense card check summary indicator on, MAP 3379 246
array repair required, MAP 3123 226
arrays across loops information 5
attaching the ESSNet to a customer network, MAP 1620 107
attempt to format array member, MAP 3131 231
attention notices
   fragility of disk drive modules 176
automatic LIC activation cluster problem, phase 400 (CCL & NCCL), MAP 4AE0 504
automatic LIC activation failure, cluster 1 phase 100 (CCL), MAP 4A40 488
automatic LIC activation problem, cluster 1, phase 100 (CCL), MAP 4B40 514
automatic LIC activation problem, cluster 1, phase 100 (NCCL), MAP 4A20 485
automatic LIC activation problem, cluster 1, phase 100 (NCCL), MAP 4B20 509
automatic LIC activation problem, cluster 1, phase 150 (CCL), MAP 4A60 493
automatic LIC activation problem, cluster 1, phase 150 (NCCL), MAP 4AA0 501
automatic LIC activation problem, cluster 1, phase 150 (NCCL), MAP 4BA0 532
automatic LIC activation problem, cluster 1, phase 150 (CCL), MAP 4B60 520
automatic LIC activation problem, cluster 1, phase 200 (CCL), MAP 4A80 497
automatic LIC activation problem, cluster 1, phase 200 (CCL), MAP 4B80 526
automatic LIC activation problem, cluster 2, phase 100 (CCL), MAP 4A50 491
automatic LIC activation problem, cluster 2, phase 100 (CCL), MAP 4B50 517
automatic LIC activation problem, cluster 2, phase 100 (NCCL), MAP 4A30 486
automatic LIC activation problem, cluster 2, phase 100 (NCCL), MAP 4B30 511
automatic LIC activation problem, cluster 2, phase 150 (CCL), MAP 4A70 495
automatic LIC activation problem, cluster 2, phase 150 (CCL), MAP 4B70 523
automatic LIC activation problem, cluster 2, phase 150 (NCCL), MAP 4AB0 503
automatic LIC activation problem, cluster 2, phase 150 (NCCL), MAP 4BB0 534
automatic LIC activation problem, cluster 2, phase 200 (CCL), MAP 4A90 499
automatic LIC activation problem, cluster 2, phase 200 (CCL), MAP 4B90 529
automatic LIC activation problem, phase 000 (CCL & NCCL), MAP 4A10 482
automatic LIC activation problem, phase 000 (CCL & NCCL), MAP 4B10 506
automatic LIC activation problem, phase 400 (CCL & NCCL), MAP 4BE0 537
B
battery set charge low, MAP 2460 162 battery set detection problem, MAP 2470 162 bay held reset condition 339 begin all service actions 29 bit error rate test failure, MAP 5305 550 bootlist management using SMS for automatic LIC, MAP 43A5 392 bootlist management using SMS, MAP 43A0 387 both RPC cards firmware down level, MAP 24F0 168 bypass card jumpers wrong, MAP 3654 311
C
call home / remote services failure, MAP 1301 55 Canadian compliance statement xviii category 1, crash codes 368, 369
CD-ROM test failure, MAP 4600 429 CEC drawer power indicator information 15 CEC drawer power indicator information 15 CEC drawer power on problem, MAP 2700, 170 CEC or I/O drawer visual power supply problem, MAP 2800 171 CEC, I/O, or host bay drawer overcurrent, MAP 2030 113 CEC, I/O, or host bay drawer power fault, MAP 2230 122 changing network configuration for ESS and master console, MAP 1607 85 chapter 1, reference information 1 chapter 2, entry for all service actions 29 chapter 3, problem isolation procedures (MAPs) 41 Chinese EMI statement xx CKD read data failure, MAP 5340 561 cluster MAP 4055, bay held reset condition 339 MAP 45A0: pinned data, special case 428 cluster code load counter = 2, MAP 4350 370 cluster dual hard drive ESC 1xxx, MAP 43B0 398 cluster fails to power off, MAP 47A0 449 cluster FRU replacement (CEC and I/O drawers), MAP 4700 432 cluster hang during failback or error recovery, MAP 4010 319 cluster IML from second hard disk drive, MAP 43C0 400 cluster indicators information 15 cluster indicators information 15 cluster minimum configuration, MAP 4540 418 cluster not ready, MAP 20A0 117 cluster NVS problem, MAP 4460 410 cluster power off request problem, MAP 4730 446 cluster power on problem, MAP 4880 461 cluster powered off unexpectedly 431 cluster powered off unexpectedly, MAP 23E0 149 cluster SP, SPCN, or system firmware down-level, MAP 4610 430 cluster SP, SPCN, or system firmware reload 431 cluster to cluster ethernet communication test, MAP 4410 403 cluster to modem communication problem, MAP 1300 52 cluster to RPC cards communication problem, MAP 4480 411 codes category 1, crash codes 368, 369 communications statement xviii compliance statement, radio frequency energy xviii compliance statement, Taiwan xx configuration MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 configuring 94 configuring, 2105 Model 800 for installation 94
connecting ethernet LAN 94 connecting the modem and modem expander for remote support, MAP 1610 88 converting the personal computer to an ESSNet console, MAP 1606 76 copy services information 10 copy services information 10 copy services, accessing information 9 CPI address mismatch, MAP 4090 343 CPI diagnostic communication problem, MAP 4840 457 CPI failure needing CPI cable as FRU, MAP 41E0 365 CPI interface NVS/IOA card to host bay failure, MAP 41B0 361 CPI problem or host bay slot failure, MAP 41D0 364 crossed RPC cables to expansion rack, MAP 2450 160 CUIR information 11 CUIR information 11 customer copy services problem, MAP 4980 474 customer media maintenance examples 38 start 38 customer receives sense data without a SIM 34 start 34
D
DDM bay controller card indicator 22 controller card power check indicator information 21 DDM check indicator 22 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 external SSA connections 24 indicator information 21 internal SSA connections 24 link status (ready) indicator 22 mode indicator 22 DDM bay controller card indicator 22 DDM bay controller card power check indicator information 21 DDM bay DDM check indicator 22 DDM bay disk drive module check indicator 23 DDM bay disk drive module indicators 23 DDM bay disk drive module ready indicator 23 DDM bay external SSA connections 24 DDM bay indicator information 21 DDM bay internal SSA connections 24 DDM bay link status (ready) indicator 22 DDM bay mode indicator 22 DDM bay verification for possible problems, MAP 3520 284 DDM bay, maintenance analysis procedures (MAPs) 176 DDM installation introduces different RPM, MAP 3614 296
DDM installation with mixed capacity rank site, MAP 3612 293 DDM installation with new rank site capacity, MAP 3610 290 DDM size is not supported, MAP 3617 298 DDM, or DDMs, found in formatting state during IML, MAP 3580 288 DDMs of same capacity but different rpms on the same SSA loop 298 decode a refcode 36 start 36 disability 573 disk drive module check indicator 23 indicators 23 ready indicator 23 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 display and repair a problem, MAP 1210 51 display cluster ethernet network address, MAP 4420 405 display ESCON and fibre node descriptors, MAP 5330 560 displaying cluster SMS error logs, MAP 4400 402 dump progress indicators 369 duplicate TCP/IP address detected for this cluster, MAP 43D0 401
E
electronic emission notices xviii EMI statement, Chinese xx end a DASD service action, MAP 3360 241 end service action, MAP 1500 67 entry for maintenance analysis procedures (MAPs) 41 entry MAP for CPI problems, MAP 4040 326 entry table for all service actions 29 start 29 entry table, entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, entry table, MAP 4xxx: cluster MAPs 45 entry table, entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 EREP EREP reports 34 repair using an EREP report 34 EREP reports 34 start 34 error displaying problems needing repair, MAP 4370 375 ESC 2768, NVS/IOA card problem, MAP 4470 411 ESC 2770 or 2771, missing CPI detected, MAP 41C0 362 ESC 5500 isolation, MAP 4960 471
ESCON information 6 link fault isolation 548 MAP 5305, bit error rate test failure 550 ESCON attached host systems information 6 ESCON bit error validation, MAP 5310 551 ESCON optical power measurement, MAP 5320 552 ESS connection security information 7 ESS cluster to customer network problem, MAP 4450 407 ESS connection security information 7 ESS interface information 7 ESS interface information 7 ESS service interface information 11 ESS service interface information 11 ESS specialist information 9 ESS Specialist cannot access cluster, MAP 5000 540 ESS specialist information 9 ESS specialist, accessing information 9 EssNet MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 ESSNet information 7 master console replacement 8 ESSNet console problem, MAP 1600 68 ESSNet information 7 ESSNet1 or master console to cluster ethernet problem, MAP 4440 405 European Community Compliance statement xviii event history report 35 start 35 extended cluster IML time from NVS battery charging, MAP 4200 366
F
failure detected during Background Certify and Build Logical Configuration from ISA 477 failure detected during Background Certify and Build Logical Configuration from ISA, MAP 49A0 477 FCC (see Federal Communications Commission) xviii Federal Communications Commission (FCC) statement xviii fence network isolation, MAP 40A0 344 fibre MAP 5305, bit error rate test failure 550 fibre channel connection information 13 host card indicator information 14 fibre channel (SCSI-FCP) information 6
fibre channel (SCSI-FCP) host system information 6 fibre channel bit error validation, MAP 5410 563 fibre channel connection information 13 fibre channel host card indicator information 14 Fibre channel link fault, MAP 5400 562 fibre host card reports a loss of light, MAP 5440 566 fibre optical power measurement, MAP 5321 556 ficon information 6 link fault isolation 548 FICON attached host systems information 6 formatting of a DDM has not completed, MAP 3127 229
G
generating a refcode from sense bytes 37 start 37
H
Handling a missing or failing resource, MAP 4130 353 hard disk drive build process for both drives, MAP 4020 320 hard drive build process for automatic LIC, MAP 4025 324 host bay drawer fan reporting failure, MAP 4110 351 host bay drawer power supply problem, MAP 2210 119 host bay drawer visual power supply problem, MAP 2810 174 host bay fails to power off, MAP 4720 443 host bay power on problem, MAP 4870 459 host fibre channel fails to recognize ESS LUNs, MAP 5430 564 host systems information 5 host systems information 5
I
I/O drawer power indicator information 17 I/O drawer power indicator information 17 IBM patents xvii products xvii programs xvii services xvii trademarks xx incomplete or failed format process, MAP 3550 286 indicators dump progress 369 Industry Canada Compliance statement xviii information 2105 expansion enclosure 1 2105 model 750 1 2105 Model 750 disk storage 21 2105 model 800 1 2105 Model 800 disk storage 21 accessing copy services 9
information (continued) accessing ESS specialist 9 arrays across loops 5 CEC drawer power indicator 15 cluster indicators 15 copy services 10 CUIR 11 DDM bay controller card indicator 22 DDM bay controller card power check indicator information 21 DDM bay DDM check indicator 22 DDM bay disk drive module check indicator 23 DDM bay disk drive module indicators 23 DDM bay disk drive module ready indicator 23 DDM bay external SSA connections 24 DDM bay indicators 21 DDM bay internal SSA connections 24 DDM bay link status (ready) indicator 22 DDM bay mode indicator 22 differences, ESSNet and master consoles 8 disk drive module check indicator 23 disk drive module indicators 23 disk drive module ready indicator 23 ESCON attached host systems 6 ESS connection security 7 ESS interfaces 7 ESS master consoles 8 ESS service interface 11 ESS specialist 9 ESSNet 7 fibre channel connection 13 fibre channel host card indicators 14 FICON attached host systems 6 host systems 5 fibre channel (SCSI-FCP) 6 SCSI 5 I/O drawer power indicator 17 master console 7 RAID-10 4 RAID-5 4 redundant array of independent disks (RAID) 4 reference 1 remote service support 13 RPC local and automatic switch settings 20 RPC local and remote switch settings 20 service interface 13 special tools 28 switching ESS power off (automatic mode) 21 switching ESS power off (local mode) 20 switching ESS power off (remote mode) 21 switching ESS power on and off (all modes) 19 switching ESS power on and off (automatic mode) 19 switching ESS power on and off (local mode) 19 switching ESS power on and off (remote mode) 20 topics 1 TotalStorage expert 10 using the ESS operator panel 17 information topics 1 information, 2105 expansion enclosure 1 information, 2105 model 750 1
information, 2105 model 800 1 information, reference 1 installation, 2105 Model 800 completing connecting ethernet LAN 94 testing modem communications 94 installed unit or feature mismatch, MAP 2320 124 isolating a blinking 888 error on the CEC drawer operator panel, MAP 4240 367 isolating a cluster to cluster CPI communication failure, MAP 4510 415 isolating a cluster to cluster ethernet problem, MAP 4390 377 isolating a customer data check failure, MAP 5240 544 isolating a customer LAN connection problem, MAP 4380 376 isolating a DDM bay controller card communications problem, MAP 3398 264 isolating a DDM bay location error, MAP 3428 279 isolating a DDM bay power problem, MAP 3395 261 isolating a DDM LIC update problem, MAP 4710 442 isolating a DDM location problem, MAP 3429 282 isolating a degraded SSA link between a DDM and an SSA device card, MAP 3060 184 isolating a degraded SSA link between a DDM and two SSA device cards, MAP 3078 193 isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card, MAP 3096 209 isolating a degraded SSA link between two DDMs in separate DDM bays, MAP 3101 217 isolating a degraded SSA link between two DDMs, MAP 3010 178 isolating a degraded SSA link between two SSA device cards connected throjugh a DDM bay, MAP 3086 201 isolating a degraded SSA link, MAP 3121 223 isolating a diskette drive failure, MAP 4620 430 isolating a fixed block read data failure, MAP 5230 543 isolating a functional code not running problem, MAP 4780 447 isolating a LIC activation process failure, MAP 4140 354 isolating a LIC process read/display problem, MAP 4100 351 isolating a meta data check failure, MAP 5250 547 isolating a Multiple DDM detect over temperature problem, MAP 3685 316 isolating a SCSI bus error, MAP 5220 541 isolating a SCSI card configuration timeout, MAP 4820 456 isolating a software problem, MAP 4970 472 isolating a storage and DDM bay location error, MAP 3427 277 isolating a storage cage fan failure, MAP 
3384 248 isolating a storage cage fan/power sense card error, MAP 3375 242 isolating a storage cage fan/power sense card error, MAP 3378 245 isolating a storage cage fan/power sense card error, MAP 3381 247
isolating a storage cage fan/power sense card location error, MAP 3426 275 isolating a storage cage fan/power sense card R1 jumper missing error, MAP 3423 270 isolating a storage cage power supply failure, MAP 3387 251 isolating a storage cage power supply problem, MAP 3391 255 isolating a two DDMs detected over temperature problem, MAP 3680 313 isolating an array repair required failure, MAP 3129 230 isolating an automatic LIC activation failure, MAP 4A00 482 isolating an SSA DASD DDM bay controller card problem, MAP 3397 263 isolating an SSA link error between a DDM and an SSA device card, MAP 3050 179 isolating an SSA link error between a DDM and two SSA device cards, MAP 3077 187 isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card, MAP 3095 204 isolating an SSA link error between two DDMs in separate DDM bays, MAP 3100 212 isolating an SSA link error between two DDMs, MAP 3000 176 isolating an SSA link error two SSA device cards connected through a DDM bay, MAP 3085 197 isolating an unexpected result, MAP 3605 290 isolating an unexpected SSA SRN, MAP 3125 228 isolating an unexpected SSA test results, MAP 3126 228 isolating an unknown DDM failure, MAP 3128 229 isolating between DDM hardware and microcode failures, MAP 3124 227 isolating e-mail notification problems, MAP 1310 58 isolating memory related error codes, MAP 4160 355 isolating multiple DDMs on an SSA loop cannot be accessed, MAP 3142 231 isolating power symptoms, MAP 2020 112 isolating SNMP notification problems, MAP 1305 56 isolating too few DDMs in DDM bay, MAP 3220 239 isolation bay held reset condition 339 cluster fails to power off 449 cluster powered off unexpectedly 431 cluster SP, SPCN, or system firmware reload 431 DDMs of same capacity but different rpms on the same SSA loop 298 entry for MAPs 41 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 4xxx: 
cluster MAPs 45 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 link fault Isolation, ESCON or ficon 548 MAP 1200, prioritizing visual symptoms and problems for repair 50
isolation (continued) MAP 1210, display and repair a problem 51 MAP 1300, cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1460, E-mail reported errors 66 MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet consoles personal computer 69 MAP 1604, restoring the personal computers software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for Xseries 206 PCs 111 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146
isolation (continued) MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154 MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be powered off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error between two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link error 220 MAP 3121,
isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227
isolation (continued) MAP 3125, isolating an unexpected SSA SRN 228 MAP 3126, isolating unexpected SSA test results 228 MAP 3127, formatting of a DDM has not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230 MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3149, repairing single or multiple DDM failures 232 MAP 3180, controller card failure 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a
DDM bay location error 279
isolation (continued) MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520, DDM bay verification for possible problems 284 MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285 MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344
isolation (continued) MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a LIC process read/display problem 351 MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403
isolation (continued) MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417 MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4990, LIC feature license failure 476 MAP 49A0, failure detected during Background Certify and Build Logical Configuration from ISA 477 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & NCCL) 482 MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic 
LIC activation failure, cluster 1 phase 100 (CCL) 488
isolation (continued) MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506 MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150, (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 MAP 5305, bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556 MAP 5330, display ESCON and fibre node 
descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564
isolation (continued) MAP 5440, fibre host card reports a loss of light 566 MAP 6060, service terminal login failed to one cluster 567 MAPs 41 MAPs 1XXX, general isolation procedures 50 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 pinned data, special case 428 problem isolation using visual symptoms 60 replacing DDMs called out by enhanced PFA 233 RPC to RPC communication fault 359 SSA DASD DDM bay power problem 234 using the DDM bay maintenance analysis procedures (MAPs) 176 using the SSA DASD maintenance analysis procedures (MAPs) 176 isolation using codes displayed by the CEC drawer operator panel, MAP 4360 371
J
Japanese Voluntary Control Council for Interference (VCCI) class A statement xix
K
Korean Government Ministry of Communication (MOC) statement xix
L
LIC feature license failure 476 LIC feature license failure, MAP 4990 476 loss of redundant input power to CEC, I/O or host bay drawers, MAP 4170 357
M
manually configuring the video/graphics adapter for the master console, MAP 1608 86 manuals, related xxv MAP entry table, MAP 3xxx: SSA DASD DDM bay MAPs 43 entry table, MAP 4xxx: cluster MAPs 45 entry table, MAP 5xxx: host interface MAPs 48 entry table, MAP 6xxx: service terminal MAPs 49 MAP 1200, prioritizing visual symptoms and problems for repair 50 MAP 1210, display and repair a problem 51
MAP 1300, cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1320, problem isolation using visual symptoms 60 MAP 1460, E-mail reported errors 66 MAP 1460, E-mail reported errors, MAP 1460 66 MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet consoles personal computer 69 MAP 1604, restoring the personal computers software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for Xseries 206 PCs 111 MAP 1XXX, general isolation procedures 50 MAP 1xxx: general MAPs, entry table 41 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 
23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146
MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154 MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be powered off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 2xxx: power and cooling MAPs, entry table 42 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error between two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link
error 220 MAP 3121, isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227 MAP 3125, isolating an unexpected SSA SRN 228
MAP 3126, isolating unexpected SSA test results 228 MAP 3127, formatting of a DDM has not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230 MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3180, controller card failure 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a DDM bay location error 279 MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520,
DDM bay verification for possible problems 284 MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285
MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 3xxx: SSA DASD DDM bay MAPs, entry table 43 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344 MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a 
LIC process read/display problem 351
MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4120, handling unexpected resources, MAP 4120 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403 MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 
4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417
MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 47A0, cluster fails to power off 449 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & NCCL) 482 MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic LIC activation failure, cluster 1 phase 100 (CCL) 488 MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, 
phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506
MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150, (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 4xxx: cluster MAPs, entry table 45 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 MAP 5305, ESCON or fibre bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556 MAP 5330, display ESCON and fibre node descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564 MAP 5440, fibre host card reports a loss of light 566 MAP 5xxx: host interface MAPs, entry table 48 MAP 6060, service terminal login failed to one cluster 567 MAP 6xxx: service terminal MAPs, entry table 49 MAPs 41 entry for problem isolation 41 entry table, MAP 1xxx: general MAPs 41 entry table, MAP 2xxx: power and cooling MAPs 42 MAP 1200, prioritizing visual symptoms and problems for repair 50 MAP 1210, display and repair a problem 51 MAP 1300, 
cluster to modem communication problem 52 MAP 1301, call home / remote services failure 55 MAP 1305, isolating SNMP notification problems 56 MAP 1310, isolating e-mail notification problems 58 MAP 1320, problem isolation using visual symptoms 60 MAP 1460, E-mail reported errors 66
MAPs (continued) MAP 1480, replacing a FRU without using a problem 66 MAP 1500, end service action 67 MAP 1600, ESSNet console problem 68 MAP 1602, repairing the ESSNet consoles personal computer 69 MAP 1604, restoring the personal computers software 69 MAP 1605, master console product recovery wizard 73 MAP 1606, converting the personal computer to an ESSNet console 76 MAP 1607, changing network configuration for ESS and master console 85 MAP 1608, manually configuring the video/graphics adapter for the master console 86 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 MAP 1610, connecting the modem and modem expander for remote support 88 MAP 1620, attaching the ESSNet to a customer network 107 MAP 1630, master console product recovery wizard for Xseries 206 PCs 111 MAP 2000, model 100 attachment rack reported 112 MAP 2020, isolating power symptoms 112 MAP 2030, CEC, I/O, or host bay drawer overcurrent 113 MAP 2031, repair ground continuity 114 MAP 20A0, cluster not ready 117 MAP 2210, host bay drawer power supply problem 119 MAP 2220, input power to CEC, I/O, host bay drawer power supplies not detected 120 MAP 2230, CEC, I/O, or host bay drawer power fault 122 MAP 2320, installed unit or feature mismatch 124 MAP 2340, PPS status code 06 125 MAP 2350, PPS status indicator codes 127 MAP 2360, 2105 Model 800 (rack 1) UEPO problem 131 MAP 2365, UEPO loop problem 133 MAP 2370, rack 1 power on problem, automatic mode 136 MAP 2380, 2105 Expansion Enclosure (rack 2) UEPO problem 138 MAP 2390, rack 1 power on problem, remote mode 140 MAP 23B0, 2105 expansion enclosure (rack 2) power off problem 144 MAP 23C0, power event threshold exceeded 146 MAP 23D0, RPC-2 card reporting PPS battery set present 147 MAP 23E0, cluster powered off unexpectedly 149 MAP 2400, 2105 Model 800 local power on problems 149 MAP 2410, RPC power mode switch mismatch 153 MAP 2420, 2105 expansion enclosure power on problem 154
MAPs (continued) MAP 2430, one RPC card firmware down level 157 MAP 2440, rack 1 power off problem 157 MAP 2450, crossed RPC cables to expansion rack 160 MAP 2460, battery set charge low 162 MAP 2470, battery set detection problem 162 MAP 2490, PPS input phase missing 164 MAP 24A0, PPS power on problem 165 MAP 24B0, 2105 cannot be powered off, pinned data 167 MAP 24F0, both RPC cards firmware down level 168 MAP 2520, PPS output circuit breaker tripped 168 MAP 2600, RPC card cannot reset a power fault 169 MAP 2700, CEC drawer power on problem 170 MAP 2800, CEC or I/O drawer visual power supply problem 171 MAP 2810, host bay drawer visual power supply problem 174 MAP 3000, isolating an SSA link error between two DDMs 176 MAP 3010, isolating a degraded SSA link between two DDMs 178 MAP 3050, isolating an SSA link error between a DDM and an SSA device card 179 MAP 3060, isolating a degraded SSA link between a DDM and an SSA device card 184 MAP 3077, isolating an SSA link error between a DDM and two SSA device cards 187 MAP 3078, isolating a degraded SSA link between a DDM and two SSA device cards 193 MAP 3085, isolating an SSA link error between two SSA device cards connected through a DDM bay 197 MAP 3086, isolating a degraded SSA link between two SSA device cards connected through a DDM bay 201 MAP 3095, isolating an SSA link error between two DDMs in separate DDM bays and an SSA device card 204 MAP 3096, isolating a degraded SSA link between two DDMs in separate DDM bays and an SSA device card 209 MAP 3100, isolating an SSA link error between two DDMs in separate DDM bays 212 MAP 3101, isolating a degraded SSA link between two DDMs in separate DDM bays 217 MAP 3120, isolating an SSA link error 220 MAP 3121, isolating a degraded SSA link 223 MAP 3123, array repair required 226 MAP 3124, isolating between DDM hardware and microcode failures 227 MAP 3125, isolating an unexpected SSA SRN 228 MAP 3126, isolating unexpected SSA test results 228 MAP 3127, formatting of a DDM has
not completed 229 MAP 3128, isolating an unknown DDM failure 229 MAP 3129, isolating an array repair required failure 230
MAPs (continued) MAP 3131, attempt to format array member 231 MAP 3142, isolating multiple DDMs on an SSA loop cannot be accessed 231 MAP 3149, repairing single or multiple DDM failures 232 MAP 3152, replacing DDMs called out by enhanced PFA 233 MAP 3160, SSA DASD DDM bay single DDM power problem 234 MAP 3180, controller card faile 235 MAP 3190, wrong drawer type error 236 MAP 3200, uninstalled SSA DDMs connected to loop A 237 MAP 3210, uninstalled SSA DDMs connected to loop B 238 MAP 3220, isolating too few DDMs in DDM bay 239 MAP 3300, repair alternate cluster to run SSA loop test 240 MAP 3360, end a DASD service action 241 MAP 3375, isolating a storage cage fan/power sense card error 242 MAP 3378, isolating a storage cage fan/power sense card error 245 MAP 3379, analyzing a storage cage fan/power sense card check summary indicator on 246 MAP 3381, isolating a storage cage fan/power sense card error 247 MAP 3384, isolating a storage cage fan failure 248 MAP 3387, isolating a storage cage power supply failure 251 MAP 3391, isolating a storage cage power supply problem 255 MAP 3395, isolating a DDM bay power problem 261 MAP 3397, isolating an SSA DASD DDM bay controller card problem 263 MAP 3398, isolating a DDM bay controller card communications problem 264 MAP 3400, replacing a DDM bay frame replacement 266 MAP 3421, storage cage fan/power sense card R2 cable problem 266 MAP 3422, storage cage fan/power sense card R2 jumper and cable problems 268 MAP 3423, isolating a storage cage fan/power sense card R1 jumper missing error 270 MAP 3424, storage cage fan/power sense card R1 jumper failing error 272 MAP 3425, storage cage fan/power sense card R2 cable error 273 MAP 3426, isolating a storage cage fan/power sense card location error 275 MAP 3427, isolating a storage and DDM bay location error 277 MAP 3428, isolating a DDM bay location error 279 MAP 3429, isolating a DDM location problem 282 MAP 3500, verify a DDM bay repair 283 MAP 3520, DDM bay verification for 
possible problems 284
MAPs (continued) MAP 3530, SSA devices certify test failure 284 MAP 3540, web initiated format incomplete 285 MAP 3550, incomplete or failed format process 286 MAP 3560, unrelated occurrence, retry verification test 287 MAP 3570, unrelated event caused resume failure 288 MAP 3580, DDM, or DDMs, found in formatting state during IML 288 MAP 3600, multiple DDMs isolated on an SSA loop 289 MAP 3605, isolating an unexpected result 290 MAP 3610, DDM installation with new rank site capacity 290 MAP 3612, DDM installation with mixed capacity rank site 293 MAP 3614, DDM installation introduces different RPM 296 MAP 3615, DDMs of same capacity but different rpms on the same SSA loop 298 MAP 3617, DDM size is not supported 298 MAP 3618, replacement DDM has slower RPM than called for 299 MAP 3619, this repair requires a larger capacity DDM 301 MAP 3621, new DDM storage capacity smaller than original DDMs 301 MAP 3625, all DDMs on SSA loop A do not have the same characteristics 302 MAP 3626, all DDMs on loop B do not have the same characteristics 303 MAP 3627, unable to determine DDM use 304 MAP 3640, other cluster fenced - unable to verify SSA loop 305 MAP 3650, wrong, missing, or failing bypass card 307 MAP 3652, wrong, missing, or failing passthrough card 309 MAP 3654, bypass card jumpers wrong 311 MAP 3656, 20 mb where 40 mb SSA cable expected 312 MAP 3680, isolating a two DDMs detected over temperature problem 313 MAP 3685, isolating a Multiple DDM detect over temperature problem 316 MAP 4010, cluster hang during failback or error recovery 319 MAP 4020, hard disk drive build process for both drives 320 MAP 4025, hard drive build process for automatic LIC 324 MAP 4040, entry MAP for CPI problems 326 MAP 4055, bay held reset condition 339 MAP 4060, replacing I/O drawer FRUs for CPI problems 341 MAP 4070, replacement of host bay FRUs for CPI problems 343 MAP 4090, CPI address mismatch 343 MAP 40A0, fence network isolation 344
MAPs (continued) MAP 40B0, special cluster problem determination using slow boot mode 346 MAP 40C0, special SCSI bus problem 347 MAP 40D0, special SRN problems 348 MAP 40E0, only one I/O drawer power supply detected 349 MAP 4100, isolating a LIC process read/display problem 351 MAP 4110, host bay drawer fan reporting failure 351 MAP 4120, handling unexpected resources 352 MAP 4130, handling a missing or failing resource 353 MAP 4140, isolating a LIC activation process failure 354 MAP 4150, PPS to RPC interface failure 355 MAP 4160, isolating memory related error codes 355 MAP 4170, loss of redundant input power to CEC, I/O or host bay drawers 357 MAP 4180, RPC to RPC communication fault 359 MAP 4190, RPC to host bay drawer power communication failure 360 MAP 41A0, RPC card host bay drawer fan reporting failure 361 MAP 41B0, CPI interface NVS/IOA card to host bay failure 361 MAP 41C0, ESC 2770 or 2771, missing CPI detected 362 MAP 41D0, CPI problem or host bay slot failure 364 MAP 41E0, CPI failure needing CPI cable as FRU 365 MAP 41F0, a temporary CPI error was detected 365 MAP 4200, extended cluster IML time from NVS battery charging 366 MAP 4240, isolating a blinking 888 error on the CEC drawer operator panel 367 MAP 4350, cluster code load counter = 2 370 MAP 4360, isolation using codes displayed by the CEC drawer operator panel 371 MAP 4370, error displaying problems needing repair 375 MAP 4380, isolating a customer LAN connection problem 376 MAP 4390, isolating a cluster to cluster ethernet problem 377 MAP 43A0, bootlist management using SMS 387 MAP 43A5, bootlist management using SMS for automatic LIC 392 MAP 43B0, cluster dual hard drive ESC 1xxx 398 MAP 43C0, cluster IML from second hard disk drive 400 MAP 43D0, duplicate TCP/IP address detected for this cluster 401 MAP 43E0, service processor reset 401 MAP 4400, displaying cluster SMS error logs 402 MAP 4410, cluster to cluster ethernet communication test 403
MAPs (continued) MAP 4420, display cluster ethernet network address 405 MAP 4440, ESSNet1 or master console to cluster ethernet problem 405 MAP 4450, ESS cluster to customer network problem 407 MAP 4460, cluster NVS problem 410 MAP 4470, ESC 2768, NVS/IOA card problem 411 MAP 4480, cluster to RPC cards communication problem 411 MAP 4510, isolating a cluster to cluster CPI communication failure 415 MAP 4520, pinned data and/or volume status unknown 417 MAP 4540, cluster minimum configuration 418 MAP 4550, NVS FRU replacement 426 MAP 4560: no valid subsystem status available 427 MAP 45A0: pinned data, special case 428 MAP 4600, CD-ROM test failure 429 MAP 4610, cluster SP, SPCN, or system firmware down-level 430 MAP 4620, isolating a diskette drive failure 430 MAP 4640, cluster SP, SPCN, or system firmware reload 431 MAP 4670, cluster powered off unexpectedly 431 MAP 4700, cluster FRU replacement (CEC and I/O drawers) 432 MAP 4710, isolating a DDM LIC update problem 442 MAP 4720, host bay fails to power off 443 MAP 4730, cluster power off request problem 446 MAP 4760, recovering from corrupted files or functions 446 MAP 4780, isolating a functional code not running problem 447 MAP 47A0, cluster fails to power off 449 MAP 4810, unexpected host bay power off 452 MAP 4820, isolating a SCSI card configuration timeout 456 MAP 4840, CPI diagnostic communication problem 457 MAP 4850, repair the host bay drawer 458 MAP 4870, host bay power on problem 459 MAP 4880, cluster power on problem 461 MAP 4885, SPCN Load Fault Firmware Error Code 468 MAP 4890, replacing a CEC or I/O drawer power supply 471 MAP 4960, ESC 5500 isolation 471 MAP 4970, isolating a software problem 472 MAP 4980, customer copy services problem 474 MAP 4990, LIC feature license failure 476 MAP 49A0, failure detected during Background Certify and Build Logical Configuration from ISA 477 MAP 4A00, isolating an automatic LIC activation failure 482 MAP 4A10, automatic LIC activation problem, phase 000 (CCL & 
NCCL) 482
MAPs (continued) MAP 4A20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 485 MAP 4A30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 486 MAP 4A40, automatic LIC activation failure, cluster 1 phase 100 (CCL) 488 MAP 4A50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 491 MAP 4A60, automatic LIC activation problem, cluster 1, phase 150 (CCL) 493 MAP 4A70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 495 MAP 4A80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 497 MAP 4A90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 499 MAP 4AA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 501 MAP 4AB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 503 MAP 4AE0, automatic LIC activation cluster problem, phase 400 (CCL & NCCL) 504 MAP 4B10, automatic LIC activation problem, phase 000 (CCL & NCCL) 506 MAP 4B20, automatic LIC activation problem, cluster 1, phase 100 (NCCL) 509 MAP 4B30, automatic LIC activation problem, cluster 2, phase 100 (NCCL) 511 MAP 4B40, automatic LIC activation problem, cluster 1, phase 100 (CCL) 514 MAP 4B50, automatic LIC activation problem, cluster 2, phase 100 (CCL) 517 MAP 4B60, automatic LIC activation problem, cluster 1, phase 150, (CCL) 520 MAP 4B70, automatic LIC activation problem, cluster 2, phase 150 (CCL) 523 MAP 4B80, automatic LIC activation problem, cluster 1, phase 200 (CCL) 526 MAP 4B90, automatic LIC activation problem, cluster 2, phase 200 (CCL) 529 MAP 4BA0, automatic LIC activation problem, cluster 1, phase 150 (NCCL) 532 MAP 4BB0, automatic LIC activation problem, cluster 2, phase 150 (NCCL) 534 MAP 4BE0, automatic LIC activation problem, phase 400 (CCL & NCCL) 537 MAP 5000, ESS Specialist cannot access cluster 540 MAP 5220, isolating a SCSI bus error 541 MAP 5230, isolating a fixed block read data failure 543 MAP 5240, isolating a customer data check failure 544 MAP 5250, isolating a meta data check failure 547 
MAP 5300: link fault isolation 548 MAP 5305, bit error rate test failure 550 MAP 5310, ESCON bit error validation 551 MAP 5320, ESCON optical power measurement 552 MAP 5321, fibre optical power measurement 556
MAPs (continued) MAP 5330, display ESCON and fibre node descriptors 560 MAP 5340, CKD read data failure 561 MAP 5400, Fibre channel link fault 562 MAP 5410, fibre channel bit error validation 563 MAP 5430, host fibre channel fails to recognize ESS LUNs 564 MAP 5440, fibre host card reports a loss of light 566 MAP 6060, service terminal login failed to one cluster 567 MAPs 1XXX, general isolation procedures 50 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 problem isolation 41 using the DDM bay maintenance analysis procedures (MAPs) 176 using the SSA DASD maintenance analysis procedures (MAPs) 176 MAPs 2XXX, power and cooling isolation procedures 112 MAPs 3XXX, SSA DASD DDM bay isolation procedures 176 MAPs 4XXX, cluster isolation procedures 319 MAPs 5XXX, host interface isolation procedures 540 MAPs 6XXX, service terminal isolation procedures 567 master console 8 information 7 MAP 1609, power off and reboot procedure for the TotalStorage ESS master console 87 replaces ESSNet console 8 master console information 7 master console product recovery wizard for Xseries 206 PCs, MAP 1630 111 master console product recovery wizard, MAP 1605 73 media maintenance customer media maintenance examples 38 media SIM maintenance procedures 37 media SIM maintenance procedures 37 start 37 MOC (see Korean Government Ministry of Communication) xix model 100 attachment rack reported, MAP 2000 112 multiple DDMs isolated on an SSA loop, MAP 3600 289
notices, electronic emission xviii NVS FRU replacement, MAP 4550 426
O
one RPC card firmware down level, MAP 2430 157 only one I/O drawer power supply detected, MAP 40E0 349 ordering publications xxv other cluster fenced - unable to verify SSA loop, MAP 3640 305
P
patent licenses xvii pinned data MAP 45A0: pinned data, special case 428 pinned data and/or volume status unknown, MAP 4520 417 pinned data, special case 428 power event threshold exceeded, MAP 23C0 146 power off and reboot procedure for the TotalStorage ESS master console, MAP 1609 87 PPS input phase missing, MAP 2490 164 PPS output circuit breaker tripped, MAP 2520 168 PPS power on problem, MAP 24A0 165 PPS status code 06, MAP 2340 125 PPS status indicator codes, MAP 2350 127 PPS to RPC interface failure, MAP 4150 355 prioritizing visual symptoms and problems for repair, MAP 1200 50 problem isolation procedures (MAPs) 41 problem isolation using visual symptoms, MAP 1320 60 products xvii programs xvii publications, ordering xxv
R
rack 1 power off problem, MAP 2440 157 rack 1 power on problem, automatic mode, MAP 2370 136 rack 1 power on problem, remote mode, MAP 2390 140 radio-frequency energy compliance statement xviii RAID information 4 RAID-10 information 4 RAID-5 information 4 recovering from corrupted files or functions, MAP 4760 446 redundant array of independent disks (RAID) information 4 refcode decode a refcode 36 generating a refcode from sense bytes 37 reference information 1 related manuals xxv remote service support information 13 remote service support information 13
N
new DDM storage capacity smaller than original DDMs, MAP 3621 301 notices laser safety xvii safety xvii
repair using a SIM console message 33 using an EREP report 34 repair alternate cluster to run SSA loop test, MAP 3300 240 repair ground continuity, MAP 2031 114 repair the host bay drawer, MAP 4850 458 repair using a SIM console message 33 start 33 repair using an EREP report 34 start 34 repairing single or multiple DDM failures, MAP 3149 232 repairing the ESSNet console's personal computer, MAP 1602 69 replacement DDM has slower RPM than called for, MAP 3618 299 replacement of host bay FRUs for CPI problems, MAP 4070 343 replacing a CEC or I/O drawer power supply, MAP 4890 471 replacing a DDM bay frame replacement, MAP 3400 266 replacing a FRU without using a problem, MAP 1480 66 replacing I/O drawer FRUs for CPI problems, MAP 4060 341 reports EREP reports 34 event history report 35 system exception reports 34 restoring the personal computer's software, MAP 1604 69 RPC card cannot reset a power fault, MAP 2600 169 RPC card host bay drawer fan reporting failure, MAP 41A0 361 RPC local and automatic switch settings information 20 RPC local and automatic switch settings information 20 RPC local and remote switch settings information 20 RPC local and remote switch settings information 20 RPC power mode switch mismatch, MAP 2410 153 RPC to host bay drawer power communication failure, MAP 4190 360 RPC to RPC communication fault 359 RPC-2 card reporting PPS battery set present, MAP 23D0 147
S
safety notices attention xvii caution xvii danger xvii laser xvii notices xvii translations of xvii SCSI information 5
SCSI host system information 5 service actions analyze and repair a service request 29 change communications configuration 29 ESSNet console 29 Information 29 install 29 licensed internal code (microcode EC) 29 logical configuration / ESS specialist 29 remove 29 service terminal 29 start 29 system/390 repair 29 test a machine function 29 service interface information 13 service interface information 13 service processor reset, MAP 43E0 401 service terminal login failed to one cluster, MAP 6060 567 services xvii sim generation and usage 33 SIM customer receives sense data without a SIM 34 repair using a SIM console message 33 sim generation and usage 33 start 33 isolating an SSA link error, MAP 3120 220 SPCN Load Fault Firmware Error Code, MAP 4885 468 special case, pinned data 428 special cluster problem determination using slow boot mode, MAP 40B0 346 special SCSI bus problem, MAP 40C0 347 special SRN problems, MAP 40D0 348 special tools 28 SSA DDM bay external SSA connections 24 DDM bay external SSA connections, five DDM bays 27 DDM bay external SSA connections, four DDM bays 27 DDM bay external SSA connections, one DDM bay 25 DDM bay external SSA connections, six DDM bays 27 DDM bay external SSA connections, three DDM bays 26 DDM bay external SSA connections, two DDM bays 26 DDM bay internal SSA connections 24 DDM bay internal SSA connections, two DDM bays 25 DDM bay SSA connections 25 SSA DASD DDM bay replacing DDMs called out by enhanced PFA 233 single DDM power problem 234 SSA DASD, maintenance analysis procedures (MAPs) 176 SSA devices certify test failure, MAP 3530 284
start analyze and repair a service request 29 change communications configuration 29 customer media maintenance examples 38 customer receives sense data without a SIM 34 decode a refcode 36 entry table for all service actions 29 EREP reports 34 ESSNet console 29 event history report 35 generating a refcode from sense bytes 37 Information 29 install 29 licensed internal code (microcode EC) 29 logical configuration / ESS specialist 29 media SIM maintenance procedures 37 remove 29 repair using a SIM console message 33 repair using an EREP report 34 service actions 29 service terminal 29 sim generation and usage 33 system exception reports 34 system/390 repair 29 test a machine function 29 start all service actions 29 start service actions 29 statement of compliance European Community Compliance xviii Federal Communications Commission xviii Industry Canada Compliance xviii Japanese Voluntary Control Council for Interference (VCCI) xix Korean Government Ministry of Communication (MOC) xix Taiwan xx statement of EMI Chinese xx storage cage fan/power sense card R1 jumper failing error, MAP 3424 272 storage cage fan/power sense card R2 cable error, MAP 3425 273 storage cage fan/power sense card R2 cable problem, MAP 3421 266 storage cage fan/power sense card R2 jumper and cable problems, MAP 3422 268 switching ESS power off (automatic mode) information 21 switching ESS power off (automatic mode) information 21 switching ESS power off (local mode) information 20 switching ESS power off (local mode) information 20 switching ESS power off (remote mode) information 21 switching ESS power off (remote mode) information 21 switching ESS power on and off (all modes) information 19 switching ESS power on and off (all modes) information 19
switching ESS power on and off (automatic mode) information 19 switching ESS power on and off (automatic mode) information 19 switching ESS power on and off (local mode) information 19 switching ESS power on and off (local mode) information 19 switching ESS power on and off (remote mode) information 20 switching ESS power on and off (remote mode) information 20 system exception reports 34 start 34
T
Taiwan compliance statement xx testing modem communications 94 this repair requires a larger capacity DDM, MAP 3619 301 topics, information 1 TotalStorage expert information 10 TotalStorage expert information 10 trademarks xx
U
UEPO loop problem, MAP 2365 133 unable to determine DDM use, MAP 3627 304 unexpected host bay power off, MAP 4810 452 uninstalled SSA DDMs connected to loop A, MAP 3200 237 uninstalled SSA DDMs connected to loop B, MAP 3210 238 unrelated event caused resume failure, MAP 3570 288 unrelated occurrence, retry verification test, MAP 3560 287 using the DDM bay maintenance analysis procedures (MAPs) 176 using the ESS operator panel information 17 using the ESS operator panel information 17 using the SSA DASD maintenance analysis procedures (MAPs) 176
V
VCCI (see Japanese Voluntary Control Council for Interference) xix verify a DDM bay repair, MAP 3500 283
W
web initiated format incomplete, MAP 3540 285 where to start all service actions 29 wrong drawer type error, MAP 3190 236 wrong, missing, or failing bypass card, MAP 3650 307
Overall satisfaction
How satisfied are you that the information in this book is (Very Satisfied, Satisfied, Neutral, Dissatisfied, or Very Dissatisfied):
Accurate
Complete
Easy to find
Easy to understand
Well organized
Applicable to your tasks
When you send comments to IBM, you grant IBM a nonexclusive right to use or distribute your comments in any way it believes appropriate without incurring any obligation to you.
Address
IBM Information Development Department 61C 9032 South Rita Road Tucson, Arizona U.S.A. 85775-4401
SY27-7635-05