
OptiX RTN 910/950

Troubleshooting
www.huawei.com

Copyright 2009 Huawei Technologies Co., Ltd. All rights reserved.

Objectives

Upon completion of this course, you will be able to:

Describe the troubleshooting flow of OptiX RTN 910/950

Explain the alarms and outline their causes

Perform the troubleshooting for OptiX RTN 910/950


Contents
1. Fault Handling Flow Introduction
2. Methods of Analyzing and Locating Faults
3. Classified Troubleshooting Analysis

General Fault Handling Flow

1. Start: observe and record the fault phenomenon.
2. External cause? If yes, follow the other handling flow. If no, continue.
3. Find the cause and locate the fault.
4. Rectify the fault. If the fault cannot be rectified, contact Huawei for technical support, find a solution together, and rectify the fault.
5. Write the fault handling report.
6. End.

Emergency Handling Flow

1. Start.
2. Wrong operation? If yes, perform the reverse operation to restore the service.
3. If no: equipment alarm? If yes, reset, re-insert, or replace the board.
4. If no: signal loss alarm? If yes, perform a loopback on the opposite port and, depending on the result, reset or replace the board of the opposite NE or handle the anomaly of the interconnected equipment. If no, go to continuation 1.
5. After a handling action, go to continuation 2.

Emergency Handling Flow (Cont.)

Continuation 1:
1. Line alarm? If no, go to continuation 3.
2. If yes: line alarm on the adjacent NE? If yes, handle the fiber cut, board fault, or power supply problem.
3. If no: protection switching configured? If yes, reset the faulty board or the protection protocol, then check and use the standby route. If no, change the service route or use the standby route.
4. After a handling action, go to continuation 4.

Continuation 2: go to continuation 4.

Emergency Handling Flow (Cont.)

Continuation 3:
1. Any loopback? If yes, change the port loopback configuration.
2. If no: service configuration error? If yes, change the service configuration. If no, contact Huawei for technical support.

Continuation 4:
1. Fault rectified? If yes, end. If no, contact Huawei for technical support.

2. Methods of Analyzing and Locating Faults

Basic Principles of Fault Locating

1. External first, then internal
2. Site first, then board
3. High-severity alarms first, then low-severity alarms

Common Methods of Fault Locating

- Alarm analysis
- Loopback
- Replacement
- Test with instrument
- Resetting
- Eth./MPLS testing and RMON monitoring

Analyze first, then loop back, and finally replace the board.

Alarm Analysis

Use the NMS
- Comprehensive: all alarms and performance events from the whole network
- Accurate: current alarms, history alarms, occurrence time, and performance event data can be queried

Observe the indicators on the boards
- No alarm details and no history alarms

Note: Besides the alarms, in the IP radio system it is also important and useful to query the transmit and receive power.

Common Alarm Description

Alarm type: Equipment alarms

- POWER_FAIL: The power supply is in an abnormal state.
- FAN_FAIL: The fan is faulty.
- BDSTATUS: The board is off-position.
- NO_BD_SOFT: The board has no software.
- HARD_BAD: The board hardware is faulty.
- SYN_BAD: The clock synchronization source is degraded.
- NESTATE_INSTALL: The NE is in the installation state.

Common Alarm Description (Cont.)

Alarm type: Microwave link alarms

- MW_LOF: Loss of microwave Reed-Solomon frames.
- MW_FECUNCOR: FEC cannot correct the bit errors in microwave frames.
- CONFIG_NOSUPPORT: A wrong parameter is set on the ODU.
- RADIO_RSL_LOW / RADIO_RSL_HIGH: The ODU receive power is too low / too high.
- RADIO_MUTE: The ODU is muted.
- SYN_BAD: The clock synchronization source is degraded.

Common Alarm Description (Cont.)

Alarm type: Microwave link alarms

- IF_CABLE_OPEN: The IF cable is not connected.
- MW_LIM: Link ID mismatch in the microwave frame.
- MW_RDI: Microwave remote defect indication.
- RPS_INDI: Radio protection (1+1 backup) switching has occurred.
- LOOP_ALM: The ODU/IF port is looped back.
- TEMP_ALARM: The ODU/IF temperature is abnormal.

Common Alarm Description (Cont.)

Alarm type: Service alarms

- ETH_LOS: The network interface is disconnected.
- ETH_LINK_DOWN: The Ethernet port connection is faulty.
- ALM_IMA_LIF: Received IMA frames are lost.
- CES_MISORDERPKT_EXC: The number of lost out-of-order CES packets exceeds the specified threshold.
- MP_DOWN: The MP group has failed.
- MPLS_TUNNEL_LOCV: Loss of tunnel connectivity verification.
- PW_DOWN: A PW service connection is down.
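As a hedged illustration of alarm analysis, the alarm descriptions above could be kept in a small lookup table so that reported alarms are grouped by type before they are analyzed. The sketch below covers only a subset of the alarms; the names `COMMON_ALARMS` and `group_by_type` are hypothetical and are not part of any Huawei tool or NMS interface.

```python
from collections import defaultdict

# Subset of the alarms described on the slides: name -> (alarm type, indication).
COMMON_ALARMS = {
    "POWER_FAIL": ("Equipment", "The power supply is in an abnormal state."),
    "HARD_BAD": ("Equipment", "The board hardware is faulty."),
    "MW_LOF": ("Microwave link", "Loss of microwave Reed-Solomon frames."),
    "MW_RDI": ("Microwave link", "Microwave remote defect indication."),
    "ETH_LOS": ("Service", "The network interface is disconnected."),
    "PW_DOWN": ("Service", "A PW service connection is down."),
}


def group_by_type(alarm_names):
    """Group reported alarm names by alarm type.

    Names that are not in COMMON_ALARMS are collected under "Unknown".
    """
    groups = defaultdict(list)
    for name in alarm_names:
        alarm_type, _indication = COMMON_ALARMS.get(name, ("Unknown", ""))
        groups[alarm_type].append(name)
    return dict(groups)


if __name__ == "__main__":
    reported = ["MW_LOF", "PW_DOWN", "HARD_BAD"]
    for alarm_type, names in group_by_type(reported).items():
        print(alarm_type, names)
```

Grouping by type first makes it easier to start with equipment and link alarms before looking at the service alarms they may have caused.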

Case Analysis

[Figure: network of NE1, NE2, and NE3. MW_LOF is reported on NE1; MW_RDI and RPS_INDI are reported on NE2.]

Description

NE1 and NE2 are in a 1+1 HSB configuration.
An MW_LOF alarm was reported on NE1.
MW_RDI and RPS_INDI alarms were reported on NE2.

Loopback

- It is useful for checking physical layer availability, for example when signal loss or loss-of-frame alarms are reported.
- Do not use it on NNI ports or E-LAN services.
- It interrupts the traffic and the inband DCN, so it must be used carefully.

[Figure: loopback points on the RTN 910/950: inloop and outloop on the Ethernet ports, on the ODU/IF, and on the CES E1 and cSTM-1 ports.]

Replacement

- If a component is suspected to be faulty, replace it to locate the fault.
- Use a component that works normally to replace the component that is probably faulty, so that the fault can be located and rectified.
- The replaceable components include the equipment, boards, and cables.

Test with Instrument

This method is the most authoritative, but the instruments must be on hand.

Instrument: Test item
- Bit error testing device: Bit error/traffic
- Optical power meter: Optical power
- SDH analyzer: Bit error/traffic/overhead bytes
- SmartBits: Ethernet service

Resetting

Resetting is a restoration scheme for application programs and data configurations. When a component is not running properly, it returns to the normal state after being reset.

- Resetting boards
- Resetting equipment by powering it off and on
- Resending the configuration

Reset modes:
- Warm reset loads the correct programs and data on the equipment
- Cold reset restores the correct programs and data before the CPU power failure

OAM Testing for Ethernet

- Performing the LB Test
- Performing the LT Test
- Performing the CC Test

OAM Testing for MPLS Tunnel

- Performing the LSP Ping Test
- Performing the LSP Traceroute Test


Contents
3. Classified Troubleshooting and Locations
3.1 Microwave link Troubleshooting
3.2 CES Service Troubleshooting
3.3 Ethernet Service Troubleshooting
3.4 IMA Troubleshooting
3.5 LAG Troubleshooting
3.6 ML-PPP Troubleshooting
3.7 MPLS APS Troubleshooting
3.8 QoS Troubleshooting


Microwave Link Troubleshooting

Impact
- Service down, bit errors, or packet loss
- Microwave link protection switching

Cause
- Abnormal power caused by a wrong setting, fading, interference, or hardware failure

Symptom
- Alarms such as MW_LOF and MW_FECUNCOR are reported on the IF or ODU units

Pay attention to these alarms:
HARD_BAD, TEMP_ALARM, IF_INPWR_ABN, RADIO_MUTE, RADIO_TSL_HIGH, RADIO_TSL_LOW, RADIO_RSL_HIGH, IF_CABLE_OPEN

Microwave Link Troubleshooting (Cont.)

Fault in microwave link: The transmit power is abnormal.
Common fault causes:
- The ODU is faulty, or the frequency or power is set incorrectly.

Fault in microwave link: The receive power is permanently lower than the ideal value.
Common fault causes:
- The antenna direction is not properly adjusted, or the antenna has been moved.
- The antennas have different polarization directions, either since installation or after the ODU was changed.
- There is an obstacle in the transmit direction.
- The connection between the antenna and the ODU is abnormal (loose).
- The ODU is faulty, or the transmit power of the opposite ODU is abnormal.

Microwave Link Troubleshooting (Cont.)

Fault in microwave link: The receive power is abnormal due to slow down-fading.
Common fault cause: The fading margin is not sufficient.

Fault in microwave link: The receive power is abnormal due to fast fading.
Common fault cause: The multipath fading is fast.

Fault in microwave link: The power is normal, but the service is down with an MW_LIM alarm.
Common fault cause: The link IDs on the two sides of one hop are not consistent.

Fault in microwave link: The receive power is always normal, but the microwave link becomes faulty occasionally.
Common fault cause: There is external interference.
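To make "the fading margin is not sufficient" concrete: the fade margin of a hop is commonly computed as the unfaded receive signal level minus the receiver threshold, both in dBm. The sketch below is only an illustration; the function names and all numeric values are hypothetical and are not taken from the slides or from any RTN 910/950 specification.

```python
def fade_margin_db(rsl_dbm: float, rx_threshold_dbm: float) -> float:
    """Fade margin in dB: unfaded receive level minus receiver threshold."""
    return rsl_dbm - rx_threshold_dbm


def margin_is_sufficient(rsl_dbm: float, rx_threshold_dbm: float,
                         required_margin_db: float) -> bool:
    """True if the hop has at least the required fade margin."""
    return fade_margin_db(rsl_dbm, rx_threshold_dbm) >= required_margin_db


if __name__ == "__main__":
    # Hypothetical hop: -45 dBm receive level, -75 dBm receiver threshold.
    print(fade_margin_db(-45.0, -75.0))              # margin in dB
    print(margin_is_sufficient(-45.0, -75.0, 35.0))  # against a 35 dB target
```

If the computed margin is below the planned value, slow down-fading can push the receive power under the threshold even though the hop works under normal conditions.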

Microwave Link Troubleshooting (Cont.)

Fault Locating Methods:

1. Check whether the ODU is muted, powered off, or looped back. Check whether the data configuration is correct.
2. Check whether the ODU and the IF board are faulty.
3. If the transmit power is abnormal, replace the ODU.
4. If the receive power is abnormal, check the possible causes based on the fading type.
5. If the receive power is always normal but the microwave link becomes faulty occasionally, check whether there is interference before you proceed.
6. If the transmit and receive power are normal, perform loopback operations.
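The locating order for a microwave link can be sketched as a simple decision function. This is an illustration only: the `LinkCheck` fields and the `next_step` function are hypothetical names, the field values would come from manual checks or NMS queries, and the wording of the actions for steps 1 and 2 is an assumption because the slides only say what to check.

```python
from dataclasses import dataclass


@dataclass
class LinkCheck:
    """Findings collected for one microwave hop (hypothetical structure)."""
    odu_muted_off_or_looped: bool = False
    config_correct: bool = True
    odu_or_if_board_faulty: bool = False
    tx_power_normal: bool = True
    rx_power_normal: bool = True
    fails_occasionally: bool = False


def next_step(check: LinkCheck) -> str:
    """Return the first locating step that applies, in the order of the slide."""
    if check.odu_muted_off_or_looped or not check.config_correct:
        return "1: correct the ODU state or the data configuration"
    if check.odu_or_if_board_faulty:
        return "2: handle the faulty ODU or IF board"
    if not check.tx_power_normal:
        return "3: replace the ODU"
    if not check.rx_power_normal:
        return "4: check the possible causes based on the fading type"
    if check.fails_occasionally:
        return "5: check whether there is interference"
    return "6: perform loopback operations"


if __name__ == "__main__":
    print(next_step(LinkCheck(rx_power_normal=False)))
```

The function returns only the first matching step, which mirrors the idea of working through the checks in order rather than in parallel.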

CES Service Troubleshooting

Impact
- Bit errors occur or the service is interrupted

Cause
- Hardware failure or client signal loss
- Failure in the PW, tunnel, or radio link

Symptom
- Alarms such as T_ALOS and AIS are reported on the corresponding CES ports

Corresponding boards: CXPAR, ML1, ML1A, CD1

CES Service Troubleshooting (Cont.)

Symptom: The CES service is interrupted.
Alarm reported (board):
- HARD_BAD, TEMP_OVER, or BUS_ERR (CXPAR, ML1, or ML1A)
- COMMUN_FAIL (CXPAR)
- T_ALOS (CXPAR, ML1, or ML1A)
- UP_E1_AIS or DOWN_E1_AIS (ML1 or ML1A)
- MPLS_TUNNEL_LOCV, PW_DOWN (CXPAR)

Symptom: The CES service has bit errors and the communication is degraded.
Alarm reported (board):
- HARD_BAD, TEMP_OVER, or BUS_ERR (CXPAR, ML1, or ML1A)
- SYNC_C_LOS or LTI (CXPAR)
- CES_LOSPKT_EXC, CES_MISORDERPKT_EXC, CES_MALPKT_EXC, CES_STRAYPKT_EXC, CES_JTRUDR_EXC, or CES_JTROVR_EXC (CXPAR, ML1, or ML1A)

CES Service Troubleshooting (Cont.)

Fault Locating Methods

Meter testing
- Optical power meter
- SDH analyzer
- BER tester

Replacing boards
- HARD_BAD
- COMMUN_FAIL
- BUS_ERR

Client side
- Laser
- Cable
- Loop

Other layer
- PWE3
- MPLS Tunnel
- Radio link
- Clock

Ethernet Service Troubleshooting

Impact
- Errored packets, lost packets, or interruption of the service

Cause
- Hardware failure or client signal problems
- Wrong data setting
- Failure in the PW, tunnel, or radio link

Symptom
- Alarms such as ETH_LINK_DOWN are reported on the Ethernet board
- The client side reports faulty or lost packets

Corresponding boards: CXPAR, AUXQ, EF8T, EF8F, EG2

Ethernet Service Troubleshooting (Cont.)

Symptom: The Ethernet service is interrupted.
Alarm reported (board):
- HARD_BAD, TEMP_OVER, or BUS_ERR (CXPAR, EF8T, EF8F, or EG2)
- COMMUN_FAIL (CXPAR)
- ETH_LOS, ETH_LINK_DOWN, ETH_AUTO_LINK_DOWN, LOOP_ALM, or MAC_FCS_EXC (CXPAR, EF8T, EF8F, or EG2)
- LASER_SHUT or LSR_WILL_DIE (EF8F or EG2)

Symptom: The Ethernet service loses packets or has errored packets.
Alarm reported (board):
- HARD_BAD, TEMP_OVER, or BUS_ERR (CXPAR, EF8T, EF8F, or EG2)
- LSR_WILL_DIE (EF8F or EG2)
- MAC_FCS_EXC or FLOW_OVER (CXPAR, EF8T, EF8F, or EG2)

Ethernet Service Troubleshooting (Cont.)

Fault Locating Methods

Meter testing
- Optical power meter
- Ethernet analyzer
- BER tester

Replacing boards
- HARD_BAD
- COMMUN_FAIL
- BUS_ERR

Client side
- Laser, cable
- Negotiation, MTU
- Loop

Other layer
- PWE3
- MPLS Tunnel
- Radio link

IMA Troubleshooting

Symptom: The IMA group is invalid, and the service is interrupted.
Alarm reported: IMA_GROUP_LE_DOWN, IMA_GROUP_RE_DOWN
Board: CXPAR, ML1, or ML1A

Symptom: One IMA group member link is invalid, and the service on the faulty link is shared by other member links. The IMA port is congested, and the packets of the service are lost.
Alarm reported: ALM_IMA_LIF, ALM_IMA_RFI, ALM_IMA_LODS, ALM_IMA_RE_RX_UNUSABLE, ALM_IMA_RE_TX_UNUSABLE
Board: CXPAR, ML1, or ML1A

IMA Troubleshooting (Cont.)

Possible Causes

- The IMA groups are not enabled, or the IMA group members are invalid
- The IMA group negotiation fails
- Wrong interface setting of the IMA member link
- Other layer: PWE3, MPLS Tunnel, Radio link

LAG Troubleshooting

Symptom: The LAG is invalid, all the member ports cannot be used, and the services are interrupted.
Alarm reported (board):
- LAG_DOWN (CXPAR)

Symptom: The member ports in the LAG cannot be used, and the packets of the service are lost.
Alarm reported (board):
- LAG_MEMBER_DOWN (CXPAR)
- LOOP_ALM, ETH_LOS, ETH_LINK_DOWN (CXPAR, EF8T, EF8F, or EG2)

LAG Troubleshooting (Cont.)

Possible Causes:

Cause 1: The NEs at the two ends of the LAG are incorrectly configured
Cause 2: The working mode of the member ports in the LAG is set to half-duplex
Cause 3: A loopback is configured on the member ports in the LAG
Cause 4: The member ports in the LAG are improperly connected or disconnected
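Causes 2 to 4 can be checked port by port. The following sketch assumes the port attributes have already been read from the NE by some means; the `LagMemberPort` structure and `member_port_findings` function are hypothetical and do not correspond to any Huawei interface. Cause 1 (inconsistent configuration at the two ends) is not covered, because it requires comparing both NEs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LagMemberPort:
    """State of one LAG member port (hypothetical structure)."""
    name: str
    full_duplex: bool
    loopback_enabled: bool
    link_up: bool


def member_port_findings(port: LagMemberPort) -> List[str]:
    """Return the possible causes (2 to 4) that apply to one member port."""
    findings = []
    if not port.full_duplex:
        findings.append("Cause 2: working mode is half-duplex")
    if port.loopback_enabled:
        findings.append("Cause 3: loopback is configured")
    if not port.link_up:
        findings.append("Cause 4: port is improperly connected or disconnected")
    return findings


if __name__ == "__main__":
    port = LagMemberPort("port-1", full_duplex=False,
                         loopback_enabled=False, link_up=True)
    print(port.name, member_port_findings(port))
```

An empty result for every member port suggests looking at Cause 1 next.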

ML-PPP Troubleshooting

Symptom: The MP group is invalid and the service is interrupted.
Alarm reported: MP_DOWN
Board: CXPAR, ML1, or ML1A

Symptom: The MP group member is invalid, and the packets of the service are lost.
Alarm reported: PPP_LCP_FAIL or PPP_NCP_FAIL, T_ALOS
Board: CXPAR, ML1, or ML1A

Symptom: The MP group member is delayed, and the packets of the service are lost.
Alarm reported: MP_DELAY
Board: CXPAR, ML1, or ML1A

ML-PPP Troubleshooting (Cont.)

Possible Causes:

Cause 1: The MP group is invalid
Cause 2: The negotiation of the protocols at the two ends of the MP group member fails
Cause 3: The received signals of the MP group member port are lost
Cause 4: The MP group member delay exceeds the threshold
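The ML-PPP alarms and the four possible causes appear to line up one to one. The mapping below is an assumption inferred from the order of the alarm table and the cause list, not something the slides state explicitly; `MP_ALARM_CAUSE` and `likely_causes` are hypothetical names.

```python
# Assumed alarm-to-cause mapping, inferred from the order of the slides.
MP_ALARM_CAUSE = {
    "MP_DOWN": 1,       # The MP group is invalid
    "PPP_LCP_FAIL": 2,  # Protocol negotiation at the two ends fails
    "PPP_NCP_FAIL": 2,
    "T_ALOS": 3,        # Received signals of the member port are lost
    "MP_DELAY": 4,      # Member delay exceeds the threshold
}


def likely_causes(reported_alarms):
    """Return the sorted, de-duplicated cause numbers for the reported alarms.

    Alarms that are not ML-PPP alarms are ignored.
    """
    causes = {MP_ALARM_CAUSE[a] for a in reported_alarms if a in MP_ALARM_CAUSE}
    return sorted(causes)


if __name__ == "__main__":
    print(likely_causes(["PPP_NCP_FAIL", "MP_DELAY", "ETH_LOS"]))
```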

MPLS APS Troubleshooting

Symptom: The APS protection group is incorrectly configured, or the APS frame cannot be received. In this case, the protection fails.
Alarm reported: ETH_APS_PATH_MISMATCH, ETH_APS_LOST, ETH_APS_SWITCH_FAIL, ETH_APS_TYPE_MISMATCH
Board: CXPAR

Symptom: When the working tunnel or bypass tunnel is faulty, the switching fails.
Alarm reported: MPLS_TUNNEL_LOCV, MPLS_TUNNEL_MISMERGE, MPLS_TUNNEL_MISMATCH, MPLS_TUNNEL_Excess, MPLS_TUNNEL_SD, MPLS_TUNNEL_SF

MPLS APS Troubleshooting (Cont.)

Possible Causes:

Cause 1: The configurations at the two ends of the APS protection group are inconsistent
Cause 2: The protocols at the two ends of the APS protection group are in the inactive state
Cause 3: The optical fibers or cables are incorrectly connected
Cause 4: A hardware alarm exists on the board where the bypass tunnel resides, and thus the APS frame cannot be transmitted
Cause 5: Clock alarms exist in the system
Cause 6: The working tunnel or bypass tunnel is faulty

QoS Troubleshooting

Symptom: The traffic is large, and congestion occurs.
Alarm reported: FLOW_OVER
Board: CXPAR, EF8T, EF8F, or EG2

Symptom: The service bandwidth is pre-empted, and the packets of the service are lost or bit errors occur.
Alarm reported: CES_LOSPKT_EXC, CES_JTROVR_EXC, CES_JTRUDR_EXC
Board: CXPAR, ML1, or ML1A

QoS Troubleshooting (Cont.)

Possible Causes:

Cause 1: The NE is not configured with a QoS policy
Cause 2: During the service configuration, an incorrect QoS policy is selected
Cause 3: The bandwidth configured for the tunnel or PW is too small
Cause 4: The board is faulty, and the configuration data is not delivered to the board

Summary

Fault Handling Flow Introduction

Methods of Analyzing and Locating Faults

Classified Troubleshooting Analysis

Thank you
www.huawei.com
