
SRX Hardware Troubleshooting

© 2014 Juniper Networks, Inc. All rights reserved. | www.juniper.net | Worldwide Education Services
Objectives

▪ After successfully completing this content, you will be able to:
•List the general chassis components
•Identify different methods for troubleshooting major chassis
components
•Troubleshoot redundant Routing Engine and Control Board
communication

Agenda: SRX Hardware Troubleshooting

General Chassis Components
▪ Redundancy
▪ Hardware Case Study

The Juniper Architecture

▪ Every platform is different, but Juniper Networks devices typically include these components:
•Host subsystem: Composed of the Routing Engine and
supporting hardware
•PFE boards: Where actual traffic processing takes place
•Midplane: Provides the physical connection between the different
boards. It is often passive and therefore unlikely to fail, except
when a new board is inserted incorrectly (bent pins)
•Power supplies and cooling subsystems

SRX1400 and SRX3000 Series Hardware
▪ SRX1400 and SRX3000 Series components overview
•Common form-factor module (CFM) cards
• I/O cards (IOCs)
• Network Processing I/O Cards (NP-IOCs)
• Services Processing Cards (SPCs)
• Network Processing Cards (NPCs)
•Non-common form-factor module cards
• Routing Engine (RE)
• SRX Clustering Module (SCM) (SRX3000 Series)
• Switch fabric board (SFB) (SRX3000 Series)
• Backplane and System I/O card (SRX1400)
•Other components
• Power supplies
• Cooling system

SRX5000 Series Services Gateway Hardware

▪ Card overview
•I/O cards (IOCs)
•Flex IOCs
•Services Processing Cards (SPCs)
•Switch Control Boards (SCBs)
•Routing Engine
▪ Major redundant components
•SCBs
•Power supplies
•Cooling system

Junos Chassis Management Architecture

▪ Junos Chassis Management Architecture
•The chassisd process
• Runs on the Routing Engine
• Detects insertion/removal of hardware components
• Monitors environmental parameters
• Controls fan speed
• It can also reset and power up or down hardware components

[Figure: chassis management diagram. Labels: chassisd process, power control, I2C bus, fan sensors, PEM sensors, and three boards, each with a μkernel and board sensors.]

Useful Commands

▪ Verifying the operational status and environmental parameters
show chassis hardware
show chassis environment
show chassis specific-board
•The specific board changes with the platform
•Details are in the CLI Reference Guide

Troubleshooting Hardware Problems

▪ Hardware problems
•Environmental/cooling issues
• Fan failures
• Temperature
• Power supply failures
•Hardware component is offline
• Board goes offline
• New board refuses to come online
•Hardware online, but reports errors
• Routing Engine running from secondary media

Troubleshooting Temperature Issues

▪ Temperature issues
•Fan failures
• Fan rotation speed is monitored by chassisd; a fan failure
triggers an alarm, which you can view with the show chassis alarms command
• The solution is a straightforward fan tray replacement; follow the hardware guide
•Check the fan speed temperature thresholds using the show
chassis temperature-thresholds command
•If the fans are OK, but the temperature is still high
• Check the airflow and that there is sufficient clearance space
around the intake and exhaust points
• Check the air filters and, if needed, clean or replace them
• Finally, if you get high temperatures in the daytime, make sure the air
conditioning system is sufficient
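
The checks on this slide can be sketched as a small decision helper. This is a minimal illustration in Python, not a Juniper tool: the function name temperature_triage and its two boolean inputs are hypothetical, and the returned strings only restate the commands and actions listed above.

```python
from typing import List


def temperature_triage(fan_alarm: bool, temperature_high: bool) -> List[str]:
    """Return the slide's checks, in order, for the observed symptoms.

    Both arguments are hypothetical inputs an operator would fill in by hand.
    """
    steps: List[str] = []
    if fan_alarm:
        # A failed fan raises a chassis alarm; confirm it, then replace the tray.
        steps.append("show chassis alarms")
        steps.append("Replace the fan tray (follow the hardware guide)")
    if temperature_high:
        # Compare the readings with the configured thresholds.
        steps.append("show chassis temperature-thresholds")
        if not fan_alarm:
            # Fans are fine, so look at the environment around the chassis.
            steps.append("Check airflow and clearance at intake and exhaust points")
            steps.append("Check air filters; clean or replace if needed")
            steps.append("Verify that the air conditioning system is sufficient")
    return steps


if __name__ == "__main__":
    for step in temperature_triage(fan_alarm=False, temperature_high=True):
        print(step)
```

The helper keeps the order used on the slide: rule out fan failure first, then check thresholds, and only then examine airflow, filters, and room cooling.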

Troubleshooting Power Supplies

▪ Power supplies
• The chassisd process monitors power supply output
• If the output voltage drops, an alarm is triggered; view it with the
show chassis alarms command. In addition, an SNMP trap is
sent and a message is logged in both the messages and the
chassisd logs
• To check the power supply status and confirm the problem, use
the show chassis environment command
•Consult the hardware troubleshooting guide

Agenda: SRX Hardware Troubleshooting

▪ General Chassis Components
Redundancy
▪ Hardware Case Study

Hardware Redundancy

▪ Most platforms include some form of hardware redundancy
•Routing Engines
• GRES or NSR
•Control Boards
• Often paired with Routing Engines
•Fabric redundancy
•Power supplies
•Cooling subsystem

SRX5000 Series Chassis Cluster (1 of 2)

▪ Replacing the Routing Engine, SCB, or both in a high-end
SRX Series chassis cluster
1. On SRX5000 Series: Before powering down the node,
deactivate the fab interfaces in the configuration (this can
be done on either node):
• user@srx# deactivate interfaces fab0
• user@srx# deactivate interfaces fab1
• user@srx# commit
Skipping this step will cause chassis cluster problems later.
2. Replace the Routing Engine or SCB.
3. Power on the device.

SRX5000 Series Chassis Cluster (2 of 2)

▪ Replacing the Routing Engine, SCB, or both in a high-end
SRX Series chassis cluster
4. Enter the cluster ID information using the following
operational mode command:
• set chassis cluster cluster-id X node Y reboot
5. The device reboots with the cluster configuration and joins
the cluster.
6. On SRX5000 Series: Once the node is back in the cluster,
activate the fab interfaces:
• user@srx# activate interfaces fab0
• user@srx# activate interfaces fab1
• user@srx# commit
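
The six steps across these two slides can be summarized as an ordered plan. The following is a minimal sketch in Python under stated assumptions: the function name replacement_plan and the mode labels ("config", "operational", "manual") are hypothetical, and the command strings are copied from the slides with X and Y filled in from the arguments.

```python
from typing import List, Tuple


def replacement_plan(cluster_id: int, node: int) -> List[Tuple[str, str]]:
    """Build the ordered (mode, action) list for an RE or SCB replacement.

    The mode labels are illustrative only; they indicate where each
    action is performed.
    """
    if node not in (0, 1):
        raise ValueError("a chassis cluster node is either 0 or 1")
    return [
        # Step 1: deactivate the fabric links before powering down the node.
        ("config", "deactivate interfaces fab0"),
        ("config", "deactivate interfaces fab1"),
        ("config", "commit"),
        # Steps 2 and 3: hardware work on the powered-down node.
        ("manual", "Replace the Routing Engine or SCB"),
        ("manual", "Power on the device"),
        # Steps 4 and 5: restore the cluster identity; the node reboots and rejoins.
        ("operational",
         f"set chassis cluster cluster-id {cluster_id} node {node} reboot"),
        ("manual", "Wait for the node to rejoin the cluster"),
        # Step 6: reactivate the fabric links.
        ("config", "activate interfaces fab0"),
        ("config", "activate interfaces fab1"),
        ("config", "commit"),
    ]


if __name__ == "__main__":
    for mode, action in replacement_plan(cluster_id=1, node=0):
        print(f"[{mode}] {action}")
```

Keeping the plan as data makes the ordering explicit: the fab interfaces are deactivated first and reactivated last.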

Agenda: SRX Hardware Troubleshooting

▪ General Chassis Components
▪ Redundancy
Hardware Case Study

Case Study: Network “Slow-Down” (1 of 3)

▪ Data center problem
•Customer reported a “network slow-down” problem
•Initial investigation was done on the SRX5600 device:
• An alarm on the system was raised: 2012-01-20 12:53:09
EST Minor Check CB 0 Fabric Chip 1
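
When many alarms need to be reviewed, a line like the one above can be split into fields. This is a minimal sketch in Python that assumes every alarm line follows the layout seen on this slide (timestamp, time zone, class, description); the regular expression, the function name parse_alarm, and the field names are illustrative assumptions, not part of Junos.

```python
import re
from typing import Dict, Optional

# Assumed layout: "<date> <time> <tz> <class> <description>",
# taken from the single alarm line shown on the slide.
ALARM_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<tz>\S+)\s+"
    r"(?P<alarm_class>Minor|Major)\s+"
    r"(?P<description>.+)$"
)


def parse_alarm(line: str) -> Optional[Dict[str, str]]:
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = ALARM_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    alarm = parse_alarm("2012-01-20 12:53:09 EST Minor Check CB 0 Fabric Chip 1")
    print(alarm)
```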

[Figure: data center diagram. Labels: Data Center, Servers, SRX Series Gateway, Internet, Network.]

Case Study: Network “Slow-Down” (2 of 3)

▪ Data center problem
•Message files contained the following entries:
fpc1 ICHIP(0)_REG_ERR:Non first cell drops in ichip fi rord: 43
fpc1 ichip_f_check_dest_errors: Fabric request time out for plane 2 dest 1 pfe 0
fpc1 ichip_f_check_dest_errors: Fabric request time out for plane 2 dest 1 pfe 0
fpc0 ICHIP(1)_REG_ERR:Non first cell drops in ichip fi rord: 84
alarmd[1173]: Alarm set: CB color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1
craftd[1174]: Minor alarm set, Check CB 0 Fabric Chip 1
fpc1 CMXDPC: CRC link error detected for FPC: 1 PFE: 0 fabric plane 2

•The operational mode command show chassis fabric fpcs revealed a link error on FPC1
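
To see which FPC and fabric plane the errors cluster on, the log entries above can be tallied. This is a minimal sketch in Python: the regular expressions are written only against the message wording shown on this slide, and the names tally_fabric_errors and SAMPLE_LOG are hypothetical. Other Junos releases or platforms may word these messages differently.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns derived only from the entries shown on the slide.
TIMEOUT_RE = re.compile(
    r"^(fpc\d+) ichip_f_check_dest_errors: "
    r"Fabric request time out for plane (\d+)"
)
CRC_RE = re.compile(
    r"CRC link error detected for FPC: (\d+) PFE: \d+ fabric plane (\d+)"
)

SAMPLE_LOG = [
    "fpc1 ICHIP(0)_REG_ERR:Non first cell drops in ichip fi rord: 43",
    "fpc1 ichip_f_check_dest_errors: Fabric request time out for plane 2 dest 1 pfe 0",
    "fpc1 ichip_f_check_dest_errors: Fabric request time out for plane 2 dest 1 pfe 0",
    "fpc0 ICHIP(1)_REG_ERR:Non first cell drops in ichip fi rord: 84",
    "alarmd[1173]: Alarm set: CB color=YELLOW, class=CHASSIS, reason=Check CB 0 Fabric Chip 1",
    "craftd[1174]: Minor alarm set, Check CB 0 Fabric Chip 1",
    "fpc1 CMXDPC: CRC link error detected for FPC: 1 PFE: 0 fabric plane 2",
]


def tally_fabric_errors(lines: Iterable[str]) -> Counter:
    """Count fabric timeouts and CRC link errors per (fpc, plane, kind)."""
    counts: Counter = Counter()
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            counts[(timeout.group(1), int(timeout.group(2)), "timeout")] += 1
            continue
        crc = CRC_RE.search(line)
        if crc:
            counts[(f"fpc{crc.group(1)}", int(crc.group(2)), "crc")] += 1
    return counts


if __name__ == "__main__":
    for key, count in tally_fabric_errors(SAMPLE_LOG).items():
        print(key, count)
```

On the sample entries, every plane-specific error involves FPC1 and fabric plane 2, which matches the link error reported for FPC1.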

Case Study: Network “Slow-Down” (3 of 3)

▪ Data center problem
•The SRX Series device was rebooted
•The problem disappeared but recurred after some time
•JTAC suggested the following actions for the affected FPC1, each to be
performed separately and monitored to check whether it resolved
the issue:
1. Take FPC1 offline and bring it back online via the CLI using the commands:
• request chassis fpc slot number offline
• request chassis fpc slot number online
2. Physically take the FPC offline and reseat it: press the FPC offline
button on the chassis, pull the FPC out, reseat it in the slot, and
then press the online button on the chassis.
3. If both previous actions fail, an RMA for the FPC is needed.
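
The escalation order JTAC suggested can be sketched as a loop that stops at the first action that resolves the issue. This is a minimal illustration in Python, not a JTAC procedure: the function escalate_fpc_recovery, the stage names, and the resolved callback (which stands in for an operator performing the actions and then monitoring the device) are all hypothetical.

```python
from typing import Callable, List


def escalate_fpc_recovery(
    slot: int, resolved: Callable[[str, List[str]], bool]
) -> str:
    """Try each recovery stage in order; return the stage that fixed the issue.

    `resolved` receives the stage name and its actions and returns True
    once monitoring shows the problem is gone. Returns "rma" if no stage helps.
    """
    stages = [
        ("cli-restart", [
            f"request chassis fpc slot {slot} offline",
            f"request chassis fpc slot {slot} online",
        ]),
        ("reseat", [
            "Press the FPC offline button on the chassis",
            "Pull the FPC out and reseat it in the slot",
            "Press the online button on the chassis",
        ]),
    ]
    for name, actions in stages:
        if resolved(name, actions):
            return name
    # Both actions failed: the FPC needs an RMA.
    return "rma"


if __name__ == "__main__":
    def never_resolved(name: str, actions: List[str]) -> bool:
        print(name, actions)
        return False

    print(escalate_fpc_recovery(1, never_resolved))
```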
Summary

▪ In this content, we:
•Listed the general chassis components
•Identified different methods for troubleshooting major
chassis components
•Troubleshot redundant Routing Engine and Control Board
communication

Review Questions

1. What are the main hardware components of a Juniper device?
2. What is the Junos process responsible for managing
and monitoring hardware?
3. Which information resources are available to you
when troubleshooting hardware issues?
