
Boeing Technology | Information Technology

Computing & Network Operations | Collaboration Services

Production Plumtree Environment


Health Monitoring
Performed by the Portal Infrastructure Services Team
October 28, 2005

Prepared by: Nancy Lamb, Bryce Hartford, Bill Barker

The Team
What is this page for?
To identify the teaming of SAs, DBAs and Portal Services:

Portal Services G-4560
Portal System Administrators
Portal Database Administrators

Monitoring Activities currently in place



There are 4 Monitoring Activities currently in place using the following:


Empirix monitors user activity
HPSIM (CIM) is a hardware monitoring tool
AppManager is our OS monitoring tool
Gadget Watcher monitors Tools & Services and Total Access
F5's BigIP is our load balancing monitor
The Cluster Service and Reporting Services monitor data and communicate with our DBAs

The Monitoring Systems



Empirix OneSight
User level monitoring

HPSIM (CIM)
SA Hardware Monitoring Tool

SA

AppManager
SA OS Monitoring Tool

Gadget Watcher
Total Access and T&S

System

F5 Big IP
load balancing

Cluster Service

DBA

Reporting Servers

The Picture
Data Collector launches monitors

GadgetWatcher: web-based monitor built and used by Total Access and T&S

OneSight Console

URL Monitors check for web server response

CIM (HPSIM) is a Hardware monitoring tool provided by Compaq (now HP).

Total Access

F5 BigIP

Network

E-Tester transaction monitors simulate a user login through WSSO and the rest of the infrastructure

WSSO

Portal Servers

Internal Policies

External Policies

Load Balancing

Backend System and Communities Legend

9 Web Servers

4 Portlet Servers

BCA

EAP Proxy Server

DB Cluster

2 Image Servers

Tools & SVC

Reverse Proxy Server

AppManager runs as a service collecting application data on a given server. If an application stops, it attempts to restart it. Failing that, it sends alerts to the appropriate personnel and BARS tickets are created.

The Cluster Service monitors each SQL box every few seconds to see if it answers. If a box does not answer, the service moves all of its resources to the remaining servers. SQL is set up to create alerts when it detects certain events, such as restarts, fatal errors, replication failures, database integrity problems and backup failures. Reporting Servers alert if replication stops.

Portal Services

Transaction monitor

URL monitor

Monitors by User Experience



Portal Services
EMPIRIX User Experience Monitoring

This service simulates a user logging in and then measures the observed response time. It:
Provides proactive health monitoring (against established response thresholds)
Captures response time data to reflect the end user experience
Provides reports
Triggers alerts

Data can be collected from multiple locations (Kent, Long Beach, St. Louis), but this requires additional licenses.

Response time and availability are measured by two types of service:
Complex web transactions (e-Tester transaction scripts)
URL monitor (retrieves a URL and tests for good/bad text)
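
The good/bad text test performed by a URL monitor can be sketched as follows. This is a minimal illustration in Python, not how Empirix OneSight is implemented: the function name `evaluate_url_check`, the status labels and the threshold value in the usage note are all assumptions made for the example, and the page body and elapsed time are passed in rather than fetched.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    status: str  # "ok", "slow" or "failed" (hypothetical labels)
    reason: str


def evaluate_url_check(body, elapsed_seconds, good_text, bad_text, threshold_seconds):
    """Classify one page retrieval using good/bad text and a response threshold."""
    # Bad text (for example an error banner) fails the check outright.
    if bad_text and bad_text in body:
        return CheckResult("failed", "bad text found")
    # The page must contain the text that proves it rendered correctly.
    if good_text not in body:
        return CheckResult("failed", "good text missing")
    # A correct page that arrives too slowly is still worth an alert.
    if elapsed_seconds > threshold_seconds:
        return CheckResult("slow", "response time above threshold")
    return CheckResult("ok", "good text found within threshold")
```

For example, `evaluate_url_check(page, 1.2, "Welcome", "Error", 5.0)` reports "ok" only when the page contains "Welcome", does not contain "Error", and arrived within the assumed 5 second threshold.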

Current Monitors by System Administrators



AppManager
Service collects data
Administrators set alert thresholds
Alerts can be sent to appropriate personnel via email or pager
BARS tickets are created based on monitoring the AppManager console and on specific alerts sent
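
Checking collected data against administrator-set thresholds can be sketched as below. This is an illustrative Python example only and does not use or represent the AppManager product's own interfaces; the metric names, the server name and both function names are hypothetical.

```python
def find_threshold_breaches(samples, thresholds):
    """Return (metric, value, limit) for each sample above its threshold."""
    breaches = []
    for metric in sorted(samples):
        limit = thresholds.get(metric)
        # Metrics without a configured threshold never raise an alert.
        if limit is not None and samples[metric] > limit:
            breaches.append((metric, samples[metric], limit))
    return breaches


def format_alert(server, breach):
    """Build the text that would go out by email or pager."""
    metric, value, limit = breach
    return f"{server}: {metric} is {value}, above the threshold of {limit}"
```

Each breach could then be routed to email or pager, or used as the basis for a BARS ticket.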

CIM (HPSIM)
Hardware monitoring tool
Service runs on each server
Monitored from a workstation console
Provides indicators when a monitored component is about to fail, as well as a report of the system hardware specifications
Alerts on the console show RED when any component has failed
Pager, email or BARS is used to alert the personnel required to ensure proper repair

BigIP
Load balancing/failover service by F5 Networks, Inc.
Part of the clustering setup on the BigIP server
Monitors for a specific condition that is normally available on the nodes within a cluster pool
If a node is not reachable, it is removed from the pool and traffic is redirected until the node becomes available again
Notification or alerting is NOT set up within BigIP and, to the team's knowledge, is not currently available
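
The pool behaviour described for BigIP (an unreachable node leaves the pool and rejoins once it answers again) can be illustrated with a short Python sketch. This is not F5 configuration or code; the class `NodePool`, its method names and the node names are hypothetical, and round-robin selection is an assumption, since the balancing method in use is not stated here.

```python
class NodePool:
    """Toy model of a load-balanced pool with health-based membership."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.available = set(self.nodes)
        self._next = 0

    def record_health_check(self, node, reachable):
        # An unreachable node leaves the pool; it rejoins when reachable again.
        if reachable:
            self.available.add(node)
        else:
            self.available.discard(node)

    def pick_node(self):
        """Round-robin over nodes currently in the pool; None if none are left."""
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.available:
                return node
        return None
```

With three nodes and the second one marked unreachable, successive calls to `pick_node()` alternate between the first and third until a later health check restores the second.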

GadgetWatcher

Web-based monitor built and used by Total Access that determines whether certain Total Access and Tools & Services community pages are accessible.
Uses a script to authenticate through WSSO and access these portal pages on each node within the portal
Accesses the Total Access backend portlet server
Using both segments together, monitoring personnel can determine whether a failure is within the portal environment itself or caused by a backend process failure.
E-mail or pager alerts can be sent directly by this tool in the event of a detected failure.
The Plumtree SA Team is just starting to integrate this tool into their monitoring processes.
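
How the two GadgetWatcher segments can be combined to localize a failure is sketched below in Python. This is an illustration of the reasoning only, not GadgetWatcher's own logic; the function name, node names and result labels are hypothetical, and treating a failed backend check as the cause for every node is an assumption.

```python
def locate_failures(portal_page_ok_by_node, backend_ok):
    """Combine per-node portal page checks with the backend portlet server check."""
    if not backend_ok:
        # The backend itself failed, so page failures are attributed to it.
        return {node: "backend process" for node in portal_page_ok_by_node}
    # Backend answers: any page failure points at the portal environment.
    return {
        node: "healthy" if page_ok else "portal environment"
        for node, page_ok in portal_page_ok_by_node.items()
    }
```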

Monitors used by Database Administrators



Cluster Service
The Cluster Service monitors SQL every few seconds to see if it answers. If not, it will move all the resources to the other server.
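
The failover step can be pictured with a small Python sketch. It models the idea only and is not the Cluster Service's interface; the function name, the resource names and the server names are hypothetical.

```python
def fail_over(resource_owner, failed_server, other_server):
    """Return a new resource-to-server map with everything moved off failed_server."""
    return {
        resource: other_server if server == failed_server else server
        for resource, server in resource_owner.items()
    }


# Hypothetical two-node layout before a failure of "sqlA".
owners = {"sql_instance": "sqlA", "shared_disk": "sqlA", "reporting_agent": "sqlB"}
after = fail_over(owners, "sqlA", "sqlB")
```

In a real cluster this decision would follow a missed response to the periodic "does it answer" check; here the failed server is simply passed in.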

SQL Alerts
SQL is set up to create alerts when it detects certain events, such as restarts, fatal errors, replication failures, database integrity problems and backup failures.

Reporting Servers
The reporting servers alert if replication stops.
