
CA Network and Systems Management (NSM)

Diagnostics Guide
r11.1\r11.2

Revision Date: September 29, 2008

This documentation (the Documentation) and related computer software program (the Software) (hereinafter collectively referred to as the Product) is for the end user's informational purposes only and is subject to change or withdrawal by CA at any time. This Product may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of CA. This Product is confidential and proprietary information of CA and protected by the copyright laws of the United States and international treaties. Notwithstanding the foregoing, licensed users may print a reasonable number of copies of the Documentation for their own internal use, and may make one copy of the Software as reasonably required for back-up and disaster recovery purposes, provided that all CA copyright notices and legends are affixed to each reproduced copy. Only authorized employees, consultants, or agents of the user who are bound by the provisions of the license for the Software are permitted to have access to such copies. The right to print copies of the Documentation and to make a copy of the Software is limited to the period during which the license for the Product remains in full force and effect. Should the license terminate for any reason, it shall be the user's responsibility to certify in writing to CA that all copies and partial copies of the Product have been returned to CA or destroyed. EXCEPT AS OTHERWISE STATED IN THE APPLICABLE LICENSE AGREEMENT, TO THE EXTENT PERMITTED BY APPLICABLE LAW, CA PROVIDES THIS PRODUCT "AS IS" WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.
IN NO EVENT WILL CA BE LIABLE TO THE END USER OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS PRODUCT, INCLUDING WITHOUT LIMITATION, LOST PROFITS, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF CA IS EXPRESSLY ADVISED OF SUCH LOSS OR DAMAGE. The use of this Product and any product referenced in the Documentation is governed by the end user's applicable license agreement. The manufacturer of this Product is CA. This Product is provided with Restricted Rights. Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in FAR Sections 12.212, 52.227-14, and 52.227-19(c)(1) - (2) and DFARS Section 252.227-7013(c)(1)(ii), as applicable, or their successors. All trademarks, trade names, service marks, and logos referenced herein belong to their respective companies.

Copyright © 2008 CA. All rights reserved.

Last Updated: September 30, 2008

Contents

Chapter 1: Introduction
CA Product References

Chapter 2: Troubleshooting Basics


Identify the Environment
Separate the Problem from the Symptoms
    What Happened
    What is the Extent of the Problem?
Identify Potential Causes
    A Change in the Environment
    A Change to the Product or Component
    User Error
    Confirm your Conclusions
Apply the Cure and Document the Solution

Chapter 3: Symptoms and Solutions


Installation and Setup Issues
    I have installed MDB on a dedicated server with just WorldView Manager but the AIS local catalog is missing
    The installation halted because the estimated size of the System Path Entry would have been exceeded based on the components selected
    Internal error: function LookupAccountSid failed, rc=1789, Reason= The trust relationship between this workstation and the primary domain failed
Agent Technology (AT) Issues
    After applying a system-wide security patch, we had to reboot a large number of servers relatively quickly. Now, one of the managed nodes is showing a status of any:absent. What does this mean?
    NSM r11.x Agents are not discovered by DSM
    In DSM Monitor View there's a yellow exclamation mark next to MDB Connection and it has a WV Error Message
Discovery Issues
    New classes are not classified correctly by Continuous Discovery or Classical Discovery
Event Management (EM) Issues


    Unable to Run Commands
    Message Records and Actions Not Working
    Held Messages Do Not Appear in Next-Day Console
    Unable to Execute opreload
    Console Does Not Refresh
    Console Messages Not Being Forwarded
    Message Record Action Banner is not Functional
MCC Issues
    The RMI Connection to xxx has been lost. Please restart the Management Command Center to access this namespace
    The colors in the MCC are not propagating correctly up the LHP tree -or- Changes are not being reflected in the MCC
    When I select the Console Log plugin on the Right Hand Pane in the MCC, I get a message
    Newly installed component icons don't show correctly
    Cannot edit Message Records in MCC (Command execution denied)
    In the MCC left hand pane, there is no Alert plugin available or Console Log Plugin
    No alerts are created
    In the MCC Topology there is nothing under the WorldView object
System Performance Issues
    A performance object is shown as blue unknown state in nodeview
    I don't see the MIB values I associated to a specific class available in Trend
Unicenter Configuration Manager
    When I deliver a profile from UCM, the agent doesn't get the profile.
General/Miscellaneous Issues
    Cannot connect to UBI/IIS on Windows 2003 (HTTP 404)
    IP Address returned by hostname doesn't match IP returned by DNS

Chapter 4: Troubleshooting DIA


Where Can I Find Additional Information on DIA?

Chapter 5: Working with Support


CA Technical Support Structure
    Telephone Support
    Problem Tracking and Web Support
    CA Technical Support Organization
    Escalation


Chapter 6: Tools for Troubleshooting


Monitoring your Environment
    CPU Bottlenecks
    Event Management
    WorldView
    Agent Technology
    Running Reports
    Checking History and Log Files
Verifying Functionality
    Components
    Databases
    Communications and Networks
Using Debug Mode and Diagnostic Trace
    Modifying Log Files Permanently
    Modifying Log Files Temporarily
    Enterprise Management Tracing
    Agent Technology and WorldView Tracing
    Common Services Tracing
    Dynamic Tracing
    Circular Trace


Chapter 1: Introduction
This Diagnostics Guide provides information to help you troubleshoot problems that may occur in your CA Network and Systems Management (NSM) infrastructure. Several of the chapters contain "symptoms and solutions": descriptions of errors that have occurred at customer sites and tips for diagnosing and fixing those errors. You can use this guide to try to solve problems yourself before calling CA Technical Support.

If you have visited the CA Support web site (http://support.ca.com), you may have seen some of these tips in the form of FAQs. In fact, this guide combines useful information from various sources at CA: support technicians, field service representatives, product specialists, and more.

The information in this guide focuses on CA NSM 11.x and is arranged as follows:
Basic troubleshooting techniques
Symptoms and solutions for:
    Installation and General Issues
    Enterprise Management
    WorldView
    Agent Technology and Performance
    Discovery Issues
Information about CA Technical Support

Unless otherwise noted, the procedures and syntax provided in each section pertain to 11.0/11.1 releases. Operating system variations are provided where applicable.

Important! The Additional Guidelines for Troubleshooting DIA chapter that was previously available in this guide has been removed and is now part of the new DIA Supplemental Implementation Topics Guide for CA NSM r11.1 and r11.2. This document is available for download from the CA NSM Home Page on http://support.ca.com.


CA Product References
This document contains references to the previous Unicenter NSM Diagnostics Guide (for Unicenter TNG 2.4 & NSM 3.0). If you cannot remedy your issue using either document, please contact CA support.


Chapter 2: Troubleshooting Basics


This chapter provides guidelines for conducting preliminary troubleshooting for your implementation. Identifying a problem and the conditions under which it occurs can help you diagnose the cause of that problem and lead you to the appropriate solution.

Note: The information gathered will also help if you need to contact technical support for further assistance. For information about Computer Associates Technical Support, see the chapter Working with Support later in this guide.

Keep in mind that troubleshooting is not always a straight line from problem identification to problem resolution. You may occasionally find yourself repeating the process several times, particularly when the problem has multiple, interrelated causes.

Identify the Environment


The first step is to identify the environment in which the problem occurred. This includes both the hardware and software versions as well as any currently applied (or pending) patches. In some cases, an error may result from a missing service pack or an outdated operating system level. Therefore, before you begin troubleshooting the problem, identify:

The CA NSM version, patch level, and list of deployed components (for example, Event Management, Windows System Agent). Useful commands include the following:
    univer.exe (for Windows only)
    awservices version
    cautenv.exe (for Windows only)
    ca_version (for UNIX)
    caiserv.exe (for Windows only) or caiserv (for UNIX)
    unifstat
The operating system version and most recently applied patches for each of the systems involved (for example, server and agent installations). Also note security details for the operating system setup (for example, domain details, mapped drives, user IDs being used). Useful commands include:
    winver.exe / ver.exe (for most Windows environments)
    ca_syscheck command from the CA NSM DVD image (for UNIX environments)
    uname command (for UNIX environments)
Version and patch level of any additional software packages that are interacting with your installation (for example, Microsoft SQL Server).
Network protocols, firewall port limitations, and any other relevant communications details (for example, LAN or WAN and network speed).

After you have identified your environment, it is time to identify the problem.
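The environment details listed above are easier to compare across machines when they are captured the same way each time. The following Python sketch is one possible way to do that with the standard library only. It records basic operating system details and notes which of the CA NSM commands named in this section can be found on the PATH. The function names and the layout of the result are illustrative assumptions, not part of CA NSM, and the sketch deliberately does not run the CA commands because their output format is not described here.

```python
import json
import platform
import shutil
import socket

# Commands named in this section; which ones exist depends on platform and install.
NSM_COMMANDS = ["univer", "awservices", "cautenv", "ca_version", "caiserv", "unifstat"]


def find_commands(names, which=shutil.which):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: which(name) for name in names}


def environment_snapshot(which=shutil.which):
    """Collect basic facts about this machine for a troubleshooting record."""
    return {
        "host": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "architecture": platform.machine(),
        "nsm_commands": find_commands(NSM_COMMANDS, which),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved with the problem report.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving one snapshot per machine involved (manager and agent hosts alike) gives you a baseline to compare against after the next patch or upgrade.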

Separate the Problem from the Symptoms


To solve the problem, you need to clearly understand what it is and not just what the symptoms are. To do this, you need to ask:
What happened
Where it happened
When it happened
What effect it had on the rest of your environment

What Happened
Typically, a problem is identified when something unusual or unexpected happens, but it can also be suspected when something that normally happens does not. Therefore, you need to identify the event that occurred (or did not occur). For example, you will need to identify:
What should have happened if everything had been working properly (for example, discovery of a specific subnet)?
What actually happened, what did not happen that should have, or what happened that should not have (for example, did a specific MRA not execute)?
What is the specific error (for example, Agent View values are invalid)?
What error messages or return codes were issued (and from where)?
Was this an isolated incident (for example, were other subnets discovered or other MRAs executed on this node)?

Answering these questions will identify the scope of your search for a solution. It is critical to examine not only what did happen but also what did not happen.

For example, suppose an agent is running but the Agent View display includes only question marks in place of a valid status. In this case, you would verify the communications between the agent and the machine launching Agent View, for example by launching mibbrowse from the same Agent View machine, using both the machine name and the IP address for the agent.
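Because the example above tests the agent by both machine name and IP address, it can help to confirm first that the name resolves to the address you expect. The following Python sketch shows one way to check this with the standard socket module. The function names are illustrative, the resolver is a parameter only so that the check can be exercised without a network, and any host names or addresses you pass in are your own.

```python
import socket


def resolved_addresses(name, resolver=socket.gethostbyname_ex):
    """Return the IPv4 addresses that the given host name resolves to."""
    _canonical, _aliases, addresses = resolver(name)
    return list(addresses)


def name_matches_ip(name, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if expected_ip is among the addresses the name resolves to."""
    try:
        return expected_ip in resolved_addresses(name, resolver)
    except OSError:
        # Name resolution failed; treat this as a mismatch worth investigating.
        return False
```

A False result does not prove that the agent is unreachable. It only indicates that the name and the address may not refer to the same machine from where the check was run, which is worth ruling out before examining the agent itself.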


Where It Happened
The next step is to further isolate the location of the error to determine how widespread it is. Find out if the error message was generated on a single machine or on more than one. Useful questions to ask include the following:
Which machines are affected (for example, all users in a specific subnet)?
Note: Limit the number of functions being performed by the suspect machine to further isolate the problem. For example, if you suspect a problem with a specific policy on the DSM, exclude all other policies and update DSM to limit monitoring to a single instance of the failing agent. This will also reduce the amount of data that will need to be sifted through in the log files.
If more than one machine is affected, what do they have in common (for example, does the installation only fail on computers using a dial-up connection)?
Which specific component had the error (for example, the held messages pane on the Event Console)?
If more than one component is affected, how do these components relate to one another?

By isolating the specific machines on which the error is observed, you can find other similarities that may identify the root of the problem. For example, are all users unable to log in or just a select few? If just a select few, are they all defined in the same domain? If so, this indicates a problem with that particular domain. If Job Management cannot run the job, can it be run from the command line? If not, the problem may be with the job itself, not Job Management.
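The question of what the affected machines have in common can be answered mechanically once you have a small inventory. The following Python sketch assumes a hypothetical inventory that maps each machine name to a dictionary of attributes (domain, connection type, and so on). It returns the attribute values shared by every affected machine and by no unaffected machine. The inventory layout and the function name are assumptions made for illustration.

```python
def shared_attributes(machines, affected):
    """Return (attribute, value) pairs common to all affected machines
    and not present on any unaffected machine."""
    affected = set(affected)
    if not affected:
        return set()
    # Start with what every affected machine has in common.
    common = set.intersection(*(set(machines[name].items()) for name in affected))
    # Remove anything that a healthy machine also has.
    for name, attrs in machines.items():
        if name not in affected:
            common -= set(attrs.items())
    return common


# Hypothetical inventory used only to illustrate the idea.
inventory = {
    "srv01": {"domain": "EAST", "link": "lan"},
    "srv02": {"domain": "EAST", "link": "dialup"},
    "srv03": {"domain": "WEST", "link": "lan"},
}
suspects = shared_attributes(inventory, ["srv01", "srv02"])
```

In this made-up inventory the only pair left is the EAST domain, which mirrors the login example above: when the affected users or machines all share one domain, that domain is the first place to look.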

When It Happened
Another crucial step is to identify the time the problem occurred. This can help you limit potential causes to only those events that happened during that time. Useful questions to ask include the following:
What function or functions were being performed at the time?
When did the error first occur (for example, the Monday after a long weekend, shortly after the router was replaced, after a change to daylight saving time, and so on)?
Has the problem been repeated since that first observation? If so, is there a pattern to that repetition (for example, every Friday after the weekly backup is performed)?
What changes were made to the product before the problem occurred (for example, was an upgrade recently applied)?
How often does the problem recur (for example, if it occurs during the execution of a particular process, does it ALWAYS occur when that process executes)?
What other events occurred at that time (for example, does the error occur only during times of heavy network traffic and disappear when the load is lighter)?
Can the problem be repeated at will by running certain commands or by certain actions on the system?
Does the problem happen randomly without relation to anything else on the system?

By identifying a pattern in the times during which the problem occurs (for example, the day after a holiday, the end of the month, shortly after the agent configuration set was modified, after a new patch was applied), you may be able to pinpoint a potential trigger. For example, is the install package set to be delivered via a particular router that happens to be undergoing a software update of its own and is, therefore, unavailable?

Important! If support requests and analyzes multiple log files, it is critical that all log files cover the same time span, unless otherwise specified.
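Two of the checks described above lend themselves to a short script: looking for a weekday pattern in the times at which the problem occurred, and confirming that a set of log files covers a common time span. The Python sketch below assumes you have already extracted the error timestamps and the first and last timestamp of each log; how you parse them depends on the log format, which is not covered here. The function names are illustrative.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def occurrences_by_weekday(timestamps):
    """Count how many occurrences fall on each weekday."""
    return Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)


def common_window(spans):
    """Given a (start, end) pair for each log, return the period that
    every log covers, or None if the logs do not overlap."""
    if not spans:
        return None
    start = max(s for s, _ in spans)
    end = min(e for _, e in spans)
    return (start, end) if start <= end else None
```

If `common_window` returns None, or a window that does not include the time of the failure, collect the logs again before sending them to support.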

What is the Extent of the Problem?


Finally, you need to identify the full scope of the problem. Useful questions to ask include the following:
How many components have the error (for example, are all agents failing or only the Windows System Agent)?
What is the size of a single error (for example, is it limited to a single machine or does it impact an entire network segment)?
How many errors are on each component (for example, does a Calendar profile work under Event Management but not for Job Management)?
If there are multiple ways of performing the affected function, does the problem occur with each method (for example, can a job be held through CAUTIL execute but not through a GUI action)?
Can the problem be reproduced in a similar test environment?


Once again, the goal is to detect a behavioral pattern for the problem. Does it recur periodically? Has it spread to other components or machines? Did it affect only computers that recently had an upgrade or system change?

Identify Potential Causes


After you have identified the problem and its scope, the next step is to identify the potential cause(s). The most likely candidates include:
A change in the environment had an unexpected effect
A change in the product or component had an unexpected effect
User error

A Change in the Environment


When you identified your environment, you should have listed both the version level and the most recently applied patch level for both the affected software and the operating system. Review this list and identify what might have changed or what should have changed. This includes:

Recent upgrades to the operating system, hardware, or software. Verify that the new version is supported by your version of CA NSM.
Inconsistent patch application. If upgrade patches were applied to two out of three machines, does the problem correspond to the pattern of patch application? In other words, if the problem affects only the machine without the patches and not the updated machines, it is likely that applying the missing patch will fix the problem.
Incorrect patch application. If there are a number of functional and nonfunctional machines that are on the same patch level, the machines displaying the problem may not have the patch correctly applied.
Note: If you installed software patches using the applyptf utility, you can also use that utility to list all patches added to the system. This is true for both Windows and UNIX. Tips on locating and reading the applyptf history file can be found in the Tools chapter.
Inconsistent version levels. Are you running multiple releases of the same software on different components? Although CA NSM is backward compatible with previous releases, the managing component must always be of the most current release. For example, a 3.0 agent can report to an 11.x DSM, but the reverse is not supported.
Deleted, new, or newly renamed machines. If you renamed a server that was identified as a station in a Job Management job definition, that job may not run if it cannot find the old machine name.
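The check for inconsistent patch application described above amounts to a set comparison: which patches are present on every working machine but missing from a failing one? The Python sketch below illustrates this. It assumes you have already gathered a list of patch identifiers per machine, for example from the applyptf listing. The identifiers and machine names shown are hypothetical, and the function name is illustrative.

```python
def missing_on_failing(patches_by_machine, failing):
    """For each failing machine, return the patches that every working
    machine has but the failing machine lacks."""
    failing = set(failing)
    working = [set(patches) for machine, patches in patches_by_machine.items()
               if machine not in failing]
    if not working:
        # No healthy machine to compare against.
        return {machine: set() for machine in failing}
    baseline = set.intersection(*working)
    return {machine: baseline - set(patches_by_machine[machine])
            for machine in failing}


# Hypothetical patch inventory used only for illustration.
patch_inventory = {
    "srv01": ["PTF-A", "PTF-B"],
    "srv02": ["PTF-A", "PTF-B", "PTF-C"],
    "srv03": ["PTF-A"],
}
gaps = missing_on_failing(patch_inventory, ["srv03"])
```

An empty result for a failing machine points toward the incorrect patch application case instead: the patch is listed as applied but may not have been applied correctly.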


Changes to any part of your enterprise can affect other areas in ways that you may not expect. Therefore, you need to find out what changed and when. In addition to hardware, software, and operating system changes, you should identify changes to the following:
Component profiles (such as users, user groups, computers, and computer groups)
Configuration files
Security and access rights
Other software that recently has been installed (for example, virus scanning software or security software)
OS configuration (such as registry changes or anything reported in a winmsd report)
Variables (such as the default mode for Security Management)

A Change to the Product or Component


Verify that the component and all required services are running and that any required databases are not corrupt. For a list of commands and utilities that you can use to verify component functionality, see the chapter Tools later in this guide.

Most components require the smooth interaction of many different services and other components to work properly. In NSM r11.x, DIA provides the major underlying communications method. Therefore, it is critical that you check the status of DIA when communications appear to be the issue.


User Error
Sometimes the problem lies not with the software or its environment but rather with the user, who either does not know or does not understand how the product or component works. For example, under Security Management, if a user has more than one access rule defined using the most specific format, the most restrictive rule applies. Further, confusion about how jobs and jobsets are processed in the new-day autoscan can leave you wondering why a job has not been processed. Use the autoscan simulation utility to test how your current job and jobset definitions affect each other and how they will be processed.

User error can also include something as simple as a missing or mistyped command parameter. Therefore, you should ensure that a simple error in tasks, or a misunderstanding of how a task was to be performed, did not cause the error. Check error logs and review correct procedures with the user who last worked with the affected function or component.

It is always wise to take care when modifying defaults. Some modifications are only temporary and are therefore lost when the system is recycled. Some modifications are affected by update intervals; changes are not applied until the next interval expires.

Confirm your Conclusions


Once you have identified the potential causes, the next step is to confirm that your conclusion is correct (and that you are not merely responding to a symptom). Check various parts of your system, and enter commands that help you troubleshoot. Test things that you think may help identify what is causing the problem, and eliminate possible causes that do not, in fact, apply to your situation.

The next several chapters are designed to help you identify possible causes for an error and evaluate whether the causes apply to your site. These chapters contain "symptoms and solutions" (errors and possible fixes for those errors) for CA NSM components.


Apply the Cure and Document the Solution


Once you have applied your solution, verify that it is indeed the solution and not just a temporary fix. Many times during troubleshooting, what appears to be a solution only resolves a symptom and does not get at the root cause of the problem. As a result, the problem will recur and may even exhibit different symptoms, leading you to believe that it is an entirely different problem. Therefore, you should thoroughly test and document the solution for future reference. Do not simply throw solutions at a problem, waiting for one to stick.

Finally, if you have identified the true cause, you should also identify steps to eliminate the potential for a repeat of that problem. If the problem was traced to a user error, it is critical that you ensure that the person responsible for the error understands the correct procedure. If the problem was the result of a missing or improperly applied patch, review your update process to determine what checks need to be applied and whether changes need to be made to ensure future compliance. If a faulty job definition caused a job to hang, verify that you have implemented valid standards and procedures for job definitions, and verify that the people responsible for defining jobs understand those standards and procedures.


Chapter 3: Symptoms and Solutions


This chapter provides symptoms and solutions to several common problems that you may experience using CA NSM. It is divided into the following sections:
Installation and Setup
Agent Technology
Discovery
Event Management
Management Command Center (MCC)
Systems Performance
Unicenter Configuration Manager (UCM)
General/Miscellaneous

Installation and Setup Issues


The following topics pertain to issues that may arise during installation and configuration of CA NSM r11.x.

I have installed MDB on a dedicated server with just WorldView Manager but the AIS local catalog is missing
Symptom
No providers were selected during the install and, consequently, the AIS local catalog is missing and MCC does not launch correctly.

Solution
You must select at least one provider for the local catalog to be created. If no providers are selected, the install process will determine that there is no requirement for a local catalog. If you install MCC without any providers, it will create an AIS catalog but there will be no DNA cells available.

The installation halted because the estimated size of the System Path Entry would have been exceeded based on the components selected
Symptom
The install process was halted after the component selection was made. Typically, this occurs on a Windows 2003 system with SP1 applied.

Solution
For most contemporary Microsoft operating systems, the maximum length for the PATH variable is 2,048 characters, but for Windows 2003 Service Pack 1 that restriction has been reduced to 1,024 characters (see Microsoft KB article 906469). Once the component selection has been made, the NSM install process verifies that the maximum system path length will not be exceeded. This is likely to occur when multiple products (for example, USD, DSM and NSM) are installed on the same server. To resolve this you can do the following:
Shorten the directory path name. By default, the NSM location is
    \Program Files\CA\Shared Components\CCS\WVEM
Reduce the number of components selected.

See the Path Length Considerations for Unicenter NSM r11.x document on the Implementation Best Practices page for more details and a tool for estimating path length.
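To get a rough idea of how close a server already is to the limits quoted above, you can measure the current PATH and the directories you expect to add. The Python sketch below is a minimal illustration of that arithmetic. It is not the estimating tool mentioned above, the NSM installer may compute its estimate differently, and the function name and example directories are assumptions.

```python
import os

# Limits quoted in this section, in characters.
DEFAULT_LIMIT = 2048
WIN2003_SP1_LIMIT = 1024


def path_headroom(path_value, additions=(), limit=DEFAULT_LIMIT):
    """Return the number of characters left under the limit after the
    given directories are appended to the existing PATH value."""
    entries = [path_value] if path_value else []
    entries.extend(additions)
    projected = os.pathsep.join(entries)
    return limit - len(projected)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    # A negative number means the projected PATH would exceed the limit.
    print(path_headroom(current, limit=WIN2003_SP1_LIMIT))
```

A small or negative result suggests shortening the install directory or selecting fewer components before you rerun the installation.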

Internal error: function LookupAccountSid failed, rc=1789, Reason= The trust relationship between this workstation and the primary domain failed
Symptom
This message is encountered during installation and occurs when the computer's machine account has an incorrect role or when its password has become mismatched with that of the domain database.

Solution
Please refer to the following Microsoft article for further details on resolving this issue: http://support.microsoft.com/kb/162797/
Log on locally as a local administrator.
In the Network tool of Control Panel, select Change and enter a Workgroup name, leaving the domain.
Restart the computer and log on locally as a local administrator.
There are two methods to rejoin the domain:
    You can join the domain from the client if, at the same time, you can provide an administrator name and password on the domain
    OR
    You can delete the existing computer account in Server Manager, recreate the computer account, synchronize the domain, and then rejoin the domain on the client
If you have AD running:
    Remove the PC from the domain
    Go into AD Users and Computers
    Delete your PC if it is listed in Computers
    Rejoin the domain

Agent Technology (AT) Issues


The topics contained in this section pertain to Agent Technology including the Distributed State Machine (DSM). Managing the DSM, and, in particular, ensuring that it is not operating beyond capacity, is critical to managing the performance of your CA NSM implementation. Following are several frequently asked questions pertaining DSM management. How can I tell how many objects a DSM is currently managing? In r11, the dsmMonitor monitors the number of nodes and objects that the DSM is managing. You can determine these values through an SNMP GET request: dsmDataObjectTotal 1.3.6.1.4.1.791.2.10.72.2.3.2 dsmDataNodeTotal 1.3.6.1.4.1.791.2.10.72.2.2.2 In r 3.1, the following command can be used to obtain the object count value: storectrl store %COMPUTERNAME% AwNsm@%COMPUTERNAME% Select_CLASS_=AWMO_OBJECT End GetProperties moObject_moID|find /C "Object:" What "warning signs" should I look for that may indicate a DSM is overloaded? The number of managed objects and the frequency with which status changes occur for those objects both impact a DSM's capacity. When that capacity is exceeded you may experience the following symptoms: missed status false status large delay in status updates (which can be interpreted as a wrong status).

Typically, as the DSM's capacity nears its limit, you will see an increase in the number of retries. If your DSM is constantly polling at your -m set value, then you are probably close to overload for that value.

What can I do to avoid overloading my DSMs?

To minimize the chances of overloading your DSMs, you need to plan your architecture, understand what factors impact DSM performance, and regularly monitor DSM behavior to identify the early warning signs of a potential overload.


Chapter 3: Symptoms and Solutions


When planning your architecture, keep the following in mind for DSMs:
- Always monitor the DSM as a critical device.
- Determine your failover policy up front.
- DSMs need resources (8 MB + processor cycles).
- Think of multiple DSMs when you monitor more than 200 hosts.
- Ensure that the polling frequency is reasonable.
- PING Only monitoring takes no resources on the monitored device.
- DSMs should be location based.
- DSMs report to a specific MDB.
- You can implement multiple classes of DSM (Ping only, MDB connected, and so on).
- You can implement trap multiplexing in a DSM.
- Avoid hierarchically organized layers of DSMs (these can create bottlenecks).
- Put DSMs as close as possible to their monitored devices/agents.

You should understand that DSM performance (and, therefore, overload potential) is affected by the following:
- type of hardware on which the DSM is installed
- location of the DSM in relation to the MDB and the agents being monitored
- use of cold start vs. warm start
- electronic proximity to hosts
- configuration and congestion of the network
- number of hosts
- number of managed objects
- polling configuration
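The object and node totals exposed by dsmMonitor (see the SNMP GET question earlier in this section) lend themselves to a scheduled check. The following is a minimal sketch, not a CA-supplied tool. It only builds command lines for the net-snmp snmpget utility, which is an assumption; any SNMP manager can issue the same GET. The host name, community string, and SNMP version are placeholders, and whether your agent expects an instance suffix on these OIDs should be confirmed in your environment.

```python
# Hypothetical helper: builds snmpget command lines for the two dsmMonitor
# OIDs listed in this guide. Assumes the net-snmp "snmpget" utility; the
# host, community, and version values are placeholders, not product defaults.

DSM_OIDS = {
    "dsmDataObjectTotal": "1.3.6.1.4.1.791.2.10.72.2.3.2",
    "dsmDataNodeTotal": "1.3.6.1.4.1.791.2.10.72.2.2.2",
}


def build_snmpget_commands(host, community="public", version="1"):
    """Return {counter name: argv list} for an SNMP GET of each DSM counter."""
    return {
        name: ["snmpget", "-v", version, "-c", community, host, oid]
        for name, oid in DSM_OIDS.items()
    }


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a scheduler.
    for name, argv in build_snmpget_commands("dsm-host.example.com").items():
        print(name, "->", " ".join(argv))
```

Recording both values at regular intervals gives a baseline against which the warning signs described above (retries, delayed status) can be compared.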

After applying a system-wide security patch, we had to reboot a large number of servers relatively quickly. Now, one of the managed nodes is showing a status of any:absent. What does this mean?
Symptom: Managed node shows a status of any:absent immediately following a reboot.

Solution: A status of "any:absent" indicates that the DSM cannot communicate with the agent. Considering the timing of this event (right after the reboot of a significant number of machines), this may be due to DSM overloading, where a large number of status updates were communicated to the DSMs in a very short period of time. Therefore, you will need to look at:


- How many managed objects is the DSM managing?
- What is the polling interval?
- Has the aws_snmp -m option been specified?
- Is the any:absent status for different servers or always the same server? If it is always the same server, review awm_catch. If it is for different servers, update aws_snmp to run in level 4 debug mode and adjust the debug file size accordingly.

Additional DSM scalability considerations can be found on the Implementation Best Practices page on SupportConnect.

NSM r11.x Agents are not discovered by DSM


Symptom: The NSM r11 agent is up and running and the host is discovered in the MDB, but the DSM does not discover the agent.

Solution: Run the following verifications and make sure that everything is functioning correctly.

On the agent box:

    awservices list

Run this on the target machine and make sure all AT services are up and running. If they are not, issue:

    awservices stop
    awservices start

Then make sure all services have a RUNNING status.

On the DSM box:

    wvgethosts -o hosts

This returns a list of hosts managed by this DSM. Run this command on the DSM box managing the target box and check whether the target host name is included in the list of machines returned.

    diahosts

Is the agent host visible through DIA?

    pingagt node

Is the agent reachable via agtgate/DIA?

    dsmwiz

In case of a non-standard community string, is it defined in the DSM CommScope?


Other considerations to check:
- Make sure the proper community strings are set. You can do this by checking in the MCC's Tool Plugin under DSM Community Strings or by running dsmwiz.
- Open aws_agtgate.log and look in the DNA host list. Is the agent host included?

In DSM Monitor View there's a yellow exclamation mark next to MDB Connection and it has a WV Error Message
Symptom: The DSM Monitor icon is yellow and shows a WV Error status next to the MDB Connection.

Solution: Click the Confirm button and the status will go back to Normal.


Discovery Issues
Topics in this section are relevant to WorldView Discovery, both Classic and Continuous.

New classes are not classified correctly by Continuous Discovery or Classical Discovery
Symptom: After adding a new class with its SNMP OID to the MDB (with MCC Class Specification, for instance), it does not get classified correctly by Continuous Discovery or Classical Discovery.

Solution: Check that you did the following steps.

Make sure that you added the following class level properties:
- asset_class_id with a default value of 0
- asset_hierarchy_id with a default value of 1


In a TRIX import file they would look like this:

    CLP=asset_hierarchy_id Others TNGWV_OT_INT 1 4
    CLP=asset_class_id Others TNGWV_OT_INT 0 4

Make sure you have run the following commands after adding the new class:

    Program Files\CA\SharedComponents\CCS\Discovery\BIN\UpdateClassRules.exe
    Program Files\CA\SharedComponents\CCS\Discovery\BIN\RuleToDBConverter.exe

Event Management (EM) Issues


The following topics pertain to Event Management.

Unable to Run Commands


Symptom: Under Windows, the system does not let you run commands through Event Management.

Solution: You may need additional privileges. Under Windows, try the following solutions:
- Make sure that the user is defined in CA_OPR_AUTH_LIST, which identifies the users who are authorized to issue commands. To access this list, select Configuration, Settings, Event Management or execute the cautenv.exe command. For syntax details, see the online CA Reference.
- Verify that the user is granted permission in Security Management.

To set required privileges for the Enterprise Management user ID (caunint by default), follow this procedure for Windows:


1. On Windows 2000/2003, start Administrative Tools > Local Security Policy.
2. Select Local Policies > User Rights Assignments.
3. Double-click the policy and add the caunint user.
4. Select Act as part of operating system (SeTcbPrivilege), Increase quotas (SeIncreaseQuota), and Replace a process level token (SeAssignPrimary).

To set required privileges for users logged on by Event Management:
1. On Windows 2000/2003, start Administrative Tools > Local Security Policy.
2. Select Local Policies > User Rights Assignments.
3. Double-click Logon as a batch job and add the user(s) that can be logged on by Event Management.

Message Records and Actions Not Working


Symptom: Message records and actions are not working.

Solution: Verify the following:
- Ensure that the oprcmd opreload command was run, or run opreload now from the Event Console command line.
- Find out which message action is sending messages. Messages sent by the SENDOPER message action are not evaluated or processed. Use the message action EVALUATE or FORWARD when sending messages that require further processing.
- Under Windows, check to see if the program needs to be run in interactive mode (as Notepad.exe does). If so, the message record action detail syntax should be, for example:

    /int drive:\path\notepad.exe

- Under Windows, ensure that the ID issuing the command is included in Users Authorized to Issue Commands in Config/Settings/Client Preferences/Event Management.
- Under Windows, verify that the user ID has permission to execute the command in the Unicenter TNG or NSM environment. Do this by entering the following from the Console command line and then executing the command:

    /int cmd


Held Messages Do Not Appear in Next-Day Console


Symptom: Held messages do not appear in the console for the next day, even when they were not acknowledged.

Solution: All held messages are carried over to the console for the next day unless they have been acknowledged. There is no limit to how many days these held messages are retained. However, if the Event Management daemon is not running during the new-day rollover (at midnight), those held messages are not carried over to the next day. For example, if you have held messages on Friday and you shut down Event Management over the weekend, these held messages are not carried over to the Monday console log (although they are still recorded in the Friday console log).

Unable to Execute opreload


Symptom: An opreload command is successful when you enter it on the Event Management System Console command line, but not from the command prompt, where you receive a "not found" error.

Solution: Refresh the active message record and message action lists with the definitions stored in the Enterprise Management database by entering the following command:

    oprcmd opreload

Console Does Not Refresh


Symptom: The Console Log does not refresh.

Solution: Try the following solutions:
- Verify that the autoscroll option has been enabled on the Console.
- Make sure that Event is active by running the unifstat command on the server.
- Verify that caiopr is running on UNIX.
- Verify that cautil select conlog and list conlog are working.
- Verify that the time setting is correct on both UNIX and NT.
- Verify that the CAICCI connection is active on both the client and the server. On Windows, run the following command:

    ccicntrl status

  For a connection from Windows to Windows, Transport and Windows-Server or Windows-Client should be running. For a connection from Windows to UNIX, Remote should be running.


- On UNIX, enter the following and make sure that ccirmtd is running:

    ps -ef | grep cci

- Run the following command on the receiver node:

    ccir

- Run the following command on the sending node:

    ccis node_name number_of_msgs

- Run the following commands:

    netstat | grep caic
    ccinet status

- Increase the CAICCI time-out setting.

Console Messages Not Being Forwarded


Symptom: Messages are not forwarded from a server to the Event console.

Solution: You can do several things to confirm that a managed server and the Event Management console are functioning properly:

1. Send a message from each machine to the other using a cawto command. Confirm that the message was received by looking at the console. The syntax of cawto is:

    cawto -n node_name test_message

2. Define a policy on either or both servers to forward a received message to the other server. A policy consists of a message record (identifying the message to be intercepted) and the corresponding message action (indicating the action to be performed upon receipt of the message).

If either of these tests is successful, you can be reasonably sure that the event managers and agents are functioning properly. If these tests were not successful, do the following:

3. Make sure that you can ping the object by both IP address and host name. If you cannot ping by host name, add the host name to the DNS, WINS, or hosts file, depending on your setup. If neither ping works, you may have a problem with the network.
4. If you are able to ping the target servers, execute the oprping command. Oprping is similar to ping, but it uses the common communication interface (CAICCI). The syntax of oprping on Windows is:

    oprping target_server number_of_pings test_message

   The syntax of oprping on UNIX is:

    oprping node


5. If the oprping command is successful, Security may be preventing the message from being forwarded. Make sure that the ID issuing the command to the target server is listed in that server's CA_OPR_AUTH_LIST.

If you are still unable to determine why your commands are not functioning properly, and if the oprping was not successful, verify that CAICCI is functioning correctly:
- On Windows, run the following command:

    ccicntrl status

  For a connection from Windows to Windows, Transport and Windows-Server or Windows-Client should be running. For a connection from Windows to UNIX, Remote should be running.
- On UNIX, enter the following and make sure that ccirmtd is running:

    ps -ef | grep cci

- Run the following command on the receiver node:

    ccir

- Run the following command on the sending node:

    ccis node_name number_of_msgs

- Run the following commands:

    netstat | grep caic
    ccinet status
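Step 3 of the procedure above asks whether the target can be reached by both IP address and host name. A quick way to separate a name resolution problem from a network problem is to check what the host name actually resolves to. The sketch below is illustrative only and uses Python's standard socket module; the function name and the example addresses are hypothetical, and the resolver argument exists only so the check can be exercised without a network.

```python
import socket


def name_matches_ip(host_name, expected_ip, resolver=socket.gethostbyname_ex):
    """Return True if host_name resolves to expected_ip.

    By default the operating system's resolver is used, so the answer
    reflects whatever DNS, WINS, or hosts file setup the machine has.
    """
    try:
        _name, _aliases, addresses = resolver(host_name)
    except OSError:
        # The name did not resolve at all: add it to DNS, WINS, or the
        # hosts file, depending on your setup.
        return False
    return expected_ip in addresses


if __name__ == "__main__":
    # Placeholder values; substitute the managed server and its address.
    print(name_matches_ip("localhost", "127.0.0.1"))
```

If the name resolves to the expected address and ping by IP address also works, the next thing to test is CAICCI itself with oprping, as described in step 4.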

Message Record Action Banner is not Functional


Symptom: On a Windows 2003 machine with CA NSM r11.1 installed, a Message Record Action (MRA) has been defined specifying the BANNER action. However, when it executes, there seems to be an invisible banner scrolling at the top of the screen.

Solution: In a Windows 2003 environment, windows and/or dialogs displayed on the interactive desktop by the Unicenter service may appear transparent and cannot be interacted with. This may happen when the Unicenter service is started automatically under a user name (i.e., not Local System). This problem may disappear if the Unicenter service is manually restarted, but may then recur if you log off and log back on again. This behavior may be exhibited by the BANNER and COMMAND (with the /int option) actions and by the CABANNER and OPRCMD (with the /int option) commands.


This issue is caused by a change in behavior of the Windows operating system beginning with Windows Server 2003, and Microsoft is in the process of further changing this behavior to increase security. For more information on these changes, see the following Microsoft articles:
- Q171890 INFO: Services, Desktops and Windows Stations
- Q327618 INFO: Security, Services and the Interactive Desktop
- Q165194 INFO: CreateProcessAsUser() Windowstations, and Desktops

As Microsoft makes these changes and more in the future, CA will work to update CA NSM software in order to maintain previous levels of functionality with the newest operating system specifications.

MCC Issues
The following topics are related to problems with the CA NSM UIs, including WorldView and the Management Command Center (MCC). In addition to this section, you should also consult the "Troubleshooting" chapter in the Inside Systems Management Guide for additional information on such topics as:
- Agent View Message: Could Not Connect to ORB
- Agents Do Not Appear in Management Command Center or WorldView
- Agent View Message: No Response for this Request
- Inconsistent Agent Status
- Mismatched Community Strings
- DSM Policy Not Loaded
- Abrowser Does Not Open
- Abrowser Starts but Gives Error "Could not Connect to ORB"
- Abrowser Starts but Gives Error "No Configuration File Specified"
- Abrowser Starts but no Values can be Altered
- Abrowser Starts but Gives Error "Could Not Read in Configuration File"


The RMI Connection to xxx has been lost. Please restart the Management Command Center to access this namespace
Symptom: While using the MCC, this message pops up.

Solution: If this happens, then generally a problem occurred communicating with your rmi_server.exe. To resolve this, close the MCC as well as any other instances accessing the MDB at that moment. Then, from a command prompt on the MDB box, issue a camclose. Keep issuing camclose until you get the "Unable to connect to CAM server" response:

    D:\>camclose
    camclose: server closed.
    D:\>camclose
    camclose: select failed (15) Unable to connect to CAM server

To verify that CAM is down, check Task Manager to ensure that rmi_server.exe is not running. Now issue a cam start.

Note: If you have System Performance installed, a camclose will stop the services associated with SP. You will need to restart them in services.msc:
- CA Systems Performance Distribution Server
- CA Systems Performance Domain Server

If the performance agents are on this box as well, you need to start them:

    hpaagent start
    prfagent start
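The "keep issuing camclose" step can be scripted. The following is a minimal sketch, assuming camclose is on the PATH and that the final response contains the text quoted above. Whether camclose writes to standard output or standard error is not stated in this guide, so the sketch reads both. The function names and the attempt limit are hypothetical.

```python
import subprocess

# Response text quoted in this guide that indicates CAM is fully closed.
DONE_MARKER = "Unable to connect to CAM server"


def run_camclose():
    """Run camclose once and return its combined output (assumes it is on PATH)."""
    result = subprocess.run(
        ["camclose"], capture_output=True, text=True, check=False
    )
    return result.stdout + result.stderr


def close_cam(run=run_camclose, max_attempts=10):
    """Call camclose until the 'Unable to connect' response appears.

    Returns the number of attempts used. Raises RuntimeError if the marker
    never appears within max_attempts (a hypothetical safety limit).
    """
    for attempt in range(1, max_attempts + 1):
        if DONE_MARKER in run():
            return attempt
    raise RuntimeError("CAM still responding after %d attempts" % max_attempts)
```

Calling close_cam() with no arguments runs the real command. The script does not replace the Task Manager check for rmi_server.exe or the restart of the Systems Performance services described in the note above.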


The colors in the MCC are not propagating correctly up the LHP tree -or- Changes are not being reflected in the MCC
Symptom: The colors in the MCC are not being propagated correctly. For example, even though you know that an object is really down and has a status of down, the color of that object is not correctly reflected. Alternatively, if you create a new object, that object is not displayed in the MCC.

Solution: In general, symptoms such as these are related to Sevprop. The first thing you should check is whether you still have a valid connection to the catalog machine. Within the MCC, do a File > Connect and enter your Master Catalog machine name. If you don't get any errors, go to the next step. If you do get an error, go through the solution steps mentioned in the previous issue.

To recycle sevprop, first close all instances of the MCC that are running. Then recycle the CA WorldView Severity Propagation Service either through services.msc or from a command prompt:

    sevprop stop
    sevprop start

Note: Make sure that, after issuing the sevprop stop, the following processes actually stopped running:
- sevprop.exe
- sevpropcom.exe
- startbpv.exe

If this did not resolve the problem, follow the solution described in the previous issue.


When I select the Console Log plugin on the Right Hand Pane in the MCC, I get a message Cannot find EM Manager Cell xxx
Symptom: While in the MCC Topology View, selecting the Console Log plugin in the right-hand pane results in the following error message: Cannot find EM Manager Cell <server name>.

Solution: This can occur when DIA is installed in non-DNS environments. The error occurs because the EM-Server property of the selected WV object is not set in Fully Qualified Domain Name (FQDN) format. For the MCC to locate the EM server via DIA, the EM-Server name has to be in FQDN format. The first system reference entry in the etc\hosts file is taken to set the EM-Server property. To fix this, add the FQDN to the \etc\hosts file and either try an ipconfig /flushdns or recycle the system. The EM-Server property in WV will then update automatically and the plugin will work.


Newly installed component icons don't show correctly

Symptom: After installing a new component (e.g., URM, ASM) that includes a custom icon, when the MCC is opened, the new icon is not correctly shown next to the new component's respective object.

Solution: CAM needs to be recycled to allow it to load the new icons. If CAM is not recycled, the new component icons are not displayed correctly. To fix this, close the MCC and issue a camclose at the command line. Keep issuing camclose until you get the "Unable to connect to CAM server" response:

    D:\>camclose
    camclose: server closed.
    D:\>camclose
    camclose: select failed (15) Unable to connect to CAM server

To make sure it came down, check Task Manager to verify that rmi_server.exe is not running. Now issue a cam start.

Note: If you have System Performance installed, a camclose will stop the services associated with SP. You will need to restart them in services.msc:
- CA Systems Performance Distribution Server
- CA Systems Performance Domain Server

If the performance agents are on this box as well, you need to start them:

    hpaagent start
    prfagent start

Cannot edit Message Records in MCC (Command execution denied)


Symptom: When editing Message Records and Actions as a non-authorized user, you get the error message %CAOP_E_513 Command execution denied: not authorized by security.

Solution: To resolve, verify that the user that was specified when the Enterprise Management plugin was first opened in the MCC is included in the list of users authorized to issue commands (caugui settings, CA_OPR_AUTH_LIST). Alternatively, you can set Check authorized users list for MSGRECORD (CA_OPR_DB_CHECK_AUTH) to NO and then restart the CA-Unicenter service to apply the changed settings.


In the MCC left hand pane, there is no Alert plugin available or Console Log Plugin
Symptom: The Alert drop-down or the Console Log plugin is not available in the LHP of the MCC.

Solution: AMS utilizes the DIA protocol for communication between the MCC and the AMS manager. If DIA has not been configured, the Alert drop-down that should be available in the left-hand pane within the MCC will not be available, and when the MCC is started you may receive a DIA warning dialog box. More information on DIA concepts can be found in the NSM r11 Implementation Guide, Appendix A: DIA Reference.

No alerts are created


Symptom: No alerts are created.

Solution: Do the following:
1. Check if the alert exists in the ams_lv1_alerts database table. If it does but cannot be seen in the MCC, check DIA. If the alert is not in the database table, go to step 2.
2. Confirm that the alert class being used is active and that the expiration date is valid.
3. Stop the AMS service and run the command to recreate the alert.
4. Check that a file has been created in the following location:

    \Program Files\CA\SharedComponents\CCS\WVEM\amshold

   If the file has been created, this confirms that EM has called the alert action, so you can move on to the next step. If the file is not created, check EM and the MRA.
5. Start the AMS service and then confirm that the file is removed from the AMS hold folder.

To debug AMS, run the cautrace command to start the trace GUI and run the caamssrv traceon command to put AMS in debug mode.
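Steps 3 to 5 above come down to comparing the contents of the AMS hold folder before and after the alert is recreated. The following is a minimal sketch of that comparison using only the Python standard library. The drive letter in the path is an assumption (the guide gives the path without one), and the function names are hypothetical.

```python
import os

# Path as given in this guide; the C: drive letter is an assumption.
AMS_HOLD_DIR = r"C:\Program Files\CA\SharedComponents\CCS\WVEM\amshold"


def snapshot(directory):
    """Return the set of file names currently in directory."""
    return set(os.listdir(directory))


def new_files(before, after):
    """Return files that appeared between two snapshots, sorted by name."""
    return sorted(after - before)
```

A typical sequence would be: stop the AMS service, take a snapshot of AMS_HOLD_DIR, recreate the alert, take a second snapshot, and pass both to new_files. An empty result points at EM or the MRA. A non-empty result means EM called the alert action, and the same files should disappear once AMS is started again.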


In the MCC Topology there is nothing under the WorldView object


Symptom: When you go into the MCC Topology, there are no objects under WorldView (the first object in the LHP tree).

Solution: This generally means you did not select WorldView Provider during the NSM installation. The provider is needed so that the MCC can communicate with that component. Re-run the installation, selecting only the WV Provider, and run through the setup.


System Performance Issues


The following topics pertain to System Performance.

A performance object is shown as blue unknown state in nodeview


Symptom: The performance object is reporting a "blue" unknown state in NodeView. While in NodeView you see the message: Agent:HpxAgent reports machine.domain.com, CA Cube Store Group,% of Allowed Cube Store Space Used, has changed state from ok to unknown with value 0.000.

Solution: This is a normal response after the performance agent first starts up because, at this point, the performance agent hasn't collected enough data yet, so its cube size is in an unknown state. If cube data exists, another possible reason is that the default profile wasn't updated to deliver the cube data to a machine, which also leaves the cube size in an unknown state. That is why it is blue in NodeView, which means unknown. If you have applied the default profile, each polling cycle takes around 20 minutes, and it will deliver the profile to the machine you designate. If you update the Default Profile so it delivers the cube and give it some time, it will change to a known state.

I don't see the MIB values I associated to a specific class available in Trend
Symptom: You ran through the Associate MIB wizard in the Performance Configuration GUI for a particular class. You saved it, updated your profiles, and delivered that profile. An ample amount of time has passed, yet when you bring up Performance Trend, none of the new MIB values you associated are available to report on.


Solution: The agent will only collect SNMP data for itself if this functionality is switched on under the machine properties in Performance Configuration. To do this:
1. Locate the machine in the network tree.
2. Right-click and select Properties.
3. Select the SNMP Proxies tab.
4. Select the Collect SNMP Resources for this machine check box.
5. Redeliver the profile to the agent machine.

Unicenter Configuration Manager


The following topics pertain to the Unicenter Configuration Manager (UCM).

When I deliver a profile from UCM, the agent doesn't get the profile.
Symptom: A profile delivered from UCM is not received by an agent.

Solution: On the target machine, check to make sure that the agtctrlcell is up and running. This cell is responsible for putting the configuration on the agent. If the agtctrlcell is running, check to make sure that it is also registered with the DNA on that machine by using diatool to connect to the target host. If it is registered correctly, there should be a green check mark next to the agtctrlcell.


If the agent has an exclamation mark through it, it is currently in a failed state. You can try stopping CA DIA 1.2 DNA and then restarting it to see if that fixes the problem. If not, you need to reregister the agtctrlcell. Please refer to the next chapter for information on how to do this.

General/Miscellaneous Issues
The following section contains general symptoms and solutions that are not specific to a particular component or function.

Cannot connect to UBI/IIS on Windows 2003 (HTTP 404)


Symptom: When Unicenter Browser Interface (UBI) services are installed on Windows 2003 with IIS, clients cannot connect and receive HTTP error 404.

Solution: You need to set All Unknown CGI Extensions to Allow. You can find that parameter by opening Computer Management and navigating to Internet Information Services Manager > Web Service Extensions.

IP Address returned by hostname doesn't match IP returned by DNS


Symptom: The IP address returned by hostname doesn't match the IP returned by DNS. Use the 'hostname -i' command to verify.

Solution: Inspect your /etc/nsswitch.conf and /etc/hosts files. Do the following:

1. Check that /etc/resolv.conf has a valid domain name and a name server:

    cat /etc/resolv.conf
    domain abcco.com


    nameserver 101.200.20.303

2. Make sure /etc/nsswitch.conf has the entry "hosts: dns files" to get the host name from DNS first:

    /etc/nsswitch.conf
    hosts: dns files

3. Make sure /etc/hosts has a valid IP address along with the box name. This is required in case resolving the host name from DNS fails.

    cat /etc/hosts
    : localhost
    138.42.147.12 machine1.domain.com machine1

4. Check the host name (and check this against the entry in /etc/hosts).

Linux only:

5. Check whether the interfaces are active:

    /etc/init.d/network status

6. If at least one of the interfaces (eth* interface) is not UP, start the interface using:

    /etc/init.d/network start

7. hostname -i
8. host hostname (DNS is set up correctly if the results of 7 and 8 match)

UNIX:

5. nslookup hostname
6. ifconfig -a (get the IP address)

Check whether the interface(s) is up. The IP address returned from step 5 above should be for one of the interfaces that are UP.
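Steps 2 and 3 above can be checked mechanically once the contents of the two files are in hand. The sketch below is one possible way to do that in Python. It works on the text of the files rather than opening fixed paths, so it can be tried against copies. The function names are hypothetical, and the parsing assumes the common whitespace-separated layout of nsswitch.conf and /etc/hosts.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the sources on the 'hosts:' line of nsswitch.conf, in order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def dns_before_files(nsswitch_text):
    """Return True if 'dns' is listed ahead of 'files', as step 2 recommends."""
    order = hosts_lookup_order(nsswitch_text)
    return "dns" in order and (
        "files" not in order or order.index("dns") < order.index("files")
    )


def hosts_file_ips(hosts_text, name):
    """Return every IP address that the hosts file maps to name or an alias."""
    ips = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            ips.append(fields[0])
    return ips
```

For example, reading /etc/hosts into a string and calling hosts_file_ips(text, "machine1") gives the address or addresses to compare with the output of hostname -i and of the DNS lookup.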


Chapter 4: Troubleshooting DIA


The information previously contained in this chapter can now be found in the DIA Supplemental Implementation Topics for NSM r11.1 and r11.2 which is available for download from the CA NSM Home page on the CA Support website.

Where Can I Find Additional Information on DIA?


Refer to the DIA chapter in the Unicenter NSM Implementation Guide.


Chapter 5: Working with Support


If you have followed the guidelines provided in this document and are still unable to diagnose your CA NSM problem, you should contact CA Technical Support. This document identifies the procedures and guidelines for working with Technical Support. Information on Support offerings can be found at the following link: https://support.ca.com/irj/portal/anonymous/phpdocs?filePath=0/common/tsofferings.html

CA Technical Support Structure


When you contact Technical Support you will be asked to assign a severity level to your issue. Use the following list as a guideline in making this determination:

Severity  Description
1         A system down or product inoperative condition that is impacting your production system. This designation can also be used for product installation problems when they occur after hours, but only for critical systems products that need to be maintained and upgraded in narrow processing windows because of possible impact on product systems.
2         A suspected high-impact condition associated with the product. The software may operate but is severely restricted.
3         A question about product performance, or an intermittent or low-impact condition associated with the product. A majority of the software functions are still usable; however, some circumvention may be required to provide service.
4         A question about product use or installation.

The Technical Support Organization at CA provides support 24 hours a day, 7 days a week for all Severity 1 technical support calls. Review the Technical Support Policy posted on the Support web site for details regarding the service level objectives for each severity level.


Telephone Support
Contact information for CA Technical Support can be found on the http://support.ca.com site at the following link: https://support.ca.com/irj/portal/anonymous?NavigationTarget=navurl://036cebd7fd20d8d7505abb95e5ff120f

The telephone number listed connects you directly to the support center responsible for the product. During primary service hours, call this number to speak with a technician. If all technicians are busy at the time of your call, a receptionist will log and queue your call for a callback by a technician. The CA standard is to return all calls by the end of the same business day in priority sequence. If a call is received near the end of the day and you cannot be reached, your call will be returned the morning of the next business day. If you are calling with a severity 1 problem or need immediate assistance, you should always inform the receptionist so that a technician can be made available to take your call immediately.

Important! If you are calling from outside of North America, use the direct numbers during primary service hours. Call (631) 342-4683 for severity 1 problems only, during emergency service hours.

When contacting Customer Support, have the following information available:
- The product name, version, operating system or platform, and general description of the problem
- Your name and telephone number
- Your company site ID
- Any documentation that may help in resolving the problem

Use the guidelines provided in the "Basic Troubleshooting" chapter of this document to help identify the problem (for example, when it occurred, what changed in your environment, which machines are impacted, and under what circumstances the problem occurs).

Problem Tracking and Web Support


If you have access to the Internet and would prefer to manage your technical issues through the web, you can reach CA Technical Support directly from our home page (www.ca.com). In addition to opening and tracking technical support issues, the CA Support web site provides easy access to product technical documents, product bulletins, support forums and other online resources to address your issues and provide up to date product information. The CA web-based online support is available 24 hours a day, 7 days a week.


Unicenter NSM Diagnostics Guide

Last Update: September 30, 2008

CA Technical Support Structure

Regardless of the route you use to contact Technical Support, your issue is entered into the StarTrak program, which is used to collect, record, disseminate, and track data related to client support requests worldwide. When you open an issue with Technical Support, the tracking number you are given is the StarTrak number. Refer to this number in subsequent support contacts to speed the response.

CA Technical Support Organization


CA provides a multilevel approach to Technical Support that is proven to deliver timely response to all issues, 24 x 7 response to critical issues and the greatest client satisfaction. In general, Technical Support is composed of the following:

Level 1 Technical Support ("Technology Consultants")
Level 1 technicians are the first employees you encounter when you request technical assistance for your CA software, either through our telephone support numbers or our online support tools. Trained in all aspects of the products, they handle the majority of questions and problems reported and, in most cases, provide immediate assistance.

Level 2 Technical Support
This second tier includes programmers dedicated to problem diagnosis and resolution. These technicians specialize in specific areas of the code to provide successful usage techniques and any necessary program patches.

All levels of the client support staff function as part of a larger unit of resources that is completely focused upon client satisfaction. Should a technician require additional assistance during any phase of any support call, they will work with the larger team to access the right resource for the most effective and efficient service.

Escalation
If you need to escalate the severity level of your problem, request an escalation from the CA Support Center working on your problem. If you feel that your problem is not being adequately addressed, you can escalate your concerns by requesting to speak with the manager responsible for the technician assigned to your issue. If the issue is not assigned to a technician, you can request to speak with a manager.


Chapter 5: Working with Support


Chapter 6: Tools for Troubleshooting


This chapter identifies several tools that you can use to monitor and troubleshoot your CA NSM implementation. It is divided into the following sections:
- Monitoring your environment. Includes both GUI and Report-based approaches.
- Verifying functionality. Identifies tools for verifying that managers, agents, databases and their associated services and communications are functional.
- Using debug mode and diagnostic trace. Includes trace syntax for all CA NSM components as well as the Common Services components.

Monitoring your Environment


Knowing how and what to monitor in your environment provides you with two clear advantages:
- Familiarity with how the various components of CA NSM r11.x behave when they are functioning properly. This is key to pinpointing the exact scope and nature of any problems when they are not functioning properly.
- Increased likelihood of catching a potentially big problem while it is still small.

CA NSM r11.x has several built-in monitoring tools that you can use to monitor the health of CA NSM itself. GUI-based applications, such as the Real World Interface and the Event Console, provide real-time monitoring capabilities, while online or printed reports can provide a historical accounting of CA NSM r11.x activities.

CPU Bottlenecks
One common problem to watch for is CPU bottlenecks, which can cause processing delays and timeouts. You can identify potential CPU bottlenecks on the MS-SQL Server on which the MDB resides by installing the MS-SQL Agent and monitoring the following:
- Processor object, % Processor Time counter
- System object, % Total Processor Time counter

In general, if usage continuously exceeds 80%, a CPU bottleneck is likely.
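The 80% guideline above can be sketched as a small check over a series of counter readings. This is only an illustration: the function name, the sample values, and the choice of five consecutive readings as the meaning of "continuously" are assumptions, not part of CA NSM or the MS-SQL Agent.

```python
def sustained_bottleneck(samples, threshold=80.0, min_consecutive=5):
    """Return True if `min_consecutive` readings in a row exceed `threshold`.

    `samples` is a sequence of % Processor Time readings taken at a fixed
    interval. The window length is an assumed interpretation of
    "continuously exceeds 80%".
    """
    run = 0
    for value in samples:
        # Extend the current run of high readings, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical readings, for illustration only.
readings = [62.0, 85.5, 91.0, 88.2, 93.4, 86.1, 70.3]
print(sustained_bottleneck(readings))
```

A single spike above the threshold does not trigger the check; only an unbroken run of high readings does, which matches the intent of watching for sustained load rather than short bursts.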


Event Management
One of the most obvious ways to monitor your enterprise is by checking the Event Management console. It is recommended that you employ filtering techniques to minimize the message traffic and highlight messages that warrant action. If DSM messages are sent to the Event Console, look for messages containing the words Critical or Down to identify items that have gone into Critical status or devices that are down.

Note: If you are using an Event Agent, keep in mind that, although the Event Agent machine has an Event DSB, it does not have a caioprdb database. Therefore, to review the DSB in effect for a particular Event Agent, enter the following commands:

oprdb list dsb_filename
oprdb script db > c:\temp\cautili.txt

These commands display the event database in cautil format.

WorldView
Although the classic change in the color of an object on Node View provides a visual and obvious indication of trouble with a managed object, you can also use the Real World Interface in less obvious ways. For example, careful monitoring of the number of objects managed by a single DSM machine alerts you to potential scalability issues when that number becomes excessive.

WorldView provides several ways to determine the number of objects being managed by a particular DSM:
- Open DSM View (obrowser), select Query Option, and click Search. With no filtering selected, the number of matches the query returns equals the number of managed objects.
- Open Node View and click the microscope icon. Once again, the number of matches equals the number of managed objects.
- Run dsm_report to generate a report of managed objects. This command creates a CSV file (dsmrpt.csv) in the current directory.
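Once dsm_report has produced dsmrpt.csv, the object count can be read from the file rather than from the GUI. The following is a minimal sketch that assumes one managed object per row and, by default, a single header row; the actual column layout of dsmrpt.csv is not described here, so both are assumptions to verify against your own file. The function name is hypothetical.

```python
import csv


def count_managed_objects(csv_path, has_header=True):
    """Count non-empty rows in a dsm_report CSV file.

    Assumes one managed object per row. Set has_header=False if the file
    has no header row (the real layout is not documented in this guide).
    """
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle)
                if any(cell.strip() for cell in row)]
    if has_header and rows:
        rows = rows[1:]
    return len(rows)
```

For example, calling count_managed_objects("dsmrpt.csv") from the directory in which dsm_report was run returns the number of data rows, which can be tracked over time to spot a DSM that is approaching an excessive object count.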

Agent Technology
Agent Technology includes several SNMP diagnostic utilities that you can use to test and clarify the details of your AT setup. These include:

awget
Returns the value of a specific SNMP attribute. The syntax is as follows:


awget [-h hostname|IP-address] [-c community] [-p port|service-name] [-t timeout] [-d loglevel] [-f logfile] -o oid

awnext
Returns the value of the next SNMP attribute from the one specified. The syntax is as follows:
awnext [-h hostname|IP-address] [-c community] [-p port|service-name] [-t timeout] [-d loglevel] [-f logfile] [-o] oid

awtrap
Tests a manager's ability to process an agent's traps without actually running the agent. The syntax is as follows:
awtrap [-f from] [-h destination] [-p port] [-c community] enterprise-type [subtype] [oid type value]+

awwalk
Retrieves the value of every instance of every attribute defined in the MIB from the specified OID through the last OID in the tree. It is the equivalent of repeated executions of awnext. The syntax is as follows:
awwalk [-h hostname|IP-address] [-c community] [-p port|service-name] [-o] oid

For more information about these commands, including additional syntax and examples, see the online CA Reference.
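When these utilities are called from a script, it can help to assemble the argument list in one place. The sketch below builds an awget command line using only the options shown in the syntax above; the helper name, the host name, the community string, and the OID (the standard MIB-II sysDescr.0) are illustrative assumptions.

```python
def build_awget_args(oid, host=None, community=None, port=None, timeout=None):
    """Assemble an awget argument list from the documented options.

    Only -h, -c, -p, -t and -o from the syntax line are used. Options left
    as None are omitted so that awget applies its own defaults.
    """
    args = ["awget"]
    for flag, value in (("-h", host), ("-c", community),
                        ("-p", port), ("-t", timeout)):
        if value is not None:
            args += [flag, str(value)]
    args += ["-o", oid]
    return args


# Placeholder host and community, for illustration only.
print(build_awget_args("1.3.6.1.2.1.1.1.0", host="agenthost", community="public"))
```

The resulting list can be passed to a process launcher such as subprocess.run on a machine where Agent Technology is installed.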

Running Reports
Reports provide a view of activity in your enterprise over a period of time, enabling you to detect patterns that may indicate problems. CA NSM r11.x includes a variety of reports that can be helpful in troubleshooting your enterprise. Job Management, for example, includes a utility that invokes a simulated autoscan process and produces a set of detailed reports that identify:
- Jobs that would be selected, executed late, or carried over to the next day (backlogged)
- Date and time that each job would be processed
- Location where each job would be processed
- Resources required for each job
- Amount of utilization for each report

Whenever you define or update jobs or jobsets in Job Management, run this report to ensure that the job definitions you provided actually have the result that you intended.


In addition to standard reports, CA NSM r11.x includes a number of commands that can generate data files listing detailed information about settings and activities for a particular component or components. These include:

cautenv dumpini (Windows NT/Windows 2000 Unicenter Manager)
Enables you to display and dynamically modify the CA NSM EM environment variables. You can also direct output to a text file for future reference. On UNIX, you can check the output of the env command for the user that starts Enterprise Management.

caiserv
Creates a comprehensive file detailing your EM environment. This includes the CA NSM history files, logs, general system information (such as system variables) and specific component details (such as causec for Security).

cauexpr.exe
Copies Job Management definitions to a text file, cauexpr.txt, which you can then upload using cautil.

dsbulist
Displays the Security Decision Support Binary (DSB) cache and files.

dsm_report
Writes a record of every object in the DSM store to a CSV file (dsmrpt.csv). The syntax is:
dsm_report -a agentClass, agentClass. | -c | -@hostname | -v | -o filename | -h hostname

In verbose mode, dsm_report provides detailed information, which may be enough to eliminate the need for running storectrl.


oprdb script db > c:\temp\cautili.txt
Gets a copy of all your message records and actions in cautil format. You can then use the output file to build other Event Management machines and load the same message records and actions there.

oprdb list
Lists the contents of your Event Management DSB as a summary (not suitable for cautil). The output can be redirected to a text file.

whohas
Displays the policies defined for assets in Security Management.

whathas
Displays the policies defined for user IDs in Security Management.

For more syntax and explanation, see the online Administrator Guide, CA Reference, and CA Procedures.

Checking History and Log Files


History files and log files are the ideal places to check for changes in system or component behavior that may lead to problems with your CA NSM implementation. Most log files, including the Event Console, Discovery Logs, and WorldView setup logs, are in Install_Path/logs. The Event Console usually includes three log files for each day with the extensions .log, .ldx and .idx.

Log file locations for Agent Technology are:

\CA\SharedComponents\CCS\AT\AGENTS\LOG
Contains separate log files for each installed agent.

\CA\SharedComponents\CCS\AT\AGENTS\LOG\hpa
Contains log files for the Historical Performance Agent.

\CA\SharedComponents\CCS\AT\SERVICES\VAR\LOG
Contains log files for the various Agent Technology CA Common Services components such as aws_sadmin.log and aws_orb.log.

Log file locations for Common Services components are:

\Program Files\CA\SharedComponents\CAM\logs
Contains information, warning, and error messages generated by the CAM server, which provides inter-machine communications. The files are named: dg0##.


The applyptf utility generates a history file which you can review to identify which patches were applied and when. This file, named machinename.his, can be found in the root of the install directory. It is in ASCII text format and, if needed, can be forwarded to Support for review as is. Consider the following sample entry from the history file:
[FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
RELEASE=2.4
GENLEVEL=GA
COMPONENT=TNGCC
PREREQS=
MPREREQS=
MDBQS=
SUPERSEDE=
INSTALLEDFILE= /uni/cci/bin/caiccid
INSTALLEDFILE= /uni/cci/bin/cci
INSTALLEDFILE= /uni/cci/bin/cciclnd
INSTALLEDFILE= /uni/cci/bin/ccicntrl
INSTALLEDFILE= /uni/cci/bin/ccirmtd
INSTALLEDFILE= /uni/cci/bin/libcci.so
INSTALLEDFILE= /uni/cci/bin/rmt
INSTALLEDFILE= /uni/cci/bin/rmtcntrl

Line 1 indicates the patch number, along with the date and time it was applied. Lines 2 and 3 identify the release and genlevel of the patch. Line 4 identifies the target component, in this case CAICCI (TNGCC). Lines 5 through 7 identify any prerequisites, master image prerequisites and corequisites. Line 8 identifies any other patches that might be superseded by this patch. Each remaining line, prefixed by INSTALLEDFILE, denotes a file that has been replaced.
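Because the history file is plain ASCII text, it is easy to summarize with a short script before forwarding it to Support. The following sketch parses entries of the form shown in the sample, assuming one KEY=value pair per line and a header line that begins with a bracketed timestamp followed by "PTF Wizard installed". The function name and the dictionary layout are illustrative choices, and real history files may contain other line types that this sketch ignores.

```python
import re

# Header line, for example: [FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
HEADER = re.compile(r"^\[(?P<when>[^\]]+)\]\s+PTF Wizard installed\s+(?P<patch>\S+)")


def parse_history(text):
    """Return a list of patch entries found in applyptf history text."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # Start a new entry at each header line.
            current = {"patch": match.group("patch"),
                       "applied": match.group("when"),
                       "fields": {},
                       "files": []}
            entries.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            if key == "INSTALLEDFILE":
                current["files"].append(value)
            else:
                current["fields"][key] = value
    return entries


sample = """[FRI Jan 10 14:27:21 2003] PTF Wizard installed LO91233 (TNGEM)
RELEASE=2.4
GENLEVEL=GA
COMPONENT=TNGCC
SUPERSEDE=
INSTALLEDFILE= /uni/cci/bin/caiccid
INSTALLEDFILE= /uni/cci/bin/ccirmtd
"""

for entry in parse_history(sample):
    print(entry["patch"], entry["applied"], entry["fields"]["COMPONENT"], len(entry["files"]))
```

Grouping the INSTALLEDFILE lines per patch makes it straightforward to answer which patch last replaced a given binary.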

Verifying Functionality
Pulse check commands identify whether a component and its required services and machines are functional. The following sections identify such commands:
- Components and co-requisite components
- Databases
- Communications


Components
When checking on the functionality of a CA NSM component, such as Job Management, you should also verify that any co-requisite components are fully functioning. For example, Job Management uses CAICCI to manage cross-platform job scheduling. If CAICCI is not properly configured, this impacts Job Management.

Related components and required services are:

Primary Component: Agent Technology
Related Components: WorldView
Required Services: Services Control Mgr. (awservices), DSB (aws_orb), SNMP Gateway (aws_snmp), trap mux (aws_listen), object store (aws_store), DSM (aws_dsm), WorldView DSM Gateway (aws_wvgate)

Event Management Performance Management

CAICCI (required) Calendar (required) Calendar (optional) Agent Technology MS-Excel (required for Performance Trend and Chargeback) Job Management (for batch jobs to generate charts) CAM, CAFT (required) CAIENF, CAISSF (required) CAICCI (required), CAFT (optional), CAIENF (required) Event Management (optional) Calendar (required for Job Manager but not for Job Agent) Agent Technology MDB, WorldView Gateway Service, DSM Gateway, Severity Propagation Service Domain Server (pmdmnsrvr) Distribution Server (pmdstrbsrvr)

Security Management Job Management

WorldView

Use the commands described next to verify the health of these components and services.


Agent Technology and WorldView


CA NSM includes the following commands to verify Agent Technology and WorldView functionality:
- awservices
- orbctrl
- servicectrl
- storectrl
- wvgethosts

See the online CA Reference for detailed syntax and usage.

awservices

The syntax is as follows:
awservices list

If awservices is stopped, you are simply told this. If it is running, you see a breakdown of the services that are running. To list the version of all binaries included in the awservices status command, execute the following command:
awservices version

This also lists the agent versions.

orbctrl

This command line utility lists the services and agents that have attached to the instance of the Distributed Services Bus running on the specified system. It is a useful debugging tool because it can be used to verify that the required services are running on a remote host. The syntax is:
orbctrl -@ servicesHost

Where servicesHost is the hostname or IP address of a remote node.

servicectrl

Run the servicectrl utility to manage the awservices configuration for Agent Technology. To list the operational status of an agent or service, run the following:
servicectrl status

The display indicates whether the service is running or stopped. The servicectrl utility can also start and stop remote agents with the following syntax:
servicectrl stop -remote=[machinename] -name=caiW2kOs
servicectrl start -remote=[machinename] -name=caiW2kOs


The remote orb must be running for servicectrl to work. If awservices is not up, servicectrl will fail.

storectrl

Run the storectrl command to display the information contained in any of the AT stores (asw_nsm, aws_sadmin and objstore). Consider the following example:

Below is an example of the type of information contained in the resulting temp1.txt file.

Note: These commands are case sensitive.

You can also get this information by running dsm_report in verbose mode.

wvgethosts

The wvgethosts command can extract a list of discovered hosts from the MDB and compare it to the DSM filter file. This can be useful for determining if a DSM is trying to manage a particular host. For example, if wvgethosts does not return the host machine you are trying to manage, then one of the following is probably true:


- The object is not classified properly in the MDB. The DSM does not try to manage objects that are unclassified.
- The discovered IP address is not in the range of that DSM's IP address scoping.

Syntax and Examples

wvgethosts has the following arguments:
wvgethosts [-n DSMServer|ALL] [-o nsm/hosts/agents] [-c class] [-r repositoryName -u user -p passwd] [-d dbglvl] [-f logfile]

Note: If -n is not specified, the local host is not selected.

For example, to get a list of all hosts managed by the DSM, run the following:
wvgethosts -o hosts

Use the -o nsm switch to extract all agents instead of hosts. You can also list all hosts in the MDB with the DSM_server name instead of limiting the command to one DSM. The results of this command can be piped to a file to make it easier to view a long list. Because wvgethosts queries the MDB for objects, MS-SQL is required, unless the wvdbt option is implemented.

Severity Propagation Service

The Severity Propagation Service is a key component for several applications. Therefore, if this service fails, applications will likely be affected. Severity Propagation Service consists of sevprop.exe and sevpropcom.exe. The main function of sevprop is to administer the service. In prior releases, sevprop carried out the function of sevpropcom as well.

To verify that the Severity Propagation Service is functioning correctly, do the following:
1. Verify that the sevprop.exe and sevpropcom.exe binaries reside in the \ca_appsw directory.
2. Verify that Severity Propagation is part of the Administration Group and has "Log on as a Batch Job" user rights.
3. If the Severity Propagation service is active, ensure that sevpropcom.exe or sevpro~1.exe is running.

Enterprise Management
The following commands check the status of Enterprise Management and its components: Enterprise Management functions, common objects, and services:


unifstat

Another command, for UNIX only, does the same thing:


ustat

Security Management (Windows only):


causec status

Job Management, logic of jobsets:


cau9test

Calendar Management, list of active calendars:


caladmin -l

Common Services
This section details the commands for checking the following common services:
- CAM and CAFT
- CAICCI

CAM and CAFT

The Unicenter Explorer interface uses CA Message Queuing (CAM) and CA File Transfer (CAFT). Use the following command to review the status in both directions. You should see at least the opposing machine under the host category. A high number of retries could indicate a potential problem.
camstat nodename

Use the following command to confirm communications between two machines.


camping -n machinename

Use the -s 8000 option to verify whether large packets can be sent. This can help indicate if UDP is causing the problem.

Use the following command to verify that CAM is operating successfully and to determine if the cam.cfg file includes the forward 127.0.0.1 command.
camcheck

Look for collect_message_spec( 127.0.0.1 ) called in the output. If CAM detects a configuration file error during startup, the fact is logged and the configuration record ignored. The camcheck program performs a syntax check on the configuration file. Blank lines and lines starting with a # (hash, pound or number) character are ignored. The cam configuration file (cam.cfg) is not present by default; however, you can build a cam.cfg file by executing the following command:


camsave persist

This builds a cam.cfg file in the following directory:


\Program Files\CA\SharedComponents\CAM

Note: See the online CA Reference for additional details about the cam.cfg file.

CAICCI

The commands for checking CAICCI are described below. On Windows, use the following command to verify the status of CAICCI:
ccicntrl status

In particular, verify that the remote and transport services are running. To administer remote CAICCI services, enter the following command:
rmtcntrl

To identify the CAICCI version, enter the following command:


ccinet release

To see if CAICCI remote is running on UNIX, enter the following command:


ps -ef | grep ccirmtd

Look for a response with the path of ccirmtd; do not be fooled by seeing a response for your grep. If you see the line with ccirmtd, CAICCI is running. To find out what version of ccirmtd is running on UNIX, enter:
what $CAIGLBL0000/cci/bin/ccirmtd
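The caution about the grep line in the ps check above can be automated when the check runs from a script. The sketch below inspects captured ps -ef output and reports a match only when a line names ccirmtd and is not the grep process itself. The function name and the sample process lines are hypothetical, and the output format of ps varies between UNIX platforms, so treat this as an illustration rather than a definitive test.

```python
def ccirmtd_running(ps_output):
    """Return True if ps output shows ccirmtd, ignoring the grep line."""
    for line in ps_output.splitlines():
        # Compare base names so that /uni/cci/bin/ccirmtd and /usr/bin/grep
        # are recognized regardless of their directory.
        names = [token.rsplit("/", 1)[-1] for token in line.split()]
        if "ccirmtd" in names and "grep" not in names:
            return True
    return False


# Hypothetical ps -ef lines, for illustration only.
sample = (
    "root   1234     1  0 10:02 ?      00:00:01 /uni/cci/bin/ccirmtd\n"
    "admin  5678  4321  0 10:05 pts/0  00:00:00 grep ccirmtd\n"
)
print(ccirmtd_running(sample))
```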

To see if a Star server is running on UNIX, enter:


unifstat | grep Star

To identify which machines or applications the machine can talk to, enter this command:
ccii

For consoles, ccii needs to list the UNIX machine with the application CA_STARUNIX_SERVER. If this entry is not present in the output of the ccii command, the star console cannot connect to the UNIX machine.

In addition to netstat, ping and nslookup, you can use the following commands to troubleshoot remote CAICCI connections:
- traceroute (UNIX) or tracert (Windows) to identify the route taken between two hosts. If the client cannot ping a host, this command can help identify where, along the network path, the failure is occurring.
  Note: You can also use the tracert command to identify the number of router hops between machines. It is recommended that your agents be located close to their DSMs. In general, they should be no more than two to three hops away.


- ccinet to pass commands to the ccirmtd daemon on UNIX (rmtcntrl on Windows NT). For example, ccinet ping can be used to send a special CAICCI test packet across the CAICCI connection, whereas ccinet status can be used to identify the status of the CAICCI connections.
- netstat -a to list all the network connections on the local box. Run it at the Command Prompt. If the command takes a long time to return any information, while the command netstat a completes quickly, then the system is having a problem resolving host names.

Databases
The MDB is critical to the functionality of your CA NSM implementation. On UNIX, you can use the caidbck program to get detailed statistics about a particular database. This program lists all the tablespaces and advises on capacity status, indicating which tablespaces are at or over 100%. If the values are high, run the schrecvr utility to clean up the database. It unloads, reloads, and clears logical errors.

To identify which version of the MDB is being used, enter the following command:

For UNIX:
RUN sql script \tnd\sql\INGVERSION.ING

For Windows:
isql -U tngsa -Q "select string from TNGD.dbo.tng_class_ext where name='class_version' and class_id=1"


Communications and Networks


In addition to verifying the health of each component, its related components and services, and its database, you need to ensure that communication across the nodes and through your network is unhindered. In fact, this is frequently the first area checked during troubleshooting because it is where many problems occur.

Following are several commands and utilities you can use to verify that your components can talk to each other as needed.

To diagnose connectivity between an agent node and manager, enter the following command for the node in question:
ping hostname|IP-address

To verify that DNS lookup is functioning correctly, ping the agent node and the manager node. In order to deliver traps from the agent node to the manager node, the communications path between agent and manager must be open. If your agent node configuration (aws_sadmin.cfg) file specifies manager nodes by hostname, use those hostnames in the ping. Enter the following command from both the agent and the manager:
ping -a IP-address

Verify that the correct hostname is returned.

The rping remote ping command is another useful tool that can be executed from a local manager machine to verify connectivity between another manager and the machines connected to it. You can also use the Agent Technology Remote Ping GUI to perform this function.

If you are unable to ping the target server using its name, you need to fix the name resolution issue first. If you are able to ping the target servers, the next step is to run the oprping command, which is similar to ping but uses CAICCI. The format of the oprping command is:
oprping target-server number-of-pings test-message
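The forward and reverse name checks described above (ping by hostname, then ping -a by address) can also be scripted with the operating system resolver. The following is a minimal sketch using the Python standard library; the function name is hypothetical, and comparing only the short host name is an assumed way of deciding that "the correct hostname is returned". The resolver functions are parameters so that the logic can be exercised without a live DNS.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an address, then resolve the address back.

    Returns (address, reverse_name, matches). `matches` compares the short
    host names case-insensitively, which is an assumption about what counts
    as the same host.
    """
    address = forward(hostname)
    reverse_name = reverse(address)[0]
    matches = (reverse_name.split(".")[0].lower()
               == hostname.split(".")[0].lower())
    return address, reverse_name, matches
```

Calling check_name_resolution("managerhost") on both the agent node and the manager node, with your own host names, raises an exception from the resolver if the lookup fails and returns matches as False if the reverse lookup yields a different host.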

To verify that you have SNMP communication to a specific machine, use ObjectView to browse the agent MIB manually through port 6665. ObjectView can be accessed in context or by executing the objview command. If you are unable to establish SNMP communication, the problem may be an incorrect SNMP agent configuration or other network issues, such as firewall use, or even a security policy (for example, the SNMP device has been restricted to respond only to requests issued from specific IP addresses).

On Windows, to verify CAICCI connectivity to a specific node, use the following command:
u0verify -d=nodename

If UNIAPP.MAP file is missing an entry, this may appear as the first line of the error.


To test communications in Agent Technology, use the awm_config and awm_catch commands together. They let you manually push and catch messages on the Distributed Services Bus (DSB). The command awm_config pushes messages onto the DSB. The syntax is as follows:
awm_config -@ remote-node

The command awm_catch lets you display or redirect DSB messages to a file. awm_catch does not interfere with the normal delivery of the target message; the messages continue to their original destination. The syntax is as follows:
awm_catch -@ orbhostname message-type message-key

For example, the following command waits for poll event ICMP (ping) messages on the DSM at the node named OTHERHOST:
awm_catch -@ OTHERHOST POLL_EVENT ICMP

The following command waits for all SNMP poll responses containing the string mynode.cai.com:
awm_catch -F mynode.cai.com POLL_EVENT SNMP

Both commands can be run interactively or in batch. For more information about these commands and utilities, including examples and additional syntax options, see the online CA Reference.

Using Debug Mode and Diagnostic Trace


The debugging procedures and options are managed by environment variables. Most of these variables are set in the aws_dsm.ini file (for permanent changes) or through the awm_config command (for temporary changes). Syntax for running diagnostic traces follows; however, diagnostic traces should be run only at the request of Technical Support.


Modifying Log Files Permanently


Before turning on diagnostic tracing, check your log file parameters and adjust them accordingly. Modify the following variables:
- Set CAI_AT_LOGWRAP to YES to wrap the log files.
- Set AW_MAX_LOGSIZE_K to the maximum size of the log file in KB (for example, a value of 30000 limits the log file to 30,000 KB).
- Set CAI_AT_LOGMAX to indicate the maximum number of log files created (for example, a value of 5 limits the number of log files to 5). The CAI_AT_LOGWRAP variable is ignored.

By default, the LogMode is set to WrapAround; therefore, the aws_dsm.log file writes 4 MB of data and wraps the log files. If you change the LogMode to Backup, the current file will be named aws_dsm.log and, once the maximum size is reached, the file will be renamed to aws_dsm.log.n. Specify the number of logs retained with the AW_LOG_NUM variable.

You can specify more options in the aws_dsm.ini file:

Pertinent values include:

PerObjectLogFiles
If set to 1, creates a separate log file for each managed object.

BreakOnBreakpoint
If set to 1, aws_dsm breaks for debugging where AWDM_ASM_INT_3 is included.

ResetOnAddMo
If set to 1, DSM resets the managed object tree on an AddMO call.

LogFileSizeMaximum
Sets the maximum log file size in KB. The default is 4096.

LogLevel
Sets the debug log level.

LogActive
Set to 1 to turn logging on.

LogMode
Select WrapAround, Backup, or Limited. The default is WrapAround.
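Before enabling a trace, it is worth estimating how much disk space the log settings above can consume. The following sketch multiplies the per-file size limit by the number of files retained; it assumes every retained file can grow to the full limit, which is a worst-case planning assumption rather than documented behavior. The function name is hypothetical.

```python
def max_log_footprint_kb(max_logsize_kb, log_count):
    """Worst-case disk usage in KB, assuming every log file reaches its limit."""
    return max_logsize_kb * log_count


# Example values from the text: 30000 KB per file, 5 files.
print(max_log_footprint_kb(30000, 5))   # 150000
# Default LogFileSizeMaximum of 4096 KB with 5 retained files.
print(max_log_footprint_kb(4096, 5))    # 20480
```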


Modifying Log Files Temporarily


You can make temporary changes using the awm_config command. The syntax is as follows:
awm_config -s LOG_CONFIG LOG:service:parm SET:5

Where:

service
Indicates the name of the Agent Technology service you want to configure (for example, aws_sadmin or orbctrl). If you are not sure of the correct service name, use the orbctrl command to view the services.

parm
Specifies one of the following:
MODE: Set to 0 to specify log wrapping.
Num: Set to an integer to specify the number of log files (for example, 5).
SIZE: Set to an integer to specify the size of the log file, in KB.
Level: Set to indicate the debug level (usually 4).

n
Specifies the level of logging required, ranging from 1 (for very high level errors only) to 9 (for extremely verbose logging). A log level of 4 provides a good middle level.

For example, to configure logging for the SNMP Gateway service (aws_snmp) on the local host, issue the following command:
awm_config -s LOG_CONFIG LOG:aws_snmp:Level SET:4
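If temporary logging changes are applied from a script, a small helper can keep the LOG:service:parm and SET:value strings well formed. The sketch below follows the syntax shown above; the helper name and the validation rules (the four parm names as spelled in this section, and a Level between 1 and 9) are assumptions drawn from the descriptions here, not from the command itself.

```python
VALID_PARMS = {"MODE", "Num", "SIZE", "Level"}


def build_log_config_args(service, parm, value):
    """Assemble an awm_config LOG_CONFIG argument list.

    Rejects parm names not listed in this section and Level values
    outside the documented 1 to 9 range.
    """
    if parm not in VALID_PARMS:
        raise ValueError("unknown parm: %s" % parm)
    if parm == "Level" and not 1 <= int(value) <= 9:
        raise ValueError("Level must be between 1 and 9")
    return ["awm_config", "-s", "LOG_CONFIG",
            "LOG:%s:%s" % (service, parm), "SET:%s" % value]


print(" ".join(build_log_config_args("aws_snmp", "Level", 4)))
```

Because these commands are case sensitive, the helper keeps the command name in lowercase and passes the service and parm names through unchanged.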

Enterprise Management Tracing


This section contains procedures for running a diagnostic trace for the following Enterprise Management components:
- Calendar Management
- Event Management
- Job Management

You can run either the cautrace or unitrace command to start diagnostic tracing. Although the cautrace command is more user friendly (for example, you can pause the cautrace command), the trace file will contain the same information.


Calendar Management Tracing


Windows

To run a diagnostic trace for Calendar Management on Windows, follow this procedure.

1. From the Command Prompt, enter:
unicntrl stop cal

2. Display the configuration settings by entering:


caugui settings

3. Click the Client Preferences tab at the right.

4. On the Calendar Management tab at the bottom, set Calendar Trace to Y.

5. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start cal

6. After the problem has been reproduced, enter:


unitrace ?

7. Send the following trace output file to Computer Associates Technical Support: unitrace.001

UNIX

To run a diagnostic trace for Calendar on UNIX, follow this procedure.

1. Perform the following from a UNIX shell:
CA_CAIDEBUG=1 ;export CA_CAIDEBUG    # common debug option
CAICAL0000=1 ;export CAICAL0000      # cal debug option
unishutdown cal
script cal.out                       # save screen output to file cal.out
unistart cal

2. Run test to repeat the error, and then issue the following:
exit                                 # close cal.out
unset CA_CAIDEBUG                    # turn off tracing
unset CAICAL0000
unicycle cal

3. Send the following trace output file to Technical Support: cal.out

Event Management Tracing


Windows

To run a diagnostic trace for Event Management on Windows, follow this procedure.

1. From the Command Prompt, enter:


unicntrl stop opr

2. Display the configuration settings by entering:


caugui settings

3. Click the Client Preferences tab at the right.
4. Select the Event Management tab at the bottom and enter 2 for OPR Trace: 0-2.
5. Select the Diagnostic Trace tab at the bottom and set:
   Router Trace to ON
   GUI Trace to ON
   Trace: 0-2 to 2
   Common debug to Y
6. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start opr

7. After the problem has been reproduced, enter:


unitrace ?

8. Send the following files to Computer Associates Technical Support:
   unitrace.001 (trace output)
   Logs for that day. For example, for Oct 15, 2006, include the following files from CA\SharedComponents\CCS\WVEM\Logs\:
      20061015.IDX
      20061015.LDX
      20061015.LOG

UNIX

To run a diagnostic trace for Event Management on UNIX, follow this procedure.
1. Perform the following from a UNIX shell:
export CA_CAIDEBUG=1      (common debug option)
unishutdown opr
script caiopr.out         (save screen output to file caiopr.out)
unistart opr

2. Run the test to reproduce the error, and then enter the following:
exit                  (close caiopr.out)
unset CA_CAIDEBUG     (turn off tracing)
unicycle opr

3. Send the following files to Technical Support:
   caiopr.out (trace output)
   Console logs for that day from directory $CAIGLBL0000/opr/logs:

      opano.yyyymmdd
      opldx.yyyymmdd
      oplog.yyyymmdd
      opndx.yyyymmdd
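
The four console log names for a given day share one prefix.yyyymmdd pattern, so the list of files to send can be generated instead of typed. The following is a minimal sketch for a POSIX shell. The function name opr_log_files is hypothetical and not part of the product, and the function only prints paths; it does not check that the files exist.

```shell
#!/bin/sh
# Hypothetical helper (not a Unicenter NSM command): print the paths of the
# Event Management console logs for one day, given the date as yyyymmdd.
# Assumes CAIGLBL0000 is set in the environment, as in the procedure above.
opr_log_files() {
    day=$1
    # Accept exactly eight digits; reject anything else.
    case $day in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "usage: opr_log_files yyyymmdd" >&2; return 1 ;;
    esac
    for prefix in opano opldx oplog opndx; do
        echo "${CAIGLBL0000}/opr/logs/${prefix}.${day}"
    done
}
```

For the logs of the current day, the function could be called as opr_log_files "$(date +%Y%m%d)".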

Job Management Tracing


Windows

To run a diagnostic trace for Job Management on Windows, follow this procedure.
1. From the Command Prompt, enter:
unicntrl stop sch

2. Display the configuration settings by entering:


caugui settings

3. Click the Client Preferences tab at the right.
4. Select the Job Workload Management tab at the bottom and set:
   Full Trace to Y
   Common Debug to Y
5. From the same Command Prompt as in Step 1, enter:
start unitrace
unicntrl start sch

6. After the problem has been reproduced, enter:


unitrace ?

7. Send the following file to Technical Support: unitrace.001

UNIX

To run a diagnostic trace for Job Management on UNIX, follow this procedure.
1. If you can reproduce the problem, gather the following trace information:
# unishutdown sche
# LEVEL2TRC=y;export LEVEL2TRC
# LEVEL2TRK=y;export LEVEL2TRK
# LEVEL2MTR=y;export LEVEL2MTR
# unistart sche

2. Collect all files in the $CAISCHD0006 directory, unset the trace variables, and recycle Job Management.
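
Step 2 names its actions without showing commands. The following is a minimal sketch of the variable handling and file collection for a POSIX shell. The three function names are hypothetical; only the variable names and the $CAISCHD0006 directory come from the procedure above. Stopping and starting Job Management is left to the unishutdown and unistart commands shown in step 1.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical. The variable names and
# the $CAISCHD0006 directory are taken from the procedure above.

# Export the three Job Management trace variables (step 1).
job_trace_on() {
    LEVEL2TRC=y; LEVEL2TRK=y; LEVEL2MTR=y
    export LEVEL2TRC LEVEL2TRK LEVEL2MTR
}

# Remove the variables again so that the next start is untraced (step 2).
job_trace_off() {
    unset LEVEL2TRC LEVEL2TRK LEVEL2MTR
}

# Copy everything in the trace directory to a destination directory,
# ready to send to Technical Support.
job_trace_collect() {
    dest=$1
    mkdir -p "$dest" && cp -R "$CAISCHD0006"/. "$dest"/
}
```

These functions must be defined and called in the same shell that starts Job Management, because variables set in a child shell do not reach the parent.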

Agent Technology and WorldView Tracing


To start a diagnostic trace for Agent Technology, run the following commands:
awm_config -s LOG_CONFIG LOG:aws_sadmin:Level SET:4
awm_config -s LOG_CONFIG LOG:objectStore:Level SET:4
awm_config -s LOG_CONFIG LOG:aws_snmp:Level SET:4
awm_config -s LOG_CONFIG LOG:AwNsm@%COMPUTERNAME%:Level SET:4
awm_config -s LOG_CONFIG LOG:aws_wvgate@%COMPUTERNAME%:Level SET:4
awm_config -s LOG_CONFIG LOG:orbctrl:Level SET:4
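
The six commands differ only in the component name, so they can be generated from a list. The following is a minimal sketch for a POSIX shell that prints the command lines rather than running them. The function name at_trace_commands and its arguments are hypothetical; the node name argument stands in for %COMPUTERNAME%, and the default level of 4 is the value used above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one awm_config command per Agent
# Technology component, using the command form shown above.
#   $1 = node name to substitute for %COMPUTERNAME%
#   $2 = log level (optional; the procedure above uses 4)
at_trace_commands() {
    node=$1
    level=${2:-4}
    for comp in aws_sadmin objectStore aws_snmp \
                "AwNsm@${node}" "aws_wvgate@${node}" orbctrl; do
        echo "awm_config -s LOG_CONFIG LOG:${comp}:Level SET:${level}"
    done
}
```

Printing the commands first makes it possible to review them, or to redirect them into a script file, before anything changes on the system.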

Tracing the JAVA GUI (On Server Side) and EM Services


This procedure shows how to trace the Unicenter Browser web interface.
1. Make sure the following symbols are set as system environment variables:
   Set CA_ROUTER_DEBUG to ON.
   Set EM_JAVATRACE to ON.
2. If you changed the values, reboot so that the variables can take effect. This is necessary because some of the processes that check the variables run as system processes.
3. Run CauTrace.exe.
4. Go into the Services Dialog and shut down CA-Web Interface Service and then CA-Unicenter.
5. Restart CA-Unicenter followed by CA-Web Interface Service. You should see startup messages for CAEMRTS, CAEMRTA, W2Tree, and EMSERVER.

Debugging Severity Propagation Service


1. To debug the Severity Propagation Service, run the following commands:
sevprop stop
sevprop remove

2. Run the following command:


sevprop install log [/D:MSSQLServer] /R:Repository name

The /D:MSSQLServer parameter needs to be specified only if MS-SQL Server is on the same machine. This command creates a sevprop_SCM log under the \CA_APPSW directory and a sevpropcom_trace.log file in the root directory of the C: drive.

Common Services Tracing


CAICCI Tracing
To start diagnostic tracing for CAICCI, run the following commands:
ccicntrl stop rmt
cautenv setlocal CA_CCITRACE 2
cautenv setlocal CA_CAIDEBUG Y
cautenv setlocal caigui0000 ON      (sets up the router trace)
cautenv setlocal caigui0001         (sets up the GUI trace)
cautrace                            (creates .trc files in the %TEMP% directory)

On UNIX, as on Windows, trace the library and the daemon processes. The environment variable CAI_CCI_DEBUG enables tracing when set. A process's environment is fixed when the process starts, so you need to recycle the process for the variable to be set in its process space.

Note: Carefully monitor how long the CAICCI trace runs. It may generate a large amount of data and can fill up the file system if left running for too long.

Library and Daemon Tracing

On UNIX, do the following to trace CAICCI (library and daemon tracing):
1. Perform the following from a UNIX shell:
# script /tmp/trace.cci.script
# unishutdown all
# CAI_CCI_DEBUG=y;export CAI_CCI_DEBUG
# rm $CAIGLBL0000/cci/logs/*
# unistart all      (or start only the problem application, where applicable)

2. Run the test to reproduce the error.
3. Enter the following:
# ccinet show
# cci semashow
# ccinet status
# unishutdown all
# cautil select conlog list conlog > /tmp/conlog.cci.out

# exit

4. Send the following documentation to Technical Support:
   /tmp/trace.cci.script
   /tmp/conlog.cci.out
   $CAIGLBL0000/cci/logs directory

Remote Daemon Tracing

On UNIX, do the following to trace the CAICCI remote daemon:
1. Perform the following from the UNIX shell:
# script /tmp/trace.cci.script
# ccinet debugon

2. Reproduce the problem.
3. Enter the following:
# ccinet show
# cci semashow
# ccinet status
# ccinet debugoff
# cautil select conlog list conlog > /tmp/conlog.cci.out
# exit

4. Send the following files to Technical Support:
   /tmp/trace.cci.script
   /tmp/conlog.cci.out
   ccirmtd.prf
   $CAIGLBL0000/cci/logs directory

Application and Daemon Tracing

To trace CAICCI daemons and applications on UNIX, do the following:
1. Enter the following from a UNIX shell:
# unishutdown all
# CAI_CCI_DEBUG=y;export CAI_CCI_DEBUG
# rm $CAIGLBL0000/cci/logs/*      (make it easier to get current doc)
# unistart all                    (start just the problem app here)

2. Recreate the problem.
3. Enter the following:
# ccinet show          (provide extra data)
# unishutdown all

4. To unset the trace variable and restart the product, do the following:

# unset CAI_CCI_DEBUG
# unistart all

The trace files are located in the $CAIGLBL0000/cci/logs/ directory. They include separate files for the CAICCI daemon processes and files named in the format ccistub_pid.log, which are trace files generated by the applications calling into the CAICCI library.
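
Because the CAICCI trace can fill the file system, it can help to check the size of the log directory while the trace is running. The following is a minimal sketch for a POSIX shell that uses only the standard du, find, awk, and wc utilities. The function name cci_trace_usage is hypothetical, and the ccistub_*.log pattern assumes that pid in ccistub_pid.log is replaced by a process ID.

```shell
#!/bin/sh
# Hypothetical helper: report how much space the CAICCI trace files use.
#   $1 = log directory (defaults to $CAIGLBL0000/cci/logs, the location
#        named in this section)
cci_trace_usage() {
    dir=${1:-$CAIGLBL0000/cci/logs}
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # Total size of the directory in kilobytes.
    du -sk "$dir" | awk '{ print "total_kb " $1 }'
    # Number of application trace files (assumed pattern: ccistub_<pid>.log).
    count=$(find "$dir" -name 'ccistub_*.log' -type f | wc -l | tr -d ' ')
    echo "ccistub_files $count"
}
```

Running the function every few minutes during a trace shows whether the directory is growing faster than the free space allows.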

CAM
To start tracing for a local CAM (CA Message Queue), enter the following:
camconfig trace=all

To start CAM tracing on another machine (machineb in this example), enter the following:


camconfig machineb trace=all

Dynamic Tracing
aws_sadmin

To activate dynamic tracing for aws_sadmin, do the following:
1. Create a batch file that includes the following (nodename identifies the DSM machine):
rem get the current log setting
awm_config -c 1 -s LOG_CONFIG LOG:AwNsm@nodename:Level GET LOG_RESULT ""
rem change it to 4
if "%1" == "" goto end
awm_config -c 1 -s LOG_CONFIG LOG:AwNsm@nodename:Level SET:%1 LOG_RESULT ""
:end

Note: The first executable statement displays the current debug level, and the second one sets the debug level to the value passed as the first argument. Set this to 4. To turn on the debug option for aws_wvgate dynamically, change AwNsm@nodename to aws_wvgate (in other words, awm_config -c 1 -s LOG_CONFIG LOG:aws_wvgate:Level SET:%1).
2. Restart awservices. Do not start aws_dsm or aws_wvgate in debug mode because this may generate an extremely large debug file.
3. If aws_dsm is not performing any FSM event, turn dynamic tracing on by executing the following (batchfilename identifies the batch file you created in Step 1):
batchfilename 4

4. Let the file run for a few minutes to ensure that the debug data is logged in the \AT\SERVICES\VAR\LOG\aws_dsm.log file.

Note: You may need to increase the size of the log file. Be advised that GET LOG_RESULT may not always return a response and may cause the request to hang.

CAICCI

To perform dynamic tracing for CAICCI, use the ccir and ccis commands.

To send from Windows to UNIX:
1. Enter the following on the UNIX machine:
cd $CAIGLBL0000/cci/bin
./ccir

2. Press Enter. The machine waits for a response.
3. Enter the following on the Windows machine:
dos> ccis unixmachine 3

This should send 3 test messages to the UNIX machine.

To send from UNIX to Windows:
1. Enter the following on the Windows machine:
dos> ccir

2. Press Enter. The machine waits for a response.
3. Enter the following on the UNIX machine:
cd $CAIGLBL0000/cci/bin
./ccis ntmachine 3

This should send 3 test messages to the Windows machine.

Circular Trace
To run the circular unitrace:
1. Set the environment variable UNITRACE_CIRC_SIZE to a number of bytes. Use a large enough number so that the data will not be overwritten if the trace keeps running after the problem occurs. For example, set UNITRACE_CIRC_SIZE=15000000 (~15MB). The default size is 1MB.
   Note: Five files are created, so you need five times the amount of space specified by UNITRACE_CIRC_SIZE.
2. Start the tracing by running the following command:

start unitrace -c

This starts the unitrace in a circular trace. Five files are created: unitrace.001 through unitrace.005. Although each file is the size specified by the UNITRACE_CIRC_SIZE value, only unitrace.002 through unitrace.005 are overwritten so that startup information is not lost.
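
The disk space needed follows directly from the five-file rule: five files, each up to UNITRACE_CIRC_SIZE bytes. The following is a minimal sketch of that arithmetic for a POSIX shell. The function name circ_trace_space is hypothetical and is not part of unitrace.

```shell
#!/bin/sh
# Hypothetical helper: bytes of disk space needed for a circular unitrace.
# Five files are created, each up to UNITRACE_CIRC_SIZE bytes.
#   $1 = intended UNITRACE_CIRC_SIZE value in bytes
circ_trace_space() {
    size=$1
    echo $((size * 5))
}
```

For the example value of 15000000, circ_trace_space 15000000 prints 75000000, so about 75MB of free space is needed before the trace is started.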
