1.

Using NETDIAG
This document describes how to use NETDIAG to troubleshoot network connection problems. It is intended for Intellution Technical Support Engineers and Application Engineers who help customers troubleshoot network problems.

NETDIAG is a network diagnostic program that is shipped with iFIX. It provides information on the connection status of all incoming and outgoing connections to a local node. NETDIAG shows information for either TCP/IP or NetBIOS connections. This document assumes iFIX is configured for TCP/IP networking; however, information for NetBIOS networking is provided where necessary.

Customers can use NETDIAG to examine the status of their network connections or when they need detailed information about a network connection. Generally, the customer uses NETDIAG after checking the Network Status Display in the WorkSpace. The Network Status Display shows at a glance the status of all network connections. If a connection does not appear in the Network Status Display, or if the Network Status Display indicates an error, the customer will usually call Technical Support.

After asking the customer to examine the items described in this document, you can request that they run a NETDIAG dump and fax, mail, or FTP the file to you. The data in a dump file is a snapshot of all information about all connections at a particular date and time. It contains the data that the customer's NETDIAG program has captured, plus some additional information. This information helps you understand what was happening at the time the problem occurred.

Running NETDIAG
You can run NETDIAG in either of the following ways:

[1] Double-click the NETDIAG.EXE file in the FIX Dynamics folder.
[2] Select Run from the Start menu, type NETDIAG, and press the Enter key.

NOTE: iFIX must be running before you run NETDIAG. The following screen appears.

Using the NETDIAG Utility

The NETDIAG user interface consists of a series of tabs. Each tab contains fields that provide information about the status of your network connections. The contents of each tab are described in greater detail in the Using NETDIAG Tabs section. The Close button closes the NETDIAG interface and the Dump button creates a dump file that contains information about all connections at a particular date and time. These buttons are available on each tab.

Acronym Table
The following table contains a list of the acronyms used throughout this document.

Acronym   Meaning
LNT       Logical Name Table
LCT       Logical Connection Table
NNT       Network Name Table
SKT       Socket Table
NBT       NetBIOS Table
LAN       Local Area Network
CM        Connection management
NCB       Network Control Block
NTSK      Network task
SCU       System Configuration Utility
LANA      LAN Adapter
LSN       Local Session Number
NAC       Network Alarm Client
NAM       Network Alarm Manager

Overview of the Network Architecture


To understand the information in NETDIAG, you need to understand the network architecture. This section describes the tables that make up the network architecture within one node. iFIX represents nodes on a network using the Logical Name Table (LNT) and the Network Name Table (NNT). The NNT records represent actual nodes on the network while the LNT records represent virtual nodes, which are groups of one or more actual nodes that can be treated as one node for the purpose of data and alarm communications. Socket (SKT) table records represent network cards in your machine. The Logical Connection Table (LCT) records provide a way to group one or more connections to the same remote node and free upper layers from details at the SKT table layer.

The following figures demonstrate the simplest and the most complex usage of the table records to model your network. Typically, your application will not use the full capabilities of the design since most applications do not use backup SCADAs and multiple LANs.

Simple Network Configuration


Figure 1-1 shows a simple configuration between a View client and a SCADA server.

View Node (V1) <----> SCADA Node (S1)

Figure 1-1: Simple View to SCADA Network Configuration

Figure 1-2 shows the information in the tables on View client V1, and how associated records are connected in different tables.

To Applications

Logical Name Table:        Record 0 (Local Node) (V1), Record 1 (S1), ... Record 199
Network Name Table:        Record 0 (Local Node) (V1), Record 1 (S1), ... Record 399
Logical Connection Table:  Record 0 (S1), Record 1, ... Record 399
Socket Table:              Record 0 (S1), Record 1, ... Record 399

To Remote Nodes

Figure 1-2: Simple View to SCADA Table Configuration

The tables shown in Figure 1-2 are filled in based on the outgoing connections configured in the SCU as well as any dynamic connections established at run time. On a SCADA server you may have incoming connections, which are also put into the tables. At any point in time the set of tables indicates which nodes you have connections with, as well as the status of those connections. The tables are all a fixed size and are allocated when iFIX starts up.

The simple and most common example illustrated in Figure 1-1 shows that there is normally a one-to-one correspondence of table records. LNT and NNT record 0 are always reserved for the local node, even if the local node is not a SCADA.

Network Configuration using Redundancy


Figures 1-3 and 1-4 show that there can be multiple physical nodes that make up a logical node, and that there can be multiple physical network connections corresponding to a logical connection. They also show that the applications are concerned only with the logical node names and none of the underlying details. The network task determines the correct remote SCADA to send to, and which network path to use.

View SCADA (Logical name: VS, Node name: VS1)
View SCADA (Logical name: VS, Node name: VS2)
Backup SCADA and LAN Redundancy enabled

Figure 1-3: Network Configuration using Backup SCADA and LAN Redundancy

Figure 1-4 shows the information in the tables on View client VS1, and how the associated records are connected in the tables.

To Applications

Logical Name Table:        Record 0 (Local Node) (VS), Record 1, ... Record 199
Network Name Table:        Record 0 (Local Node) (VS1), Record 1 (VS2), ... Record 399
Logical Connection Table:  Record 0 (VS2), Record 1, ... Record 399
Socket Table:              Record 0 (VS2) Network Card 1, Record 1 (VS2-R) Network Card 2, ... Record 399

To Remote Nodes

Figure 1-4: Backup SCADA and LAN Redundancy Table Configuration

LNT and NNT Records


The LNT records group NNT records into one logical node. The primary, backup, and active indexes into the NNT identify grouped NNT records. The LNT record also contains counters for the number of failovers that have happened between its grouped nodes. It counts manual failovers separately from automatic failovers.

The NNT record models an actual physical node. It contains information such as the version of iFIX or FIX32 software running on the remote node, and who is logged into NT on that remote node. It also records the Windows NT machine name of the remote node, which may be different from the iFIX node name. The NNT record stores the maximum packet size that can be sent to the remote node, and whether the remote node is using LAN Redundancy and Backup SCADAs.

Within the LNT and NNT records, all names are unique.

LCT and SKT Records


The LCT records group SKT records into one logical connection. The primary, backup, and active indexes into the SKT identify grouped SKT records. The LCT record also contains counters of the number of failovers that have happened among its grouped nodes, and indicates if LAN failover is disabled. The Mgmt Thread ID and Mgmt Thread Handle fields indicate if a separate thread has been created in the network task to perform the LAN Redundancy logic. LCT records have unique names.

NOTE: Although the Socket Table and Socket Records are two separate tabs in NETDIAG, they are one table in the architecture.

Types of Fields in Records


Each record contains fields such as timers, counters and indexes. Timers keep track of information such as how long a connection has been idle, and how long messages have been outstanding. Counters are used by the Network task to track events on a connection, such as number of messages sent and number of errors. Indexes are pointers between tables that associate LNT records with SKT records.
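The way indexes tie the four tables together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and attribute names are hypothetical, the records carry just the index fields this document names (Primary/Backup/Active Index, Lct Index, and Primary/Backup/Active Path), and the sample data is loosely modeled on the redundancy example (VS, VS1, VS2, VS2-R). It is not the actual iFIX data layout.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layouts: only the index fields named in this document
# are modeled, and the Python names are invented for illustration.

@dataclass
class LntRecord:
    logical_name: str
    primary_index: int            # index into the NNT
    backup_index: Optional[int]   # index into the NNT, None if no backup
    active_index: int             # index into the NNT currently in use

@dataclass
class NntRecord:
    node_name: str
    lct_index: Optional[int]      # index into the LCT, None if no connection

@dataclass
class LctRecord:
    name: str
    primary_path: int             # index into the SKT
    backup_path: Optional[int]    # index into the SKT, None if no backup path
    active_path: int              # index into the SKT currently in use

@dataclass
class SktRecord:
    remote_name: str

def resolve_active_socket(lnt_rec, nnt, lct, skt):
    """Walk LNT -> NNT -> LCT -> SKT and return the socket record in use."""
    nnt_rec = nnt[lnt_rec.active_index]
    if nnt_rec.lct_index is None:
        return None               # for example, the local node has no connection
    lct_rec = lct[nnt_rec.lct_index]
    return skt[lct_rec.active_path]

# Illustrative data only: VS2 is the active node and its backup path is in use.
nnt = [NntRecord("VS1", None), NntRecord("VS2", 0)]
lct = [LctRecord("VS2", primary_path=0, backup_path=1, active_path=1)]
skt = [SktRecord("VS2"), SktRecord("VS2-R")]
lnt_rec = LntRecord("VS", primary_index=0, backup_index=1, active_index=1)

print(resolve_active_socket(lnt_rec, nnt, lct, skt).remote_name)
```

The point of the sketch is that an application holding only the logical name never touches the lower tables directly; changing one index (for example `active_path`) is enough to redirect traffic.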

Using NETDIAG Tabs


The following list provides a brief description of the information found on each NETDIAG tab. Each tab is described in greater detail in the following sections.

Lnt Tbl: Shows the connection status of logical nodes that have incoming or outgoing connections to the local node. Use this tab as the starting point as you troubleshoot network problems.

Nnt Tbl: Shows the status of connections to all remote physical nodes. Use this tab to check the status of each node in a connection.

LCT Tbl: Shows the status of logical connections to remote nodes. Use this tab when LAN Redundancy is enabled. This tab exists only in iFIX.

Socket Tbl: Summarizes the information in all the socket records and connections. Provides information about the available network addresses for iFIX networking.

Socket Recs: Shows the incoming and outgoing connections to the local node.

NBT Tbl: Select this tab when using NetBIOS. It provides information regarding the available LANA numbers for iFIX networking.

NBT Recs: Select this tab when using NetBIOS. It shows the nodes that have incoming or outgoing connections to the local node.

Trace: Select this tab to log the activity in NBTASK or TCPTASK.

Net Recv Queues: Displays the counts of the process-to-process queues used for all network transactions coming into the local node.

Test Interface: Reserved for future use.

For more information about these tables and records, refer to the Overview of the Network Architecture section of this document.

Lnt Tbl (Logical Name Table)


The Lnt Tbl tab, shown in Figure 1-5, shows the LNT records in use. These records represent logical nodes.

Figure 1-5: Lnt Tbl Tab

The list box on this tab shows the logical nodes that have incoming or outgoing connections to the local node, and their connection status. The logical name is the name assigned to a pair of redundant SCADAs, and the physical name refers to the name of each node. Use this tab as the starting point to troubleshoot your network problems because it shows the status of all sessions.

The local node is always the first node listed. All other connections that follow are incoming or outgoing connections to the local node. NETDIAG uses two states to identify the status of a connection: OK and an error number. OK indicates that the connection is good. An error number indicates a problem with the connection. Refer to Table 1-10 for a list of possible error codes and the corresponding error message and description.

Table 1-1 describes the fields on the Lnt Tbl tab that contain troubleshooting information.

Table 1-1: Lnt Tbl Fields

Section: LNT_TBL Header

Num In Use
    The number of slots in the table that are currently in use. This value indicates if you have available resources to make new connections. If the value in the Num In Use field is less than the value in the Max Entries field, you have available resources to make new connections.

My Name
    The physical name of the local node.

Peak In Use
    The maximum number of records in use at one time. If nodes have disconnected, then the value in this field would be greater than the value in the Num In Use field.

Any Failovers
    A non-zero value if you are using SCADA failover and you have had a SCADA failover on any of your connections.

Section: LNT_REC Details

Failover Latched
    Whether or not there have been any SCADA failovers for a particular connection.

Num Failovers
    The total number of SCADA failovers for this connection.

Num Failover Manual
    How many SCADA failovers were manually initiated.

Primary Index, Backup Index, Active Index
    Indexes into the NNT that indicate which physical node names are primary and backup, and which of these two are active. [TBD - describe how to use the values in these fields to determine this info]

Good In, Good Out
    The total number of good incoming and outgoing connections. This information is helpful to separate incoming from outgoing connections.

Num Up, Num Dn
    The total number of connections that are up (good) or down (bad). This information is helpful when the number of incoming and outgoing connections exceeds the length of the list box.
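The header-field rules in Table 1-1 (Num In Use compared with Max Entries, and Peak In Use compared with Num In Use) amount to two comparisons. The following Python sketch shows them; the function name and the returned keys are hypothetical, and the values must be read off the tab or a dump file by hand, since no NETDIAG programming interface is assumed here.

```python
def table_usage(num_in_use, peak_in_use, max_entries):
    """Interpret the header counters of a fixed-size table such as the LNT."""
    return {
        # Slots still available for new connections.
        "free_slots": max_entries - num_in_use,
        # New connections are possible while Num In Use < Max Entries.
        "can_connect": num_in_use < max_entries,
        # Peak In Use above Num In Use means some nodes have disconnected.
        "nodes_disconnected": peak_in_use > num_in_use,
    }

# Example values only; 200 is used here as an illustrative table size.
print(table_usage(num_in_use=3, peak_in_use=5, max_entries=200))
```

The same comparisons apply to the In Use, Peak In Use, and Max Entries fields of the other table headers.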

Nnt Tbl (Network Name Table)


The Nnt Tbl tab, shown in Figure 1-6, shows the list of NNT records in use. These records represent connections to remote physical nodes.

Figure 1-6: Nnt Tbl Tab

The list box on this tab shows the status of connections to all remote physical nodes. This tab is useful in checking the status of each node in a connection. It provides additional status information not available on the LNT tab, such as the status of the upper and lower level network connections, and the connection types. If you are using NetBIOS, you would see the status of both the networking and alarming connections because NetBIOS uses two separate connections for data and alarming.

Timers and Counters in the NNT Records

The Send Tmo, Recv Tmo, and Idle Timer Cfg fields are timers that match the network timers configured in the SCU. The Discon Timer Curr and Discon Timer Cfg fields are timers that are used to determine inactivity in a connection so that it can be closed and cleaned up if the inactivity timer is enabled. You can configure these timers from the Network Advanced tab in the SCU.

The incoming and outgoing transactions are counted separately as data, alarm, and Connection Management (CM) transactions. This design allows you to determine inactivity, since inactivity applies only to data and not alarms or CM messages. The NNT record has a pointer to its parent LNT record through both the Logical LNT ref and the Samename LNT ref pointers. These pointers are indexes into the LNT.

Table 1-2 describes the fields on the Nnt Tbl tab that contain troubleshooting information.

Table 1-2: Nnt Tbl Fields

Section: NNT_TBL Header

In Use
    The number of records in use. The value in this field indicates if you have available resources to make a new connection. If the value in the In Use field is less than the value in the Max Entries field, you have available resources to make new connections.

Max Entries
    The number of records in the table.

Peak In Use
    The maximum number of records in use at one time. If nodes have disconnected, then the value in this field would be greater than the value in the In Use field.

Section: NNT_REC Stats

In Use
    The number of NNT records in use. The value in this field is non-zero if the record is currently in use.

Name
    The physical node name of the remote node.

Con Index
    The index of the associated record in the SKT or NBT table.

Con Handle
    The Socket ID if you are using TCP/IP, or the Local Session Number (LSN) if you are using NetBIOS.

Status
    A non-zero value if a socket is established between the two nodes.

cm state
    A non-zero value if CONMGR has successfully sent a transaction.

Alm/Data/Conmgr
    Shows incoming and outgoing transactions for each of these packet types. This information breaks down the transactions to indicate what types of messages are being sent.

Ver Major and Ver Minor
    The version of FIX32 or iFIX at the other end of the connection.

User Name
    The name of the user logged into NT at the remote machine at the time the connection was established.

Conn Ever Ok
    A non-zero value if the connection was ever fully established.

Lct Index
    The index of the associated record in the LCT table. [explain index info]

Max Pkt Size
    The maximum packet size being used in transactions between the two computers. The value in this field should be 16384 for iFIX to iFIX connections, and 1400 for iFIX to FIX32 connections.

Partner SCADA
    The partner SCADA of the remote node. This field is empty if the partner SCADA is not configured on the remote node.

Machine name
    The NT machine name of the remote node. This name may be different from the iFIX node name.

Send Tmo, Recv Tmo
    The value that is configured in the SCU for these fields. You can configure timeouts on a per connection basis in the SCU.

Con type
    The direction of the connection. Outgoing connections are DYN or SCU, and incoming are server and alarm. This information helps you know if both the data and alarming connections are being established over a single socket. Possible values for this field are 2, 4, 8, or 10. Since connections can be incoming, outgoing, or both, values can be combined. For example, if a connection is an alarm and server connection, the value in this field would be 0x10 (SERVER=8 + DYNAM=2).
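The Max Pkt Size rule in Table 1-2 can be expressed as a small lookup. This Python sketch is an illustration only: the function name and the "ifix"/"fix32" labels are hypothetical, and the two expected sizes are the ones stated in the table.

```python
# Expected Max Pkt Size values as stated in Table 1-2.
EXPECTED_MAX_PKT_SIZE = {
    "ifix": 16384,   # iFIX to iFIX connection
    "fix32": 1400,   # iFIX to FIX32 connection
}

def check_max_pkt_size(remote_kind, observed):
    """Return (ok, expected) for the Max Pkt Size shown in an NNT record."""
    expected = EXPECTED_MAX_PKT_SIZE[remote_kind]
    return observed == expected, expected

print(check_max_pkt_size("ifix", 16384))
print(check_max_pkt_size("fix32", 16384))
```

A mismatch does not identify the cause by itself; it is simply a quick way to spot a record worth examining further.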

LCT Tbl (Logical Connection Table)


The LCT Tbl tab, shown in Figure 1-7, shows the LCT records in use. These LCT records represent logical connections to the remote nodes.

Figure 1-7: LCT Tbl Tab

The list box on this tab shows the status of logical connections to remote nodes. Use this tab when LAN Redundancy is enabled. This tab is similar to the Lnt Tbl tab and exists only in iFIX. The fields in the LCT_TBL Header show whether there have been any LAN failovers on any connection when you are using LAN redundancy.

Table 1-3 describes the fields on the LCT Tbl tab that contain troubleshooting information.

Table 1-3: LCT Tbl Fields

Section: LCT_TBL Header

Any Failovers
    A non-zero value if you are using LAN redundancy and you have had failovers on any of your connections.

Mgmt Thread ID, Mgmt Thread Handle
    If a separate thread has been created in the network task to perform the LAN Redundancy logic. [describe values]

Section: LCT_REC Details

Active Path, Primary Path, Backup Path
    Indexes into the socket table that indicate which network connection is active. [describe values]

Latched Failover
    If there have been any LAN failovers on this connection.

Socket Tbl (Socket Table)


The Socket Tbl tab, shown in Figure 1-8, shows the list of socket records in use, which represent connections to remote nodes.

Figure 1-8: Socket Tbl Tab

This tab provides information about the available network addresses for iFIX networking. It is useful because it summarizes the information in all the socket records and connections. The information on this tab is read from the header of the socket table. The NCB and NTSK buffer usage, while not part of the socket table, are also displayed in this tab. NCB and NTSK buffers are necessary when communicating with remote nodes, and when troubleshooting network problems, it is important to know how many resources are available.

Table 1-4 describes the fields on the Socket Tbl tab that contain troubleshooting information.

Table 1-4: Socket Tbl Fields

Timeouts
    The total number of timeouts on all connections since iFIX was started.

Available Addrs
    The set of local addresses being used. You can disable paths in the SCU, which would prevent them from appearing on this tab.

Num Addrs
    The number of available addresses being used.

Transactions/sec
    The total number of transactions per second on all connections.

Ncb InUse
    The number of NCB records currently in use. The value in this field should be close to zero. Higher than one or two per connection could indicate a resource leak.

Ntsk In Use
    The number of NTSK buffers in use. The value in this field should be close to zero. Higher than one or two per connection could indicate a resource leak.
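The buffer guidance in Table 1-4 can be turned into a rough screening rule. The Python sketch below combines it with the statement in Table 1-9 that values below 10 are not a resource leak. The function name, parameter names, and the choice of 2 as the per-connection limit are assumptions made for illustration; treat the result as a hint, not a diagnosis.

```python
def buffer_usage_hint(in_use, num_connections, per_conn_limit=2, floor=10):
    """Screen an Ncb InUse or Ntsk In Use value for a possible resource leak.

    in_use          -- value read from the Socket Tbl tab
    num_connections -- number of connections on the node
    per_conn_limit  -- 'one or two per connection' from Table 1-4 (assumed 2)
    floor           -- values below this are not treated as a leak (Table 1-9)
    """
    if in_use < floor:
        return "ok"
    if in_use > per_conn_limit * num_connections:
        return "possible leak"
    return "ok"

print(buffer_usage_hint(3, num_connections=1))
print(buffer_usage_hint(12, num_connections=10))
print(buffer_usage_hint(40, num_connections=10))
```

If the hint is "possible leak", the Socket Recs tab is the place to find which socket record is holding the buffers.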

Socket Recs (Socket Records)


The Socket Recs tab, shown in Figure 1-9, shows the socket records in use. These socket records represent nodes that have incoming or outgoing connections to the local node.

Figure 1-9: Socket Recs Tab

A socket record represents a connection between two nodes. In the list box, incoming connections have a ~ in front of the node name. When you select a node name, status and other information display for that connection. Connection and session history fields display when any connection losses occur.

This tab also contains the most complex and low level information for a connection. Socket records are unique by local and remote IP address. If your machine has only one local IP address, all your socket records will have the same local IP address, but different remote IP addresses.

The timestamp fields (located to the right of the list box) show the last 10 errors for the connection with their timestamps. The timestamps can indicate if the connections are being lost at a regular interval or if they are being lost at random. The error that caused the connection loss is also recorded, so you can see if the same errors are repeatedly reported.

Information in the list box does not update automatically. Click the Update list box button to update the information in the list box.
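Deciding whether losses occur at a regular interval or at random can be done by comparing the gaps between the recorded timestamps. The Python sketch below is one possible approach, not a NETDIAG feature: the function name is hypothetical, the timestamps are assumed to have been copied from the tab by hand, and the 10 percent tolerance is an arbitrary choice.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

def loss_pattern(timestamps, tolerance=0.1):
    """Classify connection-loss timestamps as regular or irregular.

    The gaps between consecutive losses are compared; if their spread is
    small relative to the average gap, the losses are called regular.
    """
    times = sorted(timestamps)
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return "insufficient data"
    average = mean(gaps)
    # A zero average gap (identical timestamps) is reported as irregular.
    if average > 0 and pstdev(gaps) / average <= tolerance:
        return "regular"
    return "irregular"

# Illustrative timestamps only.
start = datetime(2000, 1, 1, 12, 0, 0)
regular_losses = [start + timedelta(minutes=5 * i) for i in range(5)]
random_losses = [start + timedelta(seconds=s) for s in (0, 30, 400, 410, 2000)]

print(loss_pattern(regular_losses))
print(loss_pattern(random_losses))
```

Regular losses suggest a scheduled activity or a timer, while irregular losses fit better with load-related causes.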

Timers and Counters in the SKT Records

TCPTASK keeps counters for the current connection and totalized counters for all connections made since startup. If you establish the connection once and never lose it, the current connection counter will equal the total counter. If connections have been lost and re-established, then the total counter will be greater than the current counters.

Some counters are useful for understanding performance and efficiency of your communication with the remote node. High values in the Wouldblocks and Multirecvs fields indicate that the connection is possibly overloaded. Even if not much information is being transferred between these two particular nodes, a busy network outside of this connection or outside of iFIX traffic can cause inefficiency of data transfer. All connections and machines share the same underlying network software and cabling.

Table 1-5 describes the fields on the Socket Recs tab that contain troubleshooting information.

Table 1-5: Socket Recs Fields

LCT Index
    The index of associated records in the LCT table. [describe values]

Conn Ever OK
    If the connection was ever established at this low level. It is possible for the value in this field to indicate that the connection was established (non-zero value), and the Conn Ever OK field on the NNT Tbl tab to indicate that the connection was never established (0 value).

Last Pkt S/R
    The packet type of the last packet sent and received. May be useful to know the last packet type sent before a connection was lost.

Msgs Sent, Msgs Rcvd
    The number of messages sent and received for the current session.

Multi Receive
    The number of times a request or response could not be received in one call from TCPTASK. High numbers (such as ??) could indicate network load, but are typically not the problem.

Conn Attempts
    The number of connection attempts. The value in this field should increment while the connection is down. This proves that it is trying to reconnect.

Wouldblocks, Multi Wouldblock
    The number of times TCPTASK had to retry when sending a packet. High numbers could indicate network load.

Total Msgs Sent, Total Msgs Rcvd, Total Multi Send, Total Multi Rcvd, Total Conn Attempts, Total Conn Attempts OK, Total Wouldblocks
    Counters for current and all prior connections to this remote node.

Total Disconn
    The number of times connections were taken down by the remote node.

Total Timeouts
    The number of connections that TCPTASK takes down due to timeouts.

NBT Tbl (NetBIOS Table)


The NBT Tbl tab, shown in Figure 1-10, shows the NBT records in use, which represent connections with remote nodes.

Figure 1-10: NetBIOS Tbl Tab

Select this tab when using NetBIOS networking. This tab provides information regarding the available LANA numbers for iFIX networking. The LANA numbers are used by the operating system to route network data to the correct protocol and network adapter (usually a network card).

The information in the list box does not update automatically. Click the Update Listbox button to update this information.

Table 1-6 describes the fields on the NBT Tbl tab that contain troubleshooting information.

Table 1-6: NBT Tbl Fields

Entries
    The number of records in the table.

Inuse
    The number of records in use. This value indicates if you have available resources to make a new connection. If the value in the Inuse field is less than the value in the Entries field, you have available resources to make new connections.

Peak
    The maximum number of records in use at one time. If nodes have disconnected, then the value in this field would be greater than the value in the Inuse field.

Num Lanas
    The number of available local area network addresses (LANAs).

Available Lanas
    The set of local area network addresses being used. You can disable paths in the SCU, which would prevent them from appearing here.

Ncbs Inuse
    The number of NCB records currently in use. The value in this field should be close to zero. A value higher than one or two per connection could indicate a resource leak.

Transaction Per Second
    The total number of transactions per second on all connections.

NBT Recs (NetBIOS Records)


The NBT Recs tab, shown in Figure 1-11, shows the list of NBT records in use. These records represent nodes that have incoming or outgoing connections to the local node. Select this tab when using NetBIOS networking.

Figure 1-11: NBT Recs Tab

Incoming connections have a ~ in front of the node name. When you select a node name, NETDIAG displays the connection status and other information for that connection.

The information in the list box does not update automatically. Click the Update Listbox button to update the information in the list box.

Table 1-7 describes the fields on the NBT Recs tab that contain troubleshooting information.

Table 1-7: NBT Recs Fields

Out Seq Current, Out Seq Tot
    The number of responses that NBTASK is holding on to in order to send replies in the same order as requests. This logic is necessary to communicate with nodes that expect to get replies in the order of requests.

Massive Cleanup
    If a connection is marked for cleanup by NBTASK. The value in this field is non-zero if a massive cleanup is pending for this record because you are closing the connection.

Trace
The Trace tab, shown in Figure 1-12, traces the activity in NBTASK or TCPTASK, which are executable programs that send and receive data over the network.

Figure 1-12: Trace Tab

Use this tab to collect and dump additional information such as the sequence of events leading up to a connection loss. Click the Enable Trace button for TCPTASK to begin collecting trace information. Click the Disable Trace button to stop collecting trace information. You can also turn tracing on automatically when iFIX starts by checking the Enable trace at startup checkbox.

The fields on this tab are used to filter the collected trace information. For example, to trace information over a particular socket, enter the socket ID in the text box next to the Conn ID field. Information will be filtered down to the information that matches that socket ID only.

Click the Dump Trace button to dump the information to a file. NETDIAG dumps the information to the file in the following location: Dynamics\APP\tracedmp.txt.

Net Recv Queues (Network Receive Queues)


The Net Recv Queues tab, shown in Figure 1-13, displays the counts of the process-to-process queues used for all network transactions coming into the local node.

Figure 1-13: Net Recv Queues Tab

The values in the DBASRV fields are accurate only if DBASRV.EXE is running. DBASRV.EXE runs if you are using a NetBIOS network.

Table 1-8 describes the fields on the Net Recv Queues tab.

Table 1-8: Net Recv Queue Fields

Conmgr
    The number of incoming messages for CONMGR. The value in this field would be non-zero only if CONMGR.EXE has stopped responding to remote requests.

Dbasrv
    The number of incoming messages for DBASRV. If the value in this field is non-zero, DBASRV is at processing capacity. If this happens, reduce the number of view clients or overall iFIX network traffic.

Alarms
    The number of incoming messages for NAC/NAM. The value in this field would be non-zero if NAC/NAM.EXE stopped responding to remote requests.

The Curr, Peak, and Size columns refer to the number of items currently in the queue, the most items ever in the queue at a given time, and the maximum size of the queue, respectively.
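The interpretation of a non-zero Curr count in Table 1-8 can be captured in a small lookup. This Python sketch is illustrative only: the function name and dictionary are hypothetical, the hint texts paraphrase the table, and the counts are assumed to be read from the tab by hand.

```python
# Hints paraphrased from Table 1-8, keyed by the field name on the tab.
QUEUE_HINTS = {
    "Conmgr": "CONMGR.EXE may have stopped responding to remote requests",
    "Dbasrv": "DBASRV is at processing capacity; reduce view clients or network traffic",
    "Alarms": "NAC/NAM may have stopped responding to remote requests",
}

def queue_hint(name, curr):
    """Return a troubleshooting hint for a queue, or None if it is empty."""
    if curr == 0:
        return None
    return QUEUE_HINTS[name]

print(queue_hint("Dbasrv", 0))
print(queue_hint("Conmgr", 3))
```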

Test Interface Tab


The Test Interface tab, shown in Figure 1-14, is not currently used. It is reserved for future use.

Figure 1-14: Test Interface Tab

Troubleshooting Common Network Problems


Table 1-9 contains some common network problems and possible solutions. Use the information in this table if customers call you with these specific problems. Refer to Table 1-10 for a list of possible error codes you might see in NETDIAG.

Table 1-9: Troubleshooting Common Network Problems

Problem: Unable to establish a connection

Possible Solutions: If a customer cannot establish a network connection, the error code 8517 displays in the Network Status Display in the WorkSpace, or on the Lnt tab in NETDIAG. This error means that TCPTASK cannot resolve the node name to the IP address. If this error occurs, the customer is probably trying to connect to a node that is not in their host file or has a machine name different from the iFIX node name. You can begin to troubleshoot this problem by doing the following:

[1] Check the Socket Recs tab in NETDIAG to verify that the IP address for the remote node is correct. Ensure that the address is not 255.255.255.255. If it is, this verifies that the name cannot be resolved.
[2] Check the Conn Attempts field on the Socket Recs tab in NETDIAG to see if it is trying to establish the connection. The number in that field should increment by one every 20 seconds. If it does not, this indicates that it is probably a software problem. Request that the customer create a NETDIAG dump file.
[3] Ask the customer to do the following to ensure the cable connections are good, and the name can be resolved to the IP address:
    [a] Ping the remote machine by node name.
    [b] Ping the remote machine by IP address.
    [c] Try mapping a network drive on the remote machine.
    [d] Verify that TCPTEST works between the two nodes. For more information about TCPTEST, refer to the Setting Up the Environment Manual.

Problem: Temporary session loss

Possible Solutions: High network load can cause a temporary session loss. If a customer experiences this problem, a 1914 error code displays in the Lnt tab in NETDIAG and in the Network Status Display in the WorkSpace. You can begin to troubleshoot this problem by doing the following:

[1] Go to the Socket Recs tab in NETDIAG and look at the error history to see the intervals of the session loss. The [field name] field shows how often you are losing a session. Check to see if the times correspond with activities that cause a burst of network activity (such as opening a large picture) or activities that use a large amount of CPU (such as printing reports).
[2] Check the Total Timeouts field on the SKT Tbl tab. Does the count increment whenever there is a session loss? If it does, then the session loss is due to timeouts.
[3] Check the Peak Turnaround field on the SKT Recs tab. This field indicates how close you are getting to timeouts.


Table 1-9: Troubleshooting Common Network Problems (continued)

Problem: Slow network performance

Possible Solutions: If a customer loses a session frequently or experiences slow network performance, start by checking the following:

[1] The WouldBlocks fields on the Socket Recs tab, to see if the network is busy.
[2] The Total MultiSend and Total MultiReceive fields on the Socket Recs tab. They show total current and prior connections since iFIX has been running.

You can reduce network load by increasing the picture refresh rate or the historical collection rate.

Problem: Unrecovered session loss

Possible Solutions: An unrecovered session loss means that the session was once good, was lost, and did not come back for more than a minute or two. A 1914 error code may display on the Lnt tab. Start troubleshooting by doing the following:

[1] Follow steps 1 through 3 under the Unable to establish a connection problem.
[2] Investigate possible resource leaks by checking the NCB and NTSK fields on the Socket Tbl tab. If the values in these fields are high, go to the Socket Recs tab to determine which socket record is using the resources. On the Socket Rec tab, look at the NCB count and Send Buf fields. If the values in the NCB and NTSK fields are less than 10, the problem is not a resource leak.
[3] Find out from the customer whether all connections to the remote node are bad, or just one connection. The answer can indicate on which end the problem might be. If just one connection to the remote node is bad, can this view client connect to other SCADAs?
[4] Ask the customer to restart iFIX on the view client and SCADA server if possible.


Run-time Error Codes


Table 1-10 describes error codes that you may see in NETDIAG.

Table 1-10: Run-time Error Codes

Error Code: 1605, 1608, 1610, 1624
Error Message: Command timed out (1605). Invalid Local Session Number (1608). Session Closed (1610). Session Ended Abnormally (1624).
Description: These errors occur when the remote node is down. When the remote node is brought back up, the Connection Manager re-establishes the session.

Error Code: 1620
Error Message: Cannot find name called.
Description: The session cannot be established because either a remote node is not operating, a cabling problem exists between the nodes, or the remote node name is not registered on the network. Verify that both nodes are running compatible network software. Also, run NBTEST as discussed in the section, Troubleshooting NetBIOS with NBTEST, in the Setting Up the Environment Manual.

Error Code: 1914
Error Message: Connection NOT established with node.
Description: The Connection Manager has not yet established a connection with the remote node. Wait for Connection Manager to establish the session.


Table 1-10: Run-time Error Codes (continued)

Error Code: 1960
Error Message: FIX dynamic connection in progress.
Description: Connection Manager is in the process of establishing a dynamic connection with the remote node. Wait for Connection Manager to establish the session.

Error Code: 1964
Error Message: FIX has been shut down on remote node.
Description: Connection Manager has detected that iFIX has been shut down on the remote node. This error code displays temporarily, then changes to 1914.
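When you review many dump files, it can help to keep Table 1-10 in a small lookup. The following Python sketch is one way to do so. The names RUNTIME_ERRORS and describe_error are hypothetical, and the notes are shortened paraphrases of the table, not iFIX output.

```python
# Shared note for the four codes that Table 1-10 groups together.
REMOTE_DOWN = ("Occurs when the remote node is down; Connection Manager "
               "re-establishes the session when the node comes back up.")

RUNTIME_ERRORS = {
    1605: ("Command timed out.", REMOTE_DOWN),
    1608: ("Invalid Local Session Number.", REMOTE_DOWN),
    1610: ("Session Closed.", REMOTE_DOWN),
    1624: ("Session Ended Abnormally.", REMOTE_DOWN),
    1620: ("Cannot find name called.",
           "Remote node not operating, cabling problem, or node name not registered."),
    1914: ("Connection NOT established with node.",
           "Wait for Connection Manager to establish the session."),
    1960: ("FIX dynamic connection in progress.",
           "Wait for Connection Manager to establish the session."),
    1964: ("FIX has been shut down on remote node.",
           "Displays temporarily, then changes to 1914."),
}


def describe_error(code):
    """Return a one-line summary for a run-time error code from Table 1-10."""
    message, note = RUNTIME_ERRORS.get(
        code, ("Unknown code.", "Not listed in Table 1-10."))
    return f"{code}: {message} {note}"
```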

Understanding the Dump File


The customer generates a dump file to provide you with a snapshot of the network statuses and counters at a particular point in time. When you receive a dump file from a customer, it is important to know when the file was generated, for example, before, during, or after the problem. It is recommended that you have the customer create a dump file from both ends of the connection to see what was happening at each end of the connection at the time that the problem occurred. The dump file outputs the information in the tables in NETDIAG, and provides other information not found in NETDIAG. The following table describes the information in each section of the dump file.

Table 1-11: Dump File Contents

This section of the Dump file... / Shows...

Node Platform: The operating system running on the local node.

NNTDUMP
    LNT Entries: The contents of the LNT tab.
    NNT Entries: The contents of the NNT tab.


Table 1-11: Dump File Contents (continued)

This section of the Dump file... / Shows...

NBDUMP
    LCT TABLE DUMP: The contents of the NBT Tbl and NBT Rec tabs. Shows all socket records ever in use, not just currently in use.
    NBT TABLE DUMP: Output of the NBT table, showing the set of NBT records in use.
    NCB TABLE DUMP: Output of the NCB table, showing the set of NCB records in use.
    Buffer Pool Dump: Output of the NTSK buffers.

NBASTAT: The NetBIOS Adapter status statistics provided by the operating system.

NBSSTAT: The NetBIOS session status statistics provided by the operating system.

TCPDUMP
    LCT TABLE DUMP: The contents of the LCT Tbl tab.
    TCPDUMP SOCKET TABLE DUMP: The contents of the Socket Tbl and Socket Rec tabs. Shows all socket records ever in use, not just currently in use.
    NCB TABLE DUMP: Output of the NCB table, showing the set of NCB records in use.
    Buffer Pool Dump: Output of the table of NTSK buffers.
    Send Buffer Pool Dump: Buffers for messages that TCPTASK was unable to send on the first try.

ALMQDUMP*: The contents of the alarm queues. This information is also displayed in the ALMSTAT utility. Check the *LOST* column when you are having problems with lost alarms. For more information on alarm queues, refer to the Using the ALMSTAT Utility document.


Table 1-11: Dump File Contents (continued)

This section of the Dump file... / Shows...

DBADUMP*: The set of remote nodes that are requesting data from this node. If a node is unable to retrieve data from you, check this section to see if that node is listed.

NETQDUMP: The information found in the Net Recv Queue tab.

NACDUMP*: The list of all SCADAs requesting alarms from the local node. Check this section if you are not receiving alarms from a SCADA server.

NAMDUMP*: A list of all view clients requesting alarms. Check here if a view client is not receiving alarms from a SCADA server.


Table 1-11: Dump File Contents (continued)

This section of the Dump file... / Shows...

OSXDUMP*: A dump of the OSX table. The OSX table shows system resources (such as global memory, events, and queues) that are shared among iFIX applications. Check this section to see if the table is full by checking the following fields:

    Max Allocated
    Max Used
    Num Used
    Num out of memory
    Nub glbl alloc failed

If you are having a failed memory allocation problem, the Num out of memory and Nub glbl alloc failed fields are non-zero. Otherwise, these two fields should be 0.

NOTE: Only the Num out of memory and Nub glbl alloc failed fields should be zero.

From this section you can also tell if the node is running any third-party applications that are accessing the iFIX database. The information in this section is also shown in the OSXDIAG program, which is shipped with iFIX.

KEYDUMP*: The options that are enabled on your hardware key. If a feature is not working, you can check this section to see if the option is enabled. The KEYDIAG program that is shipped with iFIX also shows this information.

* Indicates information not provided by the NETDIAG Utility.
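The OSXDUMP check reduces to testing two counters. The following Python sketch shows that test. It assumes you have copied the OSXDUMP field values into a dictionary by hand, because the layout of the dump file is not described here. The function name osx_allocation_failures is hypothetical.

```python
# The two OSXDUMP fields that should be zero on a healthy node.
FAILURE_FIELDS = ("Num out of memory", "Nub glbl alloc failed")


def osx_allocation_failures(fields):
    """Return the failure counters that are non-zero, keyed by field name.

    An empty result means no failed memory allocation is recorded.
    """
    return {
        name: fields[name]
        for name in FAILURE_FIELDS
        if fields.get(name, 0) != 0
    }
```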


Troubleshooting Network Problems


Use the information in the following tables to help you gather and interpret the information necessary to troubleshoot network problems. The first table contains information to check, or request from the customer. The second table contains specific information that you should understand about the connections and the network as you begin to troubleshoot problems.

Check or request this information: Event log messages in NT Event Log Viewer or any *.EVT in ALMPATH.
For this reason: This log shows error messages that eventually cause session losses, and the reasons why. For example, this log file shows timeouts and memory errors.

Check or request this information: If the customer is using NetBIOS, check to see if they have made an entry in the PreferredProtocol section of the network.ini file.
For this reason: This section of the file indicates which networking protocol is being used. Normally, this section of the file is empty, which indicates that the default protocol is NetBEUI.

Check or request this information: Check to see if RDATHROTTLE has been changed by reading the HKEY_LOCAL_MACHINE\SOFTWARE\CLASSES\SOFTWARE\FIX Dynamics registry key to see if there is a named value called RDATHROTTLE. By default, there is no value. If there is a value, record that value.
For this reason: The RDATHROTTLE value affects network performance. The default is 10 and is provided for multi-node networks. This default value ensures that all nodes share SCADA access equally. If you have a smaller network, increase this value.

Check or request this information: Request a NETDIAG dump from the customer before, and then during the problem.
For this reason: This will help you determine what was happening before and during the problem.
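The RDATHROTTLE check can be scripted on the customer's Windows node. The following Python sketch uses the standard winreg module and reads the key path given above. The function names and the injectable query parameter are hypothetical, and the sketch assumes the value can be converted to an integer, which this document does not state.

```python
DEFAULT_RDATHROTTLE = 10  # default stated above when no named value exists
KEY_PATH = r"SOFTWARE\CLASSES\SOFTWARE\FIX Dynamics"  # under HKEY_LOCAL_MACHINE


def _query_registry(path, name):
    """Read a named value from HKEY_LOCAL_MACHINE; None if it does not exist."""
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        return None


def read_rdathrottle(query=_query_registry):
    """Return (value, is_default). `query` is injectable for testing."""
    raw = query(KEY_PATH, "RDATHROTTLE")
    if raw is None:
        return DEFAULT_RDATHROTTLE, True
    return int(raw), False  # assumption: the stored value is numeric
```

A result of (10, True) means the named value is absent and the default applies. Any other result should be recorded with the case notes.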

As you begin to troubleshoot a network problem, ensure that you understand the specific information about the connection and the network that is listed in the following table.


Ensure that you understand: The set of connected nodes, including the versions of Intellution software running on remote nodes. Also, understand which nodes are view clients and which nodes are SCADA servers.
Explanation: If there are multiple nodes in the network, you should know whether all nodes are having the same problems. You can request dumps from other nodes. To determine the set of connected nodes, look at all the records shown in the NNT dump. To determine the software version, check the Ver Major and Ver Minor fields on the NNT Rec tab.

Ensure that you understand: The directions of the connections (incoming and outgoing) and the logical connections (data and alarm) that are established.
Explanation: The direction can indicate whether the customer has configured incoming connections to a view client that are invalid. If the customer is not receiving alarms over the network, verify that the alarm session was established. Refer to the Con Type field on the NNT Rec tab for this information.

Ensure that you understand: How many network paths are being used.
Explanation: To help rule out any dial-up problems, you want to know whether the customer's machine is a simple machine with one network card, or has multiple cards or even dial-up connections. Look at the Available Lanas field on the NBT Recs tab or the Available Addrs field on the SKT Tbl tab to see the number of available paths. The value in the Num LANAS and Num Addrs field is usually 1.

Ensure that you understand: If SCADA failover or LAN redundancy is enabled.
Explanation: Knowing whether SCADA failover or LAN redundancy is enabled can help you troubleshoot configuration problems if these features are set up incorrectly. To check for SCADA failover, check the LNT_REC primary/backup/active indexes. The backup is a number other than 32767 if Backup SCADA is enabled. To check for LAN Redundancy, check whether there are socket records for SCADA and SCADA-R.
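The two checks in the last row can be expressed as simple tests. The following Python sketch is an illustration under stated assumptions: it treats 32767 as the "not configured" index, as described above, and it reads "SCADA and SCADA-R" as the node name with and without an -R suffix. That reading is an interpretation, not a documented rule, and the function names are hypothetical.

```python
NOT_CONFIGURED = 32767  # backup index value when Backup SCADA is not enabled


def scada_failover_enabled(backup_index):
    """True when the LNT_REC backup index shows a configured backup SCADA."""
    return backup_index != NOT_CONFIGURED


def lan_redundancy_enabled(socket_record_names, node_name):
    """True when socket records exist for both NODE and NODE-R (assumed naming)."""
    names = {name.upper() for name in socket_record_names}
    node = node_name.upper()
    return node in names and f"{node}-R" in names
```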


Ensure that you understand: If the inactivity timer is enabled.
Explanation: Knowing whether this timer is enabled can help you rule out the inactivity timer as a source of session loss. If the timer is enabled, the Discon timer cfg field on the NNT Rec tab contains a number other than 32767. Verify that the following timers on the NNT Rec tab are at their default values:

    Send Tmo (30)
    Recv Tmo (60)
    Idle Timer Cfg (20)
    Discon Timer Cfg (32767)

Ensure that you understand: If any other programs are running on the machine.
Explanation: Other programs that are running can take memory away from iFIX. Check the OSXDUMP section of the dump file to see if there is memory owned by any programs that you do not recognize.

Ensure that you understand: The connection history of all connected nodes individually and as a group.
Explanation: For single nodes, check whether connections are lost at regular intervals or at random. As a group, check whether multiple connections are lost at the same time. Determine whether the connections are being lost due to timeouts or disconnects. Look at the error history on the Socket Recs tab or NBT Recs tab. Also, check the timeout counters.
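Both questions about connection history (regular or random, single or grouped) can be answered from a list of loss times per node. The following Python sketch assumes you have transcribed those times from the error history. The function names, the 10 percent tolerance, and the 5-second grouping window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def loss_intervals(timestamps):
    """Seconds between consecutive session losses for one node."""
    times = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def looks_regular(intervals, tolerance=0.1):
    """True when the intervals vary by no more than `tolerance` of their mean."""
    if len(intervals) < 2:
        return False
    average = mean(intervals)
    return average > 0 and pstdev(intervals) / average <= tolerance


def simultaneous_losses(history, window=timedelta(seconds=5)):
    """Group losses across nodes that occur within `window` of each other.

    `history` maps node name -> list of loss times. Returns a list of
    node-name lists, one per group that involves more than one node.
    """
    events = sorted((t, node) for node, times in history.items() for t in times)
    groups, current = [], []
    for t, node in events:
        if current and t - current[-1][0] > window:
            groups.append(current)
            current = []
        current.append((t, node))
    if current:
        groups.append(current)
    return [sorted({n for _, n in g}) for g in groups if len({n for _, n in g}) > 1]
```

Regular intervals point toward a timer or scheduled activity, and grouped losses point toward a shared cause rather than a single remote node.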


Ensure that you understand: The current network load.
Explanation: Non-iFIX traffic can affect iFIX's node-to-node communication. Check the following fields to determine network load:

    Multi sends and Multi recvs in TCP (Socket Recs tab)
    dbasrv (Net Recv Queues tab)
    NCB turnaround time (Socket Recs and NBT Recs tabs)

Ensure that you understand: If there are any resource leaks.
Explanation: If you are unable to reestablish a connection, the problem could be a resource leak. Check the NCB Table Dump section of the dump file to see if any NCBs are in use and which connection and application owns them. See if the ID is 1 (Conmgr), 2 (dbasrv), 3 (eda), or 6 (alarming). Check the DBADUMP section to see if any dba send buffers are available.
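To see which connection and application owns the NCBs in use, you can tally the entries by owner. The following Python sketch assumes you have extracted (connection, owner ID) pairs from the NCB Table Dump section by hand, because the dump layout is not described here. The owner IDs come from the table above, and the function name ncbs_by_owner is hypothetical.

```python
from collections import Counter

# Owner IDs listed in the resource-leak row above.
OWNERS = {1: "Conmgr", 2: "dbasrv", 3: "eda", 6: "alarming"}


def ncbs_by_owner(records):
    """Count in-use NCBs per (connection, owner name).

    `records` is an iterable of (connection, owner_id) pairs.
    """
    counts = Counter()
    for connection, owner_id in records:
        owner = OWNERS.get(owner_id, f"unknown({owner_id})")
        counts[(connection, owner)] += 1
    return counts
```

Calling most_common() on the result lists the heaviest consumers first.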

After using the information in the previous tables to troubleshoot the problem, follow these general guidelines, depending on the nature of the problem:

Address any configuration issues, such as having view clients listed in the remote node network list, or enabling or disabling inactivity or other timers.
If you are concerned about network load and timeouts, increase session timers or reduce network load by adjusting picture refresh rates.
If you are concerned about resources, perform periodic NETDIAG dumps and watch the resource usage over time.
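Watching resource usage over time means comparing the same counter across successive dumps. The following Python sketch flags a counter that only grows. It assumes you have recorded one value per dump (for example, the NCB or NTSK count), and it reuses the rule from Table 1-9 that values below 10 do not indicate a leak. The function name possible_leak and the three-sample minimum are assumptions.

```python
LEAK_FLOOR = 10  # per Table 1-9, NCB/NTSK values below 10 are not a leak


def possible_leak(samples, min_samples=3):
    """True when a counter never decreases, ends higher than it started,
    and has reached at least LEAK_FLOOR.

    `samples` holds one counter value per NETDIAG dump, oldest first.
    """
    if len(samples) < min_samples or max(samples) < LEAK_FLOOR:
        return False
    never_decreases = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_decreases and samples[-1] > samples[0]
```

A True result is only a prompt to look closer. A counter can also grow for legitimate reasons, such as new connections being added.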
