Martin Ross - martin.ross@uk.ibm.com

IBM Integration Bus V10 Performance


How to analyse your system to optimise performance and throughput

© 2009 IBM Corporation
Overview

 The purpose of this presentation is to demonstrate how to find the cause of poor performance for an
IBM Integration Bus node (broker) for two different types of problem.
 The examples are obtained on a Windows system but the principles of investigation and problem
determination apply equally on all platforms. The system level tools will differ though.

14 July 2015 © 2015 IBM Corporation
Agenda

 Introduction
 Tools
 Techniques
 Demonstration

What are the main performance costs in message flows?

 Parsing (A B C … X Y Z)
 Tree Navigation (Root.Body.Level1.Level2.Level3.Description.Line[1];)
 Tree Copying (Set OutputRoot = InputRoot;)
 Resource Access
 Processing Logic

Integration Bus Processes

 Bipservice
– Lightweight and resilient process that starts and monitors the bipbroker process
– If the bipbroker process fails, bipservice will restart it
 Bipbroker
– A more substantial process. Contains the deployment manager and administrative agent. All commands, toolkit connections and WebUI go through this process.
– Responsible for starting and monitoring the biphttplistener, bipMQTT and DataFlowEngine processes.
– If any of these processes fails, bipbroker will restart it.
 BipMQTT
– Handles MQTT events
 Biphttplistener
– Runs the broker-wide HTTP connector for HTTP and SOAP nodes.
 DataFlowEngine
– Runtime engine for all deployed resources.

[Diagram: an Integration Node containing Integration Server through Integration Server [n], each holding Applications, Message flows and Libraries]

Which resources and how much

 Understand typical resource utilisation, so that you can tell whether current utilisation is higher than expected or running as normal
 In busy times expect to use what is needed (!)
– Exactly what will depend on the configuration and the applications
– Typical to use CPU and memory plus I/O to some level
 In quiet times Message Broker and MQ processes
– Should use very little CPU
– Should use very little I/O capacity
– Will retain memory
 Some memory sizes whilst running the Coordinated Request Reply sample
– Bipservice 3.7 MB
– Bipbroker 112 MB
– Biphttplistener 35 MB
– DataFlowEngine 154 MB
• Can use from ~100 MB to gigabytes depending on the number of flows, the complexity of the message flows and the size of the messages
 MQ processes
– Expect it to be less than IBM Integration Bus (76 MB for a simple queue manager)
– Will depend on number of open queues, channels, queue buffer sizes etc.

Tools that are needed

 Monitoring tools
– At the operating system level to observe
• System resource usage – CPU, memory, I/O activity
• Heaviest resource users

– At the component level to observe


• Behaviour within the particular component (MQ / IBM Integration Bus)

– Both types of tools are needed


• They have different views of the world
• They are complementary

 Driving tools
– Needed to generate a continuous workload
• Important to assess performance after warm-up during sustained activity

UNIX tools
 vmstat

System Configuration: lcpu=64 mem=8192MB

kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b     avm   fre  re  pi  po  fr   sr  cy  in   sy   cs us sy id wa
 1  0 1977672 25823   0   0   0   0    0   0   3  958  696  4  0 96  0
 1  0 1977838 25719   0   2   0  98  100   0  29 2941 2250  4  0 96  0
 1  0 1977685 25872   0   0   0   0    0   0   2  636  483  4  0 96  0

 iostat

System configuration: lcpu=64 drives=5 paths=6 vdisks=2

tty:   tin   tout   avg-cpu:  % user  % sys  % idle  % iowait
       0.0   29.5                3.6    0.1    96.2       0.0

Disks:   % tm_act   Kbps   tps   Kb_read   Kb_wrtn
hdisk3        0.0    0.0   0.0         0         0
hdisk2        0.0    0.0   0.0         0         0
hdisk0        0.0    4.0   1.0         8         0
hdisk1        0.0    0.0   0.0         0         0
cd0           0.0    0.0   0.0         0         0

 nmon

 filemon

Windows tools – Process Explorer

 Watch system activity in detail on Windows


 Watch
– CPU usage
– Commit charge
– I/O activity
– Physical memory history
– Summary information
– Individual processes

 Download from https://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Process Explorer

 DataFlowEngine.exe
– This is the Integration Server

 amqzlaa0.exe
– This is the MQ agent for LOCAL connections (including the broker)

 amqrmppa.exe
– This is the MQ agent for CLIENT connections

 Can quickly see system is busy

 Customise by selecting columns of choice

Process Explorer – Task Information

Find Integration Server name from PID
mqsilist DEMO -d2
-----------------------------------
BIP1286I: Integration server 'default' on integration node 'DEMO' is running.

Number of message flows that are enabled to run: '4'.


Number of applications that are enabled to run: '2'.
Process ID: '8248'
UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c'
Short description: ''
Long description: ''
BIP8071I: Successful command completion.
-bash-4.1$ ps -ef | grep DataFlowEngine
mqm 4331 4302 0 13:12 pts/0 00:00:00 grep DataFlowEngine
mqm 28788 28701 99 12:57 ? 00:41:56 DataFlowEngine CSIM d2cd939a-a7a3-46ce-8168-b89c77744511 default
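
The lookup from PID to Integration Server name can also be scripted. The following is a minimal sketch in Python, assuming the mqsilist output has been captured as text in the shape shown above. The names SAMPLE and servers_by_pid are illustrative and not part of the product.

```python
import re

# Sample text in the shape of the `mqsilist DEMO -d2` output shown above.
SAMPLE = """\
BIP1286I: Integration server 'default' on integration node 'DEMO' is running.
Number of message flows that are enabled to run: '4'.
Number of applications that are enabled to run: '2'.
Process ID: '8248'
UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c'
BIP8071I: Successful command completion.
"""

SERVER_RE = re.compile(
    r"BIP1286I: Integration server '([^']+)' on integration node '([^']+)'"
)
PID_RE = re.compile(r"Process ID: '(\d+)'")


def servers_by_pid(text):
    """Map each process ID to the (node, server) pair reported just before it."""
    result = {}
    current = None
    for line in text.splitlines():
        match = SERVER_RE.search(line)
        if match:
            current = (match.group(2), match.group(1))
            continue
        match = PID_RE.search(line)
        if match and current is not None:
            result[int(match.group(1))] = current
            current = None
    return result


print(servers_by_pid(SAMPLE))  # {8248: ('DEMO', 'default')}
```

With this mapping, a busy DataFlowEngine PID seen in Process Explorer can be tied back to a named Integration Server.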

Key tools at the component level

 Integration Bus
– User trace
– Trace nodes
– Activity Log
– WebUI
• Accounting & Statistics: Compare flow statistics at the node (broker), server (execution group),
container (application or library) or at an individual message flow level
• Resource Statistics: View resource use at the execution group level

 MQ Explorer

 Java Health Center

Statistics scope

[Diagram: statistics scope. Levels shown: Node (broker), Server (execution group), Thread, Message Flow, Node, Terminals, Message Model. Statistics types shown: Resource Statistics, Accounting & Statistics.]

Accounting & Statistics

 Dynamic reporting of message flow usage for
• Problem determination
• Chargeback
• Profiling

 Data at two levels
1. Universal (CPU cost, IO, elapsed time)
2. Broker specific (messages, commits, backouts)

 Choice of destination
1. Trace
2. PubSub
3. SMF (z/OS only)

 A choice of intervals
• Short or snapshot
• Long or archive

 Attach a label to the information
– AccountingOrigin to collect data to identify and collate this information according to the specific origin of a message, even in consolidated flows.

Subscribing to Accounting & Statistics

 Publish/Subscribe data is published on the topic

$SYS/Broker/brokerName/StatisticsAccounting/recordType/executionGroupLabel/messageFlowLabel

A subscription for $SYS/Broker/+/StatisticsAccounting/# receives all statistics for all brokers

Notes: The following three characters have a special meaning:


– The topic level separator "/"
– The multilevel wildcard "#"
– The single-level wildcard "+"
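
The effect of the three special characters can be sketched with a small matcher. This is a simplified illustration in Python, not the broker's own matching code. The function topic_matches and the example topic values (DEMO, SnapShot, default, BackendReplyApp) are assumptions chosen for illustration.

```python
def topic_matches(pattern, topic):
    """Return True if topic matches pattern, using "/" levels, "+" and "#"."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for index, level in enumerate(pattern_levels):
        if level == "#":
            # Multilevel wildcard: matches the rest of the topic, whatever its depth.
            return True
        if index >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[index]:
            # "+" matches exactly one level; anything else must match literally.
            return False
    return len(pattern_levels) == len(topic_levels)


# Illustrative topic with three levels after StatisticsAccounting.
topic = "$SYS/Broker/DEMO/StatisticsAccounting/SnapShot/default/BackendReplyApp"

print(topic_matches("$SYS/Broker/+/StatisticsAccounting/#", topic))  # True
print(topic_matches("$SYS/Broker/+/StatisticsAccounting/+", topic))  # False
```

A trailing "+" covers only one level, so a topic with several levels after StatisticsAccounting needs "#" to be matched in full.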

Resource Statistics

 Dynamic reporting of the performance and operating details of resources used by execution groups
• Problem determination
• Profiling

 Supported resources
1. CICS
2. CORBA
3. FTEAgent
4. JDBCConnectionPools
5. JVM
6. ODBC
7. SOAPInput
8. Security
9. Outbound Sockets

 Data
• XML messages distributed using PubSub
• Access as raw data or view in WebUI

 Regular reporting
• Data published approximately every 20 seconds

Subscribing to Resource Statistics

 Data is published on the topic

$SYS/Broker/brokerName/ResourceStatistics/executionGroupLabel

A subscription for $SYS/Broker/+/ResourceStatistics/+ receives all statistics for all brokers

Notes: The following three characters have a special meaning:


– The topic level separator "/"
– The multilevel wildcard "#"
– The single-level wildcard "+"

WebUI – Accounting & Statistics

 Using the WebUI in IBM Integration Bus v10:
– Control statistics at all levels
– Easily view and compare flows, helping to understand which are processing the most messages or have the highest elapsed time
– Easily view and compare nodes, helping to understand which have the highest CPU or elapsed times
– View all statistics metrics available for each flow
– View historical flow data

WebUI – Resource Statistics

 View resource statistics for resource managers in IIB such as:


– JVM
– ODBC
– JDBC
– parsers, etc.

MQ Explorer

IBM Support Assistant and Java Health Center

 Java Health Center is provided as part of the IBM Support Assistant
– Offers a very low overhead monitoring tool
– Runs alongside an IBM Java application
 Get visibility, monitoring and profiling in the following application areas:
– Performance
• Java method profiling
• Lock analysis
• Garbage collection
– Memory
– System
– Java Class
– File input and output
– Object
 Enable Health Center in the application JVM prior to use
– IBM_JAVA_OPTIONS=-Xhealthcenter

Demonstration of analysing performance issues

 Identify problems in two message flows using
– Process Explorer
– WebUI Statistics
– MQ Explorer
– Java Health Center

 Scenarios: Coordinated Request Reply and Java Compute Transform

Demonstration 1

Analysing a performance problem in the Coordinated Request Reply Scenario

Coordinated Request Reply message flows

 Consists of three message flows


– Request
• Converts incoming message from XML to CWF
• Saves the incoming message in a queue for subsequent reply processing
• Writes a message for the back end reply message flow

– BackendReplyApp
• Sets the completion time in the message
• Writes a reply message

– Reply
• Reads the message from the back end message flow
• Retrieves the original message saved by the request message flow
• Writes an output message

Coordinated Request Reply queues
 The queues
– Request: CSIM_SERVER_IN_Q → GET_BACKEND_REQ, with the request saved in GET_REPTO_STORE
– BackendReplyApp: GET_BACKEND_REQ → GET_BACKEND_REP
– Reply: GET_BACKEND_REP → CSIM_COMMON_REPLY_Q, with the saved request retrieved from GET_REPTO_STORE

Run and investigate

Steps
1. Ensure all components are started and the applications work as expected
- Message flows, databases, external applications etc.
2. Start a load generator [JMSPerfharness in this case]
3. Look at activity
- Is processing happening at the expected rate?
- Is CPU usage as expected?
- Is memory usage as expected?
4. If things do not seem as expected
- Look for build up of messages
- Poor service times
5. Enable and view statistics
6. Analyse statistics
7. Examine message flows

Step 1 – Check flows are running using the WebUI

 Check the server is running


 Check the flows are running
 Check the event/sys log for any errors
 Processing messages and no errors

Step 2 – Start a load generator

 Run JMSPerfharness
– Using 10 threads

 All threads start successfully
– Each thread PUTs a message then GETs a message, so there should be no messages on queues for any period of time

 Check event/sys log for any error messages

Step 3 – Look at CPU activity

 Messages being processed but:


– Rate is low, much lower than expected
– Very little CPU being used
• Integration Server does not register any CPU activity

Step 4 – Look for a build up of messages
 Key queues are
– Request: CSIM_SERVER_IN_Q, GET_REPTO_STORE, GET_BACKEND_REQ
– BackendReplyApp: GET_BACKEND_REQ, GET_BACKEND_REP
– Reply: GET_BACKEND_REP, GET_REPTO_STORE, CSIM_COMMON_REPLY_Q
 Build up of messages on queues:
– GET_REPTO_STORE
– GET_BACKEND_REQ
 What does this mean?

Step 4 – Look for a build up of messages...
 Looking at the flows
– Queue GET_REPTO_STORE is used by Request and Reply message flows
– Queue GET_BACKEND_REQ is used by BackendReplyApp message flow

– GET_REPTO_STORE is used mid-flow (so flows using this are less likely to be the problem)
– GET_BACKEND_REQ is the input queue for the BackendReplyApp
• Indicates flow is not running fast enough or not enough instances allocated
 Need to investigate what is happening with BackendReplyApp
– For this use WebUI flow statistics
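
The reasoning above (a growing input queue points at the flow that reads it) can be sketched as a small check. This is a minimal Python illustration. The depth samples are hypothetical numbers, not measurements from the demonstration, and the function names are invented for the sketch.

```python
# Hypothetical queue depth samples (messages on each queue), oldest first.
DEPTH_SAMPLES = {
    "CSIM_SERVER_IN_Q": [0, 1, 0, 0],
    "GET_REPTO_STORE": [5, 40, 90, 150],
    "GET_BACKEND_REQ": [5, 40, 90, 150],
    "GET_BACKEND_REP": [0, 0, 1, 0],
    "CSIM_COMMON_REPLY_Q": [0, 0, 0, 0],
}

# Input queue of each flow, taken from the flow descriptions above.
INPUT_QUEUE = {
    "Request": "CSIM_SERVER_IN_Q",
    "BackendReplyApp": "GET_BACKEND_REQ",
    "Reply": "GET_BACKEND_REP",
}


def is_growing(samples):
    """True when every sample is deeper than the one before it."""
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def suspect_flows(depths, input_queue):
    """Flows whose input queue keeps growing, so they are not keeping up."""
    return [flow for flow, queue in input_queue.items() if is_growing(depths[queue])]


print(suspect_flows(DEPTH_SAMPLES, INPUT_QUEUE))  # ['BackendReplyApp']
```

GET_REPTO_STORE also grows in this sample, but it is used mid-flow rather than as an input queue, so it does not single out a flow on its own.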
Step 5 – Enable flow statistics

 Start and stop statistics using the WebUI for:


– All flows in a server
– All flows in a container
– Individual flows

Step 5 – View statistics

 Select the statistics view


 Drill down to the problem flow
 Start by comparing flows
 Flow analysis view for most detail

Step 6 – Compare flows

 Compare flows to determine which one might be causing the problem


 We can see that the BackendReplyApp flow has an average elapsed time of 1,000.9 milliseconds. It
only has 1 active thread, and has processed 20 messages in the 20 second statistical snapshot
period.
 This matches the rate we see in JMSPerfHarness!
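
The arithmetic behind this observation can be checked quickly. The sketch below assumes each thread processes one message at a time, so the best possible rate is threads divided by average elapsed time. The target rate of 10 messages per second is a hypothetical figure used only to show the sizing calculation.

```python
import math


def max_rate_per_second(threads, avg_elapsed_ms):
    """Upper bound on throughput when each thread handles one message at a time."""
    return threads * 1000.0 / avg_elapsed_ms


def threads_needed(target_rate_per_second, avg_elapsed_ms):
    """Smallest number of threads whose upper bound reaches the target rate."""
    return math.ceil(target_rate_per_second * avg_elapsed_ms / 1000.0)


observed_rate = 20 / 20  # 20 messages in the 20 second snapshot
ceiling = max_rate_per_second(threads=1, avg_elapsed_ms=1000.9)

print(f"observed {observed_rate:.2f} msg/s, ceiling about {ceiling:.2f} msg/s")
print(threads_needed(target_rate_per_second=10, avg_elapsed_ms=1000.9))
```

One thread at roughly one second per message cannot exceed about one message per second, which is the rate seen in the snapshot.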

Step 6 – Analyse the flow

 Display historical flow details such as message rate, CPU and elapsed time
 View all nodes within the flow and sort by average elapsed and CPU times
 The compute node Modify_CompletionTime seems to be a problem!

 What do high elapsed time and low CPU time suggest the problem might be?

Step 7 – Review the code

 Having worked out which node is causing the problem


 We can quickly see why the node is taking 1 second elapsed time but little CPU

Problem found!!

 1 second sleep in the compute node within the message flow is causing slow processing times and no
CPU usage
– Matches the observations at the start
• Low CPU and low message rate

 It is unlikely to be this easy in future, but slow service times, such as slow synchronous web service invocations, would have the same effect

 If it was slow web service response times then allocate more additional instances to improve
processing rate
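
The signature of this kind of problem, high elapsed time with little CPU time, can be reproduced outside the product. The sketch below is a Python illustration rather than the flow's ESQL. The 0.2 second sleep stands in for the sleep or a slow synchronous call, and the loop stands in for CPU-bound work.

```python
import time


def timed(fn):
    """Run fn and return (elapsed_seconds, cpu_seconds) for this process."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    fn()
    return time.perf_counter() - start_wall, time.process_time() - start_cpu


def waiting():
    time.sleep(0.2)  # stands in for a sleep or a slow synchronous call


def computing():
    total = 0
    for i in range(2_000_000):
        total += i * i


for name, fn in (("waiting", waiting), ("computing", computing)):
    elapsed, cpu = timed(fn)
    print(f"{name}: elapsed {elapsed:.3f}s, cpu {cpu:.3f}s")
```

The waiting case shows elapsed time far above CPU time, while the computing case shows the two close together.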

Summary of steps for this investigation

 Use a systematic approach
– Key steps used were
1. Ensure all components are started and the applications work as expected
- Message flows, databases, external applications etc.
2. Start a load generator [JMSPerfharness in this case]
3. Look at activity
- Is processing happening at the expected rate?
- Is CPU usage as expected?
- Is memory usage as expected?
4. If things do not seem as expected
- Look for build up of messages
- Poor service times
5. Enable and view statistics
6. Analyse statistics
7. Examine message flows
 It is very important to
– Use tools
• System level and component level
– Start at a high level – system level and then close-in on the problem

Demonstration 2

Analysing a performance problem in the Java Compute Transform Scenario

JavaComputeTransform message flows

 Consists of one message flow


– JavaCompute
• Reads an XML message
• Transforms to a different format using a Java Compute node

JAVA_COMPUTE_IN → JavaCompute → JAVA_COMPUTE_OUT

What is the problem we need to solve?

 The problem is characterised by


– Low message rate
– High CPU usage at both system and Integration Server level
– Sufficient messages on the input queue

 Likely issue is one of high CPU usage in a message flow


– But which flow and which node?

Compare the flows

 All of the elapsed and CPU time is in the JavaCompute message flow, so continue investigation here

Finding the processing Node for investigation

 The majority of the elapsed and CPU time within the flow is spent in the JavaCompute Node

 What might cause this?

 As this is a Java Compute Node, continue the investigation using the Java Health Center

Find the Integration Server port for Java Health Center

Environment variable:
IBM_JAVA_OPTIONS=-Xhealthcenter
Opens ports starting at 1972. The Integration Server running the JavaComputeTransform scenario is using port 1974.

Alternate method for finding the port number

> mqsilist DEMO -d2
-----------------------------------
BIP1286I: Integration server 'default' on integration node 'DEMO' is running.
Number of message flows that are enabled to run: '4'.
Number of applications that are enabled to run: '2'.
Process ID: '7284'
UUID: 'e1306ebc-3c3a-43c2-b18a-bbdb99e07d5c'
Short description: ''
Long description: ''
BIP8071I: Successful command completion.

> netstat -a -b -n -o
[runmqlsr.exe]
TCP [::]:1972 [::]:0 LISTENING 7352
[bipbroker.exe]
TCP [::]:1973 [::]:0 LISTENING 7748
[biphttplistener.exe]
TCP [::]:1974 [::]:0 LISTENING 7284
[DataFlowEngine.exe]
TCP [::]:4417 [::]:0 LISTENING 7352
[bipbroker.exe]
TCP [::]:49152 [::]:0 LISTENING 924
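
Matching the Process ID from mqsilist against the netstat listing can be scripted. The following is a minimal Python sketch, assuming the netstat output has been captured as text in the shape shown above. NETSTAT_SAMPLE and listening_ports are illustrative names.

```python
import re

# Lines in the shape of the `netstat -a -b -n -o` output shown above.
NETSTAT_SAMPLE = """\
TCP [::]:1972 [::]:0 LISTENING 7352
[bipbroker.exe]
TCP [::]:1973 [::]:0 LISTENING 7748
[biphttplistener.exe]
TCP [::]:1974 [::]:0 LISTENING 7284
[DataFlowEngine.exe]
TCP [::]:4417 [::]:0 LISTENING 7352
[bipbroker.exe]
"""

# Captures the local port and the owning PID of a listening TCP socket.
LISTEN_RE = re.compile(r"^\s*TCP\s+\S+:(\d+)\s+\S+\s+LISTENING\s+(\d+)\s*$")


def listening_ports(netstat_text, pid):
    """Return the local ports on which the given PID is listening."""
    ports = []
    for line in netstat_text.splitlines():
        match = LISTEN_RE.match(line)
        if match and int(match.group(2)) == pid:
            ports.append(int(match.group(1)))
    return ports


print(listening_ports(NETSTAT_SAMPLE, 7284))  # [1974]
```

PID 7284 is the Integration Server reported by mqsilist, so 1974 is the port to give to Health Center.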

Invoking the Java Health Center

Attaching to the Integration Server JVM

Connect to a port

Connection complete and ready to analyse

Analysis and Recommendations – CPU

Analysis and Recommendations – Classes

Analysis and Recommendations – Environment

Analysis and Recommendations – Garbage Collection

Analysis and Recommendations – I/O

Analysis and Recommendations – Locking

Analysis and Recommendations – Native Memory

Analysis and Recommendations – Profiling

The cause

 Having worked out which node is causing the problem
 We can quickly see why the node is consuming a lot of CPU
 A call to the method bubble_sort() just before propagating out of the node is sorting the entire output message
If you suspect there is a product problem

 Identify the problem as best you can


 Find the simplest test that recreates the problem
 Collect the data identified in the Must Gather list
– For IIB: http://www-01.ibm.com/support/docview.wss?rs=849&uid=swg21209857
– For MQ: http://www-01.ibm.com/support/docview.wss?uid=swg21229861#MG6

Summary

 Wide range of tools available covering operating system and component performance
– Expect to use multiple tools
– After all it is important to understand what is happening at different levels
– Demonstration has shown how to use the key tools for MQ and IIB to debug a problem
 Practise beforehand
– Being familiar with the tools is a great help in a crisis
– Learning a new tool and solving a crisis is not a good combination
 Know your applications and systems
– What is normal in terms of processing rate, CPU usage etc.
– This information allows you to know whether there is a problem and to what extent

Additional Information

 WebSphere Message Broker: Designing for Performance


– http://www-01.ibm.com/support/docview.wss?rs=849&uid=swg24006518

 WebSphere Message Broker: Message display, test & performance utilities (IH03)
– http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg24000637

 IBM Monitoring and Diagnostic Tools for Java – Getting started with Health Center
– http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/getting_started.html

 IBM Monitoring and Diagnostic Tools for Java – Health Center


– http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/

 IBM Monitoring and Diagnostic Tools for Java – Knowledge Center


– http://www-01.ibm.com/support/knowledgecenter/#!/SS3KLZ/SS3KLZ/welcome_tools_family.html

Backup chart

 MQ processes
 Additional Instances usage and tuning

WebSphere MQ V7 Processes

Task      Function
AMQALMPX  The checkpoint processor that periodically takes journal checkpoints.
AMQZMUC0  Utility manager. This job executes critical queue manager utilities, for example the journal chain manager.
AMQZXMA0  The execution controller that is the first job started by the queue manager. It handles MQCONN requests, and starts agent processes to process WebSphere MQ API calls.
AMQZFUMA  Object authority manager (OAM).
AMQZLAA0  Queue manager agents that perform most of the work for applications that connect to the queue manager using MQCNO_STANDARD_BINDING.
AMQZLAS0  Queue manager agent.
AMQZMUF0  Utility manager.
AMQZMGR0  Process controller. This job is used to start up and manage listeners and services.
AMQZMUR0  Utility manager. This job executes critical queue manager utilities, for example the journal chain manager.
AMQZDMAA  Deferred message processor.
AMQFQPUB  Publish/subscribe process.
AMQFCXBA  Broker worker job.
RUNMQBRK  Broker control job.
AMQRMPPA  Channel process pooling job.
AMQCRSTA  TCP/IP-invoked channel responder.

WebSphere MQ V7 Processes

Task      Function
AMQCRS6B  LU62 receiver channel and client connection.
AMQRRMFA  Repository manager for clusters.
AMQCLMAA  Non-threaded TCP/IP listener.
AMQPCSEA  PCF command processor that handles PCF and remote administration requests.
RUNMQTRM  Trigger monitor.
RUNMQDLQ  Dead letter queue handler.
RUNMQCHI  The channel initiator.
RUNMQCHL  Sender channel job that is started for each sender channel.
RUNMQLSR  Threaded TCP/IP listener.
AMQXSSVN  Shared memory servers.
AMQRCMLA  Channel MQSC and PCF command processor.
AMQZTRCN  Trace.

 The number of processes present at any time will vary
– Dependent on configuration, applications running, etc.
– Some will always be present, such as AMQZXMA0 (Execution Controller)

Additional Instances usage and tuning

 Integration Server level data contains the following data for each message flow in it:
– MessageFlowName
– TotalElapsedTime
– MaximumElapsedTime
– MinimumElapsedTime
– TotalCPUTime
– MaximumCPUTime
– MinimumCPUTime
– CPUTimeWaitingForInputMessage
– ElapsedTimeWaitingForInputMessage
– TotalInputMessages
– TotalSizeOfInputMessages
– MaximumSizeOfInputMessages
– MinimumSizeOfInputMessages
– NumberOfThreadsInPool
– TimesMaximumNumberOfThreadsReached
– TotalNumberOfMQErrors
– TotalNumberOfMessagesWithErrors
– TotalNumberOfErrorsProcessingMessages
– TotalNumberOfCommits
– TotalNumberOfBackouts
– TotalNumberOfTimeOutsWaitingForRepliesToAggregateMessages

 Fields NumberOfThreadsInPool and TimesMaximumNumberOfThreadsReached show, for every message flow, the number of additional instances allocated and the number of times they were all used
– Use this data to determine if:
• More additional instances are required
• Too many are allocated

Additional Instances usage and tuning

 % Time Thread Pool Limit Reached = TimesMaximumNumberOfThreadsReached / TotalInputMessages
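
The ratio above can be computed directly from the two statistics fields. The sketch below is a minimal Python illustration. The multiplication by 100 to express the ratio as a percentage is an assumption, and the sample values of 180 and 200 are hypothetical.

```python
def pct_thread_pool_limit_reached(times_max_threads_reached, total_input_messages):
    """Percentage of input messages for which every thread in the pool was in use."""
    if total_input_messages == 0:
        return 0.0  # no messages processed, so the limit was never reached
    return 100.0 * times_max_threads_reached / total_input_messages


# Hypothetical values for TimesMaximumNumberOfThreadsReached and TotalInputMessages.
print(pct_thread_pool_limit_reached(180, 200))  # 90.0
```

A value close to 100 suggests the flow could use more additional instances, while a value close to 0 suggests some of the allocated instances are never needed.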
