
A Best Practices Guide for Event Processing Solutions based on TIBCO BusinessEvents™

This document is a guide to best practices for architecting, designing, and implementing solutions based on TIBCO BusinessEvents. The guide covers generic patterns as well as specific solution patterns such as the Transaction Analysis solution. For specific solution examples it also covers models for physical architecture and capacity planning. This document will evolve over time as new insight is gained from real-life projects and requirements/constraints.

http://www.tibco.com Global Headquarters 3303 Hillview Avenue Palo Alto, CA 94304 Tel: +1 650-846-1000 Toll Free: 1 800-420-8450 Fax: +1 650-846-1005
Copyright 2004, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, The Power of Now, and TIBCO Software are trademarks or registered trademarks of TIBCO Software Inc. in the United States and/or other countries. All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only. 0204


1 Document Revisions
Version  Date            Author          Notes
0.1      Oct 6th, 2004   R. GomezUlmke   Document created.
0.2      May 23rd, 2005  H. Karmarkar    Document updated for BusinessEvents version 1.1
0.3      Nov 9th, 2005   H. Karmarkar    Document updated for BusinessEvents version 1.2
0.4      Dec 1st, 2005   H. Karmarkar    Document updated to include monitoring and fault tolerance discussion


Table of Contents
1 Document Revisions
2 Purpose of Document
3 Generic Design & Implementation Patterns
  3.1 Designing the Ontology
  3.2 Designing Rules
  3.3 Designing State Machines
  3.4 BE Archive Configuration
    3.4.1 Selecting Rulesets
    3.4.2 Enabling Input Destinations
    3.4.3 Designing the Engine Persistence
  3.5 Configuring the Deployment
4 Generic Architecture Patterns
  4.1 Fault-Tolerance & High Availability Planning
  4.2 Scalability
    4.2.1 Stateless Scenarios
    4.2.2 Stateful Scenarios
  4.3 Sizing & Capacity Planning
    4.3.1 Example 1: Performance & Capacity
  4.4 Handling Duplicates
  4.5 Monitoring and Management
    4.5.1 Self Monitoring and Management
    4.5.2 Monitoring & Management of BE Engines Using Hawk

2 Purpose of Document
The purpose of this document is to provide a guide for architecting, designing, and implementing solutions based on TIBCO BusinessEvents™. It assumes familiarity with the product and the availability of its standard documentation. It covers a collection of best practices in the following areas:

- Design and implementation of the ontology, concepts, rules, state models, etc., based on performance optimization criteria as well as operational maintainability, flexibility of design, and similar concerns.
- Architecture patterns for fault tolerance and scalability.
- Specific solution patterns such as Transaction Analysis and others.
- Physical architecture patterns and capacity planning patterns, both for general solutions and for specific solutions.

This guide will evolve over time and will be enhanced with lessons learnt from specific real-life projects. The target audience is architects, designers, and developers.


Throughout this document Courier New will be used for rules engine code samples, and Verdana will be used for deployment variable names as well as property file entries.

3 Generic Design & Implementation Patterns


3.1 Designing the Ontology

Apply the following practices when designing the ontology for optimal performance and throughput:

1. Maintain keys manually within objects to allow for faster joins than using concept references (see the rules section, and the sketch after this list).
2. Maintain a flat data model whenever possible to minimize the number of object types required in rules.
3. Separate events into subclasses, assuming that there are logical divisions that can be made. When events are broken into subclasses, event correlation is faster because objects are filtered by type before rule conditions are evaluated against property values.
4. Model data from an event that will be used in rule conditions as event properties, because the event payload can only be accessed through the XSLT mapper and XPath builder, both of which are more costly than accessing event and concept properties directly.
5. Pay careful attention to event TTL settings. The default setting is 0, which means the event is removed from working memory after one rule evaluation cycle. A TTL < 0 keeps the event in working memory until it is explicitly consumed.
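For example (using hypothetical Order and Payment types in the style of the rule snippets later in this guide), a manually maintained key allows two objects to be correlated with a simple equivalence condition instead of navigating a concept reference:

    // hypothetical types and properties, shown only to illustrate point 1:
    // the manually maintained key Payment.orderId is compared to Order.id directly
    Payment.orderId == Order.id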

3.2 Designing Rules

Apply the following practices when designing rules in general:

1. Use rulesets to group together rules with similar logical functions. This facilitates deployment design and makes it easier to activate and deactivate certain functionality at runtime using TIBCO Hawk methods.
2. Always comment rules to make the goal of the rule clear to other developers who may have to maintain the project.
3. Use different types of concepts or events as the identifiers in different rules, and avoid using generic parent types within rules that would be better addressed by sub-types. BE first filters out unwanted instances/events based on their type before passing them to rule evaluation.
4. Use filter conditions (conditions that use only one single identifier) wherever possible, e.g. Order.amount > 1000 or Order.customer == "Fred". BE evaluates all instances/events against filter conditions before matching them in subsequent join conditions.
5. Minimize the number of identifiers in a rule. Each additional identifier requires an additional join, and joining objects is expensive, as a rule has to match/test N x M combinations. However, it is not always possible to avoid joining or matching multiple objects, so where joins are needed, prefer simple join conditions based on single keys rather than multiple property evaluations. BE has optimizations for certain join conditions, such as equivalence joins (e.g. 'Order.customerId == Customer.Id' or 'Function(X) == Y.property'), which provide constant-time performance.

6. Avoid using I/O functions in conditions or actions, e.g. updating a database in an action. Do this asynchronously whenever possible; for example, send an event to trigger a BusinessWorks process to update the database. BE will support asynchronous functions in future releases, and those asynchronous functions will be executed independently, outside the working memory.
7. Avoid using functions that take a concept as a parameter in a rule condition, e.g. function(ConceptA instance). The engine cannot trace the dependency inside the condition and will make the rule dependent on any change to the instance, so the rule will be re-evaluated for any change to the instance regardless of which property changed.
8. Minimize the usage of the XSLT mapper and XPath builder for creating and modifying instances/events within rules. Executing XSLT and mapping/creating XML is expensive. Use the factory methods from the Ontology functions tab whenever possible to create new instances or events, and use event properties instead of the event payload where possible. Take extra care to avoid using the XPath builder in a condition when the argument is a concept, as this creates the "any change" dependency described in point 7.
9. Perform a check for null when evaluating the length of a string, e.g. cu.ATTRIBUTE13 != null && String.length(cu.ATTRIBUTE13) > 0.
10. Perform a check for array length before accessing an array property by index, e.g. cu.array@length > 4 && cu.array[4] == "xx" (points 4, 5, 9, and 10 are combined in the sketch after this list).
11. Delete instances and consume events that are no longer needed. Keeping them in working memory consumes memory and may cause unnecessary matching evaluation, both slowing the engine and wasting memory.
12. When identifying containment relationships between concepts, matching the container and contained instance with contained@parent == container is much more efficient than Instance.PropertyArray.indexOfContainedConcept(container.containedConceptProperty, contained) != -1.
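A minimal sketch of a condition list that combines points 4, 5, 9, and 10, reusing the snippets quoted above (Order, Customer, and cu are the placeholder identifiers from those snippets; each line below is one condition of the rule):

    // filter condition on a single identifier (point 4) - evaluated first
    Order.amount > 1000
    // equivalence join on a single key (point 5) - optimized, constant-time matching
    Order.customerId == Customer.Id
    // guard against null strings and short arrays before dereferencing (points 9 and 10)
    cu.ATTRIBUTE13 != null && String.length(cu.ATTRIBUTE13) > 0
    cu.array@length > 4 && cu.array[4] == "xx"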

3.3 Designing State Machines

1. There are cases where rules are completely independent from each other, and using a state model imposes an unnecessary ordering condition on these rules as well as additional state history maintenance overhead. When this is the case, consider using only standalone rules and avoiding the state machine altogether.
2. When using a state machine with a composite state to represent the logical flow of a concept, use conditionless transitions to exit composite states to prevent interrupt behaviour when the transition condition evaluates to true (unless this is the desired behaviour). To avoid this without adding a dummy state after the transition, a called state machine can be used instead of the composite state.
3. Keep all rules that interact with the state machine within the state machine transitions rather than adding them as standalone rules. This makes it easier for future developers to review the flow of the state model.


3.4 BE Archive Configuration

When configuring the archive you have the option of setting a number of parameters.

3.4.1 Selecting Rulesets

By selecting only the rulesets that are crucial to your engine you can minimize the footprint of the engine at runtime and maintain maximum performance. However, any ruleset that is not selected here will not be compiled into the engine and cannot later be activated through Hawk.

3.4.2 Enabling Input Destinations

If you select the default listener set, every destination that is the default destination for an event within a rule declaration will be enabled as an input destination. If you select custom, you can designate which destinations the engine will receive input messages from.

3.4.3 Designing the Engine Persistence

The following general recommendations should be considered when configuring BE engine persistence, but they are only guidelines: persistence tuning is highly dependent on the use case, the manner in which events are received and handled, and the runtime state of the engine and working memory (average event rate, burst rates, number of active existing objects).

1. Enabling persistence adds overhead and will slightly reduce the speed of the engine, but it also enables the property cache, allowing the least used properties of objects to be swapped out to disk. In cases where the number of objects in memory would exceed the available heap, persistence should be enabled (with the truncate deployment option if persistence is not explicitly required by the use case and is enabled only to use the property cache).
2. The property cache size is the number of object properties that BusinessEvents will keep in memory. This setting should be adjusted per use case based on the number of object properties that should remain in memory when persistence is enabled. Based on the LRU (least recently used) implementation, the most actively used properties remain in the property cache, up to the number defined by the user in this field. A property cache size that is too low will lead to thrashing, and one that is too high will lead to excessive memory consumption. It is important to test this setting for best performance, as it is highly dependent on the use case (a rough sizing illustration follows this list).
3. The checkpoint interval for persistence is another configuration parameter that depends heavily on the use case, specifically the rate of change of the facts in the engine. Each checkpoint writes only the modified objects to disk. With many events coming in and altering facts, it makes sense to have a lower checkpoint interval, whereas with few events a larger interval can make sense. We recommend a range between 10 and 50 seconds, with a default of 30.
4. Unless you have specific analysis requirements for all historical instances, use the "Delete Retracted Objects from Database" option in the persistence configuration. Checking this option physically removes deleted objects from the persistence layer database, as opposed to keeping them in the database and marking them deleted internally. This helps prevent unchecked growth of the persistence database and, under certain conditions, may provide better performance from the persistence layer.
5. Refer to the deployment configuration section for information on the Berkeley DB cache size.
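As a rough illustration of point 2 (the numbers are hypothetical, not measurements): if the working set is expected to hold around 100,000 concept instances with an average of 10 properties each, a property cache size in the region of 1,000,000 keeps the full working set in memory, whereas a setting of, say, 100,000 would force most property accesses to go to the disk-backed store and would likely thrash. Whatever figure is chosen should be validated by testing with realistic data volumes.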


3.5 Configuring the Deployment

The following general rules should be applied to BusinessEvents deployment configurations, but optimal settings are highly dependent on the actual use case and the runtime character of the system.

1. The Berkeley DB cache percentage is initially set to 20%. This means that when persistence is enabled, 20% of the entire JVM heap will be used as the cache for the embedded DB. The adjustment of this parameter depends heavily on the way objects are used by the engine, and the best approach is testing; we recommend keeping values between 15% and 30%. This property can be set at deploy time through the administrator using the deployment variable be.engine.om.berkeleydb.internalcachepercent, but it should not be adjusted unless the trade-off of persistence layer performance versus memory consumption has been carefully observed through testing of the specific use case with realistic volumes of data.
2. On a multi-CPU machine, the following properties should be set as follows (see the sketch after this list): be.engine.wm.poolsize=1 and be.engine.wm.queuesize=<some big number, e.g. 10000>.
3. To run BusinessEvents, the administrator is recommended but not required. During development and testing it is often easier to run the deployment from the BE tester within Designer, or to run the ear file from a command line by running the executable and passing the ear file as an argument: .\be-engine.exe -propFile <filename>.ear (note that the executable will pick up whichever tra file shares the same name as the executable within the current working directory).
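For reference, a minimal sketch of how the properties named in points 1 and 2 might look when set together as engine properties (the values shown are simply the examples and defaults quoted above, not tuned recommendations):

    # working memory threading on a multi-CPU machine
    be.engine.wm.poolsize=1
    be.engine.wm.queuesize=10000
    # embedded Berkeley DB cache, as a percentage of the JVM heap
    be.engine.om.berkeleydb.internalcachepercent=20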

4 Generic Architecture Patterns


4.1 Fault-Tolerance & High Availability Planning
For fault tolerance in stateful scenarios, the recommended paradigm for BusinessEvents is to configure two engines as a Hot-Cold pair with a re-mountable shared store for the persistence layer. When used with a guaranteed transport such as EMS or RVCM, this configuration ensures no message loss and very little downtime (the downtime depends on the size of the persistence database).

In a stateless high availability configuration, a Hot-Warm pair is recommended. This is achieved by using Hawk to enable the input destinations on the standby engine once the primary goes down. To prevent the ping-pong effect, once the standby engine begins receiving messages it should be designated the primary.

4.2 Scalability

BE is designed to handle high message volumes, but in rare cases where the production scenario exceeds the capability of a single engine there are multiple options for scaling, depending on the scenario.

4.2.1 Stateless Scenarios

For a stateless use case, where rules are evaluated against single events and no cross-event correlation is required, scaling and load balancing can be handled by using a transport to split input messages across multiple engines, for example EMS with round-robin delivery on queues, or Rendezvous distributed queues (RVCMQ).

4.2.2 Stateful Scenarios

When the use case requires stateful scaling, some form of content-based partitioning between multiple engines is recommended. The two recommended ways of handling this are using message selectors in EMS, or configuring a BE engine for pre-correlation, thereby ensuring that all messages for any given flow are processed by a single engine. When using EMS selectors, one has to be able to partition the set of incoming messages by a single field, such as a transaction id, a region, or some other key that is consistent across all messages of a transaction but still divides the total set of events enough to provide performance gains from partitioning between engines. Pre-correlation methods are discussed further in the BE design patterns document.
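As an illustration, assume the partitioning key is carried in a JMS message property named region (a hypothetical name). Two engine instances could then consume from the same destination using standard JMS message selectors, so that all events for a given region always reach the same engine:

    Engine 1 selector: region IN ('NORTH', 'EAST')
    Engine 2 selector: region IN ('SOUTH', 'WEST')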

4.3 Sizing & Capacity Planning

4.3.1 Example 1: Performance & Capacity

The algorithm used in BusinessEvents is very efficient, and it can plough through large quantities of rules and facts in little time. BE uses an improved form of a well-known method called the Rete algorithm to match rules against the working memory. The Rete algorithm explicitly trades space for speed, so BE may require a lot of memory. Using a 1.4.2 JVM on a 1.6 GHz Pentium 4 laptop, BE can fire more than 50,000 rules per second and can add more than 50,000 instances/events to working memory per second. These figures were measured without persistence turned on. TIBCO suggests using 64-bit architecture machines to address large amounts of memory.

It is difficult to analyze the algorithm in the general case, because actual performance depends on the makeup of working memory and on the exact nature of the rules. But BE (using an optimized Rete algorithm) eliminates the inefficiency of a simple pattern matcher by remembering past test results across iterations of the rule loop. Only new or deleted instances/events are tested against the rules at each step. Furthermore, the pattern matcher is organized so that these few facts are only tested against the subset of rules that may actually match.
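As a rough back-of-the-envelope illustration (the per-event figure is a hypothetical assumption, not a measurement): if a use case averages three rule firings per incoming event, the measured 50,000 firings per second corresponds to roughly 16,000 events per second on comparable hardware, before allowing headroom for persistence, I/O performed in actions, and traffic bursts. Actual sizing should always be validated with a representative load test.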

4.4 Handling Duplicates

BE provides a 100% fail-safe environment for event processing by using a mechanism similar to the transaction log of a database, combined with the persistence of the JMS layer. An internal Object Manager logs all activity of the engine into persistent storage for potential later recovery. Recovery at this time is essentially a replay of everything that has happened, in the right order. In order to also satisfy performance requirements, this transaction log is flushed to disk in the background via a separate thread, based on a configurable interval (for this example let's assume an unusually large window of 10 minutes). This potentially leaves a window of 10 minutes in which internal state can be lost due to a fatal crash of the engine. In this case, all input events are still stored in the guaranteed transport, i.e. they have not been acknowledged at this point. The recovery mechanism will then:

1. replay the internal BE transaction log, and
2. start re-receiving the 10 minutes' worth of input events from the guaranteed transport.

At this point exactly the same state as before the crash has been re-established. As with all asynchronous, real-time systems, the possibility of duplicates needs to be handled properly. This is not specific to BE but applies in general to systems that are not based on two-phase commit. There is no generic mechanism for duplicate detection, since duplicates can arise in every layer of the solution where stateful processing is applied: in this solution the BE engines, the databases, the guaranteed transport, and so on. All mechanisms for duplicate detection are tied to specific application and state logic and must be dealt with explicitly. (BE does, however, ensure that it does not consume guaranteed transport messages twice, by comparing message IDs.)

For example, suppose the engine crashes 9 minutes after the last flush. During those 9 minutes a number of alerts, KPIs, etc. have been generated and sent out to the presentation layer or database. In this case duplicate alerts, events, etc. need to be avoided: the logic that inserts events, alerts, or KPIs into a database needs to be duplicate-aware and handle them appropriately (a sketch follows the example scenario below). Another level of potential duplicates can be caused by faults within the collection layer, the data source / agent layer, etc., i.e. any component in the layers that feed events into BE. Therefore, explicit duplicate detection must be modeled as part of the BE models (e.g. state models) based on event IDs, state IDs, etc.

Example Scenario
1. Events are received by BE via JMS, which stores them persistently until they are explicitly acknowledged.
2. BE receives the event and does NOT acknowledge it to JMS at this point.
3. BE evaluates all rules that are triggered by this event.
4. Rules that are triggered call the Event.consume() function to mark this event as to be consumed and acknowledged.
5. Once all rules have been evaluated and the consume flag has been set, BE acknowledges the event to JMS and deletes it from memory, but not before its internal transaction log has been flushed to disk.
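As a sketch of the kind of duplicate-aware logic described above, assume a hypothetical ProcessedAlert concept that records the IDs of alerts already delivered (the concept and property names are illustrative only, not part of the product). A rule whose condition joins an incoming alert event against this concept can detect and suppress re-delivery after a recovery:

    // hypothetical duplicate-detection condition: an incoming AlertEvent whose id
    // matches an already recorded ProcessedAlert instance is a duplicate; the rule
    // action would simply consume the event without re-delivering the alert
    AlertEvent.id == ProcessedAlert.alertId

The corresponding "first delivery" rule would create the ProcessedAlert instance when an alert is sent out; an equivalent check can be applied in the database layer by using the event or alert ID as a unique key.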

4.5 Monitoring and Management

4.5.1 Self Monitoring and Management

Using internal engine functions, a BusinessEvents engine can monitor its own status metrics, such as free JVM memory and the total number of events and instances in the working memory. This data can be used to manage engine behavior as well as to send out status and alert messages to be handled by other processes. Outgoing status messages can be used to send monitoring data to a GUI or to another BusinessEvents engine, as well as to kick off integration processes or workflows using tools like BusinessWorks.

General guidelines for incorporating this meta-functionality into the deployment:

- To store relevant performance statistics, a separate ruleset and scorecard(s) should be used to aggregate and store counters as well as other BE engine data (see the sketch after this list). Separating these meta-application rules allows them to be enabled independently of the application logic and makes the project source easier for subsequent authors to interpret.
- Create a separate channel and ruleset to handle runtime engine control within BE, such as clearing counters, setting flags, and setting parameters. This facilitates maintenance of the project and future updates, and allows this functionality to be disabled independently of both the application logic and the statistics collection.
- Time intervals for publishing statistics events should be stored in scorecards and initialized with global variables. This allows them to be changed later by rules and incoming events, but also gives the deployment administrator the ability to check the initial value and manage it like any other deployment parameter.
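A minimal sketch of the kind of condition such a statistics rule might use, assuming a hypothetical StatsScorecard with eventCount and publishThreshold properties (the names are illustrative, not part of the product):

    // publish a status event and reset the counter in the rule action once the
    // counter maintained on the scorecard reaches the configured threshold
    StatsScorecard.eventCount >= StatsScorecard.publishThreshold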

4.5.2 Monitoring & Management of BE Engines Using Hawk

Like most TIBCO products, BE exposes certain monitoring and management functionality through a Hawk microagent. The methods exposed are detailed in the BusinessEvents documentation; aside from the standard stop functionality, BE-specific methods are available for controlling the behavior of the engine at runtime and for getting information about the objects contained within the working memory. By and large, the object inspection methods should be used sparingly to minimize expensive calls to the object manager, but when combined with scorecards that track a limited set of statistics, these methods can provide access to data within a Hawk display that would otherwise require a separate GUI.
