Comparison of SAP Application Performance on Centralized versus Distributed Server Topologies
This document can be found on the web, www.ibm.com/support/techdocs
Date: November 2013
Walter Orb, Matthias Köchl, Fabrice Moyen, Hans-Jürgen Reiss
IBM SAP International Competence Center, Walldorf, Germany
Copyright IBM Corp. 2013
Table of Contents

Figures
Tables
1. Introduction
   Acknowledgements
2. Traditional SAP Business Suite Topologies
   Communication between Application Server and Database
3. Description of Test Environment
   Hardware and Network Setup
   Tested Topologies
      SAP 2-tier
      Consolidated Para 3-tier / 3-tier-in-a-box
      SAP 3-tier
      3-tier Campus/Cloud Simulation
   SAP Workload Scenarios
      NIPING Network Round-Trip Time (RTT)
      SAP ERP Workload Simulation
      DB ROW-Select
      SAP Client Copy
4. Result Analysis
   NIPING Network Round-Trip Times
   Interactive SAP Workload
   SAP Background Processing
      DB-Select Simulation [SINGLE_MULTI_READ] by SAP
      SAP Client Copy
   Wide Area Network Simulation
      NIPING Network Round-Trip Times
      Interactive SAP Workload
      SAP Background Processing
      SAP Client Copy
      SAP - SD Queries [DB_READ_UPDATE]
5. Power Systems (AIX) Specific Observations
   Adapter / OS / Network Settings
   VIO Utilization
6. Summary
7. Technical Appendix
   Detailed Test Landscape Network Layout at IBM Client Center in Montpellier
   ANUE Network Latency Simulator
8. About the Authors
9. Trademarks and Special Notices
Figures
Figure 1  SAP Application Tiers
Figure 2  Schematics of Test Environment
Figure 3  Tested Topologies
Figure 4  NIPING Round-Trip Times
Figure 5  Large Packets Bandwidth DB<->App-Server
Figure 6  SAP ERP DB-Dialog Times
Figure 7  Increase of SAP ERP SD DB-Request Times vs. 2-tier
Figure 8  Row Select Time over Query Result Volume
Figure 9  SAP Client Copy Elapsed Runtime
Figure 10 Increase of SAP Client Copy Processing Time vs. 2-tier
Figure 11 Parallel Client Copy Processes
Figure 12 ANUE Delay Verification by NIPING
Figure 13 ERP DB-WAN Response Time
Figure 14 Exponential Increases in GUI Response Time
Figure 15 DB-Background Processing Select Times (ZTEST-ABAP)
Figure 16 3-tier Client Copy Runtime over App-Server Latency
Figure 17 DB Select Mix Report
Figure 18 Impact of Ethernet Adapter Tuning on Round-Trip Times (in ms)
Figure 19 Network Round-Trip Times (in ms) Using Different Processor Modes for VIO Servers
Tables
Table 1 Categorized Result Series
Table 2 LAN Round-Trip Times
Table 3 NIPING Round-Trip Times
Table 4 Incremental DB-Request Time vs. 2-tier
Table 5 Incremental Client Copy Runtime vs. 2-tier
Table 6 Parallelization Gains of Client Copy
Table 7 Increase of DB-Request Time in WAN vs. 2-tier
1. Introduction
Today we see a continued trend toward 3-tier client/server implementations at SAP customers. In combination with virtualized platforms, this provides a high degree of flexibility and agility. However, 3-tier topologies introduce a new level of network complexity and performance impact (latency, bandwidth) compared to a centralized 2-tier SAP system setup. This matters all the more because CPU and I/O performance (SSD, flash) have improved much faster than network latency in recent years.
Furthermore, SAP customers are starting to adopt the cloud computing paradigm and to use cloud-based application server resources for special purposes (e.g. peak load processing, SAP upgrades). This introduces wide area network effects into their SAP applications, and the influence of the network can then become the dominating factor for response and processing times.
The IBM SAP International Competence Center in Walldorf, together with the IBM Client Center in Montpellier, conducted a comprehensive series of measurements to quantify the impact of 2-tier versus 3-tier topologies for different SAP workload characteristics. The results of this study are presented and discussed in this paper. The SAP AG performance team contributed know-how and test reports.
Acknowledgements
Special Thanks To:
Thomas Glaser, IBM Power Systems Technical Sales Manager Europe, for his overall project sponsorship.
Dr. Ulrich Marquard, Senior Vice President and Head of Performance, Data Management & Scalability at SAP AG Germany, and his team for providing valuable test cases, consulting, and contribution while performing the tests and editing this document.
2. Traditional SAP Business Suite Topologies
The SAP Business Suite backend architecture comprises three layers: a presentation layer (the first tier), an application layer, and a database layer. The application and database layers can be deployed either in 2-tier mode (database and application server processes run within a single OS image) or in a 3-tier configuration, where the database and application layers run in separate OS images and must communicate via a logical or physical network connection.
Figure 1 SAP Application Tiers
Each topology has different characteristics in regard to complexity, flexibility and resiliency.
Communication between Application Server and Database
The introduction of an additional TCP/IP network stack between the database and application servers in a 3-tier configuration has an impact on SAP transaction response times and on the runtime of background jobs. In 2-tier implementations (depending on the database version used), the database client-to-server communication path can be optimized to use an inter-process communication (IPC) method. In a 3-tier configuration, database accesses always have to pass through the TCP/IP software stack as well as through some virtual (hypervisor) or physical (network interface adapter, LAN switches, WAN) communication layer. Each database access from the application server passes through the complete TCP/IP layer twice: once to issue the DB request (select, update, etc.) and once to receive the result, either in the form of selected business data or a transaction commit.
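Because each database access crosses the network twice, the per-request round-trip time accumulates directly into each dialog step. The following Python sketch illustrates this first-order effect; the request count and round-trip values are illustrative assumptions, not figures from the measurements in this paper.

```python
# Sketch: estimate the network wait a single SAP dialog step accumulates
# when every database access pays one full round trip (request + reply).
# All numbers below are assumptions for illustration only.

def network_delay_per_step(db_requests: int, round_trip_ms: float) -> float:
    """Total network wait (ms) for one dialog step, given the number of
    database requests it issues and the per-request round-trip time."""
    return db_requests * round_trip_ms

# Assumed: a dialog step issuing 50 database requests.
two_tier = network_delay_per_step(50, 0.03)   # ~30 us RTT (local communication)
three_tier = network_delay_per_step(50, 0.2)  # ~200 us RTT (fully virtualized 3-tier)

print(f"2-tier : {two_tier:.1f} ms network wait per dialog step")
print(f"3-tier : {three_tier:.1f} ms network wait per dialog step")
```

Even though a single round trip is tiny, a request-heavy dialog step multiplies it into a visible share of the response time.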
3. Description of test environment
Hardware and Network Setup
The infrastructure was set up and operated at the Montpellier IBM Client Center. The SAP related tests were executed remotely by members of the IBM SAP Competence Center in Walldorf via VPN access.
The following components were used to build the test landscape:
- 2x IBM Power 750 servers (model 8408-E8D)
  o 32-core POWER7+ CPUs at 4 GHz with 1 TB of RAM
  o Dual Virtual I/O servers at level 2.2.2.2 (with efixes IV37111m2a, IV38225s2a, IV39725m2a)
  o All logical partitions at AIX level 7.1 TL2 SP2
- IBM Storwize V7000 storage subsystems (model 2076-324), FC-attached
- Dedicated network and SAN infrastructure
  o 1 Gb and 10 Gb Ethernet LAN adapters and switches
  o 8 Gb fibre SAN adapters and switches
- ANUE Network Latency Simulator
  o 2 x 1 Gb Ethernet ports
Figure 2 Schematics of Test-Environment
A symmetric EtherChannel connection between the two servers was chosen to resemble realistic customer setups. Configuration details are shown in the Technical Appendix.
The main focus of the test scenarios was to evaluate the impact of different network topologies on database request times and subsequently the transaction response and
runtimes of background jobs. To avoid other load-dependent influences, the workloads and partitions were sized so that CPU and memory utilization did not create any bottlenecks. The maximum CPU utilization on each partition was kept in the 30-40% range.
Tested Topologies

SAP 2-tier
SAP workload is processed within a single partition (beaci01). No external network traffic between database and application server processes is required for transaction processing.
For all test series in this paper the database and SAP Central Services remained on this partition. Only the application workload was executed and measured on other partitions.
Consolidated Para 3-tier / 3-tier-in-a-box
An additional SAP application server instance is hosted within a second partition (beaas12) on the same physical server. Communication via virtualized Ethernet adapters is provided by the PowerVM hypervisor using the integrated virtual Ethernet capabilities.
Figure 3 Tested Topologies
SAP 3-tier

The application server resides on another partition (beaas21) on a second physical server. The partition setup was identical to beaas12. TCP/IP traffic had to pass through the VIO server partitions and physical network segments. For the dedicated adapter tests, the Ethernet adapters were assigned directly to partitions beaci01 and beaas21. Thus any additional latency induced by the VIO servers was eliminated for these test series.
3-tier Campus/Cloud Simulation
Same as the SAP 3-tier scenario, with a latency simulation device (ANUE) added to the external communication path. Although it is a single physical device, the simulated latency was split 50:50 across the two network paths (inbound and outbound) during our test series. The delay on each path was set between 0 and 125 ms, so the round-trip delay experienced by the applications was twice this value.
See the Technical Appendix for more details about the ANUE device.
SAP Workload Scenarios
A number of different workloads were used to simulate load on the test systems. These workloads are taken from typical SAP applications, and the measured results (focusing on database request time) can be adapted to other SAP application load scenarios.

NIPING Network Round-Trip Time (RTT)

SAP delivers the NIPING tool to test the SAP NI (Network Interface) layer. This layer is used for inter-application-server, RFC, and GUI communication. As the name suggests, the tool works similarly to ping and can be used to test network connectivity. SAP Note 500235 (Network Diagnosis with NIPING) provides a detailed description of the tool. When using NIPING, at least two sessions must be started. The first one takes the server role: it simply receives and immediately sends back the data packets sent by NIPING client sessions. The NIPING client can be controlled by command-line parameters to focus on different aspects of network connectivity (round-trip time versus throughput).
In our test landscape, we used the following procedure:
1. Start the NIPING server on beaci01 (the database server):

   # niping -s -I 15
This command starts the NIPING tool in a server role and automatically stops after an idle time of 15 seconds (-I 15).
2. Start NIPING clients on all application server partitions:

   # niping -c -H beaci01 -B 1 -L 10000
   # niping -c -H beaci01 -B 100000 -L 1000
The first test uses a data buffer size of one byte (-B 1) and sends the packet 10000 times (-L 10000); it measures the network round-trip time. The second test uses a large data buffer (100000 bytes) to measure the network throughput.
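When the client tests have to be launched on several application servers, the command lines can also be assembled programmatically. The Python sketch below only builds the two commands shown above (the helper name is our own); actually executing them via subprocess on each partition is left to the reader.

```python
# Sketch: assemble the two NIPING client invocations from the test
# procedure for a given database host. The host name below is the one
# used in this test landscape; adapt it to your own environment.
import shlex

def niping_commands(db_host: str) -> list:
    latency_test = f"niping -c -H {db_host} -B 1 -L 10000"        # round-trip time
    throughput_test = f"niping -c -H {db_host} -B 100000 -L 1000"  # throughput
    return [latency_test, throughput_test]

for cmd in niping_commands("beaci01"):
    # shlex.split yields an argv list suitable for subprocess.run(...)
    print(shlex.split(cmd))
```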
The following is sample output of the tool:

Thu Aug 29 09:29:54 2013
connect to server o.k.
Thu Aug 29 09:29:56 2013
send and receive 10000 messages (len 1)

------- times -----
avg    0.192 ms
max    9.754 ms
min    0.155 ms
tr    10.197 kB/s
excluding max and min:
av2    0.191 ms
tr2   10.248 kB/s
For the round-trip times we look at the av2 value, which is the average response time for the number of messages sent excluding the maximum and minimum values. The tr2 value provides the network throughput figure.
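When many NIPING runs are collected, the av2 and tr2 figures can be extracted from the captured output automatically. The Python sketch below parses them with regular expressions based on the sample output shown above; other NIPING versions may format their output differently.

```python
# Sketch: pull the av2 (round-trip time) and tr2 (throughput) figures
# out of captured NIPING client output. The patterns assume the output
# format shown in the sample above.
import re

SAMPLE = """
------- times -----
avg    0.192 ms
max    9.754 ms
min    0.155 ms
tr    10.197 kB/s
excluding max and min:
av2    0.191 ms
tr2   10.248 kB/s
"""

def parse_niping(output: str) -> dict:
    result = {}
    for key, unit in (("av2", "ms"), ("tr2", "kB/s")):
        m = re.search(rf"{key}\s+([\d.]+)\s+{re.escape(unit)}", output)
        if m:
            result[key] = float(m.group(1))
    return result

print(parse_niping(SAMPLE))  # {'av2': 0.191, 'tr2': 10.248}
```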
Another tool to check network round-trip times is the standard ping utility. However, the lowest time resolution of the AIX implementation of ping is in milliseconds. The typical round-trip times in our test environment were in the microsecond range; therefore, the standard ping command was not adequate for most of the test scenarios.
To validate the NIPING results, we used another ping-like program called fping (see http://fping.org). This tool uses the Internet Control Message Protocol (ICMP) to test network connectivity to a target host and provides the desired round-trip time resolution in microseconds. It has to be downloaded and installed on each server. We collected fping results in all our test scenarios; as they matched the NIPING output closely and consistently, we provide only the NIPING results in this paper.
SAP NIPING and fping are basic test tools to verify the network (typically routing and firewall settings), but they can also be used for basic performance tests that involve no components other than the fundamental SAP network layer.
SAP ERP Workload Simulation
We chose the standard SAP ERP Sales and Distribution (SD) benchmark to check the impact of network latency on online transactions. The SD benchmark reflects a typical SAP application where the SAP application server is communicating with the database. This benchmark is also used as a reference for all SAP sizing (SAPS) calculations (for details please see: http://www.sap.com/sizing).
The transaction load driver was installed together with the database/central instance partition (beaci01), and a fixed number of users were simulated on all three application servers simultaneously to produce a steady workload of online ERP transactions. All users executed the same set of predefined transactions in a fixed number of loops. With this setup, a single test run produced the comparison data for all three network topologies (2-tier, para-3-tier, and 3-tier). The key metric impacted by network latency from an application point of view is the database request time for dialog and update work processes. We focused on these statistics in the result analysis.
We modified the test procedure for the wide area network (WAN) runs. To produce a reliable set of results, the SAP SD benchmark toolset requires that all tested application servers start and finish the high-load interval (the period of time during which all users are logged on and generate a steady workload) at about the same time.
The WAN simulation has a heavy impact on end-user response times, so a 3-tier application server connected via the delayed network path experiences a significantly longer high-load interval. To avoid this problem, we performed a baseline measurement simulating workload only on the central instance application server running on beaci01. For the subsequent runs with increasing network round-trip delays, the workload was executed only on the 3-tier application server instance beaas21. This also means that, compared to the earlier runs without simulated network delays, only one third of the overall workload was executed.
DB ROW-Select
This special load scenario executes a number of simple DB calls. It is typical of network-I/O-intensive background processing with many database accesses, such as analysis reports or end-of-year activities. We distinguish between two flavors:
1. [SINGLE_MULTI_READ] Single and multiple database reads within one statement. This is typically found in analysis reports and mass data reads.
2. [DB_READ_UPDATE] Database reads and writes are executed in a typical SAP application manner. This scenario reflects SAP applications with heavy network I/O load, like mass data updates.
SAP Client Copy
We used local SAP client copies as another method to simulate the impact of network latency on background jobs with a significant number of database accesses. This workload shows both effects of high network I/O load combined with optimized data access strategies.
A client copy has to read all client-specific data of the specified source client from the database and then insert the same data again using the new client number. We selected the SAP_ALL client copy profile, which copies all client-specific data without change documents. To ensure that we always copied the same amount of data, we first created a new client 100 as a copy of one of the SD benchmark clients. This client was not used for any other tests, so after the initial copy, the client data was static. Next we copied this client to a new target client 200. The first client copy to a new client just inserts the data in the database; any subsequent client copy to the same client deletes the existing data before inserting the copied data. So after the initial copy to the target client 200, all following client copies used the same amount of read/delete/insert statements on the database. In our test system a client copy had to copy 62.544 tables with about 4 GB of data.
As the test result, we noted the runtime in seconds as reported at the bottom of the detailed log output of the Client Copy/Transport Log Analysis tool (transaction SCC3):
Exit program USERBUF_RESET successfully executed    13:41:15
Selected tables       :     62.544
Copied data in kBytes :  4.098.311
Deleted data in kBytes:  4.098.329
Program ran successfully
Runtime (seconds)     :      2.017
End of processing: 13:41:15
For each test scenario, we ran the client copy on all three application servers. The client copies were scheduled as background jobs using the desired application server as background server without any parallel processing.
Parallel Process Client Copy
To check the effects of an increased network load on the client copies, we also ran a number of tests with various degrees of parallel processing. The Client Copy tool (transaction SCCL) contains a Parameters for Parallel Processes pushbutton that allows configuring the maximum number of processes and an RFC server group to be used for the processing. The client copy runs as a single process during the analysis and post-processing phases, but distributes work packages to parallel processes for the actual copy operations. We created three RFC server groups, each containing only a single application server. The client copies were scheduled, as in the previous scenarios, on all three application servers; during background scheduling, the parallel processes option was used to specify the desired degree of parallelism and the appropriate RFC group for the tested application server. We tested three scenarios with three, six, and nine parallel processes on each application server.
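The split between serial analysis/post-processing phases and a parallelized copy phase suggests an Amdahl's-law-style model of the achievable speedup. The Python sketch below is such a model; the phase durations are invented for illustration and are not measurements from these tests.

```python
# Sketch (Amdahl-style): the client copy runs its analysis and
# post-processing serially and only the copy phase in parallel.
# Phase durations are assumptions for illustration, not measured values.

def client_copy_runtime(serial_s: float, parallel_s: float, processes: int) -> float:
    """Estimated elapsed time (s) when the copy phase is split across
    the given number of parallel processes."""
    return serial_s + parallel_s / processes

single = client_copy_runtime(300, 1700, 1)
with_six = client_copy_runtime(300, 1700, 6)
print(f"1 process  : {single:.0f} s")
print(f"6 processes: {with_six:.0f} s (speedup {single / with_six:.1f}x)")
```

The serial phases cap the achievable speedup, which is why adding ever more parallel processes yields diminishing returns.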
4. Result Analysis
For analysis, the test series were categorized according to the topologies and physical network characteristics. The provided charts will contain the following measurement series:
Table 1 Categorized result series

Topology / Series:
- 2-tier
- Para 3-tier
- 3-tier virtualized 1 Gbit Ethernet
- 3-tier virtualized 10 Gbit Ethernet
- 3-tier dedicated 1 Gbit Ethernet
- 3-tier dedicated 10 Gbit Ethernet
- 3-tier WAN virtualized 1 Gbit Ethernet
Where applicable, differences within a category caused by (intentional) changes of network parameters are discussed in the chapter Adapter / OS / Network Settings. Results are often normalized to the 2-tier setup, which allows a quick assessment of the relative effects of the different setups. Results are sorted in ascending order, while the sequence of series can vary per test. Absolute times are specified to allow quantified extrapolations (at one's own risk).
NIPING Network Round-Trip Times
Network latency in Ethernet networks is often measured and documented as one-way latency (the time from the source sending a packet until the target receives it). From an application point of view, the more important value is the round-trip latency: the one-way latency from the source to the destination plus the one-way latency from the destination back to the source. This round-trip network latency does not include the time spent by an application on the destination system processing the packet. Any references to network latency in this document refer to round-trip latencies (unless explicitly stated otherwise).
Before starting to analyze database request times on SAP application servers in 3-tier configurations, we recommend performing a quick sanity check to verify the network round-trip times between the application servers and the database server. We recommend using the SAP NIPING tool, as it is already available in the SAP instance binary directory¹.
¹ Please note that, according to the definition above, network round-trip latency does not include packet processing time on the destination. Nevertheless, NIPING provides a reasonably good approximation of this latency, as the server process just receives the packets and immediately sends them back.
The following are typical values for expected NIPING round-trip times with small packets in current customer LAN networks:
Round-trip time in ms    Rating
less than 0.3            Very good
between 0.3 and 0.7      Average
larger than 0.7          Network configuration should be included in the performance analysis of database response time problems

Table 2 LAN round-trip times
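The rating thresholds in Table 2 can be expressed as a small helper, e.g. for automated checks of collected NIPING results. This is a sketch; the function name and return strings are our own, only the thresholds come from Table 2.

```python
# Sketch: classify a measured LAN round-trip time according to the
# rating thresholds given in Table 2.

def rate_rtt(rtt_ms: float) -> str:
    if rtt_ms < 0.3:
        return "very good"
    if rtt_ms <= 0.7:
        return "average"
    return "include network in DB response time analysis"

print(rate_rtt(0.191))  # the av2 value from the sample output -> "very good"
print(rate_rtt(0.5))    # -> "average"
```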
The chart below shows the NIPING round-trip times measured with the different topologies. We measured round-trip times of about 30 microseconds for the 2-tier configuration. From an application perspective, this means that each database request adds a network delay of at least 30 microseconds to the response time².
Moving to the para-3-tier configuration, where both partitions reside on the same physical server and communicate via the PowerVM hypervisor, we saw the round-trip time double to about 60 microseconds.
For the 3-tier configurations the round-trip times increased again to about 100 microseconds with dedicated Ethernet adapters.
In a fully virtualized 3-tier setup, all network packets have to pass through a VIO server partition on both machines. Therefore the network round-trip times increased again to about 200 microseconds.
For small network packets, there's little difference between the 10 Gbit and 1 Gbit Ethernet adapter results. For network transfers with very small payloads, most of the time is actually spent processing the TCP/IP protocol stack on the servers, so the advantage in physical network speed does not play a significant role. However, for network transfers with larger payloads, the advantage of 10 Gbit networks becomes apparent: they provide significantly better network round-trip times, leading to better application response times.
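A first-order model explains this behavior: transfer time is a fixed round-trip overhead (protocol stack, hops) plus the payload serialization time, payload / bandwidth. The Python sketch below uses illustrative overhead values, not the measured ones.

```python
# Sketch: why link speed matters only for large payloads.
# transfer time = fixed round-trip overhead + payload serialization time.
# The 0.2 ms overhead is an illustrative assumption.

def transfer_ms(payload_bytes: int, rtt_ms: float, gbit: float) -> float:
    serialization_ms = payload_bytes * 8 / (gbit * 1e9) * 1000
    return rtt_ms + serialization_ms

for payload in (1, 100_000):
    t1 = transfer_ms(payload, 0.2, 1)
    t10 = transfer_ms(payload, 0.2, 10)
    print(f"{payload:>7} B: 1 Gbit {t1:.3f} ms, 10 Gbit {t10:.3f} ms")
```

For a one-byte payload both links are dominated by the fixed overhead; for a 100,000-byte payload the serialization term dominates, and the 10 Gbit link pulls clearly ahead.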
² Database clients in 2-tier configurations that can be configured to use an IPC connection for local communication might achieve slightly better values for this communication delay.
Although the main focus of this paper is on network round-trip delays, we've included the following chart documenting the network throughput rates of the various scenarios for completeness. Throughput rates become more relevant for administrative tasks like database backups, or for database queries that transfer a large amount of data per request. The test with small (one-byte) data packets does not really test network throughput but latency, and the results show very little difference between the 1 Gb and 10 Gb Ethernet numbers. The additional processing required on the VIO servers actually led to lower throughput numbers for the 3-tier virtual 10 Gbit scenario compared to the 3-tier dedicated 1 Gbit scenario.
The picture changes for the real throughput test with large packets. The faster speed of the 10 Gbit setup provides a significant improvement in network throughput; even a fully virtualized 10 Gbit Ethernet scenario provides better throughput rates than a scenario with dedicated 1 Gbit Ethernet adapters and switches.
Figure 5 large packets bandwidth DB<->App-Server
Interactive SAP Workload
The chart in Figure 6 shows the average database request times for the SAP SD benchmark transactions for the dialog and update tasks.
Figure 6 SAP ERP DB-Dialog times
The next chart and Table 4 show the relative increase in database request times of the various scenarios compared to the 2-tier reference measurement.
Figure 7 Increase of SAP ERP SD DB-Request times vs. 2-tier
Incremental DB-Request times vs. 2-tier:

  Scenario             Query    Update
  para 3-tier          +16%     +18%
  3-tier ded. 10Gb     +27%     +34%
  3-tier ded. 1Gb      +57%     +64%
  3-tier virt. 10Gb    +70%     +81%
  3-tier virt. 1Gb     +78%     +88%

Table 4 Incremental DB-Request time vs. 2-tier
The SAP SD benchmark transactions and their database access patterns have been highly optimized over the years. With the increase in processing power, these transactions have become relatively lightweight.
From an end-user perspective, the more interesting statistic is the database request time for the task type dialog, as it is part of the transaction response time. Comparing a dialog database request time of little more than 5 milliseconds in the 2-tier setup with about 10 milliseconds for the 3-tier configuration does not sound like much; an end-user would certainly not notice the difference for this particular set of transactions. However, the relative increase in database request time in our test scenarios was more than 70%.
In customer production systems, there are many business critical transactions that are substantially more heavyweight and often have to fit in common service level agreements (SLA) for end-user response times below one second. Let us assume a transaction with a response time of just under one second, where the database request time contributes 40% (or 400 milliseconds) of that time. If the database request time increases by 70% on a 3-tier application server instance, the end-user response time for this transaction would increase to 1.28 seconds and suddenly violate the SLA. An end-user will probably complain about a nearly 30% longer dialog response time in this case.
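The SLA arithmetic of this example can be reproduced in a few lines (using the numbers from the text; the 40% database share is the hypothetical figure assumed above):

```python
# Hypothetical transaction: 1.0 s response time, 40% of it database request
# time, and the DB portion grows by 70% in the 3-tier setup.
def response_time_after_db_increase(total_s, db_share, db_increase):
    db_s = total_s * db_share
    return (total_s - db_s) + db_s * (1 + db_increase)

print(f"{response_time_after_db_increase(1.0, 0.40, 0.70):.2f} s")  # -> 1.28 s
```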
In general, the network component does not dominate the overall response time for SAP standard transactions, and they work well in 3-tier scenarios. Nevertheless, if a customer has long-running business critical transactions with many database accesses, one should carefully analyze the potential impact of additional network delays before moving to a 3-tier configuration.
SAP Background Processing DB-Select Simulation [SINGLE_MULTI_READ] by SAP
The chart below shows the differences in database response times for simple select statements across the different topologies. The report was parameterized to perform 2000 select statements for each select variant (select single, select 2 rows, select 3 rows, and so on). The database request time in this chart was normalized to an average time per selected row to provide a better comparison of the results for the different variants.
Figure 8 Row select time over query result volume
As expected, an increase in network delay has a much greater impact on database response times for applications that perform many single-record reads than for optimized select statements where more rows are retrieved with a single query. This is also described in the SAP Performance Standard. The test series clearly shows the potential improvements from optimizing the application.
The worst case scenarios obviously are application reports that perform millions of database accesses, each retrieving only one record. Running in a 2-tier setup, each database request would take about 94 microseconds. Comparing this to a common customer setup with a fully virtualized environment and a 10 Gigabit Ethernet infrastructure, the database request time for 3-tier scenarios would increase to about 260 microseconds, which is a factor of 2.75 slower, or a difference of 166 microseconds per access. Let's assume a hypothetical background job that performs ten million such database accesses. The total database request time for this job would be about 15 minutes in the 2-tier scenario and 43 minutes with the 3-tier configuration. A significant portion of that time could be recovered by rewriting the application to fetch more than one record with each database access. If an application rewrite is not an option, then we clearly recommend scheduling such background jobs on a 2-tier application server instance only.
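The job runtime estimate above follows directly from the per-request times (a sketch using the measured averages quoted in the text):

```python
# Hypothetical background job: ten million single-record database accesses
# at the average per-request times measured for each topology.
def total_db_time_minutes(accesses, per_request_us):
    return accesses * per_request_us / 1e6 / 60

accesses = 10_000_000
print(f"2-tier: {total_db_time_minutes(accesses, 94):.1f} min")    # ~15.7 min
print(f"3-tier: {total_db_time_minutes(accesses, 260):.1f} min")   # ~43.3 min
```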
Please note that this 3-tier number was measured in an environment with very good NIPING network round-trip times of less than 200 microseconds. Customer environments with NIPING round-trip times of more than 500 microseconds are not uncommon and this additional network latency would substantially increase the difference in database request time for the 3-tier scenario.
SAP Client Copy
The SAP client copy process already exploits optimized database statements. It is a typical example of how to minimize the impact of network delays by using optimal database access strategies.
Despite this optimization, the 3-tier scenarios still have a significant impact on the overall runtime. The absolute runtime increased from about 55 minutes in the 2-tier scenario to 60 minutes in the para 3-tier configuration.
For the fully virtualized setups, the runtime was 72 minutes with the 10 Gbit Ethernet and 75 minutes with 1 Gbit Ethernet network.
Figure 9 SAP Client Copy elapsed runtime
The following Figure 10 and Table 5 show the increase in runtimes compared to the 2-tier reference run.
Figure 10 Increase of SAP Client Copy processing time vs. 2-tier
Incremental SAP Client Copy time vs. 2-tier:

  para 3-tier          9%
  3-tier ded. 10Gb     14%
  3-tier ded. 1Gb      21%
  3-tier virt. 10Gb    31%
  3-tier virt. 1Gb     36%

Table 5 Incremental Client Copy runtime vs. 2-tier
Optimized database access strategies help to reduce the impact of network round-trip delays in 3-tier configurations, but even then the overall runtime of our client copy scenarios increased by more than 30%.
Compared to the physical separation of servers, a virtualized 3-tier setup on a single Power System showed only a relatively small increase in elapsed processing time.
Effects of Process Parallelization for Client Copy
One way to mitigate the problems with long-running background jobs is to exploit parallel processing. This is not always an option, but when it is available, it can reduce processing times to acceptable levels at the expense of increased CPU usage.
We've compared the runtime of SAP client copies for all tested scenarios with various levels of parallel processing to the reference number of the single-process client copy on the 2-tier application server instance. The chart below shows that the speed-up with parallel processing was fairly constant across all topologies.
The second purpose of this test scenario was to check whether introducing additional network load (by running multiple processes in parallel) would have a noticeable performance impact. The slope of the curves for the various scenarios is about the same, which shows that the chosen workload was not high enough to reach any network limitation.
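Since the measured speed-up was roughly constant across topologies, the expected runtime with p parallel processes can be sketched with a simple linear-scaling model (the 60-minute single-process baseline here is an illustrative assumption, not one of our measured values):

```python
# Idealized linear scaling: p parallel processes divide the single-process
# runtime by p. Real speed-up flattens once a shared resource saturates,
# which our workload did not reach.
def parallel_runtime_min(single_process_min, processes):
    return single_process_min / processes

for p in (1, 3, 6, 9):
    print(f"{p} processes: {parallel_runtime_min(60, p):.1f} min")
```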
Figure 11 Parallel Client Copy processes
This table shows the average speed-up for all three topologies:
We used the workloads described in the section "SAP Workload Scenarios" to measure the impact of wide area network latencies on SAP database request times.
The network latency simulator (ANUE) was used to add a delay to the communication path between beaci01 and beaas21. Therefore, the test procedure was modified slightly to run the longer running workloads (ERP benchmark, simulated background jobs, client copies) only on the remote application server instance beaas21.
We measured the performance impact at various simulated network delays between 1 and 125 milliseconds. After the test sequence with 5 milliseconds network latency completed, it was obvious that it would not make sense to run the ERP SD benchmark with an even higher network delay, and an SAP client copy would have taken days. Therefore, we reduced the test cases to NIPING and the simulated background jobs for the remaining two latency tests (50 ms and 125 ms).
The measured round-trip time for the LAN tests with dedicated 10 Gb Ethernet adapters (without the ANUE network delay) was about 0.1 ms (100 microseconds) and 0.2 ms in a fully virtualized environment.
As the ANUE device gave us the opportunity to simulate LAN networks with poorer round-trip times than our test landscape, we also performed a few tests with network latency delays in the typical LAN range (latency delays of 0.1, 0.25, and 0.5 ms, which correspond to round-trip delays of 0.2, 0.5, and 1 ms).
NIPING Network Round-Trip Times
The following chart shows the NIPING round-trip times with various simulated network delays. The results are essentially the same as in the 3-tier measurements before, plus the simulated round-trip delay added on top. This is no surprise, as the NIPING processes basically do nothing other than send and receive network packets. We used these NIPING tests mainly to verify that the ANUE network delay was configured as intended before the subsequent test scenario runs.
Figure 12 ANUE delay verification by niping
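NIPING itself is an SAP tool, but the core of what it measures, the elapsed time of a TCP send/receive ping-pong, can be sketched in a few lines of Python (a simplified stand-in for illustration, not the actual NIPING implementation; here both ends run on the loopback interface):

```python
import socket
import threading
import time

def echo_server(listener):
    # Accept one connection and echo every packet back, similar in spirit
    # to the NIPING server side.
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)

def average_round_trip_us(host, port, payload_size=1, loops=100):
    """Time `loops` send/receive cycles and return the mean RTT in microseconds."""
    payload = b"x" * payload_size
    with socket.create_connection((host, port)) as c:
        c.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # disable Nagle batching
        start = time.perf_counter()
        for _ in range(loops):
            c.sendall(payload)
            received = 0
            while received < payload_size:
                received += len(c.recv(65536))
        elapsed = time.perf_counter() - start
    return elapsed / loops * 1e6

if __name__ == "__main__":
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # loopback, ephemeral port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    print(f"average round-trip: {average_round_trip_us('127.0.0.1', port):.1f} us")
```

On a loopback interface this typically reports a few tens of microseconds; between real hosts it measures the network round-trip times discussed throughout this paper.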
Interactive SAP Workload
While the end-user response time for the SAP SD benchmark transactions increased only slightly for the 3-tier application server instance in a controlled LAN environment, the picture changes completely when extending the tests into a WAN scenario. The results show that the application response time becomes unacceptable very quickly.
The impact is biggest on update processing. As mentioned before, update processing happens asynchronously and does not directly influence end-user response time. However, the longer running update processing will block the SAP work-process that executes the update task. Eventually, all available update work-processes will be busy leading to additional queuing effects.
At a certain point, long-running updates will affect dialog transactions too, as some subsequent transactions expect that previously created documents are already stored in the various business tables in the database.
What the charts do not show is that we had to change the work-process configuration a number of times to compensate for the longer database processing. Otherwise, transactions would have aborted (because of update processing being too slow), and the dialog and update response times would have been much higher, as incoming requests would have had to wait for free work-processes. We ran the LAN tests with a configuration of 12 dialog and 3 update work-processes. To achieve a successful run for the 10 ms round-trip delay test, we had to double the number of dialog work-processes to 24 and increase the number of update work-processes to 18.
Figure 13 ERP DB-WAN Response time
  Simulated round-trip delay (ms)              0      0.2    0.5    1      2       6       10
  Increase DB dialog request time vs. 2-tier   +98%   +168%  +295%  +535%  +1004%  +2851%  +4665%
  Increase DB update request time vs. 2-tier   +148%  +250%  +430%  +764%  +1407%  +4213%  +8374%

Table 7 Increase of DB-Request time in WAN vs. 2-tier
Figure 14 Exponential increases in GUI response time
SAP Background Processing
The next chart shows the average database request time per row for different network delays and should be compared with Figure 8 on page 19. The scale on the y-axis is logarithmic to allow for the wide variance of simulated network round-trip delays. As in the NIPING case, the results are virtually the same as in the 3-tier measurements shown in Figure 8 plus the simulated round-trip delay added on top.
Once again, using optimized database access patterns helps to mitigate the impact of the additional network delay.
The average select time per row in the 2-tier scenario was about 25 microseconds when selecting 9 rows with each query. The respective number for the one-millisecond round-trip delay was about 160 microseconds.
Our hypothetical background job fetching 10 million rows with 9 rows per select would run a little longer than 4 minutes on a 2-tier application server instance. The same background job running with only one millisecond network delay would already need more than 26 minutes.
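The effect of batching can be sketched with the per-row numbers quoted above:

```python
# Same hypothetical job fetching 10 million rows, but with 9 rows per
# select: runtime scales with the average select time per row.
def job_minutes(rows, per_row_us):
    return rows * per_row_us / 1e6 / 60

rows = 10_000_000
print(f"2-tier:        {job_minutes(rows, 25):.1f} min")   # ~4.2 min
print(f"1 ms RT delay: {job_minutes(rows, 160):.1f} min")  # ~26.7 min
```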
This makes it obvious that applications performing a large number of database accesses will not perform well in WAN environments.
Figure 15 DB-Background Processing Select times (ZTEST-ABAP)
SAP Client Copy
For this set of tests we scheduled the client copy with 9 parallel processes. The runtime of the 2-tier reference measurement was about 28 minutes. This increased to 39 minutes for the 3-tier configuration. With a 1 ms round-trip delay, a delay one might experience in a somewhat problematic LAN configuration, the runtime already increased to 70 minutes. Even with a network round-trip delay of only 10 ms, the client copy runtime increased to more than 5 hours. This clearly shows that it does not make sense to try to run background jobs with a major database component in a WAN setup.
Figure 16 3-tier Client Copy runtime over App-Server latency
SAP - SD Queries [DB_READ_UPDATE]
In this test we used the load scenario [DB_READ_UPDATE], which simulates SAP transactions with heavy network I/O. The test report executed several sets of database selects in sequence. We measured the total runtime and the average time for selecting a single database row in milliseconds. Increasing the network round-trip delay again shows a rise in total response time, caused by the increased request time for fetching a single database row. With higher network round-trip times, the total test response time rises dramatically. Please note that the chart uses a logarithmic scale for the y-axis because of the huge differences in average request time per row (from about 500 microseconds in the 2-tier case to almost 100 milliseconds for the test scenario with a simulated network round-trip delay of 50 milliseconds).
Figure 17 DB Select Mix Report
5. Power Systems (AIX) Specific Observations
Adapter / OS / Network Settings
During the SAP tests, we varied several AIX system parameters, particularly at the network level, and analysed their impact on the SAP results.
Network tuning (referred to as "tuned" in the tests):
During network traffic, interrupt coalescing is used to avoid flooding the host with too many interrupts. Consider a typical situation for a 1-Gbps Ethernet: if the average packet size is 1000 bytes, then at the full receiving bandwidth there will be 1250 packets in each processor tick (10 ms). Thus, without interrupt coalescing, there would be 1250 interrupts in each processor tick, wasting processor time on interrupt processing. Interrupt coalescing aims to reduce this interrupt overhead with minimum latency. There are two typical types of interrupt coalescing in AIX network adapters.
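The arithmetic above can be checked directly (the 10 ms tick and the default intr_rate of 10000 interrupts per second are the values quoted in the text):

```python
# Packets arriving per 10 ms processor tick on a saturated link; without
# coalescing, each packet could raise its own interrupt.
def packets_per_tick(link_bps, packet_bytes, tick_s=0.01):
    return link_bps * tick_s / 8 / packet_bytes

print(packets_per_tick(1e9, 1000))  # -> 1250.0 packets (and interrupts) per tick

# With the default interrupt throttling rate of 10000 interrupts/second,
# at most 100 interrupts occur per tick, each covering roughly 12-13 packets.
print(10_000 * 0.01)  # -> 100.0 interrupts per tick
```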
Most 1-Gbps Ethernet adapters, except the HEA (Host Ethernet Adapter) adapter, use the interrupt throttling rate method, which generates interrupts at fixed frequencies, allowing the bunching of packets based on time. The default interrupt rate is controlled by the intr_rate parameter, which is 10000 times per second by default.
Most 10-Gb Ethernet adapters and HEA adapters use an advanced interrupt coalescing feature. A timer starts when the first packet arrives, and then the interrupt is delayed for n microseconds or until m packets arrive. For the 10-Gb Ethernet adapter, the n value corresponds to intr_coalesce, which is 5 microseconds by default. The m value corresponds to receive_chain, which is 16 packets by default. Note the attribute name for earlier adapters might be different.
Today's Power7+ processors are much faster than the processors that were dominant when the 1G/10G interrupt coalescing AIX default parameters were chosen. Additionally, the potential processor overhead also depends on the type of workload the processor has to deal with. We therefore ran several tests with interrupt coalescing deactivated on the 1G and 10G physical adapters.
1G adapters: changing the intr_rate parameter of the physical adapter from 10000 to 0.
10G adapters: changing the intr_coalesce parameter of the physical adapter from 5 to 0 and the receive_chain parameter from 16 to 1.
Test conclusions, 1G: Setting intr_rate=0 clearly improves the network latency. A system measurement tool as well as the SAP NIPING test showed the latency dropping by about 0.1 ms with this tuning (for example, from 0.28 ms to 0.2 ms in a system test with VIO servers).
10G: Neither the system measurement tool nor the SAP tests showed a noticeable improvement with this tuning.
Figure 18 Impact of Ethernet adapter tuning on round-trip times (in ms)
Throughput tuning (referred to as "tuned_2" in the tests):
Using the experience of other benchmarks done in Montpellier that focused on network throughput, we ran some tests with specific network throughput parameters:
Nevertheless, the SAP tests, which are more latency driven, did not show any significant improvement with these throughput parameters.
VIO Utilization
The PowerVM virtualization platform offers several options to assign processor capacity to a logical partition:
Dedicated processors: simply assign real physical processors (cores) to the logical partition.
Shared processors (micro-partitioning): create a processor shared pool with physical processors (cores) and then assign a virtual fraction of this pool to the logical partition.
Donating (shared dedicated) processors: a compromise between the dedicated mode and the shared mode. Donating mode offers the simplicity of the dedicated mode with comparable performance, and almost the flexibility of the shared mode.
These options are available not only for general-purpose logical partitions but also for specific logical partitions such as Virtual I/O servers. When doing network tests, the best compromise regarding performance is achieved with VIO servers in donating mode. This is also a recommendation from SAP specialists and is the mode we used in all SAP test scenarios.
Figure 19 Network round-trip times (in ms) using different processor modes for VIO servers
Note: The AIX nmon monitoring tool provides several reports on processor consumption, such as LPAR and CPU_ALL (among others), but not all of them are fully accurate for every partition processor mode: LPAR is better suited to shared mode, while CPU_ALL is better suited to dedicated and donating modes.
When using a dedicated partition, the nmon tool does not even provide the LPAR report, as it is not applicable to this mode. But because donating mode is a dedicated mode with some shared-mode behavior, the nmon tool provides both the LPAR and CPU_ALL reports in this case, even though the LPAR report is definitely not accurate.
Below is an example from our setup, using VIO servers in donating mode:
The LPAR report seems to show that the VIO server is working harder before the test starts and after the test stops!
The CPU_ALL report, in contrast, shows accurate data without this side effect.
6. Summary
The measurement results documented in this paper show that the network round-trip time for database accesses can have a substantial impact on SAP background job processing times and transaction response times. An increase in processing time of more than 30% compared to a 2-tier scenario is quite common, even for applications which make use of optimized database access patterns.
Less optimized applications that perform a large number of database requests, i.e., where each request returns only a few records, are especially problematic. Their runtime can easily increase by 100% or more.
Before moving an existing 2-tier SAP landscape to a 3-tier configuration, one should carefully examine the database times and access patterns of affected critical business transactions and background jobs to ensure that the additional network latency does not result in SLA violations.
Conversely, for existing 3-tier implementations, a good way to improve processing times for problematic applications with significant database request times is to run such applications on a 2-tier application server instance (using background job scheduling or special logon groups for interactive transactions).
The implementation of a 3-tier-in-a-box setup (multiple partitions within a single physical server, referred to in this document as para 3-tier) using PowerVM connectivity provides an alternative with relatively low (<10%) incremental network delays while maintaining a high degree of resource and administration flexibility.
An often advertised use case for hybrid cloud scenarios is to buy incremental application server capacity for certain periods only, for example for month-end or year-end processing. According to our test results, this temporary increase of processing capacity is only feasible for applications with a rather small database component and thus low network traffic between the DB server and the remote (cloud) application servers. Most month-end and year-end processing jobs perform a large number of database accesses; running them in a cloud would most likely fail unless the cloud provider can guarantee network round-trip times equivalent to Gbit Ethernet LAN networks. In many cases, PowerVM capabilities for non-disruptive resource adjustment will provide a better option, delivering constant processing times (provided a flexible billing model is established).
7. Technical Appendix
Detailed Test Landscape
Network Layout at IBM Client Center in Montpellier
ANUE Network Latency Simulator
The ANUE was configured in the 1Gbit network, as it provides only two 1Gbit interfaces for external connections.
One can modify the latency at the blade1 level and at the blade2 level. During our tests, we kept both values symmetric. Looking at server YO88WU, all outbound traffic is delayed by blade1 and passed through by blade2. All inbound traffic is delayed by blade2 and passed through by blade1. The overall round-trip time referred to throughout this document is defined by the aggregate delay time of both ANUE blades plus the native network latencies in the LAN segments described above.
Some screenshots from ANUE configuration GUI:
Note that the Latency per Packet output in this terminal window is actually the round-trip time measured in the netlatency.ksh script. That means a network delay of 4 ms configured on each blade results in a round-trip time of about 8.2 ms.
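This relationship can be written down directly (the 0.2 ms native round-trip latency used here is the approximate LAN figure from our measurements):

```python
# Round-trip time = configured delay on each ANUE blade (applied once per
# direction) + native LAN round-trip latency.
def effective_rtt_ms(blade_delay_ms, native_rtt_ms=0.2):
    return 2 * blade_delay_ms + native_rtt_ms

print(f"{effective_rtt_ms(4):.1f} ms")  # -> 8.2 ms
```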
ANUE Network statistics panel:
8. About the authors
Walter Orb walter.orb@de.ibm.com Walter Orb is a technical consultant working at the IBM SAP International Competence Center in Walldorf. He has more than twenty years' experience with SAP on AIX and Power Systems, with a major focus on system performance, benchmarks, and large-scale system tests.
Matthias Köchl koechl@de.ibm.com Matthias Köchl is a member of the IBM SAP International Competence Center in Walldorf. As a certified Senior Architect and CIM certified Marketer, he is in charge of field enablement, marketing, and education around SAP solutions running on the IBM POWER and PureFlex platforms. Fabrice Moyen fabrice_moyen@fr.ibm.com Fabrice Moyen is a benchmark manager at the EMEA IBM Client Center in Montpellier, France, working in the IBM Power benchmark team, with specific competencies in PowerHA, IBM Systems Director, and PureFlex. He is also a member of the new IBM Power Linux Center recently inaugurated in Montpellier (in addition to three other IBM Power Centers in Austin, New York, and Beijing).
Hans-Jürgen Reiss - hans-juergen.reiss@sap.com Hans-Jürgen Reiss is a member of the Performance & Scalability team at SAP Walldorf, Germany. He is a specialist in network sizing and analysis for SAP applications. He works with SAP development on architectural design and optimization of SAP applications for SAP cloud solutions.
9. Trademarks and special notices Copyright IBM Corporation 2013. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml. SAP and other SAP products and services mentioned herein, as well as their respective logos, are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others. Information is provided "AS IS" without warranty of any kind. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction. Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Photographs shown are of engineering prototypes. Changes may be incorporated in production models. 
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.