Mike Perks
Kenny Bain
Pawan Sharma
Section 2 describes the methodology and tools that were used for the scale out performance test, including Login VSI.
The Lenovo results show a best in class achievement of 5000 persistent desktops with Citrix XenDesktop.
This type of result has not been documented before, partly because 5000 persistent desktops require a
significant investment in storage. However, as shown in this paper, a combination of Atlantis Computing
software and IBM FlashSystem storage turns the usual I/O performance problem into a non-event. Without
data reduction technology, enterprise class flash storage can be expensive. The Atlantis Computing
de-duplication and compression facilities make it cost effective (the storage cost for 5000 users is less than
$130 per user, including the Atlantis license). Moreover, desktop logon times of 16 seconds and the ability to reboot all 5000 desktops in 20 minutes make this system easy to use for both users and IT staff.
A total of 35 Lenovo Flex System servers in 30U were used to support 5000 users with up to 160 persistent
users per server. The Lenovo servers provide a dense and cost effective solution in terms of CAPEX and
OPEX. The performance results show that the usual persistent desktop I/O problem was effectively eliminated
and the compute server performance is the driving factor. More compute servers can be added to support more
users or more compute intensive users; it all depends on the individual customer environment and user load.
For more information about the Lenovo Client Virtualization solution for Citrix XenDesktop, contact your
Lenovo sales representative or business partner. For more information about the Citrix XenDesktop reference
architecture that includes information on the LCV solution, performance benchmarks, and recommended
configurations, see this website: http://lenovopress.com/tips1278.
Citrix XenDesktop provides a connection broker service between clients and user VMs to support virtual
applications or virtual desktops. XenDesktop uses management VMs and supports multiple hypervisors,
although only ESXi is used for the scale out performance test. Atlantis Computing software provides an
important service, which substantially reduces the amount of I/O from the virtual desktops to the shared
storage.
Figure 1 shows an overview of the main components in the LCV. The rest of this section describes the subset
of hardware and software components that are used for the scale out performance test.
Figure 1 components: client devices (tablets, laptops, workstations); Citrix XenDesktop; VMware ESXi; Lenovo x3550 M4/x3650 M4 and ThinkServer RD350/RD450 servers; and storage options (IBM Storwize, NetApp NAS + DAS, EMC VNX).
Greatest choice for clients in processor type and OS platform, all in the same chassis and managed from a single point of control.
Figure 2: Flex System Enterprise Chassis and Flex System compute nodes
ibm.com/systems/pureflex/overview.html
The G8264 switch is ideal for latency-sensitive applications, such as client virtualization. It supports Virtual
Fabric to help clients reduce the number of I/O adapters to a single dual-port 10 Gb adapter, which helps
reduce cost and complexity. The G8264 switch supports the newest protocols, including Data Center
Bridging/Converged Enhanced Ethernet (DCB/CEE) for support of FCoE, in addition to iSCSI and NAS.
Atlantis software works with any type of heterogeneous storage, including server RAM, direct-attached storage
(DAS), SAN, or network-attached storage (NAS). It is provided as a VMware ESXi compatible VM that presents
the virtualized storage to the hypervisor as a native data store, which makes deployment and integration
straightforward. Atlantis Computing also provides other utilities for managing VMs and backing up and
recovering data stores.
For the purposes of this scale out test, the Atlantis ILIO Persistent VDI version was used in disk-backed mode.
This mode provides the optimal solution for desktop virtualization customers that are using traditional or
existing storage technologies that are optimized by Atlantis software with server RAM. In this scenario, Atlantis
employs memory as a tier and uses a small amount of server RAM for all I/O processing while using the
existing SAN, NAS, or all-flash arrays storage as the primary storage. Atlantis storage optimizations increase
the number of desktops that the storage can support by up to 20 times while improving performance.
Disk-backed configurations can use various storage types, including host-based flash memory cards, external all-flash arrays, and conventional spinning disk arrays.
Login VSI supports two launcher modes: serial and parallel. Serial mode is normally used to test the maximum
workload for a specific server. For the scale out performance testing, Login VSI was used in parallel mode so
that the login interval could be substantially reduced from the default of every 30 seconds and the simulated
load evenly distributed across the Login VSI launchers and compute servers. The user login interval was
varied to achieve the best result given the available servers; in many cases, one logon every two seconds was used. This means that 5000 users log on over a period of 10,000 seconds (approximately 2.75 hours), and the total test time (including the standard 30-minute Login VSI idle period and logoff) would be about 3.5 hours.
All user VMs were pre-booted before the test so they were idle and ready to receive users. The Login VSI
medium workload was chosen to represent typical customer workloads. The more intensive heavy workload
simply required more servers to support the extra CPU load.
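The logon-interval arithmetic above can be checked with a few lines of Python (a sketch; the helper function name is ours, and the figures come from the test description):

```python
def logon_phase_seconds(sessions: int, interval_s: float) -> float:
    """Time for all sessions to log on at one logon per interval."""
    return sessions * interval_s

logon_s = logon_phase_seconds(5000, 2)  # one logon every two seconds
idle_s = 30 * 60                        # standard Login VSI idle period

print(logon_s)                   # 10000.0 seconds (~2.75 hours)
print((logon_s + idle_s) / 3600) # ~3.3 hours before logoff even begins
```

The remaining gap up to the observed 3.5 hours is the logoff phase, whose length depends on the environment.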
During the scale out performance test, different performance monitors were used to ensure that no single component was overloaded. The esxtop tool was used for the compute servers, and storage monitoring tools were used for the IBM FlashSystem shared storage. The results from these tools are described in section 5 on page 17.
After each test run, the user VMs and Login VSI launcher VMs are rebooted and everything is reset, ready for the next run a few hours later. Often, two or three runs were done for each test variation.
Benchmarking: Make the correct decisions about different infrastructure options based on tests.
Load testing: Gain insight into the maximum capacity of your current (or future) hardware environment.
Capacity planning: Decide exactly what infrastructure is needed to offer users an optimally performing desktop.
Change impact analysis: Test and predict the performance effect of every intended modification before its implementation.
After the loop finishes, it restarts automatically. Each loop takes approximately 14 minutes to run. Within each loop, the response times of seven specific operations are measured at a regular interval: six times within each loop. The response times of these seven operations are used to establish the VSImax score. VSImax is the maximum capacity of the tested system, expressed as the number of Login VSI sessions. For more information, see this website: loginvsi.com/
For more information about vSphere performance monitoring, see this website:
http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.monitoring.doc/GUID-D89E8267-C74A-496F-B58E-19672CAB5A53.html
For more information about interpreting esxtop statistics, see this website:
http://communities.vmware.com/docs/DOC-9279
5000 Persistent User Scale out Test with Citrix XenDesktop and Atlantis Computing
2.4 SuperPuTTY
SuperPuTTY is a Windows GUI application that allows multiple PuTTY SSH clients to be opened, one per tab. In particular, this tool was used to control multiple SSH sessions simultaneously and to start tools (such as esxtop) in each session at the same time.
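A small script can serve a similar purpose: building identical esxtop batch-mode commands for every host. This is a sketch only; the host names are hypothetical, the esxtop -b/-d/-n batch flags should be verified against your ESXi release, and the commands are printed rather than executed:

```python
hosts = [f"esxi-compute-{n:02d}" for n in range(1, 36)]  # 35 compute nodes

def esxtop_command(host: str, interval_s: int = 5, samples: int = 720) -> str:
    """SSH command that captures ~1 hour of esxtop batch data to a CSV."""
    return (f"ssh root@{host} "
            f"'esxtop -b -d {interval_s} -n {samples} > /tmp/{host}.csv'")

for host in hosts[:2]:  # show the first two commands as a check
    print(esxtop_command(host))
```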
Figure: system overview showing the SAN24B-5 and G8264 switches; the user, management, storage, and SAN networks; the IBM FlashSystem 840 storage for the 5000 persistent user VMs; the Active Directory, DHCP, and DNS server; the NAS storage for results, logs, management, and launcher VM images; the launcher servers; and the compute servers (for users and management).
x222: 2 x E5-2470 (Sandy Bridge EN) processors and 192 GB of memory in each half (384 GB total per node), 5 nodes x 2 halves.
The 35 compute nodes are placed in three Lenovo Flex chassis, which use a total of 30U in a rack. Each Flex chassis is configured with an EN4093R 10 GbE switch that is connected to a Lenovo G8264 64-port TOR 10 GbE Ethernet switch. Each chassis is connected by using four 40 GbE cables for best performance. An extra EN4093R switch in each chassis and a second G8264 TOR switch can be used for redundancy.
Each Flex chassis also contains an FC3171 or FC5022 FC switch that is configured in pass-thru mode. The
chassis switches are connected with four LC-LC fibre cables to an IBM SAN24B-5 TOR SAN switch. An extra
FC switch in each chassis and a second SAN24B-5 TOR switch can be used for redundancy. All zoning for the
compute nodes and IBM FlashSystem 840 storage is centralized in the SAN24B-5 switch.
The IBM FlashSystem 840 storage server was configured with a full complement of 12 x 4 TB flash cards for a total of 40 TB of redundant storage (usable capacity after two-dimensional RAID protection). The 5000 persistent virtual desktops used less than 5 TB of FlashSystem capacity after Atlantis Computing data reduction. The FlashSystem 840 is connected to the SAN24B-5 switch by using four LC-LC fibre cables, two to each storage controller for redundancy. Another four fibre cables can be used to connect to a second SAN switch for further failover protection.
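The data reduction implied by these numbers can be estimated with simple arithmetic (a sketch: it assumes fully provisioned 24 GB disks, as specified for the master VM, so the true ratio of written to stored data will differ):

```python
desktops = 5000
provisioned_gb = 24  # per-desktop disk size from the master image
used_tb = 5          # FlashSystem capacity consumed after reduction

logical_tb = desktops * provisioned_gb / 1024
print(round(logical_tb, 1))            # 117.2 TB provisioned
print(round(logical_tb / used_tb, 1))  # ~23.4:1 apparent reduction
```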
Even with redundancy, there are enough ports on the IBM FlashSystem 840 for a direct FC connection from
the Flex chassis FC switches. Pass-thru mode to a TOR SAN switch was used to show how a larger SAN
network is built.
All of the management VMs that are required by Citrix XenDesktop and Atlantis ILIO Center are split across two x240 compute nodes. The configuration and number of these VMs are listed in Table 2.
The VM for the Active Directory, DNS, and DHCP services is shared by the servers in the system under test and the load framework; a second instance is used for redundancy. Windows Server 2012 R2 can be used instead of Windows Server 2008 R2 SP1.
Figure 13 shows the compute servers, shared storage, and networking hardware for the system under test.
The load framework consists of 29 compute servers and one management server, all of which are Lenovo x3550 rack servers that run the VMware ESXi 5.5 hypervisor, with NAS shared storage for the Login VSI launcher VMs and performance data.
The compute servers for the load framework must have adequate performance to support the required load of
8 - 12 Login VSI launcher VMs. These compute servers often have two Westmere EP or better processors and
96 GB or more of memory. Each launcher compute server has a USB key with ESXi 5.5 and a two-port
10 GbE adapter that is connected to the same G8264 10 GbE TOR switch that is used by the system under
test. There is no need for an FC connection to the IBM FlashSystem storage, although there is nothing
preventing centralization of the storage on FlashSystem. Instead, all of the data for the load framework is
stored on NAS shared storage, which is connected to the same G8264 10 GbE switch.
The management server for the load framework supports several VMs. The main VM is used to run the Login VSI Launcher and Analyzer tools. In addition, a separate Citrix XenDesktop configuration is used to provision the launcher VMs.
Figure 14 shows the compute servers and storage hardware for the load framework.
Figure legend: launchers, management VMs, master user VM, master launcher VM, Login VSI, Atlantis Computing software, and the 5000 persistent desktop VMs.
The Active Directory, DNS, and DHCP server is shared between all compute servers on the network (from the system under test and the load framework).
There are four XenDesktop Delivery Controllers.
The mapping between user IDs and the names of the persistent desktop VMs is statically specified to
the connection broker rather than being randomly assigned the first time it is needed. This specification
makes it easier to remedy any VM setup problems before the first performance test. If this is not done,
the assignment of user IDs to VMs must be rerun until it completes successfully for all 5000 users.
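A static assignment like this can be generated mechanically. The sketch below writes user-to-VM rows as CSV; the user and VM naming patterns are hypothetical, not the ones used in the test:

```python
import csv
import io

def mapping_rows(servers: int, vms_per_server: int):
    """Yield (user_id, vm_name) pairs for a static user-to-VM assignment."""
    uid = 0
    for s in range(1, servers + 1):
        for v in range(1, vms_per_server + 1):
            uid += 1
            yield (f"user{uid:04d}", f"srv{s:02d}-vm{v:03d}")

buf = io.StringIO()
csv.writer(buf).writerows(mapping_rows(servers=35, vms_per_server=160))
rows = buf.getvalue().splitlines()
print(len(rows))  # 5600 rows
print(rows[0])    # user0001,srv01-vm001
```

A uniform 160 VMs on all 35 servers gives 5600 rows here; the actual test mixed 100-VM and 160-VM servers to reach 5000.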
1. Create a Windows 7 Professional 64-bit SP1 VM. The following VM parameters should be specified: 1 vCPU, 1024 MB vRAM, and a 24 GB disk.
2. Configure Windows 7, networking, and other OS features.
3. Install VMware VMtools for access by vCenter and reboot.
4. Join to the Active Directory domain and reboot.
5. Disable all Internet Explorer plug-ins.
6. Ensure that the firewalls are turned off.
7. Enable remote desktop for remote access to the desktop.
8. Install the Windows applications that are needed for Login VSI medium workload, including Microsoft
Office, Adobe Acrobat, and so on.
9. Apply the Citrix recommended optimizations. For more information, see this website:
support.citrix.com/article/CTX125874
10. Install the Citrix XenDesktop Virtual Desktop Agent (VDA). This step is not needed for the brokerless
RDP test scenario.
11. Configure the list of XenDesktop Delivery Controllers for the VDA. This step is not needed for the brokerless RDP test. The Citrix desktop service randomly selects a controller from the list (grouped or ungrouped) until a successful connection to a controller is established.
For more information about installing Login VSI, see this website: loginvsi.com/documentation/index.php?title=Installation
A separate management VM is used to run Login VSI performance tests and analyze the results.
As noted earlier, a Citrix MCS environment is used to create the launcher VMs. First, add all of the physical launcher machines to VMware vCenter and Citrix XenCenter. Then, by using the master launcher image as a template, the 288 launcher VMs are created in MCS dedicated mode. The number of launcher VMs per physical server depends on its performance; however, 8 - 12 launcher VMs per server works well.
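The launcher sizing can be cross-checked against the load with quick arithmetic (the figures come from the text above):

```python
sessions = 5000      # simulated users
launchers = 288      # launcher VMs created with MCS
launcher_hosts = 29  # physical launcher servers

print(round(launchers / launcher_hosts, 1))  # 9.9 launcher VMs per host
print(round(sessions / launchers, 1))        # 17.4 sessions per launcher
```

About 10 launcher VMs per host lands inside the 8 - 12 range given earlier for the load framework servers.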
The Login VSI tool is started to ensure that all of the launchers were created properly and are ready to use. Finally, a script is used to add the 5000 unique user IDs to Active Directory. The password for all of these users is the same for simplicity.
For the brokerless RDP test scenario, the following slightly different steps are used for running Login VSI:
Ensure that the LoginVSI RDP group has access to the Master image.
Use vCenter to copy and paste the IP addresses of the user VMs that are performing the LoginVSI test
into a CSV file (named %csv_target% in the commandline example below).
In the Login VSI configuration, replace the commandline with the following:
Atlantis recommends as a best practice that each ILIO VM has its own logical unit number (LUN) on shared storage. Therefore, 35 volumes (each with 300 GB capacity) were created on the IBM FlashSystem storage. By using vCenter, each physical server has access to all 35 of the volumes, even though only one is actually used per physical server. The ILIO master VM and the master user VM are then placed in one of those volumes. Scripts that are available from Atlantis Computing are used to clone the ILIO VM and the master user VM across all 35 compute servers in preparation for the next step.
x222 (2 x E5-2470, Sandy Bridge EN, in each half): 5 nodes x 2 halves, 100 user VMs per half, 1000 VMs; 5000 VMs in total across all servers.
A command line script from Atlantis and a CSV file are used to fast clone the master VM on each compute
server to create the required number of VMs on each of the servers. A naming scheme of the server name and
VM number is used to create a set of 5000 uniquely named VMs. The cloning process can take half a day to
complete, but needs to be done only once for each different master VM image.
Each VM is started so that it registers with Active Directory and is automatically assigned its machine name. This process can be done as a separate step or as part of the fast cloning process that is described above. The VMs are then shut down via vCenter.
The dedicated machine catalog is created for Citrix XenDesktop, and a 5000-line CSV file is used to automatically insert all of the named VMs into the machine catalog. A desktop group is created and the 5000 VMs are added to it. XenDesktop automatically starts each VM and ensures that it is accessible from XenDesktop. Sometimes, some manual steps are necessary to get all of the VMs into the correct state.
The last step is to perform a standard Login VSI profile run to automatically create the user profile in each persistent desktop. Because of the static assignment of names, any failures can be corrected manually or by rerunning Login VSI. After a final restart of the guest operating systems, the 5000 persistent desktops are ready for a performance test.
This section describes the results of the scale out tests by examining the performance of the Login VSI test, the compute servers, and the shared storage.
Figure 15 shows the output from Login VSI with a new logon every second. Out of 5000 started sessions, 4998 successfully reported back to Login VSI. The average response time is extremely good, with a VSI baseline of 860 milliseconds (ms). The graph is flat, with only a slight increase in the average between the first and last desktop. As measured by Login VSI, the longest time to logon for any session was 8 seconds.
Figure 15: Login VSI performance result for Brokerless by using RDP
Figure 16 shows the percentage CPU utilization by using representative curves for each of the four different servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest utilization because only 100 VMs are started on those servers.
Figure 17 shows the total number of server IOPS, as reported by esxtop, by using representative curves for each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have similar curves because all have 160 VMs. The IOPS curves are spiky and show that the number of IOPS at any instant can vary considerably. The peaks are most likely caused by logons.
Figure 18 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern: a gradual build-up of IOPS (up to 12:56 a.m.), then a steady state period of 30 minutes (12:56 a.m. to 1:26 a.m.), and finally a peak for all of the logoffs at the end.
The read IOPS are low because Atlantis Computing software satisfies most of them from its in-memory cache. The write IOPS are fairly low, peaking at less than 30,000 IOPS, which is 6 IOPS per persistent desktop. Atlantis Computing software uses its data services to compress, de-duplicate, and coalesce the write IOPS.
Figure 20 shows the storage request latency as measured by the IBM FlashSystem 840. The curve shows that the average read latency is less than 200 microseconds (us) and even drops to zero during the steady state phase because all of the read requests are satisfied by the Atlantis Computing cache. The write latency also is often less than 200 us, with occasional peaks that are still less than 1000 us (1 millisecond), except during the 5000 virtual desktop restart.
Figure 21 shows the output from Login VSI with a new logon every 2 seconds, which is half the logon rate of the brokerless RDP scale out test. Out of 5000 started sessions, 4997 successfully reported back to Login VSI, which constitutes a successful run. The average response time is good, with a VSI baseline of 1356 ms. The graphs for minimum and average response times are flat, with only a slight increase in the average between the first and last desktop. The graph for the maximum response time increases steadily and shows only the worst case. As measured by Login VSI, the longest time to logon for any session was 16 seconds.
Figure 22 shows the percentage CPU utilization by using representative curves for each of the four different servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest utilization because only 100 VMs are started on those servers. The E5-2670 and E5-2690 CPUs have the highest utilization (95%) compared to the faster E5-2690 v2, although all three of these servers have 160 VMs.
Figure 23 shows the total number of server IOPS, as reported by esxtop, by using representative curves for each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have similar curves because all have 160 VMs. The IOPS curves are spiky, which shows that the number of IOPS at any instant can vary considerably, with the peaks most likely caused by logons.
Figure 24 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern of a gradual build-up of IOPS. The steady state period is less discernible in this graph and occurs around 9:15 p.m. The read IOPS are low because Atlantis Computing software satisfies most of them from its in-memory cache. The number of read IOPS increases substantially at logoff. The write IOPS are quite low, peaking at less than 35,000 IOPS, which is 7 IOPS per persistent desktop. Again, Atlantis Computing software uses its data services to compress, de-duplicate, and coalesce the write IOPS.
Figure 26 shows the server latency, as reported by esxtop, by using representative curves for each of the four different servers that were used in the test. The average latency is 350 microseconds (us) and is fairly constant throughout the whole test. The latency for the servers with the E5-2470 CPU tends to peak higher, but is usually not more than 1 ms.
lenovopress.com/tips1278
Atlantis Computing
atlantiscomputing.com/products
ibm.com/storage/flash
VMware vSphere
vmware.com/products/datacenter-virtualization/vsphere
Citrix XenDesktop
citrix.com/products/xendesktop
Acknowledgements
Thank you to the teams at Atlantis Computing (Mike Carman, Bharath Nagaraj), IBM (Rawley Burbridge), and ITXen (Brad Wasson) for their tireless work in helping with the performance testing.