Abstract
The Remote Desktop Session Host (RD Session Host) role service lets multiple concurrent users run Windows-based applications on a remote computer running Windows Server 2008 R2. Microsoft RemoteFX delivers a rich user experience for session-based and virtual desktops to a broad range of client devices. This white paper is intended as a guide for capacity planning of RD Session Host in Windows Server 2008 R2 and RemoteFX in Windows Server 2008 R2 with Service Pack 1 (SP1). It describes the most relevant factors that influence the capacity of a given deployment, methodologies to evaluate capacity for specific deployments, and a set of experimental results for different combinations of usage scenarios and hardware configurations.
Copyright Information
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2010 Microsoft Corporation. All rights reserved. Microsoft, Hyper-V, Windows, and Windows Server are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.
Contents
Introduction
Capacity Planning for a Specific Deployment
    Problem statement
    What determines the capacity of a system?
        Usage scenario
        Hardware resources
    Typical evaluation approaches
        Load simulation tests
    Testing methodology
        Test bed configuration
        Load generation
        Response time measurement
        Scenarios
    Examples of test results for different scenarios
Tuning Your Server to Maximize Capacity
    Impact of hardware on server capacity
        CPU
        Memory
        Disk storage
        Network
    Impact of Remote Desktop Services features on server capacity
        32-bit color depth
        Windows printer redirection (XPS)
        Compression algorithm for RDP data
        Desktop Experience pack
    RemoteApp programs
    Hyper-V
    Impact of Windows System Resource Manager (WSRM)
    Comparison with Windows Server 2008
Conclusions
Capacity planning on RD Session Host running RemoteFX
    Introduction
    Performance testing and scalability testing on the system
    Testing methodology
    Result summary
        CPU utilization
        Network utilization
Appendix A: Test Hardware Details
Appendix B: Testing Tools
    Test control infrastructure
    Scenario execution tools
Appendix C: Test Scenario Definitions and Flow Chart
    Knowledge Worker v2
    Knowledge Worker v1
Appendix D: Remote Desktop Session Host Settings
Appendix E: Test Scenario for Testing RemoteFX for RD Session Host Server
Appendix F: Group Policy Settings for Testing RemoteFX on RD Session Host Server
Introduction
The Remote Desktop Session Host (RD Session Host) role service lets multiple concurrent users run Windows-based applications on a server running Windows Server 2008 R2. This white paper is intended as a guide for capacity planning of an RD Session Host server running Windows Server 2008 R2. In a server-based computing environment, all application execution and data processing occurs on the server. As a consequence, the server is one of the most likely systems to run out of resources under peak load and cause disruption across the deployment. Therefore it is very valuable to test the scalability and capacity of the server system to determine how many client sessions a specific server can support for specific deployment scenarios. This document presents guidelines and a general approach for evaluating the capacity of a system in the context of a specific deployment. Most of the key recommendations are also illustrated with examples based on a few scenarios that use Microsoft Office applications. The document also provides guidance on the hardware and software parameters that can have a significant impact on the number of sessions a server can support effectively.
For example, if a deployment actually requires 14 gigabytes (GB) of RAM to properly accommodate the target number of 100 users, including peak load situations (all users opening a memory-intensive application at the same time), it is reasonable to expect that the estimate coming from the planning exercise would fall within the 14-16 GB range. An estimate of 24 GB of RAM, however, would be a significant waste of resources, because a significant fraction of that RAM (roughly 10 GB) would never be used.
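As a rough sketch of this sizing reasoning, the estimate can be expressed as an OS baseline plus a per-user figure, with headroom for peak load. The baseline and per-user values below are hypothetical placeholders; real values must come from measurement of the actual workload.

```python
# Sketch of the RAM-sizing reasoning above. The baseline and per-user
# figures are hypothetical placeholders, not measured values.

def ram_estimate_gb(users, base_gb, per_user_gb, peak_headroom=0.10):
    """Estimate server RAM as an OS baseline plus per-user commit,
    with modest headroom for peak load (all users active at once)."""
    return (base_gb + users * per_user_gb) * (1 + peak_headroom)

# 100 users at an assumed ~0.12 GB each over a 2 GB baseline lands
# inside the 14-16 GB range discussed above; provisioning 24 GB
# would leave roughly 10 GB of it permanently unused.
print(round(ram_estimate_gb(100, base_gb=2.0, per_user_gb=0.12), 1))  # 15.4
```

The headroom term is the "peak load" cushion from the example; without measured per-user data, the whole calculation is only a planning aid, not a substitute for a load test.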
Hardware resources The server hardware has a major impact on the capacity of a server. The main hardware factors that have to be considered are CPU, memory, disk storage, and network. The impact of each of these factors will be addressed in more detail later in this white paper.
acceptable levels of load and the limiting factors, and offers a good environment for iterating while adjusting various software and hardware configurations.

3. Projection based on single-user systems. This approach uses extrapolation based on data collected from a single-user system. Key metrics such as memory usage, disk usage, and network usage are collected from a single-user system and then used as a reference for projecting expected capacity on a multi-user system. This approach is fairly difficult to implement because it requires detailed knowledge of system and application operations. Furthermore, it is rather unreliable, because the single-user data contains a significant level of noise generated by interference from the system software. Also, in the absence of sophisticated system modeling, translating the hardware performance metrics (CPU speed, disk speed) from the reference system used to collect the data to the target server is a complex and difficult process.

In general, the first approach will prove more time- and cost-effective for relatively small deployments, while the second approach may be preferable for large deployments, where an accurate determination of server capacity can have a more significant impact on purchasing decisions.

Load simulation tests

Load simulation, as outlined above, is one of the more accurate techniques for estimating the capacity of a given system. It works well when the user scenarios are clearly understood, relatively limited in variation, and not very complicated. Generally it involves several distinct phases:

1. Scenario definition. Having a good definition of the usage scenarios targeted by the deployment is a key prerequisite. Defining the scenarios may turn out to be complicated, either because of the large variety of applications involved or because of complex usage patterns.
Getting a reasonably accurate usage scenario is likely the most costly stage of this approach. It is important to capture not only the right sequence of user interactions, but also the right data content (such as documents, data files, and media content), because the content can also play a significant role in overall resource usage on the system. Such a scenario can be built from interviews with users, monitoring of user activity, metrics tracked on key infrastructure servers, project goals, and so on.

2. Scenario implementation. In this phase, an automation tool is used to implement the scenario so that multiple copies can be run simultaneously against the test system. An ideal automation tool drives the application user interface through the Remote Desktop Connection client, has a negligible footprint on the server, is reliable, and tolerates variations in application behavior caused by server congestion. At this stage, it is also important to have a clear idea of the metrics used to gauge how viable the system is at
various load levels, and to make sure that the scenario automation tools can collect those metrics.

3. Test bed setup. The test bed typically lives on an isolated network and includes three categories of computers:
   a. The RD Session Host server(s) to be tested
   b. Infrastructure servers required by the scenario (such as IIS, SQL Server, or Exchange) or that provide basic services (DNS, DHCP, Active Directory)
   c. Test clients used to generate the load
An isolated network is very important because it prevents external traffic from interfering with either the Remote Desktop Connection traffic or the application-specific traffic. Such interference can cause random slowdowns that affect the test metrics and are difficult to distinguish from slowdowns caused by resource exhaustion on the server.

4. Test execution. Typically this consists of gradually increasing the load against the server while monitoring the performance metrics used to assess system viability. It is also a good idea to collect various performance metrics on the system to help later in identifying the type of resources that come under pressure when system responsiveness degrades. This step may be repeated for various adjustments made based on conclusions derived from step 5.

5. Result evaluation. In this final step, based on the performance metrics and other performance data collected during the test, you determine the acceptable load the system can support while meeting the deployment performance requirements, and the type of resources whose shortage causes performance to degrade. The conclusions reached in this step can be the starting point for a new iteration on hardware adjusted to mitigate the critical resource shortage and increase load capacity.

Coming up with a single application-independent criterion for defining when application performance degrades is fairly difficult.
However, there is an interaction sequence that captures the most fundamental transaction of an interactive application: sending input, such as from a keyboard or mouse, to the application and having the application draw something back in response. The most trivial case of this is typing, but other interactions, like clicking a button or selecting a check box or menu item, also map in a very straightforward way to this type of transaction. The reason this interaction pattern stands out is that it captures the fundamental intention of connecting to a remote desktop: allowing a user to interact with a rich user interface running on a remote system the same way he or she would if the application were running locally. Although this metric does not cover every relevant aspect of application performance, it is a very good approximation for many scenarios, and degradation measured through this metric correlates well, in general, with degradation in other metrics.

This capacity evaluation approach is what we recommend when a reasonably accurate number is required, especially for cases like large system deployments, where sizing the hardware accurately has significant cost implications and a low error margin is desirable. We used the same approach for the experimental data that illustrates various points in this document, for the following reasons:
- It allowed us to make fairly accurate measurements of server capacity under specific conditions.
- It makes it possible for independent parties to replicate and confirm the test results.
- It allows a more accurate evaluation of various configuration changes on a reference test bed.
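The ramp-and-measure loop of steps 4 and 5 above can be sketched as follows. Everything passed into the loop (session launcher, metric collector, responsiveness check) is a hypothetical stand-in for the real load tools and monitoring; only the loop structure reflects the methodology described in the text.

```python
# Minimal sketch of the test-execution loop (steps 4-5): add load in
# group-sized increments and record metrics until responsiveness
# degrades. The callables are hypothetical stand-ins for real tooling.

def run_load_test(start_session, collect_metrics, is_responsive,
                  max_users, group_size=10):
    """Ramp up sessions group by group; return (supported_users, history)."""
    history = []
    users = 0
    while users < max_users:
        for _ in range(group_size):
            start_session()
            users += 1
        metrics = collect_metrics()
        history.append((users, metrics))
        if not is_responsive(metrics):
            # Report capacity as the last full group before degradation.
            return users - group_size, history
    return max_users, history

# Toy stand-ins: CPU grows ~0.9% per user; "responsive" below 95% CPU.
state = {"users": 0}
supported, _ = run_load_test(
    start_session=lambda: state.update(users=state["users"] + 1),
    collect_metrics=lambda: {"cpu": state["users"] * 0.9},
    is_responsive=lambda m: m["cpu"] < 95.0,
    max_users=200,
)
print(supported)  # -> 100, reported in multiples of the group size
```

The group-sized increments are why, as discussed later, measured capacity is only meaningful to the nearest group of users.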
Testing methodology
We included various results obtained in our test labs to illustrate many of the assertions made in this document. These tests were executed in Microsoft laboratories, using a set of tools developed specifically for Remote Desktop Session Host server load simulations, so that they meet all the requirements outlined above for effective load test execution. These tools were used to implement a few scenarios based on Office 2007 and Internet Explorer. Response times for various actions across the scenarios were used to assess the acceptable level of load under each configuration.
Test bed configuration The Remote Desktop test laboratory configuration is shown in Figure 1.
Windows Server 2008 R2 and Office 2007 were installed by using the settings described in Appendix D. The test tools were deployed on the test controller, workstations, and test server as described previously. User accounts were created for all users used during the testing and their profiles were configured. For each user in the Knowledge Worker scenario, this included copying template files used by the applications, setting up a home page on Internet Explorer, and configuring an email account in Outlook. An automated restart of the server and client workstations was performed before each test-run to revert to a clean state for all the components.
Load generation

The test controller was used to launch automated scenario scripts on the workstations. Each script, when launched, starts a Remote Desktop connection to the target server as a test user and then runs the scenario. The Remote Desktop users were started by the test controller in groups of ten, with 30 seconds between successive users. After each group of ten users was started, a 5-minute stabilization period was observed in which no additional sessions were started. This means it takes 4 minutes and 30 seconds to start 10 users; taking the stabilization periods into account, the controller takes 1 hour and 30 minutes to start 100 users. This approach of logging on users one at a time has two advantages. First, it ensures that we do not overwhelm the server by logging on 100 users at the same time. Second, we can look at the resulting data from the test and point to a specific number of users after which the server became unresponsive. In the results in the following sections, the number of supported users is reported to the nearest 10, because we use a group size of 10 users and the precision of the test data is not sufficient to clearly distinguish between users from the same group.

Response time measurement

A user scenario is built by grouping a series of actions. An action sequence starts with the test script sending a keystroke through the client to one of the applications running in the session. As a result of the keystroke, the application does some drawing. For example, sending CTRL-F to Microsoft Word results in the application drawing the File menu. The test methodology is based on measuring the response time of all actions that result in drawing events (except for typing text). The response time is defined as the time between the keystroke and the drawing that happens as a result.
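The logon ramp-up arithmetic described under "Load generation" above can be checked with a short script (a hypothetical helper, not part of the actual test tools):

```python
# Sketch of the logon ramp-up schedule described above. Within a group,
# users start 30 s apart, so a group of 10 takes 9 * 30 s = 4.5 minutes
# to launch; a 5-minute stabilization pause follows each group except
# the last. (Hypothetical helper, not part of the real test controller.)

def ramp_up_minutes(total_users, group_size=10,
                    seconds_between_users=30, stabilization_minutes=5):
    """Return total minutes to start `total_users` sessions."""
    groups = total_users // group_size
    launch = groups * (group_size - 1) * seconds_between_users / 60
    stabilize = (groups - 1) * stabilization_minutes
    return launch + stabilize

print(ramp_up_minutes(10))   # 4.5  -> "4 minutes and 30 seconds"
print(ramp_up_minutes(100))  # 90.0 -> "1 hour and 30 minutes"
```

Note the off-by-one terms: ten users have nine 30-second gaps between them, and ten groups have nine stabilization pauses between them, which is why 100 users take 90 minutes rather than 95.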
A timestamp (T1) is taken on the client side when the test tools on the client send a keystroke to the Remote Desktop client. When the drawing happens in the server
application, it is detected by a test framework tool that runs inside each Remote Desktop session. The test tool on the server side sends a confirmation to the client-side tools, at which point the client-side tools take another timestamp (T2). The response time of the action is calculated as T2 − T1. This measurement gives an approximation of the actual response time, accurate to within a few milliseconds (ms).

The response time measurement is important because it is the most reliable and direct measurement of user experience as defined by system responsiveness. Performance metrics such as CPU usage and memory consumption only give a rough idea of whether the system is still within acceptable working conditions. For example, it is difficult to qualify exactly what it means for users if the CPU is at 90% utilization. The response times tell us exactly what users will experience at any point during the test.

As the number of users on a server increases, the response times for all actions start to degrade after a certain point. This usually happens because the server starts running out of one or more hardware resources. A degradation point is determined for the scenario, beyond which the server is considered unresponsive and therefore beyond capacity. To determine the degradation point for the entire scenario, a degradation point is determined for each action based on the following criteria:
- For actions that have an initial response time of less than 200 ms, the degradation point is where the average response time exceeds both 200 ms and 110% of the initial value.
- For actions that have an initial response time of more than 200 ms, the degradation point is where the average response time exceeds 110% of the initial value.
These criteria are based on the assumption that a user will not notice degradation in a response time that is lower than 200 ms. Generally, when a server reaches CPU saturation, the response time degradation point for most actions is reached at the same number of users. In situations where the server is running out of memory, actions that involve file input/output, such as opening a dialog box to select a file to open or save, degrade faster than others, because high paging activity causes congestion in the input/output subsystem. For the purpose of this testing, the degradation point for the whole test was defined as the point where at least 20% of the user actions have degraded. A typical user action response time chart is shown in Figure 2. According to the criteria described above, the degradation point for this action is at 150 users.
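The two per-action criteria and the 20% scenario-level rule can be expressed directly in code. This is a sketch of the published criteria, not the actual test framework; the function names and sample numbers are illustrative.

```python
# Sketch of the degradation criteria described above (illustrative
# implementation; names and sample data are not from the test tools).

def action_degraded(initial_ms, average_ms):
    """Per-action degradation criteria.

    - Initial response < 200 ms: degraded once the average exceeds
      both 200 ms and 110% of the initial value.
    - Initial response >= 200 ms: degraded once the average exceeds
      110% of the initial value.
    """
    if initial_ms < 200:
        return average_ms > 200 and average_ms > 1.10 * initial_ms
    return average_ms > 1.10 * initial_ms

def scenario_degraded(actions):
    """Scenario-level rule: at least 20% of actions have degraded.

    `actions` is a list of (initial_ms, average_ms) pairs.
    """
    degraded = sum(action_degraded(i, a) for i, a in actions)
    return degraded >= 0.20 * len(actions)

# A fast action (50 ms initial) averaging 120 ms exceeds 110% of its
# initial value but stays under 200 ms, so it does not count yet.
print(action_degraded(50, 120))   # False
print(action_degraded(300, 340))  # True (340 > 330 = 110% of 300)
```

The 200 ms floor is what keeps jitter in very fast actions from being counted as degradation, which is the assumption the text states explicitly.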
Figure 2 - Response time evaluation

Scenarios

The scenarios used for testing are automated and meant to simulate real user behavior. Although the scripts used in these scenarios simulate tasks that a normal user could perform, the users simulated in these tests are tireless: they never reduce their intensity level. The simulated clients type at a normal rate, pause as if looking at dialog boxes, and scroll through mail messages as if to read them, but they do not get up from their desks to get a cup of coffee, never stop working as if interrupted by a phone call, and do not break for lunch. The tests take on a rather robotic quality, with users exercising the same functions and data sets during a thirty-minute period of activity. This approach yields accurate but conservative results.
Knowledge Worker v2

The Knowledge Worker scenario consists of a series of interactions with Microsoft Office 2007 applications (Word, Excel, Outlook, and PowerPoint) and Internet Explorer. The set of actions and their frequency in the Office segments of the scenario are based on statistics collected from the Software Quality Management data submitted by Office users, and should represent a good approximation of an average Office user. The scenario includes the following:
- Creating and saving Word documents
- Printing spreadsheets in Excel
- Using e-mail communication in Outlook
- Adding slides to PowerPoint presentations and running slide shows
- Browsing Web pages in Internet Explorer
This scenario is described in detail in Appendix C.

Knowledge Worker v2 with text-only presentation

This scenario is very similar to the Knowledge Worker scenario above. It is exactly the same except for one difference: the PowerPoint presentation file used in this scenario is a text-only version, whereas the file used in the original Knowledge Worker scenario is rich in content. The comparison of these two scenarios is interesting because it reveals how small differences in a scenario can impact the capacity of the server.

Knowledge Worker v2 without PowerPoint

This scenario is similar to the Knowledge Worker scenario in most ways. The significant difference is that it does not use PowerPoint. The duration of the scenario is the same as the Knowledge Worker scenario, but instead of spending time in PowerPoint, the user spends more time typing Word documents, filling Excel spreadsheets, and typing e-mail messages. This scenario is significantly lighter in terms of CPU usage compared to
the Knowledge Worker scenario, because PowerPoint, while taking only ~10% of the total work cycle duration, uses more than half of the CPU. This also generates significant variation in CPU usage during the work cycle, with much higher CPU usage during the short PowerPoint interaction sequence. There were two reasons to introduce this scenario: PowerPoint usage data shows that it is not as widely used as the other Office applications in the mix, and the scenario's relatively lighter load and smoother variation in resource usage give an alternate angle for examining various factors.

Knowledge Worker v1

This is the Knowledge Worker scenario that was used for testing in the Windows Server 2003 Terminal Server Capacity and Scaling (http://go.microsoft.com/fwlink/?LinkId=178901) white paper. This scenario is significantly different from the current Knowledge Worker v2 and is described in detail in Appendix C.
Scenario                                           Capacity
Knowledge Worker v2                                150 users
Knowledge Worker v2 with text-only presentation    200 users
Knowledge Worker v2 without PowerPoint             230 users
Knowledge Worker v1                                230 users

Table 1 - Server capacity by scenario (HP DL 585)

Table 1 shows the comparison of server capacity between the different scenarios. The capacity numbers are determined by using the criteria outlined above, but these numbers should be treated with caution and may need to be adjusted for real deployments. The most important observation about these results is that relatively minor tweaks in the scenario have a significant impact on scalability. Although the two Knowledge Worker v2 tests use the same presentation text in PowerPoint, the difference in the way it is rendered (rich content versus text-only) accounts for a 33% variation in capacity. And although the PowerPoint interaction is only ~10% of the total scenario execution cycle, removing it increased capacity by ~53%. These examples serve as a strong reminder that careful consideration of the scenario used for capacity measurements is paramount to obtaining accurate numbers. They also make a compelling case that off-the-shelf numbers are not useful for capacity planning; any such effort needs to be customized to your deployment.
Server Configuration                                                                    Scenario                                  Capacity
HP DL 385 (2 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 24 GB Memory)     Knowledge Worker v2                       80 users
HP DL 585 (4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 64 GB Memory)     Knowledge Worker v2                       150 users
HP DL 585 (4 x AMD Opteron Quad-core CPUs, 2.4 GHz, 2048 KB L2 Cache, 128 GB Memory)    Knowledge Worker v2                       310 users
HP DL 585 (4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 64 GB Memory)     Knowledge Worker v2 without PowerPoint    230 users
HP DL 585 (4 x AMD Opteron Quad-core CPUs, 2.4 GHz, 2048 KB L2 Cache, 128 GB Memory)    Knowledge Worker v2 without PowerPoint    450 users

Table 2 - Server capacity by hardware configuration

As expected, hardware configuration changes also play a big role in the capacity numbers. With the x64-based architecture removing some fundamental constraints of the x86-based Windows Server architecture, properly configured servers should be able to accommodate large numbers of users for many mainstream workloads. There is no reason to expect that RD Session Host servers are inherently limited to a certain number of users.
CPU

The data presented in Table 3 was obtained by using two different test servers. The only difference between the two servers was that one of them has a single quad-core CPU and the other has two quad-core CPUs.

Server Configuration                                                      Scenario                                  Capacity
AMD Opteron Quad-core CPU, 2.7 GHz, 512 KB L2 Cache, 32 GB Memory         Knowledge Worker v2                       110 users
2 x AMD Opteron Quad-core CPUs, 2.7 GHz, 512 KB L2 Cache, 32 GB Memory    Knowledge Worker v2                       200 users
AMD Opteron Quad-core CPU, 2.7 GHz, 512 KB L2 Cache, 32 GB Memory         Knowledge Worker v2 without PowerPoint    180 users
2 x AMD Opteron Quad-core CPUs, 2.7 GHz, 512 KB L2 Cache, 32 GB Memory    Knowledge Worker v2 without PowerPoint    300 users

Table 3 - Server capacity by CPU configuration and scenario

The data in Table 3 shows the results for two different scenarios. An important consideration here is that the factor that determines capacity on all these systems is the CPU, a resource that is very often subjected to unexpected variations and pressure points. Therefore, in a real-life deployment it is prudent to set aside a fraction of CPU resources as a cushion for unexpected spikes of activity (such as everyone using a certain application at the same time). Another factor that plays a significant role in this decision is the quality of service expected by the users: the higher the expectation, the larger the spare capacity that needs to be provisioned. Such a margin could range anywhere from 10% to 50% of the overall capacity, and it causes the capacity numbers to be adjusted accordingly. As expected, an increase in CPU power allows a server to support more users if no other limitations are encountered. The most interesting measure of how increasing CPU capacity affects overall server capacity is the scale factor, defined as the ratio by which the server capacity increases when the CPU capacity doubles.
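For example, the scale factors implied by the Table 3 results can be computed directly (a trivial illustrative calculation, not part of the test tooling):

```python
# Scale factor when doubling CPU capacity, using the Table 3 results.

def scale_factor(capacity_before, capacity_after):
    """Ratio by which server capacity grows when CPU capacity doubles."""
    return capacity_after / capacity_before

# Knowledge Worker v2, going from 1 to 2 quad-core CPUs:
print(round(scale_factor(110, 200), 2))  # 1.82 (~1.8)

# Knowledge Worker v2 without PowerPoint, same CPU doubling:
print(round(scale_factor(180, 300), 2))  # 1.67
```

Both factors come in under 2, consistent with the sub-linear scaling discussed next.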
This scaling factor is always smaller than 2 on a system where there is no limitation other than CPU. It is also a function of the initial number of CPUs involved, and it decreases as that number increases (the scaling factor for going from 1 to 2 CPUs is larger than the one for going from 2 to 4 CPUs). Typically, the scaling factor for Remote Desktop Session Host servers falls in the 1.5 to 1.9 range. Although the same hardware was used, different scenarios yielded different scaling factors: the normal script version yielded a scale factor of ~1.8, and the version without PowerPoint yielded a factor of ~1.67. The reason is that the scenario that included PowerPoint had more variation in CPU usage, and the system with more CPU capacity softened the impact of local usage peaks that can overwhelm a less powerful system. Let's take a closer look at the CPU usage profile for the test scenarios to understand how variance and fluctuation in server load affect server capacity on a CPU-limited system.
Figure 3 - CPU usage for Knowledge Worker without PowerPoint

The CPU curve in Figure 3 shows a general increase in CPU usage (green curve) as the number of active users increases (blue curve). Looking at the CPU curve closely, we can see that every time there is an increase in users, the CPU curve hits a peak. This peak is followed by a decline as the number of users holds constant for a while. This pattern repeats throughout the test while the overall CPU usage keeps rising. Each CPU peak results from the logon activity of the users who are logging on to the server at that time. Users log on in groups of 10. Each group of users logs on within 5 minutes before the test enters a steady state for another 5 minutes. Because the users log on so close together, the CPU spike caused by each user logon overlaps with those of the users preceding and following it, resulting in one large CPU peak per group of 10 users.

The size of this logon peak affects the server capacity measurement. On a CPU-limited system, server capacity is reached when CPU usage approaches saturation (100% usage). The slope of the CPU curve is determined by the steady-state load on the system as the number of users increases (the CPU usage minus the logon peaks, depicted by the orange curve in Figure 3). If there were no logon-related CPU activity, the server would reach capacity when this curve hit 100%. In reality, the CPU hits 100% sooner, because the logon peaks touch 100% first (marked as "100% CPU Peak" in Figure 3). The bigger the peaks, the sooner the CPU curve touches 100%.

The size of the CPU logon peak depends on the total processing power of the server. On a 4-core computer, the logon peak will be larger than on an 8-core computer, because the 8-core computer has more processing power to absorb its impact. This means that a scenario will be able to reach further along the
steady state CPU curve (the orange curve) on computers with more processing power.
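The relationship between the steady-state slope and the logon burst can be sketched as a toy model: capacity is reached when the steady-state load plus the transient logon peak first touches 100% CPU. All numbers below are illustrative assumptions, not measurements from the paper.

```python
def users_at_saturation(per_user_cpu: float, logon_peak_cpu: float) -> int:
    """Estimate the user count at which a CPU-limited RD Session Host server
    saturates. Capacity is reached when steady-state load plus the transient
    logon burst first touches 100% CPU.
    per_user_cpu   -- steady-state CPU % consumed per active user (assumed constant)
    logon_peak_cpu -- extra CPU % consumed while a group of users logs on
    """
    return int((100.0 - logon_peak_cpu) / per_user_cpu)

# Illustrative numbers only: 0.5% CPU per user steady state, 20% logon burst.
print(users_at_saturation(0.5, 20.0))  # 160 users
print(users_at_saturation(0.5, 0.0))   # 200 users if there were no logon burst
```

This mirrors the argument in the text: a larger logon burst (or higher workload variance) subtracts headroom from the steady-state curve, so the server saturates at fewer users.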
Figure 4 - Knowledge Worker CPU usage

The other thing to consider when looking at the CPU usage pattern is the variance of the workload in the scenario. In terms of CPU usage, the variance of the workload is low when all parts of the scenario are equally CPU intensive. If the variance is low, the CPU usage pattern will be very uniform, as in Figure 3. If the variance is high, the CPU usage pattern will be non-uniform, and this can affect server capacity. The variance of the Knowledge Worker scenario with PowerPoint is higher than that of the Knowledge Worker scenario without PowerPoint. This is because the PowerPoint part of the scenario is much more CPU-intensive than the other parts of the scenario. This means that if several users happen to start working in PowerPoint, CPU usage jumps up across the system. When this phase coincides with a user logon peak, the resulting CPU peak becomes much higher than usual. Figure 4 shows the CPU usage profile of the Knowledge Worker scenario. The peaks where logon activity overlaps with a high number of users working in PowerPoint are marked in Figure 4 as "High CPU Peak." It is not easy to predict when these high peaks will occur during the test beyond a few groups of users, because it becomes increasingly difficult to calculate what all the users are doing at a given time. Because of these very high peaks, CPU usage hits 100% even sooner. This means that a scenario with low CPU variance will scale better than one with high CPU variance. A computer with more processing power is also able to mitigate the impact of CPU variance and the high peaks, and thus scales better.

Memory

Determining the amount of memory necessary for a particular use of an RD Session Host server is complex. It is possible to measure how much memory an application has committed: the memory the operating system has guaranteed the application that it can access.
But the application will not necessarily use all that memory, and it certainly is not using all of it at any one time. The subset of pages that an application has accessed recently is referred to as the working set of that process. Because the operating system can page the memory outside a process's working set to disk without a performance penalty to the application, the working set is a much better measure of the amount of memory needed. The Process performance object's Working Set counter, used on the _Total instance to measure all processes in the system, measures how many bytes have been recently accessed by threads in the process. However, if the free memory in the computer is sufficient, pages are left in the working set of a process
even if they are not in use. If free memory falls below a threshold, unused pages are trimmed from working sets. The method used in these tests for determining memory requirements cannot be as simple as observing a performance counter; it must account for the dynamic behavior of a memory-limited system. The most accurate method of calculating the amount of memory required per user is to analyze the results of several performance counters (Memory\Pages Input/sec, Memory\Pages Output/sec, Memory\Available Bytes, and Process\Working Set(_Total)) in a memory-constrained scenario. When a system has abundant physical RAM, the working set will initially grow at a high rate, and pages will be left in the working set of a process even if they are not in use. Eventually, when the total working set tends to exhaust the amount of physical memory, the operating system is forced to trim the unused portions of the working sets until enough pages are made available to relieve the memory pressure. This trimming of unused portions of the working sets occurs when the applications collectively need more physical memory than is available, a situation that requires the system to page constantly to maintain all the processes' working sets. In operating systems theory, this constant-paging state is referred to as thrashing. Figure 5 shows the values of several relevant counters from a Knowledge Worker test performed on a server with 8 GB of RAM installed.
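Before looking at the figure, the counter-based analysis can be sketched as a simple classifier over samples of Memory\Available Bytes and Memory\Pages Output/sec. The thresholds below are illustrative assumptions, not values from the paper; in a real analysis they should be read off the counter traces for the system under test.

```python
def memory_zone(available_mb: float, pages_output_per_sec: float) -> int:
    """Classify a counter sample into the three memory-pressure stages:
    Zone 1: abundant memory, essentially no page output.
    Zone 2: periodic working-set trimming under moderate pressure.
    Zone 3: thrashing; paging dominates and response times degrade.
    Thresholds are hypothetical and workload-dependent."""
    if available_mb > 1024 and pages_output_per_sec < 50:
        return 1
    if pages_output_per_sec < 1500:
        return 2
    return 3

# Illustrative samples (available MB, pages output/sec):
print(memory_zone(4096, 10))   # 1 - abundant memory
print(memory_zone(600, 400))   # 2 - trimming stage
print(memory_zone(200, 2500))  # 3 - thrashing
```

In practice the zone boundaries show up as the points where the page-output curve first spikes (Zone 1 to 2) and where it begins to track the page-input curve (Zone 2 to 3).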
Figure 5 - Stages of memory usage

Zone 1 represents the abundant memory stage. This is when physical memory is greater than the total amount of memory that applications need. In this zone, the operating system does not page anything to disk, even seldom-used pages. Zone 2 represents the stage when unused portions of the working sets are trimmed. In this stage, the operating system periodically trims the unused pages from the processes' working sets whenever the amount of available memory drops to a critical value. Each time the unused portions are trimmed, the total working set value decreases, increasing the amount of available memory, which results in a significant number of pages being written to page files. As more processes are created, more memory is needed to accommodate their working sets, and the number of unused pages that can be collected during the trimming process decreases. The page-input rate is mostly driven by pages required when creating new processes, and its average is typically below the page-output rate. This state is acceptable as long as the system has a suitable disk storage system. The
applications should respond well because, in general, only unused pages are being paged to disk. Zone 3 represents the high-pressure zone. The working sets are trimmed to a minimal value and mostly contain pages that are frequented by the greater number of users. Page faults will likely cause the ejection of a page that will need to be referenced in the future, thus increasing the frequency of page faults. The page-output rate increases significantly, and the page-output curve follows the shape of the page-input curve to some degree. The system controls the degradation well, almost linearly, but the paging activity increases to a level where the response times are not acceptable. In Figure 5, it seems as though the amount of physical memory is greater than 8 GB, because the operating system does not start to trim working sets until the total required is well above 14 GB. This is due to cross-process code sharing, which makes it appear as if more memory is used by working sets than is actually available. To determine the amount of memory needed per user, we have to look at the three zones again. Zone 1 is a clearly acceptable working stage for the system, while Zone 3 is clearly unacceptable. Zone 2 needs more careful consideration. The average total paging activity (pages input and pages output) steadily rises during this stage. In the example above, the paging activity increases from around 50 pages per second to over 1,500 pages per second. This translates into ever-increasing disk access activity. During this stage, how responsive a system will be is determined by the throughput of the disk storage system. If, for example, the system is using only a local disk with low throughput for its storage, its responsiveness will be unacceptable anywhere in Zone 2.
On the other hand, if the disk storage system is capable of handling this level of disk activity, the system will be responsive throughout Zone 2. Even with a responsive disk storage system, it is generally good to be conservative about choosing the spot in Zone 2 where you think the system will still be responsive. A good rule of thumb is to choose the point where the operating system does the second large trimming of the working set (this is the point of the second large spike on the page-output curve, marked as 'optimal point' in Figure 5). The user response times should also be checked to verify that they are acceptable at this point. The amount of memory required per user can be estimated by dividing the total amount of memory in the system by the number of users at the optimal point in Zone 2. Such an estimate does not account for the memory overhead required to support the operating system. A more precise measurement can be obtained by running this test for two different memory configurations (for example, 4 GB and 8 GB), determining the number of users at the optimal point in Zone 2 for each, and dividing the difference in memory size (8 GB minus 4 GB in this case) by the difference in the number of users. In practice, the amount of memory required for the operating system can
be estimated as the memory consumed before the test starts. In the above example, the optimal point in Zone 2 is where the system has 110 active users logged on. The total memory available at the start of the test was 7,500 MB (the remainder having been consumed by the operating system). These numbers mean that each user requires approximately 68 MB of memory. Although a reasonable amount of paging is acceptable, paging naturally consumes a small amount of CPU and other resources. Because the maximum number of users that could be loaded onto a system was determined on systems with abundant physical RAM, a minimal amount of paging occurred. The working set calculations assume that a reasonable amount of paging has occurred to trim the unused portions of the working set, but this would only occur on a system that was memory-constrained. If you take the base memory requirement and add the number of users multiplied by the required working set, you end up with a system that is naturally memory-constrained, and therefore acceptable paging will occur. On such a system, expect a slight decrease in performance due to the overhead of paging. This decrease in performance can reduce the number of users who can be actively working on the system before the response time degrades above the acceptable level.
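Both per-user memory estimates described above reduce to simple arithmetic. The sketch below uses the numbers from the text (7,500 MB available at test start, 110 users at the optimal point) and the 8 GB/4 GB comparison; the function names are illustrative.

```python
def mb_per_user_single(available_mb_at_start: float, users_at_optimal: int) -> float:
    """Single-configuration estimate: memory left after the operating system
    loads, divided by the user count at the optimal point in Zone 2."""
    return available_mb_at_start / users_at_optimal

def mb_per_user_differential(mem_a_mb: float, users_a: int,
                             mem_b_mb: float, users_b: int) -> float:
    """Two-configuration estimate: the difference in RAM divided by the
    difference in users cancels out the fixed operating-system overhead."""
    return (mem_a_mb - mem_b_mb) / (users_a - users_b)

# Numbers from the text: 7,500 MB free at test start, 110 users at the optimal point.
print(round(mb_per_user_single(7500, 110)))                  # ~68 MB per user
# Differential method with 8 GB -> 120 users and 4 GB -> 60 users:
print(round(mb_per_user_differential(8192, 120, 4096, 60)))  # ~68 MB per user
```

Both methods converge on roughly 68 MB per user here, which is what makes the differential method a useful cross-check: it removes the operating-system overhead without having to measure it.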
Comparison of different memory configurations

Model   Server Configuration                                                      Knowledge Worker
DL585   4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 8 GB Memory    120 users
DL585   4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 4 GB Memory    60 users
Table 4 - Server capacity by memory configuration

Table 4 shows the comparison of server capacity between different memory configurations. On systems where physical memory is the limiting factor, the number of supported users increases linearly with the amount of physical memory.

Disk storage

Storage access is a very significant factor in determining server capacity and needs to be considered carefully. Although the Knowledge Worker scenarios are not very demanding in terms of storage performance (they average about 0.5 disk operations per second per user), they still provide a good high-level view of what the concerns are in this space. In general, these are the storage areas most likely to face high input/output loads:
1. The storage for user profiles will likely have to handle most of the input/output activity related to file access, because it holds user data, temporary file folders, application data, and so on. Some of this may be alleviated if folder redirection is configured to re-route some of the traffic to network shares.

2. The storage holding system binaries and applications will service I/Os during process creation and application launch, and page faults to executable files under higher memory pressure. This is generally not much of a problem if the binaries (especially DLLs) are not rebased during load, because their code pages are shared across processes (and across session boundaries).

3. The storage holding page files will be solicited only if the system is running low on memory, but it may face significant input/output load even under relatively moderate memory pressure due to the large amount of RAM involved. You can expect that initial trimming passes will reclaim as much as 25% of the overall RAM size, which on a 16-GB system is 4 GB, a very large amount of data that needs to be transferred to disk in a relatively short period of time.

Due to the potentially high level of input/output involved in paging operations, we recommend isolating the page file on its own storage device(s) to avoid interference with the normal file operations generated by the workload. We also recommend tracking DLL base address collision/relocation problems to avoid both unnecessary input/output traffic and memory usage.

Network

By default, the data sent over Remote Desktop connections is compressed for all connections, which reduces the network usage for Remote Desktop scenarios. Network usage for two scenarios is shown in Figure 6. This includes all traffic coming in and going out of the RD Session Host server for these scenarios.
Figure 6 - Network usage by scenario

It is apparent from this figure that the total network traffic on the server (inbound and outbound) can vary considerably depending on the scenario. The Knowledge Worker scenario uses richer graphics than the other scenarios, especially because of the PowerPoint slide show that is part of the scenario. As can be expected, this results in higher network usage. Figure 7 shows network usage in bytes per user for the Knowledge Worker scenario. This is taken from the Bytes Total/sec counter of the Network Interface performance object. The graph illustrates how the bytes-per-user average was calculated: it converges on a single number when a sufficient number of simulated users are running through their scripts. The number of user sessions is plotted on the primary axis. The count includes both bytes received and sent by the RD Session Host server over any network protocol.
Figure 7 - Knowledge Worker scenario network usage per user

The network utilization numbers in these tests reflect only RDP traffic and a small amount of traffic from the domain controller, Microsoft Exchange Server, the IIS server, and the test controller. In these tests, the RD Session Host server's local storage drives are used to store all user data and profiles; no network home directories were used. In a normal RD Session Host server environment, there will be more traffic on the network, especially if user profiles are not stored locally.
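The per-user figure plotted in Figure 7 is just the counter value divided by the session count at each sample. A minimal sketch, with hypothetical sample values chosen to show the convergence the text describes:

```python
def bytes_per_user(bytes_total_per_sec_samples, user_counts):
    """Per-user network usage from the Network Interface\\Bytes Total/sec
    counter: divide each throughput sample by the concurrent session count.
    The series converges once enough simulated users are active."""
    return [b / u for b, u in zip(bytes_total_per_sec_samples, user_counts)]

# Hypothetical samples: throughput rises roughly linearly with sessions,
# so the per-user figure settles near a constant.
samples = [52_000, 250_000, 505_000, 1_010_000]
users   = [10, 50, 100, 200]
print(bytes_per_user(samples, users))  # [5200.0, 5000.0, 5050.0, 5050.0]
```

With few sessions the ratio is noisy; as the session count grows, the per-user average stabilizes, which is why the capacity tests only report it once a sufficient number of users are running.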
Model   Color Depth   Capacity
DL585   16 bpp        150 users
DL585   32 bpp        140 users
Table 5 - Server capacity by desktop color depth for Knowledge Worker scenario

Choosing 32-bit color depth for Remote Desktop Connection sessions instead of 16-bit results in a slight increase in CPU usage. For the Knowledge Worker scenario, this reduces server capacity from 150 users to 140 users. There is also an increase in network bandwidth usage (8% in this case). The size of the impact depends on the scenario as well: a graphics-rich scenario will show a greater impact from choosing 32-bit color depth because there is more graphics data to process and send over the network.

Windows printer redirection (XPS)

Windows printer redirection enables the redirection of a printer installed on the client computer to the RD Session Host server session. Through this feature, print commands issued to server applications get redirected to the client printer and the
actual printing happens on the client side. To assess the effect of enabling printer redirection on RD Session Host server scalability, the Knowledge Worker script was run in a configuration where an HP LaserJet 6P printer was installed on the NULL port on each client computer, and the clients were configured to redirect to the local printer when connecting to the server. The script prints twice during the 30-minute work cycle: the first print job is a 19-KB Word document, and the second is a 16-KB Excel spreadsheet. Test results show that network bandwidth usage is not significantly affected by printer redirection, and the impact on other key system parameters (memory usage, CPU usage) is negligible. There is no impact on server capacity in the Knowledge Worker scenario.

Compression algorithm for RDP data

It is possible to specify which Remote Desktop Protocol (RDP) compression algorithm to use for Remote Desktop Services connections by applying the Group Policy setting "Set compression algorithm for RDP data." By default, servers use an RDP compression algorithm that is based on the server's hardware configuration. For the server computers used in this testing, this algorithm is "Optimize to use less memory." Testing was performed by using the default compression policy as well as by setting the policy to "Optimize to use less network bandwidth." This option uses less network bandwidth but is more memory-intensive. The test results show that setting the compression policy to "Optimize to use less network bandwidth" has no impact on server capacity. The impact on memory usage is negligible, and there is an overall reduction in bandwidth usage. Additionally, the server is slightly more responsive in this case after capacity is reached, compared to the default compression policy.
Desktop Experience pack

The Desktop Experience feature enables you to install a variety of Windows 7 features on your server (such as Desktop Themes, Windows SideShow, and Windows Defender). For the purposes of this test, the Desktop Composition feature was installed on the server, which enables the Themes service and applies the Aero theme for all users. Two different tests were performed with the Desktop Experience pack installed. In the first test, Desktop Composition remoting was disabled from the client side. In the second test, Desktop Composition remoting was enabled. The results are displayed in Table 6.

Server Configuration (all rows): 4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 64 GB Memory

Desktop Experience   Desktop Composition remoting   Capacity
Not installed        N/A                            140 users
Installed            Disabled                       140 users
Installed            Enabled                        120 users
Table 6 - Server capacity at 32 bpp color depth for Knowledge Worker scenario

With the Desktop Experience pack installed and Desktop Composition remoting disabled, the server capacity remains unchanged. There is around a 5% increase in memory usage, which can result in reduced server capacity on memory-limited systems. When Desktop Composition remoting is enabled, the server capacity drops from 140 users to 120 users due to an increase in CPU usage. There is around a 68% increase in network bandwidth usage and a 5% increase in memory usage. When Desktop Composition remoting is enabled, there is also a significant increase in CPU and memory usage on the client side: a client computer running 12 instances of the Remote Desktop Connection client (mstsc.exe) showed a 100% increase in memory usage and a 70% increase in CPU usage.
RemoteApp programs
Remote Desktop Web Access enables users to access RemoteApp programs. RemoteApp programs are applications that are accessed remotely through Remote Desktop Services and appear as if they are running on the end user's local computer. A RemoteApp programs scenario was created to compare server capacity when using RemoteApp programs to the Remote Desktop scenario. The RemoteApp programs scenario is mostly the same as the Knowledge Worker scenario; the difference is in the way the connection is made to the server and how the applications are launched. The comparison between Remote Desktop and RemoteApp programs is shown in Table 7.

Server Configuration (all rows): 4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 64 GB Memory

Model   Scenario                                Capacity
DL585   Knowledge Worker (Remote Desktop)       150 users
DL585   Knowledge Worker (RemoteApp programs)   135 users
Table 7 - Server capacity comparison of RemoteApp programs and Remote Desktop Test results show higher CPU usage in the RemoteApp programs scenario, which results in 10% fewer supported users compared to the Remote Desktop scenario. There is no significant difference in other key system parameters (memory usage, network bandwidth).
Hyper-V
Hyper-V, the Microsoft hypervisor-based server virtualization technology, enables you to consolidate multiple server roles as separate virtual machines (VMs) running on a single physical computer, and also to run multiple different operating systems in parallel on a single server. Hyper-V tests were performed for this white paper to compare server capacity between an RD Session Host server running natively and an RD Session Host server hosted as a virtual machine under Hyper-V. For these tests, Windows Server 2008 R2 was installed as the Hyper-V host server. The test server used for this evaluation had a single quad-core AMD CPU that supports Rapid Virtualization Indexing (RVI). This feature provides hardware acceleration for virtualization memory management tasks and is leveraged by the Second Level Address Translation (SLAT) feature available in Hyper-V in Windows Server 2008 R2. Inside the virtual machine, Windows Server 2008 R2 was also installed, with the RD Session Host role service enabled. The VM was the only VM configured on that host, with 30 GB of the overall 32 GB of available RAM allocated to it. In addition, it was configured with the maximum of 4 virtual processors so that it could utilize all 4 available CPU cores. The Remote Desktop clients connected to the VM for these tests. Two Hyper-V tests were performed: one with the default configuration, which utilizes the hardware acceleration provided by RVI (a new feature for Hyper-V in Windows Server 2008 R2), and one that simulated a processor with no hardware assist by disabling the hardware assist support. The results are shown in Table 8.
Server Configuration (all rows): AMD Opteron Quad-core CPU, 2.7 GHz, 512 KB L2 Cache, 30 GB Memory

Scenario   SLAT       Capacity
Native     N/A        180 users
Hyper-V    Enabled    150 users
Hyper-V    Disabled   70 users
Table 8 - Server capacity for Knowledge Worker v2 scenario without PowerPoint

On SLAT-capable hardware, the Hyper-V scenario supports 17% fewer users than running natively without Hyper-V. When SLAT is disabled, the server capacity is reduced by 53% compared to the SLAT-enabled scenario. Clearly, SLAT makes a very significant difference when running the RD Session Host role service under Hyper-V. Processors that support this feature (Rapid Virtualization Indexing, RVI, for AMD processors and Extended Page Tables, EPT, for Intel processors) are strongly recommended.
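The percentage figures quoted above follow directly from the capacities in Table 8:

```python
def pct_reduction(baseline: int, value: int) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline

# Table 8: native 180 users, Hyper-V with SLAT 150, Hyper-V without SLAT 70.
print(round(pct_reduction(180, 150)))  # 17  (Hyper-V + SLAT vs. native)
print(round(pct_reduction(150, 70)))   # 53  (SLAT disabled vs. SLAT enabled)
```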
Server Configuration (both rows): 4 x AMD Opteron Dual-core CPUs, 2.4 GHz, 1024 KB L2 Cache, 64 GB Memory

Model   Operating System         Capacity
DL585   Windows Server 2008
DL585   Windows Server 2008 R2   150 users
Table 9 - Server capacity by operating system for Knowledge Worker scenario

Table 9 shows the server capacity comparison between Windows Server 2008 and Windows Server 2008 R2 for the Knowledge Worker scenario. The memory usage on both operating systems is very similar. Windows Server 2008 R2 uses slightly more CPU than Windows Server 2008, resulting in slightly reduced server capacity.
Conclusions
Capacity planning for Remote Desktop deployments is subject to many variables, and there are no good off-the-shelf answers. Depending on usage scenario and hardware configuration, the variance in capacity can reach up to two orders of magnitude. If you need a relatively accurate estimate, deploying a pilot or running a load simulation is quite likely the only reliable way to get one. A Remote Desktop Session Host server can provide good consolidation for certain scenarios if care is taken when configuring the hardware and software. Supporting 200 users on a dual-socket 2U form factor server is completely viable for some of the medium to lighter-weight scenarios. When configuring an RD Session Host server, give special attention to the following:

- Provide more CPU cores, not only to increase overall server capacity but also to allow the server to better absorb temporary peaks in CPU load, such as logon bursts or variation in load.
- Provide the server with at least 8 GB of RAM, typically 16 GB.
- Remember that enabling Desktop Composition will have a significant impact on resource usage and will affect server capacity negatively.
- When running RD Session Host servers in a virtualized environment, make sure the processor supports paging at the hardware level (RVI for AMD, EPT for Intel).
- Use WSRM in deployments where there are wide swings in CPU usage.
- Properly size the server's input/output throughput capacity.
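The sizing guidance above can be combined into a rough first-pass estimate: the supported user count is the smaller of the memory-bound and CPU-bound figures. This is a sketch only; every input is workload-dependent and should come from a pilot or load simulation, as recommended above.

```python
def first_pass_capacity(ram_gb: float, os_overhead_gb: float,
                        mb_per_user: float, cpu_bound_users: int) -> int:
    """First-pass RD Session Host sizing: the server supports the smaller of
    the memory-bound estimate (RAM left after OS overhead, divided by the
    per-user working set) and a CPU-bound estimate measured separately.
    All inputs are assumptions to be replaced with measured values."""
    mem_bound = int((ram_gb - os_overhead_gb) * 1024 / mb_per_user)
    return min(mem_bound, cpu_bound_users)

# Illustrative: 68 MB/user (from the memory tests), 0.7 GB OS overhead,
# and a hypothetical CPU-bound limit of 200 users.
print(first_pass_capacity(16, 0.7, 68, 200))  # 200 - CPU-limited at 16 GB
print(first_pass_capacity(8, 0.7, 68, 200))   # 109 - memory-limited at 8 GB
```

The example illustrates the crossover the paper describes: adding RAM helps only until the CPU becomes the bottleneck, and vice versa.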
Testing methodology
All the tests described here were executed at Microsoft and the results were evaluated. The tests used a set of tools developed specifically for RemoteFX on RD Session Host capacity planning. Response times for various actions across the scenarios were used to assess the acceptable level of load under each configuration.
Our first set of tests compared a Remote Desktop server and test users running Windows 7 with SP1 with RemoteFX disabled and then enabled. We measured the CPU utilization as well as bandwidth consumption for session users on the RD Session Host server with and without RemoteFX enabled.
Result summary
Resource Utilization:
Initial capacity tests for the multimedia scenarios on a server with RemoteFX enabled and with high network bandwidth demonstrate that enabling RemoteFX decreases bandwidth utilization in multimedia scenarios. Enabling RemoteFX provides a better multimedia experience overall. In this scenario, CPU utilization increases slightly, but the amount of bandwidth consumed in multimedia scenarios decreases. The exact values depend on the kind of workload being executed. In knowledge worker scenarios, enabling RemoteFX on an RD Session Host server results in slightly greater CPU and network bandwidth consumption than disabling RemoteFX.
The SwitchDesktop tool runs on the test client computers. It runs inside each new desktop that is created on the client. Its only function is to provide a way to switch back to the default desktop where the RDLoadSimulationClient is running.
Start (Word) - Start and exit Word
Start (Microsoft Excel) - Start and exit Excel
loop (forever)
  Start (Word) - Type a page of text and print
    Open a Word document
    Type a page of text
    Modify and format text
    Check spelling
    Print
    Save
    Exit Word
  Start (Microsoft Excel) - Load Excel spreadsheet, modify, and print it
    Load Excel spreadsheet
    Modify data and format
    Print
    Save
    Exit Excel
  Start (PowerPoint) - Load presentation and run slide show
    Load a PowerPoint presentation
    Navigate
    Add a new slide
    Format text
    Run slide show
    Save file
    Exit PowerPoint
  Switch To Process (Outlook) - Send e-mail, read message, and respond
    Send e-mail to other users
    Read e-mail and respond
    Minimize Outlook
  Start (Internet Explorer) - Browse web pages
    Loop (2)
      URL http://tsexchange/tsperf/WindowsServer.htm
      URL http://tsexchange/tsperf/Office.htm
      URL http://tsexchange/tsperf/MSNMoney.htm
    End of loop
    Exit Internet Explorer
End of loop
Knowledge Worker v1
Typing Speed = 35 words per minute

Definition: a worker who gathers, adds value to, and communicates information in a decision support process. Cost of downtime is variable but highly visible. Projects and ad-hoc needs towards flexible tasks drive these resources. These workers make their own decisions on what to work on and how to accomplish the task. The usual tasks they perform are marketing, project management, sales, desktop publishing, decision support, data mining, financial analysis, executive and supervisory management, design, and authoring.

Connect User smcxxx
Start (Microsoft Excel) - Load massive Excel spreadsheet and print it
  Open File c:\documents and settings\smcxxx\Carolinas Workbook.xls
  Print
  Close document
  Minimize Excel
Start (Outlook) - Send a new, short e-mail message (e-mail2)
  Minimize Outlook
Start (Internet Explorer)
  URL http://tsexchange/tsperf/Functions_JScript.asp
  Minimize Internet Explorer
Start (Word) - Type a page of text (Document2)
  Save
  Print
  Close document
  Minimize Word
Switch To (Excel)
  Create a spreadsheet of sales vs months (spreadsheet)
  Create graph (graph)
  Save
  Close document
  Minimize Excel
Switch To Process (Outlook) - Read e-mail message and respond (Reply2)
  Minimize Outlook
Now, toggle between apps in a loop:
loop (forever)
  Switch To Process (Excel)
    Open File c:\documents and settings\smcxxx\Carolinas Workbook.xls
    Print
    Close document
    Minimize Excel
  Switch To Process (Outlook)
    E-Mail Message (e-mail2)
    Minimize Outlook
  Switch To Process (Internet Explorer)
    Loop (2)
      URL http://tsexchange/tsperf/Functions_JScript.asp
      URL http://tsexchange/tsperf/Conditional_VBScript.asp
      URL http://tsexchange/tsperf/Conditional_JScript.asp
      URL http://tsexchange/tsperf/Arrays_VBScript.asp
      URL http://tsexchange/tsperf/Arrays_JScript.asp
    End of loop
    Minimize Internet Explorer
  Switch To Process (Word) - Type a page of text (Document2)
    Save
    Print
    Close document
    Minimize Word
  Switch To Process (Excel)
    Create a spreadsheet of sales vs months (spreadsheet)
    Create graph (graph)
    Save
    Close document
    Minimize Excel
  Switch To Process (Outlook) - Read message and respond (reply2)
End of loop
Log off
Networking left at default with typical network settings
Server joined as a member to a Windows Server 2008 domain
Page file initial and maximum size set to 56 GB
System and user profiles data resides on a single logical RAID 5 drive
Page files reside on a single logical RAID 5 drive that is separate from the one used for system and user profiles data
Disable all redirections (drive, Windows printer, Clipboard, LPT, COM, audio and video playback, audio recording, Plug and Play devices)
Color depth is set to 16-bit for Remote Desktop Services connections
Office 2007 installed, enabling the following features from Office customization:
Microsoft Office Excel
Microsoft Office Outlook
Microsoft Office PowerPoint
Microsoft Office Word
Office Shared Features
Office Tools
AutoSave of messages disabled
Automatic name checking disabled
Do Not Display New Mail Alert for users enabled
Suggest names while completing To, Cc, and Bcc fields disabled
Return e-mail alias if it exactly matches the provided e-mail address when searching OAB enabled
AutoArchive disabled
Background grammar-checking disabled
Check Grammar With Spelling disabled
Background saves disabled
Save AutoRecover information disabled
Always show full menus enabled
Microsoft Office Online disabled
Customer Experience Improvement Program disabled
Automatically receive small updates to improve reliability disabled
Word Settings
Printer settings HP Color LaserJet 9500 PCL 6 created to print to NUL port
User profiles
Configuration script executed to pre-create cached profiles, copy template files for applications, configure e-mail accounts, and set home page on Internet Explorer
Roaming profiles used for all users
Performance logger
Performance counters are logged on to the RD Session Host server itself
Disable screen saver for all users through Group Policy
Disable Windows Firewall
Enable Remote Desktop connections
Set power settings to High Performance
Delete all Office and XPS printers installed at setup
General settings
Appendix E: Test Scenario Definitions and Flow Chart for Testing RemoteFX on RD Session Host server
Test description:
An RD Session Host server with RemoteFX is set up and deployed. The tests were run using the following sequence:
1. Log on 60 Remote Desktop users to the RemoteFX-enabled RD Session Host server. The users are logged on 30 seconds apart.
2. The users log on and open these applications: Excel, Outlook, PowerPoint, Internet Explorer, and Word.
3. Once the applications are open, the users go into a continuous loop cycling through them (writing e-mail, Word documents, and Excel documents, creating PowerPoint presentations, running slide shows, and browsing web pages). A user takes 32 minutes to complete a full script cycle.
4. Once all users have logged on and opened their applications, a trace of the test run is taken.
Appendix F: Group Policy Settings for Testing RemoteFX on RD Session Host server
There is a Group Policy setting that an administrator can use to adjust performance or the user experience as desired.
Optimize visual experience when using RemoteFX: Screen Image Quality:
Image quality corresponds to the user experience received by the client. It can be set to high, medium, or low, with higher settings resulting in a better user experience. This policy setting can also be tuned for performance: the lowest settings result in slightly greater scalability on the server in some scenarios. Performance and scale also depend on the workload, with knowledge worker scenarios resulting in greater scalability on the server.