
Windows Enterprise Client Boot and Logon
Optimization – Part 1, Introduction

This is the first in a series of blog posts I'll be writing that discuss the optimization of Windows
client boot and logon in a corporate setting. Certainly, a lot of what I'll discuss can be applied to
the home PC while other parts of it won't matter. Other blog writers have discussed tools and
techniques that are close to this topic but I hope to bring some simplicity and a new way of
thinking to an area that challenges a lot of IT Pros.
This content started with an idea I had several years ago. As a Premier Field Engineer for Microsoft,
I was doing a lot of desktop assessments for customers where we were examining the boot and
logon time of Windows XP clients and helping them find ways to speed things up. As customers
moved away from Windows XP, I found that while helpful, these assessments were also a little
annoying.
Think about it. You've spent time designing a new corporate image that includes the software you
want, it's been configured with the Group Policy you want and you've deployed the Anti-virus
software you want. Then I come in the door and tell you it's wrong and you need to change it
(politely of course).
As I was having these experiences with customers, I started to think about how we could turn it
around. Could we use the same tools I use for analysis and build-in optimization during the design
phase? Heck yeah we could.
So, along with a couple of my colleagues (Roger Southgate and Scott Duffey), we built a customer
facing workshop. This workshop (Windows Enterprise Boot and Logon Optimization) is available
globally to all Microsoft Premier Customers and is delivered by Premier Field Engineering. And the
deliveries we've had have been really successful.
But now I want to share these ideas with a larger audience. It's time for this blog post series.
The Sales Pitch

So first up, why do we care about the boot and logon performance of user devices in the
Enterprise?
What we've found is that client performance is often overlooked. It's not built into the design,
there's no baseline for acceptable performance and this leads to a loss of productivity.

Users make assumptions about IT based upon their user experience and often, issues go unnoticed
when employees head off for a morning coffee while their PC boots, rather than calling helpdesk.
The objective of these blog posts will be to help you prioritize the user experience, design for
performance, establish a baseline, create a repeatable process to measure the impact of change
and to understand potential causes for poor user experience.
Eventually we'll get to some deep analysis but the intent is to start slow and build a solid process
that will help you minimize the time you need to spend isolating production issues.

With Windows 10 just around the corner, it's also a good time to have this discussion.
Windows Enterprise Client Boot and Logon
Optimization – Part 2, Tools and
Instrumentation
The process that I’m going to describe in the following posts will require you to benchmark the
user experience as you design the client image. In order to do this, you’ll need some way to time
the system as it boots and as the user logs in. What you may not realise is that the instrumentation
is already built into Windows.

Event Tracing for Windows (ETW)


There’s a great deal of information out there regarding ETW and a lot of it is extremely deep. As I
said during the introduction, I want to keep this pretty simple. The goal is to empower you to
achieve an outcome.
The short story is this -

 ETW exists as built-in instrumentation for logging and debugging
 Some of the instrumentation runs all the time and feeds the Event Log, Resource Monitor and other tools
 Some of the instrumentation can be turned on when needed (no reboot required)
 The instrumentation is non-intrusive and typically causes very little performance overhead
 The instrumentation is very powerful
o Able to sample down to microsecond resolution
o Able to provide call stack (code) information

A nice diagram that conveys ETW is shown here (thank you Matthew Reynolds for this
conceptualization)
A quick summary -

 ETW is the framework built into Windows that allows the instrumentation we’re interested in
 Providers create the events captured during a trace session
 Controllers allow you to start/stop logging and to choose the providers from which to
capture events
 Real-time Consumers can display information as it is generated
 ETL files can capture the event data for parsing/analysis by a Consumer after the trace is
complete

Tools
This series of posts will focus on tools available with the Windows Performance Toolkit (WPT). WPT
is included with the Windows Assessment and Deployment Kit (ADK).
You can download the ADK for Windows 10 here.
The ADK may be installed on any down-level version of Windows that is still in mainstream
support. You’ll want a client system set aside for analysis that has WPT installed as follows:

1. Download and run ADKSetup.exe from the ADK link provided above
2. Follow the installation wizard though until you reach the feature selection screen
3. Clear all check boxes except Windows Performance Toolkit and then click Install

Once completed with the default installation path, WPT will reside in C:\Program Files
(x86)\Windows Kits\10\Windows Performance Toolkit
For deep analysis, which I’ll discuss later, you’ll want to proceed through this installation. This
makes sure that additional binaries are in the right folders, components are registered and your
path environment variable has been modified.
The WPT installation folder also includes a Redistributables folder. This includes WPT Windows
Installer packages for each of three architectures -

 WPTarm-arm_en-us.msi
 WPTx64-x86_en-us.msi
 WPTx86-x86_en-us.msi

For trace capture of a test system, you can flat copy this folder across to avoid a registry/Programs
and Features installation footprint. I usually put my tools in C:\WPT.
Going back to the terminology discussed above, and referencing some of the .exe’s you’ll find in
the WPT installation folder, we have -

 WPR.exe – this is a command line capture tool (Controller)
o In these blog posts, I won’t spend any time on it
 WPRUI.exe – this is a GUI capture tool (Controller)
o In these blog posts, this will be the go-to capture tool
 XPerf.exe – this is a command line capture (Controller) and analysis tool (Consumer)
o I’ll discuss how it’s used during the benchmarking process in the next blog post
 WPA.exe – this is a GUI analysis tool (Consumer)
o We’ll use this extensively during deep analysis much later in the blog post series

That’s it. You’re ready to get benchmarking.

Windows Enterprise Client Boot and Logon
Optimization – Part 3, Trace Capture and
Benchmarking
In the previous post I discussed instrumentation and tools and we walked through the installation
of the Windows Performance Toolkit (WPT).
In this post, I want to discuss using WPT to capture boot and logon statistics and how that
information may be used to develop a process for benchmarking your Operating System build as
you design it.
Preparing for a Trace
Before we get into capturing a boot trace, I want to call out a few important pre-requisites.

1. Match the WPT architecture to the Operating System you want to benchmark (see
the previous post)
2. You must have local admin rights on the system to capture a trace
3. Consider the use of Autologon
The first two points are pretty clear but I want to expand on the use of Autologon -
During a boot trace, the instrumentation captures data throughout boot and logon. If a system is
left at the logon screen for a prolonged period of time, the trace will be longer and the timing will
be skewed. It’s certainly possible to extract information from the trace that tells you how long this
wait was but it’s an extra step and it’s not immediately accessible. In the interest of keeping things
simple and helping you create a robust process, I want to advocate the use of Autologon while the
system is being benchmarked.
N.B. The username and password are stored in clear text in the registry. It’s very important that you
remove the Autologon registry entries after you’ve completed benchmarking.

To enable Autologon, refer to knowledge base article - How to turn on automatic logon in
Windows XP (this article is still valid for supported versions of Windows).
Capturing a Timing Trace
Everything I’ve discussed to this point has been getting you ready to capture a timing trace. A
timing trace makes use of trace options that minimise performance overhead, log to a .ETL file and
establish the timing of boot phases.
To capture a timing trace, you begin by launching Windows Performance Recorder (WPRUI.exe).
This will be in your environment path if you completed a full installation of WPT. If not, just browse
to your WPT folder and launch it.
The options I’ll recommend for a timing trace are shown here -
With these options selected, simply click Start. Windows Performance Recorder will ask you to
select a trace file destination and to provide an optional description for the trace -

Once you click Save, you will be prompted to reboot. Once a reboot and login (with or without
Autologon) completes, the trace will continue for 120 seconds.

After this time, the trace will be saved to the location specified earlier (see above).
Simple Trace Analysis
The first thing you’ll want to do after capturing a timing trace is to extract some useful information
from it. In line with keeping it simple, I’m going to show you how XPerf.exe may be used as a
Consumer to process the trace. Let’s start with -
xperf.exe -i <trace_file.etl> -a tracestats

This simple command provides some information about the trace and the system from where it
was captured. For example:
Here I can see that the trace was captured from a system with 4 processors running at 2660 MHz
and that the total trace time was a little over 3 minutes.

So let’s get down to it. How can I extract the timing information needed to benchmark the system?
Again, XPerf.exe is the answer -
xperf -i <trace_file.etl> -o Summary.xml -a boot

The “-o” in this command specifies an output file. The output of boot analysis is in an XML format
so I choose an output file with a .xml extension.

If I open the Summary.xml file with Internet Explorer (or another favoured XML viewer), there’s a
great deal of data displayed. Again, keeping it simple means that I want to collapse all of the
redundant information and just display data relevant to system boot and logon times. At its
simplest, my data looks like this -

So what does all of this mean?


Right at the top we see that timeFormat=”msec” so this tells us that all of the numbers related to
time are in milliseconds.

There are six intervals, five of which represent boot phases (PreSMSS, SMSSInit, WinlogonInit, ExplorerInit, PostExplorerPeriod); the sixth, TraceTail, represents the remaining time in the trace. Boot phases are periods of time in which specific boot/logon activity takes place. I’ll discuss these in a lot more detail in future blog posts.

bootDoneViaExplorer is the time since boot start at which the Windows Desktop (Explorer) launch
was complete (48.944 seconds in this example).
postBootRequiredIdleTime is the required amount of accumulated idle time (CPU and Disk 80%
idle) before the system is considered ready for use. This is always 10 seconds.
bootDoneViaPostBoot is the time at which 10 seconds of idle time has been accumulated. In
practical terms, subtract 10 seconds from this number and consider this the time at which the
system was ready for use (80.944 – 10 = 70.944 seconds in this example). This calculation should
also be performed against the PostExplorerPeriod boot phase. A value of “-1” here means that the
system did not reach an idle state by the time the trace completed.
For the purposes of benchmarking the system, I have everything I need. I can record these
numbers and observe how they fluctuate as changes are made to the system image.
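The arithmetic above is easy to script once you’re capturing traces regularly. Here’s a minimal Python sketch that pulls the headline numbers out of a Summary.xml; the element and attribute layout in the sample below is an assumption based on the fields we just discussed, so adjust the lookup to match what your version of xperf emits.

```python
import xml.etree.ElementTree as ET

def summarize_boot(xml_text):
    """Pull headline boot timings out of an xperf -a boot Summary.xml.

    Assumes a <timing> element carrying the attributes discussed above,
    with times in milliseconds (timeFormat="msec").
    """
    root = ET.fromstring(xml_text)
    timing = root.find(".//timing")
    done_explorer = float(timing.get("bootDoneViaExplorer"))
    done_post_boot = float(timing.get("bootDoneViaPostBoot"))
    required_idle = float(timing.get("postBootRequiredIdleTime"))
    ready = None
    if done_post_boot >= 0:  # -1 means the system never reached idle
        # Subtract the accumulated idle requirement to get "ready for use"
        ready = (done_post_boot - required_idle) / 1000.0
    return {
        "desktop_ready_sec": done_explorer / 1000.0,
        "system_ready_sec": ready,
    }

# Hypothetical file content using the numbers from the example above
sample = '''<results>
  <timing bootDoneViaExplorer="48944" bootDoneViaPostBoot="80944"
          postBootRequiredIdleTime="10000" timeFormat="msec"/>
</results>'''
print(summarize_boot(sample))
```

With the example numbers, this reports the desktop ready at 48.944 seconds and the system ready for use at 70.944 seconds, matching the manual calculation above.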
Benchmarking
The benchmarking approach I’ll recommend is just an iterative process. Start with a clean
installation of Windows on your chosen hardware and gather a timing trace. This provides you with
your best possible baseline for that system.
Add a component and benchmark it again. In this way you’ll gain insight into how much time that
component costs you.
The diagram above makes reference to optimizing ReadyBoot. For now, just be aware that you’ll need to do
this with systems using older rotational hard drives and not with systems using solid state drives (SSDs). I’ll
dedicate the entirety of my next blog post to this topic.
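If you record the “ready for use” time from each iteration, the cost of each component falls out as a simple difference between consecutive runs. A small Python sketch of that bookkeeping (the component labels and timings below are invented for illustration):

```python
def component_costs(measurements):
    """Given benchmark runs in install order as (label, ready-for-use seconds),
    return the time each added component cost over the previous run.
    Illustrative only; real numbers come from your own timing traces."""
    costs = []
    for (_prev_label, prev), (label, cur) in zip(measurements, measurements[1:]):
        costs.append((label, round(cur - prev, 3)))
    return costs

runs = [
    ("Clean Windows install", 22.1),   # baseline for this hardware
    ("+ Corporate apps", 27.9),
    ("+ Group Policy", 31.4),
    ("+ Antivirus", 44.0),
]
print(component_costs(runs))
```

In this invented example, antivirus is the most expensive addition at 12.6 seconds, which is exactly the kind of insight the iterative process is meant to surface.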

Another way to look at it is to benchmark each significant layer of the user experience. This
approach may look something like this -

While taking this layered approach, it’s important to apply components that have large overall
impact, last. For example, Antivirus software not only has impact on CPU and disk directly but
because it may be scanning binaries as they’re loaded by other processes, it can slow those other
components down. Install it after all of your other components.
Disk encryption software is another example of a component that may have large overall impact.
Conclusion

By following this iterative process, you’ll be able to understand the time each design decision costs
you.
Consider
Time (in minutes) x Average Reboots Per Day x Average Employee Cost Per Minute x #Employees

This number might lead you to re-evaluate the design decisions in your client image.
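As a back-of-the-envelope Python sketch of that formula (the working-days figure and the example inputs are assumptions for illustration, not data from these posts):

```python
def annual_boot_cost(minutes_per_boot, reboots_per_day, cost_per_minute,
                     employees, work_days=230):
    """Rough yearly productivity cost of boot/logon time, per the formula
    above. work_days is an assumption you should tune for your business."""
    daily = minutes_per_boot * reboots_per_day * cost_per_minute * employees
    return daily * work_days

# e.g. 3 minutes per boot, 1 reboot/day, $1 per employee-minute, 5000 staff
print(annual_boot_cost(3, 1, 1.0, 5000))
```

Even with conservative inputs the total lands in the millions per year, which is why shaving seconds off boot and logon is worth design-time effort.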
Windows Enterprise Client Boot and Logon
Optimization – Part 4, An Aside on ReadyBoot

During my last post, I discussed a benchmarking process you might use during the development
of your Windows client system image. To recap -

The piece that I didn’t discuss was Optimize ReadyBoot.


So What is ReadyBoot?
ReadyBoot has been around for a long while and was introduced to address Disk I/O bottlenecks.
The idea is that the ReadyBoot driver pre-fetches data from disk ahead of when it’s needed and
caches it in RAM. When it comes time for that data to be used, it’s available from RAM at low
latency.
In order to fetch the right data at the right time, ReadyBoot builds a boot plan based on past
experience. In other words, ReadyBoot pays attention to the data read by the system during past
boot sequences and uses this information to pre-fetch data on the next boot. ReadyBoot also
optimizes the location of this data on disk during scheduled disk defragmentation runs.
So Why Optimize ReadyBoot During the Benchmarking Process?

ReadyBoot takes time to optimize (a number of reboots and a defrag cycle). Installing Windows
straight off distribution media and gathering a timing trace won’t accurately represent the system
at its best performance.
Additionally, large changes to the system can mean the last boot plan is sub optimal. ReadyBoot
usually tolerates small system changes but for example, installing Antivirus software which loads
large virus definition files during boot will invalidate the boot plan.
What Happens When the Boot Plan is Suboptimal?

A suboptimal boot plan means that the right data is not pre-fetched at the right time and boot
slows down.
Terminology that ties into the diagram above –

 A Hit – A read request serviced by the ReadyBoot RAM cache
 A Pend – A read request for data not already in the ReadyBoot RAM cache but scheduled for prefetching within the next 500 I/Os
o The requesting process will be blocked until the data is pulled into the cache
 A Miss – A read request for data not already in the ReadyBoot RAM cache and not scheduled for prefetching within the next 500 I/Os
o I/O is sent directly to the disk, disrupting the pre-fetcher
o High I/O pressure may result in the pre-fetcher failing to keep up
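The three outcomes reduce to a tiny decision function. This is an illustrative Python sketch of the logic just described, not the actual driver implementation, and the parameter names are made up:

```python
def classify_read(in_cache, io_distance_to_prefetch, window=500):
    """Classify a boot-time read using the ReadyBoot terminology above.

    io_distance_to_prefetch: how many I/Os ahead the data is scheduled for
    prefetching, or None if it isn't in the boot plan at all. Illustrative.
    """
    if in_cache:
        return "hit"
    if io_distance_to_prefetch is not None and io_distance_to_prefetch <= window:
        return "pend"   # requester blocks until the prefetcher catches up
    return "miss"       # I/O goes straight to disk, disrupting the prefetcher

print(classify_read(True, None))    # hit
print(classify_read(False, 120))    # pend
print(classify_read(False, 2000))   # miss
```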

How Do I Manually Optimize ReadyBoot?


An additional tool is included with the Windows Performance Toolkit (WPT). XBootMgr.exe is the
tool used to gather boot traces during early versions of WPT (it may still be used for this purpose
but now you have WPRUI.exe).
You can trigger ReadyBoot optimization from an elevated command prompt (Run as
Administrator) using
xbootmgr -trace boot -prepSystem -resultPath <path>

The <path> is used to store boot traces during the optimization process.
Executing this command will trigger a series of reboots that build a boot plan, defrag the hard
drive and optimize ReadyBoot. It’s a time consuming operation and requires a logon after each
reboot. It’s not uncommon for observers to think a system has hung during the defrag interval.

For these reasons, use ReadyBoot optimization cautiously; only after large changes to the system
or when you see that ReadyBoot efficiency is dropping away.
How Do I Measure ReadyBoot Efficiency?

ReadyBoot efficiency data may be extracted from a timing trace. Just as we extracted boot timing
information and trace statistics in Part 3 of this blog series, we can extract ReadyBoot information
with
xperf -i <filename.etl> -o ReadyBootSummary.txt -a bootprefetch -summary

This time, the output is simple text and so we choose an output file with a .txt extension. Once
collected, the output looks like

As mentioned in the graphic, you should aim for Prefetch-Hit Ratio and Request-Hit Ratio to
exceed 80%.
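When you capture these summaries regularly, it’s handy to script the 80% check. A Python sketch; the label-and-number format in the sample text below is an assumption, so tune the pattern against the real output from your WPT version.

```python
import re

def hit_ratios(summary_text):
    """Scan xperf bootprefetch summary text for the two hit ratios.

    The line format assumed here (e.g. "Prefetch-Hit Ratio: 86.2 %") is
    illustrative; adjust the pattern to match your WPT version's output.
    """
    ratios = {}
    for name in ("Prefetch-Hit Ratio", "Request-Hit Ratio"):
        m = re.search(re.escape(name) + r"\D*([\d.]+)\s*%", summary_text)
        if m:
            ratios[name] = float(m.group(1))
    return ratios

sample = """Prefetch-Hit Ratio: 86.2 %
Request-Hit Ratio: 91.5 %"""
ratios = hit_ratios(sample)
print(all(v >= 80.0 for v in ratios.values()))  # aim for both above 80%
```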
Keep in mind that as more components are installed, more data may need reading from disk
during boot. The longer the boot process goes, the more variance there may be in the boot
sequence and an optimized boot plan may be harder to achieve.
How Does Hardware Affect ReadyBoot Efficiency?
CPU and disk speeds play a significant role and ReadyBoot can only achieve so much. If your
system has a very high CPU speed but a very slow disk (e.g. 5400 RPM), pre-fetched data may be
consumed before ReadyBoot can pre-fetch more, leading to a high miss ratio.
In general, more RAM offers ReadyBoot more cache and in turn, you should see better system
performance during boot and logon.

In Part 3, I mentioned that you don’t need to optimize ReadyBoot if your system uses a Solid State
Drive (SSD). SSDs are fast enough that there’s no significant benefit to using ReadyBoot and so,
Windows disables the feature.
Conclusion

The first four posts of this blog series have introduced the idea of Windows Enterprise Client Boot
and Logon Optimization. I’ve discussed tools and instrumentation you might use to trace and
benchmark a system as a Windows client image is developed.

Finally in this post, I’ve discussed ReadyBoot functionality, when it’s important to optimize it and
how to execute that optimization.
With the first four posts in this series, you should be empowered to design your Windows
Enterprise Client system image with performance in mind.

Windows Enterprise Client Boot and Logon
Optimization – Part 5, Windows Performance
Analyzer – A Tour

Up to this point, I’ve discussed a process you can use to benchmark your Windows client image as
it’s designed. In this way you can come to understand how choices you make influence boot and
logon performance of your image on specific system hardware.

Going forward, I’ll be talking about how you can troubleshoot and isolate boot and logon
performance issues that unexpectedly creep into production. This is the hard (or harder) stuff.
Before we get to troubleshooting, I want to give you a tour of the tool you’ll be using. In addition
to using xperf.exe for producing a boot Summary.xml (discussed in Part 3), you need to be familiar
with Windows Performance Analyzer (WPA).
Windows Performance Analyzer (WPA)
WPA.exe is in your path after you install the Windows Performance Toolkit (WPT). For analysis, I
recommend completing a full installation of WPT instead of a flat-copy (discussed in Part 2). You
can safely copy .ETL trace files to an analysis system to save you installing the tools on every client
that has an issue.

The additional benefit to a full install is that .ETL files will have a file association to WPA. Double-
clicking an ETL file will open it and the default view you’ll see is this -

At this stage, it’s not beneficial to explain every option. I want to cover just enough to get you
started. As this blog series continues, I’ll introduce new features when they’re needed.
System Configuration
The first thing I find useful in troubleshooting boot and logon performance is to see system
details. This information is available by selecting System Configuration from the Trace menu. Doing
so opens a new tab in the analysis view. Here you can see I’ve selected the Storage category where
I can retrieve details of the physical disk.
In this example, you’ll notice the system uses a virtual hard drive – it’s a virtual machine running on
Hyper-V.

For a physical system, you’ll see the manufacturer and model number that may be used to
research disk detail. You may find the hard drive to be a 5400 RPM rotational drive and further
analysis showing saturated disk utilization. This may be evidence enough for a disk upgrade.

If you’re finished with System Configuration, you can close the tab just as you would with an
Internet browser.
Graph Explorer
The left-hand pane, called Graph Explorer, provides access to a number of pre-defined analysis
graphs. These graphs are categorised in System Activity, Computation, Storage, Memory, Power and
others depending on the options selected during trace capture.
Expanding a category provides access to the graphs. Double-clicking a graph adds it to the current
analysis view -
Here, I’ve added the CPU Usage (Sampled) graph and its data table.
As more graphs are added to the analysis view, they’re stacked on top of each other and the
analysis view gains a vertical scroll bar.
Graphs
Graphs may be highly customised and their display is tied to the ordering of columns in their
attached data table. You’ll see a lot of examples of this as the blog series continues but for now,
understand this -

 Columns to the left of the golden bar trigger aggregation
o Without anything to the left of the golden bar, the table would display every single event as a separate row. In the screenshot above, Process is the aggregator and columns in the middle of the table are added or counted (depending on configuration). Again referring to the screenshot above, we see there were 58,917 events captured for svchost.exe (916)
 Columns to the right of the blue bar are called the graphing columns and their type determines the graph format
o Numeric columns –> line graphs
o Timestamps –> checkpoint graphs
o A pair of timestamps –> Gantt chart
o An offset column –> state transition graph

To gain a feel for this, I suggest you try opening a few graphs for yourself. Examples of each graph
type mentioned above are -

 Computation –> CPU Usage (Sampled)
 System Activity –> Generic Events
 System Activity –> Processes
 Storage –> Disk Offset

During the analysis I’ll show you, there are times where I’ll modify the table. You should be aware
that this alters the appearance of the graph.
In situations where I want to maintain the graph but modify the table, I’ll open two instances of the
same graph, hide the table of the first instance and hide the graph of the second instance. This is
achieved using the buttons in the top right corner of each individual graph -

You’ll often want to focus on a small region of the graph. You may zoom to that region by
highlighting it and right-clicking -

Zooming in this way not only expands the interesting region across the analysis view but also
filters the data table to display only events occurring in that region.
Analysis Views
Opening WPA provides you with the first analysis view. All graphs in the same analysis view are
subject to zooming. In other words, selecting a region in the CPU Utilization (Sampled) graph and
zooming to it, performs the same operation on every other graph residing in the same analysis
view.
New analysis views allow separate streams of investigation. To create a new analysis view, open a
new tab -

Conclusion
For now, I’ve covered WPA in enough detail to make a start on boot and logon performance
analysis. As this series continues, I’ll introduce new WPA features in the context of the task I’m
talking about.
Windows Enterprise Client Boot and Logon
Optimization – Part 6, Boot and Logon
Overview

Before I discuss boot and logon analysis in any detail, it’s important to provide an overview so that
you clearly understand the troubleshooting approach.

The boot and logon of Windows includes a number of distinct phases. When a particular phase is
slow, the investigation of delays requires an understanding of –

 Activities within each phase
 Activities that cross phase-boundaries

Windows Performance Analyzer (WPA) (see Part 5), as well as analysis summaries you can generate
with XPerf.exe (see Part 3 and Part 4), expose these activities for examination. Consider -

In more detail, the troubleshooting approach is –

1. Gather a boot trace and examine the boot phases. Systematically investigate the most time-consuming phases (a)
2. Review disk utilization during the delayed phase. Disk I/O is a common bottleneck. Identify processes that are high contributors
3. Review CPU utilization during the delayed phase. Identify processes that are high contributors (b)
4. Examine the behaviour of activities that occupy and span the lengthy phase
5. Failing all else, proceed to wait analysis (c)

a. Investigate one phase at a time
b. Multi-processor systems may not obviously reflect CPU consumption. Keep in mind that consumption of a single core on an 8-processor system
= 100 / 8
= 12.5% processor consumption
c. A detailed discussion of Wait Analysis will appear in a later post
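Footnote b is worth internalizing: on a multi-core system, one saturated core hides inside a low overall utilization number. A throwaway Python helper to keep the arithmetic honest:

```python
def single_core_share(cores, busy_cores=1):
    """Overall CPU % reported when busy_cores cores are pegged on a
    cores-way system, as in footnote b above."""
    return 100.0 * busy_cores / cores

print(single_core_share(8))   # one pegged core on 8 processors -> 12.5
```

So a process showing “only” 12.5% CPU on an 8-processor system may in fact be fully saturating a single core and serializing everything behind it.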

The very early phases of boot will not be included in the trace. During POST and OSLoader, Event
Tracing for Windows (ETW) hasn’t started yet -
A Note on Trace Capture
For most of the analysis I’ll be discussing, a timing trace (discussed in Part 3) includes sufficient
detail. Some of the analysis however, will require a greater level of detail. For this reason, I
recommend the collection of an analysis trace any time you want to investigate boot and logon
delays – it just saves you having to do it twice.
An analysis trace uses the same options as a timing trace (shown below in green) except for the
detail level (shown below in red)

Conclusion
I’ve discussed the foundational information you need to begin boot and logon analysis. In posts to
come, I’ll discuss analysis of cross-phase activities as well as each boot phase in turn.
For each boot phase, I’ll discuss the activities that occur within the phase, measurement of the
phase and potential problems.
Windows Enterprise Client Boot and Logon
Optimization – Part 7, ReadyBoot Analysis

In my last post, I proposed that the troubleshooting approach for boot and logon performance is
to identify long running boot phases and then to investigate activities that occur during and across
those phases.
ReadyBoot is an activity that occurs across multiple phases on systems with rotational disks.

Recap
Part 4 of this series introduced ReadyBoot and I provided a method for extracting ReadyBoot
statistics from a boot trace. To revisit -

We can extract ReadyBoot information with


xperf -i <filename.etl> -o ReadyBootSummary.txt -a bootprefetch -summary
The output is simple text and so we choose an output file with a .txt extension. Once collected, the
output looks like

As mentioned in the graphic, you should aim for Prefetch-Hit Ratio and Request-Hit Ratio to
exceed 80%.

This data already provides us with an indication of whether there are problems with ReadyBoot.
Event Log
The Windows Event Log also contains some information about the state of ReadyBoot. Specifically,
whether the boot plan was correctly calculated. To see this, navigate to
Event Viewer –> Application and Services Logs –> Microsoft –> Windows –> ReadyBoost –> Operational
The ReadyBoost (not a typo) Operational log writes an event 1016 every time the boot plan is
calculated. When this occurs after boot, the 1016 should be logged with a Reason of System Boot
Completion and a Result of 0x0 -
You may see a Reason other than System Boot Completion but these events do not represent the
boot plan calculation done at boot time.

In cases where the Result is not 0x0, there was an error calculating the boot plan.
Windows Performance Analyzer (WPA)
Any analysis trace captured on a rotational disk system running Windows 7 or later includes
ReadyBoot events. Opening this trace with WPA provides a ReadyBoot graph under the Storage
category.
It’s possible an error or some manual intervention prevents ReadyBoot from running. Typically, this occurs
when the SuperFetch service is disabled and the following registry value is set to 0x5, causing Windows to
assume the system boots from SSD. 0x0 also disables pre-fetching, while 0x3 is the default and
recommended value.

HKLM\SYSTEM\CurrentControlSet\services\rdyboost\Parameters\ReadyBootPlanAge
I like to view the graph and manipulate the table at the same time so I open two instances
(see Part 5 for WPA usage) -

 First ReadyBoot graph instance – display as graph only
 Second ReadyBoot graph instance – display as table only
This screen shot shows my preferred column ordering which I’ve achieved by dragging column
headers. Columns that were not visible were added by right-clicking a column header and adding
the column of interest. Lastly, I clicked the Size column heading to sort by size.
This view shows me that 137 MB of read-hit data came from the C:\Trend Micro folder during
boot.
This example comes from a very old Windows 7 trace where this behaviour was quite common
across all antivirus products – i.e. loading massive virus definition files during boot (which places a
lot of I/O pressure on the system). To my knowledge, all antivirus vendors have taken steps to
improve on this behaviour and a lot less antivirus data is pre-fetched during boot in the latest
version of most products.
Remediation
The information I’ve provided in this post probably suggests some actions you can take when
ReadyBoot has issues but let me spell them out for you -

 There’s no ReadyBoot data in your trace
o First, make sure your system has a rotational drive and not an SSD
o If your system has a rotational drive, ensure that HKLM\SYSTEM\CurrentControlSet\services\rdyboost\Parameters\ReadyBootPlanAge has a value of 0x3 and that the SuperFetch service is running with a start-up type of Automatic
 ReadyBoot Prefetch-Hit Ratio and Request-Hit Ratio are lower than 80%
o Check the disk speed of the system. If it’s a 5400 RPM drive in a system with one or
more fast CPUs, pre-fetched data may be processed too quickly for the ReadyBoot
driver to keep up with demand. It may not be possible to improve ReadyBoot
efficiency
o Examine the data being read during boot using the ReadyBoot table as I’ve outlined
above. Consider whether there are components that add large volumes of data to
the boot and logon process that might be removed or changed to a leaner version.
Keep in mind the volume of data pre-fetched for a brand new installation of Windows
8.1 is 315 MB, so components that compare to or exceed this number are of interest
o Optimize ReadyBoot as discussed in Part 4 of this blog post series
 Event Log 1016 contains a Result code other than 0x0
o Optimize ReadyBoot as discussed in Part 4 of this blog post series

Note that in some cases, I’ve observed the need to delete the contents of %windir%\Prefetch and
to run the ReadyBoot optimization again. Deleting the contents of the prefetch folder is often
discouraged but the act of manual ReadyBoot optimization rebuilds it for you.
As I’ve mentioned a few times now, ReadyBoot still has limitations. There’s only so much that may
be done to account for poor Disk I/O and reading large volumes of data during boot and logon.
Ideally, new systems will use SSDs for their OS disk and you’ll keep the boot and logon process as
lean as possible.
If an SSD is out of the question, steer clear of 5400 RPM disks and ensure the OS disk on systems
you deploy to your enterprise is at least 7200 RPM.
Conclusion
We’ve discussed the analysis and remediation of ReadyBoot, which is the first cross-phase activity
you need to consider during boot and logon performance analysis.

Windows Enterprise Client Boot and Logon
Optimization – Part 8, Code Integrity Checking
Today I’m going to discuss another activity that occurs across boot phases – Code Integrity
checking.
Code Integrity validates driver signatures. A driver signature may be self-contained within the driver
binary (embedded-signed) or it may be stored in a catalog file elsewhere on disk. Code Integrity
checking is mostly associated with delays in the early stages of boot but does occasionally surface
in later phases as well.

When loading a driver that is not embedded-signed, Code Integrity scans all catalog files on the
system, trying to locate the signature. This operation delays validation of the driver and slows
down the associated boot phase which is particularly problematic for BOOT_START drivers.
Windows Performance Analyzer (WPA) shows catalog reload events in the System Activity –
> Generic Events table. Moving the Provider Name and Task Name columns to the very left provides
the following view
If you see ReloadCatalogs in the Task Name column after expanding the Microsoft-Windows-
CodeIntegrity provider, there’s a non-embedded-signed driver loading on the system.

How Do I Find an Offending BOOT_START Driver?


In order to find non-embedded-signed BOOT_START drivers, you’ll need signtool.exe, which is
available as part of the Windows SDK. To download the latest Windows SDK, go here. After
starting the installation wizard, the only option you need to check is

I recommend installing the SDK on your analysis system. If you need to validate driver
signatures on a production system, copy the entire folder containing signtool.exe from C:\Program Files
(x86)\Windows Kits\8.1\bin\<architecture>.

Once you have signtool.exe in place, you can check BOOT_START drivers by first parsing a boot
trace with
xperf -i trace.etl -o trace.txt -a dumper

findstr /C:"I-DCStart" trace.txt > trace.csv


Secondly, use PowerShell to read through trace.csv and execute signtool.exe against each
BOOT_START driver -
$Drvrs = Import-Csv -Path .\trace.csv

ForEach ($Drvr in $Drvrs) {&'Signtool.exe' verify /v $Drvr.FileName}


Examine any errors in the output for non-embedded-signed BOOT_START drivers and contact the
driver vendor in the hope of an update.
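If the Import-Csv step gives you trouble (the findstr output has no header row), the same filtering can be done with a few lines of script. Below is a minimal Python sketch; the exact column layout of xperf’s dumper output varies between Windows Performance Toolkit versions, so the assumption that the driver path is the final comma-separated field should be checked against a few lines of your own trace.txt. The sample lines are hypothetical and for illustration only.

```python
# Sketch: extract driver file names from I-DCStart events in an xperf dumper dump.
# ASSUMPTION: the driver path is the last comma-separated field on each event line;
# verify this against your own trace.txt before relying on it.

def dcstart_drivers(lines):
    drivers = []
    for line in lines:
        if "I-DCStart" not in line:
            continue
        fields = [field.strip().strip('"') for field in line.split(",")]
        path = fields[-1]
        if path.lower().endswith(".sys"):
            drivers.append(path)
    return drivers

# Hypothetical dumper lines, illustrating the assumed shape only
sample = [
    'I-DCStart, 123456, 0xfffff80000000000, "\\Device\\HarddiskVolume1\\Windows\\System32\\drivers\\example.sys"',
    'UnrelatedEvent, 123457, something-else',
]
print(dcstart_drivers(sample))
```

Each returned path could then be fed to signtool.exe verify /v, just as in the PowerShell loop above.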

Windows Enterprise Client Boot and Logon
Optimization – Part 9, CPU and Disk Utilization
In Part 6 of this series, I proposed a boot and logon troubleshooting approach as –

1. Gather a boot trace and examine the boot phases. Systematically investigate the most time
consuming phases (a)
2. Review disk utilization during the delayed phase. Disk I/O is a common bottleneck. Identify
processes that are high contributors
3. Review CPU utilization during the delayed phase. Identify processes that are high
contributors (b)
4. Examine the behaviour of activities that occupy and span the lengthy phase
5. Failing all else, proceed to wait analysis (c)

a. Investigate one phase at a time

b. Multi-processor systems may not obviously reflect CPU consumption.
Keep in mind that consumption of a single core on an 8 processor system
= 100 / 8
= 12.5% processor consumption

c. A detailed discussion of Wait Analysis will appear in a later post
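The arithmetic in note (b) is worth internalizing, since WPA’s overall utilization graph reports a percentage across all logical processors combined. A quick illustration:

```python
# A single saturated core contributes only 100/N percent to total CPU utilization
# on a system with N logical processors, so a runaway single-threaded process can
# look deceptively quiet in the overall graph.
def total_cpu_pct(busy_cores, logical_processors):
    return 100.0 * busy_cores / logical_processors

print(total_cpu_pct(1, 8))   # → 12.5 (one fully busy core on an 8-processor system)
print(total_cpu_pct(2, 8))   # → 25.0
```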


Before I discuss each individual boot phase, I want to cover the examination of CPU and Disk
Utilization with Windows Performance Analyzer (WPA).
Loading Symbols
Symbols contain information about the programming-language constructs that generated specific
machine code in a module. WPA is able to use symbols and expose the names of functions in
certain events captured during an analysis trace.
In order to use symbols, WPA must be configured to download them. I find that setting a pair of
environment variables from an elevated command prompt is the best approach. Doing so
means I only have to do it once for my analysis system -
setx /M _NT_SYMBOL_PATH SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
setx /M _NT_SYMCACHE_PATH C:\SymCache
The first of these commands sets the symbol path environment variable to a local path
of C:\Symbols and downloads symbols from Microsoft. The second command specifies the
symcache path as C:\SymCache. WPA uses this as a kind of “scratch pad” to process symbol
information.

With these environment variables set, you can open your analysis trace and at any time,
select Load Symbols from the Trace menu -

Depending on the speed of your Internet connection, loading symbols may take some time -

Keep in mind that only symbols for Microsoft binaries will become available using the instructions
I’ve provided here. Third parties may not make symbols available – you’d need to ask the vendor
whether they have a public symbol repository.
The last thing to mention about symbols is that after collecting an analysis trace, you may find
an NGENPDB folder in the same location as the trace. This folder contains symbols for .Net
components which have been automatically generated during trace capture. If you are copying the
analysis trace to another computer, you should also copy the NGENPDB folder.
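As a convenience, that copy step can be scripted. The sketch below assumes the generated folder sits alongside the trace and is named <trace>.NGENPDB; check the actual folder name your toolkit produced before relying on it.

```python
import os
import shutil

def copy_trace_with_ngen_symbols(trace_path, dest_dir):
    """Copy an .etl trace plus its auto-generated .NET symbol folder, if present.

    ASSUMPTION: the folder is named "<trace>.NGENPDB" and sits next to the trace;
    adjust if your Windows Performance Toolkit names it differently.
    """
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(trace_path, dest_dir)
    ngenpdb = trace_path + ".NGENPDB"
    if os.path.isdir(ngenpdb):
        shutil.copytree(
            ngenpdb,
            os.path.join(dest_dir, os.path.basename(ngenpdb)),
            dirs_exist_ok=True,
        )
```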

CPU Utilization

As I’ve mentioned above, investigation of a slow boot phase is done by selecting and zooming to
that phase (more on this in future posts), after which you’ll want to start investigation with CPU
and Disk Utilization.

I find it best to examine CPU Utilization by adding two instances of Computation –> CPU Usage
(Sampled) to the analysis view.
The data in the CPU Usage (Sampled) graph/table is collected by sampling CPU activity every 10ms. For this
reason, it’s ideal for investigation of high CPU issues.
I configure the first graph instance to display only the graph and from the menu at the top left
corner, I select Utilization by CPU -

I configure the second graph instance to display only the table.


By making these choices, the graph shows me a view of total CPU utilization and the table displays
the processes that are the greatest CPU contributors (ordered top to bottom) -

Here, svchost.exe (1116) is the greatest contributor to the overall CPU utilization displayed in the
graph.
Analysis to this point may be enough to choose your next action. The top contributor to CPU
utilization may be a process you can remove or launch as a delayed scheduled task. I often see
instant messaging clients or mail clients automatically launched at logon which steal CPU time
away from other, more important activities.
Sometimes, you won’t be sure how to proceed. By loading symbols as I’ve discussed above, you
can expose the code path (stack of binaries and functions) spending most time on the CPU within
a process -
The information in the stack column appears as module!function. By reviewing the names of
functions, you may be able to deduce process behaviour at the time of high CPU consumption.
Ultimately, you may need the assistance of a Microsoft support engineer that has code access (if
the issue is in a Microsoft process) or a third-party support engineer if the issue is in non-Microsoft
code. This approach I’m suggesting just gives you a deeper view of system activity that may
empower you to take action.
Disk Utilization

Reviewing disk activity is quite similar to CPU Utilization. I start by adding two instances
of Storage –> Disk Usage Utilization by Disk.
I configure the first instance to display only the graph.
The second instance I display as a table, change the view to Disk Usage Utilization by Process, Path
Name, Stack and then change the columns so that they display as Process, Path Tree, <golden
bar>.
The resulting view looks like this -
The graph at the top gives me great insight into how busy the disk is overall while the table shows
me the processes responsible for the I/O. Expanding rows in the Path Tree column shows me
exactly which files on disk are being accessed and from that, I can begin to deduce my next course
of action.
Conclusion
Today I’ve discussed my preferred approach to examining CPU and Disk Utilization with Windows
Performance Analyzer. These techniques provide evidence and clues regarding cause for high CPU
and Disk consumption.
Sometimes these investigations won’t lead to a clear next step. If the system has poor physical
hardware, it may be the case that you just can’t improve performance. On the other hand, fast
systems behaving poorly may reveal insightful results.

Windows Enterprise Client Boot and Logon
Optimization – Part 10, Boot Phase – POST
My last five posts have been setting you up for the analysis of individual boot phases. This post
and the next five will discuss each phase.
For each phase, I’ll discuss phase activities, measurement of phase duration, issues that commonly
occur and investigation of those issues.
Today I’ll start with a simple phase – POST.
As seen in the diagram, POST is too early to be traced using Windows Performance Recorder
(WPRUI).

Boot Phase POST – Phase Activity


POST is the first stage of system start-up –

 Hardware is powered on and initialized
 Firmware performs a Power-on-self-test (POST)
 MBR is read (BIOS only)
 Boot manager is launched (bootmgr.exe / bootmgfw.efi)
 Winload.exe / Winload.efi is started and boots Windows

Boot Phase POST – Measurement


As I’ve already mentioned, POST occurs too early to be recorded using an analysis trace. Systems
prior to Windows 8 have no reliable way of measuring POST.

Beginning with Windows 8, ACPI 5.0 systems record POST duration (ms) in –
HKLM\System\CurrentControlSet\Control\Session Manager\Power\FwPOSTTime
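On such a system, the value can be read from an elevated PowerShell prompt; a minimal sketch (the value is reported in milliseconds):

```powershell
# Read the firmware POST duration (ms) recorded by Windows 8+ on ACPI 5.0 systems
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Power" |
    Select-Object FwPOSTTime
```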
Boot Phase POST – Potential Issues
BIOS/Firmware configuration may introduce a delay -

 PXE boot enabled?
 Device boot order may delay start-up

A Dual/Multi-boot configuration introduces a delay as long as the menu timeout.


Boot Phase POST – Remediation

Update BIOS/UEFI firmware and optimize configuration.


Conclusion
Issues that may occur during POST are limited but misconfiguration could introduce significant
delays. Ensuring system BIOS/UEFI firmware is updated and properly configured is essential in
optimizing the user experience.
Windows Enterprise Client Boot and Logon
Optimization – Part 11, Boot Phase – OS Loader and Kernel Initialization

Last time, I began a breakdown of each boot phase. In this post, I’ll continue this discussion but
move on to OS Loader and Kernel Initialization.

The reason I’ll discuss OS Loader and Kernel Initialization together is because the summary XML
generated with xperf.exe (see Part 3) summarises the total time of both activities in a single metric.
Regions of Interest in WPA
Windows Performance Analyzer (WPA) provides a new feature starting with the Windows 10 ADK –
the Regions of Interest graph available in the System Activity group. By default, this graph displays
no data –

From WPA’s Trace menu, you can select Trace Properties and load a Regions of Interest definition
file –

The default location for Regions of Interest definition files is C:\Program Files (x86)\Windows
Kits\10\Windows Performance Toolkit\Catalog. By clicking the Add button, you’ll see a number of
definitions to choose from. For the purposes of Boot and Logon performance analysis, you should
choose FullBoot.Boot.Regions

The Regions of Interest graph is now populated with a Gantt chart of the boot phases we’re
interested in –

I’ll make reference to the Regions of Interest as we proceed to describe each boot phase and its
analysis.

Boot Phase OS Loader and Kernel Initialization – Phase Activity


OS Loader

 Loads the system registry hive into memory
 Loads but does not initialize BOOT_START drivers
 Passes control to the kernel

Kernel Initialization

 Starts the plug and play manager
 Initializes BOOT_START drivers loaded during the OS Loader phase
 Loads and initializes SYSTEM_START drivers
 Passes control to the Session Manager

Boot Phase OS Loader and Kernel Initialization – Measurement


The summary XML generated with xperf.exe provides two values for measurement

 Duration of OS Loader → osLoaderDuration
 Duration of OS Loader + Duration of Kernel Initialization → PreSMSS

Regions of Interest also provides the total duration as Boot-PreSessionInit-Phase:

You’ll notice a slight discrepancy in the duration which is due to xperf.exe and Regions of
Interest calculating in different ways. These small differences are unimportant in the context of our
investigation.
Boot Phase OS Loader and Kernel Initialization – Potential Issues
OS Loader

 The osLoaderDuration is ideally < 3 seconds
 Performance is mostly disk bound (reading drivers/registry)
 Additional 3rd party boot start drivers can cause delays (often Antivirus)
 Non-embedded-signed drivers trigger catalog reload code integrity checks to validate
driver signatures

Kernel Initialization

 Slow starting devices and drivers
 Non-embedded-signed drivers trigger catalog reload code integrity checks to validate
driver signatures
 Antivirus drivers adding delays and overhead to disk IO

Boot Phase OS Loader and Kernel Initialization – Remediation


The first thing I’d recommend is checking for catalog reload events as discussed in the code
integrity post. You can refine the scope of investigation in WPA by first selecting the Boot-
PreSessionInit-Phase, zooming to it and then examining the Generic Events table for catalog reload
events –

If ReloadCatalog events are discovered, use the techniques in the code integrity post to identify
the drivers responsible.
The next action you can take is to look for drivers that take a long time to load. As discussed
above, the drivers of interest are BOOT_START and SYSTEM_START drivers. The load times (slowest
at the top) for these are exposed towards the bottom of the summary XML file below the PnP
section -

For any drivers causing problems, contact the vendor and look for an update.

In almost all cases, you’ll see that Disk Utilization is near 100% during this phase. The system is
trying to load drivers as fast as possible – it makes sense to push the disk as hard as possible.
Slower disks will slow this phase down.
Conclusion

Delays in OS Loader and Kernel Initialization are usually attributed to BOOT_START or
SYSTEM_START drivers that are behaving badly. This post has provided techniques to help identify
those drivers, which should be removed or updated to improve performance.

Some improvement may be seen by upgrading the system disk, especially old, slow, 5400 RPM
drives.
Windows Enterprise Client Boot and Logon
Optimization – Part 12, Boot Phase – Session Initialization
Mark Renoden, July 27, 2015


 0
 0

This post continues the series that started here.

In my last post, I mentioned that Kernel Initialization concluded by handing control over to the
Session Manager. This signals the start of the next phase – Session Initialization.

Boot Phase Session Initialization – Phase Activity

The activities that occur during Session Initialization are -

 Autochk (Check Disk)
 Video initialization
 Loads and starts all remaining drivers (that were not BOOT_START or SYSTEM_START)
o Lasts until all PnP devices have been started
 Starts Session 0 and launches Service Control Manager (SCM)
 Starts Winlogon

Boot Phase Session Initialization – Measurement


Once again, a summary XML file generated from a boot trace using xperf.exe displays the phase
duration as SMSSInit –
Alternatively, the Regions of Interest graph (discussed in Post 11) shows this phase duration
as Boot-SessionInit-Phase:

Boot Phase Session Initialization – Potential Issues


Session Initialization may be delayed when Autochk (Check Disk) runs. Autochk runs in the
following situations –

 If you try to run Chkdsk on the boot volume, it is scheduled for the next reboot
 If Chkdsk cannot gain exclusive use of the volume
 If the volume is flagged as dirty

The most common issue during this phase is slow driver initialization with graphics drivers being
the typical culprit.
Boot Phase Session Initialization – Remediation
The execution of Check Disk during boot should be obvious as its status is written to the screen as
the disk is examined. After completion, Check Disk typically re-cycles the system so evidence of
this activity will not appear in an analysis trace.
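You can check whether a volume is currently flagged dirty (and is therefore due an Autochk pass at the next boot) with the built-in tools, from an elevated command prompt –

```bat
rem Query the dirty bit on the volume, then check whether Chkdsk is scheduled
fsutil dirty query C:
chkntfs C:
```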
Understanding that a dirty shutdown is a common trigger for Check Disk means that you should
look for relevant causes – possibly a hardware related issue. Also, examine
HKLM\System\CurrentControlSet\Control\Session Manager\BootExecute, which should contain only
its default value of autocheck autochk *
Slow drivers are identified in the same way as I discussed for Kernel Initialization, except that they
show up in the remainder section of the summary XML file –

Again, updates for drivers causing delays should be obtained from the vendor.
Conclusion
Session Initialization experiences similar issues to Kernel Initialization. Slow drivers should be
identified and updated where possible.

Windows Enterprise Client Boot and Logon
Optimization – Part 13, Boot Phase – Winlogon
Mark Renoden, July 28, 2015



 0
 0

This post continues the series that started here.


Today I’ll be discussing the Winlogon phase. A lot of activity takes place during Winlogon and it’s
common to see delays here. Additionally, the interplay between activities provides an opportunity
for complication so I’ll be highlighting important considerations for analysis.

Some of the activities that occur during the Winlogon phase have a dependency on supporting
network and server infrastructure. I’ll write a post regarding the influence of your infrastructure at a
later time.
Boot Phase Winlogon – Phase Activity

 Service Control Manager launches services
 Machine Group Policy is applied and start-up scripts are executed
 Logon Screen is displayed and a user logs in
 User Group Policy is applied and logon scripts are executed
 User profile loads and network drives are mapped
 Explorer is started

Boot Phase Winlogon – Measurement


As with the previous two phases, a summary XML file generated from a boot trace using xperf.exe displays
the phase duration as WinlogonInit –

Alternatively, the Regions of Interest graph (discussed in Post 11) shows this phase duration as Boot-
Winlogon-Phase:
Boot Phase Winlogon – Potential Issues

As you might imagine, there’s a reasonable degree of complexity here.


Let’s begin with Services –

 Disk contention may impact service start times
 All services configured for auto-start are started as soon as possible
 Grouped auto-start services block the next group until they all start
 Ungrouped auto-start services wait for auto-start groups
 Demand-start and delayed-auto-start services wait on auto-start services

You can envision service start as follows (again, thank you Matthew Reynolds for this conceptualization) –

In addition, consider –

 The Network List Service is demand-start
 If Group Policy processing is synchronous, it will wait for the Network List Service
o i.e. it waits for all auto-start services

Moving on to Group Policy –

 Synchronous processing waits for the network which waits for auto-start services
 Triggers for synchronous processing
o Always wait for the network at computer startup and logon policy setting
o Folder Redirection
o Roaming User Profiles
o Software Installation (via policy)
o Home Drives

Consider also that –

 Excessive numbers of GPOs increase processing time
o Fewer GPOs each with more settings are generally processed more quickly than more GPOs with fewer
settings
 Start-up and logon scripts add time
o Especially true when processed synchronously
o PowerShell scripts trigger loading of .Net libraries before they can process
 WMI and Security filtering impacts processing time
o WMI is often very expensive

And lastly, User Profiles –

 Local user profiles are generally fast to load
 Large roaming user profiles may introduce delays
o Heavy dependency on network and server infrastructure

Boot Phase Winlogon – Remediation

After examining CPU and Disk utilization, the first avenue of investigation is to expand Boot-Winlogon-
Phase: in the Regions of Interest graph. By doing so, we see a breakdown of activities within
the Winlogon phase –

The example above shows that the User Profile has loaded quickly but that Group Policy is a
contributing factor, both during machine policy processing as well as user policy processing. With this
information, I’ll want to check whether policy processing is in synchronous mode.

To do this, I’ll select and zoom to Boot-Winlogon-Phase:, open the Generic Events table, order columns as
follows and then expand the Microsoft-Windows-GroupPolicy provider –
Now that I know synchronous processing of Group Policy is taking place, I will also examine
the Services graph and table. Once again, column ordering is important –

Now that I’ve identified a slow starting service, I would investigate whether it may be moved to
delayed-auto-start and seek an update from the vendor.
If I move back to the Generic Events table (using exactly the same layout as above), scrolling further down
reveals two rows. The first row represents the Group Policy Objects that have applied while the second row
represents the Group Policy Objects that have been filtered. These rows are a little hard to find but Field
1 contains useful information that shows up in a tool tip when you mouse over it.
Applied GPOs (machine) –

Filtered GPOs (machine) –

I see that C-GPO1 and C-GPO3 are not applied due to a WMI filter. WMI filtering may be a very expensive
operation so my next avenue of investigation would be to examine those filters and attempt to exclude
those Group Policy objects using another method.

Similarly, when I scroll down further still, the Generic Events table shows applied and filtered Group Policy
Objects for the user –

Applied GPOs (user) –

Filtered GPOs (user) –


Here, no Group Policy Objects are applied to the user and several are filtered using WMI. In addition, I can
see that some of them were also applicable to the computer which may be an indication that Group Policy
loopback mode is in use. These are facts I can use to look for Group Policy optimizations.
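Outside of WPA, the applied/filtered breakdown can be cross-checked on the client itself with the built-in gpresult tool, which produces an HTML report listing applied and denied Group Policy Objects along with the reason for each denial. From an elevated prompt (the report path here is just an example) –

```bat
rem /h writes an HTML report; /f overwrites an existing file at that path
gpresult /h C:\Temp\GPReport.html /f
```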

Lastly, after zooming to Boot-Winlogon-Phase:, I can use the Process Lifetimes graph and table to identify
processes that may have been launched by start-up or logon scripts –

Transient processes whose life ends within GPClient phases (computer or user) may provide clues regarding
script execution delaying the user experience. Consider script optimization or moving script functionality
to Group Policy Preferences (where possible).
Conclusion

Winlogon is a complex phase with a lot going on. Windows Performance Analyzer (WPA) provides useful
insight into the phase activities and makes it easier to isolate problems. As you’ll find, it takes some time
and experience to become familiar with the workflow I’ve described but with practice, analysis with these
tools is powerful and specific.
