
RIGOROUS PERFORMANCE TESTING - MODERN TESTING TOOLS

BY GRANT ELLIS

This is my second blog post in a series of three. If you haven't already read my prior post, Rigorous Performance Testing: How We Got Here, then mosey on over for some extra context.

A QUICK RECAP
We all know that the Internet has gone through some serious evolution over the past 25 years (really! 1989!). Data centers and hosting technologies have changed; media has changed (copper to fiber!); switches and peering points have changed; addressing and routing have changed (IP Anycast); devices have changed; content has changed.
In the last five years alone, we have seen a transition to rich, interactive, and dynamic sites and applications. Clients are accessing those applications on handheld devices instead of computers. Connectivity is largely wireless instead of wired. These are great technologies, and our lives are better for it, but these same technologies do horrible things to web performance.
Similarly, measuring performance has become quite complicated. The simple, venerable ping was sufficient while the web was in its infancy. Then, as bandwidth demands grew, we needed HTTP-aware testing tools like cURL. With the adoption of the commercial web, paradigms changed and it became important to measure whole pages with tools like Mercury LoadRunner (now HP).
When CDNs started helping the middle-mile with decentralized infrastructure, the testing tools themselves needed to
decentralize in order to capture performance data with the CDNs in-line. Gomez (now Compuware) and Keynote stepped in
with browser-based testing agents distributed all over the middle-mile (backbone) of the Internet.

USER EXPERIENCE METRICS FOR THE MODERN WEB


Now, the web is filled with super-dynamic sites and applications. All of these applications are dynamic on the client-side as
well as the server-side. The browser mechanics of a modern application are complicated in themselves, and so testing
methodologies have become more sophisticated. One huge differentiator is which performance metrics are tracked.

FULLY LOADED
Prior testing tools would simply start a timer, initiate the page load, and then stop the timer once the underlying internet connection went idle. In the Web 1.0 world, this was a sufficient test: the browser needed all the content in order to actually render the page and get that user happily using it. On the modern Web 2.0+, pages don't need everything in order to be functional. Secondary and/or personalized content may be loaded asynchronously (for example, below-the-fold loading), but the page may be fully functional beforehand. Tertiary backend functions like analytics beacons have no bearing on function from the user's perspective. With these points in mind, internet connection idleness is no reflection of user experience, and Fully Loaded has become less relevant.
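For the curious, here is a rough in-browser approximation of the idea: after the load event, watch the Resource Timing entries until the network has stayed quiet for a while, then take the last response time. The thresholds and the overall approach are a sketch of my own, not how any particular tool implements Fully Loaded.

```typescript
// Rough "Fully Loaded" approximation: after the load event, wait until no new
// resources have appeared for a quiet window, then report the last network activity.
const QUIET_MS = 2000; // how long the network must stay idle (illustrative)
const POLL_MS = 500;   // how often to re-check the resource list

window.addEventListener('load', () => {
  let lastCount = performance.getEntriesByType('resource').length;
  let quietFor = 0;

  const timer = setInterval(() => {
    const entries = performance.getEntriesByType('resource') as PerformanceResourceTiming[];
    if (entries.length === lastCount) {
      quietFor += POLL_MS;
    } else {
      lastCount = entries.length;
      quietFor = 0;
    }
    if (quietFor >= QUIET_MS) {
      clearInterval(timer);
      // Latest responseEnd across all resources, in ms since navigation start.
      // Note: the default resource timing buffer holds ~150 entries, so heavy
      // pages may need performance.setResourceTimingBufferSize() to be raised.
      const lastActivity = Math.max(...entries.map((e) => e.responseEnd));
      console.log('Approximate Fully Loaded:', Math.round(lastActivity), 'ms');
    }
  }, POLL_MS);
});
```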

DOCUMENT COMPLETE
The Document Complete event is fired in the browser when, well, the document is complete. Generally, this means that the page is visually complete and responsive to the user (the user can search, scroll, click links, etc.). However, the browser may still be loading asynchronous content or firing beacons (see Fully Loaded above).
However, this metric is imperfect as well: some sites deliberately defer loading of prominent content until after Document
Complete.
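In most synthetic tools (WebPageTest included), Document Complete is reported at the browser's load (onload) event. Here is a minimal sketch of reading that moment in the browser itself via the Navigation Timing API; the logging is purely illustrative:

```typescript
// Read a Document Complete proxy (the load event) from Navigation Timing.
window.addEventListener('load', () => {
  // Defer one tick so loadEventStart has been populated.
  setTimeout(() => {
    const t = performance.timing;
    const documentComplete = t.loadEventStart - t.navigationStart;
    console.log('Document Complete (load event):', documentComplete, 'ms');
  }, 0);
});
```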

BEWARE
Some Front-End Optimization (FEO) packages can defer execution of JavaScript until after Document Complete.
Script deferral can be hugely misleading. Visual completeness may occur sooner, and Document Complete may be
significantly improved as well. Testers will even see evidence of the visual completeness in videos, filmstrips, and screen
shots.
However, despite visual completeness, the page may not be responsive until long after Document Complete: users may not be able to click links, scroll, or search. From a user's perspective, this is hugely frustrating and contributes to bounce rates. Imagine if someone switched your browser window for a screen shot, and you kept trying to click links but nothing happened!

Perhaps more importantly, this tactic improves Document Complete, but only at the cost of making the metric meaningless
altogether! One of the primary tenets of Document Complete is that the page is ready for the user. With script deferral, the
page is not ready for the user even if it looks ready.
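To make the mechanics concrete, here is a simplified sketch of how such a deferral scheme can work. The script type, selector, and delay are invented for illustration; real FEO products use their own conventions:

```typescript
// Simplified script-deferral sketch: scripts ship with a non-executing type so
// the parser ignores them, then get re-injected well after the load event.
// Document Complete fires early, but the page is not interactive until the
// deferred scripts finally run.
window.addEventListener('load', () => {
  const deferred = document.querySelectorAll<HTMLScriptElement>(
    'script[type="text/deferred-js"]' // illustrative marker type
  );
  setTimeout(() => {
    deferred.forEach((original) => {
      const script = document.createElement('script');
      if (original.src) {
        script.src = original.src;                 // external scripts
      } else {
        script.textContent = original.textContent; // inline scripts
      }
      document.body.appendChild(script);
    });
  }, 3000); // interactivity arrives ~3 seconds after Document Complete
});
```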

VISUALLY COMPLETE
Visually Complete is the moment that all visual elements are painted on the screen and visible to the user. Note that visual completeness is not the same as functional completeness. See the beware block above!

START RENDER (OR RENDER START)


The Start Render event is fired in the browser when something (anything!) is first painted on the screen. The paint event may be the whole page, but it could instead be a single word, single image, or single pixel. That may not sound significant: after all, if the content is not there and the user can't interact, then what is the value?
Keep in mind that, before Start Render fires, the user is staring at a blank white browser screen or, worse, the prior page they just tried to navigate away from. From the user's perspective, Start Render is the moment that the web site is clearly working properly.
There is significant evidence that Abandonment (bounce rate) is correlated very strongly with slow Start Render timings.
Arguably, Start Render is the most important metric of all.
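Modern browsers expose paint timestamps through the Paint Timing API, which makes a reasonable in-browser stand-in for Start Render. A minimal sketch (browser support for the 'paint' entry type varies):

```typescript
// Observe paint timing entries; 'first-paint' is the closest analogue to
// Start Render, and 'first-contentful-paint' is the first paint that contains
// actual content (text, an image, a canvas, etc.).
const paintObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`${entry.name}: ${Math.round(entry.startTime)} ms`);
  }
});

// 'buffered: true' also delivers paint entries that fired before we subscribed.
paintObserver.observe({ type: 'paint', buffered: true });
```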

FIRST BYTE
When the browser requests the base page, that request must traverse the Internet (whether or not a CDN is in play), then
the hosting facility must fetch (or assemble) the page, then the response must traverse the Internet again back to the device
requesting the page. First Byte is the time it takes for the first byte of the response to reach the browser. So, First Byte is a
function of twice network latency plus server latency. Other factors, like packet loss, may also impact this metric.
First Byte is not something your users directly perceive. However, the metric is still important because it sits on the critical path for everything the browser does.
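A minimal sketch of reading First Byte from the Navigation Timing API; the decomposition in the comments mirrors the description above:

```typescript
// Time to First Byte from Navigation Timing.
// Roughly: TTFB ~= DNS + connection setup + request + server think time
//               + the first response bytes traversing the network back,
// i.e. on the order of (2 x network latency) + server latency.
const t = performance.timing;
const firstByte = t.responseStart - t.navigationStart;     // as the user experiences it
const afterRequestSent = t.responseStart - t.requestStart; // excludes DNS/TCP/TLS setup
console.log('First Byte:', firstByte, 'ms (', afterRequestSent, 'ms after the request was sent)');
```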

SPEED INDEX
The Speed Index is a metric peculiar to WebPageTest (more on that below). Loosely speaking, the Speed Index is the average amount of time for visual components to be painted on the screen. More technically, if we plotted visual completeness over time and measured the area above the curve, we would have the Speed Index. That is, the Speed Index is the area above the visual completeness curve (the integral of one minus visual completeness over time).
Pages with a faster Start Render and a faster Visually Complete have a greater percentage of the screen painted at any given time, so the area above the curve is smaller and the Speed Index is lower (lower is better).
WebPageTest has excellent technical documentation on the Speed Index.
Note again that a fast Speed Index is not the same as a functional page. See the beware block above!
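To make the arithmetic concrete, here is a small worked sketch using an invented visual-completeness curve (the kind of data WebPageTest derives from filmstrip frames):

```typescript
// Speed Index from a sampled visual-completeness curve.
// Each sample is [time in ms, fraction of the final page already painted].
// The numbers below are invented purely to illustrate the calculation.
const samples: Array<[number, number]> = [
  [0, 0.0],
  [800, 0.1],  // Start Render: the first pixels appear
  [1500, 0.6],
  [2500, 0.9],
  [3200, 1.0], // Visually Complete
];

// Integrate (1 - completeness) over time, treating completeness as a step
// function between samples (which is how filmstrip data behaves).
function speedIndex(curve: Array<[number, number]>): number {
  let area = 0;
  for (let i = 1; i < curve.length; i++) {
    const [prevTime, prevDone] = curve[i - 1];
    const [time] = curve[i];
    area += (1 - prevDone) * (time - prevTime);
  }
  return area; // lower is better
}

console.log('Speed Index:', speedIndex(samples)); // 1900 for the data above
```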

TOOLS THAT SUPPORT USER EXPERIENCE METRICS


REAL USER MONITORING (RUM) TOOLS
Middle-mile (or backbone) testing tools are great for measuring availability from the broader Internet, but they never reflect the experience your users are actually seeing, especially those using wireless connectivity (even Wi-Fi!).
RUM tools are the best way to fill this gap. Basically, performance data is collected from your end users as they browse your site. RUM tools track all of the above metrics (except Speed Index) and represent exactly what your users are seeing (with one or two exceptions; see below). RUM tools are really easy to install: just paste in a JavaScript tag.
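Conceptually, that tag boils down to something like the toy sketch below. The /rum-collect endpoint and the payload shape are invented for illustration; production tools like Boomerang.js do far more (sampling, SPA support, and so on):

```typescript
// Toy RUM tag: read user-experience timings in the browser and beacon them
// to a collection endpoint. Endpoint and payload shape are illustrative only.
window.addEventListener('load', () => {
  setTimeout(() => {
    const t = performance.timing;
    const firstPaint = performance
      .getEntriesByType('paint')
      .find((e) => e.name === 'first-paint');

    const payload = {
      url: location.href,
      firstByte: t.responseStart - t.navigationStart,
      startRender: firstPaint ? Math.round(firstPaint.startTime) : null,
      documentComplete: t.loadEventStart - t.navigationStart,
    };

    // sendBeacon survives page unloads better than XHR or fetch.
    navigator.sendBeacon('/rum-collect', JSON.stringify(payload));
  }, 0);
});
```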
Pros:
- True user experience.
- Easy set-up.
- Support for a broad range of browsers and devices.
- Collects data from various real-world connection types, including high-latency wireless and packet-loss scenarios.
- Open source tools are available (Boomerang.js).

Cons:
- Inserting a third-party tag hurts performance to a degree. The act of measuring performance with RUM also hurts performance.
- Safari doesn't support the browser APIs on which RUM tools depend. Data for Safari browsers will be a subset of the metrics above, and the remaining metrics are approximated using JavaScript timers rather than hyper-accurate native browser code.
- Outliers can be extreme and must be removed before interpreting aggregate data.
- RUM requires live traffic. It is not possible to use RUM to measure the performance of a site pre-launch.

SYNTHETIC TOOLS
RUM tools are excellent for measuring performance, but sometimes we really need synthetic measurements, especially for evaluating the performance of pre-production environments (code/stack).

WEBPAGETEST
WebPageTest is an open-source, community-supported, and widely endorsed tool for measuring and analyzing performance. The testing nodes are community-sponsored and freely available; however, it is possible to set up private testing nodes for your own dedicated use. Scripting capabilities are vastly improved on private nodes.
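Runs can also be automated over WebPageTest's HTTP API with an API key. The sketch below shows the general shape (submit, then poll); the endpoints are real, but treat the exact result field names as assumptions that may vary between WebPageTest versions:

```typescript
// Sketch: submit a WebPageTest run via its HTTP API and poll for the result.
// Assumes Node 18+ (global fetch) and an API key in the WPT_API_KEY env var.
const WPT = 'https://www.webpagetest.org';
const API_KEY = process.env.WPT_API_KEY;

async function runTest(url: string): Promise<void> {
  const submit = await fetch(
    `${WPT}/runtest.php?url=${encodeURIComponent(url)}&k=${API_KEY}&f=json`
  );
  const { data } = await submit.json();

  // Poll until the test completes (statusCode 200), then read the median first view.
  while (true) {
    const poll = await fetch(`${WPT}/jsonResult.php?test=${data.testId}`);
    const result = await poll.json();
    if (result.statusCode === 200) {
      const fv = result.data.median.firstView; // field names may differ by version
      console.log('Start Render:', fv.render, 'ms; Speed Index:', fv.SpeedIndex);
      return;
    }
    await new Promise((resolve) => setTimeout(resolve, 10_000)); // wait 10s between polls
  }
}

runTest('https://www.example.com/');
```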
Pros:
- Measures user experience metrics, albeit from backbone locations.
- Supports traffic shaping, so testers can configure specific bandwidth, latency, or packet-loss scenarios. The traffic shaping is, of course, synthetic and thus less variable than true user connections, but it is still an excellent feature and quite representative of real-world conditions.
- Supports a subset of mobile clients, and a wide array of browsers.
Cons:
- Limited testing agent geographies available.
- Great analysis overall, but very limited statistical support.
- Extremely difficult to monitor performance on an ongoing basis or at regular intervals for a fixed period. Testers must set up private instances and WebPageTest Monitor in order to monitor performance.
- Nodes are not centrally managed and therefore have inconsistent base bandwidth and hardware specs. Furthermore, they can sometimes be unstable or unavailable.
- Supports multi-step transactions only on private nodes.

CATCHPOINT
Catchpoint is a commercial synthetic testing package. Catchpoint has a massive collection of domestic and international
testing nodes available, and a powerful statistical analysis package.
Pros:
- Tracks user experience metrics.
- Supports ongoing performance monitoring.
- Easy to provision complicated tests.
- Supports multi-step transactions.
- Captures waterfall diagrams for detailed analysis.
- Supports true mobile connection testing. The agents themselves are desktop machines, but they operate on wireless (Edge/3G/4G/LTE) modems.
- Excellent statistical package.

Cons:
- No traffic shaping available. All backbone tests have very high bandwidth and very low latency, so results are not necessarily representative of end-user performance.
- No support for mobile devices (note that mobile connections are supported).

KEYNOTE SYSTEMS
Keynote is also a commercial synthetic testing package. Keynote has existed for a LONG time, and formerly measured
only the Fully Loaded metric. However, they have recently revised their service to measure user experience metrics like
Document Complete and Start Render.
Pros:
- Tracks user experience metrics.
- Supports ongoing performance monitoring.
- Easy to provision complicated tests.
- Supports multi-step transactions.
- Captures waterfall diagrams for detailed analysis.

Cons:
- No traffic shaping available. All backbone tests have very high bandwidth and very low latency, so results are not necessarily representative of end-user performance.
- No support for mobile devices.

PERFORMANCE DATA ANALYSIS


So, you've picked your performance metrics and your tool, and now you have plenty of data. What are the next steps?
In the final installment of this series, we will discuss statistical analysis and interpretation of performance data sets.

