However, this technique has serious repercussions on client traffic, especially if
customer traffic is not prioritized into the provisioned bandwidth profile. Any
mismatch in this mapping process results in excessive packet loss, accompanied by
increased retransmission, higher latency and, most importantly, an inability to fill
the pipe. In many cases, utilization is constricted to 20% of usable bandwidth,
and we'll explore why.
Ultra-fast µ-shaping can be applied along with H-BWP to maximize link utilization and
greatly reduce packet discards without adding delay to latency-sensitive flows. By
queuing and scheduling lower-priority flows into unused bandwidth with packet-by-packet
granularity, service flows can approach 100% utilization of available capacity,
smoothing out bursty traffic and ensuring faster end-to-end packet delivery.
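As a rough illustration of this per-packet discipline (a minimal sketch with invented class names and rates, not the actual implementation), consider a scheduler that forwards priority traffic first and releases lower-priority packets only into the capacity left unused in each short scheduling interval:

```python
import collections

# Hypothetical illustration of per-packet shaping: priority traffic is
# forwarded first, and lower-priority packets are released only into the
# capacity left unused in each short scheduling interval.

LINK_RATE_BPS = 200_000_000      # provisioned CIR (200 Mbps, example value)
INTERVAL_S = 0.001               # 1 ms scheduling interval (illustrative)

class MicroShaper:
    def __init__(self):
        self.high = collections.deque()   # latency-sensitive flows
        self.low = collections.deque()    # bulk / bursty flows

    def enqueue(self, packet, priority):
        (self.high if priority else self.low).append(packet)

    def schedule_interval(self, send):
        """Send up to one interval's worth of bytes, high priority first."""
        budget = int(LINK_RATE_BPS / 8 * INTERVAL_S)  # bytes per interval
        for queue in (self.high, self.low):
            while queue and len(queue[0]) <= budget:
                packet = queue.popleft()
                budget -= len(packet)
                send(packet)
        # Leftover low-priority packets stay queued (smoothed) rather than
        # being dropped, so bursts are spread across later intervals.
```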
This cost-effective, single-ended optimization method is the most efficient approach to
bandwidth performance optimization - no complex configuration is required, and flow
prioritization can be easily tuned to a particular client's service mix.
µ-Shaping in Action
The results of H-QoS and µ-shaping are dramatic. The mismatch customers often
experience between provisioned bandwidth and speed test results can be eliminated
with properly implemented H-QoS and µ-shaping at the service edge.
Tests with and without µ-shaping on Internet connections of 15 and 30 Mbps show a
startling difference. µ-Shaped up-link traffic reaches full link capacity, while
unconditioned traffic uses only a fraction of the available bandwidth. As we will see, the
main reason for this is the nature of TCP transmission, and its relation to traffic bursts
and resulting packet loss.
One variable that operators can adjust on their provider equipment (PE) is the
Committed Burst Size (CBS) - the amount of instantaneous traffic beyond the CIR that
the network element accepts, over sub-millisecond scheduling windows, before
discarding packets.
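Conceptually, a CIR/CBS policer behaves like a token bucket: tokens accumulate at the CIR, the bucket holds at most CBS bytes, and packets arriving when too few tokens remain are discarded. A minimal sketch, with illustrative parameter handling:

```python
import time

class Policer:
    """Single-rate token-bucket policer: CIR fills the bucket, CBS caps it."""

    def __init__(self, cir_bps, cbs_bytes):
        self.rate = cir_bps / 8.0        # token fill rate in bytes/second
        self.cbs = cbs_bytes             # bucket depth: burst tolerance
        self.tokens = cbs_bytes
        self.last = time.monotonic()

    def conforms(self, packet_len):
        now = time.monotonic()
        # Replenish tokens at the CIR, never beyond the CBS.
        self.tokens = min(self.cbs, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True                  # forward: within CIR plus burst allowance
        return False                     # discard: burst exceeded the CBS

# With a small CBS, even traffic averaging well below the CIR is dropped
# whenever it arrives in bursts larger than the bucket depth.
```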
Typically, CBS is set to the lowest value possible (the default for most network
elements), protecting the provider network from traffic bursts. Tuning this parameter
upward can increase throughput significantly, but is undesirable for two reasons: (1)
allowing bursts into the provider network impacts overall aggregation and core network
performance, affecting other customers' traffic over shared infrastructure, and (2) this
technique pushes packet loss deeper into the network, where retransmission is more
expensive, resulting in longer delays and wasted provider-network bandwidth.
The more network elements there are along a service transmission path, the less
effective increasing CBS will be, as the lowest CBS value of any element the traffic
encounters determines, end to end, whether a burst survives.
Allowing traffic with a CBS of 512 KB is ineffective if the next network element allows
only 64 KB.
Note that the results shown in these graphs are those reported by Speedtest.net. Test
accuracy is somewhat limited, which is why, in some cases, the reported bandwidth
actually exceeds the CIR of the Internet connection. Despite these limitations, this test
is often what customers run to verify their service performance, and it is a
repeatable, relative performance gauge that reflects the true state of the network and
service configuration.
How is it that immediately thereafter, a client can experience such a significantly lower
throughput than what was demonstrated at turn-up?
The answer lies in the nature of testing vs. actual customer traffic. The goal of turn-up
testing is to validate that CIR, EIR, packet loss, delay variation and latency comply with
performance objectives. The service is filled with UDP traffic, as UDP can be
launched reliably at full line rate without the TCP retransmission requests that slow
down flows when packet loss occurs during the test. UDP doesn't care if
packets are discarded, so tests can be conducted reliably and with high repeatability.
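To illustrate why UDP suits turn-up testing, a test sender can pace datagrams at the target rate with no feedback loop at all; the sketch below uses placeholder addresses and rates:

```python
import socket
import time

# Illustrative UDP load generator: the transmission rate is fixed by the
# sender alone, so lost datagrams trigger no retransmission or slow-down.

TARGET_BPS = 30_000_000          # e.g. test a 30 Mbps service
PAYLOAD = b"\x00" * 1400         # near-MTU test frame

def blast(dest=("192.0.2.1", 5001), seconds=10):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = len(PAYLOAD) * 8 / TARGET_BPS   # seconds between datagrams
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sock.sendto(PAYLOAD, dest)   # fire and forget: no ACKs expected
        time.sleep(interval)         # coarse pacing; real testers pace in hardware
```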
But customer traffic is predominantly transmitted using TCP. The rate at which clients
transmit and receive TCP packets is governed by the degree of
packet loss in a particular session. The TCP protocol requires that every frame is
accounted for, with a receipt acknowledgement required to confirm transmission
success. However, if the sender waited for each individual packet to be acknowledged
before sending the next, throughput would be greatly impacted, especially over
wide-area connections.
TCP Windowing
TCP handles this problem with transmission windows - a collection of frames sent
together with the expectation that they will all arrive without loss. The size of each
transmission window adapts to the success of previous windows. If a packet is
lost within a window, all packets after the lost packet are retransmitted, and the window
size is reduced by roughly half. When windows are received successfully, the window
size grows again with continued error-free transmission.
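The dynamics described above can be sketched as a toy simulation of TCP's congestion window (the constants are illustrative and do not model any specific TCP implementation exactly):

```python
import random

# Toy model of TCP congestion-window behavior: the window is roughly
# halved on loss, doubles below a threshold (slow start), and then
# grows linearly (congestion avoidance).

def average_window(loss_factor, rounds=200, max_window=1000):
    window, ssthresh = 1.0, max_window / 2
    sizes = []
    for _ in range(rounds):
        if random.random() < loss_factor * window:  # larger windows risk more loss
            ssthresh = max(window / 2, 1.0)
            window = max(window / 2, 1.0)           # multiplicative decrease
        elif window < ssthresh:
            window = min(window * 2, max_window)    # slow start: rapid growth
        else:
            window += 1                             # congestion avoidance: linear
        sizes.append(window)
    return sum(sizes) / len(sizes)

# Even modest loss keeps the average window, and hence throughput,
# far below what the link could carry:
for p in (0.0001, 0.001, 0.01):
    print(f"loss factor {p}: avg window {average_window(p):.0f} segments")
```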
If packets are regularly lost, the window size will never increase to the size required to
achieve full link utilization. The mismatch between port (media) speed and the CIR of a
link ensures that this issue is ubiquitous. If a CPE connects to an access link at 1 Gbps,
but the CIR of the link is limited to 200 Mbps, bursts of traffic beyond the policed
200 Mbps will result in packet loss, TCP window reduction, and greatly impacted
throughput.
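The scale of this effect can be estimated with the well-known Mathis approximation, which bounds loss-limited TCP throughput at MSS / (RTT × √loss) regardless of port speed; the figures below are illustrative:

```python
from math import sqrt

# Mathis et al. approximation: sustained TCP throughput is bounded by
# MSS / (RTT * sqrt(loss_probability)), independent of the port speed.

MSS_BITS = 1460 * 8      # typical maximum segment size, in bits
RTT_S = 0.020            # 20 ms round trip (example value)

def tcp_ceiling_mbps(loss):
    return MSS_BITS / (RTT_S * sqrt(loss)) / 1e6

for loss in (1e-5, 1e-4, 1e-3):
    print(f"loss {loss:.0e}: ~{tcp_ceiling_mbps(loss):.0f} Mbps ceiling")

# At 0.1% loss the ceiling is roughly 18 Mbps: a 1 Gbps port policed to
# a 200 Mbps CIR never comes close to filling its committed rate.
```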
Standard traffic shaping is
unable to effectively smooth out these bursts, as many occur at sub-millisecond
timescales.
Priority Bypass
With instant traffic classification, priority flows bypass shaper queues and are
transmitted immediately. The effect is that the most latency-sensitive flows are handled
as though no shaping were implemented. Most network elements performing shaping
require all traffic to be buffered long enough to be inspected, which adds latency
to all flows, regardless of priority (a store-and-forward technique).
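One way to picture the bypass (hypothetical classifier and queue names): packets are classified on arrival, and latency-sensitive packets are transmitted at once while everything else enters the shaper queue:

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Packet:
    dscp: int          # DiffServ code point carried in the IP header
    payload: bytes

EF_DSCP = 46           # expedited forwarding: an example latency-sensitive class

def handle_packet(packet: Packet, shaper_queue: deque, transmit) -> None:
    # Classification happens on arrival, so priority packets are never
    # buffered behind the shaper (no store-and-forward delay).
    if packet.dscp == EF_DSCP:
        transmit(packet)             # bypass the shaper entirely
    else:
        shaper_queue.append(packet)  # shaped into unused bandwidth later
```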
H-QoS Implementation
When the MEF 10.3 specification for hierarchical QoS processing is implemented, a
service bandwidth envelope is shared between all flow priorities. CIR is consumed
hierarchically - any higher-priority flow's unused CIR is passed to the next lower-priority
flow, and so on, until all flows have maximized the use of the total service CIR. Any
remaining CIR in the envelope is added to the available EIR, and the same process is
repeated.
Compare this to the standard method of regulating each flow in isolation to ensure a
CIR is not exceeded: for example, policing two flows to 20 Mbps each to ensure a
40 Mbps CIR is respected results in unused bandwidth that could have been shared.
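The envelope arithmetic can be sketched as follows, with invented per-flow demands: each flow draws CIR in priority order, unused CIR cascades downward, and whatever remains joins the shared EIR pool:

```python
# Illustrative MEF 10.3-style envelope sharing: flows are listed from
# highest to lowest priority, each with a demand (Mbps) against a
# shared CIR/EIR envelope.

def allocate(demands_mbps, cir_mbps, eir_mbps):
    grants = []
    remaining_cir = cir_mbps
    # Pass 1: hand CIR down the priority hierarchy.
    for demand in demands_mbps:
        take = min(demand, remaining_cir)
        grants.append(take)
        remaining_cir -= take
    # Pass 2: leftover CIR joins the EIR pool and is shared the same way.
    excess = eir_mbps + remaining_cir
    for i, demand in enumerate(demands_mbps):
        take = min(demand - grants[i], excess)
        grants[i] += take
        excess -= take
    return grants

# Two flows policed in isolation to 20 Mbps each would grant [5, 20],
# wasting 15 Mbps; with envelope sharing, the idle CIR is reassigned:
print(allocate([5, 60], cir_mbps=40, eir_mbps=0))   # -> [5, 35]
```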
Applications sensitive to retransmission delays, with bursty traffic, or with a mix of
traffic priorities competing for limited bandwidth fall into this category.
Examples include off-net service optimization; mobile backhaul, where control plane
traffic and inter-cell synchronization must be maintained under heavy traffic loads;
financial networks, where algorithmic trading often results in micro-bursts; and data
center connectivity, where widely varying TCP traffic utilization over limited-bandwidth
connections affects latency and usability compared to on-site servers.
Bandwidth performance optimization benefits the provider as well as the client, with
smoother traffic entering the operator's network and the full purchased capacity delivered
to the customer. When implemented properly, it's a win-win situation with clear results
everyone can easily see in the resulting service performance.