However, this technique has serious repercussions on client traffic, especially if
customer traffic is not prioritized into the provisioned bandwidth profile. Any
mismatch in this mapping process results in excessive packet loss, accompanied by
increased retransmission, higher latency and, most importantly, an inability to fill
the pipe. In many cases, utilization is constricted to 20% of usable bandwidth,
and we'll explore why.
Ultra-fast µ-shaping can be applied along with H-BWP to maximize link utilization and
greatly reduce packet discards without adding delay to latency-sensitive flows. By
queuing and scheduling lower-priority flows into unused bandwidth with packet-by-packet
granularity, service flows can approach 100% utilization of available capacity,
smoothing out bursty traffic and ensuring faster end-to-end packet delivery.
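As a rough illustration of this per-packet discipline (a minimal sketch with invented class names and rates, not the actual implementation), consider a scheduler that forwards priority traffic first and releases lower-priority packets only into the capacity left unused in each short scheduling interval:

```python
import collections

# Hypothetical illustration of per-packet shaping: priority traffic is
# forwarded first, and lower-priority packets are released only into the
# capacity left unused in each short scheduling interval.

LINK_RATE_BPS = 200_000_000      # provisioned CIR (200 Mbps, example value)
INTERVAL_S = 0.001               # 1 ms scheduling interval (illustrative)

class MicroShaper:
    def __init__(self):
        self.high = collections.deque()   # latency-sensitive flows
        self.low = collections.deque()    # bulk / bursty flows

    def enqueue(self, packet, priority):
        (self.high if priority else self.low).append(packet)

    def schedule_interval(self, send):
        """Send up to one interval's worth of bytes, high priority first."""
        budget = int(LINK_RATE_BPS / 8 * INTERVAL_S)  # bytes per interval
        for queue in (self.high, self.low):
            while queue and len(queue[0]) <= budget:
                packet = queue.popleft()
                budget -= len(packet)
                send(packet)
        # Leftover low-priority packets stay queued (smoothed) rather than
        # being dropped, so bursts are spread across later intervals.
```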
This cost-effective, single-ended optimization method is the most efficient approach to
bandwidth performance optimization - no complex configuration is required, and flow
prioritization can be easily tuned to a particular client's service mix.
µ-Shaping in Action
The results of H-QoS and µ-shaping are dramatic. The mismatch customers often
experience between provisioned bandwidth and speed test results can be eliminated
with properly implemented H-QoS and µ-shaping at the service edge.
Tests with and without µ-shaping on Internet connections of 15 and 30 Mbps show a
startling difference. µ-Shaped up-link traffic reaches full link capacity, while
unconditioned traffic uses only a fraction of the available bandwidth. As we will see, the
main reason for this is the nature of TCP transmission, and its relation to traffic bursts
and resulting packet loss.
One variable that operators can adjust on their provider equipment (PE) is the
Committed Burst Size (CBS) - the amount of instantaneous traffic beyond the CIR that
the network element accepts, over sub-millisecond scheduling windows, before
discarding packets.
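Conceptually, a CIR/CBS policer behaves like a token bucket: tokens accumulate at the CIR, the bucket holds at most CBS bytes, and packets arriving when too few tokens remain are discarded. A minimal sketch, with illustrative parameter handling:

```python
import time

class Policer:
    """Single-rate token-bucket policer: CIR fills the bucket, CBS caps it."""

    def __init__(self, cir_bps, cbs_bytes):
        self.rate = cir_bps / 8.0        # token fill rate in bytes/second
        self.cbs = cbs_bytes             # bucket depth: burst tolerance
        self.tokens = cbs_bytes
        self.last = time.monotonic()

    def conforms(self, packet_len):
        now = time.monotonic()
        # Replenish tokens at the CIR, never beyond the CBS.
        self.tokens = min(self.cbs, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return True                  # forward: within CIR plus burst allowance
        return False                     # discard: burst exceeded the CBS

# With a small CBS, even traffic averaging well below the CIR is dropped
# whenever it arrives in bursts larger than the bucket depth.
```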
Typically, CBS is set to the lowest value possible (the default for most network
elements), protecting the provider network from traffic bursts. Tuning this parameter
upward can increase throughput significantly, but is undesirable for two reasons: (1)
allowing bursts into the provider network impacts overall aggregation and core network
performance, affecting other customers' traffic over shared infrastructure, and (2) this
technique pushes packet loss deeper into the network, where retransmission is more
expensive, resulting in longer delays and wasted provider-network bandwidth.
The more network elements there are along a service transmission path, the less
effective increasing CBS will be, as the lowest CBS value of any element the traffic
encounters determines, end to end, whether a burst survives.
Allowing traffic with a CBS of 512 KB is ineffective if the next network element allows
only 64 KB.
Note that the results shown in these graphs are those reported by Speedtest.net. Test
accuracy is somewhat limited, which is why, in some cases, the reported bandwidth
actually exceeds the CIR of the Internet connection. Despite these limitations, this test
is often what customers run to verify their service performance, and it is a
repeatable, relative performance gauge that reflects the true state of the network and
service configuration.
How is it that immediately thereafter, a client can experience such a significantly lower
throughput than what was demonstrated at turn-up?
The answer lies in the nature of testing vs. actual customer traffic. The goal of turn-up
testing is to validate that CIR, EIR, packet loss, delay variation and latency comply with
performance objectives. The service is filled with UDP traffic, as UDP can be
launched reliably at full line rate without the TCP retransmission requests that slow
down flows when packet loss occurs during the test. UDP doesn't care if
packets are discarded, so tests can be conducted reliably and with high repeatability.
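To illustrate why UDP suits turn-up testing, a test sender can pace datagrams at the target rate with no feedback loop at all; the sketch below uses placeholder addresses and rates:

```python
import socket
import time

# Illustrative UDP load generator: the transmission rate is fixed by the
# sender alone, so lost datagrams trigger no retransmission or slow-down.

TARGET_BPS = 30_000_000          # e.g. test a 30 Mbps service
PAYLOAD = b"\x00" * 1400         # near-MTU test frame

def blast(dest=("192.0.2.1", 5001), seconds=10):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = len(PAYLOAD) * 8 / TARGET_BPS   # seconds between datagrams
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sock.sendto(PAYLOAD, dest)   # fire and forget: no ACKs expected
        time.sleep(interval)         # coarse pacing; real testers pace in hardware
```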
But customer traffic is predominantly transmitted using TCP. The rate at which clients
transmit and receive TCP packets is governed by the degree of
packet loss in a particular session. The TCP protocol requires that every frame is
accounted for, with a receipt acknowledgement required to confirm transmission
success. However, if the sender waited for each individual packet to be acknowledged
before sending the next, throughput would be greatly impacted, especially over
wide-area connections.
TCP Windowing
TCP handles this problem with transmission windows - a collection of frames sent
together with the expectation that they will all arrive without loss. The size of each
transmission window adapts to the success of previous windows. If a packet is
lost within a window, all packets after the lost packet are retransmitted, and the window
size is reduced by roughly half. When windows are received successfully, the window
size grows again with continued error-free transmission.
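The dynamics described above can be sketched as a toy simulation of TCP's congestion window (the constants are illustrative and do not model any specific TCP implementation exactly):

```python
import random

# Toy model of TCP congestion-window behavior: the window is roughly
# halved on loss, doubles below a threshold (slow start), and then
# grows linearly (congestion avoidance).

def average_window(loss_factor, rounds=200, max_window=1000):
    window, ssthresh = 1.0, max_window / 2
    sizes = []
    for _ in range(rounds):
        if random.random() < loss_factor * window:  # larger windows risk more loss
            ssthresh = max(window / 2, 1.0)
            window = max(window / 2, 1.0)           # multiplicative decrease
        elif window < ssthresh:
            window = min(window * 2, max_window)    # slow start: rapid growth
        else:
            window += 1                             # congestion avoidance: linear
        sizes.append(window)
    return sum(sizes) / len(sizes)

# Even modest loss keeps the average window, and hence throughput,
# far below what the link could carry:
for p in (0.0001, 0.001, 0.01):
    print(f"loss factor {p}: avg window {average_window(p):.0f} segments")
```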
If packets are regularly lost, the window size will never increase to the size required to
achieve full link utilization. The mismatch between port (media) speed and the CIR of a
link ensures that this issue is ubiquitous. If a CPE connects to an access link at 1 Gbps,
but the CIR of the link is limited to 200 Mbps, bursts of traffic beyond the policed
200 Mbps will result in packet loss, TCP window reduction, and greatly impacted
throughput.
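The scale of this effect can be estimated with the well-known Mathis approximation, which bounds loss-limited TCP throughput at MSS / (RTT × √loss) regardless of port speed; the figures below are illustrative:

```python
from math import sqrt

# Mathis et al. approximation: sustained TCP throughput is bounded by
# MSS / (RTT * sqrt(loss_probability)), independent of the port speed.

MSS_BITS = 1460 * 8      # typical maximum segment size, in bits
RTT_S = 0.020            # 20 ms round trip (example value)

def tcp_ceiling_mbps(loss):
    return MSS_BITS / (RTT_S * sqrt(loss)) / 1e6

for loss in (1e-5, 1e-4, 1e-3):
    print(f"loss {loss:.0e}: ~{tcp_ceiling_mbps(loss):.0f} Mbps ceiling")

# At 0.1% loss the ceiling is roughly 18 Mbps: a 1 Gbps port policed to
# a 200 Mbps CIR never comes close to filling its committed rate.
```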
Standard traffic shaping is
unable to effectively smooth out these bursts, as many occur at sub-millisecond
timescales.
Priority Bypass
With instant traffic classification, priority flows bypass shaper queues and are
transmitted immediately. The effect is that the most latency-sensitive flows are handled
as though no shaping were implemented. Most network elements performing shaping
require all traffic to be buffered long enough to be inspected, which adds latency
to all flows, regardless of priority (a store-and-forward technique).
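One way to picture the bypass (hypothetical classifier and queue names): packets are classified on arrival, and latency-sensitive packets are transmitted at once while everything else enters the shaper queue:

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Packet:
    dscp: int          # DiffServ code point carried in the IP header
    payload: bytes

EF_DSCP = 46           # expedited forwarding: an example latency-sensitive class

def handle_packet(packet: Packet, shaper_queue: deque, transmit) -> None:
    # Classification happens on arrival, so priority packets are never
    # buffered behind the shaper (no store-and-forward delay).
    if packet.dscp == EF_DSCP:
        transmit(packet)             # bypass the shaper entirely
    else:
        shaper_queue.append(packet)  # shaped into unused bandwidth later
```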
H-QoS Implementation
When the MEF 10.3 specification for hierarchical QoS processing is implemented, a
service bandwidth envelope is shared between all flow priorities. CIR is consumed
hierarchically - any higher-priority flow's unused CIR is passed to the next lower-priority
flow, and so on, until all flows have maximized the use of the total service CIR. Any
remaining CIR in the envelope is added to the available EIR, and the same process is
repeated.
Compare this to the standard method of regulating each flow in isolation to ensure a
CIR is not exceeded: for example, policing two flows to 20 Mbps each to ensure a
40 Mbps CIR is respected results in unused bandwidth that could have been shared.
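The envelope arithmetic can be sketched as follows, with invented per-flow demands: each flow draws CIR in priority order, unused CIR cascades downward, and whatever remains joins the shared EIR pool:

```python
# Illustrative MEF 10.3-style envelope sharing: flows are listed from
# highest to lowest priority, each with a demand (Mbps) against a
# shared CIR/EIR envelope.

def allocate(demands_mbps, cir_mbps, eir_mbps):
    grants = []
    remaining_cir = cir_mbps
    # Pass 1: hand CIR down the priority hierarchy.
    for demand in demands_mbps:
        take = min(demand, remaining_cir)
        grants.append(take)
        remaining_cir -= take
    # Pass 2: leftover CIR joins the EIR pool and is shared the same way.
    excess = eir_mbps + remaining_cir
    for i, demand in enumerate(demands_mbps):
        take = min(demand - grants[i], excess)
        grants[i] += take
        excess -= take
    return grants

# Two flows policed in isolation to 20 Mbps each would grant [5, 20],
# wasting 15 Mbps; with envelope sharing, the idle CIR is reassigned:
print(allocate([5, 60], cir_mbps=40, eir_mbps=0))   # -> [5, 35]
```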
Applications sensitive to retransmission delays, with bursty traffic, or with a mix of
traffic priorities competing for limited bandwidth fall into this category.
Examples include off-net service optimization; mobile backhaul, where control plane
traffic and inter-cell synchronization must be maintained under heavy traffic loads;
financial networks, where algorithmic trading often results in micro-bursts; and data
center connectivity, where widely varying TCP traffic utilization over limited-bandwidth
connections affects latency and usability compared to on-site servers.
Bandwidth performance optimization benefits the provider as well as the client, with
smoother traffic entering the operator's network and the full purchased capacity delivered
to the customer. When implemented properly, it's a win-win situation with clear results
everyone can easily see in the resulting service performance.