
Addis Ababa University

Department of Computer Science

Advanced Computer
Networking
(CS 723)
Chapter 3: Transport Layer

Chapter 3: Transport Layer


3.1 Transport-layer services
3.2 Multiplexing and de-multiplexing
3.3 UDP: Connectionless transport
3.4 TCP: Connection-oriented transport
segment structure
reliable data transfer
flow control
connection management
congestion control

Transport Layer

3-2

Transport vs. network layer


network layer: logical communication between hosts
transport layer: logical communication between processes

relies on and enhances network layer services


Transport Layer
accepts data from above, splits it up into smaller units if need be, passes them to the network layer, and ensures that the pieces all arrive correctly at the other end
allows peer entities on the source and destination machines to hold conversations
most commonly delivers messages in the order they were sent
other kinds of service: delivery of isolated messages without guarantee of the order of delivery, and broadcasting messages to multiple destinations

Transport services and protocols


provide logical communication between app processes running on different hosts
transport protocols run in end systems
send side: breaks app messages into segments, passes to network layer
rcv side: reassembles segments into messages, passes to app layer
more than one transport protocol available to apps
Internet: TCP and UDP

[Figure: logical end-end transport between the protocol stacks (application, transport, network, data link, physical) of two end systems]

Internet transport-layer protocols


Two end-to-end transport protocols
TCP - Transmission Control Protocol
a reliable, connection-oriented protocol that allows a byte stream to be delivered without error
handles flow control to make sure that a fast sender does not swamp a slow receiver
UDP - User Datagram Protocol
an unreliable, connectionless protocol
for applications that do not want TCP's sequencing or flow control and wish to provide their own
for applications where prompt delivery is more important than accurate delivery, e.g., audio and video

Internet transport-layer protocols


Three phases are involved in TCP
connection establishment (agreement to exchange data)
data transfer (data and control information exchanged)
connection termination (termination request), which either of the two parties may initiate
the key characteristic of connection-oriented data transfer is that sequencing is used
each side sequentially numbers the data that it sends to the other side (TCP numbers the bytes carried in its segments)

Internet transport-layer protocols


UDP (connectionless)
unreliable
unordered delivery
no delay guarantees
no bandwidth guarantees

TCP (connection-oriented)
reliable
in-order delivery
congestion control
flow control
connection setup

[Figure: logical end-end transport between two end systems with full protocol stacks, across intermediate nodes that implement only the network, data link, and physical layers]

3.2 Multiplexing/demultiplexing
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
Demultiplexing at rcv host: delivering received segments to correct socket

[Figure: hosts 1, 2, and 3, each with an application/transport/network/link/physical stack; processes P1 to P4 in the application layer attach to the transport layer through sockets]

How demultiplexing works


host receives IP datagrams
each datagram has source IP address, destination IP address
each datagram carries 1 transport-layer segment
each segment has source, destination port number
host uses IP addresses & port numbers to direct segment to appropriate socket

TCP/UDP segment format (each row is 32 bits wide):
source port # | dest port #
other header fields
application data (message)

Connectionless demultiplexing
Create sockets with port numbers:

DatagramSocket mySocket1 = new DatagramSocket(12534);
DatagramSocket mySocket2 = new DatagramSocket(12535);

UDP socket identified by two-tuple: (dest IP address, dest port number)
When host receives UDP segment:
checks destination port number in segment
directs UDP segment to socket with that port number
IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket
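
The lookup described above can be sketched as a table keyed only by destination port. This is a minimal illustration, not how an operating system implements it: the class `UdpDemuxSketch`, its methods, and the string socket names are hypothetical, and the destination IP half of the two-tuple is left out because the sketch models a single host.

```java
import java.util.HashMap;
import java.util.Map;

final class UdpDemuxSketch {
    // Hypothetical demux table: destination port -> name of the bound socket.
    private final Map<Integer, String> socketsByPort = new HashMap<>();

    void bind(int port, String socketName) {
        socketsByPort.put(port, socketName);
    }

    // Only the destination port selects the socket.
    // Source IP and source port are ignored, as in connectionless demultiplexing.
    String demux(String srcIp, int srcPort, int dstPort) {
        return socketsByPort.getOrDefault(dstPort, "no socket (segment dropped)");
    }

    public static void main(String[] args) {
        UdpDemuxSketch host = new UdpDemuxSketch();
        host.bind(12534, "mySocket1");
        host.bind(12535, "mySocket2");
        // Two different senders, same destination port: same socket.
        System.out.println(host.demux("A", 9157, 12534)); // prints "mySocket1"
        System.out.println(host.demux("B", 5775, 12534)); // prints "mySocket1"
    }
}
```

Because the source fields never enter the lookup, the receiving application has to read them from the datagram itself if it wants to reply.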

Connectionless demux (cont)


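DatagramSocket serverSocket = new DatagramSocket(6428);

[Figure: two clients exchange UDP segments with the server socket on port 6428]
client IP A -> server IP C: SP 9157, DP 6428
server IP C -> client IP A: SP 6428, DP 9157
client IP B -> server IP C: SP 5775, DP 6428
server IP C -> client IP B: SP 6428, DP 5775

SP provides return address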

Connection-oriented demux
TCP socket identified by 4-tuple:
source IP address
source port number
dest IP address
dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
non-persistent HTTP will have different socket for each request
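
For contrast with the UDP case, a connection-oriented lookup can be sketched with the whole 4-tuple as the key. The class `TcpDemuxSketch`, the nested `FourTuple` type, and the socket names are hypothetical names chosen for illustration; a real TCP stack also handles listening sockets and connection state, which are omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class TcpDemuxSketch {
    // Hypothetical key type: all four values together identify one socket.
    static final class FourTuple {
        final String srcIp;
        final int srcPort;
        final String dstIp;
        final int dstPort;

        FourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
            this.srcIp = srcIp;
            this.srcPort = srcPort;
            this.dstIp = dstIp;
            this.dstPort = dstPort;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FourTuple)) return false;
            FourTuple t = (FourTuple) o;
            return srcIp.equals(t.srcIp) && srcPort == t.srcPort
                    && dstIp.equals(t.dstIp) && dstPort == t.dstPort;
        }

        @Override
        public int hashCode() {
            return Objects.hash(srcIp, srcPort, dstIp, dstPort);
        }
    }

    private final Map<FourTuple, String> sockets = new HashMap<>();

    void register(FourTuple key, String socketName) {
        sockets.put(key, socketName);
    }

    String demux(FourTuple key) {
        return sockets.getOrDefault(key, "no matching socket");
    }

    public static void main(String[] args) {
        TcpDemuxSketch server = new TcpDemuxSketch();
        server.register(new FourTuple("A", 9157, "C", 80), "socket-1");
        server.register(new FourTuple("B", 5775, "C", 80), "socket-2");
        server.register(new FourTuple("B", 9157, "C", 80), "socket-3");
        // Same destination port 80, same source port 9157, different source IP.
        System.out.println(server.demux(new FourTuple("A", 9157, "C", 80))); // prints "socket-1"
        System.out.println(server.demux(new FourTuple("B", 9157, "C", 80))); // prints "socket-3"
    }
}
```

All three connections target port 80 on the same server, yet each resolves to its own socket because the source IP or source port differs.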


Connection-oriented demux (cont)


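[Figure: server IP C runs several processes, each with its own connection socket; arriving segments are demultiplexed by their full 4-tuple]
client IP A -> server IP C: SP 9157, DP 80, S-IP A, D-IP C
client IP B -> server IP C: SP 5775, DP 80, S-IP B, D-IP C
client IP B -> server IP C: SP 9157, DP 80, S-IP B, D-IP C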

UDP: User Datagram Protocol


[RFC 768]
UDP is best effort service, as UDP segments may be:
lost
delivered out of order
connectionless:
no handshaking between UDP sender, receiver
each UDP segment handled independently of others

Why is there a UDP?
no connection establishment (which can add delay)
simple: no connection state at sender, receiver
small segment header
no congestion control: UDP can blast away as fast as desired

UDP: more
often used for streaming multimedia apps
loss tolerant
rate sensitive
other UDP uses
DNS
SNMP
reliable transfer over UDP: add reliability at application layer
application-specific error recovery!

UDP segment format (each row is 32 bits wide):
source port # | dest port #
length | checksum
application data (message)

length: length, in bytes, of UDP segment, including header

UDP checksum
Goal: detect errors (e.g., flipped bits) in transmitted segment
Sender:
treat segment contents as sequence of 16-bit integers
checksum: 1s complement of the 1s complement sum of segment contents
sender puts checksum value into UDP checksum field

Receiver:
compute checksum of received segment
check if computed checksum equals checksum field value:
NO - error detected
YES - no error detected (errors may still be present)

Checksum Example
Note
When adding numbers, a carryout from the most significant bit needs to be added to the result

Example: add two 16-bit integers

               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
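
The same arithmetic can be sketched in code. The class and method names below (`InternetChecksumSketch`, `onesComplementSum`, `checksum`) are hypothetical; the two input words are the 16-bit operands 1110011001100110 (0xE666) and 1101010101010101 (0xD555). A real UDP checksum also covers a pseudo-header and pads odd-length data, which this sketch does not attempt.

```java
final class InternetChecksumSketch {
    // 1s complement sum of 16-bit words: any carry out of bit 15 is
    // wrapped around and added back into the low 16 bits.
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            sum = (sum & 0xFFFF) + (sum >>> 16); // wraparound of the carry
        }
        return sum & 0xFFFF;
    }

    // The checksum is the 1s complement (bitwise inverse) of that sum.
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    public static void main(String[] args) {
        int[] words = {0xE666, 0xD555};
        System.out.println(String.format("%04X", onesComplementSum(words))); // prints "BBBC"
        System.out.println(String.format("%04X", checksum(words)));          // prints "4443"
    }
}
```

One way a receiver can check a segment is to sum all words including the checksum field: with no detected error the 1s complement sum is 0xFFFF (all ones).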

TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
point-to-point:
one sender, one receiver
reliable, in-order byte stream:
no message boundaries
pipelined:
TCP congestion and flow control set window size
send & receive buffers
full duplex data:
bi-directional data flow in same connection
MSS: maximum segment size
connection-oriented:
handshaking (exchange of control msgs) inits sender, receiver state before data exchange
flow controlled:
sender will not overwhelm receiver

[Figure: the sending application writes data through its socket door into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the receiving application reads data through its socket door]

TCP segment structure


TCP segment format (each row is 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | Receive window
checksum | Urg data pnter
Options (variable length)
application data (variable length)

sequence number, acknowledgement number: counting by bytes of data (not segments!)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
Receive window: # bytes rcvr willing to accept
checksum: Internet checksum (as in UDP)

TCP seq. #s and ACKs


Seq. #s:
byte stream number of first byte in segment's data
ACKs:
seq # of next byte expected from other side
cumulative ACK

simple telnet scenario (time runs downward):
User at Host A types 'C'
Host A -> Host B: Seq=42, ACK=79, data = 'C'
Host B ACKs receipt of 'C', echoes back 'C'
Host B -> Host A: Seq=79, ACK=43, data = 'C'
Host A ACKs receipt of echoed 'C'
Host A -> Host B: Seq=43, ACK=80

TCP Round Trip Time and Timeout


Q: how to set TCP timeout value?
longer than RTT
If too short: premature timeout
unnecessary retransmissions
If too long: slow reaction to segment loss

Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
ignore retransmissions
SampleRTT will vary
average several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout


EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
Exponential weighted moving average
influence of past sample decreases exponentially fast
typical value: α = 0.125

TCP Round Trip Time and Timeout


Setting the timeout
EstimatedRTT plus safety margin
large variation in EstimatedRTT -> larger safety margin
first estimate how much SampleRTT deviates from EstimatedRTT:

DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(typically, β = 0.25)

Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
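
The two moving averages and the timeout formula can be sketched together as a small estimator. The class name `RttEstimatorSketch`, the initial values passed to the constructor, and the starting deviation of zero are assumptions made for illustration. The update order is also an assumption relative to the slides: the deviation is updated first, using the previous EstimatedRTT, which is the order RFC 6298 specifies.

```java
final class RttEstimatorSketch {
    static final double ALPHA = 0.125; // weight of the newest sample in EstimatedRTT
    static final double BETA = 0.25;   // weight of the newest deviation in DevRTT

    double estimatedRtt;
    double devRtt;

    // Assumption: the caller supplies a starting estimate; deviation starts at 0.
    RttEstimatorSketch(double initialRtt) {
        this.estimatedRtt = initialRtt;
        this.devRtt = 0.0;
    }

    // Feed one SampleRTT (measured only for segments that were not retransmitted).
    void onSample(double sampleRtt) {
        // DevRTT first, so it uses the EstimatedRTT from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimatorSketch est = new RttEstimatorSketch(100.0); // milliseconds
        est.onSample(120.0);
        System.out.println(est.estimatedRtt);      // prints 102.5
        System.out.println(est.timeoutInterval()); // prints 122.5
    }
}
```

A single sample 20 ms above the estimate moves EstimatedRTT by only 2.5 ms, while the safety margin of 4 * DevRTT grows by 20 ms, so the timeout reacts to variation faster than the average does.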

TCP reliable data transfer


TCP creates rdt service on top of IP's unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
Retransmissions are triggered by:
timeout events
duplicate acks
Initially consider simplified TCP sender:
ignore duplicate acks
ignore flow control, congestion control

TCP sender events:


data rcvd from app:
Create segment with seq #
seq # is byte-stream number of first data byte in segment
start timer if not already running (think of timer as for oldest unacked segment)
expiration interval: TimeOutInterval
timeout:
retransmit segment that caused timeout
restart timer
Ack rcvd:
If acknowledges previously unacked segments
update what is known to be acked
start timer if there are outstanding segments

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)

        event: timer timeout
            retransmit not-yet-acknowledged segment with
                smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }

} /* end of loop forever */
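
The event loop above can be modelled as three handler methods that only track sender state. This is a sketch under stated assumptions, not a working TCP: the class `SimplifiedTcpSenderSketch` and its method names are hypothetical, no segments are actually sent, the timer is a boolean flag, sequence-number wraparound is ignored, and the timer is explicitly stopped when nothing is outstanding (the pseudocode leaves that case implicit).

```java
final class SimplifiedTcpSenderSketch {
    long sendBase;        // sequence number of the oldest unacknowledged byte
    long nextSeqNum;      // sequence number of the next byte to be sent
    boolean timerRunning; // stands in for the single retransmission timer

    SimplifiedTcpSenderSketch(long initialSeqNum) {
        this.sendBase = initialSeqNum;
        this.nextSeqNum = initialSeqNum;
    }

    // Event: data received from the application above.
    // Returns the sequence number placed in the new segment.
    long onAppData(int length) {
        long seq = nextSeqNum;
        if (!timerRunning) {
            timerRunning = true; // start timer for the oldest unacked segment
        }
        nextSeqNum += length;    // the segment would be passed to IP here
        return seq;
    }

    // Event: timer timeout.
    // Returns the sequence number of the segment to retransmit.
    long onTimeout() {
        timerRunning = true;     // restart timer
        return sendBase;         // oldest not-yet-acknowledged segment starts here
    }

    // Event: ACK received, with ACK field value y (cumulative).
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;
            // Keep the timer only while segments are still outstanding.
            timerRunning = sendBase < nextSeqNum;
        }
    }

    public static void main(String[] args) {
        SimplifiedTcpSenderSketch sender = new SimplifiedTcpSenderSketch(100);
        sender.onAppData(10);                   // segment with seq 100
        sender.onAppData(20);                   // segment with seq 110
        System.out.println(sender.onTimeout()); // prints 100
        sender.onAck(130);                      // cumulative ACK covers both segments
        System.out.println(sender.timerRunning); // prints false
    }
}
```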


TCP Flow Control


receive side of TCP connection has a receive buffer
app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate

TCP Flow control: how it works


(Suppose TCP receiver discards out-of-order segments)
spare room in buffer
= RcvWindow
= RcvBuffer - [LastByteRcvd - LastByteRead]
Rcvr advertises spare room by including value of RcvWindow in segments
Sender limits unACKed data to RcvWindow
guarantees receive buffer doesn't overflow
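
A short numeric sketch of the two rules above: the receiver computes its spare room, and the sender checks that unACKed data stays within it. The class `FlowControlSketch`, the method names, and the byte counts in `main` are made up for illustration.

```java
final class FlowControlSketch {
    // Receiver side: spare room in the receive buffer.
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    // Sender side: sending is allowed only if unACKed data, including the
    // new bytes, stays within the advertised RcvWindow.
    static boolean senderMaySend(long lastByteSent, long lastByteAcked,
                                 long bytesToSend, long rcvWindow) {
        return (lastByteSent - lastByteAcked) + bytesToSend <= rcvWindow;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 4096-byte buffer, 3000 bytes received, 1000 read.
        long window = rcvWindow(4096, 3000, 1000);
        System.out.println(window);                                  // prints 2096
        System.out.println(senderMaySend(5000, 4000, 1000, window)); // prints true
        System.out.println(senderMaySend(5000, 4000, 1200, window)); // prints false
    }
}
```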

TCP Connection Management


Recall: TCP sender, receiver establish connection before exchanging data segments
initialize TCP variables:
seq. #s
buffers, flow control info (e.g. RcvWindow)
client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);   // portNumber is an int
server: contacted by client
Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client host sends TCP SYN segment to server
specifies initial seq #
no data
Step 2: server host receives SYN, replies with SYNACK segment
server allocates buffers
specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data

TCP Connection Management (cont.)


Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: timeline between client and server: client close, FIN to server, ACK back to client; server close, FIN to client, ACK back to server; client in timed wait; closed]

TCP Connection Management (cont.)


Step 3: client receives FIN, replies with ACK.
Enters timed wait - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.

[Figure: timeline between client and server: client closing, FIN to server, ACK back to client; server closing, FIN to client, ACK back to server; client in timed wait, then closed; server closed]

Principles of Congestion Control


Congestion:
informally: too many sources sending too much data too fast for network to handle
different from flow control
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!

Approaches towards congestion control


Two broad approaches towards congestion control:
End-end congestion control:
no explicit feedback from network
congestion inferred from end-system observed loss, delay
approach taken by TCP
Network-assisted congestion control:
routers provide feedback to end systems
single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
explicit rate sender should send at

Case study: ATM ABR congestion control


ABR: available bit rate:
elastic service
if sender's path underloaded: sender should use available bandwidth
if sender's path congested: sender throttled to minimum guaranteed rate
RM (resource management) cells:
sent by sender, interspersed with data cells
bits in RM cell set by switches (network-assisted)
NI bit: no increase in rate (mild congestion)
CI bit: congestion indication
RM cells returned to sender by receiver, with bits intact

TCP congestion control: additive increase, multiplicative decrease

Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
additive increase: increase CongWin by 1 MSS (Maximum Segment Size) every RTT until loss detected
multiplicative decrease: cut CongWin in half after loss
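
The sawtooth that additive increase and multiplicative decrease produce can be sketched as a per-RTT simulation. Everything here is an illustrative assumption: the class `AimdSketch`, the window measured in MSS units, the loss pattern supplied by the caller, and the floor of 1 MSS on the halved window.

```java
import java.util.Arrays;

final class AimdSketch {
    // Returns CongWin (in MSS units) at the start of each RTT round.
    // lossInRound[i] == true means a loss is detected during round i
    // (an assumed input; a real sender infers loss from timeouts or dup ACKs).
    static double[] simulate(double initialWin, boolean[] lossInRound) {
        double[] trace = new double[lossInRound.length];
        double congWin = initialWin;
        for (int i = 0; i < lossInRound.length; i++) {
            trace[i] = congWin;
            if (lossInRound[i]) {
                congWin = Math.max(congWin / 2.0, 1.0); // multiplicative decrease
            } else {
                congWin += 1.0;                          // additive increase: +1 MSS per RTT
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        boolean[] loss = {false, false, false, true, false, false, true, false};
        System.out.println(Arrays.toString(simulate(8.0, loss)));
        // prints [8.0, 9.0, 10.0, 11.0, 5.5, 6.5, 7.5, 3.75]
    }
}
```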


TCP Congestion Control


sender limits transmission:
LastByteSent - LastByteAcked ≤ CongWin
Roughly,
rate = CongWin / RTT  Bytes/sec
CongWin is dynamic, function of perceived network congestion
How does sender perceive congestion?
loss event = timeout or 3 duplicate acks
TCP sender reduces rate (CongWin) after loss event

TCP Slow Start


When connection begins, CongWin = 1 MSS
available bandwidth may be >> MSS/RTT
desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event

TCP Slow Start (more)


When connection begins, increase rate exponentially until first loss event:
double CongWin every RTT
done by incrementing CongWin for every ACK received
Summary: initial rate is slow but ramps up exponentially fast

[Figure: timeline between Host A and Host B: A sends one segment in the first RTT, two segments in the second, four segments in the third]

Refinement: inferring loss


After 3 dup ACKs:
CongWin is cut in half
window then grows linearly
But after timeout event:
CongWin instead set to 1 MSS;
window then grows exponentially to a threshold, then grows linearly

Philosophy:
3 dup ACKs indicates network capable of delivering some segments
timeout indicates a more alarming congestion scenario

Summary: TCP Congestion Control


When CongWin is below Threshold, sender is in slow-start phase, window grows exponentially.
When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold.
When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.

TCP sender congestion control


State: Slow Start (SS)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS; if (CongWin > Threshold) set state to Congestion Avoidance
Commentary: Resulting in a doubling of CongWin every RTT

State: Congestion Avoidance (CA)
Event: ACK receipt for previously unacked data
TCP Sender Action: CongWin = CongWin + MSS * (MSS/CongWin)
Commentary: Additive increase, resulting in increase of CongWin by 1 MSS every RTT

State: SS or CA
Event: Loss event detected by triple duplicate ACK
TCP Sender Action: Threshold = CongWin/2, CongWin = Threshold, set state to Congestion Avoidance
Commentary: Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.

State: SS or CA
Event: Timeout
TCP Sender Action: Threshold = CongWin/2, CongWin = 1 MSS, set state to Slow Start
Commentary: Enter slow start

State: SS or CA
Event: Duplicate ACK
TCP Sender Action: Increment duplicate ACK count for segment being acked
Commentary: CongWin and Threshold not changed
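
The table's rows map naturally onto a small state machine, sketched below. The class `TcpCongestionControlSketch`, its field and method names, and the constructor parameters (MSS and initial Threshold) are hypothetical. The sketch resets the duplicate ACK counter on every new ACK and on timeout, which the table leaves implicit, and it does not model retransmission itself.

```java
final class TcpCongestionControlSketch {
    enum State { SLOW_START, CONGESTION_AVOIDANCE }

    final double mss;
    State state = State.SLOW_START;
    double congWin;
    double threshold;
    int dupAckCount = 0;

    // Assumption: the caller chooses MSS and the initial Threshold (in bytes).
    TcpCongestionControlSketch(double mss, double initialThreshold) {
        this.mss = mss;
        this.congWin = mss;
        this.threshold = initialThreshold;
    }

    // Event: ACK receipt for previously unacked data.
    void onNewAck() {
        dupAckCount = 0;
        if (state == State.SLOW_START) {
            congWin += mss;                   // doubles CongWin every RTT
            if (congWin > threshold) {
                state = State.CONGESTION_AVOIDANCE;
            }
        } else {
            congWin += mss * (mss / congWin); // about +1 MSS per RTT
        }
    }

    // Event: duplicate ACK. The third one counts as a loss event.
    void onDuplicateAck() {
        dupAckCount++;
        if (dupAckCount == 3) {
            threshold = congWin / 2;
            congWin = Math.max(threshold, mss); // CongWin will not drop below 1 MSS
            state = State.CONGESTION_AVOIDANCE;
        }
    }

    // Event: timeout.
    void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
        state = State.SLOW_START;
        dupAckCount = 0;
    }

    public static void main(String[] args) {
        TcpCongestionControlSketch cc = new TcpCongestionControlSketch(1000, 4000);
        for (int i = 0; i < 4; i++) cc.onNewAck();
        System.out.println(cc.state + " " + cc.congWin); // prints CONGESTION_AVOIDANCE 5000.0
        for (int i = 0; i < 3; i++) cc.onDuplicateAck();
        System.out.println(cc.state + " " + cc.congWin); // prints CONGESTION_AVOIDANCE 2500.0
        cc.onTimeout();
        System.out.println(cc.state + " " + cc.congWin); // prints SLOW_START 1000.0
    }
}
```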


Chapter 3: Summary
principles behind transport layer services:
multiplexing, demultiplexing
reliable data transfer
flow control
congestion control
instantiation and implementation in the Internet
UDP
TCP

Next:
leaving the network edge (application, transport layers)
into the network core