NOTES:
○ Global ISP A, B, C, …
○ IXP: Internet exchange point, connecting global ISPs
○ Content Provider Network: Private network, bypassing tier-1, regional ISPs to bring services close to end users
    Google, Microsoft, Akamai, …

Delay sources (per router hop):
    Processing (dproc): Figure out where to forward packet next; check bit errors | Fixed: Yes | Typical: μs
    Queuing (dqueue): Wait for transmission at output link; depends on traffic congestion | Fixed: No | Typical: μs - ms
    Transmission (dtrans): Time to push entire packet out of router/switch | Fixed: Yes | Typical: μs - ms
        dtrans = L / R
        L: Packet length (bits)
        R: Link bandwidth (bits/s)
    Propagation (dprop): Time for 1 bit to travel from 1 end to the other | Fixed: Yes | Typical: ms
        dprop = d / s
        d: Length of physical link
        s: Propagation speed (≈ speed of light for fiber); related to link's physical medium (copper, fiber, …)

Internet Layered Architecture
- Provide modularity & function decomposition:
    ○ Good: Effective to deal with large, complex software systems
    ○ Bad: Inflexible, redundancy of certain functions in different layers
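As a quick sanity check of the two formulas above, a small sketch (the link parameters are hypothetical):

```python
# Hypothetical link: 1500-byte packet, 10 Mb/s link, 2000 km of fiber
L = 1500 * 8        # packet length (bits)
R = 10e6            # link bandwidth (bits/s)
d = 2000e3          # physical link length (m)
s = 2e8             # propagation speed in fiber (~2/3 speed of light, m/s)

d_trans = L / R     # time to push the entire packet onto the link
d_prop = d / s      # time for 1 bit to cross the link

print(f"d_trans = {d_trans * 1e3:.1f} ms")  # 1.2 ms
print(f"d_prop  = {d_prop * 1e3:.1f} ms")   # 10.0 ms
```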
○ Transport:
    End-to-end protocol: Control communication between 2 procs in 2 hosts (end systems)
    Control provided: reliability, congestion, multiplexing
    NOT for routers
    TCP, UDP
○ Network: Responsible for packet-routing among routers, hosts
    IP, routing protocols

Network Security
- DoS (Denial of Service): Attacker:
    ○ Overwhelms resources with bogus traffic
    ○ Makes resources unavailable to legitimate traffic
- Packet "sniffing": Read/record all packets passing by on the network
- Network app:
- Peer-to-peer (P2P):
○ Peers request & provide service directly among
each other
○ No always-on server
○ Intermittently connected, dynamic IP
○ Self-scalability: New peers bring new service cap &
demand
Overview
- Client-Server model
- Use TCP:
    ○ HTTP client initiates TCP conn to HTTP server
    ○ Conn established: HTTP messages exchanged through socket interface on top of TCP
- Stateless: Server maintains no info about past client requests

Non-persistent vs Persistent HTTP:
    Mechanism:
        Non-persistent: Each object delivered by individually established TCP conn
            Multiple objects → multiple TCP conns (can be opened simultaneously)
        Persistent: Multiple objects sent over single TCP conn
            Eliminates overhead in establishing & maintaining multiple TCP conns
    Response time:
        Non-persistent: Each object: 2 RTT + file trans time
            (1 RTT = initialize TCP, 1 RTT = file request)
        Persistent: First object: 2 RTT + file trans time
            Subsequent objects: 1 RTT + file trans time
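The response-time rows can be checked numerically; a minimal sketch ignoring file transmission time (RTT and object count are hypothetical):

```python
RTT = 0.1  # seconds (hypothetical)
n = 5      # number of objects on the page

# Non-persistent: every object pays TCP setup (1 RTT) + request/response (1 RTT)
non_persistent = n * 2 * RTT
# Persistent: first object pays 2 RTT, each subsequent object pays 1 RTT
persistent = 2 * RTT + (n - 1) * RTT

print(non_persistent, persistent)  # 1.0 s vs ~0.6 s
```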
HTTP Messages
- Request:
- Response:

Cookies

Web Caching (Proxy Server)
- Mechanism:
    ○ Client sends HTTP request to origin server through proxy
    ○ At proxy:
        If local copy of requested object found: Immediately return it to client
        Otherwise:
            □ Forward request to origin server
            □ Store copy of retrieved object
            □ Pass object to client
- Benefits:
    ○ ↓ Response time
    ○ ↓ Traffic on institution's access link

Conditional GET
- Goal: Not send object if cached version is still up-to-date
- How:
    ○ Client: Specify date of cached copy in HTTP request
        If-modified-since: <date>
    ○ Server: Check date of client's cached copy:
        If client's cached copy still up-to-date: Respond with no object, status code 304
            HTTP/1.1 304 Not Modified
        If client's cached copy is outdated: Respond with newest object and its date:
            □ Last-Modified: <date>
Web caching example:
    Assumptions:
        Avg object size: 100 kb
        Avg request rate: 15/s
        RTT from institutional router to any origin server: 2 s
    LAN utilization:
        (100 kb) × (15 requests/s) / (1 Gb/s) = 0.15%
    Without cache:
        Access link utilization:
            (100 kb) × (15 requests/s) / (1.54 Mb/s) = 97%
        Total delay = RTT + access delay + LAN delay = 2 s + minutes + μs
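The two utilization figures above can be reproduced directly:

```python
obj_bits = 100e3  # avg object size: 100 kb
rate = 15         # avg request rate (requests/s)

lan_util = obj_bits * rate / 1e9        # LAN: 1 Gb/s
access_util = obj_bits * rate / 1.54e6  # access link: 1.54 Mb/s

print(f"{lan_util:.2%}")     # 0.15%
print(f"{access_util:.0%}")  # 97%
```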
FTP (File Transfer Protocol)
- Function: Transfer files to/from remote host
- Model: Client-Server
- Run on TCP
- "Out-of-band" mechanism: Separate control & data conns:
    ○ Server port 21 (control): Client contacts server to authorize, browse dirs, send control commands
    ○ Server port 20 (data):
        Opened by server to transfer data to client upon request
        Closed after finishing transferring 1 file
- Stateful protocol: Server maintains "state": current dir, previous auth
- Sample commands:
    ○ USER username, PASS password
    ○ LIST: Return list of files in current dir
    ○ RETR filename: Get file
    ○ STOR filename: Store file onto host

Email
- Major components:
    ○ User agents: Compose, edit, read mail
        Outlook, Thunderbird, …
    ○ Mail server: Mailbox + Msg queue
    ○ Mail transfer protocol: For transferring mail msgs among mail servers
- Mail transfer protocol: SMTP
    ○ TCP, port 25
    ○ Client-Server model
    ○ Direct transfer: No intermediary mail server between sending & receiving server
    ○ Persistent conn
    ○ Msg must be 7-bit ASCII
- Mail access protocol (client retrieving mail from server):
    ○ POP3:
        Stateless
        "Download-and-delete": Once msg downloaded to client, it is removed from server
    ○ IMAP:
        Msgs kept at server
        Allows users to organize msgs in folders
        User state kept across sessions
DNS Service
- Hostname→IP translation
- Host/Mail server aliasing:
    Get canonical (original) hostname/mail server, IP for supplied alias hostname/mail server
- Load distribution:
    ○ DNS database contains replicated servers' IPs for canonical name
    ○ Client query → DNS server returns entire set of IPs but rotates set order each time → Distributes traffic
- Query types: Iterated query vs Recursive query
DNS Caching

Architecture: Client-Server vs P2P
Time to distribute file of size F to N clients:
    Client-Server analysis:
        - Server: Must sequentially upload N file copies
        - Clients: Each must download a file copy
    P2P analysis:
        - Server: Must upload at least 1 copy down to peers
        - Peers: Each must download a file copy
        - Server + Peers: All contribute to delivering N file copies
            Max upload rate = us + ∑ui
    Deliver time:

BitTorrent
- Torrent: Collection of all peers participating in distribution of a particular file
- Tracker: Infrastructure node in each torrent, keeps track of all alive peers
- File: Broken into chunks of 256 kB
- Peer:
    ○ Download + upload among each other, accumulate more chunks over time
    ○ When obtaining entire file, can (selfishly) leave torrent / (altruistically) remain
    ○ Can leave anytime (with incomplete chunk set) and rejoin later
- Control mechanisms: Peer X
    ○ X joins: Register with tracker → Get random subset of other peers' IPs
    ○ X periodically informs tracker of its aliveness
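The standard lower bounds on deliver time (us = server upload rate, ui = peer i's upload rate, dmin = slowest client download rate) can be sketched as follows; the file size and rates are hypothetical:

```python
def d_client_server(F, N, u_s, d_min):
    # Server uploads N copies; slowest client must download one copy
    return max(N * F / u_s, F / d_min)

def d_p2p(F, N, u_s, d_min, u_i_sum):
    # Server uploads >= 1 copy; slowest client downloads one copy;
    # all N*F bits must leave through total upload capacity u_s + sum(u_i)
    return max(F / u_s, F / d_min, N * F / (u_s + u_i_sum))

# Hypothetical: 1 Gbit file, 10 clients
F, N, u_s, d_min = 1e9, 10, 1e7, 2e6
print(d_client_server(F, N, u_s, d_min))     # 1000.0 s
print(d_p2p(F, N, u_s, d_min, u_i_sum=1e7))  # 500.0 s
```

Note how the P2P bound improves as peers contribute upload capacity, while the client-server bound grows linearly in N.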
Distributed Hash Table (DHT)
- Design:
    ○ Peer identified by integer in range [0, 2^n - 1] (n bits)
    ○ Hash function: Map original key to integer in range [0, 2^n - 1]
    ○ Assign each (key, value) pair to peer with ID "closest" to key
        ID = closest successor of key
        n = 4 → ID range = hashed key range = [0, 15]
        Peers: 1, 3, 4, 5, 8, 10, 12, 14
        Key = 13 → Successor peer = 14
        Key = 15 → Successor peer = 1 (circular)

Circular DHT
- Use shortcuts → With O(log N) neighbors, can reduce search query to O(log N)
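The "closest successor" rule above can be sketched in a few lines (peer list taken from the example):

```python
def successor(peers, key, n_bits=4):
    """Closest successor of key on the circular ID space [0, 2^n - 1]."""
    space = 2 ** n_bits
    # Clockwise distance from key to each peer; smallest wins
    return min(peers, key=lambda p: (p - key) % space)

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(peers, 13))  # 14
print(successor(peers, 15))  # 1  (wraps around)
```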
UDP
Overview
- End-to-end protocol: Provide logical comm between app procs on different hosts
- Features:
    ○ Connectionless:
        No handshaking between sender & receiver
        Each UDP seg handled independently of others
    ○ "Best-effort": Segs may be lost/delivered out-of-order
    ○ No congestion control: Data sent as fast as desired
- Min required functions:
    ○ Multiplexing (at sender): Handle data from multiple sockets, add transport header
    ○ Demultiplexing (at receiver): Deliver data seg to correct socket using header
- Seg structure:
Checksum example:
    Sum:        1110011001100110
              + 1101010101010101
              = 11011101110111011
    Wraparound: 1011101110111011 + 1 = 1011101110111100
                (If carry out of MSB, omit it and add 1 to result)
    Checksum:   0100010001000011 (flip bits of wraparound result)
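The wraparound arithmetic above is 16-bit one's-complement addition; a small sketch reproducing the example:

```python
def wraparound_sum(a, b):
    """16-bit one's-complement addition: add, then fold any carry back in."""
    s = a + b
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)  # drop carry out of MSB, add 1
    return s

a = 0b1110011001100110
b = 0b1101010101010101
s = wraparound_sum(a, b)
checksum = ~s & 0xFFFF  # flip all 16 bits

print(f"{s:016b}")         # 1011101110111100
print(f"{checksum:016b}")  # 0100010001000011
```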
Stop-And-Wait Protocols
(Sender sends 1 pkt, then waits for receiver to respond)
- No pkt lost
- Duplicated ACKs: ACK of previous pkt = NAK of current pkt
- Example:
    RTT = 2·dprop = 30 ms
    ttrans = L / R = 0.001 ms
    Sender utilization: U = ttrans / (RTT + ttrans) = 0.001 / 30.001 ≈ 0.0033%
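Plugging the example numbers into U = ttrans / (RTT + ttrans):

```python
RTT = 30.0       # ms (= 2 * d_prop, from the example)
t_trans = 0.001  # ms

U = t_trans / (RTT + t_trans)  # fraction of time the sender is busy sending
print(f"{U:.3%}")  # 0.003%
```

The tiny utilization is what motivates the pipelined protocols in the next section.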
Pipelined Protocols
(Sender allows multiple in-flight, yet-to-be-acknowledged pkts)
- Notation:
    N = Sliding-window size (max no. of allowed in-flight pkts at each sending time)
    base = Seq no of oldest unacked pkt
    nextseqnum = Seq no of smallest not-yet-sent pkt

Go-Back-N:
- Receiver:
    • No buffering:
        Pkts arriving out of order (gap created) → Discard immediately
    • Cumulative ACK:
        ACK seq no of LAST in-order pkt

Selective Repeat:
- Sender:
    • Have ≤ N unacked pkts in buffer
    • Maintain timer for EACH unacked pkt
        Timeout → Retransmit only that pkt
- Receiver: Buffer out-of-order pkts
- Cons:
    • Receiver: Complicated logic
    • Sender: Manage many timers → OS overhead
- Range of seq no ≥ 2 × window size (prevents receiver misunderstanding new vs retransmitted pkt)
Overview
- Point-to-point:
1 sender, 1 receiver
- No msg boundary:
Data treated as ordered byte stream
- Pipelined:
Multiple in-flight pkt
- Full duplex:
App-layer data flow proc A → B at same
time as proc B → A (different hosts)
- Conn-oriented:
Handshaking required
- Flow control:
    Sender won't overflow receiver's buffer
    (controlled by sliding-window size)
- SampleRTT:
○ Time measured from seg trans → ACK of seg received
○ Only consider seg transmitted once
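From SampleRTT, TCP derives its timeout with the standard EWMA smoothing (α = 0.125, β = 0.25, as in RFC 6298); the sample values here are hypothetical:

```python
alpha, beta = 0.125, 0.25
estimated_rtt, dev_rtt = 100.0, 5.0  # prior estimates (ms)

sample_rtt = 120.0  # newly measured SampleRTT (ms)
estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
timeout = estimated_rtt + 4 * dev_rtt  # safety margin of 4 deviations

print(estimated_rtt, dev_rtt, timeout)  # 102.5 8.125 135.0
```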
Highlights
- Control mechanism = Go-Back-N + Selective-Repeat:
    • Go-Back-N: Cumulative ACK (only ack expected next in-order SeqNum, ignore out-of-order segs)
    • Selective Repeat: Individual retransmit (only retransmit smallest unacked seg, NOT ALL unacked)
- TCP spec doesn't include how to handle out-of-order segs
    (If out-of-order segs buffered → Need more logic to handle ExpectedSeqNum, as it can change by more than length(data) upon new pkt received)
- Fast Retransmit:
    • Often large no. of in-flight segs → If 1 seg lost, many duplicated ACKs
    • If sender receives 3 duplicated ACKs → Immediately retransmit seg without waiting for timeout
Scenarios (figures): Lost ACK; Premature timeout; Cumulative ACK; Fast retransmit due to duplicated ACKs
Flow Control
- Receiver:
    ○ Advertise free buffer space through RcvWnd field
        RcvWnd = RcvBuffer - (LastByteRcv - LastByteRead)
    ○ Issue: New RcvWnd only sent when receiver has ACK or data to send to sender
- Sender:
    ○ Control sliding-window size → Limit no. of allowed in-flight pkts
        LastByteSent - LastByteAcked ≤ WindowSize = RcvWnd

Connection Management
Closing
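The two flow-control invariants can be sketched together (the byte counters below are hypothetical):

```python
def rcv_wnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Free buffer space advertised by the receiver
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, n_bytes):
    # Sender keeps unacked data within the advertised window
    return (last_byte_sent + n_bytes) - last_byte_acked <= rwnd

rwnd = rcv_wnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                    # 2096
print(sender_may_send(3000, 1500, rwnd, 500))  # True
print(sender_may_send(3000, 1500, rwnd, 1000)) # False: would overflow receiver
```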
Congestion
- Causes:
    ○ Large queueing delay (due to pkt arrival rate ≈ link cap)
    ○ Unneeded retrans by sender (due to premature timeout)
        → Routers use link bandwidth to forward unneeded pkt copies
    ○ Pkts dropped by router (due to full buffer)
        → Trans capacity of each upstream link leading to that router is wasted
- Solution approaches:
○ End-to-end:
No explicit feedback from network
Congestion inferred from loss, delay observed by
end-sys
TCP Fairness
- Goal:
    K TCP sessions sharing same bottleneck link of bandwidth R
    → Each session should have avg rate of R/K
TCP Throughput
- Issues:
    ○ UDP: No congestion control → Can eat up throughput from TCP
    ○ Parallel TCP: Apps opening more parallel TCP conns eat up throughput from apps opening fewer conns
VC vs Datagram:
    Service:
        VC: Network-layer connection-oriented service
        Datagram: Network-layer connectionless service
    End-to-end path:
        VC:
            - Conn setup/teardown required → Path pre-determined
            - Resources allocated to VC guaranteed
        Datagram:
            - No setup
            - No resource guarantee
    Link & Router:
        VC:
            - Link identified by different VC numbers (depending on VC it belongs to)
            - Router:
                • Forwarding table for each input
                • Maintains "state" for each passing conn
        Datagram:
            - Router: Maintains no state about end-to-end conns
    Pkt:
        VC:
            - Carries VC identifier
            - Embedded in every host, router
            - Can be changed by router after going through it (based on forwarding table)
        Datagram:
            - Carries dest addr (used by router to select output link)

Network Layer
- Roles:
    ○ Transport segs from sending host to receiving host based on IP addr
    ○ Sending host: Encapsulate segs into datagrams
    ○ Receiving host: Deliver segs to transport layer
* NOTE: VC vs Datagram networks is similar to TCP vs UDP, but:
    - Host-to-host, not proc-to-proc
    - Network layer can provide VC or datagram, not both
    - Implemented in network core, not network edge
- Key functions:
    ○ Forwarding: Move pkts from router's input to appropriate output
    ○ Routing: Determine route taken by pkts from source to dest
Components of Routers

DHCP
- Purpose:
    ○ Allow host to dynamically obtain its IP addr from network server when it joins network
    ○ IP addr lease can be renewed
- Pros:
    ○ Allows reuse of IP addrs
    ○ Automatic configuration of IP addrs
- Cons: Can't maintain TCP conn when host moves between subnets (as IP addr changes)
- Mechanism:
    ○ 1 (optional): Host broadcasts "DHCP discover" msg

IPv4 Addressing
- Interface: Represents the conn between host/router & physical link
    ○ Router has > 1 interface (as it connects to different links)
    ○ Host has 1-2 interfaces
        Wired Ethernet, wireless 802.11, …
- Subnet: Device interfaces within same subnet can physically reach each other without intervening router
- IPv4 addr:
    ○ Associated with EACH interface
    ○ ISP: Gets from ICANN (Internet Corporation for Assigned Names & Numbers)
    ○ Network mask:
        Indicates subnet portion of IP addr through no. of leading 1's
        Subnet addr = IP & Mask
            IP = 128.96.39.10
            Mask = 255.255.255.128 (25 leading 1's)
            ⇒ Subnet: 128.96.39.0/25
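The masking example can be checked with the standard library:

```python
import ipaddress

# Subnet addr = IP & Mask
iface = ipaddress.ip_interface("128.96.39.10/255.255.255.128")
print(iface.network)  # 128.96.39.0/25

# Same result via an explicit bitwise AND
ip = int(ipaddress.ip_address("128.96.39.10"))
mask = int(ipaddress.ip_address("255.255.255.128"))
print(ipaddress.ip_address(ip & mask))  # 128.96.39.0
```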
- Hierarchical addressing:
    ○ Router uses longest prefix matching to match dest of pkt with desired output link
    ○ IP addrs of devices from same organizations/ISPs share prefix
    ⇒ Routing info can be efficiently advertised among routers through route aggregation
        When organization moves from 1 ISP to another, its IP addr portion can be kept but requires extra advertisement effort from ISP routers

NAT (Network Address Translation)
- Purpose: All devices in local area network (LAN) present to outside world through 1 IP addr
- Benefits: LAN admin can:
    ○ Just obtain 1 IP addr from ISP
    ○ Change addrs of LAN devices without notifying outside world
    ○ Change ISP without changing addrs of LAN devices
    ○ Prevent LAN devices from being explicitly addressable by outside → ↑ Security
- Implementation:
    ○ Outgoing datagram: Replace (source IP, port) with (NAT IP, new port)
    ○ NAT translation table: Remember all translation pairs
    ○ Incoming datagram: Replace (NAT IP, new port) in dest fields with (source IP, original port)
- Limitations:
    ○ Single NAT IP can only maintain ≈ 60000 simultaneous conns (port field = 16 bits)
    ○ NAT traversal problem:
        Solutions:
        □ Statically configure NAT to forward incoming connection requests at given port to specific device
            Request to 123.76.29.7:2500 always forwarded to 10.0.0.1:25000
        □ UPnP (Universal Plug & Play): NAT server can lease port mappings to LAN devices for a period upon request
            LAN device requests NAT server to forward traffic to NAT port 3100 to its port 31000

IPv6 Addressing
- Size: 128 bits
    → Addr space won't get exhausted soon
- No header checksum
    → Removes header checksum recomputation step (due to change in TTL)
- Datagram format: Fixed 40-byte header
    ○ Flow label: Identify datagrams in same flow, but flow concept not well-defined yet
- Transition from IPv4 to IPv6:
    Tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers

ICMP (Internet Control Message Protocol)
- Traceroute:
    ○ When nth set of datagrams arrives at nth router: TTL = 0
        Router must discard datagrams
        Sends ICMP msg to source with content:
        □ Type 11, code 0
        □ Router name & IP addr
    ○ Source receives these ICMP msgs, records RTTs
Routing Algorithms
- Goal: Compute least cost paths from 1 node (source) to all other nodes
- Abstraction: Graph G = (N, E)
    N: Set of routers
    E: Set of links
    c(x, y): Link cost
        1, ∝ 1/Bandwidth, ∝ 1/Congestion
- Global vs Decentralized:
    ○ Global: All routers have complete knowledge of topology, link costs
    ○ Decentralized: Router only knows about physically connected neighbors and link costs to them
- Static vs Dynamic:
    ○ Static: Routes change slowly over time
    ○ Dynamic: Routes can change quickly

Link State (Dijkstra):
    Loop until all nodes in N':
        Find w ∉ N' with min D(w)
        N' ← N' ∪ {w}
        for v ∈ N, v ∉ N', ∃(w, v):
            D(v) = min(D(v), D(w) + c(w, v))

Distance Vector:
    - When c(x, v) changes, or DV update msg from v received:
        • Update Dx: Dx(y) = min over v of {c(x, v) + Dv(y)}, ∀y ∈ N
        • If ∃y: Dx(y) changes: Notify neighbors
    - Under minor, natural conditions: Dx(y) → dx(y) (converges to actual least cost)
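A compact sketch of the Dijkstra loop above using a binary heap; the topology and costs are hypothetical:

```python
import heapq

def dijkstra(graph, source):
    """Least-cost distances from source; graph = {node: {neighbor: cost}}."""
    dist = {source: 0}
    pq = [(0, source)]                      # (D(w), w) candidates
    while pq:
        d, u = heapq.heappop(pq)            # w not in N' with min D(w)
        if d > dist.get(u, float("inf")):
            continue                        # stale entry, already finalized
        for v, c in graph[u].items():       # relax every edge (w, v)
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist

g = {"u": {"v": 2, "x": 1}, "v": {"u": 2, "x": 3, "w": 3},
     "x": {"u": 1, "v": 3, "w": 5}, "w": {"v": 3, "x": 5}}
print(sorted(dijkstra(g, "u").items()))  # [('u', 0), ('v', 2), ('w', 5), ('x', 1)]
```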
LS vs DV comparison:
    Msg complexity:
        LS:
            - Each node needs to know all link costs in network
            - Link cost changes: Msg must be sent to all nodes
            ⇒ Complex
        DV:
            - Msgs only exchanged between directly-connected nodes
            - Link cost changes: DV only propagated to neighbors if it changes
            ⇒ Simple
    Convergence speed:
        LS:
            - Algorithm: O(|N|²) (more efficient implementation: O((|N| + |E|) log |N|))
            - Max O(|N|·|E|) msgs sent
        DV:
            - Convergence speed varies, may suffer from count-to-infinity
    Robustness:
        LS:
            - Node could broadcast wrong cost for its attached links only
            - Node only calculates its own forwarding table
            ⇒ More robust
        DV:
            - Node can advertise wrong path costs to any/all dests
            - Each node's forwarding table (DV) used by others → Errors can propagate throughout network
Hierarchical routing
- LS/DV limitations:
    ○ Scale: Storing & exchanging routing info among millions of routers
        Large computation overhead for individual routers
        No bandwidth left for data pkts
    ○ Administrative autonomy: Organization wants to administer its own network as it wishes
        Choose routing algorithm, hide internal structure from outside, …

Intra-AS Routing: RIP (Routing Information Protocol)
- Distance Vector (DV) algorithm
- Max no. of routers supported = 15
    → Infinity = 16
- All link costs = 1
- Only 1 path exists between each source-dest pair
- Advertisement:
    ○ Neighbors exchange DVs every 30 sec

Intra-AS Routing: OSPF
- Link State (LS) algorithm (Dijkstra)
- Each link can have different cost metrics for different Types of Service
- Allows multiple same-cost paths
    → Router can simultaneously use ≥ 2 paths to route traffic towards dest
- Advertisement:
    ○ 1 entry per neighbor in msg
    ○ Flooded to entire AS
    ○ Msgs carried directly over IP (rather than TCP, UDP)
- Hierarchical OSPF: 2-lvl hierarchy:
    ○ 1 backbone area:
        Routes traffic among local areas
        Boundary + backbone + area border routers
    ○ ≥ 2 local areas:
        Area border + internal routers
    ○ Routers only know their area's topology & broadcast LS within their area

BGP (Border Gateway Protocol)
- De facto Inter-AS routing protocol
- BGP session:
    ○ Runs on semi-permanent TCP conn
    ○ Msg types:
        OPEN
        UPDATE: Advertise new/withdraw old paths
        KEEPALIVE
        NOTIFICATION: Report errors, close conn
- Policy example:
    ○ B advertises path BAW to X
    ○ B does NOT advertise BAW to C:
        W, C are NOT B's customers
        B gets no revenue for routing C→B→W
    ○ X does NOT advertise BX to C:
        X gets no benefit from helping C route to B via X
Broadcast
- Deliver pkts from source to ALL other nodes
- Spanning tree:
    ○ Construct spanning tree first:
        Choose center node
        Each node sends msg to center node
        Msg forwarded until arriving at node already belonging to tree
    ○ Then only forward pkts along tree

Multicast
1. Problem statement: Find tree connecting a group of (not ALL) routers in network
2. Approaches:
    a. Source-based tree: Different senders generate different trees
        - Tree-forming criteria: Shortest path
        - Pruning: When tree contains subtree with no group member
            Router having no attached hosts in group sends "prune" msg to upstream router
            Router receiving prune msg from downstream router forwards msg further upstream
    b. Group-shared tree: Same tree for whole group
        - Tree-forming technique: Center-based
            1 router chosen as "center" (rendezvous point (RP))
            Join tree:
            □ Edge router sends join-msg to center
            □ Join-msg hits existing tree branch/center
            □ Path taken by join-msg becomes new tree branch

Dense vs Sparse:
    No. of group members:
        Dense: Densely packed, "close" proximity
        Sparse: Small, "widely dispersed"
    Bandwidth:
        Dense: Plentiful
        Sparse: Not plentiful
    Membership:
        Dense: Assumed until explicitly PRUNE
        Sparse: Not assumed until explicitly JOIN
    Tree construction:
        Dense: Data-driven (RPF)
        Sparse: Receiver-driven (center-based)

DVMRP (Distance Vector Multicast Routing Protocol)
- Source-based, reverse path forwarding (RPF)
- Initial datagram to group members flooded throughout network
- Router leaving group: Sends prune msg upstream
- Soft state:
    ○ DVMRP router periodically forgets "pruned" branches, continues to flood
- Commonly implemented in commercial routers

PIM (Protocol Independent Multicast)
- Does not depend on any unicast routing algorithm
- 2 scenarios: Dense & Sparse
- Dense mode: Flood-and-prune, similar to DVMRP but:
    • Underlying unicast protocol provides RPF info for incoming datagrams
    • Less complicated downstream flood → ↓ Reliance on routing algorithm
    • Has mechanism to detect leaf-node routers
- Sparse mode: Receiver-driven, center-based (RP)
    • After joining via RP, router can switch to source-specific tree → ↑ Performance (shorter paths)
    • RP can extend tree upstream
Overview
- Terminology:
    ○ Node: Host/Router
    ○ Link: Comm channels connecting adjacent nodes along comm path
    ○ Frame: Link-layer pkt, encapsulates datagram
- Adapter: Hardware + Software + Firmware
    NIC (Network Interface Card), chip
    - Implements link + physical layer
    - Attaches to host's sys bus
- Services:
    ○ Link access:
        MAC (Medium Access Control) protocol: Rules by which frames transmitted onto link
        Support:
        □ Point-to-point link
        □ Broadcast link:
            ≥ 2 nodes share 1 link, nodes can transmit simultaneously
            Collision if ≥ 2 signals at same time
            ⇒ Multiple Access Control Protocol
    ○ Error detection & correction:
        Error sources: Signal attenuation, electromagnetic noise, …
        Correct bit errors without triggering retransmission
    ○ Flow control
    ○ Half/Full-duplex transmission
        Wireless, by nature, is HALF-duplex

Error Detection & Correction: Parity Checking
- Single parity:
    ○ Sender: Adds 1 parity bit for every d data bits
        → No. of 1's in (d+1) bits is even (even scheme) / odd (odd scheme)
    ○ Receiver: Counts 1's to detect single-bit errors
- 2D parity:
    ○ Detect & correct 1-bit error, or
    ○ Detect (only) 2-bit errors

CRC (Cyclic Redundancy Check)
- Receiver:
    ○ G (generator) known
    ○ T = Number represented by the (d + r) bits received
    ○ T mod G ≠ 0 → Error detected
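The receiver's check uses mod-2 (XOR) division; a sketch with an illustrative generator and bit strings:

```python
def crc_check(bits, G):
    """Receiver side: remainder of T divided by generator G, mod-2 arithmetic."""
    T = int(bits, 2)
    g_len = G.bit_length()
    while T.bit_length() >= g_len:
        # XOR G aligned with T's leading 1 (one step of mod-2 long division)
        T ^= G << (T.bit_length() - g_len)
    return T  # 0 -> no error detected

G = 0b1001
# d = 101110 with r = 3 CRC bits 011 appended
print(crc_check("101110011", G))  # 0 -> accept
print(crc_check("101110111", G))  # nonzero -> error detected
```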
FDMA (Freq Division Multiple Access)

Random access:
    - No channel division
    - When node wants to send: Transmit at full channel rate
    - No a priori coordination among nodes
    → Allow collisions, provide collision recovery

Random Access Protocol: Slotted ALOHA
- Pros:
    ○ Single active node can continuously transmit at full rate
    ○ Highly decentralized: Each node detects collisions & decides when to retransmit independently
    ○ Simple
- Cons:
    ○ Wasted slots: collisions, idle slots
    ○ Clock sync among nodes
- Efficiency (long-run prop of successful slots):
    N nodes, each with many frames to send; prob of transmitting in a slot = p
    N → ∞, choose p to max E: max(E) = 1/e = 0.37

Random Access Protocol: Pure (Unslotted) ALOHA
- Efficiency:
    E = P(Node success)
      = P(Node transmits)
        × P(No other node transmits in [t0 - 1, t0])
        × P(No other node transmits in [t0, t0 + 1])
      = p × (1 - p)^(N-1) × (1 - p)^(N-1)
    N → ∞, choose p to max E: max(E) = 1/(2e) = 0.18
    (Worse than slotted ALOHA)

Random Access Protocol: CSMA/CD (Carrier Sense Multiple Access/Collision Detection)
- Collision Detection:
    ○ Detect collision within short time → Abort transmission → ↓ Channel wastage
    ○ Implementation:
        Easy in wired LANs: Measure signal strength, compare transmitted & received signals
        Difficult in wireless LANs: Received signal overwhelmed by local transmission power
- Efficiency: tprop → 0 or ttrans → ∞: E → 1
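The two ALOHA efficiency limits can be approximated numerically; multiplying the per-node success probability by N gives channel efficiency (N and the p-grid below are arbitrary):

```python
def slotted_eff(N, p):
    # A slot succeeds iff exactly 1 of the N nodes transmits in it
    return N * p * (1 - p) ** (N - 1)

def pure_eff(N, p):
    # A frame succeeds iff no other node transmits in [t0 - 1, t0 + 1]
    return N * p * (1 - p) ** (2 * (N - 1))

N = 500
ps = [i / 10000 for i in range(1, 200)]
print(round(max(slotted_eff(N, p) for p in ps), 2))  # 0.37  (about 1/e)
print(round(max(pure_eff(N, p) for p in ps), 2))     # 0.18  (about 1/(2e))
```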
- CSMA benefits:
    ○ Simple, fully decentralized
    ○ Better performance than ALOHA
- Taking-turns protocols (polling, token passing) — concerns:
    ○ Single point of failure (master node/token)
    ○ Latency: Waiting for turn
    ○ Overhead
- Physical topology
○ Coaxial bus: All nodes in same collision domain
○ Star:
Active switch in center
Each node runs separate Ethernet protocol
→ No collision with each other
- Interconnecting switches:
    ○ Use switches instead of routers to connect hosts
    ○ Limited to acyclic topology (can't have loops)
- Switch vs Router:
    Setup:
        Switch: Plug-and-play
        Router: Need to configure IP addrs
    Store-and-forward scheme:
        Switch: Examines link-layer header
        Router: Examines network-layer header
    Forwarding table:
        Switch: Self-learned through flooding technique; MAC addrs
        Router: Computed using routing algorithm; IP addrs
    Processing time:
        Switch: Fast (only processes up to layer 2)
        Router: Slower (processes up to layer 3)
    Supported topology:
        Switch: Acyclic; effective at small networks (↑ network → ↑ ARP table size, traffic & processing time)
        Router: Any topology; suitable for large networks (routing algorithm chooses best among ≥ 2 paths to dest → more effective than flooding & learning mechanism at large scale)

VLAN
- Benefits:
    ○ Traffic isolation among VLANs → ↑ LAN performance, security, privacy
    ○ Efficient use of switches: 1 switch can define ≥ 2 VLANs
    ○ Dynamic management: No need to change physical cables when moving devices belonging to same group
- Implementation: Port-based VLAN
    ○ Divide switch ports into groups
    ○ Trunk port:
        Carries frames between VLANs defined over ≥ 2 physical switches
        802.1q protocol: Defines format of frames forwarded between trunk ports
MPLS
- Signaling:
    ○ Modify OSPF link-state flooding protocols to carry extra info used by MPLS routing
        Link bandwidth, "reserved" bandwidth, …