
Introduction

Thursday, September 8, 2016 12:04 AM


Internet Overview

- Network of networks
- Each owned & independently operated by different ISPs (Internet Service Providers)
- All ISPs employ common protocol suite: IP (Internet Protocol)
- Loosely hierarchical:
○ Access network: Bottom, tier-n ISPs
 Residential, university, enterprise
○ Core networks: Tier-1 ISPs
 Long-haul, intra-, intercontinental fiber links
- Protocol: Defines:
○ Format, order of msgs exchanged between comm entities
○ Actions taken on msg trans/receipt

Circuit Switching vs Packet Switching

Circuit Switching
- Mechanism:
○ Resources along a path reserved, dedicated for a connection even if there is no traffic
○ Admission control:
• Always has set-up process prior to any communication
• New connection rejected if system cannot satisfy requirements
○ Bandwidth sharing techniques:
• FDM (Freq Div Multiplexing)
• TDM (Time Div Multiplexing)
- Pros:
○ QoS (Quality-of-Service) guaranteed once connection established
○ Suitable for real-time services
telephone, conference call, …
- Cons: Reserved connection bandwidth wasted when no data sent → Cannot serve many users

Packet Switching
- Mechanism:
○ Statistical multiplexing: Resources statistically shared by all traffic, no reservation
○ Hosts break app-layer msg into packets, each:
• Forwarded from one router to the next
• Transmitted at full link cap
○ Route mechanism:
• Store & Forward: Wait for entire packet to arrive before transmitting it on next link
• Queue: Arrival rate > Trans rate → Packets wait
• Loss: Buffer fills up → Packets dropped
- Pros: Accommodates more users if usage is bursty (different traffic amounts at different times)
- Cons: Difficult to guarantee QoS if traffic congested
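The "bursty usage" advantage of packet switching can be made concrete with a small binomial calculation. This is a sketch with assumed textbook-style numbers that do not appear in these notes: a 1 Mb/s link, and users who send at 100 kb/s while active and are independently active 10% of the time. Circuit switching can reserve slots for only 10 such users; the sketch checks how often 35 packet-switched users would oversubscribe the link:

```python
from math import comb

def prob_more_than(n_users: int, p_active: float, k: int) -> float:
    """P(more than k of n independent users are active at once), binomial model."""
    return sum(
        comb(n_users, i) * p_active**i * (1 - p_active) ** (n_users - i)
        for i in range(k + 1, n_users + 1)
    )

# Assumed example numbers (not from the notes)
LINK_BPS = 1_000_000      # 1 Mb/s shared link
USER_BPS = 100_000        # rate of one active user
P_ACTIVE = 0.1            # each user active 10% of the time

# Circuit switching: each user needs a dedicated 100 kb/s share
circuit_users = LINK_BPS // USER_BPS

# Packet switching: 35 users share the link; it is overloaded only if > 10 are active
p_overload = prob_more_than(35, P_ACTIVE, circuit_users)

print(circuit_users)          # 10
print(f"{p_overload:.4f}")    # 0.0004
```

Under these assumptions the link is oversubscribed only about 0.04% of the time, which is the statistical multiplexing gain.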

NOTES (Internet structure):
○ Global ISP A, B, C, …
○ IXP: Internet exchange point, connecting global ISPs
○ Content Provider Network: Private network, bypassing tier-1, regional ISPs to bring services close to end users
Google, Microsoft, Akamai, …

Internet Layered Architecture

- Provide modularity & functional decomposition:
○ Good: Effective way to deal with large, complex software systems
○ Bad: Inflexible, redundancy of certain functions in different layers

- Five-layer protocol stack (top → bottom):
○ Application: Support network apps
HTTP, FTP, SMTP
○ Transport:
 End-to-end protocol: Control communication between 2 procs in 2 hosts (end systems)
 Control provided: reliability, congestion, multiplexing
 NOT implemented in routers
TCP, UDP
○ Network: Responsible for packet routing among routers, hosts
IP, routing protocols
○ Link:
 Deal with packet trans between 2 directly connected nodes
 Control provided: channel access, packet framing, reliability
○ Physical: Deal with bit trans in physical medium

Network Security

- DoS (Denial of Service): Attacker:
○ Overwhelms resources with bogus traffic
○ Makes resources unavailable to legitimate traffic
- Packet "sniffing": Read/record all packets passing by on network
- IP spoofing: Send packet with false source addr

Sources of Packet Delay on Internet

$$d_{nodal} = d_{proc} + d_{queue} + d_{trans} + d_{prop}$$

- Processing delay $d_{proc}$ (usually fixed: Yes; typical: μs):
○ Figure out where to forward packet next
○ Check bit errors
- Queuing delay $d_{queue}$ (usually fixed: No; typical: μs - ms):
○ Wait for trans at output link
○ Depends on traffic congestion
- Transmission delay $d_{trans}$ (usually fixed: Yes; typical: μs - ms):
○ Time to push entire packet out of router/switch onto link
○ $d_{trans} = L / R$
L: Packet length (bits)
R: Link bandwidth (bits/s)
- Propagation delay $d_{prop}$ (usually fixed: Yes; typical: ms):
○ Time for 1 bit to travel from one end of link to the other
○ $d_{prop} = d / s$
d: Length of physical link
s: Propagation speed, related to link's physical medium (copper, fiber, …); ≈ 2 × 10^8 m/s in fiber (about 2/3 of speed of light in vacuum)
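The delay formulas in the Sources of Packet Delay section can be checked numerically. This is a minimal sketch that assumes a propagation speed of 2 × 10^8 m/s and takes d_proc and d_queue as given inputs (real queuing delay depends on traffic, which this does not model). The packet and link values are hypothetical:

```python
def transmission_delay(packet_bits: float, link_bps: float) -> float:
    """d_trans = L / R, in seconds."""
    return packet_bits / link_bps

def propagation_delay(link_m: float, speed_mps: float = 2e8) -> float:
    """d_prop = d / s, in seconds (default s assumes fiber/copper, ~2e8 m/s)."""
    return link_m / speed_mps

def nodal_delay(d_proc, d_queue, packet_bits, link_bps, link_m, speed_mps=2e8):
    """d_nodal = d_proc + d_queue + d_trans + d_prop, in seconds."""
    return (d_proc + d_queue
            + transmission_delay(packet_bits, link_bps)
            + propagation_delay(link_m, speed_mps))

# Hypothetical: 1500-byte packet, 100 Mb/s link, 2000 km of fiber,
# 2 μs processing, empty queue
d = nodal_delay(2e-6, 0.0, 1500 * 8, 100e6, 2_000_000)
print(f"{d * 1e3:.3f} ms")   # 10.122 ms
```

Here propagation (10 ms) dominates transmission (0.12 ms), as expected for a long, fast link.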

COMP 4621 Page 1




Application Layer
Sunday, September 11, 2016 10:23 AM

Application Architecture

- Client-Server:
○ Client:
 Request service by sending messages to server
 Do not communicate directly with each other
 Usually: Intermittently connected, dynamic IP
○ Server:
 Implement service by reading client's request, performing action & sending reply
 Always-on, permanent IP

- Peer-to-peer (P2P):
○ Peers request & provide service directly among each other
○ No always-on server
○ Intermittently connected, dynamic IP
○ Self-scalability: New peers bring new service cap & new demand

Processes & Sockets

- Network app:
○ ≥ 2 procs exchanging messages
○ Each proc's endpoint identified by (Host IP + Port) → Communication between each pair of procs uniquely determined
Connect to 128.119.245.12 at port 80
- Socket:
○ Interface between app & transport layer
○ Procs communicate only through this interface, no need to see actual packets exchanged between hosts
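A minimal sketch of two process endpoints on one host, using Python's standard `socket` module over UDP on the loopback interface (loopback is an assumption chosen so it runs without a network). The "server" socket is identified by its (IP, port) pair, and the client needs only that pair to reach it; the OS picks both port numbers here:

```python
import socket

def udp_echo_upper(message: bytes) -> bytes:
    """Send `message` from a client socket to a server socket; return the server's reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.settimeout(2.0)
        client.settimeout(2.0)
        server.bind(("127.0.0.1", 0))          # port 0: OS chooses a free port
        server_addr = server.getsockname()     # (host IP, port) identifying the server proc

        client.sendto(message, server_addr)    # client socket gets an ephemeral port
        data, client_addr = server.recvfrom(2048)
        server.sendto(data.upper(), client_addr)   # reply to the sender's (IP, port)

        reply, _ = client.recvfrom(2048)
        return reply

print(udp_echo_upper(b"hello"))   # b'HELLO'
```

Neither process touches packets directly: each only reads from and writes to its socket.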

Building Blocks of Application Layer Protocol

- Type of msg: Request, response
- Msg syntax: What fields?
- Msg semantics: Meaning of fields?
- Rules: When & how procs send & respond to msgs

Transport Layer Overview

- Services provided for app:
○ Data integrity: No loss / Loss-tolerant
○ Throughput:
 Min throughput required?
 Elastic? (make use of any throughput available)
○ Timing: Delay = ?
○ Security
- Protocols:
○ TCP: Reliable transport
 Conn-oriented: Conn setup required between client & server procs
 Provide:
□ Flow control: Sender won't overwhelm receiver
□ Congestion control: Throttle sender when network overloaded
 Not provide: Timing, min throughput guarantee, security
 Security: Apps MUST encrypt data themselves, e.g. using SSL (now TLS)
○ UDP: Unreliable data transfer
 Not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, conn setup


HTTP (Hypertext Transfer Protocol)
Sunday, September 18, 2016 4:30 PM

Overview

- Client-Server model
- Use TCP:
○ HTTP client initiates TCP conn to HTTP server
○ Conn established: HTTP messages exchanged through socket interface on top of TCP
- Stateless: Server maintains no info about past client requests

Non-persistent vs Persistent HTTP

- Non-persistent:
○ Mechanism: Each object delivered over an individually established TCP conn
Multiple objects:
• Multiple TCP conns
• Conns can be opened simultaneously (in parallel) → Shorter overall response time
○ Response time, each object: 2 RTT + File trans time
1 RTT = Initialize TCP
1 RTT = File request
- Persistent:
○ Mechanism: Multiple objects sent over single TCP conn
Eliminates overhead of establishing & maintaining multiple TCP conns
○ Response time:
First object: 2 RTT + File trans time
Subsequent objects: 1 RTT + File trans time

RTT: Time for small packet to travel client → server & back
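A sketch of the response-time formulas above for a page made of several objects. It assumes objects are fetched strictly one after another (no parallel conns) and that persistent HTTP sends each request only after the previous response arrives (no pipelining). The page size and RTT are hypothetical:

```python
def non_persistent_time(n_objects: int, rtt: float, t_file: float) -> float:
    """Serial non-persistent HTTP: every object costs 2 RTT + file trans time."""
    return n_objects * (2 * rtt + t_file)

def persistent_time(n_objects: int, rtt: float, t_file: float) -> float:
    """Persistent HTTP (no pipelining): TCP setup paid only for the first object."""
    if n_objects == 0:
        return 0.0
    first = 2 * rtt + t_file                  # TCP setup RTT + request RTT
    rest = (n_objects - 1) * (rtt + t_file)   # request RTT only
    return first + rest

# Hypothetical page: 1 base HTML file + 10 images, RTT = 100 ms, t_file ≈ 0
print(non_persistent_time(11, 0.1, 0.0))   # ≈ 2.2 s
print(persistent_time(11, 0.1, 0.0))       # ≈ 1.2 s
```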

HTTP Messages

- Request:

Request method types (HTTP/1.1):
 GET: Form input uploaded in URL
www.somesite.com/animalSearch?key=monkey
 POST: Form input contained in Entity Body field
 HEAD: Server responds with header only, no requested object; used for debugging
 PUT: Upload file to server
 DELETE: Delete file on server

- Response:

Popular response status codes:
 200 OK
 301 Moved Permanently: Object moved, new location specified in Location field
 400 Bad Request: Request msg not understood by server
 404 Not Found: Requested object not found on server
 505 HTTP Version Not Supported

Cookies

- Help web server keep track of users
Auth, shopping carts, recommendations, user session state (webmail)
- Four components:
○ Cookie header line in HTTP response msg
○ Cookie header line in next HTTP request msg
○ Cookie file kept on user's host, managed by web browser (client)
○ Back-end database at server

Web Caches (Proxy Server)

- Mechanism:
○ Client sends HTTP request to origin server through proxy
○ At proxy:
 If local copy of requested object found: Immediately return it to client
 Otherwise:
□ Forward request to origin server
□ Store copy of retrieved object
□ Pass object to client
- Benefits:
○ ↓ Response time
○ ↓ Traffic on institution access link

Conditional GET

- Goal: Don't send object if cached version is still up-to-date
- How:
○ Client: Specify date of cached copy in HTTP request
If-modified-since: <date>
○ Server: Check date of client's cached copy:
 If client's cached copy still up-to-date: Respond with no object, status code 304
HTTP/1.1 304 Not Modified
 If client's cached copy outdated: Respond with newest object and its date:
□ Last-Modified: <date>
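A sketch of the server side of a conditional GET, using Python's standard `email.utils` helpers to parse and format HTTP-style dates. This is not a real HTTP server: headers are passed in as plain values, and the "client" could equally be a proxy cache revalidating its copy. The object and dates are hypothetical:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(if_modified_since, last_modified, body):
    """Return (status line, headers, body) for a GET on one object.

    if_modified_since: value of the request's If-modified-since header, or None
    last_modified: timezone-aware UTC datetime of the server's current copy
    """
    if if_modified_since is not None:
        cached_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_date:
            # Cached copy still up-to-date: reply carries no object
            return "HTTP/1.1 304 Not Modified", {}, b""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return "HTTP/1.1 200 OK", headers, body

# Hypothetical object last changed on 18 Sep 2016, 16:30 UTC
modified = datetime(2016, 9, 18, 16, 30, tzinfo=timezone.utc)
print(conditional_get("Sun, 18 Sep 2016 16:30:00 GMT", modified, b"<html>")[0])
# HTTP/1.1 304 Not Modified
print(conditional_get("Sat, 17 Sep 2016 10:00:00 GMT", modified, b"<html>")[0])
# HTTP/1.1 200 OK
```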

Assumptions:
Avg object size: 100 kb
Avg request rate: 15 requests/s
RTT from institutional router to any origin server: 2 s
Access link rate: 1.54 Mb/s; LAN rate: 1 Gb/s

LAN utilization:
(100 kb) × (15 requests/s) / (1 Gb/s) = 0.15%

Without cache:
Access link utilization:
(100 kb) × (15 requests/s) / (1.54 Mb/s) ≈ 97%
Total delay
= RTT + Access delay + LAN delay
= 2 s + minutes + μs

With cache, hit rate = 40% (40% of requests satisfied by cache):
Access link utilization:
0.6 × (100 kb) × (15 requests/s) / (1.54 Mb/s) ≈ 58%
Total delay
= 0.6 × (Delay from origin server) + 0.4 × (Delay from cache)
≈ 0.6 × 2.01 s + 0.4 × (~ms)
≈ 1.2 s
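The cache example's arithmetic can be reproduced in a few lines. The 1 ms delay for a cache hit is an assumption standing in for "~ms", and 2.01 s for a miss is taken from the calculation above:

```python
def utilization(obj_bits, req_per_s, link_bps, hit_rate=0.0):
    """Fraction of link capacity used by requests that the cache cannot satisfy."""
    return (1 - hit_rate) * obj_bits * req_per_s / link_bps

def avg_delay(hit_rate, origin_delay_s, cache_delay_s):
    """Weighted avg of delay via origin server (misses) and via cache (hits)."""
    return (1 - hit_rate) * origin_delay_s + hit_rate * cache_delay_s

OBJ, RATE = 100e3, 15           # 100 kb objects, 15 requests/s
LAN, ACCESS = 1e9, 1.54e6       # 1 Gb/s LAN, 1.54 Mb/s access link

print(f"{utilization(OBJ, RATE, LAN):.2%}")           # 0.15%
print(f"{utilization(OBJ, RATE, ACCESS):.0%}")        # 97%
print(f"{utilization(OBJ, RATE, ACCESS, 0.4):.0%}")   # 58%
print(f"{avg_delay(0.4, 2.01, 0.001):.1f} s")         # 1.2 s (assumed 1 ms per hit)
```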


Email & FTP (File Transfer Protocol)
Monday, September 19, 2016 4:35 PM

Email

- Major components:
○ User agents: Compose, edit, read mail
Outlook, Thunderbird, …
○ Mail server: Mailbox + Msg queue
○ Mail transfer protocol: For transferring mail msgs among mail servers
- Mail transfer protocol: SMTP
○ TCP, port 25
○ Client-Server model
○ Direct transfer: No intermediary mail server between sending & receiving servers
○ Persistent conn
○ Msg must be 7-bit ASCII
- Mail access protocol (client retrieving data from server):
○ POP3:
 Stateless across sessions
 "Download-and-delete" mode: Once msg downloaded to client, it is removed from server
○ IMAP:
 Msg kept at server
 Allows users to organize msgs in folders
 User state kept across sessions
○ HTTP, …

FTP (File Transfer Protocol)

- Function: Transfer file to/from remote host
- Model: Client-Server
- Runs on TCP
- "Out-of-band" mechanism: Separate control & data conns:
○ Server port 21 (control): Client contacts server to authorize, browse dir, send control commands
○ Server port 20 (data):
 Opened by server to transfer data to client upon request
 Closed after finishing transferring 1 file
- Stateful protocol: Server maintains "state"
current dir, previous auth
- Sample commands:
○ USER username, PASS password
○ LIST: Return list of files in current dir
○ RETR filename: Get file
○ STOR filename: Store file onto remote host
- Sample return codes:
○ 331 Username OK, password required
○ 125 Data connection already open; transfer starting
○ 425 Can't open data connection
○ 452 Error writing file
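The FTP return codes listed above follow a convention from the FTP specification (RFC 959): the first digit gives the class of the reply. A small sketch that classifies reply lines this way; the class names paraphrase the RFC:

```python
# Meaning of the first digit of an FTP reply code (per RFC 959)
REPLY_CLASSES = {
    "1": "positive preliminary",    # action started, another reply will follow
    "2": "positive completion",
    "3": "positive intermediate",   # more info needed, e.g. password after USER
    "4": "transient negative",      # temporary failure, command may be retried
    "5": "permanent negative",
}

def classify_reply(line: str) -> tuple:
    """Split an FTP reply line into (code, class of reply)."""
    code = line[:3]
    if not (len(code) == 3 and code.isdigit() and code[0] in REPLY_CLASSES):
        raise ValueError(f"not an FTP reply: {line!r}")
    return int(code), REPLY_CLASSES[code[0]]

for reply in ["331 Username OK, password required",
              "125 Data connection already open; transfer starting",
              "425 Can't open data connection",
              "452 Error writing file"]:
    print(classify_reply(reply))
```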


DNS (Domain Name System)
Friday, October 07, 2016 11:34 AM

DNS Service

- Hostname → IP translation
- Host/Mail server aliasing:
Get canonical (original) hostname/mail server, IP for supplied alias hostname/mail server
- Load distribution:
○ DNS database contains replicated servers' IPs for canonical name
○ Client query → DNS server returns entire set of IPs but rotates set order each time → Distributes traffic

Architecture

- Distributed, Hierarchical Database
- 3 classes of hierarchical servers:
○ Root:
 Contacted by local DNS that can't resolve name itself
 Contacts authoritative DNS if name mapping not known
○ Top lvl domain (TLD): Responsible for .com, .org, .net, .edu, all top-lvl country domains
○ Authoritative:
 Organization's own DNS server
 Provides mapping of organization's public hosts to IPs
- Local DNS (Default Name server):
○ Not strictly belonging to hierarchy
○ Each ISP (residential, company, uni, …) has 1
○ Host's DNS query sent to local DNS first:
 If result avail (cached, up-to-date) → Send to host
 Otherwise: Forward query into hierarchy

DNS Protocol

Query & Reply msgs have same format

DNS Name Resolution

- Iterated query: Contacted server replies with name of next server to contact:
"I don't know the answer, but you can ask this server instead"
- Recursive query: Contacted server resolves name on behalf of querier → Burden of name resolution on contacted server

DNS Caching

- Mapping query results cached in DNS servers
- TLD server addrs cached in local DNS ⇒ Root DNS servers not visited frequently

DNS Records (Resource records)

(name, value, type, ttl)
- ttl (Time-to-live): How long record stays valid in cache (removed afterward)
- type = A:
○ name = Hostname
○ value = IP
- type = NS:
○ name = Domain
○ value = Authoritative DNS's hostname for the domain
- type = CNAME:
○ name = Alias
○ value = Canonical name
- type = MX: value = Name of mail server associated with name
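Iterated resolution with caching can be simulated with an in-memory hierarchy. This sketch assumes a made-up three-level hierarchy (root, a .com TLD server, one authoritative server) whose names, addresses and TTLs are all hypothetical; real resolvers speak the DNS wire protocol, which is not modeled. The local DNS server follows referrals ("ask this server instead"), caches every record until its ttl expires, and uses cached NS records to skip the root and TLD servers:

```python
import time

# Hypothetical hierarchy: server -> {domain or hostname: (type, value, ttl in s)}
SERVERS = {
    "root": {"com": ("NS", "tld-com", 172800)},
    "tld-com": {"example.com": ("NS", "dns.example.com", 172800)},
    "dns.example.com": {
        "www.example.com": ("A", "192.0.2.10", 300),
        "mail.example.com": ("A", "192.0.2.25", 300),
    },
}

def suffixes(hostname):
    """'www.example.com' -> ['www.example.com', 'example.com', 'com']"""
    labels = hostname.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

class LocalDNS:
    """Local DNS server: iterated queries + TTL-based caching."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.cache = {}        # (type, name) -> (value, expiry time)
        self.contacted = []    # servers queried, to show the iteration

    def _cached(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        self.cache.pop(key, None)          # missing or expired
        return None

    def _ask(self, server, hostname):
        """One iterated query: an answer (A) or a referral (NS)."""
        zone = SERVERS[server]
        for name in suffixes(hostname):    # most specific match first
            if name in zone:
                return name, zone[name]
        raise KeyError(f"{server} cannot resolve {hostname}")

    def resolve(self, hostname):
        answer = self._cached(("A", hostname))
        if answer is not None:
            return answer
        server = "root"
        for domain in suffixes(hostname)[1:]:   # cached NS lets us skip root/TLD
            ns = self._cached(("NS", domain))
            if ns is not None:
                server = ns
                break
        while True:
            self.contacted.append(server)
            name, (rtype, value, ttl) = self._ask(server, hostname)
            self.cache[(rtype, name)] = (value, self.clock() + ttl)
            if rtype == "A":
                return value
            server = value   # referral: "ask this server instead"

now = [0.0]                                  # fake clock for the demo
dns = LocalDNS(clock=lambda: now[0])
print(dns.resolve("www.example.com"), dns.contacted)
# 192.0.2.10 ['root', 'tld-com', 'dns.example.com']
```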
P2P (Peer-to-Peer Applications)
Thursday, September 29, 2016 5:16 PM

File Distribution: Client-Server vs P2P

Time to distribute file of size F to N clients
(us: server upload rate, ui: upload rate of peer i, dmin: min client download rate)

- Client-Server:
○ Server: Must sequentially upload N file copies
○ Clients: Each must download a file copy
○ Deliver time: $D_{c-s} \ge \max\{N F / u_s,\ F / d_{min}\}$
- P2P:
○ Server: Must upload at least 1 copy
○ Peers: Each must download a file copy
○ Server + Peers: All contribute to deliver N file copies
Max upload rate = us + ∑ui
○ Deliver time: $D_{P2P} \ge \max\{F / u_s,\ F / d_{min},\ N F / (u_s + \sum_i u_i)\}$
- Example: u1 = … = uN = u, F/u = 1 hour, us = 10u, dmin ≥ us

BitTorrent (P2P File Distribution)

- Torrent: Collection of all peers participating in distribution of particular file
- Tracker: Infrastructure node in each torrent, keeps track of all alive peers
- File: Broken into chunks of 256 kB
- Peer:
○ Downloads + uploads among each other, accumulates more chunks over time
○ When it obtains entire file, can (selfishly) leave torrent / (altruistically) remain
○ Can leave anytime (with incomplete chunk set) and rejoin later
- Control mechanisms: Peer X
○ X joins: Registers with tracker → Gets random subset of other peers' IPs
○ X periodically informs tracker of its aliveness
○ Chunk requesting: Rarest first
 Periodically, X asks peers it knows for list of chunks they have
 X then requests its missing chunks from others, starting from rarest ones
⇒ Rarest chunks distributed more quickly ⇒ Eventually equalizes no of copies of each chunk
○ Chunk sending: Tit-for-tat:
 X sends chunks to 4 peers currently sending chunks to it at highest rate
 Requests from peers outside top 4 choked
 X re-evaluates top 4 every 10 secs
⇒ Better trading partners → Get file faster
 Every 30 secs: X randomly selects peer Y, starts sending chunks to Y ("optimistically unchoke"); Y may later join X's top 4
⇒ Allows new peers to get chunks
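The two BitTorrent policies can be sketched as pure functions: rarest-first picks which chunk to request, and tit-for-tat picks which peers to upload to. This is a simplified sketch, not a real client: neighbor chunk lists and upload rates are passed in as hypothetical dictionaries, and the 10 s re-evaluation and 30 s optimistic-unchoke timers are left to the caller:

```python
import random
from collections import Counter

def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks X is missing from rarest to most common among its neighbors.

    my_chunks: set of chunk ids X already has
    neighbor_chunks: {peer: set of chunk ids that peer reported having}
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks)
    missing = [c for c in counts if c not in my_chunks]
    return sorted(missing, key=lambda c: (counts[c], c))   # ties broken by chunk id

def tit_for_tat(rate_to_me, rng=random):
    """Unchoke the 4 peers sending to X fastest, plus 1 random optimistic unchoke.

    rate_to_me: {peer: rate at which that peer currently sends chunks to X}
    Returns (top4, optimistic); every other peer stays choked.
    """
    ranked = sorted(rate_to_me, key=lambda p: (-rate_to_me[p], p))
    top4, others = ranked[:4], ranked[4:]
    optimistic = rng.choice(others) if others else None
    return top4, optimistic

print(rarest_first({0}, {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}))   # [3, 1, 2]
rates = {"A": 50, "B": 10, "C": 40, "D": 30, "E": 20, "F": 5}
print(tit_for_tat(rates, random.Random(1)))   # top 4 = A, C, D, E; optimistic = B or F
```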

Distributed Hash Table (DHT)

- Nature:
○ Store (key, value) pairs
○ Distributed, P2P database: Each peer only holds subset of data

- Design:
○ Peer identified by integer in range [0, 2^n − 1] (n bits)
○ Hash function: Maps original key to integer in range [0, 2^n − 1]
○ Assign each (key, value) pair to peer with ID "closest" to key
"Closest" = closest successor of key
n = 4 → ID range = Hashed key range = [0, 15]
Peers: 1, 3, 4, 5, 8, 10, 12, 14
Key = 13 → Successor peer = 14
Key = 15 → Successor peer = 1 (circular)

Circular DHT

- Each peer only aware of immediate successor & predecessor
- Search query: O(N)
- Use shortcuts → With O(log N) neighbors, can reduce search query to O(log N)
- Handle peer churn (Peers come & go): Each peer knows addr of 2 immediate successors
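The successor rule and the O(N) walk around a circular DHT can be sketched as follows, using the example peers from the notes (n = 4). SHA-1 as the hash function is an assumption, and the walk assumes each peer knows only its immediate successor (no shortcuts):

```python
import bisect
import hashlib

def key_id(key: str, n_bits: int) -> int:
    """Hash an original key into [0, 2^n - 1] (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)

def successor(peers, key: int, n_bits: int) -> int:
    """Peer responsible for key: first peer ID >= key, wrapping around the ring."""
    ring = sorted(peers)
    i = bisect.bisect_left(ring, key % (2 ** n_bits))
    return ring[i] if i < len(ring) else ring[0]

def in_range(x, a, b, size):
    """True if x lies in the ring interval (a, b]."""
    return 0 < (x - a) % size <= (b - a) % size

def circular_lookup(peers, start: int, key: int, n_bits: int):
    """Forward the query along successor pointers; return (responsible peer, hops)."""
    ring = sorted(peers)
    if len(ring) == 1:
        return ring[0], 0
    size = 2 ** n_bits
    node, hops = start, 0
    while True:
        succ = ring[(ring.index(node) + 1) % len(ring)]
        if in_range(key, node, succ, size):
            return succ, hops
        node, hops = succ, hops + 1

PEERS = [1, 3, 4, 5, 8, 10, 12, 14]        # example from the notes, n = 4
print(successor(PEERS, 13, 4), successor(PEERS, 15, 4))   # 14 1
print(circular_lookup(PEERS, 1, 13, 4))                   # (14, 6)
```

The 6 forwarding hops from peer 1 illustrate the O(N) cost that shortcuts reduce to O(log N).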


Transport Layer
Friday, October 07, 2016 11:35 AM

Overview

- End-to-end protocol: Provide logical comm between app procs on different hosts
- Min required functions:
○ Multiplexing (at sender): Handle data from multiple sockets, add transport header
○ Demultiplexing (at receiver): Deliver data seg to correct socket using header
- Socket identity (for demultiplexing):
○ UDP = (Dest IP, Dest port)
○ TCP = (Source IP, Source port, Dest IP, Dest port)

UDP

- Features:
○ Connectionless:
 No handshaking between sender & receiver
 Each UDP seg handled independently of others
○ "Best-effort": Seg may be lost/delivered out-of-order
○ No congestion control: Data sent as fast as desired
- Seg structure: Header of 4 fields (16 bits each): Source port, Dest port, Length, Checksum; followed by app data
- Checksum: Detect errors
○ Sender:
 Treat seg content, including header fields, as seq of 16-bit integers
 Checksum = 1's complement of the 1's complement sum of these 16-bit integers
○ Receiver:
 Compute checksum of received seg
 Computed ≠ Header field: Error detected

Sum:                 1110011001100110
                   + 1101010101010101
                   = 11011101110111011
WrapAround:          1011101110111011 + 1 = 1011101110111100
(Carry out of MSB = 1 → drop it, add 1 to result)
Checksum:            0100010001000011
(Flip bits of WrapAround result)
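The checksum procedure and the worked example above can be reproduced with integer arithmetic. This sketch works on a list of 16-bit words; a real UDP checksum also covers a pseudo-header containing the IP addresses, which is omitted here:

```python
def ones_complement_sum16(words):
    """1's complement sum of 16-bit words: any carry out of bit 15 wraps around."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # wraparound
    return total

def udp_checksum(words):
    """Checksum = bits of the 1's complement sum flipped."""
    return ~ones_complement_sum16(words) & 0xFFFF

def checksum_ok(words, checksum):
    """Receiver: recompute the checksum and compare with the header field."""
    return udp_checksum(words) == checksum

words = [0b1110011001100110, 0b1101010101010101]   # example from the notes
print(f"{ones_complement_sum16(words):016b}")   # 1011101110111100
print(f"{udp_checksum(words):016b}")            # 0100010001000011
```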


Reliable Data Transfer
Thursday, October 20, 2016 1:34 PM

Finite State Machine (FSM)

Stop-And-Wait Protocols
(Sender sends 1 pkt, then waits for receiver to respond)

- rdt 1.0:
○ Assumptions: NO bit errors, NO pkt loss
- rdt 2.0:
○ Assumptions: Bit errors on data; NO bit errors on receiver feedback; NO pkt loss
○ Techniques:
 Checksum: Detect errors
 Receiver feedback:
• ACKs: Receiver tells sender pkt OK
• NAKs (Negative ACKs): Receiver tells sender pkt has errors, needs retransmission
- rdt 2.2:
○ Assumptions: Bit errors (on data + feedback); NO pkt loss
○ Techniques:
 Seq no (0, 1) → Receiver knows whether it receives new/retransmitted pkt
 Receiver feedback:
• Includes checksum → Sender can detect feedback errors
• Includes seq no → Sender knows which pkt it should send next
• Duplicate ACKs: ACK of previous pkt = NAK of current pkt
- rdt 3.0:
○ Assumptions: Bit errors, pkt loss
○ Techniques: Same as rdt 2.2, plus timer:
• Sender waits "reasonable" amount of time for ACK
• Timeout, no ACK → Retransmit
(Delayed feedback: Solved by pkt seq no)

* NOTE: rdt 3.0 performance limited by Stop & Wait:

Packet: L = 1 kb = 1000 b
vtrans = 10^9 b/s
dprop = 15 ms

RTT = 2 dprop = 30 ms
ttrans = L / vtrans = 0.001 ms

Sender utilization:
$U_{sender} = \frac{t_{trans}}{RTT + t_{trans}} = \frac{0.001}{30.001} \approx 3.3 \times 10^{-5}$

Throughput (over link of 1 Gb/s!):
$\frac{L}{RTT + t_{trans}} = \frac{1000\ \text{b}}{30.001\ \text{ms}} \approx 33\ \text{kb/s}$
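A sketch that recomputes the stop-and-wait numbers above and, for comparison, a sender allowed several packets in flight per RTT (the idea behind the pipelined protocols that follow). The `window` parameter and the cap at 100% utilization are simplifications of this sketch:

```python
def sender_utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender is busy transmitting.

    window=1 models stop-and-wait; window=N models a pipelined sender that sends
    N packets per RTT (capped at 1.0 once the pipe is full).
    """
    t_trans = packet_bits / rate_bps
    return min(1.0, window * t_trans / (rtt_s + t_trans))

def throughput_bps(packet_bits, rate_bps, rtt_s, window=1):
    return sender_utilization(packet_bits, rate_bps, rtt_s, window) * rate_bps

L, R, RTT = 1000, 1e9, 0.030          # numbers from the rdt 3.0 example
print(f"{sender_utilization(L, R, RTT):.2e}")                    # 3.33e-05
print(f"{throughput_bps(L, R, RTT) / 1e3:.1f} kb/s")             # 33.3 kb/s
print(f"{throughput_bps(L, R, RTT, window=3) / 1e3:.1f} kb/s")   # 100.0 kb/s
```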

Pipelined protocols
(Sender allows multiple in-flight, yet-to-be-acknowledged pkts)

N = Sliding-window size (Max no of in-flight pkts allowed at a time)

- Go-Back-N:
○ Sender:
• Has ≤ N unACKed pkts in buffer
base = Seq no of oldest unACKed pkt
nextseqnum = Seq no of smallest not-yet-sent pkt
• Timer for oldest unACKed pkt
Timeout → Retransmit ALL unACKed pkts
○ Receiver:
• No buffering: Pkt arriving out of order (gap created) → Discard immediately
• Cumulative ACK: ACK seq no of LAST in-order pkt
○ Pros: Receiver: Simple logic
○ Cons: Single error pkt causes unnecessary retransmission of many other pkts

- Selective Repeat:
○ Sender:
• Has ≤ N unACKed pkts in buffer
• Maintains timer for EACH unACKed pkt
Timeout → Retransmit only that pkt
○ Receiver:
• Buffers out-of-order pkts
• Sends individual ACK for each pkt
rcv_base = Seq no of smallest not-yet-received pkt
○ Pros: Avoids unnecessary retransmissions
○ Cons:
• Receiver: Complicated logic
• Sender: Manages many timers → OS overhead
○ Range of seq nos ≥ 2 × Window size (prevents receiver misinterpreting new pkt vs retransmission)
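The two receivers above differ in how they react to a gap. A sketch of both receiver policies, assuming unbounded sequence numbers (no wraparound) and no corrupted packets, fed with the same hypothetical arrival order in which pkt 2 is delayed:

```python
def gbn_receiver(arrivals):
    """Go-Back-N receiver: no buffering, cumulative ACK of last in-order pkt."""
    expected = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        # in order or not: ACK the last in-order pkt (-1 = none yet);
        # out-of-order pkts are simply discarded
        acks.append(expected - 1)
    return delivered, acks

def sr_receiver(arrivals, window):
    """Selective Repeat receiver: buffer out-of-order pkts, ACK each one."""
    rcv_base = 0            # smallest not-yet-received seq no
    buffered = set()
    delivered, acks = [], []
    for seq in arrivals:
        if rcv_base <= seq < rcv_base + window:
            acks.append(seq)
            buffered.add(seq)
            while rcv_base in buffered:       # deliver any in-order run
                buffered.remove(rcv_base)
                delivered.append(rcv_base)
                rcv_base += 1
        elif rcv_base - window <= seq < rcv_base:
            acks.append(seq)    # already delivered: re-ACK so sender can move on
        # anything else is ignored
    return delivered, acks

arrivals = [0, 1, 3, 4, 2]          # hypothetical: pkt 2 delayed
print(gbn_receiver(arrivals))       # ([0, 1, 2], [0, 1, 1, 1, 2])
print(sr_receiver(arrivals, 4))     # ([0, 1, 2, 3, 4], [0, 1, 3, 4, 2])
```

The Go-Back-N receiver discards 3 and 4 and keeps re-ACKing 1, so the sender must resend them; the Selective Repeat receiver buffers them and delivers 2, 3, 4 as soon as 2 arrives.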


TCP
Thursday, October 20, 2016 11:29 PM

Overview

- Point-to-point:
1 sender, 1 receiver
- No msg boundary:
Data treated as ordered byte stream
- Pipelined:
Multiple in-flight pkts
- Full duplex:
App-layer data can flow proc A → B at same time as proc B → A (different hosts)
- Conn-oriented:
Handshaking required
- Flow control:
Sender won't overflow receiver's buffer
(controlled by sliding window size)

Packet structure

- MSS (Max Seg Size): Max size of payload (app-layer data) in a seg, set by OS
- Data numbering: In terms of bytes:
○ SeqNum (Sender → Receiver): No of FIRST byte in seg's data
○ AckNum (Receiver → Sender): No of NEXT byte expected by receiver (Cumulative ACK, similar to Go-Back-N)

Timeout Estimation

- SampleRTT:
○ Time measured from seg trans → ACK of seg received
○ Only consider segs transmitted once (ignore retransmitted segs)
- EstimatedRTT = (1 − α) × EstimatedRTT + α × SampleRTT
○ Exponential weighted moving avg (influence of past samples ↓ exponentially)
○ Typical α = 0.125
- DevRTT = (1 − β) × DevRTT + β × |SampleRTT − EstimatedRTT|
○ "Safety margin" for timeout estimation
○ Typical β = 0.25
- TimeoutInterval = EstimatedRTT + 4 × DevRTT
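A sketch of one round of the timeout estimators above with α = 0.125 and β = 0.25. DevRTT is updated with the previous EstimatedRTT, which is the order RFC 6298 uses; that RFC's other details (such as rounding a timeout below 1 s up to 1 s) are left out. Starting values and samples are hypothetical:

```python
def update_timeout(est_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """One update; returns (EstimatedRTT, DevRTT, TimeoutInterval)."""
    # DevRTT first, using the previous EstimatedRTT
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - est_rtt)
    est_rtt = (1 - alpha) * est_rtt + alpha * sample_rtt
    return est_rtt, dev_rtt, est_rtt + 4 * dev_rtt

est, dev = 100.0, 10.0                    # assumed starting values (ms)
for sample in [120.0, 90.0, 300.0]:       # hypothetical SampleRTTs (ms)
    est, dev, timeout = update_timeout(est, dev, sample)
    print(f"sample={sample:.0f}  est={est:.1f}  dev={dev:.1f}  timeout={timeout:.1f}")
```

The 300 ms outlier moves EstimatedRTT only modestly but inflates DevRTT, so the timeout grows quickly, which is the purpose of the safety margin.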

Reliable Data Transfer
(Ignore flow + congestion control)

FSM: Sender, Receiver

Highlights
- Control mechanism = Go-Back-N + Selective Repeat:
• Go-Back-N: Cumulative ACK (only ACK next expected in-order SeqNum; out-of-order segs not ACKed individually)
• Selective Repeat: Individual retransmit (only retransmit smallest unACKed seg, NOT ALL unACKed)
- TCP spec doesn't include how to handle out-of-order segs
(If out-of-order segs buffered → Need more logic to handle ExpectedSeqNum, as it can change by much more than length(data) when a new pkt is received)
- Fast Retransmit:
• Often large no of in-flight segs → If 1 seg lost, many duplicate ACKs
• If sender receives 3 duplicate ACKs → Immediately retransmit seg without waiting for timeout

Scenarios: Lost ACK, Premature timeout, Cumulative ACK, Fast retransmit due to duplicate ACKs
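A sketch of the fast-retransmit trigger, assuming the sender sees a stream of AckNum values in arrival order (non-decreasing) and resends as soon as the third duplicate arrives. Because a TCP ACK carries the next byte expected, the seg to resend is the one starting at that byte:

```python
def fast_retransmits(ack_nums, dup_threshold=3):
    """Return the AckNums that trigger a fast retransmit (one per loss event)."""
    retransmit = []
    last, dups = None, 0
    for ack in ack_nums:
        if ack == last:
            dups += 1
            if dups == dup_threshold:      # 3rd duplicate: resend now, no timeout
                retransmit.append(ack)
        else:
            last, dups = ack, 0            # new ACK: reset duplicate counter
    return retransmit

# Hypothetical: seg starting at byte 200 lost, later segs still arrive
print(fast_retransmits([100, 200, 200, 200, 200, 600]))   # [200]
```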

Flow Control

- Prevent sender from transmitting too many pkts and overflowing receiver's buffer
- Receiver:
○ Advertises free buffer space through RcvWnd field
RcvWnd = RcvBuffer - (LastByteRcvd - LastByteRead)
○ Issue: New RcvWnd only sent when receiver has ACK or data to send to sender
- Sender:
○ Controls sliding-window size → Limits no of in-flight bytes
LastByteSent - LastByteAcked ≤ WindowSize = RcvWnd
○ When RcvWnd = 0 (no more data allowed in flight):
Keeps sending 1-byte segs to receiver
→ Stimulates receiver to ACK with updated RcvWnd

Connection Management

- Establishment: 3-way handshake
- Closing: Can be initiated first by either client or server
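A sketch of the two flow-control rules above, with hypothetical byte counts. The zero-window case returns a 1-byte probe, matching the note that the sender keeps sending 1-byte segs:

```python
def rcv_wnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Receiver: free buffer space advertised in the RcvWnd field."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def bytes_sender_may_send(last_byte_sent, last_byte_acked, advertised_wnd):
    """Sender: keep LastByteSent - LastByteAcked <= RcvWnd.

    With a zero window, send a 1-byte probe so the receiver replies with an
    updated RcvWnd.
    """
    if advertised_wnd == 0:
        return 1
    in_flight = last_byte_sent - last_byte_acked
    return max(advertised_wnd - in_flight, 0)

# Hypothetical numbers (bytes)
wnd = rcv_wnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(wnd)                                      # 2096
print(bytes_sender_may_send(5000, 4000, wnd))   # 1096
print(bytes_sender_may_send(5000, 5000, 0))     # 1 (zero-window probe)
```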


TCP Congestion Control
Saturday, October 22, 2016 12:33 AM

Congestion Control Overview

- Causes:
○ Large queueing delay (due to pkt arrival rate ≈ link cap)
○ Unneeded retrans by sender (due to premature timeout)
→ Routers use link bandwidth to forward unneeded pkt copies
○ Pkt dropped by router (due to full buffer)
→ Trans cap of each upstream link leading to that router wasted

- Solution approaches:
○ End-to-end:
 No explicit feedback from network
 Congestion inferred from loss, delay observed by end-systems
○ Network-assisted: Routers provide feedback to end-systems
Congestion-indicating bit

TCP Congestion Control Mechanism

- Basis: Limit sliding-window size: WindowSize ≤ cwnd
(Together with flow control: LastByteSent - LastByteAcked ≤ min(cwnd, RcvWnd))

- Expected behavior: AIMD (Additive ↑, Multiplicative ↓):
○ Additive ↑: cwnd ↑ by 1 MSS every RTT until loss
○ Multiplicative ↓: cwnd ↓ by 1/2 after loss

- Implementation (TCP Reno):
○ Slow start: cwnd ↑ exponentially
cwnd ×2 every RTT
(⇔ cwnd = cwnd + MSS every new ACK)
○ Congestion avoidance: cwnd ↑ linearly
cwnd += MSS every RTT
(⇔ cwnd = cwnd + MSS × (MSS / cwnd) every new ACK)
○ Fast recovery: cwnd = cwnd + MSS every duplicate ACK

* NOTE: For TCP Tahoe (older version):
No Fast recovery: Timeout & Triple duplicate ACK →
Slow start
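A simplified per-RTT trace of cwnd (in MSS units) for Reno and Tahoe. The loss reactions are assumptions taken from the usual textbook description rather than from these notes: on timeout, ssthresh = cwnd / 2 and cwnd = 1 MSS; on a triple duplicate ACK, Reno sets ssthresh = cwnd / 2 and cwnd = ssthresh + 3 MSS, while Tahoe goes back to slow start. An initial ssthresh of 8 MSS is also assumed, and Reno's deflation of cwnd at the end of fast recovery is folded into the linear increase:

```python
def cwnd_trace(events, ssthresh=8.0, tahoe=False):
    """cwnd (MSS) before the first RTT and after each per-RTT event.

    events: 'ack' (whole window ACKed), '3dup' (triple duplicate ACK) or 'timeout'.
    """
    cwnd = 1.0
    trace = [cwnd]
    for ev in events:
        if ev == "timeout" or (ev == "3dup" and tahoe):
            ssthresh = cwnd / 2
            cwnd = 1.0                      # back to slow start
        elif ev == "3dup":
            ssthresh = cwnd / 2
            cwnd = ssthresh + 3             # Reno: fast recovery, then linear
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: x2 per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS per RTT
        trace.append(cwnd)
    return trace

events = ["ack"] * 7 + ["3dup", "ack", "ack"]
print(cwnd_trace(events))              # [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 11.0, 12.0, 9.0, 10.0, 11.0]
print(cwnd_trace(events, tahoe=True))  # [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 11.0, 12.0, 1.0, 2.0, 4.0]
```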

TCP Fairness

- Goal:
K TCP sessions share same bottleneck link of bandwidth R
→ Each session should have avg rate of R/K

- Why TCP fair:

TCP Throughput

- In terms of W (window size) & RTT:
Avg throughput ≈ (3/4) × W / RTT

- In terms of L (Seg loss prob):
Throughput ≈ 1.22 × MSS / (RTT × √L)

- Issues:
○ UDP: No congestion control → Eats up throughput of TCP
○ Parallel TCP: Apps opening more parallel TCP conns eat up throughput of apps using fewer TCP conns
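A sketch evaluating the two TCP throughput approximations (the formulas are written out in the docstrings) and inverting the loss-based one to ask how small the seg loss prob must be for a very fast flow. The link numbers are hypothetical:

```python
import math

def avg_throughput_window(w_bytes, rtt_s):
    """Avg throughput when cwnd saw-tooths between W/2 and W: 0.75 * W / RTT (bytes/s)."""
    return 0.75 * w_bytes / rtt_s

def throughput_from_loss(mss_bytes, rtt_s, loss_prob):
    """Approximation 1.22 * MSS / (RTT * sqrt(L)), in bytes/s."""
    return 1.22 * mss_bytes / (rtt_s * math.sqrt(loss_prob))

def loss_needed(target_bps, mss_bytes, rtt_s):
    """Invert the loss formula: seg loss prob L giving target_bps (bits/s)."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2

# Hypothetical long, fat pipe: 1500-byte segs, 100 ms RTT, 10 Gb/s target
print(f"{loss_needed(10e9, 1500, 0.1):.1e}")   # 2.1e-10
```

A loss probability around 2 × 10^-10 is far lower than real links provide, which is why high-speed variants of TCP exist.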


Network Layer
Saturday, December 3, 2016 10:50 AM

Overview

- Embedded in every host, router
- Roles:
○ Transport seg from sending host to receiving host based on IP addr
○ Sending host: Encapsulate seg into datagrams
○ Receiving host: Deliver seg to transport layer
- Key functions:
○ Forwarding: Move pkts from router's input to appropriate router output
○ Routing: Determine route taken by pkts from source to dest

Virtual Circuit (VC) & Datagram Networks

- VC:
○ Service: Network-layer connection-oriented service
○ End-to-end path:
• Conn setup/teardown required → Path pre-determined
• Resources allocated to VC guaranteed
○ Link & Router:
• Link identified by different VC numbers (depending on VC it belongs to)
• Router: Forwarding table for each input; maintains "state" for each passing conn
○ Pkt:
• Carries VC identifier
• Can be changed by router after going through it (based on forwarding table)
- Datagram:
○ Service: Network-layer connectionless service
○ End-to-end path: No setup, no resource guarantee
○ Link & Router: Router maintains no state about end-to-end conns
○ Pkt: Carries dest addr (used by router to select output link)

* NOTE: VC vs Datagram networks is similar to TCP vs UDP, but:
- Host-to-host, not proc-to-proc
- Network layer can provide VC or datagram, not both
- Implemented in network core, not network edge
Components of Routers

- Input ports:
○ Decentralized switching: Forwarding performed at input port
○ Match plus action: Given datagram dest, look up output port using copy of forwarding table in input port memory
→ ↑ Processing speed (match with arrival rate)
○ Queuing: If line rate (arrival rate) > forwarding rate
○ Head-of-the-line (HOL) blocking: Datagram at front of queue might prevent others behind it from moving forward
- Switching fabrics:
○ Switching rate:
• Rate at which pkts transferred input → output
• Desirable: N × Line rate (N inputs)
○ Switching via memory:
• Switching under direct control of CPU
• Pkts copied to system's memory
• Speed limited by memory bandwidth
○ Switching via bus:
• Use shared bus to transfer input → output
• Speed limited by bus bandwidth
○ Switching via interconnection network:
• Input - output switching controlled by cross points
• Can perform multiple switching operations in parallel
- Output ports:
○ Buffering: When arrival rate from fabric > transmission rate
○ Scheduling discipline: Choose which datagram gets transmitted first
○ Recommended buffer size:
$B = \frac{RTT \cdot C}{\sqrt{N}}$
RTT: Round-trip time
C: Link cap
N: No of TCP flows passing through link
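A sketch of the recommended buffer size computation, assuming the rule of thumb B = RTT × C / √N with N TCP flows sharing the link (N = 1 gives the classic RTT × C). The link numbers are hypothetical:

```python
import math

def recommended_buffer_bits(rtt_s, link_bps, n_flows=1):
    """B = RTT * C / sqrt(N), in bits."""
    return rtt_s * link_bps / math.sqrt(n_flows)

# Hypothetical 10 Gb/s link, 250 ms RTT
print(recommended_buffer_bits(0.25, 10e9) / 1e9, "Gbit")          # 2.5 Gbit
print(recommended_buffer_bits(0.25, 10e9, 10_000) / 1e6, "Mbit")  # 25.0 Mbit
```

With many flows the required buffering drops sharply, since flows' window fluctuations partly cancel out.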


IP (Internet Protocol)
Saturday, December 3, 2016 11:17 PM

IPv4 Datagram Format

- Type of service: Type of datagram
Datagram requiring low delay, reliability, …
- Datagram length (bytes) = Header + Data
- Time-to-live (TTL):
○ Max no of routers a datagram can pass through before it must be dropped
→ Prevents circulating forever
○ ↓1 after being processed by each router
TTL = 0 → Must be dropped
- Identifier, Flags, Fragmentation offset: Used for IP fragmentation
- Protocol: Transport-layer protocol that created this datagram
TCP = 6, UDP = 17

IPv4 Fragmentation

- Network links can have different MTUs (Maximum Transmission Units)
- Datagram size > Link MTU → Divided into ≥ 2 fragments:
○ Reassembled ONLY at final dest
○ IP header fields used to identify & order fragments:
 Flags (more-fragments bit):
□ 0 for last fragment
□ 1 otherwise
 Identifier: Generated by source host
 Fragment Offset = Byte offset / 8
Offset = 185 → Data should be inserted from byte 185 × 8 = 1480

DHCP (Dynamic Host Configuration Protocol)

- Purpose:
○ Allow host to dynamically obtain its IP addr from network server when it joins network
○ IP addr lease can be renewed
- Pros:
○ Allows reuse of IP addrs
○ Automatic configuration of IP addr
- Cons: Can't maintain TCP conn when host moves between subnets (as IP addr changes)
- Mechanism:
○ 1 (optional): Host broadcasts "DHCP discover" msg
○ 2 (optional): DHCP server (can have > 1 on same network) broadcasts "DHCP offer" msg with allocated IP addr
○ 3: Host chooses 1 DHCP server, responds with "DHCP request" msg
→ Accepts offer
○ 4: Chosen DHCP server responds with "DHCP ACK" msg
NOTE: DHCP msgs here are UDP broadcast msgs; for the client's msgs:
○ Source: 0.0.0.0:68
○ Dest: 255.255.255.255:67
- Extra info provided by DHCP to clients:
○ Addr of first-hop router
○ Name, IP addr of DNS server
○ Network mask

IPv4 Addressing

- Interface: Represents the conn between host/router & physical link
○ Router has > 1 interface (as it connects to different links)
○ Host has 1-2 interfaces
Wired Ethernet, wireless 802.11, …
- Subnet: Device interfaces within same subnet can physically reach each other without intervening router
- How to obtain IP addr:
○ Local network: Get portion of ISP's addr space
○ ISP: Get from ICANN (Internet Corporation for Assigned Names & Numbers)
- IPv4 addr:
○ Associated with EACH interface
○ Size: 32 bits, usually divided into 4 groups of 8 bits
○ 2 parts:
 Subnet/Prefix: High-order bits
 Host: Low-order bits
○ CIDR (Classless Inter-Domain Routing):
a.b.c.d/x
x: No of bits representing subnet part
○ Network mask:
 Indicates subnet portion of IP addr through no of leading 1's
 Subnet addr = IP & Mask
IP   = 128.96.39.10
Mask = 255.255.255.128 (25 leading 1's)

  128. 96. 39. 10
& 255.255.255.128
= 128. 96. 39.  0

⇒ Subnet: 128.96.39.0/25
- Hierarchical addressing:
○ Router uses longest prefix matching to match dest of pkt with desired output link
○ IP addrs of devices from same organization/ISP share prefix
⇒ Routing info can be efficiently advertised among routers through route aggregation
When organization moves from 1 ISP to another, its IP addr portion can be kept but requires extra advertisement effort from ISP routers

NAT (Network Address Translation)

- Purpose: All devices in local area network (LAN) presented to outside world through 1 IP addr
- Benefits: LAN admin can:
○ Just obtain 1 IP addr from ISP
○ Change addrs of LAN devices without notifying outside world
○ Change ISP without changing addrs of LAN devices
○ Prevent LAN devices from being explicitly addressable by outside → ↑ Security
- Implementation:
○ Outgoing datagram: Replace (Source IP, Port) with (NAT IP, New port)
○ NAT translation table: Remembers all translation pairs
○ Incoming datagram: Replace (NAT IP, New port) in dest fields with (Source IP, Original port)
- Limitation:
○ Single NAT IP can only maintain ≈ 60000 simultaneous conns (Port field = 16 bits)
○ NAT traversal problem:
 Outside clients can't initiate conn with devices inside LAN:
□ Internal LAN IP can't be used
□ NAT IP is common to all devices
 Solutions:
□ Statically configure NAT to forward incoming conn requests at given port to specific device
Request to 123.76.29.7:2500 always forwarded to 10.0.0.1:25000
□ UPnP (Universal Plug & Play): NAT server can lease port mappings to LAN devices for a period upon request
LAN device requests NAT server to forward traffic on NAT port 3100 to its port 31000
□ Relaying: Hosts behind NAT exchange data through a relaying server with a public IP

IPv6 Addressing

- Size: 128 bits
→ Addr space won't get exhausted soon
- No IP fragmentation at routers
→ Sender must resend data using smaller IP datagram size upon receipt of "Packet Too Big" ICMP msg
- No header checksum
→ Removes header checksum recomputation step at each router (needed due to change in TTL)
- Datagram format: Fixed 40-byte header
○ Traffic class: = IPv4's Type of Service
○ Hop limit: = IPv4's TTL
○ Next header: = IPv4's Protocol
○ Flow label: Identifies datagrams in same flow, but flow concept not well-defined yet
- Transition from IPv4 to IPv6:
Tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers

ICMP (Internet Control Message Protocol)

- Used by hosts & routers to communicate network-lvl info
Error reporting, unreachable host, echo request/reply, …
- ICMP msg carried in IP datagram as payload
- Msg content: Type, code, first 8 bytes of IP datagram causing error
- Popular application: Traceroute:
○ Source sends series of UDP segs to dest:
1st set has TTL = 1
2nd set has TTL = 2
…
nth set has TTL = n
○ When nth set of datagrams arrives at nth router: TTL expires (reaches 0)
 Router must discard datagrams
 Sends ICMP msg to source with content:
□ Type 11, code 0 (TTL expired)
□ Router name & IP addr
 Source receives these ICMP msgs, records RTTs
○ When a seg eventually arrives at dest:
 Dest returns ICMP "port unreachable" msg (type 3, code 3)
 Source receives this ICMP msg, terminates
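Three IPv4 calculations from this section can be sketched with plain arithmetic and Python's standard `ipaddress` module: fragmenting a datagram (assuming a 20-byte header with no options), computing a subnet from an IP and mask, and longest prefix matching. The 4000-byte datagram, the 1500-byte MTU, and the forwarding table prefixes and link numbers are hypothetical:

```python
import ipaddress

def fragment(total_len, mtu, header_len=20):
    """Split an IPv4 datagram into fragments of (length, flag, offset).

    flag = 1 if more fragments follow, 0 for the last one; offset is in 8-byte
    units, so every fragment except the last carries a multiple of 8 data bytes.
    """
    data_left = total_len - header_len
    max_data = (mtu - header_len) // 8 * 8
    fragments, offset_bytes = [], 0
    while data_left > 0:
        size = min(max_data, data_left)
        data_left -= size
        fragments.append((size + header_len, 1 if data_left > 0 else 0, offset_bytes // 8))
        offset_bytes += size
    return fragments

def subnet_of(ip, mask):
    """Subnet addr = IP & Mask, returned in CIDR a.b.c.d/x form."""
    return str(ipaddress.ip_interface(f"{ip}/{mask}").network)

def longest_prefix_match(dest, table):
    """table: {CIDR prefix: output link}; return the link of the longest match."""
    addr = ipaddress.ip_address(dest)
    best_len, best_link = -1, None
    for prefix, link in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_link = net.prefixlen, link
    return best_link

# Hypothetical forwarding table
TABLE = {"200.23.16.0/21": 0, "200.23.24.0/24": 1, "200.23.24.0/21": 2}

print(fragment(4000, 1500))   # [(1500, 1, 0), (1500, 1, 185), (1040, 0, 370)]
print(subnet_of("128.96.39.10", "255.255.255.128"))   # 128.96.39.0/25
print(longest_prefix_match("200.23.24.170", TABLE))   # 1 (/24 beats /21)
```

The second fragment's offset of 185 corresponds to byte 1480, as in the fragmentation notes.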

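A NAT translation table can be sketched with two dictionaries, one for each direction. The public IP, the port numbering and the LAN addresses are hypothetical, and the 16-bit port limit is not enforced; `add_static` models the static port-forwarding workaround for the NAT traversal problem:

```python
import itertools

class NAT:
    """NAT translation table (hypothetical addresses)."""

    def __init__(self, public_ip, first_port=5001):
        self.public_ip = public_ip
        self.next_port = itertools.count(first_port)   # source of unused NAT ports
        self.out_map = {}   # (LAN IP, LAN port) -> NAT port
        self.in_map = {}    # NAT port -> (LAN IP, LAN port)

    def outgoing(self, src_ip, src_port):
        """Replace (Source IP, Port) with (NAT IP, New port), remembering the pair."""
        key = (src_ip, src_port)
        if key not in self.out_map:
            port = next(self.next_port)
            self.out_map[key] = port
            self.in_map[port] = key
        return self.public_ip, self.out_map[key]

    def incoming(self, dst_port):
        """Map (NAT IP, New port) back to (Source IP, Original port).

        Returns None for unknown ports: an outside host cannot open a conn to a
        LAN device on its own (the NAT traversal problem).
        """
        return self.in_map.get(dst_port)

    def add_static(self, nat_port, lan_ip, lan_port):
        """Static port forwarding: incoming requests on nat_port go to one device."""
        self.in_map[nat_port] = (lan_ip, lan_port)

nat = NAT("203.0.113.7")
print(nat.outgoing("10.0.0.1", 3345))   # ('203.0.113.7', 5001)
print(nat.incoming(5001))               # ('10.0.0.1', 3345)
print(nat.incoming(2500))               # None
nat.add_static(2500, "10.0.0.1", 25000)
print(nat.incoming(2500))               # ('10.0.0.1', 25000)
```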
Routing Algorithms
Saturday, December 3, 2016 11:17 PM

Routing algorithm classification

- Global vs Decentralized:
○ Global: All routers have complete knowledge of topology, link costs
○ Decentralized: Router only knows physically connected neighbors and link costs to them

- Static vs Dynamic:
○ Static: Routes change slowly over time
○ Dynamic: Routes can change quickly (periodic updates, in response to link cost changes)

Graph Abstraction
Graph G = (N, E)
N: Set of routers
E: Set of links
c(x, y): Link cost, e.g. 1, ∝ 1/Bandwidth, ∝ 1/Congestion

Routing Algorithms
Compute least-cost paths from 1 node (source) to all other nodes

Link-State (LS): Global, Iterative
- Notation:
○ u: Source
○ D(v): Current value of path cost from source to v
○ p(v): Predecessor node along path from source to v
○ N': Set of nodes whose least-cost paths are definitively known
- Algorithm (Dijkstra):
  Initialization:
    N' = {u}
    for v ∈ N, v ≠ u:
      if ∃(u, v) then D(v) = c(u, v) else D(v) = ∞
  Loop until all nodes in N':
    Find w ∉ N' such that D(w) is min
    N' ← N' ∪ {w}
    for v ∉ N', ∃(w, v):
      D(v) = min(D(v), D(w) + c(w, v))

Distance-Vector (DV): Decentralized, Iterative
- Notation:
○ dx(y) = Cost of least-cost path x → y
○ Dx(y) = Estimate of least cost x → y
○ Distance vector: Dx = [Dx(y): y ∈ N]
- Algorithm (Bellman-Ford), for each node x:
○ Assumption:
• Knows cost to each neighbor v: c(x, v)
• Maintains neighbors' DVs: Dv = [Dv(y): y ∈ N], ∃(x, v)
○ When c(x, v) changes or a DV update msg from neighbor v is received:
• Update Dx: Dx(y) = min over neighbors v of {c(x, v) + Dv(y)}, y ∈ N
• If ∃y: Dx(y) changes: Notify neighbors
○ Under minor, natural conditions: Dx(y) → dx(y)
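The DV update rule above can be sketched as synchronous rounds. This is an idealization: real DV nodes update asynchronously, whenever a message arrives. The `cost` dictionary holds the 3-node link costs c(x, y) = 2, c(y, z) = 1, c(x, z) = 7, read off the first-iteration rows of the example tables that follow. `dv_round` and `run_dv` are illustrative names, not part of any protocol.

```python
INF = float("inf")


def dv_round(cost, dv):
    """One synchronous round: every node x recomputes
    Dx(y) = min over neighbors v of c(x, v) + Dv(y), using the DVs
    its neighbors advertised in the previous round."""
    new_dv = {}
    for x in cost:
        new_dv[x] = {}
        for y in cost:
            if y == x:
                new_dv[x][y] = 0
            else:
                new_dv[x][y] = min(cost[x][v] + dv[v][y] for v in cost[x])
    return new_dv


def run_dv(cost):
    """Iterate rounds until no DV changes; return final DVs and #rounds with changes."""
    # Initially each node only knows its direct link costs
    dv = {x: {y: 0 if y == x else cost[x].get(y, INF) for y in cost} for x in cost}
    rounds = 0
    while True:
        new_dv = dv_round(cost, dv)
        if new_dv == dv:
            return dv, rounds
        dv, rounds = new_dv, rounds + 1


cost = {
    "x": {"y": 2, "z": 7},
    "y": {"x": 2, "z": 1},
    "z": {"x": 7, "y": 1},
}
final, rounds = run_dv(cost)
print(final["x"])  # {'x': 0, 'y': 2, 'z': 3}
```

A single exchange is enough here: after it, x learns that z is cheaper via y (2 + 1 = 3 < 7), as in the 2nd-iteration row of node x's table.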
Example

DV: nodes x, y, z with c(x, y) = 2, c(y, z) = 1, c(x, z) = 7
Each node's table: row = "From" node's DV, columns = cost "To" x, y, z

Node x     1st iteration   2nd iteration   3rd iteration
To:        x  y  z         x  y  z         x  y  z
From x     0  2  7         0  2  3         0  2  3
From y     ∞  ∞  ∞         2  0  1         2  0  1
From z     ∞  ∞  ∞         7  1  0         3  1  0

Node y     1st iteration   2nd iteration   3rd iteration
To:        x  y  z         x  y  z         x  y  z
From x     ∞  ∞  ∞         0  2  7         0  2  3
From y     2  0  1         2  0  1         2  0  1
From z     ∞  ∞  ∞         7  1  0         3  1  0

Node z     1st iteration   2nd iteration   3rd iteration
To:        x  y  z         x  y  z         x  y  z
From x     ∞  ∞  ∞         0  2  7         0  2  3
From y     ∞  ∞  ∞         2  0  1         2  0  1
From z     7  1  0         3  1  0         3  1  0

LS (Dijkstra), source u:
Step  N'        D(v),p(v)  D(w),p(w)  D(x),p(x)  D(y),p(y)  D(z),p(z)
0     u         7,u        3,u        5,u        ∞          ∞
1     uw        6,w                   5,u        11,w       ∞
2     uwx       6,w                              11,w       14,x
3     uwxv                                       10,v       14,x
4     uwxvy                                                 12,y
5     uwxvyz

Forwarding table in u (first hop traced back through p(·)):
Dest   v       x       y       w       z
Link   (u, w)  (u, x)  (u, w)  (u, w)  (u, w)
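Here is a sketch of the LS loop that reproduces the Dijkstra table and forwarding table above. The lecture figure is not in these notes, so the edge list below is a hypothetical graph, chosen to agree with every D(v), p(v) entry in the table. A binary heap (`heapq`) replaces the linear "find min" scan. That substitution is where the O((|N| + |E|) log|N|) bound comes from.

```python
import heapq


def build_graph(edges):
    """Undirected graph as {node: {neighbor: cost}}."""
    graph = {}
    for a, b, c in edges:
        graph.setdefault(a, {})[b] = c
        graph.setdefault(b, {})[a] = c
    return graph


def dijkstra(graph, source):
    """Return (D, p): least cost to each node and predecessor on that path."""
    D = {source: 0}
    p = {}
    done = set()                       # N'
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)     # w not in N' with minimum D(w)
        if w in done:
            continue                   # stale heap entry
        done.add(w)                    # N' <- N' ∪ {w}
        for v, c in graph[w].items():
            if v not in done and d + c < D.get(v, float("inf")):
                D[v] = d + c           # D(v) = min(D(v), D(w) + c(w, v))
                p[v] = w
                heapq.heappush(heap, (D[v], v))
    return D, p


def first_hop(p, source, dest):
    """Walk predecessors back from dest to find the outgoing link at source."""
    node = dest
    while p[node] != source:
        node = p[node]
    return (source, node)


# Hypothetical link costs consistent with the LS example table
edges = [("u", "v", 7), ("u", "w", 3), ("u", "x", 5), ("w", "v", 3),
         ("w", "x", 4), ("w", "y", 8), ("x", "y", 7), ("x", "z", 9),
         ("v", "y", 4), ("y", "z", 2)]
D, p = dijkstra(build_graph(edges), "u")
print(D)  # {'u': 0, 'v': 6, 'w': 3, 'x': 5, 'y': 10, 'z': 12}
print({d: first_hop(p, "u", d) for d in "vxywz"})
# {'v': ('u', 'w'), 'x': ('u', 'x'), 'y': ('u', 'w'), 'w': ('u', 'w'), 'z': ('u', 'w')}
```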

Potential problems:
- LS: Oscillation: Routes change frequently when link cost ∝ carried traffic (costs keep changing as traffic shifts)
- DV: Count-to-infinity: Routing loops → DV converges very slowly when a link cost ↑
(lecture example: 44 iterations before algorithm stabilizes)
Solution (not effective in all cases, e.g. loops involving ≥ 3 nodes): Poisoned reverse:
If Z routes through Y to reach X
→ Z tells Y that DZ(X) = ∞
(→ Y won't route via Z to reach X)

Msg complexity:
- LS: Each node needs to know all link costs in network; when a link cost changes, msg must be sent to all nodes ⇒ Complex
- DV: Msgs only exchanged between directly-connected nodes; when a link cost changes, DV propagated to neighbors only if it changes ⇒ Simple

Convergence speed:
- LS: Algorithm $O(|N|^2)$ (more efficient implementation: $O((|N| + |E|) \log |N|)$); max $O(|N| \cdot |E|)$ msgs sent
- DV: Convergence speed varies, may suffer from count-to-infinity

Robustness:
- LS: Node can broadcast wrong cost only for its own attached links; each node computes only its own forwarding table ⇒ More robust
- DV: Node can advertise wrong path costs to any/all dests; each node's DV is used by others → Error can propagate throughout network
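The following is a toy run of the count-to-infinity scenario, with and without poisoned reverse. The topology and costs are hypothetical: c(y, z) = 1, c(x, z) = 50, and c(x, y) rising from 4 to 60. In one "round", y recomputes once and then z recomputes once. The round counts therefore show the trend; they do not reproduce the lecture's iteration count.

```python
INF = float("inf")


def bad_news(poisoned_reverse, max_rounds=1000):
    """c(y,z) = 1, c(x,z) = 50, c(x,y) just rose from 4 to 60. Before the change
    y reached x directly (cost 4) and z reached x via y (cost 5).
    Returns (Dy(x), Dz(x), rounds until nothing changes)."""
    c_xy, c_yz, c_xz = 60, 1, 50
    dy = 4
    dz, hop_z = 5, "y"
    for rounds in range(1, max_rounds + 1):
        # z lies to y (advertises ∞) if z routes to x via y
        z_adv = INF if poisoned_reverse and hop_z == "y" else dz
        new_dy, new_hop_y = min((c_xy, "x"), (c_yz + z_adv, "z"))
        # y lies to z (advertises ∞) if y routes to x via z
        y_adv = INF if poisoned_reverse and new_hop_y == "z" else new_dy
        new_dz, new_hop_z = min((c_xz, "x"), (c_yz + y_adv, "y"))
        if (new_dy, new_dz) == (dy, dz):
            return dy, dz, rounds
        dy, dz, hop_z = new_dy, new_dz, new_hop_z
    return dy, dz, max_rounds


print(bad_news(False))  # (51, 50, 25): costs creep up 2 per round
print(bad_news(True))   # (51, 50, 3)
```

Poisoned reverse works here because the loop involves only y and z. With a loop through 3 or more nodes, the same trick can still fail, which is the "not effective in all cases" caveat.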

Hierarchical routing

- LS/DV limitations:
○ Scale: Storing & exchanging routing info among millions of routers
 Large computation overhead for individual router
 Routing-info exchange would swamp links, leaving no bandwidth for data pkts
○ Administrative autonomy: Organization wants to administer its own network as it wishes
Choose routing algorithm, hide internal structure from outside, …

- Autonomous System (AS):


○ Group of routers under same administrative control
○ Intra-AS routing: Routers in same AS run same routing
protocol
○ Gateway router: At edge of AS, link to router of another AS

- Inter-AS routing tasks:


○ Gateway router:
 Learn which dest reachable through neighboring AS
 Propagate this reachability to all routers within AS


○ All routers: Choose which gateway to forward pkts towards, for dests in other ASs (depends on AS policies)
Hot potato routing: Forward pkt towards closest gateway (least intra-AS path cost)
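Hot potato selection boils down to an argmin over intra-AS costs. This sketch assumes each router already knows two things: its least intra-AS cost to every gateway, and which gateways have learned a route to the destination. The gateway names and costs are hypothetical.

```python
def hot_potato(cost_to_gateway, gateways_for_dest):
    """Pick the gateway with the least intra-AS cost among those that
    have learned (via inter-AS routing) a route to the destination."""
    candidates = [g for g in gateways_for_dest if g in cost_to_gateway]
    if not candidates:
        return None                    # no known route to this dest
    return min(candidates, key=lambda g: cost_to_gateway[g])


# Least intra-AS costs from this router to each gateway (hypothetical values,
# e.g. as computed by the AS's intra-AS LS/DV algorithm)
cost_to_gateway = {"gw1": 4, "gw2": 9, "gw3": 2}
print(hot_potato(cost_to_gateway, ["gw1", "gw2"]))  # gw1
```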



Routing on the Internet
Monday, December 5, 2016 11:01 PM

Intra-AS Routing: RIP (Routing Information Protocol)
- Distance vector (DV) algorithm
- Max path length = 15 hops
→ Infinity = 16
- All link costs = 1 (cost = hop count)
- Only 1 path used between a source-dest pair
- Advertisement:
○ Neighbors exchange DV every 30 sec
○ Each advertisement: ≤ 25 dest subnets
- Link failure & recovery:
○ No advertisement after 180 sec → Neighbor/link declared dead → Routes via it invalidated, new advertisements sent to neighbors
○ Poison reverse: Prevent ping-pong loops
- Forwarding table processing:
○ Managed by app-lvl daemon process called route-d
○ Advertisements sent in UDP pkts

Intra-AS Routing: OSPF (Open Shortest Path First)
- Link State (LS) algorithm (Dijkstra)
- Each link can have different cost metrics for different Type of Service
- Allows multiple same-cost paths
→ Router can simultaneously use ≥ 2 paths to route traffic towards dest
- Advertisement:
○ 1 entry/neighbor in msg
○ Flooded to entire AS
○ Msg carried directly over IP (rather than TCP, UDP)
- Integrated multicast support
- Security: All msgs authenticated

Intra-AS Routing: Hierarchical OSPF
- 2-lvl hierarchy:
○ 1 Backbone area:
 Routes traffic among local areas
 Boundary + Backbone + Area border routers
○ ≥ 2 Local areas:
 Area border + Internal routers
- Routers only know their own area's topology & broadcast LS within their area
- Area border routers:
○ Summarize distances to routers in own area
○ Advertise to other Area border routers
- Backbone routers: Run OSPF inside Backbone area only
- Boundary routers: Gateway to other ASs

BGP (Border Gateway Protocol)
De facto Inter-AS Routing Protocol
- Purpose:
○ Obtain subnet reachability info from neighboring ASs
○ Propagate reachability info to all AS-internal routers
○ Determine "good" routes to other networks based on reachability info & policy
○ Allow subnet to advertise its existence to rest of Internet
- BGP session:
○ Runs on semi-permanent TCP conn
○ Advertises paths to different dest network prefixes:
 eBGP session: Between routers from different ASs
 iBGP session: Among routers in same AS

○ ≥ 2 prefixes can be aggregated into 1 entry in advertisement msg
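Prefix aggregation can be illustrated with Python's standard `ipaddress` module. Its `collapse_addresses` function merges contiguous networks into the fewest covering prefixes. The four /23 prefixes below are hypothetical.

```python
import ipaddress

# Hypothetical prefixes that one AS could otherwise advertise separately
prefixes = [
    ipaddress.ip_network("200.23.16.0/23"),
    ipaddress.ip_network("200.23.18.0/23"),
    ipaddress.ip_network("200.23.20.0/23"),
    ipaddress.ip_network("200.23.22.0/23"),
]

# Merge contiguous networks into the fewest covering prefixes
aggregated = list(ipaddress.collapse_addresses(prefixes))
print(aggregated)  # [IPv4Network('200.23.16.0/21')]
```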

○ Msg type:
 OPEN
 UPDATE: Advertise new/Withdraw old paths
 KEEPALIVE
 NOTIFICATION: Report error, close conn

- BGP route = Prefix + Attributes (AS-PATH, NEXT-HOP, …):
○ AS-PATH: Contains ASs through which advertisement for prefix passed, e.g. AS 67, AS 17, …
→ Prevent loops in advertisement (AS rejects a route whose AS-PATH already contains itself)
→ Choose among multiple paths to same prefix
○ NEXT-HOP: Indicates IP addr of first router (outside AS receiving advertisement) along advertised path to given prefix
→ Used by routers to configure forwarding table

- BGP Route Selection: Select among multiple routes to dest AS, based on:
○ Policy
○ Shortest AS-PATH
○ Closest NEXT-HOP (hot potato routing)
○ …
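The loop check and the first three selection criteria can be sketched as follows. `local_pref` models BGP's real LOCAL_PREF attribute and stands in for "Policy" here. `next_hop_cost` is a hypothetical intra-AS cost to the NEXT-HOP router. The prefix and AS numbers are hypothetical. Real BGP's decision process has more tie-breakers (the "…").

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    as_path: list[int]        # ASs the advertisement passed through
    next_hop_cost: int        # intra-AS cost to reach the NEXT-HOP router
    local_pref: int = 100     # stand-in for "Policy": higher = preferred


def select_route(routes, my_as):
    """Reject routes whose AS-PATH contains my_as (loop), then prefer
    higher local_pref, then shorter AS-PATH, then closer NEXT-HOP."""
    loop_free = [r for r in routes if my_as not in r.as_path]
    if not loop_free:
        return None
    return min(loop_free,
               key=lambda r: (-r.local_pref, len(r.as_path), r.next_hop_cost))


# Hypothetical advertisements for one prefix, received by AS 3
routes = [
    Route("138.16.64.0/22", [67, 17], next_hop_cost=5),
    Route("138.16.64.0/22", [12, 17], next_hop_cost=2),
    Route("138.16.64.0/22", [42, 88, 17], next_hop_cost=1),
    Route("138.16.64.0/22", [3, 17], next_hop_cost=0),   # contains AS 3: loop
]
print(select_route(routes, my_as=3).as_path)  # [12, 17]
```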

- BGP Routing policy:
Example network (C's perspective):
A, B, C: Provider networks
W: A's customer
X: B & C's customer (dual-homed)
Y: C's customer

○ B advertises path BAW to X
○ B does NOT advertise BAW to C:
 Neither W nor C is B's customer
 B gets no revenue for routing C→B→A→W
○ X does NOT advertise BX to C:
X gets no benefit from helping C route to B via X
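One common way to encode the policy above is the customer/peer/provider export rule, often called "valley-free" or Gao-Rexford routing. It is swapped in here as an illustration; the notes give only the example, not a general rule. Treating A, B and C as peers of each other is an assumption.

```python
def should_advertise(learned_from, advertise_to):
    """Export check. Each argument is the neighbor's relationship to us:
    "customer", "peer" or "provider"; "self" marks our own prefixes."""
    if learned_from in ("self", "customer"):
        return True                        # carrying customer traffic earns revenue
    return advertise_to == "customer"      # peer/provider routes: customers only


# B learned path BAW from A (assumed peer of B)
print(should_advertise("peer", "customer"))      # B -> X: True
print(should_advertise("peer", "peer"))          # B -> C: False
# X learned BX from its provider B
print(should_advertise("provider", "provider"))  # X -> C: False
```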



Broadcast & Multicast Routing
Saturday, December 3, 2016 11:17 PM

Broadcast
Deliver pkts from source to ALL other nodes

- Flooding:
○ When node receives broadcast pkt, sends copy to all neighbors
○ Problems: Cycles, broadcast storm
- Controlled flooding: Node only broadcasts pkt if it hasn't broadcast same pkt before
○ Node keeps track of pkt IDs already broadcast, or
○ Reverse path forwarding (RPF):
Only forward pkt if it arrives on shortest path between node & source
- Spanning tree:
○ Construct spanning tree first:
 Choose center node
 Each node sends msg to center node
 Msg forwarded until arriving at node already belonging to tree
○ Then only forward pkts along tree

Multicast
1. Problem statement: Find tree connecting a group (not ALL) of routers in network
2. Approaches:
a. Source-based tree: Different senders generate different trees
- Tree-forming criterion: Shortest path
Tree of shortest paths from source to receivers
- Router's forwarding behavior (RPF):
  if (datagram received on incoming link on shortest path back to source)
  then Flood datagram to all outgoing links
  else Ignore datagram
- Pruning: When tree contains subtree with no group member
 Router having no attached hosts in group sends "prune" msg to upstream router
 Router receiving prune msgs from all its downstream routers forwards prune further upstream
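Reverse path forwarding applies both to controlled flooding and to the source-based tree's forwarding rule above. In this sketch, "shortest path" means hop count with ties broken alphabetically, and the 4-node graph is hypothetical. Plain flooding on this graph would circulate copies around the A-B-C cycle indefinitely. With RPF, each node forwards at most once.

```python
from collections import deque


def parents_toward(graph, source):
    """BFS shortest-path (hop-count) tree: parent[n] = n's next hop back to source."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in sorted(graph[node]):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


def rpf_broadcast(graph, source):
    """Return (nodes reached, number of link transmissions) under RPF."""
    parent = parents_toward(graph, source)
    reached = {source}
    sent = 0
    pending = deque((source, nbr) for nbr in sorted(graph[source]))
    while pending:
        frm, node = pending.popleft()
        sent += 1
        reached.add(node)
        if parent[node] == frm:        # arrived on shortest path back to source
            pending.extend((node, nbr) for nbr in sorted(graph[node]) if nbr != frm)
        # else: ignore this copy
    return reached, sent


graph = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
reached, sent = rpf_broadcast(graph, "A")
print(sorted(reached), sent)  # ['A', 'B', 'C', 'D'] 7
```

RPF still sends a few redundant copies here (e.g. B→C), which is why the spanning-tree approach above can do better.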

b. Group-shared tree: Same tree for a group
- Tree-forming technique: Center-based
 1 router chosen as "center" (rendezvous point (RP))
 Join tree:
□ Edge router sends Join-msg to center
□ Join-msg forwarded until it hits existing tree branch/center
□ Path taken by join-msg becomes new tree branch

PIM (Protocol Independent Multicast)
- Not dependent on any specific underlying unicast routing algorithm
- 2 scenarios:
○ Dense:
 Group members: Densely packed, in "close" proximity
 Bandwidth: Plentiful
 Membership: Assumed until routers explicitly PRUNE
 Tree construction: Data-driven, RPF
 Flood-and-prune, similar to DVMRP but:
• Underlying unicast protocol provides RPF info for incoming datagram
• Less complicated downstream flood → ↓ Reliance on routing algorithm
• Has mechanism to detect leaf-node router
○ Sparse:
 Group members: Small number, "widely dispersed"
 Bandwidth: Not plentiful
 Membership: Not assumed until routers explicitly JOIN
 Tree construction: Receiver-driven, center-based
 After joining via RP, router can switch to source-specific tree → ↑ Performance (shorter path)
 RP can extend tree upstream to source

DVMRP (Distance Vector Multicast Routing Protocol)
- Commonly implemented in commercial routers
- Source-based, reverse path forwarding
- No assumption about underlying unicast
- Initial datagram to group members flooded throughout network
- Router leaving group: Sends prune msg upstream
- Soft state:
○ DVMRP router periodically forgets "pruned" branches, continues to push data downstream
○ Downstream router must re-prune
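The flood-and-prune outcome on a source-based tree can be computed recursively: a router stays on the tree if it has members or any downstream router stays. This is a sketch; the tree and the membership set are hypothetical. The soft-state timeout and re-pruning described above are not modeled.

```python
def prune(children, members, root):
    """Return the set of routers that stay on a source-based multicast tree.
    children: {router: [downstream routers]} (hypothetical tree)
    members: routers with attached group members
    A router is pruned when it has no attached members and every downstream
    router has pruned (it would then send "prune" upstream)."""
    kept = set()

    def visit(router):
        keep_any_child = False
        for child in children.get(router, []):
            if visit(child):           # visit every child, no short-circuit
                keep_any_child = True
        if router in members or keep_any_child:
            kept.add(router)
            return True
        return False                   # would send "prune" upstream

    visit(root)
    return kept


tree = {"S": ["R1", "R2"], "R1": ["R3", "R4"], "R2": ["R5"]}
print(sorted(prune(tree, {"R3"}, "S")))  # ['R1', 'R3', 'S']
```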



Link Layer
Monday, December 5, 2016 11:01 PM

Overview
- Terminology:
○ Node: Host/Router
○ Link: Comm channels connecting adj nodes along comm path
○ Frame: Link layer pkt, encapsulates datagram

- Link layer services:
○ Framing: Frame = Header + Datagram
○ Link access:
 MAC (Medium Access Control) protocol: Rule by which frame transmitted onto link
 Support:
□ Point-to-point link
□ Broadcast link:
 ≥ 2 nodes share 1 link, nodes can transmit simultaneously
 Collision if ≥ 2 signals at same time
⇒ Need multiple access protocol
○ Reliable delivery (between adj nodes):
Usually on wireless links (high error rates) → Correct errors locally, preventing end-to-end retransmission
○ Error detection & correction:
 Error sources: Signal attenuation, electromagnetic noise, …
 Correct bit errors without triggering retransmission
○ Flow control
○ Half/Full-duplex transmission
Wireless, by nature, is HALF-duplex

Implementation
- Adapter = Hardware + Software + Firmware
NIC (Network Interface Card), chip
- Implements link + physical layer
- Attaches into host's system bus

Error Detection & Correction: Parity Checking
- Single parity:
○ Sender: Add 1 parity bit for every d data bits
→ No. of 1's in (d + 1) bits is even (Even scheme) / odd (Odd scheme)

○ Receiver: For every (d + 1) bits received, check no. of 1's:
 Even scheme: ODD no. of 1's → Error
 Odd scheme: EVEN no. of 1's → Error
○ Undetected errors can still happen: an even no. of flipped bits leaves the count EVEN/ODD as expected in the Even/Odd scheme

- 2D parity:
○ Sender:
 Arrange d data bits into i rows × j cols
 Compute parity bit for each row, col
○ Receiver:
 Detect & correct 1-bit errors, or
 Detect (only) 2-bit errors

Error Detection: CRC (Cyclic Redundancy Check)
- Widely used in practice
Ethernet, 802.11 WiFi
- Sender: For every d data bits:
○ D = Number represented by these d bits
○ G = Generator, bit length (r + 1)
○ R = CRC bits, length r
○ Need to find R such that $(D \cdot 2^r) \oplus R$ is divisible by G (mod-2 arithmetic)
→ $R = (D \cdot 2^r) \bmod G$
- Can detect all burst errors of fewer than (r + 1) bits
- Receiver:
○ G known
○ T = Number represented by (d + r) bits received
○ $T \bmod G \neq 0$ → Error
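The sketch below implements the 2D parity and CRC procedures above on lists and strings of bits. It assumes even parity for 2D parity. The data word `101110` and generator `1001` are illustrative values. `mod2_remainder` does the XOR long division that the ⊕ and mod notation refers to. All function names are hypothetical.

```python
def add_2d_parity(rows):
    """Even 2D parity: append a parity bit to each row, then a parity row."""
    with_row_bits = [row + [sum(row) % 2] for row in rows]
    parity_row = [sum(col) % 2 for col in zip(*with_row_bits)]
    return with_row_bits + [parity_row]


def check_2d_parity(block):
    """Return (block, status); flips the bit if exactly one error is located."""
    bad_rows = [i for i, row in enumerate(block) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*block)) if sum(col) % 2]
    if not bad_rows and not bad_cols:
        return block, "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:   # single-bit error: fix it
        fixed = [row[:] for row in block]
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
        return fixed, "corrected"
    return block, "error detected"                  # e.g. 2-bit error


def mod2_remainder(bits, generator):
    """Remainder of mod-2 (XOR) long division of bit string `bits` by `generator`."""
    r = len(generator) - 1
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - r):
        if work[i]:                                 # leading 1: subtract (XOR) G
            for k, g in enumerate(gen):
                work[i + k] ^= g
    return "".join(str(b) for b in work[-r:])


def crc_encode(data, generator):
    """Sender: R = (D * 2^r) mod G; transmit D followed by R."""
    r = len(generator) - 1
    return data + mod2_remainder(data + "0" * r, generator)


def crc_ok(received, generator):
    """Receiver: T mod G must be 0."""
    return "1" not in mod2_remainder(received, generator)


frame = crc_encode("101110", "1001")
print(frame)                                               # 101110011
print(crc_ok(frame, "1001"), crc_ok("101110010", "1001"))  # True False
```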



Multiple Access Protocol
Friday, December 09, 2016 10:12 AM

Overview: Multiple access protocol
- Distributed algorithm, determines how nodes share channel, i.e. when node can transmit
- No out-of-band coordination channel: Comm about channel sharing must use channel itself
- Ideal: Channel rate R bps
○ 1 node wants to transmit: Sending rate = R
○ M nodes want to transmit: Sending rate of each = R/M
○ Fully decentralized:
 No special node to coordinate transmissions
 No sync of clocks, slots
○ Simple
- Types:
○ Channel partitioning: Based on time slots, freq, code, …
 TDMA (Time Division Multiple Access)
 FDMA (Freq Division Multiple Access)
○ Random access:
 No channel division
 When node wants to send: Transmit at full rate
 No a priori coordination among nodes
→ Allow collisions, provide collision recovery
○ "Taking turns":
 Nodes take turns to transmit
 Nodes with more data to send can take longer turns

Random Access Protocol: Slotted ALOHA
- Assumptions:
○ Equal frame size
○ Time divided into equal slots (time to transmit 1 frame)
○ Node:
 Starts transmission only when slot begins
 Nodes' clocks are synchronized
 ≥ 2 nodes transmit in slot → All detect collision
- Node operation: Obtain fresh frame, transmit in next slot
○ No collision: Progress to new frame, next slot
○ Collision: Retransmit frame in each subsequent slot with prob p until success
- Pros:
○ Single active node can continuously transmit at full rate
○ Highly decentralized: Each node detects collisions & decides when to retransmit independently
○ Simple
- Cons:
○ Wasted slots: Collisions, idle slots
○ Clock sync among nodes
- Efficiency (long-run fraction of successful slots):
N nodes, each has many frames to send, each transmits in a slot with prob p
$P(\text{given node succeeds in slot}) = p(1 - p)^{N-1}$
$E = P(\text{any node succeeds}) = Np(1 - p)^{N-1}$
→ Find p to max E
$N \to \infty$: $\max E = 1/e \approx 0.37$

Random Access Protocol: Pure (Unslotted) ALOHA
- Node operation: Obtain fresh frame, transmit immediately
○ Collision: Immediately retransmit (after completely transmitting collided frame) with prob p
○ Otherwise: Wait 1 frame transmission time, then
 Retransmit frame with prob p
 Keep waiting with prob (1 - p)
- Pros: No sync needed
- Cons: High collision prob
- Efficiency:
1 time unit = Frame transmission time; node starts transmitting at $t_0$
$P(\text{given node succeeds}) = P(\text{node transmits}) \times P(\text{no other node transmits in } [t_0 - 1, t_0]) \times P(\text{no other node transmits in } [t_0, t_0 + 1]) = p(1 - p)^{N-1}(1 - p)^{N-1}$
$E = Np(1 - p)^{2(N-1)}$
$N \to \infty$, choose p to max E: $\max E = 1/(2e) \approx 0.18$ (worse than slotted ALOHA)

Random Access Protocol: CSMA/CD (Carrier Sense Multiple Access/Collision Detection)
- Principle of CSMA: Listen before transmit
○ Channel sensed idle: Transmit entire frame
○ Channel sensed busy: Defer transmission
- Collisions can still occur: due to propagation delay, 2 nodes may not hear each other's transmission immediately
- Collision Detection:
○ Detect collision within short time → Abort transmission → ↓ Channel wastage
○ Implementation:
 Easy in wired LANs: Measure signal strengths, compare transmitted & received signals
 Difficult in wireless LANs: Received signal overwhelmed by local transmission power
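The slotted and pure ALOHA efficiency formulas earlier on this page can be checked numerically. The sketch grid-searches p for a given N and compares the result against the 1/e and 1/(2e) limits. The grid size is an arbitrary choice.

```python
import math


def slotted_aloha_eff(n, p):
    """E = N p (1 - p)^(N - 1)."""
    return n * p * (1 - p) ** (n - 1)


def pure_aloha_eff(n, p):
    """E = N p (1 - p)^(2(N - 1)): vulnerable window is 2 frame times."""
    return n * p * (1 - p) ** (2 * (n - 1))


def max_eff(eff, n, steps=10_000):
    """Grid search over p in (0, 1) for the best efficiency with n nodes."""
    return max(eff(n, k / steps) for k in range(1, steps))


for n in (10, 100, 1000):
    print(n, round(max_eff(slotted_aloha_eff, n), 3), round(max_eff(pure_aloha_eff, n), 3))
print(round(1 / math.e, 3), round(1 / (2 * math.e), 3))  # 0.368 0.184
```

Pure ALOHA's doubled exponent halves the best achievable efficiency, because a frame can collide with transmissions that started up to 1 frame time before or after it.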

- Ethernet CSMA/CD algorithm:
○ 1: NIC receives datagram from network layer, creates frame
○ 2: Sense channel:
 Idle: Start transmission
 Busy: Wait until channel idle, then transmit
○ 3: If entire frame transmitted without collision: Done
If collision detected:
 Abort, send jam signal
 Enter binary (exponential) backoff:
□ After mth collision: Choose random K from {0, 1, 2, …, $2^m - 1$}
□ Wait K × 512 bit times, then go to Step 2
- Efficiency:
$t_{prop}$: Max prop delay between 2 nodes in LAN
$t_{trans}$: Time to transmit max-size frame
$E = \frac{1}{1 + 5\,t_{prop}/t_{trans}}$
$t_{prop} \to 0$ or $t_{trans} \to \infty$: $E \to 1$
- Benefits:
○ Simple, fully decentralized
○ Better performance than ALOHA

"Taking turns" access protocol
- Motivation:
○ Channel partitioning:
 Efficient & fair at high load
 Inefficient at low load (delay in channel access)
○ Random access:
 Efficient at low load (single node can fully utilize channel)
 Inefficient at high load (collision overhead)
⇒ Taking turns: Balanced approach
- Implementation:
○ Polling:
 Master node invites slave nodes to transmit in turn
 Slaves: Typically "dumb" devices
○ Token passing: Control token passed from 1 node to next sequentially
- Concerns:
○ Single point of failure (Master node/Token)
○ Latency: Waiting for turn
○ Overhead (polling/token msgs)
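The binary exponential backoff step of the Ethernet CSMA/CD algorithm above reduces to one random draw. In this sketch, the cap of 10 on the exponent comes from real IEEE 802.3 ("truncated" backoff) and is not stated in these notes. The function and constant names are hypothetical.

```python
import random

BIT_TIMES_PER_K = 512     # wait unit from the algorithm above
MAX_EXPONENT = 10         # 802.3 truncation (not in these notes)


def backoff_bit_times(m, rng=random):
    """Bit times to wait after the m-th consecutive collision (m >= 1)."""
    k = rng.randrange(2 ** min(m, MAX_EXPONENT))   # K uniform in {0, ..., 2^m - 1}
    return k * BIT_TIMES_PER_K


rng = random.Random(4621)
print([backoff_bit_times(m, rng) for m in (1, 2, 3)])
```

The expected wait doubles with each collision, so under heavy load nodes quickly spread their retransmissions out.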



LAN (Local Area Network)
Friday, December 09, 2016 10:12 AM

MAC Address
- Unique link-layer addr for each adapter
→ Router can have ≥ 2 MAC addrs (1 link = 1 adapter)
- Size: 6 bytes, written in hex
1A-2F-BB-76-09-AD (1 hex digit = 4 bits)
- Flat structure: Doesn't change wherever adapter goes (⇔ Social security number)
≠ IP addr: Hierarchical structure, changes when moving to new subnet (⇔ Postal addr)
- MAC broadcast addr:
○ FF-FF-FF-FF-FF-FF
○ Used when sending adapter wants all other adapters on LAN to process the frame

ARP (Address Resolution Protocol)
- Function: Translate IP addr → MAC addr within LAN
- ARP table: Exists in each IP node
 Entry: (IP addr, MAC addr, TTL)
 Soft state: After TTL expires, addr mapping is forgotten
- Plug-and-play: Node creates table automatically
A wants to send datagram to B; B's MAC addr not in A's ARP table:
○ A broadcasts ARP query pkt containing B's IP addr
Dest MAC addr = FF-FF-FF-FF-FF-FF
○ B receives ARP pkt, replies directly to A with B's MAC addr
Dest MAC addr = A's MAC addr
○ A caches B's IP & MAC addr in its ARP table

Ethernet
- Market properties: "Dominant" wired LAN technology
○ Cheap, simple
○ Supports high speed
100 Mbps, 1 Gbps, 10 Gbps, …
○ Supports different physical layers:
fiber, copper cable, …

- Technical properties:
○ Connectionless: No hand-shaking between sending & receiving NICs
○ Unreliable:
 Receiver doesn't send ACK/NACK to sender
 Recovery relies on higher layers
○ MAC protocol: CSMA/CD with binary backoff
- Physical topology:
○ Coaxial bus: All nodes in same collision domain
○ Star:
 Active switch in center
 Each node runs separate Ethernet protocol
→ No collision with each other
- Frame structure:
○ Preamble (8 bytes):
 7 bytes of "10101010" + 1 byte of "10101011"
 Sync sender & receiver clock rates
○ Dest MAC addr (6 bytes):
 Adapter receives frame with matching dest/broadcast addr → Process frame
 Otherwise, discard
○ Source MAC addr (6 bytes)
○ Type: Indicates protocol used in Payload
IP, ARP, …
○ CRC: Cyclic Redundancy Check
Frame dropped if error detected

Link-layer Switch
- Dedicated, direct conn to each connected node
→ No collision among nodes, full-duplex
Can transmit simultaneously: A ↔ A', B ↔ B'
- Transparency: Hosts, routers unaware of switches
→ Switch interfaces don't have MAC addrs
- Role: Forwarding & Filtering
○ Filtering: Determine whether incoming frame should be forwarded/dropped
○ Forwarding: Determine & move frame to appropriate interface
- Plug-and-play: Automatically learn which hosts can be reached through which interface
When new frame comes from interface x:
○ Add new entry to forwarding table, if not exists:
(Source MAC addr, x, TTL)
TTL = Time to live
○ Check Dest MAC addr in forwarding table:
 No entry: Broadcast frame to ALL interfaces except x
 Entry exists, with interface y ≠ x: Forward frame to y ONLY
 Entry exists, with same interface x: Discard frame (frame comes from LAN segment already containing Dest MAC addr)
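The following is a sketch of the switch's self-learning table above. It makes several assumptions. A frame is represented only by its source and destination MAC strings. Time is passed in explicitly. A known source entry is refreshed on every frame (the notes only add missing entries). An expired entry is treated as missing. The class and method names are hypothetical.

```python
class LearningSwitch:
    """Self-learning switch table: MAC -> (interface, expiry time)."""

    def __init__(self, interfaces, ttl=60.0):
        self.interfaces = set(interfaces)
        self.ttl = ttl
        self.table = {}

    def receive(self, src, dst, in_iface, now):
        """Handle a frame arriving on in_iface; return interfaces it is sent out on."""
        # Learn (or refresh) where src lives
        self.table[src] = (in_iface, now + self.ttl)
        entry = self.table.get(dst)
        if entry is None or entry[1] <= now:          # unknown or expired: flood
            return sorted(self.interfaces - {in_iface})
        out_iface = entry[0]
        if out_iface == in_iface:                     # dest on same segment: filter
            return []
        return [out_iface]                            # selective forward


sw = LearningSwitch([1, 2, 3, 4])
print(sw.receive("A", "B", 1, now=0))   # [2, 3, 4]  (B unknown: flood)
print(sw.receive("B", "A", 2, now=1))   # [1]        (A learned on 1)
print(sw.receive("C", "A", 1, now=2))   # []         (A on same interface)
```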

- Interconnecting switches:
○ Use switches instead of routers to connect hosts
○ Limited to acyclic topology (can't have loops)

- Switch vs Router:
Aspect | Switch | Router
Set up | Plug-and-play | Need to configure IP addr
Store-and-forward scheme | Examines link-layer header | Examines network-layer header
Forwarding table | Self-learned through flooding, MAC addr | Computed using routing algorithm, IP addr
Processing time | Fast (only processes up to layer 2) | Slower (processes up to layer 3)
Supported topology | Acyclic; effective in small networks (↑ network → ↑ ARP table size, traffic & processing time) | Any topology; suitable for large networks (routing algorithm chooses best among ≥ 2 paths to dest → more effective than flooding & learning at large scale)
Protection | Vulnerable to broadcast storm (1 host endlessly broadcasting frames) | Firewall protection against layer-2 broadcast storm

VLAN (Virtual Local Area Network)
- VLAN-supporting switch: Can be configured to define ≥ 2 virtual LANs over 1 physical LAN infrastructure
- Benefits:
○ Traffic isolation among VLANs
→ ↑ LAN performance, security, privacy
○ Efficient use of switches: 1 switch can define ≥ 2 VLANs
○ Dynamic management: No need to change physical cables when a device moves to another group (only reconfigure port → VLAN mapping)
- Implementation: Port-based VLAN
○ Divide switch ports into groups
○ Trunk port:
 Carries frames between VLANs defined over ≥ 2 physical switches
 802.1Q protocol: Defines format of frames forwarded between trunk ports


MPLS (Multiprotocol Label Switching)
Saturday, December 10, 2016 11:12 AM

Motivation
High-speed IP forwarding using fixed-length label, instead of IP addr
○ Fast lookup using fixed-length ID (rather than IP longest-prefix matching)
○ Employs VC (virtual circuit) approach

MPLS-capable router (Label-switched router)
- Forwards pkts based only on label value in MPLS header
- Path to dest can be based on source + dest addr
- Fast reroute: Use pre-computed backup routes in case link fails
- Signaling:
○ Modified OSPF link-state flooding protocols carry extra info used by MPLS routing
Link bandwidth, "reserved" bandwidth, …
○ RSVP-TE signaling protocol: Used by entry MPLS router → Set up MPLS forwarding at downstream routers
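Label-based forwarding reduces to an exact-match lookup followed by a label swap. In this sketch, the table contents (interfaces and label values) are hypothetical; in practice, signaling such as RSVP-TE would install the entries.

```python
# Hypothetical label table of one label-switched router:
# (in_interface, in_label) -> (out_interface, out_label)
LABEL_TABLE = {
    (0, 10): (1, 6),
    (0, 12): (2, 9),
    (1, 8): (0, 10),
}


def label_switch(in_iface, in_label, table=LABEL_TABLE):
    """Exact-match lookup on the fixed-length label (no IP prefix matching);
    returns (out_interface, new_label) or None if no path was set up."""
    return table.get((in_iface, in_label))


print(label_switch(0, 10))   # (1, 6)
```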



Data Center Networking
Saturday, December 10, 2016 11:15 AM

Design: Rich interconnection among switches & racks:


- ↑ Routing paths → ↑ Throughput among racks
- ↑ Redundancy → ↑ Reliability

