NOTES:
○ Global ISP A, B, C, …
○ IXP: Internet exchange point, connecting global ISPs
○ Content Provider Network: Private network, bypassing tier-1, regional ISPs to bring services close to end users
    Google, Microsoft, Akamai, …

Delay sources (per router hop):
    Processing (dproc): Figure out where to forward packet next; check bit errors | Fixed: Yes | Typical: μs
    Queuing (dqueue): Wait for transmission at output link; depends on traffic congestion | Fixed: No | Typical: μs - ms
    Transmission (dtrans): Time to push entire packet out of router/switch | Fixed: Yes | Typical: μs - ms
        dtrans = L / R
        L: Packet length (bits)
        R: Link bandwidth (bits/s)
    Propagation (dprop): Time for 1 bit to travel from 1 end to the other | Fixed: Yes | Typical: ms
        dprop = d / s
        d: Length of physical link
        s: Propagation speed (≈ speed of light for fiber); related to link's physical medium (copper, fiber, …)

Internet Layered Architecture
- Provide modularity & function decomposition:
    ○ Good: Effective to deal with large, complex software systems
    ○ Bad: Inflexible, redundancy of certain functions in different layers
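As a quick sanity check of the two formulas above, a small sketch (the link parameters are hypothetical):

```python
# Hypothetical link: 1500-byte packet, 10 Mb/s link, 2000 km of fiber
L = 1500 * 8        # packet length (bits)
R = 10e6            # link bandwidth (bits/s)
d = 2000e3          # physical link length (m)
s = 2e8             # propagation speed in fiber (~2/3 speed of light, m/s)

d_trans = L / R     # time to push the entire packet onto the link
d_prop = d / s      # time for 1 bit to cross the link

print(f"d_trans = {d_trans * 1e3:.1f} ms")  # 1.2 ms
print(f"d_prop  = {d_prop * 1e3:.1f} ms")   # 10.0 ms
```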
○ Transport:
    End-to-end protocol: Control communication between 2 procs in 2 hosts (end systems)
    Control provided: reliability, congestion, multiplexing
    NOT for routers
    TCP, UDP
○ Network: Responsible for packet-routing among routers, hosts
    IP, routing protocols

Network Security
- DoS (Denial of Service): Attacker:
    ○ Overwhelms resources with bogus traffic
    ○ Makes resources unavailable to legitimate traffic
- Packet "sniffing": Read/record all packets passing by on the network
- Network app:
- Peer-to-peer (P2P):
○ Peers request & provide service directly among
each other
○ No always-on server
○ Intermittently connected, dynamic IP
○ Self-scalability: New peers bring new service cap &
demand
Overview
- Client-Server model
- Use TCP:
    ○ HTTP client initiates TCP conn to HTTP server
    ○ Conn established: HTTP messages exchanged through socket interface on top of TCP
- Stateless: Server maintains no info about past client requests

Non-persistent vs Persistent HTTP:
    Mechanism:
        Non-persistent: Each object delivered by individually established TCP conn
            Multiple objects → multiple TCP conns (can be opened simultaneously)
        Persistent: Multiple objects sent over single TCP conn
            Eliminates overhead in establishing & maintaining multiple TCP conns
    Response time:
        Non-persistent: Each object: 2 RTT + file trans time
            (1 RTT = initialize TCP, 1 RTT = file request)
        Persistent: First object: 2 RTT + file trans time
            Subsequent objects: 1 RTT + file trans time
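The response-time rows can be checked numerically; a minimal sketch ignoring file transmission time (RTT and object count are hypothetical):

```python
RTT = 0.1  # seconds (hypothetical)
n = 5      # number of objects on the page

# Non-persistent: every object pays TCP setup (1 RTT) + request/response (1 RTT)
non_persistent = n * 2 * RTT
# Persistent: first object pays 2 RTT, each subsequent object pays 1 RTT
persistent = 2 * RTT + (n - 1) * RTT

print(non_persistent, persistent)  # 1.0 s vs ~0.6 s
```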
HTTP Messages
- Request:
- Response:

Cookies

Web Caching (Proxy Server)
- Mechanism:
    ○ Client sends HTTP request to origin server through proxy
    ○ At proxy:
        If local copy of requested object found: Immediately return it to client
        Otherwise:
            □ Forward request to origin server
            □ Store copy of retrieved object
            □ Pass object to client
- Benefits:
    ○ ↓ Response time
    ○ ↓ Traffic on institution's access link

Conditional GET
- Goal: Not send object if cached version is still up-to-date
- How:
    ○ Client: Specify date of cached copy in HTTP request
        If-modified-since: <date>
    ○ Server: Check date of client's cached copy:
        If client's cached copy still up-to-date: Respond with no object, status code 304
            HTTP/1.1 304 Not Modified
        If client's cached copy is outdated: Respond with newest object and its date:
            □ Last-Modified: <date>
Web caching example:
    Assumptions:
        Avg object size: 100 kb
        Avg request rate: 15/s
        RTT from institutional router to any origin server: 2 s
    LAN utilization:
        (100 kb) × (15 requests/s) / (1 Gb/s) = 0.15%
    Without cache:
        Access link utilization:
            (100 kb) × (15 requests/s) / (1.54 Mb/s) = 97%
        Total delay = RTT + access delay + LAN delay = 2 s + minutes + μs
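The two utilization figures above can be reproduced directly:

```python
obj_bits = 100e3  # avg object size: 100 kb
rate = 15         # avg request rate (requests/s)

lan_util = obj_bits * rate / 1e9        # LAN: 1 Gb/s
access_util = obj_bits * rate / 1.54e6  # access link: 1.54 Mb/s

print(f"{lan_util:.2%}")     # 0.15%
print(f"{access_util:.0%}")  # 97%
```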
FTP (File Transfer Protocol)
- Function: Transfer files to/from remote host
- Model: Client-Server
- Run on TCP
- "Out-of-band" mechanism: Separate control & data conns:
    ○ Server port 21 (control): Client contacts server to authorize, browse dirs, send control commands
    ○ Server port 20 (data):
        Opened by server to transfer data to client upon request
        Closed after finishing transferring 1 file
- Stateful protocol: Server maintains "state": current dir, previous auth
- Sample commands:
    ○ USER username, PASS password
    ○ LIST: Return list of files in current dir
    ○ RETR filename: Get file
    ○ STOR filename: Store file onto host

Email
- Major components:
    ○ User agents: Compose, edit, read mail
        Outlook, Thunderbird, …
    ○ Mail server: Mailbox + Msg queue
    ○ Mail transfer protocol: For transferring mail msgs among mail servers
- Mail transfer protocol: SMTP
    ○ TCP, port 25
    ○ Client-Server model
    ○ Direct transfer: No intermediary mail server between sending & receiving server
    ○ Persistent conn
    ○ Msg must be 7-bit ASCII
- Mail access protocol (client retrieving mail from server):
    ○ POP3:
        Stateless
        "Download-and-delete": Once msg downloaded to client, it is removed from server
    ○ IMAP:
        Msgs kept at server
        Allows users to organize msgs in folders
        User state kept across sessions
DNS Service
- Hostname→IP translation
- Host/Mail server aliasing:
    Get canonical (original) hostname/mail server, IP for supplied alias hostname/mail server
- Load distribution:
    ○ DNS database contains replicated servers' IPs for canonical name
    ○ Client query → DNS server returns entire set of IPs but rotates set order each time → Distributes traffic
- Query types: Iterated query vs Recursive query
DNS Caching

Architecture: Client-Server vs P2P
Time to distribute file of size F to N clients:
    Client-Server analysis:
        - Server: Must sequentially upload N file copies
        - Clients: Each must download a file copy
    P2P analysis:
        - Server: Must upload at least 1 copy down to peers
        - Peers: Each must download a file copy
        - Server + Peers: All contribute to delivering N file copies
            Max upload rate = us + ∑ui
    Deliver time:

BitTorrent
- Torrent: Collection of all peers participating in distribution of a particular file
- Tracker: Infrastructure node in each torrent, keeps track of all alive peers
- File: Broken into chunks of 256 kB
- Peer:
    ○ Download + upload among each other, accumulate more chunks over time
    ○ When obtaining entire file, can (selfishly) leave torrent / (altruistically) remain
    ○ Can leave anytime (with incomplete chunk set) and rejoin later
- Control mechanisms: Peer X
    ○ X joins: Register with tracker → Get random subset of other peers' IPs
    ○ X periodically informs tracker of its aliveness
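The standard lower bounds on deliver time (us = server upload rate, ui = peer i's upload rate, dmin = slowest client download rate) can be sketched as follows; the file size and rates are hypothetical:

```python
def d_client_server(F, N, u_s, d_min):
    # Server uploads N copies; slowest client must download one copy
    return max(N * F / u_s, F / d_min)

def d_p2p(F, N, u_s, d_min, u_i_sum):
    # Server uploads >= 1 copy; slowest client downloads one copy;
    # all N*F bits must leave through total upload capacity u_s + sum(u_i)
    return max(F / u_s, F / d_min, N * F / (u_s + u_i_sum))

# Hypothetical: 1 Gbit file, 10 clients
F, N, u_s, d_min = 1e9, 10, 1e7, 2e6
print(d_client_server(F, N, u_s, d_min))     # 1000.0 s
print(d_p2p(F, N, u_s, d_min, u_i_sum=1e7))  # 500.0 s
```

Note how the P2P bound improves as peers contribute upload capacity, while the client-server bound grows linearly in N.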
Distributed Hash Table (DHT)
- Design:
    ○ Peer identified by integer in range [0, 2^n - 1] (n bits)
    ○ Hash function: Map original key to integer in range [0, 2^n - 1]
    ○ Assign each (key, value) pair to peer with ID "closest" to key
        ID = closest successor of key
        n = 4 → ID range = hashed key range = [0, 15]
        Peers: 1, 3, 4, 5, 8, 10, 12, 14
        Key = 13 → Successor peer = 14
        Key = 15 → Successor peer = 1 (circular)

Circular DHT
- Use shortcuts → With O(log N) neighbors, can reduce search query to O(log N)
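The "closest successor" rule above can be sketched in a few lines (peer list taken from the example):

```python
def successor(peers, key, n_bits=4):
    """Closest successor of key on the circular ID space [0, 2^n - 1]."""
    space = 2 ** n_bits
    # Clockwise distance from key to each peer; smallest wins
    return min(peers, key=lambda p: (p - key) % space)

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor(peers, 13))  # 14
print(successor(peers, 15))  # 1  (wraps around)
```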
UDP
Overview
- End-to-end protocol: Provide logical comm between app procs on different hosts
- Features:
    ○ Connectionless:
        No handshaking between sender & receiver
        Each UDP seg handled independently of others
    ○ "Best-effort": Segs may be lost/delivered out-of-order
    ○ No congestion control: Data sent as fast as desired
- Min required functions:
    ○ Multiplexing (at sender): Handle data from multiple sockets, add transport header
    ○ Demultiplexing (at receiver): Deliver data seg to correct socket using header
- Seg structure:
Checksum example:
    Sum:        1110011001100110
              + 1101010101010101
              = 11011101110111011
    Wraparound: 1011101110111011 + 1 = 1011101110111100
                (If carry out of MSB, omit it and add 1 to result)
    Checksum:   0100010001000011 (flip bits of wraparound result)
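The wraparound arithmetic above is 16-bit one's-complement addition; a small sketch reproducing the example:

```python
def wraparound_sum(a, b):
    """16-bit one's-complement addition: add, then fold any carry back in."""
    s = a + b
    while s > 0xFFFF:
        s = (s & 0xFFFF) + (s >> 16)  # drop carry out of MSB, add 1
    return s

a = 0b1110011001100110
b = 0b1101010101010101
s = wraparound_sum(a, b)
checksum = ~s & 0xFFFF  # flip all 16 bits

print(f"{s:016b}")         # 1011101110111100
print(f"{checksum:016b}")  # 0100010001000011
```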
Stop-And-Wait Protocols
(Sender sends 1 pkt, then waits for receiver to respond)
- No pkt lost
- Duplicated ACKs: ACK of previous pkt = NAK of current pkt
- Example:
    RTT = 2·dprop = 30 ms
    ttrans = L / R = 0.001 ms
    Sender utilization: U = ttrans / (RTT + ttrans) = 0.001 / 30.001 ≈ 0.0033%
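Plugging the example numbers into U = ttrans / (RTT + ttrans):

```python
RTT = 30.0       # ms (= 2 * d_prop, from the example)
t_trans = 0.001  # ms

U = t_trans / (RTT + t_trans)  # fraction of time the sender is busy sending
print(f"{U:.3%}")  # 0.003%
```

The tiny utilization is what motivates the pipelined protocols in the next section.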
Pipelined Protocols
(Sender allows multiple in-flight, yet-to-be-acknowledged pkts)
- Notation:
    N = Sliding-window size (max no. of allowed in-flight pkts at each sending time)
    base = Seq no of oldest unacked pkt
    nextseqnum = Seq no of smallest not-yet-sent pkt

Go-Back-N:
- Receiver:
    • No buffering:
        Pkts arriving out of order (gap created) → Discard immediately
    • Cumulative ACK:
        ACK seq no of LAST in-order pkt

Selective Repeat:
- Sender:
    • Have ≤ N unacked pkts in buffer
    • Maintain timer for EACH unacked pkt
        Timeout → Retransmit only that pkt
- Receiver: Buffer out-of-order pkts
- Cons:
    • Receiver: Complicated logic
    • Sender: Manage many timers → OS overhead
- Range of seq no ≥ 2 × window size (prevents receiver misunderstanding new vs retransmitted pkt)
Overview
- Point-to-point:
1 sender, 1 receiver
- No msg boundary:
Data treated as ordered byte stream
- Pipelined:
Multiple in-flight pkt
- Full duplex:
App-layer data flow proc A → B at same
time as proc B → A (different hosts)
- Conn-oriented:
Handshaking required
- Flow control:
    Sender won't overflow receiver's buffer
    (controlled by sliding-window size)
- SampleRTT:
○ Time measured from seg trans → ACK of seg received
○ Only consider seg transmitted once
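From SampleRTT, TCP derives its timeout with the standard EWMA smoothing (α = 0.125, β = 0.25, as in RFC 6298); the sample values here are hypothetical:

```python
alpha, beta = 0.125, 0.25
estimated_rtt, dev_rtt = 100.0, 5.0  # prior estimates (ms)

sample_rtt = 120.0  # newly measured SampleRTT (ms)
estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
timeout = estimated_rtt + 4 * dev_rtt  # safety margin of 4 deviations

print(estimated_rtt, dev_rtt, timeout)  # 102.5 8.125 135.0
```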
Highlights
- Control mechanism = Go-Back-N + Selective-Repeat:
    • Go-Back-N: Cumulative ACK (only ack expected next in-order SeqNum, ignore out-of-order segs)
    • Selective Repeat: Individual retransmit (only retransmit smallest unacked seg, NOT ALL unacked)
- TCP spec doesn't include how to handle out-of-order segs
    (If out-of-order segs buffered → Need more logic to handle ExpectedSeqNum, as it can change by more than length(data) upon new pkt received)
- Fast Retransmit:
    • Often large no. of in-flight segs → If 1 seg lost, many duplicated ACKs
    • If sender receives 3 duplicated ACKs → Immediately retransmit seg without waiting for timeout
Scenarios (figures): Lost ACK; Premature timeout; Cumulative ACK; Fast retransmit due to duplicated ACKs
Flow Control
- Receiver:
    ○ Advertise free buffer space through RcvWnd field
        RcvWnd = RcvBuffer - (LastByteRcv - LastByteRead)
    ○ Issue: New RcvWnd only sent when receiver has ACK or data to send to sender
- Sender:
    ○ Control sliding-window size → Limit no. of allowed in-flight pkts
        LastByteSent - LastByteAcked ≤ WindowSize = RcvWnd

Connection Management
Closing
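The two flow-control invariants can be sketched together (the byte counters below are hypothetical):

```python
def rcv_wnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    # Free buffer space advertised by the receiver
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_may_send(last_byte_sent, last_byte_acked, rwnd, n_bytes):
    # Sender keeps unacked data within the advertised window
    return (last_byte_sent + n_bytes) - last_byte_acked <= rwnd

rwnd = rcv_wnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)                                    # 2096
print(sender_may_send(3000, 1500, rwnd, 500))  # True
print(sender_may_send(3000, 1500, rwnd, 1000)) # False: would overflow receiver
```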
Congestion
- Causes:
    ○ Large queueing delay (due to pkt arrival rate ≈ link cap)
    ○ Unneeded retrans by sender (due to premature timeout)
        → Routers use link bandwidth to forward unneeded pkt copies
    ○ Pkts dropped by router (due to full buffer)
        → Trans capacity of each upstream link leading to that router is wasted
- Solution approaches:
○ End-to-end:
No explicit feedback from network
Congestion inferred from loss, delay observed by
end-sys
TCP Fairness
- Goal:
    K TCP sessions sharing same bottleneck link of bandwidth R
    → Each session should have avg rate of R/K
TCP Throughput
- Issues:
    ○ UDP: No congestion control → Can eat up throughput from TCP
    ○ Parallel TCP: Apps opening more parallel TCP conns eat up throughput from apps opening fewer conns
VC vs Datagram:
    Service:
        VC: Network-layer connection-oriented service
        Datagram: Network-layer connectionless service
    End-to-end path:
        VC:
            - Conn setup/teardown required → Path pre-determined
            - Resources allocated to VC guaranteed
        Datagram:
            - No setup
            - No resource guarantee
    Link & Router:
        VC:
            - Link identified by different VC numbers (depending on VC it belongs to)
            - Router:
                • Forwarding table for each input
                • Maintains "state" for each passing conn
        Datagram:
            - Router: Maintains no state about end-to-end conns
    Pkt:
        VC:
            - Carries VC identifier
            - Embedded in every host, router
            - Can be changed by router after going through it (based on forwarding table)
        Datagram:
            - Carries dest addr (used by router to select output link)

Network Layer
- Roles:
    ○ Transport segs from sending host to receiving host based on IP addr
    ○ Sending host: Encapsulate segs into datagrams
    ○ Receiving host: Deliver segs to transport layer
* NOTE: VC vs Datagram networks is similar to TCP vs UDP, but:
    - Host-to-host, not proc-to-proc
    - Network layer can provide VC or datagram, not both
    - Implemented in network core, not network edge
- Key functions:
    ○ Forwarding: Move pkts from router's input to appropriate output
    ○ Routing: Determine route taken by pkts from source to dest
Components of Routers

DHCP
- Purpose:
    ○ Allow host to dynamically obtain its IP addr from network server when it joins network
    ○ IP addr lease can be renewed
- Pros:
    ○ Allows reuse of IP addrs
    ○ Automatic configuration of IP addrs
- Cons: Can't maintain TCP conn when host moves between subnets (as IP addr changes)
- Mechanism:
    ○ 1 (optional): Host broadcasts "DHCP discover" msg

IPv4 Addressing
- Interface: Represents the conn between host/router & physical link
    ○ Router has > 1 interface (as it connects to different links)
    ○ Host has 1-2 interfaces
        Wired Ethernet, wireless 802.11, …
- Subnet: Device interfaces within same subnet can physically reach each other without intervening router
- IPv4 addr:
    ○ Associated with EACH interface
    ○ ISP: Gets from ICANN (Internet Corporation for Assigned Names & Numbers)
    ○ Network mask:
        Indicates subnet portion of IP addr through no. of leading 1's
        Subnet addr = IP & Mask
            IP = 128.96.39.10
            Mask = 255.255.255.128 (25 leading 1's)
            ⇒ Subnet: 128.96.39.0/25
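The masking example can be checked with the standard library:

```python
import ipaddress

# Subnet addr = IP & Mask
iface = ipaddress.ip_interface("128.96.39.10/255.255.255.128")
print(iface.network)  # 128.96.39.0/25

# Same result via an explicit bitwise AND
ip = int(ipaddress.ip_address("128.96.39.10"))
mask = int(ipaddress.ip_address("255.255.255.128"))
print(ipaddress.ip_address(ip & mask))  # 128.96.39.0
```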
- Hierarchical addressing:
    ○ Router uses longest prefix matching to match dest of pkt with desired output link
    ○ IP addrs of devices from same organizations/ISPs share prefix
    ⇒ Routing info can be efficiently advertised among routers through route aggregation
        When organization moves from 1 ISP to another, its IP addr portion can be kept but requires extra advertisement effort from ISP routers

NAT (Network Address Translation)
- Purpose: All devices in local area network (LAN) present to outside world through 1 IP addr
- Benefits: LAN admin can:
    ○ Just obtain 1 IP addr from ISP
    ○ Change addrs of LAN devices without notifying outside world
    ○ Change ISP without changing addrs of LAN devices
    ○ Prevent LAN devices from being explicitly addressable by outside → ↑ Security
- Implementation:
    ○ Outgoing datagram: Replace (source IP, port) with (NAT IP, new port)
    ○ NAT translation table: Remember all translation pairs
    ○ Incoming datagram: Replace (NAT IP, new port) in dest fields with (source IP, original port)
- Limitations:
    ○ Single NAT IP can only maintain ≈ 60000 simultaneous conns (port field = 16 bits)
    ○ NAT traversal problem:
        Solutions:
        □ Statically configure NAT to forward incoming connection requests at given port to specific device
            Request to 123.76.29.7:2500 always forwarded to 10.0.0.1:25000
        □ UPnP (Universal Plug & Play): NAT server can lease port mappings to LAN devices for a period upon request
            LAN device requests NAT server to forward traffic to NAT port 3100 to its port 31000

IPv6 Addressing
- Size: 128 bits
    → Addr space won't get exhausted soon
- No header checksum
    → Removes header checksum recomputation step (due to change in TTL)
- Datagram format: Fixed 40-byte header
    ○ Flow label: Identify datagrams in same flow, but flow concept not well-defined yet
- Transition from IPv4 to IPv6:
    Tunneling: IPv6 datagram carried as payload in IPv4 datagram among IPv4 routers

ICMP (Internet Control Message Protocol)
- Traceroute:
    ○ When nth set of datagrams arrives at nth router: TTL = 0
        Router must discard datagrams
        Sends ICMP msg to source with content:
        □ Type 11, code 0
        □ Router name & IP addr
    ○ Source receives these ICMP msgs, records RTTs
Routing Algorithms
- Goal: Compute least cost paths from 1 node (source) to all other nodes
- Abstraction: Graph G = (N, E)
    N: Set of routers
    E: Set of links
    c(x, y): Link cost
        1, ∝ 1/Bandwidth, ∝ 1/Congestion
- Global vs Decentralized:
    ○ Global: All routers have complete knowledge of topology, link costs
    ○ Decentralized: Router only knows about physically connected neighbors and link costs to them
- Static vs Dynamic:
    ○ Static: Routes change slowly over time
    ○ Dynamic: Routes can change quickly

Link State (Dijkstra):
    Loop until all nodes in N':
        Find w ∉ N' with min D(w)
        N' ← N' ∪ {w}
        for v ∈ N, v ∉ N', ∃(w, v):
            D(v) = min(D(v), D(w) + c(w, v))

Distance Vector:
    - When c(x, v) changes, or DV update msg from v received:
        • Update Dx: Dx(y) = min over v of {c(x, v) + Dv(y)}, ∀y ∈ N
        • If ∃y: Dx(y) changes: Notify neighbors
    - Under minor, natural conditions: Dx(y) → dx(y) (converges to actual least cost)
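A compact sketch of the Dijkstra loop above using a binary heap; the topology and costs are hypothetical:

```python
import heapq

def dijkstra(graph, source):
    """Least-cost distances from source; graph = {node: {neighbor: cost}}."""
    dist = {source: 0}
    pq = [(0, source)]                      # (D(w), w) candidates
    while pq:
        d, u = heapq.heappop(pq)            # w not in N' with min D(w)
        if d > dist.get(u, float("inf")):
            continue                        # stale entry, already finalized
        for v, c in graph[u].items():       # relax every edge (w, v)
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist

g = {"u": {"v": 2, "x": 1}, "v": {"u": 2, "x": 3, "w": 3},
     "x": {"u": 1, "v": 3, "w": 5}, "w": {"v": 3, "x": 5}}
print(sorted(dijkstra(g, "u").items()))  # [('u', 0), ('v', 2), ('w', 5), ('x', 1)]
```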
LS vs DV comparison:
    Msg complexity:
        LS:
            - Each node needs to know all link costs in network
            - Link cost changes: Msg must be sent to all nodes
            ⇒ Complex
        DV:
            - Msgs only exchanged between directly-connected nodes
            - Link cost changes: DV only propagated to neighbors if it changes
            ⇒ Simple
    Convergence speed:
        LS:
            - Algorithm: O(|N|²) (more efficient implementation: O((|N| + |E|) log |N|))
            - Max O(|N|·|E|) msgs sent
        DV:
            - Convergence speed varies, may suffer from count-to-infinity
    Robustness:
        LS:
            - Node could broadcast wrong cost for its attached links only
            - Node only calculates its own forwarding table
            ⇒ More robust
        DV:
            - Node can advertise wrong path costs to any/all dests
            - Each node's forwarding table (DV) used by others → Errors can propagate throughout network
Hierarchical routing
- LS/DV limitations:
    ○ Scale: Storing & exchanging routing info among millions of routers
        Large computation overhead for individual routers
        No bandwidth left for data pkts
    ○ Administrative autonomy: Organization wants to administer its own network as it wishes
        Choose routing algorithm, hide internal structure from outside, …

Intra-AS Routing: RIP (Routing Information Protocol)
- Distance Vector (DV) algorithm
- Max no. of routers supported = 15
    → Infinity = 16
- All link costs = 1
- Only 1 path exists between each source-dest pair
- Advertisement:
    ○ Neighbors exchange DVs every 30 sec

Intra-AS Routing: OSPF
- Link State (LS) algorithm (Dijkstra)
- Each link can have different cost metrics for different Types of Service
- Allows multiple same-cost paths
    → Router can simultaneously use ≥ 2 paths to route traffic towards dest
- Advertisement:
    ○ 1 entry per neighbor in msg
    ○ Flooded to entire AS
    ○ Msgs carried directly over IP (rather than TCP, UDP)
- Hierarchical OSPF: 2-lvl hierarchy:
    ○ 1 backbone area:
        Routes traffic among local areas
        Boundary + backbone + area border routers
    ○ ≥ 2 local areas:
        Area border + internal routers
    ○ Routers only know their area's topology & broadcast LS within their area

BGP (Border Gateway Protocol)
- De facto Inter-AS routing protocol
- BGP session:
    ○ Runs on semi-permanent TCP conn
    ○ Msg types:
        OPEN
        UPDATE: Advertise new/withdraw old paths
        KEEPALIVE
        NOTIFICATION: Report errors, close conn
- Policy example:
    ○ B advertises path BAW to X
    ○ B does NOT advertise BAW to C:
        W, C are NOT B's customers
        B gets no revenue for routing C→B→W
    ○ X does NOT advertise BX to C:
        X gets no benefit from helping C route to B via X
Broadcast
- Deliver pkts from source to ALL other nodes
- Spanning tree:
    ○ Construct spanning tree first:
        Choose center node
        Each node sends msg to center node
        Msg forwarded until arriving at node already belonging to tree
    ○ Then only forward pkts along tree

Multicast
1. Problem statement: Find tree connecting a group of (not ALL) routers in network
2. Approaches:
    a. Source-based tree: Different senders generate different trees
        - Tree-forming criteria: Shortest path
        - Pruning: When tree contains subtree with no group member
            Router having no attached hosts in group sends "prune" msg to upstream router
            Router receiving prune msg from downstream router forwards msg further upstream
    b. Group-shared tree: Same tree for whole group
        - Tree-forming technique: Center-based
            1 router chosen as "center" (rendezvous point (RP))
            Join tree:
            □ Edge router sends join-msg to center
            □ Join-msg hits existing tree branch/center
            □ Path taken by join-msg becomes new tree branch

Dense vs Sparse:
    No. of group members:
        Dense: Densely packed, "close" proximity
        Sparse: Small, "widely dispersed"
    Bandwidth:
        Dense: Plentiful
        Sparse: Not plentiful
    Membership:
        Dense: Assumed until explicitly PRUNE
        Sparse: Not assumed until explicitly JOIN
    Tree construction:
        Dense: Data-driven (RPF)
        Sparse: Receiver-driven (center-based)

DVMRP (Distance Vector Multicast Routing Protocol)
- Source-based, reverse path forwarding (RPF)
- Initial datagram to group members flooded throughout network
- Router leaving group: Sends prune msg upstream
- Soft state:
    ○ DVMRP router periodically forgets "pruned" branches, continues to flood
- Commonly implemented in commercial routers

PIM (Protocol Independent Multicast)
- Does not depend on any unicast routing algorithm
- 2 scenarios: Dense & Sparse
- Dense mode: Flood-and-prune, similar to DVMRP but:
    • Underlying unicast protocol provides RPF info for incoming datagrams
    • Less complicated downstream flood → ↓ Reliance on routing algorithm
    • Has mechanism to detect leaf-node routers
- Sparse mode: Receiver-driven, center-based (RP)
    • After joining via RP, router can switch to source-specific tree → ↑ Performance (shorter paths)
    • RP can extend tree upstream
Overview
- Terminology:
    ○ Node: Host/Router
    ○ Link: Comm channels connecting adjacent nodes along comm path
    ○ Frame: Link-layer pkt, encapsulates datagram
- Adapter: Hardware + Software + Firmware
    NIC (Network Interface Card), chip
    - Implements link + physical layer
    - Attaches to host's sys bus
- Services:
    ○ Link access:
        MAC (Medium Access Control) protocol: Rules by which frames transmitted onto link
        Support:
        □ Point-to-point link
        □ Broadcast link:
            ≥ 2 nodes share 1 link, nodes can transmit simultaneously
            Collision if ≥ 2 signals at same time
            ⇒ Multiple Access Control Protocol
    ○ Error detection & correction:
        Error sources: Signal attenuation, electromagnetic noise, …
        Correct bit errors without triggering retransmission
    ○ Flow control
    ○ Half/Full-duplex transmission
        Wireless, by nature, is HALF-duplex

Error Detection & Correction: Parity Checking
- Single parity:
    ○ Sender: Adds 1 parity bit for every d data bits
        → No. of 1's in (d+1) bits is even (even scheme) / odd (odd scheme)
    ○ Receiver: Counts 1's to detect single-bit errors
- 2D parity:
    ○ Detect & correct 1-bit error, or
    ○ Detect (only) 2-bit errors

CRC (Cyclic Redundancy Check)
- Receiver:
    ○ G (generator) known
    ○ T = Number represented by the (d + r) bits received
    ○ T mod G ≠ 0 → Error detected
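The receiver's check uses mod-2 (XOR) division; a sketch with an illustrative generator and bit strings:

```python
def crc_check(bits, G):
    """Receiver side: remainder of T divided by generator G, mod-2 arithmetic."""
    T = int(bits, 2)
    g_len = G.bit_length()
    while T.bit_length() >= g_len:
        # XOR G aligned with T's leading 1 (one step of mod-2 long division)
        T ^= G << (T.bit_length() - g_len)
    return T  # 0 -> no error detected

G = 0b1001
# d = 101110 with r = 3 CRC bits 011 appended
print(crc_check("101110011", G))  # 0 -> accept
print(crc_check("101110111", G))  # nonzero -> error detected
```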
FDMA (Freq Division Multiple Access)

Random access:
    - No channel division
    - When node wants to send: Transmit at full channel rate
    - No a priori coordination among nodes
    → Allow collisions, provide collision recovery

Random Access Protocol: Slotted ALOHA
- Pros:
    ○ Single active node can continuously transmit at full rate
    ○ Highly decentralized: Each node detects collisions & decides when to retransmit independently
    ○ Simple
- Cons:
    ○ Wasted slots: collisions, idle slots
    ○ Clock sync among nodes
- Efficiency (long-run prop of successful slots):
    N nodes, each with many frames to send; prob of transmitting in a slot = p
    N → ∞, choose p to max E: max(E) = 1/e = 0.37

Random Access Protocol: Pure (Unslotted) ALOHA
- Efficiency:
    E = P(Node success)
      = P(Node transmits)
        × P(No other node transmits in [t0 - 1, t0])
        × P(No other node transmits in [t0, t0 + 1])
      = p × (1 - p)^(N-1) × (1 - p)^(N-1)
    N → ∞, choose p to max E: max(E) = 1/(2e) = 0.18
    (Worse than slotted ALOHA)

Random Access Protocol: CSMA/CD (Carrier Sense Multiple Access/Collision Detection)
- Collision Detection:
    ○ Detect collision within short time → Abort transmission → ↓ Channel wastage
    ○ Implementation:
        Easy in wired LANs: Measure signal strength, compare transmitted & received signals
        Difficult in wireless LANs: Received signal overwhelmed by local transmission power
- Efficiency: tprop → 0 or ttrans → ∞: E → 1
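The two ALOHA efficiency limits can be approximated numerically; multiplying the per-node success probability by N gives channel efficiency (N and the p-grid below are arbitrary):

```python
def slotted_eff(N, p):
    # A slot succeeds iff exactly 1 of the N nodes transmits in it
    return N * p * (1 - p) ** (N - 1)

def pure_eff(N, p):
    # A frame succeeds iff no other node transmits in [t0 - 1, t0 + 1]
    return N * p * (1 - p) ** (2 * (N - 1))

N = 500
ps = [i / 10000 for i in range(1, 200)]
print(round(max(slotted_eff(N, p) for p in ps), 2))  # 0.37  (about 1/e)
print(round(max(pure_eff(N, p) for p in ps), 2))     # 0.18  (about 1/(2e))
```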
- CSMA benefits:
    ○ Simple, fully decentralized
    ○ Better performance than ALOHA
- Taking-turns protocols (polling, token passing) — concerns:
    ○ Single point of failure (master node/token)
    ○ Latency: Waiting for turn
    ○ Overhead
- Physical topology
○ Coaxial bus: All nodes in same collision domain
○ Star:
Active switch in center
Each node runs separate Ethernet protocol
→ No collision with each other
- Interconnecting switches:
    ○ Use switches instead of routers to connect hosts
    ○ Limited to acyclic topology (can't have loops)
- Switch vs Router:
    Setup:
        Switch: Plug-and-play
        Router: Need to configure IP addrs
    Store-and-forward scheme:
        Switch: Examines link-layer header
        Router: Examines network-layer header
    Forwarding table:
        Switch: Self-learned through flooding technique; MAC addrs
        Router: Computed using routing algorithm; IP addrs
    Processing time:
        Switch: Fast (only processes up to layer 2)
        Router: Slower (processes up to layer 3)
    Supported topology:
        Switch: Acyclic; effective at small networks (↑ network → ↑ ARP table size, traffic & processing time)
        Router: Any topology; suitable for large networks (routing algorithm chooses best among ≥ 2 paths to dest → more effective than flooding & learning mechanism at large scale)

VLAN
- Benefits:
    ○ Traffic isolation among VLANs → ↑ LAN performance, security, privacy
    ○ Efficient use of switches: 1 switch can define ≥ 2 VLANs
    ○ Dynamic management: No need to change physical cables when moving devices belonging to same group
- Implementation: Port-based VLAN
    ○ Divide switch ports into groups
    ○ Trunk port:
        Carries frames between VLANs defined over ≥ 2 physical switches
        802.1q protocol: Defines format of frames forwarded between trunk ports
MPLS
- Signaling:
    ○ Modify OSPF link-state flooding protocols to carry extra info used by MPLS routing
        Link bandwidth, "reserved" bandwidth, …