
Ivy: A Read/Write

Peer-to-Peer File System


A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen
In Proceedings of OSDI ‘02

2003-4-29
Presenter : Chul Lee
What is IVY?

• A multi-user read/write peer-to-peer file system
• No centralized/dedicated components
• Single file system image
• Conventional file system interface

– Case study of DHT use!


Ivy uses DHT
Distributed application (Ivy)
    put(key, data); get(key) → data
Distributed hash table (DHash)
    lookup(key) → node IP address
Lookup service (Chord)

• DHT provides
– Simple API
• put(key, value) and get(key) → value
– Availability (Replication)
– Robustness (Integrity checking)
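
As a rough illustration of the put/get interface and the integrity checking listed above, the sketch below uses an in-memory dictionary in place of a real DHT. `ToyDHash`, `put_content_hash`, and `get_verified` are hypothetical names, not DHash's API, and SHA-1 is an assumption chosen because it yields 160-bit keys.

```python
import hashlib


class ToyDHash:
    """In-memory stand-in for a DHT (hypothetical; a real DHT spreads blocks over many nodes)."""

    def __init__(self):
        self._blocks = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: bytes) -> bytes:
        return self._blocks[key]


def put_content_hash(dht: ToyDHash, value: bytes) -> bytes:
    """Store a block under the hash of its own content and return that key."""
    key = hashlib.sha1(value).digest()  # assumption: SHA-1, giving a 160-bit key
    dht.put(key, value)
    return key


def get_verified(dht: ToyDHash, key: bytes) -> bytes:
    """Fetch a block and reject it if its content no longer matches its key."""
    value = dht.get(key)
    if hashlib.sha1(value).digest() != key:
        raise ValueError("block failed integrity check")
    return value
```

Because the key is derived from the content, a reader can detect a block that was altered by an untrusted node without trusting the node itself.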
Prob.: Shared Data w/ DHT
[Figure: a conventional file system tree (root inode, directory block, inodes for File1, File2 and File3, and a data block for File3) with its blocks stored on DHT nodes across the Internet]
Challenges

• Consistency of file system meta-data


• Locking is unattractive when participants
are unreliable
• Undo modifications by untrustworthy
participants
• Operate while partitioned, repair
conflicting updates
Solution: Log Based

• Update: Each participant maintains a log of changes to the file system
• Lookup: Each participant scans all logs
Software Structure

• Local NFS loop-back server


[Figure: a user-level application issues system calls to the kernel NFS client, which sends NFS RPCs to the local user-level Ivy server; the Ivy server communicates with DHT nodes across the Internet]
Example: Using Log

Request (local NFS client)           Reply (local Ivy server)

LOOKUP(“d”, I-Num=10)                I-Num=1000
CREATE(“aaa”, I-Num=1000)            I-Num=9956
WRITE(“hello”, 0, I-Num=9956)        OK

• echo hello > d/aaa
• LOOKUP finds the I-Number of directory “d”
• CREATE creates file “aaa” in directory “d”
• WRITE writes “hello” at offset 0 in file “aaa”
Using Log: File Creation

Record 1: Type: Create; I-num: 9956
Record 2: Type: Link; Dir I-num: 1000; File I-num: 9956; Name: “aaa”
Record 3: Type: Write; I-num: 9956; Offset: 0; Data: “hello”

Log Head

• A log record describes a change to the file system
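
The three records above can be sketched as an in-memory linked list. This is a minimal illustration, not Ivy's actual record layout: `LogRecord`, `Log`, and the field names are hypothetical, and it assumes the log head points at the most recent record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    rtype: str                            # "Create", "Link", "Write", ...
    fields: tuple                         # (name, value) pairs; a tuple keeps the record immutable
    prev: Optional["LogRecord"] = None    # next-older record in this log


class Log:
    """One participant's log; head is assumed to point at the newest record."""

    def __init__(self):
        self.head = None

    def append(self, rtype, **fields):
        self.head = LogRecord(rtype, tuple(fields.items()), self.head)
        return self.head

    def newest_first(self):
        rec = self.head
        while rec is not None:
            yield rec
            rec = rec.prev


# Records produced by: echo hello > d/aaa
log = Log()
log.append("Create", inum=9956)
log.append("Link", dir_inum=1000, file_inum=9956, name="aaa")
log.append("Write", inum=9956, offset=0, data=b"hello")
```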


Using Log: Lookup

Record 1: Type: Link; Dir I-num: 1000; File I-num: 9956; Name: “aaa”
Record 2: Type: Link; Dir I-num: 1000; File I-num: 9876; Name: “bbb”
Record 3: Type: Remove; Dir I-num: 1000; Name: “aaa”

• A scan follows the log backwards in time


• LOOKUP(name, dir I-num): last Link, but stop at Remove
• READDIR(dir I-num): accumulate Links, minus Removes
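
The two scan rules above can be sketched over a single log as follows. The dictionary record format is hypothetical, and the example assumes the three records on this slide are listed oldest to newest, so the Remove of “aaa” is the most recent record.

```python
def lookup(log_newest_first, dir_inum, name):
    """Return the file i-number for name in the directory, or None if it is absent."""
    for rec in log_newest_first:
        if rec.get("dir") != dir_inum or rec.get("name") != name:
            continue
        if rec["type"] == "Remove":
            return None          # the name was removed more recently than any Link
        if rec["type"] == "Link":
            return rec["file"]   # most recent Link wins
    return None


def readdir(log_newest_first, dir_inum):
    """Accumulate Links for the directory, skipping names removed more recently."""
    removed, entries = set(), {}
    for rec in log_newest_first:
        if rec.get("dir") != dir_inum:
            continue
        name = rec["name"]
        if rec["type"] == "Remove":
            removed.add(name)
        elif rec["type"] == "Link" and name not in removed and name not in entries:
            entries[name] = rec["file"]
    return entries


# Newest record first, i.e. the order in which a backwards scan visits them.
log = [
    {"type": "Remove", "dir": 1000, "name": "aaa"},
    {"type": "Link", "dir": 1000, "file": 9876, "name": "bbb"},
    {"type": "Link", "dir": 1000, "file": 9956, "name": "aaa"},
]
```

With this log, a lookup of “aaa” stops at the Remove and finds nothing, while “bbb” resolves to 9876.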
Contributions

• Multi-user read/write peer-to-peer storage system
• Distributed file system with useful integrity properties based on untrusted components
• Use of distributed hash tables as a building block
Design

• DHash – maps keys to arbitrary values


• Log Data Structure – a linked list
• View – a set of logs
• Combining logs – ordering records across logs
• Snapshot – state of the file system
Log Data Structure

• A linked list of immutable log records
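
One way to picture a linked list of immutable records on top of a put/get store is to key each record by the hash of its content and have it name the previous record's key. The sketch below is an illustration under assumptions: the `blocks` dictionary stands in for the DHT, JSON is an arbitrary encoding, SHA-1 is an assumed hash, and `put_block`, `append`, and `walk` are hypothetical names.

```python
import hashlib
import json

blocks = {}  # stand-in for the DHT: key (hex string) -> encoded record


def put_block(record: dict) -> str:
    """Store a record under the hash of its encoded content."""
    data = json.dumps(record, sort_keys=True).encode()
    key = hashlib.sha1(data).hexdigest()
    blocks[key] = data
    return key


def append(head_key, record: dict) -> str:
    """Create a new record that points at the old head; return the new head key."""
    return put_block({**record, "prev": head_key})


def walk(head_key):
    """Follow prev pointers from the newest record back to the oldest."""
    key = head_key
    while key is not None:
        rec = json.loads(blocks[key])
        yield rec
        key = rec["prev"]
```

Since a record's key depends on its content, including the `prev` key, changing any older record would change every key after it; only the pointer to the head needs to be mutable.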


Log record types

• Roughly NFS update operations


• 160-bit i-numbers serve as file handles
User Cooperation: Views

• Set of logs that comprise the file system


• View block
– an immutable DHash content-hash block
Combining Logs

• Ivy orders records using version vectors


• Seq. field – starts from zero for each log
• Version vector: one (U:V) pair for each log
– U: DHash key of the log-head
– V: Sequence number of the most recent record
• Example: (A:5 B:7)
– Earlier than (A:6 B:7), but concurrent with (A:6 B:6)
• Public keys are used to order concurrent records
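
The comparison rule in the example can be sketched as below. Version vectors are modeled as dictionaries from log identifier to sequence number; treating a missing entry as -1 (since sequence numbers start at zero) and comparing raw public-key bytes for the tie-break are assumptions of this sketch, as are the function names.

```python
def compare(v1: dict, v2: dict) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for two version vectors."""
    logs = set(v1) | set(v2)
    le = all(v1.get(k, -1) <= v2.get(k, -1) for k in logs)
    ge = all(v1.get(k, -1) >= v2.get(k, -1) for k in logs)
    if le and ge:
        return "equal"
    if le:
        return "before"
    if ge:
        return "after"
    return "concurrent"


def ordered_before(v1: dict, pubkey1: bytes, v2: dict, pubkey2: bytes) -> bool:
    """Total order: version vectors first, public keys to break concurrency."""
    rel = compare(v1, v2)
    if rel == "concurrent":
        return pubkey1 < pubkey2  # assumption: plain byte-wise comparison of keys
    return rel == "before"
```

For the slide's example, `compare({"A": 5, "B": 7}, {"A": 6, "B": 7})` gives `"before"`, and `compare({"A": 5, "B": 7}, {"A": 6, "B": 6})` gives `"concurrent"`.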
Snapshots

• Each Ivy participant constructs a private snapshot for speed
• Contains the entire state of the file system
• Each snapshot is stored in DHash as content-hash blocks for persistence
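
The idea of a snapshot as accumulated file system state can be sketched as replaying log records into a dictionary, then building the next snapshot from the previous one plus only the newer records. The state layout, record format, and function names are hypothetical, and the sketch assumes the records have already been merged into one order, oldest first.

```python
import copy


def apply_record(state: dict, rec: dict) -> None:
    """Apply one log record to the in-memory file system state."""
    if rec["type"] == "Create":
        state["files"].setdefault(rec["inum"], b"")
    elif rec["type"] == "Link":
        state["dirs"].setdefault(rec["dir"], {})[rec["name"]] = rec["file"]
    elif rec["type"] == "Remove":
        state["dirs"].get(rec["dir"], {}).pop(rec["name"], None)
    elif rec["type"] == "Write":
        old = state["files"].get(rec["inum"], b"")
        off, new = rec["offset"], rec["data"]
        # Pad with zero bytes if the write starts past the current end of file.
        state["files"][rec["inum"]] = old[:off].ljust(off, b"\0") + new + old[off + len(new):]


def build_snapshot(records_oldest_first, base=None) -> dict:
    """Start from a previous snapshot (if any) and replay only the newer records."""
    state = copy.deepcopy(base) if base is not None else {"dirs": {}, "files": {}}
    for rec in records_oldest_first:
        apply_record(state, rec)
    return state
```

Starting from a base snapshot is what saves time: a request then needs to scan only the records written since that snapshot rather than every log from its beginning.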
Snapshot Data Structure
Application Semantics

• Concurrent Updates
• Partitioned Updates/ Conflict Resolution
Concurrent Updates

• Ivy does not serialize all updates

• Problem
– Unlink(“a”) and rename(“a”, “b”) at the same time
– Ivy correctly lets only one take effect
– But it may return “success” status for both
Partitioned Updates

• Ivy is not directly aware of partitions


– Ivy’s design maximizes availability at the expense of consistency
– Updates are allowed to proceed in all partitions
• All updates during a partition are concurrent updates
• Conflict resolution → “lc” tools
WAN Evaluation on MAB

• Modified Andrew Benchmark


• 4 DHash nodes
• Round-trip times: 9, 16, 82 milliseconds
• No DHash replication
• 4 logs
• One active writer
WAN Performance

Phase      Ivy (s)   NFS (s)

Mkdir        11.2       4.8
Write        89.2      42.0
Stat         65.6      47.8
Read         65.8      55.6
Compile     144.2     130.2
Total       376.0     280.4
Summary

• Exploring use of DHTs as a building block
• Case study of DHT use: Ivy
– Read/write peer-to-peer file system
• Suitable for small groups of cooperating participants who do not have a single central server
Critiques

• Logs grow indefinitely
• Scanning all logs for each request
• Relies on the DHT’s block availability and robustness
Discussion

• DHT interface ≈ disk sector read/write interface
• Performance vs. semantics
• Any other applications of DHT
– DB, LDAP server…
