Professional Documents
Culture Documents
ABSTRACT extremely low level [..] Within any CMP (chip multiproces-
We present an assembly language targeted at shared mem- sors) with a shared on-chip cache memory, however, each
ory multiprocessors, where CPU cores synchronize via locks, communication event typically takes just a handful of pro-
acquired with a traditional test and set lock instruction. We cessor cycles [..] Programmers must still divide their work
show programming examples taken from the literature on into parallel threads, but do not need to worry nearly as
Operating Systems, and discuss a typing system that en- much about ensuring that these threads are highly indepen-
forces a strict protocol on lock usage and that prevents race dent, since communication is relatively cheap. This is not a
conditions. complete panacea, however, because programmers must still
structure their inter-thread synchronization correctly, or the
1. MOTIVATION program may generate incorrect results or deadlock.”
The need for fast information processing is one of the driv-
ing forces in the advancement of technology in general, and This work is about language support to help correctly struc-
of computers in particular. Since the early steps in com- turing inter-thread synchronization in CMPs. We design
puting, when the ENIAC performed 300 operations per sec- a simple abstract CMP and present its programming lan-
ond, a huge strode has been made towards the computing guage: a conventional typed assembly language [16] extended
power of nowadays machines. Nevertheless, the computing with a notion of locks [6], and a fork primitive. A type sys-
challenges have increased even faster, and the demands, for tem enforces a policy of lock usage, making sure that, within
instance, from the astronomical community trying to probe a thread, locks are created, locked, the shared memory ac-
the universe or from the biological community trying to un- cessed, and unlocked.
derstand the human genome, constantly take current com-
puting power to the limit. What directions may we follow Our type system closely follows the tradition of typed as-
to increase this computing power? Olukotun and Hammond sembly languages [14, 15, 16], extended with support for
write: threads and locks, following Flanagan and Abadi [6]. With
respect to [6], however, our work is positioned at a much
lower abstraction level, and faces different challenges inher-
With the exhaustion of essentially all performance ent to non-lexical scoped languages.
gains that can be achieved for “free” with tech-
nologies such as superscalar dispatch and pipelin- Lock primitives have been discussed in the context of con-
ing, we are now entering an era where program- current object calculi [5], JVM [7, 8, 11, 12], C- - [19], but
mers must switch to more parallel programming not in that of typed assembly languages. In a typed set-
models in order to exploit multiprocessors ef- ting, were programs are guaranteed not to suffer from race
fectively, if they desire improved single-program conditions, we
performance. [17].
• syntactically decouple of the lock and unlock opera-
Continuing, we read “Previously it was necessary to min- tions on what one usually finds unified in a single syn-
imize communication between independent threads to an tactic construct in high-level languages: Birrel’s lock-
∗This work as partially supported by the European Union do-end construct [2], used under different names (sync,
under the IST Integrated Project “Software Engineering for synchronized-in, lock-in) in a number of other works,
Service-Oriented Overlay Computers”. including the Java programming language [6, 5, 7, 8,
4, 3, 9];
• allow for the lock acquisition/release in schemes other
than the nested discipline imposed by the lock-do-end
construct;
• allow to fork threads holding locks.
TV 06 Seattle, USA We describe the architecture of our CMP and its lock dis-
cipline (enforced by the type system) in Section 2. After,
(1) (1)
α!∈Λ
The code and type the for criticalRegion is that of the previ-
4. INTERPROCESSOR COMMUNICATION ous example. The code for main is identical, apart from the
This section presents the main concepts of our language, last instruction that is replaced by jump enterSleepLockRegion.
based on classical examples taken from the literature on Op- We discuss how to acquire lock α by forking a new thread.
erating Systems [21]. The following examples illustrate the
usage of threads and the discipline of locks. To fork a thread we use instruction fork and specify the
label of the code block that should be run when a processor
is available. Upon thread starting, registers are loaded with
4.1 Mutual exclusion using busy waiting the values they had when the fork action happened. In the
We start with a code block that actively tries to acquire a present case, register r1 should contain the address of the
lock. This technique, called spin lock, is used when there is tuple and register r2 the lock to be acquired.
a reasonable expectation that the lock will be available in a
short period of time. When enterSleepLockRegion fails to acquire the lock, it creates
a new thread (that will try to obtain the lock later) and ter-
Let Tuple stands for type h int iˆα, the type of tuples with one minates the current one, rather than jumping to the begin-
integer component, protected by lock α. In order to read or ning of the code block, as it happens with enterSpinLockRegion.
to write values in a heap location of type Tuple, the thread
e n t e r S l e e p L o c k R e g i o n (r1 : Tuple , r2 : h l o c k (α) i ˆα) {
must hold lock α. In this example, we allocate a tuple, r3 := t e s t S e t L o c k r2 −− t r y t o a c q u i r e l o c k
acquire the lock, and store a value in the tuple (within a i f r3 = 0 jump c r i t i c a l R e g i o n −− i f so , . . .
critical region where we hold the lock). 1
In order to make code blocks reusable in different contexts,
The below code block actively tries to acquire lock α (sup- we need to abstract the type of the tuple. For the sake of
simplicity and clarity, in this paper we do not make use of
plied in register r2 ); on success it transfers control to a criti- existential types. The interested reader may refer to [13] for
cal region. The code block expects the (address of the) tuple examples illustrating continuation passing style using exis-
in register r1 . The tuple plays no rôle in the code block; it tential types.
types τ ::= int | h~σ iα | Γ requires Λ | Ψ; Γ; ∅ ` yield (T-yield)
lock(α) Ψ; Γ ` v : Γ requires Λ Ψ; Γ; Λ0 ` I
(T-fork)
init types σ ::= τ | ?τ Ψ; Γ; Λ ] Λ0 ` fork v; I
register file types Γ ::= r1 : τ1 , . . . , rn : τn Ψ, α : : Lock; Γ{r : hlock(α)iα }; Λ ` I α 6∈ Ψ, Γ, Λ
typing environment Ψ ::= ∅ | Ψ, l : τ | Ψ, α : : Lock Ψ; Γ; Λ ` α, r := newLock 0; I
(T-newLock 0)
Ψ, α : : Lock; Γ{r : hlock(α)iα }; Λ ] {α} ` I α 6∈ Ψ, Γ, Λ
Figure 9: Types
Ψ; Γ; Λ ` α, r := newLock 1; I
(T-newlock 1)
` hσ1 , . . . , τn , . . . , σn+m iα <: hσ1 , . . . , ?τn , . . . , σn+m iα Ψ; Γ ` v : hlock(α)iα Ψ; Γ{r : lock(α)}; Λ ` I α 6∈ Λ
(S-uninit) Ψ; Γ; Λ ` r := testSetLock v; I
(T-tsl)
` σ <: σ 0 ` σ 0 <: σ 00
` σ <: σ (S-ref, S-trans) Ψ; Γ ` v : hlock(α)iα Ψ; Γ; Λ ` I
` σ <: σ 00 (T-unlock)
` τ 0 <: τ Ψ; Γ; Λ ] {α} ` unlock v; I
Ψ ` n : int Ψ ` b : lock(α) Ψ `?τ : ?τ Ψ; Γ ` r : lock(α) Ψ; Γ ` v : Γ requires (Λ ] {α}) Ψ; Γ; Λ ` I
Ψ, l : τ 0 ` l : τ
(T-int,T-label,T-lock,T-uninit) Ψ; Γ; Λ ` if r = 0 jump v; I
(T-critical)
Ψ ` v: τ
Ψ; Γ ` r : Γ(r) (T-reg,T-val)
Ψ; Γ ` v : τ
Figure 11: Typing rules for instructions (thread pool
and locks) Ψ; Γ; Λ ` I
Figure 10: Typing rules for values Ψ ` v : τ and for
operands Ψ; Γ ` v : τ
f o r k e n t e r S l e e p L o c k R e g i o n −− e l s e , t r y l a t e r
all locks must have been released prior to ending the thread.
yield −− f r e e t h e p r o c e s s o r Only the thread that acquired a lock may release it (see rule
} T-unlock below); as such, allowing acquired locks to “die”
with the thread, may lead to deadlock situations (cf. [11]).
Conventional pthread mutex implementations maintain a Rule T-fork splits the permission into two sets, Λ and Λ0 :
queue of waiting threads, rather than repeatedly forking one goes with the forked thread, the other remains with the
threads. Section 7.3 sketches such an implementation. current thread. Such a scheme is crucial in the implementa-
tion of Hoare’s monitors [10], as described in Section 7.3.
5. TYPE DISCIPLINE
The syntax of types is described in Figure 9. A type of the The two rules for newLock assign a lock type to the register.
form h~σ iα describes a tuple in the heap protected by lock α. When the lock is created in the locked state, the singleton
Each type in ~σ is either initialized (τ ) or uninitialized (?τ ). type is added to the permission of the thread. Rule T-tsl
A type of the form Γ requires Λ describes a code block; a requires that the value under test holds a lock; disallowing
thread jumping into such a block must hold a register file testing a lock already held by the thread. Rule T-unlock
type Γ as well as the locks in Λ. The type lock(α) describes makes sure that only held locks are unlocked. Finally, the
a singleton lock type. jump-to-critical-region rule ensures that the current thread
holds the exact number of locks required by the target code
The type system is defined in Figures 10 to 13. block. The rule also adds the lock under test to the per-
mission of the thread. A thread is guaranteed to hold the
Typing values, Figure 10. Heap values are distinguished lock only after (conditionally) jumping to a critical region.
from operands (that include registers as well) by the form of A previous test and set lock instruction may have obtained
the sequent. Notice that lock values—0, 1—have any lock the lock, but as far as the type system goes, the thread holds
type. Also, an uninitialized value ?τ has a type ?τ ; we use the lock after a jump-to-critical-region instruction.
the same syntax for a uninitialized value (at the left of the
colon) and its type (at the right of the colon). A formula Typing memory and control instructions, Figure 12.
σ <: σ 0 allows to “forget” initializations. These rules are standard [14], except on what concerns the
locks. The rule for malloc makes sure the lock α is in scope,
Typing fork and lock instructions, Figure 11. Instruc- meaning that it must be preceded by a α, r := newLock b in-
tions are checked against a typing environment Ψ (mapping struction, in the same code block; Section 7.2 shows how to
labels to types, and type variables to the kind Lock: the overcome this limitation. The rules for load and store make
kind of singleton lock types), a register file type Γ holding sure that lock α is in the permission Λ of the thread. The
the current types of the registers, and a set Λ of lock vari- conditions regarding lock type lock( ) in rules T-malloc,
ables (the permission of the code block). T-load, and T-store ensure that locks are only created
using a newLock instruction, and manipulated with test and
Rule T-yield requires an empty permission meaning that set lock.
~ iα }; Λ ` I
Ψ, α : : Lock; Γ{r : h?τ ~τ 6= lock( ) ∀i.Ψ ` R(ri ) : Γ(ri )
(reg file, Ψ ` R : Γ )
Ψ, α : : Lock; Γ; Λ ` r := malloc [~τ ] guarded by α; I Ψ ` R: Γ
(T-malloc) ∀i.Ψ ` P (i) Ψ ` R: Γ Ψ; Γ; Λ ` I
Ψ; Γ ` v : hσ1 ..τn ..σn+m iα Ψ; Γ{r : τn }; Λ ` I Ψ`P Ψ ` hR; Λ; Ii
τn 6= lock( ) α∈Λ (processors, Ψ ` P )
Ψ; Γ; Λ ` r := v[n]; I ∀i.Ψ ` li : Γi requires Ψ ` Ri : Γi
(T-load) Ψ ` {hl1 , R1 i, . . . , hln , Rn i}
Ψ; Γ ` v : τn Ψ; Γ ` r : hσ1 ..σn ..σn+m iα τn 6= lock( ) (thread pool, Ψ ` T )
Ψ; Γ{r : hσ1 .. type(σn )..σn+m iα }; Λ ` I α ∈ Λ Ψ; Γ; Λ ` I ∀i.Ψ, α : : Lock ` vi : τi
Ψ; Γ; Λ ` r[n] := v; I Ψ ` Γ requires Λ {I} : τ Ψ, α : : Lock ` h~v iα : h~σ iα
(T-store) (heap value, Ψ ` h : τ )
Ψ; Γ ` v : τ Ψ; Γ{r : τ }; Λ ` I ∀l.Ψ ` H(l) : Ψ(l)
(T-move) (heap, Ψ ` H )
Ψ; Γ; Λ ` r := v; I Ψ`H
Ψ; Γ ` r0 : int Ψ; Γ ` v : int Ψ; Γ{r : int}; Λ ` I Ψ`H Ψ`T Ψ`P
` halt (state, ` S )
Ψ; Γ; Λ ` r := r0 + v; I ` hH; T ; P i
(T-arith)
Ψ; Γ ` r : int Ψ; Γ ` v : Γ requires Λ Ψ; Γ; Λ ` I Figure 13: Typing rules for machine states
Ψ; Γ; Λ ` if r = 0 jump v; I
(T-branch)
Ψ; Γ ` v : Γ requires Λ 4. Unlock only in possession of the lock. If ι is unlock v
(T-jump)
Ψ; Γ; Λ ` jump v and H(R̂(v)) = h iα , then α ∈ Λ.
where type(τ ) = type(?τ ) = τ . 5. Releasing the processor only without held locks. If ι is
yield, then Λ = ∅.
Figure 12: Typing rules for instructions (memory
and control flow) Ψ; Γ; Λ ` I
For races we follow Flanagan and Abadi [6]. We start by
defining the set of permissions of a machine state, by joining
the permissions of the processes with those of the threads in
Typing machine states, Figure 13. The rules should be the run pool, and with the set of unlocked locks in the heap.
easy to understand. The only remark goes to heap tuples, Remember that a permission is a set of locks, denoted by Λ.
where we make sure that all locks protecting the tuples are
in the domain of the typing environment.
Definition 1 (State permissions.).
The main result of this section follows.
LP ={Λ | P (i) = h ; Λ; i}
LT ={Λ | hl, i ∈ T and H(l) = requires Λ { }}
Theorem 1 (Subject Reduction). If ` S and S → LH ={{α | H(l) = h0iα }}
S 0 , then ` S 0 .
LhH;T ;P i =LP ∪ LT ∪ LH
L
Lhalt =22
6. TYPE SAFETY
We split the results in three categories: the standard “well-
typed machines do not get stuck” (which we omit alto-
State permissions do not shrink with reduction. The proof (a
gether), the lock discipline, and races. The lock discipline is
case analysis on the various reduction rules) shows that state
embodied in the following Lemma (cf. Figure 2).
permissions grow only due to the execution of a newLock
instruction.
Lemma 2 (Lock Discipline). Let ` H : Ψ and Ψ `
hR; Λ; (ι; )i. Lemma 3. If S → S 0 , then LS ⊆ LS 0 .
Figure 14: Extending the type system with recursive values v ::= . . . | v[~
α]
types types τ ::= . . . | ∀[~
α].(Γ requires Λ) | . . .
Additional rules
Accessing the heap. A processor of the form hR; ; (ι; )i
accesses heap H at location l, if ι is of the form v[ ] := α].(Γ0 requires Λ)
Ψ; Γ ` v : ∀[~
(T-val-app)
or of the form := v[ ], and l = H(R̂(v)). ~ : ∀[].(Γ0 [β/~
Ψ; Γ ` v[β] ~ α] requires Λ[β/~
~ α])
[5] C. Flanagan and M. Abadi. Object types against [21] A. S. Tanenbaum. Modern Operating Systems.
races. In International Conference on Concurrency Prentice Hall, 2001.
Theory, volume 1664 of LNCS, pages 288–303.
Springer-Verlag, 1999.