A Multithreaded Typed Assembly Language

A Multithreaded Typed Assembly Language∗
Vasco T. Vasconcelos Francisco Martins

Department of Informatics Departments of Mathematics
Faculty of Sciences University of Azores, Portugal
University of Lisbon, Portugal
fmartins@di.fc.ul.pt
vv@di.fc.ul.pt
ABSTRACT extremely low level [..] Within any CMP (chip multiproces-
We present an assembly language targeted at shared mem- sors) with a shared on-chip cache memory, however, each
ory multiprocessors, where CPU cores synchronize via locks, communication event typically takes just a handful of pro-
acquired with a traditional test and set lock instruction. We cessor cycles [..] Programmers must still divide their work
show programming examples taken from the literature on into parallel threads, but do not need to worry nearly as
Operating Systems, and discuss a typing system that en- much about ensuring that these threads are highly indepen-
forces a strict protocol on lock usage and that prevents race dent, since communication is relatively cheap. This is not a
conditions. complete panacea, however, because programmers must still
structure their inter-thread synchronization correctly, or the
1. MOTIVATION program may generate incorrect results or deadlock.”
The need for fast information processing is one of the driv-
ing forces in the advancement of technology in general, and This work is about language support to help correctly struc-
of computers in particular. Since the early steps in com- turing inter-thread synchronization in CMPs. We design
puting, when the ENIAC performed 300 operations per sec- a simple abstract CMP and present its programming lan-
ond, a huge strode has been made towards the computing guage: a conventional typed assembly language [16] extended
power of nowadays machines. Nevertheless, the computing with a notion of locks [6], and a fork primitive. A type sys-
challenges have increased even faster, and the demands, for tem enforces a policy of lock usage, making sure that, within
instance, from the astronomical community trying to probe a thread, locks are created, locked, the shared memory ac-
the universe or from the biological community trying to un- cessed, and unlocked.
derstand the human genome, constantly take current com-
puting power to the limit. What directions may we follow Our type system closely follows the tradition of typed as-
to increase this computing power? Olukotun and Hammond sembly languages [14, 15, 16], extended with support for
write: threads and locks, following Flanagan and Abadi [6]. With
respect to [6], however, our work is positioned at a much
lower abstraction level, and faces different challenges inher-
With the exhaustion of essentially all performance ent to non-lexical scoped languages.
gains that can be achieved for “free” with tech-
nologies such as superscalar dispatch and pipelin- Lock primitives have been discussed in the context of con-
ing, we are now entering an era where program- current object calculi [5], JVM [7, 8, 11, 12], C- - [19], but
mers must switch to more parallel programming not in that of typed assembly languages. In a typed set-
models in order to exploit multiprocessors ef- ting, were programs are guaranteed not to suffer from race
fectively, if they desire improved single-program conditions, we
performance. [17].
• syntactically decouple of the lock and unlock opera-
Continuing, we read “Previously it was necessary to min- tions on what one usually finds unified in a single syn-
imize communication between independent threads to an tactic construct in high-level languages: Birrel’s lock-
∗This work as partially supported by the European Union do-end construct [2], used under different names (sync,
under the IST Integrated Project “Software Engineering for synchronized-in, lock-in) in a number of other works,
Service-Oriented Overlay Computers”. including the Java programming language [6, 5, 7, 8,
4, 3, 9];
• allow for the lock acquisition/release in schemes other
than the nested discipline imposed by the lock-do-end
construct;
• allow to fork threads holding locks.
TV 06 Seattle, USA We describe the architecture of our CMP and its lock dis-
cipline (enforced by the type system) in Section 2. After,
(1) (1)
CPU core 1 CPU core N α, r1 := newLock 0 α, r1 := newLock 1

r2 := malloc !int"ˆα r2 := malloc !int"ˆα
registers registers α!∈Λ α∈Λ
instruction instruction α∈Λ α∈Λ
cache cache (2) r3 := testSetLock r1 (3) r2 [0] := 7 (4) (5)

unlock r1 yield
if r3 = 0 jump critical r4 := r2 [0]
α!∈Λ
Figure 2: The Lock discipline.

Figure 2: The Lock discipline.
run pool heap
The below code block actively tries to acquire lock α (supplied in register r1 );
on success it transfers control to a critical region. The code block expects the
Figure 1: The architecture of the multiprocessor registers | . .tuple
r ::= rr01. The
(address of the) tuple in register . | plays
rR no rôle in the code block;
it isinteger
only intended to be passed to the critical region.
values n ::= . . . | -1 | 0 | 1 | . . .
1
e n t e r S p i n L o c k R e g i o n (r0 : Tuple , r1 : ! l o c k (α) " ˆα ; ) {

lock values
r2 := t e s t S e t L o c k r1
b ::= 0 | −− 1 t r y to a c q u i r e lock α
ivalues
f r2 = 0 jump c r i t v l R e gri o n| −−
i c a::= n |i fb so| , l e|n?τ
ter c r i t i c a l region
we present the syntax and the operational semantics of MIL jump e n t e r S p i n L o c k R e g i o n −− a c t i v e l y w a i t f o r t h e l o c k
(Section 3) and sketch the programming model (Section 4), } instructions ι ::=
by discussing well-known examples from the literature on A critical
control region
flowstores a value in the tuple,
r := v | rreleases
:= r + thevlock,
| and terminates.
Operating Systems, namely the enforcement of mutual ex- Notice that the type for the critical region expects the current thread to hold
lock α, as indicated by α after theifsemicolon
r = v jump
in the v |
signature of the code block.
clusion with locks. Section 5 presents a type discipline to
enforce the absence of race conditions in our language. Type After memory
updating the tuple, lock α ris:= released,
mallocmarking the endby
[~τ ] guarded of α
the |critical
region. Also, since the thread terminates (instruction yield ), the type system
safety is discussed in Section 6. Some extensions to MIL and r
enforces the release of all held locks. := v[n] | r[n] := v |
its type system are discussed in Section 7. The closing sec-
c r i t i clock
a l R e g i o n (r0 : Tuple , r1 : !α,
l o crk := " ˆα ; α) { b |
(α) newLock
tion discusses the related work, summarizes the conclusions, r0 [ 0 ] := 10 −− c h a n g e t h e t u p l e
and points future directions to extend our research. u n l o c k r1 r −− r e l e a s e t h e l v
:= testSetLock o c k| unlock v |
y i efork
ld −− v
fork r e t u r n c o n t r o l to s c h e d u l e r
}
inst. sequences I ::= ι; I | jump v | yield
2. THE ARCHITECTURE OF THE MUL- Finally, a main code block creates a new (unlocked) lock, allocates a tuple in
the heap, and tries to acquire lock α by transferring control to enterSpinLockRegion.
TIPROCESSOR AND ITS LOCK USAGE main ( ; ) { Figure 3: Instructions
POLICY α , r1 := newLock 0 −− c r e a t e mutex l o c k
The architecture of the machine is described in Figure 1. 1 Inorder to make code blocks reusable in different contexts, we need to abstract the type
It comprises a series of processor cores and a main mem- of the tuple. For the sake of simplicity and clarity, in this paper we do not make use of
existential types. The interested reader may refer to [8] for examples illustrating continuation
ory. Each processor core owns a number of registers and passing style using existential types.
vides locks to protect tuples in the heap. A standard “test
an instruction cache. The main memory is divided into two
and set lock” instruction is used to obtain a lock, thus al-
parts: a conventional heap storing data and code, and a run 4
lowing entering a critical region. Running threads read and
pool storing suspended threads.
write from the shared heap via conventional load and store
instructions. The policy for the usage of locks (enforced by
Threads run in processor cores. When the number of threads
the type system) by a given processor is depicted in Fig-
is larger than the number of available processor cores, part
ure 2, where α denotes a lock singleton type and Λ the set
of the threads is placed in the run pool. For each suspended
of locks held by the processor (the processor’ permission).
thread, the run pool stores a pair comprising (a) a pointer to
the heap, where the thread’s code fragment resides, and (b)
a register file (a mapping from registers to values) containing
the initial state of the processor. 3. SYNTAX AND OPERATIONAL SEMAN-
TICS
New threads are placed in the run pool via a dedicated fork The syntax of our language is described by the grammar in
instruction. Running threads relinquish the processor by Figures 3, 4, and 9. We defer the treatment of types until
explicitly executing a yield instruction; there is no otherwise Section 5. We rely on a set of heap labels ranged over by
machine-wide thread suspension mechanism—our machine l, and a disjoint set of type variables ranged over by α, β.
fits in the cooperative thread model, according to Tanenbaum Our machine is parameterized by two values: the number
terminology [21] (we are working at an abstraction level be- of processors N in the machine, and the number of registers
low that of the operating system, where one usually finds per processor R.
preemptive models). Freed processors look for work in the
run pool. A pair is selected and removed from the pool; Instructions, Figure 3. Most of the proposed instruc-
registers are loaded from the pair’s second component (a tions are standard in assembly languages. Instructions are
register file), and control is transferred to the code pointed organized as basic blocks, that is sequences of instructions
by the pair’s first component (a label). ending with a jump or a yield. The yield instruction releases
the processor, allowing it to fetch another thread from the
To provide for inter-thread synchronization the machine pro- run pool.
lock sets Λ ::= α1 , . . . , αn P (i) = hR; Λ; (α, r := newLock 0; I)i l 6∈ dom(H), β fresh
register files R ::= {r1 : v1 , . . . , rR : vR } hH; T ; P i → hH{l : h0iβ }; T ; P {i : hR{r : l}; Λ; I[β/α]i}i
(R-newLock 0)
processor p ::= hR; Λ; Ii
P (i) = hR; Λ; (α, r := newLock 1; I)i l 6∈ dom(H), β fresh
processors array P ::= {1 : p1 , . . . , N : pN }
hH; T ; P i → hH{l : h1iβ }; T ; P {i : hR{r : l}; Λ ] {β}; I[β/α]i}i
thread pool T ::= {hl1 , R1 i, . . . , hln , Rn i} (R-newlock 1)
heap values h ::= hv1 . . . vn iα | τ {I} P (i) = hR; Λ; (r := testSetLock v; I)i R̂(v) = l H(l) = h0iα
heaps H ::= {l1 : h1 , . . . , ln : hn } hH; T ; P i → hH{l : h1iα }; T ; P {i : hR{r : 0}; Λ ] {α}; Ii}i
states S ::= hH; T ; P i | halt (R-tsl 0)
P (i) = hR; Λ; (r := testSetLock v; I)i H(R̂(v)) = h1iα
Figure 4: Abstract machine hH; T ; P i → hH; T ; P {i : hR{r : 1}; Λ; Ii}i
(R-tsl 1)
P (i) = hR; Λ ] {α}; (unlock v; I)i R̂(v) = l H(l) = h iα
∀i.P (i) = h ; ; yieldi
(R-halt) hH; T ; P i → hH{l : h0iα }; T ; P {i : hR; Λ; Ii}i
h ; ∅; P i → halt
(R-unlock)
P (i) = h ; ; yieldi H(l) = requires Λ {I}
hH; T ] {hl, Ri}; P i → hH; T ; P {i : hR; Λ; Ii}i
(R-schedule) Figure 6: Operational semantics (locks)
P (i) = hR; Λ ] Λ0 ; (fork v; I)i
R̂(v) = l H(l) = requires Λ { } P (i) = hR; Λ; (r := malloc [~τ ] guarded by α; I)i l 6∈ dom(H)
(R-fork)
hH; T ; P i → hH; T ∪ {hl, Ri}; P {i : hR; Λ0 ; Ii}i ~ iα }; T ; P {i : hR{r : l}; Λ; Ii}i
hH; T ; P i → hH{l : h?τ
(R-malloc)
Figure 5: Operational semantics (thread pool) P (i) = hR; Λ; (r := v[n]; I)i H(R̂(v)) = hv1 ..vn ..vn+m iα
hH; T ; P i → hH; T ; P {i : hR{r : vn }; Λ; Ii}i
(R-load)
Machines, Figure 4. Machines can be in two states: P (i) = hR; Λ; (r[n] := v; I)i
halted or running. In the latter case, they comprise a heap, R(r) = l H(l) = hv1 ..vn ..vn+m iα
a thread pool, and a processor array. Heaps are maps from
hH; T ; P i → hH{l : hv1 ..R̂(v)..vn+m iα }; T ; P {i : hR; Λ; Ii}i
heap labels into heap values that can be either tuples or
(R-store)
code blocks. Tuples are vectors of values protected by some
lock α (locks play no role at runtime; they are needed only
for Subject Reduction and Type Safety). Code blocks com- Figure 7: Operational semantics (memory)
prise a signature and a body: the signature, which the type
system makes sure is of the form Γ requires Λ, describes the
types of each register and the locks held by the processor
when jumping to the block code; the body is a sequence of while the rest remains in the thread.
instructions.
Lock instructions, Figure 6. The newLock instructions
A thread pool is a multiset of pairs, each of which contains create new locks, in a locked or unlocked state. The scope
a pointer to a code block and a register file. A processor of α is the rest of the code block. A tuple—h0iβ in the for-
array contains N processors (recall that N is a parameter mer case; h1iβ in the latter—is allocated in the heap, and
to the machine). Each processor is composed of a register register r is made to point to it. A fresh type variable β re-
file (of fixed length R, the other parameter to the machine), places the variable α chosen by the programmer. When the
a set of locks (the locks held by the thread running in the lock is created in the locked state, the new lock variable β
processor), and a sequence of instructions. is added to the set of the locks held by the processor. Locks
apart, an instruction α, r := newLock b behaves as the pair
The operational semantics is defined via a reduction relation of instructions r := malloc hlock(α)iα ; r[0] := b. Instruc-
on machine states, as described in Figures 5 to 8. tion r := testSetLock v is the Test and Set Lock present in
many machines designed with multiple processes in mind.
Thread pool instructions, Figure 5. Rule R-halt stops It reads the contents of the memory word v into register r
the machine when it finds an empty thread pool and all and then stores 1 at the memory address v. When a locked
processors idle. Otherwise, if there is an idle processor and lock is found at the memory address, its lock variable α is
a pair in the thread pool, then rule R-schedule assigns a added to the permissions of the processor. As usual, the
new thread to the processor. A pair label-registers is taken two operations (load and store) are indivisible—no other
from the pool; the instructions for, and the locks held by, processor can access the memory word until the instruction
the new thread are read from the code block addressed by is finished; our operational semantics enforces this behavior.
the label; the initial value of the registers are read from the Locks apart, an instruction unlock v behaves as r[0] := 0.
pair. Rule R-fork places a new thread in the thread pool. Rule R-unlock, however, makes sure the processor holds
Notice that part of the held locks go with the forked thread, the lock.
is only intended to be passed to the critical region.1
P (i) = hR; Λ; jump vi H(R̂(v)) = {I} e n t e r S p i n L o c k R e g i o n (r1 : Tuple , r2 : h l o c k (α) i ˆα) {
(R-jump) r3 := t e s t S e t L o c k r2 −− t r y t o a c q u i r e l o c k α
hH; T ; P i → hH; T ; P {i : hR; Λ; Ii}i i f r3 = 0 jump c r i t i c a l R e g i o n −− i f s o . . .
P (i) = hR; Λ; (r := v; I)i jump e n t e r S p i n L o c k R e g i o n −− e l s e w a i t
(R-move) }
hH; T ; P i → hH; T ; P {i : hR{r : R̂(v)}; Λ; Ii}i
P (i) = hR; Λ; (r := r0 + v; I)i
A critical region stores a value in the tuple, releases the lock,
hH; T ; P i → hH; T ; P {i : hR{r : R(r0 ) + R̂(v)}; Λ; Ii}i
and terminates. Notice that the type for the critical region
(R-arith)
expects the current thread to hold lock α, as indicated by α
P (i) = hR; Λ; (if r = v jump v 0 ; )i after the semicolon in the signature of the code block. After
R(r) = v H(R̂(v 0 )) = {I} updating the tuple, lock α is released, marking the end of
(R-branchT) the critical region.
hH; T ; P i → hH; T ; P {i : hR; Λ; Ii}i
P (i) = hR; Λ; (if r = v jump ; I)i R(r) 6= v c r i t i c a l R e g i o n (r1 : Tuple , r2 : h l o c k (α) i ˆα)
hH; T ; P i → hH; T ; {i : hR; Λ; Ii}i requires α {
r1 [ 0 ] := 10 −− u p d a t e t h e t u p l e
(R-branchF)
u n l o c k r2 −− r e l e a s e t h e l o c k
jump c o n t i n u a t i o n −− c o n t i n u e
}
Figure 8: Operational semantics (control flow)

Finally, the main code block creates a new (unlocked) lock,
allocates a tuple in the heap, and tries to acquire lock α by
transferring control to enterSpinLockRegion.
Memory instructions, Figure 7. The rule for malloc main ( ) {
allocates a new tuple in the heap, protected by a given lock, α , r2 := newLock 0 −− c r e a t e µ t e x l o c k
and makes register r point to it. The size of the tuple is that r0 := m a l l o c [ i n t ] ˆ α −− a l l o c a t e t u p l e
of sequence of types [~τ ], its values are uninitialized values. jump e n t e r S p i n L o c k R e g i o n −− t r y a c q u i r e l o c k
The rules for loading and storing are standard [14]. }
Control flow instructions, Figure 8. These transition

rules are mostly straightforward [14]. They rely on the eval 4.2 Mutual exclusion using threads
function that works on operands (that is, registers or values), We now discuss a distinct, yet classical, approach to lock ac-
by looking for values in registers. quisition. The idea is to avoid actively waiting for the lock,
by launching a thread that tries to acquire the lock. If it
(
R(v) if v is a register succeeds, control is transferred to the critical region. Oth-
R̂(v) = erwise, the thread forks another thread that tries to gather
v otherwise
the lock later, thus avoiding a busy wait.
The code and type the for criticalRegion is that of the previ-
4. INTERPROCESSOR COMMUNICATION ous example. The code for main is identical, apart from the
This section presents the main concepts of our language, last instruction that is replaced by jump enterSleepLockRegion.
based on classical examples taken from the literature on Op- We discuss how to acquire lock α by forking a new thread.
erating Systems [21]. The following examples illustrate the
usage of threads and the discipline of locks. To fork a thread we use instruction fork and specify the
label of the code block that should be run when a processor
is available. Upon thread starting, registers are loaded with
4.1 Mutual exclusion using busy waiting the values they had when the fork action happened. In the
We start with a code block that actively tries to acquire a present case, register r1 should contain the address of the
lock. This technique, called spin lock, is used when there is tuple and register r2 the lock to be acquired.
a reasonable expectation that the lock will be available in a
short period of time. When enterSleepLockRegion fails to acquire the lock, it creates
a new thread (that will try to obtain the lock later) and ter-
Let Tuple stands for type h int iˆα, the type of tuples with one minates the current one, rather than jumping to the begin-
integer component, protected by lock α. In order to read or ning of the code block, as it happens with enterSpinLockRegion.
to write values in a heap location of type Tuple, the thread
e n t e r S l e e p L o c k R e g i o n (r1 : Tuple , r2 : h l o c k (α) i ˆα) {
must hold lock α. In this example, we allocate a tuple, r3 := t e s t S e t L o c k r2 −− t r y t o a c q u i r e l o c k
acquire the lock, and store a value in the tuple (within a i f r3 = 0 jump c r i t i c a l R e g i o n −− i f so , . . .
critical region where we hold the lock). 1
In order to make code blocks reusable in different contexts,
The below code block actively tries to acquire lock α (sup- we need to abstract the type of the tuple. For the sake of
simplicity and clarity, in this paper we do not make use of
plied in register r2 ); on success it transfers control to a criti- existential types. The interested reader may refer to [13] for
cal region. The code block expects the (address of the) tuple examples illustrating continuation passing style using exis-
in register r1 . The tuple plays no rôle in the code block; it tential types.
types τ ::= int | h~σ iα | Γ requires Λ | Ψ; Γ; ∅ ` yield (T-yield)
lock(α) Ψ; Γ ` v : Γ requires Λ Ψ; Γ; Λ0 ` I
(T-fork)
init types σ ::= τ | ?τ Ψ; Γ; Λ ] Λ0 ` fork v; I
register file types Γ ::= r1 : τ1 , . . . , rn : τn Ψ, α : : Lock; Γ{r : hlock(α)iα }; Λ ` I α 6∈ Ψ, Γ, Λ
typing environment Ψ ::= ∅ | Ψ, l : τ | Ψ, α : : Lock Ψ; Γ; Λ ` α, r := newLock 0; I
(T-newLock 0)
Ψ, α : : Lock; Γ{r : hlock(α)iα }; Λ ] {α} ` I α 6∈ Ψ, Γ, Λ
Figure 9: Types
Ψ; Γ; Λ ` α, r := newLock 1; I
(T-newlock 1)
` hσ1 , . . . , τn , . . . , σn+m iα <: hσ1 , . . . , ?τn , . . . , σn+m iα Ψ; Γ ` v : hlock(α)iα Ψ; Γ{r : lock(α)}; Λ ` I α 6∈ Λ
(S-uninit) Ψ; Γ; Λ ` r := testSetLock v; I
(T-tsl)
` σ <: σ 0 ` σ 0 <: σ 00
` σ <: σ (S-ref, S-trans) Ψ; Γ ` v : hlock(α)iα Ψ; Γ; Λ ` I
` σ <: σ 00 (T-unlock)
` τ 0 <: τ Ψ; Γ; Λ ] {α} ` unlock v; I
Ψ ` n : int Ψ ` b : lock(α) Ψ `?τ : ?τ Ψ; Γ ` r : lock(α) Ψ; Γ ` v : Γ requires (Λ ] {α}) Ψ; Γ; Λ ` I
Ψ, l : τ 0 ` l : τ
(T-int,T-label,T-lock,T-uninit) Ψ; Γ; Λ ` if r = 0 jump v; I
(T-critical)
Ψ ` v: τ
Ψ; Γ ` r : Γ(r) (T-reg,T-val)
Ψ; Γ ` v : τ
Figure 11: Typing rules for instructions (thread pool
and locks) Ψ; Γ; Λ ` I
Figure 10: Typing rules for values Ψ ` v : τ and for
operands Ψ; Γ ` v : τ
f o r k e n t e r S l e e p L o c k R e g i o n −− e l s e , t r y l a t e r
all locks must have been released prior to ending the thread.
yield −− f r e e t h e p r o c e s s o r Only the thread that acquired a lock may release it (see rule
} T-unlock below); as such, allowing acquired locks to “die”
with the thread, may lead to deadlock situations (cf. [11]).
Conventional pthread mutex implementations maintain a Rule T-fork splits the permission into two sets, Λ and Λ0 :
queue of waiting threads, rather than repeatedly forking one goes with the forked thread, the other remains with the
threads. Section 7.3 sketches such an implementation. current thread. Such a scheme is crucial in the implementa-
tion of Hoare’s monitors [10], as described in Section 7.3.
5. TYPE DISCIPLINE
The syntax of types is described in Figure 9. A type of the The two rules for newLock assign a lock type to the register.
form h~σ iα describes a tuple in the heap protected by lock α. When the lock is created in the locked state, the singleton
Each type in ~σ is either initialized (τ ) or uninitialized (?τ ). type is added to the permission of the thread. Rule T-tsl
A type of the form Γ requires Λ describes a code block; a requires that the value under test holds a lock; disallowing
thread jumping into such a block must hold a register file testing a lock already held by the thread. Rule T-unlock
type Γ as well as the locks in Λ. The type lock(α) describes makes sure that only held locks are unlocked. Finally, the
a singleton lock type. jump-to-critical-region rule ensures that the current thread
holds the exact number of locks required by the target code
The type system is defined in Figures 10 to 13. block. The rule also adds the lock under test to the per-
mission of the thread. A thread is guaranteed to hold the
Typing values, Figure 10. Heap values are distinguished lock only after (conditionally) jumping to a critical region.
from operands (that include registers as well) by the form of A previous test and set lock instruction may have obtained
the sequent. Notice that lock values—0, 1—have any lock the lock, but as far as the type system goes, the thread holds
type. Also, an uninitialized value ?τ has a type ?τ ; we use the lock after a jump-to-critical-region instruction.
the same syntax for a uninitialized value (at the left of the
colon) and its type (at the right of the colon). A formula Typing memory and control instructions, Figure 12.
σ <: σ 0 allows to “forget” initializations. These rules are standard [14], except on what concerns the
locks. The rule for malloc makes sure the lock α is in scope,
Typing fork and lock instructions, Figure 11. Instruc- meaning that it must be preceded by a α, r := newLock b in-
tions are checked against a typing environment Ψ (mapping struction, in the same code block; Section 7.2 shows how to
labels to types, and type variables to the kind Lock: the overcome this limitation. The rules for load and store make
kind of singleton lock types), a register file type Γ holding sure that lock α is in the permission Λ of the thread. The
the current types of the registers, and a set Λ of lock vari- conditions regarding lock type lock( ) in rules T-malloc,
ables (the permission of the code block). T-load, and T-store ensure that locks are only created
using a newLock instruction, and manipulated with test and
Rule T-yield requires an empty permission meaning that set lock.
~ iα }; Λ ` I
Ψ, α : : Lock; Γ{r : h?τ ~τ 6= lock( ) ∀i.Ψ ` R(ri ) : Γ(ri )
(reg file, Ψ ` R : Γ )
Ψ, α : : Lock; Γ; Λ ` r := malloc [~τ ] guarded by α; I Ψ ` R: Γ
(T-malloc) ∀i.Ψ ` P (i) Ψ ` R: Γ Ψ; Γ; Λ ` I
Ψ; Γ ` v : hσ1 ..τn ..σn+m iα Ψ; Γ{r : τn }; Λ ` I Ψ`P Ψ ` hR; Λ; Ii
τn 6= lock( ) α∈Λ (processors, Ψ ` P )
Ψ; Γ; Λ ` r := v[n]; I ∀i.Ψ ` li : Γi requires Ψ ` Ri : Γi
(T-load) Ψ ` {hl1 , R1 i, . . . , hln , Rn i}
Ψ; Γ ` v : τn Ψ; Γ ` r : hσ1 ..σn ..σn+m iα τn 6= lock( ) (thread pool, Ψ ` T )
Ψ; Γ{r : hσ1 .. type(σn )..σn+m iα }; Λ ` I α ∈ Λ Ψ; Γ; Λ ` I ∀i.Ψ, α : : Lock ` vi : τi
Ψ; Γ; Λ ` r[n] := v; I Ψ ` Γ requires Λ {I} : τ Ψ, α : : Lock ` h~v iα : h~σ iα
(T-store) (heap value, Ψ ` h : τ )
Ψ; Γ ` v : τ Ψ; Γ{r : τ }; Λ ` I ∀l.Ψ ` H(l) : Ψ(l)
(T-move) (heap, Ψ ` H )
Ψ; Γ; Λ ` r := v; I Ψ`H
Ψ; Γ ` r0 : int Ψ; Γ ` v : int Ψ; Γ{r : int}; Λ ` I Ψ`H Ψ`T Ψ`P
` halt (state, ` S )
Ψ; Γ; Λ ` r := r0 + v; I ` hH; T ; P i
(T-arith)
Ψ; Γ ` r : int Ψ; Γ ` v : Γ requires Λ Ψ; Γ; Λ ` I Figure 13: Typing rules for machine states
Ψ; Γ; Λ ` if r = 0 jump v; I
(T-branch)
Ψ; Γ ` v : Γ requires Λ 4. Unlock only in possession of the lock. If ι is unlock v
(T-jump)
Ψ; Γ; Λ ` jump v and H(R̂(v)) = h iα , then α ∈ Λ.
where type(τ ) = type(?τ ) = τ . 5. Releasing the processor only without held locks. If ι is
yield, then Λ = ∅.
Figure 12: Typing rules for instructions (memory
and control flow) Ψ; Γ; Λ ` I
For races we follow Flanagan and Abadi [6]. We start by
defining the set of permissions of a machine state, by joining
the permissions of the processes with those of the threads in
Typing machine states, Figure 13. The rules should be the run pool, and with the set of unlocked locks in the heap.
easy to understand. The only remark goes to heap tuples, Remember that a permission is a set of locks, denoted by Λ.
where we make sure that all locks protecting the tuples are
in the domain of the typing environment.
Definition 1 (State permissions.).
The main result of this section follows.
LP ={Λ | P (i) = h ; Λ; i}
LT ={Λ | hl, i ∈ T and H(l) = requires Λ { }}
Theorem 1 (Subject Reduction). If ` S and S → LH ={{α | H(l) = h0iα }}
S 0 , then ` S 0 .
LhH;T ;P i =LP ∪ LT ∪ LH
L
Lhalt =22
6. TYPE SAFETY
We split the results in three categories: the standard “well-
typed machines do not get stuck” (which we omit alto-
State permissions do not shrink with reduction. The proof (a
gether), the lock discipline, and races. The lock discipline is
case analysis on the various reduction rules) shows that state
embodied in the following Lemma (cf. Figure 2).
permissions grow only due to the execution of a newLock
instruction.
Lemma 2 (Lock Discipline). Let ` H : Ψ and Ψ `
hR; Λ; (ι; )i. Lemma 3. If S → S 0 , then LS ⊆ LS 0 .
1. Before lock creation, α is not a known lock. If ι is

We are interested only in mutual exclusive states, that is,
α, := newLock , then α 6∈ dom(Ψ).
states whose permissions do not “overlap.” Also, we say,
2. Before test and set lock, the processor does not hold that a state has a race condition if it contains two processors
the lock. If ι is := testSetLock v and H(R̂(v)) = trying to access the heap at the same location.
hlock(α)i , then α 6∈ Λ.
3. Before accessing the heap, the processor holds the lock. Definition 2. Mutual exclusive states. halt is mutual
If ι is v[ ] := or := v[ ], and H(R̂(v)) = h iα , then exclusive; S 6= halt is mutual exclusive when i 6= j im-
α ∈ Λ. plies Λi ∩ Λj = ∅, for all Λi , Λj ∈ LS .
types τ ::= . . . | µX.τ | X Extended syntax
Figure 14: Extending the type system with recursive values v ::= . . . | v[~
α]
types types τ ::= . . . | ∀[~
α].(Γ requires Λ) | . . .
Additional rules
Accessing the heap. A processor of the form hR; ; (ι; )i
accesses heap H at location l, if ι is of the form v[ ] := α].(Γ0 requires Λ)
Ψ; Γ ` v : ∀[~
(T-val-app)
or of the form := v[ ], and l = H(R̂(v)). ~ : ∀[].(Γ0 [β/~
Ψ; Γ ` v[β] ~ α] requires Λ[β/~
~ α])
Race condition. A state S has a race condition if S = Changed rules

hH; ; P i and there exist i and j distinct such that P (i)
and P (j) both access heap H at some location l. Ψ; Γ ` v : ∀[].(Γ requires Λ)
(T-jump)
Ψ; Γ; Λ ` jump v
Ψ, α~ : : Lock; Γ; Λ ` I
(heap value)
Theorem 4 (Types against races). If ` S and S is Ψ ` Γ requires Λ {I} : ∀[~α].(Γ requires Λ)
mutual exclusive, then S does not have a race condition.
Rules T-fork, T-branch, and T-critical undergo
changes similar to rule T-jump.
7. EXTENSIONS
In this section we discuss the extensions to the language
required to implement Hoare’s monitors [10]. Figure 15: Extending the simple type system with
universal types
7.1 Recursive types
Condition variables in monitors are represented by queues.
Queues are usually represented by lists, where each node The extension follows Morrisett et al. [14], and is described
refers to the next node in the list, yielding a recursive data in Figure 15, where we omit the obvious syntactic adjust-
structure. ment to the reduction rules. The interesting facts to notice
is that when forking or jumping to a code block (e.g. rule T-
Extending the language to include recursive types is jump) all lock variables must have been instantiated using
straightforward. Figure 14 summarizes the changes to the value application v[~α]; and that, when typing a code block
syntax. The µ operator is a binder, giving rise, in the (rule heap value), the abstracted lock types α~ are added to
standard way, to notions of bound and free variables and the typing environment Ψ, and may then be used to protect
alpha-equivalence. We do not distinguish between alpha- heap tuples (cf. rule R-malloc in Figure 12).
convertible types. Furthermore, we take an equi-recursive
view of types [18], not distinguishing between a type µX.τ
and its unfolding τ [µX.τ /X].2 7.3 Hoare’s Monitors
We focus on the implementation of conditions, in particu-
7.2 Polymorphism over lock types lar the primitives for creation of a new condition, wait , and
We would like to protect the queue, and all the nodes in it signal .
with the monitor’s lock. The problem is that scope of a lock
ranges from its creation to the end of the code block where A condition is represented as an (initially empty) queue of
it was created, disallowing manipulating the condition (for closures representing the threads that are currently wait-
example with the wait and signal primitives) in distinct code ing on the condition. To simplify the example, we use the
blocks. code address only, omitting the environment. The queue
is protected by the monitor’s lock, which we call m. Type
Polymorphism over lock types allows writing code blocks Condition is h int , hlock(m)iˆm, Node, Nodeiˆm, where the first
parametric on the locks they require, an in particular makes component is the size of the queue, thereafter is the moni-
it possible to protect a heap tuple with an abstracted lock. tor’s lock, and finally two references for the head and tail of
This is particularly useful for algorithms that prefer to pro- the queue. Each node is a pair formed by a Code (that we
tect all nodes of a list with the same lock, rather than using leave unspecified) and a reference for the next node in the
a different lock for each node, as we show in the next sec- queue, yielding the type µ x. hCode, Xiˆm.
tion. This extension makes it possible to protect new heap
tuples with a lock created in a different code block. We adopt a Continuation-Passing Style [1], meaning that
code blocks are passed the address of the continuation a
2
For recursive code blocks, Morrisett et al. [14] introduced a register. The following code block accepts in register r1 the
mechanism allowing to “forget” register types when entering monitor’s lock, in register r2 the continuation, and requires
a code block. Using this technique and by carefully choosing
the register to hold the continuation address one may avoid that the thread holds lock m. It then creates a new condi-
using recursive types for recursive procedures. Recursive tion variable and passes it to the continuation in register r3 .
types, however, come into play in the presence of recursive Notice that the code block is parametric in lock m, allow-
datatypes. ing to protect the queue’s descriptor and sentinel node with
lock m. unlockAndGo ∀ [m] (r1 : C o n d i t i o n , r2 : Code )
requires m {
n e w C o n d i t i o n ∀ [m ] . ( r1 : h l o c k (m) i ˆm, r2 : Code ) u n l o c k r1 [ 1 ]
requires m { jump r2
−− a l l o c a t e t h e s e n t i n e l }
r4 := m a l l o c [ C l o s u r e , Node ] g u a r d e d by m
−− a l l o c a t e t h e queue h e a d e r
r3 := m a l l o c [ i n t , h l o c k (m) i ˆm, Node , Node ]
g u a r d e d by m 7.4 Existential quantification over lock types
r3 [ 0 ] := 0 ; r3 [ 1 ] := r1 −− s t o r e s i z e and l o c k The introduction of universal types over locks allows for the
r3 [ 2 ] := r4 ; r3 [ 3 ] := r4 −− s t o r e head and t a i l construction of intricate data structures where part of the
jump r2 −− jump t o c o n t i n u a t i o n nodes may be protected by different locks, while others may
}
share locks. However, this extra facility in lock manipula-
tion is useful as long as locks are passed around between
The wait operation is issued from inside a monitor (repre- code blocks, when jumping or forking. Unfortunately, the
sented by requiring lock m in the code block type for the op- technique becomes impracticable if we need to propagate
eration) and causes the calling program to be delayed until a locks through successive code blocks in order to recover them
signal operation occurs. Waiting on a condition amounts to later, specially if the intermediate code blocks do not use the
enqueue the continuation of the wait operation (represented locks. An alternative is to store locks in the heap and re-
by the Code type), needed when the signaling operation oc- cover them later, but the language offers no facility to store
curs. The condition is passed in register r1 and the continu- and recover singleton types from the heap.
ation in register r2 . Apart from queue manipulation details,
it is important to notice that the newly created node is pro- For instance, in the monitors example, storing the singleton
tected by lock m, that the lock is released, allowing other type m in the queue itself allows different code blocks to
monitor procedures to execute, and that the current thread retrieve the lock type and enqueue a node. In fact, this code
terminates. block does not care which singleton lock type protects the
queue, it just requires that there exists such a lock. This
w a i t ∀ [m] (r1 : C o n d i t i o n , r2 : Code ) r e q u i r e s m {
−− a l l o c a t e a new s e n t i n e l
example motivates the existential quantification over lock
r3 := m a l l o c [ C l o s u r e , Node ] g u a r d e d by m types It is straightforward to incorporate existential types
−− u p d a t e t h e c u r r e n t s e n t i n e l in our type system, by following Flanagan and Abadi [6].
r4 := r1 [ 3 ] −− g e t t h e c u r r e n t s e n t i n e l Actually, we need distinct primitives to deal with existential
r4 [ 0 ] := r2 −− f i l l t h e c o n t i n u a t i o n lock types and conventional existential types (for instance
r4 [ 1 ] := r3 −− f i l l t h e s e n t i n e l to be use with CPS). The reason is that after unpacking a
r1 [ 3 ] := r3 −− s t o r e t h e new s e n t i n e l
−− i n c r e m e n t queue s i z e
value, its witness type remains abstracted, so that we can
r3 := r1 [ 0 ] ; r3 := r3+1; r1 [ 0 ] := r3 not discriminate packed lock types from other types, and
−− r e l e a s e m o n i t o r l o c k and t e r m i n a t e t h r e a d hence can not use lock operations on the witness type.
r3 := r1 [ 2 ] ; u n l o c k r3
yield
}
8. CONCLUSION AND FURTHER WORK
We presented a typed assembly language suited for an archi-
tecture where multiple CPU cores share a common memory.
A signal operation, also issued from inside a monitor, causes The language primitively includes instructions for handling
exactly one of the waiting programs to resume immediately. locks (create, acquire, and release) and for forking threads.
A signal operation must be followed immediately by resump- We provide a type system that verifies the usage of labels,
tion of a waiting program, without possibility of an interven- values, and registers according to the declared types and en-
ing procedure call from yet a third program [10]. The code sures a discipline on lock usage. Further results include a
block for signal receives the condition in register r1 and the guarantee that well-typed states do not “get stucked” and
continuation in register r2 . If the queue is empty, the signal do not incur in race conditions.
has no effect. Otherwise, queue manipulation details apart,
the key feature to focus our attention is in the fork r3 in- A compiler prototype can be found in URL [13], together
struction, where we crucially make use of the splitting lock with several examples, including extended versions of those
mechanism when launching a new thread. In this way we found in this paper. During the course of program devel-
are able to pass the lock to the new thread without unlock- opment with our language, we have faced several difficulties
ing it first. After the fork instruction the current thread no that should be addressed:
longer has lock m.
s i g n a l ∀ [m] (r1 : C o n d i t i o n , r2 : Code ) r e q u i r e s m {
Exclusive-lock and shared-lock. It would be desirable
r3 := r3 [ 0 ]
i f r3 = 0 jump unlockAndGo to have two levels of locking: exclusive locking does
−− d e q u e u e not allow for any readings; shared-locking permit mul-
r3 := r1 [ 2 ] ; r4 := r3 [ 2 ] ; r1 [ 2 ] := r4 tiple readers. We think it could be possible to incorpo-
−− d e c r e m e n t queue s i z e rate this requirements using two set of held locks (held
r4 := r1 [ 0 ] ; r4 := r4 − 1 ; r1 [ 0 ] := r4 exclusive and held shared) in the type system, adjust
r3 := r3 [ 0 ] −− r e s t o r e t h e w a i t i n g t h r e a d
conveniently rules T-load and T-store in Figure 12,
f o r k r3 −− l a u n c h a t h r e a d i n h e r i t i n g m!
jump r2 −− jump c o n t i n u a t i o n ( no l o c k m) having two test and set lock primitives (for exclusive
} and shared locking), and using three-valued locks in
the operation semantics.
Avoid protecting every tuple in the heap. There are a [12] C. Laneve. A type system for JVM threads. Theor.
number of situations where tuples do not require lock- Comput. Sci., 290(1):741–778, 2003.
ing (for example, closures in the Continuation Passing
Style [1]). A possible approach is to treat closures as [13] Mil: A multithreaded typed assembly language.
some sort of linear types and try to follow the ideas http://labmol.di.fc.ul.pt/mil/.
expressed in [20]. Another possible direction is owner- [14] G. Morrisett. Typed assembly language. In Advanced
ship types, where owned objects are syntactically dis- Topics in Types and Programming Languages. MIT
tinguished [7, 3, 4]. press, 2005.
[15] G. Morrisett, K. Crary, N. Glew, D. Grossman,
Acknowledgments. The authors would like to thank the R. Samuels, F. Smith, D. Walker, S. Weirich, and
anonymous referees for valuable comments and pointers. S. Zdancewic. Talx86: A realistic typed assembly
language. In Second Workshop on Compiler Support
9. REFERENCES for Systems Software, Atlanta, May 1999.
[1] A. W. Appel. Compiling with continuations.
[16] G. Morrisett, D. Walker, K. Crary, and N. Glew. From
Cambridge University Press, 1992.
System F to typed assembly language. ACM
[2] A. Birrell. An introduction to programming with Transactions on Programming Languages and
threads. Technical Report 35, Digital Systems Systems, 21(3):527–568, 1999.
Research Center, Palo Alto, California, 1989. [17] K. Olukotun and L. Hammond. The future of
[3] C. Boyapati, R. Lee, and M. Rinard. A type system microprocessors. ACM Queue, 3(7):26–34, 2005.
for preventing data races and deadlocks in java [18] B. C. Pierce. Types and Programming Languages. MIT
programs. In ACM Conference on Object-Oriented Press, 2002.
Programming, Systems, Languages and Applications,
pages 211–230, 2002. [19] N. Ramsey and S. P. Jones. Featherweight
concurrency in a portable assembly language, 2000.
[4] C. Boyapati and M. Rinard. A parameterized type
system for race-free Java programs. In 16th Annual [20] F. Smith, D. Walker, and J. G. Morrisett. Alias types.
Conference on Object-Oriented Programming, In ESOP ’00: Proceedings of the 9th European
Systems, Languages, and Applications (OOPSLA), Symposium on Programming Languages and Systems,
2001. LNCS, pages 366–381. Springer-Verlag, 2000.
[5] C. Flanagan and M. Abadi. Object types against [21] A. S. Tanenbaum. Modern Operating Systems.
races. In International Conference on Concurrency Prentice Hall, 2001.
Theory, volume 1664 of LNCS, pages 288–303.
Springer-Verlag, 1999.
[6] C. Flanagan and M. Abadi. Types for safe locking. In

European Symposium on Programming, volume 1576
of LNCS, pages 91–108. Springer-Verlag, 1999.
[7] C. Flanagan and S. N. Freund. Type-based race

detection for Java. ACM SIGPLAN Notices,
35(5):219–232, 2000.
[8] C. Flanagan and S. N. Freund. Type inference against

races. In International Conference on Concurrency
Theory, volume 3148 of LNCS, pages 116–132.
Springer-Verlag, 2004.
[9] D. Grossman. Type-safe multithreading in Cyclone. In

Workshop on Types in Language Design and
Implementation, volume 38(3) of SIGPLAN Notices,
pages 13–25. ACM Press, 2003.
[10] C. A. R. Hoare. Monitors: an operating system

structuring concept. Commun. ACM, 17(10):549–557,
1974.
[11] F. Iwama and N. Kobayashi. A new type system for

JVM lock primitives. In ASIA-PEPM ’02: Proceedings
of the ASIAN symposium on Partial evaluation and
semantics-based program manipulation, pages 71–82,
New York, NY, USA, 2002. ACM Press.

A Multithreaded Typed Assembly Language

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Multithreaded Typed Assembly Language

Uploaded by

Copyright:

Available Formats

A Multithreaded Typed Assembly Language∗

Vasco T. Vasconcelos Francisco Martins

CPU core 1 CPU core N α, r1 := newLock 0 α, r1 := newLock 1

registers registers α!∈Λ α∈Λ

instruction instruction α∈Λ α∈Λ

cache cache (2) r3 := testSetLock r1 (3) r2 [0] := 7 (4) (5)

Figure 2: The Lock discipline.

e n t e r S p i n L o c k R e g i o n (r0 : Tuple , r1 : ! l o c k (α) " ˆα ; ) {

Figure 8: Operational semantics (control flow)

Control flow instructions, Figure 8. These transition

1. Before lock creation, α is not a known lock. If ι is

Race condition. A state S has a race condition if S = Changed rules

[6] C. Flanagan and M. Abadi. Types for safe locking. In

[7] C. Flanagan and S. N. Freund. Type-based race

[8] C. Flanagan and S. N. Freund. Type inference against

[9] D. Grossman. Type-safe multithreading in Cyclone. In

[10] C. A. R. Hoare. Monitors: an operating system

[11] F. Iwama and N. Kobayashi. A new type system for

You might also like