
Using linearity to allow heap-recycling in Haskell

Chris Nicholls

May 24, 2010


Abstract
This project investigates destructive updates and heap-space recycling in
Haskell through the use of linear types. I provide a semantics for an extension
to the STG language, an intermediate language used in the Glasgow Haskell
Compiler (GHC), that allows arbitrary data types to be updated in place.
A type system based on uniqueness typing is also introduced that allows
the use of the new semantics without breaking referential transparency. The
type system aims to be simple and syntactically light, allowing a programmer
to introduce destructive updates with minimal changes to source code.
I have implemented this semantic extension both in an interpreter for
the STG language and in the GHC backend. Finally, I have written a type
checker for this system that works over a subset of Haskell.

Contents

Abstract

1 Introduction
    The Problem With Persistence

2 Uniqueness

3 Uniqueness in Type Systems
    Linear Logic
    Clean
    Monads
    Hage & Holdermans’ Heap Recycling for Lazy Languages
    Uniqueness in Imperative Languages
    A Simpler Type System for Unique Values

4 Implementation
    The STG Language
    Operational Semantics of STG
    Closure Representation
    Adding an Overwrite construct
        Ministg
        GHC
    Garbage Collection

5 Results

6 Conclusion

Chapter 1

Introduction

The Problem With Persistence


One striking feature of pure functional programming in languages such as
Haskell is the lack of state. As all data structures are persistent, updating
a value does not destroy it but instead creates a new copy. The advantages
of this are well known [1][2], but so, conversely, are the disadvantages [3][5].
In particular, persistence can lead to excessive memory consumption when
structures remain in memory long after they have ceased to be useful [6].
Haskell disallows state in order to avoid side effects, and side effects are
avoided because they can make understanding and reasoning about programs
difficult. Indeed, from a theoretical point of view, side effects simply aren’t
required for computation. Yet side effects are undeniably useful, particularly
when implementing efficient data structures [4].
Whilst the lack of destructive update in Haskell helps accomplish the goal
of referential transparency, it is not strictly necessary: it is sometimes
possible to allow destructive updates without introducing observable side
effects.

Chapter 2

Uniqueness

Imagine a program that reads a list of integers from a file, sorts them and
then continues to process the sorted list in some manner. In an imperative
setting, we might expect this sorting to be done in-place, but in Haskell we
must allocate the space for a new, sorted version of the list. However, if the
original list is not referred to in the rest of the program, then any changes
made to the data contained in the list will never be observed. Thus there
is no need to maintain the original list. This means we could re-use the
space occupied by the unsorted list, and since we know that sorting preserves
length, we might begin to wonder if we can do the sorting in-place.
The reason we could not use destructive updates in the example above is
that doing so may introduce side effects into our program. For instance, if we
are able to sort a list in-place then the following code becomes problematic:

foo :: [a ] → ([a ], [a ])
foo xs = (xs, sortInPlace xs)

Does fst (foo [3, 2, 1 ]) refer to a sorted list or an unsorted list? With
lazy evaluation we have no way of knowing.
Notice however that modifying the original, unsorted list is only a prob-
lem if it is referred to again elsewhere in the program. If the list is not used
anywhere else, then there can be no observable side effects of updating it
in place, as any data that cannot be referenced again can have no semantic
effect on the rest of the program. In that case the compiler would be free
to re-use the space previously taken up by the list, perhaps updating the
data structure in-place, and referential transparency would not be broken.
This condition, that there is only ever one reference to the list, is known as
uniqueness — we say that the list is unique.
Consider an algorithm that inserts an element into a binary tree (figure 2.1).
In an imperative language this would normally involve walking the tree until
we find the correct place to insert the element and updating the node at that
position. In a functional language, however, we must instead copy all the
nodes above the one to be updated and create a new binary tree. If the
original tree was unique, that is, the only reference to a was passed to the
function that inserted m, then there will no longer be any references to
a. Consequently, there will no longer be any references to nodes c or g
either. All three nodes will be wasting space in memory. If a larger number
of nodes are inserted then it is possible that the space wasted will be many
times greater than the space taken up by the tree itself!

[Figure 2.1: Inserting an element into a binary tree. (a) A binary tree; an
element is to be inserted in the outlined position. (b) After insertion, a new
tree has been created from the old one.]
In general it is not possible to predict when an object in a Haskell program
will become garbage, so garbage collection must be a dynamic, run-time
process, and because it happens at run-time there is a performance penalty
associated with it. Indeed, whilst garbage collection can be very efficient
when large amounts of memory are available [8], it can take up a non-trivial
percentage of a program’s execution time in memory-constrained
environments. But when an object is known always to be unique, its lifetime
can be determined statically and so the run-time cost of garbage collection
can be avoided.

Chapter 3

Uniqueness in Type Systems

Linear Logic
Linear Logic is a system of logic proposed by Jean-Yves Girard in which
each assumption must be used exactly once. Wadler noticed that in the
context of programming languages, linear logic corresponds to:

• No duplication. No value is shared so, as we have seen, destructive
update is permissible.

• No discarding. Each value is used exactly once. This use represents
an explicit deallocation, so no garbage collection is required.

Wadler proposed a linear type system based directly on Girard’s logic [10]
[11]. In this type system every value is labelled as being either linear or
nonlinear. Functions are then typed to accept either linear or nonlinear
arguments.
In [7] David Wakeling and Colin Runciman describe an implementation
of a variant of Lazy ML that incorporates linear types. Their results are
disappointing: the performance of programs using linear data structures is
generally much worse than without, as the cost of maintaining linearity
easily outweighs the benefit of a reduced need for garbage collection.
Along similar lines, Henry Baker provides an implementation of linear
Lisp [17] that restricts every type to being linear. The result is an imple-
mentation of Lisp that requires no run-time memory management. This
comes at a price, however: Baker found that much of the work must instead
be done by the programmer and, as with linear Lazy ML, the large amounts
of book-keeping and explicit copying mean that linear Lisp is slightly slower
than its classical counterpart.

Clean
Clean [23] is a language very similar to Haskell that features a uniqueness
type system based on linear logic. Clean allows users to specify particular
variables as being unique. The type system exposed to the user is large,
and often simple functions can have complex types. However, the de-facto
implementation has proved to be very efficient.
One particularly interesting feature of Clean is that the state of the world
is explicit. Every Clean program passes around a unique object, the world,
which represents the state of the system and is explicitly threaded throughout
the program; destructive updates to the world can thus be used to sequence
IO operations. Unique objects cannot be duplicated, so no more than one
world can exist at a time and hence there is no danger of referring to an old
state by accident.

Monads
Haskell takes a different approach towards IO. Monads, as presented by
Wadler and Peyton-Jones [27], can do much of the work of uniqueness typing
through encapsulation, and they are much simpler in terms of both syntax
and type system. However, monads do not solve every problem as elegantly.
Suppose we have a program that makes use of a binary tree:

data BinTree a = Empty | Node a (BinTree a) (BinTree a)


insert :: a → BinTree a → BinTree a
removeMin :: BinTree a → (a, BinTree a)
isEmpty :: BinTree a → Bool

If we want to allow the tree to be updated destructively we can employ
the ST monad, replacing each branch by a mutable reference, an STRef.
However, as STRefs require a state parameter, we must also add a type
parameter to our binary trees.

data BinTree s a =
    Empty
  | Node a (STRef s (BinTree s a)) (STRef s (BinTree s a))

Unfortunately, none of the code we have written to work over binary
trees will work any more! Not only are the type signatures incorrect, but the
whole implementation must be re-written to work within the state monad.

insert :: a → BinTree s a → ST s (BinTree s a)
removeMin :: BinTree s a → ST s (a, BinTree s a)
...
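
For instance, insert might now look something like the following. This is a
sketch only, assuming the BinTree definition above and an Ord constraint to
decide where the element belongs; a real rewrite would have to make the same
choices for removeMin and the rest of the interface.

import Control.Monad.ST
import Data.STRef

insert :: Ord a ⇒ a → BinTree s a → ST s (BinTree s a)
insert x Empty = do
  l ← newSTRef Empty
  r ← newSTRef Empty
  return (Node x l r)
insert x t@(Node y l r ) = do
  let ref = if x < y then l else r -- choose the child reference to descend into
  child ← readSTRef ref
  child' ← insert x child
  writeSTRef ref child' -- destructive update of the branch
  return t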

Monadic code can often differ significantly in style from idiomatic functional
code, so this may end up affecting large portions of our program. This can
clearly cause problems if we were trying to optimise a large program in which
the binary tree implementation had been identified as a bottleneck.

Hage & Holdermans’ Heap Recycling for Lazy Languages
As a way of avoiding this ‘monad creep’, Hage and Holdermans present a lan-
guage construct to allow destructive updates for unique values in nonstrict,
pure functional languages [19]. Their solution makes use of an embedded
destructive assignment operator and user-controlled annotations to indicate
which closures can be re-used. They describe a type system that restricts the
use of this operator and prove that referential transparency is maintained.
Hage and Holdermans do not provide an implementation of either the
type system or the destructive assignment operator. They also express
concern about the complexity of the type system exposed to the user,
despite it being simpler than the system used in Clean. This is the issue
addressed in the next section.

Uniqueness in Imperative Languages

The initial motivation for this project came not from linear logic but from
imagining an imperative language that maintained a form of referential
transparency.
This language has two kinds of variables, consumable and immutable.
Each function then accepts two sets of variables: one set is the set of variables
that the function consumes, the other is the set of variables that it views.
During execution, a function f is said to own a variable x if and only if:

• the variable x was created inside the body of f (either from a closed
term or a literal), or x was passed to f as a consumable variable;

• f has not passed x as a consumable variable to any other function.

Each function is restricted so that the only variables it can modify or
return are the variables it owns. One further restriction is that when a
variable is passed in to a function as a consumable variable, it is removed
from the current scope (this means it cannot be used as another argument to
the same function). Thus, any variable passed into a function as a viewed
argument will not change during the execution of that function, and any
variable passed in as a consumed argument cannot be referred to again, so
destructively updating it will not cause side effects.

As an example, here is an implementation of quicksort in this theoretical
language:

qsort (consumed xs :: [Int ]) → [Int ] = {
  return sort (xs, Nil );
}
sort (consumed xs :: [Int ], end :: [Int ]) → [Int ] = {
  case xs of
    Nil → return end ;
    Cons (x , xs' ) → {
      ys, zs := split (x , xs' );
      zs' := sort (zs, end );
      return sort (ys, Cons (x , zs' ));
    }
}
split (viewed p :: Int; consumed xs :: [Int ]) → ([Int ], [Int ]) = {
  case xs of
    Nil → return ([ ], [ ]);
    Cons (x , xs' ) → {
      ys, zs := split (p, xs' );
      if x > p then :
        return (ys, Cons (x , zs));
      else :
        return (Cons (x , ys), zs);
    }
}

In the body of sort, xs will be out of scope after the case expression,
and after the line

split (x , xs' )

xs' will be out of scope but x will remain in scope, since split consumes its
second argument but only views its first.
These rules ensure that at any point in the program’s execution, if x is
consumable in the current environment then there is no more than a single
reference to it. Conversely, if there is more than one reference to x then x
must be immutable.
A sufficiently smart compiler would be able to tell that in each case
expression, the list under scrutiny is never referred to again; only its elements
are. Thus, in the case that the list was a Cons cell, the cell can be re-used
when a Cons cell is created later on. In this way, the function sort can avoid
allocating any new cells and instead operate in-place.

A Simpler Type System for Unique Values
We can translate this idea of consumable variables into Haskell. Below is
the code for quicksort written in a version of Haskell extended to include
this idea.

qsort :: [Int ] ; [Int ]
qsort xs = sort xs [ ]

sort :: [Int ] ; [Int ] ; [Int ]
sort [ ] end = end
sort (x : xs) end = sort ys (x : sort zs end )
  where
    (ys, zs) = split x xs

split :: Int → [Int ] ; ([Int ], [Int ])
split p [ ] = ([ ], [ ])
split p (x : xs) = case p > x of
    True → (x : ys, zs)
    False → (ys, x : zs)
  where
    (ys, zs) = split p xs

This is deliberately very close to standard Haskell, with one addition: a
new form of arrow has been introduced to the syntax of types. The intended
meaning of

f :: a ; b
g :: a → b

is that f consumes a variable of type a and produces a b; thus the body
of f is free to modify its argument. By comparison, g is a standard Haskell
function that only views its argument. Intuitively, the new arrow form obeys
the following rules:

• Only unique variables and closed terms may be used as an argument
to a function expecting a unique value;

• The result of applying a function of type (a ; b) to a unique value of
type a will be a unique value of type b;

• A unique variable may be used at most once in the body of a function;

• Data structures are unique all the way down, i.e. a function
(f :: [a ] ; [a ]) works over a unique list whose elements are also unique.

map :: (a ; b) → [a ] ; [b ]
map f [ ] = [ ]
map f (x : xs) = f x : map f xs
-- map takes a unique list and updates it in place.
-- Notice the function f is not unique itself, as it is used twice
-- on the right hand side.

id :: x ; x
id x = x

compose :: (b ; c) → (a ; b) → a ; c
compose f g x = f (g x )

double1 :: a ; (a, a)
double1 a = (a, a) -- error: unique variable ‘a‘ is used twice

double2 :: a → (a, a)
double2 a = (a, a)

apply1 :: (a → b) → a ; b
apply1 f x = f x -- error: result of applying f to x will not be unique

apply2 :: (a ; b) → a → b
apply2 f x = f x -- error: f expects a unique argument, x is not unique

twice :: (a ; a) → a ; a
twice f = compose f f

fold :: (b ; a ; a) → a ; [b ] ; a
fold f e [ ] = e
fold f e (x : xs) = fold f (f x e) xs

f1 :: a ; (a → b) → b
f1 x g = g x

f2 :: a → (a ; b) → b
f2 x g = g x -- error: g expects a unique argument, x is not unique

f3 :: a → (a → b) ; b
f3 x g = g x -- error: the result of applying g to x will not be unique
-- A unique variable may be passed to an argument expecting a
-- non-unique variable, but not the other way round.
-- Note that in f1, the type signature is implicitly bracketed like this:
-- f1 :: a ; ((a → b) → b)
-- so the result of a partial application would be a function that is
-- itself unique.

Figure 3.1: Some examples of functions with possible type signatures and
type errors.

Semantically, this can be viewed in terms of the system proposed by
Hage and Holdermans, equivalent to

f :: a¹ →¹ b¹
g :: aω →ω bω

Many functions can be converted to use this type system without needing
to alter their definition at all. For instance, a function that reverses a list
in-place can be constructed simply by altering the type signature of the
standard Haskell function reverse.

reverse :: [a ] ; [a ]
reverse = rev [ ]
  where
    rev :: [a ] ; [a ] ; [a ]
    rev xs [ ] = xs
    rev xs (y : ys) = rev (y : xs) ys

There is a significant drawback to this system: there is more than one
possible way to assign a type to some fragments of code. If we want to use
both in-place reverse and regular reverse, then we must create two separate
functions that differ only by name and type signature, as shown below.
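
For instance, with reverseU as a hypothetical name for the in-place variant,
we would need two declarations whose definitions are identical:

reverse :: [a ] → [a ] -- ordinary, persistent reverse
reverseU :: [a ] ; [a ] -- in-place variant, same definition as reverse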
I have implemented a typechecker for this system over a subset of Haskell.
Due to time constraints, and the complexity of GHC’s type system resulting
from the vast number of type system extensions already present, the new
type system has not been integrated into GHC. Despite this, the backend
mechanisms to allow closure-recycling are fully functional: the example
above will compile and run, sorting the list in-place, although it will not be
typechecked by GHC.

Chapter 4

Implementation

I have implemented the backend mechanisms for dealing with overwriting
as an extension to the Glasgow Haskell Compiler. This section includes
just enough detail about the inner workings of the compiler to explain this
extension.
There are several main stages in the compilation pipeline:

• The Front End contains the parser and the type checker.

• The Desugarer converts from the abstract syntax of Haskell into the
tiny intermediate Core-language.

• A set of Core-to-Core optimisations and other transformations.

• Translation into the STG language.

• Code generation.

This chapter deals with the details of the final two phases.

The STG Language
The STG language is a small, non-strict functional language used inter-
nally by GHC as an intermediate language before imperative code is output.
Along with a formal denotational semantics [26], the STG language also has
a full operational semantics, with a clear and simple meaning for each
language construct:

Construct                  Operational meaning
Function application       Tail call
Let expression             Heap allocation
Case expression            Evaluation
Constructor application    Return to continuation

There are also several properties of STG code that are of interest:

• Every argument to a function or data constructor is a simple variable
or constant. Operationally, this means that arguments to functions are
prepared (either by evaluating them or constructing a closure) prior
to the call.

• All constructors and built-in operations are saturated. This cannot be
guaranteed for every function, since Haskell is a higher order language
and the arity of functions is not necessarily known, but it simplifies the
operational semantics. Functions of known arity can be eta-expanded
to ensure saturation (see the sketch after this list).

• Pattern matching and evaluation is only ever performed via case ex-
pressions, and each case expression matches one-level patterns.

• Each closure has an associated update flag π. These are explained
further below.

• Bindings in the STG language carry with them a list of free variables.
This has no semantic effect but is useful for code generation.
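
To illustrate the point about saturation, here is a sketch (not GHC output)
of eta-expanding a definition of known arity so that the underlying call to
map is always saturated:

inc :: [Int ] → [Int ]
inc = map (+1) -- unsaturated: map has arity 2 but is given one argument

inc' :: [Int ] → [Int ]
inc' xs = map (+1) xs -- eta-expanded: map is now always given both arguments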

Program          prog    → binds

Bindings         binds   → var1 = lf1 ; ... ; varn = lfn

Lambda-forms     lf      → varsf λπ varsa -> expr

Update flag      π       → u    Updatable
                         | n    Not updatable

Expression       expr    → let binds in expr      Local definition
                         | letrec binds in expr   Local recursion
                         | case expr of alts      Case statements
                         | var atoms              Application
                         | constr atoms           Saturated constructor
                         | prim atoms             Saturated built-in op
                         | literal

Alternatives     alts    → aalt1 ; ... ; aaltn ; default   n ≥ 0   (Algebraic)
                         | palt1 ; ... ; paltn ; default   n ≥ 0   (Primitive)

Algebraic alt    aalt    → constr vars -> expr
Primitive alt    palt    → literal -> expr
Default alt      default → var -> expr

Literals         literal → 0# | 1# | ...             Primitive integers

Primitive ops    prim    → +# | -# | *# | /# | ...   Primitive integer ops

Variable lists   vars    → {var1 , ..., varn }       n ≥ 0
Atom lists       atoms   → {atom1 , ..., atomn }     n ≥ 0
                 atom    → var | literal

Figure 4.1: Syntax of the STG language

let x = bind in e; s; H
  ⟹  e[x′/x]; s; H[x′ ↦ bind]              (x′ fresh)                  (LET)

case v of alts; s; H[v ↦ C a1 ... an]
  ⟹  e[a1/x1 ... an/xn]; s; H    if alts = {...; C x1 ... xn → e; ...} (CASECON)

case v of {...; x → e; ...}; s; H
  ⟹  e[v/x]; s; H    if v is a literal that matches no other
                     case alternative                                  (CASEANY)

case e of alts; s; H
  ⟹  e; (case • of alts : s); H                                        (CASE)

v; (case • of alts : s); H
  ⟹  case v of alts; s; H    if v is a literal or H[v] is in HNF       (RET)

x; s; H[x ↦ e]
  ⟹  e; (Upd x • : s); H     if e is a thunk                           (THUNK)

y; (Upd x • : s); H
  ⟹  y; s; H[x ↦ H[y]]       if H[y] is a value                        (UPDATE)

Figure 4.2: The evaluation rules

Operational Semantics of STG

The semantics of the STG language are described in [15] and [26]. An
outline of the relevant rules is presented here with some details left out; in
particular, the details of both recursion and function application are missing,
as neither has much effect on the ideas presented here. The semantics of
the STG language is given in terms of three components:

• The code e, the expression under evaluation;

• The stack s of continuations;

• The heap H, a finite mapping from variables to closures.

The continuations, κ, on the stack take the following forms:

κ ::= case • of alts   Scrutinise the returned value in the case statement
    | Upd t •          Update the thunk t with the returned value
    | (• a1 ... an )   Apply the returned function to a1 ... an

The first rule, LET, states that to evaluate a let-expression, the heap
H is extended to map a fresh variable x′ to the right hand side bind of the
expression; the fresh variable corresponds to allocating a new address in
memory. After allocation, we enter the code for e with x′ substituted for x.
Here is the Haskell code for the function reverse, taken from the standard
prelude, together with the corresponding STG code:

reverse = rev [ ]
  where
    rev xs [ ] = xs
    rev xs (y : ys) = rev (y : xs) ys

reverse = { } λn { } → rev {Nil }

rev = { } λn {xs ys } → case ys of
  Nil { } → xs
  Cons {z , zs } →
    let rs = {z , xs } λn Cons {z , xs } in
    rev {rs, zs }

which should be read in the following way:

• First bind reverse to a function closure whose code pushes the argument
Nil onto the stack and enters the code for rev.

• Bind rev to a function closure that expects two arguments xs and ys.
The code for this closure should force evaluation of ys and examine
the result:

  – if it matches Nil, then evaluate the code for xs;

  – if it matches Cons z zs, then allocate a Cons cell with arguments
    z and xs, load rs and zs onto the stack and enter the code for
    rev.

Update flags
One feature of lazy evaluation is that each closure should be replaced by
its (head) normal form upon evaluation, so that the same closure is never
evaluated more than once. The update flag attached to each closure specifies
whether this update should take place. If the flag is set to ‘u’ then the closure
will be updated, and if it is set to ‘n’ then no update will be performed.
Naïvely, every flag can be set to ‘u’, but this is not always necessary; for
instance, if a closure is already in head normal form, then updating is not
required. Much more detail about this is given in Simon Peyton-Jones’ paper
‘Implementing functional languages on stock hardware: the Spineless Tagless
G-Machine’ [26].
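
For example, in the syntax of figure 4.1 (an illustrative sketch, not compiler
output):

t = {ys } λu { } → reverse {ys } -- a thunk: overwritten with its value once forced
c = {z , zs } λn Cons {z , zs }  -- a constructor: already in head normal form
f = { } λn {x } → x              -- a function value: never updated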

Closure Representation
Every heap object in GHC is in one of three forms: a head normal form (a
value), a thunk, which represents an unevaluated expression, or an indirection
to another object. A value can either be a function value or a data value
formed by a saturated constructor application. The term closure is used
to refer to any of these objects. A distinctive feature of GHC is that all
closures are represented, and indeed handled, in a uniform way:

[Diagram: a closure consists of a pointer to an info table (code and layout
details) followed by its free variables.]

All closures are in this form, with a pointer to an info table containing
code and other details about the closure, and a list of the variables that the
closure needs access to. For example, a closure for a function application will
store the code for the function in the info table and the arguments in the free
variable list. When the closure is evaluated, the arguments can be reached
via a known offset from the start of the closure. For a data constructor,
the code will return to the continuation of the case statement that forced
evaluation, providing the arguments of the constructor application. These
arguments are, again, simply stored at an offset from the start of the closure.
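
As a rough model of this uniform layout (hypothetical Haskell types for
illustration only; the real representation is untyped memory):

type HeapPtr = Int -- stand-in for a machine address

data Closure = Closure
  { info    :: InfoTable  -- static part, shared by closures from the same code site
  , payload :: [HeapPtr ] -- dynamic part: free variables or constructor arguments
  }

data InfoTable = InfoTable
  { entryCode    :: String -- stand-in for the compiled entry code
  , pointerCount :: Int    -- layout information used by the garbage collector
  }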

Adding an Overwrite construct
In this section, a new construct overwrite is added to the STG language.
The syntax and semantics are given below (figure 4.3). The idea is that

overwrite x with e1 in e2

will behave in a similar manner to

let x = e1 in e2

but rather than storing e1 as a new heap-allocated closure and binding it to
x, the closure already bound to x will be overwritten with the closure for e1.
Now, care must be taken to ensure that x really is bound to a closure,
not an unboxed value, and that e1 will produce the same type of closure.
However, no checking is done at this stage, as we assume this (as well as
uniqueness checking) has been taken care of elsewhere in the compiler.
This highlights another difference between let and overwrite, namely
that the variable bound in the let-construct may be any variable, free or
bound, whereas in the overwrite-construct it must be a bound variable.
We can add this construct to the example reverse from above:

reverse :: [a ] ; [a ]

reverse = { } λn { } → rev {Nil }
rev = { } λn {xs ys } → case ys of
  Nil { } → xs
  Cons {z , zs } →
    overwrite ys with Cons {z , xs } in
    rev {ys, zs }

Since there are no longer any let-constructs in this code, it doesn’t allocate
any space on the heap! The function runs using a constant amount of
space, although in the case that the list is a previously unevaluated thunk,
forcing the evaluation of reverse will also force the evaluation, and therefore
the allocation, of the list it operates on.
Note that we know it is safe to overwrite ys with a Cons cell because
we know it to be unique from the type signature¹ and we know ys to be a
Cons cell already, since it was matched in a case expression.
In general, it is safe to overwrite a closure x with a constructor applica-
tion C a1 ... an exactly when these two conditions hold:

• x is known to be unique. This information is provided by the type
system.

• The closure bound to x was built with constructor C and is in normal
form. This happens when x has been matched in a case expression,
inside the alternative for constructor C.

¹The STG language is untyped, but this information is available during the transla-
tion phase.

Expression expr → ...
                | overwrite var with expr in expr

overwrite x with e1 in e2 ; s; H
  ⟹  e2 ; s; H[x ↦ e1]                              (OVERWRITE)

Figure 4.3: The overwrite construct

Ministg
Ministg [29] is an interpreter for the STG language that implements the
operational semantics as given above. It offers a good place to investigate
the new semantics. An outline of the relevant code is given in figure 4.4.
The code dealing with overwrite-expressions is largely similar to the code
for let-expressions, and usually simpler. For instance, no fresh variable
need be generated, unlike in the let-expression, and no substitution need
be performed. Performing substitutions over overwrite-expressions is also
simpler than over the corresponding let-expression, as there is no variable
capture to be avoided. The final difference is in calculating free variables:
the variable appearing on the left-hand side of a let expression is not free,
but the overwritten variable in an overwrite expression is.
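
The contrast can be sketched as follows, over a simplified expression type
(the constructors here are hypothetical; Ministg’s real definitions differ):

import Data.Set (Set)
import qualified Data.Set as Set

type Var = String
data Exp
  = Atom Var
  | Let Var Exp Exp       -- let x = rhs in body
  | Overwrite Var Exp Exp -- overwrite x with rhs in body

freeVars :: Exp → Set Var
freeVars (Atom v ) = Set.singleton v
freeVars (Let x rhs body) =
  freeVars rhs `Set.union` Set.delete x (freeVars body) -- x is bound
freeVars (Overwrite x rhs body) =
  Set.insert x (freeVars rhs `Set.union` freeVars body) -- x remains free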

GHC
At an operational level, these are the only differences between let-expressions
and overwrite-expressions. When it comes to implementing the STG
language in GHC, however, there are a few more hurdles to overcome. Un-
surprisingly, much of the code remains the same as for let-expressions, but
the translation is not as direct as in the Ministg interpreter.
Firstly, overwriting closures interacts badly with the generational garbage
collector employed in GHC. More detail about this is provided in the next
section.
Secondly, whereas in the Ministg interpreter variable locations are stored
in a data structure representing a finite mapping, in GHC variable locations
are stored as pointers kept in registers or as offsets from the current closure.
In the case that the location of a variable is stored at an offset from a
closure that is to be overwritten, we must make sure to save this location
before performing the update; otherwise the location will be lost and we will
no longer be able to access the variable.

smallStep :: Exp → Stack → Heap → Eval (Maybe (Exp, Stack , Heap))
-- LET
smallStep (Let var object exp) stack heap = do
  newVar ← freshVar
  let newHeap = updateHeap newVar object heap
  let newExp = subs (mkSub var (Variable newVar )) exp
  return $ Just (newExp, stack , newHeap)
-- OVERWRITE
smallStep (Overwrite var object exp) stack heap = do
  let newHeap = updateHeap var object heap
  return $ Just (exp, stack , newHeap)
-- CASECON
smallStep (Case (Atom (Variable v )) alts) stack heap
  | Con constructor args ← lookupHeap v heap,
    Just (vars, exp) ← exactPatternMatch constructor alts = do
      return $ Just (subs (mkSubList (zip vars args)) exp, stack , heap)
-- CASEANY
smallStep (Case (Atom v ) alts) stack heap
  | isLiteral v ∨ isValue (lookupHeapAtom v heap),
    Just (x , exp) ← defaultPatternMatch alts = do
      return $ Just (subs (mkSub x v ) exp, stack , heap)
-- CASE
smallStep (Case exp alts) stack heap = do
  return $ Just (exp, CaseCont alts : stack , heap)
-- RET
smallStep exp@(Atom atom) (CaseCont alts : stackRest) heap
  | isLiteral atom ∨ isValue (lookupHeapAtom atom heap) = do
      return $ Just (Case exp alts, stackRest, heap)
-- THUNK
smallStep (Atom (Variable x )) stack heap
  | Thunk exp ← lookupHeap x heap = do
      let newHeap = updateHeap x BlackHole heap
      return $ Just (exp, UpdateCont x : stack , newHeap)
-- UPDATE
smallStep atom@(Atom (Variable y)) (UpdateCont x : stackRest) heap
  | object ← lookupHeap y heap, isValue object = do
      return $ Just (atom, stackRest, updateHeap x object heap)

Figure 4.4: Outline of the Ministg implementation of the evaluation rules
given in figure 4.2, plus the new overwrite expression.

In the example below, the addresses of x and xs are located at an offset
from the closure for ls. When that closure is overwritten, we lose these
addresses, so we must take care to save them in temporary variables first.

...
case ls of
  Cons x xs →
    ...
    overwrite ls with Cons y ys in
    ... x ... xs ...

[Figure 4.5: Overwriting a Cons cell. In (a) the Cons info table gives x at
offset 1 and xs at offset 2; after the update (b) the same offsets hold y and
ys. Any references that pointed to xs in (a) will point to ys after the update
(b), and similarly for x and y.]

Let us now consider another example, map. Intuitively, map seems like
a good candidate for in-place updates — we scan across the list, updating
each element with a function application. But there is a problem. Looking
at the code for map and the corresponding STG binding, we see that map
does not allocate any Cons cells! At least not directly:

map :: (a → b) → [a ] → [b ]
map f [ ] = [ ]
map f (x : xs) = f x : map f xs

map = { } λn {f , xs } → case xs of
  Nil { } → Nil
  Cons {y, ys } →
    let fy = {f , y } λu f {y } in
    let mfys = {f , ys } λu map {f , ys } in
    Cons {fy, mfys }

The two closures allocated in the body of map are both thunks allocated
on the heap, whereas the Cons cell is placed on the return stack. In a strict
language, the recursive call to map would allocate the rest of the list, but in a
lazy language a thunk representing the suspended computation is allocated
instead. This thunk will later be updated with its normal form (either a
Cons cell or Nil) if examined in a case statement.
In general, the updatee’s closure and the thunk will not be the same size,
so we cannot blindly overwrite the former with the latter. One can imagine
a mechanism whereby, upon seeing a unique value in a case statement, the
code generator searches the rest of the code for the closure that ‘fits best’.
If a closure of the same type is built, then we select that; otherwise we try
to reuse as much space as possible by selecting the largest closure that will
take up no more space than the closure we wish to overwrite.
There is also the possibility of reusing the thunk allocated for the recur-
sive call itself, since once evaluated, it is no longer needed.
I have not been able to try implementing this feature, but it would be
an interesting improvement to make.
There is one more optimisation that could potentially be included. When
a variable x known to be unique goes out of scope, we know that it has
become garbage, whether or not x appears in a case statement. The compiler
would then be free to overwrite x with a new variable y without making any
assumptions about the uniqueness of y. There is some difficulty here, as we
do not know whether x refers to a value or an unevaluated thunk. If x has
not been evaluated then in general we can infer nothing about the size of the
thunk it refers to, as it may have been formed from an arbitrary expression.

Garbage Collection
GHC uses an n-stage generational garbage collector. A copying collector
partitions the heap into two spaces: the from-space and the to-space.
Initially, objects are only allocated in the from-space. Once this space is full,
the live objects in the from-space are copied into the to-space. Live objects
are objects that are reachable from the current code; any unreachable
object (garbage) is never copied, so it will not take up space in the new heap
area, and the new heap will be smaller than the old heap (provided there was
unreachable data in the heap). Now the from-space becomes the to-space
and vice versa, and the program continues to run. If no space was reclaimed,
then the size of the two spaces must be increased, if this is possible. This can
be generalised to more than two spaces, so that there are many heap-spaces,
any one of which may be acting as the to-space at a given time.
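
The copying step can be modelled as computing the sub-heap reachable from
the roots. The following is a simplified sketch with hypothetical types; a real
collector also relocates objects and leaves forwarding pointers behind:

import qualified Data.Map as Map

type Addr = Int
newtype Obj = Obj { pointers :: [Addr ] } -- an object and the addresses it refers to
type Heap = Map.Map Addr Obj

-- copy every object reachable from the roots out of the from-space;
-- anything not copied is garbage and is simply abandoned
evacuate :: [Addr ] → Heap → Heap
evacuate roots fromSpace = go roots Map.empty
  where
    go [ ] toSpace = toSpace
    go (a : as) toSpace
      | a `Map.member` toSpace = go as toSpace -- already copied
      | otherwise = case Map.lookup a fromSpace of
          Nothing  → go as toSpace
          Just obj → go (pointers obj ++ as) (Map.insert a obj toSpace)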
This process clearly cannot be employed in a language that allows pointer
arithmetic, for example, since closures are frequently relocated in memory
and pointers would be left dangling, or pointing to nonsense. But are things
any better in a functional language? Ignoring for the moment lazy
evaluation and overwriting, Haskell has the property that any new data
value will only point to old data, never the other way round, since values are
immutable. This means the references in memory form a directed acyclic
graph with older values at the leaves and newer values nearer the root.
The idea behind generations is that since structures are immutable, old
structures don’t usually point to structures created more recently. Because
of this it is possible to partition the heap into generations where old gener-
ations do not reference new generations. In this way, the garbage collector
can re-arrange the new generations without affecting the old generations. It
has been observed that in functional programming, old data tends to stay
around for much longer than new data [reference], so most unreachable data
is newly created. This means that a large proportion of the garbage to be
collected usually lies in the youngest generation, so by collecting this we can
reclaim decent amounts of space without having to traverse the whole heap.
Occasionally, however, garbage collecting a young generation will not free up
enough space, in which case older generations must also be collected.
A generational collector will also use a method to age objects from
younger generations into older generations if they have been around long
enough. The usual way of doing this is by recording how many collections
an object survives in a particular generation; once this number exceeds
some threshold, the object is moved up into the older generation.
By default, GHC uses two generations. This scheme leads to frequent,
small collections, with occasional, much larger collections of the entire heap.
[Figure 4.6: Generational garbage collection. (a) A block of memory split
into two generations; the grey blocks are garbage. (b) After garbage
collecting the youngest generation.]

Up until this point, we have been considering only garbage collection
in directed acyclic graphs. Things become much less neat when we allow
closures to be overwritten, as a closure in an old generation may well be
updated to reference a newer closure in a younger generation. When a
garbage collection takes place, the younger closure will be moved to a new
location and the reference inside the older closure will no longer point to the
correct location. It is worth noting that this can happen even without the
overwrite construct, owing to lazy evaluation. For example, the following
code can be used to create a cyclic list:

cyclic :: [a ] → [a ]
cyclic xs = rs
  where
    rs = link xs rs
    link [ ] ys = ys
    link (x : xs) ys = x : link xs ys
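
Forcing the list updates the old thunk rs with a Cons cell whose tail is a
newer thunk, so an old closure comes to point at a younger one; and when
link reaches the end of xs it returns rs itself, tying the knot. As a check of
the intended behaviour:

take 7 (cyclic [1, 2, 3]) -- evaluates to [1, 2, 3, 1, 2, 3, 1]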

To work around this, GHC keeps track, for each generation, of the set of
closures that contain pointers to younger generations, called the remembered
set. During a garbage collection, the pointers to younger generations are
maintained so as to keep pointing to the correct locations. This remembered
set must be updated whenever a closure is overwritten; this is known as the
write-barrier. It means that whenever a closure is overwritten, we must
check whether any old-to-new pointers are being created, by considering the
generation of the closure being overwritten.
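
The check at the barrier amounts to something like the following sketch,
with hypothetical types and with generation 0 as the youngest, as in GHC:

type Gen = Int
data Closure = Closure { generation :: Gen, pointees :: [Closure ] }

-- after an overwrite, must the updated closure join the remembered set?
needsRemembering :: Closure → Bool
needsRemembering c = any (λp → generation p < generation c) (pointees c)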
Unfortunately, this check incurs a significant performance penalty for the
overwrite expression, which will be discussed in more detail later on.

Chapter 5

Results

Due to the write-barrier overhead, performance gains for this new optimi-
sation are slight. In a benchmark program that sorts a large list of integers,
we find that although the time spent in garbage collection drops by around
10%, the extra cost of the write-barrier and of saving free variables almost
exactly counteracts the benefit.
By comparison, if we restrict GHC to use a single-space copying collector,
thus avoiding the problems associated with older generations, we see a much
bigger improvement from in-place updates. However, the overall execution
time is then worse than when using the generational collector, so there is
little point in doing so.
For larger, more realistic programs the gain is usually even smaller; typi-
cally, there is little difference between an optimised program and one without
closure-overwriting. It is interesting to note, though, that in no case has the
extension been observed to make a program run noticeably slower.
However, this is only the case in conditions where the run time system is
allowed access to much more heap space than is needed. When the amount
of heap space available is restricted to be close to the amount of live
data, very different results can be seen. Figure 5.2 shows how the performance
of the sorting program varies with the size of the heap. For small heap
sizes, destructive update makes a big difference to the speed of the program.
Without destructive update, reducing the size of the heap dramatically
increases the amount of time spent garbage collecting: for a heap size of
8MB, garbage collection accounts for approximately 50% of execution time,
and for a heap size of 2MB this increases to around 75%.
By contrast, with destructive overwriting turned on, reducing the size of
the heap has little effect on the program. Indeed, the program actually runs
slightly faster with a smaller heap! This may be due to the improved data
locality of a smaller heap and fewer cache misses.

                       none    -G1     -M2m
With optimisation
  Time (s)             6.43    8.38    6.11
  %GC                  46%     56%     44.8%
Without optimisation
  Time (s)             6.41    9.52    11.18
  %GC                  57%     60%     76%

Figure 5.1: Results of running a sorting algorithm with various options
affecting the run time system. The code under analysis is exactly the
quicksort example given earlier, used to sort a list of 20000 integers, taking
the minimum time over three runs.

[Figure 5.2: Total execution time and time spent in garbage collection, with
and without overwriting, plotted against heap space (2-7 MB).]

Chapter 6

Conclusion

As the run time system of GHC has been highly optimised for persistent
data structures, overwriting closures provides little benefit under typical
conditions. Despite this, the technique appears promising for environments
where a large amount of excess heap space is not available.
A number of possibilities for further optimisation remain open that may
improve the impact of this technique. It is likely that being more aggressive
in deciding which closures can be overwritten would lead to better results;
in particular, allowing the closures allocated for function calls to be updated
is likely to be useful for optimising recursive functions that are not tail
recursive.

Bibliography

[1] P. Hudak. Conception, evolution, and application of functional
programming languages. ACM Computing Surveys, 1989.

[2] J. Hughes. Why Functional Programming Matters.

[3] David B. MacQueen. Reflections on Standard ML. Lecture Notes in
Computer Science, Volume 693/1993, pages 32-46.

[4] Sylvain Conchon and Jean-Christophe Filliâtre. A Persistent Union-Find
Data Structure.

[5] P. Wadler. Functional Programming: Why no one uses functional
languages.

[6] Niklas Röjemo and Colin Runciman. Lag, Drag and Void: heap-profiling
and space-efficient compilation revisited. Department of Computer Science,
University of York.

[7] David Wakeling and Colin Runciman. Linearity and Laziness.

[8] Andrew W. Appel. Garbage Collection Can Be Faster Than Stack
Allocation. Department of Computer Science, Princeton University.

[9] Philip Wadler. The marriage of effects and monads. Bell Laboratories.

[10] Philip Wadler. Is there a use for linear logic? Bell Laboratories.

[11] Philip Wadler. A taste of linear logic. Bell Laboratories.

[12] Philip Wadler. Comprehending Monads. University of Glasgow.

[13] David N. Turner and Philip Wadler. Operational Interpretations of
Linear Logic.

[14] Simon Peyton-Jones. Implementing functional languages on stock
hardware: The Spineless Tagless G-machine, version 2.5. University of
Glasgow.

[15] Simon Peyton-Jones. Making a Fast Curry: Push/Enter vs Eval/Apply
for Higher-order Languages.

[16] Antony L. Hosking. Memory Management for Persistence. University
of Massachusetts.

[17] Henry G. Baker. Lively Linear Lisp — ‘Look Ma, No Garbage’.

[18] Edsko de Vries, Rinus Plasmeijer and David M. Abrahamson.
Uniqueness Typing Redefined.

[19] Jurriaan Hage and Stefan Holdermans. Heap Recycling for Lazy
Languages. Department of Information and Computing Sciences, Utrecht
University.

[20] Jon Mountjoy. The Spineless Tagless G-machine, naturally. Department
of Computer Science, University of Amsterdam.

[21] François Pottier. Wandering through linear types, capabilities, and
regions.

[22] A.M. Cheadle, A.J. Field, S. Marlow, S.L. Peyton Jones and R.L.
While. Exploring the Barrier to Entry — Incremental Generational Garbage
Collection for Haskell.

[23] E.G.J.M.H. Nöcker, J.E.W. Smetsers, M.C.J.D. van Eekelen and M.J.
Plasmeijer. Concurrent Clean.

[24] Simon Peyton-Jones and Simon Marlow. The STG Runtime System
(revised).

[25] Simon Peyton-Jones and Simon Marlow. The New GHC/Hugs Runtime
System.

[26] Simon Peyton-Jones. Implementing Functional languages on stock
hardware: the Spineless Tagless G-Machine.

[27] Simon Peyton-Jones and Philip Wadler. Imperative Functional
Programming.

[28] J. Launchbury and S. Peyton-Jones. State in Haskell. Lisp and Symbolic
Computation, volume 8, pages 293-342.

[29] The Mini STG Language: http://www.haskell.org/haskellwiki/Ministg

