You are on page 1of 20

Implementing Functional Languages on Object-Oriented

Virtual Machines
Nigel Perrya and Erik Meijerb
a
Department of Computer Science, University of Canterbury
Christchurch, New Zealand
nigel@cosc.canterbury.ac.nz
b
Microsoft Corporation
emeijer@microsoft.com

Abstract

Hosting functional languages in object-oriented environments, such as the Java Virtual Machine and Microsoft’s
Common Language Infrastructure, so that they inter-operate well with other languages presents a number of
problems. In this paper we introduce Mondrian, a functional language specifically designed for such environments,
and describe the decisions and trade-offs in its design.

1 Introduction
Mondrian was originally conceived as a simple and consistent dialect of Haskell [1] which
was defined by a translation into the latter. In its original description [2] mention was made
as a footnote of using functional languages as glue for COM components as a possible
domain where functional languages, such as Mondrian, could be shown to be superior. That
first description also stated that Mondrian was still in its childhood stage.

Mondrian has grown up, and metamorphised, over the intervening years. The original footnote
became a more central goal in the design [3], the type system moved more towards those of
object-oriented languages, the syntax changed to be more familiar to imperative programmers,
it was targeted first at the JVM and then at .NET, and hopefully it retains some of the
simplicity of its childhood. Mondrian, however, is still not finished, so in this paper we
suggest it maybe is now approaching adulthood after an eventful adolescence.

Mondrian is now a functional language specifically designed to be hosted on object-oriented


virtual machines, such as the Java Virtual Machine (JVM) [4] and Microsoft’s Common
Language Infrastructure (CLI) [5-7], and to inter-operate well with object-oriented languages.
Executing on such machines raises problems for functional languages, especially if they wish
to inter-operate well. The type systems and evaluation models of the two paradigms are rather
different. The former provides a system based on sub-typing (inheritance), overriding
(“polymorphism”) and strict evaluation. Functional languages are typically based on parametric
polymorphism and non-strict evaluation.

This paper describes the language, its implementation, and some of the decisions and trade-offs
made in its design. The paper concentrates on the design and implementation of Mondrian. A
description of Mondrian from a programmer’s viewpoint may be found in [8].

Similarities between Mondrian and languages such as Pizza [9], GL [10] and Generic C#/CLR
[11] are evident. However, all these other approaches aim to extend the expressive power of
object-oriented languages by introducing concepts more commonly found in functional
languages. The contribution of Mondrian is that it takes an opposite approach, taking the
functional paradigm and moulding it to fit better into the object-oriented framework. We
believe this will introduce a whole new audience to functional languages, especially though
our work with Microsoft and the .NET project.
2 An Overview Of Mondrian
The semantics of Mondrian will be familiar to functional programmers, it is a simple non-strict
language. The I/O system is based on monads [12], and follows the design of Haskell [1]. A
number of things however are unusual; including its syntax, the type system, exception
handling, and concurrency support. We introduce here by example, later sections look at its
type system and implementation in greater detail.

2.1 Basic Syntax

The syntax of Mondrian is unusual in that it looks more like Java [13] or C# [14] and less like
traditional functional languages. This design was chosen to make the language more readily
accessible to object-oriented programmers.

Mondrian is also a minimalist1 language, rarely providing two ways of doing something, and
often providing simpler expression forms than current functional languages.

To introduce Mondrian, here is a simple function which calculates the roots of the quadratic
equation ax2 + bx + c = 0:

class Pair { r1 : Double; r2 : Double; }

roots : Double -> Double -> Double -> Pair;


roots = a -> b -> c
let
d = b * b - 4 * a * c;
root = sqrt d;
a2 = 2 * a;
in if d < 0
then
Pair(-1, -1)
else
Pair( (-b + root)/a2, (-b - root)/a2 );

The first line declares a product type, the syntax directly reflects Java/C#. However, the
function definition is more of a mix of the two styles. To simplify the syntax the traditional
“\” or fun/lambda keywords often used to introduce a functional definition have been dropped.
However the qualified expression (let) and conditional (if/then/ else) forms follow traditional
functional language design. The formation of a product value follows the typical constructor
application syntax of object-oriented languages, however the traditional new keyword is
redundant and has omitted2.

2.2 “Sum” Types and Pattern Matching

Rather than providing the traditional tagged sum type, Mondrian uses the class and sub-class
mechanisms of the host virtual machine to provide similar functionality. In this model sub-type
names replace tag names. For example, the standard list type can be defined in Mondrian
using:

1
The language is named after Dutch abstract painter Pieter Cornelis Mondrian (1872 – 1944). In his later work Mondrian
limited himself to small vertical and horizontal strokes.
2
Both Java and C# have retained ‘new’ though it is not required. The current Mondrian compiler optionally allows its use.

2
abstract class List<a> {};

class Nil<a> extends List<a> {};

class Cons<a> extends List<a>


{ head : a;
tail : List<a>;
};

This shows an underlying similarity to Java/C# while type parameters are added and the
field/type ordering is reversed. Pattern matching on sub-type names is provided, but only a
single level of matching is supported. For example, the length function on lists may be
defined as:

length: forall a. List<a> -> Integer;


length = l ->
switch(l)
{ case Nil: 0;
case Cons{t = tail;}: 1 + (length t);
}

The fine line Mondrian treads in providing syntax which feels familiar to traditional object-
oriented programmers and semantics familiar to traditional functional programmers is clear in
these examples. The choice of the products as sub-types approach and parametric types are
discussed in detail later. The single-level of pattern matching was chosen to keep the language
simple.

2.3 I/O And Calling Other Languages Hosted On The Virtual Machine

Following Haskell, Mondrian uses monadic I/O [12]. The standard “Hello” program may be
written as follows:

main : IO Void;

main =
{ PutStr("Please enter you name: ");
name <- GetStr();
PutStr("Hello there " + name);
}

To a functional language calling a method in an object-oriented language can be modelled the


same way as performing an I/O operation [15]. Mondrian follows this approach providing
three constructs; create, invoke, and static invoke; which construct monadic interfaces
for constructors, instance methods and static methods respectively. For example, the following
simple program executes on .NET and produces a single random number by calling the .NET
Framework’s System.Random:

// define a monadic interface for the System.Random constructor


dotnetRand : IO<System.Random>;
dotnetRand = create System.Random();

// define a monadic interface to the Random.Next method


nextRand : Integer -> System.Random -> IO<Integer>;
nextRand = invoke System.Random.Next(Int32)

3
main =
{ gen <- dotNetRand;
num <- nextRand 10 gen;
putStrLn ("Random 1..10: " + show num);
}

2.4 Exception Handling

Exceptions are an integral part of the host virtual machines. Mondrian provides a function to
generate exceptions and a monadic construct, modelled on Java/C# ’s try/catch/finally, to
handle them. Exceptions produced by other virtual machine hosted language code called from
Mondrian can be caught, and other languages calling Mondrian can catch its exceptions.

Mondrian is not the first example of a functional language with exception handling, for
example it has been added to Haskell [16], but it is unusual in its complete integration with
an exception system provided by an object-oriented virtual machine environment.

2.5 Concurrency Support

Threads and synchronisation primitives are also an integral part of the host virtual machines
and frameworks. Following on from our previous work, such as Concurrent Hope+C [17],
Mondrian provides full support for threads and synchronisation primitives. In a multi-threaded
program the individual threads can be written in any virtual machine hosted language, including
Mondrian; and synchronise with, and pass data between, each other.

3 The Mondrian Type System


Targeting an object-oriented virtual machine presents challenges compared to a conventional
machine. Most functional languages have polymorphic type systems, typically a descendant
of Milner’s which are both expressive and can be checked statically. A conventional machine
is more-or-less untyped, once functional language code has been shown to be type correct it
can be translated to untyped code for the machine and most languages require no runtime
type-checking.

However, when compiling to a typed virtual machine not only must the original functional
program be type correct in its type system, but its translation must be type correct in the type
system of the virtual machine. An ideal translation would maximise efficiency for source
language execution and compatibility with other languages hosted on the virtual machine.
However these two goals can be incompatible. For Mondrian, which is designed for scripting
and interoperabilty, we choose to favour compatibility over efficiency when resolving conflicts.
For a compiler for Haskell the choices might well be different.

The Trivial Translation: ∀τ.τ → Object

A trivial translation is possible; the virtual machines provide a inheritance/subtype hierarchy


with a named root type and dynamic type-checking, thus every source type can be translated
to the root type. This translation is neither particularly efficient or compatible and it is
tempting to reject it outright.

However non-strict evaluation complicates any translation into a typed architecture which
does directly support it, as the object-oriented virtual machines do not. Any value:

v:τ

4
will be of actual type:

v : () → τ

if it is a delayed computation. What type on the virtual machine can represent both of these?
The simple answer is Object, so the trivial translation may on occasion be needed. We return
to this issue later, after introducing the Mondrian type system and its idealised translation.

3.1 Types in Mondrian

The Mondrian type system is essentially that of the virtual machine with the addition of
parametric polymorphism. We define the type system by its translation into that of the virtual
machine. Mondrian provides all the core functional language types, or some equivalent, as
shown in the following table comparing Haskell and Mondrian:

<insert table>

Mondrian supports the usual set of primitive types; integers, floating point numbers, booleans,
etc. For portabilty of Mondrian source standard names for these types are defined and these
are translated to appropriate name for the virtual machine. For example, Mondrian’s Integer,
is translated to Integer on the JVM and System.Int32 on the CLI.

Unusually for a functional language, a string primitive type is provided, as this is provided by
the virtual machine’s. The standard convention of using lists of characters is also available,
and standard functions are provided to convert between the two representations.

3.2 Product Types

Mondrian provides product types with named fields using the virtual machines class types,
and adopts the standard syntax for them. Values of product type are also formed using
object-oriented style constructor syntax. For example, the Haskell types:

type Coord = (Float, Float, Float); // e.g. (1.2, 3.4, 5.6)

data Coord = Cd(x, y, z :: Float); // Cd (1.2, 3.4, 5.6) ??

data Coord = Cd Float Float Float; // Cd 1.2 3.4 5.6

are all replaced in Mondrian by:

class Coord { Float x; Float y; Float z; } // Coord(1.2, 3.4, 5.6)

Discussion

A field-only class has a product type, so they are an obvious encoding for Mondrian products.
The choice also aids interoperabilty with other languages, as does adopting the standard
syntax. However the encoding is not exactly as described here, which complicated inter-
operability slightly, but we defer this until the discussion of non-strict evaluation.

We decided against providing anonymous, or synonym, product types in Mondrian as these


would add nothing to the language and unnecessarily complicate language inter-operabilty.
As the underlying virtual machine’s also do not support such types an encoding to a named
class type would be required. Though such an encoding is trivial, and the Mondrian programmer
need to be aware of it, it would need to be known if values of such type were to be passed to
other languages.

5
3.3 Sum Types

Mondrian provides the equivalence of sum types using a combination of abstract classes, that
is classes which can have no instances, and sub-classes. For example the set of Mondrian
types:

abstract class IntTree {}


class Nil extends IntTree {}
class Leaf extends IntTree { val : Integer; }
class Node extends IntTree { left : IntTree; right : IntTree; }

Corresponds to the Haskell type:

data IntTree
= Nil
| Leaf { val :: Integer }
| Node { left :: IntTree, right :: IntTree}

The Mondrian formulation is clearly syntactically a bit more involved, but has the advantage
of compatibility with other virtual machine hosted languages.

Discussion

The trivial correspondence comes from the simple bijection between:

R=S+T;S=…;T=…

and:

R' ; S ⊆ R' = … ; T ⊆ R' = …

where the definitions of S and T are the same and no values of type R' are permitted.

The correspondence is also found at the implementation level. A functional language


implementation typically represents a variant as a record structure with a tag field indicating
the variant, while an object-oriented language implementation might use a record structure
with a tag field indicating the subtype. (Note: the “tag” field in either need not be a simple
value, it may for example be a reference to a type descriptor.)

Direct Tagging

[Can we think of any encoding which doesn’t use subclasses at all?]

A disadvantage of using the above correspondence is seen when considering the typical usage
of case analysis when defining functions over “sum” types. For example consider the Mondrian
function which uses case analysis to compute the number of leaves of an IntTree:
leafCount :: IntTree -> Integer;
leafCount = t ->
switch(t)
{ case Node{a = left; b = right;}:
(leafCount a) + (leafCount b);
case Leaf:
1;

6
case Nil:
0;
}

This is compiled into virtual machine code3 similar to (function encoding will be covered in
more detail later):

Integer leafCount(IntTree t)
{ if(t instanceof Node)
return leafCount(t.left) + leafCount(t.right);
else if(t instanceof Leaf)
return 1;
else
return 0;
}

The multiple uses of instanceof are a potential source of inefficiency. The instanceof test
is not a simple equality test as has to check for a value being a type or any of its subtypes.
This issue could be addressed by adding an explicit tag field to the translation of sum types,
as is done on untyped machines, which results in the more efficient code:

Integer leafCount(IntTree t)
{ switch(t.tag)
{ …
}
}

However this simple solution effectively duplicates the tag field and increases the size of
every variant. It also suffers from a lack of compatibility with other languages. The tag field
can easily be ignored when referencing a Mondrian created value from another language. The
“tagless” design though also allows a class/subclass value created by another language to be
handled in Mondrian using case analysis, which would not be possible if Mondrian relied on
explicit tags.

The efficiency issue could be alleviated, or solved, if the virtual machines provided a “type
case” style operation, which performed a multi-way branch depending on the (sub)type of its
argument. However neither the JVM or CLI provide such an operation.

3.4 Parametric Types

Mondrian provides parametric types by extending class declarations to allow type parameters.
For example the set of Mondrian types:

abstract class List<a> {}


class Nil<a> extends List<a> {}
class Cons<a> extends List<a> { head : a; tail : List<a>; }

Corresponds to the Haskell type:

data List a
= Nil
| Cons { head :: a, tail :: List a}

3
Shown for readability using C# rather than assembler .

7
Mondrian translates parametric types by erasure, replacing occurrences of type arguments
with Object. This follows the approach used by GJ [10]. For example the above set of types
becomes:

abstract class List {}


class Nil : List {}
class Cons : List { head : Object; tail : List; }

This translation involves loss of static type information, which impacts both efficiency and
inter-language working as discussed below.

Discussion

The object-oriented virtual machines we are considering do not support parametric types.
This has been noted by others in the context of non-functional languages and proposals exist
to extended for both the JVM [18] and the CLI [11] in include them. The syntax used by
Mondrian follows that of these proposals so that should they be implemented the Mondrian
syntax will then be familiar.

Mondrian’s translation by erasure is a homogenous one following the terminology of [9], that
is a single target class (and later function) is produced from each parameterised class (or
function) declaration. An alternative is the heterogeneous approach where each application of
a parameterised type (or function) produces a separate target class (or function). This latter
method is used, for example, by Ada 95 for generic packages and functions.

Mondrian adopts the homogenous approach as it increases compatibility with the typical
approach used in Java and C# , though at some expense in runtime type-checking as the
resultant types are, rather confusingly, heterogeneous. That is, as generic type in Java/C# is
defined using Object different elements of a structure can have different types.

We accept the runtime overhead as it is exactly the same cost paid by Java/C# style
implementations. Note however that in many cases there is no runtime cost in using erasure.
Consider the length function for the List type defined above:

length : forall a. List<a> -> Integer;


length = l ->
switch(l)
{ case Nil : 0;
case Cons{t = tail;} : 1 + (length t);
};

This function is independent of the type of the list elements and no runtime checking will be
performed. Many other polymorphic functions, such as the common higher-order map and
fold operations, will also require no runtime checks in their translations. It is only when the
original type is required that a runtime check is needed, and once performed subsequent
access to the same value do not require further checks.

Though inter-language compatibility is increased by using the homogenous translation, there


remain issues with inter-language operability. An instance of a parameterised type generated
by Mondrian is guaranteed to be homogenous. However should an instance of the type be
created by another language and then passed to Mondrian it may be type-incorrect, that is
heterogeneous, due to the use of the Object type. All such errors will be caught, but only at
runtime.

The translation could possibly add code to class constructors to ensure that only homogenous
parametric types are constructed, and then require other languages to use these constructors.

8
However in the presence of subtyping this could is potentially expensive, and is never
required when Mondrian does the construction. For this reason we have not pursed this
option. Additionally, should generic support be added to the virtual machines, and we are
hopeful it will be, such checking would not be required.

Security Implications

Agesen et al. [19] show the homogenous translation as used by Pizza [9] has a potential
security problem. This same problem has subsequently been acknowledged to be present GJ,
and they present a coding solution [10] to address it.

The problem is also present in Mondrian, and like GJ there is a simple solution to it. However
the functional programming model of Mondrian, with its separation of data and functions,
appears4 to reduce the impact of the problem.

The problem is demonstrated by the following Pizza code due to Agesen (note Pizza supports
bounded quantification, see later for a discussion of this in the context of Mondrian):

class BroadcastList<C extends Channel>


{ C channels[];
void add(C c) { … }
void broadcast(String s) { … channels[I].send(s) … }
}

class Encrypted extends Channel;


class Unencrypted extends Channel;

BroadcastList<Encrypted> bl = … ;
bl.add(new EncryptedChannel());
Java code here accessing bl
bl.broadcast("Hello");

The problem lies in the erasure of Encrypted in BroadcastList<Encrypted> to simply


Channel (its bound). Though Pizza would ensure that only values of type Encrypted are
added to the list, if plain Java code is called it may insert a value of type Unencrypted. The
call to broadcast will then transmit over this unencrypted channel.

We can translate this example to C# (to get an object with a method) and Mondrian. First we
define the types in C#:

class Channel
{ void Send(Message m);

}

class Encrypted : Channel { … }


class Unencrypted : Channel { … }

and then define a Mondrian function to perform a broadcast over a list of encrypted channels:

broadcastEncrypted :: Message -> List<Encrypted> -> IO<Void>;

In the same way as for Pizza/Java, a value of type Unencrypted can be added to a
List<Encrypted> in the Mondrian/C# example by calling C# code. However in the Mondrian

4
We await contradiction!

9
broadcastEncrypted any reference to an element of the list will be checked at runtime to be
of type Encrypted. This is analogous to GJ’s translation of its “bridge” methods.

This problem could be resolved in the same way as for constructors above, and would be also
be addressed by generic support in the virtual machines.

4 Mondrian Function Types


The functional and object-oriented approach to functions are quite different. In functional
languages functions are first class values which can be statically defined, created as the result
of an expression, and passed around like other values. Functions in the object-oriented paradigm
either exist as static members of classes, or as members of objects (instances of classes). The
concept of a function value usually does not exist. However there is an obvious parallel
between an object value with a member function and a function value. Typical object-oriented
programming models use such an object where the typically imperative or functional model
would use a function value. Our encoding of functions on the virtual machine host follows
this parallel.

4.1 Functions

We compile a Mondrian function into a class with two standard methods: an instance method,
Eval; and a static method, Apply. For example, our example leafCount function declared
as:

leafCount :: IntTree -> Integer

is translated (approximately) to:

class leafCount
{ Integer Eval(IntTree t) { … }
static Integer Apply(IntTree t) { … }
}

An instance of such a function class is used by the Mondrian execution engine (described
later) to represent a function value, and the Eval method is used to evaluate the function
when needed. Other languages executing on the virtual machine host can call the static Apply
method to evaluate the function, as in the following C# fragment:

int count;
IntTree aTree;

count = leafCount.Apply(aTree);

As function classes only encode behaviour, a function having no state, no instance variables
are required by the translation. Therefore any instance is essentially identical to any other
one. The translation takes advantage of this and stores a reference to a single instance in a
private static field within the class. The Mondrian execution engine and the Apply method
both use this single instance, so new instances are not created for every functional evaluation.

Discussion

By representing all functions, whether named statically or passed by value, as object instances
a simple and consistent translation of function evaluation is possible, this is calling an instance
method. The use of a static field to hold a single shared instance of a function class greatly

10
reduces the cost of the translation, without it a typical program would construct large numbers
of (identical) class instances all of which would quickly become garbage for the collector to
reclaim. From these standpoints the translation scheme is near optimal given the constraints
of the virtual machines.

However there is a possible problem. For various reasons, such as the support of reflection,
having a lot of classes defined in a program could present problems for the virtual machines
as every class definition consumes runtime resources. How real this problem is we do not
know, but it has been raised with us by one virtual machine supplier as a possible issue5.

The issue could be addressed by translating Mondrian modules to classes, and functions to
(static) members of that class. This could dramatically reduce the number of generated
classes and calls to statically named functions would be simple and efficient. However
function values would still have to be represented by class instances, and the translation of
function calls would depend on whether the function was statically named or not.

Though this alternative scheme would add some complexity to the translation it is not difficult.
Indeed many conventional functional language implementations use essentially this approach,
distinguishing statically named functions and function values. For Mondrian we have chosen
the first scheme as its consistent handling of functions simplifies the compiler, and as a
scripting language we do not expect to produce programs so large that the virtual machines
will be swamped by class definitions.

4.2 Partial Applications and Suspended Expressions

In common with many functional language implementations we treat partial applications and
suspended expressions in essentially the same way; the only significant difference of course
being that the former cannot be evaluated fully until further values are available, while the
latter has zero pending arguments.

A standard technique for implementing these constructs performs free variable, or maximum
free expression, analysis and then creates a closure containing the values of these. We
implement this standard approach by adding instance variables to the function class and a
constructor to initialise these to the values of the free variables. Partial applications, suspended
expressions, and functions are therefore all represented uniformly as object references with a
well-known method (Eval). As partial applications and suspended expressions are dynamic
instances of classes the static Apply method is replaced by an instance method, allowing
other languages to use these values.

For example, the Mondrian code fragment:

times : Float -> Float -> Float;



addTax = times 1.125;

is translated (approximately) into:

class partialTimes
{ Float a1;
partialTimes(Float a1) { this.a1 = a1; }
Float Eval(Float a2) { … times … a1 … a2 … }
}

addTax = new partialTimes(1.125);

5
They were (understandably) vague about what might be considered “too many” classes.

11
Discussion

This translation follows directly from the translation of functions. Should the alternative
translation scheme for functions be used this would not change the scheme for partial applications
and suspended expressions, as along with function values they are dynamic entities.

4.3 Accessing Suspended Expressions From Other Languages

The support of non-strict evaluation in Mondrian raises the issue of how other virtual machine
hosted languages, which are probably strict, see these values. As mention above under Product
Types the encoding of structured types is not exactly as described, to support non-strict
evaluation we have to type each data structure field so that is can either be a value or a
suspended expression. We therefore type each field simply as Object, the super-type of all
reference types on the virtual machines. Returning to our coordinate example:

class Coord { Float x; Float y; Float z; }

This is actually translated to:

class Coord { Object x; Object y; Object z; }

Unfortunately this means that other languages cannot directly access fields of Mondrian
generated classes. However it is generally accepted as bad practice to allow direct access to
instance variables of classes, rather access methods should be used. If access methods are
written in Mondrian, then any value returned to another language will be (at least) in WHNF6,
which is all that is required.

Discussion

The erasing of all field types raises the possibility of performance issues due to excessive
runtime type checks by the host virtual machine. Even after a field had been evaluated and
updated to be a WHNF value the virtual machine may see its type as Object and insert a
(redundant) check. This issue would appear to be more severe than the erasure of type
information to support parametric types and polymorphism, and brings into question the
efficiency of the translation, or the support of non-strict evaluation.

We considered whether the class inheritance mechanism could be used to reduce the type loss
in supporting non-strict evaluation by defining a non-strict subclass for every type. For
example we examined code along the lines of:

class Coord
{ Float x; Float y; Float z;
Float getX() { return x; }

}

6
In a non-strict language a function is never evaluated until its result is required in WHNF. Normal functional language
compilation takes advantage of this when producing code. This means, for example, that tohugh an integer addition function
may accept unevaluated arguments its result will be an integer.

12
class NS_Coord extends Coord
{ Object x; Object y; Object z;
Float getX()
{ if(!(x instanceof Float))
{ // evaluate x
}
return (Float)x;
}

}

This approach replaces the testing code the compiler emits when accessing a field by a
method dispatch which only does the test if the field may be unevaluated. Techniques similar
to this are used by some functional language implementations running on traditional machines,
in which unevaluated structures are overwritten by evaluated ones. However the technique is
ineffective for typed virtual machines as you cannot overwrite an object with another one.
That is, for example, if you have a reference to an NS_Coord you cannot overwrite the
referenced object to be a Coord after you have evaluated it. Thus tests can only be avoided if
a value is constructed out of known evaluated values. A further disadvantage is that the
platform primitive types, such as Integer, cannot be subclassed on current virtual machines;
so they would either have to be treated specially or replaced by non-standard classes, which
would complicate inter-language operations. For these reasons we rejected subclassing as too
involved compared to the expected benefits

However type erasure is amenable to another standard technique, strictness analysis. Even a
simple first-order usage analysis, such as that described in [20], provides plenty of benefits. If
this is combined with shadow variables, that is local variables of the correct type assigned the
result of evaluating a suspension, and multiple entry functions (as described in [20]) for
situations when arguments are known to be evaluated, then we expect the resulting code to be
as efficient as can be reasonably achieved when non-strict evaluation is supported.

For Mondrian we have not yet implemented strictness analysis as in practice the current code
has proved to be fast enough7 in the scripting type applications it has been used for.

5 Wot, No Interfaces?
Mondrian does not currently provide interface type declarations. This requires some
justification. Note, however, that Mondrian does support accessing objects created by other
languages which match an interface type. This is trivially provided as there is no distinction
in the virtual machine type systems between an object being of a class or interface type.

Background

Interface types were introduced in Java/JVM and are provided in C#/CIL. The “functionally
inspired” extensions to object-oriented languages, such as GJ [10] and Generic C# [11],
include parameterised interfaces. These extensions follow the object-oriented approach of
requiring a type to be declared to fulfil an interface. Haskell [1] provides a “class” system.
This is similar in use to the interface types of object-oriented languages but has significant
difference: a type is defined to meet a class specification after it is declared, indeed a type
declared elsewhere can be imported and defined to meet a class specification defined locally.
Ada 95 generic packages and functions which allow “with” clauses to be attached to generic

7
We have not benchmarked Mondrian in detail but casual observations seem to show it to be faster than Hugs and slower
than, the highly optimised, Glasgow Haskell Compiler.

13
type arguments which specify functions which must be supported over the type can be
viewed as ad hoc interface types. All these approaches can be related to F-bounded
polymorphism [21].

In the original design of Mondrian, with its aim of being a simple and uncluttered language, it
was decided not to inherit Haskell’s class system, as this can become quite involved. However
with the developing goal of close inter-working with object oriented languages running on
typed virtual machines we have re-examined this decision; and in particular whether some
kind of interface type, probably parameterised, should be supported.

“Haskell like” Interfaces

Haskell classes follow the “data + function” model, and as such would fit into Mondrian. A
Haskell class is a set of type schemas for functions which must be provided over a type for it
to match the class. Types only match a class by declaration, the declaration defining a
function for each of the class members. The Haskell system is very general, allowing arguments
in class declarations to themselves be constrained by other classes. This generality gives rise
to the complexity which saw classes omitted from Mondrian originally.

Haskell classes are typically compiled by creating a “dictionary” of functions which is passed
as an additional argument to functions which take arguments constrained by classes. A
reference to a function within a class translates to an access to this dictionary. An interface
type specifies a set of functions a type must provide to match the interface. A Haskell class
could be translated into an interface type, and a dictionary to an instance of that type.
Therefore Haskell style classes could be compiled onto typed object-oriented virtual machines.

Such a scheme would add to Mondrian, but not to Mondrian’s compatibility with object-oriented
languages. This model is not how those languages view interfaces. It is unclear how a
Mondrian function with constrained arguments should be callable from other languages as
dictionary arguments would need to be supplied.

An alternative would be to combine the dictionary and argument it refers to into a single
object instance, with the dictionary methods becoming instance methods of that object. This
would increase compatibility with other languages, but at added complexity to the translation
scheme.

“Object-oriented style” Interfaces

Mondrian product and sum-types are translated to “field-only” classes in the virtual machine
type systems. This means we do not support member functions in Mondrian constructed
classes. Of course as in Mondrian functions are values, fields of Mondrian classes can have
function type, but that is not the same thing as object-oriented member functions. Member
functions would not make sense, in the normal way, in Mondrian as they are prototypically
“state” mutators, and functional languages don’t have any state to mutate. From an object-
oriented perspective an interface type is purely a set of mutators, interfaces do not usually
specify fields. Therefore adding object-oriented style interfaces to Mondrian as it stands
would be of little use.

“Object-oriented style” Classes

We are investigating adding object-oriented style classes to Mondrian, and in turn this would
allow interfaces to be added. Adding classes would improve the inter-working with other
languages, as explained in the worked example below (7). In our proposed design a Mondrian
member function would have a monadic type with explicit passing of the current object value
(“this”). The view presented to other languages would be non-monadic with an implicit
current object, in essence the reverse of the Mondrian’s invoke construct (2.3) would be

14
performed. There are a number of unanswered questions in this design, including; should
non-monadic functions be allowed, and should class functions (and fields) be supported; and
work is under way in these areas.

6 The Execution Engine


Techniques for compiling non-strict, recursion-centric, languages such as Mondrian are by
now well developed for conventional machines. In this regard targeting a typed object-oriented
virtual machine has only minor impacts. In the typed environment the format of runtime
structures, such as the stack and the heap, are not only defined by the virtual machine but
private to it and cannot be accessed. This impacts the recursion, in that we cannot implement
our own tail-recursion optimisation, and garbage collection, in that it becomes an non-issue.

The current Mondrian compiler uses a modified STG [22] model; with a private argument
stack to support partial application, a trampoline to support tail-recursion, and leveraging the
virtual hosts exception mechanism to handle closure updates.

The virtual hosts only support all arguments of a function being supplied during a function
call. While it is possible to transform partial applications away by analysis a functional
program, we currently use a private stack to hold function arguments which allows direct
support for partial applications.

The use of a trampoline [23] is dictated by the virtual machines. The JVM does not provide a
tail-call instruction at all, and though the CLI does it is rather inefficient in current
implementations.

The use of exceptions to handle closure updates simplifies the design, but may be questionable
in terms of performance. The STG uses a “mark stack” to record references to closures which
need to be updated. Mondrian originally followed this approach. However when support for
threads was added we looked for ways to reduce the complexity of the implementation and
the number of private stacks per thread from two to one.

In our design when the evaluation of a closure is commenced a new trampoline is started,
which takes up one frame on the virtual machines stack. If a condition arises which requires
the closure to be update, i.e. a partial application will result, then a virtual machine exception
is thrown which is caught by trampoline loop routine. The closure can is then updated and the
trampoline routine returns.

Discussion

The use of exceptions for updating closures is quite elegant, and removes the need for the
mark stack, but may not be efficient depending on the implementation of the virtual machines
exception handling. However the exception is never thrown far, and switching to this scheme
had no noticeable effect on performance while simplifying the runtime system.

Using a private stack, exceptions, and the runtime type checks that stem from the design of
the type system, would suggest that the performance of Mondrian would not be very good.
However in practice Mondrian performs well on both the JVM and the CLI in the scripting-like
application areas it was designed for. For example, we have implemented a graphical animation
of Dining Philosophers based on an original Java applet [24]. The Java, C# and Mondrian
versions of this program are indistinguishable to the user.

15
7 Using Mondrian: A Multi-Language Example
We conclude by working through a complete, though small, example of how Mondrian can
be combined with traditional object-oriented languages, in this case C#, to solve a problem.

7.1 The Sieve of Eratosthenes in C#

The following method8 is an implementation of the Sieve of Eratosthenes in C# :

static void PrintPrimes(int limit)


{ int[] sieve = new int[limit];

// initialise array
for(int i = 2; i < limit; i++) sieve[i] = i;

//find primes
for(int cursor = 2; cursor < limit; cursor++)
{ if(sieve[cursor] != 0)
{ System.Console.WriteLine( cursor );
// “sieve” the array
for(int j = cursor + 1; j < limit; j++)
if(sieve[j]%cursor == 0)
sieve[j] = 0;
}
}
}

The above method will generate and display all the primes up to a certain value, call this N.
However, what if you need to generate N primes? To do this using the above formulation you
need to start with an array large enough to hold the N’th prime, and to determine that size you
need to know the N’th prime, which you will not until the program has been run…

Constructing a C# program which can produce N primes, rather than all primes ≤N, is a lot
more involved. It is possible, and concurrent programming and threads can help [25].

7.2 The Sieve: Combining Mondrian and C#

Producing primes using Eratosthenes’s algorithm on “infinite” lists is a standard example in


functional programming. Here is the code written in Mondrian:

// remove all multiples from a list


sieve : List -> List;
sieve = xs ->
switch (xs)
{ case (y::ys):
y :: sieve (filter (notMultiple y) ys);
};

notMultiple : Integer -> Integer -> Boolean;


notMultiple = x -> y -> not (y % x == 0);

8
We use the commond idiom of using division to check for multiples. A “true” sieve uses index incrementing and involves
no division, however the method shown is more widely known.

16
// the “infinite” list of primes
primes : List Integer;
primes = sieve (from 2);

The following C# method, PrintPrimes, calls the Mondrian function primes defined above,
and the standard Mondrian list accessor functions, hd and tl, to display count primes starting
from the Nth:

void PrintPrimes(int N, int count)


{ // list of all the primes...
Object longList = primes.Apply();

// skip to first desire prime


while(--N > 0)
{ longList = tl.Apply(longList);
}

// display primes
while(count-- > 0)
{ System.Console.WriteLine( hd.Apply(longList) );
longList = hd.Apply(longList);
}
}

Discussion

Though the above example is easy to follow it lacks the “feel” which object-oriented
programmers are familiar with. This stems from the mismatch between the object-oriented
object/method view and the functional data/function model. An object-oriented prime generator
would probably be written in “iterator” style, that is as a class with one method to obtain the
current prime and another to advance to the next prime.

The iterator style can be provided by wrapping the accesses to Mondrian functions within
another class:

class Primes
{ Object longList;

public Primes()
{ longList = primes.Apply();
}

public Integer Current()


{ return (Integer)hd.Apply(longList);
}

public void Next()


{ longList = tl.Apply(longList);
}
}

17
and PrintPrimes could then be written as:

void PrintPrimes(int N, int count)


{ // allocate a prime generator
Primes primeGen = new Primes();

// skip to first desire prime


while(--N > 0)
{ primeGen.Next();
}

// display primes
while(count-- > 0)
{ System.Console.WriteLine( primeGen.Current() );
primeGen.Next();
}
}

Work in progress is examining generating wrapper classes such as these automatically from
code written in Mondrian.

8 Conclusions
We have described a method for supporting non-strict functional languages on object-oriented
virtual machines. Though there is quite a gap between the functional and object-oriented
computational models the method is straightforward.

Calling functional code from object-oriented languages is not as seamless as vice-versa. This
is because traditional functional languages have no concept of “object” and “method”, and
certainly not of the state an object typically embodies. This means that without making
significant additions to the functional languages they are unable to present an object-oriented
“view” of themselves to object-oriented clients.

For example, an object-oriented client would expect an abstract data type (ADT) to be
represented by a class with instance methods for manipulating its values. However under our
strategy, which follows the functional data model, an ADT is represented by a data-only class
and a set of method-only classes.

In our future work we plan to explore the design space for extending functional languages to
provide object-oriented style access to functional code.

Our experimental language Mondrian is available for free download [26]. Also available is a
demonstration version of the Glasgow Haskell Compiler modified to use Mondrian as a
compilation route for executing on the .NET Framework9.

Acknowledgments

This work was started while both authors were at Utrecht Universiteit. Many thanks go to
Arjan van IJzendoorn at Utrecht who contributed to the design and did much of the
implementation work on the compiler. The work was supported with grants from the Dutch
Government and Microsoft Corporation.

9
The Haskell system will also produce Java code but we have not produced a Java version of GHC’s runtime support. If
you wish to do that then you can run Haskell on the JVM as well.

18
9 References
[1] S. P. Jones, J. Hughes, L. Augustsson, D. Barton, B. Boutel, W. Burton, J. Fasel, K.
Hammond, R. Hinze, P. Hudak, T. Johnsson, M. Jones, J. Launchbury, E. Meijer, J.
Peterson, A. Reid, C. Runciman, and P. Wadler, "Report on the Programming
Language Haskell 98," http://www.haskell.org. Haskell Committee, 1999.

[2] E. Meijer and K. Claessen, "The Design and Implementation of Mondrian," presented
at Haskell Workshop, 1997.

[3] E. Meijer, N. Perry, and A. v. IJzendoorn, "Scripting .NET using Mondrian," presented
at ECOOP, 2001.

[4] T. Lindholm and F. Yellin, The Java™ Virtual Machine Specification, 2 ed: Addison-
Wesley, 1999.

[5] ECMA, "CLI Partition II - Metadata," http://msdn.microsoft.com/net/ecma/. ECMA


TC39/TG3, 2001.

[6] ECMA, "CLI Partition III - CIL," http://msdn.microsoft.com/net/ecma/. ECMA


TC39/TG3, 2001.

[7] ECMA, "CLI Partition I - Architecture," http://msdn.microsoft.com/net/ecma/. ECMA


TC39/TG3, 2001.

[8] N. Perry, "Functional Languages for the .NET Framework," in Programming in the
.NET Environment, D. Watkins, M. Hammond, and B. Abrams. Addison Wesley, 2002,
ISBN 0-201-77018-0.

[9] M. Odersky and P. Wadler, "Pizza into Java: Translating theory into practice,"
presented at 24th ACM Symposium on Principles of Programming Languages, Paris,
1997.

[10] G. Bracha, M. Odersky, D. Stoutamire, and P. Wadler, "Making the future safe for the
past: Adding Genericity to the Java ™ Programming Language," presented at OOPSLA
98, Vancouver, 1998.

[11] D. Syme and A. Kennedy, "Design and Implementation of Generics for the .NET
Common Language Runtime," presented at PLDI 2001, Snowbird, Utah, 2001.

[12] P. Wadler, "Comprehending Monads," presented at Lisp and Functional Programming,


Nice, 1990.

[13] B. Joy, G. Steele, J. Gosling, and G. Bracha, The Java™ Language Specification, 2 ed:
Addison-Wesley, 2000.

[14] ECMA, "C# Language Specification," http://msdn.microsoft.com/net/ecma/. ECMA


TC39/TG2, 2001.

[15] N. Perry, "I/O and Inter-language Calling for Functional Languages," presented at
Proceedings IX International Conference of the Chilean Computer Society and XV
Latin American Conference on Informatics, Chile, 1989.

[16] S. Marlow, S. P. Jones, and A. Moran, "Asynchronous exceptions in Haskell,"


presented at 4th International Workshop on High-Level Concurrent Languages,
Montreal, Canada, 2000.

19
[17] P. Burgess, N. Perry, and R. Pointon, "The Concurrent Massey Hope+C Functional
Language System," Massey University ISBN 0-473-06203-8, 1999 1999.

[18] G. Bracha, "JSR #000014: Add Generic Types to the Java™ Programming Language,"
http://java.sun.com/aboutJava/communityprocess/jsr/jsr_014_gener.html. Sun
Microsystems, 1999.

[19] O. Agesen, S. N. Freund, and J. C. Mitchell, "Adding Type Parameterizaton to the


Java™ Language," presented at OOPSLA, 1997.

[20] N. Perry, "Non-Strict Fpm, A High Performance Lazy Abstract Machine," presented at
Fifteenth Australian Computer Science Conference, Hobart, 1992.

[21] P. Canning, W. Cook, W. Hill, J. Mitchell, and W. Olthoff, "F-bounded quantification


for object-oriented programming," Functional Prog. and Computer Architecture, pp.
273-280, 1989.

[22] S. P. Jones and J. Salkild, "The Spineless Tagless G-machine," presented at Functional
Programming and Computer Architecture, 1989.

[23] G. L. Steele, Jr., "Rabbit: A Compiler for Scheme,".: MIT, 1978, pp. 272.

[24] The Java-SIG Team, Java-SIG's 100 Best Applets: John Wiley, July 1997.

[25] J. N. Magee, "Primes Sieve of Eratosthenes," http://www-


dse.doc.ic.ac.uk/concurrency/book_applets/Primes.html. Imperial College, 1998.

[26] "Mondrian," http://www.mondrian-script.org.

20

You might also like