Abstract

We describe a derivational approach to abstract interpretation that yields novel and transparently sound static analyses when applied to well-established abstract machines. To demonstrate the technique and support our claim, we transform the CEK machine of Felleisen and Friedman, a lazy variant of Krivine's machine, and the stack-inspecting CM machine of Clements and Felleisen into abstract interpretations of themselves. The resulting analyses bound temporal ordering of program events; predict return-flow and stack-inspection behavior; and approximate the flow and evaluation of by-need parameters. For all of these machines, we find that a series of well-known concrete machine refactorings, plus a technique we call store-allocated continuations, leads to machines that abstract into static analyses simply by bounding their stores. We demonstrate that the technique scales up uniformly to allow static analysis of realistic language features, including tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.

Categories and Subject Descriptors F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages—Program analysis, Operational semantics; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic—Lambda calculus and related systems

General Terms Languages, Theory

Keywords abstract machines, abstract interpretation

∗ Supported by the National Science Foundation under grant 0937060 to the Computing Research Association for the CIFellow Project.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICFP'10, September 27–29, 2010, Baltimore, Maryland, USA.
Copyright © 2010 ACM 978-1-60558-794-3/10/09...$10.00

1. Introduction

Abstract machines such as the CEK machine and Krivine's machine are first-order state transition systems that represent the core of a real language implementation. Semantics-based program analysis, on the other hand, is concerned with safely approximating intensional properties of such a machine as it runs a program. It seems natural then to want to systematically derive analyses from machines to approximate the core of realistic run-time systems.

Our goal is to develop a technique that enables direct abstract interpretations of abstract machines by methods for transforming a given machine description into another that computes its finite approximation. We demonstrate that refactoring a machine with store-allocated continuations allows a direct structural abstraction¹ by bounding the machine's store. Thus, we are able to convert semantic techniques used to model language features into static analysis techniques for reasoning about the behavior of those very same features. By abstracting well-known machines, our technique delivers static analyzers that can reason about by-need evaluation, higher-order functions, tail calls, side effects, stack structure, exceptions and first-class continuations.

The basic idea behind store-allocated continuations is not new. SML/NJ has allocated continuations in the heap for well over a decade [28]. At first glance, modeling the program stack in an abstract machine with store-allocated continuations would not seem to provide any real benefit. Indeed, for the purpose of defining the meaning of a program, there is no benefit, because the meaning of the program does not depend on the stack-implementation strategy. Yet, a closer inspection finds that store-allocating continuations eliminates recursion from the definition of the state-space of the machine. With no recursive structure in the state-space, an abstract machine becomes eligible for conversion into an abstract interpreter through a simple structural abstraction.

To demonstrate the applicability of the approach, we derive abstract interpreters of:

• a call-by-value λ-calculus with state and control based on the CESK machine of Felleisen and Friedman [13],

• a call-by-need λ-calculus based on a tail-recursive, lazy variant of Krivine's machine derived by Ager, Danvy and Midtgaard [1], and

• a call-by-value λ-calculus with stack inspection based on the CM machine of Clements and Felleisen [3];

and use abstract garbage collection to improve precision [25].

Overview

In Section 2, we begin with the CEK machine and attempt a structural abstract interpretation, but find ourselves blocked by two recursive structures in the machine: environments and continuations. We make three refactorings to:

1. store-allocate bindings,

2. store-allocate continuations, and

3. time-stamp machine states;

resulting in the CESK, CESK⋆, and time-stamped CESK⋆ machines, respectively. The time-stamps encode the history (context) of the machine's execution and facilitate context-sensitive abstractions. We then demonstrate that the time-stamped machine abstracts directly into a parameterized, sound and computable static analysis.

¹ A structural abstraction distributes component-, point-, and member-wise.
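The structural consequence of this refactoring can be seen even in a few lines of code. The following contrast is our own illustration in Python (not code from the paper): a recursive continuation type versus its store-allocated counterpart, in which the only unbounded structure left is the store itself.

```python
from dataclasses import dataclass
from typing import Optional

# Recursive continuations: each frame contains the rest of the
# continuation, so the state-space itself is recursive (unbounded).
@dataclass(frozen=True)
class Ar:
    e: str                  # pending operand expression
    k: "Optional[Ar]"       # rest of the continuation (None plays mt)

# Store-allocated continuations: a frame holds only an *address*;
# the recursion is routed through the store, which can be bounded.
@dataclass(frozen=True)
class ArPtr:
    e: str
    a: int                  # address of the rest of the continuation

store = {0: None, 1: ArPtr("e2", 0)}   # a flat, bounded map
```

Bounding the integer address space in the second version bounds the whole state-space, which is exactly the lever the derivation below pulls.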
In Section 3, we replay this process (slightly abbreviated) with a lazy variant of Krivine's machine to arrive at a static analysis of by-need programs.

In Section 4, we incorporate conditionals, side effects, exceptions, first-class continuations, and garbage collection.

In Section 6, we abstract the CM (continuation-marks) machine to produce an abstract interpretation of stack inspection.

In Section 7, we widen the abstract interpretations with a single-threaded "global" store to accelerate convergence. For some of our analyzers, this widening results in polynomial-time algorithms and connects them back to known analyses.

2. From CEK to the abstract CESK⋆

In this section, we start with a traditional machine for a programming language based on the call-by-value λ-calculus, and gradually derive an abstract interpretation of this machine. The outline followed in this section covers the basic steps for systematically deriving abstract interpreters that we follow throughout the rest of the paper.

To begin, consider the following language of expressions:²

e ∈ Exp ::= x | (e e) | (λx.e)
x ∈ Var  a set of identifiers

A standard machine for evaluating this language is the CEK machine of Felleisen and Friedman [12], and it is from this machine we derive the abstract semantics: a computable approximation of the machine's behavior. Most of the steps in this derivation correspond to well-known machine transformations and real-world implementation techniques, and most of these steps are concerned only with the concrete machine; a very simple abstraction is employed only at the very end.

The remainder of this section is outlined as follows: we present the CEK machine, to which we add a store, and use it to allocate variable bindings. This machine is just the CESK machine of Felleisen and Friedman [13]. From here, we further exploit the store to allocate continuations, which corresponds to a well-known implementation technique used in functional language compilers [28]. We then abstract only the store to obtain a framework for the sound, computable analysis of programs.

2.1 The CEK machine

A standard approach to evaluating programs is to rely on a Curry-Feys-style Standardization Theorem, which says roughly: if an expression e reduces to e′ in, e.g., the call-by-value λ-calculus, then e reduces to e′ in a canonical manner. This canonical manner thus determines a state machine for evaluating programs: a standard reduction machine.

To define such a machine for our language, we define a grammar of evaluation contexts and notions of reduction (e.g., βv). An evaluation context is an expression with a "hole" in it. For left-to-right evaluation order, we define evaluation contexts E as:

E ::= [ ] | (E e) | (v E)

An expression is either a value or uniquely decomposable into an evaluation context and redex. The standard reduction machine is:

E[e] ⟼βv E[e′], if e βv e′

However, this machine does not shed much light on a realistic implementation. At each step, the machine traverses the entire source of the program looking for a redex. When found, the redex is reduced and the contractum is plugged back in the hole, then the process is repeated.

Abstract machines such as the CEK machine, which are derivable from standard reduction machines, offer an extensionally equivalent but more realistic model of evaluation that is amenable to efficient implementation. The CEK is environment-based; it uses environments and closures to model substitution. It represents evaluation contexts as continuations, an inductive data structure that models contexts in an inside-out manner. The key idea of machines such as the CEK is that the whole program need not be traversed to find the next redex; consequently, the machine integrates the process of plugging a contractum into a context and finding the next redex.

States of the CEK machine [12] consist of a control string (an expression), an environment that closes the control string, and a continuation:

ς ∈ Σ = Exp × Env × Kont
v ∈ Val ::= (λx.e)
ρ ∈ Env = Var →fin Val × Env
κ ∈ Kont ::= mt | ar(e, ρ, κ) | fn(v, ρ, κ)

States are identified up to consistent renaming of bound variables. Environments are finite maps from variables to closures. Environment extension is written ρ[x ↦ (v, ρ′)].

Evaluation contexts E are represented (inside-out) by continuations as follows: [ ] is represented by mt; E[([ ] e)] is represented by ar(e′, ρ, κ) where ρ closes e′ to represent e and κ represents E; E[(v [ ])] is represented by fn(v′, ρ, κ) where ρ closes v′ to represent v and κ represents E.

The transition function for the CEK machine is defined in Figure 1 (we follow the textbook treatment of the CEK machine [11, page 102]):

ς ⟼CEK ς′

⟨x, ρ, κ⟩ ⟼ ⟨v, ρ′, κ⟩  where ρ(x) = (v, ρ′)
⟨(e0 e1), ρ, κ⟩ ⟼ ⟨e0, ρ, ar(e1, ρ, κ)⟩
⟨v, ρ, ar(e, ρ′, κ)⟩ ⟼ ⟨e, ρ′, fn(v, ρ, κ)⟩
⟨v, ρ, fn((λx.e), ρ′, κ)⟩ ⟼ ⟨e, ρ′[x ↦ (v, ρ)], κ⟩

Figure 1. The CEK machine.

The initial machine state for a closed expression e is given by the inj function:

inj_CEK(e) = ⟨e, ∅, mt⟩

Typically, an evaluation function is defined as a partial function from closed expressions to answers:

eval′_CEK(e) = (v, ρ)  if inj(e) ⟼→CEK ⟨v, ρ, mt⟩

This gives an extensional view of the machine, which is useful, e.g., to prove correctness with respect to a canonical evaluation function such as one defined by standard reduction or compositional valuation. However, for the purposes of program analysis, we are concerned more with the intensional aspects of the machine. As such, we define the meaning of a program as the (possibly infinite) set of reachable machine states:

eval_CEK(e) = {ς | inj(e) ⟼→CEK ς}

² Fine print on syntax: As is often the case in program analysis, where semantic values are approximated using syntactic phrases of the program under analysis, we would like to be able to distinguish different syntactic occurrences of otherwise identical expressions within a program. Informally, this means we want to track the source location of expressions. Formally, this is achieved by labeling expressions and assuming all labels within a program are distinct:

e ∈ Exp ::= xℓ | (e e)ℓ | (λx.e)ℓ
ℓ ∈ Lab  an infinite set of labels

However, we judiciously omit labels whenever they are irrelevant and doing so improves the clarity of the presentation. Consequently, they appear only in Sections 2.7 and 7, which are concerned with k-CFA.
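Rendered in executable form, the four transitions of Figure 1 fit in a small interpreter. The sketch below is our own Python rendering, not code from the paper: expressions are tagged tuples, and the mt/ar/fn frames of the Kont grammar become tuples as well.

```python
# A sketch of the CEK machine of Figure 1 (our own Python rendering).
# Expressions: ('var', x) | ('app', e0, e1) | ('lam', x, e)

def step(state):
    e, rho, k = state
    if e[0] == 'var':                       # <x, rho, k> --> <v, rho', k>
        v, rho2 = rho[e[1]]
        return (v, rho2, k)
    if e[0] == 'app':                       # push an ar frame, eval operator
        return (e[1], rho, ('ar', e[2], rho, k))
    if k[0] == 'ar':                        # value in hand: eval the operand
        _, e1, rho2, k2 = k
        return (e1, rho2, ('fn', e, rho, k2))
    if k[0] == 'fn':                        # value in hand: apply the function
        _, (_, x, body), rho2, k2 = k
        rho3 = dict(rho2)
        rho3[x] = (e, rho)                  # bind x to the argument closure
        return (body, rho3, k2)
    return None                             # <v, rho, mt>: final state

def inj(e):
    return (e, {}, ('mt',))

def eval_cek(e):
    s = inj(e)
    while (s2 := step(s)) is not None:
        s = s2
    return s[0], s[1]                       # the answer closure (v, rho)
```

Running eval_cek on ((λx.x) (λy.y)) yields the closure for λy.y in the empty environment; note that the continuation here is a recursive in-memory structure, which is exactly what the pointer refinement below removes.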
Deciding membership in the set of reachable machine states is not possible due to the halting problem. The goal of abstract interpretation, then, is to construct a function, âval_CEK, that is a sound and computable approximation to the eval_CEK function.

We can do this by constructing a machine that is similar in structure to the CEK machine: it is defined by an abstract state transition relation (⟼ĈEK) ⊆ Σ̂ × Σ̂, which operates over abstract states, Σ̂, which approximate the states of the CEK machine, and an abstraction map α : Σ → Σ̂ that maps concrete machine states into abstract machine states.

The abstract evaluation function is then defined as:

âval_CEK(e) = {ς̂ | α(inj(e)) ⟼→ĈEK ς̂}

1. We achieve decidability by constructing the approximation in such a way that the state-space of the abstracted machine is finite, which guarantees that for any closed expression e, the set âval(e) is finite.

2. We achieve soundness by demonstrating that the abstracted machine's transitions preserve the abstraction map, so that if ς ⟼ ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′ such that ς̂ ⟼ ς̂′ and α(ς′) ⊑ ς̂′.

A first attempt at abstract interpretation: A simple approach to abstracting the machine's state space is to apply a structural abstract interpretation, which lifts abstraction point-wise, element-wise, component-wise and member-wise across the structure of a machine state (i.e., expressions, environments, and continuations).

The problem with the structural abstraction approach for the CEK machine is that both environments and continuations are recursive structures. As a result, the map α yields objects in an abstract state-space with recursive structure, implying the space is infinite. It is possible to perform abstract interpretation over an infinite state-space, but it requires a widening operator. A widening operator accelerates the ascent up the lattice of approximation and must guarantee convergence. It is difficult to imagine a widening operator, other than the one that jumps immediately to the top of the lattice, for these semantics.

Focusing on recursive structure as the source of the problem, a reasonable course of action is to add a level of indirection to the recursion, forcing recursive structure to pass through explicitly allocated addresses. In doing so, we will unhinge recursion in a program's data structures and its control-flow from recursive structure in the state-space.

We turn our attention next to the CESK machine [10, 13], since the CESK machine eliminates recursion from one of the structures in the CEK machine: environments. In the subsequent section (Section 2.3), we will develop a CESK machine with a pointer refinement (CESK⋆) that eliminates the other source of recursive structure: continuations. At that point, the machine structurally abstracts via a single point of approximation: the store.

2.2 The CESK machine

The states of the CESK machine extend those of the CEK machine to include a store, which provides a level of indirection for variable bindings to pass through. The store is a finite map from addresses to storable values, and environments are changed to map variables to addresses. When a variable's value is looked up by the machine, it is now accomplished by using the environment to look up the variable's address, which is then used to look up the value. To bind a variable to a value, a fresh location in the store is allocated and mapped to the value; the environment is extended to map the variable to that address.

The state space for the CESK machine is defined as follows:

ς ∈ Σ = Exp × Env × Store × Kont
ρ ∈ Env = Var →fin Addr
σ ∈ Store = Addr →fin Storable
s ∈ Storable = Val × Env
a, b, c ∈ Addr  an infinite set

States are identified up to consistent renaming of bound variables and addresses. The transition function for the CESK machine is defined in Figure 2 (we follow the textbook treatment of the CESK machine [11, page 166]):

ς ⟼CESK ς′

⟨x, ρ, σ, κ⟩ ⟼ ⟨v, ρ′, σ, κ⟩  where σ(ρ(x)) = (v, ρ′)
⟨(e0 e1), ρ, σ, κ⟩ ⟼ ⟨e0, ρ, σ, ar(e1, ρ, κ)⟩
⟨v, ρ, σ, ar(e, ρ′, κ)⟩ ⟼ ⟨e, ρ′, σ, fn(v, ρ, κ)⟩
⟨v, ρ, σ, fn((λx.e), ρ′, κ)⟩ ⟼ ⟨e, ρ′[x ↦ a], σ[a ↦ (v, ρ)], κ⟩  where a ∉ dom(σ)

Figure 2. The CESK machine.

The initial state for a closed expression is given by the inj function, which combines the expression with the empty environment, store, and continuation:

inj_CESK(e) = ⟨e, ∅, ∅, mt⟩

The eval_CESK evaluation function is defined following the template of the CEK evaluation given in Section 2.1:

eval_CESK(e) = {ς | inj(e) ⟼→CESK ς}

Observe that for any closed expression, the CEK and CESK machines operate in lock-step: each machine transitions, by the corresponding rule, if and only if the other machine transitions.

Lemma 1 (Felleisen, [10]). eval_CESK(e) ≃ eval_CEK(e).

A second attempt at abstract interpretation: With the CESK machine, half the problem with the attempted naïve abstract interpretation is solved: environments and closures are no longer mutually recursive. Unfortunately, continuations still have recursive structure. We could crudely abstract a continuation into a set of frames, losing all sense of order, but this would lead to a static analysis lacking faculties to reason about return-flow: every call would appear to return to every other call. A better solution is to refactor continuations as we did environments, redirecting the recursive structure through the store. In the next section, we explore a CESK machine with a pointer refinement for continuations.
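Store-allocating bindings changes only the variable and application rules of the interpreter. The sketch below is our own Python rendering of Figure 2, with integer addresses and a naive fresh-address chooser standing in for "a ∉ dom(σ)"; it is an illustration, not the paper's code.

```python
# A sketch of the CESK machine of Figure 2 (our own Python rendering).
# Environments map variables to addresses; the store maps addresses
# to closures.

def fresh(store):
    return max(store, default=-1) + 1       # an unused integer address

def step_cesk(state):
    e, rho, store, k = state
    if e[0] == 'var':                       # two-level lookup: rho, then store
        v, rho2 = store[rho[e[1]]]
        return (v, rho2, store, k)
    if e[0] == 'app':
        return (e[1], rho, store, ('ar', e[2], rho, k))
    if k[0] == 'ar':
        _, e1, rho2, k2 = k
        return (e1, rho2, store, ('fn', e, rho, k2))
    if k[0] == 'fn':                        # bind x to a *fresh address*
        _, (_, x, body), rho2, k2 = k
        a = fresh(store)
        rho3 = dict(rho2); rho3[x] = a
        store2 = dict(store); store2[a] = (e, rho)
        return (body, rho3, store2, k2)
    return None                             # final state <v, rho, sigma, mt>
```

The environment is now a flat map from variables to addresses, so abstracting it no longer introduces recursion; only the continuation component remains recursive.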
2.3 The CESK⋆ machine

To untie the recursive structure associated with continuations, we shift to store-allocated continuations. The Kont component of the machine is replaced by a pointer to a continuation allocated in the store. We term the resulting machine the CESK⋆ (control, environment, store, continuation pointer) machine. Notice the store now maps to denotable values and continuations:

ς ∈ Σ = Exp × Env × Store × Addr
s ∈ Storable = Val × Env + Kont
κ ∈ Kont ::= mt | ar(e, ρ, a) | fn(v, ρ, a)

The revised machine is defined in Figure 3 and the initial machine state is defined as:

inj_CESK⋆(e) = ⟨e, ∅, [a0 ↦ mt], a0⟩
ς ⟼CESK⋆ ς′, where κ = σ(a), b ∉ dom(σ)

⟨x, ρ, σ, a⟩ ⟼ ⟨v, ρ′, σ, a⟩  where (v, ρ′) = σ(ρ(x))
⟨(e0 e1), ρ, σ, a⟩ ⟼ ⟨e0, ρ, σ[b ↦ ar(e1, ρ, a)], b⟩
⟨v, ρ, σ, a⟩ ⟼ ⟨e, ρ′, σ[b ↦ fn(v, ρ, c)], b⟩  if κ = ar(e, ρ′, c)
⟨v, ρ, σ, a⟩ ⟼ ⟨e, ρ′[x ↦ b], σ[b ↦ (v, ρ)], c⟩  if κ = fn((λx.e), ρ′, c)

Figure 3. The CESK⋆ machine.

The evaluation function (not shown) is defined along the same lines as those for the CEK (Section 2.1) and CESK (Section 2.2) machines. Like the CESK machine, it is easy to relate the CESK⋆ machine to its predecessor; from corresponding initial configurations, these machines operate in lock-step:

Lemma 2. eval_CESK⋆(e) ≃ eval_CESK(e).

Addresses, abstraction and allocation: The CESK⋆ machine, as defined in Figure 3, nondeterministically chooses addresses when it allocates a location in the store, but because machines are identified up to consistent renaming of addresses, the transition system remains deterministic.

Looking ahead, an easy way to bound the state-space of this machine is to bound the set of addresses.³ But once the store is finite, locations may need to be reused, and when multiple values are to reside in the same location, the store will have to soundly approximate this by joining the values.

³ A finite number of addresses leads to a finite number of environments, which leads to a finite number of closures and continuations, which in turn leads to a finite number of stores, and finally, a finite number of states.

In our concrete machine, all that matters about an allocation strategy is that it picks an unused address. In the abstracted machine, however, the strategy may have to re-use previously allocated addresses. The abstract allocation strategy is therefore crucial to the design of the analysis: it indicates when finite resources should be doled out and decides when information should deliberately be lost in the service of computing within bounded resources. In essence, the allocation strategy is the heart of an analysis (allocation strategies corresponding to well-known analyses are given in Section 2.7).

For this reason, concrete allocation deserves a bit more attention in the machine. An old idea in program analysis is that dynamically allocated storage can be represented by the state of the computation at allocation time [18, 22, Section 1.2.2]. That is, allocation strategies can be based on a (representation of) the machine history. These representations are often called time-stamps.

A common choice for a time-stamp, popularized by Shivers [29], is to represent the history of the computation as contours: finite strings encoding the calling context. We present a concrete machine that uses the general time-stamp approach and is parameterized by a choice of tick and alloc functions. We then instantiate tick and alloc to obtain an abstract machine for computing a k-CFA-style analysis using the contour approach.
2.4 The time-stamped CESK⋆ machine

The machine states of the time-stamped CESK⋆ machine include a time component, which is intentionally left unspecified:

t, u ∈ Time
ς ∈ Σ = Exp × Env × Store × Addr × Time

The machine is parameterized by the functions:

tick : Σ → Time    alloc : Σ → Addr

The tick function returns the next time; the alloc function allocates a fresh address for a binding or continuation. We require of tick and alloc that for all t and ς, t < tick(ς) and alloc(ς) ∉ dom(σ), where ς = ⟨_, _, σ, _, _⟩.

The time-stamped CESK⋆ machine is defined in Figure 4:

ς ⟼CESK⋆t ς′, where κ = σ(a), b = alloc(ς), u = tick(ς)

⟨x, ρ, σ, a, t⟩ ⟼ ⟨v, ρ′, σ, a, u⟩  where (v, ρ′) = σ(ρ(x))
⟨(e0 e1), ρ, σ, a, t⟩ ⟼ ⟨e0, ρ, σ[b ↦ ar(e1, ρ, a)], b, u⟩
⟨v, ρ, σ, a, t⟩ ⟼ ⟨e, ρ′, σ[b ↦ fn(v, ρ, c)], b, u⟩  if κ = ar(e, ρ′, c)
⟨v, ρ, σ, a, t⟩ ⟼ ⟨e, ρ′[x ↦ b], σ[b ↦ (v, ρ)], c, u⟩  if κ = fn((λx.e), ρ′, c)

Figure 4. The time-stamped CESK⋆ machine.

Note that occurrences of ς on the right-hand side of this definition are implicitly bound to the state occurring on the left-hand side. The initial machine state is defined as:

inj_CESK⋆t(e) = ⟨e, ∅, [a0 ↦ mt], a0, t0⟩

Satisfying definitions for the parameters are:

Time = Addr = Z
a0 = t0 = 0
tick⟨_, _, _, _, t⟩ = t + 1
alloc⟨_, _, _, _, t⟩ = t

Under these definitions, the time-stamped CESK⋆ machine operates in lock-step with the CESK⋆ machine, and therefore with the CESK and CEK machines as well.

Lemma 3. eval_CESK⋆t(e) ≃ eval_CESK⋆(e).

The time-stamped CESK⋆ machine forms the basis of our abstracted machine in the following section.
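Under the integer instantiation, the time-stamped machine is directly executable. In this Python sketch of ours (not the paper's code), the store holds both bindings and continuation frames; the one liberty taken is starting the continuation pointer at −1 rather than 0, so that the allocated address b = t never collides with the initial address.

```python
# A sketch of the time-stamped CESK* machine under Time = Addr = Z,
# tick = t + 1, alloc = t (our own Python rendering; a0 = -1 is our
# adjustment so alloc never collides with the initial address).

def step_ceskt(state):
    e, rho, store, a, t = state
    k = store[a]                            # dereference the continuation pointer
    b, u = t, t + 1                         # b = alloc(state), u = tick(state)
    if e[0] == 'var':
        v, rho2 = store[rho[e[1]]]
        return (v, rho2, store, a, u)
    if e[0] == 'app':                       # store-allocate an ar frame at b
        store2 = dict(store); store2[b] = ('ar', e[2], rho, a)
        return (e[1], rho, store2, b, u)
    if k[0] == 'ar':                        # store-allocate an fn frame at b
        _, e1, rho2, c = k
        store2 = dict(store); store2[b] = ('fn', e, rho, c)
        return (e1, rho2, store2, b, u)
    if k[0] == 'fn':                        # bind x at b, return to caller's c
        _, (_, x, body), rho2, c = k
        rho3 = dict(rho2); rho3[x] = b
        store2 = dict(store); store2[b] = (e, rho)
        return (body, rho3, store2, c, u)
    return None                             # continuation is mt: final state

def inj_ceskt(e):
    return (e, {}, {-1: ('mt',)}, -1, 0)    # <e, {}, [a0 -> mt], a0, t0>
```

Nothing recursive remains in a state: environments map to integers, frames hold integer pointers, and the time is an integer. The only unbounded component is the store.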
2.5 The abstract time-stamped CESK⋆ machine

As alluded to earlier, with the time-stamped CESK⋆ machine, we now have a machine ready for direct abstract interpretation via a single point of approximation: the store. Our goal is a machine that resembles the time-stamped CESK⋆ machine, but operates over a finite state-space and is allowed to be nondeterministic. Once the state-space is finite, the transitive closure of the transition relation becomes computable, and this transitive closure constitutes a static analysis. Buried in a path through the transitive closure is a (possibly infinite) traversal that corresponds to the concrete execution of the program.

The abstracted variant of the time-stamped CESK⋆ machine comes from bounding the address space of the store and the number of times available. By bounding these sets, the state-space becomes finite,⁴ but for the purposes of soundness, an entry in the store may be forced to hold several values simultaneously:

σ̂ ∈ Ŝtore = Addr →fin P(Storable)

⁴ Syntactic sets like Exp are infinite, but finite for any given program.

Hence, stores now map an address to a set of storable values rather than a single value. These collections of values model approximation in the analysis. If a location in the store is re-used, the new value is joined with the current set of values. When a location is dereferenced, the analysis must consider any of the values in the set as a result of the dereference.

The abstract time-stamped CESK⋆ machine is defined in Figure 5:

ς̂ ⟼ĈESK⋆t ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(ς̂, κ)

⟨x, ρ, σ̂, a, t⟩ ⟼ ⟨v, ρ′, σ̂, a, u⟩  where (v, ρ′) ∈ σ̂(ρ(x))
⟨(e0 e1), ρ, σ̂, a, t⟩ ⟼ ⟨e0, ρ, σ̂ ⊔ [b ↦ ar(e1, ρ, a)], b, u⟩
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ′, σ̂ ⊔ [b ↦ fn(v, ρ, c)], b, u⟩  if κ = ar(e, ρ′, c)
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ′[x ↦ b], σ̂ ⊔ [b ↦ (v, ρ)], c, u⟩  if κ = fn((λx.e), ρ′, c)

Figure 5. The abstract time-stamped CESK⋆ machine.

The (non-deterministic) abstract transition relation changes little compared with the concrete machine. We only have to modify it to account for the possibility that multiple storable values (which include continuations) may reside together in the store, which we handle by letting the machine non-deterministically choose a particular value from the set at a given store location.

The analysis is parameterized by abstract variants of the functions that parameterized the concrete version:

t̂ick : Σ̂ × Kont → Time    âlloc : Σ̂ × Kont → Addr

In the concrete, these parameters determine allocation and stack behavior. In the abstract, they are the arbiters of precision: they determine when an address gets re-allocated, how many addresses get allocated, and which values have to share addresses.

Recall that in the concrete semantics, these functions consume states, not states and continuations as they do here. This is because in the concrete, a state alone suffices, since the state determines the continuation. But in the abstract, a continuation pointer within a state may denote a multitude of continuations, while the transition relation is defined with respect to the choice of a particular one. We thus pair states with continuations to encode the choice.

The abstract semantics computes the set of reachable states:

âval_ĈESK⋆t(e) = {ς̂ | ⟨e, ∅, [a0 ↦ mt], a0, t0⟩ ⟼→ĈESK⋆t ς̂}

2.6 Soundness and computability

The finiteness of the abstract state-space ensures decidability.

Theorem 1 (Decidability of the Abstract CESK⋆ Machine). ς̂ ∈ âval_ĈESK⋆t(e) is decidable.

Proof. The state-space of the machine is non-recursive with finite sets at the leaves, on the assumption that addresses are finite. Hence reachability is decidable, since the abstract state-space is finite.

We have endeavored to evolve the abstract machine gradually so that its fidelity in soundly simulating the original CEK machine is both intuitive and obvious. But to formally establish soundness of the abstract time-stamped CESK⋆ machine, we use an abstraction function, defined in Figure 6, from the state-space of the concrete time-stamped machine into the abstracted state-space:

α(e, ρ, σ, a, t) = (e, α(ρ), α(σ), α(a), α(t))  [states]
α(ρ) = λx.α(ρ(x))  [environments]
α(σ) = λâ. ⊔_{α(a)=â} {α(σ(a))}  [stores]
α((λx.e), ρ) = ((λx.e), α(ρ))  [closures]
α(mt) = mt  [continuations]
α(ar(e, ρ, a)) = ar(e, α(ρ), α(a))
α(fn(v, ρ, a)) = fn(v, α(ρ), α(a))

Figure 6. The abstraction map, α : Σ_CESK⋆t → Σ̂_ĈESK⋆t.

The abstraction map over times and addresses is defined so that the parameters âlloc and t̂ick are sound simulations of the parameters alloc and tick, respectively. We also define the partial order (⊑) on the abstract state-space as the natural point-wise, element-wise, component-wise and member-wise lifting, wherein the partial orders on the sets Exp and Addr are flat. Then, we can prove that the abstract machine's transition relation simulates the concrete machine's transition relation.

Theorem 2 (Soundness of the Abstract CESK⋆ Machine). If ς ⟼CEK ς′ and α(ς) ⊑ ς̂, then there exists an abstract state ς̂′ such that ς̂ ⟼ĈESK⋆t ς̂′ and α(ς′) ⊑ ς̂′.

Proof. By Lemmas 1, 2, and 3, it suffices to prove soundness with respect to ⟼CESK⋆t. Assume ς ⟼CESK⋆t ς′ and α(ς) ⊑ ς̂. Because ς transitioned, exactly one of the rules from the definition of (⟼CESK⋆t) applies. We split by cases on these rules. The rule for the second case is deterministic and follows by calculation. For the remaining (nondeterministic) cases, we must show an abstract state exists such that the simulation is preserved. By examining the rules for these cases, we see that all three hinge on the abstract store in ς̂ soundly approximating the concrete store in ς, which follows from the assumption that α(ς) ⊑ ς̂.

2.7 A k-CFA-like abstract CESK⋆ machine

In this section, we instantiate the time-stamped CESK⋆ machine to obtain a contour-based machine; this instantiation forms the basis of a context-sensitive abstract interpreter with polyvariance like that found in k-CFA [29]. In preparation for abstraction, we instantiate the time-stamped machine using labeled call strings.

Inside times, we use contours (Contour), which are finite strings of call site labels that describe the current context:

δ ∈ Contour ::= ε | ℓδ

The labeled CESK machine transition relation must appropriately instantiate the parameters tick and alloc to augment the time-stamp on function call. Next, we switch to abstract stores and bound the address space by truncating call string contours to length at most k (for k-CFA):

δ ∈ Ĉontour_k iff δ ∈ Contour and |δ| ≤ k

Combining these changes, we arrive at the instantiations for the concrete and abstract machines given in Figure 7, where the value ⌊δ⌋_k is the leftmost k labels of contour δ.

Comparison to k-CFA: We say "k-CFA-like" rather than "k-CFA" because there are distinctions between the machine just described and k-CFA:

1. k-CFA focuses on "what flows where"; the ordering between states in the abstract transition graph produced by our machine produces "what flows where and when."

2. Standard presentations of k-CFA implicitly inline a global approximation of the store into the algorithm [29]; ours uses one store per state to increase precision at the cost of complexity. In terms of our framework, the lattice through which classical k-CFA ascends is P(Exp × Env × Addr) × Ŝtore, whereas our analysis ascends the lattice P(Exp × Env × Ŝtore × Addr). We can explicitly inline the store to achieve the same complexity, as shown in Section 7.

3. On function call, k-CFA merges argument values together with previous instances of those arguments from the same context; our "minimalist" evolution of the abstract machine takes a higher-precision approach: it forks the machine for each argument value, rather than merging them immediately.
4. k-CFA does not recover explicit information about stack structure; our machine contains an explicit model of the stack for every machine state.

The instantiations for the concrete and abstract machines are given in Figure 7:

Time = (Lab + •) × Contour
Addr = (Lab + Var) × Contour
t0 = (•, ε)

tick⟨x, _, _, _, t⟩ = t
tick⟨(e0 e1)ℓ, _, _, _, (_, δ)⟩ = (ℓ, δ)
tick⟨v, _, σ, a, (ℓ, δ)⟩ = (ℓ, δ),  if σ(a) = ar(_, _, _)
tick⟨v, _, σ, a, (ℓ, δ)⟩ = (•, ℓδ),  if σ(a) = fn(_, _, _)

alloc⟨(e0ℓ e1), _, _, _, (_, δ)⟩ = (ℓ, δ)
alloc⟨v, _, σ, a, (_, δ)⟩ = (ℓ, δ)  if σ(a) = ar(eℓ, _, _)
alloc⟨v, _, σ, a, (_, δ)⟩ = (x, δ)  if σ(a) = fn((λx.e), _, _)

t̂ick(⟨x, _, _, _, t⟩, κ) = t
t̂ick(⟨(e0 e1)ℓ, _, _, _, (_, δ)⟩, κ) = (ℓ, δ)
t̂ick(⟨v, _, σ̂, a, (ℓ, δ)⟩, κ) = (ℓ, δ),  if κ = ar(_, _, _)
t̂ick(⟨v, _, σ̂, a, (ℓ, δ)⟩, κ) = (•, ⌊ℓδ⌋_k),  if κ = fn(_, _, _)

âlloc(⟨(e0ℓ e1), _, _, _, (_, δ)⟩, κ) = (ℓ, δ)
âlloc(⟨v, _, σ̂, a, (_, δ)⟩, κ) = (ℓ, δ)  if κ = ar(eℓ, _, _)
âlloc(⟨v, _, σ̂, a, (_, δ)⟩, κ) = (x, δ)  if κ = fn((λx.e), _, _)

Figure 7. Instantiation for the k-CFA machine.

3. Analyzing by-need with Krivine's machine

Even though the abstract machines of the prior section have advantages over traditional CFAs, the approach we took (store-allocated continuations) yields more novel results when applied in a different context: a lazy variant of Krivine's machine. That is, we can construct an abstract interpreter that both analyzes and exploits laziness. Specifically, we present an abstract analog to a lazy and properly tail-recursive variant of Krivine's machine [19, 20] derived by Ager, Danvy, and Midtgaard [1]. The derivation from Ager et al.'s machine to the abstract interpreter follows the same outline as that of Section 2: we apply a pointer refinement by store-allocating continuations and carry out approximation by bounding the store.

The by-need variant of Krivine's machine considered here uses the common implementation technique of store-allocating thunks and forced values. When an application is evaluated, a thunk is created that will compute the value of the argument when forced. A c2 continuation represents the context of an application expression, which forces the operator expression to a value; the address a it carries is the address of the argument.

The concrete state-space is defined as follows and the transition relation is defined in Figure 8:

ς ∈ Σ = Exp × Env × Store × Kont
s ∈ Storable ::= d(e, ρ) | c(v, ρ)
κ ∈ Kont ::= mt | c1(a, κ) | c2(a, κ)

ς ⟼LK ς′

⟨x, ρ, σ, κ⟩ ⟼ ⟨e, ρ′, σ, c1(ρ(x), κ)⟩  if σ(ρ(x)) = d(e, ρ′)
⟨x, ρ, σ, κ⟩ ⟼ ⟨v, ρ′, σ, κ⟩  if σ(ρ(x)) = c(v, ρ′)
⟨(e0 e1), ρ, σ, κ⟩ ⟼ ⟨e0, ρ, σ[a ↦ d(e1, ρ)], c2(a, κ)⟩  where a ∉ dom(σ)
⟨v, ρ, σ, c1(a, κ)⟩ ⟼ ⟨v, ρ, σ[a ↦ c(v, ρ)], κ⟩
⟨(λx.e), ρ, σ, c2(a, κ)⟩ ⟼ ⟨e, ρ[x ↦ a], σ, κ⟩

Figure 8. The LK machine.

When the control component is a variable, the machine looks up its stored value, which is either computed or delayed. If delayed, a c1 continuation is pushed and the frozen expression is put in control. If computed, the value is simply returned. When a value is returned to a c1 continuation, the store is updated to reflect the computed value. When a value is returned to a c2 continuation, its body is put in control and the formal parameter is bound to the address of the argument.

We now refactor the machine to use store-allocated continuations; storable values are extended to include continuations:

ς ∈ Σ = Exp × Env × Store × Addr
s ∈ Storable ::= d(e, ρ) | c(v, ρ) | κ

Bounding the store then yields the abstract LK⋆ machine of Figure 9:

ς̂ ⟼L̂K⋆ ς̂′, where κ ∈ σ̂(a), b = âlloc(ς̂, κ), u = t̂ick(ς̂, κ)

⟨x, ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ′, σ̂ ⊔ [b ↦ c1(ρ(x), a)], b, u⟩  if σ̂(ρ(x)) ∋ d(e, ρ′)
⟨x, ρ, σ̂, a, t⟩ ⟼ ⟨v, ρ′, σ̂, a, u⟩  if σ̂(ρ(x)) ∋ c(v, ρ′)
⟨(e0 e1), ρ, σ̂, a, t⟩ ⟼ ⟨e0, ρ, σ̂′, b, u⟩  where c = âlloc(ς̂, κ), σ̂′ = σ̂ ⊔ [c ↦ d(e1, ρ), b ↦ c2(c, a)]
⟨v, ρ, σ̂, a, t⟩ ⟼ ⟨v, ρ, σ̂ ⊔ [a′ ↦ c(v, ρ)], c, u⟩  if κ = c1(a′, c)
⟨(λx.e), ρ, σ̂, a, t⟩ ⟼ ⟨e, ρ[x ↦ a′], σ̂, c, u⟩  if κ = c2(a′, c)

Figure 9. The abstract LK⋆ machine.
When a variable occurrence is evaluated, if it is bound to a thunk, κ ∈ Kont ::= mt | c1 (a, a) | c2 (a, a).
the thunk is forced (evaluated) and the store is updated to the result.
Otherwise if a variable occurrence is evaluated and bound to a It is straightforward to perform a pointer-refinement of the LK ma-
forced value, that value is returned. chine to store-allocate continuations as done for the CESK machine
Storable values include delayed computations (thunks) d(e, ρ), in Section 2.3 and observe the lazy variant of Krivine’s machine and
and computed values c(v, ρ), which are just tagged closures. There its pointer-refined counterpart (not shown) operate in lock-step:
are two continuation constructors: c1 (a, κ) is induced by a variable Lemma 4. eval LK (e) ≃ eval LK ⋆ (e).
occurrence whose binding has not yet been forced to a value.
The address a is where we want to write the given value when After threading time-stamps through the machine as done in
this continuation is invoked. The other: c2 (a, κ) is induced by an Section 2.4 and defining tick [ analogously to the defi-
d and alloc
nitions given in Section 2.5, the pointer-refined machine abstracts
ςˆ 7−→LK [ ς , κ), u = tick(ˆ
ˆ′ , where κ ∈ σ̂(a), b = alloc(ˆ
′⋆ ς
d ς , κ)
directly to yield the abstract LK⋆ machine in Figure 9. \
The abstraction map for this machine is a straightforward struc-
tural abstraction similar to that given in Section 2.6 (and hence h(e0 e1 ), ρ, σ̂, ai he0 , ρ, σ̂ ⊔ [b 7→ c2 (e1 , ρ, a)], bi
omitted). The abstracted machine is sound with respect to the LK⋆
machine, and therefore the original LK machine. h(λx.e), ρ, σ̂, ai he, ρ[x 7→ b], σ̂ ⊔ [b 7→ d(e′ , ρ′ )], ci
if κ = c2 (e′ , ρ′ , c)
Theorem 3 (Soundness of the Abstract LK⋆ Machine).
If ς 7−→LK ς ′ and α(ς) ⊑ ςˆ, then there exists an abstract state ςˆ′ ,
Figure 11. The abstract thunk postponing LK⋆ machine.
such that ςˆ 7−→LK
[ ˆ′ and α(ς ′ ) ⊑ ςˆ′ .
⋆ ς
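The LK transitions of Figure 8 can be transcribed almost literally into a small-step interpreter. The sketch below is ours, not the paper’s: it assumes terms encoded as tuples ('var', x), ('lam', x, body), ('app', e0, e1), integer store addresses, and Python dictionaries for environments and stores; thunks and computed values are the tagged entries 'd' and 'c'.

```python
# Terms:         ('var', x) | ('lam', x, body) | ('app', e0, e1)
# Store entries: ('d', e, rho) for thunks, ('c', v, rho) for computed values
# Continuations: ('mt',) | ('c1', a, k) | ('c2', a, k)

def step(state):
    e, rho, sigma, kont = state
    if e[0] == 'var':
        s = sigma[rho[e[1]]]
        if s[0] == 'd':                      # delayed: push c1 and force the thunk
            return (s[1], s[2], sigma, ('c1', rho[e[1]], kont))
        return (s[1], s[2], sigma, kont)     # computed: return the value
    if e[0] == 'app':                        # allocate a fresh address for the argument thunk
        a = len(sigma)
        sigma2 = dict(sigma); sigma2[a] = ('d', e[2], dict(rho))
        return (e[1], rho, sigma2, ('c2', a, kont))
    if kont[0] == 'c1':                      # value returned to c1: memoize it at address a
        _, a, k = kont
        sigma2 = dict(sigma); sigma2[a] = ('c', e, dict(rho))
        return (e, rho, sigma2, k)
    if kont[0] == 'c2':                      # value returned to c2: bind formal to argument address
        _, a, k = kont
        rho2 = dict(rho); rho2[e[1]] = a
        return (e[2], rho2, sigma, k)
    return None                              # <v, rho, sigma, mt> is a final state

def run(e):
    state = (e, {}, {}, ('mt',))
    while True:
        nxt = step(state)
        if nxt is None:
            return state[0]
        state = nxt

identity = ('lam', 'y', ('var', 'y'))
prog = ('app', ('lam', 'x', ('var', 'x')), identity)
```

Here fresh allocation is simply `len(sigma)`, which suffices because addresses are never removed; any injective fresh-address policy would do. Running `run(prog)` forces the delayed argument and memoizes it at the argument’s address, mirroring the c1 rule.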
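Bounding the store then turns the same transition structure into an analysis in the style of Figure 9. The sketch below is a hypothetical monovariant instance, not the paper’s definition: time-stamps are dropped, addresses are drawn from the finite set of variable names and subterm labels (a fixed âlloc policy), store entries are joined as sets rather than overwritten, and a single widened global store is iterated to a fixpoint. The function name `analyze` and the `('k', 'halt')` address are invented for the example.

```python
# Terms carry a unique integer label as their last component:
# ('var', x, l) | ('lam', x, body, l) | ('app', e0, e1, l)

def analyze(prog):
    store = {}                                   # address -> frozenset of storables

    def join(a, s):                              # monotone store update
        old = store.get(a, frozenset())
        store[a] = old | {s}
        return store[a] != old

    halt = ('k', 'halt')                         # invented address holding the empty continuation
    join(halt, ('mt',))
    init = (prog, frozenset(), halt)             # state = <control, env, continuation address>
    states = {init}
    changed = True
    while changed:                               # iterate all states to a fixpoint
        changed = False
        for (e, env, ka) in list(states):
            rho, succs = dict(env), []
            if e[0] == 'var':
                for s in store.get(rho[e[1]], frozenset()):
                    if s[0] == 'd':              # delayed: push a c1 frame, force the thunk
                        b = ('k', e[2])
                        changed |= join(b, ('c1', rho[e[1]], ka))
                        succs.append((s[1], s[2], b))
                    elif s[0] == 'c':            # computed: return the value
                        succs.append((s[1], s[2], ka))
            elif e[0] == 'app':                  # delay the argument, push a c2 frame
                c, b = ('t', e[2][-1]), ('k', e[-1])
                changed |= join(c, ('d', e[2], env))
                changed |= join(b, ('c2', c, ka))
                succs.append((e[1], env, b))
            elif e[0] == 'lam':                  # a value: dispatch on every continuation at ka
                for k in store.get(ka, frozenset()):
                    if k[0] == 'c1':             # memoize the forced value
                        changed |= join(k[1], ('c', e, env))
                        succs.append((e, env, k[2]))
                    elif k[0] == 'c2':           # bind the formal to the argument's address
                        env2 = frozenset({(e[1], k[1])} | {p for p in env if p[0] != e[1]})
                        succs.append((e[2], env2, k[2]))
            for s2 in succs:
                if s2 not in states:
                    states.add(s2)
                    changed = True
    return states, store

# ((lambda x. x) (lambda y. y)), with a label on every subterm
idy = ('lam', 'y', ('var', 'y', 4), 3)
prog = ('app', ('lam', 'x', ('var', 'x', 2), 1), idy, 0)
```

Because the address space is finite, the state-space is finite and the loop terminates; soundness in the sense of Theorem 3 is then the claim that every concrete LK⋆ trace is covered by some path through these abstract states.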
[1] Mads S. Ager, Olivier Danvy, and Jan Midtgaard. A functional correspondence between call-by-need evaluators and lazy abstract machines. Information Processing Letters, 90(5):223–232, June 2004.
[2] Andrew E. Ayers. Abstract analysis and optimization of Scheme. PhD thesis, Massachusetts Institute of Technology, 1993.
[3] John Clements and Matthias Felleisen. A tail-recursive machine with stack inspection. ACM Trans. Program. Lang. Syst., 26(6):1029–1052, November 2004.
[4] John Clements, Matthew Flatt, and Matthias Felleisen. Modeling an algebraic stepper. In ESOP ’01: Proceedings of the 10th European Symposium on Programming Languages and Systems, pages 320–334, 2001.
[5] Patrick Cousot. The calculational design of a generic abstract interpreter. In M. Broy and R. Steinbrüggen, editors, Calculational System Design. 1999.
[6] Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, pages 238–252, 1977.
[7] Patrick Cousot and Radhia Cousot. Systematic design of program analysis frameworks. In POPL ’79: Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 269–282, 1979.
[8] Olivier Danvy. An Analytical Approach to Programs as Data Objects. DSc thesis, Department of Computer Science, Aarhus University, October 2006.
[9] Karl Faxén. Optimizing lazy functional programs using flow inference. In Static Analysis, pages 136–153. 1995.
[10] Matthias Felleisen. The Calculi of Lambda-v-CS Conversion: A Syntactic Theory of Control and State in Imperative Higher-Order Programming Languages. PhD thesis, Indiana University, 1987.
[11] Matthias Felleisen, Robert B. Findler, and Matthew Flatt. Semantics Engineering with PLT Redex. August 2009.
[12] Matthias Felleisen and Daniel P. Friedman. Control operators, the SECD-machine, and the lambda-calculus. In 3rd Working Conference on the Formal Description of Programming Concepts, August 1986.
[21] Peter J. Landin. The mechanical evaluation of expressions. The Computer Journal, 6(4):308–320, 1964.
[22] Jan Midtgaard. Control-flow analysis of functional programs. Technical Report BRICS RS-07-18, DAIMI, Department of Computer Science, University of Aarhus, December 2007. To appear in revised form in ACM Computing Surveys.
[23] Jan Midtgaard and Thomas Jensen. A calculational approach to control-flow analysis by abstract interpretation. In María Alpuente and Germán Vidal, editors, SAS, volume 5079 of Lecture Notes in Computer Science, pages 347–362, 2008.
[24] Jan Midtgaard and Thomas P. Jensen. Control-flow analysis of function calls and returns by abstract interpretation. In ICFP ’09: Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming, pages 287–298, 2009.
[25] Matthew Might and Olin Shivers. Improving flow analyses via ΓCFA: Abstract garbage collection and counting. In ICFP ’06: Proceedings of the Eleventh ACM SIGPLAN International Conference on Functional Programming, pages 13–25, 2006.
[26] François Pottier, Christian Skalka, and Scott Smith. A systematic approach to static access control. ACM Trans. Program. Lang. Syst., 27(2):344–382, March 2005.
[27] Peter Sestoft. Analysis and efficient implementation of functional programs. PhD thesis, University of Copenhagen, October 1991.
[28] Zhong Shao and Andrew W. Appel. Space-efficient closure representations. In LFP ’94: Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pages 150–161, 1994.
[29] Olin G. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, 1991.
[30] Christian Skalka and Scott Smith. Static enforcement of security with types. In ICFP ’00: Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pages 34–45, September 2000.
[31] Christian Skalka, Scott Smith, and David Van Horn. Types and trace effects of higher order programs. Journal of Functional Programming, 18(02):179–249, 2008.