
Compactly Representing First-Order Structures for Static Analysis

Tel-Aviv University
Roman Manevich, Mooly Sagiv

IBM T.J. Watson
Ganesan Ramalingam, John Field, Deepak Goyal

Motivation

TVLA is a powerful and general abstract interpretation system
Abstract interpretation in TVLA
  Operational semantics is expressed with first-order logic formulae
  Program states are represented as sets of Evolving First-Order Structures

Space is a major bottleneck

Desired Properties

Sparse data structures

Share common sub-structures

Inherited sharing
Incidental sharing due to program invariants

But feasible time performance

Phase sensitive data structures

Outline

Background
First-order structure representations

Base representation (TVLA 0.91)


BDD representation

Empirical evaluation
Conclusion

First-Order Logical Structures

Generalize shape graphs


Arbitrary set of individuals
Arbitrary set of predicates on individuals
Dynamically evolving

Usually small changes

Properties are extracted by evaluating first-order formulae: ∃v1, v: x(v1) ∧ n(v1, v)
Join operator requires isomorphism testing
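
Evaluating a formula such as ∃v1, v: x(v1) ∧ n(v1, v) over a 3-valued structure can be sketched as follows. This is a minimal illustration, not TVLA's evaluator: it assumes Kleene values ordered 0 < 1/2 < 1, so that conjunction is a minimum and existential quantification is a maximum over all nodes. The names `Structure3`, `MAX_NODES`, and `eval_exists_x_n` are hypothetical.

```c
#include <assert.h>

/* Kleene truth values, ordered so that AND is min and OR is max. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

enum { MAX_NODES = 8 };

/* Hypothetical structure with one unary predicate x and one binary predicate n. */
typedef struct {
    int size;                          /* number of individuals */
    Kleene x[MAX_NODES];
    Kleene n[MAX_NODES][MAX_NODES];
} Structure3;

static Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
static Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }

/* Evaluates the closed formula: exists v1, v: x(v1) AND n(v1, v). */
Kleene eval_exists_x_n(const Structure3 *s) {
    Kleene result = K_FALSE;
    for (int v1 = 0; v1 < s->size; v1++)
        for (int v = 0; v < s->size; v++)
            result = k_or(result, k_and(s->x[v1], s->n[v1][v]));
    return result;
}
```

With x definite on one node whose only outgoing n edge has value 1/2, the formula evaluates to 1/2: the structure cannot say whether x has a successor.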

First-Order Structure ADT

Structure : new() /* empty structure */
SetOfNodes : nodeSet(Structure)
Node : newNode(Structure)
removeNode(Structure, node)
Kleene : eval(Structure, p(r), <u1, ..., ur>)
update(Structure, p(r), <u1, ..., ur>, Kleene)
Structure : copy(Structure)
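
The ADT above could be realized, in the simplest case, with dense arrays. The sketch below is only an illustration under stated assumptions: at most `MAX_NODES` nodes, predicates of arity 1 or 2 only, and separate index spaces for unary and binary predicates. All `st_*` names are hypothetical, and this is not how TVLA stores structures.

```c
#include <assert.h>
#include <string.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;
enum { MAX_NODES = 8, MAX_PREDS = 4 };

typedef struct {
    int alive[MAX_NODES];                           /* nodeSet: 1 if the node exists */
    Kleene unary[MAX_PREDS][MAX_NODES];             /* p(u) */
    Kleene binary[MAX_PREDS][MAX_NODES][MAX_NODES]; /* p(u1, u2) */
} Structure;

/* new(): empty structure, every predicate is 0 everywhere. */
Structure st_new(void) {
    Structure s;
    memset(&s, 0, sizeof s);
    return s;
}

/* newNode(): returns the first free node index, or -1 if full. */
int st_new_node(Structure *s) {
    for (int u = 0; u < MAX_NODES; u++)
        if (!s->alive[u]) { s->alive[u] = 1; return u; }
    return -1;
}

/* removeNode(): drops the node and every predicate value mentioning it. */
void st_remove_node(Structure *s, int u) {
    s->alive[u] = 0;
    for (int p = 0; p < MAX_PREDS; p++) {
        s->unary[p][u] = K_FALSE;
        for (int w = 0; w < MAX_NODES; w++)
            s->binary[p][u][w] = s->binary[p][w][u] = K_FALSE;
    }
}

/* eval(): value of predicate p (of the given arity) on tuple t. */
Kleene st_eval(const Structure *s, int p, int arity, const int *t) {
    assert(arity == 1 || arity == 2);
    return arity == 1 ? s->unary[p][t[0]] : s->binary[p][t[0]][t[1]];
}

/* update(): sets predicate p on tuple t to k. */
void st_update(Structure *s, int p, int arity, const int *t, Kleene k) {
    assert(arity == 1 || arity == 2);
    if (arity == 1) s->unary[p][t[0]] = k;
    else s->binary[p][t[0]][t[1]] = k;
}

/* copy(): a full, unshared copy. */
Structure st_copy(const Structure *s) { return *s; }
```

A dense layout like this copies every table on `st_copy`, which is exactly the space cost that sparse and shared representations aim to avoid.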

print_all Example
/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;

/* print.c */
#include <stdio.h>
#include "list.h"

void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}

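print_all Example

x = y
x(v) := y(v)
copy(S0) : S1
nodeSet(S0) : {u1, u}
eval(S0, y, u1) : 1
update(S1, x, u1, 1)
eval(S0, y, u) : 0
update(S1, x, u, 0)

[Figure: structure S0 with node u1 (y=1) and summary node u (sm=1/2), connected by n edges of value 1/2; structure S1 is the same with x=1 added on u1]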
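
The ADT call sequence for x = y amounts to copying the input structure and then, for each node, reading y in the old structure and writing x in the new one. A minimal sketch, assuming a hypothetical structure type `PtrStructure` that holds only the unary predicates x and y:

```c
#include <assert.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;
enum { MAX_NODES = 8 };

/* Hypothetical structure restricted to the two pointer predicates. */
typedef struct {
    int size;                 /* number of nodes */
    Kleene x[MAX_NODES];
    Kleene y[MAX_NODES];
} PtrStructure;

/* Applies the update x(v) := y(v) and returns the new structure. */
PtrStructure assign_x_y(const PtrStructure *s0) {
    PtrStructure s1 = *s0;                  /* copy(S0) : S1 */
    for (int u = 0; u < s0->size; u++)      /* iterate over nodeSet(S0) */
        s1.x[u] = s0->y[u];                 /* eval(S0, y, u), then update(S1, x, u, ...) */
    return s1;
}
```

Reading from S0 and writing to S1 keeps the update well defined even when the right-hand side mentions the predicate being assigned.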

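print_all Example

while (x != NULL)
precondition : ∃v x(v)
x = x->n
focus : ∃v1 x(v1) ∧ n(v1, v)
x(v) := ∃v1 x(v1) ∧ n(v1, v)

[Figure: structure S1 and the three resulting structures. S2.0: u1 (y=1) and summary node u (sm=1/2). S2.1: u1 (y=1), n(u1, u)=1, u (x=1). S2.2: u1 (y=1), n(u1, u.1)=1, u.1 (x=1), summary node u.0 (sm=1/2); remaining n edges have value 1/2]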
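
The update x(v) := ∃v1 x(v1) ∧ n(v1, v) for x = x->n can be sketched as below, again with Kleene conjunction as minimum and ∃ as maximum. `ListStructure` and `advance_x` are hypothetical names, and the edge values used to exercise the code are an illustrative reading of the figure, not data taken from TVLA.

```c
#include <assert.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;
enum { MAX_NODES = 8 };

/* Hypothetical structure with unary predicate x and binary predicate n. */
typedef struct {
    int size;
    Kleene x[MAX_NODES];
    Kleene n[MAX_NODES][MAX_NODES];
} ListStructure;

static Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
static Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }

/* Applies x(v) := exists v1: x(v1) AND n(v1, v).
   The old x is read from `in`; the new x is written to the copy. */
ListStructure advance_x(const ListStructure *in) {
    ListStructure out = *in;
    for (int v = 0; v < in->size; v++) {
        Kleene val = K_FALSE;
        for (int v1 = 0; v1 < in->size; v1++)
            val = k_or(val, k_and(in->x[v1], in->n[v1][v]));
        out.x[v] = val;
    }
    return out;
}
```

Applied to an unfocused structure whose n edge out of the x node has value 1/2, the new x is 1/2 on the summary node. Applied after focus, where that edge is definite, x moves to exactly one node, which is why focus precedes the update.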

Overview and Main Results


1. Two novel representations of first-order structures
   New BDD representation
   New representation using functional maps
2. Implementation techniques
3. Empirical evaluation
   Comparison of different representations
   Space is reduced by a factor of 4 to 10
   New representations scale better

Base Representation
(Tal Lev-Ami SAS 2000)

Two-Level Map: Predicate → (Node Tuple → Kleene)
Sparse Representation
Limited inherited sharing by Copy-On-Write
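
A two-level map with copy-on-write can be sketched as follows. The first level maps a predicate to a table; the second level maps a node pair to a Kleene value. Copying a structure shares the tables, and a table is cloned only when a shared one is written. This is a simplified illustration, not the TVLA 0.91 code: tables are dense arrays for binary predicates only, a missing table stands for "0 everywhere", and all `cow_*` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;
enum { MAX_PREDS = 4, MAX_NODES = 8 };

/* Second level: node pair -> Kleene, with a reference count for sharing. */
typedef struct {
    int refs;
    Kleene val[MAX_NODES][MAX_NODES];
} PredTable;

/* First level: predicate -> table (NULL means the predicate is 0 everywhere). */
typedef struct { PredTable *pred[MAX_PREDS]; } CowStructure;

CowStructure cow_new(void) {
    CowStructure s;
    for (int p = 0; p < MAX_PREDS; p++) s.pred[p] = NULL;
    return s;
}

/* Copy shares every table with the original. */
CowStructure cow_copy(const CowStructure *s) {
    CowStructure c = *s;
    for (int p = 0; p < MAX_PREDS; p++)
        if (c.pred[p]) c.pred[p]->refs++;
    return c;
}

Kleene cow_eval(const CowStructure *s, int p, int u1, int u2) {
    return s->pred[p] ? s->pred[p]->val[u1][u2] : K_FALSE;
}

/* Update clones the table first if it is missing or shared. */
void cow_update(CowStructure *s, int p, int u1, int u2, Kleene k) {
    PredTable *t = s->pred[p];
    if (t == NULL || t->refs > 1) {
        PredTable *priv = calloc(1, sizeof *priv);
        assert(priv != NULL);
        priv->refs = 1;
        if (t) { memcpy(priv->val, t->val, sizeof t->val); t->refs--; }
        s->pred[p] = t = priv;
    }
    t->val[u1][u2] = k;
}

void cow_free(CowStructure *s) {
    for (int p = 0; p < MAX_PREDS; p++) {
        if (s->pred[p] && --s->pred[p]->refs == 0) free(s->pred[p]);
        s->pred[p] = NULL;
    }
}
```

Sharing here is only inherited: two structures share a table when one was copied from the other and the predicate was not modified since. Tables that happen to be equal but were built independently are not shared.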

BDDs in a Nutshell (Bryant 86)

Ordered Binary Decision Diagrams
  Data structure for Boolean functions
  Functions are represented as (unique) DAGs
  Also achieve sharing across functions

[Figure: example decision diagrams over the variables x1, x2, x3]

Duplicate Terminals
Duplicate Nonterminals
Redundant Tests

[Figure: diagrams over x1, x2, x3 illustrating the three reductions listed above]

Encoding Structures Using Integers

Static encoding of
  Predicates
  Kleene values
Dynamic encoding of nodes
  0, 1, …, n-1
Encode predicate p's values as
  ep(p).en(u1).en(u2). … .en(un).ek(Kleene)
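
The concatenation ep(p).en(u1)…ek(Kleene) can be sketched as packing fixed-width bit fields into one integer. The field widths, the zero-padding of unused node slots, and the numeric codes for Kleene values (0, 1/2, 1 as 0, 1, 2) are assumptions made for this example; the slides do not specify them. `encode_entry`, `decode_pred`, and `decode_kleene` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths, chosen only for illustration. */
enum { PRED_BITS = 4, NODE_BITS = 6, KLEENE_BITS = 2, MAX_ARITY = 2 };

/* Packs ep(p) . en(u1) ... en(u_arity) . ek(k) into one integer,
   most significant field first. Unused node slots are zero. */
uint32_t encode_entry(unsigned p, const unsigned *nodes, int arity, unsigned k) {
    assert(p < (1u << PRED_BITS) && k < (1u << KLEENE_BITS));
    assert(arity >= 0 && arity <= MAX_ARITY);
    uint32_t code = p;
    for (int i = 0; i < MAX_ARITY; i++) {
        unsigned u = i < arity ? nodes[i] : 0;
        assert(u < (1u << NODE_BITS));
        code = (code << NODE_BITS) | u;
    }
    return (code << KLEENE_BITS) | k;
}

/* Extracts the Kleene field (the lowest bits). */
unsigned decode_kleene(uint32_t code) {
    return code & ((1u << KLEENE_BITS) - 1);
}

/* Extracts the predicate field (the highest bits). */
unsigned decode_pred(uint32_t code) {
    return code >> (KLEENE_BITS + MAX_ARITY * NODE_BITS);
}
```

A structure then becomes a set of such integers, one per predicate entry, which is the form a set-of-integers representation can store.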

BDD Representation of Integer Sets

Characteristic function of S = {1, 5}, where 1 = <001> and 5 = <101>:
(¬x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3)

[Figure: BDD of the characteristic function over x1, x2, x3, before and after reduction]
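
The reduction of the characteristic function of S = {1, 5} to a two-node BDD can be reproduced with a tiny reduced ordered BDD builder. This is a sketch for three variables only, with a linear-search unique table (production BDD packages use hash tables and caching); all names are hypothetical. Variable index 0 corresponds to x1, the most significant bit.

```c
#include <assert.h>

enum { NVARS = 3, MAX_BDD_NODES = 64, BDD_FALSE = 0, BDD_TRUE = 1 };

typedef struct { int var, lo, hi; } BddNode;

static BddNode nodes[MAX_BDD_NODES];
static int node_count = 2;        /* indices 0 and 1 are the terminals */

/* Returns the node (var, lo, hi), applying the reductions:
   a redundant test is skipped and a duplicate node is reused. */
int bdd_mk(int var, int lo, int hi) {
    if (lo == hi) return lo;                      /* redundant test */
    for (int i = 2; i < node_count; i++)          /* duplicate nonterminal */
        if (nodes[i].var == var && nodes[i].lo == lo && nodes[i].hi == hi)
            return i;
    assert(node_count < MAX_BDD_NODES);
    nodes[node_count] = (BddNode){var, lo, hi};
    return node_count++;
}

/* Builds the BDD of the characteristic function of `set` by expanding
   on x1, then x2, then x3; `prefix` holds the bits chosen so far. */
int bdd_from_set(const int *set, int n, int var, int prefix) {
    if (var == NVARS) {
        for (int i = 0; i < n; i++)
            if (set[i] == prefix) return BDD_TRUE;
        return BDD_FALSE;
    }
    int lo = bdd_from_set(set, n, var + 1, prefix << 1);
    int hi = bdd_from_set(set, n, var + 1, (prefix << 1) | 1);
    return bdd_mk(var, lo, hi);
}

/* Membership test: follow one branch per variable actually tested. */
int bdd_contains(int root, int value) {
    int id = root;
    while (id > BDD_TRUE) {
        int bit = (value >> (NVARS - 1 - nodes[id].var)) & 1;
        id = bit ? nodes[id].hi : nodes[id].lo;
    }
    return id == BDD_TRUE;
}
```

For S = {1, 5} the test on x1 disappears, because both of its branches lead to the same sub-diagram, leaving one x2 node and one x3 node, i.e. the function ¬x2 ∧ x3.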

BDD Representation Example

[Figure: BDD encodings of structure S0, of S1 (after x = y), and of S2.2 (after x = x->n), built up over successive slides]

Improved BDD Representation

Using this representation directly doesn't save space
Observation
  Node names can be arbitrarily remapped without affecting the ADT semantics
Our heuristics
  Use canonical node names to encode nodes
  Increases incidental sharing
  Reduces isomorphism test to pointer comparison
4-10x space reduction
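
One way to picture canonical node names: if nodes are told apart by their unary predicate values, renaming them in a fixed order of those values makes isomorphic structures literally identical. The sketch below is an assumption-laden illustration, not the heuristic used in the work: it fixes the node count, assumes every node has a distinct combination of unary values, and compares with `memcmp`, where a hash-consed BDD would compare pointers. `Shape`, `canonicalize`, and `same_shape` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;
enum { N = 3, UNARY = 3 };     /* fixed sizes, for illustration only */

typedef struct {
    Kleene unary[UNARY][N];    /* e.g. x, y, sm */
    Kleene n[N][N];
} Shape;

/* Canonical key of a node: its unary values read as a base-3 number. */
static int canonical_key(const Shape *s, int u) {
    int key = 0;
    for (int p = 0; p < UNARY; p++) key = key * 3 + s->unary[p][u];
    return key;
}

/* Renames nodes so that they appear in increasing key order. */
Shape canonicalize(const Shape *s) {
    int order[N];
    for (int i = 0; i < N; i++) order[i] = i;
    for (int i = 1; i < N; i++)                       /* insertion sort by key */
        for (int j = i; j > 0 &&
             canonical_key(s, order[j - 1]) > canonical_key(s, order[j]); j--) {
            int t = order[j]; order[j] = order[j - 1]; order[j - 1] = t;
        }
    Shape c;
    memset(&c, 0, sizeof c);
    for (int i = 0; i < N; i++) {
        for (int p = 0; p < UNARY; p++) c.unary[p][i] = s->unary[p][order[i]];
        for (int j = 0; j < N; j++) c.n[i][j] = s->n[order[i]][order[j]];
    }
    return c;
}

/* After canonicalization, isomorphism is plain equality. */
int same_shape(const Shape *a, const Shape *b) {
    Shape ca = canonicalize(a), cb = canonicalize(b);
    return memcmp(&ca, &cb, sizeof ca) == 0;
}
```

Because equal structures get equal encodings, structures that arise independently at different program points can also end up sharing storage, which is the incidental sharing the slide refers to.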

Reducing Time Overhead

Current implementation not optimized

Expensive formula evaluation

Hybrid representation

Distinguish between phases: mutable phase → Join → immutable phase
Dynamically switch representations

Functional Representation

Alternative representation for first-order structures


Structures represented by maps from integers to Kleene values
Tailored for representing first-order structures
Achieves better results than BDDs
Techniques similar to the BDD representation
More details in the paper

Empirical Evaluation

Benchmarks:
  Cleanness Analysis (SAS 2000)
  Garbage Collector
  CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
  Mobile Ambients (ESOP 2000)
Stress testing the representations
  We use relational analysis
  Save structures in every CFG location

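Space Results

[Figure: bar chart of space consumption (vertical axis 0 to 450) for the Base, OBDD total, and Functional representations on the JFE, KERNEL, CA, MA, and GC benchmarks. Data labels surviving extraction, not attributable to individual bars: 402.8, 187.7, 168.2, 51.6, 12.8, 5.5, 22.7, 16.7, 12.9, 9.6]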

Abstract Counters

Ignore language/implementation details


A more reliable measurement technique

Count only crucial space information


Independent of C/Java

Abstract Counters Results

[Figure: bar chart of abstract counter values (vertical axis 0 to 45,000,000) for the Base, OBDD, and Functional representations on the JFE, KERNEL, CA, MA, and GC benchmarks]

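Trends in the Cleanness Analysis Benchmark

[Figure: chart comparing the Base, OBDD, and Functional representations (vertical axis 0 to 600). Numeric labels surviving extraction, not attributable to individual points: 564, 505, 74, 54, 42, 50, 1, 10]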

What's Missing from this Work?

Investigate other node mapping heuristics


Compactly represent sets of structures
Time optimizations

Conclusions

Two novel representations of first-order structures
  New BDD representation
  New representation using functional maps
Implementation techniques
  Normalization techniques are crucial
Empirical evaluation
  Comparison of different representations
  Space is reduced by a factor of 4 to 10
  New representations scale better

Conclusions

The use of BDDs for static analysis is not a panacea for space saving
Domain-specific encoding is crucial for saving space
Failed attempts
  Original implementation of Veith's encoding
  PAG

The End
