
Static analysis

STATIC ANALYSIS
1. Definition, objective
2. Types of analysis
3. Characteristics
4. Data flow analysis
5. Data flow anomalies
6. Rice's theorem
7. Abstract interpretation, abstract domains
8. FindBugs
9. Advantages and disadvantages
10. Analysis tools
11. Lightweight analysis

1. Definition: automated computation of information about a program without executing it (originally developed for program optimization at compile time). Simple analyses can be automated, but many questions about the behavior of a program are undecidable or computationally infeasible to answer. Example: the halting problem.
a. Objective: automatically discover properties of a program that hold for all possible execution paths of the program.
b. Difference from related techniques:
Testing: manually checking a property for some execution paths.
Model checking: automatically checking a property for all execution paths.
2. Types of analysis
a. Control flow analysis: checks for loops with multiple exit or entry points, finds unreachable code, etc.
b. Data flow analysis: detects uninitialized variables, variables written twice without an intervening read, variables which are declared but never used, etc.
c. Interface analysis: checks the consistency of routine and procedure declarations and their use (e.g. sequences of API calls).
d. Syntactic pattern matching: checks for known suspicious code patterns/idioms.
e. Information flow analysis: identifies the dependencies of output variables.
f. Path analysis: identifies paths through the program and sets out the statements executed in each path.
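As an illustration of control flow analysis, unreachable code can be found by a reachability search over the control flow graph. The following is only a minimal sketch: the class name UnreachableCode, the representation of the graph as a map from block number to successor blocks, and the sample graph are assumptions made for this example, not part of any particular tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UnreachableCode {
    /** Returns the blocks of the control flow graph that cannot be reached from the entry block. */
    static Set<Integer> unreachable(Map<Integer, List<Integer>> cfg, int entry) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            int block = work.pop();
            if (seen.add(block)) {                       // first visit: follow its successors
                for (int succ : cfg.getOrDefault(block, List.of())) {
                    work.push(succ);
                }
            }
        }
        Set<Integer> dead = new TreeSet<>(cfg.keySet()); // sorted for stable output
        dead.removeAll(seen);
        return dead;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> cfg = new HashMap<>();
        cfg.put(1, List.of(2));
        cfg.put(2, List.of(4));   // e.g. an unconditional jump over block 3
        cfg.put(3, List.of(4));   // no block jumps to block 3
        cfg.put(4, List.of());
        System.out.println(unreachable(cfg, 1)); // prints [3]
    }
}
```

The search only follows edges of the graph, so it reports code that is unreachable in the graph; code that is unreachable because a branch condition can never be true would need a path-sensitive analysis.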
3. Characteristics of static analysis approaches:
flow sensitive: considers the order of execution of statements in the program
path sensitive: distinguishes between paths through a program and attempts to consider only feasible ones
context sensitive: method calls are analyzed differently based on the call site
interprocedural: the body of a method is analyzed in the context of each respective call site
intraprocedural: analyzes each method in isolation and does not distinguish between its call sites
Examples of defects that can be detected: buffer overflows/underflows, buffer overruns/underruns, integer overflows, class hierarchy inconsistencies, concurrent data access violations, division by zero, null pointer dereferences, memory leaks, dead code, security vulnerabilities (e.g. XSS, CSRF), violations of coding standards.
Example: check that no operation of a program will ever cause an error (division by zero, buffer overrun, deadlock, etc.).
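4. Data flow analysis
Typical data flow actions on a program variable: definition (d), reference (r), undefine (u).
    #include <stdio.h>
    #include <stdlib.h>

    int getvalue(void);                         /* assumed to be defined elsewhere */

    int main(void)
    {
        char *line;                             /* line: u */
        int x = 0, y;                           /* x: d, y: u */
        line = malloc(256 * sizeof(*line));     /* line: d */
        if (line == NULL) return 1;             /* line: r */
        fgets(line, 256, stdin);                /* line: r */
        scanf("%d", &y);                        /* y: d */
        if (y > x) y = y - x;                   /* y: r, x: r; then y: r, x: r, y: d */
        else { x = getvalue(); y = y - x; }     /* x: d; then y: r, x: r, y: d */
        printf("%s%d", line, y);                /* line: r, y: r */
        free(line);                             /* line: u */
        return 0;
    }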

5. Data flow anomalies
For a pair of successive actions on a variable:
    dd  suspicious
    du  probably a defect
    dr  normal case
    ud  okay
    uu  probably a defect
    ur  a defect
    rd  okay
    ru  okay
    rr  okay
First occurrence of an action on a variable: u suspicious, d okay, r suspicious.
Last occurrence of an action on a variable: u okay, d suspicious (defined but never used afterwards), r okay (but deallocation may have been forgotten).
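The pair rules above can be applied mechanically to the sequence of actions recorded for one variable along one path. The sketch below is only an illustration of that lookup: the class name DataFlowAnomalies, the encoding of a trace as a string such as "ddru", and the wording of the findings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DataFlowAnomalies {
    // Verdict for each pair of successive actions, taken from the list above.
    static final Map<String, String> PAIRS = Map.of(
        "dd", "suspicious", "du", "probable defect", "dr", "normal",
        "ud", "okay", "uu", "probable defect", "ur", "defect",
        "rd", "okay", "ru", "okay", "rr", "okay");

    /** Returns every finding that is not "okay" or "normal" for one variable's action trace. */
    static List<String> check(String trace) {
        List<String> findings = new ArrayList<>();
        if (trace.isEmpty()) return findings;
        char first = trace.charAt(0);
        char last = trace.charAt(trace.length() - 1);
        if (first != 'd') findings.add("first action '" + first + "': suspicious");
        for (int i = 0; i + 1 < trace.length(); i++) {
            String pair = trace.substring(i, i + 2);
            String verdict = PAIRS.get(pair);
            if (verdict == null) throw new IllegalArgumentException("unknown action in " + pair);
            if (!verdict.equals("okay") && !verdict.equals("normal")) {
                findings.add(pair + " at " + i + ": " + verdict);
            }
        }
        if (last == 'd') findings.add("last action 'd': suspicious");
        return findings;
    }

    public static void main(String[] args) {
        System.out.println(check("ddru")); // prints [dd at 0: suspicious]
    }
}
```

A real analyzer would derive the traces from the control flow graph, one per path, rather than take them as strings.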
6. Rice's theorem
Determining whether a nontrivial property holds for a program P is undecidable, a nontrivial property being a property that holds for some programs and not for others. Consequence: an absolute static analyzer is impossible to build.
a. Good news: a good approximation of the analyzed behavior can be obtained, e.g. by using abstraction, so that properties can be inferred with a decidable procedure. However, there are tradeoffs.
b. Characteristics of a solution:
False negative: execution is unsafe in reality but reported as safe.
False positive: execution is safe in reality but reported as unsafe by the analyzer.
Sound analysis: computed results hold for any possible program execution; no false negatives; finds all defects in a well-defined class.
Unsound analysis: can omit to signal an error that can appear at runtime in some execution environment; subject to false negatives; finds some of the defects in a well-defined class.
Precision: proportion of program operations that the analyzer can decide to be safe or unsafe; the answer is "unknown" for the others.
Precise analysis: never produces false alarms on any program (no false positives).
Specialization: tailoring the analyzer's algorithms to a specific class of programs (e.g. flight control commands, digital signal processing) so that higher precision can be guaranteed for this class of programs only.
Ideal static analysis solution: sound and precise, which is impossible in the absolute.
c. What is possible: efficiently computing approximate but sound guarantees.
Over-approximation: overestimates the analyzed behavior; sound but imprecise analysis; no false negatives, but possibility of false positives.
Under-approximation: underestimates the analyzed behavior; precise but unsound analysis (complete analysis); no false positives, but possibility of false negatives.
Soundness is typically favoured over precision: imprecision stays on the safe side (overestimation rather than underestimation), so the analyzer never omits to signal an error that can appear in some execution environment. However, the number of false positives must be kept reasonable, e.g. through specialization; the tool is unusable otherwise.
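A small worked case may help to see how a sound over-approximation produces a false positive. The sketch below analyzes the hypothetical program "y = x; z = x - y + 1; r = 100 / z;" by tracking only a lower and an upper bound for each variable. The program, the class name DivisionCheck and all method names are assumptions made for this illustration.

```java
class DivisionCheck {
    /** Bounds of a - b when a lies in [a[0], a[1]] and b lies in [b[0], b[1]]. */
    static long[] minus(long[] a, long[] b) {
        return new long[] { a[0] - b[1], a[1] - b[0] };
    }

    static long[] plus(long[] a, long k) {
        return new long[] { a[0] + k, a[1] + k };
    }

    static boolean mayBeZero(long[] v) {
        return v[0] <= 0 && v[1] >= 0;
    }

    /** Abstract run of: y = x; z = x - y + 1; r = 100 / z;  with x in [lo, hi]. */
    static boolean analyzerRaisesAlarm(long lo, long hi) {
        long[] x = { lo, hi };
        long[] y = x;                         // only the bounds are kept: "y equals x" is lost
        long[] z = plus(minus(x, y), 1);
        return mayBeZero(z);
    }

    /** Concrete runs of the same program for every x in [lo, hi]. */
    static boolean reallyDividesByZero(long lo, long hi) {
        for (long x = lo; x <= hi; x++) {
            long y = x;
            long z = x - y + 1;               // always 1
            if (z == 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyzerRaisesAlarm(1, 5));  // prints true  (z is bounded by [-3, 5])
        System.out.println(reallyDividesByZero(1, 5));  // prints false (the alarm is a false positive)
    }
}
```

The alarm is safe but wrong: the bounds computed for z contain every value z can really take, so no real division by zero could be missed, but they also contain values that never occur.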
7. Abstract interpretation
Guarantees the soundness of the analyzer's results by ensuring correctness by construction (sound formal basis). Abstract interpretation is the formalization of the notion of approximation (introduced by Cousot and Cousot).
Example: consider the following code. Suppose we want to find all values that i can take.
    int i = 0;
    do {
        assert(i <= 10);
        i = i + 2;
    } while (i < 5);
With no approximation, i.e. on the concrete program, i takes exactly the values 0, 2, 4 and 6.
Abstraction function: maps concrete values to abstract values.
Abstract interpretation: evaluates program behavior on an abstract domain to obtain an approximate solution. It involves counterparts of the concrete operations (e.g. addition, union) in the abstract domain. E.g. abstract domain: the set of intervals {[a, b] | a ≤ b}.
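The loop example can be analyzed in the interval domain by repeatedly pushing an interval for i around the loop until it stops changing. The following is a minimal sketch under stated assumptions: the classes LoopAnalysis and Interval and their methods are hypothetical names, the transfer functions are written by hand for this one loop, and the comparison in the assertion is assumed to be i <= 10.

```java
import java.util.Objects;

class LoopAnalysis {
    /** Interval of i at the assertion, obtained by iterating until nothing changes. */
    static Interval atAssert() {
        Interval head = new Interval(0, 0);           // int i = 0;
        while (true) {
            Interval afterBody = head.plus(2);        // i = i + 2;
            Interval back = afterBody.atMost(4);      // the loop repeats only if i < 5
            Interval next = (back == null) ? head : head.join(back);
            if (next.equals(head)) return head;       // saturated: a fixed point is reached
            head = next;
        }
    }

    /** Interval of i after the loop: the body result restricted to i >= 5. */
    static Interval atExit() {
        return atAssert().plus(2).atLeast(5);
    }

    public static void main(String[] args) {
        System.out.println("at assert: " + atAssert()); // prints at assert: [0, 4]
        System.out.println("at exit:   " + atExit());   // prints at exit:   [5, 6]
    }
}

final class Interval {
    final long lo, hi;

    Interval(long lo, long hi) {
        if (lo > hi) throw new IllegalArgumentException("empty interval");
        this.lo = lo;
        this.hi = hi;
    }

    /** Abstract counterpart of set union: the smallest interval containing both. */
    Interval join(Interval o) { return new Interval(Math.min(lo, o.lo), Math.max(hi, o.hi)); }

    /** Abstract counterpart of adding the constant k. */
    Interval plus(long k) { return new Interval(lo + k, hi + k); }

    /** Keeps only values <= b; null stands for the empty result. */
    Interval atMost(long b) { return lo > b ? null : new Interval(lo, Math.min(hi, b)); }

    /** Keeps only values >= b; null stands for the empty result. */
    Interval atLeast(long b) { return hi < b ? null : new Interval(Math.max(lo, b), hi); }

    @Override public boolean equals(Object o) {
        return o instanceof Interval && ((Interval) o).lo == lo && ((Interval) o).hi == hi;
    }
    @Override public int hashCode() { return Objects.hash(lo, hi); }
    @Override public String toString() { return "[" + lo + ", " + hi + "]"; }
}
```

The result [0, 4] at the assertion is enough to prove i <= 10, although it also contains 1 and 3, which i never takes. The iteration stops on its own here because the loop condition bounds the interval; loops without such a bound generally need an extra acceleration step, which this sketch does not include.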

a. Static analysis procedure: propagate a set of (abstract) values through the program's control flow until the set saturates, i.e. does not change with further propagation. This is modelled mathematically as the iterative application of a monotone function; saturation occurs when a fixed point is reached.
b. Abstract domain: approximate representation of sets of concrete values. Relational domains can capture relationships between variables. The class of properties that can be efficiently proved depends on the expressive power of the abstract domain. An abstract domain is more precise than another if less information is lost. Various abstract domains have been introduced: non-relational abstractions, relational abstractions, symbolic abstractions.
c. Examples of non-relational abstract domains:
Signs: {Pos, Neg, Zero}
Intervals: [a, b]
Parities: {Even, Odd}
Congruences: value v represented by v mod k
d. Examples of relational abstract domains:
Difference Bound Matrices (DBM): constraints x − y ≤ c, ±x ≤ c; x, y, c ∈ Integers
Octagons: constraints ax + by ≤ c; a, b ∈ {−1, 0, 1}; x, y, c ∈ Integers
Octahedra: generalize octagons to more than 2 variables
Polyhedra: constraints a1x1 + ... + anxn ≤ c; ai, c ∈ Integers
e. Any abstraction introduces some loss of information. All answers given by the abstract interpretation are always correct with respect to the concrete program, but not all questions can be definitely answered; the classes of questions that can be answered vary with the abstraction.
f. Signs, Intervals, DBMs, Octagons, Octahedra and Polyhedra form a hierarchy in terms of expressive power. They are useful for computing inequality constraints about integer variables and for deriving invariants of nested program loops.
g. Store Shape Graphs: symbolic abstract domain for verifying properties of dynamically created data structures (shape analysis), e.g. checking whether two pointer variables access the same memory location. This is a check for aliases: expressions that denote the same memory location, introduced by pointers, call-by-reference, array indexing and C unions.
h. Challenge: design useful abstractions that are computable for all programs and expressive enough to yield interesting information for most programs.
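To make the idea of abstract counterparts of concrete operations concrete, here is a minimal sketch of the Signs domain with abstract addition and multiplication. The enum Sign, its method names and the extra element TOP (meaning "sign unknown", which is needed as soon as an operation loses information) are assumptions made for this example; machine integer overflow is ignored.

```java
class SignDemo {
    public static void main(String[] args) {
        // 5 + (-3) is 2 in the concrete program, but the abstract result is "unknown".
        System.out.println(Sign.of(5).plus(Sign.of(-3)));    // prints TOP
        // Multiplication keeps full sign information.
        System.out.println(Sign.of(-4).times(Sign.of(-2)));  // prints POS
    }
}

enum Sign {
    NEG, ZERO, POS, TOP;   // TOP: the value may have any sign

    /** Abstraction function: maps a concrete integer to its abstract value. */
    static Sign of(int v) {
        return v < 0 ? NEG : (v == 0 ? ZERO : POS);
    }

    /** Abstract counterpart of +. */
    Sign plus(Sign o) {
        if (this == ZERO) return o;
        if (o == ZERO) return this;
        return this == o ? this : TOP;   // POS + NEG could be anything
    }

    /** Abstract counterpart of *. */
    Sign times(Sign o) {
        if (this == ZERO || o == ZERO) return ZERO;
        if (this == TOP || o == TOP) return TOP;
        return this == o ? POS : NEG;
    }
}
```

Every answer other than TOP is correct for all concrete values it stands for, which is the sense in which the abstraction is sound; TOP is the place where the question cannot be answered in this domain, whereas an interval domain could still answer it.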
8. FindBugs: implements a detector for each bug pattern; analyzes Java bytecode.
a. Categories of detectors: bad coding practice; single-threaded correctness issue; thread/synchronization correctness issue; performance issue; security and vulnerability to malicious untrusted code.
b. Detectors are implemented based on the Visitor pattern. Strategies used: examination of class structure and inheritance hierarchy; linear code scan, using the visited instructions to drive a state machine; control flow analysis; data flow analysis.
c. Sample detectors:
Bad Covariant Definition of Equals: the parameter type of method equals should be Object for proper overriding; a wrong type can cause unexpected behavior, e.g. when instances are passed to Collections. The detector checks for signatures such as public boolean equals(Foo obj) {...}
Null Pointer Dereference: would result in a NullPointerException. Detection is based on intraprocedural data flow analysis.
    if (entry == null) {
        IClasspathContainer container =
            JavaCore.getClasspathContainer(
                entry.getPath(),           // entry is null here
                root.getJavaProject());
    }

d. Uninitialized Read In Constructor: access to an object field before a value is written to it. The detector checks object constructors to determine whether any field is read before it is written. This often results from confusion between identifiers:
    public ByteArrayCallback(String propmpt) {
        this.prompt = prompt;   // 'prompt' is the field itself; the parameter is spelled 'propmpt'
    }

e. Static Field Modifiable By Untrusted Code: may be present when
a static non-final field has public or protected access;
a static final field has public or protected access and references a mutable structure (e.g. array, Hashtable);
a method returns a reference to a static mutable structure (e.g. array, Hashtable).
f. Unconditional Wait: the condition being waited for should be checked, while holding the monitor, before entering a monitor wait, because the event notification might already have occurred.
g. Detector: scans the bytecode for calls to wait() that are immediately preceded by a monitorenter instruction and are not the target of a branch instruction.
    if (!enabled) {
        try {
            synchronized (lock) {
                lock.wait();
                ... // enabled might have changed at this point
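For comparison, the usual way to avoid an unconditional wait is to test the condition inside the synchronized block, in a loop around wait(). The class below is a minimal sketch of that idiom; the class name Gate and its methods are hypothetical and are not taken from the code that the detector flagged.

```java
class Gate {
    private final Object lock = new Object();
    private boolean enabled = false;          // only accessed while holding lock

    /** Sets the condition and wakes up every waiting thread. */
    void enable() {
        synchronized (lock) {
            enabled = true;
            lock.notifyAll();
        }
    }

    /** Blocks until enabled; returns false if the thread was interrupted while waiting. */
    boolean awaitEnabled() {
        synchronized (lock) {
            while (!enabled) {                // condition checked while holding the monitor
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Gate gate = new Gate();
        new Thread(gate::enable).start();
        // Works whichever thread runs first: if enable() already happened,
        // the loop is skipped and wait() is never called.
        System.out.println(gate.awaitEnabled()); // prints true
    }
}
```

Because the test and the wait happen under the same monitor, a notification cannot slip in between them, and the loop also covers wake-ups that occur without the condition having become true.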
9. Advantages of static program analysis: no user interaction needed; general, reusable abstractions.
Drawbacks: general (widely applicable) abstractions are often simple; more sophisticated approaches have narrow application areas; scalability to very large applications; lack of precision (reports of "unknown").
10. Analysis tools
CodeSonar: uses interprocedural analysis; supports C/C++, Java source and binary code; detects buffer overflows, memory leaks, redundant loop and branch conditions, data races, ...
Klocwork: supports C/C++, Java, C#; analysis of security, safety and reliability; on-the-fly analysis.
Coverity Code Advisor: supports C/C++, Java, C#; interprocedural data flow analysis, Boolean satisfiability, path analysis, design pattern recognition; detects API usage errors, buffer overflows, class hierarchy inconsistencies, XSS, deadlocks.
Astrée: proves the absence of runtime errors in C programs (sound); based on abstract interpretation (interval and octagon domains); applications in avionics (Airbus development).
NASA C Global Surveyor: analyzes C code for runtime errors (access to non-initialized variables, null pointer dereferences, out-of-bounds array accesses, ...); based on abstract interpretation.

Éclair: supports C/C++; performs light as well as deep static analysis (abstract interpretation with Polyhedra); checking of coding standards, proving the absence of runtime errors, test case generation, ...
CPAchecker (Configurable Program Analysis): open source, http://cpachecker.sosy-lab.org/; configurable program analysis based on predicate analysis.
Microsoft SLAM: verification engine of Microsoft Static Driver Verifier (SDV).
Microsoft CCCheck: static analyzer for .NET languages (analyzes MSIL code); checks contracts (pre/postconditions, invariants) at compile time; integrates with Visual Studio.
11. Lightweight static program analysis
Uses syntax-based techniques to find issues; these are among the earlier approaches to static analysis. Original inspiration: Lint from Bell Labs.
Typically unsound and imprecise: false negatives and false positives.
Useful as support for inspections/reviews.
Lightweight static analysis tools:
PMD: Java, JavaScript, XML (unused variables, empty catch blocks, unnecessary object creation)
Splint: C, a modern Lint
Jlint: Java (inconsistencies, synchronization problems, e.g. deadlocks)
JSLint: JavaScript
PyChecker: Python
ESC/Java
Microsoft FxCop: .NET languages
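A syntax-based check of the kind described above can be as simple as a text pattern. The sketch below looks for empty catch blocks with a regular expression. It is only an illustration of the lightweight style and says nothing about how PMD or any other listed tool is implemented; the class name EmptyCatchLint and the sample input are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyCatchLint {
    // The word "catch", a parenthesised parameter, then braces containing only whitespace.
    private static final Pattern EMPTY_CATCH =
        Pattern.compile("\\bcatch\\s*\\([^)]*\\)\\s*\\{\\s*\\}");

    /** Counts the textual occurrences of an empty catch block in the given source text. */
    static int countEmptyCatches(String source) {
        Matcher m = EMPTY_CATCH.matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String code =
              "try { run(); } catch (Exception e) { }\n"
            + "try { run(); } catch (Exception e) { log(e); }\n"
            + "// catch (Exception e) {}\n";
        // One real finding plus one match inside a comment.
        System.out.println(countEmptyCatches(code)); // prints 2
    }
}
```

The match inside the comment is a false positive, and a catch block that contains nothing but a comment would not be reported at all, which shows in miniature why such checks are neither sound nor precise and are best used to support reviews.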
