STATIC ANALYSIS
1. Definition, objective
2. Types of analysis
3. Characteristics
4. Data Flow Analysis
5. Data Flow Anomalies
6. Rice Theorem
7. Abstract Interpretation - Abstract Domain
8. FindBugs
9. Advantages and disadvantages
10. Analysis Tools
11. Lightweight Analysis
Static analysis can check for:
concurrent data access violations, division by zero, null pointer dereferences,
memory leaks,
dead code,
security vulnerabilities (e.g. XSS, CSRF),
conformance to coding standards.
Example: check that no operation of a program can ever cause an error
(division by zero, buffer overrun, deadlock, etc.).
4. Data Flow Analysis
Typical data flow actions on a program variable: definition (d), reference (r),
undefine (u).
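#include <stdio.h>
#include <stdlib.h>

int getvalue(void);  /* assumed to be defined elsewhere */

int main (void)
{
    char *line; int x = 0, y;             /* d(x); line and y start undefined (u) */
    line = malloc(256 * sizeof (*line));  /* d(line) */
    if (line == NULL) return 1;           /* r(line) */
    fgets (line, 256, stdin);             /* r(line) */
    scanf ("%d", &y);                     /* d(y) */
    if (y > x) y = y - x;                 /* r(y), r(x); then r(y), r(x), d(y) */
    else { x = getvalue(); y = y - x; }   /* d(x); then r(y), r(x), d(y) */
    printf ("%s%d", line,y);              /* r(line), r(y) */
    free(line);                           /* u(line): the buffer is released */
    return 0;
}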
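The d/r/u actions can be checked mechanically. The following is a minimal sketch in C, assuming the actions performed on one variable along one path are encoded as a string over the letters d, r and u. This encoding, the struct anomalies type and the function name count_anomalies are illustrative and not taken from the notes. It counts the three classic anomaly patterns: ur (an undefined value is referenced), dd (a value is redefined without being used) and du (a value is undefined without being used).

```c
#include <assert.h>
#include <stddef.h>

/* Counts of the three data flow anomaly patterns on one variable. */
struct anomalies {
    int ur;  /* undefined value is referenced */
    int dd;  /* value is redefined before being referenced */
    int du;  /* value is undefined before being referenced */
};

/*
 * Scans the action sequence of a single variable along one path.
 * actions: string over 'd' (define), 'r' (reference), 'u' (undefine).
 * Assumption: the variable is undefined on entry.
 */
struct anomalies count_anomalies(const char *actions)
{
    struct anomalies found = {0, 0, 0};
    char previous = 'u';  /* state before the first action */

    assert(actions != NULL);
    for (const char *p = actions; *p != '\0'; ++p) {
        if (previous == 'u' && *p == 'r') found.ur++;
        if (previous == 'd' && *p == 'd') found.dd++;
        if (previous == 'd' && *p == 'u') found.du++;
        previous = *p;
    }
    return found;
}
```

For the variable y on the then-branch of the program above, the sequence is drrdr and no anomaly is counted; the sequence rddu contains one anomaly of each kind.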
Possible to: efficiently compute approximate but sound guarantees.
Over-approximation: overestimates the analyzed behavior. Sound but imprecise
analysis: no false negatives, but possibly false positives.
Under-approximation: underestimates the analyzed behavior. Precise but unsound
analysis (complete analysis): no false positives, but possibly false
negatives.
Soundness is typically favoured over precision: imprecision stays on the
safe side (overestimation rather than underestimation), so the analyzer
never fails to signal an error that can appear in some execution
environment.
However, the number of false positives must be kept reasonable; the
tool is unusable otherwise.
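The trade-off can be illustrated with a small C sketch, assuming an interval abstraction; the struct interval type and both function names are hypothetical. A divisor that only ever takes the values -1 and 1 is abstracted by the interval [-1, 1]. A sound check must then warn about a possible division by zero, which in this case is a false positive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Interval [lo, hi]: over-approximates a set of integers. */
struct interval { int lo, hi; };

/* Smallest interval containing all n concrete values (n >= 1). */
struct interval abstract_values(const int *values, size_t n)
{
    assert(values != NULL && n >= 1);
    struct interval iv = { values[0], values[0] };
    for (size_t k = 1; k < n; ++k) {
        if (values[k] < iv.lo) iv.lo = values[k];
        if (values[k] > iv.hi) iv.hi = values[k];
    }
    return iv;
}

/* Sound check: true whenever some value in the interval could be zero. */
bool may_divide_by_zero(struct interval divisor)
{
    return divisor.lo <= 0 && 0 <= divisor.hi;
}
```

The check never misses a real division by zero (no false negatives), but it cannot tell {-1, 1} apart from {-1, 0, 1}.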
7. Abstract Interpretation
Guarantees the soundness of the analyzer results by ensuring correctness by
construction (sound formal basis).
Abstract interpretation is the formalization of the notion of approximation
(introduced by Cousot and Cousot).
Example: consider the following code, and suppose we want to find all values
that i can take.
L1: int i = 0;
L2: do {
L3:     assert (i <= 10);
L4:     i = i+2;
L5: } while (i < 5);
... further propagation. Mathematically, this is modelled as the iterative
application of a monotone function; saturation occurs when a fixed point is
reached.
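The iteration to a fixed point can be sketched in C for the loop example above, using intervals as the abstract domain. All names (struct interval, join, body_then_guard, loop_head_invariant) and the encoding of the loop as a single transfer function are assumptions made for this illustration; the bottom element (no value reachable) is represented by an empty flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Interval [lo, hi]; empty == true encodes bottom (no value reachable). */
struct interval { bool empty; int lo, hi; };

static struct interval make(int lo, int hi)
{
    assert(lo <= hi);
    struct interval iv = { false, lo, hi };
    return iv;
}

/* Least upper bound: smallest interval containing both arguments. */
static struct interval join(struct interval a, struct interval b)
{
    if (a.empty) return b;
    if (b.empty) return a;
    return make(a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi);
}

/* Abstract effect of one loop pass: i = i + 2, then keep only i < 5. */
static struct interval body_then_guard(struct interval x)
{
    struct interval bottom = { true, 0, 0 };
    if (x.empty || x.lo + 2 >= 5) return bottom;
    return make(x.lo + 2, x.hi + 2 < 4 ? x.hi + 2 : 4);
}

/* Values of i at the loop head: iterate F(X) = [0,0] join body(X), */
/* starting from bottom, until nothing changes (a fixed point).     */
struct interval loop_head_invariant(int *iterations)
{
    struct interval x = { true, 0, 0 };
    *iterations = 0;
    for (;;) {
        struct interval next = join(make(0, 0), body_then_guard(x));
        ++*iterations;
        if (next.empty == x.empty && next.lo == x.lo && next.hi == x.hi)
            return x;
        x = next;
    }
}
```

The iterates are bottom, [0, 0], [0, 2], [0, 4], and the fourth application of F changes nothing, so [0, 4] is the result at the loop head. It contains the concrete values 0, 2 and 4, plus 1 and 3, which the interval cannot exclude. The iteration terminates here because the loop guard bounds the interval; in general, interval analyses rely on a widening operator to force termination.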
b. Abstract domain: an approximate representation of sets of concrete values.
Relational domains can capture relationships between variables. The class
of properties that can be efficiently proved depends on the expressive power
of the abstract domain. An abstract domain is more precise than another if
less information is lost. Various abstract domains have been introduced:
nonrelational abstractions, relational abstractions, symbolic abstractions.
c. Examples of nonrelational abstract domains:
Signs: {Pos, Neg, Zero}
Intervals: [a, b]
Parities: {Even, Odd}
Congruences: value v represented by v mod k
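Two of these domains can be sketched in C as follows. The notes list Pos, Neg and Zero; the extra element SIGN_TOP (sign unknown) is an assumption added here so that every sum has a result. All type and function names are illustrative.

```c
#include <assert.h>

/* Sign domain; SIGN_TOP (unknown sign) is an added assumption. */
enum sign { SIGN_NEG, SIGN_ZERO, SIGN_POS, SIGN_TOP };

/* Abstraction of a single concrete integer. */
enum sign sign_of(int v)
{
    return v < 0 ? SIGN_NEG : (v == 0 ? SIGN_ZERO : SIGN_POS);
}

/* Abstract addition: sound for every pair of concrete values. */
enum sign sign_add(enum sign a, enum sign b)
{
    assert(a <= SIGN_TOP && b <= SIGN_TOP);
    if (a == SIGN_ZERO) return b;
    if (b == SIGN_ZERO) return a;
    if (a == b) return a;          /* Pos+Pos, Neg+Neg, Top+Top */
    return SIGN_TOP;               /* Pos+Neg, or anything with Top */
}

/* Parity domain {Even, Odd}: addition loses no information. */
enum parity { PARITY_EVEN, PARITY_ODD };

enum parity parity_of(int v)
{
    return v % 2 == 0 ? PARITY_EVEN : PARITY_ODD;
}

enum parity parity_add(enum parity a, enum parity b)
{
    return a == b ? PARITY_EVEN : PARITY_ODD;
}
```

Adding a positive and a negative value yields SIGN_TOP: the sign domain cannot say more, whereas the parity of the same sum is still known exactly. Which questions can be answered depends on the domain chosen.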
d. Examples of relational abstract domains:
Difference Bound Matrices (DBM): constraints $x - y \le c$, $\pm x \le c$,
with x, y, c integers
Octagons: constraints $ax + by \le c$, with $a, b \in \{-1, 0, 1\}$ and
x, y, c integers
Octahedra: generalize octagons to more than 2 variables
Polyhedra: constraints $a_1 x_1 + \dots + a_n x_n \le c$, with $a_i$, c integers
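A sketch of how a DBM can store and tighten difference constraints, written in C. It assumes a common encoding in which entry m[i][j] is an upper bound on x_i - x_j and index 0 stands for the constant 0, so that x_i <= c is stored as x_i - x_0 <= c. The size N, the INF marker and the function names are choices made for this example only.

```c
#include <assert.h>
#include <stdbool.h>

#define N 3                 /* x0 (the constant 0), x1, x2 */
#define INF 1000000         /* "no constraint"; small enough to add safely */

/* Start with no constraints except x_i - x_i <= 0. */
void dbm_init(int m[N][N])
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            m[i][j] = (i == j) ? 0 : INF;
}

/* Record x_i - x_j <= c, keeping the tighter bound. */
void dbm_add(int m[N][N], int i, int j, int c)
{
    assert(0 <= i && i < N && 0 <= j && j < N);
    if (c < m[i][j]) m[i][j] = c;
}

/* Shortest-path closure: x_i - x_j <= (x_i - x_k) + (x_k - x_j). */
/* Returns false if the constraints are unsatisfiable.            */
bool dbm_close(int m[N][N])
{
    for (int k = 0; k < N; ++k)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                if (m[i][k] < INF && m[k][j] < INF &&
                    m[i][k] + m[k][j] < m[i][j])
                    m[i][j] = m[i][k] + m[k][j];
    for (int i = 0; i < N; ++i)
        if (m[i][i] < 0) return false;
    return true;
}
```

From x1 - x2 <= 3 and x2 <= 4 the closure derives x1 <= 7. A nonrelational domain such as intervals cannot represent the first constraint at all, so it could not reach this bound.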
e. Any abstraction introduces some loss of information. All answers given by
the abstract interpretation are always correct with respect to the concrete
program. But not all questions can be definitely answered: the classes of
questions that can be answered vary with the abstraction.
f. Signs, Intervals, DBMs, Octagons, Octahedra, and Polyhedra form a
hierarchy in terms of expressive power. They are useful for computing
inequality constraints on integer variables and for deriving invariants of
nested program loops.
g. Store Shape Graphs: a symbolic abstract domain for verifying properties of
dynamically created data structures (shape analysis), e.g. checking whether
two pointer variables access the same memory location.
Check for aliases: expressions that denote the same memory location,
introduced by pointers, call-by-reference, array indexing, and C unions.
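Why aliases matter can be seen in a few lines of C. The function names below are made up for this illustration. add_twice produces a different result when both parameters denote the same location, and same_location stands for the question an alias analysis has to answer statically; here it is answered at run time, for demonstration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Adds *b to *a twice. If a and b alias, the second addition */
/* sees the value already modified by the first one.          */
void add_twice(int *a, const int *b)
{
    assert(a != 0 && b != 0);
    *a += *b;
    *a += *b;
}

/* Run-time stand-in for the static question "may p and q alias?". */
bool same_location(const int *p, const int *q)
{
    return p == q;
}
```

With two distinct variables both holding 1, add_twice(&x, &y) leaves x at 3; with add_twice(&z, &z) and z holding 1, the result is 4. Array indexing creates the same situation: &arr[i] and &arr[j] denote one location whenever i equals j.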
h. Challenge: design useful abstractions that are computable for all programs
and expressive enough to yield interesting information for most programs.
8. FindBugs: implements a detector for each bug pattern; analyzes Java
bytecode.
a. Categories of detectors: bad coding practice; single-threaded
correctness issue; thread/synchronization correctness issue; performance
issue; security and vulnerability to malicious untrusted code.
b. Detectors are implemented based on the Visitor pattern. Strategies used:
examination of class structure and inheritance hierarchy; linear code scan,
using the visited instructions to drive a state machine; control flow
analysis; dataflow analysis.
c. Sample detectors:
Bad Covariant Definition of Equals: the parameter type of method equals
should be Object for proper overriding; a wrong type can cause unexpected
behavior, e.g. when instances are passed to Collections. Check for the
signature public boolean equals(Foo obj) {...}
Null Pointer Dereference: would result in a NullPointerException. Detection
is based on intraprocedural data flow analysis.
if (entry == null) {
    IClasspathContainer container =
        JavaCore.getClasspathContainer(
            entry.getPath(),          // entry is null on this path
            root.getJavaProject());
    // ...
}
clair: supports C/C++; performs light as well as deep static analysis (abstract
interpretation with Polyhedra); checking of coding standards, proving the
absence of runtime errors, test case generation, ...
CPAchecker (Configurable Program Analysis): open source,
http://cpachecker.sosy-lab.org/; configurable program analysis based on
predicate analysis
Microsoft SLAM: verification engine of the Microsoft Static Driver Verifier (SDV)
Microsoft CCCheck: static analyzer for .NET languages (analyzes MSIL code);
checks contracts (pre/postconditions, invariants) at compile time; integrates
with Visual Studio
11. Lightweight Static Program Analysis
Uses syntax-based techniques to find issues. These are the earlier approaches
to static analysis; original inspiration: LINT from Bell Labs.
Typically unsound and imprecise: both false negatives and false positives.
Useful as support for inspections/reviews.
Lightweight static analysis tools:
PMD: Java, JavaScript, XML (unused variables, empty catch blocks, unnecessary
object creation)
Splint: C, a modern Lint
Jlint: Java (inconsistencies, synchronization problems, e.g. deadlocks)
JSLint: JavaScript
PyChecker: Python
ESC/Java
Microsoft FxCop: .NET languages
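A syntax-based check of the kind listed for PMD (empty catch blocks) can be sketched in a few lines of C. This is a deliberately naive textual scan written for illustration, not how PMD or any other listed tool is implemented, and the function names are hypothetical. It shows why such checks are typically unsound and imprecise: comments, string literals and nested parentheses are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Skips whitespace starting at p and returns the new position. */
static const char *skip_space(const char *p)
{
    while (*p != '\0' && isspace((unsigned char)*p)) ++p;
    return p;
}

/*
 * Counts occurrences of  catch ( ... ) { }  with only whitespace in the body.
 * Purely textual: comments, strings and nested parentheses are not handled,
 * so both false positives and false negatives are possible.
 */
int count_empty_catch_blocks(const char *source)
{
    int count = 0;
    assert(source != NULL);
    for (const char *p = strstr(source, "catch"); p != NULL;
         p = strstr(p + 1, "catch")) {
        const char *q = skip_space(p + 5);   /* 5 == strlen("catch") */
        if (*q != '(') continue;
        q = strchr(q, ')');
        if (q == NULL) break;                /* no closing parenthesis left */
        q = skip_space(q + 1);
        if (*q != '{') continue;
        q = skip_space(q + 1);
        if (*q == '}') ++count;
    }
    return count;
}
```

A catch block that contains only a comment is not counted (a false negative), and the text catch (e) { } inside a string literal is counted (a false positive). Findings of this kind are best used as input for inspections and reviews.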