
CATCH AND RELEASE: A NEW LOOK AT DETECTING AND MITIGATING HIGHLY OBFUSCATED EXPLOIT KITS
BY MOHAMED SAHER AND AHMED GARHY

AGENDA
Our Intent
Rethinking Evasions
Domain of the Problem
Current Problem
Problem with Current Solutions
Solution #1 First Method
Solution #2 Second Method

OUR INTENT
Is this function malicious?
function Translate(objects, offset, size) {
    var length = 4;
    for (var i = 0; i < size; i++) {
        var r = rc.substr(0, length);
        if (offset > 0) {
            r = r.substr(offset) + r.substr(0, offset);
        }
        objects[i] = r.substr(0, r.length);
    }
}
Without understanding the context in which a function is used, it is very difficult to determine whether it is malicious

OUR INTENT
What about this script?
<script>
var a = '%25%33%43%69%66%72%61%6d%65 ...';
var b = unescape(unescape(a));
var spray = new Function(unescape(b));
</script>
An expert's eye can probably tell that it looks suspicious.
The two examples are actually equivalent to each other.
Our intent is to show that an attack can use code like the first example, without depending on obfuscation like the second example, and to propose a better method for detecting both.
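
The layered encoding in the second script can be peeled back without executing anything. The following is a minimal sketch, not the authors' tooling: `peelLayers` is a hypothetical helper, `decodeURIComponent` stands in for the legacy `unescape`, and only the prefix of `a` that is visible on the slide is used (the rest is elided there).

```javascript
// Hypothetical helper: percent-decode a string repeatedly until it stops
// changing (or maxLayers is reached), recording every intermediate layer.
function peelLayers(encoded, maxLayers = 5) {
  const layers = [encoded];
  let current = encoded;
  for (let i = 0; i < maxLayers; i++) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch (err) {
      break; // malformed escape sequence: stop peeling
    }
    if (next === current) break; // fixed point: nothing left to decode
    layers.push(next);
    current = next;
  }
  return layers;
}

// Only the prefix shown on the slide; the remainder is elided in the source.
const prefix = '%25%33%43%69%66%72%61%6d%65';
const layers = peelLayers(prefix);
console.log(layers); // [ '%25%33%43%69%66%72%61%6d%65', '%3Ciframe', '<iframe' ]
```

Two decoding passes are enough to expose the start of an IFRAME tag, which is why a signature written against the encoded form alone misses it.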

RETHINKING EVASIONS
Designing a new architecture
Use a message-oriented architecture (MOA) to split the attack into disparate, self-contained messages; we refer to these as units of work
This is a variation of the script-splitting technique, except that a message exists only within a local scope and is destroyed after it serves its purpose
Does not require DOM manipulation to hide magic strings
Avoids the magic redirect IFRAME that can be a trigger for some analyzers

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
No artifact exists that can be parsed or scanned for patterns, characteristics, and definitions
An alternative to loading JavaScript in clear text
Load one message at a time, forcing each message to be analyzed independently (remember: units of work)
WebSockets are a perfect candidate for both MOA and bypassing HTTP from a web environment

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Two components involved, client and server
[Diagram: Client (Listen, Invoke) and Server (State, Send)]

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Two components involved, client and server
For each accepted connection from a client, the server maintains a state machine
Messages are essentially commands and do not depend on each other (remember: units of work)
The client evaluates a message, invokes it, and destroys it

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Limit control flow and function call hierarchy
The only client control flow is that of the client listening for and invoking a message
The order of messages is not guaranteed by the server. The server may send NOP messages as part of an attack to trick certain analyzers
Functions dynamically evaluated in messages can be monkey-patched to trick certain analyzers

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Limit control flow and function call hierarchy
Getting creative in transport format
Once connected, WebSockets behave like simple TCP pipes, so data can be represented on the wire in an application-specific way
No longer restricted to sending JavaScript in clear text
Create a custom binary format
Send the message in binary on the wire
010010000110010101101100011011000110111100100000010010000
1100001011011010110001001110101011100100110011100100001
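
The bit string above only becomes meaningful once a format is assumed. The following sketch reads the same bits two ways; `decodeAsAscii` and `decodeAsUint16BE` are hypothetical helpers, and both the 8-bit ASCII layout and the 16-bit big-endian layout are assumptions made for illustration, not a format described in the talk.

```javascript
// Bits copied from the slide, grouped into bytes for readability.
const bits = [
  '01001000', '01100101', '01101100', '01101100', '01101111', '00100000',
  '01001000', '01100001', '01101101', '01100010', '01110101', '01110010',
  '01100111', '00100001',
].join('');

// Interpretation 1 (assumed): one ASCII character per 8 bits.
function decodeAsAscii(bitString) {
  const bytes = bitString.match(/[01]{8}/g) || [];
  return bytes.map((b) => String.fromCharCode(parseInt(b, 2))).join('');
}

// Interpretation 2 (assumed): a sequence of 16-bit big-endian integers.
function decodeAsUint16BE(bitString) {
  const words = bitString.match(/[01]{16}/g) || [];
  return words.map((w) => parseInt(w, 2));
}

console.log(decodeAsAscii(bits)); // "Hello Hamburg!"
console.log(decodeAsUint16BE(bits));
```

Both readings are equally valid from the bytes alone; only knowledge of the format tells an analyzer which one the receiver will apply.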

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Limit control flow and function call hierarchy
Getting creative in transport format
Once connected, WebSockets behave like simple TCP pipes, so data can be represented on the wire in an application-specific way
No longer restricted to sending JavaScript in clear text
Create a custom binary format
Send the message in binary on the wire
Simply looking at a binary message gives no hint about its contents: is it an audio file, an image, or even text?
To even begin to understand a binary message, its format specification needs to be known beforehand; otherwise it is a very challenging problem in its own right

RETHINKING EVASIONS
Designing a new architecture
Avoiding HTTP
Avoiding client side state
Limit control flow and function call hierarchy
Getting creative in transport format
Confusing the Context
Remember this function?
function Translate(objects, offset, size) {
    var length = 4;
    for (var i = 0; i < size; i++) {
        var r = rc.substr(0, length);
        if (offset > 0) {
            r = r.substr(offset) + r.substr(0, offset);
        }
        objects[i] = r.substr(0, r.length);
    }
}

Now that we receive this from our binary format, we ask the question again: how do you determine whether it is malicious?

DOMAIN OF THE PROBLEM


How can we define a malicious website?
How can we detect a malicious website?
How can we detect obfuscation?
How can we identify obfuscation used for malicious purposes?
How can we categorize what is malicious and what is not?

CURRENT PROBLEM
Exploit delivery relies on JavaScript at some point
JavaScript is continuously being obfuscated with increasing complexity
Current solutions are far behind technologically

PROBLEMS WITH CURRENT SOLUTIONS
Rely heavily on invocative functions (fromCharCode, eval, unescape, etc.) that are not a concrete basis for calling code malicious and have plenty of legitimate use cases:
DOM and CSS selectors
Client-side proxies for client-server interaction
Client-side template engines

PROBLEMS WITH CURRENT SOLUTIONS
Rely heavily on invocative functions (fromCharCode, eval, unescape, etc.) that are not a concrete basis for calling code malicious and have plenty of legitimate use cases
Limited sets of characteristics
Probabilistic decisions are directly proportional to the characteristics extracted

TYPES OF APPROACHES
Dynamic analysis of embedded JS
Static analysis of extracted JS (Method #1)
Static analysis of extracted JS (Method #2)

DYNAMIC ANALYSIS
AdHoc Forwarding
Create a middle layer between the browser and the JS engine
Analyze the control flow graph (CFG) of the scripts being executed
Analyze the call hierarchy and the order of function calls
Analyze certain combinations of functions used, including known high-risk ones

DYNAMIC ANALYSIS
AdHoc Forwarding
Browser Automation
Attach to IE process
Use shdocvw.dll to automate COM callbacks
Capture events while they trigger and manipulate them
Analyze in the same manner as AdHoc Forwarding

DYNAMIC ANALYSIS
AdHoc Forwarding
Browser Automation
Browser In-Memory Injection
Inject JS in DOM to monitor events
Use a JS Debugger (FireBug or other)
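
Monitoring by injection can be as simple as wrapping the functions of interest before the page's own scripts run. The following is a minimal sketch under stated assumptions: `instrument` is a hypothetical helper, the `sandbox` object stands in for a browser global, and a real monitor would need to cover far more entry points.

```javascript
// Hypothetical helper: replace target[name] with a wrapper that records
// each call (name and arguments) and then delegates to the original.
function instrument(target, name, log) {
  const original = target[name];
  if (typeof original !== 'function') return false;
  target[name] = function (...args) {
    log.push({ name, args });
    return original.apply(this, args);
  };
  return true;
}

// Demonstration on a stand-in object rather than a real browser global.
const log = [];
const sandbox = { unescape: (s) => decodeURIComponent(s) };
instrument(sandbox, 'unescape', log);

const result = sandbox.unescape('%3Ciframe');
console.log(result); // "<iframe"
console.log(log.length); // 1
```

The recorded call sequence is the raw material for the call-hierarchy and function-combination analysis listed under AdHoc Forwarding.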

STATIC ANALYSIS (METHOD 1)


Extract local scripts
Extract remote scripts

STATIC ANALYSIS (METHOD 1)


Analyze the scripts and categorize them based on certain criteria
Web page encoding
Detecting the language in use and extracting features
Check the WHOIS record for the web page
Determine probabilistically which category it belongs to

SHANNON'S ENTROPY
Formula
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$
We use Shannon's entropy to measure the entropy of the file only as a side effect, not as a main criterion for deciding whether it is malicious
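
As an illustration of the entropy measurement, the following sketch computes Shannon entropy over the characters of a string. `shannonEntropy` is a hypothetical helper; measuring per character and in bits (log base 2) are assumptions, since the slide does not say which unit or granularity was used.

```javascript
// Shannon entropy, in bits per character, of the characters in a string.
function shannonEntropy(text) {
  const counts = new Map();
  let total = 0;
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
    total += 1;
  }
  if (total === 0) return 0; // empty input carries no information

  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / total; // relative frequency of this character
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(shannonEntropy('aaaa')); // 0
console.log(shannonEntropy('abab')); // 1
console.log(shannonEntropy('abcd')); // 2
```

Packed or encoded payloads tend to score higher than hand-written code, but so do minified libraries, which is consistent with treating the value as a side signal only.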

NAIVE BAYESIAN
A machine-learning technique that can be used to predict to which category a particular data case belongs
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Given the above formula: an event A is INDEPENDENT of event B if the conditional probability is the same as the marginal probability, that is, $P(A \mid B) = P(A)$

LAPLACIAN SMOOTHING
To avoid a zero in any partial probability, which would zero out the joint probability, we use the add-one smoothing technique.
Given an observation $x = (x_1, \ldots, x_d)$ from a multinomial distribution with $N$ trials and parameter vector $\theta = (\theta_1, \ldots, \theta_d)$, a "smoothed" version of the data gives the estimator
$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d}, \qquad i = 1, \ldots, d$$
where $\alpha > 0$ is the smoothing parameter ($\alpha = 0$ corresponds to no smoothing)
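
The classifier and the smoothing step can be sketched together. Everything below is illustrative: the `train`, `likelihood`, and `classify` helpers, the category labels, and the tiny training set are assumptions, not the authors' feature set or data.

```javascript
// Count feature occurrences per category.
function train(samples) {
  const model = { classes: new Map(), vocab: new Set(), n: 0 };
  for (const { label, features } of samples) {
    if (!model.classes.has(label)) {
      model.classes.set(label, { docs: 0, total: 0, counts: new Map() });
    }
    const cls = model.classes.get(label);
    cls.docs += 1;
    model.n += 1;
    for (const f of features) {
      cls.counts.set(f, (cls.counts.get(f) || 0) + 1);
      cls.total += 1;
      model.vocab.add(f);
    }
  }
  return model;
}

// Smoothed estimate (x_i + alpha) / (N + alpha * d); alpha = 1 is add-one.
function likelihood(model, label, feature, alpha = 1) {
  const cls = model.classes.get(label);
  const x = cls.counts.get(feature) || 0;
  return (x + alpha) / (cls.total + alpha * model.vocab.size);
}

// Pick the category with the highest log-posterior, treating features
// as independent given the category (the "naive" assumption).
function classify(model, features, alpha = 1) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, cls] of model.classes) {
    let score = Math.log(cls.docs / model.n);
    for (const f of features) score += Math.log(likelihood(model, label, f, alpha));
    if (score > bestScore) {
      best = label;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical training data, for illustration only.
const model = train([
  { label: 'malicious', features: ['eval', 'unescape', 'iframe'] },
  { label: 'malicious', features: ['eval', 'fromCharCode'] },
  { label: 'benign', features: ['querySelector', 'template'] },
  { label: 'benign', features: ['querySelector', 'eval'] },
]);
console.log(classify(model, ['eval', 'unescape'])); // "malicious"
console.log(classify(model, ['querySelector', 'template'])); // "benign"
```

Without smoothing (`alpha = 0`), a feature never seen in a category yields a likelihood of zero and removes that category from consideration regardless of the other evidence.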

STATIC ANALYSIS (METHOD 2)


How is JS executed/handled?
1. The code is scanned for all function declarations. Each declaration is processed by creating a function object and a named reference to it, so that the function can be called from within a statement.
2. The statements are then evaluated and executed in the order in which they appear on the page, once the script has fully loaded.

JS EXAMPLE #1
<script>
DoNothing();
function DoNothing() {
return;
}
</script>

This works: the declaration is picked up before any statement in the same block runs

JS EXAMPLE #2
<script>
DoNothing();
</script>
<script>
function DoNothing() {
return;
}
</script>

This does not work: the call executes before the second block has defined the function

JS EXAMPLE #3
<script>
function DoNothing() {
return;
}
</script>
<script>
DoNothing();
</script>

This works: the function is already defined by the time the second block runs

JS EXAMPLE #4
<script>
// assuming that DoNothing is not defined
DoNothing();
alert(1);
</script>

This does not work: the error from DoNothing() aborts the block before alert(1) runs

JS EXAMPLE #5
<script>
// assuming that DoNothing is not defined
DoNothing();
</script>
<script>
alert(1);
</script>

This works: the error only aborts the first block, so alert(1) still runs
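
The block-by-block behaviour in the examples above can be reproduced outside a browser. This is a sketch under stated assumptions: it uses Node.js and its `vm` module, `runBlocks` is a hypothetical harness in which each string plays the role of one `<script>` element, and `alert` is replaced by a stub that records its argument.

```javascript
const vm = require('node:vm');

// Hypothetical harness: run each string as its own script block inside one
// shared global context, mimicking consecutive <script> elements.
function runBlocks(blocks) {
  const alerts = [];
  const context = vm.createContext({ alert: (msg) => alerts.push(msg) });
  const results = blocks.map((source) => {
    try {
      vm.runInContext(source, context);
      return 'ok';
    } catch (err) {
      return err.name; // an error aborts only the block that raised it
    }
  });
  return { results, alerts };
}

const sameBlock = runBlocks(['DoNothing(); function DoNothing() { return; }']);
const callFirst = runBlocks(['DoNothing();', 'function DoNothing() { return; }']);
const oneBlock = runBlocks(['DoNothing(); alert(1);']);
const twoBlocks = runBlocks(['DoNothing();', 'alert(1);']);

console.log(sameBlock.results); // [ 'ok' ]
console.log(callFirst.results); // [ 'ReferenceError', 'ok' ]
console.log(oneBlock.alerts);   // []
console.log(twoBlocks.alerts);  // [ 1 ]
```

For a static analyzer this means block boundaries matter: splitting code across blocks changes what executes even when the concatenated source is identical.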

STATIC ANALYSIS (METHOD 2)


Semantic analysis that focuses on the question "what does this mean?"
An optimizer-compiler for JS that focuses on structure rather than on extracted invocative functions

OPTIMIZER-COMPILER
The following describes the architecture of any ordinary compiler, and of the current compiler as well
Lexer -> Tokens -> Parser -> AST -> Translator -> IR -> Optimizer

OPTIMIZER-COMPILER
At this phase the optimizer tries to optimize the JS input based on optimization theories after the AST was generated and converted into an IR
Optimizer:
Inline Expansion
Loop Invariant Code Motion
Hidden Classes
Constant Folding
Type Inference
Copy Propagation
Inline Caches
Common Sub-Expression Elimination
Function Synthesis
Dead Code Elimination
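
Of the passes listed, constant folding is the easiest to illustrate in a few lines. The sketch below is not the authors' optimizer: `fold` is a hypothetical pass over a hand-built, ESTree-shaped AST that handles only literals and three binary operators, and the split-string input is an assumed example of the kind of obfuscation such a pass can undo.

```javascript
// Fold constant sub-expressions in a tiny ESTree-shaped AST.
// Only Literal and BinaryExpression ('+', '-', '*') nodes are handled.
function fold(node) {
  if (node.type !== 'BinaryExpression') return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.type === 'Literal' && right.type === 'Literal') {
    switch (node.operator) {
      case '+': return { type: 'Literal', value: left.value + right.value };
      case '-': return { type: 'Literal', value: left.value - right.value };
      case '*': return { type: 'Literal', value: left.value * right.value };
    }
  }
  // Not fully constant: keep the node, but with folded children.
  return { ...node, left, right };
}

// Small constructors for hand-building the AST.
const lit = (value) => ({ type: 'Literal', value });
const id = (name) => ({ type: 'Identifier', name });
const plus = (left, right) => ({ type: 'BinaryExpression', operator: '+', left, right });

// AST for the expression: 'un' + 'esc' + 'ape'
const folded = fold(plus(plus(lit('un'), lit('esc')), lit('ape')));
console.log(folded); // { type: 'Literal', value: 'unescape' }

// AST for: ('ev' + 'al') + x   (x is unknown, so only the left side folds)
const partial = fold(plus(plus(lit('ev'), lit('al')), id('x')));
console.log(partial.left); // { type: 'Literal', value: 'eval' }
```

After folding, the analyzer sees the reassembled string as a single literal, so the structure of the program rather than its surface spelling drives the analysis.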
