You are on page 1of 59

RIVERSIDE RESEARCH INSTITUTE

Deobfuscator:

An Automated Approach to the


Identification and Removal of
Code Obfuscation

Jason Raber, Reverse Engineer, Team Lead


Eric Laspe, Reverse Engineer
Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


The Problem: Obfuscated Code

• Malware authors use code obfuscation


techniques to hide their malicious code
• Obfuscation costs reverse engineers time:
– Complicates instruction sequences
– Disrupts control flow
– Makes algorithms difficult to understand
• Manual obfuscation removal is a tedious
and error-prone process

RIVERSIDE RESEARCH INSTITUTE


Example: PUSH_POP_MATH

PUSH an immediate, then POP into a register and


do some math on it
Obfuscated code: PUSH a value

POP it into EDX

Math on EDX

Resolves to:
Emulate Result

NOP
Unnecessary
Instructions

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


Malware Example: RustockB

• Good malware example that


implemented obfuscation patterns to
hide a decryption routine
• Many useless and confusing instructions
– Push regs, math, pop regs
– Pushes and pops in various obfuscated forms
• Control flow obscured
– Mangled jumps
– Unnecessary data cross-references

RIVERSIDE RESEARCH INSTITUTE


RustockB Decryption Control Flow

RIVERSIDE RESEARCH INSTITUTE


RustockB Decryption Control Flow

RIVERSIDE RESEARCH INSTITUTE


RustockB Decryption Control Flow

Obfuscated Pop
Unref’d Instruction
Obfuscated Jump

Obfuscated Push

Obfuscated Jump

Obfuscated Jump

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


The Solution:
The Deobfuscator IDA Pro Plug-in

• Combines instruction emulation and


pattern recognition
• Determines proper code control flow
• Interprets and transforms instruction
sequences to enhance code readability
• Uses a binary injector to make both static
and dynamic analysis easier

RIVERSIDE RESEARCH INSTITUTE


Modes of Operation

• The plug-in has five modes:


– Anti-disassembly – replaces anti-disassembly
with simplified code
– Passive – simple peep-hole rules
– Aggressive – uses aggressive assumptions
about memory contents
– Ultra – more aggressive assumptions
– Remove NOPs – removes slack space

RIVERSIDE RESEARCH INSTITUTE


IDA Pro Integration

• Deobfuscator plug-in invoked with Alt-Z

RIVERSIDE RESEARCH INSTITUTE


IDA Pro Integration

• Deobfuscator plug-in invoked with Alt-Z


• Uses structures created by IDA Pro
disassembly and analysis
• Depending on the mode selected, it can:
– Follow jumps and calls
– Track registers and emulate the stack

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


Demonstration

• Demo code protected


with anti-disassembly
code and obfuscation
• Note the obfuscated
jump at the end of this
graph
• Run iteratively, the
Deobfuscator will
remove obfuscation
and improve code flow
readability

RIVERSIDE RESEARCH INSTITUTE


Run 1 – Anti-Disassembly

• Two matching patterns


– JZ_JMP
– CALL_MATH

RIVERSIDE RESEARCH INSTITUTE


Output Injection

• Text files are generated by the


Deobfuscator plug-in

RIVERSIDE RESEARCH INSTITUTE


Output Injection

• Then, we inject the new, simplified


pattern over the old pattern with a
binary injector PERL script

RIVERSIDE RESEARCH INSTITUTE


Reload

• We reload the binary


in IDA Pro and see the
obfuscated code begin
to disappear
• The Deobfuscator
replaces obfuscation
patterns and injects
NOPs over useless code
to create slack space

RIVERSIDE RESEARCH INSTITUTE


Slack Space

• Slack space is useful for patterns that


need additional bytes to create a
simplified instruction
• Example: Transformed Code 1
MOV EBX, EAX
Obfuscated Code Needs two bytes NOP
PUSH EAX
NOP
* NOP
NOP
NOP
NOP
NOP
NOP
POP EBX
Transformed Code 2
Needs five bytes
MOV EBX, IMMED
NOP
*Code that was removed by an earlier run of the Deobfuscator

RIVERSIDE RESEARCH INSTITUTE


Pattern 1: JZ_JMP

Two useless jumps


Before Deobfuscation:
Useless Jumps

After Deobfuscation:

NOP’d Jumps

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: CALL_MATH

EDX gets the return address of the CALL $5


Then, there is some math on EDX
Before Deobfuscation:

EDX = 401033

After Deobfuscation:
Emulated
Result
NOP’d Pop &
Math

RIVERSIDE RESEARCH INSTITUTE


Run 2 – Passive, Aggressive, & Ultra

• Three matching patterns


– MOV_MATH
– MATH_MOV_OR_POP
– MATH_MOV_OR_POP

RIVERSIDE RESEARCH INSTITUTE


Pattern 1: MOV_MATH

Move an immediate into EAX and XOR it with


another known register value
Before Deobfuscation:
Move into EAX

EAX Math

After Deobfuscation:
Emulated
Result
NOP’d Math

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: MATH_MOV_OR_POP

Do math on EDX, then MOV an immediate


into EDX before using it
Before Deobfuscation:
EDX Math

After Deobfuscation:

NOP’d Math

RIVERSIDE RESEARCH INSTITUTE


Pattern 3: MATH_MOV_OR_POP

Do math on EDX, then POP EDX before using it

Before Deobfuscation:
EDX Math

POP EDX

After Deobfuscation:
… NOP’d Math

RIVERSIDE RESEARCH INSTITUTE


Run 3 – Passive, Aggressive, & Ultra

• Two matching patterns


– MOV_MATH_TRACK_REGISTER
– PUSH_POP_MATH

RIVERSIDE RESEARCH INSTITUTE


Pattern 1:
MOV_MATH_TRACK_REGISTER

Deobfuscator tracks register math


and emulates the result
Before Deobfuscation:


ECX Math

After Deobfuscation:
Emulated
Result

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: PUSH_POP_MATH

PUSH an immediate, then POP EDX and


do some math on it
Before Deobfuscation:

EDX Math

After Deobfuscation:
Emulated
Result

RIVERSIDE RESEARCH INSTITUTE


Run 4 – Passive, Aggressive, & Ultra

• Two matching patterns


– MOV_REG_MOV_REG
– MOV_JMP

RIVERSIDE RESEARCH INSTITUTE


Pattern 1: MOV_REG_MOV_REG

MOV an immediate into EDX, then MOV EDX again


before using it
Before Deobfuscation:
Move into EDX

Move into EDX


Again
After Deobfuscation:

NOP’d Move

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: MOV_JMP

MOV an immediate into EDX, then JMP EDX


Before Deobfuscation:

OB Jump
After Deobfuscation:

Jump Resolved

RIVERSIDE RESEARCH INSTITUTE


New Code Region Defined

Deobfuscation resolved the jump, causing


IDA Pro to analyze a new code region

RIVERSIDE RESEARCH INSTITUTE


New Code Region Defined

Deobfuscation resolved the jump, causing


IDA Pro to analyze a new code region

RIVERSIDE RESEARCH INSTITUTE


Run 5 – Anti-Disassembly

• Two matching
patterns
– JNZ_JZ
– JMP_INTO_INSTR

RIVERSIDE RESEARCH INSTITUTE


Run 5 – Anti-Disassembly

• Two matching
patterns
– JNZ_JZ
– JMP_INTO_INSTR
• The result is a better
control flow graph

RIVERSIDE RESEARCH INSTITUTE


Pattern 1: JNZ_JZ

Two conditional jumps to the same location


Before Deobfuscation:


Same
Destination

After Deobfuscation:

Unconditional
… Jump

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: JMP_INTO_INSTR

JMP into the middle of the instruction


Byte sequence EB FF E0 is read by IDA Pro as a JMP $+1

EB FF E0

Before Deobfuscation:
Jump One Byte

RIVERSIDE RESEARCH INSTITUTE


Pattern 2: JMP_INTO_INSTR

JMP into the middle of the instruction


The real instruction is in the next two-bytes: JMP EAX

EB FF E0

After Deobfuscation:
Real Jump

RIVERSIDE RESEARCH INSTITUTE


Run 6 – Anti-Disassembly

• Unreferenced instructions
• One matching pattern
– MOV_JMP

RIVERSIDE RESEARCH INSTITUTE


Pattern: MOV_JMP

MOV an immediate into EAX, then JMP EAX


Before Deobfuscation:

OB Jump
After Deobfuscation:

Jump Resolved

New Code
Region Defined

RIVERSIDE RESEARCH INSTITUTE


Run 7 – Passive, Aggressive, & Ultra

• Three matching patterns


– REG_NO_REF
– MOV_MATH_TRACK_REGISTER
– REG_NO_REF

RIVERSIDE RESEARCH INSTITUTE


Patterns 1 & 3: REG_NO_REF

MOV values into registers and never use them


Before Deobfuscation:

ECX Unused

After Deobfuscation:

NOP’d Move /
Encountered
Retn

RIVERSIDE RESEARCH INSTITUTE


Pattern 2:
MOV_MATH_TRACK_REGISTER

Deobfuscator tracks register math


and emulates the result
Before Deobfuscation:

EBX Math

After Deobfuscation:
Emulated
Result

RIVERSIDE RESEARCH INSTITUTE


Run 8 – Passive, Aggressive, & Ultra

• Two matching patterns


– REG_NO_REF
– REG_NO_REF

RIVERSIDE RESEARCH INSTITUTE


Patterns 1 & 2: REG_NO_REF

MOV values into registers and never use them


Before Deobfuscation:
EBX Unused

EDX Unused

After Deobfuscation:

NOP’d Move

NOP’d Move

RIVERSIDE RESEARCH INSTITUTE


Finishing Up

• The Deobfuscator has finished matching


obfuscation patterns
• Slack space is no longer needed, so we
run “NOP Remove” mode
• It injects unconditional jumps to reduce
NOPs and simplify the appearance of the
control flow

RIVERSIDE RESEARCH INSTITUTE


NOP Remove
Before: After:

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


RustockB: Before & After

Deobfuscated!

RIVERSIDE RESEARCH INSTITUTE


RustockB Decryption Pseudo-code
for (i = 7; i > 0; i--)
{
Address = 0x00401B82 // Starting address of encrypted region
Key1 = 0x4DFEE1C0 // Decryption key 1
Key2 = 0x0869ECC5 // Decryption key 2
Key3 = 0 // Decryption key 3
Key4 = 0 // Decryption key 4 (Accumulator)
for (j = 0x44DC; j > 0; j--, Address += 4) // 0x44DC = size of encrypted region
{
for (k = 2; k > 0; k--)
{
Key4 = k * 4
Key4 ^= 0x5E57B7DE
Key4 ^= Key3
Key4 += Key2
Key1 ^= k
[Address] -= Key4
Key3 += Key1
}
}
}

for (i = 0x44DC, Address = 0x00401B82, Sum = 0; i > 0; i--, Address += 4)


Sum += [Address] // Add up the encrypted region (a DWORD at a time) in EAX

for (i = 0x44DC, Address = 0x00401B82; i > 0; i--, Address += 4)


[Address] ^= Sum // XOR each DWORD of the encrypted region with the sum in EAX

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


Sample Source Code

A Simple Problem:

RIVERSIDE RESEARCH INSTITUTE


Sample Source Code

The Simple Solution:


//-------------------------------------------------------------------------------
// CALL NULL - A function call that just returns
//-------------------------------------------------------------------------------
int CALL_NULL(insn_t call, FILE *outfile, int *instr_offset)
{
if (call.itype == NN_call && call.Operands[0].type == o_near)
{
if (!get_next_instruction(call.Operands[0].addr)) return 0;
insn_t ret = cmd;

// Function that just returns


if (ret.itype == NN_retn)
{
*instr_offset = call.size;
msg("\n%a CALL_NULL\n", call.ea);

// NOP the call


fprintf(outfile, "%X 5 90 90 90 90 90\n", get_fileregion_offset(call.ea));

// NOP the return


fprintf(outfile, "%X 1 90\n", get_fileregion_offset(ret.ea));

return 1;
}
}

return 0;
}

RIVERSIDE RESEARCH INSTITUTE


Overview

• The Problem: Obfuscation


• Malware Example: RustockB
• The Solution: Deobfuscator
• Demonstration
• RustockB: Before & After
• Sample Source Code
• Summary

RIVERSIDE RESEARCH INSTITUTE


Summary

• Most malware authors that wish to


protect their IP use obfuscation
techniques
• The Deobfuscator detects and simplifies
many of these obfuscation and anti-
disassembly patterns
• Over time, the repository of patterns will
be developed to characterize most
generic cases of obfuscation

RIVERSIDE RESEARCH INSTITUTE


Future Development

• Grammar
• Iterative patching of IDA database
• Black-box control flow
• Code Collapsing

RIVERSIDE RESEARCH INSTITUTE


Contact

• For more information on this and other


tools, contact:

RRI Software Security Team


softwaresecurity@rri-usa.org
937-427-7085

• Visit us online:
http://www.rri-usa.org/isrsoftware.html
RIVERSIDE RESEARCH INSTITUTE

You might also like