
C++ Reverse Disassembly

Opcodevoid, 25 Aug 2004




Technical Detail
This article's aim is to provide material for modern-day decompiling of applications written in C++.
We assume you have a solid understanding of C++, x86 assembly, and Windows.

Overview and contents


1. Why is C++ Decompiling possible?
   1. Intro
   2. Modern Day Example
   3. Compiler Specific
2. C++ Protocols
   1. Intro
   2. Global Variables
   3. Expressions
   4. Return Values
   5. Function calls and the stack
   6. Local Variables
3. C++ Keywords
   1. Intro
   2. If statement
   3. For Loop
   4. Structures
   5. Technical Algorithms
4. Practical Decompiling
   1. Intro to Decompiling Windows application
   2. Decompiling a sample application

Special Case: Compiler Specific


Compiler Specific:
Each compiler is different. Their CRT startup routines, the way they assemble statements (switch, if, while), and numerous other details make each compiler generate different code. Even if you compile the same C++ code with two compilers, the end result will be different. Because of this I will stick with one and only one compiler: the Visual C++ compiler.
Visual C++ is produced by Microsoft and, in my experience, delivers some of the fastest and most optimized code available. That is not to say all the information in this book applies only to Visual C++; I am just saying some of it may only work on Visual C++.
If you don't have Visual C++, that is fine. There are many other compilers available, and most of this information is also accurate for them.

Chapter 1: Why is C++ Decompiling possible?


1.1 Intro:
I have been asked many times whether C++ decompiling is even possible, not only because of the complexity of a compiler but also because of the massive amount of information lost in compiling, such as comments, include files, and macros, to name just a few. So one often wonders whether it is even worth pursuing. I want to start with what is totally lost when you compile a program and what stays there; refer to table 1.1.1 to see what is lost and what remains.

Table 1.1.1

What is lost     | What remains
-----------------+----------------------
Templates        | Function calls
Classes          | Dynamic linking calls
Macros           | Switch statements
Include files    | Local variables
Comments         | Parameters

This is not to say that everything in the What remains column is 100% there; it just means it is very simple and practical to reverse engineer. Because of this, I chose to deal with the What remains column first, since it is much easier.
As we progress through this book, keep in mind that reverse engineering is seldom straightforward and takes lots of practice. It is harder to reverse engineer something than to create it in the first place.

A good way to start out with reverse engineering is to decompile your own programs and see how each C++ construct specifically works, then apply that knowledge in other areas, because looking at thousands of lines of assembly code is not really fun.

1.2 Modern Day Examples:


Now, when you're reading this book, you might start to think that anything translated into a different language can be translated back into the same language, right? In reverse engineering this is not the case: a lot of things will be lost, and a lot of things you must make up (assume) along the way.
So I wanted to make sure I provided some practical examples at the beginning of the book, to give you a sense of hope.
To begin reverse engineering, I decided to start with the main C++ entry point:

int main(int argc, char * argv[])

You might think we can easily find this function in any executable, because the PE format tells us where the executable starts: we could simply read the PE header of a specific executable and get its start address. Or can we?
This is where the C Runtime Library (CRT) comes in. When you compile a C++ program, most compilers (this is compiler-specific) arrange for execution in the following order:
1. CrtlStartUp();
2. int main(int argc, char * argv[])
3. CrtlCleanUp();
This means we can't look into the PE file and get the start of our code; we can only get the start of CrtlStartUp()'s code. We have two choices: reverse engineer the CRT startup code or skip over it. I prefer the latter, and we will deal with the C Runtime Library later.
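To see for yourself where the PE header sends execution, a small parser is enough. The following is a minimal C++17 sketch, assuming the whole file is already loaded into a byte buffer; the function names are our own. It follows e_lfanew (offset 0x3C of the DOS header) to the "PE\0\0" signature, skips the 20-byte COFF file header, and reads AddressOfEntryPoint, which sits 16 bytes into the optional header in both PE32 and PE32+ files. The result is a relative virtual address (add the image base to get the address a disassembler shows), and for a Visual C++ program it points at the CRT startup code, not at main.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a little-endian 32-bit value; returns nullopt if it would run off the buffer.
static std::optional<std::uint32_t> read_u32(const std::vector<std::uint8_t>& buf,
                                             std::size_t off) {
    if (off > buf.size() || buf.size() - off < 4) return std::nullopt;
    return static_cast<std::uint32_t>(buf[off]) |
           (static_cast<std::uint32_t>(buf[off + 1]) << 8) |
           (static_cast<std::uint32_t>(buf[off + 2]) << 16) |
           (static_cast<std::uint32_t>(buf[off + 3]) << 24);
}

// Returns the AddressOfEntryPoint RVA of a PE image held in memory,
// or nullopt if the "MZ" or "PE\0\0" signatures are missing.
std::optional<std::uint32_t> pe_entry_point_rva(const std::vector<std::uint8_t>& file) {
    if (file.size() < 0x40 || file[0] != 'M' || file[1] != 'Z') return std::nullopt;
    auto e_lfanew = read_u32(file, 0x3C);                  // offset of the PE signature
    if (!e_lfanew) return std::nullopt;
    auto sig = read_u32(file, *e_lfanew);
    if (!sig || *sig != 0x00004550u) return std::nullopt;  // "PE\0\0" read little-endian
    // Signature (4 bytes) + COFF file header (20 bytes) = start of the optional header.
    std::size_t optional_header = static_cast<std::size_t>(*e_lfanew) + 4 + 20;
    return read_u32(file, optional_header + 16);           // AddressOfEntryPoint
}
```

Real tools also validate the section table and map the RVA to a file offset; this sketch stops at the header field.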

Chapter 2: C++ Protocols:


2.1 Intro
One of the main reasons C++ is so well designed is that its compiled code follows strict protocols. Some of the generated assembly is very predictable: for example, integer and pointer return values (up to 32 bits) are always placed in the EAX register, and function calls usually pass their arguments on the stack. Reverse engineers can attack these predictable patterns to get a head start.
The first thing we should deal with is global variables, because if you're coming from higher-level languages you might have some misconceptions.

2.2 Global Variables

You know how many books say memory is allocated randomly on the computer? This is true for the most part, but your application's memory for global variables is quite static. That's right: each time you run your program, your statically allocated variables will end up in the same place (as long as the executable is loaded at its preferred base address).
Another interesting fact is that a pointer variable doesn't hold the data itself; it holds the address of where the data is stored.
Here is a C++ example:

#include "stdafx.h"
#include "windows.h"

const char * globalvar = "Whats Up";

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow)
{
    // Point the variable somewhere else; the string itself is not touched.
    globalvar = (const char *)0x400000;
    return 0;
}

Here is an in-depth look at the disassembly:

00405030: global_var        dd 405034h
00405034: global_var_value  db 'Whats Up',0

mov global_var, 400000h

OK, this shows that a pointer variable does not hold the data itself: as you can see, the compiler automatically initializes our global_var pointer to the address of global_var_value.
So far we know that a pointer variable just holds the address of a value, so we can change where the variable points, right? Yes we can, with mov global_var, 400000h. Whenever the program accesses global_var, it will look at the value stored at 405030h and come up with 400000h.
If you're confused, remember that global_var itself is stored at 405030h, and refer to picture 2.

This picture is pretty self-explanatory, and if you're still confused about how everything works, I suggest you get a good assembly book and learn what indirect addressing is.
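The same distinction can be checked from portable C++ without hard-coding 0x400000 (dereferencing an arbitrary address is undefined behavior, and real addresses change between builds). In this sketch the globals mirror the listing under hypothetical names: the characters live in one place, the pointer variable lives in another, and repointing it only changes the pointer's own storage.

```cpp
#include <cstdint>

// Hypothetical globals mirroring the listing above.
const char global_var_value[] = "Whats Up";   // like 00405034: db 'Whats Up',0
const char* global_var = global_var_value;    // like 00405030: dd 405034h

// The address stored *inside* global_var (the 405034h in the listing).
std::uintptr_t address_held_by_pointer() {
    return reinterpret_cast<std::uintptr_t>(global_var);
}

// The address *of* global_var itself (the 405030h in the listing).
std::uintptr_t address_of_pointer() {
    return reinterpret_cast<std::uintptr_t>(&global_var);
}

// Repoints global_var, the C++ side of "mov global_var, 400000h".
// Only the pointer's own storage changes; the string bytes are untouched.
void repoint(const char* target) { global_var = target; }
```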
We have just dealt with a pointer variable; let's deal with a plain variable, because this is much simpler.

#include "stdafx.h"
#include "windows.h"

char globalvar[] = "Whats Up";

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow)
{
    globalvar[0] = 'A';
    globalvar[5] = 'U';
    return 0;
}

Which, when compiled, becomes:

00405030 global_var db 'Whats Up',0

mov global_var, 'A'
mov global_var + 5, 'U'

We instantly see that plain variables are a lot simpler than pointer variables: all we have to do is refer to an address in memory which holds our data. Of course, in machine code we can't see pretty names like global_var, so here is a pure disassembly:

00405030 db 'Whats Up',0

mov byte ptr [00405030h], 'A'
mov byte ptr [00405035h], 'U'

As you can see, we aren't doing anything special, just modifying the values stored at 00405030 and 00405035.
You should have variables and pointer variables down pat, since this information will not be explained again; if there is something you don't understand, read it over.

2.3 Expressions
OK, as we all know, C++ has a near-English-like syntax in which we can program. x86 assembly code doesn't; for example, take a look at the following statement:

int s = 3 + 4 + 1 + 5 + 9;

How can we calculate this in assembly? Simple; look at the following C++ example:

#include "stdafx.h"
#include "windows.h"

int s1 = 3;
int s2 = 4;
int s3 = 1;
int s4 = 1;

int APIENTRY WinMain(HINSTANCE hInstance,
                     HINSTANCE hPrevInstance,
                     LPSTR     lpCmdLine,
                     int       nCmdShow)
{
    s1 = s2 + s3 - s4 + 34;
    return 0;
}

Which, when compiled, becomes:

00405030 s1 dd 3
00405034 s2 dd 4
00405038 s3 dd 1
0040503C s4 dd 1

         mov eax, s2
00401008 add eax, s3
0040100E sub eax, s4
00401014 add eax, 34
00401017 mov s1, eax

OK, the compiler optimizes the code a little bit, but it's still very easy to understand.

The first thing the compiler does is load eax with the value of s2 (mov eax, s2); now eax holds 4.
Next it adds s3 to eax; now eax holds 5.
After that it subtracts s4 from eax; now eax holds 4.
Then it adds 34 to eax; now eax holds 38.
Finally it moves eax, which holds 38, into s1; now s1 holds 38.
You will often see the compiler use registers instead of variables in expressions because registers are faster.
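One reliable way to check a hand decompilation is to write the listing back out one C++ line per instruction, with a local variable standing in for the register, and compare it with the statement you recovered. A minimal sketch using the globals from the example above (the two function names are hypothetical):

```cpp
// Globals with the same initial values as the listing.
int s1 = 3, s2 = 4, s3 = 1, s4 = 1;

// The statement as written in C++.
int expression_as_cpp() { return s2 + s3 - s4 + 34; }

// The same computation, one line per instruction; "eax" is just a local here.
int expression_as_listing() {
    int eax = s2;   // mov eax, s2   -> 4
    eax += s3;      // add eax, s3   -> 5
    eax -= s4;      // sub eax, s4   -> 4
    eax += 34;      // add eax, 34   -> 38
    return eax;     // the caller finishes with mov s1, eax
}
```

If the two functions ever disagree, the recovered statement is wrong somewhere.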
From this we can conclude that the compiler maps each mathematical operator to a specific x86 instruction; here is a table:

C++ operator        | x86 instruction
--------------------+--------------------------------------------------------
* (multiplication)  | mul (imul for signed integers), fmul for floating point
/ (division)        | div (idiv for signed integers), fdiv for floating point
- (subtraction)     | sub
+ (addition)        | add

As you can see, we can easily decipher most statements in C++ using the table above.
For a test we will look at a sample disassembly dump and decompile it by hand to C++.
00000000 2
00000001 3
00000002 4
00000003 1
00000004 0
00000005 mov al, [00000000]
         add al, [00000001]
         mov ch, [00000002]
         mul ch
         mov [00000003], ax

OK, the first thing we do is try to figure out what types of variables are being used. From what we can see, the code uses al and ch, which are 8-bit registers, so whenever a memory location is read or written through an 8-bit register, the variable is most likely a char.
Further down you see mov [00000003], ax, and since ax is a 16-bit register, that variable's type is short int.
Here is a small table, so you can map registers to variable types:

x86 registers              | C++ variable type
---------------------------+-----------------------------
8-bit registers (AL, AH)   | char
16-bit registers (AX)      | short int
32-bit registers (EAX)     | int (or long, or a pointer)

So far we see 4 distinct memory addresses being referenced, so we know we have 4 variables. The first one, [00000000], is obviously a char variable, since we see

mov al, [00000000]

and al is an 8-bit register.

So let's give [00000000] the name s1. We also see that [00000000] through [00000002] are all referenced through 8-bit registers, meaning they are also char type, and the last one, [00000003], which is used as mov [00000003], ax, is a short int, since ax is 16 bits.
OK, let's create another table, one which will hold variable names, or aliases, for the addresses. Although we can never get the original variable names back, we can create our own.

Address    | Variable name/alias | Variable type
-----------+---------------------+--------------
0000:0000  | s1                  | char
0000:0001  | s2                  | char
0000:0002  | s3                  | char
0000:0003  | s4                  | short int

Notice that s4 occupies two bytes: 00000003 holds 1 and 00000004 holds 0. This is because Intel is a little-endian machine, which stores the least significant byte of a value at the lowest address.
Now the next thing we should do is rewrite the code above with the aliases we created:

s1 db 2
s2 db 3
s3 db 4
s4 dw 1

mov al, s1
add al, s2
mov ch, s3
mul ch
mov s4, ax

Now, the first thing we do is mov al, s1, so al now holds a value of 2. The next thing we do is add al, s2; since s2 holds 3, al now has a value of 5. Next we mov ch, s3, so ch now has a value of 4. After that we mul ch, so ax now holds al * ch, and since al holds 5 and ch holds 4, ax holds 20.
Because the addition happens before the multiplication, we can start to decipher the C++ expression, which is

(s1 + s2) * s3

After that we see mov s4, ax, so the complete C++ statement is

s4 = (s1 + s2) * s3;

As you can see, we just went through a whole bunch of mess to come up with a simple C++ statement, and this only works for global variables, not local variables or structure members. So things will only get harder; because of this, I suggest you read carefully and, if you don't understand something, read it over and over until you do.
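The whole dump can be replayed on a byte array to confirm both the recovered statement and the little-endian layout of s4. This is a sketch under the dump's own assumptions (a hypothetical 5-byte data area, chars at 0..2, a 16-bit short at 3..4); the function names are ours. Byte arithmetic goes through std::uint8_t so it wraps like the 8-bit registers would.

```cpp
#include <array>
#include <cstdint>

// Replays the dump on a 5-byte "data segment" and returns the memory afterwards.
std::array<std::uint8_t, 5> run_char_dump() {
    std::array<std::uint8_t, 5> mem = {2, 3, 4, 1, 0};      // s4 = 1, low byte first

    std::uint8_t al = mem[0];                               // mov al, [00000000]
    al = static_cast<std::uint8_t>(al + mem[1]);            // add al, [00000001]
    std::uint8_t ch = mem[2];                               // mov ch, [00000002]
    std::uint16_t ax = static_cast<std::uint16_t>(al * ch); // mul ch: ax = al * ch

    // mov [00000003], ax: x86 stores the low byte at the lower address.
    mem[3] = static_cast<std::uint8_t>(ax & 0xFF);
    mem[4] = static_cast<std::uint8_t>(ax >> 8);
    return mem;
}

// Reads s4 back the way the CPU does: low byte at 00000003, high byte at 00000004.
std::uint16_t read_s4(const std::array<std::uint8_t, 5>& mem) {
    return static_cast<std::uint16_t>(mem[3] | (mem[4] << 8));
}
```

The result is 20, which matches (s1 + s2) * s3 and not s1 + s2 * s3 (which would be 14).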

2.4 Return Values


One of the major fundamentals of C++ is returning values from function calls. This is actually a very simple procedure, because it simply involves placing the value (for int- and pointer-sized results) into the eax register.
So when you have a statement like this

c = (char *) malloc(0xFF);

the first thing the compiler does is call malloc, and then it assigns eax to c, like mov c, eax.
For example, if a function executes return 5;, what you are really saying is

__asm
{
    mov eax, 5
    ret
}

Let's have a little practice with a full disassembly dump:

mov eax, 5
add eax, 2
sub eax, 1
ret

And the C++ equivalent is

return 5 + 2 - 1;

Although simple, this is one of the most important concepts a C++ reverse engineer can learn.
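Because the return value is simply "whatever is in eax at ret", you can read a straight-line function by tracking eax alone. The tiny interpreter below is a hypothetical helper, not any real tool: it supports only mov/add/sub with an immediate operand and eax as the destination, which is exactly enough for the dump above.

```cpp
#include <vector>

// A hypothetical three-instruction subset, just enough for the listing above.
enum class Op { Mov, Add, Sub };

struct Insn {
    Op op;
    int imm;  // immediate operand; the destination is always eax
};

// Executes the instructions in order and returns eax, i.e. the function's return value.
int run_until_ret(const std::vector<Insn>& code) {
    int eax = 0;
    for (const Insn& insn : code) {
        switch (insn.op) {
            case Op::Mov: eax = insn.imm; break;
            case Op::Add: eax += insn.imm; break;
            case Op::Sub: eax -= insn.imm; break;
        }
    }
    return eax;  // ret: whatever is in eax is what the caller receives
}
```

Keep in mind that an optimizing compiler would normally fold return 5 + 2 - 1; into a single mov eax, 6, so recovered expressions often come back already simplified.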

2.5 Function Calls and the Stack


Now it's time to get to the blood and guts of C++ with function calls.
Function calls are fairly simple for the most part, because, to an assembly programmer, functions are just labels. For example:

int func() { return 1; }

func();

Would compile into

func:
    mov eax, 1
    ret

    call func

From this we can conclude that function names are like variables: they are just references to some address, which is the same as a label.
Here is a full disassembly dump for practice:

0000:0000  0
0000:0001  0
0000:0002  0
0000:0003  0
0000:0005  mov eax, 1
0000:0009  ret
0000:0010  call 0000:0005    ; code starts here
0000:0015  mov [0000:0000], eax

OK, the first thing we see is that at address 0000:0015 we are assigning the value of a 32-bit register to a 32-bit memory location, which means we have a 32-bit variable at hand, an int type variable to be more exact.
So let's create an alias for addresses 0000:0000 - 0000:0003, which will be s1.
Now let's create a new disassembly with this added information:

s1 dd 0
0000:0005  mov eax, 1
0000:0009  ret
0000:0010  call 0000:0005    ; code starts here
0000:0015  mov s1, eax

OK, the second thing we see is that the code starts at 0000:0010, and the first instruction is call 0000:0005.
At 0000:0005 we can see that the code moves a value into eax and then returns. Now we are back at address 0000:0015, and we move eax into s1.
So we can now reverse engineer this whole program back into C++:

int s1 = 0;                  // dd 0

int some_function()
{
    return 1;                // mov eax, 1 : ret
}

int main()
{
    s1 = some_function();    // mov s1, eax
}

Now, what do we do when functions have parameters? Things get more complicated, because the compiler uses the stack to pass parameters.
It pushes parameters right to left, meaning the last parameter goes in first and the first parameter goes in last.

For example, the C++ function call

func(1, 2);

Would compile into

push 2
push 1
call func

Now let's imagine a stack with a size of 32 bytes.

The first thing we realize is that ESP = 32 before anything is pushed. Keep in mind that push first subtracts 4 from ESP and then stores the value at the new ESP. With that in mind, look at the table below:

x86 instruction | Value stored at        | ESP afterwards
----------------+------------------------+---------------
push 2          | [28]                   | ESP = 28
push 1          | [24]                   | ESP = 24
call func       | [20] (return address)  | ESP = 20
push ebp        | [16] (saved ebp)       | ESP = 16

Remember that when you issue a call instruction on x86, the processor pushes the address of the instruction following the call onto the stack, so it knows the location it should return to.
Now that the parameters are on the stack, let's look at the function itself:

int func(int a, int b)
{
    return a + b;
}

The first thing the compiler does is save ebp with push ebp, so ESP now equals 16.
Next it executes mov eax, [ESP + 8]: since ESP equals 16, this reads [24], where the first parameter (a = 1) is stored.
Then it executes add eax, [ESP + 12]: since ESP still equals 16, this reads [28], where the second parameter (b = 2) is stored.
The last things the compiler does are pop ebp and ret.

So the full compilation would be

func:
    push ebp
    mov eax, [ESP + 8]     ; a
    add eax, [ESP + 12]    ; b
    pop ebp
    ret
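The offsets above are easy to get wrong by 4, so it can help to replay them on a toy stack. This sketch models the 32-byte stack from the table with the real push semantics (decrement ESP, then store). ToyStack and replay_func_call are hypothetical names, and the return address and saved ebp are placeholder values.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Toy model of the 32-byte stack from the table; addresses are byte offsets.
struct ToyStack {
    std::array<std::uint8_t, 32> mem{};
    std::uint32_t esp = 32;                  // empty stack: ESP at the top

    void push(std::uint32_t value) {         // x86 push: decrement first, then store
        esp -= 4;
        std::memcpy(&mem[esp], &value, sizeof value);
    }
    std::uint32_t load(std::uint32_t addr) const {
        std::uint32_t value;
        std::memcpy(&value, &mem[addr], sizeof value);
        return value;
    }
};

// Replays push 2 / push 1 / call func / push ebp, then the body of func.
std::uint32_t replay_func_call() {
    ToyStack s;
    s.push(2);            // second argument
    s.push(1);            // first argument
    s.push(0xDEADBEEF);   // call: placeholder return address
    s.push(0x12345678);   // push ebp: placeholder saved ebp
    std::uint32_t eax = s.load(s.esp + 8);   // mov eax, [esp + 8]  -> a
    eax += s.load(s.esp + 12);               // add eax, [esp + 12] -> b
    return eax;
}
```

Without the push ebp, the same parameters would be at [esp + 4] and [esp + 8]; that shift is exactly why the offsets depend on how much the function has pushed so far.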

A neat little reverse engineering tip is to remember that, since each stack slot is 4 bytes wide, you can easily tell which parameter is being accessed. Once the stack frame has been set up with push ebp / mov ebp, esp:

[EBP]      = saved ebp
[EBP + 4]  = return address
[EBP + 8]  = first parameter
[EBP + 12] = second parameter
[EBP + 16] = third parameter
[EBP + 20] = fourth parameter

And so on.

2.6 Local Variables


We just learned that parameters are stored on the stack; now it's time to learn about local variables, which are also stored on the stack, but quite differently.
Here is an example:

int func()
{
    int a = 5;
    return a;
}

OK, to compile this code, the compiler must reserve space on the stack with sub ESP, 4, since 4 bytes is the size of an int variable. Of course, before that the compiler must back up the esp register, and it does this with mov ebp, esp. But wait: the compiler must first back up ebp, and it does this with push ebp. So the very first thing the compiler does is

; Setting up the stack frame
push ebp        ; back up ebp
mov ebp, ESP    ; back up ESP in ebp
sub ESP, 4      ; reserve some space on the stack

Note: Visual C++ puts code like "Setting up the stack frame" at the start of every function, whether or not you use local variables, and then uses ebp to reference parameters and local variables. (This is the usual pattern in unoptimized builds; optimized builds can omit the frame pointer and address everything through esp.)
In the Function Calls and the Stack section I used esp to reference parameters and skipped most of the "Setting up the stack frame" code for clarity's sake.
Now the second thing the compiler does is

mov dword ptr [ebp - 4], 5
mov eax, [ebp - 4]

If we had a second local variable, we could simply use mov [ebp - 8], 5; of course, the compiler would then use sub ESP, 8 instead of sub ESP, 4.
The last thing the compiler does is restore the stack frame and return:

; Cleaning up the stack frame
mov ESP, ebp    ; restore stack pointer
pop ebp         ; restore ebp
ret

Note: the compiler emits this "Cleaning up the stack frame" code in every function that sets up a frame, so we can detect a function by looking for similar code. I also skipped most of this in the Function Calls and the Stack section for clarity's sake.
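The reason the prologue and epilogue make such a good fingerprint is that the epilogue exactly undoes the prologue. This sketch replays the compiled body of int func() { int a = 5; return a; } on a hypothetical register/memory model (Cpu and run_func are our names; the starting esp and ebp values are arbitrary) and checks that esp and ebp come back unchanged.

```cpp
#include <cstdint>
#include <map>

// Hypothetical register/memory model: memory maps an address to a 32-bit slot.
struct Cpu {
    std::uint32_t esp = 0x1000, ebp = 0xABCD, eax = 0;
    std::map<std::uint32_t, std::uint32_t> mem;
    void push(std::uint32_t v) { esp -= 4; mem[esp] = v; }
    std::uint32_t pop() { std::uint32_t v = mem[esp]; esp += 4; return v; }
};

// Runs the body of "int func() { int a = 5; return a; }" as compiled above.
// The call and ret themselves are left out; only the frame handling is modeled.
std::uint32_t run_func(Cpu& cpu) {
    cpu.push(cpu.ebp);                // push ebp
    cpu.ebp = cpu.esp;                // mov ebp, esp
    cpu.esp -= 4;                     // sub esp, 4      (room for a)
    cpu.mem[cpu.ebp - 4] = 5;         // mov [ebp - 4], 5
    cpu.eax = cpu.mem[cpu.ebp - 4];   // mov eax, [ebp - 4]
    cpu.esp = cpu.ebp;                // mov esp, ebp
    cpu.ebp = cpu.pop();              // pop ebp
    return cpu.eax;                   // ret would hand eax back to the caller here
}
```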
Here is a full disassembly dump for practice (the memory-to-memory add instructions are shorthand; real x86 code would route one operand through a register):

0000:0000  0
0000:0004  push ebp
0000:0003  mov ebp, esp
0000:0005  sub esp, 8
0000:0010  mov [ebp - 4], 5
0000:0015  add [ebp - 4], [ebp + 8]
0000:0016  mov eax, [ebp - 4]
0000:0018  mov esp, ebp
0000:0020  pop ebp
0000:0021  ret
0000:0022  push ebp
0000:0023  mov ebp, esp
0000:0025  add [ebp + 8], [0000:0000]
0000:0030  add [ebp + 8], [ebp + 12]
0000:0031  mov eax, [ebp + 8]
0000:0032  mov esp, ebp
0000:0035  pop ebp
0000:0036  ret
0000:0037  push ebp            ; code start
0000:0038  mov ebp, esp
0000:0040  push 1
0000:0044  call 0000:0004
0000:0049  mov [0000:0000], eax
0000:0050  push 4
0000:0051  push 3
0000:0052  call 0000:0022
0000:0056  add [0000:0000], eax
0000:0058  mov esp, ebp
0000:0059  pop ebp

OK, the first thing we see is that memory address [0000:0000] is accessed together with eax a lot, meaning we have a 32-bit variable, which is an int type. The next thing we notice is that we set up the stack frame 3 times and clean it up 3 times, which means we have 3 functions (and yes, int main() also sets up the stack frame and cleans it up).
So we have

func1()
func2()
main()

Next we see that func1's address is 0000:0004 and that it accepts one 32-bit parameter, because at address 0000:0040 we push 1 onto the stack and then at address 0000:0044 we call 0000:0004. So we can set up func1's declaration:

0000:0004 func1(int a)

Now whenever func1 does anything to [ebp + 8], we know that it is doing something to its first parameter. Looking into func1's code, we see that it has 1 local variable, because it references [ebp - 4].
Now let's take a look at address 0000:0049, which is mov [0000:0000], eax, so we know that the original C++ code is something like

[0000:0000] = func1(1);

Next we see at addresses 0000:0050 and 0000:0051 that we push 4 onto the stack, then push 3 onto the stack, and then call 0000:0022.
Now we can set up func2's declaration:

0000:0022 func2(int a, int b)

At address 0000:0056 we see add [0000:0000], eax, which means the original C++ code is something like

[0000:0000] += func2(3, 4);

Remember, we pushed 4 onto the stack first and 3 second, because parameters are passed right to left.
Now that we have a lot of information, let's make a new disassembly, one with aliases for all local variables and parameters in func1 and func2. Whenever the code uses [ebp + ...] it's a parameter, and when it uses [ebp - ...] it's a local variable.

0000:0000 s1 dd 0
0000:0004 func1(int param_1): push ebp
          { local: local_var_1 }

OK, I know, I made up a little assembly syntax, such as func1(int param_1) and

{ local: local_var_1 }

This is just for clarity's sake.
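The aliasing rule used here ([ebp + ...] is a parameter, [ebp - ...] is a local) is mechanical enough to write down. The helper below is a hypothetical sketch, assuming a standard ebp frame and 4-byte parameters and locals; it produces the same param_N / local_var_N names used in this section.

```cpp
#include <string>

// Names an [ebp + offset] operand, assuming a standard ebp frame and
// 4-byte parameters and locals. The naming scheme is our own.
std::string name_ebp_slot(int offset) {
    if (offset == 0) return "saved_ebp";
    if (offset == 4) return "return_address";
    if (offset % 4 != 0) return "unknown";             // not a whole 4-byte slot
    if (offset >= 8) return "param_" + std::to_string((offset - 8) / 4 + 1);
    return "local_var_" + std::to_string(-offset / 4); // offset is negative here
}
```

Structures, arrays, and 8-byte locals break the one-slot-per-variable assumption, so treat the output as a first guess to refine by hand.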


Now let's start with func1: at address 0000:0010 we see that it moves 5 into local_var_1, which in C++ is saying

int local_var_1 = 5;

Next we see add local_var_1, param_1, which in C++ is saying

local_var_1 += param_1;

The last thing we see before we clean up the stack is mov eax, local_var_1, which in C++ is saying

return local_var_1;

So the full reverse-engineered function is

int func1(int param_1)
{
    int local_var_1 = 5;
    local_var_1 += param_1;
    return local_var_1;
}

Now let's go to func2: at address 0000:0025 we see add param_1, s1, which in C++ is saying

param_1 += s1;

After that we see add param_1, param_2, which in C++ is saying

param_1 += param_2;

The last thing we see before we clean up the stack is mov eax, param_1, which in C++ is saying

return param_1;

So the full reverse-engineered function is

int func2(int param_1, int param_2)
{
    param_1 += s1;
    param_1 += param_2;
    return param_1;
}

Now we are able to reverse engineer the whole program:

int s1 = 0;

int func1(int param_1)
{
    int local_var_1 = 5;
    local_var_1 += param_1;
    return local_var_1;
}

int func2(int param_1, int param_2)
{
    param_1 += s1;
    param_1 += param_2;
    return param_1;
}

int main()
{
    s1 = func1(1);
    s1 += func2(3, 4);
}
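A quick sanity check is to run the reconstruction and compare it with a hand trace of the listing: func1(1) gives 5 + 1 = 6, which is stored in s1; func2(3, 4) then gives 3 + 6 + 4 = 13; and add [0000:0000], eax leaves 6 + 13 = 19. The sketch below repeats the recovered functions and replaces main with a hypothetical run_recovered_main() that returns s1, so the result can be checked. The func2 call is split onto its own line only to make the order of the three listing steps explicit.

```cpp
// Recovered functions from the walkthrough; names are our aliases, not the originals.
int s1 = 0;

int func1(int param_1) {
    int local_var_1 = 5;
    local_var_1 += param_1;
    return local_var_1;
}

int func2(int param_1, int param_2) {
    param_1 += s1;
    param_1 += param_2;
    return param_1;
}

// Same steps as the recovered main(), but returns s1 so it can be checked.
int run_recovered_main() {
    s1 = 0;                  // reset the global so repeated calls agree
    s1 = func1(1);           // call 0000:0004 / mov [0000:0000], eax  -> 6
    int r = func2(3, 4);     // push 4 / push 3 / call 0000:0022       -> 13
    s1 += r;                 // add [0000:0000], eax                   -> 19
    return s1;
}
```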

This chapter might be a little hard to comprehend at first, since I presented a lot of straight-to-the-point information. Again, if you don't understand anything, read it over, and if you still don't understand, email vbmew@hotmail.com with your question.

Chapter 3: C++ Keywords


3.1 Intro:
What we have been doing so far is the easy stuff; it's time to deal with C++ keywords, complex expressions, and some practical real-world examples.

3.2 If Statement
One of the main statements people use is the if statement, which logically compares values. Using this statement, we can choose which path of execution our program should take.
If statements can be very, very complex or very simple.
Take a look at the following examples:

if (i == 0)
    // do function
// continue

Now, what if we had something like this:

if (i == 0)
{
    int i2 = 0;
}
i2 = 3; // error: can't access i2 because it's not in your scope;
        // it's in the if statement's scope

Because of this, we might think the compiler generates a stack frame for each if statement with braces, right? Wrong!
In reality, i2's storage lives in main's stack frame; the compiler just hides the name outside the braces. The reason I'm telling you this is that to reverse engineer if statements, you must completely understand them.
The second example is

if ((i == 0) || ((i2 == 1) && (i3 == 2)))

The logic for this is: if i == 0, or if i2 == 1 and i3 == 2.


Another example would be

if ((c = (char *) malloc(0xFF)) == NULL)

This is saying: assign c = malloc(0xFF), and if malloc returns NULL, this condition is true.
Yet another example is

if (malloc(0xFF)) // call malloc(0xFF); if it returns
                  // anything not equal to 0, this condition is true

The last but not least example is

if (!malloc(0xFF)) // call malloc(0xFF); if the returned
                   // value is equal to 0, this condition is true

Thankfully, all these if statements can be reverse engineered back into (almost) just the way they were.
An if statement typically maps to the x86 cmp instruction followed by a conditional jump. With this in mind, take a look at the following C++ program:

int main()
{
    int i = 0;
    if (i == 34)
        i += 23;
    return 1;
}

This compiles into the following:

push ebp
mov ebp, esp                  ; set up the stack frame
sub esp, 4
mov dword ptr [ebp - 4], 0
cmp dword ptr [ebp - 4], 34
jnz continue_program
add dword ptr [ebp - 4], 23
continue_program:
mov eax, 1
mov esp, ebp                  ; restore the stack frame
pop ebp
ret

Yes, I know, I decided to give you a complete disassembly, to see if you remember the stack frame and that [ebp - 4] means the first local variable created; and yes, int main has to set up the stack frame like every other function.
Now let's learn how to turn this program back into C++.
The first thing we look at is mov [ebp - 4], 0, which tells us that the program initializes a variable to 0.
Next we see a cmp instruction that compares [ebp - 4] with 34; because of this we know the program is using an if statement, something like if ([ebp - 4] == 34). What we should do now is create an alias for [ebp - 4]; we will use local_var_1. Next we see the instruction jnz, which is the same as jne: it says that if [ebp - 4], or local_var_1, is not 34, skip over the if statement's body and jump to continue_program. Notice that the jump tests the opposite of the C++ condition: the compiler jumps around the body when the condition is false.

add [ebp - 4], 23, or add local_var_1, 23, is saying local_var_1 += 23;. After that we mov eax, 1, clean up the stack frame, and return.
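A useful intermediate step is to write the listing back as C++ with goto, keeping the compiler's inverted test, and then check that it behaves like the structured if. In this sketch both functions are hypothetical, and they return the variable (the original main returns 1) so the two versions can be compared.

```cpp
// The if statement written the way the listing executes it.
int if_as_jumps(int local_var_1) {
    if (local_var_1 != 34)        // cmp [ebp - 4], 34 / jnz continue_program
        goto continue_program;
    local_var_1 += 23;            // add [ebp - 4], 23
continue_program:
    return local_var_1;           // returned so the result can be checked
}

// The structured form we recover from it.
int if_as_cpp(int i) {
    if (i == 34)
        i += 23;
    return i;
}
```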
Now let's look at an if statement with multiple logical conditions:

if ((i == 0) || ((i2 == 23) && (i3 == 21)))

if_block_check1:
    cmp i, 0
    jne if_block_check2
    jmp do_if
if_block_check2:
    cmp i2, 23
    jne skip_if
    cmp i3, 21
    jne skip_if
do_if:
    ; actions here
skip_if:

OK, the first thing we see is that in a multi-condition if statement, when one condition fails, the code jumps to the next logical expression to see if that will evaluate to true, as shown in figure 3.2.1.

So in a multi-condition if statement, as long as parts of the expression succeed, we continue to evaluate the expression until something is false.
Of course, this is only true for the && operator. For the || operator, if one part of the expression is true, we skip the rest of the expression and the if statement evaluates as true.
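This short-circuit behavior can be checked by transcribing the labels into a goto version and comparing it with the C++ condition on a few inputs. A minimal sketch; both function names are hypothetical, and the first label is left as a comment because nothing jumps to it.

```cpp
// The condition from above, evaluated the way the listing does it.
// Returns true when control would reach do_if.
bool condition_as_jumps(int i, int i2, int i3) {
    // if_block_check1:
    if (i != 0) goto if_block_check2;   // cmp i, 0 / jne if_block_check2
    goto do_if;                         // jmp do_if: the || is already satisfied
if_block_check2:
    if (i2 != 23) goto skip_if;         // cmp i2, 23 / jne skip_if
    if (i3 != 21) goto skip_if;         // cmp i3, 21 / jne skip_if
do_if:
    return true;
skip_if:
    return false;
}

// The C++ condition it came from.
bool condition_as_cpp(int i, int i2, int i3) {
    return (i == 0) || ((i2 == 23) && (i3 == 21));
}
```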

3.3 For Loop


The for loop is not only one of the most interesting things about C++, it is also one of the most used statements.
What makes the for loop interesting is its ability to evaluate 3 expressions:

for (<expression 1>; <expression 2>; <expression 3>)

The expressions are usually

for (<assignment>; <conditional>; <increment | decrement>)

Reverse engineering the for statement is not hard, because in most cases it's really an if statement that jumps back to test its condition again:

int i = 0;
top:
if (i < 4)
{
    // do actions
    i++;
    goto top;
}

Now for the for loop equivalent:

for (int i = 0; i < 4; i++)
{
    // do actions
}

OK, let's look at a simple disassembly of the for loop:

    mov [ebp - 4], 0    ; initialize the local variable
    jmp condition
increment:
    add [ebp - 4], 1
condition:
    cmp [ebp - 4], 4
    jge done
loop:
    ; do actions
    jmp increment
done:

As you can see, the for loop is nothing more than a high-level if statement. The first thing we do is initialize the local variable on the stack; after that we check the condition. Then we run the loop body, jump back to increment, fall through to the condition label again, and so on until the condition is false (jge done).
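The same transcription trick works for loops. The sketch below copies the listing's labels into goto targets and counts how many times the body runs. The limit parameter (4 in the listing) and the body_runs counter, which stands in for "; do actions", are our own additions so the behavior can be checked.

```cpp
// Returns how many times the loop body ran, following the listing's control flow.
int for_loop_as_jumps(int limit) {
    int i = 0;               // mov [ebp - 4], 0
    int body_runs = 0;       // hypothetical stand-in for "; do actions"
    goto condition;          // jmp condition
increment:
    i += 1;                  // add [ebp - 4], 1
condition:
    if (i >= limit)          // cmp [ebp - 4], limit / jge done
        goto done;
    ++body_runs;             // ; do actions
    goto increment;          // jmp increment
done:
    return body_runs;
}
```

Note the initial jmp condition: it is what lets a loop whose condition is false from the start run zero times.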

3.4 Structures
Structures are very useful in C++ because of their ability to contain members. A structure lets you define a variable of any size, for example:

struct test1
{
    int member1;
    int member2;
};

This creates a 64-bit (8-byte) variable in memory. So in a sense, structures are regular variables that allow us to access certain parts of the variable independently of the others, which makes them very useful.
If you were to use char test1[8]; you would reserve exactly the same amount of memory as struct test1, only it would be much harder to access the 4-byte members individually.
Here is an example of using test1 as a local variable:

sub ESP, 8             ; reserve 8 bytes on the local stack
mov [ebp - 4], 45      ; set member2 to 45
mov [ebp - 8], 12      ; set member1 to 12

As you can see, structures look like they are stored in reverse on the stack: you might think member1 would be at [ebp - 4], the first slot below ebp, but it is actually at [ebp - 8]. The members are still laid out in increasing address order (member1 at the lowest address); it only looks reversed because we count downward from ebp.
For a global variable, the compiler would simply reserve 8 bytes in the executable and reference each part individually based on the member you have chosen.
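The "reversed" look can be predicted from the member offsets. This sketch uses offsetof to confirm the layout and computes where each member lands relative to ebp, assuming (as in the listing) that the struct is the only local and sits directly below the saved ebp, and that int is 4 bytes. ebp_offset_of_member is a hypothetical helper; real compilers may reorder or pad locals.

```cpp
#include <cstddef>

struct test1 {
    int member1;
    int member2;
};

// Members are laid out in increasing address order.
static_assert(offsetof(test1, member1) == 0, "member1 comes first");
static_assert(offsetof(test1, member2) == sizeof(int), "member2 follows member1");

// ebp-relative offset of a member if test1 is the only local,
// placed directly below the saved ebp (an assumption, not a guarantee).
int ebp_offset_of_member(std::size_t member_offset) {
    return -static_cast<int>(sizeof(test1)) + static_cast<int>(member_offset);
}
```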

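3.5 Technical Algorithms

I am providing some algorithms to prove and help you understand some of the theory I presented in this book.
The following example shows that variables inside an if block really live in the whole function's stack frame.

#include "stdafx.h"
#include <iostream>

int main(int argc, char* argv[])
{
    // Write 23 directly into the first local slot.
    __asm mov dword ptr [ebp - 4], 23
    if (true)
    {
        int i;    // never initialized in C++
        std::cout << i << std::endl;
    }
    return 1;
}

The output should be 23 even though we never initialize i. If you're confused, remember that since i is the first and only local variable, its location is [ebp - 4]. This relies on the unoptimized 32-bit Visual C++ frame layout described above; reading an uninitialized variable is undefined behavior, so other compilers or build settings (such as the /RTC run-time checks) may behave differently.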
This next example shows that structures are just regular variables with the added ability to be accessed in parts instead of as a whole.

#include "stdafx.h"
#include <iostream>

struct test1
{
    int member1;
    int member2;
    int member3;
};

int main(int argc, char* argv[])
{
    test1 local_struct;
    local_struct.member1 = 1;
    local_struct.member2 = 1;
    local_struct.member3 = 1;
    __asm
    {
        add dword ptr [ebp - 12], 55   ; member1
        add dword ptr [ebp - 8], 100   ; member2
        add dword ptr [ebp - 4], 23    ; member3
    }
    std::cout << "member 1: " << local_struct.member1 << std::endl;
    std::cout << "member 2: " << local_struct.member2 << std::endl;
    std::cout << "member 3: " << local_struct.member3 << std::endl;
    return 1;
}

The output should be

member 1: 56
member 2: 101
member 3: 24

Chapter 4: Practical Decompiling


This chapter aims to provide knowledge of practical decompiling. In it we will learn to use a disassembler and to decompile real-world applications.
4.1 Intro to Windows decompiling
Windows decompiling is not that difficult, since Windows programmers follow a strict programming pattern built on calls such as CreateWindowEx or CreateDialog, and all windows have message loops, which you can easily find. Before we really start decompiling, let's go over the basics. In the vast world of Windows there are many types of applications and many more types of technology, so all of it is too much to cover in one tutorial. On top of that, this information only applies to applications that use the basic window functions, such as CreateWindowEx and CreateDialog.
Applications made in Visual Basic or Delphi use their own engines, and those engines will not be covered. There is also MFC, which is simply a class wrapper around API calls but can greatly complicate things. We will be working on an application I made in pure Win32 API. All it does is show a window, but we all know showing a window requires a significant amount of work:
1. Create the window class.
   From this we can get the window procedure, in which all messages are handled: the lpfnWndProc member of the WNDCLASSEX structure contains the address of the window procedure.
2. Create the window itself.
   We can retrieve every single constant by name, and most of the time the exact C/C++ equivalent.
3. The message loop.
   All we have to do is look for a reference to GetMessage().
We start with the basic skeleton first, then move on to more complex stuff. It's important to learn the basics first, because they give you an idea of how the application is designed. We will be using PVDasm, which you can get from my site

http://www.crackingislife.com/modules.php?name=Downloads&d_op=getit&lid=2

This is a very nice free disassembler which we will be using.


4.2 Decompiling a sample application
First, load PVDasm; your screen should look similar to Figure 4.2.1.

(Figure 4.2.1)
Grab CreateWindow2 (the program we are going to decompile by hand) and open it in the disassembler; your screen should look similar to Figure 4.2.2.

(Figure 4.2.2)
We see our entry point, but this is C runtime library (CRT) startup code, so how can we find the WinMain function? By references. We know that a typical WinMain calls RegisterClassEx and CreateWindowEx; if we can find where the program calls these functions, we can then begin to map out the program. When you compile a program, the linker links it against libraries or DLLs (Dynamic-Link Libraries), and the functions the program uses from these DLLs are called imports. PVDasm can list all the imports a program has and show you the addresses they are called from. To use this feature, press Ctrl+N or click the import button. Your screen should look similar to Figure 4.2.3.

Step 1. Click the import button or press Ctrl+N.


Step 2. You should see a window with a list of imports; scroll down until you
see CreateWindowEx.
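Now we must find the start of the function. This is pretty easy if we follow these rules.
1. A function starts with the standard prologue (the sub line appears only when the function has local variables):

push ebp
mov ebp, esp
sub esp, <X>

2. A function usually starts right after the previous function's epilogue:

mov esp, ebp
pop ebp
ret <X>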
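The two rules above can be sketched as a byte-pattern scan. This is a minimal, hypothetical helper (not part of PVDasm), assuming MSVC's usual encodings: 55 for push ebp, 8B EC for mov ebp, esp, and 5D C3 or 5D C2 xx xx for pop ebp followed by ret. Data can contain the same bytes by coincidence, so treat every hit as a candidate to confirm in the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rule 1: return the offset of every "push ebp; mov ebp, esp" (55 8B EC).
// This is MSVC's usual encoding; other compilers may emit 55 89 E5 instead.
std::vector<std::size_t> find_prologues(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 3 <= code.size(); ++i) {
        if (code[i] == 0x55 && code[i + 1] == 0x8B && code[i + 2] == 0xEC) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Rule 2: true if the bytes before `pos`, after skipping int3/nop padding
// (CC/90), end with "pop ebp; ret" (5D C3) or "pop ebp; ret imm16" (5D C2 lo hi).
bool follows_epilogue(const std::vector<std::uint8_t>& code, std::size_t pos) {
    assert(pos <= code.size());
    std::size_t end = pos;
    while (end > 0 && (code[end - 1] == 0xCC || code[end - 1] == 0x90)) {
        --end;
    }
    if (end >= 2 && code[end - 2] == 0x5D && code[end - 1] == 0xC3) return true;
    if (end >= 4 && code[end - 4] == 0x5D && code[end - 3] == 0xC2) return true;
    return false;
}
```

A function at the very start of the code section (such as one at 00401000) has no previous epilogue, so rule 1 alone has to identify it.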

If we scroll up to address 0040104C, we see

0040104C push ebp
0040104D mov ebp, esp
0040104F sub esp, 50h

After that we see

mov dword ptr ss:[ebp-30],00000030
mov dword ptr ss:[ebp-2C],00000003

OK, so we know we have local variables, and they look a lot like a structure. To identify the WNDCLASSEX structure we need a reference point, and a good one is LoadCursor, because almost every application calls it. Simply click the import button or press Ctrl+N, and select LoadCursor.
Once you have selected LoadCursor, you should see something similar to

00401092 call ds:LoadCursorA
00401098 mov [ebp-14], eax
We all know that a function's return value is stored in the eax register, and we know that the hCursor member of WNDCLASSEX is being set (because we are loading a cursor). So where is hCursor in memory? At ebp-14h (yes, that's 14 hex, not decimal). With this information we can work out where all the other members are. Take a quick look at the WNDCLASSEX structure:

typedef struct tagWNDCLASSEX {
    UINT      cbSize;        // ebp-30h
    UINT      style;         // ebp-2Ch
    WNDPROC   lpfnWndProc;   // ebp-28h
    int       cbClsExtra;    // ebp-24h
    int       cbWndExtra;    // ebp-20h
    HINSTANCE hInstance;     // ebp-1Ch
    HICON     hIcon;         // ebp-18h
    HCURSOR   hCursor;       // ebp-14h <- start the calculation here
    HBRUSH    hbrBackground; // ebp-10h
    LPCSTR    lpszMenuName;  // ebp-0Ch
    LPCSTR    lpszClassName; // ebp-08h
    HICON     hIconSm;       // ebp-04h
} WNDCLASSEX;

As you can see, it's easy to calculate structure member addresses. Starting from hCursor at ebp-14h, add the size of each member above it to the displacement (ebp-18h, ebp-1Ch, ...) and subtract the size of each member below it (ebp-10h, ebp-0Ch, ...); here every member is 4 bytes. Now that we know the memory location of every structure member, we can begin to really understand how the program is created. The first thing we do is get the value of every member, starting with cbSize.
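The offset arithmetic can also be checked mechanically. This sketch uses a hypothetical 32-bit mirror of WNDCLASSEX, with std::uint32_t standing in for every member because each one is 4 bytes on Win32, and assumes (as in this sample) that the structure starts at ebp-30h; offsetof then gives each member's ebp displacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 32-bit mirror of WNDCLASSEX. On Win32 every member (UINT, int,
// handles and pointers) is 4 bytes, so std::uint32_t stands in for each one and
// the layout is the same on any host.
struct WndClassEx32 {
    std::uint32_t cbSize;
    std::uint32_t style;
    std::uint32_t lpfnWndProc;
    std::uint32_t cbClsExtra;
    std::uint32_t cbWndExtra;
    std::uint32_t hInstance;
    std::uint32_t hIcon;
    std::uint32_t hCursor;
    std::uint32_t hbrBackground;
    std::uint32_t lpszMenuName;
    std::uint32_t lpszClassName;
    std::uint32_t hIconSm;
};
static_assert(sizeof(WndClassEx32) == 0x30, "12 members x 4 bytes = 30h, the value stored in cbSize");

// Assumption from this sample: the structure starts at ebp-30h (cbSize).
constexpr long kStructDisp = -0x30;

// ebp displacement of a member = structure start + member offset.
constexpr long ebp_disp(std::size_t member_offset) {
    return kStructDisp + static_cast<long>(member_offset);
}
```

For example, ebp_disp(offsetof(WndClassEx32, hCursor)) evaluates to -0x14, matching the mov [ebp-14], eax after LoadCursorA.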
1. cbSize
The first thing we see is mov dword ptr ss:[ebp-30],00000030, and we know that ebp-30h is the location of cbSize. So what we are really saying is mov dword ptr ss:[cbSize],30h. We can go a step further: 30h is the size of WNDCLASSEX, and cbSize is supposed to hold the size of WNDCLASSEX, so we can fully decompile this line to

wc.cbSize = sizeof(WNDCLASSEX);
2. style

mov dword ptr ss:[ebp-2C],00000003

What style is the program using? To figure this out we need to look in windows.h (the CS_* constants are defined in winuser.h, which windows.h includes) and get all the style values. We could do a bit-by-bit compare by hand, but we don't have time for that, so I made a small program called WinDasmRef. All we need to do is choose the section we want to look up (in our case, style from WNDCLASSEX) and enter a value, and it returns exactly which constants the programmer used. Here, 3 is CS_VREDRAW (0x0001) | CS_HREDRAW (0x0002).
Refer to Figure 4.2.5 for more information.

You can get this program from http://www.crackingislife.com/modules.php?name=Downloads&d_op=getit&lid=1

Step 1. Select a section.
Step 2. Enter a value.
Step 3. It does a bit-by-bit compare for you and finds all the matching values.
This program is nowhere near finished, but it is more than enough for this book.
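A bit-by-bit compare like the one WinDasmRef performs can be sketched as follows. This is not WinDasmRef's code: decode_class_style is a hypothetical name, and the table holds only some of the CS_* values, copied from winuser.h so the sketch builds without windows.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Some CS_* class-style bits, copied from winuser.h so this builds without windows.h.
struct NamedFlag {
    std::uint32_t bit;
    const char* name;
};
const NamedFlag kClassStyles[] = {
    {0x0001, "CS_VREDRAW"}, {0x0002, "CS_HREDRAW"},  {0x0008, "CS_DBLCLKS"},
    {0x0020, "CS_OWNDC"},   {0x0040, "CS_CLASSDC"},  {0x0080, "CS_PARENTDC"},
    {0x0200, "CS_NOCLOSE"}, {0x0800, "CS_SAVEBITS"}, {0x4000, "CS_GLOBALCLASS"},
};

// Bit-by-bit compare: name every known flag set in `value`, then print any
// leftover (unknown) bits in hex so nothing is silently dropped.
std::string decode_class_style(std::uint32_t value) {
    std::string out;
    for (const NamedFlag& f : kClassStyles) {
        if (value & f.bit) {
            if (!out.empty()) out += " | ";
            out += f.name;
            value &= ~f.bit;
        }
    }
    if (value != 0) {
        char hex[16];
        std::snprintf(hex, sizeof hex, "0x%X", static_cast<unsigned>(value));
        if (!out.empty()) out += " | ";
        out += hex;
    }
    return out.empty() ? "0" : out;
}
```

decode_class_style(3) returns "CS_VREDRAW | CS_HREDRAW", the style used by this sample.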
3. lpfnWndProc

mov dword ptr ss:[ebp-28],00401000

This is the most important and interesting member, because it holds the address of the window procedure, where all messages are handled. From this we can tell that the window procedure is located at address 00401000 (in hex, of course).
4. cbClsExtra

mov dword ptr ss:[ebp-24],0

We are simply setting wc.cbClsExtra to 0.
5. cbWndExtra

mov dword ptr ss:[ebp-20],00000000

We are simply setting wc.cbWndExtra to 0.
6. hInstance

mov eax,dword ptr ss:[ebp+8] // parameter hInstance
mov dword ptr ss:[ebp-1C],eax // wc.hInstance

Remember that the declaration of the main function is

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)

The first parameter (hInstance) is stored at ebp+8 and the second parameter (hPrevInstance) at ebp+0Ch (12 decimal). Once eax holds hInstance, we simply copy that value to [ebp-1C], the hInstance member. In other words, wc.hInstance = hInstance.
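The ebp+8 / ebp+0Ch pattern follows from the frame layout. WinMain uses the WINAPI (__stdcall) convention, so its caller pushes four 4-byte arguments and WinMain removes them itself on return. A minimal sketch of that arithmetic, assuming 32-bit code and 4-byte parameters (param_disp and stdcall_ret_bytes are hypothetical helper names):

```cpp
#include <cassert>

// 32-bit frame after "push ebp; mov ebp, esp":
//   [ebp+0] saved ebp, [ebp+4] return address, [ebp+8] first parameter, ...
// Assumes every parameter occupies 4 bytes, as WinMain's four parameters do.
constexpr int param_disp(int index) {  // index is 0-based
    return 8 + 4 * index;
}

// A __stdcall callee pops its own arguments, so it returns with "ret <bytes>".
constexpr int stdcall_ret_bytes(int param_count) {
    return 4 * param_count;
}
```

param_disp(1) gives 0xC for hPrevInstance, and stdcall_ret_bytes(4) gives 0x10 for WinMain's four parameters.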
7. hIcon

mov dword ptr ss:[ebp-18],00000000

We are simply setting wc.hIcon to 0.
8. hCursor

push 00007F00
mov ecx,dword ptr ss:[ebp+08]
push ecx
call USER32!LoadCursorA
mov dword ptr ss:[ebp-14],eax

First we look up the declaration of LoadCursorA:

HCURSOR LoadCursorA(HINSTANCE hInstance, LPCSTR lpCursorName);

Arguments are pushed in reverse order (the last parameter is pushed first), so the first push, the value 7F00, is lpCursorName.

If the programmer is not using a custom cursor (most don't), we can look up its value in WinDasmRef. And yes, you can enter hex values in WinDasmRef; just make sure you type 0x7F00, not 7F00. Refer to Figure 4.2.6. (0x7F00 is 32512, the value behind IDC_ARROW.)

(Figure 4.2.6)
Note: If you're wondering why LoadCursor.cursorname wasn't in the first picture, it's because I'm writing this program as I'm typing this book.

mov ecx,dword ptr ss:[ebp+08]
push ecx

Next we load ecx with the value at ss:[ebp+8], which is hInstance, and push ecx onto the stack. The stack now contains (top first):

hInstance
IDC_ARROW
Then we see call USER32!LoadCursorA, so we can turn this back into the complete original line of source:

LoadCursor(hInstance, IDC_ARROW);

LoadCursor returns the handle to the cursor in the eax register, and mov dword ptr ss:[ebp-14],eax stores it at ebp-14h, the position of hCursor. Now let's decompile the entire statement:

wc.hCursor = LoadCursor(hInstance, IDC_ARROW);
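Because arguments are pushed last-first, reading a call back into source order just means walking the pushes backwards. A small hypothetical helper, format_call, sketches this. It assumes you have already replaced register pushes with what the register holds (for example hInstance for push ecx here).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: `pushes` lists the pushed operands in listing order.
// Arguments are pushed last-first, so walking the list backwards gives the
// arguments in declaration (source) order.
std::string format_call(const std::string& fn, const std::vector<std::string>& pushes) {
    std::string out = fn + "(";
    for (auto it = pushes.rbegin(); it != pushes.rend(); ++it) {
        if (it != pushes.rbegin()) out += ", ";
        out += *it;
    }
    return out + ")";
}
```

format_call("LoadCursorA", {"00007F00", "hInstance"}) yields "LoadCursorA(hInstance, 00007F00)"; the same helper turns the four pushes before MessageBoxA later in this section into MessageBoxA(0, 0040605C, 00406054, 0).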
9. hbrBackground

push 01
call GDI32!GetStockObject
mov dword ptr ss:[ebp-10],eax

First we push 01 onto the stack and call GetStockObject. Its declaration is HGDIOBJ GetStockObject(int i), so the 01 selects which stock object (here, a brush) to return. Load up WinDasmRef and type in 1; refer to Figure 4.2.7 for more information.

So the call is GetStockObject(LTGRAY_BRUSH). After that we see mov dword ptr ss:[ebp-10],eax: eax holds the handle to the brush returned by GetStockObject, and ebp-10h is the memory location of hbrBackground. GetStockObject returns a generic HGDIOBJ, so C++ needs a cast, and the fully decompiled statement is

wc.hbrBackground = (HBRUSH)GetStockObject(LTGRAY_BRUSH);
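Unlike the style member, GetStockObject's argument is an index, not a set of bit flags, so the right way to decode it is a plain lookup. A sketch with values copied from wingdi.h (stock_object_name is a hypothetical helper, and the newer DC_BRUSH and DC_PEN entries are omitted):

```cpp
#include <cassert>
#include <string>

// GetStockObject index -> constant name, values as defined in wingdi.h.
// Index 9 is unused by the SDK.
std::string stock_object_name(int index) {
    switch (index) {
        case 0:  return "WHITE_BRUSH";
        case 1:  return "LTGRAY_BRUSH";
        case 2:  return "GRAY_BRUSH";
        case 3:  return "DKGRAY_BRUSH";
        case 4:  return "BLACK_BRUSH";
        case 5:  return "NULL_BRUSH";  // same value as HOLLOW_BRUSH
        case 6:  return "WHITE_PEN";
        case 7:  return "BLACK_PEN";
        case 8:  return "NULL_PEN";
        case 10: return "OEM_FIXED_FONT";
        case 11: return "ANSI_FIXED_FONT";
        case 12: return "ANSI_VAR_FONT";
        case 13: return "SYSTEM_FONT";
        case 14: return "DEVICE_DEFAULT_FONT";
        case 15: return "DEFAULT_PALETTE";
        case 16: return "SYSTEM_FIXED_FONT";
        case 17: return "DEFAULT_GUI_FONT";
        default: return "unknown";
    }
}
```

stock_object_name(1) returns "LTGRAY_BRUSH", the value pushed in this sample.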
10. lpszMenuName

mov dword ptr ss:[ebp-0C],00000000

We simply set lpszMenuName to 0.
11. lpszClassName

mov edx,dword ptr ds:[0040603C]
mov dword ptr ss:[ebp-08],edx

At address 0040603C is a pointer to our class name. How can I tell? Because the address is surrounded by brackets, the instruction reads the value stored at 0040603C rather than using 0040603C itself. We can easily use any hex editor to look at address 0040603C, as long as we know the image base.
The image base is the address at which the program is loaded into memory. To see the image base, press Ctrl+P in PVDasm; a window similar to Figure 4.2.8 should come up.

(Figure 4.2.8)
We subtract the image base, which is 400000 hex, from 0040603C, which leaves 603C. (This shortcut assumes the section's raw file offset equals its RVA; otherwise, convert using the section header's VirtualAddress and PointerToRawData fields.) If we go to offset 603C in the file we see 30, and we must read 3 more bytes because x86 uses 32-bit addresses, so the four bytes are 30 60 40 00.
These bytes are in little-endian order, which x86 uses, so we convert them to the usual big-endian reading by reversing the byte order: 00406030. Subtracting the image base gives 6030, and at offset 6030 we see a D; if we keep reading up to the null terminator, we see DECOMPILE.
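The pointer-chasing steps above can be sketched in C++. This minimal version makes the same simplifying assumption as the text, that a file offset equals the virtual address minus the image base (400000h), and works on the file's bytes already loaded into a vector; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: file offset == virtual address - image base (true for this
// sample's layout, not for every PE file).
constexpr std::uint32_t kImageBase = 0x00400000;

// x86 is little endian: the lowest-addressed byte is the least significant.
std::uint32_t read_u32_le(const std::vector<std::uint8_t>& file, std::uint32_t off) {
    assert(off + 4 <= file.size());
    return static_cast<std::uint32_t>(file[off]) |
           (static_cast<std::uint32_t>(file[off + 1]) << 8) |
           (static_cast<std::uint32_t>(file[off + 2]) << 16) |
           (static_cast<std::uint32_t>(file[off + 3]) << 24);
}

// Read a NUL-terminated ASCII string starting at file offset `off`.
std::string read_cstring(const std::vector<std::uint8_t>& file, std::uint32_t off) {
    std::string s;
    while (off < file.size() && file[off] != 0) {
        s += static_cast<char>(file[off++]);
    }
    return s;
}

// Follow the pointer stored at virtual address `va` to the string it points to.
std::string string_via_pointer(const std::vector<std::uint8_t>& file, std::uint32_t va) {
    std::uint32_t target_va = read_u32_le(file, va - kImageBase);
    return read_cstring(file, target_va - kImageBase);
}
```

With the bytes described above (30 60 40 00 at 603C and "DECOMPILE" at 6030), string_via_pointer(file, 0x0040603C) returns "DECOMPILE".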
Now that we have the name of our class, we can fully decompile the statement like this:

static char *szClass = "DECOMPILE";

wc.lpszClassName = szClass;

That matches mov dword ptr ss:[ebp-08],edx: edx holds the value of szClass (the address of the string), and ebp-8 is the memory location of lpszClassName.
12. hIconSm

mov dword ptr ss:[ebp-4],00000000

This simply sets hIconSm to 0.
Now that we are done with our whole window class, let's have an overview of all the values:

WNDCLASSEX wc; // we don't know the exact name, but it has to be something
wc.cbSize = sizeof(WNDCLASSEX);
wc.style = CS_HREDRAW | CS_VREDRAW;
wc.lpfnWndProc = WndProc; // window procedure at 00401000 (name unknown)
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = hInstance;
wc.hIcon = 0;
wc.hCursor = LoadCursor(hInstance, IDC_ARROW);
wc.hbrBackground = (HBRUSH)GetStockObject(LTGRAY_BRUSH);
wc.lpszMenuName = NULL;
wc.lpszClassName = szClass;
wc.hIconSm = NULL;

As you can see, we have practically decompiled this back to the exact source code.
Now we see the following code:

lea eax,dword ptr ss:[ebp-30]
push eax
call USER32!RegisterClassExA
and eax,0000FFFF
test eax,eax
jnz 004010E4
push 0
push 00406054 ; ASCIIZ "Crap"
push 0040605C ; ASCIIZ "Can't register class"
push 0
call USER32!MessageBoxA
xor eax,eax
jmp 00401172

Let's begin with

lea eax,dword ptr ss:[ebp-30]
push eax
call USER32!RegisterClassExA

lea loads the address ebp-30h itself into eax. That is the address of the WNDCLASSEX structure, because [ebp-30] is its first member, cbSize. With eax holding the address of the structure, we push it onto the stack and call USER32!RegisterClassExA. Looking at the declaration of RegisterClassEx,

ATOM WINAPI RegisterClassExA(CONST WNDCLASSEXA *);

we see that it returns an ATOM, which is 16 bits. That is why we see and eax,0000FFFF: it masks off the upper 16 bits so that we don't read a 32-bit value. After that we see

test eax,eax
jnz 004010E4

This simply says: if eax is not zero, jump to 004010E4. The equivalent C++ code is

if(!RegisterClassEx(&wc))
{
    // bad code here
}
// else continue (004010E4)

Remember, the ! means that if RegisterClassEx returns 0, the bad code executes. As we continue, we see that the program displays a message box if registration fails:

push 0
push 00406054 ; ASCIIZ "Crap"
push 0040605C ; ASCIIZ "Can't register class"
push 0
call USER32!MessageBoxA

Now look at the declaration of MessageBox,

int WINAPI MessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType);

and remember that the last parameter is pushed first:
the first push 0 is the uType parameter, the message box type;
push 00406054 is the address of the ASCII string "Crap", the lpCaption parameter;
push 0040605C is the address of the ASCII string "Can't register class", the lpText parameter;
the last push 0 is the hWnd parameter, specifying that we have no owner window.
To see what type 0 is, let's crack open WinDasmRef; refer to Figure 4.2.9 for more information.

So we can decompile the whole line into

MessageBox(NULL, "Can't register class", "Crap", MB_OK);
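MessageBox's uType is a case where a pure bit-by-bit compare is not enough: MB_OK is 0, and the button set, icon and default button are small numbers packed into separate 4-bit fields. A sketch that masks each field first (decode_mb_type is hypothetical; constants from winuser.h, with only a subset of names and the modality bits ignored):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode MessageBox's uType field by field (constants from winuser.h).
// Modality and other high bits (0x3000 and up) are ignored in this sketch.
std::string decode_mb_type(std::uint32_t type) {
    static const char* const kButtons[] = {
        "MB_OK", "MB_OKCANCEL", "MB_ABORTRETRYIGNORE", "MB_YESNOCANCEL",
        "MB_YESNO", "MB_RETRYCANCEL", "MB_CANCELTRYCONTINUE"};
    static const char* const kIcons[] = {
        "", "MB_ICONHAND", "MB_ICONQUESTION", "MB_ICONEXCLAMATION", "MB_ICONASTERISK"};

    const std::uint32_t button = type & 0x0Fu;              // MB_TYPEMASK
    const std::uint32_t icon = (type & 0xF0u) >> 4;         // MB_ICONMASK
    const std::uint32_t def_button = (type & 0xF00u) >> 8;  // MB_DEFMASK

    std::string out = (button < 7) ? kButtons[button] : "unknown buttons";
    if (icon != 0) {
        out += " | ";
        out += (icon < 5) ? kIcons[icon] : "unknown icon";
    }
    if (def_button != 0) {
        out += " | ";
        out += (def_button < 4) ? "MB_DEFBUTTON" + std::to_string(def_button + 1)
                                : std::string("unknown default button");
    }
    return out;
}
```

decode_mb_type(0) returns "MB_OK", as in this sample, while decode_mb_type(0x24) returns "MB_YESNO | MB_ICONQUESTION".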


After that we see

xor eax,eax
jmp 00401172

xor eax,eax sets eax to 0, and if we look at what is at address 00401172, we find

mov esp,ebp
pop ebp
ret 10

which is the function's exit code (ret 10 pops WinMain's four 4-byte parameters, 10h = 16 bytes). So we can decompile these lines to return 0. The full original code is

if(!RegisterClassEx(&wc))
{
    MessageBox(NULL, "Can't register class", "Crap", MB_OK);
    return 0;
}
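The compiled form of this if statement (and eax,0000FFFF; test eax,eax; jnz) can be modeled in a few lines. This sketch, with the hypothetical name jnz_taken, shows why the mask matters: any stale bits in the upper half of eax are ignored, and only the 16-bit ATOM decides the branch. The input values in the checks are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Models the compiled check after "call USER32!RegisterClassExA":
//   and eax,0000FFFF   ; RegisterClassEx returns a 16-bit ATOM, keep only AX
//   test eax,eax
//   jnz 004010E4       ; taken (success path) when the atom is non-zero
bool jnz_taken(std::uint32_t eax) {
    eax &= 0x0000FFFFu;
    return eax != 0;
}
```

When jnz_taken returns false, execution falls through to the MessageBox call and the return 0.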
As you can see, decompiling this basic Windows code is quite simple, so I'm not going to bore you with the rest. If you have any questions, please check out our forums at http://www.eliteproxy.com/modules.php?name=Forums
