
ANTI-UNPACKER TRICKS

CURRENT
Peter Ferrie, Senior Anti-Virus Researcher, Microsoft Corporation

Abstract: Unpackers are as old as the packers themselves, but
anti-unpacking tricks are a more recent development. These anti-unpacking
tricks have developed quickly in number and, in some cases, complexity. In
this paper, we will describe some of the most common anti-unpacking tricks,
along with some countermeasures.

INTRODUCTION

Anti-unpacking tricks can come in different forms, depending on what kind of
unpacker they want to attack. The unpacker can be in the form of a
memory-dumper, a debugger, an emulator, a code-buffer, or a W-X interceptor.
It can be a tool in a virtual machine. There are corresponding tricks for
each of these, and they will be discussed separately.

- A memory-dumper dumps the process memory of the running process, without
regard to the code inside it.

- A debugger attaches to the process, allowing single-stepping, or the
placing of breakpoints at key locations, in order to stop execution at the
right place. The process can then be dumped with more precision than a
memory-dumper alone.

- An emulator, as used within this paper, is a purely software-based
environment, most commonly used by anti-malware software. It places the file
to execute inside the environment and watches the execution for particular
events of interest.

- A code-buffer is similar to, but different from, a debugger. It also
attaches to a process, but instead of executing instructions in-place, it
copies each instruction into a private buffer and executes it from there. It
allows fine-grained control over execution as a result. It is also more
transparent than a debugger, and faster than an emulator.

- A W-X interceptor uses page-level tricks to watch for write-then-execute
sequences. Typically, an executable region is marked as read-only and
executable, and everything else is marked as read-only and non-executable
(or simply non-present, depending on the hardware capabilities). Then the
code is allowed to execute freely. The interceptor intercepts exceptions
that are triggered by writes to read-only pages, or execution from
non-executable or non-present pages. If the hardware supports it, a
read-only page will be replaced by a writable but non-executable page, and
the write will be allowed to continue. Otherwise, the single-step exception
will be used to allow the write to complete, after which the page will be
restored to its non-present state. In either case, the page address is kept
in a list. In the event of exceptions triggered by execution of
non-executable or non-present pages, the page address is compared to the
entries in that list. A match indicates the execution of newly-written code,
and is a possible host entrypoint.

I. ANTI-UNPACKING BY ANTI-DUMPING

a. SizeOfImage

The simplest of anti-dumping tricks is to change the SizeOfImage value in
the Process Environment Block (PEB). This interferes with process access,
including preventing a debugger from attaching to the process. It also
breaks tools such as LordPE in default mode, among others.
Example code looks like this:

    mov eax, fs:[30h] ;PEB
    mov eax, [eax+0ch] ;LdrData
    ;get InLoadOrderModuleList
    mov eax, [eax+0ch]
    ;adjust SizeOfImage
    add dw [eax+20h], 1000h

The technique is used by many packers now. However, the technique is easily
defeated, even by user-mode code. We can simply ignore the SizeOfImage
value, and call the VirtualQuery() function instead. The VirtualQuery()
function returns the number of sequential pages whose attributes are the
same. Since there cannot be gaps between sections in memory, the ranges can
be enumerated by querying the first page after the end of the previous
range. The enumeration would begin with the ImageBase page and continue
while the MEM_IMAGE type is returned. A page that is not of the MEM_IMAGE
type did not come from the file.

b. Erasing the header

Some unpackers examine the section table to gather interesting information
about the image. Erasing or altering that section table in the PE header can
interfere with the gathering of that information. This is typically used to
defeat ProcDump-style tools, which rely on the section table to dump the
image.
Example code looks like this:
    ;get image base
    push 0
    call GetModuleHandleA
    push eax
    push esp
    push 4 ;PAGE_READWRITE
    ;rounded up to hardware page size
    push 1
    push eax
    xchg edi, eax
    call VirtualProtect
    xor ecx, ecx
    mov ch, 10h ;assume 4kb pages
    ;store VirtualProtect return value
    rep stosb

This technique is used by Yoda's Crypter, among others. As above, the
VirtualQuery() function can be used to recover the image size, and some of
the layout (i.e. which pages are executable, which are writable, etc), but
there is no way to recover the section table once it has been erased.

c. Nanomites

Nanomites are a more advanced method of anti-dumping. They were introduced
in Armadillo. They work by replacing branch instructions with an "int 3"
instruction, and using tables in the unpacking code to determine the
details. The details in this case are whether or not the "int 3" is a
nanomite or a debug break; whether or not the branch should be taken, if it
is a nanomite; the address of the destination, if the branch is taken; and
how large the instruction is, if the branch is not taken.

A process that is protected by nanomites requires self-debugging (known as
"Debug Blocker" in Armadillo, see Anti-Debugging:Self-Debugging section
below), which uses a copy of the same process. This allows the debugger to
intercept the exceptions that are generated by the debuggee when the
nanomite is hit. When the exception occurs in the debuggee, the debugger
recovers the exception address and searches for it in an address table. If a
match is found, then the nanomite type is retrieved from a type table. If
the CPU flags match the type, then the branch will be taken. When that
happens, the destination address is retrieved from a destination table, and
execution resumes from that address. Otherwise, the size of the branch is
retrieved from the size table, in order to skip the instruction.

d. Stolen Bytes

Stolen bytes are opcodes that are taken from the host and placed in
dynamically allocated memory, where they will be executed separately. A jump
instruction is placed at the start of the stolen bytes in the host, to point
to the start of the relocated code. A jump instruction is placed at the end
of the relocated code, to point to the end of the stolen bytes. The rest of
the opcodes in the stolen region in the host are then replaced with garbage.
The relocated code can also be interspersed with garbage instructions, in
order to make it more difficult to determine the real instructions from the
fake instructions. This complicates the restoration of the original code.
This technique was introduced in ASProtect.

e. Guard Pages

Guard pages act as a one-shot access alarm. The first time that a guard page
is accessed for any reason, an EXCEPTION_GUARD_PAGE (0x80000001) exception
will be raised. This can be used for a variety of things, but overall it
acts as a demand-paging system for ring 3 code. The technique is achieved by
intercepting the EXCEPTION_GUARD_PAGE (0x80000001) exception, checking if
the page is within a particular range (for example, within the process image
space), then mapping in some appropriate content if so.

This technique is used by Shrinker to perform on-demand decompression. By
decompressing only the pages that are accessed, the startup time is reduced
significantly. The committed memory consumption can be reduced, since any
pages that are not accessed do not need any physical memory to back them.
The overall application performance can also be increased, when compared to
other packers that decompress the entire application immediately. Shrinker
works by hooking the ntdll KiUserExceptionDispatcher() function, and
watching for the EXCEPTION_GUARD_PAGE (0x80000001) exception. If the
exception occurs within the process image space, then Shrinker will load
from disk the individual page that is being accessed, decompress it, and
then resume execution. If an access spans two pages, then upon resuming, an
exception will occur for the next page, and Shrinker will load and
decompress that page, too.

A variation of this technique is used by Armadillo, to perform on-demand
decryption (known as "CopyMem2"). However, as with nanomites, it requires
the use of self-debugging. This is in contrast to Shrinker, which is
entirely self-contained. Armadillo decompresses all of the pages into memory
at once, rather than loading them from disk when they are accessed.
Armadillo uses the debugger to intercept the exceptions in the debuggee, and
watches for the EXCEPTION_GUARD_PAGE (0x80000001) exception. If the
exception occurs within the process image space, then Armadillo will decrypt
the individual page that is being accessed, and then resume execution. If an
access spans two pages, then upon resuming, an exception will occur for the
next page, and Armadillo will decrypt that page, too.
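
The on-demand scheme built on guard pages can be modelled in a few lines.
The following is a minimal Python sketch, not code from Shrinker or
Armadillo: the GuardedImage class, its method names, and the single-byte XOR
that stands in for decompression or decryption are all assumptions made for
illustration. It shows the one-shot behaviour of the guard flag, and that an
access which spans two pages causes two faults.

```python
PAGE_SIZE = 0x1000


class GuardedImage:
    """Toy model (hypothetical): every page starts encoded and guarded."""

    def __init__(self, plaintext, key=0x5A):
        self.key = key
        # XOR is only a stand-in for real decompression or decryption.
        self.pages = [bytearray(b ^ key for b in plaintext[i:i + PAGE_SIZE])
                      for i in range(0, len(plaintext), PAGE_SIZE)]
        self.guarded = [True] * len(self.pages)
        self.faults = 0

    def _on_guard_fault(self, index):
        # Stand-in for the exception handler: decode one page in place
        # and clear its guard flag, so that the alarm fires only once.
        self.faults += 1
        page = self.pages[index]
        for i in range(len(page)):
            page[i] ^= self.key
        self.guarded[index] = False

    def read(self, offset, size):
        # Every access checks the guard flag of the page that it touches.
        out = bytearray()
        for addr in range(offset, offset + size):
            index = addr // PAGE_SIZE
            if self.guarded[index]:
                self._on_guard_fault(index)
            out.append(self.pages[index][addr % PAGE_SIZE])
        return bytes(out)
```

In this model, a four-byte read that begins two bytes before a page boundary
decodes exactly two pages and leaves the rest encoded. Reading one byte from
every page would leave the whole image in the clear, ready to be dumped.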
If performance were not a concern, a protection method of this type could
also remember the last page that was loaded, and discard it when an
exception occurs for another page (unless the exception address suggests an
access that spanned them). That way, no more than two pages will ever be in
the clear in memory at the same time. In fact, that Armadillo does not do
this could be considered a weakness in the implementation, because by simply
touching all of the pages in the image, Armadillo will decrypt them all, and
then the process can be dumped entirely.

f. Imports

The list of imported functions can be very useful to get at least some idea
of what a program does. To combat this, some packers alter the import table
after the imports have been resolved. The alteration typically takes the
form of completely erasing the import table, but there are variations that
include changing the imported address to point to a private buffer that is
allocated dynamically. Within the buffer is a jump to the real function
address. This buffer is usually not dumped by default, so when the process
exits, the information is lost as to the real function addresses.

g. Virtual machines

Virtual machines are perhaps the ultimate in anti-dumping technology,
because at no point is the directly executable code ever visible in memory.
Further, the import table might contain only the absolutely required
functions (LoadLibrary() and GetProcAddress()), leaving no clue as to what
the program does. Additionally, the p-code might be encoded in some way,
such that two behaviourally identical samples might have very
different-looking contents. This technique is used by VMProtect.

The p-code itself can be polymorphic, where do-nothing instructions are
inserted into the code flow, in the same way as is often done for native
code. This technique is used by Themida.

The p-code can contain anti-debugging routines, such as checking specific
memory locations for specific values (see Anti-Debugging section below).
This technique is used by HyperUnpackMe2 [i].

The p-code interpreter can be obfuscated, such that the method for
interpretation is not immediately obvious. This technique is used by Themida
and Virtual CPU [ii].

II. ANTI-UNPACKING BY ANTI-DEBUGGING

a. PEB fields

i. NtGlobalFlag

The NtGlobalFlag field exists at offset 0x68 in the PEB. The value in that
field is zero by default. On Windows 2000 and later, there is a particular
value that is typically stored in the field when a debugger is running. The
presence of that value is not a reliable indication that a debugger is
really running (especially since it is entirely absent on Windows NT).
However, it is often used for that purpose. The field is composed of a set
of flags. The value that suggests the presence of a debugger is composed of
the following flags:

    FLG_HEAP_ENABLE_TAIL_CHECK (0x10)
    FLG_HEAP_ENABLE_FREE_CHECK (0x20)
    FLG_HEAP_VALIDATE_PARAMETERS (0x40)

Example incorrect code looks like this:

    mov eax, fs:[30h] ;PEB
    ;check NtGlobalFlag
    cmp b [eax+68h], 70h
    je being_debugged

This technique is used by ExeCryptor, among others.

The "cmp" instruction above is a common mistake. The assumption is that no
other flags can be set, which is not true. Those three flags alone are
usually set for a process that is created by a debugger, but not for a
process to which a debugger attaches afterwards. However, there are three
further exceptions.

The first exception is that additional flags can be set for all processes,
by a registry value. The registry value is the "GlobalFlag" string value of
the "HKLM\System\CurrentControlSet\Control\Session Manager" registry key.

The second exception is that all of the flags can be controlled on a
per-process basis, by a different registry value. The registry value is also
the "GlobalFlag" string value (note that "Windows Anti-Debug Reference" by
Nicolas Falliere [iii] incorrectly calls it "GlobalFlags") of the
"HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution
Options\<filename>" registry key. The "<filename>" must be replaced by the
name of the executable file (not a DLL) to which the flags will be applied
when the file is executed. An empty "GlobalFlag" string value will result in
no flags being set.
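
To illustrate why the byte-wide equality test is fragile once any additional
flag is set, the following Python sketch models two tests on plain integers.
It is an illustration only: the function names are ours, no Windows API is
called, and FLG_SHOW_LDR_SNAPS (0x02) is used simply as an example of an
unrelated flag that could be set through the registry values described
above.

```python
FLG_SHOW_LDR_SNAPS = 0x02            # unrelated flag, used as the "extra" bit
FLG_HEAP_ENABLE_TAIL_CHECK = 0x10
FLG_HEAP_ENABLE_FREE_CHECK = 0x20
FLG_HEAP_VALIDATE_PARAMETERS = 0x40
DEBUG_FLAGS = (FLG_HEAP_ENABLE_TAIL_CHECK
               | FLG_HEAP_ENABLE_FREE_CHECK
               | FLG_HEAP_VALIDATE_PARAMETERS)   # 0x70


def equality_test(nt_global_flag):
    """Model of the byte-wide "cmp b [eax+68h], 70h" test."""
    return (nt_global_flag & 0xFF) == DEBUG_FLAGS


def masked_test(nt_global_flag):
    """Model of a test that ignores every bit outside the three heap flags."""
    return (nt_global_flag & DEBUG_FLAGS) == DEBUG_FLAGS
```

With a value of 0x70, both tests report a debugger. With 0x72 (the same
three flags plus FLG_SHOW_LDR_SNAPS), only the masked test does.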
The third exception is that, on Windows 2000 and later, all of the flags can
be controlled on a per-process basis, by the Load Configuration Structure.
The Load Configuration Structure has existed since Windows NT, but the
format was not documented by Microsoft in the PE/COFF Specification until
2006 (and incorrectly). The structure was extended to support Safe Exception
Handling in Windows XP, but it also contains two fields of relevance to this
paper: GlobalFlagsClear and GlobalFlagsSet. As their names imply, they can
be used to clear and/or set any combination of bits in the
PEB->NtGlobalFlag field. The flags specified by the GlobalFlagsClear field
are cleared first, then the flags specified by the GlobalFlagsSet field are
set. This means that even if all of the flags are specified by the
GlobalFlagsClear field, any flags that are specified by the GlobalFlagsSet
field will still be set. No current packer supports this structure.

If the FLG_USER_STACK_TRACE_DB (0x1000) is specified to be set, either by
the "GlobalFlag" registry value, or in the GlobalFlagsSet field, the
FLG_HEAP_VALIDATE_PARAMETERS will automatically be set, even if it is
specified in the GlobalFlagsClear field.

Thus, the correct implementation to detect the default value is this one:

    mov eax, fs:[30h] ;PEB
    mov al, [eax+68h] ;NtGlobalFlag
    and al, 70h
    cmp al, 70h
    je being_debugged

The simplest method to defeat this technique is to create the empty
"GlobalFlag" string value.

b. Heap flags

The process default heap is another place to find debugging artifacts. The
base heap pointer can be retrieved by the kernel32 GetProcessHeap()
function. Some packers avoid using the API and look directly at the PEB
instead.

Example code looks like this:

    mov eax, fs:[30h] ;PEB
    ;get process heap base
    mov eax, [eax+18h]

Within the heap are two fields of interest. The PEB->NtGlobalFlag field
forms the basis for the values in those fields. The first field (Flags)
exists at offset 0x0c in the heap, the second one (ForceFlags) is at offset
0x10 in the heap. The Flags field indicates the settings that were used for
the current heap block. The ForceFlags field indicates the settings that
will be used for subsequent heap manipulation. The value in the first field
is two by default, the value in the second field is zero by default. There
are particular values that are typically stored in those fields when a
debugger is running, but the presence of those values is not a reliable
indication that a debugger is really running. However, they are often used
for that purpose.

The fields are composed of a set of flags. The value in the first field that
suggests the presence of a debugger is composed of the following flags:

    HEAP_GROWABLE (2)
    HEAP_TAIL_CHECKING_ENABLED (0x20)
    HEAP_FREE_CHECKING_ENABLED (0x40)
    HEAP_SKIP_VALIDATION_CHECKS (0x10000000)
    HEAP_VALIDATE_PARAMETERS_ENABLED (0x40000000)

Example code looks like this:

    mov eax, fs:[30h] ;PEB
    ;get process heap base
    mov eax, [eax+18h]
    mov eax, [eax+0ch] ;Flags
    dec eax
    dec eax
    jne being_debugged

The value in the second field that suggests the presence of a debugger is
composed of the following flags:

    HEAP_TAIL_CHECKING_ENABLED (0x20)
    HEAP_FREE_CHECKING_ENABLED (0x40)
    HEAP_VALIDATE_PARAMETERS_ENABLED (0x40000000)

Example code looks like this:

    mov eax, fs:[30h] ;PEB
    ;get process heap base
    mov eax, [eax+18h]
    cmp dw [eax+10h], 0 ;ForceFlags
    jne being_debugged

The "tail" flags are set in the heap fields if the
FLG_HEAP_ENABLE_TAIL_CHECK flag is set in the PEB->NtGlobalFlag field. The
"free" flags are set in the heap fields if the FLG_HEAP_ENABLE_FREE_CHECK
flag is set in the PEB->NtGlobalFlag field. The validation flags are set in
the heap fields if the
FLG_HEAP_VALIDATE_PARAMETERS flag is set in the PEB->NtGlobalFlag field.
However, the heap flags can be controlled on a per-process basis, through
the "PageHeapFlags" value, in the same manner as "GlobalFlag" above.

c. The Heap

The problem with simply clearing the heap flags is that the initial heap
will have been initialised with the flags active, and that leaves some
artifacts that can be detected. Specifically, at the end of the heap block
will be one definite value, and one possible value. The
HEAP_TAIL_CHECKING_ENABLED flag causes the sequence 0xABABABAB to always
appear twice at the exact end of the allocated block. The
HEAP_FREE_CHECKING_ENABLED flag causes the sequence 0xFEEEFEEE (or a part
thereof) to appear if additional bytes are required to fill in the slack
space until the next block.
Example code looks like this:

    mov eax, <heap ptr>
    ;get unused_bytes
    movzx ecx, b [eax-2]
    movzx edx, w [eax-8] ;size
    sub eax, ecx
    lea edi, [edx*8+eax]
    mov al, 0abh
    mov cl, 8
    repe scasb
    je being_debugged

These values are checked by Themida.

d. Special APIs

i. IsDebuggerPresent

The kernel32 IsDebuggerPresent() function was introduced in Windows 95. It
returns TRUE if a debugger is present. Internally, it simply returns the
value of the PEB->BeingDebugged flag.
Example code looks like this:

    call IsDebuggerPresent
    test al, al
    jne being_debugged

Some packers avoid using the kernel32 IsDebuggerPresent() function and look
directly at the PEB instead.
Example code looks like this:

    mov eax, fs:[30h] ;PEB
    ;check BeingDebugged
    cmp b [eax+2], 0
    jne being_debugged

To defeat these methods requires only setting the PEB->BeingDebugged flag to
FALSE. A common convenience while debugging is to place a breakpoint at the
first instruction in the kernel32 IsDebuggerPresent() function. Some
unpackers check explicitly for this breakpoint.
Example code looks like this:

    push offset l1
    call GetModuleHandleA
    push offset l2
    push eax
    call GetProcAddress
    cmp b [eax], 0cch
    je being_debugged
    ...
    l1: db "kernel32", 0
    l2: db "IsDebuggerPresent", 0

Some packers check that the first byte in the function is the "64" opcode
("FS:" prefix).
Example code looks like this:

    push offset l1
    call GetModuleHandleA
    push offset l2
    push eax
    call GetProcAddress
    cmp b [eax], 64h
    jne being_debugged
    ...
    l1: db "kernel32", 0
    l2: db "IsDebuggerPresent", 0

ii. CheckRemoteDebuggerPresent

The kernel32 CheckRemoteDebuggerPresent() function has these parameters:
HANDLE hProcess, PBOOL pbDebuggerPresent. The function is a wrapper that was
introduced in Windows XP SP1, to query a value that has existed since
Windows NT. "Remote" in this sense refers to a separate process on the same
machine. The function sets to 0xffffffff the value to which the
pbDebuggerPresent argument points, if a debugger is present. Internally, it
simply returns the value from the ntdll NtQueryInformationProcess
(ProcessDebugPort class) function.
Example code looks like this:
    push eax
    push esp
    push -1 ;GetCurrentProcess()
    call CheckRemoteDebuggerPresent
    pop eax
    test eax, eax
    jne being_debugged

Some packers avoid using the kernel32 CheckRemoteDebuggerPresent() function,
and call the ntdll NtQueryInformationProcess() function directly.

iii. NtQueryInformationProcess

The ntdll NtQueryInformationProcess() function has these parameters: HANDLE
ProcessHandle, PROCESSINFOCLASS ProcessInformationClass, PVOID
ProcessInformation, ULONG ProcessInformationLength, PULONG ReturnLength.
Windows Vista supports 45 classes of ProcessInformationClass information (up
from 38 in Windows XP), but only four of them are documented by Microsoft so
far. One of them is the ProcessDebugPort. It is possible to query for the
existence (not the value) of the port. The return value is 0xffffffff if the
process is being debugged. Internally, the function queries for the non-zero
state of the EPROCESS->DebugPort field.
Example code looks like this:

    push eax
    mov eax, esp
    push 0
    push 4 ;ProcessInformationLength
    push eax
    push 7 ;ProcessDebugPort
    push -1 ;GetCurrentProcess()
    call NtQueryInformationProcess
    pop eax
    test eax, eax
    jne being_debugged

This technique is used by MSLRH, among others. Since this information comes
from the kernel, there is no easy way for user-mode code to prevent this
call from revealing the presence of the debugger.

iv. Debug Objects

Windows XP introduced a "debug object". When a debugging session begins, a
debug object is created, and a handle is associated with it. It is possible
to query for the value of this handle, using the undocumented
ProcessDebugObjectHandle class.
Example code looks like this:

    push eax
    mov eax, esp
    push 0
    push 4 ;ProcessInformationLength
    push eax
    ;ProcessDebugObjectHandle
    push 1eh
    push -1 ;GetCurrentProcess()
    call NtQueryInformationProcess
    pop eax
    test eax, eax
    jne being_debugged

This technique is used by HyperUnpackMe2, among others. Since this
information comes from the kernel, there is no easy way for user-mode code
to prevent this call from revealing the presence of the debugger.

The undocumented ProcessDebugFlags class returns the inverse value of the
EPROCESS->NoDebugInherit bit. That is, the return value is FALSE if a
debugger is present.
Example code looks like this:

    push eax
    mov eax, esp
    push 0
    push 4 ;ProcessInformationLength
    push eax
    push 1fh ;ProcessDebugFlags
    push -1 ;GetCurrentProcess()
    call NtQueryInformationProcess
    pop eax
    test eax, eax
    je being_debugged

This technique is used by HyperUnpackMe2, among others. Since this
information comes from the kernel, there is no easy way for user-mode code
to prevent this call from revealing the presence of the debugger.

v. NtQuerySystemInformation

The ntdll NtQuerySystemInformation() function has these parameters:
SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation,
ULONG SystemInformationLength, PULONG ReturnLength. Windows Vista supports
106 classes of SystemInformationClass information (up from 72 in Windows
XP), but only nine of them are documented by Microsoft so far. None of them
is the SystemKernelDebuggerInformation class, which has existed since
Windows NT.
The SystemKernelDebuggerInformation class returns the value of two flags:
KdDebuggerEnabled in al, and KdDebuggerNotPresent in ah. Thus, the return
value in ah is FALSE if a debugger is present.
Example code looks like this:

    push eax
    mov eax, esp
    push 0
    push 2 ;SystemInformationLength
    push eax
    ;SystemKernelDebuggerInformation
    push 23h
    call NtQuerySystemInformation
    pop eax
    test ah, ah
    je being_debugged

This technique is used by SafeDisc. Since this information comes from the
kernel, there is no easy way to prevent this call from revealing the
presence of the debugger.

vi. NtQueryObject

The ntdll NtQueryObject() function has these parameters: HANDLE Handle,
OBJECT_INFORMATION_CLASS ObjectInformationClass, PVOID ObjectInformation,
ULONG ObjectInformationLength, PULONG ReturnLength. Windows NT-based
platforms support five classes of ObjectInformationClass information, but
only two of them are documented by Microsoft so far. Neither of them is the
ObjectAllTypesInformation, which we require.

As noted above, when a debugging session begins on Windows XP, a debug
object is created, and a handle is associated with it. It is possible to
query for the list of existing objects, and check the number of debug
objects that exist. This API is supported by Windows NT-based platforms, but
only Windows XP and later will return a debug object in the list.
Example code looks like this:

    xor ebx, ebx
    push ebx
    push esp ;ReturnLength
    ;ObjectInformationLength of 0
    ;to receive required size
    push ebx
    push ebx
    ;ObjectAllTypesInformation
    push 3
    push ebx
    call NtQueryObject
    pop ebp
    push 4 ;PAGE_READWRITE
    push 1000h ;MEM_COMMIT
    push ebp
    push ebx
    call VirtualAlloc
    push ebx
    ;ObjectInformationLength
    push ebp
    push eax
    ;ObjectAllTypesInformation
    push 3
    push ebx
    xchg esi, eax
    call NtQueryObject
    lodsd ;handle count
    xchg ecx, eax
    l1: lodsd ;string lengths
    movzx edx, ax ;length
    ;pointer to TypeName
    lodsd
    xchg esi, eax
    ;sizeof(L"DebugObject")
    ;avoids superstrings
    ;like "DebugObjective"
    cmp edx, 16h
    jne l2
    xchg ecx, edx
    mov edi, offset l3
    repe cmpsb
    xchg ecx, edx
    jne l2
    ;TotalNumberOfObjects
    cmp [eax], edx
    jne being_debugged
    ;point to trailing null
    l2: add esi, edx
    ;round down to dword
    and esi, -4
    ;skip trailing null
    ;and any alignment bytes
    lodsd
    loop l1
    ...
    l3: dw "D","e","b","u","g"
    dw "O","b","j","e","c","t"

Since this information comes from the kernel, there is no easy way for
user-mode code to prevent this call from revealing the presence of the
debugger.
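
The walk over the returned type list in the NtQueryObject example can be
sketched in Python against a synthetic buffer. This is a model under stated
assumptions, not the real ObjectAllTypesInformation layout: each record here
is reduced to a name length, a maximum length, an object count, and the
UTF-16 name with its trailing null, padded to a dword boundary. The helper
names pack_types() and debug_object_in_use() are hypothetical. What the
sketch keeps from the assembly is the exact-length name comparison and the
"round down, then skip a dword" step that moves to the next record.

```python
import struct

TARGET = "DebugObject"   # 11 UTF-16 characters, i.e. 0x16 bytes


def pack_types(entries):
    """Build a synthetic buffer (simplified, hypothetical layout)."""
    out = struct.pack("<I", len(entries))             # number of types
    for name, total in entries:
        raw = name.encode("utf-16-le")
        out += struct.pack("<HHI", len(raw), len(raw) + 2, total)
        out += raw + b"\x00\x00"                      # name, trailing null
        out += b"\x00" * (-len(out) % 4)              # pad to dword boundary
    return out


def debug_object_in_use(buf):
    """Return True if a type named exactly TARGET reports live objects."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    for _ in range(count):
        length, _max_length, total = struct.unpack_from("<HHI", buf, pos)
        pos += 8
        name = buf[pos:pos + length].decode("utf-16-le")
        # Comparing the whole name rejects superstrings ("DebugObjective").
        if name == TARGET and total != 0:
            return True
        # Point to the trailing null, round down to a dword, then skip the
        # null and any alignment bytes, as the assembly does.
        pos = ((pos + length) & ~3) + 4
    return False
```

A buffer in which the "DebugObject" type has a count of zero is reported as
clean, while any non-zero count is treated as evidence of a debugging
session somewhere on the system.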
vii. Thread hiding

Windows 2000 introduced an explicitly anti-debugging API extension, in the
form of an information class called HideThreadFromDebugger. It can be
applied on a per-thread basis, using the ntdll NtSetInformationThread()
function.
Example code looks like this:

    push 0
    push 0
    ;HideThreadFromDebugger
    push 11h
    push -2 ;GetCurrentThread()
    call NtSetInformationThread

When the function is called, the thread will continue to run but a debugger
will no longer receive any events related to that thread. Among the missing
events are that the process has terminated, if the main thread is the hidden
one. This technique is used by HyperUnpackMe2, among others.

viii. OpenProcess

When a process acquires the SeDebugPrivilege, it gains full control of the
CSRSS.EXE, even though CSRSS.EXE is a system process. The reason is that
SeDebugPrivilege overrides all of the restrictions for that process alone.
Further, the privilege is passed to child processes, such as the ones
created by a debugger. The result is that if a debugged application can
obtain the process ID for CSRSS.EXE, it can open the process via the
kernel32 OpenProcess() function. The process ID can be obtained by the
kernel32 CreateToolhelp32Snapshot() function and a kernel32 Process32Next()
function enumeration; or the ntdll NtQuerySystemInformation
(SystemProcessInformation (5)) function (and the ntdll
NtQuerySystemInformation() function is how the kernel32
CreateToolhelp32Snapshot() function gets its information on Windows NT-based
platforms). Alternatively, Windows XP introduced the ntdll
CsrGetProcessId() function, which simplifies things greatly.
Example code looks like this:

    call CsrGetProcessId
    push eax
    push 0
    push 1f0fffh ;PROCESS_ALL_ACCESS
    call OpenProcess
    test eax, eax
    jne being_debugged

This opens (no pun intended) the way to a system-level denial-of-service, by
causing the CSRSS.EXE process to perform an illegal operation. One method is
the creation of a thread at an invalid memory address, or a thread that
executes an infinite loop. However, since the control is complete, an
application can inject a thread into the CSRSS.EXE process space and perform
some meaningful action, which results in a privilege elevation. However,
this is of only minor concern, since usually only Administrators will be
able to acquire the debug privilege, and Administrators are highly
privileged already. This technique was described publicly by Piotr Bania
[iv] in 2005.

Both OllyDbg and WinDbg acquire the debug privilege, but Turbo Debug does
not. The best way to defeat this technique is to not acquire the privilege
arbitrarily, and keep it for only as long as truly necessary.

ix. CloseHandle

If an invalid handle is passed to the kernel32 CloseHandle() function (or
directly to the ntdll NtClose() function), and no debugger is present, then
an error code is returned. However, if a debugger is present, an
EXCEPTION_INVALID_HANDLE (0xc0000008) exception will be raised. This
exception can be intercepted by an exception handler, and is an indication
that a debugger is running.
Example code looks like this:

    xor eax, eax
    push offset being_debugged
    push dw fs:[eax]
    mov fs:[eax], esp
    ;any illegal value will do
    ;must be dword-aligned on Vista
    push esp
    call CloseHandle

To defeat this method is easiest on Windows XP, where a FirstHandler
Vectored Exception Handler can be registered by the debugger to hide the
exception and silently resume execution. Of course, there is the problem of
transparently hooking the kernel32 AddVectoredExceptionHandler() function,
in order to prevent another handler from registering as the first handler.
However, it is still better than the problem of transparently hooking the
ntdll NtClose() on Windows NT and Windows 2000, in order to register a
Structured Exception Handler to hide the exception.

x. OutputDebugString

The kernel32 OutputDebugString() function can demonstrate different
behaviour, depending on whether or not a debugger is present. The most
obvious
difference in behaviour is that the kernel32 GetLastError() function will
return zero if a debugger is present.
Example code looks like this:

    push 0
    push esp
    call OutputDebugStringA
    call GetLastError
    test eax, eax
    je being_debugged

xi. ReadFile

The kernel32 ReadFile() function can be used as a technique for
self-modification, by reading file content into the code stream. It is also
an effective method for removing software breakpoints that a debugger might
place. This is a technique that I discussed privately in 1999, but it was
described publicly by Piotr Bania [v] in 2007.
Example code looks like this:

    xor ebx, ebx
    mov ebp, offset l2
    push 104h ;MAX_PATH
    push ebp
    push ebx ;self filename
    call GetModuleFileNameA
    push ebx
    push ebx
    push 3 ;OPEN_EXISTING
    push ebx
    push 1 ;FILE_SHARE_READ
    push 80000000h ;GENERIC_READ
    push ebp
    call CreateFileA
    push ebx
    push esp
    ;more bytes might be more useful
    push 1
    push offset l1
    push eax
    call ReadFile
    ;replaced by "M"
    ;from the MZ header
    l1: int 3
    ...
    l2: db 104h dup (?) ;MAX_PATH

The way to defeat this technique is to use hardware breakpoints instead of
software breakpoints after the API call.

xii. WriteProcessMemory

The kernel32 WriteProcessMemory() function technique is a simple variation
on the kernel32 ReadFile() function technique above, but it requires that
the data to write are already present in the process memory space.
Example code looks like this:

    push 0 ;lpNumberOfBytesWritten
    push 1
    push offset l1
    push offset l2
    push -1 ;GetCurrentProcess()
    call WriteProcessMemory
    l1: nop
    l2: int 3

This technique is used by NsAnti. The way to defeat this technique is to use
hardware breakpoints instead of software breakpoints after the API call.

xiii. UnhandledExceptionFilter

When an exception occurs, and no registered Structured Exception Handlers
(neither Safe nor Legacy) or Vectored Exception Handlers exist, or none of
the registered handlers handles the exception, the kernel32
UnhandledExceptionFilter() function will be called as a last resort. Within
that function is a call to the handler that was registered by the kernel32
SetUnhandledExceptionFilter() function, but that call will not be reached if
a debugger is present. Instead, the exception will be passed to the
debugger. The presence of a debugger is determined by a call to the ntdll
NtQueryInformationProcess (ProcessDebugPort class) function. So, for
applications that do not know about the ntdll NtQueryInformationProcess
(ProcessDebugPort class) function, the missing exception can be used to
infer the presence of the debugger.
Example code looks like this:

    push offset l1
    call SetUnhandledExceptionFilter
    ;force an exception to occur
    int 3
    jmp being_debugged
    l1: ...

xiv. Block Input

The user32 BlockInput() function blocks mouse and keyboard events from
reaching applications. It is a very effective way to disable debuggers.
Example code looks like this:

    push 1
    call BlockInput

This technique is used by Yoda's Protector, among
others.

xv. SuspendThread

The kernel32 SuspendThread() function can be another
very effective way to disable user-mode debuggers like
OllyDbg and Turbo Debugger. This can be achieved by
enumerating the processes, as described above, then
suspending the main thread of the parent process, if it
does not match "Explorer.exe". This technique is used by
Yoda's Protector.

xvi. Guard Pages

Guard pages can be used for a simple debugger
detection. An exception handler is registered, an
executable/writable page is allocated dynamically, a "C3"
opcode ("RET" instruction) is written to it, and then the
page protection is changed to PAGE_GUARD. Then an
attempt is made to execute the instruction. This should
result in an EXCEPTION_GUARD_PAGE (0x80000001)
exception being received by the exception handler, but if a
debugger is present, the debugger might intercept the
exception and allow the execution to continue. In fact,
that's exactly what happens in OllyDbg (see
Anti-debugging:OllyDbg section below).
Example code looks like this:

    xor ebx, ebx
    push 40h ;PAGE_EXECUTE_READWRITE
    push 1000h ;MEM_COMMIT
    push 1
    push ebx
    call VirtualAlloc
    mov b [eax], 0c3h
    push eax
    push esp
    ;PAGE_EXECUTE_READWRITE
    ;+ PAGE_GUARD
    push 140h
    push 1
    push eax
    xchg ebp, eax
    call VirtualProtect
    push offset l1
    push dw fs:[ebx]
    mov fs:[ebx], esp
    push offset being_debugged
    ;executing ret will branch
    ;to being_debugged
    jmp ebp
    ;an exception will reach here
l1: ...

This technique is used by PC Guard.

xvii. Alternative desktop

Windows NT-based platforms support multiple
desktops per session. It is possible to select a different
active desktop, which has the effect of hiding the
windows of the previously active desktop, and with no
obvious way to switch back to the old desktop.
Example code looks like this:

    xor eax, eax
    push eax
    ;DESKTOP_CREATEWINDOW
    ;+ DESKTOP_WRITEOBJECTS
    ;+ DESKTOP_SWITCHDESKTOP
    push 182h
    push eax
    push eax
    push eax
    push offset l1
    call CreateDesktopA
    push eax
    call SwitchDesktop
    ...
l1: db "mydesktop", 0

This technique is used by HyperUnpackMe2.

e. Hardware tricks

i. Prefetch queue

Given this code:

l1: call l3
l2: ...
l3: mov al, 0c3h
    mov edi, offset l3
    or ecx, -1
    rep stosb

What happens next? The answer depends on several
things. Clearly, the code overwrites itself, which might
lead one to conclude that it stops as soon as the REP is
destroyed. If a debugger is used to single-step, then that
is exactly what happens. However, if a debugger is not
present, then the write continues until an exception
occurs. Which exception that is depends on the memory
layout at the time.
If the code is followed by a purely virtual region that has
been accessed (by a debugger, for example), then an access
violation exception will occur and the program will exit if
no exception handler has been registered.

On the other hand, if the virtual memory has not been
accessed, then the REP will stop. No visible exception
will occur, but a "C3" opcode ("RET" instruction) will be
executed, and control will be returned to l2.

Why? The answer is the prefetch queue. On the x86
family of CPUs prior to the Pentium, the prefetch queue
would not be flushed automatically when a memory write
occurred at an address that corresponded to the address
of the bytes in the prefetch queue. However, the queue
would be flushed whenever an exception occurred, such
as the single-step exception that many debuggers use to
step through code. This behaviour allowed for all kinds
of anti-debugger tricks, mostly concerned with
overwriting the next instruction to execute. In the
absence of a debugger, the prefetch queue would execute
the original instruction. In the presence of a debugger
that triggers a single-step exception, the queue would be
flushed, and the alteration would be applied.

Intel considered this behaviour to be a bug, and it was
fixed in the Pentium and later CPUs, but with two
exceptions that remain to this day: the REP MOVS and
REP STOS instructions. For those two instructions, the
CPU still caches them and continues to execute them even
when the instruction sequence has been overwritten in
memory.

The execution continues until completion, or until an
exception occurs. In the case above, an exception occurs,
but it is a page fault when the value in EDI reaches a
reserved page in memory. At that time, the CPU flushes
and reloads the prefetch queue, sees the "C3" opcode
("RET" instruction) where the REP STOS instruction was
previously, and executes that instead.

This technique is used by Invius.

Another example of this trick exists that does not rely
on the page fault.
Example code looks like this:

l1: mov al, 90h
    push 10h
    pop ecx
    mov edi, offset l1
    rep stosb
    ...

One variation contains a JMP instruction within the
altered range; the other contains a JECXZ instruction
outside of the altered range. They have opposite effects.

In both cases, the "90" opcode ("NOP" instruction) in
the AL register is used to overwrite the REP STOSB and
some of the following bytes. Incorrect emulation (or
single-stepping through the code, as with a debugger)
will cause the REP to exit prematurely, allowing the
instructions immediately following the STOSB instruction
to execute. In the first variation, the JMP instruction will
be executed as a result, revealing the presence of a
debugger or similar. In the second variation, the value in
ECX will be zero only if the REP STOSB completes. If the
JECXZ instruction is not executed, this reveals the
presence of a debugger or similar.

The JECXZ version of this technique is used by
Obsidium.

The Pentium Pro introduced an additional behaviour,
called a "fast string" operation, which is also supported
by modern CPUs. It is available for both MOVS and
STOS. It requires these conditions: REP prefix, EDI
aligned to a multiple of 8 bytes (and ESI, too, for MOVS
on a Pentium 3), ESI and EDI at least a cache-line apart
(64 bytes for the Pentium 4 and later CPUs, 32 bytes for
earlier CPUs) for MOVS, ECX at least 64, D flag clear in
the EFLAGS register, and WB or WC memory type for
EDI (and ESI for MOVS). Additionally, a Model Specific
Register (MSR) must be set appropriately (though by
default it is already enabled): either MSR 1A0h bit 0 or
MSR 1E0h bit 2. Of particular interest is that the
single-step exception cannot interrupt the operation.

ii. Hardware Breakpoints

When an exception occurs, Windows passes to the
exception handler a context structure which contains the
values of the general registers, segment registers, control
registers, and the debug registers. If a debugger is
present and passes the exception to the debuggee with
hardware breakpoints in use, then the debug registers will
contain values that reveal the presence of the debugger.
Example code looks like this:

    xor eax, eax
    push offset l1
    push dw fs:[eax]
    mov fs:[eax], esp
    ;force an exception to occur
    jmp eax
    ...
    ;ContextRecord
l1: mov eax, [esp+0ch]
    ;accumulate in ecx so that eax keeps the context pointer
    mov ecx, [eax+4] ;Dr0
    or ecx, [eax+8] ;Dr1
    or ecx, [eax+0ch] ;Dr2
    or ecx, [eax+10h] ;Dr3
    jne being_debugged

The debugger is also vulnerable to being bypassed if
the debuggee erases the contents of the debug registers
prior to resuming execution after the exception. This
technique is used by ASProtect, among others.

iii. Instruction Counting

Instruction counting can be performed by registering
an exception handler, then setting some hardware
breakpoints on particular addresses. When each address
is hit, an EXCEPTION_SINGLE_STEP (0x80000004)
exception will be raised. This exception will be passed to
the exception handler, which can adjust the instruction
pointer to point to a new instruction, and then resume
execution. Setting the breakpoints requires access to a
context structure. This can be achieved by calling the
kernel32 GetThreadContext() function. Alternatively, a
context structure is passed to an exception handler, so by
forcing an exception to occur, the context can be acquired
in a more obfuscated manner. Some debuggers do not
correctly handle hardware breakpoints that they did not
set themselves, leading to some instructions not being
counted by the exception handler.
Example code looks like this:

    xor eax, eax
    cdq
    push offset l5
    push dw fs:[eax]
    mov fs:[eax], esp
    int 3
l1: nop
l2: nop
l3: nop
l4: nop
    div edx
    cmp al, 4
    jne being_debugged
    ...
l5: xor eax, eax
    ;ExceptionRecord
    mov ecx, [esp+4]
    ;ContextRecord
    mov edx, [esp+0ch]
    ;CONTEXT_Eip
    inc b [edx+0b8h]
    ;ExceptionCode
    mov ecx, [ecx]
    ;EXCEPTION_INT_DIVIDE_BY_ZERO
    cmp ecx, 0c0000094h
    jne l6
    ;CONTEXT_Eip
    inc b [edx+0b8h]
    mov [edx+4], eax ;Dr0
    mov [edx+8], eax ;Dr1
    mov [edx+0ch], eax ;Dr2
    mov [edx+10h], eax ;Dr3
    mov [edx+14h], eax ;Dr6
    mov [edx+18h], eax ;Dr7
    ret
    ;EXCEPTION_BREAKPOINT
l6: cmp ecx, 80000003h
    jne l7
    ;Dr0
    mov dw [edx+4], offset l1
    ;Dr1
    mov dw [edx+8], offset l2
    ;Dr2
    mov dw [edx+0ch], offset l3
    ;Dr3
    mov dw [edx+10h], offset l4
    ;Dr7
    mov dw [edx+18h], 155h
    ret
    ;EXCEPTION_SINGLE_STEP
l7: cmp ecx, 80000004h
    jne being_debugged
    ;CONTEXT_Eax
    inc b [edx+0b0h]
    ret

This technique is used by tELock.

iv. Execution Timing

When a debugger is present, and used to single-step
through the code, there is a significant delay between the
executions of the individual instructions, when compared
to native execution. This delay can be measured using
one of several possible time sources. These sources
include the RDTSC instruction, the kernel32
GetTickCount() function, and the winmm timeGetTime()
function, among others. However, the resolution of the
winmm timeGetTime() function is variable, depending on
whether or not it branches internally to the kernel32
GetTickCount() function, making it very unreliable to
measure small intervals.
Example code looks like this for RDTSC:

    rdtsc
    xchg ecx, eax
    rdtsc
    sub eax, ecx
    cmp eax, 500h
    jnbe being_debugged

Example code looks like this for kernel32
GetTickCount():

    call GetTickCount
    xchg ebx, eax
    call GetTickCount
    sub eax, ebx
    cmp eax, 1
    jnb being_debugged

Example code looks like this for winmm timeGetTime():

    call timeGetTime
    xchg ebx, eax
    call timeGetTime
    sub eax, ebx
    cmp eax, 10h
    jnb being_debugged

v. EIP via Exceptions

Using exceptions to alter the value of eip is a very
common technique among packers. It serves as an
effective anti-debugging technique, since debuggers
typically intercept some of the exceptions (int 1 and int 3,
for example). It also provides for a level of obfuscation,
particularly if the exception trigger is not immediately
obvious.
Example code looks like this:

    xor eax, eax
    push offset l3
    push dw fs:[eax]
    mov fs:[eax], esp
l1: call l1
l2: jmp l2
l3: pop eax
    pop eax
    pop esp
l4: ...

Is l2 ever reached? No, it's not. A stack overflow
exception occurs at l1, causing a transfer of control to l3.
After the stack is restored, execution continues from l4.
This technique is used by PECompact, among others.

f. Process tricks

i. Header entrypoint

Any section of the file, whose attributes do not include
IMAGE_SCN_MEM_WRITE (writable) and/or
IMAGE_SCN_MEM_EXECUTE (executable), is read-only
by default to a remote debugger. This includes the PE
header, since there is no section that describes it (there is
an exception to this, see Anti-Emulating:File-Format
section below). If the entrypoint happens to be in such a
section, then a debugger will not be able to successfully
set any breakpoints, if it does not first call the kernel32
VirtualProtectEx() function to write-enable the memory
region. Further, if the failure to set the breakpoint is not
noticed by the debugger, then the debugger might allow
the debuggee to run freely. This is the case for Turbo
Debugger. This technique is used by MEW, among
others.

ii. Parent process

Users usually execute applications manually via a
window provided by the shell. As a result, the parent of
any such process will be Explorer.exe. Of course, if the
application is executed from the command-line, then the
command-line application will be the parent. Executing
applications from the command-line can be a problem for
certain packers. This is because some packers check the
parent process name, expecting it to be "Explorer.exe", or
the packers compare the parent process ID against that of
Explorer.exe. A mismatch in either case is then assumed
to be caused by a debugger creating the process.

The process ID of both Explorer.exe, and the parent of
the current process, can be obtained by the kernel32
CreateToolhelp32Snapshot() function and a kernel32
Process32Next() function enumeration.
Example code looks like this:

    xor esi, esi
    xor edi, edi
    push esi
    push 2 ;TH32CS_SNAPPROCESS
    call CreateToolhelp32Snapshot
    mov ebx, offset l5
    push ebx
    push eax
    xchg ebp, eax
    call Process32First
l1: call GetCurrentProcessId
    ;th32ProcessID
    cmp [ebx+8], eax
    ;th32ParentProcessID
    cmove edi, [ebx+18h]
    test esi, esi
    je l2
    test edi, edi
    je l2
    cmp esi, edi
    jne being_debugged
l2: lea ecx, [ebx+24h] ;szExeFile
    push esi
    mov esi, ecx
l3: lodsb
    cmp al, "\"
    cmove ecx, esi
    or b [esi-1], " "
    test al, al
    jne l3
    sub esi, ecx
    xchg ecx, esi
    push edi
    mov edi, offset l4
    repe cmpsb
    pop edi
    pop esi
    ;th32ProcessID
    cmove esi, [ebx+8]
    push ebx
    push ebp
    call Process32Next
    test eax, eax
    jne l1
    ...
l4: db "explorer.exe "
    ;sizeof(PROCESSENTRY32)
l5: dd 128h
    db 124h dup (?)

This technique is used by Yoda's Protector, among
others. Since this information comes from the kernel,
there is no easy way for user-mode code to prevent this
call from revealing the presence of the debugger.
However, a common technique is to force the kernel32
Process32Next() function to return FALSE, which causes
the loop to exit early. It should be a suspicious condition
if either Explorer.exe or the current process was not
seen, but Yoda's Protector (and some other packers) does
not contain any requirement that both were found.

The process ID of both Explorer.exe, and the parent of
the current process, can be obtained by the ntdll
NtQuerySystemInformation (SystemProcessInformation
(5)) function.
Example code looks like this:

    xor ebp, ebp
    xor esi, esi
    xor edi, edi
    jmp l2
l1: push 8000h ;MEM_RELEASE
    push esi
    push ebx
    call VirtualFree
l2: xor eax, eax
    mov ah, 10h ;MEM_COMMIT
    add ebp, eax ;4kb increments
    push 4 ;PAGE_READWRITE
    push eax
    push ebp
    push esi
    call VirtualAlloc
    ;function does not return
    ;required length for this class
    push esi
    ;must calculate by brute-force
    push ebp
    push eax
    ;SystemProcessInformation
    push 5
    xchg ebx, eax
    call NtQuerySystemInformation
    ;STATUS_INFO_LENGTH_MISMATCH
    cmp eax, 0c0000004h
    je l1
l3: call GetCurrentProcessId
    ;UniqueProcessId
    cmp [ebx+44h], eax
    ;InheritedFromUniqueProcessId
    cmove edi, [ebx+48h]
    test esi, esi
    je l4
    test edi, edi
    je l4
    cmp esi, edi
    jne being_debugged
l4: mov ecx, [ebx+3ch] ;ImageName
    jecxz l6
    push esi
    xor eax, eax
    mov esi, ecx
l5: lodsw
    cmp eax, "\"
    cmove ecx, esi
    push ecx
    push eax
    call CharLowerW
    mov w [esi-2], ax
    pop ecx
    test eax, eax
    jne l5
    sub esi, ecx
    xchg ecx, esi
    push edi
    mov edi, offset l7
    repe cmpsb
    pop edi
    pop esi
    ;UniqueProcessId
    cmove esi, [ebx+44h]
    ;NextEntryOffset
l6: mov ecx, [ebx]
    add ebx, ecx
    inc ecx
    loop l3
    ...
l7: dw "e","x","p","l","o","r"
    dw "e","r",".","e","x","e",0

However, the process ID of Explorer.exe can be
obtained most simply by the user32 GetShellWindow()
and user32 GetWindowThreadProcessId() functions. The
process ID of the parent of the current process can be
obtained most simply by the ntdll
NtQueryInformationProcess (ProcessBasicInformation
(0)) function.
Example code looks like this:

    call GetShellWindow
    push eax
    push esp
    push eax
    call GetWindowThreadProcessId
    push 0
    ;sizeof(PROCESS_BASIC_INFORMATION)
    push 18h
    mov ebp, offset l1
    push ebp
    push 0 ;ProcessBasicInformation
    push -1 ;GetCurrentProcess()
    call NtQueryInformationProcess
    pop eax
    ;InheritedFromUniqueProcessId
    cmp [ebp+14h], eax
    jne being_debugged
    ...
    ;sizeof(PROCESS_BASIC_INFORMATION)
l1: db 18h dup (?)

iii. Self-execution

One of the simplest ways to escape from the control of
a debugger is for a process to execute another copy of
itself. Typically, the process will use a synchronisation
object, such as a mutex, to prevent infinite executions.
The first process will create the mutex, and then execute
the copy of the process. The second process will not be
debugged, even if the first process was. It will also know
that it is the copy since the mutex will exist.
Example code looks like this:

    xor ebx, ebx
    push offset l2
    ;ebx is the zeroed register (eax is not initialised here)
    push ebx
    push ebx
    call CreateMutexA
    call GetLastError
    ;ERROR_ALREADY_EXISTS
    cmp eax, 0b7h
    je l1
    mov ebp, offset l3
    push ebp
    call GetStartupInfoA
    call GetCommandLineA
    ;sizeof(PROCESS_INFORMATION)
    sub esp, 10h
    push esp
    push ebp
    push ebx
    push ebx
    push ebx
    push ebx
    push ebx
    push ebx
    push eax
    push ebx
    call CreateProcessA
    pop eax
    push -1 ;INFINITE
    push eax
    call WaitForSingleObject
    call ExitProcess
l1: ...
l2: db "my mutex", 0
    ;sizeof(STARTUPINFO)
l3: db 44h dup (?)

A common mistake is the use of the kernel32 Sleep()
function, instead of the kernel32 WaitForSingleObject()
function, because it introduces a race condition. The
problem occurs when there is CPU-intensive activity.
This could be because of a sufficiently complicated
protection (or intentional delays) in the second process,
but also because of actions that the user might perform
while the execution is in progress, such as browsing the
network or extracting files from an archive. The result is
that the second process might not reach the mutex check
before the delay expires, leading it to think that it is the
first process. It then executes yet another copy of
the process. This behaviour can be repeated any number
of times, until one of the processes completes the mutex
check successfully. This technique is used by MSLRH,
and the exact problem is present there.

iv. Process Name

As noted above, the list of process names can be
retrieved by the kernel32 CreateToolhelp32Snapshot()
function, or the ntdll NtQuerySystemInformation() function.
In addition to finding Explorer.exe or the current process
name, some packers look for other process names,
particularly those which belong to anti-malware vendors
or specialised tools.
Example code looks like this for kernel32
CreateToolhelp32Snapshot():

    push 0
    push 2 ;TH32CS_SNAPPROCESS
    call CreateToolhelp32Snapshot
    mov ebx, offset l5
    push ebx
    push eax
    xchg ebp, eax
    call Process32First
l1: lea ecx, [ebx+24h] ;szExeFile
    mov esi, ecx
l2: lodsb
    cmp al, "\"
    cmove ecx, esi
    or b [esi-1], " "
    test al, al
    jne l2
    sub esi, ecx
    xchg ecx, esi
    mov edi, offset l4
l3: push ecx
    push esi
    repe cmpsb
    je being_debugged
    mov al, " "
    not ecx
    ;move to previous character
    dec edi
    ;then find end of string
    repne scasb
    pop esi
    pop ecx
    cmp [edi], al
    jne l3
    push ebx
    push ebp
    call Process32Next
    test eax, eax
    jne l1
    ...
l4: <array of space-terminated ASCII strings, space to end>
    ;sizeof(PROCESSENTRY32)
l5: dd 128h
    db 124h dup (?)

Example code looks like this for ntdll
NtQuerySystemInformation():

    xor ebp, ebp
    xor esi, esi
    jmp l2
l1: push 8000h ;MEM_RELEASE
    push esi
    push ebx
    call VirtualFree
l2: xor eax, eax
    mov ah, 10h ;MEM_COMMIT
    add ebp, eax ;4kb increments
    push 4 ;PAGE_READWRITE
    push eax
    push ebp
    push esi
    call VirtualAlloc
    ;function does not return
    ;required length for this class
    push esi
    ;must calculate by brute-force
    push ebp
    push eax
    ;SystemProcessInformation
    push 5
    xchg ebx, eax
    call NtQuerySystemInformation
    ;STATUS_INFO_LENGTH_MISMATCH
    cmp eax, 0c0000004h
    je l1
l3: mov ecx, [ebx+3ch] ;ImageName
    jecxz l6
    xor eax, eax
    mov esi, ecx
l4: lodsw
    cmp eax, "\"
    cmove ecx, esi
    push ecx
    push eax
    call CharLowerW
    mov w [esi-2], ax
    pop ecx
    test eax, eax
    jne l4
    sub esi, ecx
    xchg ecx, esi
    mov edi, offset l7
l5: push ecx
    push esi
    repe cmpsb
    je being_debugged
    not ecx
    ;move to previous character
    dec edi
    ;force word-alignment
    and edi, -2
    ;then find end of string
    repne scasw
    pop esi
    pop ecx
    cmp [edi], ax
    jne l5
    ;NextEntryOffset
l6: mov ecx, [ebx]
    add ebx, ecx
    inc ecx
    loop l3
    ...
    ;must be word-aligned
    ;for correct scanning
    align 2
l7: <array of null-terminated Unicode strings, null to end>

v. Threads

Threads are used by some packers to perform actions
such as periodically checking for the presence of a
debugger, or ensuring the integrity of the main code. The
use of threads had an additional advantage early on,
which was that some anti-malware emulators did not
support threads, allowing the packed file to cause an early
exit.
Example code looks like this:

l1: xor eax, eax
    push eax
    push esp
    push eax
    push eax
    push offset l2
    push eax
    push eax
    call CreateThread
    ...
l2: xor eax, eax
    cdq
    mov ecx, offset l4 - offset l1
    mov esi, offset l1
l3: lodsb
    ;simple sum
    ;to detect breakpoints
    add edx, eax
    loop l3
    cmp edx, <checksum>
    jne being_debugged
    ;small delay then restart
    push 100h
    call Sleep
    jmp l2
l4: ;code end

This technique is used by PE-Crypt32, among others.

vi. Self-debugging

Self-debugging is the act of running a copy of a
process, and attaching to it as a debugger. Since only
one debugger can be attached to a process at any point in
time, the copy of the process becomes undebuggable by
ordinary means.
Example code looks like this:

    xor ebx, ebx
    mov ebp, offset l3
    push ebp
    call GetStartupInfoA
    call GetCommandLineA
    mov esi, offset l4
    push esi
    push ebp
    push ebx
    push ebx
    push 1 ;DEBUG_PROCESS
    push ebx
    push ebx
    push ebx
    push eax
    push ebx
    call CreateProcessA
    mov ebx, offset l5
    jmp l2
l1: push 10002h ;DBG_CONTINUE
    push dw [esi+0ch] ;dwThreadId
    push dw [esi+8] ;dwProcessId
    call ContinueDebugEvent
l2: push -1 ;INFINITE
    push ebx
    call WaitForDebugEvent
    cmp b [ebx], 5
    ;EXIT_PROCESS_DEBUG_EVENT
    jne l1
    ...
    ;sizeof(STARTUPINFO)
l3: db 44h dup (?)
    ;sizeof(PROCESS_INFORMATION)
l4: db 10h dup (?)
    ;sizeof(DEBUG_EVENT)
l5: db 60h dup (?)

This technique is used by Armadillo, among others.
This technique can be defeated most easily by
kernel-mode code zeroing the EPROCESS->DebugPort field.
Doing so will allow another debugger to attach to the
process. The debugged process can also be opened via
the kernel32 OpenProcess() function, which means that a
DLL can be injected into the process space.
Alternatively, on Windows XP and later, the kernel32
DebugActiveProcessStop() function can be used to
detach the debugger.

vii. Disassembly

Some packers examine not just the first few bytes of an
API for breakpoints, but actively disassemble the
function code. There are a few reasons why they might
do that. One reason is in order to perform API
interception, whereby some complete instructions from
the function are copied to a private buffer and executed
from there. A jump is placed at the end of those
instructions, to point after the last copied instruction in
the original API code. This has the effect of bypassing
breakpoints that are placed anywhere within the first few
instructions of the original API code.

Another reason is in order to perform a more reliable
search for breakpoints. By knowing the location of the
instruction boundaries, there is no risk of encountering
what appears to be a breakpoint, but is actually some
data. For example, 0xb8 0xcc 0x00 0x00 0x00 appears to
contain a breakpoint, but when disassembled and
displayed, the sequence is "MOV EAX, 000000CC".

In addition to searching for breakpoints, some packers
search for detours. Detours are jump instructions that are
inserted, usually as the first instruction, to point to a
private location. The code at that private location
typically creates a log of the APIs that are called, though
it is not restricted to that behaviour. The problem with
detecting detours is that it also detects hot-patching.
Microsoft added a dummy instruction to many functions
in Windows XP, that allows a jump instruction to be
placed cleanly (that is, without concern for instruction
boundaries, since the dummy instruction achieves the
required alignment). This jump instruction would point to
a private location that contains code to deal with a
vulnerability in the hooked function. If a packer detects
detours and refuses to run if a detour of any kind is
found, then it will also refuse to run if a function has been
hot-patched.

viii. TLS Callback

This is a technique that allows the execution of
user-defined code before the execution of the main entrypoint
code. It is a technique that I discussed privately in 2000,
but it was demonstrated publicly by Radim Picha [vi] later
that same year. It was used in a virus [vii] in 2002. It has
been used by ExeCryptor and others since 2004.

ix. Device names

Tools that make use of kernel-mode drivers also need a
way to communicate with those drivers. A very common
method is through the use of named devices. Thus, by
attempting to open such a device, any success indicates
the presence of the driver.
Example code looks like this:

    xor eax, eax
    mov edi, offset l2
l1: push eax
    push eax
    push 3 ;OPEN_EXISTING
    push eax
    push eax
    push eax
    push edi
    call CreateFileA
    inc eax
    jne being_debugged
    or ecx, -1
    repne scasb
    cmp [edi], al
    jne l1
    ...
l2: <array of ASCIIZ strings, null to end>

A typical list includes the following names:

    \\.\SICE
    \\.\SIWVID
    \\.\NTICE

These names belong to SoftICE. Note that a
successful opening of the device does not mean that
SoftICE is active, but that it is present. However, that is
sufficient for many people. The first two drivers are
present on Windows 9x-based platforms, the third driver
is present on Windows NT-based platforms, but a lot of
copy/paste occurs in the packer space, so this list
appears often, even in packers that do not run on
Windows 9x-based platforms.

Other common device names include these:

    \\.\REGVXG
    \\.\REGSYS

These names belong to RegMon. The first name is for
Windows 9x-based platforms, the second name is for
Windows NT-based platforms.

    \\.\FILEVXG
    \\.\FILEM

These names belong to FileMon. The first name is for
Windows 9x-based platforms, the second name is for
Windows NT-based platforms.

    \\.\TRW

This name belongs to TRW. TRW is a debugger for
only Windows 9x-based platforms, yet some packers
check for it even on Windows NT-based platforms.

    \\.\ICEEXT

This name belongs to SoftICE extender.

g. SoftICE-specific

For many years, SoftICE was the most popular of
debuggers for the Windows platform. It is a debugger that
makes use of a kernel-mode driver, in order to support
debugging of both user-mode and kernel-mode code,
including transitions in either direction between the two.

SoftICE contains a number of vulnerabilities. A
description of them is beyond the scope of this paper. A
companion paper (Anti-Unpacking Tricks - Future) will
cover the topic in detail.

i. Driver information

The names of the device drivers on the system can be
enumerated. This can be achieved using the ntdll
NtQuerySystemInformation (SystemModuleInformation
(0x0b)) function. For each module that is returned, the
version information in the file can be retrieved using the
version VerQueryValue() function. This information
typically includes the Product Name and Copyright
strings, which can be matched against specific products
and companies, such as "SoftICE", "Compuware", and
"NuMega".

ii. Interrupt 1

The interrupt 1 descriptor normally has a descriptor
privilege level (DPL) of 0, which means that the "cd 01"
opcode ("int 1" instruction) cannot be issued from ring 3.
An attempt to execute this interrupt directly will result in a
general protection fault ("int 0x0d" exception) being
issued by the CPU, eventually resulting in an
EXCEPTION_ACCESS_VIOLATION (0xc0000005)
exception being raised by Windows.

However, if SoftICE is running, it hooks interrupt 1 and
adjusts the DPL to 3, so that SoftICE can single-step
through user-mode code. This is not visible from within
SoftICE, though - the "IDT" command, to display the
interrupt descriptor table, shows the original interrupt 1
handler address with a DPL of 0, as though SoftICE were
not present.

The problem is that when an interrupt 1 occurs, SoftICE
does not check if it was caused by the trap flag or by a
software interrupt. The result is that SoftICE always calls
the original interrupt 1 handler, and an
EXCEPTION_SINGLE_STEP (0x80000004) exception is
raised instead of the EXCEPTION_ACCESS_VIOLATION
(0xc0000005) exception, allowing for an easy detection
method.
Example code looks like this:

    xor eax, eax
    push offset l1
    push dw fs:[eax]
    mov fs:[eax], esp
    int 1
    ...
    ;ExceptionRecord
l1: mov eax, [esp+4]
    ;EXCEPTION_SINGLE_STEP
    cmp dw [eax], 80000004h
    je being_debugged

This technique is used by SafeDisc. To defeat this
technique might appear to be a simple matter of restoring
the DPL of interrupt 1. It is not so simple. The problem is
to determine reliably the cause of an exception at the
interrupt 0x0d level. The instruction queue can be
examined for an "int 1" sequence, but the trap flag could
also appear to be set at the same time, even though it did
not become active. This can happen if interrupts are
delayed for one instruction (via "pop ss", for example);
in that case the trap flag is not responsible for the
exception, even though it is set. A companion paper
(Anti-Unpacking Tricks - Future) will cover some additional
aspects of this problem.
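The interrupt 1 check described above reduces to a small decision table, which the following Python sketch models. It is illustrative only: nothing in it executes "int 1" or touches the IDT, and the function names exception_for_ring3_int1 and handler_sees_softice are hypothetical. The only assumption is the one stated in the text, namely that the DPL of the interrupt 1 descriptor decides which exception code reaches the handler.

```python
# Toy model of the "int 1" check; nothing here touches real hardware.
EXCEPTION_SINGLE_STEP = 0x80000004
EXCEPTION_ACCESS_VIOLATION = 0xC0000005


def exception_for_ring3_int1(int1_dpl: int) -> int:
    """Exception code that a ring 3 "int 1" surfaces as, per the text.

    With the normal DPL of 0, the CPU raises a general protection fault,
    which Windows reports as an access violation. With the DPL adjusted
    to 3, the interrupt 1 handler runs and a single-step exception results.
    """
    if int1_dpl not in (0, 3):
        raise ValueError("this sketch only models DPL 0 and DPL 3")
    if int1_dpl == 3:
        return EXCEPTION_SINGLE_STEP
    return EXCEPTION_ACCESS_VIOLATION


def handler_sees_softice(exception_code: int) -> bool:
    """Mirror of the handler at l1: compare ExceptionCode with single-step."""
    return exception_code == EXCEPTION_SINGLE_STEP
```

Calling handler_sees_softice(exception_for_ring3_int1(3)) models the SoftICE case, and passing 0 models an unmodified system. The trap-flag ambiguity mentioned above is deliberately left out of the model.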
h. OllyDbg-specific

OllyDbg is perhaps the most popular of user-mode
debuggers. It supports plug-ins. Some packers have been
written to detect OllyDbg, so some plug-ins have been
written to attempt to hide OllyDbg from those packers.
Correspondingly, other packers have been written to detect
these plug-ins. A description of those plug-ins, and the
vulnerabilities in them, is beyond the scope of this paper. A
companion paper (Anti-Unpacking Tricks - Future) will
cover the topic in detail.

i. Malformed files

OllyDbg is too strict regarding the Portable Executable
format - it will refuse to open a file whose data directories
do not end at exactly the end of the Optional Header. It
attempts to allocate the amount of memory specified by
the Export Directory Size, Base Relocation Directory Size,
Export Address Table Entries, and PE->SizeOfCode fields,
regardless of how large the values are. This can cause
the operating system swap file to grow enormously,
which has a significant performance impact on the
system.

ii. Initial esi value

The esi register has an initial value of 0xffffffff in
OllyDbg on Windows XP, which seems to be constant,
leading some people to use it as a detection method [viii]. In
fact, it's just a coincidence (and the initial value is 0 on
Windows 2000). The value is a remnant of an exception
handler structure that Windows XP created during a call
to the ntdll RtlAllocateHeap() function. The location of
that value corresponds to the esi member in the context
structure that is created by the kernel32 CreateProcess()
function. The kernel32 CreateProcess() function does not
initialise the esi member.

iii. OutputDebugString

OllyDbg passes user-defined data directly to the
msvcrt _vsprintf() function. If those data contain
formatting string tokens, particularly if multiple "%s"
tokens are used, then it is likely that one of them will point
to an invalid memory region and crash OllyDbg.

iv. FindWindow

OllyDbg can be found by calling the user32
FindWindow() function, and passing "OLLYDBG" as the
class name to find.
Example code looks like this:

    push 0
    push offset l1
    call FindWindowA
    test eax, eax
    jne being_debugged
    ...
l1: db "OLLYDBG", 0

v. Guard Pages

OllyDbg uses guard pages to handle memory
breakpoints. As noted above, if an application places
executable instructions in a guarded page, an attempt to
execute them should result in an exception, but in
OllyDbg they will be executed instead.

i. HideDebugger-specific

HideDebugger is a plug-in for OllyDbg. Early versions of
HideDebugger hooked the debuggee's kernel32
OpenProcess() function. The hook was done by placing a
jump to a new handler, at offset 6 in the kernel32
OpenProcess() function. The presence of the jump was a
good indicator that the HideDebugger plug-in was present.
Example code looks like this:

    push offset l1
    call GetModuleHandleA
    push offset l2
    push eax
    call GetProcAddress
    cmp b [eax+6], 0eah
    je being_debugged
    ...
l1: db "kernel32", 0
l2: db "OpenProcess", 0

j. ImmunityDebugger-specific

ImmunityDebugger is essentially OllyDbg with a Python
command-line interface. In fact, it is largely byte-for-byte
identical to the OllyDbg code. Correspondingly, it has the
same vulnerabilities as OllyDbg, with respect to both
detection and exploitation.

k. WinDbg-specific

i. FindWindow

WinDbg can be found by calling the user32
FindWindow() function, and passing
"WinDbgFrameClass" as the class name to find.
Example code looks like this:
    push 0
    push offset l1
    call FindWindowA
    test eax, eax
    jne being_debugged
    ...
l1: db "WinDbgFrameClass", 0

l. Miscellaneous tools

i. FindWindow

There are several less common tools that are of interest to some
packers, such as a window name of "Import REConstructor v1.6 FINAL
(C) 2001-2003 MackT/uCF", or a class name of "TESTDBG", "kk1",
"Eew57", or "Shadow". These names are checked by MSLRH.

III. ANTI-UNPACKING BY ANTI-EMULATING

Some methods to detect emulators and virtual machines have been
described elsewhere [ix]. Some additional methods are described here.
A companion paper (Anti-Unpacking Tricks - Future) will describe some
further methods.

a. Software Interrupts

i. Interrupt 3

When an EXCEPTION_BREAKPOINT (0x80000003) occurs, the eip register
has already been advanced to the next instruction, so Windows wants
to rewind the eip to point to the proper place. The problem is that
Windows assumes that the exception is caused by a single-byte "CC"
opcode (short form "INT 3" instruction). If the "CD 03" opcode (long
form "INT 3" instruction) is used to cause the exception, then the
eip will be pointing to the wrong location. The same behaviour can be
seen if any prefixes are placed before the short-form "INT 3"
instruction. An emulator that does not behave in the same way will be
revealed instantly. This technique is used by TryGames.

b. Time-locks

Time-locks are a very effective anti-emulation technique. Most
anti-malware emulators intentionally contain a limit to the amount of
time and/or the number of CPU instructions that can be emulated,
before the emulator will exit with no detection. This behavior is
almost a requirement, since a user will typically not be patient
enough to wait for an emulated application to exit on its own (if it
ever would), before being able to access it normally. This leads to a
vulnerability, whereby an attacker will produce a sample which
intentionally delays its main execution, usually via a dummy loop, in
an attempt to force an emulator to give up.
Example code looks like this:

    mov ecx, 400000h
l1: loop l1

In some cases, such dummy loops can be recognized and skipped, but in
that case, care must be taken to adjust the values of any internal
timers, and also the CPU registers that are involved. Otherwise, the
arbitrary skipping of the loop might be detected.
Example code looks like this:

    call GetTickCount
    xchg ebx, eax
    mov ecx, 400000h
l1: loop l1
    call GetTickCount
    sub eax, ebx
    cmp eax, 1000h
    jbe being_debugged

Further, the loop might not be a dummy one at all, in the sense that
the results might be used for a real purpose, even though they could
have been calculated without resorting to a loop.
Real-world example code looks like this:

    mov ebp, esp
    mov ebp, [ebp+1ch] ;0ffffffffh
    sub ebp, 5
l1: sub ebp, 0ah
    dec eax
    or ebp, ebp
    jne l1

In this case, the calculated value is also used as a key, so the loop
cannot be skipped arbitrarily. This technique is used by Tibs.

c. Invalid API parameters

Many APIs return error codes when they receive invalid parameters.
The problem for anti-malware emulators is that, for simplicity, such
error checking is not implemented. This leads to a vulnerability,
whereby an attacker will intentionally pass known invalid parameters
to the function, expecting an error code to be returned. In some
cases, this error code is used as a key for decryption. Any emulator
that fails to return the error code will not be able to decrypt the
data.
Example code looks like this:
    push 1
    push 1
    call Beep
    call GetLastError
    ;ERROR_INVALID_PARAMETER (0x57)
    push 5 ;sizeof(l2)
    pop ecx
    xchg edx, eax
    mov esi, offset l2
    mov edi, esi
l1: lodsb
    xor al, dl
    stosb
    loop l1
    ...
l2: db 3fh, 32h, 3bh, 3bh, 38h
    ;secret message

This technique is used by Tibs.

d. GetProcAddress

The kernel32 GetProcAddress() function is intended to return the
address of a function exported by the specified module. Since there
is a potentially unlimited number of possible functions which can be
retrieved from an infinite number of modules, it is impossible for
them all to be available in an emulated environment that is provided
by an anti-malware emulator. However, even some expected functions
might be missing from such an environment, because of their lack of
likely requirement, such as the kernel32 GetTapeParameters()
function. The problem is that some packers will exit early if not all
function addresses could be retrieved. To defeat that, some
anti-malware emulators will always return a value for the kernel32
GetProcAddress() function, regardless of the parameters that are
passed in. This leads to a vulnerability, whereby an attacker will
intentionally pass known invalid parameters to the function,
expecting no function address to be returned. Any emulator that
returns an address in such a situation will be revealed.
Example code looks like this:

    push offset l1
    push 12345678h ;illegal value
    call GetProcAddress
    test eax, eax
    jne being_debugged
    ...
l1: db "myfunction", 0

This technique is used by NsAnti. It is a specific case of the
general bad API problem from above.

e. GetProcAddress(internal)

Some anti-malware emulators export special APIs, which can be used to
communicate with the host environment, for example. This technique
has been published elsewhere [x].
Example code looks like this:

    push offset l1
    call GetModuleHandleA
    push offset l2
    push eax
    call GetProcAddress
    test eax, eax
    jne being_debugged
    ...
l1: db "kernel32", 0
l2: db "Aaaaaa", 0

f. "Modern" CPU instructions

Different CPU emulators have different capabilities. The problem for
anti-malware emulators is that, for simplicity, some (in some cases,
many) CPU instructions are not supported. This can include entire
classes, such as FPU, MMX, and SSE, as well as less common
instructions such as CMPXCHG8B. In addition, some instructions have
slightly unexpected behaviours which might also not be supported,
such as that the CMPXCHG instruction always writes to memory,
regardless of the result. Some of these behaviours have attributes
(particularly the CPU flags) that are marked as "undefined", but
nothing is undefined in hardware. The challenge is to determine the
algorithm to reproduce it.

Some packers use FPU and MMX instructions as do-nothing instructions,
but the side-effect is that the anti-malware emulator might give up
and fail to detect anything.

g. Undocumented instructions

Some packers make use of undocumented CPU instructions, for the same
reason as they do for the modern CPU instructions. That is, an
anti-malware emulator is less likely to support undocumented
instructions or undocumented encodings of documented instructions, so
it might give up and fail to detect anything. A list of these has
been published elsewhere [xi].

h. Selector verification

Selector verification is used to ensure that the descriptor table
layout matches the operating system platform, as returned by the
kernel32 GetVersion() function, for example. On Windows 9x-based
platforms, the value of the cs selector
can exceed 0xff, but on Windows NT-based platforms, the value is
always 0x1b for ring 3 code.
Example code looks like this:

    call GetVersion
    test eax, eax
    ;Windows 9x-based platform
    js l1
    mov eax, cs
    xor al, al
    test eax, eax
    jne being_emulated
l1: ...

This technique is used by MSLRH, among others.

i. Memory layout

There are certain in-memory structures that are always in a
predictable location. One of those is the
RTL_USER_PROCESS_PARAMETERS, which appears at memory location 0x20000
in normal circumstances. Within that structure, the "DllPath" field
exists at 0x20498, and the command-line at 0x205f8. This structure
can be moved if PE->ImageBase value is 0x20000 or less. The reason
for this is because the PE sections are mapped into memory first,
then the environment (at 0x10000 by default, and occupying 64kb of
virtual memory because of the behaviour of the memory allocation
function that is used), then the process parameters. By accessing
these fields directly, certain APIs, such as the kernel32
GetCommandLine() function, do not need to be called. This can make it
difficult to know from where certain information is gathered, and
anti-malware emulators might not include these structures at all.
This technique is used by TryGames.

j. File-format tricks

There are many known file-format tricks, yet occasionally a new one
will appear. This can be a significant problem for anti-malware
emulators, since if the emulator is responsible for parsing the
file-format, then incompatibilities can appear because of differences
in the emulated operating system. For example, Windows 9x-based
platforms use a hard-coded value for the size of the Optional Header,
and ignore the PE->SizeOfOptionalHeader field. They also allow gaps
in the virtual memory described by the section table. Windows
NT-based platforms honour the value in the PE->SizeOfOptionalHeader
field, and do not allow any gaps.
Typical tricks include:

i. Non-aligned SizeOfImage

The file-format documentation states that the value in the
PE->SizeOfImage field should be a multiple of the value in the
PE->SectionAlignment field, but this is not a requirement. Instead,
Windows will round up the value as required.

ii. Overlapping structures

By adjusting the values of certain fields, it is possible to produce
structures that overlap each other. The common targets are the
MZ->lfanew field, to produce a PE header that appears inside the MZ
header; the PE->SizeOfOptionalHeader field, to produce a section
table that appears inside the DataDirectory array; and the Import
Address Table and Import Lookup Table virtual addresses, to produce
an import table which has fields inside the PE header.

iii. Non-standard NumberOfRvaAndSizes

A common mistake is to assume that the value in the
PE->NumberOfRvaAndSizes field is set to the value that exactly fills
the Optional Header, and that the section table follows immediately.
The proper method to calculate the location of the section table is
to use the PE->SizeOfOptionalHeader field. Both SoftICE and OllyDbg
contain this mistake. A companion paper (Anti-Unpacking Tricks -
Future) will cover the implications in detail.

iv. Non-aligned SizeOfRawData

The SizeOfRawData field in the section table is another field that is
subject to automatic rounding up by Windows. By relying on this
behavior, it is possible to produce a section whose entrypoint
appears to reside in purely virtual memory, but because of rounding,
will have physical data to execute.

v. Non-aligned PointerToRawData

The PointerToRawData field in the section table is a field that is
subject to automatic rounding down by Windows. By relying on this
behaviour, it is possible to produce a section whose entrypoint
appears to point to data other than what will actually be executed.

vi. No section table

An interesting thing happens if the value in the PE->SectionAlignment
field is reduced to less than 4kb. Normally, the section that
contains the PE header is neither writable nor executable, since
there is no section table entry that describes it. However, if the
value in the PE->SectionAlignment field is less than 4kb, then the PE
header is marked internally as both writable and executable. Further,
the contents of the section table become optional. That is, the
entire section table can be zeroed out, and the file will be mapped
as though it were one section whose size is equal to the value in the
PE->SizeOfImage field.

IV. ANTI-UNPACKING BY ANTI-INTERCEPTING

a. Write->Exec

Some unpacking tools work by intercepting the execution of newly
written pages, to guess when the unpacker has completed its main
function and transferred control to the host. By writing then
executing a dummy instruction, an unpacker can cause an intercepter
to exit early.
Example code looks like this:

    mov [offset dest], 0c3h
    call dest

This technique is used by ASPack, among others. However, it is
probably for an entirely different reason, which is to force a CPU
queue flush for multiprocessor environments.

b. Write^Exec

Some unpacking tools work by changing the previously
writable-executable page attributes to either writable or executable,
but not both. These changes can be detected indirectly. The easiest
method to achieve this is to use a function that uses the kernel to
write to a specified user-mode address. The function will return an
error if it fails to write to the address. In this case, the address
to specify is one in which the page attributes are likely to have
been altered. A good candidate function is the kernel32
VirtualQuery() function.
Example code looks like this:

    ;sizeof(MEMORY_BASIC_INFORMATION)
    push 1ch
    mov ebx, offset l1
    push ebx
    push ebx
    call VirtualQuery
    test eax, eax
    je being_debugged
    ...
    ;sizeof(MEMORY_BASIC_INFORMATION)
l1: db 1ch dup (?)

Not only does the kernel32 VirtualQuery() function write to a
specified user-mode address, but it also returns the original value
of the page attributes. Any change in the attributes is an indication
that an intercepter is running.
Example code looks like this:

    ;sizeof(MEMORY_BASIC_INFORMATION)
    push 1ch
    mov ebx, offset l1
    push ebx
    push ebx
    call VirtualQuery
    ;PAGE_EXECUTE_READWRITE
    cmp b [ebx+14h], 40h
    jne being_debugged
    ...
    ;sizeof(MEMORY_BASIC_INFORMATION)
l1: db 1ch dup (?)

The kernel32 VirtualProtect() function is another way to query the
page attributes, since the previous attributes are returned by the
function. Any change in the attributes is an indication that an
intercepter is running.
Example code looks like this:

l1: push eax
    push esp
    push 40h ;PAGE_EXECUTE_READWRITE
    push 1
    push offset l1
    call VirtualProtect
    pop eax
    ;PAGE_EXECUTE_READWRITE
    cmp al, 40h
    jne being_debugged

V. MISCELLANEOUS

a. Fake signatures

Some packers emit the startup code for other popular packers and
tools, in an attempt to fool unpackers into misidentifying the
wrapper. Among the most popular of these fake signatures is the
startup code for Microsoft Visual C, which was written to fool PEiD.
This technique is used by RLPack Professional, among others.

CONCLUSION

There are many different classes of anti-unpacking techniques, and
this paper has attempted to describe a subset of the known ones. A
companion paper (Anti-Unpacking Tricks - Future) will describe some
of the possible future ones, so that we can, where possible,
construct defenses against them.
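
As a supplementary illustration of the invalid-parameter trick from
section III.c, the following is a minimal Python sketch of why an
emulator that skips parameter checking cannot recover the data. It
only models the arithmetic of the listing (the error code becomes the
XOR key); the function names faithful_environment and
simplified_emulator are hypothetical and are not part of any real
emulator or of the packer itself.

```python
# Hypothetical model of the invalid-parameter key trick; all names here
# are illustrative assumptions, not taken from any real tool.

ERROR_INVALID_PARAMETER = 0x57  # error code named in the listing's comment

# Encrypted bytes copied from the example listing (label l2).
ENCRYPTED = bytes([0x3F, 0x32, 0x3B, 0x3B, 0x38])


def decrypt(data: bytes, last_error: int) -> bytes:
    """XOR each byte with the low 8 bits of the error code (the 'xor al, dl' step)."""
    key = last_error & 0xFF
    return bytes(b ^ key for b in data)


def faithful_environment() -> int:
    """Models an environment that reports the error for the invalid call."""
    return ERROR_INVALID_PARAMETER


def simplified_emulator() -> int:
    """Models an emulator that skips parameter checking and reports success (0)."""
    return 0


plain_real = decrypt(ENCRYPTED, faithful_environment())
plain_emulated = decrypt(ENCRYPTED, simplified_emulator())
print(plain_real)
print(plain_emulated)
```

Under these assumptions, the faithful path recovers the five-byte
message, while the simplified path leaves the bytes unchanged, because
XOR with a zero key is the identity.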

Final note:

The text of this paper was completed before I joined Microsoft. It
was produced without access to any Microsoft source code or
personnel.

[i] http://crackmes.de/users/thehyper/hyperunpackme2/
[ii] http://www.honeynet.org/scans/scan33/index.html
[iii] http://www.securityfocus.com/infocus/1893
[iv] http://www.piotrbania.com/all/articles/antid.txt
[v] http://piotrbania.com/all/articles/bypassing_the_breakpoints.txt
[vi] http://www.defendion.com/EliCZ/infos/TlsInAsm.zip
[vii] http://pferrie.tripod.com/papers/chiton.pdf
[viii] http://vx.eof-project.net/viewtopic.php?id=142
[ix] http://pferrie.tripod.com/papers/attacks2.pdf
[x] http://pferrie.tripod.com/papers/attacks2.pdf
[xi] http://www.symantec.com/enterprise/security_response/weblog/2007/02/x86_fetchdecode_anomalies.html
