

First, I'm going to present the main building block of a debugger's implementation on Linux: the ptrace system call. All the code in this article is developed on a 32-bit Ubuntu machine. Note that the code is very much platform-specific, although porting it to other platforms shouldn't be too difficult.

To understand where we're going, try to imagine what it takes for a debugger to do its work. A debugger can start some process and debug it, or attach itself to an existing process. It can single-step through the code, set breakpoints and run to them, examine variable values and stack traces. Many debuggers have advanced features such as executing expressions and calling functions in the debugged process's address space, and even changing the process's code on the fly and watching the effects.

Although modern debuggers are complex beasts [1], it's surprising how simple the foundation they are built on is. Debuggers start with only a few basic services provided by the operating system and the compiler/linker; all the rest is just a simple matter of programming.
Linux debugging: ptrace
The Swiss army knife of Linux debuggers is the ptrace system call [2]. It's a versatile and rather complex tool that allows one process to control the execution of another and to peek and poke at its innards [3]. ptrace can take a mid-sized book to explain fully, which is why I'm just going to focus on some of its practical uses.

Let's dive right in.
Stepping through the code of a process
I'm now going to develop an example of running a process in "traced" mode in which we're going to single-step through its code: the machine code (assembly instructions) that's executed by the CPU. I'll show the example code in parts, explaining each, and at the end of the article you will find a link to download a complete C file that you can compile, execute and play with.

The high-level plan is to write code that splits into a child process that will execute a user-supplied command, and a parent process that traces the child. First, the main function from Listing 1.

Pretty simple: we start a new child process with fork [4]. The if branch of the subsequent condition runs the child process (called "target" here), and the else if branch runs the parent process (called "debugger" here).

And the target process is in Listing 2.
The most interesting line there is the ptrace call. ptrace is declared thus (in sys/ptrace.h):

long ptrace(enum __ptrace_request request, pid_t pid,
            void *addr, void *data);

The first argument is a request, which may be one of many predefined PTRACE_* constants. The second argument specifies a process ID for some requests. The third and fourth arguments are address and data pointers, for memory manipulation. The ptrace call in Listing 2 makes the PTRACE_TRACEME request, which means that this child process asks the OS kernel to let its parent trace it. The request description from the man page is quite clear:

Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. A process probably shouldn't make this request if its parent isn't expecting to trace it. (pid, addr, and data are ignored.)

The part that interests us in this example is the sentence about exec(). Note that the very next thing run_target does after ptrace is invoke the program given to it as an argument with execl. This, as that sentence explains, causes

How Debuggers Work
Most developers find debuggers an indispensable part of their toolbox. This article aims to uncover how debuggers themselves work. After reading this article, you'll be able to implement a simple debugger for Linux.
You'll learn:
How debuggers work
How to write a simple debugger for Linux
Where to find more information on advanced topics
You should know:
Basic familiarity with the Linux operating system
Good understanding of C programming
Basic familiarity with x86 assembly language
Listing 1. main function for tracing another
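int main(int argc, char** argv)
{
    pid_t child_pid;

    if (argc < 2) {
        fprintf(stderr, "Expected a program name as argument\n");
        return -1;
    }

    child_pid = fork();
    if (child_pid == 0)
        run_target(argv[1]);      /* child: becomes the traced target */
    else if (child_pid > 0)
        run_debugger(child_pid);  /* parent: traces the child */
    else {
        perror("fork");
        return -1;
    }

    return 0;
}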
Listing 2. Running the target process
void run_target(const char* programname)
{
    procmsg("target started. will run %s\n", programname);

    /* Allow tracing of this process */
    if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) {
        perror("ptrace");
        return;
    }

    /* Replace this process's image with the given program */
    execl(programname, programname, (char*)NULL);
}
Listing 3. Running the debugger process
void run_debugger(pid_t child_pid)
{
    int wait_status;
    unsigned icounter = 0;
    procmsg("debugger started\n");

    /* Wait for child to stop on its first instruction */
    wait(&wait_status);

    while (WIFSTOPPED(wait_status)) {
        icounter++;
        /* Make the child execute another instruction */
        if (ptrace(PTRACE_SINGLESTEP, child_pid, 0, 0) < 0) {
            perror("ptrace");
            return;
        }

        /* Wait for child to stop on its next instruction */
        wait(&wait_status);
    }

    procmsg("the child executed %u instructions\n", icounter);
}
Listing 4. Sample program for tracing
#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}
the OS kernel to stop the process just before it begins executing the program in execl and to send a signal to the parent.
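
The PTRACE_TRACEME and exec interplay described here can be tried in isolation. What follows is a minimal sketch, not the article's downloadable tracer: the helper name first_stop_signal is hypothetical, and killing the stopped child right away is an assumption made only to keep the example short.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that asks to be traced and then
 * execs `path`. Returns the signal that stopped the child at exec,
 * or -1 if something went wrong. */
int first_stop_signal(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: request tracing, then replace the process image. */
        if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0)
            _exit(126);
        execl(path, path, (char *)NULL);
        _exit(127); /* only reached if execl failed */
    }

    /* Parent: the first wait returns when the child stops at exec. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* the child exited instead of stopping */

    int sig = WSTOPSIG(status);

    /* The sketch ends here: kill the stopped child and reap it. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return sig;
}
```

On a Linux system where ptrace is permitted, first_stop_signal("/bin/sh") should return SIGTRAP: the child is stopped before the new program runs a single instruction.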
Thus, the time is ripe to see what the parent does. Let's look at Listing 3.

Recall from above that once the child starts executing the exec call, it will stop and be sent the SIGTRAP signal. The parent here waits for this to happen with the first wait call. wait will return once something interesting happens, and the parent checks that it was because the child
Listing 5. Hello, world! in assembly process
section .text
    ; The _start symbol must be declared for the linker (ld)
    global _start

_start:

    ; Prepare arguments for the sys_write system call:
    ;   - eax: system call number (sys_write)
    ;   - ebx: file descriptor (stdout)
    ;   - ecx: pointer to string
    ;   - edx: string length
    mov edx, len
    mov ecx, msg
    mov ebx, 1
    mov eax, 4

    ; Execute the sys_write system call
    int 0x80

    ; Execute sys_exit
    mov eax, 1
    int 0x80

section .data
msg db 'Hello, world!', 0xa
len equ $ - msg
Listing 6. Examining registers with ptrace
void run_debugger(pid_t child_pid)
{
    int wait_status;
    unsigned icounter = 0;
    procmsg("debugger started\n");

    /* Wait for child to stop on its first instruction */
    wait(&wait_status);

    while (WIFSTOPPED(wait_status)) {
        icounter++;
        /* Read the registers, then the word the instruction pointer points at */
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
        unsigned instr = ptrace(PTRACE_PEEKTEXT, child_pid, regs.eip, 0);

        procmsg("icounter = %u. EIP = 0x%08x. instr = 0x%08x\n",
                icounter, regs.eip, instr);

        /* Make the child execute another instruction */
        if (ptrace(PTRACE_SINGLESTEP, child_pid, 0, 0) < 0) {
            perror("ptrace");
            return;
        }

        /* Wait for child to stop on its next instruction */
        wait(&wait_status);
    }

    procmsg("the child executed %u instructions\n", icounter);
}
Listing 7. Running the tracer, results
$ simple_tracer traced_helloworld
[5700] debugger started
[5701] target started. will run traced_helloworld
[5700] icounter = 1. EIP = 0x08048080. instr =
[5700] icounter = 2. EIP = 0x08048085. instr =
[5700] icounter = 3. EIP = 0x0804808a. instr =
[5700] icounter = 4. EIP = 0x0804808f. instr =
[5700] icounter = 5. EIP = 0x08048094. instr =
Hello, world!
[5700] icounter = 6. EIP = 0x08048096. instr =
[5700] icounter = 7. EIP = 0x0804809b. instr =
[5700] the child executed 7 instructions
Listing 8. objdump output on our executable
$ objdump -d traced_helloworld
traced_helloworld: file format elf32-i386
Disassembly of section .text:
08048080 <.text>:
8048080: ba 0e 00 00 00 mov
8048085: b9 a0 90 04 08 mov
804808a: bb 01 00 00 00 mov
804808f: b8 04 00 00 00 mov
8048094: cd 80 int $0x80
8048096: b8 01 00 00 00 mov
804809b: cd 80 int $0x80
Listing 9. Test program for setting breakpoints
section .text
    ; The _start symbol must be declared for the linker (ld)
    global _start

_start:

    ; Prepare arguments for the sys_write system call:
    ;   - eax: system call number (sys_write)
    ;   - ebx: file descriptor (stdout)
    ;   - ecx: pointer to string
    ;   - edx: string length
    mov edx, len1
    mov ecx, msg1
    mov ebx, 1
    mov eax, 4

    ; Execute the sys_write system call
    int 0x80

    ; Now print the other message
    mov edx, len2
    mov ecx, msg2
    mov ebx, 1
    mov eax, 4
    int 0x80

    ; Execute sys_exit
    mov eax, 1
    int 0x80

section .data
msg1 db 'Hello,', 0xa
len1 equ $ - msg1
msg2 db 'world!', 0xa
len2 equ $ - msg2
Listing 10. objdump output on the program in
Listing 9
traced_printer2: file format elf32-i386
Idx Name   Size      VMA       LMA       File off  Algn
  0 .text  00000033  08048080  08048080  00000080  2**4
  1 .data  0000000e  080490b4  080490b4  000000b4  2**2
Disassembly of section .text:
08048080 <.text>:
8048080: ba 07 00 00 00 mov
8048085: b9 b4 90 04 08 mov
804808a: bb 01 00 00 00 mov
804808f: b8 04 00 00 00 mov
8048094: cd 80 int $0x80
8048096: ba 07 00 00 00 mov
804809b: b9 bb 90 04 08 mov
80480a0: bb 01 00 00 00 mov
80480a5: b8 04 00 00 00 mov
80480aa: cd 80 int $0x80
80480ac: b8 01 00 00 00 mov
80480b1: cd 80 int $0x80
was stopped (WIFSTOPPED returns true if the child process was stopped by delivery of a signal).

What the parent does next is the most interesting part of this article. It invokes ptrace with the PTRACE_SINGLESTEP request, giving it the child process ID. What this does is tell the OS: please restart the child process, but stop it after it executes the next instruction. Again, the parent waits for the child to stop and the loop continues. The loop will terminate when the status that came out of the wait call wasn't about the child stopping. During a normal run of the tracer, this will be the status that tells the parent that the child process exited (WIFEXITED would return true on it).
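
To make the wait status handling concrete without involving ptrace at all, here is a small sketch. The names classify_status and observe_child are hypothetical, and the child stops itself with raise(SIGSTOP) purely so that the parent has both a "stopped" and an "exited" status to look at.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_event { CHILD_STOPPED, CHILD_EXITED, CHILD_KILLED, CHILD_OTHER };

/* Decode a status filled in by wait()/waitpid(). `detail` receives the
 * stop signal, the exit code or the terminating signal, respectively. */
enum child_event classify_status(int status, int *detail)
{
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return CHILD_STOPPED;
    }
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_KILLED;
    }
    *detail = 0;
    return CHILD_OTHER;
}

/* Fork a child that stops itself and, once resumed, exits with
 * `exit_code`. Records the two events the parent observes.
 * Returns 0 on success, -1 on error. */
int observe_child(int exit_code, enum child_event ev[2], int detail[2])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        raise(SIGSTOP);
        _exit(exit_code);
    }

    int status;
    /* WUNTRACED makes waitpid report stops of a child that is not traced. */
    if (waitpid(pid, &status, WUNTRACED) < 0)
        return -1;
    ev[0] = classify_status(status, &detail[0]);

    /* Let the child continue, then wait for it to exit. */
    kill(pid, SIGCONT);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    ev[1] = classify_status(status, &detail[1]);
    return 0;
}
```

The tracer's loop condition corresponds to the CHILD_STOPPED case; any other event ends the loop.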
Note that icounter counts the number of instructions executed by the child process. So our simple example actually does something useful: given a program name on the command line, it executes the program and reports the number of CPU instructions it took to run from start to finish. Let's see it in action.
A test run
I compiled the simple program from Listing 4 and ran it under the tracer.

To my surprise, the tracer took quite long to run and reported that there were more than 100,000 instructions executed. For a simple printf call? What gives? The answer is very interesting [5]. By default, gcc on Linux links programs to the C runtime libraries dynamically. What this means is that one of the first things that runs when any program is executed is the dynamic library loader that looks for the required shared libraries. This is quite a lot of code, and remember that our basic tracer here looks at each and every instruction, not just of the main function, but of the whole process.

So, when I linked the test program with the -static flag (and verified that the executable gained some 500KB in weight, as is logical for a static link of the C runtime), the tracing reported only 7,000 instructions or so. This is still a lot, but makes perfect sense if you recall that libc initialization still has to run before main, and cleanup has to run after main. Besides, printf is a complex function.

Still not satisfied, I wanted to see something testable, i.e. a whole run in which I could account for every instruction executed. This, of course, can be done with assembly code. So I took this version of "Hello, world!" and assembled it, as in Listing 5.

Sure enough. Now the tracer reported that 7 instructions were executed, which is something I can easily verify.
Listing 11. readelf output on the program in Listing 9
$ readelf -h traced_printer2
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048080
  Start of program headers:          52 (bytes into file)
  Start of section headers:          220 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         2
  Size of section headers:           40 (bytes)
  Number of section headers:         4
  Section header string table index: 3
Listing 12. Examining data at the instruction
/* Obtain and show child's instruction pointer */
ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
procmsg("Child started. EIP = 0x%08x\n", regs.eip);

/* Look at the word at the address we're interested in */
unsigned addr = 0x8048096;
unsigned data = ptrace(PTRACE_PEEKTEXT, child_pid, (void*)addr, 0);
procmsg("Original data at 0x%08x: 0x%08x\n", addr, data);
Deep into the instruction stream
The assembly-written program allows me to introduce you to another powerful use of ptrace: closely examining the state of the traced process. In Listing 6, there is another version of the run_debugger function.

The only difference is in the first few lines of the while loop. There are two new ptrace calls. The first one reads the value of the process's registers into a structure. user_regs_struct is defined in sys/user.h. Now here's the fun part: if you look at this header file, a comment close to the top says:

The whole purpose of this file is for GDB and GDB only. Don't read too much into it. Don't use it for anything other than GDB unless you know what you are doing.

Now, I don't know about you, but it makes me feel we're on the right track :-). Anyway, back to the example. Once we have all the registers in regs, we can peek at the current instruction of the process by calling ptrace with PTRACE_PEEKTEXT, passing it regs.eip (the extended instruction pointer on x86) as the address. What we get back is the instruction [6]. Let's see this new tracer run on our assembly-coded snippet (Listing 7).

OK, so now in addition to icounter we also see the instruction pointer and the instruction it points to at each step. How to verify this is correct? By using objdump -d on the executable (Listing 8).

The correspondence between this and our tracing output is easily observed.
Attaching to a running process
As you know, debuggers can also attach to an already-running process. By now you won't be surprised to find out that this is also done with ptrace, which can get the PTRACE_ATTACH request. I won't show a code sample here since it should be very easy to implement given the code we've already gone through. For educational purposes, the approach taken here is more convenient (since we can stop the child process right at its start).
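
For readers who would still like to see the shape of it, here is a rough sketch of attaching and detaching. It is not part of the article's tracer: attach_and_detach and attach_demo are hypothetical names, and the demo attaches to a child it forked itself only so that the example is self-contained (attaching to an unrelated process may be refused by the system's ptrace restrictions).

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to a running process, wait until it stops, then let it go.
 * Returns the signal that stopped it, or -1 on error. */
int attach_and_detach(pid_t pid)
{
    if (ptrace(PTRACE_ATTACH, pid, 0, 0) < 0)
        return -1;

    /* Attaching makes the kernel stop the target; wait for that stop. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    int sig = WSTOPSIG(status);

    /* A real debugger would inspect registers and memory here. */

    if (ptrace(PTRACE_DETACH, pid, 0, 0) < 0)
        return -1;
    return sig;
}

/* Demo under stated assumptions: spawn an idle child, attach to it,
 * detach again and clean up. */
int attach_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        for (;;)
            pause(); /* idle until killed */
    }

    int sig = attach_and_detach(pid);

    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return sig;
}
```

After a successful attach the target is stopped with SIGSTOP, which is what attach_demo is expected to report.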
The code
The complete C source code of the simple tracer presented in this article (the more advanced, instruction-printing version) is available here: http://eli.thegreenplace.
It compiles cleanly with -Wall -pedantic --std=c99 on version 4.4 of gcc.
Next steps
Admittedly, we didn't cover much so far; we're still far from having a real debugger in our hands. However, I hope it has
Listing 13. Writing the trap instruction into the program's code
/* Write the trap instruction 'int 3' into the address */
unsigned data_with_trap = (data & 0xFFFFFF00) | 0xCC;
ptrace(PTRACE_POKETEXT, child_pid, (void*)addr, (void*)data_with_trap);

/* See what's there again... */
unsigned readback_data = ptrace(PTRACE_PEEKTEXT, child_pid, (void*)addr, 0);
procmsg("After trap, data at 0x%08x: 0x%08x\n", addr, readback_data);
Listing 14. Running the child until a breakpoint is
/* Let the child run to the breakpoint and wait for it to
** reach it
*/
ptrace(PTRACE_CONT, child_pid, 0, 0);

wait(&wait_status);
if (WIFSTOPPED(wait_status)) {
    procmsg("Child got a signal: %s\n", strsignal(WSTOPSIG(wait_status)));
}
else {
    perror("wait");
    return;
}

/* See where the child is now */
ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
procmsg("Child stopped at EIP = 0x%08x\n", regs.eip);
Listing 15. Restoring previous state and running the child
/* Remove the breakpoint by restoring the previous data
** at the target address, and unwind the EIP back by 1 to
** let the CPU execute the original instruction that was
** there.
*/
ptrace(PTRACE_POKETEXT, child_pid, (void*)addr, (void*)data);
regs.eip -= 1;
ptrace(PTRACE_SETREGS, child_pid, 0, &regs);

/* The child can continue running now */
ptrace(PTRACE_CONT, child_pid, 0, 0);
already made the process of debugging at least a little less mysterious. ptrace is truly a versatile system call with many abilities, of which we've sampled only a few so far.

Single-stepping through the code is useful, but only to a certain degree. Take the C "Hello, world!" sample I demonstrated above. To get to main it would probably take a couple of thousand instructions of C runtime initialization code to step through. This isn't very convenient. What we'd ideally want to have is the ability to place a breakpoint at the entry to main and step from there.

Let's see how breakpoints are implemented.
Software interrupts
Breakpoints are one of the two main pillars of debugging, the other being able to inspect values in the debugged process's memory. We've already seen a preview of the other pillar before, but breakpoints still remain mysterious. Soon, they won't be.

To implement breakpoints on the x86 architecture, software interrupts (also known as "traps") are used. Before we get deep into the details, I want to explain the concept of interrupts and traps in general.

A CPU has a single stream of execution, working through instructions one by one [7]. To handle asynchronous events like IO and hardware timers, CPUs use interrupts. A hardware interrupt is usually a dedicated electrical signal to which special "response circuitry" is attached. This circuitry notices an activation of the interrupt and makes the CPU stop its current execution, save its state, and jump to a predefined address where a handler routine for the interrupt is located. When the handler finishes its work, the CPU resumes execution from where it stopped.

Software interrupts are similar in principle but a bit different in practice. CPUs support special instructions that allow the software to simulate an interrupt. When such an instruction is executed, the CPU treats it like an interrupt: it stops its normal flow of execution, saves its state and jumps to a handler routine. Such "traps" allow many of the wonders of modern OSes (task scheduling, virtual memory, memory protection, debugging) to be implemented efficiently.

Some programming errors (such as division by 0) are also treated by the CPU as traps, and are frequently referred to as "exceptions". Here the line between hardware and software blurs, since it's hard to say whether such exceptions are really hardware interrupts or software interrupts. But I've digressed too far away from the main topic, so it's time to get back to breakpoints.
int 3 in theory
Having written the previous section, I can now simply say that breakpoints are implemented on the CPU by a special trap called int 3. int is x86 jargon for "trap instruction": a call to a predefined interrupt handler. x86 supports the int instruction with an 8-bit operand specifying the number of the interrupt that occurred, so in theory 256 traps are supported. The first 32 are reserved by the CPU for itself, and number 3 is the one we're interested in here; it's called "trap to debugger".

Without further ado, I'll quote from the bible itself [8]:

The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the debug exception handler.
Listing 16. Sample assembly code
.. some code ..
jz foo
dec eax
call bar
.. some code ..
Listing 17. Sample C program serving as a trace
#include <stdio.h>
void do_stuff()
{
    printf("Hello, ");
}

int main()
{
    for (int i = 0; i < 4; ++i)
        do_stuff();
    return 0;
}
Listing 18. Interesting parts of the output of
objdump for the program in Listing 17
080483e4 <do_stuff>:
80483e4: 55 push %ebp
80483e5: 89 e5 mov
80483e7: 83 ec 18 sub
80483ea: c7 04 24 f0 84 04 08 movl
80483f1: e8 22 ff ff ff call 8048318 <puts@plt>
80483f6: c9 leave
80483f7: c3 ret
(This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code).

The part in parens is important, but it's still too early to explain it. We'll come back to it later in this article.
int 3 in practice
Yes, knowing the theory behind things is great, OK, but what does this really mean? How do we use int 3 to implement breakpoints? Or to paraphrase common programming Q&A jargon: Plz show me the codes!

In practice, this is really very simple. Once your process executes the int 3 instruction, the OS stops it [9]. On Linux (which is what we're concerned with in this article) it then sends the process a signal: SIGTRAP.

That's all there is to it, honest! Now recall that a tracing (debugger) process gets notified of all the signals its child (or the process it attaches to for debugging) gets, and you can start getting a feel of where we're going.
That's it, no more computer architecture 101 jabber. It's time for examples and code.
Setting breakpoints manually
I'm now going to show code that sets a breakpoint in a program. The target program I'm going to use for this demonstration is presented in Listing 9.
Listing 19. Debugging a C program with a
void run_debugger(pid_t child_pid)
{
    procmsg("debugger started\n");

    /* Wait for child to stop on its first instruction */
    wait(0);
    procmsg("child now at EIP = 0x%08x\n", get_child_eip(child_pid));

    /* Create breakpoint and run to it */
    debug_breakpoint* bp = create_breakpoint(child_pid, (void*)0x080483e4);
    procmsg("breakpoint created\n");
    ptrace(PTRACE_CONT, child_pid, 0, 0);
    wait(0);

    /* Loop as long as the child didn't exit */
    while (1) {
        /* The child is stopped at a breakpoint here. Resume its
        ** execution until it either exits or hits the
        ** breakpoint again.
        */
        procmsg("child stopped at breakpoint. EIP = 0x%08X\n", get_child_eip(child_pid));
        procmsg("resuming\n");
        int rc = resume_from_breakpoint(child_pid, bp);

        if (rc == 0) {
            procmsg("child exited\n");
            break;
        }
        else if (rc == 1) {
            continue;
        }
        else {
            procmsg("unexpected: %d\n", rc);
            break;
        }
    }
}
Listing 20. Debugging output
$ bp_use_lib traced_c_loop
[13363] debugger started
[13364] target started. will run traced_c_loop
[13363] child now at EIP = 0x00a37850
[13363] breakpoint created
[13363] child stopped at breakpoint. EIP =
[13363] resuming
[13363] child stopped at breakpoint. EIP =
[13363] resuming
[13363] child stopped at breakpoint. EIP =
[13363] resuming
[13363] child stopped at breakpoint. EIP =
[13363] resuming
[13363] child exited
I'm using assembly language for now, in order to keep us clear of compilation issues and symbols that come up when we get into C code. What the program in Listing 9 does is simply print "Hello," on one line and then "world!" on the next line. It's very similar to the program demonstrated earlier (Listing 5).

I want to set a breakpoint after the first printout, but before the second one. Let's say right after the first int 0x80 [10], on the mov edx, len2 instruction. First, we need to know what address this instruction maps to. Running objdump -d we get what's in Listing 10.

So, the address we're going to set the breakpoint on is 0x08048096. Wait, this is not how real debuggers work, right? Real debuggers set breakpoints on lines of code and on functions, not on some bare memory addresses? Exactly right. But we're still far from there: to set breakpoints like real debuggers we still have to cover symbols and debugging information first, and it will take a while to reach these topics. For now, we'll have to make do with bare memory addresses.

At this point I really want to digress again, so you have two choices. If it's really interesting for you to know why the address is 0x08048096 and what it means, read the next section. If not, and you just want to get on with the breakpoints, you can safely skip it.
Listing 21. Sample C program the debugging information of which we examine
#include <stdio.h>

void do_stuff(int my_arg)
{
    int my_local = my_arg + 2;
    int i;

    for (i = 0; i < my_local; ++i)
        printf("i = %d\n", i);
}

int main()
{
    return 0;
}
Listing 22. Debug sections in the ELF executable made from the program in Listing 21
26 .debug_aranges 00000020 00000000 00000000
27 .debug_pubnames 00000028 00000000 00000000
28 .debug_info 000000cc 00000000 00000000
29 .debug_abbrev 0000008a 00000000 00000000
30 .debug_line 0000006b 00000000 00000000
31 .debug_frame 00000044 00000000 00000000
32 .debug_str 000000ae 00000000 00000000
33 .debug_loc 00000058 00000000 00000000
Digression: process addresses and entry point
Frankly, 0x08048096 itself doesn't mean much; it's just a few bytes away from the beginning of the text section of the executable. If you look carefully at the dump in Listing 10, you'll see that the text section starts at 0x08048080. This tells the OS to map the text section starting at this address in the virtual address space given to the process. On Linux these addresses can be absolute (i.e. the executable isn't being relocated when it's loaded into memory), because with the virtual memory system each process gets its own chunk of memory and sees the whole 32-bit address space as its own (called "linear" address).

If we examine the ELF [11] header with readelf, we get what's in Listing 11.

Note the "entry point address" section of the header, which also points to 0x08048080. So if we interpret the directions encoded in the ELF file for the OS, it says:
1. Map the text section (with given contents) to address 0x08048080
2. Start executing at the entry point address

But still, why 0x08048080? For historic reasons, it turns out. Some googling led me to a few sources that claim that the first 128MB of each process's address space were reserved for the stack. 128MB happens to be 0x08000000, which is where other sections of the executable may start. 0x08048080, in particular, is the default entry point used by the Linux ld linker. This entry point can be modified by passing the -Ttext argument to ld.

To conclude, there's nothing really special in this address and we can freely change it. As long as the ELF executable is properly structured and the entry point address in the header matches the real beginning of the program's code (text section), we're OK.
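
The entry point that readelf reports can also be read programmatically. This is a minimal sketch built on the structures from the system's elf.h header; the function name read_elf_entry is hypothetical, and the code assumes the file uses the same byte order as the machine running it.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read the entry point address from the ELF header of `path`.
 * Returns 0 and stores the address in *entry, or -1 on error.
 * Assumption: the file's byte order matches the host's. */
int read_elf_entry(const char *path, uint64_t *entry)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Every ELF file starts with the magic bytes 7f 45 4c 46. */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;

    if (buf[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr hdr;
        memcpy(&hdr, buf, sizeof hdr);
        *entry = hdr.e_entry;
        return 0;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n >= sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr hdr;
        memcpy(&hdr, buf, sizeof hdr);
        *entry = hdr.e_entry;
        return 0;
    }
    return -1;
}
```

Run against the traced_printer2 executable from Listing 11, this would be expected to yield 0x8048080, the same value readelf prints.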
Listing 23. Interesting parts of objdump --dwarf
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
<72> DW_AT_external : 1
<73> DW_AT_name : (...): do_stuff
<77> DW_AT_decl_file : 1
<78> DW_AT_decl_line : 4
<79> DW_AT_prototyped : 1
<7a> DW_AT_low_pc : 0x8048604
<7e> DW_AT_high_pc : 0x804863e
<82> DW_AT_frame_base : 0x0 (location
<86> DW_AT_sibling : <0xb3>
<1><b3>: Abbrev Number: 9 (DW_TAG_subprogram)
<b4> DW_AT_external : 1
<b5> DW_AT_name : (...): main
<b9> DW_AT_decl_file : 1
<ba> DW_AT_decl_line : 14
<bb> DW_AT_type : <0x4b>
<bf> DW_AT_low_pc : 0x804863e
<c3> DW_AT_high_pc : 0x804865a
<c7> DW_AT_frame_base : 0x2c (location
Listing 24. Interesting parts of objdump -d
08048604 <do_stuff>:
8048604: 55 push ebp
8048605: 89 e5 mov ebp,esp
8048607: 83 ec 28 sub esp,0x28
804860a: 8b 45 08 mov eax,DWORD PTR
804860d: 83 c0 02 add eax,0x2
8048610: 89 45 f4 mov DWORD PTR [ebp-
8048613: c7 45 (...) mov DWORD PTR [ebp-
804861a: eb 18 jmp 8048634 <do_
804861c: b8 20 (...) mov eax,0x8048720
8048621: 8b 55 f0 mov edx,DWORD PTR
8048624: 89 54 24 04 mov DWORD PTR
8048628: 89 04 24 mov DWORD PTR
804862b: e8 04 (...) call 8048534
8048630: 83 45 f0 01 add DWORD PTR [ebp-
8048634: 8b 45 f0 mov eax,DWORD PTR
8048637: 3b 45 f4 cmp eax,DWORD PTR
804863a: 7c e0 jl 804861c <do_
804863c: c9 leave
804863d: c3 ret
Setting breakpoints in the debugger with int 3
To set a breakpoint at some target address in the traced process, the debugger does the following:
1. Remember the data stored at the target address
2. Replace the first byte at the target address with the int 3 instruction

Then, when the debugger asks the OS to run the process (with PTRACE_CONT), the process will run and eventually hit upon the int 3, where it will stop and the OS will send it a signal. This is where the debugger comes in again, receiving a signal that its child (or traced process) was stopped. It can then:
1. Replace the int 3 instruction at the target address with the original instruction
2. Roll the instruction pointer of the traced process back by one. This is needed because the instruction pointer now points after the int 3, having already executed it.
3. Allow the user to interact with the process in some way, since the process is still halted at the desired target address. This is the part where your debugger lets you peek at variable values, the call stack and so on.
4. When the user wants to keep running, the debugger will take care of placing the breakpoint back (since it was removed in step 1) at the target address, unless the user asked to cancel the breakpoint.
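
The bookkeeping behind these steps can be sketched without a traced process at all, by treating a plain byte array as the target's code. Everything here is illustrative: sim_breakpoint, bp_enable, bp_disable and rewind_ip are hypothetical names, and a real debugger would read and write the target's memory through ptrace rather than through a pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define TRAP_OPCODE 0xCC /* single-byte encoding of int 3 */

typedef struct {
    size_t  offset;     /* position of the breakpoint in the code buffer */
    uint8_t saved_byte; /* original byte, remembered while enabled */
    int     enabled;
} sim_breakpoint;

/* Remember the original byte and replace it with the trap opcode. */
void bp_enable(sim_breakpoint *bp, uint8_t *code)
{
    if (bp->enabled)
        return;
    bp->saved_byte = code[bp->offset];
    code[bp->offset] = TRAP_OPCODE;
    bp->enabled = 1;
}

/* Put the original byte back. */
void bp_disable(sim_breakpoint *bp, uint8_t *code)
{
    if (!bp->enabled)
        return;
    code[bp->offset] = bp->saved_byte;
    bp->enabled = 0;
}

/* After the trap fires, the instruction pointer is one byte past the
 * trap opcode; roll it back so the original instruction runs next. */
uint32_t rewind_ip(uint32_t ip_after_trap)
{
    return ip_after_trap - 1;
}
```

Keeping the saved byte next to the address is what makes it possible to re-enable the breakpoint later without re-reading anything from the target.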
Let's see how some of these steps are translated into real code. We'll use the debugger "template" presented earlier (forking a child process and tracing it). In any case, there's a link to the full source code of this example which I'll present later in this article. Look at Listing 12.

Here the debugger fetches the instruction pointer from the traced process, as well as examines the word currently present at 0x08048096. When run tracing the assembly program listed in Listing 9, this prints:
[13028] Child started. EIP = 0x08048080
[13028] Original data at 0x08048096: 0x000007ba
So far, so good. Let's now look at Listing 13.

Note how int 3 is inserted at the target address. This prints:
[13028] After trap, data at 0x08048096: 0x000007cc
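
The two words printed so far follow directly from the bytes in Listing 10 and the mask in Listing 13. As a small worked example (the function names are hypothetical): on a little-endian machine the bytes ba 07 00 00 read back as the word 0x000007ba, and replacing the lowest byte with 0xCC gives 0x000007cc.

```c
#include <stdint.h>

/* Assemble four consecutive bytes into a 32-bit word the way a
 * little-endian CPU such as x86 does: the first byte is the lowest. */
uint32_t word_from_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Replace the lowest byte (the first byte in memory) with the
 * int 3 opcode, leaving the other three bytes untouched. */
uint32_t with_trap(uint32_t word)
{
    return (word & 0xFFFFFF00u) | 0xCCu;
}
```

This is also why only the first instruction byte changes: the upper three bytes of the word are written back exactly as they were read.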
Again, as expected: 0xba was replaced with 0xcc. The debugger now runs the child and waits for it to halt on the breakpoint (Listing 14).
This prints:
[13028] Child got a signal: Trace/breakpoint trap
[13028] Child stopped at EIP = 0x08048097
Note the "Hello," that was printed before the breakpoint, exactly as we planned. Also note where the child stopped: just after the single-byte trap instruction.

Finally, as was explained earlier, to keep the child running we must do some work. We replace the trap with the original instruction and let the process continue running from it (Listing 15).

This makes the child print "world!" and exit, just as planned.

Note that we don't restore the breakpoint here. That can be done by executing the original instruction in single-step mode, then placing the trap back and only then doing PTRACE_CONT. The debug library demonstrated later in the article implements this.
Listing 25. DWARF parts for finding variables
<1><71>: Abbrev Number: 5 (DW_TAG_subprogram)
<72> DW_AT_external : 1
<73> DW_AT_name : (...): do_stuff
<77> DW_AT_decl_file : 1
<78> DW_AT_decl_line : 4
<79> DW_AT_prototyped : 1
<7a> DW_AT_low_pc : 0x8048604
<7e> DW_AT_high_pc : 0x804863e
<82> DW_AT_frame_base : 0x0 (location
<86> DW_AT_sibling : <0xb3>
<2><8a>: Abbrev Number: 6 (DW_TAG_formal_parameter)
<8b> DW_AT_name : (...): my_arg
<8f> DW_AT_decl_file : 1
<90> DW_AT_decl_line : 4
<91> DW_AT_type : <0x4b>
<95> DW_AT_location : (...) (DW_OP_fbreg: 0)
<2><98>: Abbrev Number: 7 (DW_TAG_variable)
<99> DW_AT_name : (...): my_local
<9d> DW_AT_decl_file : 1
<9e> DW_AT_decl_line : 6
<9f> DW_AT_type : <0x4b>
<a3> DW_AT_location : (...) (DW_OP_fbreg: -20)
<2><a6>: Abbrev Number: 8 (DW_TAG_variable)
<a7> DW_AT_name : i
<a9> DW_AT_decl_file : 1
<aa> DW_AT_decl_line : 7
<ab> DW_AT_type : <0x4b>
<af> DW_AT_location : (...) (DW_OP_fbreg: -24)
Listing 26. objdump --dwarf=loc output
$ objdump --dwarf=loc tracedprog2
tracedprog2: file format elf32-i386
Contents of the .debug_loc section:
Offset Begin End Expression
00000000 08048604 08048605 (DW_OP_breg4: 4 )
00000000 08048605 08048607 (DW_OP_breg4: 8 )
00000000 08048607 0804863e (DW_OP_breg5: 8 )
00000000 <End of list>
0000002c 0804863e 0804863f (DW_OP_breg4: 4 )
0000002c 0804863f 08048641 (DW_OP_breg4: 8 )
0000002c 08048641 0804865a (DW_OP_breg5: 8 )
0000002c <End of list>
More on int 3
Now is a good time to come back and examine int 3 and that
curious note from Intel's manual. Here it is again:
"This one byte form is valuable because it can be used to replace
the first byte of any instruction with a breakpoint, including
other one byte instructions, without over-writing other code."
int instructions on x86 occupy two bytes: 0xcd followed by the
interrupt number [12]. int 3 could have been encoded as cd 03,
but there's a special single-byte instruction reserved for it: 0xcc.
Why so? Because this allows us to insert a breakpoint without
ever overwriting more than one instruction. And this is
important. Consider the sample in Listing 16.
Suppose we want to place a breakpoint on dec eax. This happens to
be a single-byte instruction (with the opcode 0x48). Had the
replacement breakpoint instruction been longer than 1 byte, we'd
be forced to overwrite part of the next instruction (call), which
would garble it and probably produce something completely invalid.
But what if the branch jz foo was taken? Then, without stopping on
dec eax, the CPU would go straight to execute the invalid
instruction after it.
Having a special 1-byte encoding for int 3 solves this problem.
Since 1 byte is the shortest an instruction can get on x86, we
guarantee that only the instruction we want to break on gets
changed.
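To make the one-byte argument concrete, here is a minimal sketch of the word patching a debugger performs when it plants the trap. It assumes a little-endian 32-bit word, as returned by PTRACE_PEEKTEXT on the 32-bit x86 machine used in this article. The helper names insert_trap and restore_byte are hypothetical and are not part of debuglib.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu  /* the single-byte encoding of int 3 */

/* Replace only the lowest-addressed byte of the word with int 3.
 * On little-endian x86 that byte is the low 8 bits of the word. */
uint32_t insert_trap(uint32_t orig_word) {
    return (orig_word & ~0xFFu) | INT3_OPCODE;
}

/* Undo insert_trap: put the original first byte back and leave the
 * other three bytes exactly as they currently are in memory. */
uint32_t restore_byte(uint32_t patched_word, uint32_t orig_word) {
    assert((patched_word & 0xFFu) == INT3_OPCODE);
    return (patched_word & ~0xFFu) | (orig_word & 0xFFu);
}
```

As an illustration with made-up operand bytes: if memory holds 48 e8 10 00 (dec eax followed by the start of a call), the word reads 0x0010e848 and insert_trap turns it into 0x0010e8cc. Only the dec eax byte changes, and the call after it stays intact.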
Encapsulating some gory details
Many of the low-level details shown in the code samples of the
previous section can be easily encapsulated behind a convenient
API. I've done some encapsulation into a small utility library
called debuglib; its code is available for download at the end of
the article. Here I just want to demonstrate an example of its
usage, but with a twist. We're going to trace a program written
in C.
Tracing a C program
So far, for the sake of simplicity, I focused on assembly
language targets. It's time to go one level up and see how we can
trace a program written in C.
It turns out things aren't very different; it's just a bit harder
to find where to place the breakpoints. Consider the simple
program in Listing 17.
Suppose I want to place a breakpoint at the entrance to do_stuff.
I'll use the old friend objdump to disassemble the executable, but
there's a lot in it. In particular, looking at the text section is
a bit useless since it contains a lot of C runtime initialization
code I'm currently not interested in. So let's just look for
do_stuff in the dump (Listing 18).
Alright, so we'll place the breakpoint at 0x080483e4, which is the
first instruction of do_stuff. Moreover, since this function is
called in a loop, we want to keep stopping at the breakpoint until
the loop ends. We're going to use the debuglib library to make
this simple. Listing 19 shows the complete debugger function.
Instead of getting our hands dirty modifying EIP and the target
process's memory space, we just use create_breakpoint,
resume_from_breakpoint and cleanup_breakpoint. Let's see what this
prints when tracing the simple C code displayed above (Listing 20).
Just as expected!
The code
Here are the complete source code files for this part:
debugger/debuggers_part2_code.tgz. In the archive you'll find:
debuglib.h and debuglib.c: the simple library for encapsulating some of the inner workings of a debugger
bp_manual.c: the manual way of setting breakpoints presented first in this article. Uses the debuglib library for some boilerplate code.
bp_use_lib.c: uses debuglib for most of its code, as demonstrated in the second code sample for tracing the loop in a C program.
Listing 27. First instructions of the compiled do_stuff function
08048604 <do_stuff>:
8048604: 55 push ebp
8048605: 89 e5 mov ebp,esp
8048607: 83 ec 28 sub esp,0x28
804860a: 8b 45 08 mov eax,DWORD PTR
804860d: 83 c0 02 add eax,0x2
8048610: 89 45 f4 mov DWORD PTR [ebp-
Listing 28. objdump --dwarf=decodedline output
$ objdump --dwarf=decodedline tracedprog2
tracedprog2: file format elf32-i386
Decoded dump of debug contents of section .debug_
CU: /home/eliben/eli/eliben-code/debugger/
File name Line number Starting address
tracedprog2.c 5 0x8048604
tracedprog2.c 6 0x804860a
tracedprog2.c 9 0x8048613
tracedprog2.c 10 0x804861c
tracedprog2.c 9 0x8048630
tracedprog2.c 11 0x804863c
tracedprog2.c 15 0x804863e
tracedprog2.c 16 0x8048647
tracedprog2.c 17 0x8048653
tracedprog2.c 18 0x8048658
Next steps
We've covered how breakpoints are implemented in debuggers. While
implementation details vary between OSes, when you're on x86 it's
all basically variations on the same theme: substituting int 3 for
the instruction where we want the process to stop.
That said, I'm sure some readers, just like me, will be less than
excited about specifying raw memory addresses to break on. We'd
like to say "break on do_stuff", or even "break on this line in
do_stuff", and have the debugger do it. Let's see how it's done.
Debugging information
Now I'm going to explain how the debugger figures out where to
find the C functions and variables in the machine code it wades
through, and the data it uses to map between C source code lines
and machine language words.
Modern compilers do a pretty good job converting your high-level
code, with its nicely indented and nested control structures and
arbitrarily typed variables, into a big pile of bits called
machine code, the sole purpose of which is to run as fast as
possible on the target CPU. Most lines of C get converted into
several machine code instructions. Variables are shoved all over
the place: into the stack, into registers, or completely optimized
away. Structures and objects don't even exist in the resulting
code; they're merely an abstraction that gets translated to
hard-coded offsets into memory buffers.
So how does a debugger know where to stop when you ask it to break
at the entry to some function? How does it manage to find what to
show you when you ask it for the value of a variable? The answer
is debugging information.
Debugging information is generated by the compiler together with
the machine code. It is a representation of the relationship
between the executable program and the original source code. This
information is encoded into a pre-defined format and stored
alongside the machine code. Many such formats were invented over
the years for different platforms and executable files. Since the
aim of this article isn't to survey the history of these formats,
but rather to show how they work, we'll have to settle on
something. This something is going to be DWARF, which is almost
ubiquitously used today as the debugging information format for
ELF executables on Linux and other Unix-y platforms.
The DWARF in the ELF
According to its Wikipedia page, DWARF (you can see its logo in
Figure 1) was designed alongside ELF, although it can in theory be
embedded in other object file formats as well [13].
DWARF is a complex format, building on many years of experience
with previous formats for various architectures and operating
systems. It has to be complex, since it solves a very tricky
problem: presenting debugging information from any high-level
language to debuggers, providing support for arbitrary platforms
and ABIs. It would take much more than this humble article to
explain it fully, and to be honest I don't understand all its dark
corners well enough to engage in such an endeavor anyway [14]. In
this article I will take a more hands-on approach, showing just
enough of DWARF to explain how debugging information works in
practical terms.
Listing 29. Output of dwarf_get_func_addr
$ dwarf_get_func_addr tracedprog2
DW_TAG_subprogram: do_stuff
low pc : 0x08048604
high pc : 0x0804863e
DW_TAG_subprogram: main
low pc : 0x0804863e
high pc : 0x0804865a
Figure 1. DWARF Debugging Format logo
Debug sections in ELF files
First let's take a glimpse of where the DWARF info is placed
inside ELF files. ELF defines arbitrary sections that may exist in
each object file. A section header table defines which sections
exist and their names. Different tools treat various sections in
special ways: for example, the linker looks for some sections, the
debugger for others.
We'll be using an executable built from the C source in Listing 21
for our experiments in this article, compiled into tracedprog2.
Dumping the section headers from the ELF executable using
objdump -h, we'll notice several sections with names beginning with
.debug_; these are the DWARF debugging sections. See Listing 22.
The first number seen for each section here is its size, and the
last is the offset where it begins in the ELF file. The debugger
uses this information to read the section from the executable.
Now let's see a few practical examples of finding useful debug
information in DWARF.
Finding functions
One of the most basic things we want to do when debugging is
placing breakpoints at some function, expecting the debugger to
break right at its entrance. To be able to perform this feat, the
debugger must have some mapping between a function name in the
high-level code and the address in the machine code where the
instructions for this function begin.
This information can be obtained from DWARF by looking at the
.debug_info section. Before we go further, a bit of background.
The basic descriptive entity in DWARF is called the Debugging
Information Entry (DIE). Each DIE has a tag (its type) and a set of
attributes. DIEs are interlinked via sibling and child links, and
values of attributes can point at other DIEs.
Let's run:
objdump --dwarf=info tracedprog2
The output is quite long, and for this example we'll just focus on
the lines in Listing 23 [15].
There are two entries (DIEs) tagged DW_TAG_subprogram, which is a
function in DWARF's jargon. Note that there's an entry for do_stuff
and an entry for main. There are several interesting attributes,
but the one that interests us here is DW_AT_low_pc. This is the
program-counter (EIP in x86) value for the beginning of the
function. Note that it's 0x8048604 for do_stuff. Now let's see what
this address is in the disassembly of the executable by running
objdump -d (Listing 24).
Indeed, 0x8048604 is the beginning of do_stuff, so the debugger can
have a mapping between functions and their locations in the
executable.
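Once the DW_TAG_subprogram entries have been read, the mapping itself can be as simple as a table. The following is only a sketch: the struct and function names are hypothetical, and the addresses are the ones reported for tracedprog2 in the listings of this article. It treats DW_AT_high_pc as the first address past the function, which is how DWARF 2 defines it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One record per DW_TAG_subprogram DIE (hypothetical in-memory form). */
struct func_range {
    const char *name;   /* DW_AT_name */
    uint32_t low_pc;    /* DW_AT_low_pc: address of the first instruction */
    uint32_t high_pc;   /* DW_AT_high_pc: first address past the function */
};

/* Values as printed for tracedprog2 in this article. */
const struct func_range funcs[] = {
    {"do_stuff", 0x08048604u, 0x0804863eu},
    {"main",     0x0804863eu, 0x0804865au},
};
const size_t num_funcs = sizeof funcs / sizeof funcs[0];

/* Name -> entry address, e.g. for "break on do_stuff". 0 if unknown. */
uint32_t func_entry_addr(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < num_funcs; i++)
        if (strcmp(funcs[i].name, name) == 0)
            return funcs[i].low_pc;
    return 0;
}

/* Address -> name of the function containing it. NULL if none. */
const char *func_containing(uint32_t pc) {
    for (size_t i = 0; i < num_funcs; i++)
        if (pc >= funcs[i].low_pc && pc < funcs[i].high_pc)
            return funcs[i].name;
    return NULL;
}
```

The reverse lookup is what lets a debugger say which function the process is in when it stops at an arbitrary EIP.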
Finding variables
Suppose that we've indeed stopped at a breakpoint inside do_stuff.
We want to ask the debugger to show us the value of the my_local
variable. How does it know where to find it? It turns out this is
much trickier than finding functions. Variables can be located in
global storage, on the stack, and even in registers. Additionally,
variables with the same name can have different values in
different lexical scopes. The debugging information has to be able
to reflect all these variations, and indeed DWARF does.
I won't cover all the possibilities, but as an example I'll
demonstrate how the debugger can find my_local in do_stuff. Let's
start at .debug_info and look at the entry for do_stuff again, this
time also looking at a couple of its sub-entries (Listing 25).
Note the first number inside the angle brackets in each entry. This
is the nesting level: in this example, entries with <2> are
children of the entry with <1>. So we know that the variable
my_local (marked by the DW_TAG_variable tag) is a child of the
do_stuff function. The debugger is also interested in a variable's
type, to be able to display it correctly. In the case of my_local
the type points to another DIE, <0x4b>. If we look it up in the
output of objdump we'll see it's a signed 4-byte integer.
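The parent/child relationship described above can be pictured as a small tree. This is a sketch only: struct die and find_child are hypothetical names for illustration (real libraries hand out opaque handles instead), while the tag codes are the values assigned by the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag codes as assigned by the DWARF standard. */
enum {
    DW_TAG_formal_parameter = 0x05,
    DW_TAG_subprogram       = 0x2e,
    DW_TAG_variable         = 0x34
};

/* Hypothetical in-memory form of a DIE. */
struct die {
    int tag;
    const char *name;           /* DW_AT_name */
    const struct die *child;    /* first entry one nesting level deeper */
    const struct die *sibling;  /* next entry at the same nesting level */
};

/* The <1> do_stuff entry and its <2> children, as in Listing 25. */
const struct die die_i        = {DW_TAG_variable, "i", NULL, NULL};
const struct die die_my_local = {DW_TAG_variable, "my_local", NULL, &die_i};
const struct die die_my_arg   = {DW_TAG_formal_parameter, "my_arg", NULL,
                                 &die_my_local};
const struct die die_do_stuff = {DW_TAG_subprogram, "do_stuff", &die_my_arg,
                                 NULL};

/* Search the direct children of parent for a DIE with this tag and name. */
const struct die *find_child(const struct die *parent, int tag,
                             const char *name) {
    assert(parent != NULL && name != NULL);
    for (const struct die *d = parent->child; d != NULL; d = d->sibling)
        if (d->tag == tag && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

Looking up my_local then means calling find_child on the do_stuff entry with DW_TAG_variable. Searching only the children of the current function is also a simple way to respect scoping, since a variable with the same name in another function lives under a different parent.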
To actually locate the variable in the memory image of the
executing process, the debugger will look at the DW_AT_location
attribute. For my_local it says DW_OP_fbreg: -20. This means that
the variable is stored at offset -20 from the DW_AT_frame_base
attribute of its containing function, which is the base of the
frame for the function.
The DW_AT_frame_base attribute of do_stuff has the value 0x0
(location list), which means that this value actually has to be
looked up in the location list section. Let's look at it
(Listing 26).
The location information we're interested in is the first one [16].
For each address where the debugger may be, it specifies the
current frame base from which offsets to variables are to be
computed, as an offset from a register. For x86, DW_OP_breg4 refers
to esp and DW_OP_breg5 refers to ebp.
It's educational to look at the first several instructions of
do_stuff again (Listing 27).
Note that ebp becomes relevant only after the second instruction is
executed, and indeed for the first two addresses the base is
computed from esp in the location information listed above. Once
ebp is valid, it's convenient to compute offsets relative to it
because it stays constant while esp keeps moving with data being
pushed and popped from the stack.
So where does it leave us with my_local? We're only really
interested in its value after the instruction at 0x8048610 (where
its value is placed in memory after being computed in eax), so the
debugger will be using the DW_OP_breg5: 8 frame base to find it.
Now it's time to rewind a little and recall that the DW_AT_location
attribute for my_local says DW_OP_fbreg: -20. Let's do the math:
-20 from the frame base, which is ebp + 8. We get ebp - 12. Now
look at the disassembly again and note where the data is moved from
eax: indeed, ebp - 12 is where my_local is stored.
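The same arithmetic can be written down as code. This is a simplified sketch, not how a real debugger stores things: the location list is hand-copied from Listing 26 into a plain array, struct regs is a hypothetical snapshot of the traced process's registers, and only the two register-plus-offset forms seen here are handled. Real DWARF location expressions are more general [18].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DWARF register numbers on 32-bit x86, as used by DW_OP_breg4/breg5. */
enum { DWARF_REG_ESP = 4, DWARF_REG_EBP = 5 };

/* One row of a location list: valid while begin <= eip < end. */
struct loc_entry {
    uint32_t begin;
    uint32_t end;
    int      reg;     /* register the frame base is computed from */
    int32_t  offset;  /* signed offset added to that register */
};

/* Frame base of do_stuff: the first list in Listing 26. */
const struct loc_entry do_stuff_frame_base[] = {
    {0x08048604u, 0x08048605u, DWARF_REG_ESP, 4},
    {0x08048605u, 0x08048607u, DWARF_REG_ESP, 8},
    {0x08048607u, 0x0804863eu, DWARF_REG_EBP, 8},
};

/* Hypothetical snapshot of the registers read from the traced process. */
struct regs { uint32_t eip, esp, ebp; };

/* Frame base for the current eip. Returns 1 on success, 0 if eip is
 * not covered by the list. */
int frame_base(const struct regs *r, uint32_t *base) {
    assert(r != NULL && base != NULL);
    size_t n = sizeof do_stuff_frame_base / sizeof do_stuff_frame_base[0];
    for (size_t i = 0; i < n; i++) {
        const struct loc_entry *e = &do_stuff_frame_base[i];
        if (r->eip >= e->begin && r->eip < e->end) {
            uint32_t reg_value = (e->reg == DWARF_REG_ESP) ? r->esp : r->ebp;
            *base = reg_value + (uint32_t)e->offset;
            return 1;
        }
    }
    return 0;
}

/* DW_OP_fbreg: N means "frame base + N". */
int var_address(const struct regs *r, int32_t fbreg_offset, uint32_t *addr) {
    assert(addr != NULL);
    uint32_t base;
    if (!frame_base(r, &base))
        return 0;
    *addr = base + (uint32_t)fbreg_offset;  /* wraps correctly for N < 0 */
    return 1;
}
```

With eip anywhere after 0x8048607, calling var_address with -20 yields ebp + 8 - 20, that is ebp - 12, matching the disassembly. The debugger would then read the 4 bytes at that address from the traced process to get the value of my_local.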
Looking up line numbers
When we talked about finding functions in the debugging
information, I was cheating a little. When we debug C source code
and put a breakpoint in a function, we're usually not interested in
the first machine code instruction [17]. What we're really
interested in is the first C code line of the function.
This is why DWARF encodes a full mapping between lines in the C
source code and machine code addresses in the executable. This
information is contained in the .debug_line section and can be
extracted in a readable form as in Listing 28.
It shouldn't be hard to see the correspondence between this
information, the C source code and the disassembly dump. Line
number 5 points at the entry point to do_stuff, 0x8048604. The next
line, 6, is where the debugger should really stop when asked to
break in do_stuff, and it points at 0x804860a, which is just past
the prologue of the function. This line information easily allows
bi-directional mapping between lines and addresses:
When asked to place a breakpoint at a certain line, the debugger will use it to find which address it should put its trap on (remember our friend int 3 from earlier in the article?)
When an instruction causes a segmentation fault, the debugger will use it to find the source code line on which it happened
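Both directions of this mapping can be sketched over the decoded table from Listing 28. The names below are hypothetical, the table is hand-copied rather than read from .debug_line, and code_end is an assumption: I'm using the high pc of main from Listing 29 as the end of the covered range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded row: the instructions starting at addr belong to line. */
struct line_row { int line; uint32_t addr; };

/* Rows from Listing 28, sorted by address. */
const struct line_row line_table[] = {
    {5,  0x8048604u}, {6,  0x804860au}, {9,  0x8048613u}, {10, 0x804861cu},
    {9,  0x8048630u}, {11, 0x804863cu}, {15, 0x804863eu}, {16, 0x8048647u},
    {17, 0x8048653u}, {18, 0x8048658u},
};
const size_t num_rows = sizeof line_table / sizeof line_table[0];

/* Assumption: code covered by the table ends at main's high pc. */
const uint32_t code_end = 0x804865au;

/* Line -> lowest address generated for it (where to put the trap).
 * Returns 0 if the line produced no code. */
uint32_t addr_for_line(int line) {
    uint32_t best = 0;
    for (size_t i = 0; i < num_rows; i++)
        if (line_table[i].line == line &&
            (best == 0 || line_table[i].addr < best))
            best = line_table[i].addr;
    return best;
}

/* Address -> source line: the last row starting at or before addr.
 * Returns -1 if addr is outside the covered range. */
int line_for_addr(uint32_t addr) {
    assert(num_rows > 0);
    if (addr < line_table[0].addr || addr >= code_end)
        return -1;
    int line = line_table[0].line;
    for (size_t i = 1; i < num_rows && line_table[i].addr <= addr; i++)
        line = line_table[i].line;
    return line;
}
```

Note that line 9 appears twice in the table (presumably the loop header, whose code is split in two places), which is why the forward lookup has to pick one address and the reverse lookup has to work on address ranges rather than exact matches.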
libdwarf: Working with DWARF
Employing command-line tools to access DWARF information, while
useful, isn't fully satisfying. As programmers, we'd like to know
how to write actual code that can read the format and extract what
we need from it.
Naturally, one approach is to grab the DWARF specification and
start hacking away. Now, remember how everyone keeps saying that
you should never, ever parse HTML manually but rather use a
library? Well, with DWARF it's even worse. DWARF is much more
complex than HTML. What I've shown here is just the tip of the
iceberg, and to make things even harder, most of this information
is encoded in a very compact and compressed way in the actual
object file [18].
So we'll take another road and use a library to work with DWARF.
There are two major libraries I'm aware of (plus a few less
complete ones):
BFD (libbfd), which is used by the GNU binutils, including objdump (which played a star role in this article), ld (the GNU linker) and as (the GNU assembler)
libdwarf, which together with its big brother libelf is used for the tools on the Solaris and FreeBSD operating systems
I'm picking libdwarf over BFD because it appears less arcane to me
and its license is more liberal (LGPL vs. GPL).
Playing with ptrace, Part I
Process tracing using ptrace
How debugger works
Understanding ELF using readelf and objdump
Implementing breakpoints on x86 Linux
NASM manual
SO discussion of the ELF entry point
This Hacker News discussion of the first part of the series
GDB Internals
objdump man page
Wikipedia pages for ELF
Dwarf Debugging Standard home page: from here you can obtain the excellent DWARF tutorial by Michael Eager, as well as the DWARF standard itself. You'll probably want version 2 since it's what gcc produces.
libdwarf home page: the download package includes a comprehensive reference document for the library.
BFD documentation
Since libdwarf is itself quite complex, it requires a lot of code
to operate. I'm not going to show all this code in the article, but
you can download it from here: http://
dwarf_get_func_addr.c and run it yourself. To compile this file
you'll need to have libelf and libdwarf installed, and pass the
-lelf and -ldwarf flags to the linker.
The demonstrated program takes an executable and prints the names
of functions in it, along with their entry points. Listing 29 shows
its output for the C program we've been playing with.
The documentation of libdwarf is quite good, and with some effort
you should have no problem pulling any other information
demonstrated in this article from the DWARF sections using it.
Debugging information is a simple concept in principle. The
implementation details may be intricate, but at the end of the day
what matters is that we now know how the debugger finds the
information it needs about the original source code from which the
executable it's tracing was compiled. With this information in
hand, the debugger bridges between the world of the user, who
thinks in terms of lines of code and data structures, and the world
of the executable, which is just a bunch of machine code
instructions and data in registers and memory.
This article explains the inner workings of a debugger. Using the
information presented here and some programming effort, it should
be possible to create a basic but functional debugger for Linux.
Eli has been programming non-stop for the past 13 years, working
on a wide variety of projects, from embedded micro-controller code
to web applications.
[1] I didn't check, but I'm sure the LOC count of gdb is at least in the six-figures range.
[2] Run man 2 ptrace for complete enlightenment.
[3] Peek and poke are well-known system programming jargon for directly reading and writing memory.
[4] This article assumes some basic level of Unix/Linux programming experience. I assume you know (at least conceptually) about fork, the exec family of functions and Unix signals.
[5] At least if you're as obsessed with low-level details as I am :-)
[6] A word of warning here: as I noted above, a lot of this is highly platform specific. I'm making some simplifying assumptions; for example, x86 instructions don't have to fit into 4 bytes (the size of unsigned on my 32-bit Ubuntu machine). In fact, many won't. Peeking at instructions meaningfully requires us to have a complete disassembler at hand. We don't have one here, but real debuggers do.
[7] On a high-level view this is true. Down in the gory details, many CPUs today execute multiple instructions in parallel, some of them not in their original order.
[8] The bible in this case being, of course, Intel's Architecture software developer's manual, volume 2A.
[9] How can the OS stop a process just like that? The OS registered its own handler for int 3 with the CPU, that's how!
[10] Wait, int again? Yes! Linux uses int 0x80 to implement system calls from user processes into the OS kernel. The user places the number of the system call and its arguments into registers and executes int 0x80. The CPU then jumps to the appropriate interrupt handler, where the OS registered a procedure that looks at the registers and decides which system call to execute.
[11] ELF (Executable and Linkable Format) is the file format used by Linux for object files, shared libraries and executables.
[12] An observant reader can spot the translation of int 0x80 into cd 80 in the dumps listed above.
[13] DWARF is an open standard, published by the DWARF standards committee.
[14] At the end of the article I've collected some useful resources that will help you get more familiar with DWARF, if you're interested. Particularly, start with the DWARF tutorial.
[15] Here and in subsequent examples, I'm placing (...) instead of some longer and uninteresting information for the sake of more convenient formatting.
[16] Because the DW_AT_frame_base attribute of do_stuff contains offset 0x0 into the location list. Note that the same attribute for main contains the offset 0x2c, which is the offset for the second set of location expressions.
[17] Where the function prologue is usually executed and the local variables aren't even valid yet.
[18] Some parts of the information (such as location data and line number data) are encoded as instructions for a specialized virtual machine. Yes, really.
