
Debugging Interrupts by Stuart Ball

INTRODUCTION

Interrupts in an embedded system usually notify the CPU of some external event that requires immediate attention. An interrupt causes a temporary redirection of the program flow to service whatever condition caused the interrupt. Typical reasons to use interrupts include:

- Ensuring that high priority tasks are executed immediately. An example would be a byte received by a UART, where the data must be read before the next byte is received, to prevent the first one from being overwritten.
- Reducing hardware cost and complexity by implementing functions in software. Examples would be a regular interrupt for timekeeping, or a regular timer interrupt that causes the software to multiplex a multi-digit numeric LED display.

Interrupts are the best, and sometimes the only, solution to real time requirements, but they also reduce the stability of a system. Interrupts do not necessarily make a system unstable, but they nearly always make it less stable. This paper looks at several causes of interrupt problems, how to prevent them, and some techniques for debugging the interrupt system.

Types Of Interrupts

Interrupts come in two flavors: edge sensitive and level sensitive. As shown in Figure 1, a level sensitive interrupt is recognized when it is in the active state (high in the figure). A level sensitive interrupt typically must remain in the active state until it is serviced, and once serviced it is expected to return to the inactive state, or else it will be serviced continuously. Edge sensitive interrupts are recognized on the transition from the inactive to the active state. An edge sensitive interrupt may remain in the active state once serviced; it is the transition that causes the interrupt.

Edge sensitive interrupts are often left in the active state and pulsed to the inactive state to generate an interrupt (Figure 1).

Figure 1 Types of Interrupts. Level sensitive: the interrupt is recognized while in the active state; if it stays active, it will be serviced continuously. Edge sensitive: the interrupt is recognized on the transition to the active state; if it stays active, only the first transition is recognized.

Interrupt Components

Figure 2 shows the components of an interrupt system. The hardware components consist of:

- The external event that generates an interrupt request.
- The interrupt controller.
- The CPU.

The software components consist of:

- An interrupt vector table.
- The ISR (interrupt service routine).
- The system stack.

Figure 3 shows the mechanics of an interrupt. The software is executing a program when an external event occurs, generating an interrupt request. The interrupt controller may have several interrupt inputs, but it has only one interrupt output, which it uses to request an interrupt from the CPU. Assuming the software has enabled interrupts, the CPU saves the return address of the program (the next instruction that would have been executed) on the stack and executes an interrupt acknowledge cycle. In response to the interrupt acknowledge, the controller passes a vector to the CPU; each interrupt input typically has a unique vector. The CPU uses the vector as an index into the interrupt vector table, which is usually in ROM. The interrupt vector table contains pointers to the various ISRs, and the CPU begins executing the ISR that the selected entry points to. When the ISR is done, it must notify the interrupt controller. The ISR code then reenables interrupts and executes an interrupt return. The CPU pops the return address from the stack, and execution of the program resumes where it left off, with some delay.

Figure 2 Interrupt Components. Hardware: external event, interrupt controller, and CPU, connected by interrupt request, acknowledge, and vector signals. Software: interrupt vector table (typically in system ROM), whose vectors point to the ISRs; ISR; system stack.

Figure 3 Interrupt Mechanics. The program is interrupted, the ISR runs, and the program resumes. Sequence: external event activates interrupt; interrupt controller sends interrupt request to CPU; CPU saves return address on stack; CPU returns interrupt acknowledge to controller; controller sends vector to CPU; CPU uses vector as index to vector table; ISR code informs controller that interrupt execution is complete; ISR code reenables interrupts; ISR code executes interrupt return; CPU pops return address from stack; control returns to interrupted program.

THE TWO LAWS OF INTERRUPTS

Although this is a brief and general overview of the interrupt sequence, it leads us to the two immutable laws of interrupts:

Law 1: An interrupt can occur at any time. It is asynchronous to the program.
Law 2: Regardless of programming technique, all interrupts use one global resource: time.

The majority of difficult interrupt problems are caused by the effects of these two laws.

Interrupt Structures: There are some variations on the interrupt structure just described. Many microcontrollers, such as the 8031, 6805, and PIC17C42, do not have an interrupt vector table; instead, each interrupt source vectors to a specific execution address. Some processors have a dedicated interrupt return instruction that returns from the ISR and reenables interrupts. Some CPUs (e.g. the 80186) have an internal interrupt controller, while others (e.g. the 8086) require an external controller.

Figure 4 shows a daisy-chained interrupt structure. In this scheme, each peripheral has Priority In and Priority Out signals. The Priority In of each lower priority device is tied to the Priority Out of the next higher priority device. A peripheral can assert an interrupt only if its Priority In is active, and a peripheral asserting an interrupt drives its Priority Out inactive. In this way, the highest priority peripheral that wants to assert an interrupt is able to do so, and the priority of each peripheral is fixed by its position in the chain.

Figure 4 Daisy Chain. Each peripheral has Priority In, Priority Out, Interrupt Request, and Interrupt Acknowledge connections; interrupt requests go to the CPU, and the interrupt acknowledge comes from the CPU.
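The daisy-chain rule can be modeled in a few lines of C. This is a sketch, not a hardware description: the chain is an array ordered from highest to lowest priority, `daisy_chain_winner` is a hypothetical name, and an active signal is represented as `true` regardless of its actual electrical polarity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of daisy-chain priority resolution (hypothetical helper).
   requesting[0] is the device at the head of the chain (highest
   priority). Returns the index of the device allowed to assert the
   interrupt, or -1 if no device is requesting. */
int daisy_chain_winner(const bool requesting[], size_t n)
{
    bool priority_in = true;   /* assumed: head of the chain is tied active */
    int winner = -1;

    for (size_t i = 0; i < n; i++) {
        /* A device may assert only while its Priority In is active. */
        bool asserting = priority_in && requesting[i];
        if (asserting)
            winner = (int)i;
        /* Priority Out (the next device's Priority In) stays active only
           if this device's Priority In is active and it is not asserting. */
        priority_in = priority_in && !asserting;
    }
    return winner;
}
```

Because every device downstream of an asserting device sees an inactive Priority In, position in the chain alone decides which request wins.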

PROBLEMS CAUSED BY LAW 1 - INTERRUPTS ARE ASYNCHRONOUS

Shared Memory: Figure 5 shows a scenario where a counter variable is shared between ISR and non-ISR code. The counter is counting interrupts for some unknown reason. The ISR increments the count, and the non-ISR code decrements the count when it processes whatever action the interrupt requested. The logic for the ISR and non-ISR code is:

Non-ISR logic:
1. Read counter
2. Decrement
3. Write counter

ISR logic:
A. Read counter
B. Increment
C. Write counter

As shown in the figure, if the interrupt occurs between the time the non-ISR code reads the counter and the time it writes the new value, the ISR's increment is overwritten and the counter value will be incorrect (4 instead of 5). This problem can be fixed in three ways:

- Protect the non-ISR code where it must be indivisible:
  Disable interrupts
  1. Read counter
  2. Decrement
  3. Write counter
  Enable interrupts
- Use an indivisible read/decrement/store instruction, if the CPU has one.
- Avoid variables that are written by both ISR and non-ISR code. Values written by an ISR should be read-only to non-ISR code, and vice versa. In this example, the counter could be implemented as a pair of counters: the ISR increments one, and the non-ISR code increments the other. The actual count is the difference between the two.

Figure 5 Shared memory problem. Non-ISR steps 1-3 and ISR steps A-C; when the interrupt occurs between steps 1 and 3, the count ends at 4, a bad result.
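The first and third fixes can be sketched in C. This assumes a single-core CPU on which an 8-bit variable is read or written in one instruction. `disable_interrupts()` and `enable_interrupts()` are hypothetical stand-ins for the CPU's interrupt mask instructions, stubbed here so the sketch compiles on a PC; all other names are illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the CPU's interrupt disable/enable
   instructions. This simple pair assumes interrupts were enabled on
   entry; nested critical sections would need to save and restore state. */
static volatile int interrupts_enabled = 1;
static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

/* Fix 1: one shared counter, with the non-ISR read/modify/write protected. */
static volatile uint8_t shared_count;

void isr_increment(void)              /* called from the ISR */
{
    shared_count++;
}

void background_decrement(void)       /* called from non-ISR code */
{
    disable_interrupts();
    shared_count--;                   /* read, decrement, write: now indivisible */
    enable_interrupts();
}

/* Fix 3: two counters, each written by only one side. */
static volatile uint8_t isr_events;       /* written only by the ISR */
static volatile uint8_t handled_events;   /* written only by non-ISR code */

void isr_count_event(void)         { isr_events++; }
void background_mark_handled(void) { handled_events++; }

/* The pending count is the difference. Unsigned 8-bit arithmetic keeps it
   correct across wraparound as long as fewer than 256 events are pending. */
uint8_t pending_events(void)
{
    return (uint8_t)(isr_events - handled_events);
}
```

The two-counter version works only because each counter is a single byte that the other side can read in one instruction; a multi-byte counter read by an 8-bit CPU would reintroduce a shared-read problem.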

Shared Hardware: Figure 6 shows a 74xx374 (xx = LS, HC, AC, ACT) register that is written by the CPU. Since the register is write-only, a variable called REGMASK holds a copy of the current value in the register. The least significant bit of the register drives an LED and is controlled by the ISR, which blinks the LED at a regular rate. The MSB is used by non-ISR code to control a relay. The remaining bits could control a stepper motor, or something similar. The non-ISR logic looks like this:

1. Read REGMASK
2. Set/clear whatever bit needs to be changed
3. Write result to HW register
4. Write result to REGMASK

The ISR logic looks like this:

A. Read REGMASK
B. Toggle bit 0 to blink LED
C. Write result to HW register
D. Write result to REGMASK

If the interrupt occurs between non-ISR statements 1 and 3, the non-ISR controlled bits in REGMASK and in the HW register will be correct, but the LED toggle will be lost. If the interrupt occurs between statements 3 and 4, a momentary pulse will occur at the output of the HW register on the bit the non-ISR code attempted to change. The HW register will be left in the wrong state, but REGMASK will be correct (except for the LED bit). The HW register will be corrected the next time it is updated, so the result is a delay in changing whatever bit the non-ISR code wanted to alter. This can cause subtle and intermittent problems such as premature relay failure or loss of sync in a stepper motor. The fixes for this are:

- Use disable/enable pairs to protect the non-ISR code.
- Use two masks. The ISR code sets a mask for the LED, but does not update the hardware register. The non-ISR code merges the LED mask with REGMASK when updating the hardware. The reverse also works, with the non-ISR code setting bits in a mask and the ISR code merging this with the LED toggle. Either method has the disadvantage that a change will not always be reflected immediately in the hardware.

Figure 6 A register written by ISR and non-ISR code. An 8-bit register loaded from the CPU data bus; bit D0 drives an LED and bit D7 drives a relay.
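A sketch of the two-mask fix in C, with the ISR owning the LED mask and the non-ISR code owning REGMASK and the hardware write. `hw_register` is a hypothetical stand-in for the write-only register (a plain variable so the sketch runs anywhere), and the bit assignments follow the example: bit 0 for the LED, bit 7 for the relay.

```c
#include <stdint.h>

/* Hypothetical stand-in for the write-only 8-bit output register. */
static volatile uint8_t hw_register;

#define LED_BIT   0x01u   /* bit 0: LED, owned by the ISR */
#define RELAY_BIT 0x80u   /* bit 7: relay, owned by non-ISR code */

static volatile uint8_t led_mask;   /* written only by the ISR */
static uint8_t regmask;             /* written only by non-ISR code; LED bit kept clear */

/* Timer ISR: toggles its private mask but never touches the register. */
void led_timer_isr(void)
{
    led_mask ^= LED_BIT;
}

/* Non-ISR code: update its own bits, then merge in the LED mask and write
   the hardware. The ISR writes only led_mask, so no update can be lost. */
void set_outputs(uint8_t set_bits, uint8_t clear_bits)
{
    regmask = (uint8_t)((regmask | set_bits) & ~clear_bits & ~LED_BIT);
    hw_register = (uint8_t)(regmask | led_mask);
}
```

As the text notes, the LED only changes when the non-ISR code next calls `set_outputs`, so the background loop would need to call it regularly.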

Peripheral IC With Address Selection Register: Some peripheral ICs, such as the Z8530 and the Z8536, have a number of internal registers. As shown in Figure 7, these devices use an internal address register to select a specific data register. The controlling CPU typically sees two addresses - one for the address register and one to read/write data from the selected register. To access a particular data register, the software first writes the register number to the address register and then reads or writes the data.

In many cases, it is necessary for both ISR and non-ISR code to access the device. If the interrupt occurs between the time the non-ISR code writes the address register and the time it accesses the data register, the non-ISR code will get data from whatever register the ISR code last accessed. The fix for this problem is to protect the non-ISR accesses with disable/enable pairs.

Figure 7 Peripheral IC with address selection register. The CPU data bus connects to an address register, whose decode logic selects one register in the register array.
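The protected access can be sketched in C against a simulated device. The device model (an address register selecting one of 16 data registers) and the `dev_*` port functions are hypothetical and do not describe any specific chip's interface; `disable_interrupts()`/`enable_interrupts()` are stubs standing in for the CPU's instructions.

```c
#include <stdint.h>

/* Simulated peripheral: an internal address register selects one of
   16 data registers. The two port functions model the two CPU-visible
   addresses. */
static uint8_t dev_address;
static uint8_t dev_regs[16];

static void dev_write_address(uint8_t reg) { dev_address = reg & 0x0Fu; }
static uint8_t dev_read_data(void)         { return dev_regs[dev_address]; }
static void dev_write_data(uint8_t v)      { dev_regs[dev_address] = v; }

/* Hypothetical interrupt disable/enable stand-ins. */
static volatile int interrupts_enabled = 1;
static void disable_interrupts(void) { interrupts_enabled = 0; }
static void enable_interrupts(void)  { interrupts_enabled = 1; }

/* Non-ISR access: the address write and the data access must be
   indivisible, or an ISR that selects another register in between
   redirects the data access. */
uint8_t dev_read_reg(uint8_t reg)
{
    disable_interrupts();
    dev_write_address(reg);
    uint8_t value = dev_read_data();
    enable_interrupts();
    return value;
}

void dev_write_reg(uint8_t reg, uint8_t value)
{
    disable_interrupts();
    dev_write_address(reg);
    dev_write_data(value);
    enable_interrupts();
}

/* ISR access needs no protection, assuming the ISR runs with
   interrupts disabled (no nesting). */
uint8_t dev_isr_read_status(void)
{
    dev_write_address(0);
    return dev_read_data();
}
```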

The First Two Rules: These three scenarios lead to the first two rules of avoiding ISR problems:

Rule 1: Avoid having any variable or hardware written by both ISR and non-ISR code. Variables should be written by one and read by the other.
Rule 2: When it is impossible to avoid shared variables or hardware, identify and protect all non-ISR code that needs to be indivisible.

But You Can Follow The Rules And Still Have A Problem: The following non-ISR pseudocode uses a variable, X:

Read X
If X = 1, do something
Read X
If X = 2, do something else
Read X
If X = 3, do a third thing

The intent is that only one of the three things will be performed on each pass through this section of code. However, if the ISR modifies X, an interrupt during this sequence can cause bizarre and apparently impossible results, since more than one of the actions can be performed in a single pass through the code. This scenario is not as unrealistic as it may appear; the 8031 microcontroller, for example, does not have a compare instruction. To perform the comparison as shown, the variable to be tested must be placed in the accumulator register and an XOR must be executed. This alters the contents of the accumulator, so it must be reloaded for the next comparison.
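The hazard, and the read-once remedy, can be simulated in C. In this sketch `interrupt_point()` marks where an interrupt could arrive and calls a hook that plays the role of the ISR, and `record_action()` stands in for the three actions; all names are hypothetical.

```c
#include <stdint.h>

static volatile uint8_t x;       /* written by an ISR */
static int actions_taken;        /* actions performed in one pass */

static void record_action(void) { actions_taken++; }

/* Hook standing in for an interrupt that can arrive between reads. */
static void (*simulated_interrupt)(void);
static void interrupt_point(void)
{
    if (simulated_interrupt)
        simulated_interrupt();
}

/* Example ISR: advances X each time it runs. */
static void isr_bump_x(void) { x++; }

/* Re-reads X for every test: an interrupt between tests can make
   more than one branch run in a single pass. */
void dispatch_rereads(void)
{
    if (x == 1) record_action();
    interrupt_point();
    if (x == 2) record_action();
    interrupt_point();
    if (x == 3) record_action();
}

/* Reads X once and tests the copy: at most one branch runs per pass. */
void dispatch_snapshot(void)
{
    uint8_t snapshot = x;
    if (snapshot == 1) record_action();
    interrupt_point();
    if (snapshot == 2) record_action();
    interrupt_point();
    if (snapshot == 3) record_action();
}
```

With X starting at 1 and the simulated ISR incrementing it at each interrupt point, `dispatch_rereads` performs all three actions in one pass, while `dispatch_snapshot` performs exactly one.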

The fix, of course, is to read X at the beginning of the code section, store it someplace, and use the copy. But this scenario leads us to rule 3:

Rule 3: Always assume that an ISR-modified variable can change between any two successive reads. Sooner or later, it will.

PROBLEMS CAUSED BY ISR DELAYS

Counter Delay: In Figure 8, a system uses a 16-bit counter for timetagging. Every so often, the software reads the count. Since the CPU is 8 bits, it must read the counter one byte at a time, using two reads to get the full 16-bit count. An unrelated interrupt that occurs between the two paired reads can cause the code to get the wrong counter value, as shown in the figure. Because of the ISR execution time, the counter rolls over from 34FF to 3500 between the two reads. The software, reading one byte before the rollover and one byte after, gets a value of 3400 or 35FF (depending on which byte is read first). The fixes for this are straightforward:

- Read the count twice to verify a good read.
- Protect counter reads with disable/enable pairs.
- Use a counter with a separate 'freeze' register.

Figure 8 ISR Execution Time. The timer advances through 34FD, 34FE, 34FF, 3500, 3501; the CPU starts reading the timer, an unrelated interrupt occurs, and after the ISR service time the CPU completes the read, getting a count of 3400 or 35FF.
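One common variant of the read-twice fix reads the high byte, then the low byte, then the high byte again, and retries if the high byte changed. The sketch below simulates it in C: `sim_counter` stands in for the hardware counter, the byte-read functions are hypothetical, and `pending_ticks` models time that passes during an unrelated ISR right after a high-byte read. It assumes the counter cannot advance by 256 or more counts during one read sequence.

```c
#include <stdint.h>

static volatile uint16_t sim_counter;  /* simulated free-running 16-bit counter */
static int pending_ticks;              /* counts that elapse during a simulated ISR */

/* Hypothetical byte-wide counter reads. An unrelated ISR is simulated
   as running immediately after the high byte is read. */
static uint8_t timer_read_high(void)
{
    uint8_t v = (uint8_t)(sim_counter >> 8);
    sim_counter = (uint16_t)(sim_counter + pending_ticks);
    pending_ticks = 0;
    return v;
}

static uint8_t timer_read_low(void)
{
    return (uint8_t)(sim_counter & 0xFFu);
}

/* Unprotected read: returns 3400 if the counter rolls from 34FF to 3500
   between the two byte reads. */
uint16_t read_timer_naive(void)
{
    uint8_t hi = timer_read_high();
    uint8_t lo = timer_read_low();
    return (uint16_t)((hi << 8) | lo);
}

/* Verified read: if the high byte is the same before and after the low
   byte is read, no carry occurred in between and the pair is consistent. */
uint16_t read_timer(void)
{
    uint8_t hi, lo, hi_again;
    do {
        hi = timer_read_high();
        lo = timer_read_low();
        hi_again = timer_read_high();
    } while (hi != hi_again);
    return (uint16_t)((hi << 8) | lo);
}
```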

This scenario leads us to rule 4:

Rule 4: The real world keeps happening while interrupts are being serviced.

Interrupt Stackup: Figure 9 shows a system with three interrupts and three ISRs. The code is executing in the background when interrupt 1 occurs. During execution of ISR 1, interrupt 2 occurs, so as soon as ISR 1 is finished, ISR 2 is executed. During ISR 2, interrupt 3 occurs, so ISR 2 is followed by ISR 3. The background code stops executing for the sum of the times it takes ISR 1, ISR 2, and ISR 3 to execute. Note that the interrupts do not have to occur simultaneously to stack up and appear simultaneous to the non-ISR code. This leads to rule 5:

Rule 5: In any system with more than one active interrupt, sooner or later, the interrupts will stack up. Count on it.

Figure 9 Interrupt Stackup. The background code is executing when interrupt 1 occurs; interrupt 2 occurs during ISR 1 and is serviced after it; interrupt 3 occurs during ISR 2 and is serviced after it. The background stops executing for the combined duration of ISR 1, ISR 2, and ISR 3.

To minimize the impact of interrupt execution time on the rest of the system:

Rule 6: Keep ISRs as short and simple as possible.

For example, in a system that processes commands from a host PC via a serial port, don't put the command processor in the ISR that services the UART receive interrupts. Instead, let the ISR buffer the incoming data in a software FIFO, and let the background code perform the command processing.

Interrupt Latency: Interrupt latency may be loosely defined as the time from when an interrupt occurs until it is serviced. This time can vary for a number of reasons. First, the CPU will always finish the current instruction before servicing an interrupt, and some instructions take longer than others. Second, the CPU may be executing a sequence of instructions protected by a disable/enable pair. Third, the CPU may be executing another ISR, which often runs with interrupts disabled. In Figure 10, a timer generates an interrupt when it times out, and the ISR reloads and restarts the timer. The intent is to generate regular timer events, but the varying latency pushes the actual timer ticks later in time, and because each restart happens only when the ISR runs, the latencies are cumulative. This characteristic limits the repeatability of events in software using interrupts. For extremely accurate timing, the timebase must be in hardware.
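A minimal sketch of the software FIFO suggested for the UART example, in C. It assumes a single-core CPU on which an 8-bit index is read and written in one instruction; under that assumption the ISR writes only `fifo_head` and the background code writes only `fifo_tail`, so, following Rule 1, no disable/enable pair is needed. The buffer size and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 16u   /* assumed size */

static volatile uint8_t fifo_buf[FIFO_SIZE];
static volatile uint8_t fifo_head;   /* written only by the ISR */
static volatile uint8_t fifo_tail;   /* written only by background code */

/* Called from the UART receive ISR with the byte just read from the UART.
   Returns false (dropping the byte) if the FIFO is full. */
bool fifo_put_from_isr(uint8_t byte)
{
    uint8_t next = (uint8_t)((fifo_head + 1u) % FIFO_SIZE);
    if (next == fifo_tail)
        return false;               /* full: one slot is always left empty */
    fifo_buf[fifo_head] = byte;
    fifo_head = next;               /* publish only after the data is stored */
    return true;
}

/* Called from the background loop; returns false if nothing is waiting. */
bool fifo_get(uint8_t *byte)
{
    if (fifo_tail == fifo_head)
        return false;
    *byte = fifo_buf[fifo_tail];
    fifo_tail = (uint8_t)((fifo_tail + 1u) % FIFO_SIZE);
    return true;
}
```

One slot is always left empty so that a full buffer can be told apart from an empty one without a shared count variable, which would be written by both sides.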

Figure 10 Interrupt Latency. Desired timer events compared with actual timer events; each actual event is delayed by the ISR latency, and the delays accumulate.
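The cumulative effect can be worked out numerically. The sketch below assumes the ISR restarts the timer the moment it begins executing (ignoring the ISR's own run time) and compares that with a timer that reloads itself in hardware at each timeout. The function name and the per-interrupt latency inputs are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Service time of each of n timer events, given the timer period and the
   latency of each interrupt.
   sw_reload: the ISR restarts the timer, so each latency pushes all later
              events out (the errors accumulate).
   hw_reload: the timer restarts itself at each timeout, so each event is
              late only by its own latency. */
void service_times(uint32_t period, const uint32_t latency[], size_t n,
                   uint32_t sw_reload[], uint32_t hw_reload[])
{
    uint32_t restart = 0;   /* time at which the software-reloaded timer last restarted */

    for (size_t k = 0; k < n; k++) {
        uint32_t timeout = restart + period;
        sw_reload[k] = timeout + latency[k];   /* ISR runs late ... */
        restart = sw_reload[k];                /* ... and restarts the timer from there */

        hw_reload[k] = (uint32_t)(k + 1) * period + latency[k];
    }
}
```

With a period of 100 and latencies of 5, 0, and 12 time units, the software-reloaded events are serviced at 105, 205, and 317, while the hardware-reloaded events are serviced at 105, 200, and 312.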

Other Interrupt Problems: Problems not related to timing include stack overflow, caused by having insufficient stack space to support both interrupts and normal program execution. Another potential problem is subroutines shared between ISR and non-ISR code that are not reentrant. Finally, if nested interrupts are used (where an ISR can itself be interrupted by a higher priority request), remember that all the rules apply equally to the ISRs themselves.

DEBUGGING TRICKS

Measuring ISR Latency: As shown in Figure 11, a DSO in repetitive mode can be connected to a level-sensitive interrupt to measure latency. In repetitive mode, each new trace overlays the old traces instead of replacing them. If the DSO is set to trigger on assertion of the interrupt (rising edge in the figure), the display shows the range of time it takes the software to service (and clear) the interrupt. The figure shows a single case of delayed servicing that might be caused by a disable/enable pair or by another ISR.

Figure 11 Measuring interrupt latency with a DSO. The DSO is in repetitive mode, triggered on the leading edge of the interrupt input.

Measuring ISR Execution Time: The technique for measuring latency does not show the total ISR execution time, unless the interrupt is reset at the end of the ISR. ISR execution time can be measured approximately by having the software set a bit at the start of the ISR and clear it at the end. The bit can be a spare port bit on a microcontroller or PIO peripheral, or one bit of a hardware register. The bit can be connected to a DSO or a logic analyzer, or to an analog oscilloscope for a running real-time estimate of execution time.
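A sketch of the timing bit in C. `debug_port` is a hypothetical stand-in for a spare output port (a plain variable here so the code compiles anywhere), and bit 2 is an arbitrary choice. The pulse width excludes the CPU's interrupt entry and exit overhead, which is one reason the measurement is approximate.

```c
#include <stdint.h>

/* Hypothetical spare output port; on real hardware this would be a
   port register connected to the scope or logic analyzer. */
static volatile uint8_t debug_port;
#define ISR_TIMING_BIT 0x04u   /* assumed spare bit */

static volatile uint32_t ticks;

void timer_isr(void)
{
    debug_port |= ISR_TIMING_BIT;              /* rising edge: ISR entry */
    ticks++;                                   /* ... the ISR's real work ... */
    debug_port &= (uint8_t)~ISR_TIMING_BIT;    /* falling edge: ISR exit */
}
```

If any other code writes the same port, its read-modify-write of that port becomes a shared hardware problem of the kind covered by Rule 2, so a port bit used by nothing else is the safer choice.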

If a logic analyzer with timetagging capability is used, the time between the leading and trailing edges of several interrupts can be captured, stored to disk, and plotted using a spreadsheet.

Measuring ISR Throughput Usage: Sometimes it is useful to know how much of the total throughput is being spent in the ISRs. In software with a polling (or background) loop, a rough estimate can be obtained by having each ISR set a common port bit on entry. The background code resets the bit once on each pass through the background loop. Monitoring the high versus low time of the bit on an oscilloscope gives an indication of how much time is spent in the ISRs.

Debugging Interrupt Problems

Debugging embedded software is traditionally based on setting a breakpoint and looking at registers, trace buffers, and other information to determine what went wrong. Since interrupts are asynchronous and continuous, this method has definite limitations:

- The cause of a problem may occur many machine cycles before the symptom, far beyond the depth of the trace buffer.
- You often cannot single step the code, because interrupt events stack up and flood the CPU as soon as you attempt another step.
- Breakpoints are difficult to set, since the interrupts are asynchronous to the code.

In spite of this, the general debugging method for interrupts is the same as for any problem:

- Identify all the symptoms.
- Form one or more theories that fit all the facts.
- Test the theories, discarding those that are disproven.
- If it seems impossible, you've overlooked something.

Don't fall into the trap of ignoring symptoms in favor of a theory. For example, if interrupts seem to stop being serviced for a few seconds, you might theorize that the software turns interrupts off and doesn't turn them back on. However, if you notice that the heartbeat LED, which is controlled by a timer ISR, keeps blinking, then you need a new theory.
Action codes: Action codes are bytes (or words) written to an unused I/O port or memory location. The codes are captured, usually on a logic analyzer. As an alternative, the codes can be written to a rotating trace buffer in memory, where they can be examined as part of a post mortem dump. The use of action codes, and various techniques for generating them, are covered in more detail in Embedded Microprocessor Systems, Real World Design (ISBN 0-7506-9791-1), by the author. They will also be covered in the upcoming Debugging Embedded Systems (ISBN 0-7506-9990-6), also by the author.
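A minimal sketch of the rotating trace buffer alternative, in C. The buffer depth and all names are assumptions; the enum values mirror the example codes listed below (80, 81, 4x), read here as hexadecimal. On a real system `log_action` could instead write each code to an unused I/O port for a logic analyzer to capture.

```c
#include <stdint.h>

#define TRACE_DEPTH 64u   /* assumed buffer size */

/* Example action codes (values assumed hexadecimal). */
enum {
    ACT_ISR_ENTRY    = 0x80,
    ACT_ISR_EXIT     = 0x81,
    ACT_COMMAND_BASE = 0x40   /* 0x4x: command code x received from host */
};

static volatile uint8_t trace_buf[TRACE_DEPTH];
static volatile uint8_t trace_index;   /* next slot to be written */

/* Store one action code, overwriting the oldest entry once the buffer is
   full, so a post mortem dump holds the most recent history. If both ISR
   and non-ISR code log, the non-ISR call needs disable/enable protection,
   because trace_index is then written by both (Rule 2). */
void log_action(uint8_t code)
{
    uint8_t i = trace_index;
    trace_buf[i] = code;
    trace_index = (uint8_t)((i + 1u) % TRACE_DEPTH);
}

/* Example use inside a UART receive ISR (hypothetical). */
void uart_rx_isr(uint8_t command)
{
    log_action(ACT_ISR_ENTRY);
    log_action((uint8_t)(ACT_COMMAND_BASE | (command & 0x0Fu)));
    log_action(ACT_ISR_EXIT);
}
```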

Action codes give more of a macro view of software execution than the traditional trace buffer. Example codes for a very simple system might look like this:

80: Interrupt entry
81: Interrupt exit
4x: Command code x received from host

A Debugging Scenario Using Action Codes: While space does not permit inclusion here, the conference presentation will include a typical interrupt debugging scenario.

CONCLUSION

The two laws of interrupts:

Law 1: An interrupt can occur at any time, between any two machine instructions.
Law 2: All interrupts use time.

The 6 design rules:

Rule 1: Avoid having any variable or hardware written by both ISR and non-ISR code. Variables should be written by one and read by the other.
Rule 2: When it is impossible to avoid shared variables or hardware, identify and protect all non-ISR code that needs to be indivisible.
Rule 3: Always assume that an ISR-modified variable can change between any two successive reads.
Rule 4: The real world keeps happening while interrupts are being serviced.
Rule 5: In any system with more than one active interrupt, sooner or later, the interrupts will stack up.
Rule 6: Keep ISRs as short and simple as possible.
