
Why ARM?

 As of 2012, about 98% of the more than one
billion mobile phones sold each year use at least
one ARM processor.
 As of 2015, ARM processors account for
approximately 95% of all embedded 32-bit RISC
processors.
History
 ARM was developed at Acorn Computers Limited
of Cambridge, England between 1983 and 1985
 RISC concept introduced in 1980 at Stanford
and Berkeley
 ARM Limited founded in 1990
 ARM Cores
 Licensed to partners to develop and fabricate
new micro-controllers
 Soft-core
ARM Architecture
 Based upon RISC Architecture with enhancements
to meet requirements of embedded applications
 A large uniform register file
 Load-store architecture, where data processing
operations operate on register contents only
 Uniform and fixed length instructions
 32-bit processor
 Instructions are 32 bits long
 Good Speed/Power Consumption Ratio
 High Code Density
Enhancement to Basic RISC Features
 Variable cycle execution for certain instructions
 load-store-multiple instructions
 Inline barrel shifter leading to more complex
instructions
 Preprocessing one of the input registers before use
 Thumb 16-bit instruction set
 Code density improved by 30% over 32-bit instructions
 Enhanced DSP instructions
 Supports fast 16x16 multiply operations
Enhancement to Basic RISC Features (2)
 Auto-increment and auto-decrement addressing
modes to optimize program loops
 Load and Store Multiple instructions to maximize
data throughput
 Conditional Execution of instruction to maximize
execution throughput
ARM Architecture Versions
 Version 1 (1983-85)
 26 bit addressing, no multiply or co-processor
 Version 2
 Includes 32-bit result multiply and co-processor support
 Version 3
 32 bit addressing
 Version 4
 Add signed, unsigned half-word and signed byte load and store
instructions
 Version 4T
 16-bit Thumb compressed form of instruction introduced
ARM Architecture Versions (2)
 Version 5T
 Superset of 4T adding new instructions
 Version 5TE
 Adds the enhanced DSP (signal processing) instruction set extension
 Examples:
 ARM 6: v3
 ARM 7: v3, ARM7TDMI: v4T
 StrongARM: v4
 ARM 9E-S: v5TE
Overview: Core Data Path
 Data items are placed in register file
 No data processing instructions directly
manipulate data in memory
 Instructions typically use two source registers and
a single result (destination) register
 A Barrel shifter on the data path can pre-process
data before it enters ALU
 Increment/decrement logic can update register
content for sequential access independent of ALU
Basic ARM Organization
Registers
 General Purpose registers hold either
data or address
 All registers are of 32 bits
 In user mode 16 data registers and 2
status registers are visible
 Data registers: r0 to r15
 Three registers r13, r14, r15 perform
special functions
 r13: stack pointer
 r14: link register
 r15: program counter
Registers (2)
 Depending upon context, registers r13 and r14 can
also be used as GPR
 Any instruction that uses r0 can equally be used
with any other GPR (r1-r13); the register set is orthogonal
 In addition, there are two status registers
 CPSR: current program status register
 SPSR: saved program status register
Status Registers
 CPSR: monitors and controls internal operations
CPSR: Example
ARM Status Bits
 Arithmetic, logical, and shift operations set the
CPSR condition bits when they carry the S suffix:
 N (negative), Z (zero), C (carry), V
(overflow).
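A minimal Python sketch of how the four condition bits could be derived for a flag-setting 32-bit addition. The helper name `adds` and the dictionary layout are illustrative assumptions, not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Model a flag-setting 32-bit add; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded Python integer sum
    result = full & MASK32            # value written to the destination
    flags = {
        "N": result >> 31,            # bit 31 of the result
        "Z": int(result == 0),        # result is zero
        "C": int(full > MASK32),      # unsigned carry out of bit 31
        # signed overflow: both operands differ in sign from the result
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags
```

For example, `adds(0x7FFFFFFF, 1)` yields `0x80000000` with N and V set, while `adds(0xFFFFFFFF, 1)` yields 0 with Z and C set.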
Processor Modes
 Processor modes determine
 Which registers are active, and
 Access rights to CPSR register itself
 Each processor mode is either
 Privileged: full read-write access to the CPSR
 Non-privileged: read-only access to the
control field of CPSR but read-write access to
the condition flags
Processor Modes (2)
 ARM has seven modes
 Privileged: abort, fast interrupt request,
interrupt request, supervisor, system and
undefined
 Non-privileged: user
 User mode is used for programs and
applications
Privileged Modes
 Abort
 when there is a failed attempt to access
memory
 Fast interrupt request (FIQ) and interrupt request (IRQ)
 correspond to the two interrupt levels available on
ARM
 Supervisor mode
 state after reset and generally the mode in
which OS kernel executes
Privileged Modes (2)
 System mode
 special version of user mode that allows full
read-write access of CPSR
 Undefined
 when processor encounters an undefined
instruction
Banked Registers
 The register file contains 37 registers in all
 20 registers are hidden from the program at
different times
 These registers are called banked registers
 Banked registers are available only when the
processor is in a particular mode
 Processor modes (other than system mode) have a
set of associated banked registers that are a subset of
the main 16 registers
 Each banked register maps one-to-one onto a user mode register
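The register counts above can be checked with a small Python model. The banking table follows the classic ARM layout (FIQ banks r8-r14, the other exception modes bank r13-r14); the names `BANKED`, `physical_register`, and `total_registers` are hypothetical and used only for illustration.

```python
USER_REGS = [f"r{i}" for i in range(16)]

# Registers that each mode replaces with its own private (banked) copy.
BANKED = {
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14"],
    "irq": ["r13", "r14"],
    "svc": ["r13", "r14"],
    "abt": ["r13", "r14"],
    "und": ["r13", "r14"],
}

def physical_register(mode, name):
    """Return the physical register selected by `name` in `mode`."""
    if name in BANKED.get(mode, []):
        return f"{name}_{mode}"       # banked copy, e.g. r13_irq
    return name                       # shared with user mode

def total_registers():
    banked_data = sum(len(regs) for regs in BANKED.values())  # 15
    spsrs = len(BANKED)                                       # one SPSR per exception mode
    return len(USER_REGS) + 1 + banked_data + spsrs           # +1 for the CPSR
```

The 15 banked data registers plus 5 SPSRs give the 20 hidden registers; adding the 16 user registers and the CPSR gives 37.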
Register Banking
SPSR
 Each privileged mode (except system mode) has
an associated Saved Program Status Register,
or SPSR
 The SPSR saves the state of the CPSR
(Current Program Status Register) when the
privileged mode is entered, so that the user
state can be fully restored when the user process is
resumed
Mode Changing
 Mode changes by writing directly to CPSR
or by hardware when the processor
responds to exception or interrupt
 To return to user mode a special return
instruction is used that instructs the core to
restore the original CPSR and banked
registers
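A simplified Python sketch of a hardware mode change on exception entry and the matching return. It assumes the standard 5-bit mode field in CPSR[4:0] and deliberately ignores interrupt masking, the Thumb bit, and return-address adjustment; the class `Core` and its methods are hypothetical.

```python
MODE_BITS = {
    "usr": 0b10000, "fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
    "abt": 0b10111, "und": 0b11011, "sys": 0b11111,
}

class Core:
    def __init__(self):
        self.cpsr = MODE_BITS["usr"]
        self.spsr = {}    # saved CPSR, one slot per exception mode
        self.lr = {}      # banked link register, one slot per exception mode
        self.pc = 0

    def enter_exception(self, mode, return_address, vector):
        self.spsr[mode] = self.cpsr                          # save current state
        self.lr[mode] = return_address                       # banked r14 holds return address
        self.cpsr = (self.cpsr & ~0x1F) | MODE_BITS[mode]    # switch mode bits only
        self.pc = vector                                     # jump to the handler

    def return_from_exception(self, mode):
        self.cpsr = self.spsr[mode]                          # restore original CPSR (and mode)
        self.pc = self.lr[mode]                              # resume interrupted code
```

Restoring the CPSR from the SPSR switches the mode back, which in turn makes the original banked registers visible again.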
Instructions
 Instructions process data held in registers and
access memory with load and store instructions
 Classes of instructions:
 Data processing
 Branch instructions
 Load-store instructions
 Software interrupt instructions
 Program status register instructions
Features of ARM instruction set
 3-address data processing instructions
 Conditional execution of every instruction
 Load and store multiple registers
 Shift, ALU operation in a single instruction
ARM data instructions
 Basic format:
 ADD r0,r1,r2
 Computes r1+r2, stores in r0.
 Immediate operand:
 ADD r0,r1,#2
 Computes r1+2, stores in r0.
Data Processing
 Manipulate data within registers
 MOVE instructions
 Arithmetic instructions
 Logical instructions
 Comparison instructions
 Suffix S on data processing instructions
updates flags in CPSR
Data Processing Instructions
 Operands are 32-bit wide; come from
registers or specified as literal (immediate
operands) in the instruction itself
 Second operand sent to ALU via barrel
shifter
 32-bit result placed in register; long
multiply instruction produces 64 bit result
Move instruction
 MOV Rd, N
 Rd: destination register
 N: can be an immediate value or source register
 Example: mov r7, r5

 MVN Rd, N
 Moves the bitwise NOT (inverse) of the 32-bit source value
into Rd
Using Barrel Shifter
 Enables shifting 32-bit operand in one of the source
registers left or right by a specific number of
positions
 Basic Barrel shifter operations
 Shift left, shift right, rotate right
 Facilitates fast multiplication and division by powers of two and
increases code density
 Example: MOV r7, r5, LSL #2
 Multiplies the content of r5 by 4 and puts the result in r7
Barrel Shift Instructions
 LSL, LSR : logical shift left/right
 fills vacated bits with zeroes.
 ASL, ASR : arithmetic shift left/right
 ASL is the same operation as LSL; ASR fills vacated
bits with copies of the sign bit.
 ROR : rotate right
 RRX : rotate right extended with C
 performs a 33-bit rotate, treating the C bit from CPSR
as a 33rd bit above the sign bit.
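The shift operations can be modelled on 32-bit values in Python as a rough sketch. It assumes shift amounts between 0 and 31; the function names mirror the mnemonics but are otherwise illustrative.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    return (x << n) & MASK32              # zeroes enter from the right

def lsr(x, n):
    return (x & MASK32) >> n              # zeroes enter from the left

def asr(x, n):
    x &= MASK32
    if x & 0x80000000:                    # negative: reinterpret as signed
        x -= 1 << 32
    return (x >> n) & MASK32              # sign bit is replicated

def ror(x, n):
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """Rotate right by one through carry; return (result, new_carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1
```

Here `asr(0x80000000, 31)` gives `0xFFFFFFFF`, whereas `lsr(0x80000000, 31)` gives 1, which shows the difference between the arithmetic and logical right shifts.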
Barrel Shift with Carry
Arithmetic Instructions
 Implements 32 bit addition and subtraction
 3-operand form

 Examples
SUB r0, r1, r2
Subtracts the value stored in r2 from that of r1 and stores the result in r0
SUBS r1, r1, #1
Subtracts 1 from r1, stores the result in r1, and updates the
condition flags (N, Z, C, V)
Arithmetic Instructions (2)
 ADD
 add
 SUB
 subtract
 MUL, MLA
 multiply (and accumulate)
Multiply Instructions
 Multiply contents of a pair of registers
 Long multiply generates 64 bit result
 Examples:
 MUL r0, r1, r2
 Contents of r1 and r2 multiplied and put in r0
 UMULL r0, r1, r2, r3
 Unsigned multiply of r2 and r3; the 64-bit result is stored
in r0 (low word) and r1 (high word)
Multiply and Accumulate
 Result of multiplication can be accumulated with
content of another register
 MLA Rd, Rm, Rs, Rn
 Rd = (Rm * Rs) + Rn
 UMLAL Rdlo, Rdhi, Rm, Rs
 [Rdhi, Rdlo] = [Rdhi, Rdlo] + (Rm * Rs)
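A short Python sketch of the multiply and multiply-accumulate results, assuming unsigned 32-bit register values. The function names follow the mnemonics; returning the low and high words as a tuple is an illustrative choice.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul(rm, rs):
    return (rm * rs) & MASK32                 # only the low 32 bits are kept

def mla(rm, rs, rn):
    return (rm * rs + rn) & MASK32            # Rd = (Rm * Rs) + Rn

def umull(rm, rs):
    product = (rm & MASK32) * (rs & MASK32)   # full 64-bit unsigned product
    return product & MASK32, product >> 32    # (RdLo, RdHi)

def umlal(rdlo, rdhi, rm, rs):
    acc = ((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32)
    acc &= MASK64                             # accumulator wraps at 64 bits
    return acc & MASK32, acc >> 32            # (RdLo, RdHi)
```

For instance, `umull(0xFFFFFFFF, 0xFFFFFFFF)` returns `(0x00000001, 0xFFFFFFFE)`, a result that would not fit in a single 32-bit register.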
Logical Instructions
 Bit-wise logical operations on the two source
registers
 Operators: AND, ORR (OR), EOR (Ex-OR), BIC (bit clear)
 Example: BIC r0, r1, r2
 r2 contains a binary pattern where every binary 1 in r2
clears the corresponding bit of the value read from r1; the
result is written to r0
 Useful in manipulating status flags and interrupt masks
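As a rough illustration, BIC can be modelled in Python as an AND with the inverted mask. The helper name `bic` is an assumption; the second usage line assumes the usual CPSR layout in which bit 7 is the IRQ disable bit.

```python
MASK32 = 0xFFFFFFFF

def bic(rn, rm):
    """Return rn with every bit that is 1 in rm cleared."""
    return rn & ~rm & MASK32

# Every 1 in the mask clears the matching bit of the first operand.
cleared = bic(0b1111, 0b0101)        # 0b1010

# Clearing bit 7 of a CPSR-like value leaves the other bits untouched.
irq_enabled = bic(0xD3, 0x80)        # 0x53
```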
With Barrel Shifter
 Use of barrel shifter with arithmetic and logical
instructions increases the set of possible available
operations

 Example:
 ADD r0, r1, r1, LSL #1
 Register r1 is shifted to the left by 1, then it is added
to r1 and the result (3 times r1) is stored in r0.
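The shift-and-add idiom can be sketched in Python to show how small constant multiplications fall out of a single instruction. The helper `add_shifted` is hypothetical and models an ADD whose second operand passes through LSL.

```python
MASK32 = 0xFFFFFFFF

def add_shifted(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift on 32-bit values."""
    return (rn + ((rm << shift) & MASK32)) & MASK32

r1 = 7
times3 = add_shifted(r1, r1, 1)      # r1 + 2*r1
times5 = add_shifted(r1, r1, 2)      # r1 + 4*r1
```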
Compare Instructions
 Enables comparison of 32 bit values
 Updates the CPSR flags but does not affect other registers

 Examples
 CMP r0, r9
 Flags set as a result of r0 - r9
 TEQ r0, r9
 Flags set as a result of r0 EOR r9
 TST r0, r9
 Flags set as a result of r0 AND r9
Compare Instructions
 CMP : compare
 TST : bit-wise test
 TEQ : test equivalence (bit-wise XOR)

These instructions set only the NZCV bits of CPSR.
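A minimal Python sketch of the flag results for the three comparison instructions. It is a simplification: for TST and TEQ only N and Z are modelled here, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """Flags from rn - rm; the difference itself is discarded."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(rn >= rm),                        # set when no borrow occurs
        "V": ((rn ^ rm) & (rn ^ result)) >> 31,    # signed overflow
    }

def tst_flags(rn, rm):
    result = rn & rm & MASK32                      # bit-wise AND
    return {"N": result >> 31, "Z": int(result == 0)}

def teq_flags(rn, rm):
    result = (rn ^ rm) & MASK32                    # bit-wise exclusive OR
    return {"N": result >> 31, "Z": int(result == 0)}
```

Equal operands give Z = 1 for both `cmp_flags` and `teq_flags`, which is why either can guard an EQ-conditional instruction.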


Load-Store Instructions
 Transfers data between memory and processor
registers
 Single register transfer
 Data types supported are signed and unsigned words (32
bits), half-words, bytes
 Multiple-register transfer
 Transfer multiple registers between memory and the
processor in a single instruction
 Swap
 Swaps content of a memory location with the contents of a
register
Single Transfer Instructions
 Load & Store data
 LDR, LDRH, LDRB:
 Load (word, half-word, byte)
 STR, STRH, STRB
 Store (word, half-word, byte)
 Supports different addressing modes:
 3 primary addressing modes
 Preindex with writeback, Preindex, Postindex
 Almost 9 derived addressing modes
 Immediate, Register, Scaled register, …
Addressing Modes (1)
 Preindex with writeback
 LDR r0, [r1, #4]!
 Updates the address base register with new address
Addressing Modes (2)
 Preindex (Immediate Offset)
 LDR r0, [r1, #4]
 12-bit offset added to the base register
Addressing Modes (3)
 Postindex
 LDR r0, [r1], #4
 Updates the address register after address is used
Example (1)
 Initial:
r0 = 0x00000000
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202
 Preindexing with writeback: LDR r0, [r1, #4]!
r0 = 0x02020202
r1 = 0x00009004
 Preindexing: LDR r0, [r1, #4]
r0 = 0x02020202
r1 = 0x00009000
Example (2)
 Initial:
r0 = 0x00000000
r1 = 0x00009000
mem32 [0x00009000] = 0x01010101
mem32 [0x00009004] = 0x02020202

 Postindexing: LDR r0, [r1], #4


r0 = 0x01010101
r1 = 0x00009004
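The two worked examples can be replayed with a small Python model of the three primary addressing modes. Registers and memory are plain dictionaries, and the helper names are assumptions made for this sketch.

```python
def initial_state():
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    return regs, mem

def ldr_preindex(regs, mem, rd, rn, offset, writeback=False):
    addr = regs[rn] + offset          # offset applied before the access
    regs[rd] = mem[addr]
    if writeback:                     # the "!" form updates the base register
        regs[rn] = addr

def ldr_postindex(regs, mem, rd, rn, offset):
    addr = regs[rn]                   # access uses the unmodified base
    regs[rd] = mem[addr]
    regs[rn] = addr + offset          # base is always updated afterwards

regs, mem = initial_state()
ldr_preindex(regs, mem, "r0", "r1", 4, writeback=True)   # LDR r0, [r1, #4]!
```

After the last line, `regs` holds r0 = 0x02020202 and r1 = 0x00009004, matching the preindex-with-writeback case in Example (1).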
Derived Addressing Modes
 Register indirect: LDR r0, [r1]
 Register operation: LDR r0, [r1, -r2]
 Calculated Address uses base register and another
register
 Scaled: LDR r0, [r1, r2, LSL #2]
 Address is calculated using the base address register
and a barrel shift operation
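The effective address for each derived mode can be sketched in Python as follows. The function names are hypothetical, and the scaled form assumes an LSL shift as in the example above.

```python
MASK32 = 0xFFFFFFFF

def ea_register_indirect(base):
    return base & MASK32                              # [r1]

def ea_register_offset(base, index, subtract=False):
    if subtract:
        return (base - index) & MASK32                # [r1, -r2]
    return (base + index) & MASK32                    # [r1, r2]

def ea_scaled(base, index, shift):
    return (base + ((index << shift) & MASK32)) & MASK32   # [r1, r2, LSL #shift]

# Word array at 0x9000: element 3 lives at base + 3*4.
element_addr = ea_scaled(0x9000, 3, 2)
```

The scaled form is what makes indexing an array of words a single instruction: the index is multiplied by 4 in the barrel shifter as part of the address calculation.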
Example: C assignments
 C:
x = (a + b) - c;
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
Example: C assignment
 C:
y = a*(b+c);
 Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y
Example: C assignment
 C:
z = (a << 2) | (b & 15);
 Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
Control Flow Instructions
 Branch Instructions
 Conditional Branches
 Conditional Execution
 Branch and Link Instructions
 Subroutine Return Instructions
Branch Instruction
 Branch instruction: B label
 Example: B forward
 Address label is stored in the instruction as a signed
pc-relative offset
 Conditional Branch: B<cond> label
 Example: BNE loop
 The branch has a condition associated with it and is
taken only if the condition codes have the required values
Conditional Execution
 An unusual feature of the ARM instruction set is that
conditional execution applies not only to branches
but to all ARM instructions
 Example: ADDEQ r0, r1, r2
 The instruction is executed only when the zero flag is
set to 1
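A Python sketch of how a condition suffix gates an instruction. The table lists the standard ARM condition codes evaluated against the N, Z, C, V flags; the helper `add_cond` and the dictionary-based register file are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF

CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}

def add_cond(cond, flags, regs, rd, rn, rm):
    """Model ADD<cond> rd, rn, rm; return True if the add was executed."""
    if not CONDITIONS[cond](flags):
        return False                          # behaves like a no-op
    regs[rd] = (regs[rn] + regs[rm]) & MASK32
    return True

regs = {"r0": 0, "r1": 2, "r2": 3}
add_cond("EQ", {"N": 0, "Z": 1, "C": 0, "V": 0}, regs, "r0", "r1", "r2")
```

With Z set, the ADDEQ above writes 5 into r0; with Z clear, r0 would be left unchanged and no branch around the instruction is needed.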
Thumb
 Thumb encodes a subset of the 32 bit instruction set
into a 16-bit subspace
 Thumb has higher performance than ARM on a
processor with a 16-bit data bus
 Thumb has higher code density
 Suits memory-constrained embedded systems
 On average, a Thumb implementation takes 30% less
memory than the equivalent ARM implementation.
(source: ARM System Developer’s Guide)
Thumb Instruction Decoding
 Each Thumb instruction is related to a 32-bit ARM
instruction.
ARMv5E Extensions
 Extensions to facilitate signal processing
operations

 Supports
 Signed multiply accumulate instruction
 Greater flexibility and efficiency when
manipulating 16 bit values for applications
such as 16 bit digital audio processing.
Summary
 We have studied the instruction set of ARM
processors
 We discussed the use of barrel shifters
 We studied various addressing modes
 We have examined Thumb mode of
operation
