
C6000

Architecture

Outline
CPU Architecture
Instruction Set Overview
Internal Buses & Memory
C6000 Peripherals Overview
Device Family Review

'C6000 System Block Diagram


[Block diagram: CPU and internal memories]

Let's look into the CPU...

What Problem Are We Trying To Solve?


[Figure: digital sampling of an analog signal; sampling period T = 1/fs]

Most DSP algorithms can be expressed as:


$$Y = \sum_{n=1}^{N} a_n \, x_n$$

How is the 'C6000 designed to handle this algorithm?

Sum of Products (SOP) - Example


$$Y = \sum_{n=1}^{40} a_n \, x_n$$

Let's write the code for this algorithm and develop the architecture along the way...

What are the two basic instructions required by this algorithm?
Multiply
Add
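As a point of reference, the sum of products can be sketched in plain C before mapping it onto the CPU. This is only an illustration: the function name `sum_of_products` and the use of 16-bit inputs with a 32-bit accumulator are assumptions chosen to match the later slides, and the sketch assumes the inputs are small enough that the accumulator does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products: Y = a[0]*x[0] + a[1]*x[1] + ... + a[n-1]*x[n-1].
 * Each pass through the loop needs exactly one multiply and one add.
 * Assumes the running sum fits in 32 bits. */
int32_t sum_of_products(const int16_t *a, const int16_t *x, size_t n)
{
    assert(a != NULL && x != NULL);
    int32_t y = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t prod = (int32_t)a[i] * x[i];  /* multiply */
        y += prod;                            /* add */
    }
    return y;
}
```

Calling `sum_of_products(a, x, 40)` corresponds to the 40-term example used in the following slides.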

Multiply
$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional unit .?]

        MPY        a, x, prod

Multiply (.M unit)


$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional unit .M]

        MPY   .M   a, x, prod

Note: 16-bit multiplier provides 32-bit results
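The reason a 16-bit multiplier needs a 32-bit result can be checked with a small C sketch. The helper name `mpy16` is hypothetical; it only models the arithmetic of a signed 16 x 16 multiply, not the instruction's timing or encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Models a signed 16 x 16 multiply with a 32-bit result.
 * The largest magnitude, (-32768) * (-32768) = 2^30, still fits in
 * 32 bits, whereas even 300 * 300 = 90000 would not fit in 16 bits. */
int32_t mpy16(int16_t a, int16_t x)
{
    int32_t prod = (int32_t)a * (int32_t)x;
    assert(prod >= -1073741824 && prod <= 1073741824);  /* |prod| <= 2^30 */
    return prod;
}
```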

Add
$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .M, .?]

        MPY   .M   a, x, prod
        ADD   .?   sum, prod, sum

Add (.L unit)


$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .M, .L]

        MPY   .M   a, x, prod
        ADD   .L   sum, prod, sum

Where are the variables stored?

Register File - A
Register File A (32-bit registers): A0 = a, A1 = x, A2, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .M, .L]

        MPY   .M   a, x, prod
        ADD   .L   sum, prod, sum

Specifying Register Names


Register File A (32-bit registers): A0 = a, A1 = x, A2, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .M, .L]

        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

How Do You Create the Loop?


Register File A (32-bit registers): A0 = a, A1 = x, A2, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .M, .L]

        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

Loop?

Creating a Loop
1. Add branch instruction (B) and a label
2. Create a loop counter (= 40)
3. Add an instruction to decrement the loop counter
4. Make the branch conditional based on the value
in the loop counter
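The four steps above can be sketched in C, with `goto` standing in for the branch instruction. This is an illustration only: the function name `run_counted_loop` is hypothetical, and the loop body is replaced by a simple pass counter.

```c
#include <assert.h>

/* Runs a stand-in loop body `count` times and returns the number of passes.
 * Like the assembly form, the body always runs at least once, so the
 * counter must start above zero. */
int run_counted_loop(int count)
{
    assert(count > 0);
    int counter = count;     /* step 2: create a loop counter */
    int passes = 0;
loop:                        /* step 1: the label */
    passes++;                /* stand-in for the multiply and add */
    counter--;               /* step 3: decrement the loop counter */
    if (counter != 0)        /* step 4: make the branch conditional */
        goto loop;           /* step 1: the branch */
    return passes;
}
```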

Branching (1)
Register File A (32-bit registers): A0 = a, A1 = x, A2, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .?, .M, .L]

    loop:   MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .?   loop

Branching (.S Unit)


Register File A (32-bit registers): A0 = a, A1 = x, A2, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L]

    loop:   MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .S   loop

Creating a Loop Counter (2)

MVK - MoVe a 16-bit Konstant into a register

        MVK   .S   40, A2      ; A2 = 40
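A small C model may help show what moving a 16-bit constant into a 32-bit register means. It assumes, as the CPU reference guide describes MVK, that the 16-bit constant is sign-extended to fill the register; the helper name `mvk` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models MVK: place a signed 16-bit constant in a 32-bit register,
 * sign-extending it. The constant must fit in 16 bits. */
int32_t mvk(int32_t constant)
{
    assert(constant >= -32768 && constant <= 32767);
    int16_t k = (int16_t)constant;   /* the 16-bit field in the instruction */
    return (int32_t)k;               /* sign-extended into the register */
}
```

With this model, `mvk(40)` leaves 40 in the register, and a negative constant such as -1 fills all 32 bits with ones.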

Creating a Loop Counter (2)


Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L]

            MVK   .S   40, A2
    loop:   MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            B     .S   loop

Decrementing Loop Counter (3)


Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L]

            MVK   .S   40, A2
    loop:   MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
            B     .S   loop

Conditional Instructions
To minimize branching, all instructions are conditional

        [condition]   B   loop

Execution is based on a zero/non-zero condition.

Code Syntax     Execute instruction if:
[cond]          true:    cond != 0
[!cond]         false:   cond = 0

Where condition is one of: A0*, A1, A2, B0, B1, B2

*Note: C64x and later devices allow A0 to be used as a condition
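The condition syntax can be modeled in C as a test on one of the six condition registers. This is a sketch under assumptions: the enum `CondReg` and the function `should_execute` are hypothetical names, and the model covers only the decision of whether an instruction executes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The registers the slide lists as usable conditions. */
typedef enum { A0, A1, A2, B0, B1, B2, NUM_COND_REGS } CondReg;

/* Returns true when a conditional instruction would execute.
 * negate == false models [cond]:  execute if the register is non-zero.
 * negate == true  models [!cond]: execute if the register is zero. */
bool should_execute(const int32_t regs[NUM_COND_REGS], CondReg r, bool negate)
{
    assert((int)r >= 0 && (int)r < NUM_COND_REGS);
    bool nonzero = (regs[r] != 0);
    return negate ? !nonzero : nonzero;
}
```

For the loop in this section, `[A2] B loop` corresponds to `should_execute(regs, A2, false)`: the branch is taken while the loop count in A2 is non-zero.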

Using Conditional Branch (4)


Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L]

            MVK   .S   40, A2
    loop:   MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
     [A2]   B     .S   loop

Creating a Loop
1. Add branch instruction (B) and a label
2. Create a loop counter with proper value
3. Add an instruction to decrement the loop counter
4. Make the branch conditional based on the
value in the loop counter

Loading Values Into Registers


Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, A5 = &a[n], A6 = &x[n], A7 = &Y, ..., A31

[Diagram: functional units .S, .M, .L; memory holding a[40], x[40], and Y, reached through *A5, *A6, and *A7]

How do a and x get loaded?

a, x, and Y are located in memory.

Create a pointer to each value:
        A5 = &a
        A6 = &x
        A7 = &Y

Use the pointers with load/store instructions:
        LD    *A5, A0
        LD    *A6, A1
        ST    A4, *A7
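In C terms, the pointer registers behave like ordinary pointers that are dereferenced to load and store. The sketch below is only an analogy: the function name `load_multiply_store` is hypothetical, and the result is stored as a full 32-bit value for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loads one a and one x through pointers, multiplies them, and stores
 * the result through a third pointer. Comments show the matching
 * load/store from the slide. */
void load_multiply_store(const int16_t *a_ptr, const int16_t *x_ptr,
                         int32_t *y_ptr)
{
    assert(a_ptr != NULL && x_ptr != NULL && y_ptr != NULL);
    int16_t a = *a_ptr;              /* LD  *A5, A0 */
    int16_t x = *x_ptr;              /* LD  *A6, A1 */
    int32_t y = (int32_t)a * x;      /* work happens in registers */
    *y_ptr = y;                      /* ST  A4, *A7 */
}
```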

Load/Store Options
Because the 'C6000 provides byte addressability, the instruction set supports several types of load/store instructions:

Load instructions
Instruction   Description                C Data Type          Not Supported On
LDB           Load 8-bit byte            char
LDH           Load 16-bit half-word      short
LDW           Load 32-bit word           int
LDDW          Load 64-bit double-word    double, long long    C62x

Store instructions
Instruction   Description                C Data Type          Not Supported On
STB           Store 8-bit byte           char
STH           Store 16-bit half-word     short
STW           Store 32-bit word          int
STDW          Store 64-bit double-word   double, long long    C62x, C67x
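The load widths can be illustrated with a short C sketch that reads 8, 16, 32, and 64 bits from an address. The function names mirror the mnemonics but are hypothetical helpers, fixed-width types stand in for the C data types in the table, and the sign extension of the narrow loads reflects the signed LDB and LDH forms as I understand them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each helper copies exactly as many bytes as the matching load moves. */
int32_t ldb(const void *addr)   /* 8-bit byte, sign-extended */
{
    assert(addr != NULL);
    int8_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

int32_t ldh(const void *addr)   /* 16-bit half-word, sign-extended */
{
    assert(addr != NULL);
    int16_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

int32_t ldw(const void *addr)   /* 32-bit word */
{
    assert(addr != NULL);
    int32_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

int64_t lddw(const void *addr)  /* 64-bit double-word */
{
    assert(addr != NULL);
    int64_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}
```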

If we're multiplying 16-bit numbers, which load instruction should be used?

Use LDH for Short (16x16) MPYs


Load/Store
Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, A5 = &a[n], A6 = &x[n], A7 = &Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L, .?; data memory]

            MVK   .S   40, A2
    loop:   LDH   .?   *A5, A0
            LDH   .?   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
     [A2]   B     .S   loop
            STH   .?   A4, *A7

Load/Store - .D Unit
Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, A5 = &a[n], A6 = &x[n], A7 = &Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L, .D; data memory]

            MVK   .S   40, A2
    loop:   LDH   .D   *A5, A0
            LDH   .D   *A6, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
     [A2]   B     .S   loop
            STH   .D   A4, *A7

Using Arrays ...


[Diagram: A5 holds &a and steps through a0, a1, a2, ...; A6 holds &x and steps through x0, x1, x2, ...; each pointer is post-incremented (A5++, A6++)]

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

After the first loop, A4 contains a0 * x0.

How do you access a1 and x1 on the second loop?

        LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1

The loads in the loop change from *A5 and *A6 to *A5++ and *A6++:

            MVK   .S   40, A2
    loop:   LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
     [A2]   B     .S   loop
            STH   .D   A4, *A7
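The post-increment addressing mode has a direct C counterpart in `*p++`. The sketch below models it with a hypothetical helper, `ldh_postinc`, that loads one half-word and then advances the pointer by one element (two bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models LDH *Ax++, Ay: load the half-word the pointer refers to, then
 * advance the pointer to the next half-word. */
int16_t ldh_postinc(const int16_t **ptr)
{
    assert(ptr != NULL && *ptr != NULL);
    int16_t value = **ptr;   /* load from the current address */
    *ptr += 1;               /* post-increment by one 16-bit element */
    return value;
}
```

Two successive calls on a pointer to `a` return a0 and then a1, which is how the second pass of the loop reaches the next pair of values.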

Incrementing the Pointers


Register File A (32-bit registers): A0 = a, A1 = x, A2 = loop count, A3 = prod, A4 = Y, A5 = &a[n], A6 = &x[n], A7 = &Y, ..., A31

$$Y = \sum_{n=1}^{40} a_n \, x_n$$

[Diagram: functional units .S, .M, .L, .D; data memory]

            MVK   .S   40, A2
    loop:   LDH   .D   *A5++, A0
            LDH   .D   *A6++, A1
            MPY   .M   A0, A1, A3
            ADD   .L   A4, A3, A4
            SUB   .L   A2, 1, A2
     [A2]   B     .S   loop
            STH   .D   A4, *A7

Adding Side B
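Register File A (32-bit registers): A0, A1, A2, A3, A4, ..., A31
Register File B (32-bit registers): B0, B1, B2, B3, B4, ..., B31

Functional units, side A: .S1, .M1, .L1, .D1
Functional units, side B: .S2, .M2, .L2, .D2

[Diagram: register files A and B, the eight functional units, and data memory]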

Code Review (using side A only)


$$Y = \sum_{n=1}^{40} a_n \, x_n$$

            MVK   .S1   40, A2         ; A2 = 40, loop count
    loop:   LDH   .D1   *A5++, A0      ; A0 = a(n)
            LDH   .D1   *A6++, A1      ; A1 = x(n)
            MPY   .M1   A0, A1, A3     ; A3 = a(n) * x(n)
            ADD   .L1   A3, A4, A4     ; Y = Y + A3
            SUB   .L1   A2, 1, A2      ; decrement loop count
     [A2]   B     .S1   loop           ; if A2 != 0, branch
            STH   .D1   A4, *A7        ; *A7 = Y

Note: Assume A4 previously cleared.
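To check the logic of the listing, it can be mirrored line by line in C using variables named after the registers. This is a functional sketch only: the function name `sop_side_a` is hypothetical, the model ignores instruction latencies and pipeline behavior entirely, and it returns the 32-bit accumulator rather than performing the final half-word store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register-by-register model of the side-A loop. Assumes the running
 * sum fits in 32 bits. */
int32_t sop_side_a(const int16_t *a, const int16_t *x, int16_t count)
{
    assert(a != NULL && x != NULL);
    assert(count > 0);           /* the loop body always runs once */

    int32_t A0, A1, A2, A3;
    int32_t A4 = 0;              /* "Assume A4 previously cleared" */
    const int16_t *A5 = a;       /* A5 = &a */
    const int16_t *A6 = x;       /* A6 = &x */

    A2 = count;                  /* MVK .S1 40, A2 */
    do {
        A0 = *A5++;              /* LDH .D1 *A5++, A0 */
        A1 = *A6++;              /* LDH .D1 *A6++, A1 */
        A3 = A0 * A1;            /* MPY .M1 A0, A1, A3 */
        A4 = A3 + A4;            /* ADD .L1 A3, A4, A4 */
        A2 = A2 - 1;             /* SUB .L1 A2, 1, A2 */
    } while (A2 != 0);           /* [A2] B .S1 loop */

    return A4;                   /* STH would store the low half of A4 at *A7 */
}
```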

Outline
CPU Architecture
Instruction Set Overview
Classic C6x Devices (C62x, C67x)
Introducing SIMD (C64x)
Brand New (C64x+, C674x, C66x)

Internal Buses & Memory


C6000 Peripherals Overview
Device Family Review
Exam 1

C62x Instruction Set (by category)


Arithmetic:     ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO
Logical:        AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR
Bit Mgmt:       CLR, EXT, LMBD, NORM, SET
Data Mgmt:      LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W
Program Ctrl:   B, IDLE, NOP

Note: Refer to the 'C6000 CPU Reference Guide for more details

C62x Instruction Set (by unit)


.S Unit:        ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.L Unit:        ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.D Unit:        ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
.M Unit:        MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
No Unit Used:   NOP, IDLE

C67x Superset of Fixed-Point


.S Unit
    Fixed-point:      ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
    Floating-point:   ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.L Unit
    Fixed-point:      ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
    Floating-point:   ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP

.D Unit
    Fixed-point:      ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
    Added on C67x:    ADDAD, LDDW

.M Unit
    Fixed-point:      MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
    Added on C67x:    MPYSP, MPYDP, MPYI, MPYID

No Unit Used:   NOP, IDLE

C67x+ CPU Core Enhancements


CPU Enhancements
Number of registers doubled to 64
Cross-path operand sourcing ability doubled to 2
Execution Packets can now Span Fetch Packets (for better code size!)
All changes are backwards compatible to 67x CPU

New Instructions
.S units enhanced with a floating-point adder: ADDSP, ADDDP, SUBSP, SUBDP
Along with the .L units, you can have 4 float adds/subtracts in parallel

.M units enhanced with mixed-precision multiply instructions:
    MPYSPDP     SP x DP into DP
    MPYSP2DP    SP x SP into DP
Many apps may benefit from these mixed-precision floating-point multiplies
These provide faster alternatives to the full double-precision MPYDP
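The benefit of a single-precision by single-precision multiply that produces a double-precision result can be seen in C by comparing it with an ordinary single-precision product. The helper names below are hypothetical and only model the arithmetic of the mixed-precision forms; they say nothing about instruction timing.

```c
#include <assert.h>
#include <float.h>

/* SP x SP into DP (in the style of MPYSP2DP): widen both inputs first,
 * so the product keeps bits that a float result would round away. */
double mpy_sp2dp(float a, float b)
{
    assert(sizeof(double) > sizeof(float));  /* the sketch relies on a wider double */
    return (double)a * (double)b;
}

/* SP x DP into DP (in the style of MPYSPDP). */
double mpy_spdp(float a, double b)
{
    return (double)a * b;
}

/* Plain single-precision multiply, for comparison. */
float mpy_sp(float a, float b)
{
    float p = a * b;   /* result rounded to single precision */
    return p;
}
```

For an input such as `1.0f + FLT_EPSILON` squared, the double-precision result retains a low-order term that the single-precision product drops.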

The C64x adds ...


[Block diagram: C64x CPU. Instruction fetch, instruction dispatch (with advanced instruction packing), and instruction decode feed two data paths. Side A: registers A0 - A15 and A16 - A31 with units L1, S1, M1, D1. Side B: registers B0 - B15 and B16 - B31 with units D2, M2, S2, L2. Also shown: control registers, interrupt control, and advanced emulation.]

Doubled size of register set
Dual 64-bit buses for loads/stores
Packed data processing: dual 16-bit (4000 MMACs) or quad 8-bit (8000 MMACs), which is great for imaging applications
Increased code density
100% object code compatible with C62x
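Packed data processing treats one 32-bit register as two independent 16-bit values (or four 8-bit values). The C sketch below illustrates the dual 16-bit case with helpers written in the style of ADD2 and DOTP2; the function names and the exact edge-case behavior are assumptions for illustration, not a definition of the instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in the upper half). */
uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
}

/* ADD2-style: add the two halves independently, so no carry crosses
 * from the lower half into the upper half. */
uint32_t add2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* DOTP2-style: multiply the matching signed halves and sum the products,
 * giving two multiply-accumulates from one operation. */
int32_t dotp2(uint32_t a, uint32_t b)
{
    int64_t sum = (int64_t)sext16(a >> 16) * sext16(b >> 16)
                + (int64_t)sext16(a) * sext16(b);
    assert(sum >= INT32_MIN && sum <= INT32_MAX);  /* result must fit in 32 bits */
    return (int32_t)sum;
}
```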

'C64x: Superset of C62x


.S
    Dual/Quad Arith:   SADD2, SADDUS2, SADD4
    Bitwise Logical:   ANDN
    Shifts & Merge:    SHR2, SHRU2, SHLMB, SHRMB
    Data Pack/Un:      PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4
    Compares:          CMPEQ2, CMPEQ4, CMPGT2, CMPGT4
    Branches/PC:       BDEC, BPOS, BNOP, ADDKPC

.L
    Dual/Quad Arith:   ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
    Bitwise Logical:   ANDN
    Shift & Merge:     SHLMB, SHRMB
    Load Constant:     MVK (5-bit)
    Data Pack/Un:      PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4

.D
    Dual Arithmetic:   ADD2, SUB2
    Bitwise Logical:   AND, ANDN, OR, XOR
    Mem Access:        LDDW, LDNW, LDNDW, STDW, STNW, STNDW
    Load Constant:     MVK (5-bit)
    Address Calc.:     ADDAD

.M
    Average:           AVG2, AVG4
    Shifts:            ROTL, SSHVL, SSHVR
    Multiplies:        MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4
    Bit Operations:    BITC4, BITR, DEAL, SHFL
    Move:              MVD