
Tensilica Xtensa

Tuan Huynh, Kevin Peek & Paul Shumate


CS 451 - Advanced Processor Architecture, November 15, 2005

Overview

Background
Changes in progress from Xtensa to Xtensa LX
Automated Development Process
ISA
TIE Language
Benchmarks

Tensilica

Founded in 1997 in Santa Clara, California by a group of engineers from Intel, SGI, MIPS, and Synopsys to compete with ARC
Goal: To address application-specific microprocessor cores and software development tools by designing the first configurable and extensible processor core

Why?

Embedded application problems with high-cost custom designs or low-performance (inefficient) processors
System on a Chip (SoC) challenge
  Traditionally solved using hardwired RTL blocks

The Problem with RTL

Rapidly increasing number of transistors requires more RTL blocks on chip
Hardcoded RTL blocks are not flexible
Hand-optimized for application-specific purposes

Tensilica's Solution

Xtensa
  Focusing on design through the processor, and not through hardwired RTL

Xtensa

First appearing in 1999
32-bit microprocessor core with a graphical configuration interface and integrated tool chain
Designed from the start to be user customizable
Emphasizes instruction-set configurability as its primary feature distinguishing it from other core offerings
Has revolutionized the System on a Chip (SoC) challenge throughout its development
Configurable and Extensible

Xtensa - In a Nutshell

Enables embedded system designers to build better, more highly integrated products in significantly less time
Can add specialized functions or instructions to the processor and have them recognized as "native" by the entire software development tool chain
Move to a higher level of abstraction by designing with processors rather than RTL

Xtensa - Deliverables

Provided as synthesizable RTL cores
  Gate count range: 25,000 to 150,000+
  Increase in gates as customer adds instructions or optional features
Software development tools

Xtensa - Verification Challenges

To extensively verify the configurable processor to ensure each possible configuration will be bug free
To enable the customer to rapidly integrate the core while limiting support costs

Xtensa - Basic Architecture

78 instructions
Five-stage pipeline that supports single-cycle execution
1 load/store model
32-entry orthogonal register file
32 optional extra registers

Xtensa - Basic Architecture

Processor Configuration
  Power Usage: 200 mW, 0.25 µm, 1.5 V
  Clock Speed: 170 MHz
  Cache:
    16 KB I-cache
    16 KB D-cache
    Direct mapped
  32 Registers (32-bits)
  Extensible via use of TIE instructions
  No Floating Point Processor
  Zero overhead loops

Xtensa - ISA

Priorities used in ISA Development
  Code Size, Configurability, Processor Cost, Energy Efficiency, Scalability, Features
ISA Influences
  MIPS
  IBM Power
  Sun SPARC
  ARM Thumb
  HP Playdoh
  DSPs

Xtensa III

Developed, with Virtual IP Group, an MP3 audio decoder for Tensilica's Xtensa configurable microprocessor architecture. The decoder offers hardware extensions and optimized code for accelerating MP3 decoding
32-bit floating point processing
32x32-bit hardware multiplier
First Coprocessor interface
Vectra DSP enhancements

Xtensa IV

Used white box verification methodology for the original development
Includes 0-In Check and the CheckerWare Library made by Mentor Graphics
Could repartition instructions up until point of manufacturing
Support multiple processors in ASIC
128-bit wide local memory interface

Xtensa V

350 MHz (synthesized), as small as 18K gates (0.25 mm²)
More flexible interfaces for multiple processors
  Write-back and write-through caches
  Enhanced Xtensa Local Memory Interface
  Shared data memories
More Automation
  Xtensa C/C++ Compiler & TIE Language improvements
  XT2000 Emulation kit
World's fastest embedded core

Xtensa V - Performance Cost Timeline

Xtensa 6

Extremely fast customization path
Three major enhancements from Xtensa V
  Auto customize processor from C/C++ based algorithm using XPRES Compiler
  30% less power consumption
  Advanced security provisions in MMU-enabled configurations

Xtensa LX

"Fastest processor core ever" - Tensilica
  I/O bandwidth, compute parallelism, and low-power optimization equivalent to hand-optimized, non-programmable, RTL-designed hardware blocks
  XPRES Compiler and automated process generator
  Uses Flexible Length Instruction Xtension (FLIX)
  Ideal for:
    Embedded processor control tasks
    Compute-intensive datapath hardware tasks

Xtensa 6 Vs Xtensa LX

Processor Comparison

Option                                 Xtensa 6         Xtensa LX
Major ISA Configuration Options
  MAC16                                Yes              Yes
  MUL16 / MUL32                        Yes              Yes
  Floating Point Unit                  Yes              Yes
  Vectra LX SIMD DSP Engine            Not Available    Yes
  Linux MMU                            Yes              Not Available
Pipeline / Architecture Options
  Pipeline Stages                      5 stage          5 stage / 7 stage
  FLIX Technology                      Not Available    2- to 15-wide issue parallel execution
Processor Interface Options
  PIF and XLMI Available               Yes              Yes
  Load/Store Units                     One              One or Two
  Designer-Defined Ports and Queues    Not Available    Yes

Xtensa LX

Strongest selling point is performance
DSP operations can be encapsulated into custom instructions
High performance leads to power savings
Custom instructions target a special application

Xtensa LX Vs General Purpose

Xtensa LX - Traditional Limitations

1 operation / cycle
Load/Store overhead

Xtensa LX

Options:
  Extra load/store unit, wide interfaces, compound instructions
Up to 19 GB/sec of throughput

Xtensa LX - Highlights

Lower power usage
I/O throughput at RTL speeds
Outstanding computing performance
XPRES Compiler

Xtensa LX - Lower Power Usage

Automated the insertion of fine-grain clock gating for every functional element of the Xtensa LX processor
  This includes functions created by the designer
Direct I/O capability - like RTL

Outstanding Computing Performance

Extensible using FLIX (Flexible Length Instruction Xtensions)
  Similar to VLIW - but customizable to fit application code's needs
Significant improvement over competitors and previous Xtensa Design
  DSP instructions formed using FLIX to be recognized as native to entire development system

XPRES Compiler

Powerful synthesis tool
  Creates tailored processor descriptions
  Runs on native C/C++ code

Automated Development

Clients log into website
  Accessing Process Generator
Builds a model in RTL Verilog or VHDL
  Sends result via internet to client's site
Also receive:
  Preconfigured synthesis scripts, test benches, and software-development tools
  C/C++ compiler, linker, debugger, and instruction-set simulator already modified to match the hardware configuration
Software tools include:
  Assembler,

Automated Development

Create special instructions described and written in TIE
TIE semantics allow system to modify software-development tools
Integrates changes into processor design
Compile with synthesis tool - test - order

Xtensa LX - Basic Architecture

Processor Configuration
  Power Usage: 76 µW/MHz, 47 µW/MHz (5 and 7 stage pipeline)
  Clock Speed: 350 MHz, 400 MHz (5 and 7 stage pipeline)
  Cache:
    up to 32 KB and 1,2,3,4 way set associative cache
  64 general purpose physical registers (32-bits)
  6 special purpose registers
  Extensible via use of TIE and FLIX instructions
  Zero overhead loops

Xtensa LX Architecture

32-bit ALU
1 or 2 Load/Store Model
Registers
  32-bit general purpose register file
  32-bit program counter
  16 optional 1-bit boolean registers
  16 optional 32-bit floating point registers
  4 optional 32-bit MAC16 data registers
  Optional Vectra LX DSP registers

Xtensa LX Architecture

General Purpose AR Register File
  32 or 64 registers
  Instructions have access through a "sliding window" of 16 registers. Window can rotate by 4, 8, or 12 registers
  Register window reduces code size by limiting the number of bits for the address and eliminates the need to save and restore register files
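
The sliding window can be modeled in a few lines of C. This is a minimal sketch rather than Tensilica's implementation: the names `ar_file_t`, `window_base`, `ar_physical_index`, and `ar_rotate` are hypothetical, and the model assumes the window position is tracked in units of 4 registers and wraps around the end of the physical file.

```c
#include <stdint.h>

#define WINDOW_SIZE 16u /* registers visible to an instruction at one time */

/* Hypothetical model of the AR register file (32 or 64 physical registers). */
typedef struct {
    uint32_t regs[64];
    unsigned num_physical; /* 32 or 64 */
    unsigned window_base;  /* assumed: start of the window, in units of 4 registers */
} ar_file_t;

/* Map a window-relative register number (0..15) to a physical register index. */
unsigned ar_physical_index(const ar_file_t *f, unsigned reg)
{
    return (f->window_base * 4u + (reg % WINDOW_SIZE)) % f->num_physical;
}

/* Rotate the window by 4, 8, or 12 registers, wrapping around the file. */
void ar_rotate(ar_file_t *f, unsigned by_registers)
{
    unsigned units = f->num_physical / 4u;
    f->window_base = (f->window_base + by_registers / 4u) % units;
}
```

In this model a rotation by 8 makes the register that was number 8 before the rotation appear as number 0 afterwards, so values can be handed over without copying or saving the whole file.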

Xtensa LX Architecture

Xtensa LX Pipelining

5 or 7 Stage Pipeline Design
5 stage pipeline has stages: IF, Register Access, Execute, Data-Memory Access, and Register Writeback
5 stage pipeline accesses memory in two stages.
7 stage pipeline is an extended version of the 5 stage pipeline with extra IF and Memory Access stages.
Extra stages provide more time for memory access.
Designer can run at a higher clock speed while using slower memory to improve performance

Xtensa LX Instruction Set

ISA consists of 80 core instructions including both 16 and 24 bit instructions

Xtensa LX Instruction Set

Processor Control Instructions
  RSR, WSR, XSR
    Read Special Register, Write Special Register
    Used for saving and restoring context, processing interrupts and exceptions, controlling address translation
  RUR, WUR
    Access User Registers
    Used for coprocessor registers and registers created with TIE
  ISYNC - wait for Instruction Fetch related changes to resolve
  RSYNC - wait for Dispatch related changes to resolve
  ESYNC/DSYNC - wait for memory/data execution related changes to resolve

Xtensa LX ISA - Building Blocks

MUL32
  MUL32 adds 32 bit multiplier
MUL16 and MAC16
  MUL16 adds 16x16 bit multiplier
  MAC16 adds 16x16 bit multiplier and 40-bit accumulator
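
As a rough illustration of what a 16x16 multiplier with a 40-bit accumulator computes, here is a small C reference model. It is a sketch under stated assumptions: the function names are hypothetical, operands are treated as signed, and the accumulator is assumed to wrap in two's complement rather than saturate (the slides do not say which).

```c
#include <stdint.h>

/* 16x16 -> 32-bit signed product, as a 16x16 multiplier would produce. */
int32_t mul16(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Wrap a value into the signed 40-bit range (assumed: wraparound, no saturation). */
int64_t wrap40(int64_t x)
{
    uint64_t u = (uint64_t)x & 0xFFFFFFFFFFull;        /* keep the low 40 bits */
    if (u & 0x8000000000ull)                           /* bit 39 is the sign bit */
        return (int64_t)u - (int64_t)0x10000000000ll;  /* subtract 2^40 */
    return (int64_t)u;
}

/* One multiply-accumulate step into the 40-bit accumulator. */
int64_t mac16(int64_t acc, int16_t a, int16_t b)
{
    return wrap40(acc + mul16(a, b));
}
```

The accumulator is 8 bits wider than a 32-bit product, so under these assumptions at least 256 products can be summed before it can overflow.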

Xtensa LX ISA - Building Blocks

Floating Point Unit
  32-bit, single precision, floating-point coprocessor
Vectra LX DSP Engine
  Optimized to handle Digital Signal Processing Applications

Vectra LX DSP Engine

FLIX-based (why it is 64 bits)
Vectra LX instructions encoded in 64 bits.
  Bits 0:3 of an Xtensa instruction determine its length and format; the bits have a value of 14 to specify it is a Vectra LX instruction
  Bits 4:27 - contain either Xtensa LX core instruction or Vectra LX Load or Store instruction
  Bits 28:45 - contain either a MAC instruction or a select instruction
  Bits 46:63 - contain either ALU and shift instructions or a load and store instruction for the second Vectra LX load/store unit
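
The bit layout above can be sketched as a field extractor in C. The type and function names are hypothetical, and the sketch assumes bit 0 is the least significant bit of the 64-bit word, which the slide does not state.

```c
#include <stdint.h>

#define VECTRA_LX_FORMAT 14u /* value of bits 0:3 for a Vectra LX instruction */

typedef struct {
    unsigned format; /* bits 0:3                                               */
    uint32_t slot0;  /* bits 4:27  (24 bits): core or Vectra LX load/store     */
    uint32_t slot1;  /* bits 28:45 (18 bits): MAC or select                    */
    uint32_t slot2;  /* bits 46:63 (18 bits): ALU/shift or second load/store   */
} vectra_word_t;

/* Extract bits lo..hi (inclusive); bit 0 is assumed to be the LSB. */
static uint64_t bits(uint64_t w, unsigned lo, unsigned hi)
{
    unsigned width = hi - lo + 1u;
    uint64_t mask = (width == 64u) ? ~0ull : ((1ull << width) - 1ull);
    return (w >> lo) & mask;
}

/* Returns nonzero when the format field marks a Vectra LX instruction. */
int is_vectra_lx(uint64_t w)
{
    return bits(w, 0, 3) == VECTRA_LX_FORMAT;
}

/* Split a 64-bit instruction word into its format field and three slots. */
vectra_word_t vectra_decode(uint64_t w)
{
    vectra_word_t d;
    d.format = (unsigned)bits(w, 0, 3);
    d.slot0  = (uint32_t)bits(w, 4, 27);
    d.slot1  = (uint32_t)bits(w, 28, 45);
    d.slot2  = (uint32_t)bits(w, 46, 63);
    return d;
}
```

The three slots are 24, 18, and 18 bits wide, which together with the 4-bit format field account for all 64 bits.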

Vectra LX DSP Engine

Tensilica Instruction Extension

Method used to extend the processor's architecture and instruction set
Can be used in two ways:
  For the TIE Compiler
  For the Processor Generator

Tensilica Instruction Extension

TIE Compiler
  Generates file used to configure software development tools so that they recognize TIE Extensions
  Estimates hardware size of new instruction
  You can modify application code to take advantage of the new instruction and simulate to decide if the speed advantage is worth the hardware cost

TIE

Resembles Verilog
More concise than RTL (it omits all sequential logic, pipeline registers, and initialization sequences).
The custom instructions and registers described in TIE are part of the processor's programming model.

TIE Queues and Ports

New way to communicate with external devices
Queues: data can be sent or read through queues. A queue is defined in the TIE and the compiler generates the interface signals required for the additional port needed to connect to the queue. Logic is also automatically generated
Import-wire: processor can sample the value of an external signal
Export-state: drive an output based
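
To make the queue behavior concrete, here is a small software model of a FIFO between a producer and a consumer. It is only an illustration: it is not generated TIE interface logic, the names and the depth of 4 are assumptions, and a failed push or pop stands in for the stall a real interface would signal.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 4u /* assumed depth, for illustration only */

typedef struct {
    uint32_t data[QUEUE_DEPTH];
    unsigned head;  /* index of the next entry to pop */
    unsigned count; /* number of entries currently stored */
} tie_queue_model_t;

/* Producer side: returns false (the sender would stall) when the queue is full. */
bool queue_push(tie_queue_model_t *q, uint32_t value)
{
    if (q->count == QUEUE_DEPTH)
        return false;
    q->data[(q->head + q->count) % QUEUE_DEPTH] = value;
    q->count++;
    return true;
}

/* Consumer side: returns false (the reader would stall) when the queue is empty. */
bool queue_pop(tie_queue_model_t *q, uint32_t *value)
{
    if (q->count == 0)
        return false;
    *value = q->data[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```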

TIE

TIE combines multiple operations into one using:
  Fusion
  SIMD/Vector Transformation
  FLIX

Fusion

Allows you to combine dependent operations into a single instruction
Consider: computing the average of two arrays

    unsigned short *a, *b, *c;
    . . .
    for (i = 0; i < n; i++)
        c[i] = (a[i] + b[i]) >> 1;

Two Xtensa LX Core instructions required, in addition to load/store instructions

Fusion

Fuse the two operations into a single TIE instruction

    operation AVERAGE {out AR res, in AR input0, in AR input1} {} {
        wire [16:0] tmp = input0[15:0] + input1[15:0];
        assign res = tmp[16:1];
    }

Essentially an add feeding a shift, described using standard Verilog-like syntax

Implementing the instruction in C/C++

    #include <xtensa/tie/average.h>

    unsigned short *a, *b, *c;
    . . .
    for (i = 0; i < n; i++)
        c[i] = AVERAGE(a[i], b[i]);
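
For checking results without the TIE tool chain, the fused operation can be mirrored by a plain C reference function. This is a sketch: `average_ref` and `average_arrays` are hypothetical names, not part of any Tensilica header.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference model of the fused operation: a 17-bit add feeding a 1-bit shift,
 * i.e. res = tmp[16:1] with tmp = input0[15:0] + input1[15:0]. */
uint16_t average_ref(uint16_t input0, uint16_t input1)
{
    uint32_t tmp = (uint32_t)input0 + (uint32_t)input1; /* sum fits in 17 bits */
    return (uint16_t)(tmp >> 1);
}

/* Apply the reference model element by element, like the loop on the slide. */
void average_arrays(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = average_ref(a[i], b[i]);
}
```

Keeping the carry in bit 16 before shifting is what lets two full-scale inputs average correctly instead of overflowing.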

SIMD/Vector Transformation

Single Instruction, Multiple Data
  Fusing instructions into a "vector"
  Allows replication of the same operation multiple times in one instruction
Consider: Computing four averages in one instruction
  The following TIE code computes multiple iterations in a single instruction by combining Fusion and SIMD

    regfile VEC 64 8 v
    operation VAVERAGE {out VEC res, in VEC input0, in VEC input1} {} {
        wire [67:0] tmp = { input0[63:48] + input1[63:48],
                            input0[47:32] + input1[47:32],
                            input0[31:16] + input1[31:16],
                            input0[15:0]  + input1[15:0] };
        assign res = { tmp[67:52], tmp[50:35], tmp[33:18], tmp[16:1] };
    }

SIMD/Vector Transformation

Computing four 16-bit averages
  Each data vector must be 64 bits (4 x 16 bits)
Create new register file, new instruction
  VEC - eight 64-bit registers to hold data vectors
  VAVERAGE - takes operands from VEC, computes average, saves results into VEC

    VEC *a, *b, *c;
    for (i = 0; i < n; i += 4) {
        c[i] = VAVERAGE(a[i], b[i]);
    }

New Datatype recognized
  TIE automatically creates new load, store instructions to move 64-bit vectors between VEC register file and memory
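
The four-lane average can also be written as a portable C reference model, which is handy for comparing against the custom instruction. This is a sketch: `vaverage_ref` is a hypothetical name, and lane 0 is assumed to sit in bits 15:0 of the 64-bit vector.

```c
#include <stdint.h>

/* Reference model: four independent 16-bit averages packed in one 64-bit value.
 * Each lane is widened before the add so its carry cannot spill into the
 * neighbouring lane. */
uint64_t vaverage_ref(uint64_t input0, uint64_t input1)
{
    uint64_t res = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = lane * 16u;
        uint32_t x = (uint32_t)((input0 >> shift) & 0xFFFFu);
        uint32_t y = (uint32_t)((input1 >> shift) & 0xFFFFu);
        uint64_t avg = (uint64_t)((x + y) >> 1); /* 17-bit sum, drop the low bit */
        res |= avg << shift;
    }
    return res;
}
```

The loop runs four times in software, whereas the point of the SIMD instruction is that all four lanes are computed at once in hardware.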

FLIX

Flexible length instruction extension
  Key in extreme extensibility
  Huge performance gains possible
  Code size reduction without code bloat
Similar to VLIW
Created by XPRES Compiler

FLIX

FLIX - Usage

Used selectively when parallelism is needed
Avoids code bloat
Used seamlessly and modelessly with standard 16- and 24-bit instructions

XPRES Compiler

Powerful synthesis tool
  Creates tailored processor descriptions
  Runs on native C/C++ code
Three optimization methods
Returns optimal configurations along with pros and cons (tradeoffs)

XPRES Compiler

Analyzes C/C++ code
Generates possible configurations
Compares performance criteria to silicon size (cost)
Returns possible configurations
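
The slides do not describe how XPRES weighs performance against silicon size. One common way to present such a tradeoff is to keep only the candidates that no other candidate beats on both cycles and gates. The sketch below illustrates that idea with hypothetical types and names; it is not the XPRES algorithm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one candidate processor configuration. */
typedef struct {
    const char *name;      /* label, for illustration only */
    unsigned long cycles;  /* cycles to run the application; lower is better */
    unsigned long gates;   /* silicon size estimate; lower is better */
} candidate_t;

/* a dominates b if it is no worse on both axes and strictly better on one. */
bool dominates(const candidate_t *a, const candidate_t *b)
{
    return a->cycles <= b->cycles && a->gates <= b->gates &&
           (a->cycles < b->cycles || a->gates < b->gates);
}

/* Mark the candidates worth reporting; returns how many were kept. */
size_t pareto_front(const candidate_t *c, size_t n, bool *keep)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = true;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&c[j], &c[i])) {
                keep[i] = false;
                break;
            }
        }
        if (keep[i])
            kept++;
    }
    return kept;
}
```

Every configuration that survives is a genuine tradeoff: picking a faster one always costs more gates, which is the kind of choice left to the designer.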

XPRES Compiler - Results

Application dependent
  Compute intensive programs
  Data intensive programs
More is sometimes less
  Operation slots in FLIX

XPRES - 4 Program Test

"Bit Manipulator" program
  Cut cycles to a third

XPRES - 4 Program Test

H.264 Deblocking Filter
  6% performance improvement

XPRES - 4 Program Test

MPEG4 decoder
  23% performance increase

XPRES - 4 Program Test

SAD - sum of absolute difference
  63% performance increase

Xtensa HiFi 2 Audio Engine

Add-on package for Xtensa LX
Advantages over common audio processors:
  Better sound quality of compressed files because of increased precision available for intermediate calculations (24 bits rather than 16)
  24-bit audio fully compatible with modern audio standards

Xtensa HiFi 2 Audio Engine

Audio packages integrated into an SOC design, so no additional codec development required
Integrated Audio Packages:
  Dolby Digital AC-3 Decoder, Dolby Digital AC-3 Consumer Encoder, QSound MicroQ, MP3 Encoder/Decoder, MPEG-4 aacPlus v1 and v2 Encoder/Decoder, MPEG-2/4 AAC LC Encoder/Decoder, WMA Encoder/Decoder, AMR narrowband speech codec, AMR wideband speech codec.

Xtensa HiFi 2 Audio Engine

Uses over 300 audio-specific DSP instructions.
Features dual multiply-accumulate for 24x24 and 32x16 bit arithmetic on both units
"delivers noticeably superior sound quality even when decoding prerecorded 16-bit encoded music files."

Speed-up Example

GSM Audio Codec - written in C
Profiling code using unaltered RISC architecture showed that 80% of the processor cycles were devoted to multiplication
Simply by adding a hardware multiplier, the designer can reduce the number of cycles required from 204 million to 28 million

Speed-up Example

Viterbi butterfly instruction
  Acts like compression for the data
  Consists of 8 logical operations
  8 of these operations are used to decode each symbol in the received digital information stream
  The designer can add a Viterbi instruction to the Xtensa ISA. The extension can use the 128-bit memory bus to load data for 8 symbols at once.
  This results in an average execution time of 0.16 cycles per butterfly.
  An unaugmented Xtensa LX executes Viterbi in 42 cycles.
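
The slides do not list the eight operations in a butterfly. A textbook add-compare-select butterfly, assumed here, breaks down into four additions, two comparisons, and two selections. The sketch below uses hypothetical names and is not the custom instruction itself.

```c
#include <stdint.h>

/* Result of one butterfly: the surviving path metric for each of the two next
 * states, plus which predecessor (0 or 1) survived. */
typedef struct {
    uint32_t metric0, metric1;
    uint8_t  decision0, decision1;
} butterfly_out_t;

/* Assumed textbook add-compare-select butterfly.
 * pm0, pm1: path metrics of the two predecessor states.
 * bm0, bm1: branch metrics for the two possible transitions. */
butterfly_out_t viterbi_butterfly(uint32_t pm0, uint32_t pm1,
                                  uint32_t bm0, uint32_t bm1)
{
    butterfly_out_t out;
    uint32_t a0 = pm0 + bm0; /* predecessor 0 -> next state 0 */
    uint32_t a1 = pm1 + bm1; /* predecessor 1 -> next state 0 */
    uint32_t b0 = pm0 + bm1; /* predecessor 0 -> next state 1 */
    uint32_t b1 = pm1 + bm0; /* predecessor 1 -> next state 1 */

    out.decision0 = (uint8_t)(a1 < a0);     /* compare */
    out.metric0 = out.decision0 ? a1 : a0;  /* select the smaller metric */
    out.decision1 = (uint8_t)(b1 < b0);
    out.metric1 = out.decision1 ? b1 : b0;
    return out;
}
```

Taking the slide's figures at face value, 42 cycles against 0.16 cycles per butterfly is roughly a 260x reduction, which comes from performing these operations in one instruction and feeding them from the 128-bit memory bus.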

EEMBC Networking Benchmark

Xtensa LX received the highest benchmark score ever achieved on the Networking version 2 test.
Xtensa LX has a 4x code density advantage and a 100x advantage in both die area and power dissipation

EEMBC Networking Benchmark

Normalized (per MHz) EEMBC TCPmark
Simulates internet-enabled client-side performance

Processor                    Score
Xtensa LX Optimized          1.62434
PowerPC 760GX                0.4671
PowerPC MPC7447A             0.5856
Xtensa LX Out of the Box     0.33762

EEMBC Networking Benchmark

Normalized (by MHz) EEMBC IPmark
Simulates performance in network routers, gateways, and switches

Processor                    Score
Xtensa LX Optimized          0.82138
PowerPC 760GX                0.2861
Xtensa LX Out of the Box     0.1818
PowerPC MPC7447A             0.1751

EEMBC Networking Benchmark

Total Code Size

Processor                    Total Size of Code
Xtensa LX Optimized          65,208
Xtensa LX Out of the Box     67,256
PowerPC 760GX                255,764
PowerPC MPC7447A             280,984

How Xtensa Compares

How Xtensa Compares

How Xtensa Compares (cont)

Uses of Xtensa Products

NVIDIA - Licensed Xtensa LX
  "We were very impressed with Tensilica's automated approach for both the processor extensions and the generation of the associated software tools"

Uses of Xtensa Products

LG Cell Phone
  Phone is digital broadcast enabled
  Xtensa processor was used because it enabled LG to "cut design time significantly and be first to market with this exciting new technology."
  Terrestrial digital-multimedia-broadcast system in Korea

In case you are wondering..

Tensilica's announced licensees include Agilent, ALPS, AMCC (JNI Corporation), Astute Networks, ATI, Avision, Bay Microsystems, Berkeley Wireless Research Center, Broadcom, Cisco Systems, Conexant Systems, Cypress, Crimson Microsystems, ETRI, FUJI FILM Microdevices, Fujitsu Ltd., Hudson Soft, Hughes Network Systems, Ikanos Communications, LG Electronics, Marvell, NEC Laboratories America, NEC Corporation, NetEffect, Neterion, Nippon Telephone and Telegraph (NTT), NVIDIA, Olympus Optical Co. Ltd., sci-worx, Seiko Epson, Solid State Systems, Sony, STMicroelectronics, Stretch, TranSwitch Corporation, and Victor Company of Japan

Questions?
