ANEESH R (aneeshr2020@gmail.com)
1 Introduction
The ARM Cortex-A8 microprocessor is the first applications microprocessor in ARM's new Cortex family. With high performance and power efficiency, it targets a wide variety of mobile and consumer applications including mobile phones, set-top boxes, gaming consoles and automotive navigation/entertainment systems. The Cortex-A8 processor spans a range of performance points depending on the implementation, delivering up to 2000 Dhrystone MIPS (DMIPS) of performance for demanding consumer applications and consuming less than 300 mW for low-power mobile devices. This translates into a large increase in processing capability while staying within the power levels of previous generations of mobile devices. Consumer applications will benefit from the reduced heat dissipation and the resulting lower packaging and integration costs.
It is the first ARM processor to incorporate all of the new technologies available in the ARMv7 architecture. New technologies seen for the first time include NEON for media and signal processing and Jazelle RCT for acceleration of real-time compilers. Other recently introduced technologies that are now standard on the ARMv7 architecture include TrustZone technology for security, Thumb-2 technology for code density and the VFPv3 floating-point architecture.
An entire application can be written using space-saving Thumb-2 instructions, whereas with the original Thumb mode the processor had to switch between ARM and Thumb modes.
Making its first appearance in an ARM processor is the NEON media and signal processing technology, targeted at audio, video and 3D graphics. It is a 64/128-bit hybrid SIMD architecture. NEON technology has its own register file and execution pipeline, which are separate from the main ARM integer pipeline. It can handle both integer and single-precision floating-point values, and includes support for unaligned data accesses and easy loading of interleaved data stored in structure form. Using NEON technology to perform typical multimedia functions, the Cortex-A8 processor can decode MPEG-4 VGA video (including dering, deblock filters and yuv2rgb) at 30 frames/sec at 275 MHz, and H.264 video at 350 MHz.
Also new is Jazelle RCT technology, an architecture extension that cuts the memory footprint of just-in-time (JIT) bytecode applications to a third of their original size. The smaller code size results in a boost in performance and a reduction in power.
TrustZone technology is included in the Cortex-A8 to ensure data privacy and DRM protection in consumer products like mobile phones, personal digital assistants and set-top boxes that run open operating systems. Implemented within the processor core, TrustZone technology protects peripherals and memory against security attacks. A secure monitor within the core serves as a gatekeeper, switching the system between secure and non-secure states. In the secure state, the processor runs "trusted" code from a secure code block to handle security-sensitive tasks such as authentication and signature manipulation.
Besides contributing to the processor's signal processing performance, NEON technology enables software solutions to data processing applications. The result is a flexible platform which can accommodate new algorithms and new applications as they emerge, simply by downloading new software or a driver.
The VFPv3 technology is an enhancement to the VFPv2 technology. New features include a doubling of the number of double-precision registers to 32, and the introduction of instructions that perform conversions between fixed-point and floating-point numbers.
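To make the fixed-point/floating-point conversions mentioned above concrete, here is a minimal Python sketch of the arithmetic involved. It is an illustration only, not a description of the VFPv3 instructions themselves: the function names, the default 32-bit width, and the choice to truncate toward zero and saturate on overflow are assumptions of this sketch.

```python
def float_to_fixed(value, frac_bits, total_bits=32):
    """Convert a float to a signed fixed-point integer.

    frac_bits is the number of fractional bits. Truncation toward zero and
    saturation at the representable range are assumptions of this sketch.
    """
    scaled = int(value * (1 << frac_bits))   # int() truncates toward zero
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))  # saturate instead of wrapping


def fixed_to_float(raw, frac_bits):
    """Convert a signed fixed-point integer back to a float."""
    return raw / (1 << frac_bits)


# Example: 1.5 in a format with 16 fractional bits
raw = float_to_fixed(1.5, frac_bits=16)
restored = fixed_to_float(raw, frac_bits=16)
```

In a format with 16 fractional bits, 1.5 is stored as 1.5 × 65536 = 98304, and dividing by 65536 recovers the original value exactly because 1.5 is representable in that format.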
The Cortex-A8 processor is the most sophisticated low-power design yet produced by ARM. To achieve its high levels of performance, new microarchitecture features were added which are not traditionally found in the ARM architecture, including a dual in-order issue ARM integer pipeline, an integrated L2 cache and a deep 13-stage pipe.
The write policy for the data cache is write-back with no write allocate. Also included is a store buffer for data merging before writing to main memory. The HVAB is a novel approach to reducing the power required for accessing the caches. It uses a prediction scheme to determine which way of the RAM to enable before an access.
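The write-back, no-write-allocate behaviour described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not a model of the Cortex-A8 cache: the class name, the 64-byte line length, and the first-in-first-out eviction order are all hypothetical choices made to keep the example short.

```python
from collections import OrderedDict


class WriteBackNoAllocateCache:
    """Toy cache: write-back on write hits, no allocation on write misses."""

    def __init__(self, num_lines, line_bytes=64):
        self.num_lines = num_lines
        self.line_bytes = line_bytes      # assumed line length
        self.lines = OrderedDict()        # line number -> dirty flag
        self.memory_writes = 0            # writes that reached the next level

    def _line(self, addr):
        return addr // self.line_bytes

    def _allocate(self, line):
        if len(self.lines) >= self.num_lines:
            # Evict the oldest line (FIFO order is an assumption of this sketch).
            _, dirty = self.lines.popitem(last=False)
            if dirty:
                self.memory_writes += 1   # dirty victim is written back
        self.lines[line] = False

    def read(self, addr):
        """Return True on a hit; a read miss allocates the line."""
        line = self._line(addr)
        hit = line in self.lines
        if not hit:
            self._allocate(line)
        return hit

    def write(self, addr):
        """Return True on a hit; a write miss bypasses the cache."""
        line = self._line(addr)
        if line in self.lines:
            self.lines[line] = True       # write hit: mark dirty, defer the memory update
            return True
        self.memory_writes += 1           # write miss: forward the write, do not allocate
        return False
```

A write to an uncached address goes straight to the next level and leaves the cache unchanged, while a write to a cached line only marks it dirty; the memory update is deferred until that line is evicted.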
3.4 Level-2 cache
The Cortex-A8 processor includes an integrated Level-2 cache. This gives the Level-2 cache a dedicated low-latency, high-bandwidth interface to the Level-1 cache. This minimizes the latency of Level-1 cache linefills and does not conflict with traffic on the main system bus. It can be configured in sizes from 64 KB to 2 MB. The Level-2 cache is physically addressed and 8-way set associative. It is a unified data and instruction cache, and supports optional ECC and parity. Write-back, write-through, and write-allocate policies are followed according to page table settings. A pseudo-random allocation policy is used. The contents of the Level-1 data cache are exclusive with the Level-2 cache, whereas the contents of the Level-1 instruction cache are a subset of the Level-2 cache. The tag and data RAMs of the Level-2 cache are accessed serially for power savings.
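As a worked example of what the size range and 8-way associativity imply, the sketch below computes the number of sets and the address bits used for the line offset and set index. The 64-byte line length is an assumption of this example (the text above does not state it), and the function name is hypothetical.

```python
def l2_geometry(size_bytes, ways=8, line_bytes=64):
    """Return set count and address-bit split for a set-associative cache.

    line_bytes=64 is an assumption of this sketch, not taken from the text.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder or sets == 0 or sets & (sets - 1):
        raise ValueError("size must give a power-of-two number of sets")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # bits selecting a byte in the line
        "index_bits": sets.bit_length() - 1,         # bits selecting the set
    }


smallest = l2_geometry(64 * 1024)        # 64 KB configuration
largest = l2_geometry(2 * 1024 * 1024)   # 2 MB configuration
```

Under these assumptions the 64 KB configuration has 128 sets (7 index bits) and the 2 MB configuration has 4096 sets (12 index bits), with 6 offset bits in both cases. Because the cache is physically addressed, these bits would be taken from the physical address.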
The NEON unit is decoupled from the main ARM integer pipeline by the NEON instruction queue (NIQ). The ARM Instruction Execute Unit can issue up to two valid instructions to the NEON unit each clock cycle. NEON has 128-bit wide load and store paths to the Level-1 and Level-2 cache, and supports streaming from both. The NEON media engine has its own 10-stage pipeline that begins at the end of the ARM integer pipeline. Since all mispredicts and exceptions have been resolved in the ARM integer unit, once an instruction has been issued to the NEON media engine it must be completed, as it cannot generate exceptions.
NEON has three SIMD integer pipelines, a load-store/permute pipeline, two SIMD single-precision floating-point pipelines, and a non-pipelined Vector Floating-Point unit (VFPLite). NEON instructions are issued and retired in order. A data-processing instruction is either a NEON integer instruction or a NEON floating-point instruction. The Cortex-A8 NEON unit does not issue two data-processing instructions in parallel. This avoids the area overhead of duplicating the data-processing functional blocks, as well as the timing-critical paths and complexity overhead associated with muxing the read and write register ports.
The NEON integer datapath consists of three pipelines: an integer multiply/accumulate (MAC) pipeline, an integer shift pipeline, and an integer ALU pipeline. A load-store/permute pipeline is responsible for all NEON loads and stores, data transfers to and from the integer unit, and data permute operations such as interleave and de-interleave. The NEON floating-point (NFP) datapath has two main pipelines: a multiply pipeline and an add pipeline.
The separate VFPLite unit is a non-pipelined implementation of the ARM VFPv3 floating-point specification, targeted at medium-performance IEEE 754 compliant floating-point support. VFPLite is used to provide backwards compatibility with existing ARM floating-point code and to provide IEEE 754 compliant single- and double-precision arithmetic.
The "Lite" refers to area and performance, not functionality.
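The interleave and de-interleave permute operations handled by the load-store/permute pipeline can be pictured with a short software analogue. The sketch below is plain Python, not NEON code; it only shows the data movement that a structure load such as NEON's VLD3 performs when it splits packed three-element structures (for example RGB pixels) into one register per element. The function names are hypothetical.

```python
def deinterleave(data, stride):
    """Split interleaved data into `stride` separate channels."""
    if len(data) % stride:
        raise ValueError("data length must be a multiple of stride")
    return [data[i::stride] for i in range(stride)]


def interleave(channels):
    """Merge equal-length channels back into one interleaved sequence."""
    return [value for group in zip(*channels) for value in group]


# Two RGB pixels stored in structure form: R0 G0 B0 R1 G1 B1
pixels = [10, 20, 30, 11, 21, 31]
red, green, blue = deinterleave(pixels, 3)
packed_again = interleave([red, green, blue])
```

After de-interleaving, each channel holds like elements side by side, which is the layout SIMD arithmetic wants; interleaving restores the original structure order.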
4 Implementation
Because of the aggressive performance, power, and area (PPA) targets of the Cortex-A8 processor, new implementation flows have been developed in order to meet these goals without resorting to a full-custom implementation. The resulting flows enable fine tuning of the design to the desired application. The result is fundamentally a cell-based flow, but underneath it lie semi-custom techniques that have been used where necessary to meet performance.
The Cortex-A8 processor uses a combination of synthesized, structured, and custom circuits. The design was divided into seven functional units and then subdivided into blocks, and the appropriate implementation technique was chosen for each. Since the entire design is synthesizable, blocks that can easily meet their PPA goals can stick with a standard synthesis flow.
A structured flow is used for blocks which contain logic that can take advantage of a controlled placement and routing approach to meet timing or area goals. This approach is a semi-custom flow that manually maps the block into a gate-level netlist and specifies a relative placement for all the cells in the block. The relative placement does not specify the exact locations of the cells, only how each cell is placed with respect to the other cells in the block. The structured approach is typically used for data blocks that have a regular structure. The logic implementation and technology mapping of the block are done manually to maintain the regular data-oriented bus structure of the block instead of generating a random gate structure through synthesis. The logic gates of the block are placed according to the flow of data through the block. This approach offers more control over the design than an automated synthesis approach and leads to more predictable timing closure. It can also achieve better performance and area on complex, high-performance designs than traditional techniques.
The resulting netlists may be interpreted with traditional tiling techniques using the ARM Artisan Advantage-CE or a compatible library. The Artisan Advantage-CE library contains more than a thousand cells. Besides the standard cells used in typical synthesis libraries, many tactical cells are included that are more in line with custom implementation techniques. These are used in an automated fashion in the structured flow. The library is specifically designed to deal with the high-density routing requirements of high-performance processors, with a focus on both high-speed operation and low static and dynamic power. Leakage reduction is achieved through power-gating MT-CMOS cells and retention flip-flops to support sleep and standby modes. ARM has worked with tool vendors to ensure support for this critical new flow.
Finally, a few of the most critical timing- and area-sensitive blocks of the design are reserved for full-custom techniques. These include memory arrays, register files and scoreboards. These blocks contain a mix of static and dynamic logic. No self-timed circuits are used.
5 Conclusion
The Cortex-A8 processor is the fastest, most power-efficient microprocessor yet developed by ARM. With the ability to decode VGA H.264 video at under 350 MHz, it provides the media processing power required for next-generation wireless and consumer products while consuming less than 300 mW in 65 nm technologies. Its new NEON technology provides a platform for flexible software-based solutions for media processing. Thumb-2 instructions provide code density while maintaining the performance of standard ARM code; Jazelle RCT technology does likewise for real-time compilers. TrustZone technology provides security for sensitive data and DRM.
Many significant new microarchitecture features make their first appearance on the Cortex-A8 processor. These include a dual-issue, in-order superscalar pipeline, an integrated Level-2 cache and a significantly deeper pipeline than previous ARM processors. To meet its aggressive performance targets while maintaining ARM's traditionally small power budget, new flows have been developed which approach the efficiency of custom techniques while keeping the flexibility of an automated flow. The Cortex-A8 processor is a quantum jump in flexible low-power, high-performance processing.