D.F. Finchelstein, V. Sze, M.E. Sinangil, Y. Koken, A.P. Chandrakasan

Abstract

The H.264/AVC video coding standard can deliver high compression efficiency at the cost of high complexity and power. The increasing popularity of video capture and playback on portable devices requires that the energy of the video codec be kept to a minimum. This paper proposes several architecture optimizations, such as increased parallelism, multiple voltage/frequency domains, and custom voltage-scalable SRAMs, that enable low-voltage operation and reduce the power of a high-definition decoder. An H.264/AVC Baseline Level 3.1 decoder ASIC was fabricated in 65-nm CMOS and verified. It operates down to 0.7 V and has a measured power of 1.8 mW when decoding a high-definition 720p video at 30 frames per second, which is over an order of magnitude lower than previously published results.

1 Introduction

The H.264/AVC video coding standard can deliver high compression efficiency; however, it comes at a cost of high complexity and power, which is critical for battery-operated devices. State-of-the-art ASIC H.264 decoders have used techniques such as pipelining, parallelism, and caching to increase throughput and reduce power consumption [1, 2, 3, 4]. In this work, we use additional architecture optimizations to enable low-voltage operation, and thus reduce the power consumption of high-definition decoding. We propose that further parallelism, multiple voltage/frequency domains, and custom voltage-scalable SRAMs are necessary in order to maintain the throughput required for high-definition decoding at low voltages. To validate these ideas, a 1.8 mW H.264/AVC Baseline Level 3.1 video decoder was implemented in 65-nm CMOS; it decodes 720p@30fps with supply voltages down to 0.7 V.

Voltage scaling is a well-known technique that reduces energy consumption by a quadratic factor, at the cost of increased circuit delay.
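As a rough illustration of the quadratic relationship, the sketch below assumes that the energy per operation is dominated by switching energy, E = C * V^2, and ignores leakage. The function name and the 1.2 V reference supply are assumptions chosen for illustration; the paper does not state a nominal supply voltage.

```python
def relative_switching_energy(vdd: float, vdd_ref: float) -> float:
    """Energy per operation at vdd relative to vdd_ref, assuming E = C * V^2.

    Leakage and short-circuit energy are ignored (simplifying assumption).
    """
    if vdd <= 0 or vdd_ref <= 0:
        raise ValueError("supply voltages must be positive")
    return (vdd / vdd_ref) ** 2


# 1.2 V is a hypothetical reference supply, not a figure from the paper.
ratio = relative_switching_energy(0.7, 1.2)
print(f"Energy per operation at 0.7 V relative to 1.2 V: {ratio:.2f}")
```

Under these assumptions, the saving comes only from the lower supply voltage; the longer circuit delay that accompanies it is not modeled here.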
The reduced circuit speed at low voltage is a challenge for real-time applications such as video decoding, where on average a new frame must be computed every 33 ms. We will describe a video decoder that allows low-voltage operation by reducing the total number of cycles required per frame without increasing the levels of logic on the critical path between registers. This allows for a longer clock period, which leads to a lower supply voltage and lower energy.

2 Decoder Pipeline Architecture

The H.264 codec is described in [5] and builds upon previous MPEG standards. A block diagram of our decoder hardware is shown in Figure 1. At the system level of the decoder, FIFOs of varying depths connect the major processing units: Entropy Decoder (ED), Inverse Transform (IT), Motion Compensation (MC), Spatial Prediction (INTRA), Deblocking Filter (DB), Memory Controller (MEM) and Frame Buffer (FB). The pipelined architecture allows the decoder to process several 4x4 blocks simultaneously, requiring fewer cycles to decode each frame.

The number of cycles required to process each 4x4 block varies for each unit, as shown in Table 1. The table describes the pipeline performance for decoding P-frames (temporal prediction). Most of the optimization efforts were focused on P-frame performance, since P-frames occur more frequently than I-frames (spatial prediction) in highly compressed videos.

The presence of FIFOs between each unit distributes the pipeline control and allows the units to operate out of lockstep. This adapts for the variable latency of each unit and increases the throughput of the decoder by reducing the number of stalls, as described in [6].

Figure 1: H.264 decoder architecture. [Block diagram: bitstream input, ED, IT, Intra/Inter selection (MUX), INTRA, MC, adder, DB, video output, MEM, and FB; the legend distinguishes decoder units from FIFOs and marks the chip boundary.]
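The benefit of letting units run out of lockstep can be seen in a toy two-stage model. The sketch below is not the decoder's control logic; it is a minimal timing model, with made-up per-block latencies, that compares a lockstep pipeline against one whose stages are joined by a FIFO of a given depth. The function names and the recurrence are assumptions made for illustration.

```python
from typing import Sequence


def lockstep_cycles(p: Sequence[int], c: Sequence[int]) -> int:
    """Two stages advancing together: in each slot the producer works on
    block i while the consumer works on block i-1, and both wait for the
    slower of the two."""
    if not p or len(p) != len(c):
        raise ValueError("latency lists must be non-empty and equal length")
    total = p[0]
    for i in range(1, len(p)):
        total += max(p[i], c[i - 1])
    return total + c[-1]


def fifo_cycles(p: Sequence[int], c: Sequence[int], depth: int) -> int:
    """Same two stages decoupled by a FIFO holding up to `depth` blocks."""
    if not p or len(p) != len(c) or depth < 1:
        raise ValueError("need equal-length latencies and depth >= 1")
    n = len(p)
    push = [0] * n  # time at which block i enters the FIFO
    pop = [0] * n   # time at which the consumer removes block i
    done = 0        # time at which the consumer finished the previous block
    for i in range(n):
        ready = (push[i - 1] if i else 0) + p[i]
        if i >= depth:
            # FIFO full: wait until block i-depth has been popped.
            ready = max(ready, pop[i - depth])
        push[i] = ready
        pop[i] = max(push[i], done)
        done = pop[i] + c[i]
    return done


# Hypothetical latencies (cycles per block), chosen so the slow phases of
# the two stages do not line up.
producer = [1, 1, 5, 5]
consumer = [5, 5, 1, 1]
print(lockstep_cycles(producer, consumer), fifo_cycles(producer, consumer, depth=4))
```

With constant latencies the two models give the same total; the FIFO helps only when latencies vary from block to block, which is the situation described above.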
Table 1: Cycles per 4x4 luma block for each unit in the P-frame pipeline of Figure 1, assuming no stalling. Values in [ ] are the performance after the Section 3 optimizations.

Unit        Min cycles   Max cycles   Avg cycles
ED          0            39           4.3
IT          0            4            1.6
MC          4 [2]        9 [4.5]      4.6 [2.3]
DB          8 [2]        12 [6]       8.9 [2.9]
MEM-read    4            27           13

The luma and chroma components have separate pipelines that share minimal hardware and are mostly decoupled from each other. The two pipelines do have dependencies on each other, which sometimes prevent them from running at the same time. For example, both pipelines use the same ED at the start, since this operation is inherently serial and produces input coefficients and motion vectors for both pipelines.

IEEE Asian Solid-State Circuits Conference, November 3-5, 2008, Fukuoka, Japan, 6-1. 978-1-4244-2605-8/08/$25.00 ©2008 IEEE.

3 Parallelism

Parallelism can be used within each unit to reduce the number of cycles required to process each 4x4 block. This is particularly applicable to the motion compensation (MC) and deblocking (DB) units, which were found to be key bottlenecks in the system pipeline. The cost of parallelism is primarily an increase in area. The increase in logic area due to the parallelism in the MC and DB units described in the following sections is about 10%. When compared to the entire decoder area (including on-chip memory), the area overhead is less than 3%.

3.1 Motion Compensation Parallelism

The MC unit uses interpolation and filtering to predict the current 4x4 block from past frames to exploit temporal redundancy. The luma interpolator predicts the current 4x4 block from a 9x9 block of pixels from the previous frame, while the chroma interpolator uses bilinear filters to predict a 2x2 block from a 3x3 block.

To improve the throughput of MC, a second identical interpolator was added in parallel, as shown in Figure 2. This parallel structure can double the throughput of the MC stage if, during each cycle, both motion vectors are available and two new columns of 9 pixels from the frame buffer are available at the inputs of MC0 and MC1. The chroma interpolator has 4 parallel bilinear filters and can predict a 2x2 block every cycle.

Figure 2: The MC unit is duplicated into MC0 and MC1. The datapath of each MC is 45 8-bit registers, thirteen 6:1 FIR filters, and four 4:1 bilinear filters. [Block diagram: the FB supplies column0 and column1, and the ED supplies MV0 and MV1, to MC0 and MC1.]

3.2 Deblocking Filter Parallelism

There are 4 luma and 2 chroma deblocking filters running in parallel. The luma architecture is shown in Figure 3. The 4 luma filters share the same boundary strength calculation, as do the 2 chroma filters. The luma and chroma filters can all run at the same time, assuming the input data and configuration parameters are available.

A luma macroblock (16 4x4 blocks) has 128 edges that need to be filtered, so with 4 luma filters the macroblock takes 32 clock cycles to complete. Unlike previous implementations [7], filtering on 4x4 blocks begins before the entire macroblock is reconstructed. A single-port on-chip cache with 4x4x8-bit data width is used to store pixels from the top and left macroblocks. The edge filtering order of the macroblock was optimized to minimize the number of stalls from read/write conflicts to the cache; it accounts for the 4x4 block arrival order and the left-right-top-bottom edge order specified in H.264 [5]. Furthermore, as the on-chip cache has a 2-cycle read latency, a few extra stall cycles are needed for those memory accesses that cannot be prefetched. Taking this overhead into account, the average number of cycles required by DB for a luma 4x4 block is about 2.9.

Each of the two chroma components of a macroblock has 32 pixel edges to be filtered. Using 2 filters per cycle results in slightly above 32 clock cycles per macroblock.
When accounting for stalls, the number of cycles per chroma macroblock is about the same as for luma. Overall, the number of cycles required by DB is around 46 cycles per macroblock, which is less than other deblocking filter implementations [7].

Figure 3: Deblocking filter architecture for luma filtering. [Block diagram: last line cache (104 kb SRAM), internal memory 4x(4x4x8b) built from DFFs, boundary strength (bS) calculation, and 4 parallel filters with separate paths for bS=4, bS=1 to 3, and bS=0.]

4 Multiple Voltage/Frequency Domains

Several factors limit the amount of parallelism that can be exposed in certain units. For instance, the number of pads on an ASIC is limited, which restricts the degree of memory controller parallelism for off-chip memory accesses. This explains the large cycle count for MEM in Table 1. In a single-frequency design, MEM causes many stalls in the rest of the system. This forces the entire decoder to run at a high frequency in order to maintain the required performance.

We propose partitioning the decoder into 2 domains (the memory controller and the core decoder), so that we can independently tailor the frequency and voltage of each domain. The two domains are completely independent, and are separated by asynchronous FIFOs and voltage level-shifters, as shown in Figure 4. Additionally, Table 1 shows that there could be further benefit in also placing the ED unit in a separate domain.

Having two separate domains allows us to dynamically adjust the frequencies for the different workloads independently. This is important since the workloads of these two domains vary widely and differently depending on whether the decoder is working on I-frames or P-frames.
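To get a feel for why separate domains help, the sketch below compares the core's dynamic power at its own operating point with what it would be if it had to share the memory controller's clock and supply. It assumes dynamic power proportional to f * V^2 with unchanged switched capacitance and activity (no clock gating) and ignores leakage. The operating points are the "shields" values as read from Table 2; the function names are hypothetical.

```python
def relative_dynamic_power(freq_mhz: float, vdd: float) -> float:
    """Quantity proportional to dynamic power, P ~ f * V^2.

    The constant factor (switched capacitance and activity) is omitted, so
    only ratios between operating points are meaningful.
    """
    return freq_mhz * vdd ** 2


def core_power_ratio(core_f: float, core_v: float,
                     mem_f: float, mem_v: float) -> float:
    """Core power in its own domain relative to running at the MEM point."""
    return relative_dynamic_power(core_f, core_v) / relative_dynamic_power(mem_f, mem_v)


# Operating points taken from Table 2 ("shields"): core 14 MHz at 0.7 V,
# memory controller 50 MHz at 0.84 V.
ratio = core_power_ratio(14, 0.7, 50, 0.84)
print(f"Core dynamic power relative to a shared domain: {ratio:.2f}")
```

This is an upper-bound style estimate: a single-domain design with aggressive clock gating would not switch on every fast clock cycle, so the real gap could be smaller.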
The workloads differ by frame type: the MEM domain requires a higher frequency for P-frames than for I-frames. Conversely, the core domain requires a higher frequency during I-frames, since more residuals are present and they are processed by the ED unit.

Figure 4: Independent voltage/frequency domains are separated by asynchronous FIFOs and level-converters. [Block diagram: memory controller and core domain joined by voltage level shifters (V high, V low) and an asynchronous FIFO (CLK fast, CLK slow) with FIFO logic, a memory array, and metastability flops.]

5 Memory Optimization

Since the frame buffer (FB) is large and located off-chip, it is important to minimize the off-chip memory bandwidth in order to reduce power. We leveraged two key techniques to reduce this memory bandwidth. First, during motion compensation, some of the redundant reads are recognized and avoided. This happens when there is an overlap in the vertical or horizontal direction and the neighboring 4x4 blocks (within the same macroblock) have motion vectors with identical integer components [4].

Second, on-chip caches were used to store data that is likely to be reused, as shown in Figure 5. This caching scheme reduces total off-chip memory bandwidth by almost 13% relative to the case where no caches are used.

Custom low-voltage SRAMs [8] were designed to operate at the desired core voltage and frequency. The SRAM architecture is shown in Figure 6. The 8T bit-cell (BC) is optimized for low-voltage writability and target performance. The row-wise supply node MCHd is actively pulled down during write accesses to improve the degraded write margin at low voltages. The cell performs buffered reads through the 2 extra transistors on the right, in order to avoid read upsets at low voltages. A pseudo-differential sensing scheme compatible with the 8T cell was implemented, as shown on the right side of Figure 6. The latch-based sense amplifier uses the inputs RDBL and a global reference voltage (snsRef) to produce DATAOUT, which is then buffered to the memory interface's output. Compared to a conventional SRAM, this custom design can operate at a much lower supply voltage and trade off performance for energy.
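The redundant-read avoidance described earlier in this section can be illustrated by counting reference pixels. The sketch below assumes each 4x4 luma block needs a 9x9 reference window that starts 2 samples before the block position (a common layout for a 6-tap filter, assumed here) and that blocks whose motion vectors share the same integer part can reuse any overlapping pixels. It counts the ideal saving only; how much of this overlap the hardware actually exploits is not stated in the text. All names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Block = Tuple[int, int, int, int]  # (block_x, block_y, mv_x_int, mv_y_int)

WINDOW = 9  # 9x9 reference pixels per 4x4 luma block
LEAD = 2    # assumed number of samples before the block origin


def reference_window(bx: int, by: int, mvx: int, mvy: int) -> Set[Tuple[int, int]]:
    """Coordinates of the reference pixels needed by one 4x4 block."""
    x0, y0 = bx + mvx - LEAD, by + mvy - LEAD
    return {(x, y) for x in range(x0, x0 + WINDOW) for y in range(y0, y0 + WINDOW)}


def reads_without_reuse(blocks: Iterable[Block]) -> int:
    """Every block fetches its full window independently."""
    return sum(len(reference_window(*b)) for b in blocks)


def reads_with_reuse(blocks: Iterable[Block]) -> int:
    """Overlapping pixels are fetched once (ideal reuse)."""
    needed: Set[Tuple[int, int]] = set()
    for b in blocks:
        needed |= reference_window(*b)
    return len(needed)


# Two horizontally adjacent 4x4 blocks with the same integer motion vector.
pair = [(0, 0, 3, -1), (4, 0, 3, -1)]
print(reads_without_reuse(pair), reads_with_reuse(pair))

# All 16 blocks of a macroblock sharing one integer motion vector.
macroblock = [(x, y, 3, -1) for y in range(0, 16, 4) for x in range(0, 16, 4)]
print(reads_without_reuse(macroblock), reads_with_reuse(macroblock))
```

When neighboring blocks have different integer motion vectors the windows may not overlap at all, in which case the two counts coincide.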
Figure 5: On-chip caches reduce off-chip memory bandwidth. [Block diagram: processing units ED, IT, INTRA, MC, DB, and the MUX, with on-chip low-voltage SRAM caches: last MVs (9.4 kbits), last line (21 kbits), last 4 lines (104 kbits), and total coeffs (1.6 kbits); the FB is off-chip.]

Figure 6: Low-voltage SRAM architecture. [Circuit diagram: array of bit-cells (BC) with row circuits, column circuits, WL drivers, BL drivers, MCHd control, and a sense amplifier with inputs RDBL and snsRef producing DATAOUT.]

6 Power Measurements

The H.264/AVC Baseline Level 3.1 decoder, shown in Figure 8, was implemented in 65-nm CMOS, and the power was measured when performing real-time decoding of several 720p@30fps video streams. The frame buffer was implemented using 32-bit-wide Cypress ZBT SRAMs. A Virtex II FPGA was used to interface the ASIC to the display. Figure 7 shows a comparison of our ASIC with other decoders, while Table 2 lists the performance of our decoder at 720p. Our 720p decoder also has lower power and frequency relative to D1 of [1]. The power of the I/O pads was not included in the measurement comparisons. The reduction in power over the other reported decoders can be attributed to a combination of the low-power techniques described in this paper and a more advanced process.

Table 2: Measured performance numbers for 720p@30fps.

Video                  mobcal   shields   parkrun
# of Frames            300      300       144
Bitrate (Mbps)         5.4      7.0       26
Off-chip BW (Gbps)     1.2      1.1       1.2
Core Freq (MHz)        14       14        25
Mem Ctrl Freq (MHz)    45       50        50
Core Vdd [V]           0.7      0.7       0.8
Mem Ctrl Vdd [V]       0.8      0.84      0.84
Power (mW)             1.8      1.8       3.2

Figure 7: Comparison with other H.264 decoders [1, 2, 3, 4]. [Plot: power (10 µW to 1 W) versus throughput (0.1 to 100 Mpixels/s) for resolutions from QCIF@15 to 1080p. Legend: [1] 130nm, Baseline; [2] 180nm, Baseline; [3] 180nm, Baseline; [4] 130nm, Main; this work, 65nm, Baseline.]

Figure 8: Die photo showing the different domains. [Die photo, 3.2 mm x 3.2 mm: memory controller domain, core domain, caches, and I/O pads. Decoder statistics: total area 10 mm^2; technology 65 nm; area utilization 41%; standard cells 134k; on-chip SRAM 17 kB.]

6.1 Power Breakdown

This section shows the simulated power breakdown during P-frame decoding. The power of P-frames is dominated by MC (42%) and DB (26%), as seen in the left chart of Figure 9. About 75% of the MC power, or 32% of total power, is consumed by the MEM read logic, as illustrated by the pie chart on the right of the same figure.
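The relationship between the two charts of Figure 9 is a simple product of shares. The sketch below reproduces it using the percentages shown in the figure; the dictionary keys and function name are hypothetical, and the values are simulated shares, not measurements.

```python
from typing import Dict

MC_SHARE_OF_TOTAL = 0.42  # MC share of total P-frame power (Figure 9, left)

# Split of the MC power (Figure 9, right).
MC_SPLIT: Dict[str, float] = {
    "MEM read": 0.75,
    "interpolators": 0.20,
    "motion vector predictor": 0.05,
}


def share_of_total(mc_share: float, split: Dict[str, float]) -> Dict[str, float]:
    """Convert shares of MC power into shares of total decoder power."""
    return {name: mc_share * fraction for name, fraction in split.items()}


for name, share in share_of_total(MC_SHARE_OF_TOTAL, MC_SPLIT).items():
    print(f"{name}: {share * 100:.1f}% of total power")
```

The MEM read entry comes out at 31.5%, which matches the roughly 32% of total power quoted in the text.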
The memory controller is the largest power consumer, since it must run at a higher voltage and frequency than the core domain to achieve the large MC read bandwidth (3 luma pixels are read for every decoded pixel).

Figure 9: Post-layout simulated ASIC power breakdown during P-frame decoding. [Left chart, total P-frame power: MC 42%, DB 26%, pipeline control & FIFOs 19%, MEM write 7%, ED 3%, INTRA 1%, IDCT 1%. Right chart, MC power: MEM read 75%, interpolators 20%, motion vector predictor 5%.]

7 Summary

We implemented a full video decoder system that demonstrates high-definition decoding at low-voltage operation. Several techniques were leveraged to make this low-power decoder possible. The decoder units were pipelined and isolated by FIFOs to increase concurrency. The MC interpolator and DB filters were replicated for increased performance. The decoder was partitioned into independent voltage/frequency islands to enable lower voltage/frequency operation for some of the processing blocks. Finally, voltage-scalable on-chip caches helped reduce both on-chip and off-chip memory power.

8 Acknowledgements

The authors would like to thank Nokia, Texas Instruments (TI), and NSERC for their support of our research, TI for providing chip fabrication, and the many researchers from Nokia, TI and MIT who have provided valuable feedback.

References

[1] C.-D. Chien, C.-C. Lin, Y.-H. Shih, H.-C. Chen, C.-J. Huang, C.-Y. Yu, C.-L. Chen, C.-H. Cheng, and J.-I. Guo, "A 252kgate/71mW Multi-Standard Multi-Channel Video Decoder for High Definition Video Applications," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2007.

[2] S. Na, W. Hwangbo, J. Kim, S. Lee, and C.-M. Kyung, "1.8mW, Hybrid-Pipelined H.264/AVC Decoder For Mobile Devices," IEEE Asian Solid-State Circuits Conference, November 2007.

[3] T.-M. Liu, T.-A. Lin, S.-Z. Wang, W.-P. Lee, K.-C. Hou, J.-Y. Yang, and C.-Y. Lee, "A 125µW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 161-169, January 2007.

[4] C.-C. Lin, J.-W. Chen, H.-C. Chang, Y.-C. Yang, Y.-H. O. Yang, M.-C. Tsai, J.-I. Guo, and J.-S. Wang, "A 160K Gates/4.5 KB SRAM H.264 Video Decoder for HDTV Applications," IEEE Journal of Solid-State Circuits, vol. 42, no. 1, pp. 170-182, January 2007.

[5] "Recommendation ITU-T H.264: Advanced Video Coding for Generic Audiovisual Services," ITU-T, Tech. Rep., 2003.

[6] E. Fleming, C.-C. Lin, N. Dave, Arvind, G. Raghavan, and J. Hicks, "H.264 Decoder: A Case Study in Multiple Design Points," Proceedings of Formal Methods and Models for Co-Design (MEMOCODE), pp. 165-174, June 2008.

[7] K. Xu and C.-S. Choy, "A Five-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM-Based Deblocking Filter for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 3, pp. 363-374, March 2008.

[8] N. Verma and A. Chandrakasan, "A 65nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, February 2007.