
Bipeen Kulkarni

ASIC SoC Physical Design guidelines and Solutions under one cloud

Backend (Physical Design) Interview Questions and Answers

Filed under: VLSI Interview preparation
March 29, 2013

Below is the sequence of questions asked of a physical design engineer.

In which field are you interested?

The answer to this question depends on your interest, your expertise and the requirement for which you have been interviewed.

Well.. the candidate gave the answer: low power design

Can you talk about low power techniques? How are low power and the latest 90nm/65nm technologies related?

Refer here and browse for different low power techniques.

Do you know about the input vector controlled method of leakage reduction?

The leakage current of a gate also depends on its inputs. Hence, find the set of inputs which gives the least leakage. By applying this minimum leakage vector to a circuit, it is possible to decrease the leakage current of the circuit when it is in standby mode. This method is known as the input vector controlled method of leakage reduction.
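A minimal Python sketch of the idea, assuming a made-up three-gate circuit and invented per-pattern leakage numbers (real values would come from the characterized library): it simply tries every primary-input vector and keeps the one with the least total leakage.

```python
from itertools import product

# Hypothetical leakage (nA) per input pattern; real numbers come from the library.
NAND2_LEAKAGE = {(0, 0): 10.0, (0, 1): 22.0, (1, 0): 18.0, (1, 1): 45.0}
INV_LEAKAGE = {(0,): 12.0, (1,): 20.0}


def nand(a, b):
    return 1 - (a & b)


def circuit_leakage(a, b, c):
    """Toy circuit: n1 = NAND(a, b), y = NAND(n1, c), z = INV(c)."""
    n1 = nand(a, b)
    return NAND2_LEAKAGE[(a, b)] + NAND2_LEAKAGE[(n1, c)] + INV_LEAKAGE[(c,)]


def min_leakage_vector():
    """Try every primary-input vector and return the one with the least leakage."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))
```

With this toy table the all-zeros vector wins at 40 nA. Exhaustive search grows as 2^n in the number of inputs, so a real block would need a smarter search; treat this only as an illustration of what "minimum leakage vector" means.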

How can you reduce dynamic power?

Reduce switching activity by designing good RTL
Clock gating
Architectural improvements
Reduce supply voltage
Use multiple voltage domains (multi-VDD)
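The list above follows from the usual switching-power relation P = alpha x C x Vdd^2 x f (alpha = switching activity, C = switched capacitance). A small sketch with assumed numbers shows why reducing the supply voltage pays off quadratically, while clock gating and good RTL attack alpha:

```python
def dynamic_power(alpha, c_switched, vdd, freq):
    """Switching power in W: alpha * C * Vdd^2 * f (short-circuit power ignored).

    alpha: switching activity (0..1), c_switched: switched capacitance in F,
    vdd: supply in V, freq: clock frequency in Hz.
    """
    return alpha * c_switched * vdd ** 2 * freq


# Assumed example values, not taken from any real design.
base = dynamic_power(alpha=0.1, c_switched=1e-9, vdd=1.0, freq=500e6)
lower_vdd = dynamic_power(alpha=0.1, c_switched=1e-9, vdd=0.9, freq=500e6)
gated = dynamic_power(alpha=0.05, c_switched=1e-9, vdd=1.0, freq=500e6)
```

With these numbers the base case is 50 mW; dropping Vdd from 1.0 V to 0.9 V cuts power to 81% of that, and halving the activity halves it.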

What are the vectors of dynamic power?

Voltage and current

How will you do power planning?

Refer here for power planning.

If you have both IR drop and congestion, how will you fix it?

Spread macros
Spread standard cells
Increase strap width
Increase number of straps
Use proper blockage

Are increasing the power line width and providing more straps the only solutions to IR drop?

Spread macros
Spread standard cells
Use proper blockage

In a reg-to-reg path, if you have a setup problem, where will you insert a buffer: near the launching flop or the capture flop? Why?

(Buffers are inserted for fixing fanout violations and hence they reduce setup violations; otherwise we try to fix setup violations by sizing the cells. Now just assume that you must insert a buffer!)

Nearer to the capture flop.

Because there may be other paths passing through or originating from the flop nearer to the launch flop. Hence buffer insertion there may affect other paths also: it may improve all those paths or degrade them. If all those paths have violations, then you may insert the buffer nearer to the launch flop, provided it improves slack.

How will you decide the best floorplan?

Refer here for floorplanning.

What is the most challenging task you handled? What is the most challenging job in the P&R flow?

It may be power planning, because you found more IR drop
It may be a low power target, because you had more dynamic and leakage power
It may be macro placement, because it had more connections with standard cells or macros
It may be CTS, because you needed to handle multiple clocks and clock domain crossings
It may be timing, because sizing cells in the ECO flow is not meeting timing
It may be library preparation, because you found some inconsistency in the libraries
It may be DRC, because you faced thousands of violations

How will you synthesize the clock tree?

Single clock: normal synthesis and optimization
Multiple clocks: synthesize each clock separately
Multiple clocks with domain crossing: synthesize each clock separately and balance the skew

How many clocks were there in this project?

It is specific to your project.

The more clocks, the more challenging!

How did you handle all those clocks?

Multiple clocks -> synthesize separately -> balance the skew -> optimize the clock tree

Are they coming from separate external sources or a PLL?

If they are from separate clock sources (i.e. asynchronous; from different pads or pins), then balancing skew between these clock sources becomes challenging.

If they are from a PLL (i.e. synchronous), then skew balancing is comparatively easy.

Why are buffers used in the clock tree?

To balance skew (i.e. the difference in clock arrival times from flop to flop)

What is crosstalk?

Switching of the signal in one net can interfere with a neighbouring net due to cross-coupling capacitance. This effect is known as crosstalk. Crosstalk may lead to setup or hold violations.

How can you avoid crosstalk?

Double spacing => more spacing => less capacitance => less crosstalk
Multiple vias => less resistance => less RC delay
Shielding => constant cross-coupling capacitance => known value of crosstalk
Buffer insertion => boost the victim strength

How does shielding avoid the crosstalk problem? What exactly happens there?

High frequency noise (or glitch) is coupled to VSS (or VDD), since shielding layers are connected to either VDD or VSS.

The coupling capacitance remains constant with VDD or VSS.

How does spacing help in reducing crosstalk noise?

More spacing between two conductors => less cross-coupling capacitance => less crosstalk
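A rough parallel-plate estimate shows the trend. It ignores fringing fields, and the wire geometry and dielectric constant below are assumptions: sidewall coupling capacitance is inversely proportional to spacing, so doubling the spacing roughly halves it.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def coupling_cap(eps_r, length, thickness, spacing):
    """Parallel-plate sidewall coupling: C = eps0 * eps_r * (thickness * length) / spacing.

    All lengths in metres; fringing fields are ignored.
    """
    return EPS0 * eps_r * thickness * length / spacing


# Assumed geometry: 1 mm long, 0.3 um thick wires in an oxide-like dielectric.
c_min_space = coupling_cap(3.9, 1e-3, 0.3e-6, 0.1e-6)
c_double_space = coupling_cap(3.9, 1e-3, 0.3e-6, 0.2e-6)
```

Real extraction tools model fringing and neighbouring layers, so the actual reduction is usually less than the ideal factor of two.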

Why are double spacing and multiple vias used for the clock?

Why the clock? Because it is the one signal which changes its state regularly, and more often than any other signal. If any other signal switches fast, we can use double spacing for it too.

Double spacing => more distance between wires => less coupling capacitance => less crosstalk

Multiple vias => resistances in parallel => less resistance => less RC delay
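Via resistances in parallel combine as 1/R_eq = sum(1/R_i). A tiny sketch, with an assumed 4 ohm per-via resistance:

```python
def parallel_resistance(resistances):
    """Equivalent resistance of resistors in parallel: 1 / R_eq = sum(1 / R_i)."""
    return 1.0 / sum(1.0 / r for r in resistances)


VIA_OHMS = 4.0  # assumed single-via resistance
single = parallel_resistance([VIA_OHMS])
double = parallel_resistance([VIA_OHMS] * 2)
quad = parallel_resistance([VIA_OHMS] * 4)
```

Two vias give 2 ohms and four give 1 ohm, which lowers the R in the RC delay of the net (and also adds redundancy if one via fails).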

How can a buffer be used in the victim to avoid crosstalk?

A buffer increases the victim's signal strength; buffers break the net length => victims are more tolerant to the coupled signal from the aggressor.


Labels: Physical Design, Synthesis, Timing Analysis

Physical Design Questions and Answers

I am getting several emails requesting answers to the questions posted in this blog. But it is very difficult to provide detailed answers to all questions in my available spare time. Hence I decided to give short and sweet one-line answers to the questions so that readers can benefit immediately. Detailed answers will be posted at a later stage. I have given answers to some of the physical design questions here. Enjoy!

What parameters (or aspects) differentiate chip design and block level design?

Chip design has I/O pads; block design has pins.

Chip design uses all available metal layers; block design may not use all metal layers.

A chip is generally rectangular in shape; blocks can be rectangular or rectilinear.

Chip design requires packaging; block design ends in a macro.

How do you place macros in a full chip design?

First check the flylines, i.e. check the net connections from macro to macro and from macro to standard cells.

If there are more connections from macro to macro, place those macros nearer to each other, preferably nearer to the core boundaries.

If an input pin is connected to a macro, it is better to place that macro nearer to that pin or pad.

If a macro has more connections to standard cells, spread the macros inside the core.

Avoid criss-cross placement of macros.

Use soft or hard blockages to guide the placement engine.

Differentiate between a hierarchical design and a flat design?

A hierarchical design has blocks and sub-blocks in a hierarchy; a flattened design has no sub-blocks and has only leaf cells.

A hierarchical design takes more runtime; a flattened design takes less runtime.

Which is more complicated when you have a 48 MHz and a 500 MHz clock design?

500 MHz, because it is more constrained (i.e. a smaller clock period: 2 ns versus about 20.8 ns) than the 48 MHz design.

Name a few tools which you used for physical verification?

Hercules from Synopsys, Calibre from Mentor Graphics.

What are the input files you will give for PrimeTime correlation?

Netlist, technology library, constraints, SPEF or SDF file.

If routing congestion exists between two macros, what will you do?

Provide soft or hard blockage

How will you decide the die size?

By checking the total area of the design you can decide the die size.
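One common way to turn "total area" into a die size is to divide the standard cell and macro area by a target utilization, then add room for the core-to-IO spacing and the IO ring on every side. The utilization, margin and IO ring width below are assumptions for illustration, not values from any particular project:

```python
import math


def estimate_die_side(std_cell_area, macro_area, utilization,
                      core_to_io_margin, io_ring_width):
    """Rough side length (um) of a square die.

    Areas in um^2, lengths in um. Core area = placed area / target utilization;
    the die adds the core-to-IO margin and the IO ring on every side.
    """
    core_area = (std_cell_area + macro_area) / utilization
    core_side = math.sqrt(core_area)
    return core_side + 2 * (core_to_io_margin + io_ring_width)
```

For example, `estimate_die_side(490_000, 210_000, 0.7, 20, 130)` gives a 1000 um core and a die of about 1300 um, i.e. roughly 1.3 mm x 1.3 mm. Pad-limited designs are sized by the IO count instead, which this sketch does not cover.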

If a lengthy metal layer is connected to diffusion and poly, which one will be affected by the antenna problem?

Poly

If the full chip design is routed with 7 metal layers, why are macros designed using 5LM instead of 7LM?

Because the top two metal layers are required for global routing in chip design. If the top metal layers are also used at block level, they will create routing blockages.

In your project, what are the die size, number of metal layers, technology, foundry and number of clocks?

Die size: tell it in mm, e.g. 1 mm x 1 mm; remember 1 mm = 1000 micron, which is a big size!!

Metal layers: see your tech file. Generally for 90nm it is 7 to 9.

Technology: again, look into the tech files.

Foundry: again, look into the tech files; e.g. TSMC, IBM, ARTISAN etc.

Clocks: look into your design and SDC file!

How many macros are in your design?

You know it well as you have designed it! A SoC (System on Chip) design may have 100 macros also!!!!

What is each macro's size and the standard cell count?

Depends on your design.

What are the inputs needed for your design?

For synthesis: RTL, technology library, standard cell library, constraints

For physical design: netlist, technology library, constraints, standard cell library

What does the SDC constraint file contain?

Clock definitions

Timing exceptions: multicycle paths, false paths

Input and output delays

How did you do power planning?

How to calculate core ring width, macro ring width and strap or trunk width?
How to find the number of power pads and IO power pads?

How are the width of metal and the number of straps calculated for power and ground?

Get the total core power consumption; get the metal layer current density value from the tech file. Divide the total power by the supply voltage to get the total current, and divide that by the number of sides of the chip; divide the obtained value by the current density to get the core power ring width. Then calculate the number of straps using some more equations. Will be explained in detail later.
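The ring-width steps above can be sketched as follows. The power, supply, and current-density limit are placeholders (the real limit comes from your tech file, and the total power would be the sum of standard cell, macro and pad power); real flows also add margin for EM and IR drop and size the VDD and VSS rings separately.

```python
def core_ring_width_um(total_power_w, vdd_v, sides, j_max_ma_per_um):
    """Core power-ring width per side, following the steps above.

    I_total = P / Vdd, I_side = I_total / sides, width = I_side / J_max.
    j_max_ma_per_um: allowed current per um of metal width (assumed, from the tech file).
    """
    i_total_ma = total_power_w / vdd_v * 1000.0
    i_side_ma = i_total_ma / sides
    return i_side_ma / j_max_ma_per_um
```

For an assumed 0.5 W core at 1.0 V with 4 sides and a limit of 2 mA/um, each side carries 125 mA and `core_ring_width_um(0.5, 1.0, 4, 2.0)` gives a 62.5 um ring.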

How to find total chip power?

Total chip power = standard cell power consumption + macro power consumption + pad power consumption.

What are the problems faced related to timing?

Pre-layout: setup, max transition, max capacitance

Post-layout: hold

How did you resolve the setup and hold problems?

Setup: upsize the cells

Hold: insert buffers

In which layer do you prefer clock routing and why?

The next lower layer to the top two metal layers (global routing layers), because it has less resistance and hence less RC delay.

If your design has a reset pin, will it affect the input pin, the output pin or both?

Output pin.

During power analysis, if you are facing an IR drop problem, how did you avoid it?

Increase power metal layer width.

Go for a higher metal layer.

Spread macros or standard cells.

Provide more straps.

Define the antenna problem. How did you resolve it?

Increased net length can accumulate more charge during manufacturing of the device due to the ionization process. If this net is connected to the gate of a MOSFET, it can damage the dielectric property of the gate, and the gate may conduct, causing damage to the MOSFET. This is the antenna problem.

Decrease the length of the net by providing more vias and layer jumping.

Insert an antenna diode.
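Antenna rules are usually checked as a ratio of the charge-collecting metal area to the gate area it connects to. A sketch with a placeholder limit of 400 (the real limit comes from the foundry rule deck) shows why layer jumping helps: it shortens the metal segment attached to the gate while that layer is being processed.

```python
def antenna_ratio(metal_area, gate_area):
    """Ratio of charge-collecting metal area to the gate area it is connected to."""
    return metal_area / gate_area


def has_antenna_violation(wire_width, wire_length, gate_area, max_ratio):
    """True if the metal segment attached to the gate exceeds the allowed ratio.

    Dimensions in um and um^2; max_ratio is a placeholder for the foundry rule value.
    """
    return antenna_ratio(wire_width * wire_length, gate_area) > max_ratio
```

A 0.1 um wide, 5000 um long wire on a 0.5 um^2 gate gives a ratio of about 1000 and fails; if a jump to an upper layer leaves only 1500 um on the lower layer, the ratio drops to about 300 and passes.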

How do delays vary with different PVT conditions? Show the graph.
P increase -> delay increase

P decrease -> delay decrease

V increase -> delay decrease

V decrease -> delay increase

T increase -> delay increase

T decrease -> delay decrease
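For the V rows (and the P rows, if a slow process corner is modeled as a higher threshold voltage), one widely used approximation is the alpha-power law: delay is proportional to C x Vdd / (Vdd - Vth)^alpha. The sketch below uses assumed values (alpha = 1.3, Vth = 0.3 V) and does not model temperature.

```python
def alpha_power_delay(vdd, vth, alpha=1.3, k=1.0):
    """Relative gate delay from the alpha-power law: k * Vdd / (Vdd - Vth) ** alpha.

    k lumps load capacitance and process constants; only ratios are meaningful.
    Temperature effects are not modeled.
    """
    if vdd <= vth:
        raise ValueError("Vdd must be above Vth")
    return k * vdd / (vdd - vth) ** alpha
```

Sweeping Vdd from 0.9 V to 1.1 V gives relative delays of roughly 1.75, 1.59 and 1.47, so delay falls as voltage rises; raising Vth from 0.30 V to 0.35 V at 1.0 V makes the gate slower.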

Explain the flow of physical design and the inputs and outputs for each step in the flow.

Click here to see the flow diagram

What is cell delay and net delay?

Gate delay

Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output. [Magma]

Gate delay = function of (input transition time, Cnet + Cpin).

Cell delay is the same as gate delay.

Cell delay

For any gate, it is measured from the 50% point of the input transition to the corresponding 50% point of the output transition.
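Measuring the 50%-to-50% delay from sampled waveforms can be sketched as below: find each 50% crossing by linear interpolation and subtract. The waveform samples are invented, and the default assumes an inverting cell (rising input, falling output).

```python
def crossing_time(times, volts, level, rising=True):
    """First time the sampled waveform crosses `level`, found by linear interpolation."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def cell_delay(times, v_in, v_out, vdd, input_rising=True, output_rising=False):
    """50% input crossing to 50% output crossing (inverting cell by default)."""
    half = 0.5 * vdd
    t_in = crossing_time(times, v_in, half, input_rising)
    t_out = crossing_time(times, v_out, half, output_rising)
    return t_out - t_in


# Invented samples (ps, V) for an inverter with Vdd = 1.0 V.
t = [0, 10, 20, 30, 40]
v_in = [0.0, 0.5, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 0.8, 0.2, 0.0]
```

Here the input crosses 0.5 V at 10 ps and the output at 25 ps, so `cell_delay(t, v_in, v_out, 1.0)` gives 15 ps.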

Intrinsic delay

Intrinsic delay is the delay internal to the gate: from the input pin of the cell to the output pin of the cell.

It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is predominantly caused by the internal capacitance associated with its transistors.

This delay is largely independent of the size of the transistors forming the gate, because increasing the size of the transistors also increases the internal capacitance.

Net delay (or wire delay)

The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.

It is due to the finite resistance and capacitance of the net. It is also known as wire delay.

Wire delay = fn(Rnet, Cnet + Cpin)
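A common first-order way to evaluate Wire delay = fn(Rnet, Cnet + Cpin) is the Elmore delay of the net split into RC segments. This sketch assumes a uniform wire driving a single receiver pin at its far end:

```python
def elmore_wire_delay(r_net, c_net, c_pin, segments=10):
    """Elmore delay (s) at the far end of a uniform RC wire driving one pin load.

    The wire is split into `segments` equal RC sections; node k sees upstream
    resistance k * r_seg and holds c_seg (plus c_pin at the last node).
    """
    r_seg = r_net / segments
    c_seg = c_net / segments
    delay = 0.0
    for k in range(1, segments + 1):
        node_cap = c_seg + (c_pin if k == segments else 0.0)
        delay += k * r_seg * node_cap
    return delay
```

For an assumed 100 ohm, 100 fF wire and a 10 fF pin, this gives 6.5 ps with 10 segments; as the segment count grows, it approaches Rnet x Cnet / 2 + Rnet x Cpin.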

What are delay models and what is the difference between them?

Linear Delay Model (LDM)

Non-Linear Delay Model (NLDM)
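Roughly, an LDM computes delay from a linear equation in input slew and load, while an NLDM stores delay in a 2-D table indexed by input transition and output load and interpolates between entries. A sketch of that table lookup using bilinear interpolation (the table values are invented; real ones come from the library file):

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i with axis[i] <= x <= axis[i + 1], clamped to the table edges."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D delay table indexed [slew][load]."""
    i, j = _bracket(slews, slew), _bracket(loads, load)
    tx = (slew - slews[i]) / (slews[i + 1] - slews[i])
    ty = (load - loads[j]) / (loads[j + 1] - loads[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return (d00 * (1 - tx) * (1 - ty) + d01 * (1 - tx) * ty
            + d10 * tx * (1 - ty) + d11 * tx * ty)


# Invented 2x2 table: slews in ns, loads in pF, delays in ns.
SLEWS = [0.1, 0.3]
LOADS = [0.01, 0.03]
DELAYS = [[0.02, 0.06],
          [0.04, 0.10]]
```

At the table centre (slew 0.2 ns, load 0.02 pF) the lookup returns the average of the four corners, 0.055 ns. Points outside the table are linearly extrapolated from the edge cells.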
What is a wire load model?

A wire load model gives the estimated R and C of a net (typically as a function of its fanout) before layout; the delay model, e.g. NLDM, then uses these estimates.

Why are higher metal layers preferred for VDD and VSS?

Because they have less resistance and hence lead to less IR drop.

What is logic optimization? Give some methods of logic optimization.

Upsizing

Downsizing

Buffer insertion

Buffer relocation

Dummy buffer placement

What is the significance of negative slack?

Negative slack ==> there is a setup violation ==> the design can fail
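Setup slack is the data required time minus the data arrival time; a negative result is what the answer above calls a setup violation. A sketch with invented numbers (ns):

```python
def setup_slack(period, capture_latency, t_setup, uncertainty,
                launch_latency, t_clk_q, data_delay):
    """Setup slack = data required time - data arrival time (all in ns)."""
    required = period + capture_latency - t_setup - uncertainty
    arrival = launch_latency + t_clk_q + data_delay
    return required - arrival


slack = setup_slack(period=2.0, capture_latency=0.5, t_setup=0.1, uncertainty=0.1,
                    launch_latency=0.5, t_clk_q=0.2, data_delay=1.8)
```

Here data arrives at 2.5 ns but is required by 2.3 ns, so the slack is -0.2 ns and the path needs fixing (e.g. by upsizing cells on it).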

What is signal integrity? How does it affect timing?

IR drop, electromigration (EM), crosstalk and ground bounce are signal integrity issues.

If IR drop is more ==> delay increases.

Crosstalk ==> there can be setup as well as hold violations.

What is IR drop? How to avoid it? How does it affect timing?

There is a resistance associated with each metal layer. Current flowing through this resistance causes a voltage drop (V = I x R), i.e. IR drop.

If IR drop is more ==> delay increases.
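A sketch of IR drop along a single strap fed from one end, with equal current drawn at evenly spaced taps. The resistance per um and tap current are assumed values; the point is that the far end sees the largest drop, and widening the strap (lower resistance per um) reduces it proportionally.

```python
def strap_ir_drop(r_per_um, length_um, n_taps, tap_current_a):
    """Voltage drop (V) at each tap of a strap fed from one end.

    The strap is split into n_taps equal segments; each tap draws tap_current_a.
    Segment j carries the current of every tap downstream of it.
    """
    r_seg = r_per_um * length_um / n_taps
    drops, v = [], 0.0
    for j in range(n_taps):
        downstream_taps = n_taps - j
        v += r_seg * downstream_taps * tap_current_a
        drops.append(v)
    return drops
```

For an assumed 0.02 ohm/um, 1000 um strap with four 1 mA taps, the drops are about 20, 35, 45 and 50 mV; halving the resistance per um (a wider strap) halves every value.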

What is EM and what are its effects?

Due to high current flow in the metal, atoms of the metal can be displaced from their original place. When this happens in larger amounts, the metal can open, or bulging of the metal layer can happen. This effect is known as electromigration.

Effects: either a short or an open of the signal line or power line.

What are the types of routing?

Global routing

Track assignment

Detail routing

What is latency? What are its types?

Source latency

It is also known as source delay. It is defined as the delay from the clock origin point to the clock definition point in the design.

Delay from the clock source to the beginning of the clock tree (i.e. the clock definition point).

The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.

Network latency

It is also known as insertion delay or network latency. It is defined as the delay from the clock definition point to the clock pin of the register.

The time the clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.

What is track assignment?

The second stage of routing, wherein particular metal tracks (or layers) are assigned to the signal nets.

What is congestion?

If the number of routing tracks available for routing is less than the number of tracks required, it is known as congestion.
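Congestion is typically reported per global routing cell (gcell) as overflow: required tracks minus available tracks, where that is positive. The demand and capacity grids below are made up:

```python
def congestion_report(demand, capacity):
    """Per-gcell overflow = max(0, required tracks - available tracks).

    demand and capacity are 2-D lists of track counts with the same shape.
    Returns the total overflow and the (row, col) of every overflowing gcell.
    """
    overflow = [[max(0, d - c) for d, c in zip(d_row, c_row)]
                for d_row, c_row in zip(demand, capacity)]
    total = sum(map(sum, overflow))
    hot = [(r, c) for r, row in enumerate(overflow)
           for c, v in enumerate(row) if v > 0]
    return total, hot


DEMAND = [[8, 12], [10, 6]]
CAPACITY = [[10, 10], [10, 10]]
```

Only gcell (0, 1) needs more tracks (12) than it has (10), so the total overflow is 2; that is the region where spreading cells or adding blockage would be tried first.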

Is congestion related to placement or routing?

Routing

What are clock trees?

Distribution of the clock from the clock source to the sink pins of the registers.

What are the clock tree types?

H-tree, balanced tree, X-tree, clustering tree, fishbone

What is cloning and buffering?

Cloning is a method of optimization that decreases the load of a heavily loaded cell by replicating the cell.

Buffering is a method of optimization that is used to insert buffers in high fanout nets to decrease the delay.

What is the difference between a latch and a flip-flop?

Both latches and flip-flops are circuit elements whose output depends not only on the present inputs, but also on previous inputs and outputs.

They are both hence referred to as sequential elements.

In electronics, a latch is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby can store one bit of information. Today the word is mainly used for simple transparent storage elements, while slightly more advanced non-transparent (or clocked) devices are described as flip-flops. Informally, as this distinction is quite new, the two words are sometimes used interchangeably. [wiki]

In digital circuits, a flip-flop is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby is capable of serving as one bit of memory. Today, the term flip-flop has come to generally denote non-transparent (clocked or edge-triggered) devices, while the simpler transparent ones are often referred to as latches. [wiki]

A flip-flop is controlled by (usually) one or two control signals and/or a gate or clock signal.

Latches are level sensitive, i.e. the output captures the input when the clock signal is high, so as long as the clock is logic 1, the output can change if the input also changes.

Flip-flops are edge sensitive, i.e. a flip-flop will store the input only when there is a rising or falling edge of the clock.
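The level versus edge difference is easy to see in a small behavioral simulation: a positive-level latch follows D whenever the clock is 1, while a rising-edge flip-flop only samples D at the 0-to-1 transition. The stimulus is arbitrary, and both outputs (and the previous clock value) are assumed to start at 0:

```python
def simulate(clk, d):
    """Return (latch_q, flop_q) traces for a positive-level latch and a rising-edge D flip-flop."""
    latch_q, flop_q = [], []
    lq = fq = 0
    prev_clk = 0
    for c, x in zip(clk, d):
        if c == 1:                      # latch is transparent while the clock is high
            lq = x
        if prev_clk == 0 and c == 1:    # flop samples D only on the rising edge
            fq = x
        latch_q.append(lq)
        flop_q.append(fq)
        prev_clk = c
    return latch_q, flop_q


CLK = [0, 1, 1, 1, 0, 0, 1, 1]
D = [1, 1, 0, 1, 1, 0, 0, 1]
```

During the long high phase (samples 1 to 3) the latch output follows D through 1, 0, 1, while the flop holds the 1 it sampled at the edge; in the last sample the latch follows D to 1 but the flop keeps the 0 it captured at the second rising edge.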

A positive level latch is transparent to the positive level (enable), and it latches the final input before the enable changes its level (i.e. before enable goes to 0 or before the clock goes to the -ve level).

A positive edge flop will have its output effective only when the clock input changes from 0 to 1 (1 to 0 for a negative edge flop).

Latches are faster, flip-flops are slower.

A latch is sensitive to glitches on the enable pin, whereas a flip-flop is immune to glitches.

Latches take fewer gates (less power) to implement than flip-flops.

A DFF is built from two latches. They are in master-slave configuration.

A latch may be clocked or clockless. But a flip-flop is always clocked.

For a transparent latch, generally the D-to-Q propagation delay is considered, while for a flop, clock-to-Q and setup and hold times are very important.

Synthesis perspective: Pros and Cons of Latches and Flip-Flops

In synthesis of HDL code, inappropriate coding can infer latches instead of flip-flops, e.g. incomplete if and case statements. This should be avoided, as latches are more prone to glitches.

A latch takes less area; a flip-flop takes more area (as a flip-flop is made up of latches).

Latches facilitate time borrowing or cycle stealing, whereas flip-flops allow synchronous logic.

Latches are not friendly with DFT tools. Minimize inferring of latches if your design has to be made testable, since the enable signal to a latch is not a regular clock that is fed to the rest of the logic. To ensure testability, you need to use an OR gate with the enable and scan_enable signals as inputs and feed its output to the enable port of the latch. [ref]

Most EDA software tools have difficulty with latches. Static timing analyzers typically make assumptions about latch transparency. If one assumes the latch is transparent (i.e. triggered by the active time of the clock, not triggered by just the clock edge), then the tool may find a false timing path through the input data pin. If one assumes the latch is not transparent, then the tool may miss a critical path.

If the target technology supports a latch cell, then race condition problems are minimized. If the target technology does not support a latch, then the synthesis tool will infer it from basic gates, which is prone to race conditions. Then you need to add redundant logic to overcome this problem. But during optimization, the redundant logic can be removed by the synthesis tool! This will create endless problems for the design team.

Due to the transparency issue, latches are difficult to test. For scan testing, they are often replaced by a latch-flip-flop compatible with the scan-test shift register. Under these conditions, a flip-flop would actually be less expensive than a latch. Read a good article on problems of latches published in EE Times long back!!

Flip-flops are friendly with DFT tools. Scan insertion for synchronous logic is hassle-free.


Labels: Digital design

What are the different types of delays in ASIC or VLSI design?

Different Types of Delays in ASIC or VLSI design

Source Delay/Latency

Network Delay/Latency

Insertion Delay

Transition Delay/Slew: Rise time, fall time

Path Delay

Net delay, wire delay, interconnect delay

Propagation Delay

Phase Delay

Cell Delay

Intrinsic Delay

Extrinsic Delay

Input Delay

Output Delay

Exit Delay

Latency (Pre/Post CTS)

Uncertainty (Pre/Post CTS)

Unateness: Positive unateness, negative unateness

Jitter: PLL jitter, clock jitter

Gate delay

Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output. [Magma]

Gate delay = function of (input transition time, Cnet + Cpin).

Cell delay is the same as gate delay.

Source Delay (or Source Latency)

It is also known as source latency. It is defined as the delay from the clock origin point to the clock definition point in the design.

Delay from the clock source to the beginning of the clock tree (i.e. the clock definition point).

The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.

Network Delay (latency)

It is also known as insertion delay or network latency. It is defined as the delay from the clock definition point to the clock pin of the register.

The time the clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.

Insertion delay

The delay from the clock definition point to the clock pin of the register.

Transition delay

It is also known as slew. It is defined as the time taken to change the state of the signal: the time taken for the transition from logic 0 to logic 1 and vice versa, or the time taken by the input signal to rise from 10% (20%) to 90% (80%) and vice versa.

Transition is the time it takes for the pin to change state.

Slew

Rate of change of logic. See Transition delay.

Slew rate is the speed of transition, measured in V/ns.

Rise Time

Rise time is the difference between the time when the signal crosses a low threshold and the time when the signal crosses the high threshold. It can be absolute or percent.

Low and high thresholds are fixed voltage levels around the mid voltage level, or they can be either 10% and 90% respectively or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages from the difference between the starting voltage level and the final settled voltage level.
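For a signal shaped like a single-pole RC step response, v(t) = V(1 - e^(-t/RC)), the threshold-based rise time has a closed form. This is an idealization, not a real driver waveform:

```python
import math


def rc_time_to_fraction(tau, fraction):
    """Time for a single-pole RC step response to reach `fraction` of its final value."""
    return -tau * math.log(1.0 - fraction)


def rc_rise_time(tau, low=0.1, high=0.9):
    """Rise time between the low and high thresholds (10%-90% by default)."""
    return rc_time_to_fraction(tau, high) - rc_time_to_fraction(tau, low)
```

The 10%-90% rise time comes out to RC x ln 9 (about 2.2 RC) and the 20%-80% rise time to RC x ln 4 (about 1.4 RC), which is why the threshold convention must be stated when quoting a slew value.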

Fall Time

Fall time is the difference between the time when the signal crosses a high threshold and the time when the signal crosses the low threshold.

The low and high thresholds are fixed voltage levels around the mid voltage level, or they can be either 10% and 90% respectively or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages from the difference between the starting voltage level and the final settled voltage level.

For an ideal square wave with 50% duty cycle, the rise time will be 0. For a symmetric triangular wave, this is reduced to just 50%.

Click here to see waveform.

Click here to see more info.

The rise/fall definition is set on the meter to 10% and 90% based on the linear power in Watts. These points translate into the -10 dB and -0.5 dB points in log mode (10 log 0.1) and (10 log 0.9). The rise/fall time values of 10% and 90% are calculated based on an algorithm which looks at the mean power above and below the 50% points of the rise/fall times. Click here to see more.

Path delay

Path delay is also known as pin-to-pin delay. It is the delay from the input pin of the cell to the output pin of the cell.

Net Delay (or wire delay)

The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.

It is due to the finite resistance and capacitance of the net. It is also known as wire delay.

Wire delay = fn(Rnet, Cnet + Cpin)

Propagation delay

For any gate, it is measured from the 50% point of the input transition to the corresponding 50% point of the output transition.

This is the time required for a signal to propagate through a gate or net. For gates, it is the time it takes for an event at the gate input to affect the gate output.

For a net, it is the delay between the time a signal is first applied to the net and the time it reaches other devices connected to that net.

It is taken as the average of the high-to-low and low-to-high propagation delays, i.e. Tpd = (Tphl + Tplh)/2.

Phase delay

Same as insertion delay

Cell delay

For any gate, it is measured from the 50% point of the input transition to the corresponding 50% point of the output transition.

Intrinsic delay

Intrinsic delay is the delay internal to the gate: from the input pin of the cell to the output pin of the cell.

It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is predominantly caused by the internal capacitance associated with its transistors.

This delay is largely independent of the size of the transistors forming the gate, because increasing the size of the transistors also increases the internal capacitance.

Extrinsic delay

Same as wire delay, net delay, interconnect delay, flight time.

Extrinsic delay is the delay effect associated with the interconnect: from the output pin of the cell to the input pin of the next cell.

Input delay

Input delay is the time at which the data arrives at the input pin of the block from the external circuit, with respect to the reference clock.

Output delay

Output delay is the time required by the external circuit before which the data has to arrive at the output pin of the block, with respect to the reference clock.

Exit delay

It is defined as the delay in the longest path (critical path) between the clock pad input and an output. It determines the maximum operating frequency of the design.

Latency (pre/post CTS)

Latency is the sum of the source latency and the network latency. Pre-CTS, the estimated latency is considered during synthesis; after CTS, the propagated latency is considered.
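A sketch of the post-CTS (propagated) view: network latency is summed along the clock tree from the clock definition point to each register clock pin, and source latency is added on top. The tree, node names and delays (ps) are hypothetical:

```python
def network_latency(tree, sink):
    """Sum the delays from the clock definition point (root) down to `sink`.

    tree maps node -> (parent, delay_from_parent); the root maps to (None, 0).
    """
    total = 0
    node = sink
    while node is not None:
        parent, delay = tree[node]
        total += delay
        node = parent
    return total


def total_latency(source_latency, tree, sink):
    """Latency = source latency + network latency."""
    return source_latency + network_latency(tree, sink)


# Hypothetical clock tree (ps): clk_port -> buf1 -> buf2 -> ff_a, and buf1 -> ff_b.
CLOCK_TREE = {
    "clk_port": (None, 0),
    "buf1": ("clk_port", 150),
    "buf2": ("buf1", 120),
    "ff_a/CK": ("buf2", 50),
    "ff_b/CK": ("buf1", 200),
}
```

With 500 ps of source latency, ff_a/CK sees 820 ps and ff_b/CK sees 850 ps of total latency; pre-CTS, a single estimated latency would be used for both instead.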

Uncertainty (pre/post CTS)

Uncertainty is the amount of skew and the variation in the arrival of the clock edge. Pre-CTS uncertainty is clock skew plus clock jitter. After CTS we can have some margin of skew + jitter.

Unateness

A function is said to be unate if a rise transition on a positive unate input variable causes the output to rise or not change, and vice versa.

Negative unateness means the cell output logic is the inverted version of the input logic, e.g. in an inverter having input A and output Y, Y is -ve unate w.r.t. A. Positive unate means the cell output logic is the same as that of the input.

These +ve and -ve unateness are constraints defined in the library file and are defined for an output pin w.r.t. some input pin.

A clock signal is positive unate if a rising edge at the clock source can only cause a rising edge at the register clock pin, and a falling edge at the clock source can only cause a falling edge at the register clock pin.

A clock signal is negative unate if a rising edge at the clock source can only cause a falling edge at the register clock pin, and a falling edge at the clock source can only cause a rising edge at the register clock pin. In other words, the clock signal is inverted.

A clock signal is not unate if the clock sense is ambiguous as a result of non-unate timing arcs in the clock path. For example, a clock that passes through an XOR gate is not unate because there are non-unate arcs in the gate. The clock sense could be either positive or negative, depending on the state of the other input to the XOR gate.
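Unateness of an arc can be checked by brute force on a small Boolean function: flip the input from 0 to 1 under every combination of the other inputs and see whether the output only rises, only falls, or does both. The returned labels mirror the Liberty timing_sense values; the function itself is only a sketch:

```python
from itertools import product


def unateness(func, n_inputs, pin):
    """Classify func's output w.r.t. input `pin` as positive_unate, negative_unate or non_unate.

    For every assignment of the other inputs, flip `pin` from 0 to 1 and record
    whether the output rises or falls. No change at all counts as positive unate.
    """
    rises = falls = False
    for others in product((0, 1), repeat=n_inputs - 1):
        low = list(others[:pin]) + [0] + list(others[pin:])
        high = list(others[:pin]) + [1] + list(others[pin:])
        before, after = func(*low), func(*high)
        if after > before:
            rises = True
        elif after < before:
            falls = True
    if rises and falls:
        return "non_unate"
    return "negative_unate" if falls else "positive_unate"
```

An inverter comes out negative_unate, a 2-input AND positive_unate, and an XOR non_unate, matching the XOR clock example above.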

Jitter

The short-term variations of a signal with respect to its ideal position in time.

Jitter is the variation of the clock period from edge to edge. It can vary by +/- the jitter value.

From cycle to cycle, the period and duty cycle can change slightly due to the clock generation circuitry. This can be modeled by adding uncertainty regions around the rising and falling edges of the clock waveform.
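Given measured edge timestamps, period jitter and cycle-to-cycle jitter can be computed directly. The edge times (ps) below are invented, and peak-to-peak is only one of several ways to report jitter:

```python
def jitter_stats(edge_times):
    """Period-based jitter figures from successive rising-edge timestamps."""
    periods = [b - a for a, b in zip(edge_times, edge_times[1:])]
    mean = sum(periods) / len(periods)
    return {
        "periods": periods,
        "period_jitter_pp": max(periods) - min(periods),
        "max_period_deviation": max(abs(p - mean) for p in periods),
        "cycle_to_cycle": max(abs(b - a) for a, b in zip(periods, periods[1:])),
    }


EDGES_PS = [0, 1010, 1990, 3000, 4000]  # nominal 1000 ps clock
```

For these edges the periods are 1010, 980, 1010 and 1000 ps: 30 ps peak-to-peak period jitter, at most 20 ps away from the 1000 ps mean, and 30 ps worst cycle-to-cycle jitter.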

Sources of Jitter: Common sources of jitter include:

Internal circuitry of the phase-locked loop (PLL)

Random thermal noise from a crystal

Other resonating devices

Random mechanical noise from crystal vibration

Signal transmitters

Traces and cables

Connectors

Receivers

Click here to read more about jitter from Altera.

Click here to read what Wiki says about jitter.

Skew

The difference in the arrival of the clock signal at the clock pins of different flops.

Two types of skew are defined: local skew and global skew.

Local skew

The difference in the arrival of the clock signal at the clock pins of related flops.

Global skew

The difference in the arrival of the clock signal at the clock pins of non-related flops.

Skew can be positive or negative.

When data and clock are routed in the same direction, it is positive skew.

When data and clock are routed in opposite directions, it is negative skew.
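Skew between a launch/capture pair is simply the difference in clock arrival times. In the sketch below, skew is taken as capture minus launch, so it is positive when the capture clock arrives later, which is what happens when clock and data travel in the same direction. Global skew is taken here as the largest minus the smallest arrival over all sinks; the arrival values (ps) are hypothetical:

```python
def clock_skew(arrivals, launch, capture):
    """Skew for a launch/capture pair: capture arrival - launch arrival."""
    return arrivals[capture] - arrivals[launch]


def global_skew(arrivals):
    """Largest difference in clock arrival over all sinks."""
    return max(arrivals.values()) - min(arrivals.values())


ARRIVALS_PS = {"ff1": 820, "ff2": 850, "ff3": 780}
```

For ff1 launching to ff2 the skew is +30 ps; in the reverse direction it is -30 ps, and the global skew across all three flops is 70 ps.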

Recovery Time

Recovery specifies the minimum time that an asynchronous control input pin must be held stable after being de-asserted and before the next clock (active edge) transition.

Recovery time specifies the time the inactive edge of the asynchronous signal has to arrive before the closing edge of the clock.

Recovery time is the minimum length of time an asynchronous control signal (e.g. preset) must be stable before the next active clock edge. The recovery slack time calculation is similar to the clock setup slack time calculation, but it applies to asynchronous control signals.

Equation 1:

Recovery Slack Time = Data Required Time - Data Arrival Time

Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclk-q + Register to Register Delay

Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup

If the asynchronous control is not registered, the equations shown in Equation 2 are used to calculate the recovery slack time.

Equation 2:

Recovery Slack Time = Data Required Time - Data Arrival Time

Data Arrival Time = Launch Edge + Maximum Input Delay + Port to Register Delay

Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup

If the asynchronous reset signal is from a port (device I/O), you must make an Input Maximum Delay assignment to the asynchronous reset pin to perform recovery analysis on that path.

Removal Time

Removal specifies the minimum time that an asynchronous control input pin must be held stable before being de-asserted and after the previous clock (active edge) transition.

Removal time specifies the length of time the active phase of the asynchronous signal has to be held after the closing edge of the clock.

Removal time is the minimum length of time an asynchronous control signal must be stable after the active clock edge. The calculation is similar to the clock hold slack calculation, but it applies to asynchronous control signals. If the asynchronous control is registered, the equations shown in Equation 3 are used to calculate the removal slack time.

If the recovery or removal minimum time requirement is violated, the output of the sequential cell becomes uncertain. The uncertainty can be caused by the value set by the resetbar signal or the value clocked into the sequential cell from the data input.

Equation 3:

Removal Slack Time = Data Arrival Time - Data Required Time

Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclk-q of Source Register + Register to Register Delay

Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold

If the asynchronous control is not registered, the equations shown in Equation 4 are used to calculate the removal slack time.

Equation 4:

Removal Slack Time = Data Arrival Time - Data Required Time

Data Arrival Time = Launch Edge + Input Minimum Delay of Pin + Minimum Pin to Register Delay

Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold

If the asynchronous reset signal is from a device pin, you must specify the Input Minimum Delay constraint to the asynchronous reset pin to perform a removal analysis on this path.
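The registered cases (Equations 1 and 3) translate directly into code. This sketch assumes all times in ns and, as for a hold check, that the removal latch edge is the same edge as the launch edge; for the unregistered cases (Equations 2 and 4) only the arrival-time expression changes, to launch edge + input delay + port-to-register delay.

```python
def recovery_slack_registered(launch_edge, clk_to_src, t_clk_q, reg_to_reg,
                              latch_edge, clk_to_dst, t_setup):
    """Equation 1: Recovery Slack = Data Required Time - Data Arrival Time."""
    arrival = launch_edge + clk_to_src + t_clk_q + reg_to_reg
    required = latch_edge + clk_to_dst - t_setup
    return required - arrival


def removal_slack_registered(launch_edge, clk_to_src, t_clk_q, reg_to_reg,
                             latch_edge, clk_to_dst, t_hold):
    """Equation 3: Removal Slack = Data Arrival Time - Data Required Time."""
    arrival = launch_edge + clk_to_src + t_clk_q + reg_to_reg
    required = latch_edge + clk_to_dst + t_hold
    return arrival - required
```

With invented numbers (0.4 ns source clock delay, 0.2 ns clock-to-q, 1.5 ns reset path, 0.5 ns destination clock delay), the recovery check against a 3.0 ns latch edge with 0.3 ns setup gives 1.1 ns slack, and the removal check against the same launch edge with 0.1 ns hold gives 1.5 ns slack.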

What is the difference between soft macro and hard macro?

What is the difference between hard macro, firm macro and soft macro?

or

What are IPs?

Hard macros, firm macros and soft macros are all known as IP (intellectual property). They are optimized for power, area and performance. They can be purchased and used in your ASIC or FPGA design implementation flow. A soft macro is flexible for all types of ASIC implementation. A hard macro can be used in a pure ASIC design flow, not in an FPGA flow. Before buying any IP, it is very important to evaluate its advantages and disadvantages over the others, its hardware compatibility (such as I/O standards) with your design blocks, and its reusability for other designs.

Soft macros

Soft macros are in synthesizable RTL.

Soft macros are more flexible than firm or hard macros.

Soft macros are not specific to any manufacturing process.

Soft macros have the disadvantage of being somewhat unpredictable in terms of performance, timing, area, or power.

Soft macros carry greater IP protection risks because RTL source code is more portable and therefore less easily protected than either a netlist or physical layout data.

From the physical design perspective, a soft macro is any cell that has been placed and routed in a placement and routing tool such as Astro. (This is the definition given in the Astro Rail user manual!)

Soft macros are editable and can contain standard cells, hard macros, or other soft macros.

Firm macros

Firm macros are in netlist format.

Firm macros are optimized for performance/area/power using a specific fabrication technology.

Firm macros are more flexible and portable than hard macros.

Firm macros are more predictable in performance and area than soft macros.

Hard macro

Hard macros are generally in the form of hardware IPs (or we term them as hardware IPs!).

Hard macros are targeted at a specific IC manufacturing technology.

Hard macros are block level designs which are silicon tested and proven.

Hard macros have been optimized for power or area or timing.

In physical design you can only access the pins of hard macros, unlike soft macros, which allow us to manipulate them in different ways.

You have the freedom to move, rotate and flip, but you can't touch anything inside hard macros.

A very common example of a hard macro is memory. It can be any design which carries a dedicated single functionality (in general), for example an MP4 decoder.

Be aware of the features and characteristics of a hard macro before you use it in your design: other than power, timing and area, you also should know pin properties like sync pin, I/O standards etc.

The LEF and GDS2 file formats allow easy usage of macros in different tools.

From the physical design (backend) perspective:

A hard macro is a block that is generated in a methodology other than place and route (i.e. using a full custom design methodology) and is brought into the physical design database (e.g. Milkyway in Synopsys; Volcano in Magma) as a GDS2 file.

Here is one article published in an embedded magazine about IPs. Click here to read.

Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ different algorithms to accomplish this task along with the target of power and area. There are several research papers available on these subjects. Some of them can be downloaded from the links given below.

Hard Macro Placement in Complex SoC Design: view and read article from SoC Central

Hard Macro Placement in Complex SoC Design: download white paper

IEEE/University research papers

Local Search for Final Placement in VLSI Design: download

Consistent Placement of Macro-Blocks Using Floorplanning and Standard-Cell Placement: download

A Timing-Driven Soft-Macro Placement and Resynthesis Method in Interaction with Chip Floorplanning: download


Labels: ASIC, Physical Design, VLSI

WhatisthedifferencebetweenFPGAandCPLD?

FPGAField Programmable Gate Array and CPLDComplex Programmable Logic Device both are
programmablelogicdevicesmadebythesamecompanieswithdierentcharacteristics.

A Complex Programmable Logic Device (CPLD) is a programmable logic device with complexity between that of PALs (Programmable Array Logic) and FPGAs, and architectural features of both. The building block of a CPLD is the macrocell, which contains logic implementing disjunctive normal form expressions and more specialized logic operations.
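To make the "disjunctive normal form" point concrete, here is a minimal Python sketch of a CPLD-style macrocell: an AND array that forms product terms, an OR gate that sums them, and a flip-flop that can register the result. The class and function names are illustrative assumptions, not any vendor's actual macrocell architecture.

```python
# Sketch of a CPLD-style macrocell: AND array -> OR gate -> optional flip-flop.
from typing import Dict, List, Tuple

# A product term is a list of (signal_name, required_value) literals.
ProductTerm = List[Tuple[str, bool]]


def eval_dnf(terms: List[ProductTerm], inputs: Dict[str, bool]) -> bool:
    """OR of ANDs: true if every literal of at least one product term is satisfied."""
    return any(all(inputs[name] == value for name, value in term) for term in terms)


class Macrocell:
    def __init__(self, terms: List[ProductTerm]):
        self.terms = terms
        self.q = False  # state of the macrocell flip-flop

    def comb(self, inputs: Dict[str, bool]) -> bool:
        """Combinational (unregistered) output of the AND/OR array."""
        return eval_dnf(self.terms, inputs)

    def clock(self, inputs: Dict[str, bool]) -> bool:
        """Registered output: capture the AND/OR result on a clock edge."""
        self.q = self.comb(inputs)
        return self.q


# f = (a AND b) OR (NOT c)
cell = Macrocell([[("a", True), ("b", True)], [("c", False)]])
print(cell.comb({"a": True, "b": True, "c": True}))   # True
print(cell.comb({"a": True, "b": False, "c": True}))  # False
```

Because every function is expressed as one wide AND/OR equation, the path from pin to macrocell output has the same structure for any logic that fits, which is where the CPLD's predictable timing comes from.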

This is how Wikipedia defines it!

Click here to see what else Wikipedia has to say about it!

Architecture

Granularity is the biggest difference between CPLD and FPGA.

FPGAs are fine-grain devices. That means that they contain hundreds (up to 100,000) of tiny blocks of logic (called LUTs, CLBs, etc.) with flip-flops, combinational logic and memories. FPGAs offer much higher complexity, up to 150,000 flip-flops and a large number of gates.
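A fine-grain FPGA logic block is built around a look-up table (LUT): the configuration bits are simply the truth table of the function, and the inputs act as the address that selects one bit. The sketch below models a k-input LUT in Python; the helper names are hypothetical and the model ignores the flip-flop and carry logic that usually sit next to the LUT.

```python
# A k-input look-up table (LUT): the configuration is the truth table stored
# in SRAM bits, and the input vector is the address of the bit to read.
from typing import Callable, List, Sequence


def make_lut_config(func: Callable[..., int], k: int) -> List[int]:
    """Enumerate all 2**k input combinations and store func's output bits."""
    bits = []
    for index in range(2 ** k):
        # Bit i of the index is input i (input 0 is the least significant bit).
        inputs = [(index >> i) & 1 for i in range(k)]
        bits.append(int(bool(func(*inputs))))
    return bits


def lut_eval(config: Sequence[int], inputs: Sequence[int]) -> int:
    """Read the configuration bit addressed by the inputs."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return config[index]


# Configure a 4-input LUT as a 4-input XOR (parity) function.
xor4 = make_lut_config(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
print(len(xor4))                     # 16 configuration bits
print(lut_eval(xor4, [1, 0, 1, 1]))  # 1
```

Any 4-input function costs the same 16 bits and the same read path, which is why FPGA logic is "fine grain": large designs are assembled from many such small, identical blocks.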

CPLDs typically have the equivalent of thousands of logic gates, allowing implementation of
moderately complicated data processing devices. PALs typically have a few hundred gate
equivalents at most, while FPGAs typically range from tens of thousands to several million.

CPLDs are coarse-grain devices. They contain relatively few (a few hundred at most) large blocks of logic with flip-flops and combinational logic. CPLDs are based on an AND-OR structure.

CPLDs have a register with associated logic (AND/OR matrix). CPLDs are mostly used in control applications and FPGAs in datapath applications. Because of this coarse-grained architecture, the timing is very fixed (predictable) in CPLDs.

Most FPGAs are RAM based. They need to be downloaded (configured) at each power-up. CPLDs are EEPROM based. They are active at power-up, i.e., as long as they've been programmed at least once.

An FPGA needs a boot ROM, but a CPLD does not. In some systems you might not have enough time to boot up the FPGA; then you need a CPLD plus an FPGA.

Generally, CPLD devices are not volatile, because they contain flash or erasable ROM memory in all cases. FPGAs are volatile in many cases and hence need a configuration memory for working. There are some FPGAs now which are nonvolatile. This distinction is rapidly becoming less relevant, as several of the latest FPGA products also offer models with embedded configuration memory.

The characteristic of nonvolatility makes the CPLD the device of choice in modern digital designs to perform boot loader functions before handing over control to other devices not having this capability. A good example is where a CPLD is used to load configuration data for an FPGA from nonvolatile memory.

Because of the coarse-grain architecture, one block of logic can hold a big equation, and hence CPLDs have faster input-to-output timings than FPGAs.

Click here to read one good article.

Features

FPGAs have special routing resources to implement binary counters, arithmetic functions like adders and comparators, and RAM. CPLDs don't have special features like this.

FPGAs can contain very large digital designs, while CPLDs can contain only small designs. The limited complexity (<500>

Speed: CPLDs offer a single-chip solution with fast pin-to-pin delays, even for wide input functions. Use CPLDs for small designs where instant-on, fast and wide decoding, ultra-low idle power consumption and design security are important (e.g., in battery-operated equipment).

Security: In a CPLD, once programmed, the design can be locked and thus made secure. Since the configuration bitstream of an FPGA must be reloaded every time power is reapplied, design security in an FPGA is an issue.

Power: The high static (idle) power consumption prohibits use of CPLDs in battery-operated equipment. FPGA idle power consumption is reasonably low, although it is sharply increasing in the newest families.

Design flexibility: FPGAs offer more logic flexibility and more sophisticated system features than CPLDs: clock management, on-chip RAM, DSP functions (multipliers), and even on-chip microprocessors and Multi-Gigabit Transceivers. These benefits, and the opportunity for dynamic reconfiguration even in the end-user system, are an important advantage.

Use FPGAs for larger and more complex designs.

Click here to read what Xilinx has to say about it.

FPGAs are suited for sequential (register-heavy) circuits because they have more registers, while CPLDs are suited for control circuits because they have more combinational logic. At the same time, if you synthesize the same code for an FPGA many times, you will find that each timing report is different. CPLD synthesis is different: you get the same result every time.

As CPLDs and FPGAs become more advanced, the differences between the two device types will continue to blur. While this trend may appear to make the two types more difficult to keep apart, the architectural advantage of CPLDs, combining low cost, nonvolatile configuration, and macrocells with predictable timing characteristics, will likely be sufficient to maintain a product differentiation for the foreseeable future.

What is the difference between FPGA and ASIC?

This question is very popular in VLSI fresher interviews. It looks simple, but a deeper insight into the subject reveals that there are a lot of things to be understood! So here is the answer.

FPGA vs. ASIC

The difference between ASICs and FPGAs mainly depends on costs, tool availability, performance and design flexibility. Each has its own pros and cons, and it is the designer's responsibility to weigh the advantages of each and choose either an FPGA or an ASIC for the product. However, recent developments in the FPGA domain are narrowing down the benefits of ASICs.

FPGA

Field Programmable Gate Arrays

FPGA Design Advantages

Faster time-to-market: No layout, masks or other manufacturing steps are needed for FPGA design. A ready-made FPGA is available; burn your HDL code into the FPGA! Done!!

No NRE (Non-Recurring Expenses): This cost is typically associated with an ASIC design; for an FPGA it does not exist. FPGA tools are cheap (sometimes they're free! You need to buy the FPGA, that's all!). For an ASIC you pay a huge NRE and the tools are expensive. I would say very expensive: it's in crores!!

Simpler design cycle: This is due to software that handles much of the routing, placement and timing. Manual intervention is less. The FPGA design flow eliminates much of the complex and time-consuming manual floorplanning, place and route, and timing analysis.

More predictable project cycle: The FPGA design flow eliminates potential respins, wafer capacity concerns, etc., since the design logic is already synthesized and verified in the FPGA device.

Field reprogrammability: A new bitstream (i.e., your program) can be uploaded remotely and instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than 4-6 weeks to make the same changes. FPGA costs range from a couple of dollars to several hundred dollars or more, depending on the hardware features.

Reusability: Reusability of the FPGA is the main advantage. A prototype of the design can be implemented on an FPGA and verified for almost accurate results, so that it can then be implemented on an ASIC. If the design has faults, change the HDL code, generate the bitstream, program the FPGA and test again. Modern FPGAs are reconfigurable both partially and dynamically.

FPGAs are good for prototyping and limited production. If you are going to make 100-200 boards, it isn't worth making an ASIC.

Generally, FPGAs are used for lower-speed, lower-complexity and lower-volume designs. But today's FPGAs even run at 500 MHz with superior performance. With unprecedented logic density increases and a host of other features, such as embedded processors, DSP blocks, clocking and high-speed serial, at ever lower prices, FPGAs are suitable for almost any type of design.

Unlike ASICs, FPGAs have special hardware such as Block RAM, DCM modules, MACs, memories, high-speed I/O, embedded CPUs, etc. built in, which can be used to get better performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with phase-locked loops, low-voltage differential signaling, clock data recovery, more internal routing, high speed, hardware multipliers for DSPs, memory, programmable I/O, IP cores and microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx, and ARM (hard core) and Nios (soft core) in Altera. There are FPGAs available now with built-in ADCs! Using all these features, designers can build a system on a chip. Now, do you really need an ASIC?

FPGA synthesis is much easier than ASIC synthesis.

In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.

FPGA Design Disadvantages

Power consumption in an FPGA is higher, and you don't have any control over the power optimization. This is where ASIC wins the race!

You have to use the resources available in the FPGA. Thus the FPGA limits the design size.

Good for low-quantity production. As quantity increases, the cost per product becomes higher than that of an ASIC implementation.

ASIC

Application Specific Integrated Circuit

ASIC Design Advantages

Cost, cost, cost. Lower unit costs: for very high-volume designs, costs come out to be very low. At larger volumes, an ASIC design proves to be cheaper than implementing the design using an FPGA.
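The volume argument can be made concrete with a simple break-even calculation: the ASIC pays a one-time NRE but has a lower unit cost, so beyond some volume it becomes cheaper. The Python sketch below assumes the FPGA has zero NRE, and every dollar figure in it is hypothetical, chosen only to illustrate the arithmetic.

```python
# Break-even volume between FPGA and ASIC (all figures hypothetical).
import math


def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    return nre + unit_cost * volume


def break_even_volume(asic_nre: float, asic_unit: float, fpga_unit: float) -> int:
    """Smallest volume at which the ASIC is strictly cheaper (FPGA NRE taken as 0)."""
    if fpga_unit <= asic_unit:
        raise ValueError("the ASIC never breaks even if its unit cost is not lower")
    return math.floor(asic_nre / (fpga_unit - asic_unit)) + 1


# Hypothetical: $1M ASIC NRE, $5 per ASIC part, $50 per FPGA part.
n = break_even_volume(1_000_000, 5.0, 50.0)
print(n)  # 22223
```

Below that volume the FPGA wins on cost; above it the ASIC's lower unit cost pays back the NRE. Real decisions also weigh time-to-market and respin risk, which this sketch ignores.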

Speed, speed, speed. ASICs are faster than FPGAs: an ASIC gives design flexibility, which gives enormous opportunity for speed optimizations.

Low power, low power, low power: An ASIC can be optimized for the required low power. Several low-power techniques, such as power gating, clock gating, multi-Vt cell libraries, pipelining, etc., are available to achieve the power target. This is where FPGA fails badly!!! Can you think of a cell phone which has to be charged for every call? Never! Low-power ASICs help the battery last longer!!

In an ASIC you can implement analog circuits and mixed-signal designs. This is generally not possible in an FPGA.

In an ASIC, DFT (Design For Test) is inserted. In an FPGA, DFT is not carried out (rather, an FPGA design has no need for DFT!).

ASIC Design Disadvantages

Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten development time is to make prototypes using FPGAs and then switch to an ASIC.

Design issues: In an ASIC you should take care of DFM issues, signal integrity issues and many more. In an FPGA you don't have all these, because the ASIC designer has already taken care of them. (Don't forget that an FPGA is an IC, designed by ASIC design engineers!!)

Expensive tools: ASIC design tools are very expensive. You spend a huge amount on NRE.

Structured ASICs

Structured ASICs have the bottom metal layers fixed, and only the top layers can be designed by the customer.

Structured ASICs are custom devices that approach the performance of today's standard-cell ASICs while dramatically simplifying the design complexity.

Structured ASICs offer designers a set of devices with specific, customizable metal layers along with predefined metal layers, which can contain the underlying pattern of logic cells, memory and I/O.

FPGA vs. ASIC Design Flow Comparison

http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm

Other links

http://www.controleng.com/article/CA607224.html

http://www.soccentral.com/results.asp?CategoryID=488&EntryID=15887

http://www.us.designreuse.com/articles/article9010.html


In scan chains, if some flip-flops are +ve edge triggered and the remaining flip-flops are -ve edge triggered, how does it behave?

Answer:

For designs with both positive- and negative-edge clocked flops, the scan insertion tool will always route the scan chain so that the negative-edge flops come before the positive-edge flops in the chain. This avoids the need for a lockup latch.

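For the same clock domain, if a negedge flop followed a posedge flop in the chain, the negedge flop would always capture the data just captured into the posedge flop on the posedge of the same clock cycle, so data would move through two flops in one shift cycle. Placing the negedge flops first avoids this.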
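The ordering rule can be checked with a small Python model of one shift clock. It assumes ideal clocks with no skew and zero clock-to-Q delay, and the function names are hypothetical; it does not model lockup latches, only why the negedge-first order shifts correctly while the posedge-first order lets data race through two flops in one cycle.

```python
# Toy scan-shift model: one clock, flops marked "P" (posedge) or "N" (negedge).
# Assumes ideal clocks (no skew) and zero clock-to-Q delay; each clock period
# is a rising edge followed by a falling edge.
from typing import List


def shift_mixed(chain: List[str], state: List[int], scan_in: List[int]) -> List[int]:
    state = list(state)
    for bit in scan_in:
        for edge in ("P", "N"):  # rising edge first, then falling edge
            d_inputs = [bit] + state[:-1]  # what each flop sees on its scan input now
            state = [d_inputs[i] if chain[i] == edge else state[i]
                     for i in range(len(chain))]
    return state


def ideal_shift(state: List[int], scan_in: List[int]) -> List[int]:
    """Reference behaviour: every flop moves exactly one position per clock."""
    state = list(state)
    for bit in scan_in:
        state = [bit] + state[:-1]
    return state


pattern = [1, 0, 1, 1]
neg_first = ["N", "N", "P", "P"]
pos_first = ["P", "P", "N", "N"]
print(shift_mixed(neg_first, [0] * 4, pattern) == ideal_shift([0] * 4, pattern))  # True
print(shift_mixed(pos_first, [0] * 4, pattern) == ideal_shift([0] * 4, pattern))  # False
```

In the posedge-first chain the first negedge flop samples, on the falling edge, a value its posedge neighbour captured only half a cycle earlier, so one bit is lost per boundary; a lockup latch or the negedge-first order removes that race.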

For multiple clock domains, it all depends upon how the clock trees are balanced. If the clock domains are completely asynchronous, ATPG has to mask the receiving flops.

What is the difference between a normal buffer and a clock buffer?

Answer:

The clock net is one of the high-fanout nets (HFNs). Clock buffers are designed with some special properties, like high drive strength and less delay. Clock buffers have equal rise and fall times (and balanced rise and fall delays). This prevents the duty cycle of the clock signal from changing when it passes through a chain of clock buffers.

Normal buffers are designed with a W/L ratio such that the sum of rise time and fall time is minimum. They too are designed for higher drive strength.
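Why balance matters can be shown with a little arithmetic: a non-inverting buffer delays the rising edge by its low-to-high delay and the falling edge by its high-to-low delay, so the high pulse width changes by the difference at every stage, and the error accumulates along the chain. The Python sketch below uses hypothetical delay values and ignores slew degradation and on-chip variation.

```python
# Duty-cycle distortion through a chain of non-inverting buffers.
# Each buffer delays the rising edge by t_plh and the falling edge by t_phl,
# so the high pulse width changes by (t_phl - t_plh) per stage.
def duty_cycle_after_chain(period_ps: float, duty_in: float,
                           t_plh_ps: float, t_phl_ps: float, stages: int) -> float:
    high = period_ps * duty_in + stages * (t_phl_ps - t_plh_ps)
    high = min(max(high, 0.0), period_ps)  # pulse cannot shrink below 0 or exceed the period
    return high / period_ps


# Hypothetical 1 GHz clock (1000 ps period) through 20 buffers.
print(duty_cycle_after_chain(1000, 0.5, 30, 35, 20))  # unbalanced buffer: 0.6
print(duty_cycle_after_chain(1000, 0.5, 32, 32, 20))  # balanced clock buffer: 0.5
```

A 5 ps imbalance looks harmless for one cell, but over 20 stages it turns a 50% clock into a 60% clock, which eats into the half-cycle budget of any negedge or latch-based logic.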

What is the difference between HFN synthesis and CTS?

Answer:

HFNs are synthesized in the front end also, but at that moment no placement information for the standard cells is available, hence the backend tool collapses the synthesized HFNs. It resynthesizes the HFNs based on placement information and inserts buffers appropriately. The target of this synthesis is to meet delay requirements, i.e., setup and hold.

For the clock, no synthesis is carried out in the front end. (Why? Because there is no placement information for the flip-flops, so synthesis won't meet true skew targets!) In the backend, clock tree synthesis tries to meet skew targets. It inserts clock buffers (which have equal rise and fall times, unlike normal buffers!). There is no skew information for any HFNs.

Is it possible to have zero skew in the design?

Answer:

Theoretically it is possible!

Practically it is impossible!!

Practically we can't reduce any delay to zero. Delay will exist, hence we try to make the clock delays to all flops equal (or the same) rather than zero. With this optimization, all flops get the clock edge with the same delay relative to each other, so virtually we can say they have zero skew, or the skew is balanced.
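The distinction between latency and skew is easy to see numerically: skew is the spread of clock arrival times, so it can be zero even though every flop has a large, non-zero insertion delay. A small Python sketch with hypothetical post-CTS latencies:

```python
# Clock skew from per-flop clock latencies (insertion delays), in picoseconds.
# CTS balances latencies rather than removing them, so skew can approach zero
# even though every latency is large.
from typing import Dict


def global_skew(latency_ps: Dict[str, float]) -> float:
    """Spread between the latest and earliest clock arrival in the domain."""
    return max(latency_ps.values()) - min(latency_ps.values())


def local_skew(latency_ps: Dict[str, float], launch: str, capture: str) -> float:
    """Skew between a launch and a capture flop (positive: capture clock arrives later)."""
    return latency_ps[capture] - latency_ps[launch]


# Hypothetical latencies after CTS.
lat = {"ff_a": 412.0, "ff_b": 418.0, "ff_c": 409.0}
print(global_skew(lat))                 # 9.0
print(local_skew(lat, "ff_a", "ff_b"))  # 6.0
```

Every flop here sees roughly 410 ps of clock delay, yet the skew that matters for timing is only a few picoseconds.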

What do you mean by scan chain reordering?

Answer 1:

Based on timing and congestion, the tool optimally places standard cells. While doing so, if scan chains are detached, it can break the chain ordering (which is done by a scan insertion tool like DFT Compiler from Synopsys) and reorder the chain to optimize it. It maintains the number of flops in a chain.

Answer 2:

During placement, the optimization may make the scan chain difficult to route due to congestion. Hence the tool will reorder the chain to reduce congestion.

This sometimes increases hold time problems in the chain. To overcome these, buffers may have to be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly. It cannot swap cells from different clock domains.

Because of scan chain reordering, the patterns generated earlier are of no use. But this is not a problem, as ATPG can be redone by reading the new netlist.
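A simple way to picture what placement-aware reordering does is a greedy nearest-neighbour walk over the placed flops: starting at the scan input, always hook up the closest unconnected flop. The Python sketch below follows the two constraints mentioned above (keep the flop count, never mix clock domains), uses Manhattan distance as a stand-in for routed wire length, and uses hypothetical coordinates; commercial tools use more elaborate algorithms.

```python
# Greedy nearest-neighbour scan-chain reordering after placement (a sketch).
from typing import List, Tuple

Flop = Tuple[str, str, int, int]  # (name, clock_domain, x, y)


def manhattan(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(chain: List[Flop], scan_in_xy: Tuple[int, int]) -> int:
    """Estimated scan wire length from the scan input through every flop in order."""
    points = [scan_in_xy] + [(f[2], f[3]) for f in chain]
    return sum(manhattan(points[i], points[i + 1]) for i in range(len(points) - 1))


def reorder(chain: List[Flop], scan_in_xy: Tuple[int, int]) -> List[Flop]:
    if len({f[1] for f in chain}) > 1:
        raise ValueError("do not mix clock domains when reordering")
    remaining, ordered, here = list(chain), [], scan_in_xy
    while remaining:
        nxt = min(remaining, key=lambda f: manhattan(here, (f[2], f[3])))
        remaining.remove(nxt)
        ordered.append(nxt)
        here = (nxt[2], nxt[3])
    return ordered  # same flops, same count, new order


chain = [("f1", "clkA", 10, 0), ("f2", "clkA", 1, 0), ("f3", "clkA", 5, 0)]
new = reorder(chain, (0, 0))
print([f[0] for f in new])                                     # ['f2', 'f3', 'f1']
print(chain_length(chain, (0, 0)), chain_length(new, (0, 0)))  # 23 10
```

Shorter scan wiring relieves congestion, but flops that end up very close together can create the scan-path hold problems mentioned above, and the new order is why ATPG patterns must be regenerated.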

On what basis do we decide the clock frequency in any design?

Answer:

There are several factors. The important ones are:

1) Input and output data rate: For example, if you are designing an encryptor or decryptor, you may need a minimum of 100 MHz to keep up with the data rate.

2) Power: The higher the frequency, the higher the power consumption.

3) Accuracy of the results required: If high accuracy is not needed, an RC oscillator can be used, which saves area and gives everything we want in a compact size. But an RC oscillator can't produce high frequencies!

4) Technology: The lower the node, the higher the speed (and also the higher the power; again a trade-off!). How fast do we want it?

5) Target platform: Is it an FPGA or a custom ASIC? Naturally, an ASIC can give a higher clock frequency, while the FPGA frequency of operation is limited by several other factors.

What is JTAG?

Answer 1:

JTAG is the acronym for Joint Test Action Group. It is also known as the IEEE 1149.1 standard, Standard Test Access Port and Boundary-Scan Architecture. It is used as one of the DFT techniques.

Answer 2:

JTAG (Joint Test Action Group) boundary scan is a method of testing ICs and their interconnections. It uses a shift register built into the chip so that inputs can be shifted in and the resulting outputs can be shifted out. JTAG requires four I/O pins: clock (TCK), input data (TDI), output data (TDO), and state machine mode control (TMS).

The uses of JTAG expanded to debugging software for embedded microcontrollers. This eliminates the need for in-circuit emulators, which are more costly. JTAG is also used to download configuration bitstreams to FPGAs.

JTAG cells, also known as boundary-scan cells, are small circuits placed just inside the I/O cells. Their purpose is to pass data to and from the I/O through the boundary-scan chain. The interface to these scan chains is called the TAP (Test Access Port), and the operation of the chains and the TAP is controlled by a JTAG controller inside the chip that implements JTAG.
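The heart of that JTAG controller is the 16-state TAP state machine defined by IEEE 1149.1, which advances on each rising edge of TCK according to the value on TMS. The table below is a Python sketch of those transitions; the walk() helper is hypothetical and not part of any JTAG library.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1).
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state: str, tms_bits: str) -> str:
    """Apply one TMS bit per TCK rising edge and return the final state."""
    for bit in tms_bits:
        state = TAP[state][int(bit)]
    return state


print(walk("Run-Test/Idle", "100"))  # Shift-DR
```

A useful property of this state machine: holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is how a tester resynchronizes with the chip.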

ASIC Design Checklist

Silicon Process and Library Characteristics

What exact process are you using?
How many layers can be used for this design?
Are the crosstalk noise constraints, crosstalk (Xtalk) analysis configuration, cell EM and wire EM available?

Design Characteristics

What is the design application?
Number of cells (placeable objects)?
Is the design Verilog or VHDL?
Is the netlist flat or hierarchical?
Is there RTL available?
Is there any datapath logic using special datapath tools?
Is DFT to be considered?
Can scan chains be reordered?
Are memory BIST and boundary scan used on this design?
Are static timing analysis constraints available in SDC format?

Clock Characteristics

How many clock domains are in the design?
What are the clock frequencies?
Is there a target clock skew, latency or other clock requirement?
Does the design have a PLL?
If so, is it used to remove clock latency?
Is there any I/O cell in the feedback path?
Is the PLL used for frequency multiplication?
Are there derived clocks or complex clock generation circuitry?
Are there any gated clocks?
If yes, do they use simple gating elements?
Is the gated clock used for timing or power?
For gated clocks, can the gating elements be sized for timing?
Are you muxing in a test clock or using a JTAG clock?
Available cells for the clock tree?
Are there any special clock repeaters in the library?
Are there any EM, slew or capacitance limits on these repeaters?
How many drive strengths are available in the standard buffers and inverters?
Do any of the buffers have balanced rise and fall delays?
Are there any special requirements for clock distribution?
Will the clock tree be shielded? If so, what are the shielding requirements?

Floorplan and Package Characteristics

Target die area?
Does the area estimate include power/signal routing?
What gates/mm² figure has been assumed?
Number of routing layers?
Any special power routing requirements?
Number of digital I/O pins/pads?
Number of analog signal pins/pads?
Number of power/ground pins/pads?
Total number of pins/pads and their locations?
Will this chip use a wire-bond package?
Will this chip use a flip-chip package?
If yes, what is the I/O bump pitch? Rows of bumps? Bump allocation? Bump pad layout guide?
Have you already done floorplanning for this design?
If yes, is conformance to the existing floorplan required?
What is the target die size?
What is the expected utilization?
Please draw the overall floorplan.
Is there an existing floorplan available in DEF?
What are the number and type of macros (memory, PLL, etc.)?
Are there any analog blocks in the design?
What kind of packaging is used? Flip-chip?
Are the I/Os periphery I/O or area I/O?
How many I/Os?
Is the design pad limited?
Power planning and power analysis for this design?
Are layout databases available for hard macros?
Timing analysis and correlation?
Physical verification?

Data Input

Library information for the new library
.lib for timing information
GDSII or LEF for library cells, including any RAMs
RTL in Verilog/VHDL format
Number of logical blocks in the RTL
Constraints for the block in SDC
Floorplan information in DEF
I/O pin locations
Macro locations

Power Gating

Power gating is effective for reducing leakage power [3]. Power gating is the technique wherein circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown time can also be called low-power mode or inactive mode. When the circuit blocks are required for operation once again, they are switched back to active mode. These two modes are switched at the appropriate time and in a suitable manner to maximize power savings while minimizing the impact on performance. Thus the goal of power gating is to minimize leakage power by temporarily cutting power off to selected blocks that are not required in that mode.

Power gating affects the design architecture more than clock gating does. It increases time delays, as power-gated modes have to be safely entered and exited. The possible amount of leakage power saving in such a low-power mode, and the energy dissipated to enter and exit such a mode, introduce some architectural trade-offs. Shutting down the blocks can be accomplished either by software or by hardware. Driver software can schedule the power-down operations, hardware timers can be utilized, or a dedicated power management controller can be used.
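The trade-off between leakage saved and energy spent entering and exiting the gated mode can be expressed as a break-even idle time: gating only pays off if the block stays off longer than the transition energy divided by the leakage power saved. The Python sketch below uses hypothetical numbers and ignores the wake-up latency cost to performance.

```python
# Break-even idle time for power gating (all numbers hypothetical).
def break_even_idle_s(e_transition_j: float, p_leak_on_w: float, p_leak_gated_w: float) -> float:
    """Idle time beyond which the leakage energy saved exceeds the sleep/wake energy."""
    saved_w = p_leak_on_w - p_leak_gated_w
    if saved_w <= 0:
        raise ValueError("gating does not reduce leakage")
    return e_transition_j / saved_w


def worth_gating(idle_s: float, e_transition_j: float,
                 p_leak_on_w: float, p_leak_gated_w: float) -> bool:
    return idle_s > break_even_idle_s(e_transition_j, p_leak_on_w, p_leak_gated_w)


# Hypothetical block: 2 mW leakage when on, 0.1 mW residual leakage when gated,
# 19 nJ spent per sleep/wake cycle (recharging the virtual rail, saving state, ...).
print(break_even_idle_s(19e-9, 2e-3, 0.1e-3))   # about 1e-05 s, i.e. roughly 10 us
print(worth_gating(1e-3, 19e-9, 2e-3, 0.1e-3))  # True: 1 ms idle is well past break-even
```

This is the number a power management controller or driver implicitly uses when deciding whether an idle period is long enough to justify shutting a block down.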

An externally switched power supply is a very basic form of power gating to achieve long-term leakage power reduction. To shut off a block for short intervals of time, internal power gating is suitable. The CMOS switches that provide power to the circuitry are controlled by power gating controllers. The output of a power-gated block discharges slowly, so output voltage levels spend more time around the threshold voltage level. This can lead to larger short-circuit current.

Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep transistors. Inserting the sleep transistors splits the chip's power network into a permanent power network connected to the power supply and a virtual power network that drives the cells and can be turned off.

The quality of this complex power network is critical to the success of a power-gating design. Two of the most critical parameters are the IR drop and the penalties in silicon area and routing resources. Power gating can be implemented using cell- or cluster-based (or fine-grain) approaches or a distributed coarse-grained approach.

Power gating parameters

Power gating implementation has additional considerations beyond normal timing closure. The following parameters need to be considered and their values carefully chosen for a successful implementation of this methodology [1][2].

Power gate size: The power gate size must be selected to handle the amount of switching current at any given time. The gate must be big enough that there is no measurable voltage (IR) drop across it. Generally, we use 3X the switching capacitance for the gate size as a rule of thumb. Designers can also choose between header (PMOS) and footer (NMOS) gates. Usually footer gates tend to be smaller in area for the same switching current. Dynamic power analysis tools can accurately measure the switching current and also predict the size of the power gate.
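A first-order way to see the sizing problem, as an alternative view to the rule of thumb above: parallel switches behave like a resistance that falls as total width grows, so the width must be large enough that peak current times on-resistance stays within the IR-drop budget. The Python sketch assumes on-resistance scales as 1/width, and all parameter values (per-width on-resistance, drop budget, cell width) are hypothetical; an NMOS footer would typically use a lower on-resistance per width than a PMOS header, which is why footers come out smaller.

```python
# First-order sizing of power-gate (sleep) switches against an IR-drop budget.
# Assumes R_total = r_on_ohm_um / W for switches in parallel; values are hypothetical.
import math


def switch_width_um(i_peak_ma: float, r_on_ohm_um: float, v_drop_max_mv: float) -> float:
    # I * R_total <= V_max  =>  W >= I * r_on / V_max   (mA * ohm*um / mV = um)
    return i_peak_ma * r_on_ohm_um / v_drop_max_mv


def switch_cell_count(i_peak_ma: float, r_on_ohm_um: float,
                      v_drop_max_mv: float, cell_width_um: float) -> int:
    """Number of identical switch cells needed to reach the required total width."""
    return math.ceil(switch_width_um(i_peak_ma, r_on_ohm_um, v_drop_max_mv) / cell_width_um)


# Hypothetical: 50 mA peak, 1500 ohm*um on-resistance, 30 mV drop budget, 2 um per cell.
print(switch_width_um(50, 1500, 30))         # 2500.0
print(switch_cell_count(50, 1500, 30, 2.0))  # 1250
```

The peak current is the quantity that dynamic power analysis supplies; halving the drop budget doubles the switch area, which is the area and leakage penalty the text warns about.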

Gate control slew rate: In power gating, this is an important parameter that determines the power gating efficiency. When the slew rate is large, it takes more time to switch the circuit off and on, and hence it can affect the power gating efficiency. The slew rate is controlled by buffering the gate control signal.

Simultaneous switching capacitance: This important constraint refers to the amount of circuit that can be switched simultaneously without affecting the power network integrity. If a large amount of the circuit is switched simultaneously, the resulting rush current can compromise the power network integrity. The circuit needs to be switched in stages in order to prevent this.
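Staged switching can be sketched as a packing problem: walk the switch groups in their turn-on (daisy-chain) order and start a new wake-up stage whenever adding the next group would push the summed in-rush current over the budget. This Python sketch assumes each stage's in-rush current has died down before the next stage turns on; the per-group currents and the budget are hypothetical.

```python
# Pack power-switch groups, in daisy-chain order, into wake-up stages so that
# the summed in-rush current of each stage stays within a budget.
from typing import List


def stage_schedule(rush_ma: List[float], limit_ma: float) -> List[List[int]]:
    stages: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for idx, i in enumerate(rush_ma):
        if i > limit_ma:
            raise ValueError(f"group {idx} alone exceeds the in-rush limit")
        if total + i > limit_ma:  # next group would overload this stage
            stages.append(current)
            current, total = [], 0.0
        current.append(idx)
        total += i
    if current:
        stages.append(current)
    return stages


# Hypothetical in-rush current per switch group (mA) and a 100 mA budget.
print(stage_schedule([40, 30, 50, 20, 60, 10], 100))  # [[0, 1], [2, 3], [4, 5]]
```

More stages mean a gentler rush current but a longer wake-up time, which is the same efficiency trade-off described for the gate control slew rate.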

Power gate leakage: Since power gates are made of active transistors, leakage is an important consideration to maximize power savings.

Fine-grain power gating

Adding a sleep transistor to every cell that is to be turned off imposes a large area penalty, and individually gating the power of every cluster of cells creates timing issues, introduced by inter-cluster voltage variation, that are difficult to resolve. Fine-grain power gating encapsulates the switching transistor as a part of the standard cell logic. Switching transistors are designed by either the library IP vendor or the standard cell designer. Usually these cell designs conform to the normal standard cell rules and can easily be handled by EDA tools for implementation.

The size of the gate control is designed with the worst-case assumption that the circuit will switch during every clock cycle, resulting in a huge area impact. Some recent designs implement fine-grain power gating selectively, only for the low-Vt cells. If the technology allows multiple-Vt libraries, the use of low-Vt devices in the design is kept minimal (20%), so that the area impact can be reduced. When using power gates on the low-Vt cells, the output must be isolated if the next stage is a high-Vt cell. Otherwise it can cause the neighboring high-Vt cell to leak when the output goes to an unknown state due to power gating.

The gate control slew rate constraint is met by having a buffer distribution tree for the control signals. The buffers must be chosen from a set of always-on buffers (buffers without the gate control signal) designed with high-Vt cells. The inherent difference between when one cell switches off and when another does minimizes the rush current during switch-on and switch-off.

Usually the gating transistor is designed as a high-Vt device. Coarse-grain power gating offers further flexibility by optimizing the power gating cells where there is low switching activity. Leakage optimization has to be done at the coarse-grain level, swapping the low-leakage cell for the high-leakage one. Fine-grain power gating is an elegant methodology resulting in up to 10X leakage reduction. This type of power reduction makes it an appealing technique if the power reduction requirement is not satisfied by multiple-Vt optimization alone.

Coarse-grain power gating

The coarse-grained approach implements grid-style sleep transistors, which drive cells locally through shared virtual power networks. This approach is less sensitive to PVT variation, introduces less IR-drop variation, and imposes a smaller area overhead than the cell- or cluster-based implementations. In coarse-grain power gating, the power-gating transistor is a part of the power distribution network rather than the standard cell.

There are two ways of implementing a coarse-grain structure:

1) Ring-based

2) Column-based

Ring-based methodology: The power gates are placed as a ring around the perimeter of the module that is being switched off. Special corner cells are used to turn the power signals around the corners.

Column-based methodology: The power gates are inserted within the module, with the cells abutted to each other in the form of columns. The global power is in the higher metal layers, while the switched power is in the lower layers.

Gate sizing depends on the overall switching current of the module at any given time. Since only a fraction of the circuits switch at any point in time, power gate sizes are smaller compared to the fine-grain switches. Dynamic power simulation using worst-case vectors can determine the worst-case switching for the module and hence the size. IR drop can also be factored into the analysis. Simultaneous switching capacitance is a major consideration in coarse-grain power gating implementation. In order to limit simultaneous switching, the gate control buffers are daisy-chained, and special counters are used to selectively turn on blocks of switches.

Isolation Cells

Isolation cells are used to prevent short-circuit current. As the name indicates, these cells isolate the power-gated block from the always-on block. Isolation cells are specially designed for low short-circuit current when the input is at the threshold voltage level. Isolation control signals are provided by the power gating controller. Isolation of the signals of a switchable module is essential to preserve design integrity. Usually a simple OR or AND gate can function as an output isolation device. Multiple state retention schemes are available in practice to preserve the state before a module shuts down. The simplest technique is to scan out the register values into a memory before shutting down a module. When the module wakes up, the values are scanned back from the memory.
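The AND/OR isolation idea can be sketched behaviourally in Python: while the block is off its outputs are unknown ("X"), and the clamp forces a known value whenever the isolation enable is asserted. The function names and the step sequence are hypothetical; the sequence shows the usual ordering (isolate before power-off, release only after power-on), not any particular controller's protocol.

```python
# Behavioural model of output isolation clamps between a power-gated block and
# an always-on block. "X" stands for the unknown value of an unpowered output.
X = "X"


def powered_output(value, powered: bool):
    """Output of the switchable block: unknown when its supply is off."""
    return value if powered else X


def and_clamp(signal, iso_en: bool):
    """AND-type isolation: output forced to 0 while iso_en is asserted (clamp low)."""
    return 0 if iso_en else signal


def or_clamp(signal, iso_en: bool):
    """OR-type isolation: output forced to 1 while iso_en is asserted (clamp high)."""
    return 1 if iso_en else signal


# (powered, iso_en) per step: normal, isolate, power off, power on, release isolation.
steps = [(True, False), (True, True), (False, True), (True, True), (True, False)]
print([and_clamp(powered_output(1, p), iso) for p, iso in steps])  # [1, 0, 0, 0, 1]
print(and_clamp(X, False))  # X: without isolation the unknown value would propagate
```

Whether a signal is clamped high or low is chosen so that the receiving always-on logic sees an inactive value (for example, a deasserted request or enable) while the block is off.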

Retention Registers

When power gating is used, the system needs some form of state retention, such as scanning out data to a RAM and then scanning it back in when the system is reawakened. For critical applications, the memory states must be maintained within the cell, a condition that requires a retention flop to store the bits. That makes it possible to restore the bits very quickly during wakeup. Retention registers are special low-leakage flip-flops used to hold the data of the main registers of the power-gated block. Thus the internal state of the block during power-down mode can be retained and loaded back when the block is reactivated. Retention registers are always powered up. The retention strategy is design dependent. During power gating, data can be retained and transferred back to the block when power gating is withdrawn. The power gating controller controls the retention mechanism, such as when to save the current contents of the power-gated block and when to restore them.
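The save/restore sequence can be sketched behaviourally, assuming a retention flop made of a main flop on the switchable (virtual) supply and a shadow latch on the always-on supply. The class and method names are hypothetical; real retention cells, and the names of their save/restore pins, vary by library.

```python
# Behavioural sketch of a retention register: the main flop loses its state
# when power is gated, while an always-on shadow latch keeps a copy.
class RetentionFlop:
    def __init__(self):
        self.main = 0        # powered by the switchable (virtual) supply
        self.shadow = 0      # powered by the always-on supply
        self.powered = True

    def clock(self, d):
        if self.powered:     # an unpowered flop cannot capture data
            self.main = d

    def save(self):
        """Controller copies the main state into the shadow latch before shutdown."""
        self.shadow = self.main

    def power_off(self):
        self.powered = False
        self.main = None     # contents lost when the supply collapses

    def power_on(self):
        self.powered = True  # main flop comes up in an unknown state

    def restore(self):
        """Controller copies the retained value back after power returns."""
        self.main = self.shadow


ff = RetentionFlop()
ff.clock(1)
ff.save()
ff.power_off()
ff.power_on()
ff.restore()
print(ff.main)  # 1
```

Skipping save() or restore() leaves the flop in an unknown state after wake-up, which is why the controller, not the block itself, sequences these steps.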
Multiple Threshold CMOS (MTCMOS) Circuits

James T. Kao et al. [2] showed that MTCMOS logic is an effective standby leakage control technique, but difficult to implement, since sleep transistor sizing is highly dependent on the discharge pattern within the circuit block. They showed that dual-Vt domino logic avoids the sizing difficulties and the inherent performance penalty associated with MTCMOS. High-Vt cells are used where leakage has to be prevented, whereas low-Vt cells are employed where speed is of concern. Both kinds of cells are used effectively in the MTCMOS technique.

MTCMOS technique [1]

In the active mode of operation, the high-Vt transistors are turned on, and the logic gates consisting of low-Vt transistors can operate with low switching power dissipation and smaller propagation delay. In standby mode, the high-Vt transistors are turned off, thereby cutting off the internal low-Vt circuitry.

Variable Threshold CMOS (VTCMOS)

One of the efficient methods to reduce power consumption is to use a low supply voltage together with a low threshold voltage, so that speed is not lost. But lowering the threshold voltage leads to increased sub-threshold leakage and hence more standby power consumption. Variable Threshold CMOS (VTCMOS) devices are one solution to this problem. In the VTCMOS technique, the threshold voltage of the low-threshold devices is varied by applying a variable substrate bias voltage from a control circuit.
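The effect of substrate bias can be estimated with the standard long-channel body-effect equation for an NMOS device, Vt = Vt0 + γ(√(2φF + VSB) − √(2φF)), combined with the usual approximation that sub-threshold leakage falls by one decade for every S volts of Vt increase. The Python sketch below is a first-order illustration; every parameter value (Vt0, γ, 2φF, S) is a hypothetical placeholder, not data for any real process.

```python
# First-order estimate of how reverse body bias raises Vt and cuts leakage.
# NMOS, long-channel body-effect model; all parameter values are hypothetical.
import math


def vth(vsb_v: float, vt0_v: float = 0.30, gamma_sqrt_v: float = 0.40,
        two_phi_f_v: float = 0.80) -> float:
    """Threshold voltage with source-to-body bias vsb_v (vsb_v > -two_phi_f_v)."""
    return vt0_v + gamma_sqrt_v * (math.sqrt(two_phi_f_v + vsb_v) - math.sqrt(two_phi_f_v))


def leakage_ratio(delta_vt_v: float, swing_v_per_decade: float = 0.09) -> float:
    """Sub-threshold leakage scales roughly as 10**(-delta_Vt / S)."""
    return 10 ** (-delta_vt_v / swing_v_per_decade)


dvt = vth(0.5) - vth(0.0)           # standby: 0.5 V reverse body bias
print(round(dvt, 3))                # roughly 0.1 V higher threshold
print(round(leakage_ratio(dvt), 3)) # leakage drops by roughly an order of magnitude
```

In active mode the bias is removed (or made slightly forward) to restore the low Vt and full speed; in standby the control circuit applies reverse bias, which is why separate wells are needed for the devices being biased.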

VTCMOS is a very effective technique to reduce power consumption, with some drawbacks with respect to the manufacturing of these devices. VTCMOS requires either twin-well or triple-well technology to achieve different substrate bias voltage levels in different parts of the IC. The area overhead of the substrate bias control circuitry is negligible [1].

Voltage Scaling

Reducing the power supply voltage is an effective technique to reduce dynamic power, at the cost of a speed penalty. Keeping all other factors constant, if the supply voltage is scaled down, the propagation delay will increase. This can be compensated by scaling down the threshold voltage to the same extent as the supply voltage, which allows the circuit to deliver the same speed performance at a lower Vdd. At the same time, smaller threshold voltages lead to smaller noise margins and increased leakage current.
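Both effects can be quantified with two standard first-order models: dynamic power P = activity · C · Vdd² · f, and the alpha-power-law gate delay, delay ∝ Vdd / (Vdd − Vt)^α. The Python sketch below uses hypothetical values (Vdd = 1.0 V, Vt = 0.3 V, α = 1.3). Note that under this model, scaling Vt by the same factor as Vdd recovers the speed exactly only when α = 1; with α > 1 most, but not all, of the speed comes back.

```python
# First-order models for supply-voltage scaling (all parameter values hypothetical).
def rel_delay(vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay up to a constant factor; alpha is the velocity-saturation index."""
    return vdd / (vdd - vt) ** alpha


def dyn_power(activity: float, c_farad: float, vdd: float, f_hz: float) -> float:
    """Dynamic switching power: activity * C * Vdd^2 * f."""
    return activity * c_farad * vdd ** 2 * f_hz


base = rel_delay(1.0, 0.3)
print(round(rel_delay(0.8, 0.3) / base, 2))   # Vdd 1.0 -> 0.8 V, same Vt: about 1.24x slower
print(round(rel_delay(0.8, 0.24) / base, 2))  # Vt also scaled by 0.8: about 1.07x
print(dyn_power(0.1, 1e-9, 0.8, 1e9) / dyn_power(0.1, 1e-9, 1.0, 1e9))  # about 0.64
```

The 36% dynamic power saving comes from the V² term alone; the price is either lower speed or, if Vt is lowered to compensate, exponentially higher sub-threshold leakage.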
Dynamic Voltage and Frequency Scaling (DVFS)

We know that the supply voltage can be reduced if the frequency of operation is reduced. Since dynamic power is proportional to the square of the supply voltage and to the frequency, scaling both down together gives an approximately cubic reduction in power consumption. However, it should be noted that frequency reduction slows the operation.
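A worked version of the cubic relation, assuming the usual dynamic-power expression (α = switching activity, C = switched capacitance) and that the achievable frequency scales roughly linearly with Vdd while Vdd stays well above Vt. The second line shows why the energy per operation, unlike power, only improves quadratically:

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f, \qquad f \propto V_{dd}
\;\Longrightarrow\; P_{\mathrm{dyn}} \propto V_{dd}^{3}

% Example: scale both V_{dd} and f by k = 0.5
\frac{P'}{P} = k^{2}\cdot k = 0.5^{3} = 0.125, \qquad
\frac{E'_{\mathrm{op}}}{E_{\mathrm{op}}} = \frac{P'/f'}{P/f} = k^{2} = 0.25
```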

The above-mentioned relation between energy and voltage is not always true. The authors in [1] showed that the quadratic relationship between energy and Vdd breaks down as Vdd is scaled down into the sub-threshold voltage region. Sub-threshold leakage current increases exponentially with the supply voltage. Since in sub-threshold operation the on-current itself takes the form of a sub-threshold current, delay increases exponentially as the voltage is scaled down. At very low voltages, dynamic energy still reduces quadratically. But the leakage energy increases as the supply voltage is reduced, since leakage energy is linear in the circuit delay. Hence dynamic and leakage power become comparable in the sub-threshold voltage region.

According to Bo Zhai et al. [1], dynamic voltage and frequency scaling is a very popular low-power technique. But larger voltage ranges do not always improve power efficiency. They showed that for sub-threshold supply voltages, leakage energy becomes dominant, making just-in-time completion energy inefficient. They also showed that extending the voltage range below half Vdd will improve the energy efficiency for most processor designs, while extending this range to sub-threshold operation is beneficial only for specific applications. One of the important points to be noted from their study is that DVFS in the sub-threshold voltage range is never energy efficient.

Multi-Threshold (MVT) Voltage Technique

Multiple threshold voltage techniques use both low-Vt and high-Vt cells: lower-threshold gates are used on the critical path, and higher-threshold gates off the critical path. This methodology improves performance without an increase in power. The flip side of this technique is that multi-Vt cells increase fabrication complexity. It also lengthens the design time. Improper optimization of the design may use more low-Vt cells and hence could end up with increased power!

Ruchir Puri et al. [2] have discussed the design issues related to multiple supply voltages and multiple threshold voltages in the optimization of dynamic and static power. They noted several advantages of multi-Vt optimization. Multi-Vt optimization is a placement-non-disturbing transformation: the footprint and area of low-Vt and high-Vt cells are the same as those of nominal-Vt cells. This enables cells on time-critical paths to be swapped for low-Vt cells easily.
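Because the swap does not disturb placement, multi-Vt assignment is often described as a simple loop: start with every cell at high Vt (lowest leakage), and while a path fails timing, swap the cell on the worst path that buys the most delay per unit of extra leakage. The Python sketch below is a generic greedy heuristic under that description, not the method of the cited papers or of any EDA tool; the cell delays, leakages and paths are hypothetical.

```python
# Greedy multi-Vt assignment: start all-HVT, swap cells on failing paths to LVT.
from typing import Dict, List

# cell -> (delay_hvt_ps, delay_lvt_ps, leak_hvt_nw, leak_lvt_nw); hypothetical values.
CELLS = {"u1": (50, 35, 1, 10), "u2": (40, 30, 1, 8),
         "u3": (60, 40, 2, 20), "u4": (30, 25, 1, 6)}
PATHS: Dict[str, List[str]] = {"p1": ["u1", "u2", "u3"], "p2": ["u3", "u4"]}


def path_delay(path: List[str], vt: Dict[str, str]) -> int:
    return sum(CELLS[c][0] if vt[c] == "HVT" else CELLS[c][1] for c in path)


def total_leakage(vt: Dict[str, str]) -> int:
    return sum(CELLS[c][2] if vt[c] == "HVT" else CELLS[c][3] for c in CELLS)


def assign_vt(required_ps: float) -> Dict[str, str]:
    vt = {c: "HVT" for c in CELLS}  # lowest leakage everywhere to begin with
    while True:
        failing = [p for p in PATHS.values() if path_delay(p, vt) > required_ps]
        if not failing:
            return vt
        worst = max(failing, key=lambda p: path_delay(p, vt))
        candidates = [c for c in worst if vt[c] == "HVT"]
        if not candidates:
            raise RuntimeError("timing cannot be met even with all-LVT cells on the path")
        # Best delay saved per unit of extra leakage (same footprint, so no re-placement).
        best = max(candidates,
                   key=lambda c: (CELLS[c][0] - CELLS[c][1]) / (CELLS[c][3] - CELLS[c][2]))
        vt[best] = "LVT"


result = assign_vt(130)
print(result)                 # {'u1': 'LVT', 'u2': 'LVT', 'u3': 'HVT', 'u4': 'HVT'}
print(total_leakage(result))  # 21
```

Only the two cells needed to close the critical path are swapped; the off-critical cells stay at high Vt, which is exactly the "lower threshold on the critical path, higher threshold elsewhere" rule described above.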

Frank Sill et al. [3] have proposed a new method for the assignment of devices with different Vth in a dual-Vth process. They developed mixed-Vth gates and showed a leakage reduction of 25%. They created a library of LVT, mixed-Vt, HVT and multi-Vt gates, and compared simulation results with an LVT version of each design. Leakage power dissipation decreased by an average of 65% with the mixed-Vth technique compared to the LVT implementation.

Meeta Srivatsav et al. [4] have explored various ways of reducing leakage power and recommended the Multi Vt approach. They carried out the analysis using 130 nm and 90 nm technologies. They synthesized the design with different combinations of target libraries: Low Vt cells only, High Vt cells only, High Vt cells with incremental compile using the Low Vt library, nominal (or regular) Vt cells only, and Multi Vt targeting Hvt and Lvt in one go. With only Low Vt cells, the highest leakage power of 469 µW was obtained. With only High Vt cells, leakage power consumption was minimum but timing was not met (negative slack of 1.13). With nominal Vt cells, a moderate leakage power of 263 µW was obtained. The best result (54 µW with timing met) was obtained by synthesis targeting the Hvt library followed by incremental compile using the Lvt library.
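The relative savings implied by the three leakage values quoted above can be checked with a few lines of arithmetic. This is only a sketch: the flow labels (`lvt_only`, `nominal_vt_only`, `hvt_then_lvt`) are shorthand introduced here, and the numbers are simply the values reported in the paragraph.

```python
def leakage_reduction(reported, baseline):
    """Percentage leakage reduction of each flow relative to a baseline flow."""
    base = reported[baseline]
    return {flow: round(100.0 * (1.0 - leak / base), 1)
            for flow, leak in reported.items()}


# Leakage values quoted above (hypothetical labels for the three flows).
reported = {"lvt_only": 469, "nominal_vt_only": 263, "hvt_then_lvt": 54}
savings = leakage_reduction(reported, "lvt_only")
for flow, pct in savings.items():
    print(f"{flow}: {pct}% lower leakage than lvt_only")
```

By this arithmetic, the High Vt plus incremental Low Vt flow cuts leakage by roughly 88% relative to the Low Vt only flow, while the nominal Vt flow saves about 44%.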

The different low leakage synthesis flows evaluated by Xiaodong Zhang [1] using Synopsys EDA tools are listed below:

Low Vt → Multi Vt flow: This produces the least cell count and the least dynamic power, but the highest leakage power. It has very low runtime. Good for a design with very tight timing constraints.

Multi Vt one-pass flow: It has the longest runtime and can be used in most designs.

High Vt → Multi Vt flow: Produces the least leakage power consumption but has a high cell count and dynamic power. This methodology is good for leakage-power-critical designs.

High Vt → Multi Vt with different timing constraints flow: This is a well-balanced flow and produces the second-least leakage power. It has a smaller cell count, area and dynamic power, and a shorter runtime. This flow is also good for most designs.

Optimization Strategies

Trading off between the different Vt cells to achieve optimal performance is especially beneficial during synthesis (technology gate mapping) and placement optimization. The logic synthesis, or gate mapping, phase of the optimization process is implemented by the synthesis tool, and placement optimization is handled by the physical implementation tool.

Synthesis

During logic synthesis, the design is mapped to technology gates. At this point in the process, optimal logic architectures are selected, mapped to technology cells, and optimized for specific design goals. Since a range of Vt libraries is now available and choices have to be made across architectures with different Vt cells, logic synthesis is the ideal place to start deploying a mix of different Vt cells into the design.

Single Pass vs. Two Pass Synthesis with Multiple Threshold Libraries

Multiple libraries are currently available with different performance, area and power characteristics, and synthesis optimization can be achieved using one or more libraries concurrently. In a single-pass flow, multiple libraries are loaded into the synthesis tool prior to synthesis optimization. In a two-pass flow, the design is initially optimized using one library, and then an incremental optimization is carried out using additional libraries.

About Multi Vt optimization, Ruchir Puri [2] says in his paper: The multi-threshold optimization algorithm implemented in physical synthesis is capable of optimizing several Vt levels at the same time. Initially, the design is optimized using the higher threshold voltage library only. Then, the Multi-Vt optimization computes the power-performance tradeoff curve up to the maximum allowable leakage power limit for the next lower threshold voltage library. Subsequently, the optimization starts from the most critical slack end of this power-performance curve and switches the most critical gate to the next equivalent lower-Vt version. This will increase the leakage in the design beyond the maximum permissible leakage power. To compensate for this, the algorithm picks the least critical gate from the other end of the power-performance curve and substitutes it back with its higher-Vt version. If this does not bring the leakage power below the allowed limit, it traverses further along the curve (from least critical towards more critical), substituting gates with higher-Vt gates until the leakage limit is satisfied. Then we jump back to the second most critical cell and switch it to the lower-Vt version. This iteration continues until we can no longer switch any gate to the lower-Vt version without violating the leakage power limit.
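The swap-and-compensate loop described above can be sketched in Python. This is a simplified, hypothetical model and not Puri's implementation: each gate carries a single slack value used as its criticality, a leakage figure per Vt flavour and its current flavour. The three-level HVT/SVT/LVT set, the `Gate` class and the function names are assumptions introduced here, and timing is not recomputed after a swap.

```python
from dataclasses import dataclass

# Vt flavours ordered from highest threshold (slowest, least leaky) to lowest.
LEVELS = ["HVT", "SVT", "LVT"]


@dataclass
class Gate:
    name: str
    slack: float    # smaller (more negative) = more timing-critical
    leakage: tuple  # leakage at each entry of LEVELS (hypothetical units)
    level: int = 0  # index into LEVELS


def total_leakage(gates):
    return sum(g.leakage[g.level] for g in gates)


def multi_vt_optimize(gates, leak_limit):
    """Greedy swap/compensate loop; timing is not re-evaluated after swaps."""
    order = sorted(gates, key=lambda g: g.slack)  # most critical first
    for i, crit in enumerate(order):
        if crit.level == len(LEVELS) - 1:
            continue  # already at the lowest-Vt version
        saved = [g.level for g in gates]
        crit.level += 1  # swap the critical gate to the next lower-Vt version
        # Compensate from the least critical end, never touching gates that
        # are at least as critical as `crit`.
        for g in reversed(order[i + 1:]):
            if total_leakage(gates) <= leak_limit:
                break
            if g.level > 0:
                g.level -= 1  # substitute back its next higher-Vt version
        if total_leakage(gates) > leak_limit:
            # Could not compensate: undo this attempt and stop iterating.
            for g, lvl in zip(gates, saved):
                g.level = lvl
            break
    return {g.name: LEVELS[g.level] for g in gates}


gates = [Gate("A", -0.5, (1, 3, 9), 1), Gate("B", -0.2, (1, 3, 9), 1),
         Gate("C", 0.8, (1, 3, 9), 1), Gate("D", 1.5, (1, 3, 9), 1)]
print(multi_vt_optimize(gates, leak_limit=14))
# {'A': 'LVT', 'B': 'SVT', 'C': 'HVT', 'D': 'HVT'}
```

In this toy run, gate A gets the LVT version and C and D are pushed to HVT to pay for it; swapping B as well would exceed the leakage limit with nothing left to compensate, so the loop stops.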

But Amit Agarwal et al. [5] have warned about possible yield loss due to dual-Vt flows. They showed that in the nanoscale regime, conventional dual-Vt design suffers from yield loss due to process variation and vastly overestimates leakage savings, since it does not take junction BTBT (Band-To-Band Tunneling) leakage into account. Their analysis showed the importance of device-based analysis when designing low power schemes like dual Vt. Their research also showed that in scaled technologies, statistical information on both leakage and delay helps in minimizing total leakage while ensuring yield with respect to the target delay in dual-Vt designs. However, the non-scalability of the present way of realizing high Vt requires the use of different process options, such as metal gate work function engineering, in future technologies.

Multi Vdd (Voltage)

Dynamic power is proportional to the square of the supply voltage (P = αCV²f). Hence reducing the supply voltage significantly reduces dynamic power. At the same time, gate delay increases because the lower supply voltage reduces the gate overdrive (Vdd − Vt). A high voltage can be applied to the timing-critical paths while the rest of the chip runs at a lower voltage, so overall system performance is maintained. Different blocks with different voltage supplies can be integrated in an SoC. This increases power planning complexity in terms of laying down the power rails and power grid structure. Level shifters are necessary to interface between the different blocks.
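The two effects described above can be put into numbers with first-order models: the standard switching-power expression and the alpha-power-law delay model (Sakurai-Newton). The exponent a = 1.3, Vt = 0.35 V and the block parameters below are illustrative assumptions, not values from the source.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Switching power P = alpha * C * Vdd^2 * f (SI units: F, V, Hz -> W)."""
    return alpha * c_load * vdd ** 2 * freq


def relative_delay(vdd, vt, a=1.3):
    """Alpha-power-law gate delay, t ~ Vdd / (Vdd - Vt)^a (unnormalised)."""
    if vdd <= vt:
        raise ValueError("model only valid for Vdd > Vt")
    return vdd / (vdd - vt) ** a


# Hypothetical block: activity 0.1, 2 nF switched capacitance, 500 MHz.
p_high = dynamic_power(0.1, 2e-9, 1.2, 500e6)
p_low = dynamic_power(0.1, 2e-9, 0.9, 500e6)
slowdown = relative_delay(0.9, 0.35) / relative_delay(1.2, 0.35)
print(f"power ratio {p_low / p_high:.4f}, delay ratio {slowdown:.2f}")
# power ratio 0.5625, delay ratio 1.32
```

With these assumed numbers, dropping Vdd from 1.2 V to 0.9 V cuts switching power to about 56% but makes gates roughly 1.3x slower, which is why only blocks with timing slack are moved to the lower supply.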

Multiple Voltage ASIC/SoC Design: Classification

Multi-voltage design strategies can be broadly classified as follows [1]:

Static Voltage Scaling (SVS): Different but fixed voltages are applied to different blocks or subsystems of the SoC design.

Multi-level Voltage Scaling (MVS): A block or subsystem of the ASIC or SoC design is switched between two or more voltage levels. Only a limited number of fixed, discrete voltage levels is supported for the different operating modes.

Dynamic Voltage and Frequency Scaling (DVFS): Both voltage and frequency are varied dynamically according to the working mode of the design so as to achieve power efficiency. When a high speed of operation is required, the voltage is raised to attain the higher speed, at the penalty of increased power consumption; when it is not, voltage and frequency are lowered to save power.

Adaptive Voltage Scaling (AVS): Here the voltage is controlled using a closed control loop. This is an extension of DVFS.
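A minimal sketch of the closed loop behind AVS, assuming a hypothetical on-chip delay monitor that reports timing margin in picoseconds. The step size, voltage limits, dead band and the linear `fake_monitor` model are illustrative assumptions; real AVS controllers are implementation specific.

```python
def avs_controller(read_margin_ps, v_start_mv, target_ps, hysteresis_ps,
                   step_mv=10, v_min_mv=700, v_max_mv=1200, iterations=100):
    """Closed-loop AVS sketch: nudge Vdd until the measured timing margin
    sits inside [target_ps, target_ps + hysteresis_ps]."""
    v = v_start_mv
    for _ in range(iterations):
        margin = read_margin_ps(v)  # e.g. from an on-chip delay monitor
        if margin < target_ps:
            v = min(v + step_mv, v_max_mv)  # too slow: raise the supply
        elif margin > target_ps + hysteresis_ps:
            v = max(v - step_mv, v_min_mv)  # wasted margin: lower it
        # otherwise inside the dead band: hold the voltage
    return v


def fake_monitor(v_mv):
    """Hypothetical silicon: 2 ps of margin per mV above 850 mV."""
    return 2 * (v_mv - 850)


print(avs_controller(fake_monitor, v_start_mv=1100, target_ps=20, hysteresis_ps=30))
# 870
```

The dead band (hysteresis) keeps the loop from oscillating between two neighbouring voltage steps; starting from below, the same loop settles at 860 mV, the lowest step that meets the target margin.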

Multi Voltage Design Challenges

Level Shifters

Signals crossing from one voltage domain to another have to be interfaced through level shifter buffers, which shift the signal levels appropriately. Designing a suitable level shifter is a challenging job.
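The rule above (every signal crossing between domains at different voltages goes through a level shifter) is easy to check on a connectivity list. The sketch below uses a hypothetical netlist representation (driver and load cell names, a cell-to-domain map and per-domain voltages) rather than any real tool's database or command.

```python
def missing_level_shifters(nets, cell_domain, domain_vdd, level_shifters):
    """Return (net, driver_domain, load_domain) for every driver->load
    connection that crosses between domains at different voltages without
    passing through a level shifter cell."""
    violations = []
    for net, (driver, loads) in sorted(nets.items()):
        for load in loads:
            if driver in level_shifters or load in level_shifters:
                continue  # crossing is handled by a level shifter
            src, dst = cell_domain[driver], cell_domain[load]
            if domain_vdd[src] != domain_vdd[dst]:
                violations.append((net, src, dst))
    return violations


cell_domain = {"u1": "PD_CORE", "u2": "PD_IO", "u3": "PD_IO", "ls1": "PD_IO"}
domain_vdd = {"PD_CORE": 0.9, "PD_IO": 1.2}
nets = {
    "n1": ("u1", ["u2"]),   # core -> IO with no level shifter
    "n2": ("u1", ["ls1"]),  # core -> level shifter input
    "n3": ("ls1", ["u3"]),  # level shifter output -> IO
    "n4": ("u2", ["u3"]),   # same domain
}
print(missing_level_shifters(nets, cell_domain, domain_vdd, {"ls1"}))
# [('n1', 'PD_CORE', 'PD_IO')]
```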

Timing Analysis

Timing analysis of a design is simpler with a single voltage, as it can be performed for a single performance point based on the characterized libraries. Tools can optimize the design for worst-case PVT (Process, Voltage, Temperature) conditions. This is not the case with multi-voltage designs. Libraries must be characterized for each voltage level used in the design. The EDA tool has to optimize individual blocks or subsystems and also multiple voltage domains. This analysis becomes complex for larger ASICs/SoCs.

Floorplanning and Power Planning

Multiple power domains demand multiple power grid structures and a suitable power distribution among them. For a larger ASIC/SoC, more careful floorplanning and power planning is essential. The speed at which different power domains switch on or off is also important: a low voltage power domain may activate earlier than the high voltage domain. Multi-voltage designs also pose additional board-level complexities, as separate power supplies may be necessary to provide the different power levels.

Multi Voltage Designs: Timing Issues

Clock

Clock Tree Synthesis (CTS) tools should be aware of the different power domains and understand level shifters so that they can insert them in appropriate places. The clock tree is routed through level shifters to reach different power domains. Simultaneous timing analysis and optimization is necessary for multiple voltage domains. Thus CTS becomes more complex in multi-voltage designs.

Timing Issues with Multi Voltage Design

Static Timing Analysis (STA)

Timing analysis for a single-voltage design is easy. With static voltage scaling it becomes a little tougher, as analysis has to be carried out for different voltages. This methodology requires libraries characterized for each voltage used. Multi-level and dynamic voltage scaling pose a greater challenge. Constraints are specified for each supply voltage level or operating point. There can be different operating modes for different voltages. Constraints need not be the same for all modes and voltages, and the performance target for each mode can vary. The EDA tool should be capable of handling all these situations simultaneously to carry out timing analysis, since the different constraints at the different modes and voltages all have to be satisfied.
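To make the "all modes and voltages simultaneously" requirement concrete, here is a toy multi-mode, multi-corner setup check. It is a sketch under strong assumptions: each mode has its own clock period and a single delay scale factor for its supply voltage, each corner applies a uniform derate, and path delays are given up front. A real STA tool derives delays from libraries characterized at each voltage.

```python
from itertools import product


def scenario_slacks(modes, corners, path_delays_ns):
    """Worst setup slack for every (mode, corner) scenario.

    modes:   {mode: (clock_period_ns, delay_scale_at_mode_vdd)}
    corners: {corner: delay_derate}
    path_delays_ns: nominal path delays at the reference voltage
    """
    slacks = {}
    for (mode, (period, vscale)), (corner, derate) in product(
            modes.items(), corners.items()):
        worst_delay = max(path_delays_ns) * vscale * derate
        slacks[(mode, corner)] = period - worst_delay
    return slacks


modes = {"turbo": (1.0, 1.0), "eco": (2.0, 1.6)}  # eco runs at a lower Vdd
corners = {"ss": 1.15, "ff": 0.85}
slacks = scenario_slacks(modes, corners, [0.80, 0.90])
failing = [s for s, slack in slacks.items() if slack < 0]
print(failing)
# [('turbo', 'ss')]
```

Only one of the four scenarios fails, but a sign-off flow has to evaluate all of them, because the failing combination depends on both the mode's voltage and period and the process corner.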

Multi Voltage Designs: Power Planning Issues

Efficient power planning is one of the key concerns of modern SoC designs. In multi-voltage designs, providing power to the different power domains is challenging. Every power domain requires an independent local power supply and grid structure, and some designs may even have a separate power pad. A separate power pad is possible in flip-chip designs, where the pad can be placed near the power domain. Other chips have to take the power pads out from the periphery, which can limit the number of power domains.

Local on-chip voltage regulation is a good way to provide multiple voltages to different circuits. Unfortunately, most digital CMOS technologies are not suitable for implementing either switched-mode or linear voltage regulators. A separate power rail structure is required for each power domain. These additional power rails introduce different levels of IR drop, putting a limit on the achievable power efficiency.
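A first-order sketch of why each power rail has its own IR drop: a rail fed from a single pad behaves like a resistor ladder, with every segment carrying the current of all cells beyond it. The pad voltage, segment resistance and tap currents are made-up illustrative values; real IR-drop analysis solves the full power grid with extracted resistances.

```python
def rail_voltages(vdd, seg_res_ohm, tap_currents_a):
    """Voltage at each tap of a rail fed from one end.

    Each segment carries the current of every tap downstream of it,
    so the drop accumulates toward the far end of the rail."""
    volts = []
    v = vdd
    remaining = sum(tap_currents_a)
    for i_tap in tap_currents_a:
        v -= seg_res_ohm * remaining  # I*R drop across this segment
        volts.append(v)
        remaining -= i_tap
    return volts


# Hypothetical rail: 0.9 V pad, 50 mOhm per segment, four 100 mA taps.
taps = rail_voltages(0.9, 0.05, [0.1] * 4)
print([round(v, 3) for v in taps])
# [0.88, 0.865, 0.855, 0.85]
```

For N equal taps the worst drop is R·I·N(N+1)/2 (here 50 mV), which is why wider straps (lower R) or more straps and pads (fewer taps per feed) reduce it.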

Low Power Design Techniques

Michael Keating et al. [1] list several low power techniques to tackle dynamic and static power consumption in modern SoC designs. Dynamic power control techniques include clock gating, multi-voltage, variable frequency and efficient circuits. Leakage power control techniques include power gating and multi-Vt cells. Common methods supported by EDA tools include clock gating, gate sizing, low power placement, register clustering, low power CTS and multi-Vt optimization.

Some of the low power techniques in use today are listed in the table below.

[Table: Different Low Power Techniques [3]]

[Table: Tradeoffs associated with the various power management techniques [2]]

The above table summarizes the tradeoffs associated with different power management techniques. Power gating and DVFS demand large methodology changes, whereas multi-Vt and clock gating affect the methodology least. Unless a large leakage reduction is required, it is usually beneficial to go with either multi-Vt or clock gating techniques. Based on the design complexity and requirements, a combination of any of these low power techniques can be adopted. Multi-Vt optimization along with power gating has been found to be efficient in some complex designs. Advances in the implementation (i.e. fabrication) technology have allowed substrate biasing techniques to be used heavily, as they do not pose significant architectural or design verification challenges and also provide high leakage reduction.

Variable Threshold CMOS (VTCMOS) Circuits

One of the efficient methods to reduce power consumption is to use a low supply voltage and a low threshold voltage without losing speed performance. But using more low threshold voltage devices leads to increased subthreshold leakage and hence more standby power consumption.

Variable Threshold CMOS (VTCMOS) devices are one solution to this problem. In the VTCMOS technique, the threshold voltage of the low threshold devices is varied by applying a variable substrate bias voltage from a control circuit.
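The threshold shift that VTCMOS exploits comes from the body effect. A common long-channel approximation is Vt = Vt0 + γ(√(2φF + VSB) − √(2φF)); the sketch below evaluates it for an NMOS device. The parameter values are illustrative assumptions, and short-channel devices deviate from this simple model.

```python
import math


def body_biased_vt(vt0, gamma, phi_f, v_sb):
    """NMOS threshold voltage with source-to-body bias v_sb (volts).

    Vt = Vt0 + gamma * (sqrt(2*phi_f + v_sb) - sqrt(2*phi_f))
    v_sb > 0 is reverse body bias (raises Vt); v_sb < 0 is forward bias.
    """
    if 2 * phi_f + v_sb < 0:
        raise ValueError("bias outside the range of the square-root model")
    return vt0 + gamma * (math.sqrt(2 * phi_f + v_sb) - math.sqrt(2 * phi_f))


# Hypothetical device: Vt0 = 0.30 V, gamma = 0.4 V^0.5, phi_f = 0.35 V.
for v_sb in (0.0, 0.5, 1.0):
    print(f"Vsb = {v_sb:.1f} V -> Vt = {body_biased_vt(0.30, 0.4, 0.35, v_sb):.3f} V")
```

In standby, the control circuit would apply reverse bias to raise Vt and cut subthreshold leakage; in active mode it removes the bias to restore speed.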

The VTCMOS technique is a very effective way to reduce power consumption, with some drawbacks related to manufacturing these devices. VTCMOS requires either twin-well or triple-well technology to achieve different substrate bias voltage levels in different parts of the IC. The area overhead of the substrate bias control

Multiple Threshold (Multi Vt) Cell Libraries

With technologies shrinking to 90 nm, 30 nm and below, one of the common ways to reduce leakage power is to use multiple Vt libraries. Subthreshold leakage varies exponentially with Vt, compared to the much weaker dependence of delay on Vt.
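A rough comparison of the two sensitivities, using the textbook subthreshold model (leakage ∝ exp(−Vt / (n·kT/q))) and the alpha-power-law delay model. The slope factor n = 1.5, exponent a = 1.3 and the 0.25 V / 0.40 V thresholds are assumed values for illustration only.

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q at about 300 K, in volts


def leakage_ratio(vt_low, vt_high, n=1.5):
    """I_sub(vt_low) / I_sub(vt_high), with I_sub ~ exp(-Vt / (n * kT/q))."""
    return math.exp((vt_high - vt_low) / (n * THERMAL_VOLTAGE))


def delay_ratio(vdd, vt_low, vt_high, a=1.3):
    """t(vt_high) / t(vt_low) from the alpha-power law t ~ Vdd / (Vdd - Vt)^a."""
    return ((vdd - vt_low) / (vdd - vt_high)) ** a


# Hypothetical LVT/HVT pair at Vdd = 1.0 V.
print(f"LVT leaks {leakage_ratio(0.25, 0.40):.0f}x more than HVT")
print(f"HVT is {delay_ratio(1.0, 0.25, 0.40):.2f}x slower than LVT")
```

Under these assumptions a 150 mV Vt step changes leakage by a factor of roughly 48 but delay by only about 1.34x, which is what makes selective use of low Vt cells on critical paths worthwhile.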

Libraries are offered in different versions (standard Vt, low Vt and high Vt cells), each independent of the others. Power and timing are optimized based on these libraries, and they offer good flexibility and opportunity to logic and physical synthesis tools in the optimization process.

The dual-Vt synthesis flow has become quite common in 130 nm and smaller technology nodes. In this flow, initial synthesis is carried out targeting a primary library, which may be a low Vt, high Vt or normal Vt library, and a second iteration of synthesis and optimization is performed using secondary libraries built from cells with the other threshold voltages.

Which library has to be used as the primary library?

This depends on the optimization target set by the design requirements. In general, if the optimization target is power, first synthesize the design using the high Vt cell library, which achieves the lowest leakage power. In the next iteration of optimization, cells in the critical path are replaced by low Vt cells, which are faster. If the optimization target is to meet timing, then first use the low Vt cell library to achieve timing and then optimize leakage power using high Vt cells.
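The HVT-first strategy described above can be sketched as a two-pass loop over timing paths. The cell table, path lists, integer picosecond delays and the greedy "largest gain first" ordering are assumptions for illustration; a synthesis tool works on a full timing graph and also considers slew, load and area.

```python
def hvt_first_two_pass(cells, paths, period_ps):
    """Pass 1: map every cell to HVT. Pass 2: on each path that misses the
    clock period, swap cells to LVT, largest delay gain first.

    cells: {name: {"HVT": (delay_ps, leakage), "LVT": (delay_ps, leakage)}}
    paths: list of cell-name lists (each a register-to-register path)
    """
    vt = {name: "HVT" for name in cells}  # pass 1: primary (HVT) library

    def arrival(path):
        return sum(cells[c][vt[c]][0] for c in path)

    def gain(c):
        return cells[c]["HVT"][0] - cells[c]["LVT"][0]

    for path in paths:  # pass 2: incremental optimization with LVT cells
        for c in sorted(path, key=gain, reverse=True):
            if arrival(path) <= period_ps:
                break  # path meets timing, stop swapping
            vt[c] = "LVT"
    leakage = sum(cells[c][vt[c]][1] for c in cells)
    return vt, leakage


cells = {
    "a": {"HVT": (40, 1), "LVT": (30, 10)},
    "b": {"HVT": (30, 1), "LVT": (25, 10)},
    "c": {"HVT": (20, 1), "LVT": (15, 10)},
    "d": {"HVT": (30, 1), "LVT": (20, 10)},
}
vt, leak = hvt_first_two_pass(cells, [["a", "b", "c"], ["d", "c"]], period_ps=80)
print(vt, leak)
# {'a': 'LVT', 'b': 'HVT', 'c': 'HVT', 'd': 'HVT'} 13
```

Only the one cell needed to close the 80 ps path is moved to LVT, so leakage stays close to the all-HVT value; a timing-first flow would start from LVT and swap non-critical cells back to HVT instead.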

Low Power Techniques: Clock Gating

Clock buffers can account for more than 50% of the dynamic power. Hence it is a good design idea to turn off the clock when it is not needed. Automatic clock gating is supported by modern EDA tools, which identify the circuits where clock gating can be inserted.

Specific clock gating cells are required in the library to be utilized by the synthesis tools. The availability of clock gating cells and automatic insertion by the EDA tools make clock gating a simpler low power technique. The advantage of this method is that clock gating does not require modifications to the RTL description.
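Library clock gating cells are typically latch-based integrated clock gates (ICGs), which avoid glitches on the gated clock. The following behavioural sketch (a Python model, not RTL and not any particular library cell) shows why: the enable is captured while the clock is low, so a change in the enable during the high phase cannot chop a clock pulse. The two-samples-per-cycle timing is a simplification.

```python
def icg(clk, en):
    """Sample-by-sample model of a latch-based integrated clock gate (ICG).

    The enable latch is transparent while clk is low and holds while clk is
    high, so the AND gate can only pass or block whole clock pulses."""
    gclk = []
    latched = 0
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # latch follows the enable during the low phase
        gclk.append(c & latched)  # gated clock = clk AND latched enable
    return gclk


clk = [0, 1] * 4                # four clock cycles, two samples each
en = [1, 1, 0, 0, 1, 0, 1, 1]   # enable drops mid-pulse in the third cycle
print(icg(clk, en))
# [0, 1, 0, 0, 0, 1, 0, 1]
```

The second cycle is gated off entirely, and the enable falling during the high phase of the third cycle does not truncate that pulse, so downstream flip-flops see three clean clock edges instead of four.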
