Backend (Physical Design)

Bipeen Kulkarni
ASIC SoC Physical Design guidelines and Solutions under one cloud
Backend (Physical Design) Interview Questions and Answers
Filed under: VLSI Interview preparation
March 29, 2013
Below is the sequence of questions asked of a physical design engineer.
In which field are you interested?
The answer to this question depends on your interest, your expertise and the requirements of the position for which you are being interviewed.
Well.. the candidate gave the answer: Low power design
Can you talk about low power techniques? How are low power and the latest 90nm/65nm technologies related?
Refer here and browse for different low power techniques.
Do you know about the input vector controlled method of leakage reduction?
The leakage current of a gate also depends on its inputs. Hence, find the set of inputs which gives the least leakage. By applying this minimum leakage vector to a circuit, it is possible to decrease the leakage current of the circuit when it is in standby mode. This method is known as the input vector controlled method of leakage reduction.
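To make the idea concrete, here is a minimal brute-force sketch. It assumes a hypothetical per-pattern leakage table for a 2-input NAND (the numbers are invented, not from any real library) and a toy two-gate circuit; a real flow would take leakage from the characterized cell library and would use heuristics rather than exhaustive search once the number of inputs grows.

```python
from itertools import product

# Hypothetical leakage (nA) of a 2-input NAND for each input pattern (A, B).
# Illustrative numbers only; real values come from the cell library.
NAND2_LEAKAGE = {(0, 0): 10.0, (0, 1): 25.0, (1, 0): 22.0, (1, 1): 40.0}

def nand(a, b):
    return 0 if (a and b) else 1

def circuit_leakage(vector):
    """Total leakage of a toy circuit: n1 = NAND(x0, x1); out = NAND(n1, x2)."""
    x0, x1, x2 = vector
    n1 = nand(x0, x1)
    return NAND2_LEAKAGE[(x0, x1)] + NAND2_LEAKAGE[(n1, x2)]

def min_leakage_vector(num_inputs):
    """Evaluate every input vector and return the one with the least leakage."""
    return min(product((0, 1), repeat=num_inputs), key=circuit_leakage)

best = min_leakage_vector(3)
print(best, circuit_leakage(best))  # vector to apply in standby mode
```

Exhaustive search doubles in cost with every extra input, which is why this only illustrates the principle.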
How can you reduce dynamic power?
Reduce switching activity by designing good RTL
Clock gating
Architectural improvements
Reduce supply voltage
Use multiple voltage domains (multi-VDD)
What are the vectors of dynamic power?
Voltage and current
How will you do power planning?
Refer here for power planning.
If you have both IR drop and congestion, how will you fix them?
Spread macros
Spread standard cells
Increase strap width
Increase number of straps
Use proper blockage
Are increasing the power line width and providing more straps the only solutions to IR drop?
No. You can also:
Spread macros
Spread standard cells
Use proper blockage
In a reg-to-reg path, if you have a setup problem, where will you insert a buffer: near the launching flop or near the capture flop? Why?
(Buffers are inserted to fix fanout violations, and hence they reduce setup violations; otherwise we try to fix setup violations by sizing cells. For now, just assume that you must insert a buffer!)
Near the capture flop.
Because other paths may pass through, or originate from, the logic near the launch flop, a buffer inserted there may affect those paths too. It may improve all of them or degrade them. If all those paths have violations, then you may insert the buffer nearer to the launch flop, provided it improves slack.
How will you decide the best floorplan?
Refer here for floorplanning.
What is the most challenging task you handled? What is the most challenging job in the P&R flow?
It may be power planning, because you found more IR drop.
It may be the low power target, because you had more dynamic and leakage power.
It may be macro placement, because it had more connections with standard cells or macros.
It may be CTS, because you needed to handle multiple clocks and clock domain crossings.
It may be timing, because sizing cells in the ECO flow was not meeting timing.
It may be library preparation, because you found some inconsistency in the libraries.
It may be DRC, because you faced thousands of violations.
How will you synthesize the clock tree?
Single clock: normal synthesis and optimization
Multiple clocks: synthesize each clock separately
Multiple clocks with domain crossing: synthesize each clock separately and balance the skew
How many clocks were there in this project?
It is specific to your project.
The more clocks, the more challenging!
How did you handle all those clocks?
Multiple clocks -> synthesize separately -> balance the skew -> optimize the clock tree
Do they come from separate external sources or a PLL?
If they come from separate clock sources (i.e. asynchronous; from different pads or pins), then balancing skew between these clock sources becomes challenging.
If they come from a PLL (i.e. synchronous), then skew balancing is comparatively easy.
Why are buffers used in the clock tree?
To balance skew (i.e. the difference in clock arrival time from flop to flop).
What is crosstalk?
Switching of the signal in one net can interfere with a neighbouring net due to cross-coupling capacitance. This effect is known as crosstalk. Crosstalk may lead to setup or hold violations.
How can you avoid crosstalk?
Double spacing => more spacing => less capacitance => less crosstalk
Multiple vias => less resistance => less RC delay
Shielding => constant cross-coupling capacitance => known value of crosstalk
Buffer insertion => boost the victim strength
How does shielding avoid the crosstalk problem? What exactly happens there?
High frequency noise (or glitch) is coupled to VSS (or VDD), since shield layers are connected to either VDD or VSS.
The coupling capacitance remains constant with VDD or VSS.
How does spacing help in reducing crosstalk noise?
More spacing between two conductors => less cross-coupling capacitance => less crosstalk
Why are double spacing and multiple vias used for the clock?
Why the clock? Because it is the one signal which changes its state regularly and more often than any other signal. If any other signal switches fast, then we can use double spacing for it too.
Double spacing => spacing is more => capacitance is less => less crosstalk
Multiple vias => resistances in parallel => less resistance => less RC delay
How can a buffer be used in the victim to avoid crosstalk?
A buffer increases the victim's signal strength; buffers also break the net length => victims are more tolerant to the coupled signal from the aggressor.
Labels: Physical Design, Synthesis, Timing Analysis
Physical Design Questions and Answers
I am getting several emails requesting answers to the questions posted in this blog. But it is very difficult to provide detailed answers to all questions in my available spare time. Hence I decided to give short and sweet one-line answers to the questions so that readers can benefit immediately. Detailed answers will be posted at a later stage. I have given answers to some of the physical design questions here. Enjoy!
What parameters (or aspects) differentiate chip design and block level design?
Chip design has I/O pads; block design has pins.
Chip design uses all available metal layers; block design may not use all metal layers.
A chip is generally rectangular in shape; blocks can be rectangular or rectilinear.
Chip design requires packaging; block design ends in a macro.
How do you place macros in a full chip design?
First check flylines, i.e. check net connections from macro to macro and from macro to standard cells.
If there are more connections from macro to macro, place those macros nearer to each other, preferably nearer to the core boundaries.
If an input pin is connected to a macro, it is better to place the macro nearer to that pin or pad.
If a macro has more connections to standard cells, spread the macros inside the core.
Avoid criss-cross placement of macros.
Use soft or hard blockages to guide the placement engine.
Differentiate between a hierarchical design and a flat design.
A hierarchical design has blocks and sub-blocks in a hierarchy; a flattened design has no sub-blocks and has only leaf cells.
A hierarchical design takes more run time; a flattened design takes less run time.
Which is more complicated when you have a 48 MHz and a 500 MHz clock design?
500 MHz, because it is more constrained (i.e. a smaller clock period: 2 ns versus about 20.8 ns) than the 48 MHz design.
Name a few tools which you used for physical verification.
Hercules from Synopsys, Calibre from Mentor Graphics.
What are the input files you will give for PrimeTime correlation?
Netlist, technology library, constraints, SPEF or SDF file.
If routing congestion exists between two macros, what will you do?
Provide soft or hard blockage.
How will you decide the die size?
By checking the total area of the design, you can decide the die size.
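A rough sketch of how the total area turns into a die size. The core utilization, aspect ratio and the margin reserved for the I/O ring and power rings are assumptions chosen for illustration; the answer above only says the die size follows from the total design area.

```python
import math

def estimate_die_size(std_cell_area_um2, macro_area_um2,
                      utilization=0.7, aspect_ratio=1.0, io_margin_um=100.0):
    """Return (width, height) of the die in microns.

    utilization  : assumed fraction of the core occupied by cells and macros
    aspect_ratio : assumed core height / core width
    io_margin_um : assumed space on each side for the I/O ring and power rings
    """
    core_area = (std_cell_area_um2 + macro_area_um2) / utilization
    core_w = math.sqrt(core_area / aspect_ratio)
    core_h = core_w * aspect_ratio
    return core_w + 2 * io_margin_um, core_h + 2 * io_margin_um

w, h = estimate_die_size(std_cell_area_um2=320_000, macro_area_um2=128_000)
print(f"{w:.0f} um x {h:.0f} um")
```

With these hypothetical numbers the result is roughly a 1mm x 1mm die.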
If a long metal line is connected to both diffusion and poly, which one will be affected by the antenna problem?
Poly.
If the full chip design is routed with 7 metal layers, why are macros designed using 5LM instead of 7LM?
Because the top two metal layers are required for global routing in the chip design. If the top metal layers are also used at block level, they will create routing blockages.
In your project, what are the die size, number of metal layers, technology, foundry and number of clocks?
Die size: tell it in mm, e.g. 1mm x 1mm; remember 1mm = 1000 microns, which is a big size!!
Metal layers: see your tech file. Generally for 90nm it is 7 to 9.
Technology: again, look into the tech files.
Foundry: again, look into the tech files; e.g. TSMC, IBM, ARTISAN etc.
Clocks: look into your design and SDC file!
How many macros are in your design?
You know it well, as you have designed it! A SoC (System on Chip) design may have 100 macros also!!!!
What is each macro's size and the number of standard cells?
Depends on your design.
What are the input needs for your design?
For synthesis: RTL, technology library, standard cell library, constraints
For physical design: netlist, technology library, constraints, standard cell library
What does an SDC constraint file contain?
Clock definitions
Timing exceptions: multicycle path, false path
Input and output delays
How did you do power planning?
How to calculate core ring width, macro ring width and strap or trunk width?
How to find the number of power pads and IO power pads?
How are the metal width and the number of straps calculated for power and ground?
Get the total core power consumption and the metal layer current density value from the tech file. Convert the total power to current by dividing it by the supply voltage, then divide that current by the number of sides of the chip. Divide the obtained current by the current density to get the core power ring width. Then calculate the number of straps using some more equations. This will be explained in detail later.
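The ring-width part of that recipe can be sketched as follows. The supply voltage, the number of sides and the allowed current density are hypothetical inputs (take the real current-density limit from your technology file), and the conversion I = P / VDD is what makes the units work out. The strap calculation is left out because the post defers those equations.

```python
def core_ring_width_um(core_power_w, vdd_v, j_max_ma_per_um, sides=4):
    """Estimate the minimum core power-ring width for one side of the chip.

    core_power_w    : total core power (W)
    vdd_v           : supply voltage (V)
    j_max_ma_per_um : allowed current per micron of metal width (mA/um),
                      from the technology file (assumed value here)
    sides           : number of chip sides feeding the core (assumed 4)
    """
    total_current_ma = core_power_w / vdd_v * 1000.0   # I = P / V, in mA
    current_per_side_ma = total_current_ma / sides     # assume an even split
    return current_per_side_ma / j_max_ma_per_um

# Hypothetical example: 1.2 W core at 1.0 V, 1 mA/um allowed current density.
print(f"{core_ring_width_um(1.2, 1.0, 1.0):.1f} um")
```

The even split across sides is a simplification; real power analysis tools account for the actual pad locations and current distribution.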
How to find the total chip power?
Total chip power = standard cell power consumption + macro power consumption + pad power consumption.
What are the problems faced related to timing?
Pre-layout: setup, max transition, max capacitance
Post-layout: hold
How did you resolve the setup and hold problems?
Setup: upsize the cells
Hold: insert buffers
Which layer do you prefer for clock routing, and why?
The next lower layer to the top two metal layers (the global routing layers), because it has less resistance and hence less RC delay.
If your design has a reset pin, will it affect the input pin, the output pin, or both?
Output pin.
During power analysis, if you are facing an IR drop problem, how did you avoid it?
Increase power metal layer width.
Go for a higher metal layer.
Spread macros or standard cells.
Provide more straps.
Define the antenna problem. How did you resolve it?
Increased net length can accumulate more charge during manufacturing of the device due to the ionisation process. If this net is connected to the gate of a MOSFET, it can damage the gate dielectric, and the gate may conduct, damaging the MOSFET. This is the antenna problem.
Decrease the length of the net by providing more vias and layer jumping.
Insert an antenna diode.
How do delays vary with different PVT conditions? Show the graph.
P increase -> delay increase
P decrease -> delay decrease
V increase -> delay decrease
V decrease -> delay increase
T increase -> delay increase
T decrease -> delay decrease
Explain the flow of physical design and the inputs and outputs for each step in the flow.
Click here to see the flow diagram
What is cell delay and net delay?
Gate delay
Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output. [Magma]
Gate delay = function of (input transition time, Cnet + Cpin).
Cell delay is the same as gate delay.
Cell delay
For any gate, it is measured from the 50% point of the input transition to the corresponding 50% point of the output transition.
Intrinsic delay
Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the output pin of the cell.
It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is predominantly caused by the internal capacitance associated with its transistors.
This delay is largely independent of the size of the transistors forming the gate, because increasing the size of the transistors also increases the internal capacitances.
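Net delay (or wire delay)
The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.
It is due to the finite resistance and capacitance of the net. It is also known as wire delay.
Wire delay = fn(Rnet, Cnet + Cpin)
What are delay models and what is the difference between them?
Linear Delay Model (LDM): delay is computed as a linear function of parameters such as input transition and output load
Non-Linear Delay Model (NLDM): delay is looked up in tables indexed by input transition and output load, with interpolation between table points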
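NLDM tables in a Liberty library give delay as a 2-D table indexed by input transition and output load, and the timing tool interpolates between table points. Below is a minimal bilinear-interpolation sketch; the table values and index points are invented for illustration. Outside the table the sketch simply extends the edge segments linearly, which is only one possible extrapolation choice.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Return indices (i, i+1) of the axis points that bracket x (clamped)."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation of a delay table indexed as table[slew][load]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    low = table[i0][j0] + (table[i0][j1] - table[i0][j0]) * tl   # along load, slew i0
    high = table[i1][j0] + (table[i1][j1] - table[i1][j0]) * tl  # along load, slew i1
    return low + (high - low) * ts                                # then along slew

# Hypothetical table: input transition (ns) x output load (pF) -> delay (ns)
SLEWS = [0.1, 0.5]
LOADS = [0.01, 0.05]
DELAY = [[0.20, 0.40],
         [0.30, 0.60]]
print(round(nldm_lookup(SLEWS, LOADS, DELAY, 0.3, 0.03), 3))
```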
What is a wire load model?
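A wire load model gives the estimated R and C of a net, based on its fanout, for pre-layout timing. It is not the same thing as NLDM, which models cell delay.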
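A wire load model is essentially a fanout-to-length table plus per-unit-length resistance and capacitance, with a slope used to extrapolate beyond the largest listed fanout (this mirrors how Liberty `wire_load` groups are organised). The sketch below uses hypothetical values only.

```python
# Hypothetical wire load model; every value here is invented for illustration.
WLM = {
    "fanout_length": {1: 20.0, 2: 35.0, 3: 50.0, 4: 65.0},  # um per fanout
    "slope": 15.0,           # extra um per fanout beyond the largest entry
    "res_per_um": 0.002,     # kOhm per um
    "cap_per_um": 0.0002,    # pF per um
}

def estimate_net_rc(fanout, wlm=WLM):
    """Return (length_um, r_kohm, c_pf) estimated from fanout, before layout."""
    table = wlm["fanout_length"]
    max_fo = max(table)
    if fanout > max_fo:
        # Beyond the table: extend the last entry linearly using the slope.
        length = table[max_fo] + (fanout - max_fo) * wlm["slope"]
    else:
        length = table[fanout]  # assumes every fanout 1..max_fo is listed
    return length, length * wlm["res_per_um"], length * wlm["cap_per_um"]

print(estimate_net_rc(2))   # inside the table
print(estimate_net_rc(6))   # extrapolated with the slope
```

After placement and routing, these statistical estimates are replaced by extracted parasitics (SPEF).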
Why are higher metal layers preferred for VDD and VSS?
Because they have less resistance and hence lead to less IR drop.
What is logic optimization? Give some methods of logic optimization.
Upsizing
Downsizing
Buffer insertion
Buffer relocation
Dummy buffer placement
What is the significance of negative slack?
Negative slack ==> there is a setup violation ==> the design can fail
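Slack is data required time minus data arrival time, so a negative value means the data arrives too late. A small sketch of the setup check for a reg-to-reg path follows; the clock period, latencies, clock-to-Q, setup time and uncertainty are all hypothetical numbers.

```python
def setup_slack(period, launch_clk_latency, capture_clk_latency,
                clk_to_q, comb_delay, t_setup, uncertainty=0.0):
    """Setup slack (ns) = data required time - data arrival time."""
    arrival = launch_clk_latency + clk_to_q + comb_delay
    required = period + capture_clk_latency - t_setup - uncertainty
    return required - arrival

# 500 MHz clock (2 ns period) with hypothetical path numbers in ns.
slack = setup_slack(period=2.0, launch_clk_latency=0.30, capture_clk_latency=0.25,
                    clk_to_q=0.15, comb_delay=1.70, t_setup=0.10, uncertainty=0.05)
print(f"slack = {slack:.2f} ns ->", "VIOLATED" if slack < 0 else "MET")
```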
What is signal integrity? How does it affect timing?
IR drop, electromigration (EM), crosstalk and ground bounce are signal integrity issues.
If IR drop is more ==> delay increases.
Crosstalk ==> there can be setup as well as hold violations.
What is IR drop? How to avoid it? How does it affect timing?
There is a resistance associated with each metal layer. Current flowing through this resistance causes a voltage drop, i.e. IR drop.
If IR drop is more ==> delay increases.
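A worked example of IR drop along a single power strap. As an assumption, the strap is modelled as a chain of equal resistive segments fed from one end, with every tap drawing the same current; all values are hypothetical.

```python
def strap_ir_drop(num_taps, current_per_tap_ma, seg_res_ohm):
    """Voltage drop (mV) at each tap of a strap fed from one end.

    Segment k (between tap k-1 and tap k) carries the current of all taps
    downstream of it, so the drop accumulates toward the far end.
    """
    drops, v = [], 0.0
    for k in range(num_taps):
        downstream_ma = (num_taps - k) * current_per_tap_ma
        v += downstream_ma * seg_res_ohm          # mA * ohm = mV
        drops.append(v)
    return drops

print(strap_ir_drop(num_taps=4, current_per_tap_ma=2.0, seg_res_ohm=0.5))
# -> [4.0, 7.0, 9.0, 10.0]  (worst drop at the far end)
```

Doubling the strap width roughly halves `seg_res_ohm` and therefore every drop. This is why widening straps, using a lower-resistance (higher) metal layer or adding straps reduces IR drop.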
What is EM and what are its effects?
Due to high current flow in the metal, atoms of the metal can be displaced from their original place. When this happens in large amounts, the metal can open, or bulging of the metal layer can happen. This effect is known as electromigration.
Effects: either a short or an open of the signal line or power line.
What are the types of routing?
Global Routing
Track Assignment
Detail Routing
What is latency? Give the types.
Source Latency
It is also known as source delay. It is defined as the delay from the clock origin point to the clock definition point in the design.
Delay from the clock source to the beginning of the clock tree (i.e. the clock definition point).
The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.
Network latency
It is also known as insertion delay. It is defined as the delay from the clock definition point to the clock pin of the register.
The time the clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.
What is track assignment?
The second stage of routing, wherein particular metal tracks (or layers) are assigned to the signal nets.
What is congestion?
If the number of routing tracks available for routing is less than the number of tracks required, it is known as congestion.
Is congestion related to placement or routing?
Routing
What are clock trees?
Distribution of the clock from the clock source to the sink pins of the registers.
What are the clock tree types?
H-tree, balanced tree, X-tree, clustering tree, fishbone
What is cloning and buffering?
Cloning is a method of optimization that decreases the load of a heavily loaded cell by replicating the cell.
Buffering is a method of optimization that inserts buffers in high fanout nets to decrease the delay.
What is the difference between a latch and a flip-flop?
Both latches and flip-flops are circuit elements whose output depends not only on the present inputs, but also on previous inputs and outputs.
They are both hence referred to as sequential elements.
In electronics, a latch is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby can store one bit of information. Today the word is mainly used for simple transparent storage elements, while slightly more advanced non-transparent (or clocked) devices are described as flip-flops. Informally, as this distinction is quite new, the two words are sometimes used interchangeably. [wiki]
In digital circuits, a flip-flop is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby is capable of serving as one bit of memory. Today, the term flip-flop has come to generally denote non-transparent (clocked or edge-triggered) devices, while the simpler transparent ones are often referred to as latches. [wiki]
A flip-flop is controlled by (usually) one or two control signals and/or a gate or clock signal.
Latches are level sensitive, i.e. the output captures the input when the clock signal is high, so as long as the clock is logic 1, the output can change if the input also changes.
Flip-flops are edge sensitive, i.e. a flip-flop will store the input only when there is a rising or falling edge of the clock.
A positive level latch is transparent during the positive level (enable), and it latches the final input before the level changes (i.e. before enable goes to 0, or before the clock goes to the -ve level).
A positive edge flop will have its output effective only when the clock input changes from 0 to 1 (1 to 0 for a negative edge flop).
Latches are faster; flip-flops are slower.
A latch is sensitive to glitches on the enable pin, whereas a flip-flop is immune to glitches.
Latches take fewer gates (less power) to implement than flip-flops.
A DFF is built from two latches in a master-slave configuration.
A latch may be clocked or clockless, but a flip-flop is always clocked.
For a transparent latch, generally the D-to-Q propagation delay is considered, while for a flop, clock-to-Q and setup and hold times are very important.
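The level-sensitive versus edge-sensitive behaviour described above can be illustrated with a tiny cycle-based model. This is a sketch under simplifying assumptions: zero delay, one sample per time step, and no setup, hold or glitch effects.

```python
def simulate_latch(clk, d):
    """Positive level latch: Q follows D while clk is 1 and holds while clk is 0."""
    q, out = 0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out

def simulate_dff(clk, d):
    """Positive edge flip-flop: Q samples D only on a 0 -> 1 clock edge."""
    q, prev_clk, out = 0, 0, []
    for c, x in zip(clk, d):
        if prev_clk == 0 and c == 1:
            q = x
        out.append(q)
        prev_clk = c
    return out

clk = [0, 1, 1, 1, 0, 0, 1, 1]
d   = [1, 1, 0, 1, 0, 1, 1, 0]
print("latch:", simulate_latch(clk, d))  # follows D while clk is high
print("dff:  ", simulate_dff(clk, d))    # changes only at rising edges
```

While the clock is high, the latch output tracks every change on D. The flip-flop output only updates at the two rising edges.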
Synthesis perspective: Pros and Cons of Latches and Flip Flops
In synthesis of HDL code, inappropriate coding can infer latches instead of flip-flops, e.g. incomplete if and case statements. This should be avoided, as latches are more prone to glitches.
A latch takes less area; a flip-flop takes more area (as a flip-flop is made up of latches).
Latches facilitate time borrowing or cycle stealing, whereas flip-flops allow synchronous logic.
Latches are not friendly with DFT tools. Minimize inferring of latches if your design has to be made testable, since the enable signal to a latch is not a regular clock that is fed to the rest of the logic. To ensure testability, you need to use an OR gate with the enable and scan_enable signals as inputs and feed its output to the enable port of the latch. [ref]
Most EDA software tools have difficulty with latches. Static timing analyzers typically make assumptions about latch transparency. If one assumes the latch is transparent (i.e. triggered by the active time of the clock, not just by the clock edge), then the tool may find a false timing path
through the input data pin. If one assumes the latch is not transparent, then the tool may miss a critical path.
If the target technology supports a latch cell, then race condition problems are minimized. If the target technology does not support a latch, then the synthesis tool will infer it from basic gates, which is prone to race conditions. Then you need to add redundant logic to overcome this problem. But during optimization, the redundant logic can be removed by the synthesis tool! This will create endless problems for the design team.
Due to the transparency issue, latches are difficult to test. For scan testing, they are often replaced by a latch-flip-flop compatible with the scan-test shift register. Under these conditions, a flip-flop would actually be less expensive than a latch. Read a good article on the problems of latches published in EE Times long back!!
Flip-flops are friendly with DFT tools. Scan insertion for synchronous logic is hassle-free.
Labels: Digital design
What are the different types of delays in ASIC or VLSI design?
Different Types of Delays in ASIC or VLSI design
Source Delay/Latency
Network Delay/Latency
Insertion Delay
Transition Delay/Slew: Rise time, fall time
Path Delay
Net delay, wire delay, interconnect delay
Propagation Delay
Phase Delay
Cell Delay
Intrinsic Delay
Extrinsic Delay
Input Delay
Output Delay
Exit Delay
Latency (Pre/Post CTS)
Uncertainty (Pre/Post CTS)
Unateness: positive unateness, negative unateness
Jitter: PLL jitter, clock jitter
Gate delay
Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output. [Magma]
Gate delay = function of (input transition time, Cnet + Cpin).
Cell delay is the same as gate delay.
Source Delay (or Source Latency)
It is also known as source latency. It is defined as the delay from the clock origin point to the clock definition point in the design.
Delay from the clock source to the beginning of the clock tree (i.e. the clock definition point).
The time a clock signal takes to propagate from its ideal waveform origin point to the clock definition point in the design.
Network Delay (latency)
It is also known as insertion delay or network latency. It is defined as the delay from the clock definition point to the clock pin of the register.
The time the clock signal (rise or fall) takes to propagate from the clock definition point to a register clock pin.
Insertion delay
The delay from the clock definition point to the clock pin of the register.
Transition delay
It is also known as slew. It is defined as the time taken to change the state of the signal: the time taken for the transition from logic 0 to logic 1 and vice versa, or the time taken by the input signal to rise from 10% (20%) to 90% (80%) and vice versa.
Transition is the time it takes for the pin to change state.
Slew
Rate of change of logic. See Transition delay.
Slew rate is the speed of transition, measured in volt/ns.
Rise Time
Rise time is the difference between the time when the signal crosses a low threshold and the time when the signal crosses the high threshold. It can be absolute or percent.
Low and high thresholds are fixed voltage levels around the mid voltage level, or they can be either
10% and 90% respectively or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages from the difference between the starting voltage level and the final settled voltage level.
Fall Time
Fall time is the difference between the time when the signal crosses a high threshold and the time when the signal crosses the low threshold.
The low and high thresholds are fixed voltage levels around the mid voltage level, or they can be either 10% and 90% respectively or 20% and 80% respectively. The percent levels are converted to absolute voltage levels at the time of measurement by calculating percentages from the difference between the starting voltage level and the final settled voltage level.
For an ideal square wave with 50% duty cycle, the rise time will be 0. For a symmetric triangular wave, the rise occupies 50% of the period.
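The percent-threshold measurement described above can be sketched on a sampled waveform. This sketch assumes the waveform starts at its initial level and ends at its final settled level, and uses linear interpolation between samples; the ramp used in the example is hypothetical.

```python
def crossing_time(times, volts, level, rising=True):
    """First time the sampled waveform crosses `level` (linear interpolation)."""
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def rise_time(times, volts, lo_pct=10.0, hi_pct=90.0):
    """Rise time between lo_pct and hi_pct of the swing (start -> settled value)."""
    v_start, v_final = volts[0], volts[-1]
    swing = v_final - v_start
    lo = v_start + swing * lo_pct / 100.0
    hi = v_start + swing * hi_pct / 100.0
    return crossing_time(times, volts, hi) - crossing_time(times, volts, lo)

# Hypothetical linear 0 -> 1 V ramp over 1 ns, sampled every 0.1 ns.
t = [i * 0.1 for i in range(11)]
v = [min(1.0, x) for x in t]
print(round(rise_time(t, v), 3))           # 10%-90%
print(round(rise_time(t, v, 20, 80), 3))   # 20%-80%
```

The same `crossing_time` helper at the 50% level gives the input and output crossing times from which the propagation delays Tphl and Tplh are measured.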
Click here to see waveform.
Click here to see more info.
The rise/fall definition is set on the meter to 10% and 90% based on the linear power in Watts. These points translate into the -10 dB and -0.5 dB points in log mode (10 log 0.1) and (10 log 0.9). The rise/fall time values of 10% and 90% are calculated based on an algorithm which looks at the mean power above and below the 50% points of the rise/fall times. Click here to see more.
Path delay
Path delay is also known as pin-to-pin delay. It is the delay from the input pin of the cell to the output pin of the cell.
Net Delay (or wire delay)
The difference between the time a signal is first applied to the net and the time it reaches other devices connected to that net.
It is due to the finite resistance and capacitance of the net. It is also known as wire delay.
Wire delay = fn(Rnet, Cnet + Cpin)
Propagation delay
This is the time required for a signal to propagate through a gate or net. For gates, it is the time it takes for an event at the gate input to affect the gate output.
For a net, it is the delay between the time a signal is first applied to the net and the time it reaches other devices connected to that net.
It is taken as the average of the high-to-low and low-to-high propagation delays, i.e. Tpd = (Tphl + Tplh)/2.
Phase delay
Same as insertion delay
Cell delay
For any gate, it is measured from the 50% point of the input transition to the corresponding 50% point of the output transition.
Intrinsic delay
Intrinsic delay is the delay internal to the gate, from the input pin of the cell to the output pin of the cell.
It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output does not see any load. It is predominantly caused by the internal capacitance associated with its transistors.
This delay is largely independent of the size of the transistors forming the gate, because increasing the size of the transistors also increases the internal capacitances.
Extrinsic delay
Same as wire delay, net delay, interconnect delay, flight time.
Extrinsic delay is the delay effect associated with the interconnect, from the output pin of the cell to the input pin of the next cell.
Input delay
Input delay is the time at which the data arrives at the input pin of the block from the external circuit, with respect to the reference clock.
Output delay
Output delay is the time required by the external circuit before which the data has to arrive at the output pin of the block, with respect to the reference clock.
Exit delay
It is defined as the delay in the longest path (critical path) between the clock pad input and an output. It determines the maximum operating frequency of the design.
Latency (pre/post CTS)
Latency is the sum of the source latency and the network latency. Before CTS, the estimated latency is considered during synthesis; after CTS, the propagated latency is considered.
Uncertainty (pre/post CTS)
Uncertainty is the amount of skew and the variation in the arrival of the clock edge. Pre-CTS uncertainty is clock skew plus clock jitter. After CTS, we can keep some margin of skew + jitter.
Unateness
A function is said to be unate if a rise transition on a positive unate input variable causes the output to rise or not change, and vice versa.
Negative unateness means the cell output logic is the inverted version of the input logic, e.g. in an inverter
having input A and output Y, Y is -ve unate w.r.t. A. Positive unate means the cell output logic is the same as that of the input.
These +ve and -ve unateness properties are constraints defined in the library file, and they are defined for an output pin w.r.t. some input pin.
A clock signal is positive unate if a rising edge at the clock source can only cause a rising edge at the register clock pin, and a falling edge at the clock source can only cause a falling edge at the register clock pin.
A clock signal is negative unate if a rising edge at the clock source can only cause a falling edge at the register clock pin, and a falling edge at the clock source can only cause a rising edge at the register clock pin. In other words, the clock signal is inverted.
A clock signal is not unate if the clock sense is ambiguous as a result of non-unate timing arcs in the clock path. For example, a clock that passes through an XOR gate is not unate, because there are non-unate arcs in the gate. The clock sense could be either positive or negative, depending on the state of the other input to the XOR gate.
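Unateness of a combinational function with respect to one input can be checked by toggling that input from 0 to 1 for every combination of the other inputs and watching the output direction. A minimal sketch over Python truth functions (the gate lambdas are illustrative, not library cells):

```python
from itertools import product

def unateness(func, num_inputs, pin):
    """Classify the output of `func` as positive, negative or non-unate w.r.t. `pin`.

    For every combination of the other inputs, toggle only `pin` from 0 to 1
    and compare outputs: output never falls -> positive unate, never rises ->
    negative unate, both rise and fall occur -> non-unate. (An output that
    never changes is reported as positive unate here.)
    """
    rises = falls = False
    for others in product((0, 1), repeat=num_inputs - 1):
        low = list(others[:pin]) + [0] + list(others[pin:])
        high = list(others[:pin]) + [1] + list(others[pin:])
        out_low, out_high = func(*low), func(*high)
        rises = rises or out_low < out_high
        falls = falls or out_low > out_high
    if rises and falls:
        return "non-unate"
    if falls:
        return "negative unate"
    return "positive unate"

print(unateness(lambda a: 1 - a, 1, 0))        # inverter -> negative unate
print(unateness(lambda a, b: a & b, 2, 0))     # AND      -> positive unate
print(unateness(lambda a, b: a ^ b, 2, 0))     # XOR      -> non-unate
```

The XOR result matches the clock example above: the output sense depends on the other input.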
Jitter
The short-term variations of a signal with respect to its ideal position in time.
Jitter is the variation of the clock period from edge to edge. It can vary by +/- the jitter value.
From cycle to cycle, the period and duty cycle can change slightly due to the clock generation circuitry. This can be modeled by adding uncertainty regions around the rising and falling edges of the clock waveform.
Sources of Jitter
Common sources of jitter include:
Internal circuitry of the phase-locked loop (PLL)
Random thermal noise from a crystal
Other resonating devices
Random mechanical noise from crystal vibration
Signal transmitters
Traces and cables
Connectors
Receivers
Click here to read more about jitter from Altera.
Click here to read what wiki says about jitter.
Skew
The difference in the arrival of the clock signal at the clock pins of different flops.
Two types of skew are defined: local skew and global skew.
Local skew
The difference in the arrival of the clock signal at the clock pins of related flops.
Global skew
The difference in the arrival of the clock signal at the clock pins of non-related flops.
Skew can be positive or negative.
When data and clock are routed in the same direction, it is positive skew.
When data and clock are routed in opposite directions, it is negative skew.
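A small sketch of computing skew from clock-pin arrival times. It assumes the common sign convention that local skew for a launch/capture pair is capture arrival minus launch arrival (positive when the clock reaches the capture flop later, as when clock and data travel in the same direction), and it takes global skew as the max-minus-min arrival over all flops. Flop names and arrival times are hypothetical.

```python
def skew_report(arrivals, related_pairs):
    """Clock skew from clock-pin arrival times (ns).

    arrivals      : {flop_name: clock arrival time at its clock pin}
    related_pairs : (launch, capture) flop pairs that share a timing path
    Local skew per related pair = capture arrival - launch arrival (signed).
    Global skew = latest arrival - earliest arrival over all flops.
    """
    local = {(l, c): arrivals[c] - arrivals[l] for l, c in related_pairs}
    global_skew = max(arrivals.values()) - min(arrivals.values())
    return local, global_skew

arrivals = {"FF1": 0.50, "FF2": 0.62, "FF3": 0.45, "FF4": 0.70}   # hypothetical
local, global_skew = skew_report(arrivals, [("FF1", "FF2"), ("FF2", "FF3")])
for (l, c), s in local.items():
    print(f"{l} -> {c}: {s:+.2f} ns")
print(f"global skew: {global_skew:.2f} ns")
```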
Recovery Time
Recovery specifies the minimum time that an asynchronous control input pin must be held stable after being de-asserted and before the next clock (active edge) transition.
Recovery time specifies the time the inactive edge of the asynchronous signal has to arrive before the closing edge of the clock.
Recovery time is the minimum length of time an asynchronous control signal (e.g. preset) must be stable before the next active clock edge. The recovery slack time calculation is similar to the clock setup slack time calculation, but it applies to asynchronous control signals.
Equation 1 (asynchronous control is registered):
Recovery Slack Time = Data Required Time - Data Arrival Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclk-q + Register to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup
If the asynchronous control is not registered, the equations shown in Equation 2 are used to calculate the recovery slack time.
Equation 2:
Recovery Slack Time = Data Required Time - Data Arrival Time
Data Arrival Time = Launch Edge + Maximum Input Delay + Port to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register - Tsetup
If the asynchronous reset signal is from a port (device I/O), you must make an Input Maximum Delay assignment to the asynchronous reset pin to perform recovery analysis on that path.
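The two recovery equations translate directly into code. In this sketch the register's recovery requirement is subtracted from the required time, as in a setup check (the parameter is named `t_setup` after the symbol Tsetup in the equations). All times are hypothetical and in ns.

```python
def recovery_slack_registered(launch_edge, latch_edge, clk_delay_src,
                              clk_delay_dst, t_clk_q, reg_to_reg, t_setup):
    """Equation 1: asynchronous control driven by a register (times in ns)."""
    arrival = launch_edge + clk_delay_src + t_clk_q + reg_to_reg
    required = latch_edge + clk_delay_dst - t_setup
    return required - arrival

def recovery_slack_from_port(launch_edge, latch_edge, max_input_delay,
                             port_to_reg, clk_delay_dst, t_setup):
    """Equation 2: asynchronous control arriving from a device port."""
    arrival = launch_edge + max_input_delay + port_to_reg
    required = latch_edge + clk_delay_dst - t_setup
    return required - arrival

# Hypothetical numbers: 10 ns clock, reset released on edge 0, checked at edge 10.
print(recovery_slack_registered(0.0, 10.0, 1.0, 1.2, 0.3, 4.0, 0.5))
print(recovery_slack_from_port(0.0, 10.0, 3.0, 2.5, 1.2, 0.5))
```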
Removal Time
Removal specifies the minimum time that an asynchronous control input pin must be held stable before being de-asserted and after the previous clock (active edge) transition.
Removal time specifies the length of time the active phase of the asynchronous signal has to be held after the closing edge of the clock.
Removal time is the minimum length of time an asynchronous control signal must be stable after
the active clock edge. The calculation is similar to the clock hold slack calculation, but it applies to asynchronous control signals. If the asynchronous control is registered, the equations shown in Equation 3 are used to calculate the removal slack time.
If the recovery or removal minimum time requirement is violated, the output of the sequential cell becomes uncertain. The uncertainty can be caused by the value set by the reset-bar signal or by the value clocked into the sequential cell from the data input.
Equation 3
Removal Slack Time = Data Arrival Time - Data Required Time
Data Arrival Time = Launch Edge + Clock Network Delay to Source Register + Tclk-q of Source Register + Register to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold
If the asynchronous control is not registered, the equations shown in Equation 4 are used to calculate the removal slack time.
Equation 4
Removal Slack Time = Data Arrival Time - Data Required Time
Data Arrival Time = Launch Edge + Input Minimum Delay of Pin + Minimum Pin to Register Delay
Data Required Time = Latch Edge + Clock Network Delay to Destination Register + Thold
If the asynchronous reset signal is from a device pin, you must specify the Input Minimum Delay constraint on the asynchronous reset pin to perform a removal analysis on this path.
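Equations 3 and 4 as code, in the same style as a hold check: a negative result means the asynchronous control is released too soon after the clock edge. The numbers are hypothetical (in ns), and launch and latch are taken as the same edge, as is usual for a removal check.

```python
def removal_slack_registered(launch_edge, latch_edge, clk_delay_src,
                             clk_delay_dst, t_clk_q, reg_to_reg, t_hold):
    """Equation 3: removal slack = data arrival - data required (times in ns)."""
    arrival = launch_edge + clk_delay_src + t_clk_q + reg_to_reg
    required = latch_edge + clk_delay_dst + t_hold
    return arrival - required

def removal_slack_from_port(launch_edge, latch_edge, min_input_delay,
                            min_pin_to_reg, clk_delay_dst, t_hold):
    """Equation 4: asynchronous control arriving from a device pin."""
    arrival = launch_edge + min_input_delay + min_pin_to_reg
    required = latch_edge + clk_delay_dst + t_hold
    return arrival - required

# Hypothetical numbers; launch and latch are the same edge for a removal check.
for name, slack in [
    ("registered", removal_slack_registered(0.0, 0.0, 1.0, 1.2, 0.3, 0.2, 0.4)),
    ("from pin", removal_slack_from_port(0.0, 0.0, 0.8, 1.1, 1.2, 0.4)),
]:
    print(f"{name}: {slack:+.2f} ns", "VIOLATED" if slack < 0 else "MET")
```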
What is the difference between a soft macro and a hard macro?
What is the difference between a hard macro, a firm macro and a soft macro?
or
What are IPs?
Hard macros, firm macros and soft macros are all known as IP (Intellectual Property). They are optimized for power, area and performance. They can be purchased and used in your ASIC or FPGA design implementation flow. A soft macro is flexible for all types of ASIC implementation. A hard macro can be used in a pure ASIC design flow, not in an FPGA flow. Before buying any IP, it is very important to evaluate its advantages and disadvantages against the alternatives, its hardware compatibility (such as I/O standards) with your design blocks, and its reusability for other designs.
Soft macros
Soft macros are in synthesizable RTL.
Soft macros are more flexible than firm or hard macros.
Soft macros are not specific to any manufacturing process.
Soft macros have the disadvantage of being somewhat unpredictable in terms of performance, timing, area, or power.
Soft macros carry greater IP protection risks, because RTL source code is more portable and therefore less easily protected than either a netlist or physical layout data.
From the physical design perspective, a soft macro is any cell that has been placed and routed in a placement and routing tool such as Astro. (This is the definition given in the Astro Rail user manual!)
Soft macros are editable and can contain standard cells, hard macros, or other soft macros.
Firm macros
Firm macros are in netlist format.
Firm macros are optimized for performance/area/power using a specific fabrication technology.
Firm macros are more flexible and portable than hard macros.
Firm macros are more predictable in terms of performance and area than soft macros.
Hard macro
Hard macros are generally in the form of hardware IPs (or we term them hardware IPs!).
Hard macros are targeted at a specific IC manufacturing technology.
Hard macros are block level designs which are silicon tested and proven.
Hard macros have been optimized for power or area or timing.
In physical design you can only access the pins of hard macros, unlike soft macros, which allow us to manipulate them in different ways.
You have the freedom to move, rotate and flip, but you can't touch anything inside hard macros.
A very common example of a hard macro is memory. It can be any design which carries a dedicated single functionality (in general), for example an MP4 decoder.
Be aware of the features and characteristics of a hard macro before you use it in your design; other than power, timing and area, you should also know pin properties like sync pin, I/O standards etc.
The LEF and GDS2 file formats allow easy usage of macros in different tools.
From the physical design (backend) perspective:
A hard macro is a block that is generated in a methodology other than place and route (i.e. using a full custom design methodology) and is brought into the physical design database (e.g. Milkyway in Synopsys; Volcano in Magma) as a GDS2 file.
Here is one article published in an embedded magazine about IPs. Click here to read.
Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ
dierentalgorithmsaccomplishthistaskalongwiththetargetofpowerandarea.Thereareseveral
research papers available on these subjects. Some of them can be downloaded from the given link
below.
HardMacroPlacementinComplexSoCDesignviewandreadarticlefromsoccentral
HardMacroPlacementinComplexSoCDesigndownloadwhitepaper
IEEE/Univerityresearchpapers
LocalSearchforFinalPlacementinVLSIDesigndownload
Consistent Placement of MacroBlocks Using Floorplanning and standard cell placement

download
A TimingDriven SoftMacro Placement And Resynthesis Method In Interaction with Chip

Floorplanningdownload
Labels:ASIC,PhysicalDesign,VLSI
WhatisthedifferencebetweenFPGAandCPLD?
FPGAField Programmable Gate Array and CPLDComplex Programmable Logic Device both are
programmablelogicdevicesmadebythesamecompanieswithdierentcharacteristics.
A Complex Programmable Logic Device (CPLD) is a Programmable Logic Device with

complexity between that of PALs (Programmable Array Logic) and FPGAs, and architectural
features of both. The building block of a CPLD is the macro cell, which contains logic
implementingdisjunctivenormalformexpressionsandmorespecializedlogicoperations.
ThisiswhatWikidenes..!!
Clickheretoseewhatelsewikihastosayaboutit!
Architecture
Granularity is the biggest difference between CPLDs and FPGAs.
FPGAs are fine-grain devices. That means they contain hundreds (up to 100,000) of tiny blocks of logic (called LUTs, CLBs, etc.) with flip-flops, combinational logic and memories. FPGAs offer much higher complexity, with up to 150,000 flip-flops and a large number of gates available.
CPLDs typically have the equivalent of thousands of logic gates, allowing implementation of moderately complicated data processing devices. PALs typically have a few hundred gate equivalents at most, while FPGAs typically range from tens of thousands to several million.
CPLDs are coarse-grain devices. They contain relatively few (a few hundred at most) large blocks of logic with flip-flops and combinational logic. CPLDs are based on an AND-OR structure.
CPLDs have a register with associated logic (AND/OR matrix). CPLDs are mostly used in control applications, and FPGAs in datapath applications. Because of this coarse-grained architecture, timing in CPLDs is fixed and predictable.
FPGAs are RAM based. They need to be downloaded (configured) at each power-up. CPLDs are EEPROM based. They are active at power-up, as long as they have been programmed at least once.
An FPGA needs a boot ROM, but a CPLD does not. In some systems you might not have enough time to boot up the FPGA, and then you need a CPLD + FPGA.
Generally, CPLD devices are nonvolatile, because they contain flash or erasable ROM memory in all cases. FPGAs are volatile in many cases and hence need a configuration memory to work. Some FPGAs are now nonvolatile. This distinction is rapidly becoming less relevant, as several of the latest FPGA products also offer models with embedded configuration memory.
Nonvolatility makes the CPLD the device of choice in modern digital designs for boot loader functions, performed before handing over control to other devices that lack this capability. A good example is a CPLD used to load configuration data for an FPGA from nonvolatile memory.
Because of the coarse-grain architecture, one block of logic can hold a big equation, and hence CPLDs have faster input-to-output timing than FPGAs.
Features
FPGAs have special routing resources to implement binary counters, arithmetic functions such as adders and comparators, and RAM. CPLDs don't have special features like these.
FPGAs can contain very large digital designs, while CPLDs can contain only small designs. The limited complexity (<500...
Speed: CPLDs offer a single-chip solution with fast pin-to-pin delays, even for wide input functions. Use CPLDs for small designs where instant-on, fast and wide decoding, ultra-low idle power consumption and design security are important (e.g., in battery-operated equipment).
Security: In a CPLD, once programmed, the design can be locked and thus made secure. Since an FPGA's configuration bitstream must be reloaded every time power is reapplied, design security in FPGAs is an issue.
Power: The high static (idle) power consumption prohibits use of CPLDs in battery-operated equipment. FPGA idle power consumption is reasonably low, although it is sharply increasing in the newest families.
Design flexibility: FPGAs offer more logic flexibility and more sophisticated system features than CPLDs: clock management, on-chip RAM, DSP functions (multipliers), and even on-chip microprocessors and multi-gigabit transceivers. These benefits, along with the opportunity for dynamic reconfiguration even in the end-user system, are an important advantage.
Use FPGAs for larger and more complex designs.
FPGAs are suited to register-heavy (sequential) circuits because they have more registers, while CPLDs are suited to control circuits because they have more combinational logic. Also, if you synthesize the same code for an FPGA many times, you will find that each timing report is different. CPLD synthesis is different: you get the same result every time.
As CPLDs and FPGAs become more advanced, the differences between the two device types will continue to blur. While this trend may make the two types harder to tell apart, the architectural advantage of CPLDs (low cost, nonvolatile configuration, and macrocells with predictable timing characteristics) will likely be sufficient to maintain product differentiation for the foreseeable future.
What is the difference between FPGA and ASIC?
This question is very popular in VLSI fresher interviews. It looks simple, but a deeper look at the subject reveals that there are a lot of things to understand! So here is the answer.
FPGA vs. ASIC
The difference between ASICs and FPGAs mainly comes down to cost, tool availability, performance and design flexibility. Each has its own pros and cons, and it is the designer's responsibility to weigh the advantages of each and choose either an FPGA or an ASIC for the product. However, recent developments in the FPGA domain are narrowing the benefits of ASICs.
FPGA
Field Programmable Gate Arrays
FPGA Design Advantages
Faster time-to-market: No layout, masks or other manufacturing steps are needed for FPGA design. A ready-made FPGA is available; burn your HDL code to the FPGA! Done!!
No NRE (Non-Recurring Expenses): This cost is typically associated with an ASIC design. For an FPGA it does not exist. FPGA tools are cheap (sometimes free! You just need to buy the FPGA, that's all!). For an ASIC you pay a huge NRE, and the tools are expensive. I would say very expensive: it runs into crores!!
Simpler design cycle: Software handles much of the routing, placement and timing, so less manual intervention is needed. The FPGA design flow eliminates the complex and time-consuming floorplanning, place and route, and timing analysis.
More predictable project cycle: The FPGA design flow eliminates potential re-spins, wafer capacity issues, etc., since the design logic is already synthesized and verified in the FPGA device.
Field reprogrammability: A new bitstream (i.e. your program) can be uploaded remotely and instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than 4-6 weeks to make the same changes. FPGA costs range from a couple of dollars to several hundred dollars or more, depending on the hardware features.
Reusability: Reusability is the main advantage of an FPGA. A prototype of the design can be implemented on an FPGA and verified for nearly accurate results before it is implemented as an ASIC. If the design has faults, change the HDL code, generate the bitstream, program the FPGA and test again. Modern FPGAs are reconfigurable both partially and dynamically.
FPGAs are good for prototyping and limited production. If you are going to make 100-200 boards, it isn't worth making an ASIC.
Generally, FPGAs are used for lower speed, lower complexity and lower volume designs. But today's FPGAs run at up to 500 MHz with superior performance. With unprecedented increases in logic density and a host of other features, such as embedded processors, DSP blocks, clocking and high-speed serial interfaces, at ever lower prices, FPGAs are suitable for almost any type of design.
Unlike ASICs, FPGAs have special built-in hardware such as Block RAM, DCM modules, MACs, memories, high-speed I/O and embedded CPUs, which can be used to get better performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with phase-locked loops, low-voltage differential signaling, clock data recovery, more internal routing, high speed, hardware multipliers for DSP, memory, programmable I/O, IP cores and microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx, and ARM (hard core) and Nios (soft core) in Altera. There are now FPGAs available with built-in ADCs! Using all these features, designers can build a system on a chip. Now, do you really need an ASIC?
FPGA synthesis is much easier than ASIC synthesis.
In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.
FPGA Design Disadvantages
Power consumption in an FPGA is higher, and you don't have any control over power optimization. This is where the ASIC wins the race!
You have to use the resources available in the FPGA. Thus the FPGA limits the design size.
Good for low-quantity production only. As quantity increases, the cost per unit becomes higher than that of an ASIC implementation.
ASIC
Application Specific Integrated Circuit
ASIC Design Advantages
Cost, cost, cost. Lower unit costs: For very high volume designs, the cost per unit comes out very low. At large volumes, an ASIC proves cheaper than implementing the design in an FPGA.
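The cost argument is a simple break-even calculation: total cost = NRE + volume × unit cost. The sketch below uses made-up placeholder prices (not real quotes) to find the volume at which an ASIC becomes cheaper than an FPGA.

```python
import math

def total_cost(nre, unit_cost, volume):
    """Total programme cost: one-time NRE plus per-unit cost times volume."""
    return nre + unit_cost * volume

def break_even_volume(fpga_unit, asic_nre, asic_unit, fpga_nre=0.0):
    """Smallest volume at which the ASIC total cost is <= the FPGA total cost."""
    if asic_unit >= fpga_unit:
        return None  # the ASIC never catches up on unit cost
    return math.ceil((asic_nre - fpga_nre) / (fpga_unit - asic_unit))

# Hypothetical figures, for illustration only
FPGA_UNIT = 50.0          # $ per FPGA
ASIC_NRE = 2_000_000.0    # $ for masks, tools and engineering
ASIC_UNIT = 5.0           # $ per packaged ASIC

n = break_even_volume(FPGA_UNIT, ASIC_NRE, ASIC_UNIT)
print(n)  # 44445
```

With these assumed numbers, a run of 100-200 boards is far below break-even, which is consistent with the rule of thumb given for FPGAs.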
Speed, speed, speed. ASICs are faster than FPGAs: an ASIC gives design flexibility, which creates enormous opportunity for speed optimization.
Low power, low power, low power: An ASIC can be optimized for the required low power. Several low power techniques, such as power gating, clock gating, multi-Vt cell libraries and pipelining, are available to achieve the power target. This is where the FPGA fails badly!!! Can you imagine a cell phone that had to be charged after every call? Never! Low power ASICs help the battery last longer!!
In an ASIC you can implement analog circuits and mixed-signal designs. This is generally not possible in an FPGA.
In an ASIC, DFT (Design For Test) is inserted. In an FPGA, DFT is not carried out (there is no need for DFT in an FPGA!).
ASIC Design Disadvantages
Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten development time is to make prototypes using FPGAs and then switch to an ASIC.
Design issues: In an ASIC you have to take care of DFM issues, signal integrity issues and many more. In an FPGA you don't face these, because the ASIC designers who built the FPGA have already taken care of them. (Don't forget that an FPGA is itself an IC, designed by ASIC design engineers!!)
Expensive tools: ASIC design tools are very expensive, and you spend a huge amount on NRE.
Structured ASICs
Structured ASICs have the bottom metal layers fixed, and only the top layers can be designed by the customer.
Structured ASICs are custom devices that approach the performance of today's standard cell ASICs while dramatically simplifying the design complexity.
Structured ASICs offer designers a set of devices with specific, customizable metal layers along with predefined metal layers, which can contain the underlying pattern of logic cells, memory and I/O.
FPGA vs. ASIC Design Flow Comparison
http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm
Other links
http://www.controleng.com/article/CA607224.html
http://www.soccentral.com/results.asp?CategoryID=488&EntryID=15887
http://www.us.design-reuse.com/articles/article9010.html
Labels: ASIC, FPGA
In scan chains, if some flip-flops are +ve edge triggered and the remaining flip-flops are -ve edge triggered, how does it behave?
Answer:
For designs with both positive and negative edge clocked flops, the scan insertion tool will always route the scan chain so that the negative edge flops come before the positive edge flops in the chain. This avoids the need for a lockup latch.
For the same clock domain, the negedge flops will always capture the data just captured into the posedge flops on the posedge of the clock.
For multiple clock domains, it all depends on how the clock trees are balanced. If the clock domains are completely asynchronous, ATPG has to mask the receiving flops.
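The ordering rule for a single clock domain can be sketched as a stable sort that places negative-edge flops ahead of positive-edge flops. The `Flop` record and the flop names are hypothetical, and a real scan insertion tool also weighs placement and clock domains. The helper counts posedge-to-negedge hand-offs, which are the points the ordering rule is meant to eliminate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flop:
    name: str
    negedge: bool  # True for a falling-edge triggered flop

def order_chain(flops):
    """Stable sort: all negedge flops first, then posedge flops (same clock domain)."""
    return sorted(flops, key=lambda f: 0 if f.negedge else 1)

def pos_to_neg_transitions(chain):
    """Count places where a posedge flop feeds a negedge flop in the chain."""
    return sum(1 for a, b in zip(chain, chain[1:]) if not a.negedge and b.negedge)

flops = [Flop("u_a/q_reg", False), Flop("u_b/q_reg", True),
         Flop("u_c/q_reg", False), Flop("u_d/q_reg", True)]

print(pos_to_neg_transitions(flops))   # 2 in the original order
chain = order_chain(flops)
print([f.name for f in chain])         # ['u_b/q_reg', 'u_d/q_reg', 'u_a/q_reg', 'u_c/q_reg']
print(pos_to_neg_transitions(chain))   # 0 after reordering
```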
What is the difference between a normal buffer and a clock buffer?
Answer:
The clock net is one of the high fanout nets (HFNs). Clock buffers are designed with special properties such as high drive strength and low delay. Clock buffers have equal rise and fall times. This prevents the duty cycle of the clock signal from changing as it passes through a chain of clock buffers.
Normal buffers are designed with a W/L ratio that minimizes the sum of rise time and fall time. They too are designed for higher drive strength.
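Why equal rise and fall delays matter can be shown with a short calculation. For a non-inverting buffer that delays the rising edge by t_plh and the falling edge by t_phl, the high pulse widens by (t_phl - t_plh) at every stage, so the error accumulates along the chain. The delay values below are hypothetical.

```python
def duty_cycle_after_chain(period_ps, high_ps, t_plh_ps, t_phl_ps, stages):
    """Duty cycle (%) after a chain of identical non-inverting buffers.

    Each stage delays the rising edge by t_plh and the falling edge by t_phl,
    so the high pulse widens by (t_phl - t_plh) per stage.
    """
    high = high_ps + stages * (t_phl_ps - t_plh_ps)
    high = min(max(high, 0), period_ps)  # pulse width is bounded by 0 and one period
    return 100.0 * high / period_ps

# 1 GHz clock (1000 ps period) with a 50% duty cycle, through 10 buffer stages
balanced = duty_cycle_after_chain(1000, 500, 40, 40, 10)  # clock buffer: equal delays
skewed = duty_cycle_after_chain(1000, 500, 35, 45, 10)    # normal buffer: unequal delays
print(balanced, skewed)  # 50.0 60.0
```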
What is the difference between HFN synthesis and CTS?
Answer:
HFNs are also synthesized in the front end, but at that point no placement information for the standard cells is available, so the backend tool collapses the synthesized HFNs. It then re-synthesizes the HFNs based on placement information and inserts buffers as needed. The target of this synthesis is to meet delay requirements, i.e. setup and hold.
For the clock, no synthesis is carried out in the front end. (Why? Because there is no placement information for the flip-flops, so synthesis could not meet true skew targets!) In the back end, clock tree synthesis tries to meet skew targets. It inserts clock buffers (which, unlike normal buffers, have equal rise and fall times!). HFN synthesis, by contrast, has no skew targets.
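A simplified view of how a high fanout net gets buffered: group the sinks under buffers, then group those buffers, until the original driver sees no more than the allowed fanout. This sketch ignores placement, which the real tool uses to cluster nearby sinks, and `max_fanout` is an assumed constraint value.

```python
import math

def buffer_tree_levels(num_sinks, max_fanout):
    """Buffers needed at each level (leaf level first) so that no driver,
    including the original source, drives more than max_fanout loads."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    levels = []
    loads = num_sinks
    while loads > max_fanout:
        buffers = math.ceil(loads / max_fanout)
        levels.append(buffers)
        loads = buffers  # the next level up drives these buffers
    return levels

levels = buffer_tree_levels(num_sinks=5000, max_fanout=32)
print(levels, sum(levels))  # [157, 5] 162
```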
Is it possible to have zero skew in the design?
Answer:
Theoretically it is possible!
Practically it is impossible!!
Practically, we can't reduce any delay to zero. Delay will exist, so we try to make the clock arrival delays equal rather than zero. With this optimization, all flops receive the clock edge with the same delay, so relative to each other we can say they have zero skew, or that the skew is balanced.
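Skew is a difference between arrival times, not an absolute delay, which is why balanced latency counts as zero skew. A sketch with hypothetical clock pin names and insertion delays:

```python
def global_skew(latencies_ps):
    """Global skew = latest clock arrival minus earliest clock arrival."""
    values = list(latencies_ps.values())
    return max(values) - min(values)

# Hypothetical clock insertion delays (ps) at four flop clock pins after CTS
balanced = {"ff1/CK": 850, "ff2/CK": 850, "ff3/CK": 850, "ff4/CK": 850}
unbalanced = {"ff1/CK": 820, "ff2/CK": 870, "ff3/CK": 845, "ff4/CK": 860}

print(global_skew(balanced))    # 0: every flop sees the edge 850 ps late, but all together
print(global_skew(unbalanced))  # 50
```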
What do you mean by scan chain reordering?
Answer 1:
The tool places standard cells optimally based on timing and congestion. While doing so, if the scan chains are detached, it can break the chain ordering (which was done by a scan insertion tool such as DFT Compiler from Synopsys) and reorder the chain to optimize it. It maintains the number of flops in a chain.
Answer 2:
During placement, the optimization may make the scan chain difficult to route due to congestion. Hence the tool reorders the chain to reduce congestion.
This sometimes increases hold time problems in the chain. To overcome these, buffers may have to be inserted into the scan path. The tool may not be able to maintain the scan chain length exactly. It cannot swap cells from different clock domains.
Because of scan chain reordering, patterns generated earlier are of no use. But this is not a problem, as ATPG can be redone by reading in the new netlist.
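Placement-aware reordering can be illustrated with a greedy nearest-neighbour stitch: starting at the scan-in port, always connect to the closest unvisited flop. This heuristic is a stand-in chosen for illustration, not the algorithm of any particular tool. The coordinates and flop names are hypothetical, and, following the rule above, only flops of one clock domain are reordered, so the chain keeps the same flops.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chain_length(points):
    """Total Manhattan wirelength of a chain visiting the points in order."""
    return sum(manhattan(p, q) for p, q in zip(points, points[1:]))

def greedy_reorder(scan_in, flops):
    """Nearest-neighbour ordering of flops (name -> (x, y)) starting at scan_in."""
    remaining = dict(flops)
    order, current = [], scan_in
    while remaining:
        name = min(remaining, key=lambda n: manhattan(remaining[n], current))
        order.append(name)
        current = remaining.pop(name)
    return order

# Hypothetical placement (um) of flops in one clock domain
flops = {"f0": (90, 10), "f1": (10, 10), "f2": (50, 10), "f3": (20, 12)}
scan_in = (0, 10)

original = ["f0", "f1", "f2", "f3"]  # order from scan insertion, before placement
reordered = greedy_reorder(scan_in, flops)
print(reordered)  # ['f1', 'f3', 'f2', 'f0']
print(chain_length([scan_in] + [flops[n] for n in original]),
      chain_length([scan_in] + [flops[n] for n in reordered]))  # 242 94
```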
On what basis do we decide the clock frequency in a design?
Answer:
There are several factors. The important ones are:
1) Input and output data rate: The clock must be fast enough to sustain the required data rate. For example, an encryptor or decryptor may need a minimum of 100 MHz to keep up with its data stream.
2) Power: The higher the frequency, the higher the power consumption.
3) Required accuracy: If high accuracy is not needed, an RC oscillator can be used, which saves area and keeps everything compact. But an RC oscillator can't produce high frequencies!
4) Technology: The lower the node, the higher the speed (and also the higher the power; again a tradeoff!). How fast do we want it?
5) Target platform: Is it an FPGA or a custom ASIC? Naturally an ASIC can reach a higher clock frequency, while the FPGA operating frequency is limited by several other factors.
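For the data-rate factor, the minimum clock follows from the required throughput and how many bits the datapath processes per cycle. The encryptor parameters below (128-bit blocks, 10 cycles per block, 1280 Mbit/s) are hypothetical and chosen only to show the arithmetic.

```python
def min_clock_mhz(throughput_mbps, bits_per_block, cycles_per_block):
    """Minimum clock (MHz) to sustain a throughput (Mbit/s) when the datapath
    finishes one block of bits_per_block bits every cycles_per_block cycles."""
    blocks_per_us = throughput_mbps / bits_per_block  # blocks needed per microsecond
    return blocks_per_us * cycles_per_block           # cycles per microsecond = MHz

# Hypothetical encryptor: 1280 Mbit/s, 128-bit blocks, 10 cycles per block
print(min_clock_mhz(1280, 128, 10))  # 100.0
```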
What is JTAG?
Answer 1:
JTAG is an acronym for Joint Test Action Group. It is also known as the IEEE 1149.1 standard, Standard Test Access Port and Boundary-Scan Architecture. It is used as one of the DFT techniques.
Answer 2:
JTAG (Joint Test Action Group) boundary scan is a method of testing ICs and their interconnections. It uses a shift register built into the chip, so that inputs can be shifted in and the resulting outputs can be shifted out. JTAG requires four I/O pins: clock (TCK), input data (TDI), output data (TDO) and state machine mode control (TMS).
The uses of JTAG expanded to debugging software for embedded microcontrollers. This eliminates the need for in-circuit emulators, which are more costly. JTAG is also used to download configuration bitstreams to FPGAs.
JTAG cells, also known as boundary scan cells, are small circuits placed just inside the I/O cells. Their purpose is to pass data to and from the I/O through the boundary scan chain. The interface to these scan chains is called the TAP (Test Access Port), and the operation of the chains and the TAP is controlled by a JTAG controller inside the chip.
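The JTAG controller mentioned above is built around the TAP controller, a 16-state finite state machine defined by IEEE 1149.1 that advances on each rising edge of TCK according to TMS. Below is a minimal Python model of its next-state table; the state-name strings are an informal spelling chosen here.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def run_tms(state, tms_bits):
    """Apply TMS values (one per TCK rising edge) and return the visited states."""
    trace = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        trace.append(state)
    return trace

# Five clocks with TMS=1 reach Test-Logic-Reset from any state
for start in TAP_NEXT:
    assert run_tms(start, [1, 1, 1, 1, 1])[-1] == "Test-Logic-Reset"

# From reset, TMS = 0,1,0,0 enters Shift-DR, where TDI -> TDO shifting happens
print(run_tms("Test-Logic-Reset", [0, 1, 0, 0]))
# ['Run-Test/Idle', 'Select-DR-Scan', 'Capture-DR', 'Shift-DR']
```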
ASIC Design Check List
Silicon Process and Library Characteristics
What exact process are you using?
How many layers can be used for this design?
Are the crosstalk noise constraints, Xtalk analysis configuration, cell EM and wire EM data available?
Design Characteristics
What is the design application?
Number of cells (placeable objects)?
Is the design in Verilog or VHDL?
Is the netlist flat or hierarchical?
Is RTL available?
Is there any datapath logic using special datapath tools?
Is DFT to be considered?
Can scan chains be reordered?
Are memory BIST and boundary scan used on this design?
Are static timing analysis constraints available in SDC format?
Clock Characteristics
How many clock domains are in the design?
What are the clock frequencies?
Is there a target clock skew, latency or other clock requirement?
Does the design have a PLL?
If so, is it used to remove clock latency?
Is there any I/O cell in the feedback path?
Is the PLL used for frequency multiplication?
Are there derived clocks or complex clock generation circuitry?
Are there any gated clocks?
If yes, do they use simple gating elements?
Is the gated clock used for timing or for power?
For gated clocks, can the gating elements be sized for timing?
Are you muxing in a test clock or using a JTAG clock?
Available cells for the clock tree?
Are there any special clock repeaters in the library?
Are there any EM, slew or capacitance limits on these repeaters?
How many drive strengths are available in the standard buffers and inverters?
Do any of the buffers have balanced rise and fall delays?
Are there any special requirements for clock distribution?
Will the clock tree be shielded? If so, what are the shielding requirements?
Floorplan and Package Characteristics
Target die area?
Does the area estimate include power/signal routing?
What gates/mm² figure has been assumed?
Number of routing layers?
Any special power routing requirements?
Number of digital I/O pins/pads?
Number of analog signal pins/pads?
Number of power/ground pins/pads?
Total number of pins/pads and their locations?
Will this chip use a wire-bond package?
Will this chip use a flip-chip package?
If yes, what is the I/O bump pitch? Rows of bumps? Bump allocation? Bump pad layout guide?
Have you already done floorplanning for this design?
If yes, is conformance to the existing floorplan required?
What is the target die size?
What is the expected utilization?
Please draw the overall floorplan.
Is an existing floorplan available in DEF?
What are the number and type of macros (memory, PLL, etc.)?
Are there any analog blocks in the design?
What kind of packaging is used? Flip-chip?
Are the I/Os periphery I/O or area I/O?
How many I/Os?
Is the design pad limited?
Power planning and power analysis for this design?
Are layout databases available for hard macros?
Timing analysis and correlation?
Physical verification?
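The area questions in this checklist (gate count, gates/mm², utilization, macros) combine into a first-cut die size estimate. A sketch with hypothetical numbers: it adds a crude I/O ring width on each side of a square die but ignores macro halos, routing channels and scribe lines, which a real estimate must account for.

```python
import math

def core_area_mm2(gate_count, gates_per_mm2, utilization, macro_area_mm2=0.0):
    """Core area = standard-cell area / utilization + macro area."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return gate_count / gates_per_mm2 / utilization + macro_area_mm2

def square_die_side_mm(core_mm2, io_ring_width_mm):
    """Side of a square die with an I/O ring of the given width on every edge."""
    return math.sqrt(core_mm2) + 2 * io_ring_width_mm

# Hypothetical: 2M gates at 400k gates/mm2, 70% utilization, 1.5 mm2 of memories
core = core_area_mm2(2_000_000, 400_000, 0.7, macro_area_mm2=1.5)
side = square_die_side_mm(core, 0.15)
print(round(core, 2), round(side, 2))  # 8.64 3.24
```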
Data Input
Library information for the new library
.lib for timing information
GDSII or LEF for library cells, including any RAMs
RTL in Verilog/VHDL format
Number of logical blocks in the RTL
Constraints for the block in SDC
Floorplan information in DEF
I/O pin locations
Macro locations
Power Gating
Power gating is effective for reducing leakage power [3]. Power gating is the technique in which circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown period is also called low power mode or inactive mode. When the circuit blocks are required for operation again, they are switched back to active mode. These two modes are switched at the appropriate time and in a suitable manner to maximize power savings while minimizing the impact on performance. Thus the goal of power gating is to minimize leakage power by temporarily cutting power to selected blocks that are not required in that mode.
Power gating affects the design architecture more than clock gating does. It increases time delays, as power-gated modes have to be entered and exited safely. The amount of leakage power that can be saved in such a low power mode, weighed against the energy dissipated to enter and exit it, introduces some architectural tradeoffs. Shutting down the blocks can be accomplished either by software or by hardware. Driver software can schedule the power-down operations, hardware timers can be used, or a dedicated power management controller can be the other option.
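The tradeoff between leakage saved and energy spent entering and exiting a power-gated mode can be expressed as a break-even idle time. Gating only pays off if the block stays off longer than E_transition / P_leak_saved. The numbers below are hypothetical, and the model ignores wake-up latency constraints.

```python
def break_even_time_us(transition_energy_nj, leakage_saved_mw):
    """Idle time (us) after which the leakage energy saved equals the energy
    spent entering and exiting the power-gated mode.
    Units: nJ / mW = 1e-9 J / 1e-3 W = 1e-6 s = 1 us."""
    return transition_energy_nj / leakage_saved_mw

def worth_gating(idle_time_us, transition_energy_nj, leakage_saved_mw):
    """True if the block is expected to stay idle longer than the break-even time."""
    return idle_time_us > break_even_time_us(transition_energy_nj, leakage_saved_mw)

# Hypothetical block: 2 mW of leakage removed while gated, 50 nJ to save state,
# switch off, switch back on and restore
print(break_even_time_us(50, 2))                          # 25.0
print(worth_gating(10, 50, 2), worth_gating(100, 50, 2))  # False True
```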
An externally switched power supply is a very basic form of power gating to achieve long-term leakage power reduction. To shut off a block for short intervals, internal power gating is more suitable. The CMOS switches that supply power to the circuitry are controlled by power gating controllers. The outputs of a power-gated block discharge slowly, so the output voltage spends more time near the threshold voltage level. This can lead to larger short circuit current in the logic they drive.
Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep transistors. Inserting the sleep transistors splits the chip's power network into a permanent power network connected to the power supply and a virtual power network that drives the cells and can be turned off.
The quality of this complex power network is critical to the success of a power gating design. Two of the most critical parameters are the IR drop and the penalties in silicon area and routing resources.
Power gating can be implemented using cell-based or cluster-based (fine-grain) approaches, or a distributed coarse-grained approach.
Power gating parameters
Power gating implementation has additional considerations beyond normal timing closure. The following parameters need to be considered and their values carefully chosen for a successful implementation of this methodology [1][2].
Power gate size: The power gate size must be selected to handle the amount of switching current at any given time. The gate must be big enough that there is no measurable voltage (IR) drop across it. As a rule of thumb, we generally use 3X the switching capacitance for the gate size. Designers can also choose between header (PMOS) and footer (NMOS) gates. Footer gates tend to be smaller in area for the same switching current, because NMOS transistors have higher carrier mobility. Dynamic power analysis tools can accurately measure the switching current and also predict the size of the power gate.
Gate control slew rate: This is an important parameter in power gating that determines the power gating efficiency. When the slew rate is large, it takes more time to switch the circuit off and on, which can reduce the power gating efficiency. The slew rate is controlled by buffering the gate control signal.
Simultaneous switching capacitance: This important constraint refers to the amount of circuitry that can be switched simultaneously without affecting the power network integrity. If a large amount of circuitry is switched simultaneously, the resulting rush current can compromise the power network integrity. The circuit needs to be switched in stages to prevent this.
Power gate leakage: Since power gates are made of active transistors, their own leakage is an important consideration in maximizing power savings.
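The sizing goal of no measurable IR drop across the power gate gives a first-order width estimate. The switch network's on-resistance must satisfy I_peak × R_on ≤ ΔV_allowed, and R_on falls inversely with total switch width. The resistance-per-width value and the switch cell width below are hypothetical placeholders, not data from any real process. As noted above, real sizing relies on dynamic power analysis.

```python
import math

def min_switch_width_um(peak_current_ma, max_drop_mv, r_on_ohm_um):
    """Total sleep-transistor width (um) so that I_peak * R_on <= max_drop.

    r_on_ohm_um: on-resistance of a 1 um wide switch (ohm*um), so W um of
    switches in parallel give R_on = r_on_ohm_um / W.
    Allowed resistance = max_drop_mv / peak_current_ma (mV / mA = ohm),
    hence W = r_on_ohm_um * I / V.
    """
    return r_on_ohm_um * peak_current_ma / max_drop_mv

def switches_needed(total_width_um, width_per_switch_um):
    """Number of identical switch cells needed to provide the total width."""
    return math.ceil(total_width_um / width_per_switch_um)

# Hypothetical: 200 mA peak switching current, 30 mV allowed drop,
# 1500 ohm*um switch on-resistance, 2 um wide switch cells
width = min_switch_width_um(200, 30, 1500)
print(round(width), switches_needed(width, 2.0))  # 10000 5000
```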
Fine-grain power gating
Adding a sleep transistor to every cell that is to be turned off imposes a large area penalty, and individually gating the power of every cluster of cells creates timing issues, caused by inter-cluster voltage variation, that are difficult to resolve. Fine-grain power gating encapsulates the switching transistor as part of the standard cell logic. The switching transistors are designed either by the library IP vendor or by the standard cell designer. These cell designs usually conform to the normal standard cell rules and can easily be handled by EDA tools during implementation.
The gate control is sized for the worst case, assuming that the circuit will switch during every clock cycle, which results in a huge area impact. Some recent designs implement fine-grain power gating selectively, only for the low Vt cells. If the technology allows multiple Vt libraries, the use of low Vt devices in the design is minimal (20%), so the area impact can be reduced. When power gates are used on low Vt cells, the output must be isolated if the next stage is a high Vt cell. Otherwise, the neighboring high Vt cell can leak when the output goes to an unknown state due to power gating.
The gate control slew rate constraint is met by using a buffer distribution tree for the control signals. The buffers must be chosen from a set of always-on buffers (buffers without the gate control signal) designed with high Vt cells. The inherent difference between when one cell switches off and when another does minimizes the rush current during switch-on and switch-off.
The gating transistor is usually designed as a high Vt device. Coarse-grain power gating offers further flexibility by optimizing the power gating cells where there is low switching activity. Leakage optimization has to be done at the coarse-grain level, swapping the high leakage cell for a low leakage one. Fine-grain power gating is an elegant methodology that can deliver up to 10X leakage reduction. This makes it an appealing technique if the power reduction requirement is not satisfied by multiple Vt optimization alone.
Coarse-grain power gating
The coarse-grained approach implements grid-style sleep transistors, which drive cells locally through shared virtual power networks. This approach is less sensitive to PVT variation, introduces less IR-drop variation, and imposes a smaller area overhead than the cell-based or cluster-based implementations. In coarse-grain power gating, the power-gating transistor is part of the power distribution network rather than of the standard cell.
There are two ways of implementing a coarse-grain structure:
1) Ring-based
2) Column-based
Ring-based methodology: The power gates are placed as a ring around the perimeter of the module that is being switched off. Special corner cells are used to turn the power signals around the corners.
Column-based methodology: The power gates are inserted within the module, with the cells abutted to each other in the form of columns. The global power is in the higher metal layers, while the switched power is in the lower layers.
Gate sizing depends on the overall switching current of the module at any given time. Since only a fraction of the circuits switch at any point in time, the power gates are smaller than the fine-grain switches. Dynamic power simulation using worst-case vectors can determine the worst-case switching for the module and hence the size. IR drop can also be factored into the analysis. Simultaneous switching capacitance is a major consideration in coarse-grain power gating implementation. To limit simultaneous switching, the gate control buffers are daisy-chained, or special counters are used to selectively turn on blocks of switches.
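Staged turn-on can be sketched as splitting the switches into sequential groups so that no group draws more inrush current than the power network can tolerate. The price is a longer wake-up time. This is a simplified model with hypothetical values: each switch is assumed to contribute a fixed inrush current, and each stage adds a fixed delay. Integer microamps are used to avoid floating-point rounding in the division.

```python
def plan_stages(num_switches, inrush_per_switch_ua, max_inrush_ua, stage_delay_ns):
    """Split switches into sequential stages so the switches turned on together
    never draw more than max_inrush_ua. Returns (stage sizes, wake-up time in ns).
    If a single switch exceeds the limit, one switch per stage is used anyway."""
    per_stage = max(1, max_inrush_ua // inrush_per_switch_ua)
    stages = [per_stage] * (num_switches // per_stage)
    if num_switches % per_stage:
        stages.append(num_switches % per_stage)
    wake_time_ns = len(stages) * stage_delay_ns
    return stages, wake_time_ns

# Hypothetical: 5000 switches, 400 uA inrush each, 300 mA limit, 5 ns per stage
stages, wake = plan_stages(5000, 400, 300_000, 5)
print(len(stages), stages[0], stages[-1], wake)  # 7 750 500 35
```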
Isolation Cells
Isolation cells are used to prevent short circuit current. As the name indicates, these cells isolate the power-gated block from the always-on block. Isolation cells are specially designed for low short circuit current when the input is at the threshold voltage level. Isolation control signals are provided by the power gating controller. Isolating the signals of a switchable module is essential to preserve design integrity. Usually a simple OR or AND gate can function as an output isolation device. Multiple state retention schemes are used in practice to preserve the state before a module shuts down. The simplest technique is to scan the register values out into a memory before shutting down the module. When the module wakes up, the values are scanned back in from the memory.
Retention Registers
When power gating is used, the system needs some form of state retention, such as scanning data out to a RAM and scanning it back in when the system is reawakened. For critical applications, the memory states must be maintained within the cell, which requires a retention flop to store the bits. This makes it possible to restore the bits very quickly during wakeup. Retention registers are special low-leakage flip-flops used to hold the data of the main registers of the power-gated block. Thus the internal state of the block can be retained during power-down mode and loaded back when the block is reactivated. Retention registers are always powered up. The retention strategy is design dependent. During power gating, data can be retained and transferred back to the block when power gating is withdrawn. The power gating controller controls the retention mechanism, such as when to save the current contents of the power-gated block and when to restore them.
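The ordering that a power gating controller enforces among isolation, retention and the sleep switches can be sketched as a small model. The sequence shown (isolate, save, switch off; then switch on, restore, release isolation) is the commonly used order. The class, attribute and register names are hypothetical and do not correspond to any tool's or library's API.

```python
class PowerDomain:
    """Toy model of one switchable domain with isolation cells and retention registers."""

    def __init__(self):
        self.isolated = False
        self.powered = True
        self.retained = None                               # always-on retention latches
        self.registers = {"pc": 0x40, "status": 0b101}     # hypothetical main registers

    def power_down(self):
        self.isolated = True                   # 1. clamp outputs before they start to float
        self.retained = dict(self.registers)   # 2. save state into retention registers
        self.powered = False                   # 3. turn off the sleep transistors
        self.registers = {k: None for k in self.registers}  # main flops lose their state

    def power_up(self):
        self.powered = True                    # 1. turn switches back on (staged in practice)
        self.registers = dict(self.retained)   # 2. restore state from retention registers
        self.isolated = False                  # 3. release isolation once outputs are valid

dom = PowerDomain()
dom.power_down()
assert dom.isolated and not dom.powered and dom.registers["pc"] is None
dom.power_up()
print(dom.registers, dom.isolated)  # {'pc': 64, 'status': 5} False
```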
Multiple Threshold CMOS (MTCMOS) Circuits
James T. Kao et al. [2] showed that MTCMOS logic is an effective standby leakage control technique, but difficult to implement, since sleep transistor sizing depends heavily on the discharge pattern within the circuit block. They showed that dual-Vt domino logic avoids the sizing difficulties and the inherent performance penalty associated with MTCMOS. High Vt cells are used where leakage has to be prevented, whereas low Vt cells are used where speed is a concern. Both kinds of cells are used effectively in the MTCMOS technique.
MTCMOS technique [1]
In the active mode of operation, the high Vt transistors are turned on, and the logic gates consisting of low Vt transistors can operate with low switching power dissipation and small propagation delay. In standby mode, the high Vt transistors are turned off, thereby cutting off the internal low Vt circuitry.
Variable Threshold CMOS (VTCMOS)
One efficient way to reduce power consumption is to use a low supply voltage and a low threshold voltage without losing speed performance. But lowering the threshold voltage increases subthreshold leakage, and hence standby power consumption. Variable Threshold CMOS (VTCMOS) devices are one solution to this problem. In the VTCMOS technique, the threshold voltage of the low threshold devices is varied by applying a variable substrate bias voltage from control circuitry.
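How substrate bias moves the threshold voltage can be illustrated with the standard long-channel body-effect equation for an NMOS device, Vt = Vt0 + γ(√(2φF + VSB) − √(2φF)). A reverse body bias (VSB > 0) raises Vt, which is useful in standby to cut leakage. A forward bias lowers Vt for speed. The parameter values below are hypothetical, and short-channel effects are ignored.

```python
import math

def vt_with_body_bias(vt0, gamma, two_phi_f, vsb):
    """NMOS threshold with body effect: Vt = Vt0 + gamma*(sqrt(2phiF + Vsb) - sqrt(2phiF)).

    vsb > 0: reverse body bias (raises Vt, lower leakage, e.g. standby mode)
    vsb < 0: forward body bias (lowers Vt, faster); requires 2phiF + vsb > 0
    """
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))

# Hypothetical parameters: Vt0 = 0.30 V, gamma = 0.4 V^0.5, 2phiF = 0.8 V
for vsb in (0.0, 1.0, -0.3):
    print(vsb, round(vt_with_body_bias(0.30, 0.4, 0.8, vsb), 3))
# 0.0 0.3
# 1.0 0.479
# -0.3 0.225
```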
VTCMOS is a very effective technique for reducing power consumption, but it has some drawbacks related to manufacturing these devices. VTCMOS requires either twin-well or triple-well technology to apply different substrate bias voltages to different parts of the IC. The area overhead of the substrate bias control circuitry is negligible [1].
Voltage Scaling
Reducing the power supply voltage is an effective technique for reducing dynamic power, at the cost of speed. Keeping all other factors constant, if the supply voltage is scaled down, the propagation delay increases. This can be compensated by scaling down the threshold voltage to the same extent as the supply voltage, which allows the circuit to deliver the same speed performance at a lower Vdd. At the same time, smaller threshold voltages lead to smaller noise margins and increased leakage current.
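Both effects can be seen to first order using P_dyn = α·C·Vdd²·f (α = switching activity) and the alpha-power-law delay model t_d ∝ Vdd / (Vdd − Vt)^a. The exponent a ≈ 1.3 is an assumed value for a short-channel process, and all other parameter values are hypothetical. In this model, scaling Vt along with Vdd recovers most, though (with a > 1) not all, of the lost speed.

```python
def dynamic_power(activity, cap_f, vdd, freq_hz):
    """Switching power P = alpha * C * Vdd^2 * f (watts)."""
    return activity * cap_f * vdd ** 2 * freq_hz

def relative_delay(vdd, vt, a=1.3):
    """Alpha-power-law gate delay, up to a constant factor: Vdd / (Vdd - Vt)^a."""
    return vdd / (vdd - vt) ** a

base = relative_delay(1.0, 0.35)
low_vdd = relative_delay(0.8, 0.35)         # Vdd scaled to 0.8, Vt unchanged
low_vdd_low_vt = relative_delay(0.8, 0.28)  # Vt scaled by the same 0.8 factor

print(round(low_vdd / base, 2), round(low_vdd_low_vt / base, 2))  # 1.29 1.07
print(round(dynamic_power(0.1, 1e-9, 0.8, 1e9) / dynamic_power(0.1, 1e-9, 1.0, 1e9), 2))  # 0.64
```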
Dynamic Voltage and Frequency Scaling (DVFS)
We know that the supply voltage can be reduced if the operating frequency is reduced. Since dynamic power depends quadratically on the supply voltage and linearly on frequency, scaling both together gives an approximately cubic reduction in power consumption. However, note that reducing the frequency slows down the operation.
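The approximately cubic relation follows from the dynamic power equation, combined with the first-order assumption that the maximum frequency scales roughly linearly with supply voltage. That assumption breaks down near and below threshold. The energy per operation, by contrast, scales only quadratically:

```latex
P_{dyn} = \alpha C V_{dd}^{2} f, \qquad f \propto V_{dd}
\;\Longrightarrow\; P_{dyn} \propto V_{dd}^{3},
\qquad E_{op} = \frac{P_{dyn}}{f} = \alpha C V_{dd}^{2}

% Example: scaling V_dd (and f) to 0.8 of nominal
\frac{P'}{P} = 0.8^{3} = 0.512, \qquad \frac{E'_{op}}{E_{op}} = 0.8^{2} = 0.64
```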
The relation between energy and voltage mentioned above does not always hold. The authors in [1] showed that the quadratic relationship between energy and Vdd breaks down as Vdd is scaled down into the subthreshold region. Subthreshold leakage current increases exponentially with the supply voltage. Since the on-current in subthreshold operation is itself a subthreshold current, the delay increases exponentially as the voltage is scaled down. At very low voltages, dynamic power still decreases quadratically. But leakage energy increases as the supply voltage is reduced, since leakage energy is proportional to the circuit delay. Hence dynamic and leakage power become comparable in the subthreshold region.
According to Bo Zhai et al. [1], dynamic voltage and frequency scaling is a very popular low power technique. But larger voltage ranges do not necessarily improve power efficiency. They showed that at subthreshold supply voltages leakage energy becomes dominant, making just-in-time completion energy inefficient. They also showed that extending the voltage range below half Vdd improves energy efficiency for most processor designs, while extending it into subthreshold operation is beneficial only for specific applications. One of the important conclusions of their study is that DVFS in the subthreshold voltage range is never energy efficient.
Multi-Threshold (MVT) Voltage Technique
Multiple threshold voltage techniques use both low Vt and high Vt cells: lower threshold gates on the critical path and higher threshold gates off the critical path. This methodology improves performance without an increase in power. The flip side of this technique is that multi-Vt cells increase fabrication complexity. They also lengthen the design time. Improper optimization of the design may use more low Vt cells and could end up increasing power!
Ruchir Puri et al. [2] have discussed the design issues related to multiple supply voltages and multiple threshold voltages in the optimization of dynamic and static power. They noted several advantages of multi-Vt optimization. Multi-Vt optimization is a transformation that does not disturb placement: the footprint and area of low Vt and high Vt cells are the same as those of nominal Vt cells. This makes it easy to swap cells on time-critical paths to low Vt cells.
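The idea of starting from high Vt cells and swapping footprint-compatible cells to low Vt only where timing fails can be sketched as a greedy loop over a toy timing model. Each path is a list of cells, and each cell has HVT/LVT delay and leakage values. On the worst failing path, the cell with the best delay gain per unit of added leakage is swapped first. The cell data and this greedy criterion are assumptions made for illustration, not the algorithm from [2].

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    delay_hvt: float  # ps
    delay_lvt: float  # ps
    leak_hvt: float   # nW
    leak_lvt: float   # nW (assumed > leak_hvt)
    vt: str = "HVT"

    @property
    def delay(self):
        return self.delay_hvt if self.vt == "HVT" else self.delay_lvt

    @property
    def leakage(self):
        return self.leak_hvt if self.vt == "HVT" else self.leak_lvt

def path_delay(path):
    return sum(c.delay for c in path)

def swap_to_lvt(paths, clock_period):
    """Greedily swap HVT cells on failing paths to LVT until all paths meet timing."""
    while True:
        failing = [p for p in paths if path_delay(p) > clock_period]
        if not failing:
            return True
        worst = max(failing, key=path_delay)
        candidates = [c for c in worst if c.vt == "HVT"]
        if not candidates:
            return False  # no footprint-compatible swap left to fix this path
        best = max(candidates, key=lambda c: (c.delay_hvt - c.delay_lvt)
                                             / (c.leak_lvt - c.leak_hvt))
        best.vt = "LVT"  # same footprint, so placement is untouched

cells = {n: Cell(n, d_h, d_l, 1.0, 10.0) for n, d_h, d_l in
         [("u1", 60, 40), ("u2", 50, 45), ("u3", 80, 50), ("u4", 30, 25)]}
paths = [[cells["u1"], cells["u2"], cells["u3"]], [cells["u4"], cells["u2"]]]

met = swap_to_lvt(paths, clock_period=170)
print(met, [c.name for c in cells.values() if c.vt == "LVT"])  # True ['u3']
print(sum(c.leakage for c in cells.values()))                  # 13.0
```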
Frank Sill et al. [3] have proposed a new method for assigning devices with different Vth in a dual-Vth process. They developed mixed-Vth gates and showed a leakage reduction of 25%. They created a library of LVT, mixed-Vt, HVT and multi-Vt cells and compared simulation results with an LVT version of each design. Leakage power dissipation decreased by an average of 65% with the mixed-Vth technique compared to the LVT implementation.
Meeta Srivatsav et al. [4] explored various ways of reducing leakage power and recommended the multi-Vt approach. They carried out the analysis in 130 nm and 90 nm technology, synthesizing the design with different combinations of target libraries: low Vt cells only; high Vt cells only; high Vt cells with incremental compile using the low Vt library; nominal (regular) Vt cells; and multi-Vt targeting HVT and LVT in one go. With only low Vt cells, the highest leakage power of 469 µW was obtained. With only high Vt cells, leakage power consumption was minimal, but timing was not met (slack of -1.13). With nominal Vt cells, a moderate leakage power of 263 µW was obtained. The best result (54 µW with timing met) was obtained by synthesizing against the HVT library followed by incremental compile using the LVT library.
The different low leakage synthesis flows that Xiaodong Zhang [1] carried out using Synopsys EDA
tools are listed below:
Low Vt → Multi Vt flow: This produces the least cell count and the least dynamic power, but the
highest leakage power. It has a very low runtime. It is good for designs with very tight timing
constraints.
Multi Vt one-pass flow: It has the longest runtime and can be used in most designs.
High Vt → Multi Vt flow: This produces the least leakage power consumption but has a high cell count and
dynamic power. This methodology is good for leakage-power-critical designs.
High Vt → Multi Vt with different timing constraints flow: This is a well-balanced flow and
produces the second least leakage power. It has a smaller cell count, area and dynamic power and a
shorter runtime. This flow is also good for most designs.
Optimization Strategies
Trading off the different Vt cells to achieve optimal performance is especially beneficial
during synthesis (technology gate mapping) and placement optimization. The logic synthesis, or gate
mapping, phase of the optimization process is implemented by the synthesis tool, and placement
optimization is handled by the physical implementation tool.
Synthesis
During logic synthesis, the design is mapped to technology gates. At this point in the process, optimal
logic architectures are selected, mapped to technology cells, and optimized for specific design goals.
Since a range of Vt libraries is now available and choices have to be made across architectures with
different Vt cells, logic synthesis is the ideal place to start deploying a mix of different Vt cells into
the design.
Single-Pass vs. Two-Pass Synthesis with Multiple Threshold Libraries
Multiple libraries are currently available with different performance, area and power
characteristics, and synthesis optimization can be achieved using either one or more libraries
concurrently. In a single-pass flow, multiple libraries can be loaded into the synthesis tool prior to
synthesis optimization. In a two-pass flow, the design is initially optimized using one library, and
then an incremental optimization is carried out using additional libraries.
About multi-Vt optimization, Ruchir Puri [2] says in his paper: The multi-threshold optimization
algorithm implemented in physical synthesis is capable of optimizing several Vt levels at the same
time. Initially, the design is optimized using the higher threshold voltage library only. Then, the
Multi-Vt optimization computes the power-performance trade-off curve up to the maximum
allowable leakage power limit for the next lower threshold voltage library. Subsequently, the
optimization starts from the most critical slack end of this power-performance curve and switches the
most critical gate to the next equivalent lower-Vt version. This will increase the leakage in the design
beyond the maximum permissible leakage power. To compensate for this, the algorithm picks the
least critical gate from the other end of the power-performance curve and substitutes it back with its
higher-Vt version. If this does not bring the leakage power below the allowed limit, it traverses
further along the curve (from least critical towards more critical), substituting gates with higher-Vt
gates, until the leakage limit is satisfied. Then we jump back to the second most critical cell and
switch it to the lower-Vt version. This iteration continues until we can no longer switch any gate to
the lower-Vt version without violating the leakage power limit.
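The swap-and-compensate loop described above can be sketched in Python as follows. This is an illustrative reading of the description, not Puri's implementation:
- Each gate is reduced to a single slack value (lower means more critical) and two hypothetical leakage numbers.
- `initial_lvt` stands in for whatever low-Vt assignment the initial trade-off curve produced.
- A gate whose swap cannot be compensated is simply left at high Vt, and the loop moves on.
- A real tool would re-time the design after every swap instead of relying on a fixed slack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    slack: float     # timing slack; smaller = more critical (hypothetical ns)
    leak_hvt: float  # leakage of the higher-Vt version (hypothetical units)
    leak_lvt: float  # leakage of the lower-Vt version (>= leak_hvt)


def total_leakage(gates, lvt):
    """Total leakage when the gates named in `lvt` use their low-Vt version."""
    return sum(g.leak_lvt if g.name in lvt else g.leak_hvt for g in gates)


def multi_vt_swap(gates, leak_limit, initial_lvt=()):
    """Greedy sketch of the swap/compensate loop under a leakage budget."""
    order = sorted(gates, key=lambda g: g.slack)  # most critical first
    lvt = set(initial_lvt)
    for i, gate in enumerate(order):
        if gate.name in lvt:
            continue
        lvt.add(gate.name)  # switch the critical gate to low Vt
        reverted = []
        # Compensate from the least-critical end, only touching gates that
        # are less critical than the one just switched.
        for cand in reversed(order[i + 1:]):
            if total_leakage(gates, lvt) <= leak_limit:
                break
            if cand.name in lvt:
                lvt.remove(cand.name)
                reverted.append(cand.name)
        if total_leakage(gates, lvt) > leak_limit:
            # Budget cannot be met: undo this swap and its compensation.
            lvt.remove(gate.name)
            lvt.update(reverted)
    return lvt
```

The effect of the compensation step is that the leakage budget ends up spent on the most critical gates first, rather than on whichever gates happened to be swapped initially.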
But Amit Agarwal et al. [5] have warned about possible yield loss in dual-Vt flows. They
showed that in the nanoscale regime, conventional dual-Vt designs suffer from yield loss due to process
variation and vastly overestimate leakage savings, since they do not take junction BTBT (Band-To-Band
Tunneling) leakage into account. Their analysis showed the importance of device-based
analysis when designing low power schemes like dual Vt. Their research also showed that in
scaled technologies, statistical information on both leakage and delay helps minimize total leakage
while ensuring yield with respect to the target delay in dual-Vt designs. However, because the present way
of realizing high Vt does not scale, future technologies will require different process options, such as metal gate
work-function engineering.
Multi Vdd (Voltage)
Dynamic power is proportional to the square of the supply voltage. Hence reducing the supply voltage
significantly reduces power. At the same time, gate delay increases because of the reduced
supply voltage. A high voltage can therefore be applied to the timing-critical paths while the rest of the
chip runs at a lower voltage, so overall system performance is maintained. Different blocks having
different voltage supplies can be integrated in an SoC. This increases power planning complexity in
terms of laying down the power rails and power grid structure. Level shifters are necessary to
interface between the different blocks.
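The voltage/delay trade-off can be made concrete with two textbook first-order models. Switching power is P = αCV²f. Gate delay is sketched here with the alpha-power law, an assumed model whose threshold (0.35 V) and exponent (1.3) are illustrative values, not data from any real library. The supply rails and load values in the example are also hypothetical.

```python
def dynamic_power(activity, cap_f, vdd, freq_hz):
    """First-order switching power in watts: P = activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd ** 2 * freq_hz


def relative_delay(vdd, vt=0.35, a=1.3):
    """Alpha-power-law gate delay up to a constant factor: Vdd / (Vdd - Vt)^a.

    vt = 0.35 V and a = 1.3 are illustrative assumptions, not library data.
    """
    if vdd <= vt:
        raise ValueError("this model needs Vdd > Vt")
    return vdd / (vdd - vt) ** a


if __name__ == "__main__":
    high, low = 1.2, 1.0  # hypothetical supply rails (V)
    power_ratio = dynamic_power(0.1, 1e-12, low, 500e6) / dynamic_power(0.1, 1e-12, high, 500e6)
    delay_ratio = relative_delay(low) / relative_delay(high)
    print(f"low rail: power x{power_ratio:.2f}, delay x{delay_ratio:.2f}")
```

With these assumed numbers, moving a block from 1.2 V to 1.0 V cuts its switching power to about 69% but slows its gates by roughly 18%. That is why only logic with timing margin is moved to the lower rail.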
Multiple Voltage ASIC/SoC Design: Classification
Multi-voltage design strategies can be broadly classified as follows [1]:
Static Voltage Scaling (SVS): Different but fixed voltages are applied to different blocks or
subsystems of the SoC design.
Multi-level Voltage Scaling (MVS): A block or subsystem of the ASIC or SoC design is
switched between two or more voltage levels, but only a limited number of discrete voltage levels
is supported for the different operating modes.
Dynamic Voltage and Frequency Scaling (DVFS): Both voltage and frequency are dynamically
varied according to the different working modes of the design so as to achieve power efficiency. When
high-speed operation is required, the voltage is raised to attain the higher speed of operation, at the
penalty of increased power consumption.
Adaptive Voltage Scaling (AVS): Here the voltage is controlled using a control loop. This is an
extension of DVFS.
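Here is a minimal sketch of the DVFS idea, assuming a small table of hypothetical voltage/frequency operating points. Software picks the lowest-voltage point that still meets the workload's required clock rate, and the relative dynamic power follows from P ∝ V²f. The table values are invented for illustration. AVS would instead trim the voltage in a closed loop from on-chip feedback.

```python
# Hypothetical operating points: (supply voltage in V, maximum clock in MHz).
OPERATING_POINTS = [(0.8, 200), (0.9, 400), (1.0, 600), (1.1, 800)]


def pick_operating_point(required_mhz, points=OPERATING_POINTS):
    """Return the lowest-voltage (Vdd, Fmax) point that meets the demand."""
    for vdd, fmax in sorted(points):
        if fmax >= required_mhz:
            return vdd, fmax
    raise ValueError("workload exceeds the fastest operating point")


def relative_power(point, ref=(1.1, 800)):
    """Dynamic power of `point` relative to `ref`, using P ~ V^2 * f."""
    (vdd, mhz), (ref_v, ref_f) = point, ref
    return (vdd / ref_v) ** 2 * (mhz / ref_f)


if __name__ == "__main__":
    for demand in (150, 300, 700):
        opp = pick_operating_point(demand)
        print(demand, "MHz ->", opp, f"{relative_power(opp):.2f}x of full-speed power")
```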
Multi-Voltage Design Challenges
Level Shifters
Signals crossing from one voltage domain to another have to be interfaced through
level shifter buffers, which shift the signal levels appropriately. Designing a suitable level shifter is a
challenging job.
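Here is a minimal sketch of the bookkeeping behind level shifter insertion. Given a hypothetical map of domains to supply voltages and a list of cross-domain nets, it flags each net whose driver and receiver run at different voltages and records the shifting direction. The domain names, voltages and nets are invented for illustration.

```python
# Hypothetical power-domain map: domain name -> supply voltage (V).
DOMAIN_VDD = {"core": 0.9, "cpu_fast": 1.1, "always_on": 0.9}


def level_shifters_needed(nets, domain_vdd=DOMAIN_VDD):
    """List the nets that need a level shifter and in which direction.

    `nets` holds (net_name, driver_domain, receiver_domain) tuples. A net
    needs a shifter only when the two domains run at different voltages.
    """
    needed = []
    for net, src, dst in nets:
        v_src, v_dst = domain_vdd[src], domain_vdd[dst]
        if v_src < v_dst:
            needed.append((net, "low-to-high"))
        elif v_src > v_dst:
            needed.append((net, "high-to-low"))
    return needed


if __name__ == "__main__":
    nets = [("n1", "core", "cpu_fast"), ("n2", "cpu_fast", "core"), ("n3", "core", "always_on")]
    print(level_shifters_needed(nets))
```

Net `n3` crosses domains but needs no shifter, because both of its domains are assumed to run at 0.9 V.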
Timing Analysis
Timing analysis of a design is simpler with a single voltage, as it can be performed
for a single performance point based on the characterized libraries. Tools can optimize the design for
worst-case PVT (Process, Voltage, Temperature) conditions. This is not the case with multi-voltage
designs. Libraries must be characterized for each voltage level used in the design.
The EDA tool has to optimize individual blocks or subsystems as well as multiple voltage domains. This
analysis becomes complex for larger ASICs/SoCs.
Floorplanning and Power Planning
Multiple power domains demand multiple power grid structures and a suitable power distribution
among them. For a larger ASIC/SoC, more careful floorplanning and power planning is essential. The
speed at which the different power domains switch on or off is also important: a low voltage power
domain may activate earlier than the high voltage domain. Multi-voltage designs also pose
additional board-level complexities, since separate power supplies may be necessary to provide the different
power levels.
Multi-Voltage Designs: Timing Issues
Clock
Clock Tree Synthesis (CTS) tools should be aware of the different power domains and understand
level shifters so that they can insert them in appropriate places. The clock tree is routed through level shifters to reach
different power domains. Simultaneous timing analysis and optimization is necessary for multiple
voltage domains. Thus CTS becomes more complex in multi-voltage designs.
Timing Issues with Multi-Voltage Designs
Static Timing Analysis (STA)
Timing analysis for a single-voltage design is easy. With static voltage scaling it becomes
a little tougher, as analysis has to be carried out for different voltages. This methodology requires
libraries characterized for each of the voltages used. Multi-level and dynamic voltage scaling
pose a greater challenge. Constraints are specified for each supply voltage level or operating point.
There can be different operating modes for different voltages, and constraints need not be the same for all
modes and voltages; the performance target for each mode can vary. The EDA tool should be capable of
handling all these situations simultaneously to carry out timing analysis, because the different constraints at
different modes and voltages all have to be satisfied.
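Here is a minimal sketch of the bookkeeping the paragraph asks of the EDA tool. Every combination of operating mode and supply voltage is a separate scenario with its own constraint, and the design signs off only when all of them pass at once. The scenario names, periods and path delays are hypothetical. A real STA tool derives the delays from libraries characterized at each voltage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    mode: str         # operating mode name (hypothetical)
    vdd: float        # supply voltage of the mode (V)
    period_ns: float  # clock period constraint in this mode


def worst_slacks(path_delays, scenarios):
    """Worst (smallest) slack per scenario: period minus the slowest path delay.

    path_delays maps each scenario to the path delays (ns) a timer reported
    with libraries characterized at that scenario's voltage.
    """
    return {s: s.period_ns - max(path_delays[s]) for s in scenarios}


def all_scenarios_met(path_delays, scenarios):
    """True only if every mode/voltage combination meets its own constraint."""
    return all(slack >= 0 for slack in worst_slacks(path_delays, scenarios).values())


if __name__ == "__main__":
    turbo = Scenario("turbo", 1.1, 1.25)
    eco = Scenario("eco", 0.9, 2.5)
    delays = {turbo: [1.0, 1.2], eco: [2.1, 2.6]}  # invented timer results
    print(worst_slacks(delays, [turbo, eco]), all_scenarios_met(delays, [turbo, eco]))
```

In this invented example the high-voltage mode passes while the low-voltage mode fails, even though its period is twice as long. This is why each mode/voltage pair must be checked on its own.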
Multi-Voltage Designs: Power Planning Issues
Efficient power planning is one of the key concerns of modern SoC designs. In multi-voltage designs,
providing power to the different power domains is challenging. Every power domain requires an
independent local power supply and grid structure, and some designs may even have a separate
power pad. A separate power pad is possible in flip-chip designs, where the power pad can be placed
close to its power domain. Other chips have to bring the power pads out from the periphery, which can
limit the number of power domains.
Local on-chip voltage regulation is a good way to provide multiple voltages to different circuits.
Unfortunately, most digital CMOS technologies are not suitable for implementing
either switched-mode or linear voltage regulators. A separate power rail structure is
required for each power domain. These additional power rails introduce different levels of IR drop,
putting a limit on the achievable power efficiency.
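To make the IR-drop point concrete, here is a minimal sketch assuming a single rail fed from one end with identical, equally spaced loads. All values are hypothetical. The worst-case drop at the far end is R·I·N(N+1)/2, and the same absolute drop is a larger fraction of a lower domain voltage. That is one reason each domain's rails need their own analysis.

```python
def rail_ir_drop(n_taps, tap_current_a, seg_res_ohm):
    """IR drop (V) at each of n_taps equally spaced loads on a rail fed from one end.

    Segment j (counted from the pad) carries the current of every tap beyond it,
    i.e. (n_taps - j) * I, so the drop accumulates toward the far end.
    """
    drops, v = [], 0.0
    for j in range(n_taps):
        v += (n_taps - j) * tap_current_a * seg_res_ohm
        drops.append(v)
    return drops


if __name__ == "__main__":
    # Hypothetical rails: same load and resistance, different domain voltages.
    worst = rail_ir_drop(n_taps=10, tap_current_a=1e-3, seg_res_ohm=0.1)[-1]
    for vdd in (1.1, 0.9):
        print(f"Vdd {vdd} V: worst drop {worst * 1e3:.1f} mV = {100 * worst / vdd:.2f}% of supply")
```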
Low Power Design Techniques
Michael Keating et al. [1] list several low power techniques to tackle dynamic and static power
consumption in modern SoC designs. Dynamic power control techniques include clock gating, multi-voltage,
variable frequency, and efficient circuits. Leakage power control techniques include power
gating and multi-Vt cells. Common methods supported by EDA tools include clock gating, gate sizing,
low power placement, register clustering, low power CTS and multi-Vt optimization.
Some of the low power techniques in use today are listed in the table below.
Different Low Power Techniques [3]
Trade-offs associated with the various power management techniques [2]
The table above summarizes the trade-offs associated with the different power management techniques. Power
gating and DVFS demand a large methodology change, whereas multi-Vt and clock gating affect it the least.
Unless a large leakage reduction is necessary, it is always beneficial to go with either multi-Vt or
clock gating. Depending on the design complexity and requirements, any combination of these low
power techniques can be adopted. Multi-Vt optimization along with power gating has been found to be
efficient in some complex designs. Advances in implementation (i.e.
fabrication) technology have allowed substrate biasing techniques to be used heavily, as they do not
pose any architectural or design verification challenges and also provide high leakage reduction.
Variable Threshold CMOS (VTCMOS) Circuits
One of the efficient methods to reduce power consumption is to use a low
supply voltage and a low
threshold voltage without losing speed performance. But lowering the
threshold voltage leads to increased subthreshold
leakage and hence more standby power consumption.
Variable Threshold CMOS (VTCMOS) devices are one solution to this
problem. In the VTCMOS technique, the threshold voltage of the low threshold
devices is varied by applying a variable substrate bias voltage from
control circuitry.
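The mechanism VTCMOS relies on is the MOSFET body effect. The minimal Python sketch below uses the standard long-channel body-effect equation. Its parameter values (Vt0, γ, 2φF) are illustrative, not taken from any process. Raising the reverse substrate bias in standby raises Vt and so cuts subthreshold leakage. Zero or slight forward bias in active mode keeps Vt low for speed.

```python
import math


def threshold_voltage(v_sb, vt0=0.35, gamma=0.4, two_phi_f=0.7):
    """NMOS threshold voltage with body effect (V).

    Vt = Vt0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F)).
    vt0, gamma (V^0.5) and two_phi_f are illustrative assumptions.
    Positive v_sb is reverse body bias; negative is forward bias.
    """
    if two_phi_f + v_sb <= 0:
        raise ValueError("forward bias too large for this model")
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


if __name__ == "__main__":
    for v_sb in (-0.2, 0.0, 0.5, 1.0):
        print(f"V_SB = {v_sb:+.1f} V -> Vt = {threshold_voltage(v_sb):.3f} V")
```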
VTCMOS is a very effective technique for reducing power
consumption, but it has some drawbacks related to manufacturing these
devices. VTCMOS requires either twin-well or triple-well technology to
achieve different substrate bias voltage levels in different parts of
the IC. The area overhead of the substrate bias control
Multiple Threshold (Multi Vt) Cell Libraries
With technologies shrinking to 90 nm, 30 nm and below, one of the
common ways to reduce leakage power is to use multiple Vt libraries.
Subthreshold leakage varies exponentially with Vt, whereas delay
depends on Vt much more weakly.
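The contrast between the exponential leakage dependence and the weaker delay dependence can be checked numerically. The sketch below uses the textbook subthreshold model I ∝ exp(−Vt / (n·kT/q)) and the alpha-power delay law. It assumes a slope factor n = 1.5, α = 1.3, Vdd = 1.0 V, and hypothetical thresholds of 0.25 V (LVT) and 0.40 V (HVT).

```python
import math

THERMAL_VOLTAGE = 0.0259  # kT/q at roughly 300 K, in volts


def relative_leakage(vt, vt_ref, n=1.5):
    """Subthreshold leakage of a device relative to a reference device.

    I_sub ~ exp(-Vt / (n * kT/q)); n = 1.5 is an assumed slope factor.
    """
    return math.exp(-(vt - vt_ref) / (n * THERMAL_VOLTAGE))


def relative_delay(vt, vt_ref, vdd=1.0, a=1.3):
    """Gate delay relative to the reference, alpha-power law: ~1 / (Vdd - Vt)^a."""
    return ((vdd - vt_ref) / (vdd - vt)) ** a


if __name__ == "__main__":
    lvt, hvt = 0.25, 0.40  # hypothetical low and high thresholds (V)
    print(f"HVT leakage: {relative_leakage(hvt, lvt):.3f}x of LVT")
    print(f"HVT delay:   {relative_delay(hvt, lvt):.2f}x of LVT")
```

With these assumed parameters, the 150 mV higher threshold cuts leakage to about 2% of the LVT value while making the gate only about a third slower. This asymmetry is what multi-Vt libraries exploit.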
Libraries are offered in different versions, consisting of
standard Vt cells, low Vt cells and high Vt cells, independent of each
other. Power and timing are optimized based on these libraries, which
give the logic and physical synthesis tools good flexibility and
opportunity during the optimization process.
A dual-Vt synthesis flow has become quite common in 130 nm and smaller
technology nodes. In this flow, initial synthesis is carried out
targeting a primary library, which may be a low Vt, high Vt or normal
Vt library. The second iteration of synthesis and optimization is then
performed using secondary libraries, which also consist of
cells with different threshold voltages.
Which library should be used as the primary library?
This depends on the optimization target set by the design requirements.
In general, if the optimization target is power, first
synthesize the design using the high Vt cell library, which achieves the
lowest leakage power. In the next iteration of optimization, cells in
the critical path are replaced by low Vt cells, which are faster.
If the optimization target is meeting timing, first use the low Vt
cell library to achieve timing and then optimize leakage power using
high Vt cells.
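The timing-first variant (low Vt first, then high Vt) can be sketched as a greedy leakage-recovery pass. The path slacks, per-cell delay penalties and leakage savings are hypothetical inputs. A real synthesis tool re-times the netlist after each swap instead of subtracting fixed penalties. The power-first variant is the mirror image: start from high Vt and swap cells on violating paths to low Vt.

```python
def recover_leakage(paths, penalty, saving):
    """Second pass of a timing-first dual-Vt flow (greedy sketch).

    paths:   path name -> (slack in ns, list of cell names) for the all-LVT netlist
    penalty: cell -> extra delay (ns) if the cell is swapped to HVT
    saving:  cell -> leakage saved by that swap
    Returns the set of cells swapped to HVT without making any path negative.
    """
    slack = {p: s for p, (s, _) in paths.items()}
    through = {}  # cell -> names of the paths passing through it
    for p, (_, cells) in paths.items():
        for c in cells:
            through.setdefault(c, []).append(p)

    hvt = set()
    # Try the cells with the biggest leakage saving first.
    for c in sorted(saving, key=saving.get, reverse=True):
        if all(slack[p] - penalty[c] >= 0 for p in through.get(c, [])):
            for p in through.get(c, []):
                slack[p] -= penalty[c]  # this path now absorbs the slower cell
            hvt.add(c)
    return hvt
```

Tracking slack per path, rather than per cell, means two swaps cannot both use up the same path's margin.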
Low Power Techniques: Clock Gating
Clock buffers can consume more than 50% of dynamic power. Hence it is
a good design idea to turn off the clock when it is not needed. Automatic
clock gating is supported by modern EDA tools, which identify the
circuits where clock gating can be inserted.
Specific clock gating cells are required in the library for the
synthesis tools to use. The availability of clock gating cells and automatic
insertion by the EDA tools make this a simple low power
technique. The advantage of this method is that clock gating does not
require modifications to the RTL description.
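Libraries provide dedicated clock gating cells rather than relying on a plain AND gate. To show why, here is a sampled-waveform sketch of the common latch-based integrated clock-gating (ICG) cell, assuming a positive-edge design. The clock and enable traces are hypothetical.

```python
def icg_trace(clk, enable):
    """Sampled model of a latch-based integrated clock-gating (ICG) cell.

    The enable latch is transparent while clk is low and holds while clk is
    high, so gated_clk = clk AND latched_enable never produces a partial pulse.
    """
    latched, gated = 0, []
    for c, en in zip(clk, enable):
        if c == 0:  # transparent phase: follow enable
            latched = en
        gated.append(c & latched)
    return gated


def and_gate_trace(clk, enable):
    """Naive gating with a plain AND gate, for comparison."""
    return [c & en for c, en in zip(clk, enable)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]      # two clock periods, 4 samples each
    enable = [1, 1, 1, 0, 0, 0, 0, 0]   # enable drops in the middle of a high phase
    print("ICG:", icg_trace(clk, enable))       # full first pulse, then clock off
    print("AND:", and_gate_trace(clk, enable))  # first pulse cut short (glitch)
```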