Introduction to Pipelining
Latency vs. throughput
Latency
Each instruction takes a certain time to complete.
This is the latency for that operation.
It's the amount of time between when the instruction is issued and when it completes.
Throughput
The number of instructions that complete in a span of time.
This is not necessarily the same as dividing the time span by the latency if pipelining is used.
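To make the distinction concrete, here is a minimal sketch that counts how many instructions complete in a fixed time span. The function name, the 40 ns latency, the 8 ns issue interval, and the 1000 ns span are all illustrative assumptions, and the model ignores stalls of any kind.

```python
def instructions_completed(time_span_ns, latency_ns, issue_interval_ns):
    """Count instructions finished within time_span_ns.

    Assumes (for illustration only) that a new instruction is issued every
    issue_interval_ns and that each one finishes latency_ns after it is issued.
    """
    if time_span_ns < latency_ns:
        return 0
    # The first instruction finishes at latency_ns; one more finishes
    # every issue_interval_ns after that.
    return 1 + (time_span_ns - latency_ns) // issue_interval_ns


LATENCY_NS = 40   # hypothetical latency of one instruction
SPAN_NS = 1000    # hypothetical observation window

# Without pipelining, the next instruction starts only when the previous ends.
unpipelined = instructions_completed(SPAN_NS, LATENCY_NS, LATENCY_NS)

# With pipelining (hypothetical 8 ns issue interval), latency is unchanged
# but instructions overlap.
pipelined = instructions_completed(SPAN_NS, LATENCY_NS, 8)

print(unpipelined, pipelined)  # 25 121
```

In the unpipelined case the count equals the time span divided by the latency (1000 / 40 = 25). In the overlapped case the latency of each instruction is still 40 ns, yet far more instructions complete in the same window.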
Pipelining
Definition
Pipelining is the ability to overlap execution of different instructions at the same time.
It exploits parallelism among instructions and is NOT visible to the programmer.
This is similar to building a car on an assembly line.
While it may take two hours to build a single car, there are hundreds of cars in progress at any time.
http://eceresearch.unm.edu/jimp/611/slides/chap3_1.html
4/3/2016
The throughput of the assembly line is the number of cars completed per hour.
The throughput of a CPU pipeline is the number of instructions completed per second.
Pipeline stages
Each step in a pipeline is called a pipe stage.
In our assembly line example, a stage corresponds to a workstation on the assembly line.
Pipelining
Cycle time
Everything in a CPU moves in lockstep, synchronized by the clock (the "heartbeat" of the CPU).
A machine cycle is the time required to complete a single pipeline stage.
A machine cycle is usually one, sometimes two, clock cycles long, but rarely more.
In machines with no pipelining:
The machine cycle must be long enough to complete a single instruction,
or each instruction must be divided into smaller chunks (multiple clock cycles per instruction).
Pipeline cycle time
All pipeline stages must, by design, take the same time.
Thus, the machine cycle time is that of the longest pipeline stage.
Ideally, all stages should be exactly the same length.
Pipelining
Pipeline speedup
The ideal speedup from a pipeline is equal to the number of stages in the pipeline.
However, this only happens if the pipeline stages are all of equal length.
Splitting a 40 ns operation into 5 stages, each 8 ns long, will result in a 5x speedup (40 / 8 = 5).
Splitting the same operation into 5 stages, 4 of which are 7.5 ns long and one of which is 10 ns long, will result in only a 4x speedup, because the cycle time is set by the 10 ns stage (40 / 10 = 4).
If your starting point is a machine with multiple clock cycles per instruction, then pipelining decreases CPI.
If your starting point is a machine with a single clock cycle per instruction, then pipelining decreases cycle time.
We will focus on the first starting point in our analysis.
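The two speedup figures on the pipeline speedup slide can be checked with a short calculation. This is a minimal sketch under stated assumptions: the unpipelined time is taken to be the sum of the stage times, the pipelined cycle time is the longest stage, and latch overhead and stalls are ignored. The function name is hypothetical.

```python
def pipeline_speedup(stage_times_ns):
    """Steady-state speedup of a pipeline over the unpipelined operation.

    Assumptions (illustrative): the unpipelined operation takes the sum of
    the stage times, and the pipeline completes one operation per cycle,
    where the cycle time equals the longest stage.
    """
    unpipelined_time = sum(stage_times_ns)
    cycle_time = max(stage_times_ns)
    return unpipelined_time / cycle_time


# Five balanced 8 ns stages: the ideal case.
print(pipeline_speedup([8, 8, 8, 8, 8]))           # 5.0

# Four 7.5 ns stages and one 10 ns stage: the slow stage sets the clock.
print(pipeline_speedup([7.5, 7.5, 7.5, 7.5, 10]))  # 4.0
```

Both cases split the same 40 ns of work, so the difference comes entirely from the cycle time imposed by the longest stage.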
Simple DLX operation (without pipelining)
Each DLX instruction has five phases.
Thus, each instruction requires five cycles to execute (CPI = 5).
Instruction fetch (IF)
Get the next instruction.
Instruction decode & register fetch (ID)
Decode the instruction and get the registers from the register file.
Execution / effective address calculation (EX)
Perform the operation.
For loads and stores, calculate the memory address (base + immed).
For branches, compare and calculate the branch destination.
Memory access / branch completion (MEM)
For loads and stores, perform the memory access.
For taken branches, update the program counter.
Write back (WB)
Write the result to the register file.
For stores and branches, do nothing.
Simple DLX operation (without pipelining)
Datapath for the unpipelined version (figure not included).
In the figure, red boxes are temporary storage locations.
Simple DLX operation (without pipelining)
The temporary storage locations were added to the datapath of the unpipelined machine to make it easy to pipeline.
Note that branch and store instructions take 4 clock cycles.
Assuming a branch frequency of 12% and a store frequency of 5%, CPI is 4.83.
This implementation is not optimal. Improvements include:
Completing ALU instructions during the MEM cycle (drops CPI to about 4.36, i.e. 4.83 - 0.47, assuming a 47% ALU operation frequency).
Other improvements to CPI are possible but are likely to increase the clock cycle time.
Also, several hardware redundancies exist:
The ALU can be shared.
Data and instruction memory can be combined, since access occurs on different clock cycles.
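The CPI figures for the unpipelined DLX are weighted averages over the instruction mix. The sketch below recomputes them from the frequencies given on the slides (12% branches, 5% stores, 47% ALU operations). The function name and the grouping of the remaining instructions into an "other" class are assumptions made for illustration.

```python
def average_cpi(mix):
    """Weighted-average CPI for a list of (frequency, cycles) pairs."""
    total = sum(freq for freq, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix)


# Baseline: branches and stores take 4 cycles, everything else takes 5.
baseline = [
    (0.12, 4),  # branches
    (0.05, 4),  # stores
    (0.83, 5),  # all other instructions (assumed grouping)
]

# Improvement: ALU instructions also complete in 4 cycles.
alu_in_mem = [
    (0.12, 4),  # branches
    (0.05, 4),  # stores
    (0.47, 4),  # ALU operations
    (0.36, 5),  # remaining instructions (assumed grouping)
]

print(f"{average_cpi(baseline):.2f}")    # 4.83
print(f"{average_cpi(alu_in_mem):.2f}")  # 4.36
```

The exact weighted average for the improved case is 4.36. Figures that were rounded at an intermediate step can differ in the last digit.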
Pipelining DLX
Since there are five separate stages, we can have a pipeline in which one instruction is in each stage.
This will decrease CPI to 1, since one instruction will be issued (or finish) each cycle.
During any cycle, one instruction is present in each stage.
Ideally, performance is increased fivefold!
However, this is rarely achievable, as we will see.
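The overlap can be visualized with a small sketch that prints which stage each instruction occupies in each cycle. It assumes an ideal pipeline with no stalls or hazards, which is exactly the case that is rarely achieved in practice. The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_instructions):
    """Cycles needed to run the instructions through an ideal 5-stage pipeline."""
    return num_instructions + len(STAGES) - 1


def pipeline_diagram(num_instructions):
    """Return one row per instruction; each row lists its stage in every cycle.

    Assumes one instruction is issued per cycle and nothing ever stalls.
    """
    rows = []
    for instr in range(num_instructions):
        row = []
        for cycle in range(total_cycles(num_instructions)):
            stage_index = cycle - instr  # instruction `instr` enters IF in cycle `instr`
            in_pipe = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_pipe else "")
        rows.append(row)
    return rows


for instr, row in enumerate(pipeline_diagram(5)):
    print(f"i{instr}: " + " ".join(f"{stage:>3}" for stage in row))

# Five instructions need 5 * 5 = 25 cycles unpipelined, but only 9 here.
print(total_cycles(5))  # 9
```

In the fifth column of the printed diagram every stage is occupied by a different instruction. Because filling the pipeline costs four extra cycles, the speedup over the 5-cycle unpipelined machine approaches five only as the number of instructions grows.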