
4/3/2016

Introduction to Pipelining

Latency vs. throughput
Latency
Each instruction takes a certain time to complete.
This is the latency for that operation.
It's the amount of time between when the instruction is issued and when it completes.

Throughput
The number of instructions that complete in a span of time.
This is not necessarily the same as dividing the time span by the latency if pipelining is used.
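
As a rough illustration of why throughput is not simply the time span divided by the latency, the sketch below counts completed instructions in a fixed window. The 40 ns latency, the 5 equal stages, and the 1000 ns window are assumed numbers chosen for illustration, the function names are hypothetical, and stalls are ignored.

```python
def completed_unpipelined(span_ns, latency_ns):
    """Instructions finished when each must complete before the next starts."""
    return span_ns // latency_ns


def completed_pipelined(span_ns, latency_ns, stages):
    """Instructions finished when a new one starts every stage time.

    Assumes equal-length stages and no stalls (an idealization).
    """
    stage_ns = latency_ns // stages
    if span_ns < latency_ns:
        return 0
    # The first instruction finishes after one full latency; after that,
    # one more instruction finishes every stage time.
    return 1 + (span_ns - latency_ns) // stage_ns


span, latency, stages = 1000, 40, 5  # assumed example values
print(completed_unpipelined(span, latency))        # 25
print(completed_pipelined(span, latency, stages))  # 121
```

The latency of each instruction is still 40 ns in both cases; only the number completed per window changes.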
Pipelining
Definition
Pipelining is the ability to overlap execution of different instructions at the same time.
It exploits parallelism among instructions and is NOT visible to the programmer.

This is similar to building a car on an assembly line.
While it may take two hours to build a single car, there are hundreds of cars in progress at any time.
http://eceresearch.unm.edu/jimp/611/slides/chap3_1.html


The throughput of the assembly line is the # of cars completed per hour.
The throughput of a CPU pipeline is the # of instructions completed per second.

Pipeline stages
Each step in a pipeline is called a pipe stage.
In our assembly line example, a stage corresponds to a workstation on the assembly line.
Pipelining
Cycle time
Everything in a CPU moves in lockstep, synchronized by the clock (the "heartbeat" of the CPU).

A machine cycle: the time required to complete a single pipeline stage.
A machine cycle is usually one, sometimes two, clock cycles long, but rarely more.

In machines with no pipelining:
The machine cycle must be long enough to complete a single instruction
Or each instruction must be divided into smaller chunks (multiple clock cycles per instruction).


Pipeline cycle time
All pipeline stages must, by design, take the same time.
Thus, the machine cycle time is that of the longest pipeline stage.

Ideally, all stages should be exactly the same length.
Pipelining
Pipeline speedup
The ideal speedup from a pipeline is equal to the number of stages in the pipeline.

However, this only happens if the pipeline stages are all of equal length.
Splitting a 40 ns operation into 5 stages, each 8 ns long, will result in a 5x speedup.
Splitting the same operation into 5 stages, 4 of which are 7.5 ns long and one of which is 10 ns long, will result in only a 4x speedup.
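
The two speedup figures above follow from dividing the unpipelined time by the longest stage time. A minimal sketch, assuming no latch overhead and ignoring pipeline fill time (the function name is hypothetical):

```python
def pipeline_speedup(stage_times_ns):
    """Ideal steady-state speedup for the given per-stage latencies."""
    total = sum(stage_times_ns)  # unpipelined time per instruction
    cycle = max(stage_times_ns)  # the slowest stage sets the machine cycle
    return total / cycle


print(pipeline_speedup([8, 8, 8, 8, 8]))            # 5.0
print(pipeline_speedup([7.5, 7.5, 7.5, 7.5, 10]))   # 4.0
```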

If your starting point is a multiple clock cycle per instruction machine, then pipelining decreases CPI.
If your starting point is a single clock cycle per instruction machine, then pipelining decreases cycle time.
We will focus on the first starting point in our analysis.
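
Both starting points can be compared through the standard relation CPU time = instruction count x CPI x cycle time. The sketch below uses assumed numbers (5 stages of 8 ns, 1000 instructions) and a hypothetical helper name, and it ignores pipeline fill time and stalls:

```python
def cpu_time_ns(instructions, cpi, cycle_ns):
    """CPU time = instruction count * cycles per instruction * cycle time."""
    return instructions * cpi * cycle_ns


n = 1000  # assumed instruction count

multi_cycle = cpu_time_ns(n, cpi=5, cycle_ns=8)    # 5 short cycles per instruction
single_cycle = cpu_time_ns(n, cpi=1, cycle_ns=40)  # 1 long cycle per instruction
pipelined = cpu_time_ns(n, cpi=1, cycle_ns=8)      # ideal pipeline

print(multi_cycle / pipelined)   # 5.0, gained by lowering CPI from 5 to 1
print(single_cycle / pipelined)  # 5.0, gained by shortening the cycle from 40 ns to 8 ns
```

Under these assumptions the ideal gain is the same either way; only the term of the product that shrinks differs.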
Simple DLX operation (without pipelining)
Each DLX instruction has five phases.
Thus, each instruction requires five cycles to execute (CPI = 5).

Instruction fetch (IF)
Get the next instruction.
Instruction decode & register fetch (ID)
Decode the instruction and get the registers from the register file.
Execution/effective address calculation (EX)
Perform the operation.
For loads and stores, calculate the memory address (base + immed).
For branches, compare and calculate the branch destination.
Memory access/branch completion (MEM)
For loads and stores, perform the memory access.
For taken branches, update the program counter.
Write back (WB)
Write the result to the register file.
For stores and branches, do nothing.
Simple DLX operation (without pipelining)
Datapath for the unpipelined version:


Red boxes are temporary storage locations.
Simple DLX operation (without pipelining)
The temporary storage locations were added to the datapath of the unpipelined machine to make it easy to pipeline.

Note that branch and store instructions take 4 clock cycles.
Assuming a branch frequency of 12% and a store frequency of 5%, CPI is 4.83.

This implementation is not optimal. Improvements include:
Completing ALU instructions during the MEM cycle (drops CPI to 4.35 assuming 47% ALU operation frequency).


Other improvements to CPI are possible but are likely to increase the clock cycle time.
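
The CPI figures for the unpipelined machine are weighted averages over the instruction mix. A minimal sketch of that arithmetic, using the frequencies quoted in these slides (the function and dictionary names are hypothetical). Note that with these rounded percentages the second result comes out as 4.36 rather than the quoted 4.35; the small gap presumably comes from rounding in the quoted frequencies.

```python
def average_cpi(mix):
    """mix maps an instruction class to (frequency in percent, cycles)."""
    assert sum(freq for freq, _ in mix.values()) == 100
    return sum(freq * cycles for freq, cycles in mix.values()) / 100


# Branches and stores take 4 cycles; everything else takes 5.
baseline = {
    "branch": (12, 4),
    "store": (5, 4),
    "other": (83, 5),
}

# ALU instructions also finish in 4 cycles (completed during MEM).
alu_early = {
    "branch": (12, 4),
    "store": (5, 4),
    "alu": (47, 4),
    "other": (36, 5),
}

print(average_cpi(baseline))   # 4.83
print(average_cpi(alu_early))  # 4.36
```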

Also, several hardware redundancies exist:
ALU can be shared.
Data and instruction memory can be combined since access occurs on different clock cycles.
Pipelining DLX
Since there are five separate stages, we can have a pipeline in which one instruction is in each stage.

This will decrease CPI to 1, since one instruction will be issued (or finish) each cycle.

During any cycle, one instruction is present in each stage.

Ideally, performance is increased fivefold!
However, this is rarely achievable, as we will see.
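
To make the overlap concrete, the sketch below lays out which stage each instruction occupies in each cycle and computes the ideal speedup over a machine that spends five cycles on every instruction. It assumes no stalls or hazards, and the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; column c holds the stage in cycle c."""
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


def ideal_speedup(num_instructions):
    """Unpipelined cycles (5 per instruction) over pipelined cycles."""
    unpipelined = len(STAGES) * num_instructions
    pipelined = len(STAGES) + num_instructions - 1
    return unpipelined / pipelined


for i, row in enumerate(pipeline_diagram(4)):
    print(f"i{i}: " + " ".join(f"{cell:>3}" for cell in row))

print(ideal_speedup(4))     # 2.5
print(ideal_speedup(1000))  # approaches 5 as the instruction count grows
```

For short instruction sequences the fill time of the pipeline keeps the speedup well below five even in this idealized model.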
