UNIT - II
Bus Operations
Types of bus cycles:
Single transfer cycle
Burst transfer cycle
Interrupt acknowledge cycle
Inquire cycle etc.
Some of the signals are used to indicate the type of bus cycle.
M/IO# - memory/input-output - output pin.
If high, a memory cycle is in progress; if low, an I/O cycle.
Single-Transfer Cycle
Burst Cycles
The cache uses burst cycles.
A new 8-byte chunk can be transferred every clock cycle.
The processor supplies the starting address of the first group of 8 bytes at
the beginning of the cycle.
The next groups of 8 bytes are transferred according to the burst order.
Burst transfer order:
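The burst order can be summarised in code. A minimal sketch, assuming the commonly documented Pentium pattern: each subsequent 8-byte transfer address is the starting address XORed with 0x08, 0x10, and 0x18 within the 32-byte line (`burst_order` is an illustrative helper name):

```python
def burst_order(start):
    """Return the four 8-byte-aligned addresses of a 32-byte burst.

    The Pentium bursts a 32-byte line in four transfers; the remaining
    transfer addresses are the first address XORed with 0x08, 0x10,
    and 0x18 (all addresses are 8-byte aligned within the line).
    """
    base = start & ~0x1F           # 32-byte line boundary
    first = start & 0x18           # 8-byte-aligned offset of first transfer
    return [base | (first ^ x) for x in (0x00, 0x08, 0x10, 0x18)]

# e.g. a burst whose first transfer is at offset 0x10 of its line
print([hex(a) for a in burst_order(0x110)])  # ['0x110', '0x118', '0x100', '0x108']
```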
T12: This state indicates there are two outstanding bus cycles.
The processor is starting the second bus cycle at the same time
that data is being transferred for the first.
In T12, the processor drives
BRDY# - first cycle
ADS# - second cycle
T2P: This state indicates there are two outstanding bus cycles,
both in their second and subsequent clocks.
The processor performs the same job as in T12.
TD : Dead state
This state is used to insert a dead state between two consecutive
cycles (read followed by write or vice versa) in order to give
the system bus time to change states.
Bus State Flow: Functional Description
No Request Pending
The processor starts a new bus cycle & ADS# is asserted in the T1 state.
If no cycle is pending when the processor finishes the current cycle, or NA# is
not asserted, the processor goes back to the idle state.
When the processor finishes the current cycle, and no dead clock is needed, it goes to
the T2 state.
When the processor finishes the current cycle and a dead clock is needed, it goes to the
TD state.
If the current cycle is not yet complete, the processor moves to T2P to continue the
data transfer.
The processor stays in T2P until the first cycle's transfer is over.
When the processor finishes the first cycle and no dead clock is needed, it goes to the T2 state.
When the first cycle is complete and a dead clock is needed, it goes to the TD state.
Shutdown:
If the Pentium detects an internal parity error, it runs a shutdown cycle.
Execution is suspended in shutdown until the processor
receives an NMI, INIT, or RESET request.
The cache is unchanged.
Write-through: writing results to both the cache and main memory is called
write-through.
Cache hit: data is found in the cache. This results in a data transfer at maximum speed.
Cache miss: data is not found in the cache. The processor loads the data from memory
and copies it into the cache. This results in an extra delay, called the miss
penalty.
Cache Line : Cache is partitioned into lines (also called blocks). During
data transfer, a whole line is read or written.
Each line has a tag that indicates the address in Memory from which the line
has been copied
Types of Cache
1. Fully Associative
2. Direct Mapped
3. Set Associative
Sequential Access :
Start at the beginning and read through in order
Access time depends on location of data and previous location
Example: tape
Direct Access :
Individual blocks have unique address
Access is by jumping to vicinity then performing a sequential search
Access time depends on location of data within "block" and previous
location
Example: hard disk
Random access:
Each location has a unique address
Access time is independent of location or previous access
e.g. RAM
Associative access :
Data is retrieved based on a portion of its contents rather than its
address
Access time is independent of location or previous access
e.g. cache
Performance
Transfer Rate : Rate at which data can be moved
TN = TA + N/R
where
TN = Average time to read or write N bits
TA = Average access time
N = Number of bits
R = Transfer rate, in bits per second(bps)
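The formula can be exercised with a quick calculation (the device numbers below are invented for illustration):

```python
def avg_rw_time(t_a, n, r):
    """TN = TA + N/R: average time to read or write N bits,
    given average access time t_a and transfer rate r (bits/s)."""
    return t_a + n / r

# Hypothetical device: 10 us access time, 1 Mbit/s transfer rate,
# reading one 32-byte (256-bit) cache line.
t = avg_rw_time(10e-6, 256, 1e6)
print(f"{t * 1e6:.0f} us")  # 266 us
```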
Direct-Mapped Cache
Also called a one-way set-associative cache.
Memory is divided into cache pages; the page size and the cache size are equal.
Line 0 of any page maps to line 0 of the cache: each memory line is mapped
directly to an equivalent cache line.
Set-Associative Cache
Set-associative mapping is a compromise between the other two: a combination
of the fully associative and direct-mapped caching schemes.
The cache is divided into equal sections called cache ways.
The bigger the way, the better the performance, but the more complex and
expensive the cache.
Fully associative mapping with a 27-bit tag and 5-bit offset:
01111101011101110001101100111000
Compare all tag fields for the value 011111010111011100011011001.
If a match is found, return byte 11000 (24 decimal) of the line.
Two-way set-associative mapping with a 21-bit tag, 6-bit index, and 5-bit
offset (21 + 6 + 5 = 32):
01111101011101110001101100111000
Compare the tag fields of lines 0110010 and 0110011 for the value
011111010111011100011.
If a match is found, return byte 11000 (24 decimal) of that line.
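The field extraction in the example above can be sketched in Python, assuming a 21-bit tag, 6-bit index, and 5-bit byte offset (the three fields must total 32 bits):

```python
def split_address(addr, index_bits=6, offset_bits=5):
    """Split a 32-bit address into (tag, set index, byte offset).

    For a two-way set-associative cache with 32-byte lines and 64 sets,
    the fields are a 21-bit tag, 6-bit index, and 5-bit offset.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

addr = 0b01111101011101110001101100111000
tag, index, offset = split_address(addr)
print(f"tag={tag:021b} index={index:06b} offset={offset:05b}")
```

Running this on the example address yields the tag and offset values used in the text.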
Split-line Access
It permits the upper half of one line and the lower half of the next to be fetched
from the code cache in one clock cycle.
When a split line is read, the information is not correctly aligned.
Multiprocessor System
When multiple processors are used in a single system, there needs to be a
mechanism whereby all processors agree on the contents of shared cache
information.
For example, two or more processors may use data from the same memory
location, X.
Each processor may change the value of X; which value of X should then be
considered correct?
If each processor changes the value of the data item, we have different
(incoherent) values of X in each cache.
Clean data: the data in the cache and the data in main memory
are the same.
Dirty data: the data has been modified in the cache but not in
main memory.
Stale data: the data has been modified in main memory but not
in the cache.
Out-of-date main memory data: the data has been modified in the cache
but not in main memory; the copy in main memory is
called out-of-date.
Cache Coherency
The Pentium's mechanism is called the MESI
(Modified/Exclusive/Shared/Invalid) protocol.
This protocol uses two bits stored with each line of data to keep track of the
state of cache line.
The four states are defined as follows:
Modified:
The current line has been modified (does not match with main memory)
and is only available in a single cache.
Exclusive:
The current line has not been modified (matches with main memory)
and is only available in a single cache.
Writing to this line changes its state to modified
Shared:
Copies of the current line may exist in more than one cache.
A write to this line causes a write through to main memory and may
invalidate the copies in the other cache.
Invalid:
The current line is empty.
A read from this line will generate a miss.
Only the shared and invalid states are used in code cache.
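A minimal sketch of the state transitions described above; bus snooping and line fills are omitted, and the state kept after a shared-line write-through is a simplification:

```python
# Simplified MESI transitions for one data-cache line, following the
# rules above: writing an Exclusive line makes it Modified; writing a
# Shared line writes through to memory; a read of an Invalid line misses.
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def on_read(state):
    """A read of an Invalid line generates a miss; otherwise it hits."""
    return "miss" if state == I else "hit"

def on_write(state):
    """Apply the write rules from the text and return the new state."""
    if state == E:
        return M          # Exclusive -> Modified on write
    if state == M:
        return M          # already modified, stays modified
    if state == S:
        return S          # write-through; other copies may be invalidated
                          # (resulting state simplified in this sketch)
    raise ValueError("write to an Invalid line: fill the line first")

print(on_write(E), on_read(I))  # Modified miss
```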
Snoop: when a cache watches the address lines for transactions, this is
called a snoop.
This function allows the cache to see whether any transactions are
accessing memory it contains within itself.
Snarf: when a cache takes information from the data lines, the cache is
said to have snarfed the data.
This function allows the cache to be updated and to maintain consistency.
LRU Algorithm
One or more bits are added to each cache entry to support the LRU algorithm:
one LRU bit and two valid bits for each pair of lines.
If either of the two lines is invalid, that line is replaced with the newly
referenced data.
If both lines are valid, the LRU line is replaced by the new one.
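The replacement rule can be sketched as a small helper for one two-line set (`choose_victim` is a hypothetical name):

```python
def choose_victim(valid0, valid1, lru_bit):
    """Pick which of the two lines in a set to replace.

    An invalid line is always preferred; if both lines are valid,
    the LRU bit selects the least recently used one (0 -> line 0,
    1 -> line 1, an illustrative encoding).
    """
    if not valid0:
        return 0
    if not valid1:
        return 1
    return 0 if lru_bit == 0 else 1

print(choose_victim(False, True, 1))  # 0: invalid line replaced first
print(choose_victim(True, True, 0))   # 0: both valid, follow LRU bit
```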
Byte enables indicate the type of bus cycle. Here BE4 is low and all other BEs
are high:
BE7 BE6 BE5 BE4 BE3 BE2 BE1 BE0
 1   1   1   0   1   1   1   1
Cache instructions:
INVD - invalidate cache
Effectively erases all the information in the data cache (by marking
it all invalid).
The INVD instruction should be used with care: it does not
write back modified cache lines.
A flush cycle is driven after the INVD and WBINVD instructions are
executed.
BE7 BE6 BE5 BE4 BE3 BE2 BE1 BE0
 1   1   1   1   1   1   0   1
A write-back cycle is generated, followed by the flush cycle.
PF : prefetch
D1 : Instruction decode
D2 : Address Generation
EX : Execute -ALU and Cache Access
WB : Write Back
Clock:  1    2    3    4    5    6    7    8
PF:     I1   I2   I3   I4
D1:          I1   I2   I3   I4
D2:               I1   I2   I3   I4
EX:                    I1   I2   I3   I4
WB:                         I1   I2   I3   I4
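The flow of I1..I4 through the five stages above can be reproduced with a small simulation (no stalls assumed; `occupant` is an illustrative helper):

```python
# Reproduce the flow of I1..I4 through the five pipeline stages,
# assuming one instruction enters PF per clock and no stalls occur.
STAGES = ["PF", "D1", "D2", "EX", "WB"]
INSTRS = ["I1", "I2", "I3", "I4"]

def occupant(clock, stage):
    """Instruction occupying `stage` at `clock` (both 0-based), or None.
    With no stalls, instruction i reaches stage s at clock i + s."""
    i = clock - stage
    return INSTRS[i] if 0 <= i < len(INSTRS) else None

# One row per clock; the pipeline drains after 4 + 5 - 1 = 8 clocks.
for clock in range(len(INSTRS) + len(STAGES) - 1):
    cells = [occupant(clock, s) or "--" for s in range(len(STAGES))]
    print(f"clk {clock + 1}: " + "  ".join(f"{st}:{c}" for st, c in zip(STAGES, cells)))
```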
Instructions are fed into the PF stage from the cache or memory.
D1 stage - determines whether the current pair of instructions can execute together.
D2 stage - addresses for operands that reside in memory are calculated.
EX stage - operands are read from the data cache or memory;
ALU operations are performed;
branch predictions for instructions are verified (except conditional
branches).
Branch Prediction
Branch Prediction Strategies :
Static
The actions for a branch are fixed for each branch during the entire
execution. The actions are fixed at compile time.
Decided before runtime
Based on the object code
Dynamic
The decision causing the branch prediction can dynamically change
during the program execution.
Based on the execution history.
Prediction decisions may change during the execution of the
program
BHT (Branch History Table)
The table is indexed by the instruction fetch address (IFA). Each entry holds a
2-bit history, initialised when the branch is first taken. The four states are:
11 - strongly taken
10 - weakly taken
01 - weakly not taken
00 - strongly not taken
In states 11 and 10 the prediction is "taken". Each actually-taken (AT) outcome
moves the counter toward 11; each actually-not-taken (ANT) outcome moves it
toward 00.
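The four states above map directly onto a 2-bit saturating counter; a minimal sketch (the class name is illustrative):

```python
class TwoBitPredictor:
    """2-bit saturating counter: 11 strongly taken, 10 weakly taken,
    01 weakly not taken, 00 strongly not taken."""

    def __init__(self, state=0b11):      # initialised to "taken" when the
        self.state = state               # branch is first taken

    def predict(self):
        return self.state >= 0b10        # high bit set -> predict taken

    def update(self, taken):
        if taken:                        # AT: move toward strongly taken
            self.state = min(self.state + 1, 0b11)
        else:                            # ANT: move toward strongly not taken
            self.state = max(self.state - 1, 0b00)

p = TwoBitPredictor()
p.update(False)                          # one not-taken: now weakly taken
print(p.predict())                       # True - still predicts taken
```

Two consecutive not-taken outcomes are needed before the prediction flips, which is what makes the 2-bit scheme tolerant of a single mispredicted loop exit.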
When the BTB predicts taken, another buffer is activated to prefetch
instructions from the target address.
Functional Block Diagram of Pentium
A floating-point number has three fields: a sign bit, an exponent, and a mantissa.
The term floating point is derived from the fact that there is no fixed
number of digits before and after the decimal point; that is, the decimal
point can float.
There are also representations in which the number of digits before and
after the decimal point is fixed, called fixed-point representations.
FPU pipeline stages:
PF - prefetch
D1 - instruction decode
D2 - address generation
EX - execute
X1 - floating-point execute stage 1
(the remaining FPU stages are X2 - floating-point execute stage 2,
WF - write float, and ER - error reporting)
The U pipeline makes up the first five stages of the FPU pipeline; the first
instruction of a pair uses the U pipeline and the second uses the V pipeline.