
MEMORY ORGANIZATION

TYPES OF MEMORY

[Figure: taxonomy of memory types — RAM (e.g., SRAM, DRAM) and ROM]

MEMORY: BASIC CONCEPTS
Stores a large number of bits
m x n: m words of n bits each
k = log2(m) address input signals, or m = 2^k words
e.g., 4,096 x 8 memory:
32,768 bits
12 address input signals
8 input/output data signals
Memory access:
r/w: selects read or write
enable: read or write occurs only when asserted
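The sizing relationships above can be checked with a short sketch (the function name is illustrative, not from the slides):

```python
import math

def memory_params(words, width):
    """Return (address_bits, total_bits) for an m x n memory: k = log2(m)."""
    address_bits = int(math.log2(words))
    return address_bits, words * width

# The slide's 4,096 x 8 example: 12 address input signals, 32,768 bits
k, total = memory_params(4096, 8)
print(k, total)  # 12 32768
```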

Nonvolatile memory
Can be read from, but not written to, by a processor in an embedded system
Traditionally written to (programmed) before inserting into the embedded system
[Figure: external view of a 2^k x n ROM with enable input]
Uses:
Store software program for a general-purpose processor
program instructions can be one or more ROM words
Example: 8 x 4 ROM

[Figure: external view of a 2^k x n memory — m words of n bits, address inputs A0..Ak-1, data lines Qn-1..Q0, r/w and enable signals]
Terms
Traditional ROM/RAM distinctions:
ROM: read only, bits stored without power
RAM: read and write, lose stored bits without power
Traditional distinctions blurred:
Advanced ROMs can be written to, e.g., EEPROM
Advanced RAMs can hold bits without power, e.g., NVRAM
Write ability: manner and speed a memory can be written
Storage permanence: ability of memory to hold stored bits after they are written
ROM: Read-Only Memory

Horizontal lines = words
Vertical lines = data
Lines connected only at circles
Decoder sets word 2's line to 1 if address input is 010
Data lines Q3 and Q1 are set to 1 because there is a programmed connection with word 2's line
Word 2 is not connected with data lines Q2 and Q0
Output is 1010
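The read just described can be mimicked in a few lines (a sketch; the contents table is assumed from the slide's single programmed word):

```python
# Hypothetical 8 x 4 ROM contents: the slide only shows word 2 programmed as 1010
ROM_CONTENTS = {2: 0b1010}

def rom_read(address, enable=True):
    """Decoder activates one word line; data lines read 1 only where connections exist."""
    if not enable:
        return None            # outputs inactive when enable is deasserted
    return ROM_CONTENTS.get(address, 0b0000)

print(format(rom_read(0b010), "04b"))  # 1010 -> Q3=1, Q2=0, Q1=1, Q0=0
```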

[Figure: internal view of an 8 x 4 ROM — 3 x 8 decoder with enable driving word lines 0-7, programmable connections to data lines Q3..Q0]

Mask-programmed ROM
Connections programmed at fabrication
set of masks
Lowest write ability
only once
Highest storage permanence
bits never change unless damaged
Typically used for final design of high-volume systems
spread out NRE cost for a low unit cost
OTP ROM: One-time programmable ROM
Connections programmed after manufacture by user
user provides file of desired contents of ROM
file input to machine called ROM programmer
each programmable connection is a fuse
ROM programmer blows fuses where connections should not exist
Very low write ability
typically written only once and requires ROM programmer device
Very high storage permanence
bits don't change unless reconnected to programmer and more fuses blown
Commonly used in final products
cheaper, harder to inadvertently modify
EPROM: Erasable programmable ROM
Programmable component is a MOS transistor
Transistor has floating gate surrounded by an insulator
(a) Negative charges form a channel between source and drain, storing a logic 1
(b) Large positive voltage (+15 V) at gate causes negative charges to move out of channel and get trapped in floating gate, storing a logic 0
(c) (Erase) Shining UV rays on surface of floating gate for 5-30 min causes negative charges to return to channel from floating gate, restoring the logic 1
(d) An EPROM package showing quartz window through which UV light can pass
[Figure: (a)-(c) floating-gate transistor cross-sections showing source, drain, floating gate, and 0 V / +15 V gate voltages; (d) EPROM package]
Better write ability
can be erased and reprogrammed thousands of times
Reduced storage permanence
program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development
EEPROM: Electrically erasable programmable ROM
Programmed and erased electronically
typically by using higher than normal voltage
can program and erase individual words
Better write ability
can be in-system programmable with built-in circuit to provide higher than normal voltage
built-in memory controller commonly used to hide details from memory user
writes very slow due to erasing and programming
busy pin indicates to processor EEPROM still writing
can be erased and programmed tens of thousands of times
Similar storage permanence to EPROM (about 10 years)
Far more convenient than EPROMs, but more expensive
Flash Memory
Extension of EEPROM
Same floating gate principle
Same write ability and storage permanence
Fast erase
Large blocks of memory erased at once, rather than one word at a time
Blocks typically several thousand bytes large
Writes to single words may be slower
Used with embedded systems storing large data items in nonvolatile memory
e.g., digital cameras, TV set-top boxes, cell phones
RAM: Random-access memory
Typically volatile memory
bits are not held without power supply
Read and written to easily by embedded system during execution
Internal structure more complex than ROM
a word consists of several memory cells, each storing 1 bit
[Figure: external view of a 2^k x n read-and-write memory — address inputs A0..Ak-1, data lines Qn-1..Q0, r/w and enable signals]
each input and output data line connects to each cell in its column
rd/wr connected to every cell
when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read
[Figure: internal view of a 4 x 4 RAM — 2 x 4 decoder with enable, address inputs A0-A1, input lines I3-I0, data lines Q3-Q0, rd/wr routed to every memory cell]
Basic types of RAM
SRAM: Static RAM
Memory cell uses flip-flop to store bit
Requires 6 transistors
Holds data as long as power supplied
Fast
[Figure: SRAM memory cell internals — cross-coupled cell with Data' and Data lines]
DRAM: Dynamic RAM
Memory cell uses MOS transistor and capacitor to store bit
More compact than SRAM
Refresh required due to capacitor leak
word's cells refreshed when read
Typical refresh rate 15.625 microsec.
Slower to access than SRAM
[Figure: DRAM memory cell internals — word line W, Data line, transistor, and capacitor]
RAM variations
PSRAM: Pseudo-static RAM
DRAM with built-in memory refresh controller
Popular low-cost high-density alternative to SRAM
NVRAM: Nonvolatile RAM
Holds data after external power removed
Battery-backed RAM
SRAM with own permanently connected battery
writes as fast as reads
no limit on number of writes, unlike nonvolatile ROM-based memory
SRAM with EEPROM or flash
stores complete RAM contents on EEPROM or flash before power turned off
Composing memory
Memory size needed often differs from size of readily available memories
When available memory is larger, simply ignore unneeded high-order address bits and unneeded data lines
When available memory is smaller, compose several smaller memories into one larger memory
Connect side-by-side to increase width of words
Connect top to bottom to increase number of words
added high-order address lines select the smaller memory containing the desired word, using a decoder
Combine techniques to increase number and width of words
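These composition rules can be sanity-checked with a small helper (a sketch; the function name is mine, not from the slides):

```python
def chips_needed(target_words, target_width, chip_words, chip_width):
    """Chips required to build a target_words x target_width memory from smaller chips."""
    rows = target_words // chip_words   # stacked top-to-bottom; extra high-order address bits feed a decoder
    cols = target_width // chip_width   # placed side-by-side to widen the word
    return rows * cols

# Using 1K x 8 ROMs as building blocks:
print(chips_needed(1024, 32, 1024, 8))  # 4 -> a 1K x 32 ROM needs 4 chips side-by-side
print(chips_needed(8192, 8, 1024, 8))   # 8 -> an 8K x 8 ROM needs 8 chips stacked
```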
Compose 1K x 8 ROMs into a 1K x 32 ROM
Compose 1K x 8 ROMs into an 8K x 8 ROM
Briefly define each of the following: mask-programmed ROM, PROM, EPROM, EEPROM, flash EEPROM, RAM, SRAM, DRAM, PSRAM, and NVRAM.
Sketch the internal design of a 4 x 3 ROM.
Sketch the internal design of a 4 x 3 RAM.
Cache Memory
Cache is usually designed using static RAM rather than dynamic RAM, which is one reason that cache is more expensive but faster than main memory.
Cache usually appears on the same chip as a processor, where space is very limited, so cache size is typically only a fraction of the size of main memory.
Cache access time may be as low as just one clock cycle, whereas main memory access time is typically several cycles.
Cache is partitioned into lines (also called blocks). Each line has 4-64 bytes in it. During data transfer, a whole line is read or written.
Parameters of Cache Memory
Cache Hit
A referenced item is found in the cache by the processor
Cache Miss
A referenced item is not present in the cache
Hit ratio
Ratio of number of hits to total number of references = number of hits / (number of hits + number of misses)
Miss penalty
Additional cycles required to serve the miss
Some assumptions are made while designing the memory control circuitry:
The CPU does not need to know explicitly about the existence of the cache.
The CPU simply makes Read and Write requests. The nature of these two operations is the same whether cache is present or not.
The address generated by the CPU always refers to a location of main memory.
The memory access control circuitry determines whether or not the requested word currently exists in the cache.
Cache Operation
Cache mapping
Direct mapping
Fully-associative mapping
Set-associative mapping
Cache replacement policy
Random replacement policy
Least-recently used (LRU)
First-in-first-out (FIFO) replacement policy
Cache write techniques
Write-through technique
Write-back technique

Cache mapping is the method for assigning main memory addresses to the far fewer available cache addresses, and for determining whether the contents of a particular main memory address are in the cache.
Direct mapping:
A particular block of main memory can be brought only to a particular block of cache memory. So, it is not flexible.
Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.
Block-set-associative mapping:
In this method, blocks of cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. From the flexibility point of view, it is in between the other two methods.
All three mapping methods are explained with the help of an example.
Consider a cache of 4096 (4K) words with a block size of 32 words.
How many blocks are there in the cache?
A 4K cache requires 12 address bits.
Since the cache is divided into blocks:
Calculate the number of address lines required to select a block
Calculate the number of address lines required to select one word within a block
Therefore, the cache is organized as 128 blocks.
For 4K words, 12 address bits are required.
To select one of the 128 blocks, we need 7 address bits, and to select one word out of 32 words, we need 5 address bits.
So the total 12 bits of address are divided into two groups: the lower 5 bits are used to select a word within a block, and the higher 7 bits are used to select a block of cache memory.
Main Memory
Let us consider a main memory system consisting of 64K words.
The size of the address bus is 16 bits.
Since the block size of cache is 32 words, the main memory is also organized with a block size of 32 words.
Calculate the total number of blocks in the main memory, the bits required to address the blocks, and the bits required to select a word within a block.
Therefore, the total number of blocks in main memory is 2048 (2K x 32 words = 64K words). To identify any one block of the 2K blocks, we need 11 address lines.
Out of the 16 address lines of main memory, the lower 5 bits are used to select a word within a block and the higher 11 bits are used to select a block out of 2048 blocks.
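The 11 + 5 split of the 16-bit main-memory address can be expressed directly (a sketch; the function name is mine):

```python
WORD_BITS = 5   # 32-word blocks

def split_main_address(addr):
    """Split a 16-bit main-memory address into (block number, word within block)."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)

# Address 33 = 0b0000000000100001 -> block 1, word 1
print(split_main_address(33))  # (1, 1)
```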
The number of blocks in cache memory is 128 and the number of blocks in main memory is 2048, so at any instant of time only 128 blocks out of 2048 can reside in cache memory. Therefore, we need a mapping function to put a particular block of main memory into an appropriate block of cache memory.
Direct Mapping Technique:
The simplest way of associating main memory blocks with cache blocks is the direct mapping technique.
In this technique, block k of main memory maps into block k modulo m of the cache, where m is the total number of blocks in cache.
In this example, the value of m is 128. In the direct mapping technique, one particular block of main memory can be transferred only to the particular block of cache derived by the modulo function.
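The modulo rule in one line (a sketch of the mapping, not of any particular hardware):

```python
M = 128   # total number of cache blocks in this example

def cache_block_for(k):
    """Main-memory block k maps to cache block k mod m under direct mapping."""
    return k % M

# Blocks 5, 133, and 261 all contend for the same cache block
print(cache_block_for(5), cache_block_for(133), cache_block_for(261))  # 5 5 5
```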
Since more than one main memory block is mapped onto a given cache block position, contention may arise for that position.
This situation may occur even when the cache is not full. Contention is resolved by allowing the new block to overwrite the currently resident block.
So the replacement algorithm is trivial.
The detailed operation of the direct mapping technique is as follows:
The main memory address is divided into three fields. The field sizes depend on the memory capacity and the block size of cache.
In this example, the lower 5 bits of the address are used to identify a word within a block.
The next 7 bits are used to select a block out of 128 blocks (which is the capacity of the cache).
The remaining 4 bits are used as a TAG to identify the proper block of main memory that is mapped to cache.
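The 4 + 7 + 5 field split can be sketched as bit manipulation (names are illustrative):

```python
def direct_map_fields(addr):
    """Split a 16-bit address into (TAG, cache block, word): 4 + 7 + 5 bits."""
    word  = addr & 0x1F          # low 5 bits: word within block
    block = (addr >> 5) & 0x7F   # next 7 bits: one of 128 cache blocks
    tag   = addr >> 12           # high 4 bits: TAG
    return tag, block, word

addr = (0b1001 << 12) | (0b0000011 << 5) | 0b00010
print(direct_map_fields(addr))  # (9, 3, 2)
```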
47
When a new block is first brought into the cache, the high-order 4 bits of the main memory address are stored in the four TAG bits associated with its location in the cache.
When the CPU generates a memory request, the 7-bit block address determines the corresponding cache block.
The TAG field of that block is compared to the TAG field of the address.
If they match, the desired word, specified by the low-order 5 bits of the address, is in that block of the cache.
If there is no match, the required word must be accessed from the main memory; that is, the contents of that cache block are replaced by the new block specified by the new address generated by the CPU, and correspondingly the TAG bits are changed to the high-order 4 bits of the address.
Associative Mapping Technique:
In the associative mapping technique, a main memory block can potentially reside in any cache block position.
In this case, the main memory address is divided into two groups: low-order bits identify the location of a word within a block and high-order bits identify the block.
In the example here, 11 bits are required to identify a main memory block when it is resident in the cache; the high-order 11 bits are used as TAG bits and the low-order 5 bits are used to identify a word within a block.
The TAG bits of an address received from the CPU must be compared to the TAG bits of each block of the cache to see if the desired block is present.
51
In associative mapping, any block of main memory can go to any block of cache, so it has complete flexibility, and we have to use a proper replacement policy to replace a block from cache if the currently accessed block of main memory is not present in cache.
It might not be practical to use this complete flexibility of the associative mapping technique, due to searching overhead: the TAG field of the main memory address has to be compared with the TAG field of every cache block.
Block-Set-Associative Mapping Technique
This mapping technique is intermediate to the previous two techniques.
Blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set.
Therefore, the flexibility of associative mapping is reduced from full freedom to a SET of specific blocks.
This also reduces the searching overhead, because the search is restricted to the number of SETs, instead of the number of blocks.
Also the contention problem of direct mapping is eased by having a few choices for block replacement.
Organize the cache with 4 blocks in each set:
Each set contains 4 blocks, so the total number of sets is 32.
The TAG field of the associative mapping technique is divided into two groups: one is termed the SET bits and the second is termed the TAG bits.
The main memory address is grouped into three parts:
Low-order 5 bits are used to identify a word within a block.
Since there are 32 sets in total, the next 5 bits are used to identify the set.
High-order 6 bits are used as TAG bits.
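The 6 + 5 + 5 split for this set-associative example can be sketched the same way (names are illustrative):

```python
def set_assoc_fields(addr):
    """16-bit address for 32 sets of 32-word blocks: 6 TAG + 5 SET + 5 word bits."""
    word = addr & 0x1F           # low 5 bits: word within block
    sset = (addr >> 5) & 0x1F    # next 5 bits: one of 32 sets
    tag  = addr >> 10            # high 6 bits: TAG
    return tag, sset, word

addr = (3 << 10) | (7 << 5) | 9
print(set_assoc_fields(addr))  # (3, 7, 9)
```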
55
It is clear that if we increase the
number of blocks per set,
then the number of bits in SET field
is reduced.
Due to the increase of blocks per
set, complexity of search is
also increased. The extreme
condition of 128 blocks per set
requires no set bits and
corresponds to the fully associative
56
mapping technique with 11 TAG
bits.
The other extreme of one block
per set is the direct mapping
method.
The 5-bit set field of the address determines which set of the cache might contain the desired block. This is similar to the direct mapping technique: direct mapping looks for a block, but block-set-associative mapping looks for a set.
The TAG field of the address must then be compared with the TAGs of the four blocks of that set.
57
If a match occurs, then the block
is present in the cache; otherwise
the
block containing the addressed
word must be brought to the cache.
This block will potentially come
to the corresponding set only.
Since, there are four blocks in
the set, we have to choose
appropriately
which block to be replaced if all the
blocks are occupied.
Since the search is restricted to
four block only, so the searching
complexity is reduced.
58
59
Direct Mapping
A given memory block can be mapped into one and only one cache line.
Cache Line
Cache is partitioned into lines (also called blocks). Each line has 4-64 bytes in it. During data transfer, a whole line is read or written.
Each line has a tag that indicates the address in M from which the line has been copied.
Cache address
Number of bits in the cache address = log2(cache size)
Used to grab a particular word in the cache line
Advantage
No need of expensive associative search
Disadvantage
Miss rate may go up due to possible increase of mapping conflicts
Fully Associative Cache
No restriction on mapping from M to C.
Associative search of tags is expensive.
Feasible for very small caches only.
Set-Associative Cache
This technique is a compromise between direct and fully-associative mapping.
N-way set-associative cache
Each M-block can now be mapped into any one of a set of N C-blocks.
The sets are predefined.
Let there be K blocks in the cache. Then:
N = 1: direct-mapped cache
N = K: fully associative cache
Most commercial caches have N = 2, 4, or 8.
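The relationship between K, N, and the number of sets can be sketched as follows (a sketch; the function name is mine):

```python
def organization(K, N):
    """K cache blocks, N blocks per set -> (number of sets, mapping style)."""
    sets = K // N
    if N == 1:
        style = "direct-mapped"
    elif N == K:
        style = "fully associative"
    else:
        style = f"{N}-way set-associative"
    return sets, style

print(organization(128, 1))    # (128, 'direct-mapped')
print(organization(128, 128))  # (1, 'fully associative')
print(organization(128, 4))    # (32, '4-way set-associative')
```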
Block Replacement
Least Recently Used (LRU):
Replace the block in the set that has been in the cache longest with no reference to it.
First-In First-Out (FIFO):
Replace the block in the set that has been in the cache longest.
Least Frequently Used (LFU):
Replace the block in the set that has experienced the fewest references.
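LRU within a single set can be simulated with a recency-ordered list (a minimal sketch, not a hardware implementation):

```python
def lru_replace(set_blocks, capacity, ref):
    """Track one cache set under LRU: most recently used block sits at the end."""
    if ref in set_blocks:
        set_blocks.remove(ref)      # hit: move block to most-recent position
    elif len(set_blocks) == capacity:
        set_blocks.pop(0)           # miss, set full: evict least recently used
    set_blocks.append(ref)
    return set_blocks

s = []
for b in [1, 2, 3, 4, 1, 5]:        # references into a 4-block set
    lru_replace(s, 4, b)
print(s)  # [3, 4, 1, 5] -> block 2 was least recently used, so it was evicted
```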
Cache write techniques
We need to update the memory and cache consistently.
Write-Through Technique
Whenever we write to the cache, we also write to main memory, requiring the processor to wait until the write to main memory completes.
Though easy to implement, this technique may result in several unnecessary writes to main memory.
Write-Back Technique
Only the cache is updated during a write operation.
This technique reduces the number of writes to main memory.
Requires an extra bit, called a dirty bit, with each block (similar to a FLAG).
The bit is set whenever the cache copy is updated; the block is written back to main memory when it is replaced.
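The dirty-bit bookkeeping just described can be sketched for one cache block (names are illustrative, not from the slides):

```python
class WriteBackBlock:
    """Minimal sketch of write-back: writes touch only the cache until eviction."""
    def __init__(self, data):
        self.data = data
        self.dirty = False          # the dirty bit (the slide's FLAG)

    def write(self, data):
        self.data = data
        self.dirty = True           # only the cache copy is updated

    def evict(self, main_memory, block_no):
        if self.dirty:              # write back to main memory only when needed
            main_memory[block_no] = self.data
        self.dirty = False

memory = {0: "old"}
blk = WriteBackBlock("old")
blk.write("new")                    # no main-memory traffic yet
blk.evict(memory, 0)
print(memory[0])                    # "new" reaches memory only on eviction
```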
Performance analysis
Total size of cache = total number of bytes stored, excluding the tag bits
A bigger cache gives lower miss rates, but slower accessing
So what is the optimum design?
Example
Assume a small cache of 2 Kbyte
Miss rate 15% [15 out of every 100 accesses result in a miss on average]
Miss penalty (cache miss) 20 cycles [cost of going to the main memory when there is a miss]
Cache hit 2 cycles [cost of data access when there is a hit]
Average cost of memory access?
[(hit rate = 1 - miss rate) * cache hit (in cycles) + miss rate * cache miss (in cycles)]
Solution? ___ cycles
Now we double the cache size, which directly slows the cache retrieval by 1 clock pulse; the cache-miss rate decreases to 6.5%.
What is the average cost of memory access?
Again double the cache size, which slows the cache retrieval by 1 more clock pulse; the cache-miss rate decreases to 5.565%.
What is the average cost of memory access?
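The average-cost formula above can be evaluated for the three cache sizes (a sketch; it assumes the 20-cycle miss cost stays fixed as the cache grows, per the example):

```python
def avg_cost(miss_rate, hit_cycles, miss_cycles):
    """(1 - miss rate) * hit cost + miss rate * miss cost, in cycles."""
    return (1 - miss_rate) * hit_cycles + miss_rate * miss_cycles

print(avg_cost(0.15, 2, 20))      # 2 Kbyte cache: 0.85*2 + 0.15*20 = 4.7 cycles
print(avg_cost(0.065, 3, 20))     # doubled, hit now 3 cycles: about 4.1 cycles
print(avg_cost(0.05565, 4, 20))   # doubled again, hit 4 cycles: about 4.9 cycles
```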
Given the following two cache designs, find the one with the best performance by calculating the average cost of access. Show all calculations.
(a) 4 Kbyte, 8-way set-associative cache with a 6% miss rate; cache hit costs one cycle, cache miss costs 12 cycles
(b) 8 Kbyte, 4-way set-associative cache with a 4% miss rate; cache hit costs two cycles, cache miss costs 12 cycles
Problem:
Given the following three cache designs, find the one with the best performance by calculating the average cost of access. Show all calculations.
(a) 4 Kbyte, 8-way set-associative cache with a 6% miss rate; cache hit costs one cycle, cache miss costs 12 cycles
(b) 8 Kbyte, 4-way set-associative cache with a 4% miss rate; cache hit costs two cycles, cache miss costs 12 cycles
(c) 16 Kbyte, 2-way set-associative cache with a 2% miss rate; cache hit costs three cycles, cache miss costs 12 cycles
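One way to set up the comparison for the three designs (a sketch using the same average-cost formula as in the earlier example):

```python
def avg_cost(miss_rate, hit_cycles, miss_cycles):
    """(1 - miss rate) * hit cost + miss rate * miss cost, in cycles."""
    return (1 - miss_rate) * hit_cycles + miss_rate * miss_cycles

designs = {
    "a (4 KB, 8-way)":  avg_cost(0.06, 1, 12),   # 0.94*1 + 0.06*12 = 1.66
    "b (8 KB, 4-way)":  avg_cost(0.04, 2, 12),   # 0.96*2 + 0.04*12 = 2.40
    "c (16 KB, 2-way)": avg_cost(0.02, 3, 12),   # 0.98*3 + 0.02*12 = 3.18
}
print(min(designs, key=designs.get))  # design (a) has the lowest average cost
```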
A given design with cache implemented has a main memory access cost of 20 cycles on a miss and two cycles on a hit. The same design without the cache has a main memory access cost of 16 cycles. Calculate the minimum hit rate of the cache to make the cache implementation worthwhile.
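One way to set up the calculation: the cache pays off when h*2 + (1-h)*20 <= 16, which rearranges to h >= (20-16)/(20-2):

```python
hit_cost, miss_cost, no_cache_cost = 2, 20, 16
# Break-even: h*hit_cost + (1-h)*miss_cost = no_cache_cost
min_hit_rate = (miss_cost - no_cache_cost) / (miss_cost - hit_cost)
print(round(min_hit_rate * 100, 1))  # 22.2 -> worthwhile above roughly a 22.2% hit rate
```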
