
EC 208
COMPUTER ORGANIZATION AND MICROPROCESSOR

Computer Abstractions and Technology


What You Have to Learn
• How programs are translated into the machine language
 And how the hardware executes them
• The hardware/software interface
• What determines program performance
 And how it can be improved
• How hardware designers improve performance
• What is parallel processing
Understanding Performance
• Algorithm
 Determines number of operations executed
• Programming language, compiler, architecture
 Determine number of machine instructions executed per
operation
• Processor and memory system
 Determine how fast instructions are executed
• I/O system (including OS)
 Determines how fast I/O operations are executed
Below Your Program
• Application software
 Written in high-level language
• System software
 Compiler: translates HLL code to machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
• Hardware
 Processor, memory, I/O controllers
What is “Computer Architecture”?
Software layers: Application (Netscape) → Operating System (Unix; Windows 9x) → Compiler → Assembler
Instruction Set Architecture
Hardware layers: Processor, Memory, I/O system → Datapath & Control → Digital Design → Circuit Design (transistors, IC layout)

Your Learning

• Key Idea: levels of abstraction
– hide unnecessary implementation details
– help us cope with the enormous complexity of real systems
The Instruction Set: A Critical Interface

software ↔ instruction set ↔ hardware
Levels of Program Code
• High-level language
– Level of abstraction closer to problem
domain
– Provides for productivity and portability

• Assembly language
– Textual representation of instructions

• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
Components of a Computer
The BIG Picture
• Same components for all kinds of computer
 Desktop, server, embedded
• Input/output includes
 User-interface devices
• Display, keyboard, mouse
 Storage devices
• Hard disk, CD/DVD, flash
 Network adapters
• For communicating with other
computers
The Von Neumann Computer

• Stored-Program Concept – storing programs as numbers – due to John von
Neumann; Eckert and Mauchly worked on engineering the concept.

• Idea: A program is written as a sequence of instructions, represented by
binary numbers. The instructions are stored in memory just as data is.
They are read one by one, decoded, and then executed by the CPU.

The data path of a typical Von Neumann machine
Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour

• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…
Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”:

Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n

 Example: time taken to run a program
 10 s on A, 15 s on B
 Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
 So A is 1.5 times faster than B
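The ratio above can be sketched in a few lines (Python, for illustration only):

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y (performance = 1/execution time)."""
    return time_y / time_x

# The example above: 10 s on machine A, 15 s on machine B
n = relative_performance(10.0, 15.0)   # 1.5, so A is 1.5 times faster than B
```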
Measuring Execution Time
• Elapsed time
 Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
 Determines system performance

• CPU time
 Time spent processing a given job
• Discounts I/O time, other jobs’ shares
 Different programs are affected differently by CPU and
system performance
CPU Clocking
Operation of digital hardware is governed by a constant-rate clock: in each
clock cycle the hardware performs data transfer and computation, then
updates its state.

 Clock period: duration of a clock cycle
 e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
 Clock frequency (rate): cycles per second
 e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
CPU Time

CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate

Performance improved by
 Reducing number of clock cycles (good algorithm or hardware design)
 Increasing clock rate (good technology)
 Hardware designer must often trade off clock rate against cycle count
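A quick numeric sketch of the relation (the cycle count here is a made-up figure; the 4.0 GHz rate is from the clocking slide above):

```python
def cpu_time(clock_cycles, clock_rate_hz):
    """CPU Time = CPU Clock Cycles / Clock Rate (= cycles x cycle time)."""
    return clock_cycles / clock_rate_hz

# Hypothetical program taking 10 x 10^9 cycles on a 4.0 GHz clock
t = cpu_time(10e9, 4.0e9)   # 2.5 seconds
```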
Performance Summary

The BIG Picture

CPU Time = (Instructions/Program) × (Clock cycles/Instruction) × (Seconds/Clock cycle)

Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc

A basic performance equation:

T = (N × S) / R

T – processor time required to execute a program (not total time used);
N – actual number of machine instructions executed (including those due to loops);
S – average number of clock cycles per instruction;
R – clock rate (cycles/sec)

Earlier measures –
MIPS – Millions of Instructions per sec.
MFLOPS – Million floating-point operations per sec.
CPI – Cycles per Instruction
IPC – Instructions per cycle = 1/CPI

Speedup = (Earlier execution time) / (Current execution time)
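A minimal sketch of these measures (the program figures below are invented for illustration):

```python
def exec_time(N, S, R):
    """Basic performance equation: T = N * S / R."""
    return N * S / R

def mips_rate(N, T):
    """MIPS = instruction count / (execution time x 10^6)."""
    return N / (T * 1e6)

def speedup(t_earlier, t_current):
    """Speedup = earlier execution time / current execution time."""
    return t_earlier / t_current

# Hypothetical program: N = 2x10^9 instructions, S = 2 cycles/instruction, R = 1 GHz
T = exec_time(2e9, 2, 1e9)   # 4.0 s
m = mips_rate(2e9, T)        # 500 MIPS
s = speedup(4.0, 2.0)        # 2.0x after an improvement halves the run time
```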


Basic Terminology

• Computer
A device that accepts input, processes data, stores data, and produces
output, all according to a series of stored instructions.
• Hardware
Includes the electronic and mechanical devices that process the data;
refers to the computer as well as peripheral devices.
• Software
A computer program that tells the computer how to perform particular tasks.
• Network
Two or more computers and other devices that are connected, for the
purpose of sharing data and programs.
• Peripheral devices
Used to expand the computer’s input, output and storage capabilities.
Basic Terminology
• Input
• Whatever is put into a computer system.
• Data
• Refers to the symbols that represent facts, objects, or ideas.
• Information
• The results of the computer storing data as bits and bytes; the words, numbers,
sounds, and graphics.
• Output
• Consists of the processing results produced by a computer.
• Processing
• Manipulation of the data in many ways.
• Memory
• Area of the computer that temporarily holds data waiting to be processed,
stored, or output.
• Storage
• Area of the computer that holds data on a permanent basis when it is not
immediately needed for processing.
…Key Terminology
• Control Path
• Datapath
• Out-of-order execution
• Microcontroller
• ALU, FPU, GPU etc.
• Register renaming
• CPU design
• Dataflow architecture
• Hardware description language
• Stream processing
• Pipelining
• Multi-threading
• Cache
• RISC, CISC
• Von Neumann architecture
• Instruction-level parallelism (ILP)
• Multi-core (computing)
• Addressing modes
• Superscalar
• Vector processor
• Instruction set
Multiprocessors
• Multicore microprocessors
 More than one processor per chip

• Requires explicitly parallel programming
 Compare with instruction-level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
 Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
COMPUTER ORGANISATION AND ARCHITECTURE

• The components from which computers are built, i.e., computer organization.
• In contrast, computer architecture is the science of integrating
those components to achieve a level of functionality and
performance.
• It is as if computer organization examines the lumber, bricks,
nails, and other building material
• While computer architecture looks at the design of the house.
COMPUTER TYPES

Computers are classified based on parameters like
• Speed of operation
• Cost
• Computational power
• Type of application
DESK TOP COMPUTER

• Processing & storage units, visual display & audio units, keyboards
• Storage media - hard disks, CD-ROMs
• Eg: personal computers, used in homes and offices
• Advantage: cost-effective, easy to operate, suitable for general-purpose
educational or business applications

NOTEBOOK COMPUTER
• Compact form of personal computer (laptop)
• Advantage is portability
WORK STATIONS
• More computational power than PC
• Costlier
• Used to solve complex problems that arise in engineering applications
(graphics, CAD/CAM etc)
ENTERPRISE SYSTEM (MAINFRAME)
• More computational power
• Larger storage capacity
• Used for business data processing in large organization
• Commonly referred to as servers or supercomputers

SERVER SYSTEM
• Supports large volumes of data which frequently need to be accessed or to
be modified
• Supports request response operation
SUPER COMPUTERS
• Faster than mainframes
• Used for large-scale numerical and algorithmic calculations
• Used for aircraft design and testing, military applications and weather
forecasting
Computing Systems
Computers have two kinds of components:
• Hardware, consisting of its physical devices
(CPU, memory, bus, storage devices, ...)
• Software, consisting of the programs it has
(Operating system, applications, utilities, ...)
The Big Picture
Input and Output units connect to the Processor (Control + ALU) and Memory.
FUNCTIONAL UNITS OF COMPUTER

• Input Unit

• Output Unit

• Central processing Unit (ALU and Control Units)

• Memory

• Bus Structure
Structure - Top Level
The computer connects to peripherals and communication lines. Its top-level
components are the Central Processing Unit, Main Memory, Input/Output, and
the Systems Interconnection that links them.
Structure - The CPU
Within the CPU (connected to the system bus): Registers, the Arithmetic and
Logic Unit, the Control Unit, and an Internal CPU Interconnection linking
them.
Structure - The Control Unit
Within the Control Unit: Sequencing Logic, Control Unit Registers and
Decoders, and Control Memory, linked to the ALU and registers over the
CPU’s internal bus.
Function
ALL computer functions are:
– Data PROCESSING
– Data STORAGE
– Data MOVEMENT (data = information)
– CONTROL (coordinates how information is used)
• NOTHING ELSE!
INPUT UNIT:
• Converts the external world data to a binary format, which can be understood by CPU

• Eg: Keyboard, Mouse, Joystick etc

OUTPUT UNIT:
• Converts the binary-format data to a format that people can understand

• Eg: Monitor, Printer, LCD, LED etc

CPU
• The “brain” of the machine

• Responsible for carrying out computational tasks

• Contains ALU, CU, Registers

• ALU: performs arithmetic and logical operations

• CU: provides control signals with the proper timing, which in turn controls the execution process

• Registers: store data and results, and speed up operation


Example
Add R1, R2
• The control unit works with a reference signal called the processor clock
• The processor divides the operation into basic steps
• Each basic step is executed in one clock cycle
T1 Enable R1
T2 Enable R2
T3 Enable ALU for add op
T4
MEMORY

• Stores data, results, programs

• Two classes of storage: (i) Primary (ii) Secondary

• Two types are RAM (R/W memory) and ROM (read-only memory)

• ROM is used to store data and programs which are not going to change

• Secondary storage is used for bulk storage or mass storage


Interconnection between Processor and Memory
Registers
Registers are fast stand-alone storage locations that hold data temporarily.
Multiple registers are needed to facilitate the operation of the CPU.
 Two registers - MAR (Memory Address Register) and MDR (Memory Data
Register) - handle the data transfer between main memory and processor.
 MAR - holds addresses
 MDR - holds data
 Instruction register (IR): holds the instruction that is currently being executed
 Program counter (PC): points to the next instruction to be fetched from memory

• (PC) → (MAR): the contents of PC transferred to MAR
• (MAR) → (Address bus): select a particular memory location
• Issue RD control signal
• The instruction present in memory is read and loaded into MDR
• Placed in IR (contents transferred from MDR to IR)
Registers
 MAR - holds addresses
 MDR - holds data
 Instruction register (IR): holds the instruction that is currently being executed
 Program counter (PC): points to the next instruction to be fetched from memory

• (PC) → (MAR): the contents of PC transferred to MAR
• (MAR) → (Address bus): select a particular memory location
• Issue RD control signal
• The instruction is read and loaded into MDR
• Placed in IR
• The instruction present in IR is decoded
(the processor understands what operation it has to perform)

• Increments PC by 1, so that it points to the next instruction address

• If the data required for the operation is available in a register, it
performs the operation
• If the data is present in memory, the following sequence is performed:

• Address of the data → MAR
• (MAR) → Address bus
• Issue RD signal
• Data is read via the data bus → MDR
• From MDR, data is (a) directly routed to the ALU, or
(b) placed in a register, and the operation can be performed
• Results of the operation directed towards output device, memory or
register
Connecting
• All the units must be connected
• Different type of connection for different type
of unit
Computer modules: Memory, Input/Output, CPU
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
 Read
 Write
 Timing
Input/Output Connections
Similar to memory from computer’s viewpoint
• Output
 Receive data from computer
 Send data to peripheral
• Input
 Receive data from peripheral
 Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
 e.g. spin disk
• Receive addresses from computer
 e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts
BUS STRUCTURE
Connecting CPU and memory
The CPU and memory are normally connected by three groups of connections,
each called a bus
(a group of wires which carries information from CPU to peripherals or vice versa;
a communication pathway connecting two or more devices):
 data bus,
 address bus and
 control bus

Connecting CPU and memory using three buses


PERFORMANCE

• Time taken by the system to execute a program

Parameters which influence the performance are:
• Clock speed
• Type and number of instructions available
• Average time required to execute an instruction
• Memory access time
• Power dissipation in the system
• Number of I/O devices and types of I/O devices connected
• The data transfer capacity of the bus
• Data bus: bidirectional
(Remember there is no difference between “data” and “instruction” at this level)
 Group of wires which carries data bits from processor to peripherals and vice versa
 Width is a key determinant of performance: 8, 16, 32, 64 bit

• Address bus: unidirectional
 Group of wires which carries address bits from processor to peripherals
(16, 20, 24 or more parallel signal lines)
 Identifies the source or destination of data
 Bus width determines maximum memory capacity of system
e.g. 8085 has 16-bit address bus giving 64k address space

• Control bus: bidirectional
 Group of wires which carries control signals from processor to peripherals and vice versa
 Control and timing information
• Memory read/write signal
• Interrupt request
• Clock signals
Bus Interconnection Scheme
BUS STRUCTURE
•Single bus structure: Common bus used to communicate between peripherals
and microprocessor.

INPUT MEMORY PROCESSOR OUTPUT

Single Bus Problems

Lots of devices on one bus leads to:
 Propagation delays
 Long data paths, so co-ordination of bus use can adversely affect
performance
 Bottlenecks if aggregate data transfer approaches bus capacity
Most systems use multiple buses to overcome these problems
Bus Types
Dedicated
 Separate data & address lines
Multiplexed
 Shared lines
 Address-valid or data-valid control line
 Advantage - fewer lines
 Disadvantages
• More complex control
• Lower ultimate performance
Bus Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one time
• Arbitration may be centralised or distributed
Centralised or Distributed
Arbitration
• Centralised
 Single hardware device controlling bus access
• Bus Controller
• Arbiter
 May be part of CPU or separate
• Distributed
 Each module may claim the bus
 Control logic on all modules
Memory Organization
MEMORY ORGANIZATION

• Memory Hierarchy

• Main Memory

• Auxiliary Memory

• Associative Memory

• Cache Memory

• Virtual Memory

• Memory Management Hardware


MEMORY HIERARCHY

The memory hierarchy aims to obtain the highest possible access speed while
minimizing the total cost of the memory system.

Auxiliary memory (magnetic tapes and magnetic disks) connects through an
I/O processor to main memory, while the CPU works through cache memory.
MEMORY HIERARCHY

From fastest/smallest to slowest/largest:
Register → Cache → Main Memory → Magnetic Disk → Magnetic Tape
(speed decreases and size increases moving down the hierarchy)

 The memory unit that directly communicate with CPU is called the main memory

 Devices that provide backup storage are called auxiliary memory

 The memory hierarchy system consists of all storage devices employed in a computer system.
Memory Hierarchy
• CPU logic is usually faster than main memory access time
(processing speed is limited primarily by the speed of main memory)

• The cache stores segments of programs currently being executed in the CPU
and temporary data (frequently needed in the present calculations)

• The typical access-time ratio between cache and main memory is about 1 to 7~10

• Auxiliary memory access time is usually 1000 times that of main memory
Main Memory

• Most of the main memory is made up of RAM IC chips
(but a portion of the memory may be constructed with ROM chips)

• RAM – Random Access Memory
Integrated RAM is available in two possible operating modes, static and dynamic

• ROM – Read Only Memory
Classification
• Read Only vs Read/Write
• Random Access vs Sequential Access
• Volatile vs Non-Volatile
• Static vs Dynamic
Static RAM (SRAM)
– Each cell stores bit with a six-transistor circuit.
– Data is stored as long as supply is applied.
– Fast – so used where speed is important
– use sense amps for performance
– compatible with CMOS technology
– Relatively insensitive to disturbances such as electrical noise.
– Faster (8-16 times faster) and more expensive (8-16 times expensive ) than
DRAM.
6-transistor SRAM Cell (CMOS)
Six transistors M1–M6: two cross-coupled inverters store Q and !Q, and
access transistors gated by the word line (WL) connect the cell to the bit
lines BL and !BL.
Dynamic RAM (DRAM)
 Each cell stores bit with a capacitor and transistor.
 Small cells (1 to 3 fets/cell) – so more bits/chip
 Periodic refresh required
 Sensitive to disturbances.
 Slower – so used for main memories
 Not typically compatible with CMOS technology

Typical storage capacitance has a value of 20 to 50 fF.

Data stored as charge in a capacitor can be retained only for a limited time
due to the leakage current, which eventually removes or modifies the charge.
Assuming that the voltage on the fully charged storage capacitor is V = 2.5 V,
and that the leakage current is I = 40 pA, then the time to discharge the
capacitor C = 20 fF to half of the initial voltage can be estimated as

t = C·(V/2)/I = (20 fF × 1.25 V) / 40 pA ≈ 0.6 ms

Hence every memory cell must be refreshed approximately every half
millisecond.
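The refresh estimate above can be reproduced directly (same figures as on the slide):

```python
# t = C * dV / I, with dV = V/2 (discharge to half the initial voltage)
C = 20e-15   # storage capacitance: 20 fF
V = 2.5      # fully charged voltage: 2.5 V
I = 40e-12   # leakage current: 40 pA

t = C * (V / 2) / I   # seconds until half the stored voltage is lost
# about 6.25e-4 s, i.e. ~0.6 ms, hence refresh every half millisecond or so
```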

ADVANTAGES OF DRAM:
• The size of a DRAM cell is of the order of 8F²,
where F is the smallest feature size in a given technology.
For F = 0.2 μm the size is 0.32 μm²
• No static power is dissipated for storing charge in a capacitance
• SDRAM : Synchronous DRAM
• SDR SDRAM : Single Data Rate SDRAM
• DDR SDRAM : Double Data Rate SDRAM
• DDR2 SDRAM : an evolution over DDR SDRAM
• DDR3 SDRAM : improvement over DDR2
• RLDRAM : Reduced‐latency DRAM

      Tran.    Access
      per bit  time   Persist?  Sensitive?  Cost  Applications
SRAM  6        1X     Yes       No          100X  Cache memory
DRAM  1        10X    No        Yes         1X    Main memories, frame buffers
MEMORY CAPACITY
WORD ORGANIZATION
64-cell Memory Array
Row
1 1 1 0 0 1 0 1 0
2 0 1 0 0 1 0 1 0
3 0 1 0 0 1 1 0 1
4 1 0 1 0 0 1 0 1
5 1 0 1 1 1 0 1 1
6 0 1 0 1 0 0 0 0
7 1 0 0 1 0 1 1 0
8 0 0 1 1 0 0 0 0

1 2 3 4 5 6 7 8 Column
Memory Organized as 4 x 16 and 1 x 64 Arrays

Row
1 1 1 0 0 1 1
2 0 1 0 0 2 0
3 0 1 0 0 3 0
4 1 0 1 0 4 1
5 1 0 1 1 5 1
6 0 1 0 1 6 0
⋮            ⋮       ⋮

13 1 0 1 1 61 1
14 0 0 0 0 62 0
15 0 1 1 0 63 0
16 0 0 0 0 64 0
1 2 3 4 Column 1
Block Diagram of a Read-write Memory
The address bus feeds an address decoder, whose Memory Select lines pick one
word of the memory array; the array connects to the data bus, under Read and
Write control inputs.
Memory Read Operation (8 x 8)
The MAR (here holding 100) drives the address decoder through the address
buffer, selecting one row of the 8 x 8 array. With Read asserted, the
selected word (00110001 at address 100) is placed on the data bus through
the data buffer and latched into the MDR.
Memory Write Operation
The MAR (here holding 011) selects a row via the address decoder. With Write
asserted, the MDR contents (10110010) pass through the data buffer and are
stored into the selected location.
16K x 8 Static RAM
Address lines A0–A13 (14 lines address the 16K locations), 8 data
input/output lines, and control inputs CS (chip select), WE (write enable),
and OE (output enable).
Tri State Buffer

A tristate buffer can output 3 different values:
– Logic 1 (high)
– Logic 0 (low)
– High-Impedance
Encoder

D-Latch and D-Flip Flop
• Level-triggered vs edge-triggered
• Positive-edge vs negative-edge triggered
• Race-around condition
• FF characteristic tables and excitation table
MAIN MEMORY
RAM and ROM Chips

Typical RAM chip (128 x 8): chip select inputs CS1 and CS2, Read (RD) and
Write (WR) controls, a 7-bit address AD7, and an 8-bit data bus.

CS1 CS2 RD WR  Memory function  State of data bus
0   0   x  x   Inhibit          High-impedance
0   1   x  x   Inhibit          High-impedance
1   0   0  0   Inhibit          High-impedance
1   0   0  1   Write            Input data to RAM
1   0   1  x   Read             Output data from RAM
1   1   x  x   Inhibit          High-impedance

Typical ROM chip (512 x 8): chip select inputs CS1 and CS2, a 9-bit address
AD9, and an 8-bit data bus.
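The RAM chip's truth table can be expressed as a small decoder function (a sketch; signals are encoded as in the table, with CS2 shown active-low):

```python
def ram_chip_function(cs1, cs2, rd, wr):
    """Return the memory function for the given control inputs,
    following the truth table above (all other rows inhibit the chip)."""
    if cs1 == 1 and cs2 == 0:
        if rd == 1:
            return "Read"    # row: 1 0 1 x
        if wr == 1:
            return "Write"   # row: 1 0 0 1
    return "Inhibit"         # data bus stays in high impedance
```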
Memory Subsystems Organization
Two or more memory chips can be combined to create memory with more bits
per location (two 8x2 chips can create an 8x4 memory).
Address lines: 3; bits per address: 4.

Horizontal expansion: both chips receive the same address A2A1A0; for each
of the eight locations 000–111, chip #1 supplies bits D3D2 and chip #2
supplies bits D1D0.
Memory Subsystems Organization
Two or more memory chips can be combined to create more locations
(two 8x2 chips can create a 16x2 memory).
Address lines: 4; bits per address: 2.

Vertical expansion: the high-order address bit selects the chip. A3 = 0
selects the first chip for locations 0–7 (0000–0111), and A3 = 1 selects the
second chip for locations 8–15 (1000–1111); A2A1A0 selects the location
within the chip.
MEMORY ADDRESS MAP
Pictorial representation of the assigned address space for each chip in the system

Address space assignment to each memory chip

Example: 1 KB memory = 512 bytes RAM (from 128x8 chips) and 512 bytes ROM

Memory Connection to CPU

- RAM and ROM chips are connected to a CPU through the data and address buses

- The low-order lines in the address bus select the byte within the chips, and
the other lines in the address bus select a particular chip through its chip-select inputs

RAM: 512 bytes (512x8) from 128x8 chips
Number of chips required: (512x8) / (128x8) = 4 chips

Tentative addresses: RAM 1 : 0–127
                     RAM 2 : 128–255
                     RAM 3 : 256–383
                     RAM 4 : 384–511

Number of address lines required per chip: 128 = 2⁷ → 7 lines

RAM: 512 bytes from one 512x8 chip
Number of address lines required: 512 = 2⁹ → 9 lines

Component  Hex address   Address bus
                         10 9 8 7 6 5 4 3 2 1
RAM 1      0000 - 007F    0 0 0 x x x x x x x
RAM 2      0080 - 00FF    0 0 1 x x x x x x x
RAM 3      0100 - 017F    0 1 0 x x x x x x x
RAM 4      0180 - 01FF    0 1 1 x x x x x x x
ROM        0200 - 03FF    1 x x x x x x x x x
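The address map can be checked with a small decode routine (a sketch; bit 9 of the integer address corresponds to address line 10 in the table):

```python
def select_chip(addr):
    """Select a chip for a 10-bit address per the map above:
    line 10 set -> ROM; otherwise lines 9-8 pick one of four RAM chips."""
    if addr & (1 << 9):
        return "ROM"
    return "RAM " + str((addr >> 7) + 1)

# 0x0000 -> RAM 1, 0x0080 -> RAM 2, 0x0180 -> RAM 4, 0x0200 -> ROM
```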
CONNECTION OF MEMORY TO CPU
Address lines 9 and 8 feed a 2x4 decoder whose outputs 0–3 drive the chip
selects of the four 128 x 8 RAM chips (0000–007F, 0080–00FF, 0100–017F,
0180–01FF), while address line 10 selects the 512 x 8 ROM (0200–03FF).
Lines 7–1 address the byte within a RAM chip (lines 9–1 within the ROM),
and the RD/WR controls and the data bus are common to all chips.
Memory Hierarchy Design
Registers (CPU) → Cache (one or more levels) → Main Memory → Disk Storage,
linked by a specialized bus (internal or external to the CPU), the memory
bus, and the I/O bus respectively.

Tradeoff between size, speed and cost; exploits the principle of locality.
1. Register
 Fastest memory element, but small storage; very expensive
2. Cache
 Fast and small compared to main memory; acts as a buffer between the CPU
and main memory: it contains the most recently used memory locations
(address and contents are recorded here)
3. Main memory is the RAM of the system
4. Disk storage - HDD
Memory Hierarchy Design
Comparison between different types of memory:

          Register     Cache        Memory     HDD
size      32 - 256 B   32KB - 4MB   1000 MB    500 GB
speed     1-2 ns       2-4 ns       60 ns      8 ms
$/Mbyte                $20/MB       $0.2/MB    $0.001/MB

larger, slower, cheaper →

CACHE MEMORY
Locality of Reference
- The references to memory at any given time interval tend to be confined within a localized area
- This area contains a set of information, and the membership changes gradually as time goes by

 Temporal Locality
The information which will be used in the near future is likely to be in use already.

 Spatial Locality
If a word is accessed, adjacent (nearby) words are likely to be accessed soon.
(e.g. related data items are usually stored together; instructions are executed sequentially)

Cache
- The property of Locality of Reference makes Cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information
most likely to be accessed

CPU ↔ Cache memory ↔ Main memory
Cache Memory
• High speed (towards CPU speed)
• Small size (power & cost)

On a hit the CPU is served from the fast cache; on a miss the access goes on
to the slow main memory.

With a 95% hit ratio:
Access time = 0.95 × Cache + 0.05 × Mem
PERFORMANCE OF CACHE

Memory Access
 All the memory accesses are directed first to Cache
 If the word is in Cache; Access cache to provide it to CPU
 If the word is not in Cache; Bring a block (or a line) including that word
to replace a block now in Cache

Q. How can we know if the word that is required is there ?

Q. If a new block is to replace one of the old blocks, which one should
we choose ?

 Replacement algorithm
 Hit / miss
 Write-through / Write-back
 Load through
Cache Memory

CPU ↔ Cache (fast) ↔ Main Memory (slow)
Cache
• Every address reference goes first to the cache; if the desired address is
not there, then we have a cache miss; if the desired data is in the cache,
then we have a cache hit

• Most software exhibits temporal locality of access

• Transfers between main memory and cache occur at the granularity of cache
lines or cache blocks, around 32 or 64 bytes (rather than bytes or processor
words)
PERFORMANCE OF CACHE

Performance of Cache Memory System

Hit Ratio (h) - % of memory accesses satisfied by the Cache memory system

Te : effective memory access time in the Cache memory system
Tc : cache access time
Tm : main memory access time

Te = h·Tc + (1 - h)·Tm

Example: Tc = 0.4 µs, Tm = 1.2 µs, h = 0.85

Te = 0.85 × 0.4 + (1 - 0.85) × 1.2 = 0.52 µs
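The computation follows directly from the formula (units follow whatever Tc and Tm are given in):

```python
def effective_access_time(h, Tc, Tm):
    """Te = h*Tc + (1 - h)*Tm."""
    return h * Tc + (1 - h) * Tm

# The worked example: h = 0.85, Tc = 0.4, Tm = 1.2
Te = effective_access_time(0.85, 0.4, 1.2)   # ~0.52
```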


ASSOCIATIVE MEMORY
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)
- Memory accessed simultaneously & in parallel on the basis of data content
- Rather than by specific Address
-Search can be done : on entire word or
Specific field within a word
- Search in parallel.
- Each cell has storage capacity and match logic

Write operation: No address is given


Memory capable of finding an empty, unused location to store the word
ASSOCIATIVE MEMORY
Hardware Organization
An m-word x n-bit associative array with an Argument register (A, n bits),
a Key register (K, n bits), and a Match register (M, m bits); Read and Write
controls and an Input port feed the array.

- Compare each word in CAM in parallel with the content of A (Argument Register)
- If CAM Word[i] = A, then M(i) = 1
- Read sequentially, accessing CAM Word(i) for each M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument in A
(only those bits of the argument that have 1's in the corresponding positions of K are compared)

Key register: masking for ‘A’
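A minimal sketch of the masked parallel compare, with words as integers (a pure software illustration of what the CAM does in hardware):

```python
def cam_match(words, argument, key):
    """Set M[i] = 1 iff word i agrees with the argument on every bit
    position where the key register holds a 1 (other bits are masked)."""
    return [1 if (w ^ argument) & key == 0 else 0 for w in words]

# key = 0b1100: only the two high bits take part in the comparison
m = cam_match([0b1010, 0b1110, 0b0110], 0b1000, 0b1100)   # [1, 0, 0]
```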


ORGANIZATION OF CAM
Words 1..m are stored in cells C(i,j); bit j of every word is compared
against argument bit Aj under key bit Kj, and each word i drives a match
line Mi.

Internal organization of a typical cell C(i,j): the bit is held in a
flip-flop F(i,j), written from the Input line under Write control; Read
drives the Output, and match logic compares F(i,j) with Aj (masked by Kj)
to feed Mi.
MATCH LOGIC
Mi = ∏j (Kj' + Aj·Fij + Aj'·Fij')
A word matches (Mi = 1) when every unmasked bit position (Kj = 1) has Fij
equal to Aj.
Back on Cache Memory
The CPU issues a 30-bit address to a 1 Gword main memory, but the cache
holds only 1 Mword, which needs only 20 address bits!
Cache Memory
A 1 Mword cache (locations 00000–FFFFF) must hold words drawn from anywhere
in a 1 Gword main memory (addresses 00000000–3FFFFFFF): this requires
Address Mapping!
MEMORY AND CACHE MAPPING
Mapping Function
Specification of correspondence between main memory blocks and cache blocks

 Associative mapping
 Direct mapping
 Set-associative mapping

Associative mapping
Any block of main memory can potentially reside in any cache block position.
This is a much more flexible mapping method.

Direct mapping
A particular block of main memory can be brought to a particular block of cache memory.
So, it is not flexible.

Set-associative mapping
Blocks of cache are grouped into sets, and the mapping allows a block of main memory to
reside in any block of a specific set.
Associative Mapping
- Any block location in Cache can store any block in memory -> most flexible
- Mapping table is implemented in an associative memory -> fast, very expensive
- Mapping table stores both the address and the content of the memory word

The CAM holds (address, data) pairs, e.g. (01000, 3450), (02777, 6710),
(22235, 1234) in octal; the 15-bit address in the argument register is
matched against the stored addresses.
Associative Mapping
The cache can hold any number of locations. A CPU address (e.g. 68212) is
compared in parallel against every stored key: 68212 → 01A6, 15830 → 0005,
08993 → 47CC, so the lookup returns data 01A6.
Keys are 15 bits and data 12 bits, and how many comparators are needed?
One per cache location.
Associative Mapping
Cache locations 00000–FFFFF hold (address key, data) pairs drawn from
arbitrary main-memory addresses (e.g. 00012000, 08000000, 15000000) in the
range 00000000–3FFFFFFF.
Direct Mapping
- Each memory block has only one place to load in Cache
- Mapping table is made of RAM instead of CAM
- An n-bit memory address consists of 2 parts: k bits of Index field and (n-k) bits of Tag field
- n-bit addresses are used to access main memory; the k-bit Index is used to access the Cache

Addressing relationships: 2^n words in main memory, 2^k words in cache.
Example: Tag (6 bits) + Index (9 bits). Main memory is 32K x 12
(address = 15 bits, octal 00000–77777, data = 12 bits); cache memory is
512 x 12 (address = 9 bits, octal 000–777, data = 12 bits).
Direct Mapping
Direct Mapping Cache Organization

Main memory                 Cache memory
address  data               Index  Tag  Data
00000    1220               000    00   1220
00777    2340               ...
01000    3450               777    02   6710
01777    4560
02000    5670
02777    6710

 Each word in cache consists of a data word and its memory tag.
 When a new word is first brought into cache, the tag bits are stored
alongside the data bits.
Operation
- CPU generates a memory request with (TAG; INDEX)
- Access Cache using INDEX; read the cache word (tag; data)
- Compare TAG of the CPU address with the tag of the cache word
- If both match -> HIT: provide Cache[INDEX](data) to CPU
- If they do not match -> MISS:
 • M[tag; INDEX] <- Cache[INDEX](data)
 • Cache[INDEX] <- (TAG; M[TAG; INDEX])
 • CPU <- Cache[INDEX](data)
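The hit/miss steps above can be sketched as a tiny simulation (field widths follow the 512-word cache example; the write-back of the displaced word is omitted for brevity):

```python
class DirectMappedCache:
    """Sketch of direct-mapped lookup: 9-bit INDEX, remaining bits are TAG."""
    def __init__(self, index_bits=9):
        self.index_bits = index_bits
        self.lines = {}                      # index -> (tag, data)

    def read(self, address, memory):
        index = address & ((1 << self.index_bits) - 1)   # INDEX field
        tag = address >> self.index_bits                 # TAG field
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:        # tags match -> HIT
            return entry[1], "hit"
        data = memory[address]               # MISS: fetch M[TAG; INDEX]
        self.lines[index] = (tag, data)      # Cache[INDEX] <- (TAG, data)
        return data, "miss"

mem = {0o01000: 0o3450}                      # octal values as in the figure
cache = DirectMappedCache()
```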
Direct Mapping
Example: the cache holds (tag 000, data 01A6) at index 00500, (080, 47CC) at
00900, and (150, 0005) at 01400. A CPU reference to address 000 00500
compares tag 000 with the stored tag: match -> hit. What happens when the
CPU issues 100 00500? The stored tag 000 does not match 100 -> miss.
(20-bit address; 10-bit tag, 16-bit data per cache entry)
Direct Mapping (with blocks)
• Block j of main memory maps onto block (j modulo 128) of the cache.
• Main memory: 4096 blocks (Block 0 .. Block 4095); cache: 128 blocks, each of 16 words.
• Main memory address (16 bits) = Tag (5) | Block (7) | Word (4)
  - 4 word bits: one of 16 words (each block has 16 = 2^4 words)
  - 7 block bits: point to a particular block in the cache (128 = 2^7)
  - 5 tag bits: compared with the tag bits stored at that cache location; they identify which of the 32 blocks (4096/128) that map there is actually resident
• Example: address 11101 1111111 1100
  - Tag: 11101
  - Block: 1111111 = 127, the 127th block of the cache
  - Word: 1100 = 12, the 12th word of that block
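The field extraction in the example can be checked with a short sketch that splits a 16-bit address into its tag / block / word fields (5 + 7 + 4 bits), using the address from the slide:

```python
# Split a 16-bit main memory address into Tag (5) | Block (7) | Word (4).
def split_address(addr):
    word = addr & 0xF            # low 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
    tag = addr >> 11             # high 5 bits: tag
    return tag, block, word

tag, block, word = split_address(0b1110111111111100)  # 11101 1111111 1100
assert (tag, block, word) == (0b11101, 127, 12)
```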
Direct Mapping (with block) — lookup example (figure)
- Block size = 16; CPU address 000 0050 0 selects tag 000, block 0050, word 0
- Cache entry: Tag = 000, Data = 01A6 -> tags match -> Hit
- Main memory contents shown: 00500 = 01A6, 00501 = 0254 (tag 000); 00900 = 47CC, 00901 = A0B4 (tag 080); 01400 = 0005, 01401 = 5C04 (tag 150)
- Fields: 20-bit address, 10-bit tag, 16-bit data
Set Associative Mapping
- Each memory block has a set of locations in the cache into which it can be loaded

Set-associative cache with set size of two:

  Index | Tag | Data | Tag | Data
  000   | 01  | 3450 | 02  | 5670
  777   | 02  | 6710 | 00  | 2340

Operation
- The CPU generates a memory address (TAG; INDEX)
- The cache is accessed with INDEX; the cache word = (tag 0, data 0); (tag 1, data 1)
- TAG is compared with tag 0 and then tag 1
- If tag i = TAG -> Hit, CPU <- data i
- If no tag matches TAG -> Miss:
  Replace either (tag 0, data 0) or (tag 1, data 1);
  assuming (tag 0, data 0) is selected for replacement:
  M[tag 0, INDEX] <- Cache[INDEX](data 0)
  Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG, INDEX]),
  CPU <- Cache[INDEX](data 0)
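The operation above can be sketched as a two-way lookup. This is an illustrative model, not the slide's exact hardware: each index holds up to two (tag, data) pairs, and on a miss with a full set one way is replaced (FIFO order here, since the slide leaves the replacement choice open).

```python
# Two-way set-associative lookup sketch with a 9-bit index.
cache = {}  # index -> list of up to 2 (tag, data) pairs

def access_2way(address, memory, index_bits=9):
    index = address & ((1 << index_bits) - 1)
    tag = address >> index_bits
    ways = cache.setdefault(index, [])
    for t, d in ways:
        if t == tag:                   # tag i = TAG -> hit
            return d, "hit"
    data = memory[address]             # miss: fetch from main memory
    if len(ways) == 2:
        ways.pop(0)                    # set full: replace one way (oldest first)
    ways.append((tag, data))
    return data, "miss"
```

Unlike the direct-mapped version, two addresses that share an index can now reside in the cache at the same time.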
Set Associative Mapping (two blocks per set, figure)
- 4 word bits: one of 16 words (each block has 16 = 2^4 words)
- 6 set bits: point to a particular set in the cache (128/2 = 64 = 2^6 sets)
- 6 tag bits: checked to see whether the desired block is present (4096/64 = 2^6 candidate blocks per set)
- Main memory address (16 bits) = Tag (6) | Set (6) | Word (4)
- The cache holds 64 sets (Set 0 .. Set 63), each containing two tagged blocks drawn from main memory blocks 0 .. 4095
Set Associative Mapping — 2-way lookup example (figure)
- CPU address 000 00500: both ways of the indexed set are read out
- Way 1: Tag1 = 000, Data1 = 01A6; Way 2: Tag2 = 010, Data2 = 0721
- Both stored tags are compared with the address tag in parallel: Tag1 matches -> Hit (data 01A6); Tag2 does not
- Fields: 20-bit address; each way stores a 10-bit tag and 16-bit data
- Main memory contents shown: 00500 = (000, 01A6) and (010, 0721); 00900 = (080, 47CC) and (000, 0822); 01400 = (150, 0005) and (000, 0909)
Replacement Algorithms
• It is difficult to determine which block to evict
• The cache controller tracks references to all blocks as computation proceeds
• Tracking counters are incremented on a hit and cleared on a miss
Which Block Should Be Replaced on a Cache Miss?
Two primary strategies for fully associative and set-associative caches:
– Random – candidate blocks are randomly selected
  Some systems generate pseudo-random block numbers to get reproducible
  behavior, which is useful for debugging
– LRU (Least Recently Used) – to reduce the chance of throwing out information
  that will be needed again soon, the block replaced is the least recently used one
Replacement Algorithms
• For Associative & Set-Associative Cache
Which location should be emptied when the cache
is full and a miss occurs?
 First In First Out (FIFO)
 Least Recently Used (LRU)
• Distinguish an Empty location from a Full one
 Valid Bit
Replacement Algorithms — FIFO example (4 frames, first in: A, then B, C, D)

CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Miss Hit  Hit  Miss

Cache contents after each reference:
  A / A B / A B C / A B C / A B C D / E B C D / E A C D / E A C D / E A C D / E A F D

Hit Ratio = 3 / 10 = 0.3
Replacement Algorithms — LRU example (4 frames)

CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Hit  Hit  Hit  Miss

Cache contents after each reference (most- to least-recently used):
  A / B A / C B A / A C B / D A C B / E D A C / A E D C / D A E C / C D A E / F C D A

Hit Ratio = 4 / 10 = 0.4
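The two traces above can be replayed in a few lines to confirm the hit ratios. This sketch uses an `OrderedDict` whose insertion order stands in for frame age; the only difference between the policies is whether a hit refreshes the entry's position.

```python
# Replay a reference string under FIFO or LRU and report the hit ratio.
from collections import OrderedDict

def hit_ratio(refs, frames=4, policy="FIFO"):
    cache = OrderedDict()   # order: oldest/least-recent first
    hits = 0
    for r in refs:
        if r in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(r)       # LRU: a hit makes r most recent
        else:
            if len(cache) == frames:
                cache.popitem(last=False)  # evict oldest (FIFO) / least recent (LRU)
            cache[r] = True
    return hits / len(refs)

refs = list("ABCADEADCF")
assert hit_ratio(refs, policy="FIFO") == 0.3
assert hit_ratio(refs, policy="LRU") == 0.4
```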


CACHE WRITE
Write-Through
When writing into memory:
  If Hit, both the cache and main memory are written in parallel
  If Miss, only main memory is written
  For a read miss, the missing word may simply be overlaid onto a cache entry

Memory is always up to date
-> Important when the CPU and DMA I/O are both executing
Slow, because every write incurs the memory access time

Write-Back (Copy-Back)
When writing into memory:
  If Hit, only the cache is written
  If Miss, the missing block is brought into the cache and the write is performed there
  For a read miss, the candidate (modified) block must first be written back to memory

Memory is not always up to date
(the same item in cache and memory may have different values)
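The contrast between the two policies on a write hit can be sketched as follows; this is a simplified model (dicts for cache and memory, a `dirty` set standing in for per-block dirty bits), not a full cache simulator:

```python
# Write-through: both levels updated on a write hit.
def write_through(cache, memory, addr, value):
    if addr in cache:
        cache[addr] = value   # update the cached copy
    memory[addr] = value      # memory is always updated

# Write-back: only the cache is written; memory is updated later,
# when the dirty block is evicted.
def write_back(cache, memory, dirty, addr, value):
    cache[addr] = value       # only the cache is written
    dirty.add(addr)           # mark the block for eventual write-back
```

After a write-back write, the cache and memory disagree until the block is written back, which is exactly the inconsistency the slide warns about for DMA I/O.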
VIRTUAL MEMORY

Gives the programmer the illusion that the system has a very large memory,
even though the computer actually has a relatively small main memory.

Address Space (Logical) and Memory Space (Physical)
- Virtual (logical) address: the address generated by programs; these form the address space
- Physical address: the actual main memory address; these form the memory space
- A mapping translates each virtual address into a physical address


Address Mapping
A memory mapping table translates each virtual address into a physical address:

  virtual address -> [address mapping table] -> physical address -> main memory

- The virtual address register supplies the virtual address to the mapping table
- The table produces the physical address, which is placed in the main memory address register
- Data moves through the memory table buffer register and the main memory buffer register
ADDRESS MAPPING
The address space and memory space are each divided into fixed-size groups of
words: pages (in the address space) and blocks (in the memory space).

Example (page size = 1K words):
- Address space: N = 8K = 2^13 -> 8 pages (Page 0 .. Page 7)
- Memory space:  M = 4K = 2^12 -> 4 blocks (Block 0 .. Block 3)
Organization of the memory page table in a paged system (figure):
- Virtual address (13 bits) = page number (3 bits) | line number (10 bits), e.g. 101 | 0101010011
- Memory page table (table address : block no., presence bit):
    000 : -,  0
    001 : 11, 1
    010 : 00, 1
    011 : -,  0
    100 : -,  0
    101 : 01, 1
    110 : 10, 1
    111 : -,  0
- Page 101 maps to block 01, so the main memory address register receives 01 0101010011; the word reaches the CPU via the MDR
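The page-table walk in the figure can be sketched directly; the table entries below are the ones shown (pages with presence bit 0 are simply absent from the dict):

```python
# 13-bit virtual address = 3-bit page number | 10-bit line number.
PAGE_TABLE = {0b001: 0b11, 0b010: 0b00, 0b101: 0b01, 0b110: 0b10}  # page -> block

def translate(vaddr):
    page = vaddr >> 10            # high 3 bits: page number
    line = vaddr & 0x3FF          # low 10 bits: line number, carried over unchanged
    if page not in PAGE_TABLE:    # presence bit = 0
        raise LookupError("page fault")
    return (PAGE_TABLE[page] << 10) | line

# The figure's example: page 101, line 0101010011 -> block 01.
assert translate(0b1010101010011) == 0b010101010011
```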
ASSOCIATIVE MEMORY PAGE TABLE
Assume that
  Number of blocks in memory = m
  Number of pages in the virtual address space = n

Page table
- Straightforward design -> an n-entry table in memory
  Inefficient storage space utilization
  <- n - m entries of the table are empty

- A more efficient method is an m-entry page table

- The page table is made of an associative memory of m words, each holding (page number : block number)
Associative memory page table (figure):
- Virtual address = page number (101) | line number; the page number is placed in the argument register, and the key register masks the comparison to the page-number field
- Associative memory entries (page no. : block no.):
    001 : 11
    010 : 00
    101 : 01
    110 : 10
- Searching for page 101 returns block 01
Page Fault
The referenced page number cannot be found in the page table
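The associative search can be sketched as a scan over the m entries; in hardware all keys are compared at once, while this model compares them one by one. The entries are the ones from the figure.

```python
# m-entry associative page table: list of (page number, block number) pairs.
TABLE = [(0b001, 0b11), (0b010, 0b00), (0b101, 0b01), (0b110, 0b10)]

def lookup(page):
    for key, block in TABLE:   # hardware compares all keys in parallel
        if key == page:
            return block
    return None                # no entry matched -> page fault

assert lookup(0b101) == 0b01
assert lookup(0b011) is None   # page not resident
```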
PAGE REPLACEMENT ALGORITHMS

LRU Implementation Methods

• Counters
- Each page table entry has a time-of-use register
- A counter is incremented on every memory reference and copied into the register of the referenced page
- The page with the smallest value in its time-of-use register is replaced

• Stack
- A stack of page numbers is maintained
- Whenever a page is referenced, its page number is removed from the stack and
  pushed on top
- The least recently used page number is at the bottom
Reference string: 4 7 0 1 2 7 1

Stack (top to bottom) after 4 7 0 1 2:   2 1 0 7 4
After the next reference to 7:           7 2 1 0 4
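The stack method is short enough to write out directly; this sketch keeps the stack as a Python list with the bottom (LRU victim) at index 0:

```python
# LRU via the stack method: each reference moves the page to the top.
def lru_stack(refs):
    stack = []                 # index 0 = bottom (least recent), end = top
    for page in refs:
        if page in stack:
            stack.remove(page) # pull the page out of its old position
        stack.append(page)     # referenced page goes on top
    return stack

# Snapshots matching the figure (listed bottom -> top):
assert lru_stack([4, 7, 0, 1, 2]) == [4, 7, 0, 1, 2]
assert lru_stack([4, 7, 0, 1, 2, 7]) == [4, 0, 1, 2, 7]
```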
TIMING AND CONTROL
Control unit of the basic computer (figure):
- Instruction register (IR), bits 15 .. 0
- Bits 12-14 feed a 3x8 decoder producing operation signals D0 .. D7
- Bit 15 supplies the mode bit I
- A 4-bit sequence counter (SC), with Increment (INR) and Clear (CLR) inputs driven by the clock, feeds a 4x16 decoder producing timing signals T0 .. T15
- D0 .. D7, I, T0 .. T15, and other inputs enter the control logic gates, which generate the control outputs
Control unit implementation


• Hardwired Implementation
• Microprogrammed Implementation
