
EC 208
COMPUTER ORGANIZATION AND MICROPROCESSOR

Computer Abstractions and Technology


What You Have to Learn
• How programs are translated into the machine language
 And how the hardware executes them
• The hardware/software interface
• What determines program performance
 And how it can be improved
• How hardware designers improve performance
• What is parallel processing
Understanding Performance
• Algorithm
 Determines number of operations executed
• Programming language, compiler, architecture
 Determine number of machine instructions executed per
operation
• Processor and memory system
 Determine how fast instructions are executed
• I/O system (including OS)
 Determines how fast I/O operations are executed
Below Your Program
• Application software
 Written in high-level language
• System software
 Compiler: translates HLL code to machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
• Hardware
 Processor, memory, I/O controllers
What is “Computer Architecture”?
Software layers: Application (Netscape) → Operating System (Unix; Windows 9x) → Compiler → Assembler
Instruction Set Architecture
Hardware layers: Processor, Memory, I/O system → Datapath & Control → Digital Design → Circuit Design (transistors, IC layout)

Your Learning

• Key Idea: levels of abstraction
– hide unnecessary implementation details
– help us cope with the enormous complexity of real systems
The Instruction Set: A Critical Interface

software ↔ instruction set ↔ hardware
Levels of Program Code
• High-level language
– Level of abstraction closer to problem
domain
– Provides for productivity and portability

• Assembly language
– Textual representation of instructions

• Hardware representation
– Binary digits (bits)
– Encoded instructions and data
Components of a Computer
The BIG Picture
• Same components for all kinds of computer
 Desktop, server, embedded
• Input/output includes
 User-interface devices
• Display, keyboard, mouse
 Storage devices
• Hard disk, CD/DVD, flash
 Network adapters
• For communicating with other
computers
The Von Neumann Computer

• Stored-Program Concept – storing programs as numbers – due to John von
Neumann; Eckert and Mauchly worked on engineering the concept.

• Idea: A program is written as a sequence of instructions, represented by
binary numbers. The instructions are stored in memory just as data is.
They are read one by one, decoded, and then executed by the CPU.

The data path of a typical Von Neumann machine
Response Time and Throughput
• Response time
– How long it takes to do a task
• Throughput
– Total work done per unit time
• e.g., tasks/transactions/… per hour

• How are response time and throughput affected by
– Replacing the processor with a faster version?
– Adding more processors?
• We’ll focus on response time for now…
Relative Performance
• Define Performance = 1/Execution Time
• “X is n times faster than Y”:

Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n

 Example: time taken to run a program
 10 s on A, 15 s on B
 Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
 So A is 1.5 times faster than B
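The ratio above can be sketched in a few lines (Python, for illustration only):

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y (performance = 1/execution time)."""
    return time_y / time_x

# The example above: 10 s on machine A, 15 s on machine B
n = relative_performance(10.0, 15.0)   # 1.5, so A is 1.5 times faster than B
```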
Measuring Execution Time
• Elapsed time
 Total response time, including all aspects
• Processing, I/O, OS overhead, idle time
 Determines system performance

• CPU time
 Time spent processing a given job
• Discounts I/O time, other jobs’ shares
 Different programs are affected differently by CPU and
system performance
CPU Clocking
Operation of digital hardware is governed by a constant-rate clock: in each
clock cycle the hardware performs data transfer and computation, then
updates its state.

 Clock period: duration of a clock cycle
 e.g., 250 ps = 0.25 ns = 250×10⁻¹² s
 Clock frequency (rate): cycles per second
 e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz
CPU Time

CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate

Performance improved by
 Reducing number of clock cycles (good algorithm or hardware design)
 Increasing clock rate (good technology)
 Hardware designer must often trade off clock rate against cycle count
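A quick numeric sketch of the relation (the cycle count here is a made-up figure; the 4.0 GHz rate is from the clocking slide above):

```python
def cpu_time(clock_cycles, clock_rate_hz):
    """CPU Time = CPU Clock Cycles / Clock Rate (= cycles x cycle time)."""
    return clock_cycles / clock_rate_hz

# Hypothetical program taking 10 x 10^9 cycles on a 4.0 GHz clock
t = cpu_time(10e9, 4.0e9)   # 2.5 seconds
```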
Performance Summary

The BIG Picture

CPU Time = (Instructions/Program) × (Clock cycles/Instruction) × (Seconds/Clock cycle)

Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc

A basic performance equation:

T = (N × S) / R

T – processor time required to execute a program (not total time used);
N – actual number of machine instructions executed (including those due to loops);
S – average number of clock cycles per instruction;
R – clock rate (cycles/sec)

Earlier measures –
MIPS – Millions of Instructions per sec.
MFLOPS – Million floating-point operations per sec.
CPI – Cycles per Instruction
IPC – Instructions per cycle = 1/CPI

Speedup = (Earlier execution time) / (Current execution time)
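A minimal sketch of these measures (the program figures below are invented for illustration):

```python
def exec_time(N, S, R):
    """Basic performance equation: T = N * S / R."""
    return N * S / R

def mips_rate(N, T):
    """MIPS = instruction count / (execution time x 10^6)."""
    return N / (T * 1e6)

def speedup(t_earlier, t_current):
    """Speedup = earlier execution time / current execution time."""
    return t_earlier / t_current

# Hypothetical program: N = 2x10^9 instructions, S = 2 cycles/instruction, R = 1 GHz
T = exec_time(2e9, 2, 1e9)   # 4.0 s
m = mips_rate(2e9, T)        # 500 MIPS
s = speedup(4.0, 2.0)        # 2.0x after an improvement halves the run time
```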


Basic Terminology

• Computer
A device that accepts input, processes data, stores data, and produces
output, all according to a series of stored instructions.
• Hardware
Includes the electronic and mechanical devices that process the data;
refers to the computer as well as peripheral devices.
• Software
A computer program that tells the computer how to perform particular tasks.
• Network
Two or more computers and other devices that are connected, for the
purpose of sharing data and programs.
• Peripheral devices
Used to expand the computer’s input, output and storage capabilities.
Basic Terminology
• Input
• Whatever is put into a computer system.
• Data
• Refers to the symbols that represent facts, objects, or ideas.
• Information
• The results of the computer storing data as bits and bytes; the words, numbers,
sounds, and graphics.
• Output
• Consists of the processing results produced by a computer.
• Processing
• Manipulation of the data in many ways.
• Memory
• Area of the computer that temporarily holds data waiting to be processed,
stored, or output.
• Storage
• Area of the computer that holds data on a permanent basis when it is not
immediately needed for processing.
…Key Terminology
• Control Path
• Datapath
• Out-of-order execution
• Microcontroller
• ALU, FPU, GPU etc.
• Register renaming
• CPU design
• Dataflow architecture
• Hardware description language
• Stream processing
• Pipelining
• Multi-threading
• Cache
• RISC, CISC
• Von Neumann architecture
• Instruction-level parallelism (ILP)
• Multi-core (computing)
• Addressing modes
• Superscalar
• Vector processor
• Instruction set
Multiprocessors
• Multicore microprocessors
 More than one processor per chip

• Requires explicitly parallel programming
 Compare with instruction-level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer
 Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
COMPUTER ORGANISATION AND ARCHITECTURE

• The components from which computers are built, i.e., computer organization.
• In contrast, computer architecture is the science of integrating
those components to achieve a level of functionality and
performance.
• It is as if computer organization examines the lumber, bricks,
nails, and other building material
• While computer architecture looks at the design of the house.
COMPUTER TYPES

Computers are classified based on parameters like
• Speed of operation
• Cost
• Computational power
• Type of application
DESK TOP COMPUTER

• Processing & storage units, visual display & audio units, keyboards
• Storage media - hard disks, CD-ROMs
• Eg: personal computers, used in homes and offices
• Advantage: cost-effective, easy to operate, suitable for general-purpose
educational or business applications

NOTEBOOK COMPUTER
• Compact form of personal computer (laptop)
• Advantage is portability
WORK STATIONS
• More computational power than PC
• Costlier
• Used to solve complex problems that arise in engineering applications
(graphics, CAD/CAM etc)
ENTERPRISE SYSTEM (MAINFRAME)
• More computational power
• Larger storage capacity
• Used for business data processing in large organization
• Commonly referred to as servers or supercomputers

SERVER SYSTEM
• Supports large volumes of data which frequently need to be accessed or to
be modified
• Supports request response operation
SUPER COMPUTERS
• Faster than mainframes
• Used for large-scale numerical and algorithmic calculations
• Used for aircraft design and testing, military applications and weather
forecasting
Computing Systems
Computers have two kinds of components:
• Hardware, consisting of its physical devices
(CPU, memory, bus, storage devices, ...)
• Software, consisting of the programs it has
(Operating system, applications, utilities, ...)
The Big Picture
Input and Output units connect to the Processor (Control + ALU) and Memory.
FUNCTIONAL UNITS OF COMPUTER

• Input Unit

• Output Unit

• Central processing Unit (ALU and Control Units)

• Memory

• Bus Structure
Structure - Top Level
The computer connects to peripherals and communication lines. Its top-level
components are the Central Processing Unit, Main Memory, Input/Output, and
the Systems Interconnection that links them.
Structure - The CPU
Within the CPU (connected to the system bus): Registers, the Arithmetic and
Logic Unit, the Control Unit, and an Internal CPU Interconnection linking
them.
Structure - The Control Unit
Within the Control Unit: Sequencing Logic, Control Unit Registers and
Decoders, and Control Memory, linked to the ALU and registers over the
CPU’s internal bus.
Function
ALL computer functions are:
– Data PROCESSING
– Data STORAGE
– Data MOVEMENT (data = information)
– CONTROL (coordinates how information is used)
• NOTHING ELSE!
INPUT UNIT:
• Converts the external world data to a binary format, which can be understood by CPU

• Eg: Keyboard, Mouse, Joystick etc

OUTPUT UNIT:
• Converts the binary-format data to a format that people can understand

• Eg: Monitor, Printer, LCD, LED etc

CPU
• The “brain” of the machine

• Responsible for carrying out computational tasks

• Contains ALU, CU, Registers

• ALU: performs arithmetic and logical operations

• CU: provides control signals with the proper timing, which in turn controls the execution process

• Registers: store data and results, and speed up operation


Example
Add R1, R2
• The control unit works with a reference signal called the processor clock
• The processor divides the operation into basic steps
• Each basic step is executed in one clock cycle
T1 Enable R1
T2 Enable R2
T3 Enable ALU for add op
T4
MEMORY

• Stores data, results, programs

• Two classes of storage: (i) Primary (ii) Secondary

• Two types are RAM (R/W memory) and ROM (read-only memory)

• ROM is used to store data and programs which are not going to change

• Secondary storage is used for bulk storage or mass storage


Interconnection between Processor and Memory
Registers
Registers are fast stand-alone storage locations that hold data temporarily.
Multiple registers are needed to facilitate the operation of the CPU.
 Two registers - MAR (Memory Address Register) and MDR (Memory Data
Register) - handle the data transfer between main memory and processor.
 MAR - holds addresses
 MDR - holds data
 Instruction register (IR): holds the instruction that is currently being executed
 Program counter (PC): points to the next instruction to be fetched from memory

• (PC) → (MAR): the contents of PC transferred to MAR
• (MAR) → (Address bus): select a particular memory location
• Issue RD control signal
• The instruction present in memory is read and loaded into MDR
• Placed in IR (contents transferred from MDR to IR)
Registers
 MAR - holds addresses
 MDR - holds data
 Instruction register (IR): holds the instruction that is currently being executed
 Program counter (PC): points to the next instruction to be fetched from memory

• (PC) → (MAR): the contents of PC transferred to MAR
• (MAR) → (Address bus): select a particular memory location
• Issue RD control signal
• The instruction is read and loaded into MDR
• Placed in IR
• The instruction present in IR is decoded
(the processor understands what operation it has to perform)

• Increments PC by 1, so that it points to the next instruction address

• If the data required for the operation is available in a register, it
performs the operation
• If the data is present in memory, the following sequence is performed:

• Address of the data → MAR
• (MAR) → Address bus
• Issue RD signal
• Data is read via the data bus → MDR
• From MDR, data is (a) directly routed to the ALU, or
(b) placed in a register, and the operation can be performed
• Results of the operation directed towards output device, memory or
register
Connecting
• All the units must be connected
• Different type of connection for different type
of unit
Computer modules: Memory, Input/Output, CPU
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
 Read
 Write
 Timing
Input/Output Connections
Similar to memory from computer’s viewpoint
• Output
 Receive data from computer
 Send data to peripheral
• Input
 Receive data from peripheral
 Send data to computer
• Receive control signals from computer
• Send control signals to peripherals
 e.g. spin disk
• Receive addresses from computer
 e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts
BUS STRUCTURE
Connecting CPU and memory
The CPU and memory are normally connected by three groups of connections,
each called a bus
(a group of wires which carries information from CPU to peripherals or vice versa;
a communication pathway connecting two or more devices):
 data bus,
 address bus and
 control bus

Connecting CPU and memory using three buses


PERFORMANCE

• Time taken by the system to execute a program

Parameters which influence the performance are:
• Clock speed
• Type and number of instructions available
• Average time required to execute an instruction
• Memory access time
• Power dissipation in the system
• Number of I/O devices and types of I/O devices connected
• The data transfer capacity of the bus
• Data bus: bidirectional
(Remember there is no difference between “data” and “instruction” at this level)
 Group of wires which carries data bits from processor to peripherals and vice versa
 Width is a key determinant of performance: 8, 16, 32, 64 bit

• Address bus: unidirectional
 Group of wires which carries address bits from processor to peripherals
(16, 20, 24 or more parallel signal lines)
 Identifies the source or destination of data
 Bus width determines maximum memory capacity of system
e.g. 8085 has 16-bit address bus giving 64k address space

• Control bus: bidirectional
 Group of wires which carries control signals from processor to peripherals and vice versa
 Control and timing information
• Memory read/write signal
• Interrupt request
• Clock signals
Bus Interconnection Scheme
BUS STRUCTURE
•Single bus structure: Common bus used to communicate between peripherals
and microprocessor.

INPUT MEMORY PROCESSOR OUTPUT

Single Bus Problems

Lots of devices on one bus leads to:
 Propagation delays
 Long data paths, so co-ordination of bus use can adversely affect
performance
 Bottlenecks if aggregate data transfer approaches bus capacity
Most systems use multiple buses to overcome these problems
Bus Types
Dedicated
 Separate data & address lines
Multiplexed
 Shared lines
 Address-valid or data-valid control line
 Advantage - fewer lines
 Disadvantages
• More complex control
• Lower ultimate performance
Bus Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one time
• Arbitration may be centralised or distributed
Centralised or Distributed
Arbitration
• Centralised
 Single hardware device controlling bus access
• Bus Controller
• Arbiter
 May be part of CPU or separate
• Distributed
 Each module may claim the bus
 Control logic on all modules
Memory Organization
MEMORY ORGANIZATION

• Memory Hierarchy

• Main Memory

• Auxiliary Memory

• Associative Memory

• Cache Memory

• Virtual Memory

• Memory Management Hardware


MEMORY HIERARCHY

The memory hierarchy aims to obtain the highest possible access speed while
minimizing the total cost of the memory system.

Auxiliary memory (magnetic tapes and magnetic disks) connects through an
I/O processor to main memory, while the CPU works through cache memory.
MEMORY HIERARCHY

From fastest/smallest to slowest/largest:
Register → Cache → Main Memory → Magnetic Disk → Magnetic Tape
(speed decreases and size increases moving down the hierarchy)

 The memory unit that directly communicate with CPU is called the main memory

 Devices that provide backup storage are called auxiliary memory

 The memory hierarchy system consists of all storage devices employed in a computer system.
Memory Hierarchy
• CPU logic is usually faster than main memory access time
(processing speed is limited primarily by the speed of main memory)

• The cache stores segments of programs currently being executed in the CPU
and temporary data (frequently needed in the present calculations)

• The typical access-time ratio between cache and main memory is about 1 to 7~10

• Auxiliary memory access time is usually 1000 times that of main memory
Main Memory

• Most of the main memory is made up of RAM IC chips
(but a portion of the memory may be constructed with ROM chips)

• RAM – Random Access Memory
Integrated RAM is available in two possible operating modes, static and dynamic

• ROM – Read Only Memory
Classification
• Read Only vs Read/Write
• Random Access vs Sequential Access
• Volatile vs Non-Volatile
• Static vs Dynamic
Static RAM (SRAM)
– Each cell stores bit with a six-transistor circuit.
– Data is stored as long as supply is applied.
– Fast – so used where speed is important
– use sense amps for performance
– compatible with CMOS technology
– Relatively insensitive to disturbances such as electrical noise.
– Faster (8-16 times faster) and more expensive (8-16 times expensive ) than
DRAM.
6-transistor SRAM Cell (CMOS)
Six transistors M1–M6: two cross-coupled inverters store Q and !Q, and
access transistors gated by the word line (WL) connect the cell to the bit
lines BL and !BL.
Dynamic RAM (DRAM)
 Each cell stores bit with a capacitor and transistor.
 Small cells (1 to 3 fets/cell) – so more bits/chip
 Periodic refresh required
 Sensitive to disturbances.
 Slower – so used for main memories
 Not typically compatible with CMOS technology

Typical storage capacitance has a value of 20 to 50 fF.

Data stored as charge in a capacitor can be retained only for a limited time
due to the leakage current, which eventually removes or modifies the charge.
Assuming that the voltage on the fully charged storage capacitor is V = 2.5 V,
and that the leakage current is I = 40 pA, then the time to discharge the
capacitor C = 20 fF to half of the initial voltage can be estimated as

t = C·(V/2)/I = (20 fF × 1.25 V) / 40 pA ≈ 0.6 ms

Hence every memory cell must be refreshed approximately every half
millisecond.
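The refresh estimate above can be reproduced directly (same figures as on the slide):

```python
# t = C * dV / I, with dV = V/2 (discharge to half the initial voltage)
C = 20e-15   # storage capacitance: 20 fF
V = 2.5      # fully charged voltage: 2.5 V
I = 40e-12   # leakage current: 40 pA

t = C * (V / 2) / I   # seconds until half the stored voltage is lost
# about 6.25e-4 s, i.e. ~0.6 ms, hence refresh every half millisecond or so
```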

ADVANTAGES OF DRAM:
• The size of a DRAM cell is of the order of 8F²,
where F is the smallest feature size in a given technology.
For F = 0.2 μm the size is 0.32 μm²
• No static power is dissipated for storing charge in a capacitance
• SDRAM : Synchronous DRAM
• SDR SDRAM : Single Data Rate SDRAM
• DDR SDRAM : Double Data Rate SDRAM
• DDR2 SDRAM : an evolution over DDR SDRAM
• DDR3 SDRAM : improvement over DDR2
• RLDRAM : Reduced‐latency DRAM

      Tran.    Access
      per bit  time   Persist?  Sensitive?  Cost  Applications
SRAM  6        1X     Yes       No          100X  Cache memory
DRAM  1        10X    No        Yes         1X    Main memories, frame buffers
MEMORY CAPACITY
WORD ORGANIZATION
64-cell Memory Array
Row
1 1 1 0 0 1 0 1 0
2 0 1 0 0 1 0 1 0
3 0 1 0 0 1 1 0 1
4 1 0 1 0 0 1 0 1
5 1 0 1 1 1 0 1 1
6 0 1 0 1 0 0 0 0
7 1 0 0 1 0 1 1 0
8 0 0 1 1 0 0 0 0

1 2 3 4 5 6 7 8 Column
Memory Organized as 4 x 16 and 1 x 64 Arrays

Row
1 1 1 0 0 1 1
2 0 1 0 0 2 0
3 0 1 0 0 3 0
4 1 0 1 0 4 1
5 1 0 1 1 5 1
6 0 1 0 1 6 0
⋮            ⋮       ⋮

13 1 0 1 1 61 1
14 0 0 0 0 62 0
15 0 1 1 0 63 0
16 0 0 0 0 64 0
1 2 3 4 Column 1
Block Diagram of a Read-write Memory
The address bus feeds an address decoder, whose Memory Select lines pick one
word of the memory array; the array connects to the data bus, under Read and
Write control inputs.
Memory Read Operation (8 x 8)
The MAR (here holding 100) drives the address decoder through the address
buffer, selecting one row of the 8 x 8 array. With Read asserted, the
selected word (00110001 at address 100) is placed on the data bus through
the data buffer and latched into the MDR.
Memory Write Operation
The MAR (here holding 011) selects a row via the address decoder. With Write
asserted, the MDR contents (10110010) pass through the data buffer and are
stored into the selected location.
16K x 8 Static RAM
Address lines A0–A13 (14 lines address the 16K locations), 8 data
input/output lines, and control inputs CS (chip select), WE (write enable),
and OE (output enable).
Tri State Buffer

A tristate buffer can output 3 different values:
– Logic 1 (high)
– Logic 0 (low)
– High-Impedance
Encoder

D-Latch and D-Flip Flop
• Level-triggered vs edge-triggered
• Positive-edge vs negative-edge triggered
• Race-around condition
• FF characteristic tables and excitation table
MAIN MEMORY
RAM and ROM Chips

Typical RAM chip (128 x 8): chip select inputs CS1 and CS2, Read (RD) and
Write (WR) controls, a 7-bit address AD7, and an 8-bit data bus.

CS1 CS2 RD WR  Memory function  State of data bus
0   0   x  x   Inhibit          High-impedance
0   1   x  x   Inhibit          High-impedance
1   0   0  0   Inhibit          High-impedance
1   0   0  1   Write            Input data to RAM
1   0   1  x   Read             Output data from RAM
1   1   x  x   Inhibit          High-impedance

Typical ROM chip (512 x 8): chip select inputs CS1 and CS2, a 9-bit address
AD9, and an 8-bit data bus.
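The RAM chip's truth table can be expressed as a small decoder function (a sketch; signals are encoded as in the table, with CS2 shown active-low):

```python
def ram_chip_function(cs1, cs2, rd, wr):
    """Return the memory function for the given control inputs,
    following the truth table above (all other rows inhibit the chip)."""
    if cs1 == 1 and cs2 == 0:
        if rd == 1:
            return "Read"    # row: 1 0 1 x
        if wr == 1:
            return "Write"   # row: 1 0 0 1
    return "Inhibit"         # data bus stays in high impedance
```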
Memory Subsystems Organization
Two or more memory chips can be combined to create memory with more bits
per location (two 8x2 chips can create an 8x4 memory).
Address lines: 3; bits per address: 4.

Horizontal expansion: both chips receive the same address A2A1A0; for each
of the eight locations 000–111, chip #1 supplies bits D3D2 and chip #2
supplies bits D1D0.
Memory Subsystems Organization
Two or more memory chips can be combined to create more locations
(two 8x2 chips can create a 16x2 memory).
Address lines: 4; bits per address: 2.

Vertical expansion: the high-order address bit selects the chip. A3 = 0
selects the first chip for locations 0–7 (0000–0111), and A3 = 1 selects the
second chip for locations 8–15 (1000–1111); A2A1A0 selects the location
within the chip.
MEMORY ADDRESS MAP
Pictorial representation of the assigned address space for each chip in the system

Address space assignment to each memory chip

Example: 1 KB memory = 512 bytes RAM (from 128x8 chips) and 512 bytes ROM

Memory Connection to CPU

- RAM and ROM chips are connected to a CPU through the data and address buses

- The low-order lines in the address bus select the byte within the chips, and
the other lines in the address bus select a particular chip through its chip-select inputs

RAM: 512 bytes (512x8) from 128x8 chips
Number of chips required: (512x8) / (128x8) = 4 chips

Tentative addresses: RAM 1 : 0–127
                     RAM 2 : 128–255
                     RAM 3 : 256–383
                     RAM 4 : 384–511

Number of address lines required per chip: 128 = 2⁷ → 7 lines

RAM: 512 bytes from one 512x8 chip
Number of address lines required: 512 = 2⁹ → 9 lines

Component  Hex address   Address bus
                         10 9 8 7 6 5 4 3 2 1
RAM 1      0000 - 007F    0 0 0 x x x x x x x
RAM 2      0080 - 00FF    0 0 1 x x x x x x x
RAM 3      0100 - 017F    0 1 0 x x x x x x x
RAM 4      0180 - 01FF    0 1 1 x x x x x x x
ROM        0200 - 03FF    1 x x x x x x x x x
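The address map can be checked with a small decode routine (a sketch; bit 9 of the integer address corresponds to address line 10 in the table):

```python
def select_chip(addr):
    """Select a chip for a 10-bit address per the map above:
    line 10 set -> ROM; otherwise lines 9-8 pick one of four RAM chips."""
    if addr & (1 << 9):
        return "ROM"
    return "RAM " + str((addr >> 7) + 1)

# 0x0000 -> RAM 1, 0x0080 -> RAM 2, 0x0180 -> RAM 4, 0x0200 -> ROM
```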
CONNECTION OF MEMORY TO CPU
Address lines 9 and 8 feed a 2x4 decoder whose outputs 0–3 drive the chip
selects of the four 128 x 8 RAM chips (0000–007F, 0080–00FF, 0100–017F,
0180–01FF), while address line 10 selects the 512 x 8 ROM (0200–03FF).
Lines 7–1 address the byte within a RAM chip (lines 9–1 within the ROM),
and the RD/WR controls and the data bus are common to all chips.
Memory Hierarchy Design
Registers (CPU) → Cache (one or more levels) → Main Memory → Disk Storage,
linked by a specialized bus (internal or external to the CPU), the memory
bus, and the I/O bus respectively.

Tradeoff between size, speed and cost; exploits the principle of locality.
1. Register
 Fastest memory element, but small storage; very expensive
2. Cache
 Fast and small compared to main memory; acts as a buffer between the CPU
and main memory: it contains the most recently used memory locations
(address and contents are recorded here)
3. Main memory is the RAM of the system
4. Disk storage - HDD
Memory Hierarchy Design
Comparison between different types of memory:

          Register     Cache        Memory     HDD
size      32 - 256 B   32KB - 4MB   1000 MB    500 GB
speed     1-2 ns       2-4 ns       60 ns      8 ms
$/Mbyte                $20/MB       $0.2/MB    $0.001/MB

larger, slower, cheaper →

CACHE MEMORY
Locality of Reference
- The references to memory at any given time interval tend to be confined within a localized area
- This area contains a set of information, and the membership changes gradually as time goes by

 Temporal Locality
The information which will be used in the near future is likely to be in use already.

 Spatial Locality
If a word is accessed, adjacent (nearby) words are likely to be accessed soon.
(e.g. related data items are usually stored together; instructions are executed sequentially)

Cache
- The property of Locality of Reference makes Cache memory systems work
- Cache is a fast, small-capacity memory that should hold the information
most likely to be accessed

CPU ↔ Cache memory ↔ Main memory
Cache Memory
• High speed (towards CPU speed)
• Small size (power & cost)

On a hit the CPU is served from the fast cache; on a miss the access goes on
to the slow main memory.

With a 95% hit ratio:
Access time = 0.95 × Cache + 0.05 × Mem
PERFORMANCE OF CACHE

Memory Access
 All the memory accesses are directed first to Cache
 If the word is in Cache; Access cache to provide it to CPU
 If the word is not in Cache; Bring a block (or a line) including that word
to replace a block now in Cache

Q. How can we know if the word that is required is there ?

Q. If a new block is to replace one of the old blocks, which one should
we choose ?

 Replacement algorithm
 Hit / miss
 Write-through / Write-back
 Load through
Cache Memory

CPU ↔ Cache (fast) ↔ Main Memory (slow)
Cache
• Every address reference goes first to the cache; if the desired address is
not there, then we have a cache miss; if the desired data is in the cache,
then we have a cache hit

• Most software exhibits temporal locality of access

• Transfers between main memory and cache occur at the granularity of cache
lines or cache blocks, around 32 or 64 bytes (rather than bytes or processor
words)
PERFORMANCE OF CACHE

Performance of Cache Memory System

Hit Ratio (h) - % of memory accesses satisfied by the Cache memory system

Te : effective memory access time in the Cache memory system
Tc : cache access time
Tm : main memory access time

Te = h·Tc + (1 - h)·Tm

Example: Tc = 0.4 µs, Tm = 1.2 µs, h = 0.85

Te = 0.85 × 0.4 + (1 - 0.85) × 1.2 = 0.52 µs
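The computation follows directly from the formula (units follow whatever Tc and Tm are given in):

```python
def effective_access_time(h, Tc, Tm):
    """Te = h*Tc + (1 - h)*Tm."""
    return h * Tc + (1 - h) * Tm

# The worked example: h = 0.85, Tc = 0.4, Tm = 1.2
Te = effective_access_time(0.85, 0.4, 1.2)   # ~0.52
```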


ASSOCIATIVE MEMORY
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)
- Memory accessed simultaneously & in parallel on the basis of data content
- Rather than by specific Address
-Search can be done : on entire word or
Specific field within a word
- Search in parallel.
- Each cell has storage capacity and match logic

Write operation: No address is given


Memory capable of finding an empty, unused location to store the word
ASSOCIATIVE MEMORY
Hardware Organization
An m-word x n-bit associative array with an Argument register (A, n bits),
a Key register (K, n bits), and a Match register (M, m bits); Read and Write
controls and an Input port feed the array.

- Compare each word in CAM in parallel with the content of A (Argument Register)
- If CAM Word[i] = A, then M(i) = 1
- Read sequentially, accessing CAM Word(i) for each M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument in A
(only those bits of the argument that have 1's in the corresponding positions of K are compared)

Key register: masking for ‘A’
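A minimal sketch of the masked parallel compare, with words as integers (a pure software illustration of what the CAM does in hardware):

```python
def cam_match(words, argument, key):
    """Set M[i] = 1 iff word i agrees with the argument on every bit
    position where the key register holds a 1 (other bits are masked)."""
    return [1 if (w ^ argument) & key == 0 else 0 for w in words]

# key = 0b1100: only the two high bits take part in the comparison
m = cam_match([0b1010, 0b1110, 0b0110], 0b1000, 0b1100)   # [1, 0, 0]
```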


ORGANIZATION OF CAM
Words 1..m are stored in cells C(i,j); bit j of every word is compared
against argument bit Aj under key bit Kj, and each word i drives a match
line Mi.

Internal organization of a typical cell C(i,j): the bit is held in a
flip-flop F(i,j), written from the Input line under Write control; Read
drives the Output, and match logic compares F(i,j) with Aj (masked by Kj)
to feed Mi.
MATCH LOGIC
Mi = ∏j (Kj' + Aj·Fij + Aj'·Fij')
A word matches (Mi = 1) when every unmasked bit position (Kj = 1) has Fij
equal to Aj.
Back on Cache Memory
The CPU issues a 30-bit address to a 1 Gword main memory, but the cache
holds only 1 Mword, which needs only 20 address bits!
Cache Memory
A 1 Mword cache (locations 00000–FFFFF) must hold words drawn from anywhere
in a 1 Gword main memory (addresses 00000000–3FFFFFFF): this requires
Address Mapping!
MEMORY AND CACHE MAPPING
Mapping Function
Specification of correspondence between main memory blocks and cache blocks

 Associative mapping
 Direct mapping
 Set-associative mapping

Associative mapping
Any block of main memory can potentially reside in any cache block position.
This is a much more flexible mapping method.

Direct mapping
A particular block of main memory can be brought to a particular block of cache memory.
So, it is not flexible.

Set-associative mapping
Blocks of cache are grouped into sets, and the mapping allows a block of main memory to
reside in any block of a specific set.
Associative Mapping
- Any block location in Cache can store any block in memory -> most flexible
- Mapping table is implemented in an associative memory -> fast, very expensive
- Mapping table stores both the address and the content of the memory word

The CAM holds (address, data) pairs, e.g. (01000, 3450), (02777, 6710),
(22235, 1234) in octal; the 15-bit address in the argument register is
matched against the stored addresses.
Associative Mapping
The cache can hold any number of locations. A CPU address (e.g. 68212) is
compared in parallel against every stored key: 68212 → 01A6, 15830 → 0005,
08993 → 47CC, so the lookup returns data 01A6.
Keys are 15 bits and data 12 bits, and how many comparators are needed?
One per cache location.
Associative Mapping
Cache locations 00000–FFFFF hold (address key, data) pairs drawn from
arbitrary main-memory addresses (e.g. 00012000, 08000000, 15000000) in the
range 00000000–3FFFFFFF.
Direct Mapping
- Each memory block has only one place to load in Cache
- Mapping table is made of RAM instead of CAM
- An n-bit memory address consists of 2 parts: k bits of Index field and (n-k) bits of Tag field
- n-bit addresses are used to access main memory; the k-bit Index is used to access the Cache

Addressing relationships: 2^n words in main memory, 2^k words in cache.
Example: Tag (6 bits) + Index (9 bits). Main memory is 32K x 12
(address = 15 bits, octal 00000–77777, data = 12 bits); cache memory is
512 x 12 (address = 9 bits, octal 000–777, data = 12 bits).
Direct Mapping
Direct Mapping Cache Organization

Main memory                 Cache memory
address  data               Index  Tag  Data
00000    1220               000    00   1220
00777    2340               ...
01000    3450               777    02   6710
01777    4560
02000    5670
02777    6710

 Each word in cache consists of a data word and its memory tag.
 When a new word is first brought into cache, the tag bits are stored
alongside the data bits.
Operation
- CPU generates a memory request with (TAG; INDEX)
- Access Cache using INDEX; read the cache word (tag; data)
- Compare TAG of the CPU address with the tag of the cache word
- If both match -> HIT: provide Cache[INDEX](data) to CPU
- If they do not match -> MISS:
 • M[tag; INDEX] <- Cache[INDEX](data)
 • Cache[INDEX] <- (TAG; M[TAG; INDEX])
 • CPU <- Cache[INDEX](data)
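The hit/miss steps above can be sketched as a tiny simulation (field widths follow the 512-word cache example; the write-back of the displaced word is omitted for brevity):

```python
class DirectMappedCache:
    """Sketch of direct-mapped lookup: 9-bit INDEX, remaining bits are TAG."""
    def __init__(self, index_bits=9):
        self.index_bits = index_bits
        self.lines = {}                      # index -> (tag, data)

    def read(self, address, memory):
        index = address & ((1 << self.index_bits) - 1)   # INDEX field
        tag = address >> self.index_bits                 # TAG field
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:        # tags match -> HIT
            return entry[1], "hit"
        data = memory[address]               # MISS: fetch M[TAG; INDEX]
        self.lines[index] = (tag, data)      # Cache[INDEX] <- (TAG, data)
        return data, "miss"

mem = {0o01000: 0o3450}                      # octal values as in the figure
cache = DirectMappedCache()
```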
Direct Mapping
Example: the cache holds (tag 000, data 01A6) at index 00500, (080, 47CC) at
00900, and (150, 0005) at 01400. A CPU reference to address 000 00500
compares tag 000 with the stored tag: match -> hit. What happens when the
CPU issues 100 00500? The stored tag 000 does not match 100 -> miss.
(20-bit address; 10-bit tag, 16-bit data per cache entry)
Direct Mapping (with blocks)
• Block j of main memory maps onto block (j modulo 128) of the cache.
• Main memory: 4096 blocks (Block 0 .. Block 4095); cache: 128 blocks, each of 16 words.
• Main memory address (16 bits) = Tag (5) | Block (7) | Word (4)
  - 4 word bits: one of 16 words (each block has 16 = 2^4 words)
  - 7 block bits: point to a particular block in the cache (128 = 2^7)
  - 5 tag bits: compared with the tag bits stored at that cache location; they identify which of the 32 blocks (4096/128) that map there is actually resident
• Example: address 11101 1111111 1100
  - Tag: 11101
  - Block: 1111111 = 127, the 127th block of the cache
  - Word: 1100 = 12, the 12th word of that block
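The field extraction in the example can be checked with a short sketch that splits a 16-bit address into its tag / block / word fields (5 + 7 + 4 bits), using the address from the slide:

```python
# Split a 16-bit main memory address into Tag (5) | Block (7) | Word (4).
def split_address(addr):
    word = addr & 0xF            # low 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
    tag = addr >> 11             # high 5 bits: tag
    return tag, block, word

tag, block, word = split_address(0b1110111111111100)  # 11101 1111111 1100
assert (tag, block, word) == (0b11101, 127, 12)
```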
Direct Mapping (with block) — lookup example (figure)
- Block size = 16; CPU address 000 0050 0 selects tag 000, block 0050, word 0
- Cache entry: Tag = 000, Data = 01A6 -> tags match -> Hit
- Main memory contents shown: 00500 = 01A6, 00501 = 0254 (tag 000); 00900 = 47CC, 00901 = A0B4 (tag 080); 01400 = 0005, 01401 = 5C04 (tag 150)
- Fields: 20-bit address, 10-bit tag, 16-bit data
Set Associative Mapping
- Each memory block has a set of locations in the cache into which it can be loaded

Set-associative cache with set size of two:

  Index | Tag | Data | Tag | Data
  000   | 01  | 3450 | 02  | 5670
  777   | 02  | 6710 | 00  | 2340

Operation
- The CPU generates a memory address (TAG; INDEX)
- The cache is accessed with INDEX; the cache word = (tag 0, data 0); (tag 1, data 1)
- TAG is compared with tag 0 and then tag 1
- If tag i = TAG -> Hit, CPU <- data i
- If no tag matches TAG -> Miss:
  Replace either (tag 0, data 0) or (tag 1, data 1);
  assuming (tag 0, data 0) is selected for replacement:
  M[tag 0, INDEX] <- Cache[INDEX](data 0)
  Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG, INDEX]),
  CPU <- Cache[INDEX](data 0)
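The operation above can be sketched as a two-way lookup. This is an illustrative model, not the slide's exact hardware: each index holds up to two (tag, data) pairs, and on a miss with a full set one way is replaced (FIFO order here, since the slide leaves the replacement choice open).

```python
# Two-way set-associative lookup sketch with a 9-bit index.
cache = {}  # index -> list of up to 2 (tag, data) pairs

def access_2way(address, memory, index_bits=9):
    index = address & ((1 << index_bits) - 1)
    tag = address >> index_bits
    ways = cache.setdefault(index, [])
    for t, d in ways:
        if t == tag:                   # tag i = TAG -> hit
            return d, "hit"
    data = memory[address]             # miss: fetch from main memory
    if len(ways) == 2:
        ways.pop(0)                    # set full: replace one way (oldest first)
    ways.append((tag, data))
    return data, "miss"
```

Unlike the direct-mapped version, two addresses that share an index can now reside in the cache at the same time.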
Set Associative Mapping (two blocks per set, figure)
- 4 word bits: one of 16 words (each block has 16 = 2^4 words)
- 6 set bits: point to a particular set in the cache (128/2 = 64 = 2^6 sets)
- 6 tag bits: checked to see whether the desired block is present (4096/64 = 2^6 candidate blocks per set)
- Main memory address (16 bits) = Tag (6) | Set (6) | Word (4)
- The cache holds 64 sets (Set 0 .. Set 63), each containing two tagged blocks drawn from main memory blocks 0 .. 4095
Set Associative Mapping — 2-way lookup example (figure)
- CPU address 000 00500: both ways of the indexed set are read out
- Way 1: Tag1 = 000, Data1 = 01A6; Way 2: Tag2 = 010, Data2 = 0721
- Both stored tags are compared with the address tag in parallel: Tag1 matches -> Hit (data 01A6); Tag2 does not
- Fields: 20-bit address; each way stores a 10-bit tag and 16-bit data
- Main memory contents shown: 00500 = (000, 01A6) and (010, 0721); 00900 = (080, 47CC) and (000, 0822); 01400 = (150, 0005) and (000, 0909)
Replacement Algorithms
• It is difficult to determine which block to evict
• The cache controller tracks references to all blocks as computation proceeds
• Tracking counters are incremented on a hit and cleared on a miss
Which Block Should Be Replaced on a Cache Miss?
Two primary strategies for fully associative and set-associative caches:
– Random – candidate blocks are randomly selected
  Some systems generate pseudo-random block numbers to get reproducible
  behavior, which is useful for debugging
– LRU (Least Recently Used) – to reduce the chance of throwing out information
  that will be needed again soon, the block replaced is the least recently used one
Replacement Algorithms
• For Associative & Set-Associative Cache
Which location should be emptied when the cache
is full and a miss occurs?
 First In First Out (FIFO)
 Least Recently Used (LRU)
• Distinguish an Empty location from a Full one
 Valid Bit
Replacement Algorithms — FIFO example (4 frames, first in: A, then B, C, D)

CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Miss Hit  Hit  Miss

Cache contents after each reference:
  A / A B / A B C / A B C / A B C D / E B C D / E A C D / E A C D / E A C D / E A F D

Hit Ratio = 3 / 10 = 0.3
Replacement Algorithms — LRU example (4 frames)

CPU reference:  A    B    C    A    D    E    A    D    C    F
Result:         Miss Miss Miss Hit  Miss Miss Hit  Hit  Hit  Miss

Cache contents after each reference (most- to least-recently used):
  A / B A / C B A / A C B / D A C B / E D A C / A E D C / D A E C / C D A E / F C D A

Hit Ratio = 4 / 10 = 0.4
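The two traces above can be replayed in a few lines to confirm the hit ratios. This sketch uses an `OrderedDict` whose insertion order stands in for frame age; the only difference between the policies is whether a hit refreshes the entry's position.

```python
# Replay a reference string under FIFO or LRU and report the hit ratio.
from collections import OrderedDict

def hit_ratio(refs, frames=4, policy="FIFO"):
    cache = OrderedDict()   # order: oldest/least-recent first
    hits = 0
    for r in refs:
        if r in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(r)       # LRU: a hit makes r most recent
        else:
            if len(cache) == frames:
                cache.popitem(last=False)  # evict oldest (FIFO) / least recent (LRU)
            cache[r] = True
    return hits / len(refs)

refs = list("ABCADEADCF")
assert hit_ratio(refs, policy="FIFO") == 0.3
assert hit_ratio(refs, policy="LRU") == 0.4
```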


CACHE WRITE
Write-Through
When writing into memory:
  If Hit, both the cache and main memory are written in parallel
  If Miss, only main memory is written
  For a read miss, the missing word may simply be overlaid onto a cache entry

Memory is always up to date
-> Important when the CPU and DMA I/O are both executing
Slow, because every write incurs the memory access time

Write-Back (Copy-Back)
When writing into memory:
  If Hit, only the cache is written
  If Miss, the missing block is brought into the cache and the write is performed there
  For a read miss, the candidate (modified) block must first be written back to memory

Memory is not always up to date
(the same item in cache and memory may have different values)
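The contrast between the two policies on a write hit can be sketched as follows; this is a simplified model (dicts for cache and memory, a `dirty` set standing in for per-block dirty bits), not a full cache simulator:

```python
# Write-through: both levels updated on a write hit.
def write_through(cache, memory, addr, value):
    if addr in cache:
        cache[addr] = value   # update the cached copy
    memory[addr] = value      # memory is always updated

# Write-back: only the cache is written; memory is updated later,
# when the dirty block is evicted.
def write_back(cache, memory, dirty, addr, value):
    cache[addr] = value       # only the cache is written
    dirty.add(addr)           # mark the block for eventual write-back
```

After a write-back write, the cache and memory disagree until the block is written back, which is exactly the inconsistency the slide warns about for DMA I/O.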
VIRTUAL MEMORY

Gives the programmer the illusion that the system has a very large memory,
even though the computer actually has a relatively small main memory.

Address Space (Logical) and Memory Space (Physical)
- Virtual (logical) address: the address generated by programs; these form the address space
- Physical address: the actual main memory address; these form the memory space
- A mapping translates each virtual address into a physical address


Address Mapping
A memory mapping table translates each virtual address into a physical address:

  virtual address -> [address mapping table] -> physical address -> main memory

- The virtual address register supplies the virtual address to the mapping table
- The table produces the physical address, which is placed in the main memory address register
- Data moves through the memory table buffer register and the main memory buffer register
ADDRESS MAPPING
The address space and memory space are each divided into fixed-size groups of
words: pages (in the address space) and blocks (in the memory space).

Example (page size = 1K words):
- Address space: N = 8K = 2^13 -> 8 pages (Page 0 .. Page 7)
- Memory space:  M = 4K = 2^12 -> 4 blocks (Block 0 .. Block 3)
Organization of the memory page table in a paged system (figure):
- Virtual address (13 bits) = page number (3 bits) | line number (10 bits), e.g. 101 | 0101010011
- Memory page table (table address : block no., presence bit):
    000 : -,  0
    001 : 11, 1
    010 : 00, 1
    011 : -,  0
    100 : -,  0
    101 : 01, 1
    110 : 10, 1
    111 : -,  0
- Page 101 maps to block 01, so the main memory address register receives 01 0101010011; the word reaches the CPU via the MDR
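The page-table walk in the figure can be sketched directly; the table entries below are the ones shown (pages with presence bit 0 are simply absent from the dict):

```python
# 13-bit virtual address = 3-bit page number | 10-bit line number.
PAGE_TABLE = {0b001: 0b11, 0b010: 0b00, 0b101: 0b01, 0b110: 0b10}  # page -> block

def translate(vaddr):
    page = vaddr >> 10            # high 3 bits: page number
    line = vaddr & 0x3FF          # low 10 bits: line number, carried over unchanged
    if page not in PAGE_TABLE:    # presence bit = 0
        raise LookupError("page fault")
    return (PAGE_TABLE[page] << 10) | line

# The figure's example: page 101, line 0101010011 -> block 01.
assert translate(0b1010101010011) == 0b010101010011
```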
ASSOCIATIVE MEMORY PAGE TABLE
Assume that
  Number of blocks in memory = m
  Number of pages in the virtual address space = n

Page table
- Straightforward design -> an n-entry table in memory
  Inefficient storage space utilization
  <- n - m entries of the table are empty

- A more efficient method is an m-entry page table

- The page table is made of an associative memory of m words, each holding (page number : block number)
Associative memory page table (figure):
- Virtual address = page number (101) | line number; the page number is placed in the argument register, and the key register masks the comparison to the page-number field
- Associative memory entries (page no. : block no.):
    001 : 11
    010 : 00
    101 : 01
    110 : 10
- Searching for page 101 returns block 01
Page Fault
The referenced page number cannot be found in the page table
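The associative search can be sketched as a scan over the m entries; in hardware all keys are compared at once, while this model compares them one by one. The entries are the ones from the figure.

```python
# m-entry associative page table: list of (page number, block number) pairs.
TABLE = [(0b001, 0b11), (0b010, 0b00), (0b101, 0b01), (0b110, 0b10)]

def lookup(page):
    for key, block in TABLE:   # hardware compares all keys in parallel
        if key == page:
            return block
    return None                # no entry matched -> page fault

assert lookup(0b101) == 0b01
assert lookup(0b011) is None   # page not resident
```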
PAGE REPLACEMENT ALGORITHMS

LRU Implementation Methods

• Counters
- Each page table entry has a time-of-use register
- A counter is incremented on every memory reference and copied into the register of the referenced page
- The page with the smallest value in its time-of-use register is replaced

• Stack
- A stack of page numbers is maintained
- Whenever a page is referenced, its page number is removed from the stack and
  pushed on top
- The least recently used page number is at the bottom
Reference string: 4 7 0 1 2 7 1

Stack (top to bottom) after 4 7 0 1 2:   2 1 0 7 4
After the next reference to 7:           7 2 1 0 4
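The stack method is short enough to write out directly; this sketch keeps the stack as a Python list with the bottom (LRU victim) at index 0:

```python
# LRU via the stack method: each reference moves the page to the top.
def lru_stack(refs):
    stack = []                 # index 0 = bottom (least recent), end = top
    for page in refs:
        if page in stack:
            stack.remove(page) # pull the page out of its old position
        stack.append(page)     # referenced page goes on top
    return stack

# Snapshots matching the figure (listed bottom -> top):
assert lru_stack([4, 7, 0, 1, 2]) == [4, 7, 0, 1, 2]
assert lru_stack([4, 7, 0, 1, 2, 7]) == [4, 0, 1, 2, 7]
```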
TIMING AND CONTROL
Control unit of the basic computer (figure):
- Instruction register (IR), bits 15 .. 0
- Bits 12-14 feed a 3x8 decoder producing operation signals D0 .. D7
- Bit 15 supplies the mode bit I
- A 4-bit sequence counter (SC), with Increment (INR) and Clear (CLR) inputs driven by the clock, feeds a 4x16 decoder producing timing signals T0 .. T15
- D0 .. D7, I, T0 .. T15, and other inputs enter the control logic gates, which generate the control outputs
Control unit implementation


• Hardwired Implementation
• Microprogrammed Implementation
