
RAM (Random Access Memory)

Speaker: Lung-Sheng Chien

Reference:
[1] Bruce Jacob, Spencer W. Ng, David T. Wang, Memory Systems: Cache, DRAM, Disk
[2] Hideo Sunami, The invention and development of the first trench-capacitor DRAM cell, http://www.cmoset.com/uploads/4.1-08.pdf
[3] JEDEC Standard: DDR2 SDRAM Specification
[4] John P. Uyemura, Introduction to VLSI Circuits and Systems
[5] Benson, University Physics
OutLine

• Preliminary
- parallel-plate capacitor
- RC circuit
- MOSFET

• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
SRAM

Typical PC organization:
cache uses SRAM; main memory uses DRAM.

Basic organization of DRAM internals


DRAM versus SRAM
                 | Dynamic RAM | Static RAM
Cost             | Low         | High
Speed            | Slow        | Fast
# of transistors | 1           | 6
Density          | High        | Low
Target           | Main memory | Cache

Random access: each location in memory has a unique address. The time to access
a given location is independent of the sequence of prior accesses and is constant.

DRAM cell
( 1T1C cell )

(Figure: DRAM 1T1C cell versus SRAM cell)

Question 1: what is a capacitor?
Question 2: what is a transistor?
Electric flux
Electric flux = number of field lines passing through a surface

1. For a uniform electric field, electric flux is defined by $\Phi_E = \vec{E} \cdot \vec{A}$.

2. If the surface is not flat or the field is not uniform, one must sum the
   contributions of all tiny elements of area:
   $\Phi_E \approx \vec{E}_1 \cdot \Delta\vec{A}_1 + \vec{E}_2 \cdot \Delta\vec{A}_2 + \cdots = \sum_j \vec{E}_j \cdot \Delta\vec{A}_j \to \int \vec{E} \cdot d\vec{A}$
Gauss's Law

For a closed surface enclosing no charge, flux leaving the surface equals flux
entering the surface: the net flux is 0, i.e. $\Phi_E = \oint \vec{E} \cdot d\vec{A} = 0$.

Gauss's Law: the net flux through a closed surface is proportional to the net
charge enclosed by the surface:

$\Phi_E = \oint \vec{E} \cdot d\vec{A} = \dfrac{Q_{enc}}{\varepsilon_0}$

$Q_{enc}$ = net charge enclosed by the closed surface
$\varepsilon_0 = 8.85 \times 10^{-12}\ \mathrm{C^2/(N \cdot m^2)}$ = permittivity of free space
Conductor

• When a net charge is added to a conductor, free electrons redistribute
  themselves in a short time (~1 ps) such that the internal electric field is 0.
• If we draw a Gaussian surface (dashed line) inside a conductor, zero flux
  implies zero charge inside the conductor.

$1\ \mathrm{ps} = 10^{-12}\ \mathrm{s}$

Gaussian pillbox
Example: infinite conducting plate

$\Phi_E = \oint \vec{E} \cdot d\vec{A} = E_{upper} A_{upper} + E_{down} A_{down} = EA$

$Q_{enc} = A\sigma$, where $\sigma$ = surface charge density

$\Phi_E = \oint \vec{E} \cdot d\vec{A} = \dfrac{Q_{enc}}{\varepsilon_0} \;\Rightarrow\; E = \dfrac{\sigma}{\varepsilon_0}$
Capacitor: parallel plate [1]

Consider two parallel metal plates with total charge Q on each plate (+Q and -Q
respectively). The charge spreads over both faces of each plate, so

surface charge density $\sigma = \dfrac{Q/2}{A}$

Each charged plate contributes a field of magnitude $\dfrac{\sigma}{\varepsilon_0}$; outside the
plates the contributions cancel, while between the plates (separation d) they add:

$E = \dfrac{2\sigma}{\varepsilon_0}$
Capacitor: parallel plate [2]

In most cases we don't care about the thickness of the plates. For simplicity we
may assume each plate has no thickness (a flat sheet) with total charge Q, so
the definition of the surface charge density is different:

surface charge density $\sigma = \dfrac{Q}{A}$

Between the sheets (separation d): $E = \dfrac{\sigma}{\varepsilon_0}$

For a single sheet, Gauss's law on a pillbox with faces on both sides gives
$\Phi_E = \oint \vec{E} \cdot d\vec{A} = E_1 A_1 + E_2 A_2 = \dfrac{Q}{\varepsilon_0} \;\Rightarrow\; E_1 = E_2 = \dfrac{\sigma}{2\varepsilon_0}$
Capacitance [1]

Kirchhoff's voltage law: $V = V_R + V_C$, where $V_R = I(t) \cdot R$, $V_C = \dfrac{Q(t)}{C}$, $I(t) = \dfrac{dQ(t)}{dt}$

(Figure: voltage source V driving a resistor R in series with a capacitor C)

At t = 0 there is no charge on the capacitor:
$V_C = 0$, $I(0) = \dfrac{V}{R}$: the current charges the capacitor.

For t > 0 the capacitor has some charge:
$V_C > 0$, $I(t) < \dfrac{V}{R}$: the current still charges the capacitor.

For $t \gg RC$ the capacitor contains its maximum charge:
$V_C = V$, $I(t) = 0$: the capacitor no longer charges.
Capacitance [2]

$E = \dfrac{\sigma}{\varepsilon_0}$, $V_C = Ed$

Capacitance is defined by $C = \dfrac{Q}{V_C} = \dfrac{\varepsilon_0 A}{d}$

Capacitance is the capability of storing charge.

The electric field is not uniform near the edges; this is called the fringe field.

1. $C \propto \varepsilon_0$: from Gauss's law $\Phi_E = \oint \vec{E} \cdot d\vec{A} = \dfrac{Q_{enc}}{\varepsilon_0}$ we have $E \propto \dfrac{1}{\varepsilon_0}$,
   so $V_C = Ed \propto \dfrac{1}{\varepsilon_0}$ and $C = Q/V_C \propto \varepsilon_0$.

2. $C \propto \dfrac{1}{d}$: if we fix the total charge Q and area A, then $\sigma = \dfrac{Q}{A}$ is fixed
   $\Rightarrow E = \dfrac{\sigma}{\varepsilon_0}$ is fixed $\Rightarrow V = Ed \propto d$.

3. $C \propto A$: if we fix the potential difference V and spacing d, then $E = \dfrac{V}{d}$ is fixed
   $\Rightarrow \sigma$ is fixed due to $E = \dfrac{\sigma}{\varepsilon_0}$ $\Rightarrow Q = \sigma A \propto A$.
Capacitance [3]

Suppose we add an insulator between the parallel metal plates; what happens to
the capacitor?

A dipole: charges $+q$ and $-q$ separated by distance d, with dipole moment $\vec{p} = q\vec{d}$.

With no charge on the capacitor, nothing happens. When charge is stored on the
capacitor, the electric field separates positive and negative charge inside the
insulator:

$E_0$: field produced by the charge on the capacitor
$E_i$: field induced by the separated charge of the insulator
$E_D$: net field within the insulator (dielectric)
Capacitance [4]

dipole moment $\vec{p} = q\vec{d}$; polarization $\vec{P} = \dfrac{\text{dipole moment}}{\text{unit volume}}$

Constitutive equation: $\vec{P} = \varepsilon_0 \chi_e \vec{E}_{total}$, where $\chi_e$ is the electric susceptibility.

$\vec{E}_{total} = \vec{E}_{ext} - \dfrac{\vec{P}}{\varepsilon_0} \;\Rightarrow\; \vec{E}_{total} = \dfrac{1}{1+\chi_e}\vec{E}_{ext} \equiv \dfrac{1}{\varepsilon_r}\vec{E}_{ext}$, where $\varepsilon_r$ is the dielectric constant.

Material                 | Dielectric constant | Material                 | Dielectric constant
vacuum                   | 1                   | Benzene                  | 2.28
Silicon dioxide          | 3.9                 | Diamond                  | 5.7
Ta2O5                    | 25                  | Salt                     | 5.9
BST                      | >200                | Silicon                  | 11.8
TiO2 (Titanium dioxide)  | 85                  | Methanol                 | 33
ZrO2                     | 23                  | SrZrO3                   | 30
Al2O3                    | 9.1                 | La2O3 (Lanthanum oxide)  | 16
HfO2 (Hafnium oxide)     | 25                  | water                    | 80.1
BaTiO3 (Barium titanate) | 3000~8000           | KTaNbO3                  | 34000
Capacitance [5]

Without dielectric: $E_{ext} = \dfrac{\sigma}{\varepsilon_0}$, $V_C = E_{ext} d$, $C_0 = \dfrac{Q}{V_C} = \dfrac{\varepsilon_0 A}{d}$

With dielectric: $E = \dfrac{1}{\varepsilon_r} E_{ext}$, $V_C = Ed$, $C = \dfrac{Q}{V_C} = \varepsilon_r \dfrac{\varepsilon_0 A}{d} = \varepsilon_r C_0$

1. Keeping all geometrical parameters (area A and height d) fixed, we can add an
   insulator (dielectric) to increase the capacitance of the capacitor.

2. The insulator's induced polarization cancels part of the external field, so a
   smaller voltage gap can store the same charge. In other words, the capability
   of charge storage increases, so the capacitance increases.

3. The design parameters of a capacitor (exercised in the sketch below) are:
   - area of plate: A
   - distance between the two plates: d
   - dielectric constant: εr
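
As a quick numeric check of $C = \varepsilon_r \varepsilon_0 A / d$, here is a minimal Python sketch;
the plate area and gap are illustrative values, not taken from the slides:

```python
EPS0 = 8.85e-12  # permittivity of free space, C^2/(N*m^2)

def plate_capacitance(area_m2, gap_m, eps_r=1.0):
    """Parallel-plate capacitance C = eps_r * eps_0 * A / d."""
    return eps_r * EPS0 * area_m2 / gap_m

# Illustrative example: a 1 um x 1 um plate with a 5 nm SiO2 gap (eps_r = 3.9).
c = plate_capacitance(area_m2=1e-12, gap_m=5e-9, eps_r=3.9)
print(f"C = {c * 1e15:.2f} fF")  # ~6.91 fF, the femtofarad scale quoted for gates later
```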
RC circuit

Kirchhoff's voltage law: $V = V_R + V_C$, where $V_R = I(t) \cdot R$, $V_C = \dfrac{Q(t)}{C}$, $I(t) = \dfrac{dQ(t)}{dt}$

First-order ODE: $V = R\dfrac{dQ}{dt} + \dfrac{Q}{C}$, with initial condition $Q(0) = q$.

1. Charging: $Q(0) = 0$ with $V = R\dfrac{dQ}{dt} + \dfrac{Q}{C}$ gives
   $V_C = V\left(1 - \exp\left(-\dfrac{t}{RC}\right)\right)$

2. Discharging: $Q(0) = CV$ with $0 = R\dfrac{dQ}{dt} + \dfrac{Q}{C}$ gives
   $V_C = V\exp\left(-\dfrac{t}{RC}\right)$

Typical time: $T = RC$ (RC time constant)

For the discharging case, when $t = RC$, $V_C = 0.37V$.
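
A small Python sketch of both solutions (component values are illustrative); it
confirms that the discharging capacitor sits at 37% of V after one time constant:

```python
import math

def vc_charging(V, R, C, t):
    """Capacitor voltage while charging from empty: V * (1 - exp(-t/RC))."""
    return V * (1.0 - math.exp(-t / (R * C)))

def vc_discharging(V, R, C, t):
    """Capacitor voltage while discharging from V: V * exp(-t/RC)."""
    return V * math.exp(-t / (R * C))

R, C, V = 1e3, 1e-6, 5.0   # 1 kOhm, 1 uF, 5 V (illustrative)
tau = R * C                # RC time constant = 1 ms
print(vc_discharging(V, R, C, tau) / V)  # 0.3679 -> V_C = 0.37 V at t = RC
```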
FET (Field-Effect Transistor)
CMOS inverter

(Figure: logical symbol, truth table, and current flow of the inverter)

Truth table: $x = 0 \Rightarrow \bar{x} = 1$; $x = 1 \Rightarrow \bar{x} = 0$
MOSFET (Metal-Oxide-Semiconductor FET) [1]

(Figure: top view showing the polysilicon (poly) gate over SiO2, and pFET side view)

L: channel length, also called the feature size, down to 45 nm so far

http://ezphysics.nchu.edu.tw/prophys/electron/lecturenote/7_5.pdf
MOSFET [2]

(Figure: nFET and pFET cross sections, with G (gate), S (source), D (drain))

Typical oxide thickness $t_{ox} = 5\ \mathrm{nm}$

Typical gate capacitance $C_G \sim \mathrm{fF}$ (femtofarad, $10^{-15}$ F)
MOSFET operation [1]

Zero gate voltage: open switch.

(Figure: n+ source and drain in a p substrate, channel width W)

The n+ / p / n+ structure forms two back-to-back pn junctions: a pn junction
conducts forward current but blocks reverse current, so whichever way the
voltage is applied, one junction blocks and no current flows.
MOSFET operation [2]

Positive gate voltage: closed switch.

(Figure: electron channel induced under the gate)

Current flows through the thin electron channel from source to drain.

For the pFET the polarity is reversed: a high gate voltage gives an open
switch, a negative (low) gate voltage gives a closed switch.
CMOSFET layers
MOSFET layers in an n-well process

Metal interconnect layers


OutLine
• Preliminary
• DRAM cell
- 1T1C structure
- trench capacitor, stack capacitor
- array structure
- sense amplifier
- read / write operation

• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
DRAM cell

DRAM cell = cell transistor + storage capacitor

(Figure: equivalent RC circuits)
1. Charging: the bitline drives the storage capacitor through the access transistor.
2. Leakage: the stored charge gradually drains back out through the resistance.
Scaling of memory cell and die size of DRAM
The storage capacitance should be kept constant despite cell scaling, to
provide adequate operational margin with a sufficient signal-to-noise ratio.

To solve the capacitance problem during process scaling, i.e., increasing the
parallel-plate area without enlarging the cell, two process approaches keep the
capacitance above the acceptable value: the trench capacitor and the stack
capacitor.
Popular models of DRAM cell

(Figure: stack capacitor and trench capacitor structures)
A scaling limit of capacitor structure
The dielectric film should be physically thin enough not to fill up the trench:

F: feature size
Ti: dielectric film thickness
2Ti < F

(Figures: cross-section of storage node; DRAM capacity (bits/die))

After K. Itoh, H. Sunami, K. Nakazato, and M. Horiguchi, ECS Spring Meeting, May 4, 1998
Objective: decrease the feature size to increase the density of DRAM cells.

Material                 | Dielectric constant | Material                  | Dielectric constant
Silicon dioxide          | 3.9                 | Al2O3                     | 9.1
Ta2O5                    | 25                  | La2O3 (Lanthanum oxide)   | 16
TiO2 (Titanium dioxide)  | 85                  | BaTiO3 (Barium titanate)  | 3000~8000
ZrO2                     | 23                  | SrZrO3                    | 30
HfO2 (Hafnium oxide)     | 25                  | KTaNbO3                   | 34000
Read operation in DRAM [1]

Suppose a DRAM cell holds a high voltage (data value 1) in its capacitor. In a
read operation, the address line (word line) is selected and the value of the
capacitor is extracted: current flows out of the cell (discharging).

1. Precharge the bitline to the reference voltage (transistor off): the sense
   amplifier sees Vref.

2. Open the transistor (word line is selected): the capacitor shares its charge
   with the bitline, raising the bitline to Vref + ∆V.

∆V > 0: the sense amplifier sets the bitline to 1.

capacitance of storage capacitor : capacitance of bitline = 1 : 10
Read operation in DRAM [2]

3. Data restoration: with the transistor still open, the sense amplifier drives
   the bitline to Vdd, and current flows back into the capacitor (charging).

4. Turn off the transistor, completing one read operation.

When data is read out, the capacitor discharges, so it cannot be read again;
hence data restoration is necessary.

Question 3: what do you think of the claim "if the transistor is off, then the
capacitor is isolated, and no leakage current flows out"?
DRAM array structure

Open bitline: area per cell = 6F². Folded bitline: area per cell = 8F².

A differential sense amplifier uses a pair of bitlines to sense the voltage
value in a DRAM cell.
Functionality of sense amplifier
• Senses the minute change in voltage:
  - the access transistor is turned on
  - the storage capacitor places its charge on the bitline
  - the sense amplifier compares the voltage on that bitline against a
    reference voltage on a separate bitline

• Restores the value of the cell after the voltage on the bitline is sensed

• Acts as temporary data storage, called the row buffer


4 steps of amplifier operation [1]

(Figure: basic sense amplifier circuit diagram)

Signal EQ activates two transistors such that the source $V_{ref} = \dfrac{V_{cc}}{2}$
charges the two drains (bitlines).
4 steps of amplifier operation [2]

• Signal EQ is deactivated, so the equalization circuit is disabled.

• Open the transistor (word line is selected): the storage capacitor discharges
  (current flows out) until its voltage equals the bitline voltage, which ends
  up a little larger than the reference voltage: Vref + ∆V.

∆V > 0: the sense amplifier sets the bitline to 1.
4 steps of amplifier operation [3]

1. $V_{ref} + \Delta V > \frac{1}{2}V_{cc}$ exceeds the threshold, so the transistor is turned on.

2. Signal SAN is set to GND (ground): SAN = 0.

3. Current from the complementary bitline (at $V_{ref}$) flows into SAN, so its
   voltage decreases until it reaches 0.

4. That bitline voltage $V < \frac{1}{2}V_{cc}$, so its complement exceeds the threshold and
   the other transistor is turned on.

5. Signal SAP is set to $V_{cc}$ (power line): SAP = $V_{cc}$.

6. Current from SAP flows into the bitline at $V_{ref} + \Delta V$, so its voltage
   increases until it reaches $V_{cc}$.
4 steps of amplifier operation [4]

7. Current from the bitline (now at $V_{cc}$, addr = 1) flows into the capacitor,
   charging it: data restoration.

8. Signal CSL (column-select line) is activated, turning on the output
   transistor, and current flows to the output. After the output voltage is
   stable, CSL is deactivated and the transistor turns off; the data is stored
   in the output (row buffer).

State after sensing: SAN = 0, SAP = $V_{cc}$, complementary bitline at $V = 0$.

The sense amplifier is a bi-stable circuit.
Write into DRAM array

• Data written by the memory controller is buffered by the I/O buffer of the
  DRAM device and used to overwrite the sense amplifiers and DRAM cells.

• The time period required for write data to overdrive the sense amplifiers and
  be written through into the DRAM cells is t_WR.

• The row cycle time of a DRAM device is write-cycle limited due to t_WR.
OutLine

• Preliminary
• DRAM cell
• DRAM device
- DRAM SPEC
- input/output signal
- channel, rank, bank, row, column

• DRAM access protocol


• DRAM timing parameter
• DDR SDRAM
Typical 16Mbit DRAM (4M x 4)

RAS: row address select
CAS: column address select
WE: write enable (write operation)
OE: output enable (read operation)
A[0:10]: address line, 11 bits, shared for row and column
D[0:3]: data line, 4 bits
refresh counter: determines when to do refresh
Packaging of 16Mbit DRAM (4M x 4)

A[0:10]: address line, 11 bits, for row and column
D[1:4]: data line, 4 bits
RAS: row address select
CAS: column address select
WE: write enable (write operation)
OE: output enable (read operation)
Vcc: power supply x 2
Vss: ground pin x 2
NC: no connect
Spec of DDR (double data rate)
JEDEC document: http://www.jedec.org/Catalog/display.cfm

The DDR prefetch buffer is 2 bits deep.

Standard | Memory  | Cycle  | I/O Bus | Data transfers | Module  | Peak
name     | clock   | time   | clock   | per second     | name    | transfer rate
DDR-200  | 100 MHz | 10 ns  | 200 MHz | 200 Million    | PC-1600 | 1600 MB/s
DDR-266  | 133 MHz | 7.5 ns | 266 MHz | 266 Million    | PC-2100 | 2100 MB/s
DDR-333  | 166 MHz | 6 ns   | 333 MHz | 333 Million    | PC-2700 | 2700 MB/s
DDR-400  | 200 MHz | 5 ns   | 400 MHz | 400 Million    | PC-3200 | 3200 MB/s
from http://en.wikipedia.org/wiki/DDR_SDRAM

1 ns (nanosecond) = $10^{-9}$ second

DDR-xxx denotes the data transfer rate.

Bandwidth is calculated by taking transfers per second and multiplying by
eight, because DDR memory modules transfer data on a bus that is 64 data bits
wide: 64 data bits = 8 (chips per side) x 8 (bits per chip)
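
A short sketch of that rule (the module names round the exact products):

```python
# Peak rate in MB/s = (million transfers per second) x 8 bytes per transfer
# (64-bit bus). Transfer counts are taken from the DDR table above.
for name, mt_per_s in [("DDR-200", 200), ("DDR-266", 266),
                       ("DDR-333", 333), ("DDR-400", 400)]:
    print(f"{name}: {mt_per_s * 8} MB/s")  # DDR-400 -> 3200 MB/s (PC-3200)
```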


Spec of DDR2 (double data rate)
From http://en.wikipedia.org/wiki/DDR2_SDRAM

The DDR2 prefetch buffer is 4 bits deep.

Standard  | Memory  | Cycle   | I/O Bus | Data transfers | Module              | Peak
name      | clock   | time    | clock   | per second     | name                | transfer rate
DDR2-400  | 100 MHz | 10 ns   | 200 MHz | 400 Million    | PC2-3200            | 3200 MB/s
DDR2-533  | 133 MHz | 7.5 ns  | 266 MHz | 533 Million    | PC2-4200 / PC2-4300 | 4266 MB/s
DDR2-667  | 166 MHz | 6 ns    | 333 MHz | 667 Million    | PC2-5300 / PC2-5400 | 5333 MB/s
DDR2-800  | 200 MHz | 5 ns    | 400 MHz | 800 Million    | PC2-6400            | 6400 MB/s
DDR2-1066 | 266 MHz | 3.75 ns | 533 MHz | 1066 Million   | PC2-8500 / PC2-8600 | 8533 MB/s

DDR2-xxx denotes the data transfer rate.

PC2-xxxx denotes the theoretical bandwidth and is used to describe assembled DIMMs.

Bandwidth is calculated by taking transfers per second and multiplying by
eight, because DDR2 memory modules transfer data on a bus that is 64 data bits
wide.
Spec of DDR3 (double data rate)
From http://en.wikipedia.org/wiki/DDR3_SDRAM

The DDR3 prefetch buffer is 8 bits deep > DDR2 (4 bits deep) > DDR (2 bits deep).

Standard  | Memory  | Cycle  | I/O Bus | Data transfers | Module    | Peak
name      | clock   | time   | clock   | per second     | name      | transfer rate
DDR3-800  | 100 MHz | 10 ns  | 400 MHz | 800 Million    | PC3-6400  | 6400 MB/s
DDR3-1066 | 133 MHz | 7.5 ns | 533 MHz | 1066 Million   | PC3-8500  | 8533 MB/s
DDR3-1333 | 166 MHz | 6 ns   | 667 MHz | 1333 Million   | PC3-10600 | 10667 MB/s
DDR3-1600 | 200 MHz | 5 ns   | 800 MHz | 1600 Million   | PC3-12800 | 12800 MB/s

From http://shopping.pchome.com.tw/

A DIMM (dual in-line memory module) comprises a series of DRAM ICs.
1GB DDR2-800, 240 pins; the full data bit-width of the DIMM is 64 bits.

Motherboard P5Q PRO (North Bridge: Intel P45 chipset memory controller):
DDR2 DIMM_A1: 240-pin module
DDR2 DIMM_A2: 240-pin module
DDR2 DIMM_B1: 240-pin module
DDR2 DIMM_B2: 240-pin module

Channel A: DIMM_A1 and DIMM_A2
Channel B: DIMM_B1 and DIMM_B2

FSB / CPU base clock correspondence table

Front-side bus | FSB 1600 | FSB 1333 | FSB 1066 | FSB 800
CPU base clock | 400 MHz  | 333 MHz  | 266 MHz  | 200 MHz
DIMMs, ranks, banks, and arrays

(Figure: north bridge and memory slots)

• A system has many DIMMs, each of which contains one or more ranks.
• Each rank is a set of ganged DRAM devices, each of which has many banks.
• Each bank has many constituent arrays.
Nomenclature: Channel

(Figure: CPU, north bridge with DMC (DRAM memory controller), and DIMMs on two channels)

Nomenclature: rank
Memory system with 2 ranks of DRAM devices

A "rank" is a set of DRAM devices that operate in lockstep in response to a
given command.

The chip-select signal is used to select the appropriate rank of DRAM devices
to respond to a given command.
Nomenclature: bank
SDRAM device with 4 banks of DRAM arrays internally

A "bank" is a set of independent memory arrays inside a DRAM device.

Banks can be accessed in a pipelined fashion and refreshed simultaneously.
Nomenclature: row
Generic DRAM device with 4 banks, 8192 rows, 512 columns per row, and 16 data
bits per column.

A "row" is a group of storage cells that are activated in parallel in response
to a row activation command.

size of row = size of row of a DRAM device x number of DRAM devices in a given rank
Nomenclature: column
A column of data is the smallest addressable unit of memory

width of data bus = 16 x 4 = 64 (bits)


Nomenclature: DIMM (dual In-line Memory Module)
http://www.simmtester.com/page/news/showpubnews.asp?title=Memory+Module+Picture+2007&num=150

240-pin fully buffered DDR2 DIMM Standard 240-pin DDR2 DIMM

• A dual inline memory module (DIMM) consists of a number of memory components


(usually black) that are attached to a printed circuit board (usually green).
• Each 240-pin DIMM provides a 64-bit data path (72-bit for ECC or registered
modules).
• DIMM has 120 pins on the front and 120 pins on the back, for a total of 240 pins.
• A standard DDR2 DIMM has 8 chips on one side, 16 chips in total.
• A fully buffered DDR2 DIMM has 9 chips on one side, 18 chips in total.
Configuration of DRAM [1]

DIMMs are built using "x4" (by 4) or "x8" (by 8) memory chips, with 8 (or 9)
chips per side. "x4" and "x8" refer to the data width of the DRAM chips in bits.

Example: a x4 DRAM device has at least four memory arrays in a single bank, and
its column width is 4 bits.
Configuration of DRAM [2]
256-Mbit SDRAM device configuration

Device configuration | 64 M x 4 | 32 M x 8 | 16 M x 16
Number of banks      | 4        | 4        | 4
Number of rows       | 8192     | 8192     | 8192
Number of columns    | 2048     | 1024     | 512
Data bus width       | 4        | 8        | 16

Configuration = (number of addressable locations, number of data bits per location)

1. 256 Mbit = 64 M (locations) x 4 (bits per location)

2. 64 M (locations) = 8192 (rows) x 2048 (cols) x 4 (banks)
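
A short sketch verifying the configuration arithmetic above for all three
256-Mbit configurations:

```python
def device_megabits(banks, rows, cols, width_bits):
    """Device capacity in Mbit = banks x rows x columns x data-bus width."""
    return banks * rows * cols * width_bits / 2**20

# (banks, rows, cols, width) for the 64Mx4, 32Mx8, and 16Mx16 configurations
for cfg in [(4, 8192, 2048, 4), (4, 8192, 1024, 8), (4, 8192, 512, 16)]:
    print(cfg, device_megabits(*cfg), "Mbit")  # 256.0 Mbit in each case
```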

From http://shopping.pchome.com.tw/

1GB DDR2-800, 240 pins


OutLine

• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
- pipeline-based resource usage model
- read / write operation

• DRAM timing parameter


• DDR SDRAM
Basic DRAM Memory-Access Protocol

(Figure: command and data movement on a generic SDRAM device)

The DRAM memory-access protocol defines the commands and timing constraints
that a DRAM memory controller uses to manage the movement of data between
itself and the DRAM devices.

Five basic DRAM commands:
- row access command
- column-read command
- column-write command
- precharge command
- refresh command

Resource usage model: at any given instant, 4 operations exist in 4 phases, and
resources are not shared among these 4 phases; this constitutes what is
sometimes called a 4-stage pipeline.
Generic DRAM command format

t parameter 1 measures the duration of "phase 2" (time spent in the use of the
selected bank); it is the minimum time between two commands whose relative
timing is limited by the sharing of resources within a given bank of DRAM arrays.

t parameter 2 measures the duration of "phase 3" (time spent in the use of
resources shared by multiple banks); it is the minimum time between two
commands whose relative timing is limited by the sharing of resources by
multiple banks of DRAM arrays within the same DRAM device.

parameter | description
t_CMD     | Command transport duration. The time period that a command occupies
            on the command bus as it is transported from the DRAM controller to
            the DRAM devices.
Row Access Command
Objective: move data from the cells in the DRAM arrays to the sense amplifiers,
and then restore the data back into the cells in the DRAM array.

parameter | description
t_RCD     | Row to Column command Delay. The time interval between row access
            and data ready at the sense amplifiers; the time required between
            RAS (Row Address Select) and CAS (Column Address Select).
t_RAS     | Row Access Strobe latency. The time interval between a row access
            command and data restoration in the DRAM array. A DRAM bank cannot
            be precharged until at least t_RAS after the previous bank activation.
Column-Read Command [1]

Objective: move data from the array of sense amplifiers through the data bus
back to the memory controller.

parameter    | description
t_CAS (t_CL) | Column Access Strobe latency. The time interval between the
               column access command and the start of data return by the DRAM device.
Column-Read Command [2]

parameter | description
t_BURST   | Data burst duration. The time period that the data burst occupies on
            the data bus. In DDR2 SDRAM, 4 beats of data occupy 2 full clock cycles.

• A one-beat burst means one column of data.

• Each column of SDRAM is individually addressable. Given a column address in
  the middle of a 4-column burst, the SDRAM will reorder the burst to provide
  the data at the requested address first; this is called critical-word
  forwarding.
Column-Read Command [3]

parameter | description
t_CCD     | Column-to-Column Delay. The minimum column command spacing, determined
            by the internal burst (prefetch) length. Multiple internal bursts are
            used to form a longer burst for a column read.

t_CCD is 2 beats (1 cycle) for DDR SDRAM
t_CCD is 4 beats (2 cycles) for DDR2 SDRAM
t_CCD is 8 beats (4 cycles) for DDR3 SDRAM
Column-Write Command [1]
Objective: move data from the memory controller to the sense amplifiers of the
targeted bank. The ordering of phases is reversed between column-read and
column-write commands.

parameter | description
t_CWD     | Column Write Delay. The time interval between issuance of the
            column-write command and placement of data on the bus by the DRAM
            controller.

SDRAM: t_CWD = 0 cycles
DDR SDRAM: t_CWD = 1 cycle
DDR2 SDRAM: t_CWD = t_CAS - t_CMD cycles
DDR3 SDRAM: t_CWD = 5 ~ 8 cycles
Column-Write Command [2]

parameter | description
t_WTR     | Write To Read delay time. The minimum time interval between the end
            of a write data burst and the start of a column-read command; I/O
            gating must be released by the write command.
            (write command -> read command)
t_WR      | Write Recovery time. The minimum time interval between the end of a
            write data burst and the start of a precharge command; allows the
            sense amplifiers to restore data to the cells.
            (write command -> precharge command)
Precharge Command [1]

Data access in a typical DRAM device is a two-step process:

• Step 1: a row access command moves data from the DRAM cells to the sense
  amplifiers (the data is cached); column access commands then move data
  between the DRAM device and the memory controller.
• Step 2: a precharge command completes the row access sequence: it resets the
  sense amplifiers and bitlines and prepares them for another row access
  command to the same DRAM array.
Precharge Command [2]

parameter | description
t_RP      | Row Precharge. The time interval that it takes for a DRAM array to
            be precharged (precharge bitlines and sense amplifiers) for another
            row access. Required when switching between rows in a memory bank.
t_RC      | Row Cycle. The time interval between accesses to different rows in a
            bank: t_RC = t_RAS + t_RP
Refresh Command [1]
• Non-persistent charge storage in DRAM cells means that the charge stored in a
  capacitor gradually leaks out through the access transistor.

• To maintain data integrity, DRAM must be periodically read out and restored
  before the charge decays to an indistinguishable level.

parameter | description
t_RFC     | ReFresh Cycle time. The time interval between refresh and activation
            commands. One refresh command may refresh 1, 2, 4, or 8 rows; the
            more rows are refreshed, the longer t_RFC is.
Refresh Command [2]
A refresh command refreshes DRAM cells in all banks, since all banks can
operate independently.

Device family | DRAM capacity | Number of rows | Refresh count | Rows per refresh command | t_RC  | t_RFC
DDR           | 512MB         | 8192           | 8192          | 1                        | 55 ns | 70 ns
DDR2          | 512MB         | 16384          | 8192          | 2                        | 55 ns | 105 ns
DDR2          | 4096MB        | 65536          | 8192          | 8                        | ~     | 327.5 ns

Suppose the memory is DDR2-800 2GB (memory clock = 200 MHz, 5 ns/clock); then
t_RFC = 52 memory clocks = 52 x 5 ns/clock = 260 ns.
Read Cycle [1]

Principle of spatial locality: a row access command fetches a whole row into
the sense amplifiers.

A[i][j] -> A[i][j+1] (same row):       row access, column access, column access

A[i][j] -> A[i+1][j] (different rows): row access, column access, precharge,
                                       row access, column access
Read Cycle [2]

(Timing diagram)
cmd & addr bus:     row acc | col read | prec. | row act
bank utilization:   data sense | bank access | data restore | array precharge
device utilization: I/O gating
data bus:           data burst

t_RCD: row access to column read; t_CAS: column read to data burst;
t_BURST: data burst duration; t_RC = t_RAS + t_RP.
(Read Cycle [3] and Read Cycle [4] repeat the same timing diagram, stepping
through the column-read and precharge phases.)
Write Cycle [1]

The row cycle time is limited by the duration of the write cycle, since the
data path of a write is:
memory controller -> data bus -> I/O gating -> MUX -> sense amplifier -> DRAM cells

$t_{RAS}(write) = t_{RCD} + t_{CWD} + t_{BURST} + t_{WR}$

$t_{RAS}(read) = t_{RCD} + t_{CAS} + t_{BURST} + (\text{remaining})\ t_{restore}$
Write Cycle [2]

(Timing diagram)
cmd & addr bus:     row acc | col write | prec. | row act
bank utilization:   data sense | write | data restore | array precharge
device utilization: I/O gating
data bus:           data burst

t_RCD: row access to column write; t_CWD: column write to data burst;
t_WR: data restore after the burst; then t_RP; t_RC = t_RAS + t_RP.
Consecutive reads and writes to the same open bank [1]
Two column-read commands to the same row are issued:

(Timing diagram)
cmd & addr bus:       Read 0 | Read 1
bank "i" utilization: row x open
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | data burst  (spaced by t_BURST)

Precharge is not necessary, since one row of data has already been latched in
the sense amplifiers. Compare with the full read cycle of Read Cycle [2]: row
access, column read, then precharge, spanning t_RC = t_RAS + t_RP.
Consecutive reads and writes to the same open bank [2]

(Timing diagram)
cmd & addr bus:     row acc | Read 0 | Read 1
bank utilization:   data sense | bank access | data restore | bank access | data restore
device utilization: I/O gating | I/O gating
data bus:           data burst | data burst

Data restore for "read 0" and "read 1" can be done simultaneously, so the
second column read can follow the first after only t_BURST.

N consecutive column reads need time $t_{RCD} + t_{CAS} + N \cdot t_{BURST}$, not $N(t_{RCD} + t_{CAS} + t_{BURST})$.
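
A minimal sketch of this pipelining effect, using the DDR2-800 timing values
(in memory clocks) tabulated later in these slides:

```python
T_RCD, T_CAS, T_BURST = 5, 5, 2  # DDR2-800 values in memory clocks (see later slides)

def pipelined(n):
    """N column reads to an open row share one t_RCD + t_CAS."""
    return T_RCD + T_CAS + n * T_BURST

def unpipelined(n):
    """Hypothetical cost if every read paid the full latency."""
    return n * (T_RCD + T_CAS + T_BURST)

for n in (1, 4, 16):
    print(f"{n:2d} reads: {pipelined(n):3d} vs {unpipelined(n):3d} clocks")
# 16 reads: 42 clocks pipelined vs 192 clocks unpipelined
```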
Consecutive reads to different rows of the same bank

(Timing diagram)
cmd & addr:           row acc | read 0 | prec | row acc | read 1
bank "i" utilization: data sense, row x open - data restore | bank i precharge | data sense, row y open - data restore
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | data burst  (second read delayed by t_RAS + t_RP)

A[i][j] -> A[i+1][j] destroys spatial locality when A[i][j] and A[i+1][j] are
on different rows: each access then requires the whole row cycle time t_RC.
Consecutive writes to different rows of the same bank

(Timing diagram)
cmd & addr:           write 0 | prec | row acc | write 1
bank "i" of rank "m": data restore | array precharge | data sense | data restore
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | data burst

Scheduling distance: $t_{CWD} + t_{BURST} + t_{WR} + t_{RP} + t_{RCD}$
Consecutive reads to different banks (bank conflict)

(Timing diagram)
cmd & addr:           read 0 | prec | row acc | read 1
bank "i" of rank "m": bank i open
bank "j" of rank "m": row x open | bank j precharge | data sense, row y open - data restore
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | data burst  (second read delayed by t_RP + t_RCD)

Banks i and j are open together, but the read request to bank j targets a row
different from the active row in its sense amplifiers, so bank j must precharge
its bitlines first. This is called a "bank conflict".
Consecutive reads to different ranks

(Timing diagram)
cmd & addr:           read 0 | read 1
bank "i" of rank "m": bank i open
bank "j" of rank "n": bank j open
rank "m" utilization: I/O gating
rank "n" utilization: I/O gating
data bus:             data burst | sync | data burst  (spaced by t_BURST + t_RTRS)

parameter | description
t_RTRS    | Rank-To-Rank-Switching time. Used in DDR and DDR2 SDRAM systems;
            1 full cycle in DDR SDRAM.
Consecutive writes to different ranks

(Timing diagram)
cmd & addr:           write 0 | write 1
bank "i" of rank "m": bank i access
bank "j" of rank "n": bank j access
rank "m" utilization: I/O gating
rank "n" utilization: I/O gating
data bus:             data burst | data burst  (spaced by t_BURST + t_OST)
Write command following read command to open banks

(Timing diagram)
cmd & addr:           read 0 | write 1  (spaced by $t_{CAS} + t_{BURST} + t_{RTRS} - t_{CWD}$)
bank "i" of rank "m": row x open
bank "j" of rank "m": data restore
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | sync | data burst
Read command following write command to open banks

(Timing diagram)
cmd & addr:           write 0 | read 1  (spaced by $t_{CWD} + t_{BURST} + t_{WTR}$)
bank "i" of rank "m": data restore
bank "j" of rank "m": row x open
rank "m" utilization: I/O gating | I/O gating
data bus:             data burst | data burst
DRAM protocol overheads for DDR and DDR2 SDRAM

R = read; W = write; s = same; d = different

prev | next | rank | bank | row | scheduling distance between column access commands (no command reordering)
R    | R    | s    | s    | -   | t_BURST
R    | R    | s    | d    | -   | t_BURST
R    | R    | s    | s    | d   | t_RAS + t_RP
R    | R    | d    | s/d  | -   | t_RTRS + t_BURST
R    | W    | s    | d    | -   | t_CAS + t_BURST + t_RTRS - t_CWD
W    | R    | s    | d    | -   | t_CWD + t_BURST + t_WTR
W    | W    | s    | s    | -   | t_BURST
W    | W    | s    | s    | d   | t_CWD + t_BURST + t_WR + t_RP + t_RCD
W    | W    | d    | s/d  | -   | t_OST + t_BURST

Later on, we will determine the values of the timing parameters and calculate
the overheads explicitly.
OutLine

• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
- CL value
- system calibration

• DDR SDRAM
CL value of commodity DDRx SDRAM

From http://shopping.pchome.com.tw/

CL value notation: tCL-tRCD-tRP-tRAS (memory clock speed = 533)

CL value notation: tCL-tRCD-tRP (memory clock speed = 400)
CAS latency (CL value) [1]
from http://en.wikipedia.org/wiki/CAS_latency

• When DDR is read, a single read produces 64 bits of data from 8 chips, 8 bits per chip.
• "Time between bits" refers to the time from the appearance of the first group
  of bits (8 bits per chip) until the appearance of the next group of bits.
• CAS latency only specifies the delay between the request and the first bit.
• The remaining bits (7 per chip) are fetched one bit per transfer, at the data rate.

type      | Data rate | ns/bit | Command rate | ns/cycle | CL | first word (ns) | 8 words (ns)
DDR-400   | 400 MHz   | 2.5    | 200 MHz      | 5        | 3  | 15              | 32.5
DDR2-800  | 800 MHz   | 1.25   | 400 MHz      | 2.5      | 5  | 12.5            | 21.25
DDR2-1066 | 1066 MHz  | 0.94   | 533 MHz      | 1.88     | 5  | 9.4             | 15.98
DDR3-1333 | 1333 MHz  | 0.75   | 666 MHz      | 1.5      | 9  | 13.5            | 18.75
DDR3-1600 | 1600 MHz  | 0.625  | 800 MHz      | 1.25     | 8  | 10              | 14.375

Example: DDR-400
first word needs time CL x (1 / command rate) = 3 x 5 = 15 ns
the remaining 7 words need time 7 x (1 / data rate) = 7 x 2.5 = 17.5 ns
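
The two right-hand columns of the table follow from this arithmetic; a short
sketch that reproduces them:

```python
def first_word_ns(cl, command_rate_mhz):
    """First word arrives after CL command-rate cycles."""
    return cl * 1000.0 / command_rate_mhz

def eight_words_ns(cl, command_rate_mhz, data_rate_mhz):
    """The remaining 7 words arrive at the data rate."""
    return first_word_ns(cl, command_rate_mhz) + 7 * 1000.0 / data_rate_mhz

# DDR-400, CL = 3: 15.0 ns first word, 32.5 ns for all 8 words
print(first_word_ns(3, 200), eight_words_ns(3, 200, 400))
```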
CAS latency [2]

The table above used the I/O bus clock as the "command rate":

type      | Data rate | ns/bit | I/O bus clock | ns/cycle | CL | first word (ns) | 8 words (ns)
DDR-400   | 400 MHz   | 2.5    | 200 MHz       | 5        | 3  | 15              | 32.5
DDR2-800  | 800 MHz   | 1.25   | 400 MHz       | 2.5      | 5  | 12.5            | 21.25
DDR2-1066 | 1066 MHz  | 0.94   | 533 MHz       | 1.88     | 5  | 9.4             | 15.98
DDR3-1333 | 1333 MHz  | 0.75   | 666 MHz       | 1.5      | 9  | 13.5            | 18.75
DDR3-1600 | 1600 MHz  | 0.625  | 800 MHz       | 1.25     | 8  | 10              | 14.375

With the correct "command rate" (the memory clock):

type      | Data rate | ns/bit | Command rate (memory clock) | ns/cycle | CL | first word (ns) | 8 words (ns)
DDR-400   | 400 MHz   | 2.5    | 200 MHz                     | 5        | 3  | 15              | 32.5
DDR2-800  | 800 MHz   | 1.25   | 200 MHz                     | 5        | 5  | 25              | 33.75
DDR2-1066 | 1066 MHz  | 0.94   | 266 MHz                     | 3.75     | 5  | 18.75           | 25.33
DDR3-1333 | 1333 MHz  | 0.75   | 166 MHz                     | 6        | 9  | 54              | 59.25
DDR3-1600 | 1600 MHz  | 0.625  | 200 MHz                     | 5        | 8  | 40              | 44.375
Memory divider
from http://en.wikipedia.org/wiki/Memory_divider

• A memory divider is a ratio used to determine the operating clock frequency
  of computer memory relative to the Front Side Bus frequency, if the memory
  system depends on the FSB clock speed.

• The memory divider is also commonly referred to as the "DRAM:FSB ratio".

• Ideally, the Front Side Bus and system memory should run at the same clock
  speed, because the FSB connects the memory system to the CPU. But it is
  sometimes desirable to run the FSB and system memory at different clock
  speeds when overclocking the FSB.

type      | Data rate | ns/bit | Command rate (memory clock) | ns/cycle | CL | first word (ns) | 8 words (ns)
DDR2-800  | 800 MHz   | 1.25   | 200 MHz                     | 5        | 5  | 25              | 33.75
DDR2-1066 | 1066 MHz  | 0.94   | 266 MHz                     | 3.75     | 5  | 18.75           | 25.33

Motherboard: P5Q PRO with system clock 266 MHz, FSB = 1066 MHz
Memory: DDR2-800 with CL = 5
System calibration software

EVEREST (http://www.lavalys.com/): system information tool, shows everything about the PC.

MemSet (http://www.tweakers.fr/memset.html): shows the memory system and its
memory timing parameters.
Timing parameters

From the BIOS of motherboard P5Q PRO: 5-5-5-18-3-52-6-3, 8-3-5-4-6-4-6, 14-5-1-6-6

First group:                     Second group:               Third group:
CAS Latency                      READ to WRITE Delay (S/D)   WRITE to PRE Delay
DRAM RAS to CAS Delay            WRITE to READ Delay (S)     READ to PRE Delay
DRAM RAS Precharge               WRITE to READ Delay (D)     PRE to PRE Delay
DRAM RAS Activate to Precharge   READ to READ Delay (S)      ALL PRE to ACT Delay
RAS to RAS Delay                 READ to READ Delay (D)      ALL PRE to REF Delay
Row Refresh Cycle Time           WRITE to WRITE Delay (S)
Write Recovery Time              WRITE to WRITE Delay (D)
Read to Precharge Time

CL value: tCL-tRCD-tRP-tRAS = 5-5-5-18


System information

1. System clock (base clock) = 267.3 MHz
2. CPU multiplier = 9
3. CPU clock = 2405.4 MHz (CPU clock = base clock x multiplier)
4. Memory bus = 400.9 MHz
5. DRAM : FSB ratio = 12 : 8
   (DRAM I/O bus clock = 400 MHz, system clock = 266 MHz)
6. Memory type: DDR2-800
7. Dual-channel: enabled

Standard | Memory  | Cycle | I/O Bus | Data transfers | Module   | Peak
name     | clock   | time  | clock   | per second     | name     | transfer rate
DDR2-800 | 200 MHz | 5 ns  | 400 MHz | 800 Million    | PC2-6400 | 6400 MB/s
Performance of cache and memory
CPU clock = 2405.4 MHz implies 1 ns ≈ 2.4 cycles

Cache parameters of processors based on the Intel Core microarchitecture:

level         | capacity | Associativity (ways) | Line size (bytes) | Access latency (clocks) | Access throughput (clocks) | Write update policy
L1 data cache | 32 KB    | 8                    | 64                | 3                       | 1                          | Write-back
L2 cache      | 2, 4 MB  | 8 or 16              | 64                | 14                      | 2                          | Write-back

L1 cache has an estimated latency of 1.2 ns = 2.88 cycles ~ 3 cycles.
L2 cache has an estimated latency of 5.6 ns = 13.44 cycles ~ 14 cycles.
DDR2-800 SDRAM has an estimated latency of 91.5 ns = 220 cycles, far larger
than the best case (25 ns):

type     | Data rate | ns/bit | memory clock | ns/cycle | CL | first word (ns)    | 8 words (ns)
DDR2-800 | 800 MHz   | 1.25   | 200 MHz      | 5        | 5  | 25 (60 CPU cycles) | 33.75 (81 CPU cycles)
EVEREST: CPU information

Multiplier = 9; base clock = 267 MHz; 2 cores share 8 MB L2 cache.

Huge number of transistors; feature size of MOS = 65 nm.

Density of MOS transistors:

$L$ (feature size) = 65 nm

Suppose the length of a MOS is 5L and the MOS is square; then
area of MOS = $(5L)^2 = 25L^2 = 105625\ \mathrm{nm}^2$

Area of die = $286\ \mathrm{mm}^2 = 286 \times 10^{12}\ \mathrm{nm}^2$

Maximum number of MOS in a die = $\dfrac{\text{area of die}}{\text{area of MOS}} = \dfrac{286 \times 10^{12}\ \mathrm{nm}^2}{105625\ \mathrm{nm}^2} \approx 2700$ Million

Number of MOS in the CPU = 582 M, about $\dfrac{582\ \mathrm{M}}{2700\ \mathrm{M}} = 21.6\%$ of the die.

This means a large part of the die is reserved for other usage.
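
The same estimate as a sketch (the 5L-square transistor footprint is the
slide's simplifying assumption):

```python
L = 65                       # feature size in nm
mos_area = (5 * L) ** 2      # assume a MOS occupies a 5L x 5L square = 105625 nm^2
die_area = 286e12            # 286 mm^2 expressed in nm^2

max_mos = die_area / mos_area
print(f"max transistors ~ {max_mos / 1e6:.0f} million")          # ~2708 million
print(f"582 M transistors -> {582e6 / max_mos:.1%} of the die")  # ~21.5%
```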
EVEREST: motherboard information

Bus width of FSB = 64 bits
FSB bandwidth = 1067 MHz (data rate) x 64 bits (bus width) ≈ 8.5 GB/s

Memory bus = 64 (bits per channel) x 2 (dual-channel) = 128 bits
Memory bandwidth = 800 MHz (data rate) x 128 bits (bus width) = 12.8 GB/s

type     | Data rate | ns/bit | memory clock | ns/cycle | CL | first word (ns)    | 8 words (ns)
DDR2-800 | 800 MHz   | 1.25   | 200 MHz      | 5        | 5  | 25 (60 CPU cycles) | 33.75 (81 CPU cycles)
EVEREST: SDRAM information

t_REF = 3120 memory clocks = 3120 x 5 ns/clock = 15600 ns = 15.6 µs


EVEREST: SPD information
The JEDEC standards require certain parameters to be placed in the lower 128
bytes of an EEPROM located on the memory module. These bytes contain timing
parameters, manufacturer, serial number, and other useful information about
the module.

The module has two sides, one rank per side; each rank has 8 SDRAM chips, with
8 banks per chip.

64 data bits = 8 (bits per chip) x 8 (chips per rank)

Concrete timing parameters

parameter | description                  | Memory clocks (from EVEREST)
t_CAS     | Column Access Strobe latency | 5
t_RCD     | Row to Column command Delay  | 5
t_RP      | Row Precharge                | 5
t_RAS     | Row Access Strobe            | 18
t_RTRS    | Rank-To-Rank-Switching time  | 1
t_BURST   | Data burst duration          | 2
t_CMD     | Command transport duration   | 2
t_CWD     | Column Write Delay           | t_CAS - t_CMD = 3
t_WR      | Write Recovery time          | 14
t_WTR     | Write To Read delay          | same rank: 11; different rank: 5
t_OST     | ODT switching time           | 1
t_WTP     | Write to Precharge delay     | 14
t_RTP     | Read to Precharge delay      | 5


DRAM protocol overheads for DDR2-800 SDRAM

R = read; W = write; s = same; d = different

prev | next | rank | bank | row | scheduling distance between column access commands (no command reordering) | Memory clocks | CPU clocks
R    | R    | s    | s    | -   | t_BURST                                | 2  | 24
R    | R    | s    | d    | -   | t_RP + t_RCD                           | 10 | 120
R    | R    | s    | s    | d   | t_RAS + t_RP                           | 23 | 276
R    | R    | d    | s/d  | -   | t_RTRS + t_BURST                       | 3  | 36
R    | W    | s    | d    | -   | t_CAS + t_BURST + t_RTRS - t_CWD       | 5  | 60
W    | R    | s    | d    | -   | t_CWD + t_BURST + t_WTR                | 16 | 192
W    | W    | s    | s    | -   | t_BURST                                | 2  | 24
W    | W    | s    | s    | d   | t_CWD + t_BURST + t_WR + t_RP + t_RCD  | 29 | 348
W    | W    | d    | s/d  | -   | t_OST + t_BURST                        | 3  | 36

1 memory clock = 5 ns = 5 ns x 2.4 CPU cycles/ns = 12 CPU cycles

Observation: different combinations of commands reveal different overheads; we
expect that false sharing would have a large overhead (compare the measured
DDR2-800 latency of 91.5 ns ≈ 219.6 CPU cycles).

CAS latency, memory speed, and price
from http://shopping.pchome.com.tw/

Dual-channel kits: tCL = 5, tCL = 7
Tri-channel kits: tCL = 8, tCL = 9

Objective: choose a memory module with a low CL value and a high clock speed.

Question: Is DDR3 faster than DDR2?


OutLine

• Preliminary
• DRAM cell
• DRAM device
• DRAM access protocol
• DRAM timing parameter
• DDR SDRAM
- DDR2-SDRAM, DDR3-SDRAM
- dual-channel, tri-channel
- memory controller
DDR SDRAM [1]
• An SDRAM device operates its data bus at the same data rate as the address and command buses.
• A DDR SDRAM device operates its data bus at twice the data rate of the address and command buses.

SDRAM: one data item out per cycle.
DDR SDRAM: two data items out per cycle.
DDR SDRAM [2]

(Figure: SDRAM device architecture with 4 banks; DDR SDRAM device I/O)

The rate of internal data transfer in DDR SDRAM is not increased; DDR SDRAM
uses a 2-bit prefetch to increase bandwidth. The I/O bus clock runs twice as
fast as the memory clock, so the I/O bus can transfer 2N data per memory-clock cycle.

Standard | Memory  | Cycle  | I/O Bus | Data transfers | Module  | Peak
name     | clock   | time   | clock   | per second     | name    | transfer rate
DDR-266  | 133 MHz | 7.5 ns | 266 MHz | 266 Million    | PC-2100 | 2100 MB/s
DDR-333  | 166 MHz | 6 ns   | 333 MHz | 333 Million    | PC-2700 | 2700 MB/s
DDR-400  | 200 MHz | 5 ns   | 400 MHz | 400 Million    | PC-3200 | 3200 MB/s
DDR2 SDRAM

(Figure: DDR2 SDRAM device I/O)

The rate of internal data transfer in DDR2 SDRAM is not increased; DDR2 SDRAM
uses a 4-bit prefetch to increase bandwidth. The I/O bus clock runs 2 times
faster than the memory clock and samples data on both the rising and falling
edges of the clock signal, so the I/O bus can transfer 4N data per memory-clock cycle.

Standard | Memory  | Cycle  | I/O Bus | Data transfers | Module   | Peak
name     | clock   | time   | clock   | per second     | name     | transfer rate
DDR2-533 | 133 MHz | 7.5 ns | 266 MHz | 533 Million    | PC2-4300 | 4266 MB/s
DDR2-667 | 166 MHz | 6 ns   | 333 MHz | 667 Million    | PC2-5300 | 5333 MB/s
DDR2-800 | 200 MHz | 5 ns   | 400 MHz | 800 Million    | PC2-6400 | 6400 MB/s
DDR2 SDRAM SPEC [1]
512MB addressing

1GB addressing
DDR2 SDRAM SPEC [2]
Simplified state diagram (not real)
DRAM controller
• Row-Buffer-Management Policy
- open-page policy
- close-page policy
• Address Mapping Scheme
  - minimize bank-address conflicts in temporally adjacent requests and
    maximize the parallelism in the memory system (parallelism of channels,
    ranks, banks, rows, and columns)
  - utilize the dual-channel architecture
  - flexibility for inserting/removing memory modules
• DRAM Command Ordering Scheme
North-bridge on P5Q PRO motherboard
http://www.intel.com/products/desktop/chipsets/p45/p45-overview.htm

Dual-channel architecture describes a technology that theoretically doubles
data throughput from the memory to the memory controller. Dual-channel-enabled
memory controllers utilize two 64-bit data channels, resulting in a 128-bit
data path.

Intel Dual-Channel DDR Memory Architecture White Paper: single-channel memory
feeds data to the CPU through a single pipe, 64 bits at a time; with two
channels, data is transferred 128 bits at a time.

(Figure: possible allocations of memory modules)
Peak bandwidth
P5Q PRO: FSB / CPU base clock correspondence table

Front-side bus | FSB 1600 | FSB 1333 | FSB 1066 | FSB 800
CPU base clock | 400 MHz  | 333 MHz  | 266 MHz  | 200 MHz

Bandwidth of FSB = 1066 (MHz) x 8 (bytes) ≈ 8.5 GB/s

Standard | Memory  | Cycle | I/O Bus | Data transfers | Peak transfer rate | Dual-
name     | clock   | time  | clock   | per second     | (single channel)   | channel
DDR2-800 | 200 MHz | 5 ns  | 400 MHz | 800 Million    | 6.4 GB/s           | 12.8 GB/s

bandwidth of FSB < bandwidth of dual-channel DRAM

Note: the "base clock" (外頻) is the CPU's external frequency. The front-side
bus (FSB) is the link between the CPU and the chipset. The FSB speed is based
on the CPU base clock: using clock multiplying, the FSB transfers data two or
four times per cycle (double or quad pumped), e.g., 266 MHz (133x2), 333 MHz (166x2).

Quad data rate (or quad pumping) is a communication signaling technique wherein
data is transmitted at four points in the clock cycle: on the rising and
falling edges, and at two intermediate points between them. The intermediate
points are defined by a 2nd clock that is 90° out of phase from the first.
Intel 82955X MCH (Memory Controller Hub)
http://www.d-cross.com/show_article.asp?page=2&article_id=693

Two channels, 4 ranks per channel.

Definitions:
symbol | description                   | Number
K      | Number of channels in system  | 2^k
L      | Number of ranks per channel   | 2^l
B      | Number of banks per rank      | 2^b
R      | Number of rows per bank       | 2^r
C      | Number of columns per row     | 2^c
V      | Number of bytes per column    | 2^v
Z      | Number of bytes per cacheline | 2^z
N      | Number of cachelines per row  | 2^n

Number of bytes per row per bank = $C \times V = N \times Z$

A memory system has a capacity of $K \times L \times B \times R \times C \times V$ bytes.

A memory system needs $k + l + b + r + c + v$ (equivalently $k + l + b + r + z + n$) address bits.

Symmetric dual-channel mode: sequentially consecutive cacheline addresses are
mapped to alternating channels, so that requests from a streaming request
sequence are mapped to both channels concurrently.
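
A hedged sketch (not the actual 82955X logic) of carving a physical address
into these fields; the field widths below match the 256 MB rank configuration
(b = 2, r = 13, c = 10, v = 3) from the table that follows, with one rank bit
on top:

```python
# Fields listed from most-significant to least-significant address bits.
FIELDS = [("rank", 1), ("row", 13), ("bank", 2), ("col", 10), ("byte", 3)]

def split_address(addr, fields=FIELDS):
    """Peel fields off the address, starting from the least-significant end."""
    out = {}
    for name, width in reversed(fields):
        out[name] = addr & ((1 << width) - 1)  # take the low `width` bits
        addr >>= width
    return out

print(split_address(0x0ABCDE8))
# In dual-channel mode, a low-order bit (at the 64-byte cacheline boundary)
# would additionally select the channel, interleaving cachelines across channels.
```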
Address Mapping in Intel 82955X MCH

Rank capacity (MB) | Device configuration: bank count x row count x col count x col size (bytes) | Rank composition: device density x device count | Rank configuration: bank count x row count x col count x col size (B x R x C x V) | bank address bits (b) | row address bits (r) | column address bits (c) | column offset (v)
128  | 4 x 8192 x 512 x 2   | 256 Mbit x 4 | 4 x 8192 x 512 x 8   | 2 | 13 | 9  | 3
256  | 4 x 8192 x 1024 x 2  | 512 Mbit x 4 | 4 x 8192 x 1024 x 8  | 2 | 13 | 10 | 3
512  | 4 x 16384 x 1024 x 1 | 512 Mbit x 8 | 4 x 16384 x 1024 x 8 | 2 | 14 | 10 | 3
512  | 8 x 8192 x 1024 x 2  | 1 Gbit x 4   | 8 x 8192 x 1024 x 8  | 3 | 13 | 10 | 3
1024 | 8 x 16384 x 1024 x 1 | 1 Gbit x 8   | 8 x 16384 x 1024 x 8 | 3 | 14 | 10 | 3

(Figure: a rank of 4 devices; each device has 4 banks of 8192 rows x 1024
columns, 2 bytes per column)

col size per rank = col size per device x device count per rank = 2 x 4 = 8 (bytes)
Per-channel, per-rank address mapping scheme for single/asymmetric channel mode

(Table: for each rank capacity, physical address bits 31..0 are partitioned,
from high to low, into rank ID, row ID, bank ID, column ID, and a 3-bit byte
offset; e.g., a 128 MB rank of configuration 8192 x 4 x 512 x 8 uses 13 row
bits, 2 bank bits, 9 column bits, and 3 offset bits.)

The channel address and rank address are mapped to the highest bit field, such
that each rank or channel is a contiguous block of memory.
Per-rank address mapping scheme for dual channel mode

(Table: same field layout as above, except that one low-order address bit,
placed just above the 64-byte cacheline boundary formed by the byte offset and
the lowest column bits, is taken as the channel ID.)

• A channel block is 64 bytes of contiguous memory (a cacheline is 64 bytes),
  so consecutive cacheline addresses are interleaved across the channels.
• The rank address is mapped to the highest bit field, so a rank is a
  contiguous memory block.

You might also like