Kevin Marks
Dell, Inc.
March 1, 2011
Queueing Interface
NVM Command Set
Admin Command Set
End-to-end Protection (DIF/DIX)
Security
Physical Region Pages (PRPs)
NVMe
Technical Work Begins
...
2009
2010
2011
2012
2013
2014
[Figure: NVMe usage models in Server Storage, External Storage (SAN Controller A and Controller B), and Client Storage, built from root complexes, PCIe switches, PCIe RAID, and an IO Hub over x16 and x4 links, alongside SAS and SATA HDDs. Annotations: typically for persistent data; redundant (i.e., RAIDed); commonly used as Tier-0 storage.]
NVMe Queues
[Figure: logical view of a circular queue of Queue Size entries between Low Memory and High Memory, with Head and Tail pointers. Queue Empty: Head == Tail. Queue Full: Head is one entry ahead of Tail.]
Types of Queues
Up to 64K queues per NVMe controller with up to 64K elements per queue
Used to submit/complete IO commands
1) Queue Command(s)
2) Ring Doorbell (New Tail)
3) Fetch Command(s)
4) Process Command(s)
5) Queue Completion(s)
6) Generate Interrupt
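The first three steps above can be sketched as a toy circular queue. This is an illustrative model only, not driver code: the class name `SubmissionQueue`, the `doorbell` attribute, and the string commands are assumptions made for the example. It uses the usual ring convention in which the queue is empty when Head equals Tail and full when Tail is one slot behind Head.

```python
class SubmissionQueue:
    """Toy model of a circular submission queue (hypothetical names)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.head = 0       # consumer index, advanced by the controller
        self.tail = 0       # producer index, advanced by the host
        self.doorbell = 0   # last Tail value the host announced

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # One slot stays unused so that full and empty can be told apart.
        return (self.tail + 1) % self.size == self.head

    def submit(self, command):
        """Steps 1 and 2: queue a command, then ring the doorbell."""
        if self.is_full():
            raise BufferError("submission queue full")
        self.slots[self.tail] = command
        self.tail = (self.tail + 1) % self.size
        self.doorbell = self.tail

    def fetch(self):
        """Step 3: the controller fetches entries up to the doorbell value."""
        fetched = []
        while self.head != self.doorbell:
            fetched.append(self.slots[self.head])
            self.head = (self.head + 1) % self.size
        return fetched


sq = SubmissionQueue(size=4)
for name in ("read", "write", "flush"):
    sq.submit(name)
full_before_fetch = sq.is_full()   # a 4-slot ring holds at most 3 commands
fetched = sq.fetch()
```

A real host would write the new Tail to a controller doorbell register rather than to a Python attribute; the sketch only shows the index arithmetic.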
[Figure: command processing flow for steps 1-6, with PCIe TLPs exchanged between host and controller]
SQ and CQ relationships
Each SQ is associated with only one CQ (i.e., commands submitted on a specific SQ complete on a specific CQ).
The SQ to CQ relationship is defined at SQ creation time.
It is permissible within the architecture to have multiple SQs mapped to a single CQ (n:1).
Flash Memory Summit 2013
Santa Clara, CA
[Figure: Core 0 through Core N, each with I/O Submission Queue(s), an I/O Completion Queue, and an MSI-X interrupt, attached to the NVMe Controller; Core 0 also holds the Admin queue]
Per core: one or more submission queues, one completion queue, and one MSI-X interrupt
High performance and low latency command issue
No locking between cores
11
Command Arbitration
All controllers support round robin arbitration
[Figure: the Admin Submission Queue (ASQ) and the I/O SQs feeding a round robin (RR) arbiter]
Command Arbitration
An NVMe controller may support weighted round robin with urgent priority class arbitration
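A minimal sketch of weighted round robin with an urgent class, assuming a simplified model: the urgent queue is drained with strict priority, then each remaining class gets up to its weight in commands per round. The function name `arbitrate`, the class names, and the weights 3, 2, and 1 are illustrative assumptions, not values required by NVMe.

```python
from collections import deque


def arbitrate(urgent, classes, weights):
    """Return commands in service order (simplified, hypothetical model).

    urgent  -- deque of commands in the urgent priority class
    classes -- dict mapping class name to a deque of commands
    weights -- dict mapping class name to commands served per round
    """
    order = []
    while urgent or any(classes.values()):
        # Strict priority: urgent commands go before any weighted class.
        while urgent:
            order.append(urgent.popleft())
        # Weighted round robin: each class gets up to `weight` turns.
        for name, queue in classes.items():
            for _ in range(weights[name]):
                if not queue:
                    break
                order.append(queue.popleft())
    return order


order = arbitrate(
    urgent=deque(["U1"]),
    classes={
        "high": deque(["H1", "H2", "H3", "H4"]),
        "medium": deque(["M1", "M2", "M3"]),
        "low": deque(["L1", "L2"]),
    },
    weights={"high": 3, "medium": 2, "low": 1},
)
```

With these inputs the first round serves U1, three high, two medium, and one low command; the second round serves what is left.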
13
Arbitration Primitives
[Figure: arbitration primitives: round robin, weighted round robin (WRR) with example weights 3, 2, and 1, and strict priority arbitration across High, Med, and Low inputs]
[Figure: PCIe Port with one NVMe Controller (PCI Function 0); NSID 1 maps to namespace NS A]
[Figure: PCIe Port with one NVMe Controller (PCI Function 0); NSID 1 maps to NS A and NSID 2 maps to NS B]
[Figure: two NVM Express Controllers (PCI Function 0 and PCI Function 1), each with NSID 1 and NSID 2, over namespaces NS A, NS B, and NS C]
[Figure: two NVM Express Controllers, each PCI Function 0 behind its own PCIe port (PCIe Port x and PCIe Port y), each with NSID 1 and NSID 2, over namespaces NS A, NS B, and NS C]
[Figure: two hosts connected through two PCIe switches to eight PCIe SSDs]
[Figure: Host A and Host B attached to an NVM Subsystem with two NVMe Controllers (PCI Function 0 and PCI Function 1), each with NSID 1 and NSID 2, over namespaces NS A, NS B, and NS C]
Controller Initialization
The host performs the following actions in sequence to initialize the
controller to begin executing Admin commands:
1. Set the PCI and PCI Express registers based on the system
configuration. This includes configuration of power management
features. Pin-based or single-message MSI interrupts should be used
until the number of I/O Queues is determined.
2. Configure the Admin Queue by setting the Admin Queue Attributes
(AQA), Admin Submission Queue Base Address (ASQ), and Admin
Completion Queue Base Address (ACQ) to appropriate values.
3. Configure:
1. the arbitration mechanism in CC.AMS
2. the memory page size in CC.MPS
3. the I/O Command Set in CC.CSS
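Steps 2 and 3 amount to packing a few fields into registers. The sketch below assumes the NVMe 1.x field positions as recalled here (AQA: ASQS in bits 11:0 and ACQS in bits 27:16, both zero-based; CC: EN in bit 0, CSS in bits 6:4, MPS in bits 10:7, AMS in bits 13:11; page size = 2^(12 + MPS)). Check these against the specification before relying on them. The helper names are hypothetical.

```python
def encode_aqa(asq_entries, acq_entries):
    """Pack the Admin Queue Attributes value from queue sizes in entries."""
    # Both sizes are stored zero-based (a value of 0 means one entry).
    return ((acq_entries - 1) << 16) | (asq_entries - 1)


def encode_cc(page_size, ams=0, css=0, enable=False):
    """Pack AMS, MPS, CSS, and EN into a Controller Configuration value."""
    mps = page_size.bit_length() - 1 - 12   # page size = 2 ** (12 + MPS)
    if mps < 0 or mps > 15 or page_size != 1 << (12 + mps):
        raise ValueError("page size must be a power of two, 4 KiB or larger")
    return (ams << 11) | (mps << 7) | (css << 4) | int(enable)


aqa = encode_aqa(asq_entries=32, acq_entries=16)
cc_4k = encode_cc(page_size=4096, enable=True)
cc_8k_wrr = encode_cc(page_size=8192, ams=1)
```

Keeping the register encoding in small pure functions makes it easy to check the values before they are written to hardware.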
[Figure: command format, DWords 0-15 of 4 bytes each (bits 31-0). Fields shown: Command Identifier, FUSE, and Opcode in DWord 0; Namespace Identifier; Metadata Pointer; PRP Entry 1; PRP Entry 2]
PRP Example
NVMe command example utilizing the two PRP Entries as
PRPs. The first PRP has an offset into the memory page.
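A rough sketch of how a host might choose the two PRP entries for a transfer, assuming a physically contiguous buffer and a 4 KiB memory page. The function `build_prps` and its `list_addr` parameter (where a PRP List would be placed) are hypothetical names for illustration. Only the first PRP may carry an offset into the page; later entries are page aligned.

```python
PAGE_SIZE = 4096  # assumed memory page size


def build_prps(addr, length, list_addr, page_size=PAGE_SIZE):
    """Return (prp1, prp2, prp_list) for a contiguous buffer (sketch)."""
    offset = addr % page_size
    first_page = addr - offset
    # Page-aligned addresses of every page touched after the first one.
    later_pages = list(range(first_page + page_size, addr + length, page_size))
    if not later_pages:
        return addr, 0, []                  # fits in one page, PRP2 unused
    if len(later_pages) == 1:
        return addr, later_pages[0], []     # PRP2 is a second PRP
    return addr, list_addr, later_pages     # PRP2 points to a PRP List


one = build_prps(0x10000000, 512, list_addr=0x20000000)
two = build_prps(0x10000200, 4096, list_addr=0x20000000)
many = build_prps(0x10000200, 0x3000, list_addr=0x20000000)
```

The sketch ignores the case where the PRP List itself needs more than one page.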
[Figure: PRP Entry 1 with an offset into the memory page; PRP Entry 2 used either as a second PRP or as a PRP List Pointer (zero offset) to PRP Lists]
[Figure: command format using SGLs, DWords 0-15. Fields shown: Command Identifier, FUSE, Opcode, Namespace Identifier, SGL Entry 1]
[Figure: SGL List: SGL Segments made up of 16-byte SGL Descriptors, ending with the Last SGL Segment. Each descriptor carries descriptor-type-specific bytes and a Descriptor Type code. Codes legible in the figure: 2h SGL Segment, 4h - Eh Reserved, Fh Vendor Specific]
[Figure: completion queue entry, DWords 0-3. Fields shown: SQ Identifier, SQ Head Pointer, Status Field, Phase Tag (P), Command Identifier]
Phase Tag
[Figure: Completion Queue of Queue Size entries between Low Memory and High Memory, showing Head, Tail, and per-entry phase tags: all 0 in the initial state, then 1 for newly posted entries]
All phase tags are initially zero.
The host knows the phase tag value of new completions, so it can determine when it has reached the last full entry.
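A small model of how the phase tag lets the host find new completions without reading a tail pointer. It assumes the usual scheme in which the controller writes phase 1 on its first pass through the queue and inverts the value on each later pass, and the host flips its expected value whenever its head wraps. The class and method names are hypothetical.

```python
class CompletionQueue:
    """Toy completion queue with phase tags (illustrative names)."""

    def __init__(self, size):
        self.size = size
        self.entries = [{"phase": 0, "cid": None} for _ in range(size)]
        self.tail, self.ctrl_phase = 0, 1   # controller side
        self.head, self.host_phase = 0, 1   # host side

    def post(self, cid):
        """Controller posts a completion carrying its current phase."""
        self.entries[self.tail] = {"phase": self.ctrl_phase, "cid": cid}
        self.tail += 1
        if self.tail == self.size:          # wrapped: invert the phase
            self.tail = 0
            self.ctrl_phase ^= 1

    def reap(self):
        """Host consumes entries whose phase matches what it expects."""
        done = []
        while self.entries[self.head]["phase"] == self.host_phase:
            done.append(self.entries[self.head]["cid"])
            self.head += 1
            if self.head == self.size:      # wrapped: expect the other phase
                self.head = 0
                self.host_phase ^= 1
        return done


cq = CompletionQueue(size=4)
for cid in (1, 2, 3):
    cq.post(cid)
first = cq.reap()
for cid in (4, 5):                          # these two straddle the wrap
    cq.post(cid)
second = cq.reap()
third = cq.reap()
```

The model omits the head doorbell write that tells the controller which entries the host has consumed.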
30
Command Set
Admin
Command
Set
NVM
Cmd
Set
Rsvd
#1
Rsvd
#2
Rsvd
#3
31
Admin Commands
Command
Required or
Optional
Required
Required
Required
Required
Identify
Required
Get Features
Required
Set Features
Required
Required
Required
Abort
Required
Abort Command
Optional
Firmware Activate
Optional
Firmware
Update / Management
Optional
Optional
Vendor Specific
Category
Queue
Management
Configuration
Status Reporting
32
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
PRP Entry 1
8
9
10
Queue Size
11
Queue Identifier
QPRIO PC
12
13
14
15
33
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
1
2
3
4
5
DWord
PRP Entry 1
7
8
9
10
Queue Size
11
Interrupt Vector
IEN PC
12
13
14
15
Queue Identifier
IEN: 0 = Interrupts disabled, 1 = Interrupts enabled
PC: 1 = Submission queue is physically contiguous in host memory, 0 = Submission queue is not physically contiguous
34
Identify
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
11
12
PRP Entry 1
PRP Entry 2
CNS
13
14
15
35
Identify
Admin Command
1023
Active NSID
0
...
n
n+1
List of active
NSIDs greater
than or equal to
CDW1.NSID
...
Active Namespace
Data Structure
...
Identify Namespace
Data Structure
...
Identify Controller
Data Structure
Active NSID
Active NSID
Active NSID
Active NSID
Active NSID
Active NSID
0
Active Namespace
Data Structure
36
37
38
Set Feature
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
DWord
6
7
8
9
PRP Entry 1
PRP Entry 2
10
11
Feature Identifier
Parameter
12
13
14
15
39
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
1
2
3
4
5
DWord
PRP Entry 1
7
8
PRP Entry 2
9
10
Number of DWords
11
12
13
14
15
40
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
Command Identifier
Opcode
FUSE
Namespace Identifier
1
2
3
4
5
DWord
6
7
8
9
10
Error Status
SMART / Health status
Vendor Specific
Examples:
11
12
13
14
15
Byte 3
Byte 2
Byte 1
Log Page
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
DWord
Type
1
2
SQ Identifier
3
Status Field
SQ Head Pointer
P
Command Identifier
41
42
Category: Admin
Command (Required or Optional):
Format NVM (Optional)
Security Send (Optional)
Security Receive (Optional)
43
Command (Required or Optional), by category:
Required Data Commands: Read (Required), Write (Required), Flush (Required)
Optional Data Commands: Write Uncorrectable (Optional), Write Zeroes (Optional), Compare (Optional)
Data Hints: Dataset Management (Optional)
Reservations Commands: Reservation Acquire (Optional), Reservation Register (Optional), Reservation Release (Optional), Reservation Report (Optional)
Vendor Specific (Optional)
44
Read
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
Command Identifier
FUSE
Opcode
Namespace Identifier
1
2
3
4
D Wor d
PRP Entry1
7
8
10
PRINFO
DSM
13
Starting LBA
11
15
PRP Entry2
14
12 LR FUA
45
Fused Operation
A fused operation is a method to create a complex command by
fusing together two simpler commands.
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0
Command Identifier
FUSE
Field definition
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
11
12
Metadata Pointer
PRP Entry1
PRP Entry2
13
14
15
46
47
Read Cmd
Starting LBA
Num Logical Blks
Starting LBA
Num Logical Blks
DSM
DSM
DSM
DSM
Dataset
Management
Cmd
LBA Range
DSM
LBA Range
DSM
LBA Range
DSM
1 to 256
Ranges
LBA Range
DSM
LBA Range
DSM
LBA Range
DSM
LBA Range
DSM
LBA Range
DSM
Reservation Overview
Reservations provide capabilities that may be utilized by two or more hosts to provide coordinated access to a shared namespace.
The protocol and manner in which these capabilities are used are outside the scope of NVMe.
Reservations are functionally compatible with T10 persistent reservations.
49
NVM Express
Controller 1
NVM Express
Controller 2
Host ID = A
Host ID = A
NSID 1
NSID 1
Host
B
Host
C
NVM Express
Controller 3
NVM Express
Controller 4
Host ID = B
Host ID = C
NSID 1
NSID 1
Namespace
NVM Subsystem
The Host Identifier (Host ID) associated with each controller allows the NVM subsystem to identify controllers associated with the same host and to preserve reservation properties across controllers.
Operation
Register a reservation key
Reservation Register
Reservation Acquire
Reservation Release
Reservation Report
51
Reservation Type
Reservation
Holder
Read
Write
Registrant
Read
Write
Non-Registrant
Read
Write
Write Exclusive
Exclusive Access
52
Reservation Acquire
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
3
4
D Wor d
6
7
8
9
10
11
12
PRP Entry 1
PRP Entry 2
Reservation Type
IEKEY
RACQA
000b Acquire
001b Preempt
010b Preempt and Abort
13
14
15
53
LBA Metadata
2^n bytes where n ≥ 9: 512 B, 1024 B, 2048 B, 4096 B, ...
N Bytes
54
55
LBA Data
LBA Metadata
PI
LBA Metadata
LBA Data
LBA Metadata
PI
LB Data
LB Data
NVMe
Controller
Host
No Data Protection
Information
NVM
PCIe SSD
LB Data
Prot.
LB Data
Prot.
LB Data
NVMe
Controller
Host
Prot.
NVM
End-to-End
Data Protection
Information
PCIe SSD
LB Data
Host
LB Data
LB Data
NVMe
Controller
Prot.
NVM
PCIe SSD
Functionally compatible with T10 DIF & DIX, including DIF Type 1, 2, and 3
57
Format NVM
Used to low-level format a namespace
Byte 3
Byte 2
Byte 1
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
Byte 0
8
Opcode
FUSE
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
11
SES
PIL
PI
MS
LBAF
13
15
PI: 0 = No PI, 1 = Type 1, 2 = Type 2, 3 = Type 3
MS: 0 = Two buffers, 1 = Extended LBA
SES: 0 = No secure erase, 1 = User data erase, 2 = Cryptographic erase
58
Power
Manager
Power State
(host software)
NVMe
SSD
Performance Statistics
Entry
Latency
Exit
Latency
Relative
Read
Throughput
Relative
Read
Latency
Relative
Write
Throughput
Relative
Write
Latency
25 W
Yes
ms
ms
18 W
Yes
ms
ms
18 W
Yes
ms
ms
15 W
Yes
20
ms
15
ms
7W
Yes
20
ms
30
ms
1W
No
100 mS
50 mS
.25 W
No
100 mS
500 mS
59
Idle Time
Prior to
Transition
Idle
Transition
Power State
ms
500 ms
ms
500 ms
ms
ms
500 ms
20
ms
15
ms
500 ms
20
ms
30
ms
500 ms
50 mS
10,000 ms
500 mS
Power
State
Maximum
Power
Operational
State
Entry
Latency
Exit
Latency
25 W
Yes
ms
18 W
Yes
ms
18 W
Yes
15 W
Yes
7W
Yes
1W
No
100 mS
.25 W
No
100 mS
Power State
500 ms Idle
Power State
I/O Activity
Submission
Queue Tail
Doorbell Written
10,000 ms Idle
Power State
60
Backup
MSB
1
2
3
Address
4
5
6
Byte
7
8
MSB
LSB
MSB
Length
10
11
LSB
12
Reserved
13
Length
Length of the data block in
bytes
A value of zero indicates that
no data is transferred
14
15
MSB
Desc. Type Specific
62
1
2
3
Reserved
4
5
Length
6
Byte
7
8
MSB
Length
10
11
LSB
12
Reserved
13
14
15
MSB
Desc. Type Specific
63
MSB
1
2
3
Address
4
5
6
Byte
7
8
MSB
LSB
MSB
Length
10
11
LSB
12
Reserved
13
14
15
MSB
Desc. Type Specific
Length
Length of the segment in
bytes
Must be multiple of 16 (a
descriptor is 16B)
64
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Namespace Identifier
Opcode
3
4
5
DWord
6
7
8
9
10
Queue Identifier
11
12
13
14
15
65
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
Queue Identifier
11
12
13
14
15
66
Get Feature
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
PRP Entry 1
PRP Entry 2
Feature Identifier
11
12
13
14
15
67
Abort
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
Command Identifier
Opcode
FUSE
Namespace Identifier
3
4
5
DWord
6
7
8
9
Command Identifier
10
Submission Queue ID
11
12
13
14
15
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
DWord
1
2
3
SQ Identifier
Status Field
SQ Head Pointer
P
Command Identifier
68
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
PRP Entry 1
PRP Entry 2
10
Number of Dwords
11
Offset
12
13
14
15
69
Firmware Activate
Used to activate a firmware image
Byte 3
Byte 2
Byte 1
Command Identifier
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
7
8
9
10
AA
FS
11
12
13
14
15
70
Security Receive
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Namespace Identifier
1
2
3
4
5
D Wor d
PRP Entry1
7
8
PRP Entry2
9
10
Security Protocol
11
Opcode
SP Specific
Allocation Length
12 LR
13
14
15
71
Security Send
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Namespace Identifier
1
2
3
4
5
D Wor d
PRP Entry1
7
8
Opcode
PRP Entry2
9
10
Security Protocol
11
SP Specific
Allocation Length
12 LR
13
14
15
72
Write
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
Command Identifier
FUSE
Opcode
Namespace Identifier
1
2
3
D Wor d
PRP Entry 1
PRP Entry 2
10
PRINFO
DSM
13
14
15
Described later
Starting LBA
11
12 LR FUA
73
Flush
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
1
2
FUSE
Namespace Identifier
Opcode
4
5
DWord
6
7
8
9
10
11
12
13
14
15
74
Write Uncorrectable
Mark logical blocks invalid
Subsequent reads return Unrecovered Read Error status
Byte 3
Byte 2
Byte 1
Command Identifier
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
FUSE
Opcode
Namespace Identifier
3
4
5
D W or d
6
7
8
9
10
11
12
Starting LBA
Number of Logical Blocks
13
14
15
75
Write Zeroes
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
Command Identifier
FUSE
Opcode
Namespace Identifier
1
2
4
5
D W or d
6
7
Starting LBA
11
12 LR FUA
PRINFO
13
14
15
10
Described later
Compare
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
P
Command Identifier
FUSE
Opcode
Namespace Identifier
3
4
D Wor d
PRP Entry1
7
8
PRP Entry2
9
10
Starting LBA
11
12 LR FUA
PRINFO
13
14
15
Dataset Management
Allows the host to indicate attributes for ranges of logical blocks
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
DWord
6
7
8
9
10
11
PRP Entry 1
PRP Entry 2
Number of Ranges
ID
R
AD IDW IDR
12
13
14
15
Range Definition
Byte 3
Byte 2
Byte 1
Context Attributes
Byte 0
8
Starting LBA
Range 0
Range 1
Context Attributes
Length in Logical Blocks
Starting LBA
Context Attributes
Starting LBA
Buffer
Range 2
Context Attributes
Length in Logical Blocks
Starting LBA
Range 3
Context Attributes
Length in Logical Blocks
Starting LBA
Context Attributes
Range 4
DWord
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
79
Context Attributes
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10
0
WP SW SR
SR
4
AL
AF
No info provided
Typical access
Infrequent access
No info provided
Typical latency
Sequential Read Range (SR): Optimize for sequential reads as a single object
Sequential Write Range (SW): Optimize for sequential writes as a single object
Command Access Size: Number of logical blocks expected to be accessed in a read or write command in the near future. Zero indicates no information provided.
Reservation Register
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
D Wor d
6
7
8
9
10 CPTPL
PRP Entry 1
PRP Entry 2
IEKEY
RREGA
11
12
13
14
15
81
Reservation Release
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
FUSE
Opcode
Namespace Identifier
2
3
4
5
D Wor d
7
8
9
10
11
12
13
14
15
PRP Entry 1
PRP Entry 2
Reservation Type
IEKEY
RRELA
00b Release
01b Clear
82
Reservation Report
Byte 3
Byte 2
Byte 1
Byte 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9
0
Command Identifier
Namespace Identifier
2
3
4
FUSE
Opcode
D Wor d
7
8
9
10
PRP Entry 1
PRP Entry 2
Number of Dwords
11
12
13
14
15
83
Host
A
Host
B
Host
C
NVM Express
Controller 3
NVM Express
Controller 4
Host ID = B
Host ID = C
NSID 1
NSID 1
Reservations in Action
NVM Express
Controller 1
NVM Express
Controller 2
Host ID = A
Host ID = A
NSID 1
NSID 1
HostA-Register(NSID, Key_A) -> OK
HostB-Register(NSID, Key_B) -> OK
HostA-AcquireReservation(NSID, Reservation, WriteExclusiveRegistrantsOnly, Key_A) -> OK
HostC-AcquireReservation(NSID, Reservation, WriteExclusiveRegistrantsOnly, Key_C) -> Error Reservation Conflict
HostA-Write(NSID) -> OK
HostB-Read(NSID) -> OK
HostB-Write(NSID) -> OK
HostC-Read(NSID) -> OK
HostC-Write(NSID) -> Error Reservation Conflict
HostA-ReleaseReservation(NSID, Key_A) -> OK
HostC-Write(NSID) -> OK
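The trace above can be reproduced with a toy model of a Write Exclusive, Registrants Only reservation. This is a deliberately simplified sketch: the `Namespace` class, its methods, and the returned strings are hypothetical, only one reservation type is modeled, and preemption and release corner cases are left out.

```python
class Namespace:
    """Toy Write Exclusive, Registrants Only reservation (hypothetical)."""

    def __init__(self):
        self.keys = {}        # host -> registered reservation key
        self.holder = None    # host holding the reservation, if any

    def register(self, host, key):
        self.keys[host] = key
        return "OK"

    def acquire(self, host, key):
        # Only a registrant presenting its own key may acquire.
        if self.keys.get(host) != key or self.holder not in (None, host):
            return "Reservation Conflict"
        self.holder = host
        return "OK"

    def release(self, host, key):
        # Simplification: only the holder releases in this model.
        if self.holder != host or self.keys.get(host) != key:
            return "Reservation Conflict"
        self.holder = None
        return "OK"

    def read(self, host):
        return "OK"           # this reservation type never blocks reads

    def write(self, host):
        if self.holder is None or host in self.keys:
            return "OK"
        return "Reservation Conflict"


ns = Namespace()
trace = [
    ns.register("A", "Key_A"),
    ns.register("B", "Key_B"),
    ns.acquire("A", "Key_A"),
    ns.acquire("C", "Key_C"),   # C never registered a key
    ns.write("A"),
    ns.read("B"),
    ns.write("B"),
    ns.read("C"),
    ns.write("C"),
    ns.release("A", "Key_A"),
    ns.write("C"),
]
```

Each entry of `trace` lines up with one line of the sequence above, in the same order.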
Queue Management
To allocate I/O Submission Queues and I/O Completion Queues,
host software follows these steps:
1. Configure the Admin Registers and enable controller (CC.EN=1)
2. Submit a Set Features command for the Number of Queues
attribute in order to request the number of I/O Submission
Queues and I/O Completion Queues desired. The completion of
this Set Features command indicates the number of I/O
Submission and Completion Queues allocated.
3. Determine the maximum number of entries supported per queue
(CAP.MQES) and whether the queues are required to be
physically contiguous (CAP.CQR)
4. Allocate the desired I/O Completion Queues by using the Create
I/O Completion Queue command.
5. Allocate the desired I/O Submission Queues by using the
Create I/O Submission Queue command.
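Step 2 can be illustrated by the arithmetic on the Number of Queues value. The sketch assumes the layout as recalled from the NVMe 1.x specification: the request carries the submission queue count in bits 15:0 and the completion queue count in bits 31:16, both zero-based, and the completion returns the allocated counts in the same positions. The helper names are hypothetical.

```python
def encode_number_of_queues(nsq, ncq):
    """Build the request value: completion count high, submission count low."""
    # Both counts are zero-based, so a value of 0 requests one queue.
    return ((ncq - 1) << 16) | (nsq - 1)


def decode_allocated(dw0):
    """Return (submission queues, completion queues) from the completion."""
    return (dw0 & 0xFFFF) + 1, (dw0 >> 16) + 1


request = encode_number_of_queues(nsq=8, ncq=8)
# Example completion value in which the controller granted fewer CQs.
allocated = decode_allocated(0x00030007)
```

The host should size its queue layout from the allocated counts, since the controller may grant a different number than was requested.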
Physical
Function
0
NVMe Controller
Virtual Function (0,1)
NSID 1
NS
A
NSID 2
NVMe Controller
Virtual Function (0,3)
NVMe Controller
Virtual Function (0,2)
NSID 1
NSID 2
NSID 1
NS
C
NS
B
NSID 2
NVMe Controller
Virtual Function (0,4)
NSID 1
NSID 2
NS
D
NS
E
86
NVM Express 1.1 added hooks to enable Enterprise multi-host usage models
Globally Unique ID for a namespace
Reservation capability
PCIe Port x
PCIe Port y
NVMe Controller
PCI Function 0
NVMe Controller
PCI Function 0
NSID 1
NSID 2
NSID 1
NS
A
NSID 2
NS
C
NS
B
Controller Shutdown
The host performs the following actions in sequence for a normal shutdown:
1. Stop submitting any new I/O commands to the controller and allow any
outstanding commands to complete.
2. The host should delete all I/O Submission Queues, using the Delete I/O
Submission Queue command.
3. The host should delete all I/O Completion Queues, using the Delete I/O
Completion Queue command.
4. The host should set the Shutdown Notification (CC.SHN) field to 01b to indicate
a normal shutdown operation. The controller indicates when shutdown
processing is completed by updating the Shutdown Status (CSTS.SHST) field to
10b.
The host performs the following actions in sequence for an abrupt shutdown:
1. Stop submitting any new I/O commands to the controller.
2. The host should set the Shutdown Notification (CC.SHN) field to 10b to indicate
an abrupt shutdown operation. The controller indicates when shutdown
processing is completed by updating the Shutdown Status (CSTS.SHST) field to
10b.
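The final step of both sequences (write CC.SHN, then poll CSTS.SHST for 10b) can be sketched against a fake register model. The bit positions are assumptions taken from the NVMe 1.x register map as recalled here (CC.SHN in bits 15:14, CSTS.SHST in bits 3:2), and `FakeController` is a stand-in invented for the example, not a real device interface.

```python
CC_SHN_SHIFT = 14
CC_SHN_MASK = 0b11 << CC_SHN_SHIFT
CSTS_SHST_SHIFT = 2
CSTS_SHST_MASK = 0b11 << CSTS_SHST_SHIFT


class FakeController:
    """Stand-in for memory-mapped registers (hypothetical test double)."""

    def __init__(self):
        self.cc = 0x1      # CC.EN = 1
        self.csts = 0x1    # CSTS.RDY = 1
        self._polls_left = 0

    def write_cc(self, value):
        self.cc = value
        if value & CC_SHN_MASK:
            # Report "shutdown processing occurring" (01b) for a while.
            self.csts = (self.csts & ~CSTS_SHST_MASK) | (0b01 << CSTS_SHST_SHIFT)
            self._polls_left = 2

    def read_csts(self):
        if self._polls_left:
            self._polls_left -= 1
            if self._polls_left == 0:
                # Report "shutdown processing complete" (10b).
                self.csts = (self.csts & ~CSTS_SHST_MASK) | (0b10 << CSTS_SHST_SHIFT)
        return self.csts


def shutdown(ctrl, abrupt=False, max_polls=100):
    """Set CC.SHN (01b normal, 10b abrupt) and wait for CSTS.SHST == 10b."""
    shn = 0b10 if abrupt else 0b01
    ctrl.write_cc((ctrl.cc & ~CC_SHN_MASK) | (shn << CC_SHN_SHIFT))
    for _ in range(max_polls):
        if ((ctrl.read_csts() >> CSTS_SHST_SHIFT) & 0b11) == 0b10:
            return True
    return False


normal = FakeController()
normal_done = shutdown(normal)
abrupt = FakeController()
abrupt_done = shutdown(abrupt, abrupt=True)
```

A real driver would bound the wait with a timeout rather than a poll count; the loop limit here only keeps the sketch from spinning forever.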
Activate Firmware:
Firmware Slots
0
Controller Reset
Resets
90
Bits 31:00 | Type: RW | Reset: 0h
Description: NVM Subsystem Reset Control (NSSRC): A write of the value 4E564D65h ("NVMe") to this field initiates an NVM Subsystem Reset. A write of any other value has no functional effect on the operation of the NVM subsystem. This field shall return the value 0h when read.
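The magic value is simply the ASCII string "NVMe" read as a 32-bit number, which a short check confirms. The names `NSSR_MAGIC` and `write_triggers_reset` are illustrative only.

```python
# ASCII "NVMe" interpreted as a big-endian 32-bit integer.
NSSR_MAGIC = int.from_bytes(b"NVMe", "big")


def write_triggers_reset(value):
    """Only the magic value initiates an NVM Subsystem Reset."""
    return value == NSSR_MAGIC


magic_hex = f"{NSSR_MAGIC:08X}h"
```

Requiring a specific 32-bit pattern makes it unlikely that a stray write resets the whole subsystem.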
Data Protection
Data protection information associated
with each sector
Same format as DIF / DIX
Bytes 0-1: Guard (MSB in byte 0, LSB in byte 1)
Bytes 2-3: Application Tag (MSB in byte 2, LSB in byte 3)
Bytes 4-7: Reference Tag (MSB in byte 4, LSB in byte 7)
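A sketch of packing the 8 bytes of protection information in the layout above, most significant byte first. It assumes the Guard is the CRC-16 used by T10 DIF (polynomial 8BB7h, initial value 0) computed over the logical block data; that choice, the function names, and the sample tag values are assumptions for illustration.

```python
import struct


def crc16_t10dif(data):
    """Bitwise CRC-16 with polynomial 0x8BB7 and initial value 0 (assumed)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def pack_pi(block, app_tag, ref_tag):
    """Pack Guard (2 bytes), Application Tag (2), Reference Tag (4)."""
    return struct.pack(">HHI", crc16_t10dif(block), app_tag, ref_tag)


block = bytes(512)                       # one all-zero 512-byte logical block
pi = pack_pi(block, app_tag=0x1234, ref_tag=0x0000002A)
guard, app_tag, ref_tag = struct.unpack(">HHI", pi)
```

A bitwise CRC is slow but easy to audit; production code would normally use a table-driven or hardware-assisted implementation.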