Using PCIe® over Cable for High-Speed CPU-to-CPU Communications
Steve Cooper
One Stop Systems

Copyright © 2008, PCI-SIG, All Rights Reserved 1


Abstract

Although PCIe® is typically thought of as a CPU-to-I/O interface solution, it is also applicable to high-speed CPU-to-CPU communications within high-performance computing networks. This presentation will discuss both the tree and network usage models and the hardware and software challenges involved in using standard PCIe in real-world multi-CPU applications. Topics will include how to interconnect multiple root complexes, how to isolate multiple spread-spectrum clocks, how to set up burst transfers without a DMA controller, and how to deal with network configuration and hot-swap events.

PCI-SIG Developers Conference Copyright © 2008, PCI-SIG, All Rights Reserved 2


PCIe over a Cable

■ Native PCIe
✓ Full bandwidth
✓ Full S/W transparency
■ Four connector and cable sizes defined
✓ x1, x4, x8 and x16
■ Cable length unspecified
✓ Available up to 7 meters



PCI Express Basics
Two Architectures

✓ Tree – One CPU and multiple I/O boards
[Figure: one CPU above three I/O boards]
✓ Network – Multi CPUs, Multi I/O
– Requires special H/W and S/W
[Figure: three CPUs, a switch, and three I/O boards]



Tree Architecture – I/O Expansion Usage Model



Upstream Adapters
■ PC, laptop and industrial form factors commercially available now



PCIe Cables
■ PCI-SIG PCI Express External Cabling Specification 1.0
– x1 → 2.5 Gb/s
– x4 → 10 Gb/s
– x8 → 20 Gb/s
– x16 → 40 Gb/s
■ Cable pin-outs include PCIe lanes and auxiliary signals for:
• 100 MHz Reference Clock (CREFCLKp, CREFCLKn)
• Cable Present (CPRSNT#)
• Platform Reset (CPERST#)
• Cable Power On (CPWRON#)
• Sideband Return (SB_RTN)
• Cable Wake (CWAKE#)
• 3.3V Power (+3.3V POWER, PWR_RTN)
■ Looking beyond the current specification
✓ Same cables usable for PCIe 2.0 5GT/s?
✓ Fiber optic cable solutions for longer distance?
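
The per-width rates listed on this slide are raw signaling rates: 2.5 Gb/s per lane multiplied by the lane count. As a rough sketch, the snippet below reproduces those figures and also estimates the data rate per direction, assuming the 8b/10b line encoding used by first-generation PCIe (8 data bits per 10 bits on the wire). The function names are illustrative, and packet header overhead is ignored.

```python
# Raw signaling rate per lane for first-generation PCIe, in Gb/s.
LANE_RATE_GBPS = 2.5
# Assumption: 8b/10b encoding, so 8 of every 10 bits on the wire carry data.
ENCODING_EFFICIENCY = 8 / 10


def raw_rate_gbps(lanes: int) -> float:
    """Raw per-direction signaling rate of a link with the given lane count."""
    return lanes * LANE_RATE_GBPS


def data_rate_mbytes(lanes: int) -> float:
    """Per-direction data rate in MB/s after encoding, before packet overhead."""
    return raw_rate_gbps(lanes) * ENCODING_EFFICIENCY * 1000 / 8


for lanes in (1, 4, 8, 16):
    print(f"x{lanes}: {raw_rate_gbps(lanes):g} Gb/s raw, "
          f"about {data_rate_mbytes(lanes):g} MB/s of data per direction")
```

Under these assumptions an x4 cable carries at most about 1000 MB/s of data in each direction, which is a useful ceiling when reading measured throughput numbers.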



PCIe over Cable Multi-port Switch

■ Extends PCIe to multiple downstream sub-systems
■ One upstream link to multiple downstream links



Downstream Adapters and Devices
■ Creating downstream PCIe endpoints
✓ PCIe board adapters
✓ Backplane interface boards
✓ Subsystems with PCIe cable inputs
✓ Backplanes with PCIe cable inputs



PCI Express Basics
Two Architectures

✓ Tree – One CPU and multiple I/O boards
[Figure: one CPU above three I/O boards]
✓ Network – Multi CPUs, Multi I/O
– Requires special H/W and S/W
[Figure: three CPUs, a switch, and three I/O boards]



Network Usage Model Configuration

■ Networking characteristics
✓ Peer-to-peer
✓ TCP/IP compatibility
✓ High speed direct data transfers



PCIe over Cable Comparison versus Ethernet
■ PCIe Performance
✓ 3 to 100 times faster than 1Gb Ethernet
■ PCIe Cost – Source OSS
✓ Adapters: ~ $100 to $700
✓ Cables: $30 to $300
✓ Switches: $600 to $1,200
■ PCIe cables
✓ Heavy-duty well shielded cables
✓ All cables are cross-over style
■ PCIe best suited for small, local networks
✓ Direct mapping of memory versus store and forward
[Figure: price versus performance chart showing 1 Gb Ethernet, 10 Gb Ethernet, and PCIe over Cable (2.5Gb to 80Gb)]

Challenges in using PCIe in CPU-to-CPU Configurations

■ Hardware
✓ Connecting multiple root complexes together
✓ Isolating multiple spread-spectrum clocks
✓ Backplane configurations
✓ Choosing the right PCIe slots
■ Software
✓ Direct data transfers and TCP/IP compatibility
✓ Burst transfers without a DMA controller
✓ Network configurations
✓ Hot-swap events
✓ Fault-tolerance



Connecting Multiple Root Complexes
■ Non-transparent (NT) PCIe ports make the node CPU elements PCIe end points
■ S/W drivers set up memory windows for each CPU to CPU connection
■ Two modes of communication
✓ Direct data transfers
✓ TCP/IP transfers
– Looks like a NIC card to applications
[Figure: CPUs labeled Root Complex, NT Port, ... NT Port, all connected to a PCIe Switch]



Isolating Multiple Spread-Spectrum Clocks
■ Spread-spectrum clocking (SSC)
✓ Reduces noise spikes at core frequencies by dynamically changing the clock frequency
✓ Driven by each root complex
■ Challenge is how to interconnect two PCIe root complex domains, each with its own SSC
✓ Current PCIe switch components don’t support this
■ Solution:
✓ Back-to-back switch components
✓ Each isolates one SSC to a common fixed clock rate
✓ Allows SSC to be used in each system and over the cable
[Figure: CPU #1 and CPU #2 root complexes with SSC #1 and SSC #2; NT port and SSC isolation; PCIe with SSC #1; two switches joined by a constant clock]

Backplane Configurations
■ Multi-CPU industrial buses or blade servers
■ Upgrade path instead of Ethernet over the backplane
■ Three implementations
✓ Switch board contains non-transparent switches so CPU boards are unchanged
✓ Node CPU boards contain non-transparent bridges
✓ Backplane contains non-transparent bridges
[Figure: a switch connected to six NT bridges; label: 10 Gb Ethernet]



Choosing the Right PCIe Slots
■ PCIe slots connected to North bridge have faster CPU access to memory
■ Measured performance
✓ Two system configuration
✓ x4 PCIe over cable
✓ Direct data transfers
✓ Results:
– 693MB/s using NB slots
– 404MB/s using SB slots
✓ 72% faster using North bridge slots
■ Results likely to be chip set specific
[Figure: North Bridge with memory and PCIe slots; South Bridge with PCIe slots]
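
The 72% figure follows directly from the two measured rates. The short calculation below reproduces it and, as an added assumption not stated on the slide, compares each result with the roughly 1000 MB/s per-direction data rate of an x4 first-generation link after 8b/10b encoding. The helper names are illustrative.

```python
NB_MBPS = 693.0  # measured with North bridge slots
SB_MBPS = 404.0  # measured with South bridge slots
# Assumption: x4 link, 2.5 Gb/s per lane, 8b/10b encoding, one direction.
X4_DATA_MBPS = 4 * 2.5 * 0.8 * 1000 / 8


def percent_faster(fast: float, slow: float) -> float:
    """How much faster `fast` is than `slow`, in percent."""
    return (fast / slow - 1.0) * 100.0


def link_utilization(measured: float, ceiling: float = X4_DATA_MBPS) -> float:
    """Measured throughput as a percentage of the assumed link ceiling."""
    return measured / ceiling * 100.0


print(f"NB vs SB: {percent_faster(NB_MBPS, SB_MBPS):.0f}% faster")
print(f"NB slots use {link_utilization(NB_MBPS):.1f}% of the assumed x4 ceiling")
print(f"SB slots use {link_utilization(SB_MBPS):.1f}% of the assumed x4 ceiling")
```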



Direct Data Transfers and TCP/IP Compatibility
■ TCP/IP driver provides standard networking protocol support
✓ FTP, file copy, web access, etc. all work just like Ethernet
■ Direct Data Transfer
✓ Maps memory
✓ Direct writes into the other computer’s memory
✓ No store and forward
✓ Sockets API
[Figure: protocol stack. Software implementation: Application Layer, TCP Layer, IP Layer, ExpressNet Driver, with DDT and Sockets API alongside. Hardware implementation: PCI Express Transaction Layer, Data Link Layer, Physical Layer]



Direct Data Transfers
■ Each application writes directly to memory without store and forward
✓ Transfers to and from each machine’s memory
✓ Separate allocated memory for each possible pair of intercommunicating nodes
✓ Typically 1 MB per pair is allocated
✓ Limits total number of nodes to <256
[Figure: App 1 and App 2, each with its own address space and memory space, connected by PCIe over cable]
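
Because every pair of nodes gets its own allocation, the memory each node sets aside grows with the size of the network. A minimal sketch of that arithmetic, assuming the typical 1 MB per pair quoted on this slide and one window per peer on each node (the function names are illustrative):

```python
WINDOW_BYTES = 1 << 20  # 1 MB per node pair, the typical figure quoted above


def windows_per_node(nodes: int) -> int:
    """Each node needs one window for every other node."""
    return nodes - 1


def memory_per_node_mb(nodes: int, window_bytes: int = WINDOW_BYTES) -> float:
    """Memory one node reserves for all of its peers, in MB."""
    return windows_per_node(nodes) * window_bytes / (1 << 20)


def total_pairs(nodes: int) -> int:
    """Number of distinct node pairs in a fully connected virtual mesh."""
    return nodes * (nodes - 1) // 2


for n in (2, 5, 8, 255):
    print(f"{n} nodes: {windows_per_node(n)} windows and "
          f"{memory_per_node_mb(n):g} MB per node, {total_pairs(n)} pairs")
```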



Memory Mapped Communications
■ Software virtual mesh architecture
✓ Each node has dedicated memory for each other node
✓ High performance
✓ Limits maximum number of nodes
[Figure: four nodes numbered 1 to 4; each node holds two sets of numbered boxes, one box per other node (node 1: 2 3 4, node 2: 1 3 4, node 3: 1 2 4, node 4: 1 2 3)]
Each numbered box represents an address translation
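
One way to picture the virtual mesh is as a per-node table with one entry per peer. The sketch below builds such a table for four nodes. The window offsets are a hypothetical layout chosen for illustration; a real driver would program the address translation hardware of the NT port instead of keeping a Python dictionary.

```python
def build_mesh(node_ids, window_bytes=1 << 20):
    """Return {node: {peer: offset}} with one window per peer on every node.

    `offset` is a made-up position of the peer's window inside the node's
    reserved region; it stands in for one address translation entry.
    """
    mesh = {}
    for node in node_ids:
        peers = [p for p in node_ids if p != node]
        mesh[node] = {peer: i * window_bytes for i, peer in enumerate(peers)}
    return mesh


mesh = build_mesh([1, 2, 3, 4])
for node, windows in mesh.items():
    print(f"node {node}: translations for peers {sorted(windows)}")
```

With four nodes, every node ends up with three entries, matching the numbered boxes on the slide.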



Burst Transfers without a DMA Controller
■ Processor-assisted burst transfers
■ Write combining: special x86 class of memory addresses that create single PCIe burst write transfers of 64 bytes
✓ Consecutive memory store using special write combining memory space
– Mapped with MTRRs – memory type range registers
✓ Write combining improves performance ~ 10X
[Figure: CPU #1 stores into write combining cache memory; a PCIe packet carries the burst store into CPU #2 memory]
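
The effect of write combining can be illustrated with a toy packet count. The model below is an assumption-laden sketch, not a description of real hardware behavior: it treats every uncombined store as its own PCIe write packet and lets consecutive stores that fall in the same 64-byte aligned block share one packet. It counts packets only and does not attempt to reproduce the ~10X performance figure.

```python
WC_BUFFER_BYTES = 64  # burst size quoted on the slide


def packets_uncombined(store_count: int) -> int:
    """Toy model: without combining, every store becomes one write packet."""
    return store_count


def packets_combined(start: int, store_count: int, store_bytes: int = 8) -> int:
    """Toy model: consecutive stores in one 64-byte aligned block share a packet."""
    if store_count == 0:
        return 0
    first_block = start // WC_BUFFER_BYTES
    last_block = (start + store_count * store_bytes - 1) // WC_BUFFER_BYTES
    return last_block - first_block + 1


stores = (1 << 20) // 8  # filling a 1 MB window with 8-byte stores
print("uncombined packets:", packets_uncombined(stores))
print("combined packets:  ", packets_combined(0, stores))
```

Note that a run of stores starting in the middle of a 64-byte block spills into the next block and costs an extra packet, which is one reason aligned buffers are the usual choice.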



Network Topologies and Configurations
■ Point-to-point or simple star networks are straightforward
■ Larger networks or unusual topologies are more complex
✓ Cascading switches
✓ Ring networks
■ Network setup is distributed
✓ Systems can power up in any order
✓ Each node self-configures



Configuration Examples
Two System Configuration

■ Point-to-point connection
■ Non-transparent bridging and clock isolation in node PC



Configuration Examples
5 System Configuration

■ PCIe switch board in host PC
■ Non-transparent bridging and clock isolation in node PCs



Configuration Examples
8 System Configuration

■ External multi-port switch
■ Non-transparent bridging and clock isolation in node PCs



Hot-Plug Events

■ Startup
✓ Power up in any order
✓ Host and nodes must be able to operate without the others powered on
■ Add-ons
✓ Network must recognize new system configuration
✓ Bring up and run
✓ Maximum network topology may be preset in S/W
– Saves on pre-allocated memory
■ Removal
✓ Notifies all attached applications on all other nodes
✓ If in the middle of a transaction:
– Using TCP, application gets error message
– Using DDT, user code defines how errors are handled
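
The add and remove behavior described on this slide can be sketched as a small membership tracker with application callbacks. Everything here is hypothetical: the class name, the event strings, and the callback signature are invented for illustration and do not correspond to any real driver interface.

```python
class PcieNetwork:
    """Toy model of node membership; not a real driver interface."""

    def __init__(self, max_nodes):
        # A preset maximum topology bounds how much memory is pre-allocated.
        self.max_nodes = max_nodes
        self.nodes = set()
        self.listeners = []

    def subscribe(self, callback):
        """Register an application callback taking (event, node_id)."""
        self.listeners.append(callback)

    def _notify(self, event, node_id):
        for callback in self.listeners:
            callback(event, node_id)

    def add_node(self, node_id):
        if node_id in self.nodes:
            return
        if len(self.nodes) >= self.max_nodes:
            raise RuntimeError("preset maximum topology exceeded")
        self.nodes.add(node_id)
        self._notify("added", node_id)

    def remove_node(self, node_id):
        if node_id in self.nodes:
            self.nodes.remove(node_id)
            self._notify("removed", node_id)


events = []
net = PcieNetwork(max_nodes=4)
net.subscribe(lambda event, node_id: events.append((event, node_id)))
net.add_node(1)  # systems may come up in any order
net.add_node(3)
net.remove_node(1)  # every subscribed application is told about the removal
print(events)
```

In this sketch the callback is where a DDT application would put its own error handling for a transfer that was in flight when the peer disappeared.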



Fault-tolerant Networks

■ Protocols that detect and recover from faults
✓ Same as Ethernet
■ Nodes survive if a switch goes down
■ Nodes automatically re-direct traffic
■ Bridging software
– Included in Windows and Linux
✓ Set up preferred route
✓ Then set up alternate routing
✓ TCP/IP automatically adjusts
[Figure: Switch A and Switch B, each connected to Node 1, Node 2, ... Node n]
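
The preferred and alternate route idea reduces to trying switches in preference order. The sketch below is purely illustrative and assumes a hypothetical status dictionary; in the configuration described on this slide the actual selection is done by the operating system's bridging software, not by application code.

```python
def pick_switch(preference, switch_up):
    """Return the first switch in preference order that is up, else None."""
    for name in preference:
        if switch_up.get(name, False):
            return name
    return None


preference = ["A", "B"]  # preferred route first, then the alternate
print(pick_switch(preference, {"A": True, "B": True}))    # prints A
print(pick_switch(preference, {"A": False, "B": True}))   # prints B
print(pick_switch(preference, {"A": False, "B": False}))  # prints None
```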



Summary

■ PCIe over cable exists
✓ Approved PCI-SIG standard
■ Tree architecture
✓ Simple
✓ High performance
✓ Good fit for I/O expansion
■ Network architecture
✓ More complex H/W and S/W
✓ High performance
✓ Good fit for small network applications
[Figure: three CPUs, a switch, and three I/O boards]



Thank you for attending the
PCI-SIG Developers Conference 2008

For more information please go to www.pcisig.com
