You are on page 1of 17

COM-5502SOFT

IP/TCP SERVER/UDP/ARP/PING STACK for


10GbE
VHDL SOURCE CODE OVERVIEW

Overview simple enough to interface with any Ethernet MAC


component with minimum glue logic.
10Gigabit-speed IP protocols like TCP/IP and
UDP/IP can demand a high level of computation on
Wireshark Libpcap network capture files can be
processors. The trend has been to move the
used as receiver input for simulation purposes.
implementation of these fast but highly repetitive
tasks to a TCP offload engine (TOE) to free the
application processor from frequent interrupts. Block Diagram
MAC APPLICATION
INTERFACE IGMP INTERFACE
The COM-5502SOFT is a generic Internet protocol
stack (including the VHDL source code) designed
to support near 10Gbps throughputs on any low- NEIGHBOR
DISCOVERY
cost FPGAs running at 156.25 MHz.
STREAMS TO
ARP
PACKETS
The modular architecture of VHDL components REPLY

reflects the various internet protocols implemented


within: TCP servers1, UDP transmit, UDP receive, PING
UDP_TX
10Gb MAC REPLY
ARP, NDP and PING. Ancillary components are Receive
also included for streaming. These components can Interface
ROUTING
be easily enabled or disabled as needed by the user's TABLE
application. PACKET
PARSING
ARP/NDP
REQUEST
The VHDL source code is fully portable to a variety
of FPGA platforms.
PACKETS TO
UDP_RX
STREAMS
The maximum number of concurrent TCP
connections can be adjusted prior to VHDL
synthesis depending on the available FPGA TCP
TCP_RXBUF

resources. SERVERS

TCP_TXBUF
The code is written specifically for IEEE 802.3
Ethernet packet encapsulation (RFC 894). It
supports IPv4, IPv6, jumbo frames. TCP_TX

The code interfaces seamlessly with the 10Gb MAC


Arbitration
COM-5501SOFT 10Gbps Ethernet MAC for the Transmit
Interface
MAC / PHY layers implementation or the
COM-5401SOFT 10/100/1000 Mbps Ethernet 180007

MAC. However, the MAC interface is generic and

1
See COM-5503SOFT for TCP clients.
MSS • 845 Quince Orchard Boulevard Ste N • Gaithersburg, Maryland 20878-1676 • U.S.A.
Telephone: (240) 631-1111 Facsimile: (240) 631-1676 www.ComBlock.com
© MSS 2018 Issued 8/19/2018
Target Hardware
The code is written in generic VHDL so that it can Throughput Performance Examples
be ported to a variety of FPGAs capable of running UDP
at 156.25 MHz or above. IPv4 UDP throughput using 512-Byte data frames:
8.64 Gbits/s

Device Utilization Summary IPv4 UDP throughput using 2048-Byte data frames:
9.62 Gbits/s
(Excludes 10G Ethernet MAC and XAUI)
Device: Xilinx Artix-7
UDP-only: TCP
1 UDP rx, 1 UDP tx, IPv4 TCP single server, uni-directional stream,
0 TCP server, ARP, MTU = 1500 Bytes, equal length maximum size IP
Ping, routing table, frames:
IPv4 only, 8KB 9.38 Gbits/s
UDP tx buffer
Flip Flops 3198 TCP
LUTs 2740 IPv6 TCP single server, uni-directional stream,
36Kb block RAM 7.5 MTU = 1500 Bytes, equal length maximum size IP
DSP48 0 frames:
9.23 Gbits/s
TCP IPv4 only:
0 UDP rx, 0 UDP tx,
1 TCP server, ARP,
IPv4 TCP single server, bi-directional streams,
Ping, routing table, MTU = 1500 Bytes
IPv4 only, MTU 8.82 Gbits/s in each direction
1500, 32KB TCP
buffers IPv4 TCP single server, uni-directional stream,
Flip Flops 3515 MTU = 8252 Bytes, equal length maximum size IP
LUTs 3855 frames, buffer size = 32K Bytes:
36Kb block RAM 26 9.88 Gbits/s
DSP48 0
IPv6 TCP single server, uni-directional stream,
TCP IPv4 only: MTU = 8252 Bytes, equal length maximum size IP
0 UDP rx, 0 UDP tx, frames, buffer size = 32K Bytes:
2 TCP servers, ARP, 9.86 Gbits/s
Ping, routing table,
IPv4 only, 32KB
TCP buffers
Flip Flops 4255 TCP Latency Performance Examples
LUTs 5174
The transmit and receive latency depend on the
36Kb block RAM 41
DSP48 0 frame length. For a maximum frame length of 1460
bytes, FPGA 156.25 MHz processing clock:
1 UDP rx, 1 UDP tx,  Transmit latency (from the 1st byte of
1 TCP server, ARP, payload data input to the 1st byte of
Ping, NDP, routing payload data output to the Ethernet MAC):
table, IPv4. IPv6,
23.9µs
32KB TCP buffers
Flip Flops 7293  Receive latency (from the 1st byte of
LUTs 9215 Ethernet MAC input to the 1st byte of
36Kb block RAM 32.5 payload data output): 12.2µs
DSP48 0
If latency is more important than throughput, the
transmit segmentation threshold can be reduced to
X payload bytes. In this more general case,

2
 Transmit latency (from the 1st byte of
payload data input to the 1st byte of Interfaces
payload data output to the Ethernet MAC):
MAC INTERFACE APP INTERFACE
0.5 + 2X/125 µs CLK UDP_RX_DATA(63:0)
The receive latency (from the 1st byte of Ethernet SYNC_RESET UDP_RX_DATA_VALID(7:0)
MAC_TX_DATA(63:0) UDP_RX_SOF
MAC input to the 1st byte of payload data MAC_TX_DATA_VALID(7:0)
UDP RX
DATA UDP_RX_EOF
output): 0.5 + X/125 µs MAC_TX_SOF MAC TX UDP_RX_FRAME_VALID
MAC_TX_EOF DATA
MAC_TX_CTS UDP_RX_DEST_PORT_NO_IN(15:0)
MAC_TX_RTS CHECK_UDP_RX_DEST_PORT_NO
UDP_RX_DEST_PORT_NO_OUT(15:0)
MAC_RX_DATA(63:0)
MAC_RX_DATA_VALID(7:0) UDP_TX_DATA(63:0)
MAC_RX_SOF MAC RX UDP_TX_DATA_VALID(7:0)
MAC_RX_EOF DATA
UDP_TX_SOF
MAC_RX_FRAME_VALID UDP_TX_EOF
UDP TX
DATA UDP_TX_CTS
UDP_TX_ACK
UDP_TX_NAK

UDP_TX_DEST_IP_ADDR(127:0)
UDP_TX_DEST_IPv4_6n

TCP_RX_DATA (63:0)
REPLICATED TCP_RX_DATA_VALID(7:0)
NTCPSTREAMS TCP RX TCP_RX_RTS
TIMES TCP_RX_CTS
DATA
TCP_RX_CTS_ACK
TCP_LOCAL_PORTS

TCP_TX_DATA(63:0)
TCP_TX_DATA_VALID(7:0)
REPLICATED TCP TX TCP_TX_DATA_FLUSH
NTCPSTREAMS DATA TCP_TX_CTS
TIMES TCP_CONNECTED_FLAG
CONTROLS
MAC_ADDR(47:0) CONNECTION_RESET
IPv4_ADDR(31:0) TCP_KEEPALIVE_EN
IPv4_MULTICAST_ADDR(31:0)
IPv4_SUBNET_MASK(31:0)
180013
IPv4_GATEWAY_ADDR(31:0)
IPv6_ADDR(127:0)
IPv6_SUBNET_PREFIX_LENGTH(7:0)
IPv6_GATEWAY_ADDR(127:0)

Component Interface
This interface comprises three primary signal
groups:
 MAC interface (direct connection to COM-
5501SOFT Ethernet MAC or equivalent)
 TCP streams
 UDP frames or UDP streams to/from the
user application.

All signals are clock synchronous. See the


clock/timing section.

3
Configuration IGMP enabled IGMP_EN
enabled (1) / disabled (0)
The key configuration parameters are brought to the Enable when using UDP
interface so that the user can change them multicast addresses
dynamically at run-time. Other, more arcane, Inactive input TX_IDLE_TIMEOUT When
parameters are fixed at the time of VHDL synthesis. stream timeout segmenting a TCP transmit
stream, a packet will be sent out
Pre-synthesis configuration parameters with pending data if no new data
was received within the specified
The following configuration parameters are set timeout.
prior to synthesis in the com5502pkg.vhd package Expressed as integer multiple of
or at the top level component com5502.vhd. 4s.
Configuration Description TCP keep-alive TCP_KEEPALIVE_
parameters in period PERIOD
com5502pkg.vhd period in seconds for sending no
Maximum number NTCPSTREAMS_MAX. data keepalive frames.
of concurrent TCP This applies to all COM5502
streams components instantiated in a "Typically TCP Keepalives are
project. It primarily affects the data sent every 45 or 60 seconds on
width of the TCP interface. an idle TCP connection, and the
Configuration Description connection is dropped after 3
parameters in sequential ACKs are missed"
com5502.vhd CLK frequency CLK_FREQUENCY
Number of NTCPSTREAMS CLK frequency in MHz. Needed
concurrent TCP This applies to a given COM5502 to compute actual delays.
streams for a given instance. Configuration Description
COM5502 Each additional TCP stream parameters in
instance. requires additional resources ping_10g.vhd
(RAM block, logic). Maximum ping MAX_PING_SIZE
size maximum IP/ICMP size
UDP transmit NUDPTX (excluding Ethernet/MAC, but
enabled enabled (1) / disabled (0) including the IP/ICMP header) in
Note: a component handles 64-bit words. Larger echo
multiple ports. requests will be ignored. The
UDP receive NUDPRX ping buffer contains up to
enabled enabled (1) / disabled (0) 18Kbits total (for a queued
Note: a component handles IP/ICMP response waiting for
multiple ports the tx path to become available)
Enable IPv6 IPv6_ENABLED Configuration Description
protocols '1' to allow IPv6 protocols in parameters in
addition to the baseline IPv4. arp_cache2.vhd
'0' to ignore IPv6 messages. Routing table REFRESH_PERIOD(19:0)
MTU size MTU refresh period Refresh period for this routing
Maximum Transmission Unit: IP table. Expressed as an integer
frame maximum byte size. multiple of 100ms. Default value
-- Typically 1500 for standard is 3000 (5 minutes).
frames, 9000 for jumbo frames.
-- A frame will be deemed invalid
if its payload size exceeds this
MTU value.
-- Should match the values in
MAC layer)
-- elastic buffers at the user
interface should be sized to contain
at least 4 IP frames payload. See
ADDR_WIDTH generic
parameter.

4
Run-time configuration parameters
Configuration Description
The user can set and modify the following controls
parameters in
at run-time. All controls are
tcp_txbuf_10G.vhd
synchronous with the user-supplied
tcp_rxbufndemux2
global CLK.
_10G.vhd
Elastic buffer size ADDR_WIDTH
Run-time configuration Description
Specifies the elastic buffer MAC address This network node 48-bit
MAC_ADDR(47:0) MAC address. The user is
size for each stream. Data
width is fixed at 8 bytes. responsible for selecting a
Thus ADDR_WIDTH = 11 unique ‘hardware’ address
indicates a buffer size of for each instantiation.
128 Kbits. Maximum value
= 12 (256Kbits) Natural bit order: enter
Note that the buffer size x0123456789ab for the
must be large enough to MAC address
store two complete IP 01:23:45:67:89:ab
frames payloads (see MTU It is essential that this input
above). matches the MAC address
used by the MAC/PHY.
Configuration Description
IPv4 address Local IP address. 4
parameters in IPv4_ADDR(31:0) bytes for IPv4.
udp_tx_10g.vhd Byte order:
UDP checksum enable UDP_CKSUM_ENABLED (MSB)192.68.1.30(LSB)
(IPv4) IPv4 Subnet Mask Subnet mask to assess
Enable (1) / Disable (0) IPv4_SUBNET_MASK(31:0) whether an IP address is
UDP checksum local (LAN) or remote
computation for IPv4. (WAN)
Objective is to save FPGA Byte order:
resources. (MSB)255.255.255.0(LSB)
Configuration Description IPv4 Gateway IP address One gateway through which
parameters in IPv4_GATEWAY packets with a WAN
stream_2_packets_10g _ADDR(31:0) destination are directed.
. Byte order:
vhd (MSB)192.68.1.1(LSB)
Maximum packet size MAX_PACKET_SIZE IPv6 address Local IP address. 16 bytes
when segmenting a When segmenting a IPv6_ADDR(127:0) for IPv6.
stream to packets transmit stream, a packet Byte order example:
will be sent out as soon as (MSB)FE80::
MAX_PACKET_SIZE 0102:0304:0506:0708(LSB)
bytes are collected. IPv6 Subnet Prefix Length Valid range 64-128
The recommended size is IPv6_SUBNET_PREFIX
512 bytes for a low _LENGTH (7:0)
overhead.
Retransmission timer TX_RETRY_TIMEOUT IPv6 Gateway IP address One gateway through which
IPv6_GATEWAY packets with a WAN
A re-transmission attempt
_ADDR(127:0) destination are directed.
will be made periodically
until routing information is Must be on the same local
available and the transmit network as this device.
path to the MAC is TCP_KEEPALIVE_EN Keep-alive is a mechanism
available. The retry period to detect when a TCP
is expressed as an integer connection is interrupted.
multiple of 4s. Keep-alive messages are
sent periodically. Three
missed keep-alive messages
cause a TCP reset.
Enable (1)/ Disable (0) for
each stream.

5
Throughout this document CTS and RTS refer to
flow control signals "Clear To Send" and "Ready UDP receive interface
To Send" respectively. CTS is generated by the data UDP rx word Output. Receive 0 to 8 bytes.
sink to indicate it can process and/or store incoming UDP_RX_DATA(63:0) Byte order: MSB first (easier
data. RTS is generated by the data source to to read contents during
indicate that data bits are available, should the data simulation)
sink raise its CTS flag. All words in a frame contain
8 bytes, except the last word
which may contain fewer.
UDP-Application Interface UDP rx data valid Output. Indicates the
UDP transmit interface UDP_RX_DATA_VALID meaningful bytes in
UDP transmit word Input: send 0 to 8 bytes. (7:0) UDP_RX_DATA.
UDP_TX_DATA (63:0) Byte order: MSB first (easier 0xFF for 8 bytes, 0x80 for one
to read contents during byte, 0xC0 for two bytes, etc.
simulation). Start Of Frame / End Of Outputs. 1 CLK wide markers
Unused bytes are expected to Frame to delineate the frame
UDP_RX_SOF boundaries.
be zeroed. UDP_RX_EOF
UDP data valid Input. Indicates the SOF = Start Of Frame
UDP_TX_DATA_VALID meaningful bytes in EOF = End Of Frame.
(7:0) UDP_TX_DATA. Aligned with
0xFF for 8 bytes, 0x80 for one UDP_RX_DATA_VALID
byte, 0xC0 for two bytes, etc. UDP_RX_FRAME_VALID Output. The frame validity
UDP_TX_SOF Inputs. 1 CLK wide markers UDP_RX_FRAME_VALID is
UDP_TX_EOF to delineate the frame displayed at the end of frame
boundaries. when UDP_RX_EOF = '1'
SOF = Start Of Frame
EOF = End Of Frame The user is responsible for
Must be aligned with discarding bad frames.
UDP_TX_DATA_VALID
Flow control Output Always check
UDP_CTS ‘1’ = Clear To Send UDP_RX_FRAME_VALID at
‘0’ = input buffer is nearly the end of packet
full. Do not send more data. UDP_RX_EOF = '1') to confirm
that the UDP packet is valid.
The user must check the External buffer may have to
Clear-To-Send flag before backtrack to the the last
sending additional data. The valid pointer to discard an
timing is not precise (it is safe invalid UDP packet.
to send data for a few clocks Reason: we only knows about
after CTS goes low), thanks to bad UDP packets at the end.
an input elastic buffer. CHECK_UDP_RX_ Input. '1' when the COM5502
Outputs DEST_PORT_NO component filters out UDP
Transmission
acknowledgements UDP_TX_ACK: frames sent to a destination
UDP_TX_ACK 1 CLK-wide pulse indicating port other than the user-
UDP_TX_NAK that the previous UDP frame specified
was successfully sent. UDP_RX_DEST_PORT_NO_
IN
UDP_TX_ACK
1 CLK-wide pulse indicating '0' when UDP frames with any
that the previous UDP frame destination port (but with the
could not be sent (destination right IP address) are passed to
not present for example). the user.
UDP_RX_DEST_PORT Input. User-specified UDP rx
USAGE: wait until the _NO_IN destination port (enabled when
previous UDP tx frame CHECK_UDP_RX_
UDP_TX_ACK or DEST_PORT_NO = '1'
UDP_TX_NAK to send the
follow-on UDP tx frame

6
TCP-Application Interface can be pulsed or
continuous.
Prior to synthesis, one must configure the following
constants: The latency between
 The maximum number of TCP servers TCP_RX_CTS and
TCP_RX_DATA_VALID
NTCPSTREAMS_MAX in com5502.pkg. is two clocks.
This limit applies to all instantiated The TCP interface is replicated NTCPSTREAMS
COM5502 components in a project: times, depending on the number of connections
 The number of TCP servers implemented.
NTCPSTREAMS for a given COM5502
instance, as declared in the generic section. See the TCP receive interface timing section for
details.
TCP receive interface (for TCP connection # I)
TCP local port Input. TCP_SERVER Limitations
TCP_LOCAL_PORTS(I)(15:0) port configuration. Each This software does not support the following:
one of the
- IEEE 802.3/802.2 encapsulation, RFC
NTCPSTREAMS
streams handled by this
1042, only the most common Ethernet
component must be encapsulation.
configured with a
distinct port number. Only one gateway is supported at any given time.
This value is used as
destination port number Software Licensing
to filter incoming
packets, and as source The COM-5502SOFT is supplied under the
port number in outgoing following key licensing terms:
packets.
TCP rx word Output. Receive 0 to 8
TCP_RX_DATA(I)(63:0) bytes. 1. A nonexclusive, nontransferable
Byte order: MSB first corporate/organization license to use the
(easier to read contents VHDL source code internally, and
during simulation) 2. An unlimited, royalty-free, nonexclusive
TCP rx data valid Output. Indicates the transferable license to make and use products
TCP_RX_DATA_VALID (I) meaningful bytes in incorporating the licensed materials, solely in
(7:0) TCP_RX_DATA.
bitstream format, on a worldwide basis.
0xFF for 8 bytes, 0x80
for one byte, 0xC0 for
two bytes, etc. The complete VHDL/IP Software License
Ready To Send Output. Agreement can be downloaded from
TCP_RX_RTS(I) Usage: TCP_RX_RTS http://www.comblock.com/download/softwarelicense.pdf
goes high when at least
one byte is in the output
queue (i.e. not yet visible
at the output
TCP_RX_DATA). The
application should then
raise TCP_RX_CTS for
one clock to fetch the
next word 2 CLKs later.
Note that the next word
may be partial (<8 bytes)
or full.
Flow control: Clear To Send Input.
TCP_RX_CTS(I) Flow control signal. '1' to
indicate that the user is
ready to accept the next
TCP rx word. This signal
7
Configuration Management Ready-to-use Hardware
The current software revision is 0. Use examples are available to run on the following
Comblock hardware modules:
Directory Contents  COM-1800 FPGA (XC7A100T) + ARM +
DDR3 SODIMM socket + GbE LAN
/project_1 Xilinx Vivado 2017.4 project
development platform
/doc Specifications, user manual, http://www.comblock.com/com1800.html
implementation documents
/src .vhd source code, .ucf constraint files,  ComBlock COM-5104 10G Ethernet
.pkg packages. network interface (SFP+ connector to 4-
One component per file. lane XAUI to FPGA)
/sim Testbenches, Wireshark capture files as http://www.comblock.com/com5104.html
simulation stimulus
/bin .bit configuration file for use with COM- All hardware schematics are available online at
1800/COM-5104 hardware modules. comblock.com/download/com_1800schematics.pdf
/use_example Source code for use with COM- comblock.com/download/com_5104schematics.pdf
1800/COM-5104 hardware modules

Test components (stream to packets Acronyms


segmentation, etc) are in directory
\use_example\src Directory Contents
ARP Address Resolution Protocol (only for IPv4)
Key project file: CTS Clear To Send (flow control signal)
Xilinx ISE project file: com-5502_ISE14.xise EOF End Of Frame
LSB Least Significant Byte in a word
MSB Most Significant Byte in a word
VHDL development environment MTU Maximum Transmission Unit (frame length)
The VHDL software was developed using the NDP Neighbor Discovery Protocol
following development environment: RX Receive
(a) Xilinx Vivado 2017.4 as synthesis tool RTS Ready To Send (flow control signal)
(b) Xilinx Vivado 2017.4 as VHDL simulation SOF Start Of Frame
tool TCP Transmission Control Protocol
TX Transmit
For best FPGA place and route timing, the UDP User Datagram Protocol
recommended Xilinx Vivado synthesis settings are
the default + keep equivalent registers + no
resource sharing.

8
Top-Level VHDL hierarchy on the fly and responds with the local MAC
address information.
- The PING_10G.vhd component detects
ICMP echo (ping) requests and assembles a
ping echo Ethernet packet for transmission
to the MAC. Instantiated once. Ping works
for both IPv4 and IPv6.
- The WHOIS2_10G.vhd component
generates an ARP request broadcast packet
(IPv4) or a Neighbor solicitation message
(IPv6) requesting that the target identified
by its IP address responds with its MAC
address.
- The ARP_CACHE2_10G.vhd component is
a shared routing table that stores up to 128
IP addresses with their associated 48-bit
MAC addresses and a ‘freshness’
timestamp. This component determines
whether the destination IP address is local
or not. In the latter case, the MAC address
of the gateway is returned. Only records
regarding local addresses are stored (i.e. not
WAN addresses since these often point to
the router MAC address anyway). An
arbitration circuit is used to arbitrate the
routing request from multiple transmit
instances. Instantiated once.
The code is stored with one, and only one, - The flexible UDP_TX_10G.vhd component
component per file. encapsulates a data packet into a UDP
frame addressed from any port to any
The root entity (highlighted above) is port/IP destination. Supports both IPv4 and
COM5502.vhd. It contains instantiations of the IP IPv6. Generally instantiated once,
protocols and a transmit arbitration mechanism to irrespective of the number of source or
select the next packet to send to the MAC/PHY. destination UDP ports. However, multiple
instantiations can easily be implemented by
The root also includes the following components: modifying the COM5502.vhd top level
- The PACKET_PARSING_10G.vhd code (search for the TX_MUX_00x and
component parses the received packets RT_MUX_00x processes). Multiple
from the MAC and efficiently extracts key instances are useful when multiple UDP
information relevant for multiple protocols. sources need transmit arbitration to prevent
Parsing is done on the fly without storing collisions.
data. Instantiated once. - The UDP_RX_10G.vhd component
- The ARP_10G.vhd component detects ARP validates received UDP frames and extracts
requests and assembles an ARP response the data packet within. As the validation is
Ethernet packet for transmission to the performed on the fly (no storage) while
MAC. Instantiated once. ARP only applies received data is passing through, the
to IPv4. For IPv6, use neighbour discovery validity confirmation is made available at
protocol instead. the end of the packet. The calling
application is therefore responsible for
- The ICMPV6_10G.vhd component detects discarding packets marked as 'invalid' at the
incoming IP/ICMPv6 neighbor solicitations end. See PACKETS_2_STREAM_10G.vhd

9
for assistance in discarding invalid packets. the end of packet. It is designed to connect
Instantiated once, irrespective of the seamlessly with the TCP_RX.vhd
number of UDP ports being listened to. component.
- The TCP_SERVER_10G.vhd component is
the heart of the TCP protocol. It is written
parametrically so as to support
NTCPSTREAMS concurrent TCP
connections. It essentially handles the TCP
state machine of a TCP server: initially
listening for connection requests from
remote TCP clients, establishing and
tearing down the connections and managing
flow control and byte ordering while the
connections are established. Since this is a
server, it does not know a priori whether
the protocol is IPv4 or IPv6 (it depends on
the client), so each server is given two IP
addresses, one for each IP version.
- The TCP_TX_10G.vhd component formats
TCP tx frames, including all layers: TCP,
IP, MAC/Ethernet. It is common to all
concurrent streams and is thus instantiated
once.
- The TCP_TXBUF_10G.vhd component
stores TCP tx payload data in individual
elastic buffers, one for each transmit
stream. The buffer size is configurable prior
to synthesis through the ADDR_WIDTH
generic parameter.
- The TCP_RXBUFNDEMUX_10G.vhd
component demultiplexes several TCP rx
streams. This component has two
objectives: (1) tentatively hold a received
TCP frame on the fly until its validity is
confirmed at the end of frame. Discard if
invalid or further process if valid.
(2) demultiplex multiple TCP streams,
based on the destination port number.

Additional components are also provided for use


during system integration or tests.
- STREAM_2_PACKETS_10G.vhd segments
a continuous data stream into packets. The
transmission is triggered by either the
maximum packet size or a timeout waiting
for fresh stream bytes.
- PACKETS_2_STREAM_10G.vhd
reassembles a data stream from received
valid packets while discarding invalid
packets. The packet’s validity is assessed at
10
VHDL simulation
Test benches are provided for HDL simulation of Clock / Timing
UDP transmit, UDP receive. The COM-5502SOFT can connect to 10G Ethernet
MAC as well as to lower-speed 10/100/1000Mbps
Several test benches use Wireshark Libpcap Ethernet MAC without any code change. However,
network capture files as stimulus. See Libcap File the clock domains are different, as illustrated by the
Player two use-cases below.
For TCP server simulation, a TCP client simulator
is needed (not supplied), because of the interactive 4*
156.25 MHz clock domain user clock domain

nature of the TCP protocol. The COM-5502SOFT + 3.125Gbps

COM-5503SOFT bundle allows the comprehensive 10G DPRAM USER


TCP server - TCP client simulation. 10G
PHY
XAUI ETHERNET
MAC
10G IP
STACK
APPLICATION
DPRAM

The testbenches (tb*.vhd) are located in the /sim 180008

directory
125/25/2.5MHz clock domain user clock domain
Quick start:
In the Xilinx Vivado, open a .xpr project. The 10/100/1000 Mbps
ETHERNET DPRAM
available testbenches are displayed as illustrated 10/100/1000
PHY
MAC 10G IP
STACK
USER
APPLICATION
below. Start the simulator. In the simulator, open DPRAM

the stored .wcfg configuration file which bears the


same name as the testbench. 180008

At 10G speed, the COM-5502SOFT uses the same


156.25 MHz clock as the 10G Ethernet MAC. If the
user application uses a different clock, dual-port
RAMs must be used to cross the clock domain
When the COM-5502SOFT is connected with the
lower-speed 10/100/1000 Mbps tri-mode Ethernet
MAC, dual-port RAMs within the Ethernet MAC
are used to cross the clock domains. The COM-
5502SOFT can then use the same clock as the user
application.

The COM-5502SOFT code is written to run at


156.25 MHz on a Xilinx Artix7 -1 speed grade with
2 concurrent TCP streams instantiated.

11
UDP Receive Latency Alternative lower-latency method:
In order to minimize the latency, UDP payload When low-latency is a priority, the
bytes are forwarded directly to the user application
TCP_RXBUFNDEMUX2_10G.vhd component may be
interface with only a partial validation. This allows bypassed (requires minor code editing). In this case,
the application to start processing the UDP payload
the user application is responsible to discarding
data without delay since frame errors are very rare. invalid frames based upon the
However, the complete validation information is
RX_FRAME_VALID confirmation at the end of
only available at the end of the UDP frame frame RX_EOF.
(UDP_RX_EOF). The user application is
responsible to discarding invalid frames based upon
the UDP_RX_FRAME_VALID confirmation.

Validation checks performed prior to the first UDP


payload word (UDP frame is not forwarded to the
user application if any of these checks fail):
 IP datagram
 Destination IP address
 IPv4 header checksum
 UDP protocol
 Destination UDP port (when enabled) Validation checks performed prior to the first TCP
Validation checks performed at the end of UDP payload word in an IP frame (TCP payload data is
frame (user is responsible for discarding the frame not forwarded out to the user application if any of
if any of these checks fail): these checks fail):
 UDP checksum  IP datagram
 Destination IP address
 IPv4 header checksum
 TCP connection
TCP Receive Latency
 TCP protocol
In the baseline code, the TCP receive payload data
goes through the TCP_RXBUFNDEMUX2_10G.vhd  Destination TCP port
component which conveniently discards bad frames  No gap in received sequence
and packs payload data into neat 64-bit words.
 Non-zero data length
The 'price to pay' for this convenience is a delay
which can be significant as the user application is  Originator is identified (no spoofing)
notified of valid payload bytes at the end of an IP
frame.
Validation checks performed at the end of TCP
frame (user is responsible for discarding the frame
if any of these checks fail):
 TCP checksum

Troubleshooting
1. PC cannot ping or receive UDP frames or
establish a TCP connection.
12
One likely cause may be Windows security.
Declaring the network adapter as a "Private
Network" makes it easier to access the FPGA board
from the PC.

One method is to define the default gateway field in


the network adapter IPv4 configuration as the
FPGA board IPv4 address .

13
TCP receive interface timing
The receive interface for TCP and UDP are somewhat different. Each TCP stream stores the receive data in an
independent output buffer. The application controls the output rate through TCP_RX_CTS (Clear-To-Send)
pulses. One TCP_RX_CTS pulse will fetch the next word as long as it contains at least one byte. The
TCP_RX_RTS goes high when at least one byte is unread in the output buffer. Thus the application should do
the following:
1. Check TCP_RX_RTS until it indicates data hidden in the output buffer
2. Send a APP_RX_CTS pulse (1 clock wide)
3. Get the data byte(s) in APP_RX_DATA(63:0). The number of bytes available is shown in
APP_RX_DATA_VALID(7:0)
4. Wait until the output word is full (TCP_RX_DATA_VALID(7:0) = xFF) to get the full word
contents. Filling the output word with incoming bytes is automatic. Note that this may take several
clocks, depending on the rate at which data is sent over the LAN.
5. Repeat steps 1-5

The timing diagrams below illustrates this interface's timing.

The transmitted sequence is 010203..etc.


The first two output words contain 8 valid data bytes. They are available one clock after requested the
application generates the TCP_RX_CTS pulse.
The third output word contains only two words (the last two words received at this time). The next
TCP_RX_CTS is ignored as there is no other data waiting in the output buffer.

The third output word is being filled as data arrives. Note that the number of valid bytes is updated with some
delay as the component must confirm each frame's validity at the end of each Ethernet frame.

UDP Transmitter Latency


Before sending a UDP frame, the data must be stored in a buffer while the checksum is being computed.
Therefore, the transmit latency depends to a large part on the size of the UDP frame, since transmission can only
start after the last word is received from the user.

For example: in the case of a 2048-byte frame, the transmit latency is 1.734us

14
The 10Gbits/s capacity is nearly fully utilized in the case of 2048-byte UDP frames: 2048 bytes/1.702us = 9.62
Gbits/s

Libcap File Player


Real network packets captured by the popular Wireshark LAN analyzer can be used as realistic stimulus for the
COM-5502 software. The tbcom5502.vhd test bench reads a libpcap-formatted file as captured by Wireshark
and feeds it to the COM-5502 receive path. The input file must be named input.cap and be placed in the same
directory as the Vivado project.

The libpcap file format is described in http://wiki.wireshark.org/Development/LibpcapFileFormat

Note that Wireshark is sometimes unable to capture checksum fields when the PC operating system offloads the
checksum computation to the network interface hardware. In order to still be allowed to simulate, set
SIMULATION := ‘1’ in the generic map section of the COM5502.vhd component. When doing so,
(a) the IP header checksum is considered valid when read as x”0000”.
(b) The TCP checksum computation is forced to a valid 0x0001, irrespective of the 16-bit field captured by
Wireshark.

15
Components details

WHOIS2.VHD
Before sending any IP packet, one must translate the destination IP address into a 48-bit MAC address.
A look-up table (within arp_cache2.vhd) is available for this purpose. Whenever there is
no entry for the destination IP address in the look-up table, an ARP request is broadcasted
to all asking for the recipient to respond with an ARP response. The main task of the
whois2.vhd component is to assemble and send this ARP request.

ARP_CACHE2.VHD
A block RAM is used as cache memory to store 128 MAC/IP/Timestamp records. Each record
comprises (a) a 48-bit MAC address, (b) the associated IP address (32-bit IPv4 or 64-bit local IPv6)
and (c) a timestamp when the information was last updated. The information is updated continuously
based on received ARP responses and received IP packets. The component keeps track of the oldest
record, which is the next record to be overwritten.

Whenever the application requests the MAC address for a given IP address (search key), this
component searches the block RAM for a matching IP address key. If found, it returns the associated
MAC address. If the search key is not found or is older than a refresh period, this component asks
whois2.vhd to send an ARP request packet.

The code is optimized for fast access. Response time is between 32ns and 850ns depending on the
record location in memory.

This routing table is instantiated once and shared among multiple instances requiring routing services.
An arbitration circuit is used to sequence routing requests from several transmit instances (for example
several instantiations of the UDP_TX component).

16
ComBlock Compatibility List
FPGA development platform
COM-1800 FPGA (XC7A100T) + ARM + DDR3 SODIMM socket + GbE LAN development
platform
Network adapter
COM-5104 10G Ethernet network interface
Software
COM-5501SOFT 10Gbps Ethernet MAC. VHDL source code.
COM-5401SOFT 10/100/1000 Mbps Ethernet MAC. VHDL source code.
COM-5502SOFT IP/UDP/TCP/ARP/PING stack for 10GbE. VHDL source code.
COM-5503SOFT IP/UDP/TCP CLIENT/ARP/PING stack for 10GbE. VHDL source code.

ComBlock Ordering Information


COM-5502SOFT IP/TCP SERVER/UDP/ARP/PING PROTOCOL STACK for 10GbE, VHDL SOURCE
CODE

ECCN: 5E001.b.4

MSS • 845 Quince Orchard Boulevard Ste N •


Gaithersburg, Maryland 20878-1676 • U.S.A.
Telephone: (240) 631-1111
Facsimile: (240) 631-1676
E-mail: sales@comblock.com

17

You might also like