Efficient Communication Between Hardware Accelerators and PS
Recommended Videos & Slides
M.S. Sadri, ZYNQ Training
High-bandwidth Direct Memory Access (DMA) between a memory-mapped source address and a memory-mapped destination address
Optional Scatter Gather (SG)
Initialization, status, and control registers are accessed through an AXI4-Lite slave interface
Source: Building Zynq Accelerators with Vivado HLS, FPL 2013 Tutorial
AXI DMA-based Accelerator Communication
Write to Accelerator
processor allocates buffer
processor writes data into buffer
processor flushes cache for buffer
processor initiates DMA transfer
........
......
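/* Receive a packet (simple mode, device-to-memory direction) */
Status = XAxiDma_SimpleTransfer(&AxiDma, (u32) RxBufferPtr,
                                BYTES_TO_RCV, XAXIDMA_DEVICE_TO_DMA);
if (Status != XST_SUCCESS) { return XST_FAILURE; }
/* Busy-wait on a completion flag, presumably set by the receive interrupt handler */
while (!RxDone);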
Transmitting a Packet Using Lower-Level Functions
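/* Transmit a packet */
/* 1. Source address of the buffer to send */
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_SRCADDR_OFFSET,
          (u32) TxBufferPtr);
/* 2. Set the run/stop bit (read-modify-write of the control register) */
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_CR_OFFSET,
          Xil_In32(AxiDma.TxBdRing.ChanBase + XAXIDMA_CR_OFFSET)
          | XAXIDMA_CR_RUNSTOP_MASK);
/* 3. Writing the length starts the transfer, so it goes last */
Xil_Out32(AxiDma.TxBdRing.ChanBase + XAXIDMA_BUFFLEN_OFFSET,
          BYTES_TO_SEND);
while (TxDone == 0);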
Receiving a Packet Using Lower-Level Functions
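/* Receive a packet */
/* 1. Destination address of the buffer to fill */
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_DESTADDR_OFFSET,
          (u32) RxBufferPtr);
/* 2. Set the run/stop bit (read-modify-write of the control register) */
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_CR_OFFSET,
          Xil_In32(AxiDma.RxBdRing.ChanBase + XAXIDMA_CR_OFFSET)
          | XAXIDMA_CR_RUNSTOP_MASK);
/* 3. Writing the length starts the transfer, so it goes last */
Xil_Out32(AxiDma.RxBdRing.ChanBase + XAXIDMA_BUFFLEN_OFFSET,
          BYTES_TO_RCV);
while (RxDone == 0);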
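The ordering of the register writes in the two lower-level snippets can be checked on a host machine with a small software model. The sketch below is only an illustration: MockChannel, mock_in32, mock_out32 and start_simple_transfer are hypothetical names, the numeric offsets are assumed values, and the model captures just one rule, namely that a non-zero length write launches a transfer only if the run/stop bit is already set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register offsets within one channel (illustrative values). */
#define CR_OFFSET        0x00u  /* control register */
#define ADDR_OFFSET      0x18u  /* source or destination address */
#define BUFFLEN_OFFSET   0x28u  /* transfer length in bytes */
#define CR_RUNSTOP_MASK  0x01u  /* run/stop bit in the control register */

#define NUM_REGS 16u

/* Hypothetical software model of one DMA channel (not a Xilinx type). */
typedef struct {
    uint32_t regs[NUM_REGS];
    unsigned started;  /* length writes accepted while the channel was running */
    unsigned ignored;  /* length writes that arrived while the channel was halted */
} MockChannel;

/* Stand-in for Xil_In32 on the modelled register window. */
static uint32_t mock_in32(const MockChannel *ch, uint32_t offset)
{
    assert(offset % 4u == 0u && offset / 4u < NUM_REGS);
    return ch->regs[offset / 4u];
}

/* Stand-in for Xil_Out32; a non-zero length write is the trigger. */
static void mock_out32(MockChannel *ch, uint32_t offset, uint32_t value)
{
    assert(offset % 4u == 0u && offset / 4u < NUM_REGS);
    ch->regs[offset / 4u] = value;
    if (offset == BUFFLEN_OFFSET && value != 0u) {
        if (ch->regs[CR_OFFSET / 4u] & CR_RUNSTOP_MASK)
            ch->started++;
        else
            ch->ignored++;
    }
}

/* Same three steps as the snippets above: address, run/stop, then length. */
static void start_simple_transfer(MockChannel *ch, uint32_t buf_addr,
                                  uint32_t nbytes)
{
    mock_out32(ch, ADDR_OFFSET, buf_addr);
    mock_out32(ch, CR_OFFSET, mock_in32(ch, CR_OFFSET) | CR_RUNSTOP_MASK);
    mock_out32(ch, BUFFLEN_OFFSET, nbytes);
}
```

Calling start_simple_transfer on a zero-initialized MockChannel increments started once, while writing the length before setting run/stop increments ignored instead. On real hardware the same sequence would go through Xil_Out32 and Xil_In32 against the channel base address.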
PL-PS Interfaces
Source: Building Zynq Accelerators with Vivado HLS, FPL 2013 Tutorial
Coherent AXI DMA-based Accelerator Communication
Write to Accelerator
processor allocates buffer
processor writes data into buffer
processor flushes cache for buffer (not needed when the DMA path is cache-coherent, e.g. through the ACP)
processor initiates DMA transfer
[Figure: test setup. Source and destination buffers (image_size bytes each, at a source address and a destination address) are allocated with kmalloc or dma_alloc_coherent, depending on the memory sharing method. Selection of packets (addressing): normal or bit-reversed. Accelerator blocks shown: FIFO (128K) and FIR, with read, process, and write stages. Loop N times and measure the execution interval. Image sizes: 4 KBytes, 16K, 65K, 128K, 256K, 1 MBytes, 2 MBytes.]
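The bit-reversed addressing option for packet selection can be illustrated with a short index computation. This is a minimal sketch under assumptions: the slides do not say how the bit-reversed order is derived, so the code assumes a power-of-two number of packets and reverses the low bits of the packet index. The names bit_reverse and packet_offset are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the lowest `bits` bits of `index` (e.g. 3 bits: 001 becomes 100). */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    assert(bits >= 1u && bits <= 32u);
    uint32_t out = 0u;
    for (unsigned i = 0u; i < bits; i++) {
        out = (out << 1) | (index & 1u);  /* shift in the next low bit */
        index >>= 1;
    }
    return out;
}

/* Byte offset of the i-th packet visited inside a buffer that holds
 * 2^log2_packets packets of packet_bytes each (assumed layout). */
static uint32_t packet_offset(uint32_t i, unsigned log2_packets,
                              uint32_t packet_bytes, int bit_reversed)
{
    uint32_t slot = bit_reversed ? bit_reverse(i, log2_packets) : i;
    return slot * packet_bytes;
}
```

With eight packets of 4096 bytes, normal addressing visits offsets 0, 4096, 8192, and so on, while the bit-reversed variant visits packet slots 0, 4, 2, 6, 1, 5, 3, 7, which spreads consecutive accesses across the buffer.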
Mohammadsadegh Sadri, Christian Weis, Norbert Wehn, Luca Benini, Energy and performance exploration of ACP Using ZYNQ
Memory Sharing Methods
[Figure: ACP data path. Blocks shown: Accelerator, ACP, SCU, L2, DRAM.]
[Figure: plot over image sizes 4K, 16K, 64K, 128K, 256K, 1 MBytes. Annotations: "ACP Loses!"; "CPU OCM between CPU ACP & CPU HP"; 298 MBytes/s; 239 MBytes/s.]
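A throughput figure in MBytes/s follows from the image size, the loop count, and the measured interval. The helper below is a sketch under stated assumptions: it counts each image once per iteration and takes 1 MByte as 10^6 bytes, neither of which the slides specify. The name throughput_mbytes_per_s is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved per second, in MBytes/s, for n_iters passes over an image.
 * Assumes 1 MByte = 1e6 bytes and one transfer of the image per pass. */
static double throughput_mbytes_per_s(size_t image_bytes, unsigned n_iters,
                                      double seconds)
{
    assert(seconds > 0.0);
    return ((double)image_bytes * (double)n_iters) / seconds / 1.0e6;
}
```

For example, 1000 passes over a 131072-byte image in 0.5 s gives about 262 MBytes/s under these assumptions.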
Energy Comparison
Lessons Learned & Conclusion