
10.1109/ULTSYM.2015.0514

FPGA Implementation of Low-Power 3D Ultrasound Beamformer
Richard Sampson, Ming Yang, Siyuan Wei, Rungroj Jintamethasawat,
Brian Fowlkes, Oliver Kripfgans, Chaitali Chakrabarti, and Thomas F. Wenisch
Department of EECS, University of Michigan
School of ECEE, Arizona State University
Department of Radiology, University of Michigan

Abstract: 3D ultrasound is common for non-invasive medical imaging in cardiology and OB-GYN
because of its accuracy, safety, and real-time ease of use. However, high bandwidth requirements
and extreme computational complexity have precluded hand-held and low-power 3D systems, limiting
3D applications. In previous work, we presented Sonic Millip3De, a hardware design that can
efficiently handle the high computational demand of real-time 3D synthetic aperture beamforming,
even in handheld and mobile applications. The design combines a custom, highly parallel hardware
system with a novel delay approximation method to quickly produce high quality 3D image data
within an estimated 15 W full-system power budget. Prior evaluations of the design relied on
software prototypes; this work extends previous evaluations with an FPGA implementation of the
beamforming accelerator, validating the results of earlier prototypes. In particular, we carry out
image quality analyses of our beamforming architecture using simulated 3D echo data (from Field II)
and 2D artificial tissue phantom data acquired using a Verasonics V-1 system and Philips P4-1
probe. We compare results from the FPGA implementation to an ideal software beamformer and prior
software prototypes of the Sonic Millip3De design.

I. INTRODUCTION

As technology advances, medical imaging systems continue to become smaller and more portable.
Portable ultrasound systems, in particular, are becoming more prevalent due to the inherent low
transmit power of the modality as well as its safety and ease of use; many 2D hand-held systems
are available today. In contrast, real-time 3D imaging remains intractable in hand-held systems
due to the high data rates and enormous computational requirements; versatile, state-of-the-art
real-time 3D systems [2] are larger than a commercial refrigerator.

In our previous work, we proposed Sonic Millip3De [4, 5], a new approach to 3D ultrasound
beamforming specifically targeted at a low-power and portable design to enable small, hand-held,
real-time 3D systems. Sonic Millip3De's key innovation is a highly parallel design that uses a new
delay calculation algorithm coupled with custom hardware to achieve high frame-rate, high
resolution 3D synthetic aperture ultrasound images, all within a small, 15 W power budget.

However, our previous evaluations of Sonic Millip3De relied on synthesis of a Verilog model of the
hardware design, and our image quality evaluation relied exclusively on synthetic echo data from
Field II [1, 3] and simulation of the proposed beamforming algorithm in MATLAB. Hence, a
substantial gap remained between our proposed design and its validation.

In this work, we build on our previous results by implementing a hardware prototype of a single
channel of the Sonic Millip3De beamformer design on an Altera DE1 FPGA board. Using this prototype,
we then carry out image quality analyses using synthetic 3D echo data (from Field II) and 2D
artificial tissue phantom data acquired using a Verasonics V-1 system and Philips P4-1 probe. We
compare results from the FPGA implementation to an ideal software beamformer and prior software
prototypes of the Sonic Millip3De design. Finally, we report on the performance of the FPGA
prototype, taking into account the limitations of our scaled-down implementation, to re-examine
our performance projections for the full-scale ASIC design.

II. SONIC MILLIP3DE

Sonic Millip3De [4, 5] is our previously proposed hardware solution for portable, hand-held 3D
ultrasound. The design features custom hardware to implement our iterative delay estimation
algorithm, which eliminates the need for a large look-up table (LUT) or complex hardware for
on-the-fly delay calculations. In addition to the custom beamforming accelerator, Sonic Millip3De
also makes use of state-of-the-art 3D die stacking technology to allow the transducers, storage,
and beamformer silicon layers to connect directly to each other, creating an

978-1-4799-8182-3/15/$31.00 ©2015 IEEE. 2015 IEEE International Ultrasonics Symposium Proceedings


Fig. 1: Sonic Millip3De Hardware Overview. The full-scale ASIC design features 3 stacked die layers connected with through-
silicon vias. The first layer comprises transducers grouped into banks with one transducer per bank in each subaperture. Analog
transducer outputs from each bank are multiplexed and routed over TSVs to the second layer, comprising 12-bit ADC units
and SRAM arrays for storage. The stored data is passed via face-to-face links to layer 3 for processing in the 3 stages of
the beamsum accelerator. The interpolation stage upsamples the signal by 4x and performs apodization. The select stage maps
signal data from the receive time domain to the image space domain for multiple scanlines. The summing unit combines the
incoming signal from all beamsum nodes over a pipelined interconnect and compounds previous image data, and the resulting
updated image is written back to memory. Our FPGA prototype implements a single channel of the beamforming layer.
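The select stage relies on the iterative delay estimation of Section II-A, where a quadratic is evaluated along a scanline using only additions. The following is a minimal sketch of that general idea (forward differencing followed by a running sum), not the hardware's actual datapath. The function names, the coefficient values, and the segment length are all hypothetical, chosen only for illustration; in a piecewise scheme, a new coefficient triple would be loaded for each segment of the scanline.

```python
# Sketch: evaluate a quadratic with additions only (forward differencing),
# then accumulate the per-step changes into sample indices along a scanline.
# All names and coefficient values are hypothetical illustrations.

def quadratic_by_adds(a, b, c, count):
    """Return [q(0), ..., q(count-1)] for q(n) = a*n*n + b*n + c.

    The three starting constants are derived once (they play the role of
    pre-computed coefficients); the loop itself performs only additions.
    """
    value = c             # q(0)
    first_diff = a + b    # q(1) - q(0)
    second_diff = 2 * a   # constant second difference of any quadratic
    out = []
    for _ in range(count):
        out.append(value)
        value += first_diff
        first_diff += second_diff
    return out


def accumulate_indices(start_index, deltas):
    """Running sum: sample index at each focal point from per-step changes."""
    indices = [start_index]
    for delta in deltas:
        indices.append(indices[-1] + delta)
    return indices


# One hypothetical segment of a scanline covering 8 focal-point steps.
a, b, c = 1, -3, 40
deltas = quadratic_by_adds(a, b, c, 8)
indices = accumulate_indices(100, deltas)
print(deltas)
print(indices)
```

The design point this illustrates is that only three values per segment need to be stored, rather than one table entry per focal point.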

efficient design that can provide the required bandwidth from the transducers to the beamformer.
We provide a brief overview of the algorithmic and hardware innovations of the design.

A. Delay Estimation

At the heart of the Sonic Millip3De design is a new approach we developed for calculating delays
in a delay-sum synthetic aperture system. One of the key challenges of large 3D systems is that
over 100 billion round-trip delay calculations are needed to convert the RF channel data into
beamformed images. The standard 2D approach of pre-computing delays and storing them in a look-up
table (LUT) is infeasible for any high-frame-rate, portable design, as the table would require
tens to hundreds of GB of storage. In contrast, computing delays on-the-fly while beamforming
requires no storage. However, exact calculation requires trigonometric and square root functions,
which are computationally expensive, and billions of such computations are similarly infeasible in
a small, low-power system.

Sonic Millip3De instead allows for an efficient compromise between these methods by using the key
insight that, along a scanline, the change in the channel index value from one focal point to the
next can be approximated accurately with piecewise quadratic formulae. By computing the delay
values iteratively along each scanline, the quadratic formulae can be calculated using only add
operations and a small table of pre-computed coefficients. Relative to a fully pre-computed LUT,
our approach reduces storage by orders of magnitude while requiring only simple arithmetic
circuits.

B. Hardware Design

The full-scale Sonic Millip3De system design (Fig. 1) comprises three distinct silicon die layers
(transducers, ADC/SRAM, and beamformer) that are connected vertically using through-silicon vias
(TSVs). This design technique reduces wire lengths, as the short TSVs eliminate the long
interconnect wires required in a conventional single-die, planar design. TSVs facilitate tight and
dense connections between the layers, providing the necessary bandwidth that a large 3D ultrasound
transducer array requires.

This study examines the third layer of the design, comprising the 1024-channel beamforming
accelerator, which generates the image from the RF data stored in the ADC/Storage layer. The
accelerator itself comprises three primary components that each perform a distinct part of the
beamforming process. The interpolation unit up-samples the RF data via linear 4x interpolation and
performs apodization. The select unit uses the previously described delay estimation algorithm to
map the interpolated data onto scanlines. The unit contains multiple sub-units (8 in our FPGA
prototype) that operate in parallel, each generating a single scanline. The summing unit combines
scanline data across the parallel channels via a uni-directional ring network. The network also
transfers coefficients for the select and interpolation units as the accelerator iterates over all
scanlines. The network
terminates at a low-power ARM Cortex-A3 processor, which manages off-chip memory accesses.

TABLE I: System parameters for 2D and 3D results

Parameter                          2D          3D
Number of Transducers              96          1024 (32 x 32)
Number of Scanlines per Image      192         1024 (32 x 32)
Number of Points per Scanline      780         1560
Image Depth                        7 cm        4 cm
Image Total Angular Width          π/2         π/3 x π/3
Transmission Frequency             2.5 MHz     4 MHz
Sampling Frequency                 10 MHz      40 MHz
Interpolation Factor               4x          4x
Interpolated Sampling Frequency    40 MHz      160 MHz
Speed of Sound (tissue)            1540 m/s    1540 m/s

TABLE II: Root-mean-square difference in pixel value across each beamforming method pair,
expressed as a percent of the 40 dB dynamic range.

Image Pair     2D RMS    3D RMS
Float-Fixed    6.8%      0.7%
Float-FPGA     9.9%      0.5%
Fixed-FPGA     9.4%      0.6%

III. METHODOLOGY

A. Data Collection

2D: We acquire RF echo data from an artificial tissue phantom using a Philips P4-1 operating at a
2.5 MHz center frequency. The transmission pulses of all 96 transducers are time-delayed to create
a virtual source located 1 mm behind the transducer head. We apply Hamming window apodization to
the transmit pulses. We collect the unprocessed RF data with a Verasonics V-1 system and then
perform beamforming on the FPGA.

3D: Currently, commercially available 3D ultrasound probes do not provide access to unprocessed RF
data. Hence, we use synthetic echo data, simulated in Field II [1, 3], to create an input data set
for 3D beamforming. We simulate a 32x32 transducer receive aperture and transmit from a center
circle of 78 transducers. Similar to the 2D methodology, transmission pulses are time-delayed to
create a virtual source 1 mm behind the array.

B. Beamforming

We evaluate image quality by contrasting three beamforming methods for both 2D and 3D echo data
using the parameters described in Table I. First, we generate an ideal baseline image using MATLAB
and double-precision floating-point, calculating delays precisely based on Euclidean distance.
Second, we model the operation of the hardware in MATLAB, using 12-bit fixed-point arithmetic and
our iterative delay calculation approach, as described in Section II-A. Finally, we perform
beamforming in hardware on the DE1 FPGA board.

In all scenarios, we pre-process RF data using time-gain compensation. We truncate input signals
to 12-bit precision for the fixed-point methods. After beamforming, we perform demodulation and
Cartesian re-gridding (to facilitate easier viewing of 3D images) in software.

C. FPGA Processing

We implement a Sonic Millip3De beamforming channel on an Altera DE1 board, which features a
Cyclone II FPGA chip. This FPGA is quite small and can accommodate only a single beamforming
channel with a half-width select unit (processing eight rather than 16 scanlines concurrently).

We iteratively load each transducer's receive signal into the FPGA and process the channel. To
automate channel processing, we implemented a MATLAB program that manages communication with the
FPGA, iteratively loading the beamforming coefficients and RF data. The FPGA then fully processes
the image volume for the single channel, producing eight scanlines at a time in repeated passes
over the channel data. The partial sums from all channels are then accumulated in MATLAB.

IV. RESULTS

A. Image Quality

Fig. 2 shows x-z and y-z slices of the 3D results and Fig. 3 shows the beamformed image results
from the 2D phantom data. By visual comparison, the images are all similar, with notable objects
equally identifiable in the images and no apparent degradation or artifacts.

To compare the images quantitatively, we report the root-mean-square difference in pixel values,
expressed as a percent of the 40 dB range, for each pair of beamforming methods (shown in
Table II). (These reflect the entire image volume, not just the shown slices.) The small
differences, particularly for 2D, are primarily a consequence of rounding errors in type
conversion when pre-processing the RF data to load it onto the FPGA.

B. FPGA Performance

We report performance (time to compute a frame) of the FPGA and contrast it with the projected
performance of a full-scale Sonic Millip3De ASIC implementation. The FPGA prototype runs at 50 MHz
and computes a partial image in 0.2 seconds per transducer channel in 2D, and 2.3 seconds per
transducer channel in 3D. When scaled based on our prior estimates [4, 5] for 1 GHz ASIC clock
frequency (20x speedup), 16-way instead of 8-way parallelism in select units (2x speedup), and
individual high-speed memories (estimated 24x speedup), we project that a full-scale Sonic
Millip3De implementation can generate 2D images at over 4k frames per second and 3D images at over
400 volumes per second for the system configurations described in Table I.
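The projection in Section IV-B can be reproduced with a few lines of arithmetic. The sketch below uses the per-channel times and speedup factors quoted in the text; the scaling model itself is an assumption made here for illustration, namely that the three factors multiply and that all channels of the full-scale design run concurrently, so that the frame time equals the scaled per-channel time. The names are hypothetical.

```python
# Sketch of the frame-rate projection. Numeric inputs come from the text;
# the multiplicative, fully parallel scaling model is an assumption.

FPGA_SECONDS_PER_CHANNEL = {"2D": 0.2, "3D": 2.3}

SPEEDUPS = {
    "clock (50 MHz -> 1 GHz)": 20,
    "select-unit parallelism (8 -> 16 scanlines)": 2,
    "individual high-speed memories (estimated)": 24,
}


def projected_rate(fpga_seconds_per_channel, speedups):
    """Projected frames (or volumes) per second for the full-scale design."""
    total_speedup = 1
    for factor in speedups.values():
        total_speedup *= factor
    # Assumption: channels operate in parallel in the ASIC, so one frame
    # takes as long as one (accelerated) channel pass.
    frame_seconds = fpga_seconds_per_channel / total_speedup
    return 1.0 / frame_seconds


rates = {
    mode: projected_rate(seconds, SPEEDUPS)
    for mode, seconds in FPGA_SECONDS_PER_CHANNEL.items()
}
for mode, rate in rates.items():
    print(f"{mode}: about {rate:.0f} per second")
```

Under these assumptions the combined speedup is 960x, which gives roughly 4800 frames per second in 2D and roughly 417 volumes per second in 3D, consistent with the "over 4k" and "over 400" figures stated in the text.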
Fig. 2: 3D Image Quality Comparison. Beamformed images of a simulated 3D cylindrical cyst using Field II [1, 3]. (a-c)
depict x-z slices and (d-f) depict y-z slices. (a,d) Generated in MATLAB with full delay and double-precision floating-point.
(b,e) Generated in MATLAB with 12-bit fixed-point and iterative delay estimation. (c,f) Generated on the FPGA prototype.
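The quantitative comparison in Table II reports a root-mean-square pixel difference as a percent of the 40 dB dynamic range. A minimal sketch of such a metric follows. It assumes that each image is supplied as a flat list of log-compressed pixel values in dB relative to the image maximum, and that values are clipped to the displayed range before comparison; the text does not specify these details, and the function names and sample values are hypothetical.

```python
import math

# Sketch of an RMS-difference metric expressed as a percent of the
# displayed dynamic range. Normalization and clipping are assumptions.

DYNAMIC_RANGE_DB = 40.0


def clip_db(value_db, floor_db=-DYNAMIC_RANGE_DB):
    """Clamp a pixel value in dB to the displayed range [floor_db, 0]."""
    return max(floor_db, min(0.0, value_db))


def rms_percent(image_a_db, image_b_db):
    """RMS pixel difference as a percent of the dynamic range."""
    if len(image_a_db) != len(image_b_db) or not image_a_db:
        raise ValueError("images must be non-empty and the same size")
    total = 0.0
    for pixel_a, pixel_b in zip(image_a_db, image_b_db):
        diff = clip_db(pixel_a) - clip_db(pixel_b)
        total += diff * diff
    rms = math.sqrt(total / len(image_a_db))
    return 100.0 * rms / DYNAMIC_RANGE_DB


# Hypothetical four-pixel "images" in dB relative to the maximum.
reference = [0.0, -10.0, -20.0, -55.0]
candidate = [-4.0, -10.0, -20.0, -36.0]
print(f"{rms_percent(reference, candidate):.2f}%")
```

For a full volume, the same function would be applied to every voxel rather than to the displayed slices only, matching the note that the reported values reflect the entire image volume.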

Fig. 3: 2D Image Quality Comparison. (a) Phantom guide. Phantom features two hyperechoic gray
scale targets at 3 cm: +6 dB (left) and >+15 dB (right). (b) Image generated with double-precision
floating point and exact delay calculation. (c) The same image generated via our delay estimation
and 12-bit fixed-point precision in MATLAB. (d) Image generated on the FPGA prototype.

V. CONCLUSIONS

2D portable ultrasound is now common, but overwhelming computation requirements hamper portability
in real-time 3D systems. In previous work, we proposed a new approach to real-time 3D synthetic
aperture beamforming with our Sonic Millip3De system. In this work, we validate the Sonic
Millip3De approach with a single-channel FPGA prototype. Our image quality comparison demonstrates
that our approach can produce high-quality images and validates previous projections of the
achievable frame rate of a full-scale design.

ACKNOWLEDGEMENTS

This work was supported in part by NSF CCF-1406739, CCF-1406810, and CSR-0910699.

REFERENCES

[1] J. Jensen. FIELD: A Program for Simulating Ultrasound Systems. In Nordic-Baltic Conf. on
Biomedical Imaging, 1996.
[2] J. Jensen et al. SARUS: A synthetic aperture real-time ultrasound system. IEEE Trans. on
Ultrasonics, Ferroelectrics, and Frequency Control, 60(9):1838-1852, Sep. 2013.
[3] J. Jensen and N. Svendsen. Calculation of pressure fields from arbitrarily shaped, apodized,
and excited ultrasound transducers. IEEE Trans. on Ultrasonics, Ferroelectrics, and Frequency
Control, 39(2):262-267, March 1992.
[4] R. Sampson, M. Yang, S. Wei, C. Chakrabarti, and T. F. Wenisch. Sonic Millip3De: A massively
parallel 3D-stacked accelerator for 3D ultrasound. Proc. of IEEE 19th International Symp. on High
Performance Computer Architecture, Feb. 2013.
[5] R. Sampson, M. Yang, S. Wei, C. Chakrabarti, and T. F. Wenisch. Sonic Millip3De with Dynamic
Receive Focusing and Apodization Optimization. Proc. of IEEE International Ultrasonics Symp.,
July 2013.
